Grid Scheduling

Cécile Germain-Renaud
Scheduling
• Job
– A computation to run on a machine
– Possibly with network access, e.g. input/output files (coarse grain) or communication with other jobs (the DAG model)
• Schedule
– s(J) = date to begin execution of task J
– Alloc(J) = machine assigned to J
• One of the oldest Computer Science problems
• Principles of classification:
[Graham et al. Optimization and approximation in deterministic sequencing and scheduling: a survey. Ann. Discrete Math. 5 (1979), 287-326]
• Computer-aided classification of complexity results (4536 at the time of the paper) [Lageweg et al. Computer-aided complexity classification of combinatorial problems. CACM 25:11, 1982]
Classical scheduling in HPC
• Context: parallel computing/computers
• Application = Directed Acyclic Graph (T, E, w, c)
– T = set of sequential tasks
– E = dependence constraints
– w(t) = computational cost of task t
– c(t,t') = communication cost (data sent from t to t')
[Figure: two tasks labelled T and T']
• Infrastructure
– p identical processors
– With or without preemption, dedicated (no sharing)
• An optimization problem with objective function
– Makespan = total execution time S(T) = max over t in T of (s(t) + w(t))
• Complexity
– NP-complete for independent tasks and no communication (E = ∅, p = 2 and c = 0)
– NP-complete for UET-UCT graphs (w = c = 1)
– Very old: without communication, list scheduling provides a (2 - 1/p) approximation
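A minimal sketch of list scheduling for the simplest case above (independent tasks, no communication, p identical dedicated processors). The function names and the sample costs are illustrative assumptions, not taken from the slides. Each task, taken in list order, goes to the processor that becomes free first. The result is checked against the lower bound max(sum of costs / p, largest cost), which the (2 - 1/p) guarantee is usually stated against.

```python
import heapq

def list_schedule(costs, p):
    """Greedy list scheduling of independent tasks on p identical processors.

    costs[i] is w(t_i). Returns (start, alloc, makespan), where
    start[i] = s(t_i) and alloc[i] = Alloc(t_i).
    """
    # Heap of (time at which the processor becomes free, processor index).
    free_at = [(0.0, proc) for proc in range(p)]
    heapq.heapify(free_at)
    start, alloc = [], []
    for w in costs:
        t, proc = heapq.heappop(free_at)      # earliest available processor
        start.append(t)
        alloc.append(proc)
        heapq.heappush(free_at, (t + w, proc))
    makespan = max((s + w for s, w in zip(start, costs)), default=0.0)
    return start, alloc, makespan

def lower_bound(costs, p):
    """No schedule can finish before the average load or the longest task."""
    return max(sum(costs) / p, max(costs))

# Hypothetical task costs, for illustration only.
costs = [3, 1, 4, 1, 5, 9, 2, 6]
p = 3
start, alloc, makespan = list_schedule(costs, p)
print(makespan)                                   # 12.0
print(makespan <= (2 - 1 / p) * lower_bound(costs, p))
```

Sorting the list by decreasing cost before calling `list_schedule` is a common refinement; the sketch keeps the list order as given.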
Scheduling in Institutional Grids
• Institutional: federation of resources
– Accounted for: fair share on the medium to long time scale is a primary constraint
– Partially autonomous local policies must be allowed
• Grid
– Permanent regime: on-line decisions
– Large scale: strongly distributed
• Information system
• Scheduling services
• Relevant contexts
– Autonomous, multi-agent systems
– Auction algorithms
– Service Level Agreement (SLA) technology
EGEE gLite Scheduling
[Figure: architecture diagram. Labels: Site (node), CE, Local scheduler, Proc, several UIs and Brokers]
[Figure: next build of the diagram. Labels: Site (node), BDII, Publish, CE, Local scheduler, Proc, several UIs and Brokers]
[Figure: next build of the diagram. Labels: Site (node), BDII, Publish, Query, Rank, Broker, CE, Local scheduler, Proc, several UIs]
The information published is:
– Static: e.g. which types of VO are accepted
– Dynamic: expected traversal time
[Figure: next build of the diagram. Labels: Site (node), BDII, Publish, Query, Rank, Broker, CE, Local scheduler, Proc, several UIs]
Rank: may be any user-defined function, e.g. avoid "bad" machines
Default is first locality, second expected traversal time
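The query-then-rank step can be sketched as follows. This is only an illustration under stated assumptions: `CEInfo`, `default_rank`, `select_ce` and every field name are hypothetical and are not the gLite or BDII interfaces. The broker filters the published records on static information (accepted VOs), then picks the CE that minimises a rank key. The default key puts locality first and expected traversal time second.

```python
from dataclasses import dataclass

@dataclass
class CEInfo:
    """Hypothetical record of what a site publishes for one CE."""
    name: str
    accepted_vos: frozenset          # static information
    expected_traversal_time: float   # dynamic information, in seconds
    holds_input_data: bool           # stand-in for "locality"

def default_rank(ce):
    # Smaller is better: local CEs first, then shortest expected traversal time.
    return (not ce.holds_input_data, ce.expected_traversal_time)

def select_ce(published, vo, rank=default_rank):
    """Return the best CE accepting `vo`, or None if no CE matches."""
    candidates = [ce for ce in published if vo in ce.accepted_vos]
    if not candidates:
        return None
    return min(candidates, key=rank)

# Made-up published records, for illustration only.
published = [
    CEInfo("ce-a", frozenset({"atlas", "biomed"}), 1200.0, False),
    CEInfo("ce-b", frozenset({"biomed"}), 3600.0, True),
    CEInfo("ce-c", frozenset({"atlas"}), 60.0, True),
]

print(select_ce(published, "biomed").name)   # ce-b: local, despite the longer wait

# A user-defined rank that avoids "bad" machines before applying the default.
bad = {"ce-b"}
avoid_bad = lambda ce: (ce.name in bad, *default_rank(ce))
print(select_ce(published, "biomed", rank=avoid_bad).name)   # ce-a
```

Expressing the rank as a sort key keeps the user-defined part a plain function, so a custom policy only has to prepend or replace tuple components.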
[Figure: final build of the diagram. Labels: Site (node), BDII, Publish, Update, Query, Broker, BDII broker cache, CE, Local scheduler, Proc, several UIs]
Not only academic
[Figure: plot with axes "Overhead Ratio" and "Execution time (s)"]
• Long waiting times
• When EGEE was not so heavily loaded
Batch scheduling
• Very complex policies
• Maximise throughput under constraints
– Weighted fair-share: VOs, type of jobs
– Priorities
– Hardware requirements
– Advance reservations
• An indication of job duration is given by the type of queue: infinite, long, medium, short, and exotic ones
[B. Bode et al. The Portable Batch Scheduler and the Maui Scheduler on Linux Clusters]
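One way to picture the weighted fair-share constraint is the toy ordering below. It is an assumption made for illustration and is not the formula used by PBS or Maui. Each VO has a target share (its weight) and an observed share of past usage. Pending jobs from the VO that is furthest below its target are served first. All names and numbers are hypothetical.

```python
def fair_share_order(pending, weights, usage):
    """Order pending jobs by how far their VO is below its target share.

    pending: list of (job_id, vo) in submission order
    weights: vo -> fair-share weight (target share, up to normalisation)
    usage:   vo -> resources already consumed (e.g. CPU hours)
    """
    total_weight = sum(weights.values())
    total_usage = sum(usage.values()) or 1.0   # avoid dividing by zero

    def deficit(vo):
        target = weights[vo] / total_weight
        observed = usage.get(vo, 0.0) / total_usage
        return target - observed               # positive: under-served VO

    # sorted() is stable, so jobs of the same VO keep their submission order.
    return sorted(pending, key=lambda job: -deficit(job[1]))

# Made-up numbers: atlas is entitled to 75% but has used only 50%.
weights = {"atlas": 3, "biomed": 1}
usage = {"atlas": 50.0, "biomed": 50.0}
pending = [("j1", "biomed"), ("j2", "atlas"), ("j3", "biomed"), ("j4", "atlas")]

print(fair_share_order(pending, weights, usage))
```

A production scheduler would combine such a term with the other constraints listed above (priorities, hardware requirements, advance reservations) and would decay old usage over time.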
Classical vs Grid
• (Relatively) easy:
– Taking throughput instead of makespan as the objective, and a master-slave graph instead of a DAG, makes it possible, for instance, to compute in polynomial time cyclic schedules that are asymptotically optimal, but not local [Y. Robert] [A. Rosenberg]
• Moderately difficult: information about
– Applications
– Infrastructures
• The same program on different data may run at very different speeds
• The network performance is dynamic
• Really difficult
– Queues managed by local policies
– On-line decision
Information and Scheduling (I)
• Considerable work has been done in predicting CPU
load in shared environments – desktops, clusters,
desktop grids [P.A. Dinda, R. Wolski, J. Schopf]
– The basic technique is linear time-series analysis
$$ z_t = \frac{\theta(B)}{\phi(B)\,(1 - B)^d}\, a_t + \mu $$
– Self-similarity and epochal behavior
– Usual goal is the prediction of the next value
– Applied to soft real-time scheduling on shared clusters
– Practical application in NWS
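As a hedged illustration of next-value prediction with a linear time-series model, the sketch below fits a first-order autoregressive model, AR(1), the simplest member of the family, by least squares on the centred series. It is not the predictor used in NWS or in the cited work. The function names and the sample load trace are assumptions.

```python
def fit_ar1(series):
    """Estimate (mean, phi) for the model z_t - mean = phi * (z_{t-1} - mean) + a_t."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((series[t - 1] - mean) ** 2 for t in range(1, n))
    phi = num / den if den else 0.0   # constant series: fall back to the mean
    return mean, phi

def predict_next(series):
    """One-step-ahead prediction from the last observed value."""
    mean, phi = fit_ar1(series)
    return mean + phi * (series[-1] - mean)

# Hypothetical CPU load samples taken at a fixed period.
load = [0.42, 0.45, 0.51, 0.48, 0.60, 0.72, 0.70, 0.66, 0.58, 0.55]
print(predict_next(load))
```

Higher-order models follow the same pattern with more lagged terms. The fixed sampling period assumed here is what makes this approach natural for shared systems.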
Information and scheduling (III)
• Less work on predicting the behavior of dedicated
systems
• Papers are on parallel systems, mostly based on time-series techniques, but at least one based on a genetic algorithm [Downey, Foster, Wolski]
• The traces are much more difficult to access
• No time slice; irregular time series: the records are event-driven
• Which analysis?
– Average waiting time: clear but not very useful for prediction
– Fitting a distribution: not convincing for parallel systems
– Predicting an upper bound with a confidence interval: metric of success?
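The last option can be made concrete with a deliberately simple sketch. It uses an empirical quantile of recent waiting times as the predicted upper bound, which is a simplification: it attaches no confidence level to the bound. One candidate metric of success is coverage, the fraction of jobs whose actual wait stayed below the bound predicted just before they arrived. The window is counted in jobs, not in seconds, because the records are event-driven. All names and parameters are assumptions.

```python
import math

def upper_bound(history, quantile=0.95):
    """Empirical quantile of past waiting times, used as a predicted upper bound."""
    ordered = sorted(history)
    k = math.ceil(quantile * len(ordered)) - 1
    return ordered[max(k, 0)]

def coverage(waits, window=50, quantile=0.95):
    """Fraction of jobs whose wait did not exceed the bound predicted from
    the `window` jobs that preceded them."""
    hits = total = 0
    for i in range(window, len(waits)):
        bound = upper_bound(waits[i - window:i], quantile)
        hits += waits[i] <= bound
        total += 1
    return hits / total if total else float("nan")

# Made-up waiting times in seconds, in order of job arrival.
waits = [30, 45, 20, 600, 50, 40, 35, 900, 25, 60, 55, 30]
print(upper_bound(waits, quantile=0.9))
print(coverage(waits, window=8, quantile=0.9))
```

If the predictor is well calibrated, the measured coverage should be close to the requested quantile. A coverage far above it suggests bounds too loose to be useful.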
Information and grid
• We cannot directly log the entire state of the system
– Access rights
– Size
• Currently available data
– The lifecycle of jobs going through certain brokers
– The job ranking at the same brokers
– The detailed behavior of the queues on certain sites
– Certain = LAL + possibly other mainstream
• Easy to get
– Summary data about the lifecycle of all jobs
– From which it could be possible to reconstruct the detailed state and dynamics of the CE
What should we learn?
• Learning beyond time series makes sense in a grid: massive use of community programs instead of (?) sparse runs of a very long and complex digital experiment
• Information as sketched before
– Beware: this is not a steady-state system
• New users, new machines and new software are the expected regime for some years to come
• A community-based resource will tend to display correlated activity
– Is there an invariant social graph? Is it a feature?
• System algorithms, e.g. a site scheduler or the broker
– Validation?
• Scheduling algorithms
– Validation?