Static Process Scheduling

Anjum Reyaz-Ahmed
Part I : Static Process Scheduling
Precedence process model
Communication system model
Part II: Current Literature Review
"Optimizing
Static Job Scheduling in a Network of
Heterogeneous Computers," ICPP 2000
“Design Optimization of Time- and Cost- Constrained
Fault-Tolerant Distribution Embedded Systems”, DATE 2005
“White Box Performance Analysis Considering Static NonPreemptive Software Scheduling”, DATE 2009
Part III: Future Research Initiatives
Given a set of partially ordered tasks, define a mapping of
processes to processors before the execution of the processes.
Cost model: CPU cost and communication cost, both of which must
be specified in advance.
Minimize the overall finish time (makespan) on a non-
preemptive multiprocessor system (of identical processors)
- Except for some very restricted cases, scheduling to optimize
  the makespan is NP-complete.
- Heuristic solutions are usually proposed.
[Chow and Johnson 1997]
This model describes scheduling for a “program” that consists
of several sub-tasks; the schedulable unit is the sub-task.
- The program is represented by a DAG.
- Precedence constraints among tasks in a program are
  explicitly specified.
- Critical path: the longest execution path in the DAG,
  often used as a baseline for comparing the performance of
  heuristic algorithms (a small example follows this slide).
[Chow and Johnson 1997]
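To make the critical-path idea concrete, here is a minimal sketch (not from Chow & Johnson) that computes the critical-path length of a small task DAG; the task names and execution times are made-up illustrative values.

```python
# Critical-path length of a task DAG via dynamic programming over a
# topological order. All task data below are hypothetical.
from graphlib import TopologicalSorter

exec_time = {"A": 2, "B": 3, "C": 4, "D": 2}                 # made-up costs
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}   # predecessor lists

finish = {}
for task in TopologicalSorter(preds).static_order():         # predecessors come first
    earliest_start = max((finish[p] for p in preds[task]), default=0)
    finish[task] = earliest_start + exec_time[task]

print(max(finish.values()))   # critical-path length: 8, a lower bound on the makespan
```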
Precedence Process and Communication
System Models
[Figure: precedence task graph with per-task execution times and message counts;
communication overhead = (no. of messages) × (overhead per message),
e.g. for A(P1) and E(P3): 4 × 2 = 8]
[Chow and Johnson 1997]
- Scheduling goal: minimize the makespan.
Algorithms:
- List Scheduling (LS): communication overhead is not considered.
  Uses a simple greedy heuristic: no processor remains idle if there
  is some task available that it could process (a rough sketch follows this slide).
- Extended List Scheduling (ELS): the actual schedule produced by LS,
  evaluated with communication delays taken into account.
- Earliest Task First scheduling (ETF): the earliest schedulable task
  (with communication delay considered) is scheduled first.
[Chow and Johnson 1997]
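A rough sketch of the LS idea (my illustration, not the authors' exact algorithm): greedily keep every processor busy whenever a ready task exists, ignoring communication delays. The DAG and costs are hypothetical.

```python
# Greedy list scheduling on identical processors, ignoring communication.
import heapq

exec_time = {"A": 2, "B": 3, "C": 4, "D": 2}                 # hypothetical costs
preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
NUM_PROCS = 2

finish = {}                                   # task -> finish time
free_at = [(0, p) for p in range(NUM_PROCS)]  # (time processor becomes free, id)
heapq.heapify(free_at)
remaining = set(preds)

while remaining:
    ready = [t for t in remaining if all(p in finish for p in preds[t])]
    ready.sort(key=lambda t: -exec_time[t])   # simple priority: longest task first
    for t in ready:
        proc_free, proc = heapq.heappop(free_at)
        start = max(proc_free, max((finish[p] for p in preds[t]), default=0))
        finish[t] = start + exec_time[t]
        heapq.heappush(free_at, (finish[t], proc))
        remaining.remove(t)

print("makespan:", max(finish.values()))      # 8 for this toy instance
```

ETF would additionally add each message's communication delay when computing the earliest possible start time of a ready task.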
Makespan Calculation for LS, ELS, and ETF
[Chow and Johnson 1997]
There are no precedence constraints among processes. The
system is modeled by an undirected graph G: nodes represent
processes, and the weight on an edge is the amount of
communication (messages) between the two connected
processes.
- Process execution costs may sometimes be specified to
  handle more general cases.
Scheduling goal: maximize the resource utilization.
[Chow and Johnson 1997]
The problem is to find an optimal assignment of m
processes to P processors with respect to the target
function:

Cost(G, P) = Σ_{i ∈ V(G)} e_j(p_i) + Σ_{(i,j) ∈ E(G)} c_i,j(p_i, p_j)
P: the set of processors. e_j(p_i): computation cost of
executing process p_i on processor P_j (the processor to which
p_i is assigned). c_i,j(p_i, p_j): communication overhead between
processes p_i and p_j (zero if they are assigned to the same
processor).
Assume a uniform communication speed between
processors.
[Chow and Johnson 1997]
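As a concrete illustration of the objective function, the sketch below evaluates Cost(G, P) for one candidate assignment; the processes, costs, and assignment are hypothetical values, not the book's example.

```python
# Evaluate Cost(G, P): per-process execution cost on its assigned processor
# plus communication cost for every edge whose endpoints are split.
e = {"p1": {"A": 5, "B": 10}, "p2": {"A": 2, "B": 4}, "p3": {"A": 4, "B": 3}}
c = {("p1", "p2"): 12, ("p2", "p3"): 3}          # inter-process message costs

def cost(assign):
    comp = sum(e[p][assign[p]] for p in assign)
    comm = sum(w for (pi, pj), w in c.items() if assign[pi] != assign[pj])
    return comp + comm

print(cost({"p1": "A", "p2": "A", "p3": "B"}))   # 5 + 2 + 3 + 3 = 13
```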
This is referred to as the Module Allocation problem. It is NP-
complete except for a few cases:
- For P = 2, Stone suggested a polynomial-time solution using
  Ford-Fulkerson’s maximum-flow algorithm.
- For some special graph topologies, such as trees, Bokhari’s
  algorithm can be used.
Known results: The mapping problem for an arbitrary number of
processors is NP-complete.
Problem                          Optimal polynomial-time algorithm
2 processors                     Yes
2 processors with varying load   Yes
Tree-structured graph            Yes
Series-parallel graph            Yes
3 and more processor systems     Suboptimal only
[Chow and Johnson 1997]
Stone’s two-processor model to achieve minimum total
execution and communication cost
- Example:
  - Partition the graph by drawing a line cutting through some edges
  - The result is two disjoint subgraphs, one for each processor
  - The set of removed edges forms the cut set
  - Cost of the cut set = sum of the weights of its edges
    = total inter-processor communication cost between the two processors
  - Of course, the cost of the cut set is 0 if all processes are assigned to
    the same node
  - Computation constraints (no more than k, distribute evenly…)
- Example:
  - Maximum flow and minimum cut in a commodity-flow network
  - Find the maximum flow from source to destination
[Chow and Johnson 1997]
Maximum Flow Algorithm in Solving the
Scheduling Problem
[Chow and Johnson 1997]
Minimum-Cost Cut
Only the cuts that separate A and B
are feasible
[Chow and Johnson 1997]
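A rough sketch of the max-flow/min-cut formulation, assuming the third-party networkx library is available; the processes and costs are hypothetical, not the slide's example. Each process is connected to each processor terminal with a capacity equal to its execution cost on the other processor, so a minimum cut corresponds to a minimum-cost assignment.

```python
import networkx as nx   # assumed available; any max-flow routine would do

exec_cost = {"p1": (5, 10), "p2": (2, 4), "p3": (4, 3)}   # (cost on A, cost on B)
comm = {("p1", "p2"): 12, ("p2", "p3"): 3}                # communication weights

g = nx.DiGraph()
for p, (cost_a, cost_b) in exec_cost.items():
    g.add_edge("A", p, capacity=cost_b)   # cut when p is assigned to B
    g.add_edge(p, "B", capacity=cost_a)   # cut when p is assigned to A
for (pi, pj), w in comm.items():
    g.add_edge(pi, pj, capacity=w)        # charged only if pi and pj are split
    g.add_edge(pj, pi, capacity=w)

cut_value, (side_a, _) = nx.minimum_cut(g, "A", "B")
assignment = {p: ("A" if p in side_a else "B") for p in exec_cost}
print(cut_value, assignment)              # minimum total cost and the assignment
```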
1. Stone uses a repetitive approach based on the two-
   processor algorithm to solve n-processor problems.
2. Treat (n-1) processors as one super-processor.
3. The processors in the super-processor are further
   broken down based on the results from the previous
   step.
[Chow and Johnson 1997]
Other heuristic: separate the optimization of computation
and communication.
- Assume communication delay is the more significant cost
- Merge processes with high interprocess interaction into clusters of
  processes (a sketch follows this slide)
- Each cluster of processes is then assigned to the processor that
  minimizes the computation cost
- With the reduced problem size, the optimum is relatively easier to find
  (exhaustive search)
- A simple heuristic: merge processes if their communication cost is
  higher than a threshold C
- Constraints can also be placed on the total computation per cluster, to
  prevent over-clustering.
[Chow and Johnson 1997]
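A small sketch of the threshold-merging heuristic (my illustration, with made-up edge weights): any pair of processes whose communication cost exceeds C is merged into the same cluster using union-find.

```python
C = 9                                                                # merge threshold
comm = {(1, 6): 12, (2, 4): 10, (3, 5): 11, (1, 3): 4, (4, 5): 6}    # made-up weights

parent = {p: p for edge in comm for p in edge}

def find(p):
    while parent[p] != p:
        parent[p] = parent[parent[p]]                  # path compression
        p = parent[p]
    return p

for (a, b), weight in comm.items():
    if weight > C:
        parent[find(a)] = find(b)                      # merge the two clusters

clusters = {}
for p in parent:
    clusters.setdefault(find(p), []).append(p)
print(sorted(clusters.values()))                       # [[1, 6], [2, 4], [3, 5]]
```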
Cluster of Processes
- For C = 9, we get three clusters: (2,4), (1,6), and (3,5)
- Clusters (2,4) and (1,6) must be mapped to processors A and B
- Cluster (3,5) can be assigned to A or B, but is assigned to A due to the lower
  communication cost
- Total cost = 41 (computation cost = 17 on A and 14 on B; communication
  cost = 10)
[Figure: clustered process graph showing clusters (2,4), (1,6), and (3,5)]
[Chow and Johnson 1997]
Summary:
Static job scheduling schemes in a network of computers
with different speeds.
Optimization techniques are proposed for workload
allocation and job dispatching.
The proposed job dispatching algorithm is an extension
of the traditional round-robin scheme
[Tang & Chanson 2000]
Optimization for Workload Allocation
- A fraction α_i of all the jobs is sent to computer c_i,
  where Σ_{i=1}^{n} α_i = 1 and 0 ≤ α_i ≤ 1
[Tang & Chanson 2000]
Amount of workload for each computer proportional to
its processing speed:

α_i = s_i / Σ_{j=1}^{n} s_j
All computers are equally utilized, but this does not provide the best performance.
[Tang & Chanson 2000]
It is beneficial to allocate a disproportionately higher fraction of the
workload to the more powerful computers.
Assign each new job to the machine with the least normalized load (sketched below):

normalized load = (run queue length + 1) / processing speed

It is known that when jobs are moved from a slow machine to a fast machine,
the slow machine's utilization decreases considerably, whereas the fast
machine's utilization does not increase by nearly as much.
[Tang & Chanson 2000]
Random-Based Job Dispatching
A newly arrived job is scheduled to run on a "randomly" selected
computer.
Round-Robin-Based Job Dispatching
The objective here is to smooth the inter-arrival intervals of
consecutive jobs.
For example, suppose there are 4 computers c1, c2, c3, and c4 with
workload fractions 1/8, 1/8, 1/4, and 1/2 respectively.
Dispatching sequence: c4, c3, c4, c2, c4, c3, c4, c1, c4, c3, c4, c2, c4,
c3, c4, c1, ……
[Tang & Chanson 2000]
The key idea of the optimized workload allocation scheme is
to send a disproportionately high fraction of the workload to the
most powerful computers.
An analytical model is developed to derive the optimized
allocation strategy mathematically.
For job dispatching, an algorithm that extends round-robin to the
general case is presented.
[Tang & Chanson 2000]
Synopsis
Re-execution and replication are used for tolerating
transient faults.
Processes are statically scheduled, and communications
are performed using the time-triggered protocol.
[Izosimov et al. 2005]
System Architecture
- Each node has a CPU and communication controller
  running independently
- Time-Triggered Communication Protocol
[Izosimov et al. 2005]
Fault-Tolerance Mechanisms
- Re-execution
- Active Replication
[Izosimov et al. 2005]
Addresses optimization of distributed embedded
systems for fault tolerance.
Two fault-tolerance mechanisms:
Re-execution – time redundancy
Active replication – space redundancy
[Izosimov et al. 2005]
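As a back-of-the-envelope illustration of time redundancy (my sketch, not the paper's exact schedulability model): tolerating k transient faults by re-execution adds k extra executions plus k recovery overheads to a process's worst-case delay.

```python
def worst_case_delay(wcet, k_faults, recovery_overhead):
    """Worst-case completion time of one process under re-execution."""
    return wcet + k_faults * (wcet + recovery_overhead)

print(worst_case_delay(wcet=30, k_faults=2, recovery_overhead=5))   # 100
```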
Synopsis
A novel approach for the integration of cooperative
and static non-preemptive scheduling into formal
white-box analysis is presented.
[Viehl et al. 2009]
Use AI techniques for Static Scheduling
- Genetic Algorithm
- Simulated Annealing
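A bare-bones simulated-annealing sketch for mapping tasks to processors; the cost function (load-balance makespan, ignoring precedence and communication), cooling schedule, and task data are all made-up illustrative choices.

```python
import math
import random

tasks = {"A": 2, "B": 3, "C": 4, "D": 2, "E": 5}   # task -> execution time
NUM_PROCS = 2

def makespan(assign):
    loads = [0.0] * NUM_PROCS
    for t, p in assign.items():
        loads[p] += tasks[t]
    return max(loads)

assign = {t: random.randrange(NUM_PROCS) for t in tasks}
best = dict(assign)
temp = 10.0
while temp > 0.01:
    t = random.choice(list(tasks))
    candidate = dict(assign)
    candidate[t] = random.randrange(NUM_PROCS)                # random move
    delta = makespan(candidate) - makespan(assign)
    if delta <= 0 or random.random() < math.exp(-delta / temp):
        assign = candidate                                    # accept (possibly uphill)
        if makespan(assign) < makespan(best):
            best = dict(assign)
    temp *= 0.95                                              # geometric cooling

print(makespan(best), best)
```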
References:
1. Randy Chow & Theodore Johnson . “Distributed Operating Systems &
Algorithms”. pp 156-163 Addison-Wesley 1997
2. Xueyan Tang & Samuel T. Chanson. “ Optimizing Static Job Scheduling in a
Network of Heterogeneous Computers”. pp 373- 382, icpp, IEEE 2000
3. Viacheslav Izosimov, Paul Pop, Petru Else & Zebo Peng. “ Design Optimization of
Time- and C0st-Constrained Fault Tolerant Distribution Embedded Systems”.
Design Automation and Test in Europe (DATE), IEEE, 2005
4. Alxander Viehl, Michael Pressler and Oliver Bringmann. “ White Box Performance
Analysis Considering Static Non-Preemptive Software Scheduling”. Design
Automation and Test in Europe (DATE), IEEE, 2009
Thank you!!