
GHS: A Performance Prediction and
Task Scheduling System for Grid
Computing
Xian-He Sun
Department of Computer Science
Illinois Institute of Technology
[email protected]
SC/APART Nov. 22, 2002
Outline
• Introduction
  – Concept and challenge
• The Grid Harvest Service (GHS) System
  – Design methodology
  – Measurement system
  – Scheduling algorithms
  – Experimental testing
• Conclusion
Introduction
• Parallel Processing
  – Two or more working entities work together toward a common goal for better performance
• Grid Computing
  – Use distributed resources as a unified computing platform for better performance
• New Challenges of Grid Computing
  – Heterogeneous systems, non-dedicated environments, relatively large data-access delay
Degradations of Parallel Processing
• Unbalanced workload
• Communication delay
• Overhead increases with the ensemble size
Degradations of Grid Computing
• Unbalanced computing power and workload
• Shared computing and communication resources
• Uncertainty, heterogeneity, and overhead increase with the ensemble size
Performance Evaluation
(Improving performance is the goal)
• Performance Measurement
– Metric, Parameter
• Performance Prediction
– Model, Application-Resource, Scheduling
• Performance Diagnosis/Optimization
– Post-execution, Algorithm improvement,
Architecture improvement, State-of-the-art
Parallel Performance Metrics
(Run-time is the dominant metric)
• Run-Time (Execution Time)
• Speed: mflops, mips, cpi
• Efficiency: throughput
• Speedup

  $S_p = \dfrac{\text{Uniprocessor Execution Time}}{\text{Parallel Execution Time}}$

• Parallel Efficiency
• Scalability: the ability to maintain performance gain when system and problem size increase
• Others: portability, programmability, etc.
Parallel Performance Models
(Predicting Run-time is the dominant goal)
• PRAM (parallel random-access model)
– EREW, CREW, CRCW
• BSP (bulk synchronous parallel) Model
– Supersteps, phase parallel model
• Alpha and Beta Model
  – $\alpha$: communication startup time; $\beta$: data transfer time per byte
• Scalable Computing Model
– Scalable speedup, scalability
• LogP Model
  – L: latency, o: overhead, g: gap, P: the number of processors
• Others
Research Projects and Tools
• Parallel Processing
– Paradyn, W3 (why, when, and where)
– TAU (Tuning and Analysis Utilities)
– Pablo, Prophesy, SCALEA, SCALA, etc.
– For dedicated systems
– Instrumentation, post-execution analysis, visualization, prediction, application performance, I/O performance
Research Projects and Tools
• Grid Computing
– NWS (Network Weather Service)
• monitors and forecasts resource performance
– RPS (Resource Prediction System)
• predicts CPU availability of a Unix system
– AppLeS (Application-Level Scheduler)
• An application-level scheduler extended to non-dedicated environments based on NWS
– Short-term, system-level prediction
Do We Need
• New Metrics for Computational Grids?
  – ????
• New Models for Computational Grids?
  – Yes
  – Application-level performance prediction
• New Models for Other Technical Advances?
  – Yes
  – Data access in hierarchical memory systems
The Grid Harvest Service (GHS) System
(Sun & Wu, 2002)
• A long-term application-level performance
prediction and scheduling system for non-dedicated
(Grid) environments
• A new prediction model derived from probability analysis and simulation
• Non-intrusive measurement and scheduling
algorithms
• Implementation and testing
Performance Model (Gong, Sun, & Watson, 2002)
• Remote jobs have low priority
• Local job arrival and service times are modeled based on extensive monitoring and observation

[Figure: timeline of a remote task with work demand $w_k$ on machine $k$; computation intervals $X_1, \ldots, X_S$ alternate with local-job service intervals $Y_1, \ldots, Y_S$, ending with a final computation segment $Z$; $w_{s(k)}$ is the local service time]

$T_k = X_1 + Y_1 + X_2 + Y_2 + \cdots + X_S + Y_S + Z$

Since $X_1 + X_2 + \cdots + X_S + Z = w_k$, this gives

$T_k = w_k + Y_1 + Y_2 + \cdots + Y_S$
Prediction Formula

$\Pr(T_k \le t) = \Pr(T_k \le t \mid S_k = 0)\Pr(S_k = 0) + \Pr(T_k \le t \mid S_k > 0)\Pr(S_k > 0)$

$= \begin{cases} e^{-\lambda_k w_k} + (1 - e^{-\lambda_k w_k})\Pr(U(S_k) \le t - w_k \mid S_k > 0), & \text{if } t \ge w_k \\ 0, & \text{if } t < w_k \end{cases}$

• Arrivals of local jobs follow a Poisson distribution with rate $\lambda_k$
• Execution time of the owner's jobs follows a general distribution with mean $1/\mu_k$ and standard deviation $\sigma_k$
• Simulation shows the distribution of the local service time, $U_k(S) \mid S_k > 0$, can be approximated by a known distribution: the Gamma distribution
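To make the formula concrete, here is a minimal Python sketch of the single-machine prediction, assuming the Gamma parameters for $U(S_k) \mid S_k > 0$ have already been fitted from measurement; the function and parameter names are illustrative, not from the GHS implementation.

import math
from scipy.stats import gamma

def completion_cdf(t, w_k, lam_k, shape, scale):
    # Pr(T_k <= t) for a remote task with work demand w_k on machine k.
    # lam_k: Poisson arrival rate of local jobs.
    # shape, scale: fitted Gamma parameters approximating the
    # conditional extra delay U(S_k) | S_k > 0 (an assumption).
    if t < w_k:
        return 0.0                              # needs at least w_k of CPU time
    p_idle = math.exp(-lam_k * w_k)             # Pr(S_k = 0): no local arrival
    p_delay = gamma.cdf(t - w_k, a=shape, scale=scale)
    return p_idle + (1.0 - p_idle) * p_delay

# Example: a 2-hour task; local jobs arrive at 0.5 per hour
print(completion_cdf(t=3.0, w_k=2.0, lam_k=0.5, shape=1.2, scale=0.8))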
Prediction Formula
• Parallel task completion time

$\Pr(T \le t) = \begin{cases} \prod_{k=1}^{m}\left[e^{-\lambda_k w_k} + (1 - e^{-\lambda_k w_k})\Pr(U(S_k) \le t - w_k \mid S_k > 0)\right], & \text{if } t \ge w_{\max} \\ 0, & \text{otherwise} \end{cases}$

• Homogeneous parallel task completion time

$\Pr(T \le t) = \begin{cases} \left[e^{-\lambda w} + (1 - e^{-\lambda w})\Pr(U(S) \le \tau \mid S > 0)\right]^m, & \text{if } \tau \ge 0 \\ 0, & \text{otherwise} \end{cases}$ where $\tau = t - w$

• Mean-time balancing partition

$w_k = W \cdot \dfrac{(1 - \rho_k)\mu_k}{\sum_{j=1}^{m}(1 - \rho_j)\mu_j}$
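A minimal Python sketch of the mean-time balancing rule above, assuming $\rho_k$ is machine $k$'s utilization by local jobs and $\mu_k$ its computing rate; variable names are ours, not GHS's.

def mean_time_partition(W, rho, mu):
    # Split total work W so each machine's expected completion time
    # is balanced: each share is proportional to the machine's
    # effective power (1 - rho_k) * mu_k.
    effective = [(1.0 - r) * m for r, m in zip(rho, mu)]
    total = sum(effective)
    return [W * e / total for e in effective]

# Example: machine B is twice as fast but 50% busy
print(mean_time_partition(100.0, rho=[0.1, 0.5], mu=[1.0, 2.0]))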
Measurement Methodology
• For a parameter $x$ with sample mean $\bar{x}$ and standard deviation $d$, a confidence interval for the population mean is given by

$\left(\bar{x} - z_{1-\alpha/2}\,\dfrac{d}{\sqrt{n}},\; \bar{x} + z_{1-\alpha/2}\,\dfrac{d}{\sqrt{n}}\right)$

• The smallest sample size $n$ achieving a desired confidence level and a required accuracy of $r$ percent is given by

$n = \left(\dfrac{100\, z_{1-\alpha/2}\, d}{r\, \bar{x}}\right)^2$
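The sample-size formula can be computed directly; a small sketch assuming a normal approximation, where scipy's norm.ppf supplies $z_{1-\alpha/2}$.

import math
from scipy.stats import norm

def min_sample_size(mean, std, r_percent, confidence=0.95):
    # Smallest n such that the confidence-interval half-width is
    # within r_percent of the mean (the slide's formula).
    z = norm.ppf(1.0 - (1.0 - confidence) / 2.0)   # z_{1-alpha/2}
    return math.ceil((100.0 * z * std / (r_percent * mean)) ** 2)

# Example: mean 10, std 2, 5% accuracy at 95% confidence -> n = 62
print(min_sample_size(10.0, 2.0, 5.0))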
Measurement and Prediction of Parameters
• Utilization
• Job Arrival

$\lambda_i = \dfrac{J^i_{arrival}}{T_{interval}} = \dfrac{J^i_{between} - J^i_{start}}{T_{interval}}$

• Standard Deviation of Service Rate
• Least-Intrusive Measurement

$Adapt\_avg(x, t) = \dfrac{1}{|\chi_t|}\sum_{i=t-23}^{t}\sum_{j=1}^{|\chi_i|} x_{ij}$, where $|\chi_t| = \sum_{i=t-23}^{t}|\chi_i|$ and $\chi_i$ is the set of measurements taken during hour $i$
Select the previous $N_a$ days, $\{d_1, d_2, \ldots, d_{N_a}\}$, in the system measurement history;
For each day $d_k$ $(1 \le k \le N_a)$, compute $p(d_k) = \frac{1}{|\chi|}\sum_{i=1}^{|\chi|} p_i$,
    where $\chi$ is the set of $p_i$ measured during the time interval $(t_1, t_2)$ beginning on day $d_k$;
End For
$p_s = \frac{1}{N_a}\sum_{i=1}^{N_a} p(d_i)$
Select the previous $N_b$ continuous time intervals $(t_m, t_{m+1})$ before $(t_1, t_2)$;
    calculate $p(d_m) = \frac{1}{|\chi|}\sum_{i=1}^{|\chi|} p_i$, where $\chi$ is the set of $p_i$ measured during $(t_m, t_{m+1})$;
$p_r = \frac{1}{N_b}\sum_{i=1}^{N_b} p(d_i)$
Output $\alpha \cdot p_s + \beta \cdot p_r$, where $\alpha + \beta = 1$ and $\alpha, \beta \ge 0$
Scheduling Algorithm
Scheduling with a Given Number of Sub-tasks
List a set of lightly loaded machines $M = \{m_1, m_2, \ldots, m_q\}$;
List all possible machine sets $S_k$ with $|S_k| = p$;
For each machine set $S_k$ $(1 \le k \le z)$,
    Use the mean-time balancing partition to partition the task;
    Use the prediction formula to calculate the mean and coefficient of variation;
    If $E(T_{S_p})(1 + Coe(T_{S_p})) > E(T_{S_k})(1 + Coe(T_{S_k}))$, then $p \leftarrow k$;
End For
Assign the parallel task to the machine set $S_p$;
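In Python, the fixed-$p$ step could look like the following sketch; predict_mean_coe and partition stand in for the GHS prediction model and mean-time partition (assumed callables, not an actual GHS API).

from itertools import combinations

def schedule_fixed_p(machines, p, predict_mean_coe, partition):
    # Try every p-machine subset and keep the one minimizing
    # E(T) * (1 + Coe(T)); the coefficient of variation term
    # penalizes high-variance machine sets.
    best_set, best_score = None, float("inf")
    for subset in combinations(machines, p):
        workloads = partition(subset)            # mean-time balancing
        mean, coe = predict_mean_coe(subset, workloads)
        score = mean * (1.0 + coe)
        if score < best_score:
            best_set, best_score = subset, score
    return best_set, best_score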
Optimal Scheduling Algorithm
List a set of lightly loaded machines $M = \{m_1, m_2, \ldots, m_q\}$;
While $p \le q$ do
    Run Scheduling with $p$ Sub-tasks;
    If $E(T_{S_{k_{p'}}})(1 + Coe(T_{S_{k_{p'}}})) > E(T_{S_{k_p}})(1 + Coe(T_{S_{k_p}}))$, then $p' \leftarrow p$;
End While
Assign the parallel task to the machine set $S_{k_{p'}}$.
Heuristic Scheduling Algorithm
• List a set of lightly loaded machines $M = \{m_1, m_2, \ldots, m_q\}$;
• Sort the machines in decreasing order of $(1 - \rho_k)\mu_k$;
• Use the task ratio to find the upper limit $q$;
• Use bi-section search to find the $p$ such that $E(T_{S_p})(1 + Coe(T_{S_p}))$ is minimal.
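A sketch of the bi-section step, assuming the score $E(T)(1 + Coe(T))$ is unimodal in the number of sub-tasks $p$ (the heuristic's premise); score_for_p is an assumed helper that runs fixed-p scheduling and returns its best score.

def heuristic_best_p(q_limit, score_for_p):
    # Bisection search for the p in [1, q_limit] minimizing
    # E(T) * (1 + Coe(T)), assuming the score is unimodal in p.
    lo, hi = 1, q_limit
    while lo < hi:
        mid = (lo + hi) // 2
        if score_for_p(mid) < score_for_p(mid + 1):
            hi = mid              # minimum is at or left of mid
        else:
            lo = mid + 1          # minimum is right of mid
    return lo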
Embedded in Grid Run-time System
Experimental Testing
Application-level Prediction

$\text{prediction error (\%)} = \dfrac{|\text{Prediction} - \text{Measurement}|}{\text{Measurement}} \times 100$

[Figure: prediction error (%) of expectation, expectation+variation, and expectation-variation vs. remote task execution time (0.5 to 8 hours). Caption: Remote task completion time on a single machine.]

[Figure: prediction error (%) of expectation, expectation+variation, and expectation-variation vs. parallel task execution time (0.5 to 512 hours). Caption: Prediction of parallel task completion time.]

[Figure: prediction error (%) of expectation, expectation+variation, and expectation-variation vs. parallel task execution time (4 to 16 hours). Caption: Prediction on a multi-processor with a local scheduler.]
Partition and Scheduling

[Figure: execution time (minutes) of three partition approaches (equal-load, equal-load heterogeneous, and mean-time) vs. task demand (1 to 8 hours). Caption: Comparison of three partition approaches.]

[Figure: execution time (minutes) of the same three partition approaches vs. task demand (1 to 8 hours) on machines A and B respectively.]
Performance Gain with Scheduling

[Figure: execution time (seconds) vs. machine number (10, 15, 20) for optimal, heuristic, and random scheduling over 5, 10, 15, and 20 machines. Caption: Execution time with different scheduling strategies.]
Cost and Gain

[Figure: number of measurements per hour over time; measurement frequency decreases as the system becomes steady.]
Node Number    8     16    32    64    128   256   512   1024
Time (s)       0.00  0.01  0.02  0.04  0.08  0.16  0.31  0.66

The calculation time of the prediction component
The GHS System
• A good example and success story
  – Performance modeling
  – Parameter measurement and prediction schemes
  – Application-level performance prediction
  – Partition and scheduling
• It has its limitations too
  – Communication and data access delay
What We Know, What We Do Not
• We know there is no deterministic prediction in a non-deterministic shared environment. We do not know how to reach a fuzzy engineering solution.

[Diagram: candidate approaches include rule of thumb, stochastic methods, heuristic algorithms, AI, statistics, data mining, and other innovative methods]
Conclusion
• Application-level Performance Evaluation
  – Code-machine versus machine, algorithm, and algorithm-machine
• New Requirements under New Environments
• We know we are making progress. We do not know if we can keep up with the pace of technology improvement.