Quantifying the Unfairness Properties of SRPT Scheduling

Quantifying the Properties of SRPT Scheduling
Mingwei Gong and Carey Williamson
Department of Computer Science
University of Calgary
Outline
Introduction
Background
Web Server Scheduling Policies
Related Work
Research Methodology
Simulation Results
Defining/Refining Unfairness
Quantifying Unfairness
Summary, Conclusions, and Future Work
July 22, 2003
Introduction
Web: large-scale, client-server system
WWW: World Wide Wait!
User-perceived Web response time is composed
of several components:
Transmission delay, propagation delay in network
Queueing delays at busy routers
Delays caused by TCP protocol effects
(e.g., handshaking, slow start, packet loss, retransmissions)
Queueing delays at the Web server itself, which may
be servicing 100’s or 1000’s of concurrent requests
Our focus in this work: Web request scheduling
Example Scheduling Policies
FCFS: First Come First Serve
typical policy for single shared resource (“unfair”)
e.g., drive-thru restaurant; Sens playoff tickets
PS: Processor Sharing
time-sharing a resource amongst M jobs
each job gets 1/M of the resources (equal, “fair”)
e.g., CPU; VM; multi-tasking; Apache Web server
SRPT: Shortest Remaining Processing Time
pre-emptive version of Shortest Job First (SJF)
give resources to job that will complete quickest
e.g., express checkout lanes in a grocery store (almost!)
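To make the three policies concrete, here is a minimal Python sketch (not the simulator used in this work) that computes completion times for a hypothetical batch of jobs all arriving at t = 0; note that with simultaneous arrivals, SRPT reduces to SJF:

```python
def fcfs(sizes):
    """Completion times when jobs (all arriving at t=0) run to completion in order."""
    t, done = 0.0, []
    for s in sizes:
        t += s
        done.append(t)
    return done

def srpt(sizes):
    """With simultaneous arrivals, SRPT reduces to SJF: run the shortest job first."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    done, t = [0.0] * len(sizes), 0.0
    for i in order:
        t += sizes[i]
        done[i] = t
    return done

def ps(sizes):
    """Processor sharing: each of the M active jobs gets rate 1/M."""
    remaining = list(sizes)
    done = [0.0] * len(sizes)
    # all jobs shrink at the same rate, so the shortest always finishes next
    active = sorted(range(len(sizes)), key=lambda i: sizes[i])
    t = 0.0
    while active:
        m = len(active)
        i = active[0]
        dt = remaining[i] * m          # wall-clock time to drain it at rate 1/m
        t += dt
        for j in active:
            remaining[j] -= dt / m
        done[i] = t
        active = active[1:]
    return done

sizes = [5.0, 1.0, 3.0]                # hypothetical job sizes (fluid model)
for name, policy in [("FCFS", fcfs), ("PS", ps), ("SRPT", srpt)]:
    resp = policy(sizes)
    print(name, "mean response:", sum(resp) / len(resp))
```

For this batch, SRPT gives the lowest mean response time, illustrating the classical optimality result.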
Related Work
Theoretical work:
SRPT is provably optimal in terms of mean response
time and mean slowdown (“classical” results)
Practical work:
CMU: prototype implementation in Apache Web
server. The results are consistent with theoretical
work.
Concern: unfairness problem (“starvation”)
large jobs may be penalized (but not always true!)
Related Work (Cont’d)
Harchol-Balter et al. show theoretical results:
For the largest jobs, the slowdown asymptotically converges to the same value under any preemptive work-conserving scheduling policy (i.e., for these jobs, SRPT, or even LRPT, is no worse than PS)
For sufficiently large jobs, the slowdown under SRPT is only marginally worse than under PS, by at most a factor of 1 + ε, for small ε > 0
[M. Harchol-Balter, K. Sigman, and A. Wierman, “Asymptotic Convergence of Scheduling Policies with respect to Slowdown”, Proceedings of IFIP Performance 2002, Rome, Italy, September 2002]
Related Work (Cont’d)
[Wierman and Harchol-Balter 2003] classify scheduling policies by unfairness in an M/GI/1 queue:
Always Fair: FSP, PS, PLCFS
Sometimes Unfair: SJF, SRPT, LAS
Always Unfair: FCFS, LRPT
[A. Wierman and M. Harchol-Balter, “Classifying Scheduling Policies with respect to Unfairness in an M/GI/1”, Proceedings of ACM SIGMETRICS, San Diego, CA, June 2003 (Best Paper)]
A Pictorial View
[Figure: slowdown vs. job size. PS is a horizontal line at 1/(1−ρ); the SRPT curve lies below it for small jobs, rises above it in a “crossover region” (the mystery hump), and asymptotically converges to the PS value for the largest jobs.]
Research Questions
Do these properties hold in practice for
empirical Web server workloads? (e.g., general
arrival processes, service time distributions)
What does “sufficiently large” mean?
Is the crossover effect observable?
If so, for what range of job sizes?
Does it depend on the arrival process and the
service time distribution? If so, how?
Is PS (the “gold standard”) really “fair”?
Can we do better? If so, how?
Overview of Research Methodology
Trace-driven simulation of a simple Web server
Empirical Web server workload trace (1M requests from WorldCup’98) for the main experiments
Synthetic Web server workloads for the sensitivity study experiments
Probe-based sampling methodology
Estimate job response time distributions for different job sizes, load levels, and scheduling policies
Graphical comparisons of results
Statistical tests of results (t-test, F-test)
Simulation Assumptions
User requests are for static Web content
Server knows response size in advance
Network bandwidth is the bottleneck
All clients are in the same LAN environment
Ignores variations in network bandwidth and
propagation delay
Fluid flow approximation: service time = response size
Ignores packetization issues
Ignores TCP protocol effects
Ignores network effects
(These are consistent with SRPT literature)
Performance Metrics
Number of jobs in the system
Number of bytes in the system
Normalized slowdown:
The slowdown of a job is its observed response time divided by the ideal response time if it were the only job in the system
Ranges between 1 and ∞
Lower is better
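As a minimal illustration of the metric (the 1 MB/s bandwidth figure here is an assumed example, not from the trace):

```python
def slowdown(response_time, size, bandwidth=1.0):
    """Observed response time divided by the job's ideal (alone-in-the-system)
    service time under the fluid model: ideal = size / bandwidth."""
    ideal = size / bandwidth
    return response_time / ideal

# A 3 KB response that took 0.012 s on an assumed 1 MB/s link:
# ideal time alone = 3000 / 1e6 = 0.003 s, so slowdown = 4
print(slowdown(0.012, 3000, bandwidth=1e6))
```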
Empirical Web Server Workload
1998 WorldCup trace, from the Internet Traffic Archive: http://ita.ee.lbl.gov/

Item                              Value
Trace Duration                    861 sec
Total Requests                    1,000,000
Unique Documents                  5,549
Total Transferred Bytes           3.3 GB
Smallest Transfer Size (bytes)    4
Largest Transfer Size (bytes)     2,891,887
Median Transfer Size (bytes)      889
Mean Transfer Size (bytes)        3,498
Standard Deviation (bytes)        18,815
Preliminaries: An Example
Trace excerpt (first ten requests):

TIMESTAMP    SIZE (bytes)
0.000000     3038
0.000315      949
0.001048     2240
0.004766     2051
0.005642      366
0.005872      201
0.006380      298
0.006742     1272
0.007271      597
0.008008      283

[Figures: number of jobs in the system and number of bytes in the system over time, with the arrivals at t = 0.000315 and t = 0.001048 marked.]
Observations:
The “byte backlog” is the same for each scheduling policy
The busy periods are the same for each policy
The distribution of the number of jobs in the system is different
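The first two observations follow from work conservation: every non-idling policy drains bytes at the full service rate, so the byte backlog and busy periods coincide, while which jobs remain (and hence the job count) differs. A small sketch with hypothetical sizes, comparing two run-to-completion orders (FCFS vs. shortest-first) for jobs all arriving at t = 0:

```python
def backlog(sizes, order, t):
    """Remaining bytes and remaining job count at time t, when jobs are run
    to completion one at a time in the given order (unit service rate)."""
    left = list(sizes)
    for i in order:
        work = min(left[i], t)
        left[i] -= work
        t -= work
        if t <= 0:
            break
    return sum(left), sum(1 for x in left if x > 0)

sizes = [5.0, 1.0, 3.0]         # hypothetical jobs, all arriving at t = 0
fcfs_order = [0, 1, 2]          # arrival order
sjf_order  = [1, 2, 0]          # shortest first (SRPT with simultaneous arrivals)
for t in (2.0, 4.0, 6.0):
    b1, n1 = backlog(sizes, fcfs_order, t)
    b2, n2 = backlog(sizes, sjf_order, t)
    # byte backlogs b1 and b2 match; job counts n1 and n2 need not
    print(t, b1, n1, "vs", b2, n2)
```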
General Observations (Empirical trace)
[Figures: marginal distribution of the number of jobs in the system for PS and SRPT at loads 50%, 80%, and 95%.]
The differences between PS and SRPT are more pronounced at higher loads
Objectives (Restated)
Compare PS policy with SRPT policy
Confirm theoretical results in previous work
(Harchol-Balter et al.)
For the largest jobs
For sufficiently large jobs
Quantify unfairness properties
Probe-Based Sampling Algorithm
The algorithm is based on the PASTA (Poisson Arrivals See Time Averages) principle.
[Figure: a probe job is inserted into the PS server at a randomly chosen point, yielding one slowdown sample; this is repeated N times.]
Probe-Based Sampling Algorithm
For scheduling policy S = (PS, SRPT, FCFS, LRPT, …) do
  For load level U = (0.50, 0.80, 0.95) do
    For probe job size J = (1 B, 1 KB, 10 KB, 1 MB, …) do
      For trial I = (1, 2, 3, …, N) do
        Insert probe job at randomly chosen point;
        Simulate Web server scheduling policy;
        Compute and record slowdown value observed;
      end of I;
      Plot marginal distribution of slowdown results;
    end of J;
  end of U;
end of S;
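A runnable sketch of this loop for the PS policy only, using a synthetic Poisson/exponential trace rather than the WorldCup trace (the trace parameters, probe sizes, and trial count below are illustrative assumptions, not the paper's settings):

```python
import random

def simulate_ps(jobs):
    """Fluid processor-sharing server at unit rate; jobs = [(arrival, size)].
    Returns {job index: response time (completion - arrival)}."""
    jobs = sorted(enumerate(jobs), key=lambda e: e[1][0])
    remaining, start, resp = {}, {}, {}
    t, k = 0.0, 0
    while k < len(jobs) or remaining:
        next_arr = jobs[k][1][0] if k < len(jobs) else float("inf")
        if remaining:
            m = len(remaining)
            i_min = min(remaining, key=remaining.get)
            t_fin = t + remaining[i_min] * m   # next completion under sharing
        else:
            t_fin = float("inf")
        if next_arr <= t_fin:
            # drain work until the arrival, then admit the new job
            if remaining:
                for i in remaining:
                    remaining[i] -= (next_arr - t) / m
            idx, (arr, size) = jobs[k]
            remaining[idx], start[idx] = size, arr
            t, k = next_arr, k + 1
        else:
            # drain work until the next completion, then retire finished jobs
            for i in remaining:
                remaining[i] -= (t_fin - t) / m
            t = t_fin
            for i in [j for j in remaining if remaining[j] <= 1e-12]:
                resp[i] = t - start[i]
                del remaining[i]
    return resp

def probe_slowdown(trace, probe_size, rng):
    """One trial: insert a probe at a random instant, return its slowdown."""
    t0 = rng.uniform(0.0, trace[-1][0])
    resp = simulate_ps(trace + [(t0, probe_size)])
    return resp[len(trace)] / probe_size   # ideal time = size at unit rate

rng = random.Random(42)
trace, t = [], 0.0
for _ in range(2000):
    t += rng.expovariate(1.0)                      # Poisson arrivals, rate 1
    trace.append((t, rng.expovariate(1.0 / 0.5)))  # mean size 0.5 -> load 0.5

for probe in (0.1, 1.0, 10.0):
    samples = [probe_slowdown(trace, probe, rng) for _ in range(50)]
    print(f"probe {probe}: mean slowdown {sum(samples) / len(samples):.2f}")
```

Extending the sketch to SRPT and the other policies only changes the service-order rule inside the simulator; the sampling loop is unchanged.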
Example Results for 3 KB Probe Job
[Figures: slowdown distributions at loads 50%, 80%, and 95%.]
Example Results for 100 KB Probe Job
[Figures: slowdown distributions at loads 50%, 80%, and 95%.]
Example Results for 10 MB Probe Job
[Figures: slowdown distributions at loads 50%, 80%, and 95%.]
Statistical Summary of Results
[Table: slowdown statistics for each scheduling policy and load level.]
Two Aspects of Unfairness
Endogenous unfairness (dominant under SRPT):
Caused by an intrinsic property of a job, such as its size; this aspect of unfairness is invariant
Exogenous unfairness (dominant under PS):
Caused by external conditions, such as the number of other jobs in the system, their sizes, and their arrival times
Analogy: showing up at a restaurant without a reservation, wanting a table for k people
Observations for PS
PS is “fair” … sort of!
Exogenous unfairness is dominant
Observations for SRPT
Endogenous unfairness is dominant
Asymptotic Convergence?
Yes!
Illustrating the Crossover Effect (load = 95%)
[Figures: slowdown vs. job size on linear and log scales; the crossover appears for job sizes in the 3 MB to 4 MB range.]
Crossover Effect?
Yes!
Summary and Conclusions
Trace-driven simulation of Web server
scheduling strategies, using a probe-based
sampling methodology (probe jobs) to estimate
response time (slowdown) distributions
Confirms asymptotic convergence of the
slowdown metric for the largest jobs
Confirms the existence of the “cross-over
effect” for some job sizes under SRPT
Provides new insights into SRPT and PS
Two types of unfairness: endogenous vs. exogenous
PS is not really a “gold standard” for fairness!
Ongoing Work
Synthetic Web workloads
Sensitivity to arrival process (self-similar traffic)
Sensitivity to heavy-tailed job size distributions
Evaluate novel scheduling policies that may
improve upon PS (e.g., FSP, k-SRPT, …)
Sensitivity to Arrival Process
A bursty arrival process (e.g., self-similar
traffic, with Hurst parameter H > 0.5) makes
things worse for both PS and SRPT policies
A bursty arrival process has greater impact on
the performance of PS than on SRPT
PS exhibits higher exogenous unfairness than
SRPT for all Hurst parameters and system
loads tested
Sensitivity to Job Size Distribution
SRPT loves heavy-tailed distributions:
the heavier the tail the better!
For all Pareto parameter values and all system
loads considered, SRPT provides better
performance than PS with respect to mean
slowdown and standard deviation of slowdown
At high system load (U = 0.95), SRPT has more
pronounced endogenous unfairness than PS
Thank You!
Questions?
For more information:
M. Gong and C. Williamson,
“Quantifying the Properties of SRPT Scheduling”,
to appear, Proceedings of IEEE MASCOTS,
Orlando, FL, October 2003
Email: {gongm,carey}@cpsc.ucalgary.ca