
Scalably Scheduling Processes with
Arbitrary Speedup Curves
(Better Scheduling in the Dark)
Jeff Edmonds
York University
Kirk Pruhs
University of Pittsburgh
SODA 2009
Every Deterministic Nonclairvoyant Scheduler
has a Suboptimal Load Threshold
Jeff Edmonds
York University
Submitted to
STOC 2009
The Scheduling Problem
• Allocate p processors to a stream of n jobs.
• Measure of quality (total flow time):
  F(A(I)) = Σ_{i=1}^{n} (c_i − r_i) = ∫_t n_t dt
• Competitive ratio:
  max_I F(A(I)) / F(Opt(I))
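The flow-time objective above can be sketched in code. A minimal illustration (function names are mine, not the talk's), assuming a schedule is summarized by each job's release time r_i and completion time c_i:

```python
# Total flow time F(A(I)) = sum_i (c_i - r_i), for hypothetical
# release/completion times produced by some schedule A on input I.
def total_flow_time(releases, completions):
    return sum(c - r for r, c in zip(releases, completions))

# Equivalent view: integrate n_t, the number of jobs alive at time t.
def flow_via_alive_count(releases, completions, dt=0.001):
    end = max(completions)
    total, t = 0.0, 0.0
    while t < end:
        n_t = sum(1 for r, c in zip(releases, completions) if r <= t < c)
        total += n_t * dt
        t += dt
    return total
```

Both views give the same value (up to discretization error in the second), which is why the slides switch freely between the sum and the integral.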
Examples of Schedulers
• Shortest Remaining Processing Time (SRPT)
• Shortest Elapsed Time First (SETF)

Online
• Online: the future is unknown.
• Optimal: all knowing, all powerful.
• SRPT is competitive:
  max_I F(SRPT(I)) / F(Opt(I)) = 1
Nonclairvoyant
• Nonclairvoyant: knows neither the future nor the jobs' remaining work.
• Optimal: all knowing, all powerful.
• Not competitive:
  F(SETF) / F(Opt) = Ω(n)
  F(Equi) / F(Opt) = Θ(n / ln n)
• Every nonclairvoyant algorithm:
  F(Nonclairvoyant) / F(Opt) = Ω(√n)   [MPT]
Performance vs Load
[Plot: average performance F(A(I)) / F(Opt(I)) as a function of load.]
max_I F(A(I)) / F(Opt(I)) = Ω(n)

Performance vs Load
[Plot: if the load stays below a threshold c, performance stays bounded.]
max_I F(A(I)) / F(Opt(I)) = O(1)

Performance vs Load
[Plot: the algorithm is given extra speed s.]
max_I F(A_s(I)) / F(Opt(I)) = O(c)
Resource Augmentation
• Nonclairvoyant, but given extra speed.
• Optimal: all knowing, all powerful, speed 1.
• Competitive:
  F(SETF_{1+ε}) / F(Opt_1) = Θ(1/ε)   [KP]
  F(Equi_{2+ε}) / F(Opt_1) = Θ(1/ε); speed 2+ε is required   [E]
Sublinear Nondecreasing Speedup Functions
• A set of jobs: J = {J_1, …, J_n}
• Each job has phases: J_i = ⟨J_i^1, …, J_i^{q_i}⟩
• Each phase J_i^q = ⟨W_i^q, Γ_i^q⟩ has:
  – Work: W_i^q
  – Speedup function: Γ_i^q
• Each Γ is nondecreasing and sublinear.
• Examples: parallelizable, Γ(ρ) = ρ; sequential, Γ(ρ) = 1.
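To make the phase model concrete, here is a minimal sketch (illustrative code of mine, not from the talk) of the two extreme speedup functions and how a phase's duration follows from them:

```python
# A phase <W, Gamma> given a fixed allocation of rho processors proceeds
# at rate Gamma(rho), so it finishes in W / Gamma(rho) time.
def parallelizable(rho):   # Gamma(rho) = rho: speed scales with processors
    return rho

def sequential(rho):       # Gamma(rho) = 1: extra processors are wasted
    return 1.0             # (a sequential phase runs at rate 1 regardless)

def phase_time(work, speedup, rho):
    return work / speedup(rho)
```

Doubling the processors halves a parallelizable phase's time but leaves a sequential phase's time unchanged, which is exactly the tension the later slides exploit.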
Sublinear Nondecreasing Speedup Functions
• Nonclairvoyant, with extra speed. Competitive?
• Optimal: all knowing, all powerful.

Sublinear Nondecreasing Speedup Functions
[Jobs arrive over time; some are currently alive.]
Opt gives all its resources to the parallelizable jobs and hence completes them as they arrive. The sequential jobs complete with no resources.
Sublinear Nondecreasing Speedup Functions
[Jobs arrive over time; some are currently alive.]
Shortest Elapsed Time First (SETF) gives all its resources to a sequential job, wasting them. The parallelizable jobs, getting no resources, never complete.
F(SETF_s) / F(Opt_1) = Ω(n)
Sublinear Nondecreasing Speedup Functions
[ℓ/ε sequential jobs and ℓ parallelizable jobs are currently alive.]
Equi spreads its resources fairly. Most are wasted on the sequential jobs; the parallelizable jobs don't get enough and fall behind.
Equi_{1+ε} wastes ε of its resources, but has ε extra.
F(Equi_{1+ε}) / F(Opt_1) = (ℓ/ε + ℓ) / ℓ ≈ 1/ε
Sublinear Nondecreasing Speedup Functions
F(Equi_{2+ε}) / F(Opt_1) = Θ(1/ε); speed 2+ε is required   [E]
Latest Arrival Processor Sharing
n_t jobs currently alive, sorted by arrival time.
• SETF: works on 1 job, but it may be sequential.
• Equi: works on all n_t jobs, too thin, and needs speed 2+ε.
• LAPS_β: works on the βn_t latest-arriving jobs, a compromise.
New result [EP]:
F(LAPS_{β, 1+β+ε}) / F(Opt_1) = Θ(1/βε)
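The LAPS_β allocation rule (share the processors equally among the latest-arriving βn_t jobs) can be sketched as follows. This is my own illustrative code; the ⌈βn_t⌉ rounding is an assumption:

```python
import math

def laps_allocation(arrival_times, beta, total_speed=1.0):
    """Share total_speed equally among the ceil(beta * n_t) jobs with
    the latest arrival times; every other alive job gets nothing."""
    n = len(arrival_times)
    if n == 0:
        return []
    k = max(1, math.ceil(beta * n))
    # indices of the k latest arrivals
    latest = set(sorted(range(n), key=lambda i: arrival_times[i])[-k:])
    return [total_speed / k if i in latest else 0.0 for i in range(n)]
```

Setting β = 1 recovers Equi (everyone shares equally), while β → 0 approaches SETF's behavior of concentrating on the most recent job.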
Latest Arrival Processor Sharing
n_t jobs currently alive, sorted by arrival time.
New result [EP]:
F(LAPS_{β, 1+β+ε}) / F(Opt_1) = Θ(1/βε)
• β ≈ 0 (SETF): Θ(1/(0·ε)) blows up; F(SETF_{1+ε}) / F(Opt_1) is unbounded.
• β = 1 (Equi): F(Equi_{2+ε}) / F(Opt_1) = Θ(1/(1·ε)) = Θ(1/ε).
• Intermediate β is a compromise.
Backwards Quantifiers
• Desired result: ∃Alg ∀ε:
  F(Alg_{1+ε}) / F(Opt_1) = Θ(1/ε^{O(1)})
• Obtained (new result [EP]): ∀ε ∃Alg. Setting β = ε/2 and ε′ = ε/2,
  F(LAPS_{β, 1+β+ε′}) / F(Opt_1) = Θ(1/βε′) ≈ Θ(1/ε²)
• New result [E, STOC09?]: ∀Alg ∃ε:
  F(Alg_{1+ε}) / F(Opt_1) = ω(1)
Performance vs Load Threshold
Defn: A set of jobs has load L ∈ [0,1] if F(Opt_L(I)) = 1, i.e. it can be optimally handled with speed L.
Defn: F_β(L) = max_{I with load L} F(LAPS_{β,1}(I))
• Equi (β = 1) has the best performance, but it can only handle half load, L = 1/2.
• Small β can handle almost full load, L = 1−β, but its performance degrades with 1/β.
Lower Bound
[Diagram: Opt vs LAPS_β; the βn_t favored jobs are spread too thin and need speed 1+β.]
Measure of resource concentration:
β_t = 1 / (n_t · Σ_{i≤n_t} (ρ_i)²)
For LAPS_β:  Σ_{i≤n_t} (ρ_i)² = (βn_t) · (1/(βn_t))² = 1/(βn_t),  so β_t = β.
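The concentration measure can be checked numerically. A small sketch (my own code, assuming the measure β_t = 1/(n_t · Σ_i ρ_i²) as read off the slide):

```python
def concentration(rhos):
    """beta_t = 1 / (n_t * sum_i rho_i^2) for per-job allocations rho_i."""
    n = len(rhos)
    return 1.0 / (n * sum(r * r for r in rhos))

# LAPS_beta: beta*n jobs each get 1/(beta*n), the rest get 0 => beta_t = beta.
n, beta = 10, 0.5
k = int(beta * n)
laps_rhos = [0.0] * (n - k) + [1.0 / k] * k

# Equi: every job gets 1/n => beta_t = 1.
equi_rhos = [1.0 / n] * n
```

So β_t recovers the fraction of jobs the algorithm actually shares its resources among: β for LAPS_β, and 1 for Equi.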
Lower Bound
The algorithm specifies a processor allocation ρ_i for each job when n_t jobs are alive; let β = lim_{t→∞} β_t.
• β → 0: too concentrated; the favored job may be sequential.
• Constant β: too thin; performance is 1/β, and the jobs need speed 1+β.
Lower Bound
[Input: a stream of jobs arriving at times t_i.]
• Opt ignores the extra jobs and completes the stream:  Flow = Σ_i 2t_i
• Alg attempts all of them and completes none:  Flow = Σ_i (1+i)·t_i
Oops: we need an extra restriction, namely that the adversary can switch the job the algorithm is working on the most to being sequential. This is likely fine, because the algorithm favors the more recent jobs.
Lower Bound
• Alg specifies a processor allocation for each job when n_t jobs are alive (e.g. Equi or LAPS_β); otherwise it is arbitrary.
• The adversary promises that no job completes.
• Compute the work w_i completed on each job.
• The adversary gives work w_i to each job so that it does not complete (nontrivial algebra; Brouwer's fixed point theorem).
• The time t_i between job arrivals equals w_i, so Opt can complete the jobs as they arrive.
• Compute the competitive ratio.
Proof Sketch
F(LAPS_{β, 1+β+ε}) / F(Opt_1) = Θ(1/βε)
• In the worst-case inputs, each phase is either sequential or parallelizable.
Proof Sketch: Potential Function
• Define a potential function Φ_t.
• It says how much debt LAPS has in the bank.
• Φ_0 = Φ_final = 0.
• Φ_t does not increase as jobs arrive or complete.
• At other times,
  dF(LAPS)/dt + dΦ/dt ≤ c · dF(Opt)/dt
• The result follows by integrating:
  F(LAPS) + Φ_final − Φ_0 ≤ c · F(Opt)
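Spelled out, the integration step is routine (a sketch using only the properties listed above):

```latex
\int_0^{\infty}\left(\frac{dF(\mathrm{LAPS})}{dt}+\frac{d\Phi}{dt}\right)dt
  \;\le\; c\int_0^{\infty}\frac{dF(\mathrm{Opt})}{dt}\,dt
\;\Longrightarrow\;
F(\mathrm{LAPS})+\Phi_{\mathrm{final}}-\Phi_0 \;\le\; c\,F(\mathrm{Opt})
```

Since Φ_0 = Φ_final = 0 and arrivals/completions never increase Φ, this yields F(LAPS) ≤ c · F(Opt), i.e. LAPS is c-competitive.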
Potential Function
n_t jobs currently alive, sorted by arrival time; job i has coefficient i, and x_i = parallelizable work done by Opt but not by LAPS.
Φ_t = γ · Σ_{i∈[n_t]} i · max(x_i, 0)
Job arrives: ΔΦ = 0 (the new job has x_i = 0).
Potential Function
Φ_t = γ · Σ_{i∈[n_t]} i · max(x_i, 0)
Job completes: ΔΦ ≤ 0 (the remaining jobs' coefficients only decrease).
Potential Function
Φ_t = γ · Σ_{i∈[n_t]} i · max(x_i, 0)
Opt works (at speed 1): dΦ/dt ≤ γ · n_t · 1
Potential Function
Φ_t = γ · Σ_{i∈[n_t]} i · max(x_i, 0)
LAPS works:
dΦ/dt ≤ −γ · Σ_{i∈[(1−β)n_t, n_t−ℓ̂_t]} i · (1+β+ε)/(βn_t)
• (1+β+ε) is the speed of LAPS, shared among βn_t jobs.
• The range [(1−β)n_t, n_t] covers the βn_t jobs worked on.
• ℓ̂_t = # of jobs that are sequential under LAPS or on which LAPS is ahead (x_i ≤ 0); these contribute no decrease, so less of LAPS's work counts.
Potential Function
dF(LAPS)/dt + dΦ/dt ≤ c · dF(Opt)/dt
• n_t = # of jobs alive under LAPS; ℓ̂_t ≤ N_t = # of jobs alive under Opt.
• Combining the bounds (Opt works + LAPS works):
  dΦ/dt ≤ γ · n_t · 1 − γ · Σ_{i∈[(1−β)n_t, n_t−ℓ̂_t]} i · (1+β+ε)/(βn_t)
• A page of math later, and the proof is done; the resulting competitive ratio is c = Θ(1/βε).
Conclusions
Latest Arrival Processor Sharing: n_t jobs currently alive, sorted by arrival time; LAPS_β shares among the βn_t latest arrivals.
[EP] Resource Augmentation:
  F(LAPS_{β, 1+β+ε}) / F(Opt_1) = Θ(1/βε)
[E] Suboptimal Load Threshold:
  ∀Alg ∃ε:  F(Alg_{1+ε}) / F(Opt_1) = ω(1)
Other Models, Same Techniques
• Broadcast: many page requests serviced simultaneously   [EP: SODA 02, EP: SODA 03]
• TCP: additive increase & multiplicative decrease ≈ EQUI   [EDD: PAA 03, E: LATIN 04]
• Speed Scaling: each algorithm can dynamically choose its speed s, but must pay for it with energy P(s) = s^α   [CELLMP: STACS 09, EP: ??]
Thank you
Conclusions
Latest Arrival Processor Sharing: LAPS_β shares among the βn_t latest arrivals.
[EP] Resource Augmentation:
  F(LAPS_{β, 1+β+ε}) / F(Opt_1) = Θ(1/βε)
[E] Suboptimal Load Threshold:
  ∀Alg ∃ε:  F(Alg_{1+ε}) / F(Opt_1) = ω(1)
[CELLMP] Speed Scaling: LAPS_β is Θ(α²)-competitive for β = 1/α.
[EP] Speed Scaling with multi-processors: LAPS_β is Θ(ln p)-competitive.
Scheduling in the Dark
• Multiprocessor – Batch:       Edmonds, Chinn, Brecht, Deng (STOC 97)
• Speed 2+ε:                    Edmonds (STOC 99)
• Speed 1+ε:                    Edmonds, Pruhs (SODA 09)
• ∀ε not competitive:           Edmonds (STOC 09?)
• Multicast – reduction:        Edmonds, Pruhs (SODA 02)
• Multicast – LWF:              Edmonds, Pruhs (SODA 04)
• TCP – one bottleneck:         Edmonds, Datta, Dymond (PAA 03)
• TCP – general network:        Edmonds (LATIN 04)
• Speed Scaling – one proc.:    Chan, Edmonds, Lam, Lee, Marchetti-Spaccamela, Pruhs (STACS 09?)
• Speed Scaling – multi proc.:  Edmonds, Pruhs (being written)
Nonclairvoyant Speed Scaling for Flow and Energy
Ho-Leung Chan (Pittsburgh), Jeff Edmonds (York), Tak-Wah Lam (Hong Kong), Lap-Kei Lee (Hong Kong), A. Marchetti-Spaccamela (Roma), Kirk Pruhs (Pittsburgh)
Submitted to STACS 2009
Speed Scaling
Each algorithm can dynamically choose its speed s, but it must pay for it with energy P(s) = s^α.
Objective:
  (F(Alg) + E(Alg)) / (F(Opt) + E(Opt))
  = (∫_t n_{Alg,t} dt + ∫_t (s_{Alg,t})^α dt) / (∫_t n_{Opt,t} dt + ∫_t (s_{Opt,t})^α dt)
Known: ∃ a 3-competitive clairvoyant algorithm [BCP].
New [CELLMP]: LAPS_β is Θ(α³)-competitive for s_{LAPS,t} = (n_{LAPS,t})^{1/α} and β = 1/α (partition the speed among the βn latest arriving jobs).
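A minimal sketch of this nonclairvoyant speed rule (my own illustrative code; the rule s_t = n_t^{1/α} and β = 1/α come from the slide, the ⌈βn_t⌉ rounding is an assumption):

```python
import math

def laps_speed_scaling(n_alive, alpha):
    """Run at total speed s_t = n_t^(1/alpha), so the power s^alpha equals
    the queue size n_t; split the speed equally among the ceil(beta * n_t)
    latest arrivals, with beta = 1/alpha."""
    beta = 1.0 / alpha
    s = n_alive ** (1.0 / alpha)
    k = max(1, math.ceil(beta * n_alive))
    return s, s / k   # (total speed, share per favored job)
```

The choice s_t = n_t^{1/α} makes the instantaneous power s^α equal to n_t, the instantaneous flow cost, so energy and flow are spent at the same rate.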
Speed Scaling
Known: ∃ a 3-competitive clairvoyant algorithm [BCP].
New [CELLMP]: LAPS_β is Θ(α³)-competitive for β = 1/α, but only for one processor or fully parallelizable jobs.
New [EP]: multi-processors & parallel-sequential jobs.
Speed Scaling
Each algorithm can dynamically choose its speed s, but it must pay for it with energy P(s) = s^α.
Processors Model:
• Dynamically allocate p_i unit-speed processors to job J_i.
• Energy is (#processors)^α per unit time.
• α²-competitive.
Individual Speeds Model:
• Dynamically partition the p processors among the jobs.
• Run processor k at speed s_k.
• Energy is s_k^α per processor per unit time.
• log p competitive.
Speed Scaling for Flow and Energy with Multi-Processors and Arbitrary Speedup Curves
Jeff Edmonds, York University
Kirk Pruhs, University of Pittsburgh
Being Written
Performance vs Load
Defn: A set of jobs has load s ∈ [0,1] if F(Opt_s(I)) = 1, i.e. it can be optimally handled with speed s.
Defn: F_β(s) = max_{I with load s} F(LAPS_{β,1}(I)) = max_I F(LAPS_{β,1}(I)) / F(Opt_s(I))
The [EP] bound F(LAPS(I)) / F(Opt(I)) = 4(1+β+ε)/(βε) applies with speed ratio s_LAPS/s_Opt = 1+β+ε; here the ratio is s_LAPS/s_Opt = 1/s. Substituting 1+β+ε = 1/s gives
  F_β(s) = 4 / (β(1 − (1+β)s))   for s < 1/(1+β)
Performance vs Load
Defn: A set of jobs has load s ∈ [0,1] if F(Opt_s(I)) = 1, i.e. it can be optimally handled with speed s.
Defn: F_β(s) = max_{I with load s} F(LAPS_{β,1}(I))
  F_β(s) = 4 / (β(1 − (1+β)s))   for s < 1/(1+β)
• Equi (β = 1) has the best performance, but it can only handle half load, s = 1/2.
• Small β can handle almost full load, s ≈ 1−β, but its performance degrades with 1/β.
Speed Scaling
Each algorithm can dynamically choose its speed s, but it must pay for it with energy P(s) = s^α.
Known: ∃ a 3-competitive clairvoyant algorithm [BCP].
New [CELLMP]: LAPS_β is Θ(α²)-competitive for β = 1/α.
Every nonclairvoyant algorithm is Ω(α)-competitive for P(s) = s^α, and hence ω(1)-competitive for P(s) = s^{ω(1)}.