Hierarchical Scheduling for Diverse Datacenter Workloads
Arka A. Bhattacharya, David Culler, Ali Ghodsi, Scott Shenker, and Ion Stoica
University of California, Berkeley
Eric Friedman
International Computer Science Institute, Berkeley
ACM SoCC’13
Hierarchical Scheduling
• A feature of cloud schedulers.
• Enables scheduling resources to reflect organizational priorities.

Hierarchical Share Guarantee

• Assign to each node in the weighted tree some guaranteed share of the resources.
◦ A node $n_i$ is guaranteed at least an $x$ fraction of its parent's resources, where

  $x = \dfrac{w_i}{\sum_{n_j \in A(C(P(n_i)))} w_j}$

  $w_i$: weight of node $n_i$
  $P()$: parent of a node
  $C()$: the set of children of a node
  $A()$: the subset of demanding nodes
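For concreteness, a minimal Python sketch of this guarantee; the Node class and its fields are hypothetical helpers, not taken from the paper's implementation.

```python
# Minimal sketch of the hierarchical share guarantee (hypothetical Node helper).

class Node:
    def __init__(self, weight, demanding=True, children=None):
        self.weight = weight            # w_i
        self.demanding = demanding      # does the node (or its subtree) still have demand?
        self.children = children or []
        self.parent = None
        for c in self.children:
            c.parent = self

def guaranteed_share(node):
    """x = w_i / (sum of weights of the parent's demanding children),
    i.e. the fraction of the parent's resources guaranteed to `node`."""
    if node.parent is None:
        return 1.0                      # the root is entitled to the whole cluster
    demanding = [c for c in node.parent.children if c.demanding]
    return node.weight / sum(c.weight for c in demanding)

# Two demanding siblings with weights 2 and 1:
a, b = Node(weight=2), Node(weight=1)
root = Node(weight=1, children=[a, b])
print(guaranteed_share(a))  # 0.666... -> a is guaranteed 2/3 of the parent's resources
```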
Example

• Dividing 480 servers down a weighted hierarchy (figure; the tree labels show per-node allocations of 240, 96, 0, 48, 80, 96, and 160 servers).
Multi-resource Scheduling

• Workloads in data centers tend to be diverse.
◦ CPU-intensive, memory-intensive, or I/O-intensive.
◦ Ignoring the actual resource needs of jobs leads to poor performance isolation and low throughput for jobs.
Dominant Resource Fairness (DRF)
• A generalization of max-min fairness to multiple resource types.
• Maximizes the minimum dominant share across users in the system.
◦ The dominant share $s_i$ is the maximum among all of user $i$'s resource shares.
◦ The dominant resource is the resource corresponding to the dominant share.

$s_i = \max_{j=1}^{m} \left\{ \dfrac{u_{i,j}}{r_j} \right\}$, where $u_{i,j}$ is user $i$'s allocation of resource $j$ and $r_j$ is the total amount of resource $j$.
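As a concrete illustration, a one-function Python sketch of the dominant-share computation (function and variable names are mine, not the paper's):

```python
def dominant_share(allocation, capacity):
    """s_i = max over resources j of u_{i,j} / r_j."""
    return max(u / r for u, r in zip(allocation, capacity))

# A user holding <2 CPUs, 8 GB> in a <9 CPU, 18 GB> cluster:
print(dominant_share([2, 8], [9, 18]))  # 8/18 ≈ 0.44 -> memory is the dominant resource
```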
Example

• Dominant resource
◦ Job 1: memory
◦ Job 2: CPU
• Dominant share: 60%
How DRF Works

• Given a set of users, each with a resource demand vector.
◦ The resources required to execute one job.
• Starts with every user allocated zero resources.
• Repeatedly picks the user with the lowest dominant share.
• Launches one of that user's jobs if there are enough resources available in the system (see the sketch below).
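A minimal Python sketch of this loop (progressive filling, one whole task at a time), assuming each user has a fixed per-task demand vector; the function and variable names are mine, not the paper's:

```python
def drf_allocate(capacity, demands):
    """Repeatedly give one task to the user with the lowest dominant share
    until no user's next task fits. `demands[i]` is user i's per-task vector."""
    m = len(capacity)
    used = [0.0] * m                       # cluster-wide consumption so far
    alloc = [[0.0] * m for _ in demands]   # per-user allocations
    tasks = [0] * len(demands)

    def dom_share(i):
        return max(alloc[i][j] / capacity[j] for j in range(m))

    while True:
        # users whose next task still fits in the remaining capacity
        feasible = [i for i, d in enumerate(demands)
                    if all(used[j] + d[j] <= capacity[j] for j in range(m))]
        if not feasible:
            break
        i = min(feasible, key=dom_share)   # lowest dominant share goes first
        for j in range(m):
            used[j] += demands[i][j]
            alloc[i][j] += demands[i][j]
        tasks[i] += 1
    return tasks, alloc
```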

Example

• System with 9 CPUs and 18 GB RAM.
◦ User A: <1 CPU, 4 GB> per task
◦ User B: <3 CPUs, 1 GB> per task
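Working this example through the drf_allocate sketch above (a worked check, not output from the paper's system): user A's dominant resource is memory and user B's is CPU; the loop stops with A running 3 tasks <3 CPUs, 12 GB> and B running 2 tasks <6 CPUs, 2 GB>, so both dominant shares equal 2/3.

```python
tasks, alloc = drf_allocate([9, 18], [[1, 4], [3, 1]])
print(tasks)   # [3, 2]  -> A gets <3 CPUs, 12 GB>, B gets <6 CPUs, 2 GB>
print(alloc)   # dominant shares: 12/18 = 2/3 for A, 6/9 = 2/3 for B
```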
Hierarchical DRF (H-DRF)
• Static H-DRF
• Collapsed hierarchies
• Naive H-DRF
• Dynamic H-DRF

Static H-DRF
• A static version of DRF to handle hierarchies.
• Algorithm
◦ Given the hierarchy structure and the amount of resources in the system.
◦ Starts with every leaf node allocated zero resources.
◦ Repeatedly allocates resources to a leaf node until no more resources can be assigned to any node.
Resource Allocation in Static H-DRF
• Start at the root of the tree and traverse down to a leaf.
◦ At each step, pick the demanding child that has the smallest dominant share.
◦ Internal nodes are assigned the sum of all the resources assigned to their immediate children.
• Allocate the chosen leaf an ε amount of its resource demands (see the sketch below).
◦ This increases the leaf's dominant share by ε.
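A rough Python sketch of one allocation step under these rules; the HNode class, its fields, and the EPS quantum are illustrative assumptions (not the paper's code), and capacity checks and termination are omitted for brevity.

```python
# Rough sketch of one Static H-DRF allocation step.

EPS = 0.01  # size of one epsilon allocation

class HNode:
    def __init__(self, name, demand=None, children=None):
        self.name = name
        self.demand = demand                      # per-epsilon demand vector (leaves)
        self.children = children or []
        self.alloc = None if self.children else [0.0] * len(demand or [])

    def allocation(self):
        """Leaves return their own vector; internal nodes the sum of their children's."""
        if not self.children:
            return self.alloc
        vecs = [c.allocation() for c in self.children]
        return [sum(v[j] for v in vecs) for j in range(len(vecs[0]))]

    def demanding(self):
        if self.children:
            return any(c.demanding() for c in self.children)
        return self.demand is not None

def static_hdrf_step(root, capacity):
    """Walk from the root to a leaf, always entering the demanding child with the
    smallest dominant share, then grant that leaf EPS of its demand vector."""
    def dom(node):
        return max(u / r for u, r in zip(node.allocation(), capacity))
    node = root
    while node.children:
        node = min((c for c in node.children if c.demanding()), key=dom)
    for j, d in enumerate(node.demand):
        node.alloc[j] += EPS * d

# Toy usage: two leaves under the root in a <10 CPU, 10 GPU> cluster.
root = HNode("root", children=[HNode("n1", demand=[1, 0]), HNode("n2", demand=[0, 1])])
static_hdrf_step(root, [10, 10])   # both start at share 0; the tie goes to n1
```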
Example

Given 10 CPUs and 10 GPUs.
Weakness of Static H-DRF

• Re-calculating the static H-DRF allocation from scratch for every leaf allocation and every job arrival is computationally infeasible.
Collapsed Hierarchies

• Converts the hierarchical scheduler into a flat one and applies the weighted DRF algorithm.
◦ Works when only one resource is involved.
◦ With multiple resources, it violates the hierarchical share guarantee for internal nodes in the hierarchy.
Example

• Given a hierarchy rooted at nr: leaf n1,1 demands <1,1>; leaves n2,1 and n2,2 demand <1,0> and <0,1>.
• Flattening assigns the leaves weights of 50% (n1,1), 25% (n2,1), and 25% (n2,2), and weighted DRF is run on them (figure).
Weighted DRF

• Each user $i$ is associated with a weight vector $W_i = \{w_{i,1}, \dots, w_{i,m}\}$.
◦ $w_{i,j}$ represents the weight of user $i$ for resource $j$.
• Dominant share: $s_i = \max_{j=1}^{m} \left\{ \dfrac{u_{i,j}}{w_{i,j}\, r_j} \right\}$
Weighted DRF in Collapsed Hierarchies

• Each node $n_i$ has a single weight $w_i$.
◦ Let $w_{i,j} = w_i$ for $1 \le j \le m$, so $s_i = \max_{j=1}^{m} \left\{ \dfrac{u_{i,j}}{w_i\, r_j} \right\}$.
◦ The ratio between the dominant resources allocated to users $a$ and $b$ then equals $w_a / w_b$.
Example

• Given the same hierarchy, collapsed: n1,1 (demand <1,1>) has weight 50%, n2,1 (<1,0>) and n2,2 (<0,1>) have weight 25% each; weighted DRF is run on the flattened tree (figure).
Naive H-DRF

• A natural adaptation of the original DRF to the hierarchical setting.
• The hierarchical share guarantee can be violated for leaf nodes.
◦ Leaf nodes may even starve.
Example

• Figure: Static H-DRF allocation (dominant share = 1.0) vs. Naive H-DRF allocation.
Dynamic H-DRF
• Does not suffer from starvation.
• Satisfies the hierarchical share guarantee.
• Two key features:
◦ Rescaling to minimum nodes
◦ Ignoring blocked nodes
Rescaling to Minimum Nodes

• Compute the resource consumption of an internal node as follows (see the sketch below):
◦ Find the demanding child with the minimum dominant share M.
◦ Rescale every child's resource consumption vector so that its dominant share becomes M.
◦ Add all the children's rescaled vectors to get the internal node's resource consumption vector.
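A minimal Python sketch of this rescaling rule; each child is represented as a dict with a usage vector and a demanding flag (my representation, not the paper's code). The usage example uses a <10 CPU, 10 GPU> cluster to match the example that follows.

```python
def rescaled_consumption(children, capacity):
    """Charge an internal node the sum of its children's vectors after rescaling
    each one to the minimum dominant share M among demanding children."""
    def dom(vec):
        return max(u / r for u, r in zip(vec, capacity))

    demanding = [c for c in children if c["demanding"]]
    M = min(dom(c["usage"]) for c in demanding)          # minimum dominant share
    total = [0.0] * len(capacity)
    for c in children:
        d = dom(c["usage"])
        scale = M / d if d > 0 else 0.0                  # shrink to dominant share M
        for j, u in enumerate(c["usage"]):
            total[j] += u * scale
    return total

# Siblings using <4 CPUs, 0 GPUs> and <0 CPUs, 10 GPUs> in a <10, 10> cluster
# rescale to <4, 0> and <0, 4>, so the parent is charged <4, 4>.
print(rescaled_consumption(
    [{"usage": [4, 0], "demanding": True},
     {"usage": [0, 10], "demanding": True}],
    capacity=[10, 10]))   # [4.0, 4.0]
```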
Example
• Given 10 CPUs and 10 GPUs.
• After n2,1 finishes a job and releases 1 CPU, its consumption drops from <0.5, 0> (dominant share 0.5) to <0.4, 0> (dominant share 0.4); rescaling its sibling's vector <0, 1> to <0, 0.4> charges the parent <0.4, 0.4> (dominant share 0.4) (figure).
Ignoring Blocked Nodes

• Dynamic H-DRF considers only non-blocked nodes when rescaling (see the sketch below).
• A leaf node is blocked if either
◦ any of the resources it requires is saturated, or
◦ the node is non-demanding.
• An internal node is blocked if all of its children are blocked.
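A small Python sketch of the blocked test, with nodes represented as plain dicts (my representation, not the paper's code):

```python
def is_blocked(node, used, capacity):
    """Leaf: blocked if it is non-demanding or any resource it needs is saturated.
    Internal node: blocked if all of its children are blocked."""
    if node["children"]:
        return all(is_blocked(c, used, capacity) for c in node["children"])
    if not node["demanding"]:
        return True
    return any(d > 0 and used[j] >= capacity[j]
               for j, d in enumerate(node["demand"]))

# A CPU-demanding leaf becomes blocked once the CPU is saturated:
leaf = {"children": [], "demanding": True, "demand": [1, 0]}
print(is_blocked(leaf, used=[10, 3], capacity=[10, 10]))  # True
```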
Example

• Figure: Static H-DRF allocation (dominant share = 1/3) vs. the allocation obtained without ignoring blocked nodes.
Allocation Properties
• Hierarchical Share Guarantee
• Group Strategy-proofness
◦ No group of users can misrepresent their resource requirements in such a way that all of them are weakly better off and at least one of them is strictly better off.
• Recursive Scheduling
• Does not satisfy Population Monotonicity
◦ PM: any node exiting the system should not decrease the resource allocation of any other node in the hierarchy tree.
Example
Evaluation - Hierarchical Sharing

• 49 Amazon EC2 servers
◦ Dominant resources:
  - n1,1, n2,1, n2,2: CPU
  - n1,2: GPU
Result

• Pareto efficiency: no node in the hierarchy can be allocated an extra task on the cluster without reducing the share of some other node.
Conclusion

• Proposed H-DRF, a hierarchical multi-resource scheduler.
◦ Avoids job starvation and maintains the hierarchical share guarantee.
• Future work
◦ DRF under placement constraints.
◦ Efficient allocation-vector updates.