Hierarchical Scheduling for
Diverse Datacenter Workloads
Arka A. Bhattacharya, David Culler, Ali Ghodsi, Scott
Shenker, and Ion Stoica
University of California, Berkeley
Eric Friedman
International Computer Science Institute, Berkeley
ACM SoCC’13
Hierarchical Scheduling
A feature of cloud schedulers.
Enables resource scheduling to reflect
organizational priorities.
Hierarchical Share Guarantee
Assign to each node in the weighted tree
some guaranteed share of the resources.
◦ A node ni is guaranteed to get at least an x share
of resources from its parent, where
x = wi / Σ_{nk ∈ A(C(P(ni)))} wk
wi: weight of node ni
P(): parent of a node
C(): the set of children of a node
A(): the subset of demanding nodes
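As a concrete reading of the guarantee, here is a minimal Python sketch (function and variable names are ours, not from the paper): a demanding node is entitled to its weight's fraction of the total weight of its parent's demanding children.

```python
# Hierarchical share guarantee, sketched for one node (illustrative only).
# A node's guaranteed fraction of its parent's resources is its weight
# divided by the sum of the weights of all demanding siblings (itself
# included); a non-demanding node is guaranteed nothing.

def guaranteed_share(weights, demanding, node):
    """weights: {name: weight} for the children of one parent;
    demanding: the set of demanding children; node: the child to query."""
    if node not in demanding:
        return 0.0
    total = sum(weights[k] for k in demanding)
    return weights[node] / total

# Two demanding children with weights 1 and 3 split the parent 25% / 75%.
w = {"n1": 1, "n2": 3}
print(guaranteed_share(w, {"n1", "n2"}, "n1"))  # 0.25
print(guaranteed_share(w, {"n1", "n2"}, "n2"))  # 0.75
```

Note that the denominator counts only demanding children: if a sibling stops demanding, the remaining nodes' guarantees grow.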
Example
Given 480 servers.
[Figure: weighted hierarchy tree; example per-node allocations:
240, 96, 0, 48, 80, 96, 160]
Multi-resource Scheduling
Workloads in data centers tend to be
diverse.
◦ CPU-intensive, memory-intensive, or I/O
intensive.
◦ Ignoring the actual resource needs of jobs
leads to poor performance isolation and low
throughput for jobs.
Dominant Resource Fairness (DRF)
A generalization of max-min fairness to
multiple resource types.
Maximize the minimum dominant shares of
users in the system.
◦ Dominant share si is the maximum among a
user's shares of the individual resources.
◦ Dominant resource is the resource
corresponding to the dominant share.
si = max_{j=1..m} { ui,j / rj }
ui,j: user i's allocation of resource j
rj: total amount of resource j in the system
Example
Dominant resource
◦ Job 1: memory
◦ Job 2: CPU
Dominant share
◦ 60%
How DRF Works
Given a set of users, each with a resource
demand vector.
◦ The resources required to execute one job.
Starts with every user allocated zero
resources.
Repeatedly picks the user with the lowest
dominant share.
Launches one of that user's jobs if there are
enough resources available in the system.
Example
System with 9 CPUs and 18 GB RAM.
◦ User A: <1 CPU, 4 GB>
◦ User B: <3 CPUs, 1 GB>
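The DRF loop can be simulated directly on this example. The sketch below is our own (it makes one practical choice the slide leaves implicit: users whose next task no longer fits are skipped rather than blocking the loop):

```python
from fractions import Fraction

# DRF simulation on the slide's example: 9 CPUs and 18 GB of RAM.
capacity = [Fraction(9), Fraction(18)]          # [CPUs, GB RAM]
demand = {"A": [Fraction(1), Fraction(4)],      # user A: <1 CPU, 4 GB>
          "B": [Fraction(3), Fraction(1)]}      # user B: <3 CPUs, 1 GB>
used = {u: [Fraction(0), Fraction(0)] for u in demand}
free = capacity[:]
tasks = {u: 0 for u in demand}

def dominant_share(u):
    # Largest share the user holds of any single resource.
    return max(used[u][j] / capacity[j] for j in range(len(capacity)))

while True:
    # Users whose next task still fits in the remaining free resources.
    fitting = [u for u in demand
               if all(demand[u][j] <= free[j] for j in range(len(capacity)))]
    if not fitting:
        break
    u = min(fitting, key=dominant_share)        # lowest dominant share first
    for j in range(len(capacity)):
        used[u][j] += demand[u][j]
        free[j] -= demand[u][j]
    tasks[u] += 1

print(tasks)   # {'A': 3, 'B': 2}
```

Both users end with dominant share 2/3: A is memory-bound (12/18 GB), B is CPU-bound (6/9 CPUs), and the CPUs are saturated so no further task fits.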
Hierarchical DRF (H-DRF)
Static H-DRF
Collapsed hierarchies
Naive H-DRF
Dynamic H-DRF
Static H-DRF
A static version of DRF to handle
hierarchies.
Algorithm
◦ Given the hierarchy structure and the amount
of resources in the system.
◦ Starts with every leaf node allocated zero
resources.
◦ Repeatedly allocates resources to a leaf node
until no more can be assigned to any node.
Resource Allocation in Static H-DRF
Start at the root of the tree and traverse
down to a leaf.
At each step, picks the demanding child
with the smallest dominant share.
◦ Internal nodes are assigned the sum of all the
resources assigned to their immediate
children.
Allocate the leaf node an ε amount of its
resource demands.
◦ Increases the node’s dominant share by ε.
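One allocation step can be sketched as follows. This is our own simplified data model (dicts with "children" for internal nodes and "usage"/"demand" vectors for leaves; the sketch assumes at least one demanding leaf exists):

```python
# One static H-DRF allocation step: walk from the root to a leaf, always
# descending into the demanding child with the smallest dominant share,
# then grant that leaf an epsilon of its demand vector.

EPS = 0.01
CAPACITY = [10.0, 10.0]   # e.g. [CPUs, GPUs]

def node_usage(node):
    # Internal nodes are assigned the sum of their children's usage.
    if "children" in node:
        total = [0.0] * len(CAPACITY)
        for c in node["children"]:
            total = [a + b for a, b in zip(total, node_usage(c))]
        return total
    return node["usage"]

def dominant_share(node):
    usage = node_usage(node)
    return max(usage[j] / CAPACITY[j] for j in range(len(CAPACITY)))

def demanding(node):
    if "children" in node:
        return any(demanding(c) for c in node["children"])
    return any(d > 0 for d in node["demand"])

def allocate_step(root):
    node = root
    while "children" in node:
        node = min((c for c in node["children"] if demanding(c)),
                   key=dominant_share)
    node["usage"] = [u + EPS * d
                     for u, d in zip(node["usage"], node["demand"])]

# Two leaves under the root: leaf1 has used nothing, leaf2 holds 1 CPU.
leaf1 = {"usage": [0.0, 0.0], "demand": [1.0, 0.0]}
leaf2 = {"usage": [1.0, 0.0], "demand": [0.0, 1.0]}
root = {"children": [leaf1, leaf2]}
allocate_step(root)
print(leaf1["usage"])   # [0.01, 0.0] -- the emptier leaf got the epsilon
```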
Example
Given 10 CPUs and 10 GPUs.
Weakness of Static H-DRF
Re-calculating the static H-DRF allocation
from scratch on every job arrival or
completion is computationally infeasible.
Collapsed Hierarchies
Converts the hierarchical scheduler into a
flat one and applies the weighted DRF
algorithm.
◦ Works when only one resource is involved.
◦ Violates the hierarchical share guarantee for
internal nodes in the hierarchy.
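Flattening assigns each leaf a weight equal to the product, along its path from the root, of each node's weight fraction among its siblings. A short sketch (helper names and the dict layout are ours):

```python
# Collapse a weighted hierarchy into flat per-leaf weights by multiplying
# weight fractions down each root-to-leaf path.

def flatten(tree, share=1.0, out=None):
    """tree: {"weight": w, "children": [...]} for internal nodes,
    {"weight": w, "name": ...} for leaves."""
    if out is None:
        out = {}
    if "children" not in tree:
        out[tree["name"]] = share
        return out
    total = sum(c["weight"] for c in tree["children"])
    for c in tree["children"]:
        flatten(c, share * c["weight"] / total, out)
    return out

# Root with leaf n1,1 and an internal child holding n2,1 and n2,2,
# all with equal weights.
tree = {"weight": 1, "children": [
    {"weight": 1, "name": "n1,1"},
    {"weight": 1, "children": [
        {"weight": 1, "name": "n2,1"},
        {"weight": 1, "name": "n2,2"}]}]}
print(flatten(tree))   # {'n1,1': 0.5, 'n2,1': 0.25, 'n2,2': 0.25}
```

This reproduces the 50% / 25% / 25% weights of the example: the flat scheduler sees only the leaf weights, which is why guarantees for internal nodes can be violated.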
Example
Given the hierarchy rooted at nr, flattening yields:
◦ n1,1, demand <1,1>: weight 50%
◦ n2,1, demand <1,0>: weight 25%
◦ n2,2, demand <0,1>: weight 25%
Weighted DRF
Each user i is associated a weight vector
Wi = {wi,1, … wi,m}.
◦ wi,j represents the weight of user i for
resource j.
Dominant share si = max_{j=1..m} { ui,j / (wi,j · rj) }
Weighted DRF in Collapsed
Hierarchies
Each node ni has a weight wi.
◦ Let wi,j = wi for 1≦j≦m
si = max_{j=1..m} { ui,j / (wi · rj) }
= (1/wi) max_{j=1..m} { ui,j / rj }
◦ The ratio between the dominant resource
allocations of users a and b equals wa/wb.
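With a single weight per node, the weighted dominant share is just the unweighted one divided by the node's weight. A one-line sketch (our own illustrative numbers):

```python
# Weighted dominant share with a single per-node weight wi:
# si = max_j(ui,j / rj) / wi. A heavier node thus appears "less served"
# at equal usage, so the scheduler favors it proportionally.

def weighted_dominant_share(usage, capacity, weight):
    return max(u / r for u, r in zip(usage, capacity)) / weight

# A node using <2 CPUs, 8 GB> of <10 CPUs, 20 GB>:
# unweighted dominant share 0.4; with weight 2 it counts as 0.2.
print(weighted_dominant_share([2, 8], [10, 20], 1))  # 0.4
print(weighted_dominant_share([2, 8], [10, 20], 2))  # 0.2
```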
Example
Given the same hierarchy rooted at nr, the collapsed
hierarchy runs weighted DRF over:
◦ n1,1, demand <1,1>: weight 50%
◦ n2,1, demand <1,0>: weight 25%
◦ n2,2, demand <0,1>: weight 25%
Naive H-DRF
A natural adaptation of the original DRF
to the hierarchical setting.
The hierarchical share guarantee is
violated for leaf nodes.
◦ Starvation
Example
[Figure: Static H-DRF vs. Naive H-DRF allocations;
a leaf's dominant share reaches 1.0]
Dynamic H-DRF
Does not suffer from starvation.
Satisfies the hierarchical share guarantee.
Two key features:
◦ Rescaling to minimum nodes
◦ Ignoring blocked nodes
Rescaling to Minimum Nodes
Compute the resource consumption of an
internal node as follows:
◦ Find the demanding child with minimum
dominant share M.
◦ Rescale every child’s resource consumption
vector so that its dominant share becomes M.
◦ Add all the children’s rescaled vectors to get
the internal node’s resource consumption
vector.
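The rescaling step can be sketched in a few lines. This is our own simplified model: each child's consumption vector is already expressed as shares of the cluster, and every listed child is assumed demanding:

```python
# Rescaling to minimum nodes: find the minimum dominant share M among the
# children, scale each child's consumption vector so its dominant share
# equals M, and sum the scaled vectors to get the parent's consumption.

def dominant(vec):
    return max(vec)   # vectors are shares of the cluster, so max = dominant

def rescaled_usage(children):
    m = min(dominant(c) for c in children if dominant(c) > 0)
    total = [0.0] * len(children[0])
    for c in children:
        d = dominant(c)
        scale = m / d if d > 0 else 0.0
        total = [t + scale * x for t, x in zip(total, c)]
    return total

# Children consuming <0.4, 0> and <0, 1>: M = 0.4, the second child is
# rescaled to <0, 0.4>, and the parent is charged <0.4, 0.4> -- not the
# plain sum <0.4, 1.0>.
print(rescaled_usage([[0.4, 0.0], [0.0, 1.0]]))  # [0.4, 0.4]
```

Charging the parent the rescaled sum rather than the raw sum is what prevents one greedy child from inflating its parent's apparent usage and starving cousins.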
Example
Given 10 CPUs and 10 GPUs.
After n2,1 finishes a job and releases 1 CPU, its usage
drops from <0.5, 0> (dominant share 0.5) to <0.4, 0>
(dominant share 0.4).
Rescaling its sibling's usage <0, 1> down to <0, 0.4>,
the parent's consumption vector becomes <0.4, 0.4>,
with dominant share 0.4.
Ignoring Blocked Nodes
Dynamic H-DRF considers only non-blocked nodes for rescaling.
A leaf node is blocked if either
◦ Any of the resources it requires are saturated.
◦ The node is non-demanding.
An internal node is blocked if all of its
children are blocked.
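The blocked test is a simple recursion over the tree. A sketch in our simplified model (leaves carry a "demand" vector, internal nodes a "children" list; `saturated` flags resources with no free capacity):

```python
# Blocked-node predicate: a leaf is blocked if it is non-demanding or if
# any resource it needs is saturated; an internal node is blocked only if
# all of its children are blocked.

def is_blocked(node, saturated):
    if "children" in node:
        return all(is_blocked(c, saturated) for c in node["children"])
    if not any(d > 0 for d in node["demand"]):
        return True          # non-demanding leaf
    return any(d > 0 and saturated[j]
               for j, d in enumerate(node["demand"]))

leaf_cpu = {"demand": [1, 0]}                 # wants CPU only
leaf_gpu = {"demand": [0, 1]}                 # wants GPU only
parent = {"children": [leaf_cpu, leaf_gpu]}

# With CPUs saturated but GPUs free, the CPU leaf is blocked but the
# parent is not, since its GPU child can still run.
print(is_blocked(leaf_cpu, [True, False]))    # True
print(is_blocked(parent, [True, False]))      # False
```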
Example
[Figure: Static H-DRF allocation (dominant share = 1/3)
vs. allocation without ignoring blocked nodes]
Allocation Properties
Hierarchical Share Guarantees
Group Strategy-proofness
◦ No group of users can misrepresent their
resource requirements in such a way that all
of them are weakly better off, and at least one
of them is strictly better off.
Recursive Scheduling
Does not satisfy Population Monotonicity
◦ PM: any node exiting the system should not decrease the
resource allocation to any other node in the hierarchy
tree.
Example
Evaluation - Hierarchical Sharing
49 Amazon EC2 servers
◦ Dominant resource:
n1,1, n2,1, n2,2: CPU
n1,2: GPU
Result
Pareto efficiency: no node in the hierarchy can be allocated an
extra task on the cluster without reducing the share of some other
node.
Conclusion
Proposed H-DRF, which is a hierarchical
multi-resource scheduler.
◦ Avoids job starvation and maintains the
hierarchical share guarantee.
Future work
◦ DRF under placement constraints.
◦ Efficient allocation vector update.