Decentralized Task-Aware Scheduling for Data Center Networks
Fahad R. Dogar, Thomas Karagiannis, Hitesh Ballani, and Antony Rowstron
Microsoft Research
SIGCOMM 2014
Pengcheng Zhou
Mar. 27, 2017
Outline
• Background and Motivation
• Main Contributions
• Scheduling Policy
• Baraat
• Experiments and Evaluation
• Summary
• References
Background and Motivation
Flow-level scheduling can hurt application performance
• Tasks in data center applications such as web search or social networks involve
hundreds or even thousands of components, all of which need to finish before
the task is considered complete.
• Most network resource allocation schemes treat all flows in isolation rather than as part
of a task, and therefore only optimize flow-level metrics.
• For example, PDQ [1] and pFabric [2] can support a scheduling policy like shortest flow first
(SFF), which minimizes flow completion times by assigning resources based on flow sizes.
• SFF considers flows in isolation, so it will schedule the shorter flows of every task first,
leaving longer flows to the end. This can hurt application performance by delaying
completion of tasks.
Allocate the network resources in a task-aware fashion.
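A minimal sketch of the effect, assuming a single unit-capacity link that serves one flow at a time; the task names, flow sizes, and function names are illustrative and not from the paper:

```python
def sff_task_completion(tasks):
    """Serve flows one at a time, shortest flow first, ignoring task membership."""
    flows = sorted((size, tid) for tid, sizes in tasks.items() for size in sizes)
    now, done = 0, {}
    for size, tid in flows:
        now += size
        done[tid] = now  # a task completes when its last flow finishes
    return done


def task_serial_completion(tasks):
    """Serve all flows of one task before moving on to the next task."""
    now, done = 0, {}
    for tid, sizes in tasks.items():
        now += sum(sizes)
        done[tid] = now
    return done


# Hypothetical workload: three tasks, each with flows of size 1, 2 and 3.
tasks = {"A": [1, 2, 3], "B": [1, 2, 3], "C": [1, 2, 3]}
sff = sff_task_completion(tasks)        # {'A': 12, 'B': 15, 'C': 18}
serial = task_serial_completion(tasks)  # {'A': 6, 'B': 12, 'C': 18}
```

In this toy case SFF finishes every short flow early, yet the mean task completion time is 15 versus 12 for the task-aware order.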
Main Contributions
• Study policies for the order in which tasks should be scheduled (FIFO-LM, i.e., FIFO
with limited multiplexing).
• Show that task-aware policies like FIFO-LM (and even FIFO) can reduce both the average
and the tail task completion times.
• Design Baraat, a decentralized task-aware scheduling system for data centers.
Scheduling Policy
Task Serialization
• Task Serialization: schedule tasks one at a time, in priority order.
Fair Sharing (FS) vs. Task Serialization (TS):
Task Serialization can effectively reduce the task completion time.
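The comparison can be sketched with a fluid model of one unit-capacity link; this is a simplified illustration, and the function names and task sizes are assumptions rather than anything from the paper:

```python
def fair_sharing(sizes):
    """Fluid fair sharing: all unfinished tasks split the link equally."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    done = [0.0] * len(sizes)
    now, served, remaining = 0.0, 0.0, len(sizes)
    for i in order:
        # Every remaining task must receive (sizes[i] - served) more units
        # before task i finishes, at a rate of 1/remaining each.
        now += (sizes[i] - served) * remaining
        served = sizes[i]
        done[i] = now
        remaining -= 1
    return done


def task_serialization(sizes):
    """Tasks get the whole link one after another, in the given order."""
    done, now = [], 0.0
    for size in sizes:
        now += size
        done.append(now)
    return done


fs = fair_sharing([4, 4])        # [8.0, 8.0]
ts = task_serialization([4, 4])  # [4.0, 8.0]
```

With two equal tasks, fair sharing finishes both at time 8, while serialization finishes the first at 4 without delaying the second.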
Task Serialization Policy: FIFO-LM (FIFO with limited multiplexing)
FIFO-LM processes tasks in a FIFO order, but can dynamically vary the number of tasks
that are multiplexed at a given time.
• If the degree of multiplexing is one, it performs exactly the same as FIFO.
• If the degree of multiplexing is unbounded (∞), it works similarly to fair sharing.
FIFO-LM performs like FIFO for the majority of tasks (the small ones), but when a large task
arrives, we can increase the level of multiplexing and allow small tasks to make progress as
well.
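One way to picture the policy is the sketch below. It assumes a task counts as heavy once it has sent more than a hypothetical `heavy_threshold` bytes, and that each heavy task among the active ones raises the degree of multiplexing by one; both the rule and the names are simplifications for illustration, not Baraat's actual mechanism.

```python
def active_tasks(queue, bytes_sent, heavy_threshold):
    """Return the tasks allowed to send right now.

    queue: task ids in arrival (FIFO) order
    bytes_sent: dict mapping task id -> bytes sent so far
    """
    active = []
    degree = 1  # start out as plain FIFO
    for tid in queue:
        if len(active) >= degree:
            break
        active.append(tid)
        if bytes_sent.get(tid, 0) > heavy_threshold:
            degree += 1  # heavy task ahead: let one more task through
    return active


queue = [1, 2, 3]
light = active_tasks(queue, {1: 10}, heavy_threshold=100)   # [1]
heavy = active_tasks(queue, {1: 500}, heavy_threshold=100)  # [1, 2]
```

With a threshold that is never exceeded the sketch behaves like FIFO; with a threshold that every task exceeds, all queued tasks are active, which approaches fair sharing.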
Baraat
Generating Task Identifiers
Data center application pattern:
• Baraat uses monotonically increasing
counter(s) to keep track of incoming
tasks.
• The task counter sits at a common point,
such as a load balancer, job scheduler,
or metadata manager.
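A minimal sketch of such a counter, assuming a single process at the common point hands out the identifiers; the class name `TaskIdGenerator` is hypothetical:

```python
import itertools
import threading


class TaskIdGenerator:
    """Hands out monotonically increasing task ids, safe across threads."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_id(self):
        # The lock keeps ids strictly increasing when several request
        # handlers ask for an id at the same time.
        with self._lock:
            return next(self._counter)


gen = TaskIdGenerator()
ids = [gen.next_id() for _ in range(3)]  # [1, 2, 3]
```

A smaller id means an earlier arrival, so the id can serve directly as the task's FIFO priority.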
Prioritization Mechanism: Smart Priority Class (SPC)
Similar to priority queues used in switches: flows mapped to a higher priority class get strict
preference over those mapped to a lower priority class, and flows mapped to the same class
share bandwidth according to max-min fairness.
Additional smarts:
• To provide work-conservation in multi-hop settings, SPC supports explicit feedback from
switches.
• Dynamic mapping of flows to priority queues: a flow’s mapping may change during its
lifetime, if a heavy task is identified in the system.
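The basic class semantics (strict priority across classes, max-min fairness within a class) can be sketched for a single link as follows. The function names, flow names, and demands are illustrative assumptions; switch feedback and dynamic remapping are not modeled.

```python
def max_min_share(demands, capacity):
    """Max-min fair split of capacity among flows with the given demands."""
    alloc = {}
    pending = sorted(demands.items(), key=lambda kv: kv[1])
    while pending:
        fair = capacity / len(pending)
        flow, demand = pending.pop(0)  # smallest remaining demand first
        alloc[flow] = min(demand, fair)
        capacity -= alloc[flow]
    return alloc


def spc_allocate(classes, capacity):
    """classes: list of {flow: demand} dicts, highest priority class first."""
    alloc = {}
    for demands in classes:
        share = max_min_share(demands, capacity)
        alloc.update(share)
        # Whatever a class leaves unused flows down to the next class.
        capacity = max(0.0, capacity - sum(share.values()))
    return alloc


# Hypothetical link of capacity 10 with two priority classes.
rates = spc_allocate([{"a": 2, "b": 6}, {"c": 5}], capacity=10)
```

Here flows `a` and `b` are fully served (2 and 6), and the lower-priority flow `c` receives only the leftover 2 units.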
Experiments and Evaluation
Overall Performance:
Baraat’s performance against RCP for a parallel workflow scenario across all
experimental platforms.
Task Completion Time and Tail Task Completion Time:
Reduction in tail task completion time with Baraat against decentralized schemes.
Reduction in task completion time for the partition-aggregate workflow.
Reduction in mean task completion time for data-parallel jobs.
Reduction in tail task completion time with Baraat against centralized schemes.
Summary
Baraat is a decentralized system for task-aware network scheduling.
• It provides a consistent treatment to all flows of a task, both across space and
time, allowing active flows of the task to be loosely synchronized and make
progress at the same time.
• By changing the level of multiplexing, Baraat effectively deals with the
presence of heavy tasks.
Strength: achieves task-aware flow scheduling in a decentralized manner.
Weakness: requires modifying both end hosts and switches.
References
[1] C. Hong, M. Caesar, and P. Godfrey. Finishing flows quickly with preemptive scheduling. ACM
SIGCOMM Computer Communication Review, 42(4):127-138, 2012.
[2] M. Alizadeh, S. Yang, M. Sharif, S. Katti, N. McKeown, B. Prabhakar, and S. Shenker. pFabric:
Minimal near-optimal datacenter transport. In ACM SIGCOMM, 2013.