Stratified Round Robin

Stratified Round Robin: A Low Complexity Packet Scheduler with Bandwidth Fairness and Bounded Delay
Sriram Ramabhadran
Joseph Pasquale
Presented by Sailesh Kumar
Outline

Packet fair queuing algorithms

Motivation for stratified round robin

Operation

Implementation

Analysis
‹#› - Sailesh Kumar - 7/13/2017
Introduction


Recently, a class of service disciplines called Packet Fair Queuing (PFQ) has received much attention
PFQ algorithms approximate the idealized Generalized Processor Sharing (GPS) policy, which has two desirable properties:
» It can guarantee an end-to-end delay bound for a flow in a packet-switched network.
» It can ensure instantaneous fair allocation of bandwidth among all backlogged flows.

While many PFQ algorithms have been proposed, few can achieve all of the following goals:
» Support a large number of flows with diverse rates
» Operate at high speeds (10 Gbps and above)
» Maintain both (a) a delay bound and (b) worst-case fairness
Different classes of scheduling algorithms

Time stamp schedulers
» Try to emulate GPS
» WFQ, WF2Q, SCFQ, …
» Good delay bounds

Round robin schedulers
» Assign time slots to flows
» DRR, W-DRR, BSFQ, …
» Easy implementation




Time stamp schedulers

There are three major associated costs:
» Computation of the system virtual time
– The normalized fair amount of service that each flow should receive
» Management of a priority queue to order packet transmissions
– This involves sorting across all active flows
» Management of another priority queue to regulate packets
– This can be thought of as the set of eligible flows

Computation of the system virtual time may be complex
» WFQ, WF2Q: O(N) complexity
» SCFQ, WF2Q+: O(1) complexity
» Several other algorithms with O(log N), … complexities

Priority queue operations are almost always O(log N)
Round robin schedulers

Eliminate sorting and priority queues

DRR
» Assigns a quantum to each queue and services the queues in a round-robin fashion

W-DRR
» Assigns a different quantum to each flow but still services the flows sequentially



Easy implementation
Poor delay bounds
Bursty output
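The DRR and W-DRR descriptions above can be sketched as one round of deficit round robin. This is a minimal illustration, not code from any particular router. Packet sizes are assumed to be in bytes, and the names `drr_round`, `quanta` and `deficits` are hypothetical. Equal quanta give plain DRR, and unequal quanta give W-DRR.

```python
from collections import deque


def drr_round(queues, quanta, deficits):
    """Serve one DRR round, visiting the flows in a fixed order.

    queues:   list of deques of packet sizes (hypothetical unit: bytes)
    quanta:   per-flow quantum; equal values = DRR, unequal values = W-DRR
    deficits: per-flow deficit counters, updated in place
    Returns the (flow index, packet size) pairs sent during this round.
    """
    sent = []
    for i, q in enumerate(queues):
        if not q:
            deficits[i] = 0  # an idle flow does not bank credit
            continue
        deficits[i] += quanta[i]
        # Send head-of-line packets while the deficit covers them.
        while q and q[0] <= deficits[i]:
            size = q.popleft()
            deficits[i] -= size
            sent.append((i, size))
        if not q:
            deficits[i] = 0  # reset once the queue drains
    return sent


queues = [deque([500, 500, 500]), deque([1500])]
deficits = [0, 0]
print(drr_round(queues, [1000, 1000], deficits))  # [(0, 500), (0, 500)]
print(drr_round(queues, [1000, 1000], deficits))  # [(0, 500), (1, 1500)]
```

In this example, flow 1's 1500-byte packet has to wait for a second round because a single 1000-byte quantum does not cover it.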
Stratified round robin, Introduction

Uses the best of both worlds

Round robin: poor delay bounds result from flows with high rate disparity
Deadline based: hard to implement with a large number of flows, since O(log N) per decision can be prohibitive


SRR groups flows with similar rates into classes so that there are only a few classes, say n (n << N)
» Use deadline-based selection across classes, O(log n)
» Use round-robin selection within classes, O(1)
» Total complexity: O(log n)
Stratified round robin

Let there be N backlogged flows f1, f2, …, fN that share an output link of bandwidth R.

Flow fi has a reserved bandwidth ri such that
» $\sum_{i=1}^{N} r_i < R$

Let the weight of the ith flow be $w_i = r_i / R$
» Thus, $\sum_{i=1}^{N} w_i < 1$

Flows are grouped into classes based on their weights

Flow class Fk is defined as
» $F_k = \{ f_i : 2^{-k} \le w_i < 2^{-(k-1)} \}$
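The class definition maps each weight to exactly one index k. The sketch below assumes $0 < w_i < 1$. It uses exact fractions so that boundary weights such as 1/8 fall in the correct class. `flow_class` is a hypothetical helper name, and the rates are made-up numbers chosen to reproduce the example weights used later in these slides.

```python
from fractions import Fraction


def flow_class(w):
    """Return the SRR class index k with 2**-k <= w < 2**-(k-1).

    w is a flow weight r_i / R with 0 < w < 1. Fractions keep the
    boundary comparisons exact (no floating-point rounding).
    """
    w = Fraction(w)
    if not 0 < w < 1:
        raise ValueError("weight must satisfy 0 < w < 1")
    k = 1
    # Lower the class boundary 2**-k until it is at or below w.
    while Fraction(1, 2**k) > w:
        k += 1
    return k


R = 16_000_000  # hypothetical link rate, bits/s
rates = [8_000_000, 2_000_000, 3_000_000, 1_000_000, 1_000_000]
print([flow_class(Fraction(r, R)) for r in rates])  # [1, 3, 3, 4, 4]
```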
SRR (cont)

How many classes?


Let the minimum rate of a flow be r
Since the smallest weight is then r/R, there will be $n = \lceil \log_2 (R/r) \rceil$ different classes

For R = 40 Gbps and r = 1 bps,
» n = 36

A priority queue over n = 36 classes can be implemented to run in O(1) time
» We will come back to this later
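The class count can be checked with integer arithmetic. The smallest weight r/R falls in class $\lceil \log_2(R/r) \rceil$, which is the smallest k with $r \cdot 2^k \ge R$. The helper name `num_classes` is hypothetical, and the rates are assumed to be integers in bits/s.

```python
def num_classes(link_rate, min_rate):
    """Number of SRR classes when flow rates range from min_rate up to link_rate.

    Both rates are integers (bits/s). Returns the smallest k such that
    min_rate * 2**k >= link_rate, i.e. ceil(log2(link_rate / min_rate)).
    """
    k = 0
    while (min_rate << k) < link_rate:
        k += 1
    return k


print(num_classes(40_000_000_000, 1))  # 36 (40 Gbps link, 1 bps minimum rate)
```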
SRR (cont)
Inter-class scheduling

Time is measured in slots, and only one flow can send packets in a slot

tC denotes the current time slot
tC is incremented after each slot


Thus, the notion of virtual time here is somewhat similar to that of SCFQ

A slot may not be assigned to any flow, in which case tC is simply incremented (this takes zero real time)
Inter-class scheduling (cont)

Scheduling intervals
» Fixed-length runs of contiguous slots associated with a flow class

For class Fk, the scheduling interval is always $2^k$ slots
» Thus, if a scheduling interval for a class begins at time t, the next one begins at time $t + 2^k$


Every flow fi ∈ Fk has a weight of approximately $2^{-k}$ (at least $2^{-k}$ and less than $2^{-(k-1)}$).
Therefore, if slots represent fixed-size units of bandwidth allocation, fi is entitled to exactly one slot every $2^k$ slots.
In fact, Stratified Round Robin does exactly this by trying to assign every flow fi ∈ Fk one slot in each scheduling interval of Fk.
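We can make explicit why one slot per $2^k$ slots is always affordable. Write $k(i)$ for the class index of $f_i$ (notation introduced only for this derivation) and $N_k$ for the number of backlogged flows in $F_k$. Membership in $F_{k(i)}$ gives $w_i \ge 2^{-k(i)}$, so the total fraction of slots requested by all classes stays below 1:

```latex
\sum_{k} N_k \, 2^{-k}
  \;=\; \sum_{i=1}^{N} 2^{-k(i)}
  \;\le\; \sum_{i=1}^{N} w_i
  \;<\; 1
```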
Inter-class scheduling (cont)

A flow class Fk is called active if it contains at least one backlogged flow, i.e., Nk > 0, where Nk is the number of backlogged flows in Fk.
» Let A denote the set of active flow classes.

A backlogged flow fi ∈ Fk is called pending if fi has not
been assigned a slot in Fk’s current scheduling
interval.

A flow class is called pending if it contains at least one
pending flow.
» Let P denote the set of pending flow classes.
Inter-class scheduling (cont)

Assign every flow fi ∈ Fk exactly one slot in each
scheduling interval of Fk.
» The end of the current scheduling interval of a flow class is the deadline for all backlogged flows belonging to that class.

Thus, the inter-class scheduler selects the pending flow class Fk with the earliest deadline.

The intra-class scheduler then assigns the current slot to a flow fi ∈ Fk.
» A flow class Fk ceases to be pending when all backlogged flows belonging to Fk have been assigned a slot in its current scheduling interval.
» Fk then remains non-pending until the start of its next scheduling interval, when all backlogged flows belonging to Fk become pending again.
Inter-class scheduling (cont)

How to advance tC

After servicing a flow in the current time slot

If there are any pending flow classes, tC is
incremented by 1

Otherwise, tC is advanced to the earliest time when
some flow class becomes pending again
Inter-class scheduling (cont)
Weights
w1 = 1/2
w2 = 1/8
w3 = 3/16
w4 = 1/16
w5 = 1/16
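These example weights fall into classes F1 = {f1}, F3 = {f2, f3} and F4 = {f4, f5}. The small simulation below applies the inter-class rules: pick the pending class with the earliest deadline, increment tC while any class is pending, and otherwise jump to the next interval start. It is a sketch under simplifying assumptions of our own. Every flow stays backlogged, each assignment is one slot, intervals of Fk start at multiples of $2^k$, and flows inside a class are taken in index order rather than by W-DRR. The names `flow_class` and `srr_schedule` are hypothetical.

```python
from fractions import Fraction


def flow_class(w):
    """Class index k with 2**-k <= w < 2**-(k-1) (assumes 0 < w < 1)."""
    k = 1
    while Fraction(1, 2**k) > w:
        k += 1
    return k


def srr_schedule(weights, num_assignments):
    """Return (tC, flow number) pairs; flows are numbered from 1.

    Simplifications: all flows always backlogged, one slot per assignment,
    class F_k intervals start at multiples of 2**k, index order inside a class.
    """
    classes = {}
    for i, w in enumerate(weights, start=1):
        classes.setdefault(flow_class(Fraction(w)), []).append(i)
    pending = {k: [] for k in classes}
    schedule = []
    tc = 0
    while len(schedule) < num_assignments:
        # A class whose interval starts at tc makes all its flows pending again.
        for k, flows in classes.items():
            if tc % 2**k == 0:
                pending[k] = list(flows)
        # Deadline = end of the current interval; ties go to the smaller k.
        chosen = min((k for k in pending if pending[k]),
                     key=lambda k: ((tc // 2**k + 1) * 2**k, k))
        schedule.append((tc, pending[chosen].pop(0)))
        if any(pending.values()):
            tc += 1  # some class is still pending
        else:
            # Jump to the earliest start of a new interval (skipped slots cost no real time).
            tc = min((tc // 2**k + 1) * 2**k for k in classes)
    return schedule


weights = [Fraction(1, 2), Fraction(1, 8), Fraction(3, 16), Fraction(1, 16), Fraction(1, 16)]
for tc, flow in srr_schedule(weights, 14):
    print(f"tC={tc:2d} -> f{flow}")
```

Over tC = 0..15, f1 (weight 1/2) receives every other slot, f2 and f3 receive two slots each, and f4 and f5 receive one each. tC jumps from 12 to 14, so slot 13 is skipped without using link time. f3 gets the same number of slots as f2 despite its larger weight. In SRR, the intra-class credit proportional to weight is what accounts for that difference.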
Inter-class scheduling (cont)

Within a class, flows are scheduled in a W-DRR fashion

Each flow is given a credit proportional to its weight

The output is not very bursty because the maximum weight disparity within a class is at most 2
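The slides state only that credit is proportional to weight. The scaling quantum = $w_i \cdot 2^k \cdot L_M$ used below is an assumption, with $L_M$ taken as the maximum packet length. It is chosen because, for every flow in Fk, it yields between $L_M$ and $2 L_M$ of credit per slot, which mirrors the factor-of-2 weight disparity. `class_quanta` is a hypothetical helper.

```python
from fractions import Fraction


def class_quanta(weights, k, max_packet_len):
    """Hypothetical W-DRR quanta for the flows of class F_k.

    Assumed rule: quantum = w * 2**k * L_M. Because 2**-k <= w < 2**-(k-1),
    each quantum lies in [L_M, 2 * L_M), so one slot always covers a maximum-size packet.
    """
    quanta = []
    for w in map(Fraction, weights):
        if not Fraction(1, 2**k) <= w < Fraction(2, 2**k):
            raise ValueError(f"weight {w} is not in class F_{k}")
        quanta.append(w * 2**k * max_packet_len)
    return quanta


q = class_quanta([Fraction(1, 8), Fraction(3, 16)], k=3, max_packet_len=1500)
print([int(x) for x in q])  # [1500, 2250]
```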
Implementation

A simple implementation aligns all the scheduling
intervals

Although this may result in slightly less fair service, it makes the implementation very simple

The deadline of class Fk is then also a deadline of class Fk′ for every k′ < k
[Figure: deadlines of classes Fk, Fk+1 and Fk+2 with aligned scheduling intervals]
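The nesting of aligned deadlines can be checked numerically. In this sketch, `deadline` is a hypothetical helper, and the intervals of Fk are assumed to start at multiples of $2^k$ (the alignment described above).

```python
def deadline(k, tc):
    """End of class F_k's current aligned scheduling interval at slot tc.

    Aligned intervals of F_k are [m * 2**k, (m + 1) * 2**k) for m = 0, 1, 2, ...
    """
    return (tc // 2**k + 1) * 2**k


# Every deadline of F_k is a multiple of 2**k, hence also a deadline of each
# class F_k' with k' < k, and F_k''s current deadline is never later than F_k's.
for tc in range(64):
    for k in range(2, 6):
        for kp in range(1, k):
            assert deadline(k, tc) % 2**kp == 0
            assert deadline(kp, tc) <= deadline(k, tc)
print(deadline(1, 5), deadline(2, 5), deadline(3, 5))  # 6 8 8
```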
Implementation (Selecting class)
To select the next class, choose the smallest k such that Fk is pending (with aligned intervals, a smaller k never has a later deadline)
[Figure: current tC and the deadlines of classes Fk, Fk+1 and Fk+2]

A simple priority encoder operating on the pending
status bits of flow classes can be used to choose the
class
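In software, the priority encoder can be modeled as "find the lowest set bit". The encoding below is an assumption: bit (k − 1) of a word holds the pending status of Fk. `select_class` is a hypothetical name.

```python
def select_class(pending_bits):
    """Software stand-in for the priority encoder: smallest pending class index.

    Assumed encoding: bit (k - 1) of pending_bits is 1 when class F_k is pending.
    Returns None when no class is pending.
    """
    if pending_bits == 0:
        return None
    lowest = pending_bits & -pending_bits  # isolate the least significant 1 bit
    return lowest.bit_length()  # bit (k - 1) set  =>  bit_length() == k


print(select_class(0b10100))  # 3 (F3 and F5 pending; F3 has the earliest deadline)
```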
Implementation (Advancing tC)
[Figure: current tC and the starts of the next scheduling intervals of classes Fk, Fk+1 (the first active class) and Fk+2]
Advance tC so that at least one class becomes pending

If Fk+1 is the first (lowest-index) active class, then
» Add $2^{k+1}$ to tC and then reset its k+1 least significant bits

Can be implemented with a priority encoder and a few gates
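With aligned intervals, when the first active class is Fj, the next time any class becomes pending is the next multiple of $2^j$ above tC. Every coarser class boundary is also a multiple of $2^j$. The sketch assumes bit (j − 1) of a word marks Fj as active. `next_pending_time` is a hypothetical helper.

```python
def next_pending_time(tc, active_bits):
    """Earliest slot after tc at which some active class starts a new interval.

    Assumed encoding: bit (j - 1) of active_bits is 1 when class F_j is active.
    With aligned intervals, the first active class F_j (j = k + 1 in the slide's
    notation) starts its next interval at the next multiple of 2**j above tc.
    """
    if active_bits == 0:
        raise ValueError("no active class")
    j = (active_bits & -active_bits).bit_length()  # priority encoder
    step = 1 << j
    return (tc + step) & ~(step - 1)  # add 2**j, then clear the j low bits


print(next_pending_time(7, 0b1101))   # 8  (F1, F3, F4 active)
print(next_pending_time(13, 0b100))   # 16 (only F3 active)
```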
Analysis

Golestani fairness
» It essentially requires that the difference between the
normalized service received by any two backlogged flows fi
and fj, over any time period (t1, t2), be bounded by a small
constant

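(SRR result) In any time period (t1, t2) during which flows fi and fj are both backlogged,
» $S_i(t_1, t_2)/r_i - S_j(t_1, t_2)/r_j \le 5 L_M (1/r_i + 1/r_j)$, where $S_i(t_1, t_2)$ is the service received by fi during (t1, t2) and $L_M$ is the maximum packet length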
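To get a feel for the size of this bound, the sketch below evaluates its right-hand side. Service is assumed to be counted in bits and rates in bits/s, so the normalized service $S/r$ and the bound are both in seconds. The packet size and rates are hypothetical numbers, and `golestani_bound` is a hypothetical name.

```python
def golestani_bound(max_packet_bits, r_i, r_j):
    """Right-hand side 5 * L_M * (1/r_i + 1/r_j) of the SRR fairness result, in seconds.

    max_packet_bits is L_M in bits and r_i, r_j are reserved rates in bits/s,
    so S_i / r_i and S_j / r_j (and the bound) are measured in seconds.
    """
    return 5 * max_packet_bits * (1 / r_i + 1 / r_j)


# Hypothetical numbers: 1500-byte packets, flows reserved at 1 Mb/s and 10 Mb/s.
print(round(golestani_bound(1500 * 8, 1e6, 1e7), 6))  # 0.066
```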

Bennett-Zhang fairness
» It compares the service received by a single flow fi to the service it would receive in the ideal case, i.e., when fi has exclusive access to an output link of bandwidth ri

(SRR result)
» $\delta_i < q_i/r_i + 5 L_M/r_i + 5(N - 1) L_M/R$
Analysis and simulation results

Single packet delay bound

(Single packet delay) For every flow fi, let $\Delta_i$ be the maximum delay experienced by a packet at the head of fi's queue. Then
» $\Delta_i < 12 L_M / r_i$

The bound is independent of the number of flows

Tighter bounds may be derived
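A quick evaluation of the single-packet delay bound makes its scale concrete. The sketch assumes $L_M$ is in bits and $r_i$ in bits/s. The numbers are hypothetical, and `head_delay_bound` is a hypothetical name. The number of flows does not appear as a parameter, which mirrors the independence noted above.

```python
def head_delay_bound(max_packet_bits, r_i):
    """SRR single-packet delay bound 12 * L_M / r_i, in seconds.

    The number of flows N is not a parameter: the bound does not depend on it.
    """
    return 12 * max_packet_bits / r_i


# Hypothetical numbers: 1500-byte packets and a flow reserved at 1 Mb/s.
print(round(head_delay_bound(1500 * 8, 1e6), 6))  # 0.144 (seconds)
```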

A simple simulation shows that SRR achieves average delays similar to those of WFQ
If (doubts) Then
Ask;
Else
Thank you;
End if;