
PAUL SOUTHERINGTON, ECE742, 28 APRIL 2005
The Smoothed Round-Robin Scheduler
Paul Southerington, Member, IEEE

Abstract—This paper reviews and illustrates the concepts put
forth in the original paper by Guo [1]. The Smoothed Round
Robin scheduling algorithm is explained and discussed, and a
sample Windows application for generation of SRR service curves
is developed.
Index Terms—Scheduling, SRR, SRR Demo, Weight Matrix,
WSS
I. INTRODUCTION
Smoothed Round Robin, or SRR, is a work-conserving packet scheduling algorithm that attempts to provide maximum fairness while maintaining only O(1) time complexity.
This is in contrast to other approaches, such as Weighted Fair Queuing (WFQ), that use time stamps to approximate the results of a Generalized Processor Sharing (GPS) scheduler.
Such methods often provide good fairness properties at the
cost of increased complexity of the algorithm. O(N) or
O(log N) complexity is typical of these schemes. Furthermore,
[1] suggests that it is unlikely that such algorithms will be able
to improve beyond O(log N) because of their dependence on
sorting algorithms, yet this level of complexity may be
unacceptable when operating at high speeds.
[1] lists four properties expected of a packet scheduler: low
time complexity, fair treatment of different flows, a low upper
bound on delay and low delay variation, and
simplicity/efficiency of implementation. In practice, however,
these are often conflicting needs. It is difficult to achieve high
fairness and low delay bounds with simple, easy-to-implement
algorithms. SRR does a remarkably good job of addressing
this tradeoff.
Formally, [1] defines several distinct roles in providing
service. The packet classifier, admission controller, and packet
enqueuer are considered to be separate tasks from the packet
scheduler itself and so are not included in the O(1)
complexity measurement. In the case of SRR, flow
management is also excluded from this measurement.
Manuscript received April 28, 2005.
Paul Southerington is a graduate student in Computer Engineering at George Mason University, Fairfax, VA 22030 USA (e-mail: [email protected]).
II. THE WEIGHT MATRIX
The SRR algorithm makes use of several important data structures. The first of these is the Weight Matrix. Each element in the Weight Matrix is either zero or one. The matrix has one row corresponding to each data flow. The individual elements of each row are determined by the weight of the corresponding flow; it may be appropriate to normalize the weights first by dividing each of them by the greatest common divisor of all the weights, so that the reduced weights share no common factor greater than one. The contents of a row are determined by representing the reduced weight as a series of binary coefficients. Such a series of coefficients is referred to as a Weight Vector. For example, a weight of 19 is equal to 16 + 2 + 1, which is equivalent to $2^4 + 2^1 + 2^0$. Thus:
$$WV(19) = \begin{bmatrix} 1 & 0 & 0 & 1 & 1 \end{bmatrix}$$
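The number of columns in the Weight Matrix may then be found as:
$$n_c = \lfloor \log_2 w_{f\max} \rfloor + 1$$
where $n_c$ is the number of columns of the matrix and $w_{f\max}$ is the largest flow weight; in other words, $n_c$ is the number of binary digits of $w_{f\max}$. A maximum weight of 19, for example, requires five columns.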
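To make these two rules concrete, the following C++ sketch (written in the same language as SRR Demo, but not taken from it) computes $n_c$ by counting binary digits and builds a Weight Vector padded to $n_c$ entries, most significant coefficient first, so that column 1 holds the coefficient of $2^{n_c-1}$. The function names `num_columns` and `weight_vector` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Number of Weight Matrix columns for a maximum weight w_max >= 1:
// floor(log2(w_max)) + 1, i.e. the number of binary digits of w_max.
int num_columns(unsigned w_max) {
    assert(w_max >= 1);
    int nc = 0;
    while (w_max > 0) {  // count binary digits without floating point
        ++nc;
        w_max >>= 1;
    }
    return nc;
}

// Weight Vector of w padded to nc entries, most significant first:
// element j (0-based) is the coefficient of 2^(nc-1-j).
std::vector<int> weight_vector(unsigned w, int nc) {
    assert(w >= 1 && num_columns(w) <= nc);
    std::vector<int> wv(nc);
    for (int j = 0; j < nc; ++j)
        wv[j] = static_cast<int>((w >> (nc - 1 - j)) & 1u);
    return wv;
}
```

Counting digits avoids floating-point `log2` entirely; for example, `num_columns(4)` returns 3 and `weight_vector(19, 5)` yields 1 0 0 1 1.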
III. THE WEIGHT SPREAD SEQUENCE
In addition to the Weight Matrix, SRR defines the Weight
Spread Sequence (WSS). The WSS is used to determine the
service order of flows based on the contents of the matrix. For
a given matrix with k columns, a WSS of order k will be needed. [1] uses the notation $S_k$ to represent such a WSS. The sequence of order 1 is defined as a single element, $S_1 = 1$. Higher-order Weight Spread Sequences are defined recursively as:
$$S_k = S_{k-1},\ k,\ S_{k-1}, \qquad k > 1$$
In particular, it should be noted that the values of the
Weight Spread Sequence are fixed – they are not dependent on
the number or weights of flows. However, the order of the
WSS used will be determined by the number of columns in the
Weight Matrix; thus the order is determined based on the flow
weights.
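void CWSS::Fill(int k) {
    // Appends S_k = S_{k-1}, k, S_{k-1} to value[], advancing length.
    // The caller resets length to 0 beforehand and sizes value[]
    // to hold at least 2^k - 1 entries.
    if (k > 1) Fill(k-1);   // first copy of S_{k-1}
    value[length++] = k;    // middle element
    if (k > 1) Fill(k-1);   // second copy of S_{k-1}
}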
Fig. 1: C++ Implementation of WSS Calculation in SRR Demo
1
1 2 1
1 2 1 3 1 2 1
1 2 1 3 1 2 1 4 1 2 1 3 1 2 1
1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 5 1 2 1 3 1 2 1 4 1 2 1 3 1 2 1
Fig. 2: Weight Spread Sequences of Order 1 through 5
int CSRR::ServiceSequence(int *seq, int size) {
    // Returns the size of the service sequence and fills
    // the array seq with the numbers of flows served.
    // Returns -1 if seq is too small to hold the whole round.
    int row, col;
    int i;  // Current position in the wss
    int t;  // Current time index
    t = 0;
    for (i = 0; i < wss.length; i++) {
        // Pick columns from wss (WSS values are 1-based)
        col = wss.value[i] - 1;
        for (row = 0; row < num_flows; row++) {
            if (matrix[row][col] == 1) {
                if (t >= size) return -1;  // check capacity before writing
                seq[t++] = row;
            }
        }
    }
    return t;
}
Fig. 3: Simplified SRR Algorithm in C++ from SRR Demo
In C++ code, a WSS may be generated recursively as shown
in Fig. 1. However, in an actual scheduler implementation,
these would most likely be precomputed and stored in ROM or
other fixed form. Alternatively, there might be some benefit to
a linked-list implementation in which pointers were used to
insert and remove elements from the list.
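Since a WSS depends only on its order, a table can be built once (at startup, or offline for ROM) and afterwards only read. A minimal sketch, assuming a `std::vector<int>` is an acceptable stand-in for such a fixed table: the hypothetical function `build_wss` applies the doubling rule iteratively instead of recursing as Fig. 1 does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build S_k once, without recursion: start from S_1 = {1} and apply
// S_j = S_{j-1}, j, S_{j-1} for j = 2..k. The result has 2^k - 1 entries.
std::vector<int> build_wss(int k) {
    assert(k >= 1 && k <= 24);  // keep the table a reasonable size
    std::vector<int> s{1};
    s.reserve((std::size_t{1} << k) - 1);
    for (int j = 2; j <= k; ++j) {
        std::vector<int> prev = s;  // copy of S_{j-1}; avoids self-insertion
        s.push_back(j);             // middle element j
        s.insert(s.end(), prev.begin(), prev.end());
    }
    return s;
}
```

For example, `build_wss(3)` produces 1 2 1 3 1 2 1, and the scheduler then only indexes into the returned table.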
Fig. 2 shows the first five Weight Spread Sequences. Because $S_k$ consists of two copies of $S_{k-1}$ plus one additional element, $|S_k| = 2|S_{k-1}| + 1$ with $|S_1| = 1$, so the length of a Weight Spread Sequence of order k is:
$$|S_k| = 2^k - 1$$
It should also be clear from Fig. 2 that a lower-order WSS forms the first elements of every higher-order one, and that a given element i appears in the WSS of order k a total of $2^{k-i}$ times. [1] refers to these properties as Fact 1 and Fact 2, respectively.
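Both facts can be checked mechanically. The sketch below relies on an observation that is not stated in [1] but follows from the recursive definition: the element at 1-based position t of a WSS equals one plus the number of trailing zero bits of t (the so-called ruler sequence). Because this value does not involve k, it restates Fact 1, and a brute-force count confirms Fact 2 for small orders. `wss_at` and `count_in_wss` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Element at 1-based position t of any WSS long enough to contain t.
// The value depends only on t, never on the order k (Fact 1).
int wss_at(std::uint64_t t) {
    assert(t >= 1);
    int tz = 0;
    while ((t & 1u) == 0) {  // count trailing zero bits of t
        ++tz;
        t >>= 1;
    }
    return tz + 1;
}

// Number of times value i occurs in S_k, counted by brute force.
// Fact 2 predicts 2^(k-i) for 1 <= i <= k.
std::uint64_t count_in_wss(int i, int k) {
    assert(k >= 1 && k < 40);
    const std::uint64_t len = (std::uint64_t{1} << k) - 1;  // |S_k| = 2^k - 1
    std::uint64_t count = 0;
    for (std::uint64_t t = 1; t <= len; ++t)
        if (wss_at(t) == i) ++count;
    return count;
}
```

In C++20, `std::countr_zero` from `<bit>` could replace the counting loop. A scheduler using this form would need no stored WSS at all, only a position counter.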
IV. SRR IMPLEMENTATION
A. Simplified Service Order
The order in which flows are served is determined based on
the WSS in use and the values in the columns of the Weight
Matrix. Each value in the WSS represents a column of the
matrix. Each nonzero element in the column represents a data
flow that will be serviced whenever that column is selected.
For each value in the WSS, the scheduler will service all
flows that have nonzero terms in the column of the matrix with
the same number as the current element of the sequence.
In this way, SRR can be seen as an extension of the
traditional Weighted Round Robin algorithm [2], where the
flow weights are spread over the entire period of the scheduler,
rather than being served sequentially within each round.
A simplified, conceptual version of the algorithm is shown in Fig. 3. In particular, the simplified version assumes fixed-length packets and constantly backlogged queues. Since all flows are always backlogged, the weight matrix never changes in this simplified scenario.
B. Service Order – Full Description
The full version of Smoothed Round Robin is more general than the simplified version described previously. To accommodate
changes in traffic, SRR is capable of dynamically adjusting the
weight matrix as needed. As additional flows arrive, additional
rows will be added to the matrix. Flows may also leave if all of
their traffic has been served or if they are manually removed.
Furthermore, SRR is able to make use of techniques used in
Deficit Round Robin [2], [3] for dealing with packets that may
not have fixed length. To do this, it maintains a deficit value for each flow. A flow that receives less than its share of service in one visit, because of packet sizes, is allowed correspondingly more service when that flow is next visited.
Three actions are defined: Schedule, Add_flow, and
Del_flow. These actions will be described in the following
sections. Pseudocode for these algorithms may be found in [1].
C. The Schedule function
Each time a flow is serviced, the scheduler begins by adding
the maximum packet length of the outgoing link to the current
deficit value for that flow. If the new deficit value is less than
or equal to zero, no further processing will be done on the flow
until its next time quantum. Otherwise, the scheduler will
begin transmitting packets.
Each time a packet is sent, the deficit value for the flow is reduced by the length of the transmitted packet. The scheduler continues to serve packets from the current flow as long as the deficit value remains greater than zero. Because the last packet sent may be longer than the remaining deficit, the value can become negative; this shortfall is carried into the flow's next visit. If the current flow runs out of packets before its deficit drops to zero, the queue for that flow is empty and the flow is deleted from the weight matrix.
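The per-visit logic just described can be sketched as follows. This is an illustration under stated assumptions, not the pseudocode of [1]: packet lengths are plain integers (bytes), the deficit is allowed to go negative when the last packet overshoots, and deleting the flow is left to the caller through the return value. `Flow` and `serve_visit` are hypothetical names.

```cpp
#include <cassert>
#include <deque>
#include <vector>

struct Flow {
    std::deque<int> queue;  // lengths of queued packets, head first
    long deficit = 0;       // may become negative after a visit
};

// One service opportunity for flow f. Lengths of sent packets are
// appended to `sent`. Returns true if the queue emptied, meaning the
// caller should remove the flow from the weight matrix.
bool serve_visit(Flow& f, int l_max, std::vector<int>& sent) {
    assert(l_max > 0);
    f.deficit += l_max;  // credit one maximum-length packet
    while (f.deficit > 0 && !f.queue.empty()) {
        int len = f.queue.front();
        f.queue.pop_front();
        sent.push_back(len);  // "transmit" the packet
        f.deficit -= len;     // charge its length, possibly going negative
    }
    return f.queue.empty();
}
```

With a maximum packet length of 1500 and queued packets of 1000, 1000 and 500 bytes, the first visit sends two packets and leaves the deficit at -500; the second visit starts from 1000, sends the last packet, and reports the flow as empty.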
Once the scheduler is finished with a given flow, it will move
to the next usable (i.e., nonzero) value in the current column of
the weight matrix. If no such value exists, the function will
read the next value of the Weight Spread Sequence and move
to the first usable value in the corresponding column of the
matrix. If this column contains no usable entries, the scheduler
will again move to the next element of the WSS, repeating this
procedure as many times as necessary until it finds an entry
corresponding to some flow. Once the next scheduled flow has
been found, the Schedule function begins again.
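The search for the next flow can be sketched with a small cursor holding the current WSS position and the next entry within the selected column. For simplicity the sketch assumes each column is a plain `std::vector<int>` of flow ids rather than the linked lists of [1], and that at least one column is nonempty; `Cursor` and `next_flow` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Cursor {
    std::size_t wss_pos = 0;  // index into the WSS
    std::size_t entry = 0;    // next entry within the selected column
};

// Return the next flow to serve. wss holds 1-based column numbers;
// columns[c] lists the flows with a 1 in column c+1. Exhausted and
// empty columns are skipped, and the WSS wraps at the end of a round.
int next_flow(const std::vector<int>& wss,
              const std::vector<std::vector<int>>& columns, Cursor& cur) {
    for (std::size_t steps = 0; steps <= wss.size(); ++steps) {
        const std::vector<int>& col = columns[wss[cur.wss_pos] - 1];
        if (cur.entry < col.size())
            return col[cur.entry++];  // next usable entry in this column
        cur.entry = 0;                // column done: read the next WSS value
        cur.wss_pos = (cur.wss_pos + 1) % wss.size();
    }
    assert(false && "weight matrix has no nonzero entries");
    return -1;
}
```

With the weight matrix of the example in Section V (flows numbered from 0), thirteen successive calls return 0 1 2 3 0 0 3 4 0 1 2 3 0. A run of empty columns makes a single call scan several WSS positions, which is one motivation for the column compression discussed later in this section.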
[1] suggests representing each column of the matrix as a doubly linked list. Each element of a column list contains pointers to the previous and next elements of that column, in addition to the value identifying the flow to be served. This
method of implementation is designed to reduce the time
complexity of the algorithm by removing the need to scan each
column element. In this implementation, [1] argues that
Schedule may be implemented with O(1) complexity. This
is only partially true, because the current recommended
implementation includes a call to Del_flow, which is a
higher-complexity operation. This call should be removed
from the main scheduling algorithm if possible. Use of a timer
or other external mechanism for managing expired flows may
be appropriate.
In [1], the author does not specifically state how the elements
of the WSS should be represented. Rather than storing the values as simple integers (column numbers), these could be stored as pointers directly to the corresponding columns of the weight matrix itself.
Furthermore, it may be possible to perform column
compression within the WSS by removing entries
corresponding to columns that are empty. This would in effect
create a modified WSS tailored to the specific traffic. This
could result in a slight performance gain in the Schedule
function at the cost of decreased performance and possibly
increased complexity in the Add_flow and Del_flow
functions.
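A minimal sketch of that compression, assuming the WSS is stored as 1-based column numbers and that the number of entries in each column is tracked: positions whose column is empty are simply filtered out. This is the suggestion made here rather than a technique from [1], and `compress_wss` is a hypothetical name. The filtered copy would have to be rebuilt whenever Add_flow or Del_flow empties or refills a column, which is the extra cost mentioned above.

```cpp
#include <cassert>
#include <vector>

// Drop WSS positions whose column is empty. column_size[c] is the
// number of flows with a 1 in column c+1 of the Weight Matrix.
std::vector<int> compress_wss(const std::vector<int>& wss,
                              const std::vector<int>& column_size) {
    std::vector<int> out;
    for (int col : wss) {
        assert(col >= 1 && col <= static_cast<int>(column_size.size()));
        if (column_size[col - 1] > 0) out.push_back(col);  // keep nonempty
    }
    return out;
}
```

For instance, if column 1 of a three-column matrix is empty, `compress_wss({1, 2, 1, 3, 1, 2, 1}, {0, 2, 1})` keeps only 2 3 2.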
D. Add_flow and Del_flow
Add_flow will be called whenever a new flow arrives at
the scheduler. This could be a completely new flow, or merely
a flow that has been deleted from the weight matrix due to
inactivity.
The Add_flow function begins by resetting the deficit
value for that flow to zero. It must then calculate the binary-coefficient representation of the flow's weight in order to form the new weight matrix. Rather than rebuilding the weight matrix from scratch, the new flow's entries may be inserted as additional elements in the doubly linked column lists described in the
preceding section. Since the addition of a new flow may be
seen as simply adding a new row to the bottom of the matrix,
the function merely adds a new element at the end, or bottom,
of the linked list for each column affected.
If the new flow has a larger weight than any existing flow,
the operation of Add_flow may also add new columns to the
left side of the matrix. If this occurs, a different WSS will be in
use, so the function will jump to the equivalent position in the
new WSS.
The Del_flow function is the inverse of Add_flow and
operates in the same manner; it merely removes nodes from
the columns of the Weight Matrix rather than adding them.
Since each of these functions may need to operate on the list
elements for each column, the worst-case complexity will be
proportional to the number of columns in the matrix. This is
approximately log2 of the maximum weight, which also
corresponds to the order k of the Weight Spread Sequence.
The worst-case complexity is therefore O(k).
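The cost argument can be illustrated with a simplified model of the column lists. This is a sketch, not the data structure of [1]: each column is a `std::list<int>` of flow ids, and each flow remembers the list node it owns in every column it occupies, so unlinking one node is O(1) and adding or deleting a flow touches at most k columns. `ColumnLists`, `add_flow`, and `del_flow` are hypothetical names, and the sketch neither adds columns nor switches to a different WSS.

```cpp
#include <cassert>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

class ColumnLists {
public:
    explicit ColumnLists(int k) : columns_(k) { assert(k >= 1 && k < 32); }

    // Append flow `id` at the tail of every column where its weight has
    // a 1. Column c (0-based) holds the coefficient of 2^(k-1-c). O(k).
    void add_flow(int id, unsigned weight) {
        const int k = static_cast<int>(columns_.size());
        assert(weight >= 1 && (weight >> k) == 0);  // weight fits in k columns
        auto& nodes = nodes_[id];
        assert(nodes.empty());                      // flow not already present
        for (int c = 0; c < k; ++c) {
            if ((weight >> (k - 1 - c)) & 1u) {
                auto node = columns_[c].insert(columns_[c].end(), id);
                nodes.emplace_back(c, node);        // remember for deletion
            }
        }
    }

    // Unlink every node of flow `id`: O(1) per node, at most k nodes.
    void del_flow(int id) {
        auto found = nodes_.find(id);
        if (found == nodes_.end()) return;          // unknown flow
        for (const auto& entry : found->second)
            columns_[entry.first].erase(entry.second);
        nodes_.erase(found);
    }

    const std::list<int>& column(int c) const { return columns_[c]; }

private:
    std::vector<std::list<int>> columns_;
    // For each flow: (column index, list node) for each column it occupies.
    std::unordered_map<int, std::vector<std::pair<int, std::list<int>::iterator>>> nodes_;
};
```

Adding the five flows of the example in Section V (weights 5, 2, 2, 3, 1) leaves flows 2, 3, 4 in the second column and 1, 4, 5 in the third; deleting flow 4 unlinks exactly two nodes. `std::list` is chosen because its iterators stay valid when other elements are inserted or erased.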
V. SAMPLE ROUND OF SRR
The following illustrates a sample round of Smoothed Round
Robin. This example uses the simplified form of SRR and
assumes fixed packet sizes and a constant configuration of
always-backlogged flows. An additional example may be
found in [1].
Consider the scheduling of five flows with desired rates of
r1 = 640 kbps, r2 = 256 kbps, r3 = 256 kbps, r4 = 384 kbps, and r5 = 128 kbps. We will assume that the capacity of the system is sufficient to service all requests (thus the system capacity C ≥ 1664 kbps, the sum of the requested rates). The greatest common divisor of these values is 128 kbps; the relative weights assigned will then be w1 = 5, w2 = 2, w3 = 2, w4 = 3, w5 = 1.
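The reduction can be reproduced in a few lines. The sketch uses Euclid's algorithm directly (C++17 code could use `std::gcd` from `<numeric>` instead); `gcd_long` and `reduce_rates` are hypothetical names, and rates are assumed to be integers in kbps.

```cpp
#include <cassert>
#include <vector>

// Euclid's algorithm; gcd_long(0, b) == b.
long gcd_long(long a, long b) {
    while (b != 0) {
        long r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Divide every rate by the greatest common divisor of all the rates.
std::vector<long> reduce_rates(const std::vector<long>& rates) {
    long g = 0;
    for (long r : rates) g = gcd_long(g, r);
    assert(g > 0);  // at least one positive rate
    std::vector<long> weights;
    for (long r : rates) weights.push_back(r / g);
    return weights;
}
```

Applied to the five rates above it returns 5, 2, 2, 3, 1; the weights sum to 13, and 13 × 128 kbps = 1664 kbps, matching the capacity assumed above.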
Since the highest weight is five, we can tell immediately that
the weight matrix will have only three columns. Since there are
five sources, the matrix will have five rows. The complete
matrix is:
$$WM = \begin{bmatrix} WV_1 \\ WV_2 \\ WV_3 \\ WV_4 \\ WV_5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$$
Because the weight matrix has three columns, we must use a
Weight Spread Sequence of order 3:
$$S_3 = 1, 2, 1, 3, 1, 2, 1$$
In each round of SRR, the flows will be served in the
following order:
f1, f2, f3, f4, f1, f1, f4, f5, f1, f2, f3, f4, f1
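The round above can be reproduced with a self-contained variant of the simplified algorithm of Fig. 3. This sketch is not the SRR Demo code: it derives the number of columns and the WSS from the weights itself, reads each flow's binary coefficients directly instead of storing a matrix, and reports flow numbers starting from 1; `srr_round` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One round of simplified SRR (fixed-size packets, always-backlogged
// flows). Returns 1-based flow numbers in service order.
std::vector<int> srr_round(const std::vector<unsigned>& weights) {
    unsigned w_max = 0;
    for (unsigned w : weights) w_max = w > w_max ? w : w_max;
    assert(w_max >= 1);
    int k = 0;  // number of columns = binary digits of w_max
    for (unsigned w = w_max; w > 0; w >>= 1) ++k;

    std::vector<int> wss{1};        // S_1
    for (int j = 2; j <= k; ++j) {  // S_j = S_{j-1}, j, S_{j-1}
        std::vector<int> prev = wss;
        wss.push_back(j);
        wss.insert(wss.end(), prev.begin(), prev.end());
    }

    std::vector<int> order;
    for (int col : wss)  // column col holds the coefficient of 2^(k-col)
        for (std::size_t f = 0; f < weights.size(); ++f)
            if ((weights[f] >> (k - col)) & 1u)
                order.push_back(static_cast<int>(f) + 1);
    return order;
}
```

`srr_round({5, 2, 2, 3, 1})` returns 1 2 3 4 1 1 4 5 1 2 3 4 1, the order listed above, with each flow i appearing $w_i$ times.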
Each flow is served proportionally to its weight, but the
packets served from each flow are spread over the entire
round. Specifically, each flow is served $w_f$ times during the
round. This may be determined easily by inspection, but a
short proof is provided in [1].
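One hypothetical way to quantify how evenly the packets are spread (an illustration, not a measure used in [1]) is the largest gap, over all slots of the round, between the number of packets a flow has received and its ideal straight-line share $t \cdot w_f / W$, where $W$ is the sum of the weights. The sketch multiplies through by $W$ to stay in integers. It is applied both to the SRR order above and to a WRR order that serves each flow's $w_f$ packets back to back, which is an assumption about how a WRR round is arranged; `max_scaled_lag` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Largest deviation of any flow's cumulative service from its ideal
// straight line, scaled by W (the sum of the weights). Dividing the
// result by W gives the deviation in packets.
long max_scaled_lag(const std::vector<int>& order,  // 1-based flow numbers
                    const std::vector<long>& weights) {
    long total = 0;
    for (long w : weights) total += w;
    assert(total > 0);
    std::vector<long> served(weights.size(), 0);  // packets served per flow
    long worst = 0;
    for (std::size_t t = 1; t <= order.size(); ++t) {
        ++served[order[t - 1] - 1];  // slot t serves this flow
        for (std::size_t f = 0; f < weights.size(); ++f) {
            long lag = total * served[f] - static_cast<long>(t) * weights[f];
            if (lag < 0) lag = -lag;
            if (lag > worst) worst = lag;
        }
    }
    return worst;
}
```

For the SRR order the result is 9, so no flow ever strays more than 9/13 ≈ 0.7 packet from its ideal line; for the back-to-back WRR order 1 1 1 1 1 2 2 3 3 4 4 4 5 it is 40 (about 3.1 packets, reached by flow 1 after its fifth packet).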
Fig. 4 shows the service distribution for this example, viewed in the SRR Demo application. In comparison, Fig. 5 shows the equivalent service curve when using the simpler Weighted Round Robin scheme [3].
Fig. 4: Sample SRR Service Curve Shown in SRR Demo
Fig. 5: Sample WRR Service Curve
VI. FAIRNESS
In general, we may assess the fairness of a service graph visually using two criteria. First, a global measure of fairness may be obtained from the number of packets served from each flow by the end of the round. These counts should be directly proportional to the weights of the individual flows. From Figs. 4 and 5, we observe that both WRR and SRR are capable of providing this long-term fairness.
This may not be sufficient, however. Short-term or local fairness is also a concern. We may interpret this measure of fairness based on how smooth or jagged the service curve is; in the ideal (GPS) case, every service curve is a straight line. Based on this, we can observe that SRR provides good short-term fairness as well. The ‘smoothing’ effect that gives SRR its name is clearly visible in comparison to WRR.
Formal proofs of the fairness of SRR may be found in [1]. These show that the algorithm is globally fair: the service received by each flow at the end of each round is proportional to its weight. Additionally, these proofs provide bounds on the degree of unfairness within individual rounds.
Much of the potential for unfairness stems from the sequential service of the flows listed within a given column of the Weight Matrix. This is deemed acceptable because improving the handling within one or more columns would likely also increase the complexity of the scheduler.
This also suggests that the algorithm will perform best when the weights are distributed such that no column in the Weight Matrix has more than one entry. [1] refers to such a distribution of weights as being in the diagonal of the Weight Matrix. A strictly diagonal matrix is not required, however. For the purposes of the algorithm, any matrix with no column containing more than one entry should be sufficient. When each flow has a single nonzero coefficient, an equivalent diagonal matrix may be created simply by reordering the flows.
VII. ADDITIONAL PROPERTIES
For any scheduling algorithm, it is important that there be an upper bound on the maximum delay imposed on any packet or flow. Furthermore, it is desirable that these maximum delay values be inversely proportional to the weights of the flows: the delay for high-weight traffic is expected to be lower than that for traffic with a lower weight.
SRR only partially achieves this goal – it is not able to
guarantee delay bounds based solely on weight. For SRR, the
maximum delay imposed on a given flow will be determined by the binary coefficients of the flow's weight. The position of
these coefficients in the Weight Matrix will determine the
service order, and thus the maximum delay. Thus for SRR, the
delay will be inversely proportional to the weight, but also
proportional to the number of flows. This should be acceptable
for most applications, but will not be acceptable for guaranteed
real-time packet delivery or process scheduling in real-time
systems.
Scalability of SRR is quite good. The algorithm is not
affected by fluctuations in link speed, and the O(1)
Schedule function will not slow down as the number of
sources increases. To support increased traffic rates, either the order (and length) of the Weight Spread Sequence may be increased or a coarser weight granularity (a larger rate per unit of weight) may be used. A coarser granularity could provide slightly lower delay bounds at the cost of some flexibility, because the WSS will contain fewer terms. [1] recommends the use of a 32nd-order WSS. In this configuration, SRR is capable of handling traffic at up to 4 Tbps with 1 kbps granularity.
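Assuming the 4 Tbps figure refers to the largest rate a single flow's weight can express when each unit of weight is 1 kbps, it follows directly from the 32-column width of the Weight Matrix:

$$\left(2^{32} - 1\right) \times 1\ \text{kbps} = 4\,294\,967\,295\ \text{kbps} \approx 4.29\ \text{Tbps}$$

which rounds down to the 4 Tbps quoted above.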
VIII. EXPERIMENTAL RESULTS
In [1], simulation is used to provide a comparison between
the amount of delay when using SRR versus Weighted Fair
Queuing and Deficit Round Robin (DRR). The simulation was
conducted using the NS2 Network Simulator [4] software
package.
The simulated network consisted of a total of 12 hosts and
five routers. The routers were connected in sequence, with the
middle three acting as the network core. Two hosts were
connected to each of the outer core routers. Each edge router
was connected to four outside hosts. Each host or router was
assigned its own delay and bandwidth characteristics.
During the tests additional traffic was generated between
other hosts on the simulated network in a fixed to simulate
contention for resources. The traffic included real-time video,
ftp, and telnet traffic, and statistically random traffic generated
according to the Pareto distribution [5].
In addition to the baseline traffic, ten flows with Committed
Bit Rate between a fixed pair of outside hosts were added and
monitored. These hosts were selected from opposite ends of
the network and neither host was a sender or receiver of the
baseline traffic. Tests were performed using three different
weight configurations:
• Diagonalized rates set to 10, 20, 40, 80, ... 320 kbps
• Weights chosen at random:
   o 10, 20, 20, 40, 80, 80, 160, 260, 320 kbps
• Equal rates set to 100 kbps for all flows
The performance of SRR should be worst in the third case,
where SRR becomes functionally equivalent to DRR. This
occurs because there are no empty spaces in the nonzero columns of the Weight Matrix: every row is identical, so each column selected by the WSS contains every flow. In this configuration, the effect of the Weight Spread Sequence is nullified.
In the first two cases, SRR performs admirably, providing a
marked improvement over DRR and even approximating the
level of service provided by WFQ. Plots of the specific results
may be seen in [1]. While the data shown is consistent with the
expected results, only three test configurations are shown.
Also, no details about the configuration of SRR itself are
given. In particular, the granularity of the WSS should be
listed. If SRR is operating at 1 kbps resolution, then it is hard
to believe that the weights chosen in the second trial are fully
randomized, since all are multiples of ten. If operating at a resolution of 10 kbps, then only one row of the matrix will have more than one binary coefficient, since every value except 260 kbps (26 units = 16 + 8 + 2) is a power of two times 10 kbps. In either case, randomization could
be improved. However, the values as shown do succeed in
providing a middle point between the best-case and worst-case
scenarios. A Linux kernel-based implementation was also
developed and tested with similar results. These combined
results confirm the expectation that SRR can provide high
performance for non-real-time applications.
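The claim about 10 kbps resolution can be checked by counting set bits. The sketch assumes each rate is divided by a unit of 10 kbps (or 1 kbps) before being expressed in binary; `coefficients` and `multi_coefficient_rows` are hypothetical helpers.

```cpp
#include <cassert>
#include <vector>

// Number of nonzero binary coefficients (set bits) in a weight.
int coefficients(unsigned w) {
    int n = 0;
    for (; w != 0; w &= w - 1) ++n;  // clear the lowest set bit each pass
    return n;
}

// How many rows of the Weight Matrix would have more than one entry
// when each rate is expressed in multiples of unit_kbps.
int multi_coefficient_rows(const std::vector<unsigned>& rates_kbps,
                           unsigned unit_kbps) {
    int rows = 0;
    for (unsigned r : rates_kbps) {
        assert(unit_kbps > 0 && r % unit_kbps == 0);
        if (coefficients(r / unit_kbps) > 1) ++rows;
    }
    return rows;
}
```

At 10 kbps only 260 kbps (26 = 16 + 8 + 2) has more than one coefficient, while at 1 kbps every one of the nine listed rates has two (for example 10 = 8 + 2 and 260 = 256 + 4).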
IX. ADDITIONAL COMMENTS AND CONCLUSIONS
It is important to note that, although Smoothed Round
Robin is described as O(1), there may still be significant
variations in performance. While the Schedule function
itself will require a constant time, there is a possibility of some
additional overhead incurred by other operations. In particular,
the time complexity of Add_flow and Del_flow is linear in k, as their execution times depend on the order of the current Weight Spread Sequence. While the author of [1] does
acknowledge this fact, the title of the paper is somewhat
misleading. Practical operation of the scheduling system as a
whole must include the addition and deletion of flows. Since
the complexity of the system will be the greatest of all of the
component complexities, SRR could actually perform as
poorly as O(k). In practice, this will depend heavily on traffic
characteristics.
Future work on SRR might include attempts to improve the
performance of these two functions. The current
implementations are already simple and include several
optimizations. This suggests that it is unlikely that a significant
improvement will be found. However, any such discovery
could provide an immediate increase in performance.
There is also room for improvement in the handling of
inactive flows. For Schedule to be truly O(1), the call to
Del_flow should be removed. This will then require some
external method of removing expired flows.
Overall, SRR appears to provide a strong alternative to
existing schemes such as WRR and DRR. Because of its
inability to provide a strict scheduling delay bound, it is not a
substitute for true WFQ but may be far easier to implement.
SRR does a surprisingly good job of emulating GPS,
especially in light of its simplicity.
X. APPENDIX: IMPLEMENTATION OF DEMO SOFTWARE
A. Description of SRR Demo
The SRR Demo program is not intended to simulate a fully
functional SRR scheduler. Instead, it is meant as a simple tool
for visually comparing the service graphs of SRR with
different parameters, and for comparing the service graphs of
different scheduling algorithms. Screen shots of service curves
using this program are shown in Figs. 4 and 5. The program is
capable of showing service curves for Round Robin, Weighted
Round Robin, and Smoothed Round Robin scheduling.
The program was developed using Microsoft Visual C++ on
the Windows XP platform. Generic scheduler functions are
implemented in an abstract base class to allow easier addition
of new scheduling algorithms. All source code fragments
shown in this paper are taken directly from the SRR Demo
program.
Usage of the program is self-explanatory. The user may
choose between 1 and 10 flows, specifying the weight of each. Where possible, these weights are divided by their greatest common divisor before being displayed. The
output will be dynamically scaled based on the number of time
slots required for each round and the highest flow weight.
B. Obtaining the Software
Full source code and the executable version of the software
are available online. These may be downloaded from the
following URL:
http://www.southerington.com/souther/projects/srr/
REFERENCES
[1] C. Guo, "SRR: An O(1) Time-Complexity Packet Scheduler for Flows in Multiservice Packet Networks", IEEE/ACM Trans. Networking, vol. 12, pp. 1144-1155, Dec. 2004.
[2] S. Keshav, An Engineering Approach to Computer Networking, Reading, MA: Addison-Wesley, 1997, pp. 236-240.
[3] M. Shreedhar and G. Varghese, "Efficient fair queuing using Deficit Round Robin", IEEE/ACM Trans. Networking, vol. 4, pp. 375-385, June 1996.
[4] The Network Simulator -- ns2 [Online]. Available: http://www.isi.edu/nsnam/ns/
[5] E. Weisstein, Pareto Distribution [Online]. Available: http://mathworld.wolfram.com/ParetoDistribution.html