PAUL SOUTHERINGTON, ECE742, 28 APRIL 2005

The Smoothed Round-Robin Scheduler

Paul Southerington, Member, IEEE

Manuscript received April 28, 2005. Paul Southerington is a graduate student in Computer Engineering at George Mason University, Fairfax, VA 22030 USA (e-mail: [email protected]).

Abstract—This paper reviews and illustrates the concepts put forth in the original paper by Guo [1]. The Smoothed Round Robin scheduling algorithm is explained and discussed, and a sample Windows application for generation of SRR service curves is developed.

Index Terms—Scheduling, SRR, SRR Demo, Weight Matrix, WSS

I. INTRODUCTION

Smoothed Round Robin, or SRR, is a work-conserving packet scheduling algorithm that attempts to provide maximum fairness while maintaining only O(1) time complexity. This is in contrast to other approaches, such as Weighted Fair Queuing (WFQ), that use time stamps to approximate the results of a Generalized Processor Sharing (GPS) scheduler. Such methods often provide good fairness properties at the cost of increased algorithmic complexity; O(N) or O(log N) complexity is typical of these schemes. Furthermore, [1] suggests that such algorithms are unlikely to improve beyond O(log N) because of their dependence on sorting, yet this level of complexity may be unacceptable when operating at high speeds.

[1] lists four properties expected of a packet scheduler: low time complexity, fair treatment of different flows, a low upper bound on delay and low delay variation, and simplicity/efficiency of implementation. In practice, however, these are often conflicting needs. It is difficult to achieve high fairness and low delay bounds with simple, easy-to-implement algorithms. SRR does a remarkably good job of addressing this tradeoff.

Formally, [1] defines several distinct roles in providing service. The packet classifier, admission controller, and packet enqueuer are considered to be separate tasks from the packet scheduler itself and so are not included in the O(1) complexity measurement. In the case of SRR, flow management is also excluded from this measurement.

II. THE WEIGHT MATRIX

The SRR algorithm makes use of several important data structures. The first of these is the Weight Matrix. Each element in the Weight Matrix will be either zero or one. The matrix has one row corresponding to each data flow. The individual elements of each row are determined by the weight of the corresponding flow; it may be appropriate to normalize the weights first by dividing them all by their greatest common divisor.

The contents of a row are determined by representing the reduced weight as a series of binary coefficients. Such a series of coefficients is referred to as a Weight Vector. For example, a weight of 19 is equal to 16 + 2 + 1, which is equivalent to $2^4 + 2^1 + 2^0$. Thus:

$WV(19) = \{1, 0, 0, 1, 1\}$

The number of columns in the Weight Matrix is the number of binary digits in the largest weight:

$n_c = \lfloor \log_2(w_{f\max}) \rfloor + 1$

where $n_c$ is the number of columns of the matrix, and $w_{f\max}$ is the largest flow weight.

III. THE WEIGHT SPREAD SEQUENCE

In addition to the Weight Matrix, SRR defines the Weight Spread Sequence (WSS). The WSS is used to determine the service order of flows based on the contents of the matrix. For a given matrix with k columns, a WSS of order k will be needed. [1] uses the notation $S^k$ to represent such a WSS. The sequence of order 1 is defined as the single element $S^1 = 1$. Higher-order Weight Spread Sequences are defined recursively as:

$S^{k+1} = S^k, k+1, S^k$

In particular, it should be noted that the values of the Weight Spread Sequence are fixed: they do not depend on the number or weights of flows. However, the order of the WSS used is determined by the number of columns in the Weight Matrix; thus the order is determined by the flow weights.
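The binary decomposition from Section II can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the function names `columnCount`, `weightVector`, and `weightMatrix` are hypothetical and are not taken from SRR Demo or from [1], and weights are assumed to fit in 32 bits.

```cpp
#include <cassert>
#include <vector>

// Number of columns needed for the largest (reduced) weight,
// i.e. the number of binary digits in that weight.
int columnCount(unsigned maxWeight) {
    assert(maxWeight > 0);
    int n = 0;
    while (maxWeight > 0) {
        ++n;
        maxWeight >>= 1;
    }
    return n;
}

// Weight Vector of weight w over `cols` columns, most significant
// coefficient first, so element 0 is the leftmost column.
std::vector<int> weightVector(unsigned w, int cols) {
    assert(cols >= 1 && cols <= 32);
    std::vector<int> wv(cols, 0);
    for (int c = 0; c < cols; ++c) {
        wv[c] = (w >> (cols - 1 - c)) & 1u;
    }
    return wv;
}

// Weight Matrix: one Weight Vector (row) per flow, all rows padded
// to the width required by the largest weight.
std::vector<std::vector<int>> weightMatrix(const std::vector<unsigned>& weights) {
    unsigned maxW = 0;
    for (unsigned w : weights) {
        if (w > maxW) maxW = w;
    }
    const int cols = columnCount(maxW);
    std::vector<std::vector<int>> m;
    for (unsigned w : weights) {
        m.push_back(weightVector(w, cols));
    }
    return m;
}
```

Calling `weightVector(19, 5)` reproduces the vector for the weight of 19 used as the example in Section II. Storing rows as plain vectors is chosen here for readability only; it is not the linked-list layout that [1] suggests for an actual scheduler.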
    void CWSS::Fill(int k)
    {
        // Builds S^k in place as S^(k-1), k, S^(k-1)
        if (k > 1) Fill(k-1);
        value[length++] = k;
        if (k > 1) Fill(k-1);
    }

Fig. 1: C++ Implementation of WSS Calculation in SRR Demo

    1
    121
    1213121
    121312141213121
    1213121412131215121312141213121

Fig. 2: Weight Spread Sequences of Order 1 through 5

    int CSRR::ServiceSequence(int *seq, int size)
    {
        // Returns the size of the service sequence and fills
        // the array seq with the numbers of flows served.
        // Returns -1 if seq is too small to hold the sequence.
        int row, col;
        int i;      // Current position in the wss
        int t;      // Current time index

        t = 0;
        for (i = 0; i < wss.length; i++) {
            // Pick columns from wss
            col = wss.value[i] - 1;
            for (row = 0; row < num_flows; row++) {
                if (matrix[row][col] == 1) {
                    if (t >= size) return -1;   // check before writing
                    seq[t++] = row;
                }
            }
        }
        return t;
    }

Fig. 3: Simplified SRR Algorithm in C++ from SRR Demo

In C++ code, a WSS may be generated recursively as shown in Fig. 1. However, in an actual scheduler implementation, these sequences would most likely be precomputed and stored in ROM or other fixed form. Alternatively, there might be some benefit to a linked-list implementation in which pointers are used to insert and remove elements from the list.

Fig. 2 shows the first five Weight Spread Sequences. From visual inspection, it should be clear that the length of a Weight Spread Sequence of order k may be found as:

$|S^k| = 2^k - 1$

It should also be clear from Fig. 2 that a lower-order WSS may be formed from the first elements of a higher-order one, and that a given element i will appear in the WSS of order k a total of $2^{k-i}$ times. [1] refers to these properties as Fact 1 and Fact 2 respectively.

IV. SRR IMPLEMENTATION

A. Simplified Service Order

The order in which flows are served is determined by the WSS in use and the values in the columns of the Weight Matrix. Each value in the WSS represents a column of the matrix. Each nonzero element in the column represents a data flow that will be serviced whenever that column is selected.
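The column-selection rule can also be sketched without storing the WSS at all. The sketch below relies on an observation that follows from the recursive definition but is not stated in the text: the i-th term of the sequence (counting from 1) equals one plus the number of trailing zero bits of i. The names `wssTerm` and `serviceOrder` are hypothetical, flows are numbered from zero, and the simplified setting (fixed-length packets, always-backlogged flows) is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// i-th term (1-based) of the Weight Spread Sequence:
// one plus the number of trailing zero bits of i.
int wssTerm(unsigned i) {
    assert(i > 0);
    int term = 1;
    while ((i & 1u) == 0) {
        ++term;
        i >>= 1;
    }
    return term;
}

// Service order for one round of simplified SRR. Returns the
// zero-based flow index served in each time slot.
std::vector<int> serviceOrder(const std::vector<unsigned>& weights) {
    unsigned maxW = 0;
    for (unsigned w : weights) {
        if (w > maxW) maxW = w;
    }
    assert(maxW > 0);

    int k = 0;  // order of the WSS = number of matrix columns
    for (unsigned w = maxW; w > 0; w >>= 1) ++k;
    assert(k < 32);

    std::vector<int> order;
    const unsigned length = (1u << k) - 1;  // |S^k| = 2^k - 1
    for (unsigned i = 1; i <= length; ++i) {
        const int col = wssTerm(i);            // 1 = leftmost column
        const unsigned bit = 1u << (k - col);  // coefficient stored in that column
        for (std::size_t f = 0; f < weights.size(); ++f) {
            if (weights[f] & bit) order.push_back(static_cast<int>(f));
        }
    }
    return order;
}
```

For the weights 5, 2, 2, 3, 1 this yields a round of 13 slots in which each flow appears exactly as many times as its weight. The inner loop scans every flow for clarity, so this sketch is O(N) per WSS term; it illustrates the service order only and does not reproduce the constant-time behaviour claimed for the full algorithm.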
For each value in the WSS, the scheduler will service all flows that have nonzero terms in the column of the matrix with the same number as the current element of the sequence. In this way, SRR can be seen as an extension of the traditional Weighted Round Robin algorithm [2], where the flow weights are spread over the entire period of the scheduler, rather than being served sequentially within each round.

A simplified, conceptual version of the algorithm is shown in Fig. 3. In particular, the simplified version assumes fixed-length packets and a constantly backlogged queue. Since all flows are backlogged, the weight matrix will never change in this simplified scenario.

B. Service Order – Full Description

The full version of Smoothed Round Robin is somewhat more robust than described previously. To accommodate changes in traffic, SRR is capable of dynamically adjusting the weight matrix as needed. As additional flows arrive, additional rows will be added to the matrix. Flows may also leave if all of their traffic has been served or if they are manually removed.

Furthermore, SRR is able to make use of techniques used in Deficit Round Robin [2], [3] for dealing with packets that may not have fixed length. To do this, it maintains a deficit value for each flow. A flow that is denied service in one iteration will be allowed additional service time when that flow is next serviced.

Three actions are defined: Schedule, Add_flow, and Del_flow. These actions will be described in the following sections. Pseudocode for these algorithms may be found in [1].

C. The Schedule function

Each time a flow is serviced, the scheduler begins by adding the maximum packet length of the outgoing link to the current deficit value for that flow. If the new deficit value is less than or equal to zero, no further processing will be done on the flow until its next time quantum. Otherwise, the scheduler will begin transmitting packets.
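The deficit bookkeeping for a single visit to a flow can be sketched as follows. This is a minimal illustration in the style of Deficit Round Robin [3], not the Schedule pseudocode of [1]: the `Flow` structure and the function `serveVisit` are hypothetical, packets are represented only by their lengths in bytes, and clearing the deficit of an emptied flow is an assumption made here to stand in for the flow being deleted and later re-added with a zero deficit.

```cpp
#include <cassert>
#include <deque>

// Hypothetical per-flow state: queued packet lengths and the deficit.
struct Flow {
    std::deque<int> queue;  // lengths of queued packets, in bytes
    int deficit = 0;        // unused service credit, in bytes
};

// Serves one visit to a flow and returns the number of bytes sent.
int serveVisit(Flow& flow, int maxPacketLen) {
    assert(maxPacketLen > 0);
    flow.deficit += maxPacketLen;  // credit granted on every visit

    int sent = 0;
    // Send packets while the head-of-line packet fits in the credit.
    while (!flow.queue.empty() && flow.queue.front() <= flow.deficit) {
        const int len = flow.queue.front();
        flow.queue.pop_front();
        flow.deficit -= len;
        sent += len;
    }

    // Assumption: an emptied flow keeps no credit, matching the reset
    // performed when a flow is added to the scheduler again.
    if (flow.queue.empty()) flow.deficit = 0;
    return sent;
}
```

With a maximum packet length of 1000 bytes and queued packets of 700, 700, and 300 bytes, the first visit sends one packet and carries 300 bytes of credit forward, and the second visit sends the remaining two. Carrying the unused credit forward is what lets a flow that was denied service in one visit receive additional service in the next.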
Each time a packet is sent, the deficit value for the flow will be reduced by the length of the transmitted packet. The scheduler will continue to serve packets from the current flow as long as the deficit value remains greater than zero and the length of the next packet does not exceed the remaining deficit. If the current flow runs out of packets before the deficit reaches zero, the queue for that flow is empty and the flow will be deleted from the weight matrix.

Once the scheduler is finished with a given flow, it will move to the next usable (i.e., nonzero) value in the current column of the weight matrix. If no such value exists, the function will read the next value of the Weight Spread Sequence and move to the first usable value in the corresponding column of the matrix. If this column contains no usable entries, the scheduler will again move to the next element of the WSS, repeating this procedure as many times as necessary until it finds an entry corresponding to some flow. Once the next scheduled flow has been found, the Schedule function begins again.

[1] suggests representing the elements of each column as a doubly linked list. Each element of the matrix will contain pointers to the previous and next elements of its column in addition to the value identifying the flow to be served. This method of implementation is designed to reduce the time complexity of the algorithm by removing the need to scan each column element.

In this implementation, [1] argues that Schedule may be implemented with O(1) complexity. This is only partially true, because the current recommended implementation includes a call to Del_flow, which is a higher-complexity operation. This call should be removed from the main scheduling algorithm if possible. Use of a timer or other external mechanism for managing expired flows may be appropriate.

In [1], the author does not specifically state how the elements of the WSS should be represented.
Rather than storing the values as simple integers or flow ids, these could be stored as pointers directly into the weight matrix itself. Furthermore, it may be possible to perform column compression within the WSS by removing entries corresponding to columns that are empty. This would in effect create a modified WSS tailored to the specific traffic. This could result in a slight performance gain in the Schedule function at the cost of decreased performance and possibly increased complexity in the Add_flow and Del_flow functions.

D. Add_flow and Del_flow

Add_flow will be called whenever a new flow arrives at the scheduler. This could be a completely new flow, or merely a flow that has been deleted from the weight matrix due to inactivity.

The Add_flow function begins by resetting the deficit value for that flow to zero. It must then calculate the binary-coefficient representation of the flow's weight in order to form the new weight matrix. Rather than rebuilding the weight matrix from scratch, the new coefficients may be inserted as additional elements in the doubly linked lists described in the preceding section. Since the addition of a new flow may be seen as simply adding a new row to the bottom of the matrix, the function merely adds a new element at the end, or bottom, of the linked list for each column affected.

If the new flow has a larger weight than any existing flow, the operation of Add_flow may also add new columns to the left side of the matrix. If this occurs, a different WSS will be in use, so the function will jump to the equivalent position in the new WSS.

The Del_flow function is the inverse of Add_flow and operates in the same manner; it merely removes nodes from the columns of the Weight Matrix rather than adding them.

Since each of these functions may need to operate on the list elements for each column, the worst-case complexity will be proportional to the number of columns in the matrix.
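The per-column list operations can be sketched with the standard library. This is an illustration under several assumptions rather than the pseudocode of [1]: `WeightMatrix`, `addFlow`, `delFlow`, and `order` are hypothetical names, `std::list` stands in for the doubly linked lists, weights are limited to 32 bits, and columns are indexed by power of two (index 0 is the least significant coefficient) instead of left to right. Deficit handling and the jump to the equivalent WSS position are omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

constexpr std::size_t kMaxColumns = 32;  // assumption: 32-bit weights

struct WeightMatrix {
    // columns[n] lists the flows whose weight contains the term 2^n.
    std::array<std::list<int>, kMaxColumns> columns;

    // Handle to one list node, kept so a flow can be unlinked
    // from a column without scanning that column.
    struct Handle {
        std::size_t column;
        std::list<int>::iterator pos;
    };
    std::unordered_map<int, std::vector<Handle>> handles;
};

// Appends the flow to the bottom of every column named by its weight.
void addFlow(WeightMatrix& m, int flowId, unsigned weight) {
    assert(weight > 0 && m.handles.count(flowId) == 0);
    std::vector<WeightMatrix::Handle> hs;
    for (std::size_t n = 0; n < kMaxColumns && (weight >> n) != 0; ++n) {
        if ((weight >> n) & 1u) {
            m.columns[n].push_back(flowId);
            hs.push_back({n, std::prev(m.columns[n].end())});
        }
    }
    m.handles.emplace(flowId, std::move(hs));
}

// Unlinks the flow from every column it occupies.
void delFlow(WeightMatrix& m, int flowId) {
    auto it = m.handles.find(flowId);
    if (it == m.handles.end()) return;
    for (const WeightMatrix::Handle& h : it->second) {
        m.columns[h.column].erase(h.pos);
    }
    m.handles.erase(it);
}

// Order k of the WSS currently required: the highest
// non-empty column index plus one, or 0 for an empty matrix.
std::size_t order(const WeightMatrix& m) {
    for (std::size_t n = kMaxColumns; n > 0; --n) {
        if (!m.columns[n - 1].empty()) return n;
    }
    return 0;
}
```

Both functions touch at most one node per column, so their cost grows with the number of columns rather than with the number of flows. Keeping an iterator per node is what makes each removal a constant-time unlink; erasing a node from a `std::list` leaves iterators to the other nodes valid.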
The number of columns is approximately $\log_2$ of the maximum weight and is equal to the order k of the Weight Spread Sequence. The worst-case complexity is therefore O(k).

V. SAMPLE ROUND OF SRR

The following illustrates a sample round of Smoothed Round Robin. This example uses the simplified form of SRR and assumes fixed packet sizes and a constant configuration of always-backlogged flows. An additional example may be found in [1].

Consider the scheduling of five flows with desired rates of $r_1$ = 640 kbps, $r_2$ = 256 kbps, $r_3$ = 256 kbps, $r_4$ = 384 kbps, and $r_5$ = 128 kbps. We will assume that the capacity of the system is sufficient to service all requests (thus the system capacity C ≥ 1664 kbps). The greatest common divisor of these values is 128; the relative weights assigned will then be $w_1 = 5$, $w_2 = 2$, $w_3 = 2$, $w_4 = 3$, $w_5 = 1$.

Since the highest weight is five, we can tell immediately that the weight matrix will have only three columns. Since there are five sources, the matrix will have five rows. The complete matrix is:

$$WM = \begin{bmatrix} WV_1 \\ WV_2 \\ WV_3 \\ WV_4 \\ WV_5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$$

Because the weight matrix has three columns, we must use a Weight Spread Sequence of order 3:

$S^3 = 1, 2, 1, 3, 1, 2, 1$

In each round of SRR, the flows will be served in the following order:

f1, f2, f3, f4, f1, f1, f4, f5, f1, f2, f3, f4, f1

Each flow is served proportionally to its weight, but the packets served from each flow are spread over the entire round. Specifically, each flow f is served $w_f$ times during the round. This may be determined easily by inspection, but a short proof is provided in [1].

Fig. 4 shows the service distribution for this example, viewed in the SRR Demo application. In comparison, Fig. 5 shows the equivalent service curve when using the simpler Weighted Round Robin scheme [3].

Fig. 4: Sample SRR Service Curve Shown in SRR Demo

Fig. 5: Sample WRR Service Curve

VI. FAIRNESS

In general, we may observe the fairness property for a service graph visually based on two criteria. First, a global measure of fairness may be obtained from the number of packets served at the end of the round. These should be directly proportional to the weights of the individual flows. From Figs. 4 and 5, we observe that both WRR and SRR are capable of providing long-term fairness.

This may not be sufficient, however. Short-term or local fairness is also a concern. We may interpret this measure of fairness based on how smooth or jagged the service curve is. In the ideal case, all service curves should be straight lines. Based on this, we can observe that SRR can provide good short-term fairness as well. The ‘smoothing’ effect that gives SRR its name is clearly visible in comparison to WRR.

Formal proofs of the fairness of SRR may be found in [1]. These show that the algorithm is globally fair: the service rates at the end of each round will be proportional to the weights of the flows. Additionally, these proofs provide bounds on the degree of unfairness within individual rounds. Much of the possibility for unfairness stems from the sequential service of flows listed within a given column of the Weight Matrix. This is deemed acceptable because improvements to handling within one or more columns would likely also increase the complexity of the scheduler.

The above also suggests that the algorithm will perform best when the weights are distributed such that no column in the Weight Matrix has more than one entry. [1] refers to such a distribution of weights as being in the diagonal of the Weight Matrix. A diagonal matrix is not strictly required, however. For the purposes of the algorithm, any matrix that has no column containing more than one entry should be sufficient. An equivalent diagonal matrix may be created by simply reordering the flows.

VII. ADDITIONAL PROPERTIES

For any scheduling algorithm, it is important that there be an upper bound on the maximum delay imposed on any packet or flow. Furthermore, it is desired that these maximum delay values be inversely proportional to the weights of the flows. The delay for high-priority traffic is expected to be lower than that for traffic that has a lower weight. SRR only partially achieves this goal: it is not able to guarantee delay bounds based solely on weight.

For SRR, the maximum delay imposed on a given flow will be determined by the binary coefficients of the flow's weight. The position of these coefficients in the Weight Matrix will determine the service order, and thus the maximum delay. Thus for SRR, the delay will be inversely proportional to the weight, but also proportional to the number of flows. This should be acceptable for most applications, but will not be acceptable for guaranteed real-time packet delivery or process scheduling in real-time systems.

Scalability of SRR is quite good. The algorithm is not affected by fluctuations in link speed, and the O(1) Schedule function will not slow down as the number of sources increases. To support increased traffic rates, either the order (and length) of the Weight Spread Sequence may be increased or the granularity of the weights may be reduced. Reducing granularity could provide slightly lower delay bounds at the cost of some flexibility because the WSS will contain fewer terms. [1] recommends the use of a 32nd-order WSS. In this configuration, SRR is capable of handling traffic at up to 4 Tbps with 1 kbps granularity.

VIII. EXPERIMENTAL RESULTS

In [1], simulation is used to provide a comparison between the amount of delay when using SRR versus Weighted Fair Queuing and Deficit Round Robin (DRR). The simulation was conducted using the NS2 Network Simulator [4] software package. The simulated network consisted of a total of 12 hosts and five routers.
The routers were connected in sequence, with the middle three acting as the network core. Two hosts were connected to each of the outer core routers. Each edge router was connected to four outside hosts. Each host or router was assigned its own delay and bandwidth characteristics.

During the tests additional traffic was generated between other hosts on the simulated network in a fixed to simulate contention for resources. The traffic included real-time video, ftp, and telnet traffic, and statistically random traffic generated according to the Pareto distribution [5].

In addition to the baseline traffic, ten flows with Committed Bit Rate between a fixed pair of outside hosts were added and monitored. These hosts were selected from opposite ends of the network and neither host was a sender or receiver of the baseline traffic. Tests were performed using three different weight configurations:

- Diagonalized rates set to 10, 20, 40, 80, ... 320 kbps
- Weights chosen at random: 10, 20, 20, 40, 80, 80, 160, 260, 320 kbps
- Equal rates set to 100 kbps for all flows

The performance of SRR should be worst in the third case, where SRR becomes functionally equivalent to DRR. This occurs because there are no empty spaces in the Weight Matrix; every row is identical. In this configuration, the effect of the Weight Spread Sequence is nullified.

In the first two cases, SRR performs admirably, providing a marked improvement over DRR and even approximating the level of service provided by WFQ. Plots of the specific results may be seen in [1].

While the data shown is consistent with the expected results, only three test configurations are shown. Also, no details about the configuration of SRR itself are given. In particular, the granularity of the WSS should be listed. If SRR is operating at 1 kbps resolution, then it is hard to believe that the weights chosen in the second trial are fully randomized, since all are multiples of ten.
If operating at a resolution of 10 kbps, then only one row of the matrix will have more than one binary coefficient, since all values except for 260 are powers of two. In either case, randomization could be improved. However, the values as shown do succeed in providing a middle point between the best-case and worst-case scenarios.

A Linux kernel-based implementation was also developed and tested with similar results. These combined results confirm the expectation that SRR can provide high performance for non-real-time applications.

IX. ADDITIONAL COMMENTS AND CONCLUSIONS

It is important to note that, although Smoothed Round Robin is described as O(1), there may still be significant variations in performance. While the Schedule function itself will require a constant time, there is a possibility of some additional overhead incurred by other operations. In particular, the time complexity of Add_flow and Del_flow will be linear, as their execution times depend on the order of the current Weight Spread Sequence.

While the author of [1] does acknowledge this fact, the title of the paper is somewhat misleading. Practical operation of the scheduling system as a whole must include the addition and deletion of flows. Since the complexity of the system will be the greatest of all of the component complexities, SRR could actually perform as poorly as O(k). In practice, this will depend heavily on traffic characteristics.

Future work on SRR might include attempts to improve the performance of these two functions. The current implementations are already simple and include several optimizations. This suggests that it is unlikely that a significant improvement will be found. However, any such discovery could provide an immediate increase in performance.

There is also room for improvement in the handling of inactive flows. For Schedule to be truly O(1), the call to Del_flow should be removed. This will then require some external method of removing expired flows.
Overall, SRR appears to provide a strong alternative to existing schemes such as WRR and DRR. Because of its inability to provide a strict scheduling delay bound, it is not a substitute for true WFQ but may be far easier to implement. SRR does a surprisingly good job of emulating GPS, especially in light of its simplicity.

X. APPENDIX: IMPLEMENTATION OF DEMO SOFTWARE

A. Description of SRR Demo

The SRR Demo program is not intended to simulate a fully functional SRR scheduler. Instead, it is meant as a simple tool for visually comparing the service graphs of SRR with different parameters, and for comparing the service graphs of different scheduling algorithms. Screen shots of service curves using this program are shown in Figs. 4 and 5. The program is capable of showing service curves for Round Robin, Weighted Round Robin, and Smoothed Round Robin scheduling.

The program was developed using Microsoft Visual C++ on the Windows XP platform. Generic scheduler functions are implemented in an abstract base class to allow easier addition of new scheduling algorithms. All source code fragments shown in this paper are taken directly from the SRR Demo program.

Usage of the program is self-explanatory. The user may choose between 1 and 10 flows, specifying the weight of each. Where possible, these weights will be reduced by their greatest common divisor before being displayed. The output will be dynamically scaled based on the number of time slots required for each round and the highest flow weight.

B. Obtaining the Software

Full source code and the executable version of the software are available online. These may be downloaded from the following URL:

http://www.southerington.com/souther/projects/srr/

REFERENCES

[1] C. Guo, "SRR: An O(1) Time-Complexity Packet Scheduler for Flows in Multiservice Packet Networks", IEEE/ACM Trans. Networking, vol. 12, pp. 1144-1155, Dec. 2004.
[2] S. Keshav, An Engineering Approach to Computer Networking, Reading, MA: Addison-Wesley, 1997, pp. 236-240.
[3] M. Shreedhar and G. Varghese, "Efficient fair queuing using Deficit Round Robin", IEEE/ACM Trans. Networking, vol. 4, pp. 375-385, June 1996.
[4] The Network Simulator -- ns2 [Online]. Available: http://www.isi.edu/nsnam/ns/
[5] E. Weisstein, Pareto Distribution [Online]. Available: http://mathworld.wolfram.com/ParetoDistribution.html