EE384Y: Packet Switch Architectures
Part II: Scaling Crossbar Switches

Nick McKeown
Professor of Electrical Engineering and Computer Science, Stanford University
[email protected]
http://www.stanford.edu/~nickm

Outline

Up until now, we have focused on high-performance packet switches with:
1. A crossbar switching fabric,
2. Input queues (and possibly output queues as well),
3. Virtual output queues, and
4. A centralized arbitration/scheduling algorithm.

Today we’ll talk about the implementation of the crossbar switch fabric itself. How are crossbars built, how do they scale, and what limits their capacity?

Crossbar switch: limiting factors

1. N^2 crosspoints per chip, or equivalently N N-to-1 multiplexers.
2. It is not obvious how to build a crossbar from multiple chips.
3. The "I/O" capacity per chip.

State of the art: about 300 pins, each operating at 3.125 Gb/s, or roughly 1 Tb/s per chip. Only about 1/3 to 1/2 of this capacity is available in practice because of overhead and speedup. Crossbar chips today are limited by "I/O" capacity.

Scaling the number of outputs: trying to build a crossbar from multiple chips

[Figure: a 16x16 crossbar switch assembled from 4x4 building blocks (4 inputs, 4 outputs each); to pass signals through to its neighbours, each building block ends up needing eight inputs and eight outputs.]

Scaling line rate: bit-sliced parallelism

[Figure: a linecard striping each cell across k parallel crossbar planes, all driven by one scheduler.]
• The cell is "striped" across multiple identical planes.
• The planes form a crossbar-switched "bus".
• The scheduler makes the same decision for all slices.

Scaling line rate: time-sliced parallelism

[Figure: a linecard sending successive cells to k crossbar planes in turn, again driven by one scheduler.]
• Each cell is carried by one plane and takes k cell times.
• The scheduler is unchanged: it makes a decision for each slice in turn.

Scaling a crossbar

Conclusion: scaling the capacity is relatively straightforward (although the chip count and power may become a problem). What if we want to increase the number of ports? Can we build a crossbar equivalent from multiple stages of smaller crossbars? If so, what properties should it have?

3-stage Clos network

[Figure: a 3-stage Clos network with m first-stage switches of size n x k, k middle-stage switches of size m x m, and m third-stage switches of size k x n; N = n x m, and k >= n.]

With k = n, is a Clos network non-blocking like a crossbar?

Consider the example in which the scheduler chooses to match (1,1), (2,4), (3,3), (4,2); then consider the example in which it chooses to match (1,1), (2,2), (4,4), (5,3), ... By rearranging existing matches, the connections could be added. Q: is this Clos network "rearrangeably non-blocking"?

With k = n, a Clos network is rearrangeably non-blocking

Routing matches is equivalent to edge-coloring a bipartite multigraph: each vertex corresponds to an n x k or k x n switch, each edge to a requested connection such as (1,1), (2,4), (3,3), (4,2), and each color to a middle-stage switch. No two edges at a vertex may be colored the same. Vizing '64: a bipartite graph of degree D can be colored with D colors. Therefore, if k = n, a 3-stage Clos network is rearrangeably non-blocking (and can therefore perform any permutation).

How complex is the rearrangement?

Method 1: find a maximum-size bipartite matching for each of the D colors in turn, O(D N^2.5).
Method 2: partition the graph into Euler sets, O(N log D) [Cole et al. '00].
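To make Method 1 concrete, here is a minimal Python sketch (not from the slides) that routes a complete permutation through a k = n Clos network by decomposing the demand multigraph into n perfect matchings, one per middle-stage switch. The port numbering (switch = port // n), the function name route_clos, and the simple Kuhn-style augmenting-path matcher are illustrative assumptions; the O(D N^2.5) bound above presumes a faster Hopcroft-Karp matcher.

from collections import defaultdict
import random

def route_clos(perm, n, m):
    """Route the permutation `perm` of range(n*m) through a 3-stage Clos
    network with m first/third-stage switches and k = n middle switches.
    Returns, for every input port, the middle-stage switch it uses."""
    # Bipartite multigraph of demands: one edge per port, joining its
    # first-stage switch (p // n) to its third-stage switch (perm[p] // n).
    ports = defaultdict(list)
    for p, q in enumerate(perm):
        ports[(p // n, q // n)].append(p)

    assignment = [None] * (n * m)        # input port -> middle-stage switch

    def augment(i, match_to, seen):
        """Kuhn-style augmenting path: try to match first-stage switch i."""
        for o in range(m):
            if ports[(i, o)] and o not in seen:
                seen.add(o)
                if match_to[o] is None or augment(match_to[o], match_to, seen):
                    match_to[o] = i
                    return True
        return False

    # The demand graph is n-regular, so it decomposes into n perfect
    # matchings; matching number c is carried by middle-stage switch c.
    for colour in range(n):
        match_to = [None] * m            # third-stage switch -> first-stage switch
        for i in range(m):
            augment(i, match_to, set())
        for o, i in enumerate(match_to):
            assignment[ports[(i, o)].pop()] = colour
    return assignment

# Example: a 16-port Clos built from 4x4 switches (n = m = 4).
perm = list(range(16)); random.shuffle(perm)
mid = route_clos(perm, 4, 4)
for c in range(4):
    # Each middle switch carries exactly one connection per first-stage
    # switch and one per third-stage switch, i.e. a perfect matching.
    assert len({p // 4 for p in range(16) if mid[p] == c}) == 4
    assert len({perm[p] // 4 for p in range(16) if mid[p] == c}) == 4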
Edge-coloring using Euler sets

Make the graph regular: modify the graph so that every vertex has the same degree D [combine vertices and add edges; O(E)]. For D = 2^i, perform i "Euler splits" and 1-color each resulting graph. This is log D operations, each of cost O(E).

Euler partition of a graph

In an Euler partition of a graph G:
1. Each odd-degree vertex is at the end of exactly one open path.
2. Each even-degree vertex is at the end of no open path.

Euler split of a graph

An Euler split of G into G1 and G2:
1. Scan each path in an Euler partition of G.
2. Place alternate edges into G1 and G2.

Implementation

Request graph -> Scheduler -> Permutation -> Route connections -> Paths

Pros:
• A rearrangeably non-blocking switch can perform any permutation.
• A cell switch is time-slotted, so all connections are rearranged every time slot anyway.

Cons:
• Rearrangement algorithms are complex (in addition to the scheduler).

Can we eliminate the need to rearrange?

Strictly non-blocking Clos network

Clos' theorem: if k >= 2n - 1, then a new connection can always be added without rearrangement.

[Figure: the same Clos network, with first-stage switches I1..Im, middle-stage switches M1..Mk, and third-stage switches O1..Om; N = n x m, k >= n.]

Clos' theorem

1. Consider adding the n-th connection between first-stage switch Ia and third-stage switch Ob, when n - 1 connections are already in use at that input switch and n - 1 at that output switch.
2. We need to ensure that there is always some middle-stage switch M available for the new connection.
3. The existing connections occupy at most (n - 1) + (n - 1) distinct middle-stage switches, so if k > (n - 1) + (n - 1) there is always an M available; i.e. we need k >= 2n - 1.

Scaling crossbars: summary

Scaling capacity through parallelism (bit-slicing and time-slicing) is straightforward. Scaling the number of ports is harder. A Clos network is:
• Rearrangeably non-blocking with k = n, but routing connections is complicated.
• Strictly non-blocking with k >= 2n - 1, so routing is simple, but it requires more bisection bandwidth.
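Returning to the Euler-split procedure from the edge-coloring slides above: here is a minimal Python sketch for the special case D = 2^i, assuming the demand multigraph is given as a list of (first-stage switch, third-stage switch) edges. The function names and data layout are assumptions made for this example; it shows only the splitting idea, not the O(N log D) machinery of Cole et al.

from collections import defaultdict

def euler_split(edges):
    """Split a bipartite multigraph in which every vertex has even degree
    into two halves, by walking its Euler circuits and dealing alternate
    edges to each half (the "Euler split" described above)."""
    adj = defaultdict(list)              # vertex -> ids of incident edges
    for e, (u, v) in enumerate(edges):
        adj[('L', u)].append(e)
        adj[('R', v)].append(e)
    used = [False] * len(edges)
    halves = ([], [])

    for start in list(adj):
        while adj[start]:
            circuit, v = [], start
            while True:
                while adj[v] and used[adj[v][-1]]:
                    adj[v].pop()         # drop edges already walked
                if not adj[v]:
                    break                # walk has closed; no unused edge here
                e = adj[v].pop()
                used[e] = True
                circuit.append(e)
                left, right = edges[e]
                v = ('R', right) if v[0] == 'L' else ('L', left)
            # Circuits in a bipartite graph have even length, so taking
            # alternate edges halves every vertex's degree in each half.
            for idx, e in enumerate(circuit):
                halves[idx % 2].append(edges[e])
    return halves

def colour_by_euler_splits(edges, D):
    """Edge-colour a D-regular bipartite multigraph, D a power of two."""
    if D == 1:
        return [edges]                   # a 1-regular graph is one colour class
    g1, g2 = euler_split(edges)          # each half is D/2-regular
    return colour_by_euler_splits(g1, D // 2) + colour_by_euler_splits(g2, D // 2)

# Example: a 4-regular demand graph between 3 first- and 3 third-stage switches.
edges = [(0, 0), (0, 0), (0, 1), (0, 2),
         (1, 0), (1, 1), (1, 1), (1, 2),
         (2, 0), (2, 1), (2, 2), (2, 2)]
for matching in colour_by_euler_splits(edges, 4):
    # Every colour class is a perfect matching: one middle-stage switch's load.
    assert len({u for u, _ in matching}) == 3 and len({v for _, v in matching}) == 3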
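Finally, the counting argument behind Clos' theorem is easy to check by simulation. The sketch below routes connections greedily with no rearrangement: with k >= 2n - 1 middle switches it should never find every middle switch busy, whereas with k = n it can block. The port numbering (switch = port // n), the event format, and the random arrival/departure process are assumptions made purely for illustration.

import random

def greedy_route(n, m, k, events):
    """Process a sequence of ('add', in_port, out_port) / ('del', in_port)
    events, assigning each new connection a middle-stage switch that is
    currently idle at both its first-stage and its third-stage switch.
    Returns False if this ever fails (i.e. rearrangement would be needed)."""
    in_busy = [set() for _ in range(m)]   # middle switches busy at each first-stage switch
    out_busy = [set() for _ in range(m)]  # middle switches busy at each third-stage switch
    route = {}                            # in_port -> (out_port, middle switch)
    for ev in events:
        if ev[0] == 'add':
            _, p, q = ev
            a, b = p // n, q // n
            free = set(range(k)) - in_busy[a] - out_busy[b]
            if not free:
                return False              # blocked without rearrangement
            c = min(free)
            in_busy[a].add(c); out_busy[b].add(c); route[p] = (q, c)
        else:
            _, p = ev
            q, c = route.pop(p)
            in_busy[p // n].discard(c); out_busy[q // n].discard(c)
    return True

def random_events(n, m, steps, rng):
    """Random arrivals and departures; each input and output port carries
    at most one connection at a time."""
    events, used_in, used_out = [], {}, set()
    for _ in range(steps):
        if used_in and rng.random() < 0.5:
            p = rng.choice(list(used_in))
            used_out.discard(used_in.pop(p))
            events.append(('del', p))
        else:
            free_in = [p for p in range(n * m) if p not in used_in]
            free_out = [q for q in range(n * m) if q not in used_out]
            if free_in and free_out:
                p, q = rng.choice(free_in), rng.choice(free_out)
                used_in[p] = q; used_out.add(q)
                events.append(('add', p, q))
    return events

rng = random.Random(1)
n = m = 4
# Clos' theorem says k = 2n - 1 = 7 should never block; k = n = 4 may block.
for k in (2 * n - 1, n):
    ok = all(greedy_route(n, m, k, random_events(n, m, 200, rng)) for _ in range(50))
    print(f"k = {k}: greedy routing {'never blocked' if ok else 'blocked at least once'}")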