EE384Y: Packet Switch Architectures
Part II: Scaling Crossbar Switches

High Performance Switching and Routing
Telecom Center Workshop: Sept 4, 1997

Nick McKeown
Professor of Electrical Engineering and Computer Science, Stanford University
[email protected]
http://www.stanford.edu/~nickm
2004, EE384y

Outline

Up until now, we have focused on high-performance packet switches with:
1. A crossbar switching fabric,
2. Input queues (and possibly output queues as well),
3. Virtual output queues, and
4. A centralized arbitration/scheduling algorithm.

Today we'll talk about the implementation of the crossbar switch fabric itself: how crossbars are built, how they scale, and what limits their capacity.

Crossbar switch

Limiting factors:
1. N^2 crosspoints per chip, or N N-to-1 multiplexors.
2. It's not obvious how to build a crossbar from multiple chips.
3. Capacity of "I/O"s per chip:
   • State of the art: about 300 pins, each operating at 3.125 Gb/s, i.e. roughly 1 Tb/s per chip.
   • About 1/3 to 1/2 of this capacity is available in practice because of overhead and speedup.
   • Crossbar chips today are limited by "I/O" capacity.

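The ~1 Tb/s figure is easy to check; a quick sketch using the slide's own estimates (not specs for any particular chip):

```python
# Aggregate chip I/O from the slide's numbers: 300 pins at 3.125 Gb/s,
# with 1/3 to 1/2 of the raw capacity usable after overhead and speedup.
pins = 300
gbps_per_pin = 3.125

raw_gbps = pins * gbps_per_pin      # 937.5 Gb/s, i.e. roughly 1 Tb/s
usable_lo = raw_gbps / 3            # pessimistic usable capacity
usable_hi = raw_gbps / 2            # optimistic usable capacity

print(f"raw: {raw_gbps} Gb/s, usable: {usable_lo:.1f}-{usable_hi:.1f} Gb/s")
```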
Scaling number of outputs:
Trying to build a crossbar from multiple chips

[Figure: a 16x16 crossbar switch assembled from 4x4 building blocks (4 inputs, 4 outputs each). Eight inputs and eight outputs per block are required!]

Scaling line-rate:
Bit-sliced parallelism

[Figure: a linecard's cell striped across k parallel crossbar planes (slices 1 through k), with a single scheduler controlling all planes.]

• Cell is "striped" across multiple identical planes.
• Crossbar switched "bus".
• Scheduler makes the same decision for all slices.

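The striping idea can be sketched in a few lines: send byte i of a cell on plane i mod k, then interleave the per-plane slices back together at the output (an illustrative model only; real designs slice in hardware, often at bit granularity):

```python
def stripe(cell: bytes, k: int) -> list:
    """Stripe a cell across k identical planes: byte i goes to plane i % k."""
    return [cell[p::k] for p in range(k)]

def reassemble(slices: list, cell_len: int) -> bytes:
    """Interleave the per-plane slices back into the original cell."""
    out = bytearray(cell_len)
    for p, s in enumerate(slices):
        out[p::len(slices)] = s
    return bytes(out)

cell = bytes(range(16))
slices = stripe(cell, 4)                    # 4 planes carry 4 bytes each
assert reassemble(slices, len(cell)) == cell
```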
Scaling line-rate:
Time-sliced parallelism

[Figure: each cell from a linecard is carried in its entirety by one of k crossbar planes.]

• A cell is carried by one plane; it takes k cell times.
• Scheduler is unchanged.
• Scheduler makes a decision for each slice in turn.

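A minimal sketch of the difference from bit-slicing: cells are handed whole to planes in round-robin order, so an individual cell takes k cell times to cross, while the aggregate rate is maintained because k cells are in flight at once (illustrative model only):

```python
def assign_planes(num_cells: int, k: int) -> list:
    """Round-robin time-slicing: cell c is carried whole by plane c % k."""
    return [c % k for c in range(num_cells)]

# 8 consecutive cells over k = 4 planes: planes 0..3 are used in turn,
# and each plane spends k cell times forwarding its cell.
print(assign_planes(8, 4))      # [0, 1, 2, 3, 0, 1, 2, 3]
```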
Scaling a crossbar

• Conclusion: scaling the capacity is relatively straightforward (although the chip count and power may become a problem).
• What if we want to increase the number of ports?
• Can we build a crossbar-equivalent from multiple stages of smaller crossbars?
• If so, what properties should it have?

3-stage Clos Network

[Figure: m first-stage switches (each n x k), k middle-stage switches (each m x m), and m third-stage switches (each k x n). Every first-stage switch has one link to every middle-stage switch, and every middle-stage switch has one link to every third-stage switch. N = n x m total ports; k >= n.]

With k = n, is a Clos network non-blocking like a crossbar?

Consider the example: the scheduler chooses to match
(1,1), (2,4), (3,3), (4,2).

With k = n, is a Clos network non-blocking like a crossbar?

Consider the example: the scheduler chooses to match
(1,1), (2,2), (4,4), (5,3), ...
By rearranging existing matches, the new connections could be added.
Q: Is this Clos network "rearrangeably non-blocking"?

With k = n, a Clos network is rearrangeably non-blocking

Routing matches is equivalent to edge-coloring in a bipartite multigraph:
• Colors correspond to middle-stage switches.
• Each vertex corresponds to an n x k or k x n switch.
• No two edges at a vertex may be colored the same.

Example: (1,1), (2,4), (3,3), (4,2)

Vizing '64: a D-degree bipartite graph can be colored in D colors.
Therefore, if k = n, a 3-stage Clos network is rearrangeably non-blocking (and can therefore perform any permutation).

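The coloring argument is constructive. Below is an illustrative sketch (not the slides' algorithm): each connection (input switch, output switch) is an edge, and middle switches are colors. A new edge takes a color free at its left endpoint; if that color is busy at the right endpoint, an alternating two-color path is flipped first. This is the standard alternating-path proof of the bipartite edge-coloring theorem turned into code.

```python
from collections import defaultdict

def edge_color(edges, D):
    """D-edge-color a bipartite multigraph with max degree <= D.
    edges: list of (left, right) pairs; returns one color (middle-stage
    switch index) per edge, with no color repeated at any vertex."""
    col = [None] * len(edges)
    atL = defaultdict(dict)     # atL[u][c] = edge carrying color c at left u
    atR = defaultdict(dict)     # likewise for right vertices

    for i, (u, v) in enumerate(edges):
        a = next(c for c in range(D) if c not in atL[u])   # free at u
        b = next(c for c in range(D) if c not in atR[v])   # free at v
        if a != b:
            # Flip the alternating a/b path starting at v so that color a
            # becomes free at v too (the path can never reach u).
            path, side, x, want = [], 'R', v, a
            while True:
                table = atR[x] if side == 'R' else atL[x]
                if want not in table:
                    break
                j = table[want]
                path.append(j)
                x = edges[j][0] if side == 'R' else edges[j][1]
                side = 'L' if side == 'R' else 'R'
                want = b if want == a else a
            for j in path:                       # remove old colors...
                uj, vj = edges[j]
                del atL[uj][col[j]], atR[vj][col[j]]
            for j in path:                       # ...then swap a <-> b
                uj, vj = edges[j]
                col[j] = b if col[j] == a else a
                atL[uj][col[j]] = atR[vj][col[j]] = j
        col[i] = a
        atL[u][a] = atR[v][a] = i
    return col

# n = 2: each of two input switches has two connections to route.
print(edge_color([(0, 0), (0, 1), (1, 0), (1, 1)], D=2))   # [1, 0, 0, 1]
```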
How complex is the rearrangement?

• Method 1: Find a maximum-size bipartite matching for each of D colors in turn: O(DN^2.5).
• Method 2: Partition the graph into Euler sets: O(N log D) [Cole et al. '00].

Edge-Coloring using Euler sets

• Make the graph regular: modify the graph so that every vertex has the same degree, D [combine vertices and add edges; O(E)].
• For D = 2^i, perform i "Euler splits" and 1-color each resulting graph. This is log D operations, each of O(E).

Euler partition of a graph

An Euler partition of graph G:
1. Each odd-degree vertex is at the end of one open path.
2. Each even-degree vertex is at the end of no open path.

Euler split of a graph

Euler split of G into G1 and G2:
1. Scan each path in an Euler partition.
2. Place alternate edges into G1 and G2.

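A sketch of one Euler split in code (illustrative; assumes the graph has already been made regular with even degree, as described above): walk closed trails through the multigraph and assign alternate edges to the two halves, which exactly halves every vertex's degree.

```python
from collections import defaultdict

def euler_split(edges):
    """Split a bipartite multigraph in which every vertex has even degree
    into two halves by placing alternate edges of each closed trail into
    alternate halves.  edges: list of (left, right) pairs; returns a 0/1
    half-assignment per edge."""
    adj = defaultdict(list)                 # ('L', u) / ('R', v) -> edge ids
    for i, (u, v) in enumerate(edges):
        adj[('L', u)].append(i)
        adj[('R', v)].append(i)
    used = [False] * len(edges)
    half = [0] * len(edges)
    ptr = defaultdict(int)                  # next candidate edge per vertex

    def take(key):
        """Return an unused edge at this vertex, or None."""
        lst = adj[key]
        while ptr[key] < len(lst) and used[lst[ptr[key]]]:
            ptr[key] += 1
        return lst[ptr[key]] if ptr[key] < len(lst) else None

    for start in list(adj):
        while take(start) is not None:
            key, parity = start, 0          # walk one closed trail
            while (i := take(key)) is not None:
                used[i] = True
                half[i] = parity
                parity ^= 1
                u, v = edges[i]
                key = ('R', v) if key[0] == 'L' else ('L', u)
            # the walk can only get stuck back at `start` (even degrees)
    return half

halves = euler_split([(0, 0), (0, 1), (1, 0), (1, 1)])
print(halves)   # every vertex keeps one edge in each half
```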
Implementation

[Figure: the request graph goes to the scheduler, which produces a permutation; a routing step then turns the permutation into paths through the middle stage.]

Implementation

Pros:
• A rearrangeably non-blocking switch can perform any permutation.
• A cell switch is time-slotted, so all connections are rearranged every time slot anyway.

Cons:
• Rearrangement algorithms are complex (in addition to the scheduler).

Can we eliminate the need to rearrange?

Strictly non-blocking Clos Network

Clos' Theorem: if k >= 2n - 1, then a new connection can always be added without rearrangement.

[Figure: the same 3-stage Clos network with the switches labeled: first-stage switches I1, ..., Im (each n x k), middle-stage switches M1, ..., Mk (each m x m), and third-stage switches O1, ..., Om (each k x n). N = n x m; k >= n.]

Clos Theorem

[Figure: input switch Ia and output switch Ob, each with n - 1 of their n ports already in use.]

1. Consider adding the n-th connection between 1st-stage Ia and 3rd-stage Ob.
2. We need to ensure that there is always some center-stage M available.
3. The n - 1 connections already in use at Ia occupy at most n - 1 middle switches, and likewise at Ob. So if k > (n - 1) + (n - 1), there is always an M available, i.e. we need k >= 2n - 1.

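The counting argument can be exercised in simulation; a sketch with illustrative parameters, where every new connection grabs an arbitrary free middle switch and is never rearranged:

```python
import random

def simulate(n, m, k, steps=5000, seed=0):
    """Randomly add/remove connections through a 3-stage Clos network
    (m switches per outer stage, n ports each, k middle switches),
    assigning each new connection an arbitrary free middle switch.
    Returns True if no connection was ever blocked."""
    rng = random.Random(seed)
    busy_in = [set() for _ in range(m)]    # middles in use at each input switch
    busy_out = [set() for _ in range(m)]   # middles in use at each output switch
    active = []                            # (input switch, output switch, middle)

    for _ in range(steps):
        add = rng.random() < 0.6
        if add:
            # pick switches that still have a spare port (n ports each)
            ins = [a for a in range(m) if len(busy_in[a]) < n]
            outs = [b for b in range(m) if len(busy_out[b]) < n]
            if not ins or not outs:
                add = False
        if add:
            a, b = rng.choice(ins), rng.choice(outs)
            free = set(range(k)) - busy_in[a] - busy_out[b]
            if not free:
                return False               # blocked!
            mid = rng.choice(sorted(free)) # arbitrary choice, no rearrangement
            busy_in[a].add(mid)
            busy_out[b].add(mid)
            active.append((a, b, mid))
        elif active:
            a, b, mid = active.pop(rng.randrange(len(active)))
            busy_in[a].remove(mid)
            busy_out[b].remove(mid)
    return True

# Clos' theorem guarantees that k = 2n - 1 middle switches always suffice.
assert simulate(n=3, m=4, k=5)
```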
Scaling Crossbars: Summary

• Scaling capacity through parallelism (bit-slicing and time-slicing) is straightforward.
• Scaling the number of ports is harder...
• Clos network:
  - Rearrangeably non-blocking with k = n, but routing is complicated.
  - Strictly non-blocking with k >= 2n - 1, so routing is simple, but it requires more bisection bandwidth.