ESE680-002 (ESE534):
Computer Organization
Day 16: March 14, 2007
Interconnect 4: Switching
Penn ESE680-002 Spring2007 -- DeHon
Previously
• Used Rent’s Rule characterization to
understand wire growth
$IO = c N^p$
• Top bisections will grow as $N^p$
• 2D wiring area grows as
$N^p \times N^p = N^{2p}$
We Know
• How we avoid $O(N^2)$ wire growth for
“typical” designs
• How to characterize locality
• How we might exploit that locality to
reduce wire growth
• Wire growth implied by a characterized
design
Today
• Switching
– Implications
– Options
Switching:
How can we use the locality captured
by Rent’s Rule to reduce switching
requirements? (How much?)
Observation
• Locality that saved us wiring,
also saves us switching
$IO = c N^p$
Consider
• Crossbar case to exploit wiring:
– split into two halves, connect with limited wires
– $N/2 \times N/2$ crossbar each half
– $N/2 \times (N/2)^p$ connect to bisection wires
– $2(N^2/4) + 2(N/2)^{p+1} < N^2$
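As a quick numeric check of the inequality above, the following sketch counts crosspoints for a flat crossbar and for the design split once into two halves. The function names and the choice p = 2/3 are illustrative assumptions, not taken from the slides.

```python
def flat_crossbar_switches(n):
    """Crosspoints in a full n x n crossbar."""
    return n * n


def split_crossbar_switches(n, p):
    """Crosspoints after one split: two (n/2 x n/2) crossbars plus two
    (n/2) x (n/2)^p connections to the bisection wires."""
    half = n / 2
    return 2 * half ** 2 + 2 * half ** (p + 1)


if __name__ == "__main__":
    p = 2 / 3  # assumed Rent exponent, for illustration only
    for n in (16, 256, 4096):
        print(n, flat_crossbar_switches(n), round(split_crossbar_switches(n, p)))
```

For p = 1 the two counts coincide, which matches the intuition that without locality the split saves nothing.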
Recurse
• Repeat at each level
– form tree
Result
• If use crossbar at each tree node
– $O(N^{2p})$ wiring area
• for $p > 0.5$, direct from bisection
– $O(N^{2p})$ switches
• top switch box is $O(N^{2p})$
– $2\,W_{top} \times W_{bot} + (W_{bot})^2$
– $2\,(N^p \times (N/2)^p) + (N/2)^{2p}$
– $N^{2p}\,(2/2^p + 1/2^{2p})$
Result
• If use crossbar at each tree node
– $O(N^{2p})$ wiring area
• for $p > 0.5$, direct from bisection
– $O(N^{2p})$ switches
• top switch box is $O(N^{2p})$
– $N^{2p}\,(2/2^p + 1/2^{2p})$
• switches at one level down is
– $2\,(N/2)^{2p}\,(2/2^p + 1/2^{2p})$
– $2\,(1/2^p)^2\,\left(N^{2p}\,(2/2^p + 1/2^{2p})\right)$
– $2\,(1/2^p)^2 \times$ previous level
Result
• If use crossbar at each tree node
– $O(N^{2p})$ wiring area
• for $p > 0.5$, direct from bisection
– $O(N^{2p})$ switches
• top switch box is $O(N^{2p})$
– $N^{2p}\,(2/2^p + 1/2^{2p})$
• switches at one level down is
– $2\,(1/2^p)^2 \times$ previous level
– $(2/2^{2p}) = 2^{(1-2p)}$
– coefficient < 1 for $p > 0.5$
Result
• If use crossbar at each tree node
– $O(N^{2p})$ switches
• top switch box is $O(N^{2p})$
• switches at one level down is
$2^{(1-2p)} \times$ previous level
• Total switches:
$N^{2p}\,(1 + 2^{(1-2p)} + 2^{2(1-2p)} + 2^{3(1-2p)} + \dots)$
– get geometric series; sums to O(1)
$N^{2p}\,\left(1/(1 - 2^{(1-2p)})\right) = \dfrac{2^{(2p-1)}}{2^{(2p-1)} - 1} \times N^{2p}$
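The geometric series can be checked numerically. The sketch below sums the per-level switch counts over a finite tree and compares the result with the closed-form limit. It normalizes the top switch box to N^(2p) exactly as the series on the slide does; the function names and sample values are illustrative assumptions.

```python
import math


def level_switches(n, p, level):
    """Switches at a given tree level, taking the top switch box as n^(2p):
    each level down multiplies the count by 2^(1-2p)."""
    return n ** (2 * p) * 2 ** (level * (1 - 2 * p))


def total_switches(n, p):
    """Sum the per-level counts from the root down to the leaves."""
    levels = int(math.log2(n))
    return sum(level_switches(n, p, k) for k in range(levels + 1))


def closed_form_bound(n, p):
    """Limit of the geometric series; only meaningful for p > 0.5."""
    r = 2 ** (2 * p - 1)
    return r / (r - 1) * n ** (2 * p)


if __name__ == "__main__":
    n, p = 1024, 0.75  # assumed example values
    print(total_switches(n, p), closed_form_bound(n, p))
```

The finite sum stays below the closed form and approaches it as the tree gets deeper, which is why the total remains a constant multiple of the top-level count.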
Good News
• Good news
– asymptotically optimal
– Even without switches, area $O(N^{2p})$
• so adding $O(N^{2p})$ switches does not change it
Bad News
• Switches area >> wire crossing area
– Consider $8\lambda$ wire pitch → crossing $64\lambda^2$
– Typical (passive) switch → $2500\lambda^2$
– Passive only: 40× area difference
• worse once rebuffer or latch signals.
• …and switches limited to substrate
– whereas can use additional metal layers
for wiring area
Additional Structure
• This motivates us to look beyond
crossbars
– can depopulate crossbars on up-down
connection without loss of functionality?
Can we do better?
• Crossbar too powerful?
– Does the specific down channel matter?
• What do we want to do?
– Connect to any channel on lower level
– Choose a subset of wires from upper level
• order not important
N choose K
• Exploit freedom to depopulate
switchbox
• Can do with:
– K(N-K+1) switches
– Vs. K  N
– Save ~K2
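One way to see that K(N-K+1) switches can suffice is a banded layout in which output j can reach inputs j through j+(N-K). The slide's figure is not reproduced in this text, so this layout is an assumed construction that happens to meet the count; the function names are illustrative. The sketch builds the switch set and checks exhaustively that every K-subset of inputs can be connected.

```python
from itertools import combinations


def banded_concentrator(n, k):
    """Return the switch set {(input, output)} for one n-choose-k design:
    output j can reach inputs j .. j + (n - k)."""
    return {(i, j) for j in range(k) for i in range(j, j + n - k + 1)}


def can_route_all_subsets(n, k):
    """Check that every k-subset of inputs reaches the k outputs.
    The j-th smallest chosen input is assigned to output j, which is
    allowed because the order of the subset does not matter."""
    switches = banded_concentrator(n, k)
    for subset in combinations(range(n), k):
        if any((i, j) not in switches for j, i in enumerate(subset)):
            return False
    return True


if __name__ == "__main__":
    n, k = 8, 3  # assumed small example
    print(len(banded_concentrator(n, k)), k * n, can_route_all_subsets(n, k))
```

The assignment works because the j-th smallest chosen input always lies between j and j+(N-K), so it falls inside output j's band.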
N-choose-M
• Up-down connections
– only require concentration
• choose M things out of N
– i.e. order of subset irrelevant
• Consequence:
– can save a constant factor ~ $2^p/(2^p-1)$
• $(N/2)^p \times N^p$ vs. $(N^p - (N/2)^p + 1)(N/2)^p$
• $p = 2/3$ → $2^p/(2^p-1) \approx 2.7$
• Similarly, Left-Right
– order not important → reduces switches
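The constant-factor saving on the up-down connections can be reproduced with a few lines. This is a sketch under the slide's own counts, with W_top = N^p and W_bot = (N/2)^p; the function names are illustrative assumptions.

```python
def full_updown_switches(n, p):
    """Fully populated up-down box: (N/2)^p x N^p crosspoints."""
    return (n / 2) ** p * n ** p


def concentrator_updown_switches(n, p):
    """N-choose-M population: (N^p - (N/2)^p + 1) * (N/2)^p crosspoints."""
    w_top, w_bot = n ** p, (n / 2) ** p
    return (w_top - w_bot + 1) * w_bot


def asymptotic_savings(p):
    """Large-N limit of the ratio of the two counts: 2^p / (2^p - 1)."""
    return 2 ** p / (2 ** p - 1)


if __name__ == "__main__":
    p = 2 / 3
    n = 2 ** 30  # assumed large N so the "+1" term is negligible
    print(full_updown_switches(n, p) / concentrator_updown_switches(n, p))
    print(asymptotic_savings(p))
```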
Multistage Switching
• We can route any permutation w/ fewer
switches than a crossbar
• If we allow switching in stages
– Trade increase in switches in path
– For decrease in total switches
Butterfly
• Log stages
• Resolve one bit per stage
[Figure: 16-input butterfly network; successive stages resolve bit3, bit2, bit1, bit0; outputs labeled 0000 through 1111]
What can a Butterfly Route?
[Figure: 16-output butterfly network with sources 0000 and 1000 marked]
• 0000 → 0001
• 1000 → 0010
Butterfly Routing
• Cannot route all permutations
– Get internal blocking
• What required for non-blocking network?
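The internal blocking can be illustrated with a small model of bit-by-bit routing. The sketch assumes a butterfly built from 2×2 switches in which each stage replaces one address bit with the destination's bit, most significant bit first; two routes block each other when they need the same wire after the same stage. The function names are illustrative.

```python
def butterfly_path(src, dst, bits):
    """Wire occupied after each stage when each stage fixes one address
    bit, most significant bit first (assumed stage order)."""
    wires = []
    addr = src
    for bit in reversed(range(bits)):
        mask = 1 << bit
        addr = (addr & ~mask) | (dst & mask)  # take this bit from the destination
        wires.append(addr)
    return wires


def blocked(routes, bits):
    """Return True if two routes need the same wire after the same stage."""
    used = set()
    for src, dst in routes:
        for stage, wire in enumerate(butterfly_path(src, dst, bits)):
            if (stage, wire) in used:
                return True
            used.add((stage, wire))
    return False


if __name__ == "__main__":
    # The pair of routes from the earlier slide: 0000 -> 0001 and 1000 -> 0010.
    print(blocked([(0b0000, 0b0001), (0b1000, 0b0010)], bits=4))
```

In this model both routes land on wire 0000 after the first stage, so the pair cannot be routed together even though the sources and destinations are distinct.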
Decomposition
Decomposed Routing
• Pick a link to route.
• Route to destination over red network
• At destination,
– What can we say about the link which shares the
final stage switch with this one?
– What can we do with link?
• Route that link
– What constraint does this impose?
– So what do we do?
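The questions above lead to a loop-following procedure: route a link over one half-network, force the link that shares its final-stage switch onto the other half, then force that link's first-stage partner back onto the first half, and continue until the loop closes. The sketch below is one possible rendering of that idea, not necessarily the exact procedure intended on the slide. It assumes ports 2i and 2i+1 share a 2×2 switch, and it labels the two half-networks 0 and 1 (the slide's red network would be one of them).

```python
import random


def split_permutation(perm):
    """Assign each connection src -> perm[src] to half-network 0 or 1 so that
    the two links sharing a first-stage or final-stage 2x2 switch differ.
    Ports 2i and 2i+1 are assumed to share a switch; len(perm) must be even."""
    n = len(perm)
    inverse = [0] * n
    for src, dst in enumerate(perm):
        inverse[dst] = src

    half = [None] * n
    for start in range(n):
        src = start
        while half[src] is None:
            half[src] = 0
            # The link sharing this destination's final-stage switch
            # must arrive over the other half-network.
            neighbor = inverse[perm[src] ^ 1]
            half[neighbor] = 1
            # The link sharing that neighbor's first-stage switch
            # must then use half-network 0 again.
            src = neighbor ^ 1
    return half


if __name__ == "__main__":
    random.seed(1)
    perm = random.sample(range(8), 8)
    print(perm, split_permutation(perm))
```

Each half-network then receives a half-size permutation, so the same split can be applied recursively.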
Decomposition
• Switches: $N/2 \times 2 \times 4 + 2(N/2)^2 < N^2$
Recurse
If it works once, try it again…
Result: Beneš Network
• $2\log_2(N) - 1$ stages (switches in path)
• Each stage made of N/2 2×2 switchpoints
– (4 switches)
• $4N\log_2(N)$ total switches
• Compute route in O(N log(N)) time
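The stage and switch counts follow from the recursive decomposition. The sketch below counts switches by recursion and compares with stages × N/2 × 4. It assumes N is a power of two and that a 2-input network is a single 2×2 switchpoint; the function names are illustrative.

```python
import math


def benes_switches(n):
    """Switch count from the recursive decomposition: an input and an output
    column of n/2 2x2 switchpoints (4 switches each) around two half-size
    networks; a 2-input network is a single 2x2 switchpoint."""
    if n == 2:
        return 4
    return 2 * (n // 2) * 4 + 2 * benes_switches(n // 2)


def benes_stages(n):
    """Number of switch stages on any path: 2*log2(n) - 1."""
    return 2 * int(math.log2(n)) - 1


if __name__ == "__main__":
    for n in (2, 8, 64, 1024):
        print(n, benes_stages(n), benes_switches(n), 4 * n * int(math.log2(n)))
```

The exact count is (2·log2(N) − 1)·2N, so the slide's 4N·log2(N) is the leading term and a slight overestimate.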
Beneš Network Wiring
• Bisection: N
• Wiring → $O(N^2)$ area (fixed wire layers)
Beneš Switching
• Beneš reduced switches
– $N^2$ to $N\log(N)$
– using multistage network
• Replace crossbars in tree with Beneš
switching networks
Beneš Switching
• Implication of Beneš Switching
– still require $O(W^2)$ wiring per tree node
• or a total of $O(N^{2p})$ wiring
– now $O(W\log(W))$ switches per tree node
• converges to O(N) total switches!
– $O(\log^2(N))$ switches in path across network
• strictly speaking, dominated by wire delay ~$O(N^p)$
• but constants make this of little practical interest except
for very large networks
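The convergence to O(N) total switches can be seen by summing the per-node count over the whole tree. The sketch below assumes a binary tree in which a node spanning m leaves has W = m^p channels, and it compares W·log(W) switches per node against a crossbar's W². The function names, the floor of one switch level for tiny channels, and the sample values are illustrative assumptions.

```python
import math


def tree_switch_total(n, p, per_node):
    """Sum per_node(W) over every node of a binary tree with n leaves,
    where a node spanning m leaves has W = m^p channels."""
    total = 0.0
    levels = int(math.log2(n))
    for height in range(levels + 1):
        nodes = n // (2 ** height)  # nodes whose subtree spans 2^height leaves
        width = (2 ** height) ** p
        total += nodes * per_node(width)
    return total


def benes_like(w):
    """About W*log2(W) switches, with at least one switch level."""
    return w * max(1.0, math.log2(w))


def crossbar(w):
    """W^2 switches for a fully populated crossbar."""
    return w * w


if __name__ == "__main__":
    p = 2 / 3  # assumed Rent exponent
    for n in (2 ** 10, 2 ** 20):
        print(n,
              tree_switch_total(n, p, benes_like) / n,
              tree_switch_total(n, p, crossbar) / n)
```

With p < 1 the switches-per-leaf figure for the W·log(W) case levels off as N grows, while the crossbar figure keeps growing.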
Better yet…
• Believe do not need Beneš on the up paths
• Single switch on up path
• Beneš for crossover
• Switches in path:
log(N) up
+ log(N) down
+ 2log(N) crossover
= 4 log(N)
= O(log(N))
Linear Switch Population
• Can further reduce switches
– connect each lower channel to O(1)
channels in each tree node
– end up with O(W) switches per tree node
Linear Switch (p=0.5)
Linear Population and Beneš
• Top-level crossover of p=1 is Beneš switching
Beneš Compare
• Can permute stage switches so local
shuffles on outside and big shuffle in middle
Linear Consequences:
Good News
• Linear Switches
– O(log(N)) switches in path
– $O(N^{2p})$ wire area
– O(N) switches
– More practical than Beneš crossover case
Linear Consequences:
Bad News
• Lacks guarantee can use all wires
– as shown, at least mapping ratio > 1
– likely cases where even a constant does not
suffice
• expect no worse than logarithmic
• Finding Routes is harder
– no longer linear time, deterministic
– open as to exactly how hard
Mapping Ratio
• Mapping ratio says
– if I have W channels
• may only be able to use W/MR wires
– for a particular design’s connection pattern
• to accommodate any design
– forall channels: physical wires ≥ MR × logical
Mapping Ratio
• Example:
– Shows MR=3/2
– For Linear Population, 1:1 switchbox
Area Comparison
[Figure: area comparison, both at p=0.67, N=1024: M-choose-N (perfect map) vs. Linear (MR=2)]
Area Comparison
• Since
– switch >> wire
• may be able to tolerate MR>1
• reduces switches
– net area savings
• Empirical:
– Never seen greater than 1.5
[Figure: area comparison of M-choose-N (perfect map) vs. Linear (MR=2)]
Expanders
Expander Theory
• (α,β)-expansion
– Any group of size k=αN will expand to
connect to a group of size βk=βαN in each
logical direction
[Arora, Leighton, Maggs
SIAM Journal on Computing, v25n3, p600, 1996]
Expander Idea
• IF we can achieve expansion
– Can guarantee non-blocking at each stage
• i.e.
– Guarantee use less than αN
– Guarantee connections to more stuff in
next level
– Since βαN > αN available in next level
• Guaranteed to be an available switch
Dilated Switches
• Have multiple outputs per logical
direction
– Dilation: number of outputs per direction
– E.g. radix 2 switch w/ 4 outputs
• 2 per direction
• Dilation 2
[Figure: dilation-2 switch; output directions Up (0) and Down (1)]
Dilated Switches allow Expansion
• On Right
– Any pair of nodes
connects to 3
switches
• Strictly speaking
must have d>2 for
expansion
Random Wiring
• Random, dilated wiring for butterfly can
achieve
– For tree… $2^{2p}$ (?)
[Upfal/STOC 1989,
Leighton/Leiserson/Kravets
MIT/LCS/RSS 8 1990]
Constraints
• Total load should not exceed α of net
– L = mapping ratio (light loading factor)
– αLW = number into each subtree
– $\alpha L W \ge W/2^p$
– $L \ge 1/(2^p \alpha)$
• Cannot expand past the size of subtree
– $\beta\alpha N \le N/2^p$
– $\beta\alpha \le 1/2^p$
Extra Switches
• Extra switch factor: d×L
• Try:
– β=2, α=1/10
– d=8
– dL ≈ 40 (p=1)
• Try:
– β=1.01, α=1/4 → d=6, L ≈ 2, dL ≈ 12 (p=1)
– β=1.01, α=1/4 → d=6, L ≈ 2.8, dL ≈ 17 (p=0.5)
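The numbers on this slide can be reproduced with a short calculation. The sketch takes L = 1/(2^p·α), which is the value the slide's examples correspond to, and also checks the expansion limit βα ≤ 1/2^p. The function names are illustrative assumptions.

```python
def light_loading_factor(alpha, p):
    """L = 1 / (2^p * alpha), the value used in the slide's examples."""
    return 1.0 / (2 ** p * alpha)


def expansion_fits(alpha, beta, p):
    """Check beta * alpha <= 1 / 2^p (cannot expand past the subtree)."""
    return beta * alpha <= 1.0 / 2 ** p


def extra_switch_factor(alpha, p, d):
    """Extra switches relative to the base tree: dilation d times L."""
    return d * light_loading_factor(alpha, p)


if __name__ == "__main__":
    # (beta, alpha, d, p) triples taken from the slide's examples.
    for beta, alpha, d, p in ((2, 0.1, 8, 1), (1.01, 0.25, 6, 1), (1.01, 0.25, 6, 0.5)):
        print(beta, alpha, d, p,
              expansion_fits(alpha, beta, p),
              round(extra_switch_factor(alpha, p, d), 2))
```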
Putting it Together
• Base, linear-population trees have O(N)
switches
• Make larger by a factor of L (linear factor)
• Dilated version have a factor of d more
switches
• Randomly wired expander
– Can have O(N) switches
– Guarantee routes
– Constants < 100 (looks like < 20)
– Open: how tight can make it?
Big Ideas
[MSB Ideas]
• In addition to wires, must have switches
– Have significant area and delay
• Rent’s Rule locality reduces
– both wiring and switching requirements
• Naïve switches match wires at $O(N^{2p})$
– switch area >> wire area
– prevent benefit from multiple layers of metal
Big Ideas
[MSB Ideas]
• Can achieve O(N) switches
– plausibly O(N) area with sufficient metal layers
• Switchbox depopulation
– save considerably on area (delay)
– will waste wires
– May still come out ahead (evidence to date)