Interconnection Network
• PRAM Model is too simple
• Physically, PEs communicate through the network
(either buses or switching networks)
• Cost depends on network topology
• Question:
– Should the user exploit the interconnection network topology?
– Does the user have the freedom to exploit the topology?
Mesh with Wraparound
Example: multiplying two n by n matrices on a mesh
Initially,
PE[i,j] has x[i,j] = a[i,j], y[i,j] = b[i,j] and c[i,j] = 0
Row i shifts its x data left (i-1) times
Column j shifts its y data up (j-1) times
At each step (n steps in total),
PE[i,j] do
{ c[i,j] = c[i,j] + x[i,j]*y[i,j]
Send x to the left neighbor (wraparound), send y to the upper neighbor (wraparound)
}
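The mesh algorithm above can be simulated sequentially to check the data movement. This is a minimal sketch, not a parallel implementation: the function name `cannon_multiply` is hypothetical, indices are 0-based (so row i is pre-shifted i times rather than i-1), and each "send" is modelled as rotating a whole array.

```python
def cannon_multiply(a, b):
    """Simulate the wraparound-mesh multiplication of two n x n matrices."""
    n = len(a)
    # Initial alignment (0-based): row i of x rotated left i times,
    # column j of y rotated up j times.
    x = [[a[i][(j + i) % n] for j in range(n)] for i in range(n)]
    y = [[b[(i + j) % n][j] for j in range(n)] for i in range(n)]
    c = [[0] * n for _ in range(n)]
    for _ in range(n):
        # Every PE multiplies its local pair and accumulates.
        for i in range(n):
            for j in range(n):
                c[i][j] += x[i][j] * y[i][j]
        # Send x to the left neighbor and y to the upper neighbor (wraparound).
        x = [[x[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        y = [[y[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return c


print(cannon_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

After the alignment, PE[i,j] holds a[i][k] and b[k][j] for the same k, and every step advances k by one, so n steps cover all n terms of c[i,j].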
How about transitive closure?
Matrix Multiplication
Step 1
a11,b11   a12,b12   a13,b13   a14,b14
a21,b21   a22,b22   a23,b23   a24,b24
a31,b31   a32,b32   a33,b33   a34,b34
a41,b41   a42,b42   a43,b43   a44,b44
Step 2: Rearrange Data
a11,b11   a12,b22   a13,b33   a14,b44
a22,b21   a23,b32   a24,b43   a21,b14
a33,b31   a34,b42   a31,b13   a32,b24
a44,b41   a41,b12   a42,b23   a43,b34
Step 3: Multiply Add and Move Data
Data move at cell ik (figure): running sum S, inputs aij and bjk
a11,b11   a12,b22   a13,b33   a14,b44
a22,b21   a23,b32   a24,b43   a21,b14
a33,b31   a34,b42   a31,b13   a32,b24
a44,b41   a41,b12   a42,b23   a43,b34
c21 = a22b21 + a21b11 + a24b41 + a23b31
Systolic Array Algorithm
[Figure: systolic array for the 4 by 4 product; the elements of a and b stream into the array in skewed order.]
How can a wraparound mesh be simulated on a regular mesh with
no more than a constant-factor loss in speed?
Tree Architecture
Applications: census functions, database, queue, stack
Tree Computation
Census function: a[1] + ... + a[n]
• Can you compute
s[i] = a[1] + a[2] + ... + a[i], for i = 1, ..., n?
(parallel prefix computation)
• Bottleneck: all data goes through the root
• How to solve: make the channels thicker toward the
top of the tree => fat tree
Example: Parallel Prefix Computation
Step 1: Upward phase
For each node, when it receives data from its left and right children:
sum = left + right
if the node is not the root, send sum to its parent
When the root receives data from its left and right children
{ send 0 to its left child
send left to its right child }
[Figure: binary tree whose eight leaves hold the data 1, 2, ..., 8.]
Step 2: Downward phase
When a nonleaf node receives sum from its parent {
send sum to its left child
send left + sum to its right child }
(left is the value received from the left child in the upward phase)
When a leaf node receives sum from its parent
then prefix = sum + data
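The two phases can be traced with a short sequential simulation. This is a sketch under stated assumptions: the name `tree_prefix` is hypothetical, the number of leaves is a power of two, and the tree is stored level by level instead of as communicating nodes. The root is treated as a node that receives 0, which gives the same messages as the root rule in the upward phase (0 to the left child, left to the right child).

```python
def tree_prefix(data):
    """Inclusive prefix sums via an upward and a downward pass over a binary tree."""
    n = len(data)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two length"
    # Upward phase: levels[0] holds the leaves, levels[-1] the root's sum.
    levels = [list(data)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[2 * k] + prev[2 * k + 1] for k in range(len(prev) // 2)])
    # Downward phase: incoming[k] is the value node k receives from its parent.
    incoming = [0]
    for depth in range(len(levels) - 1, 0, -1):
        below = levels[depth - 1]
        nxt = []
        for k, s in enumerate(incoming):
            left = below[2 * k]      # sum received from the left child earlier
            nxt.append(s)            # send sum to the left child
            nxt.append(left + s)     # send left + sum to the right child
        incoming = nxt
    # Each leaf adds its own data to the value it received.
    return [s + d for s, d in zip(incoming, data)]


print(tree_prefix([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 3, 6, 10, 15, 21, 28, 36]
```

Each phase takes one step per tree level, so on a real tree machine the whole computation needs about 2 log n communication steps.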
[Figure: the two phases on the tree with leaf data 1, 2, ..., 8; the leaves end with the prefix sums 1, 3, 6, 10, 15, 21, 28, 36.]
Disadvantages of Trees:
• Small bisection width
• The root can become a bottleneck
Properties of Interconnection Networks
– Small diameter: diam = max over u, v in V of d(u,v), where d(u,v) is the shortest-path distance
– Large bisection width
• Smallest number of edges whose removal divides G into two equal-size halves
– Fixed node degree
– Uniformity (symmetry)
• The graph looks the same from every vertex
– Incremental extendability: allows any size
– Scalable (graph): a larger one is easy to construct
• i.e., a smaller one can be obtained from the larger one by removing some nodes
and edges.
– hypercube, mesh
– shuffle-exchange network, de Bruijn graph
– Routing and collective communication
• one to all, all to all
– Embeddability
– Simple layout complexity (=> small bisection width: conflicts with the large bisection width goal)
– Fault tolerance
Fat Tree
CM5 Data Link
Hypercube
• One way of solving the bottleneck of tree
and large diameter of mesh
• Recursively defined as follows:
Hn : two copies of Hn-1, with each node of one copy connected to the corresponding node of the other
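The recursive construction (two copies of the next smaller hypercube, joined by edges between matching nodes) can be written down directly. A minimal sketch; the function name `hypercube_edges` and the integer node labels are assumptions made for illustration.

```python
def hypercube_edges(n):
    """Edge set of the n-dimensional hypercube, built recursively."""
    if n == 0:
        return set()                      # H0 is a single node with no edges
    sub = hypercube_edges(n - 1)
    top = 1 << (n - 1)                    # new highest address bit
    edges = set(sub)                                     # copy with top bit 0
    edges |= {(u | top, v | top) for u, v in sub}        # copy with top bit 1
    edges |= {(u, u | top) for u in range(top)}          # links between copies
    return edges


print(len(hypercube_edges(3)))  # 12
```

Every edge joins two addresses that differ in exactly one bit, and the edge count is n * 2^(n-1). The 2^(n-1) links between the two copies are what give the large bisection width.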
Hypercube
[Figure: hypercube, with labels 0 and 1.]
• Large bisection width
• Small radius
• High fault tolerance
• But node degree is too high
Interconnection
Mapping Mesh onto a hypercube
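• A[i,j] on mesh -> A(gray(i)·gray(j)) on hypercube, where · concatenates the two Gray-code bit strings
• A[i,j+1] on mesh -> A(gray(i)·gray(j+1)) on hypercube, which is
connected to A(gray(i)·gray(j)) because consecutive Gray codes differ in exactly one bit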
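The Gray-code mapping can be checked with a few lines. A sketch assuming the binary-reflected Gray code and mesh dimensions that are powers of two; `mesh_to_hypercube` and `col_bits` are hypothetical names, and the concatenation is done by shifting the row code left by the number of column bits.

```python
def gray(i):
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def mesh_to_hypercube(i, j, col_bits):
    """Hypercube address of mesh cell (i, j): gray(i) followed by gray(j)."""
    return (gray(i) << col_bits) | gray(j)


# 4 x 8 mesh mapped into a 5-dimensional hypercube (2 row bits, 3 column bits).
for j in range(8):
    print(format(mesh_to_hypercube(1, j, 3), "05b"))
```

Neighbouring cells in a row share gray(i) and differ in one bit of gray(j), so they land on adjacent hypercube nodes. The same holds along a column.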
Mapping a binary tree on a hypercube
Hypercube Data Move Example
• Reversing a list
Before: PE[i] has A[i]
After: PE[i] has A[n-i-1], 0 <= i <= n-1
Reverse (Hk) {
Swap A across the highest bit (each PE exchanges with the PE whose address differs in that bit)
Reverse the two Hk-1 in parallel }
• Matrix Transpose
A[i,j] -> A[j,i]
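The list reversal above unrolls into one exchange per dimension, from the highest bit down to the lowest. A sequential sketch with a hypothetical function name, assuming n is a power of two; each list comprehension stands for one parallel exchange step.

```python
def hypercube_reverse(a):
    """Reverse a list of n = 2^k items using one neighbor exchange per dimension."""
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two length"
    k = n.bit_length() - 1
    vals = list(a)
    for bit in range(k - 1, -1, -1):
        # Every PE i takes the value held by its neighbor across this dimension.
        vals = [vals[i ^ (1 << bit)] for i in range(n)]
    return vals


print(hypercube_reverse([0, 1, 2, 3, 4, 5, 6, 7]))  # [7, 6, 5, 4, 3, 2, 1, 0]
```

This works because n-i-1 is the bitwise complement of i in k bits, so flipping every address bit once moves A[i] to PE[n-i-1] in log n steps.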
Shuffle Exchange Network
[Figure: shuffle-exchange network on the eight nodes 000 through 111.]
Mesh of Trees
2D with 16 nodes
Cube Connected Cycles
[Figure: a hypercube node of dimension 4 and the CCC cycle that replaces it; edge labels e1 through e5.]
2n nodes
r2n nodes
r = logn
Cube Connected Cycles
[Figure: CCC with node labels from 0000 to 1111.]
CCC
• Large bisection width
• Scalable
• Small diameter
• Can simulate hypercube
Simulation of Hypercube using CCC
• Divide and Conquer Algorithm communication pattern
Ascend: d=1, d=2, d=4, d=8, ..., d=n/2
Descend: d=n/2, ..., d=4, d=2, d=1
• example :
– merging
– Sorting
– FFT
• For this type of data movement, CCC can simulate
hypercube data move without any penalty
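As an illustration of the ascend pattern, the sketch below sums n values so that every PE ends with the total. It simulates the hypercube communication pattern only, not the CCC itself; the name `ascend_sum` is hypothetical and n is assumed to be a power of two.

```python
def ascend_sum(values):
    """All-to-all sum using the ascend pattern d = 1, 2, 4, ..., n/2."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two length"
    vals = list(values)
    d = 1
    while d < n:
        # PE i combines its value with that of the PE at distance d (address i XOR d).
        vals = [vals[i] + vals[i ^ d] for i in range(n)]
        d *= 2
    return vals


print(ascend_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # every PE holds 36
```

Each step uses a single hypercube dimension, and the dimensions are visited in order. That regular order is the property the CCC simulation relies on.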
De Bruijn Graph
(x_{n-1}, x_{n-2}, ..., x_0) -> (x_{n-2}, ..., x_0, 0) and
-> (x_{n-2}, ..., x_0, 1)
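The edge rule is a left shift that drops the top bit and appends 0 or 1, which is why the graph is associated with shift registers. A small sketch with nodes encoded as n-bit integers; the function name is hypothetical.

```python
def de_bruijn_successors(x, n):
    """The two out-neighbors of node x in the binary de Bruijn graph on n-bit labels."""
    mask = (1 << n) - 1
    shifted = (x << 1) & mask      # drop the top bit, shift the rest left
    return shifted, shifted | 1    # append 0 or append 1


for x in range(8):
    a, b = de_bruijn_successors(x, 3)
    print(format(x, "03b"), "->", format(a, "03b"), format(b, "03b"))
```

Every node has out-degree 2 and in-degree 2. Nodes 00...0 and 11...1 each have a self-loop.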
Highly recursive
Linear Shift Register
Lock Combination
[Figure: graphs labelled D=0, D=1, D=2.]
Multistage Interconnection Network
• Blocking Networks
– Unidirectional MIN
– Bidirectional MIN
• Non Blocking Networks
Any input port can be connected to any free output port
without affecting the existing connections.
– 2D mesh Crossbar
– Time Division bus
– Clos network.