Various communication networks
State of the art technology
Important aspects of routing schemes
Known results (theory)
The internet
Heiko Schröder, 2003
Parallel Architectures 1
Routing Models
•Store-and-forward (packet switching) model:
--The packet is the unit of transfer: one packet per edge per time unit
--Queues may build up in nodes; try to keep them short
•Circuit switching (path-lockdown):
--The entire path from source to destination is dedicated to the packet
•Wormhole routing
•Static routing problems: all packets are present when routing commences.
•(Dynamic routing: packets arrive at arbitrary times.)
Types of static routing problems:
General assumption: each processor sends only one packet
•One-to-one:
-- each packet has precisely one destination
-- at most one packet is destined for each processor
•Many-to-one: More than one packet can have same destination.
•One-to-many: A single packet can have more than one destination (copies).
•Hot spots = bottlenecks (example: many-to-one): try to avoid!
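The distinction between one-to-one and many-to-one problems, and the notion of a hot spot, can be checked mechanically. The following is a minimal sketch, assuming a routing problem is given as a list `dests` in which `dests[i]` is the destination of the single packet sent by processor `i`; the function names are illustrative and not part of the lecture.

```python
from collections import Counter

def is_one_to_one(dests):
    """True if at most one packet is destined for each processor."""
    return all(count == 1 for count in Counter(dests).values())

def hot_spots(dests):
    """Destinations that receive more than one packet (many-to-one bottlenecks)."""
    return sorted(d for d, count in Counter(dests).items() if count > 1)

# Processor i sends its packet to dests[i].
print(is_one_to_one([2, 0, 1]))   # a permutation: one-to-one
print(hot_spots([1, 1, 0, 1]))    # processor 1 is a hot spot
```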
Wormhole routing
•Used in clusters
Hot potato routing
Try to move as many packets as possible in a “good” direction
Very good average performance!
Hot potato routing on the internet
Greedy routing
•Move along the row to the correct column
•Then move along the column to the destination
•Queues are possible
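The two phases of greedy routing on a mesh can be sketched as follows. This is a minimal illustration, assuming nodes are addressed as `(row, column)` pairs; it only computes the path of a single packet and does not model queues or other packets. The function name is hypothetical.

```python
def greedy_mesh_path(src, dst):
    """Nodes visited by one packet: first along the row, then along the column."""
    (r, c), (tr, tc) = src, dst
    path = [(r, c)]
    while c != tc:                      # phase 1: move along the row to the correct column
        c += 1 if tc > c else -1
        path.append((r, c))
    while r != tr:                      # phase 2: move along the column to the destination
        r += 1 if tr > r else -1
        path.append((r, c))
    return path

print(greedy_mesh_path((0, 0), (2, 1)))
```

The path length always equals the Manhattan distance between source and destination, since the packet never moves away from its target.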
Butterfly network
•Applications: routing, FFT, sorting
•Unique path between any input and any output
Benes network
[Figure: Benes network, lines numbered 1 to 16]
Packet-Routing Algorithms
•Most important in parallel architectures
•Meshes have a large diameter
•Benes networks: fast routing, but no fast way of finding the paths is known
(paths might be computed off-line, which might not be suitable)
•On-line algorithms?
Greedy Routing – in BF
[Figure: butterfly with rows 000 to 111 and levels 0 to $\log N$ ($= k$)]
A packet from row $u_1u_2\dots u_k$ to row $v_1v_2\dots v_k$ corrects one address bit per level:
$(u_1u_2\dots u_{k-1}u_k,\ 0) \to (v_1u_2\dots u_{k-1}u_k,\ 1) \to (v_1v_2\dots u_{k-1}u_k,\ 2) \to \dots \to (v_1v_2\dots v_{k-1}v_k,\ k)$
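The greedy path in the butterfly replaces one address bit per level. A minimal sketch, assuming rows are stored as `k`-bit integers with $u_1$ as the most significant bit; the function name is illustrative only.

```python
def greedy_butterfly_path(u, v, k):
    """Nodes (row, level) on the unique greedy path from (u, 0) to (v, k)."""
    row = u
    path = [(row, 0)]
    for level in range(1, k + 1):
        mask = 1 << (k - level)            # selects bit number `level`, counted from the left
        row = (row & ~mask) | (v & mask)   # replace u_level by v_level
        path.append((row, level))
    return path

# From row 100 to row 011 in a butterfly with k = 3:
for row, level in greedy_butterfly_path(0b100, 0b011, 3):
    print(format(row, "03b"), level)
```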
Greedy Routing – Worst Cases
Route N packets in a butterfly: $\pi: [1,N] \to [1,N]$; example: bit-reversal permutation
[Figure: butterfly with rows 000 to 111 and levels 0 to $\log N$ ($= k$)]
Path of a packet whose source row ends in $(k+1)/2$ zeros ($k$ odd):
$(u_1u_2\dots u_{(k-1)/2}00\dots0,\ 0) \to (0u_2\dots u_{(k-1)/2}00\dots0,\ 1) \to \dots \to (00\dots0u_{(k-1)/2}00\dots0,\ (k-3)/2) \to (00\dots0000\dots0,\ (k-1)/2) \to (00\dots0000\dots0,\ (k+1)/2) \to (00\dots00u_{(k-1)/2}0\dots0,\ (k+3)/2) \to \dots \to (00\dots0u_{(k-1)/2}\dots u_2u_1,\ k)$
$2^{(k-1)/2} = \sqrt{N/2}$ paths go through the same node.
Time $\ge \sqrt{N/2} + \log N - 1$
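The congestion caused by the bit-reversal permutation can be counted directly for small butterflies. The sketch below is an illustration under the assumption that rows are `k`-bit integers with the first address bit most significant; it counts how many greedy paths pass through each node and does not simulate the queues themselves. All names are hypothetical.

```python
from collections import Counter

def bit_reverse(x, k):
    """Reverse the k-bit binary representation of x."""
    return int(format(x, f"0{k}b")[::-1], 2)

def greedy_rows(u, v, k):
    """Rows visited at levels 0..k on the greedy path from row u to row v."""
    row, rows = u, [u]
    for level in range(1, k + 1):
        mask = 1 << (k - level)
        row = (row & ~mask) | (v & mask)
        rows.append(row)
    return rows

def node_congestion(k):
    """Number of greedy paths through each node (row, level) for bit reversal."""
    load = Counter()
    for u in range(2 ** k):
        for level, row in enumerate(greedy_rows(u, bit_reverse(u, k), k)):
            load[(row, level)] += 1
    return load

k = 5
load = node_congestion(k)
print(load[(0, (k - 1) // 2)], 2 ** ((k - 1) // 2))   # both values are 4
```

For `k = 5` the node in row 00000 at level 2 carries $2^{(k-1)/2} = 4$ paths, which is also the maximum over all nodes.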
Oblivious Routing
Definition: A routing algorithm is called oblivious if the path of each packet depends only on the
addresses of the source and destination of that packet.
Example: Greedy routing.
Theorem: Let G=(V,E) be any N-node degree-d network. Then for every
oblivious routing algorithm there exists a 1-1 packet routing problem which
will take at least $\sqrt{N}/(2d)$ steps to complete.
Proof: see Leighton.
Thus a “good” routing algorithm cannot be oblivious (or greedy):
it has to take other packets and/or congestion into account.
Routing via sorting
Routing can be (and often is) done via sorting.
Merge-sort on the hypercube and hypercubic networks can be done
in time $O(\log^2 N)$. Much better results are known; it might be possible to sort
in time $O(\log N)$ (unknown for hypercubic networks).
If M<N keys need to be sorted it is advisable to
“pack” first, then sort, then “spread”.
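The pack, sort, spread scheme can be illustrated sequentially. This is only a sketch of the data movement, assuming a one-to-one problem in which entry `i` of `packets` is either `None` or a `(destination, payload)` pair; it says nothing about how each phase is realised on a particular network, and the function name is hypothetical.

```python
def route_via_sorting(packets, n):
    """Route M <= n packets on n processors by packing, sorting and spreading."""
    # "pack": move the M packets to the first M positions, keeping their order
    packed = [p for p in packets if p is not None]
    # sort the packed packets by destination address
    packed.sort(key=lambda p: p[0])
    # "spread": each packet moves from its sorted position to its destination
    out = [None] * n
    for dest, payload in packed:
        out[dest] = payload
    return out

print(route_via_sorting([None, (0, "a"), None, (3, "b"), (1, "c")], 5))
```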
Packing on the butterfly
[Figure: packets A to E in an 8-row butterfly (rows 000 to 111), before and after packing]
•Unique greedy paths give monotone packing without collisions. Proof?
•Destination unknown: first determine the destination!
[Figure legend: neighbor / not neighbor; distance < 4 / distance >= 4]
Prefix sum
Complete binary tree is sub-graph of butterfly
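For packing, the destination row of each packet is the number of packets in the rows above it, which is exactly a prefix sum over occupancy flags. The sketch below computes these destinations sequentially; it assumes a list `occupied` with one boolean per row and does not model the tree embedded in the butterfly. The function name is illustrative.

```python
from itertools import accumulate

def packing_destinations(occupied):
    """Destination row for each packet: its rank among the occupied rows."""
    flags = [1 if o else 0 for o in occupied]
    inclusive = list(accumulate(flags))          # inclusive prefix sums of the flags
    return [inclusive[i] - 1 if flags[i] else None for i in range(len(flags))]

# Packets in rows 0, 2, 3, 6, 7 are packed into rows 0 to 4.
print(packing_destinations([True, False, True, True, False, False, True, True]))
```

The destinations are strictly increasing in the source row, which is the monotone property the packing step relies on.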
Wrapped butterfly (WBF)
0/1 principle
If an oblivious comparison-exchange algorithm sorts all input sets consisting solely
of 0s and 1s, then it sorts all input sets with arbitrary values.
Proof (by contradiction):
Assume it sorts all input sets consisting solely of 0s and 1s, but
it fails to sort some sequence of arbitrary values.
Instead of the correct output:
$x_1 \le x_2 \le x_3 \le \dots \le x_{k-1} \le x_k \le \dots \le x_n$
it outputs:
$x_1 \le x_2 \le x_3 \le \dots \le x_{k-1} < x_r \dots x_k \dots$ (wrong position: $x_r$ with $r > k$ stands where $x_k$ belongs)
Now replace all $x_i$ with $i \le k$ with 0s and all others with 1s.
[Figure: comparison-exchange steps on $x_k$ with a partner $x_s$ (replaced by 1) or $x_c$ (replaced by 0); the 0 that replaces $x_k$ takes the same route as $x_k$]
A 0 ends up where $x_k$ ended up, i.e. in a wrong position (behind a 1): contradiction!
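The 0/1 principle turns the verification of an oblivious comparison-exchange algorithm into a finite check over $2^n$ inputs. A minimal sketch, assuming the algorithm is given as a list of comparators `(i, j)` with `i < j` that place the smaller value at position `i`; the 4-input network used here is a standard textbook example, not one taken from the lecture.

```python
from itertools import product

def apply_network(network, values):
    """Run the comparison-exchange steps in order on a copy of the input."""
    v = list(values)
    for i, j in network:
        if v[i] > v[j]:                  # exchange so that the smaller value sits at i
            v[i], v[j] = v[j], v[i]
    return v

def sorts_all_zero_one_inputs(network, n):
    """Check the network on all 2**n inputs consisting of 0s and 1s."""
    return all(apply_network(network, bits) == sorted(bits)
               for bits in product((0, 1), repeat=n))

good = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]
bad = good[:-1]                          # last comparator missing
print(sorts_all_zero_one_inputs(good, 4), sorts_all_zero_one_inputs(bad, 4))
```

By the principle, passing the 0/1 check is enough to conclude that `good` sorts arbitrary values, while `bad` already fails on the input 0, 1, 1, 0.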
Use 0/1-principle
Bitonic sequence: a concatenation of two sorted sequences (arbitrary length), sorted in opposite directions
Inductive proof: the butterfly sorts bitonic sequences
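The claim that a butterfly sorts bitonic sequences corresponds to the recursive bitonic merge: one compare-exchange stage between the two halves, followed by the same procedure on each half. The following is a sequential sketch, assuming the input length is a power of two; the function name is illustrative.

```python
def bitonic_merge(seq):
    """Sort a bitonic sequence whose length is a power of two."""
    n = len(seq)
    if n == 1:
        return list(seq)
    half = n // 2
    # one butterfly stage: element i is compared with element i + n/2
    lo = [min(seq[i], seq[i + half]) for i in range(half)]
    hi = [max(seq[i], seq[i + half]) for i in range(half)]
    # both halves are bitonic again, and every element of lo is <= every element of hi
    return bitonic_merge(lo) + bitonic_merge(hi)

print(bitonic_merge([1, 4, 6, 7, 5, 3, 2, 0]))
```

The recursion depth is $\log n$, matching the $\log n$ levels of the butterfly.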
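Inductive step
[Figure: one compare-exchange stage splits a bitonic 0/1 sequence of length n into a (min) half and a (max) half]
•Case 1: at least n/2 1s: the (min) half is bitonic, the (max) half consists of n/2 1s
•Case 2: at most n/2 1s: the (min) half consists of n/2 0s, the (max) half is bitonic
•Cases 3 & 4: the same with 0 and 1 exchanged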
Time/Area complexity?
For sorting on BF
Last merge: log n steps
previous merge: log n - 1 steps
...
first merge: 1 step
Total time: $(\log n + 1)\log n / 2$ steps
Time: $\Theta(\log^2 n)$
Area of butterfly: $\Theta(n^2)$ -- # of crossing points!
$AT^2 = \Theta(n^2 \log^4 n)$: not quite optimal.
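The total time follows from summing the merge lengths 1, 2, ..., log n, and the $AT^2$ figure from combining that time with the butterfly area. Written out as a short derivation (using the same symbols $n$, $A$ and $T$ as the slide):

```latex
T(n) = \sum_{i=1}^{\log n} i = \frac{(\log n + 1)\,\log n}{2} = \Theta(\log^2 n),
\qquad
A \cdot T^2 = \Theta(n^2) \cdot \Theta(\log^4 n) = \Theta(n^2 \log^4 n).
```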
Sorting on the ISA
Repeat log n times: vertical merge; horizontal merge.
[Figure: 2x2 and 4x2 blocks]
Horizontal merge
[Figure: in-shuffle and out-shuffle steps]
•out-shuffle: result?
•Only one dirty row: prove!
•sorted!
[Figure: in-shuffle; same number per column]
Time/Area complexity
for sorting on the mesh
Time for a merge step from k x k to 2k x 2k: Ck
Total time: $O(\sqrt{n}\,\log n)$ on a $\sqrt{n} \times \sqrt{n}$ mesh (remark: $\sqrt{n \log n}$ is possible)
Area: $n \log n$
$AT^2 = n^2 \log^3 n$ ($n^2 \log^2 n$ is possible)
AT:
BF: $AT = n^2 \log^2 n$
Mesh: $AT = n^{3/2} \log^2 n$
Warshall’s algorithm
for k:=1 to n do
  for i:=1 to n do
    for j:=1 to n do
      aij := F(aij, aik, akj)
[Figure: aij is updated from aik (same row) and akj (same column)]
Algebraic path problem.
Examples:
1.) reachability: $a_{ij} := a_{ij} \lor (a_{ik} \land a_{kj})$
-- start with adjacency matrix A
2.) all shortest paths: $d_{ij} := \min\{d_{ij},\ d_{ik} + d_{kj}\}$
-- start with distance matrix D
[also carry first/last node on path]
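The triple loop with a pluggable update function F can be written as a short sequential program. This is a sketch only, assuming matrices are lists of lists and using 0-based indices; the two wrappers instantiate F with the two example updates, and all function names are illustrative.

```python
def warshall(a, f):
    """Generic Warshall scheme: a[i][j] := f(a[i][j], a[i][k], a[k][j]) for all k, i, j."""
    n = len(a)
    a = [row[:] for row in a]            # work on a copy of the input matrix
    for k in range(n):
        for i in range(n):
            for j in range(n):
                a[i][j] = f(a[i][j], a[i][k], a[k][j])
    return a

def transitive_closure(adj):
    """Example 1: start with a boolean adjacency matrix."""
    return warshall(adj, lambda aij, aik, akj: aij or (aik and akj))

def all_shortest_paths(dist):
    """Example 2: start with a distance matrix (inf where there is no edge)."""
    return warshall(dist, lambda dij, dik, dkj: min(dij, dik + dkj))

INF = float("inf")
print(all_shortest_paths([[0, 3, INF], [INF, 0, 1], [2, INF, 0]]))
```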
Parallaxis versus ISA
[Figure: data movement of aik and akj towards aij, Parallaxis versus ISA]
Adjacency matrix in C
Instructions (only a suggestion):
A:=C AC
C:=CN
C:=A CA
C:=CE
V:=C VC
C:=CW
V:=CV 
C:=AV 
C:=CS