Various communication networks
State of the art technology
Important aspects of routing schemes
Known results (theory)
The internet

Heiko Schröder, 2003, Parallel Architectures

Routing Models
•Store-and-forward (packet switching) model:
--The packet is the unit of transfer: one packet per edge per time unit
--Queues may build up in nodes; try to keep them short
•Circuit switching (path lockdown):
--The entire path from source to destination is dedicated to the packet
•Wormhole routing
•Static routing problems: all packets are present when routing commences.
•(Dynamic routing: packets arrive at arbitrary times.)

Types of static routing problems
General assumption: each processor sends only one packet.
•One-to-one:
--each packet has precisely one destination
--at most one packet is destined for each processor
•Many-to-one: more than one packet can have the same destination.
•One-to-many: a single packet can have more than one destination (copies).
•Hot spots = bottlenecks (example: many-to-one); try to avoid them!

Wormhole routing
•Used in clusters

Hot potato routing
•Try to move as many packets as possible in a “good” direction
•Very good average performance!
•Hot potato routing on the internet

Greedy routing
•Move along the row to the correct column
•Then move along the column
•Queues are possible

Butterfly network
•Routing, FFT, sorting
•Unique path

Benes network
[Figure: Benes network with inputs 1 to 16 and outputs 1 to 16]

Packet-Routing Algorithms
•Most important in parallel architectures
•Meshes have a large diameter
•Benes networks: fast routing, but no fast way of finding the paths is known (the paths might be computed off-line, which might not be suitable)
•On-line algorithms?
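The greedy routing rule from the slides above (move along the row to the correct column, then along the column) can be sketched in a few lines. This is only an illustration, assuming a 2D mesh whose nodes are addressed as (row, column); the names `greedy_mesh_path` and `max_edge_load` are hypothetical and not taken from the lecture. The edge load is a rough indicator of where queues could build up.

```python
from collections import Counter


def greedy_mesh_path(src, dst):
    """Nodes visited by row-first greedy routing (assumed 2D mesh, (row, col) addresses)."""
    (r, c), (dr, dc) = src, dst
    path = [(r, c)]
    # Phase 1: move along the row until the destination column is reached.
    while c != dc:
        c += 1 if dc > c else -1
        path.append((r, c))
    # Phase 2: move along the column until the destination row is reached.
    while r != dr:
        r += 1 if dr > r else -1
        path.append((r, c))
    return path


def max_edge_load(routes):
    """Largest number of greedy paths sharing one directed edge."""
    load = Counter()
    for src, dst in routes:
        path = greedy_mesh_path(src, dst)
        load.update(zip(path, path[1:]))
    return max(load.values(), default=0)
```

For example, `max_edge_load([((0, 0), (2, 2)), ((1, 2), (2, 2))])` counts two packets that both need the last edge into node (2, 2), which is the kind of many-to-one hot spot the slides warn about.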

Greedy Routing in the Butterfly (BF)
[Figure: butterfly with rows 000 to 111 and levels 0, 1, ..., $\log N$ $(= k)$]
•A packet travels from row $u_1 u_2 \dots u_k$ (level 0) to row $v_1 v_2 \dots v_k$ (level $k$), correcting one address bit per level:
$(u_1 u_2 \dots u_{k-1} u_k,\ 0) \to (v_1 u_2 \dots u_{k-1} u_k,\ 1) \to (v_1 v_2 \dots u_{k-1} u_k,\ 2) \to \dots \to (v_1 v_2 \dots v_{k-1} v_k,\ k)$

Greedy Routing: Worst Cases
•Route $N$ packets in a butterfly: $\pi : [1,N] \to [1,N]$
•Example: bit-reversal permutation
[Figure: butterfly with rows 000 to 111 and levels 0, 1, ..., $\log N$ $(= k)$]
•Path of the packet that starts in row $u_1 u_2 \dots u_{(k-1)/2}\,0 \dots 0$ ($k$ odd):
$(u_1 u_2 \dots u_{(k-1)/2}\,0 \dots 0,\ 0) \to (0\,u_2 \dots u_{(k-1)/2}\,0 \dots 0,\ 1) \to \dots \to (0 \dots 0\,u_{(k-1)/2}\,0 \dots 0,\ (k-3)/2) \to (0 \dots 0,\ (k-1)/2) \to (0 \dots 0,\ (k+1)/2) \to (0 \dots 0\,u_{(k-1)/2}\,0 \dots 0,\ (k+3)/2) \to \dots \to (0 \dots 0\,u_{(k-1)/2} \dots u_2 u_1,\ k)$
•$2^{(k-1)/2} = \sqrt{N/2}$ paths go through the node $(0 \dots 0,\ (k-1)/2)$
•Time: $\sqrt{N/2} + \log N - 1$

Oblivious Routing
Definition: A routing algorithm is called oblivious if the path of each packet depends only on the addresses of the packet's source and destination.
Example: Greedy routing.
Theorem: Let $G=(V,E)$ be any $N$-node degree-$d$ network. Then for every oblivious routing algorithm there exists a 1-1 packet routing problem which will take at least $\sqrt{N}/(2d)$ steps to complete.
Proof: see Leighton.
Thus a “good” routing algorithm cannot be oblivious (or greedy): it has to take other packets and/or congestion into account.

Routing via sorting
•Routing can be (and often is) done via sorting.
•Merge-sort on the hypercube and on hypercubic networks can be done in time $O(\log^2 N)$. Much better results are known; it might be possible to sort in time $O(\log N)$ (unknown for hypercubic networks).
•If $M<N$ keys need to be sorted, it is advisable to “pack” first, then sort, then “spread”.

Packing on the butterfly
[Figure: packets A to E, scattered over rows 000 to 111, are packed into consecutive rows]
•Unique greedy paths: monotone packing without collisions. Proof?
•Destination unknown: first determine the destination!
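The two packing steps above (first determine each destination, then follow the unique greedy path) can be checked experimentally. The sketch below is an illustration under stated assumptions, not the lecture's construction: destinations are taken to be the rank of each marked row among the marked rows, and the greedy path is assumed to correct the least significant address bit first. The slides do not fix the bit order; with this order the paths are node-disjoint, while correcting the most significant bit first can produce collisions. All function names are hypothetical.

```python
from itertools import combinations


def pack_destinations(marked):
    """Map each marked row to its rank among the marked rows (an exclusive prefix count)."""
    dest, count = {}, 0
    for row, flag in enumerate(marked):
        if flag:
            dest[row] = count
            count += 1
    return dest


def greedy_path(src, dst, k):
    """Rows visited at levels 0..k; bit t is corrected between level t and t+1.

    Assumption: least significant bit first.
    """
    path, row = [src], src
    for t in range(k):
        mask = 1 << t
        row = (row & ~mask) | (dst & mask)
        path.append(row)
    return path


def packing_is_collision_free(marked, k):
    """True if no two packets ever occupy the same row at the same level."""
    paths = [greedy_path(s, d, k) for s, d in pack_destinations(marked).items()]
    return all(
        p[level] != q[level]
        for p, q in combinations(paths, 2)
        for level in range(k + 1)
    )
```

Running `packing_is_collision_free` over all 256 subsets of the 8 rows of a 3-level butterfly is a cheap sanity check; it is not a substitute for the proof the slide asks for.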

[Figure legend: neighbor / not neighbor; distance < 4 / distance >= 4]

Prefix sum
•A complete binary tree is a subgraph of the butterfly

Wrapped butterfly (WBF)
[Figure: wrapped butterfly]

0/1 principle
If an oblivious comparison-exchange algorithm sorts all input sets consisting solely of 0s and 1s, then it sorts all input sets with arbitrary values.
Proof (by contradiction): Assume it sorts all input sets consisting solely of 0s and 1s, but fails to sort some sequence of arbitrary values.
Instead of the correct output $x_1 \le x_2 \le x_3 \le \dots \le x_{k-1} \le x_k \le \dots \le x_n$, it outputs $x_1, x_2, x_3, \dots, x_{k-1}, x_r, \dots, x_k, \dots$ with $r > k$ and $x_r > x_k$, so $x_k$ is in a wrong position.
Now replace all $x_i$ with $i \le k$ by 0s and all others by 1s.
[Figure: comparison-exchange steps on the original keys and on the 0/1 input, e.g. $x_k$ (0) against $x_s$ (1) and $x_k$ (0) against $x_c$ (0)]
Because the algorithm is oblivious, every comparison-exchange moves the 0s and 1s exactly as it moved the original keys. A 0 therefore ends up where $x_k$ ended up, i.e. in a wrong position: contradiction!

Use 0/1-principle
•Bitonic sequence: a concatenation of two sorted sequences (of arbitrary length) that are sorted in opposite directions
•Inductive proof: the butterfly sorts bitonic sequences

Inductive step
•Case 1: at least $n/2$ 1s: the max half consists of $n/2$ 1s, the min half is bitonic
•Case 2: at most $n/2$ 1s: the min half consists of $n/2$ 0s, the max half is bitonic
•Cases 3 & 4: 0 and 1 interchanged

Time/Area complexity for sorting on the BF
•Last merge: $\log n$ steps
•Previous merge: $\log n - 1$ steps
•...
•First merge: 1 step
•Total time: $(\log n + 1) \log n / 2$ steps
•Time: $\Theta(\log^2 n)$
•Area of butterfly: $\Theta(n^2)$ -- number of crossing points!
•$AT^2 = \Theta(n^2 \log^4 n)$: not quite optimal.

Sorting on the ISA
Repeat $\log n$ times: vertical merge; horizontal merge.

Sorting on the ISA
[Figure: merge stages 2x2 and 4x2]

in-shuffle; out-shuffle -- result?
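The bitonic merging argument and the step count from the slides above can be tried out in software. The following is a sequential sketch only, assuming input lengths that are powers of two; it models the compare-exchange pattern (element i against element i + n/2) rather than the butterfly hardware, and the function names are hypothetical. Following the 0/1 principle, testing all 0/1 inputs of a given length is enough to check this comparison network for that length.

```python
from itertools import product


def bitonic_merge(seq):
    """Sort a bitonic sequence (length a power of two) into ascending order."""
    n = len(seq)
    if n == 1:
        return list(seq)
    half = n // 2
    # One compare-exchange step: element i against element i + n/2.
    lo = [min(seq[i], seq[i + half]) for i in range(half)]
    hi = [max(seq[i], seq[i + half]) for i in range(half)]
    # Both halves are bitonic again and every element of lo is <= every element of hi.
    return bitonic_merge(lo) + bitonic_merge(hi)


def bitonic_sort(seq):
    """Merge sort built on bitonic merging (length a power of two)."""
    n = len(seq)
    if n == 1:
        return list(seq)
    half = n // 2
    first = bitonic_sort(seq[:half])
    second = bitonic_sort(seq[half:])
    # Ascending run followed by a descending run is a bitonic sequence.
    return bitonic_merge(first + second[::-1])


def merge_steps(n):
    """Total compare-exchange steps: 1 + 2 + ... + log n = (log n + 1) log n / 2."""
    k = n.bit_length() - 1
    return k * (k + 1) // 2
```

A quick check in the spirit of the 0/1 principle is `all(bitonic_sort(list(b)) == sorted(b) for b in product((0, 1), repeat=8))`.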

Horizontal merge
•Only one dirty row: prove!
[Figure label: sorted!]

[Figure: panels “Same number per column” and “In-shuffle”]

Time/Area complexity for sorting on the mesh
•Time for a merge step from $k \times k$ to $2k \times 2k$: $Ck$
•Total time: $O(\sqrt{n} \log n)$ on a $\sqrt{n} \times \sqrt{n}$ mesh (remark: $\sqrt{n \log n}$ is possible)
•Area: $n \log n$
•$AT^2 = n^2 \log^3 n$ ($n^2 \log^2 n$ is possible)
•AT: BF: $AT = n^2 \log^2 n$; Mesh: $AT = n^{3/2} \log^2 n$

Warshall’s algorithm
for k:=1 to n do
  for i:=1 to n do
    for j:=1 to n do
      $a_{ij} := F(a_{ij}, a_{ik}, a_{kj})$
[Figure: matrix with the elements $a_{kj}$, $a_{ik}$ and $a_{ij}$ marked]
Algebraic path problem. Examples: all shortest paths
1.) $a_{ij} := a_{ij} \lor (a_{ik} \land a_{kj})$ -- start with adjacency matrix $A$
2.) $d_{ij} := \min(d_{ij},\ d_{ik} + d_{kj})$ -- start with distance matrix $D$ [also carry first/last node on path]

Parallaxis versus ISA
[Figure: Parallaxis: $a_{ij}$ is combined with column elements $a_{1j}, a_{2j}, \dots, a_{kj}$ and row elements $a_{i1}, a_{i2}, \dots, a_{ik}$; ISA: $a_{ij}$]
•Adjacency matrix in C
•Instructions (only a suggestion): A:=C, AC, C:=CN, C:=A, CA, C:=CE, V:=C, VC, C:=CW, V:=CV, C:=AV, C:=CS
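The generic triple loop of Warshall's algorithm and its two example instances above can be written as one short sequential sketch. It is an illustration only: the function `algebraic_path` and its parameter `f` (standing in for the slide's $F$) are hypothetical names, matrices are plain lists of lists, and nothing here models the parallel Parallaxis or ISA versions.

```python
def algebraic_path(a, f):
    """Apply a[i][j] := f(a[i][j], a[i][k], a[k][j]) for k, i, j in Warshall order."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy, leave the input untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                a[i][j] = f(a[i][j], a[i][k], a[k][j])
    return a


def transitive_closure(adj):
    """Instance 1: a_ij := a_ij or (a_ik and a_kj), starting from the adjacency matrix."""
    return algebraic_path(adj, lambda x, y, z: x or (y and z))


def shortest_paths(dist):
    """Instance 2: d_ij := min(d_ij, d_ik + d_kj), starting from the distance matrix."""
    return algebraic_path(dist, lambda x, y, z: min(x, y + z))
```

For the distance version, missing edges are assumed to be encoded as `float("inf")` and the diagonal as 0. Carrying the first or last node on each path, as the slide suggests, would need a second matrix and is left out here.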