One-to-all broadcast M M M M Single

Figure 3.1 One-to-all broadcast and single-node accumulation.
Copyright (r) 1994 Benjamin/Cummings Publishing Co.

Figure 3.2 One-to-all broadcast on an eight-processor ring with SF routing.
Processor 0 is the source of the broadcast. Each message transfer step is
shown by a numbered, dotted arrow from the source of the message to its
destination. The number on an arrow indicates the time step during which the
message is transferred.
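
The numbered arrows described in the Figure 3.2 caption can be reproduced with a short simulation. This is only a sketch under stated assumptions: processors are one-port (at most one send per time step), the source serves its clockwise neighbour first, and the name `ring_broadcast_sf` is illustrative rather than taken from the source.

```python
def ring_broadcast_sf(p, source=0):
    """Return sorted (time_step, sender, receiver) transfers on a p-processor ring."""
    right_count = p // 2                # processors reached in the clockwise direction
    left_count = p - 1 - right_count    # processors reached counter-clockwise
    transfers = []
    # Clockwise chain: the source sends in step 1, each receiver forwards one step later.
    for k in range(1, right_count + 1):
        transfers.append((k, (source + k - 1) % p, (source + k) % p))
    # Counter-clockwise chain starts one step later (assumed one-port source).
    for k in range(1, left_count + 1):
        transfers.append((k + 1, (source - k + 1) % p, (source - k) % p))
    return sorted(transfers)


if __name__ == "__main__":
    for step, src, dst in ring_broadcast_sf(8):
        print(f"step {step}: {src} -> {dst}")
```

Under these assumptions the last transfer on an eight-processor ring happens in time step 4, and every processor other than the source receives the message exactly once.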

Figure 3.3 One-to-all broadcast in the multiplication of a matrix with a
vector.

Figure 3.4 One-to-all broadcast on a 16-processor mesh with SF routing.

Figure 3.5 One-to-all broadcast on a three-dimensional hypercube. The binary
representations of processor labels are shown in parentheses.
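
A hypercube broadcast of the kind shown in Figure 3.5 can be sketched as follows. The sketch assumes that the message crosses the highest dimension first (so processor 000 sends to 100 in the first step); that order, and the function name `hypercube_broadcast`, are assumptions made for illustration.

```python
def hypercube_broadcast(d, source=0):
    """Return [(step, [(sender, receiver), ...]), ...] for a d-dimensional hypercube."""
    have = {source}                      # processors that already hold the message
    schedule = []
    for step, bit in enumerate(reversed(range(d)), start=1):
        mask = 1 << bit
        # Every current holder sends to its neighbour across dimension `bit`.
        sends = [(node, node ^ mask) for node in sorted(have)]
        schedule.append((step, sends))
        have.update(dst for _, dst in sends)
    return schedule


if __name__ == "__main__":
    for step, sends in hypercube_broadcast(3):
        print(step, [(format(s, "03b"), format(r, "03b")) for s, r in sends])
```

The number of holders doubles in every step, so a d-dimensional hypercube is covered in d steps.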

Figure 3.6 One-to-all broadcast with CT routing on an eight-processor ring.

Figure 3.7 One-to-all broadcast on a 16-processor square mesh with CT routing.

Figure 3.8 One-to-all broadcast on an eight-processor tree.

Figure 3.9 All-to-all broadcast and multinode accumulation.

Figure 3.10 All-to-all broadcast on an eight-processor ring with SF routing.
In addition to the time step, the label of each arrow has an additional number
in parentheses. This number labels a message and indicates the processor from
which the message originated in the first step. The number(s) in parentheses
next to each processor are the labels of processors from which data has been
received prior to the communication step. Only the first, second, and last
communication steps are shown.
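
The labelling convention in the Figure 3.10 caption can be checked with a small simulation. This is a sketch, assuming that every processor sends to its successor on the ring and that in each step it forwards the message it received in the previous step; the function name `ring_all_to_all_broadcast` is hypothetical.

```python
def ring_all_to_all_broadcast(p):
    """Return received[i]: message labels held by processor i, in arrival order."""
    received = [[i] for i in range(p)]   # each processor starts with its own message
    outgoing = list(range(p))            # label of the message each processor sends next
    for _ in range(p - 1):
        # Processor i receives whatever its predecessor i-1 is sending.
        incoming = [outgoing[(i - 1) % p] for i in range(p)]
        for i in range(p):
            received[i].append(incoming[i])
        outgoing = incoming              # forward the newly received message next step
    return received


if __name__ == "__main__":
    for i, labels in enumerate(ring_all_to_all_broadcast(8)):
        print(i, labels)
```

For p = 8 the simulation needs seven steps, and processor 0 collects the labels in the order 0, 7, 6, 5, 4, 3, 2, 1.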

Figure 3.11 All-to-all broadcast on a 3 × 3 mesh. The groups of processors
communicating with each other in each phase are enclosed by dotted boundaries.
By the end of the second phase, all processors get (0,1,2,3,4,5,6,7,8) (that
is, a message from each processor). Panels: (a) initial data distribution;
(b) data distribution after rowwise broadcast.
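
The two phases named in the Figure 3.11 caption can be sketched as set unions. The sketch models only the outcome of each phase (a full exchange within each row, then within each column) and not the individual ring steps inside a row or column; row-major processor labels and the function name are assumptions.

```python
def mesh_all_to_all_broadcast(rows, cols):
    """Return (after_rows, final): message sets per (row, col) after each phase."""
    held = {(r, c): {r * cols + c} for r in range(rows) for c in range(cols)}
    # Phase 1: all-to-all broadcast within every row.
    after_rows = {}
    for r in range(rows):
        row_union = set().union(*(held[(r, c)] for c in range(cols)))
        for c in range(cols):
            after_rows[(r, c)] = set(row_union)
    # Phase 2: all-to-all broadcast of the collected row data within every column.
    final = {}
    for c in range(cols):
        col_union = set().union(*(after_rows[(r, c)] for r in range(rows)))
        for r in range(rows):
            final[(r, c)] = set(col_union)
    return after_rows, final


if __name__ == "__main__":
    after_rows, final = mesh_all_to_all_broadcast(3, 3)
    print(sorted(after_rows[(1, 0)]), sorted(final[(1, 0)]))
```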

Figure 3.12 All-to-all broadcast on an eight-processor hypercube. Panels:
(a) initial distribution of messages; (b) distribution before the second step;
(c) distribution before the third step; (d) final distribution of messages.
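
The four distributions listed for Figure 3.12 can be regenerated with a short sketch. It assumes that step k exchanges data across dimension k - 1, starting with the lowest bit, so that processor 0 holds (0,1) before the second step; the function name is illustrative.

```python
def hypercube_all_to_all_broadcast(d):
    """Return snapshots[k][i]: labels held by processor i after k exchange steps."""
    p = 1 << d
    held = [[i] for i in range(p)]
    snapshots = [[list(h) for h in held]]
    for bit in range(d):
        mask = 1 << bit
        # Each processor swaps everything it holds with its partner across `bit`.
        held = [sorted(held[i] + held[i ^ mask]) for i in range(p)]
        snapshots.append([list(h) for h in held])
    return snapshots


if __name__ == "__main__":
    for k, snap in enumerate(hypercube_all_to_all_broadcast(3)):
        print(f"after {k} steps:", snap)
```

The message size doubles in every step, which is why the later steps dominate the transfer time.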

Figure 3.13 Computing prefix sums on an eight-processor hypercube. At each
processor, square brackets show the local prefix sum accumulated in a buffer
and parentheses enclose the contents of the outgoing message buffer for the
next step. Panels: (a) initial distribution of values; (b) distribution of sums
before second step; (c) distribution of sums before third step; (d) final
distribution of prefix sums.
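
The two buffers described in the Figure 3.13 caption map directly onto two arrays in a simulation. In this sketch `result` plays the role of the square-bracket buffer and `msg` the role of the parenthesised outgoing buffer; the names and the lowest-dimension-first order are assumptions made for illustration.

```python
def hypercube_prefix_sums(values):
    """Simulate prefix sums on a hypercube; len(values) must be a power of two."""
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("number of processors must be a power of two")
    d = p.bit_length() - 1
    result = list(values)   # local prefix sum (square brackets in the figure)
    msg = list(values)      # outgoing message buffer (parentheses in the figure)
    for bit in range(d):
        mask = 1 << bit
        incoming = [msg[i ^ mask] for i in range(p)]   # simultaneous pairwise exchange
        for i in range(p):
            if (i ^ mask) < i:
                # Only data from lower-labelled partners belongs in the prefix sum.
                result[i] += incoming[i]
            msg[i] += incoming[i]                      # the outgoing buffer always grows
    return result


if __name__ == "__main__":
    print(hypercube_prefix_sums([3, 1, 4, 1, 5, 9, 2, 6]))
```

The only difference from an all-to-all sum is the test on the partner label: incoming data is added to `result` only when it comes from a lower-numbered processor.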

Figure 3.14 Contention for a channel when the communication step of
Figure 3.12(c) for the hypercube is mapped onto a ring.

Figure 3.15 One-to-all personalized communication and its dual, single-node
gather.

Figure 3.16 One-to-all personalized communication on an eight-processor
hypercube. Panels: (a) initial distribution of messages; (b) distribution
before the second step; (c) distribution before the third step; (d) final
distribution of messages.
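
The distributions listed for Figure 3.16 can be sketched as a repeated halving of the message set. The sketch assumes that message m is destined for processor m and that the highest dimension is used first, so processor 4 holds (4,5,6,7) before the second step; `hypercube_scatter` is a hypothetical name.

```python
def hypercube_scatter(d, source=0):
    """Return snapshots[k]: {processor: messages held} after k steps."""
    p = 1 << d
    held = {source: list(range(p))}      # message m is destined for processor m
    snapshots = [dict(held)]
    for bit in reversed(range(d)):
        mask = 1 << bit
        nxt = {}
        for node, msgs in held.items():
            # Keep messages whose destination agrees with this node in `bit`,
            # and pass the other half to the neighbour across that dimension.
            nxt[node] = [m for m in msgs if ((m ^ node) & mask) == 0]
            nxt[node ^ mask] = [m for m in msgs if ((m ^ node) & mask) != 0]
        held = nxt
        snapshots.append(dict(held))
    return snapshots


if __name__ == "__main__":
    for k, snap in enumerate(hypercube_scatter(3)):
        print(f"after {k} steps:", dict(sorted(snap.items())))
```

Unlike a broadcast, the amount of data sent by each processor halves in every step.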

Figure 3.17 All-to-all personalized communication.

Figure 3.18 All-to-all personalized communication on a six-processor ring.
The label of each message is of the form {x, y}, where x is the label of the
processor that originally stored the message, and y is the label of the
processor that is the final destination of the message. The label
({x1, y1}, {x2, y2}, ..., {xn, yn}) indicates a message that is formed by
concatenating n individual messages.
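
The {source, destination} labels of Figure 3.18 can be traced with a small simulation. This is a sketch, assuming every processor first sends all of its p - 1 messages as one bundle to its successor, and that each processor then removes the message addressed to it and forwards the remainder; the function name is illustrative.

```python
def ring_all_to_all_personalized(p):
    """Return (delivered, sizes): sources delivered to each processor, bundle size per step."""
    delivered = {i: [] for i in range(p)}
    # bundle[i] is the concatenated message processor i sends in the next step.
    bundle = [[(i, (i + k) % p) for k in range(1, p)] for i in range(p)]
    sizes = []
    for _ in range(p - 1):
        sizes.append(len(bundle[0]))
        arriving = [bundle[(i - 1) % p] for i in range(p)]
        bundle = []
        for i in range(p):
            # Extract the message addressed to processor i, forward everything else.
            delivered[i].extend(src for src, dst in arriving[i] if dst == i)
            bundle.append([m for m in arriving[i] if m[1] != i])
    return delivered, sizes


if __name__ == "__main__":
    delivered, sizes = ring_all_to_all_personalized(6)
    print(sizes)
    print(delivered[0])
```

In this model the bundle shrinks by one message per step, from p - 1 messages in the first step to a single message in the last.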

Figure 3.19 The distribution of messages at the beginning of each phase of
all-to-all personalized communication on a 3 × 3 mesh. At the end of the
second phase, processor i has messages ({0,i}, ..., {8,i}), where 0 ≤ i ≤ 8.
The groups of processors communicating together in each phase are enclosed in
dotted boundaries. Panels: (a) data distribution at the beginning of first
phase; (b) data distribution at the beginning of second phase.

Figure 3.20 All-to-all personalized communication on a three-dimensional
hypercube with SF routing. Panels: (a) initial distribution of messages;
(b) distribution before the second step; (c) distribution before the third
step; (d) final distribution of messages.

Figure 3.21 Seven steps in all-to-all personalized communication on an
eight-processor hypercube with CT routing.
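
One pairing that needs exactly p - 1 = 7 steps on eight processors is to let processor i exchange with processor i XOR j in step j. The sketch below builds that schedule; whether it matches the arrows drawn in Figure 3.21 panel by panel is an assumption, and `pairwise_exchange_schedule` is a hypothetical name.

```python
def pairwise_exchange_schedule(d):
    """Return steps[j-1] = [(sender, receiver), ...] for j = 1 .. 2**d - 1."""
    p = 1 << d
    # In step j every processor i sends its message for (i XOR j) directly to it.
    return [[(i, i ^ j) for i in range(p)] for j in range(1, p)]


if __name__ == "__main__":
    for j, step in enumerate(pairwise_exchange_schedule(3), start=1):
        print(f"step {j}:", step)
```

Because XOR with a fixed j is its own inverse, every step pairs the processors off completely, and over all steps each ordered pair of distinct processors occurs exactly once.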

Figure 3.22 The communication steps in a circular 5-shift on a 4 × 4 mesh.
Panels: (a) initial data distribution and the first communication step;
(b) step to compensate for backward row shifts; (c) column shifts in the third
communication step; (d) final distribution of the data.
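
The three communication phases named in the Figure 3.22 panels (row shift, compensation, column shift) can be sketched as follows. The sketch assumes a square wraparound mesh with row-major processor labels; the function name and the returned representation are illustrative choices.

```python
def mesh_circular_shift(side, q):
    """Return origin[j]: label of the processor whose packet ends at processor j."""
    p = side * side
    q %= p
    row_shift, col_shift = q % side, q // side
    # Phase 1: shift every packet row_shift positions along its row, with wraparound.
    at = {}
    for i in range(p):
        r, c = divmod(i, side)
        wrapped = c + row_shift >= side
        at[(r, (c + row_shift) % side)] = (i, wrapped)
    # Phase 2: packets that wrapped around move one row ahead to compensate.
    compensated = {}
    for (r, c), (i, wrapped) in at.items():
        compensated[((r + 1) % side if wrapped else r, c)] = i
    # Phase 3: shift every packet col_shift positions along its column.
    origin = [None] * p
    for (r, c), i in compensated.items():
        origin[((r + col_shift) % side) * side + c] = i
    return origin


if __name__ == "__main__":
    print(mesh_circular_shift(4, 5))
```

For side = 4 and q = 5 both the row shift and the column shift are 1, and the packet that starts at processor i ends at processor (i + 5) mod 16.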

Figure 3.23 The mapping of an eight-processor ring onto a three-dimensional
hypercube to perform a circular 5-shift as a combination of a 4-shift and a
1-shift. Panels: (a) the first phase (a 4-shift); (b) the second phase
(a 1-shift); (c) final data distribution after the 5-shift.
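
The split of a 5-shift into a 4-shift and a 1-shift follows from the binary representation of the shift distance. The sketch below works on logical ring labels only and does not model the ring-to-hypercube mapping itself; `shift_phases` and `circular_shift` are hypothetical helper names.

```python
def shift_phases(q):
    """Decompose q into distinct powers of two, largest first (5 -> [4, 1])."""
    return [1 << b for b in reversed(range(q.bit_length())) if (q >> b) & 1]


def circular_shift(data, q):
    """Apply a circular q-shift as a sequence of power-of-two shifts."""
    p = len(data)
    for s in shift_phases(q % p):
        # After an s-shift, processor i holds what processor i - s held before.
        data = [data[(i - s) % p] for i in range(p)]
    return data


if __name__ == "__main__":
    print(shift_phases(5))
    print(circular_shift(list(range(8)), 5))
```

A shift distance below p has at most log p one-bits, so at most log p power-of-two phases are needed.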

Figure 3.24 Circular q-shifts on an 8-processor hypercube for 1 ≤ q < 8.
Panels (a) through (g) show the 1-shift through the 7-shift.

Figure 3.25 The six time-steps in one-to-all broadcast on an eight-processor
hypercube with SF routing when the message is split into three parts that are
routed separately on three different spanning binomial trees.

Figure 3.26 A sparse three-dimensional mesh of 64 processors [?].
