Chapter 2
Parallel Architectures
Brief Review
Complexity Concepts Needed for Comparisons
• Whenever we define a counting function, we usually
characterize the growth rate of that function in terms of
complexity classes.
• Technical Definition: We say a function f(n) is in O(g(n)) if (and only if) there are positive constants c and n0 such that
0 ≤ f(n) ≤ c·g(n) for all n ≥ n0
(a worked example appears at the end of this slide).
• O(n) is read as big-oh of n.
• This notation can be used to separate counting functions into complexity classes that characterize the size of the count.
• We can use it for any kind of counting function, such as timings, bisection widths, etc.
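Worked example (added for illustration, not from the original slides): f(n) = 3n^2 + 5n is in O(n^2). Take c = 4 and n0 = 5; for all n ≥ 5 we have 5n ≤ n^2, so 0 ≤ 3n^2 + 5n ≤ 4n^2 = c·g(n).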
Why Is Asymptotic Behavior Important?
1. Allows us to compare counts on large sets.
2. Helps us understand the maximum size of input that can be handled in a given time, provided we know the environment in which we are running.
3. Stresses the fact that even dramatic speedups in hardware cannot overcome the handicap of an asymptotically slow algorithm.
Order Wins Out
The TRS-80
Main language support: BASIC, typically a slow-running interpreted language
The CRAY Y-MP
Language used in example: FORTRAN, a fast-running language
CRAY Y-MP with FORTRAN: complexity is 3n^3 nanoseconds
TRS-80 with BASIC: complexity is 19,500,000n nanoseconds

microsecond (abbr. µsec): one-millionth of a second
millisecond (abbr. msec): one-thousandth of a second

n           CRAY Y-MP (3n^3)    TRS-80 (19,500,000n)
10          3 microsec          200 millisec
100         3 millisec          2 sec
1000        3 sec               20 sec
2500        50 sec              50 sec
10,000      49 min              3.2 min
1,000,000   95 years            5.4 hours
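A short sketch (added for illustration, not part of the original slides) that reproduces the table by evaluating both formulas in nanoseconds; the unit-conversion thresholds in pretty are my own choice:

```python
def cray(n):    # CRAY Y-MP with FORTRAN: 3n^3 nanoseconds
    return 3 * n**3

def trs80(n):   # TRS-80 with BASIC: 19,500,000n nanoseconds
    return 19_500_000 * n

def pretty(ns):
    """Render a nanosecond count using the largest unit that keeps the value >= 1."""
    for unit, size in [("years", 3.156e16), ("hours", 3.6e12), ("min", 6e10),
                       ("sec", 1e9), ("millisec", 1e6), ("microsec", 1e3)]:
        if ns >= size:
            return f"{ns / size:.3g} {unit}"
    return f"{ns} nanosec"

for n in [10, 100, 1000, 2500, 10_000, 1_000_000]:
    print(f"{n:>9}  CRAY: {pretty(cray(n)):>12}  TRS-80: {pretty(trs80(n)):>12}")
```

The printed values match the table above up to rounding (e.g., 195 millisec vs. 200 millisec, 50 min vs. 49 min).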
Interconnection Networks
• Uses of interconnection networks
– Connect processors to shared memory
– Connect processors to each other
• Interconnection media types
– Shared medium
– Switched medium
• Different interconnection networks define different parallel machines.
• The interconnection network’s properties influence the type of algorithm used on various machines, as they affect how data is routed.
Shared versus Switched Media
a. With a shared medium, one message is sent & all processors listen.
b. With a switched medium, multiple simultaneous messages are possible.
Shared Medium
• Allows only one message at a time
• Messages are broadcast
• Each processor “listens” to every message
• Before sending a message, a processor “listens” until the medium is unused
• Collisions require resending of messages
• Ethernet is an example
Switched Medium
• Supports point-to-point messages
between pairs of processors
• Each processor is connected to one switch
• Advantages over shared media
– Allows multiple messages to be sent
simultaneously
– Allows the network to scale to accommodate an increasing number of processors
Switch Network Topologies
• View a switched network as a graph
– Vertices = processors or switches
– Edges = communication paths
• Two kinds of topologies
– Direct
– Indirect
Direct Topology
• Ratio of switch nodes to processor nodes
is 1:1
• Every switch node is connected to:
– 1 processor node
– At least 1 other switch node
– Example: the 2-D mesh network (shown later)
Indirect Topology
• Ratio of switch nodes to processor nodes
is greater than 1:1
• Some switches simply connect to other
switches
• Examples: the binary tree, hypertree, and butterfly networks (shown later)
Terminology for Evaluating
Switch Topologies
• We evaluate 4 characteristics of a network in order to understand its effectiveness in implementing efficient parallel algorithms on a machine with a given network
• Characteristics
– Diameter
– Bisection width
– Edges per node
– Constant edge length
Switch Characteristics
• Diameter – Largest distance between two switch
nodes
– A low diameter is desirable
– It puts a lower bound on the complexity of parallel algorithms that require communication between arbitrary pairs of nodes
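A minimal sketch (added for illustration, not part of the original slides) that computes the diameter of a small switch network by running breadth-first search from every node; the 8-node ring used as input is my own demo choice:

```python
from collections import deque

def diameter(adj):
    """Largest shortest-path distance between any two nodes (BFS from every node)."""
    best = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

# Demo input (an assumption): an 8-node ring, each node linked to its two neighbors.
n = 8
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
print(diameter(ring))  # prints 4 (= n/2 for a ring)
```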
Switch Characteristics
• Bisection width – The minimum number of edges
between switch nodes that must be removed in
order to divide the network into two halves
– High bisection width is desirable
– In algorithms requiring large amounts of data
movement, the size of the data set divided by the
bisection width puts a lower bound on the complexity
of an algorithm
– Proving what the bisection width of a network is can
be quite difficult
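For tiny networks, though, the bisection width can be found by exhaustive search. A minimal sketch (added for illustration, not from the original slides): try every split of the nodes into two equal halves and count the crossing edges.

```python
from itertools import combinations

def bisection_width(adj):
    """Fewest edges crossing any split of the nodes into two equal halves (brute force)."""
    nodes = list(adj)
    best = None
    for group in combinations(nodes, len(nodes) // 2):
        side = set(group)
        crossing = sum(1 for u in side for v in adj[u] if v not in side)
        best = crossing if best is None else min(best, crossing)
    return best

# Demo input (an assumption): the same 8-node ring as in the diameter sketch.
n = 8
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
print(bisection_width(ring))  # prints 2: any cut of a ring severs at least 2 edges
```

This enumeration is exponential in the number of nodes, which echoes the point above: establishing bisection width in general can be quite difficult.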
Switch Characteristics
• Number of edges / node
– It is best if the maximum number of edges/node is a
constant independent of network size, as this allows
the processor organization to scale more easily to a
larger number of nodes
– Degree is the maximum number of edges per node
• Constant edge length? (yes/no)
– Again, for scalability, it is best if the nodes and edges can be laid out in 3-D space so that the maximum edge length is a constant independent of network size
Evaluating Switch Topologies
• We will briefly look at the following topologies
– 2-D mesh
– linear network
– binary tree
– hypertree
– butterfly
– hypercube
– shuffle-exchange
• Those highlighted (in yellow on the original slide) have been used in commercial parallel computers.
2-D Meshes
Note: Circles represent switches and squares
represent processors in all these slides.
2-D Mesh Network
• Direct topology
• Switches arranged into a 2-D lattice or grid
• Communication allowed only between neighboring switches
• Torus: Variant that includes wraparound connections
between switches on edge of mesh
Linear Network
• Direct topology
• Switches arranged into a 1-D mesh
• Corresponds to a row or column of a 2-D mesh
• Ring: a variant that allows a wraparound connection between the switches on the ends
• The linear and ring networks have many applications
• Essentially supports a pipeline in both directions
• Although these networks are very simple, they support many optimal algorithms
Binary Tree Network
• Indirect topology
• n = 2^d processor nodes and 2n − 1 switches, where d = 0, 1, ... is the number of levels
• e.g., for d = 3: 2^3 = 8 processors on the bottom and 2n − 1 = 2(8) − 1 = 15 switches
Hypertree Network (of degree 4 and
depth 2)
(a) Front view: 4-ary tree of height 2
(b) Side view: upside-down binary tree of height 2
(c) Complete network
Hypertree Network
• Indirect topology
• Note: the degree k and the depth d must be specified
• This gives, from the front, a k-ary tree of height d
• From the side, the same network looks like an upside-down binary tree of height d
• Joining the front and side views yields the complete
network
Butterfly Network
• Indirect topology
• n = 2^d processor nodes connected by n(log n + 1) switching nodes
• The rows of switches are called ranks
• A 2^3 = 8 processor butterfly network has 8 × 4 = 32 switching nodes
• As complicated as this switching network appears to be, it is really quite simple, as it admits a very nice routing algorithm!
• Wrapped Butterfly: when the top and bottom ranks are merged into a single rank
Why It Is Called a Butterfly Network
• Walk cycles such as node(i,j), node(i−1,j), node(i,m), node(i−1,m), node(i,j), where m is determined by the bit flipping as shown, and you “see” a butterfly:
Butterfly Network Routing
Send message from
processor 2 to processor 5.
Algorithm:
0 means ship left;
1 means ship right.
1) 5 = 101. Pluck off leftmost bit 1 and send “01msg” to the right.
2) Pluck off leftmost bit 0 and send “1msg” to the left.
3) Pluck off leftmost bit 1 and send “msg” to the right.
Each cross edge followed
changes address by 1 bit.
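A minimal sketch of this bit-plucking rule (added for illustration; route_butterfly is my own name, and the list of left/right decisions stands in for actually forwarding a message):

```python
def route_butterfly(dest, d):
    """Left/right decisions for routing a message to processor `dest` in a
    butterfly with 2**d processors: pluck off the leftmost destination bit
    at each rank; 0 means ship left, 1 means ship right."""
    bits = format(dest, f"0{d}b")   # destination address as d bits
    return ["right" if b == "1" else "left" for b in bits]

# The slide's example: destination processor 5 = 101.
print(route_butterfly(5, 3))  # ['right', 'left', 'right']
```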
Hypercube
(also called binary n-cube)
A hypercube with n = 2^d processors & switches, for d = 4
Hypercube (or Binary n-cube)
n = 2^d Processors
• Direct topology
• 2 × 2 × … × 2 mesh
• Number of nodes is a power of 2
• Node addresses 0, 1, …, n − 1
• Node i is connected to the d nodes whose addresses differ from i in exactly one bit position
• Example: node 0111 is connected to 1111, 0011, 0101, and 0110
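A one-function sketch of this adjacency rule (added for illustration; hypercube_neighbors is my own name):

```python
def hypercube_neighbors(i, d):
    """Nodes adjacent to node i in a 2**d-node hypercube: flip each bit once."""
    return [i ^ (1 << b) for b in range(d)]

# The slide's example: node 0111 in a d = 4 hypercube.
print([format(v, "04b") for v in hypercube_neighbors(0b0111, 4)])
# ['0110', '0101', '0011', '1111']
```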
Growing a Hypercube
Note: For d = 4, it is called
a 4-dimensional cube.
29
Routing on the Hypercube Network
• Example: Send a
message from node
2 = 0010 to node 5 =
0101
• The nodes differ in 3
bits so the shortest
path will be of length
3.
• One path is 0010 → 0110 → 0100 → 0101, obtained by flipping one of the differing bits at each step.
• As with the butterfly network, bit flipping helps you route on this network.
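A sketch of this route construction (added for illustration; route_hypercube is my own name, and flipping the highest differing bit first is one arbitrary choice among the shortest paths):

```python
def route_hypercube(src, dst, d):
    """One shortest path from src to dst in a 2**d-node hypercube:
    flip each differing bit in turn, highest bit first."""
    path = [src]
    for bit in reversed(range(d)):
        if (src ^ dst) & (1 << bit):   # this bit still differs
            src ^= 1 << bit            # flip it: step to that neighbor
            path.append(src)
    return path

# The slide's example: node 2 = 0010 to node 5 = 0101.
print([format(v, "04b") for v in route_hypercube(2, 5, 4)])
# ['0010', '0110', '0100', '0101']
```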
A Perfect Shuffle
• A permutation that is produced as follows is
called a perfect shuffle:
• Given 2^d cards, numbered 0, 1, 2, ..., 2^d − 1, write each card number with d bits. By left rotating the bits with a wrap, we calculate the position of the card after the perfect shuffle.
• Example: For d = 3, card 5 = 101. Left rotating and wrapping gives us 011. So, card 5 goes to position 3. Note that card 0 = 000 and card 7 = 111 stay in position.
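A sketch of this rotation (added for illustration; left_cycle is my own name for the operation the next slide calls LeftCycle):

```python
def left_cycle(i, d):
    """Left-rotate the d-bit address i by one position (one perfect shuffle)."""
    high = (i >> (d - 1)) & 1                  # the bit that wraps around
    return ((i << 1) & ((1 << d) - 1)) | high

# The slide's example: for d = 3, card 5 = 101 moves to position 011 = 3,
# while cards 0 = 000 and 7 = 111 stay put.
print([left_cycle(i, 3) for i in range(8)])  # [0, 2, 4, 6, 1, 3, 5, 7]
```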
Shuffle-exchange Network with n = 2^d Processors
(Figure: eight nodes, numbered 0 through 7, joined by shuffle and exchange links)
• Direct topology
• Number of nodes is a power of 2
• Nodes have addresses 0, 1, …, 2^d − 1
• Two outgoing links from node i:
– Shuffle link to node LeftCycle(i)
– Exchange link between node i and node i + 1 when i is even
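A sketch of the two links (added for illustration; shuffle_exchange_links is my own name, and representing the exchange link as "flip the low bit" is my reading of the even/odd pairing above):

```python
def shuffle_exchange_links(i, d):
    """The two outgoing links from node i in a 2**d-node shuffle-exchange network."""
    high = (i >> (d - 1)) & 1
    shuffle = ((i << 1) & ((1 << d) - 1)) | high   # shuffle link: LeftCycle(i)
    exchange = i ^ 1                               # exchange link: pairs (0,1), (2,3), ...
    return shuffle, exchange

for i in range(8):                                 # d = 3: nodes 0..7
    print(i, shuffle_exchange_links(i, 3))
```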
Shuffle-exchange Addressing – 16
processors
No arrows on a line segment means it is bidirectional; otherwise, you must follow the arrows.
Devising a routing algorithm for this network is
interesting and will be a homework problem.
Two Problems with
Shuffle-Exchange
• Shuffle-exchange does not expand well
– A large shuffle-exchange network does not decompose well into smaller separate shuffle-exchange networks
– In a large shuffle-exchange network, a small
percentage of nodes will be hot spots
• They will encounter much heavier traffic
Comparing Networks
• All have logarithmic diameter
except 2-D mesh
• Hypertree, butterfly, and hypercube have bisection width n/2
• All have constant edges per node except the hypercube
• Only the 2-D mesh, linear, and ring topologies keep edge lengths constant as network size increases
• Shuffle-exchange is a good compromise: a fixed number of edges per node, low diameter, and good bisection width
– However, negative results on preceding slide also need
to be considered
Flynn’s Taxonomy
• SISD
– Single instruction stream
– Single data stream
• SIMD
– Single instruction stream
– Multiple data streams
• MISD
– Multiple instruction streams
– Single data stream
• MIMD
– Multiple instruction streams
– Multiple data streams
SISDs
• Uniprocessors
• Superscalars?
SIMD
• Front end computer
– Also called the control unit
– Holds and runs program
– Data manipulated sequentially
• Processor array
– Data manipulated in parallel
• Pipelined vector processor
MISD
• Pipeline of multiple independently executing functional units operating on a single stream of data
– Systolic arrays
MIMD
• Examples?
Summary
• Commercial parallel computers appeared in the 1980s
• Multiple-CPU computers now dominate
• Small-scale: Centralized multiprocessors
• Large-scale: Distributed memory
architectures
– Multiprocessors
– Multicomputers