1/25/2010

Chapter 2: Parallel Architectures

Brief Review: Complexity Concepts Needed for Comparisons
• Whenever we define a counting function, we usually characterize the growth rate of that function in terms of complexity classes.
• Technical definition: we say a function f(n) is in O(g(n)) if (and only if) there are positive constants c and n0 such that 0 ≤ f(n) ≤ c·g(n) for all n ≥ n0.
• O(n) is read as "big-oh of n".
• This notation can be used to separate counting functions into complexity classes that characterize the size of the count.
• We can use it for any kind of counting function, such as timings, bisection widths, etc.

Why Asymptotic Behavior Is Important
1. It allows us to compare counts on large sets.
2. It helps us understand the maximum size of input that can be handled in a given time, provided we know the environment in which we are running.
3. It stresses the fact that even dramatic speedups in hardware cannot overcome the handicap of an asymptotically slow algorithm.

Order Wins Out
• The TRS-80. Main language support: BASIC, a typically slow-running interpreted language.
• The CRAY-YMP. Language used in the example: FORTRAN, a fast-running compiled language.

CRAY YMP with FORTRAN: running time is 3n^3 nanoseconds.
TRS-80 with BASIC: running time is 19,500,000n nanoseconds.
(microsecond, abbr. µsec: one-millionth of a second; millisecond, abbr. msec: one-thousandth of a second)

  n          CRAY YMP (3n^3)   TRS-80 (19,500,000n)
  10         3 microsec        200 millisec
  100        3 millisec        2 sec
  1000       3 sec             20 sec
  2500       50 sec            50 sec
  10000      49 min            3.2 min
  1000000    95 years          5.4 hours

Interconnection Networks
• Uses of interconnection networks:
 – Connect processors to shared memory
 – Connect processors to each other
• Interconnection media types:
 – Shared medium
 – Switched medium
• Different interconnection networks define different parallel machines.
• The interconnection network's properties influence the type of algorithm used for various machines, since the network affects how data is routed.
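Returning to the Order Wins Out comparison above: the table can be regenerated mechanically. The sketch below is my own illustration (not from the slides); it assumes both running times are measured in nanoseconds, which is consistent with every entry in the table.

```python
# Sketch (assumption: both machines' running times are in nanoseconds):
# 3*n**3 for the CRAY YMP with FORTRAN, 19,500,000*n for the TRS-80 with BASIC.

def cray_ns(n):
    return 3 * n ** 3

def trs80_ns(n):
    return 19_500_000 * n

def pretty(ns):
    """Render a nanosecond count in a human-friendly unit."""
    s = ns / 1e9                       # convert to seconds
    if s < 1e-3:
        return f"{ns / 1e3:.0f} microsec"
    if s < 1:
        return f"{s * 1e3:.0f} millisec"
    if s < 120:
        return f"{s:.0f} sec"
    if s < 7200:
        return f"{s / 60:.1f} min"
    if s < 2 * 86400:
        return f"{s / 3600:.1f} hours"
    return f"{s / (86400 * 365):.0f} years"

for n in (10, 100, 1000, 2500, 10000, 1000000):
    print(f"n = {n:>7}: CRAY {pretty(cray_ns(n)):>12}   TRS-80 {pretty(trs80_ns(n)):>12}")
```

The printed values match the slide's table up to rounding. The crossover sits near n = 2500: below it the fast machine with the cubic algorithm wins, and above it the slow machine with the linear algorithm pulls away, which is exactly the point of "Order Wins Out".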
Shared versus Switched Media
a. With a shared medium, one message is sent and all processors listen.
b. With a switched medium, multiple simultaneous messages are possible.

Shared Medium
• Allows only one message at a time
• Messages are broadcast
• Each processor "listens" to every message
• Before sending a message, a processor "listens" until the medium is unused
• Collisions require resending of messages
• Ethernet is an example

Switched Medium
• Supports point-to-point messages between pairs of processors
• Each processor is connected to one switch
• Advantages over shared media:
 – Allows multiple messages to be sent simultaneously
 – Allows scaling of the network to accommodate an increase in processors

Switch Network Topologies
• View a switched network as a graph
 – Vertices = processors or switches
 – Edges = communication paths
• Two kinds of topologies:
 – Direct
 – Indirect

Direct Topology
• Ratio of switch nodes to processor nodes is 1:1
• Every switch node is connected to:
 – 1 processor node
 – At least 1 other switch node

Indirect Topology
• Ratio of switch nodes to processor nodes is greater than 1:1
• Some switches simply connect to other switches

Terminology for Evaluating Switch Topologies
• We evaluate 4 characteristics of a network in order to understand its effectiveness in implementing efficient parallel algorithms on a machine with that network.
• Characteristics:
 – Diameter
 – Bisection width
 – Edges per node
 – Constant edge length

Switch Characteristics
• Diameter
 – Largest distance between two switch nodes
 – A low diameter is desirable
 – The diameter puts a lower bound on the complexity of parallel algorithms that require communication between arbitrary pairs of nodes

Switch Characteristics
• Bisection width
 – The minimum number of edges between switch nodes that must be removed in order to divide the network into two halves
 – High bisection width
is desirable
 – In algorithms requiring large amounts of data movement, the size of the data set divided by the bisection width puts a lower bound on the complexity of the algorithm
 – Proving what the bisection width of a network is can be quite difficult

Switch Characteristics
• Number of edges per node
 – It is best if the maximum number of edges per node is a constant independent of network size, as this allows the processor organization to scale more easily to a larger number of nodes
 – Degree is the maximum number of edges per node
• Constant edge length? (yes/no)
 – Again, for scalability, it is best if the nodes and edges can be laid out in 3-D space so that the maximum edge length is a constant independent of network size

Evaluating Switch Topologies
• We will briefly look at the following topologies:
 – 2-D mesh
 – linear network
 – binary tree
 – hypertree
 – butterfly
 – hypercube
 – shuffle-exchange
• Several of these topologies have been used in commercial parallel computers.

2-D Meshes
Note: circles represent switches and squares represent processors in all these slides.

2-D Mesh Network
• Direct topology
• Switches arranged into a 2-D lattice or grid
• Communication allowed only between neighboring switches
• Torus: a variant that includes wraparound connections between switches on the edges of the mesh

Linear Network
• Direct topology
• Switches arranged into a 1-D mesh
• Corresponds to a row or column of a 2-D mesh
• Ring: a variant that adds a wraparound connection between the switches on the ends
• The linear and ring networks have many applications
• Essentially supports a pipeline in both directions
• Although these networks are very simple, they support many optimal algorithms

Binary Tree Network
• Indirect topology
• n = 2^d processor nodes and 2n − 1 switches, where d = 0, 1, ... is the number of levels, e.g.
for d = 3 there are 2^3 = 8 processors on the bottom level and 2n − 1 = 2(8) − 1 = 15 switches

Hypertree Network (of degree 4 and depth 2)
(a) Front view: 4-ary tree of height 2
(b) Side view: upside-down binary tree of height d
(c) Complete network

Hypertree Network
• Indirect topology
• Note: the degree k and the depth d must be specified
• This gives, from the front, a k-ary tree of height d
• From the side, the same network looks like an upside-down binary tree of height d
• Joining the front and side views yields the complete network

Butterfly Network
• Indirect topology
• n = 2^d processor nodes connected by n(log n + 1) switching nodes
• A 2^3 = 8 processor butterfly network has 8 × 4 = 32 switching nodes
• As complicated as this switching network appears to be, it is really quite simple, as it admits a very nice routing algorithm!
• Wrapped butterfly: the variant in which the top and bottom ranks are merged into a single rank
• The rows of switches are called ranks

Why It Is Called a Butterfly Network
• Walk cycles such as node(i, j), node(i−1, j), node(i, m), node(i−1, m), node(i, j), where m is determined by the bit flipping as shown, and you "see" a butterfly.

Butterfly Network Routing
Send a message from processor 2 to processor 5.
Algorithm: 0 means ship left; 1 means ship right.
1) 5 = 101. Pluck off the leftmost bit, 1, and send "01msg" to the right.
2) Pluck off the leftmost bit, 0, and send "1msg" to the left.
3) Pluck off the leftmost bit, 1, and send "msg" to the right.
Each cross edge followed changes the address by 1 bit.
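The routing rule just described can be modeled in a few lines. This sketch is my own illustration, under one modeling assumption: ranks are numbered 0..d from the top, and the cross edge leaving rank r flips bit (d − 1 − r) of the column number. At each rank the message follows whichever edge makes the current column agree with the destination in that bit, which is the same decision as plucking off the destination's leftmost bit and shipping left (0) or right (1).

```python
def butterfly_route(src, dst, d):
    """Route a message in a butterfly with 2**d processors.

    Ranks are numbered 0..d top to bottom; the cross edge leaving
    rank r flips bit (d-1-r) of the column number.  At each rank we
    follow whichever edge makes the current column match dst in that
    bit position.
    """
    col, path = src, [src]
    for r in range(d):
        bit = 1 << (d - 1 - r)
        if (col & bit) != (dst & bit):
            col ^= bit            # cross edge: flip this address bit
        path.append(col)          # otherwise the straight edge was taken
    return path

# The slide's example: processor 2 to processor 5 in an 8-processor butterfly.
print(butterfly_route(2, 5, 3))   # -> [2, 6, 4, 5]
```

Note that the path reaches column 5 after exactly d = 3 ranks regardless of the source column, because each rank's decision fixes one bit of the destination address.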
Hypercube (also called binary n-cube)
A hypercube with n = 2^d processors and switches, for d = 4.

Hypercube (or Binary n-cube), n = 2^d Processors
• Direct topology
• A 2 × 2 × ... × 2 mesh
• Number of nodes is a power of 2
• Node addresses are 0, 1, ..., n − 1
• Node i is connected to the d nodes whose addresses differ from i in exactly one bit position
• Example: node 0111 is connected to 1111, 0011, 0101, and 0110

Growing a Hypercube
Note: for d = 4, it is called a 4-dimensional cube.

Routing on the Hypercube Network
• Example: send a message from node 2 = 0010 to node 5 = 0101
• The nodes differ in 3 bits, so the shortest path will be of length 3
• One path is 0010 → 0110 → 0100 → 0101, obtained by flipping one of the differing bits at each step
• As with the butterfly network, bit flipping helps you route on this network

A Perfect Shuffle
• A permutation produced as follows is called a perfect shuffle:
• Given a power-of-2 number of cards, numbered 0, 1, 2, ..., 2^d − 1, write each card number with d bits. Left-rotating the bits with wraparound gives the position of the card after the perfect shuffle.
• Example: for d = 3, card 5 = 101. Left-rotating with wraparound gives 011, so card 5 goes to position 3. Note that card 0 = 000 and card 7 = 111 stay in position.

Shuffle-exchange Network with n = 2^d Processors
• Direct topology
• Number of nodes is a power of 2
• Nodes have addresses 0, 1, ..., 2^d − 1
• Two outgoing links from node i:
 – Shuffle link to node LeftCycle(i)
 – Exchange link between node i and node i + 1 when i is even

Shuffle-exchange Addressing – 16 Processors
• No arrows on a line segment means it is bidirectional; otherwise, you must follow the arrows.
• Devising a routing algorithm for this network is interesting and will be a homework problem.
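Circling back to the hypercube routing example above: the bit-flipping idea is short enough to sketch directly. This is my own illustration; it flips the differing bits low-order bit first, whereas the slide's path 0010 → 0110 → 0100 → 0101 flips them in another order. Any fixed order works, and every order yields a shortest path whose length equals the number of differing bits.

```python
def hypercube_route(src, dst):
    """Shortest-path hypercube routing: flip, one at a time, the
    bits in which src and dst differ (low-order bit first; any
    fixed order gives a path of the same, minimal length)."""
    path, cur = [src], src
    diff = src ^ dst          # set bits mark the dimensions to traverse
    bit = 1
    while diff:
        if diff & bit:
            cur ^= bit        # correct one differing bit
            diff ^= bit
            path.append(cur)
        bit <<= 1
    return path

# The slide's example: node 2 = 0010 to node 5 = 0101 (3 differing bits).
route = hypercube_route(2, 5)
print([format(v, "04b") for v in route])   # -> ['0010', '0011', '0001', '0101']
```

The path length, len(route) − 1, always equals the Hamming distance between source and destination, matching the slide's "the nodes differ in 3 bits, so the shortest path will be of length 3".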
Two Problems with Shuffle-Exchange
• Shuffle-exchange does not expand well
 – A large shuffle-exchange network does not decompose well into smaller separate shuffle-exchange networks
• In a large shuffle-exchange network, a small percentage of nodes will be hot spots
 – They will encounter much heavier traffic

Comparing Networks
• All have logarithmic diameter except the 2-D mesh
• Hypertree, butterfly, and hypercube have bisection width n/2
• All have a constant number of edges per node except the hypercube
• Only the 2-D mesh, linear, and ring topologies keep edge lengths constant as network size increases
• Shuffle-exchange is a good compromise: a fixed number of edges per node, low diameter, and good bisection width
 – However, the negative results on the preceding slide also need to be considered

Flynn's Taxonomy
• SISD: single instruction stream, single data stream
• SIMD: single instruction stream, multiple data streams
• MISD: multiple instruction streams, single data stream
• MIMD: multiple instruction streams, multiple data streams

SISDs
• Uniprocessors
• Superscalars?

SIMD
• Front-end computer
 – Also called the control unit
 – Holds and runs the program
 – Data manipulated sequentially
• Processor array
 – Data manipulated in parallel
• Pipelined vector processor

MISD
• Pipeline of multiple independently executing functional units operating on a single stream of data
 – Systolic arrays

MIMD
• Examples?

Summary
• Commercial parallel computers appeared in the 1980s
• Multiple-CPU computers now dominate
• Small-scale: centralized multiprocessors
• Large-scale: distributed-memory architectures
 – Multiprocessors
 – Multicomputers
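As a compact recap of the Comparing Networks slide, the characteristics can be tabulated in code. This sketch is my own summary, not from the slides; the formulas follow the slide's claims for n = 2^d processors, but the exact constants depend on how each network's switch graph is defined, so treat the numbers as indicating asymptotic shape rather than exact values.

```python
import math

def network_stats(d):
    """Diameter and bisection width for n = 2**d processors,
    following the comparison slide's claims (e.g., bisection width
    n/2 for hypertree, butterfly, and hypercube).  Constants vary
    with the exact switch-graph definitions, so these are indicative."""
    n = 2 ** d
    side = math.isqrt(n)              # assumes n is a perfect square
    return {
        "2-D mesh":    {"diameter": 2 * (side - 1), "bisection": side},
        "binary tree": {"diameter": 2 * d,          "bisection": 1},
        "hypertree":   {"diameter": 2 * d,          "bisection": n // 2},
        "butterfly":   {"diameter": d,              "bisection": n // 2},
        "hypercube":   {"diameter": d,              "bisection": n // 2},
    }

for name, s in network_stats(4).items():
    print(f"{name:12} diameter = {s['diameter']:2}   bisection width = {s['bisection']:2}")
```

For d = 4 (16 processors) this makes the slide's points concrete: every topology except the 2-D mesh has a diameter growing with d = log n, and only the tree has the minimal bisection width of 1.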