ECE 669 Parallel Computer Architecture
Lecture 23: Parallel Compilation (April 29, 2004)

Parallel Compilation
° Two approaches to compilation
  • Parallelize a program manually
  • Sequential code converted to parallel code
° Develop a parallel compiler
  • Intermediate form
  • Partitioning - block based or loop based
  • Placement
  • Routing

Compilation technologies for parallel machines
° Assumptions
  • Input: parallel program
  • Output: "coarse" parallel program plus directives for:
    - Which threads run in one task
    - Where tasks are placed
    - Where data is placed
    - Which data elements go in each data chunk
° Limitation: no special optimizations for synchronization; synchronization memory references are treated like any other communication

Toy example
° Loop parallelization

Example
° Matrix multiply
° Typically:
    FORALL i
      FORALL j
        FOR k
          C[i,j] = C[i,j] + A[i,k] * B[k,j]
° Looking to find parallelism...

Choosing a program representation...
° Dataflow graph
  [Figure: dataflow graph of + nodes computing c0 = a0 + b0, c2 = a2 + b2, c5 = a5 + b5]
  • No notion of storage (a problem)
  • Data values flow along arcs
  • Nodes represent operations

Compiler representation
° For certain kinds of structured programs: data arrays (e.g., A[i,j]), index expressions, and FORALL loop nests
° For unstructured programs: tasks (e.g., Task A, Task B) and data objects (e.g., Data X) connected by edges carrying communication weights

Process reference graph
    FORALL i FROM 0 TO 5
      C[i] = A[i] + B[i]
  [Figure: threads P0 through P5 all referencing one monolithic memory, with weights 1, 2, ... on the reference edges]
° With a monolithic memory, this graph is not very useful
• Nodes represent threads (processes): computation
• Edges represent communication (memory references)
• Can attach weights on edges to represent volume of communication
• Extension: precedence-relation edges can be added too
• Can also try to represent multiple loop-produced threads as one node

Process communication graph
  [Figure: threads P0 through P5 and data objects (A0, A2, B0, C0, C1, d0 through d5) connected by unit-weight communication edges]
• Allocate data items to nodes as well
• Nodes: threads and data objects
• Edges: communication
• Key: works for shared-memory, object-oriented (message passing), and dataflow systems!

PCG for Jacobi relaxation
  [Figure: fine PCG with computation and data nodes joined by weight-4 edges, coarsened into a coarse PCG]

Compilation with PCGs
° Fine process communication graph
° Partitioning
° Coarse process communication graph

Compilation with PCGs
° Fine process communication graph
° Partitioning
° Coarse process communication graph
° MP: Placement
° Coarse process communication graph
° ... other phases: scheduling. Dynamic?
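To make the partitioning phase concrete, here is a minimal Python sketch (illustrative only, not from the lecture). It assumes a fine PCG stored as a dict mapping node pairs to communication weights; the function name coarsen_pcg and the greedy edge-contraction heuristic are stand-ins, one of many possible heuristics. Repeatedly merging the heaviest edge keeps high-volume communication inside one task, which approximates minimizing the cut weight between tasks.

```python
# Hypothetical sketch: greedy edge contraction on a process communication
# graph (PCG). Nodes are threads/data objects; edge weights are
# communication volume. Merging the heaviest edge repeatedly coarsens the
# fine PCG into n_tasks partitions, so heavy communication becomes cheap
# local traffic inside a task.

def coarsen_pcg(nodes, edges, n_tasks):
    """nodes: iterable of node ids; edges: dict {(u, v): weight}."""
    partition = {v: {v} for v in nodes}           # each node starts alone
    weight = dict(edges)                          # mutable copy of edge weights
    while len(partition) > n_tasks and weight:
        (u, v), _ = max(weight.items(), key=lambda e: e[1])
        partition[u] |= partition.pop(v)          # merge v's members into u
        # Re-route v's remaining edges to u, accumulating parallel edges.
        for (a, b), w in list(weight.items()):
            if v in (a, b):
                del weight[(a, b)]
                other = b if a == v else a
                if other != u:
                    key = (u, other) if (u, other) in weight else (other, u)
                    weight[key] = weight.get(key, 0) + w
    return list(partition.values())

# Toy Jacobi-like PCG: threads P0..P2 plus data chunks d0..d2, two tasks.
nodes = ["P0", "P1", "P2", "d0", "d1", "d2"]
edges = {("P0", "d0"): 4, ("P1", "d1"): 4, ("P2", "d2"): 4,
         ("P0", "d1"): 1, ("P1", "d2"): 1}
print(coarsen_pcg(nodes, edges, 2))
```

A production partitioner would also balance computation across partitions; this sketch only chases communication weight.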
Parallel Compilation
° Consider loop partitioning
° Create small, local compilation units
° Consider static routing between tiles
° Short neighbor-to-neighbor communication
° Compiler orchestrated

Flow Compilation
° Modulo unrolling
° Partitioning
° Scheduling

Modulo Unrolling – Smart Memory
° Loop unrolling relies on dependencies
° Allow maximum parallelism
° Minimize communication

Array Partitioning – Smart Memory
° Assign each line to a separate memory
° Consider exchange of data
° Approach is scalable

Communication Scheduling – Smart Memory
° Determine where data should be sent
° Determine when data should be sent

Speedup for Jacobi – Smart Memory
° Virtual wires indicate scheduled paths
° Hard wires are dedicated paths
° Hard wires require more wiring resources
° RAW is a parallel processor from MIT

Partitioning
• Use a heuristic for unstructured programs
• For structured programs... start from the loop nests (L0, L1, L2) and the arrays they touch (A, B, C): a list of arrays and a list of loops

Notion of iteration space, data space
° Example:
    FORALL i
      FORALL j
        A[i,j] = A[i+2,j+1] + A[i,j+1]
° Data space: the matrix A
° Iteration space: each point (i,j) represents a "thread" with a given value of (i,j)

Notion of iteration space, data space (continued)
° A thread at one iteration-space point affects later computation: the data-space elements it writes feed other iteration-space points
° Partitioning: how to "tile" the iteration and data spaces for MIMD machines?

Loop partitioning for caches
° Machine model: a memory connected through a network to per-processor caches (one cache per processor P)
  • Assume all data is in memory
  • Minimize first-time cache fetches
  • Ignore secondary effects such as invalidations due to writes

Summary
° Parallel compilation often targets block-based and loop-based parallelism
° Compilation steps address identification of parallelism and representations
  • Graphs are often useful to represent program dependencies
° For static scheduling, both computation and communication can be represented
° Data positioning is important for computation
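As a closing illustration of the "Loop partitioning for caches" model, here is a minimal Python sketch (not from the lecture; the sizes and names are illustrative). Under the model's assumptions (all data starts in memory, first-time fetches dominate, write effects ignored), a tile's cost is its data footprint: the tile plus the halo implied by the stencil A[i,j] = A[i+2,j+1] + A[i,j+1] from earlier.

```python
# Hypothetical sketch: first-time cache fetches per processor under the
# lecture's cache model. For the stencil A[i,j] = A[i+2,j+1] + A[i,j+1],
# a tile of the iteration space touches its own data elements plus a
# halo of 2 extra rows and 1 extra column.

def footprint(rows, cols, halo_i=2, halo_j=1):
    """Data elements one processor must fetch for a rows x cols tile."""
    return (rows + halo_i) * (cols + halo_j)

N, P = 64, 16                          # N x N iteration space, P processors

# Both tilings give each processor the same work (N*N/P = 256 iterations)
# but different first-time fetch counts.
square = footprint(N // 4, N // 4)     # 4 x 4 grid of 16 x 16 tiles
strip = footprint(N // P, N)           # P row strips of 4 x 64
print(f"square tile: {square} fetches, row strip: {strip} fetches")
# 306 vs 390: the halo cost grows with the tile's perimeter, not its
# area, so squarish tiles minimize first-time fetches for a fixed area.
```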