COMP7330/7336 Advanced Parallel and Distributed Computing
Message Passing Interface Exercise

Dr. Xiao Qin
Auburn University
http://www.eng.auburn.edu/~xqin

Slides are adapted from Drs. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar

Review: Avoiding Deadlocks

Consider the following piece of code, in which process i sends a message to process i + 1 (modulo the number of processes) and receives a message from process i - 1 (modulo the number of processes).

    int a[10], b[10], npes, myrank;
    MPI_Status status;
    ...
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    /* Every process sends to its right neighbor first, then receives from its left neighbor. */
    MPI_Send(a, 10, MPI_INT, (myrank+1)%npes, 1, MPI_COMM_WORLD);
    MPI_Recv(b, 10, MPI_INT, (myrank-1+npes)%npes, 1, MPI_COMM_WORLD, &status);
    ...
    /* Deadlock? */

Whether this deadlocks depends on whether MPI_Send buffers the message: if every send blocks until its matching receive is posted, all processes wait in MPI_Send and none reaches MPI_Recv.

Odd-Even Sort

• Also known as odd-even transposition sort or brick sort[1].
• A simple sorting algorithm for parallel processors with local interconnections.
• Related to bubble sort.
• Compare all (odd, even)-indexed pairs of adjacent elements in the list.
• If a pair is in the wrong order, the elements are switched.
• The next step repeats this for (even, odd)-indexed pairs of adjacent elements.
• The algorithm then alternates between (odd, even) and (even, odd) steps until the list is sorted.

Example: Odd-Even Sort

Indices:  1  2  3  4  5  6  7  8
Num:      3  2  3  8  5  6  4  1

The sort completes in 8 phases.

Odd-Even Transposition

[Figure: sorting n = 8 elements using the odd-even transposition sort algorithm. During each phase, n = 8 elements are compared.]

[Algorithm listing: sequential odd-even transposition sort algorithm]

• After n phases of odd-even exchanges, the sequence is sorted.
• Each phase of the algorithm (either odd or even) requires Θ(n) comparisons.
• Serial complexity is Θ(n²).

Parallel Odd-Even Transposition (How?)

• Consider the case of one item per processor.
• There are n iterations; in each iteration, each processor does one compare-exchange.
• The parallel run time of this formulation is Θ(n).
• With n processors the cost is n × Θ(n) = Θ(n²). This is cost-optimal with respect to the base serial algorithm, which is also Θ(n²), but not with respect to an optimal Θ(n log n) serial sort.

Parallel Odd-Even Transposition

• Consider a block of n/p elements per processor.
• The first step is a local sort.
• In each subsequent step, the compare-exchange operation is replaced by the compare-split operation.
• The parallel run time of the formulation is T_P = Θ((n/p) log(n/p)) + Θ(n) + Θ(n), where the three terms are the local sort, the comparisons over p phases, and the communication over p phases.
• The parallel formulation is cost-optimal for p = O(log n).
• The isoefficiency function of this parallel formulation is Θ(p·2^p).
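
The sequential algorithm and the block-based formulation above can be sketched in plain C. This is a minimal serial sketch, not the code from the original slides: the names `odd_even_sort`, `compare_split`, and `block_odd_even_sort` are hypothetical. Each block of n/p elements stands in for one processor and no MPI calls are made. It also assumes that p divides n evenly and uses 0-based array indices, so the slides' (odd, even) pairs become indices (0,1), (2,3), and so on.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Put the smaller of the two values in *x and the larger in *y. */
static void compare_exchange(int *x, int *y) {
    if (*x > *y) { int t = *x; *x = *y; *y = t; }
}

/* Sequential odd-even transposition sort: n phases of Theta(n) comparisons. */
void odd_even_sort(int *a, int n) {
    for (int phase = 1; phase <= n; phase++) {
        /* Odd phases pair indices (0,1),(2,3),...; even phases pair (1,2),(3,4),... */
        int start = (phase % 2 == 1) ? 0 : 1;
        for (int i = start; i + 1 < n; i += 2)
            compare_exchange(&a[i], &a[i + 1]);
    }
}

static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Compare-split: merge two sorted blocks of m elements each; afterwards
   lo holds the m smallest values and hi holds the m largest, both sorted. */
void compare_split(int *lo, int *hi, int m) {
    if (m <= 0) return;
    int *tmp = malloc(2 * (size_t)m * sizeof *tmp);
    assert(tmp != NULL);
    int i = 0, j = 0;
    for (int k = 0; k < 2 * m; k++) {
        if (j >= m || (i < m && lo[i] <= hi[j])) tmp[k] = lo[i++];
        else tmp[k] = hi[j++];
    }
    memcpy(lo, tmp, (size_t)m * sizeof *tmp);
    memcpy(hi, tmp + m, (size_t)m * sizeof *tmp);
    free(tmp);
}

/* Serial simulation of the block formulation: p blocks of n/p elements.
   Assumption (not from the slides): p divides n evenly. */
void block_odd_even_sort(int *a, int n, int p) {
    assert(p > 0 && n % p == 0);
    int m = n / p;
    /* Step 1: each "processor" sorts its own block locally. */
    for (int b = 0; b < p; b++)
        qsort(a + b * m, (size_t)m, sizeof *a, cmp_int);
    /* Steps 2..p+1: p phases of compare-split between neighboring blocks. */
    for (int phase = 1; phase <= p; phase++) {
        int start = (phase % 2 == 1) ? 0 : 1;
        for (int b = start; b + 1 < p; b += 2)
            compare_split(a + b * m, a + (b + 1) * m, m);
    }
}
```

Calling `odd_even_sort(a, 8)` or `block_odd_even_sort(a, 8, 4)` on the example list 3 2 3 8 5 6 4 1 sorts it in place. In an MPI version, each `compare_split` call would correspond to two neighboring processes exchanging their blocks, after which each keeps its half of the merged result.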