Lec07b-MPI-exercise

COMP7330/7336 Advanced Parallel and
Distributed Computing
Message Passing Interface Exercise
Dr. Xiao Qin
Auburn University
http://www.eng.auburn.edu/~xqin
Slides are adapted from Drs. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar
Review: Avoiding Deadlocks
Consider the following piece of code, in which process i
sends a message to process i + 1 (modulo the number of
processes) and receives a message from process i - 1
(modulo the number of processes).
int a[10], b[10], npes, myrank;
MPI_Status status;
...
MPI_Comm_size(MPI_COMM_WORLD, &npes);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Send(a, 10, MPI_INT, (myrank+1)%npes, 1,
         MPI_COMM_WORLD);
MPI_Recv(b, 10, MPI_INT, (myrank-1+npes)%npes, 1,
         MPI_COMM_WORLD, &status);
...
/* Deadlock? If MPI_Send blocks until the matching receive is posted,
   every process waits in MPI_Send and none reaches MPI_Recv. */
Odd-Even Sort
• Also known as odd-even transposition sort or brick sort [1].
• A simple sorting algorithm for parallel processors with
local interconnections.
• Related to bubble sort.
• Compare all (odd, even)-indexed pairs of adjacent
elements in the list.
• If a pair is in the wrong order, swap its two
elements.
• The next step repeats this for the (even, odd)-indexed
pairs of adjacent elements.
• The algorithm then alternates between (odd, even) and
(even, odd) steps until the list is sorted.
Example: Odd-Even Sort
Index: 1 2 3 4 5 6 7 8
Value: 3 2 3 8 5 6 4 1
The algorithm completes in n = 8 phases.
Odd-Even Transposition
Sorting n = 8 elements using the odd-even transposition sort
algorithm. During each phase, n = 8 elements are compared.
Sequential odd-even transposition
sort algorithm
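A minimal C sketch of the sequential algorithm as described on the Odd-Even Sort slide. The function names are hypothetical, and the code assumes the 1-based convention of the example: an odd phase compares elements (1,2), (3,4), ..., which in 0-based C indexing are the pairs starting at index 0.

```c
#include <stddef.h>

/* Hypothetical helper: order a[i] and a[i+1] so the smaller comes first. */
static void compare_exchange(int *a, size_t i)
{
    if (a[i] > a[i + 1]) {
        int tmp = a[i];
        a[i] = a[i + 1];
        a[i + 1] = tmp;
    }
}

/* Sequential odd-even transposition sort of a[0..n-1].
 * Phases alternate, starting with an odd phase:
 *   odd phase  (1-based pairs (1,2),(3,4),...) -> 0-based start index 0
 *   even phase (1-based pairs (2,3),(4,5),...) -> 0-based start index 1
 * n phases are always enough to sort n elements. */
void odd_even_sort(int *a, size_t n)
{
    for (size_t phase = 1; phase <= n; phase++) {
        size_t start = (phase % 2 == 1) ? 0 : 1;
        for (size_t i = start; i + 1 < n; i += 2)
            compare_exchange(a, i);
    }
}
```

Called on the example sequence 3 2 3 8 5 6 4 1, it runs n = 8 phases and leaves 1 2 3 3 4 5 6 8. Every phase does at most n/2 independent compare-exchanges, which is what the parallel formulations exploit.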
Odd-Even Transposition
• After n phases of odd-even exchanges, the sequence
is sorted.
• Each phase of the algorithm (either odd or even)
requires $\Theta(n)$ comparisons.
• Serial complexity is $\Theta(n^2)$.
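To make the quadratic bound concrete, the exact number of compare-exchange operations can be counted for even n, assuming the 1-based pairing of the example (odd phases compare (1,2), (3,4), ..., even phases compare (2,3), (4,5), ...). Over n phases there are n/2 odd phases with n/2 pairs each and n/2 even phases with n/2 - 1 pairs each:

$$C(n) = \frac{n}{2}\cdot\frac{n}{2} + \frac{n}{2}\left(\frac{n}{2}-1\right) = \frac{n(n-1)}{2} = \Theta(n^2)$$

For n = 8 this gives 4·4 + 4·3 = 28 compare-exchanges.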
Parallel Odd-Even Transposition
(How?)
• Consider the one-item-per-processor case.
• There are n iterations; in each iteration, each
processor does at most one compare-exchange.
• The parallel run time of this formulation is $\Theta(n)$.
• This is cost-optimal with respect to the base serial
algorithm ($p\,T_P = n \cdot \Theta(n) = \Theta(n^2)$) but not with respect
to the optimal serial sort, which takes $\Theta(n \log n)$ time.
Parallel Odd-Even Transposition
Parallel Odd-Even Transposition
• Consider a block of n/p elements per processor.
• The first step is a local sort.
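• In each subsequent step, the compare-exchange
operation is replaced by the compare-split operation.
• The parallel run time of the formulation is
$$T_P = \underbrace{\Theta\!\left(\frac{n}{p}\log\frac{n}{p}\right)}_{\text{local sort}} + \underbrace{\Theta(n)}_{\text{comparisons}} + \underbrace{\Theta(n)}_{\text{communication}}$$
since each of the p phases compares and exchanges $\Theta(n/p)$ elements.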
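The compare-split operation can be sketched in C as follows, assuming both partners hold sorted blocks of the same length m = n/p and have already exchanged them (for example with MPI_Sendrecv). The function name and the scratch-buffer parameter are hypothetical. The lower-ranked partner keeps the m smallest of the 2m elements and the higher-ranked partner keeps the m largest, each found with a half merge in Θ(m) time.

```c
#include <stddef.h>

/* Hypothetical compare-split for two sorted blocks of length m.
 * mine     : this process's sorted block, overwritten with the result
 * other    : the partner's sorted block, as received
 * keep_low : nonzero -> keep the m smallest (lower-ranked partner)
 *            zero    -> keep the m largest  (higher-ranked partner)
 * tmp      : caller-provided scratch buffer of length m
 * Inputs must be sorted ascending; the result is sorted ascending too. */
void compare_split(int *mine, const int *other, size_t m, int keep_low, int *tmp)
{
    if (keep_low) {
        size_t i = 0, j = 0;
        for (size_t k = 0; k < m; k++)          /* merge from the front */
            tmp[k] = (mine[i] <= other[j]) ? mine[i++] : other[j++];
    } else {
        size_t i = m, j = m;
        for (size_t k = m; k > 0; k--)          /* merge from the back */
            tmp[k - 1] = (mine[i - 1] >= other[j - 1]) ? mine[--i] : other[--j];
    }
    for (size_t k = 0; k < m; k++)
        mine[k] = tmp[k];
}
```

For blocks {1, 4, 6, 9} and {2, 3, 7, 8}, the lower partner ends with {1, 2, 3, 4} and the upper partner with {6, 7, 8, 9}. Taking only m elements from the merge keeps every index in range without extra bounds checks.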
Parallel Odd-Even Transposition
• The parallel formulation is cost-optimal for $p = O(\log n)$.
• The isoefficiency function of this parallel formulation
is $\Theta(p\,2^p)$.
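Both claims can be checked with a short derivation, assuming the block formulation's run time $T_P = \Theta((n/p)\log(n/p)) + \Theta(n)$ (local sort plus p compare-split phases of $\Theta(n/p)$ work each), a best serial time of $W = \Theta(n \log n)$, and overhead $T_o = p\,T_P - W$:

$$p\,T_P = \Theta\!\left(n\log\frac{n}{p}\right) + \Theta(np), \qquad np = O(n\log n) \iff p = O(\log n)$$

$$T_o = \Theta(np), \qquad W = K\,T_o \;\Rightarrow\; n\log n = K\,n\,p \;\Rightarrow\; n = 2^{Kp} \;\Rightarrow\; W = Kp\,2^{Kp}$$

With the constant absorbed (K = 1), this is the $\Theta(p\,2^p)$ stated above; the exponential growth of W with p is why the formulation scales poorly.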