An Introduction to Parallel Programming with MPI

March 22, 24, 29, 31, 2005
David Adams
http://research.cs.vt.edu/lasca/schedule
Outline
Disclaimers
Overview of basic parallel programming on a cluster with the goals of MPI
Batch system interaction
Startup procedures
Quick review
Blocking message passing
Non-blocking message passing
Lab day
Collective communications
Review
Functions we have covered in detail:
MPI_INIT
MPI_COMM_SIZE
MPI_SEND
MPI_FINALIZE
MPI_COMM_RANK
MPI_RECV
Useful constants:
MPI_COMM_WORLD
MPI_ANY_TAG
MPI_ANY_SOURCE
MPI_SUCCESS
Motivating Example for Deadlock
[Figure: ten processes, P1 through P10, arranged in a ring. P1 is labeled SEND then RECV; the other processes are labeled RECV then SEND.]
[Animation: the same ring redrawn for Timestep 1 through Timestep 10.]
Solution
MPI_SENDRECV(sendbuf, sendcount, sendtype, dest, sendtag,
recvbuf, recvcount, recvtype, source, recvtag, comm, status, ierror)
The semantics of a send-receive operation is
what would be obtained if the caller forked
two concurrent threads, one to execute the
send, and one to execute the receive,
followed by a join of these two threads.
Nonblocking Message Passing
Allows for the overlap of communication and computation.
Completion of a message is broken into four steps instead of two:
post-send
complete-send
post-receive
complete-receive
Posting Operations
MPI_ISEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, REQUEST, IERROR)
IN <type> BUF(*)
IN INTEGER COUNT, DATATYPE, DEST, TAG, COMM
OUT INTEGER REQUEST, IERROR
MPI_IRECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, REQUEST, IERROR)
OUT <type> BUF(*)
IN INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
OUT INTEGER REQUEST, IERROR
Request Objects
All nonblocking communications use request
objects to identify communication operations
and link the posting operation with the
completion operation.
Conceptually, they can be thought of as a
pointer to a specific message instance floating
around in MPI space.
Just as with pointers, request handles must be
treated with care, or you can create request
handle leaks (like a memory leak) and
completely lose access to the status of a
message.
Request Objects
The value MPI_REQUEST_NULL is used to
indicate an invalid request handle. Operations
that deallocate request objects set the request
handle to this value.
Posting operations allocate memory for request
objects and completion operations deallocate
that memory and clean up the space.
Completion Operations
MPI_WAIT(REQUEST, STATUS, IERROR)
INOUT INTEGER REQUEST
OUT INTEGER STATUS(MPI_STATUS_SIZE), IERROR
A call to MPI_WAIT returns when the operation identified by
REQUEST is complete.
MPI_WAIT is the blocking completion operation. Use it when the program
has determined it can't do any more useful work without completing the
current message; the call then blocks until the corresponding send or
receive completes.
In iterative parallel code, it is often the case that an MPI_WAIT is
placed directly before the next post operation that intends to use the
same request object variable.
Successful completion of the function MPI_WAIT will set
REQUEST=MPI_REQUEST_NULL.
Completion Operations
MPI_TEST(REQUEST, FLAG, STATUS, IERROR)
INOUT INTEGER REQUEST
OUT LOGICAL FLAG
OUT INTEGER STATUS(MPI_STATUS_SIZE), IERROR
A call to MPI_TEST returns flag=true if the operation identified by
REQUEST is complete.
MPI_TEST is the nonblocking version of completion operations.
If flag=true then MPI_TEST will clean up the space associated with
REQUEST, deallocating the memory and setting REQUEST =
MPI_REQUEST_NULL.
MPI_TEST allows the user to create code that can attempt to
communicate as much as possible but continue doing useful work if
messages are not ready.
Maximizing Overlap
To achieve maximum overlap between computation and communication, communications should be started as soon as possible and completed as late as possible.
Sends should be posted as soon as the data to be sent is available.
Receives should be posted as soon as the receive buffer can be reused.
Sends should be completed just before the send buffer is to be reused.
Receives should be completed just before the data in the receive buffer is to be used.
Overlap can often be increased by reordering the
computation.
Setting up your account for MPI
http://courses.cs.vt.edu/~cs4234/MPI/first_exercise.html
List of 124 machine names:
http://courses.cs.vt.edu/~cs4234/MPI/124hosts.txt
More Stuff
Note: to log in to the 124 Linux machines from the outside world, run "ssh rlogin.cslab.vt.edu". You will then be logged into one of the machines in the lab.
Set up a public/private key pair. You only have to do this once. It will allow you to launch MPI jobs from any of the McB 124 machines, and have them run on any of these machines, without having to type passwords.
First, enter the command:
ssh-keygen -t dsa -N ""
The result of this command will be something like this:
Generating public/private dsa key pair.
Enter file in which to save the key (/home/ugrads/NAME/.ssh/id_dsa):
Your identification has been saved in /home/ugrads/NAME/.ssh/id_dsa.
Your public key has been saved in /home/ugrads/NAME/.ssh/id_dsa.pub.
The key fingerprint is:
89:ff:00:5f:06:fd:d0:a2:9e:51:b1:00:cd:0a:76:6f
Then do this:
cd .ssh
cp id_dsa.pub authorized_keys2
To make sure this step worked, try ssh'ing to another machine in the lab, e.g., "ssh strawberry". You should be able to do this without being prompted for a password.
Even More Stuff
Put /home/staff/ribbens/mpich-1.2.6/bin in your path.
Make a subdirectory, mkdir MPI, and cd to it.
Hello world example
Copy hello.c from /home/staff/ribbens/MPI.
Compile and link: mpicc -o hello hello.c
Run on 4 processors: mpirun -np 4 hello
Learn more about mpirun: mpirun -help