SI 486L
Lab 5: Life Goes On
This lab is due prior to our next lab period. All the files should be zipped or tar'd into a single file and
e-mailed (the subject line should contain the phrase “lab 5”) as an attachment to the instructor. One of
the files should be a makefile with a default target that builds an MPI executable called “life”.
Introduction
Earlier this semester you built a cellular automaton simulating a simple life form. Your task is to
convert that program to run up to 8-way parallel. See Lab 2 for background on the simulation “Life”. Your MPI
version should be able to handle a large board because you will divide it into as many as 8 sub-boards, with each processor computing its portion of the board.
Tasks
Your objective is to modify your C version of the game of Life, using MPI, to make it a
parallel application. As in the earlier lab, the first line of the input file gives the size of the
“board” - the dimensions of the 2-D grid. What follows is an arbitrary number of lines, each
containing an x and y value pair indicating a cell that is populated at the start of the simulation. Here's a very
simple example of input:
1600 400
3 1
3 2
3 3
This declares a 1600 x 400 grid and has only 3 populated cells: (3,1), (3,2) and (3,3).
Design
The program should take one argument on the command line, an integer. The argument (argv[1]) is
the number of generations to run the simulation. The final values should be written to stdout in the
format of an input file for your colorIt program from an earlier lab.
The input to the program will be read from stdin. Your version may support a filename specified as
an argument, but that isn't necessary today.
Here's an example of invoking the program:
mpirun -n 8 ./life 500 < huge.data
This invokes, as an MPI parallel application, 8 copies of the binary “life” in the current directory, to
run 500 generations. The input comes from stdin, redirected from the file huge.data in the current
directory.
Here's a simple design (in pseudo code) for the main() function:
main
    parse command line argument
    initialize the boards
    for each generation
        halo exchange
        compute the next generation
        switch boards
    consolidate the results
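Here's a minimal sketch of how that structure might look in C with MPI. The helper steps are left as
comments rather than real functions; only the MPI calls shown are fixed, and everything else is a
placeholder you will fill in yourself.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* parse command line argument: the number of generations */
    if (argc < 2) {
        if (rank == 0) fprintf(stderr, "usage: life generations\n");
        MPI_Finalize();
        return 1;
    }
    int generations = atoi(argv[1]);

    /* initialize the boards: read the points on rank 0, broadcast them,
       and fill in this rank's sub-board (with halo cells around it) */

    for (int gen = 0; gen < generations; gen++) {
        /* halo exchange: refresh the halo cells from the neighbors */
        /* compute the next generation into the second board */
        /* switch boards: swap the current and next pointers */
    }

    /* consolidate the results onto rank 0 and write them to stdout */

    MPI_Finalize();
    return 0;
}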
Dividing up the work and sharing it among the processors can be tricky. We'll make it easier on
ourselves by dividing along just the x-axis, so each rank owns a vertical strip of the board and shares
boundary columns only with its left and right neighbors. We'll use periodic boundaries (wrap around)
both on the top/bottom (handled internally by each rank) and between the first and last boards.
Step #1
Read in all the points on rank 0 and send all the x-y pairs to all the ranks using MPI_Bcast; let each
rank select its points from the master list. That's easier (I hope) than sending each rank its own unique
points. How will each rank know how much space to allocate for the Bcast receive buffer? You'll have
to broadcast that first.
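One way to do Step #1 is sketched below. The variable names (npoints, points, cap) are only
illustrative; the important part is broadcasting the count first so the other ranks can allocate their
receive buffers before the second MPI_Bcast.

int width, height;           /* board dimensions from the first line   */
int npoints = 0;             /* number of x,y pairs read on rank 0     */
int *points = NULL;          /* pairs stored flat: x0,y0,x1,y1,...     */

if (rank == 0) {
    int cap = 1024, x, y;
    scanf("%d %d", &width, &height);        /* first line of the input */
    points = malloc(cap * 2 * sizeof(int));
    while (scanf("%d %d", &x, &y) == 2) {
        if (npoints == cap) {
            cap *= 2;
            points = realloc(points, cap * 2 * sizeof(int));
        }
        points[2 * npoints]     = x;
        points[2 * npoints + 1] = y;
        npoints++;
    }
}

/* broadcast the sizes first so every rank can allocate its buffer */
MPI_Bcast(&width,   1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(&height,  1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(&npoints, 1, MPI_INT, 0, MPI_COMM_WORLD);
if (rank != 0)
    points = malloc(npoints * 2 * sizeof(int));

/* now broadcast the master list of pairs; each rank picks out its own */
MPI_Bcast(points, 2 * npoints, MPI_INT, 0, MPI_COMM_WORLD);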
Step #2
Have each rank treat its coordinates as if they were the base grid. Because of the halo, the lowest
numbered location will be (1,1). If the 4th board starts at, say, (1501, 1) for sub-grids that are 500
cells across, have the program run as if it were starting at (1, 1). The input section can find its points
and map them to the base dimensions (e.g., subtract the strip's x offset, 1500 in this example, from the
x coordinate). That way, all array access works simply, regardless of which section of the board each
rank uses.
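Here is one possible mapping for Step #2, assuming the board width divides evenly among the ranks
and that the variables from the earlier sketch (width, npoints, points) are in scope. The names
local_width, xoff, and board are illustrative, not required.

int local_width = width / size;      /* columns owned by each rank           */
int xoff = rank * local_width;       /* global x just before my first column */

for (int i = 0; i < npoints; i++) {
    int gx = points[2 * i];          /* global coordinates from the input    */
    int gy = points[2 * i + 1];
    if (gx > xoff && gx <= xoff + local_width) {
        /* map to base dimensions: local (1,1) is my lowest numbered cell */
        board[gy][gx - xoff] = 1;
    }
}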
Step #3
The halo exchange consists of two parts: internal copies for the top and bottom rows, and MPI calls to
copy the edge columns to and from the neighboring ranks.
halo exchange:
    copy the top to bottom, bottom to top
    copy the left and right boundaries to neighbors
Be sure to include the halo parts when copying a column or row.
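Here is one possible shape for the halo exchange, assuming the cells are ints stored as board[row][col]
with one halo cell on every side, rows and cols interior cells, and periodic left/right neighbors
computed with modulo arithmetic. MPI_Sendrecv is used so the sends and receives cannot deadlock;
all the names here are illustrative.

int left  = (rank + size - 1) % size;
int right = (rank + 1) % size;

/* internal copies: wrap the top and bottom rows (all columns, halo included) */
for (int c = 0; c <= cols + 1; c++) {
    board[0][c]        = board[rows][c];   /* bottom row -> top halo    */
    board[rows + 1][c] = board[1][c];      /* top row    -> bottom halo */
}

/* MPI copies: exchange the left and right edge columns with neighbors */
int sendbuf[rows + 2], recvbuf[rows + 2];

/* send my rightmost real column to the right, receive my left halo column */
for (int r = 0; r <= rows + 1; r++) sendbuf[r] = board[r][cols];
MPI_Sendrecv(sendbuf, rows + 2, MPI_INT, right, 0,
             recvbuf, rows + 2, MPI_INT, left,  0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
for (int r = 0; r <= rows + 1; r++) board[r][0] = recvbuf[r];

/* send my leftmost real column to the left, receive my right halo column */
for (int r = 0; r <= rows + 1; r++) sendbuf[r] = board[r][1];
MPI_Sendrecv(sendbuf, rows + 2, MPI_INT, left,  1,
             recvbuf, rows + 2, MPI_INT, right, 1,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
for (int r = 0; r <= rows + 1; r++) board[r][cols + 1] = recvbuf[r];

Doing the internal top/bottom copies first means the columns sent to the neighbors already contain
the wrapped halo rows, which takes care of the corner cells.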
Step #4
After all the generations are computed you can “gather” the final results. Normally this isn't a good
approach, because the root rank has to have enough memory to hold all of the boards; so don't send the
whole board, just send the x,y pairs where there is life. Use these data to generate a file for your colorIt lab!
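Below is a sketch of one way to consolidate the results with MPI_Gatherv, again with illustrative
names (xoff, rows, cols, and board come from the earlier sketches). Each rank packs only its live
cells, translated back to global coordinates, and rank 0 prints them; adjust the final printing to
whatever format your colorIt program expects.

/* pack this rank's live cells as global x,y pairs */
int mycount = 0;
int *mypoints = malloc(2 * rows * cols * sizeof(int));
for (int r = 1; r <= rows; r++)
    for (int c = 1; c <= cols; c++)
        if (board[r][c]) {
            mypoints[2 * mycount]     = c + xoff;   /* back to global x */
            mypoints[2 * mycount + 1] = r;
            mycount++;
        }

/* rank 0 learns how many ints each rank will send */
int sendlen = 2 * mycount;
int counts[size], displs[size];
MPI_Gather(&sendlen, 1, MPI_INT, counts, 1, MPI_INT, 0, MPI_COMM_WORLD);

int total = 0, *all = NULL;
if (rank == 0) {
    for (int r = 0; r < size; r++) {
        displs[r] = total;
        total += counts[r];
    }
    all = malloc(total * sizeof(int));
}

/* gather the variable-length lists of live cells onto rank 0 */
MPI_Gatherv(mypoints, sendlen, MPI_INT,
            all, counts, displs, MPI_INT, 0, MPI_COMM_WORLD);

if (rank == 0)
    for (int i = 0; i < total; i += 2)
        printf("%d %d\n", all[i], all[i + 1]);   /* match colorIt's input format */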
Why
Why are we doing this? This is a simple “halo exchange”, a staple in supercomputing applications that
cover large grids. This gets us ready for a more complex halo, where a single rank has neighbors not
just to the left and right, but top and bottom as well. We are dividing the work between processors;
they each get a piece of the data on which to work. This is called Data Parallelism.
The computation involved in Life isn't that intensive; we aren't making the simulation run that much
faster than if it were on a single processor. Running the same work faster by using more processors is
called Strong Scaling. However, we are able to run bigger simulations by running across multiple
processors. This is called Weak Scaling.
Clean Code Counts
The code for this and other labs will be graded on two criteria: 1) does it work? and 2) is it well
written? That means clean, readable code, well structured and well commented – including having
your name in the comments!