
Message-Passing
Computing
Dr. Tim McGuire
Sam Houston State University
ACET 2002
Corpus Christi, TX
Motivation
 So, you attended my 2000 talk in Austin*
(or read a similar article), built a Beowulf
cluster from castoff computers, and now
you’re wondering what you can do with it,
right?
 Well, that’s the motivation for this talk
*T. McGuire, “Building a Low-Cost Supercomputer,”
ACET2000, Austin, Texas, September 2000.
The Target Machine
 For the purpose of this talk, we will look at
Beowulf clusters
 All the techniques we discuss can also be
extended to a network of workstations
 The differences are that a Beowulf cluster uses:
 dedicated processors (rather than scavenging cycles
from idle workstations)
 a private system area network (enclosed SAN rather
than exposed LAN)
How Does One Program a
Beowulf?
 The short answer is Message Passing, a
technique originally developed for distributed
computing
 The Beowulf architecture means that message
passing is more efficient -- it doesn't have to
compete with other traffic on the net
 Other techniques are being explored – Java is a
popular topic at this time
A Typical Uniprocessor
System
 Consists of a processor executing a
program stored in main memory
Types of Parallel Computers
 Two principal types:
 Shared memory multiprocessor
 Distributed memory multi-computer
Shared Memory
Multiprocessor System
 A natural way to extend the single-processor model: have multiple processors connected to multiple memory modules, such that each processor can access any memory module - the so-called shared memory configuration
Shared memory
multiprocessor system
 Any memory location can be accessed by any of the processors.
 A single address space exists, meaning that
each memory location is given a unique address
within a single range of addresses.
 Generally, shared-memory programming is more convenient, although it does require access to shared data to be controlled by the programmer (using critical sections etc.), as sketched below
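A minimal sketch of one such critical section, assuming POSIX threads (this example is illustrative and not from the original talk):

#include <pthread.h>
#include <stdio.h>

static long counter = 0;                        /* shared data */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);              /* enter critical section */
        counter++;                              /* only one thread updates at a time */
        pthread_mutex_unlock(&lock);            /* leave critical section */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);         /* 200000 with the lock in place */
    return 0;
}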
Message-Passing
Multicomputer
 Complete computers connected through an
interconnection network:
Message Passing Software
 PVM (parallel virtual machine) was the
first widely used API
 Developed at Oak Ridge National Laboratory (late
1980s)
 Very widely used (free)
 Berkeley NOW (network of workstations) project
 Has task scheduling and other advanced features
 http://www.epm.ornl.gov/pvm/
More Recent Message
Passing Work
 MPI (Message-passing Interface)
 Standard for message passing libraries
 Defines routines but not implementation
 Has adequate features for most parallel applications
 Version 1 released in 1994 with 120+ routines
defined
 Version 2 now available
 Both PVM and MPI provide a set of user-level libraries
for message passing with normal programming
languages (C, C++, Fortran)
Basics of Message-Passing Programming
 Programming using user-level message-passing libraries:
 Two primary mechanisms needed:
1. A method of creating separate processes for
execution on different computers
2. A method of sending and receiving
messages
Single Program Multiple Data
(SPMD) Model
 Different processes are merged into one program. Within the program, control statements select different parts for each processor to execute. All executables are started together - static process creation.
[Figure: basic MPI model - a single source file is compiled to suit each processor, producing one executable for each of Processor 0 through Processor n-1]
Basic “point-to-point” Send
and Receive Routines
 Passing a message between processes using
send() and recv() library calls:
MPI (Message Passing Interface)
 Standard developed by group of
academics and industrial partners to
foster more widespread use and
portability
 Defines routines, not implementation
 Several free implementations exist
A Simple MPI Example
 The first C program most of us saw was
the “Hello, World!” program in K&R
 We’ll look at a variant that makes some
use of multiple processes to have each
process send a greeting to another
process
 We will assume we have p processes, identified by their ranks 0, 1, …, p-1
First MPI Program
/* From Peter Pacheco, University of San Francisco */
#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
    int my_rank;            /* rank of process */
    int p;                  /* number of processes */
    int source;             /* rank of sender */
    int dest;               /* rank of receiver */
    int tag = 0;            /* tag for messages */
    char message[100];      /* storage for message */
    MPI_Status status;      /* return status for receive */

    /* Start up MPI */
    MPI_Init(&argc, &argv);
First MPI Program, Cont’d
    /* Find out process rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Find out number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (my_rank != 0) {
        /* Create message */
        sprintf(message, "Greetings from process %d!", my_rank);
        dest = 0;
        /* Use strlen+1 so that '\0' gets transmitted */
        MPI_Send(message, strlen(message)+1, MPI_CHAR,
                 dest, tag, MPI_COMM_WORLD);
    } else { /* my_rank == 0 */
        for (source = 1; source < p; source++) {
            MPI_Recv(message, 100, MPI_CHAR, source, tag,
                     MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        } /* end for */
    } /* end if */
First MPI Program, Cont’d
    /* Shut down MPI */
    MPI_Finalize();

    return 0;
} /* main */
 The details of compilation and execution
depend on the system you’re using
 On Bubbawulf:
 gcc -o greetings greetings.c -lmpi
 To run with two processors:
 mpirun -np 2 greetings
Running the first program
 When the program is compiled and run with 4
processes, the output should be:
Greetings from process 1!
Greetings from process 2!
Greetings from process 3!
 This is an example of a special type of MIMD
programming called SPMD (single-program,
multiple-data) programming
 Different processes execute different statements
by branching within the program based on their
process ranks
MPI
 The program consists entirely of C
statements
 MPI is simply a library of definitions and
functions (C or Fortran)
General MPI Programs
 Every MPI program contains the directive
#include "mpi.h"
which includes the definitions and declarations
necessary for compiling an MPI program
 MPI uses a consistent scheme for identifiers –
all begin with “MPI_”
 MPI uses communicators (collections of
processes that can send messages to each
other) – MPI_COMM_WORLD is the default
 Often 1 process per processor, but not
necessarily
MPI Program Skeleton
...
#include "mpi.h"
...
int main(int argc, char* argv[]) {
    ...
    /* No MPI functions called before this */
    MPI_Init(&argc, &argv);      /* initialize MPI system */
    ...
    /* No MPI functions called after this */
    MPI_Finalize();              /* clean up MPI memory, etc. */
    ...
} /* main */
Essential MPI Functions
 MPI_Comm_size()
 Used to find out how many processes are
involved in the execution of a program
 MPI_Comm_rank() lets a process find
out its rank
 Essential since we are using SPMD
 MPI_Send() and MPI_Recv() are used
to accomplish the actual message passing
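As a minimal self-contained sketch of these four calls in use (my own example, assuming a working MPI installation; rank 0 sends one integer to rank 1):

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[]) {
    int rank, size, value;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes are running? */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which one am I? */

    if (rank == 0 && size > 1) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);           /* to rank 1, tag 0 */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);  /* from rank 0 */
        printf("Rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}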
The Killer App
 Every paradigm shift in computing needs a
motivation
 The typical applications for parallel and
distributed processing are not as accessible to
the general undergraduate
 Large matrix operations, etc
 I propose a simple yet interesting application,
using synchronous computations
Cellular Automata
 The problem space is divided into cells.
 Each cell can be in one of a finite number of
states.
 Cells affected by their neighbors according to
certain rules, and all cells are affected
simultaneously in a “generation.”
 Rules re-applied in subsequent generations so
that cells evolve, or change state, from
generation to generation.
Heat Distribution Problem
 An area has known temperatures along each of its
edges. Find the temperature distribution within.
 Divide the area into a fine mesh of points, h_{i,j}. The temperature at an inside point is taken to be the average of the temperatures of the four neighboring points. It is convenient to describe the edges by points.
 The temperature of each point is found by iterating the equation
h_{i,j} = (h_{i-1,j} + h_{i+1,j} + h_{i,j-1} + h_{i,j+1}) / 4      (0 < i < n, 0 < j < n)
for a fixed number of iterations, or until the difference between iterations is less than some very small amount.
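A minimal serial sketch of this iteration (my own illustration, not from the original slides; it assumes a small (n+1) x (n+1) mesh with fixed edge temperatures and a fixed iteration count):

#include <stdio.h>

#define N     16        /* interior points run from 1 to N-1 */
#define LIMIT 1000      /* fixed number of iterations */

int main(void) {
    static double h[N+1][N+1], g[N+1][N+1];   /* zero-initialized interior */
    int i, j, iter;

    /* Edges hold the known boundary temperatures; here only the top edge is hot. */
    for (i = 0; i <= N; i++) {
        h[0][i] = 100.0;                       /* top edge */
        h[N][i] = h[i][0] = h[i][N] = 0.0;     /* other edges */
    }

    for (iter = 0; iter < LIMIT; iter++) {
        for (i = 1; i < N; i++)
            for (j = 1; j < N; j++)
                g[i][j] = 0.25 * (h[i-1][j] + h[i+1][j] + h[i][j-1] + h[i][j+1]);
        for (i = 1; i < N; i++)
            for (j = 1; j < N; j++)
                h[i][j] = g[i][j];             /* update all interior points at once */
    }

    printf("Temperature near the centre: %f\n", h[N/2][N/2]);
    return 0;
}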
Heat Distribution Problem
Parallel Code
w = x = y = z = initial temperature
for (iteration = 0; iteration < limit; iteration++) {
    g = 0.25 * (w + x + y + z);
    send(&g, P_{i-1,j});    /* non-blocking sends */
    send(&g, P_{i+1,j});
    send(&g, P_{i,j-1});
    send(&g, P_{i,j+1});
    recv(&w, P_{i-1,j});    /* synchronous recvs */
    recv(&x, P_{i+1,j});
    recv(&y, P_{i,j-1});
    recv(&z, P_{i,j+1});
}
 It is important to use send()s that do not block while waiting for the recv()s; otherwise the processes would deadlock, each waiting for a recv() before moving on. The recv()s must be synchronous and wait for the send()s.
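In MPI, one way to realize this pattern is with non-blocking MPI_Isend() calls followed by blocking MPI_Recv() calls. The sketch below is my own illustration, assuming one mesh point per process and that the neighbor ranks north, south, east, and west (and the iteration count limit) have already been set up:

int iteration;
double g, w, x, y, z;
MPI_Request req[4];
MPI_Status  status;

w = x = y = z = 20.0;    /* some initial temperature */
for (iteration = 0; iteration < limit; iteration++) {
    g = 0.25 * (w + x + y + z);

    /* Non-blocking sends return immediately, so no process stalls here. */
    MPI_Isend(&g, 1, MPI_DOUBLE, north, 0, MPI_COMM_WORLD, &req[0]);
    MPI_Isend(&g, 1, MPI_DOUBLE, south, 0, MPI_COMM_WORLD, &req[1]);
    MPI_Isend(&g, 1, MPI_DOUBLE, west,  0, MPI_COMM_WORLD, &req[2]);
    MPI_Isend(&g, 1, MPI_DOUBLE, east,  0, MPI_COMM_WORLD, &req[3]);

    /* Blocking receives wait for the neighbors' new values. */
    MPI_Recv(&w, 1, MPI_DOUBLE, north, 0, MPI_COMM_WORLD, &status);
    MPI_Recv(&x, 1, MPI_DOUBLE, south, 0, MPI_COMM_WORLD, &status);
    MPI_Recv(&y, 1, MPI_DOUBLE, west,  0, MPI_COMM_WORLD, &status);
    MPI_Recv(&z, 1, MPI_DOUBLE, east,  0, MPI_COMM_WORLD, &status);

    /* Complete the sends before g is overwritten in the next iteration. */
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
}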
The Game of Life
 Most famous cellular automata is the “Game of
Life” devised by John Conway (Scientific
American, October 1970)
 Also good assignment for graphical output, if
available
 Board game - theoretically infinite two-dimensional array of cells.
 Each cell can hold one “organism” and has eight
neighboring cells, including those diagonally
adjacent. Initially, some cells occupied.
The Rules of Life
1. Every organism with two or three neighboring
organisms survives for the next generation.
2. Every organism with four or more neighbors dies from
overpopulation.
3. Every organism with one neighbor or none dies from
isolation.
4. Each empty cell adjacent to exactly three occupied
neighbors will give birth to an organism.
 These rules were derived by Conway “after a long
period of experimentation.”
How to Solve Life
 Each block can be represented as a
process
 Initialization can be done by giving some
blocks one organism and other blocks
none. This can be done randomly or
using a heuristic approach.
Outline of the Code
do {
    iteration++;
    current_neighbors = 0;
    send(current value - 0 or 1 - to all eight neighbors);
    recv(current values of all eight neighbors);
    current_neighbors = sum of received values;
    if (current_neighbors >= 4)
        organism = 0;    /* dies from overcrowding */
    else if (current_neighbors <= 1)
        organism = 0;    /* dies from isolation */
    else if (current_neighbors == 3)
        organism = 1;    /* new organism created */
} while (!converged() && (iteration < limit));
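For comparison with the one-process-per-cell outline above, here is a serial sketch of a single generation on a finite N x N board (my own illustration; cells outside the board are treated as permanently empty):

#include <string.h>

#define N 32

/* Apply the rules of Life to every cell once, updating the whole board "simultaneously". */
void next_generation(int board[N][N]) {
    int next[N][N];
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            int neighbors = 0;
            for (int di = -1; di <= 1; di++)
                for (int dj = -1; dj <= 1; dj++)
                    if ((di || dj) && i+di >= 0 && i+di < N && j+dj >= 0 && j+dj < N)
                        neighbors += board[i+di][j+dj];
            if (neighbors >= 4 || neighbors <= 1)
                next[i][j] = 0;               /* dies from overcrowding or isolation */
            else if (neighbors == 3)
                next[i][j] = 1;               /* survives, or an empty cell gives birth */
            else
                next[i][j] = board[i][j];     /* exactly two neighbors: unchanged */
        }
    }
    memcpy(board, next, sizeof(next));
}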
Some Other Fun Examples
 Foxes and Rabbits
 Rabbits move around happily (reproducing)
while foxes eat any rabbits they come across
 Also based on a 2-D board
 Sharks and Fishes
 Ocean modeled as a 3-D array of cells
 Each cell holds one fish or one shark
Serious Applications for
Cellular Automata
 Diffusion of gases
 Airflow across an airplane wing
 Erosion/movement of sand at a beach
 Biological growth
IEEE Task Force on Cluster
Computing
 Aims to foster the use and development of clusters
 Has been in operation since 1999
 Main home page: http://www.ieeetfcc.org
Conclusions
• Cluster computing can be effectively
taught at the undergraduate level
• Excellent and fun examples of applications
exist
Quote: Gill wrote in 1958
(quoting papers back to 1953):
“ … There is therefore nothing new in the basic idea of parallel
programming, but only its application to computers. The author
cannot believe that there will be any insuperable difficulty in
extending it to computers. It is not to be expected that the
necessary programming techniques will be worked out overnight.
Much experimenting remains to be done. After all, the techniques
that are commonly used in programming today were only won at
the cost of considerable toil several years ago. In fact the advent of
parallel programming may do something to revive the pioneering
spirit in programming which seems at the present to be
degenerating into a rather dull and routine occupation.”
Gill, S. (1958), “Parallel Programming,” The Computer Journal (British) Vol. 1, pp. 2-10.