Introducing Cooperative Parallelism
An evolutionary programming model
John May, David Jefferson
Nathan Barton, Rich Becker, Jarek Knap
Gary Kumfert, James Leek, John Tannahill
Lawrence Livermore National Laboratory
Presented to the CCA Forum, 25 Jan 2007
This work was performed under the auspices of the U.S. Department of Energy
by the University of California, Lawrence Livermore National Laboratory,
under contract No. W-7405-Eng-48. UCRL-PRES-XXXXXX.
Outline
Challenges for massively parallel programming
Cooperative parallel programming model
Applications for cooperative parallelism
Cooperative parallelism and Babel
Ongoing work
Massive parallelism strains SPMD
Increasingly difficult to make all processors work in lock-step
— Lack of inherent parallelism
— Load balance
New techniques need a richer programming model than pure SPMD with MPI
— Adaptive sampling
— Multi-model simulation (e.g., components)
Fault tolerance requires better process management
— Need a smaller unit of granularity for failure recovery and checkpoint/restart
(Figure callout: new techniques needed to fill the gap)
Introducing Cooperative Parallelism
(Figure: parallel symponents using MPI internally; ad hoc symponent creation and communication; runtime system)
A computational job consists of multiple interacting “symponents”
— Large parallel (MPI) jobs or single processes
— Created and destroyed dynamically
— Appear as objects to each other
— Communicate through remote method invocation (RMI); see the sketch below
Apps can add symponents incrementally
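
As a rough analogy only (this is not the Co-op runtime or its API), the Python sketch below conveys the idea of components that appear to each other as objects and are invoked through RMI. It uses the standard-library xmlrpc modules; the FineScaleModel class, the port number, and the method name are invented for illustration.

    # Rough analogy only -- not the Co-op runtime API.
    # A symponent-like worker exposes a method over RMI (here plain XML-RPC),
    # and a caller invokes it as if the remote worker were a local object.
    import threading
    import time
    from xmlrpc.client import ServerProxy
    from xmlrpc.server import SimpleXMLRPCServer

    class FineScaleModel:
        """Hypothetical component with one remotely callable method."""
        def evaluate(self, x):
            return x * x  # stand-in for an expensive fine-scale computation

    def serve():
        server = SimpleXMLRPCServer(("localhost", 8000),
                                    logRequests=False, allow_none=True)
        server.register_instance(FineScaleModel())
        server.serve_forever()

    threading.Thread(target=serve, daemon=True).start()
    time.sleep(0.5)  # give the server a moment to start listening

    # Caller side: the remote component appears as an ordinary object.
    proxy = ServerProxy("http://localhost:8000", allow_none=True)
    print(proxy.evaluate(3.0))  # blocking remote method invocation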
Cooperative parallelism features
Three synchronization styles for RMI (see the sketch after this list)
— Blocking (caller waits for return)
— Nonblocking (caller checks later for result)
— One-way (caller dispatches request and has no further interaction)
Target of RMI can be a single process or a parallel job, with parameters distributed to all tasks
Closely integrated with the Babel framework
— Symponents written in C, C++, Fortran, F90, Java, and Python interact seamlessly
— Developer writes interface description files to specify RMI interfaces
— Exceptions propagated from remote methods
— Object-oriented structure lets symponents inherit capabilities and interfaces from other symponents
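
The Python sketch below mimics the three synchronization styles locally with concurrent.futures; it is an analogy for how a caller interacts with each style of RMI, not the Co-op calling interface, and remote_method is a hypothetical stand-in for a remote computation.

    # Local analogy of the three RMI synchronization styles (not the Co-op API).
    from concurrent.futures import ThreadPoolExecutor
    import time

    def remote_method(x):
        time.sleep(0.1)          # stand-in for a remote computation
        return x + 1

    pool = ThreadPoolExecutor(max_workers=2)

    # Blocking: the caller waits for the return value.
    result = pool.submit(remote_method, 1).result()

    # Nonblocking: the caller continues and checks later for the result.
    future = pool.submit(remote_method, 2)
    # ... caller does other work here ...
    later = future.result()      # collect the result when it is needed

    # One-way: the caller dispatches the request and never looks at the outcome.
    pool.submit(remote_method, 3)

    pool.shutdown(wait=True)
    print(result, later)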
Benefits of cooperative parallelism
Easy subletting of work improves load balance
Simple model for expressing task-based parallelism (rather than data parallelism)
Nodes can be suballocated dynamically
Dynamic management of symponent jobs supports fault tolerance
— Caller is notified of failing symponents and can relaunch them (see the sketch below)
Existing stand-alone applications can be modified and combined as discrete modules
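
The sketch below illustrates the relaunch-on-failure idea in generic Python; it is not the Co-op notification mechanism. A failed piece of work surfaces to the caller as an exception, and the caller simply launches it again; flaky_symponent_work and run_with_relaunch are invented names.

    # Generic relaunch-on-failure pattern (not the Co-op runtime).
    from concurrent.futures import ProcessPoolExecutor

    def flaky_symponent_work(item, attempt):
        # Stand-in for a symponent job that is lost on its first launch
        # for even-numbered work items.
        if attempt == 0 and item % 2 == 0:
            raise RuntimeError(f"symponent running item {item} failed")
        return item * 2

    def run_with_relaunch(pool, item, attempts=3):
        for attempt in range(attempts):
            try:
                return pool.submit(flaky_symponent_work, item, attempt).result()
            except RuntimeError:
                continue          # caller was notified; relaunch the work
        raise RuntimeError("gave up after repeated failures")

    if __name__ == "__main__":
        with ProcessPoolExecutor() as pool:
            print([run_with_relaunch(pool, item) for item in range(5)])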
But what about MPI?
Cooperative parallelism
— Dynamic management of symponents
— Components are opaque to each other
— Communication is connectionless, ad hoc, interrupting, and point-to-point
MPI and MPI-2
— Mostly static process management (MPI-2 can spawn processes but not explicitly terminate them; see the sketch below)
— Tasks are typically highly coordinated
— Communication is connection-oriented and either point-to-point or collective; MPI-2 supports remote memory access
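
For contrast, here is a minimal sketch of MPI-2 dynamic process creation, assuming mpi4py, numpy, and an MPI implementation with spawn support; it follows the dynamic-process-management pattern from the mpi4py documentation and is unrelated to the Co-op runtime. Note that the parent can Disconnect from the spawned children but has no call to terminate them.

    # MPI-2 dynamic process creation with mpi4py (run the parent under mpiexec).
    import sys
    import numpy
    from mpi4py import MPI

    if len(sys.argv) > 1 and sys.argv[1] == "--child":
        parent = MPI.Comm.Get_parent()            # intercommunicator to the spawner
        n = numpy.zeros(1, dtype="i")
        parent.Bcast([n, MPI.INT], root=0)        # receive work size from the parent
        contribution = numpy.array([float(parent.Get_rank() * int(n[0]))])
        parent.Reduce([contribution, MPI.DOUBLE], None, op=MPI.SUM, root=0)
        parent.Disconnect()
    else:
        # Parent: spawn four copies of this script as children.
        children = MPI.COMM_SELF.Spawn(sys.executable,
                                       args=[__file__, "--child"], maxprocs=4)
        n = numpy.array([100], dtype="i")
        children.Bcast([n, MPI.INT], root=MPI.ROOT)
        total = numpy.zeros(1)
        children.Reduce(None, [total, MPI.DOUBLE], op=MPI.SUM, root=MPI.ROOT)
        print("combined result from children:", total[0])
        children.Disconnect()                     # no way to kill the children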
Applications: Load balancing
(Figure: well-balanced work, a server proxy, and servers for unbalanced work)
Divide work into well-balanced and unbalanced parts
Run the balanced work as a regular MPI job
Set up a pool of servers to handle the unbalanced work
— Server proxy assigns work to available servers
Tasks with extra work can sublet it in parallel so they can catch up to less-busy tasks (see the sketch below)
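
A schematic Python version of the sublet pattern follows; it is a stand-in, not the Co-op server proxy. A task hands its extra work to a pool of servers through a proxy object and merges the answers; ServerPoolProxy, expensive_piece, and the pool size are invented for illustration.

    # Sublet pattern sketch: hand unbalanced work to a pool of servers.
    from concurrent.futures import ProcessPoolExecutor

    def expensive_piece(item):
        # Stand-in for the unbalanced part of the computation.
        return sum(i * i for i in range(item))

    class ServerPoolProxy:
        """Hypothetical proxy that assigns work to whichever server is free."""
        def __init__(self, num_servers=4):
            self.pool = ProcessPoolExecutor(max_workers=num_servers)

        def sublet(self, work_items):
            # Dispatch every item and gather results as servers finish them.
            return list(self.pool.map(expensive_piece, work_items))

    if __name__ == "__main__":
        balanced_part = sum(range(1000))          # regular, well-balanced work
        proxy = ServerPoolProxy()
        extra = proxy.sublet([200_000, 300_000, 400_000, 500_000])  # unbalanced work
        print(balanced_part, extra)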
Applications: Adaptive sampling
(Figure: coarse-scale model, server proxy, and fine-scale servers; legend shows the unknown function, previously computed values, interpolated values, and a newly computed value)
Multiscale model, similar to AMR
— BUT: can use different models at different scales
Fine-scale computations are requested from remote servers to improve load balance
Initial results are cached in a database
Later computations check the cached results and interpolate if the accuracy is acceptable (see the sketch below)
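
The sketch below shows the interpolate-or-compute decision with stand-in models and a deliberately crude accuracy test; it is not the actual material-modeling code or its error estimator, and fine_scale and SampleDatabase are invented names.

    # Adaptive-sampling sketch: reuse cached fine-scale results when a cheap
    # interpolation looks accurate enough, otherwise compute and cache.
    import math

    def fine_scale(x):
        return math.sin(x)  # stand-in for an expensive remote fine-scale run

    class SampleDatabase:
        def __init__(self):
            self.points = []  # cached (x, fine_scale(x)) pairs

        def query(self, x):
            # Interpolate between the two nearest cached points if they
            # bracket x closely enough (a crude accuracy test).
            near = sorted(self.points, key=lambda p: abs(p[0] - x))[:2]
            if len(near) == 2 and near[0][0] != near[1][0]:
                (x0, y0), (x1, y1) = near
                close = abs(x1 - x0) < 0.2 and min(abs(x - x0), abs(x - x1)) < 0.1
                if close:
                    return y0 + (y1 - y0) * (x - x0) / (x1 - x0), "interpolated"
            # Not accurate enough: compute at fine scale and cache the result.
            y = fine_scale(x)
            self.points.append((x, y))
            return y, "computed"

    db = SampleDatabase()
    for x in (0.0, 0.1, 0.05, 0.06, 2.0):
        value, how = db.query(x)
        print(f"f({x:.2f}) ~= {value:.4f}  ({how})")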
Applications: Parameter studies
Completed
simulation
Active
simulation
Completed
simulation
Active
simulation


Master
Active
simulation

Master process launches
multiple parallel components
to complete a simulation,
each with different
parameters
Existing simulation codes
can be wrapped to form
components
Master can direct study
based on accumulated
results, launch new
components as others
complete
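
A schematic master loop for a parameter study in plain Python follows; the quadratic merit function, the random search rule, and the run budget are invented stand-ins for a wrapped production simulation, not the Co-op master.

    # Parameter-study sketch: launch runs, and as each completes, use the
    # accumulated results to choose the next parameter to try.
    from concurrent.futures import ProcessPoolExecutor, FIRST_COMPLETED, wait
    import random

    def simulation(parameter):
        # Stand-in for a wrapped parallel simulation; returns a figure of merit.
        return -(parameter - 3.2) ** 2

    if __name__ == "__main__":
        random.seed(0)
        results = {}                                    # parameter -> merit
        with ProcessPoolExecutor(max_workers=4) as master_pool:
            pending = {master_pool.submit(simulation, p): p
                       for p in (0.0, 2.0, 4.0, 6.0)}
            while pending and len(results) < 12:
                done, _ = wait(pending, return_when=FIRST_COMPLETED)
                for future in done:
                    p = pending.pop(future)
                    results[p] = future.result()
                    # Direct the study: explore near the best parameter so far.
                    best = max(results, key=results.get)
                    new_p = best + random.uniform(-1.0, 1.0)
                    pending[master_pool.submit(simulation, new_p)] = new_p
            # Any leftover runs finish on shutdown and are simply ignored.
        print("best parameter:", max(results, key=results.get))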
Applications: Federated simulations
Components modeling separate physical entities interact
Potential example: ocean, atmosphere, sea ice
— Each modeled in a separate job
— Interactions communicated through RMI (N-by-M parallel RMI is future work)
Cooperative parallelism and Babel
Babel gives Co-op
— Language interoperability, SIDL, object-oriented model
— RMI, including exception handling
Co-op adds
— Symponent launch, monitoring, and termination
— Motivation and resources for extending RMI
— Patterns for developing task-parallel applications
Babel and Cooperative Parallelism teams are colocated and share some staff
Status: Runtime software
Prototype runtime software working on Xeon, Opteron, and Itanium systems
Completed a 1360-CPU demonstration in September
Beginning to do similar-scale runs on new platforms
Planning to port to IBM systems this year
Ongoing work to enhance software robustness and documentation
Status: Applications
Ongoing experiments with a material modeling application
— Demonstrated 10-100X speedups on >1000 processors using adaptive sampling
Investigating use in a parameter study application
In discussions with several other apps groups at LLNL
Also looking for possible collaborations with groups outside LLNL

Contacts: John May ([email protected]), David Jefferson ([email protected])

http://www.llnl.gov/casc/coopParallelism/
