OpenMP - Cluster Extensions Node Interconnect

OpenMP - Cluster Extensions
Thomas Bönisch
[email protected]
Höchstleistungsrechenzentrum Stuttgart
SMP - Cluster (Hybrid System)
•
Most modern high-performance computing (HPC) systems are
clusters of SMP nodes
Node Interconnect
•
•
DMP (distributed memory parallelization) on the node inteconnect
SMP (symmetric multi-processing) inside of each node
OpenMP - Cluster Extensions
Thomas Bönisch
Slide 2
Höchstleistungsrechenzentrum Stuttgart
20.
— OpenMP – Cluster Extensions
20-1
—
20.
SMP - Clusters Current Solutions for Programming
•
MPI based:
– the MPP model
• massively parallel processing
• each CPU = one MPI process
– MPI + OpenMP
•
•
•
•
Node Interconnect
each SMP node = one MPI process
MPI communication on the node interconnect
OpenMP inside of each SMP node
DMP with MPI & SMP with OpenMP
– MPI + automatic parallelization
•
Other models:
– HPF, MLP, ...
•
What about an OpenMP-like approach ?
OpenMP - Cluster Extensions
Thomas Bönisch
Slide 3
Höchstleistungsrechenzentrum Stuttgart
MPI + OpenMP on SMP-Clusters
•
•
•
Advantages
– Could be effective utilizing heavyweight communications
between nodes and lightweight threads within a node
– Less communication packets and uses larger communication
packets than pure MPI on SMP Clusters.
Disadvantages
– Very Difficult to start with OpenMP & modify for MPI
(non-incremental)
– Very difficult to program, debug, modify and maintain
– Generally, cannot do MPI calls from within parallel regions
– Only people very experienced in both should use this mixed
programming model - an intimidating programming model
– Single Node AND single CPU performance suffers
Could provide highest performance (at a cost!)
OpenMP - Cluster Extensions
Thomas Bönisch
Slide 4
Höchstleistungsrechenzentrum Stuttgart
20.
— OpenMP – Cluster Extensions
20-2
—
20.
“Distributed OpenMP” Current Approaches
•
New directives for distributing data
– Portland Group (following slides)
– Compaq
•
No extensions to OpenMP itself, but an new more intelligent
run time environment
– Real World Computing Partnership (RWCP) Japan
•
KAI announced KAP/Pro Toolset Network Edition
– presumably also DVSM approach
but there is no further information
– available far 2001
OpenMP - Cluster Extensions
Thomas Bönisch
Slide 5
Höchstleistungsrechenzentrum Stuttgart
A New Parallel Programming Paradigm for SMP Clusters ?
•
•
•
Ideally, a mixed parallel programming paradigm would exist that
– enable parallelism using low level, efficient one-way
communication between PEs
– using an efficient multi-threaded mechanism between CPU
within a PE
Such a mixed parallel programming paradigm should
– allow for full efficiency within an SMP System while also
– retaining efficiency for distributed memory clusters
Other desired features:
– incremental parallelism
– ease-of-programming
– maintainability and debugability should be maintained
OpenMP - Cluster Extensions
Thomas Bönisch
Slide 6
Höchstleistungsrechenzentrum Stuttgart
20.
— OpenMP – Cluster Extensions
20-3
—
20.
Distributed OMP
•
A merging of SMP and cluster capabilities
– Merge OpenMP + Craft + HPF technologies
– Goal is to maintain full source compatibility with OpenMP
•
A programmer should be able to
– start with an OpenMP program and
– later add mapping directives
OpenMP - Cluster Extensions
Thomas Bönisch
Slide 7
Höchstleistungsrechenzentrum Stuttgart
Distributed OMP - Characteristics
•
Each node is treated as a single process with one-way
communication between the nodes
•
lightweight threads within a node
•
Each node’s program would utilize multiple threads according to
OpenMP directives and OpenMP library calls
•
Directives for data distribution to the nodes
no explicit communication
•
Less data movements between nodes plus larger packets of data.
OpenMP - Cluster Extensions
Thomas Bönisch
Slide 8
Höchstleistungsrechenzentrum Stuttgart
20.
— OpenMP – Cluster Extensions
20-4
—
20.
Number of Communications
100000
10000
1000
1 CPU x 128 nodes
4 CPUs x 32 nodes
32 CPUs x 4 nodes
100
10
1
all-to-all
reduction
But the message size is increasing in the first case
OpenMP - Cluster Extensions
Thomas Bönisch
Slide 9
Höchstleistungsrechenzentrum Stuttgart
Distributed OpenMP - Advantages
•
One can have a single portable code that can be executed on SMP
systems and distributed clusters and hybrid combinations of SMP
clusters
•
Single SMP node performance will not degrade
•
Incremental parallelization capability
•
Managers don’t have to risk “betting” on SMP systems being the
dominant system
OpenMP - Cluster Extensions
Thomas Bönisch
Slide 10
Höchstleistungsrechenzentrum Stuttgart
20.
— OpenMP – Cluster Extensions
20-5
—
20.
Distributed OpenMP - Disadvantages
•
Not yet available (9-12 months)
•
More complex than OpenMP or HPF
•
User still must give some thought about data partitionment
•
Performance tuning after mapping directives are added
OpenMP - Cluster Extensions
Thomas Bönisch
Slide 11
Höchstleistungsrechenzentrum Stuttgart
“Distributed OpenMP” Current Approaches
•
New directives for distributing data
– Portland Group (following slides)
– Compaq
•
No extensions to OpenMP itself, but an new more intelligent
run time environment
– Real World Computing Partnership (RWCP) Japan
•
KAI announced KAP/Pro Toolset Network Edition
– presumably also DVSM approach
but there is no further information
– available far 2001
OpenMP - Cluster Extensions
Thomas Bönisch
Slide 12
Höchstleistungsrechenzentrum Stuttgart
20.
— OpenMP – Cluster Extensions
20-6
—
20.
Compiler-Directed Software DSM
•
CD-SDSM transparently executes OpenMP programs on Clusters
of SMP’s
•
Compilation:
– The particular compiler compiles the OpenMP C-program to a
multi-threaded C-program with runtime library calls.
– This program is compiled using a native C-compiler
– linked with the libraries
•
The executable is executed on each SMP node in a SPMD fashion
(like MPI)
OpenMP - Cluster Extensions
Thomas Bönisch
Slide 13
Höchstleistungsrechenzentrum Stuttgart
CD-SDSM Internals
•
A part of the adress space of each process
(running on a SMP node)
is virtually shared between the nodes.
•
Coherence of shared space is maintained by software
– fine grain coherence control
– compiler can optimize coherence operations
(It knows about the source !!!)
– these optimizations are crucial to obtain good performance
•
Redundant barrier removal
– Overhead for synchronization is large for SDSM
OpenMP - Cluster Extensions
Thomas Bönisch
Slide 14
Höchstleistungsrechenzentrum Stuttgart
20.
— OpenMP – Cluster Extensions
20-7
—
20.
CD-SDSM Results
•
Jacobi Overrelaxation Solver, Matrix4096x4096, 10 Its.
20
18
16
14
12
10
8
6
4
2
0
Serial
Parallal
1x1 1x2 1x4 2x1 2x2 2x4 4x1 4x2 4x4 8x1 8x2 8x4
nodes x threads
OpenMP - Cluster Extensions
Thomas Bönisch
Slide 15
Höchstleistungsrechenzentrum Stuttgart
Summary
•
Distributed OpenMP
– may provide an (excellent ?) technology path for SMP Clusters
– Will be available in 9-12 months
– Increased complexity
– Not yet standardized
•
CD-SDSM for OpenMP programs
– available for C
– shows promising results
– easier for the programmer
•
No experience for large configurations
OpenMP - Cluster Extensions
Thomas Bönisch
Slide 16
Höchstleistungsrechenzentrum Stuttgart
20.
— OpenMP – Cluster Extensions
20-8
—
20.