Software Group Compiler Technology
Task, thread and processor
— OpenMP 3.0 and beyond
Guansong Zhang,
IBM Toronto Lab
© 2006 IBM Corporation
Compiler technology
Overview
 The purpose of the talk
– Introducing the latest improvements to the OpenMP standard
• Task
• Still under discussion; the syntax shown is not final
– Considerations for future OpenMP development
• Thread affinity
The changing world
 Hardware improvement
– Development of multicore systems
• Soon we will have more processors than we know how to program
• IBM to Build World's First Cell Broadband Engine Based
Supercomputer
• Intel: Quad core to turbocharge chips
• Terra Soft to Build Cell-Based Super Out of PS3 Beta Iron
 OpenMP standard
– The C/C++ and Fortran specifications were merged into a single standard in version 2.5
 Other language committees
– C++ memory model: atomic access
More changes in the OpenMP world
 New players
– Microsoft just joined the OpenMP ARB
• …, and Visual C++® 2005 supports the full standard.
OpenMP is also supported by the Xbox 360™ platform
– GCC
• The GOMP project is developing an implementation of
OpenMP for the C, C++, and Fortran 95 compilers in the
GNU Compiler Collection
Workshare and task pool
 What is a workshare?
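The slide's figure is not recoverable, so here is a minimal sketch of a workshare using the OpenMP 2.5 loop construct. `scale` is a hypothetical function, not from the talk. The whole iteration space exists before the loop starts and is divided among the threads of the team:

```c
/* Hypothetical helper: scale every element of a[0..n-1] by s.
   The loop's iteration space is known before the loop starts, so the
   loop worksharing construct can split it among the team's threads. */
void scale(double *a, int n, double s)
{
    /* `parallel for` creates a team and shares the iterations among it;
       schedule(static) gives each thread one contiguous block. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        a[i] *= s;
}
```

Every iteration is created up front and none can spawn or wait for another, which is the property tasks relax.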
Workshare and task pool (cont.)
 What is a task?
 Not a workshare
– But still "sharing/cooperating" between threads
 Compared with a workshare
– Units of work can be generated dynamically
– A unit can wait for other generated units
Task examples
 Pointer chasing
  #pragma omp parallel
  {
    #pragma omp single
    {
      while (p) {
        #pragma omp task
        process(p);
        p = p->next;
      }
    }
  }
 Recursive algorithm
  int fib(int n) {
    int x, y;
    if (n < 2)
      return n;
    #pragma omp taskgroup
    {
      #pragma omp task common(x)
      x = fib(n-1);
      #pragma omp task common(y)
      y = fib(n-2);
    }
    return x + y;
  }
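The syntax above is the draft under discussion at the time. For comparison, here is a sketch of both examples in the syntax that OpenMP 3.0 eventually adopted. `shared` replaces `common`, and `taskwait` (wait for the child tasks generated so far) stands in for `taskgroup`, which only arrived later in OpenMP 4.0. The `node` type, the body of `process`, and the `process_list`/`parallel_fib` wrappers are assumptions added to make the code self-contained:

```c
/* Sketch in OpenMP 3.0 task syntax. Hypothetical list type and work. */
typedef struct node {
    int value;
    struct node *next;
} node;

/* Stand-in for the slide's process(p): doubles the node's value. */
static void process(node *p) { p->value *= 2; }

/* Pointer chasing: one thread walks the list, creating a task per node. */
void process_list(node *p)
{
    #pragma omp parallel
    {
        #pragma omp single
        {
            while (p) {
                /* p is shared in this region, so capture its current value. */
                #pragma omp task firstprivate(p)
                process(p);
                p = p->next;
            }
        }   /* all tasks are complete at the barrier that ends the single */
    }
}

/* Recursive algorithm: each call spawns two child tasks and waits for them. */
int fib(int n)
{
    int x, y;
    if (n < 2)
        return n;
    #pragma omp task shared(x)      /* `shared` rather than the draft `common` */
    x = fib(n - 1);
    #pragma omp task shared(y)
    y = fib(n - 2);
    #pragma omp taskwait            /* wait for both child tasks */
    return x + y;
}

/* fib only runs in parallel when called from inside a parallel region. */
int parallel_fib(int n)
{
    int r;
    #pragma omp parallel
    #pragma omp single
    r = fib(n);
    return r;
}
```

Without the explicit `firstprivate(p)`, every task would read the same shared, moving pointer. Creating a task at every level of `fib` is expensive, so practical code usually falls back to a serial call below some cutoff `n`.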
Task schedule
 More flexible scheduling
– Can a task be multi-threaded?
• When a task is encountered, the thread always switches to the new task
 Advantage
– The idea is to provide one more level of abstraction
• Task-centric view
• Tries to avoid thread starvation
• Potential cache reuse
 Disadvantage
– Threadprivate
• No threadprivate data
– Thread id
• HPC users may need the thread id to localize data access
– Locks
• Lock ownership becomes confusing
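A sketch of the thread-id concern: any thread of the team may execute a given task, so per-task results are combined with an atomic update rather than stored in an array indexed by `omp_get_thread_num()`. The function `count_even` is hypothetical, and the task syntax is that of OpenMP 3.0:

```c
/* Hypothetical example: count the even elements of v[0..n-1], one task per
   element. Any thread of the team may execute a given task, so the count is
   updated atomically instead of being kept per thread id. */
long count_even(const int *v, int n)
{
    long count = 0;                 /* shared by the whole team */

    #pragma omp parallel
    #pragma omp single              /* one thread generates the tasks */
    for (int i = 0; i < n; i++) {
        #pragma omp task firstprivate(i) shared(count)
        {
            if (v[i] % 2 == 0) {
                #pragma omp atomic
                count++;
            }
        }
    }
    /* All tasks have finished at the barrier that ends the parallel region. */
    return count;
}
```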
Emerging architectures
Performance numbers
[Figure: performance results, two panels: Stride 1 and Stride 2]
Thread affinity
 Nested parallelism
• Organize threads into multiple levels (already part of previous OpenMP standards)
 Thread grouping
• Balance the number of available threads against the parallelism in the code
 Thread mapping
• Associate each OpenMP thread with a physical/logical processor
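A sketch of two-level nested parallelism using only features already in OpenMP 2.5 (the `num_threads` clause and nested `parallel` constructs). `add_nested` and its parameters are hypothetical. Nested parallelism must be enabled (for example with `OMP_NESTED=true`); otherwise each inner region runs with a team of one thread, and the result is the same either way. Nothing here binds a thread to a processor, which is exactly the gap the thread-mapping item describes:

```c
/* Hypothetical two-level decomposition of a[i] += b[i]: an outer team of
   `groups` threads, each of which forks an inner team of `per_group`
   threads over its own contiguous block. Requires groups > 0, per_group > 0. */
void add_nested(double *a, const double *b, int n, int groups, int per_group)
{
    #pragma omp parallel for num_threads(groups)
    for (int g = 0; g < groups; g++) {
        /* Block owned by group g; the long long casts avoid int overflow. */
        int lo = (int)((long long)n * g / groups);
        int hi = (int)((long long)n * (g + 1) / groups);

        /* Inner level: gets extra threads only if nesting is enabled. */
        #pragma omp parallel for num_threads(per_group)
        for (int i = lo; i < hi; i++)
            a[i] += b[i];
    }
}
```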
How to represent a thread group
 User interface
– Environment variable: no change to user code
– Explicit index: simple data type; possibly multiple changes in the source
– Descriptor handle: new internal type; allows centralized thread-group programming
 Modularity (procedure calls, library functions)
– Environment variable: no support
– Explicit index: pass the level, a CPU array, and the array size
– Descriptor handle: pass a group-type variable, which may be used as an execution context (as in MPI)
 Nested parallelism (thread-to-thread affinity)
– Environment variable: fixed in advance; no dynamic adjustment according to user input
– Explicit index: specify the number of threads at different levels
– Descriptor handle: specify the thread composition
 Mapping threads
– Environment variable: implementation defined
– Explicit index: supported, through virtual CPU numbers
– Descriptor handle: supported, through omp_get_procs()
 Heterogeneous system
– Environment variable: no support (?)
– Explicit index: supported; different kinds of CPU under the same numbering scheme?
– Descriptor handle: different groups
What lies ahead
 Performance is still our goal
– "OpenMP is about performance."
• Quoted from NASA scientists
 OpenMP needs to broaden its scope to reach a wider market
– C/C++ will become more important
– People would like to see non-numeric programs written in OpenMP
 Partition the OpenMP interface into layers
– TASK, WORKSHARE vs. THREAD vs. PROC
– MPI has more than 300 calls; most people use only 6 to 8 of them
– Should we keep the layered approach as we extend OpenMP?
Summary
 Start a parallel region
 Split it into two nested parallel regions
– This is the chance to bind threads to the right processors
 Start a task region
– For independent work
• E.g. game objects
 Start a workshare
– For computation-intensive calculations
• E.g. graphics rendering
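The four steps above can be sketched as one frame of a hypothetical game loop. `object_t`, `frame`, and the pixel loop standing in for rendering are illustrative assumptions, and the task syntax is that of OpenMP 3.0. The processor binding mentioned on the slide is not shown, since no standard OpenMP construct for it existed at the time:

```c
/* Hypothetical game object: position and velocity along one axis. */
typedef struct { float x, v; } object_t;

void frame(object_t *objs, int nobj, float dt, float *pixels, int npix)
{
    /* Outer parallel region split into two parts, each of which opens its
       own nested parallel region (binding would happen here). */
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {   /* Independent work (game objects): one task per object. */
            #pragma omp parallel
            #pragma omp single
            for (int i = 0; i < nobj; i++) {
                #pragma omp task firstprivate(i)
                objs[i].x += objs[i].v * dt;
            }
        }
        #pragma omp section
        {   /* Compute-intensive work (stand-in for rendering): a workshare. */
            #pragma omp parallel for
            for (int p = 0; p < npix; p++)
                pixels[p] *= 0.5f;
        }
    }
}
```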
Q&A