Introduction to Parallel Computing

BS"D
FLEXCOMP Lab
The Challenge of Parallel Architectures: An Integrative Approach
D. Dayan, M. Goldstein, S. Mizrahi, R. B. Yehezkael
JCT, January 2011 – Shevat 5771
A few slides are based on presentations by Katherine Yelick (UC-Berkeley) and David A. Wood (UW-Madison).
FLEXCOMP Lab Presentation
Motivation
• Multicore processors are here!
- Intel, AMD, Sun, … are shipping multicore processors.
- The computing power of desktops, servers and the cloud
is being transformed by multicore processors.
• But parallel programming is hard!
- Few people have real experience in programming
multiprocessor systems.
- Parallel programming must be made easier:
- What do application programmers need in order to program
parallel systems easily?
- What should programming languages and/or models provide
to programmers to make parallel programming easier?
Making Parallel Programming Easier: a proposal
We intend to concentrate research efforts on three
interrelated topics
• Programming
– A Flexible Execution Embedded Language is in the making.
• Hardware
– Hardware support for Flexible Execution will be developed.
• Education for Parallel Thinking
– Educational material on parallelism has been developed, and more
will be developed, for parallel programming courses.
What was done in the past
• Flexible Algorithms – The idea
- A flexible algorithm allows various execution orders
(parallel / asynchronous), but the end result is well defined.
Asynchronous means sequential in any order.
- Declarative notation with algorithmic style.
- Notationally simple and multifaceted.
• Flexible Algorithms – Course for Beginners
• Simple Flexible Language – SFL
- A prototype compiler was developed in 2002 by Isaac Dayan.
What was done in the past
Educational Approach
- Emphasis on flexible algorithms - early awareness of parallelism
- Reading - Executing – Understanding
- Converting flexible algorithms to hardware block diagrams (a.c.)
- Converting flexible algorithms to sequential algorithms (t.r.)
- Computational Induction - Deeper understanding
- Changes to (and writing) flexible algorithms
- Overall description of systems using flexible algorithms (a.c.)
- Broadens the students' outlook, which is important in a first CS course.
Flexible Algorithms
Functional form:
Parameters for values received (IN variables)
Parameters for values returned (OUT variables)
(no INOUT variables)
Function calls and compositions.
Set of statements
- once only and composite assignments
Flexible Algorithms – An Example
Reversing part of a vector
function v' •= reverse(v, low, high);
{
  if (low < high)
    { v'[high] •= v[low];
      v' •= reverse(v, low+1, high-1); // **
      v'[low] •= v[high]; }
  else if (low = high)
    { v'[high] •= v[high]; };
} // end reverse
** May also be written as: reverse(low •= low+1; high •= high-1);
The statements low •= low+1 and high •= high-1 do not change the values of low and high in the calling activation:
there are separate variables for each call (activation) of a function.
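The order independence of this algorithm can be checked with a small Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the class `OnceOnlyVector` and the helper `reversed_copy` are hypothetical names, once-only assignment is modelled by raising an error on a second write, and "any execution order" is modelled by shuffling each set of statements before running it sequentially.

```python
import random


class OnceOnlyVector:
    """Hypothetical model of an OUT vector: each element may be assigned once."""

    def __init__(self, size):
        self._cells = [None] * size
        self._written = [False] * size

    def assign(self, index, value):
        # A second assignment to the same element is an error.
        if self._written[index]:
            raise ValueError(f"element {index} assigned more than once")
        self._cells[index] = value
        self._written[index] = True

    def values(self):
        return list(self._cells)


def reverse(v, low, high, out, rng):
    """Write v[low..high] reversed into out, running each statement set in random order."""
    if low < high:
        statements = [
            lambda: out.assign(high, v[low]),
            lambda: reverse(v, low + 1, high - 1, out, rng),
            lambda: out.assign(low, v[high]),
        ]
        rng.shuffle(statements)  # any sequential order is permitted
        for statement in statements:
            statement()
    elif low == high:
        out.assign(high, v[high])


def reversed_copy(v, seed):
    """Reverse the whole vector using the execution order selected by seed."""
    out = OnceOnlyVector(len(v))
    reverse(v, 0, len(v) - 1, out, random.Random(seed))
    return out.values()
```

Calling `reversed_copy([1, 2, 3, 4, 5], seed)` yields `[5, 4, 3, 2, 1]` for every seed, because each element of the result is written exactly once and no statement reads a value written by another.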
Flexible Algorithms – An Example
Suppose that v=(1,2,3,4,5) and we wish to execute:
r' •= reverse (v,0,4);
Parallel execution method
set of statements                                             r'
{ r' •= reverse (v, 0, 4); }                                  ( _, _, _, _, _ )
{ r'[4] •= v[0]; r' •= reverse (v, 1, 3); r'[0] •= v[4]; }    ( _, _, _, _, _ )
{ r'[3] •= v[1]; r' •= reverse (v, 2, 2); r'[1] •= v[3]; }    ( 5, _, _, _, 1 )
{ r'[2] •= v[2]; }                                            ( 5, 4, _, 2, 1 )
{ }                                                           ( 5, 4, 3, 2, 1 )
5 lines
Flexible Algorithms – An Example
Sequential execution left to right with immediate execution of
the function call at left
set of statements                                                            r'
{ r' •= reverse (v, 0, 4); }                                                 ( _, _, _, _, _ )
{ r'[4] •= v[0]; r' •= reverse (v, 1, 3); r'[0] •= v[4]; }                   ( _, _, _, _, _ )
{ r' •= reverse (v, 1, 3); r'[0] •= v[4]; }                                  ( _, _, _, _, 1 )
{ r'[3] •= v[1]; r' •= reverse (v, 2, 2); r'[1] •= v[3]; r'[0] •= v[4]; }    ( _, _, _, _, 1 )
{ r' •= reverse (v, 2, 2); r'[1] •= v[3]; r'[0] •= v[4]; }                   ( _, _, _, 2, 1 )
{ r'[2] •= v[2]; r'[1] •= v[3]; r'[0] •= v[4]; }                             ( _, _, _, 2, 1 )
{ r'[1] •= v[3]; r'[0] •= v[4]; }                                            ( _, _, 3, 2, 1 )
{ r'[0] •= v[4]; }                                                           ( _, 4, 3, 2, 1 )
{ }                                                                          ( 5, 4, 3, 2, 1 )
9 lines
Flexible Algorithms – An Example
Sequential execution left to right with delayed execution of
the function call at left
set of statements                                             r'
{ r' •= reverse (v, 0, 4); }                                  ( _, _, _, _, _ )
{ r'[4] •= v[0]; r' •= reverse (v, 1, 3); r'[0] •= v[4]; }    ( _, _, _, _, _ )
{ r' •= reverse (v, 1, 3); r'[0] •= v[4]; }                   ( _, _, _, _, 1 )
{ r' •= reverse (v, 1, 3); }                                  ( 5, _, _, _, 1 )
{ r'[3] •= v[1]; r' •= reverse (v, 2, 2); r'[1] •= v[3]; }    ( 5, _, _, _, 1 )
{ r' •= reverse (v, 2, 2); r'[1] •= v[3]; }                   ( 5, _, _, 2, 1 )
{ r' •= reverse (v, 2, 2); }                                  ( 5, 4, _, 2, 1 )
{ r'[2] •= v[2]; }                                            ( 5, 4, _, 2, 1 )
{ }                                                           ( 5, 4, 3, 2, 1 )
9 lines
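The three execution methods traced above (parallel, sequential with immediate execution of the call, sequential with delayed execution of the call) can be reproduced with a short Python simulation. This is a sketch under assumptions: the tuple encoding of statements and the names `expand`, `step_parallel`, `step_immediate`, `step_delayed` and `run` are illustrative and do not come from the slides. "Delayed" is interpreted here as "execute the leftmost assignment first and expand a call only when no assignment remains", which matches the trace.

```python
# A statement is ("assign", dst, src), meaning r'[dst] •= v[src],
# or ("call", low, high), meaning r' •= reverse(v, low, high).

def expand(low, high):
    """Replace a call to reverse by the statements of its body."""
    if low < high:
        return [("assign", high, low), ("call", low + 1, high - 1), ("assign", low, high)]
    if low == high:
        return [("assign", high, high)]
    return []


def do_assign(stmt, r, v):
    _, dst, src = stmt
    if r[dst] is not None:
        raise ValueError("once-only assignment violated")
    r[dst] = v[src]


def step_parallel(stmts, r, v):
    """Execute every statement of the set in one step."""
    pending = []
    for stmt in stmts:
        if stmt[0] == "assign":
            do_assign(stmt, r, v)
        else:
            pending.extend(expand(stmt[1], stmt[2]))
    return pending


def step_immediate(stmts, r, v):
    """Execute the leftmost statement; a call is expanded in place."""
    head, rest = stmts[0], stmts[1:]
    if head[0] == "assign":
        do_assign(head, r, v)
        return rest
    return expand(head[1], head[2]) + rest


def step_delayed(stmts, r, v):
    """Execute the leftmost assignment; expand a call only if no assignment is left."""
    for i, stmt in enumerate(stmts):
        if stmt[0] == "assign":
            do_assign(stmt, r, v)
            return stmts[:i] + stmts[i + 1:]
    head, rest = stmts[0], stmts[1:]
    return expand(head[1], head[2]) + rest


def run(step, v, low, high):
    """Return the final r' and the number of trace lines, including the empty set."""
    r = [None] * len(v)
    stmts = [("call", low, high)]
    lines = 1
    while stmts:
        stmts = step(stmts, r, v)
        lines += 1
    return r, lines
```

For `v = [1, 2, 3, 4, 5]`, `run(step_parallel, v, 0, 4)` takes 5 trace lines while the two sequential strategies take 9 each, and all three end with `[5, 4, 3, 2, 1]`.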
... All is foreseen, yet freedom of choice is given ...
Flexible Algorithms – Work to be done
• Flexible Execution Embedded Language
• Hardware Support for Flexible Execution
• Education for Parallel Thinking
Education for Parallel Thinking
• Amdahl's Law.
• Integrated course on flexible algorithms
and digital logic (also for secondary schools?).
• Advanced course on flexible algorithms and
parallel programming.
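As a reminder of the first topic: Amdahl's Law states that if a fraction p of a program's running time can be parallelized over n processors, the overall speedup is at most 1 / ((1 - p) + p / n). A minimal Python sketch follows; the function name `amdahl_speedup` is illustrative and not taken from the course material.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Upper bound on speedup when parallel_fraction of the work runs on `processors` processors."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)
```

With 90% of the work parallelizable, ten processors give a speedup of only about 5.3, and no number of processors can exceed a speedup of 10.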
Embedded Flexible Language (EFL)
// Host language code (Python, Java, etc.)
………..
………..
EFL {
………..
………..
}
// Host language code (Python, Java, etc.)
………..
………..
EFL {
………..
………..
}
Should annotations be allowed?
Annotations would indicate where in the EFL code parallelism should be used, since too much parallelism can be inefficient.
Annotations do not affect the results produced by the program and may be ignored by the EFL precompiler (e.g. if there is only one processor).
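The idea that an annotation changes how a block runs, but never what it computes, can be sketched in Python. This is not EFL syntax, which the slides leave open: `run_block`, `parallel_hint` and `workers` are hypothetical names, and the standard `concurrent.futures.ThreadPoolExecutor` stands in for whatever the precompiler would generate.

```python
from concurrent.futures import ThreadPoolExecutor


def run_block(tasks, parallel_hint=False, workers=4):
    """Run independent zero-argument tasks and return their results in order.

    parallel_hint plays the role of an annotation: it may be honoured or
    ignored (e.g. when only one worker is available) without changing the results.
    """
    if parallel_hint and workers > 1:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Executor.map returns results in the order of the inputs.
            return list(pool.map(lambda task: task(), tasks))
    return [task() for task in tasks]
```

For independent tasks, `run_block(tasks, parallel_hint=True)` and `run_block(tasks, parallel_hint=False)` return the same list, so ignoring the hint is always safe.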
Embedded Flexible Language (EFL)
Interface between the host language and EFL
- Flexible execution is supported in EFL blocks.
- Only sequential execution is allowed in (non-EFL) host language code.
We have been working on the design of this interface.
MDA for EFL – Our Goals
Domain: the EFL language
Aims:
• A formal and precise grammar for the EFL language.
• A decomposition that separates the platform-independent components from the platform-dependent ones.
• Easier transformations to the PSMs and to the code.
MDA application
An MDA (Model-Driven Architecture) application is composed of:
1. A platform-independent model (PIM).
2. One or more platform-specific models (PSMs).
3. Transformations, which describe how the source model is mapped
to the various target platforms.
MDA - Model Transformations
(figure from: Bezivin, J. and Blanc, X., "MDA : Vers un Important Changement de Paradigme en Génie Logiciel", 2002)
MDA – Overview of the PIM meta-model
MDA – Transformation from PIM to PSM
MDA – Code Generation from PSMs
• The syntactic translation
• The semantic translation
Hardware Support for Flexible Execution
Composite Assignments - Asynchronous execution
x f= e; // means the new value of x = (current value of x) f e;
// f is a binary operator and e, e0, ..., en are expressions.
x = e0; // Initial value of x
x f= e1; // Composite assignment
...
x f= en; // Composite assignment
Asynchronous execution of composite assignments is well defined if
((a f b) f c) = ((a f c) f b).
Values of the expressions e0, ..., en may be computed in parallel.
Once only assignment is a special case of composite assignment.
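The condition ((a f b) f c) = ((a f c) f b) can be explored with a brute-force check in Python. The sketch below is illustrative only: `results_over_all_orders` is a hypothetical helper that applies the composite assignments x f= e1, ..., x f= en in every possible order and collects the distinct final values of x.

```python
from functools import reduce
from itertools import permutations
import operator


def results_over_all_orders(f, e0, es):
    """Set of final values of x after x = e0 followed by x f= e for every ordering of es."""
    return {reduce(f, order, e0) for order in permutations(es)}


def overwrite(a, b):
    """Plain repeated assignment x = e; it does not satisfy the condition."""
    return b
```

Bitwise or gives a single result whatever the order, for example `results_over_all_orders(operator.or_, 0b0001, [0b0010, 0b0100, 0b0010])` is `{0b0111}`. Subtraction also qualifies even though it is not commutative, since (a - b) - c = (a - c) - b. Plain overwriting fails: `results_over_all_orders(overwrite, 0, [1, 2, 3])` is `{1, 2, 3}`, so the result depends on the order.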
Hardware Support for Flexible Execution
• Use of the "or=" (and similarly "and=") composite assignment is suggested.
Capacitance-based memories make it possible to implement these
operations very simply.
Paradoxically, there is no need to read the contents of the memory to
perform "or=", and several "or=" operations may be performed in parallel.
The end result is well defined.
The same holds for "and=".
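A toy software model may make the "no read needed" point concrete. The class below is a hypothetical illustration, not a description of the actual circuit: each cell is treated as a capacitor that "or=" can only charge and "and=" can only discharge, so neither operation ever inspects the stored value.

```python
class CapacitorWord:
    """Hypothetical model of a memory word made of capacitor cells (1 = charged)."""

    def __init__(self, width, charged=False):
        self.cells = [1 if charged else 0] * width

    def or_write(self, data):
        """or=: charge the cells where the data bit is 1; other cells are left alone."""
        for i, bit in enumerate(data):
            if bit:
                self.cells[i] = 1  # write only, the old value is never read

    def and_write(self, data):
        """and=: discharge the cells where the data bit is 0; other cells are left alone."""
        for i, bit in enumerate(data):
            if not bit:
                self.cells[i] = 0  # write only, the old value is never read
```

Because charging a cell twice is the same as charging it once, applying the same set of `or_write` calls in any order, or interleaved, leaves the word in the same final state.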
Hardware Support for Flexible Execution
DRAM: principle of operation
Writing is done by selecting a row and a column by means of the address lines.
The MUX is in state 1, so the data on the BUS charges or discharges the capacitor selected by the address lines.
For reading, the address lines select the appropriate row and column, the MUX is in state 0, and the data goes out to the BUS through the sense amplifier circuits.
Because the memory element is a capacitor with self-leakage, the memory must be refreshed; a refresh interval of 64 msec is customary.
Hardware Support for Flexible Execution
OR DRAM for Parallel Computing
OR_DRAM operates like ordinary DRAM; the difference is that a write performs an OR operation.
The diodes make it impossible to write a 0; only a 1 can be written.
To reset a memory cell, one can address it and pull the CLR line down to 0, which erases the cell. Erasure can also be achieved by waiting long enough for the charge to drain from the capacitor.
This memory is an important component for parallel computers based on the FLEX algorithm, since the basis of the parallel computation is the OR function, and it is performed by the very act of writing to memory.
In addition, there is no need to fetch the data to the CPU, which saves a great deal of time in the computer's performance.
The FLEXCOMP People
Staff
David Dayan
Moshe Goldstein
Shimon Mizrahi
Raphael B. Yehezkael
Students
Or Berlowitz
Sarit Gutman
Max (Mordechai) Rabin
Efrat Tamir
Others
Isaac Baron (JCT graduate, currently completing his PhD)
Isaac Dayan (JCT graduate, currently not at any college)