Parallel Program Design Tasks, Critical Path

Parallel Program Design
Tasks, Critical Path
Ned Nedialkov
McMaster University
Canada
SE 3F03
January 2015
Foster’s Methodology
Tasks
Degrees of concurrency
Critical path
Examples
Outline
Foster’s Methodology
Tasks
Degrees of concurrency
Critical path
Examples
c 2013–15 Ned Nedialkov
2/16
Foster’s Methodology
Tasks
Degrees of concurrency
Critical path
Examples
Foster’s Methodology
1. Partitioning. Divide the computation into small tasks that
can be done in parallel
2. Communication. Determine the communications between
the tasks
3. Aggregation (or agglomeration). Combine tasks and
communications into larger tasks
4. Mapping. Assign the combined tasks to processes (or
threads)
From Ian Foster, Designing and Building Parallel Programs,
http://www.mcs.anl.gov/~itf/dbpp/
c 2013–15 Ned Nedialkov
3/16
Foster’s Methodology
Tasks
Degrees of concurrency
Critical path
Examples
Tasks
I
We can decompose a computation into smaller parts or
tasks
I
The goal is to determine which can be executed in parallel
I
Fine-grained decomposition: many small tasks
I
Coarse-grained decomposition: small number of large
tasks
Task-dependency graph
I
I
I
I
directed acyclic graph
nodes are tasks
there is an edge between task A and task B if B must be
executed after A
c 2013–15 Ned Nedialkov
4/16
Foster’s Methodology
Tasks
Degrees of concurrency
Critical path
Examples
Degrees of concurrency
I
Maximum degree of concurrency: the maximum number of
tasks that can be executed in parallel
I
Average degree of concurrency: the average number of
tasks that can be executed in parallel
I
The average degree of concurrency is a more useful
measure
c 2013–15 Ned Nedialkov
5/16
Foster’s Methodology
Tasks
Degrees of concurrency
Critical path
Examples
Critical path
I
Start nodes: nodes with no incoming edges
I
Finish nodes: nodes with no outgoing edges
I
Critical path: the longest directed path between any pair of
start and finish nodes
I
Critical path length: sum of the weights of the nodes on a
critical path
I
ave degree of concurrency =
c 2013–15 Ned Nedialkov
total amount of work
critical path length
6/16
Foster’s Methodology
Tasks
Degrees of concurrency
Critical path
Examples
Example
Task-dependency graph
I
I
I
I
maximum degree of concurrency is 6
critical path length is 5
total amount of work is 14 (assuming each task takes one
unit of time)
average degree of concurrency is 14/5 = 2.8
c 2013–15 Ned Nedialkov
7/16
Foster’s Methodology
Tasks
Degrees of concurrency
Critical path
I
An assignment to 2 processes and the order in which the
tasks are executed
I
First number is the process number
I
The speedup is 14/7 = 2
0,1
1,1
1,3
0,2
1,2
0,4
1,7
0,3
Examples
1,5
1,4
0,5
1,6
0,6
0,7
c 2013–15 Ned Nedialkov
8/16
Foster’s Methodology
Tasks
Degrees of concurrency
I
An assignment to 3 processes
I
The speedup is 14/5 = 2.8
1,3
2,3
0,1
1,1
0,2
1,4
1,5
2,1
Critical path
Examples
2,2
1,2
0,3
2,4
0,4
0,5
c 2013–15 Ned Nedialkov
9/16
Foster’s Methodology
Tasks
Degrees of concurrency
I
An assignment to 4 processes
I
The speedup is 14/5 = 2.8
1,2
2,2
1,3
2,1
3,1
3,2
1,4
Critical path
0,1
Examples
1,1
0,2
0,3
2,3
0,4
0,5
I
What is the speedup on 8 processes?
c 2013–15 Ned Nedialkov
10/16
Foster’s Methodology
Tasks
Degrees of concurrency
Critical path
Examples
Example: LU decomposition
I
We want to compute the LU decomposition of a matrix A
I
Assume A consists of 3 × 3 blocks
I
We want





A11 A12 A13
L11 0
0
U11 U12 U13
A21 A22 A23  → L21 L22 0   0 U22 U23 
A31 A32 A33
L31 L32 L33
0
0 U33
{z
}|
{z
}
|
L
c 2013–15 Ned Nedialkov
U
11/16
Foster’s Methodology
Tasks
Degrees of concurrency
I
We need to determine the Lij and Uij
I
Consider L times the first column of U
I
We have
I
Critical path
Examples
A11 = L11 U11
compute L11 and U11
(1)
A21 = L21 U11
−1
L21 = A21 U11
(2)
A31 = L31 U11
−1
A31 U11
(3)
L31 =
From multiplying the first row of L times U, we obtain
A12 = L11 U12
U12 = L−1
11 A21
(4)
A13 = L11 U13
L−1
11 A13
(5)
U13 =
c 2013–15 Ned Nedialkov
12/16
Foster’s Methodology
I
Tasks
Degrees of concurrency
Critical path
Examples
Then we write
A22 = L21 U12 + L22 U22
A022 = A22 − L21 U12 = L22 U22
(6)
compute L22 and U22
(7)
= A23 − L21 U13 = L22 U23
(8)
A023
A23 = L21 U13 + L22 U23
0
L−1
22 A23
(9)
A032 = A32 − L31 U12 = L32 U22
(10)
−1
L32 = A032 U22
(11)
= A33 − L31 U13
(12)
− L32 U23 = L33 U33
(13)
compute L33 and U22
(14)
U23 =
A32 = L31 U12 + L32 U22
A033
A33 = L31 U13 + L32 U23 + L33 U33
A0033
=
c 2013–15 Ned Nedialkov
A033
13/16
Foster’s Methodology
I
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
Tasks
Degrees of concurrency
Critical path
Examples
Using the right column in (1–14), we can decompose the
computation in the following tasks
A11 = L11 U11
−1
L21 = A21 U11
−1
L31 = A31 U11
U12 = L−1
11 A21
U13 = L−1
11 A13
A022 = A22 − L21 U12
A032 = A32 − L31 U12
A023 = A23 − L21 U13
A033 = A33 − L31 U13
A022 = L22 U22
−1
L32 = A032 U22
−1 0
U23 = L22 A23
A0033 = A033 − L32 U23
A0033 = L33 U33
(compute LU factorization)
(compute LU factorization)
(compute LU factorization)
c 2013–15 Ned Nedialkov
14/16
Foster’s Methodology
Tasks
Degrees of concurrency
Critical path
Examples
1
2
5
4
3
8
6
9
7
10
12
11
13
14
Figure: Task-dependency graph
c 2013–15 Ned Nedialkov
15/16
Foster’s Methodology
Tasks
Degrees of concurrency
P1
2
5
8
P0
P2
10
12
P3
4
6
P1
Examples
P0
1
P0
Critical path
3
9
P2
7
P3
P0
P0
13
14
P1
11
P0
P0
Figure: Task distribution on 4 processes. Maximum speedup is 2
c 2013–15 Ned Nedialkov
16/16