
Parallelism and Amdahl's Law
Eric Shook
Department of Geography
Kent State University
Parallel computing
Image sources: intel.com, http://www.nasa.gov/audience/foreducators/k-4/features/F_ESSEA_Course_K-4.html
Inter-process Communication
Shared Memory: processing core 0 and processing core 1 work on a single copy of the data, [40.742, -74.245], because the memory space is shared between them.
Message Passing: processing core 0 and processing core 1 each have a private memory space, so each holds its own copy of [40.742, -74.245], and data moves between the cores only as explicit messages.
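The two communication models can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads stand in for processing cores, a `queue.Queue` stands in for the message channel, and the function names `shared_memory_demo` and `message_passing_demo` are hypothetical. Real message passing between separate processes or machines would normally go through a dedicated library.

```python
import queue
import threading


def shared_memory_demo():
    """Two workers update one list that lives in memory both can see."""
    shared = [40.742, -74.245]
    lock = threading.Lock()  # guards concurrent updates to the shared list

    def worker(index):
        with lock:
            shared[index] += 1.0  # the change is visible to every worker

    workers = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return shared


def message_passing_demo():
    """Worker 0 sends a copy of its private data to worker 1 over a queue."""
    channel = queue.Queue()
    sender_data = [40.742, -74.245]  # private to worker 0
    results = {}

    def worker0():
        channel.put(list(sender_data))  # send a copy, not a reference

    def worker1():
        private = channel.get()  # worker 1's own private copy
        private[0] += 1.0  # does not change worker 0's data
        results["worker1"] = private

    workers = [threading.Thread(target=worker0), threading.Thread(target=worker1)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sender_data, results["worker1"]
```

In the shared-memory version both updates land in the same list, so access has to be coordinated with a lock. In the message-passing version the receiver edits its own copy and the sender's data stays untouched.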
Parallel Programming Paradigms
Functional Parallelism: processing core 0 runs Task A and processing core 1 runs Task B, each on the full data.
Data Parallelism: processing core 0 and processing core 1 each run Task A and then Task B, each on its own half of the data.
Figure note: equivalent processing times.
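The difference between the two paradigms can be sketched with Python's standard `concurrent.futures` module. The tasks here are placeholders chosen for illustration (`task_a` sums the values and `task_b` takes the maximum); they are assumptions, not the tasks from the slides. The data is assumed to hold at least two values so that neither half is empty.

```python
from concurrent.futures import ThreadPoolExecutor


def task_a(values):
    """Hypothetical Task A: sum of the values."""
    return sum(values)


def task_b(values):
    """Hypothetical Task B: maximum of the values."""
    return max(values)


def functional_parallelism(data):
    """Run different tasks at the same time, each on the full data."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(task_a, data)
        future_b = pool.submit(task_b, data)
        return future_a.result(), future_b.result()


def data_parallelism(data):
    """Run the same task on each half of the data, then combine the parts."""
    mid = len(data) // 2
    halves = [data[:mid], data[mid:]]
    with ThreadPoolExecutor(max_workers=2) as pool:
        partial_sums = list(pool.map(task_a, halves))
        partial_maxima = list(pool.map(task_b, halves))
    # Combine the per-half results into the answer for the whole data set.
    return sum(partial_sums), max(partial_maxima)
```

Both functions return the same pair of results for the same input; only the way the work is divided differs. Threads are used here to show the structure. In standard CPython they do not speed up CPU-bound work, so a process pool would be the more likely choice for real workloads.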
Spatial Domain Decomposition
Row or Column
Quadtree
Recursive Bisection
Grid
Ding, Y., & Densham, P. J. (1996). Spatial strategies for parallel spatial modelling. International Journal of Geographical Information Systems, 10(6), 669-698.
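Two of the strategies above, row (or column) decomposition and grid decomposition, can be sketched as index arithmetic on a raster. This is a minimal illustration, not the method of Ding and Densham; the helper names `split_range` and `grid_decomposition` are hypothetical, and every cell is assumed to cost the same to process.

```python
def split_range(length, n_parts):
    """Split indices 0..length-1 into n_parts contiguous (start, stop) bands.

    Band sizes differ by at most one; stop is exclusive.
    """
    base, extra = divmod(length, n_parts)
    bands = []
    start = 0
    for part in range(n_parts):
        # The first `extra` bands take one additional index each.
        stop = start + base + (1 if part < extra else 0)
        bands.append((start, stop))
        start = stop
    return bands


def grid_decomposition(n_rows, n_cols, parts_y, parts_x):
    """Split a raster into parts_y * parts_x rectangular blocks.

    Each block is (row_start, row_stop, col_start, col_stop).
    """
    return [
        (r0, r1, c0, c1)
        for r0, r1 in split_range(n_rows, parts_y)
        for c0, c1 in split_range(n_cols, parts_x)
    ]
```

For example, `split_range(10, 3)` gives the row bands `(0, 4)`, `(4, 7)`, `(7, 10)`, one band per core. Applying it to the column count instead gives a column decomposition.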
Challenges for Parallelism: Load-Imbalance
Load-imbalance: an uneven amount of data for processing.
Processing core 0 and processing core 1 both run Task A, but on uneven amounts of data, so core 0 will finish processing much sooner than core 1.
Load-Imbalance: Bad for Performance
Imbalanced Workload: one core receives 20% of the work and the other 80%. The 20% core finishes early and is then doing nothing, but could be processing. The 80% core is overloaded.
Balanced Workload: each core receives 50% of the work.
The extra time the imbalanced case takes is all lost time due to imbalance.
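The cost of the 20%/80% split can be worked out directly. The sketch below assumes equally fast cores and a processing time proportional to each core's share of the work; the function names are hypothetical.

```python
def parallel_time(shares, serial_time=1.0):
    """Elapsed time when each core gets a fraction of the work.

    The run ends only when the most heavily loaded core finishes.
    """
    return max(shares) * serial_time


def idle_time(shares, serial_time=1.0):
    """Time each core spends waiting for the slowest core to finish."""
    finish = parallel_time(shares, serial_time)
    return [finish - share * serial_time for share in shares]


imbalanced = parallel_time([0.2, 0.8])  # 0.8 of the serial time
balanced = parallel_time([0.5, 0.5])    # 0.5 of the serial time
```

With the imbalanced split the run takes 0.8 of the serial time, a speedup of only 1.25 on two cores, and the lightly loaded core sits idle for about 0.6 of the serial time. The balanced split takes 0.5 of the serial time, a speedup of 2.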
Challenges: Not Enough Parallelism
Not enough task parallelism (figure: Task A, Task B, Task C, Task D, Task E)
Data too small for data parallelism
Measuring Parallel Performance: Speedup
Speedup is commonly used to assess the performance of a parallel
program. Speedup is defined as the execution time on a single
core (T1) over the execution time on p cores (Tp) (Amdahl, 1967):
$S_p = \frac{T_1}{T_p}$
Linear or ideal speedup is reached when Sp = p.
Figure: speedup versus number of cores, comparing linear speedup with actual speedup.
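Speedup is computed from measured run times. The sketch below uses made-up timings purely for illustration; they are not measurements from any real program, and the name `speedup` is a hypothetical helper.

```python
def speedup(t1, tp):
    """Speedup: execution time on one core divided by the time on p cores."""
    if t1 <= 0 or tp <= 0:
        raise ValueError("execution times must be positive")
    return t1 / tp


# Hypothetical wall-clock times in seconds, keyed by number of cores.
timings = {1: 120.0, 2: 64.0, 4: 36.0, 8: 24.0}

t1 = timings[1]
for cores, tp in sorted(timings.items()):
    actual = speedup(t1, tp)
    # Linear (ideal) speedup equals the number of cores.
    print(f"{cores} cores: actual speedup {actual:.2f}, linear speedup {cores}")
```

In these illustrative numbers the actual speedup falls further below the linear line as cores are added, which is the typical shape of the curve in the figure.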
Amdahl's Law: Theoretical Speedup
Figure: the tasks of a program (Task A to Task E) divided into serial portions and a parallel portion.
Assume P is the parallel portion of a parallel program,
then (1-P) is the portion that cannot be made parallel (serial portion).
Amdahl's law states that the maximum speedup on N processors is:
$S(N) = \frac{1}{(1-P) + \frac{P}{N}}$
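The formula follows from the definition of speedup in a few steps. The derivation below is a sketch that assumes the parallel portion divides perfectly over N processors with no communication or scheduling overhead; the symbols T_1 and T_N (execution time on 1 and on N processors) are introduced here for illustration.

```latex
% Time on N processors: the serial portion is unchanged,
% the parallel portion is divided by N.
T_N = (1-P)\,T_1 + \frac{P\,T_1}{N}

% Speedup is the single-processor time over the N-processor time.
S(N) = \frac{T_1}{T_N}
     = \frac{T_1}{(1-P)\,T_1 + \frac{P\,T_1}{N}}
     = \frac{1}{(1-P) + \frac{P}{N}}
```

Because real programs do have overhead, this value is an upper bound on the speedup rather than a prediction.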
Amdahl's Law: Examples
$S(N) = \frac{1}{(1-P) + \frac{P}{N}}$
As N tends to infinity, S(N) tends to 1/(1-P)
Parallel Portion    Maximum Speedup*
99%                 100
95%                 20
90%                 10
75%                 4
50%                 2
25%                 1.3
* Even if we have one million processing cores!
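The maximum speedups above can be checked numerically. This is a minimal sketch; `amdahl_speedup` and `amdahl_limit` are hypothetical helper names, and the parallel portion is passed as a fraction between 0 and 1.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Theoretical speedup on n_processors under Amdahl's law."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


def amdahl_limit(parallel_fraction):
    """Limit of the speedup as the number of processors tends to infinity."""
    if parallel_fraction >= 1.0:
        raise ValueError("a fully parallel program has no finite limit")
    return 1.0 / (1.0 - parallel_fraction)


for fraction in (0.99, 0.95, 0.90, 0.75, 0.50, 0.25):
    at_million = amdahl_speedup(fraction, 1_000_000)
    limit = amdahl_limit(fraction)
    print(f"P = {fraction:.0%}: one million cores {at_million:.2f}, limit {limit:.2f}")
```

Even at one million cores, the speedup for each parallel portion stays just below its limit of 1/(1-P), so the serial portion alone decides how far the program can scale.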