Parallel Computing
Chapter 3 - Patterns
R. HALVERSON
MIDWESTERN STATE UNIVERSITY
Parallel Patterns
Serial Patterns
Structured Programming
Universal
Algorithmic Skeletons, Techniques, Strategies
Including OOP Features
Well-structured
Maintainable
Efficient
Deterministic
Composable
Nesting Pattern
Ability to hierarchically compose patterns
Patterns within Patterns
As in Structured Programming
Static: Sequence, Selection, Iteration
Dynamic: Recursion
Any pattern can contain any other pattern
Data Parallelism vs. Functional Decomposition
Static Patterns → Functional Decomposition
Dynamic Pattern (Recursion) → Data Parallelism
Nesting + Recursion → Parallel Slack
What about “excessive” recursion?
3.2 Serial Control Flow Patterns
Sequence
Selection (Decision)
Iteration (Loop, Repetition)
Loop-Carried Dependency
Map, Scan, Recurrence, Scatter, Gather, Pack
Recursion
What is an alias?
Can this loop be parallelized? Problems?
/* x[] is both read and written through the index arrays a[], b[], c[], d[] */
void engine(int n, double x[],
            int a[], int b[], int c[], int d[])
{
    for (int k = 0; k < n; ++k)
        x[a[k]] = x[b[k]] * x[c[k]] + x[d[k]];
}
Can this loop be parallelized? Problems?
/* writes go to y[], reads come from x[], both through index arrays */
void engine(int n, double x[], double y[],
            int a[], int b[], int c[], int d[])
{
    for (int k = 0; k < n; ++k)
        y[a[k]] = x[b[k]] * x[c[k]] + x[d[k]];
}
3.3 Parallel Control Patterns
Fork-Join
Map
Stencil
Reduction
Scan
Recurrence
NVIDIA GeForce 480
3.3.1 Fork - Join
Fork – instruction that creates a new flow of control (see the sketch below)
Join – instruction that synchronizes control flows created via Fork; after the Join, only one control flow continues
Variation: Spawn – executes a function; the caller does not wait for its return
Barrier – synchronizes multiple control flows, but all may continue after the Barrier
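A minimal fork-join sketch using POSIX threads (not from the text; the worker function and its argument are illustrative). pthread_create plays the role of the fork, pthread_join the role of the join:

#include <pthread.h>
#include <stdio.h>

/* Work executed on the forked control flow. */
static void *worker(void *arg) {
    int *value = arg;
    printf("forked flow sees %d\n", *value);
    return NULL;
}

int main(void) {
    pthread_t t;
    int value = 42;
    pthread_create(&t, NULL, worker, &value);  /* fork: a new control flow starts */
    printf("original flow keeps running\n");   /* both flows execute concurrently */
    pthread_join(t, NULL);                     /* join: wait; one flow continues */
    return 0;
}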
3.3.2 Map (Fig.3.6)
Map – replicates an elemental function over every element of an index set
Elemental function is applied to elements of collections
Iteration (Loop) Replacement
Every iteration is independent
Computation – count, index, data item
Known number of iterations
Pure Elemental Function: no side effects (see the sketch below)
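A minimal map sketch in C with OpenMP (not from the text; the function names are illustrative; compile with -fopenmp). Each iteration applies the pure elemental function to one element, independently of all others:

/* Pure elemental function: no side effects. */
static double scale(double v) { return 2.0 * v; }

/* Map: apply the elemental function to every element of the index set 0..n-1. */
void map_scale(int n, const double *in, double *out) {
    #pragma omp parallel for
    for (int k = 0; k < n; ++k)
        out[k] = scale(in[k]);      /* every iteration is independent */
}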
3.3.3 Stencil (Fig. 3.7)
Stencil – extension of Map in which the elemental function can access a set of “neighbors” (see the sketch below)
Pattern of access eliminates memory/data conflicts
Special cases: out-of-bounds
Utilizes Tiling (see section 7.3)
Applications: image filtering, simulation (fluid flow), linear
algebra
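A minimal 1D three-point stencil sketch (illustrative, not from the text). Each output element reads its left and right neighbors; indices are clamped at the array ends, which is one way to handle the out-of-bounds special case:

/* Three-point averaging stencil: reads from in[], writes to out[], so no conflicts. */
void stencil3(int n, const double *in, double *out) {
    #pragma omp parallel for
    for (int k = 0; k < n; ++k) {
        int left  = (k == 0)     ? 0     : k - 1;   /* clamp at lower boundary */
        int right = (k == n - 1) ? n - 1 : k + 1;   /* clamp at upper boundary */
        out[k] = (in[left] + in[k] + in[right]) / 3.0;
    }
}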
3.3.4 Reduction (Fig. 3.9)
Reduction – combines the elements of a collection into a single element, using an associative combiner function
O(log n) parallel steps (with enough processors)
Consider summation of an array (see the sketch below)
Calculate the total number of additions
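A minimal summation reduction sketch with OpenMP (illustrative). Because addition is associative, partial sums computed by different threads can be combined; a serial sum needs n-1 additions, while a parallel tree of combines finishes in about log2(n) steps:

/* Reduction: combine all elements of x[] into a single sum. */
double sum_reduce(int n, const double *x) {
    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (int k = 0; k < n; ++k)
        total += x[k];              /* per-thread partial sums are combined by the runtime */
    return total;
}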
3.3.5 Scan (Fig. 3.10)
Scan – computes partial reductions of a collection
For each output position, the reduction up to that point is computed
AKA – Prefix Sums (see the sketch below)
Total number of additions: serial? Parallel?
How many processors? Implications?
O(log n) parallel steps
Applications: checkbook balancing, integration, random number generation
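A minimal serial inclusive scan (prefix sums) sketch, illustrative rather than the book's code. The serial version needs on the order of n additions; a parallel scan performs more additions in total but finishes in O(log n) steps given enough processors:

/* Inclusive scan: out[k] holds the sum of in[0..k]. */
void inclusive_scan(int n, const double *in, double *out) {
    double running = 0.0;
    for (int k = 0; k < n; ++k) {
        running += in[k];           /* running partial reduction up to position k */
        out[k] = running;
    }
}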
3.3.6 Recurrence
Omit???
3.4 Serial Data Management Patterns
How stored data is allocated, shared, read, written, copied
Random RW
Stack Allocation
Heap Allocation
Closure
Object
3.4.1 Random Read & Write
Memory Access via Addresses
Pointers
Alias – if “forbidden,” avoiding it becomes the programmer’s responsibility (see the restrict sketch below)
Arrays
Safer due to contiguous storage
Can be aliased
Normal for Serial. Implications for Parallel? Locality?
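A hedged illustration (not from the text): C's restrict qualifier is the programmer's promise that x and y do not alias, which is one precondition for safely parallelizing the second engine loop above (the indices in a[] must also be distinct):

/* 'restrict' asserts that x and y point to non-overlapping storage; if that
   promise is broken the behavior is undefined, so correctness is on the programmer. */
void engine_noalias(int n, double * restrict x, double * restrict y,
                    const int *a, const int *b, const int *c, const int *d)
{
    for (int k = 0; k < n; ++k)
        y[a[k]] = x[b[k]] * x[c[k]] + x[d[k]];
}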
3.4.2 Stack Allocation
Dynamic Allocation
Nested, as in function calls
Where is stack used by systems?
LIFO
Parallel: each thread has own stack
Preserves locality
3.4.3 Heap Allocation
Definition?
Where used by system?
Features
Dynamic, Complex, Slow
No Locality guarantee, Loss of Coherence
Fragmented memory
Limited Scalability
3.4.4 & 3.4.5 Closures & Objects
Omit
3.5 Parallel Data Management Patterns
Shared or Not Shared data
Modification patterns of data
Help improve performance
3.5.1 Pack - Unpack
Eliminate unused space in a collection (e.g., an array)
How? (see the sketch after this list)
Assign 0 or 1 to locations
Use Scan (parallel prefix) to compute each new address
Write to the new array
EXAMPLE – Figure 3.12 (p. 98)
Unpack – return to the original array
Applications??
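A minimal serial pack sketch (illustrative; the names are assumptions, not the book's code). The exclusive prefix sum over the 0/1 keep flags plays the role of the Scan step, giving each kept element its new address:

/* Pack: copy the flagged elements of in[] contiguously into out[].
   addr[] receives the exclusive prefix sum of keep[]; returns the packed count. */
int pack(int n, const double *in, const int *keep, int *addr, double *out) {
    int total = 0;
    for (int k = 0; k < n; ++k) {   /* Scan step: compute each element's new address */
        addr[k] = total;
        total += keep[k];
    }
    for (int k = 0; k < n; ++k)     /* write the kept elements to the new array */
        if (keep[k])
            out[addr[k]] = in[k];
    return total;
}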
3.5.2 Pipeline
Sequence (series) of processing elements such that the output of one element is the input of the next (see the sketch below)
Functional decomposition – limited parallelism, since the number of stages is generally fixed
Useful
For serially dependent tasks
When nested with other patterns
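A minimal two-stage pipeline sketch with POSIX threads (illustrative; the stage functions and the one-slot buffer are assumptions, not the book's code). Stage 1 produces integers while stage 2 squares and prints them, so the work of the two stages overlaps in time:

#include <pthread.h>
#include <stdio.h>

#define N 8

/* One-slot buffer connecting the two pipeline stages. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv   = PTHREAD_COND_INITIALIZER;
static int slot, full = 0, done = 0;

static void *stage2(void *arg) {                 /* consumer stage */
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (!full && !done)
            pthread_cond_wait(&cv, &lock);
        if (!full) { pthread_mutex_unlock(&lock); break; }  /* done and drained */
        int v = slot;
        full = 0;
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&lock);
        printf("%d\n", v * v);                   /* stage 2 work */
    }
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, stage2, NULL);
    for (int i = 0; i < N; ++i) {                /* stage 1: produce items */
        pthread_mutex_lock(&lock);
        while (full)
            pthread_cond_wait(&cv, &lock);
        slot = i;
        full = 1;
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&lock);
    }
    pthread_mutex_lock(&lock);
    done = 1;                                    /* tell stage 2 there are no more items */
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&lock);
    pthread_join(t, NULL);
    return 0;
}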
3.5.3 Geometric Decomposition
3.5.4 Gather
Omit