Shared Counters and Parallelism
Companion slides for
The Art of Multiprocessor
Programming
by Maurice Herlihy & Nir Shavit
Shared Counter
[Figure: threads draw values from a shared counter]
Art of Multiprocessor
Programming
2
Shared Counter
• No duplication
• No omission
• Not necessarily linearizable
Naïve Counter Implementation
[Figure: many threads apply fetch-and-increment (FAI) to a single shared object, one after another]
The last threads to succeed incur Θ(n) time complexity!
Can we do much better?
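The naïve scheme can be sketched in a few lines of Java. This is only an illustration, assuming `java.util.concurrent.atomic.AtomicInteger` stands in for the single FAI object; the class name `NaiveCounterDemo` and the method `run` are hypothetical and not from the slides.

```java
import java.util.concurrent.atomic.AtomicInteger;

class NaiveCounterDemo {
    // The single shared object: every thread contends on this one location.
    static final AtomicInteger counter = new AtomicInteger(0);

    // Starts nThreads threads that each take perThread values,
    // waits for them, and returns the final counter value.
    static int run(int nThreads, int perThread) {
        counter.set(0);
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.getAndIncrement(); // fetch-and-increment (FAI)
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // prints 4000
    }
}
```

No value is duplicated or omitted, but every increment goes through the same memory location, so requests are served one at a time.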
Shared Counters
• Can we build a shared counter with
– Low memory contention, and
– Real parallelism?
Software Combining Tree
• Contention: all spinning is local
• Parallelism: potential n/log n speedup
Combining Trees
[Figure sequence: a tree with the counter, initially 0, at its root]
• One thread arrives with +3, another with +2
• Two threads meet, combine sums (+5)
• Combined sum added to root (root becomes 5)
• Result returned to children (the root's prior value, 0)
• Results returned to threads (one thread receives 0, the other 3)
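The arithmetic behind one combining step can be written out as a small worked example. This is a sketch of the bookkeeping only, with no concurrency; the class `CombineArithmetic` and its method are hypothetical names chosen for illustration.

```java
class CombineArithmetic {
    // Returns {new root value, result for 1st thread, result for 2nd thread}.
    static int[] combine(int root, int first, int second) {
        int prior = root;                     // counter value before the combined update
        int newRoot = root + first + second;  // combined sum is added to the root in one step
        int firstResult = prior;              // 1st thread's request is ordered first
        int secondResult = prior + first;     // 2nd thread's request is ordered after the 1st
        return new int[] { newRoot, firstResult, secondResult };
    }

    public static void main(String[] args) {
        int[] r = combine(0, 3, 2);
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "5 0 3"
    }
}
```

With the values from the figure (root 0, requests +3 and +2), the root ends at 5 and the two threads receive 0 and 3, as if the requests had been applied one after the other.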
What if?
• Threads don’t arrive together?
– Should I stay or should I go?
• How long to wait?
– Waiting times add up …
• Idea:
– Use multi-phase algorithm
– Where threads wait in parallel …
Combining Status
enum CStatus {
  IDLE,    // nothing going on
  FIRST,   // 1st thread is a partner for combining, will return to check for 2nd thread
  SECOND,  // 2nd thread has arrived with value for combining
  RESULT,  // 1st thread has deposited result for 2nd thread
  ROOT     // special case: root node
};
Node Synchronization
• Short-term
– Synchronized methods
– Consistency during method call
• Long-term
– Boolean locked field
– Consistency across calls
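The two kinds of synchronization can be illustrated with a minimal sketch. It assumes a Boolean `locked` field as named on the slide; the class `LockedSlot` and the methods `acquire`, `release`, `isLocked` and `demo` are hypothetical names, not part of the deck's code.

```java
class LockedSlot {
    // Long-term lock: stays set between method calls.
    private boolean locked = false;

    // Short-term synchronization: the monitor is held only while this call runs.
    synchronized void acquire() throws InterruptedException {
        while (locked) wait();  // wait() gives up the monitor while blocked
        locked = true;          // the slot remains locked after the method returns
    }

    synchronized void release() {
        locked = false;
        notifyAll();            // wake any threads blocked in acquire()
    }

    synchronized boolean isLocked() {
        return locked;
    }

    // Returns {locked after acquire, locked after release}.
    static boolean[] demo() {
        LockedSlot slot = new LockedSlot();
        try {
            slot.acquire();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean afterAcquire = slot.isLocked();
        slot.release();
        boolean afterRelease = slot.isLocked();
        return new boolean[] { afterAcquire, afterRelease };
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1]); // prints "true false"
    }
}
```

The synchronized keyword keeps each method call consistent; the flag keeps the object reserved across several calls, which is what the combining phases rely on.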
Phases
• Precombining
– Set up combining rendezvous
• Combining
– Collect and combine operations
• Operation
– Hand off to higher thread
• Distribution
– Distribute results to waiting threads
High-level operation structure
Precombining Phase
• Examine status
• If IDLE, promise to return to look for partner (status becomes FIRST)
• At ROOT, turn back
• If FIRST, I'm willing to combine, but lock for now (status becomes SECOND)
Code
• Tree class
– In charge of navigation
• Node class
– Combining state
– Synchronization state
– Bookkeeping
Precombining Navigation
Node node = myLeaf;            // start at leaf
while (node.precombine()) {    // move up while instructed to do so
  node = node.parent;
}
Node stop = node;              // remember where we stopped
Precombining Node
// synchronized: short-term synchronization
synchronized boolean precombine() throws InterruptedException {
  // wait while node is locked (in use by earlier combining phase)
  while (locked) wait();
  switch (cStatus) {              // check combining status
    case IDLE:                    // node was IDLE
      cStatus = CStatus.FIRST;    // I will return to look for 2nd thread's input value
      return true;                // continue up the tree
    case FIRST:                   // I'm the 2nd thread
      locked = true;              // 1st thread has promised to return; lock node so it won't leave without me
      cStatus = CStatus.SECOND;   // prepare to deposit 2nd thread's input value
      return false;               // end of precombining phase, don't continue up tree
    case ROOT:                    // node is the root
      return false;               // precombining phase ends, don't continue up tree
    default:                      // always check for unexpected values!
      throw new PanicException();
  }
}
Combining Phase
• 1st thread locked out until 2nd provides value
• 2nd thread deposits value to be combined, unlocks node, & waits …
• 1st thread moves up the tree with combined value …
Combining (reloaded)
• 2nd thread has not yet deposited value …
• 1st thread is alone, locks out late partner
• Stop at root
• 2nd thread's late precombining phase visit locked out
Combining Navigation
node = myLeaf;                        // start at leaf
int combined = 1;                     // add 1
while (node != stop) {                // revisit nodes visited in precombining
  combined = node.combine(combined);  // accumulate combined values, if any
  stack.push(node);                   // we will retraverse path in reverse order …
  node = node.parent;                 // move up the tree
}
Combining Phase Node
synchronized int combine(int combined) throws InterruptedException {
  // Wait until node is unlocked. It is locked by the 2nd thread until it deposits its value.
  while (locked) wait();
  // Why is it that no thread acquires the lock between the two lines?
  locked = true;                  // lock out late attempts to combine (by threads still in precombining)
  firstValue = combined;          // remember my (1st thread) contribution
  switch (cStatus) {              // check status
    case FIRST:                   // I (1st thread) am alone
      return firstValue;
    case SECOND:                  // not alone: combine with 2nd thread
      return firstValue + secondValue;
    default: …
  }
}
Operation Phase
• Add combined value to root, start back down
Operation Phase (reloaded)
• Leave value to be combined …
• Unlock, and wait …
Operation Phase Navigation
// stop is the node where we stopped: provide collected sum and wait for combining result
prior = stop.op(combined);
Operation on Stopped Node
// Only ROOT and SECOND possible. Why?
synchronized int op(int combined) throws InterruptedException {
  switch (cStatus) {
    case ROOT:                    // at root: add sum to root, return prior value
      int prior = result;
      result += combined;
      return prior;
    case SECOND:                  // intermediate node
      secondValue = combined;     // deposit value for later combining …
      // unlock node (which I locked in precombining), then notify 1st thread
      locked = false; notifyAll();
      // wait for 1st thread to deliver results
      while (cStatus != CStatus.RESULT) wait();
      // unlock node (locked by 1st thread in combining phase) & return
      locked = false; notifyAll();
      cStatus = CStatus.IDLE;
      return result;
    default: …
  }
}
Distribution Phase
• Move down with result
• Leave result for 2nd thread & lock node
• Push result down tree
• 2nd thread awakens, unlocks, takes value (node returns to IDLE)
Distribution Phase Navigation
while (!stack.empty()) {       // traverse path in reverse order
  node = stack.pop();
  node.distribute(prior);      // distribute results to waiting 2nd threads
}
return prior;                  // return result to caller
Distribution Phase
synchronized void distribute(int prior) {
  switch (cStatus) {
    case FIRST:                   // no 2nd thread to combine with me, unlock node & reset
      cStatus = CStatus.IDLE;
      locked = false; notifyAll();
      return;
    case SECOND:                  // notify 2nd thread that result is available (2nd thread will release lock)
      result = prior + firstValue;
      cStatus = CStatus.RESULT; notifyAll();
      return;
    default: …
  }
}
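The four phases can be assembled into one runnable sketch. The node methods follow the fragments on the slides; everything else is an assumption made for illustration: the class name `CombiningTreeSketch`, the array-based tree layout, passing `threadId` as a parameter (two threads per leaf), `IllegalStateException` in place of `PanicException`, and the `demo` harness. The width is assumed to be a power of two, at least 2.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicIntegerArray;

class CombiningTreeSketch {
    enum CStatus { IDLE, FIRST, SECOND, RESULT, ROOT }

    static final class Node {
        boolean locked;                       // long-term lock
        CStatus cStatus;
        int firstValue, secondValue, result;
        final Node parent;
        Node() { cStatus = CStatus.ROOT; parent = null; }                    // root node
        Node(Node parent) { cStatus = CStatus.IDLE; this.parent = parent; }  // inner node or leaf

        synchronized boolean precombine() throws InterruptedException {
            while (locked) wait();
            switch (cStatus) {
                case IDLE:  cStatus = CStatus.FIRST; return true;
                case FIRST: locked = true; cStatus = CStatus.SECOND; return false;
                case ROOT:  return false;
                default:    throw new IllegalStateException("unexpected " + cStatus);
            }
        }
        synchronized int combine(int combined) throws InterruptedException {
            while (locked) wait();
            locked = true;
            firstValue = combined;
            switch (cStatus) {
                case FIRST:  return firstValue;
                case SECOND: return firstValue + secondValue;
                default:     throw new IllegalStateException("unexpected " + cStatus);
            }
        }
        synchronized int op(int combined) throws InterruptedException {
            switch (cStatus) {
                case ROOT: { int prior = result; result += combined; return prior; }
                case SECOND: {
                    secondValue = combined;
                    locked = false; notifyAll();               // let the 1st thread combine
                    while (cStatus != CStatus.RESULT) wait();  // wait for the 1st thread's result
                    locked = false; notifyAll();
                    cStatus = CStatus.IDLE;
                    return result;
                }
                default: throw new IllegalStateException("unexpected " + cStatus);
            }
        }
        synchronized void distribute(int prior) {
            switch (cStatus) {
                case FIRST:  cStatus = CStatus.IDLE; locked = false; break;
                case SECOND: result = prior + firstValue; cStatus = CStatus.RESULT; break;
                default:     throw new IllegalStateException("unexpected " + cStatus);
            }
            notifyAll();
        }
    }

    final Node[] leaf;

    CombiningTreeSketch(int width) {          // assumed: width is a power of two, >= 2
        Node[] nodes = new Node[width - 1];
        nodes[0] = new Node();
        for (int i = 1; i < nodes.length; i++) nodes[i] = new Node(nodes[(i - 1) / 2]);
        leaf = new Node[(width + 1) / 2];
        for (int i = 0; i < leaf.length; i++) leaf[i] = nodes[nodes.length - i - 1];
    }

    int getAndIncrement(int threadId) throws InterruptedException {
        Deque<Node> stack = new ArrayDeque<>();
        Node myLeaf = leaf[threadId / 2];                       // two threads share each leaf
        Node node = myLeaf;
        while (node.precombine()) node = node.parent;           // precombining phase
        Node stop = node;
        int combined = 1;
        for (node = myLeaf; node != stop; node = node.parent) { // combining phase
            combined = node.combine(combined);
            stack.push(node);
        }
        int prior = stop.op(combined);                          // operation phase
        while (!stack.isEmpty()) stack.pop().distribute(prior); // distribution phase
        return prior;
    }

    // Runs width threads; true if every value 0..width*perThread-1 was returned exactly once.
    static boolean demo(int width, int perThread) {
        CombiningTreeSketch tree = new CombiningTreeSketch(width);
        AtomicIntegerArray seen = new AtomicIntegerArray(width * perThread);
        Thread[] threads = new Thread[width];
        for (int t = 0; t < width; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                try {
                    for (int i = 0; i < perThread; i++) seen.incrementAndGet(tree.getAndIncrement(id));
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            threads[t].start();
        }
        try { for (Thread th : threads) th.join(); } catch (InterruptedException e) { return false; }
        for (int v = 0; v < seen.length(); v++) if (seen.get(v) != 1) return false;
        return true;
    }

    public static void main(String[] args) { System.out.println(demo(4, 500)); }
}
```

The `demo` method checks the two properties from the start of the deck: no duplication and no omission among the returned values.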
Bad News: High Latency
[Figure: each request climbs log n levels of the tree]
Good News: Real Parallelism
[Figure: requests +3 and +2 from 2 threads are carried upward as a combined +5 by 1 thread]
Throughput Puzzles
• Ideal circumstances
– All n threads move together, combine
– n increments in O(log n) time
• Worst circumstances
– All n threads slightly skewed, locked out
– n increments in O(n · log n) time
Index Distribution Benchmark
// iters: how many iterations
// work: expected time between incrementing counter (assumed > 0)
void indexBench(int iters, int work) throws InterruptedException {
  int i = 0;
  while (i < iters) {
    i = r.getAndIncrement();   // take a number
    // pretend to work (more work, less concurrency)
    Thread.sleep(java.util.concurrent.ThreadLocalRandom.current().nextInt(work));
  }
}
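A runnable harness for this benchmark loop might look like the following sketch. It assumes an `AtomicInteger` as a stand-in for the counter `r` under test and `ThreadLocalRandom` for the random delay; the class `IndexBenchDemo` and the method `run` are hypothetical names.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

class IndexBenchDemo {
    // Runs the benchmark loop on nThreads threads and returns the final counter value.
    // work is the upper bound (in milliseconds, > 0) on the simulated work per iteration.
    static int run(int nThreads, int iters, int work) {
        AtomicInteger r = new AtomicInteger(0); // stand-in for the shared counter under test
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                int i = 0;
                try {
                    while (i < iters) {
                        i = r.getAndIncrement();                                  // take a number
                        Thread.sleep(ThreadLocalRandom.current().nextInt(work)); // pretend to work
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return r.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int finalValue = run(4, 200, 2);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("final counter = " + finalValue + ", elapsed ms = " + elapsedMs);
    }
}
```

Each thread stops after it first draws a value of at least `iters`, so the final counter value is `iters` plus the number of threads. Swapping the stand-in counter for another implementation leaves the loop unchanged.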
Performance
• Here are some fake graphs
– Distilled from real ones
• Throughput
– Average incs in 1 million cycles
• Latency
– Average cycles per inc
Latency
[Figure: latency vs. number of processors, curves for spin lock and combining tree; lower is better]
Throughput
[Figure: throughput vs. number of processors, curves for spin lock and combining tree; higher is better]
Combining Rate vs Work
[Figure: combining rate vs. work]