Spin Locks and Contention
Companion slides for
The Art of Multiprocessor Programming
by Maurice Herlihy & Nir Shavit
Kinds of Architectures
• SISD (Uniprocessor)
– Single instruction stream
– Single data stream
• SIMD (Vector)
– Single instruction stream
– Multiple data streams
• MIMD (Multiprocessors): our space
– Multiple instruction streams
– Multiple data streams
Art of Multiprocessor Programming
MIMD Architectures
[Figure: shared-bus architecture (with memory on the bus) and distributed architecture]
• Memory Contention
• Communication Contention
• Communication Latency
What Should You Do If You Can’t Get a Lock?
• Keep trying (our focus)
– “spin” or “busy-wait”
– Good if delays are short
• Give up the processor
– Good if delays are long
– Always good on uniprocessor
Basic Spin-Lock
[Figure: threads spin on the lock, enter the critical section (CS), and reset the lock upon exit]
• The lock introduces a sequential bottleneck
– Sequential bottleneck: no parallelism
• The lock suffers from contention
– Contention: ???
Test-and-Set
• Boolean value
• Test-and-set (TAS)
– Swap true with current value
– Return value tells if prior value was true or false
• Can reset just by writing false
• TAS aka “getAndSet”
Test-and-Set
// Simplified model of AtomicBoolean from package java.util.concurrent.atomic
public class AtomicBoolean {
  boolean value;
  // Swap old and new values in one atomic step
  public synchronized boolean getAndSet(boolean newValue) {
    boolean prior = value;  // remember the old value
    value = newValue;       // install the new value
    return prior;           // report what was there before
  }
}
Test-and-Set
AtomicBoolean lock = new AtomicBoolean(false);
…
// Swapping in true is called “test-and-set” or TAS
boolean prior = lock.getAndSet(true);
Test-and-Set Locks
• Locking
– Lock is free: value is false
– Lock is taken: value is true
• Acquire lock by calling TAS
– If result is false, you win
– If result is true, you lose
• Release lock by writing false
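The win/lose rule above can be traced with the real java.util.concurrent.atomic.AtomicBoolean. This is a minimal single-threaded sketch; the class name TasProtocolDemo and the method attempts() are hypothetical names chosen for illustration, not part of the slides.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TasProtocolDemo {
    // Returns the prior values seen by three TAS attempts:
    // two while the lock is contended, one after it has been released.
    static boolean[] attempts() {
        AtomicBoolean lock = new AtomicBoolean(false); // false: lock is free
        boolean first = lock.getAndSet(true);   // prior value false: this caller wins
        boolean second = lock.getAndSet(true);  // prior value true: this caller loses
        lock.set(false);                        // winner releases by writing false
        boolean third = lock.getAndSet(true);   // lock was free again: this caller wins
        return new boolean[] { first, second, third };
    }

    public static void main(String[] args) {
        boolean[] prior = attempts();
        System.out.println("first attempt saw  " + prior[0]);
        System.out.println("second attempt saw " + prior[1]);
        System.out.println("third attempt saw  " + prior[2]);
    }
}
```

A caller holds the lock exactly when its TAS returned false; every other caller sees true and has to try again.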
Test-and-set Lock
import java.util.concurrent.atomic.AtomicBoolean;

class TASlock {
  // Lock state is an AtomicBoolean: false = free, true = taken
  AtomicBoolean state = new AtomicBoolean(false);
  void lock() {
    // Keep trying until lock acquired
    while (state.getAndSet(true)) {}
  }
  void unlock() {
    // Release lock by resetting state to false
    state.set(false);
  }
}
Performance
• Experiment
– n threads
– Increment shared counter 1 million times
• How long should it take?
• How long does it take?
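One way to run this experiment is sketched below. It assumes the 1 million increments are a fixed total split evenly among the n threads (the slide does not say whether the count is total or per thread), and the names TasCounterExperiment, run, and TOTAL_INCREMENTS are hypothetical. Timings depend on the machine, so none are shown.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TasCounterExperiment {
    // Assumption: the million increments are a fixed total, shared by all threads.
    static final int TOTAL_INCREMENTS = 1_000_000;

    // Same test-and-set lock as on the slides.
    static final AtomicBoolean state = new AtomicBoolean(false);
    static void lock() { while (state.getAndSet(true)) {} }
    static void unlock() { state.set(false); }

    static long counter; // shared counter, only touched while holding the lock

    // Runs the experiment with n threads and returns the elapsed nanoseconds.
    static long run(int n) {
        counter = 0;
        Thread[] threads = new Thread[n];
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            // Split the work so the total is always TOTAL_INCREMENTS.
            final int share = TOTAL_INCREMENTS / n + (i < TOTAL_INCREMENTS % n ? 1 : 0);
            threads[i] = new Thread(() -> {
                for (int k = 0; k < share; k++) {
                    lock();
                    try { counter++; } finally { unlock(); }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 4, 8}) {
            long nanos = run(n);
            System.out.println(n + " threads: " + nanos / 1_000_000 + " ms, counter = " + counter);
        }
    }
}
```

Because every increment happens inside the critical section, the work is inherently sequential, so the best one can hope for is a running time that stays flat as n grows.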
Graph
[Figure: time versus number of threads. The ideal curve shows no speedup because of the sequential bottleneck.]
Mystery #1
[Figure: time versus number of threads, comparing the TAS lock curve with the ideal curve.]
• What is going on?
Bus-Based Architectures
[Figure: per-processor caches connected to memory by a shared bus]
• Random access memory
– 10s of cycles
• Shared Bus
– Broadcast medium
– One broadcaster at a time
– Processors and memory all “snoop”
• Per-Processor Caches
– Small
– Fast: 1 or 2 cycles
– Address & state information
Jargon Watch
• Cache hit
– “I found what I wanted in my cache”
– Good Thing™
• Cache miss
– “I had to shlep all the way to memory for that data”
– Bad Thing™
Processor Issues Load Request
[Figure: three caches and memory on a shared bus; only memory holds the data]
• Processor: “Gimme data”
Memory Responds
• Memory: “Got your data right here”
• The data travels over the bus to the requesting cache
Processor Issues Load Request
[Figure: one cache now holds the data; a second processor issues a load]
• Second processor: “Gimme data”
Other Processor Responds
• Processor holding the cached copy: “I got data”
• The data travels over the bus to the second cache
Modify Cached Data
[Figure: two caches and memory hold the data; one processor modifies its cached copy]
• What’s up with the other copies?
Cache Coherence
• We have lots of copies of data
– Original copy in memory
– Cached copies at processors
• Some processor modifies its own copy
– What do we do with the others?
– How to avoid confusion?
Write-Back Caches
• Accumulate changes in cache
• Write back when needed
– Need the cache for something else
– Another processor wants it
• On first modification
– Invalidate other entries
– Requires non-trivial protocol …
Write-Back Caches
• Cache entry has three states
– Invalid: contains raw seething bits
– Valid: I can read but I can’t write
– Dirty: Data has been modified
• When an entry is dirty
– Intercept other load requests
– Write back to memory before reusing the cache entry
Invalidate
[Figure: two caches hold copies of the data; one processor wants to write]
• Writer: “Mine, all mine!”
• Other holder: “Uh, oh”
• Other caches lose read permission
• This cache acquires write permission
• Memory provides data only if not present in any cache, so no need to change it now (expensive)
Another Processor Asks for Data
[Figure: one cache holds the modified data; another processor requests it over the bus]
Owner Responds
• Owner: “Here it is!”
End of the Day …
[Figure: both caches hold the data again]
• Reading OK, no writing
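The coherence walkthrough above (invalidate, another processor asks, owner responds) can be sketched as a small state machine over the three entry states. This is a simplified model under stated assumptions: it tracks a single cache line, ignores memory contents and evictions, and counts one bus transaction per miss or invalidation. The names WriteBackLine, read, write, and busTransactions are hypothetical.

```java
import java.util.Arrays;

class WriteBackLine {
    enum State { INVALID, VALID, DIRTY }

    final State[] caches;   // state of this one line in each processor's cache
    int busTransactions;    // assumption: every miss or invalidation uses the bus once

    WriteBackLine(int processors) {
        caches = new State[processors];
        Arrays.fill(caches, State.INVALID);
    }

    // Processor p loads the line.
    void read(int p) {
        if (caches[p] != State.INVALID) return;      // cache hit: no bus traffic
        busTransactions++;                           // cache miss: ask on the bus
        for (int q = 0; q < caches.length; q++) {
            // A dirty owner intercepts the request, supplies the data,
            // and keeps only a read-only copy.
            if (caches[q] == State.DIRTY) caches[q] = State.VALID;
        }
        caches[p] = State.VALID;
    }

    // Processor p modifies the line.
    void write(int p) {
        if (caches[p] == State.DIRTY) return;        // already has write permission
        busTransactions++;                           // broadcast an invalidation
        for (int q = 0; q < caches.length; q++) {
            if (q != p) caches[q] = State.INVALID;   // other caches lose read permission
        }
        caches[p] = State.DIRTY;                     // this cache acquires write permission
    }

    public static void main(String[] args) {
        WriteBackLine line = new WriteBackLine(3);
        line.read(0);
        line.read(1);
        System.out.println(Arrays.toString(line.caches)); // [VALID, VALID, INVALID]
        line.write(0);
        System.out.println(Arrays.toString(line.caches)); // [DIRTY, INVALID, INVALID]
        line.read(1);
        System.out.println(Arrays.toString(line.caches)); // [VALID, VALID, INVALID]
        System.out.println("bus transactions: " + line.busTransactions); // 4
    }
}
```

The last step matches the end state on the slide: both copies are readable, and neither processor may write without going back to the bus.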
Mutual Exclusion
• What do we want to optimize?
– Bus bandwidth used by spinning threads
– Release/Acquire latency
– Acquire latency for idle lock
Simple TASLock
• TAS invalidates cache lines
• Spinners
– Miss in cache
– Go to bus
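A rough way to see why spinners keep going to the bus is to count transactions in a toy model. The sketch below assumes the lock stays held, the spinners call TAS in strict round-robin order, and a TAS needs the bus whenever another cache owned the line last. Real hardware interleavings differ, and the names TasBusTraffic and spin are hypothetical.

```java
class TasBusTraffic {
    // Counts bus transactions when `spinners` processors each call TAS `rounds`
    // times on a lock that stays held. `owner` is the index of the cache that
    // currently holds the lock's line with write permission, or -1 if none does.
    static int spin(int spinners, int rounds) {
        int owner = -1;
        int busTransactions = 0;
        for (int r = 0; r < rounds; r++) {
            for (int p = 0; p < spinners; p++) {
                // TAS is a write, so it needs the line with write permission.
                if (owner != p) {
                    busTransactions++; // miss in cache, go to bus, invalidate the previous owner
                    owner = p;
                }
            }
        }
        return busTransactions;
    }

    public static void main(String[] args) {
        System.out.println("1 spinner,  1000 rounds: " + spin(1, 1000));  // 1
        System.out.println("4 spinners, 1000 rounds: " + spin(4, 1000));  // 4000
    }
}
```

In this model a lone spinner touches the bus once, while with two or more spinners every single TAS is a miss, which is the traffic the slide describes.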
NUMA Architectures
• Acronym:
– Non-Uniform Memory Architecture
• Illusion:
– Flat shared memory
• Truth:
– No caches (sometimes)
– Some memory regions faster than others
NUMA Machines
• Spinning on local memory is fast
• Spinning on remote memory is slow