Cache Coherence for Shared Memory Multiprocessors
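
Cache Coherence Problem
- Example
  - Processors see different values for u after event 3
[Figure: processors P1, P2, and P3, each with a private cache ($), share a bus with Memory and I/O devices. Memory holds u = 5. Events: (1) P1 reads u and caches u:5; (2) P3 reads u and caches u:5; (3) P3 writes u = 7; (4) P1 reads u (u = ?); (5) P2 reads u (u = ?).]
- After event 3, P1 still finds the stale value 5 in its own cache. What P2 reads depends on the cache type: with write-through caches memory already holds 7, with write-back caches memory still holds 5.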
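A minimal Python sketch of the figure's five events, assuming write-back caches and no coherence mechanism at all. Caches are modeled as plain dicts, and `read` and `write` are hypothetical helpers, not part of any real system.

```python
# Hypothetical model: three processors with private write-back caches and
# NO coherence mechanism. Each cache is a dict from address to value.
memory = {"u": 5}
caches = {"P1": {}, "P2": {}, "P3": {}}

def read(p, addr):
    cache = caches[p]
    if addr not in cache:          # miss: fetch from memory
        cache[addr] = memory[addr]
    return cache[addr]             # hit: return the (possibly stale) copy

def write(p, addr, value):
    caches[p][addr] = value        # write-back: memory is not updated yet

read("P1", "u")                    # event 1: P1 caches u = 5
read("P3", "u")                    # event 2: P3 caches u = 5
write("P3", "u", 7)                # event 3: P3 writes u = 7 in its cache
seen_by_p1 = read("P1", "u")       # event 4: hit on the stale copy
seen_by_p2 = read("P2", "u")       # event 5: miss, memory still holds 5
print(seen_by_p1, seen_by_p2, caches["P3"]["u"])  # 5 5 7
```

If `write` also updated `memory` (write-through), P2 would read 7 at event 5, but P1 would still read its stale 5.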

Bus Snooping
- A coherence technique for bus-based shared memory multiprocessors
- A snoopy cache controller (SCC) is inserted to do bus snooping
- Bus transactions are visible to all SCCs
[Figure: processors P1 … Pn, each with a cache ($) and an SCC, attached to a shared bus with Mem and I/O devices]
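The central property, that every SCC sees every bus transaction, can be sketched as a broadcast loop. The `Bus` and `SnoopyController` classes are hypothetical illustrations. Arbitration, timing, and the caches themselves are left out.

```python
class Bus:
    """Hypothetical shared bus: every transaction is shown to every SCC."""
    def __init__(self):
        self.controllers = []

    def attach(self, scc):
        self.controllers.append(scc)

    def broadcast(self, sender, kind, addr):
        # Every snoopy cache controller sees the transaction, except the
        # one that issued it.
        for scc in self.controllers:
            if scc is not sender:
                scc.snoop(kind, addr)

class SnoopyController:
    def __init__(self, name, bus):
        self.name = name
        self.seen = []                 # log of snooped transactions
        bus.attach(self)

    def snoop(self, kind, addr):
        self.seen.append((kind, addr))

bus = Bus()
sccs = [SnoopyController(f"P{i}", bus) for i in range(1, 4)]
bus.broadcast(sccs[0], "BusRd", 0x40)  # P1 issues a read on the bus
print([scc.seen for scc in sccs])      # [[], [('BusRd', 64)], [('BusRd', 64)]]
```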

Snooping for Write-Through Caches
- When an SCC detects a relevant write transaction, it can either
  - Invalidate the block containing the relevant variable (write-invalidate approach), or
  - Update the value in its cache (write-update approach)
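A sketch of the two choices for write-through caches. It assumes, for simplicity, that each block holds a single variable, so dropping the variable stands in for invalidating its block. `snoop_write` and `demo` are hypothetical names.

```python
def snoop_write(caches, writer, addr, value, policy):
    """Apply one snooped bus write to every other cache.

    caches: dict processor -> dict(addr -> value) (hypothetical model).
    policy: "invalidate" or "update".
    """
    for p, cache in caches.items():
        if p == writer or addr not in cache:
            continue                   # not relevant to this SCC
        if policy == "invalidate":
            del cache[addr]            # drop the stale copy
        elif policy == "update":
            cache[addr] = value        # refresh the copy in place
        else:
            raise ValueError(policy)

def demo(policy):
    memory = {"u": 5}
    caches = {"P1": {"u": 5}, "P2": {}, "P3": {"u": 5}}
    caches["P3"]["u"] = 7              # P3 writes u = 7 ...
    memory["u"] = 7                    # ... write-through: memory too
    snoop_write(caches, "P3", "u", 7, policy)
    return caches, memory

print(demo("invalidate")[0])  # {'P1': {}, 'P2': {}, 'P3': {'u': 7}}
print(demo("update")[0])      # {'P1': {'u': 7}, 'P2': {}, 'P3': {'u': 7}}
```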

Write-Invalidate Protocol
- Two states per block in each cache
  - As in a uniprocessor
  - Hardware state bits are associated with blocks that are in the cache
  - The Invalid state is also used in place of the “not present” state
[State diagram, transcribed as transitions. Notation A / B: if A is observed, transaction B is generated; “_” means no transaction]
V: PrRd / _        (stay in V)
   PrWr / BusWr    (stay in V)
   BusWr / _       -> I
I: PrRd / BusRd    -> V
   PrWr / BusWr    (stay in I)
[Figure: processors P1 … Pn, each with a cache ($) whose blocks hold State, Tag, and Data fields, on a bus with Mem and I/O devices]
This is just one particular design: on a write miss, the processor writes to main memory and the block stays invalid. Other designs may read the block first to validate it.
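The two-state diagram fits in a small lookup table. This sketch assumes the (state, event) to (next state, bus transaction) encoding below. The entries for a snooped BusRd are not drawn on the slide and are assumed to have no effect.

```python
# Two-state write-through, write-invalidate protocol.
# "_" means no bus transaction.
V, I = "V", "I"

TRANSITIONS = {
    # (state, observed event): (next state, generated transaction)
    (V, "PrRd"):  (V, "_"),
    (V, "PrWr"):  (V, "BusWr"),   # write-through: every write goes on the bus
    (V, "BusWr"): (I, "_"),       # another processor wrote: invalidate
    (I, "PrRd"):  (V, "BusRd"),   # read miss: fetch the block from memory
    (I, "PrWr"):  (I, "BusWr"),   # write miss: write memory, no allocation
    (I, "BusWr"): (I, "_"),       # nothing cached, nothing to do
    # Assumed (not drawn): another cache's read miss changes nothing here.
    (V, "BusRd"): (V, "_"),
    (I, "BusRd"): (I, "_"),
}

def step(state, event):
    """Return (next_state, bus_transaction) for one observed event."""
    return TRANSITIONS[(state, event)]

print(step(V, "BusWr"))  # ('I', '_')
print(step(I, "PrWr"))   # ('I', 'BusWr')
```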

Example
- Three processors; consider the states of the blocks containing X
Cache entries are shown as X / State (“?” = no value loaded)

Operation        P1 $     P2 $     P3 $     Main memory
Initially        ? / I    ? / I    ? / I    10
P2 Rd X          ? / I    10 / V   ? / I    10
P3 Rd X          ? / I    10 / V   10 / V   10
P2 Wr X = 15     ? / I    15 / V   10 / I   15
P1 Rd X          15 / V   15 / V   10 / I   15
P1 Wr X = 3      3 / V    15 / I   10 / I   3
P3 Wr X = 6      3 / I    15 / I   10 / I   6

- On P3’s write miss (P3 Wr X = 6), the block remains invalid in P3’s cache: updating the value of X isn’t enough to validate the whole block.
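The table can be checked by replaying the six operations under the write-invalidate protocol. In this sketch, processors are indexed from 0 (P1 = 0), each cache entry is a `[value, state]` pair with `None` for “?”, and `run` is a hypothetical helper.

```python
def run(ops, n_procs=3, initial=10):
    """Replay reads/writes of X under the write-through VI protocol.

    Returns one (cache snapshot, memory value) pair per operation.
    """
    caches = [[None, "I"] for _ in range(n_procs)]
    memory = initial
    rows = []
    for proc, op, value in ops:
        me = caches[proc]
        if op == "Rd":
            if me[1] == "I":               # read miss: BusRd
                me[0], me[1] = memory, "V"
        else:                              # "Wr": BusWr to memory
            memory = value
            if me[1] == "V":               # write hit also updates the cache
                me[0] = value
            # write miss: block stays invalid (no write-allocate)
            for other, c in enumerate(caches):
                if other != proc and c[1] == "V":
                    c[1] = "I"             # snooped BusWr: invalidate
        rows.append(([tuple(c) for c in caches], memory))
    return rows

ops = [(1, "Rd", None), (2, "Rd", None), (1, "Wr", 15),
       (0, "Rd", None), (0, "Wr", 3), (2, "Wr", 6)]
for op, (snap, mem) in zip(ops, run(ops)):
    print(op, snap, mem)
```

Invalidated entries keep their old value (for example 10 / I in P3’s column), exactly as in the table. Only the state bit changes.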

Bus Snooping
- Advantages
  - No need to change processor design
  - No explicit coherence statements added to programs
- The snoopy cache controller observes events from
  - The local processor
  - The bus
- Write operations
  - Write-invalidate vs. write-update
  - Write-through caches: see last lecture
  - Write-back caches
    - Now, writes take place locally; SCCs don’t observe them
    - How can we handle this? Extra work has to be done

Write-Back Caches
- Usually have a “dirty bit”
  - One bit per block
  - State
    - True: block has been modified
    - False: block unchanged
- Use in a uniprocessor
  - A modified block has to be written back to memory upon replacement
- Use in multiprocessors
  - Same as in a uniprocessor, plus
  - A set dirty bit means the processor “owns” the block
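A one-block sketch of the uniprocessor use of the dirty bit. The `WriteBackBlock` class is hypothetical, and memory is modeled as a dict from address to value.

```python
class WriteBackBlock:
    """Hypothetical one-block write-back cache with a dirty bit."""
    def __init__(self, memory, addr):
        self.memory, self.addr = memory, addr
        self.data = memory[addr]       # block is loaded clean
        self.dirty = False
        self.writebacks = 0

    def write(self, value):
        self.data = value              # the write stays local ...
        self.dirty = True              # ... and marks the block modified

    def evict(self):
        if self.dirty:                 # only modified blocks go back
            self.memory[self.addr] = self.data
            self.writebacks += 1
        self.dirty = False

mem = {0: 10}
blk = WriteBackBlock(mem, 0)
blk.write(11)
blk.write(12)
print(mem[0])                          # 10: memory not updated yet
blk.evict()
print(mem[0], blk.writebacks)          # 12 1
```

Two writes cost a single write-back. That is the saving over write-through, and it is also why other SCCs cannot see these writes on the bus.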

The Extra Work …
…before a processor writes into its cache, it performs an “ownership” transaction:
- Case 1: No other modified copy of the block exists in the system
  - The processor can write
- Case 2: A modified copy exists somewhere in the system
  - Old owner
    - Writes the block to memory
    - Invalidates its local copy
  - New owner
    - Reads the block as it’s being written back to memory
    - Performs the write
- What the new owner did is called a “read to own” (read to modify) transaction
- There is only one owner at a time
- Still don’t get it? Wait until you see the MSI protocol!
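The two cases can be sketched for a single block holding X. Each cache entry is either `None` (no copy) or a dict with `data` and `modified` fields, and `write_with_ownership` is a hypothetical helper. The sketch also assumes that clean copies elsewhere are invalidated, since this is a write-invalidate scheme.

```python
def write_with_ownership(caches, memory, proc, value):
    """Ownership ("read to own") transaction, then the write itself.

    Returns the previous owner, or None in Case 1.
    """
    # Case 2 check: does another cache hold a modified copy (an owner)?
    old_owner = next((p for p, c in caches.items()
                      if p != proc and c is not None and c["modified"]),
                     None)
    if old_owner is not None:
        # Old owner writes the block back to memory ...
        memory["X"] = caches[old_owner]["data"]
        # ... and invalidates its local copy.
        caches[old_owner] = None
    # Assumption: any remaining clean copies are invalidated as well.
    for p in caches:
        if p != proc:
            caches[p] = None
    # New owner reads the block (as it is written back) if it has no copy,
    # then performs the write locally.
    block = caches[proc] or {"data": memory["X"], "modified": False}
    block["data"], block["modified"] = value, True
    caches[proc] = block
    return old_owner

memory = {"X": 10}
caches = {"P1": {"data": 3, "modified": True}, "P2": None, "P3": None}
prev = write_with_ownership(caches, memory, "P2", 15)
print(prev, memory["X"], caches["P2"])  # P1 3 {'data': 15, 'modified': True}
```

After every call exactly one cache holds a modified copy, which is the “only one owner at a time” rule.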

Ownership Overhead
- Ownership transactions are overhead
- If one happens every time a write is needed
  - A block will be written back to memory every time
  - Then write-back caches would be as good (or as bad) as write-through caches
- Let’s cross our fingers and count on locality
  - Spatial and temporal locality can do it for us
  - A processor owns the block and performs several writes consecutively
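How much locality helps can be seen by counting ownership transactions for a trace of writes to one block. This sketch assumes a simplified rule: only the writer sequence matters, and a new ownership transaction is needed whenever the writer changes. `ownership_transactions` is a hypothetical helper.

```python
def ownership_transactions(writers):
    """Count ownership transactions for a sequence of writes to one block.

    writers: processor names in program order, e.g. ["P1", "P1", "P2"].
    The current owner writes locally; any other writer must first take
    ownership with a read-to-own transaction.
    """
    owner, count = None, 0
    for w in writers:
        if w != owner:
            count += 1                 # read-to-own on the bus
            owner = w
    return count

print(ownership_transactions(["P1"] * 8))        # 1: good locality
print(ownership_transactions(["P1", "P2"] * 4))  # 8: block ping-pongs
```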

MSI Protocol: States
- We need to differentiate between reads and writes
- Split the Valid state into two states
  - I: Invalid
  - S: Shared, valid (one or more caches can read only)
  - M: Modified or dirty, valid (only one cache can write)
- This means it’s another write-invalidate protocol

MSI Protocol: Events/Actions
- Local processor events
  - PrRd: read
  - PrWr: write
- Bus transactions
  - BusRd: read with no intent to modify
  - BusRdX: read with intent to modify (read to own)
  - BusWB: update memory
- Possible actions
  - _: Nothing
  - BusRd: send read request over the bus
  - BusRdX: ownership (read to own) transaction
  - Flush: copy modified block to memory
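
MSI Protocol: State Transitions
[State diagram, transcribed as transitions; A / B means event A triggers action B]
M: PrRd, PrWr / _     (stay in M)
   BusRd / Flush      -> S (demote)
   BusRdX / Flush     -> I
S: PrRd, BusRd / _    (stay in S)
   PrWr / BusRdX      -> M (promote)
   BusRdX / _         -> I
I: PrRd / BusRd       -> S
   PrWr / BusRdX      -> M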
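The same transitions can be written as a lookup table. In this sketch, the entries for I on snooped transactions are not drawn on the slide and are assumed to need no action. `MSI` is just a dict name chosen for illustration.

```python
M, S, I = "M", "S", "I"

MSI = {
    # (state, event): (next state, action); "_" = nothing
    (I, "PrRd"):   (S, "BusRd"),
    (I, "PrWr"):   (M, "BusRdX"),
    (S, "PrRd"):   (S, "_"),
    (S, "BusRd"):  (S, "_"),
    (S, "PrWr"):   (M, "BusRdX"),   # promote
    (S, "BusRdX"): (I, "_"),
    (M, "PrRd"):   (M, "_"),
    (M, "PrWr"):   (M, "_"),
    (M, "BusRd"):  (S, "Flush"),    # demote
    (M, "BusRdX"): (I, "Flush"),
    # Assumed (not drawn): snooped transactions need no action in I.
    (I, "BusRd"):  (I, "_"),
    (I, "BusRdX"): (I, "_"),
}

state, action = MSI[(S, "PrWr")]
print(state, action)  # M BusRdX
```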

MSI Protocol: Example
- Three processors; consider the states of the blocks containing X
Cache entries are shown as X / State (“?” = no value loaded)

Operation        P1 $     P2 $     P3 $     Main memory
Initially        ? / I    ? / I    ? / I    10
P2 Rd X          ? / I    10 / S   ? / I    10
P3 Rd X          ? / I    10 / S   10 / S   10
P2 Wr X = 15     ? / I    15 / M   10 / I   10
P1 Rd X          15 / S   15 / S   10 / I   15
P1 Wr X = 3      3 / M    15 / I   10 / I   15
P1 Wr X = 6      6 / M    15 / I   10 / I   15
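Replaying the example with MSI reproduces the table row by row. In this sketch, processors are indexed from 0 (P1 = 0), each cache entry is a `[value, state]` pair with `None` for “?”, and `msi_run` is a hypothetical helper.

```python
M, S, I = "M", "S", "I"

def msi_run(ops, n=3, memory=10):
    """Replay reads/writes of X under MSI; one row per operation."""
    caches = [[None, I] for _ in range(n)]
    rows = []
    for p, op, value in ops:
        me = caches[p]
        others = [c for q, c in enumerate(caches) if q != p]
        if op == "Rd" and me[1] == I:          # PrRd / BusRd
            for c in others:
                if c[1] == M:                  # BusRd / Flush, demote to S
                    memory, c[1] = c[0], S
            me[0], me[1] = memory, S
        elif op == "Wr":
            if me[1] != M:                     # PrWr / BusRdX
                for c in others:
                    if c[1] == M:              # BusRdX / Flush
                        memory = c[0]
                    c[1] = I                   # every other copy invalidated
                if me[1] == I:
                    me[0] = memory             # read to own
            me[0], me[1] = value, M            # write locally in M
        # read hits in S or M need no action
        rows.append(([tuple(c) for c in caches], memory))
    return rows

ops = [(1, "Rd", None), (2, "Rd", None), (1, "Wr", 15),
       (0, "Rd", None), (0, "Wr", 3), (0, "Wr", 6)]
for op, row in zip(ops, msi_run(ops)):
    print(op, row)
```

Memory stays at 10 after P2 writes 15, and only catches up when P1’s read forces P2 to flush.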

MESI Protocol: What’s wrong with MSI?
- Another write-invalidate protocol
- Consider this MSI scenario
  - The block containing X isn’t in any cache
  - P1 reads X: BusRd, state: S
  - P1 modifies X: BusRdX, state: M
    - BusRdX lets everybody else know X is being modified
- The previous scenario has 2 bus transactions
- There is no need for 2 transactions, since P1 is the only processor holding X!

MESI Protocol: States
- Same as MSI, except S is split in 2
  - E: Exclusive clean (only one processor)
  - S: Shared clean (more than one processor)
- Let’s consider the same scenario
  - The block containing X isn’t in any cache
  - P1 reads X: BusRd, state: E
  - P1 modifies X: nothing, state: M
  - In other words, P1 doesn’t need to let anybody know about the modification
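Counting bus transactions for this scenario makes the saving explicit. The sketch follows the MSI diagram (S to M through PrWr / BusRdX). The `others_have_copy` flag is a hypothetical parameter that stands in for whether another cache already holds the block.

```python
def bus_transactions(protocol, others_have_copy=False):
    """Bus transactions for: P1 reads X (miss), then P1 writes X."""
    txns = ["BusRd"]                   # the read miss always uses the bus
    if protocol == "MSI":
        state = "S"                    # MSI cannot tell "alone" from "shared"
    elif protocol == "MESI":
        state = "S" if others_have_copy else "E"
    else:
        raise ValueError(protocol)
    if state == "S":
        txns.append("BusRdX")          # S -> M needs an ownership transaction
    # E -> M is silent: no other cache can hold a copy
    return txns

print(bus_transactions("MSI"))   # ['BusRd', 'BusRdX']
print(bus_transactions("MESI"))  # ['BusRd']
```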

MESI Protocol: Hardware Support
- An additional bus signal is needed
  - Use the S signal (S for shared)
  - This helps the processor know whether to load the block in E or S state
- A cache controller asserts the S signal if the relevant block is in its cache
- The S bus signal is a wired-OR line
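A wired-OR line is asserted if any attached controller drives it, so it behaves like `any()` over the snoop results. In this sketch, `snoop_hits` is a hypothetical list with one flag per other cache, set when that cache holds the block.

```python
def shared_line(snoop_hits):
    """Wired-OR bus line: asserted if ANY snooping cache holds the block."""
    return any(snoop_hits)

def fill_state(snoop_hits):
    """MESI state for a block loaded on a read miss (PrRd / BusRd)."""
    return "S" if shared_line(snoop_hits) else "E"

print(fill_state([False, False]))  # E: no other cache has the block
print(fill_state([False, True]))   # S: another cache asserted S
```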
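
MESI Protocol: State Transitions
[State diagram, transcribed. The diagram shows labels only for what’s different from MSI; all other transitions are as in MSI.]
I: PrRd / BusRd(S)        -> S (another cache asserted S)
   PrRd / BusRd(Not S)    -> E (no other cache has the block)
E: PrRd / _               (stay in E)
   PrWr / _               -> M (promote; no bus transaction)
   BusRd / Flush          -> S (demote; flushing a “clean” block)
   BusRdX / Flush         -> I
S: BusRdX / Flush’        -> I
- Flushing a clean block gives the new reader a fast way to read the block
- While flushing a shared block, Flush’ means only 1 processor is responsible for supplying it
- Other protocol variations may not flush a clean block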
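Here is a sketch of the full MESI table, combining the labels shown on the slide with the unchanged MSI transitions. The `shared` argument models the wired-OR S line and is consulted only on a read miss. The entry S, BusRd / Flush’ is not drawn on the slide and is an assumption, as are the no-op entries for I on snooped transactions.

```python
M, E, S, I = "M", "E", "S", "I"

TABLE = {
    # (state, event): (next state, action); "_" = nothing
    (I, "PrWr"):   (M, "BusRdX"),
    (E, "PrRd"):   (E, "_"),
    (E, "PrWr"):   (M, "_"),        # promote silently
    (E, "BusRd"):  (S, "Flush"),    # demote; flushing a clean block
    (E, "BusRdX"): (I, "Flush"),
    (S, "PrRd"):   (S, "_"),
    (S, "PrWr"):   (M, "BusRdX"),
    (S, "BusRd"):  (S, "Flush'"),   # assumed: one sharer supplies it
    (S, "BusRdX"): (I, "Flush'"),
    (M, "PrRd"):   (M, "_"),
    (M, "PrWr"):   (M, "_"),
    (M, "BusRd"):  (S, "Flush"),
    (M, "BusRdX"): (I, "Flush"),
    (I, "BusRd"):  (I, "_"),
    (I, "BusRdX"): (I, "_"),
}

def mesi(state, event, shared=False):
    """Return (next_state, action); shared = value of the S line."""
    if (state, event) == (I, "PrRd"):
        return (S if shared else E), "BusRd"
    return TABLE[(state, event)]

print(mesi(I, "PrRd"))              # ('E', 'BusRd')
print(mesi(I, "PrRd", shared=True)) # ('S', 'BusRd')
```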

Dragon Protocol
- Write-back update protocol
- States
  - Exclusive (E): 1 cache has a clean copy
  - Shared-clean (Sc): 2 or more caches have a clean copy; memory is up to date
  - Shared-modified (Sm): 1 cache has just modified the block; other caches may hold copies; memory is outdated
  - Modified (M): 1 cache has a modified copy
- Added processor events: PrRdMiss, PrWrMiss (remember we don’t have an I state)
- Added bus transaction: BusUpd
  - Broadcasts the word or byte written by a processor so other processors can update their copies
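
Dragon Protocol: State Transitions
[State diagram, transcribed as transitions; (S) = shared line asserted, (Not S) = not asserted; “_” = no action]
Block not cached:
    PrRdMiss / BusRd(S)              -> Sc
    PrRdMiss / BusRd(Not S)          -> E
    PrWrMiss / (BusRd(S); BusUpd)    -> Sm
    PrWrMiss / BusRd(Not S)          -> M
E:  PrRd / _                         (stay in E)
    PrWr / _                         -> M
    BusRd / _                        -> Sc
Sc: PrRd / _                         (stay in Sc)
    BusUpd / Update                  (stay in Sc)
    PrWr / BusUpd(S)                 -> Sm
    PrWr / BusUpd(Not S)             -> M
Sm: PrRd / _                         (stay in Sm)
    PrWr / BusUpd(S)                 (stay in Sm)
    BusRd / Flush                    (stay in Sm)
    PrWr / BusUpd(Not S)             -> M
    BusUpd / Update                  -> Sc
M:  PrRd, PrWr / _                   (stay in M)
    BusRd / Flush                    -> Sm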
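The Dragon transitions can be sketched as one function. There is no I state, so this sketch uses `state=None` (a hypothetical convention) for a block that is not cached and can only see PrRdMiss or PrWrMiss. `shared` is the S line sampled after this cache’s own bus transaction.

```python
E, SC, SM, M = "E", "Sc", "Sm", "M"

def dragon(state, event, shared=False):
    """Return (next_state, action) for the Dragon update protocol."""
    if state is None:                          # block not in this cache
        if event == "PrRdMiss":
            return (SC if shared else E), "BusRd"
        if event == "PrWrMiss":
            if shared:
                return SM, "BusRd; BusUpd"     # others keep updated copies
            return M, "BusRd"
        raise ValueError(f"unexpected event {event} for an uncached block")
    if event == "PrRd":
        return state, "_"                      # read hits never use the bus
    if event == "PrWr":
        if state in (E, M):
            return M, "_"                      # no other copies: write locally
        return (SM if shared else M), "BusUpd" # Sc/Sm: broadcast the word
    if event == "BusRd":
        if state in (SM, M):
            return SM, "Flush"                 # owner supplies the block
        return SC, "_"                         # E or Sc: memory is up to date
    if event == "BusUpd" and state in (SC, SM):
        return SC, "Update"                    # take the word, drop ownership
    raise ValueError(f"unexpected event {event} in state {state}")

print(dragon(SC, "PrWr", shared=True))   # ('Sm', 'BusUpd')
print(dragon(M, "BusRd"))                # ('Sm', 'Flush')
```

Unlike MSI and MESI, a write to a shared block never invalidates the other copies. The sharers take the BusUpd and stay in Sc.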

Snoopy Protocol Taxonomy
Cache \ Protocol    Write-invalidate    Write-update
Write-through       IV                  Homework
Write-back          MSI, MESI           Dragon