Cache Coherence for Shared Memory Multiprocessors

Cache Coherence Problem Example

- Processors see different values for u after event 3

[Figure: processors P1, P2 and P3, each with a cache ($), share a bus with memory and I/O devices. Memory holds u:5. Events 1 and 2 bring u:5 into two caches, event 3 writes u = 7 in one cache, and events 4 and 5 are later reads of u (u = ?).]

Bus Snooping

- A coherence technique for bus-based shared memory multiprocessors
- Snoopy cache controller (SCC) inserted to do bus snooping
- Bus transactions are visible to all SCCs

[Figure: processors P1 ... Pn, each with a cache ($) and an SCC, attached to a bus shared with Mem and I/O devices.]

Snooping for Write-Through Caches

- When an SCC detects a relevant write transaction, it can either
  - Invalidate the block containing the relevant variable (write-invalidate approach)
  - Update the value in cache (write-update approach)

Write-Invalidate Protocol

- Two states per block in each cache, as in a uniprocessor
  - Hardware state bits are associated with blocks that are in the cache
  - The Invalid state is also used in place of a “not present” state
- A/B: if A is observed, transaction B is generated
- This is just a particular design where, on a write miss, the processor writes to main memory. Other designs may read the block first to validate it.

State transitions (V = Valid, I = Invalid):

  From   Event / Action     To
  V      PrRd / --          V
  V      PrWr / BusWr       V
  V      BusWr / --         I
  I      PrRd / BusRd       V
  I      PrWr / BusWr       I

[Figure: processors P1 ... Pn, each with a cache ($) whose entries hold State, Tag and Data, on a bus shared with Mem and I/O devices.]

Example

Three processors; consider the states of the blocks containing X.

  Operation      P1 $ (X / State)   P2 $      P3 $      Main memory
  Initially      ? / I              ? / I     ? / I     10
  P2 Rd X        ? / I              10 / V    ? / I     10
  P3 Rd X        ? / I              10 / V    10 / V    10
  P2 Wr X = 15   ? / I              15 / V    10 / I    15
  P1 Rd X        15 / V             15 / V    10 / I    15
  P1 Wr X = 3    3 / V              15 / I    10 / I    3
  P3 Wr X = 6    3 / I              15 / I    10 / I    6

Callout: Block remains invalid. Updating the value of X isn’t enough to validate the whole block.

Bus Snooping

- Advantages
  - No need to change processor design
  - No explicit coherence statements added to program
- Snoopy cache controller observes events from
  - Local processor
  - Bus
- Write operations
  - Write-invalidate vs. write-update
- Write-through caches
  - See last lecture
- Write-back caches
  - Now, writes take place locally; SCCs don’t observe them
  - How can we handle this?
  - Extra work has to be done

Write-Back Caches

- Usually have a “dirty bit”
  - One bit per block
  - State
    - True: block has been modified
    - False: block unchanged
- Use for uniprocessor
  - Block has to be written back to memory upon replacement
- Use for multiprocessors
  - Same as uniprocessor, plus
  - It means the processor “owns” the block

The Extra Work …

...before a processor writes into cache, it performs an “ownership” transaction…

- Case 1: No other modified copies of block in system
  - The processor can go ahead and write locally (write-back)
- Case 2: A modified copy exists somewhere in the system
  - Old owner
    - Writes block to memory
    - Invalidates its local copy
  - New owner
    - Reads the block as it’s being written back to memory
    - Performs write
- What the new owner did is called a “read to own” (read to modify) transaction
- There is only one owner at a time
- Still don’t get it? Wait until you see the MSI protocol!

Ownership Overhead

- Ownership transactions are overhead
- If an ownership transaction happened every time a write is needed
  - A block would be written back to memory every time
  - Then write-back caches would be as good/bad as write-through
- Let’s cross our fingers and count on the concept of locality
  - Spatial and temporal locality can do it for us
  - A processor owns the block and performs several writes consecutively

MSI Protocol: States

- We need to differentiate between reads and writes
- Split the Valid state into two states
  - I: Invalid (the former Invalid state)
  - S: Shared (one or more can read only) (part of the former Valid state)
  - M: Modified or Dirty (only one can write) (part of the former Valid state)
- This means it’s another write-invalidate protocol

MSI Protocol: Events/Actions

- Local processor events
  - PrRd: read
  - PrWr: write
- Bus transactions
  - BusRd: read w/ no intent to modify
  - BusRdX: read w/ intent to modify (read to own)
  - BusWB: update memory
- Possible actions
  - _: Nothing
  - BusRd: send read request over the bus
  - BusRdX: ownership (read to own) transaction
  - Flush: copy modified block to memory

MSI Protocol: State Transitions

  From   Event / Action      To   Note
  M      PrRd, PrWr / _      M
  M      BusRd / Flush       S    Demote
  M      BusRdX / Flush      I
  S      PrRd, BusRd / _     S
  S      PrWr / BusRdX       M    Promote
  S      BusRdX / _          I
  I      PrRd / BusRd        S
  I      PrWr / BusRdX       M

MSI Protocol: Example

Three processors; consider the states of the blocks containing X.

  Operation      P1 $ (X / State)   P2 $      P3 $      Main memory
  Initially      ? / I              ? / I     ? / I     10
  P2 Rd X        ? / I              10 / S    ? / I     10
  P3 Rd X        ? / I              10 / S    10 / S    10
  P2 Wr X = 15   ? / I              15 / M    10 / I    10
  P1 Rd X        15 / S             15 / S    10 / I    15
  P1 Wr X = 3    3 / M              15 / I    10 / I    15
  P1 Wr X = 6    6 / M              15 / I    10 / I    15

MESI Protocol: What’s wrong with MSI?

- Another write-invalidate protocol
- Consider this MSI scenario
  - Block containing X isn’t in any cache
  - P1 reads X: BusRd, state: S
  - P1 modifies X: BusRdX, state: M
    - BusRdX is to let everybody else know X is being modified
- The previous scenario has 2 bus transactions
  - No need for 2 transactions, since P1 is the only processor that holds X!

MESI Protocol: States

- Same as MSI except S is split in 2
  - E: Exclusive clean (only one processor)
  - S: Shared clean (more than one processor)
- Let’s consider the same scenario
  - Block containing X isn’t in any cache
  - P1 reads X: BusRd, state: E
  - P1 modifies X: nothing, state: M
- In other words, P1 doesn’t need to let anybody know about the modification

MESI Protocol: Hardware Support

- Additional bus signal is needed
  - Use S signal (S for shared)
  - This helps the processor know whether to load the block in E or S state
- A cache controller asserts the S signal if the relevant block is in its cache
- S bus signal is a wired-OR line

MESI Protocol: State Transitions

Diagram only showing labels for what’s different from MSI (the Promote and Demote annotations also appear):

  From   Event / Action           To
  I      PrRd, S asserted         S
  I      PrRd, Not(S)             E
  E      PrRd / _                 E
  E      PrWr / _                 M
  E      BusRd / Flush            S
  E      BusRdX / Flush           I
  S      BusRdX / Flush'          I

- Flushing a “clean” block: a fast way for the new reader to read the block
- While flushing a shared block, Flush' means only 1 processor is responsible
- Other protocol variations may not flush a clean block

Dragon Protocol

- Write-back update protocol
- States
  - Exclusive (E): 1 cache has a clean copy
  - Shared-clean (Sc): 2 or more caches have a clean copy; memory up-to-date
  - Shared-modified (Sm): 1 cache just modified the block; some other caches hold copies; memory outdated
  - Modified (M): 1 cache has a modified copy
- Added processor events: PrRdMiss, PrWrMiss (remember we don’t have an I state)
- Added bus transactions: BusUpd
  - Broadcast the word or byte written by the processor so other processors can update their copies

Dragon Protocol: State Transitions

(S) means the shared signal is asserted and (Not(S)) means it is not; the overbars that distinguish the two appear to have been lost in the extracted labels and are restored here. "--" means no action.

  From     Event / Action                    To
  (miss)   PrRdMiss / BusRd(Not(S))          E
  (miss)   PrRdMiss / BusRd(S)               Sc
  (miss)   PrWrMiss / BusRd(Not(S))          M
  (miss)   PrWrMiss / (BusRd(S); BusUpd)     Sm
  E        PrRd / --                         E
  E        PrWr / --                         M
  E        BusRd / --                        Sc
  Sc       PrRd / --                         Sc
  Sc       BusUpd / Update                   Sc
  Sc       PrWr / BusUpd(S)                  Sm
  Sc       PrWr / BusUpd(Not(S))             M
  Sm       PrRd / --                         Sm
  Sm       PrWr / BusUpd(S)                  Sm
  Sm       PrWr / BusUpd(Not(S))             M
  Sm       BusRd / Flush                     Sm
  Sm       BusUpd / Update                   Sc
  M        PrRd / --                         M
  M        PrWr / --                         M
  M        BusRd / Flush                     Sm

Snoopy Protocol Taxonomy

  Protocol            Write-through cache   Write-back cache
  Write-invalidate    IV                    MSI, MESI
  Write-update        Homework              Dragon
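The MSI transitions and the three-processor MSI example above can be replayed in a few lines of Python. This is only a minimal sketch under simplifying assumptions that are not in the slides: a single block holding one word (X), an atomic bus, and a requester that picks up flushed data directly. The names `Cache`, `MsiBus`, `snapshot` and `replay_example` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

INVALID, SHARED, MODIFIED = "I", "S", "M"


@dataclass
class Cache:
    state: str = INVALID
    value: object = "?"  # "?" marks unknown contents, as in the example table


class MsiBus:
    """Hypothetical model: one block (holding X), several caches, one memory word."""

    def __init__(self, n_caches, memory_value):
        self.caches = [Cache() for _ in range(n_caches)]
        self.memory = memory_value
        self.log = []  # bus transactions and flushes, in order

    def _snoop(self, requester, transaction):
        """Every other cache reacts to BusRd/BusRdX; returns the freshest data."""
        data = self.memory
        for i, cache in enumerate(self.caches):
            if i == requester:
                continue
            if cache.state == MODIFIED:
                # M --BusRd/Flush--> S and M --BusRdX/Flush--> I
                self.memory = data = cache.value
                self.log.append(f"P{i + 1}:Flush")
                cache.state = SHARED if transaction == "BusRd" else INVALID
            elif cache.state == SHARED and transaction == "BusRdX":
                cache.state = INVALID  # S --BusRdX/_--> I
        return data

    def read(self, p):
        cache = self.caches[p]
        if cache.state == INVALID:  # I --PrRd/BusRd--> S
            self.log.append(f"P{p + 1}:BusRd")
            cache.value = self._snoop(p, "BusRd")
            cache.state = SHARED
        return cache.value  # hits in S or M need no bus transaction

    def write(self, p, value):
        cache = self.caches[p]
        if cache.state != MODIFIED:  # I or S --PrWr/BusRdX--> M
            self.log.append(f"P{p + 1}:BusRdX")
            self._snoop(p, "BusRdX")  # one-word block: fetched data is overwritten
            cache.state = MODIFIED
        cache.value = value  # writes in M stay local

    def snapshot(self):
        return [f"{c.value}/{c.state}" for c in self.caches] + [self.memory]


def replay_example():
    """Replays the MSI example (index 0 is P1, index 1 is P2, index 2 is P3)."""
    bus = MsiBus(n_caches=3, memory_value=10)
    rows = [("Initially", bus.snapshot())]
    steps = [
        ("P2 Rd X", lambda: bus.read(1)),
        ("P3 Rd X", lambda: bus.read(2)),
        ("P2 Wr X=15", lambda: bus.write(1, 15)),
        ("P1 Rd X", lambda: bus.read(0)),
        ("P1 Wr X=3", lambda: bus.write(0, 3)),
        ("P1 Wr X=6", lambda: bus.write(0, 6)),
    ]
    for label, step in steps:
        step()
        rows.append((label, bus.snapshot()))
    return rows, bus.log


rows, log = replay_example()
for label, row in rows:
    print(label, row)
print(log)
```

Under these assumptions each printed row matches the corresponding row of the MSI example table, and the log shows that the final write (P1 Wr X=6) generates no bus transaction, which is the locality benefit described under Ownership Overhead.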