
An Extensible Simulator for Bus- and Directory-Based Coherence
Allen Chen
Deepak Souda Bhat
Edward F. Gehringer
North Carolina State University
Cache coherence
  One of the main issues in parallel architecture
  Two main protocol types:
    Invalidate
    Update
Extensible cache-coherence simulator
[email protected]
Two main architecture types
  SMPs (symmetric multiprocessors) → snoopy protocols
  DSMs (distributed shared-memory multiprocessors) → directory-based protocols
Simulator is trace driven
  Reads a set of memory references in this format
  (processor number, r/w for read/write, address in hex):
    1  r  a1663dc4
    1  w  a1663dc4
    2  r  a165d30c
    2  r  a1663dc4
CPU action → bus action
  A CPU action, e.g., writing a word,
  triggers a bus action, e.g., invalidating copies of the block in other caches.
  Examples: PrRd (processor read), BusRdX (bus read-exclusive)
How this is implemented
  do CPU action              (method of cache class)
  for each other cache       (method of main class)
      do bus action          (method of cache class)
Protocols supported
  MSI
  MESI
  MOESI
  Firefly
  Dragon
Example method: PrRd for MSI
void MSI::PrRd(ulong addr, int processor_number) {
    // Per-cache global counter to maintain LRU order among
    // cache ways, updated on every cache access
    current_cycle++;
    reads++;
    cache_line *line = find_line(addr);
    if (line == NULL) {
        // This is a miss
        read_misses++;
        cache_line *newline = allocate_line(addr);
        memory_transactions++;
        // State I --> S
        newline->set_state(S);
        // Read miss --> BusRd
        bus_reads++;
        sendBusRd(addr, processor_number);
    }
    else {
        // The block is cached
        cache_state state;
        state = line->get_state();
        if (state == I) {
            // The block is cached, but in the invalid state.
            // Hence, a read miss
            memory_transactions++;
            read_misses++;
            line->set_state(S);
            bus_reads++;
            sendBusRd(addr, processor_number);
        }
        else {
            update_LRU(line);
        }
    }
}
How directory-based protocols differ
  Along with the cache hierarchy, a directory hierarchy
    Cache: MSI, MESI, Dragon, etc.
    Directory: full bit vector (FBV), SCI, SSCI, etc.
  Instead of bus actions, signal actions
    No BusRd, but SignalRd
    No iteration over all other caches
  Directories receive
    Invalidation and intervention messages
Protocols supported (directory-based)
  FBV
    [Figure: state transitions for a cache]
    [Figure: state transitions for main memory]
Sample assignments
  Given MESI and Dragon,
    implement MSI and Firefly
  Given write-through,
    implement MSI with and without BusUpgr
    implement Firefly

Sample assignments, cont.
  Given invalidation protocols,
    implement update protocols
  Given a bus-based MESI,
    implement directory-based MESI
  Reimplement closely related protocols
    as a superclass & subclass
  Hybridize two of the protocols,
    say, invalidation and update
Assignments can study the effect of varying
  the protocol
  cache size
  block size
  associativity
  number of processors (dependent on the trace)
Summary
  Through coding cache actions, students learn how cache coherence really works.
  There are many different assignments you can give.
  You can use the simulator term after term, each time trying something new.
  It provides a good introduction to how architectural innovations are simulated
  (but in much less detail, so results are quick).