Scaling Formal Methods toward Hierarchical
Protocols in Shared Memory Processors:
Annual Review Presentation – April 2007
Intel SRC Customization Award
2005-TJ-1318
Presenters:
Ganesh Gopalakrishnan
Xiaofang Chen
School of Computing, University of Utah
Salt Lake City, UT
1
Project Personnel
IBM Mentor: Dr. Steven M. German
Intel Mentor: Dr. Ching-Tsun Chou
Primary Student:
Xiaofang Chen
Summer internship planned at IBM T.J. Watson (6/07), where the research discussed here in Project 2 will be furthered
Other SRC Student:
Robert Palmer (work involving TLA+ modeling of communication libraries)
Defense May 10; expected to join Intel (6/07)
3 other PhD students, 1 MS student, and 2 undergraduates in FV, all working on FV of threading / message-passing software
Multicores are the future!
Their caches are visibly central…
> 80% of chips shipped will be multi-core
(Photo courtesy of Intel Corporation.)
…and the number of organizations of multiprocessor caches is mind-boggling (e.g., imagine 80 cores and deeper hierarchies).
[Figure: three clusters of L1 caches, each with an L2 Cache+Local Dir and an Interface, connected to a Global Dir and Main Memory; the design choices Shared / Private and Inclusive / Exclusive are annotated.]
Protocol design happens in “the thick of things” (many interfaces, constraints of performance, power, testability).
From “High-throughput coherence control and hardware messaging in Everest,” by Nanda et al., IBM J. Res. & Dev. 45(2), 2001.
Future Coherence Protocols
Cache coherence protocols that are tuned for the contexts in which they are operating can significantly increase performance and reduce power consumption [Liqun Cheng]
Producer-consumer sharing pattern-aware protocol [Cheng, HPCA07]: 21% speedup and 15% reduction in network traffic
Interconnect-aware coherence protocols [Cheng, ISCA06]: heterogeneous interconnect to improve performance AND reduce power; 11% speedup and 22% wire power savings
Bottom line: protocols are going to get more complex!
Designers have poor conceptual tools (e.g., “informal MSC drawings,” i.e., message sequence charts).
Need better notations and tools.
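[Figure: an informal MSC among L1-1 (S), L1-2 (I), LDir (S: L1-1) and GDir, with the messages Req_S, Swap, Broadcast, Fwd_Req, NAck and Gnt_S (twice); the directory state (S: L1-2) also appears.]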
Design Abstractions in More Modern Flows
An Interleaving Protocol Model (Murphi or TLA+ are the languages of choice here)
FV here eliminates concurrency bugs
Detailed HDL model
FV here eliminates implementation bugs; however:
Correspondence with the Interleaving Model is lost
Need more detailed models anyhow
Interleaving Models are very abstract
Monolithic Verification of HDL Code Does Not Scale
Design optimizations are captured only at the HDL level
The Interleaving model becomes increasingly obsolete
Need an Integrated Flow:
Interleaving -> High-level HW View -> Final HDL
Related Work in Formal HW Design
Bluespec
High-level design is expressed using atomic transactions
Synthesizes high-level designs into hardware implementations
Automatic scheduling of high-level design steps in hardware
May not meet performance goals
Malik et al., Formal Architecture and Microarchitecture Modeling for Verification
Meant for Instruction Set Processors
Need a Formal Theory of Refinement from Interleaving to High-level HW Models
Our Goals
Develop a Methodology to Verify “Realistic” Interleaving Models
Useful Benchmarks for others
Our particular contributions are towards Hierarchical protocols
Largely Inspired by Chou et al.’s work (FMCAD’04)
Xiaofang Chen’s PhD is wrapping up a nice story here!
Develop a Language and Formal Theory for Higher-Level HW Specification & Refinement
Ideas largely due to German & Janssen
Xiaofang Chen’s PhD work is taking ideas from the initial proposal all the way to practical realization!
A summary of our work over Y1-2
1. Three progressively better approaches to verify hierarchical cache coherence protocols at the interleaving level
   1. A/G method of complementary abstractions (FMCAD’06)
   2. Extensions to non-inclusive hierarchies (TR 06-014)
   3. Abstract each level separately (to be submitted)
   4. Error-trace checking (to be submitted)
2. A theory of transaction-based design and verification (writeup finished; initial experiments finished)
3. Modular verification of transactions (writeup in progress; initial experiments finished)
We number the projects 1.1, 1.2, 1.3, 1.4, 2, and 3.
Project 1.[1-4] Timeline
1.1: FMCAD’06 results
1.2: Another hierarchical benchmark (non-inclusive)
1.3: Abstraction per level (more scalable)
1.4: Automatic recognition of spurious/real bugs
1.[1-4]: Hierarchical Protocols
[Figure: a Home Cluster and Remote Clusters 1 and 2, each with two L1 caches, an L2 Cache+Local Dir and a RAC, connected to the Global Dir and Main Memory.]
Abstracted Protocol #1
[Figure: the Home Cluster keeps its two L1 caches and its L2 Cache+Local Dir, while Remote Clusters 1 and 2 are reduced to abstracted L2 Cache+Local Dir’ nodes; the three RACs, the Global Dir and Main Memory remain.]
Abstracted Protocol #2
[Figure: Remote Cluster 1 keeps its two L1 caches and its L2 Cache+Local Dir, while the Home Cluster and Remote Cluster 2 are reduced to abstracted L2 Cache+Local Dir’ nodes; the three RACs, the Global Dir and Main Memory remain.]
Non-Circular Assume/Guarantee
We can’t verify this due to state explosion:
h ∥ r1 ∥ r2 ⊨ Coh
Instead (lowercase: a detailed cluster; uppercase: its abstraction):
Check-1: h ∥ R1 ∥ R2 ⊨ Coh1 ∧ Guarant1
Check-2: H ∥ r1 ∥ R2 ⊨ Coh2 ∧ Guarant2
1.2: We applied the non-circular A/G method to a Non-Inclusive Hierarchical Protocol…
Protocol features
Broadcast channels
Non-imprecise local dir
Verification challenges
A/G cannot infer the local dir state from intra-cluster information alone
Coherence may involve multiple L1 caches
Verifying Non-Inclusive Protocols
Inferring “L2.State = Excl” from:
Outside the cluster
Inside the cluster
Use history variables to turn non-inclusive protocols into inclusive ones
Experimental Results
Protocol | # of States | Mem (GB) | Model Check
Hierarchy | > 1,521,900,000 | 20 | No
Abs-1 | 234,478,105 | 20 | Yes
Abs-2 | 283,124,383 | 20 | Yes
Reduction is over 65%
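One reading that reproduces the “over 65%” figure compares the combined state counts of the two abstracted checks with the lower-bound count for the unfinished Hierarchy run. The slide does not say how the percentage was computed, so treat this as an assumption; the sketch below just redoes the arithmetic both ways.

```python
# State counts from the table above. The full hierarchy did not finish,
# so its count is only a lower bound (and so is the computed reduction).
HIERARCHY_LOWER_BOUND = 1_521_900_000
ABSTRACTED = {"Abs-1": 234_478_105, "Abs-2": 283_124_383}


def reduction(original: int, parts: dict) -> float:
    """Fraction of states saved by checking all parts instead of the original."""
    return 1 - sum(parts.values()) / original


combined = sum(ABSTRACTED.values())
r = reduction(HIERARCHY_LOWER_BOUND, ABSTRACTED)
# Alternative reading: compare only the largest single abstracted check.
largest = 1 - max(ABSTRACTED.values()) / HIERARCHY_LOWER_BOUND

print(f"combined abstract states: {combined:,}")
print(f"reduction (sum of checks): {r:.1%}")
print(f"reduction (largest single check): {largest:.1%}")
```

Summing both checks gives about 66%; counting only the largest check gives about 81%. Both readings exceed 65%, while the memory column stays at 20 GB throughout.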
1.3: We then tried a “Split Hierarchy Per Level Approach” to using non-circular A/G
[Figure: the hierarchy split into three abstracted protocols, ABS #1, ABS #2 and ABS #3, built from L1 caches, L2 Cache+Local Dir and abstracted L2 Cache+Local Dir’ nodes, RACs, the Global Dir and Main Memory.]
A Sample Scenario
[Figure: Remote Cluster 1 (Excl), the Home Cluster and Remote Cluster 2 (Invld) exchange, in order: 1. Req_Ex, 2. Fwd Req_Ex, 3. Fwd Req_Ex, 4. Fwd Req_Ex, 5. Grant, 6. Grant.]
Map to Abstracted Protocols
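[Figure: the same scenario mapped onto the abstracted protocols, with Remote Cluster 1 (states Invld and Excl) and Remote Cluster 2, and the messages 1. Req_Ex, 2. Fwd Req_Ex, 3. Fwd Req_Ex, 4. Fwd Req_Ex, 5. Grant, 6. Grant split across them.]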
Experimental Results
Protocol | # of States | Exec time (sec) | Mem (GB) | Model Check
Hierarchy | > 438,120,000 | > 125,799 | 18 | No
Inter | 1,500,621 | 269 | 2 | Yes
Intra-1 | 564,878 | 48 | 2 | Yes
Intra-2 | 188,842 | 18 | 2 | Yes
Reduction is over 95%!
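The same arithmetic for the split-per-level checks, assuming the reduction compares the summed states of the Inter, Intra-1 and Intra-2 runs with the lower bound for the unfinished Hierarchy run, and that the three checks run one after another (so their times add):

```python
# Rows from the 1.3 results table: (states, exec time in seconds).
# The monolithic "Hierarchy" run did not finish, so its numbers are lower bounds.
hierarchy = (438_120_000, 125_799)
split_checks = {
    "Inter": (1_500_621, 269),
    "Intra-1": (564_878, 48),
    "Intra-2": (188_842, 18),
}

total_states = sum(states for states, _ in split_checks.values())
total_secs = sum(secs for _, secs in split_checks.values())

state_reduction = 1 - total_states / hierarchy[0]
time_ratio = hierarchy[1] / total_secs  # a lower bound on the speedup

print(f"states: {total_states:,} in total, reduction {state_reduction:.1%}")
print(f"time: {total_secs} s in total, ratio {time_ratio:.1f}")
```

Under these assumptions the split checks explore about 99.5% fewer states and take at least roughly 375 times less time, consistent with the “over 95%” claim.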
Project 1.4: Automatic Recognition of Spurious / Real Bugs in these approaches
Problem statement
Given an error trace of an ABS protocol, is it a real bug of the original protocol?
Solution
In the original protocol, use BFS to guide the model checking so that it matches the error trace
Reason: our abstraction is just a projection
Basic Idea of Automatic Recognition
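[Figure: the error trace of the abstracted protocol over v1, v2 (v1=0, v2=0; v1=1, v2=2; …; v1=6, v2=8) beside a directed BFS of the original protocol over v1, v2, v3 starting from v1=0, v2=0, v3=0; successor states (e.g., v1=3, v2=1, v3=0; v1=1, v2=2, v3=1; v1=0, v2=0, v3=3) are kept or dropped depending on whether they match the abstract trace.]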
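A minimal sketch of the directed BFS, assuming the original protocol is given as a successor function and the abstraction is a projection onto the visible variables (here v1 and v2, with v3 hidden, as in the figure). A successor whose projection matches the next abstract state advances along the trace; one that changes only hidden variables stays at the same position (a stuttering step); anything else is dropped. The function names and the toy protocol are hypothetical; the actual tool works on Murphi models.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Sequence, Tuple

State = Tuple[int, ...]   # hypothetical encoding: values of (v1, v2, v3)
Node = Tuple[State, int]  # (original-protocol state, position in the abstract trace)


def replay_abstract_trace(
    init_states: Iterable[State],
    successors: Callable[[State], Iterable[State]],
    project: Callable[[State], Hashable],
    abs_trace: Sequence[Hashable],
) -> Optional[List[State]]:
    """Directed BFS of the original protocol guided by an abstract error trace.

    Returns a concrete trace if the abstract error is real, or None if no
    concrete execution matches it (the error is spurious).
    """
    last = len(abs_trace) - 1
    parent: Dict[Node, Optional[Node]] = {}
    queue: deque = deque()
    for s in init_states:
        if project(s) == abs_trace[0] and (s, 0) not in parent:
            parent[(s, 0)] = None
            queue.append((s, 0))

    while queue:
        s, i = queue.popleft()
        if i == last:  # whole abstract trace matched: rebuild the concrete path
            path: List[State] = []
            node: Optional[Node] = (s, i)
            while node is not None:
                path.append(node[0])
                node = parent[node]
            return path[::-1]
        for t in successors(s):
            p = project(t)
            if p == abs_trace[i + 1]:
                j = i + 1  # keep: matches the next abstract state
            elif p == abs_trace[i]:
                j = i      # keep: only hidden variables changed
            else:
                continue   # drop: diverges from the abstract trace
            if (t, j) not in parent:
                parent[(t, j)] = (s, i)
                queue.append((t, j))
    return None


# Hypothetical toy protocol over (v1, v2, v3); the abstraction hides v3.
def toy_successors(s: State) -> List[State]:
    v1, v2, v3 = s
    succ = [(v1, v2, (v3 + 1) % 3)]    # internal step: only hidden v3 changes
    if v3 == 2:
        succ.append((v1 + 1, v2, 0))   # v1 may advance once v3 has reached 2
    if v2 < v1:
        succ.append((v1, v2 + 1, v3))  # v2 may catch up with v1
    return succ


def project_v1_v2(s: State) -> Tuple[int, int]:
    return s[0], s[1]
```

For example, replaying the abstract trace [(0, 0), (1, 0), (1, 1)] from (0, 0, 0) returns the concrete path [(0, 0, 0), (0, 0, 1), (0, 0, 2), (1, 0, 0), (1, 1, 0)], whereas [(0, 0), (0, 1)] returns None: in the toy protocol v2 can never move while v1 is 0, so that abstract error would be spurious.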
Y3 Plans for Project 1:
Considerable Experience Gained
Three Large Benchmark Protocols (each 3000+ lines of Murphi code), available on the web
Have Reduced the Verification Complexity of Hierarchical Protocols by 90%
Can Identify Spurious Errors Automatically
All Finite-state
Not Parameterized
No plans for parameterized verification
Y3 Plans: Build a tool to support this methodology
Summary of Projects 2 and 3
1. Three progressively better approaches to verify hierarchical cache coherence protocols at the interleaving level
   1. A/G method of complementary abstractions (FMCAD’06)
   2. Extensions to deeper and non-inclusive hierarchies (TR 06-014)
   3. Latest method that abstracts each level separately (to be submitted)
   4. Error-trace checking (to be submitted)
2. A theory of transaction-based design and verification (writeup finished)
3. Modular verification of transactions (writeup in progress)
Transaction Level HW Modeling
The problem addressed: bridge the gap between high-level specifications and RTL implementations
Global properties cannot feasibly be verified formally at the RTL level!
Specifications can be verified, but do they correctly represent the implementations?
Driving Design Benchmark due to Steven German and Geert Janssen
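What changes when moving from a spec to an implementation?
Atomicity
Concurrency
Granularity in modeling
[Figure: high-level transition 1 between a client and home, realized in the implementation by low-level transitions 1.1, 1.2 and 1.3 that pass through a router and a buffer.]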
General Mappings between high-level transitions and the transactions that help implement them
High-Level Transition 1
High-level transitions take some non-zero unit of time (conceptual)
Low-Level Transitions that help realize 1: 1.1, 1.2, 1.3
Each low-level transition takes one clock cycle
High-Level and Low-Level Computations
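[Figure: a high-level computation 1, 2, 3 and the corresponding low-level computation made of transitions 1.1, 1.2, 1.3, then 2.1, 2.2, then 3.1, 3.2, 3.3.]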
Specification of High and Low Levels
High-level transition 1: in Murphi, as a guard/action rule
Low-level transitions 1.1, 1.2, 1.3: in HMurphi, as multiple guard/action rules enclosed in a Begin Transaction / End Transaction block
The guards decide when each low-level transition can fire
In each clock tick, the maximal set of low-level transitions enabled in the current state fires concurrently
Transaction
A transaction is a set of transitions in Impl that correspond to a transition in Spec
Transaction
Rule 1
……
Rule n
Endtransaction;
Executions
Spec: interleaving
One enabled transition fires at each step
1, 2, 3, …
Impl: concurrent
All enabled transitions fire at each step
{1.1, 2.1}, {1.2}, {2.2, 3.1, 3.2}, …
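The two execution styles can be contrasted with a small explicit-state sketch of guard/action rules. It assumes (this is not taken from the HMurphi definition) that in a concurrent step every enabled rule reads the pre-state, their writes are merged, and two rules writing the same variable in one tick count as a write/write conflict. The rule names mirror the slide's numbering, but the variables and guards are hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple, Set, Tuple

State = Dict[str, int]


class Rule(NamedTuple):
    """A guard/action rule; the action returns only the variables it writes."""
    name: str
    guard: Callable[[State], bool]
    action: Callable[[State], State]


def enabled(rules: List[Rule], s: State) -> List[Rule]:
    return [r for r in rules if r.guard(s)]


def interleaving_step(rules: List[Rule], s: State, choice: int = 0) -> Tuple[str, State]:
    """Spec semantics: exactly one enabled rule (picked by `choice`) fires."""
    r = enabled(rules, s)[choice]
    return r.name, {**s, **r.action(s)}


def concurrent_step(rules: List[Rule], s: State) -> Tuple[Set[str], State]:
    """Impl semantics: every enabled rule fires in the same clock tick.

    All actions read the pre-state s; two rules writing the same variable
    in one tick are reported as a write/write conflict.
    """
    fired: Set[str] = set()
    updates: State = {}
    writer: Dict[str, str] = {}
    for r in enabled(rules, s):
        for var, val in r.action(s).items():
            if var in writer:
                raise ValueError(f"W/W conflict on {var}: {writer[var]} vs {r.name}")
            writer[var] = r.name
            updates[var] = val
        fired.add(r.name)
    return fired, {**s, **updates}


# Hypothetical low-level rules named after the slide's transitions.
RULES = [
    Rule("1.1", lambda s: s["a"] == 0, lambda s: {"a": 1}),
    Rule("2.1", lambda s: s["b"] == 0, lambda s: {"b": 1}),
    Rule("1.2", lambda s: s["a"] == 1, lambda s: {"a": 2}),
]
```

From {"a": 0, "b": 0}, concurrent_step fires {1.1, 2.1} together and the next tick fires {1.2}, mirroring the Impl execution above, whereas interleaving_step fires only 1.1 (or 2.1 with choice=1).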
A Few Notations
Observable variables VH: variables used in both Spec and Impl
Impl also has additional internal variables
A variable v is inactive at a state s if all transactions in Impl that can write to v are quiescent at s
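The inactive-variable definition translates directly into a set computation. This sketch assumes we are told which Impl transactions can write each variable and which transactions are not quiescent at s; the transaction and variable names are hypothetical. The result intersected with VH gives the set of variables that the simulation notion compares.

```python
from typing import Dict, Set


def inactive_vars(writers: Dict[str, Set[str]], active_txns: Set[str]) -> Set[str]:
    """Variables all of whose writing transactions are quiescent.

    writers maps each variable to the Impl transactions that can write it;
    active_txns holds the transactions that are NOT quiescent at state s.
    A variable with no writers is vacuously inactive.
    """
    return {v for v, txns in writers.items() if not (txns & active_txns)}


# Hypothetical example: T1 grants a request, T2 updates the cache.
WRITERS = {"State": {"T1"}, "Data": {"T1", "T2"}, "Buf": {"T2"}}
VH = {"State", "Data"}  # observable variables shared by Spec and Impl

compared = VH & inactive_vars(WRITERS, active_txns={"T2"})
print(sorted(compared))  # ['State']
```

While T2 is in progress, Data (which T2 can write) is excluded from the comparison even though it is observable.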
A Formal Notion of Simulation
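For every concurrent execution of Impl, there exists an interleaving execution of Spec such that corresponding states li and hi agree on the variables in VH ∩ inactive(li)
[Figure: Impl states l0, l1, l2, … (each step firing a set {…} of transitions) aligned with Spec states h0, h1, h2, … (reached via transitions t0, t1, t2, …).]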
Simulation Checks
[Figure: the Impl transaction takes I to I’; the Spec transition takes Spec(I) to Spec(I’).]
I is a reachable state where the commit guard is true
The guard for the Spec transition must hold
Observable vars changed by either Spec or Impl must match
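An explicit-state sketch of the per-commit check, assuming states are plain dictionaries, Spec(I) is the restriction of I to the observable variables, and the Impl transaction is given by its net effect on I. In the project these checks are run on the generated VHDL with RuleBase; every name and the “request granted” example below are hypothetical.

```python
from typing import Any, Callable, Dict, FrozenSet, List

State = Dict[str, Any]
Update = Callable[[State], State]  # returns only the variables it writes


def spec_of(impl_state: State, observable: FrozenSet[str]) -> State:
    """Spec(I): the Impl state restricted to the observable variables VH."""
    return {v: impl_state[v] for v in observable}


def check_commit(I: State, observable: FrozenSet[str],
                 spec_guard: Callable[[State], bool],
                 spec_action: Update, impl_txn: Update) -> List[str]:
    """Check one reachable Impl state I at which the commit guard is true."""
    h = spec_of(I, observable)
    if not spec_guard(h):
        return ["Spec guard does not hold at Spec(I)"]
    h_next = {**h, **spec_action(h)}
    I_next = {**I, **impl_txn(I)}
    # Only observable vars changed by either side must match.
    changed = {v for v in observable if h_next[v] != h[v] or I_next[v] != I[v]}
    return [f"{v}: Spec gives {h_next[v]!r}, Impl gives {I_next[v]!r}"
            for v in sorted(changed) if h_next[v] != I_next[v]]


# Hypothetical "request granted" example; req is an internal Impl variable.
OBS = frozenset({"State", "Data"})
I0 = {"State": "Invld", "Data": 0, "req": True}


def grant_guard(h: State) -> bool:
    return h["State"] == "Invld"


def grant_spec(h: State) -> State:
    return {"State": "Excl"}


def grant_impl_ok(I: State) -> State:
    return {"State": "Excl", "req": False}


def grant_impl_bad(I: State) -> State:
    return {"State": "Excl", "Data": 3, "req": False}  # also clobbers Data
```

With these definitions, the correct transaction yields no violations, while grant_impl_bad is reported as "Data: Spec gives 0, Impl gives 3" because the Impl changed an observable variable that the Spec transition leaves alone.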
Model Checking Approaches
Monolithic
Cross product construction
Compositional
Abstraction
Assume/Guarantee
Compositional Approach
Abstraction
Change a read into an access of an input var
Self-sourced read
Add all transitions that write to the var
Assume/Guarantee
Require all writes to a var to guarantee property P
Assume P holds on all reads
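The interplay of the free-input abstraction and A/G can be illustrated by explicit enumeration over a small domain instead of symbolic model checking. The sketch covers only the “input var” variant (the self-sourced-read variant, which adds the writing transitions, is not shown); the 2-bit field, the property P and all function names are hypothetical.

```python
from typing import Callable, Iterable, List

Pred = Callable[[int], bool]


def check_reader(reader: Callable[[int], int], domain: Iterable[int],
                 prop: Pred, assume: Pred = lambda v: True) -> List[int]:
    """Check a transaction whose read of v is replaced by a free input.

    With the default `assume`, every value in `domain` may be read (plain
    abstraction). With assume=P, only values satisfying P are read (A/G).
    Returns the read values that make the transaction violate `prop`.
    """
    return [v for v in domain if assume(v) and not prop(reader(v))]


def check_writers(written_values: Iterable[int], P: Pred) -> bool:
    """Guarantee side of A/G: every value any transaction writes to v satisfies P."""
    return all(P(w) for w in written_values)


# Hypothetical example: v is a 2-bit field; the reader stores v + 1 in another
# 2-bit field, which is only safe if the encoding 3 never occurs in v.
DOMAIN = range(4)


def next_code(v: int) -> int:
    return v + 1


def fits_two_bits(x: int) -> bool:
    return 0 <= x < 4


def never_three(v: int) -> bool:  # the property P
    return v != 3
```

Without the assumption, check_reader(next_code, DOMAIN, fits_two_bits) reports a spurious failure for v = 3; passing assume=never_three removes it, and check_writers([0, 1, 2], never_three) discharges P on the writer side.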
Example of Abstraction
Transaction 1
Transaction 2
……
Transaction
…
Rule (v1 = d1) => ...
…
Endtransaction
Transaction n
Example of Assume/Guarantee
Transaction 1: Request granted
State := Excl
…
Impl.State = Spec.State
Data := d
Transaction 2: Update Cache
Benchmarks
High level: from the FMCAD’04 tutorial
Low level: provided by German and Janssen
Sizes: 1 home node, 1 remote node
Sizes are constrained by the accessible VHDL tools!
Implementations
Muv: HMurphi → VHDL
Written by German
Mud: static analyzer for possible conflicts / dependencies
VHDL verifier: IBM RuleBase
Preliminary Results
Approach | # Flip-Flops | # Gates | Time (min)
Monolithic | 212 | 8574 | 17
Decomposed: W/W conflicts | 108 | 5763 | 11
Decomposed: closures | 89 | 2194 | 3
* This is for datapath = 1 bit
* Intel Xeon CPU 3.0GHz, 2GB memory
When Datapath > 1 bit
Cannot check the monolithic approach (RuleBase academic license restricted to 300 F-F)
Decomposed approach
W/W checks not affected
Datapath bits | # of F-F | # of Gates
1 | 89 | 2194
2 | 97 | 2380
26 | 289 | 6659
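The flip-flop column grows by exactly 8 per datapath bit: the line 81 + 8b through the 1- and 2-bit rows also reproduces the 289 flip-flops measured at 26 bits. The gate count is only approximately linear (186 gates for the second bit, about 179 per bit on average up to 26 bits). A small check of that arithmetic; the last computation extrapolates to the 300 F-F license limit and holds only if the linear trend continues.

```python
# (datapath bits, flip-flops, gates) from the decomposed-approach table above
ROWS = [(1, 89, 2194), (2, 97, 2380), (26, 289, 6659)]
FF_LIMIT = 300  # RuleBase academic-license flip-flop limit quoted on the slide


def per_bit(col: int, a: int, b: int) -> float:
    """Growth in column `col` per extra datapath bit between rows a and b."""
    return (ROWS[b][col] - ROWS[a][col]) / (ROWS[b][0] - ROWS[a][0])


ff_per_bit = per_bit(1, 0, 1)                   # 8.0 flip-flops per bit
ff_base = ROWS[0][1] - ff_per_bit * ROWS[0][0]  # 81.0
ff_pred_26 = ff_base + ff_per_bit * 26          # matches the measured 289

gates_per_bit_first = per_bit(2, 0, 1)  # 186.0 gates for the second bit
gates_per_bit_avg = per_bit(2, 0, 2)    # about 178.6 gates per bit up to 26 bits

# Extrapolation (assumes the linear flip-flop trend continues):
max_bits_under_limit = int((FF_LIMIT - ff_base) // ff_per_bit)

print(ff_per_bit, ff_base, ff_pred_26, gates_per_bit_first, round(gates_per_bit_avg, 1))
print("largest datapath under the limit:", max_bits_under_limit)
```

If the trend holds, these decomposed checks would stay under the 300 F-F limit up to a 27-bit datapath.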
Future Work
Reduce the cost of W/W conflict checking
Localized reasoning
Apply to pipelines
More benchmarks
Try other VHDL tools
SixthSense, etc.
Publications, Software, Models
FMCAD 2006 paper
Presentation at Intel
Journal version of hierarchical coherence protocol verification (in preparation)
TR on Theory of Transaction-Based Specification and Verification (in preparation)
Detailed VHDL-level German Protocol developed
Analysis Framework for HMurphi developed
Preliminary verification experiments using Cadence IFV, IBM RuleBase, and IBM SixthSense
Xiaofang Chen’s Summer Internship at IBM T.J. Watson Research Center
Robert Palmer’s SRC Poster
TECHCON 2007 submission
More publications will follow during 2007-08, after a hiatus due to infrastructure build-up (many delays!)