Decomposed Model Checking - Formal Verification at Utah

Scaling Formal Methods toward Hierarchical
Protocols in Shared Memory Processors:
Annual Review Presentation – April 2007
Intel SRC Customization Award
2005-TJ-1318
Presenters:
Ganesh Gopalakrishnan
Xiaofang Chen
School of Computing, University of Utah
Salt Lake City, UT
1
Project Personnel
 IBM Mentor: Dr. Steven M. German
 Intel Mentor: Dr. Ching-Tsun Chou
 Primary Student:
 Xiaofang Chen
 Summer internship planned - IBM T.J. Watson (6/07)
where the research discussed here in Project 2 will be
furthered
 Other SRC Student:
 Robert Palmer (work involving TLA+ modeling of
communication libraries)
 Defense May 10; Expected to join Intel (6/07)
 3 other PhD students, 1 MS student, 2 UGs in FV

all working on FV of threading / msg-passing software
2
Multicores are the future!
Their caches are visibly central…
> 80% of chips
shipped will be
multi-core
(photo courtesy of
Intel Corporation.)
3
…and the number of organizations of
multiprocessor caches is mindboggling (e.g.
imagine 80 cores and deeper hierarchies).
[Figure: three clusters (Cluster 1, Cluster 2, Cluster 3), each containing L1 Caches (Shared / Private) above an L2 Cache+Local Dir and an Interface (Inclusive / Exclusive); the clusters connect to a Global Dir and Main Memory.]
4
Protocol design happens in “the thick of things”
(many interfaces, constraints of performance,
power, testability).
From “High-throughput coherence control and hardware messaging in
Everest,” by Nanda et al., IBM J. R&D 45(2), 2001.
5
Future Coherence Protocols

Cache coherence protocols that are tuned for the contexts in which
they are operating can significantly increase performance and
reduce power consumption [Liqun Cheng]

Producer-consumer sharing pattern-aware protocol [Cheng, HPCA07]
  21% speedup and 15% reduction in network traffic
Interconnect-aware coherence protocols [Cheng, ISCA06]
  Heterogeneous Interconnect
  Improve performance AND reduce power
  11% speedup and 22% wire power savings
Bottom-line: Protocols are going to get more complex!
6
Designers have poor conceptual tools (e.g.,
“Informal MSC drawings”).
Need better notations and tools.
[Figure: informal MSC drawing with participants L1-1 (S), L1-2 (I), LDir (S: L1-1, later S: L1-2) and GDir, exchanging the messages Req_S, Swap, Broadcast, Fwd_Req, NAck and Gnt_S.]
7
Design Abstractions in More Modern Flows
 An Interleaving Protocol Model (Murphi or TLA+ are the
languages of choice here)
 FV here eliminates concurrency bugs
 Detailed HDL model
 FV here eliminates implementation bugs;
however
 Correspondence with Interleaving Model is lost

Need more detailed models anyhow

Interleaving Models are very abstract
 Monolithic Verification of HDL Code Does not Scale
 Design optimizations captured at HDL level

Interleaving model becomes more obsolete
 Need an Integrated Flow:

Interleaving -> High level HW View -> Final HDL
8
Related Work in Formal HW Design
 BlueSpec
 High level design is expressed using atomic
transactions
 Synthesizes high level designs into hardware
implementations
 Automatic scheduling of high level design steps in
hardware
 May not meet performance goals
 Malik et al., Formal Architecture and Microarchitecture
Modeling for Verification
 Meant for Instruction Set Processors
 Need Formal theory of Refinement from
Interleaving to High level HW Models
9
Our Goals
 Develop Methodology to Verify “Realistic” Interleaving Models
   Useful Benchmarks for others
   Our particular contributions are towards Hierarchical protocols
   Largely Inspired by Chou et al.’s work (FMCAD’04)
   Xiaofang Chen’s PhD is wrapping up a nice story here!
 Develop Language and Formal Theory for Higher Level HW Specification & Refinement
   Ideas largely due to German & Janssen
   Xiaofang Chen’s PhD work is taking ideas from initial proposal all the way to practical realization!
10
A summary of our work over Y1-2
1. Three progressively better approaches to verify hierarchical cache coherence protocols at the interleaving level
   1. A/G method of complementary abstractions (FMCAD’06)
   2. Extensions to Non-inclusive hierarchies (TR 06-014)
   3. Abstract each level separately (to be submitted)
   4. Error-trace checking (to be submitted)
2. A theory of transaction based design and verification (writeup finished; initial experiments finished)
3. Modular verification of transactions (writeup in progress; initial experiments finished)
Number the projects 1.1, 1.2, 1.3, 1.4, 2, and 3
11
Project 1.[1-4] Timeline
1.1: FMCAD’06 results
1.2: Another hierarchical benchmark (non-inclusive)
1.3: Abstraction per level (more scalable)
1.4: Automatic Recognition of spurious/real bugs
12
1.[1-4]: Hierarchical Protocols
[Figure: Remote Cluster 1, Home Cluster and Remote Cluster 2, each with two L1 Caches, an L2 Cache+Local Dir and a RAC; all three connect to the Global Dir and Main Memory.]
13
Abstracted Protocol #1
[Figure: the Home Cluster keeps its two L1 Caches and its L2 Cache+Local Dir; Remote Cluster 1 and Remote Cluster 2 are each reduced to an abstracted L2 Cache+Local Dir’. Each cluster keeps its RAC, connected to the Global Dir and Main Memory.]
14
Abstracted Protocol #2
[Figure: Remote Cluster 1 keeps its two L1 Caches and its L2 Cache+Local Dir; the Home Cluster and Remote Cluster 2 are each reduced to an abstracted L2 Cache+Local Dir’. Each cluster keeps its RAC, connected to the Global Dir and Main Memory.]
15
Non-Circular Assume/Guarantee
 We can’t verify this due to state explosion:
   h ∥ r1 ∥ r2 ⊨ Coh
 Instead (lower case: detailed cluster; upper case: abstracted cluster, as in Abstracted Protocols #1 and #2)
   Check-1: h ∥ R1 ∥ R2 ⊨ Coh1 ∧ Guarant1
   Check-2: H ∥ r1 ∥ R2 ⊨ Coh2 ∧ Guarant2
16
1.2: We applied the non-circular A/G
method to a Non-Inclusive Hierarchical
Protocol….
 Protocol features
 Broadcast channels
 Non-imprecise local dir
 Verification challenges
 A/G cannot infer local dir from just intraclusters
 Coherence may involve multiple L1 caches
17
Verifying Non-Inclusive Protocols
 Inferring “L2.State = Excl” from
 Outside the cluster
 Inside the cluster
 Use history variables to change noninclusive to inclusive protocols
18
Experimental Results
Protocols    # of States        Mem (GB)   Model Check
Hierarchy    > 1,521,900,000    20         No
Abs-1        234,478,105        20         Yes
Abs-2        283,124,383        20         Yes
Reduction is over 65%
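The reduction figure can be reproduced from the state counts on this slide. The short Python sketch below assumes (this is an interpretation, not stated on the slide) that the reduction compares the combined size of the two abstracted models with the lower bound reported for the full hierarchy; the function name `reduction` is illustrative.

```python
# State counts taken from the Experimental Results slide.
# The hierarchy count is only a lower bound (the run did not finish).
hierarchy_states = 1_521_900_000
abs_states = {"Abs-1": 234_478_105, "Abs-2": 283_124_383}


def reduction(full, parts):
    """Fraction of states saved when `parts` are explored instead of `full`."""
    return 1 - sum(parts) / full


combined = reduction(hierarchy_states, abs_states.values())
print(f"combined reduction: {combined:.1%}")
```

Under that reading the combined saving comes out at roughly 66%, and because the hierarchy count is a lower bound the true saving can only be larger.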
19
1.3: We then tried a “Split Hierarchy Per Level
Approach” to using non-circular A/G
[Figure: the hierarchy is split per level into three abstractions. ABS #1 and ABS #2 each keep the L1 Caches and the L2 Cache+Local Dir of one cluster; ABS #3 keeps the abstracted L2 Cache+Local Dir’ of all three clusters together with their RACs, the Global Dir and Main Memory.]
20
A Sample Scenario
[Figure: message sequence among Remote Cluster 1 (line in Excl), the Home Cluster and Remote Cluster 2 (line in Invld): 1. Req_Ex, 2. Fwd Req_Ex, 3. Fwd Req_Ex, 4. Fwd Req_Ex, 5. Grant, 6. Grant.]
21
Map to Abstracted Protocols
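[Figure: the same scenario mapped onto the abstracted protocols, showing Remote Cluster 1 and Remote Cluster 2 with line states Invld and Excl and the messages 1. Req_Ex, 2. Fwd Req_Ex, 3. Fwd Req_Ex, 4. Fwd Req_Ex, 5. Grant, 6. Grant.]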
22
Experimental Results
Protocols    # of States      Exec time (sec)   Mem (GB)   Model Check
Hierarchy    > 438,120,000    > 125,799         18         No
Inter        1,500,621        269               2          Yes
Intra-1      564,878          48                2          Yes
Intra-2      188,842          18                2          Yes
Reduction is over 95% !
23
Project 1.4: Automatic Recognition of
Spurious / Real Bugs in these approaches
 Problem statement
 Given an error trace of ABS protocol
 Is it a real bug of the original protocol?
 Solution
 In the original protocol, use a directed BFS to
guide the model checking so that it matches the
error trace
This works because our abstraction is just a projection
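The directed search can be sketched in a few lines of Python. This is a minimal illustration and not the project's tool: the names `replay_abstract_trace`, `successors` and `project` are hypothetical, the toy transition system is invented, and the sketch assumes a non-empty abstract trace in which each abstract step corresponds to exactly one step of the original protocol (no stuttering).

```python
from typing import Callable, Hashable, Iterable, List, Optional, Sequence

State = Hashable


def replay_abstract_trace(
    init_states: Iterable[State],
    successors: Callable[[State], Iterable[State]],
    project: Callable[[State], State],
    abstract_trace: Sequence[State],
) -> Optional[List[State]]:
    """Return a concrete trace whose projection equals abstract_trace, else None."""
    # The frontier maps each kept concrete state to one path that reaches it.
    frontier = {s: [s] for s in init_states if project(s) == abstract_trace[0]}
    for target in abstract_trace[1:]:
        next_frontier = {}
        for state, path in frontier.items():
            for nxt in successors(state):
                # Keep a successor only if its projection matches the next
                # abstract state; every other successor is dropped.
                if project(nxt) == target and nxt not in next_frontier:
                    next_frontier[nxt] = path + [nxt]
        frontier = next_frontier
        if not frontier:
            return None  # no concrete run matches: the abstract error is spurious
    return next(iter(frontier.values())) if frontier else None


# Toy original protocol over (v1, v2, v3); the abstraction projects away v3.
def successors(state):
    v1, v2, v3 = state
    return [(v1 + 1, v2 + 2, v3 + 1), (v1 + 3, v2 + 1, v3)]


def project(state):
    return state[:2]


real = replay_abstract_trace([(0, 0, 0)], successors, project, [(0, 0), (1, 2), (2, 4)])
spurious = replay_abstract_trace([(0, 0, 0)], successors, project, [(0, 0), (5, 5)])
```

Because only states that agree with the abstract trace are kept, the frontier at each depth stays far smaller than an undirected BFS of the original protocol would produce.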
24
Basic Idea of Automatic Recognition
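[Figure: the error trace of the abstracted protocol (states over v1, v2, from v1=0, v2=0 through v1=1, v2=2 to v1=6, v2=8) is shown beside a directed BFS of the original protocol (states over v1, v2, v3, starting at v1=0, v2=0, v3=0). Candidate states such as v1=1, v2=2, v3=1, v1=0, v2=0, v3=3 and v1=3, v2=1, v3=0 are marked keep or drop.]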
25
Y3 Plans for Project 1:
 Considerable Experience Gained
 Three Large Benchmark Protocols (each is 3000+
lines of Murphi Code)
 on the web
 Have Reduced Verif Complexity of Hier Protocols by
90%
 Can Identify Spurious Errors Automatically
 All Finite-state
 Not Parameterized
 No plans for Parameterized
 Y3 Plans: Build Tool to support this methodology
26
Summary of Projects 2 and 3
1. Three progressively better approaches to verify hierarchical cache coherence protocols at the interleaving level
   1. A/G method of complementary abstractions (FMCAD’06)
   2. Extensions to deeper, and non-inclusive hierarchies (TR 06-014)
   3. Latest method that abstracts each level separately (to be submitted)
   4. Error-trace checking (to be submitted)
2. A theory of transaction based design and verification (writeup finished)
3. Modular verification of transactions (writeup in progress)
27
Transaction Level HW Modeling
The problem addressed: Bridge the gap
between high-level specifications and RTL
implementations
 Global properties cannot be formally
verified at RTL Level!
 Specifications can be verified, but do they
correctly represent the implementations?
28
Driving Design Benchmark due to
German and Geert Janssen
29
What changes when moving from a
spec to an implementation?
 Atomicity
 Concurrency
 Granularity in modeling
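[Figure: at the spec level, transition 1 connects client and home directly; at the implementation level it is split into transitions 1.1, 1.2 and 1.3 involving client, home, router and buffer.]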
30
General Mappings between high level
transitions and transactions that help
implement them
High Level Transition 1
  High Level Transitions take some non-zero unit of time (conceptual)
Low Level Transitions that help realize 1: 1.1, 1.2, 1.3
  Each Low Level Transition takes One Clock Cycle
31
High-Level and Low-Level Computations
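[Figure: a high-level computation with transitions 1, 2, 3 shown against a low-level computation with transitions 1.1, 1.2, 1.3, 2.1, 2.2, 3.1, 3.2, 3.3.]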
32
Specification of High and Low Levels
High level transition 1: in Murphi as a Guard → Action Rule
Low level transitions 1.1, 1.2, 1.3: in HMurphi as Multiple Guard → Action Rules
enclosed in a Begin Transaction / End Transaction
The Guards decide when each low level transition can fire
The maximal set of low level transitions enabled in any state
is fired concurrently within each clock tick
33
Transaction
 A transaction is a set of transitions in
Impl that correspond to a transition in
Spec
Transaction
Rule 1
……
Rule n
Endtransaction;
34
Executions
 Spec: interleaving
 One enabled transition fires at each step
1
2
3
……
 Impl: concurrent
 All enabled transitions fire at each step
{1.1, 2.1}
{1.2}
{2.2, 3.1, 3.2}
……
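The difference between the two execution styles can be illustrated with a small Python sketch. It is an assumption-laden toy, not HMurphi semantics as implemented: rules are modeled as hypothetical (name, guard, action) triples, the interleaving step simply picks one enabled rule by index, and the concurrent step assumes that rules fired together write disjoint variables (it raises an error otherwise).

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
# A rule is (name, guard, action); the action returns the variables it updates.
Rule = Tuple[str, Callable[[State], bool], Callable[[State], State]]


def enabled(rules: List[Rule], state: State) -> List[Rule]:
    return [rule for rule in rules if rule[1](state)]


def interleaving_step(rules: List[Rule], state: State, pick: int = 0):
    """Spec style: exactly one enabled rule fires (here the pick-th one)."""
    candidates = enabled(rules, state)
    if not candidates:
        return state, []
    name, _, action = candidates[pick]
    return {**state, **action(state)}, [name]


def concurrent_step(rules: List[Rule], state: State):
    """Impl style: every enabled rule fires within the same clock tick."""
    new_state = dict(state)
    written = set()
    fired = []
    for name, _, action in enabled(rules, state):
        updates = action(state)  # every rule reads the same pre-state
        clash = written.intersection(updates)
        if clash:
            raise ValueError(f"write/write conflict on {sorted(clash)}")
        written.update(updates)
        new_state.update(updates)
        fired.append(name)
    return new_state, fired


# Toy rules named after the transitions on the slide.
rules: List[Rule] = [
    ("1.1", lambda s: s["a"] == 0, lambda s: {"a": 1}),
    ("2.1", lambda s: s["b"] == 0, lambda s: {"b": 1}),
    ("1.2", lambda s: s["a"] == 1, lambda s: {"a": 2}),
]
start = {"a": 0, "b": 0}
spec_state, spec_fired = interleaving_step(rules, start)
tick1_state, tick1_fired = concurrent_step(rules, start)
tick2_state, tick2_fired = concurrent_step(rules, tick1_state)
```

In this toy run the first clock tick fires {1.1, 2.1} and the second fires {1.2}, whereas the interleaving step fires 1.1 alone.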
35
A Few Notations
 Observable variables: VH
 These are Variables used in both Spec
and Impl
 Impl has additional internal variables also
 A variable v is inactive at a state s if
all transactions in Impl that can write
to v are quiescent at s
36
A Formal Notion of Simulation
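 For every concurrent execution l0, l1, l2, … of
Impl, there exists an interleaving execution
h0, h1, h2, … of Spec such that each li and hi
match on VH ∩ inactive(li)
[Figure: Impl states l0, l1, l2, … paired with Spec states h0, h1, h2, …; the steps are labeled {…} and t0, t1, t2, ….]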
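The notions of inactive variables and of matching on VH ∩ inactive(li) can be made concrete with a short Python sketch. The representation is an assumption for illustration only: `writers` maps each variable to the hypothetical names of the transactions that can write it, and quiescence is supplied as a set of transaction names rather than derived from a real Impl state.

```python
from typing import Dict, Set


def inactive_vars(writers: Dict[str, Set[str]], quiescent: Set[str]) -> Set[str]:
    """Variables for which every transaction that can write them is quiescent."""
    return {var for var, txns in writers.items() if txns <= quiescent}


def states_match(impl_state: dict, spec_state: dict,
                 observable: Set[str], inactive: Set[str]) -> bool:
    """Compare Impl and Spec only on observable variables that are inactive."""
    return all(impl_state[v] == spec_state[v] for v in observable & inactive)


# Hypothetical example: "buf" is an internal Impl variable, T1 and T2 are transactions.
writers = {"State": {"T1"}, "Data": {"T2"}, "buf": {"T1", "T2"}}
observable = {"State", "Data"}
impl_state = {"State": "Excl", "Data": 0, "buf": 7}
spec_state = {"State": "Excl", "Data": 5}

mid_transaction = inactive_vars(writers, quiescent={"T1"})   # T2 still running
all_quiescent = inactive_vars(writers, quiescent={"T1", "T2"})
ok_while_active = states_match(impl_state, spec_state, observable, mid_transaction)
ok_when_done = states_match(impl_state, spec_state, observable, all_quiescent)
```

While T2 is still running, the mismatch on Data is tolerated because Data is active; once every transaction is quiescent the same mismatch makes the check fail.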
37
Simulation Checks
[Figure: commuting diagram. An Impl transaction takes state I to I’; the corresponding Spec transition takes Spec(I) to Spec(I’).]
 I is a reachable state where the commit guard is true
 Guard for Spec transition must hold
 Observable vars changed by either Spec or Impl must match
38
Model Checking Approaches
 Monolithic
 Cross product construction
 Compositional
 Abstraction
 Assume/Guarantee
39
Compositional Approach
 Abstraction
 Change read to an access of an input var
 Self-sourced read
 Add all transitions that write to a var
 Assume/Guarantee
 Require all writes to var guarantee prop P
 Assume P holds on all reads
40
Example of Abstraction
Transaction 1
Transaction 2
……
Transaction
…
Rule (v1 = d1) => ...
…
Endtransaction
Transaction n
41
Example of Assume/Guarantee
Transaction 1: Request granted
State := Excl
…
Impl.State = Spec.State
Data := d
Transaction 2: Update Cache
42
Benchmarks
 High level in FMCAD’04 tutorial
 Low level provided by German and
Janssen
 Sizes:
 1 Home node, 1 remote node
Sizes are constrained by accessible VHDL tools!
43
Implementations
 Muv: HMurphi → VHDL
 Written by German
 Mud:
 Static analyzer for possible conflicts /
dependencies
 VHDL verifier
 IBM RuleBase
44
Preliminary Results
Approaches                  # FlipFlops   # Gates   Time (min)
Monolithic                  212           8574      17
Decomposed: W/W conflicts   108           5763      11
Decomposed: closures        89            2194      3
* This is for datapath = 1 bit
* Intel Xeon CPU 3.0GHz, 2GB memory
45
When Datapath > 1 bit
 Cannot check monolithic approach
 RuleBase 300 F-F academic license restriction
 Decomposed approach
 W/W checks not affected
Datapath bits   # of F-F   # of Gates
1               89         2194
2               97         2380
26              289        6659
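The flip-flop counts in this table happen to lie on a straight line, which a few lines of Python can confirm. The extrapolation at the end is only an illustration under the assumption that the same linear trend continues beyond the measured points; it is not a result reported on the slide, and the names `predicted_ff` and `LICENSE_LIMIT` are made up for the sketch.

```python
# Measured flip-flop counts for the decomposed approach (from the table above).
ff_by_bits = {1: 89, 2: 97, 26: 289}

# Slope estimated from the first two points: flip-flops added per datapath bit.
per_bit = ff_by_bits[2] - ff_by_bits[1]


def predicted_ff(bits: int) -> int:
    """Linear model anchored at the 1-bit measurement."""
    return ff_by_bits[1] + per_bit * (bits - 1)


# The third measured point agrees with the line through the first two.
fits_third_point = predicted_ff(26) == ff_by_bits[26]

# Assumption: the 300 F-F license restriction mentioned on the slide applies
# to each decomposed check, and the linear trend holds past 26 bits.
LICENSE_LIMIT = 300
max_bits = max(b for b in range(1, 65) if predicted_ff(b) <= LICENSE_LIMIT)
print(per_bit, fits_third_point, max_bits)
```

Under these assumptions each extra datapath bit costs 8 flip-flops, so the decomposed checks would stay under the limit only slightly beyond the 26-bit case measured here.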
46
Future Work
 Reduce the cost of W/W conflicts
checking
 Localized reasoning
 Apply to pipeline
 More benchmarks
 Try other VHDL tools
 SixthSense etc.
47
Publications, Software, Models
 FMCAD 2006 paper
 Presentation at Intel
 Journal version of hierarchical coherence protocol verification (under prep)
 TR on Theory of Transaction Based Specification and Verification (under prep)
 Detailed VHDL-level German Protocol developed
 Analysis Framework for HMurphi Developed
 Preliminary Verification Experiments using Cadence IFV, IBM RuleBase, and IBM SixthSense
 Xiaofang Chen’s Summer Internship at IBM T.J. Watson Res. Ctr.
 Robert’s SRC Poster
 Techcon 2007 submission
 There will be more publications during 2007-8 following hiatus due to infrastructure build-up (many delays!)
48