
Protocol Verification with Merci
Mark R. Tuttle and Amit Goel
DTS SCL
Introduction
• I love proof
– Proof is the path to understanding why things work
– But theorem provers are too hard for the masses (even me)
• I advocate model checking at Intel
– It is the path to automated formal verification for the masses
– But model checkers verify without explaining, and don’t scale
• But the world has changed
– Decision procedures and SMT now automate some forms of proof
– Is theorem proving now viable for nonspecialists in product groups?
Slide 2
Our result
• Amit wrote Merci: SMT-based proof checker from SCL
– Systems modeled with guarded commands (like Murphi, TLA+)
– Clean mapping to decision procedures of an SMT solver
• Mark validated a classical distributed algorithm
– A novice: no prior exposure to Merci, little exposure to SMT
– Model done in 3 days, proof done in 3 days, just 9 pages long
– Model looks like ordinary code, invariants explain the algorithm
• Found little need to coach the prover about “obvious” things
Slide 3
Consensus
[Pease, Shostak, Lamport]
[Figure: nodes n1, n2, n3 with inputs 0, 1, 0 exchange messages by message passing and produce outputs 1, 1, 1]
• Validity:
– Each output was an input
• Agreement:
– All outputs are equal
• Termination:
– All nodes choose an output
Slide 4
A shocking result!
[Fischer, Lynch, Paterson]
• Consensus is impossible in an asynchronous system if even one node can fail.
– Asynchronous: no bound on node step time, msg delivery time
– Failure: node just stops (crashes)
• A decade of papers
– Different system models, different failure models
– How fast? How few messages? How many failures?
• Consensus is the “hardest problem” in concurrency! [Herlihy]
– but sometimes it can be solved…
Slide 5
Synchronous model
Computation is a sequence of rounds of message passing.
[Figure: timeline of a node; within round r, nodes send messages, then nodes receive messages, then nodes change state; round r+1 follows]
Slide 6
Crash failures
[Figure: timeline of node n across three rounds]
– n is correct: sends all messages
– n crashes!: sends some messages
– n is silent: sends no messages
At most t nodes can fail.
Slide 7
Algorithm
[Dolev, Strong]
procedure consensus (node n)
    state ← { input }
    for each round r = 1, 2, …, t+1 do
        broadcast state to all nodes
        receive state1, state2, …, statek from other nodes
        state ← state ∪ state1 ∪ state2 ∪ … ∪ statek
    output ← min(state)
Validity: each output was an input
Termination: all nodes choose an output at end of round t+1
Agreement: ???
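The pseudocode above can be exercised with a small simulation. This is only a sketch under stated assumptions: the function name `run_consensus`, the `crash_round` map, and the coin flip that decides which recipients a crashing node still reaches are all invented here for illustration and do not come from the slides.

```python
import random


def run_consensus(inputs, t, crash_round, rng):
    """Simulate t+1 synchronous rounds of the set-flooding algorithm.

    inputs:      dict node -> input value
    crash_round: dict node -> round in which that node crashes
                 (nodes absent from the dict are correct)
    rng:         random.Random used to pick who a crashing node reaches
    Returns the outputs of the correct nodes.
    """
    nodes = sorted(inputs)
    state = {n: {inputs[n]} for n in nodes}
    for r in range(1, t + 2):
        inbox = {n: [] for n in nodes}
        for sender in nodes:
            cr = crash_round.get(sender)
            if cr is not None and cr < r:
                continue  # silent: crashed in an earlier round
            if cr == r:
                # crashing: reaches an arbitrary subset (assumption: coin flips)
                receivers = [m for m in nodes if rng.random() < 0.5]
            else:
                receivers = nodes  # correct in this round: reaches everyone
            for m in receivers:
                inbox[m].append(set(state[sender]))  # snapshot taken at send time
        for n in nodes:
            cr = crash_round.get(n)
            if cr is not None and cr <= r:
                continue  # a crashed node no longer updates its state
            for received in inbox[n]:
                state[n] |= received
    return {n: min(state[n]) for n in nodes if n not in crash_round}


if __name__ == "__main__":
    outputs = run_consensus({"n1": 0, "n2": 1, "n3": 0}, t=1,
                            crash_round={"n2": 1}, rng=random.Random(0))
    print(outputs)
```

Running many random crash schedules with at most t crashes and checking that all returned outputs are equal is a quick way to build intuition for Agreement before attempting a proof.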
Slide 8
Clean round: no nodes fail
[Dwork, Moses]
Clean round!
• There is a clean round in t+1 rounds (at most t failures).
• Nodes have same state after a clean round.
• Nodes choose same output value min(state). Agreement!
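The counting behind the first bullet can be spelled out as a short pigeonhole argument. Here $f_r$ denotes the number of nodes that crash in round $r$; this symbol is introduced only for illustration and does not appear in the slides.

```latex
% Assumption: f_r = number of nodes that crash in round r (notation introduced here).
\begin{align*}
  &\text{At most } t \text{ nodes fail:}
    && \sum_{r=1}^{t+1} f_r \le t \\
  &\text{If no round were clean, then } f_r \ge 1 \text{ for every } r\text{:}
    && \sum_{r=1}^{t+1} f_r \ge t+1 > t \\
  &\text{Contradiction, so}
    && f_r = 0 \text{ for some } r \in \{1, \dots, t+1\}
\end{align*}
```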
Slide 9
Merci
• A typed procedural language
• Guarded commands used to describe systems
[Amit Goel]
type node
var array(node, bool) y = mk_array[node](false)
var array(node, bool) critical = mk_array[node](false)
var node turn

transition unit req_critical (node n)
  require (!y[n])
  { y[n] := true; }

transition unit enter_critical (node n)
  require (y[n] && !critical[n] && turn = n)
  { critical[n] := true; }

transition unit exit_critical (node n)
  require (critical[n])
  { critical[n] := false; y[n] := false; nondet turn; }
Merci
• A typed procedural language
• Guarded commands used to describe systems
• A goal description language for compositional reasoning
[Amit Goel]
def bool mutex =
  (node n1, node n2)
  (critical[n1] && critical[n2] => n1 = n2)
def bool aux =
  (node n)
  (critical[n] => turn = n)
goal g0 = invariant mutex assuming aux
goal g1 = invariant aux
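To see why `aux` is a useful helper for `mutex`, the mutual-exclusion model and its two goals can be checked by brute force on a tiny instance. The sketch below is not how Merci works (Merci hands the obligations to an SMT solver); it simply enumerates every state for three nodes. The reading of the goals used here (`aux` holds initially and is preserved by every transition, and `aux` implies `mutex` in every state) is one plausible interpretation, and all Python names are made up for illustration.

```python
from itertools import product

NODES = range(3)  # assumption: three nodes are enough for illustration


def all_states():
    """Enumerate every assignment to (y, critical, turn), reachable or not."""
    flags = list(product([False, True], repeat=len(NODES)))
    for y in flags:
        for critical in flags:
            for turn in NODES:
                yield (y, critical, turn)


def with_value(flags, n, value):
    """Return a copy of the tuple `flags` with position n set to value."""
    return flags[:n] + (value,) + flags[n + 1:]


def successors(state):
    """Apply every enabled guarded command for every node."""
    y, critical, turn = state
    for n in NODES:
        if not y[n]:  # req_critical
            yield (with_value(y, n, True), critical, turn)
        if y[n] and not critical[n] and turn == n:  # enter_critical
            yield (y, with_value(critical, n, True), turn)
        if critical[n]:  # exit_critical; "nondet turn" tries every value
            for new_turn in NODES:
                yield (with_value(y, n, False),
                       with_value(critical, n, False), new_turn)


def aux(state):
    _, critical, turn = state
    return all(not critical[n] or turn == n for n in NODES)


def mutex(state):
    _, critical, _ = state
    return all(not (critical[a] and critical[b]) or a == b
               for a in NODES for b in NODES)


def is_initial(state):
    y, critical, _ = state  # turn is left unconstrained
    return not any(y) and not any(critical)


aux_holds_initially = all(aux(s) for s in all_states() if is_initial(s))
aux_is_preserved = all(aux(s2) for s in all_states() if aux(s)
                       for s2 in successors(s))
aux_implies_mutex = all(mutex(s) for s in all_states() if aux(s))
print(aux_holds_initially, aux_is_preserved, aux_implies_mutex)
```

The check ranges over all states that satisfy `aux`, not only reachable ones, which mirrors the inductive style of reasoning a proof checker relies on.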
Merci
• A typed procedural language
• Guarded commands used to describe systems
• A goal description language for compositional reasoning
• A template system for extending the language
[Amit Goel]
template <type elem> Set {
  type t // set type
  const bool mem (elem x, t s)
  const t add (elem x, t s)
  const t remove (elem x, t s)
  axiom mem_add = (elem x, elem y, t s)
    (mem (x, add (y, s)) = (x = y || mem (x, s)))
  axiom mem_remove = (elem x, elem y, t s)
    (mem (x, remove (y, s)) = (x != y && mem (x, s)))
}
type node
module Node = Set<type node>
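The two axioms pin down how `mem` interacts with `add` and `remove`. As a sanity check, the sketch below tests them against Python's built-in `frozenset` over a three-element universe. The universe and the helper names are assumptions chosen for illustration; the check says nothing about how Merci or the SMT solver treats the axioms internally.

```python
from itertools import combinations, product

UNIVERSE = (0, 1, 2)  # assumption: a tiny element type for illustration


def mem(x, s):
    return x in s


def add(x, s):
    return s | {x}


def remove(x, s):
    return s - {x}


def all_sets():
    """Every subset of UNIVERSE as a frozenset."""
    for size in range(len(UNIVERSE) + 1):
        for members in combinations(UNIVERSE, size):
            yield frozenset(members)


def mem_add_holds():
    # mem(x, add(y, s)) = (x = y || mem(x, s))
    return all(mem(x, add(y, s)) == (x == y or mem(x, s))
               for x, y in product(UNIVERSE, repeat=2) for s in all_sets())


def mem_remove_holds():
    # mem(x, remove(y, s)) = (x != y && mem(x, s))
    return all(mem(x, remove(y, s)) == (x != y and mem(x, s))
               for x, y in product(UNIVERSE, repeat=2) for s in all_sets())


print(mem_add_holds(), mem_remove_holds())
```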
Crash failure model
[Figure labels: faulty, silent]
def bool is_crash_behavior
  (Nodes crashed, Nodes crashing, message_pattern deliver) =
  ∀ (node p) (p ∈ crashed => is_silent(p,deliver)) &&
  ∀ (node p) (is_faulty(p,deliver) => p ∈ crashed || p ∈ crashing) &&
  Nodes.disjoint(crashed,crashing) &&
  Nodes.cardinality(crashed) + Nodes.cardinality(crashing) ≤ t
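The predicate above reads naturally as an executable check. The sketch below is a hedged Python rendering: the slides do not define `is_silent` or `is_faulty`, so the definitions here (silent means delivering to nobody, faulty means failing to deliver to somebody) are assumptions, as is the representation of `deliver` as a dictionary keyed by (sender, receiver).

```python
def is_silent(p, deliver, nodes):
    """Assumed meaning: p delivers no message to anyone."""
    return not any(deliver[p, q] for q in nodes)


def is_faulty(p, deliver, nodes):
    """Assumed meaning: p fails to deliver at least one message."""
    return not all(deliver[p, q] for q in nodes)


def is_crash_behavior(crashed, crashing, deliver, nodes, t):
    """Python rendering of the four conjuncts of the Merci predicate."""
    return (
        all(is_silent(p, deliver, nodes) for p in crashed)
        and all(p in crashed or p in crashing
                for p in nodes if is_faulty(p, deliver, nodes))
        and crashed.isdisjoint(crashing)
        and len(crashed) + len(crashing) <= t
    )


if __name__ == "__main__":
    nodes = [1, 2, 3]
    deliver = {(p, q): True for p in nodes for q in nodes}
    deliver[1, 2] = False  # node 1 drops one message in this round
    print(is_crash_behavior(set(), {1}, deliver, nodes, t=1))
```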
Slide 13
Synchronous model
for each node p
    initialize state of p
for each round r
    for each p and q
        send msg from p to q
    for each p and q
        receive msg from p to q
    for each p
        update state of p

phase | program counter | algorithm
init  | init[p]         | how?
send  | send[p][q]      | what?
recv  | recv[p][q]      | how?
comp  | comp[p]         | how? decide? decide!
Slide 14
Synchronous model
• Transitions
– initialize(p)
    init[p] ← true
– start_send
    increment round
    send[p][q] ← false
    recv[p][q] ← false
    comp[p] ← false
    phase ← send
– send(p,q)
    send[p][q] ← true
– start_recv
    phase ← recv
– recv(p,q)
    recv[p][q] ← true
– start_comp
    phase ← comp
– comp(p)
    comp[p] ← true
is_init_phase =
    phase = init
init_phase_done =
    forall (node p) (init[p])
Slide 15
transition start_sending ()
  require ( is_init_phase && init_phase_done ||
            is_comp_phase && comp_phase_done)
{
  "send[p][q], recv[p][q], comp[p] <= false"
  "messages[p][q] <= null_message"
  round := round + 1;
  phase := send;
  crashed := Nodes.union(crashed,crashing);
  nondet crashing;
  nondet deliver;
  assume is_crash_behavior(crashed,crashing,deliver);
}
Slide 16
transition send (node n, node m)
  require (is_send_phase)
  require (!send[n][m])
{
  messages[n][m] :=
    (deliver[n][m] ? global_state[n] : null_message);
  send[n][m] := true;
}
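For readers more used to ordinary code, the `send` transition can be mimicked as a guarded function: it does nothing unless its guard holds, and otherwise records either the sender's state or a null message. This is a sketch only; the `Model` class, its field names, and the use of `None` as the null message are assumptions made for the example.

```python
from dataclasses import dataclass, field

NULL_MESSAGE = None  # assumption: stand-in for Merci's null_message


@dataclass
class Model:
    phase: str            # "init", "send", "recv", or "comp"
    global_state: dict    # node -> set of values
    deliver: dict         # (sender, receiver) -> bool, chosen per round
    messages: dict = field(default_factory=dict)
    sent: dict = field(default_factory=dict)  # plays the role of send[n][m]


def send(model, n, m):
    """Guarded command: returns False (and changes nothing) if not enabled."""
    enabled = model.phase == "send" and not model.sent.get((n, m), False)
    if not enabled:
        return False
    model.messages[n, m] = (
        model.global_state[n] if model.deliver[n, m] else NULL_MESSAGE
    )
    model.sent[n, m] = True
    return True


if __name__ == "__main__":
    model = Model(phase="send",
                  global_state={"a": {0}, "b": {1}},
                  deliver={("a", "b"): True, ("b", "a"): False})
    print(send(model, "a", "b"), model.messages)
```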
Transition size
initialize(p)    8 lines
start_send()    16 lines
send(p,q)        9 lines
start_recv()     5 lines
recv(p,q)        7 lines
start_comp()     5 lines
comp(p)         13 lines
Slide 17
Agreement proof
• Recall the agreement proof
– A1: There is a clean round
– A2: All states are equal at the end of a clean round
– A3: All states remain equal after a clean round
– A4: All nodes choose from their states the same output value
• Merci proof is short
– A1: 7 lines
– A2: 127 lines
– A3: 12 lines
– A4: 25 lines
• Merci proof is almost entirely at the algorithmic level
Slide 18
A1: There is a clean round
def bool clean_round_by_round_t_plus_1 =
  round >= t+1 => !before_clean
def bool faulty_grows_until_clean_round =
  before_clean => Nodes.cardinality(faulty) >= round
goal clean1 = invariant faulty_grows_until_clean_round
goal clean2 = invariant clean_round_by_round_t_plus_1
  assuming faulty_grows_until_clean_round
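The two invariants can be checked exhaustively on small instances, which helps confirm that they say what is intended before asking a prover to establish them. The sketch below assumes a particular reading that the slides do not spell out: `faulty` is the set of nodes that have crashed by the end of the current round, `before_clean` is true until the first round with no new crash has completed, and both invariants are evaluated at the end of each round. The function name is hypothetical.

```python
from itertools import product


def check_clean_round_invariants(num_nodes, t):
    """Check both invariants over every crash schedule with at most t crashes."""
    rounds = range(1, t + 2)
    # Each node either never crashes (None) or crashes in exactly one round.
    for schedule in product([None, *rounds], repeat=num_nodes):
        crash_rounds = [r for r in schedule if r is not None]
        if len(crash_rounds) > t:
            continue  # violates "at most t nodes can fail"
        faulty_count = 0
        before_clean = True
        for r in rounds:
            new_failures = crash_rounds.count(r)
            faulty_count += new_failures
            if new_failures == 0:
                before_clean = False  # round r was clean
            # faulty_grows_until_clean_round
            if before_clean and faulty_count < r:
                return False
            # clean_round_by_round_t_plus_1
            if r >= t + 1 and before_clean:
                return False
    return True


print(check_clean_round_invariants(num_nodes=4, t=2))
```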
Slide 19
A2: All states equal …
def bool state_equality =
  ∀ (node n, node m)
  (noncrashed(n) && noncrashed(m) => state[n] = state[m])
def bool state_equality_in_clean =
  in_clean && send_phase_done && recv_phase_done =>
  state_equality
• Proof
– A2.1: If nonfaulty n has v, then n received v in a message
– A2.2: That message was sent to everyone since the round is clean
– A2.3: If m received v in a message, then m has v
– A2.4: So nonfaulty n and m have the same values
• Proof is algorithmic and short: the four steps take 48, 34, 15, and 30 lines
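Steps A2.1 to A2.4 can be traced in a few lines of code by modelling a single clean round message by message. This is an illustrative sketch, not the Merci model: it assumes every noncrashed node sends its state to every noncrashed node (itself included) and that a receiver's new state is the union of what it received. The function names are invented for the example.

```python
def run_clean_round(state, noncrashed):
    """One round in which no node fails: all messages are delivered."""
    # A2.2: in a clean round each sender's state reaches every receiver.
    messages = {(n, m): set(state[n]) for n in noncrashed for m in noncrashed}
    new_state = {}
    for m in noncrashed:
        received = set()
        for n in noncrashed:
            received |= messages[n, m]  # A2.3: received values are kept
        new_state[m] = received         # A2.1: every held value was received
    return new_state


def state_equality(state, noncrashed):
    """A2.4: all noncrashed nodes hold the same set of values."""
    return all(state[n] == state[m] for n in noncrashed for m in noncrashed)


if __name__ == "__main__":
    before = {"n1": {0}, "n2": {0, 1}, "n3": {2}}
    after = run_clean_round(before, ["n1", "n2", "n3"])
    print(state_equality(before, before.keys()), state_equality(after, after.keys()))
```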
Slide 20
Conclusion
• Classical fault-tolerant distributed algorithm proved w/Merci
– Model looks like ordinary code, invariants explain the algorithm
– Merci proof is about 170 lines; classical proof is 1+ page
– Model and proof done in 6 days with no prior experience
• Yices made quantification hard
– exists: usually have to produce the example by hand
– forall: template instantiation wouldn’t find the right instantiation
• Yices counterexamples mostly useless
– Get a context from first few lines, ignore the rest
– “Is property false or is Yices failing to instantiate a forall template?”
– BKM (best known method): Think about the algorithm itself, and ignore Yices output
Slide 21