Constraint-based Program Reasoning with Heaps and Separation

Constraint-based Program Reasoning with Heaps
and Separation
Gregory J. Duck, Joxan Jaffar, and Nicolas C. H. Koh
National University of Singapore
Motivation
Reasoning about Heaps is hard
I
{x = 4, y = 4}
x = x + 1 {x = 5, y = 4}
I
{∗x = 4, ∗y = 4}
I
Need to reason about x = y (aliasing) at verification time
I
Much harder when
(a) using malloc() and free()
(b) considering recursive data structures
*x = *x + 1 {∗x = 5, ∗y =???}
Contribution
I
I
A basis for automatic reasoning of
heap-manipulating programs
Method: Symbolic Execution
Background
I
Naive Approach: heap-as-a-giant-array
+ Constraint based (use an array constraint-solver)
− tedious to express ”separation”, e.g.
list(H, l, F 1) ∧ tree(H, t, F 2) ∧ disjoint(F 1, F 2)
− Difficult to solve
I
Separation Logic was a breakthrough
+ supports local reasoning (Hoare triples, Frame Rule)
+ Separation is explicated at the top-level on ONE heap:
list(l) ∗ tree(t)
− Defined as a logic
No general algorithms (sym. execution, solving)
I
Our Contribution:
Reformulate Separation Logic into a constraint language:
+ Constraint-based (like arrays)
+ Representation of multilple heaps, and separation
+ Program reasoning becomes constraint solving; have a solver
Part I
A Constraint Language for Heaps
Heaps
I
DEFINITION: A heap is a finite partial map between integers
Heaps = Values *fin Values
I
DEFINITION (alternative): A heap h is a set of tuples such that
∀p, v , w : (p, v ) ∈ h ∧ (p, w ) ∈ w =⇒ v = w
H-Language
I
DEFINITION: H is a first-order language over the Heaps.
1. (Empty Heap):
∅
def
=
a Heap with no elements
2. (Singleton Heap):
p 7→ v
def
=
a Heap with exactly one element (p, v )
3. (Separation)
(
(H l H1 ∗ . . . ∗ Hn )
I
def
=
NOTE: (l) 6≡ (=)
(l is partial equality w.r.t. (∗))
Heaps H1 , .., Hn are separate/disjoint
H = H1 ∪ .. ∪ Hn as sets.
Program Reasoning with H
I
DEFINE: H̄ ∈ Heaps as the Program Heap
I
Standard memory operations can be mapped to H:
C Syntax
H Encoding
v = p[0];
p = malloc(1);
free(p);
∃H : H̄ l (p 7→ v )∗H
∃H, v : H̄ l (p 7→ v )∗H
∃H, v : H l((p 7→ v )∗H̄
H l (p 7→ w )∗H 0
∃H, H 0 , w :
H̄ l (p 7→ v )∗H 0
p[0] = v;
Hoare Triples
I
We use Hoare Triples to reason about programs.
{Precondition} Code {Postcondition}
where
1. Precondition = proposition about the initial state
2. Code = code/program fragment
3. Postcondition = proposition about the final state
I
EXAMPLES:
{x < y } x := x + 1 {x ≤ y }
..
.
Note: (Hoare Logic is traditionally heap-free.)
X
Hoare Triples (cont.)
I
Access:
{φ} x := [y ] {∃x 0 , H 0 : H̄ l (y 7→ x)∗H 0 ∧ φ[x 0 /x]}
I
Assignment:
{φ} [x] := y {∃H 0 , H 00 , v : ∧
I
H 0 l (x 7→ v )∗H 00
∧ φ[H 0 /H̄]}
H̄ l (x 7→ y )∗H 00
Allocation:
{φ} x := alloc(1) {∃x 0 , v , H 0 : H̄ l (x 7→ v )∗H 0 ∧ φ[H 0 /H̄, x 0 /x]}
I
Deallocation:
{φ} free(x) {∃H 0 , v : H 0 l (x 7→ v )∗H̄ ∧ φ[H 0 /H̄]}
Symbol Execution with H
I
Hoare triples are in “Strongest Post Condition” (SPC) form
∀φ : {φ} Code {SPC (Code, φ)}
I
SPC =⇒ Automation via Symbolic Execution.
PROVE:
{P} Code {Q}
STEPS:
1. Use Hoare rules to compute SPC (Code, P);
2. Prove (via a theorem prover) that
SPC (Code, P) → Q
3. QED
Symbolic Execution with H (cont.)
I
EXAMPLE: {H l H̄} x := alloc(); free(x) {H l H̄}
I
Use Symbolic Execution to compute the SPC:
(1)
{H l H̄} x := alloc(); free(x)
x := alloc(); {H l H0 ∧ H̄ l (x 7→ )∗H0 }
free(x)
x := alloc(); free(x) {H l H0 ∧ H1 l (x 7→ )∗H0 ∧ H1 l (x 7→ )∗H̄}
Since
H l H0 ∧ H1 l (x 7→ )∗H0 ∧ H1 l (x 7→ )∗H̄ → H l H̄
I
Triple (1) holds; QED
...but how to prove (2)?
(2)
Part II
A Solver for H
A Solver for H
I
Symbolic Execution generates Verification Conditions of the form
SPC (C , P) → Q, e.g.:
H l H0 ∧ H1 l (x 7→ )∗H0 ∧ H1 l (x 7→ )∗H̄ → H l H̄
holds iff
H l H0 ∧ H1 l (x 7→ )∗H0 ∧ H1 l (x 7→ )∗H̄ ∧ H 6l H̄
is UNSAT.
I
Approach:
I
I
I
STEP 1: Normalization
STEP 2: Constraint solver (hsolve) for flat H-formulae
STEP 3: DPLL(hsolve) for the Boolean structure.
STEP 1: Normalization
I
I
W.l.o.g. we can restrict H to three basic constraints:
Description
Constraint
(Heap Empty)
(Heap Singleton)
(Heap Separation)
Hl∅
H l (p 7→ v )
H l H1 ∗H2
THEOREM: We can normalize arbitrary H-formulae to these basic
constraints
H l H0 ∧ H1 l (x 7→ )∗H0 ∧ H1 l (x 7→ )∗H̄ ∧ H 6l H̄
↓
T1 l ∅ ∧ H l H0 ∗T1 ∧ T2 l (x 7→ ) ∧ H1 l T2 ∗H0 ∧ T3 l (x 7→ ) ∧ H1 l T3 ∗H̄ ∧
(T4 l (s 7→ t) ∧ T5 l (s 7→ u) ∧ H l T4 ∗T6 ∧ H̄ l T5 ∗T7 ∧ t 6= u ∨
H l T8 ∗T9 ∧ H̄ l T8 ∗T10 ∧ T11 l T9 ∗T10 ∧ T12 l (x 7→ y ) ∧ T11 l T12 ∗T13 )
STEP 1: Normalization (cont.)
PROOF: H Normalization Rules (see paper)
0
0
H l E1 ∗ E2 ∗ S −→ H l E1 ∗ E2 ∧ H l H ∗ S
0
0
H l E1 ∗ E2 −→ H l E1 ∧ H l H ∗ E2
(E1 non-variable)
0
0
(E2 non-variable)
0
0
H l H1 ∗ E2 −→ H l E2 ∧ H l H1 ∗ H
H1 l H2 −→ H l ∅ ∧ H1 l H2 ∗ H
(
E1 l (s 7→ t) ∗ H10 ∧ E2 l (s 7→ u) ∗ H20
H 6l E1 ∗ E2 ∗ S −→ ∨
H 0 l E1 ∗ E2 ∧ H 6l H 0 ∗ S
0
0
H 6l E1 ∗ E2 −→ H l E1 ∧ H 6l H ∗ E2
(E1 non-variable)
0
(E2 non-variable)
H 6l H1 ∗ E2 −→ H l E2 ∧ H 6l H1 ∗ H
0
0
H 6l ∅ −→ H l (s 7→ t) ∗ H
(
H l∅
H 6l (p 7→ v ) −→ ∨
H l (s 7→ t) ∗ H 0 ∧ (p 6= s ∨ v 6= t)
(
H1 l (s 7→ t) ∗ H10 ∧ H2 l (s 7→ u) ∗ H20
H 6l H1 ∗ H2 −→ ∨
H 0 l H1 ∗ H2 ∧ H 6l H 0
(
H1 l (s 7→ t) ∗ H10 ∧ H2 l (s 7→ u) ∗ H20 ∧ t 6= u
H1 6l H2 −→ ∨
H1 l I ∗ H10 ∧ H2 l I ∗ H20 ∧ H 0 l H10 ∗ H20 ∧ H 0 6l ∅
STEP 2: H-Solver for Flat Constraints
I
Basic idea: propagate heap membership constraints; define:
in(H, p, v )
I
def
=
(p, v ) ∈ H
Heap membership propagation rules:
I
Functional Dependency:
in(H, p, v ) ∧ in(H, p, w ) =⇒ v = w
I
Empty Heap:
H l ∅ ∧ in(H, p, v ) =⇒ false
I
Singleton Heap:
H l (p 7→ v ) =⇒ in(H, p, v )
H l (p 7→ v ) ∧ in(H, q, w ) =⇒ p = q ∧ v = w
STEP 2: H-Solver (cont.)
I
Separation:
H l H1 ∗H2 ∧ in(H, p, v ) =⇒ in(H1 , p, v ) ∨ in(H2 , p, v )
H l H1 ∗H2 ∧ in(H1 , p, v ) =⇒ in(H, p, v )
H l H1 ∗H2 ∧ in(H2 , p, v ) =⇒ in(H, p, v )
H l H1 ∗H2 ∧ in(H1 , p, v ) ∧ in(H2 , q, w ) =⇒ p 6= q
I
H-Solver Algorithm (hsolve) = Constraint Handling Rules with
Disjunction
“Given a constraint store S, repeatedly apply propagation
rules until a fixed point is reached.”
Disjunction is handled by branching and backtracking.
STEP 2: H-Solver Algorithm
in(H, p, v ) ∧ in(H, p, w ) =⇒ v = w
H l ∅ ∧ in(H, p, v ) =⇒ false
H l (p 7→ v ) =⇒ in(H, p, v )
H l (p 7→ v ) ∧ in(H, q, w ) =⇒ p = q ∧ v = w
H l H1 ∗H2 ∧ in(H, p, v ) =⇒ in(H1 , p, v ) ∨ in(H2 , p, v )
H l H1 ∗H2 ∧ in(H1 , p, v ) =⇒ in(H, p, v )
H l H1 ∗H2 ∧ in(H2 , p, v ) =⇒ in(H, p, v )
H l H1 ∗H2 ∧ in(H1 , p, v ) ∧ in(H2 , q, w ) =⇒ p 6= q
H l (p 7→ v ) , H l I ∗J, J l (p 7→ w ), v 6= w
H l (p 7→ v ), H l I ∗J, J l (p 7→ w ) , v 6= w , in(H, p, v )
H l (p 7→ v ), H l I ∗J , J l (p 7→ w ), v 6= w , in(H, p, v ), in(J, p, w )
H l (p 7→ v ), H l I ∗J, J l (p 7→ w ), v 6= w , in(H, p, v ) , in(J, p, w ), in(H, p, w )
H l (p 7→ v ), H l I ∗J, J l (p 7→ w ), v 6= w , in(H, p, v ), in(J, p, w ), v = w
false
∴ Goal is UNSAT.
STEP 2: Main H-Solver Results
Theorem (Soundness)
The H-Solver is sound.
Proof: By the correctness of the CHR rules.
Theorem (Completeness)
The H-Solver is complete.1
Proof: (see paper)
1. Assumes complete equality theory
STEP 3: DPLL(hsolve)
I
DPLL(hsolve) for non-conjunctive goals, e.g.
T1 l ∅ ∧ H l H0 ∗T1 ∧ T2 l (x 7→ ) ∧ H1 l T2 ∗H0 ∧ T3 l (x 7→ ) ∧ H1 l T3 ∗H̄ ∧
(T4 l (s 7→ t) ∧ T5 l (s 7→ u) ∧ H l T4 ∗T6 ∧ H̄ l T5 ∗T7 ∧ t 6= u ∨
H l T8 ∗T9 ∧ H̄ l T8 ∗T10 ∧ T11 l T9 ∗T10 ∧ T12 l (x 7→ y ) ∧ T11 l T12 ∗T13 )
↓
b1 ∧ b2 ∧ b3 ∧ b4 ∧ b5 ∧ b6 ∧ (b7 ∧ b8 ∧ b9 ∧ b10 ∧ ¬b11 ∨ b12 ∧ b13 ∧ b14 ) ∧
b1 ↔ T1 l ∅ ∧ b2 ↔ H l H0 ∗T1 ∧ b3 ↔ T2 l (x 7→ ) ∧ b4 ↔ H1 l T2 ∗H0 ∧
b5 ↔ T3 l (x 7→ ) ∧ b6 ↔ H1 l T3 ∗H̄ ∧ b7 ↔ T4 l (s 7→ t) ∧ b8 ↔ T5 l (s 7→ u) ∧
b9 ↔ H l T4 ∗T6 ∧ b10 ↔ H̄ l T5 ∗T7 ∧ b11 ↔ t = u ∧ b12 ↔ T11 l T9 ∗T10 ∧
b13 ↔ T12 l (x 7→ y ) ∧ b14 ↔ T11 l T12 ∗T13
I
DPLL(H) implemented in Satisfiability Modulo Constraint Handling
Rules (SMCHR).
Details/Download:
http://www.comp.nus.edu.sg/~gregory/smchr.html
STEP 3: DPLL(hsolve) (cont.)
I
EXAMPLE (complete):
$ ./smchr -s heaps,linear,eq
> emp(T 1) /\ sep(H, H 0, T 1) /\ one(T 2, x, v0) /\
sep(H 1, T 2, H 0) /\ one(T 3, x, v1) /\ sep(H 1, T 3, Heap) /\
((one(T 4, s, t) /\ one(T 5, s, u) /\ sep(H, T 4, T 6) /\
sep(Heap, T 5, T 7) /\ t != u) \ /
(sep(H, T 8, T 9) /\ sep(Heap, T 8, T 10) /\ sep(T 11, T 9, T 10) /\
one(T 12, x, y) /\ sep(T 11, T 12, T 13)))
UNSAT
Therefore:
T1 l ∅ ∧ H l H0 ∗T1 ∧ T2 l (x 7→ ) ∧ H1 l T2 ∗H0 ∧ T3 l (x 7→ ) ∧ H1 l T3 ∗H̄ ∧
(T4 l (s 7→ t) ∧ T5 l (s 7→ u) ∧ H l T4 ∗T6 ∧ H̄ l T5 ∗T7 ∧ t 6= u ∨
H l T8 ∗T9 ∧ H̄ l T8 ∗T10 ∧ T11 l T9 ∗T10 ∧ T12 l (x 7→ y ) ∧ T11 l T12 ∗T13 )
is UNSAT. Therefore:
H l H0 ∧ H1 l (x 7→ )∗H0 ∧ H1 l (x 7→ )∗H̄ → H l H̄
is VALID.Therefore:
{H l H̄} x := alloc(); free(x) {H l H̄}
Part III
Experiments
Experimental Results
I
BENCHMARKS:
1.
2.
3.
4.
5.
6.
7.
subsets N - sum-of-subsets
expr N - expression evaluation
stack N - stack
filter N - TCP/IP filtering
sort N - Bubblesort
search234 N - 234-tree search
insert234 N - 234-tree insert
TRIPLES:
(F)
(OP)
(A)
(∅)
{H̄ l (p 7→ v )∗F } C {∃F 0 : H̄ l (p 7→ v )∗F 0 }
{H l H̄} C {H OP H̄}
{. . .} C {∃F 0 , v : H̄ l (p 7→ v )∗F 0 }
{H̄ l ∅} C {false}
where OP ∈ {v, w, l}
I
We compare SMCHR(H) vs. Verifast (Separation Logic).
Experimental Results (cont.)
Bench.
subsets 16
expr 2
stack 80
filter 1
filter 2
sort 6
search234 3
search234 5
insert234 5
expr 2
stack 80
filter 2
stack 80
insert234 5
subsets 16
Safety
F
F
F
F
F
F
F
F
F
v
v
OP
A
A
∅
LOC
50
69
976
192
321
178
251
399
839
69
976
321
976
839
50
type
rwrwrwa
r-r-rwr-r-rwa
rwrwa
r-rwa
rwa
rw-
Heaps
time(s)
#bt
0.00
17
0.05
124
8.66
320
0.03
80
0.11
307
0.03
54
0.02
46
0.05
76
1.19
120
0.20
1329
8.07
322
0.00
2
8.90
320
1.50
60
0.00
33
Verifast
time(s)
#forks
10.69
65546
18.38
136216
68.20
9963
0.75
8134
–
–
2.66
35909
0.67
1459
90.65
118099
52.87
36885
n.a.
n.a.
n.a.
n.a.
n.a.
n.a.
65.68
9801
40.64
55423
n.a.
n.a.
Experimental Results (cont.)
I
RESULTS:
1. Interpolation: Constraint-based approach allows for search-space
pruing a la no-good learning/interpolation.
2. Expressivity: E.g. the (heap equivalence) triple:
{H l H̄} C {H l H̄}
cannot be directly expressed in Verifast/Separation Logic.
Conclusion
I
Constraint language for expressive formulas about heaps
I
Can conjoin with recursive definitions of heaps and other variables
I
By symbolic execution, provided a basis for automatic program
verification
I
Practical solver because of reduction into simple set-based reasoning