Refinements to techniques for verifying shape analysis invariants in Coq
Kenneth Roe
GBO Presentation
9/30/2013
The Johns Hopkins University
Summary
• Formal methods for imperative languages such as C
– Many bugs are caused by corruption of data structures
– Use formal methods to document data structure invariants and then verify correct program execution
– Framework being developed in the Coq interactive theorem prover
Research contribution
• Extension of Coq-based program verification to larger programs with more complex data structures.
– Existing systems only work on small examples
A tree traversal example in C
#include <stdlib.h>

struct list {
    struct list *fld_n;   /* next list element */
    struct tree *fld_t;   /* tree node this element refers to */
};

struct tree {
    struct tree *fld_l, *fld_r;
    int value;
};

struct list *p;

void build_pre_order(struct tree *r) {
    struct list *i = NULL, *n;
    struct tree *t = r, *x;
    p = NULL;
    while (t) {
        n = p;
        p = malloc(sizeof(struct list));
        p->fld_t = t;
        p->fld_n = n;
        if (t->fld_l==NULL && t->fld_r==NULL) {
            if (i==NULL) {
                t = NULL;
            } else {
                struct list *tmp = i->fld_n;
                t = i->fld_t;
                free(i);
                i = tmp;
            }
        } else if (t->fld_r==NULL) {
            t = t->fld_l;
        } else if (t->fld_l==NULL) {
            t = t->fld_r;
        } else {
            n = i;
            i = malloc(sizeof(struct list));
            i->fld_n = n;
            x = t->fld_r;
            i->fld_t = x;
            t = t->fld_l;
        }
    }
}
What this program does
[Figure: 13 animation frames stepping through build_pre_order on an example tree whose nodes are labelled 10, 12, 14, 16, 18 and 20. Each frame marks the tree root r, the current node t, and the heads of the lists p and i; both lists are terminated by Nil.]
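To make the animation concrete, the sketch below re-runs the same loop in Python. The tree shape is taken from the heap on the later "Program state" slides (node 10 has children 12 and 18, node 12 has children 14 and 16, node 18 has only a left child 20). The class names Tree and ListNode and the helper labels are illustrative assumptions, not part of the original C program, and nothing is freed explicitly because Python is garbage collected.

```python
class Tree:
    def __init__(self, label, left=None, right=None):
        self.label, self.fld_l, self.fld_r = label, left, right


class ListNode:
    def __init__(self, fld_t, fld_n):
        self.fld_t, self.fld_n = fld_t, fld_n


def build_pre_order(r):
    """Mirror of the C loop; returns the head of list p."""
    i = None   # pending right subtrees (used as a stack)
    p = None   # visited nodes, most recent first
    t = r
    while t is not None:
        p = ListNode(t, p)                    # record the visit
        if t.fld_l is None and t.fld_r is None:
            if i is None:
                t = None                      # nothing left to visit
            else:
                t, i = i.fld_t, i.fld_n       # pop the next pending subtree
        elif t.fld_r is None:
            t = t.fld_l
        elif t.fld_l is None:
            t = t.fld_r
        else:
            i = ListNode(t.fld_r, i)          # remember the right subtree
            t = t.fld_l
    return p


def labels(node):
    """Labels of the tree nodes referenced by a list, head first."""
    out = []
    while node is not None:
        out.append(node.fld_t.label)
        node = node.fld_n
    return out


example = Tree(10, Tree(12, Tree(14), Tree(16)), Tree(18, Tree(20)))
visited = labels(build_pre_order(example))
print(visited)
```

Because each visited node is prepended to p, the list ends up in reverse pre-order.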
Invariants to be formally proven
• The program maintains two well-formed linked lists, the heads of which are pointed to by i and p.
– By well formed we mean that memory on the heap is properly allocated for the lists and there are no loops in the data structures.
• The program maintains a well-formed tree pointed to by r.
• t is either NULL or points to an element in the tree rooted at r.
• The two lists and the tree do not share any nodes.
• Other than the memory used for the two lists and the tree, no other heap memory is allocated.
• The fld_t field of every element in both list structures points to an element in the tree.
Coq Goal
{?}
WHILE not (T == 0)
DO N := P;
NEW P, 2;
...
{?}
Program state
Environment
e = { R → 10, I → 30, P → 40, T → 10 }
Heap
h = {10 → 12, 11 → 18, 12 → 14, 13 → 16, 14 → 0, 15 → 0,
16 → 0, 17 → 0, 18 → 20, 19 → 0, 20 → 0, 21 → 0, …}
Program state
Environment
e = { R → 10, I → 30, P → 40, T → 10 }
Heap
h = {10 → 12, 11 → 18, 12 → 14, 13 → 16, 14 → 0, 15 → 0,
16 → 0, 17 → 0, 18 → 20, 19 → 0, 20 → 0, 21 → 0}
struct tree {
    struct tree *left;
    struct tree *right;
};
∃v0 . (e,h) ⊨ TREE(R,v0,2,[0,1])
v0 = [10,
      [12, [14,[0],[0]], [16,[0],[0]]],
      [18, [20,[0],[0]], [0]]]
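The relation between the heap h and the functional representation v0 can be illustrated with a small Python sketch. It assumes that TREE(root, v, size, offsets) describes records of `size` consecutive cells whose child pointers sit at the given offsets, with 0 as the null pointer and [0] as the representation of an empty subtree. This reading is inferred from the slide; the function name tree_repr is hypothetical and is not taken from the Coq development.

```python
def tree_repr(heap, root, size, offsets, seen=None):
    """Return the nested-list representation of the structure at `root`.

    Raises ValueError if a cell is unallocated or a node is reached twice,
    i.e. if the structure is not a well-formed tree.
    """
    if seen is None:
        seen = set()
    if root == 0:
        return [0]                      # null pointer: empty subtree
    if root in seen:
        raise ValueError(f"node {root} reached twice")
    seen.add(root)
    for k in range(size):
        if root + k not in heap:
            raise ValueError(f"cell {root + k} is not allocated")
    children = [tree_repr(heap, heap[root + off], size, offsets, seen)
                for off in offsets]
    return [root] + children


h = {10: 12, 11: 18, 12: 14, 13: 16, 14: 0, 15: 0,
     16: 0, 17: 0, 18: 20, 19: 0, 20: 0, 21: 0}
R = 10
v0 = tree_repr(h, R, 2, [0, 1])
print(v0)
```

The `seen` set is what rules out sharing and loops: a heap in which two pointers reach the same record is rejected rather than unfolded twice.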
Separation logic
h={10 → 12,11 → 18,12 → 14,13 → 16,14 → 0,15 → 0,16 → 0,17 → 0,18 → 20,
19→ 0,20 → 0,21 → 0,30 → 32,31 → 10,32 → 0,33 → 12,40 → 42,41 → 14,
42 → 44,43 → 12,44 → 0,45 → 10}
(e,h) ⊨ ∃v0 v1 v2 TREE(R,v0,2,[0,1]) *
TREE(I,v1,2,[0]) * TREE(P,v2,2,[0])
Separation logic
(e, h) ⊨ s1 ∗ s2
if and only if
∃h′ , h′′ .
(e,h′) ⊨ s1 ⋀ (e,h′′) ⊨ s2 ⋀
dom(h′)∩dom(h′′)=∅ ⋀
h=h′ ∪ h′′
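The definition can be checked on the concrete heap from the previous slide. The sketch below, in Python, computes the cells occupied by the tree at R and by the two lists at I = 30 and P = 40, then confirms that the three sub-heaps have disjoint domains and together make up h. The helper names footprint, restrict and is_separating_split are illustrative assumptions, and footprint assumes the structure is acyclic and fully allocated.

```python
def footprint(heap, root, size, offsets):
    """Addresses occupied by the TREE-shaped structure at `root`."""
    if root == 0:
        return set()
    cells = {root + k for k in range(size)}
    for off in offsets:
        cells |= footprint(heap, heap[root + off], size, offsets)
    return cells


def restrict(heap, addrs):
    """Sub-heap containing only the given addresses."""
    return {a: heap[a] for a in addrs}


def is_separating_split(h, parts):
    """True when the parts have pairwise disjoint domains and their union is h."""
    seen = set()
    for part in parts:
        if seen & set(part):
            return False                # some address is claimed twice
        seen |= set(part)
    merged = {}
    for part in parts:
        merged.update(part)
    return merged == h


h = {10: 12, 11: 18, 12: 14, 13: 16, 14: 0, 15: 0, 16: 0, 17: 0,
     18: 20, 19: 0, 20: 0, 21: 0, 30: 32, 31: 10, 32: 0, 33: 12,
     40: 42, 41: 14, 42: 44, 43: 12, 44: 0, 45: 10}

h_tree = restrict(h, footprint(h, 10, 2, [0, 1]))   # TREE(R, v0, 2, [0,1])
h_i = restrict(h, footprint(h, 30, 2, [0]))         # TREE(I, v1, 2, [0])
h_p = restrict(h, footprint(h, 40, 2, [0]))         # TREE(P, v2, 2, [0])
print(is_separating_split(h, [h_tree, h_i, h_p]))
```

For the lists only offset 0 is followed as a pointer; the cell at offset 1 holds fld_t, which points into the tree but is not part of the list's own footprint.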
Data structure relationships
(e,h) ⊨ ∃v0 v1 v2 TREE(R,v0,2,[0,1]) *
TREE(I,v1,2,[0]) * TREE(P,v2,2,[0])
Data structure relationships
(e,h) ⊨ ∃v0 v1 v2 TREE(R,v0,2,[0,1]) *
TREE(I,v1,2,[0]) * TREE(P,v2,2,[0]) *
∀ v3 ∈ TreeRecords(v1).
[nth(find(v1,v3),2) inTree v0]
Data structure relationships
(e,h) ⊨ ∃v0 v1 v2 TREE(R,v0,2,[0,1]) *
TREE(I,v1,2,[0]) * TREE(P,v2,2,[0]) *
∀ v3 ∈ TreeRecords(v1). [nth(find(v1,v3),2) inTree v0] *
∀ v3 ∈ TreeRecords(v2). [nth(find(v2,v3),2) inTree v0]
Data structure relationships
(e,h) ⊨ ∃v0 v1 v2 TREE(R,v0,2,[0,1]) *
TREE(I,v1,2,[0]) * TREE(P,v2,2,[0]) *
∀ v3 ∈ TreeRecords(v1). [nth(find(v1,v3),2) inTree v0] *
∀ v3 ∈ TreeRecords(v2). [nth(find(v2,v3),2) inTree v0] *
[T = 0∨T inTree v0]
Deep model
• Process
– Create data structure for predicates
– Write semantic interpretation function
– Write customized tactics
• Advantage: greater flexibility in designing tactics
– Tactics can be any function that transforms the data structure
– Tactic is proven correct once and used for all verifications
Summary of tactics
• Forward propagation
• Fold/unfold
• Merge
– Works by pairing off identical pieces
• Simplify
• State implications
– Also works by pairing off
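Both Merge and state implication are described as pairing off identical pieces. A minimal Python sketch of that idea follows. It assumes a state is simply a list of starred conjuncts written as strings, which is a simplification: the real framework works on a Coq data structure whose definition is not shown on these slides. The function name pair_off is hypothetical.

```python
from collections import Counter


def pair_off(left, right):
    """Cancel conjuncts that occur on both sides (multiset difference).

    Returns (left_rest, right_rest): the conjuncts left unmatched on each side.
    """
    l, r = Counter(left), Counter(right)
    common = l & r
    return sorted((l - common).elements()), sorted((r - common).elements())


hyp = ["TREE(R,v0,2,[0,1])", "[T=R]", "[I=0]", "[P=0]"]
goal = ["TREE(R,v0,2,[0,1])", "TREE(I,v1,2,[0])", "TREE(P,v2,2,[0])"]
left_rest, right_rest = pair_off(hyp, goal)
print(left_rest)
print(right_rest)
```

Whatever remains after pairing off has to be discharged by other reasoning steps; syntactic matching alone only removes the pieces that are literally identical.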
Results (so far)
• Tree traversal
– Code size: ~30 lines
– Invariant size: ~10 lines
– Proof check time: ~5 minutes
– Main proof size: ~220 lines
– Status: top level complete, lemmas need to be proven
• DPLL (a decision procedure for sentential satisfiability)
– Code size: ~200 lines
– Invariant size: ~52 lines
– Status: Proof incomplete
Research contributions
• Extension of Coq separation logic reasoning to larger programs with more complex data structures.
• Creation of a library of useful predicates, functions and tactics
– Deep model allows greater control over the design of tactics
• Key challenge: Performance tuning
– Tradeoffs between performance and automation
• Development of a powerful simplification tactic
– Simplification tactic executed after every major proof step
– Based on term rewriting (with contextual rewriting) concepts
– Automates reasoning about associativity, commutativity and other simple property classes
• Design decisions in creating canonical form addressed
Related work
Proposed work for PhD
• Finish DPLL verification
• Prove all underlying theorems
• Create improved presentation framework for the environment
Deep model
• Consider proving:
a+c = a+b+c−b
Deep Model
type expr =
| Const int
| Var id
| Plus expr×expr
| Minus expr×expr
| Times expr×expr
Fixpoint eval (env : id → nat) (e : expr) :=
match e with
| Const c ⇒ c
| Var v ⇒ env v
| Plus e1 e2 ⇒ (eval env e1) + (eval env e2)
| Minus e1 e2 ⇒ (eval env e1) − (eval env e2)
| Times e1 e2 ⇒ (eval env e1) ∗ (eval env e2)
end.
Definition simplify e := …
Theorem: ∀env. ∀e. eval env (simplify e) = eval env e
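The same deep-model pattern (a data type for expressions, an interpretation function, and a transformation that preserves the interpretation) can be sketched in Python. The slide leaves the body of simplify elided, so the simplifier below is a hypothetical stand-in: it works over Python integers, collects the coefficients of a linear combination, and treats every Times node as an opaque atom. All class and function names are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Plus:
    left: object
    right: object


@dataclass(frozen=True)
class Minus:
    left: object
    right: object


@dataclass(frozen=True)
class Times:
    left: object
    right: object


def eval_expr(env, e):
    """Semantic interpretation of an expression under environment env."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    l, r = eval_expr(env, e.left), eval_expr(env, e.right)
    if isinstance(e, Plus):
        return l + r
    if isinstance(e, Minus):
        return l - r
    return l * r                                  # Times


def _collect(e, sign, coeffs):
    """Accumulate e into coeffs: atom -> coefficient (None keys the constant)."""
    if isinstance(e, Const):
        coeffs[None] += sign * e.value
    elif isinstance(e, Plus):
        _collect(e.left, sign, coeffs)
        _collect(e.right, sign, coeffs)
    elif isinstance(e, Minus):
        _collect(e.left, sign, coeffs)
        _collect(e.right, -sign, coeffs)
    else:                                         # Var or Times: opaque atom
        coeffs[e] += sign


def simplify(e):
    """Hypothetical simplifier: cancel and merge terms of a linear combination."""
    coeffs = defaultdict(int)
    _collect(e, 1, coeffs)
    const = coeffs.pop(None, 0)
    positives, negatives = [], []
    for atom in sorted((a for a in coeffs if coeffs[a] != 0), key=repr):
        k = coeffs[atom]
        term = atom if abs(k) == 1 else Times(Const(abs(k)), atom)
        (positives if k > 0 else negatives).append(term)
    if const > 0 or (const == 0 and not positives and not negatives):
        positives.append(Const(const))
    elif const < 0:
        negatives.append(Const(-const))
    result = positives[0] if positives else Const(0)
    for term in positives[1:]:
        result = Plus(result, term)
    for term in negatives:
        result = Minus(result, term)
    return result


a, b, c = Var("a"), Var("b"), Var("c")
lhs = Plus(a, Plus(b, Minus(c, b)))               # a + (b + (c - b))
env = {"a": 3, "b": 5, "c": 7}
print(simplify(lhs))
```

In Coq the statement eval env (simplify e) = eval env e is proven once for all e; the Python version can only spot-check it on sample environments.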
Deep Model
Prove the following:
Plus a c =
simplify (Plus a (Plus b (Minus c b)))
Deep Model
• Parameterized predicate data types
type expr =
| Const int
| Var id
| Fun id × list expr
Verification of initialization
{∃v0 .TREE(R, v0 , 2, [0,1])}
T := R;
I := 0;
P := 0;
{∃v0 .∃v1 .∃v2 .
TREE(R, v0 , 2, [0,1])∗ TREE(I, v1 , 2, [0])∗
TREE(P, v2 , 2, [0])∗
∀ v3 ∈ TreeRecords(v1).[nth(find(v1,v3),2) inTree v0]∗
∀ v3 ∈ TreeRecords(v2).[nth(find(v2,v3),2) inTree v0]∗
[T = 0∨T inTree v0]}
Verification of initialization
{∃v0 .TREE(R, v0 , 2, [0,1])}
T := R;
I := 0;
P := 0;
{?1234}
Verification of initialization
{∃v0 .TREE(R, v0 , 2, [0,1]) * [T=R]}
I := 0;
P := 0;
{?1234}
Verification of initialization
{∃v0 .TREE(R, v0 , 2, [0,1]) * [T=R] * [I = 0]}
P := 0;
{?1234}
Verification of initialization
∃v0 .TREE(R, v0 , 2, [0,1]) * [T=R] * [I = 0] *
[P = 0] ->
?1234
?1234 =
∃v0 .TREE(R, v0 , 2, [0,1]) * [T=R] * [I = 0] *
[P = 0]
Verification of initialization
?1234 →
∃v0 .∃v1 .∃v2 .
TREE(R, v0 , 2, [0,1])∗ TREE(I, v1 , 2, [0])∗ TREE(P, v2 , 2, [0])∗
∀ v3 ∈ TreeRecords(v1).
[nth(find(v1,v3),2) inTree v0]∗
∀ v3 ∈ TreeRecords(v2).
[nth(find(v2,v3),2) inTree v0]∗
[T = 0 ∨ T inTree v0]
Verification of initialization
∃v0 .TREE(R, v0 , 2, [0,1]) * [T=R] * [I = 0] *
[P = 0] →
∃v0 .∃v1 .∃v2 .
TREE(R, v0 , 2, [0,1])∗ TREE(I, v1 , 2, [0])∗ TREE(P, v2 , 2, [0])∗
∀ v3 ∈ TreeRecords(v1).
[nth(find(v1,v3),2) inTree v0]∗
∀ v3 ∈ TreeRecords(v2).
[nth(find(v2,v3),2) inTree v0]∗
[T = 0 ∨ T inTree v0]
Unfold example
{∃v0∃v1∃v2[Tmp l = 0]∗[l /= 0]∗[tmp r = 0]∗
[Tmp r = 0 ∨ Tmp r ∈ TreeRecords(v0)]∗
[nth(nth(find(v0,T),2),0) = (Tmp r)]∗
[nth(nth(find(v0,T),1),0) = 0]∗
[T ∈ TreeRecords(v0)]∗
P + 0 → N ∗ P + 1 → T ∗ [T /= 0]∗
TREE(R, v0 , 2, [0,1]) ∗ TREE(I, v1 , 2, [0]) * TREE(N, v2 , 2, [0])∗
∀ v3 ∈ TreeRecords(v1). [nth(find(v1,v3),2) inTree v0]∗
∀ v3 ∈ TreeRecords(v2). [nth(find(v2,v3),2) inTree v0]}
T := ∗(I+1);
…
{?1234}
Unfold example
∃v0 ∃v1 ∃v2 ∃v3 ∃v4
I + 1 → v1 ∗ I → nth(v0, 0) ∗ TREE(nth(v0, 0), nth([I, v0, v1], 1), 2, [0])
[Tmp r = 0 ∨ Tmp r ∈ TreeRecords(v0)]∗
[nth(nth(find(v2,T),2),0) = (Tmp r)]∗ [nth(nth(find(v2,T),1),0) = 0]∗
[T ∈ TreeRecords(v2)]∗ P+0→N∗P+1→ T∗[T /= 0]∗
TREE(I, v1 , 2, [0]) *
[I + 1 → v1 ∗ I → nth(v0, 0)] ∗ TREE(nth(v0, 0), nth([I, v0, v1], 1), 2, [0]) ∗
TREE(R, v2 , 2, [0,1]) ∗ Empty * TREE(N,v4,2,[0]) *
∀ v5 ∈ TreeRecords([I,v0,v1]). [nth(find([I,v0,v1],v5),2) inTree v2]∗
∀ v5 ∈ TreeRecords(v4). [nth(find(v4,v5),2) inTree v2]}
T := ∗(I+1);
...
{?1234}
Unfold example
∃v0 ∃v1 ∃v2 ∃v3 ∃v4 ∃v5
[T = v2] ∗
I + 1 → v2 ∗ I → nth(v1, 0) ∗
TREE(nth(v1, 0), nth([I, v1, v2], 1), 2, [0])∗
[Tmp r = 0 ∨ Tmp r ∈ TreeRecords(v1)]∗
[nth(nth(find(v3,T),2),0) = (Tmp r)]∗
[nth(nth(find(v3,T),1),0) = 0]∗
[T ∈ TreeRecords(v3)]∗P + 0 → N ∗ P + 1 → T ∗ [T /= 0]∗
TREE(R, v3 , 2, [0,1]) ∗ Empty ∗ TREE(N, v5 , 2, [0])∗
∀ v6 ∈ TreeRecords([I,v1,v2]).
[nth(find([I,v1,v2],v6),2) inTree v3]∗
∀ v6 ∈ TreeRecords(v5). [nth(find(v5,v6),2) inTree v3]} {?1234}
DPLL
Efficient SAT-solving algorithm for CNF expressions such as:
(A v ~B v C) ^
(A v ~C v D)
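For orientation, here is a minimal recursive DPLL sketch in Python with unit propagation and case splitting. It is not the C program being verified, which (judging from the data structures on the following slides) uses per-variable watch lists and explicit assignment stacks. The integer encoding of literals (A=1, B=2, C=3, D=4, negative for negation) and the function names are assumptions made for this example.

```python
def dpll(clauses, assignment=None):
    """Return a satisfying assignment {var: bool} or None if unsatisfiable.

    clauses: list of lists of non-zero ints; a negative int is a negated variable.
    """
    assignment = dict(assignment or {})
    while True:                                   # unit propagation
        simplified, unit = [], None
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue                          # clause already satisfied
            rest = [l for l in clause if abs(l) not in assignment]
            if not rest:
                return None                       # every literal false: conflict
            if len(rest) == 1 and unit is None:
                unit = rest[0]
            simplified.append(rest)
        if unit is None:
            break
        assignment[abs(unit)] = unit > 0          # forced assignment
        clauses = simplified
    if not simplified:
        return assignment                         # all clauses satisfied
    var = abs(simplified[0][0])                   # split on an unassigned variable
    for value in (True, False):
        result = dpll(simplified, {**assignment, var: value})
        if result is not None:
            return result
    return None


def satisfies(clauses, model):
    """Check a model, treating unassigned variables as False."""
    return all(any(model.get(abs(l), False) == (l > 0) for l in c)
               for c in clauses)


A, B, C, D = 1, 2, 3, 4
cnf = [[A, -B, C], [A, -C, D]]                    # (A v ~B v C) ^ (A v ~C v D)
model = dpll(cnf)
print(model)
```

Variables that never need a value are left out of the returned model, which is why satisfies treats missing entries as False.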
Data structure
#define VAR_COUNT 4
char assignments[VAR_COUNT];
struct clause {
struct clause *next;
char positive_lit[VAR_COUNT];
char negative_lit[VAR_COUNT];
char watch_var[VAR_COUNT];
struct clause *watch_next[VAR_COUNT];
struct clause *watch_prev[VAR_COUNT];
} *clauses;
struct clause *watches[VAR_COUNT];
struct assignments_to_do {
struct assignments_to_do *next, *prev;
int var;
char value;
int unit_prop;
} *assignments_to_do_head, *assignments_to_do_tail;
struct assignment_stack {
struct assignment_stack *next;
int var;
char value;
char unit_prop;
} *stack;
DPLL Data structure diagram
DPLL invariant
The first part of the invariant consists of spatial constructs asserting the two arrays and
three dynamic data structures in the heap. ARRAY(root, count, functional_representation)
is a spatial predicate for arrays. The functional representation is a list of the
elements.
AbsExistsT v0 . AbsExistsT v1 . AbsExists v2 . AbsExistsT v3. AbsExistsT v4.
TREE(clauses,v0,sizeof_clause,[next_offset]) *
TREE(assignments_to_do_head,v1,sizeof_assignment_stack,[next_offset]) *
TREE(stack,v2,sizeof_assignment_stack,[next_offset]) *
ARRAY(assignments,var_count,v3) * ARRAY(watches,var_count,v4) *
Next, we add two assertions that guarantee that the assignment stack v2 and the
assignment array v3 are consistent with each other. We use (a,b)-->c as an abbreviation for
nth(find(a,b),c).
(AbsAll v5 in TreeRecords(v2) . ([nth(v3,(v2,v5)-->stack_var_offset)==(v2,v5)-->stack_val_offset])) *
(AbsAll v5 in range(0,var_count-1) . ([nth(v3,v5)==0] *\/* AbsExists v6 in (TreeRecords(v2)) .
([((v2,v6)-->stack_var_offset==v5 /\ (v2,v6)-->stack_val_offset==nth(v3,v5))]) )) *
…
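The two consistency assertions can be read as executable checks. The Python sketch below assumes the stack is given as a list of (var, value) pairs, the assignment array as a list indexed by variable, and that 0 in the array means "unassigned". These encodings and the function names are assumptions for illustration; the slide does not spell them out, and the second check simply ranges over every index of the array.

```python
def stack_matches_array(stack, assignments):
    """First assertion: every stack record agrees with the assignment array."""
    return all(assignments[var] == value for var, value in stack)


def array_covered_by_stack(stack, assignments):
    """Second assertion: each array entry is 0 or backed by a stack record."""
    return all(
        assignments[v] == 0
        or any(var == v and value == assignments[v] for var, value in stack)
        for v in range(len(assignments))
    )


assignments = [1, 0, 2, 0]          # variables 0 and 2 are assigned
stack = [(2, 2), (0, 1)]            # most recent assignment first
consistent = (stack_matches_array(stack, assignments)
              and array_covered_by_stack(stack, assignments))
print(consistent)
```

Together the two checks say that the stack and the array describe exactly the same set of assignments; dropping either one would allow an assignment to be recorded in only one of the two structures.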