Formalizing C in Coq

Formalizing C in Coq
Robbert Krebbers
ICIS, Radboud University Nijmegen, The Netherlands
October 24, 2014 @ Princeton University, USA
1
What is this program supposed to do?
The C quiz, question 1
int main() {
int x;
int y = (x = 3) + (x = 4);
printf("x=%d,y=%d\n", x, y);
}
2
What is this program supposed to do?
The C quiz, question 1
int main() {
int x;
int y = (x = 3) + (x = 4);
printf("x=%d,y=%d\n", x, y);
}
Let us try some compilers
I
Clang prints x=4,y=7, seems just left-right
2
What is this program supposed to do?
The C quiz, question 1
int main() {
int x;
int y = (x = 3) + (x = 4);
printf("x=%d,y=%d\n", x, y);
}
Let us try some compilers
I
Clang prints x=4,y=7, seems just left-right
I
GCC prints x=4,y=8, does not correspond to any order
2
What is this program supposed to do?
The C quiz, question 1
int main() {
int x;
int y = (x = 3) + (x = 4);
printf("x=%d,y=%d\n", x, y);
}
Let us try some compilers
I
Clang prints x=4,y=7, seems just left-right
I
GCC prints x=4,y=8, does not correspond to any order
This program violates the sequence point restriction
I
due to two unsequenced writes to x
I
resulting in undefined behavior
I
thus both compilers are right
2
Underspecification in C11
I
Unspecified behavior: two or more behaviors are allowed
For example: order of evaluation in expressions
(+57 more)
I
Implementation defined behavior: like unspecified
behavior, but the compiler has to document its choice
For example: size and endianness of integers
(+118 more)
I
Undefined behavior: the standard imposes no requirements
at all, the program is even allowed to crash
For example: dereferencing a NULL or dangling pointer, signed
integer overflow, . . .
(+201 more)
3
Underspecification in C11
I
Unspecified behavior: two or more behaviors are allowed
For example: order of evaluation in expressions
(+57 more)
Non-determinism
I
Implementation defined behavior: like unspecified
behavior, but the compiler has to document its choice
For example: size and endianness of integers
(+118 more)
Parametrization
I
Undefined behavior: the standard imposes no requirements
at all, the program is even allowed to crash
For example: dereferencing a NULL or dangling pointer, signed
integer overflow, . . .
(+201 more)
No semantics/crash state
3
Why does C use underspecification that heavily?
Pros for optimizing compilers:
I
More optimizations are possible
I
High run-time efficiency
I
Easy to support multiple architectures
4
Why does C use underspecification that heavily?
Pros for optimizing compilers:
I
More optimizations are possible
I
High run-time efficiency
I
Easy to support multiple architectures
Cons for programmers/formal methods people:
I
Portability and maintenance problems
I
Hard to capture precisely in a semantics
I
Hard to formally reason about
4
Why does C use underspecification that heavily?
Pros for optimizing compilers:
I
More optimizations are possible
I
High run-time efficiency
I
Easy to support multiple architectures
Cons Challenges for programmers/formal methods people:
I
Portability and maintenance problems
I
Hard to capture precisely in a semantics
I
Hard to formally reason about
4
Approaches to underspecification
CompCert (Leroy et al.) /
VST (Appel et al.)
I
Main goal: verification of/w.r.t. CompCert compiler in
I
Specific choices for unspecified/impl-defined behavior
For example: 32-bits ints
I
Describes some undefined behavior
Undefined: dereferencing NULL, . . .
But still defined: integer overflow, aliasing violations, . . .
I
VST separation logic proofs specific to CompCert
Coq
Formalin (Krebbers & Wiedijk)
I
Main goal: compiler independent C11 semantics in
Coq
I
Describes all unspecified and undefined behavior
I
Describes some implementation-defined behavior
For example: no legacy architectures with 1s’ complement
5
The Formalin project (Krebbers & Wiedijk,
OCaml part
)
Coq part
CH2 O core
syntax
Type
judgment
Subject red.
and progress
CH2 O
operational
semantics
= translation
= proof
Executable structured memory model
6
The Formalin project (Krebbers & Wiedijk,
OCaml part
.c file
CIL
abstract
syntax
)
Coq part
CH2 O
abstract
syntax
CH2 O core
syntax
Type
soundness
Type
judgment
Subject red.
and progress
CH2 O
operational
semantics
= translation
= proof
Executable structured memory model
6
The Formalin project (Krebbers & Wiedijk,
OCaml part
.c file
CIL
abstract
syntax
)
Coq part
CH2 O
abstract
syntax
Stream of
finite sets
of states
CH2 O core
syntax
Soundness and
completeness
Type
soundness
Type
judgment
Subject red.
and progress
CH2 O
operational
semantics
= translation
= proof
Executable structured memory model
6
The Formalin project (Krebbers & Wiedijk,
OCaml part
.c file
CIL
abstract
syntax
)
Coq part
CH2 O
abstract
syntax
Stream of
finite sets
of states
CH2 O core
syntax
Soundness and
completeness
Type
soundness
Type
judgment
Subject red.
and progress
CH2 O
operational
semantics
Soundness
Separation
logic
= translation
= proof
Executable structured memory model
6
The Formalin project (Krebbers & Wiedijk,
OCaml part
.c file
CIL
abstract
syntax
)
Coq part
CH2 O
abstract
syntax
Stream of
finite sets
of states
CH2 O core
syntax
Soundness and
completeness
Type
soundness
Type
judgment
Subject red.
and progress
Preservation and
composition
Refinement
judgment
CH2 O
operational
semantics
Soundness
Separation
logic
= translation
= proof
Executable structured memory model
6
Non-local control flow and block scope variables
The C quiz, question 2
int *p = NULL;
l: if (p) {
return (*p);
} else {
int j = 17;
p = &j;
goto l;
}
7
Non-local control flow and block scope variables
The C quiz, question 2
int *p = NULL;
l: if (p) {
return (*p);
} else {
int j = 17;
p = &j;
goto l;
}
memory:
p
NULL
7
Non-local control flow and block scope variables
The C quiz, question 2
int *p = NULL;
l: if (p) {
return (*p);
} else {
int j = 17;
p = &j;
goto l;
}
memory:
p
NULL
7
Non-local control flow and block scope variables
The C quiz, question 2
int *p = NULL;
l: if (p) {
return (*p);
} else {
int j = 17;
p = &j;
goto l;
}
memory:
p
j
NULL
17
7
Non-local control flow and block scope variables
The C quiz, question 2
int *p = NULL;
l: if (p) {
return (*p);
} else {
int j = 17;
p = &j;
goto l;
}
memory:
p
j
•
17
7
Non-local control flow and block scope variables
The C quiz, question 2
int *p = NULL;
l: if (p) {
return (*p);
} else {
int j = 17;
p = &j;
goto l;
}
memory:
p
j
•
17
7
Non-local control flow and block scope variables
The C quiz, question 2
int *p = NULL;
l: if (p) {
return (*p);
} else {
int j = 17;
p = &j;
goto l;
}
memory:
p
•
7
Non-local control flow and block scope variables
The C quiz, question 2
int *p = NULL;
l: if (p) {
return (*p);
} else {
int j = 17;
p = &j;
goto l;
}
memory:
p
•
C11, 6.2.4p2: the value of a pointer becomes indeterminate when
the object it points to (or just past) reaches the end of its lifetime.
=⇒ Undefined behavior
7
Non-local control flow and block scope variables
Operational semantics [Krebbers/Wiedijk,FoSSaCS’13]
Problem: goto/return/etc. affect memory
8
Non-local control flow and block scope variables
Operational semantics [Krebbers/Wiedijk,FoSSaCS’13]
Problem: goto/return/etc. affect memory
Solution:
I Execute gotos and returns in small steps
I
I
Not so much to search for labels, . . .
but to naturally perform required allocations and deallocations
8
Non-local control flow and block scope variables
Operational semantics [Krebbers/Wiedijk,FoSSaCS’13]
Problem: goto/return/etc. affect memory
Solution:
I Execute gotos and returns in small steps
I
I
I
Not so much to search for labels, . . .
but to naturally perform required allocations and deallocations
Traversal through the AST in the following directions:
downwards to the next statement
upwards to the next statement
y l to a label l: after a goto l
↑↑ v to the top of the statement after a return
I &
I %
I
I
8
Non-local control flow and block scope variables
Operational semantics [Krebbers/Wiedijk,FoSSaCS’13]
Problem: goto/return/etc. affect memory
Solution:
I Execute gotos and returns in small steps
I
I
I
Not so much to search for labels, . . .
but to naturally perform required allocations and deallocations
Traversal through the AST in the following directions:
downwards to the next statement
upwards to the next statement
y l to a label l: after a goto l
↑↑ v to the top of the statement after a return
I &
I %
I
I
I
Use a zipper to keep track of the position and the stack
8
Non-local control flow and block scope variables
Visualization of the operational semantics on an example
int *p = NULL
l:
if (p)
return (*p)
int j = 17
;
p = &j
goto l
9
Non-local control flow and block scope variables
Visualization of the operational semantics on an example
int *p = NULL
direction:
&
l:
memory:
p
if (p)
return (*p)
int j = 17
NULL
;
p = &j
goto l
9
Non-local control flow and block scope variables
Visualization of the operational semantics on an example
int *p = NULL
direction:
&
l:
memory:
p
if (p)
return (*p)
int j = 17
NULL
;
p = &j
goto l
9
Non-local control flow and block scope variables
Visualization of the operational semantics on an example
int *p = NULL
direction:
&
l:
memory:
p
if (p)
return (*p)
int j = 17
NULL
;
p = &j
goto l
9
Non-local control flow and block scope variables
Visualization of the operational semantics on an example
int *p = NULL
direction:
&
l:
if (p)
return (*p)
memory:
p
j
NULL
17
int j = 17
;
p = &j
goto l
9
Non-local control flow and block scope variables
Visualization of the operational semantics on an example
int *p = NULL
direction:
&
l:
if (p)
return (*p)
memory:
p
j
NULL
17
int j = 17
;
p = &j
goto l
9
Non-local control flow and block scope variables
Visualization of the operational semantics on an example
int *p = NULL
direction:
&
l:
if (p)
return (*p)
memory:
p
j
•
17
int j = 17
;
p = &j
goto l
9
Non-local control flow and block scope variables
Visualization of the operational semantics on an example
int *p = NULL
direction:
%
l:
if (p)
return (*p)
memory:
p
j
•
17
int j = 17
;
p = &j
goto l
9
Non-local control flow and block scope variables
Visualization of the operational semantics on an example
int *p = NULL
direction:
%
l:
if (p)
return (*p)
memory:
p
j
•
17
int j = 17
;
p = &j
goto l
9
Non-local control flow and block scope variables
Visualization of the operational semantics on an example
int *p = NULL
direction:
&
l:
if (p)
return (*p)
memory:
p
j
•
17
int j = 17
;
p = &j
goto l
9
Non-local control flow and block scope variables
Visualization of the operational semantics on an example
int *p = NULL
direction:
&
l:
if (p)
return (*p)
memory:
p
j
•
17
int j = 17
;
p = &j
goto l
9
Non-local control flow and block scope variables
Visualization of the operational semantics on an example
int *p = NULL
direction:
yl
l:
if (p)
return (*p)
memory:
p
j
•
17
int j = 17
;
p = &j
goto l
9
Non-local control flow and block scope variables
Visualization of the operational semantics on an example
int *p = NULL
direction:
yl
l:
if (p)
return (*p)
memory:
p
j
•
17
int j = 17
;
p = &j
goto l
9
Non-local control flow and block scope variables
Visualization of the operational semantics on an example
int *p = NULL
direction:
yl
l:
if (p)
return (*p)
memory:
p
j
•
17
int j = 17
;
p = &j
goto l
9
Non-local control flow and block scope variables
Visualization of the operational semantics on an example
int *p = NULL
direction:
yl
l:
memory:
p
if (p)
return (*p)
int j = 17
•
;
p = &j
goto l
9
Non-local control flow and block scope variables
Visualization of the operational semantics on an example
int *p = NULL
direction:
yl
l:
memory:
p
if (p)
return (*p)
int j = 17
•
;
p = &j
goto l
9
Non-local control flow and block scope variables
Visualization of the operational semantics on an example
int *p = NULL
direction:
&
l:
memory:
p
if (p)
return (*p)
int j = 17
•
;
p = &j
goto l
9
Non-local control flow and block scope variables
Visualization of the operational semantics on an example
int *p = NULL
direction:
&
l:
memory:
p
if (p)
return (*p)
int j = 17
•
;
p = &j
goto l
9
Non-local control flow and block scope variables
Goto considered harmful?
http://xkcd.com/292/
10
Non-local control flow and block scope variables
Goto considered harmful?
http://xkcd.com/292/
Not necessarily:
` {P} . . . goto main_sub3; . . . {Q}
10
Non-local control flow and block scope variables
Axiomatic semantics [Krebbers/Wiedijk,FoSSaCS’13]
CH2 O Hoare sextuples are of the shape
∆; J; R ` {P} s {Q}
11
Non-local control flow and block scope variables
Axiomatic semantics [Krebbers/Wiedijk,FoSSaCS’13]
CH2 O Hoare sextuples are of the shape
∆; J; R ` {P} s {Q}
where:
I
{P} s {Q} is a Hoare triple, as usual
11
Non-local control flow and block scope variables
Axiomatic semantics [Krebbers/Wiedijk,FoSSaCS’13]
CH2 O Hoare sextuples are of the shape
∆; J; R ` {P} s {Q}
where:
I
{P} s {Q} is a Hoare triple, as usual
I
∆ maps function names to their pre- and post-conditions
11
Non-local control flow and block scope variables
Axiomatic semantics [Krebbers/Wiedijk,FoSSaCS’13]
CH2 O Hoare sextuples are of the shape
∆; J; R ` {P} s {Q}
where:
I
{P} s {Q} is a Hoare triple, as usual
I
∆ maps function names to their pre- and post-conditions
I
J maps labels to their jumping condition
When executing a goto l, the assertion J l has to hold
11
Non-local control flow and block scope variables
Axiomatic semantics [Krebbers/Wiedijk,FoSSaCS’13]
CH2 O Hoare sextuples are of the shape
∆; J; R ` {P} s {Q}
where:
I
{P} s {Q} is a Hoare triple, as usual
I
∆ maps function names to their pre- and post-conditions
I
J maps labels to their jumping condition
When executing a goto l, the assertion J l has to hold
I
R has to hold to execute a return
11
Non-local control flow and block scope variables
Axiomatic semantics [Krebbers/Wiedijk,FoSSaCS’13]
CH2 O Hoare sextuples are of the shape
∆; J; R ` {P} s {Q}
where:
I
{P} s {Q} is a Hoare triple, as usual
I
∆ maps function names to their pre- and post-conditions
I
J maps labels to their jumping condition
When executing a goto l, the assertion J l has to hold
I
R has to hold to execute a return
Remark: the assertions P, Q, J and R correspond to the directions
&, %, y and ↑↑ of traversal
11
Non-local control flow and block scope variables
The block scope variable rule
∆; J ↑ ∗ x0τ →
7− –; R ↑ ∗ x0τ →
7− – ` {P ↑ ∗ x0τ →
7− –} s {Q ↑ ∗ x0τ →
7− –}
∆; J; R ` {P} localτ s {Q}
When entering a block:
I
The De Bruijn indexes are lifted: ( ) ↑
I
The memory is extended: ( ) ∗ x0τ →
7− –
12
Non-local control flow and block scope variables
The block scope variable rule
∆; J ↑ ∗ x0τ →
7− –; R ↑ ∗ x0τ →
7− – ` {P ↑ ∗ x0τ →
7− –} s {Q ↑ ∗ x0τ →
7− –}
∆; J; R ` {P} localτ s {Q}
When entering a block:
I
The De Bruijn indexes are lifted: ( ) ↑
I
The memory is extended: ( ) ∗ x0τ →
7− –
When leaving a block: the reverse
12
Non-local control flow and block scope variables
The block scope variable rule
∆; J ↑ ∗ x0τ →
7− –; R ↑ ∗ x0τ →
7− – ` {P ↑ ∗ x0τ →
7− –} s {Q ↑ ∗ x0τ →
7− –}
∆; J; R ` {P} localτ s {Q}
When entering a block:
I
The De Bruijn indexes are lifted: ( ) ↑
I
The memory is extended: ( ) ∗ x0τ →
7− –
When leaving a block: the reverse
Important: symmetry matches gotos going both in and out
12
Non-local control flow and block scope variables
The block scope variable rule
∆; J ↑ ∗ x0τ →
7− –; R ↑ ∗ x0τ →
7− – ` {P ↑ ∗ x0τ →
7− –} s {Q ↑ ∗ x0τ →
7− –}
∆; J; R ` {P} localτ s {Q}
When entering a block:
I
The De Bruijn indexes are lifted: ( ) ↑
I
The memory is extended: ( ) ∗ x0τ →
7− –
When leaving a block: the reverse
Important: symmetry matches gotos going both in and out
Important: using De Bruijn indexes avoids shadowing
12
Non-determinism and sequence points
The C quiz, question 3
int x = 0, y = 0, *p = &x;
int main() {
*p = f();
printf("x=%d,y=%d\n", x, y);
}
13
Non-determinism and sequence points
The C quiz, question 3
int x = 0, y = 0, *p = &x;
int f() { p = &y; return 17; }
int main() {
*p = f();
printf("x=%d,y=%d\n", x, y);
}
13
Non-determinism and sequence points
The C quiz, question 3
int x = 0, y = 0, *p = &x;
int f() { p = &y; return 17; }
int main() {
*p = f();
printf("x=%d,y=%d\n", x, y);
}
Let us try some compilers
I
Clang prints x=0,y=17
I
GCC prints x=17,y=0
13
Non-determinism and sequence points
The C quiz, question 3
int x = 0, y = 0, *p = &x;
int f() { p = &y; return 17; }
int main() {
*p = f();
printf("x=%d,y=%d\n", x, y);
}
Let us try some compilers
I
Clang prints x=0,y=17
I
GCC prints x=17,y=0
Non-determinism appears even in innocently looking code
13
Non-determinism and sequence points
Separation logic [Krebbers, POPL’14]
Observation: non-determinism corresponds to concurrency
Idea: use the separation logic rule for parallel composition
{P1 } e1 {Q1 }
{P2 } e2 {Q2 }
{P1 ∗ P2 } e1 } e2 {Q1 ∗ Q2 }
14
Non-determinism and sequence points
Separation logic [Krebbers, POPL’14]
Observation: non-determinism corresponds to concurrency
Idea: use the separation logic rule for parallel composition
{P1 } e1 {Q1 }
{P2 } e2 {Q2 }
{P1 ∗ P2 } e1 } e2 {Q1 ∗ Q2 }
What does this mean:
I
Split the memory into two disjoint parts
I
Prove that e1 and e2 can be executed safely in their part
I
Now e1 } e2 can be executed safely in the whole memory
14
Non-determinism and sequence points
Separation logic [Krebbers, POPL’14]
Observation: non-determinism corresponds to concurrency
Idea: use the separation logic rule for parallel composition
{P1 } e1 {Q1 }
{P2 } e2 {Q2 }
{P1 ∗ P2 } e1 } e2 {Q1 ∗ Q2 }
What does this mean:
I
Split the memory into two disjoint parts
I
Prove that e1 and e2 can be executed safely in their part
I
Now e1 } e2 can be executed safely in the whole memory
Disjointness ⇒ no sequence point violation (accessing the same
location twice in one expression)
14
Non-determinism and sequence points
Operational semantics [Krebbers, POPL’14]
Head reduction for expressions (e, m) _h (e 0 , m0 )
15
Non-determinism and sequence points
Operational semantics [Krebbers, POPL’14]
Head reduction for expressions (e, m) _h (e 0 , m0 )
I
On assignments: add locked address a to Ω1 ∪ Ω2
([a]Ω1 := [v ]Ω2 , m) _h ([v ]{a}∪Ω1 ∪Ω2 , lock a (m[a := v ]))
I
Meanwhile: lock a makes accesses to a undefined
I
On sequence points: unlock Ω in memory
([v ]Ω ? e2 : e3 , m) _h (e2 , unlock Ω m) provided . . .
15
Non-determinism and sequence points
Operational semantics [Krebbers, POPL’14]
Head reduction for expressions (e, m) _h (e 0 , m0 )
I
On assignments: add locked address a to Ω1 ∪ Ω2
([a]Ω1 := [v ]Ω2 , m) _h ([v ]{a}∪Ω1 ∪Ω2 , lock a (m[a := v ]))
I
Meanwhile: lock a makes accesses to a undefined
I
On sequence points: unlock Ω in memory
([v ]Ω ? e2 : e3 , m) _h (e2 , unlock Ω m) provided . . .
Gives a local treatment of sequence points
15
Non-determinism and sequence points
Operational semantics [Krebbers, POPL’14]
Head reduction for expressions (e, m) _h (e 0 , m0 )
I
On assignments: add locked address a to Ω1 ∪ Ω2
([a]Ω1 := [v ]Ω2 , m) _h ([v ]{a}∪Ω1 ∪Ω2 , lock a (m[a := v ]))
I
Meanwhile: lock a makes accesses to a undefined
I
On sequence points: unlock Ω in memory
([v ]Ω ? e2 : e3 , m) _h (e2 , unlock Ω m) provided . . .
Gives a local treatment of sequence points
Small step reduction S(P, φ, m) _ S(P 0 , φ0 , m0 )
I
(P, φ) gives the position/direction in the whole program
I
Different φs for expressions, statements, function calls
I
Uses evaluation contexts, i.e. if (e1 , m1 ) _h (e2 , m2 ), then
S(P, E[ e1 ], m1 ) _ S(P, E[ e2 ], m2 )
15
Strict aliasing restrictions
What is aliasing?
Aliasing: multiple pointers referring to the same object
16
Strict aliasing restrictions
What is aliasing?
Aliasing: multiple pointers referring to the same object
int f(int *p, int *q) {
int x = *q; *p = 17; return x;
}
If p and q alias, the original value n of *p is returned
n
p
q
16
Strict aliasing restrictions
What is aliasing?
Aliasing: multiple pointers referring to the same object
int f(int *p, int *q) {
int x = *q; *p = 17; return x *q;
}
If p and q alias, the original value n of *p is returned
n
p
q
Optimizing x away is unsound: 17 would be returned
16
Strict aliasing restrictions
What is aliasing?
Aliasing: multiple pointers referring to the same object
int f(int *p, int *q) {
int x = *q; *p = 17; return x *q;
}
If p and q alias, the original value n of *p is returned
n
p
q
Optimizing x away is unsound: 17 would be returned
Alias analysis: to determine whether pointers can alias
16
Strict aliasing restrictions
The C quiz, question 4
Consider a similar function:
short g(int *p, short *q) {
short x = *q; *p = 17; return x;
}
And call it with aliased pointers:
union { int x; short y; } u;
u.y = 3;
g(&u.x, &u.y);
x
y
&u.x
&u.y
17
Strict aliasing restrictions
The C quiz, question 4
Consider a similar function:
short g(int *p, short *q) {
short x = *q; *p = 17; return x;
}
And call it with aliased pointers:
union { int x; short y; } u;
u.y = 3;
g(&u.x, &u.y);
x
y
&u.x
&u.y
C89 allows p and q to be aliased, and thus requires g to return 3
17
Strict aliasing restrictions
The C quiz, question 4
Consider a similar function:
short g(int *p, short *q) {
short x = *q; *p = 17; return x;
}
And call it with aliased pointers:
union { int x; short y; } u;
u.y = 3;
g(&u.x, &u.y);
x
y
&u.x
&u.y
C89 allows p and q to be aliased, and thus requires g to return 3
C99/C11 allow type-based alias analysis: reads/writes with “the
wrong type” cause undefined behavior
=⇒ A compiler can assume that p and q do not alias
17
Strict aliasing restrictions
How to treat pointers [Krebbers, CPP’13]
Others (e.g. CompCert)
Memory: a finite map of cells
which consist of arrays of bytes
Our approach
Pointers: pairs (o, i) where o
identifies the cell, and i the offset into that cell
Too little information to capture strict aliasing restrictions
18
Strict aliasing restrictions
How to treat pointers [Krebbers, CPP’13]
Others (e.g. CompCert)
Memory: a finite map of cells
which consist of arrays of bytes
Our approach
A finite map of cells which consist of well-typed trees of bits
Pointers: pairs (o, i) where o
identifies the cell, and i the offset into that cell
Too little information to capture strict aliasing restrictions
18
Strict aliasing restrictions
How to treat pointers [Krebbers, CPP’13]
Others (e.g. CompCert)
Memory: a finite map of cells
which consist of arrays of bytes
Our approach
A finite map of cells which consist of well-typed trees of bits
Pointers: pairs (o, i) where o
identifies the cell, and i the offset into that cell
Pairs (o, r ) where o identifies
the cell, and r the path through
the tree in that cell
Too little information to capture strict aliasing restrictions
18
Strict aliasing restrictions
How to treat pointers [Krebbers, CPP’13]
Others (e.g. CompCert)
Memory: a finite map of cells
which consist of arrays of bytes
Our approach
A finite map of cells which consist of well-typed trees of bits
Pointers: pairs (o, i) where o
identifies the cell, and i the offset into that cell
Pairs (o, r ) where o identifies
the cell, and r the path through
the tree in that cell
Too little information to capture strict aliasing restrictions
A semantics for strict aliasing
restrictions
18
Strict aliasing restrictions
Example of the memory as a structured forest
Consider:
struct S {
union U {
signed char x[2]; int y;
} u;
void *p;
} s = { { .x = {33,34} }, s.u.x + 2 }
The object in memory looks like:
os 7→
void∗:
(ptr p)0 (ptr p)1 . . . (ptr p)31
.0
signed char: 10000100 01000100 EEEEEEEE
EEEEEEEE
struct S
union U
signed char[2]
p = (os : struct S, ,−
−−−
→ 0 ,−−−→• 1 ,−−−−−−−
→ 0, 8)signed char>void
19
Strict aliasing restrictions
The C quiz, question 5
union U { int x; short y; } u = { .x = 3 };
short q1() {
return u.y;
}
short q2() {
short *p = &u.y;
return *p;
}
20
Strict aliasing restrictions
The C quiz, question 5
union U { int x; short y; } u = { .x = 3 };
short q1() {
return u.y; // OK
}
short q2() {
short *p = &u.y;
return *p; // Undefined
}
Type-punning: reading a union using a pointer to another variant
I
C11 vaguely mentions a notion of “visible”
I
GCC only if “the memory is accessed through the union type”
20
Strict aliasing restrictions
The C quiz, question 5
union U { int x; short y; } u = { .x = 3 };
short q1() {
return u.y; // OK
}
short q2() {
short *p = &u.y;
return *p; // Undefined
}
Type-punning: reading a union using a pointer to another variant
I
C11 vaguely mentions a notion of “visible”
I
GCC only if “the memory is accessed through the union type”
Formalized by decorating pointers with annotations
20
Strict aliasing restrictions
Strict-aliasing Theorem
Theorem (Strict-aliasing)
Given:
I
addresses Γ; m ` a1 : σ1 and Γ; m ` a2 : σ2
I
with annotations that do not allow type-punning
I
σ1 , σ2 6= unsigned char
I
σ1 not a subtype of σ2 and vice versa
Then there are two possibilities:
1. a1 and a2 do not alias
2. accessing a1 after a2 (and vice versa) has undefined behavior
21
Strict aliasing restrictions
Strict-aliasing Theorem
Theorem (Strict-aliasing)
Given:
I
addresses Γ; m ` a1 : σ1 and Γ; m ` a2 : σ2
I
with annotations that do not allow type-punning
I
σ1 , σ2 6= unsigned char
I
σ1 not a subtype of σ2 and vice versa
Then there are two possibilities:
1. a1 and a2 do not alias
2. accessing a1 after a2 (and vice versa) has undefined behavior
Theorem
Compilers can perform type-based alias analysis
21
CH2 O Abstract C
#
»
x ∈ string := Set of strings
I ∈ cinit ::= e | { #»
r := I }
k ∈ cintrank ::= char | short | int
sto ∈ cstorage ::= static | extern | auto
| long | long long | ptr
s ∈ cstmt ::= e | skip
si ∈ signedness ::= signed | unsigned
| goto x | return e ?
τi ∈ cinttype ::= si ? k
| break | continue
τ ∈ ctype ::= void | def x | τi | τ ∗
| {s}
#»
| τ [e] | struct x | union x
| sto τ x := I ? ; s
| enum x | typeof e
| typedef x := τ ; s
α ∈ assign ::= := | } := | := }
| s1 ; s2 | x : s
e ∈ cexpr ::= x | constτi z | sizeof τ
| while(e) s
| τi min | τi max | τi bits
| for(e1 ; e2 ; e3 ) s
| &e | ∗ e
| do s while(e)
| e1 α e2
| if (e) s1 else s2
| x(~e ) | abort
d ∈ decl ::= struct τ# »
x | union τ# »
x
| allocτ e | free e
| typedef τ
| }u e | e1 } e2
#
»
| enum x := e ? : τi
| e1 && e2 | e1 || e2
#
»
| global I ? : sto τ
| e1 ? e2 : e3 | (e1 , e2 )
# »
#»
| fun (τ x ? ) s ? : sto τ
| (τ ) I | e . x
Θ
∈
decls
:=
list
(string
×
decl)
r ∈ crefseg ::= [e] | .x
22
CH2 O Abstract C
Translation to CH2 O core C in Coq [Krebbers/Wiedijk, Submitted]
I
Named variables to De Bruijn indices
23
CH2 O Abstract C
Translation to CH2 O core C in Coq [Krebbers/Wiedijk, Submitted]
I
Named variables to De Bruijn indices
I
Disambiguate l-values and r-values
23
CH2 O Abstract C
Translation to CH2 O core C in Coq [Krebbers/Wiedijk, Submitted]
I
Named variables to De Bruijn indices
I
Disambiguate l-values and r-values
I
Sound/complete constant expression evaluation, e.g. in τ [e]
[[ e ]]Γ,getstack P,m = v
iff
S(P, e, m) _∗ S(P, v , m)
23
CH2 O Abstract C
Translation to CH2 O core C in Coq [Krebbers/Wiedijk, Submitted]
I
Named variables to De Bruijn indices
I
Disambiguate l-values and r-values
I
Sound/complete constant expression evaluation, e.g. in τ [e]
[[ e ]]Γ,getstack P,m = v
I
iff
S(P, e, m) _∗ S(P, v , m)
Simplification of loops, e.g. while(e) s becomes
catch (loop (if (e 0 ) skip else throw 0 ; catch s 0 ))
23
CH2 O Abstract C
Translation to CH2 O core C in Coq [Krebbers/Wiedijk, Submitted]
I
Named variables to De Bruijn indices
I
Disambiguate l-values and r-values
I
Sound/complete constant expression evaluation, e.g. in τ [e]
[[ e ]]Γ,getstack P,m = v
I
iff
S(P, e, m) _∗ S(P, v , m)
Simplification of loops, e.g. while(e) s becomes
catch (loop (if (e 0 ) skip else throw 0 ; catch s 0 ))
I
Expansion of typedef and enum declarations
23
CH2 O Abstract C
Translation to CH2 O core C in Coq [Krebbers/Wiedijk, Submitted]
I
Named variables to De Bruijn indices
I
Disambiguate l-values and r-values
I
Sound/complete constant expression evaluation, e.g. in τ [e]
[[ e ]]Γ,getstack P,m = v
I
iff
S(P, e, m) _∗ S(P, v , m)
Simplification of loops, e.g. while(e) s becomes
catch (loop (if (e 0 ) skip else throw 0 ; catch s 0 ))
I
Expansion of typedef and enum declarations
I
Translation of constants like INT_MIN
23
CH2 O Abstract C
Translation to CH2 O core C in Coq [Krebbers/Wiedijk, Submitted]
I
Named variables to De Bruijn indices
I
Disambiguate l-values and r-values
I
Sound/complete constant expression evaluation, e.g. in τ [e]
[[ e ]]Γ,getstack P,m = v
I
iff
S(P, e, m) _∗ S(P, v , m)
Simplification of loops, e.g. while(e) s becomes
catch (loop (if (e 0 ) skip else throw 0 ; catch s 0 ))
I
Expansion of typedef and enum declarations
I
Translation of constants like INT_MIN
I
Translation of compound literals
23
CH2 O Abstract C
Translation to CH2 O core C in Coq [Krebbers/Wiedijk, Submitted]
I
Named variables to De Bruijn indices
I
Disambiguate l-values and r-values
I
Sound/complete constant expression evaluation, e.g. in τ [e]
[[ e ]]Γ,getstack P,m = v
I
iff
S(P, e, m) _∗ S(P, v , m)
Simplification of loops, e.g. while(e) s becomes
catch (loop (if (e 0 ) skip else throw 0 ; catch s 0 ))
I
Expansion of typedef and enum declarations
I
Translation of constants like INT_MIN
I
Translation of compound literals
Theorem (Type soundness)
If the translator succeeds, the CH2 O core program is well-typed.
23
Executable semantics
Goal: define exec : state → Pfin (state)
24
Executable semantics
Goal: define exec : state → Pfin (state)
Problems:
1. Decomposition E[ e1 ] of expressions is non-deterministic:
S(P, E[ e1 ], m1 ) _ S(P, E[ e2 ], m2 )
if (e1 ,m1 )_h(e2 ,m2 )
2. Object identifiers o for newly allocated memory are arbitrary:
S(P, (&, localτ s), m)
_ S((localo:τ 2) :: P, (&, s), alloc o τ m)
if o ∈dom
/
m
24
Executable semantics
Goal: define exec : state → Pfin (state)
Problems:
1. Decomposition E[ e1 ] of expressions is non-deterministic:
S(P, E[ e1 ], m1 ) _ S(P, E[ e2 ], m2 )
if (e1 ,m1 )_h(e2 ,m2 )
2. Object identifiers o for newly allocated memory are arbitrary:
S(P, (&, localτ s), m)
_ S((localo:τ 2) :: P, (&, s), alloc o τ m)
if o ∈dom
/
m
Solutions:
1. Enumerate all possible decompositions E[ e1 ]
2. Pick a canonical object identifier fresh m for o
24
Executable semantics
Example of the Coq code
Definition cexec (Γ : env Ti) (δ : funenv Ti) (S : state Ti) : listset (state Ti) :=
let ’State k φ m := S in
match φ with
| Stmt & s =>
match s with
| skip => {[ State k (Stmt % skip) m ]}
| goto l => {[ State k (Stmt (y l) (goto l)) m ]}
| ...
| local{τ} s =>
let o := fresh (dom indexset m) in (* use canonical object identifier *)
{[ State (CLocal o τ :: k) (Stmt & s)
(mem_alloc Γ o false τ m) ]}
end
| Expr e =>
match maybe_EVal e with
| Some (Ω,v) => ...
| None =>
’(E,e’) ← expr redexes e; (* monadic programming to try each decomposition *)
match ehexec Γ (get_stack k) e’ m with
| Some (e2,m2) => {[ State k (Expr (subst E e2)) m2 ]}
| None =>
match maybe_ECall_redex e’ with
| Some (f, Ωs, vs) =>
S
{[ State (CFun E :: k) (Call f vs) (mem_unlock ( Ωs) m) ]}
| _ => {[ State k (Undef (UndefExpr E e’)) m ]}
end
end
end
| ...
25
Executable semantics
Example of the Coq code
Definition cexec (Γ : env Ti) (δ : funenv Ti) (S : state Ti) : listset (state Ti) :=
let ’State k φ m := S in
match φ with
| Stmt & s =>
match s with
| skip => {[ State k (Stmt % skip) m ]}
| goto l => {[ State k (Stmt (y l) (goto l)) m ]}
| ...
| local{τ} s =>
let o := fresh (dom indexset m) in (* use canonical object identifier *)
{[ State (CLocal o τ :: k) (Stmt & s)
(mem_alloc Γ o false τ m) ]}
end
| Expr e =>
match maybe_EVal e with
| Some (Ω,v) => ...
| None =>
’(E,e’) ← expr redexes e; (* monadic programming to try each decomposition *)
match ehexec Γ (get_stack k) e’ m with
| Some (e2,m2) => {[ State k (Expr (subst E e2)) m2 ]}
| None =>
match maybe_ECall_redex e’ with
| Some (f, Ωs, vs) =>
S
{[ State (CFun E :: k) (Call f vs) (mem_unlock ( Ωs) m) ]}
| _ => {[ State k (Undef (UndefExpr E e’)) m ]}
end
end
end
| ...
25
Executable semantics
Example of the Coq code
Definition cexec (Γ : env Ti) (δ : funenv Ti) (S : state Ti) : listset (state Ti) :=
let ’State k φ m := S in
match φ with
| Stmt & s =>
match s with
| skip => {[ State k (Stmt % skip) m ]}
| goto l => {[ State k (Stmt (y l) (goto l)) m ]}
| ...
| local{τ} s =>
let o := fresh (dom indexset m) in (* use canonical object identifier *)
{[ State (CLocal o τ :: k) (Stmt & s)
(mem_alloc Γ o false τ m) ]}
end
| Expr e =>
match maybe_EVal e with
| Some (Ω,v) => ...
| None =>
’(E,e’) ← expr redexes e; (* monadic programming to try each decomposition *)
match ehexec Γ (get_stack k) e’ m with
| Some (e2,m2) => {[ State k (Expr (subst E e2)) m2 ]}
| None =>
match maybe_ECall_redex e’ with
| Some (f, Ωs, vs) =>
S
{[ State (CFun E :: k) (Call f vs) (mem_unlock ( Ωs) m) ]}
| _ => {[ State k (Undef (UndefExpr E e’)) m ]}
end
end
end
| ...
25
Executable semantics
Soundness and completeness [Krebbers/Wiedijk, Submitted]
Theorem (Soundness)
If S2 ∈ exec S1 , then S1 _ S2 .
26
Executable semantics
Soundness and completeness [Krebbers/Wiedijk, Submitted]
Theorem (Soundness)
If S2 ∈ exec S1 , then S1 _ S2 .
Definition (Permutation)
We let S1 ∼f S2 , if S2 is obtained by renaming S1 with respect to
f : index → option index.
26
Executable semantics
Soundness and completeness [Krebbers/Wiedijk, Submitted]
Theorem (Soundness)
If S2 ∈ exec S1 , then S1 _ S2 .
Definition (Permutation)
We let S1 ∼f S2 , if S2 is obtained by renaming S1 with respect to
f : index → option index.
Theorem (Completeness)
If S1 _∗ S2 , then there exists an f and S20 such that:
S20
∗
exec
S1
f
∗
S2
26
Executable semantics
Demo
$ cat foo.c
int main(void) {
int x = 10, *p = &x;
return *p + *p + *p + *p + *p + *p;
}
$ ./ch2o -t foo.c
.....,.......,,,.....+;!|?*%%$$$$$$$$%%*?|!;:+-,..
........
"" 60
27
Conclusion
C11 is not a simple language
28
Questions
Sources: http://robbertkrebbers.nl/research/ch2o/
http://xkcd.com/371/
29