A precise and abstract memory model for C using symbolic

A precise and abstract memory model for C using
symbolic values
Frédéric Besson
Sandrine Blazy
Pierre Wilke
Rennes, France
P. Wilke
A precise and abstract memory model for C using symbolic values
1 / 24
What does this program do?
int main(){
int * p = (int *) malloc (sizeof (int));
//
*p = 42;
int * q = p | (checksum(p) & 0xF);
//
assert( checksum( (q » 4) « 4 ) == (q & 0xF) );
int * r = ( q » 4 ) « 4;
//
return *r;
}
P. Wilke
A precise and abstract memory model for C using symbolic values
2 / 24
What does this program do?
int main(){
int * p = (int *) malloc (sizeof (int));
// p = 0xTUVWXYZ0
*p = 42;
int * q = p | (checksum(p) & 0xF);
//
assert( checksum( (q » 4) « 4 ) == (q & 0xF) );
int * r = ( q » 4 ) « 4;
//
return *r;
}
P. Wilke
A precise and abstract memory model for C using symbolic values
2 / 24
What does this program do?
int main(){
int * p = (int *) malloc (sizeof (int));
// p = 0xTUVWXYZ0
*p = 42;
int * q = p | (checksum(p) & 0xF);
// q = 0xTUVWXYZ5
assert( checksum( (q » 4) « 4 ) == (q & 0xF) );
int * r = ( q » 4 ) « 4;
//
return *r;
}
P. Wilke
A precise and abstract memory model for C using symbolic values
2 / 24
What does this program do?
int main(){
int * p = (int *) malloc (sizeof (int));
// p = 0xTUVWXYZ0
*p = 42;
int * q = p | (checksum(p) & 0xF);
// q = 0xTUVWXYZ5
assert( checksum( (q » 4) « 4 ) == (q & 0xF) );
int * r = ( q » 4 ) « 4;
// r = 0xTUVWXYZ0 == p
return *r;
}
P. Wilke
A precise and abstract memory model for C using symbolic values
2 / 24
What does this program do?
int main(){
int * p = (int *) malloc (sizeof (int));
// p = 0xTUVWXYZ0
*p = 42;
int * q = p | (checksum(p) & 0xF);
// q = 0xTUVWXYZ5
assert( checksum( (q » 4) « 4 ) == (q & 0xF) );
int * r = ( q » 4 ) « 4;
// r = 0xTUVWXYZ0 == p
return *r;
}
“Real life”
Terminates and outputs 42
P. Wilke
A precise and abstract memory model for C using symbolic values
2 / 24
What does this program do?
int main(){
int * p = (int *) malloc (sizeof (int));
// p = 0xTUVWXYZ0
*p = 42;
int * q = p | (checksum(p) & 0xF);
// q = 0xTUVWXYZ5
assert( checksum( (q » 4) « 4 ) == (q & 0xF) );
int * r = ( q » 4 ) « 4;
// r = 0xTUVWXYZ0 == p
return *r;
}
ISO C Standard
Undefined behavior
P. Wilke
“Real life”
Terminates and outputs 42
A precise and abstract memory model for C using symbolic values
2 / 24
What does this program do?
int main(){
int * p = (int *) malloc (sizeof (int));
// p = 0xTUVWXYZ0
*p = 42;
int * q = p | (checksum(p) & 0xF);
// q = 0xTUVWXYZ5
assert( checksum( (q » 4) « 4 ) == (q & 0xF) );
int * r = ( q » 4 ) « 4;
// r = 0xTUVWXYZ0 == p
return *r;
}
ISO C Standard
Undefined behavior
“Real life”
Terminates and outputs 42
This work: an executable semantics for this kind of programs
P. Wilke
A precise and abstract memory model for C using symbolic values
2 / 24
State of the art: formal semantics for C
• Cholera [ESOP’99]: first formal semantics of C
• CompCert [JAR’09]: large subset of C, executable, abstract memory model
• KCC [POPL’12]: large subset of C, executable, abstract memory model
• Krebbers [POPL’14]: closer to the ISO C standard
• Our work: CompCert + more defined semantics + low level memory model
P. Wilke
A precise and abstract memory model for C using symbolic values
3 / 24
CompCert
compiles into
P. Wilke
A precise and abstract memory model for C using symbolic values
4 / 24
CompCert
Interpreter
compiles into
P. Wilke
A precise and abstract memory model for C using symbolic values
4 / 24
CompCert
Memory
model
Interpreter
compiles into
P. Wilke
A precise and abstract memory model for C using symbolic values
4 / 24
CompCert
Memory
model
Interpreter
compiles into
semantics
of C
P. Wilke
semantics of
ASM
A precise and abstract memory model for C using symbolic values
4 / 24
CompCert
Memory
model
Interpreter
compiles into
semantics
of C
semantics of
ASM
Semantic preservation theorem
If the program has well-defined semantics
Then the compiler does not introduce bugs
C OQ
P. Wilke
A precise and abstract memory model for C using symbolic values
4 / 24
CompCert
Memory
model
Interpreter
compiles into
Undefined
semantics
semantics
of C
semantics of
ASM
Semantic preservation theorem
If the program has well-defined semantics
Then the compiler does not introduce bugs
C OQ
P. Wilke
A precise and abstract memory model for C using symbolic values
4 / 24
CompCert
Memory
model
Interpreter
compiles into
Undefined
semantics
semantics
of C
semantics of
ASM
Semantic preservation theorem
If the program has well-defined semantics
Then the compiler does not introduce bugs
C OQ
P. Wilke
A precise and abstract memory model for C using symbolic values
4 / 24
CompCert’s memory model
• set of disjoint blocks
• Values:
val ::= i | (b, o) | Vundef
•
•
•
•
load (M , b, o) = bv c
store(M , b, o, v ) = bM 0 c
alloc (M , lo, hi ) = (M 0 , b)
free(M , b) = bM 0 c
b2
b1
0
(b2 , 2)
5
b3
7
5
128
“Good variable” properties:
load (store(M , b, o, v ), b, o) = v
P. Wilke
A precise and abstract memory model for C using symbolic values
5 / 24
Back to the example
int main(){
int * p = (int *) malloc (sizeof (int));
*p = 42;
int * q = p | 5;
int * r = (q » 4) « 4;
return *r;
}
bp
bq
br
P. Wilke
A precise and abstract memory model for C using symbolic values
6 / 24
Back to the example
int main(){
int * p = (int *) malloc (sizeof (int));
*p = 42;
int * q = p | 5;
int * r = (q » 4) « 4;
return *r;
}
bp
b16
(b , 0 )
bq
br
P. Wilke
A precise and abstract memory model for C using symbolic values
6 / 24
Back to the example
int main(){
int * p = (int *) malloc (sizeof (int));
*p = 42;
int * q = p | 5;
int * r = (q » 4) « 4;
return *r;
}
bp
b16
(b , 0 )
42
bq
br
P. Wilke
A precise and abstract memory model for C using symbolic values
6 / 24
Back to the example
int main(){
int * p = (int *) malloc (sizeof (int));
*p = 42;
int * q = p | 5;
int * r = (q » 4) « 4;
return *r;
}
bp
b16
(b , 0 )
42
bq
Vundef
br
P. Wilke
A precise and abstract memory model for C using symbolic values
6 / 24
Back to the example
int main(){
int * p = (int *) malloc (sizeof (int));
*p = 42;
int * q = p | 5;
int * r = (q » 4) « 4;
return *r;
}
bp
b16
(b , 0 )
42
bq
(b, 0) | 5
br
P. Wilke
A precise and abstract memory model for C using symbolic values
6 / 24
Back to the example
int main(){
int * p = (int *) malloc (sizeof (int));
*p = 42;
int * q = p | 5;
int * r = (q » 4) « 4;
return *r;
}
bp
b16
(b , 0 )
42
bq
(b, 0) | 5
br
(b, 0) | 5 4 4
P. Wilke
A precise and abstract memory model for C using symbolic values
6 / 24
Back to the example
int main(){
int * p = (int *) malloc (sizeof (int));
*p = 42;
int * q = p | 5;
int * r = (q » 4) « 4;
return *r;
}
bp
b16
(b , 0 )
42
bq
(b, 0) | 5
br
(b, 0) | 5 4 4
P. Wilke
≡ (b, 0)
A precise and abstract memory model for C using symbolic values
6 / 24
Back to the example
int main(){
int * p = (int *) malloc (sizeof (int));
*p = 42;
int * q = p | 5;
int * r = (q » 4) « 4;
return *r;
}
bp
b16
(b , 0 )
Alignment Constraints
42
bq
(b, 0) | 5
br
(b, 0) | 5 4 4
P. Wilke
≡ (b, 0)
A precise and abstract memory model for C using symbolic values
6 / 24
Back to the example
int main(){
int * p = (int *) malloc (sizeof (int));
*p = 42;
int * q = p | 5;
int * r = (q » 4) « 4;
return *r;
}
bp
b16
(b , 0 )
Alignment Constraints
42
bq
(b, 0) | 5
Symbolic values
br
(b, 0) | 5 4 4
P. Wilke
≡ (b, 0)
A precise and abstract memory model for C using symbolic values
6 / 24
Back to the example
int main(){
int * p = (int *) malloc (sizeof (int));
*p = 42;
int * q = p | 5;
int * r = (q » 4) « 4;
return *r;
}
bp
b16
(b , 0 )
Alignment Constraints
42
bq
(b, 0) | 5
Symbolic values
br
(b, 0) | 5 4 4
P. Wilke
≡ (b, 0)
Normalisation
A precise and abstract memory model for C using symbolic values
6 / 24
Outline
1
A C semantics with symbolic values
2
Normalisation: specification and implementation
3
Experimental evaluation
P. Wilke
A precise and abstract memory model for C using symbolic values
7 / 24
Outline
1
A C semantics with symbolic values
2
Normalisation: specification and implementation
3
Experimental evaluation
P. Wilke
A precise and abstract memory model for C using symbolic values
8 / 24
CompCert’s memory model with symbolic values
bp
b16
(b, 0)
42
bq
(b, 0) | 5
br
(b, 0) | 5 4 4
• Alignment
• alloc (M , lo, hi , mask) = (M 0 , b)
A(b) & mask = A(b)
• Symbolic values
• sv ::= val | op1 sv | sv op2 sv
• load (M , b, o) = bsvc
• store(M , b, o, sv) = bM 0 c
P. Wilke
A precise and abstract memory model for C using symbolic values
9 / 24
CompCert’s memory model with symbolic values
bp
b16
(b, 0)
42
bq
(b, 0) | 5
br
(b, 0) | 5 4 4
≡ (b, 0)
normalise : mem → sv → option val
When do we need to normalise symbolic values?
• Memory accesses:
• return *r;
• *p = 42;
• Control flow:
• if (c) { ... } else { ... }
P. Wilke
A precise and abstract memory model for C using symbolic values
10 / 24
Adapting the CompCert semantics
Semantic rules
` a, M → (b, o) load(M , b, o) = bv c
` ∗a , M → v
` a, M → (b, o) store(M , b, o, v ) = bM 0 c
` ∗a = v , M → M 0
` is _true(a)
` if a then s1 else s2 , M → s1 , M
P. Wilke
A precise and abstract memory model for C using symbolic values
11 / 24
Adapting the CompCert semantics
Semantic rules
` a, M → sva normalise(M , sva ) = b(b, o)c
load(M , b, o) = bsv c
` ∗a, M → sv
` a, M → sva normalise(M , sva ) = b(b, o)c
store(M , b, o, sv ) = bM 0 c
` ∗a = sv , M → M 0
` normalise(M , a) = bi c is _true(i )
` if a then s1 else s2 , M → s1 , M
P. Wilke
A precise and abstract memory model for C using symbolic values
11 / 24
Interpreter
Symbolic
Memory model
Semantics
of C
with normalisation
P. Wilke
A precise and abstract memory model for C using symbolic values
12 / 24
Outline
1
A C semantics with symbolic values
2
Normalisation: specification and implementation
3
Experimental evaluation
P. Wilke
A precise and abstract memory model for C using symbolic values
13 / 24
Evaluation of symbolic values
• Input:
• a memory mapping A : block → int32
• a symbolic value sv
• Output: the set of machine integers that sv evaluates to.
J·KA : sv → P (int32 )
i ∈ Ji KA
v1 ∈ Je1 KA
v1 ∈ Je1 KA
P. Wilke
A(b) + o ∈ J(b, o)KA
eval _unop(op1 , v1 ) = bv c
v ∈ Jop1 e1 KA
n ∈ JVundefKA
v2 ∈ Je2 KA eval _binop(op2 , v1 , v2 ) = bv c
v ∈ Je1 op2 e2 KA
A precise and abstract memory model for C using symbolic values
14 / 24
Valid memory mapping : A |= M
A memory mapping A : block → int32 is valid for memory M iff:
1
addresses from distinct blocks do not overlap,
2
the address of a block satisfies its alignment constraints:
A(b) & mask(M , b) = A(b)
3
valid addresses are not null.
Example:
b1
b2
bounds = [0, 4[
mask = 0xFFFFFFF0
bounds = [0, 8[
mask = 0xFFFFFFFF
Valid A1
A2
0
Invalid A3
A4
P. Wilke
4
8
12
16
20
24
28
32
36
40
A precise and abstract memory model for C using symbolic values
44
15 / 24
48
Normalisation: specification
If normalise(M , sv ) = bv c then:
• v 6= Vundef
• ∀A M , Jsv KA = Jv KA
Example:
Consider block
b with bounds [0, 4[ and 16-byte aligned.
sv =
(b, 0) | 5 4 4
v = ( b , 0)
A
{b 7→ 16}
{b 7→ 32}
{b 7→ 16k }
P. Wilke
JsvKA
(((16 + 0) | 5) 4)4
(((32 + 0) | 5) 4)4
(((16k + 0) | 5) 4)4
= 16
= 32
= 16k
JvKA
16
32
16k
A precise and abstract memory model for C using symbolic values
16 / 24
Normalisation: implementation
int main(){
int * p = (int *) malloc (sizeof (int));
*p = 42;
int * q = p | 5;
int * r = (q » 4) « 4;
return *r;
}
bp
b16
(b , 0 )
42
br
(b, 0) | 5 4 4
normalise
(b , 0 )
P. Wilke
A precise and abstract memory model for C using symbolic values
17 / 24
How to compute normalisation?
normalise(M , sv ) =
bval c
0/
bval c
sv
Bitvector
problem
SMT solver
M
P. Wilke
0/
A precise and abstract memory model for C using symbolic values
18 / 24
Normalisation in our example
Let A be a valid memory layout for memory M: A M
svr =
(b, 0) | 5 4 4
Translation into a bitvector expression
bvr =
(A(b) + 0) | 5 4 4
Goal: find a unique model (b, i ) such that: bvr = A(b) + i
Two steps:
• find a model (b0 , i0 ) such that bvr = A(b0 ) + i0
• (b0 , i0 ) = (b, 0)
• check that this solution is unique: unsat(bvr = A(b) + i ∧ b 6= b0 )
• the solution is indeed unique
• the normalisation returns b(b, 0)c
P. Wilke
A precise and abstract memory model for C using symbolic values
19 / 24
Outline
1
A C semantics with symbolic values
2
Normalisation: specification and implementation
3
Experimental evaluation
P. Wilke
A precise and abstract memory model for C using symbolic values
20 / 24
Experiments
Real life programs
• a malloc implementation (dlmalloc.c )
• excerpts from a C library : Public Domain C Library
• excerpts from a cryptographic library: Networking and Cryptographic library
• hand-written C programs
What kind of symbolic values do these programs trigger?
• bitwise operations on pointer
• use of undefined values
P. Wilke
A precise and abstract memory model for C using symbolic values
21 / 24
Putting the pieces together
Symbolic
Interpreter
Symbolic
Memory model
Z3
Normalising semantics
of C
Coq development available at http://www.irisa.fr/celtique/ext/csem/
P. Wilke
A precise and abstract memory model for C using symbolic values
22 / 24
Conclusion
Results:
• non-regression
• more programs have their expected semantics
• limits: some undefined behaviors are not captured by our semantics
• pointer comparison
Ongoing work:
• Proofs of the memory model
• e.g. load (store(M , b, o, sv), b, o) = sv
• Proof of the whole CompCert compiler?
• memory injections redefined
P. Wilke
A precise and abstract memory model for C using symbolic values
23 / 24
Questions?
P. Wilke
A precise and abstract memory model for C using symbolic values
24 / 24