Spring 2017
Program Analysis and Verification
Lecture 9: Abstract Interpretation
Roman Manevich
Ben-Gurion University
Tentative syllabus
• Program Verification: Operational semantics, Hoare Logic, Predicate Calculus, Data Structures, Termination
• Program Analysis Basics: Control Flow Graphs, Equation Systems, Collecting Semantics
• Abstract Interpretation fundamentals: Lattices, Fixed-Points, Chaotic Iteration, Galois Connections, Domain constructors, Widening/Narrowing
• Analysis Techniques: Numerical Domains, Alias analysis, Interprocedural Analysis, Shape Analysis, CEGAR
2
Previously
• Control-flow graphs
• Operational semantics for CFGs
• Collecting semantics
• Equation systems
3
Structural semantics for While
[ass]       ⟨x := a, σ⟩ ⇒ σ[x ↦ A⟦a⟧σ]
[skip]      ⟨skip, σ⟩ ⇒ σ
[comp1]     ⟨S1, σ⟩ ⇒ ⟨S1’, σ’⟩
            ─────────────────────────────
            ⟨S1; S2, σ⟩ ⇒ ⟨S1’; S2, σ’⟩
[comp2]     ⟨S1, σ⟩ ⇒ σ’
            ─────────────────────────────
            ⟨S1; S2, σ⟩ ⇒ ⟨S2, σ’⟩
[if_tt]     ⟨if b then S1 else S2, σ⟩ ⇒ ⟨S1, σ⟩   if B⟦b⟧σ = tt
[if_ff]     ⟨if b then S1 else S2, σ⟩ ⇒ ⟨S2, σ⟩   if B⟦b⟧σ = ff
[while_tt]  ⟨while b do S, σ⟩ ⇒ ⟨S; while b do S, σ⟩   if B⟦b⟧σ = tt
[while_ff]  ⟨while b do S, σ⟩ ⇒ σ   if B⟦b⟧σ = ff
What does “structural” stand for?
4
Annotated CFG
• The equivalent of annotated programs for
CFGs are CFGs with predicates labeling nodes
Example program:
  label0: if x <= 0 goto label1
          x := x – 1
          goto label0
  label1:
Annotated CFG (figure):
  entry {true} → loop head {true}
  loop head --assume x > 0--> {x>0} --x := x - 1--> {x>=0} → back to loop head
  loop head --assume x <= 0--> exit {x<=0}
5
Correctly-annotated CFG
• A CFG is correctly (soundly) annotated if for every edge n --c--> n’ with {P} at n and {Q} at n’, the following condition holds:
  ∀σ, σ’. σ ⊨ P ∧ ⟦c⟧σ = σ’ ⟹ σ’ ⊨ Q
6
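When the state space is finite, the correctness condition can be checked by brute force. The following minimal Python sketch does this for the example loop above. It assumes x is the only variable and restricts x to a bounded range. Node names (`entry`, `head`, `in_loop`, `after_dec`, `exit`) and helper names are hypothetical. Because only finitely many states are sampled, the check can refute an annotation but cannot prove it correct for all integers.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

State = int                                   # value of x, the only variable
Pred = Callable[[State], bool]
Command = Callable[[State], Optional[State]]  # None means the edge is blocked

def skip(x: State) -> Optional[State]:
    return x

def dec(x: State) -> Optional[State]:            # x := x - 1
    return x - 1

def assume_pos(x: State) -> Optional[State]:     # assume x > 0
    return x if x > 0 else None

def assume_nonpos(x: State) -> Optional[State]:  # assume x <= 0
    return x if x <= 0 else None

# Edges (n, c, n') of the example CFG; after_dec -> head is the goto back-edge.
EDGES: List[Tuple[str, Command, str]] = [
    ("entry", skip, "head"),
    ("head", assume_pos, "in_loop"),
    ("in_loop", dec, "after_dec"),
    ("after_dec", skip, "head"),
    ("head", assume_nonpos, "exit"),
]

def correctly_annotated(ann: Dict[str, Pred], states: Iterable[State]) -> bool:
    """For every edge n -c-> n' and sampled state s:
    if s satisfies ann[n] and [[c]]s = s', then s' must satisfy ann[n']."""
    sample = list(states)
    for n, c, n2 in EDGES:
        for s in sample:
            if ann[n](s):
                s2 = c(s)
                if s2 is not None and not ann[n2](s2):
                    return False
    return True

ANN: Dict[str, Pred] = {
    "entry": lambda x: True, "head": lambda x: True,
    "in_loop": lambda x: x > 0, "after_dec": lambda x: x >= 0,
    "exit": lambda x: x <= 0,
}
print(correctly_annotated(ANN, range(-10, 11)))                               # True
print(correctly_annotated({**ANN, "after_dec": lambda x: x > 0}, range(-10, 11)))  # False
```

The second call fails because the edge `x := x - 1` maps x = 1 to x = 0, which does not satisfy x > 0.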
Axiomatic semantics for While
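[ass_p]    { P[a/x] } x := a { P }
[skip_p]   { P } skip { P }
[comp_p]   { P } S1 { Q },  { Q } S2 { R }
           ─────────────────────────────
           { P } S1; S2 { R }
[if_p]     { b ∧ P } S1 { Q },  { ¬b ∧ P } S2 { Q }
           ──────────────────────────────────────
           { P } if b then S1 else S2 { Q }
[while_p]  { b ∧ P } S { P }
           ─────────────────────────────
           { P } while b do S { ¬b ∧ P }
[cons_p]   { P’ } S { Q’ }
           ─────────────   if P ⇒ P’ and Q’ ⇒ Q
           { P } S { Q }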
7
Hoare logic for CFGs (WP variant)
• For each edge n --c--> n’ with {P} at n and {Q} at n’:
[ass]      { P[a/x] } x := a { P }
[skip]     { P } skip { P }
[assume]   { P } assume b { P ∧ b }
[cons]     { P’ } S { Q’ }
           ─────────────   if P ⇒ P’ and Q’ ⇒ Q
           { P } S { Q }
8
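The [ass] rule computes the precondition by substitution, P[a/x]. The minimal Python sketch below reads predicates semantically, as functions over states (a dict from variable names to integers, which is an assumption). Under that reading, P[a/x] holds in state s exactly when P holds in the state where x has been updated to the value of a. The name `wp_assign` is hypothetical.

```python
from typing import Callable, Dict

State = Dict[str, int]
Pred = Callable[[State], bool]
Expr = Callable[[State], int]

def wp_assign(x: str, a: Expr, post: Pred) -> Pred:
    """Semantic reading of [ass]: P[a/x] holds in s iff P holds in s[x -> a(s)]."""
    return lambda s: post({**s, x: a(s)})

# { ? } x := x - 1 { x >= 0 }
post: Pred = lambda s: s["x"] >= 0
pre = wp_assign("x", lambda s: s["x"] - 1, post)   # behaves like x - 1 >= 0
print([v for v in range(-2, 4) if pre({"x": v})])  # [1, 2, 3]
```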
Hoare logic for CFGs (SP variant)
• For each edge n --c--> n’ with {P} at n and {Q} at n’:
[ass]      { P } x := a { ∃v. x = a[v/x] ∧ P[v/x] }
[skip]     { P } skip { P }
[assume]   { P } assume b { P ∧ b }
[cons]     { P’ } S { Q’ }
           ─────────────   if P ⇒ P’ and Q’ ⇒ Q
           { P } S { Q }
9
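The SP form of [ass] existentially quantifies over the old value v of x. The minimal sketch below uses the same semantic encoding of predicates as functions over states. It replaces the unbounded quantifier with a search over a finite, hypothetical range of candidate values, so it is only an illustration. The name `sp_assign` is an assumption.

```python
from typing import Callable, Dict, Iterable

State = Dict[str, int]
Pred = Callable[[State], bool]
Expr = Callable[[State], int]

def sp_assign(x: str, a: Expr, pre: Pred, values: Iterable[int]) -> Pred:
    """[ass] (SP): exists v. x = a[v/x] and P[v/x].
    v ranges over the finite `values`, a stand-in for the unbounded domain of x."""
    candidates = list(values)
    def post(s: State) -> bool:
        for v in candidates:
            old = {**s, x: v}            # s[v/x]: guess that x held v before the assignment
            if s[x] == a(old) and pre(old):
                return True
        return False
    return post

# { x > 0 } x := x - 1 { ? }
post = sp_assign("x", lambda s: s["x"] - 1, lambda s: s["x"] > 0, range(-10, 11))
print([v for v in range(-3, 4) if post({"x": v})])   # [0, 1, 2, 3]
```

The result matches the {x>=0} annotation after `x := x - 1` in the annotated CFG.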
Collecting semantics
• Intuitively, like a forward-going predicate
transformer
Example program:
  label0: if x <= 0 goto label1
          x := x – 1
          goto label0
  label1:
Annotated CFG (figure):
  entry {true} → loop head {true}
  loop head --assume x > 0--> {x>0} --x := x - 1--> {x>=0} → back to loop head
  loop head --assume x <= 0--> exit {x<=0}
10
Equational definition example
• A vector of variables R[0, 1, 2, 3, 4]
• R[0] = State
  R[1] = R[0] ∪ R[3]
  R[2] = ⟦assume x>0⟧ R[1]
  R[3] = ⟦x:=x-1⟧ R[2]
  R[4] = ⟦assume ¬(x>0)⟧ R[1]
• A (recursive) system of equations
CFG (figure): entry (R[0]) → 1 (R[1]); 1 --assume x > 0--> 2 (R[2]); 2 --x := x - 1--> 3 (R[3]) → 1; 1 --assume x <= 0--> exit (R[4])
11
General definition
• For every node n with incoming edges {(n1, c1, n), …, (nk, ck, n)} define the equation
  R[n] = ⟦c1⟧ R[n1] ∪ … ∪ ⟦ck⟧ R[nk]
12
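The general definition can be solved by starting from empty sets and re-evaluating every equation until nothing changes. Below is a minimal Python sketch applied to the example system R[0..4]. It assumes x is the only variable and that State is the hypothetical finite set x ∈ {-3, …, 3}. The iteration terminates here only because that set is finite and the loop only decreases x. Helper names (`post`, `solve`) are assumptions.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

State = int                                    # value of x
Command = Callable[[State], Optional[State]]   # None: edge blocked
Edge = Tuple[int, Command, int]                # (n_i, c_i, n)

def post(c: Command, states: FrozenSet[State]) -> FrozenSet[State]:
    """[[c]] lifted to sets of states."""
    return frozenset(t for t in (c(s) for s in states) if t is not None)

def solve(nodes: List[int], edges: List[Edge], entry: int,
          init: FrozenSet[State]) -> Dict[int, FrozenSet[State]]:
    """Iterate R[n] = [[c1]]R[n1] ∪ ... ∪ [[ck]]R[nk] until nothing changes."""
    R: Dict[int, FrozenSet[State]] = {n: frozenset() for n in nodes}
    R[entry] = init
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = frozenset().union(*(post(c, R[m]) for m, c, d in edges if d == n))
            if new != R[n]:
                R[n], changed = new, True
    return R

skip: Command = lambda x: x
edges: List[Edge] = [
    (0, skip, 1), (3, skip, 1),                   # R[1] = R[0] ∪ R[3]
    (1, lambda x: x if x > 0 else None, 2),       # R[2] = [[assume x>0]] R[1]
    (2, lambda x: x - 1, 3),                      # R[3] = [[x:=x-1]] R[2]
    (1, lambda x: None if x > 0 else x, 4),       # R[4] = [[assume ¬(x>0)]] R[1]
]
R = solve([0, 1, 2, 3, 4], edges, entry=0, init=frozenset(range(-3, 4)))
print(sorted(R[2]), sorted(R[3]), sorted(R[4]))  # [1, 2, 3] [0, 1, 2] [-3, -2, -1, 0]
```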
Agenda
• Motivation for static analysis
• Defining static analysis
• Semantic domains
  – Preorders
  – Partial orders (posets)
  – Pointed posets
  – Ascending/descending chains
  – The height of a poset
  – Join and Meet operators
  – Complete lattices
Appendix A.
13
static analysis motivation
14
Goal of static analysis
• Develop algorithms to automatically annotate
CFGs with predicates
• Applications:
– Enable compiler optimizations
– Prove absence of errors
– Prove programmer-given assertions
15
Compiler optimization
• A compiler optimization is defined by a
program transformation:
T : Stmt → Stmt
• The transformation is semantics-preserving:
∀σ. ⟦C⟧σ = ⟦T(C)⟧σ
• The transformation is applied to the program
only if an enabling condition is met
• Use static analysis to infer enabling conditions
16
Common Subexpression Elimination
• If we have two variable assignments
    x := a op b          op ∈ {+, -, *, ==, <=}
    …
    y := a op b
  and the values of x, a, and b have not changed between the assignments, rewrite the code as
    x := a op b
    …
    y := x
• Eliminates useless recalculation
• Analysis: available expressions (AE)
17
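The rewrite rule above can be illustrated by a local CSE pass over a single straight-line block that keeps a running table of available expressions. The minimal Python sketch below uses a hypothetical tuple-based IR (`(target, (a, op, b))` or `(target, var)`). A full available-expressions analysis instead works over the whole CFG and intersects facts where paths meet. Nothing here is meant as the course's implementation.

```python
from typing import Dict, List, Tuple, Union

Binop = Tuple[str, str, str]          # (a, op, b), e.g. ("a", "+", "b")
Rhs = Union[Binop, str]               # a binary expression or a plain variable copy
Assign = Tuple[str, Rhs]              # (target, rhs) stands for target := rhs

def cse(block: List[Assign]) -> List[Assign]:
    """Local CSE on straight-line code using a running table of available expressions."""
    available: Dict[Binop, str] = {}  # expression -> variable that currently holds it
    out: List[Assign] = []
    for target, rhs in block:
        if isinstance(rhs, tuple) and rhs in available:
            out.append((target, available[rhs]))   # y := x instead of recomputing a op b
        else:
            out.append((target, rhs))
        # Assigning target kills facts stored in target and expressions that read target.
        available = {e: v for e, v in available.items()
                     if v != target and target not in (e[0], e[2])}
        if isinstance(rhs, tuple) and target not in (rhs[0], rhs[2]):
            available[rhs] = target
    return out

prog = [("x", ("a", "+", "b")), ("c", ("d", "*", "e")), ("y", ("a", "+", "b"))]
print(cse(prog)[-1])  # ('y', 'x')
```

If `a` is reassigned between the two computations of `a + b`, the fact is killed and the second computation is kept.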
What do we need to prove?
{ true }                      { true }
C1                            C1
x := a + b                    x := a + b
C2                            C2
{ x = a + b }      CSE        { x = a + b }
y := a + b         ⟶          y := x
C3                            C3
Assertion localizes decision
18
Constant folding
• Optimization: constant folding
    { x=c }
    y := aexpr    ⟶    y := eval(aexpr[c/x])
  Constant folding simplifies constant expressions
  – Example:
    x:=7; y:=x*9
    transformed to: x:=7; y:=7*9
    and then to: x:=7; y:=63
• Analysis: constant propagation (CP)
  – Infers facts of the form x=c
19
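The transformation y := eval(aexpr[c/x]) has two steps: substitute the constants inferred by constant propagation, then evaluate every subexpression whose operands are all constants. The minimal sketch below does this on a hypothetical expression tree (int literal, variable name, or `(op, lhs, rhs)`). The `consts` dictionary stands in for the facts x=c that CP would infer.

```python
import operator
from typing import Callable, Dict, Tuple, Union

Expr = Union[int, str, Tuple[str, "Expr", "Expr"]]   # literal | variable | (op, lhs, rhs)
OPS: Dict[str, Callable[[int, int], int]] = {
    "+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(e: Expr, consts: Dict[str, int]) -> Expr:
    """eval(aexpr[c/x]): substitute facts x=c, then evaluate every constant subtree."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return consts.get(e, e)          # replace x by c when CP inferred x=c
    op, lhs, rhs = e
    l, r = fold(lhs, consts), fold(rhs, consts)
    if isinstance(l, int) and isinstance(r, int):
        return OPS[op](l, r)
    return (op, l, r)

print(fold(("*", "x", 9), {"x": 7}))               # 63
print(fold(("+", "z", ("*", "x", 9)), {"x": 7}))   # ('+', 'z', 63)
```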
Defining static analysis
20
First attempt to formalize static analysis
• Problem 1: given a CFG G for a program P,
automatically annotate G
– Too naïve: we can always annotate all nodes with the
assertion true
– We need a quality criterion for the solution
• Problem 2: given a CFG G for a program P,
automatically annotate G with the strongest
assertions
– This is essentially asking to compute the result of the
collecting semantics
– Impossible: deciding whether a node is unreachable (so that it can be annotated with false) is equivalent to the halting problem
21
How do you find a lion in a desert?
• Problem 1: we are small and the desert is huge
• Problem 2: lions move; assume the lion is stationary
Sahara Desert
22
Finding a lion with a sensor
• Assume a sensor for k×k zones with resolution k = 1…n
– If sensor says no: no lion in zone
– If sensor says maybe: either lion is there or not
Sahara Desert
23
Systematic search
• Divide the desert into n×n cubes and apply the sensor to each cube
• Decrease k and repeat only on maybe cubes
Sahara Desert
24
Systematic search
• Divide the desert into n×n cubes and apply the sensor to each cube
• Decrease k and repeat only on maybe cubes
Sahara Desert
25
Third attempt to define static analysis
• Pick a finite domain of predicates D={p1, …, pk}
• Problem 3: given a CFG G for a program P,
automatically annotate G with the strongest
assertions only by predicates from D
– Still impossible: just deciding whether p ⇒ q is undecidable
• Solution: introduce more compromises
– Approximate the check for p ⇒ q by an operation p ⊑ q
26
The need for approximate transformers
• What if the CFG is just a single edge and the precondition P ∈ D is already given?
• Problem: compute the strongest postcondition Q from D
• We need an operation c# : D → D
  – A transformer c# P = Q where Q is the strongest predicate in D that is weaker than the strongest postcondition
Edge (figure): n {P} --c--> n’ {Q}
[ass] { P } x := a { ∃v. x = a[v/x] ∧ P[v/x] }
27
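For a finite set of predicates read conjunctively, the transformer c# described above can be sketched as follows: collect every predicate in D that is implied by the strongest postcondition. The minimal Python sketch below computes this by enumeration. The predicate set `PREDS` is hypothetical, and the universe of states is a finite sample. Real analyses cannot enumerate states and need a decision procedure or a hand-written transformer, so this only illustrates the definition.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Optional, Set

State = int
Command = Callable[[State], Optional[State]]

# Hypothetical finite set of predicates over x; an element of D is a subset, read conjunctively.
PREDS: Dict[str, Callable[[State], bool]] = {
    "x>0": lambda x: x > 0, "x>=0": lambda x: x >= 0,
    "x<=0": lambda x: x <= 0, "x=0": lambda x: x == 0,
}

def gamma(p: FrozenSet[str], universe: Iterable[State]) -> Set[State]:
    """States (from the finite universe) satisfying every predicate in p."""
    return {s for s in universe if all(PREDS[q](s) for q in p)}

def best_post(c: Command, p: FrozenSet[str], universe: Iterable[State]) -> FrozenSet[str]:
    """c#(P): all predicates of D implied by the strongest postcondition of c from P."""
    states = list(universe)
    sp = {t for t in (c(s) for s in gamma(p, states)) if t is not None}
    return frozenset(q for q, f in PREDS.items() if all(f(t) for t in sp))

dec: Command = lambda x: x - 1   # x := x - 1
print(sorted(best_post(dec, frozenset({"x>0"}), range(-50, 51))))  # ['x>=0']
```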
The need for a join
• Suppose the CFG has a join node
• The strongest postcondition would be Q = L ∨ R
• We need to approximate Q in D
• We will define a join operation L ⊔ R
Join node (figure): a {L} --skip--> c {Q} and b {R} --skip--> c {Q}
28
Equational definition of the semantics
• R[entry] = State
  R[2] = R[entry] ∪ ⟦x:=x-1⟧ R[3]
  R[3] = ⟦assume x>0⟧ R[2]
  R[exit] = ⟦assume x≤0⟧ R[2]
• A recursive system of equations
• How can we approximate it using what we have learned so far?
CFG (figure): entry → node 2 (if x > 0); node 2 → node 3 (x := x - 1) when x > 0; node 3 → node 2; node 2 → exit when x ≤ 0
29
An abstract semantics
• R[entry] = true
  R[2] = R[entry] ⊔ ⟦x:=x-1⟧# R[3]    (abstract transformer for x:=x-1)
  R[3] = ⟦assume x>0⟧# R[2]
  R[exit] = ⟦assume x≤0⟧# R[2]
• A recursive system of equations
• We will now develop the principles for solving them
CFG (figure): entry → node 2 (if x > 0); node 2 → node 3 (x := x - 1) when x > 0; node 3 → node 2; node 2 → exit when x ≤ 0
30
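As a preview of solving such a system, the minimal Python sketch below instantiates the abstract equations with the signs domain used later in this lecture (true, x ≤ 0, x ≥ 0, x < 0, x = 0, x > 0, false). It then iterates naively until nothing changes. Encoding signs as sets over "-", "0", "+" and the transformers `dec_sharp`, `assume_pos_sharp`, `assume_nonpos_sharp` are assumptions for illustration. The principled treatment (fixed points, chaotic iteration) is developed in the course.

```python
from typing import Dict, FrozenSet

Sign = str                                    # "-", "0" or "+"
Abs = FrozenSet[Sign]
TOP: Abs = frozenset({"-", "0", "+"})         # true
BOT: Abs = frozenset()                        # false

def close(a: Abs) -> Abs:
    """The signs domain has no element for x != 0, so {-, +} is rounded up to true."""
    return TOP if a == {"-", "+"} else frozenset(a)

def join(a: Abs, b: Abs) -> Abs:
    return close(a | b)

def dec_sharp(a: Abs) -> Abs:                 # [[x := x - 1]]#
    out = set()
    for s in a:
        out |= {"0", "+"} if s == "+" else {"-"}   # x>0 gives x>=0; x<=0 gives x<0
    return close(frozenset(out))

def assume_pos_sharp(a: Abs) -> Abs:          # [[assume x > 0]]#
    return a & {"+"}

def assume_nonpos_sharp(a: Abs) -> Abs:       # [[assume x <= 0]]#
    return a & {"-", "0"}

R: Dict[str, Abs] = {"entry": TOP, "2": BOT, "3": BOT, "exit": BOT}
changed = True
while changed:                                # naive round-robin iteration from false
    new = {"entry": TOP,
           "2": join(R["entry"], dec_sharp(R["3"])),
           "3": assume_pos_sharp(R["2"]),
           "exit": assume_nonpos_sharp(R["2"])}
    changed, R = new != R, new
print({n: sorted(v) for n, v in R.items()})   # R["3"] is x > 0, R["exit"] is x <= 0
```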
Abstract
interpretation
Theory
[1977]
By Rama (Own work) [CC-BY-SA-2.0-fr (http://creativecommons.org/licenses/by-sa/2.0/fr/deed.en)], via Wikimedia Commons
31
Abstract Interpretation [CC77]
• A very general mathematical framework
for approximating semantics
– Generalizes Hoare Logic
– Generalizes weakest precondition calculus
• Allows designing sound static analysis algorithms
– Usually compute by iterating to a fixed-point
– Not specific to any programming language style
• Results of an abstract interpretation are (loop)
invariants
– Can be interpreted as axiomatic verification assertions
and used for verification
32
The big picture
• Use semantic domains to define both concrete
semantics and abstract semantics
• Relate semantics in a sound way
• Interpret program over abstract semantics
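Figure: at the concrete level, statement S maps a set of states to a set of states (collecting semantics). At the abstract level, statement S maps an abstract representation of sets of states to another abstract representation (abstract semantics). Abstraction maps sets of states to abstract representations, and meaning maps abstract representations back to sets of states.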
33
A theory of semantic domains
1. Approximating elements
2. Approximating sets of elements
By Brett Jordan David Macdonald [CC-BY-2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons
34
Overall idea
• A semantic domain can be used to define
properties (representations of predicates)
– Also called abstract states
– We called them assertions in axiomatic semantics
• Common representations
– Logical formulas
– Automata
– Specialized graphs
35
Simple available expressions domain
• For simple available expressions, let’s define
SAV-predicates as follows:
– SAV-factoids = { x = y | x, y ∈ Var } ∪ { x = y + z | x, y, z ∈ Var }
– SAV-predicates = 2^(SAV-factoids)
• A set of factoids is interpreted conjunctively:
{a=b+c, c=d+f} stands for a=b+c ∧ c=d+f
• What does {} stand for then?
36
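The conjunctive reading of SAV-predicates can be made concrete with a small satisfaction check. The minimal Python sketch below uses a hypothetical encoding where `("x", "y")` is the factoid x = y and `("x", "y", "z")` is x = y + z. The state values are made up for illustration.

```python
from typing import Dict, FrozenSet, Tuple, Union

# Hypothetical encoding: ("x", "y") is the factoid x = y and ("x", "y", "z") is x = y + z.
Factoid = Union[Tuple[str, str], Tuple[str, str, str]]
State = Dict[str, int]

def holds(f: Factoid, s: State) -> bool:
    if len(f) == 2:
        return s[f[0]] == s[f[1]]
    return s[f[0]] == s[f[1]] + s[f[2]]

def satisfies(s: State, pred: FrozenSet[Factoid]) -> bool:
    """A SAV-predicate is a set of factoids read conjunctively."""
    return all(holds(f, s) for f in pred)

s = {"a": 5, "b": 2, "c": 3, "d": 1, "f": 2}
print(satisfies(s, frozenset({("a", "b", "c"), ("c", "d", "f")})))  # True: 5 = 2+3 and 3 = 1+2
```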
A taxonomy of semantic domain types
Complete Lattice
(D, ⊑, ⊔, ⊓, ⊥, ⊤)
Lattice
(D, ⊑, ⊔, ⊓, ⊥, ⊤)
Join semilattice
(D, ⊑, ⊔, ⊥)
Meet semilattice
(D, ⊑, ⊓, ⊤)
Complete partial order (CPO)
(D, ⊑, ⊥)
Partial order (poset)
(D, ⊑)
Preorder
(D, ⊑)
37
preorders
38
Ordering predicates
• Recall that we need to qualitatively order the
result of a static analysis
• We want the strongest annotations
39
Preorder
• Let D (for semantic domain) be a set of
elements
• We say that a binary order relation ⊑ over D is a preorder if the following conditions hold for every d, d’, d’’ ∈ D
  – Reflexive: d ⊑ d
  – Transitive: d ⊑ d’ and d’ ⊑ d’’ implies d ⊑ d’’
• There may exist d, d’ such that d ⊑ d’ and d’ ⊑ d yet d ≠ d’
40
A preorder for SAV
• For simple available expressions, let’s define
SAV-predicates as follows:
– SAV-factoids = { x = y | x, y ∈ Var } ∪ { x = y + z | x, y, z ∈ Var }
– SAV-predicates = 2^(SAV-factoids)
• Define P1 ⊑ P2 iff P1 ⇒ P2
• Is it a preorder?
41
The problems with preorders
• Equivalent elements have different
representations
– Which result would we prefer {x=y, x=a+b} or
{x=y, y=a+b}?
– Transformers may have different results:
{x=y, x=a+b} assume y≠a+b {x=y, x=a+b}
{x=y, y=a+b} assume y≠a+b {false}
– Leads to unpredictability
• May turn a terminating analysis into a nonterminating one
In practice some static analyses still use preorders
(taking special care to ensure termination)
42
Partial orders
43
Partially ordered sets (partial orders)
• A partially ordered set (Poset for short)
is a pair (D, ⊑)
• ⊑ ⊆ D × D has the following properties, for all d, d’, d’’ in D
  – Reflexive: d ⊑ d
  – Transitive: d ⊑ d’ and d’ ⊑ d’’ implies d ⊑ d’’
  – Anti-symmetric: d ⊑ d’ and d’ ⊑ d implies d = d’ (makes it easier to choose the best element)
• If d ⊑ d’ and d ≠ d’ we write d ⊏ d’
44
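For a finite D, the three defining properties can be checked exhaustively. The minimal sketch below does this with hypothetical helper names. The second example is a preorder that is not anti-symmetric, the situation the preorder slides warn about: distinct elements that are equivalent under ⊑.

```python
from itertools import combinations, product
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

def is_preorder(D: Iterable[T], leq: Callable[[T, T], bool]) -> bool:
    """Reflexive and transitive, checked exhaustively over a finite D."""
    elems = list(D)
    reflexive = all(leq(d, d) for d in elems)
    transitive = all(leq(a, c) for a, b, c in product(elems, repeat=3)
                     if leq(a, b) and leq(b, c))
    return reflexive and transitive

def is_partial_order(D: Iterable[T], leq: Callable[[T, T], bool]) -> bool:
    """A preorder that is also anti-symmetric."""
    elems = list(D)
    antisymmetric = all(a == b for a, b in product(elems, repeat=2)
                        if leq(a, b) and leq(b, a))
    return is_preorder(elems, leq) and antisymmetric

subsets = [frozenset(c) for r in range(3) for c in combinations([1, 2], r)]
print(is_partial_order(subsets, lambda a, b: a <= b))         # True: ⊆ on subsets of {1, 2}
nums = [-2, -1, 0, 1, 2]
print(is_preorder(nums, lambda a, b: abs(a) <= abs(b)))       # True
print(is_partial_order(nums, lambda a, b: abs(a) <= abs(b)))  # False: -1 and 1 are distinct but equivalent
```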
Partially ordered sets (partial orders)
• A partially ordered set (Poset for short)
is a pair (D, ⊑)
• ⊑ ⊆ D × D has the following properties, for all d, d’, d’’ in D
  – Reflexive: d ⊑ d
  – Transitive: d ⊑ d’ and d’ ⊑ d’’ implies d ⊑ d’’
  – Anti-symmetric: d ⊑ d’ and d’ ⊑ d implies d = d’
• If d ⊑ d’ and d ≠ d’ we write d ⊏ d’
• If d ⊒ d’ and d ≠ d’ we write d ⊐ d’
45
A partial order for SAV
• Define P1 ⊑ P2 iff P1 ⊇ P2
• Is it a partial order?
• Does P1 ⊑ P2 imply P1 ⇒ P2?
46
SAV partial order: P1 ⊑ P2 iff P1 ⊇ P2
Hasse diagram for Var = {x, y} (figure); elements shown include:
  {}
  {x=y}  {y=x}  {x=x+x}  {y=y+y}  {y=x+y}  {y=y+x}  {x=x+y}  {x=y+x}
  {x=y, y=x}  {x=y, x=x+x}  {x=x+y, x=y+x}  …
  {x=y, x=x+x, x=x+y}  …
  {x=y, y=x, x=x+x, y=y+y, y=x+y, y=y+x, x=x+y, x=y+x}
47
SAV preorder 1: P1 ⊑ P2 iff P1 ⇒ P2
Var = {x, y} (figure); elements shown include:
  {}
  {x=y}  {y=x}  {x=x+x}  {y=y+y}  {y=x+y}  {y=y+x}  {x=x+y}  {x=y+x}
  {x=y, y=x}  {x=y, x=x+x}  {x=x+y, x=y+x}  …
  {x=y, x=x+x, x=x+y}  …
  {x=y, y=x, x=x+x, y=y+y, y=x+y, y=y+x, x=x+y, x=y+x}
48
Bottom elements
• How can we represent the fact that a program
point is unreachable?
• In the SAV domain, we can’t
• We need to add… ?
49
Pointed poset
• A poset (D, ⊑) with a least element ⊥ is called a pointed poset
  – For all d ∈ D we have that ⊥ ⊑ d
• The pointed poset is denoted by (D, ⊑, ⊥)
• We can always transform a poset (D, ⊑) into a pointed poset by adding a special bottom element:
  (D ∪ {⊥}, ⊑ ∪ {(⊥, d) | d ∈ D ∪ {⊥}}, ⊥)
• Example: false = ⊥ = {false}
50
chains
51
Chains
• Let (D, ⊑) be a poset
• If d ⊑ d’ and d ≠ d’ we write d ⊏ d’
• Similarly define d ⊐ d’
• An ascending chain is a sequence x1 ⊏ x2 ⊏ … ⊏ xk ⊏ …
• A descending chain is a sequence x1 ⊐ x2 ⊐ … ⊐ xk ⊐ …
52
Ascending chain example
the signs domain
(for variable x)
Hasse diagram (figure):
          true
    x ≤ 0       x ≥ 0
 x < 0    x = 0    x > 0
          false
53
Height of a poset
• The height of a poset is the length of the
maximal ascending chain
54
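For a finite poset, the height can be computed by following the covering edges of the Hasse diagram upward from every element. The minimal sketch below does this for the signs domain from the previous slide. It counts the number of strict ⊏ steps in a chain; some texts count elements instead, which would give one more. The `ABOVE` table is transcribed from that figure.

```python
from functools import lru_cache
from typing import Dict, FrozenSet

# Hasse diagram of the signs domain for x: each element maps to the elements directly above it.
ABOVE: Dict[str, FrozenSet[str]] = {
    "false": frozenset({"x<0", "x=0", "x>0"}),
    "x<0": frozenset({"x<=0"}),
    "x=0": frozenset({"x<=0", "x>=0"}),
    "x>0": frozenset({"x>=0"}),
    "x<=0": frozenset({"true"}),
    "x>=0": frozenset({"true"}),
    "true": frozenset(),
}

@lru_cache(maxsize=None)
def longest_from(d: str) -> int:
    """Number of strict steps in the longest ascending chain starting at d."""
    return max((1 + longest_from(u) for u in ABOVE[d]), default=0)

height = max(longest_from(d) for d in ABOVE)
print(height)  # 3, e.g. false ⊏ x<0 ⊏ x<=0 ⊏ true
```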
What is the height of the SAV poset?
Var = {x, y} (figure); elements shown include:
  {}
  {x=y}  {y=x}  {x=x+x}  {y=y+y}  {y=x+y}  {y=y+x}  {x=x+y}  {x=y+x}
  {x=y, y=x}  {x=y, x=x+x}  {x=x+y, x=y+x}  …
  {x=y, x=x+x, x=x+y}  …
  {x=y, y=x, x=x+x, y=y+y, y=x+y, y=y+x, x=x+y, x=y+x}
55
Joining
elements
By Viviana Pastor (originally posted to Flickr as Harbour Bridge 1) [CC-BY-2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons
56
Bounds
• Let (D, ⊑) be a poset
• Let X ⊆ D be a set of elements from D
• An element d ∈ D is an upper bound (ub) of X iff for every x ∈ X we have that x ⊑ d
• An element d ∈ D is a lower bound (lb) of X iff for every x ∈ X we have that d ⊑ x
57
Bounds
• Let (D, ⊑) be a poset
• Let X ⊆ D be a set of elements from D
• An element d ∈ D is the least upper bound (lub) of X iff d is an upper bound of X and d ⊑ u for every upper bound u of X
• An element d ∈ D is the greatest lower bound (glb) of X iff d is a lower bound of X and l ⊑ d for every lower bound l of X
58
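In a finite poset, the definitions can be applied literally: collect all upper bounds of X, then keep the one below all the others. The minimal sketch below does this for the signs domain, encoding each element by the set of signs it allows (an assumption that reproduces the order in the figure). It returns `None` when no least upper bound exists.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional, TypeVar

T = TypeVar("T")

def lub(D: Iterable[T], leq: Callable[[T, T], bool], X: Iterable[T]) -> Optional[T]:
    """Least upper bound of X in a finite poset (D, leq); None if it does not exist."""
    elems, xs = list(D), list(X)
    upper: List[T] = [d for d in elems if all(leq(x, d) for x in xs)]
    least = [u for u in upper if all(leq(u, v) for v in upper)]
    return least[0] if least else None

# Signs domain for x, each element encoded by the set of signs it allows.
SIGNS: Dict[str, FrozenSet[str]] = {
    "false": frozenset(), "x<0": frozenset("-"), "x=0": frozenset("0"),
    "x>0": frozenset("+"), "x<=0": frozenset("-0"), "x>=0": frozenset("0+"),
    "true": frozenset("-0+"),
}
leq = lambda a, b: SIGNS[a] <= SIGNS[b]
print(lub(SIGNS, leq, ["x<0", "x=0"]))  # x<=0
print(lub(SIGNS, leq, ["x<0", "x>0"]))  # true
```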
Bounds example
Hasse diagram (figure):
          true
    x ≤ 0       x ≥ 0
 x < 0    x = 0    x > 0
          false
59
x ≤ 0 and true are upper bounds
Hasse diagram (figure):
          true
    x ≤ 0       x ≥ 0
 x < 0    x = 0    x > 0
          false
60
x ≤ 0 is the least upper bound
Hasse diagram (figure):
          true
    x ≤ 0       x ≥ 0
 x < 0    x = 0    x > 0
          false
61
Join (confluence) operator
• Assume a poset (D, ⊑)
• Let X ⊆ D be a subset of D (finite/infinite)
• The join of X is defined as
  – ⊔X = the least upper bound (LUB) of all elements in X, if it exists
  – ⊔X = min { b | forall x ∈ X we have that x ⊑ b }
  – The supremum of the elements in X
  – A kind of abstract union (disjunction) operator
62
Properties of join
• Commutative: x ⊔ y = y ⊔ x
• Associative: (x ⊔ y) ⊔ z = x ⊔ (y ⊔ z)
• Idempotent: x ⊔ x = x
• Can be used to define partial order: x ⊑ y iff x ⊔ y = y
• Monotone: if y ⊑ z then (x ⊔ y) ⊑ (x ⊔ z)
• ⊥ ⊔ x = x
• ⊤ ⊔ x = ⊤
63
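The properties listed above can be checked exhaustively on a small finite lattice. The minimal sketch below uses the powerset lattice of {1, 2, 3} (introduced at the end of this lecture), where ⊔ is set union, ⊑ is ⊆, ⊥ = ∅ and ⊤ is the whole base set. The choice of base set is arbitrary.

```python
from itertools import combinations, product

BASE = {1, 2, 3}
D = [frozenset(c) for r in range(len(BASE) + 1) for c in combinations(sorted(BASE), r)]
join = frozenset.union                       # ⊔ in a powerset lattice
leq = frozenset.issubset                     # ⊑
BOT, TOP = frozenset(), frozenset(BASE)

checks = {
    "commutative": all(join(x, y) == join(y, x) for x, y in product(D, repeat=2)),
    "associative": all(join(join(x, y), z) == join(x, join(y, z))
                       for x, y, z in product(D, repeat=3)),
    "idempotent": all(join(x, x) == x for x in D),
    "defines order": all(leq(x, y) == (join(x, y) == y) for x, y in product(D, repeat=2)),
    "monotone": all(leq(join(x, y), join(x, z))
                    for x, y, z in product(D, repeat=3) if leq(y, z)),
    "bottom": all(join(BOT, x) == x for x in D),
    "top": all(join(TOP, x) == TOP for x in D),
}
print(all(checks.values()))  # True
```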
Meet operator
• Assume a poset (D, ⊑)
• Let X ⊆ D be a subset of D (finite/infinite)
• The meet of X is defined as
  – ⊓X = the greatest lower bound (GLB) of all elements in X, if it exists
  – ⊓X = max { b | forall x ∈ X we have that b ⊑ x }
  – The infimum of the elements in X
  – A kind of abstract intersection (conjunction) operator
• Properties of a meet operator
  – Commutative: x ⊓ y = y ⊓ x
  – Associative: (x ⊓ y) ⊓ z = x ⊓ (y ⊓ z)
  – Idempotent: x ⊓ x = x
64
Complete partial orders
65
Complete partial order (CPO)
• A CPO is a partial order where each ascending
chain has a supremum
66
CPO example
Is there a join here?
Hasse diagram (figure):
    x ≤ 0       x ≥ 0
 x < 0    x = 0    x > 0
          false
67
lattices
68
Complete lattice
• A complete lattice (D, ⊑, ⊔, ⊓, ⊥, ⊤) is
  – A set of elements D
  – A partial order x ⊑ y
  – A join operator ⊔
  – A meet operator ⊓
69
A taxonomy of semantic domain types
Complete Lattice (D, ⊑, ⊔, ⊓, ⊥, ⊤): Join/Meet exist for every subset of D
Lattice (D, ⊑, ⊔, ⊓, ⊥, ⊤): Join/Meet exist for every finite subset of D (alternatively, binary join/meet)
Join semilattice (D, ⊑, ⊔, ⊥): ⊥ is the join of the empty set
Meet semilattice (D, ⊑, ⊓, ⊤): ⊤ is the meet of the empty set
Complete partial order (CPO) (D, ⊑, ⊥): poset with LUB for all ascending chains
Partial order (poset) (D, ⊑): reflexive, transitive, anti-symmetric: d ⊑ d’ and d’ ⊑ d implies d = d’
Preorder (D, ⊑): reflexive: d ⊑ d; transitive: d ⊑ d’, d’ ⊑ d’’ implies d ⊑ d’’
70
practice
71
Set-based lattices 1: disjunction
• Let P={p1, …, pk} be a set of predicates
• Define the lattice Disj(P) where its elements
are subsets of P and each element X ⊆ P is interpreted disjunctively
– For example, {a, b} stands for a ∨ b
• (2^P, ⊆, ∪, ∩, ∅, P)
72
Set-based lattices 2: conjunction
• Let P={p1, …, pk} be a set of predicates
• Define the lattice Conj(P) where its elements
are subsets of P and each element X ⊆ P is interpreted conjunctively
– For example, {a, b} stands for a ∧ b
• (2^P, ⊇, ∩, ∪, P, ∅)
73
Lattice of natural numbers
• Consider the set of natural numbers N with
the natural order < and bottom element 0
• Is it a partial order?
• What is the join operator?
• What is the meet operator?
• Is it a lattice?
• Is it a complete lattice?
74
Powerset lattices
• For a set of elements X we define the
powerset lattice for X as
(2^X, ⊆, ∪, ∩, ∅, X)
– Notice it is a complete lattice
• For a set of program states State, we define the collecting lattice
(2^State, ⊆, ∪, ∩, ∅, State)
75
see you next time