4 Model Checking Program Analysis

Convergence of Model Checking & Program Analysis
Philippe Giabbanelli
CMPT 894 – Spring 2008
Some of you know more than me on the topic…
…do not hesitate to point out my mistakes at the end !
Overview
Convergence
Customization
Toward customization
Formalism
Analysis
Results
• We are doing static analysis: we want
the properties of programs without having
to execute them.
• We have the code of a program, and we
want to be guaranteed that there are no
errors.
• We don’t work with the raw code but
with higher-level representations of it, like
the Control Flow Graph (CFG).
• Basic example from the CFG: too many
nested control flow instructions are a bad
pattern, a sign of excessive complexity.
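To make the nesting example concrete, here is a minimal sketch of a static check that measures how deeply control flow statements are nested, without executing the program. It is only an illustration: it walks Python's syntax tree (via the standard `ast` module) rather than a real CFG, and the names `CONTROL`, `max_nesting` and `SOURCE` are made up for this example.

```python
import ast

# Statement kinds counted as control flow in this sketch (an assumption).
CONTROL = (ast.If, ast.For, ast.While, ast.Try)

def max_nesting(node, depth=0):
    """Return the deepest level of nested control flow statements under node."""
    if isinstance(node, CONTROL):
        depth += 1
    deepest = depth
    for child in ast.iter_child_nodes(node):
        deepest = max(deepest, max_nesting(child, depth))
    return deepest

# Hypothetical program under analysis: it is parsed, never run.
SOURCE = """
for i in range(3):
    if i % 2:
        while i:
            i -= 1
"""

depth = max_nesting(ast.parse(SOURCE))
```

A tool would compare `depth` to a threshold of its choosing and report the pattern when the threshold is exceeded.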
Model Checking
I ⊨ S ?
Given an implementation of
the system, does it satisfy
the specifications?
• The implementation is
represented as an automaton.
• If we represent all states, it
will be huge… thus we work with
sets of states (OBDD).
• Accurate analysis but very
costly: small programs!
Program Analysis
• If we have big programs, we
cannot afford too costly operations:
we use approximations.
• The aim is an efficient computation of
more or less basic properties.
Graphs from Patrick Cousot
There are different paths through the states of the program. They
define a « reachability tree » (i.e. states that ‘can happen’).
Model Checking
• Not every path can happen:
we want to be accurate, so we
keep the paths well separated.
→ Nodes are never merged.
Program Analysis
• We consider that every path of
the CFG can be executed: we are
doing an approximation by
ignoring things such as conditional
statements.
→ Two nodes are merged if they
refer to the same control location.
The merge operator is a bit different.
We don’t explore the reachability tree forever: we stop at some point.
That’s what we call the « termination » : when we stop at a node.
Model Checking
• We stop when the set of
states computed for the next
step is included in the current.
Program Analysis
• We stop when the abstract state
does not represent new concrete
states (it is a fixpoint).
The termination operator is a bit different.
• The model checker BLAST has been extended to allow customized
program analyses.
• We have a set of abstract interpreters. Let’s call the overall engine the
meta-engine.
• We configure the meta-engine by defining a composite merge operator
and a composite termination operator (composite because they combine
several interpreters).
So, what do we have here?
• It is neither strictly a model
checker nor a program analyzer.
• The difference between a
model checker and a program
analyzer is somehow a
parameter of our configurable
model.
• This illustrates the convergence:
we have a bit of both approaches!
• We consider simple imperative programming languages.
• What do we have in a program?
∙ Lines, the current one being indicated by the program counter Pc.
∙ The control flow is transferred from one location to another.
∙ A set X of variables.
• The ways to move in a program are already given by a CFG. If we add
information about variables, we turn it into a control-flow automaton (CFA).
∙ L is the set of program locations (values taken by Pc).
∙ G ⊆ L × Ops × L are the control-flow edges, where Ops is the set of
program operations.
∙ A concrete state c assigns a value to Pc and to every variable in X.
∙ C is the set of all possible concrete states.
∙ A subset r of C is called a region.
• We have the automaton, but how do we move in it?
∙ G is the set of transitions (or edges). g is a transition if it
belongs to G.
∙ c → c’: we can go from the concrete state c to the concrete
state c’ if some transition g leads from c to c’.
∙ A state cn is reachable from a region r if there is a path from r
to cn. This path is a sequence of transitions:
cn belongs to Reach(r) if there exists (c0, c1,…, cn) with c0 in r
such that ci-1 → ci for all 1 ≤ i ≤ n.
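The definition of Reach(r) can be sketched as a plain breadth-first search over an explicit transition relation. This is only an illustration under simplifying assumptions: concrete states are written as hypothetical (Pc, x) pairs, the relation → is given as a finite list of pairs, and the states of r themselves are counted as reachable (the case n = 0). The names `reach` and `TRANSITIONS` are invented for the example.

```python
from collections import deque

def reach(region, transitions):
    """Return every concrete state reachable from `region` through c -> c'."""
    successors = {}
    for c, c_next in transitions:
        successors.setdefault(c, set()).add(c_next)
    reached = set(region)          # the states of the region (n = 0)
    frontier = deque(region)
    while frontier:
        c = frontier.popleft()
        for c_next in successors.get(c, ()):
            if c_next not in reached:   # follow each state only once
                reached.add(c_next)
                frontier.append(c_next)
    return reached

# Hypothetical concrete states (Pc, x) and transitions between them.
TRANSITIONS = [
    ((0, 0), (1, 1)),
    ((1, 1), (2, 2)),
    ((5, 0), (6, 0)),   # never reached from the region below
]
reachable = reach({(0, 0)}, TRANSITIONS)
```

Enumerating concrete states like this only works for tiny examples, which is exactly why the following slides move to abstract states.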
• Let’s see what an abstract domain is through an example…
∙ What is the result of 48176 * 59876 * 285561 ?
∙ What is the sign of 48176 * 59876 * 285561 ?
∙ We replaced the domain of integers by the domain of signs.


∙ As we said, in abstract interpretation we are simplifying the
problem a bit. Here, we consider an abstract domain of signs:
we still compute something relevant, but a bit less ambitious.
∙ We take the problem, we abstract its domain and we get an
answer in a reasonable time. To get back from the abstract
domain to the concrete one, we use a concretization function
(here 0 is counted as positive).
C(negative) = (-∞, -1]
C(positive) = [0, +∞)
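The sign example can be played out in a few lines. This is a minimal sketch, not the slide's exact domain: to keep multiplication exact it assumes three abstract values (negative, zero, positive) instead of folding 0 into positive, and the names `Sign`, `alpha`, `abstract_mul` and `in_gamma` are chosen for the illustration.

```python
from enum import Enum
from functools import reduce

class Sign(Enum):
    NEG = "negative"
    ZERO = "zero"
    POS = "positive"

def alpha(n):
    """Abstraction: map a concrete integer to its sign."""
    if n < 0:
        return Sign.NEG
    if n == 0:
        return Sign.ZERO
    return Sign.POS

def abstract_mul(a, b):
    """Multiply two signs without looking at the actual numbers."""
    if Sign.ZERO in (a, b):
        return Sign.ZERO
    return Sign.POS if a == b else Sign.NEG

def in_gamma(n, s):
    """Concretization test: does the integer n belong to the set C(s)?"""
    return alpha(n) == s

factors = [48176, 59876, 285561]
abstract_result = reduce(abstract_mul, map(alpha, factors))  # cheap
concrete_result = reduce(lambda x, y: x * y, factors)        # full product
```

The abstract computation never multiplies the numbers, yet its answer is sound: the concrete product lies in the concretization of the abstract result.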
Schemes from Patrick Cousot, « Abstract Interpretation Based Formal Methods and
Future Challenges » (Springer-Verlag 2001)
• Now that we’ve got the basics, what is a configurable program analysis?
∙ An abstract domain, which determines the objective of the analysis.
∙ A transfer relation, which assigns to each abstract state its successors.
∙ The merge operator, which combines two abstract states.
∙ The termination check: if the abstract state given as first
parameter is covered by the set of abstract states given as
second parameter, then we stop. The way we define « covered » is
where the customization happens.
• Each of the four components independently influences precision and cost.
• Among the program analyses used in the experiments, we have:
location analysis, predicate abstraction and shape analysis.
∙ In location analysis, we track the reachability of program
locations. Can we go there?
∙ The predicate abstraction defined by Ball, Podelski and
Rajamani considers programs where the only type is boolean.
Their program c2bp turns a program P with predicates E into
a boolean program B(P, E), which is then given to the model
checker bebop. Developed between 1999 and 2001.
∙ There might be destructive updating on dynamically allocated
storage. Shape analysis keeps track of the data structures
stored on the heap (i.e. dynamic allocation).
• The authors have developed the reachability algorithm CPA. Given a
configurable program analysis and an initial abstract state, it gives the
set of reachable abstract states that over-approximates the set of
reachable concrete states.
∙ Let’s see what an over-approximation is through an example.
    int i = 0;
    for (int j = 0; j < 10; j++)
        if (rand() > 0.3)
            i++;
At the end, i is in [0, 10]. An over-approximation of those
concrete states would be [-5, 15].
∙ Why don’t we always go for the accurate analysis? Because if we have
n if, it might lead to have 2^n abstract states, and the algorithm might
not terminate because of the possible loops…
∙ If we set the merge operator to combine, then it considers that x might
be equal to z and declares that it is unsafe. This division by 0 is not
in the concrete program: false alarm!
∙ Now, we configure the merge to keep things separated, and we find that
x and z have different values: division by 0 is not possible, the program
is safe, hurray!
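To see how the merge and termination operators plug into one reachability loop, here is a minimal sketch in the spirit of the CPA algorithm: a waitlist and a set of reached abstract states, parameterized by `merge` and `stop`. Everything concrete in it is an assumption made for the illustration: the toy abstract domain (a location paired with the set of possible values of one variable x), the three-edge automaton `EDGES`, and all function names. It is not the authors' implementation.

```python
# Toy abstract state: (location, frozenset of possible values of x).
# Hypothetical automaton: (source location, operation on x, target location).
EDGES = [
    (0, lambda x: 1, 1),      # first branch:  x = 1
    (0, lambda x: 2, 1),      # second branch: x = 2
    (1, lambda x: x * 2, 2),  # x = x * 2
]

def transfer(state):
    """Transfer relation: abstract successors along each outgoing edge."""
    loc, values = state
    for src, op, dst in EDGES:
        if src == loc:
            yield (dst, frozenset(op(v) for v in values))

def merge_sep(new, old):
    """Never merge: paths stay separate (model checking flavour)."""
    return old

def merge_join(new, old):
    """Merge states at the same location (program analysis flavour)."""
    if new[0] == old[0]:
        return (old[0], old[1] | new[1])
    return old

def stop_sep(state, reached):
    """Stop if one reached state at the same location covers `state`."""
    return any(state[0] == loc and state[1] <= vals for loc, vals in reached)

def cpa(initial, merge, stop):
    reached = {initial}
    waitlist = {initial}
    while waitlist:
        state = waitlist.pop()
        for succ in transfer(state):
            for old in list(reached):   # copy: reached changes in the loop
                merged = merge(succ, old)
                if merged != old:       # the merge added information
                    waitlist.discard(old)
                    waitlist.add(merged)
                    reached.discard(old)
                    reached.add(merged)
            if not stop(succ, reached):
                waitlist.add(succ)
                reached.add(succ)
    return reached

INITIAL = (0, frozenset({0}))
reached_sep = cpa(INITIAL, merge_sep, stop_sep)    # one state per path
reached_join = cpa(INITIAL, merge_join, stop_sep)  # one state per location
```

With `merge_sep` the two branches stay apart, so each location after the branch carries two abstract states. With `merge_join` they collapse into a single, less precise state per location. Only the operators passed to `cpa` change between the two runs.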
• We have many tools, some being accurate but quite slow, others
converging faster but losing accuracy… Let’s combine them!
∙ For example, predicate abstraction and shape analysis can
be combined. The shape graph can become more accurate by
using information from the predicate abstraction. The
accuracy of the combination is the degree of sharpness.
• Three composite program analyses are used in the experiments.
∙ Basic BLAST’s. The components are the configurable
program analysis for program locations and the configurable
program analysis for predicates. Merge/Stop are separate.
∙ We can add to it a third component: shape analysis.
∙ We can also add to it pointer analysis as a third component
(tracks pointer aliases, memory allocations, etc.)
• Let’s consider BLAST’s + shape analysis. We want an efficient
configuration for it. Let’s see what we’ve got from the experiments…
∙ A. Stop is on separate and merge is on join. Bad: many
highly expensive operations.
∙ B. Stop is on separate and merge is on separate (lazy shape
analysis). Better cost, as operations on small sets of shape
graphs are much more efficient than on large ones (like in A).
∙ C. Stop is on separate and merge is on separate and we use
the predicates to sharpen the shape information. This sharpening
has a small cost and gives a more precise analysis!
∙ D. To do better than B, let’s turn stop on join. On the examples,
it doesn’t do well: the time spent on termination checks was very
small, and as there is an overhead for the join, we are losing
performance.
∙ E. Turning both on join loses precision and has a high cost.
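The difference between « stop on separate » and « stop on join » can be sketched on a toy state. This is only an illustration under assumptions: an abstract state is a location paired with a set of shape graphs, the shape graphs are stand-in strings ("g1", "g2"), and the names `stop_sep` and `stop_join` are chosen for the example.

```python
def stop_sep(state, reached):
    """Covered if ONE reached state at the same location contains it."""
    loc, shapes = state
    return any(loc == l and shapes <= s for l, s in reached)

def stop_join(state, reached):
    """Covered if the JOIN of all reached states at that location contains it."""
    loc, shapes = state
    joined = frozenset().union(*(s for l, s in reached if l == loc))
    return shapes <= joined

# Hypothetical reached set: two separate states at location 1.
reached = {
    (1, frozenset({"g1"})),
    (1, frozenset({"g2"})),
}
candidate = (1, frozenset({"g1", "g2"}))

covered_sep = stop_sep(candidate, reached)    # no single state covers it
covered_join = stop_join(candidate, reached)  # the union covers it
```

`stop_join` stops more often, but it has to build the join first, which is the kind of overhead that configuration D pays for.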
Source for this presentation: « Configurable Software Verification:
Concretizing the Convergence of Model Checking and Program
Analysis », Dirk Beyer, Thomas A. Henzinger and Grégory Théoduloz
Thanks For Your Time 