Program analysis: Lecture #12
Programming Languages Analysis / Mooly Sagiv, Noam Rinetzky
Lecture #12, 10 June 2004: Assume/Guarantee Reasoning using Abstract
Interpretation
Notes by: Daniel Deutch
Assume/Guarantee Reasoning using Abstract
Interpretation
Limitations of whole program analysis
1. Complexity. The cost of chaotic iteration can become very high; for shape analysis the time
complexity may be doubly exponential. Since programs regularly contain hundreds of thousands of
commands, running the algorithm on such programs becomes infeasible.
2. Code availability. In many cases, not all of the code is available. When the application uses
external libraries, neither the algorithm nor the developer knows the code inside the library,
which can have a crucial effect on the analysis. For example, in constant propagation, the
library code can change the values of variables and so invalidate the assumption that they are
constant.
3. Interaction with the developer. Our methods work only after the code is written, whereas we
may be interested in methods that interact with the developer at design time and help the
developer design the program correctly.
An alternative solution: A/G reasoning
An alternative approach to program analysis is influenced by “design by contract”, a general
design method. When designing in this manner, the developer states a “contract” for each
procedure in the program: what the procedure assumes about its input (the precondition) and
what it guarantees about its output (the postcondition).
In a good design, the developer determines these contracts even before writing the code of
the procedures.
Below is an initial and general description of the solution, followed by a simple example.
Later, a more detailed explanation will be given.
The main idea
The suggested solution receives as input a formal description of the contracts (and,
optionally, the program in question).
1. Analyze the program procedure by procedure, in an arbitrary order.
2. When starting the analysis of a procedure, assume that its preconditions are met at its
entry point (before the first command of the procedure).
3. For every procedure call, check that the preconditions required by the invoked
procedure are met.
4. After every procedure call (that is, before the command that follows it), assume that
the postconditions of the called procedure are met.
5. At the end of the analyzed procedure, check that its postconditions are met.
Comments:
1. This technique is very similar to induction: the called functions are assumed to be
consistent with their contracts, and this assumption is checked later, in turn using
assumptions about the functions they call, and so on. This technique is known as
co-induction.
2. The complexity of this method is linear in the size of the program, since each procedure
body is analyzed only once and not once per invocation context as, for example, in the
call-string approach. Instead, we use the contract of the procedure to obtain the information
needed.
3. Since the actual code of the called function is not needed as long as we have its pre- and
postconditions (in the absence of the code we have to assume that they are correct), the
algorithm can also handle a situation in which part of the code is unknown, which is common
when an external library is used.
Example
Code
List rev(List x) {
(1) if (x == null) return null;
(2) return append(rev(x->next), x);
}
List append(List p, List y) {
(3) List e;
(4) if (p == null) return y;
(5) e = malloc(…);
(6) e->data = p->data;
(7) e->next = append(p->next, y);
(8) return e;
}
Contracts
List rev(List x)
(1) Requires acyclic(x)
(2) Ensures RetVal = reverse(x)
List append(List p, List y)
(3) Requires acyclic(p) and acyclic(y)
(4) Ensures RetVal = p||y (y appended at the end of p)
Analyzing rev:
Contract part (1) is assumed before line (1): assume that x is acyclic.
In line (2) a call to rev is made, so we check its precondition, i.e., that x->next points to
an acyclic list. Since x is acyclic, so is x->next.
At the return from rev, assume that the return value is reverse(x->next).
At the call to append, check contract part (3): both arguments, the value returned by rev and
x, must be acyclic. By the assumptions above, they are.
Assume that the return value of append satisfies contract part (4), meaning that it is
reverse(x->next) || x, i.e., x appended to the value returned by rev (here x stands for its
first element only).
So the value returned is x reversed, and contract part (2) is satisfied.
Analyzing append:
Contract part (3) is assumed: p and y are acyclic.
Before the recursive call to append, check that contract part (3) is satisfied, i.e., that
p->next is acyclic (it is, since p is acyclic and p did not change) and that y is acyclic (y
did not change).
Assume that after the call the returned value is p->next || y.
Thus e->next = p->next || y, and since e->data = p->data we get that e = p||y.
The return value is e, i.e., p||y, so contract part (4) is satisfied.
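The reasoning above is static, but the same contracts can also be checked at run time. The following is a minimal sketch in C, assuming an int payload and using assert for the Requires and Ensures clauses. The helpers acyclic, length and cons are illustrative names that do not come from the notes, only the length part of each postcondition is checked, and rev appends a fresh one-node copy of the head of x, since appending the whole of x would append its tail a second time. Intermediate lists are not freed, for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct Node { int data; struct Node *next; } Node;

/* Contract predicate acyclic(x): Floyd's tortoise-and-hare cycle check. */
bool acyclic(const Node *x) {
    const Node *slow = x, *fast = x;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) return false;
    }
    return true;
}

/* Number of nodes in an acyclic list; used to check part of the postconditions. */
size_t length(const Node *x) {
    size_t n = 0;
    for (; x != NULL; x = x->next) n++;
    return n;
}

/* Illustrative helper (not in the notes): allocate a single node. */
Node *cons(int data, Node *next) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->data = data;
    n->next = next;
    return n;
}

/* Requires acyclic(p) and acyclic(y). Ensures RetVal = p||y. */
Node *append(const Node *p, Node *y) {
    assert(acyclic(p) && acyclic(y));             /* check the precondition */
    if (p == NULL) return y;
    Node *e = cons(p->data, append(p->next, y));
    assert(length(e) == length(p) + length(y));   /* check part of the postcondition */
    return e;
}

/* Requires acyclic(x). Ensures RetVal = reverse(x). */
Node *rev(const Node *x) {
    assert(acyclic(x));                           /* check the precondition */
    if (x == NULL) return NULL;
    Node *r = append(rev(x->next), cons(x->data, NULL));
    assert(length(r) == length(x));               /* check part of the postcondition */
    return r;
}
```

Calling rev on the list 1, 2, 3 exercises every assert on the way; a cyclic argument would fail the first check instead of looping forever.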
Challenges in A/G reasoning
1. Specifying the contracts. The developer must know how to specify each procedure, i.e., to
state what it expects and what it ensures. Writing the specification is sometimes as
challenging as writing the code, if not more so. Designing a specification language that is
rich enough and still usable is also a challenge.
2. Performing the abstract interpretation using the contracts. One needs to design an
algorithm that takes the specification into account and performs analyses such as the one
shown in the example above.
We will mainly focus on the second challenge, but we start by discussing the first one.
Contracts can be specified using expressions of the programming language, or even code
fragments (in which case they can also be tested at run time, for example with assert), or
using a declarative language such as first-order logic. Combinations of the two methods are
also available, in systems such as Larch and the Java Modeling Language (JML).
A problem that arises when we consider the pre- and postconditions of a procedure is that a
procedure may also have side effects. For example:
double y = 3;
double Divide(double x, double y)
{
Requires y <> 0
Ensures RetVal = x/y
}
void Test(double x)
{
Divide(x, y);
Divide(1, y);
}
In the assume/guarantee framework, all we know when a procedure returns is what its
postcondition specifies. Thus we cannot be sure that Divide does not change the global y, so
we cannot be sure that y is still nonzero, and the second call to Divide may not satisfy the
precondition of the function.
In order to reduce false alarms, we ask the user to specify, for each procedure, not only its
pre- and postconditions but also its side effects. Anything not mentioned in the side-effect
specification is assumed not to happen; in the example above, it is therefore implicitly
ensured that Divide does not change y.
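As a rough illustration, the implicit "not modified" guarantee can be mimicked at run time by taking a snapshot of the state outside the Mod list and comparing it on return. The sketch below assumes an empty Mod list for divide. The parameter names n and d, the snapshot variable, and the return value of test_calls are illustrative choices that are not in the notes (the original Test returns nothing).

```c
#include <assert.h>

double y = 3.0;   /* global used by test_calls; NOT in divide's Mod list */

/* Requires d != 0.  Mod: nothing.  Ensures RetVal == n / d. */
double divide(double n, double d) {
    assert(d != 0.0);            /* check the precondition */
    double y_before = y;         /* snapshot of state outside the Mod list */
    double result = n / d;
    assert(y == y_before);       /* implicit guarantee: y is not modified */
    return result;
}

/* Both calls are safe: the first because y != 0 is known on entry,
   the second because divide guarantees that y is unchanged. */
double test_calls(double x) {
    double a = divide(x, y);
    double b = divide(1.0, y);
    return a + b;                /* returned only so the sketch can be observed */
}
```

A static analyzer would use the empty Mod list in the same way: after the first call, the fact y != 0 is kept rather than discarded.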
Following this paradigm can be difficult, since writing the side effects in a language with
pointers is not easy: the side effects are very context-sensitive. That is, a side effect may
depend on the context and on the state of the variables when the procedure is called. If a
procedure that alters the contents of its pointer parameter x is called when x points to y, it
has a side effect on y; if it is called when x points to z, it has a side effect on z. It is
not reasonable to ask the developer to do this analysis manually. One way to help the
programmer is to perform this analysis automatically, using the Points-To analysis studied in
previous lessons. This analysis is efficient, and therefore feasible even for large programs.
The specification language of the contracts should have, to the best extent possible, the
following properties:
1. Expressiveness: we should be able to define many kinds of contracts.
2. Conciseness: the description of the contracts should not be too large.
3. Naturalness: the “human” consideration. Since developers are the ones who use the
language, it should be natural for them.
4. Reuse: a specification can be changed in a manner that keeps existing proofs valid.
5. Decidability: we would like some questions about the relationship between
specifications to be decidable, so that we will be able to use automatic tools, e.g., theorem
provers, to verify the contracts.
6. Cost of model checking: the complexity of checking whether a condition is met.
7. Cost of abstract interpretation: the complexity of conducting the analysis.
These properties trade off against one another.
In the rest of the lesson we shall show two different applications of the analysis:
1. CSSV for detecting buffer overflows in C
2. An algorithm for performing abstract interpretation
CSSV: Towards a Realistic Tool for Statically
Detecting All Buffer Overflows in C / Nurit Dor,
Michael Rodeh, Mooly Sagiv
General goal: detect statically whether a buffer overrun can occur.
The importance of this problem is obvious to anyone who has ever programmed in C. Surveys have
claimed that about 50% of all bugs are due to buffer overruns and, moreover, that about 50% of
the attacks on organizations’ systems exploit bugs related to buffer overruns.
CSSV stands for C String Static Verifier.
Specific goals:
Find an efficient, conservative static checking algorithm that:
- finds all buffer overruns;
- verifies the absence of buffer overflows with good accuracy, i.e., reduces the number of
false alarms;
- handles all C features, including pointer arithmetic, casting, dynamic memory allocation,
etc.
Verifying the absence of buffer overruns is non-trivial, as the following example shows.
(In curly brackets: an informal specification of the conditions that need to be verified
before each command to ensure the absence of buffer overruns, where string means a
null-terminated array of characters, len is the length of the string, and alloc is the size
of the buffer.)
void concat(char* dst,int size,char* src)
{
{string(dst) and string(src) and (size > len(src) + len(dst) implies alloc(dst+len(dst)) > len(src))}
if (size > strlen(src)+strlen(dst))
{
{string(src) and string(dst) and alloc(dst+len(dst)) > len(src)}
dst = dst + strlen(dst);
{string(src) and alloc(dst) > len(src)}
strcpy(dst,src);
}
}
The best way to understand why these conditions need to be verified is to go from the end of
the procedure to its beginning. In order to verify the strcpy operation we need to know that
(i) src is a string and (ii) the buffer is longer than the copied string. Thus, before we
increment dst by strlen(dst), we need to ensure that (i) dst is also a string (because of the
call to strlen) and (ii) the buffer that dst points to is big enough to contain the two
strings, and so on.
Although it is non-trivial, this analysis can be done for real C programs.
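To make the conditions concrete, here is a minimal sketch of concat in which the conditions in curly brackets are checked at run time. C does not record allocation sizes, so the sketch assumes the caller passes the allocated size of dst explicitly as dst_alloc (taking the role of size). The helper is_string and the bool return value are illustrative additions that are not in the notes, and string(src) cannot be checked here because the allocation size of src is unknown.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Illustrative stand-in for string(s): a '\0' occurs within the first alloc bytes. */
bool is_string(const char *s, size_t alloc) {
    return memchr(s, '\0', alloc) != NULL;
}

/* Appends src to dst if the result fits; returns whether it did.
   dst_alloc is the allocated size of the buffer dst points to (an assumption
   of this sketch). src is assumed to be a string. */
bool concat(char *dst, size_t dst_alloc, const char *src) {
    assert(is_string(dst, dst_alloc));       /* string(dst) */
    size_t dlen = strlen(dst);
    size_t slen = strlen(src);
    if (dst_alloc > dlen + slen) {
        /* alloc(dst + len(dst)) > len(src): room for src and its '\0' */
        assert(dst_alloc - dlen > slen);
        dst = dst + dlen;
        strcpy(dst, src);
        return true;
    }
    return false;                            /* too small: dst is left unchanged */
}
```

With an 8-byte buffer holding "ab", appending "cd" succeeds, while a later attempt to append four more characters is rejected because the terminating '\0' would not fit.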
Linear Relation analysis
As the example shows, to analyze buffer overruns we need to keep track of numerical relations
between variables in order to determine, for example, that the size of one buffer is at least
the size of another buffer plus the value of some parameter x, because the program copies the
second buffer into the first and then adds x characters.
These relationships can all be expressed as a system of linear inequalities, where every
inequality has the form:
A1*Var1 + A2*Var2 + … + An*Varn < b
where A1, …, An, b are constants and Var1, …, Varn are variables of the program (or small
variants of them, such as the size of the buffer that a variable points to).
Geometrically, such a system is a polyhedron: a point in the space is in the polyhedron if
and only if it satisfies all the inequalities.
The polyhedra (equivalently, the systems of inequalities) form a lattice, where:
The join operation: the join of two polyhedra is their convex hull, the least convex
polyhedron containing both.
The meet operation: it is easier to define the meet on the systems of inequalities (which is
of course equivalent); the meet of two systems is the system that includes all the
inequalities of both.
Widening:
Denote by W(P,Q) the widening operator applied to P and Q. It can be obtained by removing
from the system of P all the inequalities that are not satisfied by Q. Algorithms for
computing a widening operator exist (Halbwachs79); this is not a trivial task, and the
widening operator is not uniquely determined by the domain.
Top (T) = the entire space.
Bottom = the empty set.
The order is containment of the sets of variable valuations that satisfy the inequalities or,
equivalently, containment of the sets of points within the polyhedra.
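General polyhedra need a dedicated library, but the lattice operations can be illustrated in one dimension, where a polyhedron is just an interval given by the two constraints lo <= x and x <= hi. The sketch below is a simplified stand-in under that assumption, using non-strict bounds over long integers; the type and function names are illustrative. Join is the convex hull, meet keeps the constraints of both arguments, and widening drops every constraint of P that Q does not satisfy.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* One-variable polyhedron: lo <= x and x <= hi.
   LONG_MIN / LONG_MAX stand for a missing lower / upper constraint. */
typedef struct { long lo, hi; } Interval;

const Interval TOP = { LONG_MIN, LONG_MAX };   /* the entire space */
const Interval BOTTOM = { 1, 0 };              /* any lo > hi denotes the empty set */

bool is_empty(Interval a) { return a.lo > a.hi; }

/* Order: containment of the sets of points. */
bool leq(Interval a, Interval b) {
    return is_empty(a) || (!is_empty(b) && b.lo <= a.lo && a.hi <= b.hi);
}

/* Join: convex hull, the least interval containing both arguments. */
Interval join(Interval a, Interval b) {
    if (is_empty(a)) return b;
    if (is_empty(b)) return a;
    Interval r = { a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi };
    assert(leq(a, r) && leq(b, r));            /* the result contains both */
    return r;
}

/* Meet: the system containing the constraints of both arguments. */
Interval meet(Interval a, Interval b) {
    Interval r = { a.lo > b.lo ? a.lo : b.lo, a.hi < b.hi ? a.hi : b.hi };
    return r;
}

/* Widening W(P,Q): remove from P every constraint that Q does not satisfy. */
Interval widen(Interval p, Interval q) {
    if (is_empty(p)) return q;
    if (is_empty(q)) return p;
    Interval r = p;
    if (q.lo < p.lo) r.lo = LONG_MIN;          /* lower bound of P violated by Q */
    if (q.hi > p.hi) r.hi = LONG_MAX;          /* upper bound of P violated by Q */
    return r;
}
```

For a loop counter that grows from [0,2] to [0,3], widening keeps the stable constraint 0 <= x and drops the unstable upper bound, so the iteration terminates after one step instead of climbing through [0,4], [0,5], and so on.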
Semantics for C programs
In the C programming language, every location has an address, either in the heap or on the
stack. Strings should be terminated with a null character (‘\0’). However, the basic
semantics of C does not record the size that was allocated for a buffer. Therefore, we define
an instrumented semantics, which remembers additional information:
1. For each buffer, the size of the buffer in bytes: asize. (For simplicity, all allocated
space is considered to be a buffer; a simple ‘char’ variable is also a buffer, containing 1
byte.)
2. For each address inside an allocated buffer, the base address of the buffer: the first
address of the contiguous allocated area in which the address lies.
3. For each address containing a pointer to a buffer, the offset of the pointer: the pointer
value minus the base address of the location it points to.
Example:
bool substrcmp(char* str1, char* str2, int size)
{ // checks whether the first ‘size’ characters of str1 and str2 are identical
int i;
for (i = 0; i < size; i++)
if (str1[i] != str2[i])
return false;
return true;
}
Str1 points to the first character in its buffer, thus its offset is 0; Str2 points to the
second character, thus its offset is 1; Size is not a pointer. The asize of Str1 is 4, the
size of the buffer “containing the pointer”. The asize of addresses addr2 to addr2+330 is 330,
since they are allocated inside a buffer of this size.
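[Figure omitted: memory layout with columns for base, asize and offset. It shows Str1, Str2
and Size (asize 4 each), the buffer at addr1 … addr1+250 (asize 250) and the buffer at
addr2 … addr2+330 (asize 330), both terminated by \0; the pointer offsets shown are 0 and 1.]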
The instrumented semantics checks the validity of C expressions according to ANSI C and, in
addition, rules relating to strings: that the program does not access characters after the
terminating null ‘\0’.
For instance, to check whether str[i] is legal, the semantics checks that
0 <= offset(str)+i < asize(base(str)),
that is, that the byte i positions after the location str points to is still in the same
buffer as the location pointed to by str; otherwise the access is an overrun.
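The bookkeeping of the instrumented semantics can be sketched as a "fat pointer" that carries base, asize and offset next to the raw address. This is only an illustration with hypothetical names (IPtr, access_ok, arith_ok, read_at). It uses a strict bound for a dereference, since the byte at offset asize is already outside the buffer, and a non-strict bound for pointer arithmetic, since C allows a pointer to one past the end. The string rule about the terminating null is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Instrumented pointer: a pointer together with the facts that plain C forgets. */
typedef struct {
    char  *base;     /* base address of the buffer pointed into */
    size_t asize;    /* allocated size of that buffer, in bytes */
    long   offset;   /* pointer value minus base address */
} IPtr;

/* Is the access p[i] inside the buffer? */
bool access_ok(IPtr p, long i) {
    long pos = p.offset + i;
    return pos >= 0 && (size_t)pos < p.asize;
}

/* Is the pointer p + i legal to form? One past the end is allowed. */
bool arith_ok(IPtr p, long i) {
    long pos = p.offset + i;
    return pos >= 0 && (size_t)pos <= p.asize;
}

/* Checked read of p[i]: the validity check acts as the statement's precondition. */
char read_at(IPtr p, long i) {
    assert(access_ok(p, i));
    return p.base[p.offset + i];
}
```

For a 4-byte buffer and a pointer at offset 1, index 2 is the last legal access, while adding 3 to the pointer is still legal as long as the result is not dereferenced.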
Contracts
The contracts are also defined in terms of the instrumented semantics. They specify the
string behavior of procedures by means of pre- and postconditions and side effects. Note that
the pre- and postconditions are stated in terms of the allocated buffers. They do not state
conditions on pointers, e.g., that the arguments are not aliases, which would be much harder
for the programmer to specify. The information regarding the pointers (both for the
conditions and for the side effects) is computed by the Points-To analysis.
The contracts are defined for each procedure and are an integral part of the program.
The validity checks of (atomic) statements in the instrumented semantics can also be viewed
as preconditions, this time of a statement rather than of a procedure.
Examples of contracts:
char* strcpy(char* dst, char* src)
Requires (string(src) and alloc(dst) > len(src))
Mod dst // can modify dst
Ensures (len(dst) == [len(src)]pre and string(dst) and return_value == [dst]pre)
// pre means the value before the call to the function
bool substrcmp(char* str1, char* str2, int size)
Requires (string(str1) and string(str2) and len(str1) > size and len(str2) > size)
// nothing is ensured except the implicit guarantee that no string is modified
void CreateSubString(char* dst, char* src, int size)
Requires string(src) and alloc(dst) > size and len(src) > size
Mod dst
Ensures string(dst) and len(dst) == size
Soundness
All string violations are detected, and a violation message is issued upon violation of:
1. a statement’s precondition (an illegal reference);
2. a procedure’s precondition, at a procedure call;
3. a procedure’s postcondition, on return from the procedure.
The messages issued depend on the contracts the developer gives, so bad contracts may lead
to more false alarms. However, as explained above, the assume/guarantee method does not
simply trust the conditions but checks them as it goes over the code. Bad contracts
therefore do not harm soundness: no matter what contracts are given, all string violations
are detected.
CSSV static analysis stages
There are three stages in the CSSV static analysis:
1. Inline contracts: create code that includes both the C code of the procedure and the
contracts.
2. Pointer analysis: find relationships between base addresses.
3. Integer analysis: compute offset information.
Inline contracts
The first step of the algorithm creates code that includes both the code of the program and
the contracts, inserted in place. As described above, the requirements (preconditions) are
inserted as assumptions at the beginning of the function, and the ensured properties are
inserted as asserts at the end of the function, where the meaning of an assert is to verify
that the condition holds. Wherever there is a function call, the requirements of the called
function should be checked before the call, meaning that an assert is inserted before the
call and an assumption is inserted after it.
void func1(…)
{
assume func1_requirements
…
…
assert func2_requirements
func2(…)
assume func2_ensures
…
assert func1_ensures
}
Compute pointer information
The second step of the algorithm computes an abstraction of all potential pointer
relationships between locations that may occur in an execution of the program. The pointer
analysis is conducted in two steps: a preliminary global pointer analysis, followed by a
procedural points-to (PPT) analysis.
Global Points-To
The global pointer analysis is done over all of the code. It is insensitive to the control
flow of the program and is also context-insensitive. As a result, this analysis is
imprecise, which could cause many false alarms.
Example 1:
Func(char* str, int loc, char NewChar)
{ // update the character at index loc
str[loc] = NewChar;
}
main()
{
char s1[10];
char s2[20];
Func(s1, 9, 'a');
Func(s2, 15, 'a');
}
The analysis does not know that the access inside the function is legal: it only knows that
when str[loc] is accessed, str can point either to s1 or to s2, and an access to s1[15] is an
overrun. Thus, a false alarm is issued.
Example 2:
safe_cat(char* dst, int size, char* src)
{
…
strcpy(dst, src);
…
}
main()
{
char s[10], t[20], r[30];
char *p1, *p2;
p1 = r+i;
safe_cat(s, 10, p1);
p2 = r+j;
safe_cat(t, 10, p2);
}
Again, when the copy is done the analysis cannot tell whether dst points to s or to t.
Since s contains a string after the first call to safe_cat, the second call to safe_cat
causes a false alarm.
This is why CSSV uses Procedural Points-To analysis.
Procedural Points-To (PPT)
The PPT method:
1. Projects the pointer information onto the visible variables of the procedure (the
variables that the procedure can access).
2. Creates abstract locations for the formal parameters of the procedure.
3. Allows destructive updates through the formal parameters.
Formally, a PPT state is a quadruple (BA, loc, pt, sm) where:
BA is a set of abstract locations that represent all reachable concrete base addresses (a
location l is reachable in a state if there exists a visible variable whose store contents
can include l).
loc: {visible variables} -> 2^BA maps each variable to the set of abstract locations
representing the variable’s possible locations.
pt: BA -> 2^BA. Every pointer is represented by the pt relation, which maps the abstract
location of the pointer to the set of abstract locations that the pointer might point to.
sm: BA -> {1, infinity} is an abstract count of the number of concrete base addresses:
sm(ba) = infinity when ba may represent, in a given concrete store, more than one base
address;
sm(ba) = 1 when it is guaranteed that ba represents at most one base address.
An abstract location with sm = infinity is called a summary abstract location, and is used
to represent unbounded sets of base addresses.
When encountering code such as
safe_cat(char* dst, int size, char* src)
{
…
strcpy(dst, src);
…
}
we map dst and src to different abstract locations. Then, by showing that they point to
different nodes (locations) that are not summary nodes, we can show that they do not point to
the same location (when that is indeed the case, which can be determined from the global
analysis, done first).
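One possible encoding of a PPT state, given here only as a sketch, represents sets of abstract locations as bit masks. It assumes at most 32 abstract locations and, to stay short, identifies each visible variable with a single abstract location index, so the loc component is left implicit. The names PptState, may_alias and can_strong_update are illustrative and do not come from CSSV.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BA 32            /* abstract locations are numbered 0..31 */
typedef uint32_t BaSet;      /* a set of abstract locations, one bit each */

typedef struct {
    BaSet ba;                /* BA: the abstract locations in use */
    BaSet pt[MAX_BA];        /* pt: location -> locations it may point to */
    BaSet summary;           /* sm: bit set means infinity, bit clear means 1 */
} PptState;

/* May the pointers stored at abstract locations p and q point to the same node? */
bool may_alias(const PptState *s, int p, int q) {
    assert(0 <= p && p < MAX_BA && 0 <= q && q < MAX_BA);
    return (s->pt[p] & s->pt[q]) != 0;
}

/* A destructive (strong) update through p is justified only when p has exactly
   one possible target and that target is not a summary location. */
bool can_strong_update(const PptState *s, int p) {
    assert(0 <= p && p < MAX_BA);
    BaSet t = s->pt[p];
    bool single = t != 0 && (t & (t - 1)) == 0;   /* exactly one bit set */
    return single && (t & s->summary) == 0;
}
```

In the safe_cat example, dst and src would be given distinct non-summary targets, so may_alias is false and the effect of strcpy can be applied to the target of dst alone.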
Static integer analysis
The third and final step of CSSV analyzes integer values, since these affect the existence
of buffer overruns; an example is the value of an index used in a string access. We define
constraint variables for all the information relevant to the strings, and it is essential to
keep track of the relationships between their values. In theory, any sound integer analysis
can be used to obtain this information. CSSV uses Linear Relation analysis, explained above,
to track these relations and to verify the ‘safety’ preconditions, which are the conditions
that must hold for a string reference to be legal.
The constraint variables:
1. For every abstract location of a pointer, the offset of the pointer from the start of the
buffer it points into (e.g., ptr.offset = 3).
2. For every integer abstract location, e.g., index, the value at that location (index.val).
3. For every abstract location a, a.is_nullt, a Boolean that is true iff a is a
null-terminated string.
4. For every abstract location a, a.len, the length of the string (the number of bytes before
the ‘\0’).
5. For every abstract location a, a.asize, the size of the allocated buffer (in bytes).
An example of the abstract representation and verification
Suppose that dst is a pointer that points to n1, src is a pointer that points to n2, and the
following requirements hold:
dst and src are strings
dst points to a legal location inside the string (before the ‘\0’)
alloc(dst) = size (starting from dst, size bytes are allocated to the buffer)
(These are actually the preconditions of the concat operation.)
These conditions are represented as:
n1.is_nullt = true // The buffer that dst points to is a null-terminated string.
n2.is_nullt = true // The buffer that src points to is a null-terminated string.
dst.offset < n1.len // The offset of dst from the beginning of the buffer is less than the
length of the string, so dst points before the ‘\0’.
dst.offset + size.val = n1.asize // Adding size to the position of dst, relative to the
beginning of the buffer, gives exactly the allocated size of the buffer (n1).
Note that the abstract interpretation uses only relative notions such as offset and asize,
and not notions that depend on physical addresses, such as physical base addresses. This
ensures that the results are independent of the physical addresses allocated for the
buffers.
Safety conditions are verified in the same manner. To verify that a command such as
dst = dst+i is legal, we need to make sure that dst+i is inside the allocated buffer, meaning
dst.offset + i.val <= n1.asize (the offset of dst from the start of the buffer, plus i, is no
more than the size of the buffer). Note that in the concrete semantics we would verify the
equivalent condition on actual addresses, offset(dst) + i <= asize(base(dst)).
The assume operation
It remains to explain how assuming a condition is implemented in CSSV. To begin with, the
algorithm keeps two copies of the constraint variables, one holding the values before the
condition is applied and one holding the values after.
Values that may be modified according to the contract are set to T (top); the other values
remain unchanged. Then a meet is performed between the resulting polyhedron (the one obtained
after setting the modified values to T) and the postcondition of the procedure.
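The two steps (set the modified values to top, then meet with the postcondition) can be sketched with a much simpler domain than polyhedra. The code below assumes an independent integer range per constraint variable, so it cannot express relations between variables such as len(dst) == [len(src)]pre; the caller has to supply the range implied by the postcondition. All names (Range, State, assume_post, NVARS) are illustrative.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define NVARS 3                            /* number of constraint variables tracked */

typedef struct { long lo, hi; } Range;     /* lo <= value <= hi */
typedef struct { Range v[NVARS]; } State;  /* one range per constraint variable */

/* Abstract state after a call:
   1. every variable in the Mod list is set to top;
   2. the result is met (intersected) with the postcondition. */
State assume_post(State before, const bool modified[NVARS], State post) {
    assert(modified != NULL);
    State after = before;
    for (int i = 0; i < NVARS; i++) {
        if (modified[i]) {                 /* step 1: forget modified values */
            after.v[i].lo = LONG_MIN;
            after.v[i].hi = LONG_MAX;
        }
        /* step 2: meet with the postcondition, bound by bound */
        if (post.v[i].lo > after.v[i].lo) after.v[i].lo = post.v[i].lo;
        if (post.v[i].hi < after.v[i].hi) after.v[i].hi = post.v[i].hi;
    }
    return after;
}
```

For a call to strcpy(dst, src) with len(src) known to lie in [2,5], the old range of len(dst) is discarded and replaced by [2,5], while len(src) and asize(dst), which are not in the Mod list, keep their previous ranges.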
Applications of CSSV
CSSV uses various software packages, including ASToolKit by Microsoft, Core C by
Greta Yorsh of TAU, GOLF by Manuvir Das of Microsoft, and New Polka by Bertrand Jeannet of
INRIA.
It was used to verify code from Airbus airplanes.
When applied to a string library from Airbus, it issued only 6 false alarms.
When applied to another application, it found 8 real errors, with only 2 false alarms.
Running time depends, of course, very much on the size of the code; it was 1-206 seconds per
procedure.
In conclusion, CSSV is a powerful tool for finding buffer overruns and other string
violations. CSSV is quite precise, finding all violations while issuing very few false
alarms, and thus is used in verifying real C programs.
Foundations of A/G abstract interpretation / Greta Yorsh
Goals
The goal of this part of the lesson is to present generic algorithms for the assertion and
assumption analysis described above. These algorithms should be both efficient and precise,
and should allow the developer to specify the contracts in a natural manner.
The idea
Let A be an abstract domain.
Let C be a concrete domain.
Let Alpha: C -> A be the abstraction function.
Let Gamma: A -> C be the concretization function.
a denotes an element of A, an abstract value.
We introduce a third domain, F, of logical formulas. It is used to provide an alternative
representation for abstract values as well as for specifying contracts. We define a new
function, Gamma_Hat: A -> F, where Gamma_Hat(a) is a formula f in F that exactly
characterizes Gamma(a), meaning that it characterizes the set of concrete states represented
by a. More formally, a concrete state S satisfies Gamma_Hat(a) if and only if S is an element
of Gamma(a).
Example
Abstract value: [(x,0,2), (y,2,infinity), (z,T)]
Formula (Gamma_Hat of the abstract value): (x >= 0) and (x <= 2) and (y >= 2)
Concrete states (in Gamma of the abstract value): x=2, y=4, z=-900; x=1, y=3, z=4
The abstract value is a mapping from variables to the intervals of their possible values. The
formula provides an exact characterization of all concrete states that are represented by
the abstract value.
Assume and Guarantee
As described above, to implement the assume/guarantee method one should:
1. Implement the Assume[phi](a) function, which (possibly) reduces the set of states
represented by a, removing all states that do not fulfill the condition phi.
2. Implement the checks of the guarantees, by means of assertions that report when the
condition is not fulfilled.
Define Alpha_Hat: F -> A to map a formula to the most precise abstract value that represents
the set of stores defined by the formula. That is, if Alpha_Hat(phi) = A1, then:
1. Gamma(A1) contains the set [|phi|], the set of concrete states fulfilling phi.
2. For every A2, if Gamma(A2) contains [|phi|], then Gamma(A2) contains Gamma(A1).
We now show the Assume[phi](a) operation, which yields the most precise abstraction of the
set of stores represented by a for which the precondition phi holds.
Assume[phi](a) = Alpha_Hat(Gamma_Hat(a) AND phi).
The intuition behind this formula is as follows. Apply Gamma_Hat to a; the result is a
formula that describes the set of states abstracted by a. Now conjoin this formula with phi
to obtain a new formula, which is satisfied only by states that fulfill both the formula
corresponding to a and phi. This is indeed the meaning of assuming phi. Finally, return to
the corresponding abstract value by applying Alpha_Hat.
[Figure omitted: in the space of formulas, Gamma_Hat(a) is conjoined with phi; Alpha_Hat maps
the formula Gamma_Hat(a) AND phi back to the abstract value Assume[phi](a).]
Alpha_Hat(phi) can be implemented by a search on the abstract lattice that moves down with
respect to the lattice order, exploiting the monotonicity of Gamma, until it finds the
element a such that Gamma(a) is the tightest set containing [|phi|].
The Guarantee part of the contract is implemented as a logical check, where assert[phi](a)
translates to the question of whether Gamma_Hat(a) => phi holds (meaning that phi is true
whenever the formula representing a is true).
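For a tiny finite universe, both operations can be computed by brute force, which makes the definitions easy to experiment with. The sketch below assumes two integer variables ranging over -8..8 and a box (interval) abstraction. It replaces the lattice search and the theorem prover described above with plain enumeration of concrete states, so it illustrates the definitions and not the actual algorithm; all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define UNIV_LO (-8)   /* concrete states are pairs (x, y) with both in -8..8 */
#define UNIV_HI 8

typedef struct { int xlo, xhi, ylo, yhi; } Box;   /* abstract value; empty if xlo > xhi */
typedef bool (*Formula)(int x, int y);            /* a formula over concrete states */

/* Gamma_Hat(a), evaluated on one concrete state. */
bool gamma_hat(Box a, int x, int y) {
    return a.xlo <= x && x <= a.xhi && a.ylo <= y && y <= a.yhi;
}

/* Assume[phi](a) = Alpha_Hat(Gamma_Hat(a) AND phi): the tightest box around
   all states that satisfy both formulas. */
Box assume(Box a, Formula phi) {
    assert(phi != NULL);
    Box r = { UNIV_HI + 1, UNIV_LO - 1, UNIV_HI + 1, UNIV_LO - 1 };   /* bottom */
    for (int x = UNIV_LO; x <= UNIV_HI; x++)
        for (int y = UNIV_LO; y <= UNIV_HI; y++)
            if (gamma_hat(a, x, y) && phi(x, y)) {
                if (x < r.xlo) r.xlo = x;
                if (x > r.xhi) r.xhi = x;
                if (y < r.ylo) r.ylo = y;
                if (y > r.yhi) r.yhi = y;
            }
    return r;
}

/* assert[phi](a): does Gamma_Hat(a) => phi hold for every state? */
bool guarantee(Box a, Formula phi) {
    assert(phi != NULL);
    for (int x = UNIV_LO; x <= UNIV_HI; x++)
        for (int y = UNIV_LO; y <= UNIV_HI; y++)
            if (gamma_hat(a, x, y) && !phi(x, y)) return false;
    return true;
}

/* Example formula phi: x < y. */
bool x_less_than_y(int x, int y) { return x < y; }
```

Assuming x < y on the box 0 <= x <= 5, 0 <= y <= 3 yields 0 <= x <= 2, 1 <= y <= 3. That box is the best abstraction, yet it still contains the state (2,1), so guarantee fails on it: the box domain cannot express the relation itself.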
Further issues and algorithms
3-valued Logical structure
Relations take values in the truth values {0, ½, 1}, where 0 means false, ½ means unknown,
and 1 means true. The truth values form a join semi-lattice in which ½ lies above both 0 and
1, so that 0 join 1 = ½.
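The join on truth values is small enough to write out directly. The sketch below encodes 0, ½ and 1 as an enum. The notes state only the join; the connectives shown follow Kleene's strong three-valued logic, which is the usual choice for 3-valued structures and is an assumption here.

```c
#include <assert.h>

/* Truth values 0, 1/2 and 1, encoded so that K_FALSE < K_HALF < K_TRUE. */
typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } Kleene;

/* Information-order join: equal values are kept, different values give 1/2. */
Kleene k_join(Kleene a, Kleene b) { return a == b ? a : K_HALF; }

/* Kleene connectives: with this encoding, and = min, or = max, not = 2 - a. */
Kleene k_and(Kleene a, Kleene b) { return a < b ? a : b; }
Kleene k_or(Kleene a, Kleene b)  { return a > b ? a : b; }

Kleene k_not(Kleene a) {
    assert(a == K_FALSE || a == K_HALF || a == K_TRUE);   /* requires a valid truth value */
    return (Kleene)(K_TRUE - a);
}
```

A definite 0 still decides a conjunction even when the other operand is unknown, which is why some formulas evaluate to a definite value on a structure that contains ½ entries.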
The abstraction used is canonical abstraction, where Gamma_Hat(a) is a formula in first
order logic with transitive closure that characterizes the set of concrete states represented
by a.
Example
Let us say we want to find the abstract representation of all stores in which y == x->next,
i.e., y points to the same location as the next field of the node x points to.
Then
phi = ∀v1. y(v1) iff ∃v2. x(v2) ∧ n(v2,v1) (n means next)
As explained before, we look for the abstract element that best represents [|phi|].
We apply assume[y == x->n](T).
We start from the top value and go down in the lattice. Say that when we go down we reach
the state below, labeled ans. In this state the formula does not get a definite value, since
the value of y is not definite in it. We keep refining this value until we reach the lower
two structures, in which the formula gets a definite value; thus we do not refine them
further.
[Figure omitted: the structures visited by the search, labeled T, ans and Alpha_Hat(a), over
the nodes U1, U2 and Uy with the variables x and y.]
Abstract space. Gray arrows indicate value 1/2, solid arrows indicate value 1.
Materialization
Note that in order to get the structures that satisfy both Gamma_Hat(ans) and the formula, we
had to materialize a node in the universe. That is, ans has only 2 nodes, but in order to
represent all the cyclic lists with 3 or more elements we had to materialize a node, i.e.,
“split the summary node” into a node that y points to and the rest of the nodes.
Example of Materialization
If we apply assume[phi](ans) we get two structures, where materialization was used in order
to generate the left structure.
[Figure omitted: ans and the two structures obtained from it. Materialization splits U2 into
Uy and U2, with y(Uy) = 1 and y(U2) = 0; in the other structure y(U2) = 1.]
Abstract Operations
As stated above, Alpha_Hat(phi) is the best abstract value that represents phi. Its uses:
1. assume[phi](a), as we saw, is implemented using Alpha_Hat: assume[phi](a) =
Alpha_Hat(Gamma_Hat(a) AND phi).
- Thus it gives us assume-guarantee reasoning.
- Pre- and postconditions are specified by logical formulas.
2. BT(t,a) = Alpha_Hat(Gamma_Hat(extend(a)) AND t), where extend(a) is an extension of the
state that describes the state both before and after applying t. BT is the best abstract
transformer, and it allows parametric abstractions.
3. meet(a1,a2) = Alpha_Hat(Gamma_Hat(a1) AND Gamma_Hat(a2)), which can be used as a way to
compute the meet.
SPASS
SPASS is a theorem prover that can handle arbitrary first-order formulas, so we can use it to
prove relationships between the formulas created by the algorithm. In general it can
diverge, but on the examples used by the algorithm it converges.
How to handle first-order formulas with transitive closure?
Over-approximation will lead to too many structures.
Decidable transitive-closure logic (Neil Immerman (UMASS), Alexander Rabinovich (TAU))
∃∀(TC,f) is a subset of first-order logic with transitive closure. It includes the ∃ and ∀
quantifiers, a single function f, and arbitrary unary relations. For this logic the
satisfiability problem is decidable, though NEXPTIME-complete (meaning that solving it is
believed to take more than exponential time). Furthermore, any natural extension of it makes
satisfiability undecidable. This logic is quite limited.
Simulation technique, CAV’04 (Neil Immerman (UMASS), Alexander Rabinovich (TAU))
This technique simulates realistic, complicated data structures using a decidable logic over
tractable structures: simple structures that can be analyzed more easily, such as linked
lists or trees. Once the simulation is performed, proofs about the simple structures imply
properties of the complicated data structures.
Further work
- Implementation of the algorithms
- Finding a decidable logic for shape analysis
- Performing Assume-Guarantee analysis for “real” programs
- Java collection
- Handling side-effects
- Composing a specification language
- Composing procedure specifications
- Extend to other domains
- Handling Infinite-height domains. Widening may be imprecise.
- Tuning the abstraction according to the specification (e.g., changing the abstract domain
according to the specification, to remember certain information)
Summary
- Assume/Guarantee approach can be a powerful tool in program analysis and
verification(both at design time and at run time).
- But it requires some effort:
- designing the specification language
- specification of contracts by the programmers
- performing abstract interpretation
- Specification can be verified at run time, but this requires efficient runtime testing
Resources
1. Lecture #12
2. CSSV: Towards a Realistic Tool for Statically Detecting All Buffer Overflows in C / Nurit
Dor, Michael Rodeh, Mooly Sagiv
3. Symbolically Computing Most-Precise Abstract Operations for Shape Analysis / G. Yorsh,
T. Reps, M. Sagiv