Compactly Representing First-Order Structures for Static Analysis

Roman Manevich, Mooly Sagiv (Tel-Aviv University)
Ganesan Ramalingam, John Field, Deepak Goyal (IBM T.J. Watson)
Motivation


TVLA is a powerful and general abstract interpretation system
Abstract interpretation in TVLA
  Operational semantics is expressed with first-order logic formulae
  Program states are represented as sets of Evolving First-Order Structures
  Space is a major bottleneck
Desired Properties

Sparse data structures
Share common sub-structures
  Inherited sharing
  Incidental sharing due to program invariants
But feasible time performance
  Phase-sensitive data structures
Outline


Background
First-order structure representations
  Base representation (TVLA 0.91)
  BDD representation
Empirical evaluation
Conclusion
First-Order Logical Structures




Generalize shape graphs
Arbitrary set of individuals
Arbitrary set of predicates on individuals
Dynamically evolving
  Usually small changes
Properties are extracted by evaluating first-order formulae, e.g. ∃v1, v : x(v1) ∧ n(v1, v)
Join operator requires isomorphism testing
First-Order Structure ADT







Structure new() /* empty structure */
SetOfNodes nodeSet(Structure)
Node newNode(Structure)
removeNode(Structure, Node)
Kleene eval(Structure, p(r), <u1, ..., ur>) /* p(r): predicate p of arity r */
update(Structure, p(r), <u1, ..., ur>, Kleene)
Structure copy(Structure)
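The ADT above can be sketched in a few lines. This is a minimal illustration and not TVLA's implementation: the class name `Structure`, the use of 0.0 / 0.5 / 1.0 for the Kleene values 0, ½, 1, and the single flat dictionary are all assumptions made for the example. The usage part builds the S0 structure of the print_all example (the n=½ edges follow my reading of the figure) and applies the update for `x = y`.

```python
from itertools import count

# Kleene's three truth values; 0.5 stands for the indefinite value written 1/2 in the slides.
FALSE, UNKNOWN, TRUE = 0.0, 0.5, 1.0


class Structure:
    """Hypothetical in-memory version of the ADT (not TVLA's actual class)."""

    def __init__(self):
        self._nodes = set()
        self._values = {}      # (predicate, node tuple) -> non-zero Kleene value
        self._fresh = count()  # source of new node names

    def node_set(self):
        return frozenset(self._nodes)

    def new_node(self):
        node = next(self._fresh)
        self._nodes.add(node)
        return node

    def remove_node(self, node):
        self._nodes.discard(node)
        # Drop every predicate value that mentions the removed node.
        self._values = {k: v for k, v in self._values.items() if node not in k[1]}

    def eval(self, pred, nodes):
        # Sparse: anything not stored is 0.
        return self._values.get((pred, tuple(nodes)), FALSE)

    def update(self, pred, nodes, value):
        key = (pred, tuple(nodes))
        if value == FALSE:
            self._values.pop(key, None)
        else:
            self._values[key] = value

    def copy(self):
        other = Structure()
        other._nodes = set(self._nodes)
        other._values = dict(self._values)
        other._fresh = count(max(self._nodes, default=-1) + 1)
        return other


# S0 from the print_all example: u1 is pointed to by y, u is a summary node.
s0 = Structure()
u1, u = s0.new_node(), s0.new_node()
s0.update("y", (u1,), TRUE)
s0.update("sm", (u,), UNKNOWN)
s0.update("n", (u1, u), UNKNOWN)  # assumed edge placement
s0.update("n", (u, u), UNKNOWN)   # assumed edge placement

# x = y, i.e. x'(v) := y(v), expressed with ADT operations only.
s1 = s0.copy()
for v in s0.node_set():
    s1.update("x", (v,), s0.eval("y", (v,)))
```

The full copy in `copy()` is the simplest possible choice and is exactly the cost that the sharing-based representations later in the talk try to avoid.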
print_all Example
/* list.h */
typedef struct node {
    struct node *n;
    int data;
} *L;

/* print.c */
#include <stdio.h>
#include "list.h"

/* Print every element of the list y. */
void print_all(L y) {
    L x;
    x = y;
    while (x != NULL) {
        /* assert(x != NULL) */
        printf("elem=%d", x->data);
        x = x->n;
    }
}
print_all Example
x = y
x’(v) := y(v)
copy(S0) : S1
nodeSet(S0) : {u1, u}
eval(S0, y, u1) : 1
update(S1, x, u1, 1)
eval(S0, y, u) : 0
update(S1, x, u, 0)
[Figure: S0 has node u1 (y=1) and summary node u (sm=½), connected by n=½ edges. S1 is the same structure with x=1 added on u1.]
print_all Example
while (x != NULL)
precondition : ∃v x(v)
x = x->n
focus : ∃v1 x(v1) ∧ n(v1, v)
x’(v) := ∃v1 x(v1) ∧ n(v1, v)
[Figure: S1 and the three structures derived from it. S2.0: u1 (y=1) and summary node u (sm=½). S2.1: u1 (y=1) with n=1 to u (x=1). S2.2: u1 (y=1) with n=1 to u.1 (x=1), plus summary node u.0 (sm=½); the remaining edges are n=½.]
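To see why focus is applied before the update, it helps to evaluate the update formula directly on S1 under Kleene's three-valued semantics (conjunction is minimum, existential quantification is maximum). The sketch below encodes ½ as 0.5; the dictionary layout and the placement of the n=½ edges are assumptions made for illustration.

```python
FALSE, UNKNOWN, TRUE = 0.0, 0.5, 1.0


def k_and(a, b):
    """Kleene conjunction: the minimum of the two values."""
    return min(a, b)


def k_exists(values):
    """Kleene existential quantifier: the maximum over all bindings."""
    return max(values, default=FALSE)


# S1 from the example: x and y point to u1, u is a summary node.
nodes = ["u1", "u"]
x = {"u1": TRUE}
n = {("u1", "u"): UNKNOWN, ("u", "u"): UNKNOWN}  # assumed edge placement


def x_next(v):
    """Value of  exists v1 . x(v1) and n(v1, v)  for the node v."""
    return k_exists(
        k_and(x.get(v1, FALSE), n.get((v1, v), FALSE)) for v1 in nodes
    )


result = {v: x_next(v) for v in nodes}
```

Without focus the formula yields the indefinite value on u, so the new x would be imprecise. The three structures S2.0, S2.1, and S2.2 enumerate the cases in which the formula is definite.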
Overview and Main Results
1. Two novel representations of first-order structures
   New BDD representation
   New representation using functional maps
2. Implementation techniques
3. Empirical evaluation
   Comparison of different representations
   Space is reduced by a factor of 4–10
   New representations scale better
Base Representation
(Tal Lev-Ami SAS 2000)



Two-Level Map:
Predicate → (Node Tuple → Kleene)
Sparse Representation
Limited inherited sharing by “Copy-On-Write”
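A two-level map with copy-on-write can be sketched as follows. The class and method names are hypothetical and the per-predicate ownership set is one possible way to implement the idea; the slides do not say how TVLA 0.91 tracks sharing.

```python
class BaseStructure:
    """Sketch of Predicate -> (Node Tuple -> Kleene) with copy-on-write."""

    def __init__(self, maps=None):
        # Outer map is private; inner maps may be shared with other structures.
        self._maps = dict(maps or {})
        self._owned = set()  # predicates whose inner map may be mutated in place

    def eval(self, pred, nodes):
        return self._maps.get(pred, {}).get(tuple(nodes), 0.0)

    def update(self, pred, nodes, value):
        if pred not in self._owned:
            # First write since the last copy: clone only this predicate's map.
            self._maps[pred] = dict(self._maps.get(pred, {}))
            self._owned.add(pred)
        if value == 0.0:
            self._maps[pred].pop(tuple(nodes), None)  # keep the map sparse
        else:
            self._maps[pred][tuple(nodes)] = value

    def copy(self):
        # After copying, the inner maps are shared, so neither side owns them.
        self._owned.clear()
        return BaseStructure(self._maps)

    def shares(self, other, pred):
        return pred in self._maps and self._maps[pred] is other._maps.get(pred)


s0 = BaseStructure()
s0.update("y", ("u1",), 1.0)
s0.update("n", ("u1", "u"), 0.5)

s1 = s0.copy()
s1.update("x", ("u1",), 1.0)        # only the map for x is new
shared_before = s1.shares(s0, "n")  # n is still the same object in both
s1.update("n", ("u1", "u"), 1.0)    # now n is cloned for s1
shared_after = s1.shares(s0, "n")
```

Sharing here is limited in the sense of the slide: it exists only between a structure and its copies, and only until a predicate is first written.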
BDDs in a Nutshell (Bryant 86)



Ordered Binary Decision Diagrams
Data structure for Boolean functions
Functions are represented as (unique) DAGs
x1 x2 x3 | f
 0  0  0 | 0
 0  0  1 | 0
 0  1  0 | 0
 0  1  1 | 1
 1  0  0 | 0
 1  0  1 | 1
 1  1  0 | 0
 1  1  1 | 1
[Figure: the corresponding decision tree, testing x1, then x2, then x3.]
Also achieve sharing across functions
[Figure: the decision tree is reduced in three steps, labelled Duplicate Terminals, Duplicate Nonterminals, and Redundant Tests.]
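The three reduction rules can be folded into node construction with a unique table, which is the usual way reduced OBDDs are built. The sketch below is a toy, not a production BDD package: it builds the diagram by enumerating all assignments, and the class name `BDD` and its methods are made up for the example. The function used is the one from the truth table, f = (x1 ∨ x2) ∧ x3.

```python
class BDD:
    """Toy reduced OBDD with a unique table (variable order x1 < x2 < ...)."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.unique = {}                 # (var, low, high) -> node id
        self.nodes = {0: None, 1: None}  # the two terminals exist exactly once

    def mk(self, var, low, high):
        if low == high:                  # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in self.unique:       # duplicate nonterminal: reuse it
            node = len(self.nodes)
            self.nodes[node] = key
            self.unique[key] = node
        return self.unique[key]

    def from_function(self, f, var=1, bits=()):
        """Build the BDD of f by exhaustive enumeration (exponential, demo only)."""
        if var > self.num_vars:
            return int(bool(f(*bits)))
        low = self.from_function(f, var + 1, bits + (0,))
        high = self.from_function(f, var + 1, bits + (1,))
        return self.mk(var, low, high)

    def evaluate(self, root, bits):
        node = root
        while node >= 2:
            var, low, high = self.nodes[node]
            node = high if bits[var - 1] else low
        return node

    def size(self, root):
        """Number of nonterminal nodes reachable from root."""
        seen, stack = set(), [root]
        while stack:
            node = stack.pop()
            if node < 2 or node in seen:
                continue
            seen.add(node)
            _, low, high = self.nodes[node]
            stack += [low, high]
        return len(seen)


bdd = BDD(3)
f = bdd.from_function(lambda x1, x2, x3: (x1 or x2) and x3)
g = bdd.from_function(lambda x1, x2, x3: x3)  # a second function in the same table
```

Because both functions live in one unique table, the node for x3 that g needs is the one already created for f, which is the sharing across functions mentioned on the slide.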
Encoding Structures Using Integers

Static encoding of
  Predicates
  Kleene values
Dynamic encoding of nodes
  0, 1, …, n-1
Encode predicate p’s values as
  ep(p) . en(u1) . en(u2) . … . en(un) . ek(Kleene)
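Reading the dots in the encoding as concatenation of bit fields, one predicate value becomes one integer. The sketch below makes that concrete; the predicate numbering, the two-bit Kleene code, and the fixed field widths are all assumptions chosen for illustration, not the encoding used in the paper.

```python
# Static encodings (hypothetical numbering).
PRED_CODE = {"x": 0, "y": 1, "sm": 2, "n": 3}
PRED_BITS = 2
KLEENE_CODE = {0.0: 0b00, 1.0: 0b01, 0.5: 0b10}
KLEENE_BITS = 2


def concat(fields):
    """Concatenate (value, width) bit fields, most significant field first."""
    code = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        code = (code << width) | value
    return code


def encode_fact(pred, nodes, kleene, node_bits):
    """ep(p) . en(u1) . ... . en(un) . ek(kleene) as a single integer."""
    fields = [(PRED_CODE[pred], PRED_BITS)]
    fields += [(u, node_bits) for u in nodes]  # nodes are numbered 0 .. n-1
    fields.append((KLEENE_CODE[kleene], KLEENE_BITS))
    return concat(fields)


# y(u1) = 1 and n(u1, u) = 1/2, with u1 numbered 0 and u numbered 1.
y_fact = encode_fact("y", (0,), 1.0, node_bits=2)      # 01 00 01
n_fact = encode_fact("n", (0, 1), 0.5, node_bits=2)    # 11 00 01 10
```

A structure is then the set of integers for its predicate values, which is what the next slides represent with a BDD.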
BDD Representation of Integer Sets

Characteristic function

S = {1, 5}
1 = <001>
5 = <101>
S = (¬x1 ∧ ¬x2 ∧ x3) ∨ (x1 ∧ ¬x2 ∧ x3)
[Figure: the BDD for S.]
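The characteristic function can be checked by enumerating all three-bit integers. The sketch also shows that the formula simplifies to ¬x2 ∧ x3, so a reduced BDD for S does not need to test x1 at all. The function names are made up for the example.

```python
from itertools import product


def chi_s(x1, x2, x3):
    """Characteristic function of S = {1, 5}, with x1 the most significant bit."""
    return bool((not x1 and not x2 and x3) or (x1 and not x2 and x3))


def chi_s_reduced(x1, x2, x3):
    """Same function after dropping the redundant test on x1."""
    return bool(not x2 and x3)


# Recover the set from its characteristic function.
members = [
    4 * x1 + 2 * x2 + x3
    for x1, x2, x3 in product((0, 1), repeat=3)
    if chi_s(x1, x2, x3)
]
```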
BDD Representation Example
[Figure, built up over four slides: BDDs for the structures S0, S1 (after x = y), and S2.2 (after x = x->n) of the print_all example.]
Improved BDD Representation


Using this representation directly doesn’t save space
Observation
  Node names can be arbitrarily remapped without affecting the ADT semantics
Our heuristics
  Use canonical node names to encode nodes
  Increases incidental sharing
  Reduces isomorphism test to pointer comparison
  Space reduced by a factor of 4–10
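One way to picture the renaming heuristic: number the nodes by their vector of unary predicate values, so that isomorphic structures receive identical encodings, and intern the result so that equality becomes an identity check. This is a sketch under assumptions, not the heuristic from the paper: it requires every node to have a distinct vector, and the function and parameter names are hypothetical.

```python
_interned = {}  # canonical encoding -> the single shared instance


def canonicalize(nodes, unary, binary, vocabulary):
    """Rename nodes by their unary-predicate vectors and intern the result.

    unary:  {pred: {node: kleene}}      binary: {pred: {(a, b): kleene}}
    vocabulary: the fixed, ordered list of unary predicates used for naming.
    """
    def name(u):
        return tuple(unary.get(p, {}).get(u, 0.0) for p in vocabulary)

    ordered = sorted(nodes, key=name)
    if len({name(u) for u in ordered}) != len(ordered):
        raise ValueError("canonical names are not unique")
    rename = {u: i for i, u in enumerate(ordered)}

    facts = set()
    for p, table in unary.items():
        facts.update((p, (rename[u],), k) for u, k in table.items() if k)
    for p, table in binary.items():
        facts.update((p, (rename[a], rename[b]), k) for (a, b), k in table.items() if k)

    frozen = frozenset(facts)
    return _interned.setdefault(frozen, frozen)


# Two isomorphic copies of S0 that differ only in node names.
a = canonicalize(
    ["u1", "u"],
    {"y": {"u1": 1.0}, "sm": {"u": 0.5}},
    {"n": {("u1", "u"): 0.5, ("u", "u"): 0.5}},
    vocabulary=["sm", "y"],
)
b = canonicalize(
    ["p", "q"],
    {"y": {"q": 1.0}, "sm": {"p": 0.5}},
    {"n": {("q", "p"): 0.5, ("p", "p"): 0.5}},
    vocabulary=["sm", "y"],
)
same_object = a is b
```

With a BDD package the interning comes for free from the unique table, which is why the isomorphism test reduces to comparing two pointers.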
Reducing Time Overhead

Current implementation not optimized


Expensive formula evaluation
Hybrid representation


Distinguish between phases:
mutable phase → Join → immutable phase
Dynamically switch representations
Functional Representation






Alternative representation for first-order structures
Structures represented by maps from integers to Kleene values
Tailored for representing first-order structures
Achieves better results than BDDs
Techniques similar to the BDD representation
More details in the paper
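The slides leave the functional maps to the paper. One common way to obtain a shareable map from integers to Kleene values is a persistent binary trie with path copying, sketched below; treating this as the representation in the paper would be an assumption. Internal nodes are `(left, right)` tuples, leaves are the non-zero Kleene values, and `None` stands for an all-zero subtree.

```python
def get(trie, key, bits):
    """Look up key (an integer below 2**bits); absent keys read as 0."""
    node = trie
    for i in reversed(range(bits)):
        if node is None:
            return 0.0
        node = node[(key >> i) & 1]
    return 0.0 if node is None else node


def put(trie, key, value, bits):
    """Return a new trie with key bound to value; the old trie is unchanged."""
    if bits == 0:
        return value if value else None       # storing 0 removes the entry
    left, right = trie if trie is not None else (None, None)
    if (key >> (bits - 1)) & 1:
        right = put(right, key, value, bits - 1)
    else:
        left = put(left, key, value, bits - 1)
    if left is None and right is None:
        return None                           # prune empty subtrees
    return (left, right)                      # only the path to key is copied


BITS = 3
m0 = put(put(None, 1, 1.0, BITS), 5, 0.5, BITS)  # {1: 1, 5: 1/2}
m1 = put(m0, 6, 1.0, BITS)                       # new version, m0 still valid
left_half_shared = m1[0] is m0[0]                # keys 0..3 live in one shared subtree
```

Each update copies one path of length `bits` and shares everything else with the previous version, so a sequence of small changes to a structure costs little extra space.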
Empirical Evaluation

Benchmarks:





Cleanness Analysis (SAS 2000)
Garbage Collector
CMP (PLDI 2002) of Java Front-End and Kernel Benchmarks
Mobile Ambients (ESOP 2000)
Stress testing the representations
  We use “relational analysis”
  Save structures in every CFG location
Space Results
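[Figure: bar chart comparing the Base, OBDD total, and Functional representations on the JFE, KERNEL, CA, MA, and GC benchmarks; vertical axis 0–450. Bar labels in the source: 402.8, 187.7, 168.2, 51.6, 12.8, 5.5, 22.7, 16.7, 12.9, 9.6.]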
Abstract Counters


Ignore language/implementation details
A more reliable measurement technique


Count only crucial space information
Independent of C/Java
Abstract Counters Results
[Figure: bar chart of abstract counter totals for the Base, OBDD, and Functional representations on the JFE, KERNEL, CA, MA, and GC benchmarks; vertical axis 0–45,000,000.]
Trends in the Cleanness Analysis Benchmark
[Figure: chart comparing the Base, OBDD, and Functional representations; horizontal axis 1–10, vertical axis 0–600. Data labels in the source: 564, 505, 74, 54, 42, 50.]
What’s Missing from this Work?



Investigate other node mapping heuristics
Compactly represent sets of structures
Time optimizations
Conclusions

Two novel representations of first-order structures
  New BDD representation
  New representation using functional maps
Implementation techniques
  Normalization techniques are crucial
Empirical evaluation
  Comparison of different representations
  Space is reduced by a factor of 4–10
  New representations scale better
Conclusions

The use of BDDs for static analysis is not a panacea for space saving
  Domain-specific encoding is crucial for saving space
  Failed attempts
    Original implementation of Veith’s encoding
    PAG

The End