Context-sensitive points-to analysis: is it worth it?

Pointer and Shape Analysis Seminar
Article by Ondřej Lhoták & Laurie Hendren, McGill University
Presentation by Roza Pogalnikova

Abstract
• Evaluate the precision of subset-based points-to analysis
• Compare different approaches to context sensitivity:
  − call-site strings
  − object sensitivity
  − the algorithm of Zhu and Calman, and of Whaley and Lam (ZCWL)
28/02/08

Subset-based PTA
• Find the allocation sites that may reach a variable:
  − S: a = new A() // allocation statement
  − for a variable x somewhere in the program: can it point to an object allocated at S?

Context Sensitivity
• Call-site sensitivity: the context is the program statement of the method invocation
  S: this->call_method()
• Object sensitivity: the context is the receiver object of the method invocation
  S: this->call_method()
• ZCWL: call-site strings of length k (as in k-CFA), where k is the maximum call depth of the call graph after merging SCCs
  Run a context-insensitive algorithm on the cloned, context-sensitive call graph.

Parameters
• What to model with context:
  − specialize only pointer variables
  − specialize the heap abstraction as well
• Different lengths of context strings

Measurements
• Measures to guide implementation:
  − number of contexts
  − number of distinct contexts
  − number of distinct points-to sets
• Measures to evaluate precision:
  − size of the call graph (methods/edges)
  − devirtualizable call sites
  − casts statically provable to be safe

Results
• Object sensitivity is the most precise and the most scalable approach
• A context-sensitive heap abstraction improves the precision of the analysis
• Precision drops when cycles in the call graph are analyzed without context sensitivity

What
• Compare three kinds of context-sensitive points-to analysis:
  − call sites as context abstraction
  − object-sensitive analysis
  − ZCWL algorithm

How
• Implemented with the JEDD system:
  − a language extension of Java
  − an abstraction for working with Binary Decision Diagrams (BDDs)
  − analyses in the Soot framework written in JEDD:
    • points-to analysis
    • call graph construction
    • side-effect analysis in BDDs
    • virtual call resolution

BDDs
Binary decision tree and truth table for the function
f(x1, x2, x3) = ¬x1·¬x2·¬x3 + x1·x2 + x2·x3
BDD for the function f
* credit: http://en.wikipedia.org/wiki/Binary_decision_diagram
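
The truth table that the slide's figure showed can be regenerated from the formula. The following is a minimal sketch in Python; the helper names `truth_table` and `cofactors` are illustrative and not from the presentation. The two cofactors correspond to the subtrees below the root node x1 of the decision tree.

```python
from itertools import product


def f(x1, x2, x3):
    """f = (not x1 and not x2 and not x3) or (x1 and x2) or (x2 and x3)."""
    return (not x1 and not x2 and not x3) or (x1 and x2) or (x2 and x3)


def truth_table():
    """Rows (x1, x2, x3, f) in binary counting order, as 0/1 integers."""
    return [(x1, x2, x3, int(bool(f(x1, x2, x3))))
            for x1, x2, x3 in product((0, 1), repeat=3)]


def cofactors(x1):
    """Values of f for a fixed x1, over (x2, x3) in counting order."""
    return [int(bool(f(x1, x2, x3))) for x2, x3 in product((0, 1), repeat=2)]


if __name__ == "__main__":
    for row in truth_table():
        print(*row)
```

A BDD shares identical subgraphs and drops redundant tests, which is why it can be much smaller than the full decision tree enumerated here.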

PTA using BDDs
Program:            Points-to:
A: a = new O()      (a, A)
B: b = new O()      (b, B)
C: c = new O()      (c, C)
a = b               (a, B)
b = a               (b, A)
c = b               (c, A), (c, B)
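
The points-to pairs for this program can be reproduced with a naive fixed-point iteration. This is a sketch of a flow-insensitive, subset-based propagation using plain Python sets, not the BDD-based implementation discussed in the talk; the function name `solve` and the input encoding are assumptions made for illustration.

```python
def solve(allocations, assignments):
    """Flow-insensitive subset-based points-to analysis.

    allocations: list of (variable, allocation_site)
    assignments: list of (dst, src), meaning the statement dst = src
    Returns a dict mapping each variable to its set of allocation sites.
    """
    pts = {}
    for var, site in allocations:
        pts.setdefault(var, set()).add(site)

    changed = True
    while changed:  # iterate until no points-to set grows any more
        changed = False
        for dst, src in assignments:
            target = pts.setdefault(dst, set())
            before = len(target)
            target |= pts.get(src, set())  # subset constraint: pt(src) ⊆ pt(dst)
            if len(target) != before:
                changed = True
    return pts


# The program from the slide.
allocations = [("a", "A"), ("b", "B"), ("c", "C")]
assignments = [("a", "b"), ("b", "a"), ("c", "b")]
result = solve(allocations, assignments)
```

Each assignment `dst = src` contributes the constraint that everything `src` may point to, `dst` may point to as well; the loop stops once all constraints hold.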

PTA using BDDs
• Binary representation:
  − a and A as 00
  − b and B as 01
  − c and C as 10
• Points-to representation:
  (a, A) as 0000
  (a, B) as 0001
  (b, A) as 0100
  (b, B) as 0101
  (c, A) as 1000
  (c, B) as 1001
  (c, C) as 1010
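
The encoding above can be written down directly. The sketch below represents the points-to relation as the set of 4-bit strings on which its characteristic function is true, which is the Boolean function a BDD would store; the names `VAR_BITS`, `SITE_BITS`, `encode` and `characteristic` are illustrative only.

```python
# Bit patterns taken from the slide: variables and allocation sites use 2 bits each.
VAR_BITS = {"a": "00", "b": "01", "c": "10"}
SITE_BITS = {"A": "00", "B": "01", "C": "10"}


def encode(var, site):
    """Concatenate the variable bits and the allocation-site bits."""
    return VAR_BITS[var] + SITE_BITS[site]


def characteristic(pairs):
    """Set of bit strings for which the relation's Boolean function is 1."""
    return {encode(var, site) for var, site in pairs}


def may_point_to(relation, var, site):
    """Membership test: does the encoded relation contain (var, site)?"""
    return encode(var, site) in relation


points_to = [("a", "A"), ("a", "B"), ("b", "A"), ("b", "B"),
             ("c", "A"), ("c", "B"), ("c", "C")]
relation = characteristic(points_to)
```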

PTA using BDDs
• BDDs give a compact way to represent points-to relations
* credit: [2] Points-to Analysis using BDDs

Determine
• How many contexts are generated?
• How does the number of contexts relate to the precision of the analysis?
• How likely is a scalable solution to be feasible?

Background
• O: abstraction of pointer targets (objects)
• P: abstraction of pointers
• I: abstraction of method invocations
• Pointer p may point to object o: O(o) ∈ pt(P(p))

Background
• Oas(o): the program statement where object o was allocated
• Pvar(p): abstraction of a pointer p that is a local variable
• [O(o), f]: field f of object o
• Pfs(o.f): abstraction of a pointer that is field f of object o

Background
• Compare 2 families of invocation abstraction:
  − call site Ics(i) (the program statement of the method call)
  − receiver object Iro(i) = O(o) (the object o on which the method was invoked)

Background
• String of contexts given a base abstraction Ibase:
  Istring(i) = [Ibase(i), Ibase(i2), Ibase(i3), ...]
• ij is the j-th topmost invocation on the stack during i (so i = i1)
• Two approaches to make it finite:
  − limit the length of the context string to k
  − ZCWL: exclude cycle edges from the call graph
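
A k-limited context string can be sketched as follows. The representation of an invocation as a `(call_site, receiver_object)` tuple and the names `context_string`, `call_site` and `receiver` are assumptions for illustration; they stand in for Ics and Iro from the previous slide.

```python
def call_site(invocation):
    """Base abstraction Ics: the call-site statement of the invocation."""
    return invocation[0]


def receiver(invocation):
    """Base abstraction Iro: the abstract receiver object of the invocation."""
    return invocation[1]


def context_string(stack, base, k=None):
    """Apply `base` to the invocations on the stack, topmost first.

    stack: list of invocations [i1, i2, i3, ...], where i1 is the topmost
    k: maximum length of the string; None keeps the whole stack
    """
    frames = stack if k is None else stack[:k]
    return tuple(base(i) for i in frames)


# Hypothetical stack: (call site, receiver object) per invocation, topmost first.
stack = [("s3", "o2"), ("s2", "o1"), ("s1", "o1")]
by_call_site = context_string(stack, call_site, k=2)
by_receiver = context_string(stack, receiver, k=2)
```

Truncating to the k topmost entries keeps the set of contexts finite even when the call graph contains recursion.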

Background
• Another choice: which pointers and objects to model context-sensitively?
• Given a context-insensitive pointer abstraction Pci and a context abstraction I, model a run-time pointer p:
  − context-sensitively by P(p) = [I(ip), Pci(p)]
    (ip is the method invocation in which p occurs)
  − context-insensitively by P(p) = Pci(p)

Background
• Given the allocation-site abstraction Oas and a context abstraction I, model an object o:
  − context-sensitively by O(o) = [I(io), Oas(o)]
    (io is the method invocation in which o was allocated)
  − context-insensitively by O(o) = Oas(o)

Benchmarks
• The study was performed on:
  − SpecJVM 98 benchmark suite
  − DaCapo benchmark suite (ver. beta050224)
  − Ashes benchmark suite
  − Polyglot extensible Java front-end
• Sun standard library 1.3.1_01

Contexts Number
• Context-sensitive analysis has been considered intractable:
  − contexts are propagated from each call site to the called method
  − the number of context strings grows exponentially in the length of call chains
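
The exponential growth is easy to see on a small synthetic call graph. The sketch below counts unbounded call-site strings per method in an acyclic call graph; the function name `count_contexts` and the example graph (a chain in which each method calls the next from two call sites) are hypothetical and not taken from the benchmarks.

```python
from functools import lru_cache


def count_contexts(call_edges, entry):
    """Count distinct call-site strings that reach each method from `entry`.

    call_edges: list of (caller, call_site, callee); the graph must be acyclic.
    Returns a dict mapping each method to its number of contexts.
    """
    callers = {}
    methods = {entry}
    for caller, _site, callee in call_edges:
        # One entry per call site, so two sites in the same caller count twice.
        callers.setdefault(callee, []).append(caller)
        methods.update((caller, callee))

    @lru_cache(maxsize=None)
    def paths(method):
        if method == entry:
            return 1
        return sum(paths(c) for c in callers.get(method, []))

    return {m: paths(m) for m in methods}


# Hypothetical chain m0 -> m1 -> ... -> m10 with two call sites per step.
edges = []
for i in range(10):
    edges.append((f"m{i}", f"s{i}a", f"m{i + 1}"))
    edges.append((f"m{i}", f"s{i}b", f"m{i + 1}"))
contexts = count_contexts(edges, "m0")
```

With two call sites per step, the number of contexts doubles at every level of the chain, so a chain of depth n yields 2^n contexts for the last method.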

Contexts Number
• Questions to clarify:
  − how many of these contexts improve the analysis results?
  − why can BDDs represent such a number of contexts, and is there hope of representing it with traditional techniques?

Total contexts number
• Count method-context pairs
• Empty entries in the results: the analysis did not complete within the available memory
• The BDD library could allocate 41 million BDD nodes (~820 MB)

Total contexts number
• Explicit context representation does not scale well
• The number of contexts grows slowly in the object-sensitive analysis (method invocations on the this pointer)
• ZCWL:
  − k is the maximum call depth in the call graph after merging SCCs
  − large variations, because k is different for each benchmark

Equivalent contexts
• Method-context pairs (m1, c1) and (m2, c2) are equivalent if:
  − m1 = m2
  − for every local pointer p in the method, pt(P(p)) is the same for c1 and c2
• The number of equivalence classes reflects the precision improvement due to context sensitivity
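
The definition can be turned into a small grouping routine. This is a sketch over a hypothetical result format (a dict from method-context pairs to per-variable points-to sets); the name `equivalence_classes` and the sample data are assumptions, not output of the actual analysis.

```python
from collections import defaultdict


def equivalence_classes(results):
    """Group contexts of the same method that yield identical points-to sets.

    results: dict mapping (method, context) to {local_pointer: set_of_objects}
    Returns a dict mapping a class key to the list of contexts in that class.
    """
    classes = defaultdict(list)
    for (method, context), pts in results.items():
        # Empty sets are dropped so a missing variable equals an empty set.
        signature = frozenset(
            (var, frozenset(objs)) for var, objs in pts.items() if objs
        )
        classes[(method, signature)].append(context)
    return dict(classes)


# Hypothetical analysis result for one method under three contexts.
results = {
    ("m", "c1"): {"p": {"A"}, "q": {"B"}},
    ("m", "c2"): {"p": {"A"}, "q": {"B"}},
    ("m", "c3"): {"p": {"A", "C"}, "q": {"B"}},
}
classes = equivalence_classes(results)
```

Here three contexts collapse into two classes, so only one of the extra contexts actually adds precision.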

Equivalent contexts
• BDDs "automatically" merge equal points-to relations, which makes the representation effective
• Object-sensitive analysis is more precise than call-site strings
• Context string length does not have a great impact
• Surprisingly, ZCWL is less precise, due to context insensitivity in SCCs

Distinct points-to sets
• Measures the cost of the analysis
• Approximates space requirements in a "traditional" representation, such as shared bit-vectors
• Similar results for all context-sensitive variations
• Increase in distinct points-to sets with a context-sensitive heap abstraction

Call Graph
• Compare context-insensitive projections of context-sensitive call graphs
  − each node is a method (not a method-context pair)
  − reachable methods are preserved
  − ZCWL is excluded (its result is the same as the input context-insensitive graph)

Reachable methods
• Context sensitivity discovers more unreachable methods (bloat)
• Context sensitivity for heap objects:
  − adds precision in the object-sensitive analysis (sablecc-j)
  − has no impact in the call-site analysis

Call edges
• Compare the size of the call graph in call edges
• Results are the same, with the exception of a large difference in sablecc-j (specific code pattern)

Virtual call resolution
• Count the virtual calls with more than one implementation
• Object-sensitive analysis has a clear advantage over call-site strings
  − modeling heap objects with context adds precision (sablecc-j)
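
How a points-to set decides whether a virtual call has more than one implementation can be sketched as below. The class hierarchy, the dictionaries `declares` and `superclass`, and the function names are hypothetical; single inheritance is assumed to keep the dispatch lookup short.

```python
def dispatch(cls, method, declares, superclass):
    """Find the class whose implementation of `method` runs for receiver class `cls`."""
    while cls is not None:
        if method in declares.get(cls, ()):
            return (cls, method)
        cls = superclass.get(cls)  # walk up; None when there is no superclass
    return None


def call_targets(receiver_pts, object_type, method, declares, superclass):
    """Set of implementations a call may reach, given the receiver's points-to set."""
    return {dispatch(object_type[o], method, declares, superclass)
            for o in receiver_pts}


def is_devirtualizable(receiver_pts, object_type, method, declares, superclass):
    """A call site can be resolved statically if exactly one target remains."""
    return len(call_targets(receiver_pts, object_type, method,
                            declares, superclass)) == 1


# Hypothetical hierarchy: Circle overrides area(), Square inherits it from Shape.
declares = {"Shape": {"area"}, "Circle": {"area"}, "Square": set()}
superclass = {"Circle": "Shape", "Square": "Shape"}
object_type = {"A": "Circle", "B": "Square", "C": "Circle"}
```

A more precise analysis yields smaller receiver points-to sets, so fewer call sites are left with several possible targets.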

Cast safety
• A cast cannot fail if the pointer can point only to objects of the "right" type (a subtype of the type in the cast)
• Count the casts that cannot be proven safe
• Object sensitivity, especially with context-sensitive heap objects, is the best (polyglot, javac)
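
The safety condition for a cast can be checked directly against a points-to set. The following sketch assumes a hypothetical type hierarchy given as a `supertypes` dictionary and a mapping from abstract objects to their allocated types; none of these names come from the presentation.

```python
def is_subtype(typ, target, supertypes):
    """True if `typ` equals `target` or reaches it through the supertype relation."""
    seen = set()
    work = [typ]
    while work:
        current = work.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        work.extend(supertypes.get(current, ()))
    return False


def cast_is_safe(points_to, object_type, cast_type, supertypes):
    """A cast is provably safe if every possible target object has a suitable type."""
    return all(is_subtype(object_type[o], cast_type, supertypes)
               for o in points_to)


# Hypothetical hierarchy and abstract objects.
supertypes = {"Circle": ["Shape"], "Square": ["Shape"], "Shape": ["Object"]}
object_type = {"A": "Circle", "B": "Square"}
```

A cast to `Circle` is provable when the pointer's set is `{"A"}` but not when an imprecise analysis reports `{"A", "B"}`, which is how added precision turns into more casts proven safe.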

Conclusions
• Context-sensitive variations:
  − object-sensitive analysis
  − call sites as context abstraction
  − ZCWL algorithm
• Evaluated effects:
  − generated contexts
  − distinct points-to sets
  − precision of call graph construction
  − virtual call resolution
  − cast safety analysis

Conclusions
• Context-sensitivity improvements:
  − small: call graph precision
  − medium: virtual call resolution
  − major: cast safety analysis
• Object-sensitive analysis was the best:
  − analysis precision
  − potential scalability

Conclusions
• Improvements from object-sensitive variations:
  − small: longer context strings
  − significant: modeling heap objects with context
  − implementable with other existing techniques

Conclusions
• ZCWL algorithm:
  − disappointing results
  − caused by the context-insensitive treatment of calls within SCCs of the initial call graph
  − a large proportion of call edges lie within SCCs