Course Outline
• Traditional Static Program Analysis
– Theory
• Compiler Optimizations; Control Flow Graphs
• Data-flow Analysis: Data-flow frameworks
– Classic analyses and applications
• Software Testing
• Dynamic Program Analysis
Announcements
• No class on Monday
– Tuesday follows Tuesday schedule
• Homework 1 due today
• Homework 2 posted
– Due Monday, February 28th
• Requests for CS accounts sent to labstaff
Outline
• Data-flow frameworks
–
–
–
–
–
Monotone frameworks
Distributive frameworks
A Non-distributive example: Points-to Analysis
The “Maximal Fixed Point” (MFP) solution
The “Meet Over all Paths” (MOP) solution
• Reading: Compilers: Principles, Techniques and
Tools, by Aho, Lam, Sethi and Ullman, Chapter
9.3
Monotone Dataflow Frameworks
• Generic data-flow equations:
in(i) = V out(m)
m in pred(i)
out(i) = fi (in(i))
Parameters:
– Property space: in(i), out(i) are elements of a property space
•
Combination operator V: U for may problems and ∩ for must
•
problems
Initial values set to the 0 (smallest element) of the property space
–
Transfer functions: fi is associated with node i
–
If we instantiate these parameters in a certain way, then our
analysis is an instance of the monotone dataflow framework
Monotone Frameworks:
Requirements
•
•
•
The property space
Is a complete lattice L under partial order ≤
where L satisfies the Ascending Chain Condition
(i.e., all ascending chains are finite)
The combination operator V
Is the join (V, pronounced “vee”) of L
Initial values set to the 0 of L
–
–
Reaching Definitions: Property space? Combination operator?
Available Expressions: Property space? Combination operator?
Monotone Frameworks:
Requirements
•
The transfer functions: fi: L L
Formally, there is space F such that
1.
2.
3.
4.
F contains all fi
F contains the identity function id(x) = x
F is closed under composition
Each fi is monotone
Monotonicity
• It is defined as
(1) a ≤ b
f(a) ≤ f(b)
• An equivalent definitions is
(2) f(x) V f(y) ≤ f(x V y)
• Lemma: The two definitions are equivalent.
First, we show that (1) implies (2).
Second, we show that (2) implies (1).
Distributive Frameworks
• A distributive framework: A monotone
framework with distributive transfer
functions: f(x) V f(y) = f(x V y).
The four classical dataflow problems
Let Def denote all definitions in the program
Let 2|Def| denote the powerset of Def
Let AExp denote all expressions in the program.
Let 2|AExp| denote the powerset of AExp
Reaching Definitions
Available Expressions
L,
elements
L, ≤
2|Def|
2|AExp|
2|Def|,
2|AExp|,
L,∨
2|Def|, U
2|AExp|, ∩
fi
out(i)=(in(i)-kill(i))Ugen(i)
kill(i) = ?
gen(i) = ?
out(i)=(in(i)-kill(i))Ugen(i)
kill(i) = ?
gen(i) = ?
Distributive Frameworks
• Each of the four problems is an instance of
a distributive framework.
– First, prove monotonicity
– Second, prove distributivity of the functions
Distributivity
• Each of the four problems is an instance of a
distributive framework.
– First, prove monotonicity
if in’(i) ≤ in”(i) then out’(i) ≤ out”(i)
For Reaching Definitions we have to show:
if in’(i) in”(i) then
(in’(i)∩pres(i)) U gen(i) (in”(i)∩pres(i)) U gen(i)
– Second, prove distributivity
((in’(i) U in”(i))∩pres(i)) U gen(i) =
((in’(i)∩pres(i)) U gen(i)) U ((in”(i)∩pres(i)) U gen(i))
Points-to Analysis
• Points-to Analysis: a fundamental analysis. It
computes the memory locations that a pointer
variable may point to
Example 1:
int a, b;
int *p1, *p2;
p1 = &a;
p2 = p1;
*p2 = 1;
Example 2:
int a, b = 15;
int *p1, *p2;
int **p3;
p3 = &p1;
p1 = &a;
p2 = *p3;
*p2 = b;
Points-to Analysis: Monotone, Nondistributive Analysis
•
•
•
•
Lattice: The set of all points-to graphs Pt
≤ is inclusion, Pt1 ≤ Pt2 if Pt1 is a subgraph of Pt2
V is union, P1 V P2 = P1 U P2
Transfer functions are defined on four kinds of statements:
– (1) f(p=&q) is “kill” all points-to edges from p, and “generate a
new points-to edge from p to q
– (2) f(p=q) is “kill” all points-to edges from p, and “generate” new
points-to edges from p to every x such that q points-to x
– (3) f(p=*q) is “kill” all points-to edges from p, and “generate”
new points to edges from p to every x, such that there exists y and
q points to y and y points to x
– (4) f(*p=q) Do not perform kill. Can you think of a reason why?
“Generate” new points-to edges from every y to every x, such that
p points to y and q points to x.
Monotone non-distributive Analysis
• First, we show that the framework is
monotone,
– I.e., for each of the four transfer functions we have
to show that if Pt1 ≤ Pt2, then f(Pt1) ≤ f(Pt2)
• Second, we show that the framework is not
distributive
– It is easy to show f(Pt1 V Pt2) ≠ f(Pt1) V f(Pt2)
• Another example is constant propagation
Non-distributivity of Points-to Analysis
p=&x;
q=&y;
p
p=&z;
q=&w;
q
p
Pt1:
y
p
q
x
y
f(Pt1) V f(Pt2) :
w
p
q
z
w
p
q
x
y
z
y
z
f(Pt2):
f(Pt1):
x
q
z
*p=q
q
Pt1 V Pt2 :
Pt2:
x
p
What f does: Adds edges from each
variable that p points to (i.e., x and z),
to each variable where q points to
(i.e., y and w).
4 new edges: from x to y and w, and from
z to y and w.
f(Pt1 V Pt2):
w
w
p
q
x
y
z
w
The Maximal Fixed Point
/* Initialize to initial values */
in(1)=InitialValue;
in(1) = UNDEF
for m := 2 to n do in(m) := 0;
in(m) := Ø
W := {1,2,…,n} /* put every node on the worklist */
1
(MFP)
while W ≠ Ø do {
remove i from W;
out(i) = fi(in(i));
outRD(i) = inRD(i)∩pres(i)Ugen(i)
for j in successors(i)
if out(i) ≤ in(j) then {
if outRD(i) not subset of inRD (j)
in(j) = out(i) V in(j);
inRD(j) = out(i) U inRD(j)
if j not in W do add j to W
}
}
1. The Least Fixed Point (LFP) actually…
Properties of the algorithm
• Lemma1: The algorithm terminates.
Sketch of the proof:
We have ink(j) ≤ ink+1(j) and since L has ACC, in(j)
changes at most O(h) times. Thus, each j is put on
W at most O(h) times (h is the height of the lattice
L).
Complexity: At each iteration, the analysis examines
e(j)out edges. Thus, number of basic operations is
bounded by h*(e(1)out+…+e(N)out)=O(h*E).
We can do better on reducible graphs.
Properties of the Algorithm
• Lemma2: The algorithm computes the least
solution of the dataflow equations.
– For every node i MFP computes solution MFP(i)
= {in(i),out(i)}, such that every other solution
{in’(i),out’(i)} of the dataflow equations is
“larger” than the MFP
• Lemma3: The algorithm computes a correct
(safe) solution.
Example
1. z:=x+y
2. if (z > 500)
3. skip
inAE(1) = Ø
outAE(1) = (inAE(1)-Ez)
{x+y}
inAE(2) = outAE(1) V outAE(3)
outAE(2) = inAE(2)
inout(3) = outAE(2)
outAE(3) = inAE(3)
Solution1
Ø
{x+y}
Solution2
Ø
{x+y}
{x+y}
Ø
{x+y}
Ø
Equivalent to: inAE(2) = {x+y} V inAE(2)
and recall that V is ∩ (i.e., set intersection).
That is why we needed to initialize inAE(2) and the other
initial values to the universal set of expressions (0 of the Available
Expressions lattice), rather than to the more intuitive empty set.
Meet Over All Paths (MOP) Solution1
ρ
n1
n2
…
nk
n
• Desired dataflow information at n is obtained by
traversing ALL PATHS from ρ to n. For every path
p=(ρ, n1, n2 ..., nk) we compute
fnk(…fn2(fn1(init(ρ))))
• The MOP at entry of n is V fnk(…fn2(fn1(init(ρ))))
p in paths from ρ to n
• The MOP is the best summary of dataflow facts
possible to compute with this static analysis
MOP vs. MFP
• For distributive functions the dataflow analysis can
merge paths (p1, p2), without loss of precision!
– E.g., fp1(0) need not be calculated explicitly
– MFP=MOP
• Due to Kam and Ullman, 1976,1977: This is not true
for monotone functions.
• Lemma 3: The MFP approximates the MOP for
general monotone functions: MFP ≥ MOP
Many Applications!
• White-box testing: compute coverage
– Control-flow-based testing
– Data-flow-based testing
• Intuitively, test each def-use chain
• Regression testing
– Analyze changes and select regression tests that
actually test changed code
Many Applications!
• Reverse engineering
– UML class diagrams
– UML sequence diagrams
– Many tools do these; Eclipse plug-ins
• Automated refactoring
– Analyze program, prove “safety” of the
refactoring
– Eclipse plug-ins
Many Applications!
• Static debugging
– Memory errors in C/C++ programs
• Memory leaks
• Null pointer dereferences
• Array-out-of-bound accesses
– Concurrency errors in shared-memory apps
• Data-races
• Atomicity violations
• Deadlocks
© Copyright 2026 Paperzz