DFA

Data-Flow Analysis
(Chapter 8)
Mooly Sagiv
Make-up class May 4
Outline
• What is Data-Flow Analysis?
• Structure of an optimizing compiler
• An example: Reaching Definitions
• Basic Concepts: Lattices, Flow Functions, and Fixed Points
• Taxonomy of Data-Flow Problems and Solutions
• Iterative Data-Flow Analysis
• Structural Data-Flow Analysis
• DU-Chains and SSA
Data-Flow Analysis
• Input: A control flow graph
• Output: A control flow graph with
“global” information at every basic block
Examples
– Constant expressions: x+y*z
– Live variables
Compiler Structure
String of characters → Scanner → tokens → Parser → AST → Semantic analyzer → IR → Code Generator → Object code
Supporting components: Symbol table and access routines; OS Interface
Optimizing Compiler Structure
String of characters → Front-End → IR → Control Flow Analysis → CFG → Data Flow Analysis → CFG+information → Program Transformations → IR → instruction selection → Object code
An Example
Reaching Definitions
• A definition: an assignment to a variable
• A definition d reaches a basic block if there exists an execution path to the basic block along which the value assigned at d is still active (not overwritten) at the basic block
Running Example
unsigned int fib(unsigned int m)
{
    unsigned int f0 = 0, f1 = 1, f2, i;
    if (m <= 1) {
        return m;
    }
    else {
        for (i = 2; i <= m; i++) {
            f2 = f0 + f1;
            f0 = f1;
            f1 = f2;
        }
        return f2;
    }
}
1:      receive m(val)
2:      f0 ← 0
3:      f1 ← 1
4:      if m <= 1 goto L3
5:      i ← 2
6:  L1: if i <= m goto L2
7:      return f2
8:  L2: f2 ← f0 + f1
9:      f0 ← f1
10:     f1 ← f2
11:     i ← i + 1
12:     goto L1
13: L3: return m
[Figure: control-flow graph of the MIR code of the running example, from entry to exit, with basic blocks annotated by reaching-definition sets (2, 3 and 2, 3, 5, 8, 9, 10, 11).]
Difficulties in
Data-Flow Analysis
• Input-dependent information
• Undecidability of program analysis
– Reachability of basic blocks
– Arithmetic
– ...
int g(int m, int i);
int f(int n)
{
    int i = 0, j;
    if (n == 1) i = 2;
    while (n > 0) {
        j = i + 1;
        n = g(n, i);
    }
    return j;
}
Conservative data-flow analysis
• Every piece of data-flow information is
sound
• Every enabled optimization is correct
• A superset of the execution sequences is
considered
• In the reaching definitions example, a superset of the definitions that actually reach each block is computed
Iterative Computation of Reaching Definitions
• Optimistically assume that no definition reaches any block
• Every basic block “generates” new definitions and
“preserves” other definitions
• No definition reaches ENTRY
• Accumulate reaching definitions along different
paths
• Iteratively compute more and more definitions at
every basic block
• The process must terminate
• The final solution is unique and conservative
Iterative Computation of Reaching Definitions
$RCin(ENTRY) = \emptyset$
$RCin(B) = \bigcup_{B' \in Pred(B)} RCout(B')$
$RCout(B) = GEN(B) \cup (RCin(B) \cap PRSV(B))$
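The equations can be tried out directly on the running example. The following is a minimal sketch, not the lecture's implementation: the block names B1 to B6, the partition of the MIR code into blocks, and the helper names are assumptions, and only the definitions that appear in the slides' sets (2, 3, 5, 8, 9, 10, 11) are tracked.

```python
# Definitions tracked on the slides: MIR instruction number -> variable defined.
DEFS = {2: "f0", 3: "f1", 5: "i", 8: "f2", 9: "f0", 10: "f1", 11: "i"}

# Hypothetical block names; each block lists the MIR instructions it contains.
BLOCKS = {"B1": [1, 2, 3, 4], "B2": [5], "B3": [6],
          "B4": [7], "B5": [8, 9, 10, 11, 12], "B6": [13]}
# B1 is only preceded by ENTRY, so its predecessor list is empty here.
PRED = {"B1": [], "B2": ["B1"], "B3": ["B2", "B5"],
        "B4": ["B3"], "B5": ["B3"], "B6": ["B1"]}


def gen_prsv(instrs):
    """Return (GEN, PRSV) for a block given its instruction numbers in order."""
    gen, prsv = set(), set(DEFS)
    for n in instrs:
        if n in DEFS:
            same_var = {d for d, v in DEFS.items() if v == DEFS[n]}
            gen = (gen - same_var) | {n}   # a later definition overrides earlier ones
            prsv -= same_var               # every definition of that variable is killed
    return gen, prsv


def reaching_definitions():
    """Iterate the RCin/RCout equations from the empty set until nothing changes."""
    gp = {b: gen_prsv(instrs) for b, instrs in BLOCKS.items()}
    rc_in = {b: set() for b in BLOCKS}
    rc_out = {b: set() for b in BLOCKS}
    changed = True
    while changed:
        changed = False
        for b in BLOCKS:
            new_in = set().union(*(rc_out[p] for p in PRED[b]))
            gen, prsv = gp[b]
            new_out = gen | (new_in & prsv)
            if new_in != rc_in[b] or new_out != rc_out[b]:
                rc_in[b], rc_out[b] = new_in, new_out
                changed = True
    return rc_in


rc_in = reaching_definitions()
```

Under these assumptions `rc_in["B2"]` is `{2, 3}` and the loop test, loop body, and `return f2` blocks all receive `{2, 3, 5, 8, 9, 10, 11}`.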
Iterative Computation of
Reaching Definitions
Using Bit-Vectors
• Represent every definition with a bit
• PRSV and GEN are bit-vectors
$RCin(ENTRY) = \langle 000 \dots 0 \rangle$
$RCin(B) = \bigvee_{B' \in Pred(B)} RCout(B')$
$RCout(B) = GEN(B) \vee (RCin(B) \wedge PRSV(B))$
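A bit-vector version can be sketched with Python integers, where union becomes bitwise OR and intersection becomes bitwise AND. The bit assignment, the block names B1 to B6, and the hand-written GEN and PRSV vectors are assumptions made for this illustration of the running example.

```python
DEFS = [2, 3, 5, 8, 9, 10, 11]                 # bit k stands for definition DEFS[k]
BIT = {d: 1 << k for k, d in enumerate(DEFS)}
ALL = (1 << len(DEFS)) - 1                     # vector with every bit set


def vec(*defs):
    """Build a bit-vector from definition numbers."""
    v = 0
    for d in defs:
        v |= BIT[d]
    return v


def unvec(v):
    """List the definition numbers whose bits are set in v."""
    return [d for d in DEFS if v & BIT[d]]


# Hypothetical blocks of the running example with hand-derived GEN/PRSV vectors.
GEN = {"B1": vec(2, 3), "B2": vec(5), "B3": 0,
       "B4": 0, "B5": vec(8, 9, 10, 11), "B6": 0}
PRSV = {"B1": vec(5, 8, 11), "B2": vec(2, 3, 8, 9, 10), "B3": ALL,
        "B4": ALL, "B5": 0, "B6": ALL}
PRED = {"B1": [], "B2": ["B1"], "B3": ["B2", "B5"],
        "B4": ["B3"], "B5": ["B3"], "B6": ["B1"]}


def solve():
    """Iterate the bit-vector equations until a fixed point is reached."""
    rc_in = {b: 0 for b in GEN}
    rc_out = {b: 0 for b in GEN}
    changed = True
    while changed:
        changed = False
        for b in GEN:
            new_in = 0
            for p in PRED[b]:
                new_in |= rc_out[p]                 # OR over predecessors
            new_out = GEN[b] | (new_in & PRSV[b])   # GEN or (in and PRSV)
            if (new_in, new_out) != (rc_in[b], rc_out[b]):
                rc_in[b], rc_out[b] = new_in, new_out
                changed = True
    return rc_in


rc_in = solve()
```

Each block's whole set is updated with two machine-level operations, which is the main appeal of the representation.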
Complete Join-Lattices
• A set L of “data-flow” information
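• A partial order $\sqsubseteq$ on the elements of L
• $x \sqsubseteq y \iff$ x “covers” fewer states than y $\iff$ x is more precise than y
• $\bot$ is the minimum element
• Height of a lattice: the length of a maximal strictly increasing chain $x_1 \sqsubset x_2 \sqsubset \dots \sqsubset x_k$
• A “join” confluence operator $\sqcup : L \times L \to L$
– $x \sqsubseteq x \sqcup y$, $y \sqsubseteq x \sqcup y$
– $x \sqsubseteq z,\ y \sqsubseteq z \implies x \sqcup y \sqsubseteq z$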
• Examples: Powersets, Bit-Vectors, ICP
Properties of Lattices
• $\bot \sqcup x = x \sqcup \bot = x$
• $x \sqcup x = x$ (idempotence)
• $x \sqcup y = y \sqcup x$ (commutativity)
• $(x \sqcup y) \sqcup z = x \sqcup (y \sqcup z)$ (associativity)
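These properties can be checked exhaustively on a small powerset lattice, where the order is set inclusion and join is union. The sketch below is only an illustration; the three-element universe and the function names are assumptions, and `longest_chain` counts the elements $x_1 \dots x_k$ of a longest strictly increasing chain.

```python
from itertools import combinations


def powerset(universe):
    """All subsets of the universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


BOTTOM = frozenset()


def leq(x, y):
    return x <= y          # the partial order is set inclusion


def join(x, y):
    return x | y           # the join is set union


def is_join_lattice(elems):
    """Check that join is a least upper bound and BOTTOM is the minimum."""
    for x in elems:
        if not leq(BOTTOM, x):
            return False
        for y in elems:
            j = join(x, y)
            if not (leq(x, j) and leq(y, j)):
                return False
            if any(leq(x, z) and leq(y, z) and not leq(j, z) for z in elems):
                return False
    return True


def has_algebraic_properties(elems):
    """Check the bottom identity, idempotence, commutativity, associativity."""
    return all(
        join(BOTTOM, x) == x and join(x, x) == x
        and join(x, y) == join(y, x)
        and join(join(x, y), z) == join(x, join(y, z))
        for x in elems for y in elems for z in elems)


def longest_chain(elems):
    """Number of elements in a longest strictly increasing chain."""
    best = {}
    for x in sorted(elems, key=len):     # subsets are visited before supersets
        best[x] = 1 + max((best[y] for y in best if y < x), default=0)
    return max(best.values())


L = powerset({"d2", "d3", "d5"})
```

For a universe of n definitions the longest chain adds one definition at a time, which bounds how often a set-valued solution can grow.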
Functions on Lattices
• Models effects of basic blocks
• A monotonic function $f: L \to L$
$x \sqsubseteq y \implies f(x) \sqsubseteq f(y)$
• A distributive function $f: L \to L$
$f(x) \sqcup f(y) = f(x \sqcup y)$
• A fixed point of a function f: $f(x) = x$
• For a monotonic function f, the effective height of L w.r.t. f is the length of the longest increasing chain
$f(\bot) \sqsubset f^2(\bot) = f(f(\bot)) \sqsubset \dots \sqsubset f^k(\bot)$
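As an illustration, the flow function of a block in reaching definitions, $x \mapsto GEN \cup (x \cap PRSV)$, can be checked for monotonicity and distributivity by brute force on a small powerset. The universe, the sample GEN and PRSV sets, and the helper names below are assumptions of this sketch.

```python
from itertools import combinations

U = frozenset({1, 2, 3})
# Every subset of U: the elements of the powerset lattice.
L = [frozenset(c) for r in range(len(U) + 1)
     for c in combinations(sorted(U), r)]


def make_f(gen, prsv):
    """Flow function x -> gen | (x & prsv)."""
    gen, prsv = frozenset(gen), frozenset(prsv)
    return lambda x: gen | (x & prsv)


def is_monotone(f):
    return all(f(x) <= f(y) for x in L for y in L if x <= y)


def is_distributive(f):
    return all(f(x) | f(y) == f(x | y) for x in L for y in L)


def chain_from_bottom(f):
    """bottom, f(bottom), f(f(bottom)), ... up to the first fixed point.

    Terminates for monotone f on a finite lattice; do not call it otherwise.
    """
    chain = [frozenset()]
    while f(chain[-1]) != chain[-1]:
        chain.append(f(chain[-1]))
    return chain


f = make_f({1}, {2, 3})


def complement(x):
    """A function that is not monotone, for contrast."""
    return U - x
```

Here `chain_from_bottom(f)` stops after one step at the fixed point `{1}`, whereas `complement` fails the monotonicity check and has no such guarantee.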
The Join (Meet) Over All Paths
• A data-flow solution which is precise under
the assumption that every control flow path
is executable
• For a path $p = [B_1, B_2, \dots, B_n]$
$F_p = F_{B_n} \circ \dots \circ F_{B_2} \circ F_{B_1}$
• The JOP at a block B
$JOP(B) = \bigsqcup_{p \in Path(B)} F_p(Init)$
• For distributive $F_p$, the JOP can be computed exactly
• Otherwise, find $X(B) \sqsupseteq JOP(B)$
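[Figure: example CFG. After entry, the test w > 0? selects between two branches (N/Y), one assigning u ← 1, v ← 2 and the other u ← 2, v ← 1; both branches join at w ← u + v, followed by exit.]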
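The gap between the JOP and a solution obtained by joining at merge points shows up in constant propagation on this kind of diamond. The sketch below assumes that one branch assigns u ← 1, v ← 2 and the other u ← 2, v ← 1 before w ← u + v; the three-level value lattice (`BOT`, constants, `TOP`) and all names are assumptions of the illustration.

```python
BOT, TOP = "bot", "top"     # no information / not a constant


def join_val(a, b):
    """Join of two abstract values: equal constants stay, others go to TOP."""
    if a == BOT:
        return b
    if b == BOT:
        return a
    return a if a == b else TOP


def join_env(e1, e2):
    """Pointwise join of two environments over the same variables."""
    return {v: join_val(e1[v], e2[v]) for v in e1}


def assign_const(var, c):
    return lambda env: {**env, var: c}


def assign_sum(var, a, b):
    def f(env):
        x, y = env[a], env[b]
        if BOT in (x, y):
            r = BOT
        elif TOP in (x, y):
            r = TOP
        else:
            r = x + y
        return {**env, var: r}
    return f


def run(fs, env):
    """Apply the flow functions of a path in order."""
    for f in fs:
        env = f(env)
    return env


INIT = {"u": TOP, "v": TOP, "w": TOP}   # nothing is known at entry
left = [assign_const("u", 1), assign_const("v", 2)]
right = [assign_const("u", 2), assign_const("v", 1)]
tail = [assign_sum("w", "u", "v")]

# JOP: push Init along each complete path, then join the results.
jop = join_env(run(left + tail, INIT), run(right + tail, INIT))
# Join at the merge point first, then apply the flow function of the last block.
merged = run(tail, join_env(run(left, INIT), run(right, INIT)))
```

The JOP finds that w is 3 on every path, while joining before the addition loses u and v and therefore w. The second result is still sound, only less precise, because the addition's flow function is monotone but not distributive.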
Dimensions for
Data-Flow Problems
• The information provided
• “Relational” vs. independent attributes
• The type of lattice and functions used: powersets, ICP, ..., unbounded heights
• The direction of information flow: forward, backward, bidirectional
Example Data-Flow Problems
• Reaching Definitions
• Available Expressions
• Live Variables
• Upward Exposed Uses
• Copy-Propagation Analysis
• Constant-Propagation Analysis
• Partial-Redundancy Analysis
[Figure: example CFG from entry to exit with the tests z > 1? and z > y (Y/N branches) and the assignments x ← 1, x ← 2, z ← x - 3, y ← x + 1.]
Data-Flow Analysis Algorithms
• Allen’s strongly connected regions
• Kildall’s iterative algorithm
• Ullman’s T1-T2 analysis
• Kennedy’s node-listing algorithm
• Farrow, Kennedy, and Zucconi’s graph grammar approach
• Rosen’s high-level approach
• Structural analysis
• Slotwise analysis
Iterative Data-Flow Analysis
$In(ENTRY) = Init$
$In(B) = \bigsqcup_{B' \in Pred(B)} Out(B')$
$Out(B) = F_B(In(B))$
Iterative Data-Flow Algorithm
Input: a flow graph G = (N, E, r)
       an initial value Init
       a monotonic function $F_B$ for every B in N
Output: in(B) for every B in N
Initialization: in(Entry) := Init;
    for each node B in N - {Entry} do
        in(B) := $\bot$
    WL := N
Iteration: while WL != {} do
    select and remove a node B from WL
    out := $F_B$(in(B))
    for all B' in succ(B) such that in(B') != in(B') $\sqcup$ out do
        in(B') := in(B') $\sqcup$ out
        WL := WL $\cup$ {B'}
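A worklist solver along these lines might look as follows in Python. It is a sketch under stated assumptions, not the lecture's code: the parameter names are illustrative, the worklist is seeded with every node so that the entry's output is also pushed to its successors, and the example instantiates it with reaching definitions on the running example using hypothetical block names.

```python
def iterate(nodes, succ, entry, init, bottom, join, transfer):
    """Worklist solver for a forward problem; returns in(B) for every node B."""
    inn = {b: bottom for b in nodes}
    inn[entry] = init
    worklist = list(nodes)
    while worklist:
        b = worklist.pop()
        out = transfer[b](inn[b])
        for s in succ[b]:
            merged = join(inn[s], out)
            if merged != inn[s]:          # in(s) grew: s must be revisited
                inn[s] = merged
                if s not in worklist:
                    worklist.append(s)
    return inn


def rd(gen, prsv):
    """Reaching-definitions flow function x -> gen | (x & prsv)."""
    gen, prsv = frozenset(gen), frozenset(prsv)
    return lambda x: gen | (x & prsv)


# Running example with hypothetical block names B1..B6.
SUCC = {"entry": ["B1"], "B1": ["B2", "B6"], "B2": ["B3"], "B3": ["B4", "B5"],
        "B4": ["exit"], "B5": ["B3"], "B6": ["exit"], "exit": []}
ALL = frozenset({2, 3, 5, 8, 9, 10, 11})
TRANSFER = {
    "entry": rd((), ALL), "exit": rd((), ALL),
    "B1": rd({2, 3}, {5, 8, 11}), "B2": rd({5}, {2, 3, 8, 9, 10}),
    "B3": rd((), ALL), "B4": rd((), ALL),
    "B5": rd({8, 9, 10, 11}, ()), "B6": rd((), ALL),
}

result = iterate(list(SUCC), SUCC, "entry", frozenset(), frozenset(),
                 frozenset.union, TRANSFER)
```

Because the flow functions are monotone and the lattice is finite, each `in` value can only grow a bounded number of times, so the loop terminates regardless of the order in which nodes are removed.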
Post-ordering
Input: a flow graph G = (N, E, r)
Output: a depth-first spanning tree (N, T) and an ordering Post of N
Method: T := Ø;
    for each node n in N do mark n unvisited;
    i := 1; call DFS(r)
Using: procedure DFS(n) is
    mark n visited;
    for each n → s in E do
        if s is not visited then
            add the edge n → s to T;
            call DFS(s);
    Post(n) := i;
    i := i + 1;
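The procedure translates almost line by line into Python. In this sketch the graph is a successor dictionary and the node names for the running example are hypothetical; the numbering that comes out depends on the order in which successors are listed.

```python
def postorder(succ, root):
    """Return (tree_edges, post) for a depth-first traversal from root."""
    visited, tree_edges, post = set(), [], {}
    counter = 1

    def dfs(n):
        nonlocal counter
        visited.add(n)
        for s in succ[n]:
            if s not in visited:
                tree_edges.append((n, s))   # n -> s becomes a spanning-tree edge
                dfs(s)
        post[n] = counter                   # numbered after all descendants
        counter += 1

    dfs(root)
    return tree_edges, post


# Running example with hypothetical block names.
SUCC = {"entry": ["B1"], "B1": ["B2", "B6"], "B2": ["B3"], "B3": ["B4", "B5"],
        "B4": ["exit"], "B5": ["B3"], "B6": ["exit"], "exit": []}

tree, post = postorder(SUCC, "entry")
# Visiting nodes in reverse post-order processes most predecessors first.
reverse_postorder = sorted(post, key=post.get, reverse=True)
```

For forward problems, seeding the iterative algorithm in reverse post-order usually reduces the number of passes, since a node is then visited after its predecessors except along back edges.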
[Figure: CFG of the running example with its nodes numbered 1 to 8 (entry is 8).]
[Figures: four successive snapshots of the iterative reaching-definitions computation on this CFG, with the nodes numbered 1 to 8 starting from entry. In the first snapshot the blocks are annotated with {}, {2, 3} and {2, 3, 5}; in the last, with {2, 3} and {2, 3, 5, 8, 9, 10}.]
Iterative Backward Data-Flow Analysis
$Out(Exit) = Init$
$Out(B) = \bigsqcup_{B' \in Succ(B)} In(B')$
$In(B) = F_B(Out(B))$
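Live variables is a typical backward problem. The sketch below applies the backward equations to the running example, taking $F_B(x) = USE(B) \cup (x - DEF(B))$ as the flow function; that formulation, the block names, and the hand-derived USE and DEF sets are assumptions of this illustration.

```python
# Hypothetical blocks of the running example; exit is left implicit.
SUCC = {"B1": ["B2", "B6"], "B2": ["B3"], "B3": ["B4", "B5"],
        "B4": [], "B5": ["B3"], "B6": []}
# Variables read before being written in the block (upward exposed uses).
USE = {"B1": set(), "B2": set(), "B3": {"i", "m"},
       "B4": {"f2"}, "B5": {"f0", "f1", "i"}, "B6": {"m"}}
# Variables written in the block.
DEF = {"B1": {"m", "f0", "f1"}, "B2": {"i"}, "B3": set(),
       "B4": set(), "B5": {"f2", "f0", "f1", "i"}, "B6": set()}


def live_variables():
    """Iterate Out(B) = join of In(successors), In(B) = F_B(Out(B))."""
    live_in = {b: set() for b in SUCC}
    live_out = {b: set() for b in SUCC}
    changed = True
    while changed:
        changed = False
        for b in SUCC:
            out = set().union(*(live_in[s] for s in SUCC[b]))
            inn = USE[b] | (out - DEF[b])
            if out != live_out[b] or inn != live_in[b]:
                live_out[b], live_in[b] = out, inn
                changed = True
    return live_in, live_out


live_in, live_out = live_variables()
```

With these sets, `f2` comes out live at the start of the first block: the analysis conservatively considers the path that reaches `return f2` without executing the loop body.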
Lattices of Flow Functions
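• For a lattice L, $L^F$ is the set of monotonic functions from L to L
$f \in L^F \iff (x \sqsubseteq y \implies f(x) \sqsubseteq f(y))$
• $L^F$ is a lattice with the order $f \sqsubseteq g \iff$ for all z: $f(z) \sqsubseteq g(z)$
– $\bot^F(z) = \bot$
– $(x \sqcup^F y)(z) = x(z) \sqcup y(z)$
• $L^F$ is closed under composition
$(f \circ g)(z) = f(g(z))$
– $f^0 = id$, $f^n = f \circ f^{n-1}$
– $f^*(z) = \lim_{n \to \infty} (id \sqcup f)^n(z)$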
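These operations on flow functions can be sketched with ordinary Python functions over a powerset lattice, where join is set union. The function names and the sample function `f` are assumptions; `star` evaluates the limit by iterating $id \sqcup f$ on a given argument until it stabilizes, which terminates for monotone f on a finite lattice.

```python
def identity(z):
    return z


def join_fn(f, g):
    """Pointwise join of two functions: (f join g)(z) = f(z) | g(z)."""
    return lambda z: f(z) | g(z)


def compose(f, g):
    """(f o g)(z) = f(g(z))."""
    return lambda z: f(g(z))


def power(f, n):
    """f^0 = id, f^n = f o f^(n-1)."""
    result = identity
    for _ in range(n):
        result = compose(f, result)
    return result


def star(f):
    """f*(z): apply (id join f) repeatedly until the value stops growing."""
    step = join_fn(identity, f)

    def closure(z):
        while step(z) != z:
            z = step(z)
        return z
    return closure


def f(x):
    """Sample monotone function on subsets of {0, 1, 2, 3}: shift elements up."""
    return frozenset(d + 1 for d in x if d < 3)


start = frozenset({0})
```

For example, `power(f, 2)(start)` is `{2}`, while `star(f)(start)` accumulates everything reachable by any number of applications, `{0, 1, 2, 3}`.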
Structural Data-Flow Analysis
• Phase 1: Compute “the effect” of every
program construct in a bottom-up fashion
on the tree of control flow constructs
(control-tree)
• Phase 2: Propagate the data-flow values in a top-down fashion into the basic blocks
Bottom-Up Phase
(if-then)
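$F_{\text{if-then}} = (F_{\text{then}} \circ F_{\text{if/Y}}) \sqcup F_{\text{if/N}}$
[Figure: control tree for if-then. The if node has exits $F_{\text{if/Y}}$ (into then) and $F_{\text{if/N}}$, the then node has $F_{\text{then}}$, and the collapsed if-then node has $F_{\text{if-then}}$.]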
Bottom-Up Phase
(Simplified if-then)
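$F_{\text{if-then}} = (F_{\text{then}} \circ F_{\text{if}}) \sqcup F_{\text{if}}$
[Figure: control tree for if-then in which both exits of the if node carry the same function $F_{\text{if}}$; the then node has $F_{\text{then}}$ and the collapsed if-then node has $F_{\text{if-then}}$.]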
Bottom-Up Phase
Reaching Definitions
(Simplified if-then)
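With $F_{G,P}(x) = G \cup (x \cap P)$:
$(F_{G_2,P_2} \circ F_{G_1,P_1}) \sqcup F_{G_1,P_1}$
$= F_{(G_1 \cap P_2) \cup G_2,\; P_1 \cap P_2} \sqcup F_{G_1,P_1}$
$= F_{G_1 \cup G_2,\; P_1}$
[Figure: control tree for if-then. The if node has $F_{G_1,P_1}$ on both exits, the then node has $F_{G_2,P_2}$, and the collapsed if-then node has $F_{G,P}$.]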
Top-Down Phase
(if-then)
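$in(\text{if}) = in(\text{if-then})$
$in(\text{then}) = F_{\text{if/Y}}(in(\text{if}))$
[Figure: control tree for if-then. The if node has exits $F_{\text{if/Y}}$ and $F_{\text{if/N}}$, the then node has $F_{\text{then}}$, and the collapsed if-then node has $F_{\text{if-then}}$.]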
Top-Down Phase
(Simplified if-then)
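$in(\text{if}) = in(\text{if-then})$
$in(\text{then}) = F_{\text{if}}(in(\text{if}))$
[Figure: control tree for if-then in which both exits of the if node carry $F_{\text{if}}$; the then node has $F_{\text{then}}$ and the collapsed if-then node has $F_{\text{if-then}}$.]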
Top-Down Phase
Reaching Definitions
(Simplified if-then)
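$in(\text{if}) = in(\text{if-then})$
$in(\text{then}) = F_{G_1,P_1}(in(\text{if}))$
[Figure: control tree for if-then. The if node has $F_{G_1,P_1}$ on both exits, the then node has $F_{G_2,P_2}$, and the collapsed if-then node has $F_{G,P}$.]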
Bottom-Up Phase
(if-then-else)
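$F_{\text{if-then-else}} = (F_{\text{then}} \circ F_{\text{if/Y}}) \sqcup (F_{\text{else}} \circ F_{\text{if/N}})$
[Figure: control tree for if-then-else. The if node has exits $F_{\text{if/Y}}$ (into then) and $F_{\text{if/N}}$ (into else), the branches have $F_{\text{then}}$ and $F_{\text{else}}$, and the collapsed if-then-else node has $F_{\text{if-then-else}}$.]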
Bottom-Up Phase
(simplified if-then-else)
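$F_{\text{if-then-else}} = (F_{\text{then}} \circ F_{\text{if}}) \sqcup (F_{\text{else}} \circ F_{\text{if}}) = (F_{\text{then}} \sqcup F_{\text{else}}) \circ F_{\text{if}}$
[Figure: control tree for if-then-else in which both exits of the if node carry $F_{\text{if}}$; the branches have $F_{\text{then}}$ and $F_{\text{else}}$, and the collapsed if-then-else node has $F_{\text{if-then-else}}$.]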
Bottom-Up Phase
Reaching Definitions
(simplified if-then-else)
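$(F_{G_1,P_1} \sqcup F_{G_2,P_2}) \circ F_{G_0,P_0}$
$= F_{G_1 \cup G_2,\; P_1 \cup P_2} \circ F_{G_0,P_0}$
$= F_{(G_0 \cap (P_1 \cup P_2)) \cup G_1 \cup G_2,\; P_0 \cap (P_1 \cup P_2)}$
[Figure: control tree for if-then-else. The if node has $F_{G_0,P_0}$ on both exits, the then node has $F_{G_1,P_1}$, the else node has $F_{G_2,P_2}$, and the collapsed if-then-else node has $F_{G,P}$.]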
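For reaching definitions every flow function has the form $F_{G,P}(x) = G \cup (x \cap P)$, so the bottom-up phase can manipulate (G, P) pairs instead of functions. The following sketch of the simplified if-then and if-then-else cases is an illustration only; the class name, the helper names, and the sample sets are assumptions.

```python
from typing import NamedTuple


class GP(NamedTuple):
    """The flow function F_{G,P}(x) = G | (x & P), stored as a pair of sets."""
    gen: frozenset
    prsv: frozenset

    def apply(self, x):
        return self.gen | (x & self.prsv)


def compose(f, g):
    """f o g: run g first, then f."""
    return GP((g.gen & f.prsv) | f.gen, g.prsv & f.prsv)


def join(f, g):
    """Pointwise join of two flow functions."""
    return GP(f.gen | g.gen, f.prsv | g.prsv)


def if_then(f_if, f_then):
    """(F_then o F_if) join F_if."""
    return join(compose(f_then, f_if), f_if)


def if_then_else(f_if, f_then, f_else):
    """(F_then join F_else) o F_if."""
    return compose(join(f_then, f_else), f_if)


fs = frozenset
f_if = GP(fs({1}), fs({2, 3, 4}))      # sample values for illustration
f_then = GP(fs({2}), fs({1, 3}))
f_else = GP(fs({3}), fs({1, 4}))

summary_it = if_then(f_if, f_then)
summary_ite = if_then_else(f_if, f_then, f_else)
```

Both summaries agree with the closed forms: the if-then case yields the pair (G1 ∪ G2, P1), and applying the if-then-else summary to a set gives the same result as pushing the set through the test and each branch separately and joining.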
Top-Down Phase
(if-then-else)
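$in(\text{if}) = in(\text{if-then-else})$
$in(\text{then}) = F_{\text{if/Y}}(in(\text{if}))$
$in(\text{else}) = F_{\text{if/N}}(in(\text{if}))$
[Figure: control tree for if-then-else. The if node has exits $F_{\text{if/Y}}$ and $F_{\text{if/N}}$, the branches have $F_{\text{then}}$ and $F_{\text{else}}$, and the collapsed if-then-else node has $F_{\text{if-then-else}}$.]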
Top-Down Phase
(Simplified if-then-else)
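$in(\text{if}) = in(\text{if-then-else})$
$in(\text{then}) = F_{\text{if}}(in(\text{if}))$
$in(\text{else}) = F_{\text{if}}(in(\text{if}))$
[Figure: control tree for if-then-else in which both exits of the if node carry $F_{\text{if}}$; the branches have $F_{\text{then}}$ and $F_{\text{else}}$, and the collapsed if-then-else node has $F_{\text{if-then-else}}$.]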
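Top-Down Phase
Reaching Definitions
(simplified if-then-else)
$in(\text{if}) = in(\text{if-then-else})$
$in(\text{then}) = F_{G_0,P_0}(in(\text{if}))$
$in(\text{else}) = F_{G_0,P_0}(in(\text{if}))$
[Figure: control tree for if-then-else. The if node has $F_{G_0,P_0}$ on both exits, the then node has $F_{G_1,P_1}$, the else node has $F_{G_2,P_2}$, and the collapsed if-then-else node has $F_{G,P}$.]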
Bottom-Up Phase
(while)
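$F_{\text{while-loop}} = F_{\text{while/N}} \circ (F_{\text{body}} \circ F_{\text{while/Y}})^*$
[Figure: control tree for while-loop. The while node has exits $F_{\text{while/Y}}$ (into body) and $F_{\text{while/N}}$ (loop exit), the body node has $F_{\text{body}}$, and the collapsed while-loop node has $F_{\text{while-loop}}$.]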
Bottom-Up Phase
(Simplified while)
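$F_{\text{while-loop}} = F_{\text{while}} \circ (F_{\text{body}} \circ F_{\text{while}})^*$
[Figure: control tree for while-loop in which both exits of the while node carry $F_{\text{while}}$; the body node has $F_{\text{body}}$ and the collapsed while-loop node has $F_{\text{while-loop}}$.]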
Bottom-Up Phase
Reaching Definitions
(Simplified while)
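$F_{G_0,P_0} \circ (F_{G_1,P_1} \circ F_{G_0,P_0})^*$
$= F_{G_0,P_0} \circ (F_{(G_0 \cap P_1) \cup G_1,\; P_0 \cap P_1})^*$
$= F_{G_0,P_0} \circ F_{(G_0 \cap P_1) \cup G_1,\; U}$
$= F_{(((G_0 \cap P_1) \cup G_1) \cap P_0) \cup G_0,\; P_0}$
[Figure: control tree for while-loop. The while node has $F_{G_0,P_0}$ on both exits, the body node has $F_{G_1,P_1}$, and the collapsed while-loop node has $F_{G,P}$.]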
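The closure step can be checked mechanically on (G, P) pairs. In this sketch `star` iterates $(id \sqcup f)^n$ on pairs until the pair stops changing, and the loop of the running example serves as input; the class, the helper names, and the modeling of the loop test as a block that generates nothing and preserves everything are assumptions.

```python
from typing import NamedTuple


class GP(NamedTuple):
    """The flow function F_{G,P}(x) = G | (x & P), stored as a pair of sets."""
    gen: frozenset
    prsv: frozenset

    def apply(self, x):
        return self.gen | (x & self.prsv)


def compose(f, g):
    """f o g: run g first, then f."""
    return GP((g.gen & f.prsv) | f.gen, g.prsv & f.prsv)


def join(f, g):
    return GP(f.gen | g.gen, f.prsv | g.prsv)


def star(f, universe):
    """Closure f* = lim (id join f)^n, computed on pairs."""
    ident = GP(frozenset(), universe)
    step = join(ident, f)
    result = ident
    while compose(step, result) != result:
        result = compose(step, result)
    return result


def while_loop(f_while, f_body, universe):
    """F_while o (F_body o F_while)*."""
    return compose(f_while, star(compose(f_body, f_while), universe))


U = frozenset({2, 3, 5, 8, 9, 10, 11})
f_while = GP(frozenset(), U)                       # the test defines nothing
f_body = GP(frozenset({8, 9, 10, 11}), frozenset())  # the body redefines everything

summary = while_loop(f_while, f_body, U)
```

The closure of $F_{G,P}$ comes out as $F_{G,U}$: after any number of iterations, whatever entered the loop may still be live, plus whatever the body generates.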
Top-Down Phase
(while)
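$in(\text{while}) = (F_{\text{body}} \circ F_{\text{while/Y}})^*(in(\text{while-loop}))$
$in(\text{body}) = F_{\text{while/Y}}((F_{\text{body}} \circ F_{\text{while/Y}})^*(in(\text{while-loop})))$
[Figure: control tree for while-loop. The while node has exits $F_{\text{while/Y}}$ and $F_{\text{while/N}}$, the body node has $F_{\text{body}}$, and the collapsed while-loop node has $F_{\text{while-loop}}$.]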
Top-Down Phase
(Simplified while)
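$in(\text{while}) = (F_{\text{body}} \circ F_{\text{while}})^*(in(\text{while-loop}))$
$in(\text{body}) = F_{\text{while}}((F_{\text{body}} \circ F_{\text{while}})^*(in(\text{while-loop})))$
[Figure: control tree for while-loop in which both exits of the while node carry $F_{\text{while}}$; the body node has $F_{\text{body}}$ and the collapsed while-loop node has $F_{\text{while-loop}}$.]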
Top-Down Phase
Reaching Definitions
(Simplified while)
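$in(\text{while}) = (F_{G_1,P_1} \circ F_{G_0,P_0})^*(in(\text{while-loop}))$
$in(\text{body}) = F_{G_0,P_0}((F_{G_1,P_1} \circ F_{G_0,P_0})^*(in(\text{while-loop})))$
[Figure: control tree for while-loop. The while node has $F_{G_0,P_0}$ on both exits, the body node has $F_{G_1,P_1}$, and the collapsed while-loop node has $F_{G,P}$.]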
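The top-down equations for the loop of the running example can be evaluated with a few lines of Python. This is a sketch under assumptions: the value entering the loop is taken to be {2, 3, 5}, the loop test is modeled as generating nothing and preserving everything, and the closure is evaluated by iterating on the argument until it stabilizes.

```python
def rd(gen, prsv):
    """Reaching-definitions flow function x -> gen | (x & prsv)."""
    gen, prsv = frozenset(gen), frozenset(prsv)
    return lambda x: gen | (x & prsv)


def star(f):
    """f*(x): apply (id join f) until the value stops growing."""
    def closure(x):
        while x | f(x) != x:
            x = x | f(x)
        return x
    return closure


def top_down_while(f_while, f_body, in_loop):
    """Return (in(while), in(body)) given the value entering the loop."""
    in_while = star(lambda x: f_body(f_while(x)))(in_loop)
    in_body = f_while(in_while)
    return in_while, in_body


U = frozenset({2, 3, 5, 8, 9, 10, 11})
f_while = rd((), U)                    # assumed: the test defines nothing
f_body = rd({8, 9, 10, 11}, ())        # assumed: the body redefines everything

in_while, in_body = top_down_while(f_while, f_body, frozenset({2, 3, 5}))
```

Both values come out as the full set of seven definitions, consistent with what an iterative solution would assign to the loop test and the loop body.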
[Figure: CFG of the running example's MIR code partitioned into basic blocks B0 to B7, between entry and exit.]
Handling Arbitrary CFGs
• Need to handle arbitrary acyclic regions
• Need to handle arbitrary cyclic regions
– Reducible regions
– irreducible components (improper regions)
Handling Improper Regions
• Ignore
• Node splitting
• Solve iteratively for every initial value
• Solve iteratively over $L^F$
Structural Backward Analysis
• Tricky
• For constructs with a single exit, “reverse” the direction of the equations
• For acyclic constructs with multiple exits, use join
• For cyclic reducible constructs with multiple exits, break the cycle and use join
• Cyclic improper regions are handled as in the forward case
Bottom-Up Phase
Backward Problems
(if-then)
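$F_{\text{if-then}} = (F_{\text{if/Y}} \circ F_{\text{then}}) \sqcup F_{\text{if/N}}$
[Figure: control tree for if-then. The if node has exits $F_{\text{if/Y}}$ and $F_{\text{if/N}}$, the then node has $F_{\text{then}}$, and the collapsed if-then node has $F_{\text{if-then}}$.]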
Top-Down Phase
Backward Problems
(if-then)
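$out(\text{then}) = out(\text{if-then})$
$out(\text{if}) = F_{\text{then}}(out(\text{then})) \sqcup out(\text{if-then})$
[Figure: control tree for if-then. The if node has exits $F_{\text{if/Y}}$ and $F_{\text{if/N}}$, the then node has $F_{\text{then}}$, and the collapsed if-then node has $F_{\text{if-then}}$.]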
Implementation
• Represent the computation of the canonical cases (if-then-else, while) with functions
• Use graphs to represent arbitrary functional
computations
Automatic Construction of
Data-Flow Analyzers
• Not commonly used so far
• Kildall developed a tool for iterative data
flow analysis (1973)
• The PAG (1995) system allows systematic
construction of iterative data-flow analysis
• The Sharlit (1992) system generates non-iterative data-flow analyzers
– Finds regular “path-expressions” in the CFG
– Converts them into effect functions
Def-Use, Use-Def Chains
• Sparse data-flow information on the flow of variables between assignments
• Can be used to improve the efficiency of iterative data-flow analysis
• A du-chain for a variable v connects a definition of v to all the uses of this definition
• A ud-chain for a variable v connects a use of v to all the definitions that may flow to it
• A web for a variable v is the maximal union of intersecting du-chains for v
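Both kinds of chains can be derived from reaching definitions. The sketch below does this for the loop part of the Fibonacci running example; the block names, the hand-written USES table (uses of m are left out because its definition is not tracked), and the assumption that every listed block is entered with all seven definitions reaching it are choices made for this illustration.

```python
# MIR instruction number -> variable defined / variables used.
DEFS = {2: "f0", 3: "f1", 5: "i", 8: "f2", 9: "f0", 10: "f1", 11: "i"}
USES = {6: ["i"], 7: ["f2"], 8: ["f0", "f1"], 9: ["f1"], 10: ["f2"], 11: ["i"]}

# Hypothetical blocks and the reaching definitions assumed at their entry.
BLOCKS = {"B3": [6], "B4": [7], "B5": [8, 9, 10, 11]}
FULL = {2, 3, 5, 8, 9, 10, 11}
RC_IN = {"B3": FULL, "B4": FULL, "B5": FULL}


def chains():
    """Return (ud, du): ud maps (use site, variable) to definitions,
    du maps each definition to the instructions that may use it."""
    ud = {}
    du = {d: set() for d in DEFS}
    for block, instrs in BLOCKS.items():
        reaching = set(RC_IN[block])
        for n in instrs:
            for var in USES.get(n, []):
                defs = {d for d in reaching if DEFS[d] == var}
                ud[(n, var)] = defs
                for d in defs:
                    du[d].add(n)
            if n in DEFS:
                # The new definition kills the others of the same variable.
                reaching = {d for d in reaching if DEFS[d] != DEFS[n]} | {n}
    return ud, du


ud, du = chains()
```

For instance, the use of f0 in instruction 8 can see the definitions 2 and 9, while the use of f2 in instruction 10 sees only definition 8 from the same block.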
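[Figure: example CFG from entry to exit with the tests z > 1 and z > 2 (Y/N branches) and the assignments x ← 2, x ← 1, z ← x - 3, x ← 4, y ← x + 1, z ← x + 7.]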
Static Single Assignment
(SSA)
• A normal form of the program in which def-use information is immediate
• A separate variable for every assignment
• A $\phi$ function combines the values of relevant variables
• Simplifies some optimizations
• Increases the program’s size
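The "separate variable for every assignment" idea is easiest to see on straight-line code, where no $\phi$ functions are needed. The sketch below renames such code; the statement representation and the function name are assumptions, and placing $\phi$ functions at control-flow joins is deliberately left out.

```python
def to_ssa(stmts):
    """Rename straight-line code so that every assignment gets a fresh name.

    stmts is a list of (target, operand_tokens) pairs. Operands that name an
    already assigned variable are replaced by its current version; anything
    else (constants, operators, inputs) is left unchanged.
    """
    version = {}
    renamed = []
    for target, operands in stmts:
        new_operands = [f"{t}{version[t]}" if t in version else t
                        for t in operands]
        version[target] = version.get(target, 0) + 1
        renamed.append((f"{target}{version[target]}", new_operands))
    return renamed


program = [("x", ["1"]),
           ("x", ["x", "+", "1"]),
           ("y", ["x", "*", "2"])]
ssa = to_ssa(program)
```

After renaming, each use names exactly one definition, so the def-use relation can be read off the variable names.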
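[Figure: an example CFG in SSA form, from entry to exit, with the tests z1 > 1 and z1 > 2 (Y/N branches) and the assignments x2 ← 2, x1 ← 1, x3 ← φ(x1, x2); z2 ← x3 - 3, x4 ← 4, y1 ← x1 + 1, z3 ← x4 + 7.]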
Handling Pointers and Arrays
• Complicated!!!
• Treated conservatively in most compilers
• The frontier of research
• A simple “reduction”:
x ← a[i] becomes x ← access(a, i)
a[i] ← 4 becomes a ← update(a, i, 4)
• Direct solutions yield more precise results
More Ambitious
Data-Flow Analysis
• Data-Flow analysis can yield “interesting”
information on program behavior
• Signs of variables
• Non-trivial constant values
• Termination properties
• Complicated bugs
• Partial correctness
#include <stdio.h>

int f(int x)
{
    if (x > 100) return x - 10;
    else return f(f(x + 11));
}

int main(void)
{
    int x;
    scanf("%d", &x);
    if (x > 100) printf("%d\n", 91);
    else printf("%d\n", f(x));
    return 0;
}
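The property an ambitious analysis would have to discover here is that f returns 91 for every argument up to 100, so the program prints 91 on every input. The Python transcription below only tests this on a finite range, which is evidence rather than a proof; the name `program_output` is a hypothetical stand-in for what `main` prints.

```python
def f(x):
    """Transcription of the C function f."""
    if x > 100:
        return x - 10
    return f(f(x + 11))


def program_output(x):
    """The value main prints for input x."""
    return 91 if x > 100 else f(x)


# Sample check over a finite range of inputs.
outputs = {program_output(x) for x in range(-20, 200)}
```

An analysis that establishes this fact statically could replace the call in `main` with the constant.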