Control Flow Analysis
Mooly Sagiv
http://www.math.tau.ac.il/~sagiv/courses/pa.html
Tel Aviv University
640-6706
Sunday 18-21 Schreiber 8
Monday 10-12 Schreiber 317
Textbook Chapter 3
(Simplified+OO)
Goals
• Understand the problem of Control Flow Analysis
– in Functional Languages
– in Object Oriented Languages
– Function Pointers
• Learn the Constraint Based Program Analysis Technique
– General
– Usage for Control Flow Analysis
– Algorithms
– Systems
• Similarities between Problems & Techniques
Outline
• A Motivating Example (OO)
• The Control Flow Analysis Problem
• A Formal Specification
• Set Constraints
• Solving Constraints
• Adding Dataflow Information
• Adding Context Information
• Back to the Motivating Example
• Conclusions
A Motivating Example
class Vehicle extends Object {
    int position = 10;
    void move(x1 : int) {
        position = position + x1; }}
class Car extends Vehicle {
    int passengers;
    void await(v : Vehicle) {
        if (v.position < position)
        then v.move(position - v.position);
        else self.move(10); }}
class Truck extends Vehicle {
    void move(x2 : int) {
        if (x2 < 55) position = position + x2; }}
void main {
    Car c; Truck t; Vehicle v1;
    new c;
    new t;
    v1 := c;
    c.passengers := 2;
    c.move(60);
    v1.move(70);
    c.await(t); }
The Control Flow Analysis (CFA) Problem
• Given a program in a functional programming language with higher order functions (functions can serve as parameters and return values)
• Find out for each function invocation which functions may be applied
• Obvious in C without function pointers
• Difficult in C++, Java and ML
• The Dynamic Dispatch Problem
An ML Example
let f = fn x => x 1 ;
g = fn y => y + 2 ;
h = fn z => z + 3;
in (f g) + (f h)
An ML Example
let f = fn x => /* {g, h} */ x 1 ;
g = fn y => y + 2 ;
h = fn z => z + 3;
in (f g) + (f h)
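The annotation {g, h} can be observed dynamically. The following is a minimal sketch in Python (not part of the original slides) that transliterates the ML example and records which functions reach the call site `x 1`; the name `applied_at_x` is hypothetical.

```python
# Transliteration of the ML example; the call site `x 1` inside f is instrumented.
applied_at_x = set()  # names of the functions actually applied at `x 1`

def g(y):
    return y + 2

def h(z):
    return z + 3

def f(x):
    applied_at_x.add(x.__name__)  # record the function reaching the call site
    return x(1)

result = f(g) + f(h)
print(result, sorted(applied_at_x))
```

A single run only shows the functions applied on that run. Control flow analysis has to compute, without running the program, a set that covers every possible run.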
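The Language FUN
• Notations
– $e \in \mathbf{Exp}$ // expressions (or labeled terms)
– $t \in \mathbf{Term}$ // terms (or unlabeled terms)
– $f, x \in \mathbf{Var}$ // variables
– $c \in \mathbf{Const}$ // constants
– $op \in \mathbf{Op}$ // binary operators
– $l \in \mathbf{Lab}$ // labels
• Abstract Syntax
– $e ::= t^{l}$
– $t ::= c \mid x$
  $\mid \mathtt{fn}\ x \Rightarrow e$ // function definition
  $\mid \mathtt{fun}\ f\ x \Rightarrow e$ // recursive function definition
  $\mid e_1\ e_2$ // function application
  $\mid \mathtt{if}\ e_0\ \mathtt{then}\ e_1\ \mathtt{else}\ e_2$
  $\mid \mathtt{let}\ x = e_1\ \mathtt{in}\ e_2 \mid e_1\ op\ e_2$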
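One possible encoding of this abstract syntax is sketched below in Python. It is an illustration only: the class and field names are assumptions, not part of the course material. An `Exp` pairs a term with its label, mirroring the production e ::= t^l.

```python
from dataclasses import dataclass, fields
from typing import Union

@dataclass(frozen=True)
class Const:            # c
    value: int

@dataclass(frozen=True)
class Var:              # x
    name: str

@dataclass(frozen=True)
class Fn:               # fn x => e
    param: str
    body: "Exp"

@dataclass(frozen=True)
class Fun:              # fun f x => e
    name: str
    param: str
    body: "Exp"

@dataclass(frozen=True)
class App:              # e1 e2
    func: "Exp"
    arg: "Exp"

@dataclass(frozen=True)
class If:               # if e0 then e1 else e2
    cond: "Exp"
    then: "Exp"
    orelse: "Exp"

@dataclass(frozen=True)
class Let:              # let x = e1 in e2
    name: str
    bound: "Exp"
    body: "Exp"

@dataclass(frozen=True)
class Op:               # e1 op e2
    op: str
    left: "Exp"
    right: "Exp"

Term = Union[Const, Var, Fn, Fun, App, If, Let, Op]

@dataclass(frozen=True)
class Exp:              # e ::= t^l
    term: Term
    label: int

def labels(e):
    """Collect the label of e and of every subexpression of e."""
    found = [e.label]
    for field in fields(e.term):
        child = getattr(e.term, field.name)
        if isinstance(child, Exp):
            found += labels(child)
    return found

# ((fn x => x^1)^2 (fn y => y^3)^4)^5
simple = Exp(App(Exp(Fn("x", Exp(Var("x"), 1)), 2),
                 Exp(Fn("y", Exp(Var("y"), 3)), 4)), 5)
print(sorted(labels(simple)))
```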
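A Simple Example
$((\mathtt{fn}\ x \Rightarrow x^{1})^{2}\ (\mathtt{fn}\ y \Rightarrow y^{3})^{4})^{5}$
An Example which Loops
$(\mathtt{let}\ g = (\mathtt{fun}\ f\ x \Rightarrow (f^{1}\ (\mathtt{fn}\ y \Rightarrow y^{2})^{3})^{4})^{5}$
$\ \mathtt{in}\ (g^{6}\ (\mathtt{fn}\ z \Rightarrow z^{7})^{8})^{9})^{10}$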
The 0-CFA Problem
• Compute for every program a pair $(C, \rho)$ where:
– $C$ is the abstract cache associating abstract values with labeled program points
– $\rho$ is the abstract environment associating abstract values with variables
• Formally
– $v \in \mathbf{Val} = \mathcal{P}(\mathbf{Term})$ // abstract values
– $\rho \in \mathbf{Env} = \mathbf{Var} \to \mathbf{Val}$ // abstract environment
– $C \in \mathbf{Cache} = \mathbf{Lab} \to \mathbf{Val}$ // abstract cache
– For a function application $(t_1^{l_1}\ t_2^{l_2})^{l}$, $C(l_1)$ determines the functions that can be applied
• These maps are finite for a given program
• No context is considered for parameters
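Possible Solutions for $((\mathtt{fn}\ x \Rightarrow x^{1})^{2}\ (\mathtt{fn}\ y \Rightarrow y^{3})^{4})^{5}$
Label/variable | First candidate | Second candidate
1 | $\{\mathtt{fn}\ y \Rightarrow y^{3}\}$ | $\{\mathtt{fn}\ y \Rightarrow y^{3}\}$
2 | $\{\mathtt{fn}\ x \Rightarrow x^{1}\}$ | $\{\mathtt{fn}\ x \Rightarrow x^{1}\}$
3 | $\{\}$ | $\{\}$
4 | $\{\mathtt{fn}\ y \Rightarrow y^{3}\}$ | $\{\mathtt{fn}\ y \Rightarrow y^{3}\}$
5 | $\{\mathtt{fn}\ y \Rightarrow y^{3}\}$ | $\{\mathtt{fn}\ y \Rightarrow y^{3}\}$
x | $\{\mathtt{fn}\ y \Rightarrow y^{3}\}$ | $\{\}$
y | $\{\}$ | $\{\}$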
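$(\mathtt{let}\ g = (\mathtt{fun}\ f\ x \Rightarrow (f^{1}\ (\mathtt{fn}\ y \Rightarrow y^{2})^{3})^{4})^{5}$
$\ \mathtt{in}\ (g^{6}\ (\mathtt{fn}\ z \Rightarrow z^{7})^{8})^{9})^{10}$
Shorthand
$sf = \mathtt{fun}\ f\ x \Rightarrow (f^{1}\ (\mathtt{fn}\ y \Rightarrow y^{2})^{3})^{4}$
$id_y = \mathtt{fn}\ y \Rightarrow y^{2}$
$id_z = \mathtt{fn}\ z \Rightarrow z^{7}$
$C(1) = \{sf\}$
$C(2) = \{\}$
$C(3) = \{id_y\}$
$C(4) = \{\}$
$C(5) = \{sf\}$
$C(6) = \{sf\}$
$C(7) = \{\}$
$C(8) = \{id_z\}$
$C(9) = \{\}$
$C(10) = \{\}$
$\rho(x) = \{id_y, id_z\}$
$\rho(z) = \{\}$
$\rho(y) = \{\}$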
Relationship to Dataflow Analysis
• Expressions are side effect free
– no entry/exit
• A single environment
• Represents information at different points via maps
• A single value for all occurrences of a variable
• Function applications act similarly to assignments
– “Definition” - Function abstraction is created
– “Use” - Function is applied
A Formal Specification of 0-CFA
• A Boolean function $\models$ defines when a solution is acceptable
• $(C, \rho) \models e$ means that $(C, \rho)$ is acceptable for the expression $e$
• Define $\models$ by structural induction on $e$
• Every function is analyzed once
• Every acceptable solution is sound (conservative)
• Many acceptable solutions
• Generate a set of constraints
• Obtain the least acceptable solution by solving the constraints
Syntax Directed 0-CFA
(Simple Expressions)
[const] $(C, \rho) \models c^{l}$ always
[var] $(C, \rho) \models x^{l}$ if $\rho(x) \subseteq C(l)$
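Syntax Directed 0-CFA
Function Abstraction
[fn] $(C, \rho) \models (\mathtt{fn}\ x \Rightarrow e)^{l}$ if:
  $(C, \rho) \models e$
  $\mathtt{fn}\ x \Rightarrow e \in C(l)$
[fun] $(C, \rho) \models (\mathtt{fun}\ f\ x \Rightarrow e)^{l}$ if:
  $(C, \rho) \models e$
  $\mathtt{fun}\ f\ x \Rightarrow e \in C(l)$
  $\mathtt{fun}\ f\ x \Rightarrow e \in \rho(f)$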
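Syntax Directed 0-CFA
Function Application
[app] $(C, \rho) \models (t_1^{l_1}\ t_2^{l_2})^{l}$ if:
  $(C, \rho) \models t_1^{l_1}$
  $(C, \rho) \models t_2^{l_2}$
  for all $\mathtt{fn}\ x \Rightarrow t_0^{l_0} \in C(l_1)$:
    $C(l_2) \subseteq \rho(x)$ and $C(l_0) \subseteq C(l)$
  for all $\mathtt{fun}\ f\ x \Rightarrow t_0^{l_0} \in C(l_1)$:
    $C(l_2) \subseteq \rho(x)$ and $C(l_0) \subseteq C(l)$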
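Syntax Directed 0-CFA
Other Constructs
[if] $(C, \rho) \models (\mathtt{if}\ t_0^{l_0}\ \mathtt{then}\ t_1^{l_1}\ \mathtt{else}\ t_2^{l_2})^{l}$ if:
  $(C, \rho) \models t_0^{l_0}$
  $(C, \rho) \models t_1^{l_1}$
  $(C, \rho) \models t_2^{l_2}$
  $C(l_1) \subseteq C(l)$
  $C(l_2) \subseteq C(l)$
[let] $(C, \rho) \models (\mathtt{let}\ x = t_1^{l_1}\ \mathtt{in}\ t_2^{l_2})^{l}$ if:
  $(C, \rho) \models t_1^{l_1}$
  $(C, \rho) \models t_2^{l_2}$
  $C(l_1) \subseteq \rho(x)$
  $C(l_2) \subseteq C(l)$
[op] $(C, \rho) \models (t_1^{l_1}\ op\ t_2^{l_2})^{l}$ if:
  $(C, \rho) \models t_1^{l_1}$
  $(C, \rho) \models t_2^{l_2}$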
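Possible Solutions for $((\mathtt{fn}\ x \Rightarrow x^{1})^{2}\ (\mathtt{fn}\ y \Rightarrow y^{3})^{4})^{5}$
Label/variable | First candidate | Second candidate
1 | $\{\mathtt{fn}\ y \Rightarrow y^{3}\}$ | $\{\mathtt{fn}\ y \Rightarrow y^{3}\}$
2 | $\{\mathtt{fn}\ x \Rightarrow x^{1}\}$ | $\{\mathtt{fn}\ x \Rightarrow x^{1}\}$
3 | $\{\}$ | $\{\}$
4 | $\{\mathtt{fn}\ y \Rightarrow y^{3}\}$ | $\{\mathtt{fn}\ y \Rightarrow y^{3}\}$
5 | $\{\mathtt{fn}\ y \Rightarrow y^{3}\}$ | $\{\mathtt{fn}\ y \Rightarrow y^{3}\}$
x | $\{\mathtt{fn}\ y \Rightarrow y^{3}\}$ | $\{\}$
y | $\{\}$ | $\{\}$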
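The two candidates can be checked mechanically against the [const], [var], [fn] and [app] clauses. The sketch below is an illustration under stated assumptions, not the course's implementation: terms are encoded as nested Python tuples, a labeled expression is a pair (term, label), and only the fragment without fun, if, let and op is covered.

```python
def acceptable(C, rho, e):
    """Check whether (C, rho) is acceptable for e (const/var/fn/app fragment only)."""
    term, l = e
    kind = term[0]
    if kind == "const":
        return True                                   # [const] always holds
    if kind == "var":
        return rho[term[1]] <= C[l]                   # [var] rho(x) subset of C(l)
    if kind == "fn":
        return term in C[l] and acceptable(C, rho, term[2])   # [fn]
    if kind == "app":                                 # [app]
        _, e1, e2 = term
        l1, l2 = e1[1], e2[1]
        if not (acceptable(C, rho, e1) and acceptable(C, rho, e2)):
            return False
        for callee in C[l1]:                          # every fn x => t0^l0 in C(l1)
            _, x, (_, l0) = callee
            if not (C[l2] <= rho[x] and C[l0] <= C[l]):
                return False
        return True
    raise ValueError(f"construct not covered by this sketch: {kind}")

# ((fn x => x^1)^2 (fn y => y^3)^4)^5
fx = ("fn", "x", (("var", "x"), 1))
fy = ("fn", "y", (("var", "y"), 3))
program = (("app", (fx, 2), (fy, 4)), 5)

cache = {1: {fy}, 2: {fx}, 3: set(), 4: {fy}, 5: {fy}}   # same in both columns
first_env = {"x": {fy}, "y": set()}
second_env = {"x": set(), "y": set()}

print(acceptable(cache, first_env, program))
print(acceptable(cache, second_env, program))
```

Under this encoding the first candidate passes every clause, while the second fails the [app] clause: the argument's value at label 4 must flow into the abstract environment of x, which the second candidate leaves empty.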
Set Constraints
• A set of rules of the form:
– $lhs \subseteq rhs$
– $\{t\} \subseteq rhs' \Rightarrow lhs \subseteq rhs$ (conditional constraint)
– $lhs$, $rhs$, $rhs'$ are
» terms
» $C(l)$
» $\rho(x)$
• The least solution $(C, \rho)$ can be found iteratively
– start with empty sets
– add terms when needed
• Efficient cubic graph based solution
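The "start with empty sets, add terms when needed" iteration can be sketched as a naive fixed-point loop. This is a hedged illustration, not the cubic graph based algorithm: the tuple encoding of constraints and the function name `solve` are assumptions made for the example.

```python
def solve(constraints):
    """Least solution by naive iteration.

    ("sub", lhs, rhs)              encodes  lhs subset-of rhs
    ("cond", t, guard, lhs, rhs)   encodes  {t} subset-of guard  =>  lhs subset-of rhs
    A frozenset is a constant set of terms; anything else names a set variable.
    """
    sol = {}

    def value(side):
        return side if isinstance(side, frozenset) else sol.setdefault(side, set())

    changed = True
    while changed:                      # repeat until no constraint adds a term
        changed = False
        for c in constraints:
            if c[0] == "cond":
                _, t, guard, lhs, rhs = c
                if t not in value(guard):
                    continue            # condition not (yet) satisfied
            else:
                _, lhs, rhs = c
            target = sol.setdefault(rhs, set())
            new = value(lhs) - target
            if new:
                target |= new
                changed = True
    return sol

example = [
    ("sub", frozenset({"a"}), "X"),                 # {a} subset-of X
    ("sub", "X", "Y"),                              # X subset-of Y
    ("cond", "a", "Y", frozenset({"b"}), "Z"),      # {a} subset-of Y => {b} subset-of Z
    ("cond", "c", "Y", frozenset({"d"}), "Z"),      # never fires: c is not in Y
]
solution = solve(example)
print(solution)
```

Each pass rescans every constraint, so this loop is easy to follow but slower than the worklist-on-a-graph formulation that the cubic bound refers to.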
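Syntax Directed Constraint Generation (Part I)
$\mathcal{C}_{*}[\![c^{l}]\!] = \{\}$
$\mathcal{C}_{*}[\![x^{l}]\!] = \{\rho(x) \subseteq C(l)\}$
$\mathcal{C}_{*}[\![(\mathtt{fn}\ x \Rightarrow e)^{l}]\!] = \mathcal{C}_{*}[\![e]\!] \cup \{\{\mathtt{fn}\ x \Rightarrow e\} \subseteq C(l)\}$
$\mathcal{C}_{*}[\![(\mathtt{fun}\ f\ x \Rightarrow e)^{l}]\!] = \mathcal{C}_{*}[\![e]\!] \cup \{\{\mathtt{fun}\ f\ x \Rightarrow e\} \subseteq C(l)\} \cup \{\{\mathtt{fun}\ f\ x \Rightarrow e\} \subseteq \rho(f)\}$
$\mathcal{C}_{*}[\![(t_1^{l_1}\ t_2^{l_2})^{l}]\!] = \mathcal{C}_{*}[\![t_1^{l_1}]\!] \cup \mathcal{C}_{*}[\![t_2^{l_2}]\!]$
$\quad \cup\ \{\{t\} \subseteq C(l_1) \Rightarrow C(l_2) \subseteq \rho(x) \mid t = \mathtt{fn}\ x \Rightarrow t_0^{l_0} \in \mathbf{Term}_{*}\}$
$\quad \cup\ \{\{t\} \subseteq C(l_1) \Rightarrow C(l_0) \subseteq C(l) \mid t = \mathtt{fn}\ x \Rightarrow t_0^{l_0} \in \mathbf{Term}_{*}\}$
$\quad \cup\ \{\{t\} \subseteq C(l_1) \Rightarrow C(l_2) \subseteq \rho(x) \mid t = \mathtt{fun}\ f\ x \Rightarrow t_0^{l_0} \in \mathbf{Term}_{*}\}$
$\quad \cup\ \{\{t\} \subseteq C(l_1) \Rightarrow C(l_0) \subseteq C(l) \mid t = \mathtt{fun}\ f\ x \Rightarrow t_0^{l_0} \in \mathbf{Term}_{*}\}$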
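Syntax Directed Constraint Generation (Part II)
$\mathcal{C}_{*}[\![(\mathtt{if}\ t_0^{l_0}\ \mathtt{then}\ t_1^{l_1}\ \mathtt{else}\ t_2^{l_2})^{l}]\!] = \mathcal{C}_{*}[\![t_0^{l_0}]\!] \cup \mathcal{C}_{*}[\![t_1^{l_1}]\!] \cup \mathcal{C}_{*}[\![t_2^{l_2}]\!]$
$\quad \cup\ \{C(l_1) \subseteq C(l)\}$
$\quad \cup\ \{C(l_2) \subseteq C(l)\}$
$\mathcal{C}_{*}[\![(\mathtt{let}\ x = t_1^{l_1}\ \mathtt{in}\ t_2^{l_2})^{l}]\!] = \mathcal{C}_{*}[\![t_1^{l_1}]\!] \cup \mathcal{C}_{*}[\![t_2^{l_2}]\!]$
$\quad \cup\ \{C(l_1) \subseteq \rho(x)\}$
$\quad \cup\ \{C(l_2) \subseteq C(l)\}$
$\mathcal{C}_{*}[\![(t_1^{l_1}\ op\ t_2^{l_2})^{l}]\!] = \mathcal{C}_{*}[\![t_1^{l_1}]\!] \cup \mathcal{C}_{*}[\![t_2^{l_2}]\!]$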
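Set Constraints for $((\mathtt{fn}\ x \Rightarrow x^{1})^{2}\ (\mathtt{fn}\ y \Rightarrow y^{3})^{4})^{5}$
Iterative Solution to the Set Constraints for $((\mathtt{fn}\ x \Rightarrow x^{1})^{2}\ (\mathtt{fn}\ y \Rightarrow y^{3})^{4})^{5}$
step | Constraint | 1 | 2 | 3 | 4 | x | y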
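The constraints for this program and their least solution can be derived mechanically from the generation rules for variables, fn and application. The sketch below is one way to do so in Python; the tuple encoding, the names `generate` and `solve`, and the restriction to the const/var/fn/app fragment are assumptions of the example, not the course's code.

```python
def subterms(e):
    """Yield every labeled subexpression of e (const/var/fn/app fragment)."""
    term, _ = e
    yield e
    if term[0] == "fn":
        yield from subterms(term[2])
    elif term[0] == "app":
        yield from subterms(term[1])
        yield from subterms(term[2])

def generate(program):
    """Constraints as tuples: ("sub", lhs, rhs) or ("cond", t, guard, lhs, rhs).

    Set variables are ("C", label) or ("rho", variable); a frozenset is a constant.
    """
    fns = [t for t, _ in subterms(program) if t[0] == "fn"]   # the fn terms of the program
    out = []
    for term, l in subterms(program):
        if term[0] == "var":
            out.append(("sub", ("rho", term[1]), ("C", l)))
        elif term[0] == "fn":
            out.append(("sub", frozenset([term]), ("C", l)))
        elif term[0] == "app":
            l1, l2 = term[1][1], term[2][1]
            for t in fns:
                _, x, (_, l0) = t
                out.append(("cond", t, ("C", l1), ("C", l2), ("rho", x)))
                out.append(("cond", t, ("C", l1), ("C", l0), ("C", l)))
    return out

def solve(constraints):
    """Naive iteration: start with empty sets and add terms until nothing changes."""
    sol = {}
    def value(side):
        return side if isinstance(side, frozenset) else sol.setdefault(side, set())
    changed = True
    while changed:
        changed = False
        for c in constraints:
            if c[0] == "cond":
                _, t, guard, lhs, rhs = c
                if t not in value(guard):
                    continue
            else:
                _, lhs, rhs = c
            target = sol.setdefault(rhs, set())
            new = value(lhs) - target
            if new:
                target |= new
                changed = True
    return sol

# ((fn x => x^1)^2 (fn y => y^3)^4)^5
fx = ("fn", "x", (("var", "x"), 1))
fy = ("fn", "y", (("var", "y"), 3))
program = (("app", (fx, 2), (fy, 4)), 5)

constraints = generate(program)
least = solve(constraints)
print(len(constraints))
print(least[("C", 5)] == {fy}, least[("rho", "x")] == {fy})
```

For this program the sketch produces eight constraints (two unconditional inclusions for the variables, two for the abstractions, and four conditional ones for the application), and the computed sets agree with the first candidate in the table of possible solutions.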
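Adding Data Flow Information
• Dataflow values can affect control flow analysis
• Example
$(\mathtt{let}\ f = (\mathtt{fn}\ x \Rightarrow (\mathtt{if}\ (x^{1} > 0^{2})^{3}$
$\quad \mathtt{then}\ (\mathtt{fn}\ y \Rightarrow y^{4})^{5}$
$\quad \mathtt{else}\ (\mathtt{fn}\ z \Rightarrow 5^{6})^{7}$
$\quad )^{8})^{9}$
$\ \mathtt{in}\ ((f^{10}\ 3^{11})^{12}\ 0^{13})^{14})^{15}$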
Adding Data Flow Information
• Add a finite set of “abstract” values per program: $\mathbf{Data}$
• Update $\mathbf{Val} = \mathcal{P}(\mathbf{Term} \cup \mathbf{Data})$
– $\rho \in \mathbf{Env} = \mathbf{Var} \to \mathbf{Val}$ // abstract environment
– $C \in \mathbf{Cache} = \mathbf{Lab} \to \mathbf{Val}$ // abstract cache
• Generate extra constraints for data
• Obtain a more precise solution
• A special case of the product domain (4.4)
• The combination of two analyses may be more precise than both
Adding Dataflow Information (Sign Analysis)
• Sign analysis
• Add a finite set of “abstract” values per program: $\mathbf{Data} = \{P, N, TT, FF\}$
• Update $\mathbf{Val} = \mathcal{P}(\mathbf{Term} \cup \mathbf{Data})$
• $d_c$ is the abstract value that represents a constant $c$
– $d_{3} = \{P\}$
– $d_{-7} = \{N\}$
– $d_{true} = \{TT\}$
– $d_{false} = \{FF\}$
• Every operator is conservatively interpreted
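A conservative interpretation of operators over these abstract values might look like the following sketch. It rests on an assumption the slide does not state: P is taken to cover zero as well as positive integers, since Data has no separate value for zero. The tables `PLUS` and `GREATER` and the helper names are hypothetical.

```python
from itertools import product

# Assumption: P = non-negative integers (zero included), N = negative integers.
P, N, TT, FF = "P", "N", "TT", "FF"

def abstract_const(c):
    """d_c: the abstract value representing the constant c."""
    if isinstance(c, bool):
        return {TT} if c else {FF}
    return {P} if c >= 0 else {N}

# Each entry lists every result the concrete operator could produce.
PLUS = {(P, P): {P}, (N, N): {N}, (P, N): {P, N}, (N, P): {P, N}}
GREATER = {(P, N): {TT}, (N, P): {FF}, (P, P): {TT, FF}, (N, N): {TT, FF}}

def apply_op(table, left, right):
    """Lift an operator table to sets of abstract values (union over all pairs)."""
    result = set()
    for pair in product(left, right):
        result |= table.get(pair, set())   # pairs outside the table contribute nothing
    return result

print(apply_op(GREATER, abstract_const(3), abstract_const(-7)))
print(apply_op(PLUS, abstract_const(3), abstract_const(-7)))
```

The comparison of a non-negative with a negative value yields only TT, so a conditional guarded by it needs to consider just one branch. The sum of a non-negative and a negative value can have either sign, so both P and N are reported.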
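Syntax Directed Constraint Generation (Part I)
$\mathcal{C}_{*}[\![c^{l}]\!] = \{d_c \subseteq C(l)\}$
$\mathcal{C}_{*}[\![x^{l}]\!] = \{\rho(x) \subseteq C(l)\}$
$\mathcal{C}_{*}[\![(\mathtt{fn}\ x \Rightarrow e)^{l}]\!] = \mathcal{C}_{*}[\![e]\!] \cup \{\{\mathtt{fn}\ x \Rightarrow e\} \subseteq C(l)\}$
$\mathcal{C}_{*}[\![(\mathtt{fun}\ f\ x \Rightarrow e)^{l}]\!] = \mathcal{C}_{*}[\![e]\!] \cup \{\{\mathtt{fun}\ f\ x \Rightarrow e\} \subseteq C(l)\} \cup \{\{\mathtt{fun}\ f\ x \Rightarrow e\} \subseteq \rho(f)\}$
$\mathcal{C}_{*}[\![(t_1^{l_1}\ t_2^{l_2})^{l}]\!] = \mathcal{C}_{*}[\![t_1^{l_1}]\!] \cup \mathcal{C}_{*}[\![t_2^{l_2}]\!]$
$\quad \cup\ \{\{t\} \subseteq C(l_1) \Rightarrow C(l_2) \subseteq \rho(x) \mid t = \mathtt{fn}\ x \Rightarrow t_0^{l_0} \in \mathbf{Term}_{*}\}$
$\quad \cup\ \{\{t\} \subseteq C(l_1) \Rightarrow C(l_0) \subseteq C(l) \mid t = \mathtt{fn}\ x \Rightarrow t_0^{l_0} \in \mathbf{Term}_{*}\}$
$\quad \cup\ \{\{t\} \subseteq C(l_1) \Rightarrow C(l_2) \subseteq \rho(x) \mid t = \mathtt{fun}\ f\ x \Rightarrow t_0^{l_0} \in \mathbf{Term}_{*}\}$
$\quad \cup\ \{\{t\} \subseteq C(l_1) \Rightarrow C(l_0) \subseteq C(l) \mid t = \mathtt{fun}\ f\ x \Rightarrow t_0^{l_0} \in \mathbf{Term}_{*}\}$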
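Syntax Directed Constraint Generation (Part II)
$\mathcal{C}_{*}[\![(\mathtt{if}\ t_0^{l_0}\ \mathtt{then}\ t_1^{l_1}\ \mathtt{else}\ t_2^{l_2})^{l}]\!] = \mathcal{C}_{*}[\![t_0^{l_0}]\!] \cup \mathcal{C}_{*}[\![t_1^{l_1}]\!] \cup \mathcal{C}_{*}[\![t_2^{l_2}]\!]$
$\quad \cup\ \{d_{true} \subseteq C(l_0) \Rightarrow C(l_1) \subseteq C(l)\}$
$\quad \cup\ \{d_{false} \subseteq C(l_0) \Rightarrow C(l_2) \subseteq C(l)\}$
$\mathcal{C}_{*}[\![(\mathtt{let}\ x = t_1^{l_1}\ \mathtt{in}\ t_2^{l_2})^{l}]\!] = \mathcal{C}_{*}[\![t_1^{l_1}]\!] \cup \mathcal{C}_{*}[\![t_2^{l_2}]\!]$
$\quad \cup\ \{C(l_1) \subseteq \rho(x)\}$
$\quad \cup\ \{C(l_2) \subseteq C(l)\}$
$\mathcal{C}_{*}[\![(t_1^{l_1}\ op\ t_2^{l_2})^{l}]\!] = \mathcal{C}_{*}[\![t_1^{l_1}]\!] \cup \mathcal{C}_{*}[\![t_2^{l_2}]\!]$
$\quad \cup\ \{C(l_1)\ op\ C(l_2) \subseteq C(l)\}$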
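Adding Context Information
• The analysis does not distinguish between different occurrences of a variable (Monovariant analysis)
• Example
$(\mathtt{let}\ f = (\mathtt{fn}\ x \Rightarrow x^{1})^{2}$
$\ \mathtt{in}\ ((f^{3}\ f^{4})^{5}\ (\mathtt{fn}\ y \Rightarrow y^{6})^{7})^{8})^{9}$
• Source to source transformation can help (but may lead to code explosion)
• Example rewritten
$\mathtt{let}\ f_1 = \mathtt{fn}\ x_1 \Rightarrow x_1$
$\mathtt{in}\ \mathtt{let}\ f_2 = \mathtt{fn}\ x_2 \Rightarrow x_2$
$\mathtt{in}\ (f_1\ f_2)\ (\mathtt{fn}\ y \Rightarrow y)$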
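The loss of precision can be reproduced with a small monovariant analysis. The sketch below iterates the [var], [fn], [let] and [app] clauses to a fixed point on the example program; the tuple encoding and the name `analyse` are assumptions of the illustration, and fun, if and op are left out.

```python
from collections import defaultdict

def analyse(program):
    """Least 0-CFA solution for the var/fn/let/app fragment, by repeated passes."""
    C, rho = defaultdict(set), defaultdict(set)

    def flow(src, dst):
        """Enforce src subset-of dst; report whether dst grew."""
        before = len(dst)
        dst |= src
        return len(dst) != before

    def visit(e):
        term, l = e
        kind, grew = term[0], False
        if kind == "var":
            grew |= flow(rho[term[1]], C[l])
        elif kind == "fn":
            grew |= flow({term}, C[l])
            grew |= visit(term[2])
        elif kind == "let":
            _, x, e1, e2 = term
            grew |= visit(e1) | visit(e2)
            grew |= flow(C[e1[1]], rho[x]) | flow(C[e2[1]], C[l])
        elif kind == "app":
            _, e1, e2 = term
            grew |= visit(e1) | visit(e2)
            for _, x, body in list(C[e1[1]]):     # every function that may be applied
                grew |= flow(C[e2[1]], rho[x]) | flow(C[body[1]], C[l])
        return grew

    while visit(program):                          # repeat until nothing grows
        pass
    return C, rho

# (let f = (fn x => x^1)^2 in ((f^3 f^4)^5 (fn y => y^6)^7)^8)^9
fx = ("fn", "x", (("var", "x"), 1))
fy = ("fn", "y", (("var", "y"), 6))
inner = (("app", (("var", "f"), 3), (("var", "f"), 4)), 5)
program = (("let", "f", (fx, 2), (("app", inner, (fy, 7)), 8)), 9)

C, rho = analyse(program)
print(C[9] == {fx, fy}, rho["x"] == {fx, fy})
```

Evaluating the program yields only the function with parameter y, yet the analysis reports both functions at label 9: both applications bind the single abstract variable x, so the argument of one call leaks into the result of the other.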
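Simplified K-CFA
• Records the last k dynamic calls (for some fixed k)
• Similar to the call string approach
• Remember the context in which an expression is evaluated
• $\mathbf{Val}$ is now $\mathcal{P}(\mathbf{Term}) \times \mathbf{Contexts}$
– $\rho \in \mathbf{Env} = \mathbf{Var} \times \mathbf{Contexts} \to \mathbf{Val}$
– $C \in \mathbf{Cache} = \mathbf{Lab} \times \mathbf{Contexts} \to \mathbf{Val}$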
1-CFA
• $(\mathtt{let}\ f = (\mathtt{fn}\ x \Rightarrow x^{1})^{2}\ \mathtt{in}\ ((f^{3}\ f^{4})^{5}\ (\mathtt{fn}\ y \Rightarrow y^{6})^{7})^{8})^{9}$
• Contexts
– [] - The empty context
– [5] - The application at label 5
– [8] - The application at label 8
• Polyvariant Control Flow
$C(1, [5]) = \rho(x, [5]) = C(2, []) = C(3, []) = \rho(f, []) = (\{\mathtt{fn}\ x \Rightarrow x^{1}\}, [])$
$C(1, [8]) = \rho(x, [8]) = C(7, []) = C(8, []) = C(9, []) = (\{\mathtt{fn}\ y \Rightarrow y^{6}\}, [])$
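Contexts that record the last k calls can be represented as bounded tuples of application labels. The helper below is a minimal sketch; the name `extend` and the choice of tuples as the representation are assumptions of the example.

```python
def extend(context, call_label, k):
    """Append call_label to the context and keep only the last k labels."""
    if k <= 0:
        return ()                       # k = 0 degenerates to the context-free analysis
    return (context + (call_label,))[-k:]

# With k = 1 the body of fn x => x^1 is analysed once per call site.
ctx_5 = extend((), 5, 1)                # entering through the application at label 5
ctx_8 = extend(ctx_5, 8, 1)             # label 8 replaces label 5 when k = 1

# A polyvariant cache is then keyed by (label, context) instead of label alone.
cache = {(1, ctx_5): {"fn x => x^1"}, (1, ctx_8): {"fn y => y^6"}}
print(ctx_5, ctx_8, extend(ctx_5, 8, 2))
```

Keeping the entries for label 1 apart under [5] and [8] is what stops the two arguments of f from being merged, at the cost of one copy of the information per context.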
The Motivating Example
class Vehicle extends Object {
    int position = 10;
    void move(x1 : int) {
        position = position + x1; }}
class Car extends Vehicle {
    int passengers;
    void await(v : Vehicle) {
        if (v.position < position)
        then v.move(position - v.position);
        else self.move(10); }}
class Truck extends Vehicle {
    void move(x2 : int) {
        if (x2 < 55) position = position + x2; }}
void main {
    Car c; Truck t; Vehicle v1;
    new c;
    new t;
    v1 := c;
    c.passengers := 2;
    c.move(60);
    v1.move(70);
    c.await(t); }
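For this program, the question "which move may be invoked at each call" can be sketched as follows. The receiver classes in `RECEIVERS` are an assumption derived by hand from the allocations and assignments in main, not a result stated in the slides, and all function names are hypothetical.

```python
# Class hierarchy and the methods each class defines, read off the example.
PARENT = {"Vehicle": "Object", "Car": "Vehicle", "Truck": "Vehicle"}
METHODS = {"Vehicle": {"move"}, "Car": {"await"}, "Truck": {"move"}}

def lookup(cls, method):
    """Dynamic dispatch: first definition of method found walking up from cls."""
    while cls is not None:
        if method in METHODS.get(cls, set()):
            return f"{cls}.{method}"
        cls = PARENT.get(cls)
    return None

def subclasses(cls):
    """cls together with all of its direct and indirect subclasses."""
    result = {cls}
    for sub, parent in PARENT.items():
        if parent == cls:
            result |= subclasses(sub)
    return result

def targets(classes, method):
    """Methods that may run when `method` is invoked on any of the given classes."""
    return {lookup(cls, method) for cls in classes}

# Assumption: c is a Car, t is a Truck, v1 := c, and await is only called on c with t.
RECEIVERS = {"c": {"Car"}, "t": {"Truck"}, "v1": {"Car"},
             "v": {"Truck"}, "self": {"Car"}}

by_declared_type = targets(subclasses("Vehicle"), "move")   # uses only v1 : Vehicle
by_flow = targets(RECEIVERS["v1"], "move")                  # uses the flow v1 := c
print(sorted(by_declared_type), sorted(by_flow))
print(targets(RECEIVERS["v"], "move"), targets(RECEIVERS["self"], "move"))
```

Using only the declared type Vehicle, the call v1.move(70) may reach either definition of move. Tracking which objects flow into v1 narrows it to the definition in Vehicle, and in the same way v.move inside await resolves to the definition in Truck.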
Missing Material
• Efficient Cubic Solution to Set Constraints
  www.cs.berkeley.edu/Research/Aiken/bane.html
• Experimental results for OO
  www.cs.washington.edu/research/projects/cecil
• Operational Semantics for FUN (3.2.1)
• Defining acceptability without structural induction
– More precise treatment of termination (3.2.2)
– Needs Co-Induction (greatest fixed point)
• Using general lattices as Dataflow values instead of powersets (3.5.2)
• Lower-bounds
– Decidability of JOP
– Polynomiality
Conclusions
• Set constraints are quite useful
– A uniform syntax
– Can even deal with pointers
• But the semantic foundation is still based on abstract interpretation
• Techniques used in functional and imperative (OO) programming are similar
• Control and data flow analysis are related