Type inference

CS 598 Scripting Languages Design and Implementation
9. Constant propagation and Type Inference
Must and may define
• An assignment statement of the form x:=E or
a read statement of the form read x must
define x.
• A function call fun(x) where x is passed by
reference or an assignment of the form *q:=
y where nothing is known about pointer q,
may define x.
Reaching definitions
• The reaching definitions algorithm computes, for each block
b, the set REACHES(b) of the statements that reach b.
• We say that a statement s that must or may define a
variable x reaches b if
– there is a path from s to b in the control flow graph such
that none of the statements in the path must define x.
• The reaching definitions algorithm must be
conservative to guarantee correct transformations.
– i.e. it must assume that a definition can reach a block b
unless it is absolutely certain that it does not.
Computing reaching definitions
• Consider the following language:
S ::= id:=expression |
S;S |
if expression then S else S |
do S while expression
• We use the following terms to compute reaching
definitions:
– gen[S] is the set of definitions “generated by S”.
– kill[S] is the set of definitions “killed” by S.
– in[S] is the set of definitions reaching S (meaning reaching the
top of S)
– out[S] is the set of definitions that reach the bottom of S.
Computing reaching definitions
• When S has the form a:= expression and label d:
– gen[S] = {d}
– kill[S] = Da-{d}. Here Da is the set of all definitions of a.
– out[S] = gen[S] ∪ (in[S]-kill[S])
• When S is of the form S1; S2:
– gen[S] = gen[S2] ∪ (gen[S1]-kill[S2])
– kill[S] = kill[S2] ∪ (kill[S1]-gen[S2])
– in[S1] = in[S]
– in[S2] = out[S1]
– out[S] = out[S2]
Computing reaching definitions
• When S is of the form if ... then S1 else S2:
– gen[S] = gen[S1] ∪ gen[S2]
– kill[S] = kill[S1] ∩ kill[S2]
– in[S1] = in[S]
– in[S2] = in[S]
– out[S] = out[S1] ∪ out[S2]
• When S is of the form do S1 while ...:
– gen[S] = gen[S1]
– kill[S] = kill[S1]
– in[S1] = in[S] ∪ gen[S1]
– out[S] = out[S1]
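The gen and kill rules above can be sketched as a bottom-up recursion over the syntax tree. This is only an illustration: the node classes (Assign, Seq, If, DoWhile) and the defs_of map from each variable to all of its definition labels are hypothetical names, not part of the slides.

```python
from dataclasses import dataclass

# Hypothetical AST node types for the small language S.
@dataclass(frozen=True)
class Assign:
    label: str   # definition label d
    var: str     # variable a on the left-hand side

@dataclass(frozen=True)
class Seq:
    first: object
    second: object

@dataclass(frozen=True)
class If:
    then: object
    orelse: object

@dataclass(frozen=True)
class DoWhile:
    body: object

def gen_kill(s, defs_of):
    """Return (gen[S], kill[S]); defs_of[a] is Da, all definitions of a."""
    if isinstance(s, Assign):
        return {s.label}, defs_of[s.var] - {s.label}
    if isinstance(s, Seq):
        g1, k1 = gen_kill(s.first, defs_of)
        g2, k2 = gen_kill(s.second, defs_of)
        return g2 | (g1 - k2), k2 | (k1 - g2)
    if isinstance(s, If):
        g1, k1 = gen_kill(s.then, defs_of)
        g2, k2 = gen_kill(s.orelse, defs_of)
        return g1 | g2, k1 & k2
    if isinstance(s, DoWhile):
        return gen_kill(s.body, defs_of)
    raise TypeError(f"unknown statement: {s!r}")

# d1: a := ...; if ... then d2: a := ... else d3: b := ...
prog = Seq(Assign("d1", "a"), If(Assign("d2", "a"), Assign("d3", "b")))
defs_of = {"a": {"d1", "d2"}, "b": {"d3"}}
gen, kill = gen_kill(prog, defs_of)
```

In this example d1 survives in gen because the else branch does not redefine a, and kill is empty because the two branches kill nothing in common.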
Computing reaching definitions
• Use an abstract syntax tree instead of a control flow graph.
• Gen[S] and kill[S] are computed bottom up on that tree.
• The “real” gen[S] is a subset of the computed gen[S]. For
example when S is an if statement that always takes the
“true” branch, the real gen[S]=gen[S1] is a subset of the
computed gen[S1] ∪ gen[S2]. At the same time, the real
kill[S] is a superset of the computed kill[S].
• Notice that out[S] is not the same as gen[S]. The former
contains all the definitions that reach the bottom of S,
while the latter includes only those definitions within S that
reach the bottom of S.
• In[S] and out[S] are computed starting at the statement S0
representing the whole program
Algorithm IN-OUT: Compute in[S] and out[S] for all statements S
Input: An abstract syntax tree of program S0 and the gen and kill sets for all the
statements within the program.
Output: in[S] and out[S] for all statements within the program
computeOut(S,INS):
    case S
        a := ...:
            return(out[S]=gen[S] ∪(INS-kill[S]))
        S1;S2:
            in[S1]=INS
            in[S2],out[S1]=computeOut(S1,in[S1])
            return(out[S2]=computeOut(S2,in[S2]))
        if … then S1 else S2:
            in[S1], in[S2] = INS
            return(out[S]=computeOut(S1,in[S1]) ∪ computeOut(S2,in[S2]))
        do S1 while ... :
            in[S1] = INS∪gen[S1]
            return(out[S]=computeOut(S1,in[S1]))
    end case
end
in[S0] = ∅
computeOut(S0,in[S0])
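A minimal Python sketch of Algorithm IN-OUT, assuming gen and kill have already been attached to every node. The Node class and its field names (kind, children, inset, out) are hypothetical choices made for this illustration.

```python
class Node:
    """Hypothetical AST node carrying precomputed gen/kill sets."""
    def __init__(self, kind, gen, kill, children=()):
        self.kind = kind                  # "assign", "seq", "if" or "do"
        self.gen, self.kill = set(gen), set(kill)
        self.children = list(children)
        self.inset = self.out = None      # filled in by compute_out

def compute_out(s, ins):
    """Record in[S] and out[S] for s and all its descendants; return out[S]."""
    s.inset = set(ins)
    if s.kind == "assign":
        s.out = s.gen | (s.inset - s.kill)
    elif s.kind == "seq":
        s1, s2 = s.children
        s.out = compute_out(s2, compute_out(s1, ins))
    elif s.kind == "if":
        s1, s2 = s.children
        s.out = compute_out(s1, ins) | compute_out(s2, ins)
    elif s.kind == "do":
        (s1,) = s.children
        s.out = compute_out(s1, s.inset | s1.gen)
    else:
        raise ValueError(f"unknown statement kind: {s.kind}")
    return s.out

# d1: a := ...; do d2: a := ... while ...
d1 = Node("assign", gen={"d1"}, kill={"d2"})
d2 = Node("assign", gen={"d2"}, kill={"d1"})
loop = Node("do", gen={"d2"}, kill={"d1"}, children=[d2])
prog = Node("seq", gen={"d2"}, kill={"d1"}, children=[d1, loop])
compute_out(prog, set())
```

Both d1 and d2 reach the top of the loop body (d2 through the back edge), while only d2 reaches the end of the program.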
Computing reaching definitions
• The sets of statements can be represented with bit
vectors. Then unions and intersections take the form of
or and and operations. Only the statements that assign
values to program variables have to be taken into
account. That is, statements (or instructions) assigning
to compiler-generated temporaries can be ignored.
• In an implementation it is better to do the
computations for basic blocks instead of statements.
The kill and gen of a basic block and the reaching sets
for each statement can then be obtained by applying
the rules above.
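As an illustration of the bit-vector representation, Python integers can stand in for the vectors. The numbering of definitions (d_i stored in bit i) and the helper to_labels are assumptions made for this sketch.

```python
# Hypothetical numbering: definition d_i is stored in bit i.
def to_labels(bits):
    """Decode a bit vector into the set of definition labels it represents."""
    return {f"d{i}" for i in range(bits.bit_length()) if bits >> i & 1}

gen = 0b0100     # {d2}
kill = 0b0001    # {d0}
ins = 0b0011     # {d0, d1}

# out = gen ∪ (in - kill): union is bitwise or, set difference is and-not.
out = gen | (ins & ~kill)
```

Here to_labels(out) gives the definitions d1 and d2: d0 is killed, d1 passes through, and d2 is generated.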
Computing reaching definitions
iteratively
• Reaching definitions were computed assuming a structured program represented
as an abstract syntax tree.
• An alternative approach that works even when the flow graph is not reducible is to
use iterative algorithms on the control flow graph.
• These algorithms seek a solution to a system of equations:
in[S] = ∪_{T ∈ PRED(S)} out[T]
out[S] = gen[S] ∪ (in[S]\kill[S])
• or, in terms of in alone:
in[S] = ∪_{T ∈ PRED(S)} (gen[T] ∪ (in[T]\kill[T]))
• For the CFG entry node, S0, it is assumed that in[S0] = ∅.
• Initially in[S] = ∅ for all S in the program. In this way, we get the smallest solution
to these equations (which typically have more than one solution) for maximum
accuracy.
Computing reaching definitions
iteratively
Input: A flow graph for which kill[S] and gen[S] have been computed for each
assignment statement S
Output: in[S] and out[S] for each assignment statement S
Method:
for each statement S ∈ PROG
    out[S] = gen[S] // compute out assuming in[S] = ∅.
change := true
while change
    change := false
    for each statement S ∈ PROG
        in[S] := ∪_{T ∈ PRED(S)} out[T]
        oldout := out[S]
        out[S] := gen[S] ∪ (in[S]\kill[S])
        if out[S] ≠ oldout then change := true
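The iterative method can be sketched in Python with ordinary sets. The function name and the dictionary-based encoding of the flow graph (preds, gen, kill keyed by statement label) are assumptions of this example.

```python
def reaching_definitions(nodes, preds, gen, kill):
    """Iterate in/out to a fixed point, starting from in[S] = ∅ for every S."""
    out = {s: set(gen[s]) for s in nodes}     # out assuming in[S] = ∅
    ins = {s: set() for s in nodes}
    changed = True
    while changed:
        changed = False
        for s in nodes:
            # in[S] is the union of out[T] over all predecessors T of S.
            ins[s] = set().union(*(out[t] for t in preds[s]))
            new_out = gen[s] | (ins[s] - kill[s])
            if new_out != out[s]:
                out[s] = new_out
                changed = True
    return ins, out

# d1: i := 0;  d2: i := i + 1 (loops back to itself);  d3: j := i
nodes = ["d1", "d2", "d3"]
preds = {"d1": [], "d2": ["d1", "d2"], "d3": ["d2"]}
gen = {"d1": {"d1"}, "d2": {"d2"}, "d3": {"d3"}}
kill = {"d1": {"d2"}, "d2": {"d1"}, "d3": set()}
ins, out = reaching_definitions(nodes, preds, gen, kill)
```

Both definitions of i reach d2 because of the back edge, but only d2 reaches d3.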
Use-definition chains
• Data interconnections may be expressed in a pure form which directly links
instructions that produce values to instructions that use them.
• For each statement S with input variable v, we say that DEFS(v,S) = in[S] ∩ Dv. If v
is not input to S, DEFS(v,S) = ∅.
• For each definition S with output variable v, USES(S) = {T | S is in DEFS(v,T)}.
• Once DEFS is computed, USES can be computed by simple inversion as follows:
Algorithm US: USES Computation
Input: DEFS, a program PROG
Output: USES
Method:
for each statement T in PROG
    USES(T) := ∅
for each statement T in PROG
    for each input variable v of T
        for each statement S ∈ DEFS(v,T)
            USES(S) = USES(S) ∪ {T}
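A short sketch of the inversion, assuming DEFS is stored as a dictionary keyed by (variable, statement) pairs and that inputs lists the input variables of each statement. Both encodings are hypothetical.

```python
def compute_uses(prog, inputs, defs):
    """Invert DEFS: USES(S) is the set of statements T with S in DEFS(v, T)."""
    uses = {t: set() for t in prog}           # initialise every USES set first
    for t in prog:
        for v in inputs[t]:
            for s in defs.get((v, t), set()):
                uses[s].add(t)
    return uses

# d1: i := 0;  d2: i := i + 1 (in a loop);  d3: j := i
prog = ["d1", "d2", "d3"]
inputs = {"d1": [], "d2": ["i"], "d3": ["i"]}
defs = {("i", "d2"): {"d1", "d2"}, ("i", "d3"): {"d2"}}
uses = compute_uses(prog, inputs, defs)
```

All USES sets are initialised before the inversion starts, so entries added while visiting one statement are not reset when a later statement is visited.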
Algorithm MK: Mark Useful Definitions
Input:
– A program, PROG,
– DEFS,
– CRIT, a set of critical statements which are useful by definition (e.g., writes).
Output:
– MARK(S). For each definition S, MARK(S) = true iff S is useful
Method:
for each statement T ∈ PROG
    MARK(T) = false
PILE = CRIT
while PILE ≠ ∅
    S = from PILE // from removes one element from set PILE
    MARK(S) = true
    for each input variable v of S
        for each T ∈ DEFS(v,S)
            if ¬MARK(T) then PILE = PILE ∪ {T}
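A possible Python rendering of the marking pass. It assumes that a definition is added to the pile only if it has not been marked yet, which is what lets the loop terminate. The data encoding (inputs, defs keyed by (variable, statement)) is hypothetical.

```python
def mark_useful(prog, inputs, defs, crit):
    """Mark every definition that can contribute to a critical statement."""
    mark = {t: False for t in prog}
    pile = set(crit)
    while pile:
        s = pile.pop()                  # remove one element from the pile
        mark[s] = True
        for v in inputs[s]:
            for t in defs.get((v, s), set()):
                if not mark[t]:         # visit each definition at most once
                    pile.add(t)
    return mark

# d1: a := 1;  d2: b := 2;  d3: c := a + 1;  w: write c
prog = ["d1", "d2", "d3", "w"]
inputs = {"d1": [], "d2": [], "d3": ["a"], "w": ["c"]}
defs = {("a", "d3"): {"d1"}, ("c", "w"): {"d3"}}
mark = mark_useful(prog, inputs, defs, crit={"w"})
```

The definition d2 stays unmarked because b never feeds the write, so it is a candidate for removal as dead code.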
Algorithm CP: Constant Propagation
Input:
– A program PROG
– A flag CONST(v,S) for each statement S and input variable
v of S. Initially, CONST(v,S) is false for all v and S.
– CONST(S) for the output variable of S. Initially, CONST(S) is
true if the rhs of S is a constant.
– USES and DEFS
Output:
– The modified CONST flags
– The mapping VAL(v,S) which provides the run-time
constant value of input variable v at statement S. VAL(v,S)
is defined only if CONST(v,S) is true. VAL(S) is the value of
the output of S. VAL(S) is defined only if CONST(S) is true.
Algorithm CP: Constant Propagation
Method:
PILE = {S ∈ PROG | the rhs of S is a constant} // trivially constant statements
while PILE ≠ ∅
    T from PILE
    v = LHS(T)
    for each S ∈ USES(T) // check for constant inputs
        for each W in DEFS(v,S)-{T} // check that all inputs are constant
            if ¬CONST(W) or VAL(W) ≠ VAL(T) then next(2)
        // If they are constant
        CONST(v,S) = true
        VAL(v,S) = VAL(T)
        // is the statement now computing constant?
        if CONST(w,S) is true for all inputs w of S then
            CONST(S) = true
            VAL(S) = evaluateRHS(S)
            PILE = PILE ∪ {S}
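Algorithm CP might be sketched as follows. The statement encoding (label mapped to a pair of left-hand side and right-hand side, where the right-hand side is either a number or a tuple of operator and two variables), the OPS table, and the guard that skips statements already known to be constant are all assumptions of this example, not part of the algorithm as stated.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}   # hypothetical operator table

def constant_propagation(prog, defs, uses):
    """prog: label -> (lhs, rhs); rhs is a number or (op, var1, var2)."""
    const = {s: not isinstance(rhs, tuple) for s, (_, rhs) in prog.items()}
    val = {s: rhs for s, (_, rhs) in prog.items() if const[s]}
    const_in, val_in = {}, {}                  # CONST(v,S) and VAL(v,S)
    pile = {s for s in prog if const[s]}
    while pile:
        t = pile.pop()
        v = prog[t][0]
        for s in uses[t]:
            others = defs[(v, s)] - {t}
            # Every other definition reaching s must agree with t.
            if any(not const[w] or val[w] != val[t] for w in others):
                continue                       # plays the role of next(2)
            const_in[(v, s)] = True
            val_in[(v, s)] = val[t]
            op, *args = prog[s][1]
            ready = all(const_in.get((w, s), False) for w in args)
            if ready and not const[s]:         # added guard: enqueue s once
                const[s] = True
                val[s] = OPS[op](*(val_in[(w, s)] for w in args))
                pile.add(s)
    return const, val

# d1: a := 2;  d2: b := 3;  d3: c := a + b;  d4: d := c * a
prog = {"d1": ("a", 2), "d2": ("b", 3),
        "d3": ("c", ("+", "a", "b")), "d4": ("d", ("*", "c", "a"))}
defs = {("a", "d3"): {"d1"}, ("b", "d3"): {"d2"},
        ("c", "d4"): {"d3"}, ("a", "d4"): {"d1"}}
uses = {"d1": {"d3", "d4"}, "d2": {"d3"}, "d3": {"d4"}, "d4": set()}
const, val = constant_propagation(prog, defs, uses)
```

In this straight-line example every statement turns out to be constant, with c evaluated to 5 and d to 10.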
Algorithm TA: Type Analysis
• In a dynamic language, run-time type checks are needed
unless the types of the operands can be determined at
compile time.
• We need an algebra of types where
– The atomic type symbols are: I (integer), R (real), N
(number, i.e. real or integer), UD (undefined), NS (set of
arbitrary elements), Z (error), etc.
– The transition function F⊕ for each operation ⊕ which for
input types t1, t2, ..., tn of the operands, produces the type
of the left hand side: t0=F⊕ (t1, t2, ..., tn ).
• e.g. real+real is real, real + integer is also real, integer + integer is
integer, real+error is error.
• And a “merging operation”:
Input:
– A program PROG
– A mapping TYPE, such that TYPE(v,S) is the best initial
estimate of the type of the variable v at the top of
statement S (for most variables this is ‘UD’).
– TYPE(S), the type of the output (LHS) of S.
– DEFS and USES
Output:
– For each instruction S and input or output variable v,
TYPE(v,S), a conservative approximation to the most
specific type information provably true at S.
Algorithm TA: Type Analysis
PILE = {S ∈ PROG | no variable in the rhs is of type ‘UD’}
while PILE ≠ ∅
    S from PILE
    v = LHS(S)
    for each T ∈ USES(S) // recompute type of T
        oldtype = TYPE(v,T)
        TYPE(v,T) = merge of TYPE(W) over all W ∈ DEFS(v,T)
        if TYPE(v,T) ≠ oldtype then // a type refinement
            TYPE(T) = F⊕ (types on RHS of T)
            // ⊕ is the operation on RHS
            PILE = PILE ∪ {T}
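The sketch below shows how Algorithm TA could look in Python. The merge function used here is only an assumed example of a merging operation (UD is the identity, two different numeric types merge to N, anything else gives Z), and f_plus is an assumed transition function for + that agrees with the examples given earlier. The program encoding and the choice to seed the pile with statements whose output type is already known are also hypothetical.

```python
from functools import reduce

NUMERIC = {"I", "R", "N"}

def merge(t1, t2):
    """Assumed merging operation on type symbols (not taken from the slides)."""
    if t1 == t2:
        return t1
    if t1 == "UD":
        return t2
    if t2 == "UD":
        return t1
    if t1 in NUMERIC and t2 in NUMERIC:
        return "N"
    return "Z"

def f_plus(t1, t2):
    """Assumed transition function F+ for binary addition."""
    if "Z" in (t1, t2):
        return "Z"
    if "UD" in (t1, t2):
        return "UD"
    if t1 == t2 == "I":
        return "I"
    if "N" in (t1, t2):
        return "N"
    if t1 in NUMERIC and t2 in NUMERIC:
        return "R"
    return "Z"

def type_analysis(prog, type_out, defs, uses):
    """prog: label -> (lhs, [input vars]); type_out: initial TYPE(S)."""
    type_out = dict(type_out)
    type_in = {}                                   # TYPE(v,T)
    pile = {s for s in prog if type_out[s] != "UD"}
    while pile:
        s = pile.pop()
        v = prog[s][0]
        for t in uses[s]:
            old = type_in.get((v, t), "UD")
            new = reduce(merge, (type_out[w] for w in defs[(v, t)]), "UD")
            type_in[(v, t)] = new
            if new != old:                         # a type refinement
                args = [type_in.get((w, t), "UD") for w in prog[t][1]]
                type_out[t] = f_plus(*args)
                pile.add(t)
    return type_in, type_out

# if ... then d1: x := 1 else d2: x := 2.5;  d3: y := x + x
prog = {"d1": ("x", []), "d2": ("x", []), "d3": ("y", ["x", "x"])}
type_out0 = {"d1": "I", "d2": "R", "d3": "UD"}
defs = {("x", "d3"): {"d1", "d2"}}
uses = {"d1": {"d3"}, "d2": {"d3"}, "d3": set()}
type_in, type_out = type_analysis(prog, type_out0, defs, uses)
```

Because x may be an integer or a real at d3, the analysis can only conclude that x and y are numbers (N) there.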
Algorithm CP2: A second algorithm for
constant propagation
• It is also possible to do constant propagation (and
type inference) without using reaching
definitions.
• The approach would follow an iterative
procedure where, for each statement S and until
convergence, we compute
– in[S] as the merge of the out of the predecessors of S
in the control flow graph, and
– out[S] in terms of a value in[S] as specified next
• For constant propagation, we use not a set but a
map as follows.
Algorithm CP2
Compute out[S] in terms of in[S]
• For constant propagation the values of in[S] and out[S] are
– in[S]: V → ℝ ∪ {nonconstant, undefined}
– out[S]: V → ℝ ∪ {nonconstant, undefined}
– V is the set of variables in the program.
• The transfer functions are created from the type of operation in the
flowgraph:
– if no definitions then out[S]=in[S].
– if S = x=c then out[S](w)=in[S](w) ∀w ≠ x and out[S](x) = c.
– if S = x=y+z then out[S](w)=in[S](w) ∀w ≠ x and
out[S](x)=in[S](y)+in[S](z)
    • Here + is extended to ℝ ∪ {nonconstant, undefined} as follows:
    • nonconstant + a = nonconstant,
    • undefined + a = undefined,
    • nonconstant + undefined = nonconstant
– if S = read(x) then out[S](w)=in[S](w) ∀w ≠ x and out[S](x)=nonconstant.
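The transfer functions above can be sketched with plain dictionaries as the maps in[S] and out[S]. The tuple encoding of statements ("const", "add", "read", "skip") and the names NC and UNDEF are assumptions made for this example.

```python
NC, UNDEF = "nonconstant", "undefined"

def add(a, b):
    """+ extended to numbers together with nonconstant and undefined."""
    if NC in (a, b):
        return NC
    if UNDEF in (a, b):
        return UNDEF
    return a + b

def transfer(stmt, env):
    """Compute out[S] from in[S]; stmt is a hypothetical tuple encoding."""
    out = dict(env)                  # out[S](w) = in[S](w) for all w ≠ x
    kind, *rest = stmt
    if kind == "const":              # x = c
        x, c = rest
        out[x] = c
    elif kind == "add":              # x = y + z
        x, y, z = rest
        out[x] = add(env[y], env[z])
    elif kind == "read":             # read(x)
        (x,) = rest
        out[x] = NC
    elif kind != "skip":             # "skip" has no definitions
        raise ValueError(f"unknown statement: {stmt!r}")
    return out

env = {"x": UNDEF, "y": UNDEF, "z": UNDEF}
for stmt in [("const", "x", 2), ("const", "y", 3), ("add", "z", "x", "y")]:
    env = transfer(stmt, env)
after_add = env
env = transfer(("read", "x"), env)
env = transfer(("add", "z", "x", "y"), env)
```

After the first three statements z is known to be 5. Once x is read from input, recomputing z = x + y yields nonconstant.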
Algorithm CP2
Merge operation
• The merge operation is associative so we only
need to define it for pairs:
– M = out[S1] merged with out[S2]
– Where M(x) is defined by the table:

out[S1](x) \ out[S2](x) | nonconstant | d ∈ ℝ                          | undefined
nonconstant             | nonconstant | nonconstant                    | nonconstant
c ∈ ℝ                   | nonconstant | if c=d then c else nonconstant | c
undefined               | nonconstant | d                              | undefined
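The merge table can be transcribed directly into a small function, as in the following sketch. The names merge_value and merge_maps are hypothetical, and the maps are assumed to be dictionaries over the same set of variables.

```python
NC, UNDEF = "nonconstant", "undefined"

def merge_value(a, b):
    """Merge two abstract values according to the table for M(x)."""
    if NC in (a, b):
        return NC
    if a == UNDEF:
        return b
    if b == UNDEF:
        return a
    return a if a == b else NC

def merge_maps(m1, m2):
    """Pointwise merge of two maps defined on the same variables."""
    return {x: merge_value(m1[x], m2[x]) for x in m1}

# Two branches of an if: both set x to 1, but they disagree on y.
then_out = {"x": 1, "y": 2, "z": UNDEF}
else_out = {"x": 1, "y": 3, "z": 7}
merged = merge_maps(then_out, else_out)
```

In the merged map x stays the constant 1, y becomes nonconstant, and z takes the value 7 from the only branch that defines it.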