Advanced Compiler Techniques
Inter-procedural Analysis
LIU Xianhua
School of EECS, Peking University
Topics
 Up to now
 Intra-procedural analysis
 Dataflow analysis
 PRE
 Loops
 SSA
 Just for individual procedures
 Today: Inter-procedural analysis
 across/between procedures
Modularity is a Virtue
 Decomposing programs into procedures aids
in readability and maintainability
 Object-oriented languages have pushed this
trend even further
 In a good design, procedures should be:
 An interface
 A black box
The Catch
 This inhibits optimization!
 The compiler must assume:
 Called procedure may use or change any
accessible variable
 Procedure’s caller provides arbitrary values as
parameters
 Interprocedural optimizations – use the
calling relationships between procedures to
optimize one or both of them
Recall
 Function calls can affect our points-to sets
p1 = &x;
p2 = &p1;
...
foo();
 Be conservative
 – Lose a lot of information
Applications of IPA
 Virtual method invocation
 Pointer alias analysis
 Parallelization
 Detecting software errors and vulnerabilities
 SQL injection
 Buffer overflow analysis & protection
Basic Concepts
 Procedure (Function)
 Caller/Callee
 Call Site
 Call Graph
 Call Context
 Call Strings
 Formal Arguments
 Actual Arguments
Terminology
Goal
 – Avoid making overly conservative assumptions about the effects
of procedures and the state at call sites
int a, e                  // globals
procedure foo(var b, c)   // formal args
  b := c
end
program main
  int d                   // locals
  foo(a, d)               // call site with actual args
end
 In procedure body
 formals and/or globals may be aliased (two names refer to the
same location)
 formals may have constant value
 At procedure call
 global vars may be modified or used
 actual args may be modified or used
Interprocedural Analysis vs.
Interprocedural Optimization
 Interprocedural analysis
 Gather information across multiple procedures
 (typically across the entire program)
 Can use this information to improve
intraprocedural analysis and optimization (e.g.,
CSE)
 Interprocedural optimizations
 Optimizations that involve multiple procedures
e.g., Inlining, procedure cloning, interprocedural
register allocation
 Optimizations that use interprocedural analysis
The Call Graph
 Represent procedure call relationship
by call graph
 G = (V,E,start)
 Each procedure is a unique vertex
 Call site = edge between caller & callee
 (u,v) = call from u to v (u may call v)
 Can label with source line
 Cycles represent recursion
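A minimal sketch of this representation, assuming the call sites have already been extracted as (caller, callee, source line) triples. The program and the helper names `build_call_graph` and `recursive_procedures` are hypothetical, chosen only for illustration.

```python
def build_call_graph(calls):
    """Build adjacency lists from (caller, callee, line) call-site triples."""
    graph = {}
    for caller, callee, line in calls:
        graph.setdefault(caller, []).append((callee, line))
        graph.setdefault(callee, [])  # every procedure is a vertex
    return graph


def recursive_procedures(graph):
    """Return the procedures that lie on a cycle of the call graph."""
    def reaches(src, target):
        stack = [callee for callee, _ in graph[src]]
        seen = set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(callee for callee, _ in graph[node])
        return False

    return {proc for proc in graph if reaches(proc, proc)}


# Hypothetical program: main calls foo and baz; foo and bar call each other.
calls = [("main", "foo", 3), ("main", "baz", 4),
         ("foo", "bar", 8), ("bar", "foo", 12)]
graph = build_call_graph(calls)
recursive = recursive_procedures(graph)  # foo and bar are mutually recursive
```

Each edge keeps its source line as a label, and a procedure counts as recursive exactly when it can reach itself through one or more call edges.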
Call Graph
Super Graph
Validity of Interprocedural
Control Flow Paths
Safety, Precision, and Efficiency
of Data Flow Analysis
 Data flow analysis uses static representation of programs to
compute summary information along paths
 Ensuring Safety. All valid paths must be covered.
 Valid path: a path which represents legal control flow
 Ensuring Precision. Only valid paths should be covered.
 Subject to merging data flow values at shared program points
without creating invalid paths
 Ensuring Efficiency. Only relevant valid paths should be covered.
 Relevant path: a path which yields information that affects the
summary information
Flow and Context Sensitivity
 Flow sensitive analysis:
 Considers intraprocedurally valid paths
 Context sensitive analysis:
 Considers interprocedurally valid paths
 For maximum statically attainable precision, analysis must be
both flow and context sensitive.
Context Sensitivity in
Interprocedural Analysis
Example of Context Sensitivity
Staircase Diagrams of
Interprocedurally Valid Paths
“You can descend only as much as you have ascended!”
Every descending step must match a corresponding
ascending step.
Context Sensitivity in
Presence of Recursion
• For a path from u to v, g must be applied exactly the same
number of times as f.
• For a prefix of the above path, g can be applied at most as
many times as f.
Interprocedural Analysis
 Goals
 Enable standard optimizations even with
procedure calls
 Reduce call overhead for procedures
 Enable optimizations not possible for single
procedures
 Optimizations
 Register allocation
 Loop transformations
 CSE, etc.
Analysis Sensitivity
 Flow-insensitive
 What may happen (on at least one path)
 Linear-time
 Flow-sensitive
 Consider control flow (what must happen)
 Iterative data-flow: possibly exponential
 Context-insensitive
 Call treated the same regardless of caller
 “Monovariant” analysis
 Context-sensitive
 Reanalyze callee for each caller
 “Polyvariant” analysis
 Path-sensitive vs. path-insensitive
 Path-sensitive: computes a separate answer for each execution path
 Subsumes flow-sensitivity
 Extremely expensive
 More sensitivity ⇒ more accuracy, but more expensive
Increasing Precision in
Data Flow Analysis
actually, only
caller sensitive
Precision of IPA
 Flow-insensitive
 result not affected by control flow in procedure
 Flow-sensitive
 result affected by control flow in procedure
Context Sensitivity
 Re-analyze callee as if procedure was inlined
a = id(3);                       // a is 3
b = id(4);                       // b is 4
id(x) { return x; }
a = min(3, 4);                   // ints
s = min("aardvark", "vacuum");   // strings
min(x, y) { if (x <= y) return x; else return y; }
 Too expensive in space & time
 Recursion?
 Approximate context sensitivity:
 Reanalyze callee for k levels of calling context
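The id example can be replayed as a small constant-propagation sketch. It assumes a three-level lattice (no information, a constant, not-a-constant). The names `NAC`, `meet`, `id_summary`, and the two analysis functions are hypothetical and are not taken from any particular compiler.

```python
NAC = "not-a-constant"


def meet(a, b):
    """Combine two facts; None means no information yet."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else NAC


def id_summary(x):
    """Transfer function of id(x) { return x; }."""
    return x


# a = id(3); b = id(4);  (result variable -> actual argument)
call_sites = {"a": 3, "b": 4}


def context_insensitive(call_sites):
    """Analyze id once, with the arguments of all callers merged."""
    merged = None
    for arg in call_sites.values():
        merged = meet(merged, arg)
    result = id_summary(merged)
    return {var: result for var in call_sites}


def context_sensitive(call_sites):
    """Reanalyze id separately for each call site, as if it were inlined."""
    return {var: id_summary(arg) for var, arg in call_sites.items()}


insensitive = context_insensitive(call_sites)
sensitive = context_sensitive(call_sites)
```

Merging the arguments 3 and 4 loses both constants, while the per-call-site analysis keeps a = 3 and b = 4 at the cost of analyzing the callee twice.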
Path Sensitivity
 Path-sensitive analysis
 – Computes an answer for every path:
 – x is 4 at the end of the left path
 – x is 5 at the end of the right path
 Path-insensitive analysis
 – Computes one answer for all paths:
 – x is not constant
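A rough sketch of the difference, assuming a branch whose left arm assigns x = 4 and whose right arm assigns x = 5. The path encoding and the function names are hypothetical.

```python
NAC = "not-a-constant"

# Each path is the list of assignments executed along it (assumed program).
paths = {"left": [("x", 4)], "right": [("x", 5)]}


def run_path(assignments):
    """Evaluate constant assignments along one path."""
    env = {}
    for var, value in assignments:
        env[var] = value
    return env


def path_sensitive(paths):
    """Keep one environment per path."""
    return {name: run_path(steps) for name, steps in paths.items()}


def path_insensitive(paths):
    """Merge all paths into a single environment at the join point."""
    envs = [run_path(steps) for steps in paths.values()]
    merged = {}
    for var in set().union(*envs):
        # A variable missing on some path is treated as not constant.
        values = {env.get(var, NAC) for env in envs}
        merged[var] = values.pop() if len(values) == 1 else NAC
    return merged


per_path = path_sensitive(paths)
joined = path_insensitive(paths)
```

The per-path result grows with the number of paths, which is why path sensitivity is so expensive on real control flow.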
Key Challenges for Interprocedural Analysis
 Compilation time, memory
 Key problem: scalability to large programs
 Dominated by analysis time/memory
 Flow-sensitive analysis: bottleneck often memory, not
time
 Often limited to fast but imprecise analysis
 Multiple calling environments
 Different calls to P() have different properties:
 Known constants
 Aliases
 Surrounding execution context (e.g., enclosing loops)
 Function pointer arguments
 Frequency of the call
 Recursion
Brute Force: Full Context-Sensitive
Interprocedural Analysis
 Invocation Graph [Emami94]
 Use an invocation graph, which distinguishes all
calling chains
 Re-analyze callee for all distinct calling paths
 Pro: precise
 Cons: exponentially expensive, recursion is tricky
Middle Ground: Use Call Graph and
Compute Summaries
 Goal
 Represent procedure call relationships
 Definition
 If program P consists of n procedures p1, . . ., pn
 Static call graph of P is G_P = (N, S, E, r)
 – N = {p1, . . ., pn}
 – S = {call-site labels}
 – E ⊆ N × N × S
 – r ∈ N is the start node
Summary Information
 Compute summary information for each procedure
Summarize effect of called procedure for callers
Summarize effect of callers for called procedure
 Store summaries in database
Use later when optimizing procedures
 Pros
+ Concise
+ Can be fast to compute and use
+ Separate compilation practical
 Cons
– Imprecise if only have one summary per procedure
Two Types of Information
 Track info that flows into procedures
 “Propagation problems”, e.g.:
 which formals are constant?
 which formals are aliased to globals?
 Track info that flows out of procedures
 “Side effect problems”, e.g.:
 which globals defined/used by procedure?
 which locals defined/used by procedure?
 which actual parameters defined by procedure?
proc(x, y) { . . . }
Propagation Summaries: Examples
 MAY-ALIAS
 Formals that may be aliased to globals
 MUST-ALIAS
 Formals definitely aliased to globals
 CONSTANT
 Formals that are definitely constant
Side-Effect Summaries: Examples
 MOD
 Variables possibly modified (defined) by
procedure call
 REF
 Variables possibly referenced (used) by
procedure
 KILL
 Variables that are definitely killed in
procedure
Computing Summaries
 Bottom-up (MOD, REF, KILL)
 Summarizes call effects
 Top-down (MAY-ALIAS)
 Summarizes information
about caller
 Bi-directional (AVAIL,
CONSTANT)
 Info to/from caller & callee
Side-Effect Summarization
 At procedure boundaries:
 Translate formal args to actuals at call
site
 Compute:
 GMOD, GREF = procedure side effects
 MOD, REF = effects at call site
 Possibly specific to call
Parameter Binding
 At procedure boundaries, we need to translate
formal arguments of procedure to actual arguments
of procedure at call site
int a,b
program main
  foo(b)                 // MOD(foo) = b
                         // REF(foo) = a,b
end
procedure foo (var c)    // GMOD(foo)= b
                         // GREF(foo)= a,b
  int d
  d := b
  bar(b)                 // MOD(bar) = b
                         // REF(bar) = a
end
procedure bar (var d)    // GMOD(bar)= d
                         // GREF(bar)= a
  if (...)
    d := a
end
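The annotations in this example can be reproduced with a small fixed-point sketch. It assumes each procedure's direct effects are already known and it ignores aliasing. The `procs` table, `bind`, and `summarize` are hypothetical names introduced for illustration.

```python
# Direct (intraprocedural) effects of the example program, entered by hand.
procs = {
    "main": {"formals": [], "locals": set(), "mod": set(), "ref": set(),
             "calls": [("foo", ["b"])]},
    "foo": {"formals": ["c"], "locals": {"d"}, "mod": {"d"}, "ref": {"b"},
            "calls": [("bar", ["b"])]},
    "bar": {"formals": ["d"], "locals": set(), "mod": {"d"}, "ref": {"a"},
            "calls": []},
}


def bind(procs, effects, callee, actuals):
    """Translate the callee's formals to the actuals of one call site."""
    mapping = dict(zip(procs[callee]["formals"], actuals))
    return {mapping.get(var, var) for var in effects}


def summarize(procs):
    """Iterate to a fixed point of GMOD/GREF over the call graph."""
    gmod = {name: set() for name in procs}
    gref = {name: set() for name in procs}
    changed = True
    while changed:
        changed = False
        for name, proc in procs.items():
            mod, ref = set(proc["mod"]), set(proc["ref"])
            for callee, actuals in proc["calls"]:
                mod |= bind(procs, gmod[callee], callee, actuals)
                ref |= bind(procs, gref[callee], callee, actuals)
            # Effects on locals are invisible to callers.
            mod -= proc["locals"]
            ref -= proc["locals"]
            if mod != gmod[name] or ref != gref[name]:
                gmod[name], gref[name] = mod, ref
                changed = True
    return gmod, gref


gmod, gref = summarize(procs)
mod_at_bar_call = bind(procs, gmod["bar"], "bar", ["b"])
ref_at_foo_call = bind(procs, gref["foo"], "foo", ["b"])
```

GMOD(bar) stays in terms of the formal d, and only the binding at the call site bar(b) turns it into MOD = {b}, matching the comments in the example.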
Constructing Summary Flow Functions
Iteratively
Termination is possible only if all function compositions
and confluences can be reduced to a finite set of functions
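One family that satisfies this condition is the gen/kill (bit-vector) functions f(x) = (x − kill) ∪ gen: composing or merging two of them yields another function of the same form, and only finitely many (gen, kill) pairs exist over a finite set of variables. The sketch below assumes union as the confluence, as in liveness. The class `GenKill` and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenKill:
    gen: frozenset
    kill: frozenset

    def __call__(self, x):
        """Apply f(x) = (x - kill) | gen."""
        return (frozenset(x) - self.kill) | self.gen

    def then(self, other):
        """Composition: apply self first, then other."""
        return GenKill((self.gen - other.kill) | other.gen,
                       self.kill | other.kill)

    def join(self, other):
        """Confluence by union of the two results."""
        return GenKill(self.gen | other.gen, self.kill & other.kill)


f = GenKill(gen=frozenset({"a"}), kill=frozenset({"b"}))
g = GenKill(gen=frozenset({"c"}), kill=frozenset({"a"}))
x = frozenset({"a", "b", "d"})

composed = f.then(g)(x)   # same as g(f(x))
merged = f.join(g)(x)     # same as f(x) | g(x)
```

Because a summary is just a (gen, kill) pair, iterating over summaries climbs a finite lattice and must stop.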
An Example of Interprocedural
Liveness Analysis
e ∈ InSp but e ∉ Inc1
Interprocedural Validity and
Calling Contexts
“You can descend only as much as you have ascended!”
Every descending step must match a corresponding ascending step.
Calling context is represented by the remaining descending steps.
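The matching rule can be sketched as a stack check. This assumes that ascending steps are calls and descending steps are returns, and that the path starts at the entry of the main procedure. The path encoding and the name `calling_context` are hypothetical.

```python
def calling_context(path):
    """Check a path of ("call", site) / ("return", site) steps.

    Returns the stack of call sites that still await their return (the
    calling context), or None if some return does not match the most
    recent unmatched call.
    """
    stack = []
    for kind, site in path:
        if kind == "call":
            stack.append(site)
        elif kind == "return":
            if not stack or stack[-1] != site:
                return None  # descended further than ascended, or wrong site
            stack.pop()
    return stack


valid = calling_context([("call", "c1"), ("call", "c2"), ("return", "c2")])
mismatched = calling_context([("call", "c1"), ("return", "c2")])
too_deep = calling_context([("return", "c1")])
```

For the first path the remaining context is the single call site c1. The other two paths are rejected because a return has no matching call.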
Available Expressions Analysis Using Call
Strings Approach
int a, b, t;
void p()
{
    if (a == 0)
    {
        a = a - 1;
        p();
        t = a * b;
    }
}
Is a * b available? YES!
Alternatives to IPA: Inlining
 Replaces calls to procedures with copies of
their bodies
 Converts calls from opaque objects to local
code
 Exposes the “effects” of the called procedure
 Extends the compilation region
 Language support: the inline attribute
 But the compiler can decide per call-site, rather
than per procedure
Inlining Decisions
 Must be based on
 Heuristics, or
 Profile information
 Considerations
 The size of the procedure body (smaller=better)
 Number of call sites (1=usually wins)
 If call site is in a loop (yes=more optimizations)
 Constant-valued parameters
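These considerations could be combined into a per-call-site heuristic along the following lines. This is only a sketch: the `CallSite` fields, the size budget, and the bonus values are all invented for illustration and are not taken from any real compiler.

```python
from dataclasses import dataclass


@dataclass
class CallSite:
    callee_size: int        # e.g. number of IR instructions in the callee
    callee_call_sites: int  # static call sites of the callee in the program
    in_loop: bool
    constant_args: int      # actual arguments that are known constants


def should_inline(site, size_budget=40):
    """Decide whether to inline one call site (hypothetical thresholds)."""
    if site.callee_call_sites == 1:
        # Only caller: the out-of-line copy can be dropped afterwards.
        return True
    budget = size_budget
    if site.in_loop:
        budget *= 2                     # hot call: tolerate a larger body
    budget += 10 * site.constant_args   # constants enable folding after inlining
    return site.callee_size <= budget


single_caller = should_inline(CallSite(500, 1, False, 0))
cold_large = should_inline(CallSite(60, 3, False, 0))
hot_large = should_inline(CallSite(60, 3, True, 0))
```

The decision is made per call site rather than per procedure, so the same callee may be inlined inside a loop and left as a call elsewhere.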
Inlining Policies
The hard question
– How do we decide which calls to inline?
Many possible heuristics
– Only inline small functions
– Let the programmer decide using an inline directive
– Use a code expansion budget [Ayers, et al ’97]
– Use profiling or instrumentation to identify hot
paths—inline along the hot paths [Chang, et al ’92]
 – JIT compilers do this
 – Use inlining trials for object oriented languages [Dean
& Chambers ’94]
 – Keep a database of functions, their parameter
types, and the benefit of inlining
 – Keeps track of indirect benefit of inlining
 – Effective in an incrementally compiled language
Study on Real Compilers
Cooper, Hall, Torczon (92)
 Eight Programs, five compilers, five processors
 Eliminated 99% of dynamic calls in 5 of the
programs
 Measured speed of original vs. transformed code
What do you expect?
Results on real compilers
What happened?
 Input code violated assumptions made by compiler
writers
 Longer procedures
 More names
 Different code shapes
 Exacerbated problems that are unimportant on “normal” code
 Imprecise analysis
 Algorithms that scale poorly
 Tradeoffs between global and local speed
 Limitations in the implementations
 The compiler writers were surprised!
Inlining: Summary
 Pros
+ Exposes context & side effects
+ Simple
 Cons
- Code bloat (bad for caches, branch predictor)
- Can’t decide statically for OO programs (dynamic dispatch)
- Library source?
- Recursion?
- How do we decide when to inline?
Alternatives to IPA: Cloning
 Cloning: customize procedure for certain call sites
 Partition call sites to procedure p into equivalence
classes
 e.g., {{call3, call1}, {call4}}
 Equivalence based on optimization
 Constant propagation: partition based on parameter
value
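A minimal sketch of the partitioning step for constant propagation, assuming each call site is described by the constant values of its actual arguments, with None marking a non-constant. The call-site data and the name `partition_call_sites` are hypothetical.

```python
from collections import defaultdict


def partition_call_sites(call_sites):
    """Group call sites of one procedure by their constant-argument vector."""
    classes = defaultdict(list)
    for site, args in call_sites.items():
        classes[args].append(site)
    return list(classes.values())


# Calls to p: call1 and call3 pass the constant 0, call4 passes 8.
call_sites = {
    "call1": (0, None),
    "call3": (0, None),
    "call4": (8, None),
}
clones = partition_call_sites(call_sites)
```

Each equivalence class would get its own clone of p, specialized for the constants shared by the call sites in that class.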
Cloning
 Pros
+ Compromise between inlining & IPA
+ Less code bloat compared to inlining
+ No problem with recursion
+ Better caller/callee optimization potential (compared to IPA)
 Cons
- Some code bloat (compared to IPA)
- May have to do interprocedural analysis anyway
  e.g. Interprocedural constant propagation can guide cloning
Summary
 Interprocedural analysis
 Difficult and expensive
 Need source code, recompilation analysis
 Trade-offs for precision & speed/space
 Better than inlining
 Useful for many optimizations
 IPA and cloning likely to become more
important
 Java: many small procedures
Summary
 Most compilers avoid interprocedural analysis
– It’s expensive and complex
– Not beneficial for most classical optimizations
– Separate compilation + interprocedural analysis requires
recompilation analysis [Burke and Torczon ’93]
– Can’t analyze library code
 When is it useful?
– Pointer analysis
– Constant propagation
– Object oriented class analysis
– Security and error checking
– Program understanding and re-factoring
– Code compaction
– Parallelization
Several of these are “modern” uses of compilers.
Trends
 Cost of procedures is growing
 – More of them and they’re smaller (OO
languages)
 – Modern machines demand precise
information (memory op aliasing)
 Cost of inlining is growing
 – Code bloat degrades efficacy of many
modern structures
 – Procedures are being used more extensively
 Programs are becoming larger
 Cost of interprocedural analysis is shrinking
 – Faster machines
 – Better methods
Next Time
 Homework
 Convert program to SSA form
 Exercise 12.1.1
 Pointer Analysis
 Reading: Dragon chapter 12
 Mid-term Review