PPTX

Fall 2016-2017 Compiler Principles
Lecture 5: Intermediate Representation
Roman Manevich
Ben-Gurion University of the Negev
Tentative syllabus
Front
End
Intermediate
Representation
Optimizations
Code
Generation
Scanning
Operational
Semantics
Dataflow
Analysis
Register
Allocation
Top-down
Parsing (LL)
Lowering
Loop
Optimizations
Energy
Optimization
Bottom-up
Parsing (LR)
mid-term
Instruction
Selection
exam
2
Previously
• Becoming parsing ninjas
– Going from text to an Abstract Syntax Tree
By Admiral Ham [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
3
From scanning to parsing
59 + (1257 * xPosition)
program text
Lexical
Analyzer
token stream
Grammar:
E  id
E  num
EE+E
EE*E
E(E)
num
+
(
num
*
id
)
Parser
valid
syntax
error
+
num
Abstract Syntax Tree
*
num
x
4
Agenda
• The role of intermediate representations
• Two example languages
– A high-level language
– An intermediate language
• Lowering
• Correctness
– Formal meaning of programs
5
Role of intermediate representation
• Bridge between front-end and back-end
High-level
Language
Lexical
Analysis
Syntax
Analysis
Parsing
AST
Symbol
Table
etc.
Inter.
Rep.
(IR)
Code
Generation
Executable
Code
(scheme)
• Allow implementing optimizations
independent of source language and
executable (target) language
6
Motivation for intermediate
representation
7
Intermediate representation
• A language that is between the source language and
the target language
– Not specific to any source language or machine language
• Goal 1: retargeting compiler components for
different source languages/target machines
Java
C++
Pyhton
Pentium
IR
Java bytecode
Sparc
8
Intermediate representation
• A language that is between the source language and
the target language
– Not specific to any source language or machine language
• Goal 1: retargeting compiler components for
different source languages/target machines
• Goal 2: machine-independent optimizer
– Narrow interface: small number of node types
(instructions)
Lowering
Java
C++
Pyhton
Code Gen.
optimize
IR
Pentium
Java bytecode
Sparc
9
Multiple IRs
• Some optimizations require high-level
constructs while others more appropriate on
low-level code
• Solution: use multiple IR stages
AST
optimize
optimize
HIR
LIR
Pentium
Java bytecode
Sparc
10
Multiple IRs example
Elixir – a language for parallel graph algorithms
Elixir
Program
Delta
Inferencer
Query Answer
Automated
Reasoning
(Boogie+Z3)
Elixir
Program
+ delta
HIR
Lowering
HIR
IL
Synthesizer
Planning
Problem
Plan
LIR
C++
backend
C++ code
Automated
Planner
Galois
Library
11
AST vs. LIR for imperative languages
AST
LIR
• Rich set of language constructs
• Rich type system
• Declarations: types (classes,
interfaces), functions,
variables
• Control flow statements: ifthen-else, while-do, breakcontinue, switch, exceptions
• Data statements: assignments,
array access, field access
• Expressions: variables,
constants, arithmetic
operators, logical operators,
function calls
• An abstract machine language
• Very limited type system
• Only computation-related
code
• Labels and conditional/
unconditional jumps, no
looping
• Data movements, generic
memory access statements
• No sub-expressions, logical as
numeric, temporaries,
constants, function calls –
explicit argument passing
12
three address code
13
Three-Address Code IR
• A popular form of IR
Chapter 8
• High-level assembly where instructions have
at most three operands
• There exist other types of IR
– For example, IR based on acyclic graphs – more
amenable for analysis and optimizations
14
Base language: While
15
Syntax
n
x
Num
Var
numerals
program variables
A  n | x | A ArithOp A | (A)
ArithOp  - | + | * | /
B  true | false
| A = A | A A | B | B B | (B)
S  x := A | skip | S; S | { S }
| if B then S else S
| while B S
16
Example program
while x < y {
x := x + 1
{
y := x;
17
Intermediate language:
IL
18
Syntax
n
l
x
Num
Num
Temp
Var
Vn|x
R  V Op V
Op  - | + | * | / | = |
C  l: skip
| l: x := R
| l: Goto l’
| l: IfZ x Goto l’
| l: IfNZ x Goto l’
IR  C+
Numerals
Labels
Temporaries and variables
| << | >> | …
19
Intermediate language programs
• An intermediate program P has the form
1:c1
…
n:cn
• We can view it as a map from labels to individual
commands and write P(j) = cj
1:
2:
3:
4:
5:
6:
7:
8:
9:
t0 := 137
y := t0 + 3
IfZ x Goto 7
t1 := y
z := t1
Goto 9
t2 := y
x := t2
skip
20
Lowering
21
TAC generation
• At this stage in compilation, we have
– an AST
– annotated with scope information
– and annotated with type information
• We generate TAC recursively (bottom-up)
– First for expressions
– Then for statements
22
TAC generation for expressions
• Define a function cgen(expr) that generates
TAC that:
1. Computes an expression,
2. Stores it in a temporary variable,
3. Returns the name of that temporary
• Define cgen directly for atomic expressions
(constants, this, identifiers, etc.)
• Define cgen recursively for compound
expressions (binary operators, function calls,
etc.)
23
Translation rules for expressions
cgen(n) = (l: t:=n, t)
where l and t are fresh
cgen(x) = (l: t:=x, t)
where l and t are fresh
cgen(e1) = (P1, t1)
cgen(e2) = (P2, t2)
where l and t are fresh
cgen(e1 op e2) = (P1· P2· l: t:=t1 op t2, t)
24
cgen for basic expressions
• Maintain a counter for temporaries in c, and a
counter for labels in l
• Initially: c = 0, l = 0
cgen(k) = { // k is a constant
c = c + 1, l = l +1
emit(l: tc := k)
return tc
{
cgen(id) = { // id is an identifier
c = c + 1, l = l +1
emit(l: tc := id)
return tc
{
25
Naive cgen for binary expressions
• cgen(e1 op e2) = {
let A = cgen(e1)
let B = cgen(e2)
c = c + 1, l = l +1
emit( l: tc := A op B; )
return tc
}
The translation emits code
to evaluate e1 before e2.
Why is that?
26
Example: cgen for binary expressions
cgen( (a*b)-d)
27
Example: cgen for binary expressions
c = 0, l = 0
cgen( (a*b)-d)
28
Example: cgen for binary expressions
c = 0, l = 0
cgen( (a*b)-d) = {
let A = cgen(a*b)
let B = cgen(d)
c = c + 1 , l = l +1
emit(l: tc := A - B; )
return tc
}
29
Example: cgen for binary expressions
c = 0, l = 0
cgen( (a*b)-d) = {
let A = {
let A = cgen(a)
let B = cgen(b)
c = c + 1, l = l +1
emit(l: tc := A * B; )
return tc
}
let B = cgen(d)
c = c + 1, l = l +1
emit(l: tc := A - B; )
return tc
}
30
Example: cgen for binary expressions
c = 0, l = 0
cgen( (a*b)-d) = {
here A=t1
let A = {
let A = {c=c+1, l=l+1, emit(l: tc := a;), return tc }
let B = {c=c+1, l=l+1, emit(l: tc := b;), return tc }
c = c + 1, l = l +1
emit(l: tc := A * B; )
return tc
}
let B = {c=c+1, l=l+1, emit(l: tc := d;), return tc }
c = c + 1, l = l +1
emit(l: tc := A - B; )
return tc
}
Code
31
Example: cgen for binary expressions
c = 0, l = 0
cgen( (a*b)-d) = {
here A=t1
let A = {
let A = {c=c+1, l=l+1, emit(l: tc := a;), return tc }
let B = {c=c+1, l=l+1, emit(l: tc := b;), return tc }
c = c + 1, l = l +1
emit(l: tc := A * B; )
return tc
}
let B = {c=c+1, l=l+1, emit(l: tc := d;), return tc }
c = c + 1, l = l +1
emit(l: tc := A - B; )
return tc
}
Code
1: t1:=a;
32
Example: cgen for binary expressions
c = 0, l = 0
cgen( (a*b)-d) = {
here A=t1
let A = {
let A = {c=c+1, l=l+1, emit(l: tc := a;), return tc }
let B = {c=c+1, l=l+1, emit(l: tc := b;), return tc }
c = c + 1, l = l +1
emit(l: tc := A * B; )
return tc
}
let B = {c=c+1, l=l+1, emit(l: tc := d;), return tc }
c = c + 1, l = l +1
emit(l: tc := A - B; )
return tc
}
Code
1: t1:=a;
2: t2:=b;
33
Example: cgen for binary expressions
c = 0, l = 0
cgen( (a*b)-d) = {
here A=t1
let A = {
let A = {c=c+1, l=l+1, emit(l: tc := a;), return tc }
let B = {c=c+1, l=l+1, emit(l: tc := b;), return tc }
c = c + 1, l = l +1
emit(l: tc := A * B; )
return tc
}
let B = {c=c+1, l=l+1, emit(l: tc := d;), return tc }
c = c + 1, l = l +1
emit(l: tc := A - B; )
return tc
}
Code
1: t1:=a;
2: t2:=b;
3: t3:=t1*t2
34
Example: cgen for binary expressions
c = 0, l = 0
here A=t3
cgen( (a*b)-d) = {
here A=t1
let A = {
let A = {c=c+1, l=l+1, emit(l: tc := a;), return tc }
let B = {c=c+1, l=l+1, emit(l: tc := b;), return tc }
c = c + 1, l = l +1
emit(l: tc := A * B; )
return tc
}
let B = {c=c+1, l=l+1, emit(l: tc := d;), return tc }
c = c + 1, l = l +1
emit(l: tc := A - B; )
return tc
}
Code
1: t1:=a;
2: t2:=b;
3: t3:=t1*t2
35
Example: cgen for binary expressions
c = 0, l = 0
here A=t3
cgen( (a*b)-d) = {
here A=t1
let A = {
let A = {c=c+1, l=l+1, emit(l: tc := a;), return tc }
let B = {c=c+1, l=l+1, emit(l: tc := b;), return tc }
c = c + 1, l = l +1
emit(l: tc := A * B; )
return tc
}
let B = {c=c+1, l=l+1, emit(l: tc := d;), return tc }
c = c + 1, l = l +1
emit(l: tc := A - B; )
return tc
}
Code
1: t1:=a;
2: t2:=b;
3: t3:=t1*t2
4: t4:=d
36
Example: cgen for binary expressions
c = 0, l = 0
here A=t3
cgen( (a*b)-d) = {
here A=t1
let A = {
let A = {c=c+1, l=l+1, emit(l: tc := a;), return tc }
let B = {c=c+1, l=l+1, emit(l: tc := b;), return tc }
c = c + 1, l = l +1
emit(l: tc := A * B; )
return tc
}
let B = {c=c+1, l=l+1, emit(l: tc := d;), return tc }
c = c + 1, l = l +1
emit(l: tc := A - B; )
return tc
}
Code
1: t1:=a;
2: t2:=b;
3: t3:=t1*t2
4: t4:=d
5: t5:=t3-t4
37
cgen for statements
• We can extend the cgen function to operate
over statements as well
• Unlike cgen for expressions, cgen for
statements does not return the name of a
temporary holding a value
– (Why?)
38
Syntax
n
x
Num
Var
numerals
program variables
A  n | x | A ArithOp A | (A)
ArithOp  - | + | * | /
B  true | false
| A = A | A A | B | B B | (B)
S  x := A | skip | S; S | { S }
| if B then S else S
| while B S
39
Translation rules for statements
cgen(skip) = l: skip
cgen(e) = (P, t)
cgen(x := e) = P · l: x :=t
where l is fresh
where l is fresh
cgen(S1) = P1, cgen(S2) = P2
cgen(S1; S2) = P1 · P2
40
Translation rules for conditions
cgen(b) = (Pb, t), cgen(S1) = P1, cgen(S2) = P2
cgen(if b then S1 else S2) =
Pb
IfZ t Goto lfalse
P1
where lfinish, lfalse, lafter
lfinish: Goto Lafter
are fresh
lfalse: skip
P2
lafter: skip
41
Translation rule for loops
cgen(b) = (Pb, t), cgen(S) = P
cgen(while b S) =
lbefore: skip
Pb
IfZ t Goto lafter
P
lloop: Goto Lbefore
lafter: skip
where lafter, lbefore, lloop
are fresh
42
Translation example
y := 137+3;
if x=0
z := y;
else
x := y;
1: t1 := 137
2: t2 := 3
3: t3 := t1 + t2
4: y := t3
5: t4 := x
6: t5 := 0
7: t6 := t4=t5
8: IfZ t6 Goto 12
9: t7 := y
10: z := t7
11: Goto 14
12: t8 := y
13: x := t8
14: skip
43
Translation Correctness
44
Compiler correctness
• Compilers translate programs in one language
(usually high) to another language (usually
lower) such that they are both equivalent
• Our goal is to formally define the meaning of
this equivalence
– First, we must formally define the meaning of
programs
45
Formal semantics
46
http://www.daimi.au.dk/~bra8130/Wiley_book/wiley.html
47
What is formal semantics?
“Formal semantics is concerned with rigorously
specifying the meaning, or behavior,
of programs, pieces of hardware, etc.”
48
Why formal semantics?
• Implementation-independent definition
of a programming language
• Automatically generating interpreters
(and some day maybe full fledged compilers)
• Optimization, verification, and debugging
– If you don’t know what it does, how do you know its
correct/incorrect?
– How do you know whether a given optimization is
correct?
49
Operational semantics
• Elements of the semantics
• States/configurations: the (aggregate) values
that a program computes during execution
• Transition rules: how the program advances
from one configuration to another
50
Operational semantics
of while
51
While syntax reminder
n
x
Num
Var
numerals
program variables
A  n | x | A ArithOp A | (A)
ArithOp  - | + | * | /
B  true | false
| A = A | A A | B | B B | (B)
S  x := A | skip | S; S | { S }
| if B then S else S
| while B do S
52
Semantic categories
Z
T
State
Integers {0, 1, -1, 2, -2, …}
Truth values {ff, tt}
Var Z
Example state:
Lookup:
Update:
 =[x 5, y 7, z 0]
(x) = 5
[x 6] = [x 6, y 7, z 0]
53
Semantics of expressions
54
Semantics of arithmetic expressions
• Semantic function  A  : State  Z
• Defined by induction on the syntax tree
n=n
 x   = (x)
 a1 + a2   =  a1   +  a2  
 a1 - a2   =  a1   -  a 2  
 a1 * a2   =  a1     a2  
 (a1)   =  a1   --- not needed
 - a   = 0 -  a1  
• Compositional
• Expressions in While are side-effect free
55
Arithmetic expression exercise
Suppose  x = 3
Evaluate x+1 
56
Semantics of boolean expressions
• Semantic function  B  : State  T
• Defined by induction on the syntax tree
 true   = tt
 false   = ff
 a1 = a2  =
 a1  a2   =
 b1  b2   =
 b=
• Compositional
• Expressions in While are side-effect free
57
Semantics
of statements
S,   
first step
By Vanillase (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
This file is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.
58
Operational semantics
• Developed by Gordon Plotkin
• Configurations:  has one of two forms:
S, 

Statement S is about to execute on state 
Terminal (final) state
• Transitions S,   
first step
 = S’, ’ Execution of S from  is not completed
and remaining computation proceeds
from intermediate configuration 
 = ’
Execution of S from  has terminated
and the final state is ’
• S,  is stuck if there is no  such that S,   
59
Form of semantic rules
•
defined by rules of the form
side condition
premise
S1, 1
1’, … , Sn, n
S,  ’
n’
if…
conclusion
• The meaning of compound statements is
defined using the meaning immediate
constituent statements
60
Operational semantics for While
x:=a,   [xa]
skip,   
[skip]
S1,   S1’, ’
1
[comp ]
S1; S2,   S1’; S2, ’
S1,   ’
2
[comp ]
S1; S2,   S2, ’
[ass]
When does
this happen?
[iftt]
if b then S1 else S2,   S1, 
if b  = tt
[ifff]
if b then S1 else S2,   S2, 
if b  = ff
[whilett] while b do S,   S; while b do S, 
if b  = tt
[whileff] while b do S,   
if b  = ff
61
Factorial (n!) example
• Input state  such that  x = 3
y := 1; while (x=1) do (y := y * x; x := x – 1)
y :=1 ; W, 
 W, [y1]
 ((y := y * x; x := x – 1); W), [y1]
 (x := x – 1; W), [y3]
 W , [y3][x2]
 ((y := y * x; x := x – 1); W), [y3] [x2]
 (x := x – 1; W) , [y6] [x2]
 W, [y6][x1]
 [y6][x1]
62
Derivation sequences
• An interpreter
– Given a statement S and an input state 
start with the initial configuration S, 
– Repeatedly apply rules until either a terminal state
is reached or an infinite sequence
• We will write S,  k ’ if ’ can be obtained
from S,  by k derivation steps
• We will write S,  * ’ if S,  k ’ for
some k0
63
The semantics of statements
• The meaning of a statement S is defined as
S  =
’
 else
if S,  * ’
• Examples:
skip = 
x:=1 = [x 1]
while true do skip = 
64
Properties of operational
semantics
65
Deterministic semantics for While
• Theorem: for all statements S and states 1, 2
if S,  * 1 and S,  * 2 then 1= 2
66
Program termination
• Given a statement S and input 
– S terminates on  if there exists a state ’ such
that S,  * ’
– S loops on  if there is no state ’ such that
S,  * ’
• Given a statement S
– S always terminates if
for every input state , S terminates on 
– S always loops if
for every input state , S loops on 
67
Semantic equivalence
• S1 and S2 are semantically equivalent if
for all :
S1,  *  if and only if S2,  * 
 is either stuck or terminal
and there is an infinite derivation sequence
for S1,  if and only if there is one for S2, 
• Examples
–
–
–
–
S; skip and S
while b do S and if b then (S; while b S) else skip
((S1; S2); S3) and (S1; (S2; S3))
(x:=5; y:=x*8) and (x:=5; y:=40)
68
Exercise
Why do we
need this?
69
Operational semantics
of IL
70
IL syntax reminder
n
l
x
Num
Num
Var Temp
Vn|x
R  V Op V |V
Op  - | + | * | / | = |
C  l: skip
| l: x := R
| l: Goto l’
| l: IfZ x Goto l’
IR  C+
Numerals
Labels
Variables and temporaries
| << | >> | …
71
Intermediate program states
Z
Integers {0, 1, -1, 2, -2, …}
IState
(Var  Temp  {pc}) Z
– Var, Temp, and {pc} are all disjoint
– For every state m and program
P=1:c1,…,n:cn
we have that 1  m(pc)  n+1
• We can check that the labels used in P are legal
72
Rules for executing commands
• We will use rules of the following form
m(pc) = l
P(l) = C
m  m’
• Here m is the pre-state, which is scheduled to
be executed as the program counter indicates,
and m’ is the post-state
• The rules specialize for the particular type of
command C and possibly other conditions
73
Evaluating values
nm = n
xm =
m(x)
v1 op v2m = v1m op
v2m
74
Rules for executing commands
m(pc) = l
P(l) = skip
m  m[pc l+1]
m(pc) = l
P(l) = x:=v
m  m[pc l+1, x vm]
m(pc) = l
P(l) = x:=v1 op v2
m  m[pc l+1, x v1m op
v2m]
m(pc) = l
P(l) = Goto l’
m  m[pc l’]
75
Rules for executing commands
m(pc) = l
P(l) = IfZ x Goto l’
m  m[pc l’]
m(x)=0
m(pc) = l
P(l) = IfZ x Goto l’
m  m[pc l+1]
m(x)0
m(pc) = l
P(l) = IfNZ x Goto l’ m(x)  0
m  m[pc l’]
m(pc) = l
P(l) = IfNZ x Goto l’ m(x)=0
m  m[pc l+1]
76
Executing programs
• For a program P=1:c1,…,n:cn
we define executions as finite or infinite
sequences m1  m2 …  mn …
• We write m * m’ if there is a finite execution
starting at m and ending at m’:
m = m1  m2 …  mn = m’
77
Semantics of a program
• For a program P=1:c1,…,n:cn
and a state m s.t. m(pc)=1
we define the result of executing P on m as
P m
=
m’

if m * m’ and m’(pc)=n+1
else
• Lemma: the function is well-defined
(i.e., at most one output state)
78
Execution example
• Execute the following intermediate language
program on a state where all variables
evaluate to 0
1:
2:
3:
4:
5:
6:
7:
8:
9:
t1 := 137
y := t1 + 3
IfZ x Goto 7
t2 := y
z := t2
Goto 9
t3 := y
x := t3
skip
m = [pc 1, t1 0 , t2 0 , t3 0 , x 0 , y 0]  ?
79
Execution example
• Execute the following intermediate language
program on a state where all variables
evaluate to 0
1:
2:
3:
4:
5:
6:
7:
8:
9:
t1 := 137
y := t1 + 3
IfZ x Goto 7
t2 := y
z := t2
Goto 9
t3 := y
x := t3
skip
m = [pc 1, t1 0 , t2 0 , t3 0 , x 0 , y 0] 
m[pc 2, t1 137] 
m[pc 3, t1 137, y 140] 
m[pc 7, t1 137, y 140] 
m[pc 8, t1 137, t3 140, y 140] 
m[pc 9, t1 137, t3 140, , x 140, y 140] 
m[pc 10, t1 137, t3 140, , x 140, y 140]
80
Adding procedures
81
While+procedures syntax
n
x
f
Num
Var
Numerals
Program variables
Function identifiers, including main
A  n | x | A ArithOp A | (A) | f(A*)
ArithOp  - | + | * | /
B  true | false
| A = A | A A | B | B B | (B)
S  x := A | skip | S; S | { S }
| if B then S else S
| while B S
Can be used to call side-effecting
functions and to return values
| f(A*)
from functions
*
F  f(x ) { S }
P ::= F+
82
IL+procedures syntax
n
l
x
f
Num
Num
Var Temp
Numerals
Labels and function identifiers
Params Variables, temporaries, and parameters
Function identifiers
Vn|x
R  V Op V | V | Call f(x*) | pn
Op  - | + | * | / | = | | << | >> | …
C  l: skip
| l: x := R
| l: Goto l’
| l: IfZ x Goto l’
| l: IfNZ x Goto l’
| l: Call f(x*)
| l: Ret x
IR  C+
function
parameter n
83
Exercise
84
Defining translation
correctness
85
Exercise 1
• Are the following equivalent in your opinion?
While
x := 137
IL
1: t1 := 137
2: x := t1
86
Exercise 2
• Are the following equivalent in your opinion?
While
x := 137
IL
1: y := 137
2: x := y
87
Exercise 3
• Are the following equivalent in your opinion?
While
x := 137
IL
2: t2 := 138
3: t1 := 137
4: x := t1
88
Exercise 4
• Are the following equivalent in your opinion?
While
x := 137
IL
2:
3:
4:
5:
t2 := 138
t1 := 137
x := t1
t1 := 138
89
Equivalence of arithmetic expressions
•
•
•
•
•
Define TVar = Var  Temp
While state:   Var Z
IL state: m  (Var  Temp  {pc}) Z
Define label(P) to be first label of IL program P
An arithmetic While expression a is equivalent to an
IL program P and xTVar iff for every input state m
such that m(pc)=label(P):
a m|Var = (P m) x
90
Statement equivalence
• We define state equivalence by
 m iff =m|Var
– That is, for each xVar (x)=m(x)
• A While statement S is equivalent to an IL program P
iff for every input state m such that m(pc)=label(P):
S m|Var  (P )|Var
91
Next lecture:
Dataflow-based Optimizations