Chapter 8
Intermediate Code
Basic Code Generation Techniques
Gang S. Liu
College of Computer Science & Technology
Harbin Engineering University
Introduction
 The final task of a compiler is to generate executable
code for a target machine that faithfully represents
the semantics of the source code.
 This is the most complex phase of a compiler.
 It depends on detailed information about
 the target architecture,
 the structure of the runtime environment,
 the operating system.
 Code generation also attempts to optimize the speed
and size of the target code by taking advantage of
special features of the target machine (registers,
addressing modes, pipelining, and cache memory).
Compiler Construction
[email protected]
Introduction (cont)
 Code generation is typically broken into
several steps, often involving an abstract
form of code called intermediate code.
 Two popular forms are
1. Three-address code
2. P-code
Intermediate Code
 A data structure that represents the source
program during translation is called an
intermediate representation (IR).
 So far, the abstract syntax tree has been used as
the principal IR.
 An abstract syntax tree does not resemble target
code.
 Example: control flow constructs.
 A new form of IR is therefore necessary.
 An intermediate representation that closely
resembles target code is called intermediate code.
Form of Intermediate Code
 Intermediate code is a linearization of the
syntax tree.
 Intermediate code
 Can be very high level, representing operations
almost as abstractly as the syntax tree, or can
closely resemble target code.
 May or may not use detailed information about
the target machine and runtime environment.
Use of Intermediate Code
 Intermediate code is useful
 For producing extremely efficient code
 In making a compiler more easily retargetable
(if intermediate code is relatively target
independent).
[Diagram: Source Language 1 and Source Language 2 are each translated to Intermediate Code, which is in turn translated to Target Language 1 and Target Language 2.]
 Intermediate code generation sits in the
middle of the compiler. It is a bridge: the
source program is first translated into an
intermediate representation, which is then
translated into target code. The position of
intermediate code generation in the compiler
is shown in Figure 8.1.
 Using intermediate code has two advantages:
 First, code generators for different target
machines can be attached to the same front end,
after the intermediate code generation phase.
 Second, a machine-independent code optimizer
can be applied to the intermediate
representation.
 Intermediate code is machine independent,
yet close to machine instructions. The
intermediate code generator converts a program
in the source language into an equivalent program
in the intermediate language.
 Many different languages can serve as the
intermediate language; the choice is made by
the compiler designer. Postfix notation,
four-address code (quadruples), three-address
code, portable code, and assembly code can all
be used as an intermediate language. This
chapter introduces them in detail.
8.1 Postfix Notation
 If the source program can be represented in
postfix notation, it is easy to translate into
target code, because the order of the target
instructions is the same as the order of the
operators in the postfix notation.
8.1.1 The definition of postfix notation
 The postfix notation for the expression a+b*c is
abc*+. Postfix notation obeys the following rules:
1 The operands appear in the same order as in the
original expression.
2 Each operator follows its operands, and postfix
notation contains no parentheses.
3 The operators appear in the order in which they
are evaluated.
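The three rules above are what make postfix easy to process: a single stack is enough to evaluate it. The following is a minimal Python sketch, not part of the original slides. It assumes single-letter operands, the four binary operators + - * /, and a hypothetical `env` dictionary that supplies the operand values.

```python
import operator

# Binary operators supported by this sketch (an assumption, not from the slides).
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_postfix(postfix, env):
    """Evaluate a postfix string such as 'abc*+' using one value stack."""
    stack = []
    for symbol in postfix:
        if symbol in OPS:
            right = stack.pop()  # the second operand is on top of the stack
            left = stack.pop()
            stack.append(OPS[symbol](left, right))
        else:
            stack.append(env[symbol])  # operand: push its value
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]
```

For instance, `eval_postfix("abc*+", {"a": 1, "b": 2, "c": 3})` computes 1+2*3. Each operator is applied as soon as it is read, which is why the operator order in postfix matches the evaluation order.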
 For example, the postfix notation for the expression
a*(b+c/d) is abcd/+*. The translation follows the
three rules above:
 First, by rule 1, the operands keep their original
order: abcd.
 Second, by rules 2 and 3, the first operator is /,
because it is evaluated first and directly follows its
operands c and d. The next operator is +: because of
the parentheses in the original expression, + is
evaluated before *. The last operator is *, because
it is evaluated last.
 As another example, the postfix notation for the
expression a*b+(c-d)/e is ab*cd-e/+. These
examples show that translating an expression
into postfix notation by hand is somewhat tricky.
The Dutch computer scientist E. W. Dijkstra
devised a method that solves the problem.
8.1.2 E. W. Dijkstra's Method
 Dijkstra's method uses two stacks: one stores
operands and the other stores operators. The
procedure is shown in Figure 8.2, and the steps
of the method are as follows:
 The expression is scanned from left to right.
Before scanning begins, the marker # is pushed
onto the bottom of the operator stack, and another
# is appended to the end of the expression to mark
where it ends. Scanning is finished when the two #
markers meet. For each symbol scanned:
1 If it is an operand, it goes to the operand stack.
2 If it is an operator, it is compared with the
operator on top of the operator stack. If the
priority of the operator on top of the stack is
greater than or equal to that of the scanned
operator, the top operator is popped and moved to
the left side (the output), and the comparison is
repeated with the new top of the stack. Otherwise,
when the priority of the top operator is lower,
the scanned operator is pushed onto the operator
stack.
3 If it is a left parenthesis, it is pushed onto the
operator stack, and the operators inside the
parentheses are then compared as in step 2.
 If it is a right parenthesis, all operators back to
the matching left parenthesis are popped. The
parentheses themselves are discarded and do not
appear in the postfix notation.
4 Return to step 1 until the two # markers meet.
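The steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the slides' own code: operands are single characters, the only operators are + - * /, the priority table `PRIORITY` is my own choice, and the `output` list plays the role of the side to which operands and popped operators are moved.

```python
# Priorities assumed for this sketch; '#' is the end marker from the slides.
PRIORITY = {"#": 0, "+": 1, "-": 1, "*": 2, "/": 2}


def to_postfix(expr):
    """Translate an infix expression to postfix with an operator stack."""
    output = []        # operands and popped operators, in final order
    operators = ["#"]  # '#' marks the bottom of the operator stack
    for symbol in expr.replace(" ", "") + "#":  # trailing '#' marks the end
        if symbol == "(":
            operators.append(symbol)
        elif symbol == ")":
            # Pop everything back to the matching left parenthesis.
            while operators[-1] != "(":
                if operators[-1] == "#":
                    raise ValueError("unbalanced parentheses")
                output.append(operators.pop())
            operators.pop()  # discard the '(' itself
        elif symbol in PRIORITY:
            # Pop while the top operator has greater or equal priority.
            while (operators[-1] not in ("(", "#")
                   and PRIORITY[operators[-1]] >= PRIORITY[symbol]):
                output.append(operators.pop())
            if symbol == "#":
                if operators[-1] != "#":
                    raise ValueError("unbalanced parentheses")
                break  # the two '#' markers meet: scanning is finished
            operators.append(symbol)
        else:
            output.append(symbol)  # operand
    return "".join(output)
```

Calling `to_postfix("a*(b+c/d)")` reproduces the earlier hand translation. A left parenthesis on the stack acts as a barrier, so operators inside the parentheses are never compared with operators outside them.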
Example 8.1
 The postfix notation of the expression a+b*c is
abc*+. The translation procedure in Figure 8.3
shows that the operators are output in the order
*+, which is both the order in which they are
popped from the operator stack and the order in
which they are evaluated.
Three-Address Code
 The most basic instruction of three-address
code is
x = y op z
 The use of the address x differs from that of
the addresses y and z: x receives the result,
while y and z supply the operands.
 y and z (but not x) can also represent
constants and literal values.
Example
2*a+(b-3)
Syntax tree: the root + has children * (with leaves 2 and a) and - (with leaves b and 3).
Left-to-right linearization
t1=2*a
t2=b-3
t3=t1+t2
Right-to-left linearization
t1=b-3
t2=2*a
t3=t2+t1
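The two linearizations in this example can be produced by a postorder walk of the syntax tree that allocates a new temporary for every interior node. The Python sketch below is illustrative only. It assumes a tree is either a string leaf or a tuple `(op, left, right)`, and the function names are hypothetical.

```python
from itertools import count


def linearize(tree, code, temps, left_to_right=True):
    """Append three-address instructions for `tree`; return its address."""
    if isinstance(tree, str):
        return tree  # a leaf is its own address (name or constant)
    op, left, right = tree
    if left_to_right:
        left_addr = linearize(left, code, temps, left_to_right)
        right_addr = linearize(right, code, temps, left_to_right)
    else:
        right_addr = linearize(right, code, temps, left_to_right)
        left_addr = linearize(left, code, temps, left_to_right)
    result = f"t{next(temps)}"  # fresh temporary for this node
    code.append(f"{result}={left_addr}{op}{right_addr}")
    return result


def three_address(tree, left_to_right=True):
    """Return the list of three-address instructions for an expression tree."""
    code = []
    linearize(tree, code, count(1), left_to_right)
    return code


TREE = ("+", ("*", "2", "a"), ("-", "b", "3"))  # 2*a+(b-3)
```

`three_address(TREE)` gives the left-to-right linearization and `three_address(TREE, left_to_right=False)` the right-to-left one. Only the order in which the subtrees are visited changes; the operand order within each instruction stays the same.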
Three-Address Code (cont)
 The form of three-address code must be
varied to express all constructs
(e.g. t2=-t1).
 No standard form exists.
Implementation of Three-Address Code
 Each three-address instruction is implemented as a
record structure containing several fields.
 The entire sequence is an array or a linked list.
 The most common implementation requires four
fields and is called a quadruple.
 One for the operation and three for addresses.
 For instructions that need fewer addresses, one or
more address fields are given null or “empty” values.
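A quadruple can be sketched as a small record type. The Python below is one possible layout, not a prescribed one. The class name `Quad` and the field names are hypothetical, unused address fields default to `None`, and the operation names follow the quadruple example in these slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quad:
    """One three-address instruction: an operation and up to three addresses."""
    op: str
    addr1: Optional[str] = None
    addr2: Optional[str] = None
    addr3: Optional[str] = None

    def __str__(self):
        # Print empty address fields as "_", as the slides do.
        fields = (self.op, self.addr1, self.addr2, self.addr3)
        return "(" + ", ".join("_" if f is None else f for f in fields) + ")"


# The whole program is simply a sequence (here a list) of records.
code = [
    Quad("rd", "x"),
    Quad("gt", "x", "0", "t1"),
    Quad("asn", "1", "fact"),
]
```

Printing `code[0]` shows `(rd, x, _, _)`. A Python list stands in for the array; a linked list would make it cheaper to insert or move instructions.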
Factorial Program
{ Sample program
  in TINY language computes factorial }
read x; { input an integer }
if 0 < x then { don't compute if x <= 0 }
  fact := 1;
  repeat
    fact := fact * x;
    x := x - 1
  until x = 0;
  write fact { output factorial of x }
end
Syntax Tree for Factorial Program
[Figure: syntax tree of the factorial program]
Example
TINY source                Quadruples
read x;                    (rd, x, _, _)
if 0 < x then              (gt, x, 0, t1)
                           (if_f, t1, L1, _)
  fact := 1;               (asn, 1, fact, _)
  repeat                   (lab, L2, _, _)
    fact := fact * x;      (mul, fact, x, t2)
                           (asn, t2, fact, _)
    x := x - 1             (sub, x, 1, t3)
                           (asn, t3, x, _)
  until x = 0;             (eq, x, 0, t4)
                           (if_f, t4, L2, _)
  write fact               (wri, fact, _, _)
end                        (lab, L1, _, _)
                           (halt, _, _, _)
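To check that the quadruples really compute a factorial, they can be run by a tiny interpreter. The Python sketch below is illustrative only. The meaning given to each operation (for example, `if_f` jumps when its first address is false, and `asn` copies its first address into its second) is my reading of the example above, and the function name `run_quads` is hypothetical.

```python
FACTORIAL = [
    ("rd", "x", "_", "_"), ("gt", "x", "0", "t1"), ("if_f", "t1", "L1", "_"),
    ("asn", "1", "fact", "_"), ("lab", "L2", "_", "_"),
    ("mul", "fact", "x", "t2"), ("asn", "t2", "fact", "_"),
    ("sub", "x", "1", "t3"), ("asn", "t3", "x", "_"),
    ("eq", "x", "0", "t4"), ("if_f", "t4", "L2", "_"),
    ("wri", "fact", "_", "_"), ("lab", "L1", "_", "_"), ("halt", "_", "_", "_"),
]


def run_quads(quads, inputs):
    """Interpret a list of quadruples; return the list of written values."""
    labels = {q[1]: i for i, q in enumerate(quads) if q[0] == "lab"}
    memory, output, pending = {}, [], iter(inputs)

    def value(addr):
        # An address is an integer constant or a variable/temporary name.
        return int(addr) if addr.lstrip("-").isdigit() else memory[addr]

    pc = 0
    while True:
        op, a, b, c = quads[pc]
        pc += 1
        if op == "rd":
            memory[a] = next(pending)
        elif op == "wri":
            output.append(value(a))
        elif op == "asn":
            memory[b] = value(a)
        elif op == "gt":
            memory[c] = value(a) > value(b)
        elif op == "eq":
            memory[c] = value(a) == value(b)
        elif op == "mul":
            memory[c] = value(a) * value(b)
        elif op == "sub":
            memory[c] = value(a) - value(b)
        elif op == "if_f":
            if not value(a):
                pc = labels[b]  # jump to the label when the test is false
        elif op == "halt":
            return output
        elif op != "lab":       # a label does nothing when executed
            raise ValueError(f"unknown operation {op}")
```

With the input 5, `run_quads(FACTORIAL, [5])` writes a single value, 120. With the input 0 the first `if_f` jumps straight to L1 and nothing is written.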
Different Representation
 The instructions themselves are used to represent
the temporaries: an operand refers to the result of
an earlier instruction by that instruction's index.
 This reduces the number of address fields
from three to two.
 Such a representation is called a triple.
 The amount of space needed is reduced.
 Major drawback: with an array representation,
moving instructions becomes difficult, because
triples refer to one another by position.
Example
Quadruples              Triples
(rd, x, _, _)           (0)  (rd, x, _)
(gt, x, 0, t1)          (1)  (gt, x, 0)
(if_f, t1, L1, _)       (2)  (if_f, (1), (11))
(asn, 1, fact, _)       (3)  (asn, 1, fact)
(lab, L2, _, _)
(mul, fact, x, t2)      (4)  (mul, fact, x)
(asn, t2, fact, _)      (5)  (asn, (4), fact)
(sub, x, 1, t3)         (6)  (sub, x, 1)
(asn, t3, x, _)         (7)  (asn, (6), x)
(eq, x, 0, t4)          (8)  (eq, x, 0)
(if_f, t4, L2, _)       (9)  (if_f, (8), (4))
(wri, fact, _, _)       (10) (wri, fact, _)
(lab, L1, _, _)
(halt, _, _, _)         (11) (halt, _, _)
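The conversion from quadruples to triples in this example can be sketched in Python. This is an illustrative sketch, not a general algorithm. It assumes that a non-empty third address is always the result temporary, that every temporary is assigned exactly once, and that `lab` quadruples are dropped, with each label standing for the index of the next remaining instruction. The function name is hypothetical.

```python
FACTORIAL = [
    ("rd", "x", "_", "_"), ("gt", "x", "0", "t1"), ("if_f", "t1", "L1", "_"),
    ("asn", "1", "fact", "_"), ("lab", "L2", "_", "_"),
    ("mul", "fact", "x", "t2"), ("asn", "t2", "fact", "_"),
    ("sub", "x", "1", "t3"), ("asn", "t3", "x", "_"),
    ("eq", "x", "0", "t4"), ("if_f", "t4", "L2", "_"),
    ("wri", "fact", "_", "_"), ("lab", "L1", "_", "_"), ("halt", "_", "_", "_"),
]


def quads_to_triples(quads):
    """Replace temporaries and labels by instruction indices."""
    index_of = {}  # temporary or label name -> triple index
    n = 0          # index the next surviving instruction will get
    for op, a, b, c in quads:
        if op == "lab":
            index_of[a] = n      # a label names the next real instruction
        else:
            if c != "_":
                index_of[c] = n  # this instruction defines temporary c
            n += 1

    def ref(addr):
        return f"({index_of[addr]})" if addr in index_of else addr

    return [(op, ref(a), ref(b)) for op, a, b, c in quads if op != "lab"]
```

Applied to `FACTORIAL`, this yields the twelve triples listed in the example. It also shows the drawback noted earlier: inserting or moving a triple would change the indices that other triples refer to.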
P-Code
 Began as a standard target assembly code produced by
Pascal compilers in the 1970s and early 1980s.
 Designed for a hypothetical stack machine, called the
P-machine.
 Interpreters were written for actual machines.
 This made Pascal compilers easily portable.
 Only the interpreter must be rewritten for a new platform.
 Modifications of P-code are used in a number of
compilers, mostly for Pascal-like languages.
P-Machine
 Consists of
 A code memory
 An unspecified data memory for named
variables
 A stack for temporary data
 Registers needed to maintain the stack
and support execution.
Example 1
2*a+(b-3)
ldc 2 ; load constant 2
lod a ; load value of variable a
mpi   ; integer multiplication
lod b ; load value of variable b
ldc 3 ; load constant 3
sbi   ; integer subtraction
adi   ; integer addition
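P-code for an expression like this one is just a postorder traversal of its syntax tree: operands are loaded first and the operator is emitted last. The Python sketch below is illustrative only. It assumes a tree is a string leaf or a tuple `(op, left, right)`, and it handles only the three integer operations used in this example.

```python
# P-code mnemonics for the arithmetic operators used in the example.
ARITH = {"+": "adi", "-": "sbi", "*": "mpi"}


def gen_pcode(tree, code=None):
    """Emit P-code that leaves the value of `tree` on top of the stack."""
    if code is None:
        code = []
    if isinstance(tree, tuple):
        op, left, right = tree
        gen_pcode(left, code)    # left operand ends up below ...
        gen_pcode(right, code)   # ... the right operand on the stack
        code.append(ARITH[op])   # the operator pops both, pushes the result
    elif tree.isdigit():
        code.append(f"ldc {tree}")  # constant
    else:
        code.append(f"lod {tree}")  # value of a variable
    return code
```

For the tree `("+", ("*", "2", "a"), ("-", "b", "3"))` this produces the seven instructions of the example, in the same order. No temporary names are needed, because intermediate results stay on the stack.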
Example 2
x:=y+1
lda x ; load address of x
lod y ; load value of y
ldc 1 ; load constant 1
adi   ; add
sto   ; store top to address
      ; below top & pop both
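The effect of these instructions on the P-machine stack can be followed with a small interpreter. The Python sketch below is illustrative only. It covers just the straight-line instructions seen so far, models data memory as a dictionary, and represents an address as a tagged variable name, which is my own simplification.

```python
def run_pcode(program, memory):
    """Run straight-line P-code; return the final stack."""
    stack = []
    for line in program:
        op, _, arg = line.partition(" ")
        if op == "ldc":
            stack.append(int(arg))          # push a constant
        elif op == "lod":
            stack.append(memory[arg])       # push the value of a variable
        elif op == "lda":
            stack.append(("addr", arg))     # push the address of a variable
        elif op in ("adi", "sbi", "mpi"):
            right = stack.pop()
            left = stack.pop()
            if op == "adi":
                stack.append(left + right)
            elif op == "sbi":
                stack.append(left - right)
            else:
                stack.append(left * right)
        elif op == "sto":
            value = stack.pop()             # value on top ...
            _, name = stack.pop()           # ... address just below it
            memory[name] = value
        else:
            raise ValueError(f"unsupported instruction {op}")
    return stack
```

Running Example 2 with `memory = {"y": 41}` leaves the stack empty and sets `memory["x"]` to 42. Running Example 1 instead leaves the single value of the expression on the stack.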
Factorial Program
lda x    ; load address of x
rdi      ; read an integer, store to
         ; address on top of the stack
         ; (& pop it)
lod x    ; load the value of x
ldc 0    ; load constant 0
grt      ; pop and compare top two
         ; values, push the Boolean
         ; result
fjp L1   ; pop Boolean value,
         ; jump to L1 if false
lda fact ; load address of fact
ldc 1    ; load constant 1
sto      ; pop two values, storing
         ; the first to address
         ; represented by second
lab L2   ; definition of label L2
lda fact ; load address of fact
lod fact ; load value of fact
lod x    ; load value of x
mpi      ; multiply
sto      ; store top to
         ; address of second &
         ; pop
lda x    ; load address of x
lod x    ; load value of x
ldc 1    ; load constant 1
sbi      ; subtract
sto      ; store (as before)
lod x    ; load value of x
ldc 0    ; load constant 0
equ      ; test for equality
fjp L2   ; jump to L2 if false
lod fact ; load value of fact
wri      ; write top of stack & pop
lab L1   ; definition of label L1
stp
P-Code and Three-Address Code
 P-code
 Is closer to actual machine code.
 Instructions require fewer addresses.
 “One-address” or “zero-address”
 Is less compact in terms of the number of instructions.
 Is not “self-contained”.
 Instructions operate implicitly on a stack.
 All temporary values are on the stack, so there is
no need for temporary names.
Generation of Target Code
 Involves two standard techniques
1. Macro expansion
– Replaces each intermediate code instruction
with an equivalent sequence of target code
instructions.
2. Static simulation
– Straight-line simulation of the effects of the
intermediate code and generating target
code to match these effects.
Example
exp → id = exp | aexp
aexp → aexp + factor | factor
factor → (exp) | num | id

(x=x+3)+4

P-code:
lda x
lod x
ldc 3
adi
stn
ldc 4
adi

Three-address code:
t1 = x+3
x = t1
t2 = t1+4
Static Simulation
P-code for (x=x+3)+4:
lda x
lod x
ldc 3
adi
stn
ldc 4
adi
Step 1: after lda x, lod x, ldc 3 the simulated stack holds (top of stack first)
3
x
address of x
adi generates t1=x+3 and leaves (top of stack first)
t1
address of x
Step 2: stn generates x=t1 and leaves
t1
Step 3: after ldc 4 the simulated stack holds (top of stack first)
4
t1
adi generates t2=t1+4 and leaves
t2
Resulting three-address code:
t1 = x+3
x = t1
t2 = t1+4
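The three simulation steps can be reproduced by a short Python sketch that keeps a stack of names rather than values. It is illustrative only. It handles just the five instructions of this example, treats `stn` as a store that leaves the stored value on the stack (as the example shows), and its function name is hypothetical.

```python
def simulate(pcode):
    """Translate straight-line P-code to three-address code by static simulation."""
    stack = []  # names and constants, not run-time values
    code = []   # generated three-address instructions
    temps = 0
    for line in pcode:
        op, _, arg = line.partition(" ")
        if op in ("lda", "lod", "ldc"):
            stack.append(arg)  # remember which address or constant is here
        elif op == "adi":
            right = stack.pop()
            left = stack.pop()
            temps += 1
            code.append(f"t{temps} = {left}+{right}")
            stack.append(f"t{temps}")  # the result lives in a new temporary
        elif op in ("sto", "stn"):
            value = stack.pop()
            target = stack.pop()
            code.append(f"{target} = {value}")
            if op == "stn":
                stack.append(value)    # stn keeps the value on the stack
        else:
            raise ValueError(f"unsupported instruction {op}")
    return code
```

Applied to the seven instructions of the example, `simulate` returns the three instructions `t1 = x+3`, `x = t1`, `t2 = t1+4`. Nothing is executed; only the effect of each instruction on the stack is tracked.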
Macro Expansion
Three-address code:
t1 = x+3
x = t1
t2 = t1+4
Expansion of each instruction into P-code:
t1 = x+3     lda t1
             lod x
             ldc 3
             adi
             sto
x = t1       lda x
             lod t1
             sto
t2 = t1+4    lda t2
             lod t1
             ldc 4
             adi
             sto
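Macro expansion treats every three-address instruction in isolation and replaces it with a fixed P-code template. The Python sketch below is illustrative only. It handles just the two instruction shapes of this example, `x = y` and `x = y+z`, and the function names are hypothetical.

```python
def expand(instruction):
    """Expand one instruction of the form 'x = y' or 'x = y+z' into P-code."""
    target, source = (part.strip() for part in instruction.split("=", 1))

    def load(addr):
        # Constants are loaded with ldc, variables and temporaries with lod.
        return f"ldc {addr}" if addr.isdigit() else f"lod {addr}"

    code = [f"lda {target}"]  # address that will receive the result
    if "+" in source:
        left, right = (s.strip() for s in source.split("+", 1))
        code += [load(left), load(right), "adi"]
    else:
        code.append(load(source))
    code.append("sto")        # store the value and pop both stack entries
    return code


def expand_all(instructions):
    """Concatenate the expansions of a list of instructions."""
    return [line for ins in instructions for line in expand(ins)]
```

`expand_all(["t1 = x+3", "x = t1", "t2 = t1+4"])` gives the thirteen P-code instructions listed above. That is almost twice the seven instructions of the direct P-code for the same expression, because each temporary is stored and then reloaded.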