Intermediate Code Generation
Reading List:
Aho-Sethi-Ullman:
Chapter 2.3
Chapter 6.1 ~ 6.2
Chapter 6.3 ~ 6.10
(Note: Glance through it only for
intuitive understanding.)
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
1
Summary of Front End
Lexical Analyzer (Scanner)
+
Syntax Analyzer (Parser)
+ Semantic Analyzer
Abstract Syntax Tree w/Attributes
Front
End
Intermediate-code Generator
Error
Message
7/13/2017
Non-optimized Intermediate Code
\course\cpeg621-10F\Topic-1a.ppt
2
Component-Based Approach
to Building Compilers
Source program
in Language-1
Source program
in Language-2
Language-1 Front End
Language-2 Front End
Non-optimized Intermediate Code
Intermediate-code Optimizer
Optimized Intermediate Code
7/13/2017
Target-1 Code Generator
Target-2 Code Generator
Target-1 machine code
Target-2 machine code
\course\cpeg621-10F\Topic-1a.ppt
3
Intermediate Representation (IR)
A kind of abstract machine language that
can express the target machine operations
without committing to too much machine
details.
Why IR ?
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
4
Without IR
7/13/2017
C
SPARC
Pascal
HP PA
FORTRAN
x86
C++
IBM PPC
\course\cpeg621-10F\Topic-1a.ppt
5
With IR
C
SPARC
Pascal
HP PA
IR
7/13/2017
FORTRAN
x86
C++
IBM PPC
\course\cpeg621-10F\Topic-1a.ppt
6
With IR
C
Pascal
IR
Common Backend
?
FORTRAN
C++
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
7
Advantages of Using an
Intermediate Language
1. Retargeting - Build a compiler for a new machine by
attaching a new code generator to an existing front-end.
2. Optimization - reuse intermediate code optimizers in
compilers for different languages and different
machines.
Note: the terms “intermediate code”, “intermediate
language”, and “intermediate representation” are all
used interchangeably.
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
8
Issues in Designing an IR
Whether to use an existing IR
if target machine architecture is similar
if the new language is similar
Whether the IR is appropriate for the
kind of optimizations to be performed
e.g. speculation and predication
some transformations may take much
longer than they would on a different IR
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
9
Issues in Designing an IR
Designing a new IR needs to consider
Level (how machine dependent it is)
Structure
Expressiveness
Appropriateness for general and special
optimizations
Appropriateness for code generation
Whether multiple IRs should be used
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
10
Multiple-Level IR
Source
Program
Semantic
Check
7/13/2017
High-level
IR
High-level
Optimization
Low-level
IR
…
Target
code
Low-level
Optimization
\course\cpeg621-10F\Topic-1a.ppt
11
Using Multiple-level IR
Translating from one level to another in the
compilation process
Preserving an existing technology
investment
Some representations may be more
appropriate for a particular task.
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
12
Commonly Used IR
Possible IR forms
• Graphical representations: such as syntax
trees, AST (Abstract Syntax Trees), DAG
• Postfix notation
• Three address code
• SSA (Static Single Assignment) form
IR should have individual components
that describe simple things
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
13
DAG Representation
A variant of syntax tree.
Example: D = ((A+B*C) + (A*B*C))/ -C
=
D
/
DAG: Direct Acyclic
Graph
_
+
+
*
*
A
7/13/2017
B
C
\course\cpeg621-10F\Topic-1a.ppt
14
Postfix Notation (PN)
A mathematical notation wherein every operator
follows all of its operands.
Examples:
The PN of expression 9* (5+2) is 952+*
How about (a+b)/(c-d) ?
7/13/2017
ab+cd-/
\course\cpeg621-10F\Topic-1a.ppt
15
Postfix Notation (PN) – Cont’d
Form Rules:
1. If E is a variable/constant, the PN of E is E itself
2. If E is an expression of the form E1 op E2, the PN
of E is E1’E2’op (E1’ and E2’ are the PN of E1 and
E2, respectively.)
3. If E is a parenthesized expression of form (E1), the
PN of E is the same as the PN of E1.
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
16
Three-Address Statements
A popular form of intermediate code used in optimizing
compilers is three-address statements.
Source statement:
x = a + b c + d
Three address statements with temporaries t1 and t2:
t 1 = b c
t2 = a + t1
x = t2 + d
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
17
Three Address Code
The general form
x := y op z
x,y,and z are names, constants, compilergenerated temporaries
op stands for any operator such as +,-,…
x*5-y might be translated as
t1 := x * 5
t2 := t1 - y
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
18
Syntax-Directed Translation
Into Three-Address
Temporary
• In general, when generating three-address statements, the
compiler has to create new temporary variables (temporaries) as
needed.
• We use a function newtemp( ) that returns a new temporary
each time it is called.
• Recall Topic-2: when talking about this topic
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
19
Syntax-Directed Translation
Into Three-Address
• The syntax-directed definition for E in a production
id := E has two attributes:
1. E.place - the location (variable name or offset) that
holds the value corresponding to the nonterminal
2. E.code - the sequence of three-address statements
representing the code for the nonterminal
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
20
Example Syntax-Directed
Definition
term ::= ID
{ term.place := ID.place ; term.code = “” }
term1 ::= term2 * ID
{term1.place := newtemp( );
term1.code := term2.code || ID.code ||*
gen(term1.place ‘:=‘ term2.place ‘*’ ID.place}
expr ::= term
{ expr.place := term.place ; expr.code := term.code }
expr1 ::= expr2 + term
{ expr1.place := newtemp( )
expr1.code := expr2.code || term.code ||+
gen(expr1.place ‘:=‘ expr2.place ‘+’ term.place
}
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
21
Syntax tree vs. Three address
code
Expression: (A+B*C) + (-B*A) - B
_
+
B
+
A
*
_
*
B
C
A
B
T1 := B * C
T2 = A + T1
T3 = - B
T4 = T3 * A
T5 = T2 + T4
T6 = T5 – B
Three address code is a linearized representation
of a syntax tree (or a DAG) in which explicit names
(temporaries) correspond to the interior nodes of the graph.
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
22
DAG vs. Three address code
Expression: D = ((A+B*C) + (A*B*C))/ -C
=
D
/
_
+
+
*
*
A
B
C
T1 := A
T2 := C
T3 := B * T2
T4 := T1+T3
T5 := T1*T3
T6 := T4 + T5
T7 := – T2
T8 := T6 / T7
D := T8
T1 := B * C
T2 := A+T1
T3 := A*T1
T4 := T2+T3
T5 := – C
T6 := T4 / T5
D := T6
Question: Which IR code sequence is better?
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
23
Implementation of Three
Address Code
Quadruples
• Four fields: op, arg1, arg2, result
¤ Array of struct {op, *arg1, *arg2, *result}
• x:=y op z is represented as op y, z, x
• arg1, arg2 and result are usually pointers to
symbol table entries.
• May need to use many temporary names.
• Many assembly instructions are like
quadruple, but arg1, arg2, and result are real
registers.
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
24
Implementation of Three
Address Code (Con’t)
Triples
• Three fields: op, arg1, and arg2. Result become implicit.
• arg1 and arg2 are either pointers to the symbol table or
index/pointers to the triple structure.
Example: d = a + (b*c)
1 *
b, c
Problem in
2 +
a, (1)
reorder the
codes?
3 assign d, (2)
• No explicit temporary names used.
• Need more than one entries for ternary operations such
as x:=y[i], a=b+c, x[i]=y, … etc.
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
25
IR Example in Open64 ─ WHIRL
The Open64 uses a tree-based intermediate
representation called WHIRL, which stands
for Winning Hierarchical Intermediate
Representation Language.
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
26
From WHIRL to CGIR
An Example
ST aa
LD
+
*
a
i
4
(d) CGIR
(c) WHIRL
7/13/2017
T1 = sp + &a;
T2 = ld T1
T3 = sp + &i;
T4 = ld T3
T6 = T4 << 2
T7 = T6
T8 = T2 + T7
T9 = ld T8
T10 = sp + &aa
:= st T10 T9
\course\cpeg621-10F\Topic-1a.ppt
28
From WHIRL to CGIR
An Example
int *a;
int
i;
int
aa;
aa = a[i];
(a) Source
7/13/2017
U4U4LDID 0 <2,1,a> T<47,anon_ptr.,4>
U4U4LDID 0 <2,2,i> T<8,.predef_U4,4>
U4INTCONST 4 (0x4)
U4MPY
U4ADD
I4I4ILOAD 0 T<4,.predef_I4,4> T<47,anon_ptr.,4>
I4STID 0 <2,3,aa> T<4,.predef_I4,4>
(b) Whirl
\course\cpeg621-10F\Topic-1a.ppt
29
(insn 8 6 9 1 (set (reg:SI 61 [ i.0 ])
(mem/c/i:SI (plus:SI (reg/f:SI 54 virtual-stack-vars)
(const_int -8 [0xfffffffffffffff8])) [0 i+0 S4 A32])) -1 (nil)
(nil))
U4U4LDID 0 <2,1,a> T<47,anon_ptr.,4>
U4U4LDID 0 <2,2,i> T<8,.predef_U4,4>
U4INTCONST 4 (0x4)
U4MPY
U4ADD
I4I4ILOAD 0 T<4,.predef_I4,4> T<47,anon_ptr.,4>
I4STID 0 <2,3,aa> T<4,.predef_I4,4>
(insn 9 8 10 1 (parallel [
(set (reg:SI 60 [ D.1282 ])
(ashift:SI (reg:SI 61 [ i.0 ])
(const_int 2 [0x2])))
(clobber (reg:CC 17 flags))
]) -1 (nil)
(nil))
(insn 10 9 11 1 (set (reg:SI 59 [ D.1283 ])
(reg:SI 60 [ D.1282 ])) -1 (nil)
(nil))
(insn 11 10 12 1 (parallel [
(set (reg:SI 58 [ D.1284 ])
(plus:SI (reg:SI 59 [ D.1283 ])
(mem/f/c/i:SI (plus:SI (reg/f:SI 54 virtual-stack-vars)
(const_int -12 [0xfffffffffffffff4])) [0 a+0 S4 A32])))
(clobber (reg:CC 17 flags))
]) -1 (nil)
(nil))
(insn 12 11 13 1 (set (reg:SI 62)
(mem:SI (reg:SI 58 [ D.1284 ]) [0 S4 A32])) -1 (nil)
(nil))
(insn 13 12 14 1 (set (mem/c/i:SI (plus:SI (reg/f:SI 54 virtual-stack-vars)
(const_int -4 [0xfffffffffffffffc])) [0 aa+0 S4 A32])
(reg:SI 62)) -1 (nil)
(nil))
WHIRL
7/13/2017
GCC RTL
\course\cpeg621-10F\Topic-1a.ppt
30
Differences
gcc rtl describes more details than whirl
gcc rtl already assigns variables to stack
actually, WHIRL needs other symbol tables to
describe the properties of each variable.
Separating IR and symbol tables makes WHIRL
simpler.
WHIRL contains multiple levels of program
constructs representation, so it has more
opportunities for optimization.
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
31
Summary of Front End
Lexical Analyzer (Scanner)
+
Syntax Analyzer (Parser)
+ Semantic Analyzer
Abstract Syntax Tree w/Attributes
Front
End
Intermediate-code Generator
Error
Message
7/13/2017
Non-optimized Intermediate Code
\course\cpeg621-10F\Topic-1a.ppt
32
Position := initial + rate * 60
intermediate code generator
lexical analyzer
id1 := id2 + id3 * 60
syntax analyzer
temp1 := inttoreal (60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1
:= temp3
:=
id1
code optimizer
+
id2
temp1 := id3 * 60.0
id1
:= id2 + temp1
*
id3
60
code generator
semantic analyzer
MOVF
MULF
MOVF
ADDF
MOVF
:=
id1
+
id2
*
id3
R2,
R1,
R1
id1
inttoreal
60
7/13/2017
id3, R2
#60.0, R2
id2, R1
The Phases of a Compiler
\course\cpeg621-10F\Topic-1a.ppt
33
Summary
1. Why IR
2. Commonly used IR
3. IRs of Open64 and GCC
7/13/2017
\course\cpeg621-10F\Topic-1a.ppt
34
© Copyright 2026 Paperzz