Topic #6: Type Checking

Topic #7: Intermediate Code
EE 456 – Compiling Techniques
Prof. Carl Sable
Fall 2003
Intermediate Code Generation
• The front end of a compiler translates a source program
into an intermediate representation
• Details of the back end are left to the back end
• Benefit include:
– Retargeting
– Machine-independent code optimization
Intermediate Languages
• Consider code:
a := b * -c + b * -c
• A syntax tree graphically
depicts code (example at
right)
• Postfix notation is a
linearized representation
of a syntax tree:
a b c uminus * b c
uminus * + assign
Three-Address Code
• Three-address code is a sequence of
statements of the general form x := y op z
• x, y, and z are names, constants, or compilergenerated temporaries
• op can be any operator
• Three-address code is a linearized
representation of a syntax tree
• Explicit names correspond to interior nodes of
the graph
Three-Address Code Example
t1 := -c
t2 := b * t1
t3 := -c
t4 := b * t3
t5 := t2 + t4
a := t5
Types of Statements
• Assignment statements of the form x := y op z,
where op is a binary operator
• Assignment statements of the form x := op y where
op is a unary operator
• Copy statements of the form x := y
• The unconditional jump goto L
• Conditional jumps such is if x relop y goto L
• param x and call p, n and return y relating to
procedure calls
• Assignments of the form x := y[i] and x[i] := y
• Address and pointer assignments of the form x := &y,
x := *y, and *x = y
Generating Three-Address Code
• Temporary names are made up for the interior nodes of
a syntax tree
• The synthesized attribute S.code represents the code
for the assignment S
• The nonterminal E has attributes:
– E.place is the name that will hold the value of E
– E.code is a sequence of three-address statements evaluating E
• The function newtemp returns a sequence of distinct
names
• The function newlabel returns a sequence of distinct
labels
Assignments (1)
Production
Semantic Rules
S  id := E S.code := E.code ||
gen(id.place ':=' E.place)
E  E1 + E2
E.place := newtemp;
E.code := E1.code || E2.code ||
gen(E.place ':=' E1.place '+' E2.place)
E  E1 * E2
E.place := newtemp;
E.code := E1.code || E2.code ||
gen(E.place ':=' E1.place '*' E2.place)
Assignments (2)
Production
E  -E1
Semantic Rules
E.place := newtemp;
E.code := E1.code ||
gen(E.place ':=' 'uminus' E1.place)
E  (E1)
E.place := E1.place;
E.code := E1.code
E  id
E.place := id.place;
E.code := ''
Example: S  while E do S1
S.begin := newlabel;
S.after := newlabel;
S.code := gen(S.begin ':') ||
E.code ||
gen('if' E.place '=' '0' 'goto' S.after) ||
S1.code ||
gen('goto' S.begin)
gen(S.after ':')
Quadruples
• A quadruple is a record structure with four fields:
op, arg1, arg2, and result
–
–
–
–
The op field contains an internal code for an operator
Statements with unary operators do not use arg2
Operators like param use neither arg2 nor result
The target label for conditional and unconditional
jumps are in result
• The contents of fields arg1, arg2, and result
are typically pointers to symbol table entries
– If so, temporaries must be entered into the symbol
table as they are created
– Obviously, constants need to be handled differently
Quadruples Example
op
arg1
arg2
result
(0)
uminus
c
(1)
*
b
(2)
uminus
c
(3)
*
b
t3
t4
(4)
+
t2
t4
t5
(5)
:=
t5
t1
t1
t2
t3
a
Triples
• Triples refer to a temporary value by the position
of the statement that computes it
– Statements can be represented by a record with only
three fields: op, arg1, and arg2
– Avoids the need to enter temporary names into the
symbol table
• Contents of arg1 and arg2:
– Pointer into symbol table (for programmer defined
names)
– Pointer into triple structure (for temporaries)
– Of course, still need to handle constants differently
Triples Example
op
arg1
arg2
(0)
uminus
c
(1)
*
b
(2)
uminus
c
(3)
*
b
(2)
(4)
+
t2
(3)
(5)
assign
a
(4)
(0)
Declarations (1)
• A symbol table entry is created for every
declared name
• Information includes name, type, relative
address of storage, etc.
• Relative address consists of an offset:
– Offset is from the base of the static data area for
globals
– Offset is from the field for local data in an activation
record for locals to procedures
• Types are assigned attributes type and width
• Becomes more complex if we need to deal with
nested procedures or records
Declarations (2)
Production
P  M D
M  ε
Semantic Rules
offset := 0
D  D ; D
D  id : T
enter(id.name, T.type, offset);
offset := offset + T.width
T  integer
T.type := integer;
T.width := 4
T  real
T.type := real
T.width := 8
T  array[num] of T1
T.type := array(num, T1.type);
T.width := num * T1.width
Addressing Array Elements
• The location of the ith element of array A is:
base + (i – low) * w
– w is the width of each element
– low is the lower bound of the subscript
– base is the relative address of A[low]
• The expression for the location can be rewritten
as: i * w + (base – low * w)
– The subexpression in parentheses is a constant
– That subexpression can be evaluated at compile time
• More complex for multi-dimensional arrays
Translating Assignments (1)
Production
S  id := E
Semantic Rules
p := lookup(id.name);
if p != NULL then
emit(p ':=' E.place)
else
error
E  E1 + E2
E.place := newtemp;
emit(E.place ':=' E1.place '+'
E2.place)
E  E1 * E2
E.place := newtemp;
emit(E.place ':=' E1.place '*'
E2.place)
Translating Assignments (2)
Production
E  -E1
Semantic Rules
E.place := newtemp;
emit(E.place ':=' 'uminus' E1.place)
E  (E1)
E.place := E1.place
E  id
p := lookup(id.name);
if p != NULL then
E.place := p
else
error
Type Conversions
• There are multiple types (e.g. integer,
real) for variables and constants
– Compiler may need to reject certain mixedtype operations
– At times, a compiler needs to general type
conversion instructions
• An attribute E.type holds the type of an
expression
Semantic Action: E  E1 + E2
E.place := newtemp;
if E1.type = integer and E2.type = integer then
begin
emit(E.place ':=' E1.place 'int+' E2.place);
E.type := integer
end
else if E1.type = real and E2.type = real then
…
else if E1.type = integer and E2.type = real then
begin
u := newtemp;
emit(u ':=' 'inttoreal' E1.place);
emit(E.place ':=' u 'real+' E2.place);
E.type := real
end
else if E1.type = real and E2.type = integer then
…
else E.type := type_error;
Example: x := y + i * j
• In this example, x and y have type real
• i and j have type integer
• The intermediate code is shown below:
t1 := i int* j
t3 := inttoreal t1
t2 := y real+ t3
x := t2
Boolean Expressions
• Boolean expressions compute logical values
• Often used with flow-of-control statements
• Methods of translating boolean expression:
– Numerical methods:
• True is represented as 1 and false is represented as 0
• Nonzero values are considered true and zero values are
considered false
– Flow-of-control methods:
• Represent the value of a boolean by the position reached in
a program
• Often not necessary to evaluate entire expression
Numerical Representation (1)
• Expressions evaluated left to right using 1 to
denote true and 0 to donate false
• Example: a or b and not c
t1 := not c
t2 := b and t1
t3 := a or t2
• Another example: a < b
100:
101:
102:
103:
104:
if a < b goto 103
t : = 0
goto 104
t : = 1
…
Numerical Representation (2)
Production
E  E1 or E2
Semantic Rules
E.place := newtemp;
emit(E.place ':=' E1.place 'or'
E2.place)
E  E1 and E2
E.place := newtemp;
emit(E.place ':=' E1.place 'and'
E2.place)
E  not E1
E.place := newtemp;
emit(E.place ':=' 'not' E1.place)
E  (E1)
E.place := E1.place;
Numerical Representation (3)
Production
E  id1 relop id2
Semantic Rules
E.place := newtemp;
emit('if' id1.place relop.op
id2.place 'goto' nextstat+3);
emit(E.place ':=' '0');
emit('goto' nextstat+2);
emit(E.place ':=' '1');
E  true
E.place := newtemp;
emit(E.place ':=' '1')
E  false
E.place := newtemp;
emit(E.place ':=' '0')
Example: a<b or c<d and e<f
100:
101:
102:
103:
104:
105:
106:
107:
108:
109:
110:
111:
112:
113:
if a < b goto 103
t1 := 0
goto 104
t1 := 1
if c < d goto 107
t2 := 0
goto 108
t2 := 1
if e < f goto 111
t3 := 0
goto 112
goto 23 := 1
t4 := t2 and t3
t5 := t1 or t4
Flow-of-Control (1)
• The function newlabel will return a new symbolic
label each time it is called
• Each boolean expression will have two new
attributes:
– E.true is the label to which control flows if E is true
– E.false is the label to which control flows if E is
false
• Attribute S.next of a statement S:
– Inherited attribute whose value is the label attached to
the first instruction to be executed after the code for S
– Used to avoid jumps to jumps
Flow-of-Control (2)
Production
S  if E then S1
Semantic Rules
E.true := newlabel;
E.false := S.next;
S1.next := S.next;
S.code := E.code ||
gen(E.true ':') ||
S1.code
Flow-of-Control (3)
Production
S  if E then S1 else S2
Semantic Rules
E.true := newlabel;
E.false := newlabel;
S1.next := S.next;
S2.next := S.next;
S.code := E.code ||
gen(E.true ':') ||
S1.code ||
gen('goto' S.next) ||
gen(E.false ':') ||
S2.code
Flow-of-Control (4)
Production
S  while E do S1
Semantic Rules
S.begin := newlabel;
E.true := newlabel;
E.false := S.next;
S1.next := S.begin;
S.code := gen(S.begin ':') ||
E.code ||
gen(E.true ':') ||
S1.code ||
gen('goto' S.begin)
Boolean Expressions Revisited (1)
Production
E  E1 or E2
E  E1 and E2
Semantic Rules
E1.true := E.true;
E1.false := newlabel;
E2.true := E.true;
E2.false := E.false;
E.code := E1.code ||
gen(E1.false ':') || E2.code
E1.true := newlabel;
E1.false := E.false;
E2.true := E.true;
E2.false := E.false;
E.code := E1.code ||
gen(E1.true ':') || E2.code
Boolean Expressions Revisited (2)
Production
E  not E1
Semantic Rules
E1.true := E.false;
E1.false := E.true;
E.code := E1.code
E  (E1)
E1.true := E.true;
E1.false := E.false;
E.code := E1.code
E  id1 relop id2 E.code := gen('if' id.place
relop.op id2.place 'goto'
E.true) ||
gen('goto' E.false)
E  true
E.code := gen('goto' E.true)
E  false
E.code := gen('goto' E.false)
Revisited: a<b or c<d and e<f
L1:
L2:
if a
goto
if c
goto
if e
goto
< b goto Ltrue
L1
< d goto L2
Lfalse
< f goto Ltrue
Lfalse
Another Example
L1:
L2:
while a < b do
if c < d then
x := y + z
else
x := y - z
L3:
L4:
Lnext:
if a < b goto L2
goto Lnext
if c < d goto L3
goto L4
t1 := y + z
x:= t1
goto L1
t2 := y – z
X := t2
goto L1
Mixed-Mode Expressions
• Boolean expressions often have arithmetic
subexpressions, e.g. (a + b) < c
• If false has the value 0 and true has the value 1
– arithmetic expressions can have boolean
subexpressions
– Example: (a < b) + (b < a) has value 0 if a and
b are equal and 1 otherwise
• Some operators may require both operands to
be boolean
• Other operators may take both types of
arguments, including mixed arguments
Revisited: E  E1 + E2
E.type := arith;
if E1.type = arith and E2.type = arith then
begin
/* normal arithmetic add */
E.place := newtemp;
E.code := E1.code || E2.code ||
gen(E.place ':=' E1.place '+' E2.place)
end
else if E1.type := arith and E2.type = bool then
begin
E2.place := newtemp;
E2.true := newlabel;
E2.flase := newlabel;
E.code := E1.code || E2.code ||
gen(E2.true ':' E.place ':=' E1.place + 1) ||
gen('goto' nextstat+1) ||
gen(E2.false ':' E.place ':=' E1.place)
else if …
Two-Pass Implementation
• First, construct syntax tree
• Walk through syntax tree in depth-first order,
computing translations
• May not know the labels to which control must
flow at the time a jump is generated
– Affect boolean expressions and flow control
statements
– Leave targets of jumps temporarily unspecified
– Add each such statement to a list of goto statements
whose labels will be filled in later
– This filling in of labels is called backpatching
Backpatching
• This filling in of labels mentioned on the
last slide is called backpatching
• Can be used to generate the code in a
single pass
• Translations generated are basically the
same as before except for labels
• Book notation – Quadruples are generated
into quadruple array
• Labels are indices into the array
Lists of Labels
• Lists of labels (really statements with jumps to
labels) are maintained
• New functions used to manipulate these lists
– makelist(i)
• Creates a new list containing only i, and index into the array
of quadruples
• Returns a pointer to the new list
– merge(p1, p2)
• Concatenates two lists of labels
• Returns a pointer to the new list
– backpatch(p, i) – inserts i as target label for
each statement on the list pointed to by p
Boolean Expressions and Markers
E 
|
|
|
|
|
|
E1 or M E2
E1 and M E2
not E1
(E1)
id1 relop id2
true
false
M  ε
The New Marker
• Translation scheme suitable for producing
quadruples during bottom-up pass
– The new marker an has associated semantic action
which
– Picks up, at appropriate times, the index of the next
quadruple to be generated
• Nonterminal E will have two new synthesized
attributes:
– E.truelist contains a list of statements that jump
when expression is true
– E.falselist contains a list of statements that jump
when expression is false
Example: E  E1 and M E2
• If E1 is false:
– Then E is also false
– So statements on E1.falselist become
part of E.falselist
• If E1 is true:
– Still need to test E2
– Target for statements on E1.truelist must
be the beginning of code generated for E2
– Target is obtained using the marker M
New Syntax-Directed Definition (1)
Production
E  E1 or M E2
Semantic Rules
backpatch(E1.falselist, M.quad);
E.truelist := merge(E1.truelist,
E2.truelist);
E.falselist := E2.falstlist
E  E1 and M E2
backpatch(E1.truelist, M.quad);
E.truelist := E2.truelist;
E.falselist := merge(E1.falselist,
E2.falselist)
E  not E1
E.truelist := E1.falselist;
E.falselist := E1.truelist
E  (E1)
E.truelist := E1.truelist;
E.falselist := E1.falselist
New Syntax-Directed Definition (2)
Production
E  id1 relop id2
Semantic Rules
E.truelist := makelist(nextquad);
E.falselist := makelist(nextquad+1);
emit('if' id1.place relop.op
id2.place 'goto _');
emit('goto _')
E  true
E.truelist := makelist(nextquad);
emit('goto _')
E  false
E.falselist := makelist(nextquad);
emit('goto _')
M  ε
M.quad := nextquad
Example Revisited (1)
• Reconsider: a<b or c<d and e<f
• First, a<b will be reduced, generating:
100:
101:
if a < b goto _
goto _
• Next, the marker M in E  E1 or M E2 will be
reduced, and M.quad will be set to 102
• Next, c<d will be reduced, generating:
102:
103:
if c < d goto _
goto _
Example Revisited (2)
• Next, the marker M in E  E1 and M E2 will be
reduced, and M.quad will be set to 104
• Next, e<f will be reduced, generating:
104:
105:
if e < f goto _
goto _
• Next, we reduce by E  E1 and M E2
– Semantic action calls backpatch({102}, 104)
– E1.truelist contains only 102
– Line 102 now reads: if c <d goto 104
Example Revisited (3)
• Next, we reduce by E  E1 or M E2
– Semantic action calls backpatch({101}, 102)
– E1.falselist contains only 101
– Line 101 now reads: goto 102
• Statements generated so far:
100:
101:
102:
103:
104:
105:
if a
goto
if c
goto
if e
goto
< b goto _
102
< d goto 104
_
< f goto _
_
• Remaining goto instructions will have their addresses
filled in later
Annotated Parse Tree
Flow-of-Control Revisited (1)
S 
|
|
|
|
L 
|
if E then S
if E then S else S
while E do S
begin L end
A
L ; S
S
• S, L, A, and E represent statements, statement
lists, assignments, and boolean expressions
• Clearly, other productions (not shown) are also
necessary
Flow-of-Control Revisited (2)
• New attributes and markers will be needed
• E will still have attributes E.truelist
and E.falselist as previously indicated
• S.nextlist is list of statements that
jump to quadruple following statement S
• L.nextlist is defined similarly
• New marker N will emit a goto statement
and have an associated list as an attribute
New Flow-of-Control Rules (1)
Production
M  ε
Semantic Rules
M.quad := nextquad
N  ε
N.nextlist := makelist(nextquad);
emit('goto _')
S  if E then M S1
backpatch(E.truelist, M.quad);
S.nextlist := merge(E.falselist,
S1.nextlist)
S  if E then M1 S1
else M2 S2
backpatch(E.truelist, M1.quad);
N backpatch(E.falselist, M2.quad);
S.nextlist := merge(S1.nextlist,
merge(N.nextlist, S2.nextlist))
New Flow-of-Control Rules (2)
Production
S  while M1 E do M2 S1
Semantic Rules
backpatch(S1.next, M1.quad);
backpatch(E.truelist, M2.quad);
S.nextlist := E.falselist;
emit('goto', M1.quad)
S  begin L end
S.nextlist := L.nextlist
S  A
S.nextlist := NULL
L  L1 ; M S
backpatch(L1.nextlist, M.quad);
L.nextlist := S.nextlist
L  S
L.nextlist := S.nextlist
Labels and Goto Statements
• The definition of a label is treated as a
declaration of the label
• Labels are typically entered into the symbol
table
– Entry is created the first time the label is seen
– This may be before the definition of the label if it is the
target of any forward goto
• When a compiler encounters a goto statement:
– It must ensure that there is exactly one appropriate
label in the current scope
– If so, it must generate the appropriate code;
otherwise, an error should be indicated
Procedures
• The procedure is an extremely important,
very commonly used construct
• Imperative that a compiler generates good
calls and returns
• Much of the support for procedure calls is
provided by a run-time support package
S  call id ( Elist)
Elist  Elist, E
Elist  E
Calling Sequence
• Calling sequences can differ for different
implementations of the same language
• Certain actions typically take place:
– Space must be allocated for the activation record of
the called procedure
– The passed arguments must be evaluated and made
available to the called procedure
– Environment pointers must be established to enable
the called procedure to access appropriate data
– The state of the calling procedure must be saved
– The return address must be stored in a known place
– An appropriate jump statement must be generated
Return Statements
• Several actions must also take place when
a procedure terminates
– If the called procedure is a function, the result
must be stored in a known place
– The activation record of the calling procedure
must be restored
– A jump to the calling procedure's return
address must be generated
• No exact division of run-time tasks
between the calling and called procedure
Pass by Reference
• The param statements can be used as
placeholders for arguments
• The called procedure is passed a pointer to the
first of the param statements
• Any argument can by obtained by using the
proper offset from the base pointer
• Arguments other than simple names:
– First generate three-address statements needed to
evaluate these arguments
– Follow this by a list of param three-address
statements
Using a Queue
Production
Semantic Rules
for each item p on queue do
emit('param' p);
S  call id ( Elist )
emit('call' id.place)
Elist  Elist, E
push E.place to queue
Elist  E
initialize queue to contain E
• The code to evaluate arguments is emitted first,
followed by param statements and then a call
• If desired, could augment rules to count the
number of parameters

Download Report

Topic #6: Type Checking

Paperzz.com

Your Paperzz