Chapter 8 Intermediate Code
Basic Code Generation Techniques
Gang S. Liu
College of Computer Science & Technology
Harbin Engineering University
Compiler Construction

Introduction
The final task of the compiler is to generate executable code for a target machine that represents the semantics of the source code.
This is the most complex phase of a compiler. It depends on detailed information about the target architecture and the structure of the runtime OS.
The compiler also attempts to optimize the speed and size of the target code by taking advantage of special features of the target machine (registers, addressing modes, pipelining, and cache memory).

Introduction (cont)
Code generation is typically broken into several steps, often including an abstract code called intermediate code.
Two popular forms are
1. Three-address code
2. P-code

Intermediate Code
A data structure that represents the source program during translation is called an intermediate representation (IR).
So far, an abstract syntax tree has been used as the principal IR.
An abstract syntax tree does not resemble target code. Example: control flow constructs.
A new form of IR is necessary. An intermediate representation that closely resembles target code is called intermediate code.

Form of Intermediate Code
Intermediate code is a linearization of the syntax tree.
Intermediate code
- can be very high level, representing operations almost as abstractly as the syntax tree, or can closely resemble target code;
- may or may not use detailed information about the target machine and runtime environment.

Use of Intermediate Code
Intermediate code is useful
- for producing extremely efficient code;
- for making a compiler more easily retargetable (if the intermediate code is relatively target independent).
[Diagram: Source Language 1 and Source Language 2 are both translated into Intermediate Code, which is in turn translated into Target Language 1 and Target Language 2.]

Intermediate code generation sits in the middle of the compiler. It is a bridge that translates the source program into an intermediate representation, which is then translated into target code. The position of intermediate code generation in the compiler is shown in Figure 8.1.

There are two advantages of using intermediate code. First, back ends for different target machines can be attached to the same front end after intermediate code generation. Second, a machine-independent code optimizer can be applied to the intermediate representation.

Intermediate codes are machine independent, but they are close to machine instructions. The intermediate code generator converts a program in the source language into an equivalent program in the intermediate language.

The intermediate language can take many forms, and the compiler designer decides which one to use. Postfix notation, four-address code (quadruples), three-address code, portable code, and assembly code can all be used as an intermediate language. This chapter introduces them in detail.

8.1 Postfix Notation
If the source program can be represented in postfix notation, it is easy to translate into target code, because the order of the target instructions is the same as the order of the operators in the postfix notation.

8.1.1 The definition of postfix notation
The postfix notation for the expression a+b*c is abc*+. The rules for the notation are as follows:
1 The order of the operands in postfix notation is the same as their order in the original expression.
2 Each operator follows its operands, and there are no parentheses in postfix notation.
3 The operators appear in the order in which they are calculated.

For example, the postfix notation for the expression a*(b+c/d) is abcd/+*. The translation follows the rules above. First, by rule 1, the order of the operands is abcd. Second, by rule 2, the first operator is /, because it immediately follows its operands c and d; by rule 3, / is also the operator calculated first. The second operator is +: because of the parentheses in the original expression, + must be calculated before *. The last operator is *, because it is calculated last.

As another example, the postfix notation for the expression a*b+(c-d)/e is ab*cd-e/+.

These examples show that translating an expression into postfix notation by hand is somewhat difficult. The Dutch computer scientist E. W. Dijkstra devised a method to solve this problem.

8.1.2 E. W. Dijkstra Method
The method uses two stacks: one stores operands and the other stores operators. The procedure is shown in Figure 8.2, and its steps are as follows.

The expression is scanned from left to right. Before scanning begins, the marker # is pushed onto the bottom of the operator stack, and a second # is appended to the end of the expression to mark where it terminates. When the two # markers meet, scanning is complete. The steps of the scan are:
1 If the symbol is an operand, push it onto the operand stack.
2 If the symbol is an operator, compare it with the operator on top of the operator stack.
When the priority of the operator on top of the stack is greater than or equal to that of the scanned operator, the operator on top of the stack is popped and moved to the left side, and the comparison is repeated with the new top of the stack. When the priority of the operator on top of the stack is lower than that of the scanned operator, the scanned operator is pushed onto the operator stack.
3 If the symbol is a left parenthesis, push it onto the operator stack, and then compare the operators inside the parentheses as in step 2. If the symbol is a right parenthesis, pop all the operators inside the parentheses. The parentheses themselves are discarded and do not appear in the postfix notation.
4 Return to step 1 until the two # markers meet.

Example 8.1
The postfix notation of the expression a+b*c is abc*+. The translation procedure in Figure 8.3 shows that the operator order is *+, which is also the order in which the operators are popped from the operator stack and the order of calculation.

Three-Address Code
The most basic instruction of three-address code is
    x = y op z
The use of the address x differs from that of the addresses y and z: y and z (but not x) can represent constants and literal values.
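As a rough sketch of how a syntax tree can be linearized into instructions of the form x = y op z, the Python fragment below walks a small expression tree in postorder and emits one instruction per operator node. The tuple encoding of the tree, the function name `gen_tac`, and the temporary names t1, t2, ... are assumptions made for illustration only; they are not part of any standard form.

```python
from itertools import count


def gen_tac(tree):
    """Linearize an expression tree into three-address instructions.

    A tree is either a leaf (a variable name or an integer constant)
    or a tuple (op, left, right). This encoding is hypothetical.
    Returns (list of instruction strings, address holding the result).
    """
    code = []
    temps = count(1)  # supplies the numbers for t1, t2, ...

    def walk(node):
        if not isinstance(node, tuple):
            return str(node)          # a leaf is its own address
        op, left, right = node
        y = walk(left)                # left operand first (left-to-right order)
        z = walk(right)
        x = f"t{next(temps)}"         # fresh temporary for this operator
        code.append(f"{x} = {y} {op} {z}")
        return x

    result = walk(tree)
    return code, result


# The expression 2*a+(b-3) as a tree.
tree = ("+", ("*", 2, "a"), ("-", "b", 3))
code, result = gen_tac(tree)
print("\n".join(code))
```

Visiting the right operand before the left one in `walk` would give the other evaluation order, with the two subexpressions assigned to the temporaries in the opposite order.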
Example
2*a+(b-3)

Syntax tree:
          +
        /   \
       *     -
      / \   / \
     2   a b   3

Left-to-right linearization     Right-to-left linearization
t1 = 2*a                        t1 = b-3
t2 = b-3                        t2 = 2*a
t3 = t1+t2                      t3 = t2+t1

Three-Address Code (cont)
It is necessary to vary the form of three-address code to express all constructs (e.g. t2 = -t1).
No standard form exists.

Implementation of Three-Address Code
Each three-address instruction is implemented as a record structure containing several fields.
The entire sequence is an array or a linked list.
The most common implementation requires four fields and is called a quadruple: one field for the operation and three for addresses.
For instructions that need fewer addresses, one or more address fields are given null or “empty” values.

Factorial Program
{ Sample program
  in TINY language
  computes factorial }
read x; { input an integer }
if 0 < x then { don't compute if x <= 0 }
  fact := 1;
  repeat
    fact := fact * x;
    x := x - 1
  until x = 0;
  write fact { output factorial of x }
end

Syntax Tree for Factorial Program
[Figure: syntax tree for the factorial program]

Example
Quadruples for the factorial program:
(rd, x, _, _)
(gt, x, 0, t1)
(if_f, t1, L1, _)
(asn, 1, fact, _)
(lab, L2, _, _)
(mul, fact, x, t2)
(asn, t2, fact, _)
(sub, x, 1, t3)
(asn, t3, x, _)
(eq, x, 0, t4)
(if_f, t4, L2, _)
(wri, fact, _, _)
(lab, L1, _, _)
(halt, _, _, _)

Different Representation
Instructions themselves represent temporaries.
This reduces the number of address fields from three to two.
Such a representation is called a triple.
The amount of space is reduced.
Major drawback: in an array representation, moving instructions becomes difficult, because triples refer to one another by position.
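The step from quadruples to triples can be sketched in Python as follows. This is only an illustration under stated assumptions: quadruples are plain 4-tuples of strings, "_" marks an empty field, a non-empty fourth field always names a temporary that is defined exactly once, and every label is followed by at least one instruction. The function name `quads_to_triples` is hypothetical.

```python
def quads_to_triples(quads):
    """Turn quadruples into triples that refer to each other by position."""
    index_of = {}   # temporary or label name -> index of a triple
    kept = []       # triples before references are rewritten

    # Pass 1: drop label pseudo-instructions and record positions.
    for op, a1, a2, a3 in quads:
        if op == "lab":
            index_of[a1] = len(kept)   # label names the next kept instruction
            continue
        if a3 != "_":
            index_of[a3] = len(kept)   # the instruction itself stands for the temporary
        kept.append((op, a1, a2))

    # Pass 2: replace temporaries and labels by "(index)" references.
    def ref(addr):
        return f"({index_of[addr]})" if addr in index_of else addr

    return [(op, ref(a1), ref(a2)) for op, a1, a2 in kept]


quads = [
    ("rd", "x", "_", "_"), ("gt", "x", "0", "t1"), ("if_f", "t1", "L1", "_"),
    ("asn", "1", "fact", "_"), ("lab", "L2", "_", "_"),
    ("mul", "fact", "x", "t2"), ("asn", "t2", "fact", "_"),
    ("sub", "x", "1", "t3"), ("asn", "t3", "x", "_"),
    ("eq", "x", "0", "t4"), ("if_f", "t4", "L2", "_"),
    ("wri", "fact", "_", "_"), ("lab", "L1", "_", "_"),
    ("halt", "_", "_", "_"),
]
triples = quads_to_triples(quads)
for i, t in enumerate(triples):
    print(f"({i})", t)
```

Two passes are used because a jump may refer to a label that is defined later in the sequence. The positional references are also what makes reordering triples awkward: moving one instruction invalidates every index that points past it.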
Example
Quadruples              Triples
(rd, x, _, _)           (0)  (rd, x, _)
(gt, x, 0, t1)          (1)  (gt, x, 0)
(if_f, t1, L1, _)       (2)  (if_f, (1), (11))
(asn, 1, fact, _)       (3)  (asn, 1, fact)
(lab, L2, _, _)
(mul, fact, x, t2)      (4)  (mul, fact, x)
(asn, t2, fact, _)      (5)  (asn, (4), fact)
(sub, x, 1, t3)         (6)  (sub, x, 1)
(asn, t3, x, _)         (7)  (asn, (6), x)
(eq, x, 0, t4)          (8)  (eq, x, 0)
(if_f, t4, L2, _)       (9)  (if_f, (8), (4))
(wri, fact, _, _)       (10) (wri, fact, _)
(lab, L1, _, _)
(halt, _, _, _)         (11) (halt, _, _)

P-Code
Standard assembly language code produced by Pascal compilers in the 1970s and 1980s.
Designed for a hypothetical stack machine, called the P-machine.
Interpreters were written for actual machines.
This made Pascal compilers easily portable: only the interpreter must be rewritten for a new platform.
Modifications of P-code are used in a number of compilers, mostly for Pascal-like languages.

P-Machine
Consists of
- a code memory
- an unspecified data memory for named variables
- a stack for temporary data
- registers needed to maintain the stack and support execution.
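To make the stack-machine idea concrete, here is a minimal Python sketch of an interpreter for a small straight-line subset of P-code. It is an illustration, not the actual P-machine: the data memory is modeled as a dictionary from variable names to values, an address is represented by the variable name itself, and jumps, labels, and I/O are left out. The mnemonics follow the usual P-code convention: ldc loads a constant, lod loads a variable's value, lda loads a variable's address, adi, sbi, and mpi are integer addition, subtraction, and multiplication, and sto stores the top value at the address just below it. The function name `run_pcode` is hypothetical.

```python
def run_pcode(program, memory):
    """Execute straight-line P-code; return the final stack."""
    stack = []
    for op, *args in program:
        if op == "ldc":                 # push a constant
            stack.append(args[0])
        elif op == "lod":               # push the value of a variable
            stack.append(memory[args[0]])
        elif op == "lda":               # push the "address" (here: the name)
            stack.append(args[0])
        elif op in ("adi", "sbi", "mpi"):
            right = stack.pop()         # top of stack is the right operand
            left = stack.pop()
            if op == "adi":
                stack.append(left + right)
            elif op == "sbi":
                stack.append(left - right)
            else:
                stack.append(left * right)
        elif op == "sto":               # store top at the address below it, pop both
            value = stack.pop()
            name = stack.pop()
            memory[name] = value
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack


# 2*a+(b-3) with a = 5 and b = 10
expr = [("ldc", 2), ("lod", "a"), ("mpi",),
        ("lod", "b"), ("ldc", 3), ("sbi",), ("adi",)]
expr_stack = run_pcode(expr, {"a": 5, "b": 10})
print(expr_stack)

# x := y + 1 with y = 4
mem = {"y": 4}
assign_stack = run_pcode([("lda", "x"), ("lod", "y"), ("ldc", 1),
                          ("adi",), ("sto",)], mem)
print(mem["x"])
```

No temporary names appear anywhere in the programs: every intermediate value lives on the stack, which is why the arithmetic instructions take no operands.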
Example 1
2*a+(b-3)
    ldc 2     ; load constant 2
    lod a     ; load value of variable a
    mpi       ; integer multiplication
    lod b     ; load value of variable b
    ldc 3     ; load constant 3
    sbi       ; integer subtraction
    adi       ; integer addition

Example 2
x:=y+1
    lda x     ; load address of x
    lod y     ; load value of y
    ldc 1     ; load constant 1
    adi       ; add
    sto       ; store top to address below top & pop both

Factorial Program
    lda x     ; load address of x
    rdi       ; read an integer, store to address on top of the stack (& pop it)
    lod x     ; load the value of x
    ldc 0     ; load constant 0
    grt       ; pop and compare top two values, push the Boolean result
    fjp L1    ; pop Boolean value, jump to L1 if false
    lda fact  ; load address of fact
    ldc 1     ; load constant 1
    sto       ; pop two values, storing the first to address represented by second
    lab L2    ; definition of label L2
    lda fact  ; load address of fact
    lod fact  ; load value of fact
    lod x     ; load value of x
    mpi       ; multiply
    sto       ; store top to address of second & pop
    lda x     ; load address of x
    lod x     ; load value of x
    ldc 1     ; load constant 1
    sbi       ; subtract
    sto
    lod x
    ldc 0
    equ       ; test for equality
    fjp L2    ; jump to L2 if false
    lod fact
    wri
    lab L1
    stp

P-Code and Three-Address Code
P-code is closer to actual machine code.
Instructions require fewer addresses: “one-address” or “zero-address”.
Less compact in terms of the number of instructions.
Not “self-contained”: instructions operate implicitly on a stack.
All temporary values are on the stack, so there is no need for temporary names.

Generation of Target Code
Involves two standard techniques
1. Macro expansion: replaces each intermediate code instruction with an equivalent sequence of target code instructions.
2. Static simulation: straight-line simulation of the effects of the intermediate code, generating target code to match these effects.
Example
Grammar:
    exp → id = exp | aexp
    aexp → aexp + factor | factor
    factor → (exp) | num | id

Expression: (x=x+3)+4

P-code              Three-address code
    lda x           t1 = x+3
    lod x           x = t1
    ldc 3           t2 = t1+4
    adi
    stn
    ldc 4
    adi

Static Simulation
The P-code above is translated into three-address code by simulating the stack.
1. After lda x, lod x, ldc 3 the simulated stack holds, from the top: 3, x, address of x. The adi instruction generates t1 = x+3 and leaves: t1, address of x.
2. The stn instruction generates x = t1 and leaves: t1.
3. After ldc 4 the stack holds, from the top: 4, t1. The adi instruction generates t2 = t1+4 and leaves: t2.

Macro Expansion
The three-address code above is translated into P-code one instruction at a time.

t1 = x+3            x = t1              t2 = t1+4
    lda t1              lda x               lda t2
    lod x               lod t1              lod t1
    ldc 3               sto                 ldc 4
    adi                                     adi
    sto                                     sto
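Macro expansion from three-address code to P-code can be sketched in a few lines of Python. The sketch assumes a hypothetical tuple encoding that is not part of the slides: `(x, y, op, z)` stands for x = y op z and `(x, y)` stands for the copy x = y, with integer operands treated as constants and strings as variable names. The names `OPCODES`, `load`, and `expand` are likewise illustrative.

```python
# Mapping from arithmetic operators to integer P-code instructions.
OPCODES = {"+": "adi", "-": "sbi", "*": "mpi"}


def load(operand):
    """Instruction that pushes an operand: ldc for constants, lod for variables."""
    return f"ldc {operand}" if isinstance(operand, int) else f"lod {operand}"


def expand(instr):
    """Expand one three-address instruction into a fixed P-code sequence."""
    target, *rhs = instr
    code = [f"lda {target}"]            # address of the destination goes first
    if len(rhs) == 1:                   # copy: x = y
        code.append(load(rhs[0]))
    else:                               # binary operation: x = y op z
        left, op, right = rhs
        code += [load(left), load(right), OPCODES[op]]
    code.append("sto")                  # store the value at the destination address
    return code


three_address = [("t1", "x", "+", 3), ("x", "t1"), ("t2", "t1", "+", 4)]
pcode = [line for instr in three_address for line in expand(instr)]
print("\n".join(pcode))
```

Each instruction is expanded without looking at its neighbors, so the result stores and reloads t1 even though its value was just on the stack. For this example the expansion has 13 instructions, compared with 7 in the P-code written directly for (x=x+3)+4.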