different phases of compiler

PRACTICAL NO. :- 1
DATE:-7.2.2013
AIM : To study the structure of a COMPILER.
THEORY : A compiler is a program that reads a program written in one language, the source
language, and translates it into an equivalent program in another language, the target language. As an
important part of this translation process, the compiler reports to its user the presence of errors in
the source program.
STRUCTURE OF A COMPILER : A compiler takes a source program as input and
produces an equivalent sequence of machine instructions as output. This process is so complex
that it is not reasonable, either from a logical point of view or from an implementation point of
view, to consider the compilation process as occurring in one single step. It is customary to
partition the compilation process into a series of subprocesses called PHASES. A phase is a
logically cohesive operation that takes as input one representation of the source program and
produces as output another representation.
DIFFERENT PHASES OF COMPILER
There are six phases of a compiler, which are listed below:
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis
4. Intermediate Code Generation
5. Code Optimization
6. Code Generation
Each of these phases is described below:
1). Lexical Analysis: The first phase, called the Lexical Analyzer or Scanner, separates the
characters of the source program into groups that logically belong together; these groups are called
tokens. The usual tokens are keywords (like DO or IF), identifiers (like X or NUM), operator
symbols (like <, >, <= or =) and punctuation symbols such as parentheses or commas. The
output of this phase is a stream of tokens, which is sent to the next phase, i.e. syntax analysis.
For example, suppose a source program contains the assignment statement
position = initial + rate * 60
The characters in this assignment could be grouped into the following lexemes and mapped into
the following tokens passed on to the syntax analyzer:
1. position is a lexeme that would be mapped into a token (id, 1), where id is an abstract symbol
standing for identifier and 1 points to the symbol-table entry for position. The symbol-table
entry for an identifier holds information about the identifier, such as its name and type.
2. The assignment symbol = is a lexeme that is mapped into the token (=). Since this token needs
no attribute-value, we have omitted the second component. We could have used any abstract
symbol such as assign for the token-name, but for notational convenience we have chosen to use
the lexeme itself as the name of the abstract symbol.
3. initial is a lexeme that is mapped into the token (id, 2), where 2 points to the symbol-table
entry for initial.
4. + is a lexeme that is mapped into the token (+).
5. rate is a lexeme that is mapped into the token (id, 3), where 3 points to the symbol-table entry
for rate.
6. * is a lexeme that is mapped into the token (*).
7. 60 is a lexeme that is mapped into the token (60).
position = initial + rate * 60
        |
        v
  Lexical Analyzer
        |
        v
(id, 1) (=) (id, 2) (+) (id, 3) (*) (60)
2). Syntax Analysis: The parser, or syntax analyzer, has two functions. First, it checks that the tokens
appearing in its input, which is the output of the first phase, occur in patterns that are permitted by the
specification for the source language. Second, it imposes on the tokens a tree-like structure: it makes the
implicit hierarchical structure of the incoming token stream explicit by identifying which parts of the
token stream should be grouped together. In the example, rate * 60 is grouped first because * binds more
tightly than +.
Ex.
(id, 1) (=) (id, 2) (+) (id, 3) (*) (60)
        |
        v
  Syntax Analyzer
        |
        v
           =
         /   \
   (id, 1)    +
            /   \
      (id, 2)    *
               /   \
         (id, 3)    60
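3). Semantic Analysis: The semantic analyzer uses the syntax tree to check the source program for
semantic errors and gathers type information for the later phases. Type checking is its major task: it
checks that each operator has operands of permitted types and, where the language allows it, inserts type
conversions. In the example, position, initial and rate are assumed to be floating-point variables while
60 is an integer, so the analyzer inserts an inttofloat operation that converts 60 to floating point.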
Ex:
           =
         /   \
   (id, 1)    +
            /   \
      (id, 2)    *
               /   \
         (id, 3)    60
        |
        v
  Semantic Analyzer
        |
        v
           =
         /   \
   (id, 1)    +
            /   \
      (id, 2)    *
               /   \
         (id, 3)    inttofloat
                        |
                        60
4). Intermediate Code Generator: It uses the syntax tree produced by the syntax and semantic analyzers
to create a stream of simple instructions. Many styles of intermediate code are possible. One
common style, three-address code, uses instructions with at most one operator on the right-hand side and
at most three operands. These instructions can be viewed as simple macros like the macro ADD2. The
primary difference between intermediate code and assembly code is that the intermediate code need not
specify the registers to be used for each operation.
Ex:
           =
         /   \
   (id, 1)    +
            /   \
      (id, 2)    *
               /   \
         (id, 3)    inttofloat
                        |
                        60
        |
        v
  Intermediate Code Generator
        |
        v
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
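5). Code Optimization: It is an optional phase designed to improve the intermediate code so
that the ultimate object program runs faster and/or takes less space. Its output is another
intermediate code program that does the same job as the original, but perhaps in a way that saves
time and/or space. In the example, the conversion of the constant 60 to floating point can be done once
at compile time, so the inttofloat operation is replaced by the constant 60.0; and t3 is used only to pass
its value to id1, so it can be eliminated.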
Ex:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
        |
        v
  Code Optimization
        |
        v
t1 = id3 * 60.0
id1 = id2 + t1
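6). Code Generation: The last phase is code generation. It produces the object code by
deciding on the memory locations for data, selecting code to access each datum, and selecting
the registers in which each computation is to be done. Designing a code generator that produces
truly efficient object programs is one of the most difficult parts of compiler design. In the target
code of the example, the F in each instruction indicates floating-point operands, the first operand of
each instruction is its destination, and the # marks 60.0 as an immediate constant.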
Ex:
t1 = id3 * 60.0
id1 = id2 + t1
        |
        v
  Code Generation
        |
        v
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
TABLE MANAGEMENT :
The table management, or bookkeeping, portion of the compiler keeps track of the names used
by the program and records essential information about each, such as its type (integer, real, etc.).
The data structure used to store this information is called a symbol table.
ERROR HANDLING :
The error handler is invoked when a flaw in the source program is detected. It must warn the
programmer by issuing a diagnostic, and adjust the information being passed from phase to
phase so that each phase can proceed. It is desirable that compilation be completed on flawed
programs, at least through the syntax analysis phase, so that as many errors as possible can be
detected in one compilation. Both the table management and error handling routines interact with
all phases of the compiler.
RESULT:- Study of the structure of a COMPILER is done.
PRACTICAL NO.:- 2
DATE:-7.2.2013
AIM:- Write a program to find the tokens in the string entered by the user.
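PROGRAM:
#include <stdio.h>
#include <string.h>

/* conio.h, clrscr() and getch() are Turbo C extensions and are not needed here */
int main(void)
{
    int i, n;
    char s[20];

    printf("Enter the string\n");
    /* fgets() replaces gets(), which cannot limit the input length */
    if (fgets(s, sizeof s, stdin) == NULL)
        return 1;
    s[strcspn(s, "\n")] = '\0';    /* remove the newline kept by fgets() */
    n = (int)strlen(s);
    printf("The length of given string is %d\n", n);
    printf("Tokens in given string are ->\n");
    /* every character of the string is printed as a separate token */
    for (i = 0; i < n; i++)
    {
        printf("Token%d=%c\n", i + 1, s[i]);
    }
    return 0;
}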
INPUT:-
OUTPUT:-
PRACTICAL NO.:- 3
DATE:-21.2.2013
AIM:- Write a program to check whether a string is valid or invalid for the grammar
S -> aBde
B -> ac / f / ^        (^ denotes the empty string)
PROGRAM:
#include <stdio.h>
#include <string.h>

int main(void)
{
    size_t n;
    char s[20];

    printf("We have a grammar that is\n S->aBde\n B->ac/f/^\n");
    printf("Enter the string\n");
    if (fgets(s, sizeof s, stdin) == NULL)
        s[0] = '\0';
    s[strcspn(s, "\n")] = '\0';    /* remove the newline kept by fgets() */
    n = strlen(s);

    if (n == 0)
    {
        printf("......ERROR...You did not enter any string......");
    }
    else if (s[0] == 'a' && s[n-1] == 'e')
    {
        /* the only strings derivable from S are aacde, afde and ade */
        if (n == 5 && s[1] == 'a' && s[2] == 'c' && s[3] == 'd')
            printf("String is valid");
        else if (n == 4 && s[1] == 'f' && s[2] == 'd')
            printf("String is valid");
        else if (n == 3 && s[1] == 'd')
            printf("String is valid");
        else
            printf(".....ERROR.....String is not valid.........");
    }
    else
    {
        printf(".......ERROR......String is not valid...........");
    }
    printf("\n");
    return 0;
}
Input:-
Output:-
Input:-
Output:-
PRACTICAL NO.:- 4
DATE:-28.2.2013
AIM:- Write a program for an NFA for the given grammar which reports the valid and invalid states of
the NFA. The given grammar is:
S -> AaB
A -> r / c
B -> t
PROGRAM:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char s[10];
    size_t l, i;
    int ok = 1;                 /* stays 1 while every transition so far is valid */

    printf("Given Grammar is\n\n");
    printf("          r/c        a            t\n\n");
    printf("----->q1------>q2--------->q3---------->q4\n\n");
    printf("Enter the string for the given grammar\n\n");
    if (fgets(s, sizeof s, stdin) == NULL)
        s[0] = '\0';
    s[strcspn(s, "\n")] = '\0';   /* remove the newline kept by fgets() */
    l = strlen(s);

    if (l == 3)
    {
        printf("You entered\n\n");
        for (i = 0; i < l; i++)
        {
            printf("\t\t%c", s[i]);
        }
        printf("\n--------->q1---------->q2------------>q3-------------->q4\n\n");

        /* q1 -> q2 on 'r' or 'c' */
        if (s[0] == 'c' || s[0] == 'r')
            printf("State1 is valid\n\n");
        else
        {
            printf("State1 is invalid\n\n");
            ok = 0;
        }
        /* q2 -> q3 on 'a', checked only if the previous move succeeded */
        if (ok && s[1] == 'a')
            printf("State2 is valid\n\n");
        else if (ok)
        {
            printf("State2 is invalid\n\n");
            ok = 0;
        }
        /* q3 -> q4 on 't'; q4 is the accepting state */
        if (ok && s[2] == 't')
            printf("State3 is valid\n\n");
        else if (ok)
        {
            printf("State3 is invalid\n\n");
            ok = 0;
        }

        if (ok)
            printf("So the string is accepted by the NFA");
        else
        {
            printf("........ERROR..........\n\n");
            printf("So the string is not accepted by the NFA");
        }
    }
    else
    {
        printf("\n\nERROR...........String length is not valid......... ");
    }
    printf("\n");
    return 0;
}
Input:-
Output:-
Input:-
Output:-
