generator llgen

LLGEN
Generator of syntax analyzers (parsers)
GENERATOR LLGEN
The main task of the LLgen generator is to generate a
parser (in C) that uses the recursive descent
method without backtracking;
The source code is generated by LLgen from
a file containing the specification;
In the specification file we can use an extended
notation for simple LL(1) grammars;
Because LLgen includes a built-in mechanism for
static and dynamic conflict resolution, it allows the
use of ambiguous grammars;
GENERATOR LLGEN
Organization of LLgen (diagram):
scan.l -> LEX -> scan.c
gram.g -> LLgen -> gram.c, Lpars.c, Lpars.h
scan.c, gram.c, Lpars.c, Lpars.h -> GCC -> scane.exe
file.txt -> scane.exe -> RESULT
GENERATOR LLGEN
flex -l scan.l (run the LEX generator)
result: lex.yy.c
LLgen gram.g (run the LLgen generator on the
specification file gram.g)
result: gram.c, Lpars.c and Lpars.h
gcc lex.yy.c Lpars.c gram.c (compilation in C)
./a.out < file.in (analysis of the file)
GENERATOR LLGEN
By default, the LLgen generator uses an external
lexical analyzer (generated by Lex). For this
purpose the function yylex() is used;
The file Lpars.h, which LLgen generates while it
runs, contains definitions that assign numeric
constants to the declared token names;
GENERATOR LLGEN
Ways to use another analyzer are as follows:
- Put the implementation of the scanner directly
in the grammar specification (inside '{' '}') or in
an external file;
- In the specification, indicate the
name of the function that LLgen is to use:
%lexical name_function;
- If necessary, include the file Lpars.h in
the lexical analyzer;
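A hand-written scanner of the kind described above could look roughly like the sketch below. The names my_lex and lex_set_input, the token codes TOK_WORD and TOK_END, and reading from a string instead of a file are all assumptions made for illustration; in a real project the token codes would come from Lpars.h and the function name would be given with %lexical. Returning 0 at the end of input mirrors what a Lex-generated yylex() does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes; in a real project they come from Lpars.h. */
enum { TOK_END = 0, TOK_WORD = 257 };

/* Input buffer and read position of this sketch (assumed names). */
static const char *lex_input = "";
static size_t lex_pos = 0;

/* Point the scanner at a new NUL-terminated input string. */
void lex_set_input(const char *text)
{
    assert(text != NULL);
    lex_input = text;
    lex_pos = 0;
}

/* Hand-written scanner; the grammar would name it with "%lexical my_lex;". */
int my_lex(void)
{
    /* Skip white space between tokens. */
    while (isspace((unsigned char)lex_input[lex_pos]))
        lex_pos++;

    unsigned char c = (unsigned char)lex_input[lex_pos];
    if (c == '\0')
        return TOK_END;            /* end of input */

    if (isalpha(c)) {              /* a word: a run of letters */
        while (isalpha((unsigned char)lex_input[lex_pos]))
            lex_pos++;
        return TOK_WORD;
    }

    lex_pos++;                     /* any other character is its own token */
    return c;
}
```

Single characters are returned as their own character code, which matches the way character terminals are written in quotes in the grammar.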
GENERATOR LLGEN
LLgen is a command-line tool. The specification for
LLgen is written as a plain text file,
sometimes split across several files;
Each specification file contains
productions, LLgen directives, and
declarations and code in C.
CREATE SPECIFICATION
Each production in a specification for
LLgen consists of a nonterminal, the
character ":" and the right-hand side of the
production. It ends with a semicolon;
Alternative right-hand sides of a production are
separated by "|";
The right-hand side of a production can consist of
terminals, nonterminals and semantic actions;
nonterminal : right-hand side of the production ;
CREATE SPECIFICATION
Rules for writing specifications:
- White space is ignored, but cannot occur
within a name;
- Comments are introduced by the characters
"//";
- Comments cannot be nested;
- Comments may occur anywhere a name
is allowed;
CREATE SPECIFICATION
Rules for writing specifications:
- The names of terminal and nonterminal
symbols can be of any length. They follow the same
syntax as C language identifiers;
- Symbol names must not conflict with
C keywords;
- Names are case-sensitive;
CREATE SPECIFICATION
Rules for writing specifications:
- Symbol names can be of any length, but
LLgen treats only the first 50 characters as significant;
- All names generated and used by LLgen begin
with the prefix LL;
DECLARATION OF TERMINAL
Terminals that are not single characters are declared with:
%token ken;
To declare multiple terminals
at once, we write:
%token name1, name2, name3;
Every terminal must be declared before it is
used;
DECLARATION OF TERMINAL
Terminals that are single characters are enclosed in
single quotes;
LLgen also recognizes (as C does) a set of special
literals:
new line          '\n'
tab               '\t'
carriage return   '\r'
apostrophe        '\''
backspace         '\b'
backslash         '\\'
octal number      '\xxx'
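To make the table of special literals concrete, here is a small C sketch that maps the text after the backslash to a character code, including the octal form. The helper name decode_escape is hypothetical, and the code only illustrates the C meaning of these literals. It is not taken from LLgen itself.

```c
#include <assert.h>
#include <stddef.h>

/* Decode the text that follows the backslash in a literal such as '\n'
 * or '\101'. Returns the character code, or -1 for an unknown escape. */
int decode_escape(const char *s)
{
    assert(s != NULL);
    switch (s[0]) {
    case 'n':  return '\n';   /* new line */
    case 't':  return '\t';   /* tab */
    case 'r':  return '\r';   /* carriage return */
    case 'b':  return '\b';   /* backspace */
    case '\'': return '\'';   /* apostrophe */
    case '\\': return '\\';   /* backslash */
    default:   break;
    }

    /* Octal form: one to three digits in the range 0-7. */
    int value = 0;
    size_t i = 0;
    while (i < 3 && s[i] >= '0' && s[i] <= '7') {
        value = value * 8 + (s[i] - '0');
        i++;
    }
    return i > 0 ? value : -1;
}
```

For instance, decode_escape("101") yields octal 101, which is the letter A on ASCII systems.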
DECLARATION OF TERMINAL
REMEMBER!!!
Suppose LLgen encounters in the specification
file a name that has not been declared as a
token. LLgen will treat this name as
a nonterminal symbol;
DECLARATION OF TERMINAL
Each nonterminal is implemented as a function
in the C language;
In LLgen we can use local variables. The
generator allows them to be declared, in
braces, only on the left-hand side of a production, right after the
nonterminal symbol, e.g.:
A {int ken;} : S ken T ;
DECLARATION OF TERMINAL
By a semantic action we mean any
single instruction (or group of instructions)
written in C and enclosed in braces;
In LLgen, semantic actions can be inserted only
on the right-hand side of a production, e.g.:
A {int counter;} : S ken {counter=1;} T ;
START NONTERMINAL
Parsers generated by LLgen may have
multiple start nonterminals;
A start nonterminal (also called the axiom)
is declared as follows:
%start function, name_of_nonterminal;
example:
%start parse, S;
COMMANDS OF COMPILATION
The command used to start the generator
is LLgen. It is invoked on a specification
file (extension .g), for example: LLgen
gram.g
LLgen produces three output files:
- gram.c: a C file that contains the implementation of
the parser;
- Lpars.h: a file containing the parser
interface;
- Lpars.c: the parser skeleton and control tables;
OPTION -V
When bringing up and testing a parser, it is
sometimes helpful to use the -v option;
With the -v option, LLgen generates the file
LL.output, which contains information about
the unresolved conflicts that have arisen in the
grammar;
EXTENSIONS OF GRAMMAR SYNTAX
Extensions to the syntax of context-free grammars:
- * (*count): closure;
- + (+count): positive closure;
- ?: optionality operator;
- [...]: grouping of symbols;
Example
Let ∑={a,b}. Consider the following
regular language L=L(b*a). Then:
Plain grammar:
S : B A
;
B :
| 'b' B
;
A : 'a'
;
With LLgen extensions:
S : B A
;
B : 'b' *
;
A : 'a'
;
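The following C sketch shows, under simplifying assumptions, what a recursive descent recognizer for the extended grammar S : B A ; B : 'b' * ; A : 'a' ; could look like when written by hand. It is not the code that LLgen emits; the function names and the string-based input are hypothetical. Each nonterminal becomes one function, and the closure becomes a loop driven by one character of lookahead.

```c
#include <assert.h>
#include <stddef.h>

/* Cursor over the input; the lookahead symbol is input[pos]. */
static const char *input;
static size_t pos;

/* B : 'b' * ;  consume as many 'b' as the lookahead allows. */
static void parse_B(void)
{
    while (input[pos] == 'b')
        pos++;
}

/* A : 'a' ;  returns 1 on success, 0 on a syntax error. */
static int parse_A(void)
{
    if (input[pos] != 'a')
        return 0;
    pos++;
    return 1;
}

/* S : B A ;  accept only if the whole input was consumed. */
int accepts_bstar_a(const char *text)
{
    assert(text != NULL);
    input = text;
    pos = 0;
    parse_B();
    return parse_A() && input[pos] == '\0';
}
```

The decision in parse_B needs no backtracking: seeing 'b' means "repeat", anything else means "leave the loop".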
Example
Let ∑={a,b}. Consider the following
language L={b, ab, aab, aaab}. Then:
Plain grammar:
S : A B
;
A :
| 'a' C
;
C :
| 'a' D
;
D :
| 'a'
;
B : 'b'
;
With LLgen extensions:
S : A B
;
A : 'a' *3
;
B : 'b'
;
Example
Let ∑={a,b}. Consider the following
language L={ab, aab, aaab}. Then:
Plain grammar:
S : A B
;
A : 'a' C
;
C :
| 'a' D
;
D :
| 'a'
;
B : 'b'
;
With LLgen extensions:
S : A B
;
A : 'a' +3
;
B : 'b'
;
Example
Let ∑={a,b}. Consider the following
language L={b, ab}. Then:
Plain grammar:
S : A B
;
A :
| 'a'
;
B : 'b'
;
With LLgen extensions:
S : A B
;
A : 'a' ?
;
B : 'b'
;
Example
Let ∑={a,b}. Consider the following
language L={A ∈ ∑* : |A|=2}. Then:
Plain grammar:
S : 'a' B
| 'b' B
;
B : 'a'
| 'b'
;
With LLgen extensions:
S : [ 'a' | 'b' ] +2
;
COMPARISON
Consider a grammar that is not a simple
LL(1) grammar.
Compare the effort of transforming
the grammar by hand with implementing
the grammar in the LLgen generator;
Let ∑={a,b}. We will write a program that
accepts the context-free language L={A ∈ ∑* :
A=a^n b^n ; n ∈ Ν};
COMPARISON
{
#include <stdio.h>
int quan_a, quan_b;
}
%start parse , S;
S : A B
{ if (quan_a == quan_b) puts("OK.");
else puts("Error"); }
;
COMPARISON
We remove the left recursion;
Before:
A : 'a' { quan_a=1; }
| A 'a' { quan_a++; }
;
B : 'b' { quan_b=1; }
| B 'b' { quan_b++; }
;
After:
A : 'a' { quan_a=1; }
| 'a' A { quan_a++; }
;
B : 'b' { quan_b=1; }
| 'b' B { quan_b++; }
;
COMPARISON
After left factoring:
A : 'a' C { quan_a++; }
;
C : { quan_a=0; }
| 'a' C { quan_a++; }
;
B : 'b' D { quan_b++; }
;
D : { quan_b=0; }
| 'b' D { quan_b++; }
;
COMPARISON
S : {quan_a=quan_b=0;} A B
{ if (quan_a == quan_b) puts("OK.");
else puts("Error"); }
;
A : [ 'a' {quan_a++;} ] +
;
B : [ 'b' {quan_b++;} ] +
;
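For comparison, the same check can be sketched directly in C. This is a hand-written approximation of the behaviour of the specification above, not LLgen output; the function name accepts_anbn and the string-based input are assumptions. The two loops play the role of the positive closures, and the counters are compared at the end exactly as in the semantic action of S.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if text has the form a^n b^n with n >= 1, otherwise 0. */
int accepts_anbn(const char *text)
{
    assert(text != NULL);
    size_t pos = 0;
    int quan_a = 0, quan_b = 0;

    /* A : [ 'a' {quan_a++;} ] + ; */
    while (text[pos] == 'a') {
        quan_a++;
        pos++;
    }

    /* B : [ 'b' {quan_b++;} ] + ; */
    while (text[pos] == 'b') {
        quan_b++;
        pos++;
    }

    /* '+' requires at least one repetition, and the input must end here. */
    if (quan_a == 0 || quan_b == 0 || text[pos] != '\0')
        return 0;

    /* The semantic action of S: compare the two counters. */
    return quan_a == quan_b;
}
```

Note that the grammar itself only describes a+ b+; the equality of the counts is enforced by the semantic action, not by the syntax.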
LLSYMB
LLsymb is a global integer variable whose
value depends on the position of the
read head within the right-hand side of the production.
Possible values:
- If the parser has just read a token, LLsymb
holds that token;
- After a grouping or an alternative, LLsymb
holds the remembered token;
CREATE SPECIFICATION
The specification file for the
LLgen generator should include an
implementation of the main function;
%start parse, S ;
int main(){
parse();
return 0;
}
CREATE SPECIFICATION
The specification file for the
LLgen generator should also include the
function LLmessage;
This function is called automatically by the
parser when a syntax error occurs;
void LLmessage ( int tk );
It does not return any value (void);
It has one integer parameter (int tk);
CREATE SPECIFICATION
The parameter tk takes the following values:
- tk > 0: the token tk was expected;
- tk = 0: an unexpected token was read and
has been removed;
- tk = -1: the end of the file was expected but not
encountered, and the remaining input will be ignored;
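A possible implementation of LLmessage that distinguishes the three cases listed above is sketched here. The helper describe_syntax_error and the wording of the messages are assumptions; only the signature void LLmessage(int tk) and the meaning of tk come from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a description of the error code tk into buf. */
const char *describe_syntax_error(int tk, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    if (tk > 0)
        snprintf(buf, size, "expected token %d", tk);
    else if (tk == 0)
        snprintf(buf, size, "unexpected token removed");
    else  /* tk == -1; in this sketch any negative value lands here */
        snprintf(buf, size, "end of file expected, rest of input ignored");
    return buf;
}

/* Called by the generated parser when a syntax error occurs. */
void LLmessage(int tk)
{
    char text[64];
    fprintf(stderr, "syntax error: %s\n",
            describe_syntax_error(tk, text, sizeof text));
}
```

Keeping the formatting in a separate helper makes the messages easy to check without running a generated parser.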
Example
The operation of the LLgen generator is best seen in
an example. The input is a sequence of
words over the natural alphabet; the words end
with a colon and are separated by commas. The
input contains at least one word....
CONFLICTS
While the parser generator is running,
the following conflicts can occur:
- We are not able to determine which of the right-hand
sides should be expanded: a conflict of
alternatives;
- The construct currently being parsed
includes a closure, and it is difficult to
determine whether the input is its
continuation or the start of another construct: a
conflict of repetitions;
CONFLICTS
A conflict of alternatives can be resolved in two
ways:
- dynamic resolution of the alternatives
conflict:
%if (condition)
- static resolution of the alternatives conflict:
%prefer ⇔ %if(1)
%avoid ⇔ %if(0)
Example
Consider the task of testing whether a binary
number is even;
The lexical analyzer recognizes and returns the
binary digits;
%%
[01] { return yytext[0]; }
Example
{
#include <stdio.h>
int read_number;
}
%start parse, S;
S : '0' { read_number = 0; } R
| '1' { read_number = 1; } R
;
R : %if (read_number == 0) { puts("even number"); }
| { puts("odd number"); }
| S
;
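The effect of this specification can be approximated in plain C as follows. The sketch is hand-written under stated assumptions (the function name binary_is_even and reading from a string are hypothetical) and only mirrors the idea of the grammar: remember the last digit in read_number and decide at the end of the input, which is the role the %if condition plays.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 for an even binary number, 0 for an odd one, -1 for bad input. */
int binary_is_even(const char *digits)
{
    assert(digits != NULL);
    int read_number = -1;            /* last digit seen, as in the grammar */

    for (size_t i = 0; digits[i] != '\0'; i++) {
        if (digits[i] != '0' && digits[i] != '1')
            return -1;               /* not a binary digit */
        read_number = digits[i] - '0';
    }

    if (read_number < 0)
        return -1;                   /* empty input: S needs one digit */

    /* Same test as "%if (read_number == 0)" in the specification. */
    return read_number == 0;
}
```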
SOLVING OF THE CONFLICT
An example of the static
resolution of an alternatives conflict is the so-called "dangling else" problem;
This issue will be discussed in detail during
the lecture devoted to the YACC generator;
SOLVING OF THE CONFLICT
To resolve a conflict of repetitions we may
use the keyword %while:
%while ( condition )
THE END
END OF THE SIXTH LECTURE