Compiler Structures
241-437, Semester 1, 2011-2012
5. Top-down Parsing
• Objective
– look at top-down (LL) parsing using recursive descent and tables
– consider a recursive descent parser for the Expressions language
241-437 Compilers: topDown/5
Overview
1. Parsing with a Syntax Analyzer
2. Creating a Recursive Descent Parser
3. The Expressions Language Parser
4. LL(1) Parse Tables
5. Making a Grammar LL(1)
6. Error Recovery in LL Parsing
[Figure: the compiler phases.
Front End: Source Program -> Lexical Analyzer -> Syntax Analyzer -> Semantic Analyzer -> Int. Code Generator -> Intermediate Code
Back End: Code Optimizer -> Target Code Generator -> Target Lang. Prog.
In this lecture: the Syntax Analyzer, but concentrating on top-down parsing.]
1. Parsing with a Syntax Analyzer
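[Figure: how the Syntax Analyzer and the Lexical Analyzer work together.
1. The Syntax Analyzer asks the Lexical Analyzer to get the next token.
2. The Lexical Analyzer gets chars from the Source Program (using chars) to make a token; it reports lexical errors.
3. The token and token value are passed back to the Syntax Analyzer (using tokens), which produces the parse tree; it reports syntax errors.]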
1.1. Top Down (LL) Parsing
Grammar:
B => begin SS end
SS => S ; SS
SS => e
S => simplestmt
S => begin SS end

[Figure: the parse tree for "begin simplestmt ; simplestmt ; end", rooted at B. The numbers 1 to 6 show the order in which the tree is built, top-down and left-to-right.]
1.2. LL Parsing Definition
•
An LL parser is a top-down parser for a
context-free grammar.
•
It parses input from Left to right, and
constructs a Leftmost derivation of the
input.
A Leftmost Derivation
•
In a leftmost derivation, the leftmost nonterminal is chosen to be expanded.
– this builds the parse tree top-down, left-to-right
•
Example grammar:
L => ( L ) L
L => e
Leftmost Derivation for (())()
Input: ( ( ) ) ( )

L
( L ) L             // L => ( L ) L
( ( L ) L ) L       // L => ( L ) L
( ( ) L ) L         // L => e
( ( ) ) L           // L => e
( ( ) ) ( L ) L     // L => ( L ) L
( ( ) ) ( ) L       // L => e
( ( ) ) ( )         // L => e
1.3. LL(1) and LL(k)
•
An LL(1) parser uses the current token only
to decide which production to use next.
•
An LL(k) parser uses k tokens of input to
decide which production to use
– this makes the grammar easier to write
– for k > 1, it can handle some grammars that LL(1) cannot
– harder to implement efficiently
1.4. Two LL Implementation
Approaches
•
Recursive Descent parsing
– all the compiler code is generated
(automatically) from the grammar
•
Table Driven parsing
– a table is generated (automatically) from the
grammar
– the table is 'plugged' into an existing compiler
2. Creating a Recursive Descent
Parser
•
Each non-terminal (e.g. A) is translated into
a parsing function (e.g. A()).
•
The A() function is generated from all the
productions for A:
– A => B, A => a C, etc.
2.1. Basic Translation Rules
•
I'll start by assuming a production body
doesn't use *, [], or e.
– I'll add to the translation rules later to deal with
these extra features
•
S => Body
becomes
void S()
{ translate< Body > }
•
If Body is
B1 B2 . . . Bn
then it becomes:
translate< B1 > ;
translate< B2 > ;
:
translate< Bn > ;
•
If Body is
B1 | B2 . . . | Bn
then it becomes:
if (currToken in FIRST_SEQ<B1>)
translate<B1> ;
else if (currToken in FIRST_SEQ<B2>)
translate<B2> ;
:
else if (currToken in FIRST_SEQ<Bn>)
translate<Bn> ;
else
error();
•
currToken is the current token, which is
obtained from the lexical analyzer:
Token currToken;     // global

void nextToken(void)
{ currToken = scanner(); }
•
The first token is read when the parser first
starts. main() also calls the function
representing the start symbol:
int main(void)
{
nextToken();
S(); // S is the grammar's start symbol
:
// other code
return 0;
}
•
error() reports that the current token cannot
be matched against any production:
int lineNum;     // global

void error()
{
  // tokSyms[] maps a token to its printable name (it is used again in exprParse0.c)
  printf("\nSyntax error at '%s' on line %d\n",
         tokSyms[currToken], lineNum);
  exit(1);
}
•
In a body, if B is a non-terminal, it is
translated into the function call:
B();
•
In a body, if b is a terminal, it is translated
into a match() call:
match(b);
•
match() checks that the current token is
what is expected (e.g. b), and reads in the
next one for future testing:
void match(Token expected)
{
if(currToken == expected)
currToken = scanner();
else
error();
}
• Special '|' Body case. If Body is
    a1 B1 | a2 B2 . . . | an Bn     // ai's are terminals
  (a1, a2, ..., an must be different) then it becomes:
if (currToken == a1) {
  match(a1); translate<B1> ; }
else if (currToken == a2) {
  match(a2); translate<B2> ; }
   :
else if (currToken == an) {
  match(an); translate<Bn> ; }
else
  error();
2.2. Example Translation
Grammar:
S => a B | b C
B => b b C
C => c c

void S() {          // S => a B | b C
  if (currToken == a) {
    match(a); B();
  }
  else if (currToken == b) {
    match(b); C();
  }
  else error();
}

void B() {          // B => b b C
  match(b);
  match(b);
  C();
}

void C() {          // C => c c
  match(c);
  match(c);
}

And main(), nextToken(), match(), and error().
Parsing "abbcc"
Input: a b b c c
Function calls:
main() -->
  S() -->
    match(a);
    B() -->
      match(b);
      match(b);
      C() -->
        match(c);
        match(c)

[Figure: the parse tree. S has children a and B; B has children b, b and C; C has children c and c.]
2.3. When can we use Recursive
Descent?
•
A fast/efficient recursive descent parser can
be generated for an LL(1) grammar.
•
So we must first check if the grammar is
LL(1).
– the check will generate information that can be
used in constructing the parser
– e.g. FIRST_SEQ<...>
Dealing with "if"
•
A tricky part of LL(1) is making sure that
branches can be coded
– each branch must start differently so it's easy
(and also fast) to decide which branch to use
based only on the current input token
(currToken value)
• e.g. when currToken is a (the input is a .. .. .. ..)
– A --> a B1
  A --> b B2
  is okay since the two branches start differently (a and b)
– A --> a B1
  A --> a B2
  is not okay since both branches start the same way
•
In non-mathematical words, a grammar is
LL(1) if the choice between productions can
be made by looking only at the start of the
production bodies and the current input
token (currToken).
Is a Grammar LL(1)? (in maths)
• For every non-terminal in the language (e.g. A, B, C), generate the PREDICT set for all the productions:
PREDICT( A => a1 )   PREDICT( A => a2 )   PREDICT( A => a3 )
PREDICT( B => b1 )   PREDICT( B => b2 )
PREDICT( C => c1 )   ...
• Take the intersection of all pairs of sets for A:
PREDICT( A => a1 ) ∩ PREDICT( A => a2 )
PREDICT( A => a1 ) ∩ PREDICT( A => a3 )
PREDICT( A => a2 ) ∩ PREDICT( A => a3 )
– the intersection of every pair must be empty (disjoint)
• Repeat for all the sets for B, C, etc.:
– B --> b1    B --> b2
– C --> c1    C --> c2    C --> c3
• If every pair of PREDICT sets is disjoint then the grammar is LL(1).
•
If there's only one PREDICT set for a nonterminal (e.g. D --> d1), then it's
automatically disjoint.
Calculating PREDICT
• PREDICT(A => a)
    = (FIRST_SEQ(a) – {e}) ∪ FOLLOW(A)     if e in FIRST_SEQ(a)
  or
    = FIRST_SEQ(a)                          if e not in FIRST_SEQ(a)
• FIRST_SEQ() and FOLLOW() are the set functions I described in chapter 4.
Short Example 1
• S => a S | a
• Production     Predict
  S => a S       {a}
  S => a         {a}
• PREDICT(S) = {a} ∩ {a} == {a}
– not disjoint
– the grammar is not LL(1)
Short Example 2
• S => a S | b
• Production     Predict
  S => a S       {a}
  S => b         {b}
• PREDICT(S) = {a} ∩ {b} == {}
– disjoint
– the grammar is LL(1)
Larger Example
• Is this grammar LL(1)?
E => T E1
E1 => + T E1 | e
T => F T1
T1 => * F T1 | e
F => id | '(' E ')'

FIRST(F) = {(,id}
FIRST(T) = {(,id}
FIRST(E) = {(,id}
FIRST(T1) = {*,e}
FIRST(E1) = {+,e}
FOLLOW(E) = {$,)}
FOLLOW(E1) = {$,)}
FOLLOW(T) = {+,$,)}
FOLLOW(T1) = {+,$,)}
FOLLOW(F) = {*,+,$,)}
Production         Predict
E => T E1          = FIRST(T) = {(,id}
E1 => + T E1       {+}
E1 => e            = FOLLOW(E1) = {$,)}
T => F T1          = FIRST(F) = {(,id}
T1 => * F T1       {*}
T1 => e            = FOLLOW(T1) = {+,$,)}
F => id            {id}
F => ( E )         {(}
• Are the PREDICT sets disjoint for all the non-terminals?
– PREDICT(E): {(,id}                  yes
– PREDICT(E1): {+} ∩ {$,)}            yes
– PREDICT(T): {(,id}                  yes
– PREDICT(T1): {*} ∩ {+,$,)}          yes
– PREDICT(F): {id} ∩ {(}              yes
• All disjoint, so the grammar is LL(1).
2.4. Extended Translation Rules
•
These extra rules allow a production body
to use *, [], or e.
•
S => Body
same as before
becomes
void S()
{ translate< Body > }
• If Body is
    B1 | B2 . . . | Bn | e     // optional e part
then it becomes:
if (currToken in FIRST_SEQ(B1))
  translate<B1> ;
else if (currToken in FIRST_SEQ(B2))
  translate<B2> ;
   :
else if (currToken in FIRST_SEQ(Bn))
  translate<Bn> ;
else              // include this else branch only if
  error();        // there's no e part in the grammar
• If Body is
    [ B1 B2 . . . Bn ]
then it becomes (rule []-1):
if (currToken in FIRST_SEQ(B1)) {
  translate<B1> ;
  translate<B2> ;
     :
  translate<Bn> ;
}
– [ B1 B2 ... Bn ] is the same as ( B1 B2 ... Bn ) | e
• A variant [] translation (rule []-2). If the body is
    [ B1 B2 . . . Bn ] C
then it can become:
if (currToken not in FIRST_SEQ(C)) {
  translate<B1> ;
  translate<B2> ;
     :
  translate<Bn> ;
}
translate<C> ;
– testing FIRST_SEQ(C) may be simpler code than testing FIRST_SEQ(B1)
• Another variant [] translation (rule []-3).
If the grammar rule is
    A => [ B1 B2 . . . Bn ]
then it becomes:
void A() {
  if (currToken not in FOLLOW(A)) {
    translate<B1> ;
    translate<B2> ;
       :
    translate<Bn> ;
  }
}
– testing FOLLOW(A) may be simpler code than testing FIRST_SEQ(B1)
• If Body is
    ( B1 B2 . . . Bn )*
then it becomes (rule *-1):
while (currToken in FIRST_SEQ(B1)) {
  translate<B1> ;
  translate<B2> ;
     :
  translate<Bn> ;
}
• A variant * translation (rule *-2). If the body is
    ( B1 B2 . . . Bn )* C
then it becomes:
while (currToken not in FIRST_SEQ(C)) {
  translate<B1> ;
  translate<B2> ;
     :
  translate<Bn> ;
}
translate<C> ;
– testing FIRST_SEQ(C) may be simpler code than testing FIRST_SEQ(B1)
• Another variant * translation (rule *-3).
If the grammar rule is
    A => ( B1 B2 . . . Bn )*
then it becomes:
void A() {
  while (currToken not in FOLLOW(A)) {
    translate<B1> ;
    translate<B2> ;
       :
    translate<Bn> ;
  }
}
– testing FOLLOW(A) may be simpler code than testing FIRST_SEQ(B1)
•
match() is slightly changed to deal with the
end of input symbol, $:
void match(Token expected)
{
if(currToken == expected) {
if (currToken != $)
currToken = scanner();
}
else
error();
}
Translation Example 1
•
The LL(1) Grammar:
E => T E1
E1 => [ '+' T E1 ]
T => F T1
T1 => [ '*' F T1 ]
F => id | '(' E ')'
This is the same grammar as on slides 34-36, so we know it's LL(1).
Generated Parser
void E()       // E => T E1
{ T(); E1(); }

void E1()      // E1 => ['+' T E1 ]
{
  // use rule []-1; the test is C code for "currToken in FIRST_SEQ(+)"
  if (currToken == '+') {
    match('+');
    T(); E1();
  }
}
void T()       // T => F T1
{ F(); T1(); }

void T1()      // T1 => ['*' F T1 ]
{
  // rule []-1; the test is C code for "currToken in FIRST_SEQ(*)"
  if (currToken == '*') {
    match('*');
    F(); T1();
  }
}
void F()
// F => id | '(' E ')'
{
  if (currToken == ID)
    match(ID);
  else if (currToken == '(') {
    match('(');
    E();
    match(')');
  }
  else
    error();
}
Parsing "a + b * c"
Input: a + b * c
Parse tree (children are indented under their parent):
E
  T
    F
      id (a)
    T1
      e
  E1
    +
    T
      F
        id (b)
      T1
        *
        F
          id (c)
        T1
          e
    E1
      e
Optimizations
•
It's possible to combine grammar rules
and/or parse functions, in order to simplify
the compiler.
•
For example, we can combine:
– E and E1
– T and T1
Translation Example 2
•
The previous LL(1) grammar can be
expressed using *:
E => T ( '+' T )*
T => F ( '*' F )*
F => id | '(' E ')'     // same as before
Generated Parser
void E()       // E => T ('+' T)*
{ T();
  while (currToken == '+') {     // rule *-1
    match('+'); T();
  }
}

void T()       // T => F ('*' F)*
{ F();
  while (currToken == '*') {     // rule *-1
    match('*'); F();
  }
}
void F()       // same as before
// F => id | '(' E ')'
{
  if (currToken == ID)
    match(ID);
  else if (currToken == '(') {
    match('(');
    E();
    match(')');
  }
  else
    error();
}
Parsing "a + b * c" Again
Parse tree (children are indented under their parent):
E
  T
    F
      id (a)
  +                  // done inside the E() loop
  T
    F
      id (b)
    *                // done inside the T() loop
    F
      id (c)
3. The Expressions Language Parser
•
Is this grammar LL(1)?
Stats => ( [ Stat ] \n )*
Stat => let ID = Expr | Expr
Expr => Term ( (+ | - ) Term )*
Term => Fact ( (* | / ) Fact ) *
Fact => '(' Expr ')' | Int | ID
3.1. FIRST and FOLLOW Sets
• First(Stats) = {let, (, Int, Id, \n, e}
• First(Stat) = {let, (, Int, Id}
• First(Expr) = {(, Int, Id}
• First(Term) = {(, Int, Id}
• First(Fact) = {(, Int, Id}

• Follow(Stats) = {$}
• Follow(Stat) = {\n}
• Follow(Expr) = {\n}
• Follow(Term) = {+, -, \n}
• Follow(Fact) = {*, /, +, -, \n}
3.2. PREDICT Sets
(the Disjoint answer is given once for each non-terminal)
Production                            Predict                Disjoint
Stats => ( [ Stat ] \n )*             {let,(,Int,Id,\n,$}    Yes
Stat => let ID = Expr                 {let}                  Yes
Stat => Expr                          {(,Int,Id}
Expr => Term ( (+ | - ) Term )*       {(,Int,Id}             Yes
Term => Fact ( (* | / ) Fact ) *      {(,Int,Id}             Yes
Fact => '(' Expr ')'                  {(}                    Yes
Fact => Int                           {Int}
Fact => Id                            {Id}
3.3. exprParse0.c
•
exprParse0.c is a recursive descent parser
generated from the expressions grammar.
•
It reads in an expressions program file.
•
Its output is a print-out of parse function
calls.
An Expressions Program (test1.txt)
5+6
let x = 2
3 + ( (x*y)/2) // comments
// y
let x = 5
let y = x /0
// comments
Usage
> gcc -Wall -o exprParse0 exprParse0.c
> ./exprParse0 < test1.txt
1: stats<
2: stat<expr<term<fact<num(5) >>'+' term<fact<num(6) >>>>
3: stat<'let' var(x) '=' expr<term<fact<num(2) >>>>
4: stat<expr<term<fact<num(3) >>'+' term<fact<'('
expr<term<fact<'(' expr<term>
5:
6: stat<'let' var(x) '=' expr<term<fact<num(5) >>>>
7: stat<'let' var(y) '=' expr<term<fact<var(x) >'/' fact<num(0)
>>>>
8:
9:
10: >'eof'
exprParse0.c Callgraph
[Figure: the callgraph of exprParse0.c. One group of functions is the lexical part (like exprTokens.c); the other group is generated from the grammar.]
Standard Token Functions
// globals (first used in exprToken.c)
Token currToken;
char tokString[MAX_IDLEN];
int tokStrLen = 0;
int currTokValue;
int lineNum = 1;     // no. of lines read in

void nextToken(void)
{ currToken = scanner(); }
void match(Token expected)
{
if(currToken == expected){
printToken();
// produces the parser's output
if(currToken != SCANEOF)
currToken = scanner();
}
else
printf("Expected %s, found %s on line %d\n",
tokSyms[expected], tokSyms[currToken],lineNum);
} // end of match()
void printToken(void)
{
  if (currToken == ID)
    printf("%s(%s) ", tokSyms[currToken], tokString);       // show token string
  else if (currToken == INT)
    printf("%s(%d) ", tokSyms[currToken], currTokValue);    // show value
  else if (currToken == NEWLINE)
    printf("%s%2d: ", tokSyms[currToken], lineNum);         // print newline token
  else
    printf("'%s' ", tokSyms[currToken]);                    // other tokens
} // end of printToken()
Syntax Error Reporting
void syntax_error(Token tok)
{ printf("\nSyntax error at '%s' on line %d\n",
         tokSyms[tok], lineNum);
  exit(1);
}
main()
int main(void)
{
  printf("%2d: ", lineNum);
  nextToken();
  statements();       // function for start symbol
  match(SCANEOF);     // check that program is finished at eof
  printf("\n\n");
  return 0;
}
Parsing Functions
void statements(void)
// Stats => ( [ Stat ] '\n' )*
{
  printf("stats<");
  while (currToken != SCANEOF) {     // rule *-3
    if (currToken != NEWLINE)        // rule []-2
      statement();
    match(NEWLINE);
  }
  printf(">");
} // end of statements()
void statement(void)
// Stat => ( 'let' ID '=' Expr ) | Expr
{
  printf("stat<");
  if (currToken == LET) {
    match(LET);
    match(ID);
    match(ASSIGNOP);
    expression();
  }
  else if ((currToken == LPAREN) ||
           (currToken == INT) || (currToken == ID))
    expression();
  else
    error();
  printf(">");
} // end of statement()
Complicated, but it can be optimized with some 'tricks'.
Version 1
void expression(void)
// Expr => Term ( ( '+' | '-' ) Term )*
{
  printf("expr<");
  term();
  while((currToken == PLUSOP) ||
        (currToken == MINUSOP)) {     // rule *-1
    if (currToken == PLUSOP)
      match(PLUSOP);
    else if (currToken == MINUSOP)
      match(MINUSOP);
    else
      error();
    term();
  }
  printf(">");
} // end of expression()
Version 2: simplified | code
void expression(void)
// Expr => Term ( ( '+' | '-' ) Term )*
{
  printf("expr<");
  term();
  while((currToken == PLUSOP) ||
        (currToken == MINUSOP)) {
    match(currToken);
    term();
  }
  printf(">");
} // end of expression()
Shorter, but also harder to understand!
Version 1
void term(void)
// Term => Fact ( ('*' | '/' ) Fact )*
{
  printf("term<");
  factor();
  while((currToken == MULTOP) ||
        (currToken == DIVOP)) {     // rule *-1
    if (currToken == MULTOP)
      match(MULTOP);
    else if (currToken == DIVOP)
      match(DIVOP);
    else
      error();
    factor();
  }
  printf(">");
} // end of term()
Version 2: simplified | code
void term(void)
// Term => Fact ( ('*' | '/' ) Fact )*
{
  printf("term<");
  factor();
  while((currToken == MULTOP) ||
        (currToken == DIVOP)) {
    match(currToken);
    factor();
  }
  printf(">");
} // end of term()
Shorter, but also harder to understand!
void factor(void)
// Fact => '(' Expr ')' | INT | ID
{
printf("fact<");
if(currToken == LPAREN) {
match(LPAREN);
expression();
match(RPAREN);
}
else if(currToken == INT)
match(INT);
else if (currToken == ID)
match(ID);
else
syntax_error(currToken);
printf(">");
} // end of factor()
4. LL(1) Parse Tables
• The format of a parse table:
– T[non-term][term]
– the rows are the non-terminals and the columns are the terminals
– the entry T[A][b] holds a production A => a with b ∈ PREDICT(A => a)
Other Data Structures
• Sequence of input tokens (ending with $).
• A parse stack to hold nonterminals and terminals that are being processed.
[Figure: a stack with push and pop at the top; it holds E on top of $.]
The Parsing Algorithm
push($); push(start_symbol);
currToken = scanner();
do {
  X = pop(stack);
  if (X is a terminal or $) {     // like match()
    if (X == currToken) {
      if (X != $)                 // don't read past the end of the input
        currToken = scanner();
    }
    else error();
  }
  else {    // X is a non-terminal
    if (T[X][currToken] == X => Y1 Y2 ...Ym ) {
      push(Ym); ... push (Y1);
    }
    else error();
  }
} while (X != $);
4.1. Table Parsing Example
•
Use the LL(1) grammar:
E => T E1
E1 => '+' T E1 | e
T => F T1
T1 => '*' F T1 | e
F => id | '(' E ')'
Parse Table Generation
Production           Predict
1: E => T E1         {(,id}
2: E1 => + T E1      {+}
3: E1 => e           {$,)}
4: T => F T1         {(,id}
5: T1 => * F T1      {*}
6: T1 => e           {+,$,)}
7: F => id           {id}
8: F => ( E )        {(}

NT/T   +    *    (    )    ID   $
E                1         1
E1     2              3         3
T                4         4
T1     6    5         6         6
F                8         7
Parsing "a + b * c $"
(" means the input is unchanged from the row above; the top of the stack is on the right)
Stack           Input     Action
$E              a+b*c$    E => T E1
$E1 T           "         T => F T1
$E1 T1 F        "         F => id
$E1 T1 id       "         match
$E1 T1          +b*c$     T1 => e
$E1             "         E1 => + T E1
$E1 T +         "         match
$E1 T           b*c$      T => F T1
$E1 T1 F        "         F => id
$E1 T1 id       "         match
$E1 T1          *c$       T1 => * F T1
$E1 T1 F *      "         match
$E1 T1 F        c$        F => id
$E1 T1 id       "         match
$E1 T1          $         T1 => e
$E1             "         E1 => e
$               "         Success!
5. Making a Grammar LL(1)
•
Not all context free grammars are LL(1).
•
We can tell if a grammar is not LL(1) by
looking at its PREDICT sets
– for an LL(1) grammar, the PREDICT sets for a
non-terminal will be disjoint
Example
Production       Predict
E => E + T       = FIRST(E) = {(,id}
E => T           = FIRST(T) = {(,id}
T => T * F       = FIRST(T) = {(,id}
T => F           = FIRST(F) = {(,id}
F => id          = {id}
F => ( E )       = {(}

• FIRST(F) = {(,id}
• FIRST(T) = {(,id}
• FIRST(E) = {(,id}
• FOLLOW(E) = {$,),+}
• FOLLOW(T) = {+,$,),*}
• FOLLOW(F) = {+,$,),*}
E and T are problems since their PREDICT sets are not disjoint.
Example of Disjoint Problem
•
•
Input "5 + b"
There are two productions to choose from:
E => E + T
E => T
•
Which should be chosen by looking only at
the current token "5"?
5.1. From non-LL(1) to LL(1)
• There are two main techniques for converting a non-LL(1) grammar to LL(1).
– but they don't work for every grammar
• 1. Left Factoring
– e.g. used on A => B a C D | B a C E
• 2. Transforming left recursion to right recursion
– e.g. used on E => E + T | T
5.2. Left Factoring
•
S => a B | a C
– to see the problem try choosing a production to
parse "a" in "andrew"
•
Change S to:
S => a S1
S1 => B | C
– now there is no difficult choice
•
In general:
A => a b1 | a b2 | . . . | a bn
becomes
A => a A1
A1 => b1 | b2 | . . . | bn
5.3. Why is Left Recursion a Problem?
•
Grammar:
A => A b
A => b
•
•
The input is "bbbb".
Using only the current token, "b", which
production should be used?
Remove Left Recursion
A => A a1 | A a2 | … | b1 | b2 | …
becomes
A => b1 A1 | b2 A1 | …
A1 => a1 A1 | a2 A1 | … | e
•
The left recursion is changed to right
recursion in the new A1 rule.
Example Translation
•
The left recursive grammar:
A => A b | b
becomes
A => b A1
A1 => b A1 | e
•
Try parsing the input string "bbbb" using
only the current token "b".
Fixing the E Grammar
•
The following E grammar is not LL(1):
E => E + T | T
T => T * F | F
F => id | ( E )
•
Try parsing "5 + b"
•
Eliminate left recursion in E and T:
E => T E1
E1 => + T E1 | e
T => F T1
T1 => * F T1 | e
F => id | ( E )
•
This version of the E grammar is LL(1), and
we've been using it for most of our examples.
5.4. Non-Immediate Left Recursion
• Ex: A1 => A2 a | b
      A2 => A1 c | A2 d
• Convert to immediate left recursion
– replace A1 in A2 productions by A1's definition:
      A1 => A2 a | b
      A2 => A2 a c | b c | A2 d
• Now eliminate left recursion in A2:
      A1 => A2 a | b
      A2 => b c A3
      A3 => a c A3 | d A3 | e
Example
A => B c | d
B => C f | B f
C => A e | g
• Replace C in B's production by C's defn:
B => A e f | g f | B f
• Replace A in B's production by A's defn:
B => B c e f | d e f | g f | B f
• Now grammar is:
A => B c | d
B => B c e f | d e f | g f | B f
C => A e | g
• Get rid of left recursion in B:
A => B c | d
B => d e f B1 | g f B1
B1 => c e f B1 | f B1 | e
C => A e | g
– if A is the start symbol, then the C production is never called, so can be deleted
6. Error Recovery in LL Parsing
• Simple answer:
– when there's an error, print a message and exit
• Better error recovery:
– 1. insert the expected token and continue
  • this approach can cause non-termination
– 2. keep deleting tokens until the parser gets a token in the FOLLOW set for the production that went wrong
  • see example on next slide
Example: E→T E1     (from slide 29)
void E()
{
  if (currToken in FIRST(T)) {     // error checking; FIRST(T) == {(,ID}
    T(); E1();
  }
  else {     // error reporting and recovery
    printf("Expecting one of FIRST(T)");
    while (currToken not in FOLLOW(E))     // FOLLOW(E) == {),$}
      currToken = scanner();               // skip input
  }
} // end of E()
C Code
void E()
{
if ((currToken == LPAREN) || (currToken == ID)) {
T(); E1();
}
else {
printf("Expecting ( or id");
while ( (currToken != RPAREN) &&
(currToken != SCANEOF))
currToken = scanner();
}
} // end of E()