Top-Down Parsing: id + id * id
E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id
Top-Down Parsing
We will look at two different ways to implement a top-down parser:
A recursive-descent parser, which may backtrack.
A predictive parser, which chooses the production to apply solely on the
basis of the next input symbol and the current nonterminal being processed.
Predictive Parsers
Like recursive descent, but the parser can
“predict” which production to use
By looking at the next few tokens
No backtracking
Predictive parsers accept LL(k) grammars
The first L means “left-to-right” scan of the input
The second L means “leftmost derivation”
k means “prediction based on k tokens of lookahead”
In practice, LL(1) grammars are used
Informally, an LL(1) grammar has no left-recursive productions
and has been left-factored (necessary, though not sufficient, conditions).
Cont’d
Note also that there exist many grammars that cannot
be modified to become LL(1).
In such cases, another parsing technique must be
employed, or special rules must be embedded into the
predictive parser.
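To make prediction with one token of lookahead concrete, here is a minimal sketch of a predictive recursive-descent recognizer for the expression grammar on the opening slide. The function name parse_expression, the token spellings, and the use of "$" as end marker are assumptions made for this illustration. It only accepts or rejects; it does not build a parse tree.

```python
def parse_expression(tokens):
    """Return True if `tokens` is a sentence of the expression grammar."""
    toks = list(tokens) + ["$"]          # "$" is the assumed end marker
    pos = 0

    def peek():
        return toks[pos]

    def match(expected):
        nonlocal pos
        if toks[pos] != expected:
            raise SyntaxError(f"expected {expected!r}, found {toks[pos]!r}")
        pos += 1

    def E():                             # E -> T E'
        T()
        E_prime()

    def E_prime():                       # E' -> + T E' | ε
        if peek() == "+":
            match("+")
            T()
            E_prime()
        # any other lookahead: choose E' -> ε and consume nothing

    def T():                             # T -> F T'
        F()
        T_prime()

    def T_prime():                       # T' -> * F T' | ε
        if peek() == "*":
            match("*")
            F()
            T_prime()

    def F():                             # F -> ( E ) | id
        if peek() == "(":
            match("(")
            E()
            match(")")
        else:
            match("id")

    try:
        E()
        match("$")                       # all input must be consumed
        return True
    except SyntaxError:
        return False
```

Each function inspects only the next token to pick a production, so no input is ever rescanned. In this sketch an ε-production is taken for any other lookahead, and a bad token is reported by a later match call rather than at the ε choice.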
Recursive Descent
General form of top-down parsing which may involve
backtracking (making repeated scans of input)
A recursive-descent parser consists of several small
functions, one for each non terminal in the grammar.
As we parse a sentence, we call the functions that
correspond to the left side non terminal of the
productions we are applying.
If these productions are recursive, we end up calling
the functions recursively.
A typical procedure for a nonterminal A
void A() {
    choose an A-production, A -> X1 X2 … Xk;
    for (i = 1 to k) {
        if (Xi is a nonterminal)
            call procedure Xi();
        else if (Xi == current input symbol a)
            advance the input to the next symbol;
        else
            error();   /* a backtracking parser would try another A-production here */
    }
}
Recursive Descent - Example
S->cAd
A-> ab|a
Input string w=cad
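Parse trees built in top-down fashion
(a) Expand S->cAd and match c:
      S
    / | \
   c  A  d
(b) Try A->ab: a matches, but b does not match d, so backtrack:
      S
    / | \
   c  A  d
     / \
    a   b
(c) Try A->a: a matches, then d matches, and the parse succeeds:
      S
    / | \
   c  A  d
      |
      a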
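The backtracking in this example can be sketched in a few lines. The names GRAMMAR, derive, and accepts are hypothetical. Generators stand in for the "try the next alternative" step, which is one convenient way to express backtracking, not the only one. The sketch assumes the grammar has no left recursion; with left recursion it would recurse forever.

```python
# Grammar of the example: S -> cAd, A -> ab | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}

def derive(symbols, text, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Nonterminal: try each alternative in order. When a later symbol
        # fails, control returns here and the next alternative is tried.
        for alternative in GRAMMAR[head]:
            for mid in derive(alternative, text, pos):
                yield from derive(rest, text, mid)
    elif pos < len(text) and text[pos] == head:
        # Terminal: it must equal the current input symbol.
        yield from derive(rest, text, pos + 1)

def accepts(text):
    """True if the whole of `text` can be derived from S."""
    return any(end == len(text) for end in derive(["S"], text, 0))
```

On the input cad, the alternative A -> ab is tried first and fails at b, after which A -> a is tried from the same input position, mirroring trees (b) and (c).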
Calculating first sets
To calculate First(u) where u has the form
X1X2...Xn:
a) If X1 is a terminal, add X1 to First(u) and stop.
b) Otherwise X1 is a nonterminal, so add all non-ε
symbols of First(X1) to First(u).
If X1 is a nullable nonterminal, i.e., X1 =>* ε, also add the
non-ε symbols of First(X2) to First(u).
Furthermore, if X2 can also go to ε, then add the non-ε
symbols of First(X3), and so on.
Finally, add ε to First(u) if X1X2...Xn =>* ε.
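The rules above can be turned into a small fixed-point computation. The following is a sketch under stated assumptions: the grammar is the expression grammar from the opening slide, an empty list stands for an ε-production, and the names first_of_string and compute_first are chosen for this illustration.

```python
EPS = "ε"

# Expression grammar from the opening slide; [] is the ε-production.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """First(X1 X2 ... Xn), using the nonterminal First sets found so far."""
    result = set()
    for sym in symbols:
        if sym not in GRAMMAR:           # rule (a): a terminal ends the scan
            result.add(sym)
            return result
        result |= first[sym] - {EPS}     # rule (b): non-ε symbols of First(Xi)
        if EPS not in first[sym]:        # Xi is not nullable, so stop here
            return result
    result.add(EPS)                      # every Xi was nullable, or the string is empty
    return result

def compute_first():
    """Repeat the rules over all productions until no set grows."""
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

FIRST = compute_first()
```

Iterating to a fixed point avoids having to order the nonterminals by dependency: sets only grow, so the loop stops once a full pass adds nothing.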
Calculating follow sets
For each nonterminal in the grammar:
1. Place EOF in Follow(S), where S is the start symbol and
EOF is the input's right end marker.
The end marker might be end of file, newline, or a special
symbol, whatever is the expected end-of-input indication for this
grammar.
We will typically use $ as the end marker.
2. For every production A –> uBv, where u and v are any
strings of grammar symbols and B is a nonterminal,
everything in First(v) except ε is placed in Follow(B).
3. For every production A –> uB, or a production A –> uBv
where First(v) contains ε (i.e., v is nullable), everything
in Follow(A) is added to Follow(B).
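The three rules can be sketched as another fixed-point loop. To keep the block short, the First sets are written in by hand rather than computed; they are the values these slides list later for the add/multiply expression grammar. The names compute_follow and first_of_string are hypothetical.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["int"]],
}
START = "E"

# First sets of the nonterminals, copied by hand (an assumption of this sketch).
FIRST = {
    "E": {"(", "int"}, "T": {"(", "int"}, "F": {"(", "int"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}

def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})    # a terminal's First set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)                          # empty or fully nullable string
    return result

def compute_follow():
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)                   # rule 1
    changed = True
    while changed:
        changed = False
        for a, alternatives in GRAMMAR.items():
            for alt in alternatives:
                for i, b in enumerate(alt):
                    if b not in GRAMMAR:
                        continue             # only nonterminals are tracked here
                    first_v = first_of_string(alt[i + 1:])
                    new = first_v - {EPS}    # rule 2
                    if EPS in first_v:       # rule 3: v is empty or nullable
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow

FOLLOW = compute_follow()
```

Rule 3 is why a loop is needed: Follow(B) can depend on Follow(A), which may itself still be growing.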
Here is a complete example of first and follow set
computation, starting with this
grammar:
S –> AB
A –> Ca | ε
B –> BaAC | c
C –> b | ε
Notice we have a left-recursive production that must be
fixed if we are to use LL(1) parsing:
B –> BaAC | c
becomes
B –> cB'
B' –> aACB' | ε
The new grammar is:
S –> AB
A –> Ca | ε
B –> cB'
B' –> aACB' | ε
C –> b | ε
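The rewrite applied to B follows the usual pattern A –> Aα | β becoming A –> βA', A' –> αA' | ε. A minimal sketch of that step is below. It handles immediate left recursion only, and it assumes that the primed name is not already in use and that no alternative is just A itself. The function name is hypothetical.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A α | β as A -> β A', A' -> α A' | ε.

    `alternatives` is a list of right-hand sides, each a list of symbols;
    an empty list stands for ε.
    """
    # α parts: what follows the leading A in each left-recursive alternative
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # β parts: the alternatives that do not start with A
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}   # nothing to rewrite
    fresh = nonterminal + "'"                # assumed to be an unused name
    return {
        nonterminal: [beta + [fresh] for beta in others],
        fresh: [alpha + [fresh] for alpha in recursive] + [[]],
    }

result = eliminate_immediate_left_recursion("B", [["B", "a", "A", "C"], ["c"]])
# result == {"B": [["c", "B'"]], "B'": [["a", "A", "C", "B'"], []]}
```

Left recursion that passes through other nonterminals (A –> Bx, B –> Ay) needs a more general procedure that this sketch does not cover.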
It helps to first compute the nullable set (i.e., those
nonterminals X such that X =>* ε),
since you need to refer to the nullable status of various
nonterminals when computing the first and follow
sets:
Nullable(G) = {A, B', C}
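As a check on the nullable set, here is a short sketch that computes it for the rewritten grammar. A nonterminal is marked nullable once some alternative consists only of nullable symbols; an ε-production is the empty list, so it qualifies immediately. The name nullable_set is hypothetical.

```python
# The grammar after removing left recursion; [] is the ε-production.
GRAMMAR = {
    "S":  [["A", "B"]],
    "A":  [["C", "a"], []],
    "B":  [["c", "B'"]],
    "B'": [["a", "A", "C", "B'"], []],
    "C":  [["b"], []],
}

def nullable_set(grammar):
    nullable = set()
    changed = True
    while changed:                       # repeat until a pass adds nothing
        changed = False
        for nt, alternatives in grammar.items():
            if nt in nullable:
                continue
            # Terminals are never in `nullable`, so any alternative that
            # contains a terminal fails the all(...) test.
            if any(all(sym in nullable for sym in alt) for alt in alternatives):
                nullable.add(nt)
                changed = True
    return nullable

NULLABLE = nullable_set(GRAMMAR)
```

S is not nullable because B always produces at least the terminal c.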
The first sets for each non terminal are:
First(C) = {b, ε}
First(B') = {a, ε}
First(B) = {c}
First(A) = {b, a, ε}
Start with First(C) - ε, add a (since C is nullable) and ε
(since A itself is nullable).
First(S) = {b, a, c}
Start with First(A) - ε, add First(B) (since A is
nullable).
We don't add ε (since S itself is not nullable: A can
go away, but B cannot).
To compute the follow sets, take each nonterminal and go
through every production whose right-hand side contains it,
applying the rules given earlier:
Follow(S) = {$}
S doesn’t appear in the right hand side of any productions. We
put $ in the follow set because S is the start symbol.
Follow(B) = {$}
B appears on the right hand side of the S –> AB production.
Its follow set is the same as S.
Follow(B') = {$}
B' appears on the right hand side of two productions.
The B' –> aACB' production tells us its follow set includes the
follow set of B', which is tautological.
From B –> cB', we learn its follow set is the same as B.
Follow(C) = {a, $}
C appears in the right hand side of two productions.
The production A –> Ca tells us a is in the follow set.
From B' –> aACB' , we add the First(B') which is just a again.
Because B' is nullable, we must also add Follow(B') which is $.
Follow(A) = {c, b, a, $}
A appears in the right hand side of two productions.
From S –> AB we add First(B) which is just c.
B is not nullable.
From B' –> aACB' , we add First(C) which is b.
Since C is nullable, we also include First(B') which is a.
B' is also nullable, so we include Follow(B') which adds $.
Example
Recall the grammar
E→TX
X→+E|ε
T → ( E ) | int Y
Y→*T|ε
Follow sets
Follow( + ) = { int, ( }
Follow( * ) = { int, ( }
Follow( ( ) = { int, ( }
Follow( E ) = {), $}
Follow( X ) = {$, ) }
Follow( T ) = {+, ) , $}
Follow( ) ) = Follow( T ) = {+, ) , $}
Follow( Y ) = {+, ) , $}
Follow( int) = {*, +, ) , $}
Table-driven LL(1) Parsing
In a recursive-descent parser, the production
information is embedded in the individual parse
functions for each non terminal and the run-time
execution stack is keeping track of our progress
through the parse.
There is another method for implementing a
predictive parser that uses a table to store the
production information, along with an explicit stack to keep track of
where we are in the parse.
Example 1
This grammar for add/multiply expressions is already
set up to handle precedence and associativity:
E –> E + T | T
T –> T * F | F
F –> (E) | int
After removal of left recursion, we get:
E –> TE'
E' –> + TE' | ε
T –> FT'
T' –> * FT' | ε
F –> (E) | int
First(E) = First(T) = First(F) = { ( , int }
First(T') = { *, ε }
First(E') = { + , ε }
Follow(E) = Follow(E') = { $, ) }
Follow(T) = Follow(T') = { +, $, ) }
Follow(F) = (First(T') - ε) ∪ Follow(T) = { *, +, $, ) }
Algorithm: Construction of a predictive
parsing table
Input: Grammar G
Output: Parsing table M
Method:
For each production A –> α of the grammar, do the
following:
For each terminal a in FIRST(α), add A –> α to M[A, a].
If ε is in FIRST(α), then for each terminal b in
FOLLOW(A), add A –> α to M[A, b].
If ε is in FIRST(α) and $ is in FOLLOW(A), add A –> α to
M[A, $] as well.
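The method can be sketched directly for the Example 1 grammar. The First and Follow sets below are copied by hand from the values listed above, and the table is a dictionary keyed by (nonterminal, terminal). Both are choices made for this illustration, as are the names build_table and first_of_string. Because $ is stored as an ordinary member of the Follow sets, the second and third steps of the method collapse into one line.

```python
EPS, END = "ε", "$"

PRODUCTIONS = [
    ("E",  ["T", "E'"]),
    ("E'", ["+", "T", "E'"]),
    ("E'", []),                      # E' -> ε
    ("T",  ["F", "T'"]),
    ("T'", ["*", "F", "T'"]),
    ("T'", []),                      # T' -> ε
    ("F",  ["(", "E", ")"]),
    ("F",  ["int"]),
]

# Sets copied by hand from the slides (an assumption of this sketch).
FIRST = {
    "E": {"(", "int"}, "T": {"(", "int"}, "F": {"(", "int"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {END, ")"}, "E'": {END, ")"},
    "T": {"+", END, ")"}, "T'": {"+", END, ")"},
    "F": {"*", "+", END, ")"},
}

def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal's First set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def build_table():
    table = {}
    for head, body in PRODUCTIONS:
        first_body = first_of_string(body)
        lookaheads = first_body - {EPS}      # step 1: terminals in FIRST(α)
        if EPS in first_body:
            lookaheads |= FOLLOW[head]       # steps 2 and 3: FOLLOW(A), including $
        for terminal in lookaheads:
            if (head, terminal) in table:    # two productions in one cell
                raise ValueError(f"not LL(1): conflict at M[{head}, {terminal}]")
            table[(head, terminal)] = body
    return table

TABLE = build_table()
```

Keys that are absent from TABLE play the role of the empty (error) slots. A grammar is LL(1) exactly when this construction never places two productions in the same cell, which is what the ValueError guards against.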
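Parsing table M (rows: top of parse stack; columns: next input token; blank entries are errors)
     | int        | +          | *          | (          | )          | $
E    | E –> TE'   |            |            | E –> TE'   |            |
E'   |            | E' –> +TE' |            |            | E' –> ε    | E' –> ε
T    | T –> FT'   |            |            | T –> FT'   |            |
T'   |            | T' –> ε    | T' –> *FT' |            | T' –> ε    | T' –> ε
F    | F –> int   |            |            | F –> (E)   |            |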
Execution
Input string = int + int
1. M[E, int] = E –> TE'
2. M[T, int] = T –> FT'
3. M[F, int] = F –> int (match int)
4. M[T', +] = T' –> ε
5. M[E', +] = E' –> +TE' (match +)
6. M[T, int] = T –> FT'
7. M[F, int] = F –> int (match int)
8. M[T', $] = T' –> ε
9. M[E', $] = E' –> ε (input accepted)
Example 2
Recall the grammar
E→TX
X→+E|ε
T → ( E ) | int Y
Y→*T|ε
First sets
First( T ) = {int, ( }
First( E ) = {int, ( }
First( X ) = {+, ε }
First( Y ) = {*, ε }
Follow Sets
Follow( E ) = {), $}
Follow( T ) = {+, ) , $}
Follow( X ) = {$, ) }
Follow( Y ) = {+, ) , $}
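Parsing table (rows: top of parse stack; columns: next input token; blank entries are errors)
     | int        | +          | *          | (          | )          | $
E    | E → TX     |            |            | E → TX     |            |
T    | T → int Y  |            |            | T → ( E )  |            |
X    |            | X → + E    |            |            | X → ε      | X → ε
Y    |            | Y → ε      | Y → * T    |            | Y → ε      | Y → ε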
Non recursive Predictive parsing
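(Using Stack)
[Figure: model of a table-driven predictive parser. The input buffer holds a + b $. The stack holds X Y Z $, with X on top. The predictive parsing program reads the current input symbol and the top of the stack, consults parsing table M, and produces the output.]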
The input buffer contains the string to be parsed,
followed by the end-marker $.
Use stack explicitly, rather than implicitly via
recursive calls.
The symbol $ marks the bottom of the stack.
Initially the stack contains the start symbol of the
grammar on top of $.
Algorithm
Set input to point to the first symbol of w$.
Let X be the top of the stack and a be the current input token.
while (X ≠ $) /* stack is not empty */
{
    if (X is a) pop the stack and advance the input;
    else if (X is another terminal) error();
    else if (M[X, a] is an error entry) error();
    else if (M[X, a] = X –> Y1Y2 …Yk)
    {
        output the production X –> Y1Y2 …Yk;
        pop X from the stack;
        push Yk Yk-1 …Y2Y1 onto the stack, with Y1 on top;
    }
    set X to the top of the stack and a to the current input token;
}
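The algorithm above can be sketched as a short driver loop. The table entries are those of the Example 1 grammar, written as a dictionary in which a missing key is an error entry. The function name ll1_parse and the choice to return the list of applied productions are assumptions of this sketch.

```python
END = "$"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# M[nonterminal, terminal] -> right-hand side; missing keys are error entries.
TABLE = {
    ("E", "int"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],  ("E'", ")"): [],  ("E'", END): [],
    ("T", "int"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "+"): [],                ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [],                ("T'", END): [],
    ("F", "int"): ["int"],          ("F", "("): ["(", "E", ")"],
}

def ll1_parse(tokens, start="E"):
    """Return the productions applied, in order; raise SyntaxError on bad input."""
    buffer = list(tokens) + [END]
    pos = 0
    stack = [END, start]                 # the top of the stack is the end of the list
    output = []
    while stack[-1] != END:
        top, lookahead = stack[-1], buffer[pos]
        if top not in NONTERMINALS:      # terminal on top: it must match the input
            if top != lookahead:
                raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
            stack.pop()
            pos += 1
        elif (top, lookahead) not in TABLE:
            raise SyntaxError(f"no entry M[{top}, {lookahead}]")
        else:
            body = TABLE[(top, lookahead)]
            output.append((top, body))
            stack.pop()
            stack.extend(reversed(body)) # push Yk ... Y1 so that Y1 ends up on top
    if buffer[pos] != END:               # stack is empty but input remains
        raise SyntaxError(f"unexpected {buffer[pos]!r} after the end of the expression")
    return output
```

For the input int + int this applies nine productions, in the same order as the numbered execution steps. The final check for leftover input is an addition of this sketch: the pseudocode loop stops as soon as $ is on top of the stack.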
Parse Trace of int + int
Matched    | Stack       | Input        | Action
           | E $         | int + int $  |
           | TE' $       | int + int $  | E –> TE'
           | FT'E' $     | int + int $  | T –> FT'
           | int T'E' $  | int + int $  | F –> int
int        | T'E' $      | + int $      | Match int
int        | E' $        | + int $      | T' –> ε
int        | + TE' $     | + int $      | E' –> +TE'
int +      | TE' $       | int $        | Match +
int +      | FT'E' $     | int $        | T –> FT'
int +      | int T'E' $  | int $        | F –> int
int + int  | T'E' $      | $            | Match int
int + int  | E' $        | $            | T' –> ε
int + int  | $           | $            | E' –> ε
int + int  | $           | $            | Input accepted
Error detection in LL(1) parsing
An error is detected whenever an empty table slot is
encountered, or when the terminal on top of the stack does not match the next input token.
We would like our parser to be able to recover from an
error and continue parsing.
Phrase-level recovery
We associate each empty slot with an error-handling
procedure.
Panic mode recovery
Modify the stack and/or the input string to try to reach
a state from which we can continue.
Error recovery in LL(1) parsing
Panic mode recovery
Idea:
Decide on a set of synchronizing tokens.
When an error is found and there's a nonterminal at the top of
the stack, discard input tokens until a synchronizing token is
found.
Synchronizing tokens are chosen so that the parser can
recover quickly after one is found
e.g. a semicolon when parsing statements.
If there is a terminal at the top of the stack, we could try
popping it to see whether we can continue.
Assume that the input string is actually missing that terminal.
Possible synchronizing tokens for a nonterminal A
the tokens in FOLLOW(A)
When one is found, pop A off the stack and try to continue
the tokens in FIRST(A)
When one is found, resume parsing A from that token
tokens such as semicolons that terminate statements
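One way these ideas could be combined is sketched below for the Example 1 grammar. When a nonterminal A is on top of the stack and the table slot is empty, tokens are discarded until one has a table entry for A (parsing resumes) or is in FOLLOW(A) (A is popped). A mismatched terminal on top of the stack is popped as if it were missing from the input. The function name, the error messages, and the decision to treat $ as always synchronizing are assumptions of this sketch.

```python
END = "$"

TABLE = {
    ("E", "int"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],  ("E'", ")"): [],  ("E'", END): [],
    ("T", "int"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "+"): [],                ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [],                ("T'", END): [],
    ("F", "int"): ["int"],          ("F", "("): ["(", "E", ")"],
}
# Synchronizing sets: FOLLOW of each nonterminal, as listed in the slides.
FOLLOW = {
    "E": {END, ")"}, "E'": {END, ")"},
    "T": {"+", END, ")"}, "T'": {"+", END, ")"},
    "F": {"*", "+", END, ")"},
}

def parse_with_recovery(tokens, start="E"):
    """Parse to the end of the input and return a list of error messages."""
    buffer = list(tokens) + [END]
    pos, stack, errors = 0, [END, start], []
    while stack[-1] != END:
        top, lookahead = stack[-1], buffer[pos]
        if top not in FOLLOW:                    # terminal on top of the stack
            if top == lookahead:
                pos += 1
            else:
                errors.append(f"missing {top!r} before {lookahead!r}")
            stack.pop()                          # matched, or assumed missing
        elif (top, lookahead) in TABLE:
            stack.pop()
            stack.extend(reversed(TABLE[(top, lookahead)]))
        else:
            errors.append(f"unexpected {lookahead!r} while parsing {top}")
            # Discard tokens until one can start `top` or can follow it.
            while ((top, buffer[pos]) not in TABLE
                   and buffer[pos] not in FOLLOW[top]
                   and buffer[pos] != END):
                pos += 1
            if (top, buffer[pos]) not in TABLE:
                stack.pop()                      # synchronized on FOLLOW(top): give up on it
    if buffer[pos] != END:
        errors.append(f"unexpected {buffer[pos]!r} after the end of the input")
    return errors
```

For example, on the input int + * int the parser reports one error at the stray *, skips it, and then parses the remaining int normally instead of stopping at the first problem.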
Thank You !!!!!!