lecture1 8

Top-Down Parsing: id + id * id
E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id
Top-Down Parsing
• We will look at two different ways to implement a top-down parser:
  - A recursive-descent parser, which may backtrack.
  - A predictive parser, which chooses the production to apply solely on the basis of the next input symbol and the nonterminal currently being processed.
Predictive Parsers
• Like recursive descent, but the parser can "predict" which production to use
  - by looking at the next few tokens
  - with no backtracking
• Predictive parsers accept LL(k) grammars
  - The first L means "left-to-right" scan of the input
  - The second L means "leftmost derivation"
  - k means "prediction based on k tokens of lookahead"
• In practice, LL(1) grammars are used.
• Informally, an LL(1) grammar has no left-recursive productions and has been left-factored (these conditions are necessary, though not sufficient on their own).
Cont'd
• Note also that there exist many grammars that cannot be modified to become LL(1).
• In such cases, another parsing technique must be employed, or special rules must be embedded into the predictive parser.
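As a rough illustration of prediction with one token of lookahead, the sketch below recognizes the expression grammar shown at the start of the lecture (E, E', T, T', F). The function name `parse_expr` and the token spelling ("id", "+", "*", "(", ")") are assumptions made for this example only.

```python
def parse_expr(tokens):
    """Predictive recursive-descent recognizer (hypothetical helper) for:
       E -> T E'      E' -> + T E' | eps
       T -> F T'      T' -> * F T' | eps
       F -> ( E ) | id
    """
    toks = list(tokens) + ["$"]   # "$" is the end marker
    pos = 0

    def peek():
        return toks[pos]

    def match(expected):
        nonlocal pos
        if toks[pos] != expected:
            raise SyntaxError(f"expected {expected!r}, got {toks[pos]!r} at position {pos}")
        pos += 1

    def E():
        T()
        E_prime()

    def E_prime():
        if peek() == "+":         # predict E' -> + T E'
            match("+")
            T()
            E_prime()
        # any other token: predict E' -> eps and consume nothing

    def T():
        F()
        T_prime()

    def T_prime():
        if peek() == "*":         # predict T' -> * F T'
            match("*")
            F()
            T_prime()

    def F():
        if peek() == "(":         # predict F -> ( E )
            match("(")
            E()
            match(")")
        else:                     # predict F -> id
            match("id")

    E()
    match("$")                    # the whole input must be consumed
    return True
```

Each function decides which production to use by inspecting only the next token, so no input is ever re-scanned.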
Recursive Descent
• General form of top-down parsing, which may involve backtracking (making repeated scans of the input).
• A recursive-descent parser consists of several small functions, one for each nonterminal in the grammar.
• As we parse a sentence, we call the functions that correspond to the left-side nonterminal of the productions we are applying.
• If these productions are recursive, we end up calling the functions recursively.
A typical procedure for a nonterminal A
void A() {
    choose an A-production, A → X1 X2 … Xk;
    for (i = 1 to k) {
        if (Xi is a nonterminal)
            call procedure Xi();
        else if (Xi equals the current input symbol a)
            advance the input to the next symbol;
        else
            error();   /* with backtracking: try another A-production instead */
    }
}
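Recursive Descent - Example
• Grammar: S → cAd, A → ab | a
• Input string w = cad
• Parse trees built in top-down fashion:

Step 1: expand S → cAd and match c.
      S
    / | \
   c  A  d

Step 2: try A → ab; a matches, but b does not match the next input symbol d.
      S
    / | \
   c  A  d
     / \
    a   b

Step 3: backtrack and try A → a; a matches, then d matches, so w is accepted.
      S
    / | \
   c  A  d
      |
      a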
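The backtracking in this example can be sketched with Python generators, as below. The names `GRAMMAR`, `derive`, and `accepts` are hypothetical, and the sketch assumes a grammar without left recursion (otherwise it would recurse forever).

```python
# Grammar from the example: S -> c A d ; A -> a b | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}

def derive(symbols, text, pos):
    """Yield every input position reachable after matching `symbols` starting at `pos`."""
    if not symbols:
        yield pos
        return
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Nonterminal: try each alternative in order; moving on to the
        # next alternative after a failure is the backtracking step.
        for alternative in GRAMMAR[head]:
            for mid in derive(alternative, text, pos):
                yield from derive(rest, text, mid)
    elif pos < len(text) and text[pos] == head:
        # Terminal: it must equal the current input symbol.
        yield from derive(rest, text, pos + 1)

def accepts(text, start="S"):
    """True if the whole of `text` can be derived from `start`."""
    return any(end == len(text) for end in derive([start], text, 0))
```

For the input cad, the alternative A → ab is tried first and fails on b, after which A → a succeeds, mirroring the three parse trees.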
Calculating first sets
To calculate First(u), where u has the form X1X2...Xn:
• a) If X1 is a terminal, add X1 to First(u).
• b) Otherwise X1 is a nonterminal; add all non-ε symbols of First(X1) to First(u).
  - If X1 is a nullable nonterminal, i.e., X1 =>* ε, also add the non-ε symbols of First(X2) to First(u).
  - Furthermore, if X2 can also derive ε, add the non-ε symbols of First(X3), and so on.
• Finally, add ε to First(u) if X1X2...Xn =>* ε.
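The rules above translate fairly directly into code. This is a minimal sketch that assumes the First sets of the individual nonterminals are already known and passed in as a dictionary; the name `first_of_string` and the convention that any symbol missing from the dictionary is a terminal are assumptions of the example.

```python
EPS = "ε"

def first_of_string(symbols, first):
    """First(X1 X2 ... Xn), given `first`: a dict mapping each nonterminal to its First set."""
    result = set()
    for x in symbols:
        if x not in first:            # terminal: add it and stop
            result.add(x)
            return result
        result |= first[x] - {EPS}    # nonterminal: add its non-ε symbols
        if EPS not in first[x]:       # not nullable: later symbols cannot contribute
            return result
    result.add(EPS)                   # every Xi was nullable (or the string is empty)
    return result
```

The loop only moves on to the next symbol when the current one is nullable, which is exactly the "and so on" step of the rule.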
Calculating follow sets
For each nonterminal in the grammar:
1. Place EOF in Follow(S), where S is the start symbol and EOF is the input's right end marker.
   • The end marker might be end of file, newline, or a special symbol, whatever is the expected end-of-input indication for this grammar.
   • We will typically use $ as the end marker.
2. For every production A → uBv, where u and v are any strings of grammar symbols and B is a nonterminal, everything in First(v) except ε is placed in Follow(B).
3. For every production A → uB, or a production A → uBv where First(v) contains ε (i.e., v is nullable), everything in Follow(A) is added to Follow(B).
Here is a complete example of first and follow set computation, starting with this grammar:
• S → AB
• A → Ca | ε
• B → BaAC | c
• C → b | ε
• Notice we have a left-recursive production that must be fixed if we are to use LL(1) parsing:
  - B → BaAC | c becomes B → cB' and B' → aACB' | ε
• The new grammar is:
  - S → AB
  - A → Ca | ε
  - B → cB'
  - B' → aACB' | ε
  - C → b | ε
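The rewrite of B can be sketched as a small function for immediate left recursion only. The function name and the representation are assumptions of this example: each alternative is a list of symbols, and ε is written as the empty list.

```python
def remove_immediate_left_recursion(nt, alternatives):
    """Rewrite  A -> A a1 | ... | b1 | ...
       as       A -> b1 A' | ...   and   A' -> a1 A' | ... | ε  (ε is the empty list)."""
    # Tails of the left-recursive alternatives (the part after the leading A).
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    # Alternatives that do not start with A.
    others = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not recursive:
        return {nt: alternatives}     # nothing to do
    new_nt = nt + "'"
    return {
        nt: [alt + [new_nt] for alt in others],
        new_nt: [alt + [new_nt] for alt in recursive] + [[]],
    }
```

Applied to B → BaAC | c, it returns B → cB' and B' → aACB' | ε, matching the grammar above. Indirect left recursion is not handled by this sketch.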
• It helps to first compute the nullable set (i.e., those nonterminals X such that X =>* ε), since you need to refer to the nullable status of various nonterminals when computing the first and follow sets:
  - Nullable(G) = {A, B', C}
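One way to compute the nullable set is a simple fixed-point iteration, sketched below for the grammar above. The dictionary layout (`GRAMMAR`, with ε written as an empty list) and the function name `nullable_set` are assumptions of the example.

```python
GRAMMAR = {
    "S": [["A", "B"]],
    "A": [["C", "a"], []],
    "B": [["c", "B'"]],
    "B'": [["a", "A", "C", "B'"], []],
    "C": [["b"], []],
}

def nullable_set(grammar):
    """Return the set of nonterminals that can derive ε."""
    nullable = set()
    changed = True
    while changed:                    # repeat until no new nonterminal is added
        changed = False
        for nt, alternatives in grammar.items():
            if nt in nullable:
                continue
            # nt is nullable if every symbol of some alternative is nullable
            # (trivially true for the empty alternative).
            if any(all(sym in nullable for sym in alt) for alt in alternatives):
                nullable.add(nt)
                changed = True
    return nullable
```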
The first sets for each nonterminal are:
• First(C) = {b, ε}
• First(B') = {a, ε}
• First(B) = {c}
• First(A) = {b, a, ε}
  - Start with First(C) - ε, add a (since C is nullable), and add ε (since A itself is nullable).
• First(S) = {b, a, c}
  - Start with First(A) - ε, then add First(B) (since A is nullable).
  - We don't add ε, since S itself is not nullable: A can go away, but B cannot.
• To compute the follow sets, take each nonterminal and go through all the productions whose right-hand side contains it, matching against the steps given earlier:
• Follow(S) = {$}
  - S doesn't appear on the right-hand side of any production. We put $ in the follow set because S is the start symbol.
• Follow(B) = {$}
  - B appears on the right-hand side of the S → AB production. Its follow set is the same as that of S.
• Follow(B') = {$}
  - B' appears on the right-hand side of two productions.
  - The B' → aACB' production tells us its follow set includes the follow set of B', which is tautological.
  - From B → cB', we learn its follow set is the same as that of B.
• Follow(C) = {a, $}
  - C appears on the right-hand side of two productions.
  - The production A → Ca tells us a is in the follow set.
  - From B' → aACB', we add First(B') - ε, which is just a again.
  - Because B' is nullable, we must also add Follow(B'), which is $.
• Follow(A) = {c, b, a, $}
  - A appears on the right-hand side of two productions.
  - From S → AB we add First(B), which is just c. B is not nullable, so this production contributes nothing more.
  - From B' → aACB', we add First(C) - ε, which is b.
  - Since C is nullable, we also include First(B') - ε, which is a.
  - B' is also nullable, so we include Follow(B'), which adds $.
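The worked example can be checked mechanically. The sketch below iterates the first-set and follow-set rules to a fixed point for the same grammar; all names (`GRAMMAR`, `first_of`, `first_sets`, `follow_sets`) and the choice to write ε-productions as empty lists are assumptions of the example, not part of the lecture.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "S": [["A", "B"]],
    "A": [["C", "a"], []],
    "B": [["c", "B'"]],
    "B'": [["a", "A", "C", "B'"], []],
    "C": [["b"], []],
}

def first_of(symbols, first):
    """First set of a string of grammar symbols, using the current `first` table."""
    result = set()
    for x in symbols:
        if x not in first:            # terminal
            result.add(x)
            return result
        result |= first[x] - {EPS}
        if EPS not in first[x]:
            return result
    result.add(EPS)
    return result

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def follow_sets(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                        # rule 1
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for alt in alternatives:
                for i, b in enumerate(alt):
                    if b not in grammar:          # only nonterminals have follow sets
                        continue
                    rest = first_of(alt[i + 1:], first)
                    new = rest - {EPS}            # rule 2
                    if EPS in rest:               # rule 3: what follows b is nullable
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow
```

Calling `first_sets(GRAMMAR)` and then `follow_sets(GRAMMAR, "S", first)` reproduces the sets listed above.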
Example
• Recall the grammar
  - E → T X
  - X → + E | ε
  - T → ( E ) | int Y
  - Y → * T | ε
• Follow sets
  - Follow( + ) = { int, ( }
  - Follow( * ) = { int, ( }
  - Follow( ( ) = { int, ( }
  - Follow( E ) = { ), $ }
  - Follow( X ) = { $, ) }
  - Follow( T ) = { +, ), $ }
  - Follow( ) ) = Follow( T ) = { +, ), $ }
  - Follow( Y ) = { +, ), $ }
  - Follow( int ) = { *, +, ), $ }
Table-driven LL(1) Parsing
• In a recursive-descent parser, the production information is embedded in the individual parse functions for each nonterminal, and the run-time execution stack keeps track of our progress through the parse.
• There is another method for implementing a predictive parser: it uses a table to store the production information, along with an explicit stack to keep track of where we are in the parse.
Example 1
• This grammar for add/multiply expressions is already set up to handle precedence and associativity:
  - E → E + T | T
  - T → T * F | F
  - F → (E) | int
• After removal of left recursion, we get:
  - E → TE'
  - E' → + TE' | ε
  - T → FT'
  - T' → * FT' | ε
  - F → (E) | int
• First(E) = First(T) = First(F) = { (, int }
• First(T') = { *, ε }
• First(E') = { +, ε }
• Follow(E) = Follow(E') = { $, ) }
• Follow(T) = Follow(T') = { +, $, ) }
• Follow(F) = (First(T') - {ε}) ∪ Follow(T) = { *, +, $, ) }
Algorithm: Construction of a predictive parsing table
• Input: Grammar G
• Output: Parsing table M
• Method: for each production A → α of the grammar, do the following:
  - For each terminal a in FIRST(α), add A → α to M[A, a].
  - If ε is in FIRST(α), then for each terminal b in FOLLOW(A), add A → α to M[A, b].
  - If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $] as well.
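Parsing table M for Example 1 (rows: nonterminal on top of the parse stack; columns: next input token; blank entries are errors):

      | int      | +          | *          | (        | )       | $
  E   | E → TE'  |            |            | E → TE'  |         |
  E'  |          | E' → +TE'  |            |          | E' → ε  | E' → ε
  T   | T → FT'  |            |            | T → FT'  |         |
  T'  |          | T' → ε     | T' → *FT'  |          | T' → ε  | T' → ε
  F   | F → int  |            |            | F → (E)  |         |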
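The table-construction method can be sketched as follows for the Example 1 grammar. The First and Follow sets are copied from the example rather than computed; `PRODUCTIONS`, `first_of`, and `build_table` are hypothetical names, and ε-productions are written as empty lists.

```python
EPS, END = "ε", "$"

PRODUCTIONS = [
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]),
    ("E'", []),
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]),
    ("T'", []),
    ("F", ["(", "E", ")"]),
    ("F", ["int"]),
]
FIRST = {
    "E": {"(", "int"}, "T": {"(", "int"}, "F": {"(", "int"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {END, ")"}, "E'": {END, ")"},
    "T": {"+", END, ")"}, "T'": {"+", END, ")"},
    "F": {"*", "+", END, ")"},
}

def first_of(symbols):
    """FIRST of a right-hand side, using the FIRST table above."""
    result = set()
    for x in symbols:
        if x not in FIRST:            # terminal
            result.add(x)
            return result
        result |= FIRST[x] - {EPS}
        if EPS not in FIRST[x]:
            return result
    result.add(EPS)
    return result

def build_table(productions):
    """Map (nonterminal, terminal) to the right-hand side to apply."""
    table = {}
    for head, body in productions:
        first = first_of(body)
        targets = first - {EPS}
        if EPS in first:
            targets |= FOLLOW[head]   # FOLLOW already contains $ where it applies
        for terminal in targets:
            if (head, terminal) in table:
                raise ValueError(f"not LL(1): conflict at M[{head}, {terminal}]")
            table[head, terminal] = body
    return table
```

A missing key plays the role of an empty (error) slot, and a conflict raised while filling the table indicates that the grammar is not LL(1).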
Execution
• Input string = int + int
1. M[E, int] = E → TE'
2. M[T, int] = T → FT'
3. M[F, int] = F → int (match int)
4. M[T', +] = T' → ε
5. M[E', +] = E' → +TE' (match +)
6. M[T, int] = T → FT'
7. M[F, int] = F → int (match int)
8. M[T', $] = T' → ε
9. M[E', $] = E' → ε (input accepted)
Example 2
• Recall the grammar
  - E → T X
  - X → + E | ε
  - T → ( E ) | int Y
  - Y → * T | ε
First sets
• First( T ) = { int, ( }
• First( E ) = { int, ( }
• First( X ) = { +, ε }
• First( Y ) = { *, ε }
Follow sets
• Follow( E ) = { ), $ }
• Follow( T ) = { +, ), $ }
• Follow( X ) = { $, ) }
• Follow( Y ) = { +, ), $ }
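Parsing table for Example 2 (rows: nonterminal on top of the parse stack; columns: next input token; blank entries are errors):

      | int        | +        | *        | (        | )      | $
  E   | E → TX     |          |          | E → TX   |        |
  X   |            | X → +E   |          |          | X → ε  | X → ε
  T   | T → int Y  |          |          | T → (E)  |        |
  Y   |            | Y → ε    | Y → *T   |          | Y → ε  | Y → ε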
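Non recursive Predictive parsing
(Using Stack)
[Figure: model of a table-driven predictive parser. The input buffer holds a + b $; the stack holds X Y Z $ with X on top; the predictive parsing program reads both, consults the parsing table M, and produces the output.]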
Non recursive Predictive parsing
(Using Stack)
• The input buffer contains the string to be parsed, followed by the end marker $.
• The stack is used explicitly, rather than implicitly via recursive calls.
• The symbol $ marks the bottom of the stack.
• Initially the stack contains the start symbol of the grammar on top of $.
Algorithm
• Set the input pointer to the first symbol of w$.
• Let X be the top of the stack and a be the current token.
• while (X ≠ $) /* stack is not empty */
{
    if (X is a) pop the stack and advance the input;
    else if (X is another terminal) error();
    else if (M[X, a] is an error entry) error();
    else if (M[X, a] = X → Y1Y2 … Yk)
    {
        output the production X → Y1Y2 … Yk;
        pop X from the stack;
        push Yk, Yk-1, …, Y2, Y1 onto the stack, with Y1 on top;
    }
    set X to the top of the stack and a to the current token;
}
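The algorithm above can be sketched in Python as follows, using the Example 1 parsing table written out by hand. The names `TABLE`, `NONTERMINALS`, and `ll1_parse` are hypothetical; a missing dictionary key stands for an error entry and an empty list stands for ε.

```python
END = "$"

# Parsing table from Example 1; a missing key is an error entry, [] stands for ε.
TABLE = {
    ("E", "int"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", END): [],
    ("T", "int"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", END): [],
    ("F", "int"): ["int"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {head for head, _ in TABLE}

def ll1_parse(tokens, start="E"):
    """Return the productions applied, in order, or raise SyntaxError."""
    tokens = list(tokens) + [END]
    stack = [END, start]              # the top of the stack is the end of the list
    pos, output = 0, []
    while stack[-1] != END:
        x, a = stack[-1], tokens[pos]
        if x == a:                    # terminal on top matches the input
            stack.pop()
            pos += 1
        elif x not in NONTERMINALS:   # some other terminal is on top
            raise SyntaxError(f"expected {x!r}, got {a!r}")
        elif (x, a) not in TABLE:     # error entry
            raise SyntaxError(f"unexpected {a!r} while parsing {x}")
        else:
            body = TABLE[x, a]
            output.append((x, body))
            stack.pop()
            stack.extend(reversed(body))   # push Yk ... Y1 so that Y1 ends up on top
    if tokens[pos] != END:
        raise SyntaxError(f"unexpected {tokens[pos]!r} after end of input")
    return output
```

For the input int + int, the sequence of production heads in the result is E, T, F, T', E', T, F, T', E', the same order as the nine steps of the Execution slide.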
Parse Trace of int + int

Matched     | Stack         | Input         | Action
            | E $           | int + int $   |
            | T E' $        | int + int $   | E → TE'
            | F T' E' $     | int + int $   | T → FT'
            | int T' E' $   | int + int $   | F → int
int         | T' E' $       | + int $       | Match int
int         | E' $          | + int $       | T' → ε
int         | + T E' $      | + int $       | E' → +TE'
int +       | T E' $        | int $         | Match +
int +       | F T' E' $     | int $         | T → FT'
int +       | int T' E' $   | int $         | F → int
int + int   | T' E' $       | $             | Match int
int + int   | E' $          | $             | T' → ε
int + int   | $             | $             | E' → ε
int + int   | $             | $             | Input accepted
Error detection in LL(1) parsing
• An error is detected whenever an empty table slot is encountered, or when the terminal on top of the stack does not match the next input token.
• We would like our parser to be able to recover from an error and continue parsing.
• Phrase-level recovery
  - We associate each empty slot with an error-handling procedure.
• Panic mode recovery
  - Modify the stack and/or the input string to try to reach a state from which we can continue.
Error recovery in LL(1) parsing
• Panic mode recovery
• Idea:
  - Decide on a set of synchronizing tokens.
  - When an error is found and there is a nonterminal at the top of the stack, discard input tokens until a synchronizing token is found.
  - Synchronizing tokens are chosen so that the parser can recover quickly after one is found, e.g. a semicolon when parsing statements.
  - If there is a terminal at the top of the stack, we could try popping it to see whether we can continue. This assumes that the input string is actually missing that terminal.
Error recovery in LL(1) parsing
• Panic mode recovery
• Possible synchronizing tokens for a nonterminal A:
  - the tokens in FOLLOW(A): when one is found, pop A off the stack and try to continue
  - the tokens in FIRST(A): when one is found, keep A on the stack and try to continue parsing from that token
  - tokens such as semicolons that terminate statements
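One possible way to combine these ideas with the table-driven parser is sketched below for the Example 1 grammar. It is only an illustration under stated assumptions: the synchronizing set for a nonterminal is taken to be the tokens with a table entry for it plus its FOLLOW set, a mismatched terminal on the stack is treated as missing from the input, and the names `TABLE`, `FOLLOW`, and `parse_with_recovery` are hypothetical.

```python
END = "$"

# Parsing table and FOLLOW sets of Example 1; [] stands for ε.
TABLE = {
    ("E", "int"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", END): [],
    ("T", "int"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", END): [],
    ("F", "int"): ["int"], ("F", "("): ["(", "E", ")"],
}
FOLLOW = {
    "E": {END, ")"}, "E'": {END, ")"},
    "T": {"+", END, ")"}, "T'": {"+", END, ")"},
    "F": {"*", "+", END, ")"},
}
NONTERMINALS = set(FOLLOW)

def parse_with_recovery(tokens, start="E"):
    """Return a list of error messages; an empty list means the input was accepted."""
    tokens = list(tokens) + [END]
    stack, pos, errors = [END, start], 0, []
    while stack[-1] != END:
        x, a = stack[-1], tokens[pos]
        if x == a:                                # match
            stack.pop()
            pos += 1
        elif x not in NONTERMINALS:
            # Terminal on top does not match: assume it is missing from the input.
            errors.append(f"missing {x!r} before {a!r}")
            stack.pop()
        elif (x, a) in TABLE:                     # normal expansion
            stack.pop()
            stack.extend(reversed(TABLE[x, a]))
        else:
            errors.append(f"unexpected {a!r} while parsing {x}")
            # Panic mode: discard tokens until one has a table entry for x
            # or belongs to FOLLOW(x) (the end marker always stops the scan).
            while a != END and (x, a) not in TABLE and a not in FOLLOW[x]:
                pos += 1
                a = tokens[pos]
            if (x, a) not in TABLE:
                stack.pop()                       # give up on x and continue after it
    if tokens[pos] != END:
        errors.append(f"unexpected {tokens[pos]!r} after end of expression")
    return errors
```

With this scheme the parser reports an error and keeps going instead of stopping at the first problem; for example, the input ( int yields a single message about the missing closing parenthesis.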