Top-down Parsing, LL(1)

CS 404
Introduction to Compiler Design
Lecture 4
Ahmed Ezzat
Top-Down Parsing LL(1),
Bottom-Up Parsing LR
1. Top-down Parsing

Predictive: try to guess which production rule to apply next, given
  – The current non-terminal symbol
  – One or more 'look-ahead' terminal symbols
Two ways to do predictive parsing
  – Use recursive procedures
  – Use a predictive parsing table
LL(1) Grammar

A restricted class of grammars that can be parsed without backtracking
Uses an explicit stack rather than recursive calls to perform parsing
LL(k) parsing means that k tokens of lookahead are used
LL(1):
  – L: scan the input string from left to right
  – L: a leftmost derivation is applied at each step
  – 1: one input symbol of lookahead
Two Separate Steps

1. Construct the LL(1) parsing table
  – Compute FIRST and FOLLOW
  – Construct the parsing table
2. Parse strings using the parsing table
  – Parsing stack: holds grammar symbols (non-terminals and tokens)
FIRST and FOLLOW sets

FIRST(α) contains every terminal that can begin a string derived from α (and ε if α can derive the empty string)
FOLLOW(A) contains every terminal that can appear immediately after A in some sentential form (and $ if A can be the last symbol)
Compute FIRST

If X is a terminal, then FIRST(X) = {X}
If X → ε, then add ε to FIRST(X)
If X is a non-terminal and X → Y1Y2…Yk, then add z to FIRST(X) if, for some i, z is in FIRST(Yi) and ε is in FIRST(Yj) for all j<i; if ε is in FIRST(Yj) for all j = 1…k, add ε to FIRST(X)
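The rules above amount to a fixed-point iteration: keep applying them until no FIRST set grows. Below is a minimal Python sketch, assuming the left-factored expression grammar used in this lecture's examples (E → T X, X → + E | ε, T → (E) | int Y, Y → * T | ε). The names GRAMMAR, EPS and compute_first are illustrative choices, not part of the lecture.

```python
EPS = "ε"

# Left-factored expression grammar (assumed from the lecture's example).
# Each right-hand side is a tuple of symbols; the empty tuple is an ε-production.
GRAMMAR = {
    "E": [("T", "X")],
    "X": [("+", "E"), ()],
    "T": [("(", "E", ")"), ("int", "Y")],
    "Y": [("*", "T"), ()],
}


def compute_first(grammar):
    """Apply the FIRST rules repeatedly until no set changes."""
    first = {nt: set() for nt in grammar}

    def first_of_symbol(sym):
        # FIRST of a terminal is the terminal itself.
        return first[sym] if sym in grammar else {sym}

    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                before = len(first[nt])
                all_nullable = True
                for sym in rhs:
                    sym_first = first_of_symbol(sym)
                    first[nt] |= sym_first - {EPS}
                    if EPS not in sym_first:
                        all_nullable = False
                        break  # later symbols cannot contribute
                if all_nullable:
                    # Covers X -> ε (empty rhs) and X -> Y1..Yk with every Yi nullable.
                    first[nt].add(EPS)
                if len(first[nt]) != before:
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
```

For this grammar the sketch gives FIRST(E) = FIRST(T) = {(, int}, FIRST(X) = {+, ε} and FIRST(Y) = {*, ε}.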
Compute FIRST for a String α

(FI4) For α = X1X2…Xn
  – Add all non-ε symbols of FIRST(X1) to FIRST(α)
  – Add all non-ε symbols of FIRST(Xj) to FIRST(α) if ε is in FIRST(Xi) for all i<j
  – Add ε to FIRST(α) if ε is in FIRST(Xi) for all i
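Rule (FI4) is a left-to-right scan that stops at the first symbol that cannot derive ε. A small Python sketch follows; the FIRST table is written out by hand for the left-factored expression grammar of this lecture so the block is self-contained, and the function name first_of_string is a hypothetical choice.

```python
EPS = "ε"

# FIRST sets of the non-terminals, written out by hand for
# E -> T X, X -> + E | ε, T -> (E) | int Y, Y -> * T | ε.
FIRST = {
    "E": {"(", "int"},
    "X": {"+", EPS},
    "T": {"(", "int"},
    "Y": {"*", EPS},
}


def first_of_string(symbols, first=FIRST):
    """Apply rule (FI4) to a sequence of grammar symbols X1 X2 ... Xn."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # FIRST of a terminal is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result  # this symbol is not nullable: stop scanning
    result.add(EPS)  # every symbol was nullable (or the string is empty)
    return result
```

For example, first_of_string(["X", "Y"]) contains +, * and ε, because both X and Y can derive ε, while first_of_string(["int", "Y"]) contains only int.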
Compute FOLLOW

(FO1) Put $ in FOLLOW(S), where S is the start symbol ($ is called the endmarker)
(FO2) If A → αBβ, then put FIRST(β) (except ε) into FOLLOW(B)
(FO3) If A → αB, then put FOLLOW(A) into FOLLOW(B)
(FO4) If A → αBβ and β ⇒* ε, then put FOLLOW(A) into FOLLOW(B)
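Rules (FO1) to (FO4) can also be run as a fixed-point iteration. The sketch below assumes the left-factored expression grammar of this lecture with start symbol E, and takes its FIRST sets as given (written out by hand); compute_follow and first_of_string are illustrative names only.

```python
EPS, END = "ε", "$"

# Assumed grammar: E -> T X, X -> + E | ε, T -> (E) | int Y, Y -> * T | ε.
GRAMMAR = {
    "E": [("T", "X")],
    "X": [("+", "E"), ()],
    "T": [("(", "E", ")"), ("int", "Y")],
    "Y": [("*", "T"), ()],
}
# FIRST sets of the non-terminals, written out by hand.
FIRST = {"E": {"(", "int"}, "X": {"+", EPS}, "T": {"(", "int"}, "Y": {"*", EPS}}


def first_of_string(symbols):
    """FIRST of a symbol sequence; the empty sequence yields {ε}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # (FO1)
    changed = True
    while changed:
        changed = False
        for lhs, productions in grammar.items():
            for rhs in productions:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # FOLLOW is defined for non-terminals only
                    beta_first = first_of_string(rhs[i + 1:])
                    new = beta_first - {EPS}  # (FO2)
                    if EPS in beta_first:  # (FO3) when β is empty, (FO4) otherwise
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "E")
```

Under these assumptions the sketch yields FOLLOW(E) = FOLLOW(X) = {), $} and FOLLOW(T) = FOLLOW(Y) = {+, ), $}.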
Predictive Parsing and Left Factoring Example

Assume the following grammar:
  E → T + E | T
  T → int | int * T | (E)
Hard to predict because:
  – For T, two productions start with int
  – For E, it is not clear how to predict (both productions start with T)
The grammar must be left-factored before being used for predictive parsing
Predictive Parsing and Left Factoring Example

Assume the following grammar:
  E → T + E | T
  T → int | int * T | (E)
Factor out common prefixes of productions, possibly introducing ε-productions:
  E → T X
  X → + E | ε
  T → (E) | int Y
  Y → * T | ε
LL(1) parsing table for the factored grammar (rows: non-terminals, columns: lookahead):
      int     *       +       (       )       $
  E   T X                     T X
  X                   + E             ε       ε
  T   int Y                   (E)
  Y           * T     ε               ε       ε
Construct the Parsing Table

For each production rule A → α
  – [M1] For each terminal a in FIRST(α), add A → α to M[A,a]
  – [M2] If ε is in FIRST(α), add A → α to M[A,b] for each terminal b in FOLLOW(A) (b can be $)
  – Undefined entries of M are 'error entries'
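Rules [M1] and [M2] translate almost line for line into code. The sketch below assumes the left-factored expression grammar of this lecture and takes its FIRST and FOLLOW sets as given (written out by hand); build_table and the dictionary layout are hypothetical choices. A cell that would receive two productions is reported as a conflict, which is how a non-LL(1) grammar shows up.

```python
EPS, END = "ε", "$"

# Assumed grammar: E -> T X, X -> + E | ε, T -> (E) | int Y, Y -> * T | ε.
GRAMMAR = {
    "E": [("T", "X")],
    "X": [("+", "E"), ()],
    "T": [("(", "E", ")"), ("int", "Y")],
    "Y": [("*", "T"), ()],
}
# FIRST and FOLLOW sets written out by hand for this grammar.
FIRST = {"E": {"(", "int"}, "X": {"+", EPS}, "T": {"(", "int"}, "Y": {"*", EPS}}
FOLLOW = {"E": {")", END}, "X": {")", END},
          "T": {"+", ")", END}, "Y": {"+", ")", END}}


def first_of_string(symbols):
    """FIRST of a symbol sequence; the empty sequence yields {ε}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(grammar, follow):
    """Return M as a dict mapping (non-terminal, terminal) to a right-hand side."""
    table = {}
    for lhs, productions in grammar.items():
        for rhs in productions:
            rhs_first = first_of_string(rhs)
            targets = rhs_first - {EPS}  # [M1]
            if EPS in rhs_first:
                targets |= follow[lhs]  # [M2]
            for terminal in targets:
                if (lhs, terminal) in table:
                    raise ValueError(f"not LL(1): conflict at M[{lhs},{terminal}]")
                table[(lhs, terminal)] = rhs
    return table


TABLE = build_table(GRAMMAR, FOLLOW)
```

Missing keys play the role of the error entries: looking up ("E", "+") in TABLE finds nothing.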
Use Parsing Table to Parse

Push $ and then the start symbol S onto the stack; append $ to the end of the input string. x is the symbol on top of the stack, a is the current input symbol
If x = a = $, success
If x = a ≠ $, pop x and advance the input
If x is a terminal and x ≠ a, error
If x is a non-terminal
  – If M[x,a] = {x → UVW}, replace x by WVU (U on top)
  – If M[x,a] has no rule, error
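The loop above can be sketched as a short table-driven parser. The table below is the LL(1) table of the left-factored expression grammar from this lecture, keyed by (non-terminal, lookahead); the function name ll1_parse and the choice to return the list of applied productions are assumptions made for illustration.

```python
END = "$"

# LL(1) table for E -> T X, X -> + E | ε, T -> (E) | int Y, Y -> * T | ε.
# The empty tuple stands for an ε-production; missing keys are error entries.
TABLE = {
    ("E", "int"): ("T", "X"), ("E", "("): ("T", "X"),
    ("X", "+"): ("+", "E"), ("X", ")"): (), ("X", END): (),
    ("T", "int"): ("int", "Y"), ("T", "("): ("(", "E", ")"),
    ("Y", "*"): ("*", "T"), ("Y", "+"): (), ("Y", ")"): (), ("Y", END): (),
}
NONTERMINALS = {"E", "X", "T", "Y"}


def ll1_parse(tokens, start="E"):
    """Return the productions applied, in order, or raise SyntaxError."""
    stack = [END, start]  # start symbol on top of the endmarker
    tokens = list(tokens) + [END]
    pos = 0
    applied = []
    while True:
        x, a = stack[-1], tokens[pos]
        if x == END and a == END:
            return applied  # success
        if x in NONTERMINALS:
            rhs = TABLE.get((x, a))
            if rhs is None:
                raise SyntaxError(f"no rule for M[{x},{a}]")
            stack.pop()
            stack.extend(reversed(rhs))  # push right to left so rhs[0] ends on top
            applied.append((x, rhs))
        elif x == a:
            stack.pop()  # matched a terminal
            pos += 1
        else:
            raise SyntaxError(f"expected {x!r}, found {a!r}")
```

Parsing the token list ["int", "*", "int"] applies E → T X, T → int Y, Y → * T, T → int Y, Y → ε and X → ε, which is a leftmost derivation of the input.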
Use Parsing Table to Parse: Example

  E → T X
  X → + E | ε
  T → (E) | int Y
  Y → * T | ε
2. Bottom-up Parsing

Starts from the leaf nodes of the parse tree and works upward until it reaches the root node
Bottom-up Parsing

Start with the string of terminals
Build up from the leaves of the parse tree
Apply production rules backwards (reduction)
When the start symbol is reached and the input is exhausted, done
Shift-reduce is one common type of bottom-up parsing
Bottom-up Parsing

Shift-Reduce Parsing:
  – Shift: advance the input pointer to the next input symbol; the symbol is pushed onto the stack
  – Reduce: when the parser finds the complete right-hand side (RHS) of a grammar rule on top of the stack, it replaces it with the rule's left-hand side (LHS)
LR Parser: a non-recursive, shift-reduce, bottom-up parser
  – SLR(1): Simple LR parser; works on the smallest class of grammars
  – LR(1): works on the complete set of LR(1) grammars
  – LALR(1): Look-Ahead LR parser; works on an intermediate class of grammars. The number of states is the same as in SLR(1).
Bottom-up Parsing - Example

A bottom-up parser traces a rightmost derivation in reverse
  int * int + int
  int * T + int
  T + int
  T + T
  T + E
  E
(Figure: parse tree for int * int + int, built from the leaves up to the root E)
Shift-reduce Parsing

Use a context-free grammar (it need not be LL(1))
Use a stack to keep track of the symbols seen so far
Hard to do by hand; best done with a parser generator such as Yacc
Basic idea:
  – Shift the next symbol onto the stack
  – When the top of the stack contains the right-hand side of a production, reduce by that rule
When to Shift or Reduce?

Reduce if the top of the stack matches the right-hand side of a production rule (a handle)
  – Need to recognize handles
If the parser cannot reduce and there is more input, shift
If the parser can neither shift nor reduce, error
Use Action and Goto tables to help decide
Shift or Reduce Example

Shift: move | one place to the right
  – Shifts a terminal to the left string
    ABC|XYZ ⇒ ABCX|YZ
Reduce: apply an inverse production rule at the right end of the left string
  – If A → xy is a production rule, then
    Cbxy|ijk ⇒ CbA|ijk
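The two moves can be written as tiny functions on a configuration (left string, remaining input). This is only a sketch of the notation on this slide, assuming every grammar symbol is a single character; shift and reduce_by are hypothetical names.

```python
def shift(left, right):
    """Move the | marker one place right: the next input symbol joins the left string."""
    if not right:
        raise ValueError("nothing left to shift")
    return left + right[:1], right[1:]


def reduce_by(left, right, lhs, rhs):
    """Replace rhs at the right end of the left string by lhs (inverse of lhs -> rhs)."""
    if not rhs or not left.endswith(rhs):
        raise ValueError(f"{rhs!r} is not at the right end of {left!r}")
    return left[:len(left) - len(rhs)] + lhs, right


# The two examples from the slide:
print(shift("ABC", "XYZ"))                  # ('ABCX', 'YZ')
print(reduce_by("Cbxy", "ijk", "A", "xy"))  # ('CbA', 'ijk')
```

Note that neither function decides when to shift or reduce; that choice is what the Action and Goto tables of an LR parser encode.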
LR Parsing

L: left-to-right scan of the input
R: rightmost derivation in reverse order
Efficient, table-based shift-reduce parsing
Can handle more grammars than LL(1)
Can handle most programming languages
LR Parsing Data Structure

Stack of states {S0, …, Sm}
Action table: Action[S,a], where S is a state and a is a terminal. Tells the parser whether to:
  – Shift and go to state S'
  – Reduce (R)
  – Accept (A) the source code, or
  – Signal a syntactic error (E)
Goto table: Goto[S,X], where X is a non-terminal. Defines the next state after a reduction to X
LR Parsing Data Structure

(Figure: LR parser model. The LR parsing program reads the input a1 … ai … an $, keeps a stack Sm, Sm-1, …, $, consults the ACTION and GOTO tables, and produces the output.)
LR Parsing Algorithm

Initially push S0
Given state S on top of the stack, with input symbol a
If (Action[S,a] = shift S')
  – Push a, then S' onto the stack
  – Move to the next input symbol
LR Parsing Algorithm (continued)

If (Action[S,a] = reduce A → X1X2…Xn)
  – Pop off n states (and n grammar symbols) to find Su on top of the stack
  – Push A
  – Push new state Goto[Su,A]
  – Output production A → X1X2…Xn
If Action[S,a] = accept, done!
If Action[S,a] = error, error!
LR Parsing With Only States on Stack

If (Action[S,a] = shift S')
  – Push S' onto the stack
If (Action[S,a] = reduce A → X1X2…Xn)
  – Pop off n states to find Su on top of the stack, then push Goto[Su,A]
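A states-only LR driver is short once the tables exist. The sketch below uses the grammar from the bottom-up example (E → T + E | T, T → int * T | int | (E)). The ACTION and GOTO tables are not given in the lecture; they are an assumption here, worked out by hand as an SLR(1) table with states numbered 0 to 10, and lr_parse is a hypothetical name.

```python
# Numbered productions: number -> (left-hand side, right-hand side).
PRODUCTIONS = {
    1: ("E", ("T", "+", "E")),
    2: ("E", ("T",)),
    3: ("T", ("int", "*", "T")),
    4: ("T", ("int",)),
    5: ("T", ("(", "E", ")")),
}

# Hand-built SLR(1) tables (an assumption, not taken from the lecture).
# ACTION[state][terminal] is ("s", state), ("r", production number) or ("acc",).
ACTION = {
    0: {"int": ("s", 3), "(": ("s", 4)},
    1: {"$": ("acc",)},
    2: {"+": ("s", 5), ")": ("r", 2), "$": ("r", 2)},
    3: {"*": ("s", 6), "+": ("r", 4), ")": ("r", 4), "$": ("r", 4)},
    4: {"int": ("s", 3), "(": ("s", 4)},
    5: {"int": ("s", 3), "(": ("s", 4)},
    6: {"int": ("s", 3), "(": ("s", 4)},
    7: {")": ("s", 10)},
    8: {")": ("r", 1), "$": ("r", 1)},
    9: {"+": ("r", 3), ")": ("r", 3), "$": ("r", 3)},
    10: {"+": ("r", 5), ")": ("r", 5), "$": ("r", 5)},
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (4, "E"): 7, (4, "T"): 2,
        (5, "E"): 8, (5, "T"): 2, (6, "T"): 9}


def lr_parse(tokens):
    """Return the productions used for reductions, in order, or raise SyntaxError."""
    stack = [0]  # only states are kept on the stack
    tokens = list(tokens) + ["$"]
    pos = 0
    output = []
    while True:
        state, a = stack[-1], tokens[pos]
        action = ACTION[state].get(a)
        if action is None:
            raise SyntaxError(f"unexpected {a!r} in state {state}")
        if action[0] == "s":
            stack.append(action[1])  # push the new state
            pos += 1
        elif action[0] == "r":
            lhs, rhs = PRODUCTIONS[action[1]]
            del stack[len(stack) - len(rhs):]  # pop one state per rhs symbol
            stack.append(GOTO[(stack[-1], lhs)])
            output.append((lhs, rhs))
        else:
            return output  # accept
```

On the input int * int + int the reductions come out as T → int, T → int * T, T → int, E → T, E → T + E, which is the rightmost derivation of the bottom-up example read in reverse.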
END