CS 404
Introduction to Compiler Design
Lecture 3
Ahmed Ezzat
Top-Down Parsing LL(1)
Review of Context Free Grammars
Context-free language (CFL): a language L is context-free if there exists a CFG G such that L = L(G).
Every regular language (one that can be generated by a regular grammar) is also context-free, so the regular languages are a subclass of the CFLs
[Figure: diagram showing the Regular languages as a subset of the CFLs]
A CFG generates a language through production rules rather than through regular expressions
A CFG:
Can describe the syntax of most programming languages
Is good at nested structures
Can be efficiently implemented
Can guide parser generation
Tasks That Cannot Be Done by a CFG
These checks wait until “semantic analysis,” i.e., parsing needs to be done first:
– Match name uses against declarations
– Verify that each function is called with the right number of arguments
– Type checking in expressions
Write a Parser for Language L
1. Write a CFG G for L (e.g., C, C++) and verify that G generates all strings in L
2. Eliminate ambiguity (no formal rules)
3. Eliminate left recursion, i.e., the special case of recursion where a nonterminal appears as the leftmost symbol on the right-hand side of its own production, so a string is recognized by decomposing it into a string derived from the same nonterminal on the left and a suffix on the right:
A → A α
where A is a nonterminal and α is a string of grammar symbols.
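The removal of immediate left recursion in step 3 can be sketched in Python. This is a minimal sketch, not a full algorithm: it assumes each production body is a list of symbols (an empty list stands for ϵ), it handles only immediate left recursion, and the function name, the primed nonterminal A', and the sample grammar E → E + T | T are illustrative choices that do not come from the slides.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b1 A' | ..., A' -> a1 A' | ... | epsilon.

    alternatives is a list of production bodies; each body is a list of symbols
    and the empty list stands for epsilon.
    """
    # Bodies that start with the nonterminal itself (keep only the part after it).
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # All other bodies.
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}  # nothing to do
    new_nt = nonterminal + "'"
    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }


# E -> E + T | T   becomes   E -> T E'   and   E' -> + T E' | epsilon
result = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
print(result)
```

The new nonterminal generates the repeated suffixes to the right of the first non-recursive body, so the rewritten grammar derives the same strings without a left-recursive rule.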
Write a Parser for Language L
4. Left-factor the grammar: factor out the common prefix that appears in two productions of the same nonterminal, so that the parser does not need to backtrack.
Example: A → qB | qC
where A, B, C are nonterminals and q is a string of terminals.
In this case the parser cannot tell which of the two production rules to choose after seeing only q. After left factoring, the grammar is converted to:
A → qD
D → B | C
There is now no doubt about which production rule to apply for A
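One pass of left factoring can be sketched as follows. This is an illustration under assumptions: production bodies are lists of symbols, alternatives are grouped by their first symbol, and the generated nonterminal is named A_1 here where the slide calls it D. The helper names are hypothetical, and a complete implementation would repeat the pass until no two alternatives share a prefix.

```python
def common_prefix(bodies):
    """Longest sequence of symbols shared at the start of every body."""
    prefix = []
    for column in zip(*bodies):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


def left_factor(nonterminal, alternatives):
    """Factor out common prefixes among the alternatives of one nonterminal (single pass)."""
    groups = {}
    for alt in alternatives:
        key = alt[0] if alt else None  # None marks an epsilon body
        groups.setdefault(key, []).append(alt)

    grammar = {nonterminal: []}
    counter = 0
    for key, group in groups.items():
        if key is None or len(group) == 1:
            grammar[nonterminal].extend(group)  # no shared prefix to factor
            continue
        prefix = common_prefix(group)
        counter += 1
        new_nt = f"{nonterminal}_{counter}"
        grammar[nonterminal].append(prefix + [new_nt])
        grammar[new_nt] = [alt[len(prefix):] for alt in group]
    return grammar


# A -> qB | qC   becomes   A -> q A_1   and   A_1 -> B | C
factored = left_factor("A", [["q", "B"], ["q", "C"]])
print(factored)
```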
Parsing Approaches (Top-down)
The syntax analysis phase of a compiler verifies that the sequence of tokens produced by the scanner represents a valid sentence in the grammar of the programming language.
There are 2 major parsing approaches:
Top-down: you start with the start symbol and apply production rules until
you arrive at the desired string
S → AB
A → aA | ϵ
B → b | bB
Show that the string aaab can be derived from the above grammar:
Sentential form    Rule applied
S
AB                 S → AB
aAB                A → aA
aaAB               A → aA
aaaAB              A → aA
aaaϵB              A → ϵ
aaab               B → b
Parsing Approaches (Bottom-up)
Bottom-up: start with the string and reduce it to the start symbol, i.e., it works in reverse.
String             Rule applied (in reverse)
aaab
aaaϵb              (insert ϵ)
aaaAb              A → ϵ
aaAb               A → aA
aAb                A → aA
Ab                 A → aA
AB                 B → b
S                  S → AB
Bottom-up parsing handles a larger set of grammars than top-down parsing
Top-Down Parsing
A parser is top-down if it discovers a parse tree top to bottom:
A top-down parse corresponds to a preorder traversal of the parse tree
A leftmost derivation is applied at each derivation step
Top-down parsers come in 2 forms:
Predictive Parsers: Predict the production rule to be applied using
lookahead tokens
Backtracking Parsers: Will try different productions, backing up when a
parse fails.
Predictive parsers are much faster than backtracking ones
Predictive parsers operate in linear time – will be our focus
Backtracking parsers can take exponential time in the worst case – will not be considered.
Two kinds of top-down parsing techniques:
Recursive-descent parsing (used to construct the syntax tree)
LL parsing
Top-Down Parsing
Start with the start symbol of the grammar
Apply production rules until the desired sentence is generated
Build the parse tree down from the root
Easy with simple grammars
Easy to apply by hand
Top-down Parsing
Predictive: try to guess which production rule to apply next, given
– The current non-terminal symbol
– One or more ‘look-ahead’ terminal symbols
Two ways to do predictive parsing
– Use recursive procedures
– Use a predictive parsing table
Top-down Parsing:
Construction of a Syntax Tree
Although recursive descent is a top-down parsing technique, the construction of the syntax tree for expressions is bottom-up
Tracing the construction verifies the precedence and associativity of operators
The tree construction for a – b + c * (b + d) is given below
ptr1 = symtable.lookup(a)
ptr2 = symtable.lookup(b)
ptr3 = new node(‘–’, ptr1, ptr2)
ptr4 = symtable.lookup(c)
ptr2 = symtable.lookup(b)
ptr5 = symtable.lookup(d)
ptr6 = new node(‘+’, ptr2, ptr5)
ptr7 = new node(‘*’, ptr4, ptr6)
ptr8 = new node(‘+’, ptr3, ptr7)
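The trace above can be reproduced with a small recursive-descent parser. This is a sketch under assumptions not stated on the slide: tokens are given as a list of strings, the ASCII "-" replaces "–", tree nodes are plain tuples (operator, left, right) in place of new node(...), identifiers are returned as-is instead of going through a symbol table, and the procedure names expr, term, and factor are illustrative.

```python
def parse_expression(tokens):
    """Parse + - * / expressions with parentheses and return (tree, creation order)."""
    pos = 0
    created = []  # operators, in the order their tree nodes are built

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"

    def advance():
        nonlocal pos
        pos += 1

    def factor():
        tok = peek()
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok.isalnum():
            advance()
            return tok  # leaf: stands in for symtable.lookup(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = peek()
            advance()
            node = (op, node, factor())  # left operand is the tree built so far
            created.append(op)
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            node = (op, node, term())
            created.append(op)
        return node

    tree = expr()
    if peek() != "$":
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree, created


tree, created = parse_expression("a - b + c * ( b + d )".split())
print(tree)
print(created)
```

The procedures are called top-down (expr calls term, which calls factor), but each operator node is created only after both of its operands exist, which is why the subtraction node is built first and the outer addition last.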
LL(1) Grammar
A restricted set of grammars that can be parsed with no need to backtrack
An LL(1) parser uses an explicit stack rather than recursive calls to perform parsing
LL(k) parsing means that k tokens of lookahead are used
LL(1):
L: scan input string from left to right
L: left-most derivation is applied at each step
1: one input symbol for lookahead
LL(1) Grammar
An LL parser consists of:
A parser stack that holds grammar symbols: nonterminals and tokens
A parsing table that specifies the parser action
A driver function that interacts with the parser stack, the parsing table, and the scanner
FIRST and FOLLOW sets
FIRST is defined for a terminal, a non-terminal, and a string of symbols
FIRST(α) contains any terminal that might begin a string derived from α
If we have a rule X → α, and “t” is in FIRST(α), and we are looking at symbol t, then X → α may be the right rule to apply
Compute FIRST
If x is a terminal, then FIRST(x) = {x}
If X → ε, then add ε to FIRST(X)
If X is a non-terminal and X → Y1Y2…Yk, then add z (z ≠ ε) to FIRST(X) if for some i, z is in FIRST(Yi) and ε is in FIRST(Yj) for all j<i; if ε is in FIRST(Yj) for every j up to k, add ε to FIRST(X)
Compute FIRST
Suppose we have the following grammar:
S → A a | B b
A → D c | C A
B → d A | e
C → f C | b
D → h | i
The RHS of the productions of S do not begin with terminals
The parser has no immediate guidance on which production to apply to expand S
We may follow all possible derivations of S to see which tokens can come first
We predict S → A a when the first token is h, i, f, or b: First(Aa) = {h, i, f, b}
We predict S → B b when the first token is d or e: First(Bb) = {d, e}
Otherwise, we have an error
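The FIRST sets for the grammar above can be computed by repeating the rules until nothing changes. The sketch below assumes the grammar is stored as a dictionary from each non-terminal to a list of production bodies; the names GRAMMAR, first_of_string, and compute_first are illustrative and not part of the lecture.

```python
EPS = "ε"

# Grammar from the slide; any symbol that is not a key is a terminal.
GRAMMAR = {
    "S": [["A", "a"], ["B", "b"]],
    "A": [["D", "c"], ["C", "A"]],
    "B": [["d", "A"], ["e"]],
    "C": [["f", "C"], ["b"]],
    "D": [["h"], ["i"]],
}


def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols, given FIRST of each non-terminal."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # FIRST(terminal) = {terminal}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result  # this symbol cannot vanish, so stop here
    return result | {EPS}  # every symbol can derive ε


def compute_first(grammar):
    """Apply the FIRST rules repeatedly until no set grows (fixed point)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for body in alternatives:
                new = first_of_string(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
print(sorted(first_of_string(["A", "a"], FIRST)))  # tokens that select S -> A a
print(sorted(first_of_string(["B", "b"], FIRST)))  # tokens that select S -> B b
```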
Use of FIRST(α)
If we have two rules X → α | β, we use FIRST(α) and FIRST(β) to pick which rule
If t (lookahead) is in FIRST(α) and not in FIRST(β), pick X → α
If FIRST(α) and FIRST(β) share a symbol, we cannot do predictive parsing with one token of lookahead
FOLLOW for non-terminal
FOLLOW(A) includes all terminals that could appear immediately after A in a valid sentence
FOLLOW is used because FIRST alone still cannot determine which rule to apply in some cases, e.g., when a production can derive ε
Compute FOLLOW
Suppose we have the following grammar:
S → A c B
A → a A
A → ϵ
B → b B S
B → ϵ
We follow derivations of S to see which tokens can come after A and B
We predict A → a A when the next token is a, because First(a A) = {a}
We predict A → ϵ when the next token is c, because Follow(A) = {c}
Similarly, we predict B → b B S when the next token is b, because First(b B S) = {b}
We predict B → ϵ when the next token is a, c, or $ (end-of-file token), because Follow(B) = {a, c, $}
Compute FOLLOW
Put $ in FOLLOW(S), where S is the start symbol ($ is called the endmarker)
If A → αBβ, then put FIRST(β) – {ε} into FOLLOW(B)
If A → αB, or A → αBβ and β can derive ε, then put FOLLOW(A) into FOLLOW(B)
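The three FOLLOW rules can be applied repeatedly until no set grows. The sketch below runs them on the grammar S → A c B, A → a A | ϵ, B → b B S | ϵ from the earlier FOLLOW example. It assumes the grammar is a dictionary of production bodies with the empty list standing for ϵ; the function and variable names are illustrative.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "S": [["A", "c", "B"]],
    "A": [["a", "A"], []],       # [] is the ϵ production
    "B": [["b", "B", "S"], []],
}


def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for body in alternatives:
                new = first_of_string(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                        # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for body in alternatives:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue                  # FOLLOW is defined for non-terminals only
                    rest = first_of_string(body[i + 1:], first)
                    new = rest - {EPS}            # rule 2: FIRST(β) without ε
                    if EPS in rest:
                        new |= follow[head]       # rule 3: β is empty or can derive ε
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "S")
print({nt: sorted(tokens) for nt, tokens in FOLLOW.items()})
```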
Determine Predict Set
The predict set of a production A → α is defined as follows:
If α is NOT nullable then Predict(A → α) = First(α)
If α is nullable then Predict(A → α) = (First(α) – {ϵ}) ∪ Follow(A)
This is the set of lookahead tokens that will cause the selection of A → α
Example on determining the predict set:
Production       Predict set
E → T Q          Predict(E → T Q) = First(TQ) = First(T) = {(, id}
Q → + T Q        Predict(Q → + T Q) = First(+TQ) = {+}
Q → – T Q        Predict(Q → – T Q) = First(–TQ) = {–}
Q → ϵ            Predict(Q → ϵ) = Follow(Q) = {$, )}
T → F R          Predict(T → F R) = First(FR) = First(F) = {(, id}
R → * F R        Predict(R → * F R) = First(*FR) = {*}
R → / F R        Predict(R → / F R) = First(/FR) = {/}
R → ϵ            Predict(R → ϵ) = Follow(R) = {+, –, $, )}
F → ( E )        Predict(F → ( E )) = {(}
F → id           Predict(F → id) = {id}
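The predict-set definition can be checked mechanically against this example. In the sketch below the FIRST and FOLLOW sets are entered by hand for the expression grammar (worked out from its productions, not computed by the code), the ASCII "-" stands in for "–", and the names PRODUCTIONS, first_of_string, and predict are illustrative.

```python
EPS = "ε"

PRODUCTIONS = [
    ("E", ["T", "Q"]),
    ("Q", ["+", "T", "Q"]),
    ("Q", ["-", "T", "Q"]),
    ("Q", []),                 # Q -> ϵ
    ("T", ["F", "R"]),
    ("R", ["*", "F", "R"]),
    ("R", ["/", "F", "R"]),
    ("R", []),                 # R -> ϵ
    ("F", ["(", "E", ")"]),
    ("F", ["id"]),
]

# Assumed inputs: FIRST and FOLLOW of each non-terminal, entered by hand.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "Q": {"+", "-", EPS}, "R": {"*", "/", EPS},
}
FOLLOW = {
    "E": {"$", ")"}, "Q": {"$", ")"},
    "T": {"+", "-", "$", ")"}, "R": {"+", "-", "$", ")"},
    "F": {"*", "/", "+", "-", "$", ")"},
}


def first_of_string(symbols):
    """FIRST of a production body; the result contains ϵ when the body is nullable."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # FIRST(terminal) = {terminal}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}


def predict(head, body):
    first = first_of_string(body)
    if EPS in first:                       # nullable body: add FOLLOW(head)
        return (first - {EPS}) | FOLLOW[head]
    return first


PREDICT = {(head, tuple(body)): predict(head, body) for head, body in PRODUCTIONS}
for (head, body), tokens in PREDICT.items():
    print(head, "->", " ".join(body) or EPS, sorted(tokens))
```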
Construct LL(1) Parsing Table
The predict sets can be represented in an LL(1) parse table
The rows are indexed by the nonterminals
The columns are indexed by the tokens
If A is a nonterminal and tok is the lookahead token then
Table[A][tok] indicates which production rule to predict
If no production rule can be used Table[A][tok] gives an error value
Table[A][tok] = A → α iff tok ∈ Predict(A → α)
Example on constructing the LL(1) parsing table:
1: S → A c B     Predict(1) = {a, c}
2: A → a A       Predict(2) = {a}
3: A → ϵ         Predict(3) = {c}
4: B → b B S     Predict(4) = {b}
5: B → ϵ         Predict(5) = {$, a, c}
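Filling the table from these five predict sets can be sketched as below. The representation is an assumption: the table is a dictionary keyed by (non-terminal, token) that stores the production number, a missing key plays the role of the error value, and the names PREDICT, build_table, and format_table are illustrative.

```python
# Production number -> (head, body, predict set), copied from the example.
PREDICT = {
    1: ("S", ["A", "c", "B"], {"a", "c"}),
    2: ("A", ["a", "A"], {"a"}),
    3: ("A", [], {"c"}),
    4: ("B", ["b", "B", "S"], {"b"}),
    5: ("B", [], {"$", "a", "c"}),
}


def build_table(predict):
    """Table[A][tok] = production number iff tok is in the predict set of that production."""
    table = {}
    for number, (head, _body, tokens) in predict.items():
        for tok in tokens:
            if (head, tok) in table:
                # Two productions of the same non-terminal share a lookahead token.
                raise ValueError(f"conflict at Table[{head}][{tok}]: grammar is not LL(1)")
            table[(head, tok)] = number
    return table


def format_table(table, nonterminals, tokens):
    """Render the table as text lines; '.' marks an error entry."""
    lines = ["    " + "  ".join(tokens)]
    for nt in nonterminals:
        cells = [str(table.get((nt, tok), ".")) for tok in tokens]
        lines.append(f"{nt}   " + "  ".join(cells))
    return lines


TABLE = build_table(PREDICT)
print("\n".join(format_table(TABLE, ["S", "A", "B"], ["a", "b", "c", "$"])))
```

Raising an error on a duplicate entry is one way to detect that a grammar is not LL(1) while the table is being built.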
Use Parsing Table to Parse
Push $ and then S onto the stack (S on top), and attach $ to the end of the input string. x is the symbol on top of the stack, a is the current input symbol
If x = a = $, success
If x = a ≠ $, pop x, advance input
If x is a non-terminal
– If M[x,a] = {x → UVW}, replace x by WVU (U on top)
– If M[x,a] has no rule, error
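The driver loop described above can be sketched in a few lines. The table below is an assumption: it encodes the grammar S → A c B, A → a A | ϵ, B → b B S | ϵ with entries taken from its predict sets, an empty list stands for an ϵ production, and the function simply returns True or False where a real parser would report the error. The name ll1_parse is illustrative.

```python
# M[x, a] -> body of the production to apply; a missing key is an error entry.
TABLE = {
    ("S", "a"): ["A", "c", "B"], ("S", "c"): ["A", "c", "B"],
    ("A", "a"): ["a", "A"], ("A", "c"): [],
    ("B", "b"): ["b", "B", "S"],
    ("B", "a"): [], ("B", "c"): [], ("B", "$"): [],
}
NONTERMINALS = {"S", "A", "B"}


def ll1_parse(tokens, table=TABLE, start="S"):
    """Return True if the token sequence is accepted, False on a parsing error."""
    stack = ["$", start]              # start symbol on top of the endmarker
    tokens = list(tokens) + ["$"]     # attach $ to the end of the input
    pos = 0
    while True:
        x, a = stack[-1], tokens[pos]
        if x == a == "$":
            return True               # stack and input exhausted together
        if x in NONTERMINALS:
            body = table.get((x, a))
            if body is None:
                return False          # no rule in M[x, a]
            stack.pop()
            stack.extend(reversed(body))  # push in reverse so the first symbol is on top
        elif x == a:
            stack.pop()               # matched a terminal
            pos += 1
        else:
            return False              # terminal on the stack does not match the input


for text in ["c", "aac", "acbc", "aacb", "ab"]:
    print(text, ll1_parse(text))
```

The string aacb is rejected because B → b B S requires a complete S after the b, and S cannot start with the endmarker.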
LL(1) and Predictive Parsing
The parsing table of an LL(1) grammar has no multiply-defined entries
No ambiguous or left-recursive grammar can be LL(1)
Parsing Errors
The top of the stack is a terminal, but it does not match the current input token
The top of the stack is a non-terminal, but the table has no rule for the current input token
Handling Parsing Errors
Report
– Report expected vs found symbols
– Fill the empty entries in the parsing table with error messages
Patch up
– Insert missing symbols
Skip
– To the next delimiter
– Until find matching parenthesis
– Until find }
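The "report" and "skip" strategies can be illustrated with two small helpers. This is only a sketch under assumptions: tokens are a list of strings, the error position is supplied by the caller, and the function names, the message format, and the sample statement are hypothetical and not part of the lecture.

```python
def report(expected, found, pos):
    """Build an 'expected vs found' message for the token at index pos."""
    wanted = " or ".join(sorted(expected))
    return f"syntax error at token {pos}: expected {wanted}, found {found!r}"


def skip_to_sync(tokens, pos, sync_tokens):
    """Skip input from pos until a synchronizing token (a delimiter, ')' or '}') or the end."""
    while pos < len(tokens) and tokens[pos] not in sync_tokens:
        pos += 1
    return pos


# Hypothetical input: the parser fails at index 2, where an operand was expected.
tokens = ["x", "=", "+", "*", "3", ";", "y", "=", "1", ";"]
error_pos = 2
message = report({"id", "("}, tokens[error_pos], error_pos)
resume = skip_to_sync(tokens, error_pos, {";"})
print(message)
print("resume parsing at token", resume)
```

Skipping to the next delimiter lets the parser continue with the following statement, so several errors can be reported in one run.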
END
Compute FIRST for a String
For α = X1X2…Xn
– Add all non-ε symbols of FIRST(X1) to FIRST(α)
– Add all non-ε symbols of FIRST(Xj) to FIRST(α) if ε is in FIRST(Xi) for all i<j
– Add ε to FIRST(α) if ε is in FIRST(Xi) for all i
Predictive Parsing Table
For each production rule A → α
– For each terminal a in FIRST(α), add A → α to M[A,a]
– If ε is in FIRST(α), add A → α to M[A,b] for each terminal b in FOLLOW(A) (b can be $)
– Undefined entries of M are ‘error entries’