Predictive Parsing
Find a derivation for an input string
Build an abstract syntax tree (AST)
– a representation of the parsed program
Build a symbol table
– Describe the program objects
Type
Location
Scope and access
Munge the AST
– Optimize
– Modify
Subscript checking
Communication interactions
Generate executable code
Terminology
Derivation
Sequence starting with the start symbol S and
proceeding through a sequence of derivation steps
Derivation step
A non-terminal on the left side of a derivation is replaced
by the right hand side of a production rule in the next step
of the derivation
αAβ ⇒ αγβ
if A → γ is a production, and α, β
are arbitrary strings of grammar symbols
Sentential Form
Any string of grammar symbols produced by a derivation step
Sentence
Sentential form with no non-terminals
Derivation example
Grammar
E → E + E | E * E | ( E ) | - E | id
Simple derivation
E ⇒ -E
Derivation of -(id+id)
E ⇒ -E ⇒ -(E) ⇒ -(E+E)
⇒ -(id+E) ⇒ -(id+id)
E ⇒* -(id+id)
If α_1 ⇒ α_2 ⇒ … ⇒ α_n, then α_1 derives α_n,
or simply α_1 ⇒* α_n
Select a nonterminal to replace and an alternative at each
step of a derivation
Leftmost Derivation
The derivation
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
is leftmost, which we designate as
E ⇒lm -E ⇒lm -(E) ⇒lm
-(E+E) ⇒lm -(id+E) ⇒lm
-(id+id)
A sentential form is the result of a derivation step.
The result of a leftmost derivation step is a left sentential
form, for example α in:
S ⇒*lm α
(Denoted *lm for typographical convenience)
Leftmost Derivation
A derivation in which only the leftmost nonterminal in any sentential form is replaced at
each step.
Unique for a given string when the grammar is unambiguous
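A minimal sketch of the leftmost rule, replaying the derivation of -(id+id) from the slides. The helper names `leftmost_step` and `leftmost_derivation` are illustrative, and the sketch does not check that each chosen right-hand side is really a production of the grammar.

```python
# Grammar from the slides: E -> E + E | E * E | ( E ) | - E | id
NONTERMINALS = {"E"}

def leftmost_step(form, rhs):
    """Replace the leftmost nonterminal in `form` (a list of symbols) by `rhs`."""
    for i, symbol in enumerate(form):
        if symbol in NONTERMINALS:
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to replace")

def leftmost_derivation(start, choices):
    """Apply the chosen right-hand sides in order, always at the leftmost nonterminal."""
    forms = [[start]]
    for rhs in choices:
        forms.append(leftmost_step(forms[-1], rhs))
    return forms

choices = [
    ["-", "E"],       # E => -E
    ["(", "E", ")"],  # => -(E)
    ["E", "+", "E"],  # => -(E+E)
    ["id"],           # => -(id+E)
    ["id"],           # => -(id+id)
]
steps = ["".join(form) for form in leftmost_derivation("E", choices)]
print(" => ".join(steps))
# E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
```

Every intermediate string in `steps` is a left sentential form; the last one contains no nonterminals, so it is a sentence.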
Rightmost Derivation
The rightmost non-terminal is replaced in the
derivation process in each step.
Also referred to as Canonical Derivation
Right sentential form: a sentential form
produced via a rightmost derivation
– If S ⇒* α, then α is a sentential form of the CFG
– If S ⇒*rm α, then α is a right sentential form
Right Sentential Form
Grammar:
S → aABe
A → Abc | b
B → d
Reduce abbcde to S by four steps
abbcde
aAbcde
aAde
aABe
S
In reverse, it is
S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde
– abbcde is a right-sentential form (replace b in position 2
by A)
– aAbcde is a right-sentential form (replace Abc in
position 2 by A)
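The four reductions above can be replayed mechanically. This is only a sketch: the handle positions are supplied by hand (1-based, as on the slide), and `reduce_at` is a hypothetical helper, not part of any parser library.

```python
def reduce_at(form, handle, position, lhs):
    """Replace `handle`, found at 1-based `position` in `form`, by `lhs`."""
    start = position - 1
    if form[start:start + len(handle)] != handle:
        raise ValueError(f"{handle!r} does not occur at position {position} of {form!r}")
    return form[:start] + lhs + form[start + len(handle):]

# (handle, position, replacement), in the order the slide performs them
reductions = [
    ("b", 2, "A"),     # replace b in position 2 by A
    ("Abc", 2, "A"),   # replace Abc in position 2 by A
    ("d", 3, "B"),     # replace d in position 3 by B
    ("aABe", 1, "S"),  # replace the whole string by S
]

form = "abbcde"
trace = [form]
for handle, position, lhs in reductions:
    form = reduce_at(form, handle, position, lhs)
    trace.append(form)

print(" -> ".join(trace))
# abbcde -> aAbcde -> aAde -> aABe -> S
```

Read right to left, `trace` is exactly the rightmost derivation of abbcde.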
Top Down Parsing
Top Down
For a given string, builds a parse tree from the start
symbol of the grammar, and grows towards the leaves
Suitable grammar is LL(k)
– Leftmost processing, leftmost derivation, look at most
k symbols ahead
– Should be left-factored, and without left
recursion
Recursive descent parsing
– May backtrack, for certain grammars
– Good error-detection and handling capabilities
Predictive parsing
– No backtracking
– Given the partial derivation and the leading terminal,
exactly one production rule is applicable
– Parsers include
Hand-written recursive descent
Table-driven LL(k) parser
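A hand-written predictive parser might look like the sketch below. The expression grammar used earlier is ambiguous and left-recursive, so this sketch assumes a rewritten LL(1) grammar that is not in the slides: E → T E', E' → + T E' | ε, T → - T | ( E ) | id. The class and method names are illustrative.

```python
class ParseError(Exception):
    pass

class PredictiveParser:
    """Recursive descent with one token of lookahead and no backtracking."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def parse(self):
        tree = self.expr()
        self.match("$")
        return tree

    def expr(self):  # E -> T E'
        return self.expr_rest(self.term())

    def expr_rest(self, left):  # E' -> + T E' | epsilon
        if self.peek() == "+":
            self.match("+")
            right = self.term()
            return self.expr_rest(("+", left, right))
        return left  # epsilon: the lookahead is not "+"

    def term(self):  # T -> - T | ( E ) | id
        token = self.peek()
        if token == "-":
            self.match("-")
            return ("neg", self.term())
        if token == "(":
            self.match("(")
            inner = self.expr()
            self.match(")")
            return inner
        if token == "id":
            self.match("id")
            return "id"
        raise ParseError(f"unexpected token {token!r}")

tree = PredictiveParser(["-", "(", "id", "+", "id", ")"]).parse()
print(tree)
# ('neg', ('+', 'id', 'id'))
```

In each procedure the leading terminal alone selects the production, which is the "exactly one production rule is applicable" property.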
(for later)
Two:
Operator Precedence
Operator Precedence
Operator grammar
no right side of a production contains adjacent nonterminals
S → E o E | id
o → + | - | * | /   (X: not an operator grammar, since E o E has adjacent nonterminals)
If a grammar is an operator grammar and it
has no productions with an empty (ε) right-hand side, then
there is an operator-precedence parser for that
grammar
Special case of a shift-reduce parser
Precedence Grammars
Parse with shift/reduce
No production right side is ε, and no right side has
two adjacent nonterminals
bad
multi-precedence operator (-) difficult
can’t be sure parser matches the grammar!
only works for some grammars
good
simple, simple, simple
Build on non-reflexive precedence relations that we
denote as .> , = , <. (typographical convenience for
dotted forms of <,=,> as in text)
Computing Precedence
Precedence is disjoint. Can have
a <. b
a <. b and a .> b
c <. b, c = b, c .> b
<. is read “yields precedence”, = is read “equal precedence”,
and .> is read “takes precedence”
Obtain precedence by manual assignment
using traditional associativity and precedence
rules, or mechanically from an unambiguous
grammar
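The manual assignment can be sketched as a small table builder. It assumes binary operators only, with id and the end marker $ as the only other terminals; `build_relations` and its `levels` argument are hypothetical names chosen for this example.

```python
def build_relations(levels):
    """Build operator-precedence relations from precedence levels.

    `levels` lists (operators, associativity) pairs from lowest to highest
    precedence; associativity is "left" or "right".
    """
    prec, assoc = {}, {}
    for level, (ops, associativity) in enumerate(levels):
        for op in ops:
            prec[op] = level
            assoc[op] = associativity

    rel = {}
    for a in prec:
        for b in prec:
            if prec[a] > prec[b]:
                rel[(a, b)] = ".>"      # a binds tighter: reduce a first
            elif prec[a] < prec[b]:
                rel[(a, b)] = "<."      # b binds tighter: shift b
            elif assoc[a] == "left":
                rel[(a, b)] = ".>"      # same level, left associative
            else:
                rel[(a, b)] = "<."      # same level, right associative
    for op in prec:
        rel[(op, "id")] = "<."
        rel[("id", op)] = ".>"
        rel[("$", op)] = "<."
        rel[(op, "$")] = ".>"
    rel[("$", "id")] = "<."
    rel[("id", "$")] = ".>"
    return rel

relations = build_relations([(["+", "-"], "left"), (["*", "/"], "left")])
print(relations[("+", "*")], relations[("*", "+")], relations[("+", "+")])
# <. .> .>
```

Pairs with no entry, such as (id, id), are errors: no relation holds between them.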
How to process
– “Ignore” nonterminals, and then delimit handle from
right side .> and then back up to the left side <.
Operator Precedence Parser
Remove (hide) nonterminals and place precedence
relationship between pairs of terminals
(1) Represent id + id * id as
$ <. id .> + <. id .> * <. id .> $
(2) Apply scanning method
a) scan from left toward right to find .>
b) backwards scan to <. or $
c) handle is located between start and end of scan
d) reduce the handle to the corresponding nonterminal
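The scanning method can be sketched with a stack of terminals. The relation table below is written by hand for id, +, * and $ only, and the function name `operator_precedence_parse` is illustrative; nonterminals are hidden, so only the terminals of each handle are recorded.

```python
# Hand-written relations for the terminals id, +, * and the end marker $.
REL = {
    ("id", "+"): ".>", ("id", "*"): ".>", ("id", "$"): ".>",
    ("+", "id"): "<.", ("+", "+"): ".>", ("+", "*"): "<.", ("+", "$"): ".>",
    ("*", "id"): "<.", ("*", "+"): ".>", ("*", "*"): ".>", ("*", "$"): ".>",
    ("$", "id"): "<.", ("$", "+"): "<.", ("$", "*"): "<.",
}

def operator_precedence_parse(tokens):
    """Return the handles (terminals only) in the order they are reduced."""
    stack = ["$"]                    # terminals only; nonterminals are ignored
    tape = list(tokens) + ["$"]
    pos = 0
    reductions = []
    while True:
        top, a = stack[-1], tape[pos]
        if top == "$" and a == "$":
            return reductions        # input consumed, nothing left to reduce
        relation = REL.get((top, a))
        if relation == "<.":
            stack.append(a)          # shift
            pos += 1
        elif relation == ".>":
            # Right end of a handle found: back up to the matching <.
            handle = []
            while True:
                popped = stack.pop()
                handle.insert(0, popped)
                if REL.get((stack[-1], popped)) == "<.":
                    break
            reductions.append(" ".join(handle))
        else:
            raise SyntaxError(f"no precedence relation between {top!r} and {a!r}")

handles = operator_precedence_parse(["id", "+", "id", "*", "id"])
print(handles)
# ['id', 'id', 'id', '*', '+']
```

The multiplication is reduced before the addition, as the relations demand. Because nonterminals are ignored, this sketch also accepts some strings the grammar does not generate (for example `+ +`), which is the "can't be sure parser matches the grammar" weakness.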
Relies on the grammar’s special form
(1) In grammar rule, no adjacent nonterminals on the right
hand-side (by definition), so no right sentential form will
have two adjacent non-terminals
(2) Form is β_0 a_1 β_1 ... a_n β_n
β_i is a nonterminal or ε
a_i is a terminal
Bottom Up Parsing (shift-reduce)
Bottom Up
What
– For a given string, builds a parse tree starting at the
leaves and reaches the root.
– Known as shift-reduce parsing
– Rightmost derivation in reverse
How
– Start with the given input string (i.e., program)
– Find a substring that matches the right side of a
production
– Substitute left-side of grammar rule when right-side
matches
Terminology
– A handle is a substring that matches the right hand
side of a production
– Handles will be reduced to the left-hand side of the
matching production
Handles
Handle
– Substring that matches the right side of a production
– When reduced (to the left side) , this represents one
step along the reverse of a rightmost derivation
Handle Pruning
– Replacing a handle by the left hand side
Implementation
Stack is a good data type to implement a
shift-reduce parser
– The stack contains the grammar symbols
Nonterminals recognized from the input
Terminals not yet recognized as belonging to any
production
– An input tape contains the input string
The parser
– Shift next input symbol from the input tape and into
the stack
– Keep shifting until a handle appears at top of stack
– Reduce when a handle appears on top of stack
– Terminate when the stack contains only S and the input
tape is empty
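The shift and reduce loop can be sketched with a list as the stack. The toy grammar S → ( S ) | x is an assumption made for this example, chosen because any right-hand side on top of the stack is a handle there; for most grammars this greedy test is wrong and a parsing table is needed to recognize handles.

```python
# Toy grammar (an assumption for this sketch): S -> ( S ) | x
PRODUCTIONS = [("S", ["(", "S", ")"]), ("S", ["x"])]

def shift_reduce(tokens, start="S"):
    """Return (accepted, trace) for a greedy shift-reduce parse."""
    stack, tape = [], list(tokens)
    trace = []
    while True:
        for lhs, rhs in PRODUCTIONS:
            if stack[-len(rhs):] == rhs:      # handle on top of the stack
                del stack[-len(rhs):]         # pop the handle
                stack.append(lhs)             # push the left-hand side
                trace.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:
            if not tape:
                break                         # nothing to reduce or shift
            symbol = tape.pop(0)              # shift next input symbol
            stack.append(symbol)
            trace.append(f"shift {symbol}")
    return stack == [start], trace

accepted, trace = shift_reduce(["(", "(", "x", ")", ")"])
print(accepted)
for move in trace:
    print(move)
```

The parse is accepted when the loop stops with only S on the stack and an empty input tape.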
LR(k) Grammar
A grammar is LR if a bottom-up parser, on a left to right
scan, can recognize handles when they appear on the
top of the stack
L – left to right scanning
R – rightmost derivation in reverse
k – number of lookahead symbols
Supports non-backtracking shift-reduce
parsing method. Deterministic parsing!
Too hard to construct parse-table by hand (so
use a parser-generator)
LR Parsing Algorithm
Parsing program
Parsing table
Input tape
Stack
Output tape
Parsing Table
Action[]
What to do when a symbol appears on the tape
Action[State, input symbol] → shift, reduce, accept, reject
Goto[]
A finite automaton that can recognize a handle on the
top of the stack by scanning the stack top to bottom
Goto[State, grammar symbol] → State
What's so great about
LR Grammars?
“The LR requirement is that we be able to
recognize the occurrence of the right side of a
production, having seen what is derived from
that side. This is far less stringent than the
requirement for predictive parsing, namely
that we be able to recognize the apparent use
of the production seeing only the first symbol
that it derives” [AU77:202]
The state symbol on top of the stack contains
all the information the parser needs
The Underlying Basis of LR
(a Lasting Relationship)
Whereas the nondeterministic PDA (nPDA) is
equivalent to the CFGs, the deterministic PDA (dPDA)
is equivalent only to a subclass, the
deterministic CFLs
Every LR(k) grammar generates a
deterministic CFL
Every deterministic CFL has an LR(1) grammar
No ambiguous grammar can be LR(k) for any k
However, it remains undecidable whether a
CFG G is ambiguous
LR Algorithm
action[s_m, a_i] ∈ {shift s, reduce A → β,
accept, error}
goto[s, X] takes state and grammar symbol
and produces a state
It is the transition function of a DFA on viable prefixes of G
Viable prefix is a prefix of a right sentential form that can
appear on the stack of a shift-reduce parser
Progresses through configurations
Configuration – right sentential forms with states intermixed
Written as pairs – (stack contents, unexpended input)
To obtain the next move
– read a_i (current input)
– Look at s_m (state at top-of-stack)
– consult action[]
LR Actions
action[s_m, a_i] = shift s
execute a shift move:
(s_0 X_1 s_1 X_2 s_2 … X_m s_m, a_i a_i+1 … a_n $)
shifted current input a_i and next state s
off of input and onto stack
(s_0 X_1 s_1 X_2 s_2 … X_m s_m a_i s, a_i+1 … a_n $)
action[s_m, a_i] = reduce A → β
execute a reduce move:
(s_0 X_1 s_1 X_2 s_2 … X_m s_m, a_i a_i+1 … a_n $)
popped 2r symbols off (r for state, r for grammar)
pushed A and s onto stack
no change in input
(s_0 X_1 s_1 X_2 s_2 … X_m-r s_m-r A s, a_i a_i+1 … a_n $)
where s = goto[s_m-r, A], and r is the length of β (rule right-hand side)
See example 4.33, pages 218-220 for detail
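The shift and reduce moves can be sketched as a table-driven loop. The tables below are an SLR table built by hand for a toy grammar that is not in the slides (1: S → ( S ), 2: S → x), so both the grammar and the state numbering are assumptions of this example.

```python
# Toy grammar (assumed): 1: S -> ( S )   2: S -> x
PRODUCTIONS = {1: ("S", 3), 2: ("S", 1)}  # number -> (left side, length r of right side)

ACTION = {
    (0, "("): ("shift", 2), (0, "x"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "("): ("shift", 2), (2, "x"): ("shift", 3),
    (3, ")"): ("reduce", 2), (3, "$"): ("reduce", 2),
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", 1), (5, "$"): ("reduce", 1),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}

def lr_parse(tokens):
    """Return (accepted, productions used), in order of reduction."""
    stack = [0]                       # s_0 X_1 s_1 ... X_m s_m
    tape = list(tokens) + ["$"]
    i = 0
    output = []
    while True:
        state, a = stack[-1], tape[i]
        kind, arg = ACTION.get((state, a), ("error", None))
        if kind == "shift":
            stack += [a, arg]         # push input symbol and next state
            i += 1
        elif kind == "reduce":
            lhs, r = PRODUCTIONS[arg]
            del stack[-2 * r:]        # pop r states and r grammar symbols
            s = GOTO[(stack[-1], lhs)]
            stack += [lhs, s]         # push A and goto[s_m-r, A]
            output.append(arg)        # input is unchanged
        elif kind == "accept":
            return True, output
        else:
            return False, output

accepted, used = lr_parse(["(", "(", "x", ")", ")"])
print(accepted, used)
# True [2, 1, 1]
```

The output lists the productions of a rightmost derivation in reverse: S → x is reduced first, then S → ( S ) twice.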
Construction of LR parser table
SLR (Simple LR)
LALR (Look Ahead LR)
Canonical LR
Yet Another Compiler Compiler
Parser Generator
Converts a Context Free Grammar into a set
of tables for an automaton.
Generates LALR parser
This automaton executes the LALR(1) parsing
algorithm.