Bottom-Up Parsing
• A bottom-up parser creates the parse tree of the given input starting
from leaves towards the root.
S ⇒rm ... ⇒rm ω    (the right-most derivation of ω)
(the bottom-up parser finds the right-most derivation in the reverse order)
• Bottom-up parsing is also known as SR parsing because its two main
actions are shift and reduce.
– At each shift action, the current symbol in the input string is pushed to a stack.
– At each reduction step, the symbols at the top of the stack (this symbol sequence is the right
side of a production) are replaced by the non-terminal on the left side of that production.
– There are also two more actions: accept and error.
Shift-Reduce Parsing
• A shift-reduce parser tries to reduce the given input string into the starting symbol.
(a string ω is reduced to the starting symbol S)
• At each reduction step, a substring of the input matching the right side of a
production rule is replaced by the non-terminal on the left side of that production rule.

Rightmost derivation:         S ⇒*rm ω
Shift-reduce parser finds:    ω ⇐rm ... ⇐rm S
Shift-Reduce Parsing -- Example
S → aABb
A → aA | a
B → bB | b

input string:  aaabb
               aaAbb   (reduce by A → a)
               aAbb    (reduce by A → aA)
               aABb    (reduce by B → b)
               S       (reduce by S → aABb)

Right-most derivation:  S ⇒rm aABb ⇒rm aAbb ⇒rm aaAbb ⇒rm aaabb
• How do we know which substring is to be replaced at each reduction step?
Handle
• Informally, a handle of a string is a substring that matches the right side
of a production rule.
– But not every substring that matches the right side of a production rule is a handle.
• Handle: a handle of a right-sentential form is a production A → β together with a position
where β occurs, such that replacing β by A yields the previous right-sentential form in a
rightmost derivation. For example, with the expression grammar used later, in E+T*id both
id and E+T match right sides, but only id is a handle: reducing E+T would give E*id, which
is not a right-sentential form.
• If the grammar is unambiguous, then every right-sentential form of the
grammar has exactly one handle.
Handle Pruning
• A right-most derivation in reverse can be obtained by handle-pruning.
S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn-1 ⇒rm γn = ω    (the input string)

• Start from γn: find a handle An → βn in γn,
and replace βn by An to get γn-1.
• Then find a handle An-1 → βn-1 in γn-1,
and replace βn-1 by An-1 to get γn-2.
• Repeat this until we reach S.
A Shift-Reduce Parser
E → E+T | T
T → T*F | F
F → (E) | id

Right-most derivation of id+id*id:
E ⇒rm E+T ⇒rm E+T*F ⇒rm E+T*id ⇒rm E+F*id ⇒rm E+id*id ⇒rm T+id*id ⇒rm F+id*id ⇒rm id+id*id

Right-Most Sentential Form    Reducing Production
id+id*id                      F → id
F+id*id                       T → F
T+id*id                       E → T
E+id*id                       F → id
E+F*id                        T → F
E+T*id                        F → id
E+T*F                         T → T*F
E+T                           E → E+T
E                             (start symbol; done)

In each right-sentential form, the handle is the substring that the reducing production replaces.
A Stack Implementation of A Shift-Reduce Parser
• There are four possible actions of a shift-reduce parser:
1. Shift: the next input symbol is shifted onto the top of the stack.
2. Reduce: replace the handle on the top of the stack by the corresponding non-terminal.
3. Accept: successful completion of parsing.
4. Error: the parser discovers a syntax error and calls an error recovery routine.
• The initial stack contains only the end-marker $.
• The end of the input string is also marked by the end-marker $.
A Stack Implementation of A Shift-Reduce Parser
E → E+T | T ;  T → T*F | F ;  F → (E) | id

Right-most derivation of id+id*id:
E ⇒rm E+T ⇒rm E+T*F ⇒rm E+T*id ⇒rm E+F*id ⇒rm E+id*id ⇒rm T+id*id ⇒rm F+id*id ⇒rm id+id*id

Stack        Input        Action
$            id+id*id$    shift
$id          +id*id$      reduce by F → id
$F           +id*id$      reduce by T → F
$T           +id*id$      reduce by E → T
$E           +id*id$      shift
$E+          id*id$       shift
$E+id        *id$         reduce by F → id
$E+F         *id$         reduce by T → F
$E+T         *id$         shift
$E+T*        id$          shift
$E+T*id      $            reduce by F → id
$E+T*F       $            reduce by T → T*F
$E+T         $            reduce by E → E+T
$E           $            accept
Parse tree for id+id*id (subscripts give the order in which the reductions create each node):

E8
├── E3
│   └── T2
│       └── F1
│           └── id
├── +
└── T7
    ├── T5
    │   └── F4
    │       └── id
    ├── *
    └── F6
        └── id
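The trace above can be replayed with a minimal Python sketch of the underlying stack machine. This is only an illustration: the sequence of actions is supplied by hand (taken from the trace), because deciding when to shift and when to reduce is exactly what the LR tables of the following slides automate. The names GRAMMAR, run_shift_reduce and script are made up for this sketch.

GRAMMAR = {                    # production number -> (left side, right side)
    1: ("E", ["E", "+", "T"]),
    2: ("E", ["T"]),
    3: ("T", ["T", "*", "F"]),
    4: ("T", ["F"]),
    5: ("F", ["(", "E", ")"]),
    6: ("F", ["id"]),
}

def run_shift_reduce(tokens, script, start="E"):
    """Replay a list of ('shift',), ('reduce', p) and ('accept',) actions."""
    stack, inp = ["$"], tokens + ["$"]
    for step in script:
        if step[0] == "shift":
            stack.append(inp.pop(0))          # push the next input symbol
        elif step[0] == "reduce":
            lhs, rhs = GRAMMAR[step[1]]
            if stack[-len(rhs):] != rhs:      # the handle must be on top of the stack
                raise SyntaxError(f"no handle {rhs} on stack {stack}")
            del stack[-len(rhs):]
            stack.append(lhs)                 # replace the handle by the left side
        else:                                 # accept
            return stack == ["$", start] and inp == ["$"]
    raise SyntaxError("input not accepted")

# The action sequence of the trace above for id+id*id:
script = [("shift",), ("reduce", 6), ("reduce", 4), ("reduce", 2),
          ("shift",), ("shift",), ("reduce", 6), ("reduce", 4),
          ("shift",), ("shift",), ("reduce", 6), ("reduce", 3),
          ("reduce", 1), ("accept",)]
print(run_shift_reduce(["id", "+", "id", "*", "id"], script))   # True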
Shift-Reduce Parsers
• There are two main categories of shift-reduce parsers:
1. LR parsers
   – cover a wide range of grammars.
     • SLR – simple LR parser
     • LR – most general LR parser
     • LALR – intermediate LR parser (look-ahead LR parser)
   – SLR, LR and LALR work the same way; only their parsing tables differ.
2. Operator-precedence parsers
   – simple, but handle only a small class of grammars.
LR Parsers
• The most powerful (yet efficient) shift-reduce parsing method is LR parsing.
– L stands for left-to-right scanning of the input; R stands for constructing a right-most
  derivation in reverse.
• LR parsing is attractive because:
– LR parsing is the most general non-backtracking shift-reduce parsing method, yet it is still efficient.
– An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right
scan of the input.
LR Parsing Algorithm
Model of an LR parser:

• Input:  a1 ... ai ... an $
• Stack:  S0 X1 S1 ... Xm-1 Sm-1 Xm Sm   (grammar symbols Xi and states Si, with Sm on top)
• Action table: rows are states, columns are terminals and $; each entry is one of four
  different actions.
• Goto table: rows are states, columns are non-terminals; each entry is a state number.
• The parsing program reads the input, consults the two tables, and emits the output
  (the sequence of reductions performed).
LR – Parsing Algorithm
Step 1: The stack is initialized with [0]. The current state is always the state that is at
the top of the stack.
Step 2: Given the current state and the current terminal of the input stream, an action is
looked up in the action table. There are four cases:
– a shift sn:
  • the current terminal is removed from the input stream
  • the state n is pushed onto the stack and becomes the current state
– a reduce rm:
  • the number m is written to the output stream
  • one state is removed from the stack for every symbol in the right-hand side of rule m
  • the goto table is then consulted with the uncovered state and the left-hand side
    non-terminal of rule m, and the resulting state is pushed onto the stack
– an accept: the string is accepted
– no action: a syntax error is reported
Step 3: Step 2 is repeated until either the string is accepted or a syntax error is
reported. A small code sketch of this loop is given below.
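As an illustration of Steps 1–3, here is a Python sketch of the table-driven loop, using the expression grammar E → E+T | T, T → T*F | F, F → (E) | id from the earlier slides. The ACTION and GOTO entries are the usual SLR(1) tables for that grammar, written out by hand here rather than produced by a generator; treat them as illustrative.

PRODUCTIONS = {                  # rule number -> (left side, length of right side)
    1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
    4: ("T", 1), 5: ("F", 3), 6: ("F", 1),
}

ACTION = {   # (state, terminal) -> ("s", n) shift, ("r", m) reduce, or ("acc",)
    (0, "id"): ("s", 5), (0, "("): ("s", 4),
    (1, "+"): ("s", 6), (1, "$"): ("acc",),
    (2, "+"): ("r", 2), (2, "*"): ("s", 7), (2, ")"): ("r", 2), (2, "$"): ("r", 2),
    (3, "+"): ("r", 4), (3, "*"): ("r", 4), (3, ")"): ("r", 4), (3, "$"): ("r", 4),
    (4, "id"): ("s", 5), (4, "("): ("s", 4),
    (5, "+"): ("r", 6), (5, "*"): ("r", 6), (5, ")"): ("r", 6), (5, "$"): ("r", 6),
    (6, "id"): ("s", 5), (6, "("): ("s", 4),
    (7, "id"): ("s", 5), (7, "("): ("s", 4),
    (8, "+"): ("s", 6), (8, ")"): ("s", 11),
    (9, "+"): ("r", 1), (9, "*"): ("s", 7), (9, ")"): ("r", 1), (9, "$"): ("r", 1),
    (10, "+"): ("r", 3), (10, "*"): ("r", 3), (10, ")"): ("r", 3), (10, "$"): ("r", 3),
    (11, "+"): ("r", 5), (11, "*"): ("r", 5), (11, ")"): ("r", 5), (11, "$"): ("r", 5),
}

GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
        (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
        (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def lr_parse(tokens):
    stack = [0]                                # Step 1: the stack holds state 0
    inp = tokens + ["$"]
    output = []                                # the reductions performed, i.e. the
    while True:                                # rightmost derivation in reverse
        act = ACTION.get((stack[-1], inp[0]))  # Step 2: current state x current terminal
        if act is None:
            raise SyntaxError(f"unexpected {inp[0]!r} in state {stack[-1]}")
        if act[0] == "s":                      # shift: consume the terminal, push state n
            inp.pop(0)
            stack.append(act[1])
        elif act[0] == "r":                    # reduce by rule m: pop one state per
            lhs, rhs_len = PRODUCTIONS[act[1]] # right-hand-side symbol, then push the
            del stack[-rhs_len:]               # goto state for the left-hand side
            stack.append(GOTO[(stack[-1], lhs)])
            output.append(act[1])
        else:                                  # accept (Step 3: loop ends here or on error)
            return output

print(lr_parse(["id", "+", "id", "*", "id"]))  # [6, 4, 2, 6, 4, 6, 3, 1]

For brevity this driver keeps only states on the stack; storing the grammar symbols alongside the states, as in the earlier stack pictures, changes nothing essential.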
SLR parsing
• A problem with LL(1) parsing is that most grammars need extensive
rewriting to get them into a form that allows unique choice of
production.
• A class of bottom-up parsing methods called LR parsing exists which accepts a much
larger class of grammars.
• The main advantage of LR parsing is that less rewriting is required to get a grammar
into an acceptable form; moreover, there are languages for which LR-acceptable grammars
exist but no LL(1) grammars.
• We start our discussion with SLR for the following reasons:
– It is simpler.
– In practice, LALR(1) handles few grammars that are not also handled by SLR.
– When a grammar is in the SLR class, the parse-tables produced by SLR are identical to those produced
by LALR(1).
– Understanding of SLR principles is sufficient to know how a grammar should be rewritten when an
LALR(1) parser generator rejects it.
SLR parsing
• The letters SLR stand for Simple LR: the input is read from left to right, and a
right-most derivation is built in reverse.
• LR parsers are table-driven bottom-up parsers and use two kinds of
actions involving the input stream and a stack:
– shift: A symbol is read from the input and pushed on the stack.
– reduce: On the stack, a number of symbols that are identical to the right-hand side of a
production are replaced by the left-hand side of that production.
• When all of the input is read, the stack will have a single element, which
will be the start symbol of the grammar.
• Our aim is to make the choice of action depend only on the next input
symbol and the symbol on top of the stack. To achieve this, we
construct a DFA.
SLR parsing
• Conceptually, this DFA reads the contents of the stack, starting from the
bottom.
• If the DFA is in an accepting state when it reaches the top of the stack,
it will cause reduction by a production that is determined by the state
and the next input symbol.
• If the DFA is not in an accepting state, it will cause a shift.
• Letting the DFA read the entire stack at every action is not very
efficient, so, instead, we keep track of the DFA state every time we push
an element on the stack, storing the state as part of the stack element.
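A small sketch of that bookkeeping, assuming some transition function dfa_step(state, symbol) for the DFA (the name is made up for this sketch): each stack element stores the symbol together with the DFA state reached after pushing it, so the current state can be read off the top of the stack without rescanning it.

def push(stack, symbol, dfa_step):
    """Push `symbol` and remember the DFA state reached after reading it."""
    state = stack[-1][1] if stack else 0       # state stored with the current top
    stack.append((symbol, dfa_step(state, symbol)))

def current_state(stack):
    return stack[-1][1] if stack else 0        # no need to rerun the DFA over the stack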
SLR parsing
• We represent the DFA as a table, where we cross-index a DFA state
with a symbol (terminal or nonterminal) and find one of the following
actions:
– shift n: Read the next input symbol and push state n on the stack.
– go n: Push state n on the stack.
– reduce p: Reduce with the production numbered p.
– accept: Parsing has completed successfully.
– error: A syntax error has been detected.
• Note that the current state is always found at the top of the stack. Shift
and reduce actions are found when a state is cross-indexed with a
terminal symbol.
• Go actions are found when a state is cross-indexed with a nonterminal.
Constructing SLR parse tables
• An SLR parse table has as its core a DFA.
• We first construct an NFA and then use the subset construction to convert it into a DFA.
• We study the procedure by considering the grammar below:

  T → R
  T → aTc
  R →        (empty right side)
  R → bR
Constructing SLR parse tables
• The first step is to extend the grammar with a new starting production.
• Doing this for the grammar above yields the grammar below:

  T' → T
  T → R
  T → aTc
  R →
  R → bR
Constructing SLR parse tables
• The next step is to make an NFA for each production. This is done by
treating both terminals and non-terminals as alphabet symbols.
• The accepting state of each NFA is labeled with the number of the
corresponding production.
• NFAs for the productions of the grammar above (states named A–L):
  T' → T :   A –T→ B
  T → R  :   C –R→ D
  T → aTc:   E –a→ F –T→ G –c→ H
  R →    :   I   (accepting immediately, since the right side is empty)
  R → bR :   J –b→ K –R→ L
Constructing SLR parse tables
• The NFAs in the figure above make transitions both on terminals and non-terminals.
• Transitions on terminals correspond to shift actions and transitions on
non-terminals correspond to go actions.
• A go action happens after a reduction, whereby some elements of the
stack (corresponding to the right-hand side of a production) are replaced
by a nonterminal (corresponding to the left-hand side of that
production).
Constructing SLR parse tables
• To achieve this we must, whenever a transition by a nonterminal is
possible, also allow transitions on the symbols on the right-hand side of
a production for that nonterminal so these eventually can be reduced to
the nonterminal.
• We do this by adding epsilon-transitions to the NFAs.
• Whenever there is a transition from state s to state t on a nonterminal N,
we add epsilon-transitions from s to the initial states of all the NFAs for
productions with N on the left-hand side.
• For the grammar above this adds the epsilon-transitions: from A to C and E,
from C to I and J, from F to C and E, and from K to I and J.
Constructing SLR parse tables
• Together with epsilon-transitions, the NFAs form a single, combined NFA.
• This NFA has the starting state A (the starting state of the NFA for the added start
production) and an accepting state for each production in the grammar.
Conversion of NFA to DFA
• We must now convert this NFA into a DFA using the
subset construction.
ε-closure(A) = {A, C, E, I, J}
ε-closure(B) = {B}
ε-closure(C) = {C, I, J}
ε-closure(D) = {D}
ε-closure(E) = {E}
ε-closure(F) = {C, E, F, I, J}
ε-closure(G) = {G}
ε-closure(H) = {H}
ε-closure(I) = {I}
ε-closure(J) = {J}
ε-closure(K) = {K, I, J}
ε-closure(L) = {L}
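The closures above can be computed with a standard fixed-point search. Below is a small Python sketch; the EPS map encodes the epsilon-transitions described on the previous slide (from A to C and E, from C to I and J, from F to C and E, and from K to I and J) and is included here only so that the sketch is self-contained.

EPS = {                       # state -> targets of its epsilon-transitions
    "A": {"C", "E"},
    "C": {"I", "J"},
    "F": {"C", "E"},
    "K": {"I", "J"},
}

def eps_closure(states):
    """All NFA states reachable from `states` using only epsilon-transitions."""
    closure, work = set(states), list(states)
    while work:
        s = work.pop()
        for t in EPS.get(s, ()):
            if t not in closure:
                closure.add(t)
                work.append(t)
    return closure

print(sorted(eps_closure({"A"})))   # ['A', 'C', 'E', 'I', 'J']
print(sorted(eps_closure({"F"})))   # ['C', 'E', 'F', 'I', 'J']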
NFA to DFA conversion
DFA state   NFA states         a                 b           c     T     R
0           {A, C, E, I, J}    {C, E, F, I, J}   {K, I, J}   ∅     {B}   {D}
1           {B}                ∅                 ∅           ∅     ∅     ∅
2           {D}                ∅                 ∅           ∅     ∅     ∅
3           {C, E, F, I, J}    {C, E, F, I, J}   {K, I, J}   ∅     {G}   {D}
4           {K, I, J}          ∅                 {K, I, J}   ∅     ∅     {L}
5           {G}                ∅                 ∅           {H}   ∅     ∅
6           {L}                ∅                 ∅           ∅     ∅     ∅
7           {H}                ∅                 ∅           ∅     ∅     ∅

(∅ = no transition; each numbered DFA state is a set of NFA states.)
DFA
The resulting DFA has states 0–7 with the transitions given in the table above.
Table representation
• Instead of showing the resulting DFA graphically, we construct a table
where transitions on terminals are shown as shift actions and transitions
on non terminals as go actions.
• The table below shows the DFA constructed from the NFA made by
adding epsilon-transitions
Constructing SLR parse tables
SLR DFA for the grammar above
Operator-Precedence Parser
• Operator grammar
– small, but an important class of grammars
– we may have an efficient operator precedence parser
(a shift-reduce parser) for an operator grammar.
• In an operator grammar, no production rule can have:
– ε (an empty string) at the right side
– two adjacent non-terminals at the right side.
• Ex:
  E → AB,  A → a,  B → b                  not an operator grammar
  E → EOE, E → id, O → + | * | /          not an operator grammar
  E → E+E | E*E | E/E | id                operator grammar
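The operator-grammar property is easy to check mechanically. Below is a small Python sketch (the grammar encoding is just an assumed representation for this illustration): it rejects a grammar if any right side is empty or contains two adjacent non-terminals, matching the examples above.

def is_operator_grammar(productions, nonterminals):
    """productions: dict mapping each left side to a list of right sides (symbol lists)."""
    for alternatives in productions.values():
        for rhs in alternatives:
            if not rhs:                               # empty right side
                return False
            for x, y in zip(rhs, rhs[1:]):            # any two adjacent symbols
                if x in nonterminals and y in nonterminals:
                    return False
    return True

NT = {"E", "A", "B", "O"}
print(is_operator_grammar({"E": [["A", "B"]], "A": [["a"]], "B": [["b"]]}, NT))       # False
print(is_operator_grammar({"E": [["E", "O", "E"], ["id"]],
                           "O": [["+"], ["*"], ["/"]]}, NT))                          # False
print(is_operator_grammar({"E": [["E", "+", "E"], ["E", "*", "E"],
                                 ["E", "/", "E"], ["id"]]}, NT))                      # True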
Operator – Precedence Parser (OPP)
• Bottom-up parsers for a class of CFG can be easily developed using
operator grammars.
• Operator grammars have the property that no production right side is empty or has
two adjacent non-terminals.
• These parsers rely on the following three precedence relations:

  Relation    Meaning
  a < b       a yields precedence to b
  a = b       a has the same precedence as b
  a > b       a takes precedence over b
OPP
• There are two common ways of determining which precedence relation holds
between a pair of terminals:
1. Based on the associativity and precedence of the operators.
2. Constructing the operator-precedence relations from the grammar itself.
• For example, * has higher precedence than +, so we make + < * and * > +.
Example: parsing id+id*id with an operator stack (Opr) and an operand stack (Val):

Opr stack   Val stack      Input String   Action
$           $              id+id*id$      shift ($ < id)
$           $ id           +id*id$        shift ($ < +)
$ +         $ id           id*id$         shift (+ < id)
$ +         $ id id        *id$           shift (+ < *)
$ + *       $ id id        id$            shift (* < id)
$ + *       $ id id id     $              reduce (* > $): id*id → id2
$ +         $ id id2       $              reduce (+ > $): id+id2 → id3
$           $ id3          $              accept
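A minimal Python sketch in the style of the Opr/Val trace above, handling only id, + and *. The precedence table and function names are made up for this illustration; a full operator-precedence parser would also handle the = relation, parentheses, and error entries.

PREC = {   # (top of Opr stack, next input terminal) -> '<' (shift) or '>' (reduce)
    ("$", "+"): "<", ("$", "*"): "<", ("$", "id"): "<",
    ("+", "+"): ">", ("+", "*"): "<", ("+", "id"): "<", ("+", "$"): ">",
    ("*", "+"): ">", ("*", "*"): ">", ("*", "id"): "<", ("*", "$"): ">",
}

def op_prec_parse(tokens):
    opr, val = ["$"], []                  # operator stack and operand stack
    inp = tokens + ["$"]
    while True:
        top, nxt = opr[-1], inp[0]
        if top == "$" and nxt == "$":
            return val.pop()              # accept: a single operand is left
        if nxt != "$" and PREC[(top, nxt)] == "<":
            tok = inp.pop(0)              # yields precedence: shift
            (val if tok == "id" else opr).append(tok)
        else:                             # takes precedence: reduce the top operator
            op = opr.pop()
            right, left = val.pop(), val.pop()
            val.append(f"({left}{op}{right})")

print(op_prec_parse(["id", "+", "id", "*", "id"]))   # (id+(id*id))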
Disadvantages of Operator Precedence Parsing
• Disadvantages:
– It is difficult to handle operators (like unary minus) whose precedence depends on
context (the lexical analyzer should handle the unary minus).
– It handles only a small class of grammars.
– It is difficult to decide which language is recognized by the grammar.
• Advantages:
– Simple and Easy to implement
– powerful enough for expressions in programming languages
– Can be constructed by hand after understanding the grammar.
– Simple to debug