
CS314 – Section 5
Recitation 4
Long Zhao ([email protected])
CFGs, Parse Tree
LL(1) Grammars
Slides available at http://www.ilab.rutgers.edu/~lz311/CS314
Context-Free Grammars
• Context-free grammars consist of:
  • Set of symbols:
    • terminals, which denote token types
    • non-terminals, which denote sets of strings
  • Start symbol
  • Rules:
    symbol ::= symbol symbol ... symbol
    • left-hand side: non-terminal
    • right-hand side: terminals and/or non-terminals
    • rules explain how to rewrite non-terminals (beginning with the start symbol) into terminals
Context-Free Grammars
A string is in the language of the CFG if and only if it is possible to derive that string using the following non-deterministic procedure:
1. begin with the start symbol
2. while any non-terminals exist, pick a non-terminal and rewrite it using a rule <== could be many choices here
3. stop when all you have left are terminals (and check you arrived at the string you were hoping to)
Parsing is the process of checking that a string is in the language of the CFG for your programming language. It is usually coupled with creating an abstract syntax tree.
non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;, and the comma ","
rules:
S ::= S ; S
S ::= ID := E
S ::= PRINT ( Elist )
E ::= ID
E ::= NUM
E ::= E + E
E ::= ( S , Elist )
Elist ::= E
Elist ::= Elist , E
non-terminals: S, E, Elist
terminals: ID, NUM, PRINT, +, :=, (, ), ;, and the comma ","
rules:
1. S ::= S ; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S , Elist )
8. Elist ::= E
9. Elist ::= Elist , E
Derive me!
ID := NUM ; PRINT ( NUM )
Derive me! (first attempt)
S
ID := E                       (rule 2)
oops, can’t make progress: E cannot be rewritten into NUM ; PRINT ( NUM )
Derive me! (second attempt)
S
S ; S                         (rule 1)
ID := E ; S                   (rule 2)
ID := NUM ; S                 (rule 5)
ID := NUM ; PRINT ( Elist )   (rule 3)
ID := NUM ; PRINT ( E )       (rule 8)
ID := NUM ; PRINT ( NUM )     (rule 5)
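The trial-and-error in the derivation above can be automated as a search over sentential forms. The following is a minimal sketch, not a practical parser: the names `GRAMMAR` and `derivable` are made up for illustration, and the length-based pruning assumes that no rule has an empty right-hand side (true for the statement grammar in these slides).

```python
from collections import deque

# Statement grammar from the slides; the dictionary keys are the non-terminals.
GRAMMAR = {
    "S": [["S", ";", "S"], ["ID", ":=", "E"], ["PRINT", "(", "Elist", ")"]],
    "E": [["ID"], ["NUM"], ["E", "+", "E"], ["(", "S", ",", "Elist", ")"]],
    "Elist": [["E"], ["Elist", ",", "E"]],
}

def derivable(target, start="S", grammar=GRAMMAR):
    """Return True if the token list `target` can be derived from `start`.

    Assumption: no rule has an empty right-hand side, so sentential forms
    never shrink and any form longer than the target can be discarded.
    """
    target = tuple(target)
    seen = {(start,)}
    queue = deque(seen)
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        # Rewrite only the leftmost non-terminal: every derivable string
        # has a leftmost derivation, so no string is missed.
        for i, sym in enumerate(form):
            if sym not in grammar:
                continue
            # Everything before position i is terminals and must already
            # match the target, otherwise this form is a dead end.
            if form[:i] == target[:i]:
                for rhs in grammar[sym]:  # "could be many choices here"
                    new = form[:i] + tuple(rhs) + form[i + 1:]
                    if len(new) <= len(target) and new not in seen:
                        seen.add(new)
                        queue.append(new)
            break
    return False

print(derivable("ID := NUM ; PRINT ( NUM )".split()))
```

The search tries every rule choice instead of guessing, so the dead end reached by starting with rule 2 is simply one branch that never reaches the target.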
left-most derivation:
S
S ; S
ID := E ; S
ID := NUM ; S
ID := NUM ; PRINT ( Elist )
ID := NUM ; PRINT ( E )
ID := NUM ; PRINT ( NUM )
right-most derivation (another way to derive the same string):
S
S ; S
S ; PRINT ( Elist )
S ; PRINT ( E )
S ; PRINT ( NUM )
ID := E ; PRINT ( NUM )
ID := NUM ; PRINT ( NUM )
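The two derivations differ only in which non-terminal is rewritten at each step. As a rough sketch, the replay below applies a list of rule numbers (numbered 1 to 9 as in the grammar slides) to either the leftmost or the rightmost non-terminal; `RULES` and `derive` are illustrative names, not part of the course material.

```python
# Rules numbered as on the grammar slide: number -> (left-hand side, right-hand side).
RULES = {
    1: ("S", ["S", ";", "S"]),
    2: ("S", ["ID", ":=", "E"]),
    3: ("S", ["PRINT", "(", "Elist", ")"]),
    4: ("E", ["ID"]),
    5: ("E", ["NUM"]),
    6: ("E", ["E", "+", "E"]),
    7: ("E", ["(", "S", ",", "Elist", ")"]),
    8: ("Elist", ["E"]),
    9: ("Elist", ["Elist", ",", "E"]),
}
NONTERMINALS = {"S", "E", "Elist"}

def derive(rule_numbers, leftmost=True, start="S"):
    """Apply the rules in order, always to the leftmost (or rightmost)
    non-terminal, and return every sentential form along the way."""
    form = [start]
    steps = [" ".join(form)]
    for n in rule_numbers:
        lhs, rhs = RULES[n]
        positions = [i for i, s in enumerate(form) if s in NONTERMINALS]
        if not positions:
            raise ValueError("no non-terminal left to rewrite")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"rule {n} rewrites {lhs}, but found {form[i]}")
        form[i:i + 1] = rhs  # replace the chosen non-terminal in place
        steps.append(" ".join(form))
    return steps

left = derive([1, 2, 5, 3, 8, 5], leftmost=True)
right = derive([1, 3, 8, 5, 2, 5], leftmost=False)
print("\n".join(left))
print("\n".join(right))
```

Both calls use the same six rules, only in a different order, and end in the same token string.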
Parse Trees
• Representing derivations as trees
  • useful in compilers: parse trees correspond quite closely (but not exactly) to the abstract syntax trees we’re trying to generate
    • difference: abstract syntax vs concrete (parse) syntax
• each internal node is labeled with a non-terminal
• each leaf node is labeled with a terminal
• each use of a rule in a derivation explains how to generate children in the parse tree from the parent
Parse Trees
• Example: 2 derivations, but 1 tree
left-most derivation:
S
S ; S
ID := E ; S
ID := NUM ; S
ID := NUM ; PRINT ( Elist )
ID := NUM ; PRINT ( E )
ID := NUM ; PRINT ( NUM )
right-most derivation:
S
S ; S
S ; PRINT ( Elist )
S ; PRINT ( E )
S ; PRINT ( NUM )
ID := E ; PRINT ( NUM )
ID := NUM ; PRINT ( NUM )
parse tree (children are indented under their parent):
S
  S
    ID
    :=
    E
      NUM
  ;
  S
    PRINT
    (
    Elist
      E
        NUM
    )
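One possible way to hold such a parse tree in code is as nested (label, children) pairs. The sketch below is only an illustration under that assumed representation; `node`, `frontier` and `internal_labels` are hypothetical helper names. Reading the leaves from left to right gives back the derived token string.

```python
# A parse-tree node is (label, children); a leaf has an empty children list.
def node(label, *children):
    return (label, list(children))

# Parse tree for: ID := NUM ; PRINT ( NUM )
TREE = node(
    "S",
    node("S", node("ID"), node(":="), node("E", node("NUM"))),
    node(";"),
    node(
        "S",
        node("PRINT"),
        node("("),
        node("Elist", node("E", node("NUM"))),
        node(")"),
    ),
)

def frontier(tree):
    """Leaves read left to right: the terminals of the derived string."""
    label, children = tree
    if not children:
        return [label]
    return [tok for child in children for tok in frontier(child)]

def internal_labels(tree):
    """Labels of internal nodes in pre-order: the non-terminals used."""
    label, children = tree
    if not children:
        return []
    return [label] + [l for child in children for l in internal_labels(child)]

print(" ".join(frontier(TREE)))
print(internal_labels(TREE))
```

The tree records which rule was applied to which non-terminal, but not the order in which the rewrites happened, which is why the left-most and right-most derivations share it.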
Parse Trees
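• parse trees have meaning.
  • order of children, nesting of subtrees is significant
tree for ID := NUM ; PRINT ( NUM ):
S
  S
    ID
    :=
    E
      NUM
  ;
  S
    PRINT
    (
    Elist
      E
        NUM
    )
tree for PRINT ( NUM ) ; ID := NUM (same subtrees, opposite order):
S
  S
    PRINT
    (
    Elist
      E
        NUM
    )
  ;
  S
    ID
    :=
    E
      NUM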
Ambiguous Grammars
• a grammar is ambiguous if the same sequence of
tokens can give rise to two or more parse trees
Ambiguous Grammars
characters: 4 + 5 * 6
tokens: NUM(4) PLUS NUM(5) MULT NUM(6)
non-terminals: E
terminals: ID, NUM, PLUS, MULT
E ::= ID
    | NUM
    | E + E
    | E * E
two parse trees for the same tokens:
tree 1, grouping 4 + (5 * 6):
E
  E
    NUM(4)
  +
  E
    E
      NUM(5)
    *
    E
      NUM(6)
tree 2, grouping (4 + 5) * 6:
E
  E
    E
      NUM(4)
    +
    E
      NUM(5)
  *
  E
    NUM(6)
Ambiguous Grammars
• problem: compilers use parse trees to interpret the meaning of parsed expressions
  • different parse trees have different meanings
  • e.g., (4 + 5) * 6 is not 4 + (5 * 6)
  • languages with ambiguous grammars are DISASTROUS; the meaning of programs isn’t well-defined! You can’t tell what your program might do!
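To make the difference concrete, the two parse trees for 4 + 5 * 6 can be evaluated directly. This is a small sketch that assumes a tuple encoding (operator, left, right) with plain integers at the leaves; the names `evaluate`, `plus_at_root` and `times_at_root` are illustrative.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree):
    """Evaluate a tree that is either an int or (operator, left, right)."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))

plus_at_root = ("+", 4, ("*", 5, 6))   # groups as 4 + (5 * 6)
times_at_root = ("*", ("+", 4, 5), 6)  # groups as (4 + 5) * 6

print(evaluate(plus_at_root))   # 34
print(evaluate(times_at_root))  # 54
```

Same tokens, two trees, two answers: this is why a compiler needs an unambiguous grammar (or extra precedence rules) before it can assign a meaning.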
Parse Tree (Exercise)
• Given the following grammar, show the
parse tree and the abstract syntax tree for
the expression (1 + (5 * 4)):
<expr> ::= ( <expr> <op> <expr> ) | <const>
<op> ::= + | - | * | /
<const> ::= 0 | 1 | 2 | ...
<expr> ::= ( <expr> <op> <expr> ) | <const>
<op> ::= + | - | * | /
<const> ::= 0 | 1 | 2 | ...
leftmost (LM) derivation of (1 + (5 * 4)):
<expr>
( <expr> <op> <expr> )
( <const> <op> <expr> )
( 1 <op> <expr> )
( 1 + <expr> )
( 1 + ( <expr> <op> <expr> ) )
( 1 + ( <const> <op> <expr> ) )
( 1 + ( 5 <op> <expr> ) )
( 1 + ( 5 * <expr> ) )
( 1 + ( 5 * <const> ) )
( 1 + ( 5 * 4 ) )
rightmost (RM) derivation of (1 + (5 * 4)):
<expr>
( <expr> <op> <expr> )
( <expr> <op> ( <expr> <op> <expr> ) )
( <expr> <op> ( <expr> <op> <const> ) )
( <expr> <op> ( <expr> <op> 4 ) )
( <expr> <op> ( <expr> * 4 ) )
( <expr> <op> ( <const> * 4 ) )
( <expr> <op> ( 5 * 4 ) )
( <expr> + ( 5 * 4 ) )
( <const> + ( 5 * 4 ) )
( 1 + ( 5 * 4 ) )
parse tree (the same for both derivations):
<expr>
  (
  <expr>
    <const>
      1
  <op>
    +
  <expr>
    (
    <expr>
      <const>
        5
    <op>
      *
    <expr>
      <const>
        4
    )
  )
abstract syntax tree:
+
  1
  *
    5
    4
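The step from the parse tree to the abstract syntax tree in this exercise can be sketched as a small tree walk. The encoding below is an assumption made for illustration: parse-tree nodes are (label, children) pairs and the AST is a nested tuple (operator, left, right) with integers at the leaves. The helpers `node`, `const`, `binary` and `to_ast` are hypothetical names.

```python
def node(label, *children):
    return (label, list(children))

def const(n):
    # <expr> ::= <const>, with the digit as the only leaf
    return node("<expr>", node("<const>", node(str(n))))

def binary(left, op, right):
    # <expr> ::= ( <expr> <op> <expr> )
    return node("<expr>", node("("), left, node("<op>", node(op)), right, node(")"))

# Parse tree for (1 + (5 * 4))
PARSE_TREE = binary(const(1), "+", binary(const(5), "*", const(4)))

def to_ast(tree):
    """Drop parentheses and the <expr>/<op>/<const> wrappers."""
    label, children = tree
    if label == "<const>":
        return int(children[0][0])
    if label == "<expr>":
        if len(children) == 1:          # <expr> ::= <const>
            return to_ast(children[0])
        _, left, op, right, _ = children  # ( <expr> <op> <expr> )
        operator_symbol = op[1][0][0]     # the single leaf under <op>
        return (operator_symbol, to_ast(left), to_ast(right))
    raise ValueError(f"unexpected node {label!r}")

print(to_ast(PARSE_TREE))
```

The parse tree keeps every token, including the parentheses; the AST keeps only the operator structure, which is all later compiler phases need.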
LL(1) Grammars
• Recursive Descent Parsing
  • top-down parsing
  • simple, efficient
  • can be coded by hand quickly
  • parses many, but not all, CFGs
    • parses LL(1) grammars
    • Left-to-right parse; Leftmost derivation; 1 symbol of lookahead
  • key ideas:
    • one recursive function for each non-terminal
    • each production becomes one clause in the function
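As an illustration of these key ideas, here is a minimal recursive descent parser for the fully parenthesized <expr> grammar from the parse tree exercise. It is a sketch under stated assumptions, not the course's reference implementation: the `Parser` class, `tokenize` and `parse` are hypothetical names, constants are non-negative integers, and the result is a nested tuple (operator, left, right).

```python
import re

def tokenize(text):
    # Numbers, parentheses and the four operators; any other character
    # is silently skipped in this sketch.
    return re.findall(r"\d+|[()+\-*/]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # The single token of lookahead.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "EOF"

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def expr(self):
        # <expr> ::= ( <expr> <op> <expr> ) | <const>
        if self.peek() == "(":          # clause for the first production
            self.expect("(")
            left = self.expr()
            op = self.op()
            right = self.expr()
            self.expect(")")
            return (op, left, right)
        return self.const()             # clause for the second production

    def op(self):
        # <op> ::= + | - | * | /
        tok = self.peek()
        if tok not in ("+", "-", "*", "/"):
            raise SyntaxError(f"expected an operator, found {tok!r}")
        self.pos += 1
        return tok

    def const(self):
        # <const> ::= 0 | 1 | 2 | ...
        tok = self.peek()
        if not tok.isdigit():
            raise SyntaxError(f"expected a constant, found {tok!r}")
        self.pos += 1
        return int(tok)

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() != "EOF":
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

print(parse("(1 + (5 * 4))"))
```

Each non-terminal has one method, and `expr` picks its clause by looking at one token: "(" selects the parenthesized production, anything else must be a constant.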
LL(1) Grammars
• For any two productions A ::= α | β with α ∈ (T ∪ N)* and β ∈ (T ∪ N)*, we would like a distinct way of choosing the correct production to expand.
• For α ∈ (T ∪ N)*, define FIRST(α) as the set of tokens that appear as the first token in some string derived from α (plus ϵ if α can derive the empty string).
• For a non-terminal A, define FOLLOW(A) as the set of terminals that can appear immediately to the right of A in some sentential form.
LL(1) Grammars
• Define FIRST+(δ) for rule A ::= δ
  • (FIRST(δ) − { ϵ }) ∪ FOLLOW(A), if ϵ ∈ FIRST(δ)
  • FIRST(δ), otherwise
• A grammar is LL(1) iff (A ::= α and A ::= β, with α ≠ β) implies FIRST+(α) ∩ FIRST+(β) = ∅
Computing First Sets
• Compute First(X):
  • initialize:
    • if T is a terminal symbol then First(T) = {T}
    • if T is a non-terminal then First(T) = { }
  • while First(X) changes (for any X) do
    • for all X and all rules (X ::= A B C ...) do
      • First(X) := First(X) ∪ First(A B C ...)
        where First(A B C ...) := F1 ∪ F2 ∪ F3 ∪ ... and
        • F1 := First(A)
        • F2 := First(B), if A is Nullable; emptyset otherwise
        • F3 := First(C), if A is Nullable & B is Nullable; emptyset otherwise
        • ...
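The fixed-point loop above might be coded as follows. This is a hedged sketch with one deliberate difference from the slide: instead of a separate Nullable predicate it records ϵ inside the FIRST sets, which is how the worked tables in these slides are written. An empty list stands for an ϵ right-hand side, and `first_of_seq`, `compute_first` and `G1` are illustrative names.

```python
EPS = "ε"

def first_of_seq(symbols, first):
    """FIRST of a string of symbols, given the current FIRST table."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result       # sym is not nullable: later symbols are hidden
    result.add(EPS)             # every symbol was nullable (or the string is empty)
    return result

def compute_first(grammar):
    """grammar maps each non-terminal to a list of right-hand sides."""
    terminals = {s for rhss in grammar.values() for rhs in rhss for s in rhs}
    terminals -= set(grammar)
    first = {t: {t} for t in terminals}          # First(T) = {T}
    first.update({nt: set() for nt in grammar})  # First(X) = { }
    changed = True
    while changed:                               # repeat until nothing changes
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                new = first_of_seq(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

# S ::= ABCDE, A ::= a|ϵ, B ::= b|ϵ, C ::= c, D ::= d|ϵ, E ::= e|ϵ
G1 = {
    "S": [["A", "B", "C", "D", "E"]],
    "A": [["a"], []],
    "B": [["b"], []],
    "C": [["c"]],
    "D": [["d"], []],
    "E": [["e"], []],
}
first = compute_first(G1)
print(sorted(first["S"]))
```

For `G1`, FIRST(S) stops growing at C, because C is not nullable and hides D and E.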
Computing Follow Sets
• Follow(X) is computed iteratively
  • base case:
    • initially, we assume nothing in particular follows X
    • (when computing, Follow(X) is initially { }, except that EOF follows the start symbol, as in the worked examples)
  • inductive case:
    • if (Y ::= s1 X s2) for any strings s1, s2 then
      • add First(s2) (minus ϵ) to Follow(X)
    • if (Y ::= s1 X s2) for any strings s1, s2 then
      • add Follow(Y) to Follow(X), if s2 is Nullable
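A possible coding of the two inductive cases is sketched below. To keep it short, the FIRST table is not recomputed: it is copied from the worked example for S ::= ACB|CbB|Ba in these slides, and EOF is placed in the FOLLOW set of the start symbol as those tables do. The names `GRAMMAR`, `FIRST`, `first_of_seq` and `compute_follow` are illustrative, and an empty list stands for an ϵ right-hand side.

```python
EPS, EOF = "ε", "EOF"

# S ::= ACB|CbB|Ba, A ::= da|BC, B ::= g|ϵ, C ::= h|ϵ
GRAMMAR = {
    "S": [["A", "C", "B"], ["C", "b", "B"], ["B", "a"]],
    "A": [["d", "a"], ["B", "C"]],
    "B": [["g"], []],
    "C": [["h"], []],
}
# FIRST sets copied from the worked example (assumed already computed).
FIRST = {
    "S": {"d", "g", "h", EPS, "b", "a"},
    "A": {"d", "g", "h", EPS},
    "B": {"g", EPS},
    "C": {"h", EPS},
}

def first_of_seq(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)                        # the whole string is nullable
    return result

def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():       # rule lhs ::= s1 X s2
            for rhs in rhss:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue                # only non-terminals have FOLLOW
                    tail = first_of_seq(rhs[i + 1:])
                    new = tail - {EPS}          # case 1: First(s2)
                    if EPS in tail:
                        new |= follow[lhs]      # case 2: s2 nullable
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

follow = compute_follow(GRAMMAR, "S")
for nt in GRAMMAR:
    print(nt, sorted(follow[nt]))
```

Note that both cases add to Follow(X) rather than overwrite it, which is why the loop must run until no set changes.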
Computing First & Follow Sets
Rule           Non-terminal   FIRST          FOLLOW
S ::= ABCDE    S              { a, b, c }    { EOF }
A ::= a|ϵ      A              { a, ϵ }       { b, c }
B ::= b|ϵ      B              { b, ϵ }       { c }
C ::= c        C              { c }          { d, e, EOF }
D ::= d|ϵ      D              { d, ϵ }       { e, EOF }
E ::= e|ϵ      E              { e, ϵ }       { EOF }
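With FIRST and FOLLOW in hand, the LL(1) condition can be checked mechanically. The sketch below hard-codes the FIRST and FOLLOW sets from the table for S ::= ABCDE and computes FIRST+ for each production; `first_plus` and `is_ll1` are hypothetical helper names, and an empty list stands for an ϵ right-hand side.

```python
EPS, EOF = "ε", "EOF"

GRAMMAR = {
    "S": [["A", "B", "C", "D", "E"]],
    "A": [["a"], []],
    "B": [["b"], []],
    "C": [["c"]],
    "D": [["d"], []],
    "E": [["e"], []],
}
# Sets copied from the table for this grammar.
FIRST = {
    "S": {"a", "b", "c"}, "A": {"a", EPS}, "B": {"b", EPS},
    "C": {"c"}, "D": {"d", EPS}, "E": {"e", EPS},
}
FOLLOW = {
    "S": {EOF}, "A": {"b", "c"}, "B": {"c"},
    "C": {"d", "e", EOF}, "D": {"e", EOF}, "E": {EOF},
}

def first_of_seq(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def first_plus(lhs, rhs):
    """FIRST+ of the rule lhs ::= rhs."""
    f = first_of_seq(rhs)
    if EPS in f:
        return (f - {EPS}) | FOLLOW[lhs]
    return f

def is_ll1(grammar):
    """True if the FIRST+ sets of every pair of alternatives are disjoint."""
    for lhs, rhss in grammar.items():
        sets = [first_plus(lhs, rhs) for rhs in rhss]
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if sets[i] & sets[j]:
                    return False
    return True

print(sorted(first_plus("D", [])))
print(is_ll1(GRAMMAR))
```

For example, the two alternatives of D have FIRST+ sets { d } and { e, EOF }, so one token of lookahead is enough to choose between them.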
Computing First & Follow Sets
Rule                 Non-terminal   FIRST                   FOLLOW
S ::= ACB|CbB|Ba     S              { d, g, h, ϵ, b, a }    { EOF }
A ::= da|BC          A              { d, g, h, ϵ }          { h, g, EOF }
B ::= g|ϵ            B              { g, ϵ }                { EOF, a, h, g }
C ::= h|ϵ            C              { h, ϵ }                { g, EOF, b, h }
Computing First & Follow Sets
Given the following rules, compute the First and Follow sets of all non-terminal symbols:
• S ::= Bb|Cd, B ::= aB|ϵ, C ::= cC|ϵ
• S ::= aBDh, B ::= cC, C ::= bC|ϵ, D ::= EF, E ::= g|ϵ, F ::= f|ϵ
Computing First & Follow Sets
Rule            Non-terminal   FIRST             FOLLOW
S ::= Bb|Cd     S              { a, b, c, d }    { EOF }
B ::= aB|ϵ      B              { a, ϵ }          { b }
C ::= cC|ϵ      C              { c, ϵ }          { d }
Computing First & Follow Sets
Rule            Non-terminal   FIRST          FOLLOW
S ::= aBDh      S              { a }          { EOF }
B ::= cC        B              { c }          { g, f, h }
C ::= bC|ϵ      C              { b, ϵ }       { g, f, h }
D ::= EF        D              { g, f, ϵ }    { h }
E ::= g|ϵ       E              { g, ϵ }       { f, h }
F ::= f|ϵ       F              { f, ϵ }       { h }