CS314 – Section 5
Recitation 4
Long Zhao
CFGs, Parse Trees, LL(1) Grammars
Slides available at http://www.ilab.rutgers.edu/~lz311/CS314

Context-Free Grammars
• Context-free grammars consist of:
  - A set of symbols:
    - terminals, which denote token types
    - non-terminals, which denote sets of strings
  - A start symbol
  - Rules of the form: symbol ::= symbol symbol ... symbol
    - left-hand side: a non-terminal
    - right-hand side: terminals and/or non-terminals
    - rules explain how to rewrite non-terminals (beginning with the start symbol) into terminals

Context-Free Grammars
A string is in the language of the CFG if and only if it is possible to derive that string using the following non-deterministic procedure:
1. Begin with the start symbol.
2. While any non-terminals exist, pick a non-terminal and rewrite it using a rule. (There could be many choices here.)
3. Stop when all you have left are terminals, and check that you arrived at the string you were hoping to.

Parsing is the process of checking that a string is in the language of the CFG for your programming language. It is usually coupled with creating an abstract syntax tree.

Example grammar
non-terminals: S E Elist
terminals: ID NUM PRINT + := ( ) ; ,
start symbol: S
rules:
1. S ::= S ; S
2. S ::= ID := E
3. S ::= PRINT ( Elist )
4. E ::= ID
5. E ::= NUM
6. E ::= E + E
7. E ::= ( S , Elist )
8. Elist ::= E
9. Elist ::= Elist , E

Derive me!
Target string: ID := NUM ; PRINT ( NUM )
A first attempt starts with rule 2:
S
⇒ ID := E
Oops, we can't make progress: no derivation from E produces NUM ; PRINT ( NUM ).

Starting over with rule 1 instead:
S
⇒ S ; S                          (rule 1)
⇒ ID := E ; S                    (rule 2)
⇒ ID := NUM ; S                  (rule 5)
⇒ ID := NUM ; PRINT ( Elist )    (rule 3)
⇒ ID := NUM ; PRINT ( E )        (rule 8)
⇒ ID := NUM ; PRINT ( NUM )      (rule 5)
This is a left-most derivation: every step rewrites the left-most non-terminal.
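The left-most derivation above can be replayed mechanically. The sketch below is a minimal Python illustration, not part of the original slides; the names RULES, leftmost_step and derive are hypothetical. It stores rules 1 to 9 by number and, at each step, rewrites the left-most non-terminal with the requested rule, refusing if that rule's left-hand side does not match.

```python
# Rules 1-9 of the example grammar; a sentential form is a list of symbols.
NONTERMINALS = {"S", "E", "Elist"}

RULES = {
    1: ("S", ["S", ";", "S"]),
    2: ("S", ["ID", ":=", "E"]),
    3: ("S", ["PRINT", "(", "Elist", ")"]),
    4: ("E", ["ID"]),
    5: ("E", ["NUM"]),
    6: ("E", ["E", "+", "E"]),
    7: ("E", ["(", "S", ",", "Elist", ")"]),
    8: ("Elist", ["E"]),
    9: ("Elist", ["Elist", ",", "E"]),
}


def leftmost_step(form, rule_no):
    """Rewrite the left-most non-terminal in `form` using rule `rule_no`."""
    lhs, rhs = RULES[rule_no]
    for i, sym in enumerate(form):
        if sym in NONTERMINALS:
            if sym != lhs:
                raise ValueError(
                    f"rule {rule_no} rewrites {lhs}, but the left-most non-terminal is {sym}")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no non-terminal left to rewrite")


def derive(rule_sequence, start="S"):
    """Apply the rules in order, always left-most; return every sentential form."""
    forms = [[start]]
    for rule_no in rule_sequence:
        forms.append(leftmost_step(forms[-1], rule_no))
    return forms


if __name__ == "__main__":
    for form in derive([1, 2, 5, 3, 8, 5]):
        print(" ".join(form))
```

Running the module prints the seven sentential forms of the left-most derivation, ending in ID := NUM ; PRINT ( NUM ).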
Another way to derive the same string:
S
⇒ S ; S                          (rule 1)
⇒ S ; PRINT ( Elist )            (rule 3)
⇒ S ; PRINT ( E )                (rule 8)
⇒ S ; PRINT ( NUM )              (rule 5)
⇒ ID := E ; PRINT ( NUM )        (rule 2)
⇒ ID := NUM ; PRINT ( NUM )      (rule 5)
This is a right-most derivation: every step rewrites the right-most non-terminal.

Parse Trees
• Representing derivations as trees
• Useful in compilers: parse trees correspond quite closely (but not exactly) to the abstract syntax trees we're trying to generate
  - difference: abstract syntax vs. concrete (parse) syntax
• Each internal node is labeled with a non-terminal
• Each leaf node is labeled with a terminal
• Each use of a rule in a derivation explains how to generate the children of a node in the parse tree from its parent

Parse Trees
• Example: the parse tree for ID := NUM ; PRINT ( NUM )
S
├── S
│   ├── ID
│   ├── :=
│   └── E
│       └── NUM
├── ;
└── S
    ├── PRINT
    ├── (
    ├── Elist
    │   └── E
    │       └── NUM
    └── )
• Two derivations, but one tree: the left-most and the right-most derivations above both produce this same parse tree.

Parse Trees
• Parse trees have meaning.
  - The order of children and the nesting of subtrees are significant.
  [Figure: two parse trees, each with root S ::= S ; S, built from the subtrees ID := E (E ⇒ NUM) and PRINT ( Elist ) (Elist ⇒ E ⇒ NUM) arranged differently; different arrangements describe different programs.]

Ambiguous Grammars
• A grammar is ambiguous if the same sequence of tokens can give rise to two or more parse trees.

Ambiguous Grammars
characters: 4 + 5 * 6
tokens: NUM(4) PLUS NUM(5) MULT NUM(6)
non-terminals: E
terminals: ID NUM PLUS MULT
E ::= ID | NUM | E + E | E * E      (+ is the token PLUS, * is the token MULT)

Parse tree 1 (+ at the root):
E
├── E
│   └── NUM(4)
├── +
└── E
    ├── E
    │   └── NUM(5)
    ├── *
    └── E
        └── NUM(6)

Parse tree 2 (* at the root):
E
├── E
│   ├── E
│   │   └── NUM(4)
│   ├── +
│   └── E
│       └── NUM(5)
├── *
└── E
    └── NUM(6)

Ambiguous Grammars
• Problem: compilers use parse trees to interpret the meaning of parsed expressions.
  - Different parse trees have different meanings.
  - E.g., (4 + 5) * 6 is not 4 + (5 * 6): parse tree 2 computes 54, while parse tree 1 computes 34.
• Languages with ambiguous grammars are DISASTROUS: the meaning of programs isn't well-defined, so you can't tell what your program might do!

Parse Tree (Exercise)
• Given the following grammar, show the parse tree and the abstract syntax tree for the expression (1 + (5 * 4)):
<expr>  ::= ( <expr> <op> <expr> ) | <const>
<op>    ::= + | - | * | /
<const> ::= 0 | 1 | 2 | ...

Left-most (LM) derivation of (1 + (5 * 4)):
<expr>
⇒ ( <expr> <op> <expr> )
⇒ ( <const> <op> <expr> )
⇒ ( 1 <op> <expr> )
⇒ ( 1 + <expr> )
⇒ ( 1 + ( <expr> <op> <expr> ) )
⇒ ( 1 + ( <const> <op> <expr> ) )
⇒ ( 1 + ( 5 <op> <expr> ) )
⇒ ( 1 + ( 5 * <expr> ) )
⇒ ( 1 + ( 5 * <const> ) )
⇒ ( 1 + ( 5 * 4 ) )

Parse tree:
<expr>
├── (
├── <expr>
│   └── <const>
│       └── 1
├── <op>
│   └── +
├── <expr>
│   ├── (
│   ├── <expr>
│   │   └── <const>
│   │       └── 5
│   ├── <op>
│   │   └── *
│   ├── <expr>
│   │   └── <const>
│   │       └── 4
│   └── )
└── )

Abstract syntax tree:
+
├── 1
└── *
    ├── 5
    └── 4
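To connect the exercise to code, here is a minimal hand-written recursive-descent parser (the technique described in the LL(1) slides) for the <expr> grammar, sketched in Python. It assumes constants are non-negative decimal integers and returns the abstract syntax tree as nested tuples (op, left, right) with plain ints as leaves. The tokenizer, the tuple encoding, the names Parser, parse and evaluate, and treating / as Python true division are illustrative choices, not part of the slides. One token of lookahead suffices because the two <expr> alternatives start with "(" and with a digit respectively.

```python
import operator
import re

# One token: a run of digits or one of ( ) + - * /, with leading spaces skipped.
TOKEN = re.compile(r"\s*(\d+|[()+\-*/])")


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    """One method per non-terminal; each production becomes one branch."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def expr(self):
        # <expr> ::= ( <expr> <op> <expr> ) | <const>
        if self.peek() == "(":
            self.expect("(")
            left = self.expr()
            op = self.op()
            right = self.expr()
            self.expect(")")
            return (op, left, right)  # parentheses vanish from the AST
        return self.const()

    def op(self):
        # <op> ::= + | - | * | /
        tok = self.peek()
        if tok not in ("+", "-", "*", "/"):
            raise SyntaxError(f"expected operator, got {tok!r}")
        self.pos += 1
        return tok

    def const(self):
        # <const> ::= 0 | 1 | 2 | ...
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected constant, got {tok!r}")
        self.pos += 1
        return int(tok)


def parse(text):
    parser = Parser(tokenize(text))
    ast = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at token {parser.pos}")
    return ast


OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(ast):
    """A leaf is an int; an interior node is (op, left, right)."""
    if isinstance(ast, int):
        return ast
    op, left, right = ast
    return OPS[op](evaluate(left), evaluate(right))


if __name__ == "__main__":
    ast = parse("(1 + (5 * 4))")
    print(ast)            # ('+', 1, ('*', 5, 4))
    print(evaluate(ast))  # 21
```

The tuple ('+', 1, ('*', 5, 4)) has exactly the shape of the abstract syntax tree drawn above, while the parser's call sequence (expr, const, op, expr, ...) traces the parse tree.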
Right-most (RM) derivation of (1 + (5 * 4)); it yields the same parse tree and abstract syntax tree as the left-most derivation:
<expr>
⇒ ( <expr> <op> <expr> )
⇒ ( <expr> <op> ( <expr> <op> <expr> ) )
⇒ ( <expr> <op> ( <expr> <op> <const> ) )
⇒ ( <expr> <op> ( <expr> <op> 4 ) )
⇒ ( <expr> <op> ( <expr> * 4 ) )
⇒ ( <expr> <op> ( <const> * 4 ) )
⇒ ( <expr> <op> ( 5 * 4 ) )
⇒ ( <expr> + ( 5 * 4 ) )
⇒ ( <const> + ( 5 * 4 ) )
⇒ ( 1 + ( 5 * 4 ) )

LL(1) Grammars
• Recursive descent parsing
  - top-down parsing
  - simple, efficient
  - can be coded by hand quickly
  - parses many, but not all, CFGs
  - parses LL(1) grammars: Left-to-right parse, Leftmost derivation, 1 symbol of lookahead
  - key ideas:
    - one recursive function for each non-terminal
    - each production becomes one clause in the function

LL(1) Grammars
• For any two productions A ::= α | β with α, β ∈ (T ∪ N)*, we would like a distinct way of choosing the correct production to expand.
• For α ∈ (T ∪ N)*, define FIRST(α) as the set of tokens that appear as the first token in some string derived from α; FIRST(α) also contains ϵ if α can derive the empty string.
• For a non-terminal A, define FOLLOW(A) as the set of terminals that can appear immediately to the right of A in some sentential form.

LL(1) Grammars
• Define FIRST+(δ) for a rule A ::= δ:
  - FIRST+(δ) = (FIRST(δ) - { ϵ }) ∪ FOLLOW(A), if ϵ ∈ FIRST(δ)
  - FIRST+(δ) = FIRST(δ), otherwise
• A grammar is LL(1) iff, for every non-terminal A, any two distinct rules A ::= α and A ::= β satisfy FIRST+(α) ∩ FIRST+(β) = ∅.

Computing First Sets
• Compute First(X):
  - initialize:
    - if T is a terminal symbol, then First(T) = {T}
    - if T is a non-terminal, then First(T) = { }
  - while First(X) changes (for any X) do
    - for all X and all rules (X ::= ABC...) do
      - First(X) := First(X) ∪ First(ABC...), where First(ABC...) := F1 ∪ F2 ∪ F3 ∪ ... and
        - F1 := First(A) - { ϵ }
        - F2 := First(B) - { ϵ }, if A is Nullable; emptyset otherwise
        - F3 := First(C) - { ϵ }, if A is Nullable and B is Nullable; emptyset otherwise
        - ...
        - ϵ is added to First(ABC...) only if every symbol A, B, C, ... is Nullable
• A symbol is Nullable if it can derive ϵ, i.e. ϵ ∈ First(symbol).
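The fixed-point loop for First sets can be sketched in Python as follows. This is an illustrative sketch, not code from the slides: it assumes a grammar is a dict mapping each non-terminal to a list of alternatives, each alternative a list of symbols, with [] standing for ϵ; any symbol that is not a key is treated as a terminal. The names first_of_seq, compute_first and EXAMPLE_1 are hypothetical. It is run on the S ::= ABCDE grammar from the worked examples.

```python
EPS = "ϵ"  # marker for the empty string, as in the slides


def first_of_seq(seq, first):
    """First(ABC...) = F1 ∪ F2 ∪ ...; contains EPS only if every symbol is Nullable."""
    result = set()
    for sym in seq:
        sym_first = first.get(sym, {sym})  # a terminal T has First(T) = {T}
        result |= sym_first - {EPS}
        if EPS not in sym_first:           # sym is not Nullable: later symbols can't contribute
            return result
    result.add(EPS)                        # seq is empty or entirely Nullable
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}  # non-terminals start with First = { }
    changed = True
    while changed:                         # repeat until no First set changes
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_seq(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


EXAMPLE_1 = {
    "S": [["A", "B", "C", "D", "E"]],
    "A": [["a"], []],
    "B": [["b"], []],
    "C": [["c"]],
    "D": [["d"], []],
    "E": [["e"], []],
}

if __name__ == "__main__":
    for nt, f in compute_first(EXAMPLE_1).items():
        print(nt, sorted(f))
```

Because C is not Nullable, the scan for S stops at C, which is why ϵ does not appear in First(S).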
Computing Follow Sets
• Follow(X) is computed iteratively
• Base case:
  - initially, we assume nothing in particular follows X
  - (when computing, Follow(X) is initially { }, except that Follow(S) = { EOF } for the start symbol S)
• Inductive case:
  - if (Y ::= s1 X s2) for any strings s1, s2, then
    - Follow(X) := Follow(X) ∪ (First(s2) - { ϵ })
  - if (Y ::= s1 X s2) for any strings s1, s2, and s2 is Nullable (including when s2 is empty), then
    - Follow(X) := Follow(X) ∪ Follow(Y)
• Repeat until no Follow set changes.

Computing First & Follow Sets (Example 1)
S ::= ABCDE
A ::= a | ϵ
B ::= b | ϵ
C ::= c
D ::= d | ϵ
E ::= e | ϵ

     FIRST          FOLLOW
S    { a, b, c }    { EOF }
A    { a, ϵ }       { b, c }
B    { b, ϵ }       { c }
C    { c }          { d, e, EOF }
D    { d, ϵ }       { e, EOF }
E    { e, ϵ }       { EOF }

Computing First & Follow Sets (Example 2)
S ::= ACB | CbB | Ba
A ::= da | BC
B ::= g | ϵ
C ::= h | ϵ

     FIRST                    FOLLOW
S    { d, g, h, ϵ, b, a }     { EOF }
A    { d, g, h, ϵ }           { h, g, EOF }
B    { g, ϵ }                 { EOF, a, h, g }
C    { h, ϵ }                 { g, EOF, b, h }

Computing First & Follow Sets (Exercises)
Given the following rules, compute the First and Follow sets of all non-terminal symbols:
1. S ::= Bb | Cd,  B ::= aB | ϵ,  C ::= cC | ϵ
2. S ::= aBDh,  B ::= cC,  C ::= bC | ϵ,  D ::= EF,  E ::= g | ϵ,  F ::= f | ϵ
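The Follow-set rules can be sketched in the same style. The Python below is a minimal, self-contained illustration with hypothetical names: a grammar is a dict from non-terminal to a list of alternatives (lists of symbols, [] for ϵ), and the string "EOF" is assumed as the end marker. It recomputes First sets, then applies the base and inductive cases until nothing changes, and checks Example 2 (S ::= ACB | CbB | Ba).

```python
EPS, EOF = "ϵ", "EOF"


def first_of_seq(seq, first):
    result = set()
    for sym in seq:
        sym_first = first.get(sym, {sym})      # terminals: First(T) = {T}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)                            # empty or all-Nullable sequence
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_seq(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)                     # base case: EOF follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):  # each occurrence Y ::= s1 X s2
                    if sym not in grammar:
                        continue               # only non-terminals get Follow sets
                    s2_first = first_of_seq(rhs[i + 1:], first)
                    new = s2_first - {EPS}     # First(s2) - {ϵ}
                    if EPS in s2_first:        # s2 Nullable (or empty): add Follow(Y)
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


EXAMPLE_2 = {
    "S": [["A", "C", "B"], ["C", "b", "B"], ["B", "a"]],
    "A": [["d", "a"], ["B", "C"]],
    "B": [["g"], []],
    "C": [["h"], []],
}

if __name__ == "__main__":
    first = compute_first(EXAMPLE_2)
    follow = compute_follow(EXAMPLE_2, "S", first)
    for nt in EXAMPLE_2:
        print(nt, sorted(first[nt]), sorted(follow[nt]))
```

Note how A ::= BC passes Follow(A) on to both B and C because C is Nullable; that is where h and g enter Follow(B).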
Computing First & Follow Sets (Exercise 1 Solution)
S ::= Bb | Cd
B ::= aB | ϵ
C ::= cC | ϵ

     FIRST             FOLLOW
S    { a, b, c, d }    { EOF }
B    { a, ϵ }          { b }
C    { c, ϵ }          { d }

Computing First & Follow Sets (Exercise 2 Solution)
S ::= aBDh
B ::= cC
C ::= bC | ϵ
D ::= EF
E ::= g | ϵ
F ::= f | ϵ

     FIRST          FOLLOW
S    { a }          { EOF }
B    { c }          { g, f, h }
C    { b, ϵ }       { g, f, h }
D    { g, f, ϵ }    { h }
E    { g, ϵ }       { f, h }
F    { f, ϵ }       { h }
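As a follow-up, the FIRST+ test from the LL(1) slides can be applied to both exercise grammars. This self-contained Python sketch is illustrative only: the names first_plus and ll1_conflicts are hypothetical, each grammar is assumed to be a dict from non-terminal to a list of alternatives (lists of symbols, [] for ϵ), and "EOF" is assumed as the end marker. It reports every pair of rules for the same non-terminal whose FIRST+ sets overlap; Example 2 is included for contrast.

```python
from itertools import combinations

EPS, EOF = "ϵ", "EOF"


def first_of_seq(seq, first):
    result = set()
    for sym in seq:
        sym_first = first.get(sym, {sym})  # terminals: First(T) = {T}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_seq(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue
                    s2_first = first_of_seq(rhs[i + 1:], first)
                    new = s2_first - {EPS}
                    if EPS in s2_first:
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def first_plus(lhs, rhs, first, follow):
    """FIRST+(δ) for the rule lhs ::= δ."""
    f = first_of_seq(rhs, first)
    return (f - {EPS}) | follow[lhs] if EPS in f else f


def ll1_conflicts(grammar, start):
    """List (A, α, β, overlap) for every pair of A-rules whose FIRST+ sets intersect."""
    first = compute_first(grammar)
    follow = compute_follow(grammar, start, first)
    conflicts = []
    for lhs, alternatives in grammar.items():
        for alpha, beta in combinations(alternatives, 2):
            overlap = first_plus(lhs, alpha, first, follow) & first_plus(lhs, beta, first, follow)
            if overlap:
                conflicts.append((lhs, alpha, beta, overlap))
    return conflicts


EXERCISE_1 = {"S": [["B", "b"], ["C", "d"]], "B": [["a", "B"], []], "C": [["c", "C"], []]}
EXERCISE_2 = {
    "S": [["a", "B", "D", "h"]], "B": [["c", "C"]], "C": [["b", "C"], []],
    "D": [["E", "F"]], "E": [["g"], []], "F": [["f"], []],
}
EXAMPLE_2 = {
    "S": [["A", "C", "B"], ["C", "b", "B"], ["B", "a"]],
    "A": [["d", "a"], ["B", "C"]], "B": [["g"], []], "C": [["h"], []],
}

if __name__ == "__main__":
    for name, g in [("exercise 1", EXERCISE_1), ("exercise 2", EXERCISE_2), ("example 2", EXAMPLE_2)]:
        found = ll1_conflicts(g, "S")
        print(name, "is LL(1)" if not found else f"is not LL(1): {found}")
```

Both exercise grammars come out LL(1), whereas Example 2 does not: for instance, B ::= g and B ::= ϵ both admit g as the next token, since g ∈ Follow(B).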