Parsing
Discrete Mathematics and Its Applications
Baojian Hua
[email protected]

Derivations
- A string is valid in a language if and only if there exists a derivation from the start symbol that produces it
- Begin with the start symbol, and apply grammar rules until you produce the string
- Note that the final string (sentence) consists of only terminals

Question
- Given a formal grammar G and a sentence (program) p, is p derivable from grammar G?
- Or equivalently, is a given program p valid according to some language’s syntax (say C)?

Example: Context-Free Grammar
    S ::= x A | y B
    A ::= u C | v C
    B ::= t
    C ::= w | z

    // derivable?
    xum
    xuwz
    xwu
    xuz

Lexical Analyzer
- The lexical analyzer translates the source program into a stream of lexical tokens
- Source program: stream of (ASCII or Unicode) characters
- Lexical token: compiler data structure that represents the occurrence of a terminal symbol
- A valid sentence consists of only allowable terminals

Example: Context-Free Grammar
    S ::= x A | y B
    A ::= u C | v C
    B ::= t
    C ::= w | z

    // all terminals
    T = {x, y, u, v, t, w, z}
    // allowable strings
    T*

Predictive Parsing
- Parsing: recognizing a string and doing something useful with it
- The most naïve approach to implementing a parser is recursive descent
- A form of top-down parsing
- Not as powerful as other methods, but easy enough to implement by hand

Predictive Parsing
    S ::= x A | y B
    A ::= u C | v C
    B ::= t
    C ::= w | z

    // Valid?
    xum
    xuwz
    xwu
    xuz

A Predictive Parser in C (Sketch)
    tokenTy token;

    void parseS ()
    {
      switch (token.kind) {
      case x:
        token = nextToken ();
        parseA ();
        break;
      case y:
        token = nextToken ();
        parseB ();
        break;
      default:
        error (…);
      }
    }
    // other functions are similar

Output: Abstract Syntax Tree
    // input: xuz
    S
    +- x
    +- A
       +- u
       +- C
          +- z

A Predictive Parser Emitting AST in C (Sketch)
    tokenTy token;

    S parseS ()
    {
      A a;
      B b;

      switch (token.kind) {
      case x:
        token = nextToken ();
        a = parseA ();
        return newS1 (x, a);
      case y:
        token = nextToken ();
        b = parseB ();
        return newS2 (y, b);
      default:
        error (…);
      }
    }
    // other functions are similar

Predictive Parsing Difficulties
    S ::= x A | x B
    A ::= u C | v C
    B ::= t
    C ::= w | z

    // derivable?
    xuz
- Both alternatives of S begin with x, so the current token alone cannot tell the parser which one to choose

Or Even Worse
    1  E ::= id
    2      | num
    3      | E + E
    4      | E * E
    5      | ( E )

    // deriving 15*(3+4)
                     E
    By 4         =>  E * E
    By 5, then 3 =>  E * (E + E)
    By 2         =>  E * (E + 4)
    By 2         =>  E * (3 + 4)
    By 2         =>  15 * (3 + 4)

Or Even Worse
    // two derivations of 15*(3+4)
    rightmost derivation    leftmost derivation
    E                       E
    E * E                   E * E
    E * (E + E)             15 * E
    E * (E + 4)             15 * (E + E)
    E * (3 + 4)             15 * (3 + E)
    15 * (3 + 4)            15 * (3 + 4)

Ambiguous grammars
- A grammar is ambiguous if there is a sentence with more than one parse tree
[Figure: two parse trees for 15 * 3 + 4, one grouping it as (15 * 3) + 4 and the other as 15 * (3 + 4)]

Eliminating ambiguity
- In programming language syntax, ambiguity often arises from missing operator precedence or associativity
- Does * have higher precedence than +?
- Are * and + left associative?
- Can sometimes rewrite the grammar to disambiguate this
- Beyond the scope of this course

Unambiguous Grammar
    // ambiguous
    E ::= id
        | num
        | E + E
        | E * E
        | ( E )

    // unambiguous
    E ::= E + T
        | T
    T ::= T * F
        | F
    F ::= id
        | num
        | ( E )
- Accepts the same language, but parses unambiguously

Limitations with Predictive Parsing
- The grammar must be rewritten to resolve ambiguity
- The resulting grammars and trees are ugly
- But…easy to write code by hand, and very good for error reporting

Doing better
- We can do better
- We can use a parsing algorithm that can handle all deterministic context-free languages (though not all context-free grammars)
- Remember: a context-free language might have many different context-free grammars

The Yacc Tool
[Figure: Yacc takes a specification and generates a parser; the diagram also carries the label "semantic analyzer"]
- Originally developed for C, and now almost every mainstream language has its own Yacc-like tool: bison (C), ml-yacc (SML), Cup (Java), GPPG (C#), …

Whole Structure
    source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> other part -> Pentium
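The predictive parser sketch above leaves out the token type, nextToken and the other parse functions. A minimal self-contained version might look like the following, assuming each input character is one token. The names input, pos, ok, peek, advance, fail and derivable are hypothetical helpers introduced here for illustration and are not part of the slides.

```c
#include <stddef.h>

/* Hypothetical helper state: the input string, a cursor into it,
   and a flag that is cleared on the first syntax error. */
static const char *input;
static size_t pos;
static int ok;

static char peek(void)    { return input[pos]; }
static void advance(void) { pos++; }   /* only called after a successful match */
static void fail(void)    { ok = 0; }

/* C ::= w | z */
static void parseC(void) {
    if (peek() == 'w' || peek() == 'z') advance();
    else fail();
}

/* A ::= u C | v C */
static void parseA(void) {
    if (peek() == 'u' || peek() == 'v') { advance(); parseC(); }
    else fail();
}

/* B ::= t */
static void parseB(void) {
    if (peek() == 't') advance();
    else fail();
}

/* S ::= x A | y B : the current token alone selects the alternative. */
static void parseS(void) {
    switch (peek()) {
    case 'x': advance(); parseA(); break;
    case 'y': advance(); parseB(); break;
    default:  fail();
    }
}

/* Returns 1 if s is derivable from S, 0 otherwise.
   The whole input must be consumed for the string to count as valid. */
int derivable(const char *s) {
    input = s;
    pos = 0;
    ok = 1;
    parseS();
    return ok && input[pos] == '\0';
}
```

With this sketch, derivable("xuz") accepts, while xum (m is not a terminal), xuwz (one token left over) and xwu (w cannot start A) are rejected.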
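The unambiguous grammar E ::= E + T | T, T ::= T * F | F, F ::= id | num | ( E ) is left-recursive, so a recursive-descent function for E would call itself without consuming a token. One common workaround, which is a swapped-in technique and not something the slides show, is to read the left-recursive rules as loops: E ::= T { + T } and T ::= F { * F }. The sketch below evaluates expressions that way. It assumes single-character operators and decimal numbers, omits the id alternative, and uses hypothetical names (p, err, skip, eval).

```c
#include <ctype.h>

static const char *p;   /* cursor into the input (hypothetical helper state) */
static int err;         /* set to 1 on the first syntax error */

static void skip(void) { while (*p == ' ') p++; }

static long parseE(void);

/* F ::= num | ( E )      (the id alternative is omitted in this sketch) */
static long parseF(void) {
    long v = 0;
    skip();
    if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
        return v;
    }
    if (*p == '(') {
        p++;
        v = parseE();
        skip();
        if (*p == ')') p++; else err = 1;
        return v;
    }
    err = 1;
    return 0;
}

/* T ::= F { * F }   loop form of T ::= T * F | F, groups to the left */
static long parseT(void) {
    long v = parseF();
    skip();
    while (!err && *p == '*') { p++; v *= parseF(); skip(); }
    return v;
}

/* E ::= T { + T }   loop form of E ::= E + T | T, groups to the left */
static long parseE(void) {
    long v = parseT();
    skip();
    while (!err && *p == '+') { p++; v += parseT(); skip(); }
    return v;
}

/* Returns 1 and stores the value in *out if s is a valid expression,
   otherwise returns 0 and leaves *out untouched. */
int eval(const char *s, long *out) {
    long v;
    p = s;
    err = 0;
    v = parseE();
    skip();
    if (err || *p != '\0') return 0;
    *out = v;
    return 1;
}
```

Because T sits below E in the grammar, * binds tighter than +: eval("15 * 3 + 4", &r) stores 49 in r, and eval("15*(3+4)", &r) stores 105.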