Context-Free Grammars
24 October 2013 OSU CSE

BL Compiler Structure
string of characters (source code) → Tokenizer → string of tokens (“words”) → Parser → abstract program → Code Generator → string of integers (object code)
• The parser is arguably the most interesting, and most difficult, piece of the BL compiler.

Plan for the BL Parser
• Design a context-free grammar (CFG) to specify syntactically valid BL programs
  – A grammar is a set of formation rules for strings in a language.
  – A grammar is context-free if it satisfies certain technical conditions described herein.
• Use the grammar to implement a recursive-descent parser (i.e., an algorithm to parse a BL program and construct the corresponding Program object)

Languages
• A language is a set of strings over some alphabet Σ
• If L is a language, then mathematically it is a set of string of Σ

Aside: Characters vs. Tokens
• In the following examples of CFGs, we deal with languages over the alphabet of individual characters (e.g., Java’s char values): Σ = character
• In the BL project, we deal with languages over an alphabet of tokens (to be explained later)

Example: Real-Number Constants
• Some syntactically valid real-number constants (i.e., some strings in the “language of valid real-number constants”):
    37.044    615.22E16    99241.    18.E-93

CFG Rewrite Rules
    real-const → digit-seq . digit-seq |
                 digit-seq . digit-seq exponent |
                 digit-seq . |
                 digit-seq . exponent
    exponent   → E digit-seq |
                 E + digit-seq |
                 E - digit-seq
    digit-seq  → digit digit-seq |
                 digit
    digit      → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• Each of these is a rewrite rule (a replacement rule), which describes how strings in the language may be formed.
• A name on the left of a rewrite rule is called a non-terminal symbol.
• The special CFG symbol → means “can be rewritten as” or “can be replaced by”.
• The special CFG symbol | means “or”, i.e., there are multiple possible “rewrites” for the same non-terminal.
• So this ...
    real-const → digit-seq . digit-seq |
                 digit-seq . digit-seq exponent |
                 digit-seq . |
                 digit-seq . exponent
  ... means exactly the same thing as these four separate rewrite rules:
    real-const → digit-seq . digit-seq
    real-const → digit-seq . digit-seq exponent
    real-const → digit-seq .
    real-const → digit-seq . exponent
• One non-terminal symbol (normally in the first rewrite rule) is called the start symbol.
• A symbol from the alphabet on the right-hand side of a rewrite rule is called a terminal symbol.
• To remember the name: terminal symbols are what you end up with when generating strings in the language (see below).
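The rewrite rules for real-const can be turned almost mechanically into a small recursive-descent recognizer, with one method per non-terminal. The following is only a sketch under stated assumptions, not the BL project's parser: the class name RealConstRecognizer and its helper methods are hypothetical, the four real-const alternatives are merged into "digit-seq . [digit-seq] [exponent]" (which accepts the same strings), and the recursion in digit-seq is written as a loop.

```java
/** Hypothetical recognizer for the real-const CFG; one method per non-terminal. */
final class RealConstRecognizer {
    private final String input;
    private int pos = 0; // index of the next unread character

    private RealConstRecognizer(String input) {
        this.input = input;
    }

    /** Reports whether s is in the language of the real-const grammar. */
    static boolean isRealConst(String s) {
        RealConstRecognizer r = new RealConstRecognizer(s);
        return r.realConst() && r.pos == s.length();
    }

    // real-const → digit-seq . [digit-seq] [exponent]  (the four alternatives, merged)
    private boolean realConst() {
        if (!digitSeq() || !accept('.')) {
            return false;
        }
        if (peekIsDigit()) {
            digitSeq();
        }
        return peek() != 'E' || exponent();
    }

    // exponent → E digit-seq | E + digit-seq | E - digit-seq
    private boolean exponent() {
        if (!accept('E')) {
            return false;
        }
        if (peek() == '+' || peek() == '-') {
            pos++;
        }
        return digitSeq();
    }

    // digit-seq → digit digit-seq | digit  (recursion replaced by a loop)
    private boolean digitSeq() {
        if (!peekIsDigit()) {
            return false;
        }
        while (peekIsDigit()) {
            pos++;
        }
        return true;
    }

    // Next unread character, or '\0' once the input is exhausted.
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    // digit → 0 | 1 | ... | 9
    private boolean peekIsDigit() {
        char c = peek();
        return c >= '0' && c <= '9';
    }

    // Consumes c if it is the next character.
    private boolean accept(char c) {
        if (peek() == c) {
            pos++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"37.044", "615.22E16", "99241.", "18.E-93", "56"}) {
            System.out.println(s + " : " + isRealConst(s));
        }
    }
}
```

Under these assumptions, the four example constants from the slide are accepted, while a string with no decimal point (such as 56) is rejected because every real-const alternative requires the "." terminal.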
Four Components of a CFG
• Non-terminal symbols for this CFG:
  – real-const, exponent, digit-seq, digit
• Terminal symbols for this CFG:
  – ., E, +, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
• Start symbol for this CFG:
  – real-const
• Rewrite rules for this CFG:
  – (see previous slides)

Derivations
• A derivation of a string of terminal symbols consists of a sequence of specific rewrite-rule applications that begin with the start symbol and continue until only terminal symbols remain
  – A string is in the language of the CFG iff there is a derivation that leads to it
• The symbol ⇒ indicates a derivation step, i.e., a specific rewrite-rule application

Example: Derivation of 5.6E10
• Begin with the start symbol:
    real-const
• ... and pick one possible rewrite:
    real-const → digit-seq . digit-seq |
                 digit-seq . digit-seq exponent |
                 digit-seq . |
                 digit-seq . exponent
  Which rewrite is appropriate to derive 5.6E10?
• This is the first step of the derivation:
    real-const ⇒ digit-seq . digit-seq exponent
• Choose a non-terminal to rewrite (here, the first digit-seq) ... and pick one possible rewrite:
    digit-seq → digit digit-seq |
                digit
  Which rewrite is appropriate to derive 5.6E10?
• This is the second step of the derivation:
    real-const ⇒ digit-seq . digit-seq exponent
               ⇒ digit . digit-seq exponent
• Choose a non-terminal to rewrite (here, digit) ... and pick one possible rewrite:
    digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• This is the third step of the derivation:
    real-const ⇒ digit-seq . digit-seq exponent
               ⇒ digit . digit-seq exponent
               ⇒ 5 . digit-seq exponent
• Choose a non-terminal to rewrite (here, the remaining digit-seq) ... and pick one possible rewrite:
    digit-seq → digit digit-seq |
                digit

One Derivation of 5.6E10
    real-const ⇒ digit-seq . digit-seq exponent
               ⇒ digit . digit-seq exponent
               ⇒ 5 . digit-seq exponent
               ⇒ 5 . digit exponent
               ⇒ 5 . 6 exponent
               ⇒ 5 . 6 E digit-seq
               ⇒ 5 . 6 E digit digit-seq
               ⇒ 5 . 6 E 1 digit-seq
               ⇒ 5 . 6 E 1 digit
               ⇒ 5 . 6 E 1 0
• Note that a derivation is used in this way to generate a string in the language of the CFG.

Another Derivation of 5.6E10
    real-const ⇒ digit-seq . digit-seq exponent
               ⇒ digit-seq . digit-seq E digit-seq
               ⇒ digit-seq . digit-seq E digit digit-seq
               ⇒ digit-seq . digit-seq E digit digit
               ⇒ digit-seq . digit-seq E digit 0
               ⇒ digit-seq . digit-seq E 1 0
               ⇒ digit-seq . digit E 1 0
               ⇒ digit-seq . 6 E 1 0
               ⇒ digit . 6 E 1 0
               ⇒ 5 . 6 E 1 0

Derivation Trees
• A derivation tree depicts a derivation (such as those above) in a tree
• Note that the order in which rewrites are done is sometimes arbitrary
  – A tree captures the required temporal order of rewrites from top-to-bottom
  – A tree captures the required spatial order among terminal symbols from left-to-right

A Derivation Tree for 5.6E10
(children are indented under their parent; reading the leaves top to bottom gives the string left to right)
    real-const
      digit-seq
        digit
          5
      .
      digit-seq
        digit
          6
      exponent
        E
        digit-seq
          digit
            1
          digit-seq
            digit
              0
• This tree captures both derivations previously illustrated (and all others) for 5.6E10.

Other Examples
• Can you find a derivation tree for 5.E3?
  – If so, it’s in the language of the CFG; otherwise it’s not in that language
• Can you find a derivation tree for .6E10?
  – If so, it’s in the language of the CFG; otherwise it’s not in that language

A Famous CFG
    expr      → expr add-op term | term
    term      → term mult-op factor | factor
    factor    → ( expr ) | digit-seq
    add-op    → + | -
    mult-op   → * | DIV | REM
    digit-seq → digit digit-seq | digit
    digit     → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Example: 4+6*2
• Find a derivation tree for 4+6*2

A Derivation Tree for 4+6*2
    expr
      expr
        term
          factor
            digit-seq
              digit
                4
      add-op
        +
      term
        term
          factor
            digit-seq
              digit
                6
        mult-op
          *
        factor
          digit-seq
            digit
              2

Example: (4+6)*2
• Find a derivation tree for (4+6)*2
• How is it different from the previous one?
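One way to see what the expr/term/factor grammar buys is to evaluate expressions by following its rules, one method per non-terminal. The sketch below is an illustration under assumptions rather than the course's code: the class name ExprEvaluator is hypothetical, DIV and REM are assumed to mean Java's integer / and %, and the left-recursive rules (expr → expr add-op term, term → term mult-op factor) are rewritten as loops, because a recursive-descent method that began by calling itself would never terminate.

```java
/** Hypothetical evaluator that follows the expr / term / factor grammar. */
final class ExprEvaluator {
    private final String s;
    private int pos = 0; // index of the next unread character

    private ExprEvaluator(String s) {
        this.s = s;
    }

    /** Evaluates text, which must be a complete expr with no spaces. */
    static int evaluate(String text) {
        ExprEvaluator e = new ExprEvaluator(text);
        int value = e.expr();
        if (e.pos != text.length()) {
            throw new IllegalArgumentException("unexpected input at " + e.pos);
        }
        return value;
    }

    // expr → expr add-op term | term, handled as: term { add-op term }
    private int expr() {
        int value = term();
        while (s.startsWith("+", pos) || s.startsWith("-", pos)) {
            char op = s.charAt(pos++);
            int right = term();
            value = (op == '+') ? value + right : value - right;
        }
        return value;
    }

    // term → term mult-op factor | factor, handled as: factor { mult-op factor }
    private int term() {
        int value = factor();
        while (true) {
            if (s.startsWith("*", pos)) {
                pos += 1;
                value = value * factor();
            } else if (s.startsWith("DIV", pos)) {
                pos += 3;
                value = value / factor(); // assumption: DIV is integer division
            } else if (s.startsWith("REM", pos)) {
                pos += 3;
                value = value % factor(); // assumption: REM is the remainder
            } else {
                return value;
            }
        }
    }

    // factor → ( expr ) | digit-seq
    private int factor() {
        if (s.startsWith("(", pos)) {
            pos++;
            int value = expr();
            if (!s.startsWith(")", pos)) {
                throw new IllegalArgumentException("')' expected at " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < s.length() && s.charAt(pos) >= '0' && s.charAt(pos) <= '9') {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("digit expected at " + pos);
        }
        return Integer.parseInt(s.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(evaluate("4+6*2"));   // prints 16
        System.out.println(evaluate("(4+6)*2")); // prints 20
    }
}
```

Because term sits below expr in the grammar, 6*2 is grouped into a single term before the addition happens, which mirrors the shape of the derivation tree for 4+6*2; the parentheses in (4+6)*2 force the sum into a single factor instead.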
A Simpler CFG for Expressions
    expr      → expr op expr | ( expr ) | digit-seq
    op        → + | - | * | DIV | REM
    digit-seq → digit digit-seq | digit
    digit     → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

One Derivation Tree for 4+6*2
    expr
      expr
        digit-seq
          digit
            4
      op
        +
      expr
        expr
          digit-seq
            digit
              6
        op
          *
        expr
          digit-seq
            digit
              2

Another Derivation Tree for 4+6*2
    expr
      expr
        expr
          digit-seq
            digit
              4
        op
          +
        expr
          digit-seq
            digit
              6
      op
        *
      expr
        digit-seq
          digit
            2

Ambiguity
• The second (simpler) CFG for arithmetic expressions is ambiguous because some strings in the language of the CFG have more than one derivation tree
• As is often the case, ambiguity is bad
  – If you want to use the derivation tree as the basis for evaluating the expression, only one of the derivation trees shown above results in the right answer (which one?)

Resources
• Wikipedia: Context-Free Grammar
  – http://en.wikipedia.org/wiki/Context-free_grammar
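To make the cost of ambiguity concrete, the two derivation trees for 4+6*2 can be built by hand and evaluated bottom-up. This is a minimal sketch with hypothetical names (AmbiguityDemo, Node, num, bin); the trees are simplified to operators and numbers, omitting the digit-seq and digit levels, and only the operators +, - and * are handled.

```java
/** Hypothetical demo: evaluate both derivation trees of 4+6*2 from the ambiguous CFG. */
final class AmbiguityDemo {
    /** A tree node that can compute the value of its subtree. */
    interface Node {
        int eval();
    }

    // Leaf: a number (stands for expr → digit-seq).
    static Node num(int value) {
        return () -> value;
    }

    // Interior node: expr → expr op expr.
    static Node bin(Node left, char op, Node right) {
        return () -> {
            switch (op) {
                case '+': return left.eval() + right.eval();
                case '-': return left.eval() - right.eval();
                case '*': return left.eval() * right.eval();
                default: throw new IllegalArgumentException("unsupported op: " + op);
            }
        };
    }

    // First tree: + at the root, with 6*2 as its right subtree.
    static int plusAtRoot() {
        return bin(num(4), '+', bin(num(6), '*', num(2))).eval();
    }

    // Second tree: * at the root, with 4+6 as its left subtree.
    static int timesAtRoot() {
        return bin(bin(num(4), '+', num(6)), '*', num(2)).eval();
    }

    public static void main(String[] args) {
        System.out.println(plusAtRoot());  // prints 16
        System.out.println(timesAtRoot()); // prints 20
    }
}
```

The same string yields 16 from one tree and 20 from the other, so a tool that evaluates whichever tree it happens to build cannot be trusted; under the usual arithmetic convention that * binds tighter than +, the tree with + at the root is the one that matches the expected value.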