Formalising the Normal Forms of CFGs in HOL4

Aditi Barthwal (1), Michael Norrish (2)
(1) Australian National University
(2) NICTA

19th EACSL Annual Conference on Computer Science Logic, August 2010

Aditi Barthwal    CFG Normal Forms    1/23


Context-free grammars

$G = (V, T, P, S)$, where
  - $V$ = finite set of variables or nonterminals
  - $T$ = finite set of terminals
  - $P$ = finite set of productions, each of the form $A \to \alpha$, where $A \in V$ and $\alpha$ is a string of symbols such that $\alpha \in (V \cup T)^*$
  - $S$ = start symbol

A word is a string over terminals. The language of $G$, $L(G)$, is the set of all words derivable from the start symbol.


CFGs: The HOL Version

Types:
    ('nts, 'ts) symbol  = NTS of 'nts | TS of 'ts
    ('nts, 'ts) rule    = rule of 'nts => ('nts, 'ts) symbol list
    ('nts, 'ts) grammar = G of ('nts, 'ts) rule list => 'nts

A grammar's language:
    L g = { tsl | (derives g)^* [NTS (startSym g)] tsl ∧ isWord tsl }


Results I will not talk about

Simplification/normalisation of CFGs by
  - removing symbols that do not generate a terminal string or are not reachable from the start symbol of the grammar (useless symbols);
  - removing $\epsilon$-productions (as long as $\epsilon$ is not in the language generated by the grammar);
  - removing unit productions, i.e. ones of the form $A \to B$ where $B$ is a nonterminal symbol.


Chomsky Normal Form

A grammar $G$ is in Chomsky Normal Form if every rule is of the form
    $A \to A_1 A_2$, where each $A_i$ is a non-terminal, or
    $A \to a$, where $a$ is a terminal.
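
The Chomsky Normal Form condition above is a simple shape check on each rule. As a rough illustration only (this is Python, not the HOL4 development, and the names NTS, TS, Rule, Grammar, is_cnf_rule and is_cnf are chosen here to mirror the slide, not taken from the authors' sources), the grammar types and the check could be sketched as:

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class NTS:
    """A nonterminal symbol (mirrors the NTS constructor on the slide)."""
    name: str


@dataclass(frozen=True)
class TS:
    """A terminal symbol (mirrors the TS constructor on the slide)."""
    name: str


Symbol = Union[NTS, TS]


@dataclass(frozen=True)
class Rule:
    """A production: left-hand nonterminal name and right-hand symbol string."""
    lhs: str
    rhs: Tuple[Symbol, ...]


@dataclass(frozen=True)
class Grammar:
    """A grammar: a collection of rules plus the start nonterminal name."""
    rules: Tuple[Rule, ...]
    start: str


def is_cnf_rule(rule: Rule) -> bool:
    """True if the rule has the form A -> A1 A2 or A -> a."""
    rhs = rule.rhs
    if len(rhs) == 2:
        return all(isinstance(sym, NTS) for sym in rhs)
    if len(rhs) == 1:
        return isinstance(rhs[0], TS)
    return False


def is_cnf(g: Grammar) -> bool:
    """True if every rule of the grammar is in Chomsky Normal Form."""
    return all(is_cnf_rule(r) for r in g.rules)


# Hypothetical example grammars, used only for illustration.
cnf_grammar = Grammar(
    rules=(
        Rule("S", (NTS("A"), NTS("B"))),
        Rule("A", (TS("a"),)),
        Rule("B", (TS("b"),)),
    ),
    start="S",
)
non_cnf_grammar = Grammar(
    rules=(Rule("S", (TS("a"), NTS("S"), TS("b"))), Rule("S", (TS("c"),))),
    start="S",
)
```

Here is_cnf(cnf_grammar) holds, while is_cnf(non_cnf_grammar) fails because the rule S -> a S b has three symbols on its right-hand side.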


The Chomsky Normal Form Theorem

Language Equivalence
    INFINITE 𝕌(:'nts) ∧ [] ∉ L g ⇒ ∃g'. isCnf g' ∧ L g = L g'

Proof:
  - H&U's proof is 3.5 pages long with examples
  - The HOL proof is 1444 loc
  - Translation from H&U to HOL is straightforward


The Relational Approach to Grammar Transformation

Both normalisations feature "non-determinism":
  - choice of fresh non-terminals
  - order in which rules are transformed

Rather than define a function, use a "one-step" relation:
    R : grammar → grammar → bool
(Additional parameters possible: e.g. fresh symbols)

Show:
  - Each application of R preserves language equality
  - There is always a step possible while grammar has not reached final form


Greibach Normal Form (GNF)

A grammar $G$ is in Greibach Normal Form if every rule is of the form
    $A \to a A_1 A_2 \ldots A_n$
where $n \geq 0$.


The GNF Destination

Language Equivalence
    INFINITE 𝕌(:'nts) ∧ [] ∉ L g ⇒ ∃g'. isGnf g' ∧ L g = L g'

Proof (in H&U):
  - 3 pages long
  - Includes a crucial picture


The Crux of GNF

The central issue in the proof is dealing with left-recursion: rules of the form
    $A \to A\alpha$
or loops such as
    $A \to B\alpha$
    $B \to C\beta$
    $C \to A\gamma$


GNF: Step 0

Convert grammar to Chomsky Normal Form.


GNF: Step 1

Order the non-terminals. (Another source of non-determinism!)

"Substitute out" variable references so that $A_i \to A_j \gamma$ only occurs if $j > i$.
(Hard in presence of left-recursion!)


GNF: Step 1 (The Easy Case)

Working on $A_i$. Assume that all $A_j$ with $j < i$ have been done.

In order ($j = 1 \ldots i-1$), if rule is $A_i \to A_j \gamma$
  - take all possible RHSes for $A_j$ ($\beta_1 \ldots \beta_n$)
  - replace rule above with $A_i \to \beta_k \gamma$ ($k \in \{1 \ldots n\}$)

(Each replacement preserves the language (H&U Lemma 4.3))

May result in a rule $A_i \to A_i \ldots$
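
The substitution in the easy case can be illustrated with a small Python sketch. This is an assumption-laden illustration, not the HOL4 definition: rules are modelled as (lhs, rhs) pairs of plain strings, and the function name substitute_leading is invented here.

```python
from typing import List, Tuple

# A rule is (lhs, rhs); rhs is a tuple of symbol names. This string-based
# encoding is an assumption made for this sketch only.
RuleT = Tuple[str, Tuple[str, ...]]


def substitute_leading(rules: List[RuleT], a_i: str, a_j: str) -> List[RuleT]:
    """Replace each rule a_i -> a_j gamma by a_i -> beta gamma,
    one new rule for every production a_j -> beta (H&U Lemma 4.3 with an
    empty left context). Intended for a_i != a_j, i.e. the easy case."""
    betas = [rhs for lhs, rhs in rules if lhs == a_j]
    result: List[RuleT] = []

    def add(rule: RuleT) -> None:
        # Keep the rule list duplicate-free while preserving order.
        if rule not in result:
            result.append(rule)

    for lhs, rhs in rules:
        if lhs == a_i and rhs[:1] == (a_j,):
            gamma = rhs[1:]
            for beta in betas:
                add((lhs, beta + gamma))
        else:
            add((lhs, rhs))
    return result


# Hypothetical grammar with ordering A1 < A2 < A3. The rule A2 -> A1 A3
# violates the ordering (1 < 2) and must be substituted out.
example_rules: List[RuleT] = [
    ("A1", ("a",)),
    ("A1", ("A2", "A3")),
    ("A2", ("A1", "A3")),
    ("A2", ("b",)),
    ("A3", ("c",)),
]
after_step = substitute_leading(example_rules, "A2", "A1")
```

In this example the offending rule A2 -> A1 A3 is replaced by A2 -> a A3 and A2 -> A2 A3 A3. The second of these is left-recursive, which is exactly the situation the slide warns about.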


GNF: Step 1 (The Hard Bit)

May now have a left-recursive rule
    $A \to A\alpha$
(No left-recursive cycles possible though.)


Hopcroft & Ullman Lemma 4.4: the "left to right" lemma

Change the left recursive rules into right recursive rules.

Lemma ("left to right lemma")
Let $g = (V, T, P, S)$ be a CFG. Let $A \to A\alpha_1 \mid A\alpha_2 \mid \ldots \mid A\alpha_r$ be the set of left recursive $A$-productions. Let $A \to \beta_1 \mid \beta_2 \mid \ldots \mid \beta_s$ be the remaining $A$-productions. Then we can construct $g' = (V \cup \{B\}, T, P_1, S)$ such that $L(g) = L(g')$ by replacing all the left recursive $A$-productions by the following productions:

    Rule 1:  $A \to \beta_i$ and $A \to \beta_i B$  ($1 \leq i \leq s$)
    Rule 2:  $B \to \alpha_i$ and $B \to \alpha_i B$  ($1 \leq i \leq r$)

Here, $B$ is a fresh nonterminal that does not belong in $g$.


Hopcroft & Ullman's Picture

Any derivation in the left-recursive grammar can be mimicked in the right-recursive grammar, and vice versa:

[Figure: a derivation tree in the left-recursive grammar (chain of A nodes; leaves b, a1, a2, ..., an) beside the corresponding tree in the right-recursive grammar (A and B nodes; same leaves)]


Realising the Picture Formally

[Figure: the same two derivation trees, with the chain of A nodes marked "A-block" and the chain of B nodes marked "B-block"]

Proof by induction on block.


The "left to right" lemma

Result: Language Equivalence
    ∀g g'. left2Right A B g g' ⇒ L g = L g'


GNF: Step 2 (A-productions to a-productions)

a-productions
Let a-productions be rules of the form $A \to a\alpha$ where $a$ is a terminal symbol.

$A_i \to A_j \gamma$ in $g_1$ are replaced by $A_i \to a\alpha\gamma$, where $A_j \to a\alpha$


GNF: Step 3 (B-productions to a-productions)

$B_k \to A_i \gamma$ in $g_2$ are replaced with $B_k \to a\alpha\gamma$, where $A_i \to a\alpha$


The Proof Effort in Summary

  - 1 year
  - 14000 lines of code
  - 700 lemmas and theorems
  - + library of common definitions and theorems


Conclusion

  - Relational idiom for non-determinism
  - Mechanisation of Chomsky Normal Form
  - Mechanisation of Greibach Normal Form
      - Lemma 4.3: substituting out non-terminal references
      - Lemma 4.4: removal of left-recursion
      - Translation of H&U's picture into an induction


Hopcroft & Ullman Lemma 4.3

Let $A$-productions be those productions whose LHS is the nonterminal $A$.

Lemma ("aProds lemma")
Let $G = (V, T, P, S)$ be a CFG. Let $A \to \alpha_1 B \alpha_2$ be a production in $P$ and $B \to \beta_1 \mid \beta_2 \mid \ldots \mid \beta_r$ be the set of all $B$-productions. Let $G_1 = (V, T, P_1, S)$ be obtained from $G$ by deleting the production $A \to \alpha_1 B \alpha_2$ from $P$ and adding the productions $A \to \alpha_1 \beta_1 \alpha_2 \mid \alpha_1 \beta_2 \alpha_2 \mid \ldots \mid \alpha_1 \beta_r \alpha_2$. Then $L(G) = L(G_1)$.
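
The "left to right" transformation of H&U Lemma 4.4, stated earlier in the talk, can also be sketched in Python. As before, this is only an illustration under assumptions: rules are (lhs, rhs) pairs of plain strings, the function name left_to_right is invented here, and the sketch is a deterministic function rather than the relation left2Right used in the HOL4 development.

```python
from typing import List, Tuple

# A rule is (lhs, rhs); rhs is a tuple of symbol names (sketch-only encoding).
RuleT = Tuple[str, Tuple[str, ...]]


def left_to_right(rules: List[RuleT], a: str, b: str) -> List[RuleT]:
    """Replace the left-recursive a-productions by right-recursive ones,
    using b as the fresh nonterminal (H&U Lemma 4.4).

    Assumes no rule has the form a -> a (ruled out once unit productions
    have been removed)."""
    # b must be fresh: it may not occur anywhere in the grammar.
    if any(b == lhs or b in rhs for lhs, rhs in rules):
        raise ValueError("nonterminal %r is not fresh" % b)

    # a -> a alpha_i : keep only the alpha_i part.
    alphas = [rhs[1:] for lhs, rhs in rules if lhs == a and rhs[:1] == (a,)]
    # a -> beta_i : the remaining a-productions.
    betas = [rhs for lhs, rhs in rules if lhs == a and rhs[:1] != (a,)]
    if not alphas:
        return list(rules)  # nothing to do: a is not left-recursive

    others = [(lhs, rhs) for lhs, rhs in rules if lhs != a]
    new_a: List[RuleT] = []
    for beta in betas:
        new_a.append((a, beta))           # Rule 1: a -> beta_i
        new_a.append((a, beta + (b,)))    # Rule 1: a -> beta_i b
    new_b: List[RuleT] = []
    for alpha in alphas:
        new_b.append((b, alpha))          # Rule 2: b -> alpha_i
        new_b.append((b, alpha + (b,)))   # Rule 2: b -> alpha_i b
    return others + new_a + new_b


# Hypothetical example: A -> A a | b generates the words b, ba, baa, ...
left_recursive: List[RuleT] = [("A", ("A", "a")), ("A", ("b",))]
right_recursive = left_to_right(left_recursive, "A", "B")
```

For the example grammar A -> A a | b, the result is A -> b | b B and B -> a | a B, which generates the same words with right recursion in place of left recursion.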