Formalising the Normal Forms of CFGs in HOL4

Aditi Barthwal (1), Michael Norrish (2)
(1) Australian National University
(2) NICTA
19th EACSL Annual Conference on Computer Science Logic
August 2010
Aditi Barthwal
CFG Normal Forms
1/23
Context-free grammars
G = (V, T, P, S), where
V = finite set of variables or nonterminals
T = finite set of terminals
P = finite set of productions, each one of the form $A \to \alpha$, where $A \in V$ and $\alpha$ is a string of symbols such that $\alpha \in (V \cup T)^*$
S = start symbol
A word is a string over terminals.
The language of G, L(G), is the set of all words derivable from the start
symbol.
CFGs — The HOL Version
Types:
('nts, 'ts) symbol = NTS of 'nts | TS of 'ts
('nts, 'ts) rule = rule of 'nts => ('nts, 'ts) symbol list
('nts, 'ts) grammar = G of ('nts, 'ts) rule list => 'nts
A grammar’s language:
L g = { tsl | (derives g)* [NTS (startSym g)] tsl ∧ isWord tsl }
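The HOL types and the definition of a grammar's language can be mirrored in a few lines of Python. This is only an illustrative sketch, not the HOL development: the names `NTS`, `TS`, `is_word`, `derives` and `words_up_to` are chosen here to echo the slide, and the bounded search assumes that no rule has an empty right-hand side (so sentential forms never shrink).

```python
from collections import deque

# Symbols are tagged pairs, mirroring the NTS / TS constructors.
def NTS(name):
    return ("NTS", name)

def TS(name):
    return ("TS", name)

# A grammar is (rules, start); each rule is (lhs_name, rhs_tuple).
def is_word(form):
    """A word is a string made of terminals only."""
    return all(kind == "TS" for kind, _ in form)

def derives(grammar, form):
    """Yield every sentential form reachable from `form` in one step."""
    rules, _ = grammar
    for i, sym in enumerate(form):
        if sym[0] == "NTS":
            for lhs, rhs in rules:
                if lhs == sym[1]:
                    yield form[:i] + rhs + form[i + 1:]

def words_up_to(grammar, max_len):
    """Words of L(g) with at most max_len symbols.

    Assumption: no rule has an empty RHS, so forms longer than
    max_len can be pruned safely.
    """
    _, start = grammar
    seen, found = set(), set()
    queue = deque([(NTS(start),)])
    while queue:
        form = queue.popleft()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        if is_word(form):
            found.add(form)
        else:
            queue.extend(derives(grammar, form))
    return found

# Hypothetical example grammar: S -> a S b | a b
example_grammar = (
    [("S", (TS("a"), NTS("S"), TS("b"))), ("S", (TS("a"), TS("b")))],
    "S",
)
short_words = words_up_to(example_grammar, 4)
```

The breadth-first search plays the role of the reflexive-transitive closure of `derives`, restricted to a finite fragment so that it terminates.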
Results I will not talk about
Simplification/normalisation of CFGs by
removing symbols that do not generate a terminal string or
are not reachable from the start symbol of the grammar
(useless symbols);
removing $\epsilon$-productions (as long as $\epsilon$ is not in the language
generated by the grammar);
removing unit productions, i.e. ones of the form $A \to B$ where
B is a nonterminal symbol.
Chomsky Normal Form
A grammar G is in Chomsky Normal Form if every rule is of the
form
$A \to A_1 A_2$, where each $A_i$ is a non-terminal,
or
$A \to a$, where a is a terminal.
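The definition translates directly into a check over the rule list. The sketch below is illustrative only: the function name `is_cnf` echoes the HOL predicate `isCnf`, and the representation (rules as `(lhs, rhs)` pairs, symbols tagged `"NTS"` or `"TS"`) is an assumption made for this example.

```python
def is_cnf(rules):
    """True if every rule is A -> A1 A2 (two nonterminals) or A -> a."""
    def ok(rhs):
        kinds = [kind for kind, _ in rhs]
        return kinds == ["NTS", "NTS"] or kinds == ["TS"]
    return all(ok(rhs) for _, rhs in rules)

# Hypothetical example: S -> A B, A -> a, B -> b
cnf_rules = [
    ("S", (("NTS", "A"), ("NTS", "B"))),
    ("A", (("TS", "a"),)),
    ("B", (("TS", "b"),)),
]
# Adding S -> a B breaks the form: a terminal inside a two-symbol RHS.
non_cnf_rules = cnf_rules + [("S", (("TS", "a"), ("NTS", "B")))]
```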
The Chomsky Normal Form Theorem
Language Equivalence
INFINITE 𝕌(:'nts) ∧ [] ∉ L g ⇒ ∃g'. isCnf g' ∧ L g = L g'
Proof:
The proof in Hopcroft & Ullman (H&U) is 3.5 pages long, including examples
The HOL proof is 1444 lines of code
Translation from H&U to HOL is straightforward
The Relational Approach to Grammar Transformation
Both normalisations feature “non-determinism”:
choice of fresh non-terminals
order in which rules are transformed
Rather than define a function, use a “one-step” relation:
R : grammar → grammar → bool
(Additional parameters possible: e.g. fresh symbols)
Show:
Each application of R preserves language equality
There is always a step possible while grammar has not
reached final form
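The relational idiom can be illustrated outside HOL with one small transformation taken from the route to Chomsky Normal Form: replacing a terminal that occurs inside a long right-hand side by a fresh nonterminal. This is a sketch under stated assumptions, not the relation used in the HOL proof. The names `successors` and `normalise` are invented here, and the fresh names `X0`, `X1`, ... are assumed not to occur in the input grammar.

```python
def successors(rules, fresh):
    """All grammars g2 such that R g g2 holds for one step.

    One step replaces a single terminal occurrence inside a RHS of
    length >= 2 by the nonterminal `fresh` and adds fresh -> terminal.
    """
    for idx, (lhs, rhs) in enumerate(rules):
        if len(rhs) < 2:
            continue
        for pos, sym in enumerate(rhs):
            if sym[0] == "TS":
                new_rhs = rhs[:pos] + (("NTS", fresh),) + rhs[pos + 1:]
                yield (rules[:idx]
                       + [(lhs, new_rhs), (fresh, (sym,))]
                       + rules[idx + 1:])

def normalise(rules, choose=lambda options: options[0]):
    """Apply R until no step is possible; `choose` resolves the
    non-determinism (which rule and which occurrence to rewrite)."""
    counter = 0
    while True:
        # Assumption: names X0, X1, ... are not used by the grammar.
        options = list(successors(rules, f"X{counter}"))
        if not options:
            return rules
        rules = choose(options)
        counter += 1

# Hypothetical example grammar: S -> a S b | a b
start_rules = [
    ("S", (("TS", "a"), ("NTS", "S"), ("TS", "b"))),
    ("S", (("TS", "a"), ("TS", "b"))),
]
first_choice = normalise(start_rules)
last_choice = normalise(start_rules, choose=lambda options: options[-1])
```

Different `choose` functions give different but equally acceptable results, which is why a relation is a more natural specification here than a single function. Termination follows because each step removes one terminal occurrence from a long right-hand side.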
Greibach Normal Form (GNF)
A grammar G is in Greibach Normal Form if every rule is of the
form
$A \to a A_1 A_2 \ldots A_n$, where $n \ge 0$.
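As a concrete reading of the definition, the check below takes a to be a terminal and every following symbol to be a nonterminal, which is the standard interpretation. It is an illustrative sketch: `is_gnf` echoes the HOL predicate `isGnf`, and the tagged-tuple representation of symbols is an assumption made for this example.

```python
def is_gnf(rules):
    """True if every RHS is one terminal followed by n >= 0 nonterminals."""
    def ok(rhs):
        kinds = [kind for kind, _ in rhs]
        return (len(kinds) >= 1
                and kinds[0] == "TS"
                and all(kind == "NTS" for kind in kinds[1:]))
    return all(ok(rhs) for _, rhs in rules)

# Hypothetical example: S -> a A B, A -> a, B -> b
gnf_rules = [
    ("S", (("TS", "a"), ("NTS", "A"), ("NTS", "B"))),
    ("A", (("TS", "a"),)),
    ("B", (("TS", "b"),)),
]
```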
The GNF Destination
Language Equivalence
INFINITE 𝕌(:'nts) ∧ [] ∉ L g ⇒ ∃g'. isGnf g' ∧ L g = L g'
Proof (in H&U):
3 pages long
Includes a crucial picture
The Crux of GNF
The central issue in the proof is dealing with left-recursion: rules
of the form
$A \to A\alpha$
or loops such as
$A \to B\beta$, $B \to C\gamma$, $C \to A\delta$
GNF: Step 0
Convert grammar to Chomsky Normal Form.
GNF: Step 1
Order the non-terminals. (Another source of non-determinism!)
“Substitute out” variable references so that
$A_i \to A_j \alpha$ only occurs if $j > i$
(Hard in presence of left-recursion!)
GNF: Step 1 (The Easy Case)
Working on $A_i$.
Assume that all $A_j$ with $j < i$ have been done.
In order ($j = 1 \ldots i-1$), if a rule is $A_i \to A_j \alpha$:
take all possible RHSes for $A_j$ ($\beta_1 \ldots \beta_n$)
replace the rule above with $A_i \to \beta_k \alpha$ ($k \in \{1 \ldots n\}$)
(Each replacement preserves the language (H&U Lemma 4.3))
May result in a rule $A_i \to A_i \ldots$
GNF: Step 1 (The Hard Bit)
May now have a left-recursive rule $A \to A\alpha$
(No left-recursive cycles possible though.)
Hopcroft & Ullman Lemma 4.4: the “left to right” lemma
Change the left recursive rules into right recursive rules.
Lemma (“left to right lemma”)
Let g = (V, T, P, S) be a CFG. Let $A \to A\alpha_1 \mid A\alpha_2 \mid \ldots \mid A\alpha_r$ be
the set of left recursive A-productions. Let $A \to \beta_1 \mid \beta_2 \mid \ldots \mid \beta_s$
be the remaining A-productions. Then we can construct
$g' = (V \cup \{B\}, T, P_1, S)$ such that $L(g) = L(g')$ by replacing all
the left recursive A-productions by the following productions:
Rule 1: $A \to \beta_i$ and $A \to \beta_i B$
Rule 2: $B \to \alpha_i$ and $B \to \alpha_i B$
Here, B is a fresh nonterminal that does not belong in g.
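The construction in the lemma can be sketched directly. The code below is illustrative and is not the HOL relation `left2Right`: the function name `left_to_right` is invented here, the caller is assumed to supply a nonterminal name `b` that does not occur in the grammar, and rules of the form A -> A (empty alpha) are assumed to be absent. Rule order is not preserved, since the rule list is treated as a set.

```python
def left_to_right(rules, a, b):
    """Replace left-recursive a-productions by right-recursive ones.

    Assumptions: `b` is fresh, and there is no rule a -> a.
    """
    head = (("NTS", a),)
    tail = (("NTS", b),)
    alphas = [rhs[1:] for lhs, rhs in rules if lhs == a and rhs[:1] == head]
    betas = [rhs for lhs, rhs in rules if lhs == a and rhs[:1] != head]
    others = [(lhs, rhs) for lhs, rhs in rules if lhs != a]
    if not alphas:
        return list(rules)  # nothing to do: a is not left-recursive
    new_rules = []
    for beta in betas:    # Rule 1: a -> beta and a -> beta b
        new_rules += [(a, beta), (a, beta + tail)]
    for alpha in alphas:  # Rule 2: b -> alpha and b -> alpha b
        new_rules += [(b, alpha), (b, alpha + tail)]
    return others + new_rules

# Hypothetical example: A -> A a | b
left_recursive = [
    ("A", (("NTS", "A"), ("TS", "a"))),
    ("A", (("TS", "b"),)),
]
right_recursive = left_to_right(left_recursive, "A", "B")
```

Both grammars in the example generate b followed by any number of a's. The original grammar builds the string from the right, while the transformed one builds it from the left.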
Hopcroft & Ullman’s Picture
Any derivation in the left-recursive grammar can be mimicked in
the right-recursive grammar, and vice versa:
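[Figure: two derivation trees side by side. Left: a left-recursive derivation, drawn as a chain of A nodes with leaves labelled b, a1, a2, ..., an. Right: the corresponding right-recursive derivation, drawn as a chain of B nodes below A with the same leaf labels.]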
Realising the Picture Formally
[Figure: the same two derivation trees, with the chain of A nodes marked as the A-block and the chain of B nodes marked as the B-block.]
Proof by induction on block.
The “left to right” lemma
Result: Language Equivalence
∀g g'. left2Right A B g g' ⇒ L g = L g'
GNF: Step 2 (A-productions to a-productions)
a-productions: Let a-productions be rules of the form $A \to a\alpha$,
where a is a terminal symbol.
Rules $A_i \to A_j \alpha$ in g1 are replaced by $A_i \to a\beta\alpha$, where $A_j \to a\beta$
GNF: Step 3 (B-productions to a-productions)
Rules $B_k \to A_i \alpha$ in g2 are replaced with $B_k \to a\beta\alpha$, where $A_i \to a\beta$
The Proof Effort in Summary
1 year
14000 lines of code
700 lemmas and theorems
+ library of common definitions and theorems
Conclusion
Relational idiom for non-determinism
Mechanisation of Chomsky Normal Form
Mechanisation of Greibach Normal Form
Lemma 4.3 — substituting out non-terminal references
Lemma 4.4 — removal of left-recursion
Translation of H&U’s picture into an induction
Hopcroft & Ullman Lemma 4.3
Let A-productions be those productions whose LHS is the
nonterminal A.
Lemma (“aProds lemma”)
Let G = (V, T, P, S) be a CFG. Let $A \to \alpha_1 B \alpha_2$ be a production in
P and $B \to \beta_1 \mid \beta_2 \mid \ldots \mid \beta_r$ be the set of all B-productions. Let
$G_1 = (V, T, P_1, S)$ be obtained from G by deleting the production
$A \to \alpha_1 B \alpha_2$ from P and adding the productions
$A \to \alpha_1 \beta_1 \alpha_2 \mid \alpha_1 \beta_2 \alpha_2 \mid \ldots \mid \alpha_1 \beta_r \alpha_2$. Then $L(G) = L(G_1)$.