CNF and CYK

Chomsky Normal Form
CYK Algorithm
Normal Forms
A grammar can be brought into certain special
forms that make it easier to work with.
• Chomsky Normal Form
• Greibach Normal Form
Chomsky Normal Form
A Context Free Grammar is in Chomsky Normal
Form if the rules are of the form:
• A ⟶ BC
• A⟶a
• S⟶ε
where A, B, C are variables (B and C are not the
start variable), a is a terminal, and S is the
start variable.
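As an illustration, the three rule shapes can be checked mechanically. The sketch below assumes a representation that is not part of the slides: a grammar is a dict from each variable to a set of right sides, each right side is a tuple of symbols, and the empty tuple stands for ε. The name is_cnf is hypothetical.

```python
def is_cnf(rules, start):
    """Return True if every rule has one of the three allowed shapes."""
    variables = set(rules)
    for head, bodies in rules.items():
        for body in bodies:
            if len(body) == 0:
                # S -> ε is allowed only for the start variable
                if head != start:
                    return False
            elif len(body) == 1:
                # A -> a: the single symbol must be a terminal
                if body[0] in variables:
                    return False
            elif len(body) == 2:
                # A -> BC: two variables, neither of them the start variable
                if any(s not in variables or s == start for s in body):
                    return False
            else:
                return False
    return True


# S -> AB, A -> BB | 0, B -> AA | 1 (already in Chomsky Normal Form)
cyk_grammar = {
    "S": {("A", "B")},
    "A": {("B", "B"), ("0",)},
    "B": {("A", "A"), ("1",)},
}

# S -> CSC | B, C -> 00 | ε, B -> 01B | 1 (not in Chomsky Normal Form)
example = {
    "S": {("C", "S", "C"), ("B",)},
    "C": {("0", "0"), ()},
    "B": {("0", "1", "B"), ("1",)},
}
```

With these definitions, is_cnf(cyk_grammar, "S") holds, while is_cnf(example, "S") fails because of the long rules, the unary rule S ⟶ B and the rule C ⟶ ε.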
Chomsky Normal Form
There are 5 steps to follow in order to transform
a grammar into CNF:
1. Add a new start variable S0 and the
production rule S0 ⟶ S.
2. Eliminate the ε-rules.
3. Eliminate the unary productions A ⟶ B.
4. Add rules of the form Vt ⟶ t for every
terminal t and replace t with the variable Vt.
5. Transform the remaining rules to the
form A ⟶ BC (A, B, C variables).
1. Add a new start variable
• We have to make sure that the start variable
doesn’t occur on the right side of any rule.
• Thus, we add a new start variable S0 and the
rule S0 ⟶ S, where S is the old start variable.
2. Eliminate ε-rules
• We have to eliminate all productions of the
form A ⟶ ε, for A being any non-start
variable.
• To do so we remove the rule A ⟶ ε and, for
every other rule with A on its right side, we add
the copies of that rule obtained by deleting
occurrences of A in all possible ways.
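A minimal sketch of this step, under assumptions: the grammar is a dict from variables to sets of tuples (the empty tuple is ε), and the function name eliminate_epsilon is hypothetical. Instead of removing one ε-rule at a time, the sketch first collects all variables that can produce ε and then deletes them in all possible ways. This is a common variant of the step and may not be the exact procedure the slides have in mind.

```python
from itertools import product


def eliminate_epsilon(rules, start):
    # Find every variable that can derive ε.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in rules.items():
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True

    new_rules = {}
    for head, bodies in rules.items():
        out = set()
        for body in bodies:
            # Each nullable symbol may be kept or deleted.
            choices = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for pick in product(*choices):
                new_body = tuple(sym for part in pick for sym in part)
                # Only the start variable may keep an ε-rule.
                if new_body or head == start:
                    out.add(new_body)
        new_rules[head] = out
    return new_rules


# The example grammar after adding the new start variable S0.
after_step_1 = {
    "S0": {("S",)},
    "S": {("C", "S", "C"), ("B",)},
    "C": {("0", "0"), ()},
    "B": {("0", "1", "B"), ("1",)},
}
result = eliminate_epsilon(after_step_1, "S0")
```

Here result["S"] holds the right sides CSC, CS, SC, S and B, and C keeps only 00.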
3. Eliminate unary productions
• A unary production is a production of the form
A ⟶ B (with both A, B being variables).
• There should only be productions of the form
V1 ⟶ V2V3 involving variables, thus we have
to eliminate unary productions.
• To do so, we replace B in A ⟶ B with the
right sides of all the rules that have B on their
left side.
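The replacement can be sketched as follows, assuming the same dict-of-sets representation (variables are the dict keys, right sides are tuples). The name eliminate_unary is hypothetical. For every variable, the sketch collects the variables reachable through unary productions alone and copies their non-unary right sides.

```python
def eliminate_unary(rules):
    variables = set(rules)

    def is_unary(body):
        return len(body) == 1 and body[0] in variables

    new_rules = {}
    for head in rules:
        # Variables reachable from head using only unary productions.
        reachable = {head}
        stack = [head]
        while stack:
            v = stack.pop()
            for body in rules[v]:
                if is_unary(body) and body[0] not in reachable:
                    reachable.add(body[0])
                    stack.append(body[0])
        # Keep every right side that is not itself a unary production.
        new_rules[head] = {
            body for v in reachable for body in rules[v] if not is_unary(body)
        }
    return new_rules


# The example grammar after eliminating the ε-rules.
after_step_2 = {
    "S0": {("S",)},
    "S": {("C", "S", "C"), ("B",), ("C", "S"), ("S", "C"), ("S",)},
    "C": {("0", "0")},
    "B": {("0", "1", "B"), ("1",)},
}
result = eliminate_unary(after_step_2)
```

Both S0 and S end up with the right sides CSC, 01B, 1, CS and SC; the rule S ⟶ S disappears because it adds nothing.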
4. Add Vt ⟶ t and replace t with Vt
• Terminals may appear only in rules of the form
A ⟶ t, thus they should disappear from every
rule whose right side has more than one
symbol.
• To do so, we add a new variable Vt for every
terminal t and we replace every appearance
of t with Vt, except those in rules of the form
A ⟶ t.
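A sketch of this step, assuming the dict-of-sets representation. The function name replace_terminals and the parameter name_for (a mapping from each terminal t to the name chosen for Vt) are hypothetical, and the chosen names are assumed not to clash with existing variables.

```python
def replace_terminals(rules, name_for):
    variables = set(rules)
    new_rules = {head: set() for head in rules}
    used = set()
    for head, bodies in rules.items():
        for body in bodies:
            if len(body) >= 2:
                # Terminals inside longer right sides are replaced by Vt.
                used.update(s for s in body if s not in variables)
                body = tuple(s if s in variables else name_for[s] for s in body)
            # Right sides with a single terminal (A -> t) stay as they are.
            new_rules[head].add(body)
    for t in used:
        # Add the rule Vt -> t for every terminal that was replaced.
        new_rules.setdefault(name_for[t], set()).add((t,))
    return new_rules


# The example grammar after eliminating unary productions.
long_rules = {("C", "S", "C"), ("0", "1", "B"), ("1",), ("C", "S"), ("S", "C")}
after_step_3 = {
    "S0": set(long_rules),
    "S": set(long_rules),
    "C": {("0", "0")},
    "B": {("0", "1", "B"), ("1",)},
}
result = replace_terminals(after_step_3, {"0": "Z", "1": "A"})
```

With Z for 0 and A for 1, the rule B ⟶ 01B becomes B ⟶ ZAB and C ⟶ 00 becomes C ⟶ ZZ, while B ⟶ 1 is left alone.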
5. Transform rules to A ⟶ BC
• All the rules involving only variables should be
of the form A ⟶ BC. Thus we should take
care of all the rules with more than 2
variables on the right side.
• For the rule V ⟶ A1A2A3…An, we shorten
the right side step by step, each time
replacing its first two variables with one new
variable (this creates n-2 new variables
in total).
5. Transform rules to A ⟶ BC
(each step shortens the rule for V and adds one new rule)
V ⟶ A1A2A3A4A5A6…An
V ⟶ B1A3A4A5A6…An     B1 ⟶ A1A2
V ⟶ B2A4A5A6…An       B2 ⟶ B1A3
V ⟶ B3A5A6…An         B3 ⟶ B2A4
V ⟶ B4A6…An           B4 ⟶ B3A5
…
V ⟶ Bn-2An            Bn-2 ⟶ Bn-3An-1
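The shortening of long rules can be sketched as below, assuming the dict-of-sets representation. The name binarize and the parameter fresh_names (an iterator of unused variable names) are hypothetical. As a design choice of this sketch, the same new variable is reused whenever the same pair of leading symbols shows up again.

```python
def binarize(rules, fresh_names):
    new_rules = {head: set() for head in rules}
    pair_name = {}  # pair of symbols -> new variable that produces it
    for head, bodies in rules.items():
        for body in sorted(bodies):  # sorted only to make naming repeatable
            while len(body) > 2:
                pair = body[:2]
                if pair not in pair_name:
                    pair_name[pair] = next(fresh_names)
                    new_rules[pair_name[pair]] = {pair}
                # Replace the first two symbols with the new variable.
                body = (pair_name[pair],) + body[2:]
            new_rules[head].add(body)
    return new_rules


# A rule with n = 5 variables needs n - 2 = 3 new variables.
chain = binarize(
    {"V": {("A1", "A2", "A3", "A4", "A5")}}, iter(["B1", "B2", "B3"])
)

# The example grammar after replacing terminals (Z -> 0, A -> 1).
mixed = {("C", "S", "C"), ("Z", "A", "B"), ("1",), ("C", "S"), ("S", "C")}
after_step_4 = {
    "S0": set(mixed),
    "S": set(mixed),
    "C": {("Z", "Z")},
    "B": {("Z", "A", "B"), ("1",)},
    "Z": {("0",)},
    "A": {("1",)},
}
result = binarize(after_step_4, iter(["D", "E"]))
```

For the example grammar this introduces D ⟶ CS and E ⟶ ZA, so CSC becomes DC and ZAB becomes EB.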
Example
S ⟶ CSC | B
C ⟶ 00 | ε
B ⟶ 01B | 1
Example
1. Add new start variable
S0 ⟶ S
S ⟶ CSC | B
C ⟶ 00 | ε
B ⟶ 01B | 1
Example
2. Eliminate ε-rules
S0 ⟶ S
S ⟶ CSC | B
C ⟶ 00 | ε
B ⟶ 01B | 1
Example
2. Eliminate ε-rules
S0 ⟶ S
S ⟶ CSC | B | CS | SC | S
C ⟶ 00
B ⟶ 01B | 1
Example
3. Eliminate Unary Productions
S0 ⟶ S
S ⟶ CSC | B | CS | SC | S
C ⟶ 00
B ⟶ 01B | 1
Example
3. Eliminate Unary Productions
S0 ⟶ S
S ⟶ CSC | B | CS | SC
C ⟶ 00
B ⟶ 01B | 1
Example
3. Eliminate Unary Productions
S0 ⟶ S
S ⟶ CSC | 01B | 1 | CS | SC
C ⟶ 00
B ⟶ 01B | 1
Example
3. Eliminate Unary Productions
S0 ⟶ CSC | 01B | 1 | CS | SC
S ⟶ CSC | 01B | 1 | CS | SC
C ⟶ 00
B ⟶ 01B | 1
Example
4. Create Vt for every terminal t
S0 ⟶ CSC | 01B | 1 | CS | SC
S ⟶ CSC | 01B | 1 | CS | SC
C ⟶ 00
B ⟶ 01B | 1
Z⟶0
Example
4. Create Vt for every terminal t
S0 ⟶ CSC | Z1B | 1 | CS | SC
S ⟶ CSC | Z1B | 1 | CS | SC
C ⟶ ZZ
B ⟶ Z1B | 1
Z⟶0
Example
4. Create Vt for every terminal t
S0 ⟶ CSC | Z1B | 1 | CS | SC
S ⟶ CSC | Z1B | 1 | CS | SC
C ⟶ ZZ
B ⟶ Z1B | 1
Z⟶0
A⟶1
Example
4. Create Vt for every terminal t
S0 ⟶ CSC | ZAB | 1 | CS | SC
S ⟶ CSC | ZAB | 1 | CS | SC
C ⟶ ZZ
B ⟶ ZAB | 1
Z⟶0
A⟶1
Example
5. Take care of long rules
S0 ⟶ CSC | ZAB | 1 | CS | SC
S ⟶ CSC | ZAB | 1 | CS | SC
C ⟶ ZZ
B ⟶ ZAB | 1
Z⟶0
A⟶1
D ⟶ CS
Example
5. Take care of long rules
S0 ⟶ DC | ZAB | 1 | CS | SC
S ⟶ DC | ZAB | 1 | CS | SC
C ⟶ ZZ
B ⟶ ZAB | 1
Z⟶0
A⟶1
D ⟶ CS
Example
5. Take care of long rules
S0 ⟶ DC | ZAB | 1 | CS | SC
S ⟶ DC | ZAB | 1 | CS | SC
C ⟶ ZZ
B ⟶ ZAB | 1
Z⟶0
A⟶1
D ⟶ CS
E ⟶ ZA
Example
5. Take care of long rules
S0 ⟶ DC | EB | 1 | CS | SC
S ⟶ DC | EB | 1 | CS | SC
C ⟶ ZZ
B ⟶ EB | 1
Z⟶0
A⟶1
D ⟶ CS
E ⟶ ZA
CYK Introduction
• Problem: Given a context free grammar and a
string s, can we decide whether s can
be generated by the grammar or not?
• This is always possible, but for a grammar in
arbitrary form the search is not efficient.
• If the grammar is in Chomsky Normal Form,
we have an elegant algorithm for testing this,
the CYK algorithm.
The CYK algorithm
• Suppose that we are given a grammar in
Chomsky Normal Form:
S → AB
A → BB | 0
B → AA | 1
• We would like to see if 10110 is generated by
this grammar or not.
Substrings of length 1
• Since the only way to produce terminals is by
following the rules A → a, just replace every
terminal with the variables that produce it.
string     1  0  1  1  0
length 1   B  A  B  B  A
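The replacement of terminals by the variables that produce them can be sketched as below. The dict-of-sets representation of the grammar and the name first_row are assumptions of the sketch.

```python
# S -> AB, A -> BB | 0, B -> AA | 1
grammar = {
    "S": {("A", "B")},
    "A": {("B", "B"), ("0",)},
    "B": {("A", "A"), ("1",)},
}


def first_row(rules, string):
    """For each terminal, the set of variables with a rule A -> a."""
    return [
        {head for head, bodies in rules.items() if (ch,) in bodies}
        for ch in string
    ]


row1 = first_row(grammar, "10110")
```

For 10110 each set has exactly one element: B, A, B, B, A.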
Substrings of length 2
Suppose now that we want to see how every substring
of length 2 can be generated. This is equivalent to
finding ways to produce all the length 2 substrings
where terminals are replaced with the variables that
represent them. But since every rule is of the form
A → BC, it suffices to replace every two consecutive
variables with the variables that produce them
(- means that no variable produces the pair).
string     1  0  1  1  0
length 1   B  A  B  B  A
length 2   -  S  A  -
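Combining two consecutive entries amounts to looking for rules A → BC whose right side matches the pair. A small sketch, assuming the dict-of-sets representation; the helper name producers is hypothetical.

```python
# S -> AB, A -> BB | 0, B -> AA | 1
grammar = {
    "S": {("A", "B")},
    "A": {("B", "B"), ("0",)},
    "B": {("A", "A"), ("1",)},
}


def producers(rules, left, right):
    """Variables with a rule A -> BC where B is in left and C is in right."""
    return {
        head
        for head, bodies in rules.items()
        for b in left
        for c in right
        if (b, c) in bodies
    }


# Variables for the single symbols of 1 0 1 1 0.
row1 = [{"B"}, {"A"}, {"B"}, {"B"}, {"A"}]
# One entry for every two consecutive symbols.
row2 = [producers(grammar, row1[i], row1[i + 1]) for i in range(len(row1) - 1)]
```

An empty set plays the role of the - entry: BA gives nothing, AB gives S, BB gives A.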
Substrings of length 3
• To produce the substring 101 (in 10110) we can
either take 1 with 01 or 10 with 1. The first choice
gives BS, which cannot be produced by any variable.
The second gives no pair, since 10 cannot be produced.
string     1  0  1  1  0
length 1   B  A  B  B  A
length 2   -  S  A  -
length 3   -
Substrings of length 3
• To produce the substring 011 (in 10110) we can
either take 0 with 11 or 01 with 1. The first choice
gives AA, which can be produced by B. The second
gives SB, which cannot be produced by any variable.
string     1  0  1  1  0
length 1   B  A  B  B  A
length 2   -  S  A  -
length 3   -  B
Substrings of length 3
• To produce the substring 110 (in 10110) we can either
take 1 with 10 or 11 with 0. The first choice gives no
pair, since 10 cannot be produced by a variable. The
second gives AA, which can be produced by B.
string     1  0  1  1  0
length 1   B  A  B  B  A
length 2   -  S  A  -
length 3   -  B  B
Substrings of length 4
• To produce the substring 1011 (in 10110) we can
take 1 with 011 or 10 with 11, or 101 with 1. The first
choice gives BB, which can be produced by A. The other
two give no pair, since 10 and 101 cannot be produced.
string     1  0  1  1  0
length 1   B  A  B  B  A
length 2   -  S  A  -
length 3   -  B  B
length 4   A
Substrings of length 4
• To produce the substring 0110 (in 10110) we can
take 0 with 110 or 01 with 10, or 011 with 0. The first
choice gives AB, which can be produced by S. The second
gives no pair, since 10 cannot be produced. The third
gives BA, which cannot be produced by any variable.
string     1  0  1  1  0
length 1   B  A  B  B  A
length 2   -  S  A  -
length 3   -  B  B
length 4   A  S
Combine previous solutions
• In order now to produce the whole string 10110 we
can take 1 with 0110 or 10 with 110 or 101 with 10,
or 1011 with 0. The first choice gives BS, which cannot
be produced. The second and third give no pair, since
10 and 101 cannot be produced. The last gives AA, which
is produced by B.
string     1  0  1  1  0
length 1   B  A  B  B  A
length 2   -  S  A  -
length 3   -  B  B
length 4   A  S
length 5   B
Answer
• If the last line contains the start variable S, we
can find a derivation for the string by tracing
backwards the way S was produced.
• In our example, 10110 cannot be generated
since S was not found in the last line.
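The whole procedure can be sketched as follows, assuming the dict-of-sets representation of the grammar. The names cyk_table and generates are hypothetical; table[l][i] holds the variables for the substring of length l + 1 that starts at index i (counting from 0).

```python
# S -> AB, A -> BB | 0, B -> AA | 1
grammar = {
    "S": {("A", "B")},
    "A": {("B", "B"), ("0",)},
    "B": {("A", "A"), ("1",)},
}


def cyk_table(rules, string):
    n = len(string)
    table = [[set() for _ in range(n - l)] for l in range(n)]
    # Substrings of length 1: variables with a rule A -> a.
    for i, ch in enumerate(string):
        table[0][i] = {h for h, bodies in rules.items() if (ch,) in bodies}
    # Longer substrings: try every way to cut them in two parts.
    for l in range(1, n):
        for i in range(n - l):
            for k in range(l):  # the left part has length k + 1
                left = table[k][i]
                right = table[l - k - 1][i + k + 1]
                for h, bodies in rules.items():
                    if any((b, c) in bodies for b in left for c in right):
                        table[l][i].add(h)
    return table


def generates(rules, start, string):
    if not string:
        # The empty string needs the rule S -> ε.
        return () in rules[start]
    # The last line has a single entry, for the whole string.
    return start in cyk_table(rules, string)[-1][0]
```

With this sketch, generates(grammar, "S", "10110") is False, since the last line of the table holds only B.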
Mechanical way
• Now that we have shown why this method works,
let's give an easy way to compute the table.
Mechanical way
• Suppose that we are about to fill in the position
with the circle. We take the pairs that the
arrows designate: the first entry of each pair
moves down the column above the position, while
the second moves up the diagonal to its upper right.
[Figure: the table for 10110 with the position to fill circled and arrows joining each pair of entries]
string     1  0  1  1  0
length 1   B  A  B  B  A
length 2   -  S  A  -
length 3   -  B  B
length 4   A  S
Mechanical way
• So finally:
string     1  0  1  1  0
length 1   B  A  B  B  A
length 2   -  S  A  -
length 3   -  B  B
length 4   A  S
length 5   B
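The pairs of positions that have to be combined follow a fixed pattern, which can be listed mechanically. In the sketch below a position is written as (length, column), where column is the 1-based index of the first symbol of the substring. Both this convention and the name cell_pairs are assumptions of the sketch.

```python
def cell_pairs(length, column):
    """Pairs of positions to combine when filling position (length, column)."""
    return [
        # left part of length k, right part covering the rest
        ((k, column), (length - k, column + k))
        for k in range(1, length)
    ]


# Position for the substring 0110 of 10110: length 4, starting at column 2.
pairs = cell_pairs(4, 2)
```

For the substring 0110 this gives the three pairs that correspond to taking 0 with 110, 01 with 10, and 011 with 0.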
A string that is produced
• Run the CYK algorithm for the string 10111
string     1  0  1  1  1
length 1   B  A  B  B  B
A string that is produced
• Run the CYK algorithm for the string 10111
string     1  0  1  1  1
length 1   B  A  B  B  B
length 2   -  S  A  A
A string that is produced
• Run the CYK algorithm for the string 10111
string     1  0  1  1  1
length 1   B  A  B  B  B
length 2   -  S  A  A
length 3   -  B  S
A string that is produced
• Run the CYK algorithm for the string 10111
string     1  0  1  1  1
length 1   B  A  B  B  B
length 2   -  S  A  A
length 3   -  B  S
length 4   A  A
A string that is produced
• Run the CYK algorithm for the string 10111
string     1  0  1  1  1
length 1   B  A  B  B  B
length 2   -  S  A  A
length 3   -  B  S
length 4   A  A
length 5   S
A string that is produced
• Run the CYK algorithm for the string 10111
string     1  0  1  1  1
length 1   B  A  B  B  B
length 2   -  S  A  A
length 3   -  B  S
length 4   A  A
length 5   S
The derivation is:
S → AB → BBB → BAAB → BABBB → 10111
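Tracing backwards can be sketched by remembering, for every variable in the table, one way in which it was produced. The sketch assumes the dict-of-sets representation, a non-empty input string and one-letter variable names. The name derivation_steps is hypothetical. Like the derivation above, it expands variables first and writes the terminals in the last step.

```python
# S -> AB, A -> BB | 0, B -> AA | 1
grammar = {
    "S": {("A", "B")},
    "A": {("B", "B"), ("0",)},
    "B": {("A", "A"), ("1",)},
}


def derivation_steps(rules, start, string):
    n = len(string)  # assumed to be at least 1
    # back[l][i][A] records how A produces the substring of
    # length l + 1 starting at index i.
    back = [[{} for _ in range(n - l)] for l in range(n)]
    for i, ch in enumerate(string):
        for h, bodies in rules.items():
            if (ch,) in bodies:
                back[0][i][h] = ch
    for l in range(1, n):
        for i in range(n - l):
            for k in range(l):
                left, right = back[k][i], back[l - k - 1][i + k + 1]
                for h, bodies in rules.items():
                    for b in left:
                        for c in right:
                            if (b, c) in bodies and h not in back[l][i]:
                                back[l][i][h] = (k, b, c)
    if start not in back[n - 1][0]:
        return None  # the string is not generated

    form = [(start, n - 1, 0)]  # entries are (variable, l, i)
    steps = [start]
    while True:
        # Leftmost variable that still covers more than one symbol.
        idx = next((j for j, (_, l, _) in enumerate(form) if l > 0), None)
        if idx is None:
            break
        h, l, i = form[idx]
        k, b, c = back[l][i][h]
        form[idx:idx + 1] = [(b, k, i), (c, l - k - 1, i + k + 1)]
        steps.append("".join(v for v, _, _ in form))
    steps.append(string)  # finally apply the rules A -> a
    return steps
```

For 10111 the returned steps are S, AB, BBB, BAAB, BABBB and 10111, while for 10110 the function returns None.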