In general

Simplifications of Context-Free Grammars
A Substitution Rule
Original grammar:
S → aB
A → aaA
A → abBc
B → aA
B → b
Substitute B → b
Equivalent grammar:
S → aB | ab
A → aaA
A → abBc | abbc
B → aA
A Substitution Rule
Grammar:
S → aB | ab
A → aaA
A → abBc | abbc
B → aA
Substitute B → aA
Equivalent grammar:
S → aB | ab | aaA
A → aaA
A → abBc | abbc | abaAc
In general:
A → xBz
B → y1
Substitute B → y1
Equivalent grammar:
A → xBz | xy1z
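The substitution rule can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `substitute` and the representation (a dict mapping each variable to a list of right-hand sides, each a tuple of symbols) are assumptions made for this example. The sketch only adds the new alternatives and keeps the production B → y, whereas the slides drop it once it has been substituted everywhere.

```python
def substitute(grammar, var, body):
    """Apply the substitution rule for the production var -> body.

    For every production A -> x var z, the alternative A -> x body z is
    added. Existing productions are kept, so the language is unchanged.
    """
    result = {}
    for head, bodies in grammar.items():
        new_bodies = list(bodies)
        for rhs in bodies:
            for i, symbol in enumerate(rhs):
                if symbol == var:
                    # Replace this one occurrence of var by body.
                    candidate = rhs[:i] + tuple(body) + rhs[i + 1:]
                    if candidate not in new_bodies:
                        new_bodies.append(candidate)
        result[head] = new_bodies
    return result


# Grammar of the first example: S -> aB, A -> aaA | abBc, B -> aA | b
grammar = {
    "S": [("a", "B")],
    "A": [("a", "a", "A"), ("a", "b", "B", "c")],
    "B": [("a", "A"), ("b",)],
}
step1 = substitute(grammar, "B", ("b",))     # substitute B -> b
step2 = substitute(step1, "B", ("a", "A"))   # substitute B -> aA
print(step2["S"])
print(step2["A"])
```

After both calls, the right-hand sides of S and A match the second equivalent grammar in the example: S → aB | ab | aaA and A → aaA | abBc | abbc | abaAc.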
Nullable Variables
λ-production: A → λ
Nullable variable: A ⇒ … ⇒ λ
Removing Nullable Variables
Example Grammar:
S → aMb
M → aMb
M → λ
Nullable variable: M
Substitute M → λ
Final Grammar:
S → aMb
S → ab
M → aMb
M → ab
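A possible Python sketch of this step is shown below. It first computes the set of nullable variables as a fixed point, then emits every variant of each production with nullable occurrences kept or dropped. The name `remove_lambda_productions` and the representation (variable → set of tuples, with the empty tuple standing for λ) are assumptions of this example, and the sketch assumes λ is not in the language.

```python
from itertools import product


def remove_lambda_productions(grammar):
    """Return an equivalent grammar without lambda-productions."""
    # A variable is nullable if some right-hand side consists only of
    # nullable variables (the empty tuple qualifies trivially).
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                all(sym in nullable for sym in rhs) for rhs in bodies
            ):
                nullable.add(head)
                changed = True

    result = {}
    for head, bodies in grammar.items():
        new_bodies = set()
        for rhs in bodies:
            # Nullable symbols may be kept or dropped; others are kept.
            options = [
                ((sym,), ()) if sym in nullable else ((sym,),) for sym in rhs
            ]
            for choice in product(*options):
                candidate = tuple(s for part in choice for s in part)
                if candidate:  # never re-introduce a lambda-production
                    new_bodies.add(candidate)
        result[head] = new_bodies
    return result


# Example grammar: S -> aMb, M -> aMb | lambda
grammar = {"S": {("a", "M", "b")}, "M": {("a", "M", "b"), ()}}
simplified = remove_lambda_productions(grammar)
print(simplified)
```

On the example grammar this yields S → aMb | ab and M → aMb | ab, the same productions as the final grammar above.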
Unit-Productions
Unit production: A → B
(a single variable on both sides)
Removing Unit Productions
Observation: A → A is removed immediately
Example Grammar:
S → aA
A → a
A → B
B → A
B → bb
Substitute A → B:
S → aA | aB
A → a
B → A | B
B → bb
Remove B → B:
S → aA | aB
A → a
B → A
B → bb
Substitute B → A:
S → aA | aB | aA
A → a
B → bb
Remove repeated productions
Final grammar:
S → aA | aB
A → a
B → bb
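For comparison, here is a Python sketch that removes unit productions with a different technique from the step-by-step substitution above: the unit-closure method. For each variable it collects all variables reachable through unit productions and copies their non-unit right-hand sides. The name `remove_unit_productions` and the dict-of-sets representation are assumptions of this example. The resulting grammar is not identical to the final grammar above, but it generates the same language.

```python
def remove_unit_productions(grammar):
    """Remove unit productions using unit closures (variable -> set of tuples)."""
    variables = set(grammar)

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in variables

    result = {}
    for head in grammar:
        # Collect every variable reachable from head via unit productions.
        closure = {head}
        stack = [head]
        while stack:
            current = stack.pop()
            for rhs in grammar[current]:
                if is_unit(rhs) and rhs[0] not in closure:
                    closure.add(rhs[0])
                    stack.append(rhs[0])
        # head inherits every non-unit right-hand side found in its closure.
        result[head] = {
            rhs for var in closure for rhs in grammar[var] if not is_unit(rhs)
        }
    return result


# Example grammar: S -> aA, A -> a | B, B -> A | bb
grammar = {
    "S": {("a", "A")},
    "A": {("a",), ("B",)},
    "B": {("A",), ("b", "b")},
}
no_units = remove_unit_productions(grammar)
print(no_units)
```

Here A and B both end up with the right-hand sides a and bb, so S → aA still derives exactly aa and abb, as in the final grammar of the example.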
Useless Productions
S → aSb
S → λ
S → A
A → aA    (useless production)
Some derivations never terminate:
S ⇒ A ⇒ aA ⇒ aaA ⇒ … ⇒ aa…aA ⇒ …
Another grammar:
S → A
A → aA
A → λ
B → bA    (useless production: not reachable from S)
In general:
if S ⇒ … ⇒ xAy ⇒ … ⇒ w, where w ∈ L(G) (w contains only terminals),
then variable A is useful;
otherwise, variable A is useless.
A production A → x is useless if any of its variables is useless.
Productions        Variables
S → aSb
S → λ
S → A    useless
A → aA   useless    A: useless
B → C    useless    B: useless
C → D    useless    C: useless
Removing Useless Productions
Example Grammar:
S → aS | A | C
A → a
B → aa
C → aCb
First: find all variables that can produce strings with only terminals
Round 1: {A, B}    (from A → a and B → aa)
Round 2: {A, B, S}    (from S → A)
Keep only the variables that produce terminal strings: {A, B, S}
(the remaining variables are useless)
Remove useless productions:
S → aS | A
A → a
B → aa
Second: find all variables reachable from S
Use a dependency graph: S has an edge to A; B is not reachable
Keep only the variables reachable from S
(the remaining variables are useless)
Remove useless productions
Final Grammar:
S → aS | A
A → a
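The two phases can be sketched in Python as below. This is an illustrative sketch under stated assumptions: the names `remove_useless` and `is_variable` are hypothetical, the grammar is a dict from variable to a set of right-hand-side tuples, and variables are recognized by an initial uppercase letter (as in the slides).

```python
def is_variable(symbol):
    # Assumption of this sketch: variables start with an uppercase letter.
    return symbol[0].isupper()


def remove_useless(grammar, start="S"):
    # Phase 1: find variables that derive some string of terminals only.
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in generating:
                continue
            if any(
                all(not is_variable(s) or s in generating for s in rhs)
                for rhs in bodies
            ):
                generating.add(head)
                changed = True

    # Keep only productions whose variables are all generating.
    pruned = {
        head: {
            rhs for rhs in bodies
            if all(not is_variable(s) or s in generating for s in rhs)
        }
        for head, bodies in grammar.items()
        if head in generating
    }

    # Phase 2: depth-first search of the dependency graph from the start.
    reachable = set()
    stack = [start]
    while stack:
        head = stack.pop()
        if head in reachable or head not in pruned:
            continue
        reachable.add(head)
        for rhs in pruned[head]:
            stack.extend(s for s in rhs if is_variable(s))

    return {head: pruned[head] for head in reachable}


# Example grammar: S -> aS | A | C, A -> a, B -> aa, C -> aCb
grammar = {
    "S": {("a", "S"), ("A",), ("C",)},
    "A": {("a",)},
    "B": {("a", "a")},
    "C": {("a", "C", "b")},
}
cleaned = remove_useless(grammar)
print(cleaned)
```

The order of the phases matters: removing non-generating variables first can make further variables unreachable, which the second phase then removes.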
Removing All
Step 1: Remove Nullable Variables
Step 2: Remove Unit-Productions
Step 3: Remove Useless Variables
Normal Forms for Context-Free Grammars
Chomsky Normal Form
Each production has the form:
A → BC    (B and C are variables)
or
A → a    (a is a terminal)
Examples:
Chomsky Normal Form:
S → AS
S → a
A → SA
A → b
Not Chomsky Normal Form:
S → AS
S → AAS
A → SA
A → aa
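The definition translates directly into a small check. The sketch below is only illustrative: `is_chomsky` is a hypothetical name, and the grammar is assumed to be a dict from variable to a set of right-hand-side tuples, with every variable appearing as a key.

```python
def is_chomsky(grammar):
    """True if every production is A -> BC (two variables) or A -> a (one terminal)."""
    variables = set(grammar)
    for bodies in grammar.values():
        for rhs in bodies:
            two_variables = len(rhs) == 2 and all(s in variables for s in rhs)
            one_terminal = len(rhs) == 1 and rhs[0] not in variables
            if not (two_variables or one_terminal):
                return False
    return True


# The two example grammars from the slide.
in_cnf = {"S": {("A", "S"), ("a",)}, "A": {("S", "A"), ("b",)}}
not_cnf = {"S": {("A", "S"), ("A", "A", "S")}, "A": {("S", "A"), ("a", "a")}}
print(is_chomsky(in_cnf), is_chomsky(not_cnf))
```

The second grammar fails because of S → AAS (three variables) and A → aa (two terminals).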
Conversion to Chomsky Normal Form
Example (not in Chomsky Normal Form):
S → ABa
A → aab
B → Ac
Introduce variables for terminals: Ta, Tb, Tc
S → ABTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c
Introduce intermediate variable V1:
S → AV1
V1 → BTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c
Introduce intermediate variable V2:
S → AV1
V1 → BTa
A → TaV2
V2 → TaTb
B → ATc
Ta → a
Tb → b
Tc → c
This is the final grammar in Chomsky Normal Form.
In general:
From any context-free grammar (which doesn't produce λ) that is not in Chomsky Normal Form,
we can obtain an equivalent grammar in Chomsky Normal Form.
The Procedure
First remove:
Nullable variables
Unit productions
Then, for every terminal symbol a:
Add production Ta → a (new variable: Ta)
In productions with right-hand sides of length at least 2: replace a with Ta
Replace any production A → C1C2…Cn (n > 2)
with
A → C1V1
V1 → C2V2
…
Vn-2 → Cn-1Cn
New intermediate variables: V1, V2, …, Vn-2
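The procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the grammar is a dict from variable to a list of right-hand-side tuples, nullable variables and unit productions have already been removed, and the generated names `Ta` and `V1, V2, …` do not clash with existing variables. The function name `to_chomsky_normal_form` is hypothetical.

```python
def to_chomsky_normal_form(grammar):
    variables = set(grammar)
    terminal_vars = {}  # terminal -> name of its new variable, e.g. "a" -> "Ta"

    # Step 1: in right-hand sides of length >= 2, replace terminal a by Ta.
    lifted = {}
    for head, bodies in grammar.items():
        lifted[head] = []
        for rhs in bodies:
            if len(rhs) >= 2:
                rhs = tuple(
                    s if s in variables else terminal_vars.setdefault(s, "T" + s)
                    for s in rhs
                )
            lifted[head].append(rhs)
    for terminal, name in terminal_vars.items():
        lifted[name] = [(terminal,)]  # add production Ta -> a

    # Step 2: split A -> C1 C2 ... Cn (n > 2) with fresh variables V1, V2, ...
    result = {}
    counter = 0
    for head, bodies in lifted.items():
        result.setdefault(head, [])
        for rhs in bodies:
            current = head
            while len(rhs) > 2:
                counter += 1
                fresh = "V" + str(counter)
                result.setdefault(current, []).append((rhs[0], fresh))
                current, rhs = fresh, rhs[1:]
            result.setdefault(current, []).append(rhs)
    return result


# Example: S -> ABa, A -> aab, B -> Ac
grammar = {
    "S": [("A", "B", "a")],
    "A": [("a", "a", "b")],
    "B": [("A", "c")],
}
cnf = to_chomsky_normal_form(grammar)
for head, bodies in cnf.items():
    for rhs in bodies:
        print(head, "->", " ".join(rhs))
```

For the example grammar the sketch produces S → A V1, V1 → B Ta, A → Ta V2, V2 → Ta Tb, B → A Tc, Ta → a, Tb → b, Tc → c.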
Theorem:
For any context-free grammar (which doesn't produce λ)
there is an equivalent grammar in Chomsky Normal Form
Observations
• Chomsky normal forms are good
for parsing and proving theorems
• It is very easy to find the Chomsky normal
form for any context-free grammar
Greibach Normal Form
All productions have the form:
A → a V1V2…Vk,  k ≥ 0
where a is a terminal symbol and V1, …, Vk are variables
Examples:
Greibach Normal Form:
S → cAB
A → aA | bB | b
B → b
Not Greibach Normal Form:
S → abSb
S → aa
Conversion to Greibach Normal Form:
S → abSb
S → aa
becomes
S → aTbSTb
S → aTa
Ta → a
Tb → b
(Greibach Normal Form)
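As a quick illustration of the definition, the following Python sketch checks whether a grammar has the required shape. The name `is_greibach` and the representation (variable → set of right-hand-side tuples, every variable present as a key) are assumptions of this example. It only tests the form of the productions and does not perform the conversion in general.

```python
def is_greibach(grammar):
    """True if every production is A -> a V1 ... Vk with k >= 0."""
    variables = set(grammar)
    for bodies in grammar.values():
        for rhs in bodies:
            # The right-hand side must start with a terminal ...
            if not rhs or rhs[0] in variables:
                return False
            # ... followed only by variables.
            if any(s not in variables for s in rhs[1:]):
                return False
    return True


good = {
    "S": {("c", "A", "B")},
    "A": {("a", "A"), ("b", "B"), ("b",)},
    "B": {("b",)},
}
bad = {"S": {("a", "b", "S", "b"), ("a", "a")}}
converted = {
    "S": {("a", "Tb", "S", "Tb"), ("a", "Ta")},
    "Ta": {("a",)},
    "Tb": {("b",)},
}
print(is_greibach(good), is_greibach(bad), is_greibach(converted))
```

The grammar `bad` fails because terminals appear after the first position; replacing them with Ta and Tb, as in the conversion above, fixes this.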
Theorem:
For any context-free grammar (which doesn't produce λ)
there is an equivalent grammar in Greibach Normal Form
Observations
• Greibach normal forms are very good for parsing
• It is hard to find the Greibach normal form of any context-free grammar
The CYK Parser
The CYK Membership Algorithm
Input:
• Grammar G in Chomsky Normal Form
• String w
Output:
find if w ∈ L(G)
The Algorithm
Input example:
• Grammar G:
S → AB
A → BB
A → a
B → AB
B → b
• String w: aabbb
Variables that derive each substring of w, by substring length:
Length 1:  a {A}    a {A}    b {B}    b {B}    b {B}
Length 2:  aa (none)    ab {S,B}    bb {A}    bb {A}
Length 3:  aab {S,B}    abb {A}    bbb {S,B}
Length 4:  aabb {A}    abbb {S,B}
Length 5:  aabbb {S,B}
Therefore: aabbb ∈ L(G)
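The table-filling procedure of the example can be sketched in Python as follows. The names `cyk_table` and `cyk_accepts` and the representation (variable → set of right-hand-side tuples, grammar already in Chomsky Normal Form) are assumptions of this illustration rather than part of the slides.

```python
def cyk_table(grammar, word):
    """table[(i, length)] is the set of variables deriving word[i:i+length]."""
    n = len(word)
    table = {}
    # Substrings of length 1: use productions A -> a.
    for i, ch in enumerate(word):
        table[(i, 1)] = {
            head for head, bodies in grammar.items() if (ch,) in bodies
        }
    # Longer substrings: try every split point and productions A -> BC.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = set()
            for split in range(1, length):
                left = table[(i, split)]
                right = table[(i + split, length - split)]
                for head, bodies in grammar.items():
                    for rhs in bodies:
                        if len(rhs) == 2 and rhs[0] in left and rhs[1] in right:
                            cell.add(head)
            table[(i, length)] = cell
    return table


def cyk_accepts(grammar, word, start="S"):
    if not word:
        return False  # a grammar in Chomsky Normal Form cannot derive lambda
    return start in cyk_table(grammar, word)[(0, len(word))]


# Grammar and string of the example.
grammar = {
    "S": {("A", "B")},
    "A": {("B", "B"), ("a",)},
    "B": {("A", "B"), ("b",)},
}
table = cyk_table(grammar, "aabbb")
print(cyk_accepts(grammar, "aabbb"))
```

The three nested loops over substring length, start position, and split point account for the cubic dependence on |w|; the inner loop over productions adds a factor that depends only on the grammar.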
Time Complexity: |w|^3
Observation:
The CYK algorithm can be easily converted to a parser
(bottom-up parser)