Simplification of context-free grammars
Removing useless productions
Definition
Let G = (N, T, P, S) be a context-free grammar. A nonterminal symbol A ∈ N is said to be useful if and only if there
is at least one w ∈ L(G) such that
S ⇒* xAy ⇒* w,
with x, y ∈ (N ∪ T)*.
A nonterminal that is not useful is called useless.
A production is useless if it involves any useless nonterminal.
Two reasons why a nonterminal is useless
1. Nonterminal cannot be reached from the start symbol
2. Nonterminal cannot derive a terminal string
Example:
Eliminate useless symbols and productions from
G = (N, T, P, S) where N = {S, A, B, C} and T = {a, b} with P
consisting of the rules S → aS | A | C, A → a, B → aa, C → aCb.
(1) Identify the set of nonterminals χ that can lead to a
terminal string:
Initially χ = {A, B}, because of A → a and B → aa.
Then χ = {A, B, S}, because of S ⇒ A ⇒ a.
This argument cannot be made for C, thus identifying it as
useless.
Removing C and its corresponding productions we obtain:
S → aS | A, A → a, B → aa.
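Step (1) can be sketched as a fixed-point computation. The following is a minimal illustration, not a reference implementation; the dictionary representation of P and the function names generating_nonterminals and drop_nongenerating are assumptions made for this sketch.

```python
def generating_nonterminals(productions, terminals):
    """Return the set of nonterminals that can derive a terminal string."""
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head in generating:
                continue
            # head is generating if some body consists only of terminals
            # and nonterminals already known to be generating
            if any(all(s in terminals or s in generating for s in body)
                   for body in bodies):
                generating.add(head)
                changed = True
    return generating


def drop_nongenerating(productions, terminals):
    """Remove every production that mentions a non-generating nonterminal."""
    keep = generating_nonterminals(productions, terminals)
    return {
        head: [b for b in bodies
               if all(s in terminals or s in keep for s in b)]
        for head, bodies in productions.items() if head in keep
    }


# Grammar from the example: S → aS | A | C, A → a, B → aa, C → aCb
P = {
    "S": [("a", "S"), ("A",), ("C",)],
    "A": [("a",)],
    "B": [("a", "a")],
    "C": [("a", "C", "b")],
}
T = {"a", "b"}

chi = generating_nonterminals(P, T)
P1 = drop_nongenerating(P, T)
print(sorted(chi))  # ['A', 'B', 'S']
```

The loop repeats until no new nonterminal is added, which mirrors how χ grows from {A, B} to {A, B, S} in the example.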
Example (continued)
(2) Eliminate the nonterminals that cannot be reached from
the start symbol.
For this, we draw a dependency graph for the nonterminals.
For context-free grammars, a dependency graph has its vertices
labeled with nonterminals, with an edge from vertex C to
vertex D if and only if there is a production of the form C → xDy.
A dependency graph for our productions
S → aS | A, A → a, B → aa is
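[Figure: dependency graph with vertices S, A and B. S has an edge to itself and an edge to A; B has no incoming edges.]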
A nonterminal is useful only if there is a path from the vertex
labeled S to the vertex labeled with that nonterminal.
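The path condition can be checked with a breadth-first search over the dependency graph. This is a sketch under the assumption that the nonterminals are exactly the keys of the production dictionary; the name reachable_nonterminals is chosen here for illustration.

```python
from collections import deque


def reachable_nonterminals(productions, start):
    """Return the nonterminals reachable from `start` in the dependency graph."""
    seen = {start}
    queue = deque([start])
    while queue:
        head = queue.popleft()
        for body in productions.get(head, []):
            for symbol in body:
                # every nonterminal in a body of `head` gives an edge head -> symbol
                if symbol in productions and symbol not in seen:
                    seen.add(symbol)
                    queue.append(symbol)
    return seen


# Productions after step (1): S → aS | A, A → a, B → aa
P1 = {
    "S": [("a", "S"), ("A",)],
    "A": [("a",)],
    "B": [("a", "a")],
}

reach = reachable_nonterminals(P1, "S")
P_hat = {head: bodies for head, bodies in P1.items() if head in reach}
print(sorted(reach))  # ['A', 'S']
```

Since B is never visited, its production is dropped when P_hat is built.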
Example (continued)
In our example B is useless. Removing B and its corresponding
production we obtain Ĝ = (N̂, T̂, P̂, S) with N̂ = {S, A},
T̂ = {a} and P̂ = {S → aS | A, A → a}.
The formalization of this two-step process leads to a general
construction and the corresponding theorem.
Theorem
Let G = (N, T, P, S) be a context-free grammar. Then there
exists an equivalent grammar Ĝ = (N̂, T̂, P̂, S) that does
not contain any useless nonterminals or productions.
Removing λ-productions
Definition
Any production of a context-free grammar of the form A → λ
is called a λ-production. Any nonterminal A for which the
derivation A ⇒* λ is possible is called nullable.
A grammar may generate a language not containing λ, yet have
some λ-productions or nullable nonterminals. In such cases,
the λ-productions can be removed.
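The set of nullable nonterminals can be found by iterating until nothing changes. The sketch below assumes λ is represented by the empty tuple; the grammar in it is a hypothetical one chosen for illustration, and the name nullable_nonterminals is not from the text.

```python
def nullable_nonterminals(productions):
    """Return all nonterminals A with A =>* λ; λ is the empty tuple ()."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head in nullable:
                continue
            # a body made up only of nullable symbols can vanish entirely;
            # the empty body () satisfies this trivially
            if any(all(s in nullable for s in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


# Hypothetical grammar: S → aXb, X → YZ, Y → λ, Z → Y | b
P = {
    "S": [("a", "X", "b")],
    "X": [("Y", "Z")],
    "Y": [()],
    "Z": [("Y",), ("b",)],
}

nu = nullable_nonterminals(P)
print(sorted(nu))  # ['X', 'Y', 'Z']
```

Here X is nullable only indirectly, through Y and Z, although the language itself does not contain λ because every string derived from S starts with a and ends with b.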
Example:
Consider G = ({S, S1 }, {a, b}, P, S), where
P = {S → aS1 b, S1 → aS1 b | λ}.
This grammar generates the λ-free language {a^n b^n | n ≥ 1}.
To remove S1 → λ, substitute λ for S1 wherever it occurs on a right-hand side
and add the result as a new production:
S → aS1 b yields S → ab,
S1 → aS1 b yields S1 → ab.
We obtain: S → aS1 b | ab and S1 → aS1 b | ab.
This new grammar also generates {a^n b^n | n ≥ 1}.
In more general situations, substitutions for λ-productions can be
made in a similar, although more complicated, manner.
Example:
Find a context-free grammar without λ-productions equivalent to
the grammar defined by
S → ABaC, A → BC, B → b | λ, C → D | λ, D → d.
1. Find the nullable variables. First ν = {B, C}, because of B → λ and
C → λ. Then ν = {A, B, C}, because of A → BC.
2. Replace nullable variables with λ in all possible combinations. This adds
S → BaC | AaC | ABa | aC | Ba | Aa | a
A → B | C
After dropping the λ-productions B → λ and C → λ we obtain
P̂ = {S → ABaC | BaC | AaC | ABa | aC | Ba | Aa | a,
A → BC | B | C, B → b, C → D, D → d}.
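Step 2 of the example can be sketched by enumerating, for every right-hand side, all ways of keeping or dropping its nullable symbols. The code takes the set ν = {A, B, C} from step 1 as given; the function name remove_lambda_productions and the tuple representation (with () for λ) are assumptions of this sketch.

```python
from itertools import product


def remove_lambda_productions(productions, nullable):
    """Expand nullable symbols in all combinations and drop λ-productions."""
    result = {}
    for head, bodies in productions.items():
        new_bodies = []
        for body in bodies:
            # a nullable symbol may be kept or dropped; other symbols are kept
            choices = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for pick in product(*choices):
                candidate = tuple(sym for part in pick for sym in part)
                # skip the empty body, which would be a λ-production
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        if new_bodies:
            result[head] = new_bodies
    return result


# S → ABaC, A → BC, B → b | λ, C → D | λ, D → d
P = {
    "S": [("A", "B", "a", "C")],
    "A": [("B", "C")],
    "B": [("b",), ()],
    "C": [("D",), ()],
    "D": [("d",)],
}

P_hat = remove_lambda_productions(P, {"A", "B", "C"})
print(len(P_hat["S"]))  # 8
```

A nonterminal whose only production is a λ-production loses all its productions here; any remaining occurrences of it are then non-generating and are left for the removal of useless productions.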
Theorem
Let G be any context-free grammar with λ not in L(G).
Then there exists an equivalent grammar Ĝ having no λ-productions.
Removing unit-productions
Definition
Any production of a context-free grammar of the form
A → B,
where A, B ∈ N , is called a unit-production.
Example:
Remove the unit-productions in
P = {S → Aa | B, B → A | bb, A → a | bc | B}.
(1) Dependency graph for the unit-productions:
[Figure: dependency graph with vertices S, A and B and edges S → B, B → A, A → B.]
We see: S ⇒* B, S ⇒* A, B ⇒* A, A ⇒* B.
Example (continued)
(2) For the grammar Ĝ without unit-productions, put into P̂ all
non-unit productions of P: P̂1 = {S → Aa, B → bb, A → a | bc}.
(3) For all nonterminals S, A and B satisfying S ⇒* B, S ⇒* A,
B ⇒* A, A ⇒* B, add to P̂1
S → a (because of S ⇒* A and A → a)
S → bc (because of S ⇒* A and A → bc)
S → bb (because of S ⇒* B and B → bb)
A → bb (because of A ⇒* B and B → bb)
B → a (because of B ⇒* A and A → a)
B → bc (because of B ⇒* A and A → bc)
(4) We obtain Ĝ with
P̂ = {S → a | bc | bb | Aa, A → a | bc | bb, B → a | bc | bb}.
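Steps (1) to (4) can be sketched as follows. The sketch assumes the grammar has no λ-productions and that the nonterminals are exactly the dictionary keys; remove_unit_productions and its helper names are chosen for illustration.

```python
def remove_unit_productions(productions):
    """Replace unit-productions A → B by the non-unit productions reachable via them."""
    nonterminals = set(productions)

    def is_unit(body):
        return len(body) == 1 and body[0] in nonterminals

    # unit_reach[A] = all B != A with A =>* B using unit-productions only
    unit_reach = {}
    for start in nonterminals:
        seen, stack = set(), [start]
        while stack:
            current = stack.pop()
            for body in productions[current]:
                if is_unit(body) and body[0] not in seen and body[0] != start:
                    seen.add(body[0])
                    stack.append(body[0])
        unit_reach[start] = seen

    result = {}
    for head in productions:
        # step (2): keep the non-unit productions of head
        bodies = [b for b in productions[head] if not is_unit(b)]
        # step (3): for head =>* target, copy the non-unit productions of target
        for target in sorted(unit_reach[head]):
            for body in productions[target]:
                if not is_unit(body) and body not in bodies:
                    bodies.append(body)
        result[head] = bodies
    return result


# P = {S → Aa | B, B → A | bb, A → a | bc | B}
P = {
    "S": [("A", "a"), ("B",)],
    "B": [("A",), ("b", "b")],
    "A": [("a",), ("b", "c"), ("B",)],
}

P_hat = remove_unit_productions(P)
print(P_hat["S"])  # [('A', 'a'), ('a',), ('b', 'c'), ('b', 'b')]
```

The inner search plays the role of the dependency graph: it follows only edges that come from unit-productions.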
Theorem
Let G = (N, T, P, S) be any context-free grammar without λ-productions. Then there exists a context-free grammar Ĝ = (N̂, T̂, P̂, S) that does not contain any unit-productions and that is equivalent to G.
Theorem
Let L be a context-free language that does not contain λ. Then there exists a context-free grammar that generates L and that does not have any useless productions, λ-productions, or unit-productions.
Removing one type of production can introduce another (removing
λ-productions can create new unit-productions). To avoid this, we
apply the steps in the following order:
1. remove λ-productions,
2. remove unit-productions,
3. remove useless productions.
Chomsky Normal Form
A normal form is one that, although restricted, is broad
enough so that any grammar has an equivalent normal-form
version.
Definition
A context-free grammar is in Chomsky normal form (CNF)
if all productions are of the form
A → BC or
A → a,
where A, B, C ∈ N and a ∈ T.
Example:
The grammar
S → AS | a,
A → SA | b
is in Chomsky normal form. The grammar
S → AS | AAS,
A → SA | aa
is not.
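The definition translates directly into a check over all productions. The following sketch tests the two grammars of the example; the function name is_cnf and the dictionary representation are assumptions made here.

```python
def is_cnf(productions, terminals):
    """Check that every production has the form A → BC or A → a."""
    nonterminals = set(productions)
    for bodies in productions.values():
        for body in bodies:
            binary = len(body) == 2 and all(s in nonterminals for s in body)
            terminal = len(body) == 1 and body[0] in terminals
            if not (binary or terminal):
                return False
    return True


T = {"a", "b"}

# S → AS | a, A → SA | b
G1 = {
    "S": [("A", "S"), ("a",)],
    "A": [("S", "A"), ("b",)],
}

# S → AS | AAS, A → SA | aa
G2 = {
    "S": [("A", "S"), ("A", "A", "S")],
    "A": [("S", "A"), ("a", "a")],
}

print(is_cnf(G1, T))  # True
print(is_cnf(G2, T))  # False
```

The second grammar fails for two reasons: S → AAS has three symbols on the right, and A → aa has two terminals where two nonterminals are required.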
Theorem
Any context-free grammar G = (N, T, P, S) with λ ∉ L(G)
has an equivalent grammar Ĝ = (N̂, T̂, P̂, S) in CNF.
We assume that G has no λ-productions and no unit-productions.
Construction of Ĝ:
(1) Construct G1 = (N1 , T, P1 , S) from G by considering all
productions in P of the form
A → x1 x2 . . . xn ,
where each xi is a symbol either in N or T .
If n = 1, then x1 must be a terminal (since we have no
unit-productions). In this case, we add the production to P1.
If n ≥ 2, introduce a new nonterminal Ba for each a ∈ T.
For each production A → x1 x2 . . . xn, we add to P1 the production
A → C1 C2 . . . Cn, where Ci = xi if xi is in N and Ci = Ba if xi = a.
For every Ba we also add to P1 the production Ba → a.
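Step (1) can be sketched as below for the grammar S → ABa, A → aab, B → Ac. The sketch assumes that names of the form B_a do not clash with existing nonterminals, and it introduces B_a only for terminals that actually occur in a right-hand side of length at least 2, a small simplification of the construction; the function name replace_terminals is hypothetical.

```python
def replace_terminals(productions, terminals):
    """Replace terminals in bodies of length >= 2 by fresh nonterminals B_a."""
    result = {}
    used = set()
    for head, bodies in productions.items():
        new_bodies = []
        for body in bodies:
            if len(body) == 1:
                # n = 1: the production A → a is copied unchanged
                new_bodies.append(body)
            else:
                new_body = []
                for s in body:
                    if s in terminals:
                        used.add(s)
                        new_body.append("B_" + s)
                    else:
                        new_body.append(s)
                new_bodies.append(tuple(new_body))
        result[head] = new_bodies
    # add B_a → a for every terminal that was replaced
    for a in sorted(used):
        result["B_" + a] = [(a,)]
    return result


# S → ABa, A → aab, B → Ac
P = {
    "S": [("A", "B", "a")],
    "A": [("a", "a", "b")],
    "B": [("A", "c")],
}

P1 = replace_terminals(P, {"a", "b", "c"})
print(P1["A"])  # [('B_a', 'B_a', 'B_b')]
```

After this step every right-hand side is either a single terminal or a string of nonterminals only.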
Proof idea (continued)
Now, we have a grammar G1 all of whose productions have the
form
(a) A → a or
(b) A → C1 C2 . . . Cn , where Ci ∈ N1 .
(2) Add to P̂ all productions of the form (a) and the productions
of form (b) with n = 2.
For n > 2, we introduce new nonterminals D1, D2, . . . , Dn−2
and add to P̂ the productions
A → C1 D1,
D1 → C2 D2,
. . .
Dn−2 → Cn−1 Cn.
The resulting grammar Ĝ is in CNF and L(G) = L(Ĝ).
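Step (2) can be sketched as a loop that peels off one symbol at a time. The input below is a small grammar already in the form produced by step (1). The sketch numbers the new nonterminals D1, D2, ... globally across all productions and assumes these names are unused; binarize is a name chosen for illustration.

```python
def binarize(productions):
    """Split bodies longer than 2 into chains of binary productions."""
    result = {}
    counter = 0
    for head, bodies in productions.items():
        for body in bodies:
            current = head
            symbols = list(body)
            while len(symbols) > 2:
                # A → C1 C2 ... Cn becomes A → C1 D and D → C2 ... Cn
                counter += 1
                fresh = f"D{counter}"
                result.setdefault(current, []).append((symbols[0], fresh))
                current = fresh
                symbols = symbols[1:]
            result.setdefault(current, []).append(tuple(symbols))
    return result


# Form (a) and (b) productions: S → A B B_a, A → B_a B_a B_b, B → A B_c,
# B_a → a, B_b → b, B_c → c
P1 = {
    "S": [("A", "B", "B_a")],
    "A": [("B_a", "B_a", "B_b")],
    "B": [("A", "B_c")],
    "B_a": [("a",)],
    "B_b": [("b",)],
    "B_c": [("c",)],
}

P_hat = binarize(P1)
print(P_hat["S"], P_hat["D1"])  # [('A', 'D1')] [('B', 'B_a')]
```

Productions that already have one or two symbols on the right pass through the loop unchanged.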
Example:
S → ABa, A → aab, B → Ac
Step 1: S → A B Ba, A → Ba Ba Bb, B → A Bc, Ba → a,
Bb → b, Bc → c.
Step 2: S → A D1, D1 → B Ba, A → Ba D2, D2 → Ba Bb,
together with the productions B → A Bc, Ba → a, Bb → b, Bc → c, which already have the required form.
Cocke-Younger-Kasami (CYK) Membership Algorithm
Membership question:
Given a language L and a string w, can we determine whether
or not w is an element of L?
Assume we have a context-free grammar G = (N, T, P, S) in CNF
and a string w = a_1 a_2 . . . a_n.
We define the substrings
w_ij = a_i . . . a_j
and the subsets of N
N_ij = {A ∈ N | A ⇒* w_ij}.
Clearly, w ∈ L(G) if and only if S ∈ N_1n.
The algorithm fills a table whose cell (i, j), with 1 ≤ i ≤ j ≤ n, holds the set N_ij.
To compute N_ij, observe that A ∈ N_ii if and only if G
contains a production A → a_i.
Therefore, N_ii can be computed for all 1 ≤ i ≤ n by
inspection of w and the productions of the grammar.
To continue, notice that for j > i, A derives w_ij if and only if there is a
production A → BC with B ⇒* w_ik and C ⇒* w_{k+1,j} for
some k with i ≤ k < j.
In other words,
N_ij = ⋃_{k ∈ {i, i+1, ..., j−1}} {A | A → BC with B ∈ N_ik, C ∈ N_{k+1,j}}.
Compute all N_ij in order of increasing substring length:
1. Compute N_11, N_22, . . . , N_nn,
2. Compute N_12, N_23, . . . , N_{n−1,n},
3. Compute N_13, N_24, . . . , N_{n−2,n},
and so on, until N_1n is reached.
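The table can be filled as sketched below. The code uses 0-based indices, so N[i][j] corresponds to N_{i+1,j+1}; the names cyk_table and accepts are chosen for this sketch, and the test grammar is the CNF grammar S → AS | a, A → SA | b.

```python
def cyk_table(productions, w):
    """Return N with N[i][j] = nonterminals deriving w[i..j] (0-based, inclusive)."""
    n = len(w)
    N = [[set() for _ in range(n)] for _ in range(n)]
    # diagonal: A is in N[i][i] iff A → w[i] is a production
    for i, a in enumerate(w):
        for head, bodies in productions.items():
            if (a,) in bodies:
                N[i][i].add(head)
    # longer substrings, in order of increasing length
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # split point, i <= k < j
                for head, bodies in productions.items():
                    for body in bodies:
                        if (len(body) == 2 and body[0] in N[i][k]
                                and body[1] in N[k + 1][j]):
                            N[i][j].add(head)
    return N


def accepts(productions, start, w):
    """Return True iff w is in L(G) for a grammar G in CNF."""
    if not w:
        return False  # a grammar in CNF as defined here cannot derive λ
    return start in cyk_table(productions, w)[0][len(w) - 1]


# S → AS | a, A → SA | b
G = {
    "S": [("A", "S"), ("a",)],
    "A": [("S", "A"), ("b",)],
}

print(accepts(G, "S", "aba"))  # True
print(accepts(G, "S", "aa"))   # False
```

For w = aba the table gives N_12 = {A}, N_23 = {S} and N_13 = {S}, so S ⇒* aba through S → AS with the split after the second symbol.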