Chapter 16: Non-Context-Free Languages
● Theorem. Let G be a context-free grammar in Chomsky
normal form. Let L be the subset of words generated by G
which have derivations such that each production of the form:
Nonterminal → Nonterminal Nonterminal
is used at most once. L is a finite set of words.
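Proof. At each step in a derivation, a nonterminal is replaced
by either two nonterminals or one terminal. Only a step of the
first kind lengthens the working string, and it lengthens it by
exactly one symbol. Thus, if there are p productions of the form:
Nonterminal → Nonterminal Nonterminal
and each is used at most once, the working string grows from 1 symbol
to at most p+1 symbols, so a word in L contains at most p+1 letters.
The set of words that contain at most p+1 letters is finite.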
Example
S → AB   A → XY   Y → SX
S → a   X → a   X → b   B → b   A → a
● S ⇒ AB ⇒ aB ⇒ ab
● S ⇒ AB ⇒ XYB ⇒ bYB ⇒ bSXB ⇒ baXB ⇒ baaB ⇒ baab
● S ⇒ AB ⇒ XYB ⇒ XSXB ⇒ bSXB ⇒ baXB ⇒ baaB ⇒ baab
In the last two derivations, (1), (2), (3) mark the steps that use S → AB, A → XY and Y → SX, each exactly once.
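The finiteness claim can be checked by brute force on the example grammar. The sketch below is only an illustration: the function name and the encoding of productions as string pairs are choices made here, not part of the slides. It explores leftmost derivations and remembers which Nonterminal → Nonterminal Nonterminal productions have already been used.

```python
from collections import deque

# Example grammar in Chomsky normal form.
# Uppercase letters are nonterminals, lowercase letters are terminals.
BINARY = [("S", "AB"), ("A", "XY"), ("Y", "SX")]  # Nonterminal -> Nonterminal Nonterminal
UNARY = [("S", "a"), ("X", "a"), ("X", "b"), ("B", "b"), ("A", "a")]  # Nonterminal -> terminal


def words_with_single_use(start="S"):
    """Collect every word that has a derivation using each binary production at most once."""
    words = set()
    seen = set()
    queue = deque([(start, frozenset())])
    while queue:
        form, used = queue.popleft()
        if (form, used) in seen:
            continue
        seen.add((form, used))
        # Position of the leftmost nonterminal, or None if the form is a word.
        pos = next((i for i, c in enumerate(form) if c.isupper()), None)
        if pos is None:
            words.add(form)
            continue
        head = form[pos]
        for idx, (lhs, rhs) in enumerate(BINARY):
            if lhs == head and idx not in used:
                queue.append((form[:pos] + rhs + form[pos + 1:], used | {idx}))
        for lhs, rhs in UNARY:
            if lhs == head:
                queue.append((form[:pos] + rhs + form[pos + 1:], used))
    return words


L = words_with_single_use()
print(sorted(L, key=lambda w: (len(w), w)))
```

For this grammar p = 3, and the search returns six words (a, ab, aaab, aabb, baab, babb), none longer than p+1 = 4 letters.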
Leftmost derivations using CFGs in Chomsky normal form: suppose the production
Z → XY
is used twice in a leftmost derivation. The derivation then has the form
● S ⇒ … ⇒ s1Zs2 ⇒ … ⇒ s1s3Zs4 ⇒ …
where s1 and s3 are strings of terminals and s2 and s4 are strings of nonterminals.
A branch: a path between the root and a leaf of a derivation tree.
● S → AZ   Z → BB   B → ZA   A → a   B → b
S ⇒ AZ ⇒ aZ ⇒ aBB ⇒ abB ⇒ abZA …
The same nonterminal (Z) appears twice on the same branch. (The 2nd is a tree descendant of the 1st. The nonterminal is self-embedded.)
● S → AA   A → BC   C → BB   A → a   B → b
S ⇒ AA ⇒ BCA ⇒ bCA ⇒ bBBA …
The same nonterminal appears twice, but on two different branches.
Remark: the derivation trees are binary trees.
Given a context-free grammar in Chomsky normal form, a
derivation tree that has at most p+1 rows of nonterminals has at
most 2^p leaves.
S → XY | a   X → SY | a   Y → SX | b
1st row: 2^0 nodes
2nd row: 2^1 nodes
3rd row: 2^2 nodes
4th row: 2^3 nodes
p = 3: p+1 rows, 2^p leaves maximum
[Figure: a derivation tree with four full rows of nonterminals and eight terminal leaves]
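The row count can be restated as a short recursion. This is a minimal sketch with a hypothetical function name: a tree with a single row of nonterminals has one leaf, and each extra row at most doubles the number of leaves because every nonterminal has at most two children.

```python
def max_leaves(rows):
    """Largest number of terminal leaves in a Chomsky-normal-form derivation tree
    that has `rows` rows of nonterminals (rows >= 1)."""
    if rows < 1:
        raise ValueError("a derivation tree has at least one row of nonterminals")
    if rows == 1:
        return 1  # the single nonterminal is replaced by one terminal
    # The root is replaced by two nonterminals, each heading a tree with one row fewer.
    return 2 * max_leaves(rows - 1)


for p in range(6):
    print(p + 1, "rows ->", max_leaves(p + 1), "leaves at most")
```

With p = 3 this gives max_leaves(4) = 8 = 2^3, the bound shown for the four-row tree.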
● Theorem. Let G be a context-free grammar in Chomsky
normal form that has p productions of the form:
Nonterminal → Nonterminal Nonterminal.
Let w be a word such that length(w) > 2^p. Then in every
derivation tree for w there exists some nonterminal Z that
appears twice on the same branch.
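Proof. A tree that has more than 2^p leaves has more than p+1
rows of nonterminals. (A tree that has p+1 rows has at most 2^p leaves.)
Some branch therefore contains at least p+2 nonterminals followed by
1 terminal. Every nonterminal on that branch except the last is replaced
using a production of the form Nonterminal → Nonterminal Nonterminal,
which gives at least p+1 applications of only p such productions.
At least one production is used twice on the branch, and its left-hand
nonterminal Z appears twice on that branch.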
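The counting in the proof can be tried on a very small case. The sketch below assumes a hypothetical grammar, S → AS | b and A → a, chosen here only because it has p = 1; the nested-tuple encoding of trees and the function name are also assumptions of this example. The word aab has length 3 > 2^1, so a longest branch must carry at least p+2 = 3 nonterminals.

```python
def longest_branch(tree):
    """Return the nonterminal labels along one longest root-to-leaf branch.

    A tree is either a terminal (a plain string) or a tuple
    (nonterminal, child, ...) with one or two children.
    """
    if isinstance(tree, str):  # a terminal leaf contributes no nonterminal
        return []
    best = max((longest_branch(child) for child in tree[1:]), key=len)
    return [tree[0]] + best


# Hypothetical grammar: S -> AS | b, A -> a, so p = 1.
# Derivation tree for aab: S => AS => aS => aAS => aaS => aab.
tree = ("S", ("A", "a"), ("S", ("A", "a"), ("S", "b")))
p = 1

branch = longest_branch(tree)
print(branch)
# More nonterminals on the branch than distinct labels means some label repeats.
print(len(branch) >= p + 2, len(set(branch)) < len(branch))
```

Here the branch found carries three nonterminals, two of which are S, so S is self-embedded as the theorem predicts.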
Example
S → XY   X → BY   Y → XY   B → SX
B → b   X → a   Y → a   Y → b
[Figure: a derivation tree for this grammar]
● p = 4
● the branch that ends in a has 6 nonterminals (5 applications of productions containing only nonterminals)
● at least one production is used more than once
Example: PALINDROME – {Λ}
S → AX   X → SA   S → AA
S → BY   Y → SB   S → BB
A → a   B → b   S → a   S → b
p = 6, so 2^p = 2^6 = 64
[Figure: a derivation tree whose rows of nodes are labeled 1st row through 6th row]
[Figure: derivation trees for the two words below, built from S → AX, X → SA, A → a and S → b]
aabaa
aaabaaa
[Figure: derivation trees for the two words below, obtained by repeating the S → AX, X → SA portion of the tree]
aaaabaaaa
aaaaabaaaaa
The pumping lemma (for context-free languages)
Let G be a context-free grammar in Chomsky normal form
that has p productions of the form:
Nonterminal → Nonterminal Nonterminal.
Let w be a word generated by G such that length(w) > 2^p. Then w can be
decomposed into 5 factors:
w = uvxyz
such that
● x is not Λ
● at least one of v and y is not Λ
● for all n ≥ 1, uv^n x y^n z is in the language generated by G.
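The third condition is easy to experiment with. The sketch below uses a hypothetical helper, pump, and an assumed factorization of the palindrome aabaa (u = Λ, v = a, x = aba, y = a, z = Λ, with Λ written as the empty string). The factorization is chosen here for illustration. The lemma only guarantees a factorization for words longer than 2^p; this short word happens to have one because S is self-embedded in its derivation tree.

```python
def pump(u, v, x, y, z, n):
    """Return the word u v^n x y^n z for n >= 1."""
    if n < 1:
        raise ValueError("the lemma is stated for n >= 1")
    return u + v * n + x + y * n + z


def is_palindrome(s):
    """True if s reads the same in both directions."""
    return s == s[::-1]


# Assumed factorization of aabaa: u = Λ, v = a, x = aba, y = a, z = Λ.
words = [pump("", "a", "aba", "a", "", n) for n in range(1, 5)]
for w in words:
    print(w, is_palindrome(w))
```

For n = 1, 2, 3, 4 this produces aabaa, aaabaaa, aaaabaaaa and aaaaabaaaaa, all of which are palindromes.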
● Proof. From the previous theorem, there exists a nonterminal
P that occurs at least twice on the same branch.
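[Figure: derivation tree with root S. The upper P has children Q and R, and the lower P lies below it. The leaves read, left to right, u, v, x, y, z.]
w = uvxyz, where x is the part of w generated by the lower P, vxy is the part generated by the upper P, and u and z are the parts of w to the left and right of vxy.
x ≠ Λ, because a grammar in Chomsky normal form has no Λ-productions.
Either v ≠ Λ or y ≠ Λ: the upper P is replaced by QR, the lower P descends from only one of Q and R, and the other one generates a nonempty part of v or y.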
Example: w = baa
[Figure: derivation tree for baa with S repeated on one branch]
u = Λ   v = Λ   x = ba   y = a   z = Λ
Note: u, z, and one of v and y (but not both) could be Λ.
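The factors can be read off a derivation tree mechanically. The sketch below is one possible way to do it, not a construction given in the slides: trees are nested tuples, all function names are hypothetical, and the tree for baa assumes the grammar S → SA | b, A → a, which is consistent with the factors listed above.

```python
def tree_yield(tree):
    """Concatenate the terminal leaves below a node, left to right."""
    if isinstance(tree, str):
        return tree
    return "".join(tree_yield(child) for child in tree[1:])


def find_repeat(tree, path=(), ancestors=None):
    """Return (upper_path, lower_path) for two nodes with the same label on one
    branch, or None. A path is a tuple of child indices starting at the root."""
    ancestors = ancestors or {}
    if isinstance(tree, str):
        return None
    label = tree[0]
    if label in ancestors:
        return ancestors[label], path
    below = {**ancestors, label: path}
    for i, child in enumerate(tree[1:], start=1):
        found = find_repeat(child, path + (i,), below)
        if found:
            return found
    return None


def split_around(tree, path):
    """Return the parts of the yield to the left and right of the node at `path`."""
    left, right = "", ""
    for i in path:
        left += "".join(tree_yield(c) for c in tree[1:i])
        right = "".join(tree_yield(c) for c in tree[i + 1:]) + right
        tree = tree[i]
    return left, right


def subtree(tree, path):
    """Follow `path` down from the root and return the node reached."""
    for i in path:
        tree = tree[i]
    return tree


def decompose(tree):
    """Return (u, v, x, y, z) for the first repeated nonterminal found."""
    upper, lower = find_repeat(tree)
    u, z = split_around(tree, upper)
    v, y = split_around(subtree(tree, upper), lower[len(upper):])
    x = tree_yield(subtree(tree, lower))
    return u, v, x, y, z


# Assumed tree for baa: S => SA => SAA => bAA => baA => baa.
tree = ("S", ("S", ("S", "b"), ("A", "a")), ("A", "a"))
print(decompose(tree))
```

On this tree the first repeat found is the root S and its left child, which gives u = Λ, v = Λ, x = ba, y = a, z = Λ (Λ appears as the empty string).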
[Figure: the derivation tree with the portion between the upper P and the lower P repeated once and twice]
uvvxyyz
uvvvxyyyz
In general: uv^n x y^n z
Alternatively:
S ⇒∗ uPz   P ⇒∗ vPy   P ⇒∗ x
S ⇒∗ uPz ⇒∗ uvPyz ⇒∗ uvxyz
Using P ⇒∗ vPy twice: S ⇒∗ uPz ⇒∗ uvPyz ⇒∗ uvvPyyz ⇒∗ uvvxyyz
w = uvxyz
x ≠ Λ
either v ≠ Λ, or y ≠ Λ
a^n b^n a^n?
[Figure: a pushdown automaton with START, READ, PUSH a, POP and ACCEPT states; its edges are labeled a, b and ∆]
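One way to probe the question is to search a single word for a factorization that survives pumping. The sketch below is a finite check, not a proof: it assumes the language is { a^n b^n a^n : n ≥ 1 }, uses hypothetical function names, and tries every split w = uvxyz of aaabbbaaa with x ≠ Λ and at least one of v, y ≠ Λ, testing only n = 2.

```python
def in_language(s):
    """True if s = a^n b^n a^n for some n >= 1 (assumed reading of the question)."""
    n = len(s) // 3
    return n >= 1 and s == "a" * n + "b" * n + "a" * n


def pumpable_splits(w):
    """Yield every (u, v, x, y, z) with w = uvxyz, x nonempty, v or y nonempty,
    such that uvvxyyz is still in the language."""
    size = len(w)
    for i in range(size + 1):
        for j in range(i, size + 1):
            for k in range(j, size + 1):
                for m in range(k, size + 1):
                    u, v, x, y, z = w[:i], w[i:j], w[j:k], w[k:m], w[m:]
                    if x and (v or y) and in_language(u + v * 2 + x + y * 2 + z):
                        yield u, v, x, y, z


w = "aaabbbaaa"
print(in_language(w), list(pumpable_splits(w)))
```

No split of this word survives: if v or y contains two different letters, doubling it breaks the a…b…a block order, and otherwise at most two of the three blocks grow. The same reasoning applies to a^n b^n a^n for any n, including n large enough for the pumping lemma to apply.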