Pumping Lemmas for Context-Free Grammars
Non-Context-Free Languages
In the study of Regular Languages we developed some results that
were useful to show that some languages were not Regular. The
easiest such results to apply were those provided by the "pumping
lemmas". As it turns out, analogous results can be obtained for
Context-Free languages - allowing us to prove, among other things,
that not all languages are Context-Free.
When is a language not context-free?
Def.: the depth of a tree is the length (number of edges) of the longest path from the root to a leaf.
Bounds: let G = (V, Σ, R, S) be a CFG, with |V| = m > 0 non-terminals, and such that for every rule A → x ∈ R, |x| ≤ d, for some d ≥ 2 (d is only an upper bound on rule length, so we may always take d ≥ 2).
3/2/08
FCS
Lemma: a tree with more than d^m leaves, in which every node has at most d children, must have a path longer than m. If the tree is a parse tree of G, at least one non-terminal symbol occurs twice on this path.
Pf.: the path-length claim is proved by induction on m.
Base Case: if m = 0, then any tree with more than d^0 = 1 leaf must have a path of length at least 1.
Induction Step: assume the statement is true for m = n ≥ 0, and prove it for m = n + 1. Let T be a tree with more than d^{n+1} leaves. Since T0, the root of T, has at most d children, one of its children must be the root of a subtree with more than d^n leaves (otherwise T would have at most d · d^n = d^{n+1} leaves). This subtree must have a path longer than n (by the induction hypothesis), and one more link takes us to the root T0, giving a path longer than n + 1.
End: a path of length at least m + 1 connects at least m + 2 nodes. Only the last of them (the leaf) can be labeled by a terminal, so at least m + 1 nodes on the path are labeled by non-terminals. Since there are only m non-terminals, one of them must repeat.
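The path-length part of the lemma can be sanity-checked numerically. The Python sketch below is only an illustration under our own assumptions: trees are modeled as nested lists, shapes are random, and the seed is fixed. It checks the contrapositive form used later in the proof: a tree with fan-out at most d whose longest path has length h has at most d^h leaves.

```python
import random

# Assumed representation (not from the text): a tree is the list of its
# children, and a leaf is the empty list [].

def random_tree(d, max_depth, rng):
    """Random tree in which every internal node has between 1 and d children."""
    if max_depth == 0 or rng.random() < 0.3:
        return []
    return [random_tree(d, max_depth - 1, rng) for _ in range(rng.randint(1, d))]

def leaves(t):
    return 1 if not t else sum(leaves(c) for c in t)

def depth(t):
    """Length, in edges, of a longest root-to-leaf path."""
    return 0 if not t else 1 + max(depth(c) for c in t)

def check_leaf_bound(d, trials=1000, seed=0):
    """Contrapositive of the lemma: depth h implies at most d**h leaves."""
    rng = random.Random(seed)
    for _ in range(trials):
        t = random_tree(d, 6, rng)
        if leaves(t) > d ** depth(t):
            return False
    return True

print(check_leaf_bound(2), check_leaf_bound(3))  # True True
```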
Pumping Lemma - Simple Version
Lemma 3.43. (Pumping Lemma) Let G be a CFG. There exists K =
K(G) > 0, s.t. every string w ∈ L(G), with |w| ≥ K, can be decomposed
as w = uvxyz satisfying
1. v ≠ ε or y ≠ ε;
2. |vxy| ≤ K;
3. uv^nxy^nz ∈ L(G) ∀ n ≥ 0.
Proof. Let G = (V, Σ, R, S), |V| = m > 0, and assume that for every rule A → x ∈ R, |x| ≤ d, for some d ≥ 2 (d is only an upper bound, so we may always take d ≥ 2). Choose K = K(G) = d^{m+1}, and let w ∈ L(G), |w| ≥ K. Consider a shortest derivation S ⇒* w. Let T be its parse tree: each internal node is labeled by a non-terminal, every leaf by a terminal (or ε). T has at least |w| ≥ d^{m+1} > d^m leaves (ε-labeled leaves do not contribute to |w|). By the Lemma above, this means that T has at least one path of length ≥ m + 1 from the root to some leaf.
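As a worked instance of the constant K chosen above, the following Python sketch computes K(G) = d^{m+1} for a grammar given as a dictionary. The encoding and the function name pumping_constant are hypothetical. Here d is taken as the longest right-hand side, raised to 2 if needed, which is one admissible choice among many.

```python
# Hypothetical encoding (not from the text): each non-terminal maps to its
# right-hand sides; a right-hand side is a tuple of symbols, () stands for ε.

def pumping_constant(grammar):
    """K(G) = d**(m + 1), with m = |V| and d >= 2 bounding every |x| in A → x."""
    m = len(grammar)
    longest_rhs = max(len(rhs) for rhss in grammar.values() for rhs in rhss)
    d = max(2, longest_rhs)
    return d ** (m + 1)

G = {"S": [("a", "S", "b"), ()]}  # S → aSb | ε, so m = 1 and d = 3
print(pumping_constant(G))        # 9
```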
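The tree T, with a longest path decomposed via the "lowest" repetition of a non-terminal symbol. Let π be a longest path of T. It has length ≥ m + 1, so it has at least m + 1 nodes labeled by non-terminal symbols (only the leaf at its end carries a terminal, or ε). Among the pairs (q1, q2) of nodes on π with the same label, q1 an ancestor of q2, choose the pair with the lowest q1. Let π' denote the path from q1 to the leaf. Below q1 no non-terminal repeats on π' (otherwise a lower q1 would exist). So π' has at most m + 1 non-terminal nodes followed by the leaf, and thus has length ≤ m + 1. Let A be the label (non-terminal) of q1 and q2. We now decompose the parse tree T into three parts:
- T1: T without the subtree rooted at q1;
- T2: the subtree rooted at q1, without the subtree rooted at q2;
- T3: the subtree rooted at q2.
Their fringes give the five strings of w = uvxyz: T1 contributes u and z, T2 contributes v and y, and T3 contributes x.
[Figure: parse tree T with path π; T1 at the top with fringe u … z, T2 rooted at q1 with fringe v … y, T3 rooted at q2 with fringe x; π' runs from q1 down to the leaf.]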
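The decomposition can be reproduced mechanically on a concrete parse tree. The Python sketch below rests on assumptions of our own. It uses a minimal Node class (hypothetical, not from the text), with leaves labeled by terminals and ε as the empty string. The example tree is the parse tree of aaabbb under S → aSb | ab. The sketch follows a longest path π, picks the lowest repeated pair (q1, q2), and reads u, v, x, y, z off the fringes of T1, T2 and T3.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # nodes are compared by identity, not by value
class Node:
    label: str
    children: list = field(default_factory=list)

def fringe(node):
    """Concatenated leaf labels (ε-leaves are labeled "")."""
    if not node.children:
        return node.label
    return "".join(fringe(c) for c in node.children)

def longest_path(node):
    """Nodes on a longest path from `node` down to a leaf."""
    if not node.children:
        return [node]
    return [node] + max((longest_path(c) for c in node.children), key=len)

def contains(node, target):
    return node is target or any(contains(c, target) for c in node.children)

def around(node, target):
    """Fringe of `node` to the left and to the right of the subtree `target`."""
    if node is target:
        return "", ""
    for i, c in enumerate(node.children):
        if contains(c, target):
            left, right = around(c, target)
            before = "".join(fringe(s) for s in node.children[:i])
            after = "".join(fringe(s) for s in node.children[i + 1:])
            return before + left, right + after
    raise ValueError("target is not in this tree")

def decompose(root):
    """(u, v, x, y, z) from the lowest repeated pair on a longest path."""
    inner = longest_path(root)[:-1]          # drop the leaf: non-terminal nodes
    for i in range(len(inner) - 1, -1, -1):  # try q1 candidates bottom-up
        for q2 in inner[i + 1:]:
            if q2.label == inner[i].label:
                q1 = inner[i]
                u, z = around(root, q1)      # fringe of T1
                v, y = around(q1, q2)        # fringe of T2
                return u, v, fringe(q2), y, z
    raise ValueError("no repeated non-terminal on a longest path")

# Parse tree of aaabbb under S → aSb | ab
t = Node("S", [Node("a"),
               Node("S", [Node("a"), Node("S", [Node("a"), Node("b")]), Node("b")]),
               Node("b")])
print(decompose(t))  # ('a', 'a', 'ab', 'b', 'b')
```

Pumping this decomposition gives u v^n x y^n z = a^{n+2} b^{n+2}, which the grammar derives for every n ≥ 0.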
Does the decomposition do what we need?
1. Assume v = y = ε. Then T2 can be eliminated (T3 is attached directly where q1 was), and we obtain a derivation of w that is shorter than the original one, which was assumed shortest. Contradiction.
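2. Since π is a longest path of T, π' is a longest path of the subtree T2 ∪ T3 rooted at q1 (otherwise replacing π' inside π would give a longer path of T). So the longest path in T2 ∪ T3 has length ≤ m + 1. By the Lemma above, read contrapositively, T2 ∪ T3 has at most d^{m+1} leaves, which implies that |vxy| ≤ d^{m+1} = K.
3. For n = 1, uv^nxy^nz = w ∈ L(G). The cases n = 0 and n ≥ 2 can be handled via elementary surgery on the tree T: delete T2 (attaching T3 directly at q1), or insert more copies of T2, joined at the nodes labeled A.
So uv^nxy^nz ∈ L(G) ∀ n ≥ 0.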
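The surgery in point 3 can be expressed as a small tree transformation. The Python sketch below redefines a hypothetical Node class so that it stands alone. It uses the parse tree of aaabbb under S → aSb | ab, with q1 and q2 picked by hand. The function pump (our own name) builds a fresh tree with n stacked copies of T2 between T1 and T3, and its fringe is u v^n x y^n z.

```python
import copy
from dataclasses import dataclass, field

@dataclass(eq=False)  # nodes are compared by identity
class Node:
    label: str
    children: list = field(default_factory=list)

def fringe(node):
    """Concatenated leaf labels (ε-leaves are labeled "")."""
    if not node.children:
        return node.label
    return "".join(fringe(c) for c in node.children)

def graft(node, target, replacement):
    """Fresh copy of `node` with the subtree `target` replaced by `replacement`."""
    if node is target:
        return replacement
    return Node(node.label, [graft(c, target, replacement) for c in node.children])

def pump(root, q1, q2, n):
    """Tree for u v^n x y^n z: n stacked copies of T2 between T1 and T3."""
    inner = copy.deepcopy(q2)         # T3 alone (n = 0 deletes T2)
    for _ in range(n):
        inner = graft(q1, q2, inner)  # wrap one more copy of T2 around it
    return graft(root, q1, inner)     # hang the result below T1

# Parse tree of aaabbb under S → aSb | ab; q1 and q2 both carry the label S.
q2 = Node("S", [Node("a"), Node("b")])
q1 = Node("S", [Node("a"), q2, Node("b")])
t = Node("S", [Node("a"), q1, Node("b")])
for n in range(4):
    print(n, fringe(pump(t, q1, q2, n)))
# 0 aabb / 1 aaabbb / 2 aaaabbbb / 3 aaaaabbbbb
```

The original tree is never modified: graft always builds new nodes, so the same T can be pumped for several values of n.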
Now: how do we use it? We will prove
that several languages are NOT context-free.
Pumping Lemma - Example
Ex. 3.44. L = {a^n b^n c^n | n ≥ 0} is not Context-Free.
Pf. By contradiction, just like the proofs for regular languages. Assume L is CF, with a CF grammar G. Let K = K(G) as in the Lemma. Consider the string w = a^K b^K c^K ∈ L. By the Lemma, w = uvxyz, satisfying conditions 1-3. A crucial fact is that |vxy| ≤ K. This implies one of the following:
1. vxy is a substring of a^K, b^K or c^K. In all three cases, pumping (up or down) changes the number of occurrences of one letter while leaving the other two unchanged. For example, uxz ∉ L. Contradiction.
2. vxy is a substring of a^K b^K or of b^K c^K (the length constraint does not allow it to contain all three letters). Assume vxy is a substring of a^K b^K, the other case being handled in the same way. Then z has c^K as a suffix. Since vy ≠ ε, uxz has fewer a's or fewer b's than c's. Contradiction.
Pumping Lemma - Example
Ex. 3.45. L = {ww | w ∈ {0, 1}* } is not CF.
Pf. By contradiction. Assume L is CF, with grammar G. Let K = K(G), and consider ww = 0^K 1^K 0^K 1^K with w = 0^K 1^K. By the Pumping Lemma, ww = uvxyz, with the three conditions satisfied. If vxy falls inside one of the four single-character blocks, we are done: pumping down shortens that block only, so uxz can no longer be of the form w'w'. Otherwise, since |vxy| ≤ K, it must fall either in the first 0^K 1^K (= w), in the middle 1^K 0^K, or in the final 0^K 1^K. In each of these cases, the "pumped down" string uxz, which the Lemma places in L, contains a block of 0s (or 1s) of length < K. Another block of the same symbol, farther along the string, still has length exactly K. Such a string cannot be equal to w'w' for any w' ∈ {0, 1}*. Contradiction.
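The same brute-force idea can be applied to 0^K 1^K 0^K 1^K. This is again a finite check for small K only, with helper names of our own. It confirms that the pumped-down string uxz is never of the form w'w', whichever admissible decomposition is chosen.

```python
def decompositions(w, K):
    """All splits w = u v x y z with |vxy| <= K and vy nonempty."""
    N = len(w)
    for i in range(N + 1):
        end = min(i + K, N)
        for j in range(i, end + 1):
            for k in range(j, end + 1):
                for e in range(k, end + 1):
                    if j > i or e > k:
                        yield w[:i], w[i:j], w[j:k], w[k:e], w[e:]

def in_L(s):
    """Membership in {ww | w in {0, 1}*}."""
    h = len(s) // 2
    return len(s) % 2 == 0 and s[:h] == s[h:]

def pumping_down_refutes_all(K):
    w = "0" * K + "1" * K
    return all(not in_L(u + x + z) for u, v, x, y, z in decompositions(w + w, K))

print([pumping_down_refutes_all(K) for K in range(1, 6)])  # [True, True, True, True, True]
```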
Pumping Lemma - Example
Ex. 3.47. L = {a^i b^j c^k | k = max{i, j}} is not CF.
Pf. By contradiction. Assume L is CF, with grammar G. Let K = K(G), and let w = a^K b^K c^K. By the Pumping Lemma, w = uvxyz, satisfying the three conditions. If vxy contains characters of a single type, we are done by "pumping up". In that case uv^2xy^2z has either more than K a's (or b's) with only K c's, or more than K c's with only K a's and K b's. Otherwise, by the length condition, vxy cannot contain both a and c.
1. vy contains c. Then the number of c's in uxz is less than K (there are K of them altogether in w), while the number of a's in uxz is still K. Contradiction.
2. vy does not contain c. In this case, "pumping up" increases the number of a's or of b's (or puts the letters out of order) without altering the number of c's. Again, contradiction.
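Here, unlike Ex. 3.44, pumping down alone does not suffice: if vy consists of a's only, then uxz = a^{K-|vy|} b^K c^K is still in L. The Python sketch below is a finite check for small K with hypothetical helper names. It confirms this, and it confirms that allowing n ∈ {0, 2} refutes every admissible decomposition.

```python
import re

def decompositions(w, K):
    """All splits w = u v x y z with |vxy| <= K and vy nonempty."""
    N = len(w)
    for i in range(N + 1):
        end = min(i + K, N)
        for j in range(i, end + 1):
            for k in range(j, end + 1):
                for e in range(k, end + 1):
                    if j > i or e > k:
                        yield w[:i], w[i:j], w[j:k], w[k:e], w[e:]

def in_L(s):
    """Membership in {a^i b^j c^k | k = max{i, j}}."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return m is not None and len(m.group(3)) == max(len(m.group(1)), len(m.group(2)))

def refuted(parts, ns):
    """True if some n in ns pumps the split out of L."""
    u, v, x, y, z = parts
    return any(not in_L(u + v * n + x + y * n + z) for n in ns)

def check(K):
    """(n = 0 alone refutes every split, n in {0, 2} refutes every split)"""
    ds = list(decompositions("a" * K + "b" * K + "c" * K, K))
    return all(refuted(d, [0]) for d in ds), all(refuted(d, [0, 2]) for d in ds)

print([check(K) for K in range(1, 5)])
# [(False, True), (False, True), (False, True), (False, True)]
```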
CFGs - (NON) Closure Properties
Corollary 3.48. The class of Context-Free languages is not closed
under intersection, complementation or the MAX operation.
Proof.
1. Intersection. Consider the CF languages L1 = {a^n b^n c^m | n, m ≥ 0} and L2 = {a^n b^m c^m | n, m ≥ 0} (it is easy to construct CF grammars for them). Note that L1 ∩ L2 = {a^n b^n c^n | n ≥ 0}, which we showed is not CF.
2. Assume closure under complementation (writing Lᶜ for the complement of L), and let L1 and L2 be CF. Then L1ᶜ and L2ᶜ are CF, so L1ᶜ ∪ L2ᶜ is CF (since CF is closed under union), and so is its complement (L1ᶜ ∪ L2ᶜ)ᶜ = L1 ∩ L2. This contradicts 1. above.
3. Let L = {a^i b^j c^k | i ≥ k or j ≥ k}, which is easily shown CF. MAX(L) = {a^i b^j c^k | k = max{i, j}}, shown not CF in Ex. 3.47.
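Part 1 relies on L1 and L2 being CF. The Python sketch below uses one possible pair of grammars, our own choice since the text leaves them to the reader. It enumerates all strings of length at most N that each grammar derives, then intersects the two finite sets. Only strings of the form a^n b^n c^n survive.

```python
# Hypothetical grammars: upper-case letters are non-terminals, "" stands for ε.
G1 = {"S": ["AC"], "A": ["aAb", ""], "C": ["cC", ""]}  # L1 = {a^n b^n c^m}
G2 = {"S": ["AB"], "A": ["aA", ""], "B": ["bBc", ""]}  # L2 = {a^n b^m c^m}

def generate(grammar, max_len):
    """All terminal strings of length <= max_len derivable from S (leftmost derivations)."""
    seen, out, todo = set(), set(), ["S"]
    while todo:
        form = todo.pop()
        if form in seen:
            continue
        seen.add(form)
        i = next((p for p, ch in enumerate(form) if ch in grammar), None)
        if i is None:                    # no non-terminal left
            out.add(form)
            continue
        for rhs in grammar[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            # terminals are never erased, so forms with too many of them are dropped
            if sum(ch not in grammar for ch in new) <= max_len:
                todo.append(new)
    return out

N = 9
both = generate(G1, N) & generate(G2, N)
print(sorted(both, key=len))  # ['', 'abc', 'aabbcc', 'aaabbbccc']
```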
CFGs - (NON) Closure Properties - Example
Ex. 3.49. L = {w ∈ {a, b, c}* | #a(w) = #b(w) = #c(w)} is not CF.
Pf. Observe that L1 = {a^n b^n c^n | n ≥ 0} = L ∩ a*b*c*. If L ∈ CF, then L1 ∈ CF, since a CF language intersected with a regular language gives a CF language. But we proved in Ex. 3.44 that L1 is not CF.
Ex. 3.50. If the set L has the property that every subset of L is CF, then L must be a finite set.
Pf. See text.
We now try to extend the Pumping Lemma to non-contiguous sets of characters… Why can't we take any substring w, |w| ≥ K, and do what we did for the Strong Form of the Pumping Lemma for RLs?
Pumping Lemma - Stronger Version
Lemma 3.51 (Ogden's Lemma). Let G ∈ CFG. ∃ K = K(G) > 0 s.t. if we mark at least K symbols of a string w ∈ L(G), with |w| ≥ K, then w can be decomposed as w = uvxyz satisfying:
1. v or y has at least one marked symbol;
2. vxy contains at most K marked symbols;
3. uv^nxy^nz ∈ L(G) ∀ n ≥ 0.
Proof. Let d, m, K be as in the Pumping Lemma. Let T be a parse tree of w with ≥ K marked symbols.
Def.: an internal node of T is marked if it has at least two children, each of which has a marked leaf as a descendant.
[Figure: an example parse tree with root S over the terminals a, b, c; marked leaves (labeled a) and marked internal nodes (labeled S) are starred.]
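To make the definition concrete, the Python sketch below marks the internal nodes of a small tree of our own (not the one in the figure). The encoding is chosen only for this example: an internal node is a (label, children) pair, and a leaf is a string whose trailing * indicates that it is marked.

```python
# Assumed encoding (not from the text): internal node = (label, [children]);
# leaf = string, marked when it ends with "*".

def marked_leaves(t):
    if isinstance(t, str):
        return int(t.endswith("*"))
    return sum(marked_leaves(c) for c in t[1])

def marked_internal(t):
    """Labels of the marked internal nodes of t, in preorder."""
    if isinstance(t, str):
        return []
    label, children = t
    # marked: at least two children have a marked leaf somewhere below them
    here = [label] if sum(marked_leaves(c) > 0 for c in children) >= 2 else []
    return here + [lab for c in children for lab in marked_internal(c)]

T = ("S", [("A", ["a*", "b"]), ("B", ["c", ("A", ["a*", "a*"])])])
print(marked_internal(T))  # ['S', 'A']: the root and the lower A
```

Here the upper A and B are unmarked, since each has only one child with a marked leaf below it.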
Claim: if every root-to-leaf path of a parse tree T contains at most i marked internal nodes, then T has at most d^i marked leaves.
Pf. By induction on i.
1. Base case: i = 0. All internal nodes are unmarked, so there is at most 1 = d^0 marked leaf: if there were two, their lowest common ancestor would be marked.
2. Induction case: assume the result true for some i ≥ 0, and prove it for i + 1. If T has no marked internal node, the base-case argument gives at most 1 ≤ d^{i+1} marked leaf. Otherwise, let q be a marked internal node of T whose proper ancestors are all unmarked. Then all marked leaves in T are descendants of q: otherwise the lowest common ancestor of q and such a leaf would be a marked proper ancestor of q. q has at most d children (by the bound on the fan-out). Each child is the root of a subtree with at most i marked internal nodes on any path, since q itself is marked and every path through q has at most i + 1. By the inductive hypothesis, each such subtree has at most d^i marked leaves, for a total of at most d · d^i = d^{i+1} marked leaves in T.
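The Claim can also be sanity-checked on random trees. The Python sketch below is our own illustration, using random shapes with fan-out at most d, random leaf marks and a fixed seed. The function max_marked_on_path computes the largest number of marked internal nodes on any root-to-leaf path, and the check verifies that the number of marked leaves never exceeds d raised to that number.

```python
import random

# Assumed representation: an internal node is the list of its children;
# a leaf is True (marked) or False (unmarked).

def random_tree(d, max_depth, rng):
    if max_depth == 0 or rng.random() < 0.3:
        return rng.random() < 0.5
    return [random_tree(d, max_depth - 1, rng) for _ in range(rng.randint(1, d))]

def marked_leaves(t):
    if isinstance(t, bool):
        return int(t)
    return sum(marked_leaves(c) for c in t)

def is_marked(t):
    """Internal node with at least two children that have a marked leaf below."""
    return not isinstance(t, bool) and sum(marked_leaves(c) > 0 for c in t) >= 2

def max_marked_on_path(t):
    if isinstance(t, bool):
        return 0
    return int(is_marked(t)) + max(max_marked_on_path(c) for c in t)

def check_claim(d, trials=2000, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        t = random_tree(d, 6, rng)
        if marked_leaves(t) > d ** max_marked_on_path(t):
            return False
    return True

print(check_claim(2), check_claim(3))  # True True
```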
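Now we go back to the proof of the Lemma. w has at least K = d^{m+1} > d^m marked symbols (= marked leaves of T), so by the Claim some path of T contains at least m + 1 marked internal nodes. Since there are only m non-terminals, at least two marked nodes on this path have the same label. Choose a path π with the maximum number of marked internal nodes, and choose its lowest pair of same-label marked nodes, (q1, q2). Then decompose T into T1, T2 and T3 at these nodes. The decomposition gives rise to the five strings w = uvxyz, and to the conditions that allow pumping:
- Condition 1: q1 is marked, so the child of q1 that is not above q2 still has a marked leaf below it, and that leaf lies in v or y.
- Condition 2: by the choice of π and (q1, q2), no path below q1 contains more than m + 1 marked internal nodes, so by the Claim vxy contains at most d^{m+1} = K marked symbols.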
Note: the original form of the Pumping Lemma for CFLs follows by
simply marking all the characters of w.
Pumping Lemma - Stronger Version - Example
Ex. 3.52. L = {a^n b^n c^i | i ≠ n} ∉ CFL.
Pf. Assume L is CF, so ∃ G ∈ CFGs generating L. Let K = K(G). Since we don't know the actual lengths of v and y, we have no idea by how much each successive pumping will extend the string. We can restrict the pumping to the left-hand part of the string (by marking there), so that the length of the c-suffix does not change. It then suffices to make the number of extra c's divisible by every possible "pumping factor". Here are the details: let w = a^K b^K c^{K+K!}, and find a way to force the decomposition so that v = a^j, y = b^j, 1 ≤ j ≤ K. The "regular" Pumping Lemma does not guarantee a position for the substrings, so y could be a substring of c^{K+K!}, and the pumping would not lead to any contradiction. Ogden's Lemma allows us to mark letters: mark all the a's in w.
We now have to track various cases…
Claim 1: neither v nor y contains more than one type of symbol.
Pf. Suppose either contained both a's and b's, or both b's and c's. Then v^n or y^n, n ≥ 2, would contain a b before an a or a c before a b, leading to a string not in L. Contradiction.
Claim 2: either b or c is not in vy.
Pf. By condition 1 of Ogden's Lemma, v or y contains a marked symbol, i.e., an a. By Claim 1 that piece consists of a's only, and the other piece contains at most one type of symbol.
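1. b ∉ vy. Then vy contains at least one a and no b, so uv^2xy^2z has more a's than b's, and thus is not in L. Contradiction.
2. c ∉ vy (and b ∈ vy). Then v = a^j and y = b^{j'}, and j = j', since otherwise uv^2xy^2z would have different numbers of a's and b's. Moreover 1 ≤ j ≤ K, because vxy contains at most K marked symbols, so K!/j is an integer. Then the string uv^nxy^nz, with n = 1 + K!/j, has K + K! a's, K + K! b's and K + K! c's, and cannot be in L. Contradiction.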
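The arithmetic behind case 2 can be checked directly. For illustration only, the Python sketch below assumes the concrete split u = a^{K-j}, v = a^j, x = ε, y = b^j, z = b^{K-j} c^{K+K!}. Ogden's Lemma only tells us that v = a^j and y = b^j, not where x sits, but the letter counts depend only on j. For small K, the sketch verifies that n = 1 + K!/j always produces a string with equal numbers of a's, b's and c's, which lies outside L.

```python
import re
from math import factorial

def in_L(s):
    """Membership in {a^n b^n c^i | i != n}."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return m is not None and len(m.group(1)) == len(m.group(2)) != len(m.group(3))

def pumped(K, j):
    """u v^n x y^n z for the hypothetical split above, with n = 1 + K!/j."""
    u, v, x = "a" * (K - j), "a" * j, ""
    y, z = "b" * j, "b" * (K - j) + "c" * (K + factorial(K))
    n = 1 + factorial(K) // j  # exact division, since j <= K divides K!
    return u + v * n + x + y * n + z

def always_escapes(K):
    w = "a" * K + "b" * K + "c" * (K + factorial(K))
    assert in_L(w)  # the chosen w really is in L
    return all(not in_L(pumped(K, j)) for j in range(1, K + 1))

print(all(always_escapes(K) for K in range(1, 7)))  # True
```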
Pumping Lemma - Stronger Version - Example
A consequence of Ogden's Lemma is the proof of the existence of
inherently ambiguous Context-Free languages. See the text for
details (Ex. 3.54).
We will see another ambiguity proof in a different context later in the
course.