L is recognized by a PDA Suppose L is generated

15-453
FORMAL LANGUAGES,
AUTOMATA, AND
COMPUTABILITY
PUSH-DOWN AUTOMATA
AND
CONTEXT-FREE GRAMMARS
NONE OF THESE ARE REGULAR
Σ = {0, 1}, L = { 0n1n | n ≥ 0 }
Σ = {a, b, c, …, z},
L = { w | w = wR }
L = { palindromes}
madamimadam  L
zeus sees suez  L
Σ = { (, ) }, L = { balanced strings of parens }
(), ()(), (()()) are in L, (, ()), ())(() are not in L
NONE OF THESE ARE REGULAR
Σ = {0, 1}, L = { 0n1n | n ≥ 0 }
Σ = {a, b, c, …, z}, L = { w | w = wR
}
L = { palindromes}
madamimadam  L
zeus sees suez  L
PUSHDOWN AUTOMATA (PDA)
FINITE
STATE
CONTROL
STACK
(Last in,
first out)
INPUT
Input, pop,
qi
a, b → c
push
qj
In state qi reading symbol a from input
Seeing b on top of stack,
Replace b (pop) by c (push) and go to state qj
If a = ε, make transition without reading input
If b = ε,
“
“
“ popping from stack
If c = ε,
“
“
“
writing on stack
Non-deterministic
input
pop
ε,ε → $
0011
push
11
0011
011
0,ε → 0
1,0 → ε
ε,$ → ε
STACK
1
$
0 0
$ $
Non-deterministic
1,0 → ε
input
pop
ε,ε → $
001
push
1
001
01
0,ε → 0
1,0 → ε
ε,$ → ε
STACK
1,0 → ε
$
0 0
$ $
PDA that recognizes L = { 0n1n | n ≥ 0 }
Definition: A (non-deterministic) PDA is a tuple
P = (Q, Σ, Γ, , q0, F), where:
Q is a finite set of states
Σ is the input alphabet
Γ is the stack alphabet
 : Q  Σε  Γε → 2 Q  Γε
q0  Q is the start state
F  Q is the set of accept states
2 Q  Γε is the set of subsets of Q x Γε where
Γε = Γ  {ε}
Let w Σ* and suppose w can be written as
w1... wn where wi  Σε (recall Σε = Σ  {ε})
Then P accepts w if there are
r0, r1, ..., rn  Q and
s0, s1, ..., sn  Γ* (sequence of stacks) such that
1. r0 = q0 and s0 = ε (P starts in q0 with empty stack)
2. For i = 0, ..., n-1:
(ri+1 , b) (ri, wi+1, a), where si =at and si+1 = bt for
some a, b  Γε and t  Γ*
(P moves correctly according to state, stack and symbol read)
3. rn  F (P is in an accept state at the end of its input)
q0
ε,ε → $
q1
0,ε → 0
1,0 → ε
q3
Q = {q0, q1, q2, q3}
ε,$ → ε
q2
Σ = {0,1}
1,0 → ε
Γ = {$,0,1}
 : Q  Σε  Γε → 2 Q  Γε
(q1,1,0) = { (q2,ε) }
(q2,1,1) = 
EVEN-LENGTH PALINDROMES
Σ = {a, b, c, …, z}
q0
ε,ε → $
q1
,ε → 
ε,ε → ε
q3
ε,$ → ε
q2
Note: Also accepts the empty string
 , → ε
Build a PDA to recognize
L = { aibjck | i, j, k ≥ 0 and (i = j or i = k) }
c,ε → ε
b,a → ε
q0
ε,$ → ε
q2
q3
choose i=j
ε,ε → $
q1
choose i=k
q4
q5
b,ε → ε
c,a → ε
ε,ε → ε
a,ε → a
ε,ε → ε
ε,$ → ε
q6
CONTEXT-FREE GRAMMARS
“Colorless green ideas sleep furiously.”
CONTEXT-FREE GRAMMARS
production
rules
start variable
A → 0A1
A→B
B→#
variables
terminals
A  0A1  00A11  00B11  00#11
(yields)
A * 00#11
(derives)
Non-deterministic
Derivation
We say: 00#11 is
generated by the
Grammar
CONTEXT-FREE GRAMMARS
A → 0A1
A→B
B→#
A → 0A1 | B
B→#
CONTEXT-FREE GRAMMARS
A context-free grammar (CFG) is a tuple
G = (V, Σ, R, S), where:
V is a finite set of variables
Σ is a finite set of terminals (disjoint from V)
R is set of production rules of the form A → W,
where A  V and W  (VΣ)*
S  V is the start variable
L(G) = {w  Σ* | S * w} Strings Generated by G
CONTEXT-FREE LANGUAGES
A context-free grammar (CFG) is a tuple
G = (V, Σ, R, S), where:
V is a finite set of variables
Σ is a finite set of terminals (disjoint from V)
R is set of production rules of the form A → W,
where A  V and W  (VΣ)*
S  V is the start variable
G = { {S}, {0,1}, R, S }
L(G) =
R = { S → 0S1, S → ε }
CONTEXT-FREE LANGUAGES
A context-free grammar (CFG) is a tuple
G = (V, Σ, R, S), where:
V is a finite set of variables
Σ is a finite set of terminals (disjoint from V)
R is set of production rules of the form A → W,
where A  V and W  (VΣ)*
S  V is the start variable
G = { {S}, {0,1}, R, S }
R = { S → 0S1, S → ε }
L(G) = { 0n1n | n ≥ 0 } Strings Generated by G
WRITE A CFG FOR EVEN-LENGTH
PALINDROMES
S → S for all   Σ
S→ε
WRITE A CFG FOR THE EMPTY SET
G = { {S}, Σ, , S }
PARSE TREES
A
A
A
B
0 0
#
1 1
A  0A1  00A11  00B11  00#11
<EXPR> → <EXPR> + <EXPR>
<EXPR> → <EXPR> x <EXPR>
<EXPR> → ( <EXPR> )
<EXPR> → a
Build a parse tree for a + a x a
<EXPR>
<EXPR>
<EXPR>
<EXPR>
<EXPR>
<EXPR>
<EXPR> <EXPR>
a
+
a
x
a
<EXPR> <EXPR>
a
+
a
x
a
Definition: a string is derived ambiguously
in a context-free grammar if it has more
than one parse tree
Definition: a grammar is ambiguous if it
generates some string ambiguously
Can you give an unambiguous grammar
with standard arithmetic precedence ?
NOT REGULAR
Σ = {0, 1}, L = { 0n1n | n ≥ 0 }
But L is CONTEXT FREE
A → 0A1
A→ε
WHAT ABOUT?
Σ = {0, 1}, L1 = { 0n1n 0m| m,n ≥ 0 }
Σ = {0, 1}, L2 = { 0n1m 0n| m,n ≥ 0 }
Σ = {0, 1}, L3 = { 0m1n 0n| m=n ≥ 0 }
THE PUMPING LEMMA FOR CFGs
Let L be a context-free language
Then there is a P such that
if w  L and |w| ≥ P
then can write w = u v x y z, where:
1. |v y| > 0
2. |v x y| ≤ P
3. For every i ≥ 0, u vi x yi z  L
THE PUMPING LEMMA FOR CFGs
Let L be a context-free language
Then there is a P such that
if w  L and |w| ≥ P
then can write w = u v x y z, where:
1. |v y| > 0
2. |v x y| ≤ P
3. For every i ≥ 0, u vi x yi z  L
WHAT ABOUT?
Σ = {0, 1}, L3 = { 0m1n 0n| m=n ≥ 0 }
Idea of Proof: If w is long enough, then
any parse tree for w must have a path that
contains a variable more than once
T
T
R
R
R
R
R
u v
x
y z
y z
u v
v
x
y
Formal Proof:
Let b be the maximum number of symbols on
the right-hand side of any rule
If the height of a parse tree is h, the length of
the string generated by that tree is at most: bh
Let |V| be the number of variables in G
Define P = b|V|+1
Let w be a string of length at least P
Let T be a parse tree for w with a minimum
number of nodes.
T must have height at least |V|+1
The longest path in T must have ≥ |V|+1 variables
Select R to be the variable that repeats among
the lowest |V|+1 variables (in the path)
T
1. |vy| > 0
T
2. |vxy| ≤ PR
R
R
R
Let T be a parse tree for w with a minimum
R
number
ofxnodes.
y z
u v
y Tzmust have
u v height at
least |V|+1
v
x
y
THE PUMPING LEMMA FOR CFGs
Let L be a context-free language
Then there is a P such that
if w  L and |w| ≥ P
then can write w = u v x y z, where:
1. |v y| > 0
2. |v x y| ≤ P
3. For every i ≥ 0, u vi x yi z  L
WHAT ABOUT?
Σ = {0, 1}, L3 = { 0m1n 0n| m=n ≥ 0 }
PDAs ARE EQUIVALENT TO CFGs
A Language L is generated by a CFG

L is recognized by a PDA
Suppose L is generated by a CFG G = (V, Σ, R, S)
Construct P = (Q, Σ, Γ, , q, F) that recognizes L
A Language L is generated by a CFG

L is recognized by a PDA
Suppose L is generated by a CFG G = (V, Σ, R, S)
Construct P = (Q, Σ, Γ, , q, F) that recognizes L
ε,ε → S$
For each rule 'A → w’  R:
ε,$ → ε
ε,A → wR
For each terminal a  Σ:
a,a → ε
S → aTb
T → Ta | ε
ε,ε → $
ε,ε → T
ε,ε → S
ε,ε → T
ε,$ → ε
ε,ε → a
ε,T → ε
a,a → ε
b,b → ε
S → aTb
T → Ta | ε
S * ab
(derives)
ε,ε → $
ε,ε → T
ε,ε → S
ε,ε → T
ε,$ → ε
ε,ε → a
ε,T → ε
a,a → ε
b,b → ε
Suppose L is generated by a CFG G = (V, Σ, R, S)
Describe P = (Q, Σ, Γ, , q, F) that recognizes L :
(1) Push $ and then S on the stack
(2) Repeat the following steps forever:
(a) Pop the stack, call the result X.
(b) If X is a variable A, guess a rule that
matches A and push result into the stack
(c) If X is a terminal, read next symbol
from input and compare it to terminal.
If they’re different, reject.
(d) If X is $: then accept iff no more input
A Language L is generated by a CFG

L is recognized by a PDA
A Language L is generated by a CFG
 L is recognized by a PDA
Given PDA P = (Q, Σ, Γ, , q, F)
Construct a CFG G = (V, Σ, R, S) such that
L(G) = L(P)
First, simplify P to have the following form:
(1) It has a unique accept state, qacc
(2) It empties the stack before accepting
(3) Each transition either pushes a symbol or
pops a symbol, but not both at the same time
SIMPLIFY
q0
ε,ε → $
q1
0,ε → 0
1,0 → ε
q3
ε,$ → ε
q2
1,0 → ε
SIMPLIFY
ε,ε → E
Q
q0
ε,ε → $
q1
1,0 → ε
ε,ε → ε
q3
ε,$ → ε
ε,ε → ε
q4
ε,E → ε
ε,σ → ε
0,ε → 0
q5
q2
1,0 → ε
SIMPLIFY
ε,ε → E
Q
q0
ε,ε → $
q1
ε,ε → D
1,0 → ε
q’0
q3
ε,D→ ε
q’3
ε,$ → ε
ε,ε → D
ε,D → ε
q4
ε,E → ε
ε,σ → ε
0,ε → 0
q5
q2
1,0 → ε
Idea For Our Grammar G:
For every pair of states p and q in PDA P,
G will have a variable Apq whose production
rules will generate all strings x that can take:
P from p with an empty stack
to q with an empty stack
V = {Apq | p,qQ }
S = Aq0qacc
ε,ε → E
Q
q0
ε,ε → $
q1
ε,ε → D
1,0 → ε
q’0
q3
q’3
ε,$ → ε
ε,ε → D
q4
ε,E → ε
q2
1,0 → ε
Aq0q1 generates? 
ε,D → ε
ε,σ → ε
0,ε → 0
q5
Aq1q2 generates?
{0n1n | n > 0}
Aq1q3 generates? 
WANT: Apq generates all strings that take p
with an empty stack to q with empty stack
ε,ε → E
Q
q0
ε,ε → $
q1
ε,ε → D
1,0 → ε
q’0
q3
q’3
ε,$ → ε
ε,ε → D
q4
ε,E → ε
q2
1,0 → ε
Aq0q1 generates? 
ε,D → ε
ε,σ → ε
0,ε → 0
q5
Aq1q2 generates?
{0n1n | n > 0}
AQq5 generates?
WANT: Apq generates all strings that take p
with an empty stack to q with empty stack
WANT: Apq generates all strings that take p
with an empty stack to q with empty stack
Let x be such a string
• P’s first move on x must be a push
• P’s last move on x must be a pop
Two possibilities:
1. The symbol popped at the end is exactly
the one pushed at the beginning
2. The symbol popped at the end is not
the one pushed at the beginning
x = ayb takes p with empty stack to q with empty stack
1. The symbol t popped at the end is exactly
the one pushed at the beginning
stack
height
input
string
r
s
a push t
pop t b
p ────x──── q
δ(p, a, ε) → (r, t)
δ(s, b, t) → (q, ε)
Apq → aArsb
2. The symbol popped at the end is not
the one pushed at the beginning
stack
height
input
string
p
r
Apq → AprArq
q
Formally:
V = {Apq | p, qQ }
S = Aq0qacc
For every p, q, r, s  Q, t  Γ and a, b  Σε
If (r, t)  (p, a, ε) and (q, ε)  (s, b, t)
Then add the rule Apq → aArsb
For every p, q, r  Q,
add the rule Apq → AprArq
For every p  Q,
add the rule App → ε
q0
ε,ε → $
q1
0,ε → 0
1,0 → ε
q3
ε,$ → ε
q2
1,0 → ε
q0
ε,ε → $
q1
0,ε → 0
1,0 → ε
q3
L(P) = ?
G(P) = ?
ε,$ → ε
q2
1,0 → ε
For all x,
Apq generates x


x can bring P from p with an empty stack
to q with an empty stack
Proof (by induction on the number of steps in
the derivation of x from Apq):
Inductive
Step:
Base Case:
The derivation has 1 step: App  ε
Assume true for derivations of length ≤ k and
prove true for derivations of length k+1:
Apq * x in k+1 steps
For all x,
Apq generates x


x can bring P from p with an empty stack
to q with an empty stack
Proof (by induction on the number of steps in
the derivation of x from Apq):
Inductive Step:
Assume true for derivations of length ≤ k and
prove true for derivations of length k+1:
Apq * x in k+1 steps
Case 1: First step in derivation: Apq → AprArq * x
so x = yz where Apr * y & Arq * z
Now use induction hypothesis
For all x,
Apq generates x


x can bring P from p with an empty stack
to q with an empty stack
Proof (by induction on the number of steps in
the derivation of x from Apq):
Inductive Step:
Assume true for derivations of length ≤ k and
prove true for derivations of length k+1:
Apq * x in k+1 steps
Case 2: Apq → aArsb so x = ayb where Ars * y
By induction, y can bring P from r with empty stack to s with
empty stack (or with any symbol t on top stack to t on top stack).
For all x,
Apq generates x


x can bring P from p with an empty stack
to q with an empty stack
Proof (by induction on the number of steps in
the derivation of x from Apq):
Inductive Step:
Assume true for derivations of length ≤ k and
prove true for derivations of length k+1:
Apq * x in k+1 steps
Case 2: Apq → aArsb so x = ayb where Ars * y
Because Apq → aArsb (r,t)  (p,a,ε) and (q, ε)  (s,b,t)
is a rule,there is a t:
state
push
state
alphabet
pop
For all x,
Apq generates x


x can bring P from p with an empty stack
to q with an empty stack
Proof (by induction on the number of steps in
the computation of P from p to q with empty
stacks, on input x):
Base Case: The computation has 0 steps
So it starts and ends in the same state:
We must show that App * x
But it must be that x = ε, so we are done
Inductive Step:
Assume true for computations of length ≤ k,
we’ll prove true for computations of length k+1
Suppose that P has a computation where x
brings p to q with empty stacks in k+1 steps
Two cases:
1. The stack is empty somewhere in the
middle of the computation, say at state r.
Write x as yz (where the stack is empty after
reading y from p to r and after reading z from r to q).
So, Apr * y, Arq * z by I.H.
Recall, Apq → AprArq is a rule (by construction).
So Apq → AprArq → yArq → yz. So Apq * x
Inductive Step:
Assume true for computations of length ≤ k,
we’ll prove true for computations of length k+1
Suppose that P has a computation where x
brings p to q with empty stacks in k+1 steps
Two cases:
2. The stack is empty only at the beginning
and the end of this computation
Write x as ayb. There must be states r (immediately
following p) and s (immediately before q) where y goes from r
on empty stack to s on empty stack. p r…s q (why?)
So, Ars * y by I.H.
Now, P must push some t at p and pop t at s. (why?)
So grammar must have rule Apq → aArsb. So Apq * x
A Language L is generated by a CFG

L is recognized by a PDA
Corollary: Every regular
language is context-free