
Notes for Comp 497 (Comp 454)
Week 10
4/5/05
Today we look at the last two chapters in Part II. Cohen presents some results concerning
context-free languages (CFL) and regular languages (RL), as well as some decidability issues.
Chapter 17
Errata (Chapter 17)
p. 383, two lines from the end, make that "(baabbbbb)(a)"
p. 392, eight lines from the end, append "in" to the line.
Earlier we looked at the union, intersection and Kleene closure of regular languages. Let
us see what properties context-free languages have both in themselves and in conjunction
with regular languages.
First, where L1 and L2 are context-free languages, what can we say about
(a) Union L1 + L2
(b) Product L1L2
(c) Closure L1* ?
All these turn out to be CFLs. We can prove this in a couple of ways: (1)
using grammars for the languages, or (2) using machines. We use grammars here; Cohen
also discusses machine-based proofs.
(a) Union
THEOREM 36
If L1 and L2 are context-free languages, so is their union L1 + L2.
For each of L1 and L2 there is a CFG. Call these grammars CFG1 and CFG2 respectively.
We modify CFG1 by adding a subscript (1) to each nonterminal; thus X becomes X1.
Similarly, we modify CFG2, adding a different subscript (2) to each nonterminal. Now
we can combine the grammars into a single grammar with no ambiguity. Finally, we add
a new rule
S → S1 | S2
Thus we have devised a CFG that generates L1 + L2 and shown it is therefore a CFL.
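As a concrete sketch of this construction (in Python, with a hypothetical grammar representation: a dict mapping each nonterminal to a list of right-hand sides, each a tuple of symbols; symbols not appearing as keys are terminals):

```python
# Sketch of the Theorem 36 construction. The representation is an
# assumption made for illustration, not Cohen's notation.

def subscript(grammar, tag):
    """Rename every nonterminal X to X<tag> so two grammars cannot clash."""
    def rename(sym):
        return sym + tag if sym in grammar else sym
    return {nt + tag: [tuple(rename(s) for s in rhs) for rhs in rules]
            for nt, rules in grammar.items()}

def union_grammar(cfg1, cfg2, start1="S", start2="S"):
    g1 = subscript(cfg1, "1")
    g2 = subscript(cfg2, "2")
    combined = {**g1, **g2}
    # New start symbol with the rule S -> S1 | S2
    combined["S"] = [(start1 + "1",), (start2 + "2",)]
    return combined

# Example: L1 = {a^n b^n}, L2 = {c}
cfg1 = {"S": [("a", "S", "b"), ("a", "b")]}
cfg2 = {"S": [("c",)]}
print(union_grammar(cfg1, cfg2)["S"])  # [('S1',), ('S2',)]
```

The same subscripting helper covers Theorems 37 and 38: only the new start rule differs.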
(b) Product
THEOREM 37
If L1 and L2 are context-free languages, so is their product L1L2.
Similar proof to that of Theorem 36 - subscripting followed by addition of
S → S1 S2
(c) Closure
THEOREM 38
If L is a CFL then so is L*.
Change S to S1 throughout the existing grammar then add new rule S → S S1 | Λ
What about
(d) intersection of two CFL and
(e) complement of a CFL?
(d) Intersection.
THEOREM 39
The intersection of two CFL may or may not be a CFL.
We can show that both possibilities exist.
For example if both L1 and L2 are regular then L1 ∩ L2 is regular (Theorem 12) and
therefore CF
But what if L1 = a^n b^n a^m (m, n > 0)? (This is a CFL; we can easily devise a CFG for it.)
And L2 = a^n b^m a^m (m, n > 0)? (Also a CFL; we can easily devise a CFG for it.)
The intersection of these two languages is a^n b^n a^n, which we know is non-context-free.
It turns out there is no algorithm that, given two CFLs, will determine whether their
intersection is a CFL.
(e) Complement
THEOREM 40
The complement of a CFL may or may not be a CFL.
Again we can show that both possibilities exist.
If L is regular, and therefore CF, its complement is also regular (Theorem 11) and
therefore CF.
Consider the following proof by contradiction (of the claim that the complement of any CFL is CF).
Assume that CFL' is always CF.
If L1 and L2 are CFL
then L1' and L2' are CFL (our assumption)
then (L1' + L2') is CFL (Theorem 36)
then (L1' + L2')' is CFL (our assumption again)
then L1 ∩ L2 is CFL (de Morgan's law)
But we know that the intersection of two CFL is not always CF so it must be the case that
our assumption is wrong and CFL’ is not always CF.
Mixing context-free and regular languages
Union: CF + RL
The union of a CFL and a RL is CF because the RL is CF and we can apply Theorem 36. The
union may or may not be regular, depending on which is the "larger" language.
E.g. 1: PALINDROME + (a+b)* : (a+b)* is larger, therefore the union is regular
E.g. 2: PALINDROME + a* : PALINDROME is larger, therefore the union is non-regular
Intersection: CFL ∩ RL
THEOREM 41
The intersection of a CFL and a RL is always CF.
Proof is by construction. Given a PDA for the CFL and a FA for the RL we can construct
a PDA for the intersection language. The states of the new PDA are combinations of the
old PDA states and the FA states. Cohen sketches the construction logic on pages 394-395
and then shows how to construct a PDA which recognizes the intersection of
EQUAL (the CFL of strings with the same number of a's as b's) and ENDA (the RL of strings
ending in a).
Consider also DOUBLEWORD ( = ww where w is any string in (a+b)* )
We know from Chapter 10 that it is non-regular
We know from Chapter 16 that it is non-context-free (pumping lemma proof)
We can also prove it is non-context-free by means of Theorem 41 and a careful choice of a
regular language with which to intersect it. Consider the intersection of DOUBLEWORD
with
aa*bb*aa*bb*
The intersection is L = ww where w = a^n b^m (m, n > 0), i.e. a^n b^m a^n b^m.
But we know this language is non-context-free (see last week), so DOUBLEWORD must be
non-context-free also; otherwise it would contradict Theorem 41.
Chapter 18
Errata (Chapter 18)
p. 410, line 10, replace “B → A” by “B → bAA”
In this chapter we look at some decidability issues similar to those considered in Chapter
11 for RL. Paraphrasing Cohen, the first group of questions (p. 402) is:
1. Do two CFG define the same language?
2. Is a particular CFG ambiguous?
3. If a CFG is ambiguous, is there another CFG defining the same language that is not?
4. How can we tell if (CFL)’ is CF?
5. How can we tell if CFL1 ∩ CFL2 is CF?
6. Given CFG1 defining CFL1 and CFG2 defining CFL2 is CFL1 ∩ CFL2 empty?
7. Given a CFG defining a CFL, is the CFL ≡ (a+b)*?
These questions are all undecidable – no algorithm can exist for any of them. We will
see more undecidable questions in Part III of the book.
However, there are still some questions concerning CFG that we can answer:
1. (Emptiness) Does a particular CFG generate any words at all?
2. (Finiteness) Given a CFG, is the CFL it generates finite or infinite?
3. (Membership) Given a CFG, is a particular word w in the language it generates?
We will see similar questions to these in Part III.
Emptiness
THEOREM 42
Given a CFG, there is an algorithm to determine if it generates any words at all
We can prove this is true by finding such an algorithm.
We can tell if Λ is in the language (is S nullable?).
Assume it is not, and convert the grammar to CNF.
Find a rule of the form N → t and back-substitute.
If S is eliminated, we are done: the CFG produces a string.
If we cannot eliminate S, the CFG produces no strings.
This is an algorithm because the back substitution must terminate in a finite number of
steps – there are a finite number of rules in the grammar.
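The back-substitution can equivalently be phrased as a marking loop: repeatedly mark nonterminals whose entire right-hand side is already known to derive terminals. A minimal Python sketch, with a hypothetical representation (rules as a list of (nonterminal, rhs-tuple) pairs; symbols never appearing on a left-hand side are terminals):

```python
# Marking version of the Theorem 42 back-substitution (sketch only;
# the rule representation is an assumption, not the book's notation).

def generates_a_word(rules, start="S"):
    nonterminals = {lhs for lhs, _ in rules}
    productive = set()          # nonterminals known to derive a terminal string
    changed = True
    while changed:              # terminates: the set only grows and is finite
        changed = False
        for lhs, rhs in rules:
            if lhs not in productive and all(
                    s not in nonterminals or s in productive for s in rhs):
                productive.add(lhs)
                changed = True
    return start in productive

# A hypothetical grammar with no terminal-only rules: S is never eliminated.
empty = [("S", ("X", "Y")), ("X", ("S", "Y")), ("Y", ("S", "X"))]
print(generates_a_word(empty))                                   # False
print(generates_a_word(empty + [("X", ("a",)), ("Y", ("b",))]))  # True
```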
The example on page 405 is of a grammar that does produce strings. The example on
page 406 is of a grammar that does not produce any strings. If you draw the derivation
tree of the page 406 grammar, you will see that we can never get rid of all the
nonterminals in a working string.
The following Theorem is somewhat related
THEOREM 43
Given a CFG with nonterminal X, there is an algorithm to determine if X is ever used in
the generation of words.
Again we can prove this Theorem by devising such an algorithm. We can break the
problem down into two subproblems: (i) can we generate a string of terminals from X?
(ii) can we obtain from S a working string containing X?
(i) In a copy of the grammar (CFG2), exchange S and X wherever they occur. Now
apply the algorithm of Theorem 42 to CFG2. If CFG2 produces any words, then X
in CFG produces words.
(ii) Back-substitute X to see if we can obtain a working string from S. In the
example on page 408, X can be produced from S if A can be produced from S
(second rule). A can be produced from S according to the first rule, so X can be in
a working string from S.
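The two subproblem checks can be sketched as follows (same hypothetical rule representation as before: a list of (nonterminal, rhs-tuple) pairs). Note this pairs a productivity check with a reachability check; a fully rigorous "used in generating words" test would also verify that the rest of the working string around X can be completed, which this sketch does not do:

```python
def productive_set(rules):
    """Part (i): which nonterminals can derive a string of terminals?"""
    nonterminals = {lhs for lhs, _ in rules}
    productive, changed = set(), True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in productive and all(
                    s not in nonterminals or s in productive for s in rhs):
                productive.add(lhs)
                changed = True
    return productive

def reachable_set(rules, start="S"):
    """Part (ii): which nonterminals can appear in a working string from S?"""
    nonterminals = {lhs for lhs, _ in rules}
    reachable, frontier = {start}, [start]
    while frontier:
        nt = frontier.pop()
        for lhs, rhs in rules:
            if lhs == nt:
                for s in rhs:
                    if s in nonterminals and s not in reachable:
                        reachable.add(s)
                        frontier.append(s)
    return reachable

def is_used(rules, x, start="S"):
    return x in productive_set(rules) and x in reachable_set(rules, start)

# Hypothetical example: Y derives a word but never appears under S.
rules = [("S", ("A", "b")), ("A", ("X",)), ("X", ("a",)), ("Y", ("a",))]
print(is_used(rules, "X"))  # True:  S => Ab => Xb => ab
print(is_used(rules, "Y"))  # False: productive but unreachable
```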
Finiteness
THEOREM 44
There is an algorithm to decide if a given CFG generates a finite or an infinite language.
Again the proof is to construct such an algorithm.
Note that if any word in the language is long enough for the pumping lemma to apply, we
can pump it to produce an infinite language. Conversely, if the language is infinite, there
must be some words long enough for the pumping lemma to apply. So we need to determine
whether the pumping lemma can be applied; that is, we need to see if there is a
self-embedded nonterminal involved in the derivation of words in the language.
Algorithm
(1) eliminate useless nonterminals from the grammar (see Theorem 43)
(2) back substitution similar to Theorem 43 to see if there is a self-embedded
(directly or indirectly) nonterminal.
Example on p. 410-411
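For a CNF grammar whose useless nonterminals have already been removed (step 1), the self-embedding check of step 2 reduces to asking whether any nonterminal can reach itself in the "appears on a right-hand side of" graph. A sketch under those assumptions:

```python
# Finiteness test sketch (assumes a CNF grammar with useless
# nonterminals already eliminated; rules is a list of
# (nonterminal, rhs-tuple) pairs -- a hypothetical representation).

def is_infinite(rules):
    nonterminals = {lhs for lhs, _ in rules}
    # Edge A -> B when B occurs on a right-hand side of a rule for A.
    succ = {nt: set() for nt in nonterminals}
    for lhs, rhs in rules:
        succ[lhs].update(s for s in rhs if s in nonterminals)

    def reaches(a, b):
        seen, frontier = set(), [a]
        while frontier:
            n = frontier.pop()
            for m in succ[n]:
                if m == b:
                    return True
                if m not in seen:
                    seen.add(m)
                    frontier.append(m)
        return False

    # Infinite iff some nonterminal is (directly or indirectly)
    # self-embedded, i.e. can reach itself.
    return any(reaches(nt, nt) for nt in nonterminals)

# {a^n b^n} in CNF: S -> AX | AB, X -> SB, A -> a, B -> b  (infinite)
cnf = [("S", ("A", "X")), ("S", ("A", "B")),
       ("X", ("S", "B")), ("A", ("a",)), ("B", ("b",))]
print(is_infinite(cnf))                                                # True
print(is_infinite([("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))]))  # False
```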
Membership
We would like to be able to determine if the language defined by a particular CFG
contains a particular word w.
THEOREM 45
Given a CFG and string x (over the same alphabet), we can determine if x can be
generated by the CFG
Once more, the proof of the theorem is the demonstration that an algorithm exists to
answer the question. We assume the CFG is in CNF.
We wish to determine which substrings of x = x1 x2 ... xn are derivable from which
nonterminals in the grammar S, N1, N2, ..., Nm.
The substrings of length 1 are easy to identify because they are on the RHS of rules of the
form N → t. For each substring of length 1 we have a list of the nonterminals that can
produce it.
For a substring of length 2, say xi xi+1, to be producible,
xi must be producible from some Np,
xi+1 must be producible from some Nq,
and there must be a rule Nr → Np Nq.
For each producible substring of length 2 we have a list of the nonterminals (the Nr's)
that can produce it.
Similarly, we can determine which substrings of length 3 are producible, then which
substrings of length 4, and so on. Eventually, we will consider the (sub)string of length n
(the whole word of interest). If S is among the nonterminals that can produce it,
then x is in the language. Because the string x is finite in length, the algorithm terminates.
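This table-filling procedure is the CYK algorithm (needed for Homework question 4). A compact sketch, again with a hypothetical rule representation (rhs is either a single terminal or a pair of nonterminals, as CNF requires):

```python
# CYK membership test for a CNF grammar (sketch).

def cyk(rules, x, start="S"):
    n = len(x)
    if n == 0:
        return False  # the Λ case is handled separately, before CNF conversion
    # table[length][i] = set of nonterminals producing x[i : i+length]
    table = [[set() for _ in range(n)] for _ in range(n + 1)]
    for i, t in enumerate(x):                 # substrings of length 1
        table[1][i] = {lhs for lhs, rhs in rules if rhs == (t,)}
    for length in range(2, n + 1):            # longer substrings
        for i in range(n - length + 1):
            for split in range(1, length):    # x[i:i+split] + x[i+split:i+length]
                for lhs, rhs in rules:
                    if (len(rhs) == 2 and rhs[0] in table[split][i]
                            and rhs[1] in table[length - split][i + split]):
                        table[length][i].add(lhs)
    return start in table[n][0]

# CNF grammar for {a^n b^n}: S -> AX | AB, X -> SB, A -> a, B -> b
cnf = [("S", ("A", "X")), ("S", ("A", "B")),
       ("X", ("S", "B")), ("A", ("a",)), ("B", ("b",))]
print(cyk(cnf, "aabb"))   # True
print(cyk(cnf, "aab"))    # False
```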
Parsing Simple Arithmetic
In chapter 3 we had a simple (recursive) grammar (AE) for arithmetic expressions.
However, it did not reflect the different operator precedences. A better grammar,
PLUS-TIMES, is given on page 414. It distinguishes between a lower-precedence operator (+)
and a higher-precedence operator (*). We can easily extend the grammar to include
subtraction and division operators. We can also extend the grammar to include operators
with precedence greater than * and operators with precedence lower than +.
The derivation tree for an arithmetic expression generated using PLUS-TIMES will
properly reflect operator precedence (see p. 416).
The parsing problem is how to determine whether a string x is in a language L. In this
case whether a string is a valid arithmetic expression. Two approaches are:
(1) Top-down – start with S and see if we can generate x
(2) Bottom-up – start with x and see if we can reduce it to S
(1) Top-down: On pages 416-420, Cohen gives an example of this approach, showing
how a derivation tree is grown and pruned and how, in this case, a tree can be
constructed with x as its leaves. Note that we don't need to grow the tree in full;
we can explore a branch, then backtrack to the parent and try another if it doesn't
work out.
(2) Bottom-up: On pages 421-423. In this case the root of the tree is the string x and
we try to construct a path to a leaf S.
We know from Data Structures classes that a postfix (Reverse Polish) arithmetic
expression can be evaluated using a stack of operands. A PDA has a stack so it seems
reasonable to devise a PDA that can read a postfix expression and output its value. We
need to add ADD, MPY and PRINT operators to our existing set of nodes. This PDA is
given on page 424.
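The idea behind that PDA, minus the automaton machinery, is the familiar stack evaluation of postfix. A sketch (this works on whitespace-separated tokens rather than the book's input format, which is an assumption for readability):

```python
# Stack evaluation of a postfix (Reverse Polish) expression.

def eval_postfix(expr):
    stack = []
    for token in expr.split():
        if token == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)       # ADD: pop two operands, push the sum
        elif token == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)       # MPY: pop two operands, push the product
        else:
            stack.append(int(token))  # operand: push it
    return stack.pop()                # PRINT: the value left on the stack

print(eval_postfix("3 4 5 * +"))      # 23, i.e. 3 + (4 * 5)
```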
We also know from Data Structures that Dijkstra’s algorithm for converting infix to
postfix also uses a stack. This time the stack contains operators and open parentheses.
The PDA for this process is on Page 427. Input is an infix expression, output is the
corresponding postfix.
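The operator-stack idea behind that conversion can be sketched as follows (a minimal version in the spirit of Dijkstra's algorithm; it assumes single-character operand tokens, only the + and * operators, and well-formed input):

```python
# Minimal infix-to-postfix conversion using an operator stack.

PRECEDENCE = {"+": 1, "*": 2}

def infix_to_postfix(expr):
    output, stack = [], []           # stack holds operators and '('
    for token in expr.replace(" ", ""):
        if token in PRECEDENCE:
            # Pop operators of greater or equal precedence first.
            while stack and stack[-1] != "(" and \
                    PRECEDENCE[stack[-1]] >= PRECEDENCE[token]:
                output.append(stack.pop())
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack[-1] != "(":  # pop until the matching parenthesis
                output.append(stack.pop())
            stack.pop()              # discard the '('
        else:
            output.append(token)     # operand goes straight to the output
    while stack:
        output.append(stack.pop())
    return " ".join(output)

print(infix_to_postfix("3 + 4 * 5"))    # 3 4 5 * +
print(infix_to_postfix("(3 + 4) * 5"))  # 3 4 + 5 *
```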
Reading Assignment
Read Chapters 17 and 18
Homework #5
Here is Homework #5 due 4/12. Each question is worth 20 points. Covers Chapters 16,
17 and 18.
1.
Consider the grammar for { a^n b^n }
S → aSb | ab
(a) Chomsky-ize this grammar.
(b) Find all derivation trees that do not have self-embedded nonterminals.
2.
(a) Find a CFG for the language L = (bb)*a
(b) Find a CFG for L* (where L is the language in part (a))
3.
Show that
L = { a^p b^q a^p b^s }
is context-free.
4.
For each of the following two grammars and target strings, determine whether the
string can be generated from the grammar (use the CYK algorithm).
(a)
S → XS
X → XX
X→a
S→b
String baab
(b)
S → AB
A → BB | a
B → AB | b
String bbaab
5.
For each of the following two grammars, determine whether it generates any
words (use the algorithm of Theorem 42)
(a)
S → XY
X → SY
Y → SX
X→a
Y→b
(b)
S → AB
A → BSB
B → AAS
A → CC
B → CC
C → SS
A→a|b
C → b | bb