2.7 Turing Machines and Grammars
We now turn our attention back to Turing Machines as language acceptors. We have already seen in Sec. 2.4 how Turing Machines define two classes of languages, i.e., recursive languages and recursively enumerable languages, depending on whether string membership in the respective languages can be decided or merely accepted.
Recursive and Recursively Enumerable Languages
Accepting vs Deciding
M accepts a language when it halts on every member string. M decides a language when it halts (resp. hangs) on a string that is (resp. is not) a member of the language.
Accepting, Deciding and Languages
Recursive Languages are decidable. Recursively Enumerable Languages are accepted (recognised) by Turing Machines.
Slide 58
Chomsky Language Class Hierarchy
Regular Languages: Recognised by Regular Grammars.
Context Free Languages: Recognised by Context Free Grammars.
...
Phrase Structured Languages: Recognised by Phrase Structured Grammars.
Slide 59
An alternative way to categorise languages is by using (Phrase Structured) Grammars. But how do
these language classes compare in relation to those defined by types of Turing Machines? In an earlier
course, we have already laid some foundation towards answering this question by considering machines that
characterise these language classes.
For instance, recall that regular languages were recognised by Finite State Automata. We also saw how the complement of a regular language is also recognised by an FSA. Thus, if we could construct two Turing Machines corresponding to the FSAs that recognise a regular language L and its complement, L̄, respectively, then we could construct a third Turing Machine from these two that dovetails the two in search of whether x ∈ L or x ∈ L̄; the third Turing Machine terminates as soon as either of the sub-machines terminates, and returns true if the Turing Machine accepting x ∈ L terminates and false if the Turing Machine accepting x ∈ L̄ terminates. Clearly, this third Turing Machine will always terminate since, for any x, we have either x ∈ L or else x ∈ L̄. This leads us to conclude that regular languages are included in the recursive languages. The observation is completed by noting that a Turing Machine can easily act as an
Regular Grammars and Recursive Languages
Theorem 41. LReg ⊂ LRec
Proof. Consider both inclusion relations:
⊆: FSA transitions can be encoded as Turing Machine transitions of the form
δ(q1, a) = (q2, R)
⊉: Palindromes,
{w(w)^R, wa(w)^R | w ∈ Σ*, a ∈ Σ},
are in LRec but not in LReg.
Slide 60
FSA; all transitions would need to be of the form
δ(q1, a) = (q2, R)
whereby we only read from the tape and transition internally from one state to the other, never writing on the tape or moving in the other direction. This brief (and informal) analysis allows us to affirm our intuition that regular languages are included in recursive languages, i.e., LReg ⊆ LRec.
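To make the dovetailing construction above concrete, here is a minimal Python sketch; the generator-based recognisers and all names in it are our own illustrative assumptions, not part of the notes. Two recognisers are run in alternation, one step each, and the decider answers according to whichever accepts first.

```python
def dovetail_decider(recognise_L, recognise_co_L, x, max_steps=10_000):
    # Run both recognisers one step at a time; exactly one of them is
    # assumed to eventually accept any given x (as argued above).
    run_L, run_co_L = recognise_L(x), recognise_co_L(x)
    for _ in range(max_steps):            # bound added only to keep the sketch safe
        if next(run_L) is True:
            return True                   # the machine for L accepted: x in L
        if next(run_co_L) is True:
            return False                  # the machine for the complement accepted
    raise RuntimeError("step bound exceeded")

def make_recogniser(predicate):
    # A toy recogniser: accepts (yields True) only when the predicate holds,
    # otherwise keeps yielding None forever, mimicking a non-halting run.
    def recognise(x):
        steps = 0
        while True:
            steps += 1
            yield True if (steps > len(x) and predicate(x)) else None
    return recognise

even_as = make_recogniser(lambda x: x.count("a") % 2 == 0)  # L: even number of a's
odd_as = make_recogniser(lambda x: x.count("a") % 2 == 1)   # complement of L
print(dovetail_decider(even_as, odd_as, "abab"))  # True ("abab" has two a's)
print(dovetail_decider(even_as, odd_as, "aba"))   # False (odd number of a's)
```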
In an earlier exercise we also discussed how palindromes are decidable, and by this fact we conclude that they are included in the recursive languages. However, palindromes cannot be recognised by FSAs (recall that we needed more powerful machines like Pushdown Automata for this). This allows us to establish the strict inclusion
LReg ⊂ LRec
In the next subsections we will attempt to complete the picture of how phrase structured languages relate
to recursive and recursively enumerable languages.
2.7.1 Turing Machines and Context Free Languages
Context Free Languages (CFLs) are languages that are recognised by Context Free Grammars, i.e., grammars whose production rules are of the form N → (N ∪ Σ)*. They are also the languages that are recognised by a type of machine called Pushdown Automata. The type we considered in an earlier course were actually Non-deterministic Pushdown Automata (NPDAs). We shall use this key information to show how languages recognised by Turing Machines relate to CFLs.
The relationship we will show is outlined on Slide 61, i.e., that there is a strict inclusion of CFLs within the recursive languages. In order to show this we have to demonstrate that:
1. Every CFL can be decided by some Turing Machine.
2. There are recursive languages that are not CFL.
As in the case of Slide 60, in order to prove the second point above, we only need to find one witness
language, which together with the first point above would mean that the set of Recursive languages is
Context Free Languages
Theorem 42. Context Free Languages are strictly included in Recursive Languages.
LCFG ⊂ LRec
Proof. Consider both inclusion relations:
⊆: CFGs can be converted into Chomsky Normal Form, where derivations for a string w are bounded by at most 2|w| − 1 steps.
⊉: The language {a^n b^n c^n | n ≥ 0} is in LRec but not in LCFG.
Slide 61
strictly larger than that of Context Free Languages. The language {a^n b^n c^n | n ≥ 0} satisfies our needs as this witness language. Recall that, in an earlier course, we had established that this language cannot be recognised by any Context Free Grammar. Moreover, on Slide 35 we showed how we can construct a Turing Machine that can decide this language, i.e., showing that it is recursive.
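The language is simple enough that its decidability can also be illustrated directly; the Python check below is a stand-in for the Turing Machine of Slide 35 (it is not that machine, just a procedure that visibly terminates on every input).

```python
def decide_anbncn(w: str) -> bool:
    # Accept exactly the strings a^n b^n c^n; the check always terminates,
    # so this procedure *decides* the language rather than merely accepting it.
    n = len(w) // 3
    return w == "a" * n + "b" * n + "c" * n

assert decide_anbncn("")           # n = 0
assert decide_anbncn("aabbcc")     # n = 2
assert not decide_anbncn("aabbc")  # unbalanced blocks are rejected
```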
This leaves us with the task of showing that every CFL is decidable by a Turing Machine. It turns out that, with our present machinery, the proof to show this would be rather involved⁵. Instead, here we prove a weaker result, namely that CFLs are included in the set of recursively enumerable languages. This follows from Lemma 43 of Slide 62.
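To give a feel for the bounded-derivation argument of footnote 5, here is a hedged Python sketch: assuming a grammar already in Chomsky Normal Form, it explores all sentential forms for at most 2|w| − 1 derivation steps and answers negatively once that bound is exhausted. The rule encoding is our own invention for the sketch.

```python
def cnf_member(w, start, rules):
    # rules maps a non-terminal to right-hand sides: either a terminal
    # string "a" or a pair of non-terminals ("A", "B") -- Chomsky Normal Form.
    bound = max(2 * len(w) - 1, 1)       # CNF derivations of w take <= 2|w|-1 steps
    forms = {(start,)}
    for _ in range(bound):
        new_forms = set()
        for form in forms:
            for i, sym in enumerate(form):
                for rhs in rules.get(sym, []):
                    expansion = (rhs,) if isinstance(rhs, str) else rhs
                    grown = form[:i] + expansion + form[i + 1:]
                    if len(grown) <= max(len(w), 1):   # CNF forms never shrink
                        new_forms.add(grown)
        forms |= new_forms
        if tuple(w) in forms:
            return True
    return False                          # bound exhausted: w is not in L(G)

rules = {"S": [("A", "B")], "A": ["a"], "B": ["b"]}   # L(G) = {"ab"}
print(cnf_member("ab", "S", rules))   # True
print(cnf_member("aa", "S", rules))   # False, after exhausting the bound
```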
Context Free Languages and Recursively Enumerable Languages
Lemma 43 (CFL and Acceptability). Every CFL can be recognised by some Turing Machine.
Proof. Use a 2-tape non-deterministic Turing Machine whereby:
• 1st tape simulates the input tape with the head moving only to the right.
• 2nd tape simulates the stack (push and pop) using a string.
Slide 62
(Proof Outline). If a language is Context Free, then there exists an NPDA that can recognise it. Unfortunately, the inherent non-determinism in NPDAs does not allow us to state much about the termination of every run of such a machine. All that NPDA recognition gives us is that there exists at least one run that accepts strings in the recognised CFL, i.e., we only have termination guarantees for strings in the language, and the non-determinism of the machine prohibits us from stating anything about decidability.
⁵ This proof involves converting the CFG to its Chomsky Normal Form. Then we use the result that, for CFGs in normal form, any derivation of a string w is bounded and requires at most 2n − 1 steps, where n = |w|. Since derivations have an upper bound, we can construct a membership-checking Turing Machine that always terminates (and returns a negative answer after 2n − 1 derivation steps).
Nevertheless, we can use a 2-tape Turing Machine to easily simulate an NPDA, whereby we use the first tape as the input tape (leaving the input string untouched and always moving the head to the right) and use the second tape to simulate the NPDA stack (adding and removing symbols at the rightmost position of the string on the second tape). In order to keep the simulation simpler, we can even use a non-deterministic Turing Machine. Such a simulation, together with the results from Sections 2.6.2 and 2.6.3, guarantees that there exists some deterministic Turing Machine that can simulate the NPDA and therefore recognise the language.
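The following Python sketch captures the same idea in miniature: a breadth-first (dovetailed) traversal of NPDA configurations, which is exactly what a deterministic machine can do when simulating a non-deterministic one. The rule format and the empty-stack acceptance condition are assumptions made for the sketch.

```python
from collections import deque

def npda_accepts(w, rules, start, finals):
    # rules: list of (state, read, pop, new_state, push); read/pop may be ""
    # to mean 'consume nothing' / 'pop nothing'. Acceptance: input consumed,
    # empty stack, final state. Like the construction above, this recognises
    # (it may loop on rejected inputs for some NPDAs) rather than decides.
    frontier, seen = deque([(start, 0, "")]), set()
    while frontier:
        cfg = frontier.popleft()
        if cfg in seen:
            continue
        seen.add(cfg)
        state, pos, stack = cfg
        if state in finals and pos == len(w) and stack == "":
            return True                   # at least one accepting run exists
        for (st, read, pop, new_st, push) in rules:
            if st != state:
                continue
            if read and (pos >= len(w) or w[pos] != read):
                continue
            if pop and not stack.endswith(pop):
                continue
            frontier.append((new_st, pos + len(read),
                             stack[:len(stack) - len(pop)] + push))
    return False

# even-length palindromes over {a, b}: push, guess the midpoint, then match
rules = [("push", "a", "", "push", "a"), ("push", "b", "", "push", "b"),
         ("push", "", "", "pop", ""),                      # guess the midpoint
         ("pop", "a", "a", "pop", ""), ("pop", "b", "b", "pop", "")]
print(npda_accepts("abba", rules, "push", {"pop"}))   # True
print(npda_accepts("aba", rules, "push", {"pop"}))    # False
```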
2.7.2 Turing Machines and Phrase Structured Languages
Generic (Phrase Structured) Grammars (PSGs), like Turing Machines, can be seen as a mechanical description for transforming a string into some other string. There are, however, three key differences between the two models of computation:
• Turing Machines, at least the plain vanilla variant, are deterministic. This is not the case for PSGs, where the production rules may allow non-deterministic derivations, i.e., α → β1 and α → β2 for the same substring α. Moreover, PSGs allow expansions to happen at multiple points in the string being generated, whereas Turing Machines can only alter the string at the location pointed to by the head.
• PSGs do not express any explicit notion of state, whereas Turing Machine descriptions are more intensional, i.e., closer to an "implementation". In fact, state plays a central role in determining computation termination in Turing Machines.
• String acceptance in Turing Machines starts from the string and works its way back, whereas PSG string acceptance works in reverse by generating the string.
Phrase Structured Languages and Recursively Enumerable Languages
Both transform strings to strings but:
• Turing Machines, at least the plain vanilla variant, are deterministic.
• PSGs do not express any explicit notion of state.
• String acceptance in Turing Machines starts from the string and works its way back, whereas PSG string acceptance works in reverse by generating the string.
Slide 63
In what follows we will show that, despite these discrepancies, the two formalisms are equally expressive. By this we mean that every language that can be recognised by a PSG, i.e., any PSL, can be recognised by a Turing Machine, and also that any language recognised by a Turing Machine can be recognised by a PSG. Thm. 44 on Slide 64 formalises this statement whereby, for the sake of notational consistency, we denote PSLs as LPSG and the recursively enumerable languages as LRE.
In order to show LRE ⊆ LPSG we need to establish some correspondence between computation on a Turing Machine and string derivations in a Phrase Structure Grammar. We start by formulating a string
Phrase Structured Languages and Recursively Enumerable Languages
Theorem 44. LPSG = LRE
Proof. We need to show:
1. LPSG ⊆ LRE
2. LRE ⊆ LPSG
Slide 64
description of a configuration. There are many possibilities here (e.g., a direct representation), but what we are looking for is a representation that can be easily manipulated by a grammar. One such representation is to encode a configuration ⟨q, xay⟩ as the string [xqay] where:
• q, apart from denoting the current machine state, is also used to denote the position of the head on the tape. For instance, in [xqay], the symbol pointed to by the head would be the symbol immediately following q, i.e., a.
• x, y ∈ Σ*. We therefore represent configurations where the head is at the first location as [qay] for some a ∈ Σ, i.e., x = ε, and configurations where the head is at the far right of the string on the tape as [xq#], i.e., a = # and y = ε.
• The auxiliary symbols [ and ] act as delimiters of the string on the Turing Machine tape. The left delimiter, [, is used to model crashing when the head attempts to move past the leftmost location on the tape. The right delimiter, ], is used to signal the need for more padding # symbols when the head attempts to move past the final symbol of the string represented (recall that the tape being modelled is infinite to the right, but, in configurations, we only represent up to the last non-blank symbol on the tape (Slide 16)).
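A small Python sketch of this encoding (assuming, purely for simplicity, single-character state names and tape symbols):

```python
def encode_config(q, x, a, y):
    # Encode the configuration <q, xay> as the string [xqay]; the head is
    # on the symbol immediately following q, namely a.
    return "[" + x + q + a + y + "]"

def decode_config(s, states):
    # Recover (q, x, a, y) from [xqay]; states is the set of state names.
    body = s[1:-1]
    for i, ch in enumerate(body):
        if ch in states:
            return ch, body[:i], body[i + 1], body[i + 2:]
    raise ValueError("no state symbol found")

print(encode_config("q", "ab", "c", "d"))   # [abqcd]
print(decode_config("[abqcd]", {"q"}))      # ('q', 'ab', 'c', 'd')
```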
We use this encoding when stating Lemma 45 on Slide 66, which formalises the discussion relating Turing Machine computation with string derivation in a grammar. Thm. 46 on Slide 67 then consolidates this result for complete derivations starting from S down to strings that are part of the language of the grammar; this result then entails LRE ⊆ LPSG.
(Proof Outline for Lemma 45). We here outline how to construct such a grammar G without going into the details of why the Lemma holds with such a construction. Thus, for M = ⟨Q, Σ, δ⟩, our constructed grammar would be G = ⟨Σ′, N, P, S⟩ where
• Σ′ = Σ ∪ {[, ], qH}
• N = {S} ∪ Q
• P is constructed as follows:
– For all a1, a2 ∈ Σ, q1 ∈ Q, q2 ∈ Q ∪ {qH} such that (q1, a1) ↦ (q2, a2) we add the production rule
q1 a1 → q2 a2
Establishing Correspondence for LRE ⊆ LPSG
• Encode ⟨q, xay⟩ as the string [xqay].
• For M = ⟨Q, Σ, δ⟩ have
G = ⟨(Σ ∪ {[, ], qH}), ({S} ∪ Q), P, S⟩
where P contains:
(q1, a1) ↦ (q2, a2) : q1 a1 → q2 a2
(q1, a1) ↦ (q2, R) : q1 a1 a2 → a1 q2 a2 and q1 a1 ] → a1 q2 #]
(q1, a1) ↦ (q2, L) : a2 q1 a4 → q2 a2 a4 (if a1 = a4)
                     a2 q1 # a3 → q2 a2 # a3 and a2 q1 #] → q2 a2 ] (if a1 = #)
(where a1, a2, a3 ∈ Σ, a4 ∈ Σ \ {#}, q1 ∈ Q, q2 ∈ Q ∪ {qH})
Slide 65
– For all a1, a2 ∈ Σ, q1 ∈ Q, q2 ∈ Q ∪ {qH} such that (q1, a1) ↦ (q2, R) we add the production rules
q1 a1 a2 → a1 q2 a2 and q1 a1 ] → a1 q2 #]
– For all a1, a2, a3 ∈ Σ, a4 ∈ Σ \ {#}, q1 ∈ Q, q2 ∈ Q ∪ {qH} such that (q1, a1) ↦ (q2, L) we add the following production rules:
If a1 = a4 then add a2 q1 a4 → q2 a2 a4
else if a1 = # then add a2 q1 # a3 → q2 a2 # a3 and a2 q1 #] → q2 a2 ]
Notice that moving to the right may sometimes require additional padding of # symbols in the respective grammar derivation step. Dually, moving to the left takes care of garbage-collecting any extra padding. Importantly though, moving left is only defined for a2 ∈ Σ, meaning that we can never move past the delimiter [. In such cases, the string expansion gets stuck, modelling a machine crash.
It is not that hard to ascertain from the definition of "yields", ⊢M, that every single-step computation, transforming one configuration to another, can be matched by a transformation of the corresponding strings in the relation ⇒G (and vice-versa). This correspondence then extends in straightforward fashion to any computation involving n steps.
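The rule generation itself is mechanical, as the following Python sketch illustrates. It assumes single-character states and tape symbols, and encodes the transition function delta as a dictionary from (state, symbol) pairs to either a written symbol or a move direction; the rules it emits follow the scheme of Slide 65.

```python
def grammar_rules(delta, sigma, blank="#"):
    # delta maps (q, a) to (q', a') for a write, or (q', "R") / (q', "L")
    # for a move; "R" and "L" are reserved and cannot be tape symbols.
    P = []
    for (q1, a1), (q2, act) in delta.items():
        if act not in ("R", "L"):                 # write action: q1 a1 -> q2 a2
            P.append((q1 + a1, q2 + act))
        elif act == "R":                          # move right (may need padding)
            for a2 in sorted(sigma):
                P.append((q1 + a1 + a2, a1 + q2 + a2))
            P.append((q1 + a1 + "]", a1 + q2 + blank + "]"))
        else:                                     # move left
            for a2 in sorted(sigma):
                if a1 != blank:                   # the a1 = a4 case of Slide 65
                    P.append((a2 + q1 + a1, q2 + a2 + a1))
                else:                             # a1 = #: garbage-collect padding
                    for a3 in sorted(sigma):
                        P.append((a2 + q1 + blank + a3, q2 + a2 + blank + a3))
                    P.append((a2 + q1 + blank + "]", q2 + a2 + "]"))
    return P

# one transition: in state q reading a, write b and move to the halting state H
print(grammar_rules({("q", "a"): ("H", "b")}, sigma={"a", "b", "#"}))
# [('qa', 'Hb')]
```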
Turing Machine Computation and Grammar Derivations
Lemma 45. Let M = ⟨Q, Σ, δ⟩ be a Turing Machine. Then there exists a grammar G such that, for any computation,
⟨q1, x1 a1 y1⟩ ⊢*M ⟨q2, x2 a2 y2⟩
iff
[x1 q1 a1 y1] ⇒*G [x2 q2 a2 y2]
where the sets {[, ]}, Σ and (Q ∪ {qH}) are mutually disjoint.
Slide 66
Turing Machine Computation and Grammar Derivations
Theorem 46. Let M = ⟨Q, Σ, δ⟩ be a Turing Machine. Then there exists a grammar G such that, for any computation,
⟨q0, a1 y1⟩ ⊢*M ⟨qH, x2 a2 y2⟩
iff
S ⇒+G [q0 a1 y1] ⇒*G [x2 qH a2 y2]
and
[x2 qH a2 y2] ∈ L(G)
where the sets {[, ]}, Σ and (Q ∪ {qH}) are mutually disjoint.
Slide 67
(Proof Outline for Thm. 46). We here use a grammar construction similar to the one used for the previous proof of Lemma 45, i.e., G′ = ⟨Σ′, N, P′, S⟩ where
• Σ′ = Σ ∪ {[, ], qH}
• N = {S, A} ∪ Q (notice the new non-terminal A)
We note that the productions in our earlier grammar G did not mention any use of S. For our purposes, we just need to set up our "initial configuration" encoding starting from S (which requires us to use an additional non-terminal A to do so, using simple context-free production rules). Thus, the set of production rules P′ contains all the rules discussed in P earlier, but is extended with rules for initialising the starting configuration (for an arbitrary input string) from the start symbol. This entails the following additional production rules:
• S → [q0 A] and S → [q0 #]
• A → aA and A → a for all a ∈ Σ
Together, the above rules allow us to derive S ⇒+G [q0 a1 y1] for arbitrary a1 ∈ Σ and y1 ∈ Σ*. Then the first part of our required result follows from Lemma 45.
Moreover, we note that, once computation reaches a halting configuration, the corresponding string encoding in G′ will consist entirely of terminal symbols (since qH is a terminal symbol, whereas all the other states are not - see the definition of Σ′). This makes such a string an acceptable string in L(G′). In fact, since all states q ∈ Q are made non-terminals in the definition of G′, our derivations in grammar G′ will always contain a non-terminal symbol until the final state qH replaces some state q ∈ Q.
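Continuing the sketch given after Slide 65, the extra initialisation rules can be generated as follows (again with single-character symbols; the name start_rules and the rendering of q0 as the character "0" are our own assumptions):

```python
def start_rules(sigma, q0="0", blank="#"):
    # S -> [q0 A] and S -> [q0 #], plus A -> aA and A -> a for all a in Sigma,
    # so that S derives [q0 w] for every possible input string w.
    P = [("S", "[" + q0 + "A]"), ("S", "[" + q0 + blank + "]")]
    for a in sorted(sigma):
        P += [("A", a + "A"), ("A", a)]
    return P

print(start_rules({"a", "b"}))
# [('S', '[0A]'), ('S', '[0#]'), ('A', 'aA'), ('A', 'a'), ('A', 'bA'), ('A', 'b')]
```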
Grammar Derivations and Turing Machine Computation
Theorem 47. Let G = ⟨Σ, N, P, S⟩ be a Phrase Structure Grammar. Then there exists a (deterministic) Turing Machine M = ⟨Q, Σ, δ⟩ such that
x ∈ L(G)
iff
⟨q0, #x#⟩ ⊢*M ⟨qH, x2 a x3⟩
for some x2, a, x3.
Proof. We can dovetail the generation of all sentential forms as:
D0 = {S}
Di+1 = {α | ∃α′ ∈ Di such that α′ ⇒G α}
Since the derivation rules are finite, and Di is finite, then Di+1 is finite as well.
Slide 68
In order to complete the proof of Thm. 44 from Slide 64 we need to show the second part of the proof, namely LPSG ⊆ LRE. This follows from a proof of Thm. 47 of Slide 68. Proving such a theorem hinges on finding a way to algorithmically enumerate all x ∈ L(G), and the main complication in doing so stems from the (potentially) non-deterministic nature of grammar derivations. To handle this problem, we once again use the dovetailing technique (see earlier in the proof of Lemma 40) to perform a breadth-first search across the tree of possible derivation paths.
(Proof Outline for Thm. 47). The key insight to this proof is that we can layer all sentential forms of G, i.e., dovetailing their derivation, using the following inductive definition:
D0 = {S}
Di+1 = {α | ∃α′ ∈ Di such that α′ ⇒G α}
Since the derivation rules in PG are finite, if Di is finite then Di+1 is a finite set as well. Thus, by induction, every such Di is finite and thus easily enumerable (say, alphabetically).
More concretely, since the production rules PG are finite, we can construct a Turing Machine whereby these rules are hard-coded in its set of states. Our Turing Machine would therefore use 3 tapes, the first tape to be used for input/output and the second and third tapes to be used for generating the members of a particular Di. Membership acceptance in such a Turing Machine proceeds iteratively as follows:
1. Initialise the third tape with S.
2. Generate all the members of Di+1 from the present members of Di listed on the third tape, and write them to the second tape.
3. Match the string on the first tape with every string generated on the second tape and:
• if a match is found, halt successfully;
• else overwrite the contents of the second tape onto the third tape and go to step 2.
It is easy to ascertain that x ∈ L(G) iff the Turing Machine terminates successfully.
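A Python rendering of this layering (our own encoding: rules as (lhs, rhs) string pairs, non-terminals in upper case, and a round bound so the sketch itself always returns) might look as follows; membership of x then amounts to checking whether x ever appears among the generated members.

```python
def enumerate_members(rules, start="S", max_rounds=10):
    # Layer the sentential forms D_0, D_1, ...; collect every all-terminal
    # string encountered. Without max_rounds this would enumerate forever,
    # which is precisely why the construction yields acceptance, not decision.
    def expand(form):
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:                        # rewrite at every position
                yield form[:i] + rhs + form[i + len(lhs):]
                i = form.find(lhs, i + 1)

    members, D = set(), {start}
    for _ in range(max_rounds):
        D = {new for form in D for new in expand(form)}   # D_{i+1} from D_i
        members |= {f for f in D if not any(c.isupper() for c in f)}
        if not D:
            break
    return members

# S -> aSb | ab generates {a^n b^n | n >= 1}
print(sorted(enumerate_members([("S", "aSb"), ("S", "ab")], max_rounds=4), key=len))
# ['ab', 'aabb', 'aaabbb', 'aaaabbbb']
```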
2.7.3 Turing Machines and all other Languages
We have already seen that Turing Machines and PSG have the same computational power. This is an
important result for us computer scientists since it implies that, in terms of expressive power, it does not
matter whether we choose a Turing Machine or a PSG to describe how one can algorithmically recognise a
language.
Recursively Enumerable Languages and the Set of All Languages
What is the relationship between LRE and LS?
• We certainly know LRE ⊆ LS.
• Do we have LS ⊆ LRE?
• Or else, can we show that LRE ⊂ LS?
Slide 69
But this raises the question "Are there limits to this approach?" In other words, are there languages that cannot be recognised by Turing Machines (or PSGs)? Such a result would also have important implications for computer science because it would mean that there exist some problems for which there is no algorithmic solution.
Let us focus on Turing Machines and recursively enumerable languages for the time being (we could just as easily conduct our discussion in terms of PSGs, however). Since Turing Machines recognise languages, we trivially know LRE ⊆ LS, as any language is contained in LS by definition. The question we attempt to answer now is whether all languages can be recognised by some Turing Machine, i.e., LS ⊆ LRE (all languages are recursively enumerable), or whether there exist languages that cannot be recognised by any Turing Machine, i.e., making the inclusion strict, LRE ⊂ LS.
Answering this question is non-trivial because it requires us to reason about two infinite sets, i.e., the set LRE of languages recognised by the set of all Turing Machines, and the set of all languages LS. We could attempt to determine directly whether LS = LRE (which would imply LS ⊆ LRE) by devising a method for comparing infinite sets and then determining that they are of the same size.
One technique for performing such comparisons is called enumeration and works as follows. We know that the set of natural numbers, Nat, is infinite (and intuitively totally ordered). If we take another infinite set and provide a mapping that is 1-to-1, i.e., both injective and surjective, between this set and Nat (in simpler words, a function f with an inverse f⁻¹ whereby f⁻¹(f(x)) = x), then we can determine that the two sets are equal in size. The process is termed enumeration because, through the mapping, one is effectively enumerating (assigning a unique number from Nat to) every element of the infinite set under consideration.
Example 48 (Enumeration). Through the function f(x) = 2x (with inverse f⁻¹(x) = x/2) we can determine that the set of even numbers, Even, is of the same size as Nat. The argument for such a conclusion would go as follows: if, say, Nat were larger than Even, then pairing distinct elements from both sets, i.e., (x, y) where x ∈ Nat and y ∈ Even, should result in a situation whereby one runs out of distinct elements from the set Even. But the function f ensures that this can never happen and that one can always find a fresh value in Even to pair with x ∈ Nat, namely (x, f(x)). The argument is dual for checking whether Even is larger than Nat, using f⁻¹ instead.
Enumeration
• 1-to-1 mapping with Nat.
• Used as a measure between infinite sets.
• f(x) = x × 2 establishes a correspondence between Nat and Even (f⁻¹(x) = x/2).
Lemma 49 (Enumeration and Cartesian Products). If S1 and S2 are enumerable sets and S2 is finite, then S1 × S2 (their Cartesian product) is also enumerable.
Slide 70
For any alphabet Σ, we can also determine that the set of all strings over this alphabet is as large as Nat through lexicographic enumeration of every string in Σ*. At this point, you should also be able to convince yourself that the set of Turing Machines defined over this alphabet is also enumerable through lexicographic ordering. The reason for this is that every Turing Machine has a finite description in terms of Σ, Q and δ, and each of these sets can, in turn, be enumerated through lexicographic ordering, which implies that the Turing Machine description itself can be enumerated by Lemma 49. This also means that the recursively enumerable languages can be enumerated (hence the name) through a direct mapping between Turing Machines and the languages that they recognise.
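For instance, a lexicographic (length-first) enumeration of Σ* is easily realised; the sketch below pairs every string with a unique natural number, which is the kind of 1-to-1 correspondence discussed above.

```python
from itertools import count, product

def lex_strings(sigma):
    # Enumerate Sigma*: all strings of length 0, then length 1, and so on,
    # each length block in alphabetical order, so every string gets a
    # unique position (its natural number) in the enumeration.
    alphabet = sorted(sigma)
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

gen = lex_strings({"a", "b"})
for n in range(7):
    print(n, repr(next(gen)))  # 0 '', 1 'a', 2 'b', 3 'aa', 4 'ab', 5 'ba', 6 'bb'
```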
Recursively Enumerable Languages and the Set of All Languages
Theorem 50. There are languages that are not recursively enumerable, i.e.,
LRE ⊂ LS
Proof. By Diagonalisation.
Slide 71
This brings us to the point whereby we can show that LRE ⊂ LS (Thm. 50 on Slide 71). From the fact asserted earlier that LRE ⊆ LS, we only need to show that there exist languages that are not recursively enumerable. We show this through the technique called diagonalisation, discovered by the mathematician Georg Cantor. What diagonalisation essentially does is to show, by contradiction, that there cannot be any 1-to-1 mapping between some infinite set and some other infinite, but enumerable, set. For our particular case, it shows that the set of all languages cannot be enumerated and, as a result, this set must be strictly larger than the set of all Turing Machines.
(Proof for Thm. 50). We start by setting up the table on Slide 72, whereby the y-axis shows all enumerated (hence ordered) Turing Machines over some Σ and the x-axis shows the enumeration of all the strings in Σ*. Alongside each Mi we list the language that the Turing Machine recognises and enumerate it as Li⁶. In each row we write what is called the characteristic sequence of every Li with respect to the string enumeration on the x-axis. This means that in row i, column j, we write 1 if wj ∈ Li and 0 otherwise.
Diagonalisation of R. E. Languages (1)

TM  | LRE | w0  w1  w2  w3  ...
M0  | L0  |  1   0   1   1  ...
M1  | L1  |  0   0   1   0  ...
M2  | L2  |  1   0   0   1  ...
M3  | L3  |  0   0   1   1  ...
..  | ..  | ..  ..  ..  ..

Slide 72
The diagonalisation argument is shown on Slide 73 and proceeds as follows. We identify a "witness" language that is clearly in LS but that is distinct from any Li ∈ LRE, which would prove that LS is strictly larger than LRE. This language is constructed through the characteristic sequence generated by inverting the values along the diagonal of our table. Let us refer to this language as Lwit. Thus w0 is in Lwit if w0 is not in L0 (and vice-versa), w1 is in Lwit if w1 is not in L1 (and vice-versa), etc. By construction, Lwit differs from any Li on the ith string, which makes it distinct from any possible Li. Since the Li cover all possible languages in LRE, it follows that there exists a language that is not in LRE.
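The construction of Lwit can be made concrete on a finite fragment of the table; the Python sketch below (using the hypothetical entries of Slide 72, since the true table is infinite and its entries are not computable in general) simply flips the diagonal.

```python
def witness_bit(i, entry):
    # Characteristic bit of L_wit for string w_i: the flipped diagonal entry.
    # entry(i, j) returns 1 if w_j is in L_i and 0 otherwise.
    return 1 - entry(i, i)

table = [[1, 0, 1, 1],          # the finite fragment shown on Slide 72
         [0, 0, 1, 0],
         [1, 0, 0, 1],
         [0, 0, 1, 1]]
print([witness_bit(i, lambda i, j: table[i][j]) for i in range(4)])
# [0, 1, 1, 0] -- the flipped diagonal, as shown on Slide 73
```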
Diagonalisation of R. E. Languages (2)

TM  | LRE | w0  w1  w2  w3  ...
M0  | L0  |  0   0   1   1  ...
M1  | L1  |  0   1   1   0  ...
M2  | L2  |  1   0   1   1  ...
M3  | L3  |  0   0   1   0  ...
..  | ..  | ..  ..  ..  ..

Slide 73
⁶ There are distinct Turing Machines that recognise the same language; even in this case, the diagonalisation argument still holds.
This final result allows us to construct the global picture shown on Slide 74. The only language class relationship we still have not fully considered is that between LRec and LRE. We have already seen, through Thm. 23, that LRec ⊆ LRE; but what is the relationship in the other direction, i.e., are all recursively enumerable languages recursive as well? This question is a fundamental one in Computer Science and is answered through what is often referred to as the Halting Problem.
A Hierarchy of Languages
We know
LReg ⊂ LCFG ⊂ LRec ??? LRE = LPSG ⊂ LS
and LRec ⊆ LRE; the Halting Problem will enable us to answer the question relating to the inclusion in the opposite direction.
Slide 74
2.7.4 Exercises
1. Outline why the proof given for Lemma 43 does not suffice to show that every CFL is decidable.
2. Consider the regular grammar G = ⟨{a, b}, {A}, A, {A → aA, A → b}⟩. Use this grammar to automatically generate a Turing Machine M that can decide L(G).
3. Consider the context-free grammar G = ⟨{a, b}, {A}, A, {A → aAa, A → b}⟩. Use this grammar to automatically generate a Turing Machine M that can recognise L(G).
4. Give a Phrase Structure Grammar description for the Turing Machine Merase according to the construction outlined in the proof of Lemma 45 and subsequently for Thm. 46.