Turing Machines
A “Turing machine” is a theoretical model of a computer introduced by Alan Turing in 1936. It consists of
a little “machine” sitting on an infinite tape which it can read from and write to:
[Figure: a little “machine”, currently in state A, sitting on an infinite tape whose cells hold the symbols · · · ⊔ ⊔ 1 1 0 0 1 1 0 1 1 0 0 1 ⊔ ⊔ · · · .]
The tape is the machine’s “external memory”. It also has an internal memory, known as a state, which is
represented by the letter ‘A’ in the drawing. However the machine’s internal memory is finite, so that a
Turing machine has only finitely many different states. At each moment of the computation, the machine
is in exactly one state. At each step, depending on the symbol at its current location and on its state, the
machine decides whether to print a new symbol at this location, which new state to enter, and whether to
go left or right by one cell on the tape.
Here is an example of a Turing machine with three non-halting states A, B, C that works with two
tape symbols, ‘⊔’ and ‘1’ (by the way, ‘⊔’ is called the “blank symbol”):

          A          B          C
    ⊔   (1, B, R)  (⊔, C, R)  (1, C, L)
    1   (1, H, R)  (1, B, R)  (1, A, L)
In column A and row ⊔ we see ‘(1, B, R)’; this means when the machine is in state A and reads a ‘⊔’, the
machine prints a ‘1’, changes to state B, and then moves right (R is for right, L is for left). The machine’s
start state is A, and the machine has a special halting state, called H; when the machine reaches the halting
state, the computation stops. Here are the steps which
this machine makes when it is started on a tape with all blank symbols:
[Figure: the step-by-step computation of this machine when started on an all-blank tape, one tape snapshot per step, from the start state A until the machine halts in state H; a little triangle marks the machine’s starting cell, and at the end the tape holds six consecutive 1’s.]
(To make the computation easier to read, we have put a little triangle to mark the machine’s starting
position on the tape.) This machine takes thirteen steps before halting. The machine’s computation has no
“meaning”, in this case, but other Turing machines compute useful things. Not all Turing machines halt.
For example the Turing machine

          A          B          C
    ⊔   (⊔, B, R)  (⊔, A, L)  (⊔, H, L)
simply switches back and forth between states A and B forever, without ever entering state C or the halting
state H. As an even simpler example, the machine
          A
    ⊔   (⊔, A, R)
simply goes forever to the right. Both these machines work only with the tape symbol ⊔, which means in
particular that the tape always consists of an infinite sequence of blanks (otherwise, if the machine encounters a symbol that it is not designed for, the computation is undefined).
OK, we now give a formal definition of a Turing machine. Formally, a Turing machine M consists of a tuple¹
(Q, Σ, Γ, δ, q0, qA, qR) where:
• Q is the finite set of states.
• q0 ∈ Q is the start state, while qA ∈ Q and qR ∈ Q are respectively² the accept and reject states. The
accept and reject states are halting states, which means the computation halts if the machine reaches one of
these states. (This is a bit different from the examples of Turing machines above, which had a single halting
state ‘H’, but it is useful in general to have two different halting states called “accept” and “reject”.) We
also stipulate³ that qA ≠ qR.
• Σ is a finite set of symbols called the input alphabet of the machine; the blank symbol ⊔ is never in the
input alphabet, that is, ⊔ ∉ Σ.
• Γ ⊇ Σ ∪ {⊔} is another finite set of symbols called the tape alphabet (the notation “Γ ⊇ Σ ∪ {⊔}” means
that Γ contains ⊔ and all the symbols in Σ, and may contain other symbols as well). Example: Σ = {0, 1}
and Γ = {0, 1, ⊔}.
• δ : Γ × (Q \ {qA, qR}) → Γ × Q × {L, R} is a function called the transition function, which is the “brains” of
the machine. For example if δ(1, q0 ) = (0, q2 , L) this means that if the machine reads a ‘1’ while it is in state
q0 , then the machine prints a ‘0’, goes into state q2 , and moves left. The transition function δ is not defined
on the states qA , qR , because these are halting states, so the computation has stopped anyway if the machine
is in one of these states. (Note: Q\{qA , qR } means the set Q with the states qA and qR removed. Other
note: A × B is the Cartesian product of two sets A and B; the definition is A × B = {(a, b) : a ∈ A, b ∈ B}.)
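The formal definition translates directly into a small simulator. The following is a minimal sketch in Python; the dictionary representation of δ, the symbol ‘_’ standing in for the blank ⊔, and the single halting state H (matching the informal examples above rather than the accept/reject convention) are our own choices:

```python
# A minimal Turing-machine simulator: delta maps (symbol, state) to
# (new symbol, new state, direction), just like the tables above.
def run(delta, tape=None, state="A", halt_states=("H",), max_steps=10_000):
    tape = dict(tape or {})          # sparse tape: position -> symbol, blank elsewhere
    pos, steps = 0, 0
    while state not in halt_states and steps < max_steps:
        sym = tape.get(pos, "_")     # '_' stands for the blank symbol
        new_sym, state, move = delta[(sym, state)]
        tape[pos] = new_sym
        pos += 1 if move == "R" else -1
        steps += 1
    return state, tape, steps

# The three-state example machine from the beginning of these notes.
delta = {
    ("_", "A"): ("1", "B", "R"), ("1", "A"): ("1", "H", "R"),
    ("_", "B"): ("_", "C", "R"), ("1", "B"): ("1", "B", "R"),
    ("_", "C"): ("1", "C", "L"), ("1", "C"): ("1", "A", "L"),
}
state, tape, steps = run(delta)
# The machine halts with six 1's on the tape; counting the final move into H,
# delta is applied 14 times, i.e. 13 steps before the halting transition.
```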
Some notations on strings that we need. Let Σ be a finite set of symbols. Then Σ∗ means all possible finite
strings that can be formed from characters in Σ. For example, if Σ = {a, b} then Σ∗ = {a, b}∗ contains
strings like a, b, aab, bbbb, and so on; Σ∗ also contains the empty string, written ε, which has length 0 and
consists of no characters at all. The number of characters (or length) of a string x ∈ Σ∗ is written |x|.
Initially, a (finite) input x ∈ Σ∗ is written on the tape, with infinitely many ‘⊔’ to the left and to the
right of the input; the machine starts at the leftmost character of the input, in state q0 (if the input is the
empty string, the tape consists only of blanks, and the machine can start anywhere). With input alphabet
Σ = {0, 1}, a typical starting configuration of a Turing machine may look like this:
1 Tuple
means “finite sequence” or “sequence of objects”.
and Y are respectively A and B” means X is A and Y is B.
3 Stipulate is another word for require.
2 “X
2
b
q0
⊔ ⊔ ⊔ ⊔
0 ⊔
1 ⊔
1 ⊔
1 ⊔
1 ⊔
0 ⊔
1 ⊔
1 ⊔
0 ⊔
1 ⊔ ⊔ ⊔ ⊔ ⊔ ⊔ ⊔ ⊔ ⊔ ⊔ ⊔ ⊔ ⊔ ⊔ ⊔
a
Here the input is x = 0111101101 ∈ Σ∗ . The computation then proceeds as defined by the transition function
δ, until the machine reaches either the state qA or the state qR . If the machine reaches the state qA we say
it accepts the input x; if it reaches the state qR we say it rejects the input x; if it never reaches either qA or
qR , then the machine is said to loop forever or diverge (they mean the same thing). The set of inputs that
a machine M accepts is written A(M ) and the set of inputs that a machine rejects is written R(M ). Thus
A(M ) ⊆ Σ∗ , R(M ) ⊆ Σ∗ , A(M ) ∩ R(M ) = ∅ (note: ∅ is the “empty set”, the set with no elements) and
A(M ) ∪ R(M ) = Σ∗ if and only if the machine halts on every input. Here are some more definitions:
Definition 1. A language is a set L ⊆ Σ∗ .
Thus a language is just a set of strings, with respect to some fixed alphabet Σ. The complement of a language
L ⊆ Σ∗ is Σ∗\L, the set of all strings in Σ∗ that are not in L. The complement of a language L is sometimes
written L̄.
Definition 2. A language L such that L = A(M ) for some Turing machine M is called recursively enumerable or Turing-recognizable or simply recognizable.
Note: Just because L = A(M) does not mean L̄ = R(M). It could be that for certain inputs x ∉ L, M does
not halt on input x. In fact, there is a definition for when L = A(M) and L̄ = R(M):
Definition 3. A language L such that L = A(M) and L̄ = R(M) for some Turing machine M is called
decidable. We also say that M decides L in this case.
If a language is decidable then there is a procedure (namely, the Turing machine for that language) which
tells in a finite amount of time whether a given string is in the language or not. Every decidable language
is recognizable, but there are some recognizable languages that are not decidable. Also, there are languages
that are not even recognizable.
Another example of a Turing machine. We present a Turing machine with Σ = {0}, Γ = {0, x, ⊔} and
Q = {q0, q1, q2, q3, qA, qR} that decides the language

    L0 = { 0^(2^n − 1) : n = 0, 1, 2, 3, . . . }.

“0^b” means the string 000 . . . 00 with b zeroes. Thus L0 contains the strings ε, 0, 000, 0000000, and so on.
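Membership in L0 is easy to test directly, since a string of b zeroes is in L0 exactly when b + 1 is a power of two. A quick Python sketch (the function name is our own):

```python
def in_L0(s: str) -> bool:
    """A string of zeroes 0^b is in L0 exactly when b = 2^n - 1,
    i.e. when b + 1 is a power of two."""
    b = len(s)
    return set(s) <= {"0"} and (b + 1) & b == 0

# in_L0(""), in_L0("0"), in_L0("000"), in_L0("0000000") are all True;
# in_L0("00") is False.
```

The bit trick (b + 1) & b == 0 holds exactly when b + 1 is a power of two, i.e. when b has the binary form 111. . .1.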
Here is the machine, drawn using a “graph diagram”:

[Figure: the graph diagram of the machine, with one node per state (a solitary arrow leads into the start state q0) and one labeled arrow per transition, such as “0 → x, R” from q0 to q1.]
The graph should be read like this: for example, the presence of an arrow “0 → x, R” from q0 to q1 means
that δ(0, q0 ) = (x, q1 , R); the presence of the arrow “⊔ → R” from q0 to qA means that δ(⊔, q0 ) = (⊔, qA , R).
The solitary arrow leading into state q0 indicates that q0 is the start state. Using a table diagram, the same
machine looks like so:
          q0           q1           q2           q3
    ⊔   (⊔, qA, R)  (⊔, q3, L)  (⊔, qR, R)  (⊔, q0, R)
    0   (x, q1, R)  (0, q2, R)  (x, q1, R)  (0, q3, L)
    x   (x, q0, R)  (x, q1, R)  (x, q2, R)  (x, q3, L)
To prove that this machine (let’s call it M) decides the language L0, we will show something stronger: that
if we give M as input a string x ∈ {0, x}∗, M will (i) accept x if x contains no 0’s at all, or (ii) reject x if x
contains an even but nonzero number of 0’s, or otherwise (iii) if x contains ℓ 0’s, with ℓ ≥ 1 an odd number,
M will “cross out” ⌈ℓ/2⌉ 0’s in x, replacing them with x’s, and then return to the leftmost character of this
new string, in state q0. These three observations are easy to verify from M’s graph diagram.
For example, if we give M as input any string x ∈ {0, x}∗ with, say, 35 0’s, M will first cross out
⌈35/2⌉ = 18 zeroes, leaving it with a string that has 17 zeroes, then M will cross out ⌈17/2⌉ = 9 zeroes,
leaving it with a string that has 8 zeroes, then M will cross out 4 zeroes and reject, because 8 is an even
number of zeroes.
Thus, if we give M an input x ∈ {0, x}∗ with ℓ 0’s, M will accept if and only if the first even number in
the sequence ℓ, ⌈ℓ/2⌉, ⌈⌈ℓ/2⌉/2⌉, . . . is the number 0. Otherwise M will reject. In particular, an input of
the form 0^b (that is, an input in Σ∗ = {0}∗) is accepted if and only if b is of the form 2^n − 1, because the
first even number in the sequence ℓ, ⌈ℓ/2⌉, ⌈⌈ℓ/2⌉/2⌉, . . . is 0 if and only if ℓ is of the form 2^n − 1. Thus M
decides L0, that is, we have A(M) = L0 and R(M) = Σ∗\L0.
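These claims can also be checked by simulating M directly. Here is a sketch in Python, with the transition table copied from above and ‘_’ standing in for the blank symbol ⊔:

```python
def decide(delta, input_str, accept="qA", reject="qR"):
    # Run a deciding machine on input_str and report accept (True) / reject (False).
    tape = {i: c for i, c in enumerate(input_str)}
    pos, state = 0, "q0"
    while state not in (accept, reject):
        sym = tape.get(pos, "_")     # '_' stands for the blank symbol
        new_sym, state, move = delta[(sym, state)]
        tape[pos] = new_sym
        pos += 1 if move == "R" else -1
    return state == accept

# The table of M, one entry per (symbol, state) pair.
delta = {
    ("_", "q0"): ("_", "qA", "R"), ("0", "q0"): ("x", "q1", "R"), ("x", "q0"): ("x", "q0", "R"),
    ("_", "q1"): ("_", "q3", "L"), ("0", "q1"): ("0", "q2", "R"), ("x", "q1"): ("x", "q1", "R"),
    ("_", "q2"): ("_", "qR", "R"), ("0", "q2"): ("x", "q1", "R"), ("x", "q2"): ("x", "q2", "R"),
    ("_", "q3"): ("_", "q0", "R"), ("0", "q3"): ("0", "q3", "L"), ("x", "q3"): ("x", "q3", "L"),
}
accepted = [b for b in range(20) if decide(delta, "0" * b)]
# accepted == [0, 1, 3, 7, 15]: exactly the lengths of the form 2^n - 1.
```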
Definition of computability. So far we have discussed Turing machines as a means of deciding a language,
that is, producing a yes/no answer to a question. We can also use Turing machines to perform computations
that give more than a yes/no answer. Such Turing machines can compute mathematical functions, in the
sense of the following definition:
Definition 4. A function f : N → N is computable if there exists a Turing machine M with Σ = {0, 1}
such that, given an input x ∈ {0, 1}∗, M computes and halts in the accept state with the value f (x) written
on its tape.
Note: We are implicitly using the fact that a string in {0, 1}∗ represents a number (in binary) and vice-versa
that a number in N can be represented as a binary string. This representation is not unique (we can add 0’s
at the beginning of a number, and it stays the same number), but this isn’t important. By fixing a binary
representation for numbers in Z, or in Q, etc, we could similarly give a definition for the computability of
a function from Z to Z or from Q to Q, etc, and so forth for any function from a mathematical set A to
another mathematical set B, as long as we have a way to encode the elements of A as binary strings and
a way to encode the elements of B as binary strings. (These bit strings, however, have to be finite; this
means for example that there is no obvious way to discuss the computability of a function from R to R, since
a number in R requires an infinite string to represent; however, we don’t care so much about this problem,
at least in this course.)
Church-Turing thesis. Despite their apparent clumsiness, Turing machines can complete
any computational task that a normal computer can do. Turing machines which complete “interesting” tasks
(such as adding or multiplying two numbers, testing whether a number is prime, or much more complicated
tasks) have many more states, and it is too tedious to write down the full description of such Turing machines.
Instead people usually give “high-level” descriptions of such Turing machines. These high-level descriptions
give “the general idea” of how the machine works without naming all the states.
The fact that Turing machines can compute anything that we think of as being an “algorithm” is known
as the Church-Turing thesis. That is, the Church-Turing thesis says, “anything that you think of as being
an algorithm, computable in a finite amount of time, can be computed on a Turing machine”. The Church-Turing thesis is not a mathematical statement, but a statement of belief; the belief, more precisely, is that
a Turing machine is the “correct” mathematical definition for the intuitive concept of an algorithm. (There
are other possible definitions which people use, but these all turn out to be equivalent to Turing machines in
the sense that functions which can be computed on these other models can be computed on Turing machines,
and vice-versa.)
The Busy Beavers
Informally, a busy beaver is a Turing machine that runs for a very large number of steps before stopping
when its input is the empty string (namely, when it is started on a tape consisting only of blanks). There
are two variables that determine a busy beaver’s “class”: the number of states of the machine, and the size
of the tape alphabet Γ. (Here the input alphabet Σ is not important because the machine is started on a
blank tape, anyway.) The larger the number of states and the larger the alphabet Γ, the longer it is possible
for a Turing machine to run before stopping, if it is going to stop at all. Of course, some Turing machines
never stop, and go for an infinite number of steps; however, these machines don’t interest us; we are only
interested in machines that do stop, but after the largest possible number of steps.
For example, the first Turing machine we showed was a machine with 2 tape symbols (1 and ⊔) and 3
non-halting states (A, B and C). This machine runs for 13 steps on the empty input before halting. Is it
possible to find a machine with 2 tape symbols and 3 non-halting states that runs for more than 13 steps on
the empty input before halting? What is the maximum such value?
Let M(k, n) be the set of all Turing machines that have a tape alphabet Γ of size k and n non-halting
states. (For this discussion, we do not care about the difference between accept and reject halting states;
without loss of generality, we can assume there is just one halting state.) For example, the first Turing
machine we presented is an element of M(2, 3) and the second Turing machine we presented is an element
of M(3, 4). Note that the “table” describing a machine in M(k, n) has exactly k rows and n columns, and
in fact you can think of M(k, n) as the set of all Turing machines that have a table description with k rows
and n columns.
By relabeling the k elements of the tape alphabet Γ as ⊔, 1, . . . , k − 1, and by relabeling the n non-halting
states of the machine as 1, 2, . . . , n, we can see there are only finitely many machines in M(k, n) that are
“essentially different” (that is, that cannot be obtained from one another by relabeling states and/or tape
symbols). More precisely, since there are kn cells in a Turing machine description table with k rows and n
columns, and since there are k · (n + 1) · 2 possibilities for the instruction in each cell ((n + 1) and not n
because there is also a halting state, besides the n non-halting states), and since we may assume the start
state is always state number 1, there are in total
    (2k(n + 1))^(kn)
possible different Turing machines in M(k, n). We do not care so much about this formula. We care about
the fact that there are only finitely many different Turing machines in M(k, n). Moreover, each such machine
either (i) eventually halts when it is given the empty string as input, or (ii) goes forever and never halts
when it is given the empty string as input. (The point is that a machine has no “choice” about what to do
when it is started on the empty input: it will do the same thing every time.)
We let BBk (n) be the maximum number of steps taken by any Turing machine in M(k, n) when started
on the empty input, among all those machines in M(k, n) that halt on the empty input. Since there are only
finitely many machines in M(k, n) (after relabeling the tape alphabet, etc), and since each of these machines
has a single possible behavior on the empty string, BBk (n) is well-defined for k, n ∈ N. We call BBk (·) the
k-symbol busy beaver function. For example, we know that BB2 (3) ≥ 13 since we know a Turing machine
with tape alphabet of size 2 and 3 non-halting states that runs for 13 steps before halting (the first Turing
machine we showed). In fact, BB2 (3) = 21, so there is another machine with |Γ| = 2 and 3 non-halting states
that runs for even longer on the empty input; the machine which runs 13 steps is not a “busy beaver”.
Let’s first look at BB1(·). By definition, BB1(n) is the maximum number of steps that an n-state Turing
machine with tape alphabet Γ of size 1 can take before stopping on the empty input, if the machine stops at
all. Note that if |Γ| = 1 then Γ = {⊔}. Therefore such a Turing machine can never read or write anything on
its tape besides ⊔; the tape is completely useless and stores no information at all, the only memory available
to the machine is its internal memory, its state. If such a Turing machine ever enters the same state twice
it will loop forever (this is easy to see), so the most such a machine can do, to run the longest possible but
still stop, is to use each of its states once and then halt. Here is an example of such a machine with n = 4;
the non-halting states are labeled 1, 2, 3, 4, and the halting state is H:
          1          2          3          4
    ⊔   (⊔, 2, R)  (⊔, 3, R)  (⊔, 4, R)  (⊔, H, R)
This machine takes 4 steps before halting. In general, BB1 (n) = n.
The functions BB2 (·), BB3 (·), . . . are more interesting and also much harder to compute. In fact these
functions are all uncomputable (see the definition of a “computable” function above). Certain specific values
of these functions are known, however. For example, we know that
    BB2(1) = 1,   BB2(2) = 6,   BB2(3) = 21,   BB2(4) = 107
but we do not know BB2(n) for n ≥ 5, though we do know that BB2(5) ≥ 47176870 and that BB2(6) ≥
7.4 · 10^36534. For BB3 we only know that BB3(1) = 1 and that BB3(2) = 38; we do not know BB3(3), though
we know BB3(3) ≥ 119112334170342540. For k ≥ 4, the only known values of BBk(n) are those with n = 1,
namely the fact that BBk(1) = 1 for all k.
One could imagine the following (incorrect) algorithm for computing the function BBk : on input n ∈ N,
enumerate the (finitely many) different Turing machines that have |Γ| = k and n states; for each of these
machines, run the machine until it stops, and count the number of steps; but if the machine never stops,
abort the computation and move to the next machine; then take the largest possible number of steps among
all the machines that stopped. The flaw in this algorithm is easy to see: there is no way to tell, in general,
whether a Turing machine is going to halt or not. A machine’s behavior may be so complex and so messy
that we cannot predict whether it is going to halt. For example, the reason we know BB2 (5) ≥ 47176870
without knowing the value of BB2 (5) is that there is some machine in M(2, 5) which halts in 47176870 steps,
but there are some other machines in M(2, 5) that we do not even know whether they halt or not. (Actually,
one can prove that the problem of telling whether a Turing machine is going to halt on a certain input is
undecidable, meaning there exists no Turing machine that can compute the answer to this question. We will
discuss this more later.)
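The flawed algorithm can be sketched in Python as follows, with an explicit step cap standing in for the impossible “run until it stops”. Because of the cap, the sketch computes a lower bound on BBk(n), not BBk(n) itself; the names and conventions (blank symbol 0, halting state 0, start state 1) are our own:

```python
from itertools import product

def naive_bb(k, n, cap=100):
    # Enumerate every machine in M(k, n): symbols 0..k-1 (0 is the blank),
    # non-halting states 1..n (1 is the start state), state 0 is the halt state.
    # WARNING: the step cap silently treats long-running machines as non-halting,
    # so this returns only a lower bound on BB_k(n).
    cell_options = list(product(range(k), range(n + 1), (-1, +1)))
    best = 0
    for table in product(cell_options, repeat=k * n):
        delta = {(sym, st): table[sym * n + (st - 1)]
                 for sym in range(k) for st in range(1, n + 1)}
        tape, pos, state, steps = {}, 0, 1, 0
        while state != 0 and steps < cap:
            sym = tape.get(pos, 0)
            new_sym, state, move = delta[(sym, state)]
            tape[pos] = new_sym
            pos += move
            steps += 1
        if state == 0:                  # machine halted within the cap
            best = max(best, steps)
    return best

# naive_bb(2, 2) returns 6, matching the known value BB2(2) = 6: here the cap
# of 100 happens to be large enough, but in general no fixed cap is.
```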
Simulating larger alphabets by smaller alphabets
We will show that the computation of a Turing machine M = (Q, Σ, Γ, δ, q0, qA, qR) can be “simulated”
(which means “reproduced” or “imitated”) by a Turing machine M′ = (Q′, Σ′, Γ′, . . .) that uses only Γ′ = {⊔, 1}
(and Σ′ = {1}). Since M ′ cannot accept the same kind of inputs as M , or produce the same kind of outputs,
we first need to make precise in what sense we mean that M ′ “simulates” M , before showing that such a
simulation is really possible.
Firstly, we define an encoding from Γ to strings in {⊔, 1}∗ . Each character in Γ will become a string of
length r = ⌈log2 (|Γ|)⌉. This is essentially a binary encoding, treating each element of Γ as a number between
0 and |Γ| − 1, and translating this binary number into the alphabet {⊔, 1} by mapping 0 to ⊔ and 1 to 1;
moreover we treat ⊔ ∈ Γ as the number 0, by default, so that ⊔ maps to a string of r ⊔’s. For example, if
Γ = {⊔, a, b, c, x} then r = ⌈log2 |Γ|⌉ = 3, and we have the mapping

    ⊔ → ⊔⊔⊔
    a → ⊔⊔1
    b → ⊔1⊔
    c → ⊔11
    x → 1⊔⊔
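This block encoding can be computed mechanically. A short Python sketch (we write ‘_’ for ⊔ and fix the order of Γ so that ⊔ is element number 0, as in the convention above):

```python
from math import ceil, log2

def block_encoding(gamma):
    # gamma is the tape alphabet listed with the blank '_' first, so that '_'
    # is treated as the number 0 and maps to a block of r blanks.
    r = ceil(log2(len(gamma)))
    # Write each index i in zero-padded binary of width r, then map 0 -> '_'.
    return {sym: format(i, f"0{r}b").replace("0", "_")
            for i, sym in enumerate(gamma)}

enc = block_encoding(["_", "a", "b", "c", "x"])
# enc == {'_': '___', 'a': '__1', 'b': '_1_', 'c': '_11', 'x': '1__'},
# matching the mapping displayed above.
```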
We divide the tape of M ′ into imaginary “blocks” of r cells each. We give M ′ its input using the encoding
described above, with each character of Γ becoming one block in the tape of M ′ . At the beginning of the
computation, we place the tape head of M ′ on the leftmost cell of the leftmost block of its input:
[Figure: the tape of M together with the corresponding tape of M′: each character of M’s tape has become one block of r = 3 cells on M′’s tape, and the head of M′ (in state q0′) is placed on the leftmost cell of the leftmost block of the encoded input.]
(We note that the input of M′ does not follow our convention of having the input for a Turing machine be
a string in Σ∗ where ⊔ ∉ Σ, since here the “input” contains occurrences of the character ⊔; however the
condition that ⊔ ∉ Σ is only made for convenience, so that a Turing machine can always tell where its input
ends; here M′ can also tell where its input ends: it’s the first block consisting only of ⊔’s.)
We now say that M ′ “simulates M ” if, given the input of M encoded as above, M ′ computes and halts
either in the accept state (if M accepts) or in the reject state (if M rejects), with the contents of its tape
being the encoded version of M ’s tape at the moment when M halts.
The simulation of M by M ′ proceeds the way one might imagine, with M ′ imitating the actions of M
but making several operations for each operation of M . More precisely, before each step of M , M ′ places
itself at the left end of the block corresponding to the cell which M is reading. Then if, for example, M
prints an a in that cell and moves right, M ′ writes ⊔ ⊔ 1 in this block and moves to the beginning of the next
block to the right; then it reads this block to learn the character in it, so that it can keep simulating M ,
and so on. M ′ needs more states than M , because it must “remember” what state M is in while it prints
several characters, moves and reads several characters, before simulating the next step of M .
We now give more precise details on how M ′ works. Let {⊔, 1}≤r be the set of all strings in {⊔, 1}∗ that
have length ≤ r (including the empty string). We view the elements of {⊔, 1}≤r as prefixes of encodings of
characters in Γ, starting from the left of the encoding; for example, in the encoding shown above, we have
r = 3 and ⊔1 ∈ {⊔, 1}≤r can be viewed as a prefix of the encodings of b and c.
The set of states of M ′ will be Q′ = Q×{read, read rewind, write, write rewind, move}×Γ×{⊔, 1}≤r ×
{L, R}. Thus each state of Q′ is a 5-tuple consisting of a state of Q (recall Q is the set of states of M ), an
element of the set {read, read rewind, write, write rewind, move}, a character in Γ, a string in {⊔, 1}≤r
and finally a letter L or R. (At this point, the most interesting part of the description of M ′ is over; namely
the most interesting thing is that the set of states of M ′ is a “direct product” of several sets, so that a single
state of M ′ will allow M ′ to remember several things “in parallel”; if you now feel comfortable that you
more or less understand how M ′ is going to work, you can skip the rest of the description, since it’s pretty
tedious.)
We now describe how M ′ “uses” its 5-tuple states. We refer to the 5 parts of the state as “coordinates” of the state, with the first coordinate being an element of Q, the second an instruction in the set
{read, read rewind, write, write rewind, move}, the third coordinate being an element of Γ, the fourth
coordinate a string in {⊔, 1}≤r , and the fifth coordinate the letter L or R.
The first coordinate is used to remember what state the machine M is in. The second coordinate of the
state is used by M ′ to remember what type of action it is presently undertaking; more precisely: (i) read
is used for reading a block left to right, using the {⊔, 1}≤r portion of the state (a.k.a. fourth coordinate) to
remember what it has read so far and how many characters are left to read, and at the last character read
M ′ remembers the whole string by storing it in the Γ portion of its state (a.k.a. third coordinate) (recall that
M ′ cannot print or read elements of Γ to its external memory, the tape; however recording an element of Γ to
a portion of internal memory is of course not the same thing), (ii) after the read instruction is finished, M ′
changes the second coordinate of its state to read rewind, which orders it to return to the leftmost character
of the current block (it is currently at the rightmost character), and it uses the {⊔, 1}≤r portion of its state
to “count” how far it has gone (it doesn’t need to store anything meaningful as a prefix, here: we are just
using the prefix to count r − 1 steps to the left), (iii) write is used for printing left-to-right the encoding of a
character in Γ to a block, more precisely, the character that is currently stored in the third coordinate of the
state, (iv) write rewind is exactly similar to read rewind (but M ′ uses two different states read rewind
and write rewind instead of a single rewind instruction so that it can remember the type of operation just
performed: a read or a write), and (v) finally, when M ′ has done the necessary reading, rewinding, writing
and rewinding for the current step, and only needs to move left or right by one block before simulating the
next step of M , it sets the second coordinate of its state to move, and knows to move left or right from the
fifth coordinate of its state (which is L or R), which has been previously set; here again, M ′ uses {⊔, 1}≤r
to count how far it has moved.
A “full cycle” of a simulation of a step of M by M ′ is as follows: at the beginning of the cycle, M ′ is at
the leftmost cell in the block corresponding to the cell which M would be reading, and the first coordinate of
the state of M ′ is the state M would be in; then M ′ reads (and remembers in its third coordinate) this block,
rewinds to the start, and, knowing the state of M and the character of Γ encoded in the block, computes
the new state of M , the new character M will print in this cell, and the direction M will move next; these
three elements are stored in respectively the first, third and fifth coordinates of M ′ ’s state. Then M ′ does
the write/write rewind operations to print the (encoding of the) new character, and the move operation
to change blocks, and this concludes the cycle.
Note that we seem to have provided many details about M ′ , but in reality, we did not even write down
the transition function δ ′ ! Writing down δ ′ would be possible, but even more tedious. For such reasons,
people prefer to only give very high-level overviews of Turing machines, as we already mentioned, and very
rarely specify a full-fledged transition function (or even a full set of states, like we did).
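In that spirit, here is a “block level” sketch of the simulation in Python: it performs one full read/write/move cycle of M′ per step of M, operating directly on an encoded {⊔, 1} tape (with ‘_’ for ⊔), without spelling out M′’s 5-tuple states. Pairing it with the L0-deciding machine from earlier (Γ = {⊔, 0, x}, so r = 2) is our own choice of illustration:

```python
def simulate_encoded(delta, enc, input_str, start="q0", halting=("qA", "qR")):
    # One full cycle per step of M: read the r cells of the current block,
    # decode them to a character of Gamma, apply M's delta, write the encoded
    # new character back, and move the head by one whole block -- mirroring
    # M''s read / rewind / write / rewind / move cycle.
    r = len(next(iter(enc.values())))
    dec = {v: k for k, v in enc.items()}
    tape = {}
    for j, c in enumerate(input_str):            # encode the input, block by block
        for i, bit in enumerate(enc[c]):
            tape[j * r + i] = bit
    block, state = 0, start
    while state not in halting:
        word = "".join(tape.get(block * r + i, "_") for i in range(r))  # read
        new_sym, state, move = delta[(dec[word], state)]
        for i, bit in enumerate(enc[new_sym]):                          # write
            tape[block * r + i] = bit
        block += 1 if move == "R" else -1                               # move
    return state

# The L0-deciding machine from earlier, and the block encoding of its alphabet.
delta = {
    ("_", "q0"): ("_", "qA", "R"), ("0", "q0"): ("x", "q1", "R"), ("x", "q0"): ("x", "q0", "R"),
    ("_", "q1"): ("_", "q3", "L"), ("0", "q1"): ("0", "q2", "R"), ("x", "q1"): ("x", "q1", "R"),
    ("_", "q2"): ("_", "qR", "R"), ("0", "q2"): ("x", "q1", "R"), ("x", "q2"): ("x", "q2", "R"),
    ("_", "q3"): ("_", "q0", "R"), ("0", "q3"): ("0", "q3", "L"), ("x", "q3"): ("x", "q3", "L"),
}
enc = {"_": "__", "0": "_1", "x": "1_"}
# simulate_encoded(delta, enc, "0" * 7) ends in qA; "0" * 4 ends in qR.
```

Note that an untouched region of the encoded tape reads as a block of r blanks, which decodes to ⊔, exactly as the convention ⊔ → ⊔⊔ . . . ⊔ requires.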
Uncomputability of the Busy Beaver functions
We would like to show that the functions BB2 (·), BB3 (·), . . . are uncomputable. (This would be a good
time to go back and look at the definition of a computable function.) Of course, BB1 (·) is computable since
BB1 (n) = n (so in fact a machine that does nothing and immediately halts on every input computes BB1 (·),
by definition!).
Let’s think about why BB2 (·) should not be computable. If I can compute BB2 (·), it means (by definition
of “computable”) there is some Turing machine M that computes BB2 (·). Say, for the sake of example, that
M has 1000 states.
I can use M to compute, say, the value BB2(10^6): I just need to encode 10^6 in binary on M’s input tape,
which is

    11110100001001000000

and after computing for some finite number of steps on this input, M will halt with the value BB2(10^6)
written in binary on its output tape (once again, by the definition of “computable”). This number is,
supposedly, the maximum number of steps a 2-symbol, 10^6-state machine can take on the empty input, and
still halt. But M itself only has 1000 states, not 10^6 states, and it succeeded in computing this number. If
we modify M a little bit to, say, count down from the value of its output to 0 before halting, then M, on
input 10^6, will run for more than BB2(10^6) steps even though it has only 1000 states (or a few more states
to account for the countdown modification).
The above paragraph gives the basic idea of the argument: use the machine M that supposedly computes
BB2 (·) in order to make a new machine M ∗ that beats the answer M provides. However, to get a real
contradiction, we need to construct a machine M ∗ from M which meets all the following requirements:
(i) M∗ must run for a long time on the empty input, since the idea is to contradict the running time of the
function BB2(·), which is defined as the maximum running time on the empty input; more precisely,
M∗ must run for more than BB2(10^6) steps on the empty input, where BB2(10^6) is computed by M;
(ii) M∗ must only use a tape alphabet of size 2, since BB2(·) is defined with respect to machines that have
tape alphabets of size 2;
(iii) M∗ must have 10^6 or fewer states.
We implement this proof intuition in the following theorem:
Theorem 1. There is no Turing machine that computes the function BB2 .
Note: The same proof works to show no Turing machine computes the function BBk for any k ≥ 2.
Proof. Assume there exists a machine M = (Q, Σ, Γ, . . .) which computes the function BB2 . Then M has
Σ = {0, 1} (in order to accept binary inputs) and some arbitrary tape alphabet Γ ⊇ {0, 1, ⊔}. In fact,
because larger tape alphabets can be simulated by smaller tape alphabets (see the previous section), we can
assume without loss of generality that Γ = {0, 1, ⊔}.
We start by constructing a machine M′ = (Q′, Σ′, Γ′, . . .) with input alphabet Σ′ = {1} and tape alphabet
Γ′ = {⊔, 1} that simulates M. Each character of Γ = {0, 1, ⊔} is encoded as a block of length 2 on M′’s
tape, using the encoding:

    ⊔ → ⊔⊔
    0 → ⊔1
    1 → 1⊔
Thus, when M ′ is given the encoded version of a binary number k on its input tape, M ′ computes and halts
with the encoded version of the binary number BB2 (k) on its tape.
Say, to keep the proof concrete, that M ′ has 1000 states (it could be 10000, whatever, but let’s say 1000
for example). We construct a machine M ∗ also with tape alphabet {1, ⊔} such that, when M ∗ ’s input is the
empty string, M ∗ does the following:
1. M∗ prints the {1, ⊔}-encoding of the number 10^6 on the tape; that is, the number 10^6 has a certain
binary expansion, and each digit in this binary expansion becomes printed as a block of size 2 on the
tape; to be exact, the encoded version of 10^6 is

    1⊔ 1⊔ 1⊔ 1⊔ ⊔1 1⊔ ⊔1 ⊔1 ⊔1 ⊔1 1⊔ ⊔1 ⊔1 1⊔ ⊔1 ⊔1 ⊔1 ⊔1 ⊔1 ⊔1

So M∗ starts by printing this string, and positioning its head at the left end of this input.
2. M∗ then runs M′ on this input; since M′ simulates M, this computation will terminate
with the encoded version of the value BB2(10^6) on the output tape;
3. M∗ treats this output as a binary value (which it is, in encoded form), and decrements⁴ it (still in
encoded form) repeatedly by 1 until the value 0 is reached on the output tape. Then, M∗ halts.
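The constant printed in step 1 can be checked mechanically (a short Python sketch, writing ‘_’ for ⊔):

```python
# Encode the binary expansion of 10**6 block by block, with 0 -> "_1"
# and 1 -> "1_", as in the encoding used in this proof.
binary = bin(10**6)[2:]            # "11110100001001000000", 20 digits
encoded = "".join({"0": "_1", "1": "1_"}[d] for d in binary)
# len(encoded) == 40, which is why about 40 hard-wired states are enough
# for M* to print the encoding "by heart".
```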
Since M∗ has tape alphabet of size 2, we just need to show two things to get a contradiction: (i) M∗ runs
for longer than BB2(10^6) steps on the empty input; (ii) M∗ does not have more than 10^6 states.
The fact that M∗ runs for longer than BB2(10^6) steps on the empty input is obvious by the construction
of M∗, specifically by step 3. Now, how many states does M∗ have? Firstly, it takes 1000 states to run
the computation of M′, since M′ has 1000 states. But M∗ needs a few more states for step 1 and step 3. For
step 1, since the encoding of 10^6 has 40 characters, M∗ needs at most 40 states to remember the encoding
“by heart”, and a few more states to return its tape head to the left end of the input after printing out the
encoding, say 50 states in all. For step 3, the decrementation can be done using O(1) states (think about
this if you need to); concretely, and being quite generous, 20 states suffice for doing the decrementation. So
altogether M∗ uses 50 + 1000 + 20 = 1070 states, which is far less than 10^6.
For you to think about: Say, above, that M′ had 10^10 states instead of 1000 states. Check that a
contradiction can still be obtained by choosing a larger value than 10^6, say by choosing 10^11 instead of 10^6.
One can also write the proof more generally to work for any number of states of M′.
Note: The above does not work for k = 1 (namely to show BB1 (·) is uncomputable) because M ′ needs tape
alphabet size at least 2 to simulate M , and 2 > 1. Of course, since BB1 (·) is actually computable, we could
never hope to prove BB1 (·) is uncomputable.
We can also show the following consequence:
⁴ To decrement a value x by a means to subtract a from x.
Theorem 2. Let f : N → N be such that f (n) ≥ BB2 (n) for all n ∈ N. Then f is uncomputable.
Note: In other words, this theorem says that no function upper bounding BB2(·) is computable. (Likewise,
no function upper bounding BBk(·) is computable for any k ≥ 2, and the proof is the same.) Put differently,
the function BB2(·) grows faster than any computable function: any function you can write down a formula
for will grow slower than BB2!
Proof. Say f is a function such that f (n) ≥ BB2 (n) for all n and assume by contradiction that f is computable
by some machine M . Then I claim we can compute BB2 (·) using this machine M . On input n, we start
by computing f (n) (this is where we use that f is computable). Then we look through the (finitely many!)
machines that have |Γ| = 2 and that have n non-halting states. For each such machine, we simulate the
computation of this machine on the empty tape, counting the number of steps it takes. If it halts when
≤ f (n) steps have been taken, we remember this number somewhere (actually, we are only interested in the
maximum such value, so we only need to remember the largest such number among all machines); and if the
machine goes for more than f (n) steps, then we know this machine will never halt, because BB2 (n) ≤ f (n),
and so we can stop simulating this machine and move to the next machine. In this way, we will find the
largest number of steps of any halting machine, and compute BB2 (n), which is a contradiction.
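The step-bounded simulation at the heart of this proof can be sketched in code. This is only a sketch of the idea, not the notes’ construction: the function name and the dictionary representation of δ are my own choices, and as a sanity check it is run on the three-state example machine from the beginning of these notes rather than on an enumeration of all n-state machines.

```python
def run_bounded(delta, start, halt_states, max_steps, blank="⊔"):
    """Simulate a Turing machine from an all-blank tape for at most max_steps steps.
    delta maps (state, symbol) to (written_symbol, next_state, move), move = +1 or -1.
    Returns the number of steps taken if the machine halts within the bound,
    and None if the bound is exceeded (in the proof: the machine never halts)."""
    tape, head, state, steps = {}, 0, start, 0
    while state not in halt_states:
        if steps == max_steps:
            return None
        symbol = tape.get(head, blank)
        written, state, move = delta[(state, symbol)]
        tape[head] = written
        head += move
        steps += 1
    return steps

# The three-state example machine from the start of these notes (H is the halting state):
example = {
    ("A", "⊔"): ("1", "B", +1), ("A", "1"): ("1", "H", +1),
    ("B", "⊔"): ("⊔", "C", +1), ("B", "1"): ("1", "B", +1),
    ("C", "⊔"): ("1", "C", -1), ("C", "1"): ("1", "A", -1),
}
```

To compute BB2(n) as in the proof, one would run this loop with max_steps = f(n) on every 2-symbol machine with n non-halting states and take the maximum of the returned step counts.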
Also we have this consequence:
Theorem 3. The problem of deciding whether a given Turing machine halts or not on a given input (given
a description of the Turing machine, and given the input) is undecidable.
Proof. In fact, it is already undecidable, in general, to tell whether a Turing machine will halt on the empty
input. Indeed, if we had a procedure for deciding whether a given machine is going to halt on the empty
input, we could use this procedure to compute BB2 (·), using the idea outlined in the previous proof.
Consequences of Busy Beaver functions in mathematics. The problem of computing Busy Beaver function
values may seem artificial to you, but it actually has deep (if useless) connections to mathematics. For
example, you may have heard of Goldbach’s conjecture, which states that every even number greater than 2
is the sum of two primes. We can rephrase this as a question about whether a certain Turing machine
ever halts. The Turing machine in question iterates through the even numbers n = 4, 6, 8, . . ., and for each
number, checks whether it can find two primes that sum to it; the machine halts only if it encounters
an n which is not the sum of two primes. So this machine halts if and only if Goldbach’s conjecture is false.
Moreover, such a Turing machine can be implemented in relatively few states, around 20-30 states with
tape alphabet Γ = {⊔, 0, 1}. For concreteness, say there exists such a machine that uses 23 states; if we
could compute BB3(23), then we would know exactly how long we would need to let the Goldbach machine
run before knowing whether the conjecture is true or false (if the Goldbach machine runs for more than BB3(23)
steps, then we know it will never stop, and so the conjecture is true). Of course, the problem is that BB3
is not computable; in fact, determining BB3(23) hides the problem of solving deep mathematical questions
like the Goldbach conjecture, since to determine BB3(23) we would need to know, in particular, whether the
Goldbach Turing machine stops or not.
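The Goldbach machine’s search loop is easy to state in code. The sketch below is mine, not the 23-state machine itself; in particular, the `limit` parameter is an artificial cutoff so the function always returns, whereas the actual machine searches forever:

```python
def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def goldbach_machine(limit):
    """Scan the even numbers n = 4, 6, 8, ... below limit.
    Returns the first n that is NOT a sum of two primes (the machine halts),
    or None if no counterexample is found (the machine keeps running)."""
    for n in range(4, limit, 2):
        if not any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1)):
            return n
    return None
```

So the question “does this loop ever return a number?” is exactly the question “is Goldbach’s conjecture false?”.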
Universal Turing machines
In this section we will construct a Turing machine that is capable of simulating any other Turing machine,
when given the description of the other machine as a string. Such a Turing machine is called a universal
Turing machine.
Our Turing machine will have, say, tape alphabet
Γ0 = {0, 1, . . . , 9, 0̄, 1̄, . . . , 9̄, (, ), #, x, ⊔}.
(So Γ0 has 25 symbols: the ten digits, a marked copy of each digit, and five further symbols.) First we show
how to encode any other Turing machine (even Turing machines
using a larger, or different tape alphabet) as a string in Γ∗0 . A Turing machine M = (Q, Σ, Γ, δ, q0 , qA , qR ) is
encoded as a string which gives:
- the number |Q| − 2 of non-halting states of M ; this is encoded as a decimal number, followed by ‘#’,
- the number of characters in Γ, including the character ⊔; encoded as a decimal number, followed by
‘#’,
- the list of elements in δ; each element of δ is a tuple
(α, q, α′ , q ′ , T ) ∈ Γ × Q × Γ × Q × {L, R}
which we encode as a string (. . . # . . . # . . . # . . . # . . .) by assigning to each element of Γ a number between
1 and |Γ|, to each state a number between 1 and |Q|, where we assign the start state q0 number 1 and the
accept and reject states numbers |Q| − 1 and |Q|, respectively; and where we put 0 for L and 1 for R.
That’s it! The same encoding can in fact be used whether M is a machine with a single halting state or
two different halting states qA, qR: if r is the number of non-halting states of M (given at the beginning of
the encoding), then any state number > r is a halting state, with state number r + 1 being the accept
state and state number r + 2 being the reject state, by default. For example, the very first Turing machine
we gave becomes encoded as the string
we gave becomes encoded as the string
3#2#(1#1#2#2#1)(1#2#1#3#1)(1#3#2#3#0)(2#1#2#4#1)(2#2#2#2#1)(2#3#2#1#0)
Here we had Γ = {⊔, 1} and we encoded ⊔ as 1 and 1 as 2. We had three non-halting states A, B and C
which we encoded as 1, 2 and 3 respectively. Proceeding like this, we can encode any Turing machine M .
We write ⟨M⟩ for the encoding of M; thus ⟨M⟩ is a string in Γ∗0.
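The encoding of a machine can be generated mechanically. The sketch below (function name and dictionary layout are mine) reproduces the string above for the first example machine, with states A, B, C, H numbered 1, 2, 3, 4 and symbols ⊔, 1 numbered 1, 2:

```python
def encode_machine(num_nonhalting, num_symbols, delta):
    """Encode a TM in the format described above. delta maps already-numbered
    (symbol, state) pairs to (symbol', state', direction) triples, with
    direction 0 for L and 1 for R. Tuples are listed sorted by (symbol, state)."""
    parts = [f"{num_nonhalting}#{num_symbols}#"]
    for (a, q), (a2, q2, d) in sorted(delta.items()):
        parts.append(f"({a}#{q}#{a2}#{q2}#{d})")
    return "".join(parts)

# The first example machine: (⊔,A)->(1,B,R), (⊔,B)->(⊔,C,R), (⊔,C)->(1,C,L),
#                            (1,A)->(1,H,R), (1,B)->(1,B,R), (1,C)->(1,A,L).
example = {
    (1, 1): (2, 2, 1), (1, 2): (1, 3, 1), (1, 3): (2, 3, 0),
    (2, 1): (2, 4, 1), (2, 2): (2, 2, 1), (2, 3): (2, 1, 0),
}
```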
We can also encode M’s inputs as strings in Γ∗0. An input for M is (originally) a string in Σ∗. We
encode each character of this string as a decimal number between 1 and |Σ|, putting ‘#’ between successive
numbers, and also putting ‘#’ at the beginning and end of the string. For example if Σ = {0, 1} then the
input
001001
would be encoded as the string
#1#1#2#1#1#2#
We write ⟨x⟩ for the encoding of an input x ∈ Σ∗. Thus ⟨x⟩ ∈ Γ∗0. We encode a pair (M, x) consisting of a
Turing machine M = (Q, Σ, . . .) and an input x ∈ Σ∗ by concatenating the encodings of M and x; that is,
the encoding of (M, x) is simply ⟨M⟩⟨x⟩. For example, if M is our first Turing machine, and if x = 001001,
then we encode the pair (M, x) as the string
3#2#(1#1#2#2#1)(1#2#1#3#1) . . . (2#3#2#1#0)#1#1#2#1#1#2#
(where ‘. . .’ contains more Turing machine encoding). We write ⟨M, x⟩ for the encoding of the pair (M, x).
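The input and pair encodings are equally mechanical. Again the function names are mine; the alphabet is passed as a list whose order fixes the 1-based numbering of characters:

```python
def encode_input(x, sigma):
    """Encode a string over alphabet sigma as #n1#n2#...#, one decimal number
    per character, numbering sigma's characters 1, 2, ... in list order."""
    return "#" + "".join(f"{sigma.index(c) + 1}#" for c in x)

def encode_pair(machine_code, input_code):
    """The encoding of (M, x) is simply the concatenation of the two encodings."""
    return machine_code + input_code
```

With Σ = {0, 1} numbered so that 0 maps to 1 and 1 maps to 2, `encode_input("001001", ["0", "1"])` yields the string #1#1#2#1#1#2# shown above.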
[universal Turing machine notes are still unfinished—sorry!]