Gödel's Incompleteness Results
Haim Gaifman

Chapter 0: The Easy Way to Gödel's Proof and Related Topics[*]

[*] A revised 2007 version of a piece written in 2003.

This short sketch of Gödel's incompleteness proof shows how it arises naturally from Cantor's diagonalization method [1891]. It renders Gödel's proof and its relation to the semantic paradoxes transparent. Some historical details, which are often ignored, are pointed out. We also make some observations on circularity and draw brief comparisons with natural language. The sketch does not include the messy details of the arithmetization of the language, but the motives for it are made obvious. We suggest this as a more efficient way to teach the topic than what is found in the standard textbooks. For the sake of self-containment, Cantor's original diagonalization is included. A broader and more technical perspective on diagonalization is given in [Gaifman 2005].

In [1891] Cantor presented a new type of argument that shows that the set of all binary sequences (sequences of the form a0, a1, ..., an, ..., where each ai is either 0 or 1) is not denumerable ─ that is, cannot be arranged in a sequence where the index ranges over the natural numbers. Let A0, A1, ..., An, ... be a sequence of binary sequences. Say An = an,0, an,1, ..., an,i, ... . Define a new sequence A* = b0, b1, ..., bn, ..., by putting:

bn = 1, if an,n = 0;  bn = 0, if an,n = 1.

Then, for each n, A* ≠ An, since the nth member of A* differs from the nth member of An. Hence, A* does not appear among the Ai's. A diagram of the following form, which appears already in Cantor's original paper, illustrates the idea. The new sequence A* is obtained from the diagonal, by changing each of its values. The method came to be known as diagonalization.

A0 = a0,0  a0,1  . . .  a0,n  . . .
A1 = a1,0  a1,1  . . .  a1,n  . . .
 .
 .
An = an,0  an,1  . . .  an,n  . . .
 .
 .

There is an elementary, well known correspondence between sets of natural numbers and binary sequences: A set of natural numbers, X, determines the sequence a0, a1, ..., an, ..., defined by: ai = 1, if i ∈ X; ai = 0, if i ∉ X. (As usual, '∈' is used for the membership relation.) Vice versa, every binary sequence a0, a1, ..., an, ... determines the set of all i's for which ai = 1. Obviously, the correspondence is one-to-one. Cantor's argument therefore shows that the set of all subsets of natural numbers cannot be enumerated. In fact, diagonalization can be (and historically was) applied directly to sets. Given a sequence of sets of natural numbers, X0, X1, ..., Xn, ..., let

X* = the set of all natural numbers, n, such that n ∉ Xn.

Then, for each n, X* ≠ Xn, because n ∈ X* ⇔ n ∉ Xn.

It is convenient for our purposes to rewrite 'y ∈ X' as 'X(y)'. Then 'y ∉ X' is written as 'not-X(y)'. Thus we use 'X( )' as a predicate. With this notation, let us go through the construction and the argument in some detail. Given an enumeration of sets, X0, X1, ..., Xn, ..., define X* by the condition:

X*(n) ⇔ not-Xn(n)

Suppose, for contradiction, that, for some k, X* = Xk; then for every n we have:

Xk(n) ⇔ not-Xn(n)

For n = k we get:

Xk(k) ⇔ not-Xk(k)

Contradiction.
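The positive content of the argument is easy to render as a short program. The following Python sketch (ours, not part of the paper; the names membership and diagonal_set are illustrative) takes an enumeration of sets, given as a membership predicate, and produces the diagonal set X*, checking that it differs from each Xn at the number n itself.

def diagonal_set(membership):
    # membership(n, m) holds iff m is in X_n; return the predicate for X*:
    # X*(n) iff not X_n(n).
    return lambda n: not membership(n, n)

# A toy enumeration: X_n = the set of multiples of n+1.
def membership(n, m):
    return m % (n + 1) == 0

x_star = diagonal_set(membership)
for n in range(10):
    assert x_star(n) != membership(n, n)   # X* differs from X_n at n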
The diagonalization that is used in defining X* consists in applying to each number, n, the predicate 'Xn( )' associated with it. In general, diagonalization can be used when we are given some domain of objects and a correlation that associates with some of them functions that are defined over this very same domain. Think of the object as a name that represents the correlated function.[1] Diagonalization then consists in applying to an object the function it represents. The concept extends to the case where instead of functions we have subsets. This is done by correlating with every subset, X, the function that maps each x to a truth-value: True ─ if x ∈ X, False ─ if x ∉ X. If Xa is the set associated with a, diagonalization means that we consider Xa(a). This, strictly speaking, denotes a truth-value, but we may also use 'Xa(a)' to make the assertion that this value is True, i.e., that a ∈ Xa.[2]

[1] The correlation need not be one-to-one; the same function can be represented by different objects. Also, the functions can be defined over a more inclusive domain.

[2] Thus we employ systematic ambiguity. The context decides how the notation is read. We should keep in mind the important philosophical distinction, pointed out by Wittgenstein, between denoting and asserting; but there should be no confusion concerning our notation.

Note that the above argument involves two uses of diagonalization: X*( ) is defined by considering those n's for which Xn(n), and this is the first use; to this we apply negation and then, assuming that X* is the same as Xk, we consider Xk(k) ─ and this is the second use. We have thus two diagonalizations, with negation sandwiched between.

Cantor's original claim is negative: there is no enumeration of all sets of natural numbers. The proof, however, can be seen as establishing a positive result: a prescription for defining ─ on the basis of any enumeration of sets of natural numbers ─ a set that differs from all enumerated sets.[3] It is in this form that the method found numerous applications. It is, among other things, the most basic tool of complexity theory in theoretical computer science.

[3] Some mathematicians find the notion of the set of all sets of natural numbers problematic. This notion figures in the negative claim, but there is no problem with the positive claim.

A crucial historical step was taken by Richard [1905], who, in a one-page paper, applied diagonalization in a linguistic context, in the following way. Consider an enumeration of all definitions of sets of natural numbers, in some natural language, say English (in Richard's case it would be French). The enumeration is based on the fact that all strings of words, punctuation marks and any other symbols used in the language can be enumerated in one sequence. Deleting from this sequence all the strings that are not definitions of sets of natural numbers, we get an enumeration of all definitions. Let Xn = the set defined by the nth definition. Applying Cantorian diagonalization, we get a set X* not in this sequence. Yet X* has been defined in the same natural language! So it should appear in the sequence after all. This, essentially, is Richard's paradox.[4]

[4] In Richard's paper one enumerates real numbers, by enumerating all definitions of decimal sequences. Let ai(j) be the jth decimal in the sequence defined by the ith definition. Define rn by: rn = an(n)+1, if an(n) < 9; rn = 1, if an(n) = 9. The real number whose decimal expansion is r0, ..., rn, ... is different from all the real numbers defined in our language; yet we have just defined it. The difference between this and our variant is not essential.
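The first step of Richard's construction, the enumeration of all strings in one sequence, is itself unproblematic and easy to program. The Python sketch below (ours) enumerates every finite string over a given alphabet, by length and then lexicographically; the step that resists formalization is the next one, deciding which strings count as definitions of sets of natural numbers.

from itertools import count, product

def all_strings(alphabet):
    # Enumerate every finite string over the alphabet in one sequence:
    # by length, then lexicographically within each length.
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

gen = all_strings("ab")
print([next(gen) for _ in range(8)])   # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa']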
In the same paper, Richard proposes a solution to his paradox. He argues that the purported definition of X* is illegitimate on grounds of circularity: it uses the set of all definitions, to which this definition itself belongs. The string of words that constitutes that "definition" should be deleted when we weed out the non-definitions from our initial sequence of strings. Only after constructing a sequence of legitimate definitions can we define the new set; but that definition is not in the old sequence. Richard's very brief explanation is a little obscure, but it contains the kernel of a valid idea. It also anticipates the much later idea of a linguistic hierarchy.

Poincaré [1906] called attention to Richard's paradox. He also adopted what he saw as the solution, giving Richard credit for it. Poincaré argued that the "set of all definitions" is not well-defined, since it contains "definitions" that appeal to this very same set. The vicious circle principle, which rules out such constructions, was adopted by Russell [1906], [1908]. His theory of types, which is the basis of Principia Mathematica, was designed with the aim of ensuring, by a bottom-up construction, that there would be no circularity. There is, in fact, a simple direct argument that identifies, without going into the intricacies of "the set of all definitions", the circularity and the failure in the purported definition of X*. The same kind of failure underlies other semantical paradoxes. In order to avoid interrupting the historical account, the argument will be given in the Appendix.

Richard's analysis (and, by implication, that of Poincaré, as well as Russell's) was rejected by Peano [1906], who maintained that the paradox is linguistic, for it employs non-mathematical concepts and it would not arise in a formal setting. Peano's view was later endorsed by Ramsey [1925, pp. 183-184 in Ramsey, Philosophical Papers]. Ramsey proposed to distinguish between set-theoretical paradoxes and another group of paradoxes, which, besides mathematical notions, also appeal to language and thought. He classified them as belonging to epistemology (an unfortunate choice of term). Later they became known as semantic.

The semantic notion underlying Richard's paradox is the concept of definition. This concept is closely related to the concept of truth; there is, in fact, a direct route from Richard's paradox to the Liar. Here it is: Assume, with no loss of generality, that our definitions have the form: 'the set of all v such that ...v...', where '...v...' is an expression stating a condition on v. Let the enumeration of these expressions be: ϕ0(v), ϕ1(v), ..., ϕn(v), .... In a formal language these would be wffs with one free variable. The condition that m ∈ Xn is now to be stated as: ϕn(m) is true, where the m appearing inside the wff stands for a term of that language that denotes the number m; we shall henceforth refer to this term as the numeral of m. If we ourselves are using that language, then instead of stating: ϕn(m) is true, we can simply state: ϕn(m). But the statements would be different for different wffs. We therefore cannot use in our definition all the statements that arise from different ϕn's, as 'n' ranges over the natural numbers. We are forced to use the truth predicate: x is true. (In general, we should apply 'is true' to the name of the sentence in question, rather than the sentence itself. But to simplify the notation let us, for the moment, use 'ϕn(m)' as a name of itself ─ what is known in logic as autonymous use. This is possible, since we are using English in order to speak about another language and there is no danger of confusion. Later, when everything is done in the same formal language, we shall introduce names for the wffs.)
X* is therefore defined by:

(1) X*(n) ⇔Df ϕn(n) is not true

If X* were definable by a wff ϕk(v), then this wff would say the following:

(2) The sentence, obtained by substituting, in ϕv( ), the free variable by the numeral of v, is not true.

"Saying" this means that for all n, the following holds:

(3) ϕk(n) ⇔ (ϕn(n) is not true).

For n = k, the biconditional becomes:

(4) ϕk(k) ⇔ (ϕk(k) is not true)

Thus we get a sentence that "says of itself" that it is not true. This is a version of the Liar paradox. The classical Liar, which goes back to Eubulides in the fourth century B.C., is based on utterances of the form "What I am saying now is not true", or "The sentence written in ... is not true", where '...' refers to a location in which this very same sentence is written. These versions are based on the use of the truth predicate within the same language; but they also require the use of 'now', or 'here', or names of locations in which there are sentences. The above shows that the latter requirement is not needed. In a sufficiently expressive language ─ in which we can describe the language's syntax and enumerate its wffs ─ we can achieve self-reference by diagonalization.

It is known that initially Gödel tried to prove the consistency of real number theory, by representing real numbers in terms of arithmetically definable sets in some countable language.[5] In all probability, he reflected on Richard's paradox and found the version given above. He knew that the trouble with formulating (1) in the same formal language is the use of semantic concepts. It is likely that the reasoning that leads from (1) to (4) led him to the idea of his proof: if we replace 'true' by 'provable', then there will be no obstacle to expressing the right-hand side of (1) in the same language, provided that the language is sufficiently expressive. This will establish the essential gap between truth and provability. Evidence can be found in the introduction to his 1931 paper (pp. 147-149 in the Collected Works), where he explains the idea of the proof.

[5] Cf. Dawson [1997] p. 61.

Here is a sketch. The presupposed formal system is derived from Principia Mathematica and referred to rather loosely as 'PM'. Later, when the system is precisely specified, he uses 'P' instead.[6]

[6] Gödel's referring to Principia Mathematica as a "formal system" is far from accurate. That landmark is not a well-defined formal system; formalizing it in a way that remains faithful to the authors' ideas ─ in particular, to Russell's theory of propositions ─ is still an open task that requires creative work. Gödel's system P corresponds to a certain fragment of Principia Mathematica, modified and represented as a Hilbert-type system. It amounts to third-order arithmetic. In Gödel [1934] the reference to Principia Mathematica is omitted. The system is still third-order arithmetic. Later it became clear that first-order arithmetic (what came to be known as Peano Arithmetic) is sufficient. In the early fifties this was reduced to a very weak fragment of it known as Robinson's Q.

He defines a class sign as any formula of PM with one free variable that ranges over the natural numbers. The class signs are enumerated in a sequence (he suggests in a footnote a possible enumeration); let R(n) be the nth class sign. For any class sign α, let [α;n] be the formula obtained by substituting the free variable by a sign denoting the natural number n. Gödel notes that "the ternary relation x = [y;z] is seen to be definable in PM." He then goes on to define a class K of natural numbers by the condition:

n ∈ K ≡ ¬Bew[R(n); n]

'Bew' (from the German beweisbar) means provable; in Gödel's text the negation is written as a bar over the formula, rendered here by '¬'. In our notation the definition is:

n ∈ K ⇔ ¬provable [R(n); n].

He then observes that all the notions that occur in the right-hand side of the definition can be defined in PM, therefore K can be defined in PM, which means that "there is a class sign S such that the formula [S;n], interpreted according to the meaning of the terms of PM, states that the natural number n belongs to K." Hence, for some natural number q, the class sign S is identical with R(q). Gödel then considers [R(q);q] and goes on to show that this sentence (in his terminology, proposition) is undecidable in PM.
The argument is the same as that which proves our Theorem (I) below. He goes on to remark, "The analogy of the argument with the Richard antinomy leaps to the eye. It is closely related to the Liar too." In a footnote he notes that "any epistemological antinomy could be used in a similar proof of the existence of undecidable propositions." (Evidently, he adopted Ramsey's term for what we now call semantic paradoxes.)

Assuming, as Gödel did, that the language is about natural numbers, one will have to represent wffs and proofs by numbers. Hence the arithmetization of the syntax, a project that Gödel carried out in full detail. The number representing a syntactic item (a wff, or a proof) is nowadays called the Gödel number of the item. Thus the idea of the proof would lead one naturally to the project of arithmetization, though he might have had the idea of arithmetization before that.
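Nothing in what follows depends on the details of the coding, so here is just one toy possibility, sketched in Python (ours; Gödel's own coding uses prime powers): fix a finite list of symbols and read an expression as a number written in base (number of symbols + 1). Distinct expressions receive distinct numbers, and the expression is recoverable from its number.

SYMBOLS = ['0', 'S', '+', '*', '=', '(', ')', 'v', '~', '&', 'E']   # an illustrative alphabet
BASE = len(SYMBOLS) + 1

def godel_number(expr):
    # Read the expression as a number in base BASE, with digits 1..len(SYMBOLS).
    n = 0
    for ch in expr:
        n = n * BASE + SYMBOLS.index(ch) + 1
    return n

def expression_of(n):
    # Recover the expression from its Gödel number.
    chars = []
    while n > 0:
        n, d = divmod(n, BASE)
        chars.append(SYMBOLS[d - 1])
    return "".join(reversed(chars))

assert expression_of(godel_number("Ev(v=S0)")) == "Ev(v=S0)"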
Assume, to start with, a formal deductive system, based on first-order logic, the language of which is interpreted in a model, M, whose universe consists of the natural numbers, 0, 1, 2, ..., n, .... The language should contain function symbols and predicates with the help of which the syntax can be expressed, via some encoding. Truth means truth in this model. We also assume a formal theory T whose theorems are true in the given interpretation. T is given by a set of non-logical axioms, the inference rules being those of first-order logic. A proof in T is a proof from its axioms. From now on 'proof' means proof in T, and 'provable' means provable in T, unless the context indicates otherwise.

Our goal is to construct a wff that expresses the analog of (2), where 'true' is replaced by 'provable':

(2#) The sentence obtained by substituting, in ϕv( ), the free variable by the numeral of v, is not provable.

Assume that we have an encoding of syntactic items into numbers, which (among other things) associates with every wff α a Gödel number; let this number be GN(α). Let us now find what is required for the expression of (2#). The name of a wff α in our language is the name of this number, i.e., the numeral of GN(α). Let this be ⌜α⌝.[7]

[7] Those acquainted with Quine's corners should note that our notation is not Quine's corner notation. It is a special notation often adopted in this context.

We assume that we have a wff with two free variables, Proof(z, v), that "says" that z is a proof of v. The scare quotes are a reminder that the wff says something about natural numbers; it is only through the encoding that we can view it as expressing statements about proofs. To qualify for its role, Proof(z, v) should define the proof relation (where proofs and wffs are identified with numbers); that is: Proof(n, m) is true iff n is a code of a proof and m is the Gödel number of the wff proved by it. But for our purposes it is only required that this hold for the cases in which m is a Gödel number of some wff; we do not care what happens if m is not a Gödel number of a wff. We therefore assume that, for every wff α and every number n, we have:

(5) Proof(n, ⌜α⌝) is true ⇔ n is (a code of) a proof of α.

It follows that, for every wff α:

(6) ∃z Proof(z, ⌜α⌝) is true ⇔ α is provable.

The argument is easy: If ∃z Proof(z, ⌜α⌝) is true, then some number n satisfies the wff Proof(z, ⌜α⌝), hence Proof(n, ⌜α⌝) is true; by (5) this implies that n is (a code of) a proof of α, hence α is provable. Vice versa, if α is provable, it has a proof. If n codes it, then, by (5), Proof(n, ⌜α⌝) is true. ∃z Proof(z, ⌜α⌝) is a logical consequence of Proof(n, ⌜α⌝), hence it is true. (6) is obviously equivalent to:

(6') ¬∃z Proof(z, ⌜α⌝) is true ⇔ α is not provable.

We shall use ¬∃z Proof(z, u) as the wff that "says" that u is not provable. In order to construct the wff that expresses (2#), we have to describe in our language the enumeration of wffs and we have to define the operation of substituting, in ϕn(v), the free variable by the numeral of n. Now, we already have a representation of wffs by their Gödel numbers. Let ϕn(v) = the wff whose Gödel number is n. (This is defined whenever n is a Gödel number of a wff with one free variable. We need not worry about n's that are not Gödel numbers of such wffs; they will drop out of the construction.) Since n = GN(ϕn(v)), we have: ϕn(n) = ϕn(⌜ϕn(v)⌝). We can therefore drop the index and consider, for any wff ϕ(v), the substitution of the free variable of ϕ(v) by its name. The result of this operation is ϕ(⌜ϕ(v)⌝). Call it the diagonalization of ϕ(v). Our (2#) now becomes:

(2*) The diagonalization of v is not provable.

Note: For notational purposes it is convenient to let 'v' be the free variable of the wff. One can with equal ease describe in the language a more general function, where ϕ may contain several free variables and we substitute a fixed, agreed-upon variable by ⌜ϕ⌝. Or we can substitute the first free variable, in some fixed pre-given ordering of all the variables of the language. Such functions are used in extending various results.
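Before expressing it inside the formal language, it may help to see the diagonalization operation performed from the outside. The following Python sketch (ours; it assumes the free variable 'v' occurs nowhere else in the formula, lets decimal numerals stand in for the numerals of the formal language, and codes a formula by reading its text as one long integer, a variant of the coding sketched earlier) computes ϕ(⌜ϕ(v)⌝) and applies it to the wff of (2*), written here informally with a function symbol diag:

import re

def gn(formula):
    # Toy Gödel number: read the formula's text, character by character, as one integer.
    return int.from_bytes(formula.encode(), "big")

def formula_of(n):
    # Recover the formula from its toy Gödel number.
    return n.to_bytes((n.bit_length() + 7) // 8, "big").decode()

def diag(formula):
    # The diagonalization of formula(v): substitute the free variable 'v'
    # by the (decimal) numeral of the formula's own Gödel number.
    return formula.replace("v", str(gn(formula)))

phi = "~Ez Proof(z, diag(v))"      # informally: "the diagonalization of v is not provable"
gamma = diag(phi)

# The numeral occurring in gamma names phi, and the diagonalization of the
# formula it names is gamma itself: gamma speaks about gamma.
named = formula_of(int(re.search(r"[0-9]+", gamma).group()))
assert named == phi and diag(named) == gamma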
To express (2*) we describe in the formal language a diagonalization function, that is: a function that maps the Gödel number of ϕ(v) to the Gödel number of its diagonalization. (It does not matter how the function is defined on numbers that are not Gödel numbers.) This is easily done if we can use a term with one free variable, diag(x), such that, for every wff ϕ(v),

diag(⌜ϕ(v)⌝) ≈ ⌜ϕ(⌜ϕ(v)⌝)⌝ is true,

where ≈ is the equality symbol of the language. Then the wff that expresses (2*) is: ¬∃y Proof(y, diag(v)). The existence of a term diag(x) is not necessary, however. It is sufficient if we have a wff δ(x, y) that defines: y is the diagonalization of x, in the following sense: for each ϕ(v),

(7) ∀y [δ(⌜ϕ(v)⌝, y) ↔ y ≈ ⌜ϕ(⌜ϕ(v)⌝)⌝] is true.

Then the wff ψ(v), defined as follows, expresses (2*):

(8) ψ(v) =Df ∀y [δ(v, y) → ¬∃z Proof(z, y)].

What qualifies ψ(v) as expressing, for our purposes, (2*) is the following property:

(9) ψ(⌜ϕ(v)⌝) ↔ ¬∃z Proof(z, ⌜ϕ(⌜ϕ(v)⌝)⌝) is true, for each ϕ(v).

Proof of (9): ψ(⌜ϕ(v)⌝) is, by definition, ∀y [δ(⌜ϕ(v)⌝, y) → ¬∃z Proof(z, y)]. Now ∀y [δ(⌜ϕ(v)⌝, y) ↔ y ≈ ⌜ϕ(⌜ϕ(v)⌝)⌝] says that there is exactly one y such that δ(⌜ϕ(v)⌝, y) and that this y is equal to ⌜ϕ(⌜ϕ(v)⌝)⌝. This implies in pure logic:

∀y [δ(⌜ϕ(v)⌝, y) → ¬∃z Proof(z, y)] ↔ ¬∃z Proof(z, ⌜ϕ(⌜ϕ(v)⌝)⌝).

Applying (9) for the case ϕ(v) = ψ(v), we deduce that the following biconditional is true:

ψ(⌜ψ(v)⌝) ↔ ¬∃z Proof(z, ⌜ψ(⌜ψ(v)⌝)⌝).

Letting γ = ψ(⌜ψ(v)⌝), we have:

(10) γ ↔ ¬∃z Proof(z, ⌜γ⌝) is true.

γ is therefore a sentence that "says of itself" that it is not provable. It is sometimes referred to as the Gödel sentence. Note that γ's construction is completely analogous to the construction of ϕk(k) that figures in (4). While ϕk(k) says of itself that it is not true, γ says of itself that it is not provable.

Theorem (I) Neither γ nor ¬γ is provable.

Proof: If γ is provable, then it is true. Therefore ¬∃z Proof(z, ⌜γ⌝) is true. But this implies, via (6'), that γ is not provable. Contradiction. For the second claim note that (10) is equivalent to the truth of ¬γ ↔ ∃z Proof(z, ⌜γ⌝). If ¬γ is provable, then it is true, hence ∃z Proof(z, ⌜γ⌝) is true. Via (6), this implies that γ is provable, hence true. It follows that both ¬γ and γ are true, which is impossible. qed

Theorem (I) is a version of the first incompleteness theorem, whose proof is indicated in the introduction of [Gödel 1931]. The proof presupposes an intended model for the language and uses the concept of being true in that model. Gödel noted that a similar result is derivable from syntactic assumptions concerning T, without appealing to the intended interpretation. That result is known as the first incompleteness theorem. Strictly speaking, the two versions are incomparable, because the syntactic version requires assumptions about T that are not required in Theorem (I).

We use 'T ⊢ α', as well as '⊢T α', to say that α is provable in T. We now replace (5) by:

(11) For every wff α and every number n:
(i) n codes a proof in T of α ⇒ T ⊢ Proof(n, ⌜α⌝)
(ii) n does not code a proof in T of α ⇒ T ⊢ ¬Proof(n, ⌜α⌝)

And we replace (7) by:

(12) For every wff ϕ(v), T ⊢ ∀y [δ(⌜ϕ(v)⌝, y) ↔ y ≈ ⌜ϕ(⌜ϕ(v)⌝)⌝]

Letting ψ and γ be as before, we get the following analogues of (9) and (10):

(13) For all ϕ(v), T ⊢ ψ(⌜ϕ(v)⌝) ↔ ¬∃z Proof(z, ⌜ϕ(⌜ϕ(v)⌝)⌝)

(14) T ⊢ γ ↔ ¬∃z Proof(z, ⌜γ⌝)

(13) is established by formalizing, inside T, the argument that proves (9). The biconditional ψ(⌜ϕ(v)⌝) ↔ ¬∃z Proof(z, ⌜ϕ(⌜ϕ(v)⌝)⌝) is a logical consequence of ∀y [δ(⌜ϕ(v)⌝, y) ↔ y ≈ ⌜ϕ(⌜ϕ(v)⌝)⌝], hence we can derive it from the latter in first-order logic. If ∀y [δ(⌜ϕ(v)⌝, y) ↔ y ≈ ⌜ϕ(⌜ϕ(v)⌝)⌝] is provable in T, so is ψ(⌜ϕ(v)⌝) ↔ ¬∃z Proof(z, ⌜ϕ(⌜ϕ(v)⌝)⌝). (14) follows from (13) by letting ϕ(v) = ψ(v). Note that (13), hence (14), follows from (12) alone. We need (11) in deriving the consequences of (14).

The argument now runs as follows. Suppose that T ⊢ γ. Then there is a proof of γ. Let n be its code. From (i) in (11), for the case α = γ, we have: T ⊢ Proof(n, ⌜γ⌝). Since Proof(n, ⌜γ⌝) → ∃z Proof(z, ⌜γ⌝) is a theorem of first-order logic, T ⊢ ∃z Proof(z, ⌜γ⌝). On the other hand, we can combine the proof of γ with the proof of γ ↔ ¬∃z Proof(z, ⌜γ⌝) and get a proof of ¬∃z Proof(z, ⌜γ⌝). Hence T ⊢ ¬∃z Proof(z, ⌜γ⌝). Thus T is inconsistent. This shows that if T is consistent, γ is not provable in it. Next, suppose that T ⊢ ¬γ. From γ ↔ ¬∃z Proof(z, ⌜γ⌝) we can derive in sentential logic: ¬γ ↔ ∃z Proof(z, ⌜γ⌝); hence we can prove in T: ∃z Proof(z, ⌜γ⌝).
But if T is consistent, there is no proof in it of γ; otherwise both ¬γ and γ would be provable in T. By (ii) of (11) it follows that for every n, T ⊢ ¬Proof(n, ⌜γ⌝). Therefore, if ¬γ is provable, then T has the following undesirable property: T ⊢ ∃z Proof(z, ⌜γ⌝), but for every n, T ⊢ ¬Proof(n, ⌜γ⌝).

If for some wff α(z), T ⊢ ∃z α(z) and also, for every n, T ⊢ ¬α(n), then T is called ω-inconsistent. An ω-consistent theory is one that is not ω-inconsistent. An inconsistent theory is also ω-inconsistent, since in an inconsistent theory all wffs are provable. But the reverse implication need not hold; there are consistent but ω-inconsistent theories.[8] Given that the individual variables are supposed to range over the natural numbers, ω-inconsistency is a very bad feature. It means that we can prove in the theory a positive existential claim: there is a number with a certain property, but for every particular number, we can prove that it does not have this property. If we could infer from the infinite list of wffs ¬α(n), n = 0, 1, 2, ..., the wff ∀z ¬α(z), we would have gotten a contradiction. Such an inference rule, from an infinite list of premises, is sometimes called an ω-rule; hence 'ω-inconsistent' means inconsistent when we add the ω-rule.

[8] The models of such a theory must be non-standard, i.e., besides the standard numbers 0, 1, ..., they contain non-standard natural numbers, which cannot be distinguished "inside the model" from the standard ones. The wff α(z) is not satisfied by any standard number; but there is a non-standard number that satisfies it.

The above argument establishes Theorem (II), below, which came to be known as Gödel's first incompleteness theorem. It holds under the assumption that we have a coding of wffs and proofs for which (11) and (12) hold.

Theorem (II) There is a sentence γ such that (i) if T is consistent, γ is unprovable in T, (ii) if T is ω-consistent, ¬γ is unprovable in T.

While consistency is a purely syntactic property, ω-consistency has a semantic link: the fact that the theory is supposed to be interpreted in some universe consisting of the natural numbers. Still, there is no appeal here to the truth concept. Five years after the publication of Gödel's result, Rosser [1936] succeeded in eliminating the need for ω-consistency. Using some additional, rather weak, assumptions concerning T, he showed how to construct a sentence ρ such that, if T is consistent, neither ρ nor ¬ρ is provable in it.

One of the earliest logicians ─ perhaps the first ─ to read Gödel's manuscript carefully was Carnap. Analyzing the proof, he noticed that the argument that leads from (12) to (14) is of a general character and does not depend on the wff ¬∃z Proof(z, v). The same construction can be applied to any wff α(v), resulting in a sentence that "says of itself" that it has the property described by α(v):

Theorem (III) Assume that we have a wff δ(v, y) such that, for every ϕ(v),

T ⊢ ∀y [δ(⌜ϕ(v)⌝, y) ↔ y ≈ ⌜ϕ(⌜ϕ(v)⌝)⌝]

Then, given any wff α(v) in the language, we can construct a sentence σ such that

T ⊢ σ ↔ α(⌜σ⌝)

The proof is completely analogous to the argument from (12) to (14), with α(v) replacing ¬∃z Proof(z, v). Namely, let β(v) = ∀y [δ(v, y) → α(y)]. Then, for all ϕ(v), T ⊢ β(⌜ϕ(v)⌝) ↔ α(⌜ϕ(⌜ϕ(v)⌝)⌝). Choosing ϕ(v) = β(v) and letting σ =Df β(⌜β(v)⌝), we get: T ⊢ σ ↔ α(⌜σ⌝). qed

If we consider the function that maps every sentence χ to α(⌜χ⌝), then σ is mapped to a sentence equivalent to it in T. For this reason the theorem is often known as the fixed point theorem. It is sometimes also called the self-reflection lemma, or the diagonalization lemma.
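At the informal, string-manipulating level of the earlier sketch, the general construction is a two-line recipe. The Python sketch below (ours; it uses the same toy coding, assumes 'v' occurs only as the free variable, and assumes a function symbol diag in the language rather than the wff δ) produces, for a given α(v), the sentence σ obtained by diagonalizing β(v) = α(diag(v)):

def gn(formula):
    # Toy Gödel number, as before: the formula's text read as one integer.
    return int.from_bytes(formula.encode(), "big")

def fixed_point(alpha):
    # Given alpha(v), form beta(v) = alpha(diag(v)) and return its
    # diagonalization sigma = beta(numeral of GN(beta)).  Under the intended
    # reading, diag applied to that numeral denotes sigma itself, so sigma is
    # equivalent to alpha applied to a name of sigma.
    beta = alpha.replace("v", "diag(v)")
    return beta.replace("v", str(gn(beta)))

print(fixed_point("~Ez Proof(z, v)"))   # a Gödel-type sentence
print(fixed_point("~True(v)"))          # a formal Liar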
Carnap published this observation in [1934] and Gödel stated it in his 1934 paper, referring to Carnap in a footnote. Although the proof generalizes in an obvious way an argument first used by Gödel, anyone who goes through Gödel's lengthy manuscript will see that it is not at all trivial to extract it from the host of details. Gödel himself did not seem to notice it. Carnap's role is ignored today, but it is fitting that it be remembered. I shall refer to the result as the Gödel-Carnap fixed point theorem. The theorem has numerous applications. Nowadays the incompleteness result is usually proved by first stating and proving the Gödel-Carnap fixed point theorem, and then applying it for the case α(v) = ¬∃z Proof(z, v). This reverses the historical order. It also endows the fixed point construction with an aura of a magic trick.

Finally, note that the choice of an arithmetical language is not essential ─ a point noted by Gödel in his introduction. Any language will serve, as long as a sufficient portion of the syntax can be expressed in it, via a suitable representation of syntactic items by the objects that T is about. In particular, the fixed point technique can be applied in other setups. It is instructive to see how it works in natural languages.

Diagonalization and Fixed Points in Natural Languages

Take English as our language. It already includes names for linguistic expressions (strings of words): the name of an expression is formed by enclosing it in quotes. Hence, there is no need for encoding. Also, predication in English is effected by appending the predicate ─ including the copula 'is', if needed ─ to the noun phrase. Hence, the English parallel of the formal sentence ϕ(a), where a is an individual constant, is an expression of the form: 'a is a Φ', where 'Φ' stands for some descriptive phrase. The role of the wff ϕ(v), in which the free variable marks the place to be occupied by the noun phrase, is played by the expression 'is a Φ', which is to be appended to the noun phrase. Hence, the diagonalization ϕ(⌜ϕ⌝) becomes: ' 'is a Φ' is a Φ'.[9] Diagonalization in English is therefore effected by appending the expression to its quote. Call this applying an expression to itself.

[9] The quotes around expressions that include schematic letters are interpreted here as Quine's corners; it is the scheme whose instances are obtained by replacing the schematic letter by an instance of it and quoting the result.

Such applications can yield gibberish, or grammatically correct sentences. The latter can be nonsensical, or false, and they can also be true. Consider, for example, the results of applying the expression to itself, for each of the following: (i) 'boy', (ii) 'runs', (iii) 'is white', (iv) 'contains two words', (v) 'contains three words'. They are, respectively: (i*) ' 'boy' boy', (ii*) ' 'runs' runs', (iii*) ' 'is white' is white', (iv*) ' 'contains two words' contains two words', (v*) ' 'contains three words' contains three words'. The first is gibberish; the second is grammatically correct but either nonsensical or false; the same is true of the third, though we might be more inclined to classify it as false; the fourth is a false sentence; the fifth is true.

Now consider the following expression:

yields when applied to itself a Φ

It describes a property of English expressions: the property of yielding a Φ when the expression is applied to itself. Diagonalizing this very same expression we get the following sentence; call it Σ:

'yields when applied to itself a Φ' yields when applied to itself a Φ.
Σ says that when we apply to itself 'yields when applied to itself a Φ', we get something that is a Φ. But the application to itself of 'yields when applied to itself a Φ' gives us Σ! Hence, we conclude: Σ iff 'Σ' is a Φ. This is the fixed point construction for English.

Quine [1953] used such a construction for the case where 'Φ' stands for 'non-true sentence'. He got in this way a version of the Liar that does not involve indexicals, or names of locations. Quine argued that the option of regarding the classical Liar sentence as meaningless is not available for this version of the paradox. From this he concluded that there is a need to restrict the application of the truth predicate according to Tarski's hierarchy. But Quine's argument is wrong. The meaninglessness (if we agree that the sentence is meaningless) is due to vicious circularity (as is shown clearly by the example in the Appendix), not to the mechanism by which this vicious circularity is effected. There is no difference in this respect between constructions based on indexicals, or location names, and constructions that employ diagonalization. To say that the sentence is meaningless boils down to using a logic with truth-value gaps, and this is possible in either case.

The "diagonalization Liar", in natural languages, is closely related to the Grelling-Nelson paradox [1908]. That paradox consists in defining an adjective to be heterological if it is not true of itself, and asking whether 'heterological' is heterological. For a more detailed analysis of that paradox see [Gaifman 1983, pp. 135-136], where it is shown how the paradox can lead to the diagonalization technique in natural language. One can then simulate the same construction in a formal language and, with 'true' replaced by 'provable', the idea of Gödel's proof emerges. Historically, this did not happen. The Grelling-Nelson paradox remained entrenched in the linguistic domain, pertaining only to natural languages.[10] It had no effect in formal setups. When the diagonalization technique was applied to natural language by Quine [1953], it was through simulating a well-understood technique of mathematical logic.

[10] The article by Grelling and Nelson does not even appear in the comprehensive bibliography given in van Heijenoort's From Frege to Gödel, which covers the period 1879-1931 in the development of formal logic.

The natural language analog of the Gödel sentence can be constructed by instantiating Φ to 'non-provable sentence'. The result is a sentence that says of itself that it is not provable. There is, however, an essential difference between provability in the context of formal systems and provability in natural language. The latter is open-ended and not rigorously defined. As in the formal case, we might conclude that the sentence that asserts its own non-provability is indeed not provable, because a proof of it would lead to a contradiction. But this argument is itself a proof, in natural language, of the sentence. Thus we get a paradox of a semantic character. The concept of provability (in natural language) is different from the truth concept, but it is defined by appeal to it. A proof is supposed to show us that something is true; if the result is not true, this might indicate that there is something wrong with the proof. In particular, if we have a proof of a contradiction, then this very fact might disqualify it from being a "real proof".

Note that, by its very construction, the self-referring Σ is a grammatical sentence. There might be some problems concerning its truth-value, but in many cases the truth-value is unproblematic. The truth-value can, however, be very sensitive to the way Φ is phrased. Equivalent Φ's may lead to self-referring sentences that have different truth-values. Consider for example the following:

1. 'yields when applied to itself a sentence containing twenty words' yields when applied to itself a sentence containing twenty words.
2. 'yields when applied to itself a sentence that contains twenty words' yields when applied to itself a sentence that contains twenty words.
3. 'yields when applied to itself a puzzling sentence' yields when applied to itself a puzzling sentence.
4. 'yields when applied to itself a boring sentence' yields when applied to itself a boring sentence.

1 is true (check by counting the words). 2 is false. Any unclarity concerning the truth-values of 3 and 4 is due to the vagueness or ambiguity of 'puzzling' and 'boring', not to the self-referential nature of the sentences.
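The word counts in 1 and 2 can be checked mechanically. A small Python sketch (ours) of applying an expression to itself, used here to verify example 1 and its variant 2:

def apply_to_itself(expr):
    # Append an English expression to its own quotation.
    return "'" + expr + "' " + expr

sigma1 = apply_to_itself("yields when applied to itself a sentence containing twenty words")
sigma2 = apply_to_itself("yields when applied to itself a sentence that contains twenty words")

print(len(sigma1.split()), len(sigma2.split()))   # 20 and 22: 1 is true, 2 is false
print(apply_to_itself("contains three words"))    # the true sentence (v*) above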
Appendix

Let ϕ0(v), ..., ϕn(v), ... be an enumeration of the conditions defining the corresponding sets X0, ..., Xn, .... Consider the set X*, defined, as in Richard's paradox, by:

n ∈ X* ⇔Df ϕn(n) is not true.

Let ψ(n) be the sentence that expresses the right-hand side. Now the truth-values of sentences are determined by following a certain procedure; let us call the procedure Evaluate and let us write it as a list of instructions, in the programming style. This does not mean that truth-values are determined in an effective way; the procedure might call for checking infinitely many cases. But we can represent it in a form that makes clear certain relevant aspects. The procedure for the sentences ψ(n), n = 0, 1, ... can be written as follows:

Begin{Evaluate}
On input ψ(n):
  Construct the sentence ϕn(n)
  Apply Evaluate, on input ϕn(n)
  If Evaluate, on input ϕn(n), returns T, output F
  If Evaluate, on input ϕn(n), returns F, output T
End{Evaluate}

If ψ(v) itself appears in our sequence as ϕk(v), the execution of Evaluate on input ψ(k) results in constructing ϕk(k), which is followed by applying Evaluate to the input ϕk(k); since ϕk(k) = ψ(k), the procedure calls itself on the same input. Thus, we are in a non-terminating loop. The truth-value of ψ(k) is therefore undefined. Hence, if ψ(v) appears in our list, it does not qualify as a definition of a set. Note that we get a loop also if, in the above definition of X*, we replace 'not true' by 'true'. In that case we do not get a contradiction, but such a wff, if it appears in the list, does not qualify as a definition.

On Tarski's analysis, 'ϕn(n) is not true' should be construed as a definition in a higher-level language (an embryonic form of the idea can be found at the very end of Richard's paper). This, however, is not the only possible handling of the paradox. Other treatments incorporate circular definitions by using partial functions (i.e., functions that are not always defined). The use of partial functions is standard in recursion theory. What is not so well recognized is that the use of logical systems with truth-value gaps amounts to applying this very same strategy in semantic contexts.
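A runnable Python rendering of the Evaluate procedure above (ours; the "definitions" are modelled crudely as Python predicates) makes the circularity tangible: if the diagonal condition itself occurs in the list at position K, evaluating it at K recurses on the very same input and never terminates; here the loop is cut off by Python's recursion limit.

def evaluate(condition, n):
    # The Evaluate procedure: apply the condition to n.
    return condition(n)

definitions = []                                  # the enumeration phi_0, phi_1, ...
definitions.append(lambda n: n % 2 == 0)          # phi_0: "n is even"
definitions.append(lambda n: n > 100)             # phi_1: "n is greater than 100"
K = 2
definitions.append(lambda n: not evaluate(definitions[n], n))   # phi_K = psi

print(evaluate(definitions[K], 0), evaluate(definitions[K], 1))  # psi(0), psi(1): well defined
try:
    evaluate(definitions[K], K)                   # psi(K) calls itself on the same input ...
except RecursionError:
    print("psi(K) loops: its truth-value is undefined")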
References

Cantor, G. 1891 "Über eine elementare Frage der Mannigfaltigkeitslehre", Jahresbericht der Deutschen Mathematiker-Vereinigung, vol. I, pp. 75-78. Also in Cantor's collected papers, Gesammelte Abhandlungen, Mathematischen und Philosophischen Inhalts, 1932, edited by Zermelo, Berlin: Verlag von Julius Springer, p. 278.

Carnap, R. 1934 Logische Syntax der Sprache, Vienna: Springer. Translated into English as The Logical Syntax of Language, 1937.

Dawson, J. 1997 Logical Dilemmas: The Life and Work of Kurt Gödel, A. K. Peters.

Gaifman, H. 1983 "Paradoxes of infinity and self-applications, I", Erkenntnis, vol. 20, pp. 131-155.

Gaifman, H. 2005 "Naming and Diagonalization, from Cantor to Gödel to Kleene", Logic Journal of the IGPL, October 2006, pp. 709-728. Also on Gaifman's website.

Gödel, K. 1931 "Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I", Monatshefte für Mathematik und Physik, 38, pp. 173-198. English translation in Gödel's Collected Works, vol. I.

Grelling, K. and L. Nelson 1908 "Bemerkungen zu den Paradoxien von Russell und Burali-Forti", Abhandlungen der Fries'schen Schule 2, 1907-08, pp. 300-334.

Peano, G. 1906 "Additione", Revista de Mathematica, 8, pp. 143-157.

Poincaré, H. 1906 "Les mathématiques et la logique", Revue de métaphysique et de morale, 14, pp. 17-34, 294-317.

Quine, W. V. 1953 "Theory of reference", in From a Logical Point of View, pp. 130-138, Harvard University Press (a combination and reworking of two earlier articles, from 1943 in the Journal of Philosophy and from 1947 in the Journal of Symbolic Logic).

Ramsey, F. P. 1925 "The foundations of mathematics", Proceedings of the London Mathematical Society, 25, pp. 338-384. Also in the collection Ramsey, Philosophical Papers, ed. D. Mellor.

Richard, J. 1905 "Les Principes des Mathématiques et les problèmes des ensembles", Revue générale des sciences pures et appliquées, 16, p. 541. Also in English translation in van Heijenoort's anthology From Frege to Gödel, Harvard University Press, 1967.

Rosser, J. 1936 "Extensions of some theorems of Gödel and Church", Journal of Symbolic Logic, 1, pp. 87-91; reprinted in the anthology The Undecidable, edited by M. Davis.

Russell, B. 1906 "Les Paradoxes de la logique", Revue de Métaphysique et de Morale, 14. English version as "On 'Insolubilia' and their Solution by Symbolic Logic", Chapter 9 in Russell's anthology Essays in Analysis, ed. D. Lackey.

Russell, B. 1908 "Mathematical Logic as Based on the Theory of Types", American Journal of Mathematics, 30. Reprinted in the anthology Logic and Knowledge, ed. R. Marsh.