Gödel's Incompleteness Results
Haim Gaifman

Chapter 0: The Easy Way to Gödel's Proof and Related Topics[*]

[*] A revised 2007 version of a piece written in 2003.

This short sketch of Gödel's incompleteness proof shows how it arises naturally from Cantor's diagonalization method [1891]. It renders Gödel's proof and its relation to the semantic paradoxes transparent. Some historical details, which are often ignored, are pointed out. We also make some observations on circularity and draw brief comparisons with natural language. The sketch does not include the messy details of the arithmetization of the language, but the motives for it are made obvious. We suggest this as a more efficient way to teach the topic than what is found in the standard textbooks. For the sake of self-containment, Cantor's original diagonalization is included. A broader and more technical perspective on diagonalization is given in [Gaifman 2005].

In [1891] Cantor presented a new type of argument that shows that the set of all binary sequences (sequences of the form a0, a1, ..., an, ..., where each ai is either 0 or 1) is not denumerable ─ that is, cannot be arranged in a sequence where the index ranges over the natural numbers. Let A0, A1, ..., An, ... be a sequence of binary sequences. Say An = an,0, an,1, ..., an,i, ... . Define a new sequence A* = b0, b1, ..., bn, ..., by putting:

bn = 1, if an,n = 0;  bn = 0, if an,n = 1.

Then, for each n, A* ≠ An, since the nth member of A* differs from the nth member of An. Hence, A* does not appear among the Ai's. A diagram of the following form, which appears already in Cantor's original paper, illustrates the idea. The new sequence A* is obtained from the diagonal, by changing each of its values. The method came to be known as diagonalization.

A0 = a0,0  a0,1  . . .  a0,n  . . .
A1 = a1,0  a1,1  . . .  a1,n  . . .
 .
 .
An = an,0  an,1  . . .  an,n  . . .
 .
 .

There is an elementary, well known correspondence between sets of natural numbers and binary sequences: A set of natural numbers, X, determines the sequence a0, a1, ..., an, ..., defined by: ai = 1, if i ∈ X; ai = 0, if i ∉ X. (As usual, '∈' is used for the membership relation.) Vice versa, every binary sequence a0, a1, ..., an, ... determines the set of all i's for which ai = 1. Obviously, the correspondence is one-to-one. Cantor's argument therefore shows that the set of all subsets of natural numbers cannot be enumerated. In fact, diagonalization can be (and historically was) applied directly to sets. Given a sequence of sets of natural numbers, X0, X1, ..., Xn, ..., let

X* = the set of all natural numbers, n, such that n ∉ Xn.

Then, for each n, X* ≠ Xn, because n ∈ X* ⇔ n ∉ Xn.

It is convenient for our purposes to rewrite 'y ∈ X' as 'X(y)'. Then 'y ∉ X' is written as 'not-X(y)'. Thus we use 'X( )' as a predicate. With this notation, let us go through the construction and the argument in some detail. Given an enumeration of sets, X0, X1, ..., Xn, ..., define X* by the condition:

X*(n) ⇔ not-Xn(n)

Suppose, for contradiction, that, for some k, X* = Xk; then for every n we have:

Xk(n) ⇔ not-Xn(n)

For n = k we get:

Xk(k) ⇔ not-Xk(k)

Contradiction.
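The positive content of the argument is easy to render as a short program. The following Python sketch (ours, not part of the paper; the names membership and diagonal_set are illustrative) takes an enumeration of sets, given as a membership predicate, and produces the diagonal set X*, checking that it differs from each Xn at the number n itself.

def diagonal_set(membership):
    # membership(n, m) holds iff m is in X_n; return the predicate for X*:
    # X*(n) iff not X_n(n).
    return lambda n: not membership(n, n)

# A toy enumeration: X_n = the set of multiples of n+1.
def membership(n, m):
    return m % (n + 1) == 0

x_star = diagonal_set(membership)
for n in range(10):
    assert x_star(n) != membership(n, n)   # X* differs from X_n at n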
The diagonalization that is used in defining X* consists in applying to each number, n, the predicate 'Xn( )' associated with it. In general, diagonalization can be used when we are given some domain of objects and a correlation that associates with some of them functions that are defined over this very same domain. Think of the object as a name that represents the correlated function.[1] Diagonalization then consists in applying to an object the function it represents. The concept extends to the case where instead of functions we have subsets. This is done by correlating with every subset, X, the function that maps each x to a truth-value: True ─ if x ∈ X, False ─ if x ∉ X. If Xa is the set associated with a, diagonalization means that we consider Xa(a). This, strictly speaking, denotes a truth-value, but we may also use 'Xa(a)' to make the assertion that this value is True, i.e., that a ∈ Xa.[2]

[1] The correlation need not be one-to-one; the same function can be represented by different objects. Also, the functions can be defined over a more inclusive domain.

[2] Thus we employ systematic ambiguity. The context decides how the notation is read. We should keep in mind the important philosophical distinction, pointed out by Wittgenstein, between denoting and asserting; but there should be no confusion concerning our notation.

Note that the above argument involves two uses of diagonalization: X*( ) is defined by considering those n's for which Xn(n), and this is the first use; to this we apply negation and then, assuming that X* is the same as Xk, we consider Xk(k) ─ and this is the second use. We have thus two diagonalizations, with negation sandwiched between.

Cantor's original claim is negative: there is no enumeration of all sets of natural numbers. The proof, however, can be seen as establishing a positive result: a prescription for defining ─ on the basis of any enumeration of sets of natural numbers ─ a set that differs from all enumerated sets.[3] It is in this form that the method found numerous applications. It is, among other things, the most basic tool of complexity theory in theoretical computer science.

[3] Some mathematicians find the notion of the set of all sets of natural numbers problematic. This notion figures in the negative claim, but there is no problem with the positive claim.

A crucial historical step was taken by Richard [1905], who, in a one-page paper, applied diagonalization in a linguistic context, in the following way. Consider an enumeration of all definitions of sets of natural numbers, in some natural language, say English (in Richard's case it would be French). The enumeration is based on the fact that all strings of words, punctuation marks and any other symbols used in the language can be enumerated in one sequence. Deleting from this sequence all the strings that are not definitions of sets of natural numbers, we get an enumeration of all definitions. Let Xn = the set defined by the nth definition. Applying Cantorian diagonalization, we get a set X* not in this sequence. Yet X* has been defined in the same natural language! So it should appear in the sequence after all. This, essentially, is Richard's paradox.[4]

[4] In Richard's paper one enumerates real numbers, by enumerating all definitions of decimal sequences. Let ai(j) be the jth decimal in the sequence defined by the ith definition. Define rn by: rn = an(n)+1, if an(n) < 9; rn = 1, if an(n) = 9. The real number whose decimal expansion is r0, ..., rn, ... is different from all the real numbers defined in our language; yet we have just defined it. The difference between this and our variant is not essential.
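The first step of Richard's construction, the enumeration of all strings in one sequence, is itself unproblematic and easy to program. The Python sketch below (ours) enumerates every finite string over a given alphabet, by length and then lexicographically; the step that resists formalization is the next one, deciding which strings count as definitions of sets of natural numbers.

from itertools import count, product

def all_strings(alphabet):
    # Enumerate every finite string over the alphabet in one sequence:
    # by length, then lexicographically within each length.
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

gen = all_strings("ab")
print([next(gen) for _ in range(8)])   # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa']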
In the same paper, Richard proposes a solution to his paradox. He argues that the purported definition of X* is illegitimate on grounds of circularity: it uses the set of all definitions, to which this definition itself belongs. The string of words that constitutes that "definition" should be deleted when we weed out the non-definitions from our initial sequence of strings. Only after constructing a sequence of legitimate definitions can we define the new set; but that definition is not in the old sequence. Richard's very brief explanation is a little obscure, but it contains the kernel of a valid idea. It also anticipates the much later idea of a linguistic hierarchy.

Poincaré [1906] called attention to Richard's paradox. He also adopted what he saw as the solution, giving Richard credit for it. Poincaré argued that the "set of all definitions" is not well-defined, since it contains "definitions" that appeal to this very same set. The vicious circle principle, which rules out such constructions, was adopted by Russell [1906], [1908]. His theory of types, which is the basis of Principia Mathematica, was designed with the aim of ensuring, by a bottom-up construction, that there would be no circularity. There is, in fact, a simple direct argument that identifies, without going into the intricacies of "the set of all definitions", the circularity and the failure in the purported definition of X*. The same kind of failure underlies other semantical paradoxes. In order to avoid interrupting the historical account, the argument will be given in the Appendix.

Richard's analysis (and, by implication, that of Poincaré, as well as Russell's) was rejected by Peano [1906], who maintained that the paradox is linguistic, for it employs non-mathematical concepts and it would not arise in a formal setting. Peano's view was later endorsed by Ramsey [1925, pp. 183-184 in Ramsey, Philosophical Papers]. Ramsey proposed to distinguish between set-theoretical paradoxes and another group of paradoxes, which, besides mathematical notions, also appeal to language and thought. He classified them as belonging to epistemology (an unfortunate choice of term). Later they became known as semantic.

The semantic notion underlying Richard's paradox is the concept of definition. This concept is closely related to the concept of truth; there is, in fact, a direct route from Richard's paradox to the Liar. Here it is: Assume, with no loss of generality, that our definitions have the form: 'the set of all v such that ...v...', where '...v...' is an expression stating a condition on v. Let the enumeration of these expressions be: ϕ0(v), ϕ1(v), ..., ϕn(v), .... In a formal language these would be wffs with one free variable. The condition that m ∈ Xn is now to be stated as: ϕn(m) is true, where the m appearing inside the wff stands for a term of that language that denotes the number m; we shall henceforth refer to this term as the numeral of m. If we ourselves are using that language, then instead of stating: ϕn(m) is true, we can simply state: ϕn(m). But the statements would be different for different wffs. We therefore cannot use in our definition all the statements that arise from different ϕn's, as 'n' ranges over the natural numbers. We are forced to use the truth predicate: x is true. (In general, we should apply 'is true' to the name of the sentence in question, rather than the sentence itself. But to simplify the notation let us, for the moment, use 'ϕn(m)' as a name of itself ─ what is known in logic as autonymous use. This is possible, since we are using English in order to speak about another language and there is no danger of confusion. Later, when everything is done in the same formal language, we shall introduce names for the wffs.)
X* is therefore defined by:

(1) X*(n) ⇔Df ϕn(n) is not true

If X* were definable by a wff ϕk(v), then this wff would say the following:

(2) The sentence, obtained by substituting, in ϕv( ), the free variable by the numeral of v, is not true.

"Saying" this means that for all n, the following holds:

(3) ϕk(n) ⇔ (ϕn(n) is not true).

For n = k, the biconditional becomes:

(4) ϕk(k) ⇔ (ϕk(k) is not true)

Thus we get a sentence that "says of itself" that it is not true. This is a version of the Liar paradox. The classical Liar, which goes back to Eubulides in the fourth century B.C., is based on utterances of the form "What I am saying now is not true", or "The sentence written in ... is not true", where '...' refers to a location in which this very same sentence is written. These versions are based on the use of the truth predicate within the same language; but they also require the use of 'now', or 'here', or names of locations in which there are sentences. The above shows that the latter requirement is not needed. In a sufficiently expressive language ─ in which we can describe the language's syntax and enumerate its wffs ─ we can achieve self-reference by diagonalization.

It is known that initially Gödel tried to prove the consistency of real number theory, by representing real numbers in terms of arithmetically definable sets in some countable language.[5] In all probability, he reflected on Richard's paradox and found the version given above. He knew that the trouble with formulating (1) in the same formal language is the use of semantic concepts. It is likely that the reasoning that leads from (1) to (4) led him to the idea of his proof: if we replace 'true' by 'provable', then there will be no obstacle to expressing the right-hand side of (1) in the same language, provided that the language is sufficiently expressive. This will establish the essential gap between truth and provability. Evidence can be found in the introduction to his 1931 paper (pp. 147-149 in the Collected Works), where he explains the idea of the proof.

[5] Cf. Dawson [1997] p. 61.

Here is a sketch. The presupposed formal system is derived from Principia Mathematica and referred to rather loosely as 'PM'. Later, when the system is precisely specified, he uses 'P' instead.[6]

[6] Gödel's referring to Principia Mathematica as a "formal system" is far from accurate. That landmark is not a well-defined formal system; formalizing it in a way that remains faithful to the authors' ideas ─ in particular, to Russell's theory of propositions ─ is still an open task that requires creative work. Gödel's system P corresponds to a certain fragment of Principia Mathematica, modified and represented as a Hilbert-type system. It amounts to third-order arithmetic. In Gödel [1934] the reference to Principia Mathematica is omitted. The system is still third-order arithmetic. Later it became clear that first-order arithmetic (what came to be known as Peano Arithmetic) is sufficient. In the early fifties this was reduced to a very weak fragment of it known as Robinson's Q.

He defines a class sign as any formula of PM with one free variable that ranges over the natural numbers. The class signs are enumerated in a sequence (he suggests in a footnote a possible enumeration); let R(n) be the nth class sign. For any class sign α, let [α;n] be the formula obtained by substituting the free variable by a sign denoting the natural number n. Gödel notes that "the ternary relation x = [y;z] is seen to be definable in PM." He then goes on to define a class K of natural numbers by the condition:

n ∈ K ≡ ¬Bew[R(n); n]

'Bew' (from the German beweisbar) means provable; in Gödel's text the negation is written as a bar over the formula, rendered here by '¬'. In our notation the definition is:

n ∈ K ⇔ ¬provable [R(n); n].

He then observes that all the notions that occur in the right-hand side of the definition can be defined in PM, therefore K can be defined in PM, which means that "there is a class sign S such that the formula [S;n], interpreted according to the meaning of the terms of PM, states that the natural number n belongs to K." Hence, for some natural number q, the class sign S is identical with R(q). Gödel then considers [R(q);q] and goes on to show that this sentence (in his terminology, proposition) is undecidable in PM.
The argument is the same as that which proves our Theorem (I) below. He goes on to remark, "The analogy of the argument with the Richard antinomy leaps to the eye. It is closely related to the Liar too." In a footnote he notes that "any epistemological antinomy could be used in a similar proof of the existence of undecidable propositions." (Evidently, he adopted Ramsey's term for what we now call semantic paradoxes.)

Assuming, as Gödel did, that the language is about natural numbers, one will have to represent wffs and proofs by numbers. Hence the arithmetization of the syntax, a project that Gödel carried out in full detail. The number representing a syntactic item (a wff, or a proof) is nowadays called the Gödel number of the item. Thus the idea of the proof would lead one naturally to the project of arithmetization, though he might have had the idea of arithmetization before that.
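Nothing in what follows depends on the details of the coding, so here is just one toy possibility, sketched in Python (ours; Gödel's own coding uses prime powers): fix a finite list of symbols and read an expression as a number written in base (number of symbols + 1). Distinct expressions receive distinct numbers, and the expression is recoverable from its number.

SYMBOLS = ['0', 'S', '+', '*', '=', '(', ')', 'v', '~', '&', 'E']   # an illustrative alphabet
BASE = len(SYMBOLS) + 1

def godel_number(expr):
    # Read the expression as a number in base BASE, with digits 1..len(SYMBOLS).
    n = 0
    for ch in expr:
        n = n * BASE + SYMBOLS.index(ch) + 1
    return n

def expression_of(n):
    # Recover the expression from its Gödel number.
    chars = []
    while n > 0:
        n, d = divmod(n, BASE)
        chars.append(SYMBOLS[d - 1])
    return "".join(reversed(chars))

assert expression_of(godel_number("Ev(v=S0)")) == "Ev(v=S0)"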
Assume, to start with, a formal deductive system, based on first-order logic, the language of which is interpreted in a model, M, whose universe consists of the natural numbers, 0, 1, 2, ..., n, .... The language should contain function symbols and predicates with the help of which the syntax can be expressed, via some encoding. Truth means truth in this model. We also assume a formal theory T whose theorems are true in the given interpretation. T is given by a set of non-logical axioms, the inference rules being those of first-order logic. A proof in T is a proof from its axioms. From now on 'proof' means proof in T, and 'provable' means provable in T, unless the context indicates otherwise.

Our goal is to construct a wff that expresses the analog of (2), where 'true' is replaced by 'provable':

(2#) The sentence obtained by substituting, in ϕv( ), the free variable by the numeral of v, is not provable.

Assume that we have an encoding of syntactic items into numbers, which (among other things) associates with every wff α a Gödel number; let this number be GN(α). Let us now find what is required for the expression of (2#). The name of a wff α in our language is the name of this number, i.e., the numeral of GN(α). Let this be ⌜α⌝.[7]

[7] Those acquainted with Quine's corners should note that our notation is not Quine's corner notation. It is a special notation often adopted in this context.

We assume that we have a wff with two free variables, Proof(z, v), that "says" that z is a proof of v. The scare quotes are a reminder that the wff says something about natural numbers; it is only through the encoding that we can view it as expressing statements about proofs. To qualify for its role, Proof(z, v) should define the proof relation (where proofs and wffs are identified with numbers); that is: Proof(n, m) is true iff n is a code of a proof and m is the Gödel number of the wff proved by it. But for our purposes it is only required that this hold for the cases in which m is a Gödel number of some wff; we do not care what happens if m is not a Gödel number of a wff. We therefore assume that, for every wff α and every number n, we have:

(5) Proof(n, ⌜α⌝) is true ⇔ n is (a code of) a proof of α.

It follows that, for every wff α:

(6) ∃z Proof(z, ⌜α⌝) is true ⇔ α is provable.

The argument is easy: If ∃z Proof(z, ⌜α⌝) is true, then some number n satisfies the wff Proof(z, ⌜α⌝), hence Proof(n, ⌜α⌝) is true; by (5) this implies that n is (a code of) a proof of α, hence α is provable. Vice versa, if α is provable, it has a proof. If n codes it, then, by (5), Proof(n, ⌜α⌝) is true. ∃z Proof(z, ⌜α⌝) is a logical consequence of Proof(n, ⌜α⌝), hence it is true. (6) is obviously equivalent to:

(6') ¬∃z Proof(z, ⌜α⌝) is true ⇔ α is not provable.

We shall use ¬∃z Proof(z, u) as the wff that "says" that u is not provable. In order to construct the wff that expresses (2#), we have to describe in our language the enumeration of wffs and we have to define the operation of substituting, in ϕn(v), the free variable by the numeral of n. Now, we already have a representation of wffs by their Gödel numbers. Let ϕn(v) = the wff whose Gödel number is n. (This is defined whenever n is a Gödel number of a wff with one free variable. We need not worry about n's that are not Gödel numbers of such wffs; they will drop out of the construction.) Since n = GN(ϕn(v)), we have: ϕn(n) = ϕn(⌜ϕn(v)⌝). We can therefore drop the index and consider, for any wff ϕ(v), the substitution of the free variable of ϕ(v) by its name. The result of this operation is ϕ(⌜ϕ(v)⌝). Call it the diagonalization of ϕ(v). Our (2#) now becomes:

(2*) The diagonalization of v is not provable.

Note: For notational purposes it is convenient to let 'v' be the free variable of the wff. One can with equal ease describe in the language a more general function, where ϕ may contain several free variables and we substitute a fixed, agreed-upon variable by ⌜ϕ⌝. Or we can substitute the first free variable, in some fixed pre-given ordering of all the variables of the language. Such functions are used in extending various results.
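Before expressing it inside the formal language, it may help to see the diagonalization operation performed from the outside. The following Python sketch (ours; it assumes the free variable 'v' occurs nowhere else in the formula, lets decimal numerals stand in for the numerals of the formal language, and codes a formula by reading its text as one long integer, a variant of the coding sketched earlier) computes ϕ(⌜ϕ(v)⌝) and applies it to the wff of (2*), written here informally with a function symbol diag:

import re

def gn(formula):
    # Toy Gödel number: read the formula's text, character by character, as one integer.
    return int.from_bytes(formula.encode(), "big")

def formula_of(n):
    # Recover the formula from its toy Gödel number.
    return n.to_bytes((n.bit_length() + 7) // 8, "big").decode()

def diag(formula):
    # The diagonalization of formula(v): substitute the free variable 'v'
    # by the (decimal) numeral of the formula's own Gödel number.
    return formula.replace("v", str(gn(formula)))

phi = "~Ez Proof(z, diag(v))"      # informally: "the diagonalization of v is not provable"
gamma = diag(phi)

# The numeral occurring in gamma names phi, and the diagonalization of the
# formula it names is gamma itself: gamma speaks about gamma.
named = formula_of(int(re.search(r"[0-9]+", gamma).group()))
assert named == phi and diag(named) == gamma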
To express (2*) we describe in the formal language a diagonalization function, that is: a function that maps the Gödel number of ϕ(v) to the Gödel number of its diagonalization. (It does not matter how the function is defined on numbers that are not Gödel numbers.) This is easily done if we can use a term with one free variable, diag(x), such that, for every wff ϕ(v),

diag(⌜ϕ(v)⌝) ≈ ⌜ϕ(⌜ϕ(v)⌝)⌝ is true,

where ≈ is the equality symbol of the language. Then the wff that expresses (2*) is: ¬∃y Proof(y, diag(v)). The existence of a term diag(x) is not necessary, however. It is sufficient if we have a wff δ(x, y) that defines: y is the diagonalization of x, in the following sense: for each ϕ(v),

(7) ∀y [δ(⌜ϕ(v)⌝, y) ↔ y ≈ ⌜ϕ(⌜ϕ(v)⌝)⌝] is true.

Then the wff ψ(v), defined as follows, expresses (2*):

(8) ψ(v) =Df ∀y [δ(v, y) → ¬∃z Proof(z, y)].

What qualifies ψ(v) as expressing, for our purposes, (2*) is the following property:

(9) ψ(⌜ϕ(v)⌝) ↔ ¬∃z Proof(z, ⌜ϕ(⌜ϕ(v)⌝)⌝) is true, for each ϕ(v).

Proof of (9): ψ(⌜ϕ(v)⌝) is, by definition, ∀y [δ(⌜ϕ(v)⌝, y) → ¬∃z Proof(z, y)]. Now ∀y [δ(⌜ϕ(v)⌝, y) ↔ y ≈ ⌜ϕ(⌜ϕ(v)⌝)⌝] says that there is exactly one y such that δ(⌜ϕ(v)⌝, y) and that this y is equal to ⌜ϕ(⌜ϕ(v)⌝)⌝. This implies in pure logic:

∀y [δ(⌜ϕ(v)⌝, y) → ¬∃z Proof(z, y)] ↔ ¬∃z Proof(z, ⌜ϕ(⌜ϕ(v)⌝)⌝).

Applying (9) for the case ϕ(v) = ψ(v), we deduce that the following biconditional is true:

ψ(⌜ψ(v)⌝) ↔ ¬∃z Proof(z, ⌜ψ(⌜ψ(v)⌝)⌝).

Letting γ = ψ(⌜ψ(v)⌝), we have:

(10) γ ↔ ¬∃z Proof(z, ⌜γ⌝) is true.

γ is therefore a sentence that "says of itself" that it is not provable. It is sometimes referred to as the Gödel sentence. Note that γ's construction is completely analogous to the construction of ϕk(k) that figures in (4). While ϕk(k) says of itself that it is not true, γ says of itself that it is not provable.

Theorem (I) Neither γ nor ¬γ is provable.

Proof: If γ is provable, then it is true. Therefore ¬∃z Proof(z, ⌜γ⌝) is true. But this implies, via (6'), that γ is not provable. Contradiction. For the second claim note that (10) is equivalent to the truth of ¬γ ↔ ∃z Proof(z, ⌜γ⌝). If ¬γ is provable, then it is true, hence ∃z Proof(z, ⌜γ⌝) is true. Via (6), this implies that γ is provable, hence true. It follows that both ¬γ and γ are true, which is impossible. qed

Theorem (I) is a version of the first incompleteness theorem, whose proof is indicated in the introduction of [Gödel 1931]. The proof presupposes an intended model for the language and uses the concept of being true in that model. Gödel noted that a similar result is derivable from syntactic assumptions concerning T, without appealing to the intended interpretation. That result is known as the first incompleteness theorem. Strictly speaking, the two versions are incomparable, because the syntactic version requires assumptions about T that are not required in Theorem (I).

We use 'T ⊢ α', as well as '⊢T α', to say that α is provable in T. We now replace (5) by:

(11) For every wff α and every number n:
(i) n codes a proof in T of α ⇒ T ⊢ Proof(n, ⌜α⌝)
(ii) n does not code a proof in T of α ⇒ T ⊢ ¬Proof(n, ⌜α⌝)

And we replace (7) by:

(12) For every wff ϕ(v), T ⊢ ∀y [δ(⌜ϕ(v)⌝, y) ↔ y ≈ ⌜ϕ(⌜ϕ(v)⌝)⌝]

Letting ψ and γ be as before, we get the following analogues of (9) and (10):

(13) For all ϕ(v), T ⊢ ψ(⌜ϕ(v)⌝) ↔ ¬∃z Proof(z, ⌜ϕ(⌜ϕ(v)⌝)⌝)

(14) T ⊢ γ ↔ ¬∃z Proof(z, ⌜γ⌝)

(13) is established by formalizing, inside T, the argument that proves (9). The biconditional ψ(⌜ϕ(v)⌝) ↔ ¬∃z Proof(z, ⌜ϕ(⌜ϕ(v)⌝)⌝) is a logical consequence of ∀y [δ(⌜ϕ(v)⌝, y) ↔ y ≈ ⌜ϕ(⌜ϕ(v)⌝)⌝], hence we can derive it from the latter in first-order logic. If ∀y [δ(⌜ϕ(v)⌝, y) ↔ y ≈ ⌜ϕ(⌜ϕ(v)⌝)⌝] is provable in T, so is ψ(⌜ϕ(v)⌝) ↔ ¬∃z Proof(z, ⌜ϕ(⌜ϕ(v)⌝)⌝). (14) follows from (13) by letting ϕ(v) = ψ(v). Note that (13), hence (14), follows from (12) alone. We need (11) in deriving the consequences of (14).

The argument now runs as follows. Suppose that T ⊢ γ. Then there is a proof of γ. Let n be its code. From (i) in (11), for the case α = γ, we have: T ⊢ Proof(n, ⌜γ⌝). Since Proof(n, ⌜γ⌝) → ∃z Proof(z, ⌜γ⌝) is a theorem of first-order logic, T ⊢ ∃z Proof(z, ⌜γ⌝). On the other hand, we can combine the proof of γ with the proof of γ ↔ ¬∃z Proof(z, ⌜γ⌝) and get a proof of ¬∃z Proof(z, ⌜γ⌝). Hence T ⊢ ¬∃z Proof(z, ⌜γ⌝). Thus T is inconsistent. This shows that if T is consistent, γ is not provable in it. Next, suppose that T ⊢ ¬γ. From γ ↔ ¬∃z Proof(z, ⌜γ⌝) we can derive in sentential logic: ¬γ ↔ ∃z Proof(z, ⌜γ⌝); hence we can prove in T: ∃z Proof(z, ⌜γ⌝).
But if T is consistent, there is no proof in it of γ; otherwise both ¬γ and γ would be provable in T. By (ii) of (11) it follows that for every n, T ⊢ ¬Proof(n, ⌜γ⌝). Therefore, if ¬γ is provable, then T has the following undesirable property: T ⊢ ∃z Proof(z, ⌜γ⌝), but for every n, T ⊢ ¬Proof(n, ⌜γ⌝).

If for some wff α(z), T ⊢ ∃z α(z) and also, for every n, T ⊢ ¬α(n), then T is called ω-inconsistent. An ω-consistent theory is one that is not ω-inconsistent. An inconsistent theory is also ω-inconsistent, since in an inconsistent theory all wffs are provable. But the reverse implication need not hold; there are consistent but ω-inconsistent theories.[8] Given that the individual variables are supposed to range over the natural numbers, ω-inconsistency is a very bad feature. It means that we can prove in the theory a positive existential claim: there is a number with a certain property, but for every particular number, we can prove that it does not have this property. If we could infer from the infinite list of wffs ¬α(n), n = 0, 1, 2, ..., the wff ∀z ¬α(z), we would have gotten a contradiction. Such an inference rule, from an infinite list of premises, is sometimes called an ω-rule; hence 'ω-inconsistent' means inconsistent when we add the ω-rule.

[8] The models of such a theory must be non-standard, i.e., besides the standard numbers 0, 1, ..., they contain non-standard natural numbers, which cannot be distinguished "inside the model" from the standard ones. The wff α(z) is not satisfied by any standard number; but there is a non-standard number that satisfies it.

The above argument establishes Theorem (II), below, which came to be known as Gödel's first incompleteness theorem. It holds under the assumption that we have a coding of wffs and proofs for which (11) and (12) hold.

Theorem (II) There is a sentence γ such that (i) if T is consistent, γ is unprovable in T, (ii) if T is ω-consistent, ¬γ is unprovable in T.

While consistency is a purely syntactic property, ω-consistency has a semantic link: the fact that the theory is supposed to be interpreted in some universe consisting of the natural numbers. Still, there is no appeal here to the truth concept. Five years after the publication of Gödel's result, Rosser [1936] succeeded in eliminating the need for ω-consistency. Using some additional, rather weak, assumptions concerning T, he showed how to construct a sentence ρ such that, if T is consistent, neither ρ nor ¬ρ is provable in it.

One of the earliest logicians ─ perhaps the first ─ to read Gödel's manuscript carefully was Carnap. Analyzing the proof, he noticed that the argument that leads from (12) to (14) is of a general character and does not depend on the wff ¬∃z Proof(z, v). The same construction can be applied to any wff α(v), resulting in a sentence that "says of itself" that it has the property described by α(v):

Theorem (III) Assume that we have a wff δ(v, y) such that, for every ϕ(v),

T ⊢ ∀y [δ(⌜ϕ(v)⌝, y) ↔ y ≈ ⌜ϕ(⌜ϕ(v)⌝)⌝]

Then, given any wff α(v) in the language, we can construct a sentence σ such that

T ⊢ σ ↔ α(⌜σ⌝)

The proof is completely analogous to the argument from (12) to (14), with α(v) replacing ¬∃z Proof(z, v). Namely, let β(v) = ∀y [δ(v, y) → α(y)]. Then, for all ϕ(v), T ⊢ β(⌜ϕ(v)⌝) ↔ α(⌜ϕ(⌜ϕ(v)⌝)⌝). Choosing ϕ(v) = β(v) and letting σ =Df β(⌜β(v)⌝), we get: T ⊢ σ ↔ α(⌜σ⌝). qed

If we consider the function that maps every sentence χ to α(⌜χ⌝), then σ is mapped to a sentence equivalent to it in T. For this reason the theorem is often known as the fixed point theorem. It is sometimes also called the self-reflection lemma, or the diagonalization lemma.
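At the informal, string-manipulating level of the earlier sketch, the general construction is a two-line recipe. The Python sketch below (ours; it uses the same toy coding, assumes 'v' occurs only as the free variable, and assumes a function symbol diag in the language rather than the wff δ) produces, for a given α(v), the sentence σ obtained by diagonalizing β(v) = α(diag(v)):

def gn(formula):
    # Toy Gödel number, as before: the formula's text read as one integer.
    return int.from_bytes(formula.encode(), "big")

def fixed_point(alpha):
    # Given alpha(v), form beta(v) = alpha(diag(v)) and return its
    # diagonalization sigma = beta(numeral of GN(beta)).  Under the intended
    # reading, diag applied to that numeral denotes sigma itself, so sigma is
    # equivalent to alpha applied to a name of sigma.
    beta = alpha.replace("v", "diag(v)")
    return beta.replace("v", str(gn(beta)))

print(fixed_point("~Ez Proof(z, v)"))   # a Gödel-type sentence
print(fixed_point("~True(v)"))          # a formal Liar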
Carnap published this observation in [1934] and Gödel stated it in his 1934 paper, referring to Carnap in a footnote. Although the proof generalizes in an obvious way an argument first used by Gödel, anyone who goes through Gödel's lengthy manuscript will see that it is not at all trivial to extract it from the host of details. Gödel himself did not seem to notice it. Carnap's role is ignored today, but it is fitting that it be remembered. I shall refer to the result as the Gödel-Carnap fixed point theorem. The theorem has numerous applications. Nowadays the incompleteness result is usually proved by first stating and proving the Gödel-Carnap fixed point theorem, and then applying it for the case α(v) = ¬∃z Proof(z, v). This reverses the historical order. It also endows the fixed point construction with an aura of a magic trick.

Finally, note that the choice of an arithmetical language is not essential ─ a point noted by Gödel in his introduction. Any language will serve, as long as a sufficient portion of the syntax can be expressed in it, via a suitable representation of syntactic items by the objects that T is about. In particular, the fixed point technique can be applied in other setups. It is instructive to see how it works in natural languages.

Diagonalization and Fixed Points in Natural Languages

Take English as our language. It already includes names for linguistic expressions (strings of words): the name of an expression is formed by enclosing it in quotes. Hence, there is no need for encoding. Also, predication in English is effected by appending the predicate ─ including the copula 'is', if needed ─ to the noun phrase. Hence, the English parallel of the formal sentence ϕ(a), where a is an individual constant, is an expression of the form: 'a is a Φ', where 'Φ' stands for some descriptive phrase. The role of the wff ϕ(v), in which the free variable marks the place to be occupied by the noun phrase, is played by the expression 'is a Φ', which is to be appended to the noun phrase. Hence, the diagonalization ϕ(⌜ϕ⌝) becomes: ' 'is a Φ' is a Φ'.[9] Diagonalization in English is therefore effected by appending the expression to its quote. Call this applying an expression to itself.

[9] The quotes around expressions that include schematic letters are interpreted here as Quine's corners; it is the scheme whose instances are obtained by replacing the schematic letter by an instance of it and quoting the result.

Such applications can yield gibberish, or grammatically correct sentences. The latter can be nonsensical, or false, and they can also be true. Consider, for example, the results of applying the expression to itself, for each of the following: (i) 'boy', (ii) 'runs', (iii) 'is white', (iv) 'contains two words', (v) 'contains three words'. They are, respectively: (i*) ' 'boy' boy', (ii*) ' 'runs' runs', (iii*) ' 'is white' is white', (iv*) ' 'contains two words' contains two words', (v*) ' 'contains three words' contains three words'. The first is gibberish; the second is grammatically correct but either nonsensical or false; the same is true of the third, though we might be more inclined to classify it as false; the fourth is a false sentence; the fifth is true.

Now consider the following expression:

yields when applied to itself a Φ

It describes a property of English expressions: the property of yielding a Φ when the expression is applied to itself. Diagonalizing this very same expression we get the following sentence; call it Σ:

'yields when applied to itself a Φ' yields when applied to itself a Φ.
Σ says that when we apply to itself 'yields when applied to itself a Φ', we get something that is a Φ. But the application to itself of 'yields when applied to itself a Φ' gives us Σ! Hence, we conclude: Σ iff 'Σ' is a Φ. This is the fixed point construction for English.

Quine [1953] used such a construction for the case where 'Φ' stands for 'non-true sentence'. He got in this way a version of the Liar that does not involve indexicals, or names of locations. Quine argued that the option of regarding the classical Liar sentence as meaningless is not available for this version of the paradox. From this he concluded that there is a need to restrict the application of the truth predicate according to Tarski's hierarchy. But Quine's argument is wrong. The meaninglessness (if we agree that the sentence is meaningless) is due to vicious circularity (as is shown clearly by the example in the Appendix), not to the mechanism by which this vicious circularity is effected. There is no difference in this respect between constructions based on indexicals, or location names, and constructions that employ diagonalization. To say that the sentence is meaningless boils down to using a logic with truth-value gaps, and this is possible in either case.

The "diagonalization Liar", in natural languages, is closely related to the Grelling-Nelson paradox [1908]. That paradox consists in defining an adjective to be heterological if it is not true of itself, and asking whether 'heterological' is heterological. For a more detailed analysis of that paradox see [Gaifman 1983, pp. 135-136], where it is shown how the paradox can lead to the diagonalization technique in natural language. One can then simulate the same construction in a formal language and, with 'true' replaced by 'provable', the idea of Gödel's proof emerges. Historically, this did not happen. The Grelling-Nelson paradox remained entrenched in the linguistic domain, pertaining only to natural languages.[10] It had no effect in formal setups. When the diagonalization technique was applied to natural language by Quine [1953], it was through simulating a well-understood technique of mathematical logic.

[10] The article by Grelling and Nelson does not even appear in the comprehensive bibliography given in van Heijenoort's From Frege to Gödel, which covers the period 1879-1931 in the development of formal logic.

The natural language analog of the Gödel sentence can be constructed by instantiating Φ to 'non-provable sentence'. The result is a sentence that says of itself that it is not provable. There is, however, an essential difference between provability in the context of formal systems and provability in natural language. The latter is open-ended and not rigorously defined. As in the formal case, we might conclude that the sentence that asserts its own non-provability is indeed not provable, because a proof of it would lead to a contradiction. But this argument is itself a proof, in natural language, of the sentence. Thus we get a paradox of a semantic character. The concept of provability (in natural language) is different from the truth concept, but it is defined by appeal to it. A proof is supposed to show us that something is true; if the result is not true, this might indicate that there is something wrong with the proof. In particular, if we have a proof of a contradiction, then this very fact might disqualify it from being a "real proof".

Note that, by its very construction, the self-referring Σ is a grammatical sentence. There might be some problems concerning its truth-value, but in many cases the truth-value is unproblematic. The truth-value can, however, be very sensitive to the way Φ is phrased. Equivalent Φ's may lead to self-referring sentences that have different truth-values. Consider for example the following:

1. 'yields when applied to itself a sentence containing twenty words' yields when applied to itself a sentence containing twenty words.
2. 'yields when applied to itself a sentence that contains twenty words' yields when applied to itself a sentence that contains twenty words.
3. 'yields when applied to itself a puzzling sentence' yields when applied to itself a puzzling sentence.
4. 'yields when applied to itself a boring sentence' yields when applied to itself a boring sentence.

1 is true (check by counting the words). 2 is false. Any unclarity concerning the truth-values of 3 and 4 is due to the vagueness or ambiguity of 'puzzling' and 'boring', not to the self-referential nature of the sentences.
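The word counts in 1 and 2 can be checked mechanically. A small Python sketch (ours) of applying an expression to itself, used here to verify example 1 and its variant 2:

def apply_to_itself(expr):
    # Append an English expression to its own quotation.
    return "'" + expr + "' " + expr

sigma1 = apply_to_itself("yields when applied to itself a sentence containing twenty words")
sigma2 = apply_to_itself("yields when applied to itself a sentence that contains twenty words")

print(len(sigma1.split()), len(sigma2.split()))   # 20 and 22: 1 is true, 2 is false
print(apply_to_itself("contains three words"))    # the true sentence (v*) above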
Appendix

Let ϕ0(v), ..., ϕn(v), ... be an enumeration of the conditions defining the corresponding sets X0, ..., Xn, .... Consider the set X*, defined, as in Richard's paradox, by:

n ∈ X* ⇔Df ϕn(n) is not true.

Let ψ(n) be the sentence that expresses the right-hand side. Now the truth-values of sentences are determined by following a certain procedure; let us call the procedure Evaluate and let us write it as a list of instructions, in the programming style. This does not mean that truth-values are determined in an effective way; the procedure might call for checking infinitely many cases. But we can represent it in a form that makes clear certain relevant aspects. The procedure for the sentences ψ(n), n = 0, 1, ... can be written as follows:

Begin{Evaluate}
On input ψ(n):
  Construct the sentence ϕn(n)
  Apply Evaluate, on input ϕn(n)
  If Evaluate, on input ϕn(n), returns T, output F
  If Evaluate, on input ϕn(n), returns F, output T
End{Evaluate}

If ψ(v) itself appears in our sequence as ϕk(v), the execution of Evaluate on input ψ(k) results in constructing ϕk(k), which is followed by applying Evaluate to the input ϕk(k); since ϕk(k) = ψ(k), the procedure calls itself on the same input. Thus, we are in a non-terminating loop. The truth-value of ψ(k) is therefore undefined. Hence, if ψ(v) appears in our list, it does not qualify as a definition of a set. Note that we get a loop also if, in the above definition of X*, we replace 'not true' by 'true'. In that case we do not get a contradiction, but such a wff, if it appears in the list, does not qualify as a definition.

On Tarski's analysis, 'ϕn(n) is not true' should be construed as a definition in a higher-level language (an embryonic form of the idea can be found at the very end of Richard's paper). This, however, is not the only possible handling of the paradox. Other treatments incorporate circular definitions by using partial functions (i.e., functions that are not always defined). The use of partial functions is standard in recursion theory. What is not so well recognized is that the use of logical systems with truth-value gaps amounts to applying this very same strategy in semantic contexts.
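A runnable Python rendering of the Evaluate procedure above (ours; the "definitions" are modelled crudely as Python predicates) makes the circularity tangible: if the diagonal condition itself occurs in the list at position K, evaluating it at K recurses on the very same input and never terminates; here the loop is cut off by Python's recursion limit.

def evaluate(condition, n):
    # The Evaluate procedure: apply the condition to n.
    return condition(n)

definitions = []                                  # the enumeration phi_0, phi_1, ...
definitions.append(lambda n: n % 2 == 0)          # phi_0: "n is even"
definitions.append(lambda n: n > 100)             # phi_1: "n is greater than 100"
K = 2
definitions.append(lambda n: not evaluate(definitions[n], n))   # phi_K = psi

print(evaluate(definitions[K], 0), evaluate(definitions[K], 1))  # psi(0), psi(1): well defined
try:
    evaluate(definitions[K], K)                   # psi(K) calls itself on the same input ...
except RecursionError:
    print("psi(K) loops: its truth-value is undefined")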
References

Cantor, G. 1891 "Über eine elementare Frage der Mannigfaltigkeitslehre", Jahresbericht der Deutschen Mathematiker-Vereinigung, vol. I, pp. 75-78. Also in Cantor's collected papers, Gesammelte Abhandlungen, Mathematischen und Philosophischen Inhalts, 1932, edited by Zermelo, Berlin: Verlag von Julius Springer, p. 278.

Carnap, R. 1934 Logische Syntax der Sprache, Vienna: Springer. Translated into English as The Logical Syntax of Language, 1937.

Dawson, J. 1997 Logical Dilemmas: The Life and Work of Kurt Gödel, A. K. Peters.

Gaifman, H. 1983 "Paradoxes of infinity and self-applications, I", Erkenntnis, vol. 20, pp. 131-155.

Gaifman, H. 2005 "Naming and Diagonalization, from Cantor to Gödel to Kleene", Logic Journal of the IGPL, October 2006, pp. 709-728. Also on Gaifman's website.

Gödel, K. 1931 "Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I", Monatshefte für Mathematik und Physik, 38, pp. 173-198. English translation in Gödel's Collected Works, vol. I.

Grelling, K. and L. Nelson 1908 "Bemerkungen zu den Paradoxien von Russell und Burali-Forti", Abhandlungen der Fries'schen Schule 2, 1907-08, pp. 300-334.

Peano, G. 1906 "Additione", Revista de Mathematica, 8, pp. 143-157.

Poincaré, H. 1906 "Les mathématiques et la logique", Revue de métaphysique et de morale, 14, pp. 17-34, 294-317.

Quine, W. V. 1953 "Theory of reference", in From a Logical Point of View, pp. 130-138, Harvard University Press (a combination and reworking of two earlier articles, from 1943 in the Journal of Philosophy and from 1947 in the Journal of Symbolic Logic).

Ramsey, F. P. 1925 "The foundations of mathematics", Proceedings of the London Mathematical Society, 25, pp. 338-384. Also in the collection Ramsey, Philosophical Papers, ed. D. Mellor.

Richard, J. 1905 "Les Principes des Mathématiques et les problèmes des ensembles", Revue générale des sciences pures et appliquées, 16, p. 541. Also in English translation in van Heijenoort's anthology From Frege to Gödel, Harvard University Press, 1967.

Rosser, J. 1936 "Extensions of some theorems of Gödel and Church", Journal of Symbolic Logic, 1, pp. 87-91; reprinted in the anthology The Undecidable, edited by M. Davis.

Russell, B. 1906 "Les Paradoxes de la logique", Revue de Métaphysique et de Morale, 14. English version as "On 'Insolubilia' and their Solution by Symbolic Logic", Chapter 9 in Russell's anthology Essays in Analysis, ed. D. Lackey.

Russell, B. 1908 "Mathematical Logic as Based on the Theory of Types", American Journal of Mathematics, 30. Reprinted in the anthology Logic and Knowledge, ed. R. Marsh.