Math/CS 312 Notes
R. Pruim
Fall 2006
© R. Pruim, December 4, 2006
Contents

1 Writing Proofs
   1.1 Proofs in Mathematics
   1.2 Written Proofs
   1.3 Mathematics as a Language
   1.4 Proverbs
   1.5 Contraposition and Contradiction
   1.6 Ten Rules
   1.7 Forms of Theorems
       1.7.1 Biconditional Theorems
       1.7.2 Existence Theorems
       1.7.3 Uniqueness Theorems
       1.7.4 Universal Statements

2 Cardinality
   2.1 Counting, Children, and Chairs
   2.2 Basic Cardinality Results
   2.3 Cantor’s Two Ideas

3 Mathematical Induction
   3.1 Basics
   3.2 Inductive Definitions
   3.3 Structural Induction
   3.4 Ambiguity

4 Set Theory
   4.1 Naive Set Theory
   4.2 A Second Attempt at Set Theory
   4.3 The Axioms of ZF
   4.4 The Axioms of PA
   4.5 A Model for PA: The Universe
   4.6 A Model for PA: The Functions

5 Computability Theory
   5.1 Informal Computability
       5.1.1 Numerical Computation
   5.2 Formal Computability
   5.3 Notation
   5.4 Exercises
   5.5 Robustness of the Turing Machine Model
   5.6 More Exercises

6 Computational Complexity
   6.1 Time Complexity of a Turing Machine
   6.2 The Complexity of Decision Problems
   6.3 The Classes P and NP
       6.3.1 Some Decision Problems in NP
   6.4 Reductions and Closure Properties
       6.4.1 Polynomial Time Reductions
       6.4.2 Other Closure Properties
       6.4.3 Hardness and Completeness
Chapter 1
Writing Proofs
Proof serves many purposes simultaneously. In being exposed to the
scrutiny and judgment of a new audience, the proof is subject to a constant process of criticism and revalidation. Errors, ambiguities, and misunderstandings are cleared up by constant exposure. Proof is respectability. Proof is the seal of authority.
Proof, in its best instances, increases understanding by revealing the heart
of the matter. Proof suggests new mathematics. The novice who studies
proofs gets closer to the creation of new mathematics. Proof is mathematical power, the electric voltage of the subject which vitalizes the static
assertions of the theorems.
Finally, proof is ritual, and a celebration of the power of pure reason.
Such an exercise in reassurance may be necessary in view of all the
messes that clear thinking clearly gets us into.
Philip J. Davis and Reuben Hersh
One of the goals for this course is to improve your ability both to discover and
to express (in oral or written form) proofs of mathematical assertions. This section
presents some guidelines that will be useful in each of these tasks, especially in
preparing written proofs. These notes on writing proofs were originally produced
by M. Stob. This version has been revised somewhat by R. Pruim.
1.1 Proofs in Mathematics
I have made such wonderful discoveries that I am myself lost in astonishment; out of nothing I have created a new and another world.
John Bolyai
Mathematicians prove their assertions. This distinguishes mathematics from all
other sciences and, indeed, all other intellectual pursuits. Indeed, one definition
of mathematics is that it is “proving statements about abstract objects.” You
probably first met this conception of mathematics in your secondary school geometry
course. While Euclid wasn’t the first person to prove mathematical propositions, his
treatment of geometry was the first systematization of a large body of mathematics
and has served as a model of mathematical thought for 2200 years. To understand
the role of proof in mathematics, we can do no better than to start with Euclid.
The basic ingredients of Euclidean geometry are three.
Primitive terms. Point, line, plane are all primitive terms in modern treatments of Euclidean geometry. We usually call these undefined terms and do not
attempt to give definitions for them. (Euclid did give definitions for all his terms;
we discuss the role of definition in mathematics later.) Of course a logical system
must start somewhere; it is impossible to define all terms without falling prey to
either circularity or infinite regress. David Hilbert, in his Foundations of Geometry,
simply starts as follows
Let us consider three distinct systems of things. The things composing
the first system, we will call points and designate them by the letters A,
B, C, . . . ; those of the second, we will call straight lines, and designate
them by the letters a, b, c . . . ; and those of the third system, we will call
planes and designate them by the Greek letters α, β, γ, . . . .
Hilbert says nothing about what the “things” are.
Axioms. An axiom is a proposition about the objects in question which we do
not attempt to prove but rather which we accept as given. One of Euclid’s axioms,
for example, was “It shall be possible to draw a straight line joining any two points.”
Aristotle describes the role of axioms.
It is not everything that can be proved, otherwise the chain of proof
would be endless. You must begin somewhere, and you start with things
admitted but undemonstrable. These are first principles common to all
sciences which are called axioms or common opinions.
Euclid and Aristotle thought of axioms as propositions which were “obviously”
true. But it is not necessary to think of them as either true or false. Rather, they
are propositions which we agree, perhaps for the sake of argument, to accept as
given.
Rules of inference. Axioms and previously proved theorems are combined to
prove new theorems using the laws of logic. The nature of such rules of inference is
best illustrated by an example.
All men are mortal
Socrates is a man
------------------
Socrates is mortal
In this example, the axioms (called premises or hypotheses) are written above
the line and the theorem (called the conclusion) is written below the line. The
whole argument is called a deduction. This particular argument is an example
of a rule of inference which is now usually called Universal Instantiation. Two
important features of this argument characterize a rule of inference. First, the
relationship of the conclusion to the hypotheses is such that we cannot fail to accept
c
R.
Pruim, December 4, 2006
Section 1.2
7
the truth of the conclusion if we accept the truth of the hypotheses. Of course
we do not have to accept the truth of the hypotheses and so are not compelled
to believe the conclusion. But a rule of inference must necessarily produce true
conclusions from true hypotheses. Second, this relationship does not depend on
the concepts mentioned (humanity, mortality, Socratiety) but only on the form
of the propositions. Hilbert said it this way: “One must be able at any time to
replace ‘points, lines, and planes’ with ‘tables, chairs, and beer mugs.’ ” The next
nonsensical argument is as valid as the previous one.
All beer mugs are chairs
Socrates is a beer mug
------------------------
Socrates is a chair
While neither hypothesis is true and certainly the conclusion is false (perhaps
your favorite chair is named Socrates but the Socrates I am thinking of was not a
chair), this argument too is a perfectly acceptable example of universal instantiation.
This, then, is how mathematics is created. Starting from axioms, which are propositions about certain objects that may be undefined, we use the rules of inference to prove new theorems. Anyone accepting the truth of the axioms must accept the truth of all the theorems.
Exercises
1. Morris Kline says that
Mathematics is a body of knowledge. But it contains no truths.
In a similar vein, Hilbert said
It is not truth but only certainty that is at stake.
What do they mean?
1.2 Written Proofs
I like being a writer; it’s the paperwork I dislike.
Peter De Vries
This chapter is specifically concerned with written proofs. While a proof might
be thought of as an abstract object existing only in our minds, the fact is that
mathematics advances only in so far as proofs are communicated. And writing
remains the principal means of such communication. So to be a mathematician, you need to learn not only how to prove things but also how to write those proofs clearly and correctly. Learning to write proofs also makes reading other people’s proofs easier.
What does a written proof look like? Like any other piece of prose, a written
proof is organized into sentences and paragraphs. Like all other correct writing,
those sentences are written following all the standard conventions of the language in
which they are written. Because of the precision of the thoughts that a proof must
convey, it is especially important that the prose be clear and correct. In order to
help the reader follow the argument being presented, it is helpful to use a style that
emphasizes the organization and flow of the argument.
It is useful to think of a proof as consisting of three sorts of sentences. First,
some sentences express the result of a deduction. That is, they result from other sentences by applying one of the rules of inference as described in the last
section. These sentences can be recognized by the fact that they start with words
like “therefore,” “thus,” “so,” or “from this it follows that.” Of course such a
sentence will not normally be the first sentence of the proof and such a sentence
normally depends on one or more earlier sentences of the proof.
A second sort of sentence is used to state a given fact. These sentences
normally are restatements of hypotheses of the theorem being proved or recall facts
that the reader should know already. An example of the latter type is the sentence
“Recall that √2 is an irrational number.” Again, these sentences should contain
some sort of cue that they are stating hypotheses or previously proved facts rather
than asserting something new.
Finally, a third sort of sentence appearing in a proof is what might
be called “glue” or the prose that holds the other sentences together. These
sentences are used for purposes such as to inform the reader of the structure of
the argument (“Next we show that G is abelian.”) or to establish notation (“Let
X = Y ⊕ Z.”). This glue may also include longer passages that outline an entire
argument in advance, summarize it at the conclusion, provide motivation for the
methods being employed, or describe an example. It is a common error of beginning
proof-writers to use too few of these sentences so that the structure of the argument
is not clear.
This description of what a proof should look like is at odds with some of the ideas
of proof that students take with them from high-school mathematics and perhaps
even calculus. Some students think that a proof, like the solution to any homework
problem, should be a series of equations with few, if any, words. Now an equation
is a perfectly acceptable sentence. The equation “x + 2 = 3.” is a sentence and could,
perhaps, appear in a proof. Mathematical notation serves as abbreviation for words
and phrases which would be tedious to write many times. However this equation
almost certainly should not appear as a sentence of a proof, for it is not clear whether
it expresses a hypothesis or a conclusion. Probably what is meant is “Suppose that
x is a real number such that x + 2 = 3.” or “Therefore, it follows from the last
equation that x + 2 = 3.” Equations must be tied together with words.
A second misconception of proof is a side-effect of studying geometry in high
school. High school geometry courses often teach that a proof should be arranged in
two columns; in one column we write assertions and in the other we write the reason
that the corresponding assertion is true. Look in a mathematics journal and you
will see that no mathematician writes proofs in this manner. The main reason for
teaching “two-column” proofs is that they sometimes help beginning proof writers
concentrate on the logical structure of the argument without having to attend to
the details of grammatical writing. A two-column proof reinforces the notion that
each assertion in a proof should serve a definite purpose and must be justified. For
realistic theorems however (unlike the baby theorems of high school geometry), the
two-column proof becomes cumbersome and hard to read. The two-column proof,
among other defects, is missing the “glue” described above. These informative
sentences are crucial to helping a reader understand a complicated proof. We want
the reader to understand not only the justification of each step but also the logical
structure of the argument as a whole. Nevertheless, a two-column proof is sometimes
a good place to start on scratch paper.
1.3 Mathematics as a Language
[The universe] cannot be read until we have learnt the language and
become familiar with the characters in which it is written. It is written in mathematical language, and the letters are triangles, circles and
other geometrical figures, without which means it is humanly impossible
to comprehend a single word.
Galileo Galilei (1564 - 1642)
The chief aim of all investigations of the external world should be to
discover the rational order and harmony which has been imposed on it
by God and which He revealed to us in the language of mathematics.
Johannes Kepler (1571-1630)
Such is the advantage of a well constructed language that its simplified
notation often becomes the source of profound theories.
Pierre-Simon de Laplace (1749 - 1827)
Ordinary language is totally unsuited for expressing what physics really
asserts, since the words of everyday life are not sufficiently abstract.
Only mathematics and mathematical logic can say as little as the physicist means to say.
Bertrand Russell (1872-1970)
Mathematics is a language.
Josiah Willard Gibbs (1839-1903)
Just like any other language (natural or programming), the language of mathematics has its own vocabulary, syntax, and semantics. To communicate well, you
must use the language well.
From a computer science perspective, it is important to keep in mind that as a
language, mathematics is case-sensitive, typed, hierarchical, and allows
operator overloading. In particular, every mathematical object or variable has a
particular type. It is either an integer, or a complex number, or a graph, or a group,
or a computer program, or a proof, or . . . . When doing mathematics and when
writing proofs, it is important that you remain aware of the types of the objects you
work with.
Mathematical variables are declared just as variables must be declared in a programming language. The typical way to do this is with a phrase like “let x be
an integer”. This declares that the type of x is integer. If x is an integer, it makes
sense to add other integers to it, to square it, or to perform any other operation
that works on arbitrary integers. Furthermore, we may perform any operation that
is legal for real numbers, or complex numbers, or quaternions, since every integer is
also a real, a complex, and a quaternion. (This is akin to inheritance in an object-oriented language.)
“Operator overloading” in mathematics is sometimes a cause of confusion for beginners in a new area. There are a limited number of symbols available to us, so we reuse them. What is meant by a symbol like +, for example, is determined by context – in particular, by the types of the objects being added. Each of the following statements is true in the correct context:
• 1 + 1 = 2;
• 1 + 1 = 0;
• 1 + 1 = 1;
• 1 + 1 = 11;
• 2³ = 8;
• 2³ = 222.
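These context-dependent readings of “+” and exponentiation can be imitated directly in a programming language with operator overloading. The sketch below is in Python, which is not used anywhere in these notes; the class Z2 is a made-up illustration of the integers mod 2, and each comment names the context in which the corresponding statement above is true.

```python
class Z2:
    """Integers mod 2, a context in which 1 + 1 = 0."""
    def __init__(self, value):
        self.value = value % 2

    def __add__(self, other):
        # "+" is overloaded here: addition is performed modulo 2.
        return Z2(self.value + other.value)

    def __eq__(self, other):
        return self.value == other.value

# The same symbol, several meanings, chosen by the types of the operands:
print(1 + 1)                   # ordinary integers: 2
print(Z2(1) + Z2(1) == Z2(0))  # integers mod 2: True, since 1 + 1 = 0
print(1 | 1)                   # Boolean algebra ("or" as addition): 1
print("1" + "1")               # strings (concatenation): 11

# Exponentiation is overloaded too:
print(2 ** 3)                  # integer exponentiation: 8
print("2" * 3)                 # repetition: 222, a string reading of "2^3"
```

The point of the sketch is that nothing about the symbol itself decides its meaning; the types of the operands do, exactly as in mathematical writing.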
1.4 Proverbs
Here I have written out for you sayings
full of knowledge and wise advice
Proverbs 22:20
Proof-writing is, to some extent at least, a creative activity. There is no step-by-step recipe that, if followed carefully, is guaranteed to produce a proof. Proof-writing differs in this way from differentiation in calculus. Machines can differentiate as well as you can, but they have not made much progress in producing proofs. However,
experienced proof-writers do have certain principles by which their search for a proof
is at least implicitly guided. In these notes we try to summarize at least some of
these principles in the form of proverbs. As a vehicle for introducing the first few of
these proverbs, we consider the proof of the following (easy) theorem.
Theorem 1.1 The square of an even natural number is even.
How does one get started writing a proof? The following proverb suggests the
answer.
Proverb 1.2 The form of the theorem suggests the outline of the proof.
In fact
Proverb 1.3 You can write the first line of a proof, even if you don’t understand
the theorem.
In a later section, we make a more detailed study of theorem forms. However
Theorem 1.1 has a common and easily recognizable form. Here is an abstract version
of that form.
Theorem 1.4 If object x has property P then x has property Q.
or more simply
Theorem 1.5 If x is a P then x is a Q.
It will become clear to you in your study of abstract mathematics that many
theorems have exactly this form, though you might have to rewrite the theorem a bit
to see this clearly. Here are some examples from all sorts of different mathematical
specialities.
Theorem 1.6 Every differentiable function is continuous.
Theorem 1.7 Every semistable elliptic curve is modular.
Theorem 1.8 The lattice of computably enumerable sets is distributive.
Theorem 1.9 If I is a prime ideal of a ring R, then R/I is an integral domain.
The first of these theorems is one you know and love from your calculus class.
The second is the famous theorem of Wiles proved in 1993 (it implies Fermat’s last
theorem). The third comes from a field of mathematical logic known as recursion
theory. The last theorem is one that you would meet in an abstract algebra class.
Notice that Theorem 1.9 is really about two objects and both properties P and Q
express relationships between these two objects. Note too that while Theorem 1.8
appears to be about only one object (the lattice of recursively enumerable sets),
the other three theorems are about an unspecified number of objects. In fact, our
Theorem 1.1 is about infinitely many objects.
So if Proverb 1.2 is correct, the outline of the proof of Theorem 1.4 should suggest itself. By Proverb 1.3, we should at least be able to write the first sentence of the proof. Indeed, here is a reasonable way to begin the proof of a theorem such as Theorem 1.4.
Proof (of Theorem 1.4). Suppose x has property P. We must show that x has property Q.
The second sentence might be a bit pedantic, but the first clearly sets the stage.
Thus, our proof of Theorem 1.1 begins as follows.
Proof (of Theorem 1.1). Suppose that n is an even natural number . . .
Actually, we can also predict the last line of theorems with this form. We would
expect the proof of Theorem 1.1 to look like this.
Proof (of Theorem 1.1). Suppose that n is an even natural number.
. . .
Thus n² is even.
So what remains in the proof of the theorem is to fill in the . . . . We need to
reason from the fact that x has property P to the fact that x has property Q. In
our particular example, and in many similar cases, the next proverb supplies the
key.
Proverb 1.10 Use the definition.
In some cases it will be obvious that an x with property P also has property Q.
However, if it is that obvious, you probably will not be asked to supply a proof in the
first place. So the first step is usually to determine what it means to have property
P . All mathematical terms, except the primitive ones, have precise definitions. In
our case, the following definition is the important one.
Definition 1.11 A natural number n is even if there is a natural number k such
that n = 2k.
Using the definition gives us the second line of our proof. It also gives us a hint as
to what the next to last line of the proof might be.
Proof (of Theorem 1.1). Suppose that n is an even natural number. Then there is a natural number k such that n = 2k.
. . .
Thus there is a natural number l such that n² = 2l. Thus n² is even.
The only work in this theorem is filling in the remaining gap. And this work is
simply a computation. Thus the complete proof is
Proof (of Theorem 1.1). Suppose that n is an even natural number. Then there is a natural number k such that n = 2k. So n² = (2k)² = 4k² = 2(2k²). Thus there is a natural number l (namely 2k²) such that n² = 2l. Thus n² is even.
Notice that the end of each of these proofs above is marked by the symbol □.
This or some other symbol is often used in books and articles to mark the ends of
proofs. Especially in older writing, the letters QED (which stand for a Latin phrase
meaning ‘that which was to be proven has been shown’) are often used. Even when
using such a marker, the last sentence or two of a proof should also indicate somehow
that the conclusion of the argument has been reached, but the additional marker is
also helpful if, for example, the reader wants to skip over the proof on a first reading
or get an estimate on its length.
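For readers who want a sanity check, the computation at the heart of the proof of Theorem 1.1 can be tested by machine. The following sketch (in Python, which these notes do not otherwise use; checking examples is, of course, not a proof) verifies the witness l = 2k² for every even natural number up to 1000.

```python
# Check (not a proof!) that the square of an even number is even,
# using the witness from the proof: if n = 2k, then n^2 = 2(2k^2).
for n in range(0, 1001, 2):
    k = n // 2
    assert n == 2 * k                # n is even, with witness k
    assert n * n == 2 * (2 * k * k)  # n^2 is even, with witness l = 2k^2
print("verified for all even n up to 1000")
```

The loop mirrors the proof exactly: it does not merely test that n² is even, it tests that the particular witness l = 2k² produced by the proof works.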
Here is one more (easy) theorem of this type that will serve to reinforce the point of
this section.
Theorem 1.12 The base angles of an isosceles triangle are congruent.
Recasting this theorem in the form of Theorem 1.4, we might write this theorem as
Theorem 1.13 If a triangle is isosceles, then the base angles of that triangle are
congruent.
A diagram, Figure 1.4, serves to illustrate the statement of the theorem.
[Figure: an isosceles triangle △ABC, with base angle α at vertex A and base angle γ at vertex C.]
Proof. Suppose that △ABC is isosceles. That is, suppose that AB and BC are
the same length. (Note the clever use of the definition of isosceles triangle.) We
must show that the base angles α and γ are equal. To do that notice that triangles
△ABC and △CBA are congruent by side-angle-side. Furthermore, α and γ are
corresponding angles of these two triangles. Thus γ and α are congruent (equal)
since corresponding parts of congruent triangles are congruent.
Exercises
1. Write good first and last lines to the proofs of
(a) Theorem 1.6
(b) Theorem 1.7
(c) Theorem 1.8
(d) Theorem 1.9.
2. (a) Write a good definition of “odd natural number.”
(b) Prove that the square of an odd number is odd.
3. Prove the following theorem
Theorem 1.14 If a triangle has two congruent angles, then the triangle is
isosceles.
(This theorem is the converse of Theorem 1.12. It is always difficult to know
what you can assume in geometry, but the proof of this Theorem is similar to
that of Theorem 1.12 — it only uses simple facts about congruent triangles.)
1.5 Contraposition and Contradiction
“There is no use trying,” said Alice; “one can’t believe impossible things.”
“I dare say you haven’t had much practice,” said the Queen, “When I
was your age, I always did it for half an hour a day. Why, sometimes
I’ve believed as many as six impossible things before breakfast.”
Lewis Carroll
In the last section, we saw how to produce a reasonable outline of a proof of a
theorem with the following form.
Theorem 1.15 For all x, if x has property P then x has property Q.
However,
Proverb 1.16 There is more than one way to prove a theorem.
In this section, we look at two alternative outlines of the proof of Theorem 1.15.
The first of these is called “proof by contraposition.” It depends on the following
lemma.
Lemma 1.17 Suppose that α and β are propositions. Then the statement

If α then β                    (1.1)

is logically equivalent to the statement

If β is not true then α is not true.                    (1.2)
The statement 1.2 is called the contrapositive of the statement 1.1. Here logically
equivalent means that 1.1 is true just in case 1.2 is true. So to prove a statement
of form 1.1, it is sufficient to prove 1.2. This fact gives us the first of two alternate
outlines for a proof of Theorem 1.15.
Proof (of Theorem 1.15). We prove the theorem by contraposition. Suppose that
x does not have property Q.
...
Then x does not have property P .
As a specific example of a proof with this outline, consider the following theorem.
Theorem 1.18 Suppose that n is a natural number such that n2 is odd. Then n is
odd.
Proof. We prove the theorem by contraposition. Suppose that n is not odd. That
is, suppose that n is even. Then n² is even (by Theorem 1.1). Thus n² is not odd.
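Lemma 1.17 concerns only truth values, so it can be verified mechanically by checking all four assignments to α and β. Here is a small illustrative sketch in Python (not part of the notes); the function implies is a made-up helper encoding “if a then b”.

```python
from itertools import product

def implies(a, b):
    # "If a then b" is false only when a is true and b is false.
    return (not a) or b

# Compare "if a then b" with its contrapositive "if not b then not a"
# on every assignment of truth values to a and b.
for a, b in product([False, True], repeat=2):
    assert implies(a, b) == implies(not b, not a)
print("equivalent on all four truth assignments")
```

This is the sense in which a statement and its contrapositive are interchangeable: no assignment of truth values can separate them.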
How does one decide whether to prove a theorem directly or by contraposition?
Often, the form of the properties P and Q gives us a clue. If Q is a negative sort of
property, (such as x is not divisible by 3), it may very well be easier to work with
the hypothesis that x does not have property Q. Of course we could always try to
prove the theorem both ways and see which one works.
Another alternate way to prove a Theorem is to give a proof by contradiction.
In a proof by contradiction, we suppose that the theorem is false and derive some
sort of false statement from that assumption. If mathematics is consistent and if
our reasoning is sound, this means that our assumption that the theorem is false is
in error. So the outline of a proof by contradiction of Theorem 1.15 is as follows.
Proof (of Theorem 1.15). We prove the theorem by contradiction. So suppose
that x has property P but that x does not have property Q.
...
Then 1 = 2
(or any other obviously false statement). Thus our assumption is in error and it
must be the case that if x has property P , x also has property Q.
The most famous theorem which is usually proved by contradiction is the following.
Theorem 1.19 √2 is irrational.
We first rephrase the theorem so that it has the form of Theorem 1.15.
Theorem 1.20 If x is any real number such that x2 = 2, then x is not rational.
This form is logically superior to that of Theorem 1.19 since it doesn’t assume the
existence or uniqueness of a number √2. This form also suggests either contraposition or contradiction since the property “is not rational” is not as easy to work with
as the property “is rational.”
Proof. We prove Theorem 1.20 by contradiction. So suppose that x is a number
such that x² = 2 and x is rational. Then there are natural numbers p and q such
that p and q have no common divisors and x = p/q. Then

2 = x² = (p/q)²
Therefore p² = 2q². This implies that p² is even and so p is even (see Exercise 1
below). Since p is even, p = 2k for some natural number k. Thus 2q² = p² = 4k², or
q² = 2k². But this implies that q² is even and so that q is even. Thus p and q are
both even, but this cannot be true since we assumed that p and q had no common
divisors.
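No finite search can prove Theorem 1.20, but exact rational arithmetic makes the claim easy to probe. This illustrative Python sketch (the bound 200 is an arbitrary choice, not part of the proof) confirms that no fraction p/q with p, q ≤ 200 squares to 2.

```python
from fractions import Fraction

# Exact search: is there a fraction p/q, with p and q at most 200,
# whose square is exactly 2?  Fraction arithmetic avoids rounding error.
solutions = [(p, q)
             for p in range(1, 201)
             for q in range(1, 201)
             if Fraction(p, q) ** 2 == 2]
assert solutions == []  # no rational square root of 2 in this range
print("no p/q with p, q <= 200 satisfies (p/q)^2 = 2")
```

The proof above does what this search never could: it rules out every numerator and denominator at once.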
Students sometimes confuse proofs by contradiction and contraposition. The
proofs look similar in the beginning since each begins by assuming that x does not
have property Q. But a proof by contradiction also assumes that x has property P
and proves the negation of some true proposition, while a proof by contraposition
does not make this assumption and simply proves that x does not have property P.
Exercises
1. Prove that if p is a natural number such that p² is even, then p is even.

2. Prove that √3 is irrational.

3. Prove that √6 is irrational.
4. Write the contrapositive of the following theorems.
(a) Theorem 1.6
(b) Theorem 1.7
(c) Theorem 1.8
(d) Theorem 1.9.
5. Write the assumption made in a proof by contradiction of the following theorems.
(a) Theorem 1.6
(b) Theorem 1.7
(c) Theorem 1.8
(d) Theorem 1.9.
1.6 Ten Rules
When I read some of the rules for speaking and writing the English language correctly . . . I think any fool can make a rule and every fool will
mind it.
Henry Thoreau
As in most writing, there are certain mistakes that occur over and over again in
mathematical writing. As well, there are certain conventions that mathematicians
adhere to. In this section, we present ten rules to follow in writing proofs. If you
follow these rules, you will go a long way towards making your writing clear and
correct. Most of these rules come in one form or another from the wonderful book,
Mathematical Writing, by Donald Knuth.
Rule 1 Use the present tense, first person plural, active voice.
Bad: It will now be shown that . . .
Good: We show . . .
Rule 2 Choose the right technical term.
For example, not all formulas are equations.
Rule 3 Don’t start a sentence with a symbol.
Bad: xⁿ − a has n distinct roots.
Good: The polynomial xⁿ − a has n distinct roots.
Rule 4 Respect the equal sign.
Bad: x² = 4 = |x| = 2.
Good: If x² = 4, then |x| = 2.
Rule 5 Normally, if “if”, then “then”.
Bad: If x is positive, x > 0.
Good: If x is positive, then x > 0.
Rule 6 Don’t omit “that”.
Bad: Assume x is positive.
Good: Assume that x is positive.
Rule 7 Identify the type of variables.
Bad: For all x, y, |x + y| ≤ |x| + |y|.
Good: For all real numbers x, y, we have |x + y| ≤ |x| + |y|.
There is an exception to this rule. In many situations, the
notation used implies the type of a variable. For example,
it is usually understood that n denotes a natural number.
In Real Analysis, x almost always denotes a real number.
In these cases, the type can be suppressed in the interest
of saving ink and avoiding the distraction of extra words.
Rule 8 Use “that” and “which” correctly.
Bad: The least integer which is greater than √27 . . .
Good: The least integer that is greater than √27 . . .
The distinction is this: ‘that’ introduces a modifying clause that
adds information necessary to specify the object modified; ‘which’
introduces a modifying clause that adds additional information about
an already specified object.
Rule 9 A variable used as an appositive need not be set off by commas.
Bad: Consider the group, G, that. . .
Good: Consider the group G that. . .
Rule 10 Don’t use symbols such as ∃, ∀, ∨, > in text; replace them by words.
Symbols may, of course, be used in formulas.
Bad: Let S be the set of numbers < 1.
Good: Let S be the set of numbers less than 1.

1.7 Forms of Theorems
And if you go in, should you turn left or right . . .
or right-and-three-quarters? Or maybe not quite?
Or go around back and sneak in from behind?
Simple it’s not, I’m afraid you will find,
for a mind-maker-upper to make up his mind.
Dr. Seuss
Not all theorems have the form of Theorem 1.4. In this section we look at a few
of the more common forms.
1.7.1 Biconditional Theorems
A biconditional theorem is one with form
Theorem 1.21 P is true if and only if Q is true.
The proof of Theorem 1.21, as you would suspect, has two parts.
Proof (of Theorem 1.21). First, suppose that P is true. . . . Then Q is true.
Next, suppose that Q is true. . . . Then P is true.
Often, one of the two directions of a biconditional is best proved by contraposition.
1.7.2 Existence Theorems
An existence theorem is one with the form
Theorem 1.22 There is an x with property P .
Here is a specific example.
Theorem 1.23 There is a real number x such that x2 + 6x − 17 = 0.
The most common form of proof of a theorem with the form of Theorem 1.22 is
Proof. Let x be defined (constructed, given) by . . . . Then x has property P because . . . .
For example, here is a complete proof of Theorem 1.23.
Proof. Let x be given by

x = −3 + √26 .

Then x² + 6x − 17 = (−3 + √26)² + 6(−3 + √26) − 17 = 9 − 6√26 + 26 − 18 + 6√26 − 17 = 0.
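Although the algebra above is the proof, the constructed root is also easy to sanity-check numerically. A minimal Python sketch (not part of the proof; the tolerance 1e−9 is an arbitrary choice of ours):

```python
import math

# The root constructed in the proof of Theorem 1.23.
x = -3 + math.sqrt(26)

# Evaluate x^2 + 6x - 17; up to floating-point error this should be 0.
residual = x**2 + 6*x - 17
print(abs(residual) < 1e-9)  # → True
```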
It is sometimes possible to prove Theorem 1.22 by contradiction. Such a proof
would look like
Proof (of Theorem 1.22). We prove the theorem by contradiction. So suppose that no such x exists. Thus no x has property P . . . . Then 0 = 1. Therefore our assumption that no such x exists must be in error, so indeed such an x does exist.
Such a proof by contradiction is rather peculiar since we now know that x exists
but the proof does not give us any way to find x.
1.7.3 Uniqueness Theorems
A uniqueness theorem is one with form
Theorem 1.24 There is a unique x with property P .
This theorem actually says two things: there is an x with property P, but there is only one x with that property. Thus the proof has two parts: an existence part (which we have already discussed) and a uniqueness part. The proof of Theorem 1.24 therefore looks like this.
Proof. Existence. Define x by . . . Then x has property P because . . .
Uniqueness. Suppose that x and y have property P . . . . Then x = y.
Often, the uniqueness portion of Theorem 1.24 is proved by contradiction. Then
the second part of the proof looks like
Proof. Uniqueness. We prove uniqueness by contradiction. So suppose x and y have property P and x ≠ y. . . . Then 0 = 1. Therefore our assumption that x ≠ y must be in error.
The next theorem is a typical (though simple) example of a uniqueness theorem.
Recall that an additive identity is a number 0 with the property that 0+x = x+0 = x
for all real numbers x.
Theorem 1.25 The additive identity for the set of real numbers is unique.
Proof. Suppose that 0 and 0′ are numbers satisfying the defining condition for being an additive identity, namely that

0 + x = x + 0 = x for all x    (1.3)

and

0′ + x = x + 0′ = x for all x    (1.4)

Then by 1.3 (with x = 0′) we have that 0 + 0′ = 0′, and by 1.4 (with x = 0) we have that 0 + 0′ = 0. Thus 0 = 0′.
1.7.4 Universal Statements
A universal statement is one of form
Theorem 1.26 For all x, x has property R.
In fact, our original theorem form, Theorem 1.4, can best be understood as a universal statement.
Theorem 1.27 For all x, if x has property P then x has property Q.
Consider again our proof of Theorem 1.4. We write it somewhat differently.
Proof. Let x be an arbitrary object with property P . . . . Then x has property Q.
The key feature of this proof, the one that allows us to claim that the theorem is true of all objects, is that we did not assume any special properties of x (other than P ).
That is, though we named x, x was otherwise a “generic” object. You will recall
this sort of reasoning from geometry. If you were asked to prove something for all
triangles, you probably drew a triangle and labeled the vertices A, B, and C, but
realized that you were not allowed to assume anything special about the triangle
(say, from the diagram). Your proof had to work for equilateral triangles as well as
triangles that were not even isosceles. So the general proof of a universal statement
like Theorem 1.26 goes as follows.
Proof. Let x be given. . . . Then x has property R.
Here, the . . . is filled in by an argument that assumes nothing in particular about x. As a simple concrete example, consider the following theorem.
Theorem 1.28 For all real numbers x, x2 + x + 1 ≥ 0.
Proof. Let x be an arbitrary real number. Then

x² + x + 1 = (x² + x + 1/4) + 3/4 = (x + 1/2)² + 3/4 .

But (x + 1/2)² ≥ 0, so (x + 1/2)² + 3/4 > 0. Thus x² + x + 1 > 0.
In this proof, we used no special property of x and so we were entitled to conclude
that the property in question held for all x.
To prove a theorem like Theorem 1.26 by contradiction, we have to realize that
to deny that every x has property R is to assert that some x does not have property
R. So a proof by contradiction of Theorem 1.26 might look like this.
Proof. We prove the theorem by contradiction. So suppose that there is an x that does not have property R. . . . Then 0 = 1. Therefore, no such x exists; i.e., all x have property R.
The middle part of this proof would presumably be filled in by an argument
showing that x has some impossible property.
Chapter 2
Cardinality
In this chapter we want to investigate the sizes of sets, especially infinite sets. We will discover that infinite sets can come in different sizes and make an important distinction between the smallest infinite sets (called countably infinite sets) and larger infinite sets (called uncountable sets).
In Section 2.2 we will make precise what we mean by the size of a set and present some of the most important basic results on cardinality (the technical name for the size of a set). In Section 2.3, we will illustrate zig-zag and diagonalization, two important techniques of Cantor for establishing the cardinality of a set. But first, we turn our attention to some motivating examples from everyday life.
2.1 Counting, Children, and Chairs
We will motivate the definitions of cardinality by considering some simpler situations
that involve counting of everyday objects, like blocks and people and chairs. If you
have ever seen a young child attempt to count, you know that this is something
that must be learned, and that young children must pass through several phases as
they learn to count. In each phase, some new skill must be learned or some error
corrected.
The first step is to memorize a list of words: one, two, three, four . . . 1 Even
once the list can be repeated, in order and without error, the child cannot really
count anything until she understands how to associate these words with the objects
being counted.
And so the child is taught to say these words while pointing to objects or pictures
in a book. This phase of associating the number-words with objects develops slowly,
and at first the child is likely to make one or both of two errors that lead to a
miscount. The first error is to skip some of the objects and leads to an undercount.
The second error is to say more than one number while pointing to the same object
and leads to an overcount. So a typical young child presented with 7 objects might
obtain a “count” of ten by randomly pointing at objects and saying numbers until
they reach ten (a natural stopping point). Some objects will have been pointed at
more than once, of course. Perhaps others were never pointed at.
¹ Later, for counting larger sets, it will be necessary to grasp the pattern by which new names for numbers are generated, but this generally comes after a child has learned to count with the memorized list of numbers to, say, twenty.
Counting has only been mastered when the child understands that each object
must be associated with exactly one number from the list (no skips, no double
counts). The last number spoken is then called the size of the collection of objects.2
So we see that counting is the association of elements in two sets (in our example
above a set of number-words and a set of objects) such that each element of one
set is paired with exactly one element of the other set. We could visualize counting
three blocks as follows:
1 ↔ A
2 ↔ F
3 ↔ D
Now suppose you are having a dinner party. At some point shortly before dinner
you begin to fear that you do not have the correct number of chairs at the table.
There are two ways to check. One is to count the chairs and the people and see if the
numbers match. But there is another (albeit potentially more embarrassing) method.
Simply invite everyone to the table. If every person has a chair and every chair has
one person sitting on it, then the number of chairs and people is the same, even
though we don’t know just what that number is without some additional work. On
the other hand, if there is an unoccupied chair, or if there is a person left standing
...
The point here is that even though we do not know names for “infinite numbers”
(although they do exist) and do not have time to count elements in infinite sets (or
very large finite sets, for that matter) by pointing to them and reciting a memorized
list of words, we can still compare two sets to see if they are the same size by pairing
up elements in one set with elements in the other. Provided we do so while avoiding
the errors of skipping elements or double-matching elements, we will know our two
sets have the same size. We will formalize this in more mathematical language in
the next section.
2.2 Basic Cardinality Results
Mathematically, the “pairing up” of elements between two sets is done by means of
a function f from A to B (written f : A → B), which we can think of informally as
an assignment of an element in B to each element of A.
We want our function to avoid the errors of skipping or double-counting, so we
introduce the following definitions:
Definition 2.1 (one-to-one) A function f : A → B is one-to-one if for any a₁, a₂ ∈ A, if a₁ ≠ a₂, then f(a₁) ≠ f(a₂).
² A good indication that all this is in place is when a child says something like “one, two, three; three blocks”. The repetition of “three” indicates that the size of the collection of blocks has been associated with the last number used to count them. Thus “three” is actually being used in two related but somewhat different ways – no wonder it takes some practice to learn to count.
Definition 2.2 (onto) A function f : A → B is onto if for any b ∈ B, there is
some a ∈ A such that f (a) = b.
Notice that a one-to-one function is a function that does not double-count and an
onto function is one that does not skip. One-to-one functions are sometimes called
injective functions or injections. Onto functions are sometimes called surjective
functions or surjections. A function that is both one-to-one and onto is called a
bijection.
With all this background, it is pretty clear what it means to say that two sets
have the same size:
Definition 2.3 (Same-sized sets) Two sets A and B have the same cardinality
(written |A| = |B|) if there is a function f : A → B such that
• f is one-to-one, and
• f is onto.
Note that although the notation suggests that we have implicitly defined |A| and |B| (the sizes of the sets A and B), we have not really done so. This can be done by specifying the list of infinite numbers (called cardinals) to be used once we finish with the natural numbers, but it is not necessary for our purposes.³
Example. Let E be the non-negative even integers (E = {0, 2, 4, 6, 8, . . . }). Then |N| = |E|, since f : n ↦ 2n is one-to-one and onto.
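Definitions 2.1 and 2.2 can be checked mechanically on finite restrictions of a function. The Python sketch below (the helper names is_one_to_one and is_onto are ours, purely illustrative) tests the map n ↦ 2n from the example on the finite segments {0, . . . , 9} and {0, 2, . . . , 18}:

```python
def is_one_to_one(f, domain):
    # Definition 2.1: distinct inputs give distinct outputs.
    images = [f(a) for a in domain]
    return len(images) == len(set(images))

def is_onto(f, domain, codomain):
    # Definition 2.2: every element of the codomain is hit.
    return set(f(a) for a in domain) == set(codomain)

# Restrict f : n -> 2n to finite pieces of N and E.
f = lambda n: 2 * n
N10 = range(10)          # {0, 1, ..., 9}
E10 = range(0, 20, 2)    # {0, 2, ..., 18}
print(is_one_to_one(f, N10) and is_onto(f, N10, E10))  # → True
```

Of course this only checks a finite restriction; the definitions quantify over all of A and B.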
Notice that the example above shows that infinite sets behave a bit differently
from finite sets. The natural numbers include all of the evens, and much more
besides; nevertheless, these two sets have the same size. In fact, this can be taken
as the definition of an infinite set:
Definition 2.4 (Finite, infinite) A set A is infinite if it has a proper subset B ⊊ A such that |A| = |B|. Otherwise, A is finite.
If when we pair up the elements of A with elements of B we use up all of A but
perhaps have skipped over some elements of B, then A cannot be larger than B:
Definition 2.5 (No bigger than) A set A is no bigger than the set B (written
|A| ≤ |B|), if there is a function f : A → B, such that
• f is one-to-one.
³ The study of cardinal numbers and cardinal arithmetic is part of a branch of logic known as set theory. It turns out that the properties of cardinals are closely related to properties of sets and in fact depend to some extent on the axioms one chooses to use for set theory.
The notation chosen above suggests that “same size as” and “no bigger than”
behave in nice ways. The next two theorems demonstrate that this is the case:
Theorem 2.6 (Properties of =) “Same size as” is an equivalence relation. That
is,
1. “Same size” is reflexive: |A| = |A|.
2. “Same size” is symmetric: If |A| = |B|, then |B| = |A|.
3. “Same size” is transitive: If |A| = |B| and |B| = |C|, then |A| = |C|.
Theorem 2.7 (Properties of ≤)
1. “No bigger than” is reflexive: |A| ≤ |A|.
2. “No bigger than” is transitive: If |A| ≤ |B| and |B| ≤ |C|, then |A| ≤ |C|.
3. If A ⊆ B, then |A| ≤ |B|.
4. |B| ≤ |A| if and only if there is a function f : A → B that is onto.
5. If |A| ≤ |B| and |B| ≤ |A|, then |A| = |B|.
Thus the use of = and ≤ is justified in our notation. This also suggests some
extensions to the notation:
• |B| ≥ |A| means |A| ≤ |B|,
• |A| ≠ |B| means it is not the case that |A| = |B|,
• |A| < |B| means |A| ≤ |B| but |A| ≠ |B|.
Definition 2.8 (Countable) If |A| = |N|, then we say that A is countably infinite.
A countable set is any set that is either finite or countably infinite. In other words,
a countable set is the same size as some subset of N.
Most of the important infinite sets we will encounter in this class will be countably infinite. The following theorem establishes some nice properties of countable
sets.
Theorem 2.9 (Properties of countable sets)
1. The following sets are all countable: N, Z (the integers), Q (the rationals),
the set of even integers.
2. The following sets are uncountable: R (the real numbers), [0, 1] (the reals in
the interval from 0 to 1).
3. A finite union of countable sets is countable:
If A and B are countable, then A ∪ B is also countable.
4. The cross product of countable sets is countable:
Let A and B be countable, then A × B is countable.
c
R.
Pruim, December 4, 2006
Section 2.3
25
5. A countable union of countable sets is countable:
Suppose that for each natural number n, Aₙ is countable. Then A = A₀ ∪ A₁ ∪ A₂ ∪ · · · is also countable.
6. The set of all finite sequences from a countable set is countable:
Let A^{=n} denote the set of all sequences of n items from A. (For example, (1, 4, 3, 0) ∈ N^{=4}.) Let A∗ be the union of the sets A^{=n} over all n ∈ N. Then A^{=n} and A∗ are countable.
7. Every infinite set has a countably infinite subset.
Finally, Cantor’s diagonalization argument can be used to establish the following
general fact:
Theorem 2.10 (Power set) Let P(A) denote the power set of A (i.e., the set of
all subsets of A). Then |A| < |P(A)|.
Among other things, this shows that there is no largest size of set. This fact about
power sets is also a key ingredient in one of the famous paradoxes of naive set theory.
You are asked to prove part 3 of Theorem 2.9 in the exercises. The proofs of the rest of this theorem and of Theorem 2.10 will have to wait until we have seen Cantor’s two powerful techniques of zig-zag and diagonalization.
Exercises
1. Prove Theorem 2.6.
2. Prove Theorem 2.7.
3. Prove part 3 of Theorem 2.9.
4. Show that Z is countable.
5. Assuming that R is uncountable and Q is countable, determine whether the
set of irrationals is countable or uncountable. Justify your claim.
2.3 Cantor’s Two Ideas
Zig-Zag
Several of the countability results of the previous section can be proven using a
zig-zag argument that goes back to Cantor. We will use this method here to prove
part 4 of Theorem 2.9.
Proof (of Theorem 2.9, part 4). Let A and B be countably infinite. We must show
that A × B is countably infinite. (Strictly speaking, we need to deal with the cases
where one or both of the sets are finite, too, but we will only do the case where both
are infinite here.)
Since A and B are countably infinite, there are one-to-one, onto functions f : N → A and g : N → B. So A × B = {⟨f(i), g(j)⟩ | i, j ∈ N}. The key idea of Cantor
[Figure: the pairs ⟨f(i), g(j)⟩ of A × B arranged in a grid, with arrows tracing the zig-zag enumeration path.]

Figure 2.1: A picture of Cantor’s zig-zag idea.
is to arrange the elements of A × B in a rectangular grid filling one quadrant of the
plane:
As Figure 2.1 indicates, we can enumerate A × B beginning in the upper left-hand corner and following the arrows. In fact, by slightly modifying the zig-zag argument (travel each diagonal from top to bottom instead of zig-zagging), it is not too hard to give an exact formula for a one-to-one, onto function mapping A × B to N (or vice versa).
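This enumeration is easy to sketch in code. The Python generator below (illustrative; it works with index pairs (i, j) standing for ⟨f(i), g(j)⟩) travels each diagonal from top to bottom, as suggested, and position is one exact formula for where a pair lands in the enumeration:

```python
from itertools import islice

def diagonals():
    """Enumerate N x N diagonal by diagonal, each diagonal traveled
    from top to bottom.  Diagonal d holds the pairs with i + j = d,
    so every pair appears exactly once."""
    d = 0
    while True:
        for i in range(d + 1):
            yield (i, d - i)
        d += 1

def position(i, j):
    # The diagonals before diagonal i + j hold 1 + 2 + ... + (i + j)
    # pairs in total; (i, j) is the i-th pair on its own diagonal.
    d = i + j
    return d * (d + 1) // 2 + i

pairs = list(islice(diagonals(), 10))
print(pairs[:6])  # → [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0)]
print(all(position(i, j) == n for n, (i, j) in enumerate(pairs)))  # → True
```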
It is worthwhile to give another proof of this.
Proof (of Theorem 2.9, part 4). This time we will make use of part 5 of Theorem 2.7.
For this we need to exhibit one-to-one functions α : A × B → N and β : N → A × B.
The following two functions can easily be shown to be one-to-one (for the first we
use the fact that prime factorizations are unique):
α : ⟨f(i), g(j)⟩ ↦ 2^i 3^j
β : n ↦ ⟨f(n), g(0)⟩
Note that we have not yet proved part 5 of Theorem 2.7, so we would be in danger of circular reasoning if the proof of that result needed part 4 of Theorem 2.9 and we had only the second of these proofs.
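The injectivity of the first map can be spot-checked on a finite grid of index pairs. A small Python sketch (again illustrative; it works with the indices i, j rather than with f and g):

```python
# alpha sends the index pair (i, j) to 2^i * 3^j; by the uniqueness of
# prime factorizations, different pairs give different numbers.
def alpha(i, j):
    return 2**i * 3**j

pairs = [(i, j) for i in range(20) for j in range(20)]
values = [alpha(i, j) for (i, j) in pairs]
print(len(values) == len(set(values)))  # → True: no collisions
```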
Diagonalization
Cantor’s diagonalization idea is even cleverer than the previous idea. We will use it
here to prove Theorem 2.10.
Proof (of Theorem 2.10). Let A be any set; we need to show that |A| < |P(A)|.
First notice that clearly |A| ≤ |P(A)|, since
x 7→ {x}
is one-to-one.
The heart of the matter is to show that there is no function f : A → P(A) that is onto. We will do this using the method of “defeating an arbitrary example”. For any function f : A → P(A), we will describe a method to show that it is not onto. That is, for any such function f, we must find some subset S_f of A that gets “missed” by the function f. One such set is

S_f = {a ∈ A | a ∉ f(a)} .

Since for every a, a ∈ S_f ↔ a ∉ f(a), we see that S_f ≠ f(a); that is, S_f is missed by the function f. Since we have not assumed anything special about f, this shows that no function f can be onto. Therefore |A| ≠ |P(A)|.
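For finite sets the construction of the missed set is easy to carry out concretely. A small Python sketch (the set A and the attempted function f below are our own toy choices):

```python
# "Defeating an arbitrary example": for any f : A -> P(A), the set
# S_f = {a in A | a not in f(a)} differs from f(a) at a, for every a.
def diagonal_set(A, f):
    return {a for a in A if a not in f(a)}

A = {0, 1, 2}
f = {0: {0, 1}, 1: set(), 2: {1, 2}}   # one attempt at f : A -> P(A)
S = diagonal_set(A, lambda a: f[a])
print(S)                               # → {1}
print(all(S != f[a] for a in A))       # → True: S is missed by f
```

No matter how f is chosen, the same construction produces a subset outside its range.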
Exercises
1. Prove Theorem 2.9.
2. Show that the set of irrational numbers is uncountable.
3. Which of the following are countable, which are uncountable?
(a) The set of all functions from N to N.
(b) The set of all one-to-one functions from N to N.
(c) The set of all functions from N to N that are eventually 0.
(d) The set of all functions from N to N that are eventually constant.
Chapter 3
Mathematical Induction
3.1 Basics
Mathematical induction is a very important proof technique that we will use often.
Let N be the set {0, 1, 2, ...} of natural numbers. The principle of Mathematical
Induction is used to prove statements of the form,
For all natural numbers n, . . .
Using the notation of FOL, we can represent this as
∀n P (n)
where our domain of discourse is the natural numbers. This, in turn, we can express
by saying that
X := {n | P (n)} = N
Both ways of thinking (in terms of a set X or a predicate P ) are useful. Focussing
on the predicate P reminds us that there is some statement about natural numbers;
focussing on X reminds us that such a statement will be either true or false for each
natural number, and we are trying to show that it is true for all of them.
There are several equivalent forms of the induction principle. In some sense,
the most basic of these is the following, which we can express using either logic
notation or set notation. It is often referred to as the Principle of Weak Induction
to distinguish it from some of the other versions.
Principle 3.1 (Principle of Mathematical Induction, Logic Form). Suppose that the following two statements are true, where P is a unary predicate and the domain of discourse is the natural numbers:
1. P (0), and
2. ∀k (P (k) → P (k + 1)).
Then ∀n P (n) is true.
Principle 3.2 (Principle of Mathematical Induction, Set Form). Suppose that X ⊆ N satisfies
1. 0 ∈ X, and
2. for every k, if k ∈ X, then k + 1 ∈ X;
then X = N.
The following theorem shows a fairly typical use of the induction principle.
Theorem 3.3 For every n ∈ N,

0 + 1 + 2 + · · · + n = n(n + 1)/2 .    (3.1)
Proof. In this case P (n) is the statement represented by Equation 3.1. Let X
be the set of natural numbers for which P (n) is true. To prove that every natural
number is in X we must prove two things.
1. First, we must show that 0 ∈ X; that is, we must show that P (0) is true.
This is true since 0 = 0(1)/2.
2. Second, we must show that for every natural number k, if k ∈ X, then k + 1 ∈
X. In other words, if P (k) is true, then P (k + 1) is also true.
In the current example, this means that we must show that if 0+1+2+· · ·+k =
k(k + 1)/2, then 0 + 1 + 2 + · · · + (k + 1) = (k + 1)(k + 1 + 1)/2.
To do this, suppose that k is an integer satisfying

0 + 1 + 2 + · · · + k = k(k + 1)/2 .    (3.2)

Then

0 + 1 + 2 + · · · + (k + 1) = k(k + 1)/2 + (k + 1)        (3.3)
                            = (k(k + 1) + 2(k + 1))/2     (3.4)
                            = (k + 1)(k + 2)/2 .          (3.5)
But (3.5) is just what we need to show to demonstrate that k + 1 ∈ X. Thus,
by the Principle of Mathematical Induction, X contains all natural numbers,
and thus 3.1 is true for all natural numbers.
In a proof using the Principle of Mathematical Induction, step 1 above is referred
to as the base case or the basis step of the proof, and step 2 is called the inductive
step.
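A finite check is evidence rather than proof, but the formula in Theorem 3.3 is easy to spot-check by machine. A small Python sketch:

```python
# Spot-check Equation 3.1 for the first 100 natural numbers; the
# induction argument is what covers all n at once.
def lhs(n):
    return sum(range(n + 1))       # 0 + 1 + ... + n

def rhs(n):
    return n * (n + 1) // 2

print(all(lhs(n) == rhs(n) for n in range(100)))  # → True
```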
We will not prove the Principle of Mathematical Induction. In fact there really is
no way to prove it without assuming a principle at least as strong. We’ll come back
to this issue later. For now, we will notice that the Principle of Induction seems
to make sense intuitively, and assume that it is a valid form of reasoning about the
natural numbers. Also note that we can express an instance of induction in FOL as
the sentence
[P (0) ∧ ∀k (P (k) → P (k + 1))] → ∀n P (n)
There are several other forms of mathematical induction which are frequently
used. Suppose for instance that we are trying to prove a proposition that holds of
all but finitely many integers. It is easy to see that the following principle is true
and useful in this situation.
Principle 3.4 Suppose that X is a set of natural numbers and c is a natural number
such that
1. c ∈ X, and
2. for every k ≥ c, if k ∈ X, then k + 1 ∈ X.
Then for every natural number n ≥ c, n ∈ X.
Consider the inequality 2n < 2ⁿ. Since 2ⁿ grows faster than any polynomial in n, it is easy to believe that

Theorem 3.5 There is a number c such that 2n < 2ⁿ, provided n ≥ c.

Proof. Notice first that the inequality does not hold for n = 1 or n = 2, but that it does hold for n = 3, since 6 < 8. So let’s take c = 3 in Theorem 3.5. This establishes the base case, when n = 3.

Now suppose then that k ≥ 3 and that 2k < 2ᵏ. We must show that 2(k + 1) < 2ᵏ⁺¹. But

2(k + 1) = 2k + 2 < 2ᵏ + 2ᵏ = 2(2ᵏ) = 2ᵏ⁺¹ .
The following principle is sometimes called Strong Induction. We express it here in set form, but it can also be expressed in logic form.
Principle 3.6 (Strong Principle of Mathematical Induction) Suppose that X is a set of natural numbers such that for every k ∈ N, if for all j < k, j ∈ X, then k ∈ X. Then X = N.
There are two ways in which Principle 3.6 appears stronger than Principle 3.2.
First, the basis step of the induction is missing. Second, in proving the induction
step, one is allowed to assume that all smaller numbers are in the set X rather than
just that the predecessor of the number in question belongs to X. Actually, the
fact that the basis step is missing is only an apparent strengthening; usually the
proof of the induction step for k = 0 amounts to a basis step and must be handled
separately. In fact, Principle 3.6 is not stronger than Principle 3.2 since it is possible
to prove the strong form of the principle from the weak form.
Theorem 3.7 The Fibonacci sequence is the sequence {fₙ} of natural numbers defined by the following rules: f₁ = 1, f₂ = 1, and fₙ = fₙ₋₁ + fₙ₋₂. For every n ∈ N,

fₙ = (1/√5) ( ((1 + √5)/2)ⁿ − ((1 − √5)/2)ⁿ )    (3.6)
Proof. We will use the strong form of the induction principle this time because the definition of Fibonacci numbers depends on more than just the immediately preceding term. It is inevitable that the proofs for n = 1 and n = 2 will be different from those for other n. We could think of this argument as having two basis steps. For these basis steps, it is easy to see that (3.6) is true for n = 1 and n = 2.

For the rest of the proof, let k ≥ 3 be an integer such that (3.6) holds for all n < k. Showing that (3.6) holds for n = k just requires a little high school algebra. First, let c = (1 + √5)/2 and let d = (1 − √5)/2. Notice that c and d are the two solutions to x² − x − 1 = 0, so c² = c + 1 and d² = d + 1. This simplifies our algebra:
fₖ = fₖ₋₁ + fₖ₋₂ = (1/√5)(cᵏ⁻¹ − dᵏ⁻¹) + (1/√5)(cᵏ⁻² − dᵏ⁻²)
                = (1/√5)(c · cᵏ⁻² − d · dᵏ⁻²) + (1/√5)(cᵏ⁻² − dᵏ⁻²)
                = (1/√5)((c + 1)cᵏ⁻² − (d + 1)dᵏ⁻²)
                = (1/√5)((c²)cᵏ⁻² − (d²)dᵏ⁻²)
                = (1/√5)(cᵏ − dᵏ)
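Formula (3.6) is easy to check against the recurrence for small n. A Python sketch (floating-point arithmetic, so we round the closed-form value before comparing):

```python
import math

# The recurrence definition of the Fibonacci numbers (f1 = f2 = 1).
def fib(n):
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

# The closed form (3.6); exact in real arithmetic, approximate in floats.
def closed_form(n):
    c = (1 + math.sqrt(5)) / 2
    d = (1 - math.sqrt(5)) / 2
    return (c**n - d**n) / math.sqrt(5)

print(all(fib(n) == round(closed_form(n)) for n in range(1, 30)))  # → True
```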
Most logicians prefer to take the following principle as the fundamental form
of induction. It is equivalent to the forms we have already seen (see the exercises
below).
Principle 3.8 (Well-Ordering Principle) Suppose that X is a nonempty set of natural numbers. Then X has a least element. That is, there is a natural number m ∈ X such that for all n ∈ X, m ≤ n.
Exercises
1. Show that 1² + 2² + · · · + n² = n(n + 1)(2n + 1)/6 for every natural number n.
2. Show that for any natural number n ≥ 0,

Σ_{j=1}^{n} 1/(j(j + 1)) = n/(n + 1) .
3. For what natural numbers n is n2 < 2n ? Prove your claim.
4. Prove that every integer n ≥ 2 can be written as a product of prime numbers
in exactly one way (up to the order of the primes).
[Hints: Use Strong Induction. Recall that a prime number is a natural number
greater than 1 that is not the product of two smaller natural numbers.]
5. Suppose that you have available only 3- and 5-cent stamps. Show that you can make any amount in postage greater than 7 cents.
6. A binary string is a finite sequence of 0’s and 1’s.
(a) Let B(n) be the number of binary strings of length n. Show by induction
that B(n) = 2n .
(b) Let’s call a binary string α = b₁b₂ . . . bₙ good if it does not contain two adjacent 1’s. Let G(n) be the number of strings of length n that are good. Notice that G(1) = 2 and G(2) = 3.
Determine what G(n) is and prove your claim using induction.
7. Prove the following facts about the Fibonacci sequence.
(a) Show that f₁ + f₂ + · · · + fₙ = fₙ₊₂ − 1 for every n.
(b) Show that f₂ + f₄ + · · · + f₂ₙ = f₂ₙ₊₁ − 1.
(c) Show that fₙ₊ₘ₊₁ = fₙ · fₘ + fₙ₊₁fₘ₊₁ for all m, n ∈ N.
[Hint: Hold m constant and use induction on n.]
8. Principle 3.2 and Principle 3.1 are equivalent. For a given X in Principle 3.2,
what is the corresponding predicate P in Principle 3.1?
9. Express the Strong Principle of Mathematical Induction (Principle 3.6) in logic
form.
10. Prove that Principle 3.2 follows from Principle 3.6.
11. Prove that Principle 3.6 follows from Principle 3.2.
12. Prove that Principle 3.2 follows from Principle 3.8.
13. Prove that Principle 3.8 follows from Principle 3.2.
3.2 Inductive Definitions
We have already seen several inductive definitions, but now it is time to consider these more carefully. Our primary example will be the important one of sentences (or wffs) in FOL.
Recall that for a given first order language, the following elements must be
specified:
• a set of constant symbols,
• a set of predicate symbols (with specified arity),
• a set of symbols for the boolean connectives (∧, ∨, ¬, →, ↔), and
• parentheses: ( and ).
We are assuming of course that all of the symbols above are different so that we
can distinguish them.
Given this, we can define the sentences of the language.
Definition 3.9 The set of sentences of propositional logic is defined as follows.
1. Every atomic sentence is a sentence.
2. If A and B are sentences, then so are
(a) (¬A),
(b) (A ∧ B),
(c) (A ∨ B),
(d) (A → B),
(e) (A ↔ B),
3. No sequence of symbols is a sentence unless it is required to be one by rules 1 and 2.
This definition seems somewhat circular; it appears that in part 2 we are required
to know already what a sentence is in order to define one. This definition is an
example of what is known as an inductive definition. We shall discuss inductive
definitions in general in a moment. However, we can see how Rule 2 is used by
considering an example. Recall that there are only countably many atomic sentences,
so rather than specify our language, we will assume a fixed ordering of all atomic
sentences: A1 , A2 , . . . Finally, note that our official definition of a sentence is very
strict about where parentheses may and must occur.
Example. ((¬(A1 ∧ A2)) → A3) is a sentence in propositional logic.
We can show this by a line-by-line derivation.
     Sentence                    Reason
1)   A1                          Rule 1
2)   A2                          Rule 1
3)   (A1 ∧ A2)                   Rule 2b and Lines 1, 2
4)   (¬(A1 ∧ A2))                Rule 2a and Line 3
5)   A3                          Rule 1
6)   ((¬(A1 ∧ A2)) → A3)         Rule 2d and Lines 4 and 5
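We can mirror such a derivation in code by turning the rules of Definition 3.9 into constructor functions. The Python sketch below is illustrative (the names atom, neg, conj, and impl are ours); it builds sentences as strings with the definition’s strict parenthesization and reproduces lines 3, 4, and 6 of the derivation:

```python
# Each constructor corresponds to a rule of the inductive definition.
def atom(i):              # rule 1: the i-th atomic sentence
    return f"A{i}"

def neg(a):               # rule 2a: (¬A)
    return f"(¬{a})"

def conj(a, b):           # rule 2b: (A ∧ B)
    return f"({a} ∧ {b})"

def impl(a, b):           # rule 2d: (A → B)
    return f"({a} → {b})"

# The line-by-line derivation from the example:
s3 = conj(atom(1), atom(2))    # line 3
s4 = neg(s3)                   # line 4
s6 = impl(s4, atom(3))         # line 6
print(s6)  # → ((¬(A1 ∧ A2)) → A3)
```

A string is a sentence exactly when it can be produced by some sequence of these constructor calls starting from atoms.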
This definition of a sentence is an example of an inductive definition. In an
inductive definition of a set X, we specify which elements of a (larger) set S are in
X by giving two types of rules.
• One type of rule, a basis rule, specifies that certain elements of S are in X
unconditionally.
• The second sort of rule in an inductive definition, called an inductive rule,
specifies that a certain element (called the conclusion of the rule) is in X
whenever certain other elements (called the hypotheses of the rule) are in X.
It is easy to see that our definition of a sentence fits this description. (The set
S is the set of all sequences of symbols.) Here are some additional examples:
Definition 3.10 Given a set S, the set S ∗ is defined as follows.
1. The empty sequence, denoted λ, is an element of S ∗ .
2. If σ ∈ S ∗ and x ∈ S, then σx is in S ∗ .
Notice that in Definition 3.10 we have omitted the analogue of clause 3 in the
definition of a sentence. That is, we have not explicitly stated that rules 1 and 2 are
the only way to get into S ∗ . We will always assume that such a condition is meant
in an inductive definition.
Definition 3.11 The set of arithmetic expressions, EXP, is defined as follows.
1. Any integer constant is in EXP.
2. If σ ∈ EXP and τ ∈ EXP , then so are −σ, σ + τ , σ − τ , σ · τ , and σ/τ .
It is easy to see that 2 + 3/4 is in EXP according to Definition 3.11. One property of the definition of EXP bears mentioning at this point. Consider the arithmetic expression 2 + 3 · 4. This expression can be shown to be in EXP in two fundamentally different ways.

This is related to the fact that 2 + 3 · 4 is ambiguous as an arithmetic expression: some people would evaluate it as 20 and others as 14. This ambiguity is not present in our definition of sentences in FOL, and this is precisely why we introduce parentheses (or precedence rules). We will return to the idea of ambiguity in a later section.
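The two readings correspond to two different parse trees. A small Python sketch (using nested tuples as an illustrative tree representation; the evaluator ev is ours):

```python
# An expression is an int (a constant) or a tuple (op, left, right).
def ev(e):
    if isinstance(e, int):
        return e
    op, left, right = e
    return ev(left) + ev(right) if op == "+" else ev(left) * ev(right)

tree1 = ("*", ("+", 2, 3), 4)   # read as (2 + 3) * 4
tree2 = ("+", 2, ("*", 3, 4))   # read as 2 + (3 * 4)
print(ev(tree1), ev(tree2))  # → 20 14
```

Both trees flatten to the same string 2 + 3 · 4, which is exactly what it means for the string to be ambiguous.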
Example.
Let B = {a, b, c}∗ . Let A ⊆ B be the set defined as follows.
1. The empty word (λ) is in A.
2. If α ∈ A, then so are aα, αc, and bαc.
(Here aα means the word formed by adjoining a to the beginning of α and similarly
for the other cases.)
It is easy to see that abbccc ∈ A. In the next section, we will see how to prove
propositions about inductive sets. But you should be able to convince yourself that
the set A defined in the example above consists of all sequences α in B that contain
at least as many c's as b's and in which no c precedes any b or a.
It is also possible to define objects other than sets using inductive definitions.
For example, we can define a function.
Definition 3.12 Let depth be a function mapping propositional sentences to the
natural numbers defined as follows:
1. depth(ϕ) = 0 for any atomic sentence ϕ.
2. If ϕ and ψ are sentences, then
• depth(¬ϕ) = 1 + depth(ϕ),
• depth(ϕ ∧ ψ) = 1 + max(depth(ϕ), depth(ψ)),
• depth(ϕ ∨ ψ) = 1 + max(depth(ϕ), depth(ψ)),
• depth(ϕ → ψ) = 1 + max(depth(ϕ), depth(ψ)), and
• depth(ϕ ↔ ψ) = 1 + max(depth(ϕ), depth(ψ)).
In a certain sense, this is simply another instance of defining a set inductively.
To show that this definition makes sense, we need to show that the
function is well-defined; that is, that the function depth is defined in exactly one
way for every propositional sentence. This amounts to working with the inductively
defined set X of sentences for which depth is well-defined. We will return to this
example in the next section.
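Definition 3.12 translates directly into a recursive program. In the Python sketch below (illustrative; we assume a tuple representation of sentences, with a string for an atomic sentence), the five clauses collapse to three cases.

```python
def depth(phi):
    """Depth of a propositional sentence given as a nested tuple:
    a string for an atomic sentence, ("not", phi) for a negation,
    or (op, phi, psi) for a binary connective."""
    if isinstance(phi, str):
        return 0                           # clause 1: atomic sentences
    if phi[0] == "not":
        return 1 + depth(phi[1])           # depth((not phi)) = 1 + depth(phi)
    _, left, right = phi
    return 1 + max(depth(left), depth(right))  # all four binary clauses

# depth of ((A1 and A2) or (not A3)):
print(depth(("or", ("and", "A1", "A2"), ("not", "A3"))))  # 2
```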
Exercises
1. Consider the following inductive definition of a set of sequences A ⊆ {a, b}∗ .
• λ ∈ A;
• if σ ∈ A, then aσa ∈ A and bσb ∈ A.
Give a characterization of A. (That is, describe the elements of A.)
2. Let S = {M, U, I}, and define L ⊆ S∗ by
• MI ∈ L,
• if Mσ ∈ L, then Mσσ ∈ L,
• if σI ∈ L, then σIU ∈ L,
• if σIIIτ ∈ L, then σUτ ∈ L,
• if σUUτ ∈ L, then στ ∈ L,
where σ and τ range over S∗.
(a) Show that MUIIU ∈ L.
(b) (Harder) Is MU ∈ L?
This problem (from Gödel, Escher, Bach by Douglas Hofstadter) is known as the
MU puzzle.
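For part (a), one mechanical approach is a bounded breadth-first search over the rewrite rules. The Python sketch below (not part of the notes, and no substitute for writing out a derivation by hand) confirms that MUIIU is reachable from MI.

```python
from collections import deque

def mu_neighbors(s):
    """All strings reachable from s by one application of the four rules."""
    out = set()
    if s.endswith("I"):
        out.add(s + "U")                       # rule: sI -> sIU
    if s.startswith("M"):
        body = s[1:]
        out.add("M" + body + body)             # rule: Ms -> Mss
    for i in range(len(s)):
        if s[i:i + 3] == "III":
            out.add(s[:i] + "U" + s[i + 3:])   # rule: sIIIt -> sUt
        if s[i:i + 2] == "UU":
            out.add(s[:i] + s[i + 2:])         # rule: sUUt -> st
    return out

def reachable(target, max_len=12):
    """Breadth-first search from MI, pruning strings longer than max_len."""
    seen, queue = {"MI"}, deque(["MI"])
    while queue:
        s = queue.popleft()
        if s == target:
            return True
        for t in mu_neighbors(s):
            if len(t) <= max_len and t not in seen:
                seen.add(t)
                queue.append(t)
    return False

print(reachable("MUIIU"))  # True: MUIIU is derivable
```

Note that a bounded search returning False for MU would not settle part (b); that requires an invariant argument (see Exercise 4 in the next section).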
3.3 Structural Induction
We will often wish to prove that every element of some inductive set has a particular
property. The method we will usually use to do this is similar to mathematical
induction (in fact, it is a direct consequence of induction). We illustrate the method
using the particular case of propositional sentences.
Theorem 3.13 Let X be a set of propositional sentences. Suppose that
1. Every atomic sentence is in X.
2. For all propositional sentences ϕ, ψ, if ϕ ∈ X and ψ ∈ X, then so are
(a) (¬ϕ),
(b) (ϕ ∧ ψ),
(c) (ϕ ∨ ψ),
(d) (ϕ → ψ), and
(e) (ϕ ↔ ψ).
Then for every propositional sentence ϕ, ϕ ∈ X.
Proof. The proof is by mathematical induction. As described in the last section,
every propositional sentence ϕ has a derivation; that is, a finite sequence of
propositional sentences ϕ1, . . . , ϕn such that ϕn = ϕ and each ϕi in the sequence
either is atomic or is formed by applying one of the inductive rules with hypotheses
among ϕ1, . . . , ϕi−1.
We will use strong induction on the length n of the derivation to show that
ϕ ∈ X. That is, we will use induction to show that for all n,
P (n) := if ϕ is a propositional sentence with a derivation of length n, then ϕ ∈ X.
Let n be the length of a derivation of ϕ. By induction we can assume that ψ ∈ X
is true for all propositional sentences ψ with derivations of length less than n. In
particular, this means that every propositional sentence appearing in the derivation
of ϕ belongs to X. But then ϕ also belongs to X since if it is atomic it belongs by
rule (1), and otherwise it belongs by rule (2).
In applying Theorem 3.13, verifying that (1) is true is called the basis step and
showing that (2) holds is the inductive step. We illustrate the use of Theorem 3.13
with an easy example.
Theorem 3.14 Let ϕ be a propositional sentence. Then ϕ has the same number of
left parentheses as right parentheses.
Proof. For propositional sentences ϕ, let L(ϕ) and R(ϕ) denote the number of
left and right parentheses of ϕ respectively. Let X be the set of all propositional
sentences such that L(ϕ) = R(ϕ). We wish to show that every propositional sentence
ϕ belongs to X. The basis step, corresponding to (1) of Theorem 3.13, is to show
that every atomic sentence is in X. But L(ϕ) = 0 = R(ϕ) for any atomic sentence
ϕ.
The first inductive step, corresponding to (2) of Theorem 3.13, is to show that
if ϕ ∈ X then (¬ϕ) ∈ X. Since ϕ ∈ X, L(ϕ) = R(ϕ). But then
L((¬ϕ)) = 1 + L(ϕ) = 1 + R(ϕ) = R((¬ϕ)) .
Thus (¬ϕ) ∈ X.
The next inductive step is to assume that ϕ and ψ are in X and to show that
each of (ϕ ∧ ψ), (ϕ ∨ ψ), (ϕ → ψ) and (ϕ ↔ ψ) is also in X. We can do these all at
once by letting ◦ stand for an arbitrary boolean operator. (Alternatively, we might
just give the proof for one example operator and claim that the rest are similar.)
Suppose then that L(ϕ) = R(ϕ) and L(ψ) = R(ψ). Then
L((ϕ ◦ ψ)) = 1 + L(ϕ) + L(ψ) = 1 + R(ϕ) + R(ψ) = R((ϕ ◦ ψ)).
Thus (ϕ ◦ ψ) ∈ X.
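Theorem 3.14 can be spot-checked by machine. The Python sketch below (illustrative; ASCII symbols ~ & | > = stand in for the connectives, and atomic sentences are taken to be parenthesis-free) generates random fully parenthesized sentences and checks the balance claim on each.

```python
import random

def random_sentence(depth_limit):
    """Build a random fully parenthesized propositional sentence as a string."""
    if depth_limit == 0 or random.random() < 0.3:
        return random.choice(["A1", "A2", "A3"])      # atomic: no parentheses
    if random.random() < 0.2:
        return "(~" + random_sentence(depth_limit - 1) + ")"
    op = random.choice(["&", "|", ">", "="])
    return ("(" + random_sentence(depth_limit - 1) + op
            + random_sentence(depth_limit - 1) + ")")

# Theorem 3.14 in action: left and right parentheses always balance.
for _ in range(1000):
    s = random_sentence(5)
    assert s.count("(") == s.count(")")
```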
Theorem 3.15 The function depth defined in Definition 3.12 is well-defined. That
is, for every propositional sentence ϕ, depth(ϕ) is defined in exactly one way.
Proof. The proof is almost trivial. The statement holds for atomic sentences, since
they all have depth 0 and cannot be assigned another value by the recursive part of
the definition.
For sentences that are not atomic, Theorem 3.13 implies that depth is defined
in at least one way. It only remains to show that no sentence ϕ is given two different
values by the definition. For this, we need to know that our definition of a propositional
sentence is unambiguous, that is, that there is only one way to derive it from our
inductive definition. We will leave the proof of this “Nonambiguity Theorem” for a
bit later.
Note that nonambiguity is important in the proof above. For example, consider
the obvious definition of a value function on the set EXP from Definition 3.11.
Because of the ambiguity of this definition, the value of 2 + 3 · 4 would not be well
defined (it could be 14 or it could be 20).
The next Theorem gives another example of this method of proof and will play
an important role in the Nonambiguity Theorem that remains outstanding.
Theorem 3.16 Every proper nonempty initial segment of an atomic propositional
sentence has no right parentheses and either 0 or 1 left parentheses. Every proper
nonempty initial segment of a non-atomic propositional sentence has more left parentheses than right parentheses.
Proof. Let X be the set of propositional sentences ϕ for which the theorem is true;
that is, X is the set of all propositional sentences ϕ for which every proper nonempty
initial segment of ϕ has more left parentheses than right parentheses. We wish to
show that every propositional sentence is in X by using structural induction. The
basis step, the case of atomic sentences, is clearly true since atomic sentences have
only two parentheses and the left parenthesis comes before the right.
For the first inductive step, suppose that ϕ is a propositional sentence with this
property and consider (¬ϕ). The proper initial segments of (¬ϕ) are obviously (,
(¬, (¬u where u is a nonempty proper initial segment of ϕ, and (¬ϕ. The initial
segments ( and (¬ obviously have more left parentheses than right parentheses. If
u is a proper initial segment of ϕ, u has at least as many left parentheses as right
parentheses by the induction hypothesis. Thus, so does (¬u. Finally, ϕ has the
same number of left as right parentheses by Theorem 3.14 and so (¬ϕ has one more
left than right parentheses.
Next we assume that ϕ and ψ satisfy this property and consider (ϕ ∧ ψ). The
proper initial segments of (ϕ ∧ ψ) are (, (u where u is a proper initial segment of ϕ,
(ϕ, (ϕ∧, (ϕ ∧ v where v is a proper initial segment of ψ, and (ϕ ∧ ψ. These cases can
again be handled by using the induction hypothesis and Theorem 3.14. The other
induction steps are similar.
The next corollary follows immediately from the previous two theorems.
Corollary 3.17 If ϕ is a propositional sentence, then no proper initial segment of
ϕ is a propositional sentence.
The general principle of structural induction is the following Theorem.
Theorem 3.18 Let S be an inductively defined set and let X ⊆ S be such that the
following hold:
1. For every basis rule in the definition of S, if x is placed in S by that basis rule,
x ∈ X.
2. For every inductive rule in the definition of S, if each hypothesis of the rule
is in X, and x is the conclusion of that rule, then x ∈ X.
Then X = S.
As another example of Theorem 3.18 in action, recall that we considered the
following inductive definition of a language on the alphabet {a, b, c}.
Definition 3.19 Let B = {a, b, c}∗ . Let A ⊆ B be the set defined as follows.
1. The empty word is in A.
2. If σ is a word in A then so are aσ, σc, and bσc.
We now prove the following Theorem which justifies a claim we made in the last
section.
Theorem 3.20 Let A be the language defined by Definition 3.19. Then for every
σ in A, σ has at least as many occurrences of c as of b, and in σ, no occurrence of
c precedes any of b or a.
Proof.
The basis step of the proof corresponds to rule 1 of Definition 3.19. It is
clear that the empty word has both of the properties mentioned. For the inductive
steps, suppose that σ is a word satisfying the two properties in the statement of the
Theorem. Then we must show that aσ, σc, and bσc do as well. We only consider
the case of bσc. Since σ has at least as many occurrences of c as b, so does bσc since
this word has one more of each of b and c than does σ. Furthermore, no occurrence
of c in bσc precedes any occurrence of b since such a c would have to precede a b in
σ. Similarly, no occurrence of c precedes any of a.
We will not prove it here, but the property in Theorem 3.20 in fact characterizes
the set A defined by Definition 3.19.
Exercises
1. Show that every propositional sentence consists of only finitely many symbols.
2. Let c(ϕ) be the number of occurrences of binary connectives in propositional
sentence ϕ and let p(ϕ) be the number of occurrences of predicates in ϕ. Show
that c(ϕ) + 1 = p(ϕ).
3. Let a set E of natural numbers be defined by this inductive definition
(a) 0 ∈ E,
(b) If n ∈ E then n + 2 is in E.
Show that E is the set of even numbers. First use structural induction to show
that every element of E is even. Then use induction on N to show that if n is
even, n ∈ E.
4. Solve the MU puzzle (see Problem 2 on page 36) by using structural induction
to show that every string in L has a certain property not enjoyed by MU.
5. Give an inductive definition of the set of “good” strings of Problem 6 on
page 32.
3.4 Ambiguity
We now turn to the question of ambiguity in an inductive definition. We saw in
Section 3.2 that Definition 3.11 of arithmetic expressions gave rise to ambiguous
expressions. That is, there were expressions (2 + 3 · 4, for example) with two fundamentally different derivations. This ambiguity leads to an ambiguity in interpretation. We want to argue that such ambiguity is not present in the definition of a
propositional sentence. We first give a precise statement of what it means for an
inductive definition to be unambiguous.
Definition 3.21 An inductive definition of a set S is unambiguous if the following
hold.
1. For each element of S there is only one rule which puts the element into S.
2. If the element is put into S by an inductive rule, then there is only one sequence
of hypotheses which can be used with the rule to put the element into the set.
A good way of thinking of this definition is that an inductive definition is unambiguous
if there is exactly one tree for each element of the inductively defined
set. Note that nonambiguity is a property of the inductive definition and not of
the inductive set. A set S may have both ambiguous and nonambiguous inductive
definitions.
Theorem 3.22 The inductive definition of propositional sentences given in Definition 3.9 is unambiguous.
Proof. We must establish that for each propositional sentence ϕ there is only one
rule that puts ϕ into the set of sentences. Furthermore, if that rule is an inductive
one, we must show that the hypotheses of the rule are unique. For this purpose, the
various cases of part 2 of Definition 3.9 are considered separate rules. Suppose then
that ϕ is a sentence. If ϕ is put into the set of sentences by Rule 1, then ϕ is atomic.
Since atomic sentences do not begin with ‘(’, atomic sentences cannot be put into
the set of sentences by any other rule.
Now if ϕ is a sentence by Rule 2a, then ϕ begins (¬, and hence can’t be a
sentence due to any other rule. Furthermore, there is only one possible ψ such that
ϕ = (¬ψ).
Finally suppose that ϕ is put into the set of formulas by one of the remaining
rules. Then ϕ has the form (α ◦ β) for some operator ◦ and sentences α
and β. Now suppose that ϕ = (α ◦ β) = (γ ⋄ δ), where γ and δ are sentences and ⋄
is a boolean operator. Notice that α ◦ β) = γ ⋄ δ), and by Corollary 3.17, neither of
α or γ can be a proper initial segment of the other. Thus, α = γ. But from this it
follows that ◦ = ⋄, and finally that β = δ. So once again there is only one way that
our sentence can enter the set of propositional sentences.
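The proof of Theorem 3.22 is essentially an algorithm for recovering the main connective, and Corollary 3.17 is what makes the algorithm deterministic. A Python sketch (illustrative; ASCII stand-ins ~ & | > = for the usual connective symbols):

```python
def main_connective(s):
    """Split a fully parenthesized compound sentence at its main connective.
    By Corollary 3.17, exactly one operator occurs at parenthesis depth 1,
    so the split below is unambiguous."""
    assert s.startswith("(") and s.endswith(")")
    if s[1] == "~":
        return ("~", s[2:-1])                   # negation: (~alpha)
    bal = 0
    for i, ch in enumerate(s):
        if ch == "(":
            bal += 1
        elif ch == ")":
            bal -= 1
        elif ch in "&|>=" and bal == 1:
            return (ch, s[1:i], s[i + 1:-1])    # binary: (alpha op beta)

print(main_connective("((A1&A2)|(~A3))"))  # ('|', '(A1&A2)', '(~A3)')
```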
Note that although parentheses are important for avoiding ambiguity, it is handy
for readability to drop some parentheses from our sentences, especially the outermost
parentheses.
As we mentioned in Section 3.2, a nonambiguous inductive definition allows us
to define functions on S by structural induction. Our definitions of the extension of
a truth assignment (ĥ) and of the depth function are examples. Additional examples
are given in the exercises.
Exercises
1. Give a precise definition of a subsentence of a propositional sentence. (Hint:
What you are defining is a function S such that S(ϕ) is the set of subsentences
of ϕ. Define this function using structural induction.)
2. Let S be the set of propositional sentences and define the function g : S → N
inductively as follows:
• g(Ai) = 2^i, where Ai is the ith atomic sentence,
• g(¬ϕ) = 3^g(ϕ),
• g(ϕ ∧ ψ) = 5^g(ϕ) · 7^g(ψ),
• g(ϕ ∨ ψ) = 11^g(ϕ) · 13^g(ψ),
• g(ϕ → ψ) = 17^g(ϕ) · 19^g(ψ), and
• g(ϕ ↔ ψ) = 23^g(ϕ) · 29^g(ψ).
The number g(ϕ) is called the Gödel number of the sentence ϕ.
(a) Compute the Gödel number of (A1 ∨ (¬A2)).
(b) Show that g is one-to-one.
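The function g is easy to compute. A Python sketch (illustrative; we assume a tuple representation of sentences, with the integer i standing for A_i, and `imp`/`iff` as our labels for → and ↔):

```python
def godel(phi):
    """Gödel number of a sentence given as nested tuples; the integer i
    stands for the atomic sentence A_i."""
    if isinstance(phi, int):
        return 2 ** phi                                   # g(A_i) = 2^i
    if phi[0] == "not":
        return 3 ** godel(phi[1])                         # g(not phi) = 3^g(phi)
    primes = {"and": (5, 7), "or": (11, 13), "imp": (17, 19), "iff": (23, 29)}
    p, q = primes[phi[0]]
    return p ** godel(phi[1]) * q ** godel(phi[2])        # prime powers keep g one-to-one

print(godel(("not", 1)))     # 3^(2^1) = 9
print(godel(("and", 1, 2)))  # 5^2 * 7^4 = 60025
```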
Chapter 4
Set Theory
As a branch of mathematics, set theory is less than one hundred years
old, yet it occupies a unique and critical position. Set-theoretic principles
and methods pervade mathematics. Set-theoretic results have shaken the
worlds of analysis, algebra, and topology. Simple questions about sets
have split the mathematical community into hostile camps, and the romance of its infinite sets have charmed and challenged philosophers as
nothing else in mathematics.
Jim Henle
Set theory and set-theoretic notation were born out of the nineteenth century
struggle in mathematics to give a clear account of the real number system. In this
chapter we will investigate an axiom system known as ZF (for Zermelo-Fraenkel).
An investigation of set theory begins by asking the question: What is a set?
In school you were probably taught that a set is a collection of objects. While
this intuition is important, this approach won’t get us very far toward our goal
of a rigorous foundation for set theory. It is not very precise. (After all, what is
a collection? For that matter, what is an object?) Furthermore, this description
makes the (for us) unnecessary distinction between sets and objects.
Instead, we will take the approach of stipulating how sets behave. The fundamental property of any set has to do with what “objects” are “in” it. So it seems
natural that our language will need to have some way of talking about membership
of one set in another set (our objects will all be sets themselves). Let L = {∈}
be the language with one binary predicate (the intended meaning of which is to
indicate membership of one set in another). It turns out that this simple language
will suffice to express axioms rich enough to do an amazing amount of mathematics
not only about sets but also about many other familiar mathematical objects, like
numbers, functions, graphs, groups, etc.
4.1 Naive Set Theory
Our first attempt at specifying how sets behave, naive set theory, has a certain
elegance about it. It is based on only two fundamental principles (the axiom of
extension and the axiom scheme of comprehension), both of which seem to be
intuitively obvious, or at least reasonable, assumptions based on the way we naively
think about sets.
A study of naive set theory (see LPL, chapter 15) reveals two things:
1. Naive set theory seems to be useful for doing mathematics.
We are able to define many useful mathematical objects like intersections,
unions, ordered pairs, functions, relations, etc. and prove things about them.
2. Naive set theory is inconsistent.
We can use a form of Russell's paradox to show that naive set theory is able
to prove that the sentence
∀x (P(x) ⊈ x)
is both true and false.
We would like to remedy this by revising our axioms in such a way that the
resulting theory of sets is still useful for doing mathematics, but no longer able to
prove contradictory statements.
4.2 A Second Attempt at Set Theory
We won’t actually quite succeed in meeting the goals listed above. In fact, it is
inherently impossible to do so. But we will come close. We will introduce a new
set of axioms called ZF that will no longer be susceptible to the Russell attack on
its consistency. Unfortunately, we won’t know with certainty that there is not some
other inconsistency that no one has yet detected.
We also won’t take time to fully develop “all of mathematics” in ZF, but we will
give some indication that this might be possible by doing the following:
• We will show that ZF ⊢ Con(PA).
That is, assuming the axioms of Zermelo-Fraenkel set theory, we can build a
model for the axioms of Peano Arithmetic. Although we will not do it here,
a similar thing can be done to construct models for the integers, the rationals
and the reals. In fact, one can attempt in this way to “do all of ordinary
mathematics” (like calculus) and see how much truth there is to the slogan
“All mathematics is set theory.”
• We will do some infinite arithmetic.
Actually there are two types of infinite arithmetic, ordinal and cardinal. Hopefully, we will have a chance to talk a little about each one.
But before we can make progress on these goals, we need to take a closer look at
the axioms of ZF.
4.3 The Axioms of ZF
Our first axiom will formalize our statement that a set is determined by what is
“in” it. It is identical to the Axiom of Extensionality from naive set theory.
Axiom (Extensionality): Membership determines the set.
∀a∀b[∀z [z ∈ a ↔ z ∈ b] ↔ a = b].
In particular, this says that it doesn’t matter how we describe a set, how we denote a
set, or how we construct a set, only what ends up belonging to the set (as determined
by the relation ∈). Same members, same set. Different members, different sets.
Our second axiom provides us with our first example of a set.
Axiom (Empty Set): There is a set with no members.
∃a∀x x ∉ a
Notice that since we won’t have the powerful comprehension scheme of naive set
theory, we need to have a separate axiom to build this set. (How could one show
that it does not follow from extensionality alone that there is an empty set? Must
there be a set at all?)
Also notice that we have introduced an abbreviation here. Whenever we use the
“phrase” y ∉ x, officially we mean ¬(y ∈ x). In fact, we will use many abbreviations
in our study of set theory. This will make our expressions much easier to read and
understand. But in principle, every such statement with abbreviations could be
written down as a first order wff in the language L = {∈}. In fact, for any relation
defined by a formula ϕ, we will allow ourselves the convenience of introducing an
abbreviation.
Examples.
1. The binary relation a ⊆ b is defined by the wff (with 2 free variables) ϕ(a, b) =
∀x [x ∈ a → x ∈ b].
2. The 3-ary relation c = a ∩ b is defined by the wff (with 3 free variables)
ϕ(a, b, c) = ∀x [x ∈ c ↔ [x ∈ a ∧ x ∈ b]].
This example deserves a bit more discussion. Typically, we think of intersection (∩) as being an operation that builds a new set from two sets, rather than
as a relation among three sets. Once we have shown that for any a and b there
is a set c such that c = a ∩ b (this will require some more axioms) and that it
must be unique (that much we can already do from Extensionality), then we
will be free to treat ∩ as an operator (or function) in this way. But officially,
when we make some claim Ψ(a ∩ b) we really mean
∃c [c = a ∩ b ∧ Ψ(c)] ,
which is of course just an abbreviation for
∃c [∀x [x ∈ c ↔ [x ∈ a ∧ x ∈ b]] ∧ Ψ(c)] .
You can begin to see why abbreviations are going to be necessary to do anything complicated.
3. a = ∅ is an abbreviation for ∀x x ∉ a, so the Empty Set Axiom can be
rewritten as
∃a a = ∅ .
Exercise 4.1. Write down a wff that defines each of the following relations: c = a ∪ b;
c = {a, b}; c = a \ b.1
⊳
Exercise 4.2. There is another kind of abbreviation that will be useful to us. Write
wffs that are abbreviated by ∃x ∈ a ϕ, ∀x ∈ a ϕ, ∃!x ϕ, and ∃!x ∈ a ϕ. (∃!x is
intended to mean ‘there is a unique x such that’).
⊳
We can’t do much with just these first two axioms. In fact,
Exercise 4.3. Show that there is a model for Extensionality and Empty Set that has
only one element in its universe.
⊳
What we need is some way to generate more sets. The next few axioms of ZF will
provide us with ways to build new sets from old sets. We know from the paradoxes
we have already discussed that we will need to be at least a little bit careful as we
do this. In general, the guiding principle is not to allow sets that are “too big” (like
the set of all sets or some other “bad” thing). The four most important of these
“set-building axioms” are
Axiom (Pairing): If a and b are sets, so is {a, b}.
∀a∀b∃c∀x [x ∈ c ↔ [x = a ∨ x = b]]
Axiom (Union): If a is a set, then ∪a is a set.
∀a∃c∀x [x ∈ c ↔ ∃b[b ∈ a ∧ x ∈ b]]
Axiom (Powerset): If a is a set, so is the set of all subsets of a.
∀a∃b∀x [x ∈ b ↔ x ⊆ a]
Axiom (Separation): If a is a set and ϕ is a wff, then {x ∈ a | ϕ(x)} is a set.
∀a∃b∀x [x ∈ b ↔ [x ∈ a ∧ ϕ(x)]]
Exercise 4.4. We used an abbreviation (⊆) in the Powerset Axiom. Rewrite that
axiom as an L-wff without any abbreviations.
⊳
The Pairing and Powerset Axioms are fairly straightforward. The Union Axiom
deserves a little explanation. By ∪a we mean the set ∪a = {x | ∃b (b ∈ a ∧ x ∈ b)}.
In this notation, the more familiar A ∪ B is ∪{A, B}. Since by Pairing, {A, B} is a
set if A and B are sets, we see that ZF allows us to construct A ∪ B whenever A
and B are sets.
1 a \ b is the set of all elements of a that are not elements of b. In set theory, we often use
the symbol \ instead of the usual subtraction symbol since everything is a set and we want to
distinguish between, for example, 5 − 3 (which equals 2) and 5 \ 3 (which equals {3, 4}).
The Separation Axiom is really an axiom scheme. By this we mean that it is
a description of countably many axioms, one for each possible wff ϕ.2 Separation
allows us to form “definable” subsets. That is, we can select out from any set all
of the members of that set which satisfy some property that can be expressed with
a wff. This is weaker than saying that any subset can be formed, since there may
be subsets that cannot be defined in this manner. For doing everyday mathematics,
however, this is usually sufficient, since usually it is not difficult to express the
kinds of subsets we need in this manner. And Separation is much weaker than
Comprehension, which allowed us to make any definable set, even if it wasn’t a
subset of some other set. This is the sense in which we have tried to avoid sets that are
“too large”.
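Separation has a direct analogue in programming: a set comprehension that filters an existing collection. A small Python illustration (ours, just to fix the idea):

```python
# Separation only ever filters an existing set a; it cannot conjure
# {x | phi(x)} out of nothing. Python's set comprehension has the same
# shape: the "for x in a" part supplies the ambient set.
a = set(range(20))
phi = lambda x: x % 3 == 0                 # a "definable" property
b = {x for x in a if phi(x)}               # {x in a | phi(x)}, by Separation
print(b)  # {0, 3, 6, 9, 12, 15, 18}
```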
Note that once we have Separation, then weaker versions of the Pairing, Union,
and Powerset Axioms suffice:
∀x∀y∃z [x ∈ z ∧ y ∈ z]
∀x∃z∀y∀u [[y ∈ x ∧ u ∈ y] → u ∈ z]
∀x∃y∀u [u ⊆ x → u ∈ y]
If one’s goal is to study models of set theory, then these weaker versions are
somewhat easier to establish for a given model. If we are only interested in consequences of ZF, then it doesn’t matter which version we assume, since the stronger
versions follow from the weaker versions and Separation.
The Empty Set Axiom is also unnecessary provided we know that there is some
set (like the one guaranteed by the Infinity Axiom discussed below), since we can
now define the empty set as
{x ∈ a | x ≠ x}
for any set a.
The remaining axioms of ZF are Infinity, Regularity, and Replacement. The
Infinity Axiom is like the Empty Set Axiom in that it guarantees the existence of
one particular set, rather than giving a general way for building new sets from old.
Axiom (Infinity): There is an infinite set.
∃a [∅ ∈ a ∧ ∀x [x ∈ a → x ∪ {x} ∈ a]]
Without this axiom, it is possible to have a model in which every set is finite
(although the model itself must be infinite).
Exercise 4.5. Show that any model of ZF− Inf, the axioms of ZF without the Infinity
Axiom, must be infinite. Hint: How large is a powerset? Show that there must be
sets of infinitely many different sizes.
⊳
2 We have been a bit imprecise about what kind of wff ϕ may be. Clearly ϕ will usually have
x free. It may also have a free, but b should not appear free in ϕ. Also, ϕ may have additional
free variables, in which case we need to preface the axiom with universal quantifiers over all the
additional variables – what we have denoted as ∀~z in the past. This is called the universal closure of
a wff with free variables. The separation scheme actually includes all universal closures of the wffs
just described, but we will suppress the listing of the free variables and their universal quantifiers
to keep the notation manageable.
With this axiom, there must be a set that contains ∅, ∅ ∪ {∅} = {∅}, {∅} ∪ {{∅}} =
{∅, {∅}}, . . . This axiom will be important in our definition of a model for PA, and
in fact, we will interpret the successor function of PA using S(x) = x ∪ {x}.
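The successor operation S(x) = x ∪ {x} can be simulated with Python frozensets (a sketch of ours, not part of the notes):

```python
def succ(x):
    """The set-theoretic successor S(x) = x ∪ {x}, on frozensets."""
    return x | frozenset([x])

zero = frozenset()              # the empty set
one = succ(zero)                # {0}
two = succ(one)                 # {0, 1}
three = succ(two)               # {0, 1, 2}

# Each numeral n has exactly n elements: all of the smaller numerals.
print(len(three), zero in three, one in three, two in three)  # 3 True True True

# The footnote's example of set difference: 5 \ 3 = {3, 4}.
four, five = succ(three), succ(succ(three))
print(five - three == frozenset([three, four]))  # True
```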
Axiom (Regularity): Every nonempty set has a member disjoint from it.
∀a [a ≠ ∅ → ∃b [b ∈ a ∧ b ∩ a = ∅]]
The Axiom of Regularity is less intuitive than some of the others, and in fact,
much can be done without it, but it is useful to avoid certain pathological sets and
relationships between sets. ZF− is sometimes used to denote ZF with the Regularity
Axiom deleted.
Lemma 4.1 There is no set a such that a ∈ a.
Proof. First note that {a} exists since it is the same as {a, a}, which exists by
Pairing. Now apply Regularity to {a}. Since {a} is non-empty and a is the only
element in {a}, it must be that a ∩ {a} = ∅. Since clearly a ∈ {a}, this means that
a ∉ a, else the intersection would be non-empty.
Exercise 4.6. Show that if a ∈ b, then b ∉ a. Hint: Apply Regularity to {a, b}. ⊳
We will defer discussion of Replacement until we need it.
4.4 The Axioms of PA
Peano Arithmetic is an axiomatization of basic arithmetic on the natural numbers
(non-negative integers). Peano Arithmetic works in the language L = {0, 1, S, +, ×},
where 0 and 1 are constants, S is a unary function, and + and × are binary functions.
S stands for successor, and the intended meaning is that S(x) should be the “next”
natural number after x. The goal in choosing these axioms is to choose statements
that are “obviously true” about arithmetic on natural numbers, powerful enough to
form the basis of our reasoning about and with the natural numbers, yet as simple
and few as possible. Over time, the axioms proposed by Giuseppe Peano, an Italian
mathematician, have become the standard choice.
Peano Arithmetic has seven regular axioms and one axiom scheme. The axiom
scheme is intended to capture how induction works.
1. ∀x∀y [S(x) = S(y) → x = y]
2. ∀x [S(x) ≠ 0]
3. S(0) = 1
4. ∀x [x + 0 = x]
5. ∀x∀y [x + S(y) = S(x + y)]
6. ∀x [x × 0 = 0]
7. ∀x∀y [x × S(y) = (x × y) + x]
8. For any wff ϕ with free variable x (and possibly other free variables, too), the
universal closure of
[ϕ(0) ∧ ∀x (ϕ(x) → ϕ(S(x)))] → ∀x ϕ(x)
is an axiom. This axiom is called induction on the wff ϕ.
You might have expected some other familiar statements to be in this list, things
like ∀x∀y x + y = y + x, and other sentences saying that addition and multiplication
have the usual associative, commutative and distributive properties. It turns out
that all these (and much more) are consequences of PA, so for simplicity’s sake, we
leave them out of the axioms and instead make them theorems (statements we prove
with the axioms as premises). Here are two examples. The parenthesized PA before
each statement indicates that these are consequences of the Peano axioms.
Theorem 4.2 (PA) ∀x S(x) = x + 1
Proof. Let x be arbitrary (i.e., an arbitrary natural number). By (3) (and the
indiscernibility of identicals), x + 1 = x + S(0). By (5), x + S(0) = S(x + 0). By
(4), S(x + 0) = S(x). So by transitivity and symmetry of = (or by several uses of
indiscernibility of identicals, if you want to go all the way back to first principles),
S(x) = x + 1. Since x was arbitrary, this is true for all x.
Theorem 4.2 may leave you wondering why one bothers to introduce S into the
language at all, since everything can be expressed using +. Indeed, Language, Proof
and Logic does not introduce S into the language. (See page 456 for the axioms
there.) Nevertheless, there are reasons to do so. In particular, there is a subtle
distinction between adding 1 (successor) and adding an arbitrary number. The
successor is the foundation on which all the other addition is built, as you can see
from the axioms. Furthermore, what S(x) is depends in a much more significant
way on x than on 1, so it is proper to think of it as a unary operator. This corresponds
in some sense to the difference between x + 1 and x++ in the C programming language.
In any case, we will need to think carefully about the successor when we build our
model of PA in ZF, so the distinction is a useful one for us.
Theorem 4.3 (PA) ∀x x + 1 = 1 + x
Proof. This proof is more involved, since it requires the use of induction. Our wff
ϕ(x) will be the following:
x+1 =1+x
We must show that ϕ(0) holds (base case) and that for any x, ϕ(x) → ϕ(x + 1)
(inductive step). (Now that we know that S(x) = x + 1, we will freely choose which
one we use in a particular situation.) We begin showing the base case, 0 + 1 = 1 + 0.
By axiom (3), 0 + 1 = 1. By axiom (4), 1 + 0 = 1. So 0 + 1 = 1 + 0.
Now we do the inductive step. Let x be any number such that ϕ(x). We must
show ϕ(x + 1), i.e., that (x + 1) + 1 = 1 + (x + 1). By the inductive hypothesis
(ϕ(x)), we know that x + 1 = 1 + x, so
(x + 1) + 1 = (1 + x) + 1    (by inductive hypothesis)
= 1 + (x + 1)    (by axiom (5))
The other familiar properties of arithmetic follow by similar (but longer) sorts
of proofs. Each one will use induction. This is because Peano’s axiom scheme gives
inductive (or recursive) definitions of addition and multiplication.
One last note on PA. We have constant symbols for 0 and for 1, but not for the
other natural numbers. If necessary, we can consider 2, 3, etc. to be abbreviations
for S(S(0)), S(S(S(0))), etc. In this way, we are able to talk about any of our old
favorite natural numbers we like even though they do not have constant symbol
names.
4.5
A Model for PA: The Universe
We want to show that in ZF we can construct a model of PA. This will show us
that if ZF is consistent, then so is PA, since it has a model. (Of course, if ZF is not
consistent, then it can prove that there is a model for PA and it can prove there
is no such model.) We take this as evidence of two things: ZF is useful for doing
mathematics, and PA is a reasonable axiom set for arithmetic.
Let’s call the model we are going to construct M. We are part of the way there
already. We will let 0^M be ∅, and S^M be defined by S^M (x) = x ∪ {x}. But we are
getting a little bit ahead of ourselves. We haven’t even said what the universe of our
model is supposed to be. Furthermore, we need to say something about functions,
since S^M is supposed to be a function defined on the universe. We will deal with
the universe in this section and the functions – including +^M and ∗^M as well as
S^M – in the next section.
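As a concrete, extra-textual check, the construction 0 = ∅, S(x) = x ∪ {x} can be imitated with Python frozensets (a sketch; the names are mine):

```python
def succ(x):
    # S(x) = x ∪ {x}
    return x | frozenset([x])

nums = [frozenset()]          # the numeral 0 is the empty set
for _ in range(5):
    nums.append(succ(nums[-1]))

# the numeral for n has exactly n elements, namely the numerals 0, ..., n-1
assert all(len(nums[n]) == n for n in range(6))
assert nums[2] in nums[3]     # 2 ∈ 3
assert all(nums[0] != succ(x) for x in nums)  # 0 is not a successor
```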
We want our universe to be exactly ω = {0, S(0), S(S(0)), . . . }. (In set theory
this set is usually denoted by ω rather than N, although it is essentially the same
object.) So we are building the “standard model” of PA, which until now we have
just been assuming existed. All we need to do is show that ZF implies that ω
exists. The Infinity Axiom almost says this, but we need to combine it with
Separation to get just the set we want.
Let ϕ(z) be the wff
ϕ(z) = ∅ ∈ z ∧ ∀x [x ∈ z → x ∪ {x} ∈ z] .
Then the Infinity Axiom is just ∃z ϕ(z). For some such z, we use Separation to
build the set
ω = {x ∈ z | ∀u [[u = x ∨ u ∈ x] → [u = ∅ ∨ ∃v S(v) = u]]} .
The intuition for this definition is that we want to include in ω only those things
which are built up from ∅ using successor. We want to remove from z any other
types of sets. We will say more about how to formalize S(x) = x ∪ {x} as a function
from ω to ω in the next section. For now, let’s plow on and show that S has the
desired properties.
Lemma 4.4 (ZF) The following statements are true about S and ω:
1. ∀x 0 ≠ S(x);
2. ∀x∀y [S(x) = S(y) → x = y];
3. ∀x ∈ ω S(x) ∈ ω;
4. ∀~y [[ϕ(0) ∧ ∀x [ϕ(x) → ϕ(S(x))]] → ∀x ∈ ω ϕ(x)].
Exercise 4.7. Prove Lemma 4.4. Hint: For (2) use Exercise 6. For (4) apply
Regularity to X = {x ∈ ω | ¬ϕ(x)} to show that X must be empty.
⊳
By Lemma 4.4, once we have shown that S is a function on ω, our model M
will satisfy axioms 1, 2 and Induction of PA. Statement (4) also justifies our use of
informal induction on ω, which we can use to prove the following useful facts about
the way ∈ behaves on ω.
Lemma 4.5 (ZF) Properties of ∈ on ω:
1. ∀x ∈ ω ∀y ∈ ω [x ∈ y → y ∉ x];
2. ∀x ∈ ω ∀y [y ∈ x → y ⊊ x];
3. ∀x ∈ ω ∀y ∈ ω [x ∈ y ↔ x ⊊ y];
4. ∀x ∈ ω ∀y ∈ ω ∀z ∈ ω [x ∈ y ∧ y ∈ z → x ∈ z];
5. ∀x ∈ ω ∀y ∈ ω [x = y ∨ x ∈ y ∨ y ∈ x].
Proof. (1): This follows immediately from Exercise 6.
(2): Induct on x. If x is 0, then the statement is vacuously true. If x = k ∪ {k}
and the statement is true for k in place of x, then y ∈ x = k ∪ {k} implies that
either y ∈ k, so by induction y ⊊ k ⊊ x, or else y = k ⊊ x.
(3): This is proved by induction on y.
• Base Case: y = 0.
It is vacuously true if y = 0, since there are no x such that x ∈ 0 and there
are no x such that x ⊊ 0.
• Inductive Step: Suppose that y = S(z) = z ∪ {z} for some z and that x ∈ z ↔
x ⊊ z.
First, we show that if x ∈ y, then x ( y. Since x ∈ y = z ∪ {z}, there are two
cases to consider:
– Case 1: x = z. In this case, x = z ⊊ z ∪ {z} = y, so x ⊊ y.
– Case 2: x ∈ z.
In this case, by the inductive hypothesis, x ⊊ z ⊊ z ∪ {z} = y.
So in either case x ⊊ y.
On the other hand, suppose that x ( y = z ∪ {z}. Again there are two cases
to consider.
– x ⊆ z
If x ⊆ z, then either x ⊊ z or x = z. If x ⊊ z, then x ∈ z (by the
inductive hypothesis); if x = z, then x ∈ {z}. Either way, x ∈ z ∪ {z} = y.
– z ∈ x
In this case, by the inductive hypothesis, z ⊊ x. So z ⊊ x ⊊ z ∪ {z},
which is a contradiction, since z ∪ {z} has only one more element than z.
So this case doesn’t happen.
(4): If x ∈ y and y ∈ z, then by (2) x ⊊ y and y ⊊ z, so x ⊊ z, hence by (3),
x ∈ z.
(5): First notice that by (3), (5) is equivalent to ∀x ∈ ω ∀y ∈ ω [x ⊆ y ∨ y ⊆ x].
Now induct on x.
• Base case: If x = 0 the result is obvious since 0 ⊆ y for any y.
• Inductive step: x = k ∪ {k}, and for all y ∈ ω, y ⊊ k, k ⊊ y, or y = k. Let’s
look at each case.
– If y ⊊ k, then y ⊊ k ⊆ x, so y ⊆ x.
– If y = k, then y ⊆ x.
– If k ⊊ y, then (by (3)) k ∈ y, so {k} ⊆ y, so x = k ∪ {k} ⊆ y.
Notice that (1), (4) and (5) imply that ∈ is a strict linear order on ω. You may
remember that when we introduced PA, we mentioned that one can define x < y by
∃z x + S(z) = y. Of course, once we have defined addition on ω, we could do the
same thing here. It turns out that both orders are the same, that is
∀x ∈ ω ∀y ∈ ω [x ∈ y ↔ ∃z ∈ ω x + S(z) = y]
We will write x < y if x ∈ y, and x ≤ y if x ∈ y ∨ x = y.
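The order-theoretic facts in Lemma 4.5 can be spot-checked on small von Neumann numerals (an illustration only, not a proof; the encoding by frozensets is mine):

```python
from itertools import product

def succ(x):
    return x | frozenset([x])   # S(x) = x ∪ {x}

nums = [frozenset()]
for _ in range(5):
    nums.append(succ(nums[-1]))

# trichotomy (Lemma 4.5(5)): exactly one of x ∈ y, x = y, y ∈ x holds
for x, y in product(nums, repeat=2):
    assert (x in y) + (x == y) + (y in x) == 1

# x ∈ y implies x ⊊ y (Lemma 4.5(2)); frozenset's < is proper subset
for x, y in product(nums, repeat=2):
    if x in y:
        assert x < y

# transitivity (Lemma 4.5(4))
for x, y, z in product(nums, repeat=3):
    if x in y and y in z:
        assert x in z
```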
4.6
A Model for PA: The Functions
In the last section we postponed the question of what a function is. Now we need
to answer it. So what is a function? Well, in set theory, everything is a set, so a
function must be some sort of set. And sets are determined by their members, so
what we are really asking is what belongs to (the set representing) a function.
Let’s suppose f : A → B is a function. What set should it be? Typically, we
think of the function f as a rule telling us how to assign to each element a ∈ A a
unique element b ∈ B. In set theory, we will assume that this rule is expressed as
a list (i.e., a set) of all the corresponding ordered pairs ⟨a, b⟩.
But what is an ordered pair? Once again, it must be some set (everything is a
set). Let’s denote an ordered pair by ⟨a, b⟩. The key property of an ordered pair is
that ⟨a, b⟩ = ⟨c, d⟩ if and only if a = c and b = d. The Pairing Axiom lets us build
sets like {a, b}, but this set does not distinguish the order of a and b and is the same
as {b, a}. After a little experimenting, we find that there is a reasonable set to call
⟨a, b⟩ and that the axioms of ZF imply that this set exists whenever a and b are sets.
Exercise 4.8. Here is a list of possibilities for ⟨a, b⟩. Only one of them has the
desired properties. Find it and prove that it works. For the others, show why they
fail to work:
{a, b}
{a, {b}}
{{a}, {b}}
{{a}, {a, b}}
{a, b, {a, b}}
⊳
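One way to attack Exercise 4.8 mechanically (without giving the answer away) is to test each candidate pairing over a small domain. The helper below checks the key property ⟨a, b⟩ = ⟨c, d⟩ ↔ a = c ∧ b = d; the function names are mine, not the notes’:

```python
from itertools import product

def is_pairing(pair, domain):
    """True iff pair(a,b) == pair(c,d) exactly when (a,b) == (c,d)."""
    return all((pair(a, b) == pair(c, d)) == ((a, b) == (c, d))
               for a, b, c, d in product(domain, repeat=4))

zero = frozenset()
one = zero | frozenset([zero])
dom = [zero, one]

unordered = lambda a, b: frozenset([a, b])
assert not is_pairing(unordered, dom)        # {a, b} = {b, a}: order is lost
assert is_pairing(lambda a, b: (a, b), dom)  # sanity check with native tuples
```

Each candidate from the exercise can be encoded as a lambda over frozensets and run through the same checker.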
Now we can define A × B to be the set of all ordered pairs ⟨a, b⟩ where a ∈ A
and b ∈ B.
Exercise 4.9. Prove that if A and B are sets, then A × B is a set. Hint: Find a set
big enough to contain A × B and then use Separation to get exactly A × B. You
will need the answer to Exercise 8 to do this.
⊳
A function from A to B is now just a set with the following properties:
• f ⊆ A × B,
• ∀a ∈ A ∃b ∈ B ⟨a, b⟩ ∈ f ,
• ∀a∀y∀z [[⟨a, y⟩ ∈ f ∧ ⟨a, z⟩ ∈ f ] → y = z].
Notice that the last two properties can be combined into the following wff (with
abbreviations):
• ∀a ∈ A ∃!b ∈ B ⟨a, b⟩ ∈ f .
Now we introduce a number of abbreviations for functions. f : A → B is an
abbreviation for “f is a function from A to B” (i.e., for the conjunction of the
three wffs in the definition above); “f is a function” abbreviates ∃A∃B f : A → B;
f (x) = y abbreviates ⟨x, y⟩ ∈ f ; range(f ) = {y | ∃x f (x) = y}; and f ↾D = {⟨x, y⟩ ∈
f | x ∈ D}.
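Since a function is literally a set of ordered pairs, these abbreviations can be mirrored directly in code (a sketch; the helper names are mine):

```python
def is_function(f, A, B):
    # f ⊆ A × B, and every a ∈ A is paired with exactly one b ∈ B
    return (all(a in A and b in B for (a, b) in f) and
            all(len({b for (x, b) in f if x == a}) == 1 for a in A))

def apply(f, x):          # f(x) = y abbreviates ⟨x, y⟩ ∈ f
    (y,) = {b for (a, b) in f if a == x}
    return y

def restrict(f, D):       # f ↾ D
    return {(x, y) for (x, y) in f if x in D}

A, B = {0, 1, 2}, {'a', 'b'}
f = {(0, 'a'), (1, 'a'), (2, 'b')}
assert is_function(f, A, B)
assert apply(f, 2) == 'b'
assert restrict(f, {0, 2}) == {(0, 'a'), (2, 'b')}
```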
Exercise 4.10. Write wffs (you may use other abbreviations) that define “f is one-to-one”, “f is onto B”, and “f is a function from A to B and g = f −1 ”.
⊳
These abbreviations will allow us to use our usual notation for functions when it is
convenient to do so. There are times, however, when knowing that f is really just
a set with certain properties is also handy.
The following properties of functions are easy to prove in ZF:
Lemma 4.6 (Function Lemma)
1. If f is a function, then range(f ) is a set.
2. If f is a function and X ⊆ f , then X is a function.
3. If f is a function and D is a set, then f ↾D is a function.
4. If f : A → B is one-to-one and g = f −1 , then g is a one-to-one function.
Exercise 4.11. Prove Lemma 4.6. Hint: For (2), remember that “f is a function”
means f : A → B for some sets A and B. The trick is to show that the appropriate
sets A and B exist. Use Union and Separation for this. For (3) and (4), use (2). ⊳
Now we are ready to finish our model M for PA. First let’s deal with successor:
Lemma 4.7 There is a function f : ω → ω such that for every x ∈ ω, f (x) = S(x).
Proof. Let f = {⟨x, y⟩ ∈ ω × ω : S(x) = y}.
Note that by Lemma 4.4, if we let S^M be the function from the previous lemma,
then axioms 1, 2 and the Induction Scheme of PA are satisfied by our model.
Lemma 4.8 There are functions α : ω × ω → ω and µ : ω × ω → ω such that
1. For all n ∈ ω, α(⟨n, 0⟩) = n.
2. For all n, k ∈ ω, α(⟨n, S(k)⟩) = S(α(⟨n, k⟩)).
3. For all n ∈ ω, µ(⟨n, 0⟩) = 0.
4. For all n, k ∈ ω, µ(⟨n, S(k)⟩) = α(⟨µ(⟨n, k⟩), n⟩).
Note that once we have proven the lemma we will let +^M = α and ∗^M = µ.
Proof. We will only prove the result for α; the proof for µ is similar.
It turns out that this result is trickier than it might first appear. In fact, we will
need to introduce the axiom scheme of Replacement in order to accomplish it. Here
is the basic idea of the proof. Suppose we want to show that such a set/function α
exists. We would like to build it up in stages. For example, we know how to add
when the second addend is 0: m + 0 = m. So let’s let
A0 = {⟨m, n, r⟩ ∈ ω × ω × ω | n = 0 ∧ m = r}.
A0 exists by Separation.
Now we want to let A1 be the part of α that tells us what to do when we add 1:
A1 = {⟨m, S(n), S(r)⟩ ∈ ω × ω × ω | ⟨m, n, r⟩ ∈ A0 }.
And of course, we want to have
Ai+1 = AS(i) = {⟨m, S(n), S(r)⟩ ∈ ω × ω × ω | ⟨m, n, r⟩ ∈ Ai }.
Finally we let α = ⋃i∈ω Ai . In this way we build up α stage by stage.
So where is the rub? We need to justify the existence of all the Ai ’s and of
α = ⋃i∈ω Ai . For the latter, we need a set A = {Ai | i ∈ ω}. Then we can use the Union
Axiom to define α = ⋃A. (Remember, that is the way we formalize ⋃i∈ω Ai .)
The Axiom of Replacement allows us to build just this sort of set. Notice the
form of this set: for each i ∈ ω, we want to put Ai into A. That is, we want to
replace each i with Ai . More generally, Replacement allows us to build sets of the
form
{F (x) | x ∈ A}
where A is a set and F behaves like a function (for each x ∈ A there must be exactly
one F (x)) but is defined in terms of wffs rather than sets (since we want to use it
to prove the existence of the sets involved).
Why should we allow such an axiom in ZF? The intuition is that if we map
each element x of the set A to F (x), then the resulting collection of all these images
under F is no larger than A was, and no more complicated, so it should be a set,
too.
Here is the formal axiom (scheme):
Axiom (Replacement): For each wff ϕ with free variables x and ~z, we have the
axiom
∀a ∀~z [∀x ∈ a ∃!y ϕ(x, y, ~z) → ∃b ∀x ∈ a ∃y ∈ b ϕ(x, y, ~z )]
By combining this with Separation, we can change the conclusion to be
∀y [y ∈ b ↔ ∃x ∈ a ϕ(x, y, ~z)] .
That is, if we can prove that for each x in some set A there is a unique y such that
ϕ(x, y), then we may justify the formation of
{y | ∃x ∈ A ϕ(x, y)} .
As with our axiom schemes of Comprehension and Separation, the ~z can be thought
of as allowing parametrization.
Now we have just what we need. We can use Replacement (and Separation) to
define the Ai ’s.
By Extensionality, each Ai will be a unique set, so we can apply Replacement (and
Separation) to build A = {Ai | i ∈ ω}.
Finally, we let α = ⋃A = ⋃i∈ω Ai .
Putting everything together, we get
Theorem 4.9 ZF ⊢ Con(PA). That is, assuming the axioms of Zermelo-Fraenkel
set theory, we can show that there is a model for PA, and hence that PA cannot prove
a contradiction.
Note that we are implicitly using Soundness and Completeness here to say that
there is such a connection between proof and truth.
Chapter 5
Computability Theory
These brief notes are meant to accompany the exposition of Turing machines in
Introduction to Automata Theory, Languages, and Computation by Hopcroft and
Ullman.
5.1
Informal Computability
Let’s begin by giving some definitions that will allow us to talk about an informal
notion of computability. We will let Σ denote a set that we will refer to as the “alphabet”.
We will refer to the elements of Σ as characters or symbols or letters, depending on
our mood. Elements of Σ∗ will be called strings or words, and a subset L ⊆ Σ∗ is
sometimes called a language.
Definition 5.1 (Computable function) A function f : A ⊆ Σ∗ → Σ∗ is a computable function with domain A if there is a computation that for every input x ∈ A
outputs f (x).
Definition 5.2 (Computable set) A set A ⊆ Σ∗ is computable (also called decidable) if there is a computation that on every input x ∈ Σ∗ correctly answers the
question “is x ∈ A?”
Definition 5.3 (Computably enumerable set) A set A ⊆ Σ∗ is computably
enumerable if there is a computation that successively lists all and only elements of
A.
The idea behind this last definition is that by cycling through the various inputs x
and computing f (x), we get a “listing” of all members of A (perhaps with repetitions).
Thus f enumerates A.
Examples.
Let Σ be the set of symbols used in FOL.
1. SENT = {S | S is a sentence in FOL} is computable.
2. TAUT = {S | S is a tautology in FOL} is computable.
3. CONSEQUENCES_T = {S | T ⊢ S} is computably enumerable.
5.1.1
Numerical Computation
Fix some encoding of natural numbers as strings in Σ∗ for some finite alphabet Σ.
(Examples: unary encoding, binary encoding, decimal encoding.) Let ⌜n⌝ be the
encoding of n in this scheme. With such an encoding scheme in hand, we can define
computable functions and sets of natural numbers.
Definition 5.4 (Computable function) A function f : N → N is a computable
function if there is a computation that for every input ⌜n⌝ outputs ⌜f (n)⌝.
Definition 5.5 (Computable set) A set A ⊆ N is computable (also called decidable) if there is a computation that on every input ⌜n⌝ correctly answers the question
“is n ∈ A?”
Definition 5.6 (Computably enumerable set) A set A ⊆ N is computably
enumerable if there is a computation that successively lists (the codes of) all and
only elements of A.
Note that the same idea of encodings can be used to define sets and functions on
other types of mathematical objects in terms of strings in some Σ∗ . In particular, it
is often handy to define functions with more than one input. There are many ways
to do so; here is one easy way: encode the input pair (x, y) as the string x#y. That
is, we will place # between the two inputs. This can be extended to arbitrarily many
inputs in the obvious way. (We are assuming here that # is a symbol not already
used in building the (code) strings x and y.) There are also ways to code pairs that
don’t require any extra symbols in the alphabet.
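The separator trick can be sketched in a couple of lines (hypothetical helper names):

```python
def encode_inputs(*parts):
    # place '#' between the inputs; assumes '#' is not in the underlying alphabet
    assert all('#' not in p for p in parts)
    return '#'.join(parts)

def decode_inputs(s):
    return tuple(s.split('#'))

assert encode_inputs('0110', '10') == '0110#10'
assert decode_inputs('0110#10') == ('0110', '10')
```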
Examples.
1. PRIMES = {n | n is a prime number} is a computable set.
2. Most of the commonly used functions in mathematics are computable functions. (Addition, multiplication, squaring, (rounded) logarithms, etc.)
5.2
Formal Computability
Our definitions above are imprecise because all of them rely on the undefined notion
of computation. In order to make the definitions above precise, we must give a precise
definition of computation. Notice that computations must be able to do one of two
things: In the definition of computable function the computation must produce
“outputs” for given “inputs”; in the definition of computable set, the computation
must make “decisions” (answer yes or no) for each given input.
Our (first) formal definition of computation will be “what a 1-way 1-tape Turing
machine computes.” We will refer to this notion of computation as TM-computation.
(See Hopcroft/Ullman for details about our definition of Turing machine.)
Definition 5.7 (Turing-computable function) A function f : A ⊆ Σ∗ → Σ∗ is
Turing-computable if there is a Turing machine M such that for every input x ∈ A,
M halts with f (x) on its tape and all other tape cells blank.
For Turing machine decisions, we have several roughly equivalent ways to represent a decision. Here is our official way: we will designate some of the states of the
Turing machine as “accepting” states and the rest as rejecting states and use the
state in which the machine halts to indicate the decision.
Definition 5.8 (Turing-computable set) A set A ⊆ Σ∗ is Turing-computable if
there is a Turing Machine that on every input x ∈ Σ∗ halts in an accepting state if
x ∈ A and in a rejecting state if x 6∈ A.
Definition 5.9 (Computably enumerable set) A set A ⊆ Σ∗ is computably
enumerable if there is a Turing-computable function f such that A = range(f ), that
is x ∈ A ↔ ∃zf (z) = x.
The idea behind this last definition is that by cycling through the various inputs
z and computing f (z), we get a “listing” of all members of A (perhaps with repetitions). Thus f “enumerates” A. See the exercises for another description of
computably enumerable sets.
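The “cycle through inputs z and list f (z)” idea behind Definition 5.9 looks like this in code (a sketch, with a finite cutoff added so that it terminates):

```python
def enumerate_set(f, steps):
    """List the first elements of A = range(f), repetitions removed."""
    listed = []
    for z in range(steps):
        x = f(z)
        if x not in listed:
            listed.append(x)
    return listed

# enumerate the set of perfect squares below 50
squares = enumerate_set(lambda z: z * z, 8)
assert squares == [0, 1, 4, 9, 16, 25, 36, 49]
```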
We will need to demonstrate that this definition of computation corresponds
well with our intuitive notions about computation and that it is not too sensitive to
its particulars. The following theorems are a beginning in this direction.
Theorem 5.10 The following functions are Turing-computable:
1. f (n) = 0.
2. f (n) = n + 1.
3. f (n) = 2n.
4. f (m, n) = m + n.
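To make Theorem 5.10 concrete, here is a tiny interpreter for a 1-way infinite 1-tape machine, together with a transition table for item 2 (successor) in unary. Both the interpreter and the encoding details are my own sketch, not the book’s formalism:

```python
def run_tm(delta, tape, state='q0', blank='B', max_steps=10000):
    """Run a 1-way-infinite 1-tape TM; halt when no transition applies."""
    tape, pos = list(tape), 0
    for _ in range(max_steps):
        sym = tape[pos] if pos < len(tape) else blank
        if (state, sym) not in delta:
            break
        state, write, move = delta[(state, sym)]
        while pos >= len(tape):
            tape.append(blank)
        tape[pos] = write
        pos = max(0, pos + move)            # 1-way: cannot fall off the left end
    return ''.join(tape).rstrip(blank)

# f(n) = n + 1 with unary code ⌜n⌝ = n+1 zeros: walk right, append one more '0'
succ_delta = {('q0', '0'): ('q0', '0', +1),
              ('q0', 'B'): ('halt', '0', +1)}
assert run_tm(succ_delta, '000') == '0000'   # ⌜2⌝ maps to ⌜3⌝
```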
Definition 5.11 (Neat Computation) A Turing machine computation is neat if
the following are true:
1. There is exactly one halting state, and the machine only halts in that state.
2. At the end of the computation, the tape head is returned to the left end of the
tape.
3. The output (if any) begins in the leftmost tape cell.
Theorem 5.12 (Neat Computation Theorem) If f is a Turing-computable function, then f is computed by a TM that computes neatly. If A is a Turing-computable
set, then A is computed by a TM that computes neatly.
Theorem 5.13 (Composition Theorem) If f and g are Turing-computable, then
f ◦ g is computable. (f ◦ g(x) = g(f (x)).)
5.3
Notation
It is handy to introduce the following notation.
notation        meaning
input tape      a tape that holds input
output tape     a tape from which output is read (may also be an input tape)
work tape       a tape that is neither an input tape nor an output tape
M (x) = y       on input x, machine M halts with y on its output tape
M (x) ↓         on input x, machine M halts
M (x) ↑         on input x, machine M fails to halt
M (x) accepts   on input x, machine M halts in an accepting state
M (x) rejects   on input x, machine M halts in a rejecting state
5.4
Exercises
Exercise 5.1. Write 1-way infinite 1-tape Turing Machines to compute each of the
following functions neatly.
(a) f (n) = 0.
(b) f (n) = 2n.
(c) f (m, n) = n + m
You may use either unary or binary encodings, but must use the same encoding for
all three parts. If you use unary, use the code ⌜n⌝ = 0^(n+1) (that is, n + 1 zeros).
⊳
Exercise 5.2. More Challenging TM problems. Write 1-way infinite 1-tape
Turing Machines to compute each of the following functions:
(a) The conversion from a single binary input to its equivalent unary code.
(b) The conversion from a single unary input to its equivalent binary code.
If you are unable to do this with a single tape, you can make this problem simpler
by doing it with multiple tapes.
⊳
Exercise 5.3. Prove Theorem 5.13.
⊳
Exercise 5.4. Let b(n) denote the maximum number of ones that an n-state Turing
machine with tape alphabet Γ = {1, B} can print on an initially blank tape and
then halt. Here B denotes the blank symbol. Extra credit is available for proving
interesting facts about b(n). For starters, you might like to determine b(n) for some
small values of n.
⊳
5.5
Robustness of the Turing Machine Model
Chapter 7 of Hopcroft/Ullman shows examples of several ways that one might consider modifying the definition of Turing machine: 2-way infinite tapes, multiple
tapes, 2-dimensional tapes, etc. Consult that text for details. The main point is
that all these different variations compute the same functions. In fact, many formal
models of computation have been proposed over the years, some of which look very
different from Turing machines. All of them have been proven to be equivalent to
or weaker than the Turing machine model. This has led most mathematicians to
accept the following statement.
Thesis 5.14 (Church’s Thesis) A function is computable if and only if it is Turing-computable.
Church’s thesis cannot really be proven, since the notion of computable is not
formalized to anything specific. This is why it is called a thesis rather than a
theorem or conjecture. It could be essentially refuted by finding something that one
can prove is not Turing-computable but which is nevertheless clearly computable
(in some intuitive, informal sense). Similar statements can of course be made about
computable sets and computably enumerable sets. The word “computable” is
usually treated as a stand-in for any of the many equivalent formalizations, such as
Turing-computable.
5.6
More Exercises
Exercise 5.5. There are many ways to define computable sets. Here is an example:
Show that a set S is computable if and only if there is a computable function f
such that x ∈ S ↔ f (x) = 1. Do this by describing at a high level the Turing machines
involved.
⊳
You may use Church’s Thesis for the rest of these problems. This means that
to show something is computable, you need only describe a high level computation
(in any combination of programming language, pseudo-code and English).
Exercise 5.6. There are also many ways to define computably enumerable sets.
Here is an example: Show that a set S is computably enumerable if and only if there
is a Turing machine M such that x ∈ S ↔ M (x) ↓.
⊳
Exercise 5.7. Here is yet another way to define computably enumerable sets: Show
that a set S is computably enumerable if and only if there is a two-tape Turing
machine M such that
• M never moves left on its second tape and never overwrites a non-blank tape cell on its second tape. [Such a tape is usually called an output tape.]
• When M is run with blank input tape, then M writes on its output tape all
the elements of S (in any order, repetitions allowed) separated by #.
Note that M might never halt.
⊳
Exercise 5.8.
Show that if A ⊆ N is a computable set, then N − A is also a
computable set.
⊳
Exercise 5.9. Show that the collection of computable sets is closed under intersection, union and complement.
⊳
Exercise 5.10. Show that there is a set A that is not computable.
⊳
Exercise 5.11. Show that there is a set A that is not computably enumerable. ⊳
Exercise 5.12. Let A be computably enumerable. Show that if N − A is also
computably enumerable, then A is computable.
⊳
Exercise 5.13. Show that the collection of computably enumerable sets is closed
under intersection and union.
⊳
Exercise 5.14. Show that if the collection of computably enumerable sets is closed
under complement, then every computably enumerable set is computable. (The fact
that we have two different names is a hint that we will eventually show that there
is a computably enumerable set that is not computable, hence that the computably
enumerable sets are not closed under complement – stay tuned.)
⊳
Chapter 6
Computational Complexity
Earlier we saw that we could use a particular model of computation (Turing machines) to define computable sets and functions. Recall that we call a set A decidable
if there is a Turing machine M that on any input x
• halts, and
• correctly decides whether x ∈ A (using an accepting or rejecting
state, for example).
But just because we know a Turing machine halts and is correct for all inputs
doesn’t mean it is efficient. In fact, a computation could be so inefficient that it is
of no practical value. In this chapter we will consider the computational complexity
of sets.¹
It is traditional in this context to call the sets languages or decision problems
– since the computational “problem” is to “decide” whether a particular object is
or is not an element of the set. Decision problems are often described using an
INSTANCE/QUESTION format. For example, the decision problem PRIMES is
defined by
INSTANCE: A natural number n.
QUESTION: Is n a prime number?
Implicit in this definition is the assumption that we have an agreed-upon coding
of natural numbers as strings over the binary alphabet Σ₂ = {0, 1}. The usual
binary coding will work just fine here.²
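Since the usual binary coding is assumed, a high-level decider for PRIMES can be sketched as follows (trial division is my choice of method; any correct one would do):

```python
def decide_primes(w):
    """Decide PRIMES for a binary string w; strings with a leading 0 are rejected."""
    if not w or w[0] == '0' or any(c not in '01' for c in w):
        return False
    n = int(w, 2)
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

assert decide_primes('101')        # 5 is prime
assert not decide_primes('1001')   # 9 = 3 * 3
assert not decide_primes('0101')   # leading 0: not a code
```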
6.1
Time Complexity of a Turing Machine
But how do we measure the efficiency of a Turing machine M ? For starters, we could
run M on an input and keep track of the resources used during the computation. The
two most commonly considered resources are time (number of steps the Turing
machine runs) and space (number of tape cells visited by a Turing machine head).

¹Many of the same sorts of questions can be asked about computable functions as well, but we
will limit our discussion to decidable sets.
²We could also use a modified coding so that every string codes some natural number and each
number has only one code. As it is, we can simply have our Turing machine reject all strings that
begin with a 0 because they do not code positive natural numbers and therefore are not prime.
We will focus our attention on time. Let’s let t′M (x) denote the number of steps
used by M on input x. (We are assuming that M always halts, so this makes sense.)
This gives us a (potentially) different number for each input. That is, t′M is really
a function: t′M : Σ₂∗ → N.
It turns out that this function is a bit difficult to work with, so we will use
something slightly different.
Definition 6.1 (time complexity) Let M be a Turing machine that halts on all
inputs. The time complexity of M is a function tM : N → N defined by
tM (n) = max{t′M (x) : |x| = n}
The function tM has several advantages over the function t′M . In particular,
• It is usually easier to calculate since we don’t have to know how long each
computation takes, only the length of the longest computation for each length
of input.
• Its domain is N rather than Σ∗ . This will allow us to compose time complexity
functions.
This is sometimes called worst-case complexity because for each length of input
we are measuring the slowest computation. Sometimes other variations on this
theme are used. For example, if we averaged instead of taking the maximum, we
would get average-case complexity. Average case complexity is sometimes quite
interesting, but it is generally harder to work with.
Example 6.2. Let Σ = {0, 1} and let PALINDROMES be the language defined by
INSTANCE: A string x.
QUESTION: Is x a palindrome?
A machine M to recognize PALINDROMES could work as follows.
1. Read the first symbol of the input string x, “remember” it (using the state),
and change it to a blank on the tape. If the first symbol is already a blank,
then accept.
2. Step over the input to get to the last symbol (detected by finding the trailing
blank and backing up one cell).
3. Read the last symbol and decide what to do based on the symbol:
(a) If the last symbol is blank (because we just changed the first symbol to
a blank and that is the only symbol), then accept.
(b) If the last symbol does not match the first symbol, reject.
(c) If the last symbol matches the first, then change it to a blank and repeat
the process of matching end symbols (return to step 1), but going in the
opposite direction.
It is fairly clear that the longest computations are those for which M continues
to match symbols. So to compute tM (n), it suffices to find the time complexity of
M on inputs that are palindromes. The first few values of tM are easy to compute.
tM (0) = 1 since the machine, reading a blank, moves immediately to the accepting
state. tM (1) = 3 since M uses one step to remember the first symbol and move
right, a second to move back to the “last symbol” (now blank), and a third step to
accept the string because it reads a blank where the last symbol used to be. Finally,
tM (2) = 5. On input 00 say, the machine uses one transition to remember the 0, a
second to step to the first blank after the input, a third to move back to the 0, a
fourth to change that 0 to a blank and move left to the now blank first square of
input, and a fifth to move to the accepting state. To compute tM (n) for larger n,
it is helpful to realize that the operation of M consists of the repetition of a certain
subprocess. At the beginning of this subprocess, the read-write head is reading the
symbol at one end of the remaining string. The process consists of finding the other
end of the string, detecting the match, and moving to the new end of the string. If
there are k > 0 symbols left at the beginning of this subprocess, it uses k steps to
find the blank at the end, one step to move back to the end symbol, and 1 step to
process that symbol. So the subprocess uses k + 2 steps.
The tricky business here is the endgame. Note that each pass reduces the number
of symbols by 2. In particular, the last phase will be different for strings of even
and odd length. For strings of even length, at the beginning of the last phase, the
machine will use one step to find it has no more symbols to process. Thus for strings
of even length n, the machine will use (n + 2) + n + (n − 2) + · · · + 4 + 1 steps. For
strings of odd length n, the last pass (starting with k = 1 symbol) will use just 3
steps. So the computation will use (n + 2) + n + (n − 2) + · · · + 5 + 3 steps. Thus
tM (3) = 8. It is straightforward to verify that
tM (n) = (l + 1)(l + 2) − 1   when n = 2l is even
tM (n) = (l + 2)² − 1         when n = 2l + 1 is odd
⊳
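The pass-by-pass accounting above can be replayed in code and checked against the closed form (my transcription of the analysis, offered as a sanity check):

```python
def t_palindrome(n):
    # each pass over k >= 2 remaining symbols costs k + 2 steps;
    # the endgame costs 1 step (even length) or 3 steps (odd length)
    steps, k = 0, n
    while k >= 2:
        steps += k + 2
        k -= 2
    return steps + (1 if k == 0 else 3)

for n in range(40):
    l = n // 2
    closed = (l + 1) * (l + 2) - 1 if n % 2 == 0 else (l + 2) ** 2 - 1
    assert t_palindrome(n) == closed
```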
Exercises
Exercise 6.1. Compute the time complexity for the “next string” Turing machine
you designed for Test 2. (Also give a high level description of the machine as in
Example 6.2.)
⊳
Exercise 6.2.
Give a high-level description of a two-tape Turing machine that
accepts PALINDROMES more efficiently than the one-tape machine in Example 6.2.
Determine its time-complexity function.
⊳
6.2
The Complexity of Decision Problems
We want to change our perspective a bit. Instead of starting with a Turing machine
and asking how efficient it is, we want to start with a decision problem and ask
how fast a computation for that problem can be. Our first idea might be to say the
time-complexity of a problem is the time-complexity of the fastest Turing machine
that accepts it. This definition doesn’t work for two reasons:
• Suppose L(M1 ) = L(M2 ). It may well be that tM1 (n) < tM2 (n) for some n
while tM1 (n) > tM2 (n) for other n. So it may not be easy to say what it means
for one Turing machine to be faster than another.
• Given a Turing machine M that accepts a language L, there is usually another
machine that is faster on every input. Thus there certainly is no fastest Turing
machine for L.
Theorem 6.3 (Linear Speed-up Theorem) Suppose that M is a (multitape) Turing machine with time-complexity tM (n). For any ε > 0, there is a Turing machine
M ′ such that L(M ) = L(M ′ ) and tM ′ (n) ≤ εtM (n) + 2n. M ′ will have the same
number of tapes as M unless M has only one tape, in which case M ′ will have two
tapes.
Proof. (Sketch) To speed up M we will simulate k steps of M with fewer steps. To
do this, we will write the information from k tape cells of machine M on a single
cell of machine M ′ . The conversion can be done using the input and one other tape
in ≤ 2n steps. To simulate k steps of M , we may need to know what happens in up
to k tape cells to the left or right, that is, in the neighboring M ′ cell to either the
left or the right (but not both since we can’t get to both sides in k steps). Thus in
at most 4 steps we can simulate k steps of M: Read the current cell, read and write
in the neighboring cell (if needed), go back and write in the original cell (if needed),
then move to the neighboring cell (if needed).
The theorem follows if we choose k so that 4/k < ε.
This theorem tells us that multiplicative constants don’t really matter very much
when comparing problems. If I have a Turing machine for each problem and one
takes twice as many steps as the other, I can simply speed up the slow machine by a
factor of 3 to get a new machine that is faster. Of course, then I could do the same
thing with the previously faster machine . . . .
We’ll formalize this idea that multiplicative constants don’t really matter to us
with the following definition.
Definition 6.4 (Big O) Consider two functions f : N → N and g : N → N.
• We will say that “f is big O of g” and write f = O(g) if there is a constant
C such that
f (n) ≤ Cg(n) for all n ∈ N.
• If f = O(g) and g = O(f ), we write f = Θ(g) and say that “f is big theta of
g”.
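A big-O claim quantifies over all n, so no finite computation can prove one; still, a quick numeric check (a sketch, not part of the notes) can gather evidence for a candidate constant C on an initial segment:

```python
def witnesses_big_O(f, g, C, n_max=1000):
    """Evidence (not a proof!) that f = O(g): check f(n) <= C*g(n) for n <= n_max."""
    return all(f(n) <= C * g(n) for n in range(1, n_max + 1))

f = lambda n: 3 * n**2 + 5 * n + 7
g = lambda n: n**2

print(witnesses_big_O(f, g, 15))              # True: C = 15 works on this sample
print(witnesses_big_O(g, lambda n: n, 100))   # False: n^2 outgrows 100*n at n = 101
```

The second call illustrates the flavor of Exercise 6.3: no constant C can make n^2 ≤ C·n hold for all n, and a long enough sample exposes any particular C.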
The Speed-up Theorem tells us that most machines can be sped up by roughly
any constant multiple we like. There is a limit, however, to how much faster we can
make a Turing machine just by adding a new tape and recoding tape symbols.
Theorem 6.5 Suppose that M is a multitape Turing machine with time complexity
tM (n). Then there is a one-tape machine M′ with L(M ) = L(M′) and tM′ (n) =
O((tM (n))^2).
Proof. (Sketch) We mentioned this when we showed how to simulate a two-tape
machine with a one-tape machine. The same idea works for any number of tapes.
The basic idea is that the one-tape machine must run all the way across its tape
some constant number of times (roughly twice the number of tapes in the multi-tape
machine) for each step it simulates. But the tape usage cannot be any more than
tM (n), so the time for our one-tape machine is approximately ktM (n) · tM (n) for an
appropriate constant k.
Exercises
Exercise 6.3. Show that if j < k, then n^j = O(n^k) but n^k ≠ O(n^j).
⊳
Exercise 6.4. Show that if f = O(g) and g = O(h), then f = O(h). [Hint: How will
you get the necessary constant?]
⊳
Exercise 6.5. Describe functions f and g such that f ≠ O(g) and g ≠ O(f ), or
prove that no such functions exist.
⊳
Exercise 6.6. Show that if f (n) = a0 + a1 n + a2 n^2 + · · · + ak n^k where each ai ≥ 0
and ak > 0, then f = Θ(n^k).
⊳
Exercise 6.7. This problem demonstrates an interesting property of polynomials.
Compare f (n) with f (2n) for polynomials of the form n^k and for exponential functions of the form b^n for various k and b. What do you notice? What does this tell
you about the running time of a polynomial time algorithm on two inputs, one twice
as long as the other?
⊳
6.3 The Classes P and NP
We would like to define a class of computations that we consider to be reasonably
efficient. The classes of interest are the following:
Definition 6.6
1. Computable Sets. (Also called decidable sets)
A set A is computable if there is a Turing machine M such that for any input
x,
• M halts when run on input x (M (x) ↓);
• x ∈ A ⇔ M (x) accepts.³
2. Computably Enumerable Sets.
There are several equivalent definitions for computably enumerable sets. A is
computably enumerable if . . .
(a) there is a Turing machine M such that x ∈ A ⇔ M (x) ↓
(b) there is a Turing machine M such that if x ∈ A, then M (x) accepts, and
if x ∉ A, then M (x) may reject or may simply fail to halt.
(c) there is a Turing machine M that if run with an empty input tape will
write out all the elements of A on an output tape.
(d) A = ∅ or there is a computable function f such that A = {f (0), f (1), f (2), . . . }.
(Note that list enumeration need not be in order and may include repeats.)
(e) there is a computable set C such that x ∈ A ⇔ ∃y (⟨x, y⟩ ∈ C)
3. P – polynomial time
A set A is said to be polynomial time computable (written A ∈ P) if there is a
Turing machine M and a polynomial p such that for any input x
• M (x) ↓ and this takes at most p(|x|) steps, where |x| is the length of x
• x ∈ A ⇔ M (x) accepts.
4. NP – nondeterministic polynomial time
A set A is said to be in the class NP (written A ∈ NP) if there is a set C ∈ P
such that for all inputs x
• x ∈ A ↔ ∃ᴾ y ⟨x, y⟩ ∈ C
that is, there is a polynomial q such that
• x ∈ A ↔ ∃y (⟨x, y⟩ ∈ C ∧ |y| ≤ q(|x|))
The class NP can also be defined in terms of Turing machines that are nondeterministic (they may have more than one possible transition for some situations).
³ We defined acceptance in terms of accepting and rejecting states. Acceptance can also be defined
in other ways. For example, the Turing machine could have a special output tape on which it writes
a 0 or a 1. This would be equivalent.
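Characterization (d) of the c.e. sets suggests a semi-decision procedure: to test x ∈ A, search the list f(0), f(1), ... and halt only on a hit. A hedged Python sketch (the listing function is a toy example, and max_steps is an artificial cutoff so the demo itself halts):

```python
def semi_decide(x, f, max_steps=None):
    """Search the list f(0), f(1), ... for x.  A true semi-decider halts
    (returning True) iff x appears in the list; max_steps is an artificial
    cutoff so this demo terminates, returning None for 'no verdict'."""
    i = 0
    while max_steps is None or i < max_steps:
        if f(i) == x:
            return True
        i += 1
    return None

# A toy listing function for the set of squares; as the definition allows,
# it lists elements with repeats (0, 0, 1, 1, 4, 4, ...).
squares_listing = lambda i: (i // 2) ** 2

print(semi_decide(49, squares_listing, max_steps=100))  # True: 49 appears
print(semi_decide(50, squares_listing, max_steps=100))  # None: no verdict
```

Without the cutoff, the search loops forever exactly when x ∉ A, which is the one-sided behavior described in characterization (b).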
6.3.1 Some Decision Problems in NP
We can think of NP-problems as search problems where it is easy (polynomial time)
to check if we have found what we are looking for, but may be difficult to find
it. Thought of this way, it is quite easy to generate many examples of interesting
problems that are in NP.
Among the many languages in NP, there is one that is both related to logic and
especially important. That language is
SAT = {x | x codes a tt-satisfiable quantifier-free sentence ϕ}
or, using our decision problem notation:
INSTANCE: A quantifier-free sentence ϕ
QUESTION: Is ϕ tt-satisfiable?
Note that for the purposes of deciding if ϕ is tt-satisfiable, we don’t really need
to know anything about the atomic sub-formulas of ϕ other than which ones are the
same. So usually these sentences are described using propositional letters (essentially symbols that stand for atomic sentences) rather than by specifying particular
constants and predicates. An example of such a sentence would be
ϕ = (A ∧ B) ∨ (A ∧ C ∧ ¬D) ∨ (¬C → D)
In this case, the brute force way to decide whether ϕ ∈ SAT would be to construct
a truth table. That is, we would try all the truth assignments until (if ever) we find
one that makes our sentence true. So we are searching for a truth assignment. With
a truth assignment in hand, we can easily check whether our sentence is true with
that truth assignment. But since a sentence with n atoms has 2^n different truth
assignments, and 2^n ≠ O(n^k) for any k, we can't try them all in polynomial time.
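The brute-force search just described can be sketched directly (a hedged example, not part of the notes; A–D stand for the propositional letters of the ϕ above):

```python
from itertools import product

def phi(A, B, C, D):
    # (A ∧ B) ∨ (A ∧ C ∧ ¬D) ∨ (¬C → D); the implication ¬C → D is C ∨ D
    return (A and B) or (A and C and not D) or (C or D)

def tt_satisfiable(sentence, n_atoms):
    """Try all 2^n truth assignments: correct, but exponential in n_atoms."""
    return any(sentence(*vals) for vals in product([False, True], repeat=n_atoms))

print(tt_satisfiable(phi, 4))  # True -- e.g. A = B = True already satisfies phi
```

Checking one assignment is fast; the cost is the 2^n assignments the loop may have to visit.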
Of course, there may be some other way to decide if ϕ ∈ SAT, so perhaps
SAT ∈ P. No one knows if this is the case or not. In fact, we don't even know
whether P = NP or P ≠ NP. But we do know this: If P ≠ NP, then SAT ∉ P. So
SAT plays a very important role in deciding whether P = NP or not. We'll have
more to say about this shortly.
For now, let’s compile a few more examples of NP-problems:
• CLIQUE
INSTANCE: A graph G and an integer n.
QUESTION: Does G have a clique of size n?
• HC (Hamiltonian Cycle)
INSTANCE: A graph G
QUESTION: Does G have a path that visits each vertex exactly once and returns to its starting point?
• EC (Euler Cycle)
INSTANCE: A graph G
QUESTION: Does G have a path that travels each edge exactly once and returns to its starting point?
• VC (Vertex Cover)
INSTANCE: A graph G and a natural number n
QUESTION: Does G have a vertex cover of size n? That is, is there a set of vertices of size n that includes at least one end of each edge?
• COMPOSITES⁴
INSTANCE: A natural number n ≥ 2
QUESTION: Is n composite? (That is, can it be factored?)
The particular coding used to encode instances of these problems does not matter
much, but they should be reasonably efficient, since long inputs provide more time
for a polynomial algorithm. Codes for natural numbers, for example, should be in
binary, not unary. A graph with n vertices could be encoded as an adjacency matrix
(a sequence of n^2 0's and 1's to indicate which vertices are connected to which –
note that we don't even need to know the names of the vertices for a typical graph
problem). There are other ways we could consider coding graphs as well.
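The "easy to check" half of the search picture can be made concrete for VC: given a graph and a proposed set of vertices (the certificate), verifying that it is a vertex cover of size n takes only polynomial time; the hard part is finding such a set. A toy sketch (the edge-list encoding is one of the reasonable codings just discussed):

```python
def is_vertex_cover(edges, cover, n):
    """Polynomial-time certificate check: cover has size n and touches every edge."""
    return len(cover) == n and all(u in cover or v in cover for (u, v) in edges)

# a square on vertices 0..3, plus one diagonal
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(is_vertex_cover(edges, {0, 2}, 2))  # True: every edge touches 0 or 2
print(is_vertex_cover(edges, {1, 3}, 2))  # False: edge (0, 2) is uncovered
```

Analogous certificate checks (a clique, a cycle, a factor) place the other problems in the list in NP.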
Exercises
Exercise 6.8. Show that P ⊆ NP. That is, if L ∈ P, then L ∈ NP.
⊳
Exercise 6.9. Show that every infinite c.e. set has an infinite computable subset.
⊳
Exercise 6.10. For each of the following languages, (1) express that language using
our decision problem notation, and (2) determine if the language is computable,
c.e., or none of these. Give careful explanations for each of your answers, but feel
free to invoke Church’s Thesis and any of the tools we have developed.
(a) F = {e : We is finite}
(b) I = {e : We is infinite}
(c) Q = {⟨e, w, q⟩ : Me on input w enters state q}
⁴ The complementary problem PRIMES is very interesting. It is less clear that PRIMES is in
NP (what easily checkable evidence would you give to convince someone that a number is prime?)
But using some results from abstract algebra and number theory, it has long been known that
PRIMES ∈ NP. In fact, it has recently been shown that PRIMES is in P. It is still not known,
however, if it is possible to actually get the factors of a composite number in polynomial time. If
this were possible, RSA encryption would be compromised.
(d) B = {e : Me on input λ eventually writes a non-blank letter on the tape} ⊳
Exercise 6.11. For each of the following languages, (1) express the language using
our decision problem notation, and (2) determine if the language is polynomial time
computable, in NP, computable, c.e., or none of these. Give careful explanations for
each of your answers, but feel free to invoke Church’s Thesis and any of the tools
we have developed.
(a) L = {⟨e, w⟩ : Me on input w moves left at some step in the computation}
(b) S = {⟨e, w, k⟩ : Me on input w never moves more than
k tape cells away from starting position}
⊳
Exercise 6.12. Show that the following decision problems are not computable.
(a) Instance: TM M
Question: Is there an input on which M halts?
(b) TOTAL
Instance: TM M
Question: Does M halt on all inputs?
Can you show that either of the decision problems above is or is not c.e.?
⊳
Exercise 6.13. Projection Theorem for the C.E. Languages. Prove that a
language L is c.e. if and only if there is some computable language C such that
x ∈ L ↔ ∃y [⟨x, y⟩ ∈ C] .
That is, show that Definition 6.6.2(e) is equivalent to the others.
⊳
Exercise 6.14. Projection “Theorem” for NP. Let C be the class of languages
L for which there exists a language A ∈ P such that
x ∈ L ↔ ∃y [⟨x, y⟩ ∈ A] .
You might guess that C = NP, but this is not correct. What class is C?
⊳
Exercise 6.15. Listing Computable Sets. We showed that L is c.e. just if there
is a computable function f which lists L. (L = {f (0), f (1), . . . }) Since computable
sets are c.e., they have listing functions too. Show that a non-empty language is
computable if and only if it has a listing function which is non-decreasing.
This tells us that any listing of a non-computable c.e. language cannot be in
order.
⊳
6.4 Reductions and Closure Properties

6.4.1 Polynomial Time Reductions
There are many ways in which algorithms for determining membership in one set
might be used as sub-routines in an algorithm for determining membership in another. These lead to several different notions of reducibility between sets. We will
study one of the most important of these, namely many-one reducibility.
Definition 6.7 (Polynomial Time Reduction) We will say that A ≤m B if
there is a computable function f such that
x ∈ A ↔ f (x) ∈ B .
If f is computable in polynomial time, then we will write A ≤ᴾₘ B.
Theorem 6.8 P and NP are closed downward under ≤ᴾₘ-reductions. That is,
1. If A ≤ᴾₘ B and B ∈ P, then A ∈ P.
2. If A ≤ᴾₘ B and B ∈ NP, then A ∈ NP.
Proof. We will prove only part 1; part 2 is left as an exercise. Let f be a reduction
f : A ≤ᴾₘ B, and let B ∈ P. Then
x ∈ A ⇔ f (x) ∈ B
So an algorithm for deciding whether x ∈ A is the following.
1. On input x, compute f (x).
2. Use the Turing machine for B to check whether f (x) ∈ B, and output the same
answer.
This is a polynomial time algorithm because we can compute f (x) in polynomial
time, and we can decide whether f (x) ∈ B in time that is polynomial in |f (x)|. But
|f (x)| is polynomial in |x|, since we can write at most one output symbol per step and
f (x) can be computed in polynomial time. Since the composition of polynomials is
a polynomial, this means we can check whether f (x) ∈ B in time that is polynomial
in |x|.
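The proof's algorithm can be sketched on a toy pair of languages (EVEN and ODD here are illustrative stand-ins, not from the notes): the reduction f(x) = x + 1 satisfies x ∈ EVEN ↔ f(x) ∈ ODD, and composing it with a decider for ODD decides EVEN.

```python
def f(x):                 # the reduction: x in EVEN  iff  f(x) in ODD
    return x + 1

def in_ODD(y):            # a (clearly polynomial-time) decider for B = ODD
    return y % 2 == 1

def in_EVEN(x):           # decide A by running B's decider on f(x)
    return in_ODD(f(x))

print([in_EVEN(x) for x in range(5)])  # [True, False, True, False, True]
```

Both f and the decider for ODD run in polynomial time, so the composition does too, which is exactly the counting argument in the proof.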
Exercise 6.16. Prove Part 2 of Theorem 6.8.
⊳

6.4.2 Other Closure Properties
The table below summarizes some of the closure properties of our favorite complexity
classes. Composition refers to the corresponding notion of function (computable,
partial computable, polynomial time computable). Defining “NP-function” is a bit
tricky.
                     Comp      CE        P        NP
complement           Yes       No        Yes      No?
union/intersection   Yes       Yes       Yes      Yes
search               bounded   Yes       O(log)   poly
sub-routines         Yes       1-sided   Yes      1-sided
composition          Yes       Yes       Yes      n/a

6.4.3 Hardness and Completeness
Definition 6.9 (Hardness and Completeness)
1. A set L is NP-hard if for any A ∈ NP, A ≤ᴾₘ L.
2. If L ∈ NP and L is NP-hard, we say that L is NP-complete (under many-one
reductions).
SAT is important because it is one of the first examples of an NP-complete set.
Theorem 6.10 SAT is NP-complete. That is, SAT ∈ NP and if A ∈ NP, then
A ≤ᴾₘ SAT.
Proof. To prove that SAT is in NP, we must show that we can check whether a truth
assignment is satisfying in polynomial time. This proof will be omitted.
The heart of the matter is to show that every decision problem L ∈ NP can be
polynomially reduced to SAT. Let M be a Turing machine and let p be a polynomial
such that
• M (⟨x, y⟩) halts in at most p(|x|) steps for each x.
• x ∈ L ⇔ ∃y (|y| ≤ p(|x|) ∧ M (⟨x, y⟩) accepts)
To simplify our description, let's assume that M has one one-way infinite tape, that
the alphabet for x and y is Σ = {0, 1}, and that ⟨x, y⟩ is coded as x#y. As we have
seen before, there must be such a machine that accepts L.
For any input x, our reduction must construct a formula ϕ such that x ∈ L iff
ϕ ∈ SAT. Our formula will have different components representing different types
of information about the computation: the states, the contents of the tape, and the
head position. This information is represented by Boolean variables. To simplify
notation we assume that the states are q0 , . . . , qk−1 , that the tape alphabet consists
of the symbols a0 , . . . , am , and that a0 = 0, a1 = 1, a2 = #, and am is the blank
symbol.
• Q(i, t), 0 ≤ i ≤ k − 1, 0 ≤ t ≤ p(n): Q(i, t) represents that at step t, M is in
state qi . (The 0th step corresponds to the initialization.)
• T (i, t, l), 0 ≤ i ≤ m, 0 ≤ t ≤ p(n), 0 ≤ l ≤ p(n): T (i, t, l) represents that at step
t, the symbol ai is on cell l of the tape.
• H(t, l), 0 ≤ t ≤ p(n), 0 ≤ l ≤ p(n): H(t, l) represents that the head is on tape cell l
at time t.
All together we use only polynomially many of these boolean variables.
Our formula will be a conjunction of clauses. The clauses will express the way
M works in such a way that there is a string y such that M accepts input hx, yi
if and only if the clauses are simultaneously satisfiable. To achieve this, we must
ensure that the variables represent things the way we are imagining. That is, we
must satisfy the following conditions.
1. The variables for t = 0 correspond to the initial configuration of the computation.
2. The last configuration is accepting.
3. The variables represent configurations (snap-shots of what the machine looks
like at a particular point of the computation) of a Turing machine computation.
4. The tth configuration is the successor configuration of the (t − 1)st configuration according to the instructions of the Turing machine M .
This can be done separately for each condition since conditions (1)–(4) are conjunctively connected. In the end, our formula will be the conjunction of all the
components described below.
(1) Since M starts in state q0 , it must be the case that Q(0, 0) = 1 and Q(i, 0) = 0
for all i 6= 0. Furthermore, T (i, 0, l) must code the initial tape contents:
• T (i, 0, l) = 1 whenever 1 ≤ l ≤ n and the lth symbol of x is ai
(This is the only point at which the input x influences the transformation.)
• T (2, 0, n + 1) = 1
(# comes between x and y.)
• T (0, 0, l) ∨ T (1, 0, l) ∨ T (m, 0, l) for n + 2 ≤ l ≤ p(n).
(y consists of 0’s and 1’s only.)
• T (m, 0, l) → T (m, 0, l + 1) for n + 2 ≤ l ≤ p(n).
(Once we finish with y the rest of the tape is blank.)
(2) For this we need a clause that is the disjunction of all Q(i, p(n)) for which qi
is an accepting state. (Alternatively, we could require that the only accepting state
be q1 and use Q(1, p(n)).)
(3) At every time, the Turing machine is in exactly one state, the tape head in
exactly one position, and each tape cell contains exactly one symbol. Syntactically,
this is the only condition placed on configurations. Formally, this means that for
every t ∈ {0, . . . , p(n)}, and every l ∈ {0, . . . , p(n)}, exactly one of the variables
Q(i, t) and exactly one of the variables T (i, t, l) is true. This is easily expressed as
a disjunction of expressions of the form⁵
(A ∧ ¬B ∧ · · · ∧ ¬Z) ∨ (¬A ∧ B ∧ · · · ∧ ¬Z) ∨ · · ·
⁵ With a little work we can put this in CNF form using the fact that the set of states and
the alphabet are both finite.
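The "exactly one" constraints of condition (3) can also be produced directly in CNF, along the lines footnote 5 alludes to: one clause says at least one variable holds, and a clause per pair forbids two holding at once. A hedged sketch (the variable names are illustrative strings, and ~ marks negation):

```python
from itertools import combinations

def exactly_one_cnf(names):
    """CNF clauses forcing exactly one of the named variables to be true:
    one 'at least one' clause plus a clause per pair ruling both out."""
    clauses = [list(names)]
    clauses += [[f"~{a}", f"~{b}"] for a, b in combinations(names, 2)]
    return clauses

# e.g. "the machine is in exactly one state at step t", for three states:
for clause in exactly_one_cnf(["Q(0,t)", "Q(1,t)", "Q(2,t)"]):
    print(clause)
```

For j variables this produces 1 + j(j−1)/2 clauses; since the number of states and tape symbols is fixed, this contributes only a constant number of clauses per time step and tape cell, keeping the whole formula polynomial in p(n).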
(4) Now we must code the semantics of the Turing machine M . The key property
to note is that Turing machine computation is local. The head can only move one
cell to the left or right, and the only cell that affects what M does is the one where
the head is located. So if we know the state at time t, the contents of tape cell l
and its left and right neighbors at time t, and the instruction set of M , then we can
easily figure out the contents of cell l at time t + 1. And if the head is located at
cell l at time t, then we can also know the state and head position at time t + 1.
That is, we can update the tableau of the computation (our graphical picture of a
computation as a big rectangle) by looking at 2 × 3 rectangular windows.
This makes it easy to write down the components of our formula that say things
update at each step the way M would.
. . .
All together we have polynomially many clauses of polynomial size. The clauses
can be computed in time polynomial in p(n).
If M accepts input ⟨x, y⟩, we can assign the variables in our SAT instance
the values that they represent in the computation of M, and this will satisfy
all the clauses. On the other hand, if there is an assignment for the variables
that satisfies all of the clauses, then we obtain an accepting computation of M by
setting y according to the values of the variables T (i, 0, l) for n + 2 ≤ l ≤ p(n) in
the satisfying assignment.
Condition (1) ensures that M is correctly initialized. Condition (3) ensures that
the variables represent a current state and a read tape symbol at every step in the
computation. Condition (4) ensures inductively that the states and tape symbols
follow the computation of M . Finally, condition (2) ensures that the computation
accepts its input.
NP-complete problems (there are very many important problems that are NP-complete) are the "hardest things in NP". In particular, if L is NP-complete and
L ∈ P, then P = NP, since P is closed under ≤ᴾₘ-reductions. Since no one knows
a polynomial time algorithm for any of these sets, and many of them have been
studied extensively, most researchers believe that P ≠ NP. In any case, if you are
trying to write an algorithm for some problem and discover that it is NP-complete,
either you will be unable to find an algorithm that runs in polynomial time or you
will be famous (at least, if you are the first to find a polynomial time algorithm for
such a problem).
Exercises
Exercise 6.17. This problem uses the following definitions:

    A ≡m B ↔ A ≤m B & B ≤m A
    degm (A) = {B : A ≡m B}
    A △ B = {x : x ∈ A ↔ x ∉ B} = (A ∩ B̄) ∪ (Ā ∩ B)

degm (A) is called the (polynomial time many-one) degree of A.
(a) Show that ≡m is an equivalence relation.
(b) What is degm (∅)?
(c) Show that (with a couple of exceptions) if A △ B is finite, then A ≡m B.
What are the exceptions?
⊳
Exercise 6.18. Consider the following sorting algorithm on a standard Turing
machine.
S1: Scan across the tape to the right until (if ever) two consecutive letters are
found which are out of order.
S2: If two such letters are encountered, swap them, return to the left edge of the
tape and go to step S1.
S3: Else if the end of the string is encountered without finding two consecutive
letters out of order, then halt (the string will be sorted).
(a) Give the least upper bound on the number of steps it takes for this algorithm
to perform one swap (steps S1 and S2) on an input of size n. (Explain briefly
how you are implementing steps S1 and S2.)
(b) Find a good upper bound on the maximum number of swaps necessary to sort
a string of length n. (Does the size of the input alphabet matter?) Bonus if
you can get it exactly.
(c) Give a good upper bound on the time complexity of this algorithm for input
alphabets with one, two and three letters.
⊳
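For experimenting with Exercise 6.18, steps S1–S3 can be transcribed directly (a hedged Python sketch; it counts swaps only and does not charge for the head's travel, so it answers none of parts (a)–(c) by itself):

```python
def tm_sort(s):
    """Steps S1-S3: repeatedly scan from the left, swap the first out-of-order
    adjacent pair, and restart; halt when a full scan finds no such pair."""
    tape, swaps = list(s), 0
    while True:
        for i in range(len(tape) - 1):                       # S1: scan right
            if tape[i] > tape[i + 1]:
                tape[i], tape[i + 1] = tape[i + 1], tape[i]  # S2: swap...
                swaps += 1
                break                                        # ...return to left edge
        else:
            return "".join(tape), swaps                      # S3: sorted, halt

print(tm_sort("bacba"))  # ('aabbc', 5)
```

Each swap fixes exactly one adjacent inversion, which may be a useful observation when bounding the number of swaps in part (b).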
Exercise 6.19. Let L be the language over Σ = {a, b, c} such that L contains all
strings x that have a substring u with the following two properties:
• |u| ≥ 3
• u contains the same number of a’s and b’s.
Describe a nondeterministic Turing machine which accepts L in linear time. (You
may use multiple tapes if you like.)
⊳
Exercise 6.20. 3SAT is NP-Complete.
(a) Show that for any clause (disjunction over literals) there is an equivalent clause
in which no literal appears more than once. State the definition of equivalent.
(b) For each of the following clauses C, find a 3CNF⁶ formula Φ (perhaps with
more variables than C had) such that any truth assignment which makes C
true can be extended to a truth assignment which makes Φ true, and any truth
assignment which makes C false cannot be extended to a truth assignment
which makes Φ true. You may assume that no literal appears in C more than
once.
⁶ 3CNF means that the formula is a conjunction of clauses, each clause is the disjunction of
exactly three literals, and no literal appears more than once in a clause.
(i) C = x
(ii) C = x ∨ y
(iii) C = x ∨ y ∨ z ∨ w
(iv) C = u1 ∨ u2 ∨ · · · ∨ uk (a general clause with k > 4 literals)
(c) Use the preceding results to show that SAT ≤ᴾₘ 3SAT. Be sure to describe
the reduction carefully and to justify that it can be done in polynomial time.
⊳
Exercise 6.21. Traveling Salesperson Problem is NP-complete.
Show that TSP is NP-complete. (Hint: for hardness, compare with HC =
Hamiltonian Cycle.)
⊳
Exercise 6.22. Unary and Binary Primes. Consider the following two languages:
UP = {a^p : p is prime}
BP = {x ∈ {0, 1}∗ : x is a prime number written in binary}
Show that UP is in P (how good is your algorithm?) and that BP is in co-NP.
⊳
Exercise 6.23. Another problem with one-letter alphabets.
Show that if there is an NP-complete language over a one-letter alphabet, then
P = NP. (Hint: If there were such a language, SAT would be reducible to it. Use
this to show that SAT ∈ P by using the reduction to prune the search tree for a
satisfying assignment.)
⊳
Exercise 6.24. E is a funny class.
Recall that A ∈ E if there is some Turing machine M and an integer k such
that M accepts A and tM = O(2^(kn)). A ∈ EXP if there is some Turing machine M
and an integer k such that M accepts A and tM = O(2^(n^k)).
For any language A and function q : N → N, let Bq = {x01^(q(n)) : x ∈ A & |x| = n}.
(a) Show that for any polynomial q, A ≡ᴾₘ Bq .
(b) Show that if A ∈ EXP, then for some polynomial q, Bq ∈ E.
(c) How does this show that the class E is NOT closed under ≤ᴾₘ-reductions?
⊳
© R. Pruim, December 4, 2006