A Primer on Intuitive Set Theory
Sam Smith
January 2005
“No one shall expel us from the paradise which Cantor has created for us.”
David Hilbert
Introduction.
In these notes, we give highlights of the theory of sets as developed by
Georg Cantor (1845-1918). Cantor’s results, most of which he obtained in
the 1870s, represent a singular achievement in the history of mathematics.
They were at once completely novel and unexpected, and, at the same time,
compellingly elegant and natural. Virtually from scratch, Cantor developed
a thorough system of measure and comparison for infinite sets which is now
part of the basic vocabulary of mathematics. Cantor’s theory was also, from
its inception, extremely controversial. Fundamental questions raised by Cantor’s methodology and his striking results became central issues for both
philosophers and mathematicians. Cantor struggled with the controversies
himself until his death. We give a brief indication of two important aspects
of controversy regarding Cantor’s methods in this introduction.
As noted by many important thinkers of his day, there was a serious
price of admission to Cantor’s “paradise”: To enter, one must take sides on
the controversial issue of the existence of actual or completed mathematical
infinity. For recall that Aristotle and his adherents argued that to speak of
an actual infinite totality is incoherent: infinitude is a process, something
that is never completed. Thus listing the counting or natural numbers
1, 2, 3, . . .
is sensible, but writing the totality
{1, 2, 3, . . .}
of all natural numbers is not. Cantor’s first move was to reject this prohibition and to allow for infinite totalities. In fact, Cantor went substantially
further. Not only was he willing to consider the possible existence of infinite
totalities, in pursuing his revolutionary program Cantor proposed making a
detailed mathematical study of their properties.
This particular controversy deepens when we discuss other number systems, specifically the so-called real numbers. While a rational number may
be succinctly defined as a ratio of two integers, how do we make sense of
numbers such as √2 or π or 3^π? A rigorous construction of the real numbers from the rationals was given by a friend and contemporary of Cantor’s,
Richard Dedekind, in the 1860s. In this construction, real numbers appear
as certain (infinite!) subsets of the rational numbers called “cuts”. For our
purposes, we will understand real numbers using an idea which goes back to
grade school: decimal expansion. We have
√2 = 1.414213562373095 ···, π = 3.141592653589793 ···, and
3^π = 31.54428070019754 ···.
Note that we must countenance a completed infinity just to write down a single
real number.
A second controversy related to Cantor’s work stems from an observation made by Bertrand Russell at the turn of the twentieth century. In what
became a fundamental issue for both mathematicians and philosophers, Russell showed that the most obvious notion of what a set is leads directly to
paradox. Russell’s paradox demonstrated the need for careful and precise
axiomatic proofs in set theory and logic. Cantor’s set theory was ultimately
redeveloped axiomatically using the Zermelo-Fraenkel axioms (Appendix D).
Moreover, twentieth century set theorists focused as much on the meaning
of a theory of sets, a perspective known as meta-theory, as on proving the
theorems within this theory. In these notes, we will focus primarily on the
latter – proving theorems – and we will largely ignore the issues raised by
Russell’s paradox. The non-axiomatic approach we take is referred to as
naive or intuitive set theory for this reason.
1. The Basics.
In this section we establish some basic terminology and recall some elementary facts about sets. We begin with
1.1 Definitions. A set S is any collection of objects. The members of S
are called the elements. We write x ∈ S to indicate that the object x is an
element of S. If A is a set and every element of A is an element of S we say
A is a subset of S and write A ⊆ S. The set with no elements is called the
empty set and denoted by φ.
1.2 Examples. Some important sets which will figure prominently below
are
1. The set N of natural or counting numbers; N = {1, 2, 3, ...}. For each
element m ∈ N we have the finite subset N_m = {1, 2, 3, ..., m}.
2. The integers Z = {..., −2, −1, 0, 1, 2, ...}.
3. The rational numbers Q which is the set of all ratios of integers with
nonzero denominator; Q = {p/q | p, q ∈ Z, q ≠ 0}.
4. The real numbers R. As mentioned in the introduction, the representation of elements of R is itself a significant issue. We will express a positive
real number x ∈ R as an infinite decimal x = b_n b_{n−1} ··· b_1 b_0 . a_1 a_2 a_3 ··· where
each b_j and a_i is a “digit”, i.e. a number between 0 and 9 inclusive. This
representation is not unique since, e.g., .99999 ··· and 1.00000 ··· represent
the same number. It turns out that if we make a commitment to uniformly
choose the former or the latter in every case where it occurs then our decimal expansion will be unique and well-defined. We elect to always choose
the repeating 0’s over 9’s in what follows.
Now notice that integers and rational numbers are also real numbers.
What do their decimal expansions look like? For integers, the answer is immediate. With our convention 1 = 1.0000 · · · etc. As far as fractions go, we
have 1/3 = .33333 · · ·, 1/12 = 0.083333 · · · and 1/7 = .1428571428571428 · · ·
Note that in each case, after some preliminary digits, we have a repeating
pattern. To be precise, say a decimal expansion x = b_n b_{n−1} ··· b_1 b_0 . a_1 a_2 a_3 ···
is repeating if there exist natural numbers k and j such that a_i = a_{i+j} for
all i ≥ k. Note that 1/3 (k = 1, j = 1) and 1/12 (k = 3, j = 1) and 1/7
(k = 1, j = 6) are all repeating. We have
Theorem 1.3 A real number x is rational if and only if x has a repeating decimal expansion.
Proof. Suppose x has a repeating decimal expansion. We show x is a ratio
of two integers. It clearly suffices to assume x is a pure decimal, i.e. has no
integer part. In this case, with k and j as in the definition, we may write
x = .a_1 a_2 a_3 ··· a_k \overline{a_{k+1} ··· a_{k+j}}
where the overline is the usual repeating decimal notation: the block a_{k+1} ··· a_{k+j} repeats forever. Notice that multiplying x by powers of 10 serves to shift the decimals of x to the left of the
decimal point. In particular,
10^k x = a_1 ··· a_k . \overline{a_{k+1} ··· a_{k+j}}
and
10^{k+j} x = a_1 ··· a_k a_{k+1} ··· a_{k+j} . \overline{a_{k+1} ··· a_{k+j}}.
Subtracting cancels the decimal part and gives
(10^{k+j} − 10^k) x = (a_1 ··· a_k a_{k+1} ··· a_{k+j}) − (a_1 ··· a_k).
The point is that the right hand side is an integer. Let’s call it m. We then
have x = m/(10^{k+j} − 10^k), which is a rational number.
Conversely, suppose x = p/q is rational for integers p and q. To find the
decimal expansion of x we use long division to divide q into p. For example,
to find the expansion of 1/7 we divide 7 into 1.000 · · ·: 7 divides into 10 once
with remainder 3, our first digit is 1. The remainder 3 dictates the next
step: 7 divides into 30 four times with remainder 2, our second digit is 4, 7
divides into 20 twice with remainder 6 our third digit is 2, 7 divides into 60
eight times etc. Now observe that the remainders that we get at each stage
are integers from 0 to q − 1 (or in our example, from 0 to 6). Since only q remainders are possible, some remainder eventually
repeats. But once the remainder repeats the digits repeat, as needed. □
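Both directions of the proof of Theorem 1.3 are effectively algorithms, and it may help to see them run. The following is a minimal Python sketch, not part of the original notes: the function names repeating_to_fraction and long_division are our own, and we assume a pure decimal given as a string of non-repeating digits followed by a repeating block.

```python
from fractions import Fraction


def repeating_to_fraction(prefix: str, block: str) -> Fraction:
    """Return the rational whose expansion is .prefix block block block ...

    Follows the proof: with k = len(prefix) and j = len(block),
    (10^(k+j) - 10^k) * x = int(prefix + block) - int(prefix).
    """
    k, j = len(prefix), len(block)
    m = int(prefix + block) - (int(prefix) if prefix else 0)
    return Fraction(m, 10 ** (k + j) - 10 ** k)


def long_division(p: int, q: int):
    """Digits of p/q for 0 <= p < q, as (non-repeating digits, repeating block)."""
    digits, seen, r = [], {}, p
    while r not in seen:
        seen[r] = len(digits)      # position at which remainder r first occurs
        r *= 10
        digits.append(str(r // q)) # next decimal digit
        r %= q                     # next remainder, always between 0 and q - 1
    start = seen[r]                # the digits repeat from this position on
    return "".join(digits[:start]), "".join(digits[start:])
```

For instance, long_division(1, 12) returns the pair ("08", "3"), and feeding that pair back into repeating_to_fraction recovers 1/12. A terminating decimal such as 1/4 comes out as ("25", "0"), in line with our convention of choosing repeating 0’s.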
We have now introduced the most important numeric sets that we will
encounter. It is important to realize, however, that any object can be an
element of a set. In particular, sets can themselves be elements of other sets.
For instance, we must distinguish between the set A = {√2, N} with two
elements and the set B = {√2, 1, 2, 3, ...} with infinitely many elements.
We assume the reader is familiar with the basic set operations of union,
intersection and difference. The definitions are
A ∪ B = {x|x ∈ A or x ∈ B}
A ∩ B = {x|x ∈ A and x ∈ B}
A − B = {x|x ∈ A and x ∉ B}.
These operations are related by De Morgan’s Laws:
Theorem 1.4 For any sets A, B and C we have
i) A − (B ∪ C) = (A − B) ∩ (A − C) and
ii) A − (B ∩ C) = (A − B) ∪ (A − C).
Proof. We prove i) and leave ii) as an exercise. To prove two sets S and T
are equal, we must show 1) that every element of S is an element of T ; i.e.,
that S ⊆ T , and 2) that every element of T is an element of S: T ⊆ S. So let
x ∈ A − (B ∪ C) be any element. Then, by definition, x ∈ A and x ∉ B ∪ C.
The latter means that x is neither in B nor in C. Thus x ∈ A and x ∉ B
and x ∉ C which implies x ∈ (A − B) ∩ (A − C).
Conversely, suppose y ∈ (A − B) ∩ (A − C) is any element. Then y ∈ A − B
and y ∈ A − C. In either case, y ∈ A. Moreover, y ∉ B and y ∉ C and so
y ∉ B ∪ C. Thus y ∈ A − (B ∪ C), as needed. □
Exercise 1. Prove Theorem 1.4, part ii).
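Before attempting a proof, it can be reassuring to test De Morgan’s Laws on small finite sets. The sketch below is our own illustration in Python (the helper names subsets and de_morgan_holds are hypothetical); it checks both identities for every triple of subsets of a small universe. Such a check is evidence, of course, and not a proof.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of a finite collection, as Python sets."""
    items = list(universe)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def de_morgan_holds(universe) -> bool:
    """Check i) and ii) of Theorem 1.4 for all subsets A, B, C of the universe."""
    all_subsets = subsets(universe)
    return all(
        a - (b | c) == (a - b) & (a - c) and a - (b & c) == (a - b) | (a - c)
        for a in all_subsets
        for b in all_subsets
        for c in all_subsets
    )
```

Here | is union, & is intersection and - is set difference, matching ∪, ∩ and − above.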
We will also consider the cartesian product A × B of two sets:
A × B = {(x, y) : x ∈ A and y ∈ B}.
For example,
N_2 × N_3 = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)}.
In general, if A has m elements and B has n elements then A × B has mn
elements. Thus cartesian product is like a multiplication for sets.
The power set P (A) of a given set A is defined to be the set of all subsets
of A. For example P(N_2) = {φ, {1}, {2}, N_2}. Of course, if A is an infinite
set then so is P(A). For finite sets, the power set behaves like an exponential. Specifically, we have the following theorem which shows why P(A) is
sometimes denoted 2^A.
Theorem 1.5 Suppose A has m elements. Then P(A) has 2^m elements.
Proof. If A = φ is the empty set then P(A) = {φ} has 1 = 2^0 elements. If
A has 1 element, say A = {a}, then P (A) = {φ, A} has 2 elements and the
theorem holds for m = 1, as well.
But to prove this theorem, we must consider sets of size m, for each
possible m ∈ N! This seems, at first glance, an impossibility. However,
a natural approach is the following: we have proven the theorem true for
m = 0 and m = 1. We will now assume the theorem true for some fixed
m and argue that, with this assumption, the theorem is true for the next
number m + 1. This approach logically establishes the theorem for all m; it
is called the principle of induction.
So fix m and assume P(A) has 2^m elements for all sets A with m elements.
Let B be any set with m + 1 elements. We need to show that P(B) has 2^{m+1}
elements, or twice as many elements as P(A).
Now, since B has m + 1 elements, we may write B = A ∪ {b} where A has
m elements and b ∉ A. By our assumption, the theorem is true for A. That
is, there are exactly 2^m subsets of A. Now notice that there are exactly two
kinds of subsets of B: those which do not have b in them and those which do.
The first kind of subsets are precisely the subsets of A. Thus there are 2^m of
them. Let us call them A_1, ..., A_{2^m}. Then notice that there are also exactly
2^m subsets of B of the second type – namely, A_1 ∪ {b}, ..., A_{2^m} ∪ {b}. Thus
P(B) has twice as many elements as P(A), as needed. □
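The induction step in this proof doubles as a recipe for listing P(B). Here is a short Python sketch of that recipe; the function name power_set is our own, and we assume the input lists distinct elements.

```python
def power_set(elements):
    """Build the power set one element at a time, as in the proof of Theorem 1.5."""
    subsets = [frozenset()]  # the empty set has exactly 2^0 = 1 subset
    for b in elements:
        # Subsets without b are the ones already found;
        # subsets with b are those same sets with b adjoined.
        subsets = subsets + [s | {b} for s in subsets]
    return subsets
```

Each pass through the loop doubles the length of the list, so a set with m elements yields 2^m subsets.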
Exercise 2. Let A = {1, 2, 3} and B = {3, 5, 7}. Write down the elements
of the sets
a) P (A − B)
b) A × P (A ∩ B)
c) P (A) − P (B).
2. One-to-One Correspondences and Cardinality of Sets.
Given two sets A and B, a natural question to ask is whether they have
the same size. When A and B are finite we can just count elements and
see. When A and B are infinite, however, this is a more interesting question.
How do we compare set sizes when both are infinite? An answer comes from
examining the meaning of counting.
When we count a set A, we decide which element of A is number 1, which
is 2 and which is 3 etc. That is, we define a function from N_m to A where m
is the size of A. Moreover, this function assigns exactly one element of N_m
to each element of A. This is called a one-to-one correspondence. We make
this notion precise with
2.1 Definitions. A function f : A → B between two sets A and B is a
rule which assigns to each element x ∈ A a uniquely determined element
y = f (x) ∈ B.
A function f : A → B is one-to-one if, for every pair x_1, x_2 ∈ A, f(x_1) =
f(x_2) in B implies x_1 = x_2.
A function f : A → B is onto if, for every element y ∈ B, there exists
x ∈ A such that f (x) = y.
A function f : A → B which is both one-to-one and onto is called a
one-to-one correspondence between A and B.
2.2 Examples.
1. The function f : Z → Z given by f(x) = 2x is one-to-one. For
suppose f(x_1) = f(x_2) for some integers x_1, x_2. Then 2x_1 = 2x_2 which
implies x_1 = x_2. Note that f is not onto since, e.g., 3 is not of the form 2x
for any integer x. In fact, f defines a one-to-one correspondence between
the set of all integers and the set of all even integers. This is an example of
Galileo’s paradox: an infinite set can be in one-to-one correspondence with
a proper subset of itself!
2. The function f : (0, 1) → (1, +∞) [1] defined by f(x) = 1/x is a one-to-one correspondence. For suppose f(x_1) = f(x_2) for some x_1, x_2 ∈ (0, 1).
Then 1/x_1 = 1/x_2 which clearly implies x_1 = x_2. For onto, note that if
y ∈ (1, +∞) then y > 1 which implies 0 < 1/y < 1. Thus take x = 1/y and
we have f (x) = y.
3. There exists a one-to-one correspondence between the interval (−2, 2)
and R. Here we must define a function g : (−2, 2) → R ourselves. With
Example 2 in mind, we set g(x) = 1/x for x ∈ (−1, 0) ∪ (0, 1). Note that
g is one-to-one and maps onto (−∞, −1) ∪ (1, +∞). Next we make g(0) =
0. Now all that’s left to do is find a one-to-one correspondence between
(−2, −1]∪[1, 2) and [−1, 0)∪(0, 1]. We accomplish this using linear functions:
for x ∈ (−2, −1] let g(x) = −2−x and for x ∈ [1, 2) let g(x) = 2−x. It is then
an easy exercise (cf. Exercises 3 and 4 below) to show that g restricted to
(−2, −1] ∪ [1, 2) is one-to-one and onto. Thus g : (−2, 2) → R is a one-to-one
correspondence. Our definition of g can be summarized as
g(x) = −2 − x   if −2 < x ≤ −1,
g(x) = 1/x      if −1 < x < 0,
g(x) = 0        if x = 0,
g(x) = 1/x      if 0 < x < 1,
g(x) = 2 − x    if 1 ≤ x < 2.
It may help to graph g to see how it works.
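As an alternative to graphing, one can experiment with g numerically. The following Python sketch is our own addition: it implements g exactly as summarized above, together with a candidate inverse (the name g_inverse is ours). Floating-point arithmetic is only approximate in general, so this is an illustration of how g pairs up points rather than a proof.

```python
def g(x: float) -> float:
    """The correspondence g : (-2, 2) -> R of Example 3."""
    if not -2 < x < 2:
        raise ValueError("g is only defined on the open interval (-2, 2)")
    if x <= -1:
        return -2 - x      # maps (-2, -1] onto [-1, 0)
    if x == 0:
        return 0.0
    if x < 1:
        return 1 / x       # maps (-1, 0) and (0, 1) onto (-inf, -1) and (1, +inf)
    return 2 - x           # maps [1, 2) onto (0, 1]


def g_inverse(y: float) -> float:
    """Undo g by deciding which piece of the definition produced y."""
    if y < -1 or y > 1:
        return 1 / y
    if y == 0:
        return 0.0
    if y < 0:
        return -2 - y
    return 2 - y
```

For example, g(0.5) is 2.0 and g_inverse(2.0) is 0.5, while g(1.5) is 0.5 and g_inverse(0.5) is 1.5. The points of (−1, 1) are flung outward and the two outer strips fill in the gap they leave behind.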
Exercise 3. Define f : (0, 1) → (−1, 3) by f (x) = 4x − 1. Prove that f
is a one-to-one correspondence.
Exercise 4. Prove that, given any two points a, b there is a one-to-one correspondence between the interval (a, b) and (−2, 2). (Try a linear function!)
One-to-one correspondence allows us to decide when two sets have the
same size, regardless of the meaning of the elements. Let us write A ≈ B
if there exists a one-to-one correspondence f : A → B. Notice that A ≈ B
is either true or false for any given ordered pair of sets A, B. That is, there
either is or is not a one-to-one correspondence f. We say that ≈ is a relation
on any set of sets. In fact, ≈ is called an equivalence relation. The formal
definitions are as follows.
[1] Given two real numbers a < b, (a, b) = {x ∈ R | a < x < b}. We allow a = −∞ or
b = +∞ to indicate unbounded intervals to the left or right, respectively. Square brackets
indicate the endpoint (when not infinite) is included.
2.3 Definitions. Let S be any set. A relation ∼ on S is a function
R : S × S → {T, F}. Here R(x, y) = T means x is related to y, written x ∼ y,
while R(x, y) = F means x is not related to y or, notationally, x ≁ y.
A relation ∼ on a set S is an equivalence relation if the following three
properties hold: i) (reflexivity) x ∼ x for all x ∈ S, ii) (symmetry) x ∼ y
implies y ∼ x for all x, y ∈ S, and, iii) (transitivity) x ∼ y and y ∼ z imply
x ∼ z for all x, y, z ∈ S.
Equivalence relations behave like equality, =, which is the prototype.
Note that there are many relations which are not equivalence relations. For
example, inequality ≤ and strict inequality < on the reals and rationals are not symmetric.
These relations are prototypes for a class of relations called order relations.
We will study these in §6.
The following fundamental result is proved using inverses and compositions of functions. We recall these notions and give the proof in Appendix
A.
Theorem 2.4 Let S be any set of sets. [2] The relation ≈ of one-to-one correspondence is an equivalence relation on S. □
Ultimately, in §6, we will assign to each set A a “number” |A| called the
cardinal number or cardinality of A which captures the size of A. When A
is finite, |A| will equal the number of elements of A. When A is infinite the
definition of the cardinal number of A is trickier to make. For the moment,
we will content ourselves with defining three relations: equality |A| = |B|,
inequality |A| ≤ |B| and strict inequality |A| < |B| of cardinal numbers:
2.5 Definitions. We say the cardinality of A is equal to the cardinality of B
and write |A| = |B| if there exists a one-to-one correspondence f : A → B;
i.e., if A ≈ B.
We say the cardinality of A is less than or equal to the cardinality of B
and write |A| ≤ |B| if there exists a one-to-one function f : A → B.
[2] We would like S to be the set of all sets. Unfortunately, the existence of such a
universal set leads directly to paradoxes!
If |A| ≤ |B| but there is no one-to-one correspondence between A and B
we say the cardinality of A is strictly less than that of B and write |A| < |B|.
In our new notation, Examples 1 and 3 and Exercise 4 now say
|Z| = |{even integers}|
|(−2, 2)| = |R|
and
|(a, b)| = |(−2, 2)|
for any a < b. Notice how natural and necessary the reflexive, symmetric and
transitive laws are. In particular, we can write |(a, b)| = |(−2, 2)| = |R|. This
sentence has meaning: it says that there are as many points in any interval
of R as there are in all of R!
It is important to realize that just because we call our relation ≈ the
“equality” of cardinal numbers does not mean that it will behave like equality.
We still have to prove the desired properties (Theorem 2.4) in Appendix A.
As a case in point, consider our notion of inequality. Note that, for any
sets A, B if A ⊆ B then |A| ≤ |B|. The needed one-to-one function i : A → B
is called the inclusion function. It is defined by i(a) = a for all a ∈ A. (Of
course, i is onto if and only if A = B.) Since the closed interval [−2, 2] is a
subset of R, we have |[−2, 2]| ≤ |R|. By the same token, the open interval
(−2, 2) is a subset of the closed interval [−2, 2] and so |(−2, 2)| ≤ |[−2, 2]|.
By Example 3 above, R ≈ (−2, 2). Taking the composition (see Theorem
A.4) we obtain |R| ≤ |[−2, 2]|. Thus both |[−2, 2]| ≤ |R| and |R| ≤ |[−2, 2]|
are true. Can we conclude that |[−2, 2]| = |R|? What is the one-to-one
correspondence h : [−2, 2] → R? The function g : (−2, 2) → R defined in
Example 3 does not extend to [−2, 2]. We made crucial use of the fact that
2 and −2 were not included when we defined g. Besides, g is already
onto; where would we send the extra points 2 and −2? We must construct
an altogether new one-to-one correspondence and it is not at all clear how!
In general, the question we are asking is whether, for any sets A and B,
if |A| ≤ |B| and |B| ≤ |A| can we conclude that |A| = |B|? Such a theorem
would require producing, out of thin air, a one-to-one correspondence from
two unrelated one-to-one functions. That this can be done is the substance
of a deep theorem conjectured by Cantor and proved by Schroeder and Bernstein. We give the proof in Appendix B.
Theorem 2.6 (Schroeder-Bernstein) Suppose f : A → B and g : B → A are
one-to-one functions. Then there exists a one-to-one correspondence between
A and B. □
3. Countable and Uncountable Sets.
Definition 3.1. We say a set A is countable if |A| ≤ |N|. If |A| = |N|
then we say A is countably infinite. If A is not countable
we say A is uncountable.
Note that all finite sets are countable. Of course, the natural numbers N
themselves are countably infinite. Suppose A is countably infinite. Let f :
N → A be a one-to-one correspondence. Let a_1 = f(1) ∈ A, a_2 = f(2) ∈ A,
etc. Then, since f is one-to-one, the a_i are distinct elements of A.
Since f is onto, A = {a_1, a_2, a_3, ...}. Conversely, if A = {a_1, a_2, a_3, ...} we can
define a one-to-one correspondence between N and A directly: namely, send
1 to a_1, 2 to a_2, etc. Thus the countably infinite sets are precisely those whose
elements can be listed or “counted” in an exhaustive way. This observation
gives a method for proving a set A is countable; namely, we show that there
is an exhaustive way to list the elements of A in the form above. For example,
Theorem 3.2 The integers Z are countably infinite.
Proof. Note that Z = {0, 1, −1, 2, −2, 3, −3, ...}. This is clearly an exhaustive list. □
Somewhat more interesting is
Theorem 3.3 The rationals Q are countably infinite.
Proof. It suffices to prove that the positive rationals, written Q+, are countably infinite. For we can then use the trick for dealing with negatives and
zero above.
To each positive rational p/q we associate the natural number p + q, the
sum of numerator and denominator. Of course, there are many rationals for
each sum. For example, 1/4, 2/3, 3/2 and 4/1 all have sum 5. In fact, there
are clearly n − 1 different rationals for each sum n > 1. We will order the
rationals first by the size of this sum and then, for each sum n, in increasing
order of numerator. What we obtain is the following list of the positive
rationals:
Q+ = {1/1, 1/2, 2/1, 1/3, 2/2, 3/1, 1/4, 2/3, 3/2, 4/1, . . .}.
Note that this is an exhaustive list: a given rational p/q appears in the
pth position among the rationals with sum p + q. Unfortunately, there are
many repetitions on this list, e.g., 1/1, 2/2, . . . . However, if we delete all representations of a positive rational except the first, what results is the needed
exhaustive list. □
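The listing in this proof is concrete enough to run. The Python sketch below is our own (the generator name positive_rationals is hypothetical): it walks through the sums n = p + q in increasing order, takes numerators in increasing order for each sum, and skips any fraction whose value has already appeared.

```python
from fractions import Fraction
from itertools import count, islice


def positive_rationals():
    """Yield each positive rational exactly once, in the order of Theorem 3.3."""
    seen = set()
    for n in count(2):             # n = p + q, the sum of numerator and denominator
        for p in range(1, n):      # increasing order of numerator
            x = Fraction(p, n - p)
            if x not in seen:      # delete all representations except the first
                seen.add(x)
                yield x


first_nine = list(islice(positive_rationals(), 9))
# first_nine lists 1, 1/2, 2, 1/3, 3, 1/4, 2/3, 3/2, 4
```

Notice that 2/2 is skipped because its value 1 was already listed as 1/1.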
The countability of the rationals is a surprising result since there appear
to be many more rationals than integers. For example, what is the smallest
rational number greater than zero? A moment’s thought reveals that there
is none. In fact, between any two real numbers r and s there always lies a
rational number. For this reason, the rationals are said to be dense in R.
Nonetheless, we managed to find a new way of ordering the rationals which
exhausted them all. In perhaps his most famous result, Cantor showed that
such a trick could not be played on the real numbers. The technique of the
proof is known as “Cantor’s Diagonal Argument”.
Theorem 3.4 The real numbers R are uncountable.
Proof. It suffices to prove that the interval (0, 1) is not countably infinite.
We will show that in any list of real numbers between 0 and 1 there will always
be at least one missing number between 0 and 1. The result follows.
So let x_1, x_2, x_3, x_4, ... be any list of real numbers in (0, 1). We may write
these numbers in their decimal expansion:
x_1 = .a_{11} a_{12} a_{13} a_{14} ...
x_2 = .a_{21} a_{22} a_{23} a_{24} ...
x_3 = .a_{31} a_{32} a_{33} a_{34} ...
x_4 = .a_{41} a_{42} a_{43} a_{44} ...
...
Here a_{ij} is the jth decimal digit of x_i. Define a number z digit by digit as
follows: Let z_1 be any integer from 0 to 9 other than a_{11}. Let z_2 be any
integer from 0 to 9 other than a_{22}. Continue in this way to obtain z_3, z_4, ...
with the property that z_i ≠ a_{ii}. [3] Let z = .z_1 z_2 z_3 z_4 .... Then z ∈ (0, 1) since it
is a pure decimal. Moreover, z ≠ x_i for all i because z and x_i differ in the
ith digit. □
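The diagonal construction can be tried out on any finite portion of a list. In the Python sketch below (our own illustration; the name diagonal_miss and the sample digits are made up), each row holds the leading decimal digits of one x_i as a string, and the digit z_i is chosen by the rule “2 if the diagonal digit is 1, and 1 otherwise”.

```python
def diagonal_miss(rows):
    """Return the first len(rows) digits of a number z missing from the list.

    rows[i] holds the leading digits of the (i+1)st number on the list and
    must contain at least i + 1 digits.
    """
    return "".join("2" if row[i] == "1" else "1" for i, row in enumerate(rows))


rows = ["1415", "3333", "5000", "1428"]   # sample digits, chosen arbitrarily
z = diagonal_miss(rows)                    # "2111"
```

Here z = .2111... differs from the first number in its first digit, from the second in its second digit, and so on. Since z uses only the digits 1 and 2, the ambiguity between repeating 0’s and repeating 9’s never arises.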
With our cardinality notation, Theorems 3.2-4 can be summarized in one
line:
|N| = |Z| = |Q| < |R|.
A natural question now arises: Is there any set A whose cardinality lies
strictly between that of the natural numbers and the reals? We know A cannot be an interval (a, b) since |(a, b)| = |R|. On the other hand, A must be
larger than the rationals. Based on this evidence, Cantor made the following
conjecture:
3.5 The Continuum Hypothesis There is no set A with
|N| < |A| < |R|.
Proving or refuting Cantor’s Hypothesis and its generalization (see §5)
was listed among the most important open problems by David Hilbert in
his famous address to the International Congress of Mathematicians at the
turn of the twentieth century. Significantly, the ultimate resolutions of these
problems were not theorems in set theory but rather meta-theoretic results
to the effect that neither a proof nor a disproof of these hypotheses is possible
within the context of set theory.
4. Countability of Unions and Cartesian Products.
In this section, we consider the extent to which countability is preserved
under the set operations unions and cartesian products. We settle the question for unions and leave cartesian products as exercises. Our first result
asserts that a countable union of countable sets is still countable.
[3] To be more precise, we should say z_i = 2 if a_{ii} = 1 and z_i = 1 if a_{ii} ≠ 1.
Theorem 4.1 Let A_1, A_2, A_3, ... be any collection of countable sets. Then
the union A = A_1 ∪ A_2 ∪ A_3 ∪ ··· is also a countable set.
Proof. We may as well assume each set A_i is countably infinite and that
the sets are pairwise disjoint. As usual, the former means we can exhaustively
list the elements of A_i. Write A_i = {a_{i1}, a_{i2}, a_{i3}, ...} so that a_{ij} is the jth
element of A_i. We can then obtain an exhaustive listing of A by ordering the
elements a_{ij} in increasing order of the sum of the subscripts i + j. For each
sum we list in increasing order of i. Specifically,
A = {a_{11}, a_{12}, a_{21}, a_{13}, a_{22}, a_{31}, ...}
is the needed exhaustive list of A. □
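The order in which the subscripts (i, j) are visited in this proof is easy to generate mechanically. The following Python sketch is our own (the names index_pairs and enumerate_union are hypothetical); it assumes we are handed a function element(i, j) returning the jth element of the ith set.

```python
from itertools import count, islice


def index_pairs():
    """Yield all pairs (i, j) of natural numbers by increasing i + j, then increasing i."""
    for s in count(2):           # s = i + j
        for i in range(1, s):
            yield (i, s - i)


def enumerate_union(element):
    """List the union of the sets A_i, where element(i, j) is the jth element of A_i."""
    for i, j in index_pairs():
        yield element(i, j)


first_six = list(islice(index_pairs(), 6))
# first_six is [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]
```

Every pair (i, j) is reached after finitely many steps, since only finitely many pairs have a smaller sum. This is exactly what makes the list exhaustive.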
We can now deduce a remarkable consequence of the fact that |Q| < |R|.
Let I = R − Q, the irrational numbers. Irrational numbers are characterized as having
infinite, nonrepeating decimal expansions. It is elementary to
prove that √2 ∈ I. In fact, for all m ∈ N, √m ∈ I unless m is a perfect
square. Other famous irrational numbers include π and e.
Corollary 4.2 The irrational numbers I are uncountable.
Proof. If I is countable then R = Q ∪ I is also countable by Theorems 3.3 and 4.1, contradicting Theorem
3.4. □
Exercise 5. Let S be any infinite set. Prove that S has a countably infinite subset A. Thus |N| is the smallest infinite cardinal.
(Your argument will require the Axiom of Choice.)
Exercise 6. Let S be any infinite set and B any countably infinite set.
Prove that |S ∪ B| = |S|.
Hint: Use Exercise 5.
Exercise 7. Prove that a set A is infinite if and only if A is in one-to-one
correspondence with a proper subset of itself.
Exercise 8. Let A and B be countable sets. Prove that the cartesian product A × B is also countable.
Exercise 9. Use induction and Exercise 8 to prove that, for any n ≥ 1, if
A_1, A_2, ..., A_n are countable sets then so is A_1 × A_2 × ··· × A_n.
Exercise 10. Prove that the infinite cartesian product N_2 × N_2 × ··· is an
uncountable set. Hint: Note that a typical element of this set is an infinite
sequence of 1s and 2s. Use Cantor’s Diagonal Argument.
Corollary 4.2 was, as you can imagine, anathema to the many nineteenth
century mathematicians and philosophers who did not believe in the existence
of irrational numbers. Cantor not only proved the existence of such numbers
(without actually producing one example!), but he proved that there are
fundamentally more of these strange numbers than there are of the familiar
rationals. Corollary 4.2 shows that, although the rationals are dense in R,
they are actually very rare; “most” real numbers are irrational.
An even more surprising result uses the full strength of Theorem 4.1.
Define a real number a to be algebraic if a is a root of some nonzero polynomial
P(x) = a_n x^n + a_{n−1} x^{n−1} + ··· + a_1 x + a_0
whose coefficients a_0, a_1, ..., a_n are all rational numbers. Let A be the set
of all algebraic numbers. Of course, Q ⊆ A since every rational satisfies a
linear polynomial with rational coefficients. Note that √m ∈ A for all nonnegative rational m: the polynomial is P(x) = x^2 − m. Thus many irrational numbers are
algebraic numbers. Nonetheless, we have
Theorem 4.3 The algebraic numbers A are countable.
Proof. We will express A as a countable union of countable sets. The result
then follows from Theorem 4.1.
Let A1 be the set of all real numbers which satisfy linear polynomials
with rational coefficients. Then A1 = Q and so is countable.
Let A2 be the set of all real numbers which satisfy quadratic polynomials
with rational coefficients. A quadratic polynomial P (x) is determined by
three rational numbers, a_0, a_1, a_2, the constant, linear and quadratic coefficients. Thus the set of all quadratic polynomials with rational coefficients is
in one-to-one correspondence with the cartesian product Q × Q × Q. This
set is countable by Exercise 9. Thus we can exhaustively list the quadratic
polynomials with rational coefficients: P_1(x), P_2(x), .... Now each quadratic
polynomial has at most two roots. Let a_{i1}, a_{i2} be the roots for P_i(x). (If P_i(x)
has duplicate roots we should delete one of these; if P_i(x) has no roots we
will just omit P_i(x) altogether.) We can now exhaustively list the elements
of A_2; namely, A_2 = {a_{11}, a_{12}, a_{21}, a_{22}, a_{31}, a_{32}, ...}.
In general, let A_n be the set of all real numbers which satisfy nth degree
polynomials with rational coefficients. The set of all nth degree polynomials
with rational coefficients is countable, by Exercise 9 again, since it is in
one-to-one correspondence with the cartesian product of n + 1 copies of the
rationals. Thus we can enumerate these polynomials: P_1(x), P_2(x), ... and let
a_{i1}, ..., a_{in} be the roots of P_i(x) with the same caveats above. An exhaustive
listing of A_n is then given by
A_n = {a_{11}, a_{12}, ..., a_{1n}, a_{21}, a_{22}, ..., a_{2n}, a_{31}, a_{32}, ..., a_{3n}, ...}.
Finally, observe that A = A_1 ∪ A_2 ∪ A_3 ∪ ···. □
Now let T = R − A; T is called the set of transcendental numbers.
These are real numbers which do not satisfy any polynomial with rational
coefficients. It took two centuries for mathematicians to prove that π is
transcendental. The proof, given in 1882 by C.L.F. Lindemann, was one of
the most celebrated results of the nineteenth century. Nine years earlier,
using difficult techniques of complex analysis, Charles Hermite had proved
that e is transcendental. Thus the following corollary to Theorem 4.3 is
remarkable since it guarantees that “most” real numbers are transcendental
– despite the fact that we can only name a few!
Corollary 4.4 The transcendental numbers T are uncountable. □
5. Power Sets and Sizes of Infinity.
Cantor’s Diagonal Argument (Theorem 3.4) establishes that there are
different sizes of infinity. Specifically, we know that there are countably infinite sets like N and Q and then there are the real numbers R which are
uncountable. The Continuum Hypothesis proposes that there are no set sizes
in between. In this section we address the question whether there are any
cardinal numbers larger than |R|. We begin, in the spirit of the previous section, by proving that countability is not preserved by the power set operation.
Theorem 5.1 The power set of the natural numbers P(N) is uncountable. [4]
Proof. We use Cantor’s Diagonal Argument to argue that there can be no
exhaustive list of all subsets of N. So suppose that A_1, A_2, A_3, ... is any list
of subsets of N. We construct a subset A ⊆ N which is not on the list. If
1 ∈ A_1 we do not include 1 in A. If 1 ∉ A_1 then we do put 1 in A. Similarly,
we include 2 in A if and only if 2 ∉ A_2 and so on for each i ∈ N. The precise
definition of A is A = {i ∈ N | i ∉ A_i}. Since i belongs to exactly one of A and A_i, the set A does not coincide
with any A_i and so is not on the list. □
To find a set with cardinality larger than that of R we might look at the
power set P (R). Since |P (N)| > |N|, perhaps the same is true for R. The
following result, known as Cantor’s Theorem, asserts remarkably that this
not only works for N and R but for any set A!
Theorem 5.2 For any set A, |A| < |P (A)|.
Proof. We must prove 1) that there is a one-to-one function f : A → P (A),
and 2) that there can not be any one-to-one function g : P (A) → A.
Part 1) is easy: Define f : A → P (A) by f (x) = {x} for all x ∈ A. It is
obvious that f is one-to-one since {x} = {y} as subsets of A if and only if
x = y in A.
To prove 2), suppose there is a one-to-one function g : P (A) → A. We
then define a function h : A → P (A) as follows: Set h(a) = φ if a ∈ A is
not in the image of g. Otherwise, a = g(X) for some unique subset X ⊂ A
(since g is one-to-one). Set h(a) = X. Observe that, by definition, h is onto.
Consider the subset R of A defined by R = {x ∈ A | x ∉ h(x)}. In words,
R is the set of all elements of A which are not members of their associated
subset h(x) in P (A). Since h is onto, there must be some element a ∈ A with
h(a) = R.
[Footnote 4: Using binary numbers, it is easy to prove the stronger result that P (N) ≈ [0, 1].]
We now ask the following simple question: “Is a ∈ R?”
Suppose the answer is yes; i.e. that a ∈ R. Then a ∈ R = h(a). Thus a
does not satisfy the condition for membership of R. But this means a ∉ R.
The answer “yes” is contradictory.
Suppose, on the other hand, the answer is no; that a ∉ R. Then a ∉ R =
h(a) and so a ∉ h(a). But by its very definition, a should be in R. Thus
“no” will not do for an answer either. The only conclusion we can make in
the face of this contradiction is that the original assumption of a one-to-one
function g : P (A) → A is impossible. Thus |A| < |P (A)|. □
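For a small finite set the heart of this argument can be checked exhaustively. The Python sketch below is an illustration under our own naming (power_set, missed_subset); it runs through every function h : A → P (A) for A = {0, 1, 2} and confirms that the set R = {x ∈ A | x ∉ h(x)} is never a value of h, so that no such h is onto.

```python
from itertools import combinations, product


def power_set(a):
    """All subsets of a, as frozensets."""
    items = sorted(a)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def missed_subset(a, h):
    """The set R = {x in A | x not in h(x)} from the proof."""
    return frozenset(x for x in a if x not in h[x])


A = {0, 1, 2}
subsets = power_set(A)

# A function h : A -> P(A) is a choice of one subset for each element of A.
always_missed = True
for images in product(subsets, repeat=len(A)):
    h = dict(zip(sorted(A), images))
    if missed_subset(A, h) in h.values():
        always_missed = False

print(always_missed)
```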
Cantor’s Theorem has the amazing consequence that there are infinitely
many sizes of infinity. Specifically we have the following list of sets whose
cardinalities are in strictly increasing order:
φ, N1 , N2 , . . . , Nm , . . . ,
N, R, P (R), P (P (R)), P (P (P (R))), . . .
Since R ≈ P (N) (see Footnote 4, above) the infinite sets in this sequence
are all generated by the power set operation. The question occurs, of course,
whether there are other sets which can be inserted in this sequence. Cantor’s
Generalized Continuum Hypothesis proposes that the answer is no:
5.3 The Generalized Continuum Hypothesis. Given any infinite set
A, there is no set B with |A| < |B| < |P (A)|.
Exercise 11. Let A be any countable set and n any natural number. Prove
that the set of all subsets of A with exactly n elements is a countable set.
(Try induction.)
Exercise 12. Let A be any countable set. Prove that the set F (A) consisting of all finite subsets of A is a countable set. (Use Exercise 11.)
6. Well-Orderings, Ordinal and Cardinal Numbers.
The purpose of this section is to develop the theory of cardinal numbers
of sets. Our goal is to assign to each set A a “number” |A| called the cardinal
number of A which will capture the size of A. Of course, we recognize that
if A is an infinite set then |A| will not be a number in any ordinary sense of
the word. What do we intend |A| to mean? What do we want a system of
cardinal numbers to do for us? The second question is easier to answer. We
would like our cardinal numbers to allow us to compare any pair of sets A
and B: to say either A and B have the same cardinal number, or, that one
has a larger cardinal number than the other. This goal is the basis for our
construction.
In §2, we defined the inequality of cardinal numbers |A| ≤ |B| for any
two sets A, B to mean that there exists a one-to-one function f : A → B.
Recall that |A| ≤ |B| defines a relation on any set of sets. The Schroeder-Bernstein Theorem shows that the inequality of cardinal numbers behaves as
expected with respect to the equivalence relation ≈, one-to-one correspondence. Specifically, |A| ≤ |B| and |B| ≤ |A| implies |A| = |B|.
So what’s the problem? Well, very simply, given two sets A and B do we
know that one of the following |A| ≤ |B| or |B| ≤ |A| must be true? Given
two sets A and B must there either be a one-to-one function f : A → B
or, if not, a one-to-one function g : B → A? If you think about it for
a moment, you’ll agree that it seems highly plausible. Unfortunately, there
also seems to be no direct way to prove it. The answer, as we will see, is
yes. Happily, the constructions needed for the proof also furnish us with a
natural way to define the cardinal numbers.
The properties we want to be true for inequality of cardinal numbers are
those of an order relation. The official definition is
6.1 Definition. Let S be a set and ≤ a relation on S. Then ≤ is called an
order relation on S if i) (comparability) for all x, y ∈ S either x ≤ y or y ≤ x, ii)
(antisymmetry) for all x, y ∈ S, x ≤ y and y ≤ x if and only if x = y and iii)
(transitivity) for all x, y, z ∈ S, if x ≤ y and y ≤ z then x ≤ z.
If ≤ is an order on a set S we write x < y whenever x ≤ y but x ≠ y.
It is important to notice that inequality of cardinal numbers does not
define an order relation on most sets of sets. The problem is not i), which we
will prove below, but ii), even with the Schroeder-Bernstein Theorem. The
point is that the equality referred to in ii) is genuine equality. If |A| ≤ |B|
and |B| ≤ |A| then we know |A| = |B|. But this just means that there is a
one-to-one correspondence between A and B, not that they are equal as sets.
Suppose, however, that we focus on a collection of sets no two of which
are in one-to-one correspondence. Then |A| ≤ |B| and |B| ≤ |A| would imply
A = B as sets! This observation becomes our strategy. We will prove the
comparability of all sets. We will then construct a canonical collection of
sets C with the property that for any set A, A ≈ C for exactly one set C in
our collection. The sets C will be called the cardinal numbers.
We thus undertake to prove the comparability of all sets under the relation of inequality of cardinal numbers. Our method will be to consider the
class of well-ordered sets, prove that these sets are all comparable, and then
prove that actually every set can be well-ordered. We begin with
6.2 Definition. Let ≤ be an order relation on a set S. We say that ≤
well-orders S if, for every nonempty subset A ⊆ S, there exists an element
a0 ∈ A with the property that a0 ≤ a for all a ∈ A.
6.3 Examples. 1) The natural numbers N are well-ordered by the usual
inequality relation ≤ . However, none of the other number sets Z, Q, R are
well-ordered by inequality since many subsets of these sets will not have a
least element.
2) Given any set S of sets, we have the relation of set inclusion ⊆ which
clearly satisfies ii) and iii). Of course, ⊆ will usually not be an order because
most sets are not comparable. A set C of sets which is ordered by ⊆ is called
a chain. For example,
{φ, {1}, {1, 2}, {1, 2, 3}, . . . , N}
is a chain contained in P (N).
3) Since sets can themselves be elements of sets, ∈ is a relation on any
set S of sets also. Note that, unless there is a set A ∈ S which is an element
of itself, pairs of sets A, B ∈ S will not satisfy both A ∈ B and B ∈ A. Thus
condition ii) above will be vacuously true. Condition iii) is obviously true.
But, again, i) will be false for most S. Examples of sets S which are ordered
by ∈ can be obtained by taking any set A and forming the set B = A ∪ {A}
whose elements are those of A plus one more element, the set A itself. Of
course, A ∈ B. Next let C = B ∪ {B}. Then S = {A, B, C} is well-ordered
by ∈ . We could continue this process to obtain an infinite set well-ordered
by ∈, namely
S = {A, A ∪ {A}, A ∪ {A} ∪ {A ∪ {A}}, . . .}.
Note that S ≈ N.
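The first steps of the construction in Example 3 can be carried out concretely. In the following Python sketch, frozenset stands in for "set" (so that sets can be elements of sets), and the starting set A = {p, q} is a hypothetical choice made only for illustration.

```python
A = frozenset({"p", "q"})      # hypothetical starting set
B = A | frozenset({A})         # B = A ∪ {A}
C = B | frozenset({B})         # C = B ∪ {B}

S = [A, B, C]

# Every earlier set on the list is an element of every later one ...
forward = all(S[i] in S[j] for i in range(3) for j in range(i + 1, 3))
# ... and never the other way around.
backward = any(S[j] in S[i] for i in range(3) for j in range(i + 1, 3))

print(forward, backward, len(B), len(C))
```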
Some important features of well-ordered sets follow directly from the definition. Let S be well-ordered by a relation ≤ . Taking S to be the subset
of S in the definition, we see that S has a least element, say s0 . Now let
S1 = S − {s0 }. If S1 is nonempty, then it has a least element, say s1 . Continuing, we obtain an increasing list of elements s0 < s1 < s2 < s3 < · · · of S.
Does this mean that S is countable? Not necessarily. There could be many
elements of S not obtainable in this way. While every non-maximal element
in a well-ordered set S has an immediate successor (Exercise 14, below), many
elements besides the minimal element s0 may not have an immediate predecessor. For example, in the chain given above in P (N), the element N has
no immediate predecessor.
A given set S well-ordered by ≤ contains many well-ordered subsets.
Given any element x ∈ S we form the initial segment determined by x by
letting S(x) = {s ∈ S|s < x}. Clearly, S(x) is itself well-ordered by ≤ . Note
that x ∉ S(x).
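For a finite set of integers with the usual order, least elements and initial segments are easy to compute. The following Python sketch uses our own function names (least, initial_segment) and a hypothetical sample set; it is meant only to make the definitions concrete.

```python
def least(subset):
    """The least element a0 of a nonempty subset, as in Definition 6.2."""
    return min(subset)


def initial_segment(S, x):
    """The initial segment S(x) = {s in S | s < x}."""
    return {s for s in S if s < x}


S = {3, 5, 8, 13}

print(least(S))
print(initial_segment(S, 8))
print(8 in initial_segment(S, 8))
```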
Exercise 13. Let S be a set well-ordered by ≤ . Suppose A ⊆ S has
the property that, for all s ∈ S, if s ≤ a for some a ∈ A then actually s ∈ A.
Prove that either A = S or A is an initial segment of S; i.e., that A = S(x)
for some x ∈ S. (Hint: Consider the complement of A in S.)
Exercise 14. Let S be a set well-ordered by ≤ . Let x ∈ S. Suppose x is
not maximal in S; i.e., there exists an element z ∈ S with z > x. Prove that
there then exists an element s(x) in S, called the immediate successor of x,
such that i) x < s(x) and ii) if x < z then s(x) ≤ z, as well. (Hint: Consider
the complement of the set S(x) ∪ {x} in S.)
We now set out to prove that all well-ordered sets are comparable by one-to-one functions. That is, for all well-ordered sets S, T , either |S| ≤ |T | or
|T | ≤ |S|. Our method is to single out a canonical class of well-ordered sets
called ordinals and prove that every well-ordered set is in one-to-one correspondence with one of these. We then prove that the ordinals are themselves
all comparable. Our definition recalls Example 3 above.
6.4 Definition. We say a set O of sets is an ordinal if i) O is well-ordered
by ∈ and ii) if A is any object and A ∈ S for some set S ∈ O then actually
A ∈ O.
Of course, the empty set φ is an ordinal. The sets S constructed in
Example 3, however, are not likely to be ordinals. For if A has any elements
then these must be included in S by ii), which they are not. In fact, the only
way that the construction in Example 3 will work is if we take A = φ. We
have then the following list of ordinals:
φ, {φ}, {φ, {φ}}, {φ, {φ}, {φ, {φ}}}, {φ, {φ}, {φ, {φ}}, {φ, {φ}, {φ, {φ}}}}, . . . .
In general, if O is an ordinal then so is the set O ∪ {O}. This operation,
which produces a new ordinal from a given one, is called the successor operation. Let us write succ(O) = O ∪ {O}. Note that the sequence of ordinals
above was generated by the successor operation. For convenience, we will
give these ordinals new names. Let 0̄ be the empty set φ. Let 1̄ = succ(0̄).
In general, let n̄ = succ(m̄) where m = n − 1, for n ∈ N. We have simply made an explicit correspondence between the nonempty sets on our list and the natural
numbers.
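The successor operation is easy to experiment with. In the Python sketch below, frozenset again plays the role of "set", and the helper name ordinal is our own; the code builds the finite ordinals from the empty set and checks a few of their basic properties.

```python
def succ(o: frozenset) -> frozenset:
    """The successor operation succ(O) = O ∪ {O}."""
    return o | frozenset({o})


def ordinal(n: int) -> frozenset:
    """The finite ordinal obtained by applying succ n times to the empty set."""
    o = frozenset()
    for _ in range(n):
        o = succ(o)
    return o


three = ordinal(3)

# The ordinal for n has exactly n elements, namely the ordinals for 0, ..., n - 1.
print(len(three))
print(three == frozenset({ordinal(0), ordinal(1), ordinal(2)}))
# Smaller ordinals are both elements and subsets of larger ones.
print(ordinal(2) in three, ordinal(2) < three)
```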
Notice that all the ordinals created so far form a chain; that is,
they are ordered (in fact, well-ordered) by ⊆. Thus our next result gives a
method for creating a new ordinal from these.
Theorem 6.5 Let {Oα } be any chain of ordinals. Then the union
O = ⋃ Oα (the union taken over all α)
is an ordinal, as well.
Proof. We must first check that ∈ satisfies the three conditions in Definition 6.1 on O. Let
A, B ∈ O. Then, since the Oα form a chain, we can find Oα large enough
so that both A, B ∈ Oα . But now Oα is an ordinal and so, in particular,
ordered by ∈. Thus A and B are comparable with regard to ∈ and obey
antisymmetry. Of course, the transitivity of ∈ holds for any set of sets.
To see that ∈ well-orders O, let A be any nonempty subset of O. Choose
Oα so that A ∩ Oα ≠ φ. Let S ∈ A ∩ Oα be the least element of this nonempty
subset of the ordinal Oα . Suppose there exists R ∈ A with R ∈ S. Then, by
property ii) in Definition 6.4, R ∈ Oα ; i.e., R ∈ A ∩ Oα . But this means S is
not the least element of this subset and so there can be no such R. Thus S
is the least element of A.
Finally, suppose R ∈ S where S ∈ O. Again, find Oα large enough so
that S ∈ Oα . Since Oα is an ordinal R ∈ Oα and so R ∈ O. □
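For finite sets built up from the empty set, Definition 6.4 can be tested mechanically, and Theorem 6.5 can be observed on a finite chain. The Python sketch below is an illustration under our own naming (is_ordinal, ordinal); note that for a finite set a total order is automatically a well-order, so the check of condition i) only tests the order axioms for ∈.

```python
from itertools import product


def succ(o):
    return o | frozenset({o})


def ordinal(n):
    o = frozenset()
    for _ in range(n):
        o = succ(o)
    return o


def is_ordinal(o: frozenset) -> bool:
    """Check Definition 6.4 for a finite set o whose elements are frozensets."""
    elems = list(o)
    # i) any two distinct elements are comparable under ∈, and ∈ is transitive on o
    comparable = all(a == b or a in b or b in a
                     for a, b in product(elems, repeat=2))
    transitive = all(a in c for a, b, c in product(elems, repeat=3)
                     if a in b and b in c)
    # ii) every element of an element of o is itself an element of o
    closed = all(x in o for s in elems for x in s)
    return comparable and transitive and closed


chain = [ordinal(k) for k in range(5)]
union = frozenset().union(*chain)

print(is_ordinal(union), union == ordinal(4))
# A set that fails condition ii): it contains the ordinal 2 but not the ordinal 1.
print(is_ordinal(frozenset({ordinal(0), ordinal(2)})))
```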
The ordinal O in Theorem 6.5 is called the limit ordinal of the chain
{Oα }. We obtain our first infinite ordinal by taking the limit of the finite
ordinals listed above. Specifically, let
ω = 1̄ ∪ 2̄ ∪ 3̄ ∪ · · · .
Note that ω ≈ N. This “construction” of the natural numbers is due to John
von Neumann.
Of course, now that we have ω we can use our successor operation to
obtain another list of new ordinals. Given any ordinal O, let O + n̄ denote
the result of applying the successor function n times to O. We then have the
following list of infinite ordinals:
ω, ω + 1̄, ω + 2̄, ω + 3̄, . . . , ω + n̄, . . . .
Taking the limit of these ordinals, we obtain a new ordinal which we call
ω + ω or ω · 2̄. And, having produced ω · 2̄, we can apply the successor
function repeatedly to obtain ω · 2̄ + n̄. Taking the limit ordinal yields ω · 3̄
and so on. Moreover, we can take the limit of all the ordinals obtained by this
process to get ω · ω = ω^2̄ . Exponentials are obtained similarly. To summarize,
we have the following list of ordinals in increasing order of ∈:
φ, 1̄, 2̄, 3̄, . . . , n̄, . . .
ω, ω + 1̄, ω + 2̄, ω + 3̄, . . . , ω + n̄, . . .
ω · 2̄, ω · 2̄ + 1̄, ω · 2̄ + 2̄, ω · 2̄ + 3̄, . . . , ω · 2̄ + n̄, . . .
ω · 3̄, ω · 3̄ + 1̄, ω · 3̄ + 2̄, ω · 3̄ + 3̄, . . . , ω · 3̄ + n̄, . . .
⋮
ω · n̄, ω · n̄ + 1̄, ω · n̄ + 2̄, ω · n̄ + 3̄, . . . , ω · n̄ + n̄, . . .
⋮
ω^2̄ , ω^2̄ + 1̄, ω^2̄ + 2̄, ω^2̄ + 3̄, . . . , ω^2̄ + n̄, . . .
ω^2̄ + ω, ω^2̄ + ω + 1̄, ω^2̄ + ω + 2̄, ω^2̄ + ω + 3̄, . . . , ω^2̄ + ω + n̄, . . .
⋮
ω^3̄ , ω^3̄ + 1̄, ω^3̄ + 2̄, ω^3̄ + 3̄, . . . , ω^3̄ + n̄, . . .
⋮
ω^n̄ , ω^n̄ + 1̄, ω^n̄ + 2̄, ω^n̄ + 3̄, . . . , ω^n̄ + n̄, . . .
⋮
ω^ω , ω^ω + 1̄, ω^ω + 2̄, ω^ω + 3̄, . . . , ω^ω + n̄, . . .
⋮
It will turn out that the ordinals obtainable by the process described
above are actually all the ordinals. In fact, we will ultimately prove that, up
to one-to-one correspondence, every set appears on this list somewhere! To
begin, we prove that every well-ordered set appears on this list.
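The order of the first rows of the list above can be modelled on a computer. In the Python sketch below we represent the ordinal ω · a + b by the pair (a, b) of ordinary natural numbers; this encoding and the helper name are our own assumptions, and they cover only the ordinals appearing before ω · ω. Comparing pairs lexicographically (first by a, then by b) reproduces the order in which these ordinals appear on the list.

```python
def name(a: int, b: int) -> str:
    """A readable name for the ordinal ω·a + b."""
    if a == 0:
        return str(b)
    head = "ω" if a == 1 else f"ω·{a}"
    return head if b == 0 else f"{head} + {b}"


sample = [(1, 0), (0, 3), (2, 1), (0, 0), (1, 2)]

# Python compares tuples lexicographically, which matches the order of the list.
ordered = sorted(sample)
names = [name(a, b) for a, b in ordered]
print(names)
```

In this model every finite ordinal (0, b) comes before ω = (1, 0), even though there are infinitely many of them.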
Actually we prove a little more. Given sets S and T ordered by ≤ and ≤′ ,
respectively, we say a function f : S → T is order-preserving if x1 ≤ x2 in S
implies f (x1 ) ≤′ f (x2 ) in T . An order-preserving one-to-one correspondence
f : S → T is called an isomorphism of ordered sets. In this case, we write
S ≅ T.
Theorem 6.6 Let S be any set well-ordered by a relation ≤ . Then S is
isomorphic to a unique ordinal O. In fact, O is obtained by repeated use of
the successor function and limit process applied to the empty set, as described
above.
Proof. If S is empty there is nothing to prove. Suppose S has one element.
Note that every one element set is trivially well-ordered. Thus we should
prove that there is only one ordinal with one element, namely 1̄ = {φ}. Well
suppose O = {A} is a one element ordinal. Suppose a ∈ A. Then, by the
definition of an ordinal, a ∈ O which is false. Thus A = φ.
Of course, we won’t get far in this step-by-step manner. Thus we shift
gears. Using Theorem 6.5, we show that if our current theorem is true for an
initial segment S(y) of a well-ordered set S then it is true for S(y) ∪ {y}, as
well. Specifically, suppose y ∈ S has the property that for all x < y there is
an isomorphism of S(x) ∪ {x} onto a unique ordinal created by our process.
Let us write Ox for the unique ordinal from our list and fx : S(x)∪{x} → Ox
for the isomorphism. Of course, the ordinals Ox for x < y form a chain: any
collection of ordinals created by our process forms a chain! Thus let
O = ⋃ Ox (the union taken over all x < y).
By Theorem 6.5, O is an ordinal. We define an isomorphism f : S(y) → O
as follows. Given x ∈ S(y) since x < y we can use the isomorphism fx and
set f (x) = fx (x). The point here is that the functions fx are all compatible.
That is, t ≤ x implies ft = fx on S(t) ∪ {t} since both are order-preserving
one-to-one correspondences. We can now extend f to an isomorphism g :
S(y) ∪ {y} → O ∪ {O} by setting g(x) = f (x) for x ∈ S(y) and g(y) = O.
It is obvious that g is an isomorphism. Moreover, since each of the Ox was a
unique ordinal it is clear that the ordinal O ∪ {O} is also unique.
We can now complete the proof with one trick. Define a subset R of S by
R = {y ∈ S | for all x ≤ y, S(x) ∪ {x} ≅ O for some unique ordinal O obtainable by our process}.
We wish to prove R = S.
First observe that R ≠ φ. For, if s0 is the least element in S then trivially
s0 ∈ R by the first paragraph of the proof. Now suppose, for a contradiction, that R ≠ S. Using Exercise 13, we see that, by its very definition, R
is an initial segment of S. Write R = S(y0 ) for some y0 ∈ S. Recall that
y0 ∉ R. Using the second paragraph of the proof, we can find an isomorphism
of S(y0 ) ∪ {y0 } onto a unique ordinal O. But this means y0 ∈ R = S(y0 ), a
contradiction. We conclude that R = S, as needed. □
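For a finite well-ordered set the isomorphism of Theorem 6.6 can be written down directly. The Python sketch below is an illustration under our own naming (iso_to_ordinal) and with a hypothetical sample set; it sends each element x to the set of images of the elements of S(x), which mirrors the step in the proof where y is sent to the ordinal built from its initial segment.

```python
def iso_to_ordinal(S):
    """Map each x in a finite ordered set S to the ordinal {f(t) | t < x}."""
    f = {}
    for x in sorted(S):  # increasing order, so f(t) is known for all t < x
        f[x] = frozenset(f[t] for t in S if t < x)
    return f


S = {10, 20, 35}
f = iso_to_ordinal(S)

zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})

# The image of S is the ordinal with three elements ...
print(frozenset(f.values()) == frozenset({zero, one, two}))
# ... and x < y in S corresponds to f(x) ∈ f(y).
print(all(f[x] in f[y] for x in S for y in S if x < y))
```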
Corollary 6.7 Let O be any ordinal. Then O is obtained by repeated use of
the successor function and limit process applied to the empty set.
Proof. This is a direct consequence of the uniqueness in Theorem 6.6. □
Corollary 6.8 Let S and T be any two well-ordered sets. Then either
|S| ≤ |T | or |T | ≤ |S|.
Proof. By Theorem 6.6, S ≅ O1 and T ≅ O2 for some ordinals O1 , O2 .
But the ordinals are clearly ordered by ⊆ by our construction. Thus either
O1 ⊆ O2 or O2 ⊆ O1 . Since set inclusion determines a one-to-one function,
the result follows from the transitivity of inequality of cardinal numbers. □
With Corollary 6.8, we are one step away from completing our goal. All
we need to know now is that every set can be well-ordered by some order
relation ≤. This remarkable fact is known as the Well-Ordering Theorem.
We give the proof in Appendix C.
Theorem 6.9 Given any set S there exists an order relation ≤ such that
S is well-ordered by ≤ . □
With the Well-Ordering Theorem, Theorem 6.6 and Corollary 6.8 now
hold for all sets.
Corollary 6.10 Given any set S there exists an ordinal O such that
S ≈ O. □
Corollary 6.11 Given any pair of sets S, T either |S| ≤ |T | or |T | ≤ |S|. □
It is interesting to consider what a well-ordering of the reals R would look
like. What would be the least element? How would the order be defined?
Unfortunately, the proof of the Well-Ordering Theorem doesn’t give much of
a clue.
A natural question arises from Corollary 6.10; namely, which ordinal on
our list corresponds to R? Answering this question amounts to resolving the
Continuum Hypothesis.
We conclude this section by defining the cardinal numbers. Remember
that we would like to have exactly one cardinal number for each set size.
Thus we define the cardinal numbers to be those ordinals which occur first
on our ordered list for each given cardinality. To be precise, we make
6.12 Definition. A cardinal number is an ordinal O with the property
that, if O′ is another ordinal with O′ ∈ O, then |O′ | < |O|.
Thus the first cardinal number is 0̄ = φ, the second is 1̄ = {φ}. Each
of the ordinals n̄ are cardinal numbers; these are called the finite cardinal
numbers. We may write |Nn | = n̄. Of course, it is more natural to drop the
overline notation and write n = 0, 1, 2, . . . for the finite cardinals.
The next cardinal number is the limit ordinal ω. It is customary to denote
this cardinal number by ℵ0 . From Sections 3 and 4, we have
|N| = |Z| = |Q| = |A| = ℵ0 .
Now note that ω + n̄ for all n ∈ N is still a countable set since the finite
union of countable sets is countable. In fact, by Theorem 4.1, ω · n̄ is countable since it is a countable union of countable sets. Of course, we know that,
at some point, we will encounter an uncountable ordinal on our list. Just
when, is a very good question! Let ℵ1 denote this first uncountable ordinal.
We can restate Cantor’s Continuum Hypothesis as:
6.13 The Continuum Hypothesis. |R| = ℵ1 .
By Cantor’s Theorem (Theorem 5.2), given any set A there exists a set,
namely the power set, P (A) with strictly larger cardinality. Using Corollary
6.10, we see that to each ordinal O we can assign a cardinal number ℵO so
that the ℵO are in strictly increasing order of cardinality. The list of cardinal
numbers thus begins
0, 1, 2, . . . , ℵ0 , ℵ1 , ℵ2 , . . . , ℵω , ℵω+1̄ , ℵω+2̄ , . . . .
The Generalized Continuum Hypothesis can be restated as
6.14 The Generalized Continuum Hypothesis. For any ordinal O,
|P (ℵO )| = ℵO+1̄ .
As a final note we observe that the results of this section add real meaning
to the statement, proved in §5, that there are infinitely many sizes of infinity.
Consider, for example, the meaning of ℵω . This is not just an uncountable
set. Assuming the Generalized Continuum Hypothesis, this is an uncountable set strictly larger than any set constructible by repeatedly applying the power set operation to N. We have thus proven
the existence of sets so large that they cannot be constructed directly, by any
obvious operation, from our number sets. Now consider the meaning of ℵℵ1
or ℵℵ2 or ℵℵω . . .
Appendix A. Inverses and Composition of Functions.
The purpose of this section is to prove that the relation A ≈ B (one-to-one correspondence of sets) is an equivalence relation. We will need two
concepts from high-school algebra: inverses and composition of functions.
A.1 Definitions. Let f : A → B and g : B → C be functions between
sets A, B and C. The composite function g ◦ f : A → C is defined by the
rule g ◦ f (x) = g(f (x)) for all x ∈ A.
Let f : A → B and g : B → A be functions. We say f and g are inverse
functions if g ◦ f (x) = x for all x ∈ A and f ◦ g(y) = y for all y ∈ B.
A.2 Example. Define f : (1, +∞) → (−∞, 0) by f (x) = 1/(1 − x). Check
that g : (−∞, 0) → (1, +∞) defined by g(y) = (y − 1)/y is an inverse function for
f.
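The check requested in Example A.2 can be spot-tested with exact rational arithmetic. The Python sketch below reads the example as f (x) = 1/(1 − x) and g(y) = (y − 1)/y; the sample points are our own choice, and a handful of samples is of course no substitute for the algebraic verification.

```python
from fractions import Fraction


def f(x):
    """f : (1, +∞) → (−∞, 0), f(x) = 1/(1 − x)."""
    return 1 / (1 - x)


def g(y):
    """g : (−∞, 0) → (1, +∞), g(y) = (y − 1)/y."""
    return (y - 1) / y


xs = [Fraction(3, 2), Fraction(2), Fraction(10)]       # sample points in (1, +∞)
ys = [Fraction(-1, 2), Fraction(-1), Fraction(-7)]     # sample points in (−∞, 0)

g_after_f = all(g(f(x)) == x for x in xs)
f_after_g = all(f(g(y)) == y for y in ys)
print(g_after_f, f_after_g)
```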
The following theorem shows the connection between the existence of inverse functions and one-to-one correspondences.
Theorem A.3 Let f : A → B be any function. Then f is a one-to-one
correspondence if and only if f has an inverse function g : B → A. Moreover, when it exists the inverse function g is unique.
Proof. We must prove two implications: 1) that f being a one-to-one correspondence implies there exists an inverse g and 2) that the existence of an inverse
g implies f is a one-to-one correspondence.
So suppose f : A → B is a one-to-one correspondence. Define g : B → A
by the following rule: given y ∈ B define g(y) = x where x ∈ A is any
element with the property that f (x) = y. Note that this is a fairly indirect
way to define a function. We must make sure, first and foremost, that our
definition truly assigns a uniquely determined output x ∈ A to each y ∈ B;
that is, we must make sure g is a function! Well, observe first that since f is
onto, for each y ∈ B there will be at least one x ∈ A with f (x) = y. On the
other hand, since f is one-to-one there will be at most one element x ∈ A
with f (x) = y. Thus g is a function. By construction, g is an inverse for f .
A moment’s thought shows that g could not be defined any other way: thus
g is the unique inverse for f .
The second (converse) implication is more straightforward. Suppose g :
B → A is an inverse function for f : A → B. Let x1 , x2 ∈ A be given and suppose f (x1 ) = f (x2 ). Then, applying g to both sides and using the fact that g
is an inverse for f we obtain x1 = g(f (x1 )) = g(f (x2 )) = x2 : f is one-to-one!
For onto, take any y ∈ B. Let x = g(y) ∈ A. Note that y = f (g(y)) = f (x),
as needed. □
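For functions between finite sets, Theorem A.3 translates into a short computation. In the Python sketch below a function is stored as a dictionary, and the name inverse is our own; the code refuses to build an inverse unless the function is both one-to-one and onto, just as in the first half of the proof.

```python
def inverse(f: dict, codomain: set) -> dict:
    """Return the inverse of f : A → B (a dict), where B is the given codomain.

    Raises ValueError if f is not a one-to-one correspondence.
    """
    values = list(f.values())
    if len(set(values)) != len(values):
        raise ValueError("f is not one-to-one")
    if set(values) != set(codomain):
        raise ValueError("f is not onto")
    # Each y in B has exactly one x with f(x) = y, so g(y) = x is well defined.
    return {y: x for x, y in f.items()}


f = {1: "a", 2: "b", 3: "c"}
g = inverse(f, {"a", "b", "c"})

left = all(g[f[x]] == x for x in f)    # g ◦ f (x) = x for all x in A
right = all(f[g[y]] == y for y in g)   # f ◦ g(y) = y for all y in B
print(left, right)
```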
Theorem A.4 Let f : A → B and g : B → C be any functions. If both f
and g are one-to-one (respectively, onto) then so is the composite function
g ◦ f : A → C. In particular, the composition of one-to-one correspondences
is itself a one-to-one correspondence.
Proof. This proof is an easy but important exercise with definitions. Suppose
first that both f and g are one-to-one. Let x1 , x2 ∈ A and suppose
g ◦ f (x1 ) = g ◦ f (x2 ).
Then, by definition of the composition, g(f (x1 )) = g(f (x2 )). Since g is one-to-one we conclude that f (x1 ) = f (x2 ). But now, f is one-to-one also, and
so we obtain x1 = x2 , as needed.
Next, suppose f and g are both onto. Let z ∈ C be any element. Then,
since g is onto, there exists y ∈ B with g(y) = z. Moreover, since f is onto,
we can find x ∈ A with f (x) = y. Now observe that
g ◦ f (x) = g(f (x)) = g(y) = z,
as needed. □
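Theorem A.4 can likewise be observed on small examples. The Python sketch below stores functions as dictionaries; the helper names and the sample functions are our own and serve only as an illustration.

```python
def compose(g: dict, f: dict) -> dict:
    """The composite g ◦ f, defined by (g ◦ f)(x) = g(f(x))."""
    return {x: g[f[x]] for x in f}


def is_one_to_one(f: dict) -> bool:
    return len(set(f.values())) == len(f)


def is_onto(f: dict, codomain) -> bool:
    return set(f.values()) == set(codomain)


f = {1: "a", 2: "b", 3: "c"}            # f : A → B
g = {"a": "x", "b": "y", "c": "z"}      # g : B → C
gf = compose(g, f)                      # g ◦ f : A → C

print(is_one_to_one(gf), is_onto(gf, {"x", "y", "z"}))
```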
We have done the hard work for the proof of our main result: that one-to-one correspondence defines an equivalence relation.
Theorem 2.4 Let S be any set of sets. The relation ≈ of one-to-one correspondence is an equivalence relation on S.
Proof. To prove reflexivity, we must take any set A and prove there is a
one-to-one correspondence from A to itself. The identity function i : A → A
defined by i(x) = x for all x ∈ A is clearly one-to-one and onto, as needed.
For symmetry, assume that there is a one-to-one correspondence f : A →
B. We must produce a one-to-one correspondence g : B → A. By the first
implication of Theorem A.3, f has an inverse g : B → A. And by the second
implication of Theorem A.3, g is a one-to-one correspondence since it has an
inverse, namely f ! We have symmetry.
Finally, for transitivity, suppose f : A → B and g : B → C are one-to-one
correspondences. We must prove A ≈ C. But, Theorem A.4 gives us that
g ◦ f : A → C is a one-to-one correspondence. This completes the proof. □
Appendix B. The Schroeder-Bernstein Theorem.
In preparation for the proof of Theorem 2.6, we make the following notation convention. Let f : A → B be any function. Given a subset U ⊆ A
let
f (U ) = {f (u)|u ∈ U };
f (U ) is called the image of the subset U in B under f . Note that a one-to-one function f : A → B always gives a one-to-one correspondence between
its domain A and its image f (A). By reducing the range, we force f to be
onto. In fact, for any subset U ⊆ A, a one-to-one function f : A → B
determines a one-to-one correspondence between U and f (U ). In this case,
the function is called the restriction of f to U , written f|U : U → f (U ). The
definition is f|U (u) = f (u) for u ∈ U. We will use this fact in the proof.
Theorem 2.6 (Schroeder-Bernstein) Suppose f : A → B and g : B → A are
one-to-one functions. Then there exists a one-to-one correspondence between
A and B.
Proof. The proof of this theorem requires some real cleverness. The basic idea, however, is fairly straightforward. Given a subset U ⊆ A we push it
over to B, via f to obtain f (U ). We then look at the “complement” of that
subset B − f (U ) in B. We then apply g to this complement to obtain a
subset g(B − f (U )) of A. Now suppose that, by some miracle, g(B − f (U ))
is nothing other than the complement of our original set U in A; that is,
g(B − f (U )) = A − U. Then, since f is one-to-one, the restriction of f to U
defines a one-to-one correspondence
f|U : U → f (U ).
And since g is one-to-one, g defines a one-to-one correspondence
g|B−f (U ) : B − f (U ) → g(B − f (U )) = A − U.
Now, by Theorem A.3, since g|B−f (U ) is a one-to-one correspondence it has an
inverse which is also a one-to-one correspondence. Let h : A − U → B − f (U )
denote this inverse function. Combining f|U and h we obtain a one-to-one
correspondence from all of A onto all of B. Specifically, we define F : A → B
by
F (x) = f (x) if x ∈ U,
F (x) = h(x) if x ∈ A − U.
It is clear that F is the needed one-to-one correspondence.
The only problem now is that we must prove that there is some subset,
let’s call it U0 to distinguish it, with the property that
g(B − f (U0 )) = A − U0 . The remainder of the proof is devoted to producing
this subset U0 ⊆ A.
Given any subset U ⊆ A define a subset Φ(U ) ⊆ A by
Φ(U ) = A − g(B − f (U )).
Note that Φ defines a function Φ : P (A) → P (A). Our search for U0 is really
a search for a “fixed point” of this function. Specifically, if we find U0 ⊆ A
with Φ(U0 ) = U0 then A − g(B − f (U0 )) = U0 . Taking the complement in A
of both sides, we obtain g(B − f (U0 )) = A − U0 which is what we want.
We next make a key observation about the function Φ. Given any subsets
U, V ⊆ A, if U ⊆ V, then Φ(U ) ⊆ Φ(V ). This fact is really a direct consequence of the “double negative” in the definition of Φ(U ). A one sentence
proof reads: if U ⊆ V then f (U ) ⊆ f (V ) and so B − f (V ) ⊆ B − f (U ) which
implies g(B − f (V )) ⊆ g(B − f (U )) which, in turn, gives A − g(B − f (U )) ⊆
A − g(B − f (V )).
Now define a subset 𝒰 of the power set of A by
𝒰 = {U ⊆ A | U ⊆ Φ(U )}.
Thus 𝒰 is the set of all subsets which “halfway” satisfy our desired condition.
Note that 𝒰 is a nonempty subset of P (A) since φ ∈ 𝒰. We want U0 to be
the largest possible set in 𝒰. Thus we let U0 be the union of all the sets in
𝒰:
U0 = ⋃ U (the union taken over all U ∈ 𝒰).
We must prove that Φ(U0 ) = U0 .
We first show U0 ⊆ Φ(U0 ). Let x ∈ U0 be any element. Then x ∈ U for
some subset U ∈ 𝒰. But since U ∈ 𝒰, U ⊆ Φ(U ). Thus x ∈ Φ(U ). On the
other hand, U ⊆ U0 (U0 contains all the subsets in 𝒰 since it is the union of
them) and so Φ(U ) ⊆ Φ(U0 ) by the result of the preceding paragraph. Thus
x ∈ Φ(U0 ) and so U0 ⊆ Φ(U0 ).
We now sneak out the inclusion Φ(U0 ) ⊆ U0 for free. Since U0 ⊆ Φ(U0 )
our observation about Φ gives Φ(U0 ) ⊆ Φ(Φ(U0 )). But this means Φ(U0 )
satisfies the condition for membership in our collection 𝒰. Thus Φ(U0 ) ⊆ U0 ,
since U0 is the union of all the sets in our collection.
This completes the proof. □
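The fixed-point step of this proof can be checked by brute force on small finite sets. One caveat: between finite sets, one-to-one functions in both directions are automatically onto, which makes the theorem itself uninteresting there. The Python sketch below therefore uses two arbitrary (not one-to-one) sample functions, chosen by us, and illustrates only the part of the argument that never uses one-to-one-ness: that Φ preserves inclusions and that the union U0 of all U with U ⊆ Φ(U ) satisfies Φ(U0 ) = U0 . All names in the code are our own.

```python
from itertools import combinations


def image(func, subset):
    """The image func(U) = {func(u) | u in U}."""
    return frozenset(func[u] for u in subset)


def subsets(A):
    items = sorted(A)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def phi(U, A, B, f, g):
    """Φ(U) = A − g(B − f(U))."""
    return A - image(g, B - image(f, U))


A = frozenset({0, 1, 2, 3})
B = frozenset({"a", "b", "c"})
f = {0: "a", 1: "a", 2: "b", 3: "b"}   # hypothetical f : A → B
g = {"a": 0, "b": 1, "c": 1}           # hypothetical g : B → A

# The collection of all U with U ⊆ Φ(U), and its union U0.
family = [U for U in subsets(A) if U <= phi(U, A, B, f, g)]
U0 = frozenset().union(*family)

monotone = all(phi(U, A, B, f, g) <= phi(V, A, B, f, g)
               for U in subsets(A) for V in subsets(A) if U <= V)
fixed = phi(U0, A, B, f, g) == U0
print(monotone, fixed, sorted(U0))
```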
Appendix C. The Well-Ordering Theorem and the Axiom of Choice.
The purpose of this section is to prove Theorem 6.9: that every set can
be well-ordered. The Well-Ordering Theorem is a fundamental, global result
about the nature of sets. It is not surprising then that the proof requires
some crucial assumption about sets beyond our rather unhelpful definition
(Definition 1.1). In fact, we will prove that the Well-Ordering Theorem is
equivalent to a famously controversial axiom of set theory known as the
Axiom of Choice.
The motivation for the Axiom of Choice is simple. Given any set of
nonempty sets, we would like to be able to “choose” or distinguish a particular
element from each set. We don’t care what the chosen element is – we just
need to be able to fix one element in each set in the collection. While this
axiom may sound pretty modest, the Axiom of Choice has been shown to
lead to some startling consequences.
To make the statement precise, we think of the set of sets as the power
set of some large set S and the choice of an element in each subset of S as a
function F : P (S) → S. We have
C.1 The Axiom of Choice. Given any set S there exists a function
F : P (S) → S with the property that, if A ⊆ S is nonempty then F (A) ∈ A.
The function F is called a choice function.
We first prove that the Well-Ordering Theorem implies the Axiom of
Choice.
Theorem C.2 Given any set S if there exists a relation ≤ on S which
well-orders S then the Axiom of Choice is true for S.
Proof. The proof is easy. Given any nonempty subset A ⊆ S, by the
definition of the well-order ≤, there exists a least element a0 ∈ A. Define
F (A) = a0 . Since it doesn’t matter anyway, define F (φ) to be any element
of S. We then have our needed choice function F : P (S) → S. □
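On a finite set the proof of Theorem C.2 becomes a one-line recipe: choose the least element. The Python sketch below is an illustration only; the sample set of words and the well-order (by length, then alphabetically) are our own assumptions, and S is assumed nonempty so that F (φ) can be defined.

```python
from itertools import combinations


def choice_function(S, key):
    """Build F : P(S) → S from a well-order on S given by a sort key."""
    S = set(S)

    def F(A):
        A = set(A)
        # For nonempty A choose its least element; F(φ) may be any element of S.
        return min(A if A else S, key=key)

    return F


S = {"pear", "fig", "apple", "kiwi"}


def order(word):
    """Hypothetical well-order on S: shorter words first, ties broken alphabetically."""
    return (len(word), word)


F = choice_function(S, order)

nonempty = [set(c) for r in range(1, len(S) + 1)
            for c in combinations(sorted(S), r)]
chooses_member = all(F(A) in A for A in nonempty)
print(chooses_member, F({"pear", "apple"}))
```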
The point of the proof above is that a well-order on a set S automatically distinguishes an element, namely the least element, of every nonempty
subset of S. The idea for the proof of the Well-Ordering Theorem is to try
to reverse this logic and use our choices to produce an order on S. We could
begin by letting the minimal element s0 of S be the “chosen” element of the
subset S of S. We could then let the next element in order be the “chosen”
element of the subset S − {s0 } of S etc. The difficulty in the proof is pushing this step-by-step process through, so that all of S is eventually ordered.
(Remember, S may not be countable!) We need to make one definition and
deduce one consequence of our work in §6. Recall that given an element x in
a well-ordered set S, the initial segment S(x) is defined by S(x) = {y ∈ S | y < x}.
C.3 Definition. Given well-ordered sets S and T a function f : S → T
is called a well-order preserving function if f is order-preserving, one-to-one
and the image set f (S) is an initial segment of T .
It is easy to see that a well-order preserving function f : S → T carries
the least element of S to the least element of T , the second element of S to
that of T , etc. In fact, we have
Exercise 15.
Given a well-order preserving function f : S → T prove
that, for any x ∈ S, f (S(x)) = S(f (x)) : the image of the initial segment of
x in S under f is the initial segment of f (x) in T .
Using Theorem 6.6, we prove
Theorem C.4 Given any two well-ordered sets S and T there is either a
well-order preserving function f : S → T or a well-order preserving function
g : T → S.
Proof. By Theorem 6.6 we know that there are order preserving one-to-one
correspondences f1 : S → O1 and f2 : T → O2 between S and T and unique
ordinals O1 and O2 . Note that order-preserving one-to-one correspondences
between well-ordered sets are automatically well-order preserving. By our
construction of the ordinals, either O1 is an initial segment of O2 or O2 is an
initial segment of O1 . Without loss of generality, let us assume the former is
the case. Let g2 : O2 → T be the inverse for f2 . It is easy to see that g2 is
well-order preserving. Moreover, the composition g2 ◦ f1 : S → T is well-order
preserving, as well. □
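In the finite case the ordinal attached to a well-ordered set is just its number of elements, and the comparison map of Theorem C.4 pairs off the elements position by position. The following sketch assumes finite well-ordered sets given as increasing Python lists; the name `comparison_map` is hypothetical and introduced only for this illustration.

```python
def comparison_map(S_order, T_order):
    """Return (direction, map) for two finite well-ordered sets.

    Each set is a list in increasing order. The shorter list is mapped
    onto an initial segment of the longer one, position by position.
    """
    if len(S_order) <= len(T_order):
        return "S -> T", dict(zip(S_order, T_order))
    return "T -> S", dict(zip(T_order, S_order))


direction, h = comparison_map(["a", "b"], [10, 20, 30])
print(direction, h)     # S -> T {'a': 10, 'b': 20}

direction2, h2 = comparison_map([10, 20, 30], ["a", "b"])
print(direction2, h2)   # T -> S {'a': 10, 'b': 20}
```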
We are now prepared to prove
Theorem 6.9 Given any set S for which the Axiom of Choice is true, there
exists a relation ≤ on S which well-orders S.
Proof. Since we are assuming the Axiom of Choice, let F : P (S) → S
be a choice function. Our method will be to consider all subsets T ⊆ S
which can be well-ordered using the strategy above. Applying our standard
trick, we then union these subsets together and show that they must be all
of S.
Suppose T ⊆ S has a well-order ≤. It is unlikely that our choice function
agrees with the order on T in the sense of the proof of Theorem C.3, above.
That is, given x ∈ T, x may not be the element of S “chosen” by F from the
complement of the set of all y in T with y < x; i.e., x may differ from F (S − S(x)), where here
S(x) is the initial segment of x in T . We will say that a well-order ≤ on a
subset T ⊆ S is good if for each x ∈ T, x was the “chosen” element of the
complement; that is, if
F (S − S(x)) = x
for all x ∈ T.
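The condition F (S − S(x)) = x is easy to test on a finite example. In the sketch below, which is an illustration under stated assumptions rather than part of the proof, a well-order on a subset T of S is given as a list in increasing order, the name `is_good` is hypothetical, and the choice function is assumed to be `min`.

```python
def is_good(order, S, F):
    """Test whether the well-order `order` on a subset of S is good for F.

    `order` lists the elements of the subset T in increasing order, so the
    initial segment S(x) of the i-th element x is the set of the first i entries.
    """
    S = frozenset(S)
    for i, x in enumerate(order):
        segment = frozenset(order[:i])     # S(x), the initial segment of x in T
        if F(S - segment) != x:            # x must be the element chosen from S − S(x)
            return False
    return True


S = {1, 2, 3, 4}
F = min    # assumed choice function: pick the smallest remaining number

print(is_good([1, 2], S, F))   # True
print(is_good([2, 1], S, F))   # False: F(S) is 1, not 2
print(is_good([1, 3], S, F))   # False: F(S − {1}) is 2, not 3
```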
We first prove the compatibility of all good well-orders. Suppose that T1
has a good well-order ≤1 and T2 has a good well-order ≤2 . We claim that
either T1 is an initial segment of T2 and ≤1 is the order on T1 inherited from
the order on T2 or vice-versa.
We use Theorem C.4. Reversing the roles of T1 and T2 if necessary, let
f : T1 → T2 be a well-order preserving function. Consider the subset X ⊆ T1
consisting of all x with the property that f (x) = x. If X is all of T1 then our
claim is proven. So suppose T1 − X is nonempty. Let x0 ∈ T1 − X be the
least element of this subset of T1 . Since f is well-order preserving, by Exercise
15 given any x ∈ T1 , f (S(x)) = S(f (x)) in T2 . Note that if x <1 x0 then
f (x) = x by our choice of x0 . Thus the initial segment of x0 in T1 is exactly
the same as the initial segment of f (x0 ) in T2 . We have S(x0 ) = S(f (x0 )).
Taking complements, we obtain
S − S(x0 ) = S − S(f (x0 ))
But now, since ≤1 and ≤2 are good well-orders we have
x0 = F (S − S(x0 )) = F (S − S(f (x0 ))) = f (x0 ).
Thus x0 ∈ X. This contradiction implies X = T1 which proves the claim.
We are now in a position to apply our limiting technique. Let 𝒯 be the
set of all subsets Tα ⊆ S with the property that Tα has a good well-order ≤α .
Let R be the union of all the subsets in 𝒯 . We define an order ≤ on R as
follows. Given x1 , x2 ∈ R choose Tα large enough so that x1 , x2 ∈ Tα . We
then define x1 ≤ x2 if and only if x1 ≤α x2 . (By the compatibility of good well-orders, this does not depend on the choice of Tα .) The proof that ≤ well-orders R
now follows from the fact that {Tα } is a chain, just as in the proof of
Theorem 6.5.
Let us prove that ≤ is a good well-order on R. Given x ∈ R, choose Tα
such that x ∈ Tα . By the compatibility of the good well-ordered sets,
Tα is an initial segment of R. Thus the initial segment of x in R equals the
initial segment of x in Tα . Since ≤α is a good well-order on Tα , we have
x = F (S − S(x)),
as needed.
Finally, suppose R ≠ S. Then we can use our choice function to pick the
next element. Specifically, let y = F (S − R) so that y ∈ S − R. We can
define a good well-order on R ∪ {y} by using the existing good well-order on
R together with the rule that x ≤ y for all x ∈ R. (This order is good at y because the initial segment of y is R and y = F (S − R).) Since R is the union of
all subsets of S with good well-orders, R ∪ {y} ⊆ R. But this contradicts
the fact that y ∉ R. We conclude that R = S, and so S does indeed have a
(good) well-order. □
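On a small finite set the last steps of the proof can be verified by brute force: list every good well-order on every subset, check that any two are compatible, and take the union R. The sketch below is only an illustration. The helper names are hypothetical, and the choice function is assumed to be `max`, so the resulting well-order is the reverse of the usual one.

```python
from itertools import permutations


def is_good(order, S, F):
    """A well-order (a sequence in increasing order) is good if F(S − S(x)) = x for each x."""
    S = frozenset(S)
    return all(F(S - frozenset(order[:i])) == x for i, x in enumerate(order))


def good_orders(S, F):
    """All good well-orders on subsets of the finite set S, shortest first."""
    S = frozenset(S)
    return [list(p)
            for k in range(len(S) + 1)
            for p in permutations(S, k)
            if is_good(p, S, F)]


def is_initial_segment(short, long):
    """True if the ordered list `short` is a prefix of `long`."""
    return long[:len(short)] == short


S = {1, 2, 3}
F = max    # assumed choice function: pick the largest remaining number

orders = good_orders(S, F)
print(orders)   # [[], [3], [3, 2], [3, 2, 1]]

# Compatibility: any two good well-orders are initial segments of one another.
is_chain = all(is_initial_segment(a, b) or is_initial_segment(b, a)
               for a in orders for b in orders)
print(is_chain)   # True

# R, the union of all good well-ordered subsets, is all of S.
R = set().union(*orders)
print(R == S)     # True
```

For each size there is exactly one good well-order, which is the finite shadow of the compatibility claim: the good well-orders form a chain whose union is S.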
Appendix D. The Zermelo-Fraenkel Axioms for Set Theory.
Throughout, capital letters A, B etc. will denote sets, calligraphic letters
𝒜, ℬ will denote sets of sets and lowercase letters x, y, z, etc. will denote
objects.
1. Axiom of Extensionality. Two sets A and B are equal if and only
if for all objects x, x ∈ A if and only if x ∈ B.
2. Axiom of the Null Set. There exists a set φ such that for all objects x, x ∉ φ.
3. Axiom of Unordered Pairs. For every pair of objects x, y there exists
a set A such that z ∈ A if and only if z = x or z = y.
4. Axiom of the Union. Given any collection 𝒜 of sets there exists a
set B such that b ∈ B if and only if b ∈ A for some A ∈ 𝒜.
5. Axiom of Infinity. There exists a set 𝒜 such that φ ∈ 𝒜 and such
that, if A ∈ 𝒜, then A ∪ {A} ∈ 𝒜, as well.
6. Axiom of Replacement. Let 𝒜 be any collection of sets and let Φ
be a function defined on each set A ∈ 𝒜. Then, for all A ∈ 𝒜, the image set
Φ(A) is itself a set.
7. Axiom of the Power Set. For every set A there exists a set P (A)
such that B ∈ P (A) if and only if B ⊆ A.
8. Axiom of Choice. For every set A there exists a function f : P (A) → A such that
for all nonempty B ∈ P (A), f (B) ∈ B.
9. Axiom of Regularity. For every nonempty set A there exists an element x ∈ A such that if y ∈ x then y ∉ A.
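Several of these axioms can be modelled concretely on finite sets, using Python's immutable `frozenset` so that sets can be elements of other sets. The sketch below is an illustration under that assumption only: all function names are hypothetical, only sets built up from the null set are considered, and the infinite set asserted by the Axiom of Infinity cannot itself be constructed this way, only the step A ↦ A ∪ {A}.

```python
from itertools import combinations

EMPTY = frozenset()                          # Axiom 2: the null set


def pair(x, y):
    """Axiom 3: the unordered pair {x, y}."""
    return frozenset({x, y})


def union(collection):
    """Axiom 4: all b with b in A for some A in the collection."""
    return frozenset(b for A in collection for b in A)


def power_set(A):
    """Axiom 7: the set of all subsets of A."""
    elems = list(A)
    return frozenset(frozenset(c)
                     for r in range(len(elems) + 1)
                     for c in combinations(elems, r))


def successor(A):
    """The step A -> A ∪ {A} used in Axiom 5."""
    return A | frozenset({A})


def regular_witness(A):
    """Axiom 9: an element x of A none of whose elements lies in A, or None."""
    for x in A:
        if not any(y in A for y in x):
            return x
    return None


one = successor(EMPTY)     # {φ}
two = successor(one)       # {φ, {φ}}

print(len(two))                           # 2
print(len(power_set(two)))                # 4
print(union(pair(one, two)) == two)       # True
print(regular_witness(two) == EMPTY)      # True
```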