Section 13: Zorn’s Lemma and Applications

There are several basic axioms of set theory on which mathematics can be built; these can be found in many sources. A less basic possible axiom is the following, which seems obviously true, at least in the finite case.
Axiom of Choice: For every family S_j, j ∈ I, of non-empty sets, there exists a set having precisely one element from each of the S_j. (When the S_j may overlap, this is best phrased as: there is a function c on I with c(j) ∈ S_j for every j ∈ I.)
I is an index set, which could be extremely large, e.g. ℝ. This axiom is very useful in modern mathematics, e.g. in analysis and topology. Unfortunately, it has “unpleasant” consequences, e.g. the Banach-Tarski paradox. So many mathematicians consider it preferable to prove things without using it if at all possible.
From the axiom of choice we can prove the following extremely useful “fact”.
Zorn’s Lemma: Suppose a poset (P, ≤) is such that every chain C ⊆ P (meaning totally ordered subset) has an upper bound, i.e. there exists an element u ∈ P such that a ≤ u ∀a ∈ C. Then P has a maximal element (i.e. an element of P with nothing strictly bigger than it in P).
Crude Sketch Proof. First of all note that the result is trivial in the case of a finite poset,
where maximal elements always exist. (If one can keep on making an element bigger, the
poset must be infinite!) How about the general case?
Suppose Zorn’s Lemma is false. Then there is a poset (P, ≤) such that every chain has
an upper bound, but there are no maximal elements.
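For each chain C, let b(C) ∈ P be such that b(C) > some upper bound of C (this exists as none of the upper bounds can be maximal; making such a choice for every chain at once is where the Axiom of Choice is used). Now choose a₀ ∈ P, and set a₁ = b({a₀}), so a₁ > a₀; then a₂ = b({a₀, a₁}), so a₂ > a₁ > a₀, etc. This gives a chain C₀, so b(C₀) exists. Repeat the process using this element as the starting point to generate a longer chain, which again has an upper bound. One can use a notion called “transfinite induction” to continue this process through all the ordinals, taking unions of the chains built so far at limit stages. The process never stops (every chain built has an upper bound, so b can always be applied again), so it would produce more distinct elements than the set P contains (some ordinal is too large to be mapped one-to-one into P), which is a contradiction.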
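The finite case mentioned at the start of this sketch can be made concrete. The following is a minimal Python sketch, assuming a finite poset given as a list of elements together with a comparison function; the helper names `climb_to_maximal` and `divides`, and the example poset (the divisors of 60 other than 60 itself, ordered by divisibility), are our own illustrative choices, not part of the notes. It keeps replacing the current element by a strictly bigger one; the elements visited form a strictly increasing chain, so none repeats, and in a finite poset the loop must stop, exactly at a maximal element.

```python
# Illustrative sketch (our own example): the finite case of Zorn's Lemma.

def climb_to_maximal(elements, leq, start):
    """Walk up a strictly increasing chain start < x1 < x2 < ... in a finite poset.

    elements: the poset's elements; leq(x, y): the partial order x <= y.
    Returns an element with nothing strictly bigger than it in the poset.
    """
    current = start
    while True:
        # All elements strictly above the current one.
        bigger = [x for x in elements if x != current and leq(current, x)]
        if not bigger:
            return current  # nothing is strictly bigger: current is maximal
        current = bigger[0]  # any strictly bigger element would do


def divides(a, b):
    """The divisibility order: a <= b iff a divides b."""
    return b % a == 0


proper_divisors_of_60 = [d for d in range(1, 60) if 60 % d == 0]
print(climb_to_maximal(proper_divisors_of_60, divides, 1))  # 12, via 1 < 2 < 4 < 12
print(climb_to_maximal(proper_divisors_of_60, divides, 5))  # 20, via 5 < 10 < 20
```

Different starting points can reach different maximal elements (here 12, 20 and 30 are all maximal), in line with Zorn’s Lemma asserting existence, not uniqueness.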
In fact Zorn’s Lemma is equivalent to the Axiom of Choice (if we assume all the other
axioms of set theory, which are less “controversial”).
As an aside, we note that both are also equivalent to the following.
Well-Ordering Axiom: Every set can be well-ordered, meaning it can be given a total order (x ≤ y or y ≤ x ∀x, y) in which every non-empty subset has a least element.
This means that the elements of any set (however large) can be labelled a₀, a₁, a₂, ..., at least at the beginning! This seems counter-intuitive: how could one do this for the reals ℝ?
Example 1: Rings.
Let R be a ring. Recall that I ⊆ R is an ideal if I is a semigroup ideal of (R, ·) which is also a subgroup of (R, +, −, 0). Notation: I ◁ R. Ideals of R are in one-to-one correspondence with congruences on R.
The ideal I is said to be maximal if I ≠ R and there is no J ◁ R such that I ⊊ J ⊊ R. This is the same as saying that it is a maximal element in the poset of proper ideals of R (= ideals not equal to all of R) under the partial order of inclusion.
Theorem 13.1: Let R be a commutative ring with identity, and let I ◁ R. Then I is maximal iff R/I is a field (i.e. every non-zero element has an inverse under multiplication).
This is a standard result given in any 3rd year algebra paper. So please consult your
notes from a previous year, or any reasonable textbook, if you want to brush up on the proof
(which won’t be examined). However, the next result is probably new to you.
Theorem 13.2: Every non-zero ring R with identity contains a maximal ideal. (Hence if R is commutative, it has a field as a factor ring: there exists I ◁ R with R/I a field.)
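Proof: Let $\mathcal{I}$ denote the poset of ideals $\{I \mid I \lhd R,\ I \neq R\}$, ordered by inclusion. For $I \lhd R$ we have $I \in \mathcal{I}$ iff $1 \notin I$ (as if $1 \in I$, then $r = r \cdot 1 \in I$ for all $r \in R$, so $I = R$). Also $\{0\} \in \mathcal{I}$, so $\mathcal{I} \neq \emptyset$. (Exercise: show $0 \neq 1$.)
Let $C$ be a chain of ideals in $(\mathcal{I}, \subseteq)$. If $C$ is empty, $\{0\}$ is an upper bound for it, so assume $C \neq \emptyset$ and let $I_0 = \bigcup_{I \in C} I$. It is easy to see $I_0 \lhd R$: if $i, j \in I_0$ then $i \in I$, $j \in J$ for some $I, J \in C$; since $C$ is a chain we may assume (wlog!) $I \subseteq J$, so $i, j \in J$, hence $i - j \in J \subseteq I_0$, and for $r \in R$, $ri, ir \in I \subseteq I_0$. And $1 \notin I$ for all $I \in C$, so $1 \notin I_0$. So $I_0 \in \mathcal{I}$, and $I_0$ is an upper bound of our chain. So every chain has an upper bound, and by Zorn’s Lemma $\mathcal{I}$ has a maximal element $I$, i.e. a proper ideal of $R$ (so $1 \notin I$) such that no other proper ideal contains it, i.e. a maximal ideal. (In the commutative case, apply Theorem 13.1 to $R/I$.)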
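In a finite ring the poset of proper ideals is finite, so maximal ideals exist without any appeal to Zorn’s Lemma, and Theorems 13.1 and 13.2 can be checked by brute force. The following minimal Python sketch does this for the ring ℤ/nℤ, an example of our own choosing; it relies on the standard facts that every ideal of ℤ/nℤ is the set of multiples of a unique divisor d of n, and that (ℤ/nℤ)/(d) ≅ ℤ/dℤ. The function names are illustrative only.

```python
# Illustrative sketch (our own example): maximal ideals of Z/nZ and their quotients.

def ideals_of_Zn(n):
    """Map each divisor d of n to the ideal (d) = {multiples of d} of Z/nZ."""
    return {d: frozenset(d * k % n for k in range(n))
            for d in range(1, n + 1) if n % d == 0}


def maximal_ideals(n):
    """Divisors d (n > 1) such that (d) is maximal among the proper ideals of Z/nZ."""
    proper = {d: I for d, I in ideals_of_Zn(n).items() if 1 not in I}
    # (d) is maximal if no other proper ideal strictly contains it (I < J: proper subset).
    return sorted(d for d, I in proper.items()
                  if not any(I < J for J in proper.values()))


def is_field_Zd(d):
    """Z/dZ is a field iff d > 1 and every non-zero residue has an inverse mod d."""
    return d > 1 and all(any(a * b % d == 1 for b in range(d)) for a in range(1, d))


n = 12
print(maximal_ideals(n))  # [2, 3]
for d in sorted(ideals_of_Zn(n)):
    if d > 1:  # (1) is the whole ring, so it is not proper
        # Theorem 13.1: (d) is maximal iff the quotient (Z/nZ)/(d) = Z/dZ is a field.
        print(d, d in maximal_ideals(n), is_field_Zd(d))
```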
Theorem 13.4 (Monoid version): Every monoid with zero 0, where 0 ≠ 1, has a maximal semigroup ideal (i.e. a semigroup ideal not containing 1 that is maximal among such ideals).
The proof is very similar to Theorem 13.2’s proof (but doesn’t use +).
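The monoid version can be checked by brute force in a small example. The following minimal Python sketch assumes the multiplicative monoid (ℤ/nℤ, ·) as its example (our choice; it has zero 0 and identity 1, with 0 ≠ 1 when n > 1). It lists every semigroup ideal, keeps those not containing 1 (an ideal containing 1 is the whole monoid, just as in the proof of Theorem 13.2), and picks out the maximal ones. The helper names are our own, and since the example is commutative a one-sided closure condition suffices.

```python
from itertools import combinations

# Illustrative sketch (our own example): maximal semigroup ideals of (Z/nZ, *).


def semigroup_ideals(n):
    """All non-empty I ⊆ Z/nZ with m*i in I for every m in Z/nZ and i in I.

    The monoid is commutative, so the conditions I*M ⊆ I and M*I ⊆ I coincide.
    Brute force over all 2^n - 1 non-empty subsets, so keep n small.
    """
    elems = range(n)
    result = []
    for size in range(1, n + 1):
        for subset in combinations(elems, size):
            I = frozenset(subset)
            if all(m * i % n in I for m in elems for i in I):
                result.append(I)
    return result


def maximal_proper_ideals(n):
    """Semigroup ideals not containing 1 that no other such ideal strictly contains."""
    proper = [I for I in semigroup_ideals(n) if 1 not in I]
    return [I for I in proper if not any(I < J for J in proper)]


for I in maximal_proper_ideals(12):
    print(sorted(I))  # [0, 2, 3, 4, 6, 8, 9, 10]: the non-units of Z/12Z
```

Here there is exactly one maximal ideal, the set of non-units, since any ideal containing a unit u also contains u⁻¹u = 1.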
Example 2: Vector spaces.
A vector space V over a field F is an abelian group (V, +, −, 0) together with a field (F, +, ×, −, 0, 1) (e.g. ℝ or ℂ) and, for all r ∈ F and v ∈ V, an element r · v ∈ V (so an operation of “scalar multiplication” · : F × V → V) such that, ∀r, s ∈ F and u, v ∈ V:
1. r(u + v) = ru + rv
2. (r + s)u = ru + su
3. r(su) = (rs)u
4. 1 · u = u
Hopefully this concept is already familiar to you from linear algebra courses, at least for
the case of the real or even the complex number fields.
A familiar example is Fⁿ, the set of all n-tuples with entries from F, with the usual vector addition and scalar multiplication. Also, the trivial vector space is {0}, which is a vector space over every field: all sums and scalar products are zero. The usual notions of subspaces, linear independence, spanning sets, etc. all carry over to vector spaces over arbitrary fields. (For the definitions, consult any linear algebra text.)
Recall a basis is a linearly independent spanning set. It is a fundamental fact that if a vector space has a basis of size n, then every basis has size n, and we call this n the dimension of the space. Every n-dimensional vector space over F is isomorphic to Fⁿ. But some vector spaces do not have finite bases (for example, the vector space of polynomials over a field, with vector addition being polynomial addition: finitely many polynomials have bounded degree, so they cannot span).
Theorem 13.5: Every non-trivial vector space has a basis.
Proof: Let $V$ be a non-trivial vector space over the field $F$. Let $\mathcal{F}$ be the poset of all linearly independent subsets of $V$, ordered by $\subseteq$. Then $\mathcal{F} \neq \emptyset$, since $\{v\} \in \mathcal{F}$ for any $v \neq 0$, as is easily checked.
Let $C$ be a chain of sets in $\mathcal{F}$, and let $I_0 = \bigcup_{I \in C} I$. We show that $I_0 \in \mathcal{F}$ also; hence our chain has an upper bound in $\mathcal{F}$.
Suppose $v_1, v_2, \ldots, v_s \in I_0$ are distinct, with $\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_s v_s = 0$ for some $\alpha_i \in F$. We want to show $\alpha_i = 0$ for all $i$: that is, any linear combination of distinct elements of $I_0$ equalling zero has all coefficients $0$, i.e. $I_0$ is linearly independent.
Each $v_i \in I_{\sigma_i}$ for some $I_{\sigma_i} \in C$, $i = 1, 2, \ldots, s$. Since $C$ is a chain, these finitely many sets are totally ordered by inclusion, so let $J = \max(I_{\sigma_1}, I_{\sigma_2}, \ldots, I_{\sigma_s})$, so that $I_{\sigma_i} \subseteq J$ for all $i$ and $J \in \mathcal{F}$. So $v_i \in J$, $i = 1, 2, \ldots, s$. But $J$ is linearly independent, so since $\sum_{i=1}^{s} \alpha_i v_i = 0$ we have $\alpha_i = 0$ for all $i = 1, 2, \ldots, s$. So indeed $I_0$ is a linearly independent set.
So $\mathcal{F}$ has a maximal element by Zorn’s Lemma, say $B$. We show $B$ is a spanning set, hence a basis (as it is also linearly independent).
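Let $v \in V \setminus B$. Then $B \cup \{v\}$ is not linearly independent (as $B$ is maximal among linearly independent sets). So there are distinct $u_1, u_2, \ldots, u_n \in B$ and $\alpha_0, \alpha_1, \ldots, \alpha_n \in F$, not all zero, such that
$$\alpha_0 v + \alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n = 0.$$
If $\alpha_0 = 0$, we get
$$\alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n = 0,$$
where at least one of $\alpha_1, \alpha_2, \ldots, \alpha_n$ is non-zero. So $u_1, u_2, \ldots, u_n$ are linearly dependent, contradicting the linear independence of $B$. So $\alpha_0 \neq 0$.
So we can reorganise to get
$$v = -\alpha_0^{-1}\alpha_1 u_1 - \alpha_0^{-1}\alpha_2 u_2 - \cdots - \alpha_0^{-1}\alpha_n u_n,$$
so $v \in \operatorname{Span}(B)$. Of course any element of $B$ is in $\operatorname{Span}(B)$, so $V = \operatorname{Span}(B)$. So $B$ is a basis.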
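For a finite vector space, a maximal linearly independent set B can be found by a single greedy pass, with no need for Zorn’s Lemma, and one can then watch it span V, as the proof predicts. The following minimal Python sketch assumes the space GF(2)ⁿ over the two-element field (our choice of example), storing each vector as an n-bit integer so that vector addition is bitwise XOR; the helper names are hypothetical. It keeps a vector exactly when it is not in the span of the vectors kept so far, testing membership by Gaussian elimination.

```python
# Illustrative sketch (our own example): a maximal independent set in GF(2)^n spans it.

def reduce_vector(v, pivots):
    """Eliminate v against echelon vectors; the result is 0 iff v lies in their span.

    pivots maps a bit position p to an echelon vector whose highest set bit is p.
    """
    for p in sorted(pivots, reverse=True):  # highest pivot first
        if (v >> p) & 1:
            v ^= pivots[p]  # over GF(2), subtracting a vector is XOR
    return v


def maximal_independent_subset(vectors):
    """Greedily keep each vector that is not in the span of those already kept."""
    chosen, pivots = [], {}
    for v in vectors:
        r = reduce_vector(v, pivots)
        if r != 0:  # v is outside the current span, so chosen + [v] stays independent
            pivots[r.bit_length() - 1] = r
            chosen.append(v)
    return chosen, pivots


n = 3
B, pivots = maximal_independent_subset(range(1, 2 ** n))  # all non-zero vectors of GF(2)^3
print(B)  # [1, 2, 4], i.e. the standard basis 001, 010, 100
# As in the proof, a maximal independent set spans V: every vector reduces to 0.
print(all(reduce_vector(v, pivots) == 0 for v in range(2 ** n)))  # True
```

Feeding the vectors in a different order, e.g. [3, 5, 6, 7, 1, 2, 4], gives a different maximal independent set ([3, 5, 7]) of the same size 3, in line with the fact that every basis has the same size.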
Example 3: Distributive lattices (and Boolean algebras).
Recall a lattice L is distributive if it satisfies
a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c),
or equivalently,
a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c),
∀ a, b, c ∈ L. Our main example is (2^X, ∩, ∪), the set of all subsets of a set X, in which the partial order is ⊆. We shall show that every distributive lattice is embeddable in such an example.
First we need the notion of a filter F of a distributive lattice (L, ∧, ∨). This is a nonempty subset F ⊆ L such that
• ∀a, b ∈ F, a ∧ b ∈ F , and
• ∀a ∈ F, ∀x ∈ L : x ≥ a ⇒ x ∈ F .
It is proper if F ≠ L.
e.g. In (2^X, ∩, ∪), if S ∈ 2^X, then <S> = {T ∈ 2^X | T ⊇ S} is a filter, the principal filter generated by S.
Generally, if L is a distributive lattice, with a ∈ L, then <a> = {b ∈ L | b ≥ a} is the
principal filter generated by a. This is the smallest filter of L containing a.
Proof: If x, y ∈ <a>, then y ≥ a, x ≥ a. So since a is a lower bound of x and y, x∧y ≥ a
so x ∧ y ∈ <a>. And if x ≥ b ∈ <a> then x ≥ b ≥ a, so x ≥ a, so x ∈ <a>. And
since a ≥ a, a ∈ <a>. Indeed if a ∈ F , a filter, then for b ∈ <a>, b ≥ a, so b ∈ F , so
<a> ⊆ F .
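These facts about principal filters can be checked exhaustively in a small distributive lattice. The following minimal Python sketch assumes, as its example, the lattice of positive divisors of 12, where x ≤ y means x divides y, x ∧ y = gcd(x, y) and x ∨ y = lcm(x, y) (divisor lattices are distributive); the function names are our own. It lists all filters and confirms that each <a> is a filter contained in every filter that contains a.

```python
from itertools import combinations
from math import gcd

# Illustrative sketch (our own example): filters of the divisor lattice of 12.


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def is_filter(F, L):
    """Non-empty, closed under meet (gcd), and upward closed (a | x and a in F => x in F)."""
    return (len(F) > 0
            and all(gcd(a, b) in F for a in F for b in F)
            and all(x in F for a in F for x in L if x % a == 0))


def principal_filter(a, L):
    """<a> = {b in L | b >= a}: the elements of L that are multiples of a."""
    return frozenset(b for b in L if b % a == 0)


L = divisors(12)
subsets = (frozenset(c) for r in range(1, len(L) + 1) for c in combinations(L, r))
all_filters = [F for F in subsets if is_filter(F, L)]

print(sorted(principal_filter(2, L)))  # [2, 4, 6, 12]
# <a> is a filter, and it is contained in every filter containing a.
print(all(is_filter(principal_filter(a, L), L)
          and all(principal_filter(a, L) <= F for F in all_filters if a in F)
          for a in L))  # True
```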
A prime filter F of the distributive lattice L is a filter such that if a ∨ b ∈ F, then either a ∈ F or b ∈ F. A filter F ⊆ L is maximal with respect to not containing a ∈ L if a ∉ F and F is a maximal element of the poset, under inclusion, of all filters of L not containing a. Of course such filters need not exist for a given a ∈ L: possibly every filter of L contains a.
Lemma 13.6: Every filter F ⊆ L that is maximal with respect to not containing some
a ∈ L is prime.
Proof: Let F be maximal w.r.t. not containing a ∈ L, and assume it is not prime. Then there are p, q ∈ L with p ∨ q ∈ F yet neither p nor q is in F. Let Gₚ = {x ∈ L | x ≥ p ∧ f for some f ∈ F}. For x₁, x₂ ∈ Gₚ, x₁ ≥ p ∧ f₁ and x₂ ≥ p ∧ f₂ for some f₁, f₂ ∈ F. So x₁ ≥ p ∧ f₁ ∧ f₂ and x₂ ≥ p ∧ f₁ ∧ f₂, so x₁ ∧ x₂ ≥ p ∧ f₁ ∧ f₂. But f₁ ∧ f₂ ∈ F, so by definition x₁ ∧ x₂ ∈ Gₚ. If x ∈ Gₚ and y ≥ x, then x ≥ p ∧ f for some f ∈ F, so y ≥ p ∧ f too, so y ∈ Gₚ. So Gₚ is a filter of L. Moreover p ∈ Gₚ, since p ≥ p ∧ f for any f ∈ F (and F ≠ ∅), and f ≥ p ∧ f ∀f ∈ F, so F ⊆ Gₚ. But p ∉ F, so F ⊊ Gₚ. So by maximality of F, the filter Gₚ cannot avoid a, i.e. a ∈ Gₚ. So a ≥ p ∧ f₁ for some f₁ ∈ F.
Similarly, by replacing p by q everywhere in this argument, a ≥ q ∧ f₂ for some f₂ ∈ F.
So a ≥ p ∧ f₁ ∧ f₂ and a ≥ q ∧ f₁ ∧ f₂, so
a ≥ (p ∧ f₁ ∧ f₂) ∨ (q ∧ f₁ ∧ f₂) = (p ∨ q) ∧ (f₁ ∧ f₂) ∈ F
(using distributivity), as p ∨ q ∈ F and f₁ ∧ f₂ ∈ F. Since F is upward closed, a ∈ F, a contradiction. So in fact F is prime.
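Lemma 13.6 can likewise be checked exhaustively in a small example. The following minimal Python sketch again assumes the divisor lattice of 12 (meet gcd, join lcm, order divisibility) as an illustrative choice, with our own helper names. For each a it finds the filters maximal with respect to not containing a and tests the prime-filter condition; it also shows that a filter which is not of this kind, such as {6, 12}, need not be prime, and that no filter avoids the top element 12.

```python
from itertools import combinations
from math import gcd

# Illustrative sketch (our own example): Lemma 13.6 in the divisor lattice of 12.


def lcm(a, b):
    return a * b // gcd(a, b)


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def is_filter(F, L):
    return (len(F) > 0
            and all(gcd(a, b) in F for a in F for b in F)            # closed under meet
            and all(x in F for a in F for x in L if x % a == 0))     # upward closed


def filters(L):
    subsets = (frozenset(c) for r in range(1, len(L) + 1) for c in combinations(L, r))
    return [F for F in subsets if is_filter(F, L)]


def maximal_avoiding(a, L):
    """Filters that omit a and are maximal (under inclusion) among filters omitting a."""
    avoiding = [F for F in filters(L) if a not in F]
    return [F for F in avoiding if not any(F < G for G in avoiding)]


def is_prime(F, L):
    """Prime: whenever p ∨ q = lcm(p, q) lies in F, so does p or q."""
    return all(p in F or q in F for p in L for q in L if lcm(p, q) in F)


L = divisors(12)
for a in L:
    for F in maximal_avoiding(a, L):
        print(a, sorted(F), is_prime(F, L))  # every line ends with True
print(is_prime(frozenset({6, 12}), L))       # False: lcm(2, 3) = 6, but 2, 3 are not in it
print(maximal_avoiding(12, L))               # []: every filter contains the top element 12
```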
Theorem 13.7: Every distributive lattice embeds in (2^X, ∩, ∪) for some set X.
Proof: Let (L, ∧, ∨) be a distributive lattice. Let X be the set of all filters maximal with
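respect to not containing some element of $L$. Define $f : L \to 2^X$ by setting
$$f(a) = \{F \in X \mid a \in F\}.$$
Then
$$\begin{aligned}
f(a \vee b) &= \{F \in X \mid a \vee b \in F\} \\
&= \{F \in X \mid a \in F \text{ or } b \in F\} && \text{by Lemma 13.6 (and as each } F \text{ is upward closed)} \\
&= \{F \in X \mid a \in F\} \cup \{F \in X \mid b \in F\} \\
&= f(a) \cup f(b),
\end{aligned}$$
$$\begin{aligned}
f(a \wedge b) &= \{F \in X \mid a \wedge b \in F\} \\
&= \{F \in X \mid a \in F \text{ and } b \in F\} && \text{as each } F \text{ is a filter} \\
&= \{F \in X \mid a \in F\} \cap \{F \in X \mid b \in F\} \\
&= f(a) \cap f(b).
\end{aligned}$$
So $f$ is a homomorphism $(L, \wedge, \vee) \to (2^X, \cap, \cup)$. We must show $f$ is an embedding (i.e. one-to-one).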
Suppose $a, b \in L$, $a \neq b$. Then either $a \not\leq b$ or $b \not\leq a$. Assume (without loss of generality) that $a \not\leq b$. So $b \notin \langle a \rangle = \{x \in L \mid x \geq a\}$. So the filter $\langle a \rangle$ contains $a$ but not $b$.
Let $P$ be the poset of all filters in $L$ that contain $a$ but not $b$. Then $P \neq \emptyset$ since $\langle a \rangle \in P$. Let $\mathcal{C}$ be a chain in $P$ (ordered by $\subseteq$). If $\mathcal{C}$ is empty, $\langle a \rangle$ is an upper bound for it; otherwise $F_0 = \bigcup_{F \in \mathcal{C}} F$ is a filter. (Check this!) And $a \in F_0$, but $b \notin F_0$ since $b \notin F$ for all $F \in \mathcal{C}$, so $F_0 \in P$. So every chain in $P$ has an upper bound. So by Zorn’s Lemma, there is a maximal element $F_{a,b}$ of $P$: it is a filter maximal w.r.t. containing $a$ but not $b$, hence also maximal w.r.t. not containing $b$. (Any bigger filter would contain $a$ too.) So $F_{a,b} \in X$ and $F_{a,b} \in f(a)$, but $F_{a,b} \notin f(b)$. So $f(a) \neq f(b)$.
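The whole construction of Theorem 13.7 can be run on a small example. The following minimal Python sketch again assumes the divisor lattice of 12 (meet gcd, join lcm) as an illustrative choice, with our own helper names: it builds X, the set of filters maximal with respect to not containing some element, defines f(a) = {F ∈ X | a ∈ F}, and checks that f turns ∨ into ∪ and ∧ into ∩ and is one-to-one. Here X turns out to consist of the three filters generated by 2, 3 and 4.

```python
from itertools import combinations
from math import gcd

# Illustrative sketch (our own example): the embedding f of Theorem 13.7.


def lcm(a, b):
    return a * b // gcd(a, b)


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def is_filter(F, L):
    return (len(F) > 0
            and all(gcd(a, b) in F for a in F for b in F)          # closed under meet
            and all(x in F for a in F for x in L if x % a == 0))   # upward closed


def build_X(L):
    """All filters maximal with respect to not containing some element of L."""
    subsets = (frozenset(c) for r in range(1, len(L) + 1) for c in combinations(L, r))
    fs = [F for F in subsets if is_filter(F, L)]
    X = set()
    for a in L:
        avoiding = [F for F in fs if a not in F]
        X.update(F for F in avoiding if not any(F < G for G in avoiding))
    return X


L = divisors(12)
X = build_X(L)


def f(a):
    """f(a) = {F in X | a in F}, a subset of X."""
    return frozenset(F for F in X if a in F)


is_hom = all(f(lcm(a, b)) == f(a) | f(b) and f(gcd(a, b)) == f(a) & f(b)
             for a in L for b in L)
is_one_to_one = len({f(a) for a in L}) == len(L)
print(len(X), is_hom, is_one_to_one)  # 3 True True
```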
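Recall a Boolean algebra is a distributive lattice with extra operations modelling further set operations, e.g. (2^X, ∩, ∪, ¯, ∅, X), where S̄ is the complement of S ∈ 2^X, and hence any subalgebra of this. In general, a Boolean algebra (L, ∧, ∨, ′, 0, 1) is a distributive lattice with a least element 0 and a greatest element 1 in which every a ∈ L has a complement a′ with a ∨ a′ = 1 and a ∧ a′ = 0.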
Theorem 13.8: Every Boolean algebra is embeddable in one of the form (2^X, ∩, ∪, ¯, ∅, X).
Proof: We extend the previous proof. We use the same mapping $f : L \to 2^X$ (with $L$ now a Boolean algebra). We already know from the proof of Theorem 13.7 that $f$ is a distributive lattice embedding. We show $f$ is a Boolean algebra embedding. $F \in f(0)$ says $0 \in F$, which is impossible: since $x \geq 0$ for all $x \in L$, $F$ would then contain all of $L$, whereas each $F \in X$ omits some element. So $f(0) = \emptyset$. And $F \in f(1)$ says $1 \in F$, which is true for all $F \in X$ (as $F$ is non-empty and upward closed). So $f(1) = X$. Also $f(a) \cup f(a') = f(a \vee a') = f(1) = X$, and $f(a) \cap f(a') = f(a \wedge a') = f(0) = \emptyset$. So from elementary set theory, $f(a') = \overline{f(a)}$, the complement of the set $f(a)$ in $X$.
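The Boolean algebra case can be checked in the same way. The following minimal Python sketch assumes the divisors of 30 as its example Boolean algebra (our choice: meet gcd, join lcm, bottom element 0 represented by the integer 1, top element 1 represented by 30, and complement a′ = 30/a, which works because 30 is squarefree); the helper names are our own. It rebuilds X and f as in Theorem 13.7 and confirms f(0) = ∅, f(1) = X and f(a′) = X ∖ f(a).

```python
from itertools import combinations
from math import gcd

# Illustrative sketch (our own example): Theorem 13.8 for the divisors of 30.


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def is_filter(F, L):
    return (len(F) > 0
            and all(gcd(a, b) in F for a in F for b in F)          # closed under meet
            and all(x in F for a in F for x in L if x % a == 0))   # upward closed


def build_X(L):
    """All filters maximal with respect to not containing some element of L."""
    subsets = (frozenset(c) for r in range(1, len(L) + 1) for c in combinations(L, r))
    fs = [F for F in subsets if is_filter(F, L)]
    X = set()
    for a in L:
        avoiding = [F for F in fs if a not in F]
        X.update(F for F in avoiding if not any(F < G for G in avoiding))
    return X


N = 30                      # squarefree, so its divisors form a Boolean algebra
L = divisors(N)
BOTTOM, TOP = 1, N          # the Boolean algebra's 0 and 1


def complement(a):
    return N // a           # gcd(a, N // a) = 1 = bottom and lcm(a, N // a) = N = top


X = build_X(L)


def f(a):
    return frozenset(F for F in X if a in F)


whole = frozenset(X)
print(len(X))                                            # 3
print(f(BOTTOM) == frozenset(), f(TOP) == whole)         # True True
print(all(f(complement(a)) == whole - f(a) for a in L))  # True
```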