
Linear Algebra: MAT 217
Lecture notes, Spring 2012
Michael Damron
compiled from lectures and exercises designed with Tasho Kaletha
Princeton University
Contents

1 Vector spaces
  1.1 Definitions
  1.2 Subspaces
  1.3 Linear independence and bases
  1.4 Exercises

2 Linear transformations
  2.1 Definitions and basic properties
  2.2 Range and nullspace, one-to-one, onto
  2.3 Isomorphisms and L(V, W)
  2.4 Matrices and coordinates
  2.5 Exercises

3 Dual spaces
  3.1 Definitions
  3.2 Annihilators
  3.3 Double dual
  3.4 Dual maps
  3.5 Exercises

4 Determinants
  4.1 Permutations
  4.2 Determinants: existence and uniqueness
  4.3 Properties of determinants
  4.4 Exercises

5 Eigenvalues
  5.1 Definitions and the characteristic polynomial
  5.2 Eigenspaces and the main diagonalizability theorem
  5.3 Exercises

6 Jordan form
  6.1 Generalized eigenspaces
  6.2 Primary decomposition theorem
  6.3 Nilpotent operators
  6.4 Existence and uniqueness of Jordan form, Cayley-Hamilton
  6.5 Exercises

7 Bilinear forms
  7.1 Definitions
  7.2 Symmetric bilinear forms
  7.3 Sesquilinear and Hermitian forms
  7.4 Exercises

8 Inner product spaces
  8.1 Definitions
  8.2 Orthogonality
  8.3 Adjoints
  8.4 Spectral theory of self-adjoint operators
  8.5 Normal and commuting operators
  8.6 Exercises
1 Vector spaces

1.1 Definitions
We begin with the definition of a vector space. (Keep in mind vectors in Rn or Cn .)
Definition 1.1.1. A vector space is a collection of two sets, V and F . The elements of F
(usually we take R or C) are called scalars and the elements of V are called vectors. For
each v, w ∈ V , there is a vector sum, v + w ∈ V , with the following properties.
0. There is one (and only one) vector called ~0 with the property
v + ~0 = v for all v ∈ V ;
1. for each v ∈ V there is one (and only one) vector called −v with the property
v + (−v) = ~0 ;
2. commutativity of vector sum:
v + w = w + v for all v, w ∈ V ;
3. associativity of vector sum:
(v + w) + z = v + (w + z) for all v, w, z ∈ V .
Furthermore, for each v ∈ V and c ∈ F there is a scalar product cv ∈ V with the following
properties.
1. For all v ∈ V ,
1v = v .
2. For all v ∈ V and c, d ∈ F ,
(cd)v = c(dv) .
3. For all c ∈ F, v, w ∈ V ,
c(v + w) = cv + cw .
4. For all c, d ∈ F, v ∈ V ,
(c + d)v = cv + dv .
Here are some examples.
1. V = Rn , F = R. Addition is given by
(v1 , . . . , vn ) + (w1 , . . . , wn ) = (v1 + w1 , . . . , vn + wn )
and scalar multiplication is given by
c(v1 , . . . , vn ) = (cv1 , . . . , cvn ) .
2. Polynomials: take V to be all polynomials of degree at most n with real coefficients
V = {an xn + · · · + a1 x + a0 : ai ∈ R for all i}
and take F = R. Note the similarity to Rn+1 : such a polynomial is determined by its n + 1 coefficients (a0 , . . . , an ).
3. Let S be any nonempty set and let V be the set of functions from S to C. Set F = C.
If f1 , f2 ∈ V set f1 + f2 to be the function given by
(f1 + f2 )(s) = f1 (s) + f2 (s) for all s ∈ S
and if c ∈ C set cf1 to be the function given by
(cf1 )(s) = c(f1 (s)) .
4. Let V = { [1 1; 0 0] }, consisting of a single 2 × 2 matrix (or any other fixed object), with F = C. (Here we write a matrix as [a b; c d], with rows separated by a semicolon.) Define
[1 1; 0 0] + [1 1; 0 0] = [1 1; 0 0]
and
c [1 1; 0 0] = [1 1; 0 0] .
In general F is allowed to be a field.
Definition 1.1.2. A set F is called a field if for each a, b ∈ F there is an element ab ∈ F and an element a + b ∈ F such that the following hold.
1. For all a, b, c ∈ F we have (ab)c = a(bc) and (a + b) + c = a + (b + c).
2. For all a, b ∈ F we have ab = ba and a + b = b + a.
3. There exists an element 0 ∈ F such that for all a ∈ F , we have a + 0 = a; furthermore
there is a non-zero element 1 ∈ F such that for all a ∈ F , we have 1a = a.
4. For each a ∈ F there is an element −a ∈ F such that a + (−a) = 0. If a ≠ 0 there exists an element a−1 such that aa−1 = 1.
5. For all a, b, c ∈ F ,
a(b + c) = ab + ac .
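For example, the two-element set F2 = {0, 1}, with addition and multiplication taken mod 2 (so that 1 + 1 = 0), satisfies all of these axioms and is the smallest field; Q, R and C are other familiar examples.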
Here are some general facts.
1. For all c ∈ F , c~0 = ~0.
Proof.
c~0 = c(~0 + ~0) = c~0 + c~0 .
Adding −(c~0) to both sides and using associativity,
c~0 + (−(c~0)) = (c~0 + c~0) + (−(c~0))
~0 = c~0 + (c~0 + (−(c~0)))
~0 = c~0 + ~0
~0 = c~0 .
Similarly one may prove that for all v ∈ V , 0v = ~0.
2. For all v ∈ V , (−1)v = −v.
Proof.
v + (−1)v = 1v + (−1)v
= (1 + (−1))v
= 0v
= ~0 .
However −v is the unique vector such that v + (−v) = ~0. Therefore (−1)v = −v.
1.2 Subspaces
Definition 1.2.1. A subset W ⊆ V of a vector space is called a subspace if (W, F ) with the
same operations is also a vector space.
Many of the rules for vector spaces follow directly by “inheritance.” For example, if
W ⊆ V then for all v, w ∈ W we have v + w = w + v. We actually only need to check a few:
A. ~0 ∈ W .
B. For all w ∈ W the vector −w is also in W .
C. For all w ∈ W and c ∈ F , cw ∈ W .
D. For all v, w ∈ W , v + w ∈ W .
Theorem 1.2.2. W ⊆ V is a subspace if and only if it is nonempty and for all v, w ∈ W
and c ∈ F we have cv + w ∈ W .
Proof. Suppose that W is a subspace and let v, w ∈ W , c ∈ F . By (C) we have cv ∈ W . By
(D) we have cv + w ∈ W .
Conversely suppose that for all v, w ∈ W and c ∈ F we have cv + w ∈ W . Then we need
to show A-D.
A. Since W is nonempty choose w ∈ W . Let v = w and c = −1. This gives ~0 = w−w ∈ W .
B. Given w ∈ W , apply the hypothesis with v = w, with ~0 (which is in W by A) in place of w, and with c = −1: this gives (−1)w + ~0 = −w ∈ W .
C. Given w ∈ W and c ∈ F , apply the hypothesis with v = w and ~0 in place of w: this gives cw + ~0 = cw ∈ W .
D. Given v, w ∈ W , take c = 1: then 1v + w = v + w ∈ W .
Examples:
1. If V is a vector space then {0} is a subspace.
2. Take V = Cn . Let
W = {(z1 , . . . , zn ) : z1 + · · · + zn = 0} .
Then W is a subspace. (Exercise.)
3. Let V be the set of 2 × 2 matrices with real entries, with entrywise operations
[a1 b1 ; c1 d1 ] + [a2 b2 ; c2 d2 ] = [a1 + a2 , b1 + b2 ; c1 + c2 , d1 + d2 ]
and
c [a1 b1 ; c1 d1 ] = [ca1 , cb1 ; cc1 , cd1 ] .
Is W a subspace, where
W = { [a b; c d] : a + d = 1 } ?
4. In Rn , every hyperplane through the origin, that is, every set of the form {(x1 , . . . , xn ) : a1 x1 + · · · + an xn = 0} for fixed scalars a1 , . . . , an not all zero, is a subspace.
Theorem 1.2.3. Suppose that C is a non-empty collection of subspaces of V . Then the
intersection
W̃ = ∩W ∈C W
is a subspace.
Proof. Let c ∈ F and v, w ∈ W̃ . We need to show that (a) W̃ ≠ ∅ and (b) cv + w ∈ W̃ . The first holds because each W is a subspace, so ~0 ∈ W . Next, for all W ∈ C we have v, w ∈ W . Then cv + w ∈ W . Therefore cv + w ∈ W̃ .
Definition 1.2.4. If S ⊆ V is a subset of vectors let CS be the collection of all subspaces
containing S. The span of S is defined as
span(S) = ∩W ∈CS W .
Since CS is non-empty, span(S) is a subspace.
Question: What is span({})?
Definition 1.2.5. If
v = a1 w 1 + · · · + ak w k
for scalars ai ∈ F and vectors wi ∈ V then we say that v is a linear combination of
{w1 , . . . , wk }.
Theorem 1.2.6. If S 6= ∅ then span(S) is equal to the set of all finite linear combinations
of elements of S.
Proof. Set
S̃ = the set of all finite linear combinations of elements of S .
We want to show that S̃ = span(S). First we show that S̃ ⊆ span(S). Let
a1 s1 + · · · + ak sk ∈ S̃
and let W be a subspace in CS . Since si ∈ S for all i we have si ∈ W . By virtue of W being a subspace, a1 s1 + · · · + ak sk ∈ W . Since this is true for all W ∈ CS then a1 s1 + · · · + ak sk ∈ span(S). Therefore S̃ ⊆ span(S).
In the other direction, the set S̃ is itself a subspace and it contains S (exercise). Thus S̃ ∈ CS and so
span(S) = ∩W ∈CS W ⊆ S̃ .
Question: If W1 and W2 are subspaces, is W1 ∪ W2 a subspace? If it were, it would be the
smallest subspace containing W1 ∪ W2 and thus would be equal to span(W1 ∪ W2 ). But it is
not!
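For example, let W1 and W2 be the two coordinate axes in R2 . Then (1, 0) and (0, 1) are both in W1 ∪ W2 , but their sum (1, 1) lies in neither axis, so W1 ∪ W2 is not closed under addition and is not a subspace.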
Proposition 1.2.7. If W1 and W2 are subspaces then span(W1 ∪ W2 ) = W1 + W2 , where
W1 + W2 = {w1 + w2 : w1 ∈ W1 , w2 ∈ W2 } .
Furthermore for each n ≥ 1,
span(W1 ∪ · · · ∪ Wn ) = W1 + · · · + Wn .
Proof. A general element w1 +w2 in W1 +W2 is a linear combination of elements in W1 ∪W2 so
it is in the span of W1 ∪W2 . On the other hand if a1 v1 +· · ·+an vn is an element of the span then
we can split the vi ’s into vectors from W1 and those from W2 . For instance, v1 , . . . , vk ∈ W1
and vk+1 , . . . , vn ∈ W2 . Now a1 v1 + · · · + ak vk ∈ W1 and ak+1 vk+1 + · · · + an vn ∈ W2 by the
fact that these are subspaces. Thus this linear combination is equal to w1 + w2 for
w1 = a1 v1 + · · · + ak vk and w2 = ak+1 vk+1 + · · · + an vn .
The general case is an exercise.
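For example, in R3 take W1 to be the xy-plane and W2 the z-axis. Then W1 + W2 = R3 , since any (x, y, z) can be written as (x, y, 0) + (0, 0, z).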
Remark.
1. Span(S) is the “smallest” subspace containing S in the following sense: if
W is any subspace containing S then span(S) ⊆ W .
2. If S ⊆ T then Span(S) ⊆ Span(T ).
3. If W is a subspace then Span(W ) = W . Therefore Span(Span(S)) = Span(S).
1.3 Linear independence and bases
Now we move on to linear independence.
Definition 1.3.1. We say that vectors v1 , . . . , vk ∈ V are linearly independent if whenever
a1 v1 + · · · + ak vk = ~0 for scalars ai ∈ F
then ai = 0 for all i. Otherwise we say they are linearly dependent.
Lemma 1.3.2. Let S = {v1 , . . . , vn } for n ≥ 1. Then S is linearly dependent if and only if
there exists v ∈ S such that v ∈ Span(S \ {v}).
Proof. Suppose first that S is linearly dependent and that n ≥ 2. Then there exist scalars
a1 , . . . , an ∈ F which are not all zero such that a1 v1 + · · · + an vn = ~0. By reordering we may
assume that a1 ≠ 0. Now
v1 = (−a2 /a1 )v2 + · · · + (−an /a1 )vn .
So v1 ∈ Span(S \ {v1 }).
If S is linearly dependent and n = 1 then there exists a nonzero a1 such that a1 v1 = ~0,
so v1 = ~0. Now v1 ∈ Span({}) = Span(S \ {v1 }).
Conversely, suppose there exists v ∈ S such that v ∈ span(S \ {v}) and n ≥ 2. By
reordering we may suppose that v = v1 . Then there exist scalars a2 , . . . , an such that
v1 = a2 v2 + · · · + an vn .
But now we have
(−1)v1 + a2 v2 + · · · + an vn = ~0 .
Since this is a nontrivial linear combination, S is linearly dependent.
If n = 1 and v1 ∈ Span(S \ {v1 }) then v1 ∈ Span({}) = {~0}, so that v1 = ~0. Now it is
easy to see that S is linearly dependent.
Examples:
1. For two vectors v1 , v2 ∈ V , they are linearly dependent if and only if one is a scalar
multiple of the other. By reordering, we may suppose v1 = av2 .
2. For three vectors this is not true anymore. The vectors (1, 1), (1, 0) and (0, 1) in R2
are linearly dependent since
(1, 1) + (−1)(1, 0) + (−1)(0, 1) = (0, 0) .
However none of these is a scalar multiple of another.
3. {~0} is linearly dependent:
1 · ~0 = ~0 .
Proposition 1.3.3. If S is a linearly independent set and T is a nonempty subset then T
is linearly independent.
Proof. Suppose that
a1 t1 + · · · + an tn = ~0
for vectors ti ∈ T and scalars ai ∈ F . Since each ti ∈ S this is a linear combination
of elements of S. Since S is linearly independent all the coefficients must be zero. Thus
ai = 0 for all i. This was an arbitrary linear combination of elements of T so T is linearly
independent.
Corollary 1.3.4. If S is linearly dependent and R contains S then R is linearly dependent.
In particular, any set containing ~0 is linearly dependent.
Proposition 1.3.5. If S is linearly independent and v ∈ Span(S) then there exist unique
vectors v1 , . . . , vn and scalars a1 , . . . , an such that
v = a1 v1 + · · · + an vn .
Proof. Suppose that S is linearly independent and there are two representations
v = a1 v1 + · · · + an vn and v = b1 w1 + · · · + bk wk .
Then split the vectors in {v1 , . . . , vn } ∪ {w1 , . . . , wk } into three sets: S1 are those in the first
but not the second, S2 are those in the second but not the first, and S3 are those in both.
Subtracting the second representation from the first,
~0 = v − v = Σsj ∈S1 aj sj − Σsj ∈S2 bj sj + Σsj ∈S3 (aj − bj )sj .
This is a linear combination of elements from S and by linear independence all coefficients are
zero. Thus both representations used the same vectors (in S3 ) and with the same coefficients
and are thus the same.
Lemma 1.3.6 (Steinitz exchange). Let L = {v1 , . . . , vk } be a linearly independent set in a
vector space V and let S = {w1 , . . . , wm } be a spanning set; that is, Span(S) = V . Then
1. k ≤ m and
2. there exist m − k vectors s1 , . . . , sm−k ∈ S such that
Span({v1 , . . . , vk , s1 , . . . , sm−k }) = V .
Proof. We will prove this by induction on k. For k = 0 it is obviously true (using the fact
that {} is linearly independent). Suppose it is true for k and we will prove it for k + 1.
In other words, let {v1 , . . . , vk+1 } be a linearly independent set. Then by the inductive hypothesis, since
{v1 , . . . , vk } is linearly independent, we find k ≤ m and vectors s1 , . . . , sm−k ∈ S with
Span({v1 , . . . , vk , s1 , . . . , sm−k }) = V .
Now since vk+1 ∈ V we can find scalars a1 , . . . , ak and b1 , . . . , bm−k in F such that
vk+1 = a1 v1 + · · · + ak vk + b1 s1 + · · · + bm−k sm−k .    (1)
We claim that not all of the bi ’s are zero. If this were the case then we would have
vk+1 = a1 v1 + · · · + ak vk , so that
a1 v1 + · · · + ak vk + (−1)vk+1 = ~0 ,
a contradiction to linear independence. This also implies that k ≠ m, since if k = m the linear combination (1) would contain no bi ’s at all. Combined with k ≤ m this gives
k + 1 ≤ m .
Suppose for example that b1 ≠ 0. Then we could write
b1 s1 = (−a1 )v1 + · · · + (−ak )vk + vk+1 + (−b2 )s2 + · · · + (−bm−k )sm−k ,
so that
s1 = (−a1 /b1 )v1 + · · · + (−ak /b1 )vk + (1/b1 )vk+1 + (−b2 /b1 )s2 + · · · + (−bm−k /b1 )sm−k .
In other words,
s1 ∈ Span(v1 , . . . , vk , vk+1 , s2 , . . . , sm−k )
or said differently,
V = Span(v1 , . . . , vk , s1 , . . . , sm−k ) ⊆ Span(v1 , . . . , vk+1 , s2 , . . . , sm−k ) .
This completes the proof.
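As a quick illustration, take V = R2 , L = {(1, 1)} and S = {(1, 0), (0, 1)}. Writing (1, 1) = 1 · (1, 0) + 1 · (0, 1), both coefficients are nonzero, so either vector of S may be exchanged for (1, 1); for instance {(1, 1), (0, 1)} still spans R2 .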
This lemma has loads of consequences.
Definition 1.3.7. A set B ⊆ V is called a basis for V if B is linearly independent and
Span(B) = V .
Corollary 1.3.8. If B1 and B2 are bases for V then they have the same number of elements.
Proof. If both sets are infinite, we are done. If B1 is infinite and B2 is finite then we may
choose |B2 | + 1 elements of B1 that will be linearly independent. Since B2 spans V , this
contradicts Steinitz. Similarly if B1 is finite and B2 is infinite.
Otherwise they are both finite. Since each spans V and is linearly independent, we apply
Steinitz twice to get |B1 | ≤ |B2 | and |B2 | ≤ |B1 |.
Definition 1.3.9. We define the dimension dim V to be the number of elements in a basis.
By the above, this is well-defined.
Remark. Each nonzero element of V has a unique representation in terms of a basis.
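For example, dim F n = n, since the standard vectors (1, 0, . . . , 0), . . . , (0, . . . , 0, 1) form a basis, and the space of polynomials of degree at most n from the examples above has dimension n + 1, with basis {1, x, . . . , xn }.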
Theorem 1.3.10. Let V be a nonzero vector space and suppose that V is finitely generated;
that is, there is a finite set S ⊆ V such that V = Span(S). Then V has a finite basis:
dim V < ∞.
Proof. Let B be a minimal spanning subset of S. We claim that B is linearly independent.
If not, then by a previous result, there is a vector b ∈ B such that b ∈ Span(B \ {b}). It
then follows that
V = Span(B) ⊆ Span(B \ {b}) ,
so B \ {b} is a spanning set, a contradiction.
Theorem 1.3.11 (1 subspace theorem). Let V be a finite dimensional vector space and W
be a nonzero subspace of V . Then dim W < ∞. If C = {w1 , . . . , wk } is a basis for W then
there exists a basis B for V such that C ⊆ B.
Proof. It is an exercise to show that dim W < ∞. Write n for the dimension of V and let B0 be a basis for V (with n elements). Since B0 is a spanning set and C is a linearly independent set, there exist vectors b1 , . . . , bn−k ∈ B0 such that
B := C ∪ {b1 , . . . , bn−k }
is a spanning set. Note that B is a spanning set with n = dim V elements. Thus
we will be done if we prove the following lemma.
Lemma 1.3.12. Let V be a vector space of dimension n ≥ 1 and S = {v1 , . . . , vk } ⊆ V .
1. If k < n then S cannot span V .
2. If k > n then S cannot be linearly independent.
3. If k = n then S is linearly independent if and only if S spans V .
Proof. Let B be a basis of V . If S spans V , Steinitz gives that |B| ≤ |S|. This proves 1. If
S is linearly independent then again Steinitz gives |S| ≤ |B|. This proves 2.
Suppose that k = n and S is linearly independent. By Steinitz, we may add 0 vectors
from the set B to S to make S span V . Thus Span(S) = V . Conversely, suppose that
Span(S) = V . If S is linearly dependent then there exists s ∈ S such that s ∈ Span(S \{s}).
Then S \ {s} is a set of smaller cardinality than S and spans V . But now this contradicts
Steinitz, using B as our linearly independent set and S \ {s} as our spanning set.
Corollary 1.3.13. If V is a vector space and W is a subspace then dim W ≤ dim V .
Theorem 1.3.14. Let W1 and W2 be subspaces of V (with dim V < ∞). Then
dim W1 + dim W2 = dim (W1 ∩ W2 ) + dim (W1 + W2 ) .
Proof. Easy if either is zero. Otherwise we argue as follows. Let
{v1 , . . . , vk }
be a basis for W1 ∩ W2 . By the 1-subspace theorem, extend this to a basis of W1 :
{v1 , . . . , vk , w1 , . . . , wm1 }
and also extend it to a basis for W2 :
{v1 , . . . , vk , ŵ1 , . . . , ŵm2 } .
We claim that
B := {v1 , . . . , vk , w1 , . . . , wm1 , ŵ1 , . . . , ŵm2 }
is a basis for W1 + W2 . It is not hard to see it is spanning.
To show linear independence, suppose
a1 v1 + · · · + ak vk + b1 w1 + · · · + bm1 wm1 + c1 ŵ1 + · · · + cm2 ŵm2 = ~0 .
Then
c1 ŵ1 + · · · + cm2 ŵm2 = (−a1 )v1 + · · · + (−ak )vk + (−b1 )w1 + · · · + (−bm1 )wm1 ∈ W1 .
Also it is clearly in W2 . So it is in W1 ∩ W2 . So we can find scalars ã1 , . . . , ãk such that
c1 ŵ1 + · · · + cm2 ŵm2 = ã1 v1 + · · · + ãk vk
ã1 v1 + · · · + ãk vk + (−c1 )ŵ1 + · · · + (−cm2 )ŵm2 = ~0 .
This is a linear combination of basis elements, so all ci ’s are zero. A similar argument gives
all bi ’s as zero. Thus we finally have
a1 v1 + · · · + ak vk = ~0 .
But again this is a linear combination of basis elements so the ai ’s are zero.
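For example, if W1 and W2 are two distinct planes through the origin in R3 , then W1 + W2 = R3 , and the formula gives dim (W1 ∩ W2 ) = 2 + 2 − 3 = 1: two distinct planes through the origin meet in a line.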
Theorem 1.3.15 (2 subspace theorem). Let V be a finite dimensional vector space and W1
and W2 be nonzero subspaces. There exists a basis of V that contains bases for W1 and W2 .
Proof. The proof of the previous theorem shows that there is a basis for W1 + W2 which
contains bases for W1 and W2 . Extend this to a basis for V .
1.4 Exercises
Notation:
1. If F is a field then define s1 : F → F by s1 (x) = x. For integers n ≥ 2 define
sn : F → F by sn (x) = x + sn−1 (x). Last, define the characteristic of F as
char(F ) = min{n : sn (1) = 0} .
If sn (1) ≠ 0 for all n ≥ 1 then we set char(F ) = 0.
Exercises:
1. The finite field Fp : For n ∈ N, let Z/nZ denote the set of integers mod n. That is, each element of Z/nZ is a subset of Z of the form d + nZ, where d ∈ Z. We define addition and multiplication on Z/nZ by
• (a + nZ) + (b + nZ) = (a + b) + nZ.
• (a + nZ) · (b + nZ) = (ab) + nZ.
Show that these operations are well defined. That is, if a′ , b′ ∈ Z are integers such that a + nZ = a′ + nZ and b + nZ = b′ + nZ, then (a′ + nZ) + (b′ + nZ) = (a + b) + nZ and (a′ + nZ) · (b′ + nZ) = (ab) + nZ. Moreover, show that these operations make Z/nZ into a field if and only if n is prime. In that case, one writes Z/pZ = Fp .
2. The finite field Fpn : Let F be a finite field.
(a) Show that char(F ) is a prime number (in particular non-zero).
(b) Write p for the characteristic of F and define
F ′ = {sn (1) : n = 1, . . . , p} ,
where sn is the function given in the notation section. Show that F ′ is a subfield of F , isomorphic to Fp .
(c) We can consider F as a vector space over F ′ . Vector addition and scalar multiplication are interpreted using the operations of F . Show that F has finite dimension.
(d) Writing n for the dimension of F over F ′ , show that
|F | = p^n .
3. Consider R as a vector space over Q (using addition and multiplication of real numbers).
Does this vector space have finite dimension?
4. Recall the definition of direct sum: if W1 and W2 are subspaces of a vector space V
then we write W1 ⊕ W2 for the space W1 + W2 if W1 ∩ W2 = {~0}. For k ≥ 3 we write
W1 ⊕ · · · ⊕ Wk for the space W1 + · · · + Wk if for each i = 2, . . . , k, we have
Wi ∩ (W1 + · · · + Wi−1 ) = {~0} .
Let S = {v1 , . . . , vn } be a subset of nonzero vectors in a vector space V and for each
k = 1, . . . , n write Wk = Span({vk }). Show that S is a basis for V if and only if
V = W1 ⊕ · · · ⊕ Wn .
2 Linear transformations

2.1 Definitions and basic properties
We now move to linear transformations.
Definition 2.1.1. Let V and W be vector spaces over the same field F . A function T : V →
W is called a linear transformation if
1. for all v1 , v2 ∈ V , T (v1 + v2 ) = T (v1 ) + T (v2 ) and
2. for all v ∈ V and c ∈ F , T (cv) = cT (v).
Remark. We only need to check that for all v1 , v2 ∈ V , c ∈ F , T (cv1 + v2 ) = cT (v1 ) + T (v2 ).
Examples:
1. Let V = F n and W = F m , the vector spaces of n-tuples and m-tuples respectively.
Any m × n matrix A defines a linear transformation LA : F n → F m by
LA~v = A · ~v .
2. Let V be a finite dimensional vector space (of dimension n) and fix an (ordered) basis
β = {v1 , . . . , vn }
of V . Define Tβ : V → F n by
Tβ (v) = (a1 , . . . , an ) ,
where v = a1 v1 + · · · + an vn . Then Tβ is linear. It is called the coordinate map for the
ordered basis β.
Suppose that T : Fn → Fm is a linear transformation and ~x ∈ Fn . Then writing ~x =
(x1 , . . . , xn ),
T (~x) = T ((x1 , . . . , xn ))
= T (x1 (1, 0, . . . , 0) + · · · + xn (0, . . . , 0, 1))
= x1 T ((1, 0, . . . , 0)) + · · · + xn T ((0, . . . , 0, 1)) .
Therefore we only need to know the values of T at the standard basis. This leads us to:
Theorem 2.1.2 (The slogan). Given V and W , vector spaces over F , let {v1 , . . . , vn } be a
basis for V . If {w1 , . . . , wn } are any vectors in W , there exists exactly one linear transformation T : V → W such that
T (vi ) = wi for all i = 1, . . . , n .
Proof. We need to prove two things: (a) there is such a linear transformation and (b) there
cannot be more than one. Motivated by the above, we first prove (a).
Each v ∈ V has a unique representation
v = a1 v1 + · · · + an vn .
Define T by
T (v) = a1 w1 + · · · + an wn .
Note that by unique representations, T is well-defined. We claim that T is linear. Let v, ṽ ∈ V and c ∈ F . We must show that T (cv + ṽ) = cT (v) + T (ṽ). If v = a1 v1 + · · · + an vn and ṽ = ã1 v1 + · · · + ãn vn then we claim that the unique representation of cv + ṽ is
cv + ṽ = (ca1 + ã1 )v1 + · · · + (can + ãn )vn .
Therefore
T (cv + ṽ) = (ca1 + ã1 )w1 + · · · + (can + ãn )wn
= c(a1 w1 + · · · + an wn ) + (ã1 w1 + · · · + ãn wn )
= cT (v) + T (ṽ) .
Thus T is linear.
Now we show that T is unique. Suppose that T ′ is another linear transformation such that
T ′ (vi ) = wi for all i = 1, . . . , n .
Then if v ∈ V write v = a1 v1 + · · · + an vn . We have
T ′ (v) = T ′ (a1 v1 + · · · + an vn )
= a1 T ′ (v1 ) + · · · + an T ′ (vn )
= a1 w1 + · · · + an wn
= T (v) .
Since T (v) = T ′ (v) for all v, by definition T = T ′ .
2.2 Range and nullspace, one-to-one, onto
Definition 2.2.1. If T : V → W is linear we define the nullspace of T by
N (T ) = {v ∈ V : T (v) = ~0} .
Here, ~0 is the zero vector in W . We also define the range of T
R(T ) = {w ∈ W : there exists v ∈ V s.t. T (v) = w} .
Proposition 2.2.2. If T : V → W is linear then N (T ) is a subspace of V and R(T ) is a
subspace of W .
Proof. First note that T (~0) = ~0. This holds from
~0 = 0T (v) = T (0v) = T (~0) .
Therefore both spaces are nonempty.
Choose v1 , v2 ∈ N (T ) and c ∈ F . Then
T (cv1 + v2 ) = cT (v1 ) + T (v2 ) = c~0 + ~0 = ~0 ,
so that cv1 + v2 ∈ N (T ). Therefore N (T ) is a subspace of V . Also if w1 , w2 ∈ R(T ) and
c ∈ F then we may find v1 , v2 ∈ V such that T (v1 ) = w1 and T (v2 ) = w2 . Now
T (cv1 + v2 ) = cT (v1 ) + T (v2 ) = cw1 + w2 ,
so that cw1 + w2 ∈ R(T ). Therefore R(T ) is a subspace of W .
Definition 2.2.3. Let S and T be sets and f : S → T a function.
1. f is one-to-one if it maps distinct points to distinct points. In other words if s1 , s2 ∈ S
such that s1 ≠ s2 then f (s1 ) ≠ f (s2 ). Equivalently, whenever f (s1 ) = f (s2 ) then
s1 = s2 .
2. f is onto if its range is equal to T . That is, for each t ∈ T there exists s ∈ S such that
f (s) = t.
Theorem 2.2.4. Let T : V → W be linear. Then T is one-to-one if and only if N (T ) = {~0}.
Proof. Suppose T is one-to-one. We want to show that N (T ) = {~0}. Clearly ~0 ∈ N (T ) as it
is a subspace of V . If v ∈ N (T ) then we have
T (v) = ~0 = T (~0) .
Since T is one-to-one this implies that v = ~0.
Suppose conversely that N (T ) = {~0}. If T (v1 ) = T (v2 ) then
~0 = T (v1 ) − T (v2 ) = T (v1 − v2 ) ,
so that v1 − v2 ∈ N (T ). But the only vector in the nullspace is ~0 so v1 − v2 = ~0. This implies
that v1 = v2 and T is one-to-one.
We now want to give a theorem that characterizes one-to-one and onto linear maps in a
different way.
Theorem 2.2.5. Let T : V → W be linear.
1. T is one-to-one if and only if it maps linearly independent sets in V to linearly independent sets in W .
2. T is onto if and only if it maps spanning sets of V to spanning sets of W .
Proof. Suppose that T is one-to-one and that S is a linearly independent set in V . We will
show that T (S), defined by
T (S) := {T (s) : s ∈ S} ,
is also linearly independent. If
a1 T (s1 ) + · · · + ak T (sk ) = ~0
for some si ∈ S and ai ∈ F then
T (a1 s1 + · · · + ak sk ) = ~0 .
Therefore a1 s1 + · · · + ak sk ∈ N (T ). But T is one-to-one so N (T ) = {~0}. This gives
a1 s1 + · · · + ak sk = ~0 .
Linear independence of S gives the ai ’s are zero. Thus T (S) is linearly independent.
Suppose conversely that T maps linearly independent sets to linearly independent sets.
If v is any nonzero vector in V then {v} is linearly independent. Therefore so is {T (v)}.
This implies that T (v) ≠ ~0. Therefore N (T ) = {~0} and so T is one-to-one.
If T maps spanning sets to spanning sets then let w ∈ W . Let S be a spanning set of V ,
so that consequently T (S) spans W . If w ∈ W we can write w = a1 T (s1 ) + · · · + ak T (sk )
for ai ∈ F and si ∈ S, so
w = T (a1 s1 + · · · + ak sk ) ∈ R(T ) ,
giving that T is onto.
For the converse suppose that T is onto and that S spans V . We claim that T (S) spans
W . To see this, let w ∈ W and note there exists v ∈ V such that T (v) = w. Write
v = a1 s 1 + · · · + ak s k ,
so w = T (v) = a1 T (s1 ) + · · · + ak T (sk ) ∈ Span(T (S)). Therefore T (S) spans W .
Corollary 2.2.6. Let T : V → W be linear.
1. if V and W are finite dimensional, then T is an isomorphism (one-to-one and onto)
if and only if T maps bases to bases.
2. If V is finite dimensional, then every basis of V is mapped to a spanning set of R(T ).
3. If V is finite dimensional, then T is one-to-one if and only if T maps bases of V to
bases of R(T ).
Theorem 2.2.7 (Rank-nullity theorem). Let T : V → W be linear and V of finite dimension. Then
rank(T ) + nullity(T ) = dim V .
Proof. Let
{v1 , . . . , vk }
be a basis for N (T ). Extend it to a basis
{v1 , . . . , vk , vk+1 , . . . , vn }
of V . Write N ′ = Span{vk+1 , . . . , vn } and note that V = N (T ) ⊕ N ′ .
Lemma 2.2.8. If N (T ) and N ′ are complementary subspaces (that is, V = N (T ) ⊕ N ′ ) then T is one-to-one on N ′ .
Proof. If z1 , z2 ∈ N ′ such that T (z1 ) = T (z2 ) then z1 − z2 ∈ N (T ). But it is in N ′ so it is in N ′ ∩ N (T ), which is only the zero vector. So z1 = z2 .
We may view T as a linear transformation only on N ′ ; call it T |N ′ ; in other words, T |N ′ is a linear transformation from N ′ to W that acts exactly as T does. By the corollary, {T |N ′ (vk+1 ), . . . , T |N ′ (vn )} is a basis for R(T |N ′ ). Therefore {T (vk+1 ), . . . , T (vn )} is a basis for R(T |N ′ ). By part two of the corollary, {T (v1 ), . . . , T (vn )} spans R(T ). So
R(T ) = Span({T (v1 ), . . . , T (vn )})
= Span({T (vk+1 ), . . . , T (vn )})
= R(T |N ′ ) .
The second equality follows because the vectors T (v1 ), . . . , T (vk ) are all zero and do not
contribute to the span (you can work this out as an exercise). Thus {T (vk+1 ), . . . , T (vn )} is
a basis for R(T ) and
rank(T ) + nullity(T ) = (n − k) + k = n = dim V .
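For example, let T : R3 → R3 be the projection T ((x, y, z)) = (x, y, 0). Then N (T ) is the z-axis, so nullity(T ) = 1, and R(T ) is the xy-plane, so rank(T ) = 2; indeed 2 + 1 = 3 = dim R3 .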
2.3 Isomorphisms and L(V, W )
Definition 2.3.1. If S and T are sets and f : S → T is a function then a function g : T → S
is called an inverse function for f (written f −1 ) if
f (g(t)) = t and g(f (s)) = s for all t ∈ T, s ∈ S .
Fact: f : S → T has an inverse function if and only if f is one-to-one and onto. Furthermore
the inverse is one-to-one and onto. (Explain this.)
Theorem 2.3.2. If T : V → W is an isomorphism then the inverse map T −1 : W → V is
an isomorphism.
Proof. We have one-to-one and onto. We just need to show linear. Suppose that w1 , w2 ∈ W
and c ∈ F . Then
T (T −1 (cw1 + w2 )) = cw1 + w2
and
T (cT −1 (w1 ) + T −1 (w2 )) = cT (T −1 (w1 )) + T (T −1 (w2 )) = cw1 + w2 .
However T is one-to-one, so
T −1 (cw1 + w2 ) = cT −1 (w1 ) + T −1 (w2 ) .
The proof of the next lemma is in the homework.
Lemma 2.3.3. Let V and W be vector spaces with dim V = dim W . If T : V → W is
linear then T is one-to-one if and only if T is onto.
Example: The coordinate map is an isomorphism. For V of dimension n choose a basis
β = {v1 , . . . , vn }
and define Tβ : V → F n by Tβ (v) = (a1 , . . . , an ), where
v = a1 v1 + · · · + an vn .
Then Tβ is linear (check). To show one-to-one and onto we only need to check one (since dim V = dim F n ). If (a1 , . . . , an ) ∈ F n then define v = a1 v1 + · · · + an vn . Now
Tβ (v) = (a1 , . . . , an ) .
So Tβ is onto.
The space of linear maps. Let V and W be vector spaces over the same field F . Define
L(V, W ) = {T : V → W |T is linear} .
We define addition and scalar multiplication as usual: for T, U ∈ L(V, W ) and c ∈ F ,
(T + U )(v) = T (v) + U (v) and (cT )(v) = cT (v) .
This is a vector space (exercise).
Theorem 2.3.4. If dim V = n and dim W = m then dim L(V, W ) = mn. Given bases
{v1 , . . . , vn } and {w1 , . . . , wm } of V and W , the set {Ti,j : 1 ≤ i ≤ n, 1 ≤ j ≤ m} is a basis
for L(V, W ), where
Ti,j (vk ) = wj if k = i, and Ti,j (vk ) = ~0 otherwise.
Proof. First, to show linear independence, suppose that
Σi,j ai,j Ti,j = 0T ,
where the element on the right is the zero transformation. Then for each k = 1, . . . , n, apply both sides to vk :
Σi,j ai,j Ti,j (vk ) = 0T (vk ) = ~0 .
Since Ti,j (vk ) = ~0 unless i = k, we then get
~0 = Σi,j ai,j Ti,j (vk ) = ak,1 w1 + · · · + ak,m wm .
But the wj ’s form a basis, so ak,1 = · · · = ak,m = 0. This is true for all k so the Ti,j ’s are linearly independent.
To show spanning suppose that T : V → W is linear. Then for each i = 1, . . . , n, the
vector T (vi ) is in W , so we can write it in terms of the wj ’s:
T (vi ) = ai,1 w1 + · · · + ai,m wm .
Now define the transformation
T̃ = Σi,j ai,j Ti,j .
We claim that this equals T . To see this, we must only check on the basis vectors. For each k = 1, . . . , n,
T (vk ) = ak,1 w1 + · · · + ak,m wm .
However,
T̃ (vk ) = Σi,j ai,j Ti,j (vk ) = ak,1 w1 + · · · + ak,m wm ,
so T̃ and T agree on the basis and hence T̃ = T .
2.4 Matrices and coordinates
Let T : V → W be a linear transformation and
β = {v1 , . . . , vn } and γ = {w1 , . . . , wm }
bases for V and W , respectively.
We now build a matrix, which we label [T ]βγ . (Using the column convention.)
1. Since T (v1 ) ∈ W , we can write
T (v1 ) = a1,1 w1 + · · · + am,1 wm .
2. Put the entries a1,1 , . . . , am,1 into the first column of [T ]βγ .
3. Repeat for k = 1, . . . , n, writing
T (vk ) = a1,k w1 + · · · + am,k wm
and place the entries a1,k , . . . , am,k into the k-th column.
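For example, let T : R2 → R2 be T ((x, y)) = (x + y, 2y) and take β = γ to be the standard basis {(1, 0), (0, 1)}. Then T ((1, 0)) = (1, 0) = 1(1, 0) + 0(0, 1) and T ((0, 1)) = (1, 2) = 1(1, 0) + 2(0, 1), so the first column is (1, 0) and the second is (1, 2); writing the matrix by rows as before,
[T ]βγ = [1 1; 0 2] .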
Theorem 2.4.1. For each T : V → W and bases β and γ, there exists a unique m × n
matrix [T ]βγ such that for all v ∈ V ,
[T ]βγ [v]β = [T (v)]γ .
Proof. Let v ∈ V . Then write v = a1 v1 + · · · + an vn .
T (v) = a1 T (v1 ) + · · · + an T (vn )
= a1 (a1,1 w1 + · · · + am,1 wm ) + · · · + an (a1,n w1 + · · · + am,n wm ) .
Collecting terms,
T (v) = (a1 a1,1 + · · · + an a1,n )w1 + · · · + (a1 am,1 + · · · + an am,n )wm .
This gives the coordinates of T (v) in terms of γ: the i-th entry of [T (v)]γ is a1 ai,1 + · · · + an ai,n , which is exactly the i-th entry of the product of the matrix (ai,j ) with the column vector (a1 , . . . , an ). In other words,
[T (v)]γ = [T ]βγ [v]β .
Suppose that A and B are two matrices such that for all v ∈ V ,
A[v]β = [T (v)]γ = B[v]β .
Take v = vk . Then [v]β = ~ek and A[v]β is the k-th column of A (and similarly for B).
Therefore A and B have all the same columns. This means A = B.
Theorem 2.4.2. Given bases β and γ of V and W , the map T 7→ [T ]βγ is an isomorphism.
Proof. The map is one-to-one, since if [T ]βγ = [U ]βγ then T and U agree on the basis β, and hence T = U . To show it is linear, let c ∈ F and T, U ∈ L(V, W ). For each k = 1, . . . , n,
[cT + U ]βγ [vk ]β = [cT (vk ) + U (vk )]γ = c[T (vk )]γ + [U (vk )]γ = c[T ]βγ [vk ]β + [U ]βγ [vk ]β = (c[T ]βγ + [U ]βγ )[vk ]β .
However the left side is the k-th column of [cT + U ]βγ and the right side is the k-th column
of c[T ]βγ + [U ]βγ .
Both spaces have the same dimension, so the map is also onto and thus an isomorphism.
Examples:
1. Change of coordinates: Let V be finite dimensional and β, β′ two bases of V . How do we express coordinates relative to β′ in terms of those relative to β? For each v ∈ V ,
[v]β′ = [Iv]β′ = [I]ββ′ [v]β .
We multiply by the matrix [I]ββ′ (a concrete instance is worked out after these examples).
2. Suppose that V, W and Z are vector spaces over the same field. Let T : V → W and
U : W → Z be linear with β, γ and δ bases for V, W and Z.
(a) U T : V → Z is linear. If v1 , v2 ∈ V and c ∈ F then
(U T )(cv1 + v2 ) = U (T (cv1 + v2 )) = U (cT (v1 ) + T (v2 )) = c(U T )(v1 ) + (U T )(v2 ) .
(b) For each v ∈ V ,
[(U T )v]δ = [U (T (v))]δ = [U ]γδ [T (v)]γ = [U ]γδ [T ]βγ [v]β .
Therefore
[U T ]βδ = [U ]γδ [T ]βγ .
(c) If T is an isomorphism from V to W ,
Id = [I]γγ = [T ]βγ [T −1 ]γβ .
Similarly,
Id = [I]ββ = [T −1 ]γβ [T ]βγ .
This implies that [T −1 ]γβ = ([T ]βγ )−1 .
3. To change coordinates back, we multiply by
[I]β′β = ([I]ββ′ )−1 .
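For example, in R2 let β = {(1, 0), (0, 1)} be the standard basis and β′ = {(1, 1), (1, −1)}. Since (1, 0) = (1/2)(1, 1) + (1/2)(1, −1) and (0, 1) = (1/2)(1, 1) − (1/2)(1, −1), we get [I]ββ′ = [1/2 1/2; 1/2 −1/2], while [I]β′β = [1 1; 1 −1] (its columns are the standard coordinates of (1, 1) and (1, −1)). One checks directly that these two matrices are inverse to each other.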
Definition 2.4.3. An n × n matrix A is invertible if there exists an n × n matrix B such
that
I = AB = BA .
Remark. If β, β′ are bases of V then
I = [I]ββ = [I]β′β [I]ββ′ .
Therefore each change of basis matrix is invertible.
Now how do we relate the matrix of T with respect to different bases?
Theorem 2.4.4. Let V and W be finite-dimensional vector spaces over F with β, β′ bases for V and γ, γ′ bases for W .
1. If T : V → W is linear then there exist invertible matrices P and Q such that
[T ]βγ = P [T ]β′γ′ Q .
2. If T : V → V is linear then there exists an invertible matrix P such that
[T ]ββ = P −1 [T ]β′β′ P .
Proof. For the first statement,
[T ]βγ = [I]γ′γ [T ]β′γ′ [I]ββ′ .
For the second,
[T ]ββ = [I]β′β [T ]β′β′ [I]ββ′ = ([I]ββ′ )−1 [T ]β′β′ [I]ββ′ .
Definition 2.4.5. Two n × n matrices A and B are similar if there exists an n × n invertible
matrix P such that
A = P −1 BP .
Theorem 2.4.6. Let A and B be n×n matrices with entries from F . If A and B are similar
then there exists an n-dimensional vector space V , a linear transformation T : V → V , and
bases β, β′ such that
A = [T ]ββ and B = [T ]β′β′ .
Proof. Since A and B are similar we may write B = P −1 AP for some invertible matrix P (if A = Q−1 BQ, take P = Q−1 ). Define the linear transformation LA : F n → F n by
LA (~v ) = A · ~v .
If we choose β to be the standard basis then
[LA ]ββ = A .
Next we will show that if β′ = {p~1 , . . . , p~n } are the columns of P then β′ is a basis and P = [I]β′β , where I : F n → F n is the identity map. If we prove this, then P −1 = [I]ββ′ and so
B = P −1 AP = [I]ββ′ [LA ]ββ [I]β′β = [LA ]β′β′ .
Why is β′ a basis? Note that
p~k = P · ~ek ,
so that if LP is invertible then β′ will be the image of a basis and thus a basis. But for all
~v ∈ F n ,
LP −1 LP ~v = P −1 · P · ~v = ~v
and
LP LP −1 ~v = ~v .
So (LP )−1 = LP −1 . This completes the proof.
The moral: Similar matrices represent the same transformation but with respect to two
different bases. Any property of matrices that is invariant under conjugation can be viewed
as a property of the underlying transformation.
Example: Trace. Given an n × n matrix A with entries from F , define
Tr A = a1,1 + a2,2 + · · · + an,n .
Note that if P is another matrix (not necessarily invertible),
T r (AP ) = Σi (AP )i,i = Σi Σl ai,l pl,i = Σl Σi pl,i ai,l = Σl (P A)l,l = T r (P A) ,
where all the sums run over 1, . . . , n.
Therefore if P is invertible,
T r (P −1 AP ) = T r (AP P −1 ) = T r A .
This means that trace is invariant under conjugation. Thus if T : V → V is linear (and V is
finite dimensional) then T r T can be defined as
T r [T ]ββ
for any basis β.
2.5 Exercises
Notation:
1. A group is a pair (G, ·), where G is a set and · : G × G → G is a function (usually
called product) such that
(a) there is an identity element e; that is, an element with the property
e · g = g · e = g for all g ∈ G ,
(b) for all g ∈ G there is an inverse element in G called g −1 such that
g −1 · g = g · g −1 = e ,
(c) and the operation is associative: for all g, h, k ∈ G,
(g · h) · k = g · (h · k) .
If the operation is commutative; that is, for all g, h ∈ G, we have g · h = h · g then we
call G abelian.
2. If (G, ·) is a group and H is a subset of G then we call H a subgroup of G if (H, ·|H×H )
is a group. Equivalently (and analogously to vector spaces and subspaces), H ⊆ G is
a subgroup of G if and only if
(a) for all h1 , h2 ∈ H we have h1 · h2 ∈ H and
(b) for all h ∈ H, we have h−1 ∈ H.
3. If G and H are groups then a function Φ : G → H is called a group homomorphism if
Φ(g1 · g2 ) = Φ(g1 ) · Φ(g2 ) for all g1 and g2 ∈ G .
Note that the product on the left is in G whereas the product on the right is in H. We
define the kernel of Φ to be
Ker(Φ) = {g ∈ G : Φ(g) = eH } .
Here, eH refers to the identity element of H.
4. A group homomorphism Φ : G → H is called
• a monomorphism (or injective, or one-to-one), if Φ(g1 ) = Φ(g2 ) ⇒ g1 = g2 ;
• an epimorphism (or surjective, or onto), if Φ(G) = H;
• an isomorphism (or bijective), if it is both injective and surjective.
A group homomorphism Φ : G → G is called an endomorphism of G. An endomorphism which is also an isomorphism is called an automorphism. The set of automorphisms of a group G is denoted by Aut(G).
5. Recall that, if F is a field (with operations + and ·) then (F, +) and (F \ {0}, ·) are
abelian groups, with identity elements 0 and 1 respectively. If F and G are fields and
Φ : F → G is a function then we call Φ a field homomorphism if
(a) for all a, b ∈ F we have
Φ(a + b) = Φ(a) + Φ(b) ,
(b) for all a, b ∈ F we have
Φ(ab) = Φ(a)Φ(b) ,
(c) and Φ(1F ) = 1G . Here 1F and 1G are the multiplicative identities of F and G
respectively.
6. If V and W are vector spaces over the same field F then we define their product to be
the set
V × W = {(v, w) : v ∈ V and w ∈ W } .
which becomes a vector space under the operations
(v, w) + (v′ , w′ ) = (v + v′ , w + w′ ) and c(v, w) = (cv, cw) for v, v′ ∈ V , w, w′ ∈ W , c ∈ F.
If you are familiar with the notion of an external direct sum, notice that the product
of two vector spaces is the same as their external direct sum. The two notions cease
being equivalent when one considers infinitely many factors/summands.
If Z is another vector space over F then we call a function f : V × W → Z bilinear if
(a) for each fixed v ∈ V , the function fv : W → Z defined by
fv (w) = f ((v, w))
is a linear transformation as a function of w and
(b) for each fixed w ∈ W , the function fw : V → Z defined by
fw (v) = f ((v, w))
is a linear transformation as a function of v.
Exercises:
1. Suppose that G and H are groups and Φ : G → H is a homomorphism.
(a) Prove that if H ′ is a subgroup of H then the inverse image
Φ−1 (H ′ ) = {g ∈ G : Φ(g) ∈ H ′ }
is a subgroup of G. Deduce that Ker(Φ) is a subgroup of G.
(b) Prove that if G′ is a subgroup of G, then the image of G′ under Φ,
Φ(G′ ) = {Φ(g) | g ∈ G′ }
is a subgroup of H.
(c) Prove that Φ is one-to-one if and only if Ker(Φ) = {eG }. (Here, eG is the identity
element of G.)
2. Prove that every field homomorphism is one-to-one.
3. Let V and W be finite dimensional vector spaces with dim V = n and dim W = m.
Suppose that T : V → W is a linear transformation.
(a) Prove that if n > m then T cannot be one-to-one.
(b) Prove that if n < m then T cannot be onto.
(c) Prove that if n = m then T is one-to-one if and only if T is onto.
4. Let F be a field, V, W be finite-dimensional F -vector spaces, and Z be any F -vector
space. Choose a basis {v1 , . . . , vn } of V and a basis {w1 , . . . , wm } of W . Let
{zi,j : 1 ≤ i ≤ n, 1 ≤ j ≤ m}
be any set of mn vectors from Z. Show that there is precisely one bilinear transformation f : V × W → Z such that
f (vi , wj ) = zi,j for all i, j .
5. Let V be a vector space and T : V → V a linear transformation. Show that the
following two statements are equivalent.
(A) V = R(T ) ⊕ N (T ), where R(T ) is the range of T and N (T ) is the nullspace of T .
(B) N (T ) = N (T 2 ), where T 2 is T composed with itself.
6. Let V, W and Z be finite-dimensional vector spaces over a field F . If T : V → W and
U : W → Z are linear transformations, prove that
rank(U T ) ≤ min{rank(U ), rank(T )} .
Prove also that if either of U or T is invertible, then the rank of U T is equal to the
rank of the other one. Deduce that if P : V → V and Q : W → W are isomorphisms
then the rank of QT P equals the rank of T .
7. Let V and W be finite-dimensional vector spaces over a field F and T : V → W be
a linear transformation. Show that there exist ordered bases β of V and γ of W such
that
([T ]βγ )i,j = 0 if i ≠ j, and ([T ]βγ )i,j = 0 or 1 if i = j .
8. The purpose of this question is to show that the row rank of a matrix is equal to its
column rank. Note that this is obviously true for a matrix of the form described in
the previous exercise. Our goal will be to put an arbitrary matrix in this form without
changing either its row rank or its column rank. Let A be an m × n matrix with entries
from a field F .
(a) Show that the column rank of A is equal to the rank of the linear transformation
LA : F n → F m defined by LA (~v ) = A · ~v , viewing ~v as a column vector.
(b) Use exercise 6 to show that if P and Q are invertible n × n and m × m matrices respectively then the column rank of QAP equals the column rank of A.
(c) Show that the row rank of A is equal to the rank of the linear transformation
RA : F m → F n defined by RA (~v ) = ~v · A, viewing ~v as a row vector.
(d) Use exercise 6 to show that if P and Q are invertible n × n and m × m matrices respectively then the row rank of QAP equals the row rank of A.
(e) Show that there exist invertible n × n and m × m matrices P and Q respectively such that QAP has the form described in exercise 7. Deduce that the row rank of A equals
the column rank of A.
9. Given an angle θ ∈ [0, 2π) let Tθ : R2 → R2 be the function which rotates a vector
clockwise about the origin by an angle θ. Find the matrix of Tθ relative to the standard
basis. You do not need to prove that Tθ is linear.
10. Given m ∈ R, define the line
Lm = {(x, y) ∈ R2 : y = mx} .
(a) Let Tm be the function which maps a point in R2 to its closest point in Lm . Find
the matrix of Tm relative to the standard basis. You do not need to prove that
Tm is linear.
(b) Let Rm be the function which maps a point in R2 to the reflection of this point
about the line Lm . Find the matrix of Rm relative to the standard basis. You do
not need to prove that Rm is linear.
Hint for (a) and (b): first find the matrix relative to a carefully chosen basis
and then perform a change of basis.
11. The quotient space V /W . Let V be an F-vector space, and W ⊂ V a subspace. A
subset S ⊂ V is called a W -affine subspace of V , if the following holds:
∀s, s′ ∈ S : s − s′ ∈ W
and
∀s ∈ S, w ∈ W : s + w ∈ S.
(a) Let S and T be W -affine subspaces of V and c ∈ F. Put
S + T := {s + t : s ∈ S, t ∈ T } and cT := {ct : t ∈ T } if c ≠ 0, cT := W if c = 0 .
Show that S + T and cT are again W -affine subspaces of V .
(b) Show that the above operations define an F-vector space structure on the set of
all W -affine subspaces of V .
We will write V /W for the set of W -affine subspaces of V . We now know that it
is a vector space. Note that the elements of V /W are subsets of V .
(c) Show that if v ∈ V , then
v + W := {v + w : w ∈ W }
is a W -affine subspace of V . Show moreover that for any W -affine subspace S ⊂ V
there exists a v ∈ V such that S = v + W .
(d) Show that the map p : V → V /W defined by p(v) = v +W is linear and surjective.
(e) Compute the nullspace of p and, if the dimension of V is finite, the dimension of
V /W .
A helpful way to think about the quotient space V /W is to think of it as “being” the
vector space V , but with a new notion of “equality” of vectors. Namely, two vectors
v1 , v2 ∈ V are now seen as equal if v1 − v2 ∈ W . Use this point of view to find a
solution for the following exercise. When you find it, use the formal definition given
above to write your solution rigorously.
12. Let V and X be F-vector spaces, and f ∈ L(V, X). Let W be a subspace of V
contained in N (f ). Consider the quotient space V /W and the map p : V → V /W from
the previous exercise.
(a) Show that there exists a unique f˜ ∈ L(V /W, X) such that f = f˜ ◦ p.
(b) Show that f˜ is injective if and only if W = N (f ).
3 Dual spaces

3.1 Definitions
Consider the space F n and write each vector as a column vector. When can we say that a
vector is zero? When all coordinates are zero. Further, we say that two vectors are the same
if all of their coordinates are the same. This motivates the definition of the coordinate maps
e∗i : F n → F by e∗i (~v ) = i-th coordinate of ~v .
Notice that each e∗i is a linear function from F n to F . Furthermore they are linearly independent, so since the dimension of L(F n , F ) is n, they form a basis.
Last it is clear that a vector ~v is zero if and only if e∗i (~v ) = 0 for all i. This is true if
and only if f (~v ) = 0 for all f which are linear functions from F n → F . This motivates the
following definition.
Definition 3.1.1. If V is a vector space over F we define the dual space V ∗ as the space of
linear functionals
V ∗ = {f : V → F | f is linear} .
We can view this as the space L(V, F ), where F is considered as a one-dimensional vector
space over itself.
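For example, the map f : R3 → R defined by f ((x, y, z)) = 2x − y + 5z is a linear functional, that is, an element of (R3 )∗ ; so is the map sending a polynomial p to p(0) on the space of polynomials.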
Suppose that V is finite dimensional and f ∈ V ∗ . Then by the rank-nullity theorem,
either f ≡ 0 or N (f ) is a dim V − 1 dimensional subspace of V . Conversely, you will show
in the homework that any dim V − 1 dimensional subspace W (that is, a hyperplane) is the
nullspace of some linear functional.
Definition 3.1.2. If β = {v1 , . . . , vn } is a basis for V then we define the dual basis β ∗ =
{fv1 , . . . , fvn } as the unique functionals satisfying
fvi (vj ) = 1 if i = j and fvi (vj ) = 0 if i ≠ j .
From our proof of the dimension of L(V, W ) we know that β ∗ is a basis of V ∗ .
Proposition 3.1.3. Given a basis β = {v1 , . . . , vn } of V and dual basis β ∗ of V ∗ we can
write
f = f (v1 )fv1 + · · · + f (vn )fvn .
In other words, the coefficients for f in the dual basis are just f (v1 ), . . . , f (vn ).
Proof. Given f ∈ V ∗ , we can write
f = a1 fv1 + · · · + an fvn .
To find the coefficients, we evaluate both sides at vk . The left is just f (vk ). The right is
ak fvk (vk ) = ak .
Therefore ak = f (vk ) and we are done.
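For example, take V = R2 with the standard basis β = {(1, 0), (0, 1)}. The dual basis is β ∗ = {e∗1 , e∗2 }, the coordinate functionals from the beginning of this section, and a functional such as f ((x, y)) = 2x + 3y has f ((1, 0)) = 2 and f ((0, 1)) = 3, so f = 2e∗1 + 3e∗2 .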
3.2 Annihilators
We now study annihilators.
Definition 3.2.1. If S ⊆ V we define the annihilator of S as
S ⊥ = {f ∈ V ∗ : f (v) = 0 for all v ∈ S} .
Theorem 3.2.2. Let V be a vector space and S ⊆ V .
1. S ⊥ is a subspace of V ∗ (although S does not have to be a subspace of V ).
2. S ⊥ = (Span S)⊥ .
3. If dim V < ∞ and U is a subspace of V then whenever {v1 , . . . , vk } is a basis for U
and {v1 , . . . , vn } is a basis for V ,
{fvk+1 , . . . , fvn } is a basis for U ⊥ .
Proof. First we show that S ⊥ is a subspace of V ∗ . Note that the zero functional obviously
sends every vector in S to zero, so 0 ∈ S ⊥ . If c ∈ F and f1 , f2 ∈ S ⊥ , then for each v ∈ S,
(cf1 + f2 )(v) = cf1 (v) + f2 (v) = 0 .
So cf1 + f2 ∈ S ⊥ and S ⊥ is a subspace of V ∗ .
Next we show that S ⊥ = (Span S)⊥ . To prove the forward inclusion, take f ∈ S ⊥ . Then
if v ∈ Span S we can write
v = a1 v1 + · · · + am vm
for scalars ai ∈ F and vi ∈ S. Thus
f (v) = a1 f (v1 ) + · · · + am f (vm ) = 0 ,
so f ∈ (Span S)⊥ . On the other hand if f ∈ (Span S)⊥ then clearly f (v) = 0 for all v ∈ S
(since S ⊆ Span S). This completes the proof of item 2.
For the third item, we know that the functionals fvk+1 , . . . , fvn are linearly independent.
Therefore we just need to show that they span U ⊥ . To do this, take f ∈ U ⊥ . We can write
f in terms of the dual basis fv1 , . . . , fvn :
f = a1 fv1 + · · · + ak fvk + ak+1 fvk+1 + · · · + an fvn .
Using the formula we have for the coefficients, we get aj = f (vj ), which is zero for j ≤ k.
Therefore
f = ak+1 fvk+1 + · · · + an fvn
and we are done.
Corollary 3.2.3. If V is finite dimensional and W is a subspace then
dim V = dim W + dim W ⊥ .
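For example, take V = F 3 and let W be the span of (1, 0, 0). Extending {(1, 0, 0)} to the standard basis, part 3 of the theorem above says that the last two dual basis functionals {e∗2 , e∗3 } form a basis for W ⊥ ; indeed dim V = 3 = 1 + 2 = dim W + dim W ⊥ .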
Definition 3.2.4. For S ′ ⊆ V ∗ we define
⊥(S ′ ) = {v ∈ V : f (v) = 0 for all f ∈ S ′ } .
In the homework you will prove similar properties for ⊥(S ′ ).
Fact: v ∈ V is zero if and only if f (v) = 0 for all f ∈ V ∗ . One implication is easy. To prove the other, suppose that v ≠ ~0 and extend {v} to a basis for V . Then the dual basis has the property that fv (v) ≠ 0.
Proposition 3.2.5. If W ⊆ V is a subspace and V is finite-dimensional then ⊥ (W ⊥ ) = W .
Proof. If w ∈ W then for all f ∈ W ⊥ , we have f (w) = 0, so w ∈⊥ (W ⊥ ). Suppose conversely
that w ∈ V has f (w) = 0 for all f ∈ W ⊥ . If w ∉ W then build a basis {v1 , . . . , vn } of V
such that {v1 , . . . , vk } is a basis for W and vk+1 = w. Then by the previous proposition,
{fvk+1 , . . . , fvn } is a basis for W ⊥ . However fw (w) = 1 ≠ 0, which is a contradiction, since
fw ∈ W ⊥ .
3.3 Double dual
Lemma 3.3.1. If v ∈ V is nonzero and dim V < ∞, there exists a linear functional fv ∈ V ∗
such that fv (v) = 1. Therefore v = ~0 if and only if f (v) = 0 for all f ∈ V ∗ .
Proof. Extend {v} to a basis of V and consider the dual basis. fv is in this basis and
fv (v) = 1.
For each v ∈ V we can define the evaluation map ṽ : V ∗ → F by
ṽ(f ) = f (v) .
Theorem 3.3.2. Suppose that V is finite-dimensional. Then the map Φ : V → V ∗∗ given by
Φ(v) = ṽ
is an isomorphism.
Proof. First we show that if v ∈ V then Φ(v) ∈ V ∗∗ . Clearly ṽ maps V ∗ to F , but we just need to show that ṽ is linear. If f1 , f2 ∈ V ∗ and c ∈ F then
ṽ(cf1 + f2 ) = (cf1 + f2 )(v) = cf1 (v) + f2 (v) = cṽ(f1 ) + ṽ(f2 ) .
Therefore Φ(v) ∈ V ∗∗ . We now must show that Φ is linear and either one-to-one or onto
(since the dimension of V is equal to the dimension of V ∗∗ ). First if v1 , v2 ∈ V , c ∈ F then
we want to show that
Φ(cv1 + v2 ) = cΦ(v1 ) + Φ(v2 ) .
Both sides are elements of V ∗∗ so we need to show they act the same on elements of V ∗ . Let
f ∈ V ∗ . Then
Φ(cv1 + v2 )(f ) = f (cv1 + v2 ) = cf (v1 ) + f (v2 ) = cΦ(v1 )(f ) + Φ(v2 )(f ) = (cΦ(v1 ) + Φ(v2 ))(f ) .
Finally to show one-to-one, we show that N (Φ) = {0}. If Φ(v) = ~0 then for all f ∈ V ∗ ,
0 = Φ(v)(f ) = f (v) .
This implies v = ~0.
Theorem 3.3.3. Let V be finite dimensional.
1. If β = {v1 , . . . , vn } is a basis for V then Φ(β) = {Φ(v1 ), . . . , Φ(vn )} is the double dual
of this basis.
2. If W is a subspace of V then Φ(W ) is equal to (W ⊥ )⊥ .
Proof. Recall that the dual basis of β is β ∗ = {fv1 , . . . , fvn }, where
fvi (vj ) = 1 if i = j and fvi (vj ) = 0 if i ≠ j .
Since Φ is an isomorphism, Φ(β) is a basis of V ∗∗ . Now
Φ(vi )(fvk ) = fvk (vi )
which is 1 if i = k and 0 otherwise. This means Φ(β) = β ∗∗ .
Next if W is a subspace, let w ∈ W . Letting f ∈ W ⊥ ,
Φ(w)(f ) = f (w) = 0 .
So w ∈ (W ⊥ )⊥ . However, since Φ is an isomorphism, Φ(W ) is a subspace of (W ⊥ )⊥ . But
they have the same dimension, so they are equal.
3.4 Dual maps
Definition 3.4.1. Let T : V → W be linear. We define the dual map T ∗ : W ∗ → V ∗ by
T ∗ (g)(v) = g(T (v)) .
Theorem 3.4.2. Let V and W be finite dimensional and let β and γ be bases for V and W .
If T : V → W is linear, so is T ∗ . If β ∗ and γ ∗ are the dual bases, then
[T ∗ ]γ∗β∗ = ([T ]βγ )t ,
the transpose of [T ]βγ .
Proof. First we show that T ∗ is linear. If g1 , g2 ∈ W ∗ and c ∈ F then for each v ∈ V ,
T ∗ (cg1 + g2 )(v) = (cg1 + g2 )(T (v)) = cg1 (T (v)) + g2 (T (v))
= cT ∗ (g1 )(v) + T ∗ (g2 )(v) = (cT ∗ (g1 ) + T ∗ (g2 ))(v) .
Next let β = {v1 , . . . , vn }, γ = {w1 , . . . , wm }, β ∗ = {fv1 , . . . , fvn } and γ ∗ = {gw1 , . . . , gwm }
and write [T ]βγ = (ai,j ). Then recall from the lemma that for any f ∈ V ∗ we have
f = f (v1 )fv1 + · · · + f (vn )fvn ,
Therefore the coefficient of fvi for T ∗ (gwk ) is
T ∗ (gwk )(vi ) = gwk (T (vi )) = gwk (a1,i w1 + · · · + am,i wm ) = ak,i .
So
T ∗ (gwk ) = ak,1 fv1 + · · · + ak,n fvn .
This is the k-th column of [T ∗ ]γ∗β∗ ; its entries ak,1 , . . . , ak,n form the k-th row of [T ]βγ , which proves the claim.
Theorem 3.4.3. If V and W are finite dimensional and T : V → W is linear then R(T ∗ ) =
(N (T ))⊥ and (R(T ))⊥ = N (T ∗ ).
Proof. If g ∈ N (T ∗ ) then T ∗ (g)(v) = 0 for all v ∈ V . If w ∈ R(T ) then w = T (v) for
some v ∈ V . Then g(w) = g(T (v)) = T ∗ (g)(v) = 0. Thus g ∈ (R(T ))⊥ . If, conversely,
g ∈ (R(T ))⊥ then we would like to show that T ∗ (g)(v) = 0 for all v ∈ V . We have
T ∗ (g)(v) = g(T (v)) = 0 .
Let f ∈ R(T ∗ ). Then f = T ∗ (g) for some g ∈ W ∗ . If T (v) = 0, we have f (v) =
T ∗ (g)(v) = g(T (v)) = 0. Therefore f ∈ (N (T ))⊥ . This gives R(T ∗ ) ⊆ (N (T ))⊥ . To show
the other direction,
dim R(T ∗ ) = m − dim N (T ∗ )
and
dim (N (T ))⊥ = n − dim N (T ) .
However, by part 1, dim N (T ∗ ) = m − dim R(T ), so
dim R(T ∗ ) = dim R(T ) = n − dim N (T ) = dim (N (T ))⊥ .
This gives the other inclusion.
3.5 Exercises
Notation:
1. Recall the definition of a bilinear function. Let F be a field, and V, W and Z be
F -vector spaces. A function f : V × W → Z is called bilinear if
(a) for each v ∈ V the function fv : W → Z given by fv (w) = f (v, w) is linear as a
function of w and
(b) for each w ∈ W the function fw : V → Z given by fw (v) = f (v, w) is linear as a
function of v.
When Z is the F -vector space F , one calls f a bilinear form.
2. Given a bilinear function f : V × W → Z, we define its left kernel and its right kernel
as
LN (f ) = {v ∈ V : f (v, w) = 0 ∀w ∈ W },
RN (f ) = {w ∈ W : f (v, w) = 0 ∀v ∈ V }.
More generally, for subspaces U ⊂ V and X ⊂ W we define their orthogonal complements
U ⊥f = {w ∈ W : f (u, w) = 0 ∀u ∈ U },
⊥f X = {v ∈ V : f (v, x) = 0 ∀x ∈ X}.
Notice that LN (f ) = ⊥f W and RN (f ) = V ⊥f .
Exercises:
1. Let V and W be vector spaces over a field F and let f : V × W → F be a bilinear
form. For each v ∈ V , we denote by fv the linear functional W → F given by
fv (w) = f (v, w). For each w ∈ W , we denote by fw the linear functional V → F given
by fw (v) = f (v, w).
(a) Show that the map
Φ : V → W ∗,
Φ(v) = fv
is linear and its kernel is LN (f ).
(b) Analogously, show that the map
Ψ : W → V ∗,
Ψ(w) = fw
is linear and its kernel is RN (f ).
(c) Assume now that V and W are finite-dimensional. Show that the map W ∗∗ → V ∗
given by composing Ψ with the inverse of the canonical isomorphism W → W ∗∗
is equal to Φ∗ , the map dual to Φ.
(d) Assuming further dim(V ) = dim(W ), conclude that the following statements are
equivalent:
i. LN (f ) = {0},
ii. RN (f ) = {0},
iii. Φ is an isomorphism,
iv. Ψ is an isomorphism.
2. Let V, W be finite-dimensional F -vector spaces. Denote by ξV and ξW the canonical
isomorphisms V → V ∗∗ and W → W ∗∗ . Show that if T : V → W is linear then
(ξW )−1 ◦ T ∗∗ ◦ ξV = T .
3. Let V be a finite-dimensional F -vector space. Show that any basis (λ1 , . . . , λn ) of V ∗
is the dual basis to some basis (v1 , . . . , vn ) of V .
4. Let V be an F -vector space, and S ′ ⊂ V ∗ a subset. Recall the definition
⊥S ′ = {v ∈ V : λ(v) = 0 ∀λ ∈ S ′ }.
Imitating a proof given in class, show the following:
(a) ⊥S ′ = ⊥span(S ′ ).
(b) Assume that V is finite-dimensional, let U ′ ⊂ V ∗ be a subspace, and let (λ1 , . . . , λn ) be a basis for V ∗ such that (λ1 , . . . , λk ) is a basis for U ′ . If (v1 , . . . , vn ) is the basis of V from the previous exercise, then (vk+1 , . . . , vn ) is a basis for ⊥U ′ . In particular, dim(U ′ ) + dim(⊥U ′ ) = dim(V ∗ ).
5. Let V be a finite-dimensional F -vector space, and U ⊂ V a hyperplane (that is, a
subspace of V of dimension dim V − 1).
(a) Show that there exists λ ∈ V ∗ with N (λ) = U .
(b) Show that if µ ∈ V ∗ is another functional with N (µ) = U , then there exists
c ∈ F × with µ = cλ.
6. (From Hoffman-Kunze)
(a) Let A and B be n × n matrices with entries from a field F . Show that T r (AB) =
T r (BA).
(b) Let T : V → V be a linear transformation on a finite-dimensional vector space.
Define the trace of T as the trace of the matrix of T , represented in some basis.
Prove that the definition of trace does not depend on the basis thus chosen.
(c) Prove that on the space of n × n matrices with entries from a field F , the trace
function T r is a linear functional. Show also that, conversely, if some linear
functional g on this space satisfies g(AB) = g(BA) then g is a scalar multiple of
the trace function.
4 Determinants
4.1 Permutations
Now we move to permutations. These will be used when we talk about the determinant.
Definition 4.1.1. A permutation on n letters is a function π : {1, . . . , n} → {1, . . . , n}
which is a bijection.
The set of all permutations forms a group under composition. There are n! elements.
There are two main ways to write a permutation.
1. Row notation:
1 2 3 4 5 6
6 4 2 3 5 1
Here we write the elements of {1, . . . , n} in the first row, in order. In the second row
we write the elements they are mapped to, in order.
2. Cycle decomposition:
(1 6)(2 4 3)(5)
All cycles are disjoint. It is easier to compose permutations this way. Suppose π is the
permutation given above and π 0 is the permutation
π 0 = (1 2 3 4 5)(6) .
Then the product of ππ 0 is (here we will apply π 0 first)
(1 6)(2 4 3)(5) · (1 2 3 4 5)(6) = (1 4 5 6)(2)(3) .
It is a simple fact that each permutation has a cycle decomposition with disjoint cycles.
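As an aside, the composition rule is easy to experiment with on a computer. The following is a minimal Python sketch (the helper names compose and cycles are ours, not from any library): it stores a permutation of {1, . . . , n} as a dictionary, composes two permutations by applying the right-hand factor first, and extracts the disjoint-cycle decomposition. It reproduces the product computed above.

def compose(p, q):
    """Return the permutation p * q (apply q first), as a dict on {1, ..., n}."""
    return {i: p[q[i]] for i in q}

def cycles(p):
    """Return the disjoint-cycle decomposition of p as a list of tuples."""
    seen, result = set(), []
    for start in sorted(p):
        if start in seen:
            continue
        cycle, i = [start], p[start]
        while i != start:
            cycle.append(i)
            seen.add(i)
            i = p[i]
        seen.add(start)
        result.append(tuple(cycle))
    return result

# pi = (1 6)(2 4 3)(5) and pi' = (1 2 3 4 5)(6), in one-line notation
pi       = {1: 6, 2: 4, 3: 2, 4: 3, 5: 5, 6: 1}
pi_prime = {1: 2, 2: 3, 3: 4, 4: 5, 5: 1, 6: 6}
print(cycles(compose(pi, pi_prime)))   # [(1, 4, 5, 6), (2,), (3,)]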
Definition 4.1.2. A transposition is a permutation that swaps two letters and fixes the
others. Removing the fixed letters, it looks like (i j) for i 6= j. An adjacent transposition is
one that swaps neighboring letters.
Lemma 4.1.3. Every permutation π can be written as a product of transpositions. (Not
necessarily disjoint.)
Proof. All we need to do is write a cycle as a product of transpositions. Note
(a1 a2 · · · ak ) = (a1 ak )(a1 ak−1 ) · · · (a1 a3 )(a1 a2 ) .
Definition 4.1.4. A pair of numbers (i, j) is an inversion pair for π if i < j but π(i) > π(j).
Write Ninv (π) for the number of inversion pairs of π.
For example, in the permutation (1 3)(2 4 5), also written in row notation as
1 2 3 4 5
3 4 1 5 2 ,
we have inversion pairs (1, 3), (1, 5), (2, 3), (2, 5), (4, 5).
Lemma 4.1.5. Let π be a permutation and σ = (k k +1) be an adjacent transposition. Then
Ninv (σπ) = Ninv (π) ± 1. If σ1 , . . . , σm are adjacent transpositions then
Ninv (πσ1 · · · σm ) − Ninv (π) is even if m is even, and odd if m is odd.
Proof. Let a < b ∈ {1, . . . , n}. If π(a), π(b) ∈ {k, k + 1} then σπ(a) − σπ(b) = −(π(a) − π(b))
so (a, b) is an inversion pair for π if and only if it is not one for σπ. We claim that in all
other cases, the sign of σπ(a) − σπ(b) is the same as the sign of π(a) − π(b). If neither
of π(a) and π(b) is in {k, k + 1} then σπ(a) − σπ(b) = π(a) − π(b). The other cases are
somewhat similar: if π(a) = k but π(b) > k + 1 then σπ(a) − σπ(b) = k + 1 − π(b) < 0 and
π(a) − π(b) = k − π(b) < 0. Keep going.
Therefore σπ has exactly the same inversion pairs as π except for the single pair (a, b) with {π(a), π(b)} = {k, k + 1}, which switches status. This proves the lemma.
Definition 4.1.6. Given a permutation π on n letters, we say that π is even if it can be
written as a product of an even number of transpositions and odd otherwise. This is called
the signature (or sign) of a permutation:
sgn(π) = +1 if π is even, and sgn(π) = −1 if π is odd.
Theorem 4.1.7. If π can be written as a product of an even number of transpositions, it
cannot be written as a product of an odd number of transpositions. In other words, signature
is well-defined.
Proof. Suppose that π = s1 · · · sk and π = t1 · · · tj . We want to show that k − j is even. In
other words, if k is odd, so is j and if k is even, so is j.
Now note that each transposition can be written as a product of an odd number of
adjacent transpositions:
(5 1) = (5 4)(4 3)(3 2)(1 2)(2 3)(3 4)(4 5)
so write s1 · · · sk = s̃1 · · · s̃k 0 and t1 · · · tj = t̃1 · · · t̃j 0 , where the s̃i ’s and t̃i ’s are adjacent transpositions, k − k 0 is even, and j − j 0 is even.
We have 0 = Ninv (id) = Ninv (t̃j 0 · · · t̃1 π) (each transposition is its own inverse), which is Ninv (π) plus an even number if j 0 is even or an odd number if j 0 is odd. This means that Ninv (π) − j 0 is even. The same argument works for k, so Ninv (π) − k 0 is even. Now
j − k = j − j 0 + j 0 − Ninv (π) + Ninv (π) − k 0 + k 0 − k
is even.
Corollary 4.1.8. For any two permutations π and π 0 ,
sgn(ππ 0 ) = sgn(π)sgn(π 0 ) .
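The proof of Theorem 4.1.7 shows that the parity of Ninv (π) agrees with the parity of any decomposition of π into transpositions, so sgn(π) = (−1)^Ninv(π). The short Python sketch below (all function names are ours) computes the sign this way and checks the corollary on random permutations.

import random

def n_inv(p):
    """Number of inversion pairs of p, given in one-line notation as a tuple (p(1), ..., p(n))."""
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])

def sgn(p):
    return (-1) ** n_inv(p)

def compose(p, q):
    """(p q)(i) = p(q(i)); the tuples are 0-indexed but store the values 1..n."""
    return tuple(p[q[i] - 1] for i in range(len(p)))

n = 6
for _ in range(100):
    p = tuple(random.sample(range(1, n + 1), n))
    q = tuple(random.sample(range(1, n + 1), n))
    assert sgn(compose(p, q)) == sgn(p) * sgn(q)   # Corollary 4.1.8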
4.2 Determinants: existence and uniqueness
Given n vectors ~v1 , . . . , ~vn in Rn we want to define something like the volume of the parallelepiped spanned by these vectors. What properties would we expect of a volume?
1. vol(e1 , . . . , en ) = 1.
2. If two of the vectors ~vi are equal the volume should be zero.
3. For each c > 0, vol(c~v1 , ~v2 , . . . , ~vn ) = c vol(~v1 , . . . , ~vn ). Same in other arguments.
4. For each ~v10 , vol(~v1 +~v10 , ~v2 , . . . , ~vn ) = vol(~v1 , . . . , ~vn )+vol(~v10 , ~v2 , . . . , ~vn ). Same in other
arguments.
Using the motivating example of the volume, we define a multilinear function as follows.
Definition 4.2.1. If V is an n-dimensional vector space over F then define
V n = {(v1 , . . . , vn ) : vi ∈ V for all i = 1, . . . , n} .
A function f : V n → F is called multilinear if for each i and vectors v1 , . . . , vi−1 , vi+1 , . . . , vn ∈
V , the function fi : V → F is linear, where
fi (v) = f (v1 , . . . , vi−1 , v, vi+1 , . . . , vn ) .
A multilinear function f is called alternating if f (v1 , . . . , vn ) = 0 whenever vi = vj for some
i 6= j.
Proposition 4.2.2. Let f : V n → F be a multilinear function. If F does not have characteristic two then f is alternating if and only if for all v1 , . . . , vn and i < j,
f (v1 , . . . , vi , . . . , vj , . . . , vn ) = −f (v1 , . . . , vj , . . . , vi , . . . , vn ) .
Proof. Suppose that f is alternating. Then
0 = f (v1 , . . . , vi + vj , . . . , vi + vj , . . . , vn )
= f (v1 , . . . , vi , . . . , vi + vj , . . . , vn ) + f (v1 , . . . , vj , . . . , vi + vj , . . . , vn )
= f (v1 , . . . , vi , . . . , vj , . . . , vn ) + f (v1 , . . . , vj , . . . , vi , . . . , vn ) .
Conversely suppose that f has the property above. Then if vi = vj ,
f (v1 , . . . , vi , . . . , vj , . . . , vn ) = −f (v1 , . . . , vj , . . . , vi , . . . , vn )
= −f (v1 , . . . , vi , . . . , vj , . . . , vn ) .
Since F does not have characteristic two, this forces f (v1 , . . . , vi , . . . , vj , . . . , vn ) = 0.
Corollary 4.2.3. Let f : V n → F be an n-linear alternating function. Then for each π ∈ Sn ,
f (vπ(1) , . . . , vπ(n) ) = sgn(π) f (v1 , . . . , vn ) .
Proof. Write π = σ1 · · · σk where the σi ’s are transpositions and (−1)k = sgn(π). Then
f (vπ(1) , . . . , vπ(n) ) = −f (vσ1 ···σk−1 (1) , . . . , vσ1 ···σk−1 (n) ) .
Applying this k − 1 more times gives the corollary.
Theorem 4.2.4. Let {v1 , . . . , vn } be a basis for V . There is at most one multilinear alternating function f : V n → F such that f (v1 , . . . , vn ) = 1.
Proof. Let u1 , . . . , un ∈ V and write
uk = a1,k v1 + · · · + an,k vn .
Then
f (u1 , . . . , un ) = Σ_{i1 =1}^{n} ai1 ,1 f (vi1 , u2 , . . . , un ) = Σ_{i1 ,...,in =1}^{n} ai1 ,1 · · · ain ,n f (vi1 , . . . , vin ) .
However whenever two ij ’s are equal, we get zero, so we can restrict the sum to all distinct
ij ’s. So this is
Σ_{i1 ,...,in distinct} ai1 ,1 · · · ain ,n f (vi1 , . . . , vin ) .
This can now be written as
Σ_{π∈Sn} aπ(1),1 · · · aπ(n),n f (vπ(1) , . . . , vπ(n) ) = Σ_{π∈Sn} sgn(π) aπ(1),1 · · · aπ(n),n f (v1 , . . . , vn ) = Σ_{π∈Sn} sgn(π) aπ(1),1 · · · aπ(n),n .
We now find that if dim V = n and f : V n → F is an n-linear alternating function
with f (v1 , . . . , vn ) = 1 for some fixed basis {v1 , . . . , vn } then we have a specific form for f .
Writing vectors u1 , . . . , un as
uk = a1,k v1 + · · · + an,k vn ,
then we have
f (u1 , . . . , un ) = Σ_{π∈Sn} sgn(π) aπ(1),1 · · · aπ(n),n .
Now we would like to show that the formula above indeed does define an n-linear alternating
function with the required property.
1. Alternating. Suppose that ui = uj for some i < j. We will then split the set of
permutations into two classes. Let A = {π ∈ Sn : π(i) < π(j)}. Letting σi,j = (ij),
write π̂, for π ∈ A, for the permutation πσi,j ; note that π 7→ π̂ is a bijection from A onto Sn \ A. Then
f (u1 , . . . , un ) = Σ_{π∈A} sgn(π) aπ(1),1 · · · aπ(n),n + Σ_{τ ∈Sn \A} sgn(τ ) aτ (1),1 · · · aτ (n),n
= Σ_{π∈A} sgn(π) aπ(1),1 · · · aπ(n),n + Σ_{π∈A} sgn(πσi,j ) aπσi,j (1),1 · · · aπσi,j (n),n .
However, πσi,j (k) = π(k) when k ≠ i, j, and ui = uj , so this equals
Σ_{π∈A} [sgn(π) + sgn(πσi,j )] aπ(1),1 · · · aπ(n),n = 0 .
2. 1 at the basis. Note that for ui = vi for all i we have
ai,j = 1 if i = j and ai,j = 0 if i ≠ j.
Therefore (only π = id contributes a nonzero term) the value of f is
Σ_{π∈Sn} sgn(π) aπ(1),1 · · · aπ(n),n = sgn(id) a1,1 · · · an,n = 1 .
3. Multilinear. Write
uk = a1,k v1 + · · · + an,k vn for k = 1, . . . , n
and write u = b1 v1 + · · · + bn vn . Now for c ∈ F ,
cu + u1 = (cb1 + a1,1 )v1 + · · · + (cbn + an,1 )vn .
f (cu + u1 , u2 , . . . , un ) = Σ_{π∈Sn} sgn(π) [cbπ(1) + aπ(1),1 ] aπ(2),2 · · · aπ(n),n
= c Σ_{π∈Sn} sgn(π) bπ(1) aπ(2),2 · · · aπ(n),n + Σ_{π∈Sn} sgn(π) aπ(1),1 aπ(2),2 · · · aπ(n),n
= c f (u, u2 , . . . , un ) + f (u1 , . . . , un ) .
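The formula just derived translates directly into (very inefficient, n!-term) code. Here is a minimal Python sketch, with our own helper names leibniz_det and sign, compared against numpy.linalg.det on a random matrix.

import itertools
import numpy as np

def sign(perm):
    """Sign of a permutation of 0-based indices, computed from the inversion count."""
    inv = sum(1 for a in range(len(perm)) for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inv % 2 else 1

def leibniz_det(A):
    """det A = sum over pi of sgn(pi) * A[pi(1),1] * ... * A[pi(n),n]."""
    n = A.shape[0]
    total = 0.0
    for pi in itertools.permutations(range(n)):
        term = sign(pi)
        for col in range(n):
            term *= A[pi[col], col]
        total += term
    return total

A = np.random.rand(5, 5)
print(leibniz_det(A), np.linalg.det(A))   # the two values agree up to rounding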
4.3 Properties of determinants
Theorem 4.3.1. Let f : V n → F be a multilinear alternating function and let {v1 , . . . , vn }
be a basis with f (v1 , . . . , vn ) 6= 0. Then {u1 , . . . , un } is linearly dependent if and only if
f (u1 , . . . , un ) = 0.
Proof. One direction is on the homework: suppose that f (u1 , . . . , un ) = 0 but that {u1 , . . . , un }
is linearly independent. Then write
vk = a1,k u1 + · · · + an,k un .
By the same computation as above,
f (v1 , . . . , vn ) = f (u1 , . . . , un ) Σ_{π∈Sn} sgn(π) aπ(1),1 · · · aπ(n),n = 0 ,
which is a contradiction. Therefore {u1 , . . . , un } is linearly dependent.
Conversely, if {u1 , . . . , un } is linearly dependent then for n ≥ 2 we can find some uj and
scalars ai for i 6= j such that
uj = Σ_{i≠j} ai ui .
Now we have
f (u1 , . . . , uj , . . . , un ) = Σ_{i≠j} ai f (u1 , . . . , ui , . . . , un ) = 0 ,
since each term on the right has a repeated vector (ui appears in both the i-th and j-th slots).
Definition 4.3.2. On (F n )n we define det : (F n )n → F as the unique alternating n-linear function with det(e1 , . . . , en ) = 1. If A is an n × n matrix then we define
det A = det(~a1 , . . . , ~an ) ,
where ~ak is the k-th column of A.
Corollary 4.3.3. An n × n matrix A over F is invertible if and only if det A 6= 0.
Proof. We have det A 6= 0 if and only if the columns of A are linearly independent. This is
true if and only if A is invertible.
We start with the multiplicative property of determinants.
Theorem 4.3.4. Let A and B be n × n matrices over a field F . Then
det (AB) = det A · det B .
Proof. If det B = 0 then B is not invertible, so it cannot have full column rank. Therefore
neither can AB (by a homework problem). This means det (AB) = 0 and we are done.
Otherwise det B 6= 0. Define a function f : Mn×n (F ) → F by
f (A) = det (AB) / det B .
We claim that f is n-linear, alternating and assigns the value 1 to the standard basis (that
is, the identity matrix).
1. f is alternating. If A has two equal columns then its column rank is not full.
Therefore neither can be the column rank of AB and we have det (AB) = 0. This
implies f (A) = 0.
2. f (I) = 1. This is clear since IB = B.
3. f is n-linear. This follows because det is.
But there is exactly one function satisfying the above. We find f (A) = det A and we are
done.
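A quick numerical sanity check of the multiplicative property, using numpy (this is only an illustration, not part of the proof):

import numpy as np

A = np.random.rand(4, 4)
B = np.random.rand(4, 4)

# det(AB) = det(A) det(B), up to floating-point rounding
print(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))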
For the rest of the lecture we will give further properties of determinants.
• det A = det At . This is on homework.
• det is alternating and n-linear as a function of rows.
• If A is a block matrix of the form A = [ B C ; 0 D ], with B and D square, then det A = det B · det D. This is also on homework.
• The determinant is unchanged if we add a multiple of one column (or row) to another.
To show this, write a matrix A as a collection of columns (~a1 , . . . , ~an ). For example if
we add a multiple of column 1 to column 2 we get
det(~a1 , c~a1 + ~a2 , ~a3 , . . . , ~an ) = c det(~a1 , ~a1 , ~a3 , . . . , ~an ) + det(~a1 , ~a2 , ~a3 , . . . , ~an )
= det(~a1 , ~a2 , ~a3 , . . . , ~an ) ,
since the first determinant on the right has a repeated column and so vanishes.
• det cA = cn det A.
We will now discuss cofactor expansion.
Definition 4.3.5. Let A ∈ Mn×n (F ). For i, j ∈ [1, n] define the (i, j)-minor of A (written
A(i|j)) to be the (n − 1) × (n − 1) matrix obtained from A by removing the i-th row and the
j-th column.
Theorem 4.3.6 (Laplace expansion). Let A ∈ Mn×n (F ) for n ≥ 2 and fix j ∈ [1, n]. We
have
det A = Σ_{i=1}^{n} (−1)i+j Ai,j det A(i|j) .
Proof. Let us begin by taking j = 1. Now write the column ~a1 as
~a1 = A1,1 e1 + A2,1 e2 + · · · + An,1 en .
Then we get
det A = det(~a1 , . . . , ~an ) = Σ_{i=1}^{n} Ai,1 det(ei , ~a2 , . . . , ~an ) .    (2)
We now consider the term det(ei , ~a2 , . . . , ~an ). This is the determinant of the matrix whose first column is ei (all entries 0 except for a 1 in the i-th spot) and whose remaining columns are ~a2 , . . . , ~an . We can now swap the i-th row to the top using i − 1 adjacent transpositions (1 2) · · · (i − 1 i). We are left with (−1)i−1 times the determinant of the matrix
[ 1 Ai,2 · · · Ai,n ]
[ 0 A1,2 · · · A1,n ]
[ · · · ]
[ 0 Ai−1,2 · · · Ai−1,n ]
[ 0 Ai+1,2 · · · Ai+1,n ]
[ · · · ]
[ 0 An,2 · · · An,n ] .
This is a block matrix of the form
[ 1 B ; 0 A(i|1) ] , where B is the row (Ai,2 , . . . , Ai,n ).
By the remarks earlier, the determinant of this block matrix is equal to det A(i|1) × 1. Plugging this into formula (2), we get
det A = Σ_{i=1}^{n} (−1)i−1 Ai,1 det A(i|1) = Σ_{i=1}^{n} (−1)i+1 Ai,1 det A(i|1) ,
which is the claimed formula for j = 1.
If j ≠ 1 then we perform j − 1 adjacent column switches to bring the j-th column to the first position. This gives us a new matrix Ã. For this matrix the formula (expansion along the first column) holds. Compensating for the switches,
det A = (−1)j−1 det Ã = (−1)j−1 Σ_{i=1}^{n} (−1)i−1 Ãi,1 det Ã(i|1) = Σ_{i=1}^{n} (−1)i+j Ai,j det A(i|j) .
Check the last equality (note that Ãi,1 = Ai,j and Ã(i|1) = A(i|j)). This completes the proof.
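The expansion along the first column used in the proof can be coded recursively. The following minimal Python sketch (cofactor_det is our own name) does exactly that and compares the result with numpy.linalg.det.

import numpy as np

def cofactor_det(A):
    """det A by Laplace expansion along the first column (the case j = 1 above)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for i in range(n):
        minor = np.delete(np.delete(A, i, axis=0), 0, axis=1)   # the minor A(i|1)
        total += (-1) ** i * A[i, 0] * cofactor_det(minor)      # (-1)**i matches (-1)^(i+1) for 1-based i
    return total

A = np.random.rand(5, 5)
print(cofactor_det(A), np.linalg.det(A))   # should agree up to rounding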
We have discussed determinants of matrices (and of n vectors in F n ). We will now define
for transformations.
Definition 4.3.7. Let V be a finite dimensional vector space over a field F . If T : V → V
is linear, we define det T as det[T ]ββ for any basis β of V .
• Note that det T does not depend on the choice of basis. Indeed, if β 0 is another basis,
det[T ]β 0 β 0 = det( [I]β 0 β [T ]ββ [I]ββ 0 ) = det[T ]ββ ,
since the two change-of-basis matrices are inverse to each other, so their determinants cancel.
• If T and U are linear transformations from V to V then det T U = det T det U .
• det cT = c^n det T , where n = dim V .
• det T = 0 if and only if T is non-invertible.
4.4 Exercises
Notation:
1. n = {1, . . . , n} is the finite set of natural numbers between 1 and n;
2. Sn is the set of all bijective maps n → n;
3. For a sequence k1 , . . . , kt of distinct elements of n, we denote by (k1 k2 . . . kt ) the element
σ of Sn which is defined by: σ(i) = ks if i = ks−1 for some 1 < s ≤ t, σ(i) = k1 if i = kt , and σ(i) = i if i ∈/ {k1 , . . . , kt }.
Elements of this form are called cycles (or t-cycles). Two cycles (k1 . . . kt ) and (l1 . . . ls )
are called disjoint if the sets {k1 , . . . , kt } and {l1 , . . . , ls } are disjoint.
4. Let σ ∈ Sn . A subset {k1 , . . . , kt } ⊂ n is called an orbit of σ if the following conditions
hold
• For any j ∈ N there exists an 1 ≤ i ≤ t such that σ j (k1 ) = ki .
• For any 1 ≤ i ≤ t there exists a j ∈ N such that ki = σ j (k1 ).
Here σ j is the product of j-copies of σ.
5. Let V and W be two vector spaces over an arbitrary field F , and k ∈ N. Recall that
a k-linear map f : V k → W is called
• alternating, if f (v1 , . . . , vk ) = 0 whenever the vectors (v1 , . . . , vk ) are not distinct;
• skew-symmetric, if f (v1 , . . . , vk ) = −f (vτ (1) , . . . , vτ (k) ) for any transposition τ ∈
Sk ;
• symmetric, if f (v1 , . . . , vk ) = f (vτ (1) , . . . , vτ (k) ) for any transposition τ ∈ Sk .
6. If k and n are positive integers such that k ≤ n, the binomial coefficient "n choose k" is defined as n!/(k!(n − k)!).
Note that this number is equal to the number of distinct subsets of size k of a finite
set of size n.
7. Given an F-vector space V , denote by Altk (V ) the set of alternating k-linear forms
(functions) on V ; that is, the set of alternating k-linear maps V k → F.
Exercises:
1. Prove that composition of maps defines a group law on Sn . Show that this group is
abelian only if n ≤ 2.
2. Let f : Sn → Z be a function which is multiplicative, i.e. f (στ ) = f (σ)f (τ ). Show that
f must be one of the following three functions: f (σ) = 0, f (σ) = 1, f (σ) = sgn(σ).
3. (From Dummit-Foote) List explicitly the 24 permutations of degree 4 and state which
are odd and which are even.
4. Let k1 , . . . , kt be a sequence of distinct elements of n. Show that sgn((k1 . . . kt )) =
(−1)t−1 .
5. Let π ∈ Sn be the element (k1 . . . kt ) from the previous exercise, and let σ ∈ Sn be any
element. Find a formula for the element σπσ −1 .
6. Let σ = (k1 . . . kt ) and τ = (l1 . . . ls ) be disjoint cycles. Show that then στ = τ σ. One
says that σ and τ commute.
7. Let σ ∈ Sn . Show that σ can be written as a product of disjoint (and hence, by the
previous exercise, commuting) cycles.
Hint: Consider the orbits of σ.
8. If G is a group and S ⊆ G is a subset, define hSi to be the intersection of all subgroups
of G that contain S. (This is the subgroup of G generated by S.)
(a) Show that if S is a subset of G then hSi is a subgroup of G.
(b) Let S be a subset of G and define
S̄ = S ∪ {s−1 : s ∈ S} .
Show that
hSi = {a1 · · · ak : k ≥ 1 and ai ∈ S̄ for all i} .
9. Prove that Sn = h(12), (12 · · · n)i.
Hint: Use exercise 5.
10. Let V be a finite-dimensional vector space over some field F , W an arbitrary vector
space over F , and k > dim(V ). Show that every alternating k-linear function V k → W
is identically zero. Give an example (choose F , V , W , and k as you wish, as long as
k > dim(V )) of a skew-symmetric k-linear function V k → W which is not identically
zero.
11. (From Hoffman-Kunze) Let F be a field and f : F2 × F2 → F be a 2-linear alternating function. Show that
f ((a, b), (c, d)) = (ad − bc) f (e1 , e2 ).
Find an analogous formula for F3 . Deduce from this the formula for the determinant
of a 2 × 2 and a 3 × 3 matrix.
12. Let V be an n-dimensional vector space over a field F. Suppose that f : V n → F is an
n-linear alternating function such that f (v1 , . . . , vn ) 6= 0 for some basis {v1 , . . . , vn } of
V . Show that f (u1 , . . . , un ) = 0 implies that {u1 , . . . , un } is linearly dependent.
13. Let V and W be vector spaces over a field F , and f : V → W a linear map.
(a) For η ∈ Altk (W ) let f ∗ η be the function on V k defined by
[f ∗ η](v1 , . . . , vk ) = η(f (v1 ), . . . , f (vk )).
Show that f ∗ η ∈ Altk (V ).
(b) Show that in this way we obtain a linear map f ∗ : Altk (W ) → Altk (V ).
(c) Show that, given a third vector space X over F and a linear map g : W → X,
one has (g ◦ f )∗ = f ∗ ◦ g ∗ .
(d) Show that if f is an isomorphism, then so is f ∗ .
14. For n ≥ 2, we call M ∈ Mn×n (F) a block upper-triangular matrix if there exists k with
1 ≤ k ≤ n − 1 and matrices A ∈ Mk×k (F), B ∈ Mk×(n−k) (F) and C ∈ M(n−k)×(n−k) (F)
such that M has the form
M = [ A B ; 0 C ] .
That is, the entries of M are given by
Mi,j = Ai,j if 1 ≤ i ≤ k and 1 ≤ j ≤ k,
Mi,j = Bi,j−k if 1 ≤ i ≤ k and k < j ≤ n,
Mi,j = 0 if k < i ≤ n and 1 ≤ j ≤ k,
Mi,j = Ci−k,j−k if k < i ≤ n and k < j ≤ n.
We will show in this exercise that
det M = det A · det C .    (3)
(a) Show that if det C = 0 then formula (3) holds.
(b) Suppose that det C ≠ 0 and define a function  7→ φ(Â) for  ∈ Mk×k (F) by
φ(Â) = [det C]−1 det [ Â B ; 0 C ] .
That is, φ(Â) is a scalar multiple of the determinant of the block upper-triangular matrix we get when we replace A by  and keep B and C fixed.
i. Show that φ is k-linear as a function of the columns of Â.
ii. Show that φ is alternating and satisfies φ(Ik ) = 1, where Ik is the k × k
identity matrix.
iii. Conclude that formula (3) holds when det C 6= 0.
15. Suppose that A ∈ Mn×n (F) is upper-triangular; that is, ai,j = 0 when 1 ≤ j < i ≤ n.
Show that det A = a1,1 a2,2 · · · an,n .
16. Let A ∈ Mn×n (F) be such that Ak = 0 for some k ≥ 1. Show that det A = 0.
17. Let a0 , a1 , . . . , an be distinct complex numbers. Write Mn (a0 , . . . , an ) for the (n + 1) × (n + 1) matrix
[ 1 a0 a0^2 · · · a0^n ]
[ 1 a1 a1^2 · · · a1^n ]
[ · · · ]
[ 1 an an^2 · · · an^n ] .
The goal of this exercise is to show that
det Mn (a0 , . . . , an ) = Π_{0≤i<j≤n} (aj − ai ) .    (4)
We will argue by induction on n.
(a) Show that if n = 2 then formula (4) holds.
(b) Now suppose that k ≥ 3 and that formula (4) holds for all 2 ≤ n ≤ k. Show that
it holds for n = k + 1 by completing the following outline.
i. Define the function f : C → C by f (z) = det Mn (z, a1 , . . . , an ). Show that f
is a polynomial of degree at most n.
ii. Find all the zeros of f .
iii. Show that the coefficient of z n is (−1)n det Mn−1 (a1 , . . . , an ).
iv. Show that formula (4) holds for n = k + 1, completing the proof.
18. Show that if A ∈ Mn×n (F) then det A = det At , where At is the transpose of A.
19. Let V be an n-dimensional F-vector space and k ≤ n. The purpose of this problem is
to show that
dim(Altk (V )) = n!/(k!(n − k)!) (the binomial coefficient "n choose k"),
by completing the following steps:
(a) Let W be a subspace of V and let B = (v1 , . . . , vn ) be a basis for V such that
(v1 , . . . , vk ) is a basis for W . Show that
pW,B : V → W, vi 7→ vi for i ≤ k and vi 7→ 0 for i > k,
specifies a linear map with the property that pW,B ◦ pW,B = pW,B . Such a map
(that is, a T such that T ◦ T = T ) is called a projection.
(b) With W and B as in the previous part, let dW be a non-zero element of Altk (W ).
Show that [pW,B ]∗ dW is a non-zero element of Altk (V ). (Recall this notation from
exercise 13.)
(c) Let B = (v1 , . . . , vn ) be a basis of V . Let S1 , . . . , St be subsets of n = {1, . . . , n}.
Assume that each Si has exactly k elements and no two Si ’s are the same. Let
Wi = Span({vj : j ∈ Si }).
For i = 1, . . . , t, let dWi ∈ Altk (Wi ) be non-zero. Show that the collection
{[pWi ,B ]∗ dWi : 1 ≤ i ≤ t} of elements of Altk (V ) is linearly independent.
(d) Show that the above collection is also generating, by taking an arbitrary η ∈
Altk (V ), an arbitrary collection u1 , . . . , uk of vectors in V , expressing each ui as a
linear combination of (v1 , . . . , vk ) and plugging those linear combinations into η.
In doing this, it may be helpful (although certainly not necessary) to assume
that dWi is the unique element of Altk (Wi ) with dWi (w1 , . . . , wk ) = 1, where (w1 , . . . , wk ) is the basis {vj : j ∈ Si } of Wi .
20. Let A ∈ Mn×n (F ) for some field F . Recall that if 1 ≤ i, j ≤ n then the (i, j)-th minor
of A, written A(i|j), is the (n − 1) × (n − 1) matrix obtained by removing the i-th row
and j-th column from A. Define the cofactor
Ci,j = (−1)i+j det A(i|j) .
Note that the Laplace expansion for the determinant can be written
det A = Σ_{i=1}^{n} Ai,j Ci,j .
(a) Show that if 1 ≤ j, k ≤ n with j ≠ k then
Σ_{i=1}^{n} Ai,k Ci,j = 0 .
(b) Define the classical adjoint of A, written adj A, by
(adj A)i,j = Cj,i .
Show that (adj A)A = (det A)I.
(c) Show that A(adj A) = (det A)I and deduce that if A is invertible then
A−1 = (det A)−1 adj A .
Hint: begin by applying the result of the previous part to At .
(d) Use the formula in the last part to find the inverses



1 2 3
1 2 4
 1 0 0
 1 3 9 , 
 0 1 1
1 4 16
6 0 1
of the following matrices:

4
0 
 .
1 
1
21. Consider a system of equations in n variables with coefficients from a field F . We
can write this as AX = Y for an n × n matrix A, an n × 1 matrix X (with entries
x1 , . . . , xn ) and an n × 1 matrix Y (with entries y1 , . . . , yn ). Given the matrices A and
Y we would like to solve for X.
(a) Show that
(det A) xj = Σ_{i=1}^{n} (−1)i+j yi det A(i|j) .
(b) Show that if det A 6= 0 then we have
xj = (det A)−1 det Bj ,
where Bj is an n × n matrix obtained from A by replacing the j-th column of A
by Y . This is known as Cramer’s rule.
(c) Solve the following systems of equations using Cramer’s rule.
2x − y + z = 3
2y − z = 1
y − x = 1
and
2x − y + z − 2t = −5
2x + 2y − 3z + t = −1
−x + y − z = −1
4x − 3y + 2z − 3t = −8
22. Find the determinants of the following matrices. In the first example, the entries are from R and in the second they are from Z3 .
[ 1 4 5 7 ]      [ 1 1 0 0 0 ]
[ 0 0 2 3 ]      [ 1 1 1 0 0 ]
[ 1 4 −1 7 ]     [ 0 1 1 1 0 ]
[ 2 8 10 14 ]    [ 0 0 1 1 1 ]
                 [ 0 0 0 1 1 ]
5 Eigenvalues
5.1 Definitions and the characteristic polynomial
The simplest matrix is λI for some λ ∈ F . These act just like the field F . What is the
second simplest? A diagonal matrix; that is, a matrix D that satisfies Di,j = 0 if i ≠ j.
Definition 5.1.1. Let V be a finite dimensional vector space over F . A linear transformation T : V → V is called diagonalizable if there exists a basis β such that [T ]ββ is diagonal.
Proposition 5.1.2. T is diagonalizable if and only if there exists a basis {v1 , . . . , vn } of V
and scalars λ1 , . . . , λn such that
T (vi ) = λi vi for all i .
Proof. Suppose that T is diagonalizable. Then there is a basis β = {v1 , . . . , vn } such that [T ]ββ is diagonal. Then, writing D = [T ]ββ ,
[T vk ]β = D[vk ]β = D ek = Dk,k ek ,
so T (vk ) = Dk,k vk . Now we can choose λk = Dk,k .
If the second condition holds then we see that [T ]ββ is diagonal with entries Di,j = 0 if
i 6= j and Di,i = λi .
This motivates the following definition.
Definition 5.1.3. If T : V → V is linear then we call a nonzero vector v an eigenvector of
T if there exists λ ∈ F such that T (v) = λv. In this case we call λ the eigenvalue for v.
Theorem 5.1.4. If dim V < ∞ and T : V → V is linear then the following are equivalent.
1. λ is an eigenvalue of T (for some eigenvector).
2. T − λI is not invertible.
3. det(T − λI) = 0.
Proof. If (1) holds then the eigenvector v is a non-zero vector in the nullspace of T −λI. Thus
T − λI is not invertible. We already know that (2) and (3) are equivalent. If T − λI is not
invertible then there is a non-zero vector in its nullspace. This vector is an eigenvector.
Definition 5.1.5. If T : V → V is linear and dim V = n then we define the characteristic
polynomial c : F → F by
c(λ) = det(T − λI) .
• Note that c(λ) does not depend on the choice of basis.
• We can write it in terms of a matrix: if A = [T ]ββ for any basis β, then
c(λ) = det(T − λI) = det([T − λI]ββ ) = det(A − λI) .
• Eigenvalues are exactly the roots of c(λ).
Facts about c(x).
1. c is a polynomial of degree n. We can see this by analyzing each term in the definition
of the determinant: set B = A − xI and see
sgn(π) B1,π(1) · · · Bn,π(n) .
Each term Bi,π(i) is a polynomial of degree 0 or 1 in x. So the product has degree at
most n. A sum of such polynomials is a polynomial of degree at most n.
In fact, the only term of degree n is
sgn(id) B1,1 · · · Bn,n = (A1,1 − x) · · · (An,n − x) .
So the coefficient of xn is (−1)n .
2. In the above description of c(x), all terms corresponding to non-identity permutations
π have degree at most n − 2. Therefore the degree n − 1 term comes from (A1,1 −
x) · · · (An,n − x) as well. It is
(−1)n−1 xn−1 [A1,1 + · · · + An,n ] = (−1)n−1 xn−1 T r A .
3. The constant term is c(0) = det A, so
c(x) = (−1)n xn + (−1)n−1 T r A xn−1 + · · · + det A .
For F = C (or any field over which c(x) splits), we can always write c(x) = (−1)n (x − λ1 ) · · · (x − λn ), where λ1 , . . . , λn are the roots of c (the eigenvalues, listed with multiplicity). Setting x = 0, the constant term of this product is λ1 · · · λn . Comparing with c(0) = det A, we find
det A = Π_{i=1}^{n} λi in C.
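These coefficient facts are easy to check numerically. In the numpy sketch below, note that numpy.poly(A) returns the coefficients (highest degree first) of the monic polynomial det(xI − A) = (−1)^n c(x), which accounts for the signs in the comparisons.

import numpy as np

A = np.random.rand(4, 4)
n = A.shape[0]

coeffs = np.poly(A)                    # monic characteristic polynomial det(xI - A)
eigenvalues = np.linalg.eigvals(A)

print(coeffs[1], -np.trace(A))                     # coefficient of x^(n-1) is -Tr A
print(coeffs[-1], (-1) ** n * np.linalg.det(A))    # constant term is (-1)^n det A
print(np.prod(eigenvalues), np.linalg.det(A))      # det A = product of the eigenvalues
# (the product may carry a negligible imaginary part from complex-conjugate eigenvalue pairs)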
Theorem 5.1.6. If dim V = n and c(λ) has n distinct roots then T is diagonalizable. The
converse is not true.
Proof. Write the eigenvalues as λ1 , . . . , λn . For each λi we have an eigenvector vi . We claim
that the vi ’s are linearly independent. This follows from the lemma:
Lemma 5.1.7. If λ1 , . . . , λk are k distinct eigenvalues associated to eigenvectors v1 , . . . , vk
then {v1 , . . . , vk } is linearly independent.
Proof. Suppose that
a1 v1 + · · · + ak vk = ~0 .
Take T of both sides
a1 λ1 v1 + · · · + ak λk vk = ~0 .
Keep doing this k − 1 times so we get the system of equations
a1 v1 + · · · + ak vk = ~0
a1 λ1 v1 + · · · + ak λk vk = ~0
· · ·
a1 λ1^{k−1} v1 + · · · + ak λk^{k−1} vk = ~0 .
Write each vi as [vi ]β for some basis β. This is then equivalent to the matrix equation
( a1 [v1 ]β  a2 [v2 ]β  · · ·  ak [vk ]β ) M = ( ~0  ~0  · · ·  ~0 ) ,
where M is the k × k matrix whose i-th row is (1, λi , λi^2 , . . . , λi^{k−1} ).
Here the left matrix is n × k and has j-th column equal to the column vector aj [vj ]β . But M has nonzero determinant when the λi ’s are distinct: its determinant is the Vandermonde determinant Π_{1≤i<j≤k} (λj − λi ). Therefore it is invertible. Multiplying both sides on the right by its inverse, we find ai vi = ~0 for all i. Since vi ≠ ~0, it follows that ai = 0 for all i.
5.2 Eigenspaces and the main diagonalizability theorem
Definition 5.2.1. If λ ∈ F we define the eigenspace
Eλ = N (T − λI) = {v ∈ V : T (v) = λv} .
Note that Eλ is a subspace even if λ is not an eigenvalue. Furthermore,
Eλ 6= {0} if and only if λ is an eigenvalue of T
and
Eλ is T -invariant for all λ ∈ F .
What this means is that if v ∈ Eλ then so is T (v):
(T − λI)(T (v)) = (T − λI)(λv) = λ(T − λI)v = ~0 .
Definition 5.2.2. If W1 , . . . , Wk are subspaces of a vector space V then we write
W1 ⊕ · · · ⊕ Wk
for the sum space W1 + · · · + Wk and say the sum is direct if
Wj ∩ [W1 + · · · + Wj−1 ] = {0} for all j = 2, . . . , k .
We also say the subspaces are independent.
Theorem 5.2.3. If λ1 , . . . , λk are distinct eigenvalues of T then
Eλ1 + · · · + Eλk = Eλ1 ⊕ · · · ⊕ Eλk .
Furthermore
dim ( Σ_{i=1}^{k} Eλi ) = Σ_{i=1}^{k} dim Eλi .
Proof. The theorem follows directly from the following lemma, whose proof is in the homework.
Lemma 5.2.4. Let W1 , . . . , Wk be subspaces of V . The following are equivalent.
1.
W1 + · · · + Wk = W1 ⊕ · · · ⊕ Wk .
2. Whenever w1 + · · · + wk = ~0 for wi ∈ Wi for all i, we have wi = ~0 for all i.
3. Whenever βi is a basis for Wi for all i, the βi ’s are disjoint and β := ∪_{i=1}^{k} βi is a basis for Σ_{i=1}^{k} Wi .
So take w1 + · · · + wk = ~0 with wi ∈ Eλi for all i. Note that each nonzero wi is an eigenvector for the eigenvalue λi . Remove all the zero ones. If we are left with any nonzero ones then, by Lemma 5.1.7, they must be linearly independent, which contradicts the fact that they sum to ~0. So they are all zero.
For the second claim take bases βi of Eλi . By the lemma, ∪_{i=1}^{k} βi is a basis for Σ_{i=1}^{k} Eλi . This implies the claim.
Theorem 5.2.5 (Main diagonalizability theorem). Let T : V → V be linear and dim V <
∞. The following are equivalent.
1. T is diagonalizable.
2. c(x) can be written as (−1)n (x − λ1 )n1 · · · (x − λk )nk , where ni = dim Eλi for all i.
3. V = Eλ1 ⊕ · · · ⊕ Eλk , where λ1 , . . . , λk are the distinct eigenvalues of T .
Proof. Suppose first that T is diagonalizable. Then there exists a basis β of eigenvectors for
T ; that is, for which [T ]ββ is diagonal. Clearly each diagonal element is an eigenvalue. For
each i, call ni the number of entries on the diagonal that are equal to λi . Then [T − λi I]ββ
has ni number of zeros on the diagonal. All other diagonal entries must be non-zero, so the
nullspace has dimension ni . In other words, ni = dim Eλi .
Suppose that condition 2 holds. Since c is a polynomial of degree n we must have
dim Eλ1 + · · · + dim Eλk = dim V .
However since the λi ’s are distinct the previous theorem gives that
dim ( Σ_{i=1}^{k} Eλi ) = dim V .
In other words, V = Σ_{i=1}^{k} Eλi . The previous theorem implies that the sum is direct and the claim follows.
Suppose that condition 3 holds. Then take βi a basis for Eλi for all i. Then β = ∪ki=1 βi
is a basis for V . We claim that [T ]ββ is diagonal. This is because each vector in β is an
eigenvector. This proves 1 and completes the proof.
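As a concrete illustration (not part of the proof), the following numpy sketch diagonalizes a small matrix with distinct eigenvalues: the columns of P are eigenvectors, D is diagonal, and A = P D P^{-1} is recovered.

import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])            # eigenvalues 1 and 3, so A is diagonalizable

eigenvalues, P = np.linalg.eig(A)   # columns of P are eigenvectors
D = np.diag(eigenvalues)

print(np.allclose(A @ P, P @ D))                  # T(v_i) = lambda_i v_i, column by column
print(np.allclose(A, P @ D @ np.linalg.inv(P)))   # A = P D P^{-1}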
5.3 Exercises
1. Let V be an F-vector space and let W1 , . . . , Wk be subspaces of V . Recall the definition of the sum Σ_{i=1}^{k} Wi . It is the subspace of V given by
{w1 + · · · + wk : wi ∈ Wi }.
Recall further that this sum is called direct, and written as ⊕_{i=1}^{k} Wi , if and only if for all 1 < i ≤ k we have
Wi ∩ ( Σ_{j=1}^{i−1} Wj ) = {0}.
Show that the following statements are equivalent:
(a) The sum Σ_{i=1}^{k} Wi is direct.
(b) For any collection {w1 , . . . , wk } with wi ∈ Wi for all i, we have Σ_{i=1}^{k} wi = 0 ⇒ wi = 0 for all i.
(c) If, for each i, βi is a basis of Wi , then the βi ’s are disjoint and their union β = ⊔_{i=1}^{k} βi is a basis for the subspace Σ_{i=1}^{k} Wi .
(d) For any v ∈ Σ_{i=1}^{k} Wi there exist unique vectors w1 , . . . , wk such that wi ∈ Wi for all i and v = Σ_{i=1}^{k} wi .
2. Let V be an F-vector space. Recall that a linear map p ∈ L(V, V ) is called a projection
if p ◦ p = p.
(a) Show that if p is a projection, then so is q = idV − p, and we have p ◦ q = q ◦ p = 0.
L
(b) Let W1 , . . . , Wk be subspaces of V and assume that V = ⊕_{i=1}^{k} Wi . For 1 ≤ t ≤ k, show that there is a unique element pt ∈ L(V, Wt ) such that for any choice of vectors w1 , . . . , wk with wj ∈ Wj for all j,
pt (wj ) = wj if j = t, and pt (wj ) = 0 if j ≠ t.
(c) Show that each pt defined in the previous part is a projection. Show furthermore that Σ_{t=1}^{k} pt = idV and that for t ≠ s we have pt ◦ ps = 0.
(d) Show conversely that if p1 , . . . , pk ∈ L(V, V ) are projections with the properties (i) Σ_{i=1}^{k} pi = idV and (ii) pi ◦ pj = 0 for all i ≠ j, and if we put Wi = R(pi ), then V = ⊕_{i=1}^{k} Wi .
3. Let V be an F-vector space.
(a) If U ⊂ V is a subspace, W is another F-vector space, and f ∈ L(V, W ), define
f |U : U → W by
f |U (u) = f (u)
∀u ∈ U.
Show that the map f 7→ f |U is a linear map L(V, W ) → L(U, W ). It is called the
restriction map.
(b) Let f ∈ L(V, V ) and let U ⊂ V be an f -invariant subspace (that is, a subspace U
with the property that f (u) ∈ U whenever u ∈ U ). Observe that f |U ∈ L(U, U ).
If W ⊂ V is another f -invariant subspace and V = U ⊕ W , show that
N (f ) = N (f |U )⊕N (f |W ),
R(f ) = R(f |U )⊕R(f |W ),
det(f ) = det(f |U ) det(f |W ).
(c) Let f, g ∈ L(V, V ) be two commuting endomorphisms, i.e. we have f ◦ g = g ◦ f .
Show that N (g) and R(g) are f -invariant subspaces of V .
4. Consider the matrix A := [ 1 1 ; 0 1 ].
Show that there does not exist an invertible matrix P ∈ M2×2 (C) such that P AP −1 is
diagonal.
5. Let V be a finite-dimensional F-vector space and f ∈ L(V, V ). Observe that for each
natural number k we have N (f k ) ⊂ N (f k+1 ).
(a) Show that there exists a natural number k so that N (f k ) = N (f k+1 ).
(b) Show further that for all l ≥ k one has N (f l ) = N (f k ).
6. Let V be a finite-dimensional F-vector space and f ∈ L(V, V ).
(a) Let U ⊂ V be an f -invariant subspace and β = (v1 , . . . , vn ) a basis of V such that
β 0 = (v1 , . . . , vk ) is a basis for U . Show that
[f ]ββ = [ A B ; 0 C ]
with A = [f |U ]β 0 β 0 ∈ Mk×k (F), B ∈ Mk×(n−k) (F), and C ∈ M(n−k)×(n−k) (F).
(b) Let U, W ⊂ V be f -invariant subspaces with V = U ⊕ W . Let β 0 be a basis for
U , β 00 a basis for W , and β = β 0 t β 00 . Show that
[f ]ββ = [ A 0 ; 0 C ]
with A = [f |U ]β 0 β 0 and C = [f |W ]β 00 β 00 .
7. Let V be a finite-dimensional F-vector space. Recall that an element p ∈ L(V, V ) with
p2 = p is called a projection. On the other hand, an element i ∈ L(V, V ) with i2 = idV
is called an involution.
(a) Assume that char(F) 6= 2. Show that the maps
{involutions on V } → {projections on V }, i 7→ (1/2)(idV + i), and
{projections on V } → {involutions on V }, p 7→ 2p − idV ,
are mutually inverse bijections.
(b) Show that if p ∈ L(V, V ) is a projection, then the only eigenvalues of p are 0
and 1. Furthermore, V = E0 (p) ⊕ E1 (p) (the eigenspaces for p). That is, p is
diagonalizable.
(c) Show that if i ∈ L(V, V ) is an involution, then the only eigenvalues of i are +1
and −1. Furthermore, V = E+1 (i) ⊕ E−1 (i). That is, i is diagonalizable.
Observe that projections and involutions are examples of diagonalizable endomorphisms which do not have dim(V )-many distinct eigenvalues.
8. In this problem we will show that every endomorphism of a vector space over an
algebraically closed field can be represented as an upper triangular matrix. This is a
simpler result than (and is implied by) the Jordan Canonical form, which we will cover
in class soon.
We will argue by (strong) induction on the dimension of V . Clearly the result holds for
dim V = 1. So suppose that for some k ≥ 1 whenever dim W ≤ k and U : W → W
is linear, we can find a basis of W with respect to which the matrix of U is upper-triangular. Further, let V be a vector space of dimension k + 1 over F and T : V → V
be linear.
(a) Let λ be an eigenvalue of T . Show that the dimension of R := R(T − λI) is
strictly less than dim V and that R is T -invariant.
(b) Apply the inductive hypothesis to T |R to find a basis of R with respect to which
T |R is upper-triangular. Extend this to a basis for V and complete the argument.
9. Let A ∈ Mn×n (F) be upper-triangular. Show that the eigenvalues of A are the diagonal
entries of A.
10. Let A be the matrix
A = [ 6 −3 −2 ; 4 −1 −2 ; 10 −5 −3 ] .
(a) Is A diagonalizable over R? If so, find a basis for R3 of eigenvectors of A.
(b) Is A diagonalizable over C? If so, find a basis for C3 of eigenvectors of A.
11. For which values of a, b, c ∈ R is the following matrix diagonalizable over R?
[ 0 0 0 0 ]
[ a 0 0 0 ]
[ 0 b 0 0 ]
[ 0 0 c 0 ]
12. Let V be a finite dimensional vector space over a field F and let T : V → V be linear.
Suppose that every subspace of V is T -invariant. What can you say about T ?
13. Let V be a finite dimensional vector space over a field F and let T, U : V → V be
linear transformations.
(a) Prove that if I − T U is invertible then I − U T is invertible and
(I − U T )−1 = I + U (I − T U )−1 T .
(b) Use the previous part to show that T U and U T have the same eigenvalues.
14. Let A be the matrix
A = [ −1 1 1 ; 1 −1 1 ; 1 1 −1 ] .
Find An for all n ≥ 1.
Hint: first diagonalize A.
6 Jordan form
6.1 Generalized eigenspaces
It is of course not always true that T is diagonalizable. There can be a couple of reasons for
that. First it may be that the roots of the characteristic polynomial do not lie in the field.
For instance, the matrix [ 0 1 ; −1 0 ] (over R) has characteristic polynomial x2 + 1. Second, even if the eigenvalues are in the field, we may still be unable to diagonalize. On the homework you will see that the matrix [ 1 1 ; 0 1 ]
is not diagonalizable over C (although its eigenvalues are certainly in C). So we resort to
looking for a block diagonal matrix.
Suppose that we can show that
V = W1 ⊕ · · · ⊕ Wk
for some subspaces Wi . Then we can choose a basis for V made up of bases for the Wi ’s. If
the Wi ’s are T -invariant then the matrix will be in block form.
Definition 6.1.1. Let T : V → V be linear. A subspace W of V is T -invariant if T (w) ∈ W
whenever w ∈ W .
• Each eigenspace is T -invariant. If w ∈ Eλ then
(T − λI)T (w) = λ(T − λI)w = ~0 .
• Therefore the eigenspace decomposition is a T -invariant direct sum.
To find a general T -invariant direct sum we define generalized eigenspaces.
Definition 6.1.2. Let T : V → V be linear. If λ ∈ F then the set
Êλ = {v ∈ V : (T − λI)k v = ~0 for some k}
is called the generalized eigenspace for λ.
• This is a subspace. If v, w ∈ Êλ and c ∈ F then there exists kv and kw such that
(T − λI)kv v = ~0 = (T − λI)kw w .
Choosing k = max{kv , kw } we find
(T − λI)k (cv + w) = ~0 .
• Each generalized eigenspace is T -invariant. To see this, suppose that (T − λI)k v = ~0.
Then because T commutes with (T − λI)k we have
(T − λI)k T v = T (T − λI)k v = ~0 .
To make sure the characteristic polynomial has roots we will take F to be an algebraically
closed field. That is, each polynomial with coefficients in F has a root in F .
6.2 Primary decomposition theorem
Theorem 6.2.1 (Primary decomposition theorem). Let F be algebraically closed and V a
finite-dimensional vector space over F . If T : V → V is linear and λ1 , . . . , λk are the distinct
eigenvalues of T then
V = Êλ1 ⊕ · · · ⊕ Êλk .
Proof. We follow several steps.
Step 1. c(x) has a root. Therefore T has an eigenvalue. Let λ be this value.
Step 2. Consider the generalized eigenspace Êλ . We first show that there exists k such that
Êλ = N (T − λI)k .
Let v1 , . . . , vm be a basis for Êλ . Then for each i there is a ki such that (T − λI)ki vi = ~0.
Choose k = max{k1 , . . . , km }. Then (T − λI)k kills all the basis vectors and thus kills
everything in Êλ . Therefore
Êλ ⊆ N (T − λI)k .
The other direction is obvious.
Step 3. We now claim that
V = R(T − λI)k ⊕ N (T − λI)k = R(T − λI)k ⊕ Êλ .
First we show that the intersection is only the zero vector. Suppose that v is in the intersection. Then (T − λI)k v = ~0 and there exists w such that (T − λI)k w = v. Then
(T − λI)2k w = ~0, so w ∈ Êλ = N (T − λI)k . Therefore
v = (T − λI)k w = ~0 .
By the rank-nullity theorem,
dim R(T − λI)k + dim N (T − λI)k = dim V .
By the 2-subspace (dimension) theorem,
V = R(T − λI)k + N (T − λI)k ,
and so the sum is direct.
Step 4. Now apply the above with λ = λ1 (the first eigenvalue), write k1 for the corresponding exponent k, and set W1 = R(T − λ1 I)k1 so that
V = Êλ1 ⊕ W1 .
These spaces are T -invariant. To show this, note that we already know Êλ1 is T -invariant. For W1 ,
suppose that w ∈ W1 . Then there exists u such that
w = (T − λ1 I)k1 u .
So
(T − λ1 I)k1 (T − λ1 I)u = (T − λ1 I)w .
Therefore (T − λ1 I)w ∈ W1 and thus W1 is (T − λ1 I)-invariant. If w ∈ W1 then
T w = (T − λ1 I)w + λ1 Iw ∈ W1 ,
so W1 is T -invariant.
Step 5. We now argue by induction and do the base case. Let e(T ) be the number of
distinct eigenvalues of T . Note e(T ) ≥ 1.
We first assume e(T ) = 1. In this case we write λ1 for the eigenvalue and see
V = Êλ1 ⊕ R(T − λ1 I)k1 = Êλ1 ⊕ W1 .
We claim that the second space is only the zero vector. Otherwise we restrict T to it to get
an operator TW1 . Then TW1 has an eigenvalue λ. So there is a nonzero vector w ∈ W1 such
that
T w = TW1 w = λw ,
so w is an eigenvector for T . But T has only one eigenvalue so λ = λ1 . This means that
w ∈ Êλ1 and thus
w ∈ Êλ1 ∩ W1 = {~0} .
This is a contradiction, so
V = Êλ1 ⊕ {~0} = Êλ1 ,
and we are done.
Step 6. Suppose the theorem is true for any transformation U with e(U ) = k (k ≥ 1). Then
suppose that e(T ) = k + 1. Let λ1 , . . . , λk+1 be the distinct eigenvalues of T and decompose
as before:
V = Êλ1 ⊕ R(T − λ1 I)k1 = Êλ1 ⊕ W1 .
Now restrict T to W1 and call it TW1 .
Claim 6.2.2. TW1 has eigenvalues λ2 , . . . , λk+1 with the generalized eigenspaces from T : they
are Êλ2 , . . . , Êλk+1 .
Once we show this we will be done: we will have e(TW1 ) = k and so we can apply the
theorem:
W1 = Êλ2 ⊕ · · · ⊕ Êλk+1 ,
so
V = Êλ1 ⊕ · · · ⊕ Êλk+1 .
Proof. We first show that each of Êλ2 , . . . , Êλk+1 is in W1 . For this we want a lemma and a
definition:
Definition 6.2.3. If p(x) is a polynomial with coefficients in F and T : V → V is linear,
where V is a vector space over F , we define the transformation
P (T ) = an T n + · · · + a1 T + a0 I ,
where p(x) = an xn + · · · + a1 x + a0 .
Lemma 6.2.4. Suppose that p(x) and q(x) are two polynomials with coefficients in F . If
they have no common root then there exist polynomials a(x) and b(x) such that
a(x)p(x) + b(x)q(x) = 1 .
Proof. Homework
Now choose v ∈ Êλj for some j = 2, . . . , k + 1. By the decomposition we can write
v = u + w where u ∈ Êλ1 and w ∈ W1 . We can now write
Êλ1 = N (T − λ1 I)k1 and Êλj = N (T − λj I)kj
and see
~0 = (T − λj I)kj v = (T − λj I)kj u + (T − λj I)kj w .
However Êλ1 and W1 are T -invariant so they are (T − λj I)kj -invariant. This is a sum of
vectors equal to zero, where one is in Êλ1 and the other is in W1 . Because these spaces direct
sum to V we know both vectors are zero. Therefore
u satisfies (T − λj I)kj u = ~0 = (T − λ1 I)k1 u .
In other words, p(T )u = q(T )u = ~0, where p(x) = (x − λj )kj and q(x) = (x − λ1 )k1 . Since
these polynomials have no root in common we can find a(x) and b(x) as in the lemma.
Finally,
u = (a(T )p(T ) + b(T )q(T ))u = ~0 .
This implies that v = w ∈ W1 and therefore all of Êλ2 , . . . , Êλk+1 are in W1 .
Because of the above statement, we now know that all of λ2 , . . . , λk+1 are eigenvalues of
TW1 . Furthermore if λW1 is an eigenvalue of TW1 then it is an eigenvalue of T . It cannot
be λ1 because then any eigenvector for TW1 with eigenvalue λW1 would have to be in Êλ1
but also in W1 so it would be zero, a contradiction. Therefore the eigenvalues of TW1 are
precisely λ2 , . . . , λk+1 .
Let Ê^{W1}_{λj} denote the generalized eigenspace for TW1 corresponding to λj . We want
Ê^{W1}_{λj} = Êλj for j = 2, . . . , k + 1 .
If w ∈ Ê^{W1}_{λj} then there exists k such that (TW1 − λj I)k w = ~0. But on W1 , (TW1 − λj I)k is the same as (T − λj I)k , so
(T − λj I)k w = (TW1 − λj I)k w = ~0 ,
so that w ∈ Êλj ; this gives Ê^{W1}_{λj} ⊆ Êλj . To show the other inclusion, take w ∈ Êλj . Since Êλj ⊆ W1 , this implies that w ∈ W1 . Now since there exists k such that (T − λj I)k w = ~0, we find
(TW1 − λj I)k w = (T − λj I)k w = ~0 ,
and we are done. We find
V = Êλ1 ⊕ · · · ⊕ Êλk+1 .
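Numerically, these dimensions can be read off from ranks: dim Eλ = n − rank(A − λI), and since every generalized eigenvector is killed by at most n = dim V applications of A − λI, also dim Êλ = n − rank((A − λI)^n). A small numpy sketch illustrating this for a matrix with a single eigenvalue:

import numpy as np

# A has the single eigenvalue 2 but is not diagonalizable: one 2x2 Jordan block and one 1x1 block
A = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 0., 2.]])
lam, n = 2.0, A.shape[0]

dim_eigenspace  = n - np.linalg.matrix_rank(A - lam * np.eye(n))                              # dim E_2
dim_generalized = n - np.linalg.matrix_rank(np.linalg.matrix_power(A - lam * np.eye(n), n))   # dim of the generalized eigenspace

print(dim_eigenspace, dim_generalized)   # 2 and 3: the generalized eigenspace is all of V, E_2 is a proper subspace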
6.3 Nilpotent operators
Now we look at the operator T on the generalized eigenspaces. We need only restrict T to
each eigenspace to determine the action on all of V . So for this purpose we will assume that
T has only one generalized eigenspace: there exists λ ∈ F such that
V = Êλ .
In other words, for each v ∈ V there exists k such that (T − λI)k v = ~0. Recall we can then argue that there exists k∗ such that
V = N ((T − λI)k∗ ) ,
or, if U = T − λI, U k∗ = 0.
or, if U = T − λI, U k = 0.
Definition 6.3.1. Let U : V → V be linear. We say that U is nilpotent if there exists k
such that
Uk = 0 .
The smallest k for which U k = 0 is called the degree of U .
The point of this section will be to give a structure theorem for nilpotent operators. It
can be seen as a special case of Jordan form when all eigenvalues are zero. To prove this
structure theorem, we will look at the nullspaces of powers of U . Note that if k = deg(U ),
then N (U k ) = V but N (U k−1 ) 6= V . We get then an increasing chain of subspaces
N0 ⊆ N1 ⊆ · · · ⊆ Nk , where Nj = N (U j ) .
• If v ∈ Nj \ Nj−1 then U (v) ∈ Nj−1 \ Nj−2 .
Proof. v has the property that U j v = ~0 but U j−1 v 6= ~0. Therefore U j−1 (U v) = ~0 but
U j−2 (U v) 6= ~0.
Definition 6.3.2. If W1 ⊆ W2 are subspaces of V then we say that v1 , . . . , vm ∈ W2 are
linearly independent mod W1 if
a1 v1 + · · · + am vm ∈ W1 implies ai = 0 for all i .
Lemma 6.3.3. Suppose that dim W2 − dim W1 = l and v1 , . . . , vm ∈ W2 \ W1 are linearly
independent mod W1 . Then
1. m ≤ l and
2. we can choose l − m vectors vm+1 , . . . , vl in W2 \ W1 such that {v1 , . . . , vl } are linearly
independent mod W1 .
Proof. It suffices to show that we can add just one vector. Let w1 , . . . , wt be a basis for W1 .
Then define
X = Span({w1 , . . . , wt , v1 , . . . , vm }) .
Then this set is linearly independent. Indeed, if
a1 w1 + · · · + at wt + b1 v1 + · · · + bm vm = ~0 ,
then b1 v1 + · · · + bm vm ∈ W1 , so all bi ’s are zero. Then
a1 w1 + · · · + at wt = ~0 ,
so all ai ’s are zero. Thus
t + m = dim X ≤ dim W2 = t + l
or m ≤ l.
For the second part, if m = l, we are done. Otherwise dim X < dim W2 , so there exists vm+1 ∈ W2 \ X. To show linear independence mod W1 , suppose that
a1 v1 + · · · + am vm + am+1 vm+1 = w ∈ W1 .
If am+1 = 0 then all the ai ’s are zero (since v1 , . . . , vm are linearly independent mod W1 ) and we are done. Otherwise we can solve for vm+1 and see it is in X (note W1 ⊆ X). This is a contradiction.
Lemma 6.3.4. Suppose that for some m, v1 , . . . , vp ∈ Nm \ Nm−1 are linearly independent
mod Nm−1 . Then U (v1 ), . . . , U (vp ) are linearly independent mod Nm−2 .
Proof. Suppose that
a1 U (v1 ) + · · · + ap U (vp ) = ~n ∈ Nm−2 .
Then
U (a1 v1 + · · · + ap vp ) = ~n .
Now
U m−1 (a1 v1 + · · · + ap vp ) = U m−1 (~n) = ~0 .
Therefore a1 v1 + · · · + ap vp ∈ Nm−1 . But these are linearly independent mod Nm−1 so we
find that ai = 0 for all i.
Now we do the following
1. Write dm = dim Nm − dim Nm−1 . Starting at the top, choose
βk = {v_1^k , . . . , v_{d_k}^k } ⊆ Nk \ Nk−1
which is linearly independent mod Nk−1 .
2. Move down one level: write v_i^{k−1} = U (v_i^k ). Then {v_1^{k−1} , . . . , v_{d_k}^{k−1} } is linearly independent mod Nk−2 , so dk ≤ dk−1 . By the lemma we can extend this to
βk−1 = {v_1^{k−1} , . . . , v_{d_k}^{k−1} , v_{d_k+1}^{k−1} , . . . , v_{d_{k−1}}^{k−1} } ,
a maximal linearly independent set mod Nk−2 in Nk−1 \ Nk−2 .
3. Repeat.
Note that dk + dk−1 + · · · + d1 = dim V . We claim that if βi is the set at level i then
β = β1 ∪ · · · ∪ βk
is a basis for V . It suffices to show linear independence. For this, we use the following fact.
• If W1 ⊆ · · · ⊆ Wk = V is a nested sequence of subspaces with βi ⊆ Wi \ Wi−1 linearly
independent mod Wi−1 then β = ∪i βi is linearly independent. (check).
We have shown the following result.
Definition 6.3.5. A chain of length l for U is a set {v, U (v), U 2 (v), . . . , U l−1 (v)} of non-zero
vectors such that U l (v) = ~0.
Theorem 6.3.6. If U : V → V is linear and nilpotent (dim V < ∞) then there exists a
basis of V consisting entirely of chains for U .
Let U : V → V be nilpotent. If C = {U l−1 v, U l−2 v, . . . , U (v), v} is a chain, note that its span Span(C) is U -invariant. Since V has a basis of chains, say C1 , . . . , Cm , we can write
V = Span(C1 ) ⊕ · · · ⊕ Span(Cm ) .
In this situation each chain Ci is a basis for Span(Ci ), so the matrix for U is block diagonal, with one block per chain. With respect to the ordered chain Ci , the restriction of U to Span(Ci ) has the matrix with 1’s on the superdiagonal and 0’s elsewhere:
[ 0 1 0 · · · 0 ]
[ 0 0 1 · · · 0 ]
[ · · · ]
[ 0 0 0 · · · 1 ]
[ 0 0 0 · · · 0 ] .
Theorem 6.3.7 (Uniqueness of nilpotent form). Let U : V → V be linear and nilpotent
with dim V < ∞. Write
li (β) = # of (maximal) chains of length i in β .
Then if β, β 0 are bases of V consisting of chains for U then
li (β) = li (β 0 ) for all i .
Proof. Write Ki (β) for the set of elements v of β such that U i (v) = ~0 but U i−1 (v) 6= ~0 (for
i = 1 we only require U (v) = ~0). Let ˜li (β) be the number of (maximal) chains in β of length
at least i.
We first claim that #Ki (β) = ˜li (β) for all i. To see this note that for each chain C of
length at least i there is a unique element v ∈ C such that v ∈ Ki (β). Conversely, for each
v ∈ Ki (β) there is a unique chain of length at least i containing v.
Let ni be the dimension of N (U i ). We claim that ni equals the number mi (β) of v ∈ β
such that U i (v) = ~0. Indeed, the set of such v’s is linearly independent and in N (U i ) so
ni ≥ mi (β). However all other v’s (dim V − mi (β) of them) are mapped to distinct elements
of β (since β is made up of chains), so dim R(U i ) ≥ dim V − mi (β), so ni ≤ mi (β).
Because N (U i ) contains N (U i−1 ) for all i (here we take N (U 0 ) = {~0}), we have
˜li (β) = #Ki (β) = ni − ni−1 .
Therefore
li (β) = ˜li (β) − ˜li+1 (β) = (ni − ni−1 ) − (ni+1 − ni ) .
The right side does not depend on β and in fact the same argument shows it is equal to
li (β 0 ). This completes the proof.
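The formula li = (ni − ni−1 ) − (ni+1 − ni ) from the proof can be computed mechanically from ranks. Here is a minimal numpy sketch (chain_lengths is our own name) that recovers, for a nilpotent matrix, how many maximal chains of each length every chain basis must contain.

import numpy as np

def chain_lengths(U):
    """For nilpotent U, return {i: number of maximal chains of length i}, using n_i = dim N(U^i)."""
    n = U.shape[0]
    nullity = [0] + [n - np.linalg.matrix_rank(np.linalg.matrix_power(U, i))
                     for i in range(1, n + 2)]
    return {i: (nullity[i] - nullity[i - 1]) - (nullity[i + 1] - nullity[i])
            for i in range(1, n + 1)}

# U is nilpotent with one chain of length 3 and one chain of length 1
U = np.array([[0., 1., 0., 0.],
              [0., 0., 1., 0.],
              [0., 0., 0., 0.],
              [0., 0., 0., 0.]])
print(chain_lengths(U))   # {1: 1, 2: 0, 3: 1, 4: 0}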
6.4 Existence and uniqueness of Jordan form, Cayley-Hamilton
Definition 6.4.1. A Jordan block for λ of size l is the l × l matrix
Jlλ = [ λ 1 0 · · · 0 ]
      [ 0 λ 1 · · · 0 ]
      [ · · · ]
      [ 0 0 0 · · · 1 ]
      [ 0 0 0 · · · λ ] ,
with λ on the diagonal, 1’s on the superdiagonal, and 0’s elsewhere.
Theorem 6.4.2 (Jordan canonical form). Let T : V → V be linear with dim V < ∞ and F algebraically closed. Then there is a basis β of V such that [T ]ββ is block diagonal with Jordan blocks.
Proof. First decompose V = Êλ1 ⊕ · · · ⊕ Êλk . On each Êλi , the operator T − λi I is nilpotent.
Each chain for (T − λi I)|Êλi gives a block in the nilpotent decomposition. Writing T = (T − λi I) + λi I, each such block becomes a Jordan block for λi .
Draw a picture at this point (of sets of chains). We first decompose
V = Êλ1 ⊕ · · · ⊕ Êλk
and then
Êλi = C1i ⊕ · · · ⊕ Cki i ,
where each Cji is the span of a chain {v1 , . . . , vp } of generalized eigenvectors satisfying
T (v1 ) = λi v1 , T (v2 ) = λi v2 + v1 , . . . , T (vp ) = λi vp + vp−1 .
Theorem 6.4.3 (Cayley-Hamilton). Let T : V → V be linear with dim V < ∞ and F
algebraically closed. Then
c(T ) = 0 .
Remark. In fact the theorem holds even if F is not algebraically closed by doing a field
extension.
Lemma 6.4.4. If U : V → V is linear and nilpotent with dim V = n < ∞ then
Un = 0 .
Therefore if T : V → V is linear and v ∈ Êλ then
(T − λI)dim Êλ v = ~0 .
Proof. Let β be a basis of chains for U . Then the length of the longest chain is at most n, so U n kills every element of β, and hence U n = 0.
Lemma 6.4.5. If T : V → V is linear with dim V < ∞ and β is a basis such that [T ]ββ is
in Jordan form then for each eigenvalue λ, let Sλ be the basis vectors corresponding to blocks
for λ. Then
Span(Sλ ) = Êλ for each λ .
Therefore if
c(x) = Π_{i=1}^{k} (λi − x)ni ,
then ni = dim Êλi for each i.
Proof. Write λ1 , . . . , λk for the distinct eigenvalues of T . Let
Wi = Span(Sλi ) .
We may assume that the blocks corresponding to λ1 appear first, λ2 appear second, and so
on. Since [T ]ββ is in block form, this means V is a T -invariant direct sum
W1 ⊕ · · · ⊕ Wk .
However for each i, T − λi I restricted to Wi is in nilpotent form, hence nilpotent. Thus (T − λi I)dim Wi v = ~0 for each v ∈ Sλi . This means
Wi ⊆ Êλi for all i, or dim Wi ≤ dim Êλi .
But V = Êλ1 ⊕ · · · ⊕ Êλk , so Σ_{i=1}^{k} dim Êλi = dim V . This gives that dim Wi = dim Êλi for all i, or Wi = Êλi .
For the second claim, ni is the number of times that λi appears on the diagonal; that is,
the dimension of Span(Sλi ).
Proof of Cayley-Hamilton. We first factor
c(x) = Π_{i=1}^{k} (λi − x)ni ,
where ni is called the algebraic multiplicity of the eigenvalue λi . Let β be a basis such that [T ]ββ is in Jordan form. If v ∈ β is in a block corresponding to λj then v ∈ Êλj and so (T − λj I)dim Êλj v = ~0 (by the previous lemma). Now
c(T )v = ( Π_{i=1}^{k} (λi I − T )ni ) v = ( Π_{i≠j} (λi I − T )ni ) (λj I − T )nj v = ~0 ,
since nj = dim Êλj . As this holds for every vector in the basis β, we conclude that c(T ) = 0.
Finally we have uniqueness of Jordan form.
Theorem 6.4.6. If β and β 0 are bases of V for which T is in Jordan form, then the matrices
are the same up to permutation of blocks.
Proof. First we note that the characteristic polynomial can be read off of the matrices and
is invariant. This gives that the diagonal entries are the same, and the number of vectors
corresponding to each eigenvalue is the same.
We see from the second lemma that if βi and βi0 are the parts of the bases corresponding
to blocks involving λi then
Wi := Span(βi ) = Êλi = Span(βi0 ) =: Wi0 .
Restricting T to Wi and to Wi0 then gives the blocks for λi . But then βi and βi0 are just bases
of Êλi of chains for T − λi I. The number of chains of each length is the same, and this is
the number of blocks of each size.
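Cayley-Hamilton is easy to test numerically: evaluate the characteristic polynomial at the matrix itself. The numpy sketch below uses numpy.poly, which returns the monic polynomial det(xI − A); since this differs from c(x) only by the factor (−1)^n, it also vanishes at A.

import numpy as np

A = np.random.rand(4, 4)
coeffs = np.poly(A)        # coefficients of det(xI - A), highest degree first

# Evaluate the polynomial at the matrix A (Horner's rule with matrix arguments)
C = np.zeros_like(A)
for c in coeffs:
    C = C @ A + c * np.eye(4)

print(np.allclose(C, np.zeros_like(A)))   # True: the characteristic polynomial kills A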
6.5 Exercises
Notation:
1. If F is a field then we write F[X] for the set of polynomials with coefficients in F.
2. If P, Q ∈ F[X] then we say that P divides Q and write P |Q if there exists S ∈ F[X]
such that Q = SP .
3. If P ∈ F[X] then we write deg(P ) for the largest k such that the coefficient of xk
in P is nonzero. We define the degree of the zero polynomial to be −∞.
4. If P ∈ F[X] then we say that P is monic if the coefficient of xn is 1, where n =deg(P ).
5. For a complex number z, we denote the complex conjugate by z̄, i.e. if z = a + ib, with
a, b ∈ R, then z̄ = a − ib.
6. If V is an F -vector space, recall the definition of V × V : it is the F -vector space whose
elements are
V × V = {(v1 , v2 ) : v1 , v2 ∈ V } .
Vector addition is performed as (v1 , v2 ) + (v3 , v4 ) = (v1 + v3 , v2 + v4 ) and for c ∈ F ,
c(v1 , v2 ) is defined as (cv1 , cv2 ).
Exercises
1. (a) Show that for P, Q ∈ F[X], one has deg(P Q) = deg(P ) + deg(Q).
(b) Show that for P, D ∈ F[X] such that D is nonzero, there exist Q, R ∈ F[X] such
that P = QD + R and deg(R) < deg(D).
Hint: Use induction on deg(P ).
(c) Show that, for any λ ∈ F ,
P (λ) = 0 ⇔ (X − λ)|P.
(d) Let P ∈ F[X] be of the form p(X) = a(X − λ1 )n1 · · · (X − λk )nk for some
a, λ1 , . . . , λk ∈ F and natural numbers n1 , . . . , nk . Show that Q ∈ F[X] divides
P if and only if Q(X) = b(X − λ1 )m1 · · · (X − λk )mk for some b ∈ F and natural
numbers mi with mi ≤ ni (we allow mi = 0).
(e) Assume that F is algebraically closed. Show that every P ∈ F[X] is of the
form a(X − λ1 )n1 . . . (X − λk )nk for some a, λ1 , . . . , λk ∈ F and natural numbers
n1 , . . . , nk with n1 + · · · + nk = deg(P ). In this case we call the λi ’s the roots of
P.
2. Let F be a field and suppose that P, Q are nonzero polynomials in F[X]. Define the
subset S of F[X] as follows:
S = {AP + BQ : A, B ∈ F[X]} .
(a) Let D ∈ S be of minimal degree. Show that D divides both P and Q.
Hint: Use part 2 of the previous problem.
(b) Show that if T ∈ F[X] divides both P and Q then T divides D.
(c) Conclude that there exists a unique monic polynomial D ∈ F[X] satisfying the
following conditions
i. D divides P and Q.
ii. If T ∈ F[X] divides both P and Q then T divides D.
(Such a polynomial is called the greatest common divisor (GCD) of P and Q.)
(d) Show that if F is algebraically closed and P and Q are polynomials in F[X] with
no common root then there exist A and B in F[X] such that
AP + BQ = 1 .
3. Let F be any field, V be an F -vector space, f ∈ L(V, V ), and W ⊂ V an f -invariant
subspace.
(a) Let p : V → V /W denote the natural map defined in Homework 8, problem 5.
Show that there exists an element of L(V /W, V /W ), which we will denote by
f |V /W , such that p ◦ f = f |V /W ◦ p. It is customary to expresses this fact using
the following diagram:
          f
    V ---------> V
    |            |
  p |            | p
    v            v
   V/W --------> V/W
        f|V/W
(b) Let β′ be a basis for W and β be a basis for V which contains β′. Show that the
image of β″ := β \ β′ under p is a basis for V /W .
(c) Let β″ be a subset of V with the property that the restriction of p to β″ (which
is a map of sets β″ → V /W ) is injective and its image is a basis for V /W . Show
that β″ is a linearly independent set. Show moreover that if β′ is a basis for W ,
then β := β′ ⊔ β″ is a basis for V .
(d) Let β, β′, and β″ be as above. Show that
[f ]_β^β = [ A  B ]
           [ 0  C ]
with A = [f |W ]_{β′}^{β′} and C = [f |V /W ]_{p(β″)}^{p(β″)} .
4. The minimal polynomial. Let F be any field, V an F -vector space, and f ∈ L(V, V ).
(a) Consider the subset S ⊂ F [X] defined by
S = {P ∈ F [X]|P (f ) = 0}.
Show that S contains a nonzero element.
(b) Let Mf ∈ S be a monic non-zero element of minimal degree. Show that Mf
divides any other element of S. Conclude that Mf as defined is unique. It is
called the minimal polynomial of f .
(c) Show that the roots of Mf are precisely the eigenvalues of f by completing the
following steps.
i. Suppose that r ∈ F is such that Mf (r) = 0. Show that
Mf (X) = Q(X)(X − r)k
for some positive integer k and Q ∈ F[X] such that Q(r) 6= 0. Prove also
that Q(f ) 6= 0.
ii. Show that if r ∈ F satisfies Mf (r) = 0 then f − rI is not invertible and thus
r is an eigenvalue of f .
iii. Conversely, let λ be an eigenvalue of f with eigenvector v. Show that if
P ∈ F[X] then
P (f )v = P (λ)v .
Conclude that λ is a root of Mf .
(d) Assume that F is algebraically closed. For each eigenvalue λ of f , express multλ (Mf )
in terms of the Jordan form of f .
(e) Assume that F is algebraically closed. Show that f is diagonalizable if and only
if multλ (Mf ) = 1 for all eigenvalues λ of f .
(f) Assume that F is algebraically closed. Under which circumstances does Mf
equal the characteristic polynomial of f ?
5. If T : V → V is linear and V is a finite-dimensional F-vector space with F algebraically
closed, we define the algebraic multiplicity of an eigenvalue λ to be a(λ), the dimension
of the generalized eigenspace Êλ . The geometric multiplicity of λ is g(λ), the dimension
of the eigenspace Eλ . Finally, the index of λ is i(λ), the length of the longest chain of
generalized eigenvectors in Êλ .
Suppose that λ is an eigenvalue of T and g = g(λ) and i = i(λ) are given integers.
(a) What is the minimal possible value for a = a(λ)?
(b) What is the maximal possible value for a?
(c) Show that a can take any value between the answers for the above two questions.
(d) What is the smallest dimension n of V for which there exist two linear transformations T and U from V to V with all of the following properties: (i) there
exists λ ∈ F which is the only eigenvalue of each of T and U ; (ii) T and U are not
similar transformations; and (iii) the geometric multiplicity of λ for T equals that
of U , and similarly for the index?
6. Find the Jordan form for each of the following matrices over C. Write the minimal
polynomial and characteristic polynomial for each. To do this, first find the eigenvalues.
Then, for each eigenvalue λ, find the dimensions of the nullspaces of (A − λI)k for
pertinent values of k (where A is the matrix in question). Use this information to
deduce the block forms.
(a) [ −1  0  0 ]      (b) [ 2  3  0 ]      (c) [  5  1  3 ]
    [  1  4 −1 ]          [ 0 −1  0 ]          [  0  2  0 ]
    [ −1  4  0 ]          [ 0 −1  2 ]          [ −6 −1 −4 ]

(d) [ 3  0  0 ]       (e) [  2  3  2 ]     (f) [  4  1  0 ]
    [ 4  2  0 ]           [ −1  4  2 ]         [ −1  2  0 ]
    [ 5  0  2 ]           [  0  1  3 ]         [  1  1  3 ]
7. (a) The characteristic polynomial of the matrix
A = [  7  1  2  2 ]
    [  1  4 −1 −1 ]
    [ −2  1  5 −1 ]
    [  1  1  2  8 ]
is c(x) = (x − 6)^4 . Find an invertible matrix S such that S^{−1} AS is in Jordan
form.
(b) Find all complex matrices in Jordan form with characteristic polynomial
c(x) = (i − x)^3 (2 − x)^2 .
8. Complexification of finite-dimensional real vector spaces. Let V be an R-vector space. Just as we can view R as a subset of C we will be able to view V as a
subset of a C-vector space. This will be useful because C is algebraically closed so we
can, for example, use the theory of Jordan form in VC and bring it back (in a certain
form) to V . We will give two constructions of the complexification; the first is more
elementary and the second is the standard construction you will see in algebra.
We put VC = V × V .
(a) Right now, VC is only an R-vector space. We must define what it means to
multiply vectors by complex scalars. For z ∈ C, z = a + ib with a, b ∈ R, and
v = (vr , vi ) ∈ VC , we define the element zv ∈ VC to be
(avr − bvi , avi + bvr ).
Show that in this way, VC becomes a C-vector space. This is the complexification
of V .
(b) We now show how to view V as a “subset” of VC . Show that the map ι : V → VC
which maps v to (v, 0) is injective and R-linear. (Thus the set ι(V ) is a copy of
V sitting in VC .)
(c) Show that dimC (VC ) = dimR (V ). Conclude that VC is equal to spanC (ι(V )).
Conclude further that if v1 , . . . , vn is an R-basis for V , then ι(v1 ), . . . , ι(vn ) is a
C-basis for VC .
(d) Complex conjugation: We define the complex conjugation map c : VC → VC to
be the map (vr , vi ) 7→ (vr , −vi ). Just as R (sitting inside of C) is invariant under
complex conjugation, so will our copy of V (and its subspaces) be inside of VC .
i. Prove that c² = 1 and ι(V ) = {v ∈ VC | c(v) = v}.
ii. Show that for all z ∈ C and v ∈ VC , we have c(zv) = z̄c(v). Maps with this
property are called anti-linear.
iii. In the next two parts, we classify those subspaces of VC that are invariant
under c. Let W be a subspace of V . Show that the C-subspace of VC spanned
by ι(W ) equals
{(w1 , w2 ) ∈ VC : w1 , w2 ∈ W }
and is invariant under c.
iv. Show conversely that if a subspace W̃ of the C-vector space VC is invariant
under c, then there exists a subspace W ⊂ V such that W̃ = spanC (ι(W )).
v. Last, notice that the previous two parts told us the following: The subspaces
of VC which are invariant under conjugation are precisely those which are
equal to WC for subspaces W of the R-vector space V . Show that moreover,
in that situation, the restriction of the complex conjugation map c : VC → VC
to WC is equal to the complex conjugation map defined for WC (the latter
map is defined intrinsically for WC , i.e. without viewing it as a subspace of
VC ).
9. Let V be a finite dimensional R-vector space. For this exercise we use the notation of
the previous one.
(a) Let W be another finite dimensional R-vector space, and let f ∈ L(V, W ). Show
that
fC ((v, w)) := (f (v), f (w))
defines an element fC ∈ L(VC , WC ).
(b) Show that for v ∈ VC , we have fC (c(v)) = c(fC (v)). Show conversely that if
f̃ ∈ L(VC , WC ) has the property that f̃(c(v)) = c(f̃(v)) for all v ∈ VC , then f̃ = fC for some
f ∈ L(V, W ).
10. In this problem we will establish the real Jordan form. Let V be a vector space over
R of dimension n < ∞. Let T : V → V be linear and TC its complexification.
(a) If λ ∈ C is an eigenvalue of TC , and Êλ is the corresponding generalized eigenspace,
show that
c(Êλ ) = Êλ̄ .
(b) Show that the non-real eigenvalues of TC come in conjugate pairs. In other words, show that
we can list the distinct eigenvalues of TC as
λ1 , . . . , λr , σ1 , σ2 , . . . , σ2m ,
where for each j = 1, . . . , r, λj = λ̄j and for each i = 1, . . . , m, σ2i−1 = σ̄2i .
(c) Because C is algebraically closed, the proof of Jordan form shows that
VC = Êλ1 ⊕ · · · ⊕ Êλr ⊕ Êσ1 ⊕ · · · ⊕ Êσ2m .
Using the previous two points, show that for j = 1, . . . , r and i = 1, . . . , m, the
subspaces of VC
Êλj
and
Êσ2i−1 ⊕ Êσ2i
are c-invariant.
(d) Deduce from the results of problem 6, homework 10 that there exist subspaces
X1 , . . . , Xr and Y1 , . . . , Ym of V such that for each j = 1, . . . , r and i = 1, . . . , m,
Êλj = SpanC (ι(Xj )) and Êσ2i−1 ⊕ Êσ2i = SpanC (ι(Yi )) .
Show that
V = X1 ⊕ · · · ⊕ Xr ⊕ Y1 ⊕ · · · ⊕ Ym .
(e) Prove that for each j = 1, . . . , r, the transformation T − λj I restricted to Xj is
nilpotent and thus we can find a basis βj for Xj consisting entirely of chains for
T − λj I.
(f) For each i = 1, . . . , m, let
β_i = {(v_1^i , w_1^i ), . . . , (v_{n_i}^i , w_{n_i}^i )}
be a basis of Ê_{σ_{2i−1}} consisting of chains for TC − σ_{2i−1} I. Prove that
β̂_i = {v_1^i , w_1^i , . . . , v_{n_i}^i , w_{n_i}^i }
is a basis for Yi . Describe the form of the matrix representation of T restricted
to Yi relative to β̂_i .
(g) Gathering the previous parts, state and prove a version of Jordan form for linear
transformations over finite-dimensional real vector spaces. Your version should
be of the form “If T : V → V is linear then there exists a basis β of V such that
[T ]ββ has the form . . . ”
7 Bilinear forms
7.1 Definitions
We now switch gears from Jordan form.
Definition 7.1.1. If V is a vector space over F , a function f : V × V → F is called a
bilinear form if for fixed v ∈ V , f (v, w) is linear in w and for fixed w ∈ V , f (v, w) is linear
in v.
Bilinear forms have matrix representations similar to those for linear transformations.
Choose a basis β = {v1 , . . . , vn } for V and write
v = a1 v1 + · · · + an vn , w = b1 v1 + · · · + bn vn .
Now
f (v, w) = Σ_{i=1}^n ai f (vi , w) = Σ_{i=1}^n Σ_{j=1}^n ai bj f (vi , vj ) .
Define an n × n matrix A by A_{i,j} = f (vi , vj ). Then this is
Σ_{i=1}^n ai ( Σ_{j=1}^n bj A_{i,j} ) = Σ_{i=1}^n ai (A · ~b)_i = (~a)^t A · ~b = [v]_β^t [f ]_β^β [w]_β .
We have proved:
Theorem 7.1.2. If dim V < ∞ and f : V × V → F is a bilinear form there exists a unique
matrix [f ]ββ such that for all v, w ∈ V ,
f (v, w) = [v]tβ [f ]ββ [w]β .
Furthermore the map f 7→ [f ]ββ is an isomorphism from Bil(V, F ) to Mn×n (F ).
Proof. We showed existence. To prove uniqueness, suppose that A is another such matrix.
Then
Ai,j = eti Aej = [vi ]tβ A[vj ]β = f (vi , vj ) .
If β′ is another basis then
([v]_{β′})^t ([I]_{β′}^β)^t [f ]_β^β [I]_{β′}^β [w]_{β′} = ([I]_{β′}^β [v]_{β′})^t [f ]_β^β ([I]_{β′}^β [w]_{β′}) = ([v]_β)^t [f ]_β^β [w]_β = f (v, w) .
Therefore
[f ]_{β′}^{β′} = ([I]_{β′}^β)^t [f ]_β^β [I]_{β′}^β .
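As a small worked illustration of this change of basis formula (the form and the bases here are chosen only for this example): let V = F², let β = {e1 , e2 } be the standard basis, and let f have matrix
[f ]_β^β = [ 1 2 ]
           [ 3 4 ] ,
so f ((a1 , a2 ), (b1 , b2 )) = a1 b1 + 2a1 b2 + 3a2 b1 + 4a2 b2 . If β′ = {(1, 0), (1, 1)} then [I]_{β′}^β has columns (1, 0) and (1, 1), and
[f ]_{β′}^{β′} = ([I]_{β′}^β)^t [f ]_β^β [I]_{β′}^β = [ 1  3 ]
                                                     [ 4 10 ] .
One can check directly, for example, that f ((1, 1), (1, 1)) = 1 + 2 + 3 + 4 = 10, the lower-right entry.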
Note that for fixed v ∈ V the map Lf (v) : V → F given by Lf (v)(w) = f (v, w) is a
linear functional. So f gives a map in L(V, V ∗ ).
Theorem 7.1.3. Denote by Bil(V, F ) the set of bilinear forms on V . The map ΦL :
Bil(V, F ) → L(V, V ∗ ) given by
ΦL (f ) = Lf
is an isomorphism.
Proof. If f, g ∈ Bil(V, F ) and c ∈ F then
(ΦL (cf + g)(v)) (w) = (Lcf +g (v)) (w) = (cf + g)(v, w) = cf (v, w) + g(v, w)
= cLf (v)(w) + Lg (v)(w) = (cLf (v) + Lg (v))(w)
= (cΦL (f )(v) + ΦL (g)(v))(w) .
Thus ΦL (cf + g)(v) = cΦL (f )(v) + ΦL (g)(v). This is the same as (cΦL (f ) + ΦL (g))(v).
Therefore ΦL (cf + g) = cΦL (f ) + ΦL (g). Thus ΦL is linear.
Now Bil(V, F ) has dimension n², because the map from the last theorem is an isomorphism onto Mn×n (F ). The space L(V, V ∗ ) also has dimension n². Since ΦL is a linear map between spaces of the same finite dimension, we only need to show it is one-to-one or onto. To show one-to-one, suppose that ΦL (f ) = 0. Then for all v, Lf (v) = 0. In other words, for all v and w ∈ V , f (v, w) = 0. This means f = 0.
Remark. We can also define Rf (w) by Rf (w)(v) = f (v, w). Then the corresponding map
ΦR is an isomorphism.
You will prove the following fact in homework. If β is a basis for V and β ∗ is the dual
basis, then for each f ∈ Bil(V, F ),
[Rf ]ββ ∗ = [f ]ββ .
Then we have
[Lf ]_β^{β∗} = ([f ]_β^β)^t .
To see this, set g ∈ Bil(V, F ) to be g(v, w) = f (w, v). Then for each v, w ∈ V ,
([w]_β)^t [f ]_β^β [v]_β = f (w, v) = g(v, w) .
Taking the transpose of both sides,
([v]_β)^t ([f ]_β^β)^t [w]_β = g(v, w) ,
so [g]_β^β = ([f ]_β^β)^t . But Lf = Rg , so
([f ]_β^β)^t = [Rg ]_β^{β∗} = [Lf ]_β^{β∗} .
Definition 7.1.4. If f ∈ Bil(V, F ) then we define the rank of f to be the rank of Rf .
• By the above remark, the rank equals the rank of either of the matrices
[f ]ββ or [Lf ]ββ ∗ .
Therefore the rank of [f ]ββ does not depend on the choice of basis.
• For f ∈ Bil(V, F ), define
N (f ) = {v ∈ V : f (v, w) = 0 for all w ∈ V } .
This is just
{v ∈ V : Lf (v) = 0} = N (Lf ) .
But Lf is a map from V to V ∗ , so we have
rank(f ) = rank(Lf ) = dim V − dim N (f ) .
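For instance (the matrix here is chosen only to make the computation concrete), let f ∈ Bil(F², F) have matrix [f ]_β^β = [ 1 1 ; 1 1 ] in a basis β = {v1 , v2 }. For v = a v1 + b v2 , one has f (v, w) = 0 for all w exactly when a + b = 0, so N (f ) = Span{v1 − v2 } and rank(f ) = dim V − dim N (f ) = 1, which is indeed the rank of the matrix.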
7.2 Symmetric bilinear forms
Definition 7.2.1. A bilinear form f ∈ Bil(V, F ) is called symmetric if f (v, w) = f (w, v)
for all v, w ∈ V . f is called skew-symmetric if f (v, v) = 0 for all v ∈ V .
• The matrix for a symmetric bilinear form is symmetric and the matrix for a skew-symmetric bilinear form is anti-symmetric.
• Furthermore, each symmetric matrix A gives a symmetric bilinear form:
f (v, w) = ([v]β )t A · [w]β .
Similarly for skew-symmetric matrices.
Lemma 7.2.2. If f ∈ Bil(V, F ) is symmetric and char(F ) 6= 2 then f (v, w) = 0 for all
v, w ∈ V if and only if f (v, v) = 0 for all v ∈ V .
Proof. One direction is clear. For the other, suppose that f (v, v) = 0 for all v ∈ V .
f (v + w, v + w) = f (v, v) + 2f (v, w) + f (w, w)
f (v − w, v − w) = f (v, v) − 2f (v, w) + f (w, w) .
Therefore
0 = f (v + w, v + w) − f (v − w, v − w) = 4f (v, w) .
Here 4f (v, w) means f (v, w) + f (v, w) + f (v, w) + f (v, w). If char(F ) ≠ 2 then 4 = 2 · 2 ≠ 0 in F , so this implies
f (v, w) = 0.
Remark. If char(F ) = 2 then the above lemma is false. Take F = Z2 with V = F 2 and f
with matrix
[ 0 1 ]
[ 1 0 ] .
Check that this has f (v, v) = 0 for all v but f (v, w) is clearly not 0 for all v, w ∈ V .
Definition 7.2.3. A basis β = {v1 , . . . , vn } of V is orthogonal for f ∈ Bil(V, F ) if f (vi , vj ) =
0 whenever i 6= j. It is orthonormal if it is orthogonal and f (vi , vi ) = 1 for all i.
Theorem 7.2.4 (Diagonalization of symmetric bilinear forms). Let f ∈ Bil(V, F ) with
char(F ) 6= 2 and dim V < ∞. If f is symmetric then V has an orthogonal basis for f .
Proof. We argue by induction on n = dim V . If n = 1 it is clear. Let us now suppose that
the statement holds for all k < n for some n > 1 and show that it holds for n. If f (v, v) = 0
for all v then f is identically zero and thus we are done. Otherwise we can find some v 6= 0
such that f (v, v) 6= 0.
Define
W = {w ∈ V : f (v, w) = 0} .
Since this is the nullspace of Lf (v) and Lf (v) is a nonzero element of V ∗ , it follows that W
is n − 1 dimensional. Because f restricted to W is still a symmetric bilinear form, by the induction hypothesis we can find a basis β′ of W such that [f ]_{β′}^{β′} is diagonal. Write β′ = {v1 , . . . , vn−1 } and β = β′ ∪ {v}.
Then we claim β is a basis for V : if
a1 v1 + · · · + an−1 vn−1 + av = ~0 ,
then applying Lf (v) to both sides we find af (v, v) = 0, so a = 0. Linear independence of β′ then gives
that the other ai ’s are zero.
Now it is clear that [f ]ββ is diagonal. For i 6= j which are both < n this follows because β 0
is a basis for which f is diagonal on W . Otherwise one of i, j is n and then the other vector
is in W and so f (vi , vj ) = 0.
• In the basis β, f has a diagonal matrix. This says that for each symmetric matrix A
we can find an invertible matrix S such that
S^t AS is diagonal
(see the worked example after these remarks).
• In fact, if F is a field in which each element has a square root (like C), then we can make
a new basis, replacing each element v of β such that f (v, v) ≠ 0 by v/√f (v, v) and
leaving unchanged all elements such that f (v, v) = 0, to find a basis such that the representation
of f is diagonal, with only 1’s and 0’s on the diagonal. The number of 1’s equals the rank
of f .
• Therefore if f has full rank and each element of F has a square root, there exists an
orthonormal basis of V for f .
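Example (a worked illustration of the first remark above; the matrix is chosen only for this example). Take F = R and the symmetric matrix A = [ 0 1 ; 1 0 ], so f (v, w) = [v]^t A [w] and f ((a, b), (a, b)) = 2ab. Following the proof, pick v1 = (1, 1), so f (v1 , v1 ) = 2 ≠ 0; then W = {w : f (v1 , w) = 0} = Span{(1, −1)}, and f ((1, −1), (1, −1)) = −2. With S = [ 1 1 ; 1 −1 ], whose columns are v1 and (1, −1),
S^t A S = [ 2  0 ]
          [ 0 −2 ] ,
which is diagonal. Dividing both basis vectors by √|f (vi , vi )| = √2 gives the matrix diag(1, −1), as in Sylvester’s law below; over C one could instead divide the second vector by √(−2) = i√2 to obtain diag(1, 1).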
Theorem 7.2.5 (Sylvester’s law). Let f be a symmetric bilinear form on Rn . There exists
a basis β such that [f ]ββ is diagonal, with only 0’s, 1’s and −1’s. Furthermore, the number
of each is independent of the choice of basis that puts f into this form.
Proof. Certainly such a basis exists: just modify the construction above by dividing by √|f (vi , vi )|
instead. So we show the other claim. Because the number of 0’s is independent of the basis,
we need only show the statement for the 1’s.
For a basis β, let V + (β) be the span of the vi ’s such that f (vi , vi ) > 0, and similarly for
V − (β) and V 0 (β). Clearly
V = V + (β) ⊕ V − (β) ⊕ V 0 (β) .
Note that the number of 1’s equals the dimension of V + (β). Furthermore, for each nonzero v ∈ V + (β)
we have
f (v, v) = Σ_{i=1}^p ai² f (vi , vi ) > 0 ,
where v = a1 v1 + · · · + ap vp and v1 , . . . , vp are the basis vectors spanning V + (β). A similar argument gives f (v, v) ≤ 0 for
all v ∈ V − (β) ⊕ V 0 (β).
If β 0 is another basis we also have
V = V + (β 0 ) ⊕ V − (β 0 ) ⊕ V 0 (β 0 ) .
Suppose that dim V + (β 0 ) > dim V + (β). Then
dim V + (β 0 ) + dim (V − (β) ⊕ V 0 (β)) > n ,
so V + (β′) intersects V − (β) ⊕ V 0 (β) in at least one non-zero vector, say v. Since v ∈ V + (β′),
f (v, v) > 0. However, since v ∈ V − (β) ⊕ V 0 (β), f (v, v) ≤ 0, a contradiction. Therefore
dim V + (β) = dim V + (β 0 ) and we are done.
• The subspace V 0 (β) is unique. We can define
NL (f ) = N (Lf ), NR (f ) = N (Rf ) .
In the symmetric case, these are equal and we call the common subspace N (f ). We claim
that
V 0 (β) = N (f ) for all β .
Indeed, if v ∈ V 0 (β) then because the basis β is orthogonal for f , f (v, ṽ) = 0 for all
ṽ ∈ β and so v ∈ N (f ). On the other hand,
dim V 0 (β) = dim V − (dim V + (β) + dim V − (β)) = dim V − rank [f ]ββ = dim N (f ) .
• However the spaces V + (β) and V − (β) are not unique. Let us take f ∈ Bil(R2 , R) with
matrix (in the standard basis)
[f ]_β^β = [ 1  0 ]
           [ 0 −1 ] .
Then f ((a, b), (c, d)) = ac − bd. Take v1 = (2, √3) and v2 = (√3, 2). Thus we get
f (v1 , v1 ) = (2)(2) − (√3)(√3) = 1 ,
f (v1 , v2 ) = (2)(√3) − (√3)(2) = 0 ,
f (v2 , v2 ) = (√3)(√3) − (2)(2) = −1 ,
so β′ = {v1 , v2 } also puts f in the diagonal form of Sylvester’s law, but V + (β′) ≠ V + (β).
7.3 Sesquilinear and Hermitian forms
One important example of a symmetric bilinear form on Rn is
f (v, w) = v1 w1 + · · · + vn wn .
In this case, √f (v, v) actually defines a good notion of length of vectors on Rn (we will
define precisely what this means later). In particular, we have f (v, v) ≥ 0 for all v. On Cn ,
however, this is not true: if f is the bilinear form from above, then f ((i, . . . , i), (i, . . . , i)) < 0.
But if we define the form
f (v, w) = v1 w̄1 + · · · + vn w̄n ,
then it is true. This is not bilinear, but it is sesquilinear.
Definition 7.3.1. Let V be a finite dimensional complex vector space. A function f :
V × V → C is called sesquilinear if
1. for each w ∈ V , the function v 7→ f (v, w) is linear and
2. for each v ∈ V , the function w 7→ f (v, w) is anti-linear.
Anti-linearity here means that f (v, cw1 + w2 ) = c̄f (v, w1 ) + f (v, w2 ). The sesquilinear form
f is called Hermitian if f (v, w) = \overline{f (w, v)}.
• Note that if f is Hermitian, then f (v, v) = \overline{f (v, v)}, so f (v, v) ∈ R.
1. If f (v, v) ≥ 0 (> 0) for all v 6= 0 then f is positive semi-definite (positive definite).
2. If f (v, v) ≤ 0 (< 0) for all v 6= 0 then f is negative semi-definite (negative
definite).
• If f is a sesquilinear form and β is a basis then there is a matrix for f : as before, if
v = a1 v1 + · · · + an vn and w = b1 v1 + · · · + bn vn ,
f (v, w) = Σ_{i=1}^n ai f (vi , w) = Σ_{i=1}^n Σ_{j=1}^n ai b̄j f (vi , vj ) = [v]_β^t [f ]_β^β \overline{[w]_β} .
• The map w 7→ f (·, w) is a conjugate isomorphism from V to V ∗ .
• We have the polarization formula
4f (u, v) = f (u + v, u + v) − f (u − v, u − v) + if (u + iv, u + iv) − if (u − iv, u − iv) .
From this we deduce that if f (v, v) = 0 for all v then f = 0.
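As a quick check of the polarization formula (using the standard Hermitian form on C², chosen only for illustration): for f (v, w) = v1 w̄1 + v2 w̄2 , u = (1, 0) and v = (0, 1), each of the four terms f (u ± v, u ± v) and f (u ± iv, u ± iv) equals 2, so the right side is 2 − 2 + 2i − 2i = 0 = 4f (u, v).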
Theorem 7.3.2 (Sylvester for Hermitian forms). Let f be a Hermitian form on a finite-dimensional complex vector space V . There exists a basis β of V such that [f ]ββ is diagonal
with only 0’s, 1’s and −1’s. Furthermore the number of each does not depend on β so long
as the matrix is in diagonal form.
Proof. Same proof.
7.4 Exercises
Notation:
1. For all problems below, F is a field of characteristic different from 2, and V is a finite-dimensional F -vector space. We write Bil(V, F ) for the vector space of bilinear forms
on V , and Sym(V, F ) for the subspace of symmetric bilinear forms.
2. If B is a bilinear form on V and W ⊂ V is any subspace, we define the restriction of
B to W , written B|W ∈ Bil(W, F ), by B|W (w1 , w2 ) = B(w1 , w2 ).
3. We call B ∈ Sym(V, F ) non-degenerate, if N (B) = 0.
Exercises
1. Let l ∈ V ∗ . Define a symmetric bilinear form B on V by B(v, w) = l(v)l(w). Compute
the nullspace of B.
2. Let B be a symmetric bilinear form on V . Suppose that W ⊂ V is a subspace with
the property that V = W ⊕ N (B). Show that B|W is non-degenerate.
3. Let B be a symmetric bilinear form on V and char(F) 6= 2. Suppose that W ⊂ V is a
subspace such that B|W is non-degenerate. Show that then V = W ⊕ W ⊥ .
Hint: Use induction on dim(W ).
4. Recall the isomorphism Φ : Bil(V, F ) → L(V, V ∗ ) given by
Φ(B)(v)(w) = B(v, w),
v, w ∈ V.
If β is a basis of V , and β ∗ is the dual basis of V ∗ , show that
([B]_β^β)^T = [Φ(B)]_β^{β∗} .
5. Let n denote the dimension of V . Let d ∈ Altn (V ), and B ∈ Sym(V, F ) both be nonzero. We are going to show that there exists a constant cd,B ∈ F with the property
that for any vectors v1 , . . . , vn , w1 , . . . , wn ∈ V , the following identity holds:
det( (B(vi , wj ))_{i,j=1}^n ) = c_{d,B} · d(v1 , . . . , vn ) d(w1 , . . . , wn ),
by completing the following steps:
(a) Show that for fixed (v1 , . . . , vn ), there exists a constant cd,B (v1 , . . . , vn ) ∈ F such
that
det( (B(vi , wj ))_{i,j=1}^n ) = c_{d,B} (v1 , . . . , vn ) · d(w1 , . . . , wn ).
(b) We now let (v1 , . . . , vn ) vary. Show that there exists a constant cd,B ∈ F such
that
cd,B (v1 , . . . , vn ) = cd,B · d(v1 , . . . , vn ).
Show further that cd,B = 0 precisely when B is degenerate.
6. The orthogonal group. Let B be a non-degenerate symmetric bilinear form on V .
Consider
O(B) = {f ∈ L(V, V )|B(f (v), f (w)) = B(v, w) ∀v, w ∈ V }.
(a) Show that if f ∈ O(B), then det(f ) is either 1 or −1
Hint: Use the previous exercise.
(b) Show that the composition of maps makes O(B) into a group.
(c) Let V = R2 , B((x1 , x2 ), (y1 , y2 )) = x1 y1 + x2 y2 . Give a formula for the 2 × 2 matrices that belong to O(B).
7. The vector product. Assume that V is 3-dimensional. Let B ∈ Sym(V, F ) be
non-degenerate, and d ∈ Alt3 (V ) be non-zero.
(a) Show that for any v, w ∈ V there exists a unique vector z ∈ V such that for all
vectors x ∈ V the following identity holds: B(z, x) = d(v, w, x).
Hint: Consider the element d(v, w, ·) ∈ V ∗ .
(b) We will denote the unique vector z from part (a) by v × w. Show that V × V → V ,
(v, w) 7→ v × w is bilinear and skew-symmetric.
(c) For f ∈ O(B), show that f (v × w) = det(f ) · (f (v) × f (w)).
(d) Show that v × w is B-orthogonal to both v and w.
(e) Show that v × w = 0 precisely when v and w are linearly dependent.
8. Let V be a finite dimensional R-vector space. Recall its complexification VC , defined
in the exercises of last chapter. It is a C-vector space with dimC VC = dimR V . As an
R-vector space, it equals V ⊕ V . We have the injection ι : V → VC , v 7→ (v, 0). We
also have the complex conjugation map c(v, w) = (v, −w).
(a) Let B be a bilinear form on V . Show that
BC ((v, w), (x, y)) := B(v, x) − B(w, y) + iB(v, y) + iB(w, x)
defines a bilinear form on VC . Show that N (BC ) = N (B)C . Show that BC is
symmetric if and only if B is.
(b) Show that for v, w ∈ VC , we have BC (c(v), c(w)) = BC (v, w). Show conversely
that any bilinear form B̃ on VC with the property B̃(c(v), c(w)) = B̃(v, w) is equal
to BC for some bilinear form B on V .
(c) Let B be a symmetric bilinear form on V . Show that
BH ((v, w), (x, y)) := B(v, x) + B(w, y) − iB(v, y) + iB(w, x)
defines a Hermitian form on VC . Show that N (BH ) = N (B)C .
(d) Show that for v, w ∈ VC , we have BH (c(v), c(w)) = BH (v, w). Show conversely
that any Hermitian form B̃ on VC with the property B̃(c(v), c(w)) = B̃(v, w) is
equal to BH for some bilinear form B on V .
9. Prove that if V is a finite-dimensional F-vector space with char(F ) 6= 2 and f is a
nonzero skew-symmetric bilinear form (that is, a bilinear form such that f (v, w) =
−f (w, v) for all v, w ∈ V ) then there is no basis β for V such that [f ]ββ is upper
triangular.
10. For each of the following real matrices A, find an invertible matrix S such that S^t AS
is diagonal.
[ 2  3  5 ]        [ 0 1 2 3 ]
[ 3  7 11 ] ,      [ 1 0 1 2 ]
[ 5 11 13 ]        [ 2 1 0 1 ]
                   [ 3 2 1 0 ] .
Also find a complex matrix T such that T^t AT is diagonal with only entries 0 and 1.
8 Inner product spaces
8.1 Definitions
We will be interested in positive definite hermitian forms.
Definition 8.1.1. Let V be a complex vector space. A hermitian form f is called an inner
product (or scalar product) if f is positive definite. In this case we call V a (complex) inner
product space.
An example is the standard dot product:
hu, vi = u1 v1 + · · · + un vn .
It is customary to write an inner product f (u, v) as ⟨u, v⟩. In addition, we write ‖u‖ = √⟨u, u⟩. This is the norm induced by the inner product ⟨·, ·⟩. In fact, (V, d) is a metric
space, using
d(u, v) = ‖u − v‖ .
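For example, with the standard inner product on C² (used here only as an illustration), the vectors u = (1, 0) and v = (0, i) have
d(u, v) = ‖(1, −i)‖ = √(1 · 1̄ + (−i) · \overline{(−i)}) = √(1 + 1) = √2 .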
Properties of norm. Let (V, h·, ·i) be a complex inner product space.
1. For all c ∈ C, kcuk = |c|kuk.
2. kuk = 0 if and only if u = ~0.
3. (Cauchy-Schwarz inequality) For u, v ∈ V ,
|hu, vi| ≤ kukkvk .
Proof. If u or v is ~0 then we are done. Otherwise, set w = u − (⟨u, v⟩/‖v‖²) v. Then
0 ≤ ⟨w, w⟩ = ⟨w, u⟩ − (⟨v, u⟩/‖v‖²) ⟨w, v⟩ .
However
⟨w, v⟩ = ⟨u, v⟩ − (⟨u, v⟩/‖v‖²) ⟨v, v⟩ = 0 ,
so
0 ≤ ⟨w, u⟩ = ⟨u, u⟩ − (⟨u, v⟩/‖v‖²) ⟨v, u⟩ ,
and
0 ≤ ⟨u, u⟩ − |⟨u, v⟩|²/‖v‖²  ⇒  |⟨u, v⟩|² ≤ ‖u‖² ‖v‖² .
Each inequality above is an equality precisely when w = ~0, so we have equality if and only if u and v
are linearly dependent.
4. (Triangle inequality) For u, v ∈ V ,
ku + vk ≤ kuk + kvk .
This is also written ku − vk ≤ ku − wk + kw − vk.
Proof.
‖u + v‖² = ⟨u + v, u + v⟩ = ⟨u, u⟩ + ⟨u, v⟩ + ⟨v, u⟩ + ⟨v, v⟩
         = ⟨u, u⟩ + 2 Re ⟨u, v⟩ + ⟨v, v⟩
         ≤ ‖u‖² + 2 |⟨u, v⟩| + ‖v‖²
         ≤ ‖u‖² + 2 ‖u‖ ‖v‖ + ‖v‖² = (‖u‖ + ‖v‖)² .
Taking square roots gives the result.
8.2 Orthogonality
Definition 8.2.1. Given a complex inner product space (V, h·, ·i) we say that vectors u, v ∈ V
are orthogonal if hu, vi = 0.
Theorem 8.2.2. Let v1 , . . . , vk be nonzero and pairwise orthogonal in a complex inner product space. Then they are linearly independent.
Proof. Suppose that
a1 v1 + · · · + ak vk = ~0 .
Then we take the inner product with vi :
0 = ⟨~0, vi⟩ = Σ_{j=1}^k aj ⟨vj , vi⟩ = ai ‖vi‖² .
Therefore ai = 0 for each i.
We begin with a method to transform a linearly independent set into an orthonormal set.
Theorem 8.2.3 (Gram-Schmidt). Let V be a complex inner product space and v1 , . . . , vk ∈ V
which are linearly independent. There exist u1 , . . . , uk such that
1. {u1 , . . . , uk } is orthonormal and
2. for all j = 1, . . . , k, Span({u1 , . . . , uj }) = Span({v1 , . . . , vj }).
Proof. We prove this by induction on k. If k = 1, we must have v1 ≠ ~0, so set u1 = v1 /‖v1‖. This
gives ‖u1‖ = 1, so {u1 } is orthonormal, and certainly the second condition holds.
If k ≥ 2, assume the statement holds for k − 1. Find vectors u1 , . . . , uk−1 as
in the statement. Now to define uk we set
wk = vk − [hvk , u1 iu1 + · · · + hvk , uk−1 iuk−1 ] .
We claim that wk is orthogonal to all uj ’s and is not zero. To check the first, let 1 ≤ j ≤ k −1
and compute
hwk , uj i = hvk , uj i − [hvk , u1 ihu1 , uj i + · · · + hvk , uk−1 ihuk−1 , uj i]
= hvk , uj i − hvk , uj ihuj , uj i = 0 .
Second, if wk were zero then we would have
vk ∈ Span({u1 , . . . , uk−1 }) = Span({v1 , . . . , vk−1 }) ,
a contradiction to linear independence. Therefore we set uk = wk /kwk k and we see that
{u1 , . . . , uk } is orthonormal and therefore linearly independent.
Furthermore note that by induction,
Span({u1 , . . . , uk }) ⊆ Span({u1 , . . . , uk−1 , vk }) ⊆ Span({v1 , . . . , vk }) .
Since the spaces on the left and right have the same dimension they are equal.
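Example (a worked instance of Gram-Schmidt; the vectors are chosen only for illustration). In R³ with the standard (real) inner product, take v1 = (1, 1, 0) and v2 = (1, 0, 1). Then u1 = v1 /‖v1‖ = (1/√2)(1, 1, 0), and
w2 = v2 − ⟨v2 , u1⟩u1 = (1, 0, 1) − (1/2)(1, 1, 0) = (1/2, −1/2, 1) ,
so u2 = w2 /‖w2‖ = (1/√6)(1, −1, 2). One checks that ⟨u1 , u2⟩ = 0, that both vectors have norm 1, and that Span({u1 , u2 }) = Span({v1 , v2 }).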
Corollary 8.2.4. If V is a finite-dimensional inner product space then V has an orthonormal
basis.
What do vectors look like when represented in an orthonormal basis? Let β = {v1 , . . . , vn } be
an orthonormal basis and let v ∈ V . Then
v = a1 v1 + · · · + an vn .
Taking inner product with vj on both sides gives aj = hv, vj i, so
v = hv, v1 iv1 + · · · + hv, vn ivn .
Thus, when β is orthonormal, we can view the number ⟨v, vi⟩ as the component of v along
vi . We can then find the norm of v easily:
‖v‖² = ⟨v, v⟩ = ⟨v, Σ_{i=1}^n ⟨v, vi⟩ vi⟩ = Σ_{i=1}^n ⟨v, vi⟩ ⟨vi , v⟩ = Σ_{i=1}^n |⟨v, vi⟩|² .
This is known as Parseval’s identity.
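For example (numbers chosen only for illustration), in R² with the standard inner product and the orthonormal basis v1 = (1/√2)(1, 1), v2 = (1/√2)(1, −1), the vector v = (3, 4) has ⟨v, v1⟩ = 7/√2 and ⟨v, v2⟩ = −1/√2, and indeed
|⟨v, v1⟩|² + |⟨v, v2⟩|² = 49/2 + 1/2 = 25 = ‖v‖² .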
Definition 8.2.5. If V is an inner product space and W is a subspace of V we define the
orthogonal complement of W as
W ⊥ = {v ∈ V : hv, wi = 0 for all w ∈ W } .
• {~0}⊥ = V and V ⊥ = {~0}.
• If S ⊆ V then S ⊥ is always a subspace of V (even if S was not). Furthermore,
S ⊥ = (Span S)⊥ and (S ⊥ )⊥ = Span S .
Theorem 8.2.6. Let V be a finite-dimensional inner product space with W a subspace. Then
V = W ⊕ W⊥ .
Proof. Let {w1 , . . . , wk } be a basis for W and extend it to a basis {w1 , . . . , wn } for V . Then
perform Gram-Schmidt to get an orthonormal basis {v1 , . . . , vn } such that
Span({v1 , . . . , vj }) = Span({w1 , . . . , wj }) for all j = 1, . . . , n .
In particular, {v1 , . . . , vk } is an orthonormal basis for W . We claim that {vk+1 , . . . , vn } is a
basis for W ⊥ . To see this, define W̃ to be the span of these vectors. Clearly W̃ ⊆ W ⊥ . On
the other hand,
W ∩ W ⊥ = {w ∈ W : ⟨w, w′⟩ = 0 for all w′ ∈ W } ⊆ {w ∈ W : ⟨w, w⟩ = 0} = {~0} .
This means that dim W + dim W ⊥ ≤ n, or dim W ⊥ ≤ n − k. Since dim W̃ = n − k, we see
that W̃ = W ⊥ .
This leads us to a definition.
Definition 8.2.7. Let V be a finite dimensional inner product space. If W is a subspace of
V we write PW : V → V for the operator
PW (v) = w1 ,
where v is written uniquely as w1 +w2 for w1 ∈ W and w2 ∈ W ⊥ . PW is called the orthogonal
projection onto W .
Properties of orthogonal projection.
1. PW is linear.
2. (PW )² = PW .
3. PW ⊥ = I − PW .
4. For all v1 , v2 ∈ V ,
hPW (v1 ), v2 i = hPW (v1 ), PW (v2 )i + hPW (v1 ), PW ⊥ (v2 )i
= hPW (v1 ), PW (v2 )i + hPW ⊥ (v1 ), PW (v2 )i = hv1 , PW (v2 )i .
Alternatively one may define an orthogonal projection as a linear map with properties 2 and
4. That is, if T : V → V is linear with T ² = T and ⟨T (v), w⟩ = ⟨v, T (w)⟩ for all v, w ∈ V
then (check this)
• V = R(T ) ⊕ N (T ) where N (T ) = (R(T ))⊥ and
• T = PR(T ) .
Example. Orthogonal projection onto a 1-d subspace. What we saw in the proof of V =
W ⊕ W ⊥ is the following. If W is a subspace of a finite dimensional inner product space,
there exists an orthonormal basis of V of the form β = β1 ∪ β2 , where β1 is an orthonormal
basis of W and β2 is an orthonormal basis of W ⊥ .
Choose an orthonormal basis {v1 , . . . , vn } of V so that v1 spans W and the other vectors
span W ⊥ . For any v ∈ V ,
v = hv, v1 iv1 + hv, v2 iv2 + · · · + hv, vn ivn ,
which is a representation of v in terms of W and W ⊥ . Thus PW (v) = hv, v1 iv1 .
For any nonzero vector v′ we can define the orthogonal projection onto v′ as PW , where W =
Span{v′}. Then we choose w′ = v′/‖v′‖ as our first vector in the orthonormal basis and
P_{v′}(v) = PW (v) = ⟨v, w′⟩ w′ = (⟨v, v′⟩/‖v′‖²) v′ .
Theorem 8.2.8. If V is a finite dimensional inner product space with W a subspace of V ,
for each v ∈ V , PW (v) is the closest vector in W to v (using the distance coming from k · k).
That is, for all w ∈ W ,
kv − PW (v)k ≤ kv − wk .
Proof. First we note that if w ∈ W and w′ ∈ W ⊥ then the Pythagorean theorem holds:
‖w + w′‖² = ⟨w + w′, w + w′⟩ = ⟨w, w⟩ + ⟨w′, w′⟩ = ‖w‖² + ‖w′‖² .
We now take v ∈ V and w ∈ W and write v−w = PW (v)+PW ⊥ (v)−w = PW (v)−w+PW ⊥ (v).
Applying Pythagoras,
kv − wk2 = kPW (v) − wk2 + kPW ⊥ (v)k2 ≥ kPW ⊥ (v)k2 = kPW (v) − vk2 .
8.3 Adjoints
Theorem 8.3.1. Let V be a finite-dimensional inner product space. If T : V → V is linear
then there exists a unique linear transformation T ∗ : V → V such that for all v, u ∈ V ,
hT (v), ui = hv, T ∗ (u)i .
(5)
We call T ∗ the adjoint of T .
Proof. We will use the Riesz representation theorem.
Lemma 8.3.2 (Riesz). Let V be a finite-dimensional inner product space. For each f ∈ V ∗
there exists a unique z ∈ V such that
f (v) = hv, zi for all v ∈ V .
Given this we will define T ∗ as follows. For u ∈ V we define the linear functional
fu,T : V → C by fu,T (v) = hT (v), ui .
You can check this is indeed a linear functional. By Riesz, there exists a unique z ∈ V such
that
fu,T (v) = hv, zi for all v ∈ V .
We define this z to be T ∗ (u). In other words, for a given u ∈ V , T ∗ (u) is the unique vector
in V with the property
⟨T (v), u⟩ = fu,T (v) = ⟨v, T ∗ (u)⟩ for all v ∈ V .
Because of this identity, we see that there exists a function T ∗ : V → V such that for all
u, v ∈ V , (5) holds. In other words, given T , we have a way of mapping a vector u ∈ V to
another vector which we call T ∗ (u). We need to know that it is unique and linear.
Suppose that R : V → V is another function such that for all u, v ∈ V ,
hT (v), ui = hv, R(u)i .
Then we see that
hv, T ∗ (u) − R(u)i = hv, T ∗ (u)i − hv, R(u)i = hT (v), ui − hv, R(u)i = 0
for all u, v ∈ V . Choosing v = T ∗ (u) − R(u) gives that
kT ∗ (u) − R(u)k = 0 ,
or T ∗ (u) = R(u) for all u. This means T ∗ = R.
To show linearity, let c ∈ C and u1 , u2 , v ∈ V .
⟨T (v), cu1 + u2⟩ = c̄⟨T (v), u1⟩ + ⟨T (v), u2⟩ = c̄⟨v, T ∗ (u1)⟩ + ⟨v, T ∗ (u2)⟩
= ⟨v, cT ∗ (u1) + T ∗ (u2)⟩ .
This means that
hv, T ∗ (cu1 + u2 ) − cT ∗ (u1 ) − T ∗ (u2 )i = 0
for all v ∈ V . Choosing v = T ∗ (cu1 + u2 ) − cT ∗ (u1 ) − T ∗ (u2 ) gives that
T ∗ (cu1 + u2 ) = cT ∗ (u1 ) + T ∗ (u2 ) ,
or T ∗ is linear.
Properties of adjoint.
1. T ∗ : V → V is linear. To see this, if w1 , w2 ∈ V and c ∈ F then for all v,
⟨T (v), cw1 + w2⟩ = c̄⟨T (v), w1⟩ + ⟨T (v), w2⟩
= c̄⟨v, T ∗ (w1)⟩ + ⟨v, T ∗ (w2)⟩
= ⟨v, cT ∗ (w1) + T ∗ (w2)⟩ .
By uniqueness, T ∗ (cw1 + w2 ) = cT ∗ (w1 ) + T ∗ (w2 ).
2. If β is an orthonormal basis of V then [T ∗ ]_β^β is the conjugate transpose of [T ]_β^β (see the example after these properties).
Proof. If β is an orthonormal basis, then, remembering that ⟨·, ·⟩ is a sesquilinear form, its
matrix in the basis β is simply the identity, so ⟨x, y⟩ = [x]_β^t \overline{[y]_β} for all x, y ∈ V . Therefore
⟨T (v), w⟩ = [T (v)]_β^t \overline{[w]_β} = ([T ]_β^β [v]_β)^t \overline{[w]_β} = [v]_β^t ([T ]_β^β)^t \overline{[w]_β} .
On the other hand, for all v, w,
⟨v, T ∗ (w)⟩ = [v]_β^t \overline{[T ∗ (w)]_β} = [v]_β^t \overline{[T ∗ ]_β^β} \overline{[w]_β} .
Therefore
[v]_β^t \overline{[T ∗ ]_β^β} \overline{[w]_β} = [v]_β^t ([T ]_β^β)^t \overline{[w]_β} .
Choosing v = vi and w = vj shows that all entries of \overline{[T ∗ ]_β^β} and ([T ]_β^β)^t are equal, which gives the claim.
3. (T + S)∗ = T ∗ + S ∗ .
Proof. If v, w ∈ V ,
h(T + S)(v), wi = hT (v), wi + hS(v), wi = hv, T ∗ (w)i + hv, S ∗ (w)i .
This equals hv, (T ∗ + S ∗ )(w)i.
4. (cT )∗ = c̄ T ∗ . This is similar.
5. (T S)∗ = S ∗ T ∗ .
Proof. For all v, w ∈ V ,
h(T S)(v), wi = hT (S(v)), wi = hS(v), T ∗ (w)i = hv, S ∗ (T ∗ (w))i .
This is hv, (S ∗ T ∗ )(w)i.
6. (T ∗ )∗ = T .
Proof. If v, w ∈ V ,
⟨T ∗ (v), w⟩ = \overline{⟨w, T ∗ (v)⟩} = \overline{⟨T (w), v⟩} = ⟨v, T (w)⟩ .
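Example (an illustration of property 2; the matrix is chosen only for this example). If β is the standard orthonormal basis of C² and
[T ]_β^β = [ 1 i ]
           [ 0 2 ] ,
then [T ∗ ]_β^β = [ 1 0 ; −i 2 ], the conjugate transpose. As a check, ⟨T (e2 ), e1⟩ = ⟨(i, 2), (1, 0)⟩ = i, while ⟨e2 , T ∗ (e1 )⟩ = ⟨(0, 1), (1, −i)⟩ = \overline{(−i)} = i.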
8.4 Spectral theory of self-adjoint operators
Definition 8.4.1. If V is an inner product space and T : V → V is linear we say that T is
1. self-adjoint if T = T ∗ ;
2. skew-adjoint if T = −T ∗ ;
3. unitary if T is invertible and T −1 = T ∗ ;
4. normal if T T ∗ = T ∗ T .
Note all the above operators are normal. Also orthogonal projections are self-adjoint.
There is an analogy with complex numbers: self-adjoint operators are like real numbers, skew-adjoint operators like purely imaginary numbers, and unitary operators like numbers on the unit
circle.
Theorem 8.4.2. Let V be an inner product space and T : V → V linear with λ an eigenvalue
of T .
1. If T is self-adjoint then λ is real.
2. If T is skew-adjoint then λ is purely imaginary.
3. If T is unitary then |λ| = 1 and |det T | = 1.
Proof. Let T : V → V be linear and let λ be an eigenvalue with eigenvector v ≠ ~0.
1. Suppose T = T ∗ . Then
λ‖v‖² = ⟨T (v), v⟩ = ⟨v, T (v)⟩ = ⟨v, λv⟩ = λ̄‖v‖² .
But v ≠ 0, so λ = λ̄, that is, λ is real.
2. Suppose that T = −T ∗ . Define S = iT . Then S ∗ = (iT )∗ = −iT ∗ = −i(−T ) = iT = S, so S is
self-adjoint. Now iλ is an eigenvalue of S:
S(v) = (iT )(v) = iλv .
This means iλ is real, or λ is purely imaginary.
3. Suppose T ∗ = T −1 . Then
|λ|2 kvk2 = hT (v), T (v)i = hv, T −1 T vi = kvk2 .
This means |λ| = 1. Furthermore, det T is the product of eigenvalues, so |det T | = 1.
What do these operators look like? If β is an orthonormal basis for V then
1. If T = T ∗ then [T ]_β^β equals its own conjugate transpose.
2. If T = −T ∗ then [T ]_β^β equals the negative of its conjugate transpose.
Lemma 8.4.3. Let V be a finite-dimensional inner product space. If T : V → V is linear,
the following are equivalent.
1. T is unitary.
2. For all v ∈ V , kT (v)k = kvk.
3. For all v, w ∈ V , hT (v), T (w)i = hv, wi.
Proof. If T is unitary then hT (v), T (v)i = hv, T −1 T vi = hv, vi. This shows 1 implies 2. If T
preserves norm then it also preserves inner product by the polarization identity. This proves
2 implies 3. To see that 3 implies 1, we take v, w ∈ V and see
hv, wi = hT (v), T (w)i = hv, T ∗ T (w)i .
This implies that hv, w − T ∗ T (w)i = 0 for all v. Taking v = w − T ∗ T (w) gives that
T ∗ T (w) = w. Thus T must be invertible and T ∗ = T −1 .
• Furthermore, T is unitary if and only if T maps orthonormal bases to orthonormal
bases. In particular, [T ]ββ has orthonormal columns whenever β is orthonormal.
• For β an orthonormal basis, the unitary operators are exactly those whose matrices
relative to β have orthonormal columns.
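For example (a matrix chosen only for illustration), the operator on C² whose matrix in the standard basis is
(1/√2) [ 1  1 ]
        [ i −i ]
is unitary: its columns (1/√2)(1, i) and (1/√2)(1, −i) each have norm 1, and ⟨(1, i), (1, −i)⟩ = 1 · 1̄ + i · \overline{(−i)} = 1 + i · i = 0.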
We begin with a definition.
Definition 8.4.4. If V is a finite-dimensional inner product space and T : V → V is linear,
we say that T is unitarily diagonalizable if there exists an orthonormal basis β of V such
that [T ]ββ is diagonal.
Note that T is unitarily diagonalizable if and only if there exists a unitary operator U
such that
U −1 T U is diagonal .
Theorem 8.4.5 (Spectral theorem). Let V be a finite-dimensional inner product space. If
T : V → V is self-adjoint then T is unitarily diagonalizable.
Proof. We will use induction on dim V = n. If n = 1 just choose a vector of norm 1.
Otherwise suppose the statement is true for n < k and we will show it for k ≥ 2. Since T
has an eigenvalue λ, it has an eigenvector, v1 . Choose v1 with norm 1.
Let Uλ = T − λI. We claim that
V = N (Uλ ) ⊕ R(Uλ ) .
To show this we need only prove that R(Uλ ) = N (Uλ )⊥ . This will follow from a lemma:
Lemma 8.4.6. Let V be a finite-dimensional inner product space and U : V → V linear.
Then
R(U ) = N (U ∗ )⊥ .
Proof. If w ∈ R(U ) then let z ∈ N (U ∗ ). There exists v ∈ V such that U (v) = w. Therefore
hw, zi = hU (v), zi = hv, U ∗ (z)i = 0 .
Therefore R(U ) ⊆ N (U ∗ )⊥ . For the other containment, note that dim R(U ) = dim R(U ∗ )
(since the matrix of U ∗ in an orthonormal basis is just the conjugate transpose of that of
U ). Therefore
dim R(U ) = dim R(U ∗ ) = dim V − dim N (U ∗ ) = dim N (U ∗ )⊥ .
Now we apply the lemma. Note that since T is self-adjoint,
Uλ∗ = (T − λI)∗ = T ∗ − λ̄I = T − λI = Uλ ,
since λ ∈ R. Thus using the lemma with Uλ ,
V = N (Uλ∗ ) ⊕ N (Uλ∗ )⊥ = N (Uλ ) ⊕ R(Uλ ) .
Note that these are T -invariant subspaces and dim R(Uλ ) < k, since λ is an eigenvalue of T . Thus
by the induction hypothesis there is an orthonormal basis β′ of R(Uλ ) such that the matrix of T restricted to this space is diagonal.
Also choose an orthonormal basis β″ of N (Uλ ) = Eλ containing v1 (extend {v1 } by Gram-Schmidt); since T acts on Eλ as multiplication by λ, the matrix of T restricted to Eλ in the basis β″ is diagonal as well. Taking
β = β′ ∪ β″
gives an orthonormal basis of V such that [T ]ββ is block diagonal with both blocks diagonal, so [T ]ββ is diagonal.
• Note that if T is skew-adjoint, iT is self-adjoint, so we can find an orthonormal basis
β such that [iT ]ββ is diagonal. This implies that T itself can be diagonalized by β: its
matrix is just −i[iT ]ββ .
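Example (a concrete instance of the spectral theorem; the matrix is chosen only for illustration). The operator on C² with matrix
A = [  2 i ]
    [ −i 2 ]
in the standard basis is self-adjoint (A equals its conjugate transpose). Its eigenvalues are 3 and 1, both real, with orthonormal eigenvectors u1 = (1/√2)(i, 1) and u2 = (1/√2)(−i, 1). In the orthonormal basis β = {u1 , u2 } the matrix of the operator is diag(3, 1); equivalently, the unitary matrix U with columns u1 , u2 satisfies U^{−1}AU = U ∗AU = diag(3, 1).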
8.5 Normal and commuting operators
Lemma 8.5.1. Let U, T : V → V be linear and F be algebraically closed. Write ÊλT1 , . . . , ÊλTk
for the generalized eigenspaces for T . If T and U commute then
V = ÊλT1 ⊕ · · · ⊕ ÊλTk
is both a T -invariant direct sum and a U -invariant direct sum.
Proof. We need only show that the generalized eigenspaces of T are U -invariant. If v ∈
N (T − λi I)m then
(T − λi I)m (U (v)) = U (T − λi I)m v = ~0 .
Theorem 8.5.2. Let U, T : V → V be linear and F algebraically closed. Suppose that T
and U commute. Then
1. If T and U are diagonalizable then there exists a basis β such that both [T ]ββ and [U ]ββ
are diagonal.
2. If V is an inner product space and T and U are self-adjoint then we can choose β to
be orthonormal.
Proof. Suppose that T and U are diagonalizable. Then each generalized eigenspace of T is just the corresponding eigenspace, so the decomposition above reads
V = EλT1 ⊕ · · · ⊕ EλTk .
For each j, choose a Jordan basis βj for U on EλTj . Set β = ∪_{j=1}^k βj . These
are all eigenvectors for T so [T ]ββ is diagonal. Further, [U ]ββ is in Jordan form. But since U
is diagonalizable, its Jordan form is diagonal. By uniqueness, [U ]ββ is diagonal.
If T and U are self-adjoint, the decomposition
V = EλT1 ⊕ · · · ⊕ EλTk
is orthogonal, since eigenvectors of the self-adjoint operator T for distinct eigenvalues are orthogonal. For each j, choose an orthonormal basis βj of EλTj consisting of eigenvectors for U (this is possible by the spectral theorem, since U restricts to a self-adjoint operator on EλTj ). Now β = ∪_{j=1}^k βj is an orthonormal basis of eigenvectors for both
T and U .
Theorem 8.5.3. Let V be a finite-dimensional inner product space. If T : V → V is linear
then T is normal if and only if T is unitarily diagonalizable.
Definition 8.5.4. If V is an inner product space and T : V → V is linear, we write
T = (1/2)(T + T ∗ ) + (1/2)(T − T ∗ ) = T1 + T2
and call these operators the self-adjoint part and the skew-adjoint part of T respectively.
Of course each part of a linear transformation can be unitarily diagonalized on its own.
We now need to diagonalize them simultaneously.
Proof. If T is unitarily diagonalizable, then, taking a unitary U such that T = U^{−1}DU = U ∗DU for a diagonal
D, we get
T T ∗ = (U ∗DU )(U ∗DU )∗ = U ∗DU U ∗D∗U = U ∗DD∗U = U ∗D∗DU = (U ∗D∗U )(U ∗DU ) = T ∗ T ,
since the diagonal matrices D and D∗ commute. Hence T is normal.
Suppose that T is normal. Then T1 and T2 commute: expanding, T1 T2 and T2 T1 both equal (1/4)(T ² − (T ∗ )²) when T T ∗ = T ∗ T . Note that T1 is self-adjoint and
iT2 is also. Since they commute, we can find an orthonormal basis β such that [T1 ]ββ and [iT2 ]ββ
are diagonal. Now
[T ]ββ = [T1 ]ββ − i[iT2 ]ββ
is diagonal.
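Example (a normal operator which is not self-adjoint; the matrix is chosen only for illustration). The operator on C² with matrix
A = [ 0 −1 ]
    [ 1  0 ]
in the standard basis satisfies A∗ = −A = A^{−1}, so it is normal (indeed both skew-adjoint and unitary) but not self-adjoint. Consistent with the theorem, it is unitarily diagonalizable with non-real eigenvalues: A(1, −i) = i(1, −i) and A(1, i) = −i(1, i), and (1/√2)(1, −i), (1/√2)(1, i) form an orthonormal basis of eigenvectors.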
8.6 Exercises
Notation
1. If V is a vector space over R and h·, ·i : V × V → R is a positive-definite symmetric
bilinear form, then we call h·, ·i a (real) inner product. The pair (V, h·, ·i) is called a
(real) inner product space.
2. If (V, h·, ·i) is a real inner product space and S is a subset of V we say that S is
orthogonal if hv, wi = 0 whenever v, w ∈ S are distinct. We say S is orthonormal if S
is orthogonal and hv, vi = 1 for all v ∈ S.
3. If f is a symmetric bilinear form on a vector space V the orthogonal group is the set
O(f ) = {T : V → V | f (T (u), T (v)) = f (u, v) for all u, v ∈ V } .
Exercises
1. Let V be a complex inner product space. Let T ∈ L(V, V ) be such that T ∗ = −T .
We call such T skew-self-adjoint. Show that the eigenvalues of T are purely imaginary.
Show further that V is the orthogonal direct sum of the eigenspaces of T . In other
words, V is a direct sum of the eigenspaces and hv, wi = 0 if v and w are in distinct
eigenspaces.
Hint: Construct from T a suitable self-adjoint operator and apply the known results
from the lecture to that operator.
2. Let V be a complex inner product space, and T ∈ L(V, V ).
(a) Show that T is unitary if and only if it maps orthonormal bases to orthonormal
bases.
(b) Let β be an orthonormal basis of V . Show that T is unitary if and only if the
columns of the matrix [T ]ββ form a set of orthonormal vectors in Cn with
respect to the standard hermitian form (standard dot product).
3. Let (V, h·, ·i) be a complex inner product space, and η be a Hermitian form on V (in
addition to h·, ·i ). Show that there exists an orthonormal basis of V such that [η]ββ is
diagonal, by completing the following steps:
(a) Show that for each w ∈ V , there exists a unique vector, which we call Aw, in V
with the property that for all v ∈ V ,
η(v, w) = hv, Awi.
(b) Show that the map A : V → V which sends a vector w ∈ V to the vector Aw just
defined, is linear and self-adjoint.
(c) Use the spectral theorem for self-adjoint operators to complete the problem.
4. Let (V, h·, ·i) be a real inner product space.
(a) Define k · k : V → R by
‖v‖ = √⟨v, v⟩ .
Show that for all v, w ∈ V ,
|hv, wi| ≤ kvkkwk .
(b) Show that k · k is a norm on V .
(c) Show that there exists an orthonormal basis β of V .
5. Let (V, h·, ·i) be a real inner product space and T : V → V be linear.
(a) Prove that for each f ∈ V ∗ there exists a unique z ∈ V such that for all v ∈ V ,
f (v) = hv, zi .
(b) For each u ∈ V define fu,T : V → R by
fu,T (v) = hT (v), ui .
Prove that fu,T ∈ V ∗ . Define T t (u) to be the unique vector z ∈ V such that for all
v ∈ V ,
⟨T (v), u⟩ = ⟨v, z⟩ ,
and show that T t is linear.
(c) Show that if β is an orthonormal basis for V then
[T t ]_β^β = ([T ]_β^β)^t .
6. Let (V, h·, ·i) be a real inner product space and define the complexification of h·, ·i as
in homework 11 by
h(v, w), (x, y)iC = hv, xi + hw, yi − ihv, yi + ihw, xi .
(a) Show that h·, ·iC is an inner product on VC .
(b) Let T : V → V be linear.
i. Prove that (TC )∗ = (T t )C .
ii. If T t = T then we say that T is symmetric. Show in this case that TC is
Hermitian.
iii. If T t = −T then we say T is anti-symmetric. Show in this case that TC is
skew-adjoint.
iv. If T is invertible and T t = T −1 then we say that T is orthogonal. Show in
this case that TC is unitary. Show that this is equivalent to
T ∈ O(h·, ·i) ,
where O(h·, ·i) is the orthogonal group for h·, ·i.
7. Let (V, h·, ·i) be a real inner product space and T : V → V be linear.
(a) Suppose that T T t = T t T . Show then that TC is normal. In this case, we can find
a basis β of VC such that β is orthonormal (with respect to h·, ·iC ) and [TC ]ββ is
diagonal. Define the subspaces of V
X1 , . . . , Xr , Y1 , . . . , Y2m
as in problem 1, question 3. Show that these are mutually orthogonal; that is, if
v, w are in different subspaces then hv, wi = 0.
(b) If T is symmetric then show that there exists an orthonormal basis β of V such
that [T ]ββ is diagonal.
(c) If T is skew-symmetric, what is the form of the matrix of T in real Jordan form?
(d) A ∈ M2×2 (R) is called a rotation matrix if there exists θ ∈ [0, 2π) such that
A = [  cos θ  sin θ ]
    [ − sin θ  cos θ ] .
If T is orthogonal, show that there exists a basis β of V such that [T ]ββ is block
diagonal, and the blocks are either 2 × 2 rotation matrices or 1 × 1 matrices
consisting of 1 or −1.
Hint. Use the real Jordan form.