Benjamin McKay
Abstract Linear Algebra
October 19, 2016
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Contents

I Basic Definitions
1 Vector Spaces
2 Fields
3 Direct Sums of Subspaces

II Jordan Normal Form
4 Jordan Normal Form
5 Decomposition and Minimal Polynomial
6 Matrix Functions of a Matrix Variable
7 Symmetric Functions of Eigenvalues
8 The Pfaffian

III Factorizations
9 Dual Spaces
10 Singular Value Factorization
11 Factorizations

IV Tensors
12 Quadratic Forms
13 Tensors and Indices
14 Tensors
15 Exterior Forms

Hints
List of notation
Index
Bibliography
Basic Definitions
Chapter 1
Vector Spaces
The ideas of linear algebra apply more widely, in more abstract spaces than Rn .
Definition
To avoid rewriting everything twice, once for real numbers and once for complex
numbers, let K stand for either R or C.
Definition 1.1. A vector space V over K is a set (whose elements are called
vectors) equipped with two operations, addition (written +) and scaling (written
·), so that
a. Addition laws:
1. u + v is in V
2. (u + v) + w = u + (v + w)
3. u + v = v + u
for any vectors u, v, w in V ,
b. Zero laws:
1. There is a vector 0 in V so that 0 + v = v for any vector v in V .
2. For each vector v in V , there is a vector w in V , for which v + w = 0.
c. Scaling laws:
1. av is in V
2. 1 · v = v
3. a(bv) = (ab)v
4. (a + b)v = av + bv
5. a(u + v) = au + av
for any numbers a, b ∈ K, and any vectors u and v in V .
Because (u + v) + w = u + (v + w), we never need parentheses in adding up
vectors.
Kn is a vector space, with the usual addition and scaling.
The set V of all real-valued functions of a real variable is a vector space:
we can add functions (f + g)(x) = f (x) + g(x), and scale functions:
(c f )(x) = c f (x). This example is the main motivation for developing
an abstract theory of vector spaces.
Take some region inside Rn , like a box, or a ball, or several boxes and
balls glued together. Let V be the set of all real-valued functions of
that region. Unlike Rn , which comes equipped with the standard basis,
there is no “standard basis” of V . By this, we mean that there is no
collection of functions fi we know how to write down so that every
function f is a unique linear combination of the fi . Even still, we can
generalize a lot of ideas about linear algebra to various spaces like V
instead of just Rn . Practically speaking, there are only two types of
vector spaces that we ever encounter: Rn (and its subspaces) and the
space V of real-valued functions defined on some region in Rn (and its
subspaces).
The set Kp×q of p × q matrices is a vector space, with usual matrix
addition and scaling.
1.1 If V is a vector space, prove that
a. 0 v = 0 for any vector v, and
b. a 0 = 0 for any scalar a.
1.2 Let V be the set of real-valued polynomial functions of a real variable.
Prove that V is a vector space, with the usual addition and scaling.
1.3 Prove that there is a unique vector w for which v + w = 0. (Let's always
call that vector −v.) Prove also that −v = (−1)v.
We will write u − v for u + (−v) from now on. We define linear relations,
linear independence, bases, subspaces, bases of subspaces, and dimension using
exactly the same definitions as for Rn .
Remark 1.2. Thinking as much as possible in terms of abstract vector spaces
saves a lot of hard work. We will see many reasons why, but the first is that
every subspace of any vector space is itself a vector space.
Review problems
1.4 Prove that if u + v = u + w then v = w.
1.5 Imagine that the population pj at year j is governed (at least roughly) by
some equation
pj+1 = apj + bpj−1 + cpj−2 .
Prove that for fixed a, b, c, the set of all sequences . . . , p1 , p2 , . . . which satisfy
this law is a vector space.
1.6 Give examples of subsets of the plane
a. invariant under scaling of vectors (sending u to au for any number a),
but not under addition of vectors. (In other words, if you scale vectors
from your subset, they have to stay inside the subset, but if you add some
vectors from your subset, you don’t always get a vector from your subset.)
b. invariant under addition but not under scaling or subtraction.
c. invariant under addition and subtraction but not scaling.
1.7 Take positive real numbers and “add” by the law u ⊕ v = uv and “scale”
by the law a ⊙ u = u^a . Prove that the positive numbers form a vector space with these
funny laws for addition and scaling.
1.8 Which of the following sets are vector spaces (with the usual addition and
scalar multiplication for real-valued functions)? Justify your answer.
a. The set of all continuous functions of a real variable.
b. The set of all nonnegative functions of a real variable.
c. The set of all polynomial functions of degree exactly 3.
d. The set of all symmetric 10 × 10 matrices A, i.e. A = At .
Bases
We define linear combinations, linear relations, linear independence, bases and
the span of a set of vectors identically.
1.9 Find bases for the following vector spaces:
a. The set of polynomial functions of degree 3 or less.
b. The set of 3 × 2 matrices.
c. The set of n × n upper triangular matrices.
d. The set of polynomial functions p(x) of degree 3 or less which vanish at
the origin x = 0.
Lemma 1.3. The number of elements in a linearly independent set is never
more than the number of elements in a spanning set: if v1 , v2 , . . . , vp ∈ V is
a linearly independent set of vectors and w1 , w2 . . . , wq ∈ V is a spanning set
of vectors in the same vector space, then p ≤ q. Moreover, p = q just when
v1 , v2 , . . . , vp is a basis. In particular, any two bases have the same number of
elements.
Proof. If v1 = 0 then
1 v1 + 0 v2 + 0 v3 · · · + 0 vp = 0,
a linear relation. So v1 ≠ 0. We can write v1 as a linear combination, say
v1 = b1 w1 + b2 w2 + · · · + bq wq .
Not all of the b1 , b2 , . . . , bq coefficients can vanish, since v1 ≠ 0. If we relabel
the subscripts, we can arrange that b1 ≠ 0. Solve for w1 :
w1 = (1/b1 ) v1 − (b2 /b1 ) w2 − · · · − (bq /b1 ) wq .
Therefore we can write each of w1 , w2 , w3 , . . . , wq as linear combinations of
v1 , w2 , w3 , . . . , wq . So v1 , w2 , w3 , . . . , wq is a spanning set.
Next replace v1 in this argument by v2 , and then by v3 , etc. We can always
replace one of the vectors w1 , w2 , . . . , wq by each of the vectors v1 , v2 , . . . , vp .
If p ≥ q, we can keep going like this until we replace all of the vectors
w1 , w2 , . . . , wq by the vectors v1 , v2 , . . . , vq : v1 , v2 , . . . , vq is a spanning set. If
p = q, we find that v1 , v2 , . . . , vp span, so form a basis. If p > q, vq+1 is a linear
combination
vq+1 = b1 v1 + b2 v2 + · · · + bq vq ,
a linear relation, a contradiction.
Definition 1.4. The dimension of a vector space V is n if V has a basis consisting
of n vectors. If there is no such value of n, then we say that V has infinite
dimension.
Remark 1.5. We can include the possibility that n = 0 by defining K0 to consist
in just a single vector 0, a zero dimensional vector space.
1.10 Let V be the set of polynomials of degree at most p in n variables. Find
the dimension of V .
Subspaces
The definition of a subspace is identical to that for Rn .
Let V be the set of real-valued functions of a real variable. The set P
of continuous real-valued functions of a real variable is a subspace of V .
Let V be the set of all infinite sequences of real numbers. We add a
sequence x1 , x2 , x3 , . . . to a sequence y1 , y2 , y3 , . . . to make the sequence
x1 + y1 , x2 + y2 , x3 + y3 , . . . . We scale a sequence by scaling each entry.
The set of convergent infinite sequences of real numbers is a subspace
of V .
In these last two examples, we see that a large part of analysis is encoded
into subspaces of infinite dimensional vector spaces. (We will define dimension
shortly.)
1.11 Describe some subspaces of the space of all real-valued functions of a real
variable.
Review problems
1.12 Which of the following are subspaces of the space of real-valued functions
of a real variable?
a. The set of everywhere positive functions.
b. The set of nowhere positive functions.
c. The set of functions which are positive somewhere.
d. The set of polynomials which vanish at the origin.
e. The set of increasing functions.
f. The set of functions f (x) for which f (−x) = f (x).
g. The set of functions f (x) each of which is bounded from above and below
by some constant functions.
1.13 Which of the following are subspaces of the vector space of all 3 × 3 matrices?
a. The invertible matrices.
b. The noninvertible matrices.
c. The matrices with positive entries.
d. The upper triangular matrices.
e. The symmetric matrices.
f. The orthogonal matrices.
1.14 Prove that for any subspace U of a finite dimensional vector space V ,
there is a basis for V
u1 , u2 , . . . , up , v1 , v2 , . . . , vq
so that
u1 , u2 , . . . , up ,
form a basis for U .
1.15
a. Let H be an n × n matrix. Let P be the set of all matrices A for which
AH = HA. Prove that P is a subspace of the space V of all n × n
matrices.
b. Describe this subspace P for
H = ( 1  0
      0 −1 ) .
Sums and Direct Sums
Suppose that U, W ⊂ V are two subspaces of a vector space. Then the intersection U ∩ W ⊂ V is also a subspace. Let U + W be the set of all sums u + w
for any u ∈ U and w ∈ W . Then U + W ⊂ V is a subspace.
1.16 Prove that U + W is a subspace, and that
dim(U + W ) = dim U + dim W − dim(U ∩ W ).
If U and W are any vector spaces (not necessarily subspaces of any particular
vector space V ) the direct sum U ⊕ W is the set of all pairs (u, w) for any u ∈ U
and w ∈ W . We add pairs and scale pairs in the obvious way:
(u1 , w1 ) + (u2 , w2 ) = (u1 + u2 , w1 + w2 )
and
c (u, w) = (cu, cw) .
If u1 , u2 , . . . , up is a basis for U and w1 , w2 , . . . , wq is a basis for W , then
(u1 , 0) , (u2 , 0) , . . . (up , 0) , (0, w1 ) , (0, w2 ) , . . . (0, wq )
is a basis for U ⊕ W . In particular,
dim (U ⊕ W ) = dim U + dim W.
Linear Maps
Definition 1.6. A linear map T between vector spaces U and V is a rule which
associates to each vector x from U a vector T x from V so that
a. T (x0 + x1 ) = T x0 + T x1
b. T (ax) = aT x
for any vectors x0 , x1 and x in U and real number a. We will write T : U → V
to mean that T is a linear map from U to V .
Let U be the vector space of all real-valued functions of real variable.
Imagine 16 scientists standing one at each kilometer along a riverbank,
each measuring the height of the river at the same time. The height at
that time is a function h of how far you are along the bank. The 16
measurements of the function, say h(1), h(2), . . . , h(16), sit as the entries
of a vector in R16 . So we have a map T : U → R16 , given by sampling
values of functions h(x) at various points x = 1, x = 2, . . . , x = 16.
T h = (h(1), h(2), . . . , h(16)) .
This T is a linear map.
Any p × q matrix A determines a linear map T : Rq → Rp , by the
equation T x = Ax. Conversely, given a linear map T : Rq → Rp , define
a p×q matrix A by letting the j-th column of A be T ej . Then T x = Ax.
We say that A is the matrix associated to T . In this way we can identify
the space of linear maps T : Rq → Rp with the space of p × q matrices.
It is convenient to write T = A to mean that T has associated matrix
A.
There is an obvious linear map I : V → V given by Iv = v for any
vector v in V , and called the identity map.
Definition 1.7. If S : U → V and T : V → W are linear maps, then T S : U → W
is their composition.
If U, W ⊂ V are subspaces, then there is an obvious linear map
T : U ⊕ W → U + W, T (u, w) = u + w.
This map is a bijection just when U ∩ W = {0}, clearly, in which case
we use this map to identify U ⊕ W with U + W , and say U + W is a
direct sum of subspaces.
1.17 Prove that if A is the matrix associated to a linear map S : Rp → Rq and
B the matrix associated to T : Rq → Rr , then BA is the matrix associated to
their composition.
Remark 1.8. From now on, we won’t distinguish a linear map T : Rq → Rp from
its associated matrix, which we will also write as T . Once again, deliberate
ambiguity has many advantages.
Remark 1.9. A linear map between abstract vector spaces doesn’t have an
associated matrix; this idea only makes sense for maps T : Rq → Rp .
Let U and V be two vector spaces. The set W of all linear maps
T : U → V is a vector space: we add linear maps by (T1 + T2 ) (u) =
T1 (u) + T2 (u), and scale by (cT )u = cT u.
Definition 1.10. The kernel of a linear map T : U → V is the set of vectors u in
U for which T u = 0. The image is the set of vectors v in V of the form v = T u
for some u in U .
Definition 1.11. A linear map T : U → V is an isomorphism if
a. T x = T y just when x = y (one-to-one) for any x and y in U , and
b. For any z in V , there is some x in U for which T x = z (onto).
Two vector spaces U and V are called isomorphic if there is an isomorphism
between them.
Being isomorphic means effectively being the same for purposes of linear
algebra.
Remark 1.12. When working with an abstract vector space V , the role that has
up to now been played by a change of basis matrix will henceforth be played by
an isomorphism F : Rn → V . Equivalently, F e1 , F e2 , . . . , F en is a basis of V .
Let V be the vector space of polynomials p(x) = a + bx + cx² of degree
at most 2. Let F : R3 → V be the map
F (a, b, c) = a + bx + cx² .
Clearly F is an isomorphism.
1.18 Prove that a linear map T : U → V is an isomorphism just when its kernel
is 0, and its image is V .
1.19 Let V be a vector space. Prove that I : V → V is an isomorphism.
1.20 Prove that an isomorphism T : U → V has a unique inverse map T −1 : V →
U so that T −1 T = 1 and T T −1 = 1, and that T −1 is linear.
1.21 Let V be the set of polynomials of degree at most 2, and map T : V → R3
by, for any polynomial p,
T p = (p(0), p(1), p(2)) .
Prove that T is an isomorphism.
Theorem 1.13. If v1 , v2 , . . . , vn is a basis for a vector space V , and w1 , w2 , . . . , wn
are any vectors in a vector space W , then there is a unique linear map T : V →
W so that T vi = wi .
Proof. If there were two such maps, say S and T , then S − T would vanish on
v1 , v2 , . . . , vn , and therefore by linearity would vanish on any linear combination
of v1 , v2 , . . . , vn , therefore on any vector, so S = T .
To see that there is such a map, we know that each vector x in V can be
written uniquely as
x = x1 v1 + x2 v2 + · · · + xn vn .
So let's define
T x = x1 w1 + x2 w2 + · · · + xn wn .
If we take two vectors, say x and y, and write them as linear combinations of
basis vectors, say with
x = x1 v1 + x2 v2 + · · · + xn vn ,
y = y1 v1 + y2 v2 + · · · + yn vn ,
then
T (x + y) = (x1 + y1 ) w1 + (x2 + y2 ) w2 + · · · + (xn + yn ) wn
= T x + T y.
Similarly, if we scale a vector x by a number a, then
ax = a x1 v1 + a x2 v2 + · · · + a xn vn ,
so that
T (ax) = a x1 w1 + a x2 w2 + · · · + a xn wn
= a T x.
Therefore T is linear.
Corollary 1.14. A vector space V has dimension n just when it is isomorphic
to Kn . To each basis
v1 , v2 , . . . , vn
we associate the unique linear isomorphism F : Rn → V so that
F (e1 ) = v1 , F (e2 ) = v2 , . . . , F (en ) = vn .
Suppose that T : V → W is a linear map between finite dimensional vector
spaces, and we have a basis
v1 , v2 , . . . , vp ∈ V
and a basis
w1 , w2 , . . . , wq ∈ W.
Then we can write each element T vj somehow in terms of these w1 , w2 , . . . , wq ,
say
T vj = Σi Aij wi ,
for some numbers Aij . Let A be the matrix with entries Aij ; we say that A is
the matrix of T in these bases.
Let F : Rp → V be the isomorphism
F (ej ) = vj ,
and let G : Rq → W be the isomorphism
G (ei ) = wi .
Then G−1 T F : Rp → Rq is precisely the linear map
G−1 T F x = Ax,
given by the matrix A. The proof: clearly
Aej = j-th column of A = (A1j , A2j , . . . , Aqj ) = Σi Aij ei .
Therefore
GAej = Σi Aij wi = T vj = T F ej .
So GA = T F , or A = G−1 T F .
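Here is a minimal Python sketch of this recipe (the choice of map and bases is just for illustration): T is differentiation from polynomials of degree at most 3 to polynomials of degree at most 2, in the monomial bases, with each polynomial stored as its list of coefficients.

# Illustration only: the matrix of T = d/dx from V = polynomials of degree at most 3
# to W = polynomials of degree at most 2, in the monomial bases 1, x, x^2, x^3 and 1, x, x^2.

def T(coefficients):
    # d/dx of a0 + a1 x + a2 x^2 + a3 x^3 is a1 + 2 a2 x + 3 a3 x^2
    return [k * coefficients[k] for k in range(1, len(coefficients))]

basis_V = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]   # 1, x, x^2, x^3
columns = [T(v) for v in basis_V]      # the j-th column holds the coefficients of T(v_j)
A = [[columns[j][i] for j in range(4)] for i in range(3)]
for row in A:
    print(row)      # [0, 1, 0, 0], then [0, 0, 2, 0], then [0, 0, 0, 3]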
We can now just take all of the theorems we have previously proven about
matrices and prove them for linear maps between finite dimensional vector
spaces, by just replacing the linear map by its matrix. For example,
Theorem 1.15. Let T : U → V be a linear transformation of finite dimensional
vector spaces. Then
dim ker T + dim im T = dim U.
The proof is that the kernels and images are identified when we match up
T and A using F and G as above.
Definition 1.16. If T : U → V is a linear map, and W is a subspace of U , the
restriction, written T |W : W → V , is the linear map defined by T |W (w) = T w
for w in W , only allowing vectors from W to map through T .
Review problems
1.22 Prove that if linear maps satisfy P S = T and P is an isomorphism, then
S and T have the same kernel, and isomorphic images.
1.23 Prove that if linear maps satisfy SP = T , and P is an isomorphism, then
S and T have the same image and isomorphic kernels.
1.24 Prove that dimension is invariant under isomorphism.
1.25 Prove that the space of all p × q matrices is isomorphic to Rpq .
Quotient Spaces
A subspace W of a vector space V doesn’t usually have a natural choice of
complementary subspace. For example, if V = R2 , and W is the vertical axis,
then we might like to choose the horizontal axis as a complement to W . But this
choice is not natural, because we could carry out a linear change of variables,
fixing the vertical axis but not the horizontal axis (for example, a shear along
the vertical direction). There is a natural choice of vector space which plays
the role of a complement, called the quotient space.
Definition 1.17. If V is a vector space and W a subspace of V , and v a vector
in V , the translate v + W of W is the set of vectors in V of the form v + w where
w is in W .
The translates of the horizontal plane through 0 in R3 are just the
horizontal planes.
1.26 Prove that any subspace W will have
w + W = W,
for any w from W .
Remark 1.18. If we take W the horizontal plane (x3 = 0) in R3 , then the
translates
(0, 0, 1) + W and (7, 2, 1) + W
are the same, because we can write
(7, 2, 1) + W = (0, 0, 1) + (7, 2, 0) + W = (0, 0, 1) + W.
This is the main idea behind translates: two vectors make the same translate
just when their difference lies in the subspace.
Definition 1.19. If x + W and y + W are translates, we add them by (x + W ) +
(y + W ) = (x + y) + W . If s is a number, let s(x + W ) = sx + W .
1.27 Prove that addition and scaling of translates is well-defined, independent
of the choice of x and y in a given translate.
Definition 1.20. The quotient space V /W of a vector space V by a subspace W
is the set of all translates v + W of all vectors v in V .
Take V the plane, V = R2 , and W the vertical axis. The translates of
W are the vertical lines in the plane. The quotient space V /W has the
various vertical lines as its points. Each vertical line passes through the
horizontal axis at a single point, uniquely determining the vertical line.
So the translates are the points
(x, 0) + W.
The quotient space V /W is just identified with the horizontal axis, by
taking (x, 0) + W to x.
Lemma 1.21. The quotient space V /W of a vector space by a subspace is a
vector space. The map T : V → V /W given by the rule T x = x + W is an onto
linear map.
Remark 1.22. The concept of quotient space can be circumvented by using
some complicated matrices, as can everything in linear algebra, so that one
never really needs to use abstract vector spaces. But that approach is far more
complicated and confusing, because it involves a choice of basis, and there is
usually no natural choice to make. It is always easiest to carry out linear algebra
as abstractly as possible, descending into choices of basis at the latest possible
stage.
Proof. One has to check that (x + W ) + (y + W ) = (y + W ) + (x + W ), but
this follows from x + y = y + x clearly. Similarly all of the laws of vector spaces
hold. The 0 element of V /W is the translate 0 + W , i.e. W itself. To check
that T is linear, consider scaling: T (sx) = sx + W = s(x + W ), and addition:
T (x + y) = x + y + W = (x + W ) + (y + W ).
Lemma 1.23. If U and W are subspaces of a vector space V , and V = U ⊕ W
a direct sum of subspaces, then the map T : V → V /W taking vectors v to
v + W restricts to an isomorphism T |U : U → V /W .
Remark 1.24. So, while there is no natural complement to W , every choice of
complement is naturally identified with the quotient space.
Proof. The kernel of T is clearly U ∩ W = 0. To see that T is onto, take a
vector v + W in V /W . Because V = U ⊕ W , we can somehow write v as a sum
v = u + w with u from U and w from W . Therefore v + W = u + W = T |U u
lies in the image of T |U .
Theorem 1.25. If V is a finite dimensional vector space and W a subspace
of V , then
dim V /W = dim V − dim W.
Definition 1.26. If T : U → V is a linear map, and U0 ⊂ U and V0 ⊂ V are
subspaces, and T (U0 ) ⊂ V0 , we can define vector spaces U′ = U/U0 , V′ = V /V0
and a linear map T′ : U′ → V′ so that T′ (u + U0 ) = (T u) + V0 .
It is easy to check that T′ is a well defined linear map.
Determinants
Definition 1.27. If T : V → V is a linear map taking a finite dimensional vector
space to itself, define det T to be
det T = det A,
where F : Rn → V is an isomorphism, and A is the matrix associated to
F −1 T F : Rn → Rn .
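For a small worked example, take V to be the polynomials of degree at most 2 and T p(x) = p(x + 1); in the basis 1, x, x², the matrix of F −1 T F is upper triangular with 1's on the diagonal, so det T = 1. A quick check with sympy (assumed available here only as a convenience; any exact-arithmetic tool would do):

from sympy import Matrix

# Illustration only: T p(x) = p(x + 1) on polynomials of degree at most 2.
# In the basis 1, x, x^2: T(1) = 1, T(x) = 1 + x, T(x^2) = 1 + 2x + x^2,
# and the columns below are the coefficients of these images.
A = Matrix([[1, 1, 1],
            [0, 1, 2],
            [0, 0, 1]])
print(A.det())      # 1, so det T = 1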
Remark 1.28. There is no definition of determinant for a linear map of an
infinite dimensional vector space, and there is no general theory to handle such
things, although there are many important examples.
Remark 1.29. A map T : U → V between different vector spaces doesn’t have
a determinant.
1.28 Prove that the value of the determinant is independent of the choice of isomorphism F .
1.29 Let V be the vector space of polynomials of degree at most 2, and let
T : V → V be the linear map T p(x) = 2p(x − 1) (shifting a polynomial p(x) to
2p(x − 1).) For example, T 1 = 2, T x = 2(x − 1), T x2 = 2(x − 1)2 .
a. Prove that T is a linear map.
b. Prove that T is an isomorphism.
c. Find det T .
Theorem 1.30. Suppose that S : V → V and T : V → V are diagonalizable
linear maps, i.e. each has a basis of eigenvectors. Then ST = T S just when
there is a basis which consists of eigenvectors simultaneously for both S and T .
This is hard to prove for matrices, but easy in the abstract setting of linear
maps.
Proof. If there is a basis of simultaneous eigenvectors, then clearly the matrices
of S and T are diagonal in that basis, so commute, so ST = T S. Now suppose
that ST = T S.
Clearly the result is true if dim V = 1. More generally, clearly the result is
true if T = λI for any constant λ, because all vectors in V are then eigenvectors
of T .
More generally, for any eigenvalue λ of T , let Vλ be the λ-eigenspace of T .
Because T is diagonalizable, V is the direct sum
V = Vλ1 ⊕ Vλ2 ⊕ · · · ⊕ Vλp
of its eigenspaces, summed over the eigenvalues λ1 , λ2 , . . . , λp of T .
We claim that each Vλ is S-invariant, for each eigenvalue λ of T . The proof: pick
any vector v ∈ Vλ . We want to prove that Sv ∈ Vλ . Since Vλ is the λ-eigenspace
of T , clearly T v = λv. But then we need to prove that T (Sv) = λ(Sv). This is
easy:
T Sv = ST v,
= Sλv,
= λSv.
Review problems
1.30 Let T : V → V be the linear map T x = 2x. Suppose that V has dimension
n. What is det T ?
1.31 Let V be the vector space of all 2×2 matrices. Let A be a 2×2 matrix with
two different eigenvalues, λ1 and λ2 , and eigenvectors x1 and x2 corresponding
to these eigenvalues. Consider the linear map T : V → V given by T B =
AB (matrix multiplication of B on the left by A). What are the
eigenvalues of T and what are the eigenvectors? (Warning: the eigenvectors
are vectors from V , so they are matrices.) What is det T ?
1.32 The same but let T B = BA.
1.33 Let V be the vector space of polynomials of degree at most 2, and let
T : V → V be defined by T q(x) = q(−x). What is the characteristic polynomial
of T ? What are the eigenspaces of T ? Is T diagonalizable?
1.34 (Due to Peter Lax [4].) Consider the problem of finding a polynomial
p(x) with specified average values on each of a dozen intervals on the x-axis.
(Suppose that the intervals don’t overlap.) Does this problem have a solution?
Does it have many solutions? (All you need is a naive notion of average value,
but you can consult a calculus book, for example [9], for a precise definition.)
(a) For each polynomial p of degree n, let T p be the vector whose entries are
the averages. Suppose that the number of intervals is at least n. Show
that T p = 0 only if p = 0.
(b) Suppose that the number of intervals is no more than n. Show that we
can solve T p = b for any given vector b.
1.35 How many of the “nutshell” criteria for invertibility of a matrix can you
translate into criteria for invertibility of a linear map T : U → V ? How much
more if we assume that U and V are finite dimensional? How much more if we
assume as well that U = V ?
Complex Vector Spaces
If we change the definition of a vector space, a linear map, etc. to use complex
numbers instead of real numbers, we have a complex vector space, complex
linear map, etc. All of the examples so far in this chapter work just as well with
complex numbers replacing real numbers. We will refer to a real vector space
or a complex vector space to distinguish the sorts of numbers we are using to
scale the vectors. Some examples of complex vector spaces:
a. Cn
b. The space of p × q matrices with complex entries.
c. The space of complex-valued functions of a real variable.
d. The space of infinite sequences of complex numbers.
Inner Product Spaces
Definition 1.31. An inner product on a real vector space V is a choice of a real
number ⟨x, y⟩ for each pair of vectors x and y so that
a. ⟨x, y⟩ is a real-valued linear map in x for each fixed y
b. ⟨x, y⟩ = ⟨y, x⟩
c. ⟨x, x⟩ ≥ 0 and equal to 0 just when x = 0.
A real vector space equipped with an inner product is called an inner product
space. A linear map between vector spaces is called orthogonal if it preserves
inner products.
Theorem 1.32. Every inner product space of dimension n is carried by some
orthogonal isomorphism to Rn with its usual inner product.
Proof. Use the Gram–Schmidt process to construct an orthonormal basis, using
the same formulas we have used before, say u1 , u2 , . . . , un . Define a linear map
F x = x1 u1 + · · · + xn un , for x in Rn . Clearly F is an orthogonal isomorphism.
Take A any symmetric n × n matrix with positive eigenvalues, and let
⟨x, y⟩A = ⟨Ax, y⟩ (with the usual inner product on Rn appearing on
the right hand side). Then the expression ⟨x, y⟩A is an inner product.
Therefore by the theorem, we can find a change of variables taking it
to the usual inner product.
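A minimal numpy sketch of the Gram–Schmidt step from the proof, for the inner product ⟨x, y⟩A = ⟨Ax, y⟩ on R2, starting from the standard basis; the particular matrix A below is a made-up sample with positive eigenvalues.

import numpy as np

# Illustration only: Gram-Schmidt for the inner product <x, y>_A = <Ax, y>.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

def inner(x, y):
    return (A @ x) @ y

def gram_schmidt(vectors):
    basis = []
    for v in vectors:
        for u in basis:
            v = v - inner(v, u) * u              # remove the component along u
        basis.append(v / np.sqrt(inner(v, v)))   # normalize
    return basis

u1, u2 = gram_schmidt([np.array([1.0, 0.0]), np.array([0.0, 1.0])])
print(inner(u1, u1), inner(u1, u2), inner(u2, u2))   # 1.0, 0.0, 1.0 up to rounding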
Definition 1.33. A linear map T : V → V from an inner product space to itself
is symmetric if ⟨T v, w⟩ = ⟨v, T w⟩ for any vectors v and w.
Theorem 1.34 (Spectral Theorem). Given a symmetric linear map T on a
finite dimensional inner product space V , there is an orthogonal isomorphism
F : Rn → V for which F −1 T F is the linear map of a diagonal matrix.
Hermitian Inner Product Spaces
Definition 1.35. A Hermitian inner product on a complex vector space V is a
choice of a complex number ⟨z, w⟩ for each pair of vectors z and w from V so
that
a. ⟨z, w⟩ is a complex-valued linear map in z for each fixed w
b. ⟨z, w⟩ is the complex conjugate of ⟨w, z⟩
c. ⟨z, z⟩ ≥ 0 and equal to 0 just when z = 0.
Review problems
1.36 Let V be the vector space of complex-valued polynomials of a complex
variable of degree at most 3. Prove that for any four distinct points z0 , z1 , z2 , z3 ,
the expression
⟨p(z), q(z)⟩ = p (z0 ) q̄ (z0 ) + p (z1 ) q̄ (z1 ) + p (z2 ) q̄ (z2 ) + p (z3 ) q̄ (z3 )
is a Hermitian inner product.
1.37 Continuing the previous question, if the points z0 , z1 , z2 , z3 are z0 =
1, z1 = −1, z2 = i, z3 = −i, prove that the map T : V → V given by T p(z) =
p(−z) is unitary.
1.38 Continuing the previous two questions, unitarily diagonalize T .
1.39 State and prove a spectral theorem for normal complex linear maps
T : V → V on a Hermitian inner product space, and define the terms adjoint,
normal and unitary for complex linear maps V → V .
Chapter 2
Fields
Instead of real or complex numbers, we can dream up wilder notions of numbers.
Definition 2.1. A field is a set F equipped with operations + and · so that
a. Addition laws
1. x + y is in F
2. (x + y) + z = x + (y + z)
3. x + y = y + x
for any x, y and z from F .
b. Zero laws
1. There is an element 0 of F for which x + 0 = x for any x from F
2. For each x from F there is a y from F so that x + y = 0.
c. Multiplication laws
1. xy is in F
2. x(yz) = (xy)z
3. xy = yx
for any x, y and z in F .
d. Identity laws
1. There is an element 1 in F for which x1 = 1x = x for any x in F .
2. For each x ≠ 0 there is a y ≠ 0 for which xy = 1. (This y is called
the reciprocal or inverse of x.)
3. 1 ≠ 0
e. Distributive law
1. x(y + z) = xy + xz
for any x, y and z in F .
We will not ask the reader to check all of these laws in any of our examples,
because there are just too many of them. We will only give some examples; for
a proper introduction to fields, see Artin [1].
Of course, the set of real numbers R is a field (with the usual addition
and multiplication), as is the set C of complex numbers and the set Q
of rational numbers. The set Z of integers is not a field, because the
integer 2 has no integer reciprocal.
Let F be the set of all rational functions p(x)/q(x), with p(x) and q(x)
polynomials, and q(x) not the 0 polynomial. Clearly for any pair of
rational functions, the sum
p1 (x)/q1 (x) + p2 (x)/q2 (x) = (p1 (x)q2 (x) + q1 (x)p2 (x)) / (q1 (x)q2 (x))
is also rational, as is the product, and the reciprocal.
2.1 Suppose that F is a field. Prove the uniqueness of 0, i.e. that there is only
one element z in F which satisfies x + z = x for any element x (namely z = 0).
2.2 Prove the uniqueness of 1.
2.3 Let x be an element of a field F . Prove the uniqueness of the element y
for which x + y = 0.
Henceforth, we write this y as −x.
2.4 Let x be an element of a field F . If x ≠ 0, prove the uniqueness of the
reciprocal.
Henceforth, we write the reciprocal of x as 1/x, and write x + (−y) as x − y.
Some Finite Fields
Let F be the set of numbers F = {0, 1}. Carry out multiplication by
the usual rule, but when you add, x + y won’t mean the usual addition,
but instead will mean the usual addition except when x = y = 1, and
then we set 1 + 1 = 0. F is a field called the field of Boolean numbers.
2.5 Prove that for Boolean numbers, −x = x and 1/x = x.
Suppose that p is a positive integer. Let F be the set of numbers
Fp = {0, 1, 2, . . . , p − 1}. Define addition and multiplication as usual
for integers, but if the result is bigger than p−1, then subtract multiples
of p from the result until it lands in Fp , and let that be the definition
of addition and multiplication. F2 is the field of Boolean numbers. We
usually write x = y (mod p) to mean that x and y differ by a multiple
of p.
For example, if p = 7, we find
5 · 6 = 30 (mod 7) = 30 − 28 (mod 7) = 2 (mod 7).
This is arithmetic in F7 .
It turns out that Fp is a field for any prime number p.
2.6 Prove that Fp is not a field if p is not prime.
The only trick in seeing that Fp is a field is to see why there is a reciprocal.
It can't be the usual reciprocal as a number. For example, if p = 7,
6 · 6 = 36 (mod 7) = 36 − 35 (mod 7) = 1 (mod 7)
(because 35 is a multiple of 7). So 6 has reciprocal 6 in F7 .
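A two-line sanity check of these computations in Python:

print((5 * 6) % 7)   # 2
print((6 * 6) % 7)   # 1, so 6 is its own reciprocal in F7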
The Euclidean Algorithm
To compute reciprocals, we first need to find greatest common divisors,
using the Euclidean algorithm. The basic idea: given two numbers, for
example 12132 and 2304, divide the smaller into the larger, writing a
quotient and remainder:
12132 − 5 · 2304 = 612.
Take the two last numbers in the equation (2304 and 612 in this example), and repeat the process on them, and so on:
2304 − 3 · 612 = 468
612 − 1 · 468 = 144
468 − 3 · 144 = 36
144 − 4 · 36 = 0.
Stop when you hit a remainder of 0. The greatest common divisor of
the numbers you started with is the last nonzero remainder (36 in our
example).
Now that we can find the greatest common divisor, we will need to
write the greatest common divisor as an integer linear combination of
the original numbers. If we write the two numbers we started with
as a and b, then our goal is to compute integers u and v for which
ua + bv = gcd(a, b). To do this, let's go backwards. Start with the
second to last equation, giving the greatest common divisor.
36 = 468 − 3 · 144
Plug the previous equation into it:
= 468 − 3 · (612 − 1 · 468)
Simplify:
= −3 · 612 + 4 · 468
Plug in the equation before that:
= −3 · 612 + 4 · (2304 − 3 · 612)
= 4 · 2304 − 15 · 612
= 4 · 2304 − 15 · (12132 − 5 · 2304)
= −15 · 12132 + 79 · 2304.
We have it: gcd(a, b) = u a + b v, in our case 36 = −15 · 12132 + 79 · 2304.
What does this algorithm do? At each step downward, we are facing an
equation like a − bq = r, so any number which divides into a and b must
divide into r and b (the next a and b) and vice versa. The remainders
r get smaller at each step, always smaller than either a or b. On the
last line, b divides into a. Therefore b is the greatest common divisor
of a and b on the last line, and so is the greatest common divisor of the
original numbers. We express each remainder in terms of previous a
and b numbers, so we can plug them in, cascading backwards until we
express the greatest common divisor in terms of the original a and b.
In the example, that gives (−15)(12132) + (79)(2304) = 36.
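The same bookkeeping can be automated; here is a minimal Python sketch (the function name extended_gcd is just a label for this illustration) that runs the Euclidean algorithm while tracking the coefficients, returning g = gcd(a, b) together with integers u and v satisfying u·a + v·b = g.

def extended_gcd(a, b):
    # returns (g, u, v) with g = gcd(a, b) and u*a + v*b = g
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    return old_r, old_u, old_v

print(extended_gcd(12132, 2304))   # (36, -15, 79): 36 = -15*12132 + 79*2304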
Let's compute a reciprocal modulo an integer: the reciprocal of 17 modulo
1009. Take a = 1009, and b = 17.
1009 − 59 · 17 = 6
17 − 2 · 6 = 5
6−1·5=1
5 − 5 · 1 = 0.
Going backwards
1=6−1·5
= 6 − 1 · (17 − 2 · 6)
= −1 · 17 + 3 · 6
= −1 · 17 + 3 · (1009 − 59 · 17)
= −178 · 17 + 3 · 1009.
So finally, modulo 1009, −178 · 17 = 1. So 17^(−1) = −178 = 1009 − 178 =
831 (mod 1009).
This is how we can compute reciprocals in Fp : we take a = p, and b
the number to reciprocate, and apply the process. If p is prime, the
resulting greatest common divisor is 1, and so we get up + vb = 1, and
so vb = 1 (mod p), so v is the reciprocal of b.
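In Python 3.8 and later the built-in pow accepts a modulus together with the exponent −1, so the example above can be checked in one line:

print(pow(17, -1, 1009))    # 831
print((831 * 17) % 1009)    # 1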
2.7 Compute 15−1 in F79 .
2.8 Solve the linear equation 3x + 1 = 0 in F5 .
2.9 Prove that Fp is a field whenever p is a prime number.
Matrices
Matrices with entries from any field F are added, subtracted, and multiplied by
the same rules. We can still carry out forward elimination, back substitution,
calculate inverses, determinants, characteristic polynomials, eigenvectors and
eigenvalues, using the same steps.
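As a sketch of what carrying out the same steps looks like in practice, here is Gauss–Jordan elimination over Fp in Python, with every operation reduced mod p and reciprocals supplied by pow(x, −1, p) (Python 3.8 or later); the sample matrix over F5 is made up for illustration.

def inverse_mod_p(M, p):
    # Gauss-Jordan elimination on the block matrix [M | I], with all arithmetic mod p.
    n = len(M)
    A = [[M[i][j] % p for j in range(n)] + [1 if i == j else 0 for j in range(n)]
         for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return None                       # M is not invertible over F_p
        A[col], A[pivot] = A[pivot], A[col]
        inv = pow(A[col][col], -1, p)         # reciprocal in F_p (Python 3.8+)
        A[col] = [(x * inv) % p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [(A[r][j] - factor * A[col][j]) % p for j in range(2 * n)]
    return [row[n:] for row in A]

print(inverse_mod_p([[2, 3], [1, 1]], 5))     # [[4, 3], [1, 3]]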
2.10 Let F be the Boolean numbers, and A the matrix
A = ( 0 1 0
      1 0 1
      1 1 0 ) ,
thought of as having entries from F . Is A invertible? If so, find A−1 .
All of the ideas of linear algebra worked out for the real and complex numbers
have obvious analogues over any field, except for the concept of inner product,
which is much more sophisticated. From now on, we will only state and prove
results for real vector spaces, but those results which do not require inner
products (or orthogonal or unitary matrices) continue to hold with identical
proofs over any field.
2.11 If A is a matrix whose entries are rational functions of a variable t, prove
that the rank of A is constant in t, except for finitely many values of t.
Polynomials
Consider the field
F2 = {0, 1} .
Consider the polynomial
p(x) = x² + x.
Clearly
p(0) = 0² + 0 = 0.
Keeping in mind that 2 = 0 in F2 , clearly
p(1) = 1² + 1 = 1 + 1 = 0.
Therefore p(x) = 0 for any value of x in F2 . So p(x) is zero, as a function.
But we will still want to say that p(x) is not zero as a polynomial, because
it is x² + x, a sum of powers of x with nonzero coefficients. We should think
of polynomials as abstract expressions, sums of constants times powers of a
variable x, and distinguish them from polynomial functions. Think of x as
just a symbol, abstractly, not representing any value. So p(x) is nonzero as a
polynomial (because it has nonzero coefficients), but p(x) is zero as a polynomial
function.
A rational function is a ratio p(x)/q(x) of polynomials, with q(x) not the
zero polynomial. CAREFUL: it isn’t really a function, and should probably
be called something like a rational expression. We are stuck with the standard
terminology here. We consider two such expressions to be the same after
cancellation of any common factor from numerator and denominator. So 1/x is
a rational function, in any field, and x/x2 = 1/x in any field. Define addition,
subtraction, multiplication and division of rational functions as you expect. For
example,
1/x + 1/(x + 1) = (2x + 1)/(x(x + 1)),
over any field.
CAREFUL: over the field F2 , we know that x² + x vanishes for every x. So
the rational function
f (x) = 1/(x² + x)
is actually not defined, no matter what value of x you plug in, because the
denominator vanishes. But we still consider it a perfectly well defined rational
function, since it is made out of perfectly well defined polynomials.
If x is an abstract variable (think of just a letter, not a value taken from any
field), then we write F (x) for the set of all rational functions p(x)/q(x). Clearly
F (x) is a field. For example, F2 (x) contains 0, 1, x, 1 + x, 1/x, 1/(x + 1), . . . .
Subfields
If K is a field, a subfield F ⊂ K is a subset containing 0 and 1 so that if a and
b are in F , then a + b, a − b and ab are in F , and, if b ≠ 0, then a/b is in F .
In particular, F is itself a field. For example, Q ⊂ R, R ⊂ C, and Q ⊂ C are
subfields. Another example: if F is any field, then F ⊂ F (x) is a subfield.
2.12 Find all of the subfields of F7 .
2.13 Find a subfield of C other than R or Q.
Example: Over the field R, the polynomial x² + 1 has no roots. A polynomial
p(x) with coefficients in a field F splits if it is a product of linear factors. If
F ⊂ K is a subfield, we say that a polynomial p(x) splits over K if it splits
into a product of linear factors, allowing the factors to have coefficients from
K. Example: x² + 1 splits over C:
x² + 1 = (x − i) (x + i) .
If F ⊂ K is a subfield, then K is an F -vector space. For example, C is an
R-vector space of dimension 2. The dimension of K as an F -vector space is
called the degree of K over F .
Splitting fields
We won’t prove the following theorem:
Theorem 2.2. If F is a field and p(x) is a polynomial over F , then there is
a field K containing F as a subfield, over which p(x) splits into linear factors,
and so that every element of K is expressible as a rational function of the roots
of p(x) with coefficients from F . Moreover, K has finite degree over F .
This field K is uniquely determined up to an isomorphism of fields, and is
called the splitting field of p(x) over F .
For example, over F = R the polynomial p(x) = x² + 1 has splitting field C:
x² + 1 = (x − i) (x + i) .
Example of a splitting field
Consider the polynomial p(x) = x² + x + 1 over the finite field F2 . Let's look
for roots of p(x). Try x = 0:
p(0) = 0² + 0 + 1 = 1,
no good. Try x = 1:
p(1) = 1² + 1 + 1 = 1 + 1 + 1 = 1,
since 1 + 1 = 0. No good. So p(x) has no roots in F2 . We know by
theorem 2.2 that there is some splitting field K for p(x), containing F2 , so that
p(x) splits into linear factors over K, say
p(x) = (x − α) (x − β) ,
for some α, β ∈ K.
What can we say about this mysterious field K? We know that it contains
F2 , contains α, contains β, and that everything in it is made up of rational
functions over F2 with α and β plugged in for the variables. We also know that
K has finite dimension over F2 . Otherwise, K is a total mystery: we don’t
know its dimension, or a basis of it over F , or its multiplication or addition
rules, or anything. We know that in F2 , 1 + 1 = 0. Since F2 ⊂ K is a subfield,
this holds in K as well. So in K, for any c ∈ K,
c(1 + 1) = c0 = 0.
Therefore c + c = 0 in K, for any element c ∈ K. Roughly speaking, the
arithmetic rules in F2 impose themselves in K as well.
A clever trick, which you probably wouldn’t notice at first: it turns out that
β = α + 1. Why? Clearly by definition, α is a root of p(x), i.e.
α² + α + 1 = 0.
So then let's try α + 1 and check that it is also a root:
(α + 1)² + (α + 1) + 1 = α² + 2α + 1 + α + 1 + 1,
but numbers in K cancel in pairs, c + c = 0, so
= α² + α + 1,
= 0
since α is a root of p(x). Therefore elements of K can be written in terms of
α purely.
The next fact about K: clearly
{0, 1, α, α + 1} ⊂ K.
We want to prove that
{0, 1, α, α + 1} = K.
How? First, let's try to make up an addition table for these 4 elements:

  +      0      1      α      α+1
  0      0      1      α      α+1
  1      1      0      α+1    α
  α      α      α+1    0      1
  α+1    α+1    α      1      0
To make up a multiplication table, we need to note that
0 = α² + α + 1,
so that
α² = −α − 1 = α + 1,
and
α(α + 1) = α² + α = α + 1 + α = 1.
Therefore
(α + 1) (α + 1) = α² + 2α + 1 = α.
This gives the complete multiplication table:

  ·      0      1      α      α+1
  0      0      0      0      0
  1      0      1      α      α+1
  α      0      α      α+1    1
  α+1    0      α+1    1      α
Looking for reciprocals, we find that 1/0 does not exist, while
1/1 = 1,    1/α = α + 1,    1/(α + 1) = α.
So {0, 1, α, α + 1} is a field, containing F2 , and p(x) splits over this field, and
the field is generated by F2 and α, so this field must be the splitting field of
p(x):
{0, 1, α, α + 1} = K.
So K is the finite field with 4 elements, K = F4 .
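The tables are easy to reproduce mechanically; in the Python sketch below, an element c0 + c1 α of K is stored as the pair (c0, c1) with c0, c1 in F2, and multiplication uses the relation α² = α + 1.

# An element c0 + c1*alpha of K is stored as the pair (c0, c1), with c0, c1 in F_2.
def add(a, b):
    return ((a[0] + b[0]) % 2, (a[1] + b[1]) % 2)

def mul(a, b):
    # (a0 + a1 alpha)(b0 + b1 alpha) = a0 b0 + (a0 b1 + a1 b0) alpha + a1 b1 alpha^2,
    # and alpha^2 = alpha + 1, so the alpha^2 term contributes a1 b1 to both coefficients.
    c0 = (a[0] * b[0] + a[1] * b[1]) % 2
    c1 = (a[0] * b[1] + a[1] * b[0] + a[1] * b[1]) % 2
    return (c0, c1)

alpha, one = (0, 1), (1, 0)
print(mul(alpha, add(alpha, one)))            # (1, 0): alpha (alpha + 1) = 1
print(mul(add(alpha, one), add(alpha, one)))  # (0, 1): (alpha + 1)(alpha + 1) = alpha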
2.14 Consider the polynomial
p(x) = x³ + x² + 1
over the field F2 . Suppose that the splitting field K of p(x) contains a root α
of p(x). Prove that α² and 1 + α + α² are the two other roots. Compute the
addition table and the multiplication table of the 8 elements
0, 1, α, 1 + α, α², 1 + α², α + α², 1 + α + α².
Use this to prove that
K = {0, 1, α, 1 + α, α², 1 + α², α + α², 1 + α + α²},
so K is the finite field F8 .
Construction of splitting fields
Suppose that F is a field and p(x) is a polynomial over F . We say that p(x) is
irreducible if p(x) does not factor into a product of polynomials of lower degree over F . Basic fact: if
p(x) is irreducible, and p(x) divides a product q(x)r(x), then p(x) must divide
one of the factors q(x) or r(x).
Suppose that p(x) is a nonconstant irreducible polynomial. (Think for
example of x2 + 1 over F = R, to have some concrete example in mind.) We
have no roots of p(x) in F , so can we construct a splitting field explicitly?
Let V be the vector space of all polynomials over F in a variable x. Let
W ⊂ V be the subspace consisting of all polynomials divisible by p(x). Clearly
if p(x) divides two polynomials, then it divides their sum, and their scalings,
so W ⊂ V is a subspace. Let K = V /W . So K is a vector space.
Every element of K is a translate, so has the form
q(x) + W,
for some polynomial q(x). Any two translates, say q(x) + W and r(x) + W , are
equal just when q(x) − r(x) ∈ W , as in our general theory of quotient spaces.
So this happens just when q(x) − r(x) is divisible by p(x). In other words,
if you write down a translate q(x) + W ∈ K and I write down a translate
r(x) + W ∈ K, then these will be the same translate just exactly when
r(x) = q(x) + p(x)s(x),
for some polynomial s(x) over F .
So far K is only a vector space. Let's make K into a field. We already know how
to add elements of K, since K is a vector space. How do we multiply elements?
Take two elements, say
q(x) + W, r(x) + W,
and try to define their product to be
q(x)r(x) + W.
Is this well defined? If I write the same translates down differently, I could
write them as
q(x) + Q(x)p(x) + W, r(x) + R(x)p(x) + W,
and my product would turn out to be
(q(x) + Q(x)p(x)) (r(x) + R(x)p(x)) + W
=q(x)r(x) + p(x) (q(x)R(x) + Q(x)r(x) + Q(x)R(x)) + W,
=q(x)r(x) + W,
the same translate, since your result and mine agree up to multiples of p(x), so
represent the same translate. So now we can multiply elements of K.
The next claim is that K is a finite dimensional vector space. This is not
obvious, since K = V /W and both V and W are infinite dimensional vector
spaces. Take any element of K, say q(x) + W , and divide q(x) by p(x), say
q(x) = p(x)Q(x) + r(x),
a quotient Q(x) and remainder r(x). Clearly q(x) differs from r(x) by a multiple
of p(x), so
q(x) + W = r(x) + W.
Therefore every element of K can be written as a translate
r(x) + W
for r(x) a polynomial of degree less than the degree of p(x). Clearly
r(x) is unique, since we can’t quotient out anything of lower degree than p(x).
Therefore K is identified as a vector space with the vector space of polynomials
in x of degree less than the degree of p(x).
The notation is much nicer if we write x + W as, say, α. Then clearly
x2 + W = α2 , etc. so q(x) + W = q(α) for any polynomial q(x) over F . So we
can say that α ∈ K is an element so that p(α) = 0, so p(x) has a root over K.
Moreover, every element of K is a polynomial over F in the element α.
We need to check that K is a field. The hard bit is checking that every
element of K has a reciprocal. Pick any element q(α) ∈ K. We want to
prove that q(α) has a reciprocal, i.e. that there is some element r(α) so that
q(α)r(α) = 1. Fix q(α) and consider the F -linear map
T : K → K,
given by
T (r(α)) = q(α)r(α).
If q(α) ≠ 0, then T (r(α)) = 0 just when q(α)r(α) = 0, i.e. just when q(x)r(x)
is divisible by p(x). But since p(x) is irreducible, we know that p(x) divides
q(x)r(x) just when p(x) divides one of q(x) or r(x). But then q(x) + W = 0 or
r(x) + W = 0, i.e. q(α) = 0 or r(α) = 0. We know by hypothesis that q(α) ≠ 0,
so r(α) = 0. In other words, the kernel of T is {0}. Therefore T is invertible.
So T is an isomorphism of F -vector spaces. In particular, T is onto. So there
must exist some r(α) ∈ K so that
T (r(α)) = 1,
i.e.
q(α)r(α) = 1,
so q(α) has a reciprocal,
r(α) = 1/q(α).
The remaining axioms of fields are easy to check, so K is a field, containing
F , and containing a root α for p(x). Every element of K is a polynomial in α.
The dimension of K over F is finite. We only need to check that p(x) splits
over K into a product of linear factors. Clearly we can split off one linear
factor: x − α, since α is a root of p(x) over K. Inductively, if p(x) doesn’t
completely split into a product of linear factors, we can try to factor out as many
linear factors as possible, and then repeat the whole process for any remaining
nonlinear factors.
If you have to calculate in K, how do you do it? The elements of K look like
q(α), where α is just some formal symbol, and you add and multiply as usual
with polynomials. But we can always assume that q(α) is a polynomial in α of
degree less than the degree of p(x), and then subtract off any p(x)-multiples
when we multiply or add, since p(α) = 0. For example, if p(x) = x² + 1, and
F = R, then the field K consists of expressions like q(α) = b + cα, where b and
c are any real numbers. When we multiply, we just replace α² + 1 by 0, i.e.
replace α² by −1. So we just get K being the usual complex numbers.
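The recipe of the last paragraph is easy to automate. In the Python sketch below (the helper names are made up for this illustration), an element of K = F[x]/(p(x)) is stored as the coefficient list of a polynomial of degree less than deg p, products are computed as polynomials and then reduced by subtracting off multiples of p(x); with p(x) = x² + 1 over F = R this recovers complex multiplication.

# Illustration only: arithmetic in K = F[x]/(p(x)), elements stored as coefficient lists.

def poly_mul(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def reduce_mod(a, p):
    # subtract off multiples of the monic polynomial p(x) until the degree drops below deg p
    a = a[:]
    while len(a) >= len(p):
        factor, shift = a[-1], len(a) - len(p)
        for i in range(len(p)):
            a[shift + i] -= factor * p[i]
        while a and a[-1] == 0:
            a.pop()
    return a

p = [1.0, 0.0, 1.0]        # p(x) = x^2 + 1 over F = R, so K is the complex numbers
u = [2.0, 3.0]             # 2 + 3 alpha
v = [4.0, -1.0]            # 4 - alpha
print(reduce_mod(poly_mul(u, v), p))   # [11.0, 10.0], matching (2 + 3i)(4 - i) = 11 + 10i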
Transcendental numbers
Some boring examples: a number a ∈ C is algebraic if it is the solution of a
polynomial equation p(a) = 0 where p(x) is a nonzero polynomial with rational
coefficients. A number which is not algebraic is called transcendental. If x is
an abstract variable, and F is a field, let
F (x)
be the set of all rational functions p(x)/q(x) in the variable x with p(x) and
q(x) polynomials with coefficients from F .
Theorem 2.3. Take an abstract variable x. A number a ∈ C is transcendental
if and only if the field
Q(a) = { p(a)/q(a) : p(x)/q(x) ∈ Q(x) and q(a) ≠ 0 }
is isomorphic to Q(x).
Proof. If a is transcendental, then the map
φ : p(x)/q(x) ∈ Q(x) ↦ p(a)/q(a) ∈ Q(a)
is clearly well defined, onto, and preserves all arithmetic operations. Is φ 1-1?
Equivalently, does φ have trivial kernel? Suppose that p(x)/q(x) lies in the
kernel of φ. Then p(a) = 0. Therefore p(x) = 0. Therefore p(x)/q(x) = 0. So
the kernel is trivial, and so φ is a bijection preserving all arithmetic operations,
so φ is an isomorphism of fields.
On the other hand, take any complex number a and suppose that there is
some isomorphism of fields
ψ : Q(x) → Q(a).
Let b = ψ(x). Because ψ is a field isomorphism, all arithmetic operations
carried out on x must then be matched up with arithmetic operations carried
out on b, so
p(x)
p(b)
ψ
=
.
q(x)
q(b)
Because ψ is an isomorphism, some element must map to a, say
p0 (x)
ψ
= a.
q0 (x)
So
p0 (b)
= a.
q0 (b)
So Q(b) = Q(a). Any algebraic relation on a clearly gives one on b and vice
versa. Therefore a is algebraic if and only if b is. Suppose that b is algebraic.
Then q(b) = 0 for some polynomial q(x), and then ψ is not defined on 1/q(x),
a contradiction.
Chapter 3
Direct Sums of Subspaces
Subspaces have a kind of arithmetic.
Definition 3.1. The intersection of two subspaces is the collection of vectors
which belong to both of the subspaces. We will write the intersection of subspaces U and V as U ∩ V .
The subspace U of R3 given by the vectors of the form
(x1 , x2 , x1 − x2 )
intersects the subspace V consisting in the vectors of the form
(x1 , 0, x3 )
in the subspace written U ∩ V , which consists in the vectors of the form
(x1 , 0, x1 ) .
Definition 3.2. If U and V are subspaces of a vector space W , write U + V for
the set of vectors w of the form w = u + v for some u in U and v in V ; call
U + V the sum.
3.1 Prove that U + V is a subspace of W .
Definition 3.3. If U and V are two subspaces of a vector space W , we will write
U + V as U ⊕ V (and say that U ⊕ V is a direct sum) to mean that every vector
x of U + V can be written uniquely as a sum x = y + z with y in U and z in
V . We will also say that U and V are complementary, or complements of one
another.
R3 = U ⊕ V for U the subspace consisting of the vectors
x = (x1 , x2 , 0)
and V the subspace consisting of the vectors
x = (0, 0, x3 ) ,
since we can write any vector x uniquely as
x = (x1 , x2 , 0) + (0, 0, x3 ) .
3.2 Give an example of two subspaces of R3 which are not complementary.
Theorem 3.4. U + V is a direct sum U ⊕ V just when U ∩ V consists of just
the 0 vector.
Proof. If U +V is a direct sum, then we need to see that U ∩V only contains the
zero vector. If it contains some vector x, then we can write x uniquely as a sum
x = y + z, but we can also write x = (1/2)x + (1/2)x or as x = (1/3)x + (2/3)x,
as a sum of vectors from U and V . Therefore x = 0.
On the other hand, if there is more than one way to write x = y + z = Y + Z
for some vectors y and Y from U and z and Z from V , then 0 = (y−Y )+(z −Z),
so Y − y = z − Z, a nonzero vector from U ∩ V .
Lemma 3.5. If U ⊕ V is a direct sum of subspaces of a vector space W , then
the dimension of U ⊕ V is the sum of the dimensions of U and V . Moreover,
putting together any basis of U with any basis of V gives a basis of U ⊕ V .
Proof. Pick a basis for U , say u1 , u2 , . . . , up , and a basis for V , say v1 , v2 , . . . , vq .
Then consider the set of vectors given by throwing all of the u’s and v’s together.
The u’s and v’s are linearly independent of one another, because any linear
relation
0 = a1 u1 + a2 u2 + · · · + ap up + b1 v1 + b2 v2 + · · · + bq vq
would allow us to write
a1 u1 + a2 u2 + · · · + ap up = − (b1 v1 + b2 v2 + · · · + bq vq ) ,
so that a vector from U (the left hand side) belongs to V (the right hand side),
which is impossible unless that vector is zero, because U and V intersect only
Application: Simultaneously Diagonalizing Several Matrices
at 0. But that forces 0 = a1 u1 + a2 u2 + · · · + ap up . Since the u’s are a basis,
this forces all a’s to be zero. The same for the b’s, so it isn’t a linear relation.
Therefore the u’s and v’s put together give a basis for U ⊕ V .
We can easily extend these ideas to direct sums with many summands
U1 ⊕ U2 ⊕ · · · ⊕ Uk .
3.3 Prove that if U ⊕V = W , then any linear maps S : U → Rp and T : V → Rp
determine a unique linear map Q : W → Rp written Q = S⊕T , so that Q|U = S
and Q|V = T .
Application: Simultaneously Diagonalizing Several Matrices
Theorem 3.6. Suppose that T1 , T2 , . . . , TN are linear maps taking a vector
space V to itself, each diagonalizable, and each commuting with the other (which
means T1 T2 = T2 T1 , etc.) Then there is a single invertible linear map F : Rn →
V diagonalizing all of them.
Proof. Since T1 and T2 commute, if x is an eigenvector of T1 with eigenvalue
λ, then T2 x is too:
T1 (T2 x) = T2 T1 x
= T2 λx
= λ (T2 x) .
So each eigenspace of T1 is invariant under T2 . The same is true for any two
of the linear maps T1 , T2 , . . . , TN . Because T1 is diagonalizable, V is a direct
sum of the eigenspaces of T1 . So it suffices to find a basis for each eigenspace of
T1 , which diagonalizes all of the linear maps. It suffices to prove this on each
eigenspace separately. So let's restrict to an eigenspace of T1 , where T1 = λ1 I.
So T1 = λ1 I is diagonal in any basis. By the same reasoning applied to T2 , etc.,
we can work on a common eigenspace of all of the Tj , arranging that T2 = λ2 I,
etc., diagonal in any basis.
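A small numerical illustration with numpy: the two symmetric matrices below commute, and since S happens to have distinct eigenvalues, any orthonormal eigenbasis of S already diagonalizes T as well; when eigenvalues repeat, one refines the basis inside each eigenspace, as in the proof.

import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
T = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(np.allclose(S @ T, T @ S))    # True: S and T commute
eigenvalues, Q = np.linalg.eigh(S)  # columns of Q: an orthonormal eigenbasis of S
print(np.round(Q.T @ S @ Q, 10))    # diagonal
print(np.round(Q.T @ T @ Q, 10))    # diagonal as well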
Transversality
Lemma 3.7. If V is a finite dimensional vector space, containing two subspaces
U and W , then
dim U + dim W = dim(U + W ) + dim (U ∩ W ) .
Proof. Take any basis for U ∩ W . Then while you pick some more vectors from
U to extend it to a basis of U , I will simultaneously pick some more vectors
from W to extend it to a basis of W . Clearly we can throw our vectors together
to get a basis of U + W . Count them up.
This lemma makes certain inequalities on dimensions obvious.
Lemma 3.8. If U and W are subspaces of an n dimensional vector space V ,
say of dimensions p and q, then
max {0, p + q − n} ≤ dim (U ∩ W ) ≤ min {p, q} ,
max {p, q} ≤ dim (U + W ) ≤ min {n, p + q} .
Proof. All inequalities but the first are obvious. The first follows from the last
by applying lemma 3.7 on the previous page.
3.4 How few dimensions can the intersection of subspaces of dimensions 5 and
3 in R7 have? How many?
Definition 3.9. Two subspaces U and W of a finite dimensional vector space V
are transverse if U + W = V .
3.5 How few dimensions can the intersection of transverse subspaces of dimensions 5 and 3 in R7 have? How many?
3.6 Must subspaces in direct sums be transverse?
Computations
In Rn , all abstract concepts of linear algebra become calculations.
3.7 Suppose that U and W are subspaces of Rn . Take a basis for U , and put
it into the columns of a matrix, and call that matrix A. Take a basis for W ,
and put it into the columns of a matrix, and call that matrix B. How do you
find a basis for U + W ? How do you see if U + W is a direct sum?
Proposition 3.10. Suppose that U and W are subspaces of Rn and that A
and B are matrices whose columns give bases for U and W respectively. Apply
Gaussian elimination to find a basis for the kernel of (A B), say the vectors
(x1 , y1 ) , (x2 , y2 ) , . . . , (xs , ys ) ,
written in blocks matching the columns of A and of B. Then the vectors
Ax1 , Ax2 , . . . , Axs form a basis for the intersection of U and W .
Proof. For example, Ax1 + By1 = 0, so Ax1 = −By1 = B(−y1 ) lies in the
image of A and of B. Therefore the vectors Ax1 , Ax2 , . . . , Axs lie in U ∩ W .
Suppose that some vector v also lies in U ∩ W . Then v = Ax = B(−y) for
some vectors x and y. But then Ax + By = 0, so (x, y) is a linear combination
of the vectors
(x1 , y1 ) , (x2 , y2 ) , . . . , (xs , ys ) ,
so x = Σj aj xj for some numbers aj , and v = Ax = Σj aj Axj . Therefore these
Axj span the intersection.
Suppose they suffer some linear relation: 0 = Σj cj Axj . So 0 = A Σj cj xj .
But the columns of A are linearly independent, so A is 1-1. Therefore
0 = Σj cj xj . At the same time,
0 = Σj cj Axj = Σj cj B(−yj ) = B(−Σj cj yj ).
But B is also 1-1, so 0 = Σj cj yj . So
0 = Σj cj (xj , yj ) .
But these vectors are linearly independent by construction.
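Here is the proposition carried out with sympy on a made-up pair of subspaces: U is the span of the columns of A, W the span of the columns of B, and a basis of U ∩ W comes from applying A to the x-blocks of a kernel basis of (A B).

from sympy import Matrix

# U = span of the columns of A, W = span of the columns of B (made-up example)
A = Matrix([[1, 0],
            [0, 1],
            [0, 0]])
B = Matrix([[1, 0],
            [1, 0],
            [0, 1]])
kernel = (A.row_join(B)).nullspace()     # basis of the kernel of the block matrix (A B)
intersection = [A * v[:A.cols, :] for v in kernel]
print(intersection)                      # one vector, proportional to (1, 1, 0)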
Jordan Normal Form
Chapter 4
Jordan Normal Form
We can’t quite diagonalize every linear map, but we will see how close we can come:
the Jordan normal form.
Jordan Normal Form and Strings
Diagonalizing is powerful. By diagonalizing a linear map, we can see what
it does completely, and we can easily carry out immense computations; for
example, finding large powers of a linear map. The trouble is that some linear
maps can’t be diagonalized. How close can we come?
A = ( 0 1
      0 0 )
is the simplest possible example. Its only eigenvalue is λ = 0. As a real
or complex matrix, it has only one eigenvector,
( 1
  0 ) ,
up to rescaling. Not enough eigenvectors to form a basis of R2 , so not
enough to diagonalize. We will build this entire chapter from this simple
example.
4.1 What does the map taking x to Ax look like for this matrix A?
Let's write
∆1 = (0) ,
∆2 = ( 0 1
       0 0 ) ,
∆3 = ( 0 1 0
       0 0 1
       0 0 0 ) ,
and in general write ∆n or just ∆ for the square matrix with 1's just above the
diagonal, and 0's everywhere else.
4.2 Prove that ∆ej = ej−1 , except for ∆e1 = 0. So we can think of ∆ as
shifting the standard basis, like the proverbial lemmings stepping forward until
e1 falls off the cliff:
e_n \xrightarrow{\;∆\;} e_{n−1} \xrightarrow{\;∆\;} \cdots \xrightarrow{\;∆\;} e_2 \xrightarrow{\;∆\;} e_1 \xrightarrow{\;∆\;} 0
A matrix of the form λ + ∆ is called a Jordan block. Our goal in this chapter
is to prove:
Theorem 4.1. Suppose that T : V → V is a linear map on a finite dimensional
vector space V over the field K = R or C, and that all of the eigenvalues of T
lie in the field K. Then there is a basis in which the matrix of T is in Jordan
normal form:
\begin{pmatrix} λ_1 I + ∆ & & & \\ & λ_2 I + ∆ & & \\ & & \ddots & \\ & & & λ_N I + ∆ \end{pmatrix},
broken into Jordan blocks.
A λ-string for a linear map T : V → V with eigenvalue λ is a collection of
nonzero vectors of the form
v \xrightarrow{\;T−λI\;} (T − λI)v \xrightarrow{\;T−λI\;} \cdots \xrightarrow{\;T−λI\;} (T − λI)^{k−1} v \xrightarrow{\;T−λI\;} (T − λI)^k v \xrightarrow{\;T−λI\;} 0.
4.3 For T = λI + ∆, show that e1 , e2 , . . . , en is a λ-string.
4.4 Find strings of
T = \begin{pmatrix}
2 & 0 & 0 & 0 & 0 & 0 \\
0 & 2 & 1 & 0 & 0 & 0 \\
0 & 0 & 2 & 0 & 0 & 0 \\
0 & 0 & 0 & 3 & 1 & 0 \\
0 & 0 & 0 & 0 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 3
\end{pmatrix}.
Now we prove the theorem.
Proof. It is enough to give a basis of strings, since the isomorphism taking that
basis to the standard basis will identify T with the matrix of Jordan blocks: T
will act on its strings just like each of those Jordan blocks acts on its strings.
If dim V = 1, any linear map T : V → V is a scaling, so the result is obvious.
Suppose that dim V = n > 1. Pick an eigenvalue λ of T , and a λ-eigenvector v
of T . Let V0 be the span of v, and V1 = V /V0 . Since T takes V0 to V0 , we can
define a quotient map T1 : V1 → V1 , so
T1 (w + V0 ) = T w + V0 .
By induction, T1 has a basis of strings. Pick one such string for an eigenvalue
µ ≠ λ, say
u \xrightarrow{\;T_1−µI\;} (T_1 − µI)u \xrightarrow{\;T_1−µI\;} \cdots \xrightarrow{\;T_1−µI\;} (T_1 − µI)^{\ell} u \xrightarrow{\;T_1−µI\;} 0.
Since u ∈ V1, we can write u as a translate u = w + V0. Then
(T_1 − µI)^{\ell+1} u = 0
tells us that
(T − µI)^{\ell+1} w = cv,
for some number c ∈ K, since 0 ∈ V1 corresponds precisely to elements spanned
by v back up in V.
This w vector might not give us a string. Lets patch it by trying to add
something on to it: try w + bv for some number b ∈ K, and check that
(T − µI)^{\ell+1}(w + bv) = \left(c + b(λ − µ)^{\ell+1}\right) v.
We can let
b = −\frac{c}{(λ − µ)^{\ell+1}},
and find that w + bv lies in a string in V which maps to the original string
down in V1.
Next, what if µ = λ? We take a λ-string in V1 , say
u \xrightarrow{\;T_1−λI\;} (T_1 − λI)u \xrightarrow{\;T_1−λI\;} \cdots \xrightarrow{\;T_1−λI\;} (T_1 − λI)^{\ell} u \xrightarrow{\;T_1−λI\;} 0.
Again we write u as w + V0, some w ∈ V. Clearly
(T_1 − λI)^{\ell+1} u = 0
implies that
(T − λI)^{\ell+1} w = cv
for some c ∈ K. If c = 0, then w starts a string, and we use that string. If
c ≠ 0, then
(T − λI)^{\ell+2} w = 0
and again w starts a string, but one step longer and containing cv. Rescale to
arrange c = 1, so v belongs to this string.
If we end up with two different strings both containing v, the difference of
the two will end up at 0 instead of v, so a shorter string. Therefore we can
arrange that at most one string contains v. If v doesn't already show up
in any one of our strings, then we can add one more string, consisting just of v.
We have lifted every string in V1 to a string in V . The vectors in our strings
map to a basis of V1 , so their span has dimension at least n − 1. But v is in
our strings too, so we have all n dimensions of V spanned by our strings. Our
strings have exactly n vectors in them, so form a basis.
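For experiments it can help to let a computer algebra system carry out the change of basis exactly. A short sympy illustration (my own example, not from the text):

import sympy as sp

T = sp.Matrix([[1, 1],
               [-1, 3]])     # eigenvalue 2 repeated, but only one eigenvector direction
P, J = T.jordan_form()       # T == P * J * P**-1
print(J)                     # Matrix([[2, 1], [0, 2]]): a single 2 x 2 Jordan block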
Generalized Eigenspaces
If λ is an eigenvalue of a linear map T : V → V, a vector v ∈ V is a generalized
eigenvector of T with eigenvalue λ if (T − λI)^k v = 0 for some integer k > 0.
If k = 1 then v is an eigenvector in the usual sense.
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
satisfies A^2 = 0, so every vector x in K^2 is a generalized eigenvector of A,
with eigenvalue 0. In the generalized sense, we have lots of eigenvectors.
4.5 Prove that every vector in Kn is a generalized eigenvector of ∆ with eigenvalue 0. Then prove that for any number λ, every vector in Kn is a generalized
eigenvector of λ + ∆ with eigenvalue λ.
For example, each vector in a λ-string is a generalized eigenvector with
eigenvalue λ.
Suppose that T : V → V is a linear map, pick some λ ∈ K and let
Vλ (T ) ⊂ V
be the set of all generalized eigenvectors of T with eigenvalue λ. Call Vλ (T ) the
generalized λ-eigenspace of T .
Lemma 4.2. The generalized eigenspaces of a linear map are subspaces.
Proof. If v ∈ Vλ(T) and c ∈ K, we need to prove that cv ∈ Vλ(T). By definition,
v is a generalized eigenvector with eigenvalue λ, so
(T − λI)^k v = 0
for some positive integer k. But then
(T − λI)^k (cv) = c(T − λI)^k v = 0.
Similarly, if v1, v2 ∈ Vλ(T), say
(T − λI)^{k_1} v_1 = 0
and
(T − λI)^{k_2} v_2 = 0
(the powers k1 and k2 could be different) then clearly
(T − λI)^{\max(k_1, k_2)} (v_1 + v_2) = 0.
4.6 Prove that every nonzero generalized eigenvector v belongs to a string, and
all of the vectors in the string are linearly independent.
4.7 Prove that nonzero generalized eigenvectors of a square matrix, with different eigenvalues, are linearly independent. Prove that the sum of all of the
generalized eigenspaces is a direct sum.
Corollary 4.3. If T : V → V is a linear map on a finite dimensional vector
space V over K = R or C, and all eigenvalues of T belong to K, then V is a
direct sum
V = \bigoplus_{λ} V_λ(T)
of the generalized eigenspaces of T .
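Since a generalized eigenvector is killed by some power of T − λI, and powers beyond dim V give nothing new, Vλ(T) is the kernel of (T − λI)^n with n = dim V. A small sympy sketch of the direct sum (my own example, not from the text):

import sympy as sp

T = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 3]])
n = T.rows
for lam in T.eigenvals():                          # eigenvalues 2 and 3
    basis = ((T - lam*sp.eye(n))**n).nullspace()   # a basis of the generalized eigenspace
    print(lam, len(basis))                         # dimensions 2 and 1, adding up to n = 3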
Cracking of Jordan Blocks
Jordan normal form is very sensitive to small changes in matrix entries. For
this reason, we cannot compute Jordan normal form unless we know the matrix
entries precisely, to infinitely many decimals.
4.8 If T : V → V is a linear map on an n-dimensional vector space and has n
different eigenvalues, show that T is diagonalizable.
The matrix
∆_2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
is not diagonalizable, but the nearby matrix
\begin{pmatrix} ε_1 & 1 \\ 0 & ε_2 \end{pmatrix}
is, as long as ε_1 ≠ ε_2, since these ε_1 and ε_2 are the eigenvalues of this
matrix. It doesn’t matter how small ε1 and ε2 are. The same idea
clearly works for ∆ of any size, and for λ + ∆ of any size, and so for
any matrix in Jordan normal form.
Theorem 4.4. Every linear map T : V → V on a finite dimensional complex
vector space can be approximated as closely as we like by a diagonalizable linear
map.
Remark 4.5. By “approximated”, we mean that we can make a new linear map
whose matrix entries, in any basis we like, are all as close as we like to the
entries of the matrix of the original linear map.
Proof. Put your linear map into Jordan normal form, say F −1 T F , and then
use the trick from the last example, to make diagonalizable matrices B close
to F −1 T F . Then F BF −1 is diagonalizable too, and is close to T .
Remark 4.6. Using the same proof, we can also approximate any real linear
map arbitrarily closely by diagonalizable real linear maps (i.e. diagonalized by
real change of basis matrices), just when all of its eigenvalues are real.
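A quick numerical illustration of this cracking (my own, assuming numpy; the size of the perturbation is an arbitrary choice):

import numpy as np

A = np.array([[0., 1.],
              [0., 0.]])           # not diagonalizable
B = np.array([[1e-9, 1.],
              [0., -1e-9]])        # a nearby matrix with two distinct eigenvalues
print(np.linalg.eigvals(A))        # 0 repeated
print(np.linalg.eigvals(B))        # two distinct eigenvalues, so B is diagonalizable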
Uniqueness of Jordan Normal Form
Proposition 4.7. A linear map T : V → V on a finite dimensional vector
space has at most one Jordan normal form, up to reordering the Jordan blocks.
Proof. Clearly the eigenvalues are independent of change of basis. The problem
is to figure out how to measure, for each eigenvalue, the number of blocks of
each size. Fix an eigenvalue λ. Let
d_m(λ, T) = \dim \ker (T − λI)^m.
(We will only use this notation in this proof.) Clearly dm (λ, T ) is independent
of choice of basis. For example, d1 (λ, T ) is the number of blocks with eigenvalue
λ, while d2 (λ, T ) counts vectors at or next to the end of strings with eigenvalue
λ. All 1 × 1 blocks contribute to both d1 (λ, T ) and d2 (λ, T ). The difference
d2 (λ, T ) − d1 (λ, T ) is the number of blocks of size at least 2 × 2. Similarly the
number d3 (λ, T ) − d2 (λ, T ) measures the number of blocks of size at least 3 × 3,
etc. But then the difference
(d2 (λ, T ) − d1 (λ, T ))−(d3 (λ, T ) − d2 (λ, T )) = 2 d2 (λ, T )−d1 (λ, T )−d3 (λ, T )
is the number of blocks of size at least 2 × 2, but not 3 × 3 or more, i.e. exactly
2 × 2.
The number of m × m blocks is the difference between the number of blocks
at least m × m and the number of blocks at least (m + 1) × (m + 1), so
number of m × m blocks = (dm (λ, T ) − dm−1 (λ, T )) − (dm+1 (λ, T ) − dm (λ, T ))
= 2 dm (λ, T ) − dm−1 (λ, T ) − dm+1 (λ, T ) .
(We can just define d0 (λ, T ) = 0 to allow this equation to hold even for m = 1.)
Therefore the number of blocks of any size is independent of the choice of
basis.
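Here is a sketch of this counting in code (my own, with sympy for exact arithmetic; the matrix is an arbitrary example already in Jordan normal form):

import sympy as sp

T = sp.Matrix([[2, 1, 0, 0],
               [0, 2, 0, 0],
               [0, 0, 2, 0],
               [0, 0, 0, 3]])
lam, n = 2, T.rows
d = [0] + [n - ((T - lam*sp.eye(n))**m).rank() for m in range(1, n + 2)]   # d_m = dim ker (T - lam I)^m
for m in range(1, n + 1):
    blocks = 2*d[m] - d[m-1] - d[m+1]
    if blocks:
        print(blocks, "block(s) of size", m, "x", m, "with eigenvalue", lam)
# prints one 1 x 1 block and one 2 x 2 block, as expected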
Working over arbitrary fields
Over F2, the matrices
\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
are in Jordan normal form. In fact, they are the only 2 × 2 matrices that are in
Jordan normal form, because we only have 0’s and 1’s to put into the entries.
Over F2, consider the matrix
A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}.
Consider the problem of finding the Jordan normal form of A. First, the
characteristic polynomial of A is
p(x) = det(xI − A) = det \begin{pmatrix} x & −1 \\ −1 & x − 1 \end{pmatrix},
but −1 = 1 in F2, since 1 + 1 = 0, so
p(x) = det \begin{pmatrix} x & 1 \\ 1 & x + 1 \end{pmatrix} = x(x + 1) − 1 = x^2 + x + 1.
Let’s look for roots of p(x), i.e. eigenvalues. Try x = 0:
p(0) = 02 + 0 + 1 = 1,
no good. Try x = 1:
p(1) = 12 + 1 + 1 = 1 + 1 + 1 = 1,
since 1 + 1 = 0. No good. So p(x) has no roots in F2, and A has no eigenvalues in F2. We immediately
recognize, from section 2 on page 27, that the splitting field of p(x) is
F4 = {0, 1, α, α + 1} ,
and that p (α) = p (α + 1) = 0. So over F4 , A has eigenvalues α and α + 1,
which are distinct. Each distinct eigenvalue contributes at least one linearly
independent eigenvector, so A is diagonalizable and has Jordan normal form
\begin{pmatrix} α & 0 \\ 0 & α + 1 \end{pmatrix}.
To find the basis of eigenvectors, proceed as usual. Try λ = α, and find that
eigenvectors
\begin{pmatrix} x \\ y \end{pmatrix}
must satisfy
A \begin{pmatrix} x \\ y \end{pmatrix} = α \begin{pmatrix} x \\ y \end{pmatrix},
which forces precisely
y = αx and x + y = αy.
The first equation tells us that if x = 0 then y = 0, and we want a nonzero
eigenvector. So we can assume that x ≠ 0, rescale to arrange x = 1, and then
y = α, so our eigenvector is
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ α \end{pmatrix}.
In the same way, for λ = α + 1, we find the eigenvector
\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ α + 1 \end{pmatrix}.
So the matrix
F = \begin{pmatrix} 1 & 1 \\ α & α + 1 \end{pmatrix}
diagonalizes A over F4:
F^{-1} A F = \begin{pmatrix} α & 0 \\ 0 & α + 1 \end{pmatrix},
as you can easily check.
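One way to check this by machine (a sketch of my own, not from the text) is to represent F4 as F2[α]/(α^2 + α + 1) in sympy, reducing modulo 2 and modulo α^2 + α + 1; since F^{-1} A F = D is the same as A F = F D, it is enough to check that A F − F D reduces to zero:

import sympy as sp

a = sp.symbols('a')               # a stands for the element alpha of F_4
minpoly = a**2 + a + 1

def is_zero_in_F4(expr):
    # reduce modulo a^2 + a + 1 (monic, so integer coefficients remain), then modulo 2
    r = sp.Poly(sp.rem(sp.expand(expr), minpoly, a), a)
    return all(c % 2 == 0 for c in r.all_coeffs())

A = sp.Matrix([[0, 1], [1, 1]])
F = sp.Matrix([[1, 1], [a, a + 1]])
D = sp.Matrix([[a, 0], [0, a + 1]])
print(all(is_zero_in_F4(e) for e in (A*F - F*D)))   # True: A F = F D holds over F_4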
Review problems
Compute the matrix F for which F^{-1} A F is in Jordan normal form, and the
Jordan normal form itself, for
4.9
A = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}
4.10
A = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}
4.11 Thinking about the fact that ∆ has string e_n, e_{n−1}, . . . , e_1, what is the
Jordan normal form of A = ∆_n^2? (Don't try to find the matrix F bringing A
to that form.)
4.12 Find a matrix F so that F^{-1} A F is in Jordan normal form, where
A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ −1 & 0 & −2 & 0 \end{pmatrix}.
4.13 Find a matrix F so that F^{-1} A F is in Jordan normal form, where
A = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
4.14 Without computation (and without finding the matrix F taking A to
Jordan normal form), explain how you can see the Jordan normal form of
\begin{pmatrix} 1 & 10 & 100 \\ 0 & 20 & 200 \\ 0 & 0 & 300 \end{pmatrix}.
4.15 If a linear map T : V → V satisfies a complex polynomial equation f (T ) =
0, show that each eigenvalue of T must satisfy the same equation.
4.16 Prove that any reordering of Jordan blocks can occur by changing the
choice of the basis we use to bring a linear map T to Jordan normal form.
4.17 Prove that every linear map is a product of two diagonalizable linear
maps.
4.18 Suppose that to each n × n matrix A we assign a number D(A), and that
D(AB) = D(A)D(B).
(a) Prove that
D(P^{-1} A P) = D(A)
for any matrix P.
(b) Define a function f(x) by
f(x) = D \begin{pmatrix} x & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix}.
Prove that f(x) is multiplicative; i.e. f(ab) = f(a)f(b) for any numbers
a and b.
(c) Prove that on any diagonal matrix
A = \begin{pmatrix} a_1 & & & \\ & a_2 & & \\ & & \ddots & \\ & & & a_n \end{pmatrix}
we have
D(A) = f(a_1) f(a_2) \dots f(a_n).
(d) Prove that on any diagonalizable matrix A,
D(A) = f (λ1 ) f (λ2 ) . . . f (λn ) ,
where λ1 , λ2 , . . . , λn are the eigenvalues of A (counted with multiplicity).
(e) Use the previous exercise to show that D(A) = f (det(A)) for any matrix
A.
(f) In this sense, det is the unique multiplicative quantity associated to a
matrix, up to composing with a multiplicative function f . What are all
continuous multiplicative functions f ? (Warning: it is a deep result that
there are many discontinuous multiplicative functions.)
4.19 Consider the matrix
A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix}
over the finite field F2 . Calculate the characteristic polynomial; you should get
p(x) = x3 + x2 + 1.
Prove that the matrix is not diagonalizable over F2 , but becomes diagonalizable
over the finite field F8 . Hint: see problem 2.14 on page 29. If we write the
roots of p(x) as α, α^2, 1 + α + α^2, then calculate out a basis of eigenvectors.
Chapter 5
Decomposition and Minimal Polynomial
The Jordan normal form has an abstract version for linear maps, called the decomposition of a linear map.
Polynomial Division
5.1 Divide x2 + 1 into x5 + 3x2 + 4x + 1, giving quotient and remainder.
5.2 Use the Euclidean algorithm (subsection 2 on page 23) applied to polynomials instead of integers, to compute the greatest common divisor r(x) of
a(x) = x4 + 2 x3 + 4 x2 + 4 x + 4 and b(x) = x5 + x2 + 2 x3 + 2. Find polynomials
u(x) and v(x) so that u(x)a(x) + b(x)v(x) = r(x). Find the least common
multiple.
Given any pair of polynomials a(x) and b(x), the Euclidean algorithm writes
their greatest common divisor r(x) as
r(x) = u(x) a(x) + v(x) b(x),
a “linear combination” of a(x) and b(x). Similarly, if we have any number of
polynomials, we can write the greatest common divisor of any pair of them as
a linear combination. Pick two pairs of polynomials, and write the greatest
common divisor of the greatest common divisors as a linear combination, etc.
Keep going until you hit the greatest common divisor of the entire collection.
We can unwind this process, to write the greatest common divisor of the entire
collection of polynomials as a linear combination of the polynomials themselves.
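Computer algebra systems implement this extended Euclidean algorithm for polynomials directly. A small sympy illustration (my own example polynomials, not the ones from the exercises):

import sympy as sp

x = sp.symbols('x')
a = x**3 + 1
b = x**2 + 1
u, v, g = sp.gcdex(a, b, x)          # u*a + v*b == g, with g the greatest common divisor
print(g)
print(sp.simplify(u*a + v*b - g))    # 0, confirming the linear combination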
5.3 For integers 2310, 990 and 1386 (instead of polynomials) express their greatest common divisor as an “integer linear combination” of them.
The Minimal Polynomial
We are interested in equations satisfied by a linear map.
Lemma 5.1. Every linear map T : V → V on a finite dimensional vector space
V satisfies a polynomial equation p(T ) = 0 with p(x) a nonzero polynomial.
Proof. The set of all linear maps V → V is a finite dimensional vector space,
of dimension n^2 where n = dim V. Therefore the elements I, T, T^2, . . . , T^{n^2}
cannot be linearly independent.
5.4 Prove that ∆^k has zeros down the diagonal, for any integer k ≥ 1.
5.5 Prove that, for any number λ, the diagonal entries of (λI + ∆)^k are all λ^k.
Definition 5.2. The minimal polynomial of a linear map T : V → V is the
smallest degree polynomial
m(x) = x^d + a_{d−1} x^{d−1} + \dots + a_0
with coefficients in K for which m(T ) = 0.
Lemma 5.3. There is a unique minimal polynomial for any linear map T : V →
V on a finite dimensional vector space V . The minimal polynomial divides every
other polynomial s(x) for which s(T ) = 0. All roots of the minimal polynomial
are eigenvalues.
Proof. For example, if T satisfies two polynomials, say 0 = T^3 + 3T + 1 and
0 = 2T^3 + 1, then we can rescale the second equation by 1/2 to get 0 = T^3 + 1/2,
and then we have two equations which both start with T^3, so just take the
difference: 0 = (T^3 + 3T + 1) − (T^3 + 1/2). The point is that the T^3 terms wipe
each other out, giving a new equation of lower degree. Keep going until you
get the lowest degree possible nonzero polynomial. Rescale to get the leading
coefficient to be 1.
If s(x) is some other polynomial, and s(T ) = 0, then divide m(x) into s(x),
say s(x) = q(x)m(x) + r(x), with remainder r(x) of smaller degree than m(x).
But then 0 = s(T ) = q(T )m(T ) + r(T ) = r(T ), so r(x) has smaller degree than
m(x), and r(T ) = 0. But m(x) is already the smallest degree possible without
being 0. So r(x) = 0, and m(x) divides s(x).
If λ is a root of m(x), say m(x) = (x − λ)p(x), but not an eigenvalue, then
p(T) = (T − λI)^{-1} m(T) = 0,
but p(x) is a smaller degree polynomial than the minimal, a contradiction.
5.6 Prove that the minimal polynomial of ∆_n is m(x) = x^n.
5.7 Prove that the minimal polynomial of a Jordan block λ + ∆_n is m(x) = (x − λ)^n.
Lemma 5.4. If A and B are square matrices with minimal polynomials mA (x)
and mB (x), then the matrix
C = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}
has minimal polynomial mC (x) the least common multiple of the polynomials
mA (x) and mB (x).
Proof. Calculate that
C^2 = \begin{pmatrix} A^2 & 0 \\ 0 & B^2 \end{pmatrix},
etc., so for any polynomial q(x),
q(C) = \begin{pmatrix} q(A) & 0 \\ 0 & q(B) \end{pmatrix}.
Let l(x) be the least common multiple of the polynomials mA (x) and mB (x).
Then clearly l(C) = 0. So mC (x) divides l(x). But mC (C) = 0, so mC (A) = 0.
Therefore mA (x) divides mC (x). Similarly, mB (x) divides mC (x). So mC (x)
is the least common multiple.
Using the same proof:
Lemma 5.5. If a linear map T : V → V has invariant subspaces U and W
so that V = U ⊕ W , then the minimal polynomial of T is the least common
multiple of the minimal polynomials of T |U and T |W .
Lemma 5.6. Suppose that T : V → V is a linear map on a finite dimensional
vector space V over a field K and that all of the eigenvalues of T lie in K. Then
the minimal polynomial m(x) is
m(x) = (x − λ_1)^{d_1} (x − λ_2)^{d_2} \cdots (x − λ_s)^{d_s}
where λ1 , λ2 , . . . , λs are the eigenvalues of T and dj is no larger than the dimension of the generalized eigenspace of λj . In fact dj is the size of the largest
Jordan block with eigenvalue λj in the Jordan normal form.
Proof. We can assume that T is already in Jordan normal form: the minimal
polynomial is the least common multiple of the minimal polynomials of the
blocks.
Corollary 5.7. Suppose that T : V → V is a linear map on a finite dimensional
vector space V over a field K and that all of the eigenvalues of T lie in K. Then
T is diagonalizable just when it satisfies a polynomial equation s(T ) = 0 where
s(x) = (x − λ_1)(x − λ_2) \cdots (x − λ_s),
for some distinct numbers λ1 , λ2 , . . . , λs ∈ K, which happens just when its
minimal polynomial is a product of distinct linear factors over K.
Review problems
5.8 Find the minimal polynomial of
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.
5.9 Prove that the minimal polynomial of any 2 × 2 matrix A is
m(λ) = λ^2 − (tr A) λ + det A
(where tr A is the trace of A), unless A is a multiple of the identity matrix, say
A = c for some number c, in which case m(λ) = λ − c.
5.10 Use Jordan normal form to prove the Cayley–Hamilton theorem: every
linear map T : V → V on a finite dimensional complex vector space satisfies
p(T) = 0, where p(λ) = det(T − λI) is the characteristic polynomial of T.
5.11 Prove that if A is a square matrix with real entries, then the minimal
polynomial of A has real coefficients.
5.12 If T : V → V is a linear map on a finite dimensional complex vector space
V of dimension n, and T n = I, prove that T is diagonalizable. Give an example
on a real vector space to show that T need not be diagonalizable over the real
numbers.
Appendix: How to Find the Minimal Polynomial
Given a square matrix A, to find its minimal polynomial requires that we find
linear relations among powers of A. If we find a relation like A3 = I + 5A,
then we can multiply both sides by A to obtain a relation A4 = A + 5A2 . In
particular, once some power of A is a linear combination of lower powers of A,
then every higher power is also a linear combination of lower powers.
For each n × n matrix A, just for this appendix lets write \underline{A} to mean the
vector you get by chopping out the columns of A and stacking them on top of
one another. For example, if
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix},
then
\underline{A} = \begin{pmatrix} 1 \\ 3 \\ 2 \\ 4 \end{pmatrix}.
Clearly a linear relation like A^3 = I + 5A will hold just when it holds underlined:
\underline{A^3} = \underline{I} + 5\underline{A}.
Now lets suppose that A is n × n, and lets form the matrix
B = \begin{pmatrix} \underline{I} & \underline{A} & \underline{A^2} & \dots & \underline{A^n} \end{pmatrix}.
Clearly B has n^2 rows and n + 1 columns. Apply forward elimination to B, and
call the resulting matrix U. If one of the columns, say \underline{A^3}, is not a pivot column,
then A^3 is a linear combination of lower powers of A, so therefore A^4 is too,
etc. So as soon as you hit a pivotless column of B, all subsequent columns are
pivotless. Therefore U looks like
U = \begin{pmatrix}
\ast & \ast & \ast & \ast & \ast & \ast \\
0 & \ast & \ast & \ast & \ast & \ast \\
0 & 0 & \ast & \ast & \ast & \ast \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix},
pivots straight down the diagonal, until you hit rows of zeros. Cut out all of
the pivotless columns of U except the first pivotless column. Also cut out the
zero rows. Then apply back substitution, turning U into
\begin{pmatrix}
1 & & & & a_0 \\
 & 1 & & & a_1 \\
 & & \ddots & & \vdots \\
 & & & 1 & a_p
\end{pmatrix}.
Then the minimal polynomial is m(x) = x^{p+1} − a_0 − a_1 x − \dots − a_p x^p. To
see that this works, you notice that we have cut out all but the column of
\underline{A^{p+1}}, the smallest power of A that is a linear combination of lower powers. So
the minimal polynomial has to express A^{p+1} as a linear combination of lower
powers, i.e. solving the linear equations a_0 I + a_1 A + \dots + a_p A^p = A^{p+1}.
These equations yield the matrix B with a_0, a_1, . . . , a_p as the unknowns, and
we just apply elimination. On large matrices, this process is faster than finding
the determinant. But it has the danger that small perturbations of the matrix
entries alter the minimal polynomial drastically, so we can only apply this
process when we know the matrix entries precisely.
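Here is a sketch of this recipe in code (my own, not from the text; it uses sympy's exact arithmetic and a rank test in place of spotting the first pivotless column, which amounts to the same elimination):

import sympy as sp

def minimal_polynomial(A, x):
    """Minimal polynomial of a square sympy Matrix A, by stacking vec(I), vec(A), vec(A^2), ..."""
    n = A.rows
    power, cols = sp.eye(n), []
    while True:                                  # terminates by Cayley-Hamilton (degree <= n)
        cols.append(power.vec())                 # vec: stack the columns on top of one another
        power = power * A
        B, target = sp.Matrix.hstack(*cols), power.vec()
        if B.rank() == sp.Matrix.hstack(B, target).rank():
            coeffs, _ = B.gauss_jordan_solve(target)   # a_0, ..., a_p with sum a_j A^j = A^(p+1)
            p = len(cols) - 1
            return sp.expand(x**(p + 1) - sum(coeffs[j]*x**j for j in range(p + 1)))

x = sp.symbols('x')
A = sp.Matrix([[2, 1, 0], [0, 2, 0], [0, 0, 3]])   # an arbitrary example
print(sp.factor(minimal_polynomial(A, x)))         # the factors (x - 2)**2 and (x - 3)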
5.13 Find the minimal polynomial of
\begin{pmatrix} 0 & 1 & 1 \\ 2 & 1 & 2 \\ 0 & 0 & −1 \end{pmatrix}.
What are the eigenvalues?
Decomposition of a Linear Map
There is a more abstract version of the Jordan normal form, applicable to an
abstract finite dimensional vector space.
Definition 5.8. If T : V → W is a linear map, and U is a subspace of V , recall
that the restriction T |U : U → W is defined by T |U u = T u for u in U .
Definition 5.9. Suppose that T : V → V is a linear map from a vector space
back to itself, and U is a subspace of V . We say that U is invariant under T if
whenever u is in U , T u is also in U .
A difficult result to prove by any other means:
Corollary 5.10. If a linear map T : V → V on a finite dimensional vector space
is diagonalizable, then its restriction to any invariant subspace is diagonalizable.
Proof. The linear map satisfies the same polynomial equation, even after restricting to the subspace.
5.14 Prove that a linear map T : V → V on a finite dimensional vector space is
diagonalizable just when every subspace invariant under T has a complementary
subspace invariant under T .
Definition 5.11. A linear map N : V → V from a vector space back to itself is
called nilpotent if there is some positive integer k for which N k = 0.
Clearly a linear map on a finite dimensional vector space is nilpotent just
when its minimal polynomial is p(x) = xk for some positive integer k.
Corollary 5.12. A linear map N : V → V is nilpotent just when the restriction
of N to any N invariant subspace is nilpotent.
5.15 Prove that the only nilpotent which is diagonalizable is 0.
5.16 Give an example to show that the sum of two nilpotents might not be
nilpotent.
Definition 5.13. Two linear maps S : V → V and T : V → V commute when
ST = T S.
Lemma 5.14. The sum and difference of commuting nilpotents is nilpotent.
Proof. If S and T are nilpotent linear maps V → V, say S^p = 0 and T^q = 0,
then take any number r ≥ p + q and expand out the sum (S + T)^r. Because
S and T commute, every term can be written with all its S factors on the left,
and all its T factors on the right, and there are r factors in all, so either p of
the factors must be S or q must be T, hence each term vanishes.
Lemma 5.15. If two linear maps S, T : V → V commute, then S preserves
each generalized eigenspace of T , and vice versa.
Proof. If ST = T S, then clearly ST 2 = T 2 S, etc. so that Sp(T ) = p(T )S for
any polynomial p(T ). Suppose that T has an eigenvalue λ. If x is a generalized
eigenvector, i.e. (T − λI)^p x = 0 for some p, then
(T − λI)^p Sx = S(T − λI)^p x = 0,
so that Sx is also a generalized eigenvector with the same eigenvalue.
Theorem 5.16. Take T : V → V any linear map on a finite dimensional vector space over a field K, and suppose
that all of the eigenvalues of T belong to K. Then T can be written in just one
way as a sum T = D + N with D diagonalizable, N nilpotent, and all three of
T, D and N commuting.
For T = λI + ∆, set D = λI, and N = ∆.
Remark 5.17. If any two of T, D and N commute, and T = D + N , then it is
easy to check that all three commute.
Proof. First, lets prove that D and N exist, and then prove they are unique.
One proof that D and N exist, which doesn’t require Jordan normal form: split
up V into the direct sum of the generalized eigenspaces of T . It is enough to
find some D and N on each one of these spaces. But on each eigenspace, say
with eigenvalue λ, we can let D = λI and let N = T − λI. So existence of D
and N is obvious.
Another proof that D and N exist, which uses Jordan normal form: pick
a basis in which the matrix of T is in Jordan normal form. Lets also call this
matrix T, say
T = \begin{pmatrix} λ_1 I + ∆ & & & \\ & λ_2 I + ∆ & & \\ & & \ddots & \\ & & & λ_N I + ∆ \end{pmatrix}.
Let
D = \begin{pmatrix} λ_1 I & & & \\ & λ_2 I & & \\ & & \ddots & \\ & & & λ_N I \end{pmatrix},
and
N = \begin{pmatrix} ∆ & & & \\ & ∆ & & \\ & & \ddots & \\ & & & ∆ \end{pmatrix}.
This proves that D and N exist.
Why are D and N uniquely determined? All generalized eigenspaces of T
are D and N invariant. So we can restrict to a single generalized eigenspace
of T , and need only show that D and N are uniquely determined there. If λ
is the eigenvalue, then D − λI = (T − λI) − N is a difference of commuting
nilpotents, so nilpotent by lemma 5.14 on the preceding page. Therefore D − λI
is both nilpotent and diagonalizable, and so vanishes: D = λI and N = T − λI,
uniquely determined.
Theorem 5.18. Suppose that T0 : V → V and T1 : V → V are commuting linear maps (i.e.
T0 T1 = T1 T0) on a finite dimensional vector space V over a field K, and
that all of the eigenvalues of T0 and T1 belong to K. Then splitting each into
its diagonalizable and nilpotent parts, T0 = D0 + N0 and T1 = D1 + N1 , any
two of the maps T0 , D0 , N0 , T1 , D1 , N1 commute.
Proof. If x is a generalized eigenvector of T0 (so that (T_0 − λ)^k x = 0 for some λ
and integer k > 0), then T1 x is also (because (T_0 − λ)^k T_1 x = 0 too, by pulling
the T1 across to the left). Since V is a direct sum of generalized eigenspaces of
T0 , we can restrict to a generalized eigenspace of T0 and prove the result there.
So we can assume that T0 = λ0 + N0 , for some complex number λ0 . Switching
the roles of T0 and T1 , we can assume that T1 = λ1 + N1 . Clearly D0 = λ0 and
D1 = λ1 commute with one another and with anything else. The commuting
of T0 and T1 is equivalent to the commuting of N0 and N1 .
Review problems
5.17 Find the decomposition T = D + N of
T = \begin{pmatrix} ε_1 & 1 \\ 0 & ε_2 \end{pmatrix}
(using the same letter T for the linear map and its associated matrix).
5.18 If two linear maps S : V → V and T : V → V on a finite dimensional
complex vector space commute, show that the eigenvalues of ST are products
of eigenvalues of S with eigenvalues of T .
Chapter 6
Matrix Functions of a Matrix Variable
We will make sense out of expressions like eT , sin T, cos T for square matrices T , and
for linear maps T : V → V . We expect that the reader is familiar with calculus and
infinite series.
Definition 6.1. A function f(x) is analytic if near each point x = x_0, f(x) is
the sum of a convergent Taylor series, say
f(x) = a_0 + a_1 (x − x_0) + a_2 (x − x_0)^2 + \dots
We will henceforth allow the variable x (and the point x = x0 around which we
take the Taylor series) to take on real or complex values.
Definition 6.2. If f(x) is an analytic function and T : V → V is a linear map
on a finite dimensional vector space, define f(T) to mean the infinite series
f(T) = a_0 + a_1 (T − x_0) + a_2 (T − x_0)^2 + \dots,
just plugging the linear map T into the expansion.
Lemma 6.3. Under an isomorphism F : U → V of finite dimensional vector spaces,
f(F^{-1} T F) = F^{-1} f(T) F,
with each side defined when the other is.
Proof. Expanding out, we see that (F^{-1} T F)^2 = F^{-1} T^2 F. By induction,
(F^{-1} T F)^k = F^{-1} T^k F for k = 1, 2, 3, . . . . So for any polynomial function p(x),
we see that p(F^{-1} T F) = F^{-1} p(T) F. Therefore the partial sums of the Taylor
expansion converge on the left hand side just when they converge on the right,
approaching the same value.
Remark 6.4. If a square matrix is in square blocks,
A = \begin{pmatrix} B & 0 \\ 0 & C \end{pmatrix},
then clearly
f(A) = \begin{pmatrix} f(B) & 0 \\ 0 & f(C) \end{pmatrix}.
So we only need to work one block at a time.
Theorem 6.5. Let f(x) be an analytic function. If A is a single Jordan block,
A = λ + ∆_n, and the series for f(x) converges near x = λ, then
f(A) = f(λ) + f'(λ)∆ + \frac{f''(λ)}{2!}∆^2 + \frac{f'''(λ)}{3!}∆^3 + \dots + \frac{f^{(n−1)}(λ)}{(n−1)!}∆^{n−1}.
Proof. Expand out the Taylor series, keeping in mind that ∆^n = 0.
Corollary 6.6. The value of f (A) does not depend on which Taylor series we
use for f (x): we can expand f (x) about any point as long as the series converges
on the spectrum of A. If we change the choice of the point to expand around,
the resulting expression for f (A) determines the same matrix.
Proof. Entries will be given by the formulas above, which don’t depend on the
particular choice of Taylor series, only on the values of the function f (x) for x
in the spectrum of A.
Corollary 6.7. Suppose that T : V → V is a linear map of a finite dimensional
vector space. Split T into T = D + N , diagonalizable and nilpotent parts. Take
f (x) an analytic function given by a Taylor series converging on the spectrum
of T (which is the spectrum of D). Then
f(T) = f(D + N) = f(D) + f'(D)N + \frac{f''(D)N^2}{2!} + \frac{f'''(D)N^3}{3!} + \dots + \frac{f^{(n−1)}(D)N^{n−1}}{(n−1)!},
where n is the dimension of V.
Remark 6.8. In particular, the result is independent of the choice of point about
which we expand f (x) into a Taylor series, as long as the series converges on
the spectrum of T .
Consider the matrix
A = \begin{pmatrix} π & 1 \\ 0 & π \end{pmatrix} = π + ∆.
Then
\sin A = \sin(π + ∆) = \sin(π) + \sin'(π)∆ = 0 + \cos(π)∆ = −∆ = \begin{pmatrix} 0 & −1 \\ 0 & 0 \end{pmatrix}.
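A quick numerical check of this example (my own, assuming scipy; scipy.linalg.sinm computes the matrix sine):

import numpy as np
from scipy.linalg import sinm

A = np.array([[np.pi, 1.0],
              [0.0,   np.pi]])
print(sinm(A))          # approximately [[0, -1], [0, 0]], i.e. -Delta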
6.1∗ Find
\sqrt{1 + ∆}, \quad \log(1 + ∆), \quad e^{∆}.
Remark 6.9. If f (x) is analytic on the spectrum of a linear map T : V → V on
finite dimensional vector space, then we could actually define f (T ) by using a
different Taylor series for f (x) around each eigenvalue, but that would require
a more sophisticated theory. (We could, for example, split up V into generalized eigenspaces for T , and compute out f (T ) on each generalized eigenspace
separately; this proves convergence.) However, we will never use such a complicated theory—we will only define f (T ) when f (x) has a single Taylor series
converging on the entire spectrum of T .
The Exponential and Logarithm
The series
e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots
converges for all values of x. Therefore e^T is defined for any linear map
T : V → V on any finite dimensional vector space.
6.2∗ Recall that the trace of an n × n matrix A is tr A = A_{11} + A_{22} + \dots + A_{nn}.
Prove that det e^A = e^{tr A}.
For any positive number r,
\log(x + r) = \log r − \sum_{k=1}^{\infty} \frac{1}{k}\left(−\frac{x}{r}\right)^k.
For |x| < r, this series converges by the ratio test. (Clearly x can
actually be real or complex.) If T : V → V is a linear map on a finite
dimensional vector space, and if every eigenvalue of T has positive real
part, then we can pick r larger than the largest eigenvalue of T. Then
\log T = \log r − \sum_{k=1}^{\infty} \frac{1}{k}\left(−\frac{T − r}{r}\right)^k
converges. The value of this sum does not depend on the value of r,
since the value of \log x = \log(x − r + r) doesn't.
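Numerically, scipy's logm and expm behave the way this series suggests; a quick check (my own, with an arbitrary matrix whose eigenvalues are positive):

import numpy as np
from scipy.linalg import expm, logm

T = np.array([[2.0, 1.0],
              [0.0, 3.0]])             # eigenvalues 2 and 3
print(np.allclose(expm(logm(T)), T))   # True: exp(log T) recovers T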
Remark 6.10. The same tricks work for complex linear maps. We won’t ever be
tempted to consider f (T ) for f (x) anything other than a real-valued function
of a real variable; the reader may be aware that there is a sophisticated theory
of complex functions of a complex variable.
6.3∗ Find the Taylor series of f(x) = 1/x around the point x = r (as long as
r ≠ 0). Prove that f(A) = A^{-1}, if all eigenvalues of A have positive real part.
6.4∗ If f (x) is the sum of a Taylor series converging on the spectrum of a
matrix A, why are the entries of f (A) smooth functions of the entries of A? (A
function is called “smooth” to mean that we can differentiate the function any
number of times with respect to any of its variables, in any order.)
6.5∗ For any complex number z = x + iy, prove that e^z converges to e^x (\cos y + i \sin y).
6.6∗ Use the result of the previous exercise to prove that e^{\log A} = A if all
eigenvalues of A have positive real part.
6.7∗ Use the results of the previous two exercises to prove that \log e^A = A if
all eigenvalues of A have imaginary part strictly between −π/2 and π/2.
Lemma 6.11. If A and B are two n × n matrices, and AB = BA, then
e^{A+B} = e^A e^B.
Proof. We expand out the product eA eB and collect terms. (The process proceeds term by term exactly as it would if A and B were real numbers, because
AB = BA.)
Corollary 6.12. e^A is invertible for all square matrices A, and (e^A)^{-1} = e^{−A}.
Proof. A commutes with −A, so e^A e^{−A} = e^0 = 1.
Definition 6.13. A real matrix A is skew-symmetric if A^t = −A. A complex
matrix A is skew-adjoint if A^∗ = −A.
Corollary 6.14. If A is skew-symmetric/complex skew-adjoint then e^A is orthogonal/unitary.
Proof. Term by term in the Taylor series, (e^A)^t = e^{A^t} = e^{−A}. Similarly for
the other cases.
Lemma 6.15. If two n × n matrices A and B commute (AB = BA) and
the eigenvalues of A and B have positive real part and the products of their
eigenvalues also have positive real part, then log (AB) = log A + log B.
Proof. The eigenvalues of AB will be products of eigenvalues of A and of B, as
seen in section 5 on page 57. Again the result proceeds as it would for A and B
numbers, term by term in the Taylor series.
Corollary 6.16. If A is orthogonal/unitary and all eigenvalues of A have
positive real part, then log A is skew-symmetric/complex skew-adjoint.
6.8∗ What do corollaries 6.14 and 6.16 say about the matrix
A = \begin{pmatrix} 0 & 1 \\ −1 & 0 \end{pmatrix}?
6.9∗ What can you say about e^A for A symmetric? For A self-adjoint?
If we slightly alter a matrix, we only slightly alter its spectrum.
Theorem 6.17 (Continuity of the spectrum). Suppose that A is a n×n complex
matrix, and pick some disks in the complex plane which together contain exactly
k eigenvalues of A (counting each eigenvalue by its multiplicity). In order to
ensure that a complex matrix B also has exactly k eigenvalues (also counted by
multiplicity) in those same disks, it is sufficient to ensure that each entry of B
is close enough to the corresponding entry of A.
Proof. Eigenvalues are the roots of the characteristic polynomial det (A − λ I).
If each entry of B is close enough to the corresponding entry of A, then each
coefficient of the characteristic polynomial of B is close to the corresponding
coefficient of the characteristic polynomial of A. The result follows by the
argument principle (theorem 6.29 on page 69 in the appendix to this chapter).
Remark 6.18. The eigenvalues vary as differentiable functions of the matrix
entries as well, except where eigenvalues “collide” (i.e. at matrices for which
two eigenvalues are equal), when there might not be any way to write the
eigenvalues in terms of differentiable functions of matrix entries. In a suitable
sense, the eigenvectors can also be made to depend differentiably on the matrix
entries away from eigenvalue “collisions.” See Kato [3] for more information.
6.10∗ Find the eigenvalues of
A=
0
t
1
0
as a function of t. What happens at t = 0?
6.11 Prove that if an n × n complex matrix A has n distinct eigenvalues, then
so does every complex matrix whose entries are close enough to the entries of
A.
Corollary 6.19. If f (x) is an analytic function given by a Taylor series converging on the spectrum of a matrix A, then f (B) is defined by the same Taylor
expansion as long as each entry of B is close enough to the corresponding entry
of A.
6.12∗ Prove that a complex square matrix A is invertible just when it has the
form A = e^L for some square matrix L.
Appendix: Analysis of Infinite Series
Proposition 6.20. Suppose that f(x) is defined by a convergent Taylor series
f(x) = \sum_k a_k (x − x_0)^k,
converging for x near x_0. Then both of the series
\sum_k |a_k| |x − x_0|^k, \qquad \sum_k k a_k (x − x_0)^{k−1}
converge for x near x_0.
Proof. We can assume, just by replacing x by x − x_0, that x_0 = 0. Lets suppose
that our Taylor series converges for −b < x < b. Then it must converge for
x = b/r for any r > 1. So the terms must get small eventually, i.e.
a_k \left(\frac{b}{r}\right)^k → 0.
For large enough k,
|a_k| < \left(\frac{r}{b}\right)^k.
Pick any R > r. Then
|a_k| \left(\frac{b}{R}\right)^k < \left(\frac{r}{b}\right)^k \left(\frac{b}{R}\right)^k = \left(\frac{r}{R}\right)^k.
Therefore if |x| ≤ b/R, we have
\sum_k |a_k| |x|^k ≤ \sum_k \left(\frac{r}{R}\right)^k,
a geometric series of diminishing terms, so convergent. Similarly,
\sum_k \left| k a_k x^k \right| ≤ \sum_k k \left(\frac{r}{R}\right)^k,
which converges by the comparison test.
Corollary 6.21. Under the same conditions, the series
\sum_k \binom{k}{\ell} a_k (x − x_0)^{k−\ell}
converges in the same domain.
Proof. The same trick works.
Lemma 6.22. If f(x) is the sum of a convergent Taylor series, then f'(x) is
too. More specifically, if
f(x) = \sum_k a_k (x − x_0)^k,
then
f'(x) = \sum_k k a_k (x − x_0)^{k−1}.
Proof. Let
f_1(x) = \sum_k k a_k (x − x_0)^{k−1},
which we know converges in the same open interval where the Taylor series for
f(x) converges. We have to show that f_1(x) = f'(x). Pick any points x + h
and x where f(x) converges. Expand out:
\frac{f(x + h) − f(x)}{h} − f_1(x) = \frac{\sum_k a_k (x + h)^k − \sum_k a_k x^k}{h} − \sum_k k a_k x^{k−1}
= \sum_k a_k \left( \frac{(x + h)^k − x^k}{h} − k x^{k−1} \right)
= \sum_k a_k \left( \sum_{\ell=0}^{k} \binom{k}{\ell} x^{k−\ell} h^{\ell−1} − x^k h^{−1} − k x^{k−1} \right)
= h \sum_k a_k \sum_{\ell=2}^{k} \binom{k}{\ell} x^{k−\ell} h^{\ell−2}
= h \sum_k a_k\, k(k−1) \sum_{\ell=0}^{k−2} \frac{1}{(\ell+2)(\ell+1)} \binom{k−2}{\ell} x^{k−2−\ell} h^{\ell}.
Each term has absolute value no more than
|a_k|\, k(k−1) \binom{k−2}{\ell} (|x| + |h|)^{k−2},
which are the terms of a convergent series. The expression
\frac{f(x + h) − f(x)}{h} − f_1(x)
is governed by a convergent series multiplied by h. In particular the limit as
h → 0 is 0.
Corollary 6.23. Any function f (x) which is the sum of a convergent Taylor
series in a disk has derivatives of all orders everywhere in the interior of that
disk, given by formally differentiating the Taylor series of f (x).
All of these tricks work equally well for complex functions of a complex
variable, as long as they are sums of Taylor series.
Appendix: Perturbing Roots of Polynomials
A point z travelling around a circle winds around each point p inside, and doesn’t
wind around any point p outside.
Lemma 6.24. As a point z travels counterclockwise around a circle, it travels
once around every point inside the circle, and does not travel around any point
outside the circle.
As z travels around the circle, the angle from p to z increases by 2π if p is inside, but
is periodic if p is outside.
Remark 6.25. Lets state the result more precisely. The angle between any
points z and p is defined only up to 2π multiples. As we will see, if z travels
around a circle counterclockwise, and p doesn’t lie on that circle, we can select
this angle to be a continuously varying function φ. If p is inside the circle, then
this function increases by 2π as z travels around the circle. If p is outside the
circle, then this function is periodic as z travels around the circle.
Proof. Rotate the picture to get p to lie on the positive x-axis, say p = (p0 , 0).
Scale to get the circle to be the unit circle, so z = (cos θ, sin θ). The vector
from z to p is
p − z = (p0 − cos θ, − sin θ) .
This vector has angle φ from the horizontal, where
(\cos φ, \sin φ) = \left( \frac{p_0 − \cos θ}{r}, −\frac{\sin θ}{r} \right)
with
r = \sqrt{(p_0 − \cos θ)^2 + \sin^2 θ}.
If p_0 > 1, then \cos φ > 0 so that, after adding multiples of 2π, we must have φ
contained inside the domain of the arcsin function:
φ = \arcsin\left( −\frac{\sin θ}{r} \right),
a continuous function of θ, and φ(θ + 2π) = φ(θ). This continuous function φ
is uniquely determined up to adding integer multiples of 2π.
On the other hand, suppose that p0 < 1. Consider the angle Φ between
Z = (p0 cos θ, p0 sin θ) and P = (1, 0). By the argument above
Φ = \arcsin\left( −\frac{p_0 \sin θ}{r} \right)
where
r = \sqrt{(1 − p_0 \cos θ)^2 + p_0^2 \sin^2 θ}.
Rotating by θ takes P to z and Z to p. Therefore the angle of the ray from
z to p is φ = Φ(θ) + θ, a continuous function increasing by 2π every time θ
increases by 2π. Since Φ is uniquely determined up to adding integer multiples
of 2π, so is φ.
Corollary 6.26. Consider the complex polynomial function
P (z) = (z − p1 ) (z − p2 ) . . . (z − pn ) .
Suppose that p1 lies inside some disk, and all other roots p2 , p3 , . . . , pn lie outside
that disk. Then as z travels once around the boundary of that disk, the argument
of the complex number w = P (z) increases by 2π.
Proof. The argument of a product is the sum of the arguments of the factors,
so the argument of P (z) is the sum of the arguments of z − p1 , z − p2 , etc.
Corollary 6.27. Consider the complex polynomial function
P (z) = a (z − p1 ) (z − p2 ) . . . (z − pn ) .
Suppose that some roots p1 , p2 , . . . , pk all lie inside some disk, and all other
roots pk+1 , pk+2 , . . . , pn lie outside that disk. Then as z travels once around the
boundary of that disk, the argument of the complex number w = P (z) increases
by 2πk.
Corollary 6.28. Consider two complex polynomial functions P (z) and Q(z).
Suppose that P (z) has k roots lying inside some disk, and Q(z) has ` roots lying
inside that same disk, and all other roots of P (z) and Q(z) lie outside that disk.
(So no roots of P (z) or Q(z) lie on the boundary of the disk.) Then as z travels
once around the boundary of that disk, the argument of the complex number
w = P (z)/Q(z) increases by 2π(k − `).
Theorem 6.29 (The argument principle). If P (z) is a polynomial, with k
roots inside a particular disk, and no roots on the boundary of that disk, then
every polynomial Q(z) of the same degree as P (z) and whose coefficients are
sufficiently close to the coefficients of P (z) has exactly k roots inside the same
disk, and no roots on the boundary.
Proof. To apply corollary 6.28 on the preceding page, we have only to ensure
that Q(z)/P (z) is not going to change in argument (or vanish) as we travel
around the boundary of that disk. So we have only to ensure that while z
stays on the boundary of the disk, Q(z)/P (z) lies in a particular half-plane, for
example that Q(z)/P (z) is never a negative real number (or 0). So it is enough
to ensure that |P (z) − Q(z)| < |P (z)| for z on the boundary of the disk. Let
m be the minimum value of |P (z)| for z on the boundary of the disk. Suppose
that the furthest point of our disk from the origin is some point z with |z| = R.
Then if we write out Q(z) = P(z) + \sum_j c_j z^j, we only need to ensure that the
coefficients c_0, c_1, . . . , c_n satisfy \sum_j |c_j| R^j < m, to be sure that Q(z) will have
the same number of roots as P(z) in that disk.
Chapter 7
Symmetric Functions of Eigenvalues
Symmetric Functions
Definition 7.1. A function f (x1 , x2 , . . . , xn ) is symmetric if its value is unchanged by permuting the variables x1 , x2 , . . . , xn .
For example, x1 + x2 + · · · + xn is clearly symmetric.
Definition 7.2. The elementary symmetric functions are the functions
s_1(x) = x_1 + x_2 + \dots + x_n,
s_2(x) = x_1 x_2 + x_1 x_3 + \dots + x_1 x_n + x_2 x_3 + \dots + x_{n−1} x_n,
⋮
s_k(x) = \sum_{i_1 < i_2 < \dots < i_k} x_{i_1} x_{i_2} \dots x_{i_k}.
For any (real or complex) numbers x = (x1 , x2 , . . . , xn ) let
Px (t) = (t − x1 ) (t − x2 ) . . . (t − xn ) .
Clearly the roots of Px (t) are precisely the entries of the vector x.
7.1∗ Prove that
P_x(t) = t^n − s_1(x)t^{n−1} + s_2(x)t^{n−2} + \dots + (−1)^k s_k(x)t^{n−k} + \dots + (−1)^n s_n(x).
Definition 7.3. Let s(x) = (s_1(x), s_2(x), . . . , s_n(x)), so that s : R^n → R^n. (If we
work with complex numbers, then s : C^n → C^n.)
Lemma 7.4. The map s is onto, i.e. for each complex vector w in Cn there
is a complex vector z in Cn so that s(z) = w.
Proof. Pick any w in Cn . Let z1 , z2 , . . . , zn be the complex roots of the polynomial
P(t) = t^n − w_1 t^{n−1} + w_2 t^{n−2} + \dots + (−1)^n w_n.
Such roots exist by the fundamental theorem of algebra. Clearly Pz (t) = P (t),
since these polynomial functions have the same roots and same leading term.
Lemma 7.5. The entries of two vectors z and w are permutations of one
another just when s(z) = s(w).
Proof. The roots of Pz (t) and Pw (t) are the same numbers.
Corollary 7.6. A function is symmetric just when it is a function of the
elementary symmetric functions.
Remark 7.7. This means that every symmetric function f : Cn → C has the
form f (z) = h(s(z)), for a unique function h : Cn → C, and conversely if h is
any function at all, then f (z) = h(s(z)) determines a symmetric function.
Theorem 7.8. A symmetric function of some complex variables is continuous
just when it is expressible as a continuous function of the elementary symmetric
functions, and this expression is uniquely determined.
Proof. If h(z) is continuous, clearly f (z) = h(s(z)) is. If f (z) is continuous and
symmetric, then given any sequence w1 , w2 , . . . in Cn converging to a point w,
we let z1 , z2 , . . . be a sequence in Cn for which s (zj ) = wj , and z a point for
which s(z) = w. The entries of zj are the roots of the polynomial
P_{z_j}(t) = t^n − w_{j1} t^{n−1} + w_{j2} t^{n−2} + \dots + (−1)^n w_{jn}.
By the argument principle (theorem 6.29 on page 69), we can rearrange the
entries of each of the various z1 , z2 , . . . vectors so that they converge to z.
Therefore h (wj ) = f (zj ) converges to f (z) = h(w). If there are two expressions,
f (z) = h1 (s(z)) and f (z) = h2 (s(z)), then because s is onto, h1 = h2 .
If a = (a_1, a_2, . . . , a_n), write z^a to mean z_1^{a_1} z_2^{a_2} \dots z_n^{a_n}. Call a the weight of
the monomial z^a. We will order weights by "alphabetical" order, for example so
that (2, 1) > (1, 2). Define the weight of a polynomial to be the highest weight
of any of its monomials. (The zero polynomial will not be assigned any weight.)
The weight of a product of nonzero polynomials is the sum of the weights. The
weight of a sum is at most the highest weight of any term. The weight of s_j(z)
is
(\underbrace{1, 1, \dots, 1}_{j}, 0, 0, \dots, 0).
Theorem 7.9. Every symmetric polynomial f has exactly one expression as a
polynomial in the elementary symmetric polynomials. If f has real/rational/integer
coefficients, then f is a real/rational/integer coefficient polynomial of the elementary symmetric polynomials.
Proof. For any monomial z^a, let
z^{\bar{a}} = \sum_p z^{p(a)},
a sum over all permutations p. Every symmetric polynomial, if it contains
a monomial z^a, must also contain z^{p(a)}, for any permutation p. Hence every
symmetric polynomial is a sum of z^{\bar{a}} polynomials. Consequently the weight a
of a symmetric polynomial f must satisfy a_1 ≥ a_2 ≥ \dots ≥ a_n. We have only
to write the z^{\bar{a}} in terms of the elementary symmetric functions, with integer
coefficients. Let b_n = a_n and b_j = a_j − a_{j+1} for j = 1, 2, . . . , n − 1.
Then s(z)^b = s_1(z)^{b_1} s_2(z)^{b_2} \dots s_n(z)^{b_n} has leading monomial z^a, so z^{\bar{a}} − s(z)^b has lower weight. Apply
induction on the weight.
z_1^2 + z_2^2 = (z_1 + z_2)^2 − 2 z_1 z_2. To compute out these expressions: f(z) =
z_1^2 + z_2^2 has weight (2, 0). The polynomials s_1(z) and s_2(z) have weights
(1, 0) and (1, 1). So we subtract off the appropriate power, s_1(z)^2,
from f(z), and find f(z) − s_1(z)^2 = −2 z_1 z_2 = −2 s_2(z).
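A quick sympy check of this computation (my own; the two symbols are just the variables of the example):

import sympy as sp

z1, z2 = sp.symbols('z1 z2')
s1, s2 = z1 + z2, z1*z2
f = z1**2 + z2**2
print(sp.expand(s1**2 - 2*s2 - f))    # 0, so f = s1**2 - 2*s2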
Sums of powers
Define p_j(z) = z_1^j + z_2^j + \dots + z_n^j, the sums of powers.
Lemma 7.10. The sums of powers are related to the elementary symmetric
functions by
0 = k s_k − p_1 s_{k−1} + p_2 s_{k−2} − \dots + (−1)^{k−1} p_{k−1} s_1 + (−1)^k p_k.
Proof. Lets write z^{(\ell)} for z with the \ell-th entry removed, so if z is a vector in
C^n, then z^{(\ell)} is a vector in C^{n−1}. Then
p_j s_{k−j} = \sum_\ell z_\ell^j \sum_{i_1 < i_2 < \dots < i_{k−j}} z_{i_1} z_{i_2} \dots z_{i_{k−j}}.
Either we can't pull a z_\ell factor out of the second sum, or we can:
p_j s_{k−j} = \sum_\ell z_\ell^j \sum_{\substack{i_1 < \dots < i_{k−j} \\ i_1, i_2, \dots \ne \ell}} z_{i_1} \dots z_{i_{k−j}} + \sum_\ell z_\ell^{j+1} \sum_{\substack{i_1 < \dots < i_{k−j−1} \\ i_1, i_2, \dots \ne \ell}} z_{i_1} \dots z_{i_{k−j−1}}
= \sum_\ell z_\ell^j\, s_{k−j}\!\left(z^{(\ell)}\right) + \sum_\ell z_\ell^{j+1}\, s_{k−j−1}\!\left(z^{(\ell)}\right).
Putting in successive terms of our sum,
p_j s_{k−j} − p_{j+1} s_{k−j−1} = \sum_\ell z_\ell^j s_{k−j}\!\left(z^{(\ell)}\right) + \sum_\ell z_\ell^{j+1} s_{k−j−1}\!\left(z^{(\ell)}\right) − \sum_\ell z_\ell^{j+1} s_{k−j−1}\!\left(z^{(\ell)}\right) − \sum_\ell z_\ell^{j+2} s_{k−j−2}\!\left(z^{(\ell)}\right)
= \sum_\ell z_\ell^j s_{k−j}\!\left(z^{(\ell)}\right) − \sum_\ell z_\ell^{j+2} s_{k−j−2}\!\left(z^{(\ell)}\right).
Hence the sum collapses to
p_1 s_{k−1} − p_2 s_{k−2} + \dots + (−1)^{k−2} p_{k−1} s_1 = \sum_\ell z_\ell\, s_{k−1}\!\left(z^{(\ell)}\right) + (−1)^k \sum_\ell z_\ell^k\, s_0\!\left(z^{(\ell)}\right)
= k s_k + (−1)^k p_k,
which is the claimed identity.
Proposition 7.11. Every symmetric polynomial is a polynomial in the sums
of powers. If the coefficients of the symmetric polynomial are real (or rational),
then it is a real (or rational) polynomial function of the sums of powers. Every
continuous symmetric function of complex variables is a continuous function of
the sums of powers.
Proof. We can solve recursively for the sums of powers in terms of the elementary
symmetric functions and conversely.
Remark 7.12. The standard reference on symmetric functions is [5].
The Invariants of a Square Matrix
Definition 7.13. A complex-valued or real-valued function f(A), depending on
the entries of a square matrix A, is an invariant if f(F A F^{-1}) = f(A) for any
invertible matrix F.
So an invariant is independent of change of basis. If T : V → V is a linear
map on an n-dimensional vector space, we can define the value f(T) of any
invariant f of n × n matrices, by letting f(T) = f(A) where A = F^{-1} T F is the
matrix of T, for any isomorphism F : R^n → V.
For any n × n matrix A, write
det(A − λ) = s_n(A) − s_{n−1}(A)λ + s_{n−2}(A)λ^2 + \dots + (−1)^n λ^n.
The functions s1 (A), s2 (A), . . . , sn (A) are invariants.
The functions
p_k(A) = \operatorname{tr} A^k
are invariants.
7.2 If A is diagonal, say
A = \begin{pmatrix} z_1 & & & \\ & z_2 & & \\ & & \ddots & \\ & & & z_n \end{pmatrix},
then prove that s_j(A) = s_j(z_1, z_2, . . . , z_n), the elementary symmetric functions
of the eigenvalues.
7.3∗ Generalize the previous exercise to A diagonalizable.
Theorem 7.14. Each continuous (or polynomial) invariant function of a complex matrix has exactly one expression as a continuous (or polynomial) function
of the elementary symmetric functions of the eigenvalues. Each polynomial
invariant function of a real matrix has exactly one expression as a polynomial
function of the elementary symmetric functions of the eigenvalues.
Remark 7.15. We can replace the elementary symmetric functions of the eigenvalues by the sums of powers of the eigenvalues.
Proof. Every continuous invariant function f(A) determines a continuous function f(z) by setting
A = \begin{pmatrix} z_1 & & & \\ & z_2 & & \\ & & \ddots & \\ & & & z_n \end{pmatrix}.
Taking F any permutation matrix, invariance tells us that f(F A F^{-1}) = f(A).
But f(F A F^{-1}) is given by applying the associated permutation to the entries of
z. Therefore f(z) is a symmetric function. If f(A) is continuous (or polynomial)
then f (z) is too. Therefore f (z) = h(s(z)), for some continuous (or polynomial)
function h; so f (A) = h(s(A)) for diagonal matrices. By invariance, the same is
true for diagonalizable matrices. If we work with complex matrices, then every
matrix can be approximated arbitrarily closely by diagonalizable matrices (by
theorem 4.4 on page 48). Therefore by continuity of h, the equation f (A) =
h(s(A)) holds for all matrices A.
For real matrices, the equation only holds for those matrices whose eigenvalues are real. However, for polynomials this is enough, since two polynomial
functions equal on an open set must be equal everywhere.
Remark 7.16. Consider the function f (A) = sj (|λ1 | , |λ2 | , . . . , |λn |), where A
has eigenvalues λ1 , λ2 , . . . , λn . This function is a continuous invariant of a real
matrix A, and is not a polynomial in λ1 , λ2 , . . . , λn .
Chapter 8
The Pfaffian
Skew-symmetric matrices have a surprising additional polynomial invariant, called the
Pfaffian, but it is only invariant under rotations, and only exists for skew-symmetric
matrices with an even number of rows and columns.
Skew-Symmetric Normal Form
Theorem 8.1. For any skew-symmetric matrix A with an even number of rows
and columns, there is a rotation matrix F so that
F A F^{-1} = \begin{pmatrix}
0 & a_1 & & & & & \\
−a_1 & 0 & & & & & \\
& & 0 & a_2 & & & \\
& & −a_2 & 0 & & & \\
& & & & \ddots & & \\
& & & & & 0 & a_n \\
& & & & & −a_n & 0
\end{pmatrix}.
(We can say that F brings A to skew-symmetric normal form.) If A is any skew-symmetric matrix with an odd number of rows and columns, we can arrange
the same equation, again via a rotation matrix F, but the normal form has an
extra row of zeroes and an extra column of zeroes:
F A F^{-1} = \begin{pmatrix}
0 & a_1 & & & & \\
−a_1 & 0 & & & & \\
& & \ddots & & & \\
& & & 0 & a_n & \\
& & & −a_n & 0 & \\
& & & & & 0
\end{pmatrix}.
Proof. Because A is skew-symmetric, A is skew-adjoint, so normal when thought
of as a complex matrix. So there is a unitary basis of C2n of complex eigenvectors
of A. If λ is a complex eigenvalue of A, with complex eigenvector z, scale z
to have unit length, and then
λ = λ⟨z, z⟩ = ⟨λz, z⟩ = ⟨Az, z⟩ = −⟨z, Az⟩ = −⟨z, λz⟩ = −\bar{λ}⟨z, z⟩ = −\bar{λ}.
Therefore λ = −λ̄, i.e. λ has the form ia for some real number a. So there are
two different possibilities: λ = 0 or λ = ia with a ≠ 0.
If λ = 0, then z lies in the kernel, so if we write z = x + iy then both x
and y lie in the kernel. In particular, we can write a real orthonormal basis
for the kernel, and then x and y will be real linear combinations of those basis
vectors, and therefore z will be a complex linear combination of those basis
vectors. Lets take u1 , u2 , . . . , us to be a real orthonormal basis for the kernel
of A, and clearly then the same vectors u1, u2, . . . , us form a complex unitary
basis for the complex kernel of A.
Next lets take care of the nonzero eigenvalues. If λ = ia is a nonzero
eigenvalue, with unit length eigenvector z, then taking complex conjugates on
the equation Az = λz = iaz, we find Az̄ = −iaz̄, so z̄ is another eigenvector
with eigenvalue −ia. So they come in pairs. Since the eigenvalues ia and −ia
are distinct, the eigenvectors z and z̄ must be perpendicular. So we can always
make a new unitary basis of eigenvectors, throwing out any λ = −ia eigenvector
and replacing it with z̄ if needed, to ensure that for each eigenvector z in our
unitary basis of eigenvectors, z̄ also belongs to our unitary basis. Moreover, we
have three equations: Az = iaz, ⟨z, z⟩ = 1, and ⟨z, z̄⟩ = 0. Write z = x + iy
with x and y real vectors, and expand out all three equations in terms of x
and y to find Ax = −ay, Ay = ax, ⟨x, x⟩ + ⟨y, y⟩ = 1, ⟨x, x⟩ − ⟨y, y⟩ = 0 and
⟨x, y⟩ = 0. So if we let X = \sqrt{2}\, x and Y = \sqrt{2}\, y, then X and Y are unit vectors,
and AX = −aY and AY = aX.
Now if we carry out this process for each eigenvalue λ = ia with a > 0,
then we can write down vectors X1 , Y1 , X2 , Y2 , . . . , Xt , Yt , one pair for each
eigenvector from our unitary basis with a nonzero eigenvalue. These vectors
are each unit length, and each Xi is perpendicular to each Yi . We also have
AXi = −ai Yi and AYi = ai Xi .
If zi and zj are two different eigenvectors from our original unitary basis
of eigenvectors, and their eigenvalues are λi = iai and λj = iaj with ai , aj >
0, then we want to see why Xi , Yi , Xj and Yj must be perpendicular. This
follows immediately from zi , z̄i , zj and z̄j being perpendicular, by just expanding
into real and imaginary parts. Similarly, we can see that u1 , u2 , . . . , us are
perpendicular to each Xi and Yi . So finally, we can let
F = \begin{pmatrix} X_1 & Y_1 & X_2 & Y_2 & \dots & X_t & Y_t & u_1 & u_2 & \dots & u_s \end{pmatrix}.
Clearly these vectors form a real orthonormal basis, so F is an orthogonal matrix.
We want to arrange that F be a rotation matrix. Lets suppose that F is not a
rotation. We can either change the sign of one of the vectors u1 , u2 , . . . , us (if
there are any), or replace X1 by −X1 , which switches the sign of a1 , to make
F a rotation.
Partitions
A partition of the numbers 1, 2, . . . , 2n is a choice of division of these numbers
into pairs. For example, we could choose to partition 1, 2, 3, 4, 5, 6 into
{4, 1} , {2, 5} , {6, 3} .
This is the same partition if we write the pairs down in a different order, like
{2, 5} , {6, 3} , {4, 1} ,
or if we write the numbers inside each pair down in a different order, like
{1, 4} , {5, 2} , {6, 3} .
It isn’t really important that the objects partitioned be numbers. Of course,
you can’t partition an odd number of objects into pairs. Each permutation p
of the numbers 1, 2, 3, . . . , 2n has an associated partition
{p(1), p(2)} , {p(3), p(4)} , . . . , {p(2n − 1), p(2n)} .
For example, the permutation 3, 1, 4, 6, 5, 2 has associated partition
{3, 1} , {4, 6} , {5, 2} .
Clearly two different permutations p and q could have the same associated
partition, i.e. we could first transpose various of the pairs of the partition of
p, keeping each pair in order, and then transpose entries within each pair, but
not across different pairs. Consequently, there are n!\,2^n different permutations
associated to the same partition: n! ways to permute the pairs, and 2^n ways to
swap the order within each pair. When you permute a pair, like changing
3, 1, 4, 6, 5, 2 to 4, 6, 3, 1, 5, 2, this is the effect of a pair of transpositions (one to
permute 3 and 4 and another to permute 5 and 6), so has no effect on signs.
Therefore if two permutations have the same partition, the root cause of any
difference in sign must be from transpositions inside each pair. For example,
while it is complicated to find the signs of the permutations 3, 1, 4, 6, 5, 2 and
of 4, 6, 1, 3, 5, 2, it is easy to see that these signs must be different.
On the other hand, we can write each partition in “alphabetical order”, like
for example rewriting
{4, 1} , {2, 5} , {6, 3}
as
{1, 4} , {2, 5} , {3, 6}
so that we put each pair in order, and then order the pairs among one another
by their lowest elements. This in turn determines a permutation, called the
natural permutation of the partition, given by putting the elements in that
order; in our example this is the permutation
1, 4, 2, 5, 3, 6.
We write the sign of a permutation p as sgn(p), and define the sign of a partition P to be the sign of its natural permutation. Watch out: if we start with a
permutation p, like 6, 2, 4, 1, 3, 5, then the associated partition P is
{6, 2} , {4, 1} , {3, 5} .
This is the same partition as
{1, 4} , {2, 6} , {3, 5}
(just written in alphabetical order). The natural permutation q of P is therefore
1, 4, 2, 6, 3, 5,
so the original permutation p is not the natural permutation of its associated
partition.
8.1 How many partitions are there of the numbers 1, 2, . . . , 2n?
8.2 Write down all of the partitions of
(a) 1, 2;
(b) 1, 2, 3, 4;
(c) 1, 2, 3, 4, 5, 6.
The Pfaffian
We want to write down a square root of the determinant.
If A is a 2 × 2 skew-symmetric matrix,
\[ A = \begin{pmatrix} 0 & a \\ -a & 0 \end{pmatrix}, \]
then det A = a^2, so the entry a = A_{12} is a polynomial function of the
entries of A, which squares to det A.
A huge calculation shows that if A is a 4 × 4 skew-symmetric matrix,
then
\[ \det A = \left( A_{12} A_{34} - A_{13} A_{24} + A_{14} A_{23} \right)^2. \]
So A12 A34 − A13 A24 + A14 A23 is a polynomial function of the entries
of A which squares to det A.
Definition 8.2. For any 2n × 2n skew-symmetric matrix A, let
\[ \operatorname{Pf} A = \frac{1}{n!\,2^n} \sum_p \operatorname{sgn}(p)\, A_{p(1)p(2)} A_{p(3)p(4)} \cdots A_{p(2n-1)p(2n)}, \]
where sgn(p) is the sign of the permutation p. Pf is called the Pfaffian.
Remark 8.3. Don’t ever try to use this horrible formula to compute a Pfaffian.
We will find a better way soon.
Lemma 8.4. For any 2n × 2n skew-symmetric matrix A,
\[ \operatorname{Pf} A = \sum_P \operatorname{sgn}(P)\, A_{p(1)p(2)} A_{p(3)p(4)} \cdots A_{p(2n-1)p(2n)}, \]
where the sum is over partitions P and the permutation p is the natural permutation of the partition P . In particular, Pf A is an integer coefficient polynomial
of the entries of the matrix A.
Proof. Each permutation p has an associated partition P . So we can write the
Pfaffian as a sum
\[ \operatorname{Pf} A = \frac{1}{n!\,2^n} \sum_P \sum_p \operatorname{sgn}(p)\, A_{p(1)p(2)} A_{p(3)p(4)} \cdots A_{p(2n-1)p(2n)} \]
where the first sum is over all partitions P , and the second over all permutations
p which have P as their associated partition. But if two permutations p and q
both have the same associated partition
{p(1), p(2)} , {p(3), p(4)} , . . . , {p(2n − 1), p(2n)} ,
then p and q give the same pairs of indices in the expression Ap(1)p(2) Ap(3)p(4) . . . Ap(2n−1)p(2n) .
Perhaps some of the indices in these pairs might be reversed. For example, we
might have partition P being
{1, 5} , {2, 6} , {3, 4} ,
and permutations p being 1, 5, 2, 6, 3, 4 and q being 5, 1, 2, 6, 3, 4. The contribution to the sum coming from p is
sgn(p)A15 A26 A34 ,
while that from q is
sgn(q)A51 A26 A34 .
But then A51 = −A15, a sign change which is perfectly offset by the sign sgn(q):
each transposition inside a pair changes the sign of the permutation.
So put together, we find that for any two permutations p and q with the
same partition, their contributions are the same:
sgn(q)Aq(1)q(2) Aq(3)q(4) . . . Aq(2n−1)q(2n) = sgn(p)Ap(1)p(2) Ap(3)p(4) . . . Ap(2n−1)p(2n) .
Therefore the n! 2^n permutations with associated partition P all contribute the
same amount as the natural permutation of P .
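Definition 8.2 and lemma 8.4 are easy to check against each other by brute force for very small matrices. The following is a minimal sketch in Python with numpy, not part of the text; the function names are ours, and the recursion in the second function simply enumerates each partition into pairs once, pairing the lowest remaining index first.
\begin{verbatim}
import numpy as np
from math import factorial
from itertools import permutations

def perm_sign(p):
    # sign of a permutation (tuple of 0-based indices), via inversions
    inv = sum(1 for i in range(len(p))
                for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def pf_by_permutations(A):
    # Definition 8.2: sum over all (2n)! permutations, divided by n! 2^n
    m = A.shape[0]
    n = m // 2
    total = 0.0
    for p in permutations(range(m)):
        term = perm_sign(p)
        for k in range(n):
            term *= A[p[2 * k], p[2 * k + 1]]
        total += term
    return total / (factorial(n) * 2 ** n)

def pf_by_partitions(A):
    # Lemma 8.4: one term per partition into pairs
    def rec(idx):
        if not idx:
            return 1.0
        i, rest = idx[0], idx[1:]
        return sum((-1) ** k * A[i, rest[k]] * rec(rest[:k] + rest[k + 1:])
                   for k in range(len(rest)))
    return rec(tuple(range(A.shape[0])))

M = np.random.default_rng(0).normal(size=(4, 4))
A = M - M.T                                  # a random skew-symmetric matrix
print(np.isclose(pf_by_permutations(A), pf_by_partitions(A)))  # True
print(np.isclose(pf_by_partitions(A) ** 2, np.linalg.det(A)))  # True, as
                                             # the theorem below asserts
\end{verbatim}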
Rotation Invariants of Skew-Symmetric Matrices
Theorem 8.5.
\[ \operatorname{Pf}^2 A = \det A. \]
Moreover, for any 2n × 2n matrix B,
\[ \operatorname{Pf}\left(BAB^t\right) = \operatorname{Pf}(A) \det(B). \]
If A is in skew-symmetric normal form, say
\[ A = \begin{pmatrix}
0 & a_1 \\
-a_1 & 0 \\
& & 0 & a_2 \\
& & -a_2 & 0 \\
& & & & \ddots \\
& & & & & 0 & a_n \\
& & & & & -a_n & 0
\end{pmatrix}, \]
then
\[ \operatorname{Pf} A = a_1 a_2 \dots a_n. \]
Proof. Lets start by proving that Pf A = a1 a2 . . . an for A in skew-symmetric
normal form. Looking at the terms that appear in the Pfaffian, we find that
at least one of the factors Ap(2j−1)p(2j) in each term will vanish unless these
factors come from among the entries A1,2 , A3,4 , . . . , A2n−1,2n . So all terms
vanish, except when the partition is (in alphabetical order)
{1, 2} , {3, 4} , . . . , {2n − 1, 2n} ,
yielding
Pf A = a1 a2 . . . an .
In particular, Pf^2 A = det A.
For any 2n × 2n matrix B,
\begin{align*}
n!\,2^n \operatorname{Pf}\left(BAB^t\right)
&= \sum_p \operatorname{sgn}(p)\, \left(BAB^t\right)_{p(1)p(2)} \left(BAB^t\right)_{p(3)p(4)} \cdots \left(BAB^t\right)_{p(2n-1)p(2n)} \\
&= \sum_p \operatorname{sgn}(p) \left( \sum_{i_1 i_2} B_{p(1)i_1} A_{i_1 i_2} B_{p(2)i_2} \right) \cdots \left( \sum_{i_{2n-1} i_{2n}} B_{p(2n-1)i_{2n-1}} A_{i_{2n-1} i_{2n}} B_{p(2n)i_{2n}} \right) \\
&= \sum_{i_1 i_2 \dots i_{2n}} \left( \sum_p \operatorname{sgn}(p)\, B_{p(1)i_1} B_{p(2)i_2} \cdots B_{p(2n)i_{2n}} \right) A_{i_1 i_2} A_{i_3 i_4} \cdots A_{i_{2n-1} i_{2n}} \\
&= \sum_{i_1 i_2 \dots i_{2n}} \det \begin{pmatrix} Be_{i_1} & Be_{i_2} & \dots & Be_{i_{2n}} \end{pmatrix} A_{i_1 i_2} A_{i_3 i_4} \cdots A_{i_{2n-1} i_{2n}}.
\end{align*}
If any two of the indices i_1, i_2, \dots, i_{2n} are equal, then two columns inside the determinant are equal, so that term vanishes; only the terms in which i_1, i_2, \dots, i_{2n} form a permutation of 1, 2, \dots, 2n survive, so we can write this as a sum over permutations:
\begin{align*}
&= \sum_q \det \begin{pmatrix} Be_{q(1)} & Be_{q(2)} & \dots & Be_{q(2n)} \end{pmatrix} A_{q(1)q(2)} A_{q(3)q(4)} \cdots A_{q(2n-1)q(2n)} \\
&= \sum_q \operatorname{sgn}(q) \det \begin{pmatrix} Be_1 & Be_2 & \dots & Be_{2n} \end{pmatrix} A_{q(1)q(2)} A_{q(3)q(4)} \cdots A_{q(2n-1)q(2n)} \\
&= n!\,2^n \det B \operatorname{Pf} A.
\end{align*}
Finally, to prove that Pf^2 A = det A, we just need to get A into skew-symmetric
normal form via a rotation matrix B, and then Pf^2 A = Pf^2 (BAB^t) = det(BAB^t) = det A.
How do you calculate the Pfaffian in practice? It is like the determinant,
except that you run your finger down the first column under the diagonal,
write down −, +, −, +, . . . in front of each entry from the first column, and
multiply each by the Pfaffian you get by crossing out that row and column,
and symmetrically the corresponding column and row. So
\[
\operatorname{Pf} \begin{pmatrix}
0 & -2 & 1 & 3 \\
2 & 0 & 8 & -4 \\
-1 & -8 & 0 & 5 \\
-3 & 4 & -5 & 0
\end{pmatrix}
= -(2) \cdot \operatorname{Pf} \begin{pmatrix} 0 & 5 \\ -5 & 0 \end{pmatrix}
+ (-1) \cdot \operatorname{Pf} \begin{pmatrix} 0 & -4 \\ 4 & 0 \end{pmatrix}
- (-3) \cdot \operatorname{Pf} \begin{pmatrix} 0 & 8 \\ -8 & 0 \end{pmatrix}
= -(2) \cdot 5 + (-1) \cdot (-4) - (-3) \cdot 8.
\]
Lets prove that this works:
Lemma 8.6. If A is a skew-symmetric matrix with an even number of rows
and columns, larger than 2 × 2, then
\[ \operatorname{Pf} A = -A_{21} \operatorname{Pf} A^{[21]} + A_{31} \operatorname{Pf} A^{[31]} - \dots = \sum_{i>1} (-1)^{i+1} A_{i1} \operatorname{Pf} A^{[i1]}, \]
where A[ij] is the matrix A with rows i and j and columns i and j removed.
Proof. Lets define a polynomial P (A) in the entries of a skew-symmetric matrix
A (with an even number of rows and columns) by setting P (A) = Pf A if A is
2 × 2, and setting
\[ P(A) = \sum_{i>1} (-1)^{i+1} A_{i1}\, P\left(A^{[i1]}\right), \]
for larger A. We need to show that P (A) = Pf A. Clearly P (A) = Pf A if A is
in skew-symmetric normal form. Each term in Pf A corresponds to a partition,
and each partition must put 1 into one of its pairs, say in a pair {1, i}. It then
can’t use 1 or i in any other pair. Clearly P (A) also has exactly one factor like
Ai1 in each term, and then no other factors get to have i or 1 as subscripts.
Moreover, all terms in P (A) and in Pf A have a coefficient of 1 or −1. So it is
clear that the terms of P (A) and of Pf A are the same, up to sign. We have to
fix the signs.
Suppose that we swap rows 2 and 3 of A and columns 2 and 3. Lets show that
this changes the signs of P (A) and of Pf A. For Pf A, this is immediate from
theorem 8.5 on page 82. Let Q be the permutation matrix of the transposition
of 2 and 3. (To be more precise, let Qn be the n × n permutation matrix of
the transposition of 2 and 3, for any value of n ≥ 3. But lets write all such
matrices Qn as Q.) Let B = QAQt , i.e. A with rows 2 and 3 and columns 2
and 3 swapped. So Bij is just Aij unless i or j is either 2 or 3. So
\begin{align*}
P(B) &= -B_{21} P\left(B^{[21]}\right) + B_{31} P\left(B^{[31]}\right) - B_{41} P\left(B^{[41]}\right) + \dots \\
&= -A_{31} P\left(A^{[31]}\right) + A_{21} P\left(A^{[21]}\right) - A_{41} P\left(QA^{[41]}Q^t\right) + \dots \\
\intertext{By induction, the sign changes in the last term.}
&= +A_{21} P\left(A^{[21]}\right) - A_{31} P\left(A^{[31]}\right) + A_{41} P\left(A^{[41]}\right) - \dots \\
&= -P(A).
\end{align*}
So swapping rows 2 and 3 changes a sign. In the same way,
\[ P\left(QAQ^t\right) = \operatorname{sgn}(q)\, P(A), \]
for Q the permutation matrix of any permutation q of the numbers 2, 3, . . . , 2n.
If we start with A in skew-symmetric normal form, letting the numbers a1 , a2 , . . . , an
in the skew-symmetric normal form be some abstract variables, then Pf A is
just a single term of the Pf and of P and these terms have the same sign.
All of the terms of Pf are obtained by permuting indices in this term, i.e. as
Pf (QAQt ) for suitable permutation matrices Q. Indeed you just need to take Q
the permutation matrix of the natural permutation of each partition. Therefore
the signs of Pf and of P are the same for each term, so P = Pf.
8.3∗ Prove that the odd degree elementary symmetric functions of the eigenvalues vanish on any skew-symmetric matrix.
Let s1 (a), . . . , sn (a) be the usual symmetric functions of some numbers
a1 , a2 , . . . , an . For any vector a let t(a) be the vector
\[ t(a) = \begin{pmatrix}
s_1\left(a_1^2, a_2^2, \dots, a_n^2\right) \\
s_2\left(a_1^2, a_2^2, \dots, a_n^2\right) \\
\vdots \\
s_{n-1}\left(a_1^2, a_2^2, \dots, a_n^2\right) \\
a_1 a_2 \dots a_n
\end{pmatrix}. \]
Lemma 8.7. Two complex vectors a and b in Cn satisfy t(a) = t(b) just when
b can be obtained from a by permutation of entries and changing signs of an
even number of entries. A function f (a) is invariant under permutations and
even numbers of sign changes just when f (a) = h(t(a)) for some function h.
Proof. Clearly s_n(a_1^2, a_2^2, \dots, a_n^2) = (a_1 a_2 \dots a_n)^2 = t_n(a)^2. In particular, the
symmetric functions of a_1^2, a_2^2, \dots, a_n^2 are all functions of t_1, t_2, \dots, t_n. Therefore
if we have two vectors a and b with t(a) = t(b), then a_1^2, a_2^2, \dots, a_n^2 are a permutation of b_1^2, b_2^2, \dots, b_n^2. So after permutation, a_1 = \pm b_1, a_2 = \pm b_2, \dots, a_n = \pm b_n,
equality up to some sign changes. Since we also know that tn (a) = tn (b), we
must have a1 a2 . . . an = b1 b2 . . . bn . If none of the ai vanish, then a1 a2 . . . an =
b1 b2 . . . bn ensures that none of the bi vanish either, and that the number of
sign changes is even. It is possible that one of the ai vanish, in which case we
can change its sign as we like to arrange that the number of sign changes is
even.
Lemma 8.8. Two skew-symmetric matrices A and B with the same even
numbers of rows and columns can be brought one to another, say B = F AF t ,
by some rotation matrix F , just when they have skew-symmetric normal forms
\[ \begin{pmatrix}
0 & a_1 \\
-a_1 & 0 \\
& & 0 & a_2 \\
& & -a_2 & 0 \\
& & & & \ddots \\
& & & & & 0 & a_n \\
& & & & & -a_n & 0
\end{pmatrix}
\quad \text{and} \quad
\begin{pmatrix}
0 & b_1 \\
-b_1 & 0 \\
& & 0 & b_2 \\
& & -b_2 & 0 \\
& & & & \ddots \\
& & & & & 0 & b_n \\
& & & & & -b_n & 0
\end{pmatrix} \]
respectively, with t(a) = t(b).
Proof. If we have a skew-symmetric normal form for a matrix A, with numbers a1 , a2 , . . . , an as above, then t1 (a), t2 (a), . . . , tn−1 (a) are the elementary
symmetric functions of the squares of the eigenvalues, while tn (a) = Pf A, so
clearly t(a) depends only on the invariants of A under rotation. In particular, suppose that I find two different skew-symmetric normal forms, one with
numbers a1 , a2 , . . . , an and one with numbers b1 , b2 , . . . , bn . Then the numbers
b1 , b2 , . . . , bn must be given from the numbers a1 , a2 , . . . , an by permutation
and switching of an even number of signs. In fact we can attain these changes
by actual rotations as follows.
For example, think about 4 × 4 matrices. The permutation matrix F of
3, 4, 1, 2 permutes the first two and second two basis vectors, and is a rotation
because the number of transpositions is even. When we replace A by F AF t , we
swap a1 with a2 . Similarly, we can take the matrix F which reflects e1 and e3 ,
changing the sign of a1 and of a2 . So we can clearly carry out any permutations,
and any even number of sign changes, on the numbers a1 , a2 , . . . , an .
Lemma 8.9. Any polynomial in a1 , a2 , . . . , an can be written in only one way
as h(t(a)).
Proof. Recall that every complex number has a square root (a complex number with half
the argument and the square root of the modulus). Clearly 0 has only 0 as
square root, while all other complex numbers z have two square roots, which
we write as ±√z.
Given any complex vector b, I can solve t(a) = b by first constructing a
solution c to
\[ s(c) = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_{n-1} \\ b_n^2 \end{pmatrix}, \]
and then letting a_j = \pm\sqrt{c_j}. Clearly t(a) = b unless t_n(a) has the wrong sign.
If we change the sign of one of the aj then we can fix this. So t : Cn → Cn is
onto.
Theorem 8.10. Each polynomial invariant of a skew-symmetric matrix with
even number of rows and columns can be expressed in exactly one way as a
polynomial function of the even degree symmetric functions of the eigenvalues
and the Pfaffian. Two skew-symmetric matrices A and B with the same even
numbers of rows and columns can be brought one to another, say B = F AF t ,
by some rotation matrix F , just when their even degree symmetric functions
and their Pfaffian agree.
Proof. If f(A) is a polynomial invariant under rotations, i.e. f(FAF^t) = f(A),
then we can write f(A) = h(t(a)), with a_1, a_2, \dots, a_n the numbers in the skew-symmetric normal form of A, and h some function. Lets write the restriction
of f to the normal form matrices as a polynomial f(a). We can split f into a
sum of homogeneous polynomials of various degrees, so lets assume that f is
already homogeneous of some degree. We can pick any monomial in f and sum
it over permutations and over changes of signs of any even number of variables,
and f will be a sum over such quantities. So we only have to consider each
such quantity, i.e. assume that
\[ f = \sum (\pm a_1)^{d_{p(1)}} (\pm a_2)^{d_{p(2)}} \cdots (\pm a_n)^{d_{p(n)}} \]
where the sum is over all choices of any even number of minus signs and all
permutations p of the degrees d_1, d_2, \dots, d_n. If all degrees d_1, d_2, \dots, d_n are even,
then f is an elementary symmetric function of a_1^2, a_2^2, \dots, a_n^2, so a polynomial
in t_1(a), t_2(a), \dots, t_{n-1}(a). If all degrees are odd, then they are all positive,
and we can divide out a factor of a1 a2 . . . an = tn (a). So lets assume that at
least one degree is even, say d1 , and that at least one degree is odd, say d2 . All
terms in the sum above that put a plus sign in front of a_1 and a_2 cancel those terms
which put a minus sign in front of both a1 and a2 . Similarly, terms putting
a minus sign in front of a1 and a plus sign in front of a2 cancel those which
do the opposite. So f = 0. Consequently, invariant polynomial functions f (A)
are polynomials in the Pfaffian and the symmetric functions of the squared
eigenvalues.
The characteristic polynomial of a 2n × 2n skew-symmetric matrix A is clearly
\[ \det\left(A - \lambda I\right) = \left(\lambda^2 + a_1^2\right)\left(\lambda^2 + a_2^2\right) \cdots \left(\lambda^2 + a_n^2\right), \]
so that
\[ s_{2j}(A) = s_j\left(a_1^2, a_2^2, \dots, a_n^2\right), \qquad s_{2j-1}(A) = 0, \]
for any j = 1, . . . , n. Consequently, invariant polynomial functions f (A) are
polynomials in the Pfaffian and the even degree symmetric functions of the
eigenvalues.
The Fast Formula for the Pfaffian
Since Pf (BAB t ) = det B Pf A, we can find the Pfaffian of a large matrix by a
sort of Gaussian elimination process, picking B to be a permutation matrix, or
a strictly lower triangular matrix, to move A one step towards skew-symmetric
normal form. Careful:
8.4∗ Prove that replacing A by BAB t , with B a permutation matrix, permutes
the rows and columns of A.
8.5∗ Prove that if B is a strictly lower triangular matrix which adds a multiple
of, say, row 2 to row 3, then BAB t is A with that row addition carried out,
and with the same multiple of column 2 added to column 3.
We leave the reader to formulate the obvious notion of Gaussian elimination
of skew-symmetric matrices to find the Pfaffian.
Factorizations
Chapter 9
Dual Spaces
In this chapter, we learn how to manipulate whole vector spaces, rather than just
individual vectors. Out of any abstract vector space, we will construct some new
vector spaces, giving algebraic operations on vector spaces rather than on vectors.
The Vector Space of Linear Maps Between Two Vector Spaces
If V and W are two vector spaces, then a linear map T : V → W is also
often called a homomorphism of vector spaces, or a homomorphism for short,
or a morphism to be even shorter. We won’t use this terminology, but we will
nevertheless write Hom (V, W ) for the set of all linear maps T : V → W .
Definition 9.1. A linear map is onto if every output w in W comes from some
input: w = T v, some input v in V. A linear map is 1-to-1 if any two distinct
vectors v1 ≠ v2 get mapped to distinct vectors T v1 ≠ T v2.
9.1 Turn Hom (V, W ) into a vector space.
9.2 Prove that a linear map is an isomorphism just when it is 1-to-1 and onto.
9.3 Give the simplest example you can of a 1-to-1 linear map which is not onto.
9.4 Give the simplest example you can of an onto linear map which is not
1-to-1.
9.5 Prove that a linear map is 1-to-1 just when its kernel consists precisely in
the zero vector.
The Dual Space
The simplest possible real vector space is K.
Definition 9.2. If V is a vector space, let V ∗ = Hom (V, K), i.e. V ∗ is the set
of linear maps T : V → K, i.e. the set of real-valued linear functions on V . We
call V ∗ the dual space of V .
We will usually write vectors in V with Roman letters, and vectors in V ∗
with Greek letters. The vectors in V ∗ are often called covectors.
If V = Kn , every linear function looks like
α(x) = a1 x1 + a2 x2 + · · · + an xn .
We can write this as
\[ \alpha(x) = \begin{pmatrix} a_1 & a_2 & \dots & a_n \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}. \]
So we will identify Kn∗ with the set of row matrices. We will write
e1 , e2 , . . . , en for the obvious basis: ei is the i-th row of the identity
matrix.
9.6 Why is V ∗ a vector space?
9.7 What is dim V ∗ ?
Remark 9.3. V and V ∗ have the same dimension, but we should think of them
as quite different vector spaces.
Lemma 9.4. Suppose that V is a vector space with basis v1 , v2 , . . . , vn . Pick any
numbers a1 , a2 , . . . , an ∈ K. There is precisely one linear function f : V → K
so that
f (v1 ) = a1 and f (v2 ) = a2 and . . . and f (vn ) = an .
Proof. The equations above uniquely determine a linear function f , by theorem 1.13 on page 11, since we have defined the linear function on a basis.
Lemma 9.5. Suppose that V is a vector space with basis v1 , v2 , . . . , vn . There
is a unique basis for V ∗ , called the basis dual to v1 , v2 , . . . , vn , which we will
write using Greek letters, say as ξ1 , ξ2 , . . . , ξn , so that
\[ \xi_i\left(v_j\right) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases} \]
Remark 9.6. The hard part is getting used to the notation: ξ1 , ξ2 , . . . , ξn are
each a linear function taking vectors from V to numbers: ξ1 , ξ2 , . . . , ξn : V → K.
Proof. For each fixed i, the equations uniquely define ξi as above. The functions
ξ1, ξ2, . . . , ξn are linearly independent, because if they satisfy a linear relation
\sum_i a_i \xi_i = 0, then applying the linear function \sum_i a_i \xi_i to the basis vector vj
must get zero, but we find that we get aj, so aj = 0, and this holds for
each j, so all numbers a1, a2, . . . , an vanish. The linear functions ξ1, ξ2, . . . , ξn
span V∗ because, if we have any linear function f on V, then we can set
a1 = f(v1), a2 = f(v2), . . . , an = f(vn), and find f(v) = \sum_j a_j \xi_j(v) for
v = v1 or v = v2, etc., and therefore for v any linear combination of v1, v2,
etc. Therefore f = \sum_j a_j \xi_j, and we see that these functions ξ1, ξ2, . . . , ξn span V∗.
9.8 Find the dual basis ξ1, ξ2, ξ3 to the basis
\[ v_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \in K^3. \]
9.9 Recall that we identified each vector in Kn∗ with a row matrix. Suppose
that v1 , v2 , . . . , vn is a basis for Kn . Let ξ1 , ξ2 , . . . , ξn be the dual basis. Let
F be the matrix whose columns are v1 , v2 , . . . , vn , and G be the matrix whose
rows are ξ1 , ξ2 , . . . , ξn . Prove that G = F −1 .
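Taking the statement of problem 9.9 for granted, the dual basis is easy to compute numerically: the covectors ξ1, ξ2, . . . , ξn are the rows of F^{-1}. A minimal sketch, not part of the text, in Python with numpy:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
F = rng.normal(size=(3, 3))   # columns: a (random) basis v_1, v_2, v_3 of R^3
G = np.linalg.inv(F)          # rows: the dual basis xi_1, xi_2, xi_3

# Defining property of the dual basis: xi_i(v_j) = 1 if i = j, else 0.
print(np.allclose(G @ F, np.eye(3)))   # True
\end{verbatim}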
Lemma 9.7. For any finite dimensional vector space V , V ∗∗ and V are isomorphic, by associating to each vector x from V the linear function fx from
V ∗∗ defined by
fx (α) = α(x).
Remark 9.8. This lemma is very confusing, but very simple, and therefore very
important.
Proof. First, lets ask what V ∗∗ means. Its vectors are linear functions on V ∗ ,
by definition. Next, lets pick a vector x in V and construct a linear function
on V ∗ . How? Take any covector α in V ∗ , and lets assign to it some number
f (α). Since α is (by definition again) a linear function on V , α(x) is a number.
Lets take the number f (α) = α(x). Lets call this function f = fx . The rest of
the proof is a series of exercises.
9.10 Check that fx is a linear function.
9.11 Check that the map T (x) = fx is a linear map T : V → V ∗∗ .
9.12 Check that T : V → V ∗∗ is one-to-one: i.e. if we pick two different vectors
x and y in V, then fx ≠ fy.
Remark 9.9. Although V and V ∗∗ are identified as above, V and V ∗ cannot be
identified in any “natural manner,” and should be thought of as different.
Definition 9.10. If T : V → W is a linear map, write T ∗ : W ∗ → V ∗ for the
linear map given by
T ∗ (α)(v) = α(T v).
Call T ∗ the transpose of T .
9.13 Prove that T ∗ : W ∗ → V ∗ is a linear map.
9.14 What does this notion of transpose have to do with the notion of transpose
of matrices?
Chapter 10
Singular Value Factorization
We will analyse statistical data, using the spectral theorem.
Principal Components
Consider a large collection of data coming in from some kind of measuring
equipment. Lets suppose that the data consists in a large number of vectors,
say vectors v1 , v2 , . . . , vN in Rn . How can we get a good rough description of
what these vectors look like?
We can take the mean of the vectors
\[ \mu = \frac{1}{N}\left(v_1 + v_2 + \cdots + v_N\right), \]
as a good description of where they lie. How do they arrange themselves
around the mean? To keep things simple, lets subtract the mean from each
of the vectors. So assume that the mean is µ = 0, and we are asking how the
vectors arrange themselves around the origin.
Imagine that these vectors v1 , v2 , . . . , vN tend to lie along a particular
line through the origin. Lets try to take an orthonormal basis of Rn , say
u1 , u2 , . . . , un , so that u1 points along that line. How can we find the direction
of that line? We look at the quantity ⟨vk, x⟩. If the vectors lie nearly on a line
through 0, then for x on that line, ⟨vk, x⟩ should be large positive or negative,
while for x perpendicular to that line, ⟨vk, x⟩ should be nearly 0. If we square,
we can make sure the large positive or negative becomes large positive, so we
take the quantity
\[ Q(x) = \langle v_1, x\rangle^2 + \langle v_2, x\rangle^2 + \cdots + \langle v_N, x\rangle^2. \]
The spectral theorem guarantees that we can pick an orthonormal basis u1 , u2 , . . . , un
of eigenvectors of the symmetric matrix A associated to Q. We will arrange
the eigenvalues λ1 , λ2 , . . . , λn from largest to smallest. Because Q(x) ≥ 0, we
see that none of the eigenvalues are negative. Clearly Q(x) grows fastest in the
direction x = u1 .
10.1 The symmetric matrix A associated to Q(x) (for which ⟨Ax, x⟩ = Q(x)
for every vector x) is
\[ A_{ij} = \langle v_1, e_i\rangle \langle v_1, e_j\rangle + \langle v_2, e_i\rangle \langle v_2, e_j\rangle + \cdots + \langle v_N, e_i\rangle \langle v_N, e_j\rangle. \]
If we rescale all of the vectors v1 , v2 , . . . , vN by the same nonzero scalar, then
the resulting vectors tend to lie along the same lines or planes as the original
vectors did. So it is convenient to replace Q(x) by the quadratic polynomial
function
\[ Q(x) = \frac{\sum_k \langle v_k, x\rangle^2}{\sum_\ell \left\| v_\ell \right\|^2}. \]
This has associated symmetric matrix
\[ A_{ij} = \frac{\sum_k \langle v_k, e_i\rangle \langle v_k, e_j\rangle}{\sum_\ell \left\| v_\ell \right\|^2}, \]
which we will call the covariance matrix associated to the data.
Lemma 10.1. Given any set of nonzero vectors v1 , v2 , . . . , vN in Rn , write
them as the columns of a matrix V . Their covariance matrix
\[ A = \frac{V V^t}{\sum_k \left\| v_k \right\|^2} \]
has an orthonormal basis of eigenvectors u1 , u2 , . . . , un with eigenvalues 1 ≥
λ1 ≥ λ2 ≥ · · · ≥ λn ≥ 0.
Remark 10.2. The square roots of the eigenvalues are the correlation coefficients,
each indicating how much the data tends to lie in the direction of the associated
eigenvector.
Proof. We have only to check that ⟨x, V V^t x⟩ = \sum_k \langle v_k, x\rangle^2, an exercise for
the reader, to see that A is the covariance matrix. Eigenvalues of A can't
be negative, as mentioned already. For any vector x of length 1, the Schwarz
inequality says that
\[ \sum_k \langle v_k, x\rangle^2 \le \sum_k \left\| v_k \right\|^2. \]
Therefore, by the minimum principle, eigenvalues of A can’t exceed 1.
Our data lies mostly along a line through 0 just when λ1 is large, and the
remaining eigenvalues λ2 , λ3 , . . . , λn are much smaller. More generally, if we
find that the first dozen or so eigenvalues are relatively large, and the rest
are relatively much smaller, then our data must lie very close to a subspace
of dimension a dozen or so. The data tends most strongly to lie along the u1
direction; fluctuations about that direction are mostly in the u2 direction, etc.
Every vector x can be written as x = a1 u1 + a2 u2 + · · · + an un , and
the numbers a1 , a2 , . . . , an are recovered from the formula ai = hx, ui i. If the
eigenvalues λ1 , λ2 , . . . , λd are relatively much larger than the rest, we can say
Figure 10.1: Applying principal components analysis to some data points. (a) Data points. The mean is marked as a cross. (b) The same data. Lines indicate the directions of eigenvectors. Vectors sticking out from the mean are drawn in those directions. The lengths of the vectors give the correlation coefficients.
that our data live near the subspace spanned by u1 , u2 , . . . , ud , and say that
our data has d effective dimensions. The numbers a1 , a2 , . . . , ad are called the
principal components of a vector x.
To store the data, instead of remembering all of the vectors v1 , v2 , . . . , vN ,
we just keep track of the eigenvectors u1 , u2 , . . . , ud , and of the principal components of the vectors v1 , v2 , . . . , vN . In matrices, this means that instead of
storing V , we store F = (u1 u2 . . . un ), and store the first d columns of F t V ;
let W be the matrix of these columns. Coming out of storage, we can approximately recover the vectors v1 , v2 , . . . , vN as the columns of F W . The matrix
F represents an orthogonal transformation putting the vectors v1 , v2 , . . . , vN
nearly into the subspace spanned by e1 , e2 , . . . , ed , and mostly along the e1
direction, with fluctuations mostly along the e2 direction, etc. So it is often
useful to take a look at the columns of W themselves, as a convenient picture
of the data.
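The storage scheme just described is easy to sketch in code. The following is a minimal sketch, not part of the text, in Python with numpy; the function name is ours, and here W holds the first d rows of F^t V (the principal components of each data vector), so that F restricted to its first d columns times W approximately recovers V.
\begin{verbatim}
import numpy as np

def principal_components(V, d):
    """V: n x N matrix whose columns are the (mean-subtracted) data vectors.
    Returns (F, W): F has the eigenvectors u_1, ..., u_n of the covariance
    matrix as columns, W the first d principal components of each vector."""
    A = V @ V.T / np.sum(V * V)          # the covariance matrix of the text
    lam, F = np.linalg.eigh(A)           # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]        # reorder largest to smallest
    lam, F = lam[order], F[:, order]
    W = F.T[:d] @ V                      # principal components
    return F, W

rng = np.random.default_rng(0)
V = rng.normal(size=(5, 200))
V -= V.mean(axis=1, keepdims=True)       # subtract the mean
F, W = principal_components(V, d=2)
print(np.linalg.norm(V - F[:, :2] @ W) / np.linalg.norm(V))  # relative error
\end{verbatim}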
Singular Value Factorization
Theorem 10.3. Every real matrix A can be written as A = U ΣV t , with U
and V orthogonal, and Σ has the same dimensions as A, with the form
\[ \Sigma = \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} \]
with D diagonal with nonnegative diagonal entries:
\[ D = \begin{pmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_r \end{pmatrix}. \]
Proof. Suppose that A is p × q. Just like when we worked out principal components, we order the eigenvalues
of A^t A from largest to smallest. For each eigenvalue λ_j, let σ_j = \sqrt{\lambda_j}. (Since we saw that the eigenvalues λ_j of A^t A
aren’t negative, the square root makes sense.) Let V be the matrix whose
columns are an orthonormal basis of eigenvectors of At A, ordered by eigenvalue.
Write
\[ V = \begin{pmatrix} V_1 & V_2 \end{pmatrix} \]
with V_1 the eigenvectors with positive eigenvalues, and V_2 those with 0 eigenvalue. For each nonzero eigenvalue, define a vector
\[ u_j = \frac{1}{\sigma_j} A v_j. \]
Suppose that there are r nonzero eigenvalues. Lets check that these vectors
u1 , u2 , . . . , ur are orthonormal.
\begin{align*}
\langle u_i, u_j\rangle
&= \left\langle \frac{1}{\sigma_i} A v_i, \frac{1}{\sigma_j} A v_j \right\rangle \\
&= \frac{1}{\sigma_i \sigma_j} \langle A v_i, A v_j\rangle \\
&= \frac{1}{\sigma_i \sigma_j} \left\langle v_i, A^t A v_j\right\rangle \\
&= \frac{1}{\sigma_i \sigma_j} \langle v_i, \lambda_j v_j\rangle \\
&= \frac{\lambda_j}{\sqrt{\lambda_i \lambda_j}} \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise} \end{cases} \\
&= \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}
\end{align*}
If there aren't enough vectors u_1, u_2, \dots, u_r to make up a basis (i.e. if r < p),
then just write down some more vectors to make up an orthonormal basis, say
vectors u_{r+1}, u_{r+2}, \dots, u_p, and let
\[ U_1 = \begin{pmatrix} u_1 & u_2 & \dots & u_r \end{pmatrix}, \quad U_2 = \begin{pmatrix} u_{r+1} & u_{r+2} & \dots & u_p \end{pmatrix}, \quad U = \begin{pmatrix} U_1 & U_2 \end{pmatrix}. \]
By definition of these u_j, A v_j = σ_j u_j, so A V_1 = U_1 D. Note also that A V_2 = 0, since each column v of V_2 has ⟨Av, Av⟩ = ⟨v, A^t A v⟩ = 0; therefore A V_1 V_1^t = A (V_1 V_1^t + V_2 V_2^t) = A V V^t = A. Calculate
\begin{align*}
U \Sigma V^t &= \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} V_1^t \\ V_2^t \end{pmatrix} \\
&= U_1 D V_1^t \\
&= A V_1 V_1^t \\
&= A.
\end{align*}
Corollary 10.4. Any square matrix A can be written as A = KP (the Cartan
decomposition, also called the polar decomposition), where K is orthogonal
and P is symmetric and positive semidefinite.
Proof. Write A = U ΣV t and set K = U V t and P = V ΣV t .
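The proof of the corollary is already an algorithm; a minimal sketch, not part of the text, in Python with numpy (the function name is ours):
\begin{verbatim}
import numpy as np

def polar(A):
    """Cartan (polar) decomposition A = K P following the corollary:
    write A = U Sigma V^t and set K = U V^t, P = V Sigma V^t."""
    U, s, Vt = np.linalg.svd(A)
    K = U @ Vt                       # orthogonal
    P = Vt.T @ np.diag(s) @ Vt       # symmetric positive semidefinite
    return K, P

A = np.array([[1., 2.], [3., 4.]])
K, P = polar(A)
print(np.allclose(K @ P, A), np.allclose(K.T @ K, np.eye(2)))  # True True
\end{verbatim}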
Chapter 11
Factorizations
Most theorems in linear algebra are obvious consequence of simple factorizations.
LU Factorization
Forward elimination is messy: we swap rows and add rows to lower rows. We
want to put together all of the row swaps into one permutation matrix, and all
of the row additions into one strictly lower triangular matrix.
Algorithm
To forward eliminate a matrix A (lets say with n rows), start by setting p to
be the permutation 1, 2, 3, . . . , n (the identity permutation), L = 1, U = A. To
start with, no entries of L are painted. Carry out forward elimination on U .
a. Each time you find a nonzero pivot in U , you paint a larger square box
in the upper left corner of L. (This number of rows in this painted box is
always the number of pivots in U .)
b. When you swap rows k and ℓ of U,
1. swap entries k and ℓ of the permutation p and
2. swap rows k and ℓ of L, but only swap unpainted entries which lie
beneath painted ones.
c. If you add s (row k) to row ℓ in U, then put −s into column k, row ℓ in
L.
The painted box in L is always square, with number of rows and columns equal
to the number of pivots drawn in U .
Remark 11.1. As each step begins, the pivot rows of U with nonzero pivots in
them are finished, and the entries inside the painted box and the entries on and
above the diagonal of L are finished.
Theorem 11.2. By following the algorithm above, every matrix A can be
written as A = P −1 LU where P is a permutation matrix of a permutation p, L
a strictly lower triangular matrix, and U an upper triangular matrix.
Figure 11.1: Computing the LU factorization (successive states of p, L and U).
Proof. Lets show that after each step, we always have P A = LU and always
have L strictly lower triangular. For the first forward elimination step, we might
have to swap rows. There is no painted box yet, so the algorithm says that
the row swap leaves all entries of L alone. Let Q be the permutation matrix of
the required row swap, and q the permutation. Our algorithm will pass from
p = 1, L = I, U = A to p = q, L = I, U = QA, and so P A = LU .
Next, we might have to add some multiples of the first row of U to lower
rows. We carry this out by a strictly lower triangular matrix, say
\[ S = \begin{pmatrix} 1 & 0 \\ s & I \end{pmatrix}, \]
with s a vector. Notice that
\[ S^{-1} = \begin{pmatrix} 1 & 0 \\ -s & I \end{pmatrix} \]
subtracts the corresponding multiples of row 1 from lower rows. So U becomes
U new = SU , while the permutation p (and hence the matrix P ) stays the same.
The matrix L becomes Lnew = S −1 L = S −1 , strictly lower triangular, and
P new A = Lnew U new .
Suppose that after some number of steps, we have reduced the upper left
corner of U to echelon form, say
\[ U = \begin{pmatrix} U_0 & U_1 \\ 0 & U_2 \end{pmatrix}, \]
with U_0 in echelon form. Suppose that
\[ L = \begin{pmatrix} L_0 & 0 \\ L_1 & 1 \end{pmatrix} \]
is strictly lower triangular, and that P is some permutation matrix. Finally,
suppose that P A = LU .
Our next step in forward elimination could be to swap rows k and ℓ in U,
and these we can assume are rows in the bottom of U , i.e. rows of U2 . Suppose
that Q is the permutation matrix of a transposition so that QU2 is U2 with the
appropriate rows swapped. In particular, Q2 = I since Q is the permutation
matrix of a transposition. Let
\begin{align*}
P^{\text{new}} &= \begin{pmatrix} I & 0 \\ 0 & Q \end{pmatrix} P, \\
L^{\text{new}} &= \begin{pmatrix} I & 0 \\ 0 & Q \end{pmatrix} L \begin{pmatrix} I & 0 \\ 0 & Q \end{pmatrix}, \\
U^{\text{new}} &= \begin{pmatrix} I & 0 \\ 0 & Q \end{pmatrix} U.
\end{align*}
Check that P new A = Lnew U new . Multiplying out:
\[ L^{\text{new}} = \begin{pmatrix} L_0 & 0 \\ Q L_1 & I \end{pmatrix}, \]
strictly lower triangular. The upper left corner L0 is the painted box. So Lnew
is just L with rows k and ℓ swapped under the painted box.
If we add s (row k) of U to row ℓ, this means multiplying by a strictly lower
triangular matrix, say S. Then P A = LU implies that P A = LS^{-1} SU. But
LS^{-1} is just L with s (column ℓ) subtracted from column k.
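The bookkeeping above is easy to turn into code. The following is a minimal sketch, not the text's exact painting scheme: it uses standard partial pivoting, assumes A is square, and records the multipliers directly in a lower triangular L with ones on the diagonal, producing P A = LU, i.e. A = P^{-1} LU.
\begin{verbatim}
import numpy as np

def plu(A):
    """Return (P, L, U) with P a permutation matrix, L lower triangular
    with ones on the diagonal, U upper triangular, and P A = L U."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    P, L, U = np.eye(n), np.eye(n), A.copy()
    for k in range(n - 1):
        piv = k + np.argmax(np.abs(U[k:, k]))
        if U[piv, k] == 0:
            continue                          # no pivot in this column
        if piv != k:
            U[[k, piv], :] = U[[piv, k], :]   # swap rows of U and P,
            P[[k, piv], :] = P[[piv, k], :]   # and the multipliers in L
            L[[k, piv], :k] = L[[piv, k], :k]
        for i in range(k + 1, n):
            s = U[i, k] / U[k, k]
            L[i, k] = s                       # record the multiplier
            U[i, :] -= s * U[k, :]
    return P, L, U

A = np.array([[0., 1.], [1., 0.]])
P, L, U = plu(A)
print(np.allclose(P @ A, L @ U))              # True
\end{verbatim}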
11.1 Find the LU-factorization of each of
\[ A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 1 \end{pmatrix}, \quad C = \begin{pmatrix} 1 \end{pmatrix}. \]
11.2∗ Suppose that A is an invertible matrix. Prove that any two LU-factorizations
of A which have the same permutation matrix P must be the same.
Tensors
Chapter 12
Quadratic Forms
Quadratic forms generalize the concept of inner product, and play a crucial role in
modern physics.
Bilinear Forms
Definition 12.1. A bilinear form on a vector space V is a rule associating to each
pair of vectors x and y from V a number b(x, y) which is linear as a function
of x for each fixed y and also linear as a function of y for each fixed x.
If V = R, then every bilinear form on V is b(x, y) = cxy where c could
be any constant.
Every inner product on a vector space is a bilinear form. Moreover,
a bilinear form b on V is an inner product just when it is symmetric
(b(v, w) = b(w, v)) and positive definite (b(v, v) > 0 unless v = 0).
If V = Rp , each p × p matrix A determines a bilinear map b by the
rule b(v, w) = ⟨v, Aw⟩. Conversely, given a bilinear form b on Rp , we
can define a matrix A by setting Aij = b (ei , ej ), and then clearly if we
expand out,
\begin{align*}
b(x, y) &= \sum_{ij} x_i y_j\, b\left(e_i, e_j\right) \\
&= \sum_{ij} x_i y_j A_{ij} \\
&= \langle x, A y\rangle.
\end{align*}
So every bilinear form on Rp has the form b(x, y) = ⟨x, Ay⟩ for a
uniquely determined matrix A.
Of course, we can add and scale bilinear forms in the obvious way to make
more bilinear forms, and the bilinear forms on a fixed vector space V form a
vector space.
Lemma 12.2. Fix a basis v1 , v2 , . . . , vp of V . Given any collection of numbers
bij (with i, j = 1, 2, . . . , p), there is precisely one bilinear form b with bij =
b (vi , vj ). Thus the vector space of bilinear forms on V is isomorphic to the
vector space of p × p matrices (bij ).
Proof. Given any bilinear form b we can calculate out the numbers b_{ij} = b(v_i, v_j).
Conversely, given any numbers b_{ij}, and vectors x = \sum_i x_i v_i and y = \sum_j y_j v_j, we can let
\[ b(x, y) = \sum_{i,j} b_{ij} x_i y_j. \]
Clearly adding bilinear forms adds the associated numbers bij , and scaling
bilinear forms scales those numbers.
Lemma 12.3. Let B be the set of all bilinear forms on V. If V has finite
dimension, then dim B = (dim V)^2.
Proof. There are (dim V)^2 numbers b_{ij}.
Review problems
12.1 Let V be any vector space. Prove that b(x0 + y0 , x1 + y1 ) = y0 (x1 ) is a
bilinear form on V ⊕ V ∗ .
12.2∗ What are the numbers bij if b(x, y) = ⟨x, y⟩ (the usual inner product on
Rn )?
12.3∗ Suppose that V is a finite dimensional vector space, and let B be the
vector space of all bilinear forms on V . Prove that B is isomorphic to the vector
space Hom (V, V ∗ ), by the isomorphism F : Hom (V, V ∗ ) → B given by taking
each linear map T : V → V ∗ , to the bilinear form b given by b(v, w) = (T w)(v).
(This gives another proof that dim B = (dim V)^2.)
12.4∗ A bilinear form b on V is degenerate if b(x, y) = 0 for all x for some fixed
y, or for all y for some fixed x.
a. Give the simplest example you can of a nonzero bilinear form which is
degenerate.
b. Give the simplest example you can of a bilinear form which is nondegenerate.
12.5∗ Let V be the vector space of all 2 × 2 matrices. Let b(A, B) = tr AB.
Prove that b is a nondegenerate bilinear form.
12.6∗ Let V be the vector space of polynomial functions of degree at most 2.
For each of the expressions
(a) \[ b(p(x), q(x)) = \int_{-1}^{1} p(x) q(x)\, dx, \]
(b) \[ b(p(x), q(x)) = \int_{-\infty}^{\infty} p(x) q(x) e^{-x^2}\, dx, \]
(c) \[ b(p(x), q(x)) = p(0)\, q'(0), \]
(d) \[ b(p(x), q(x)) = p(1) + q(1), \]
(e) \[ b(p(x), q(x)) = p(0) q(0) + p(1) q(1) + p(2) q(2), \]
is b bilinear? Is b degenerate?
Quadratic Forms
Definition 12.4. A bilinear form b on a vector space V is symmetric if b(x, y) =
b(y, x) for all x and y in V . A bilinear form b on a vector space V is positive
definite if b(x, x) > 0 for all x 6= 0.
12.7∗ Which of the bilinear forms in problem 12.6 are symmetric?
Definition 12.5. The quadratic form Q of a bilinear form b on a vector space
V is the real-valued function Q(x) = b(x, x).
The squared length of a vector in Rn ,
\[ Q(x) = \| x \|^2 = \langle x, x\rangle = \sum_i x_i^2 \]
is the quadratic form of the inner product on Rn .
Every quadratic form on Rn has the form
\[ Q(x) = \sum_{ij} A_{ij} x_i x_j, \]
for some numbers Aij = Aji . We could make a symmetric matrix A
with those numbers as entries, so that Q(x) = ⟨x, Ax⟩. The symmetric
matrix A is uniquely determined by the quadratic form Q and uniquely
determines Q.
12.8∗ What are the quadratic forms of the bilinear forms in problem 12.6 on
the preceding page?
Lemma 12.6. Every quadratic form Q determines a symmetric bilinear form
b by
\[ b(x, y) = \frac{1}{2}\left( Q(x + y) - Q(x) - Q(y) \right). \]
Moreover, Q is the quadratic form of b.
Proof. There are various identities we have to check on b to ensure that b is a
bilinear form. Each identity involves a finite number of vectors. Therefore it
suffices to prove the result over a finite dimensional vector space V (replacing V
by the span of the vectors involved in each identity). Be careful: the identities
have to hold for all vectors from V , but we can first pick vectors from V , and
then replace V by their span and then check the identity.
Since we can assume that V is finite dimensional, we can take a basis for V
and therefore assume that V = Rn . Therefore we can write
\[ Q(x) = \langle x, A x\rangle, \]
for a symmetric matrix A. Expanding out,
\begin{align*}
b(x, y) &= \frac{1}{2}\left( \langle x + y, A(x + y)\rangle - \langle x, A x\rangle - \langle y, A y\rangle \right) \\
&= \langle x, A y\rangle,
\end{align*}
which is clearly bilinear.
12.9∗ The results of this chapter are still true over any field (although we won’t
try to make sense of being positive definite if our field is not R), except for
lemma 12.6. Find a counterexample to lemma 12.6 over the field of Boolean
numbers.
Theorem 12.7. The equation
\[ b(x, y) = \frac{1}{2}\left( Q(x + y) - Q(x) - Q(y) \right) \]
gives an isomorphism between the vector space of symmetric bilinear forms b
and the vector space of quadratic forms Q.
The proof is obvious just looking at the equation: if you scale the left side,
then you scale the right side, and vice versa, and similarly if you add bilinear
forms on the left side, you add quadratic forms on the right side.
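The polarization identity of theorem 12.7 is easy to check numerically; a minimal sketch, not part of the text, in Python with numpy:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
A = (A + A.T) / 2                       # a symmetric matrix

Q = lambda x: x @ A @ x                 # the quadratic form Q(x) = <x, Ax>

x, y = rng.normal(size=4), rng.normal(size=4)
b = 0.5 * (Q(x + y) - Q(x) - Q(y))      # the bilinear form recovered from Q
print(np.isclose(b, x @ A @ y))         # True: b(x, y) = <x, Ay>
\end{verbatim}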
Sylvester’s Law of Inertia
Theorem 12.8 (Sylvester’s Law of Inertia). Given a quadratic form Q on a
finite dimensional vector space V , there is an isomorphism F : V → Rn for
which
\[ Q(x) = \underbrace{x_1^2 + x_2^2 + \cdots + x_p^2}_{p \text{ positive terms}}\ \underbrace{{}- x_{p+1}^2 - x_{p+2}^2 - \cdots - x_{p+q}^2}_{q \text{ negative terms}}, \]
where
\[ F x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}. \]
We cannot by any linear change of variables alter the value of p (the number of
positive terms) or the value of q (the number of negative terms).
Remark 12.9. Sylvester’s Law of Inertia tells us what all quadratic forms look
like, if we allow ourselves to change variables. The numbers p and q are the only
invariants. The reader should keep in mind that in our study of the spectral
theorem, we only allowed orthogonal changes of variable, so we got eigenvalues
as invariants. But here we allow any linear change of variable; in particular we
can rescale, so only the signs of the eigenvalues are invariant.
Proof. We could apply the spectral theorem, but we will instead use elementary
algebra. Take any basis for V , so that we can assume that V = Rn , and that
Q(x) = ⟨x, Ax⟩ for some symmetric matrix A. In other words,
\[ Q(x) = A_{11} x_1^2 + A_{12} x_1 x_2 + \cdots. \]
Suppose that A11 6= 0. Lets collect together all terms containing x1 and
complete the square
\[
A_{11} x_1^2 + \sum_{j>1} A_{1j} x_1 x_j + \sum_{i>1} A_{i1} x_i x_1
= A_{11}\left( x_1 + \frac{1}{A_{11}} \sum_{j>1} A_{1j} x_j \right)^2 - \frac{1}{A_{11}} \left( \sum_{j>1} A_{1j} x_j \right)^2.
\]
Let
\[ y_1 = x_1 + \frac{1}{A_{11}} \sum_{j>1} A_{1j} x_j. \]
Then
\[ Q(x) = A_{11} y_1^2 + \dots \]
where the . . . involve only x2 , x3 , . . . , xn . Changing variables to use y1 in place
of x1 is an invertible linear change of variables, and gets rid of nondiagonal
terms involving x1 .
We can continue this process using x2 instead of x1 , until we have used
up all variables xi with Aii 6= 0. So lets suppose that all diagonal terms of A
vanish. If there is some nondiagonal term of A which doesn’t vanish, say A12 ,
then make new variables y_1 = x_1 + x_2 and y_2 = x_1 - x_2, so x_1 = \frac{1}{2}(y_1 + y_2)
and x_2 = \frac{1}{2}(y_1 - y_2). Then x_1 x_2 = \frac{1}{4}\left(y_1^2 - y_2^2\right), turning the A_{12} x_1 x_2 term into
So now we have killed off all nondiagonal terms, so we can assume that
\[ Q(x) = A_{11} x_1^2 + A_{22} x_2^2 + \cdots + A_{nn} x_n^2. \]
We can rescale x_1 by any nonzero constant c, which scales A_{11} by 1/c^2. Lets
choose c so that c^2 = \pm A_{11}.
12.10∗ Apply this method to the quadratic form
\[ Q(x) = x_1^2 + x_1 x_2 + x_2 x_1 + x_2 x_3 + x_3 x_2. \]
Next we have to show that the numbers p of positive terms and q of negative
terms cannot be altered by any linear change of variables. We can assume (by
using the linear change of variables we have just constructed) that V = Rn and
that
\[ Q(x) = x_1^2 + x_2^2 + \cdots + x_p^2 - x_{p+1}^2 - x_{p+2}^2 - \cdots - x_{p+q}^2. \]
We want to show that p is the largest dimension of any subspace on which Q is
positive definite, and similarly that q is the largest dimension of any subspace
on which Q is negative definite. Consider the subspace V+ of vectors of the
form
\[ x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \]
Clearly Q is positive definite on V+ . Similarly, Q is negative definite on the
subspace V− of vectors of the form
\[ x = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ x_{p+1} \\ x_{p+2} \\ \vdots \\ x_{p+q} \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \]
Suppose that we can find some subspace W of V of greater dimension than p,
so that Q is positive definite on W. Let P_+ be the orthogonal projection to V_+.
In other words, for any vector x in R^n, let
\[ P_+ x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \]
Then P+ |W : W → V+ is a linear map, and dim W > dim V+ , so
\begin{align*}
\dim W &= \dim \ker P_+|_W + \dim \operatorname{im} P_+|_W \\
&\le \dim \ker P_+|_W + \dim V_+ \\
&< \dim \ker P_+|_W + \dim W,
\end{align*}
so, subtracting dim W from both sides,
0 < dim ker P+ |W .
Therefore there is a nonzero vector x in W for which P+ x = 0, i.e.
\[ x = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ x_{p+1} \\ x_{p+2} \\ \vdots \\ x_{p+q} \\ x_{p+q+1} \\ \vdots \\ x_n \end{pmatrix}. \]
Clearly Q(x) > 0 since x lies in W . But clearly
\[ Q(x) = -x_{p+1}^2 - x_{p+2}^2 - \cdots - x_{p+q}^2 \le 0, \]
a contradiction.
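Since only the signs of the eigenvalues survive a linear change of variables, the invariants p and q can be read off numerically from the eigenvalues of the symmetric matrix of Q. A minimal sketch, not the completion-of-squares of the proof, in Python with numpy (the function name is ours):
\begin{verbatim}
import numpy as np

def signature(A, tol=1e-10):
    """Return (p, q) of Sylvester's law for Q(x) = <x, Ax>, A symmetric."""
    eigenvalues = np.linalg.eigvalsh(A)
    p = int(np.sum(eigenvalues > tol))
    q = int(np.sum(eigenvalues < -tol))
    return p, q

# Q(x) = x1^2 - x2^2 on R^2:
print(signature(np.diag([1.0, -1.0])))   # (1, 1)
\end{verbatim}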
Remark 12.10. Much of the proof works over any field, as long as we can divide
by 2, i.e. as long as 2 ≠ 0. However, there could be a problem when we try to
rescale: even if 2 ≠ 0, we can only arrange
\[ Q(x) = \varepsilon_1 x_1^2 + \varepsilon_2 x_2^2 + \cdots + \varepsilon_n x_n^2, \]
where each εi can be rescaled by any nonzero number of the form c2 . (There is
no reasonable analogue of the numbers p and q in a general field). In particular,
since every complex number has a square root, the same theorem is true for
complex quadratic forms, but in the stronger form that we can arrange q = 0,
i.e. we can arrange
\[ Q(x) = x_1^2 + x_2^2 + \cdots + x_p^2. \]
12.11∗ Prove that a quadratic form on any real n-dimensional vector space is
nondegenerate just when p + q = n, with p and q as in Sylvester’s law of inertia.
12.12∗ For a complex quadratic form, prove that if we arrange our quadratic
form to be
\[ Q(x) = x_1^2 + x_2^2 + \cdots + x_p^2, \]
then we are stuck with the resulting value of p, no matter what linear change
of variables we employ.
Kernel and Null Cone
Definition 12.11. Take a vector space V . The kernel of a symmetric bilinear
form b on V is the set of all vectors x in V for which b(x, y) = 0 for any vector
y in V .
12.13∗ Find the kernel of the symmetric bilinear form b(x, y) = x1 y1 − x2 y2
for x and y in R3 .
Definition 12.12. The null cone of a symmetric bilinear form b(x, y) is the set
of all vectors x in V for which b(x, x) = 0.
The vector
\[ x = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \]
lies in the null cone of the symmetric bilinear form b(x, y) = x_1 y_1 - x_2 y_2
for x and y in R^2. Indeed the null cone of that symmetric bilinear form
is 0 = b(x, x) = x_1^2 - x_2^2, so it's the pair of lines x_1 = x_2 and x_1 = -x_2
in R^2.
12.14 Prove that the kernel of a symmetric bilinear form lies in its null cone.
12.15∗ Find the null cone of the symmetric bilinear form b(x, y) = x1 y1 − x2 y2
for x and y in R3 . What part of the null cone is the kernel?
12.16∗ Prove that the kernel is a subspace. For which symmetric bilinear forms
is the null cone a subspace? For which symmetric bilinear forms is the kernel
equal to the null cone?
Review problems
12.17∗ Prove that the kernel of a symmetric bilinear form consists precisely in
the vectors x for which Q(x + y) = Q(y) for all vectors y, with Q the quadratic
form of that bilinear form. (In other words, we can translate in the x direction
without altering the value of the function Q(y).)
Orthonormal Bases
Definition 12.13. If b is a nondegenerate symmetric bilinear form on a vector
space V, a basis v_1, v_2, \dots, v_n for V is called orthonormal for b if
\[ b\left(v_i, v_j\right) = \begin{cases} \pm 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases} \]
Corollary 12.14. Any nondegenerate symmetric bilinear form on any finite dimensional vector space has an orthonormal basis.
Proof. Take the linear change of variables guaranteed by Sylvester's law of
inertia, and then the standard basis will be orthonormal.
12.18 Find an orthonormal basis for the quadratic form b(x, y) = x1 y2 + x2 y1
on R2 .
12.19 Suppose that b is a symmetric bilinear form on finite dimensional vector
space V .
(a) For each vector x in V , define a linear map ξ : V → R by ξ(y) = b(x, y).
Write this covector ξ as ξ = T x. Prove that the map T : V → V ∗ given
by ξ = T x is linear.
(b) Prove that the kernel of T is the kernel of b.
(c) Prove that T is an isomorphism just when b is nondegenerate. (The moral
of the story is that a nondegenerate symmetric bilinear form b identifies
the vector space V with its dual space V ∗ via the map T .)
(d) If b is nondegenerate, prove that for each covector ξ, there is a unique
vector x in V so that b(x, y) = ξ(y) for every vector y in V .
Review problems
12.20∗ What is the linear map T of problem 12.19 (i.e. write down the associated matrix) for each of the following symmetric bilinear forms?
(a) b(x, y) = x1 y2 + x2 y1 on R2
(b) b(x, y) = x1 y1 + x2 y2 on R2
(c) b(x, y) = x1 y1 − x2 y2 on R2
(d) b(x, y) = x1 y1 − x2 y2 − x3 y3 − x4 y4 on R4
(e) b(x, y) = x1 (y1 + y2 + y3 ) + (x1 + x2 + x3 ) y1
Chapter 13
Tensors and Indices
In this chapter, we define the concept of a tensor in Rn , following an approach common
in physics.
What is a Tensor?
Vectors x have entries x_i:
\[ x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}. \]
A matrix A has entries Aij . To describe the entries of a vector x, we use a single
index, while for a matrix we use two indices. A tensor is just an object whose
entries have any number of indices. (Entries will also be called components.)
In a physics course, one learns that stress applied to a crystal causes
an electric field (see Feynman et. al. [2] II-31-12). The electric field
is a vector E (at each point in space) with components Ei , while the
stress is a symmetric matrix S with components Sij = Sji . These are
related by the piezoelectric tensor P , which has components Pijk :
\[ E_i = \sum_{jk} P_{ijk} S_{jk}. \]
Just as a matrix is a rectangle of numbers, a tensor with three indices
is a box of numbers.
For this chapter, all of our tensors will be “tensors in Rn ,” which means
that the indices all run from 1 to n. For example, our vectors are literally in
Rn , while our matrices are n × n, etc.
The subject of tensors is almost trivial, since there is really nothing much
we can say in any generality about them. There are two subtle points: upper
versus lower indices and summation notation.
Upper versus lower indices
It is traditional (following Einstein) to write components of vectors x not as x_i
but as x^i, so not as
\[ x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \]
but instead as
\[ x = \begin{pmatrix} x^1 \\ x^2 \\ \vdots \\ x^n \end{pmatrix}. \]
In particular, x^2 doesn't mean x · x, but means the second entry of x. Next we
write elements of R^{n*} as
\[ y = \begin{pmatrix} y_1 & y_2 & \dots & y_n \end{pmatrix}, \]
with indices down. We will call the elements of R^{n*} covectors. Finally, we write
matrices as
\[ A = \begin{pmatrix} A^1_1 & A^1_2 & \dots & A^1_q \\ A^2_1 & A^2_2 & \dots & A^2_q \\ \vdots & \vdots & \ddots & \vdots \\ A^p_1 & A^p_2 & \dots & A^p_q \end{pmatrix}, \]
so A^{\text{row}}_{\text{column}}. In general, a tensor can have as many upper and lower indices as
we need, and we will treat upper and lower indices as being different.
For example, the components of a matrix look like A^i_j, never like A_{ij} or A^{ij},
which would represent tensors of a different type.
Summation Notation
Following Einstein further, whenever we write an expression with some letter
appearing once as an upper index and once as a lower index, like A^i_j x^j, this
means \sum_j A^i_j x^j, i.e. a sum is implicitly understood over the repeated j index.
We will often refer to a vector x as x^i. This isn't really fair, since it confuses
a single component x^i of a vector with the entire vector, but it is standard.
Similarly, we write a matrix A as A^i_j and a tensor with 2 upper and 3 lower
indices as t^{ij}_{klm}.
The names of the indices have no significance and will usually change during
the course of calculations.
Operations
What can we do with tensors? Very little. At first sight, they look complicated.
But there are very few operations on tensors. We can
a. Add tensors that have the same numbers of upper and of lower indices;
for example add s^{ijk}_{lm} to t^{ijk}_{lm} to get s^{ijk}_{lm} + t^{ijk}_{lm}.
If the tensors are vectors or matrices, this is just adding in the usual way.
b. Scale; for example
3 t^{ij}_{klm} means the obvious thing: triple each component of t^{ij}_{klm}.
c. Swap indices of the same type; for example, take a tensor t^i_{jk} and make
the tensor t^i_{kj}. There is no nice notation for doing this.
d. Take tensor product: just write down two tensors beside one another, with
distinct indices; for example, the tensor product of s^i_j and t^i_j is s^i_j t^k_l.
Note that we have to change the names on the indices of t before we write
it down, so that we don’t use the same index names twice.
e. Finally, contract: take any one upper index, and any one lower index, and
set them equal and sum. For example, we can contract the i and k indices
of a tensor t^i_{jk} to produce t^i_{ji}. Note that t^i_{ji} has only one free index j, since
the summation convention tells us to sum over all possibilities for the i
index. So t^i_{ji} is a covector.
In tensor calculus there are some additional operations on tensor quantities
(various methods for differentiating and integrating), and these additional operations are essential to physical applications, but tensor calculus is not in the
algebraic spirit of this book, so we will never consider any other operations than
those listed above.
If x^i is a vector and y_i a covector, then we can't add them, because the
indices don't match. But we can take their tensor product x^i y_j, and
then we can contract to get x^i y_i. This is of course just y(x), thinking
of every covector y as a linear function on R^n.
If x^i is a vector and A^i_j is a matrix, then their tensor product is A^i_j x^k, and
contracting gives A^i_j x^j, which is the vector Ax. So matrix multiplication
is tensor product followed by contraction.
If A^i_j and B^i_j are two matrices, then A^i_k B^k_j is the matrix AB. Similarly,
A^k_j B^i_k is the matrix BA. These are the two possible contractions of the
tensor product A^i_j B^k_l.
A matrix A^i_j has only one contraction: A^i_i, the trace, which is a number
(because it has no free indices).
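These index operations are exactly what numpy's einsum notation expresses; a small illustration, not part of the text:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
x = rng.normal(size=3)

# Tensor product followed by contraction, with explicit indices:
print(np.allclose(np.einsum('ij,j->i', A, x), A @ x))     # A^i_j x^j  = Ax
print(np.allclose(np.einsum('ik,kj->ij', A, B), A @ B))   # A^i_k B^k_j = AB
print(np.allclose(np.einsum('kj,ik->ij', A, B), B @ A))   # A^k_j B^i_k = BA
print(np.isclose(np.einsum('ii->', A), np.trace(A)))      # A^i_i = tr A
\end{verbatim}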
It is standard in working with tensors to write the entries of the identity
matrix not as I^i_j but as δ^i_j:
\[ \delta^i_j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases} \]
The trace of the identity matrix is δ^i_i = n, since for each value of i
we add one. Tensor products of the identity matrix give many other
tensors, like δ^i_j δ^k_l.
13.1 Take the tensor product of the identity matrix and a covector ξ, and
simplify all possible contractions.
If a tensor has various lower indices, we can average over all permutations of them. This process is called symmetrizing over indices. For
example, a tensor t_{ijk} can be symmetrized to a tensor \frac{1}{2} t_{ijk} + \frac{1}{2} t_{ikj}. Obviously we can also symmetrize over any two upper indices. But we
can't symmetrize over an upper and a lower index.
If we fix our attention on a pair of indices, we can also antisymmetrize
over them, say taking a tensor t_{jk} and producing
\[ \frac{1}{2} t_{jk} - \frac{1}{2} t_{kj}. \]
A tensor is symmetric in some indices if it doesn’t change when they
are permuted, and is antisymmetric in those indices if it changes by the
sign of the permutation when the indices are permuted.
Again focusing on just two lower indices, we can split any tensor into a
sum
\[ t_{jk} = \underbrace{\frac{1}{2} t_{jk} + \frac{1}{2} t_{kj}}_{\text{symmetric}} + \underbrace{\frac{1}{2} t_{jk} - \frac{1}{2} t_{kj}}_{\text{antisymmetric}} \]
of a symmetric part and an antisymmetric part.
13.2∗ Suppose that a tensor t_{ijk} is symmetric in i and j, and antisymmetric
in j and k. Prove that t_{ijk} = 0.
Of course, we write a tensor as 0 to mean that all of its components are 0.
Lets look at some tensors with lots of indices. Working in R^3, define a
tensor ε by setting
\[ \varepsilon_{ijk} = \begin{cases} 1 & \text{if } i, j, k \text{ is an even permutation of } 1, 2, 3, \\ -1 & \text{if } i, j, k \text{ is an odd permutation of } 1, 2, 3, \\ 0 & \text{if } i, j, k \text{ is not a permutation of } 1, 2, 3. \end{cases} \]
Of course, i, j, k fails to be a permutation just when two or three of i, j
or k are equal. For example,
ε123 = 1, ε221 = 0, ε321 = −1, ε222 = 0.
13.3 Take three vectors x, y and z in R3 , and calculate the contraction
ε_{ijk} x^i y^j z^k.
13.4∗ Prove that every tensor t_{ijk} which is antisymmetric in all lower
indices is a constant multiple t_{ijk} = c\, ε_{ijk}.
More generally, working in R^n, we can define a tensor ε by
\[ \varepsilon_{i_1 i_2 \dots i_n} = \begin{cases} 1 & \text{if } i_1, i_2, \dots, i_n \text{ is an even permutation of } 1, 2, \dots, n, \\ -1 & \text{if } i_1, i_2, \dots, i_n \text{ is an odd permutation of } 1, 2, \dots, n, \\ 0 & \text{if } i_1, i_2, \dots, i_n \text{ is not a permutation of } 1, 2, \dots, n. \end{cases} \]
Note that
\[ \varepsilon_{i_1 i_2 \dots i_n} A^{i_1}_1 A^{i_2}_2 \dots A^{i_n}_n = \det A, \]
for a matrix A.
Changing Variables
Lets change variables from x to y = F x, with F an invertible matrix. We want
to make a corresponding tensor F∗ t out of any tensor t, so that sums, scalings,
tensor products, and contractions will correspond, and so that on vectors F∗ x
will be F x. These requirements will determine what F∗ t has to be.
Lets start with a covector ξ_i. We need to preserve the contraction ξ(x) on
any vector x, so we need to have F_*ξ(F_*x) = ξ(x). Replacing the vector x by some
vector y = Fx, we get F_*ξ(y) = ξ(F^{-1}y), for any vector y, which identifies
F_*ξ as
\[ F_*\xi = \xi F^{-1}. \]
In indices:
\[ \left(F_*\xi\right)_i = \xi_j \left(F^{-1}\right)^j_i. \]
In other words, F_*ξ is ξ contracted against F^{-1}. So vectors transform as
F_*x = Fx (contract with F), and covectors as F_*ξ = ξF^{-1} (contract with
F^{-1}). We can contract any tensor with as many vectors and covectors as needed
to form a number; in order to preserve these contractions, the tensor's upper
indices must transform like vectors, and its lower indices like covectors, when
we carry out F_*. For example,
\[ \left(F_* t\right)^{ij}_k = F^i_p F^j_q\, t^{pq}_r \left(F^{-1}\right)^r_k. \]
In other words, we contract one copy of F with each upper index and contract
one copy of F^{-1} with each lower index.
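This transformation rule is easy to write with einsum; a minimal sketch, not part of the text, pushing forward a tensor t^{ij}_k and checking that a contraction of it transforms like a vector:
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
n = 3
F = rng.normal(size=(n, n))             # an invertible change of variables
Finv = np.linalg.inv(F)
t = rng.normal(size=(n, n, n))          # a tensor t^{ij}_k

# (F_* t)^{ij}_k = F^i_p F^j_q t^{pq}_r (F^{-1})^r_k
Ft = np.einsum('ip,jq,pqr,rk->ijk', F, F, t, Finv)

# The contraction t^{ij}_j (a vector) transforms like a vector:
print(np.allclose(np.einsum('ijj->i', Ft), F @ np.einsum('ijj->i', t)))
\end{verbatim}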
For example, lets see the invariance of contraction under F∗ in the simplest
case of a matrix.
\begin{align*}
\left(F_* A\right)^i_i &= F^i_j A^j_k \left(F^{-1}\right)^k_i \\
&= A^j_k \left(F^{-1}\right)^k_i F^i_j \\
&= A^j_k \left(F^{-1} F\right)^k_j \\
&= A^j_k \delta^k_j \\
&= A^j_j,
\end{align*}
since A^j_k δ^k_j has a sum over j and k, but each term vanishes unless j = k, in
which case we find A^j_j being added.
13.5 Prove that both of the contractions of a tensor t^i_{jk} are preserved by linear
change of variables.
13.6∗ Find F∗ ε. (Recall the tensor ε from example 13 on the previous page.)
Remark 13.1. Note how abstract the subject is: it is rare that we would write
down examples of tensors, with actual numbers in their entries. It is more
common that we think about tensors as abstract algebraic gadgets for storing
multivariate data from physics. Writing down examples, in this rarified air,
would only make the subject more confusing.
Two Problems
Two problems:
a. How can you tell if two tensors can be made equal by a linear change of
variable?
b. How can a tensor be split into a sum of tensors, in a manner that is
unaltered by linear change of variables?
The first problem has no solution. For a matrix, the answer is Jordan normal
form (theorem 4.1 on page 44). For a quadratic form, the answer is Sylvester’s
law of inertia (theorem 12.8 on page 111). But a general tensor can store an
enormous amount of information, and no one knows how to describe it. We can
give some invariants of a tensor, to help to distinguish tensors. These invariants
are similar to the symmetric functions of the eigenvalues of a matrix.
The second problem is much easier, and we will provide a complete solution.
Tensor Invariants
The contraction of indices is invariant under F∗ , as is tensor product. So given
a tensor t, we can try to write down some invariant numbers out it by taking
any number of tensor products of that tensor with itself, and any number of
contractions until there are no indices left.
For example, a matrix A^i_j has among its invariants the numbers
\[ A^i_i, \quad A^i_j A^j_i, \quad A^i_j A^j_k A^k_i, \quad \dots \]
In the notation of earlier chapters, these numbers are
tr A, tr A^2, tr A^3, . . .
i.e. the functions we called pk (A) in example 7 on page 74. We already know
from that chapter that every real-valued polynomial invariant of a matrix is a
function of p1 (A), p2 (A), . . . , pn (A). More generally, all real-valued polynomial
invariants of a tensor are polynomial functions of those obtained by taking some
number of tensor products followed by some number of contractions. (The proof
is very difficult; see Olver [6] and Procesi [7]).
13.7∗ Describe as many invariants of a tensor t^{ij}_{kl} as you can.
It is difficult to decide how many of these invariants you need to write down
before you can be sure that you have a complete set, in the sense that every
invariant is a polynomial function of the ones you have written down. General
theorems (again see Olver [6] and Procesi [7]) ensure that eventually you will
produce a finite complete set of invariants.
If a tensor has more lower indices than upper indices, then so does every
tensor product of it with itself any number of times, and so does every contraction. Therefore there are no polynomial invariants of such tensors. Similarly if
there are more upper indices than lower indices, then there are no polynomial
invariants. For example, a vector has no polynomial invariants. Similarly, a
quadratic form $Q_{ij} = Q_{ji}$ has no upper indices, so has no polynomial invariants.
(This agrees with Sylvester's law of inertia (theorem 12.8 on page 111), which
tells us that the only invariants of a quadratic form (over the real numbers) are
the integers p and q, which are not polynomials in the $Q_{ij}$ entries.)
Breaking into Irreducible Components
Cartesian Tensors
Tensors in engineering and applied physics never have any upper indices. However, the engineers and applied physicists never carry out any linear changes
of variable except for those described by orthogonal matrices. This practice is
(quite misleadingly) referred to as working with “Cartesian tensors.” The point
to keep in mind is that they are working in Rn in a physical context in which
lengths (and distances) are physically measurable. In order to preserve lengths,
the only linear changes of variable we can employ are those given by orthogonal
matrices.
If we had a tensor with both upper and lower indices, like $t^i{}_{jk}$, we can see
that it transforms under a linear change of variables y = F x as
$(F_*t)^i{}_{jk} = F^i{}_p\, t^p{}_{qr} \left(F^{-1}\right)^q{}_j \left(F^{-1}\right)^r{}_k.$
Lets define a new tensor by letting $s_{ijk} = t^i{}_{jk}$, just dropping the upper index. If
F is orthogonal, i.e. $F^{-1} = F^t$, then $F = \left(F^{-1}\right)^t$, so $F^i{}_p = \left(F^{-1}\right)^p{}_i$. Therefore
$(F_*s)_{ijk} = s_{pqr} \left(F^{-1}\right)^p{}_i \left(F^{-1}\right)^q{}_j \left(F^{-1}\right)^r{}_k = F^i{}_p\, t^p{}_{qr} \left(F^{-1}\right)^q{}_j \left(F^{-1}\right)^r{}_k = (F_*t)^i{}_{jk}.$
We summarize this calculation:
Theorem 13.2. Dropping upper indices to become lower indices is an operation
on tensors which is invariant under any orthogonal linear change of variable.
Note that this trick only works for orthogonal matrices F , i.e. orthogonal
changes of variable.
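For instance (a rotation chosen just for illustration), take the orthogonal matrix
$F = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$
and the vector x = e₁; dropping the index gives the covector ξ with the same components (1, 0). Then F∗x = F x has components (cos θ, sin θ), and $F_*\xi = \xi F^{-1} = \xi F^t$ also has components (cos θ, sin θ): lowering the index and then applying F∗ gives the same answer as applying F∗ and then lowering the index, just as theorem 13.2 promises.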
13.8∗ Prove that doubling (i.e. the linear map y = F x = 2x) acts on a vector
x by doubling it, and on a covector by scaling by 1/2. (In general, this linear map
F x = 2x acts on any tensor t with p upper and q lower indices by scaling by
$2^{p-q}$, i.e. $F_*t = 2^{p-q}\, t$.) What happens if you first lower the index of a vector
and then apply F∗? What happens if you apply F∗ and then lower the index?
13.9∗ Prove that contracting two lower indices with one another is an operation
on tensors which is invariant under orthogonal linear change of variable, but
not under rescaling of variables.
Engineers often prefer their approach to tensors: only lower indices, and all
of the usual operations. However, their approach makes rescalings, and other
nonorthogonal transformations (like shears, for example) more difficult. There
is a similar approach in relativistic physics to lower indices: by contracting with
a quadratic form.
Chapter 14
Tensors
We give a mathematical definition of tensor, and show that it agrees with the more
concrete definition of chapter 13. We will continue to use Einstein’s summation
convention throughout this chapter.
Multilinear Functions and Multilinear Maps
Definition 14.1. If V1 , V2 , . . . , Vp are some vector spaces, then a function t(x1 , x2 , . . . , xp )
is multilinear if t(x1 , x2 , . . . , xp ) is a linear function of each vector when all of
the other vectors are held constant. (The vector x1 comes from the vector space
V1 , etc.)
Similarly a map t taking vectors x1 in V1 , x2 in V2 , . . . , xp in Vp , to a
vector w = t(x1 , x2 , . . . , xp ) in a vector space W is called a multilinear map if
t(x1 , x2 , . . . , xp ) is a linear map of each vector xi , when all of the other vectors
are held constant.
The function t(x, y) = xy is multilinear for x and y real numbers, being
linear in x when y is held fixed and linear in y when x is held fixed.
Note that t is not linear as a function of the vector $\begin{pmatrix} x \\ y \end{pmatrix}$.
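For instance, scaling that vector by 2 replaces x and y by 2x and 2y, so $t(2x, 2y) = 4xy$ rather than $2\,t(x, y)$; t fails to be linear in the pair, even though it is linear in each of x and y separately.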
Any two linear functions ξ : V → R and η : W → R on two vector
spaces determine a multilinear map t(x, y) = ξ(x)η(y) for x from V and
y from W .
What is a Tensor?
We already have a definition of tensor, but only for tensors “in Rn ”. We want
to define some kind of object which we will call a tensor in an abstract vector
space, so that when we pick a basis (identifying the vector space with Rn ), the
object becomes a tensor in Rn . The clue is that a tensor in Rn has lower and
upper indices, which can be contracted against vectors and covectors. For now,
lets only think about tensors with just upper indices. Upper indices contract
against covectors, so we should be able to “plug in covectors,” motivating the
definition:
Definition 14.2. For any finite dimensional vector spaces V1 , V2 , . . . , Vp , let
V1 ⊗ V2 ⊗ · · · ⊗ Vp (called the tensor product of the vector spaces V1 , V2 , . . . , Vp )
be the set of all multilinear maps
t (ξ1 , ξ2 , . . . , ξp ) ,
where ξ1 is a covector from V1 ∗ , ξ2 is a covector from V2 ∗ , etc. Each such
multilinear map t is called a tensor.
A tensor $t^{ij}$ in Rn following our old definition (from chapter 13) yields a
tensor $t(\xi, \eta) = t^{ij}\,\xi_i\,\eta_j$ following this new definition. On the other hand,
if $t(\xi, \eta)$ is a tensor in Rn ⊗ Rn, then we can define a tensor following
our old definition by letting $t^{ij} = t\!\left(e^i, e^j\right)$, where $e^1, e^2, \dots, e^n$ is the
usual dual basis to the standard basis of Rn.
Let V be a finite dimensional vector space. Recall that there is a
natural isomorphism V → V ∗∗ , given by sending any vector v to the
linear function fv on V ∗ given by fv (ξ) = ξ(v). We will henceforth
identify any vector v with the function fv ; in other words we will from
now on use the symbol v itself instead of writing fv , so that we think
of a covector ξ as a linear function ξ(v) on vectors, and also think of
a vector v as a linear function on covectors ξ, by the bizarre definition
v(ξ) = ξ(v). In this way, a vector is the simplest type of tensor.
Definition 14.3. Let V and W be finite dimensional vector spaces, and take v
a vector in V and w a vector in W . Then write v ⊗ w for the multilinear map
v ⊗ w(ξ, η) = ξ(v)η(w).
So v ⊗ w is a tensor in V ⊗ W , called the tensor product of v and w.
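For instance (with vectors chosen just for illustration), take V = W = R2, v = e₁ + 2e₂ and w = e₁. Then
$v \otimes w\,(\xi, \eta) = \xi(v)\,\eta(w) = \left(\xi_1 + 2\xi_2\right)\eta_1,$
so in components $(v \otimes w)^{iJ} = v^i w^J$: here $(v \otimes w)^{11} = 1$, $(v \otimes w)^{21} = 2$, and the other two components vanish.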
Definition 14.4. If s is a tensor in V1 ⊗ V2 ⊗ · · · ⊗ Vp , and t is a tensor in
W1 ⊗ W2 ⊗ · · · ⊗ Wq , then let s ⊗ t, called the tensor product of s and t, be the
tensor in V1 ⊗ V2 ⊗ · · · ⊗ Vp ⊗ W1 ⊗ W2 ⊗ · · · ⊗ Wq given by
s ⊗ t (ξ1 , ξ2 , . . . , ξp , η1 , η2 , . . . , ηq ) = s (ξ1 , ξ2 , . . . , ξp ) t (η1 , η2 , . . . , ηq ) .
Definition 14.5. Similarly, we can define the tensor product of several tensors.
For example, given finite dimensional vector spaces U, V and W , and vectors
u from U , v from V and w from W , let u ⊗ v ⊗ w mean the multilinear map
u ⊗ v ⊗ w(ξ, η, ζ) = ξ(u)η(v)ζ(w), etc.
14.1∗ Prove that
(av) ⊗ w = a (v ⊗ w) = v ⊗ (aw)
(v1 + v2 ) ⊗ w = v1 ⊗ w + v2 ⊗ w
v ⊗ (w1 + w2 ) = v ⊗ w1 + v ⊗ w2
for any vectors v, v1 , v2 from V and w, w1 , w2 from W and any number a.
14.2∗ Take U, V and W any finite dimensional vector spaces. Prove that
(u ⊗ v) ⊗ w = u ⊗ (v ⊗ w) = u ⊗ v ⊗ w for any three vectors u from U , v from
V and w from W .
Theorem 14.6. If V and W are two finite dimensional vector spaces, with
bases v1 , v2 , . . . , vp and w1 , w2 , . . . , wq , then V ⊗ W has as a basis the vectors
vi ⊗ wJ for i running over 1, 2, . . . , p and J running over 1, 2, . . . , q.
Proof. Take the dual bases $v^1, v^2, \dots, v^p$ and $w^1, w^2, \dots, w^q$. Every tensor t
from V ⊗ W has the form
$t(\xi, \eta) = t\!\left(\xi_i v^i, \eta_J w^J\right) = \xi_i\, \eta_J\, t\!\left(v^i, w^J\right),$
so letting $t^{iJ} = t\!\left(v^i, w^J\right)$ we find
$t(\xi, \eta) = t^{iJ}\, \xi_i\, \eta_J = t^{iJ}\, v_i \otimes w_J\,(\xi, \eta).$
So the $v_i \otimes w_J$ span. Any linear relation between the $v_i \otimes w_J$, just reading
these lines from bottom to top, would yield a vanishing multilinear map, so
would have to satisfy $0 = t\!\left(v^k, w^L\right) = t^{kL}$, forcing all coefficients in the linear
relation to vanish.
Remark 14.7. A similar theorem, with a similar proof, holds for any tensor
products: take any finite dimensional vector spaces V1 , V2 , . . . , Vp , and pick
any basis for V1 and any basis for V2 , etc. Then taking one vector from each
basis, and taking the tensor product of these vectors, we obtain a tensor in
V1 ⊗ V2 ⊗ · · · ⊗ Vp . These tensors, when we throw in all possible choices of basis
vectors for all of those bases, yield a basis for V1 ⊗ V2 ⊗ · · · ⊗ Vp , called the
tensor product basis.
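In particular, counting basis tensors shows that the dimension of a tensor product is the product of the dimensions: for instance, R3 ⊗ R2 has the six basis tensors $e_i \otimes e_J$ with i = 1, 2, 3 and J = 1, 2, so it has dimension 6.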
14.3 Let V = R3, W = R2 and let
$x = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \qquad y = \begin{pmatrix} 4 \\ 5 \end{pmatrix}.$
What is x ⊗ y in terms of the standard basis vectors $e_i \otimes e_J$?
Definition 14.8. Tensors of the form v ⊗ w are called pure tensors.
14.4 Prove that every tensor in V ⊗ W can be written as a sum of pure tensors.
Consider in R3 the tensor
e1 ⊗ e1 + e2 ⊗ e2 + e3 ⊗ e3 .
This tensor is not pure (which is certainly not obvious just looking at
it). Lets see why. Any pure tensor x ⊗ y must be
$x \otimes y = \left(x^1 e_1 + x^2 e_2 + x^3 e_3\right) \otimes \left(y^1 e_1 + y^2 e_2 + y^3 e_3\right)$
$= x^1 y^1\, e_1 \otimes e_1 + x^2 y^1\, e_2 \otimes e_1 + x^3 y^1\, e_3 \otimes e_1$
$+ x^1 y^2\, e_1 \otimes e_2 + x^2 y^2\, e_2 \otimes e_2 + x^3 y^2\, e_3 \otimes e_2$
$+ x^1 y^3\, e_1 \otimes e_3 + x^2 y^3\, e_2 \otimes e_3 + x^3 y^3\, e_3 \otimes e_3.$
If we were going to have x ⊗ y = e1 ⊗ e1 + e2 ⊗ e2 + e3 ⊗ e3 , we would
need x1 y 1 = 1, x2 y 2 = 1, x3 y 3 = 1, but also x1 y 2 = 0, so x1 = 0 or
y 2 = 0, contradicting x1 y 1 = x2 y 2 = 1.
Definition 14.9. The rank of a tensor is the minimum number of pure tensors
that can appear when it is written as a sum of pure tensors.
Definition 14.10. If U, V and W are finite dimensional vector spaces and b :
U ⊕ V → W is a map for which b(u, v) is linear in u for any fixed v and linear
in v for any fixed u, say that b is a bilinear map.
Theorem 14.11 (Universal Mapping Theorem). Every bilinear map b : U ⊕
V → W induces a unique linear map B : U ⊗ V → W , by the rule B (u ⊗ v) =
b(u, v). Sending b to B = T b gives an isomorphism T between the vector space
Z of all bilinear maps f : U ⊕ V → W and the vector space Hom (U ⊗ V, W ).
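For instance (taking the usual inner product on Rn just for illustration), the bilinear map $b(u, v) = \langle u, v\rangle$ on Rn ⊕ Rn induces the linear map B : Rn ⊗ Rn → R with $B(e_i \otimes e_j) = \delta_{ij}$, so on a general tensor $t = t^{ij}\, e_i \otimes e_j$ it gives $B(t) = t^{11} + t^{22} + \cdots + t^{nn}$.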
14.5∗ Prove the universal mapping theorem.
Review problems
14.6∗ Write down an isomorphism $V^* \otimes W^* = (V \otimes W)^*$, and prove that it is
an isomorphism.
14.7∗ If S : V0 → V1 and T : W0 → W1 are linear maps of finite dimensional
vector spaces, prove that there is a unique linear map, which we will write as
S ⊗ T : V0 ⊗ W0 → V1 ⊗ W1 , so that
S ⊗ T (v0 ⊗ w0 ) = (Sv0 ) ⊗ (T w0 ) .
14.8∗ What is the rank of e1 ⊗ e1 + e2 ⊗ e2 + e3 ⊗ e3 as a tensor in R3 ⊗ R3 ?
14.9∗ Let any tensor v ⊗ w in V ⊗ W eat a covector ξ from V ∗ by the rule
(v ⊗ w) ξ = ξ(v)w. Prove that this makes v ⊗ w into a linear map V ∗ → W .
Prove that this definition extends to a linear map V ⊗ W → Hom (V ∗ , W ).
Prove that the rank of a tensor as defined above is the rank of the associated
linear map V∗ → W. Use this to find the rank of $\sum_i e_i \otimes e_i$ in Rn ⊗ Rn.
14.10∗ Take two vector spaces V and W and define a vector space V ∗ W to
be the collection of all real-valued functions on V ⊕ W which are zero except at
finitely many points. Careful: these functions don’t have to be linear. Picking
any vectors v from V and w from W, lets write the function
$f(x, y) = \begin{cases} 1, & \text{if } x = v \text{ and } y = w, \\ 0, & \text{otherwise,} \end{cases}$
as v ∗ w. So clearly V ∗ W is a vector space, whose elements are linear combinations of elements of the form v ∗ w. Let Z be the subspace of V ∗ W spanned
by the vectors
(av) ∗ w − a(v ∗ w),
v ∗ (aw) − a(v ∗ w),
(v1 + v2 ) ∗ w − v1 ∗ w − v2 ∗ w,
v ∗ (w1 + w2 ) − v ∗ w1 − v ∗ w2 ,
for any vectors v, v1 , v2 from V and w, w1 , w2 from W and any number a.
a. Prove that if V and W both have positive and finite dimension, then
V ∗ W and Z are infinite dimensional.
b. Write down a linear map V ∗ W → V ⊗ W .
c. Prove that your linear map has kernel containing Z.
It turns out that (V ∗ W ) /Z is isomorphic to V ⊗ W . We could have defined
V ⊗ W to be (V ∗ W ) /Z, and this definition has many advantages for various
generalizations of tensor products.
Remark 14.12. In the end, what we really care about is that tensors using our
abstract definitions should turn out to have just the properties they had with the
more concrete definition in terms of indices. So even if the abstract definition
is hard to swallow, we will really only need to know that tensors have tensor
products, contractions, sums and scaling, change according to the usual rules
when we linearly change variables, and that when we tensor together bases, we
obtain a basis for the tensor product. This is the spirit behind problem 14.10.
“Lower Indices”
Fix a finite dimensional vector space V . Consider the tensor product V ⊗ V ∗ .
A basis v1 , v2 , . . . , vn for V has a dual basis v 1 , v 2 , . . . , v n for V ∗ , and a tensor
product basis $v_i \otimes v^j$. Every tensor t in V ⊗ V∗ has the form
$t = t^i{}_j\, v_i \otimes v^j.$
So it is clear where the lower indices come from when we pick a basis: they
come from V ∗ .
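For example, the tensor
$t = v_1 \otimes v^1 + v_2 \otimes v^2 + \cdots + v_n \otimes v^n$
has components $t^i{}_j = \delta^i{}_j$, the identity matrix; as a multilinear map it takes (ξ, x) to ξ(x), so it does not depend on the choice of basis.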
14.11 We saw in chapter 13 that a matrix A is written in indices as Aij . Describe
an isomorphism between Hom (V, V ) and V ⊗ V ∗ .
Lets define contractions.
Theorem 14.13. Let V and W be finite dimensional vector spaces. There is
a unique linear map
V ⊗ V ∗ ⊗ W → W,
called the contraction map, that on pure tensors takes v ⊗ ξ ⊗ w to ξ(v)w.
Remark 14.14. We can generalize this idea in the obvious way to any tensor
product of any finite number of finite dimensional vector spaces: if one of the
vector spaces is the dual of another one, then we can contract. For example, we
can contract V ⊗ W ⊗ V ∗ by a linear map which on pure tensors takes v ⊗ w ⊗ ξ
to ξ(v)w.
Proof. Pick a basis $v_1, v_2, \dots, v_p$ of V and a basis $w_1, w_2, \dots, w_q$ of W. Define
T on the basis $v_i \otimes v^j \otimes w_K$ by
$T\!\left(v_i \otimes v^j \otimes w_K\right) = \begin{cases} w_K & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$
By theorem 1.13 on page 11, there is a unique linear map
$T : V \otimes V^* \otimes W \to W,$
which has these values on these basis vectors. Writing any vector v in V as
$v = a^i v_i$, any covector ξ in V∗ as $\xi = b_i v^i$, and any vector w in W as
$w = c^J w_J$, we find
$T(v \otimes \xi \otimes w) = a^i\, b_j\, c^K\, T\!\left(v_i \otimes v^j \otimes w_K\right) = a^i\, b_i\, c^K w_K = \xi(v)\,w.$
Therefore there is a linear map T : V ⊗ V ∗ ⊗ W → W , that on pure tensors
takes v ⊗ ξ ⊗ w to ξ(v)w. Any other such map, say S, which agrees with T on
pure tensors, must agree on all linear combinations of pure tensors, so on all
tensors.
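For instance, taking W = R and identifying V ⊗ V∗ ⊗ R with V ⊗ V∗, the contraction map sends a tensor $t = t^i{}_j\, v_i \otimes v^j$ to the number $t^i{}_i$; this recovers the trace, i.e. the contraction of chapter 13.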
“Swapping Indices”
We have one more tensor operation to generalize to abstract vector spaces:
when working in indices we can associate to a tensor tijk the tensor tikj , i.e.
swap indices. This generalizes in the obvious manner.
Theorem 14.15. Take V and W finite dimensional vector spaces. There is
a unique linear isomorphism V ⊗ W → W ⊗ V , which on pure tensors takes
v ⊗ w to w ⊗ v.
14.12∗ Prove theorem 14.15, by imitating the proof of theorem 14.13.
Remark 14.16. In the same fashion, we can make a unique linear isomorphism
reordering the factors in any tensor product of vector spaces. To be more specific,
take any permutation q of the numbers 1, 2, . . . , p. Then (with basically the
same proof) there is a unique linear isomorphism
V1 ⊗ V2 ⊗ · · · ⊗ Vp → Vq(1) ⊗ Vq(2) ⊗ · · · ⊗ Vq(p)
which takes each pure tensor v1 ⊗ v2 ⊗ · · · ⊗ vp to the pure tensor vq(1) ⊗ vq(2) ⊗
· · · ⊗ vq(p) .
Summary
We have now achieved our goal: we have defined tensors on an abstract finite
dimensional vector space, and defined the operations of addition, scaling, tensor
product, contraction and “index swapping” for tensors on an abstract vector
space.
All there is to know about tensors is that
a. they are sums of pure tensors v ⊗ w,
b. the pure tensor v ⊗ w depends linearly on v and linearly on w, and
c. the universal mapping property.
Another way to think about the universal mapping property is that there are
no identities satisfied by tensors other than those which are forced by (a) and
(b); if there were, then we couldn't turn a bilinear map which didn't satisfy that
identity into a linear map on tensors, i.e. we would contradict the universal
mapping property. Roughly speaking, there is nothing else that you could know
about tensors besides (a) and (b) and the fact that there is nothing else to
know.
Cartesian Tensors
If V is a finite dimensional inner product space, then we can ask how to “lower
indices” in this abstract setting. Given a single vector v from V , we can “lower
its index" by turning v into a covector. We do this by constructing the covector
$\xi(x) = \langle v, x\rangle$. One often sees this covector ξ written as v∗ or some such notation,
and usually called the dual to v.
14.13∗ Prove that the map ∗ : V → V ∗ given by taking v to v ∗ is an isomorphism of finite dimensional vector spaces.
Careful: the covector v ∗ depends on the choice not only of the vector v, but
also of the inner product.
14.14∗ In the usual inner product in Rn , what is the map ∗ ?
14.15∗ Let $\langle x, y\rangle_0$ be the usual inner product on Rn, and define a new inner
product by the rule $\langle x, y\rangle_1 = 2\,\langle x, y\rangle_0$. Calculate the map which gives the dual
covector in the new inner product.
The inverse to ∗ is usually also written ∗, and we write the vector dual
to a covector ξ as ξ∗. Naturally, we can define an inner product on V∗ by
$\langle \xi, \eta\rangle = \langle \xi^*, \eta^*\rangle$.
14.16∗ Prove that this defines an inner product on V ∗ .
If V and W are finite dimensional inner product spaces, we then define an
inner product on V ⊗ W by setting
$\langle v_1 \otimes w_1,\ v_2 \otimes w_2\rangle = \langle v_1, v_2\rangle\,\langle w_1, w_2\rangle.$
This expression only determines the inner product on pure tensors, but since
the inner product is required to be bilinear and every tensor is a sum of pure
tensors, we only need to know the inner product on pure tensors.
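For instance, if $e_1, e_2, \dots, e_p$ and $f_1, f_2, \dots, f_q$ are orthonormal bases of V and W (the letters f chosen just to keep the two bases apart), then
$\left\langle e_i \otimes f_J,\ e_k \otimes f_L \right\rangle = \langle e_i, e_k\rangle\,\langle f_J, f_L\rangle$
is 1 when i = k and J = L and is 0 otherwise, so the tensor product basis is orthonormal.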
14.17∗ Prove that this defines an inner product on V ⊗ W .
Lets write V ⊗2 to mean V ⊗ V , etc. We refer to tensors in a vector space V
to mean elements of V ⊗p ⊗ V ∗⊗q for some positive integers p and q, i.e. tensor
products of vectors and covectors. The elements of V ⊗p are called covariant
tensors: they are sums of tensor products of vectors. The elements of V ∗⊗p are
called contravariant tensors: they are sums of tensor products of covectors.
14.18∗ Prove that an inner product yields a unique linear isomorphism
$* : V^{\otimes p} \otimes V^{*\otimes q} \to V^{\otimes (p+q)},$
so that
$*\left(v_1 \otimes v_2 \otimes \cdots \otimes v_p \otimes \xi_1 \otimes \xi_2 \otimes \cdots \otimes \xi_q\right) = v_1 \otimes v_2 \otimes \cdots \otimes v_p \otimes \xi_1^* \otimes \xi_2^* \otimes \cdots \otimes \xi_q^*.$
This isomorphism “raises indices.” Similarly, we can define a map to “lower
indices.”
Polarization
We will generalize the isomorphism between symmetric bilinear forms and
quadratic forms to an isomorphism between symmetric tensors and polynomials.
Definition 14.17. Let V be a finite dimensional vector space. If t is a tensor
in V ∗⊗p , i.e. a multilinear function t (v1 , v2 , . . . , vp ) depending on p vectors
v1 , v2 , . . . , vp from V , then we can define the polarization of t to be the function
(also written traditionally with the same letter t) t(v) = t (v, v, . . . , v).
If t is a covector, so a linear function t(v) of a single vector v, then the
polarization is the same linear function.
In R2, if $t = e^1 \otimes e^2$, then the polarization is $t(x) = x^1 x^2$ for
$x = \begin{pmatrix} x^1 \\ x^2 \end{pmatrix}$
in R2.
The antisymmetric tensor $t = e^1 \otimes e^2 - e^2 \otimes e^1$ in R2 has polarization
$t(x) = x^1 x^2 - x^2 x^1 = 0$, vanishing.
Definition 14.18. A function f : V → R on a finite dimensional vector space
is called a polynomial if there is a linear isomorphism F : Rn → V for which
f (F (x)) is a polynomial in the usual sense.
Clearly the choice of linear isomorphism F is irrelevant. A different choice
would only alter the linear functions by linear combinations of one another, and
therefore would alter the polynomial functions by substituting linear combinations of new variables in place of old variables. In particular, the degree of a
polynomial function is well defined. A polynomial function f : V → R is called
homogeneous of degree d if f (λx) = λd f (x) for any vector x in V and number
λ. Clearly every polynomial function splits into a unique sum of homogeneous
polynomial functions.
There are two natural notions of multiplying symmetric tensors, which
simply differ by a factor. The first is
$(s \circ t)(x_1, x_2, \dots, x_{a+b}) = \sum_p s\!\left(x_{p(1)}, x_{p(2)}, \dots, x_{p(a)}\right)\, t\!\left(x_{p(a+1)}, x_{p(a+2)}, \dots, x_{p(a+b)}\right),$
when s has a lower indices and t has b, and the sum is over all permutations p
of the numbers 1, 2, . . . , a + b. The second is
$(st)(x_1, x_2, \dots, x_{a+b}) = \frac{1}{(a+b)!}\, (s \circ t)(x_1, x_2, \dots, x_{a+b}).$
Theorem 14.19. Polarization is a linear isomorphism taking symmetric
contravariant tensors to polynomials, preserving degree, and taking products to
products (using the second multiplication above).
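For instance, take two covectors ξ and η, so a = b = 1. Then $\xi \circ \eta\,(x_1, x_2) = \xi(x_1)\eta(x_2) + \xi(x_2)\eta(x_1)$ and $\xi\eta = \tfrac{1}{2}\,\xi \circ \eta$. The polarization of ξη is ξ(v)η(v), exactly the product of the polarizations of ξ and of η, as the theorem says.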
Chapter 15
Exterior Forms
This chapter develops the definition and basic properties of exterior forms.
Why Forms?
This section is just for motivation; the ideas of this section will not be used
subsequently.
Antisymmetric contravariant tensors are also called exterior forms. In terms
of indices, they have no upper indices, and they are skew-symmetric in their
lower indices. The reason for the importance of exterior forms is quite deep,
coming from physics. Consider fluid passing into and out of a region in space.
We would like to measure how rapidly some quantity (for instance heat) flows
into that region. We do this by integrating the flux of the quantity across the
boundary: counting how much is going in, and subtracting off how much is
going out. This flux is an integral along the boundary surface. But if we change
our mind about which part is the inside and which is the outside, the sign of the
integral has to change. So this type of integral (called a surface integral) has
to be sensitive to the choice of inside and outside, called the orientation of the
surface. This sign sensitivity has to be built into the integral. We are used to
integrating functions, but they don’t change sign when we change orientation
the way a flux integral should. For example, the area of a surface is not a
flux integral, because it doesn’t depend on orientation. Lets play around with
some rough ideas to get a sense of how exterior forms provide just the right
sign changes for flux integrals: we can integrate exterior forms. For precise
definitions and proofs, Spivak [8] is an excellent introduction.
Lets imagine some kind of object α that we can integrate over any surface
S (as long as S is reasonably smooth, except maybe for a few sharp edges: lets
not make that precise since we are only playing around). Suppose that the
integral $\int_S \alpha$ is a number. Of course, S must be a surface with a choice of
orientation. Moreover, if we write −S for the same surface with the opposite
orientation, then $\int_{-S} \alpha = -\int_S \alpha$. Suppose that $\int_S \alpha$ varies smoothly as we
smoothly deform S. Moreover, suppose that if we cut a surface S into two
surfaces S₁ and S₂,
$S = S_1 \cup S_2,$
then $\int_S \alpha = \int_{S_1} \alpha + \int_{S_2} \alpha$: the integral is a sum of locally measurable quantities.
Fix a point, which we can translate to become the origin, and scale up the
picture so that eventually, for S a little piece of surface, $\int_S \alpha$ is very nearly
invariant under small translations of S. (We can do this because the integral
varies smoothly, so after rescaling the picture the integral hardly varies at all.)
Just for simplicity, lets assume that $\int_S \alpha$ is unchanged when we translate the
surface S. Any two opposite sides of a box are translations of one another, but
with opposite orientations. So they must have opposite signs for $\int \alpha$. Therefore
any small box has as much of our quantity entering as leaving. Approximating
any region with small boxes, we must get total flux $\int_S \alpha = 0$ when S is the
boundary of the region.
Pick two linearly independent vectors u and v, and let P be the parallelogram
at the origin with sides u and v. Pick any vector w perpendicular to u and v
and with
$\det\begin{pmatrix} u & v & w \end{pmatrix} > 0.$
Orient P so that the outside of P is the side in the direction of w. If we swap
u and v, then we change the orientation of the parallelogram P.
Lets write α(u, v) for $\int_P \alpha$. Slicing the parallelogram into 3 equal pieces, say
into 3 parallelograms with sides u/3, v, we see that α(u/3, v) = α(u, v)/3.
In the same way, we can see that α(λu, v) = λ α(u, v) for λ any positive rational
number (dilate by the numerator, and cut into a number of pieces given by the
denominator). Because α(u, v) is a smooth function of u and v, we see that
α(λu, v) = λ α(u, v) for λ > 0. Similarly, α(0, v) = 0 since the parallelogram is
flattened into a line. Moreover, α(−u, v) = −α(u, v), since the parallelogram of
−u, v is the parallelogram of u, v reflected, reversing its orientation. So α(u, v)
scales in u and v. By reversing orientation, α(v, u) = −α(u, v). A shear applied
to the parallelogram will preserve the area, and after the shear we can cut
and paste the parallelogram. The integral must be preserved, by translation
invariance, so α(u + v, v) = α(u, v).
The hard part is to see why α is linear as a function of u. This comes from
the drawing
(Figure: a solid region with edges labeled u, u + v, v, w, and v + w.)
The integral over the boundary of this region must vanish. If we pick three
vectors u, v and w, which we draw as the standard basis vectors, then the region
has boundary given by various parallelograms and triangles (each triangle being
half a parallelogram), and the vanishing of the integral gives
$0 = \alpha(u, v) + \alpha(v, w) + \tfrac{1}{2}\,\alpha(w, u) - \tfrac{1}{2}\,\alpha(w, u) - \alpha(u + w, v).$
Therefore α(u, v) + α(v, w) = α(u + w, v), so that finally α is a tensor. If you
like indices, you can write α as αij with αji = −αij .
Our argument is only slightly altered if we keep in mind that the integral
$\int_S \alpha$ should not really be exactly translation invariant, but only vary slightly
with small translations, and that the integral around the boundary of a small
with small translations, and that the integral around the boundary of a small
region should be small. We can still carry out the same argument, but throwing
in error terms proportional to the area of surface and extent of translation, or
to the volume of a region. We end up with α being an exterior form whose
coefficients are functions.
If we imagine a flow contained inside a surface, we can similarly measure
flux across a curve. We also need to be sensitive to orientation: which side of
the boundary of a surface is the inside of the surface. Again the correct object
to work with in order to have the correct sign sensitivity is an exterior form
(whose coefficients are functions, not just numbers). Similar remarks hold in
any number of dimensions. So exterior forms play a vital role because they are
the objects we integrate. We can easily change variables when we integrate
exterior forms.
Definition
Definition 15.1. A tensor t in V∗⊗p is called a p-form if it is antisymmetric, i.e.
$t(v_1, v_2, \dots, v_p)$
is antisymmetric as a function of the vectors $v_1, v_2, \dots, v_p$: for any permutation q,
$t\!\left(v_{q(1)}, v_{q(2)}, \dots, v_{q(p)}\right) = (-1)^N\, t(v_1, v_2, \dots, v_p),$
where $(-1)^N$ is the sign of the permutation q.
The form in Rn
$\varepsilon(v_1, v_2, \dots, v_n) = \det\begin{pmatrix} v_1 & v_2 & \dots & v_n \end{pmatrix}$
is called the volume form of Rn (because of its interpretation as an
integrand: $\int_R \varepsilon$ is the volume of any region R).
A covector ξ in V ∗ is a 1-form, because there are no permutations you
can carry out on ξ(v).
In Rn we traditionally write points as
$x = \begin{pmatrix} x^1 \\ x^2 \\ \vdots \\ x^n \end{pmatrix},$
and write $dx^1$ for the covector given by the rule $dx^1(y) = y^1$ for any
vector y in Rn. Then $\alpha = dx^1 \otimes dx^2 - dx^2 \otimes dx^1$ is a 2-form:
$\alpha(u, v) = u^1 v^2 - u^2 v^1.$
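As a quick check, $\alpha(v, u) = v^1 u^2 - v^2 u^1 = -\alpha(u, v)$, so α really is antisymmetric, and
$\alpha(u, v) = \det\begin{pmatrix} u & v \end{pmatrix},$
the signed area of the parallelogram with sides u and v, matching the volume form ε above when n = 2.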
15.1∗ If t is a 3-form, prove that
a. t(x, y, y) = 0 and
b. t(x, y + 3 x, z) = t(x, y, z)
for any vectors x, y, z.
Hints
1.1.
a. 0 v = (0 + 0) v = 0 v + 0 v. Add the vector w for which 0 v + w = 0 to
both sides.
b. a 0 = a (0 + 0) = a 0 + a 0. Similar.
1.3. If there are two such vectors, w1 and w2 , then v + w1 = v + w2 = 0.
Therefore
w1 = 0 + w1
= (v + w2 ) + w1
= v + (w2 + w1 )
= v + (w1 + w2 )
= (v + w1 ) + w2
= 0 + w2
= w2 .
0 = (1 + (−1)) v
= 1 v + (−1) v
= v + (−1) v
so (−1) v = −v.
1.8.
a. Yes: limits of sums and scalings are sums and scalings of limits, so continuity survives sums and scalings.
b. No: scaling by a negative number will take a (not everywhere 0) nonnegative function
to a nonpositive (somewhere negative) function. For
example, $-\left(x^2\right) = -x^2$. Therefore you can't scale functions while staying
inside the set of nonnegative functions.
c. No: $\left(x^3 + 1\right) + \left(-x^3 + 7\right) = 8$ has degree 0, so you can't add functions
while staying in the set of polynomials of degree 3.
d. Yes: if you add symmetric 3 × 3 matrices A and B, you get a matrix
A + B which satisfies
$(A + B)^t = A^t + B^t = A + B,$
so A + B is symmetric, and if a is any number then clearly
$(aA)^t = a\,A^t = a\,A,$
so aA is symmetric. So you can add and scale symmetric matrices, and
they stay symmetric.
1.9. You might take:
a. 1, x, x2 , x3
b.
1
0
1
0
0
0
0
0
0
1 0 0
1 0 0
,
,
,
0
0 0 0
0 0 0
0
1 0 0
1 0 0
,
,
0
0 0 0
0 0 0
c. The set of matrices e(i, j) for i ≤ j, where e(i, j) has zeroes everywhere,
except for a 1 at row i and column j.
d. A polynomial p(x) = a + bx + cx2 + dx3 vanishes at the origin just when
a = 0. A basis: x, x2 , x3 .
1.11. Some examples you might think of:
• The set of constant functions.
• The set of linear functions.
• The set of polynomial functions.
• The set of polynomial functions of degree at most d (for any fixed d).
• The set of functions f (x) which vanish when x < 0.
• The set of functions f (x) for which there is some interval outside of which
f (x) vanishes.
• The set of functions f (x) for which f (x)p(x) goes to zero as x gets large,
for every polynomial p(x).
• The set of functions which vanish at the origin.
1.12.
a. no
b. no
c. no
d. yes
e. no
f. yes
g. yes
1.13.
a. no
b. no
c. no
d. yes
e. yes
f. no
1.15.
a. If AH = HA and BH = HB, then (A + B)H = H(A + B), clearly.
Similarly 0H = H0 = 0, and (cA)H = H(cA).
b. P is the set of diagonal 2 × 2 matrices.
1.17. Ax = Sx and By = T y for any x in Rp and y in Rq . Therefore T Sx =
BSx = BAx for any x in Rp . So (BA)ej = T Sej , and therefore BA has the
required columns to be the associated matrix.
1.19.
a. 1x = 1y just when x = y, since 1x = x.
b. Given any z in V , we find z = 1x just for x = z.
1.20. Given z in V , we can find some x in U so that T x = z. If there are two
choices for this x, say z = T x = T y, then we know that x = y. Therefore x is
uniquely determined. So let T −1 z = x. Clearly T −1 is uniquely determined,
by the equation T −1 T = 1, and satisfies T T −1 = 1 too. Lets prove that T −1
is linear. Pick z and w in V . Then let T x = z and T y = w. We know
that x and y are uniquely determined by these equations. Since T is linear,
T x + T y = T (x + y). This gives z + w = T (x + y). So
T −1 (z + w) = T −1 T (x + y)
=x+y
= T −1 z + T −1 w.
Similarly, T (ax) = aT x, and taking T −1 of both sides gives aT −1 z = T −1 az.
So T −1 is linear.
1.21. Write $p(x) = a + bx + cx^2$. Then
$Tp = \begin{pmatrix} a \\ a + b + c \\ a + 2b + 4c \end{pmatrix}.$
To solve for a, b, c in terms of p(0), p(1), p(2), we solve
$\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} p(0) \\ p(1) \\ p(2) \end{pmatrix}.$
Apply forward elimination:
$\left(\begin{array}{ccc|c} 1 & 0 & 0 & p(0) \\ 1 & 1 & 1 & p(1) \\ 1 & 2 & 4 & p(2) \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 0 & 0 & p(0) \\ 0 & 1 & 1 & p(1) - p(0) \\ 0 & 2 & 4 & p(2) - p(0) \end{array}\right) \to \left(\begin{array}{ccc|c} 1 & 0 & 0 & p(0) \\ 0 & 1 & 1 & p(1) - p(0) \\ 0 & 0 & 2 & p(2) - 2p(1) + p(0) \end{array}\right).$
Back substitute to find:
$c = \tfrac{1}{2}\,p(0) - p(1) + \tfrac{1}{2}\,p(2),$
$b = p(1) - p(0) - c = -\tfrac{3}{2}\,p(0) + 2p(1) - \tfrac{1}{2}\,p(2),$
$a = p(0).$
Therefore we can recover p = a+bx+cx2 completely from knowing p(0), p(1), p(2),
so T is one-to-one and onto.
1.22. The kernel of T is the set of x for which T x = 0. But T x = 0 implies
Sx = P −1 T x = 0, and Sx = 0 implies T x = P Sx = 0. So the same kernel.
The image of S is the set of vectors of the form Sx, and each is carried by P
to a vector of the form P Sx = T x. Conversely P −1 carries the image of T to
the image of S. Check that this is an isomorphism.
1.24. If T : U → V is an isomorphism, and F : Rn → U is a isomorphism, prove
that T F : Rn → V is also an isomorphism.
1.27. If x + W is a translate, we might find that we can write this translate
two different ways, say as x + W but also as z + W . So x and z are equal up
to adding a vector from W , i.e. x − z lies in W . Then after scaling, clearly
sx − sz = s(x − z) also lies in W . So sx + W = sz + W , and therefore scaling
is defined independent of any choices. A similar argument works for addition
of translates.
1.28. Take F and G two isomorphisms. Determinants of matrices multiply. Let
A be the matrix associated to F −1 T F : Rn → Rn and B the matrix associated
to G−1 T G : Rn → Rn . Let C be the matrix associated to G−1 F . Therefore
CAC −1 = B.
$\det B = \det\!\left(CAC^{-1}\right) = \det C\, \det A\, (\det C)^{-1} = \det A.$
1.29.
a. T (p(x) + q(x)) = 2 p(x − 1) + 2 q(x − 1) = T p(x) + T q(x), and T (ap(x)) =
2a p(x − 1) = a T p(x).
b. If T p(x) = 0 then 2 p(x − 1) = 0, so p(x − 1) = 0 for any x, so p(x) = 0
for any x. Therefore T has kernel {0}. As for the image, if q(x) is
any polynomial of degree at most 2, then let p(x) = 12 q(x + 1). Clearly
T p(x) = q(x). So T is onto.
c. To find the determinant, we need an isomorphism. Let F : R3 → V ,
$F\begin{pmatrix} a \\ b \\ c \end{pmatrix} = a + bx + cx^2.$
Calculate the matrix A of $F^{-1}TF$ by
$F^{-1}TF\begin{pmatrix} a \\ b \\ c \end{pmatrix} = F^{-1}T\!\left(a + bx + cx^2\right) = F^{-1}\!\left(2\left(a + b(x-1) + c(x-1)^2\right)\right) = F^{-1}\!\left(2a + 2b(x-1) + 2c\left(x^2 - 2x + 1\right)\right) = F^{-1}\!\left((2a - 2b + 2c) + (2b - 4c)x + 2cx^2\right) = \begin{pmatrix} 2a - 2b + 2c \\ 2b - 4c \\ 2c \end{pmatrix}.$
So the associated matrix is
$A = \begin{pmatrix} 2 & -2 & 2 \\ 0 & 2 & -4 \\ 0 & 0 & 2 \end{pmatrix},$
giving
$\det T = \det A = 8.$
1.30. $\det T = 2^n$.
1.31. $\det T = \det A^2$. The eigenvalues of T are the eigenvalues of A. The
eigenvectors with eigenvalue $\lambda_j$ are spanned by
$\begin{pmatrix} x_j \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ x_j \end{pmatrix}.$
1.32. Let $y_1$ and $y_2$ be the eigenvectors of $A^t$ with eigenvalues $\lambda_1$ and $\lambda_2$
respectively. Then the eigenvalues of T are those of A, with multiplicity 2, and
the $\lambda_i$-eigenspace is spanned by
$\begin{pmatrix} y_i^t \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ y_i^t \end{pmatrix}.$
1.33. The characteristic polynomial is $p(\lambda) = (\lambda + 1)(\lambda - 1)^2$. The 1-eigenspace
is the space of polynomials q(x) for which q(−x) = q(x), so $q(x) = a + cx^2$.
This eigenspace is spanned by 1, x². The (−1)-eigenspace is the space of polynomials
q(x) for which q(−x) = −q(x), so q(x) = bx. The eigenspace is
spanned by x. Indeed T is diagonalizable, and diagonalized by the isomorphism
$Fe_1 = 1$, $Fe_2 = x$, $Fe_3 = x^2$, for which
$F^{-1}TF = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$
1.34.
(a) A polynomial with average value 0 on some interval must take on the
value 0 somewhere on that interval, being either 0 throughout the interval
or positive somewhere and negative somewhere else. A polynomial in one
variable can’t have more zeros than its degree.
(b) It is enough to assume that the number of intervals is n, since if it is
smaller, we can just add some more intervals and specify some more
choices for average values on those intervals. But then T p = 0 only for
p = 0, so T is an isomorphism.
1.36. Clearly the expression is linear in p(z) and “conjugate linear” in q(z).
Moreover, if $\langle p(z), p(z)\rangle = 0$, then p(z) has roots at z0, z1, z2 and z3. But p(z)
has degree at most 3, so has at most 3 roots or else is everywhere 0.
2.1. If there were two, say z1 and z2 , then z1 +z2 = z1 , but z1 +z2 = z2 +z1 = z2 .
2.2. Same proof, but with · instead of +.
2.6. If p = ab, then in Fp arithmetic ab = p (mod p) = 0 (mod p). If a has
a reciprocal, say c, then ab = 0 (mod p) implies that b = cab (mod p) = c0
(mod p) = 0 (mod p). So b is a multiple of p, and p is a multiple of b, so p = b
and a = 1.
2.7. You find −21 as answer from the Euclidean algorithm. But you can add
79 any number of times to the answer, to get it to be between 0 and 78, since
we are working modulo 79, so the final answer is −21 + 79 = 58.
2.8. x = 3
2.9. It is easy to check that Fp satisfies addition laws, zero laws, multiplication
laws, and the distributive law: each one holds in the integers, and to see that
it holds in Fp , we just keep track of multiples of p. For example, x + y in Fp
is just addition up to a multiple of p, say x + y + ap, usual integer addition,
some integer a. So (x + y) + z in Fp is (x + y) + z in the integers, up to a
multiple of p, and so equals x + (y + z) up to a multiple of p, etc. The tricky bit
is the reciprocal law. Since p is prime, nothing divides into p except p and 1.
Therefore for any integer a between 0 and p − 1, the greatest common divisor
gcd(a, p) is 1. The Euclidean algorithm computes out integers u and v so that
ua + vp = 1, so that ua = 1 (mod p). Adding or subtracting enough multiples
of p to u, we find a reciprocal for a.
2.10. Gauss-Jordan elimination
0
1
1
1
0
1
1
0
0
1
0
0
1
0
0
applied to (A 1):
1 0 1 0 0
0 1 0 1 0
1 0 0 0 1
0 1 0 1 0
1 0 1 0 0
1 0 0 0 1
0 1 0 1 0
1 0 1 0 0
1 1 0 1 1
0
1 0 1 0
0 1 0 0
1
0
1 1 1 1
0
0 1 0 1
1
0 1 0 0
0
1 1 1 1
So
$A^{-1} = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix}.$
2.11. Carry out Gauss–Jordan elimination thinking of the entries as living in
the field of rational functions. The result has rational functions as entries. For
any value of t for which none of the denominators of the entries are zero, and
none of the pivot entries are zero, the rank is just the number of pivots. There
are finitely many entries, and the denominator of each entry is a polynomial,
so has finitely many roots.
3.2. You could try letting U be the horizontal plane, consisting in the vectors
$x = \begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix},$
and V any other plane, for instance the plane consisting in the vectors
$x = \begin{pmatrix} x_1 \\ 0 \\ x_3 \end{pmatrix}.$
3.3. Any vector w in Rn splits uniquely into w = u + v for some u in U and v in
V . So we can unambiguously define a map Q by the equation Qw = Su + T v.
It is easy to check that this map Q is linear. Conversely, given any map
Q : Rn → Rp , define S = Q|U and T = Q|V .
3.7. The pivot columns of (A B) form a basis for U + W . All columns are
pivot columns just when U + W is a direct sum.
4.1. Take any point of the plane, project it horizontally onto the vertical axis,
and then rotate the vertical axis clockwise by a right angle to become the
horizontal axis. If you do it twice, clearly you end up at the origin.
4.4. For λ = 2: the strings $e_1$ and $e_2, e_3$; for λ = 3: the string $e_4, e_5, e_6$.
4.6. Clearly v by itself is a string of length 1. Let k be the length of the longest
string starting from v. So the string is
$0 \xleftarrow{\ T - \lambda I\ } (T - \lambda I)^k v \xleftarrow{\ T - \lambda I\ } (T - \lambda I)^{k-1} v \xleftarrow{\ T - \lambda I\ } \cdots \xleftarrow{\ T - \lambda I\ } (T - \lambda I)\, v \xleftarrow{\ T - \lambda I\ } v.$
Suppose there is a linear relation, say
$c_0 v + c_1 (T - \lambda I)\, v + \cdots + c_k (T - \lambda I)^k v = 0.$
Multiply both sides with a big enough power of T − λI to kill off all but the
first term. This power exists, because all of the vectors in the string are killed
by some power of T − λI, and the further down the list, the smaller a power
of T − λI you need to do the job, since you already have some T − λI sitting
around. So this forces c0 = 0. Hit with the next smallest power of T − λI to
force c1 = 0, etc.
4.7. Take the shortest possible linear relation between nonzero generalized eigenvectors.
Suppose that it involves vectors $v_i$ with various different eigenvalues
$\lambda_i$: $(T - \lambda_i I)^{k_i} v_i = 0$. If you can find a shorter relation, use it instead. If you
can find a relation of the same length with lower powers $k_i$, use it instead. Let
$w_i = (T - \lambda_1 I)\, v_i$. These $w_i$ are still generalized eigenvectors, each with the
same eigenvalue as vi , but a smaller power for k1 . So these wi must all vanish.
So all vi are λ1 -eigenvectors. The same works for λ2 , so all of the vectors vi
are also λ2 eigenvectors, so all vanish.
To show that the sum of generalized eigenspaces is a direct sum, we need
to show that every vector in it has a unique expression as a sum of generalized
eigenvectors of distinct eigenvalues. If there are two such expressions, take their
difference and you get a linear relation among generalized eigenvectors.
4.8. Two ways:(1) There is an eigenvector for each eigenvalue. Pick one for
each. Eigenvectors with different eigenvalues are linearly independent, so they
form a basis. (2) Each Jordan block of size k has an eigenvalue λ of multiplicity
k. So all blocks must be 1 × 1, and hence A is diagonal in Jordan normal form.
4.9.
4.10.
1
2
− 12
F =
1
F = 0
0
0
1
1
1
2
1
2
0
0
, F −1 AF =
0
0
1 , F −1 AF = 0
0
0
0
2
1
0
0
0
0
0
4.11. Two blocks, each at most n/2 long, and a zero block if needed to pad it
out.
4.12.
i 1
i
1
−4 2
i 1 0
0
4
2
i
1
1
− 4i
0
4
4
4
, F −1 AF = 0 i 0
F =
i
i
0 0 −i 1
0 −4
0
4
0 0 0 −i
− 41 4i − 14 − 4i
4.13.
0
1
F =
0
0
0
0
0
−1
F AF =
0
0
0
1
0
1
0
0
0
0
0
0
1
0
0
0
0
0
,
0
1
1
0
0
0
0
1
0
0
0 .
1
0
4.17. Clearly it is enough to prove the result for a single Jordan block. Given
an n × n Jordan block λ + ∆, let A be any diagonal matrix,
$A = \begin{pmatrix} a_1 & & & \\ & a_2 & & \\ & & \ddots & \\ & & & a_n \end{pmatrix},$
with all of the aj distinct and nonzero. Compute out the matrix B = A(λ + ∆)
to see that all diagonal entries of B are distinct and that B is upper triangular.
Why is B diagonalizable? Then λ + ∆ = A−1 B, a product of diagonalizable
matrices.
5.1. Using long division of polynomials, divide $x^5 + 3x^2 + 4x + 1$ by $x^2 + 1$: the
successive remainders are $-x^3 + 3x^2 + 4x + 1$, then $3x^2 + 5x + 1$, then $5x - 2$, so
$x^5 + 3x^2 + 4x + 1 = \left(x^2 + 1\right)\left(x^3 - x + 3\right) + 5x - 2:$
the quotient is $x^3 - x + 3$ and the remainder is $5x - 2$.
5.2. Dividing $a(x) = x^5 + 2x^3 + x^2 + 2$ by $b(x) = x^4 + 2x^3 + 4x^2 + 4x + 4$:
$x^5 + 2x^3 + x^2 + 2 = (x - 2)\left(x^4 + 2x^3 + 4x^2 + 4x + 4\right) + \left(2x^3 + 5x^2 + 4x + 10\right).$
Dividing b(x) by that remainder:
$x^4 + 2x^3 + 4x^2 + 4x + 4 = \left(\tfrac{1}{2}x - \tfrac{1}{4}\right)\left(2x^3 + 5x^2 + 4x + 10\right) + \tfrac{13}{4}\left(x^2 + 2\right).$
Dividing again:
$2x^3 + 5x^2 + 4x + 10 = (2x + 5)\left(x^2 + 2\right).$
Clearly r(x) = x² + 2 (up to scaling; we will always scale to get the leading
term to be 1). Solving backwards for r(x):
$r(x) = u(x)\,a(x) + v(x)\,b(x)$
with
$u(x) = -\frac{2}{13}\left(x - \frac{1}{2}\right), \qquad v(x) = \frac{2}{13}\left(x^2 - \frac{5}{2}\,x + 3\right).$
5.3. Euclidean algorithm yields:
2310 − 2 · 990 = 330
990 − 3 · 330 = 0
and
1386 − 1 · 990 = 396
990 − 2 · 396 = 198
396 − 2 · 198 = 0.
Therefore the greatest common divisors are 330 and 198 respectively. Apply
Euclidean algorithm to these:
330 − 1 · 198 = 132
198 − 1 · 132 = 66
132 − 2 · 66 = 0.
Therefore the greatest common divisor of 2310, 990 and 1386 is 66. Turning
these equations around, we can write
66 = 198 − 1 · 132
= 198 − 1 · (330 − 1 · 198)
= 2 · 198 − 1 · 330
= 2 (990 − 2 · 396) − 1 · (2310 − 2 · 990)
= 4 · 990 − 4 · 396 − 1 · 2310
= 4 · 990 − 4 · (1386 − 1 · 990) − 1 · 2310
= 8 · 990 − 4 · 1386 − 1 · 2310.
5.6. Clearly $\Delta_n^n = 0$, since ∆ shifts each vector of the string $e_n, e_{n-1}, \dots, e_1$
one step (and sends $e_1$ to 0). Therefore the minimal polynomial of $\Delta_n$ must
divide $x^n$. But for k < n, $\Delta_n^k e_n = e_{n-k}$ is not 0, so $x^k$ is not the minimal
polynomial.
5.7. If the minimal polynomial of λ + ∆ is m(x), then the minimal polynomial
of ∆ must be m(x − λ).
5.8. m (λ) = λ2 − 5λ − 2
5.10. Take the matrix A of T to have Jordan normal form, and you find
characteristic polynomial
$\det(A - \lambda I) = (\lambda_1 - \lambda)^{n_1} (\lambda_2 - \lambda)^{n_2} \cdots (\lambda_N - \lambda)^{n_N},$
with $n_1$ the sum of the sizes of all Jordan blocks with eigenvalue $\lambda_1$, etc. The
characteristic polynomial is clearly divisible by the minimal polynomial.
5.11. Split the minimal polynomial m(x) into real and imaginary parts. Check
that s(A) = 0 for s(x) either of these parts, a polynomial equation of the same
or lower degree. The imaginary part has lower degree, so vanishes.
5.12. Let
$z_k = \cos\frac{2\pi k}{n} + i \sin\frac{2\pi k}{n}.$
By de Moivre's theorem, $z_k^n = 1$. If we take k = 1, 2, . . . , n, these are the
so-called n-th roots of 1, so that
$z^n - 1 = (z - z_1)(z - z_2)\cdots(z - z_n).$
Clearly each root of 1 is at a different angle. $T^n = I$ implies that
$(T - z_1 I)(T - z_2 I)\cdots(T - z_n I) = 0,$
so by corollary 5.7, T is diagonalizable. Over the real numbers, we can take
$T = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},$
which satisfies $T^4 = 1$, but has complex eigenvalues, so is not diagonalizable
over the real numbers.
5.13. Applying forward elimination to the matrix
$\begin{pmatrix} 1 & 0 & 2 & 2 \\ 0 & 2 & 2 & 6 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 3 \\ 1 & 1 & 3 & 5 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 3 \\ 0 & 2 & 2 & 6 \\ 1 & -1 & 1 & -1 \end{pmatrix},$
add −(row 1) to row 5 and −(row 1) to row 9:
$\begin{pmatrix} 1 & 0 & 2 & 2 \\ 0 & 2 & 2 & 6 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 3 \\ 0 & 1 & 1 & 3 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 3 \\ 0 & 2 & 2 & 6 \\ 0 & -1 & -1 & -3 \end{pmatrix}.$
Move the pivot ↘. Add −½(row 2) to row 4, −½(row 2) to row 5, −½(row 2) to row 7,
−(row 2) to row 8, and ½(row 2) to row 9:
$\begin{pmatrix} 1 & 0 & 2 & 2 \\ 0 & 2 & 2 & 6 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$
Move the pivot ↘. Move the pivot →. Move the pivot →.
Cutting out all of the pivotless columns after the first one, and all of the zero
rows, yields
$\begin{pmatrix} 1 & 0 & 2 \\ 0 & 2 & 2 \end{pmatrix}.$
Scale row 2 by ½:
$\begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \end{pmatrix}.$
The minimal polynomial is therefore
$-2 - \lambda + \lambda^2 = (\lambda + 1)(\lambda - 2).$
The eigenvalues are λ = −1 and λ = 2. We can’t see which eigenvalue has
multiplicity 2 and which has multiplicity 1.
5.17. If $\varepsilon_1 \neq \varepsilon_2$, then D = T and N = 0. If $\varepsilon_1 = \varepsilon_2$, then
$D = \begin{pmatrix} \varepsilon_1 & 0 \\ 0 & \varepsilon_2 \end{pmatrix}, \qquad N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.$
If we imagine $\varepsilon_1$ and $\varepsilon_2$ as variables, then N "jumps" (its top right corner
changes from 0 to 1 "suddenly") as $\varepsilon_1$ and $\varepsilon_2$ collide.
5.18. We have seen previously that S and T will preserve each other's generalized
eigenspaces: if $(S - \lambda)^k x = 0$, then $(S - \lambda)^k\, Tx = 0$. Therefore we can restrict
to a generalized eigenspace of S, and then further restrict to a generalized
eigenspace of T. So we can assume that $S = \lambda_0 + N_0$ and $T = \lambda_1 + N_1$, with
$\lambda_0$ and $\lambda_1$ complex numbers and $N_0$ and $N_1$ commuting nilpotent linear maps.
But then $ST = \lambda_0\lambda_1 + N$ where $N = \lambda_0 N_1 + \lambda_1 N_0 + N_0 N_1$. Clearly large
enough powers of N will vanish, because they will be sums of terms like
$\lambda_0^j\, \lambda_1^k\, N_0^{k+\ell}\, N_1^{j+\ell}.$
So N is nilpotent.
6.11. If we have a matrix A with n distinct eigenvalues, then every nearby
matrix has nearby eigenvalues, so still distinct.
7.2.
det (A − λ I) = (z1 − λ) (z2 − λ) . . . (zn − λ)
= (−1)n Pz (λ)
= sn (z) − sn−1 (z)λ + sn−2 (z)λ2 + · · · + (−1)n λn .
8.1. There are (2n)! permutations of 1, 2, . . . , 2n, and each partition is associated
to $n!\,2^n$ different permutations, so there are $(2n)!/\left(n!\,2^n\right)$ different partitions of
1, 2, . . . , 2n.
8.2. In alphabetical order, the first pair must always start with 1. Then we can
choose any number to sit beside the 1, and the smallest number not chosen
starts the second pair, etc.
(a) {1, 2}
(b) {1, 2} , {3, 4}
{1, 3} , {2, 4}
{1, 4} , {2, 3}
(c) {1, 2} , {3, 4} , {5, 6}
{1, 2} , {3, 5} , {4, 6}
{1, 2} , {3, 6} , {4, 5}
{1, 3} , {2, 4} , {5, 6}
{1, 3} , {2, 5} , {4, 6}
{1, 3} , {2, 6} , {4, 5}
{1, 4} , {2, 3} , {5, 6}
{1, 4} , {2, 5} , {3, 6}
{1, 4} , {2, 6} , {3, 5}
{1, 5} , {2, 3} , {4, 6}
{1, 5} , {2, 4} , {3, 6}
{1, 5} , {2, 6} , {3, 4}
{1, 6} , {2, 3} , {4, 5}
{1, 6} , {2, 4} , {3, 5}
{1, 6} , {2, 5} , {3, 4}
9.5. If a linear map T : V → W is 1-to-1 then $T0 \neq Tv$ unless v = 0, so there
are no vectors v in the kernel of T except v = 0. On the other hand, suppose
that T : V → W has only the zero vector in its kernel. If v1 6= v2 but T v1 = T v2 ,
then v1 − v2 6= 0 but T (v1 − v2 ) = T v1 − T v2 = 0. So the vector v1 − v2 is a
nonzero vector in the kernel of T .
9.6. Hom (V, W ) is always a vector space, for any vector space W .
9.7. If dim V = n, then V is isomorphic to Kn , so without loss of generality we
can just assume V = Kn . But then V ∗ is identified with row matrices, so has
dimension n as well: dim V ∗ = dim V .
9.10.
fx (α + β) = (α + β) (x)
= α(x) + β(x)
= fx (α) + fx (β).
Similarly for fx (s α).
9.11. fsx (α) = α(sx) = s α(x) = sfx (α). Similarly for any two vectors x and
y in V , expand out fx+y .
9.12. We need only find a linear function α for which α (x) 6= α (y). Identifying
V with Kn using some choice of basis, we can assume that V = Kn , and thus
x and y are different vectors in Kn . So some entry of x is not equal to some
entry of y, say xj 6= yj . Let α be the function α(z) = zj , i.e. α = ej .
10.1. Calculate out $\langle Ax, x\rangle$ for $x = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n$ to see that
this gives Q(x). We know that the equation $\langle Ax, x\rangle = Q(x)$ determines the
symmetric matrix A uniquely.
11.1.
p
L
1, 2
2, 1
2, 1
U
1 0
0 1
1 0
0 1
1 0
0 1
0
1
1
0
1
0
1
0
0
1
0
1
12.14. If b(x, y) = 0 for all y, then plug in y = x to find b(x, x) = 0.
12.18.
$\frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}.$
12.19.
(a) Suppose that ξ = T x and η = T y, i.e ξ(z) = b(x, z) and η(z) = b(y, z)
for any vector z. Then ξ(z) + η(z) = b(x, z) + b(y, z) = b(x + y, z), so
T x + T y = ξ + η = T (x + y). Similarly for scaling: aT x = T ax.
(b) If T x = 0 then 0 = b(x, y) for all vectors y from V , so x lies in the kernel
of b, i.e. the kernel of T is the kernel of b. Since b is nondegenerate, the
kernel of b consists precisely in the 0 vector. Therefore T is 1-to-1.
(c) T : V → V∗ is a 1-to-1 linear map, and V and V∗ have the same dimension,
say n. Since dim ker T + dim im T = dim V = n and dim ker T = 0, we get
dim im T = n, so T is onto, hence T is an isomorphism.
(d) You have to take x = T −1 ξ.
13.1. The tensor product is $\delta^i{}_j\, \xi_k$. There are two contractions:
a. $\delta^i{}_i\, \xi_k = n\,\xi_k$ (in other words, nξ), and
b. $\delta^i{}_j\, \xi_i = \xi_j$ (in other words, ξ).
13.3.
$\varepsilon_{ijk}\, x^i y^j z^k = \det\begin{pmatrix} x & y & z \end{pmatrix}.$
13.5. One contraction is $s_k = t^i{}_{ik}$.
$(F_*t)^i{}_{ik} = F^i{}_l\, t^l{}_{pq} \left(F^{-1}\right)^p{}_i \left(F^{-1}\right)^q{}_k = t^l{}_{pq} \left(F^{-1}F\right)^p{}_l \left(F^{-1}\right)^q{}_k = t^l{}_{lq} \left(F^{-1}\right)^q{}_k = (F_*s)_k.$
14.3.
x ⊗ y = 4 e1 ⊗ e1 + 5 e1 ⊗ e2 + 8 e2 ⊗ e1 + 10 e2 ⊗ e2 + 12 e3 ⊗ e1 + 15 e3 ⊗ e2
14.4. Picking a basis v1 , v2 , . . . , vp for V and w1 , w2 , . . . , wq for W , we have
already seen that every tensor in V ⊗ W has the form
tiJ vi ⊗ wJ ,
so is a sum of pure tensors.
14.11. Take any linear map T : V → V and define a tensor t in V ⊗ V ∗ , which
is thus a bilinear map t (ξ, v) for ξ in V ∗ and v in V ∗∗ = V , by the rule
t (ξ, v) = ξ(T v).
Clearly if we scale T , then we scale t by the same amount. Similarly, if we add
linear maps on the right side of equation 15, then we add tensors on the left
side. Therefore the mapping taking T to t is linear. If t = 0 then ξ(T v) = 0
for any vector v and covector ξ. Thus T v = 0 for any vector v, and so T = 0.
Therefore the map taking T to t is 1-to-1. Finally, we need to see that the map
taking T to t is onto, the tricky part. But we can count dimensions for that: if
dim V = n then $\dim \operatorname{Hom}(V, V) = n^2$ and $\dim V \otimes V^* = n^2$.
Bibliography
[1] Emil Artin, Galois theory, 2nd ed., Dover Publications Inc., Mineola, NY,
1998. 21
[2] Richard P. Feynman, Robert B. Leighton, and Matthew Sands, The Feynman lectures on physics. Vol. 2: Mainly electromagnetism and matter,
Addison-Wesley Publishing Co., Inc., Reading, Mass.-London, 1964. 117
[3] Tosio Kato, Perturbation theory for linear operators, Springer-Verlag, Berlin,
1995. 65
[4] Peter D. Lax, Linear algebra, John Wiley & Sons Inc., New York, 1997. 17
[5] I. G. Macdonald, Symmetric functions and Hall polynomials, 2nd ed., Oxford University Press, New York, 1995. 74
[6] Peter J. Olver, Classical invariant theory, Cambridge University Press,
Cambridge, 1999. 123
[7] Claudio Procesi, Lie groups, Springer, New York, 2007. 123
[8] Michael Spivak, Calculus on manifolds, Addison-Wesley, Reading, Mass.,
1965. 137
[9] Michael Spivak, Calculus, 3rd ed., Publish or Perish, 1994. 17
List of notation
Rn
The space of all vectors with n real number entries, 17
p×q
Matrix with p rows and q columns, 17
0
Any matrix whose entries are all zeroes, 19
Σ
Sum, 22
I
The identity matrix, 25
In
The n × n identity matrix, 25
ei
The i-th standard basis vector (also the i-th column of the identity matrix), 25
A−1
The inverse of a square matrix A, 28
det
The determinant of a square matrix, 47
At
The transpose of a matrix A, 60
dim
Dimension of a subspace, 82
ker A
The kernel of a matrix A, 85
im A
The image of a matrix A, 89
hx, yi
Inner product of two vectors, 117
||x||
The length of a vector, 117
|z|
Modulus (length) of a complex number, 140
arg z
Argument (angle) of a complex number, 140
C
The set of all complex numbers, 140
Cn
The set of all complex vectors with n entries, 140
hz, wi
Hermitian inner product, 142
A∗
Adjoint of a complex matrix or complex linear map A, 143
Index
alphabetical order
see partition, alphabetical order,
79
analytic function, 61
antisymmetrize, 120
associated
matrix, see matrix, associated
bilinear form, see bilinear form,
degenerate
dimension
effective, 97
direct sum, 8, see subspace, direct sum
effective dimension, see dimension, effective
eigenvector
generalized, 46
basis
dual, 92
bilinear form, 107
degenerate, 108
nondegenerate, 108
positive definite, 109
symmetric, 109
bilinear map, 130
block
Jordan, see Jordan block
Boolean numbers, 22
field, 21
form
bilinear, see bilinear form
volume, 140
form, exterior, 139
generalized
eigenvector, see eigenvector, generalized
Cayley–Hamilton
theorem, see theorem, Cayley–Hamilton
commuting, 58
complement, see subspace, complementary
complementary subspace, see subspace,
complementary
complex
linear map, see linear map, complex
vector space, see vector space, complex
component, 117
components
see principal components, 95
composition, 9
covector, 91, 118
dual, 134
degenerate
Hermitian
inner product, see inner product,
Hermitian
homomorphism, 91
identity
permutation, see permutation, identity
identity map, see linear, map, identity
inertia
law of, 111
inner product, 18
Hermitian, 18
space, 18
intersection, 35
invariant
subspace, see subspace, invariant
inverse, 21
isomorphism, 10
Jordan
block, 44
normal form, 44
linear
map, 9
complex, 17
identity, 9
map
linear, see linear map
matrix
associated, 9
skew-adjoint, 64
skew-symmetric, 64
minimal polynomial, see polynomial,
minimal
multilinear, 127
multiplicative
function, 51
nilpotent, 58
nondegenerate
bilinear form, see bilinear form,
nondegenerate
normal form
Jordan, see Jordan normal form
partition, 79
alphabetical order, 79
associated to a permutation, 79
permutation
identity, 101
natural, of partition, 80
Pfaffian, 81
polarization, 135
polynomial, 135
homogeneous, 135
minimal, 54
principal component, 97
principal components, 95
product
tensor, see tensor, product
product, inner, see inner product
pure
tensor, see tensor, pure
quadratic form
kernel, 115
quotient space, 14
rank
tensor, see tensor, rank
real
vector space, see vector space, real
reciprocal, 21
restriction, 13, 57
skew-symmetric
normal form, 77
string, 44
subspace
complement, see subspace, complementary
complementary, 35
direct sum, 35
invariant, 58
sum, 35
sum
of subspaces, see subspace, sum
sum, direct, see subspace, direct sum
Sylvester, 111
symmetrize, 120
tensor, 117, 128
antisymmetric, 120
contraction, 119
contravariant, 134
covariant, 134
product, 119
pure, 130
rank, 130
symmetric, 120
tensor product
basis, 129
of tensors, 128
of vector spaces, 128
of vectors, 128
theorem
Cayley–Hamilton, 56
trace, 63, 74, 120
translate, 13
transpose, 93
transverse, 38
vector, 3
vector space, 3
complex, 17
real, 17
weight
of polynomial, 72