The Jordan Normal Form
Erik Wahlén
ODE Spring 2011
Introduction
The purpose of these notes is to present a proof of the Jordan normal form (also
called the Jordan canonical form) for a square matrix. Even if a matrix is real its
Jordan normal form might be complex and we shall therefore allow all matrices to
be complex. For real matrices there is, however, a variant of the Jordan normal
form which is real; see the remarks in Teschl, p. 60.
The result we want to prove is the following.
Theorem 1. Let A be an n × n matrix. There exists an invertible n × n matrix T such that

$$T^{-1} A T = J,$$

where J is a block matrix,

$$J = \begin{pmatrix} J_1 & & \\ & \ddots & \\ & & J_m \end{pmatrix},$$

and each block J_i is a square matrix of the form

$$J_i = \lambda I + N = \begin{pmatrix} \lambda & 1 & & 0 \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda \end{pmatrix}, \tag{1}$$

where λ is an eigenvalue of A, I is a unit matrix and N has ones on the line directly above the diagonal and zeros everywhere else.
If we identify A with the linear operator x ↦ Ax on C^n, the relation J = T^{-1}AT means that J is the matrix for A in the basis consisting of the columns of T. The theorem says that there exists a basis for C^n in which the linear operator A has the matrix J. When proving a result in linear algebra it is often more convenient to work with linear operators on vector spaces rather than with their matrix representations in some basis. Throughout the rest of the notes we shall therefore assume that V is an n-dimensional complex vector space and that A : V → V is a linear operator on V.
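As an aside, the factorization in Theorem 1 can be computed with a computer algebra system; here is a minimal SymPy sketch (the example matrix is a hypothetical one, chosen so that its normal form has a single 2 × 2 block):

```python
import sympy as sp

# Illustration of Theorem 1: jordan_form() returns T and J with
# A = T*J*T**-1, i.e. T**-1 * A * T = J. The matrix is hypothetical.
A = sp.Matrix([[3, 1],
               [-1, 1]])
T, J = A.jordan_form()
print(J)                     # [[2, 1], [0, 2]]: one 2x2 Jordan block
assert T.inv() * A * T == J
```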
Recall that the kernel (or null space) of A is defined by
ker A = {x ∈ V : Ax = 0}
and that the range of A is defined by
range A = {Ax : x ∈ V }.
The kernel and the range are both linear subspaces of V and the dimension theorem
says that
dim ker A + dim range A = n.
Recall also that λ ∈ C is called an eigenvalue of A if there exists some vector x ≠ 0 in V such that

$$Ax = \lambda x.$$

The vector x is called an eigenvector of A corresponding to the eigenvalue λ.
The subspace ker(A − λI) of V , that is, the subspace spanned by the eigenvectors belonging to λ, is called the eigenspace corresponding to λ. The number
dim ker(A − λI) is called the geometric multiplicity of λ. Note that λ ∈ C is an
eigenvalue if and only if it is a root of the characteristic polynomial pchar (z) =
det(A − zI). By the fundamental theorem of algebra we can write pchar (z) as a
product of first degree polynomials,

$$p_{\mathrm{char}}(z) = (-1)^n (z - \lambda_1)^{a_1} (z - \lambda_2)^{a_2} \cdots (z - \lambda_k)^{a_k},$$
where λ1 , . . . , λk are the distinct eigenvalues of A. The positive integer aj is
called the algebraic multiplicity of the eigenvalue λj . The corresponding geometric
multiplicity will be denoted gj .
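These multiplicities are easy to compute in practice; a minimal SymPy sketch (the matrix is the one from Example 2 below):

```python
import sympy as sp

# eigenvals() returns the algebraic multiplicities a_j; the number of
# nullspace basis vectors of A - lam*I is the geometric multiplicity g_j.
A = sp.Matrix([[2, 1],
               [0, 2]])
for lam, a in A.eigenvals().items():
    g = len((A - lam * sp.eye(2)).nullspace())
    print(lam, a, g)         # eigenvalue 2: a = 2, g = 1
```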
Decomposition into Invariant Subspaces
We begin with some definitions. Let V_1, . . . , V_k be subspaces of V. We say that V is the direct sum of V_1, . . . , V_k if each vector x ∈ V can be written in a unique way as

$$x = x_1 + x_2 + \cdots + x_k, \qquad \text{where } x_j \in V_j, \ j = 1, \ldots, k.$$

If this is the case we use the notation

$$V = V_1 \oplus V_2 \oplus \cdots \oplus V_k.$$

We say that a subspace W of V is invariant under A if

$$x \in W \Rightarrow Ax \in W.$$
Example 1. Suppose that A has n distinct eigenvalues λ_1, . . . , λ_n with corresponding eigenvectors u_1, . . . , u_n. It then follows that the vectors u_1, . . . , u_n are linearly independent and thus form a basis for V. Let

$$\ker(A - \lambda_k I) = \{z u_k : z \in \mathbb{C}\}, \qquad k = 1, \ldots, n,$$
be the corresponding eigenspaces. Each eigenspace is invariant under A since
(A − λk I)u = 0 ⇒ (A − λk I)Au = A(A − λk I)u = 0.
Moreover,
V = ker(A − λ1 I) ⊕ ker(A − λ2 I) ⊕ · · · ⊕ ker(A − λn I)
by the denition of a basis.
More generally, suppose that A has k distinct eigenvalues λ1 , . . . , λk and that
the geometric multiplicity gj of each λj equals the algebraic multiplicity aj . Let
ker(A − λ_j I), j = 1, . . . , k, be the corresponding eigenspaces. We can then find a basis for each eigenspace consisting of g_j eigenvectors. The union of these bases consists of g_1 + · · · + g_k = a_1 + · · · + a_k = n elements and is linearly independent, since eigenvectors belonging to different eigenvalues are linearly independent. We
thus obtain a basis for V and it follows that
V = ker(A − λ1 I) ⊕ ker(A − λ2 I) ⊕ · · · ⊕ ker(A − λk I).
In this basis, A has the matrix

$$D = \begin{pmatrix} \lambda_1 I_1 & & \\ & \ddots & \\ & & \lambda_k I_k \end{pmatrix},$$

where each I_j is a g_j × g_j unit matrix. In other words, D is a diagonal matrix with the eigenvalues on the diagonal, each repeated g_j times. One says that A is diagonalized in the new basis.
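For a concrete matrix this can be carried out with SymPy's diagonalize; a minimal sketch on a hypothetical example with distinct eigenvalues:

```python
import sympy as sp

# A hypothetical matrix with distinct eigenvalues 1 and 3, hence
# diagonalizable; diagonalize() returns T and D with A = T*D*T**-1.
A = sp.Matrix([[2, 1],
               [1, 2]])
T, D = A.diagonalize()
print(D)                     # diag(1, 3)
assert T.inv() * A * T == D
```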
Unfortunately, not all matrices can be diagonalized.
Example 2. Consider the matrix

$$A = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}.$$

The characteristic polynomial is (λ − 2)², so the only eigenvalue is λ = 2 with algebraic multiplicity a = 2. On the other hand

$$(A - 2I)x = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_2 \\ 0 \end{pmatrix},$$
so that g = 1 and the eigenspace is ker(A − 2I) = {(z, 0) : z ∈ C}. Clearly we cannot write C² as a direct sum of the eigenspaces in this case. Note, however, that

$$(A - 2I)^2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$

so that C² = ker(A − 2I)².
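A quick SymPy check of this example:

```python
import sympy as sp

# Check of Example 2: A - 2I has rank 1, so g = dim ker(A - 2I) = 1,
# while (A - 2I)**2 = 0, so that C^2 = ker (A - 2I)^2.
A = sp.Matrix([[2, 1],
               [0, 2]])
N = A - 2 * sp.eye(2)
print(N.rank(), N**2)        # 1 and the zero matrix
```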
Given a polynomial p(z) = α_m z^m + α_{m−1} z^{m−1} + · · · + α_1 z + α_0, we define

$$p(A) = \alpha_m A^m + \alpha_{m-1} A^{m-1} + \cdots + \alpha_1 A + \alpha_0 I.$$
Lemma 1. There exists a non-zero polynomial p such that p(A) = 0.

Proof. Here it is convenient to identify A with its matrix in some basis. Note that C^{n×n} is an n²-dimensional vector space. It follows that the n² + 1 matrices I, A, A², . . . , A^{n²} are linearly dependent. But this means that there exist numbers α_0, . . . , α_{n²}, not all zero, such that

$$\alpha_{n^2} A^{n^2} + \alpha_{n^2-1} A^{n^2-1} + \cdots + \alpha_1 A + \alpha_0 I = 0,$$

that is, p(A) = 0, where p(z) = α_{n²} z^{n²} + · · · + α_1 z + α_0.
Let p_min(z) be a monic polynomial (with leading coefficient 1) of minimal degree such that p_min(A) = 0. If p(z) is any polynomial such that p(A) = 0 it follows that p(z) = q(z)p_min(z) for some polynomial q. To see this, use the division algorithm on p and p_min:

$$p(z) = q(z)p_{\min}(z) + r(z), \qquad \text{where } r = 0 \text{ or } \deg r < \deg p_{\min}.$$

Thus r(A) = p(A) − q(A)p_min(A) = 0. But this implies that r(z) = 0, since p_min has minimal degree. This shows that the polynomial p_min is unique (two monic polynomials of minimal degree would divide each other and therefore be equal). It is called the minimal polynomial for A.
By the fundamental theorem of algebra, we can write the minimal polynomial as a product of first degree polynomials,

$$p_{\min}(z) = (z - \lambda_1)^{m_1} (z - \lambda_2)^{m_2} \cdots (z - \lambda_k)^{m_k}, \tag{2}$$

where the numbers λ_j are distinct and each m_j ≥ 1. Note that we don't know yet that the roots λ_j of the minimal polynomial coincide with the eigenvalues of A. This will be shown in Theorem 2 below.
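The proof of Lemma 1 also suggests a way of computing the minimal polynomial: look for the first linear dependence among the powers I, A, A², . . . . A minimal SymPy sketch (the function name is ours; by Lemma 1 a dependence appears at degree n² at the latest, and by Theorem 3 below already at degree n):

```python
import sympy as sp

def min_poly(A, z):
    """Monic polynomial of least degree with p(A) = 0, found as the
    first linear dependence among I, A, A**2, ... (cf. Lemma 1)."""
    n = A.shape[0]
    powers = [sp.eye(n)]
    for d in range(1, n * n + 1):
        powers.append(powers[-1] * A)
        M = sp.Matrix([list(P) for P in powers])   # row k = vec(A**k)
        null = M.T.nullspace()
        if null:                      # first dependence: degree d reached
            c = null[0] / null[0][d]  # normalize leading coefficient to 1
            return sp.expand(sum(c[k] * z**k for k in range(d + 1)))

z = sp.symbols('z')
print(min_poly(sp.Matrix([[2, 1], [0, 2]]), z))  # z**2 - 4*z + 4 = (z - 2)**2
```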
Lemma 2. Suppose that p(z) = p_1(z)p_2(z) where p_1 and p_2 are relatively prime. If p(A) = 0 we have that

$$V = \ker p_1(A) \oplus \ker p_2(A)$$

and each subspace ker p_j(A) is invariant under A.

Proof. The invariance follows from p_j(A)Ax = Ap_j(A)x = 0, x ∈ ker p_j(A). Since p_1 and p_2 are relatively prime, it follows by Euclid's algorithm that there exist polynomials q_1, q_2 such that

$$p_1(z)q_1(z) + p_2(z)q_2(z) = 1.$$

Thus

$$p_1(A)q_1(A) + p_2(A)q_2(A) = I.$$
Applying this identity to the vector x ∈ V, we obtain

$$x = \underbrace{p_1(A)q_1(A)x}_{x_2} + \underbrace{p_2(A)q_2(A)x}_{x_1},$$

where

$$p_2(A)x_2 = p_2(A)p_1(A)q_1(A)x = p(A)q_1(A)x = 0,$$

so that x_2 ∈ ker p_2(A). Similarly x_1 ∈ ker p_1(A). Thus V = ker p_1(A) + ker p_2(A).
On the other hand, if

$$x_1 + x_2 = x_1' + x_2', \qquad x_j, x_j' \in \ker p_j(A), \ j = 1, 2,$$

we obtain that

$$u = x_1 - x_1' = x_2' - x_2 \in \ker p_1(A) \cap \ker p_2(A),$$

so that

$$u = q_1(A)p_1(A)u + q_2(A)p_2(A)u = 0.$$

It follows that the representation x = x_1 + x_2 is unique and therefore

$$V = \ker p_1(A) \oplus \ker p_2(A).$$
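The polynomials q_1, q_2 can be produced with the extended Euclidean algorithm; a minimal SymPy sketch (poly_at is our helper; the matrix and the factors p_1 = z², p_2 = z − 2 are those of Example 3 below, where p_min(z) = z²(z − 2)):

```python
import sympy as sp

z = sp.symbols('z')

def poly_at(p, A):
    """Evaluate the polynomial p(z) at the square matrix A (Horner)."""
    out = sp.zeros(*A.shape)
    for c in sp.Poly(p, z).all_coeffs():
        out = out * A + c * sp.eye(A.shape[0])
    return out

A = sp.Matrix([[1, 0, 1], [0, 2, 0], [-1, 0, -1]])
p1, p2 = z**2, z - 2
q1, q2, h = sp.gcdex(p1, p2, z)              # q1*p1 + q2*p2 = h = gcd = 1
assert h == 1

E2 = poly_at(p1 * q1, A)                     # the component map x -> x_2
E1 = poly_at(p2 * q2, A)                     # the component map x -> x_1
assert E1 + E2 == sp.eye(3)
assert E1 * E2 == sp.zeros(3, 3)             # complementary projections
assert poly_at(p1, A) * E1 == sp.zeros(3, 3) # range of E1 lies in ker p1(A)
```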
Theorem 2. With λ_1, . . . , λ_k and m_1, . . . , m_k as in (2) we have

$$V = \ker(A - \lambda_1 I)^{m_1} \oplus \cdots \oplus \ker(A - \lambda_k I)^{m_k},$$

where each ker(A − λ_j I)^{m_j} is invariant under A. The numbers λ_1, . . . , λ_k are the eigenvalues of A.

Proof. We begin by noting that the polynomials (z − λ_j)^{m_j}, j = 1, . . . , k, are relatively prime. Repeated application of Lemma 2 therefore shows that

$$V = \ker(A - \lambda_1 I)^{m_1} \oplus \cdots \oplus \ker(A - \lambda_k I)^{m_k},$$

with each ker(A − λ_j I)^{m_j} invariant.
Consider the linear operator A : ker(A − λ_j I)^{m_j} → ker(A − λ_j I)^{m_j}. It is clear that ker(A − λ_j I)^{m_j} ≠ {0}, for otherwise p_min would not be minimal. Since every linear operator on a (non-trivial) finite dimensional complex vector space has an eigenvalue, it follows that there is some non-zero element u ∈ ker(A − λ_j I)^{m_j} with Au = λu, λ ∈ C. But then

$$0 = (A - \lambda_j I)^{m_j} u = (\lambda - \lambda_j)^{m_j} u,$$
so λ = λ_j. This shows that the roots λ_j of the minimal polynomial are eigenvalues of A. On the other hand if u is an eigenvector of A corresponding to the eigenvalue λ, we have

$$0 = p_{\min}(A)u = (A - \lambda_1 I)^{m_1} \cdots (A - \lambda_k I)^{m_k} u = (\lambda - \lambda_1)^{m_1} \cdots (\lambda - \lambda_k)^{m_k} u,$$

so λ = λ_j for some j, that is, every eigenvalue is a root of the minimal polynomial.
The subspace ker(A − λ_j I)^{m_j} is called the generalized eigenspace corresponding to λ_j and a non-zero vector x ∈ ker(A − λ_j I)^{m_j} is called a generalized eigenvector. The number m_j is the smallest exponent m such that (A − λ_j I)^m vanishes on ker(A − λ_j I)^{m_j}. Suppose for a contradiction that e.g. (A − λ_1 I)^{m_1−1} u = 0 for all u ∈ ker(A − λ_1 I)^{m_1}. Writing x ∈ V as x = x_1 + x̃ according to the decomposition

$$V = \ker(A - \lambda_1 I)^{m_1} \oplus \ker \tilde p(A),$$

where p̃(z) = (z − λ_2)^{m_2} · · · (z − λ_k)^{m_k}, we would then obtain that

$$(A - \lambda_1 I)^{m_1-1} \tilde p(A)x = \tilde p(A)(A - \lambda_1 I)^{m_1-1} x_1 + (A - \lambda_1 I)^{m_1-1} \tilde p(A)\tilde x = 0,$$

contradicting the definition of the minimal polynomial.
If we select a basis {uj,1 , . . . , uj,nj } for each generalized eigenspace, then the
union {u1,1 , . . . , u1,n1 , u2,1 , . . . , u2,n2 , . . . , uk,1 , . . . , uk,nk } will be a basis for V . Since
each generalized eigenspace is invariant under the linear operator A, the matrix
for A in this basis will have the block form
$$\begin{pmatrix} A_1 & & \\ & \ddots & \\ & & A_k \end{pmatrix},$$
where each A_j is an n_j × n_j square matrix. What remains in order to prove Theorem 1 is to show that we can select a basis for each generalized eigenspace so that each block A_j takes the form (1) or possibly consists of multiple blocks of the form (1).
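For a concrete matrix the decomposition of Theorem 2 can be computed directly; a minimal SymPy sketch (using the matrix of Example 3 below; the exponent n = 3 is always sufficient, since m_j ≤ n):

```python
import sympy as sp

# Bases of the generalized eigenspaces ker (A - lam*I)^m together
# form a basis of C^3 (Theorem 2).
A = sp.Matrix([[1, 0, 1], [0, 2, 0], [-1, 0, -1]])
basis = []
for lam in A.eigenvals():
    basis += ((A - lam * sp.eye(3)) ** 3).nullspace()
T = sp.Matrix.hstack(*basis)
assert T.rank() == 3          # the union of the bases spans C^3
```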
Proof of Theorem 1
By restricting A to a generalized eigenspace ker(A − λ_j I)^{m_j}, we can assume that A only has one eigenvalue, which we call λ. Set N = A − λI and let m be the smallest integer for which N^m = 0 (so that p_min(z) = (z − λ)^m for A). A linear operator N with the property that N^m = 0 for some m is called nilpotent.
Suppose that m = n (the dimension of V). This means that there is some vector u such that N^{n−1}u ≠ 0. It follows that the vectors u, N u, . . . , N^{n−1}u are linearly independent. Indeed, suppose that

$$\alpha_1 u + \alpha_2 N u + \cdots + \alpha_n N^{n-1} u = 0.$$

Applying N^{n−1} to this equation we obtain that α_1 N^{n−1}u = 0. Proceeding inductively we find that α_j = 0 for each j. Thus {N^{n−1}u, . . . , N u, u} is a basis for V.
The matrix for N in this basis is

$$\begin{pmatrix} 0 & 1 & & 0 \\ & 0 & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{pmatrix},$$

which means that we are done.
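A concrete instance of this computation (a sketch; the nilpotent matrix and the vector u are hypothetical):

```python
import sympy as sp

# A nilpotent N with N**2 != 0 = N**3, so m = n = 3, and a vector u
# with N**2 * u != 0. In the basis {N^2 u, N u, u} the matrix of N is
# the shift matrix with ones directly above the diagonal.
N = sp.Matrix([[0, 1, 2],
               [0, 0, 3],
               [0, 0, 0]])
u = sp.Matrix([0, 0, 1])
assert N**3 == sp.zeros(3, 3) and N**2 * u != sp.zeros(3, 1)
T = sp.Matrix.hstack(N**2 * u, N * u, u)
print(T.inv() * N * T)        # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
```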
In general, a set of non-zero vectors u, N u, . . . , N^{l−1}u, with N^l u = 0, is called a Jordan chain. We will prove the theorem in general by showing that there is a basis for V consisting of Jordan chains.
We prove the theorem by induction on the dimension of V. Clearly the theorem holds if V has dimension 1. Suppose now that the theorem holds for all complex vector spaces of dimension less than n, where n ≥ 2, and assume that dim V = n. Since N is nilpotent it is not injective and therefore dim range N < n (by the dimension theorem). By the induction hypothesis, we can therefore find a basis of Jordan chains

$$u_i,\ N u_i,\ \ldots,\ N^{l_i - 1} u_i, \qquad i = 1, \ldots, k,$$

for range N. For each u_i we can find a v_i ∈ V such that N v_i = u_i (since u_i ∈ range N). That is, each Jordan chain in the basis for range N can be extended by one element. We claim that the vectors

$$v_i,\ N v_i,\ N^2 v_i,\ \ldots,\ N^{l_i} v_i, \qquad i = 1, \ldots, k, \tag{3}$$
are linearly independent. Indeed, suppose that
$$\sum_{i=1}^{k} \sum_{j=0}^{l_i} \alpha_{i,j} N^j v_i = 0. \tag{4}$$
Applying N to this equality, we find that

$$\sum_{i=1}^{k} \sum_{j=0}^{l_i - 1} \alpha_{i,j} N^j u_i = \sum_{i=1}^{k} \sum_{j=0}^{l_i} \alpha_{i,j} N^{j+1} v_i = 0,$$
which, by hypothesis, implies that α_{i,j} = 0, 1 ≤ i ≤ k, 0 ≤ j ≤ l_i − 1. Looking at (4) this means that

$$\sum_{i=1}^{k} \alpha_{i,l_i} N^{l_i - 1} u_i = \sum_{i=1}^{k} \alpha_{i,l_i} N^{l_i} v_i = 0,$$

which again implies that α_{i,l_i} = 0, 1 ≤ i ≤ k, by our induction hypothesis.
Extend the vectors in (3) to a basis for V by possibly adding vectors {w̃_1, . . . , w̃_K}. For each i we have N w̃_i ∈ range N, so we can find an element ŵ_i in the span of the vectors in (3) such that N w̃_i = N ŵ_i. But then w_i = w̃_i − ŵ_i ∈ ker N and the vectors

$$v_i,\ N v_i,\ N^2 v_i,\ \ldots,\ N^{l_i} v_i, \quad i = 1, \ldots, k, \qquad w_1, \ldots, w_K$$

constitute a basis for V consisting of Jordan chains (the elements w_i are chains of length 1).
Some Further Remarks
The matrix J is not completely unique, since we can e.g. change the order of the Jordan blocks. It turns out that this is the only thing which is not unique. In other words both the number of blocks and their sizes are uniquely determined. Let us prove this. As in the previous section, it suffices to consider a nilpotent operator N : V → V. Let β be the total number of blocks and β(k) the number of blocks of size k × k. Then dim ker N = β, and dim ker N² differs from dim ker N by β − β(1). In the same manner, we find that

$$\dim \ker N = \beta,$$
$$\dim \ker N^2 = \dim \ker N + \beta - \beta(1),$$
$$\vdots$$
$$\dim \ker N^{k+1} = \dim \ker N^k + \beta - \beta(1) - \cdots - \beta(k).$$
It follows by induction that each β(k) is uniquely determined by N .
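Subtracting consecutive relations gives the closed formula β(k) = 2 dim ker N^k − dim ker N^{k−1} − dim ker N^{k+1}, which is easy to implement; a minimal SymPy sketch (the function name is ours):

```python
import sympy as sp

def block_sizes(N):
    """Number of k x k Jordan blocks of a nilpotent N, from
    beta(k) = 2*d_k - d_{k-1} - d_{k+1} with d_k = dim ker N**k."""
    n = N.shape[0]
    d = [0] + [n - (N**k).rank() for k in range(1, n + 2)]
    beta = {k: 2 * d[k] - d[k - 1] - d[k + 1] for k in range(1, n + 1)}
    return {k: b for k, b in beta.items() if b > 0}

N = sp.Matrix([[0, 1, 0], [0, 0, 0], [0, 0, 0]])
print(block_sizes(N))         # {1: 1, 2: 1}: one 1x1 and one 2x2 block
```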
Note that the number of Jordan blocks in the matrix J equals the number of
Jordan chains, so that there may be several Jordan blocks corresponding to the
same eigenvalue. The sum of the lengths of the Jordan chains equals the dimension
of the generalized eigenspace.
Let p_char(z) = det(A − zI) be the characteristic polynomial of A. Recall that p_char is independent of basis, so that p_char(z) = det(J − zI). Expanding repeatedly along the first column we find that p_char(z) = (−1)^n (z − λ_1)^{n_1} · · · (z − λ_k)^{n_k}, where n_j = dim ker(A − λ_j I)^{m_j} is the dimension of the generalized eigenspace corresponding to λ_j. Thus n_j = a_j, the algebraic multiplicity of λ_j. By the remarks above about the uniqueness of J, it follows that the geometric multiplicity g_j of each eigenvalue equals the number of Jordan chains for that eigenvalue.
The exponent m_j of the factor (z − λ_j)^{m_j} in the minimal polynomial is the smallest exponent m such that N^m = 0, where N = (A − λ_j I)|_{ker(A−λ_j I)^{m_j}}. Thus m_j is the length of the longest Jordan chain and m_j × m_j the size of the largest Jordan block. Clearly, m_j ≤ dim ker(A − λ_j I)^{m_j} = a_j. Thus the minimal polynomial divides the characteristic polynomial. Since p_min(A) = 0 we have proved the following result.
Theorem 3 (Cayley-Hamilton). Let p_char(z) = det(A − zI) be the characteristic polynomial of A. Then

$$p_{\mathrm{char}}(A) = 0.$$
Example 3. Let

$$A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ -1 & 0 & -1 \end{pmatrix}.$$

The characteristic polynomial of A is p_char(z) = −z²(z − 2). Thus, the only eigenvalues of A are λ_1 = 0 and λ_2 = 2, with algebraic multiplicities a_1 = 2 and a_2 = 1, respectively. The minimal polynomial must be z(z − 2) or z²(z − 2), since it divides p_char(z) and is divisible by z − λ_j for each j. We find that
$$A - 2I = \begin{pmatrix} -1 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & -3 \end{pmatrix}, \qquad A(A - 2I) = \begin{pmatrix} -2 & 0 & -2 \\ 0 & 0 & 0 \\ 2 & 0 & 2 \end{pmatrix},$$

and A²(A − 2I) = 0, so that p_min(z) = −p_char(z) = z²(z − 2). This means that
a basis of generalized eigenvectors must consist of one Jordan chain of length 2
corresponding to the eigenvalue λ1 and one of length 1 corresponding to λ2 . We
can also conclude that the Jordan normal form is
$$J = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$

and that g_1 = g_2 = 1. This can also be seen from the computations
$$Ax = 0 \iff x = z(1, 0, -1), \qquad Ax = 2x \iff x = z(0, 1, 0),$$

z ∈ C. Thus u_1 = (1, 0, −1) and u_3 = (0, 1, 0) are eigenvectors corresponding to λ_1 and λ_2, respectively. We obtain a basis of generalized eigenvectors by solving the equation Au_2 = u_1. Note that this equation must be solvable, since there has to be a Jordan chain of length 2 corresponding to u_1. We find that u_2 = (1, 0, 0) is a solution. We therefore find that T^{-1}AT = J, where
$$T = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & 0 & 0 \end{pmatrix}.$$

Example 4. Let

$$A = \begin{pmatrix} 3 & 1 & -1 \\ 0 & 2 & 0 \\ 1 & 1 & 1 \end{pmatrix}.$$
The characteristic polynomial of A is p_char(z) = −(z − 2)³. Thus, the only eigenvalue of A is 2, with algebraic multiplicity 3. The generalized eigenspace is the whole of C³. Moreover, the minimal polynomial must be z − 2, (z − 2)² or (z − 2)³. We see that

$$A - 2I = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & -1 \end{pmatrix}, \qquad (A - 2I)^2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$
so that p_min(z) = (z − 2)². This means that a basis of generalized eigenvectors must consist of one Jordan chain of length 2 and one of length 1 (an eigenvector). We can also conclude that the Jordan normal form is

$$J = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$
and that the geometric multiplicity is 2. This can also be seen from the computation

$$Ax = 2x \iff x_1 + x_2 - x_3 = 0.$$

Contrary to the previous example, we cannot find a basis of generalized eigenvectors by starting with an arbitrary basis of ker(A − 2I). Instead, we first proceed as in the proof of the Jordan normal form. Notice that range(A − 2I) is spanned by the vector u_1 = (1, 0, 1). By the form of the minimal polynomial, we conclude that u_1 is an eigenvector. Next, we find a solution of the equation (A − 2I)u_2 = u_1, e.g. u_2 = (1, 0, 0). Finally, we add an eigenvector which is not parallel to u_1, e.g. u_3 = (0, 1, 1). Setting

$$T = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix},$$

we have T^{-1}AT = J.
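Examples of this kind can be checked mechanically; here is a minimal SymPy sketch for Example 4 (SymPy may pick a different T, but J is the same up to the order of the blocks):

```python
import sympy as sp

# Check of Example 4: jordan_form returns T, J with A = T*J*T**-1.
A = sp.Matrix([[3, 1, -1],
               [0, 2, 0],
               [1, 1, 1]])
T, J = A.jordan_form()
print(J)                      # one 2x2 block and one 1x1 block for lambda = 2
assert T.inv() * A * T == J
```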
The Matrix Exponential
Recall that the unique solution of the initial value problem

$$x' = Ax, \qquad x(0) = x_0,$$

is given by

$$x(t) = e^{tA} x_0.$$

If J is the normal form of A and A = T J T^{-1}, we obtain that

$$e^{tA} = T e^{tJ} T^{-1}, \tag{5}$$

where

$$e^{tJ} = \begin{pmatrix} e^{tJ_1} & & \\ & \ddots & \\ & & e^{tJ_k} \end{pmatrix}$$

and

$$e^{tJ_i} = e^{\lambda_i t} \Big( I + tN + \cdots + \frac{t^{m_i-1}}{(m_i-1)!} N^{m_i-1} \Big).$$

Here we don't require that the λ_i are distinct. In general, the solution of the initial value problem will be a sum of terms of the form t^j e^{λ_i t}. If A has a basis of eigenvectors, there will only be terms of the form e^{λ_i t}.
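Since N^{m_i} = 0, the sum above is finite, and e^{tJ_i} is easy to compute directly; a minimal SymPy sketch (the function name is ours):

```python
import sympy as sp

t = sp.symbols('t')

def exp_jordan_block(lam, m):
    """e^{t J_i} for an m x m Jordan block J_i = lam*I + N: the
    exponential series terminates because N**m = 0."""
    N = sp.zeros(m, m)
    for i in range(m - 1):
        N[i, i + 1] = 1           # ones directly above the diagonal
    S = sp.zeros(m, m)
    for j in range(m):
        S += (t**j / sp.factorial(j)) * N**j
    return sp.exp(lam * t) * S

print(exp_jordan_block(2, 3))
# [[exp(2*t), t*exp(2*t), t**2*exp(2*t)/2],
#  [0,        exp(2*t),   t*exp(2*t)],
#  [0,        0,          exp(2*t)]]
```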
While we now have an algorithm for computing the matrix exponential, it involves finding the generalized eigenvectors of A and in the end one also has to invert the matrix T. There are a number of alternative ways of computing the matrix exponential which avoid the Jordan normal form, most of which are based on the Cayley-Hamilton theorem. Note that we should be able to express $e^{tA} = \sum_{j=0}^{\infty} \frac{t^j}{j!} A^j$ for each t as a linear combination of I, . . . , A^{n−1}, since the Cayley-Hamilton theorem allows us to express any higher power of A in terms of these matrices. We leave the proof of the following theorem as an exercise.
Theorem 4 (Putzer's algorithm). Let μ_1, . . . , μ_n be the eigenvalues of A, repeated according to multiplicity. Then

$$e^{tA} = r_1(t) P_1 + r_2(t) P_2 + \cdots + r_n(t) P_n, \tag{6}$$

where

$$P_1 = I, \quad P_2 = A - \mu_1 I, \quad \ldots, \quad P_n = (A - \mu_1 I) \cdots (A - \mu_{n-1} I)$$

and

$$r_1' = \mu_1 r_1, \qquad r_1(0) = 1,$$
$$r_2' = \mu_2 r_2 + r_1, \qquad r_2(0) = 0,$$
$$\vdots$$
$$r_n' = \mu_n r_n + r_{n-1}, \qquad r_n(0) = 0.$$
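By an integrating factor, the recursion gives r_j(t) = ∫₀ᵗ e^{μ_j(t−s)} r_{j−1}(s) ds for j ≥ 2 (exactly as in Example 5 below), which leads to a compact implementation; a minimal SymPy sketch (the function name is ours, and exact eigenvalues are assumed to be computable):

```python
import sympy as sp

t, s = sp.symbols('t s')

def putzer_exp(A):
    """e^{tA} via Putzer's algorithm (Theorem 4). The r_j solve
    r_j' = mu_j r_j + r_{j-1}, r_j(0) = 0, so by an integrating factor
    r_j(t) = integral_0^t exp(mu_j*(t - s)) * r_{j-1}(s) ds."""
    n = A.shape[0]
    # eigenvalues repeated according to multiplicity
    mus = [lam for lam, a in A.eigenvals().items() for _ in range(a)]
    r = [sp.exp(mus[0] * t)]              # r_1' = mu_1 r_1, r_1(0) = 1
    for j in range(1, n):
        r.append(sp.integrate(sp.exp(mus[j] * (t - s)) * r[-1].subs(t, s),
                              (s, 0, t)))
    out, P = sp.zeros(n, n), sp.eye(n)    # P runs through P_1, P_2, ...
    for j in range(n):
        out += r[j] * P
        P = P * (A - mus[j] * sp.eye(n))
    return out.applyfunc(sp.simplify)
```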
Example 5. Let

$$A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ -1 & 0 & -1 \end{pmatrix}$$

be the matrix from Example 3. We have
$$e^{tJ} = \begin{pmatrix} 1 & t & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{2t} \end{pmatrix}$$

and

$$T^{-1} = \begin{pmatrix} 0 & 0 & -1 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.$$

Therefore,

$$e^{tA} = \begin{pmatrix} 1+t & 0 & t \\ 0 & e^{2t} & 0 \\ -t & 0 & 1-t \end{pmatrix}. \tag{7}$$
We can also compute the solution using Putzer's algorithm. We have μ_1 = μ_2 = 0, μ_3 = 2 and

$$P_1 = I, \qquad P_2 = A, \qquad P_3 = A^2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

The functions r_1, r_2 and r_3 are determined by r_1' = 0, r_1(0) = 1, r_2' = r_1, r_2(0) = 0 and r_3' = 2r_3 + r_2, r_3(0) = 0. We find that r_1(t) = 1 and r_2(t) = t. Finally

$$r_3' = 2r_3 + t, \ r_3(0) = 0 \iff (r_3 e^{-2t})' = t e^{-2t}, \ r_3(0) = 0 \iff r_3(t) = \int_0^t s e^{2(t-s)}\, ds = -\frac{t}{2} - \frac{1}{4} + \frac{e^{2t}}{4}.$$

Evaluating r_1(t)P_1 + r_2(t)P_2 + r_3(t)P_3 we recover (7).
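As a check, we can assemble e^{tA} = T e^{tJ} T^{-1} in SymPy from the data above and recover (7); the putzer_exp sketch above gives the same matrix:

```python
import sympy as sp

t = sp.symbols('t')

# Check of Example 5: assemble e^{tA} = T * e^{tJ} * T^{-1}.
T = sp.Matrix([[1, 1, 0],
               [0, 0, 1],
               [-1, 0, 0]])
etJ = sp.Matrix([[1, t, 0],
                 [0, 1, 0],
                 [0, 0, sp.exp(2 * t)]])
print(T * etJ * T.inv())   # [[1 + t, 0, t], [0, exp(2*t), 0], [-t, 0, 1 - t]]
```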
Exercises

Exercise 1. Let

$$A_1 = \begin{pmatrix} 0 & 1 & 1 \\ 0 & -1 & -1 \\ 0 & -1 & -1 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 1 & 1 & 2 \\ 1 & -1 & 0 \\ -1 & -1 & -2 \end{pmatrix}, \qquad A_3 = \begin{pmatrix} 1 & 2 & 0 \\ 3 & 1 & 3 \\ 0 & -2 & 1 \end{pmatrix}.$$

Calculate the Jordan normal form of each A_j and find a matrix T_j such that T_j^{-1} A_j T_j is in Jordan normal form. What is the minimal polynomial of A_j?
Exercise 2. Calculate e^{tA_j}, first by using the Jordan normal form and then by using Putzer's algorithm.
Exercise 3. Consider the initial value problem

$$x_1' = x_1 + 2x_2, \qquad x_2' = 2x_1 + x_2, \qquad x(0) = x_0.$$

For which initial data x_0 does the solution converge to zero as t → ∞?
Exercise 4. Can you find a general condition on the eigenvalues of A which guarantees that all solutions of the IVP

$$x' = Ax, \qquad x(0) = x_0,$$

converge to zero as t → ∞?
Exercise 5. The matrices A_1 and A_2 in Exercise 1 have the same eigenvalues. If you've solved Exercise 2 correctly, you will notice that all solutions of the IVP corresponding to A_1 are bounded for t ≥ 0 while there are unbounded solutions of the IVP corresponding to A_2. Explain the difference and try to formulate a general principle.
Exercise 6. Show that X(t) = e^{tA} is the unique solution of the problem

$$X'(t) = AX(t), \qquad X(0) = I. \tag{8}$$
Exercise 7. Prove Theorem 4 by showing that the right hand side of (6) is a solution of (8).
Hint: AP_n = μ_n P_n, AP_j = μ_j P_j + P_{j+1}, 1 ≤ j ≤ n − 1.