The matrix exponential

Erik Wahlén
[email protected]
October 3, 2014

1 Definition and basic properties
These notes serve as a complement to Chapter 7 in Ahmad and Ambrosetti. In
Chapter 7, the general solution of the equation
    x̄′ = Ax̄    (1)
is computed using eigenvalues and eigenvectors. This method works when A has
n distinct eigenvalues or, more generally, when there is a basis for Cn consisting of
eigenvectors of A. In the latter case, we say that A is diagonalizable. In this case,
the general solution is given by
x̄(t) = c1 etλ1 v̄1 + · · · + cn etλn v̄n ,
where λ1 , . . . , λn are the, not necessarily distinct, eigenvalues of A and v̄1 , . . . , v̄n
the corresponding eigenvectors. Example 7.4.5 in the book illustrates what can
happen if A is not diagonalizable, i.e. if there is no basis of eigenvectors of A. In
these notes we will discuss this question in a more systematic way.
Our starting point is to write the general solution of (1) as
x̄(t) = etA c̄,
where c̄ is an arbitrary vector in Cn , in analogy with the scalar case n = 1. We
then have to begin by defining etA and show that the above formula actually gives
a solution of the equation. Note that it is simpler to work in the complex vector
space Cn than Rn , since even if A is real it might have complex eigenvalues and
eigenvectors. If A should happen to be real, we can restrict to real vectors c̄ ∈ Rn
to obtain real solutions.
Recall from Taylor’s formula that
    e^t = \sum_{k=0}^{n} \frac{t^k}{k!} + r_n(t),
where
    r_n(t) = \frac{e^s t^{n+1}}{(n+1)!},
for some s between 0 and t. Since \lim_{n→∞} r_n(t) = 0, we have
    e^t = \sum_{k=0}^{∞} \frac{t^k}{k!}, \quad t ∈ R.
This is an example of a power series.
More generally, a power series is a series of the form
    \sum_{k=0}^{∞} a_k (x − x_0)^k,
where the coefficients a_k are complex numbers. The number
    R := \sup\{ r ≥ 0 : \{ |a_k| r^k \}_{k=0}^{∞} \text{ is bounded} \}
is called the radius of convergence of the series. The significance of the radius of
convergence is that the power series converges inside of it, and that the series can
be manipulated as though it were a finite sum there (e.g. differentiated termwise).
To prove this we first show a preliminary lemma.
Lemma 1. Suppose that {f_n} is a sequence of continuously differentiable functions on an interval I. Assume that f_n′ → g uniformly on I and that {f_n(a)} converges for some a ∈ I. Then {f_n} converges pointwise on I to some function f. Moreover, f is continuously differentiable with f′(x) = g(x).
Proof. We have
    f_n(x) = f_n(a) + \int_a^x f_n′(s)\,ds \quad \text{for all } n.
Since f_n′(s) → g(s) uniformly on the interval between a and x, we can take the limit under the integral and obtain that f_n(x) converges to f(x) defined by
    f(x) = f(a) + \int_a^x g(s)\,ds, \quad x ∈ I,
where f(a) = \lim_{n→∞} f_n(a). The fundamental theorem of calculus now shows that f is continuously differentiable with f′(x) = g(x).
Theorem 2. Assume that the power series
    \sum_{k=0}^{∞} a_k (x − x_0)^k    (2)
has positive radius of convergence. The series converges uniformly and absolutely
in the interval [x0 − r, x0 + r] whenever 0 < r < R and diverges when |x − x0 | > R.
The limit is infinitely differentiable and the series can be differentiated termwise.
Proof. If |x−x0 | > R the sequence ak (x−x0 )k is unbounded, so the series diverges.
Consider an interval |x − x0 | ≤ r, where r ∈ (0, R). Choose r < r̃ < R. Then
    |a_k (x − x_0)^k| ≤ |a_k| r^k = |a_k| \tilde r^k \left( \frac{r}{\tilde r} \right)^k ≤ C \left( \frac{r}{\tilde r} \right)^k
when |x − x_0| ≤ r, where C is a constant such that |a_k| \tilde r^k ≤ C for all k. Since r/\tilde r < 1 we find that
    \sum_{k=0}^{∞} \left( \frac{r}{\tilde r} \right)^k < ∞.
Weierstrass’ M-test then shows that (2) converges uniformly when |x − x0 | ≤ r
to some function S(x). S is at least continuous since it’s the uniform limit of a
sequence of continuous functions. If S is differentiable and can be differentiated
termwise, then the derivative is
    \sum_{k=0}^{∞} (k + 1) a_{k+1} (x − x_0)^k.
This is again a power series with radius of convergence R (prove this!). Hence
it converges uniformly on [x0 − r, x0 + r], 0 < r < R. It follows from the previous
lemma that S is continuously differentiable on (x0 − R, x0 + R) and that the power
series can be differentiated termwise. An induction argument shows that S is infinitely differentiable.
The interval (x0 − R, x0 + R) is called the interval of convergence. The power
series also converges for complex x with |x − x0 | < R (hence the name radius of
convergence). All of the above properties still hold for complex x, but the derivative
must now be understood as a complex derivative. This concept is studied in courses
in complex analysis; we will not linger on it here.
Let us now return to matrices. The set of complex n × n matrices is denoted C^{n×n}.
Definition 3. If A ∈ C^{n×n}, we define
    e^{tA} = \sum_{k=0}^{∞} \frac{t^k A^k}{k!}, \quad t ∈ R.
In particular,
    e^{A} = \sum_{k=0}^{∞} \frac{A^k}{k!}.
For this to make sense, we have to show that the series converges for all t ∈ R.
A matrix sequence {A_k}_{k=1}^{∞}, A_k ∈ C^{n×n}, is said to converge if it converges elementwise, i.e. [A_k]_{ij} converges as k → ∞ for all i and j, where the notation [A]_{ij} is used
to denote the element on row i and column j of A (this will also be denoted aij
in some places). As usual, a matrix series is said to converge if the corresponding
(matrix) sequence of partial sums converges. Instead of checking if all the elements
converge, it is sometimes useful to work with the matrix norm.
Definition 4. The matrix norm of A is defined by
    ‖A‖ = \max_{x̄ ∈ C^n, |x̄| = 1} |Ax̄|.
Equivalently, ‖A‖ is the smallest K ≥ 0 such that
    |Ax̄| ≤ K|x̄| \quad \text{for all } x̄ ∈ C^n,
since
    |Ax̄| = \left| A \frac{x̄}{|x̄|} \right| |x̄| \quad \text{and} \quad \left| \frac{x̄}{|x̄|} \right| = 1.
The following proposition follows from the definition (prove this!).
Proposition 5. The matrix norm satisfies
(1) ‖A‖ ≥ 0, with equality iff A = 0.
(2) ‖zA‖ = |z|‖A‖, z ∈ C.
(3) ‖A + B‖ ≤ ‖A‖ + ‖B‖.

A more interesting property is the following.

Proposition 6. ‖AB‖ ≤ ‖A‖‖B‖.

Proof. We have
    |ABx̄| ≤ ‖A‖|Bx̄| ≤ ‖A‖‖B‖|x̄| = ‖A‖‖B‖ \quad \text{if } |x̄| = 1.
Proposition 7.
(1) |a_{ij}| ≤ ‖A‖.
(2) ‖A‖ ≤ \left( \sum_{i,j} |a_{ij}|^2 \right)^{1/2}.

Proof. (1) Recall that a_{ij} = [A]_{ij}. Let {ē_1, . . . , ē_n} be the standard basis. Then
    Aē_j = (a_{1j}, . . . , a_{nj})^T \quad ⇒ \quad |a_{ij}| ≤ |Aē_j| ≤ ‖A‖|ē_j| ≤ ‖A‖.
(2) We have that
    Ax̄ = (R_1 x̄, . . . , R_n x̄)^T,
where R_i = (a_{i1} · · · a_{in}) is row i of A. Hence, by the Cauchy–Schwarz inequality,
    |Ax̄|^2 = \sum_{i=1}^{n} |R_i x̄|^2 ≤ \sum_{i=1}^{n} |R_i|^2 |x̄|^2 = \left( \sum_{i,j} |a_{ij}|^2 \right) |x̄|^2.
The following corollary is an immediate consequence of the above proposition.
Corollary 8. Let {A_k}_{k=1}^{∞} ⊂ C^{n×n} and A ∈ C^{n×n}. Then
    \lim_{k→∞} A_k = A \quad ⇔ \quad \lim_{k→∞} ‖A_k − A‖ = 0.
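As a quick numerical illustration of Proposition 7 and Corollary 8, the following sketch (an assumed example with an arbitrary random matrix, not part of the original notes) checks that the matrix norm dominates every entry and is itself bounded by the root of the sum of squared entries; np.linalg.norm(A, 2) computes the norm of Definition 4.

# Assumed numerical illustration of Proposition 7: |a_ij| <= ||A|| <= (sum |a_ij|^2)^(1/2).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

spectral = np.linalg.norm(A, 2)                  # the matrix norm of Definition 4 (largest singular value)
frobenius = np.sqrt((np.abs(A) ** 2).sum())      # (sum over i,j of |a_ij|^2)^(1/2)

print(np.abs(A).max() <= spectral <= frobenius)  # expected: True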
We can now show that our definition of the matrix exponential makes sense.
Proposition 9. The series \sum_{k=0}^{∞} \frac{t^k A^k}{k!} defining e^{tA} converges uniformly on compact intervals. Moreover, the function t ↦ e^{tA} is differentiable with derivative Ae^{tA}.

Proof. Each matrix element of \sum_{k=0}^{∞} \frac{t^k A^k}{k!} is a power series in t with coefficients [A^k]_{ij}/k!. The radius of convergence is infinite, R = ∞, since
    \frac{|[A^k]_{ij}| r^k}{k!} ≤ \frac{‖A^k‖ r^k}{k!} ≤ \frac{‖A‖^k r^k}{k!} → 0
as k → ∞ for all r ≥ 0. The series thus converges uniformly on any compact interval and pointwise on R. Differentiating termwise, we obtain
    \frac{d}{dt} \sum_{k=0}^{∞} \frac{t^k A^k}{k!} = \sum_{k=1}^{∞} \frac{t^{k−1} A^k}{(k − 1)!} = A \sum_{j=0}^{∞} \frac{t^j A^j}{j!} = Ae^{tA},
where we substituted j = k − 1 in the last step.
Since e0·A = I, we obtain the following result.
Theorem 10. The general solution of (1) is given by
    x̄(t) = e^{tA} c̄, \quad c̄ ∈ C^n,
and the unique solution of the IVP
    x̄′ = Ax̄, \quad x̄(0) = x̄_0,
is given by
    x̄(t) = e^{tA} x̄_0.
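Readers who want to experiment can sum a truncated version of the series in Definition 3 and compare with a library routine. The sketch below is an assumed illustration (the matrix, the value of t and the truncation order are arbitrary choices); scipy.linalg.expm is SciPy's built-in matrix exponential.

# Assumed sketch: truncate the series e^{tA} = sum_k (tA)^k / k! and compare with scipy.linalg.expm.
import numpy as np
from scipy.linalg import expm

def expm_series(A, t, terms=30):
    """Partial sum of the power series defining e^{tA}."""
    n = A.shape[0]
    result = np.eye(n)
    term = np.eye(n)
    for k in range(1, terms):
        term = term @ (t * A) / k          # term now equals (tA)^k / k!
        result = result + term
    return result

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
t = 1.3
print(np.allclose(expm_series(A, t), expm(t * A)))   # expected: True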
In certain cases it is possible to compute the matrix exponential directly from
the definition.
Example 11. Suppose that
    A = \begin{pmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λ_n \end{pmatrix}
is a diagonal matrix. Then
    A^k = \begin{pmatrix} λ_1^k & 0 & \cdots & 0 \\ 0 & λ_2^k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λ_n^k \end{pmatrix},
so
    e^{tA} = \begin{pmatrix} \sum_{k=0}^{∞} \frac{t^k λ_1^k}{k!} & 0 & \cdots & 0 \\ 0 & \sum_{k=0}^{∞} \frac{t^k λ_2^k}{k!} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sum_{k=0}^{∞} \frac{t^k λ_n^k}{k!} \end{pmatrix} = \begin{pmatrix} e^{tλ_1} & 0 & \cdots & 0 \\ 0 & e^{tλ_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & e^{tλ_n} \end{pmatrix}.
Example 12. Let’s compute e^{tA} where
    A = \begin{pmatrix} 0 & 1 \\ −1 & 0 \end{pmatrix}.
We find that
    A^2 = \begin{pmatrix} −1 & 0 \\ 0 & −1 \end{pmatrix}, \quad A^3 = \begin{pmatrix} 0 & −1 \\ 1 & 0 \end{pmatrix}, \quad A^4 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad A^{4j+r} = A^r.
Hence,
    e^{tA} = \begin{pmatrix} \sum_{j=0}^{∞} \frac{(−1)^j t^{2j}}{(2j)!} & \sum_{j=0}^{∞} \frac{(−1)^j t^{2j+1}}{(2j+1)!} \\ −\sum_{j=0}^{∞} \frac{(−1)^j t^{2j+1}}{(2j+1)!} & \sum_{j=0}^{∞} \frac{(−1)^j t^{2j}}{(2j)!} \end{pmatrix} = \begin{pmatrix} \cos t & \sin t \\ −\sin t & \cos t \end{pmatrix}.
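A short numerical cross-check of Example 12 (an assumed illustration using SciPy; the value of t is arbitrary):

# Assumed check: for A = [[0, 1], [-1, 0]], e^{tA} should equal [[cos t, sin t], [-sin t, cos t]].
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
t = 0.7
expected = np.array([[np.cos(t),  np.sin(t)],
                     [-np.sin(t), np.cos(t)]])
print(np.allclose(expm(t * A), expected))   # expected: True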
Let’s now look at the case of a general diagonalizable matrix A. As already
mentioned, A is diagonalizable if there is a basis for Cn consisting of eigenvectors
v̄1 , . . . , v̄n of A. The corresponding eigenvalues λ1 , . . . , λn don’t have to be distinct.
In this case we know from Chapter 7 in Ahmad and Ambrosetti that the general
solution of (1) is given by
x̄(t) = c1 eλ1 t v̄1 + · · · + cn eλn t v̄n .
In matrix notation, we can write this as
    x̄(t) = \begin{pmatrix} e^{λ_1 t} v̄_1 & \cdots & e^{λ_n t} v̄_n \end{pmatrix} \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix},
or
    x̄(t) = \begin{pmatrix} v̄_1 & \cdots & v̄_n \end{pmatrix} \begin{pmatrix} e^{tλ_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & e^{tλ_n} \end{pmatrix} \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}.
Denote the matrix to the left by T . If we want to find the solution with x̄(0) = x̄0 ,
we have to solve the system of equations
    T c̄ = x̄_0 \quad ⇔ \quad c̄ = T^{−1} x̄_0.
Thus the solution of the IVP is
    x̄(t) = T e^{tD} T^{−1} x̄_0,
where
    D = \begin{pmatrix} λ_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & λ_n \end{pmatrix}.
This can also be seen by computing the matrix exponential e^{tA} using the following proposition.
Proposition 13. e^{TBT^{−1}} = T e^B T^{−1}.

Proof. This follows by noting that
    (TBT^{−1})^k = (TBT^{−1})(TBT^{−1}) \cdots (TBT^{−1}) = TB(T^{−1}T)B(T^{−1}T) \cdots (T^{−1}T)BT^{−1} = TB^kT^{−1},
whence
    \sum_{k=0}^{∞} \frac{(TBT^{−1})^k}{k!} = \sum_{k=0}^{∞} \frac{TB^kT^{−1}}{k!} = Te^BT^{−1}.
Returning to the discussion above, if A is diagonalizable, then it can be written
    A = TDT^{−1}    (3)
with T and D as above. D is the matrix for the linear operator x̄ 7→ Ax̄ in the
basis v̄1 , . . . , v̄n . The matrix for this linear operator in the standard basis is simply
A. Equation (3) is the change of basis formula for matrices. For convenience we
will also refer to D as the matrix for A in the basis v̄1 , . . . , v̄n , although this is a
bit sloppy. From (3) and Proposition 13 it follows that
etA = T etD T −1 .
The matrix for etA in the basis v̄1 , . . . , v̄n is thus given by etD . The solution of the
IVP is given by
etA x̄0 = T etD T −1 x̄0 ,
confirming our previous result.
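The recipe e^{tA} = T e^{tD} T^{−1} is easy to try out numerically. The sketch below is an assumed example (using the diagonalizable matrix that also appears in Example 28 below); np.linalg.eig returns the eigenvalues together with an eigenvector matrix that plays the role of T.

# Assumed sketch of e^{tA} = T e^{tD} T^{-1} for a diagonalizable matrix.
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 2.0], [2.0, 1.0]])      # diagonalizable; cf. Example 28 below
t = 0.5

eigvals, T = np.linalg.eig(A)               # columns of T are eigenvectors of A
etD = np.diag(np.exp(t * eigvals))          # e^{tD}: diagonal matrix with entries e^{t*lambda_i}
print(np.allclose(T @ etD @ np.linalg.inv(T), expm(t * A)))   # expected: True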
We finally record some properties of the matrix exponential which will be useful
for matrices which can’t be diagonalized.
Lemma 14. AB = BA ⇒ eA B = BeA .
Proof. A^k B = A^{k−1}BA = \cdots = BA^k. Thus
    \lim_{N→∞} \left( \sum_{k=0}^{N} \frac{A^k}{k!} \right) B = \lim_{N→∞} B \left( \sum_{k=0}^{N} \frac{A^k}{k!} \right).
Proposition 15.
(1) (e^A)^{−1} = e^{−A}.
(2) e^A e^B = e^{A+B} = e^B e^A if AB = BA.
(3) e^{tA} e^{sA} = e^{(t+s)A}.
Proof. (1) We have that
    \frac{d}{dt}\left( e^{tA} e^{−tA} \right) = Ae^{tA}e^{−tA} + e^{tA}(−Ae^{−tA}) = (A − A)e^{tA}e^{−tA} = 0,
where we have used the previous lemma to interchange the order of etA and A.
Hence
etA e−tA ≡ C (constant matrix).
Setting t = 0, we find that C = I (identity matrix). Setting t = 1, we find that
eA e−A = I.
(2) We have that
    \frac{d}{dt}\left( e^{t(A+B)} e^{−tA} e^{−tB} \right) = (A + B)e^{t(A+B)}e^{−tA}e^{−tB} − e^{t(A+B)}Ae^{−tA}e^{−tB} − e^{t(A+B)}e^{−tA}Be^{−tB}
    = (A + B − (A + B))e^{t(A+B)}e^{−tA}e^{−tB}
    = 0,
where we have used the previous lemma in the second line. As in (1) we obtain
eA eB e−(A+B) = I ⇒ eA eB = eA+B , where we have used (1).
(3) follows from (2) since tA and sA commute.
Let’s use this to compute the matrix exponential of a matrix which can’t be
diagonalized.
Example 16. Let
    D = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, \quad N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad A = D + N = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}.
The matrix A is not diagonalizable, since the only eigenvalue is 2 and Ax̄ = 2x̄ has the solution
    x̄ = z \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad z ∈ C.
Since D is diagonal, we have that
    e^{tD} = \begin{pmatrix} e^{2t} & 0 \\ 0 & e^{2t} \end{pmatrix}.
Moreover, N^2 = 0 (confirm this!), so
    e^{tN} = I + tN = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}.
Since D and N commute, we find that
    e^{tA} = e^{tD+tN} = e^{tD} e^{tN} = \begin{pmatrix} e^{2t} & 0 \\ 0 & e^{2t} \end{pmatrix} \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} e^{2t} & te^{2t} \\ 0 & e^{2t} \end{pmatrix}.
Exercise 5 below shows that the hypothesis that A and B commute in Proposition 15(2) is essential.
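A small numerical sketch of this point (assumed illustration, not part of the original notes): the identity e^{D+N} = e^D e^N holds for the commuting matrices D and N of Example 16, but fails for a typical non-commuting pair.

# Assumed illustration of Proposition 15(2): commutativity matters.
import numpy as np
from scipy.linalg import expm

D = np.array([[2.0, 0.0], [0.0, 2.0]])
N = np.array([[0.0, 1.0], [0.0, 0.0]])
print(np.allclose(expm(D + N), expm(D) @ expm(N)))   # True: D and N commute

B = np.array([[0.0, 1.0], [0.0, 0.0]])
C = np.array([[0.0, 0.0], [1.0, 0.0]])
print(np.allclose(expm(B + C), expm(B) @ expm(C)))   # False: B and C do not commute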
2 Generalized eigenvectors
We know now how to compute the matrix exponential when A is diagonalizable. In
the next section we will discuss how this can be done when A is not diagonalizable.
In order to do that, we need to introduce some more advanced concepts from linear
algebra. When A is diagonalizable there is a basis consisting of eigenvectors. The
main idea when A is not diagonalizable is to replace this basis by a basis consisting
of ‘generalized eigenvectors’.
Definition 17. Let A ∈ C^{n×n}. A vector v̄ ∈ C^n, v̄ ≠ 0, is called a generalized eigenvector corresponding to the eigenvalue λ if
    (A − λI)^m v̄ = 0
for some integer m ≥ 1.

Note that according to this definition, an eigenvector also qualifies as a generalized eigenvector. We also remark that λ in the above definition has to be an eigenvalue, since if m ≥ 1 is the smallest positive integer such that (A − λI)^m v̄ = 0, then w̄ = (A − λI)^{m−1} v̄ ≠ 0 and
    (A − λI)w̄ = (A − λI)(A − λI)^{m−1} v̄ = (A − λI)^m v̄ = 0,
so w̄ is an eigenvector and λ is an eigenvalue.
The goal of this section is to prove the following theorem.
Theorem 18. Let A ∈ Cn×n . Then there is a basis for Cn consisting of generalized
eigenvectors of A.
We will also discuss methods for constructing such a basis.
Before proving the main theorem, we discuss a number of preliminary results.
Let V be a vector space and let V1 , . . . , Vk be subspaces. We say that V is the
direct sum of V1 , . . . , Vk if each vector x̄ ∈ V can be written in a unique way as
x̄ = x̄1 + x̄2 + · · · + x̄k ,
where x̄_i ∈ V_i, i = 1, . . . , k. If this is the case we use the notation
    V = V_1 ⊕ V_2 ⊕ · · · ⊕ V_k.
We say that a subspace W of V is invariant under A if
    x̄ ∈ W ⇒ Ax̄ ∈ W.
Example 19. Suppose that A has n distinct eigenvalues λ1 , . . . , λn with corresponding eigenvectors v̄1 , . . . , v̄n . It then follows that the vectors v̄1 , . . . , v̄n are
linearly independent (see Theorem 7.1.2 in Ahmad and Ambrosetti) and thus form
a basis for Cn . Let
ker(A − λi I) = span{v̄i },
i = 1, . . . , n,
be the corresponding eigenspaces. Here
ker B = {x̄ ∈ Cn : B x̄ = 0}
is the kernel (or null space) of an n × n matrix B. Then
Cn = ker(A − λ1 I) ⊕ ker(A − λ2 I) ⊕ · · · ⊕ ker(A − λn I)
by the definition of a basis. It is also clear that each eigenspace is invariant under
A.
More generally, suppose that A is diagonalizable, i.e. that it has k distinct
eigenvalues λ1 , . . . , λk and that the geometric multiplicity of each eigenvalue λi
equals the algebraic multiplicity ai . Let ker(A − λi I), i = 1, . . . , k, be the corresponding eigenspaces. We can then find a basis for each eigenspace consisting of
ai eigenvectors. The union of these bases consists of a1 + · · · + ak = n elements
and is linearly independent, since eigenvectors belonging to different eigenvalues
are linearly independent (this follows from an argument similar to Theorem 7.1.2
in Ahmad and Ambrosetti). We thus obtain a basis for Cn and it follows that
Cn = ker(A − λ1 I) ⊕ ker(A − λ2 I) ⊕ · · · ⊕ ker(A − λk I).
In this basis, A has the matrix
    D = \begin{pmatrix} λ_1 I_1 & & \\ & \ddots & \\ & & λ_k I_k \end{pmatrix},
where each Ii is an ai × ai unit matrix. In other words, D is a diagonal matrix
with the eigenvalues on the diagonal, each repeated ai times. This explains why
A is called diagonalizable.
Let A ∈ C^{n×n} be a square matrix and p(λ) = a_0 λ^m + a_1 λ^{m−1} + · · · + a_{m−1} λ + a_m a polynomial. We define
    p(A) = a_0 A^m + a_1 A^{m−1} + · · · + a_{m−1} A + a_m I.
Theorem 20 (Cayley-Hamilton). Let pA (λ) = det(A − λI) be the characteristic
polynomial of A. Then pA (A) = 0.
Proof. Cramer’s rule from linear algebra says that
    (A − λI)^{−1} = \frac{1}{p_A(λ)} \operatorname{adj}(A − λI),
where the adjugate matrix adj B is a matrix whose elements are the cofactors of
B. More specifically, the element on row i and column j of the adjugate matrix for
B is Cji , where Cji = (−1)j+i det Bji , in which Bji is the (n − 1) × (n − 1) matrix
obtained by eliminating row j and column i from B (see sections 7.1.2–7.1.3 of
Ahmad and Ambrosetti). Note that each element of adj(A−λI) is a polynomial in
λ of degree at most n − 1 (since at least one element of the diagonal is eliminated).
Thus,
    p_A(λ)(A − λI)^{−1} = λ^{n−1} B_{n−1} + · · · + λ B_1 + B_0,
for some constant n × n matrices B_0, . . . , B_{n−1}. Multiplying with A − λI gives
    p_A(λ)I = p_A(λ)(A − λI)(A − λI)^{−1} = −λ^n B_{n−1} + λ^{n−1}(AB_{n−1} − B_{n−2}) + · · · + λ(AB_1 − B_0) + AB_0.
Thus,
    −B_{n−1} = a_0 I,
    AB_{n−1} − B_{n−2} = a_1 I,
    ⋮
    AB_1 − B_0 = a_{n−1} I,
    AB_0 = a_n I,
where
    p_A(λ) = a_0 λ^n + a_1 λ^{n−1} + · · · + a_{n−1} λ + a_n.
Multiplying the rows by A^n, A^{n−1}, . . ., A, I and adding them, we get
    p_A(A) = a_0 A^n + a_1 A^{n−1} + · · · + a_{n−1} A + a_n I
           = −A^n B_{n−1} + A^{n−1}(AB_{n−1} − B_{n−2}) + · · · + A(AB_1 − B_0) + AB_0
           = −A^n B_{n−1} + A^n B_{n−1} − A^{n−1} B_{n−2} + · · · + A^2 B_1 − AB_0 + AB_0
           = 0.
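A quick numerical check of the Cayley–Hamilton theorem (assumed illustration; np.poly returns the coefficients of det(λI − A), which differs from p_A(λ) = det(A − λI) only by the factor (−1)^n, so it also vanishes at A):

# Assumed check: evaluate the characteristic polynomial of A at A itself (Horner's scheme).
import numpy as np

A = np.array([[3.0, 1.0, -1.0],
              [0.0, 2.0,  0.0],
              [1.0, 1.0,  1.0]])           # the matrix of Example 32 below

coeffs = np.poly(A)                        # monic coefficients of det(lambda*I - A), highest power first
pA = np.zeros_like(A)
for c in coeffs:
    pA = pA @ A + c * np.eye(A.shape[0])   # builds c0*A^n + c1*A^{n-1} + ... + cn*I

print(np.allclose(pA, 0))                  # expected: True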
Lemma 21. Suppose that p(λ) = p1 (λ)p2 (λ) where p1 and p2 are relatively prime.
If p(A) = 0 we have that
Cn = ker p1 (A) ⊕ ker p2 (A)
and each subspace ker pi (A) is invariant under A.
Proof. The invariance follows from pi (A)Ax = Api (A)x = 0, x ∈ ker pi (A). Since
p1 and p2 are relatively prime, it follows by Euclid’s algorithm that there exist
polynomials q1 , q2 such that
p1 (λ)q1 (λ) + p2 (λ)q2 (λ) = 1.
Thus
p1 (A)q1 (A) + p2 (A)q2 (A) = I.
Applying this identity to the vector x̄ ∈ C^n, we obtain
    x̄ = \underbrace{p_1(A)q_1(A)x̄}_{x̄_2} + \underbrace{p_2(A)q_2(A)x̄}_{x̄_1},
where
    p_2(A)x̄_2 = p_2(A)p_1(A)q_1(A)x̄ = p(A)q_1(A)x̄ = 0,
so that x̄2 ∈ ker p2 (A). Similarly x̄1 ∈ ker p1 (A). Thus V = ker p1 (A) + ker p2 (A).
On the other hand, if
    x̄_1 + x̄_2 = x̄_1′ + x̄_2′, \quad x̄_i, x̄_i′ ∈ ker p_i(A), \ i = 1, 2,
we obtain that
    ȳ = x̄_1 − x̄_1′ = x̄_2′ − x̄_2 ∈ ker p_1(A) ∩ ker p_2(A),
so that
ȳ = p1 (A)q1 (A)ȳ + p2 (A)q2 (A)ȳ = q1 (A)p1 (A)ȳ + q2 (A)p2 (A)ȳ = 0.
It follows that the representation x̄ = x̄1 + x̄2 is unique and therefore
Cn = ker p1 (A) ⊕ ker p2 (A).
Recall that the characteristic polynomial pA (λ) = det(A−λI) can be factorized
as
    p_A(λ) = (−1)^n (λ − λ_1)^{a_1} · · · (λ − λ_k)^{a_k},
where λ1 , . . . , λk are the distinct eigenvalues of A and a1 , . . . , ak the corresponding
algebraic multiplicities. Applying Theorem 20 and Lemma 21 to pA (λ) we obtain
the following important result.
Theorem 22. We have that
    C^n = ker(A − λ_1 I)^{a_1} ⊕ · · · ⊕ ker(A − λ_k I)^{a_k},
where each ker(A − λ_i I)^{a_i} is invariant under A.

Proof. We begin by noting that the polynomials (λ − λ_i)^{a_i}, i = 1, . . . , k, are relatively prime. Repeated application of Lemma 21 therefore shows that
    C^n = ker(A − λ_1 I)^{a_1} ⊕ · · · ⊕ ker(A − λ_k I)^{a_k},
with each ker(A − λ_i I)^{a_i} invariant.
The space ker(A − λi I)ai is called the generalized eigenspace corresponding
to λi . We leave it as an exercise to show that this is the space spanned by all
generalized eigenvectors of A corresponding to λi .
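Numerically, a generalized eigenspace can be obtained as the null space of a matrix power. The following assumed sketch uses the 3 × 3 matrix that reappears in Example 30 below, for which 0 is an eigenvalue of algebraic multiplicity 2; scipy.linalg.null_space returns an orthonormal basis of the kernel.

# Assumed sketch: generalized eigenspaces as null spaces of powers of (A - lambda*I).
import numpy as np
from scipy.linalg import null_space

A = np.array([[ 1.0, 0.0,  1.0],
              [ 0.0, 2.0,  0.0],
              [-1.0, 0.0, -1.0]])

print(null_space(A).shape[1])                             # dim ker(A): 1 (ordinary eigenspace for 0)
print(null_space(np.linalg.matrix_power(A, 2)).shape[1])  # dim ker(A^2): 2 (generalized eigenspace for 0)
print(null_space(A - 2 * np.eye(3)).shape[1])             # dim ker(A - 2I): 1 (eigenspace for 2)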
We can now prove Theorem 18.
Proof of Theorem 18. If we select a basis {v̄i,1 , . . . , v̄i,ni } for each subspace ker(A−
λi I)ai , then the union {v̄1,1 , . . . , v̄1,n1 , v̄2,1 , . . . , v̄2,n2 , . . . , v̄k,1 , . . . , v̄k,nk } will be a
basis for Cn consisting of generalized eigenvectors. The fact these vectors are
linearly independent follows from Theorem 22. Indeed, suppose that
    \sum_{i=1}^{k} \left( \sum_{j=1}^{n_i} α_{i,j} v̄_{i,j} \right) = 0.
Then by the definition of a direct sum, we have \sum_{j=1}^{n_i} α_{i,j} v̄_{i,j} = 0 for each i. But then, for each i, α_{i,j} = 0 for j = 1, . . . , n_i, since {v̄_{i,1}, . . . , v̄_{i,n_i}} is linearly independent.
When A is diagonalizable it takes the form of a diagonal matrix in the eigenvector basis. We might therefore ask what a general matrix looks like in a basis
of generalized eigenvectors. Since each generalized eigenspace is invariant under A,
the matrix in the new basis will be block diagonal:
    B = \begin{pmatrix} B_1 & & \\ & \ddots & \\ & & B_k \end{pmatrix},
where each Bi is an ni × ni matrix, ni = dim ker(A − λi I)ai . Moreover, Bi only has
one eigenvalue λi . Indeed, Bi is the matrix for the restriction of A to ker(A−λi I)ai
in the basis v̄i,1 , . . . , v̄i,ni and if Av̄ = λv̄ for some non-zero v̄ ∈ ker(A − λi I)ai ,
then
0 = (A − λi I)ai v̄ = (λ − λi )ai v̄ ⇒ λ = λi .
It follows that the dimension of ker(A − λi I)ai equals the algebraic multiplicity of
the eigenvalue λi , that is, ni = ai . This follows since
    (−1)^n (λ − λ_1)^{a_1} · · · (λ − λ_k)^{a_k} = det(A − λI) = det(B − λI) = det(B_1 − λI_1) · · · det(B_k − λI_k) = (−1)^n (λ − λ_1)^{n_1} · · · (λ − λ_k)^{n_k},
where we have used the facts that the determinant of a matrix is independent of
basis, and that the determinant of a block diagonal matrix is the product of the
determinants of the blocks.
Set N_i = B_i − λ_i I_i, where I_i is the n_i × n_i unit matrix. Then N_i^{a_i} = 0 by the definition of the generalised eigenspaces. A linear operator N with the property that N^m = 0 for some m is called nilpotent.
We can summarize our findings as follows.
Theorem 23. Let A ∈ Cn×n . There exists a basis for Cn in which A has the block
diagonal form
    B = \begin{pmatrix} B_1 & & \\ & \ddots & \\ & & B_k \end{pmatrix},
and Bi = λi Ii + Ni , where λ1 , . . . , λk are the distinct eigenvalues of A, Ii is the
ai × ai unit matrix and Ni is nilpotent.
We remark that the matrix B in the above theorem is not unique. Apart from the
order of the blocks Bi , the blocks themselves depend on the particular bases chosen
for the generalized eigenspaces. There is a particularly useful way of choosing these
bases which gives rise to the Jordan normal form. Although the Jordan normal
form will not be required to compute the matrix exponential we present it here for
completeness and since it is mentioned in Ahmad and Ambrosetti.
Theorem 24. Let A ∈ Cn×n . There exists an invertible n × n matrix T such that
T −1 AT = J,
where J is a block diagonal matrix,
    J = \begin{pmatrix} J_1 & & \\ & \ddots & \\ & & J_m \end{pmatrix},
and each block J_i is a square matrix of the form
    J_i = λI + N = \begin{pmatrix} λ & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & λ \end{pmatrix},
where λ is an eigenvalue of A, I is a unit matrix and N has ones on the line directly
above the diagonal and zeros everywhere else. In particular, N is nilpotent.
See http://en.wikipedia.org/wiki/Jordan_normal_form or any advanced textbook in linear algebra for more information. Note in particular that there is an
alternative version of the Jordan normal form for real matrices with complex eigenvalues, which is briefly mentioned in Ahmad and Ambrosetti and discussed in more
detail on the Wikipedia page.
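For those who want to experiment, the Jordan normal form can be inspected in Python with SymPy (an assumed illustration, not part of the original notes; Matrix.jordan_form returns T and J with A = T J T^{-1}):

# Assumed illustration: computing a Jordan normal form symbolically with SymPy.
import sympy as sp

A = sp.Matrix([[1, 0, 1],
               [0, 2, 0],
               [-1, 0, -1]])    # the matrix that reappears in Example 30 below

T, J = A.jordan_form()          # A = T * J * T**-1
print(J)                        # one 2x2 Jordan block for eigenvalue 0 and one 1x1 block for 2
sp.pprint(sp.simplify(T * J * T.inv() - A))   # zero matrix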
3 Computing the matrix exponential

We will now use the results of the previous section in order to find an algorithm for computing e^{tA} when A is not diagonalizable. Note that our main interest is actually in solving the equation x̄′ = Ax̄, possibly with the initial condition x̄(0) = x̄_0. As we will see, this doesn’t actually require computing e^{tA} explicitly.
As previously mentioned, when A is diagonalizable, the general solution of x̄′ = Ax̄ is given by
    x̄(t) = c_1 e^{λ_1 t} v̄_1 + · · · + c_n e^{λ_n t} v̄_n.    (4)
If we want to solve the IVP
    x̄′ = Ax̄, \quad x̄(0) = x̄_0,    (5)
we simply choose c1 , . . . , cn so that
x̄(0) = c1 v̄1 + · · · + cn v̄n = x̄0 .
In other words, the numbers ci are the coordinates for the vector x̄0 in the basis
v̄1 , . . . , v̄n . Note that each term eλi t v̄i in the solution is actually etA v̄i . Since the
jth column of the matrix etA is given by etA ēj , where {ē1 , . . . , ēn } is the standard
basis, we can compute the matrix exponential by repeating the above steps with
initial data x̄0 = ēj for j = 1, . . . , n.
The same approach works when A is not diagonalizable, with the difference
that the basis vectors v̄i are now generalized eigenvectors instead of eigenvectors.
Denote the basis vectors v̄i,j , i = 1, . . . , k, j = 1, . . . , ai , as in the proof of Theorem 18 (recall that the dimension of each generalized eigenspace is the algebraic
multiplicity of the corresponding eigenvalue). Then the general solution is
    x̄(t) = \sum_{i=1}^{k} \sum_{j=1}^{a_i} c_{i,j} e^{tA} v̄_{i,j}.
We thus need to compute e^{tA} v̄_{i,j}, where v̄_{i,j} is a generalized eigenvector corresponding to the eigenvalue λ_i. Since
    (A − λ_i I)^{a_i} v̄_{i,j} = 0,
we find that
    e^{tA} v̄_{i,j} = e^{tλ_i I + t(A − λ_i I)} v̄_{i,j}
                   = e^{tλ_i I} e^{t(A − λ_i I)} v̄_{i,j}
                   = e^{λ_i t} e^{t(A − λ_i I)} v̄_{i,j}
                   = e^{λ_i t} \left( I + t(A − λ_i I) + \frac{t^2}{2}(A − λ_i I)^2 + · · · + \frac{t^{a_i − 1}}{(a_i − 1)!}(A − λ_i I)^{a_i − 1} \right) v̄_{i,j},
where we have also used the fact that I and (A − λI) commute and the definition
of the matrix exponential. The general solution can therefore be written
    x̄(t) = \sum_{i=1}^{k} \sum_{j=1}^{a_i} c_{i,j} e^{λ_i t} \left( \sum_{ℓ=0}^{a_i − 1} \frac{t^ℓ}{ℓ!} (A − λ_i I)^ℓ v̄_{i,j} \right).    (6)
In order to solve the IVP (5), we simply have to express x̄0 in the basis v̄i,j to find
the coefficients ci,j . Finally, to compute etA we repeat the above steps for each
standard basis vector ēj .
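The finite sum above translates directly into a small routine. The sketch below is an assumed illustration (the helper name exp_action and the test data are invented; the matrix is the one from Example 29 below, with single eigenvalue −1 of multiplicity 2):

# Assumed sketch of formula (6) applied to one generalized eigenvector:
#   e^{tA} v = e^{lam*t} * sum_{l=0}^{a-1} t^l/l! * (A - lam*I)^l v.
import numpy as np
from math import factorial
from scipy.linalg import expm

def exp_action(A, lam, a, v, t):
    """Apply e^{tA} to a generalized eigenvector v with (A - lam*I)^a v = 0."""
    N = A - lam * np.eye(A.shape[0])
    out = np.zeros_like(v, dtype=complex)
    w = v.astype(complex)
    for l in range(a):
        out += (t ** l / factorial(l)) * w
        w = N @ w                              # w becomes (A - lam*I)^{l+1} v
    return np.exp(lam * t) * out

A = np.array([[-3.0, 4.0], [-1.0, 1.0]])       # Example 29 below: eigenvalue -1, a = 2
v = np.array([1.0, 0.0])                       # every vector is a generalized eigenvector here
t = 0.4
print(np.allclose(exp_action(A, -1.0, 2, v, t), expm(t * A) @ v))   # expected: True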
Remark 25. Formula (6) shows that the general solution is a linear combination
of exponential functions multiplied with polynomials. The polynomial factors can
only appear for eigenvalues with (algebraic) multiplicity two or higher. This is
precisely the same structure which we encountered for homogeneous higher order
scalar linear differential equations with constant coefficients.
Remark 26. When A is diagonalizable we see from (4) that the general solution
is just a linear combination of exponential functions, without polynomial factors.
This seems to contradict the previous remark. A closer look reveals that in this
case
    (A − λ_i I)^2 v̄_{i,j} = 0    (7)
for each j. Indeed, we know that ker(A − λ_i I) ⊆ ker(A − λ_i I)^{a_i}. If A is diagonalizable, then the geometric multiplicity dim ker(A − λ_i I) equals the algebraic
multiplicity ai = dim ker(A − λi I)ai , so the two subspaces coincide. This means
that every generalized eigenvector is in fact an eigenvector, so that (7) holds. This
implies that
    (A − λ_i I)^ℓ v̄_{i,j} = 0, \quad ℓ ≥ 2,
in (6) even if a_i ≥ 2.

Remark 27. More generally, it might happen that (A − λ_i I)^ℓ vanishes on the generalized eigenspace ker(A − λ_i I)^{a_i} for some ℓ < a_i − 1. One can show that there
is a unique monic polynomial pmin (λ) such that pmin (A) = 0. pmin (λ) is called
the minimal polynomial. It divides the characteristic polynomial pA (λ) and can
therefore be factorized as
    p_{min}(λ) = (λ − λ_1)^{m_1} · · · (λ − λ_k)^{m_k},
with m_i ≤ a_i for each i. Repeating the proof of Theorem 22, we obtain that
    C^n = ker(A − λ_1 I)^{m_1} ⊕ · · · ⊕ ker(A − λ_k I)^{m_k},
so that ker(A − λ_i I)^{a_i} = ker(A − λ_i I)^{m_i}, i.e. (A − λ_i I)^{m_i} vanishes on ker(A − λ_i I)^{a_i}. In fact, one can show that (A − λ_i I)^m vanishes on ker(A − λ_i I)^{a_i} if and only if m ≥ m_i. In the diagonalizable case we have
    p_{min}(λ) = (λ − λ_1) · · · (λ − λ_k),
i.e. mi = 1 for all i.
We now consider some examples.
Example 28. Let
    A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}.
The characteristic polynomial is (λ − 1)^2 − 4 = (λ + 1)(λ − 3), so the distinct eigenvalues are λ_1 = −1 and λ_2 = 3. Since the eigenvalues are distinct, there is a basis consisting of eigenvectors. Solving the equations Ax̄ = −x̄ and Ax̄ = 3x̄, we find the eigenvectors v̄_1 = (1, −1) and v̄_2 = (1, 1). The general solution of the system x̄′ = Ax̄ is thus given by
    x̄(t) = c_1 e^{−t} v̄_1 + c_2 e^{3t} v̄_2 = c_1 e^{−t} \begin{pmatrix} 1 \\ −1 \end{pmatrix} + c_2 e^{3t} \begin{pmatrix} 1 \\ 1 \end{pmatrix}.
In order to compute the matrix exponential, we find the solutions with x̄(0) = ē_1 and x̄(0) = ē_2, respectively. In the first case, we obtain the equations
    c_1 \begin{pmatrix} 1 \\ −1 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}
    ⇔ \begin{cases} c_1 + c_2 = 1 \\ −c_1 + c_2 = 0 \end{cases}
    ⇔ \begin{cases} c_1 = 1/2 \\ c_2 = 1/2. \end{cases}
In the second case, we find that
    c_1 \begin{pmatrix} 1 \\ −1 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}
    ⇔ \begin{cases} c_1 = −1/2 \\ c_2 = 1/2. \end{cases}
Hence,
    e^{At} ē_1 = \frac{1}{2} e^{−t} \begin{pmatrix} 1 \\ −1 \end{pmatrix} + \frac{1}{2} e^{3t} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} e^{−t} + e^{3t} \\ −e^{−t} + e^{3t} \end{pmatrix}
and
    e^{At} ē_2 = −\frac{1}{2} e^{−t} \begin{pmatrix} 1 \\ −1 \end{pmatrix} + \frac{1}{2} e^{3t} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} −e^{−t} + e^{3t} \\ e^{−t} + e^{3t} \end{pmatrix}.
Finally,
    e^{At} = \frac{1}{2} \begin{pmatrix} e^{−t} + e^{3t} & −e^{−t} + e^{3t} \\ −e^{−t} + e^{3t} & e^{−t} + e^{3t} \end{pmatrix}.
Example 29. Let
    A = \begin{pmatrix} −3 & 4 \\ −1 & 1 \end{pmatrix}.
The characteristic polynomial is (λ + 1)^2, so λ_1 = −1 is the only eigenvalue. This means that any vector belongs to the generalized eigenspace ker(A + I)^2, so that
    e^{At} v̄ = e^{−t} e^{(A+I)t} v̄ = e^{−t} (I + t(A + I)) v̄ = e^{−t} \left( \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + t \begin{pmatrix} −2 & 4 \\ −1 & 2 \end{pmatrix} \right) v̄ = e^{−t} \begin{pmatrix} 1 − 2t & 4t \\ −t & 1 + 2t \end{pmatrix} v̄.
In particular,
    e^{At} = e^{−t} \begin{pmatrix} 1 − 2t & 4t \\ −t & 1 + 2t \end{pmatrix}.
We could come to the same conclusion by using the standard basis vectors ē1 and
ē2 as our basis v̄1,1 , v̄1,2 in the solution formula (6). This would give the general
solution
    x̄(t) = c_1 e^{tA} ē_1 + c_2 e^{tA} ē_2 = c_1 e^{−t} \begin{pmatrix} 1 − 2t \\ −t \end{pmatrix} + c_2 e^{−t} \begin{pmatrix} 4t \\ 1 + 2t \end{pmatrix}.
There is one more possibility for 2 × 2 matrices. The matrix could be diagonalizable and still have a double eigenvalue. We leave it as an exercise to show
that this can happen if and only if A is a scalar multiple of the identity matrix,
i.e. A = λI for some number λ (which will be the only eigenvalue). In this case
etA = eλt I.
For 3 × 3 matrices there are more possibilities.
Example 30. Let
    A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ −1 & 0 & −1 \end{pmatrix}.
The characteristic polynomial of A is p_A(λ) = −λ^2(λ − 2). Thus, A has only the eigenvalues λ_1 = 0 and λ_2 = 2, with algebraic multiplicities a_1 = 2 and a_2 = 1,
respectively. We find that
Ax̄ = 0 ⇐⇒ x̄ = z(1, 0, −1),
Ax̄ = 2x̄ ⇐⇒ x̄ = z(0, 1, 0),
z ∈ C. Thus v̄1 = (1, 0, −1) and v̄2 = (0, 1, 0) are eigenvectors corresponding to λ1
and λ2 , respectively. We see that A is not diagonalizable.
The generalised eigenspace corresponding to λ2 is simply the usual eigenspace
ker(A − 2I), but the one corresponding to λ_1 is ker A^2. Calculating
    A^2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 0 \end{pmatrix},
we find e.g. the basis v̄1,1 = v̄1 = (1, 0, −1), v̄1,2 = (1, 0, 0) for ker A2 and we
previously found the basis v̄2,1 = v̄2 = (0, 1, 0) for ker(A − 2I).
We have
    e^{tA} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = e^{2t} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}
and
    e^{tA} \begin{pmatrix} 1 \\ 0 \\ −1 \end{pmatrix} = e^{0·t} \begin{pmatrix} 1 \\ 0 \\ −1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ −1 \end{pmatrix}.
Finally,
    e^{tA} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = (I + tA) \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 + t & 0 & t \\ 0 & 1 + 2t & 0 \\ −t & 0 & 1 − t \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 + t \\ 0 \\ −t \end{pmatrix}.
We can thus already write the general solution as
    x̄(t) = c_1 \begin{pmatrix} 1 \\ 0 \\ −1 \end{pmatrix} + c_2 \begin{pmatrix} 1 + t \\ 0 \\ −t \end{pmatrix} + c_3 e^{2t} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix},    (8)
where we chose to use the simpler notation c1 , c2 , c3 for the coefficients instead of
c1,1 , c1,2 , c2,1 from eq. (6).
In order to compute e^{tA}, we need to compute e^{tA} ē_i for the standard basis vectors. Note that we have already computed
    e^{tA} ē_1 = \begin{pmatrix} 1 + t \\ 0 \\ −t \end{pmatrix}
and
    e^{tA} ē_2 = e^{2t} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix},
so it remains to compute e^{tA} ē_3. We thus need to solve the equation
    \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = c_1 \begin{pmatrix} 1 \\ 0 \\ −1 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + c_3 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix},
and a simple calculation gives c_1 = −1, c_2 = 1 and c_3 = 0, so that
    e^{tA} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = − \begin{pmatrix} 1 \\ 0 \\ −1 \end{pmatrix} + \begin{pmatrix} 1 + t \\ 0 \\ −t \end{pmatrix} = \begin{pmatrix} t \\ 0 \\ 1 − t \end{pmatrix}.
Thus,
    e^{tA} = \begin{pmatrix} 1 + t & 0 & t \\ 0 & e^{2t} & 0 \\ −t & 0 & 1 − t \end{pmatrix}.
Example 31. Suppose that we in the previous example wanted to solve the IVP
    x̄′ = Ax̄, \quad x̄(0) = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}.
Of course, once we have the formula for the matrix exponential we can find the solution by calculating
    x̄(t) = e^{tA} x̄(0) = \begin{pmatrix} 1 + t & 0 & t \\ 0 & e^{2t} & 0 \\ −t & 0 & 1 − t \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} t \\ e^{2t} \\ 1 − t \end{pmatrix}.
We could however also solve the problem by using the general solution (8) and
finding c_1, c_2, c_3 to match the initial data. We thus need to solve the system
    \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} = c_1 \begin{pmatrix} 1 \\ 0 \\ −1 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + c_3 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix},
giving c_1 = −1, c_2 = 1 and c_3 = 1. Thus,
    x̄(t) = − \begin{pmatrix} 1 \\ 0 \\ −1 \end{pmatrix} + \begin{pmatrix} 1 + t \\ 0 \\ −t \end{pmatrix} + e^{2t} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} t \\ e^{2t} \\ 1 − t \end{pmatrix},
which coincides with the result of our previous computation.
Example 32. Let
    A = \begin{pmatrix} 3 & 1 & −1 \\ 0 & 2 & 0 \\ 1 & 1 & 1 \end{pmatrix}.
The characteristic polynomial of A is p_A(λ) = −(λ − 2)^3. Thus, A has the only eigenvalue λ_1 = 2 with algebraic multiplicity 3. The corresponding generalized eigenspace is the whole of C^3. Just like in Example 29 we can therefore compute the matrix exponential directly through the formula
    e^{tA} = e^{2t} e^{t(A−2I)} = e^{2t} \left( I + t(A − 2I) + \frac{t^2}{2}(A − 2I)^2 \right).
We find that
    A − 2I = \begin{pmatrix} 1 & 1 & −1 \\ 0 & 0 & 0 \\ 1 & 1 & −1 \end{pmatrix} \quad \text{and} \quad (A − 2I)^2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
Thus the term \frac{t^2}{2}(A − 2I)^2 vanishes and we obtain
    e^{tA} = e^{2t}(I + t(A − 2I)) = e^{2t} \left( \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} + t \begin{pmatrix} 1 & 1 & −1 \\ 0 & 0 & 0 \\ 1 & 1 & −1 \end{pmatrix} \right) = \begin{pmatrix} (1 + t)e^{2t} & te^{2t} & −te^{2t} \\ 0 & e^{2t} & 0 \\ te^{2t} & te^{2t} & (1 − t)e^{2t} \end{pmatrix}.
Example 33. Let
    A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 1 \\ −1 & 0 & 2 \end{pmatrix}.
Again p_A(λ) = −(λ − 2)^3 and thus 2 is the only eigenvalue. This time
    A − 2I = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ −1 & 0 & 0 \end{pmatrix} \quad \text{and} \quad (A − 2I)^2 = \begin{pmatrix} 0 & 0 & 0 \\ −1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
so that
    e^{tA} = e^{2t} \left( I + t(A − 2I) + \frac{t^2}{2}(A − 2I)^2 \right)
           = e^{2t} \left( \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} + t \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ −1 & 0 & 0 \end{pmatrix} + \frac{t^2}{2} \begin{pmatrix} 0 & 0 & 0 \\ −1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \right)
           = \begin{pmatrix} e^{2t} & 0 & 0 \\ −\frac{t^2}{2} e^{2t} & e^{2t} & te^{2t} \\ −te^{2t} & 0 & e^{2t} \end{pmatrix}.
The 4 × 4 case can be analyzed in a similar way. In general, the computations
will get more involved the higher n is. Most computer algebra systems have routines for computing the matrix exponential. In Maple this can be done using the
command MatrixExponential from the LinearAlgebra package.
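In Python, SciPy's expm plays a similar role, although it evaluates e^{tA} numerically for a given t rather than returning a symbolic formula. The assumed snippet below cross-checks the closed form obtained in Example 33:

# Assumed numerical cross-check of Example 33.
import numpy as np
from scipy.linalg import expm

A = np.array([[ 2.0, 0.0, 0.0],
              [ 0.0, 2.0, 1.0],
              [-1.0, 0.0, 2.0]])
t = 0.3
closed_form = np.exp(2 * t) * np.array([[1.0,        0.0, 0.0],
                                        [-t**2 / 2,  1.0, t  ],
                                        [-t,         0.0, 1.0]])
print(np.allclose(expm(t * A), closed_form))   # expected: True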
We can also formulate the above algorithm using the block diagonal representation of A from Theorem 23. Let T be the matrix for the corresponding change of
basis, i.e. T is the matrix whose columns are the coordinates for the basis vectors
in which A takes the block diagonal form B. Then A = T BT −1 and
    e^{tA} = T e^{tB} T^{−1},
where
    e^{tB} = \begin{pmatrix} e^{tB_1} & & \\ & \ddots & \\ & & e^{tB_k} \end{pmatrix},
and
    e^{tB_i} = e^{t(λ_i I_i + N_i)} = e^{tλ_i I_i} e^{tN_i} = e^{λ_i t} \left( I_i + tN_i + · · · + \frac{t^{a_i − 1}}{(a_i − 1)!} N_i^{a_i − 1} \right),
since N_i^m = 0 for m ≥ a_i.
In the last two examples above, the matrices were already in block diagonal
form, so that we could take T = I. Let us instead consider Example 30 from this
perspective.
Example 34. For the matrix
    A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ −1 & 0 & −1 \end{pmatrix}
in Example 30 we found that the generalized eigenspace ker A^2 had the basis
    v̄_{1,1} = \begin{pmatrix} 1 \\ 0 \\ −1 \end{pmatrix}, \quad v̄_{1,2} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},
and the eigenspace ker(A − 2I) the basis
    v̄_{2,1} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}.
Thus, we take
    T = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ −1 & 0 & 0 \end{pmatrix}
and find that
    T^{−1} = \begin{pmatrix} 0 & 0 & −1 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.
Since Av̄_{1,1} = 0, Av̄_{1,2} = v̄_{1,1} and Av̄_{2,1} = 2v̄_{2,1}, the corresponding block diagonal matrix is
    B = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 2 \end{pmatrix} = \begin{pmatrix} B_1 & \\ & B_2 \end{pmatrix},
where
    B_1 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad B_2 = 2.
We find that
    e^{tB_1} = I + tB_1 = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}, \quad e^{tB_2} = e^{2t}.
Thus,
    e^{tB} = \begin{pmatrix} 1 & t & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{2t} \end{pmatrix}
and
    e^{tA} = T e^{tB} T^{−1} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ −1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & t & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{2t} \end{pmatrix} \begin{pmatrix} 0 & 0 & −1 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 + t & 0 & t \\ 0 & e^{2t} & 0 \\ −t & 0 & 1 − t \end{pmatrix},
in agreement with our previous calculation.
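A numerical verification of this change-of-basis computation (assumed illustration; T and B are copied from Example 34):

# Assumed check of e^{tA} = T e^{tB} T^{-1} with the matrices of Example 34.
import numpy as np
from scipy.linalg import expm

A = np.array([[ 1.0, 0.0,  1.0],
              [ 0.0, 2.0,  0.0],
              [-1.0, 0.0, -1.0]])
T = np.array([[ 1.0, 1.0, 0.0],
              [ 0.0, 0.0, 1.0],
              [-1.0, 0.0, 0.0]])
B = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 2.0]])
t = 0.8
print(np.allclose(expm(t * A), T @ expm(t * B) @ np.linalg.inv(T)))   # expected: True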
Exercises
1. Compute e^A by summing the power series when
   a) A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},  b) A = \begin{pmatrix} 0 & 1 & −2 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}.

2. Compute e^{tA} by diagonalising the matrix, where A = \begin{pmatrix} 0 & 1 \\ −1 & 0 \end{pmatrix}.
3. Show that ‖e^A‖ ≤ e^{‖A‖}.

4. a) Show that (e^A)^* = e^{A^*}.
   b) Show that e^S is unitary if S is skew symmetric, that is, S^* = −S.
5. Show that the following identities (for all t ∈ R) imply AB = BA.
   a) Ae^{tB} = e^{tB}A,
   b) e^{tA} e^{tB} = e^{t(A+B)}.
6. Let
    A_1 = \begin{pmatrix} 0 & 1 & 1 \\ 0 & −1 & −1 \\ 0 & −1 & −1 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 1 & 1 & 2 \\ 1 & −1 & 0 \\ −1 & −1 & −2 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 1 & 2 & 0 \\ 3 & 1 & 3 \\ 0 & −2 & 1 \end{pmatrix}.
Calculate the generalized eigenspaces and determine a basis consisting of generalized eigenvectors in each case.
7. Calculate etAj for the matrices Aj in the previous exercise.
8. Solve the initial-value problem
    x̄′ = Ax̄, \quad x̄(0) = x̄_0,
   where
    A = \begin{pmatrix} 2 & 1 & −1 \\ 4 & 1 & −4 \\ 5 & 1 & −4 \end{pmatrix} \quad \text{and} \quad x̄_0 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}.

9. The matrix
    A = \begin{pmatrix} 18 & 3 & 2 & −12 \\ 0 & 2 & 0 & 0 \\ −2 & 12 & 2 & 1 \\ 24 & 6 & 3 & −16 \end{pmatrix}
   has the eigenvalues 1 and 2. Find the corresponding generalized eigenspaces and determine a basis consisting of generalized eigenvectors.
10. Consider the initial value problem
    x_1′ = x_1 + 3x_2,
    x_2′ = 3x_1 + x_2,
    x̄(0) = x̄_0.
   For which initial data x̄_0 does the solution converge to zero as t → ∞?

11. Can you find a general condition on the eigenvalues of A which guarantees that all solutions of the IVP
    x̄′ = Ax̄, \quad x̄(0) = x̄_0,
   converge to zero as t → ∞?
12. The matrices A1 and A2 in Exercise 6 have the same eigenvalues. If you’ve
solved Exercise 7 correctly, you will notice that all solutions of the IVP corresponding to A1 are bounded for t ≥ 0 while there are unbounded solutions
of the IVP corresponding to A2 . Explain the difference and try to formulate
a general principle.