The Matrix Logarithm: from Theory to Computation

The MatrixMatters
Logarithm:
Research
from Theory to Computation
February 25, 2009
Nick
Higham
Nick
Higham
SchoolofofResearch
Mathematics
Director
The University of Manchester
School of Mathematics
[email protected]
http://www.ma.man.ac.uk/~higham/
6th European Congress of Mathematics, July 2012
1/6
Definition Applications Theory Methods
Matrix Logarithm
Definition
A logarithm of A ∈ Cn×n is any matrix X such that eX = A.
Implicit definition.
Properties, classification?
University of Manchester
Nick Higham
Matrix Logarithm
2 / 40
Definition Applications Theory Methods
Outline
1
Definition and Properties
2
Applications
3
Theory
4
Computing the Matrix Logarithm and
its Fréchet derivative
University of Manchester
Nick Higham
Matrix Logarithm
3 / 40
Definition Applications Theory Methods
Cayley and Sylvester
Matrix algebra developed by Arthur
Cayley, FRS (1821–1895) in Memoir on
the Theory of Matrices (1858).
Cayley considered matrix square
roots.
Term “matrix” coined in 1850 by James
Joseph Sylvester, FRS (1814–1897).
Gave (1883) first definition of f (A)
for general f .
University of Manchester
Nick Higham
Matrix Logarithm
4 / 40
Definition Applications Theory Methods
Multiplicity of Definitions
There have been proposed in the literature since 1880
eight distinct definitions of a matric function,
by Weyr, Sylvester and Buchheim,
Giorgi, Cartan, Fantappiè, Cipolla,
Schwerdtfeger and Richter.
— R. F. Rinehart,
The Equivalence of Definitions
of a Matric Function (1955)
University of Manchester
Nick Higham
Matrix Logarithm
5 / 40
Definition Applications Theory Methods
Multiplicity of Definitions
There have been proposed in the literature since 1880
eight distinct definitions of a matric function,
by Weyr, Sylvester and Buchheim,
Giorgi, Cartan, Fantappiè, Cipolla,
Schwerdtfeger and Richter.
— R. F. Rinehart,
The Equivalence of Definitions
of a Matric Function (1955)
University of Manchester
Nick Higham
Matrix Logarithm
5 / 40
Definition Applications Theory Methods
Jordan Canonical Form

Z
−1
AZ = J = diag(J1 , . . . , Jp ),
1
λk
λk


Jk = 
|{z}

mk ×mk

..
.
...



1
λk
Definition
f (A) = Zf (J)Z −1 = Z diag(f (Jk ))Z −1 ,


f (mk −1) )(λk )
0
 f (λk ) f (λk ) . . .
(mk − 1)! 


..


..
.
f (Jk ) = 
.
f (λk )
.


.


..
f 0 (λ )
k
f (λk )
University of Manchester
Nick Higham
Matrix Logarithm
6 / 40
Definition Applications Theory Methods
Primary and Nonprimary Logarithms
A = diag(1, 1, e, e).
Primary:
Nonprimary:
University of Manchester
Nick Higham
log(A) = diag(0, 0, 1, 1).
log(A) = diag(0, 2πi, 1, 1).
Matrix Logarithm
7 / 40
Definition Applications Theory Methods
Cauchy Integral Theorem
Definition
1
f (A) =
2πi
Z
Γ
f (z)(zI − A)−1 dz,
where f is analytic on and inside a closed contour Γ that
encloses λ(A).
University of Manchester
Nick Higham
Matrix Logarithm
8 / 40
Definition Applications Theory Methods
Mercator’s Series
By integrating (1 + t)−1 = 1 − t + t 2 − t 3 + · · · between 0
and x we obtain Mercator’s series (1668),
log(1 + x) = x −
x2 x3 x4
+
−
+ ··· ,
2
3
4
|x| < 1.
For A ∈ Cn×n ,
log(I + A) = A −
University of Manchester
A2 A3 A4
+
−
+ ··· ,
2
3
4
Nick Higham
ρ(A) < 1.
Matrix Logarithm
10 / 40
Definition Applications Theory Methods
Composite Functions
Theorem
f (t) = g(h(t)) ⇒ f (A) = g(h(A)).
Corollary
exp(log(A)) = A when log(A) is defined.
University of Manchester
Nick Higham
Matrix Logarithm
11 / 40
Definition Applications Theory Methods
Composite Functions
Theorem
f (t) = g(h(t)) ⇒ f (A) = g(h(A)).
Corollary
exp(log(A)) = A when log(A) is defined.
What about log(exp(A)) = A?
Matrix unwinding number
U(A) =
University of Manchester
Nick Higham
A − log(exp(A))
.
2πi
Matrix Logarithm
11 / 40
Definition Applications Theory Methods
Outline
1
Definition and Properties
2
Applications
3
Theory
4
Computing the Matrix Logarithm and
its Fréchet derivative
University of Manchester
Nick Higham
Matrix Logarithm
12 / 40
Definition Applications Theory Methods
Toolbox of Matrix Functions
d 2y
+ Ay = 0,
y (0) = y0 , y 0 (0) = y00
dt 2
has solution
√
√ −1
√
y (t) = cos( A t)y0 +
A
sin( A t)y00 .
University of Manchester
Nick Higham
Matrix Logarithm
13 / 40
Definition Applications Theory Methods
Toolbox of Matrix Functions
d 2y
+ Ay = 0,
y (0) = y0 , y 0 (0) = y00
dt 2
has solution
√
√ −1
√
y (t) = cos( A t)y0 +
A
sin( A t)y00 .
But
University of Manchester
y0
y
= exp
Nick Higham
0
t In
−tA
0
y00
.
y0
Matrix Logarithm
13 / 40
Definition Applications Theory Methods
Toolbox of Matrix Functions
d 2y
+ Ay = 0,
y (0) = y0 , y 0 (0) = y00
dt 2
has solution
√
√ −1
√
y (t) = cos( A t)y0 +
A
sin( A t)y00 .
But
y0
y
= exp
0
t In
−tA
0
y00
.
y0
In software want to be able evaluate interesting f at
matrix args as well as scalar args.
MATLAB has expm, logm, sqrtm, funm.
University of Manchester
Nick Higham
Matrix Logarithm
13 / 40
Definition Applications Theory Methods
Application: Control Theory
Convert continuous-time system
dx
= Fx(t) + Gu(t),
dt
y = Hx(t) + Ju(t),
to discrete-time state-space system
xk +1 = Axk + Buk ,
yk = Hxk + Juk .
Have
Fτ
A=e ,
Z
τ
Ft
e dt G,
B=
0
where τ is the sampling period.
MATLAB Control System Toolbox: c2d and d2c.
University of Manchester
Nick Higham
Matrix Logarithm
14 / 40
Definition Applications Theory Methods
The Average Eye
First order character of optical system characterized by
transference matrix
S δ
∈ R5×5 ,
T =
0 1
where S ∈ R4×4 is symplectic:
0
T
S JS = J =
−I2
Average m−1
I2
.
0
Pm
Ti is not a transference matrix.
P
Harris (2005) proposes the average exp(m−1 m
i=1 log(Ti )).
University of Manchester
i=1
Nick Higham
Matrix Logarithm
15 / 40
Definition Applications Theory Methods
Markov Models
Time-homogeneous continuous-time Markov process
with transition probability matrix P(t) ∈ Rn×n .
Transition intensity matrix Q: qij ≥ 0 (i 6= j),
Pn
Qt
j=1 qij = 0, P(t) = e .
For discrete-time Markov processes:
Embeddability problem
When does a given stochastic P have a real logarithm Q
that is an intensity matrix?
University of Manchester
Nick Higham
Matrix Logarithm
16 / 40
Definition Applications Theory Methods
Markov Models (1)—Example
√
With x = −e−2
≈ −1.9 × 10−5 ,

1 + 2x 1 − x
1
P =  1 − x 1 + 2x
3
1−x
1−x
3π

1−x
1 − x .
1 + 2x
P diagonalizable, Λ(P) = {1, x, x}.
Every primary log complex (can’t have complex
conjugate ei’vals).
Yet a generator is the non-primary log


−2/3
1/2
1/6
√
Q = 2 3π  1/6 −2/3 1/2  .
1/2
1/6 −2/3
University of Manchester
Nick Higham
Matrix Logarithm
17 / 40
Definition Applications Theory Methods
Markov Models (2)
Suppose P ≡ P(1) has a generator Q = log P.
Then P(t) at other times is P(t) = exp(Qt).
E.g., if P transition matrix for 1 year,
1
P(1/12) = e 12 log P ≡ P 1/12 is matrix for 1 month.
Problem: log P, P 1/k may have wrong sign patterns ⇒
“regularize”.
University of Manchester
Nick Higham
Matrix Logarithm
18 / 40
Definition Applications Theory Methods
HIV to Aids Transition
Estimated 6-month transition matrix.
Four AIDS-free states and 1 AIDS state.
2077 observations (Charitos et al., 2008).


0.8149 0.0738 0.0586 0.0407 0.0120
 0.5622 0.1752 0.1314 0.1169 0.0143 


.
0.3606
0.1860
0.1521
0.2198
0.0815
P=


 0.1676 0.0636 0.1444 0.4652 0.1592 
0
0
0
0
1
Want to estimate the 1-month transition matrix.
Λ(P) = {1, 0.9644, 0.4980, 0.1493, −0.0043}.
N. J. Higham and L. Lin.
On pth roots of stochastic matrices, LAA, 2011.
University of Manchester
Nick Higham
Matrix Logarithm
19 / 40
Definition Applications Theory Methods
Outline
1
Definition and Properties
2
Applications
3
Theory
4
Computing the Matrix Logarithm and
its Fréchet derivative
University of Manchester
Nick Higham
Matrix Logarithm
20 / 40
Definition Applications Theory Methods
Logs of A = I3


0

C = −2π
−2π

0 0 0
B = 0 0 0,
0 0 0


2π − 1 1
0


0
0 ,
D = −2π
0
0
0
2π
0
0

1
0,
0
eB = eC = eD = I3 .
Λ(C) = Λ(D) = {0, 2πi, −2πi}.
University of Manchester
Nick Higham
Matrix Logarithm
21 / 40
Definition Applications Theory Methods
All Solutions of eX = A
Theorem (Gantmacher, 1959)
A ∈ Cn×n nonsing with Jordan canonical form
Z −1 AZ = J = diag(J1 , J2 , . . . , Jp ). All solutions to eX = A
are given by
(j )
(j )
(j )
X = Z U diag(L1 1 , L2 2 , . . . , Lp p ) U
−1
Z −1 ,
where
(j )
Lk k = log(Jk (λk )) + 2 jk πi Imk ,
jk ∈ Z arbitrary, and U an arbitrary nonsing matrix that
commutes with J.
University of Manchester
Nick Higham
Matrix Logarithm
22 / 40
Definition Applications Theory Methods
All Solutions of eX = A: Classified
Theorem
A ∈ Cn×n nonsing: p Jordan blocks, s distinct ei’vals.
eX = A has a countable infinity of solutions that are primary
functions of A:
(j )
(j )
(j )
Xj = Z diag(L1 1 , L2 2 , . . . , Lp p )Z −1 ,
where λi = λk implies ji = jk . If s < p then eX = A has
non-primary solutions
(j )
(j )
(j )
Xj (U) = Z U diag(L1 1 , L2 2 , . . . , Lp p ) U
−1
Z −1 ,
where jk ∈ Z arbitrary, U arbitrary nonsing with UJ = JU,
and for each j ∃ i and k s.t. λi = λk while ji 6= jk .
University of Manchester
Nick Higham
Matrix Logarithm
23 / 40
Definition Applications Theory Methods
Logs of A = I3 (again)

0
C =  −2π
−2π
2π − 1
0
0

1
0,
0

0
D =  −2π
0
2π
0
0

1
0,
0
e0 = eC = eD = I3 . Λ(C) = Λ(D) = {0, 2πi, −2πi}.

1 α

U= 0 1
0 0

0
α,
1
α ∈ C,

X = U diag(2πi, −2πi, 0)U −1
University of Manchester
Nick Higham
1 −2α
= 2π i  0
1
0
0
Matrix Logarithm

2α2
−α  .
1
24 / 40
Definition Applications Theory Methods
Square Roots of Rotations
cos θ
G(θ) =
− sin θ
sin θ
.
cos θ
G(θ/2) is the natural square root of G(θ).
For θ = π,
−1 0
G(π) =
,
0 −1
0
G(π/2) =
−1
1
.
0
G(π/2) is a nonprimary square root.
University of Manchester
Nick Higham
Matrix Logarithm
25 / 40
Definition Applications Theory Methods
Principal Logarithm and pth Root
Let A ∈ Cn×n have no eigenvalues on R− .
Principal log
X = log(A) denotes unique X such that
eX = A.
−π < Im λ(X ) < π.
University of Manchester
Nick Higham
Matrix Logarithm
26 / 40
Definition Applications Theory Methods
Principal Logarithm and pth Root
Let A ∈ Cn×n have no eigenvalues on R− .
Principal log
X = log(A) denotes unique X such that
eX = A.
−π < Im λ(X ) < π.
Principal pth root
For integer p > 0, X = A1/p is unique X such that
X p = A.
−π/p < arg(λ(X )) < π/p.
University of Manchester
Nick Higham
Matrix Logarithm
26 / 40
Definition Applications Theory Methods
Outline
1
Definition and Properties
2
Applications
3
Theory
4
Computing the Matrix Logarithm and
its Fréchet derivative
University of Manchester
Nick Higham
Matrix Logarithm
27 / 40
Definition Applications Theory Methods
Henry Briggs (1561–1630)
Arithmetica Logarithmica (1624)
Logarithms to base 10 of 1–20,000 and
90,000–100,000 to 14 decimal places.
University of Manchester
Nick Higham
Matrix Logarithm
28 / 40
Definition Applications Theory Methods
Henry Briggs (1561–1630)
Arithmetica Logarithmica (1624)
Logarithms to base 10 of 1–20,000 and
90,000–100,000 to 14 decimal places.
Briggs must be viewed as one of the
great figures in numerical analysis.
—Herman H. Goldstine,
A History of Numerical Analysis (1977)
University of Manchester
Nick Higham
Matrix Logarithm
28 / 40
Definition Applications Theory Methods
Briggs’ Log Method (1617)
log(ab) = log a + log b
⇒
log a = 2 log a1/2 .
Use repeatedly:
k
log a = 2k log a1/2 .
k
Write a1/2 = 1 + x and note log(1 + x) ≈ x. Briggs worked
to base 10 and used
k
log10 a ≈ 2k · log10 e · (a1/2 − 1).
University of Manchester
Nick Higham
Matrix Logarithm
31 / 40
Definition Applications Theory Methods
When Does log(BC) = log(B) + log(C)?
Theorem
Let B, C ∈ Cn×n commute and have no ei’vals on R− . If for
every ei’val λj of B and the corr. ei’val µj of C,
| arg λj + arg µj | < π, then log(BC) = log(B) + log(C).
University of Manchester
Nick Higham
Matrix Logarithm
32 / 40
Definition Applications Theory Methods
When Does log(BC) = log(B) + log(C)?
Theorem
Let B, C ∈ Cn×n commute and have no ei’vals on R− . If for
every ei’val λj of B and the corr. ei’val µj of C,
| arg λj + arg µj | < π, then log(BC) = log(B) + log(C).
Proof. log(B) and log(C) commute, since B and C do.
Therefore
elog(B)+log(C) = elog(B) elog(C) = BC.
Thus log(B) + log(C) is some logarithm of BC. Then
Im(log λj + log µj ) = arg λj + arg µj ∈ (−π, π),
so log(B) + log(C) is the principal logarithm of BC.
University of Manchester
Nick Higham
Matrix Logarithm
32 / 40
Definition Applications Theory Methods
Inverse Scaling and Squaring Method
Take B = C in previous theorem:
log A = log A1/2 · A1/2 = 2 log(A1/2 ),
since arg λ(A1/2 ) ∈ (−π/2, π/2).
University of Manchester
Nick Higham
Matrix Logarithm
33 / 40
Definition Applications Theory Methods
Inverse Scaling and Squaring Method
Take B = C in previous theorem:
log A = log A1/2 · A1/2 = 2 log(A1/2 ),
since arg λ(A1/2 ) ∈ (−π/2, π/2).
Use Briggs’ idea:
University of Manchester
k
log A = 2k log A1/2 .
Nick Higham
Matrix Logarithm
33 / 40
Definition Applications Theory Methods
Inverse Scaling and Squaring Method
Take B = C in previous theorem:
log A = log A1/2 · A1/2 = 2 log(A1/2 ),
since arg λ(A1/2 ) ∈ (−π/2, π/2).
Use Briggs’ idea:
k
log A = 2k log A1/2 .
Kenney & Laub’s (1989) inverse scaling and squaring
method:
Bring A close to I by repeated square roots.
s
Approximate log(A1/2 ) using an [m/m] Padé
approximant rm (x) ≈ log(1 + x).
Rescale to find log(A).
University of Manchester
Nick Higham
Matrix Logarithm
33 / 40
Definition Applications Theory Methods
Choice of Parameters s, m
s
Must have kI − A1/2 k < 1.
Larger Padé degree m means smaller s.
Let h2m+1 (X ) = erm (X ) − X − I.
Assume ρ(rm (X )) < π, so log(erm (X ) ) = rm (X ). Then
rm (X ) = log(I + X + h2m+1 (X )) =: log(I + X + ∆X ),
where
h2m+1 (X ) =
∞
X
ck X k .
k =2m+1
University of Manchester
Nick Higham
Matrix Logarithm
34 / 40
Definition Applications Theory Methods
Bounding the Backward Error
Want to bound norm of h2m+1 (X ) =
P∞
k =2m+1
ck X k .
Non-normality implies ρ(A) kAk.
Note that
ρ(A) ≤ kAk k1/k ≤ kAk,
k = 1 : ∞.
and limk →∞ kAk k1/k = ρ(A).
Use kAk k1/k instead of kAk in the truncation bounds.
University of Manchester
Nick Higham
Matrix Logarithm
35 / 40
0.9
A=
0
500
.
−0.5
10
10
kAkk2
kAk k2
1/5
(kA5 k2 )k
1/10
(kA10 k2 )k
k 1/k
kA k2
8
10
6
10
4
10
2
10
0
10
0
5
10
15
20
Definition Applications Theory Methods
Algorithm of Al-Mohy & H (2012)
Truncation bounds use kAk k1/k rather than kAk, leading
to major benefits in speed and accuracy.
Matrix norms not such a blunt tool!
Use estimates of kAk k (alg of H & Tisseur (2000)).
Choose s and m to achieve double precision backward
error at minimal cost.
Initial Schur decomposition: A = QTQ ∗ .
Directly and accurately compute certain elements of
s
T 1/2 − I and log(T ). Use
s
a−1
.
1/2i )
i=1 (1 + a
a1/2 − 1 = Qs
University of Manchester
Nick Higham
Matrix Logarithm
37 / 40
Definition Applications Theory Methods
Frechét Derivative of Logarithm
f (A + E) − f (A) − L(A, E) = o(kEk).
Integral formula
Z
L(A, E) =
0
1
t(A − I) + I
−1
E t(A − I) + I
Method based on
X E
f (X )
f
=
0 X
0
−1
dt.
L(X , E)
.
f (X )
Kenney & Laub (1998): Kronecker–Sylvester alg,
Padé of tanh(x)/x. Requires complex arithmetic.
University of Manchester
Nick Higham
Matrix Logarithm
38 / 40
Definition Applications Theory Methods
Algorithm of Al-Mohy, H & Relton (2012)
Fréchet differentiate the ISS algorithm!
1
2
3
4
5
6
7
E0 = E
for i = 1: s
i
Compute A1/2 .
i
i
Solve the Sylvester eqn A1/2 Ei + Ei A1/2 = Ei−1 .
end
s
log(A) ≈ 2s rm (A1/2 − I)
s
Llog (A, E) ≈ 2s Lrm (A1/2 − I, Es )
Backward Error Result
rm (X ) = log(I + X + ∆X ),
Lrm (X , E) = Llog (I + X + ∆X , E + ∆E).
University of Manchester
Nick Higham
Matrix Logarithm
39 / 40
Definition Applications Theory Methods
Conclusions & Future Directions
Log appears in a growing number of applications.
Have good algorithms for log(A), Llog (A) and estimating
the condition number.
If A is real can work entirely in real arithmetic.
Conditioning of f (A).
Non-primary functions.
Functions of
structured matrices.
University of Manchester
Nick Higham
Matrix Logarithm
40 / 40
Definition Applications Theory Methods
References I
A. H. Al-Mohy and N. J. Higham.
A new scaling and squaring algorithm for the matrix
exponential.
SIAM J. Matrix Anal. Appl., 31(3):970–989, 2009.
A. H. Al-Mohy and N. J. Higham.
Improved inverse scaling and squaring algorithms for
the matrix logarithm.
SIAM J. Sci. Comput., 34(4):C152–C169, 2012.
University of Manchester
Nick Higham
Matrix Logarithm
36 / 40
Definition Applications Theory Methods
References II
A. H. Al-Mohy, N. J. Higham, and S. D. Relton.
Computing the fréchet derivative of the matrix logarithm
and estimating the condition number.
MIMS EPrint 2012.72, Manchester Institute for
Mathematical Sciences, The University of Manchester,
UK, July 2012.
17 pp.
T. Charitos, P. R. de Waal, and L. C. van der Gaag.
Computing short-interval transition matrices of a
discrete-time Markov chain from partially observed
data.
Statistics in Medicine, 27:905–921, 2008.
University of Manchester
Nick Higham
Matrix Logarithm
37 / 40
Definition Applications Theory Methods
References III
W. F. Harris.
The average eye.
Opthal. Physiol. Opt., 24:580–585, 2005.
N. J. Higham.
The Matrix Function Toolbox.
http:
//www.ma.man.ac.uk/~higham/mftoolbox.
N. J. Higham.
Evaluating Padé approximants of the matrix logarithm.
SIAM J. Matrix Anal. Appl., 22(4):1126–1135, 2001.
University of Manchester
Nick Higham
Matrix Logarithm
38 / 40
Definition Applications Theory Methods
References IV
N. J. Higham.
Functions of Matrices: Theory and Computation.
Society for Industrial and Applied Mathematics,
Philadelphia, PA, USA, 2008.
ISBN 978-0-898716-46-7.
xx+425 pp.
N. J. Higham and A. H. Al-Mohy.
Computing matrix functions.
Acta Numerica, 19:159–208, 2010.
N. J. Higham and L. Lin.
On pth roots of stochastic matrices.
Linear Algebra Appl., 435(3):448–463, 2011.
University of Manchester
Nick Higham
Matrix Logarithm
39 / 40
Definition Applications Theory Methods
References V
C. S. Kenney and A. J. Laub.
Condition estimates for matrix functions.
SIAM J. Matrix Anal. Appl., 10(2):191–209, 1989.
C. S. Kenney and A. J. Laub.
A Schur–Fréchet algorithm for computing the logarithm
and exponential of a matrix.
SIAM J. Matrix Anal. Appl., 19(3):640–663, 1998.
University of Manchester
Nick Higham
Matrix Logarithm
40 / 40