Matrix Functions: Theory and Algorithms

Nick Higham
Department of Mathematics, University of Manchester
[email protected]
http://www.ma.man.ac.uk/~higham/

Includes joint work with Philip Davies

OUTLINE
▶ Definitions of f(A)
  Applications
  Algorithms for particular f
  Schur–Parlett algorithm for general f
  Computing f(A)b

Defining by Substitution

Want to define f: C^{n×n} → C^{n×n}, but not elementwise. Given f(t), we can define f(A) by substituting A for t:

  f(t) = (1 + t^2)/(1 − t)  ⇒  f(A) = (I − A)^{-1}(I + A^2),

  log(1 + x) = x − x^2/2 + x^3/3 − x^4/4 + ···,  |x| < 1,
  log(I + A) = A − A^2/2 + A^3/3 − A^4/4 + ···,  ρ(A) < 1.

This works for f a polynomial, a rational function, or any f with a convergent power series.

Multiplicity of Definitions

  "There have been proposed in the literature since 1880 eight distinct definitions of a matric function, by Weyr, Sylvester and Buchheim, Giorgi, Cartan, Fantappiè, Cipolla, Schwerdtfeger and Richter."
  — R. F. Rinehart, The Equivalence of Definitions of a Matric Function, Amer. Math. Monthly (1955)

Cauchy Integral Theorem

Definition 1.
  f(A) = (1/(2πi)) ∫_Γ f(z)(zI − A)^{-1} dz,
where f is analytic inside a closed contour Γ that encloses λ(A).

Jordan Canonical Form

  Z^{-1} A Z = J = diag(J_1, J_2, ..., J_p),
where J_k is an m_k × m_k Jordan block with eigenvalue λ_k (λ_k on the diagonal, 1s on the superdiagonal).

Definition 2.
  f(A) = Z f(J) Z^{-1} = Z diag(f(J_k)) Z^{-1},
where f(J_k) is the upper triangular Toeplitz matrix

  f(J_k) = [ f(λ_k)  f'(λ_k)  ...  f^{(m_k−1)}(λ_k)/(m_k−1)!
                     f(λ_k)   ...  ...
                              ...  f'(λ_k)
                                   f(λ_k)                    ].

Interpolation

Definition 3 (Sylvester, 1883; Buchheim, 1886). Let A have distinct eigenvalues λ_1, ..., λ_s and let n_i be the index of λ_i (the dimension of the largest Jordan block in which λ_i appears). Then f(A) = r(A), where r is the unique Hermite interpolating polynomial of degree less than sum_{i=1}^{s} n_i satisfying the interpolation conditions
  r^{(j)}(λ_i) = f^{(j)}(λ_i),  j = 0: n_i − 1,  i = 1: s.
The polynomial r depends on A.

This definition preserves functional relations G(f_1, ..., f_p) = 0 in which G is a polynomial, e.g., sin^2(A) + cos^2(A) = I. But of course e^{A+B} ≠ e^A e^B in general.

Non-Primary Functions

Horn & Johnson [3] call these definitions primary matrix functions. But not all possible functions are captured when there are multiple eigenvalues. E.g., for
  A = [−1 0; 0 −1],  X = [i 0; 0 −i],  Y = [0 −1; 1 0],
X and Y are square roots of A but are not polynomials in A. However, A = G(π) and Y = G(π/2), where G(θ) is a Givens rotation, so Y is a natural square root.

Virtually all existing theory and methods are for primary functions. Non-primary functions are sometimes needed when tracking f(A(t)) as eigenvalues of A(t) coalesce.

Textbook References

[1] F. R. Gantmacher. The Theory of Matrices, volume one. Chelsea, New York, 1959.
[2] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, USA, third edition, 1996.
[3] Roger A. Horn and Charles R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1991.
[4] Peter Lancaster and Miron Tismenetsky. The Theory of Matrices. Academic Press, London, second edition, 1985.

OUTLINE
  Definitions of f(A)
▶ Applications
  Algorithms for particular f
  Schur–Parlett algorithm for general f
  Computing f(A)b

Application: Differential Equations

Nuclear magnetic resonance: the Solomon equations
  dM/dt = −RM,  M(0) = I,
where M(t) is the matrix of intensities and R is the symmetric relaxation matrix. NMR workers need to solve both the forward and the inverse problem.
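The forward problem is a single matrix exponential evaluation. A minimal MATLAB sketch follows; gallery('lehmer',n) is only a hypothetical stand-in for a relaxation matrix, not NMR data.

n = 4;
R = gallery('lehmer',n);   % symmetric positive definite stand-in for R
t = 0.1;
M = expm(-t*R);            % matrix of intensities: M(t) = e^{-tR}, M(0) = I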
Exponential time differencing methods for stiff systems (Cox & Matthews, J. Comp. Phys., 2002)
  y' = Ay + F(y, t)
are based on exact integration of the linear part and require one accurate evaluation of exp(hA) per integration.

Application: Control Theory

Convert the continuous-time system
  dx/dt = Ax(t) + Bu(t)
to the discrete-time state-space system
  x_{k+1} = F x_k + G u_k,
where F = e^{Aτ} and τ is the sampling period. (E.g., MATLAB Control System Toolbox: c2d, d2c.)

OUTLINE
  Definitions of f(A)
  Applications
▶ Algorithms for particular f
  Schur–Parlett algorithm for general f
  Computing f(A)b

Classic MATLAB

< M A T L A B >
Version of 01/10/84
HELP is available

<> help
Type HELP followed by
INTRO  (To get started)
NEWS   (recent revisions)
ABS   ANS   ATAN  BASE  CHAR  CHOL  CHOP
CLEA  COND  CONJ  COS   DET   DIAG  DIAR
DISP  EDIT  EIG   ELSE  END   EPS   EXEC
EXIT  EXP   EYE   FILE  FLOP  FLPS  FOR
FUN   HESS  HILB  IF    IMAG  INV   KRON
LINE  LOAD  LOG   LONG  LU    MACR  MAGI
NORM  ONES  ORTH  PINV  PLOT  POLY  PRIN
PROD  QR    RAND  RANK  RAT   RCON  REAL
RETU  ROOT  ROUN  RREF  SAVE  SCHU  SEMI
SHOR  SIN   SIZE  SQRT  STOP  SUM   SVD
TRIL  TRIU  USER  WHAT  WHIL  WHO   WHY
< > ( ) = . , ; \ / ' + - * :

Classic MATLAB

<> help fun
FUN  For matrix arguments X, the functions SIN, COS, ATAN, SQRT, LOG, EXP and X**p are computed using eigenvalues D and eigenvectors V. If <V,D> = EIG(X) then f(X) = V*f(D)/V. This method may give inaccurate results if V is badly conditioned. Some idea of the accuracy can be obtained by comparing X**1 with X. For vector arguments, the function is applied to each component.

  "The availability of [FUN] in early versions of MATLAB quite possibly contributed to the system's technical and commercial success."
  — Cleve Moler (2003)

Setup

• General nonsymmetric A.
• Factorization of A feasible.
• May not want full accuracy.
• Many applications.
• Methods for very large, sparse A often require the solution of smaller, dense subproblems.

Matrix Exponential

Cleve Moler and Charles Van Loan. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Rev., 45 (2003).
• 355 citations on the Science Citation Index.

Scaling and squaring (SS) method for X ≈ e^A (Ward, 1977; Moler & Van Loan, 1978):
1. A ← A/2^k so that ||A||_∞ ≤ 1/2.
2. r(A) = [6/6] Padé approximant to e^A.
3. X = r(A)^{2^k}.
Used by MATLAB's expm; a sketch of the three steps follows.
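A minimal MATLAB sketch of these three steps, assuming the standard recurrence for the [6/6] Padé numerator coefficients of e^x; expm_ss is a hypothetical helper that omits the balancing and error control of MATLAB's expm.

function X = expm_ss(A)
%EXPM_SS  Scaling and squaring sketch for e^A with a [6/6] Pade approximant.
q = 6;
k = max(0, ceil(log2(norm(A,inf))) + 1);  % scale so that ||A/2^k||_inf <= 1/2
A = A/2^k;
n = size(A,1);
c = 1; N = eye(n); D = eye(n); Apow = eye(n);
for j = 1:q
    c = c*(q-j+1)/(j*(2*q-j+1));          % next Pade coefficient
    Apow = Apow*A;
    N = N + c*Apow;                       % numerator   p(A)
    D = D + c*(-1)^j*Apow;                % denominator p(-A)
end
X = D\N;                                  % r(A) = p(-A)^{-1} p(A)
for j = 1:k
    X = X*X;                              % undo the scaling: X = r(A)^(2^k)
end

Squaring k times magnifies any error in r(A), which is why the scaling step keeps k no larger than the Padé error bound requires.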
Alternative SS Algorithm for e^A

Suggested by Najfeld & Havel (1995): exploit
  τ(A) = A coth(A) = A(e^{2A} + I)(e^{2A} − I)^{-1}
       = I + A^2/(3I + A^2/(5I + A^2/(7I + ···))).
1. B = A/2^{k+1} so that ||A^2||_∞/2^{2k+2} ≤ 1.152.
2. r(B) = [8/8] Padé approximant to τ(B).
3. X = [(r(B) + B)(r(B) − B)^{-1}]^{2^k}.
• Claimed to require fewer flops than the original SS algorithm.

Principal Log and pth Root

Let A ∈ C^{n×n} have no eigenvalues on R^−.
Log: X = log A denotes the unique X such that
1. e^X = A.
2. −π < Im(λ(X)) < π.
pth root: for integer p > 0, X = A^{1/p} is the unique X such that
1. X^p = A.
2. −π/p < arg(λ(X)) < π/p.

Briggs' Log Method (1617)

  log(ab) = log a + log b  ⇒  log a = 2 log a^{1/2}.
Use this repeatedly:
  log a = 2^k log a^{1/2^k}.
Write a^{1/2^k} = 1 + x and note that log(1 + x) ≈ x. Briggs worked to base 10 and used
  log10 a ≈ 2^k · log10 e · (a^{1/2^k} − 1).

  "Briggs must be viewed as one of the great figures in numerical analysis."
  — Herman H. Goldstine, A History of Numerical Analysis (1977)

Can we generalize to matrices: log A = 2^k log A^{1/2^k}?

Splitting Lemma

Lemma (Cheng, H, Kenney & Laub, 2001). Suppose A = BC has no eigenvalues on R^− and
1. BC = CB.
2. Every eigenvalue of B (or C) lies in the open halfplane of the corresponding eigenvalue of A^{1/2}.
Then log A = log B + log C.
[Figure: the complex plane, showing an eigenvalue λ of A, its square root λ^{1/2}, and the halfplane containing the corresponding eigenvalue of B.]

Matrix Logarithm

Use the Briggs idea: log A = 2^k log A^{1/2^k}. Kenney & Laub's (1989) inverse scaling and squaring method:
• Bring A close to I by taking repeated square roots.
• Approximate log A^{1/2^k} = log(I − X), where X = I − A^{1/2^k}, using an [m/m] Padé approximant r_m(x) to log(1 − x).
• Rescale to find log A.

Alg of Cheng, H, Kenney & Laub (2001)

• Transformation-free: uses only matrix multiplication, LU factorization and inversion.
• Square roots computed by the product form of the Denman–Beavers iteration:
    M_{k+1} = (1/2)[I + (M_k + M_k^{-1})/2],  M_0 = A,
    Y_{k+1} = Y_k (I + M_k^{-1})/2,           Y_0 = A,
  where M_k → I and Y_k → A^{1/2}.
• Aims for a specified accuracy.
• Padé degree m chosen using K & L's (1989) bound:
    ||r_m(X) − log(I − X)|| ≤ |r_m(||X||) − log(1 − ||X||)|.
• r_m evaluated using the partial fraction expansion
    r_m(x) = sum_{j=1}^{m} α_j^{(m)} x / (1 + β_j^{(m)} x),
  which is fast and accurate (H, 2001). A sketch of the square root iteration follows.
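A minimal MATLAB sketch of the square root stage, assuming a fixed iteration cap and tolerance in place of the algorithm's specified-accuracy test; sqrtm_db is a hypothetical helper.

function X = sqrtm_db(A)
%SQRTM_DB  Product form of the Denman-Beavers iteration for the
%          principal square root of A (no eigenvalues on R^-).
n = size(A,1);
M = A; X = A;
for k = 1:50
    Minv = inv(M);
    X = X*(eye(n) + Minv)/2;          % Y_{k+1} = Y_k (I + M_k^{-1})/2
    M = (eye(n) + (M + Minv)/2)/2;    % M_{k+1} = (1/2)[I + (M_k + M_k^{-1})/2]
    if norm(M - eye(n),inf) <= 1e-13, break, end
end

Note that only matrix multiplications and inversions are used, in keeping with the transformation-free design.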
Matrix pth Root

Square root: Björck & Hammarling (1983). Compute the Schur decomposition A = QTQ* and then solve R^2 = T by
  r_ii = t_ii^{1/2},  r_ij = (t_ij − sum_{k=i+1}^{j−1} r_ik r_kj)/(r_ii + r_jj).
Extended to pth roots by Smith (2003), via a much more complicated recurrence. These algorithms
• have essentially optimal numerical stability;
• generalize to the real Schur decomposition.

Matrix Cosine

Algorithm (Serbin & Blalock, 1980). Given A ∈ R^{n×n} and a parameter α > 0, this algorithm approximates cos(A):
  Choose m such that 2^{−m}||A|| ≈ α.
  C_0 = Taylor or Padé approximation to cos(A/2^m).
  for i = 0: m − 1
      C_{i+1} = 2C_i^2 − I    % double angle: cos(2θ) = 2 cos^2(θ) − 1
  end
Open questions: how to choose m (i.e., α)? Which approximation? What is the effect of rounding errors?

Alg of H & Smith (2002)

• Initial argument reduction and balancing to reduce the norm.
• [8/8] Padé approximation, proved fully accurate in IEEE double precision if ||A||_∞ ≤ 1; more economical than a Taylor series.
• "Schoolboy" evaluation of r_8(A).
• Total cost: (4 + ⌈log2(||A||_∞)⌉)M + D, where M and D are the costs of a matrix multiplication and a matrix division.
• Error analysis gives a bound containing terms 4.1^m and norms of the intermediate C_i.

Numerical Stability

Is ||f̂ − f|| consistent with the conditioning of the problem? Is f̂ = f(A + E) with E "small", i.e., is the residual f^{-1}(f̂) − A "small"? This is unclear for all the algorithms discussed, except that the answer is "yes" for A^{1/p}.
• We currently lack characterizations of when an f(A) problem is ill conditioned for nonnormal A.

OUTLINE
  Definitions of f(A)
  Applications
  Algorithms for particular f
▶ Schur–Parlett algorithm for general f
  Computing f(A)b

Similarity Transformations

Can use the formula
  A = XBX^{-1}  ⇒  f(A) = X f(B) X^{-1},
provided f(B) is easily computable, e.g., B = diag(λ_i) if A is diagonalizable.
Problem: any error ΔB in f(B) is magnified by up to κ(X) = ||X|| ||X^{-1}|| ≥ 1.
Prefer to work with unitary X: thus we can use the eigendecomposition (diagonal B) when A is normal (AA* = A*A) and the Schur decomposition (triangular B) in general.

Example: Eigendecomposition

function F = funm_ev(A,fun)
%FUNM_EV  Evaluate f(A) from an eigendecomposition of A.
%         Inaccurate when V is ill conditioned.
[V,D] = eig(A);
F = V * diag(feval(fun,diag(D))) / V;

>> A = [3 -1; 1 1]; X = funm_ev(A,@sqrt)
X =
  1.7678e+000  -3.5355e-001
  3.5355e-001   1.0607e+000
>> norm(A-X^2)
ans =
  9.9519e-009    % cond(V) = 9.4e7
>> Y = sqrtm(A); norm(A-Y^2)
ans =
  6.4855e-016

Parlett's Recurrence

The Schur decomposition A = QTQ* reduces the problem to F = f(T), T upper triangular. f_ii = f(t_ii) is immediate. Parlett (1976): from FT = TF we obtain the recurrence
  f_ij = t_ij (f_ii − f_jj)/(t_ii − t_jj) + sum_{k=i+1}^{j−1} (f_ik t_kj − t_ik f_kj)/(t_ii − t_jj).
Used in MATLAB's funm. Fails when T has repeated eigenvalues.

Parlett vs. Björck & Hammarling

The Parlett recurrence is not "optimal", as is clear from the square root case, where f_12 is obtained from
  Parlett: a_12 (a_11^{1/2} − a_22^{1/2})/(a_11 − a_22);
  B & H:   a_12/(a_11^{1/2} + a_22^{1/2}).
The two are equal in exact arithmetic, but the first suffers cancellation when a_11 ≈ a_22.

Schur–Parlett Algorithm

H & Davies (2002):
1. Compute the Schur decomposition A = QTQ*.
2. Re-order T to block triangular form in which the eigenvalues within a block are "close" and those of separate blocks are "well separated".
3. Evaluate F_ii = f(T_ii).
4. Solve the Sylvester equations
     T_ii F_ij − F_ij T_jj = F_ii T_ij − T_ij F_jj + sum_{k=i+1}^{j−1} (F_ik T_kj − T_ik F_kj).
5. Undo the unitary transformations.

Function of Atomic Block

Assume f has a Taylor series with infinite radius of convergence and that derivatives are available. For a diagonal block T use T = σI + M, σ = trace(T)/n:
  f(T) = sum_{k=0}^{∞} (f^{(k)}(σ)/k!) M^k.
Truncate the series based on a strict error bound, not on the size of the terms. NB: for n = 2,
  M = [ε α; 0 −ε]  ⇒  M^{2k} = [ε^{2k} 0; 0 ε^{2k}],  M^{2k+1} = [ε^{2k+1} αε^{2k}; 0 −ε^{2k+1}],
so a small term can be followed by a much larger one when α is large.

Features of Algorithm

• Costs O(n^3) flops, or up to n^4/3 flops if large blocks are needed (close, repeated eigenvalues).
• Needs derivatives if block size > 1: the price to pay for treating general f and nonnormal A.
• The best general f(A) algorithm, and a benchmark for comparing other f(A) algorithms, general and specific.
• The basis of a new funm for the next MATLAB release. For contrast, a sketch of the point recurrence that it refines appears below.
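Here is a minimal MATLAB sketch of the point form of Parlett's recurrence; parlett is a hypothetical helper that breaks down when T has repeated diagonal entries, which is exactly what the blocking above cures. Given [Q,T] = schur(A,'complex'), f(A) = Q*parlett(T,fun)*Q'.

function F = parlett(T,fun)
%PARLETT  Point form of Parlett's recurrence for F = f(T), with T
%         upper triangular and having distinct diagonal entries.
n = size(T,1);
F = diag(feval(fun,diag(T)));         % f_ii = f(t_ii)
for j = 2:n                           % fill F a column at a time
    for i = j-1:-1:1
        s = T(i,j)*(F(i,i) - F(j,j));
        for k = i+1:j-1
            s = s + F(i,k)*T(k,j) - T(i,k)*F(k,j);
        end
        F(i,j) = s/(T(i,i) - T(j,j)); % fails if t_ii = t_jj
    end
end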
OUTLINE
  Definitions of f(A)
  Applications
  Algorithms for particular f
  Schur–Parlett algorithm for general f
▶ Computing f(A)b

log(A)b

Apply a quadrature rule ∫_0^1 f(t) dt ≈ sum_{k=1}^{m} c_k f(t_k) to (Wouk, 1965)
  log A = ∫_0^1 (A − I)[t(A − I) + I]^{-1} dt.
Combine with the Hessenberg reduction A = QHQ^T to get
  (log A)b ≈ Q sum_{k=1}^{m} c_k [t_k(H − I) + I]^{-1} d,  d = Q^T(A − I)b.
Costs (10/3)n^3 + 2mn^2 flops.
When ||I − A|| < 1 we can use m-point Gauss–Legendre quadrature ≡ Padé approximation! Choose m using the Kenney & Laub (1989) bound
  ||r_mm(X) − log(I + X)|| ≤ |r_mm(−||X||) − log(1 − ||X||)|.
When ||I − A|| > 1, use adaptive quadrature.

A^α b

The initial value problem
  dy/dt = α(A − I)[t(A − I) + I]^{-1} y,  y(0) = b,
has the unique solution y(t) = [t(A − I) + I]^α b, so y(1) = A^α b. Used by Allen, Baglama & Boyd (2000) for α = 1/2 and symmetric positive definite A.

Example using MATLAB's ode45 with A = gallery('parter',64), b = randn(64,1):

  f(A)      tol    Succ. steps  Fail. atts  f evals  Rel. err
  A^{-1/2}  1e-3        12           0         73     3.5e-8
            1e-6        14           0         85     6.0e-9
            1e-9        40           0        241     7.7e-12
  A^{2/5}   1e-3        15           0         79     2.8e-8
            1e-6        16           0         91     2.4e-9
            1e-9        54           0        325     1.8e-12

Interpolation

If A has distinct eigenvalues λ_j then the Lagrange interpolating polynomial gives
  f(A)b = sum_{j=1}^{n} f(λ_j) ℓ_j(A) b,  ℓ_j(x) = prod_{k≠j} (x − λ_k) / prod_{k≠j} (λ_j − λ_k).
Cost: O(n^4) flops.
For any A, the Newton divided difference form is
  f(A)b = sum_{i=1}^{n} c_i prod_{j=1}^{i−1} (A − λ_j I) b,
where the c_i are (confluent) divided differences. Requires derivatives of f. Cost: O(n^3) flops.

Cauchy Integral Theorem

  y = (1/(2πi)) ∫_Γ f(z)(zI − A)^{-1} b dz =: ∫_Γ g(z) dz.
Take the circle Γ: z − α = βe^{iθ}, 0 ≤ θ ≤ 2π, and apply the repeated trapezium rule:
  ∫_Γ g(z) dz = i ∫_0^{2π} (z(θ) − α) g(z(θ)) dθ ≈ (2πi/n) sum_{k=0}^{n−1} (z_k − α) g(z_k),
where z_k − α = βe^{2πik/n}. Use the Hessenberg reduction, as before. (A sketch follows the error discussion below.)

Euler–Maclaurin Error Bound

For h(x) of period 2π, in C^{2k+1}(−∞, ∞), with |h^{(2k+1)}(x)| ≤ M:
  | ∫_0^{2π} h(x) dx − T(f) | ≤ 4πM ζ(2k + 1)/n^{2k+1}.
• h^{(2k+1)}(x) is proportional to β^{2k+2}, where β is the radius of the circle.
• h^{(2k+1)}(x) contains powers of the resolvent (z(θ)I − A)^{-1}: bad if the contour is close to some λ_i or A is highly nonnormal.
• h^{(2k+1)}(x) contains derivatives of f on the contour.
Conclusion: the approach is restricted to matrices that are not too nonnormal, whose λ_i can be enclosed in a circle of small radius that is not close to singularities of the derivatives of f.
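A minimal MATLAB sketch of the trapezium rule approximation above, before any Hessenberg reduction; funmb_trap and its parameters alpha, beta, npts are hypothetical, and a practical code would reduce A to Hessenberg form so that each solve costs O(n^2) flops.

function y = funmb_trap(f, A, b, alpha, beta, npts)
%FUNMB_TRAP  Approximate f(A)b by the repeated trapezium rule on the
%            circle z = alpha + beta*exp(i*theta) enclosing lambda(A).
n = size(A,1);
y = zeros(n,1);
for k = 0:npts-1
    zk = alpha + beta*exp(2i*pi*k/npts);   % quadrature point z_k
    y = y + (zk - alpha)*f(zk)*((zk*eye(n) - A)\b);
end
y = y/npts;   % the 2*pi*i of the rule cancels the 1/(2*pi*i) factor

For real A and real f one would take real(y); as the error bound above predicts, accuracy degrades when the circle passes near an eigenvalue of A or near a singularity of f.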
Future Work

• Theory and algorithms for non-primary functions, perhaps linked to an f(A(t)) application.
• Better understanding of the conditioning of f(A).
• Exploiting structure, e.g., A in a matrix automorphism group (H, Mackey, Mackey & Tisseur, 2003).

http://www.ma.man.ac.uk/~higham/