Matrix Functions: Theory and Algorithms

Nick Higham
Department of Mathematics, University of Manchester
[email protected]
http://www.ma.man.ac.uk/~higham/

Includes joint work with Philip Davies

OUTLINE
▶ Definitions of f(A)
  Applications
  Algorithms for particular f
  Schur–Parlett algorithm for general f
  Computing f(A)b

Defining by Substitution

Want to define f: C^{n×n} → C^{n×n}, but not elementwise. Given f(t), we can define f(A) by substituting A for t:

  f(t) = (1 + t^2)/(1 − t)  ⇒  f(A) = (I − A)^{-1}(I + A^2),

  log(1 + x) = x − x^2/2 + x^3/3 − x^4/4 + ···,  |x| < 1,
  log(I + A) = A − A^2/2 + A^3/3 − A^4/4 + ···,  ρ(A) < 1.

This works for f a polynomial, a rational function, or any f with a convergent power series.

Multiplicity of Definitions

  "There have been proposed in the literature since 1880 eight distinct definitions of a matric function, by Weyr, Sylvester and Buchheim, Giorgi, Cartan, Fantappiè, Cipolla, Schwerdtfeger and Richter."
  — R. F. Rinehart, The Equivalence of Definitions of a Matric Function, Amer. Math. Monthly (1955)

Cauchy Integral Theorem

Definition 1.
  f(A) = (1/(2πi)) ∫_Γ f(z)(zI − A)^{-1} dz,
where f is analytic inside a closed contour Γ that encloses λ(A).

Jordan Canonical Form

  Z^{-1} A Z = J = diag(J_1, J_2, ..., J_p),
where J_k is an m_k × m_k Jordan block with eigenvalue λ_k (λ_k on the diagonal, 1s on the superdiagonal).

Definition 2.
  f(A) = Z f(J) Z^{-1} = Z diag(f(J_k)) Z^{-1},
where f(J_k) is the upper triangular Toeplitz matrix

  f(J_k) = [ f(λ_k)  f'(λ_k)  ...  f^{(m_k−1)}(λ_k)/(m_k−1)!
                     f(λ_k)   ...  ...
                              ...  f'(λ_k)
                                   f(λ_k)                    ].

Interpolation

Definition 3 (Sylvester, 1883; Buchheim, 1886). Let A have distinct eigenvalues λ_1, ..., λ_s and let n_i be the index of λ_i (the dimension of the largest Jordan block in which λ_i appears). Then f(A) = r(A), where r is the unique Hermite interpolating polynomial of degree less than sum_{i=1}^{s} n_i satisfying the interpolation conditions
  r^{(j)}(λ_i) = f^{(j)}(λ_i),  j = 0: n_i − 1,  i = 1: s.
The polynomial r depends on A.

This definition preserves functional relations G(f_1, ..., f_p) = 0 in which G is a polynomial, e.g., sin^2(A) + cos^2(A) = I. But of course e^{A+B} ≠ e^A e^B in general.

Non-Primary Functions

Horn & Johnson [3] call these definitions primary matrix functions. But not all possible functions are captured when there are multiple eigenvalues. E.g., for
  A = [−1 0; 0 −1],  X = [i 0; 0 −i],  Y = [0 −1; 1 0],
X and Y are square roots of A but are not polynomials in A. However, A = G(π) and Y = G(π/2), where G(θ) is a Givens rotation, so Y is a natural square root.

Virtually all existing theory and methods are for primary functions. Non-primary functions are sometimes needed when tracking f(A(t)) as eigenvalues of A(t) coalesce.

Textbook References

[1] F. R. Gantmacher. The Theory of Matrices, volume one. Chelsea, New York, 1959.
[2] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, USA, third edition, 1996.
[3] Roger A. Horn and Charles R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1991.
[4] Peter Lancaster and Miron Tismenetsky. The Theory of Matrices. Academic Press, London, second edition, 1985.

OUTLINE
  Definitions of f(A)
▶ Applications
  Algorithms for particular f
  Schur–Parlett algorithm for general f
  Computing f(A)b

Application: Differential Equations

Nuclear magnetic resonance: the Solomon equations
  dM/dt = −RM,  M(0) = I,
where M(t) is the matrix of intensities and R is the symmetric relaxation matrix. NMR workers need to solve both the forward and the inverse problem.
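The forward problem is a single matrix exponential evaluation. A minimal MATLAB sketch follows; gallery('lehmer',n) is only a hypothetical stand-in for a relaxation matrix, not NMR data.

n = 4;
R = gallery('lehmer',n);   % symmetric positive definite stand-in for R
t = 0.1;
M = expm(-t*R);            % matrix of intensities: M(t) = e^{-tR}, M(0) = I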
Exponential time differencing methods for stiff systems (Cox & Matthews, J. Comp. Phys., 2002)
  y' = Ay + F(y, t)
are based on exact integration of the linear part and require one accurate evaluation of exp(hA) per integration.

Application: Control Theory

Convert the continuous-time system
  dx/dt = Ax(t) + Bu(t)
to the discrete-time state-space system
  x_{k+1} = F x_k + G u_k,
where F = e^{Aτ} and τ is the sampling period. (E.g., MATLAB Control System Toolbox: c2d, d2c.)

OUTLINE
  Definitions of f(A)
  Applications
▶ Algorithms for particular f
  Schur–Parlett algorithm for general f
  Computing f(A)b

Classic MATLAB

< M A T L A B >
Version of 01/10/84
HELP is available

<> help
Type HELP followed by
INTRO  (To get started)
NEWS   (recent revisions)
ABS   ANS   ATAN  BASE  CHAR  CHOL  CHOP
CLEA  COND  CONJ  COS   DET   DIAG  DIAR
DISP  EDIT  EIG   ELSE  END   EPS   EXEC
EXIT  EXP   EYE   FILE  FLOP  FLPS  FOR
FUN   HESS  HILB  IF    IMAG  INV   KRON
LINE  LOAD  LOG   LONG  LU    MACR  MAGI
NORM  ONES  ORTH  PINV  PLOT  POLY  PRIN
PROD  QR    RAND  RANK  RAT   RCON  REAL
RETU  ROOT  ROUN  RREF  SAVE  SCHU  SEMI
SHOR  SIN   SIZE  SQRT  STOP  SUM   SVD
TRIL  TRIU  USER  WHAT  WHIL  WHO   WHY
< > ( ) = . , ; \ / ' + - * :

Classic MATLAB

<> help fun
FUN  For matrix arguments X, the functions SIN, COS, ATAN, SQRT, LOG, EXP and X**p are computed using eigenvalues D and eigenvectors V. If <V,D> = EIG(X) then f(X) = V*f(D)/V. This method may give inaccurate results if V is badly conditioned. Some idea of the accuracy can be obtained by comparing X**1 with X. For vector arguments, the function is applied to each component.

  "The availability of [FUN] in early versions of MATLAB quite possibly contributed to the system's technical and commercial success."
  — Cleve Moler (2003)

Setup

• General nonsymmetric A.
• Factorization of A feasible.
• May not want full accuracy.
• Many applications.
• Methods for very large, sparse A often require the solution of smaller, dense subproblems.

Matrix Exponential

Cleve Moler and Charles Van Loan. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Rev., 45 (2003).
• 355 citations on the Science Citation Index.

Scaling and squaring (SS) method for X ≈ e^A (Ward, 1977; Moler & Van Loan, 1978):
1. A ← A/2^k so that ||A||_∞ ≤ 1/2.
2. r(A) = [6/6] Padé approximant to e^A.
3. X = r(A)^{2^k}.
Used by MATLAB's expm; a sketch of the three steps follows.
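A minimal MATLAB sketch of these three steps, assuming the standard recurrence for the [6/6] Padé numerator coefficients of e^x; expm_ss is a hypothetical helper that omits the balancing and error control of MATLAB's expm.

function X = expm_ss(A)
%EXPM_SS  Scaling and squaring sketch for e^A with a [6/6] Pade approximant.
q = 6;
k = max(0, ceil(log2(norm(A,inf))) + 1);  % scale so that ||A/2^k||_inf <= 1/2
A = A/2^k;
n = size(A,1);
c = 1; N = eye(n); D = eye(n); Apow = eye(n);
for j = 1:q
    c = c*(q-j+1)/(j*(2*q-j+1));          % next Pade coefficient
    Apow = Apow*A;
    N = N + c*Apow;                       % numerator   p(A)
    D = D + c*(-1)^j*Apow;                % denominator p(-A)
end
X = D\N;                                  % r(A) = p(-A)^{-1} p(A)
for j = 1:k
    X = X*X;                              % undo the scaling: X = r(A)^(2^k)
end

Squaring k times magnifies any error in r(A), which is why the scaling step keeps k no larger than the Padé error bound requires.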
Alternative SS Algorithm for e^A

Suggested by Najfeld & Havel (1995): exploit
  τ(A) = A coth(A) = A(e^{2A} + I)(e^{2A} − I)^{-1}
       = I + A^2/(3I + A^2/(5I + A^2/(7I + ···))).
1. B = A/2^{k+1} so that ||A^2||_∞/2^{2k+2} ≤ 1.152.
2. r(B) = [8/8] Padé approximant to τ(B).
3. X = [(r(B) + B)(r(B) − B)^{-1}]^{2^k}.
• Claimed to require fewer flops than the original SS algorithm.

Principal Log and pth Root

Let A ∈ C^{n×n} have no eigenvalues on R^−.
Log: X = log A denotes the unique X such that
1. e^X = A.
2. −π < Im(λ(X)) < π.
pth root: for integer p > 0, X = A^{1/p} is the unique X such that
1. X^p = A.
2. −π/p < arg(λ(X)) < π/p.

Briggs' Log Method (1617)

  log(ab) = log a + log b  ⇒  log a = 2 log a^{1/2}.
Use this repeatedly:
  log a = 2^k log a^{1/2^k}.
Write a^{1/2^k} = 1 + x and note that log(1 + x) ≈ x. Briggs worked to base 10 and used
  log10 a ≈ 2^k · log10 e · (a^{1/2^k} − 1).

  "Briggs must be viewed as one of the great figures in numerical analysis."
  — Herman H. Goldstine, A History of Numerical Analysis (1977)

Can we generalize to matrices: log A = 2^k log A^{1/2^k}?

Splitting Lemma

Lemma (Cheng, H, Kenney & Laub, 2001). Suppose A = BC has no eigenvalues on R^− and
1. BC = CB.
2. Every eigenvalue of B (or C) lies in the open halfplane of the corresponding eigenvalue of A^{1/2}.
Then log A = log B + log C.
[Figure: the complex plane, showing an eigenvalue λ of A, its square root λ^{1/2}, and the halfplane containing the corresponding eigenvalue of B.]

Matrix Logarithm

Use the Briggs idea: log A = 2^k log A^{1/2^k}. Kenney & Laub's (1989) inverse scaling and squaring method:
• Bring A close to I by taking repeated square roots.
• Approximate log A^{1/2^k} = log(I − X), where X = I − A^{1/2^k}, using an [m/m] Padé approximant r_m(x) to log(1 − x).
• Rescale to find log A.

Alg of Cheng, H, Kenney & Laub (2001)

• Transformation-free: uses only matrix multiplication, LU factorization and inversion.
• Square roots computed by the product form of the Denman–Beavers iteration:
    M_{k+1} = (1/2)[I + (M_k + M_k^{-1})/2],  M_0 = A,
    Y_{k+1} = Y_k (I + M_k^{-1})/2,           Y_0 = A,
  where M_k → I and Y_k → A^{1/2}.
• Aims for a specified accuracy.
• Padé degree m chosen using K & L's (1989) bound:
    ||r_m(X) − log(I − X)|| ≤ |r_m(||X||) − log(1 − ||X||)|.
• r_m evaluated using the partial fraction expansion
    r_m(x) = sum_{j=1}^{m} α_j^{(m)} x / (1 + β_j^{(m)} x),
  which is fast and accurate (H, 2001). A sketch of the square root iteration follows.
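A minimal MATLAB sketch of the square root stage, assuming a fixed iteration cap and tolerance in place of the algorithm's specified-accuracy test; sqrtm_db is a hypothetical helper.

function X = sqrtm_db(A)
%SQRTM_DB  Product form of the Denman-Beavers iteration for the
%          principal square root of A (no eigenvalues on R^-).
n = size(A,1);
M = A; X = A;
for k = 1:50
    Minv = inv(M);
    X = X*(eye(n) + Minv)/2;          % Y_{k+1} = Y_k (I + M_k^{-1})/2
    M = (eye(n) + (M + Minv)/2)/2;    % M_{k+1} = (1/2)[I + (M_k + M_k^{-1})/2]
    if norm(M - eye(n),inf) <= 1e-13, break, end
end

Note that only matrix multiplications and inversions are used, in keeping with the transformation-free design.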
Matrix pth Root

Square root: Björck & Hammarling (1983). Compute the Schur decomposition A = QTQ* and then solve R^2 = T by
  r_ii = t_ii^{1/2},  r_ij = (t_ij − sum_{k=i+1}^{j−1} r_ik r_kj)/(r_ii + r_jj).
Extended to pth roots by Smith (2003), via a much more complicated recurrence. These algorithms
• have essentially optimal numerical stability;
• generalize to the real Schur decomposition.

Matrix Cosine

Algorithm (Serbin & Blalock, 1980). Given A ∈ R^{n×n} and a parameter α > 0, this algorithm approximates cos(A):
  Choose m such that 2^{−m}||A|| ≈ α.
  C_0 = Taylor or Padé approximation to cos(A/2^m).
  for i = 0: m − 1
      C_{i+1} = 2C_i^2 − I    % double angle: cos(2θ) = 2 cos^2(θ) − 1
  end
Open questions: how to choose m (i.e., α)? Which approximation? What is the effect of rounding errors?

Alg of H & Smith (2002)

• Initial argument reduction and balancing to reduce the norm.
• [8/8] Padé approximation, proved fully accurate in IEEE double precision if ||A||_∞ ≤ 1; more economical than a Taylor series.
• "Schoolboy" evaluation of r_8(A).
• Total cost: (4 + ⌈log2(||A||_∞)⌉)M + D, where M and D are the costs of a matrix multiplication and a matrix division.
• Error analysis gives a bound containing terms 4.1^m and norms of the intermediate C_i.

Numerical Stability

Is ||f̂ − f|| consistent with the conditioning of the problem? Is f̂ = f(A + E) with E "small", i.e., is the residual f^{-1}(f̂) − A "small"? This is unclear for all the algorithms discussed, except that the answer is "yes" for A^{1/p}.
• We currently lack characterizations of when an f(A) problem is ill conditioned for nonnormal A.

OUTLINE
  Definitions of f(A)
  Applications
  Algorithms for particular f
▶ Schur–Parlett algorithm for general f
  Computing f(A)b

Similarity Transformations

Can use the formula
  A = XBX^{-1}  ⇒  f(A) = X f(B) X^{-1},
provided f(B) is easily computable, e.g., B = diag(λ_i) if A is diagonalizable.
Problem: any error ΔB in f(B) is magnified by up to κ(X) = ||X|| ||X^{-1}|| ≥ 1.
Prefer to work with unitary X: thus we can use the eigendecomposition (diagonal B) when A is normal (AA* = A*A) and the Schur decomposition (triangular B) in general.

Example: Eigendecomposition

function F = funm_ev(A,fun)
%FUNM_EV  Evaluate f(A) from an eigendecomposition of A.
%         Inaccurate when V is ill conditioned.
[V,D] = eig(A);
F = V * diag(feval(fun,diag(D))) / V;

>> A = [3 -1; 1 1]; X = funm_ev(A,@sqrt)
X =
  1.7678e+000  -3.5355e-001
  3.5355e-001   1.0607e+000
>> norm(A-X^2)
ans =
  9.9519e-009    % cond(V) = 9.4e7
>> Y = sqrtm(A); norm(A-Y^2)
ans =
  6.4855e-016

Parlett's Recurrence

The Schur decomposition A = QTQ* reduces the problem to F = f(T), T upper triangular. f_ii = f(t_ii) is immediate. Parlett (1976): from FT = TF we obtain the recurrence
  f_ij = t_ij (f_ii − f_jj)/(t_ii − t_jj) + sum_{k=i+1}^{j−1} (f_ik t_kj − t_ik f_kj)/(t_ii − t_jj).
Used in MATLAB's funm. Fails when T has repeated eigenvalues.

Parlett vs. Björck & Hammarling

The Parlett recurrence is not "optimal", as is clear from the square root case, where f_12 is obtained from
  Parlett: a_12 (a_11^{1/2} − a_22^{1/2})/(a_11 − a_22);
  B & H:   a_12/(a_11^{1/2} + a_22^{1/2}).
The two are equal in exact arithmetic, but the first suffers cancellation when a_11 ≈ a_22.

Schur–Parlett Algorithm

H & Davies (2002):
1. Compute the Schur decomposition A = QTQ*.
2. Re-order T to block triangular form in which the eigenvalues within a block are "close" and those of separate blocks are "well separated".
3. Evaluate F_ii = f(T_ii).
4. Solve the Sylvester equations
     T_ii F_ij − F_ij T_jj = F_ii T_ij − T_ij F_jj + sum_{k=i+1}^{j−1} (F_ik T_kj − T_ik F_kj).
5. Undo the unitary transformations.

Function of Atomic Block

Assume f has a Taylor series with infinite radius of convergence and that derivatives are available. For a diagonal block T use T = σI + M, σ = trace(T)/n:
  f(T) = sum_{k=0}^{∞} (f^{(k)}(σ)/k!) M^k.
Truncate the series based on a strict error bound, not on the size of the terms. NB: for n = 2,
  M = [ε α; 0 −ε]  ⇒  M^{2k} = [ε^{2k} 0; 0 ε^{2k}],  M^{2k+1} = [ε^{2k+1} αε^{2k}; 0 −ε^{2k+1}],
so a small term can be followed by a much larger one when α is large.

Features of Algorithm

• Costs O(n^3) flops, or up to n^4/3 flops if large blocks are needed (close, repeated eigenvalues).
• Needs derivatives if block size > 1: the price to pay for treating general f and nonnormal A.
• The best general f(A) algorithm, and a benchmark for comparing other f(A) algorithms, general and specific.
• The basis of a new funm for the next MATLAB release. For contrast, a sketch of the point recurrence that it refines appears below.
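Here is a minimal MATLAB sketch of the point form of Parlett's recurrence; parlett is a hypothetical helper that breaks down when T has repeated diagonal entries, which is exactly what the blocking above cures. Given [Q,T] = schur(A,'complex'), f(A) = Q*parlett(T,fun)*Q'.

function F = parlett(T,fun)
%PARLETT  Point form of Parlett's recurrence for F = f(T), with T
%         upper triangular and having distinct diagonal entries.
n = size(T,1);
F = diag(feval(fun,diag(T)));         % f_ii = f(t_ii)
for j = 2:n                           % fill F a column at a time
    for i = j-1:-1:1
        s = T(i,j)*(F(i,i) - F(j,j));
        for k = i+1:j-1
            s = s + F(i,k)*T(k,j) - T(i,k)*F(k,j);
        end
        F(i,j) = s/(T(i,i) - T(j,j)); % fails if t_ii = t_jj
    end
end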
OUTLINE
  Definitions of f(A)
  Applications
  Algorithms for particular f
  Schur–Parlett algorithm for general f
▶ Computing f(A)b

log(A)b

Apply a quadrature rule ∫_0^1 f(t) dt ≈ sum_{k=1}^{m} c_k f(t_k) to (Wouk, 1965)
  log A = ∫_0^1 (A − I)[t(A − I) + I]^{-1} dt.
Combine with the Hessenberg reduction A = QHQ^T to get
  (log A)b ≈ Q sum_{k=1}^{m} c_k [t_k(H − I) + I]^{-1} d,  d = Q^T(A − I)b.
Costs (10/3)n^3 + 2mn^2 flops.
When ||I − A|| < 1 we can use m-point Gauss–Legendre quadrature ≡ Padé approximation! Choose m using the Kenney & Laub (1989) bound
  ||r_mm(X) − log(I + X)|| ≤ |r_mm(−||X||) − log(1 − ||X||)|.
When ||I − A|| > 1, use adaptive quadrature.

A^α b

The initial value problem
  dy/dt = α(A − I)[t(A − I) + I]^{-1} y,  y(0) = b,
has the unique solution y(t) = [t(A − I) + I]^α b, so y(1) = A^α b. Used by Allen, Baglama & Boyd (2000) for α = 1/2 and symmetric positive definite A.

Example using MATLAB's ode45 with A = gallery('parter',64), b = randn(64,1):

  f(A)      tol    Succ. steps  Fail. atts  f evals  Rel. err
  A^{-1/2}  1e-3        12           0         73     3.5e-8
            1e-6        14           0         85     6.0e-9
            1e-9        40           0        241     7.7e-12
  A^{2/5}   1e-3        15           0         79     2.8e-8
            1e-6        16           0         91     2.4e-9
            1e-9        54           0        325     1.8e-12

Interpolation

If A has distinct eigenvalues λ_j then the Lagrange interpolating polynomial gives
  f(A)b = sum_{j=1}^{n} f(λ_j) ℓ_j(A) b,  ℓ_j(x) = prod_{k≠j} (x − λ_k) / prod_{k≠j} (λ_j − λ_k).
Cost: O(n^4) flops.
For any A, the Newton divided difference form is
  f(A)b = sum_{i=1}^{n} c_i prod_{j=1}^{i−1} (A − λ_j I) b,
where the c_i are (confluent) divided differences. Requires derivatives of f. Cost: O(n^3) flops.

Cauchy Integral Theorem

  y = (1/(2πi)) ∫_Γ f(z)(zI − A)^{-1} b dz =: ∫_Γ g(z) dz.
Take the circle Γ: z − α = βe^{iθ}, 0 ≤ θ ≤ 2π, and apply the repeated trapezium rule:
  ∫_Γ g(z) dz = i ∫_0^{2π} (z(θ) − α) g(z(θ)) dθ ≈ (2πi/n) sum_{k=0}^{n−1} (z_k − α) g(z_k),
where z_k − α = βe^{2πik/n}. Use the Hessenberg reduction, as before. (A sketch follows the error discussion below.)

Euler–Maclaurin Error Bound

For h(x) of period 2π, in C^{2k+1}(−∞, ∞), with |h^{(2k+1)}(x)| ≤ M:
  | ∫_0^{2π} h(x) dx − T(f) | ≤ 4πM ζ(2k + 1)/n^{2k+1}.
• h^{(2k+1)}(x) is proportional to β^{2k+2}, where β is the radius of the circle.
• h^{(2k+1)}(x) contains powers of the resolvent (z(θ)I − A)^{-1}: bad if the contour is close to some λ_i or A is highly nonnormal.
• h^{(2k+1)}(x) contains derivatives of f on the contour.
Conclusion: the approach is restricted to matrices that are not too nonnormal, whose λ_i can be enclosed in a circle of small radius that is not close to singularities of the derivatives of f.
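A minimal MATLAB sketch of the trapezium rule approximation above, before any Hessenberg reduction; funmb_trap and its parameters alpha, beta, npts are hypothetical, and a practical code would reduce A to Hessenberg form so that each solve costs O(n^2) flops.

function y = funmb_trap(f, A, b, alpha, beta, npts)
%FUNMB_TRAP  Approximate f(A)b by the repeated trapezium rule on the
%            circle z = alpha + beta*exp(i*theta) enclosing lambda(A).
n = size(A,1);
y = zeros(n,1);
for k = 0:npts-1
    zk = alpha + beta*exp(2i*pi*k/npts);   % quadrature point z_k
    y = y + (zk - alpha)*f(zk)*((zk*eye(n) - A)\b);
end
y = y/npts;   % the 2*pi*i of the rule cancels the 1/(2*pi*i) factor

For real A and real f one would take real(y); as the error bound above predicts, accuracy degrades when the circle passes near an eigenvalue of A or near a singularity of f.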
Future Work

• Theory and algorithms for non-primary functions, perhaps linked to an f(A(t)) application.
• Better understanding of the conditioning of f(A).
• Exploiting structure, e.g., A in a matrix automorphism group (H, Mackey, Mackey & Tisseur, 2003).

http://www.ma.man.ac.uk/~higham/