Computing Matrix Functions by Iteration: Convergence, Stability and the Role of Padé Approximants

Nick Higham
School of Mathematics
The University of Manchester
[email protected]
http://www.ma.man.ac.uk/~higham/

Approximation and Iterative Methods (AMI06), on the occasion of the retirement of Professor Claude Brezinski, Lille, June 22–23, 2006

What is Claude Photographing?

Overview

Iterations $X_{k+1} = g(X_k)$ for computing $\sqrt{A}$, $\mathrm{sign}(A)$ ($\sqrt[p]{A}$, unitary polar factor, ...).
Padé iterations for $\mathrm{sign}(A)$.
Avoid Jordan form in convergence proofs. Reduce to: Does $G^k$ tend to zero?
Can reduce stability to: Is $H^k$ bounded?
Connections: matrix sign and matrix square root.
Convergence of a linear iteration for $\sqrt{A}$.
Book in preparation: Functions of a Matrix: Theory and Computation

MIMS Nick Higham Matrix functions

Matrix Sign Function

Let $A \in \mathbb{C}^{n\times n}$ have no pure imaginary eigenvalues and let $A = ZJZ^{-1}$ be a Jordan canonical form with
\[
J = \begin{bmatrix} J_1 & 0 \\ 0 & J_2 \end{bmatrix}, \qquad J_1 \in \mathbb{C}^{p\times p}, \quad J_2 \in \mathbb{C}^{q\times q}, \qquad \Lambda(J_1) \subseteq \mathrm{LHP}, \quad \Lambda(J_2) \subseteq \mathrm{RHP}.
\]
\[
\mathrm{sign}(A) = Z \begin{bmatrix} -I_p & 0 \\ 0 & I_q \end{bmatrix} Z^{-1}.
\]
\[
\mathrm{sign}(A) = A (A^2)^{-1/2}.
\]
\[
\mathrm{sign}(A) = \frac{2}{\pi} A \int_0^{\infty} (t^2 I + A^2)^{-1} \, dt.
\]
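The definition above can be tried out numerically. The following is only a minimal sketch: it assumes a diagonalizable matrix (so an eigendecomposition stands in for the Jordan form), and the helper name `sign_via_eig` and the test matrix are illustrative choices, not from the talk. It also checks the identity $\mathrm{sign}(A) = A(A^2)^{-1/2}$ on this one example.

```python
import numpy as np

def sign_via_eig(A):
    """sign(A) via an eigendecomposition.

    Assumption: A is diagonalizable with no purely imaginary eigenvalues,
    so A = Z diag(lam) Z^{-1} replaces the Jordan canonical form.
    """
    lam, Z = np.linalg.eig(A)
    if np.any(np.isclose(lam.real, 0.0)):
        raise ValueError("sign(A) is undefined: eigenvalue on the imaginary axis")
    # (Z * s) scales column j of Z by s[j], i.e. it forms Z @ diag(s).
    return (Z * np.sign(lam.real)) @ np.linalg.inv(Z)

# Illustrative matrix with one eigenvalue in each half-plane.
A = np.array([[4.0, 1.0], [2.0, -3.0]])
S = sign_via_eig(A)

# Principal square root of A^2, again through the eigendecomposition of A.
lam, Z = np.linalg.eig(A)
sqrtA2 = (Z * np.sqrt(lam.astype(complex) ** 2)) @ np.linalg.inv(Z)
S_alt = A @ np.linalg.inv(sqrtA2)   # A (A^2)^{-1/2}

print(np.linalg.norm(S @ S - np.eye(2)))   # S is involutory: S^2 = I
print(np.linalg.norm(S_alt - S))           # the two formulas agree
```

An eigendecomposition is used here purely for checking; it is not a recommended way to compute $\mathrm{sign}(A)$ when $Z$ is ill conditioned.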
Newton's Method for Sign

\[
X_{k+1} = \tfrac{1}{2}\left(X_k + X_k^{-1}\right), \qquad X_0 = A.
\]
Convergence
Let $S := \mathrm{sign}(A)$, $G := (A - S)(A + S)^{-1}$. Then
\[
X_k = \left(I - G^{2^k}\right)^{-1}\left(I + G^{2^k}\right) S.
\]
Eigenvalues of $G$ are $(\lambda_i - \mathrm{sign}(\lambda_i))/(\lambda_i + \mathrm{sign}(\lambda_i))$.
Hence $\rho(G) < 1$ and $G^k \to 0$.
Easy to show
\[
\|X_{k+1} - S\| \le \tfrac{1}{2} \|X_k^{-1}\| \, \|X_k - S\|^2.
\]

Matrix Square Root

$X$ is a square root of $A \in \mathbb{C}^{n\times n}$ $\iff$ $X^2 = A$.
Number of square roots may be zero, finite or infinite.
For $A$ with no eigenvalues on $\mathbb{R}^- = \{x \in \mathbb{R} : x \le 0\}$, denote by $A^{1/2}$ the principal square root: the unique square root with spectrum in the open right half-plane.

Newton's Method for Square Root

Newton's method: $X_0$ given,
\[
\text{Solve } X_k E_k + E_k X_k = A - X_k^2, \qquad X_{k+1} = X_k + E_k, \qquad k = 0, 1, 2, \dots
\]
Assume $AX_0 = X_0A$. Then can show
\[
X_{k+1} = \tfrac{1}{2}\left(X_k + X_k^{-1} A\right). \qquad (*)
\]
For nonsingular $A$, local quadratic convergence of full Newton to a primary square root.
To which square root do the iterations converge?
$(*)$ can converge when full Newton breaks down.
Lack of symmetry in $(*)$.

Convergence (Jordan)

Assume $X_0 = p(A)$ for some polynomial $p$. Let $Z^{-1}AZ = J$ be a Jordan canonical form and set $Z^{-1}X_kZ = Y_k$. Then
\[
Y_{k+1} = \tfrac{1}{2}\left(Y_k + Y_k^{-1} J\right), \qquad Y_0 = J.
\]
Convergence of the diagonal of $Y_k$ reduces to the scalar case:
\[
\text{Heron:} \quad y_{k+1} = \frac{1}{2}\left(y_k + \frac{\lambda}{y_k}\right), \qquad y_0 = \lambda.
\]
Can show that the off-diagonal converges.
Problem: the analysis does not generalize to $X_0A = AX_0$! $X_0$ is not necessarily a polynomial in $A$.

Convergence (via Sign)

Theorem
Let $A \in \mathbb{C}^{n\times n}$ have no eigenvalues on $\mathbb{R}^-$. The Newton square root iterates $X_k$ with $X_0A = AX_0$ are related to the Newton sign iterates
\[
S_{k+1} = \tfrac{1}{2}\left(S_k + S_k^{-1}\right), \qquad S_0 = A^{-1/2} X_0
\]
by $X_k \equiv A^{1/2} S_k$. Hence, provided $A^{-1/2}X_0$ has no pure imaginary eigenvalues, the $X_k$ are defined and $X_k \to A^{1/2}\,\mathrm{sign}(S_0)$ quadratically.

$\Rightarrow$: $X_k \to A^{1/2}$ if $\Lambda(A^{-1/2}X_0) \subseteq \mathrm{RHP}$, e.g., if $X_0 = A$.
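The relation $X_k \equiv A^{1/2} S_k$ in the theorem above can be checked on a small example. This is a sketch under stated assumptions: the symmetric positive definite test matrix, the fixed step count and the function names are illustrative, and the converged Newton iterate is used as a stand-in for $A^{1/2}$.

```python
import numpy as np

def newton_sqrt_iterates(A, steps):
    """Iterates of X_{k+1} = (X_k + X_k^{-1} A)/2 with X_0 = A."""
    X = A.copy()
    iterates = [X]
    for _ in range(steps):
        X = 0.5 * (X + np.linalg.solve(X, A))   # solve(X, A) = X^{-1} A
        iterates.append(X)
    return iterates

def newton_sign_iterates(S0, steps):
    """Iterates of S_{k+1} = (S_k + S_k^{-1})/2 from a given S_0."""
    S = S0.copy()
    iterates = [S]
    for _ in range(steps):
        S = 0.5 * (S + np.linalg.inv(S))
        iterates.append(S)
    return iterates

# Illustrative well-conditioned SPD matrix (eigenvalue ratio about 1.9).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
X = newton_sqrt_iterates(A, 10)
sqrtA = X[-1]                                   # numerical stand-in for A^{1/2}

# S_0 = A^{-1/2} X_0 with X_0 = A.
S = newton_sign_iterates(np.linalg.solve(sqrtA, A), 10)

# Largest deviation from X_k = A^{1/2} S_k over all computed iterates.
relation_gap = max(np.linalg.norm(Xk - sqrtA @ Sk) for Xk, Sk in zip(X, S))
print(relation_gap)
print(np.linalg.norm(sqrtA @ sqrtA - A))        # residual of the square root
```

Here $S_0 = A^{1/2}$ has spectrum in the right half-plane, so the sign iterates tend to $I$ and the square root iterates tend to $A^{1/2}$, as the theorem predicts for $X_0 = A$.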
More Sign Iterations

Start with Newton:
\[
x_{k+1} = \tfrac{1}{2}\left(x_k + x_k^{-1}\right).
\]
Invert:
\[
x_{k+1} = \frac{2x_k}{x_k^2 + 1}.
\]
Halley:
\[
x_{k+1} = \frac{x_k(3 + x_k^2)}{1 + 3x_k^2}.
\]
Newton–Schulz:
\[
x_{k+1} = \frac{x_k}{2}\left(3 - x_k^2\right).
\]

The Padé Family: Derivation

For non-pure imaginary $z \in \mathbb{C}$, with $\xi := 1 - z^2$,
\[
\mathrm{sign}(z) = \frac{z}{\sqrt{z^2}} = \frac{z}{\sqrt{1 - (1 - z^2)}} = \frac{z}{\sqrt{1 - \xi}}.
\]
The $[\ell/m]$ Padé approximant $r_{\ell m}(\xi) = p_{\ell m}(\xi)/q_{\ell m}(\xi)$ to $h(\xi) = (1 - \xi)^{-1/2}$ is known explicitly.
Kenney & Laub (1991): set up iterations
\[
x_{k+1} = f_{\ell m}(x_k) := x_k \, \frac{p_{\ell m}(1 - x_k^2)}{q_{\ell m}(1 - x_k^2)}, \qquad x_0 = a.
\]

The Padé Family: Examples

\[
\begin{array}{c|ccc}
\ell & m = 0 & m = 1 & m = 2 \\ \hline
0 & x & \dfrac{2x}{1 + x^2} & \dfrac{8x}{3 + 6x^2 - x^4} \\[2ex]
1 & \dfrac{x}{2}(3 - x^2) & \dfrac{x(3 + x^2)}{1 + 3x^2} & \dfrac{4x(1 + x^2)}{1 + 6x^2 + x^4} \\[2ex]
2 & \dfrac{x}{8}(15 - 10x^2 + 3x^4) & \dfrac{x}{4}\,\dfrac{15 + 10x^2 - x^4}{1 + 5x^2} & \dfrac{x(5 + 10x^2 + x^4)}{1 + 10x^2 + 5x^4}
\end{array}
\]

The Padé Family: Convergence

Theorem (Kenney & Laub)
Let $A \in \mathbb{C}^{n\times n}$ have no pure imaginary eigenvalues. Let $\ell + m > 0$.
(a) For $\ell \ge m - 1$, if $\|I - A^2\| < 1$ then $X_k \to \mathrm{sign}(A)$ as $k \to \infty$ and
\[
\|I - X_k^2\| < \|I - A^2\|^{(\ell+m+1)^k}.
\]
(b) For $\ell = m - 1$ and $\ell = m$,
\[
(S - X_k)(S + X_k)^{-1} = \left[(S - A)(S + A)^{-1}\right]^{(\ell+m+1)^k}
\]
and hence $X_k \to \mathrm{sign}(A)$ as $k \to \infty$.
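Part (b) of the theorem above can be observed numerically for the Halley iteration, which is the $\ell = m = 1$ member of the family, so $\ell + m + 1 = 3$. The sketch below is an illustration only: the test matrix is an arbitrary choice, and the reference $S$ is obtained from an eigendecomposition, which assumes a diagonalizable matrix.

```python
import numpy as np

def halley_step(X):
    """One step of the [1/1] Pade (Halley) iteration X (3I + X^2)(I + 3X^2)^{-1}."""
    I = np.eye(X.shape[0])
    X2 = X @ X
    # All factors are rational functions of X, so they commute with each other.
    return X @ (3 * I + X2) @ np.linalg.inv(I + 3 * X2)

# Illustrative matrix with one eigenvalue in each half-plane.
A = np.array([[4.0, 1.0], [2.0, -3.0]])
lam, Z = np.linalg.eig(A)
S = (Z * np.sign(lam.real)) @ np.linalg.inv(Z)   # reference sign(A)

G = (S - A) @ np.linalg.inv(S + A)
X = A.copy()
gaps = []
for k in range(1, 4):
    X = halley_step(X)
    lhs = (S - X) @ np.linalg.inv(S + X)
    rhs = np.linalg.matrix_power(G, 3 ** k)      # exponent (l + m + 1)^k = 3^k
    gaps.append(np.linalg.norm(lhs - rhs))

sign_error = np.linalg.norm(X - S)
print(gaps)         # each gap should be at rounding-error level
print(sign_error)   # error after three cubically convergent steps
```

Because $\rho(G) < 1$, the right-hand side $G^{3^k}$ shrinks very quickly, which is what drives $X_k \to S$.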
Principal Padé Iterations

\[
\begin{array}{c|ccc}
\ell & m = 0 & m = 1 & m = 2 \\ \hline
0 & x & \dfrac{2x}{1 + x^2} & \dfrac{8x}{3 + 6x^2 - x^4} \\[2ex]
1 & \dfrac{x}{2}(3 - x^2) & \dfrac{x(3 + x^2)}{1 + 3x^2} & \dfrac{4x(1 + x^2)}{1 + 6x^2 + x^4} \\[2ex]
2 & \dfrac{x}{8}(15 - 10x^2 + 3x^4) & \dfrac{x}{4}\,\dfrac{15 + 10x^2 - x^4}{1 + 5x^2} & \dfrac{x(5 + 10x^2 + x^4)}{1 + 10x^2 + 5x^4}
\end{array}
\]

Principal Padé Iteration Properties

Define $g_r(x)$ by taking entries from the main diagonal and superdiagonal of the Padé table in zig-zag fashion.

Theorem (Kenney & Laub)
(a) $g_r(x) = \dfrac{(1 + x)^r - (1 - x)^r}{(1 + x)^r + (1 - x)^r}$.
(b) $g_r(x) = \tanh(r \,\mathrm{arctanh}(x))$.
(c) $g_r(g_s(x)) = g_{rs}(x)$ (semigroup property).
(d) $g_r(x) = \dfrac{2}{r} \displaystyle\sum_{i=0}^{\lceil (r-2)/2 \rceil}{}^{\!\prime} \; \dfrac{x}{\sin^2\!\left(\frac{(2i+1)\pi}{2r}\right) + \cos^2\!\left(\frac{(2i+1)\pi}{2r}\right) x^2}$, where the prime means that the last term of the sum is halved when $r$ is odd.

[Figure: $g_r(x) = \tanh(r \,\mathrm{arctanh}(x))$ for $r = 2, 4, 8, 16$, plotted for $x \in [-3, 3]$ with vertical range $[-1, 1]$.]

Class of Square Root Iterations

Theorem (Higham, Mackey, Mackey & Tisseur, 2005)
Suppose the iteration $X_{k+1} = X_k h(X_k^2)$, $X_0 = A$ converges to $\mathrm{sign}(A)$ with order $m$. If $\Lambda(A) \cap \mathbb{R}^- = \emptyset$ and
\[
Y_{k+1} = Y_k h(Z_k Y_k), \quad Y_0 = A, \qquad Z_{k+1} = h(Z_k Y_k) Z_k, \quad Z_0 = I,
\]
then $Y_k \to A^{1/2}$ and $Z_k \to A^{-1/2}$ as $k \to \infty$ with order $m$.

Proof makes use of
\[
\mathrm{sign}\left(\begin{bmatrix} 0 & A \\ I & 0 \end{bmatrix}\right) = \begin{bmatrix} 0 & A^{1/2} \\ A^{-1/2} & 0 \end{bmatrix}.
\]

Examples

Newton sign:
\[
X_{k+1} = X_k \cdot \tfrac{1}{2}\left(I + (X_k^2)^{-1}\right) \equiv X_k h(X_k^2), \qquad X_0 = A.
\]
\[
Y_{k+1} = \tfrac{1}{2} Y_k \left(I + (Z_k Y_k)^{-1}\right) = \tfrac{1}{2}\left(Y_k + Z_k^{-1}\right), \qquad Y_0 = A.
\]
Denman–Beavers square root iteration:
\[
Y_{k+1} = \tfrac{1}{2}\left(Y_k + Z_k^{-1}\right), \quad Y_0 = A, \qquad
Z_{k+1} = \tfrac{1}{2}\left(Z_k + Y_k^{-1}\right), \quad Z_0 = I.
\]
Newton–Schulz square root iteration:
\[
Y_{k+1} = \tfrac{1}{2} Y_k (3I - Z_k Y_k), \quad Y_0 = A, \qquad
Z_{k+1} = \tfrac{1}{2} (3I - Z_k Y_k) Z_k, \quad Z_0 = I.
\]

History of Newton Sqrt Instability

\[
X_{k+1} = \tfrac{1}{2}\left(X_k + X_k^{-1} A\right).
\]
Instability of Newton noted by Laasonen (1958): “Newton’s method if carried out indefinitely, is not stable whenever the ratio of the largest to the smallest eigenvalue of A exceeds the value 9.”
Described informally by Blackwell (1985) in Mathematical People: Profiles and Interviews.
Analyzed by Higham (1986) for diagonalizable $A$ by deriving “error amplification factors”.

Stability

Definition
The iteration $X_{k+1} = g(X_k)$ is stable in a neighborhood of a fixed point $X$ if the Fréchet derivative $dg_X$ has bounded powers.

For scalars and superlinear $g$, $dg_X = g'(x) = 0$.

Let $X_0 = X + E_0$, $E_k := X_k - X$. Then
\[
X_{k+1} = g(X_k) = g(X + E_k) = g(X) + dg_X(E_k) + o(\|E_k\|).
\]
So, since $g(X) = X$,
\[
E_{k+1} = dg_X(E_k) + o(\|E_k\|).
\]
If $\|dg_X^i(E)\| \le c\|E\|$ for all $i$, then recurring leads to
\[
\|E_k\| \le c\|E_0\| + kc \cdot o(\|E_0\|).
\]

Stability of Newton Square Root

\[
g(X) = \tfrac{1}{2}\left(X + X^{-1} A\right).
\]
\[
dg_X(E) = \tfrac{1}{2}\left(E - X^{-1} E X^{-1} A\right).
\]
Relevant fixed point: $X = A^{1/2}$.
\[
dg_{A^{1/2}}(E) = \tfrac{1}{2}\left(E - A^{-1/2} E A^{1/2}\right).
\]
Eigenvalues of $dg_{A^{1/2}}$ are
\[
\tfrac{1}{2}\left(1 - \lambda_i^{-1/2} \lambda_j^{1/2}\right), \qquad i, j = 1\colon n.
\]
For stability we need
\[
\max_{i,j} \tfrac{1}{2}\left|1 - \lambda_i^{-1/2} \lambda_j^{1/2}\right| < 1.
\]
For Hermitian positive definite $A$, need $\kappa_2(A) < 9$.

Advantages

Uses only the Fréchet derivative of $g$.
No additional assumptions on $A$.
Perturbation analysis is all in the definition.
General, unifying approach.
Facilitates analysis of families of iterations.

Stability of Sign Iterations (Higham)

Theorem
Let $X_{k+1} = g(X_k)$ be any superlinearly convergent iteration for $S = \mathrm{sign}(X_0)$. Then
\[
dg_S(E) = L_S(E) = \tfrac{1}{2}(E - SES),
\]
where $L_S$ is the Fréchet derivative of the matrix sign function at $S$. Hence $dg_S$ is idempotent ($dg_S \circ dg_S = dg_S$) and the iteration is stable.

“All” sign iterations are automatically stable.

Theorem (Higham, Mackey, Mackey & Tisseur, 2005)
Consider the iteration function
\[
G(Y, Z) = \begin{bmatrix} Y h(ZY) \\ h(ZY) Z \end{bmatrix},
\]
where $X_{k+1} = X_k h(X_k^2)$ is any superlinearly convergent iteration for $\mathrm{sign}(X_0)$. Any pair $P = (B, B^{-1})$ is a fixed point for $G$, and the Fréchet derivative of $G$ at $P$ is
\[
dG_P(E, F) = \frac{1}{2} \begin{bmatrix} E - BFB \\ F - B^{-1} E B^{-1} \end{bmatrix}.
\]
$dG_P$ is idempotent and hence the iteration is stable.

In particular: the Denman–Beavers iteration is stable.

Binomial Iteration for $(I - C)^{1/2}$

Suppose $\rho(C) < 1$. Define $(I - C)^{1/2} =: I - P$ and square to obtain $P^2 - 2P = -C$.
Suggests:
\[
P_{k+1} = \tfrac{1}{2}\left(C + P_k^2\right), \qquad P_0 = 0.
\]
$I - P_k$ reproduces
\[
(I - C)^{1/2} = \sum_{j=0}^{\infty} \binom{1/2}{j} (-C)^j \equiv I - \sum_{j=1}^{\infty} \alpha_j C^j, \qquad (\alpha_j > 0)
\]
up to terms of order $C^k$.

Binomial Iteration Convergence (1)

\[
P_{k+1} = \tfrac{1}{2}\left(C + P_k^2\right), \qquad P_0 = 0.
\]
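As a rough numerical illustration of the binomial iteration stated above, the sketch below runs $P_{k+1} = \tfrac{1}{2}(C + P_k^2)$ from $P_0 = 0$ and checks that $I - P_k$ squares to $I - C$. The nonnegative test matrix (spectral radius about 0.33), the fixed step count and the function name are assumptions made for the example only.

```python
import numpy as np

def binomial_iteration(C, steps):
    """Run P_{k+1} = (C + P_k^2)/2 from P_0 = 0 and return the final P_k."""
    P = np.zeros_like(C)
    for _ in range(steps):
        P = 0.5 * (C + P @ P)
    return P

# Illustrative nonnegative matrix with spectral radius below 1.
C = np.array([[0.2, 0.1], [0.3, 0.1]])
rho = max(abs(np.linalg.eigvals(C)))

P = binomial_iteration(C, 60)
I = np.eye(2)
residual = np.linalg.norm((I - P) @ (I - P) - (I - C))

print(rho)        # spectral radius of C
print(residual)   # how well I - P approximates a square root of I - C
```

Only matrix multiplications are needed, with no inverses or linear solves, which is the practical appeal of this iteration; the price is that convergence is linear rather than quadratic.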
Theorem
Let $C \in \mathbb{R}^{n\times n}$ satisfy $C \ge 0$ and $\rho(C) < 1$ and write $(I - C)^{1/2} = I - P$. Then $P_k \to P$ with
\[
0 \le P_k \le P_{k+1} \le P, \qquad k \ge 0.
\]
Let $X_k = P_k/2$. Then $X_{k+1} = X_k^2 + C/4$.

Binomial Iteration Convergence (2)

Theorem (Higham)
Let the eigenvalues $\lambda$ of $C \in \mathbb{C}^{n\times n}$ lie in the cardioid
\[
\{\, 2z - z^2 : z \in \mathbb{C},\ |z| < 1 \,\}.
\]
Then $(I - C)^{1/2} =: I - P$ exists and $P_k \to P$ linearly.

[Figure: the cardioid in the complex plane; real axis from $-4$ to $2$, imaginary axis from $-2$ to $2$.]

Paradox

$(I - C)^{1/2} = \sum_{j=0}^{\infty} \binom{1/2}{j} (-C)^j$ has radius of convergence 1.
$P_{k+1} = \tfrac{1}{2}(C + P_k^2)$, $P_0 = 0$: $P_k$ reproduces the first $k + 1$ terms of the power series.
$P_k$ converges in the cardioid $\supseteq$ unit disc.

Conclusions

◮ Stability is equivalent to power boundedness of the Fréchet derivative.
◮ Better understanding of convergence analysis (prefer matrix powers to Jordan form).
◮ Matrix sign function is fundamental and its connections with the square root can be exploited.
◮ Padé iterations for sign have many interesting properties.
◮ Analysis of linearly convergent iterations can be tricky.

http://www.ma.man.ac.uk/~higham/

What is Claude Photographing?