Computing Matrix Functions by Iteration

Computing Matrix Functions by Iteration:
Convergence, Stability and the Role of
Padé Approximants
Nick Higham
School of Mathematics
The University of Manchester
[email protected]
http://www.ma.man.ac.uk/~higham/
Approximation and Iterative Methods (AMI06) on the
occasion of the retirement of
Professor Claude Brezinski, Lille, June 22–23, 2006
What is Claude Photographing?
Overview
Iterations Xk +1 = g(Xk ) for computing
√
p
( A, unitary polar factor, . . . ).
√
A, sign(A)
Padé iterations for sign(A).
Avoid Jordan form in convergence proofs.
Reduce to: Does Gk tend to zero?
Can reduce stability to: Is H k bounded?
Connections: matrix sign and matrix square root.
√
Convergence of a linear iteration for A.
MIMS
Nick Higham
Matrix functions
3 / 33
Overview
Iterations Xk +1 = g(Xk ) for computing
√
p
( A, unitary polar factor, . . . ).
√
A, sign(A)
Padé iterations for sign(A).
Avoid Jordan form in convergence proofs.
Reduce to: Does Gk tend to zero?
Can reduce stability to: Is H k bounded?
Connections: matrix sign and matrix square root.
√
Convergence of a linear iteration for A.
Book in preparation:
Functions of a Matrix: Theory and Computation
MIMS
Nick Higham
Matrix functions
3 / 33
Matrix Sign Function
Let A ∈ Cn×n have no pure imaginary eigenvalues and let
A = ZJZ −1 be a Jordan canonical form with
p
J=
p
q
J1
0
q
0
,
J2
Λ(J1 ) ∈ LHP,
−Ip
sign(A) = Z
0
MIMS
Nick Higham
Λ(J2 ) ∈ RHP.
0
Z −1 .
Iq
Matrix functions
4 / 33
Matrix Sign Function
Let A ∈ Cn×n have no pure imaginary eigenvalues and let
A = ZJZ −1 be a Jordan canonical form with
p
J=
p
q
J1
0
q
0
,
J2
Λ(J1 ) ∈ LHP,
−Ip
sign(A) = Z
0
Λ(J2 ) ∈ RHP.
0
Z −1 .
Iq
sign(A) = A(A2 )−1/2 .
MIMS
Nick Higham
Matrix functions
4 / 33
Matrix Sign Function
Let A ∈ Cn×n have no pure imaginary eigenvalues and let
A = ZJZ −1 be a Jordan canonical form with
p
J=
p
q
J1
0
q
0
,
J2
Λ(J1 ) ∈ LHP,
−Ip
sign(A) = Z
0
Λ(J2 ) ∈ RHP.
0
Z −1 .
Iq
sign(A) = A(A2 )−1/2 .
2
sign(A) = A
π
MIMS
Nick Higham
Z
∞
(t 2 I + A2 )−1 dt.
0
Matrix functions
4 / 33
Newton’s Method for Sign
Xk +1 = 12 (Xk + Xk−1 ),
X0 = A.
Convergence
Let S := sign(A), G := (A − S)(A + S)−1 . Then
k
k
Xk = (I − G2 )−1 (I + G2 )S,
Ei’vals of G are (λi − sign(λi ))/(λi + sign(λi )).
Hence ρ(G) < 1 and Gk → 0 .
Easy to show
kXk +1 − Sk ≤
MIMS
Nick Higham
1 −1
kXk k kXk − Sk2 .
2
Matrix functions
5 / 33
Matrix Square Root
X is a square root of A ∈ Cn×n ⇐⇒ X 2 = A.
Number of square roots may be zero, finite or infinite.
For A with no eigenvalues on R− = {x ∈ R : x ≤ 0} ,
denote by A1/2 the principal square root: unique square
root with spectrum in open right half-plane.
MIMS
Nick Higham
Matrix functions
6 / 33
Newton’s Method for Square Root
Newton’s method: X0 given,
Solve Xk Ek + Ek Xk = A − Xk2
Xk +1 = Xk + Ek
k = 0, 1, 2, . . .
Assume AX0 = X0 A. Then can show
Xk +1 = 21 (Xk + Xk−1 A).
(∗)
For nonsingular A, local quadratic cgce of full Newton
to a primary square root.
To which square root do the iterations converge?
(∗) can converge when full Newton breaks down.
Lack of symmetry in (∗).
MIMS
Nick Higham
Matrix functions
7 / 33
Convergence (Jordan)
Assume X0 = p(A) for some poly p. Let Z −1 AZ = J be
Jordan canonical form and set Z −1 Xk Z = Yk . Then
Yk +1 = 12 (Yk + Yk−1 J),
Y0 = J.
Convergence of diagonal of Yk reduces to scalar case:
Heron:
yk +1 =
1
2
λ
yk +
,
yk
y0 = λ.
Can show that off-diagonal converges.
Problem: analysis does not generalize to X0 A = AX0 !
X0 not necessarily a polynomial in A.
MIMS
Nick Higham
Matrix functions
8 / 33
Convergence (via Sign)
Theorem
Let A ∈ Cn×n have no e’vals on R− . The Newton square
root iterates Xk with X0 A = AX0 are related to the Newton
sign iterates
Sk +1 =
1
(Sk + Sk−1 ),
2
S0 = A−1/2 X0
by Xk ≡ A1/2 Sk . Hence, provided A−1/2 X0 has no pure
imag e’vals, Xk are defined and Xk → A1/2 sign(S0 )
quadratically.
⇒: Xk → A1/2 if Λ(A−1/2 X0 ) ⊆ RHP, e.g., if X0 = A .
MIMS
Nick Higham
Matrix functions
9 / 33
More Sign Iterations
Start with Newton:
xk +1 =
MIMS
Nick Higham
1
(xk + xk−1 ).
2
Matrix functions
10 / 33
More Sign Iterations
Start with Newton:
xk +1 =
1
(xk + xk−1 ).
2
Invert:
xk +1 =
MIMS
Nick Higham
2xk
.
+1
xk2
Matrix functions
10 / 33
More Sign Iterations
Start with Newton:
xk +1 =
1
(xk + xk−1 ).
2
Invert:
xk +1 =
2xk
.
+1
xk2
Halley:
xk +1 =
MIMS
Nick Higham
xk (3 + xk2 )
.
1 + 3xk2
Matrix functions
10 / 33
More Sign Iterations
Start with Newton:
xk +1 =
1
(xk + xk−1 ).
2
Invert:
xk +1 =
2xk
.
+1
xk2
Halley:
xk +1 =
xk (3 + xk2 )
.
1 + 3xk2
xk +1 =
xk
(3 − xk2 ).
2
Newton–Schulz:
MIMS
Nick Higham
Matrix functions
10 / 33
The Padé Family: Derivation
For non-pure imaginary z ∈ C, ξ := 1 − z 2 ,
z
z
z
sign(z) = √ = p
,
=√
1−ξ
1 − (1 − z 2 )
z2
[ℓ/m] Padé approximant rℓm (ξ) = pℓm (ξ)/qℓm (ξ) to
h(ξ) = (1 − ξ)−1/2 known explicitly.
Kenney & Laub (1991): set up iterations
xk +1 = fℓm (xk ) := xk
MIMS
Nick Higham
pℓm (1 − xk2 )
,
qℓm (1 − xk2 )
x0 = a.
Matrix functions
11 / 33
The Padé Family: Examples
ℓ
m=0
m=1
m=2
0
x
2x
1 + x2
1
x
(3 − x 2 )
2
x(3 + x 2 )
1 + 3x 2
8x
3 + 6x 2 − x 4
2
MIMS
4x(1 + x 2 )
1 + 6x 2 + x 4
x
x (15 + 10x 2 − x 4 ) x(5 + 10x 2 + x 4 )
(15 − 10x 2 + 3x 4 )
8
4
1 + 5x 2
1 + 10x 2 + 5x 4
Nick Higham
Matrix functions
12 / 33
The Padé Family: Convergence
Theorem (Kenney & Laub)
Let A ∈ Cn×n have no pure imag e’vals. Let ℓ + m > 0.
(a) For ℓ ≥ m − 1 , if kI − A2 k < 1 then Xk → sign(A) as
k
k → ∞ and kI − Xk2 k < kI − A2 k(ℓ+m+1) .
(b) For ℓ = m − 1 and ℓ = m ,
(ℓ+m+1)k
(S − Xk )(S + Xk )−1 = (S − A)(S + A)−1
and hence Xk → sign(A) as k → ∞.
MIMS
Nick Higham
Matrix functions
13 / 33
The Padé Family: Examples
ℓ
m=0
m=1
m=2
0
x
2x
1 + x2
8x
3 + 6x 2 − x 4
1
x
(3 − x 2 )
2
x(3 + x 2 )
1 + 3x 2
4x(1 + x 2 )
1 + 6x 2 + x 4
2
MIMS
x
x (15 + 10x 2 − x 4 )
(15 − 10x 2 + 3x 4 )
8
4
1 + 5x 2
Nick Higham
x(5 + 10x 2 + x 4 )
1 + 10x 2 + 5x 4
Matrix functions
14 / 33
Principal Padé Iterations
ℓ
m=0
m=1
m=2
0
x
2x
1 + x2
8x
3 + 6x 2 − x 4
1
x
(3 − x 2 )
2
x(3 + x 2 )
1 + 3x 2
4x(1 + x 2 )
1 + 6x 2 + x 4
2
MIMS
x
x (15 + 10x 2 − x 4 ) x(5 + 10x 2 + x 4 )
(15 − 10x 2 + 3x 4 )
8
4
1 + 5x 2
1 + 10x 2 + 5x 4
Nick Higham
Matrix functions
15 / 33
Principal Padé Iteration Properties
Define gr (x) taking from the main diagonal and
superdiagonal of Padé table in zig-zag fashion.
Theorem (Kenney & Laub)
(1 + x)r − (1 − x)r
.
(1 + x)r + (1 − x)r
(b) gr (x) = tanh(r arctanh(x)).
(a) gr (x) =
(c) gr (gs (x)) = grs (x) (semigroup property).
⌉
x
2 X′ ⌈ r −2
2
.
(d) gr (x) =
i=0
r
2 (2i+1)π x 2
sin2 (2i+1)π
+
cos
2r
2r
MIMS
Nick Higham
Matrix functions
16 / 33
gr (x) = tanh(r arctanh(x))
1
0.5
0
−0.5
r= 2
r= 4
r= 8
r = 16
−1
−3
−2
−1
0
1
2
3
Class of Square Root Iterations
Theorem (H, Mackey, Mackey & Tisseur, 2005)
Suppose the iteration Xk +1 = Xk h(Xk2 ), X0 = A converges to
sign(A) with order m. If Λ(A) ∩ R− = ∅ and
Yk +1 = Yk h(Zk Yk ),
Zk +1 = h(Zk Yk )Zk ,
Y0 = A,
Z0 = I,
then Yk → A1/2 and Zk → A−1/2 as k → ∞ with order m.
MIMS
Nick Higham
Matrix functions
18 / 33
Class of Square Root Iterations
Theorem (H, Mackey, Mackey & Tisseur, 2005)
Suppose the iteration Xk +1 = Xk h(Xk2 ), X0 = A converges to
sign(A) with order m. If Λ(A) ∩ R− = ∅ and
Yk +1 = Yk h(Zk Yk ),
Zk +1 = h(Zk Yk )Zk ,
Y0 = A,
Z0 = I,
then Yk → A1/2 and Zk → A−1/2 as k → ∞ with order m.
Proof makes use of sign
MIMS
Nick Higham
0 A
I 0
=
0
A−1/2
Matrix functions
A1/2
.
0
18 / 33
Examples
Newton Sign:
Xk +1 = Xk · 21 (I + (Xk2 )−1 ) ≡ Xk h(Xk2 ), X0 = A.
Yk +1 = 12 Yk (I + (Zk Yk )−1 ) = 12 (Yk + Zk−1 ), Y0 = A.
Denman–Beavers sq root iteration:
1
Yk +1 =
Yk + Zk−1 ,
Y0 = A,
2
1
Zk +1 =
Zk + Yk−1 ,
Z0 = I.
2
MIMS
Nick Higham
Matrix functions
19 / 33
Examples
Newton Sign:
Xk +1 = Xk · 21 (I + (Xk2 )−1 ) ≡ Xk h(Xk2 ), X0 = A.
Yk +1 = 12 Yk (I + (Zk Yk )−1 ) = 12 (Yk + Zk−1 ), Y0 = A.
Denman–Beavers sq root iteration:
1
Yk +1 =
Yk + Zk−1 ,
Y0 = A,
2
1
Zk +1 =
Zk + Yk−1 ,
Z0 = I.
2
Newton–Schulz sq root iteration:
Yk +1 = 12 Yk (3I − Zk Yk ),
Zk +1 = 21 (3I − Zk Yk )Zk ,
MIMS
Nick Higham
Y0 = A,
Z0 = I.
Matrix functions
19 / 33
History of Newton Sqrt Instability
Xk +1 = 21 (Xk + Xk−1 A).
Instability of Newton noted by Laasonen (1958):
“Newton’s method if carried out indefinitely, is
not stable whenever the ratio of the largest to
the smallest eigenvalue of A exceeds the
value 9.”
Described informally by Blackwell (1985) in
Mathematical People: Profiles and Interviews.
Analyzed by H (1986) for diagonalizable A by deriving
“error amplification factors”.
MIMS
Nick Higham
Matrix functions
21 / 33
Stability
Definition
The iteration Xk +1 = g(Xk ) is stable in a nbhd of a fixed
point X if Fréchet derivative dgX has bounded powers.
MIMS
Nick Higham
Matrix functions
22 / 33
Stability
Definition
The iteration Xk +1 = g(Xk ) is stable in a nbhd of a fixed
point X if Fréchet derivative dgX has bounded powers.
For scalars & superlinear g, dgX = g ′ (x) = 0.
MIMS
Nick Higham
Matrix functions
22 / 33
Stability
Definition
The iteration Xk +1 = g(Xk ) is stable in a nbhd of a fixed
point X if Fréchet derivative dgX has bounded powers.
For scalars & superlinear g, dgX = g ′ (x) = 0.
Let X0 = X + E0 , Ek := Xk − X . Then
Xk +1 = g(Xk ) = g(X + Ek ) = g(X ) + dgX (Ek ) + o(kEk k).
So, since g(X ) = X ,
Ek +1 = dgX (Ek ) + o(kEk k).
If kdgXi (E)k ≤ c, then recurring leads to
kEk k ≤ ckE0 k + kc · o(kE0 k).
MIMS
Nick Higham
Matrix functions
22 / 33
Stability of Newton Square Root
g(X ) = 12 (X + X −1 A).
dgX (E) = 12 (E − X −1 EX −1 A).
Relevant fixed point: X = A1/2 .
dgA1/2 (E) = 12 (E − A−1/2 EA1/2 ).
Ei’vals of dgA1/2 are
1
−1/2 1/2
(1 − λi
λj ),
2
i, j = 1 : n.
For stability we need
−1/2 1/2 maxi,j 12 1 − λi
λj < 1.
For hpd A, need κ2 (A) < 9.
MIMS
Nick Higham
Matrix functions
23 / 33
Advantages
Uses only Fréchet derivative of g.
No additional assumptions on A.
Perturbation analysis is all in the definition.
General, unifying approach.
Facilitates analysis of families of iterations.
MIMS
Nick Higham
Matrix functions
24 / 33
Stability of Sign Iterations (H)
Theorem
Let Xk +1 = g(Xk ) be any superlinearly convergent
iteration for S = sign(X0 ).
Then dgS (E) = LS (E) = 12 (E − SES) , where LS is the
Fréchet derivative of the matrix sign function at S.
Hence dgS is idempotent (dgS ◦ dgS = dgS ) and the
iteration is stable.
“All” sign iterations are automatically stable.
MIMS
Nick Higham
Matrix functions
25 / 33
Theorem (H, Mackey, Mackey & Tisseur, 2005)
Consider the iteration function
Yh(ZY )
,
G(Y , Z ) =
h(ZY )Z
where Xk +1 = Xk h(Xk2 ) is any superlinearly cgt iteration for
sign(X0 ). Any pair P = (B, B −1 ) is a fixed point for G, and
the Fréchet derivative of G at P is
E − BFB
1
.
dGP (E, F ) = 2
F − B −1 EB −1
dGP is idempotent and hence the iteration is stable.
In particular: Denman–Beavers iteration is stable.
MIMS
Nick Higham
Matrix functions
26 / 33
Binomial Iteration for (I − C)1/2
Suppose ρ(C) < 1. Define (I − C)1/2 =: I − P and square to
obtain P 2 − 2P = −C. Suggests:
Pk +1 = 12 (C + Pk2 ),
P0 = 0.
I − Pk reproduces
1/2
(I − C)
=
∞ 1
X
2
j=0
j
j
(−C) ≡ I −
∞
X
αj C j ,
(αj > 0)
j=1
up to terms of order C k .
MIMS
Nick Higham
Matrix functions
27 / 33
Binomial Iteration Convergence (1)
Pk +1 =
1
(C + Pk2 ),
2
P0 = 0.
Theorem
Let C ∈ Rn×n satisfy C ≥ 0 and ρ(C) < 1 and write
(I − C)1/2 = I − P. Then Pk →P with
0 ≤ Pk ≤ Pk +1 ≤ P,
MIMS
Nick Higham
k ≥ 0.
Matrix functions
28 / 33
Binomial Iteration Convergence (1)
Pk +1 =
1
(C + Pk2 ),
2
P0 = 0.
Theorem
Let C ∈ Rn×n satisfy C ≥ 0 and ρ(C) < 1 and write
(I − C)1/2 = I − P. Then Pk →P with
0 ≤ Pk ≤ Pk +1 ≤ P,
k ≥ 0.
Let Xk = Pk /2. Then Xk +1 = Xk2 + C/4.
MIMS
Nick Higham
Matrix functions
28 / 33
Binomial Iteration Convergence (2)
Theorem (H)
Let the e’vals λ of C ∈ Cn×n lie in the cardioid
{ 2z − z 2 : z ∈ C, |z| < 1 }.
Then (I − C)1/2 =: I − P exists and Pk → P linearly.
2
1
0
−1
−2
−4
MIMS
Nick Higham
−3
−2
−1
0
1
2
Matrix functions
29 / 33
Paradox
P
(I − C)1/2 = ∞
j=0
convergence 1.
1
2
j
(−C)j has radius of
Pk +1 = 12 (C + Pk2 ), P0 = 0:
Pk reproduces first k + 1 terms of the power series.
Pk converges in the cardioid ⊇ unit circle.
MIMS
Nick Higham
Matrix functions
30 / 33
Conclusions
◮ Stability equivalent to power boundedness of Fréchet
derivative.
◮ Better understanding of convergence analysis
(prefer matrix powers to Jordan form.)
◮ Matrix sign function is fundamental and connections
with sqrt can be exploited.
◮ Padé iterations for sign have many interesting
properties.
◮ Analysis of linearly convergent iterations can be tricky.
http://www.ma.man.ac.uk/~higham/
MIMS
Nick Higham
Matrix functions
31 / 33
What is Claude Photographing?