SIAM J. MATRIX ANAL. APPL. Vol. 20, No. 2, pp. 471–492
© 1998 Society for Industrial and Applied Mathematics

RELATIVE PERTURBATION THEORY: II. EIGENSPACE AND SINGULAR SUBSPACE VARIATIONS∗

REN-CANG LI†

Abstract. The classical perturbation theory for Hermitian matrix eigenvalue and singular value problems provides bounds on invariant subspace variations that are proportional to the reciprocals of absolute gaps between subsets of spectra or subsets of singular values. These bounds may be bad news for invariant subspaces corresponding to clustered eigenvalues or clustered singular values of much smaller magnitudes than the norms of the matrices under consideration. In this paper, we consider how eigenspaces of a Hermitian matrix $A$ change when it is perturbed to $\widetilde{A}=D^*AD$, and how singular subspaces of a (nonsquare) matrix $B$ change when it is perturbed to $\widetilde{B}=D_1^*BD_2$, where $D$, $D_1$, and $D_2$ are nonsingular. It is proved that under these kinds of perturbations, the changes of invariant subspaces are proportional to the reciprocals of relative gaps between subsets of spectra or subsets of singular values. The classical Davis–Kahan $\sin\theta$ theorems and Wedin $\sin\theta$ theorems are extended.

Key words. multiplicative perturbation, relative perturbation theory, relative gap, eigenvector, singular vector, structured Sylvester equation, graded matrix

AMS subject classifications. 15A18, 15A42, 65F15, 65F35, 65G99

PII. S0895479896298506

1. Introduction. Let $A$ and $\widetilde{A}$ be two $n\times n$ Hermitian matrices with eigendecompositions

(1.1)  $A=(U_1,U_2)\begin{pmatrix}\Lambda_1&\\&\Lambda_2\end{pmatrix}\begin{pmatrix}U_1^*\\U_2^*\end{pmatrix}$  and  $\widetilde{A}=(\widetilde{U}_1,\widetilde{U}_2)\begin{pmatrix}\widetilde\Lambda_1&\\&\widetilde\Lambda_2\end{pmatrix}\begin{pmatrix}\widetilde{U}_1^*\\\widetilde{U}_2^*\end{pmatrix}$,

where $U=(U_1,U_2)$ and $\widetilde{U}=(\widetilde{U}_1,\widetilde{U}_2)$ are unitary, $U_1$ and $\widetilde{U}_1$ have $k$ columns ($U_2$ and $\widetilde{U}_2$ have $n-k$), and

(1.2)  $\Lambda_1=\operatorname{diag}(\lambda_1,\dots,\lambda_k)$,  $\Lambda_2=\operatorname{diag}(\lambda_{k+1},\dots,\lambda_n)$,
(1.3)  $\widetilde\Lambda_1=\operatorname{diag}(\widetilde\lambda_1,\dots,\widetilde\lambda_k)$,  $\widetilde\Lambda_2=\operatorname{diag}(\widetilde\lambda_{k+1},\dots,\widetilde\lambda_n)$.

Suppose now that $A$ and $\widetilde{A}$ are close and that the spectrum of $\Lambda_1$ and that of $\widetilde\Lambda_2$ (or the spectrum of $\widetilde\Lambda_1$ and that of $\Lambda_2$) are well separated.
The question is, how close are the eigenspaces spanned by the columns of $U_i$ and $\widetilde{U}_i$? This question has been answered well by four celebrated theorems, the so-called $\sin\theta$, $\tan\theta$, $\sin2\theta$, and $\tan2\theta$ theorems due to Davis and Kahan [3], for additive perturbations, in the sense that the perturbations to $A$ can be arbitrary. It is proved there that the changes of invariant subspaces are proportional to the reciprocals of absolute gaps between subsets of spectra.

∗ Received by the editors February 2, 1996; accepted for publication (in revised form) by B. Kagstrom April 29, 1998; published electronically November 23, 1998. Part of this work was done during the summer of 1994 while the author was at the Department of Mathematics, University of California at Berkeley. A preliminary version of this paper appeared as Technical Report UCB//CSD-94-856, Computer Science Division, Department of EECS, University of California at Berkeley. This material is based in part upon work supported by Argonne National Laboratory under grant 20552402, the University of Tennessee through the Advanced Research Projects Agency under contract DAAL03-91-C-0047, the National Science Foundation under grant ASC-9005933, the National Science Foundation Infrastructure grants CDA-8722788 and CDA-9401156, a Householder Fellowship in Scientific Computing at Oak Ridge National Laboratory, and the Applied Mathematical Sciences Research Program, Office of Energy Research, United States Department of Energy contract DE-AC05-96OR22464 with Lockheed Martin Energy Research Corp. http://www.siam.org/journals/simax/20-2/29850.html

† Mathematical Science Section, Oak Ridge National Laboratory, P.O. Box 2008, Bldg. 6012, Oak Ridge, TN 37831-6367. Current address: Department of Mathematics, University of Kentucky, Lexington, KY 40506 ([email protected]).
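The two kinds of gaps behave very differently for graded spectra. The following sketch (Python with NumPy; the eigenvalues are invented for illustration and do not come from the paper) contrasts the absolute gap that governs Davis–Kahan-type bounds with a relative gap of the kind used below: a pair of tiny eigenvalues can be hopelessly close in absolute terms yet well separated relative to their own magnitudes.

```python
import numpy as np

# Eigenvalues of a hypothetical graded Hermitian matrix: a tiny eigenvalue
# whose nearest neighbor is also tiny, plus two O(1) eigenvalues.
lam1 = np.array([1e-8])                 # subset playing the role of Lambda_1
lam2 = np.array([3e-8, 1.0, 2.0])       # subset playing the role of Lambda_2

# Absolute gap, as in the Davis-Kahan sin(theta) theorems:
abs_gap = np.min(np.abs(lam2 - lam1[0]))                  # about 2e-8

# Relative gap in the classical sense |mu - nu| / |nu|:
rel_gap = np.min(np.abs(lam2 - lam1[0]) / np.abs(lam2))   # about 2/3

# A Davis-Kahan-type bound scales like 1/abs_gap (useless here), while
# the relative bounds developed in this paper scale like 1/rel_gap.
print(abs_gap, rel_gap)
```

The point of the sketch is only that the reciprocal $1/\mathrm{abs\_gap}$ is about $5\times10^{7}$ while $1/\mathrm{rel\_gap}$ is about $1.5$.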
In the case of multiplicative perturbations, when $A$ is perturbed to $\widetilde{A}=D^*AD$, Eisenstat and Ipsen [7] first attacked the question by bounding the angle between a one-dimensional eigenspace of $A$ and $\widetilde{A}$'s eigenspace spanned by the columns of $\widetilde{U}_1$, and they ultimately obtained bounds for the angle between $A$'s eigenspace spanned by the columns of $U_1$ and $\widetilde{A}$'s eigenspace spanned by the columns of $\widetilde{U}_1$. Their study suggests that the changes of invariant subspaces are proportional to the reciprocals of relative gaps between subsets of spectra.

This paper studies the same question, but by a different approach. It is explained that bounding the angle between the eigenspaces amounts to bounding the solutions of Sylvester equations $\Omega X-X\Gamma=S$, where $S$ has very special structures. Our approach is more or less along the lines of Davis and Kahan [3] and Bhatia, Davis, and McIntosh [2], where no special structure of $S$ is assumed. The new approach has a number of advantages over Eisenstat and Ipsen's: it deals directly with an eigenspace and its perturbed counterpart, and consequently gets rid of the unpleasant factor $\sqrt{k}$ in Eisenstat and Ipsen's bounds; it makes no distinction in treating an eigenvector versus an eigenspace, or two eigenspaces of the same or different dimensions; and it is mathematically more elegant, making it possible to extend the Davis–Kahan theorems to all unitarily invariant norms. A similar question for singular value decompositions will be answered also.

Although special, multiplicative perturbations cover component-wise relative perturbations to the entries of symmetric tridiagonal matrices with zero diagonal [5, 9] and to the entries of bidiagonal and biacyclic matrices [1, 4, 5], and, more realistically, perturbations in graded nonnegative definite Hermitian matrices [6, 14], in graded matrices of singular value problems [6, 14], and more [8].
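To make the structured Sylvester equations concrete, here is a small numerical sketch (Python with NumPy; the matrices are invented for illustration). For diagonal Hermitian $\Omega$ and $\Gamma$ the equation $\Omega X-X\Gamma=\Omega E+F\Gamma$ decouples entrywise, and the solution obeys a bound governed by a relative gap $\min\varrho_2(\omega,\gamma)$ rather than an absolute gap, as proved in Lemma 2.2 below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Diagonal Hermitian Omega and Gamma with disjoint spectra.
omega = np.array([2.0, 3.0, 5.0])          # eigenvalues of Omega
gamma = np.array([-1.0, 0.5])              # eigenvalues of Gamma
E = rng.standard_normal((3, 2))
F = rng.standard_normal((3, 2))

# Solve Omega X - X Gamma = Omega E + F Gamma entrywise:
# (omega_i - gamma_j) x_ij = omega_i e_ij + f_ij gamma_j.
W, G = omega[:, None], gamma[None, :]
X = (W * E + F * G) / (W - G)

# Relative gap eta_2 = min rho_2(omega_i, gamma_j),
# with rho_2(a, b) = |a - b| / sqrt(a^2 + b^2).
eta2 = np.min(np.abs(W - G) / np.sqrt(W**2 + G**2))

lhs = np.linalg.norm(X, 'fro')
rhs = np.sqrt(np.linalg.norm(E, 'fro')**2 + np.linalg.norm(F, 'fro')**2) / eta2
assert lhs <= rhs  # the Frobenius-norm bound of Lemma 2.2 holds
```

The entrywise solve is exact here because $\Omega$ and $\Gamma$ are diagonal; the general Hermitian case reduces to it by unitary similarity, which is exactly how the proof of Lemma 2.2 proceeds.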
The rest of this paper is organized as follows. Section 2 serves two purposes: to present essential preliminary definitions and lemmas, and to briefly discuss technical similarities and differences between our extensions and the classical Davis–Kahan theorems. Section 3 details relative perturbation theorems for $A$ and $\widetilde{A}=D^*AD$, and for nonnegative definite Hermitian matrices $A$ that may themselves be very ill conditioned but can be scaled to well-conditioned ones. Section 3 also remarks on how to extend our approach to perturbations of diagonalizable matrices. Section 4 develops analogous relative perturbation theorems for singular value problems. The proofs of the theorems in section 4 turn out to be quite long and are therefore postponed to section 5.

2. Preliminaries. Throughout this paper, we follow the notation used in the first part of this series [12].

2.1. Relative distances. We shall use the following two kinds of relative distances to measure relative accuracy in numerical approximations: $\varrho_p$ and $\chi$ are defined for $\alpha,\widetilde\alpha\in\mathbb{C}$ by

(2.1)  $\varrho_p(\alpha,\widetilde\alpha)=\dfrac{|\alpha-\widetilde\alpha|}{\sqrt[p]{|\alpha|^p+|\widetilde\alpha|^p}}$ for $1\le p\le\infty$,  and  $\chi(\alpha,\widetilde\alpha)=\dfrac{|\alpha-\widetilde\alpha|}{\sqrt{|\alpha\widetilde\alpha|}}$,

with the convention $0/0=0$ for convenience. Both have better mathematical properties than the classical measurement $|\delta|$, the relative error in $\widetilde\alpha=\alpha(1+\delta)$ as an approximation to $\alpha$:

(2.2)  $\delta=\text{relative error in }\widetilde\alpha=\dfrac{\widetilde\alpha-\alpha}{\alpha}$,

which is, however, good enough and actually more convenient to use in numerical computations. So we shall also present bounds using this classical measurement. The use of any particular relative distance in our perturbation bounds comes naturally with its derivation. From the numerical point of view, any one of the relative distances is just as good as the others, but theoretically they provide bounds with very different features. It can be proved that these relative distances are topologically equivalent; see [12] for details.

2.2. Angles between two subspaces.
Since this paper concerns the variation of subspaces, we need some metrics to measure the difference between two subspaces. In this, we follow Davis and Kahan [3] and Stewart and Sun [15, Chapters I and II]; Wedin [17] presented an illuminating discussion on angles between two subspaces, too. Let $X,\widetilde{X}\in\mathbb{C}^{n\times k}$ ($n>k$) have full column rank $k$, and define the angle matrix $\Theta(X,\widetilde{X})$ between $X$ and $\widetilde{X}$ as

(2.3)  $\Theta(X,\widetilde{X})\overset{\rm def}{=}\arccos\bigl((X^*X)^{-1/2}X^*\widetilde{X}(\widetilde{X}^*\widetilde{X})^{-1}\widetilde{X}^*X(X^*X)^{-1/2}\bigr)^{1/2}$.

The canonical angles between the subspaces $\mathcal{X}=\mathcal{R}(X)$ and $\widetilde{\mathcal{X}}=\mathcal{R}(\widetilde{X})$ are defined to be the singular values of the Hermitian matrix $\Theta(X,\widetilde{X})$, where $\mathcal{R}(X)$ denotes the subspace spanned by $X$'s columns.

The following lemma is well known. For a proof, the reader is referred to, e.g., Li [11, Lemma 2.1].

Lemma 2.1. Suppose that $(\widetilde{X},\widetilde{X}_1)\in\mathbb{C}^{n\times n}$ is nonsingular, where $\widetilde{X}$ has $k$ columns and $\widetilde{X}_1$ has $n-k$, and partition $(\widetilde{X},\widetilde{X}_1)^{-1}=\begin{pmatrix}\widetilde{Y}^*\\\widetilde{Y}_1^*\end{pmatrix}$ conformally. Then for any unitarily invariant norm ||| · |||,

|||sin Θ(X, X̃)||| = |||$(\widetilde{Y}_1^*\widetilde{Y}_1)^{-1/2}\widetilde{Y}_1^*X(X^*X)^{-1/2}$|||.

In this lemma, as well as in many other places in the rest of this paper, we speak of the "same" unitarily invariant norm ||| · ||| applying to matrices of different dimensions at the same time. Such applications of a unitarily invariant norm are understood in the following sense: first there is a unitarily invariant norm ||| · ||| on $\mathbb{C}^{M\times N}$ for sufficiently large integers $M$ and $N$; then, for a matrix $X\in\mathbb{C}^{m\times n}$ ($m\le M$ and $n\le N$), |||X||| is defined by appending zero blocks to $X$ to make it $M\times N$ and taking the unitarily invariant norm of the enlarged matrix.

Taking $X=U_1$ and $\widetilde{X}=\widetilde{U}_1$ as in (1.1), by Lemma 2.1 one has

(2.4)  $\Theta(U_1,\widetilde{U}_1)=\arccos(U_1^*\widetilde{U}_1\widetilde{U}_1^*U_1)^{1/2}$  and  |||sin Θ(U₁, Ũ₁)||| = |||$\widetilde{U}_2^*U_1$|||.

[Fig. 2.1. The spectrum of $\Lambda_1$ and that of $\widetilde\Lambda_2$ are disjoint.]

[Fig. 2.2. The spectrum of $\Lambda_1$ and that of $\widetilde\Lambda_2$ are separated by two intervals, and one of the spectra scatters around the origin: in (a) the spectrum of $\Lambda_1$ lies in $[-\alpha,\alpha]$ and that of $\widetilde\Lambda_2$ lies outside $(-\alpha-\delta,\alpha+\delta)$; in (b) the roles are reversed.]

Occasionally it is also of interest to measure how far a lower-dimensional subspace is from a higher-dimensional one. In such a situation, an angle matrix can still be defined as in (2.3), but with $X$ having fewer columns than $\widetilde{X}$. If we do so, Lemma 2.1 remains valid; however, $\Theta(\cdot,\cdot)$ is no longer symmetric in its arguments. In the case of eigenspace variation in which we are interested, we may take $U_{1,\rm sub}$, a submatrix consisting of a few (or just one) of $U_1$'s columns, and ask how close $A$'s eigenspace $\mathcal{R}(U_{1,\rm sub})$ is to $\widetilde{A}$'s eigenspace $\mathcal{R}(\widetilde{U}_1)$. Still, we have

(2.4a)  $\Theta(U_{1,\rm sub},\widetilde{U}_1)=\arccos(U_{1,\rm sub}^*\widetilde{U}_1\widetilde{U}_1^*U_{1,\rm sub})^{1/2}$  and  |||sin Θ(U₁,sub, Ũ₁)||| = |||$\widetilde{U}_2^*U_{1,\rm sub}$|||.

2.3. Separation of spectra. In deriving bounds on the sines of the angles between the eigenspaces $\mathcal{R}(U_1)$ and $\mathcal{R}(\widetilde{U}_1)$, certain disjointness between $\Lambda_1$'s spectrum and $\widetilde\Lambda_2$'s (or between $\widetilde\Lambda_1$'s and $\Lambda_2$'s) is assumed. Depending on what matrix norms are used, two different kinds of separation, one stronger than the other, are considered. For bounds in the Frobenius norm, only disjointness is required, as in Davis and Kahan [3]; see Figure 2.1. For bounds in all unitarily invariant norms, not only must the two spectra be disjoint, but they also have to be separated by two intervals; see Figure 2.2. Such a separation requirement is similar to that in Davis and Kahan's $\sin\theta$ theorems in all unitarily invariant norms, but it differs from theirs in that here either the spectrum of $\Lambda_1$ or that of $\widetilde\Lambda_2$ has to scatter around the origin. This substantial difference exists for a reason. In the absolute perturbation theory, shifting $A$ and $\widetilde{A}$ by a scalar $\mu$ to $A-\mu I$ and $\widetilde{A}-\mu I$ retains every relevant object: the eigenspaces, the residuals such as $R=\widetilde{A}U_1-U_1\Lambda_1$, and most of all the absolute gap $\delta$. Thus the positions of the intervals in Davis and Kahan's assumptions are not intrinsically important.
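The quantities just introduced are easy to experiment with. The sketch below (Python with NumPy; the matrices are invented for illustration) computes the sines of the canonical angles via (2.4) as the singular values of $\widetilde{U}_2^*U_1$, and checks the shift invariance of the absolute gap discussed above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical Hermitian A with a cluster of two tiny eigenvalues,
# perturbed multiplicatively to A~ = D* A D with D near the identity.
n, k = 6, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag([1e-6, 2e-6, 1.0, 2.0, 3.0, 4.0]) @ Q.T
D = np.eye(n) + 1e-3 * rng.standard_normal((n, n))  # almost surely nonsingular
At = D.T @ A @ D

lam, U = np.linalg.eigh(A)      # eigenvalues in ascending order
lamt, Ut = np.linalg.eigh(At)
U1, U2t = U[:, :k], Ut[:, k:]   # R(U1) and the perturbed complement R(U2~)

# By (2.4), the sines of the canonical angles between R(U1) and R(U1~)
# are the singular values of U2~* U1.
sines = np.linalg.svd(U2t.T @ U1, compute_uv=False)

# Shifting A -> A - mu*I moves every eigenvalue by mu, so the absolute gap
# between the cluster and the rest is unchanged, while the magnitudes of
# the eigenvalues (hence any relative gap) change completely.
mu = 0.5
abs_gap = lam[k] - lam[k - 1]
abs_gap_shifted = (lam[k] - mu) - (lam[k - 1] - mu)
print(np.max(sines), abs_gap, abs_gap_shifted)
```

Despite the tiny absolute gap between the clustered eigenvalues and their neighbors' magnitudes, the computed angles are small, of the order of the perturbation, which is what the relative theory below predicts.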
For our relative case, the emphasis is on issues associated with eigenvalues of relatively (much) smaller magnitudes than the norm of the matrix, and shifting fundamentally alters the underlying properties of the problem.

The classical Davis–Kahan theorems use the absolute gap, the minimum distance between $\lambda(\Lambda_1)$ and $\lambda(\widetilde\Lambda_2)$; our relative perturbation theorems, however, will use relative gaps measured by any one of the relative "distances" (2.1) and (2.2). We shall use the notation $\eta_p$, $\eta_\chi$, and $\eta_c$ to denote three different kinds of relative gaps defined in terms of $\varrho_p$, $\chi$, and the classical measurement in the case of Figure 2.1; we underline the $\eta$'s to indicate the stronger separation by intervals as in Figure 2.2. The use of different relative gaps comes quite naturally with different perturbation equations, which yield various bounds on the sines of the angles. It appears that $\eta_\chi$ and $\underline\eta_\chi$ are natural choices for nonnegative definite Hermitian matrices, and bounds that use them are normally sharper than bounds that use other kinds of relative gaps. $\eta_p$, $\eta_c$, and their underlined versions are natural choices for arbitrary Hermitian matrices, and bounds that use them are comparable, as we shall see in later sections; but mathematically, bounds with $\eta_p$ are more beautiful because $\eta_p$ is defined in terms of $\varrho_p$, which is a metric on $\mathbb{R}$ [12] and thus treats $\Lambda_1$ and $\widetilde\Lambda_2$ equally, while $\eta_c$ is perhaps more convenient to use in actual numerical computations.

All our perturbation bounds in this paper use the relative gaps mentioned above. However, sometimes it is more convenient to have bounds that use relative gaps between $\lambda(\Lambda_1)$ and $\lambda(\Lambda_2)$ rather than $\lambda(\widetilde\Lambda_2)$. For this purpose, Li [13] presented inequality relations between relative gaps for $\lambda(\Lambda_1)$ and $\lambda(\Lambda_2)$ and those for $\lambda(\Lambda_1)$ and $\lambda(\widetilde\Lambda_2)$.

Finally, we remark that if we are interested in how close $A$'s eigenspace spanned by a few (not all) columns of $U_1$ is to $\widetilde{A}$'s eigenspace $\mathcal{R}(\widetilde{U}_1)$, then spectrum separation assumptions are required only between the subset of $\lambda(\Lambda_1)$ corresponding to those selected columns of $U_1$ and the spectrum of $\widetilde\Lambda_2$.

2.4. On Sylvester equations $\Omega X-X\Gamma=S$ with structured $S$. This subsection illustrates another technical similarity and difference between our development of relative perturbation theory and Davis and Kahan's classical development [3], in which $\Omega$ and $\Gamma$ are two self-adjoint operators and $S$ is an arbitrary operator in certain norm ideals. In our case, however, $S$ takes one of the forms $\Omega E+F\Gamma$ and $\Omega^{1/2}E\Gamma^{1/2}$. In what follows we exploit these structures, and by doing so we are able to derive better bounds on the solution $X$.

Lemma 2.2. Let $\Omega\in\mathbb{C}^{s\times s}$ and $\Gamma\in\mathbb{C}^{t\times t}$ be two Hermitian matrices,¹ and let $E,F\in\mathbb{C}^{s\times t}$. If $\lambda(\Omega)\cap\lambda(\Gamma)=\emptyset$, then $\Omega X-X\Gamma=\Omega E+F\Gamma$ has a unique solution $X\in\mathbb{C}^{s\times t}$, and moreover

$\|X\|_F\le\sqrt{\|E\|_F^2+\|F\|_F^2}\,\big/\,\eta_2$,

where $\eta_2\overset{\rm def}{=}\min_{\omega\in\lambda(\Omega),\,\gamma\in\lambda(\Gamma)}\varrho_2(\omega,\gamma)$. If, in addition, $F=0$, we have a better bound

$\|X\|_F\le\|E\|_F/\eta_c$,

where² $\eta_c\overset{\rm def}{=}\min_{\omega\in\lambda(\Omega),\,\gamma\in\lambda(\Gamma)}|\omega-\gamma|/|\omega|$.

¹ Lemmas 2.2–2.5 are actually true for normal matrices $\Omega$ and $\Gamma$.
² Notice that $\eta_c\ge\eta_2$. This can be seen as follows. Assume $\eta_c=|\omega-\gamma|/|\omega|$ for some $\omega\in\lambda(\Omega)$, $\gamma\in\lambda(\Gamma)$. Then $\eta_c\ge\varrho_2(\omega,\gamma)\ge\eta_2$.

Proof. For any $s\times s$ unitary $P$ and $t\times t$ unitary $Q$, the substitutions

$\Omega\leftarrow P^*\Omega P$, $\Gamma\leftarrow Q^*\Gamma Q$, $X\leftarrow P^*XQ$, $E\leftarrow P^*EQ$, and $F\leftarrow P^*FQ$

leave the lemma unchanged, so we may assume without loss of generality that $\Omega=\operatorname{diag}(\omega_1,\dots,\omega_s)$ and $\Gamma=\operatorname{diag}(\gamma_1,\dots,\gamma_t)$. Write $X=(x_{ij})$, $E=(e_{ij})$, and $F=(f_{ij})$. Entrywise, $\Omega X-X\Gamma=\Omega E+F\Gamma$ reads

$\omega_i x_{ij}-x_{ij}\gamma_j=\omega_i e_{ij}+f_{ij}\gamma_j$.
Thus $x_{ij}$ exists uniquely provided $\omega_i\ne\gamma_j$, which holds since $\lambda(\Omega)\cap\lambda(\Gamma)=\emptyset$; moreover,

$|(\omega_i-\gamma_j)x_{ij}|^2=|\omega_i e_{ij}+f_{ij}\gamma_j|^2\le(|\omega_i|^2+|\gamma_j|^2)(|e_{ij}|^2+|f_{ij}|^2)$

by the Cauchy–Schwarz inequality. This implies

$|x_{ij}|^2\le\dfrac{|e_{ij}|^2+|f_{ij}|^2}{[\varrho_2(\omega_i,\gamma_j)]^2}\le\dfrac{|e_{ij}|^2+|f_{ij}|^2}{\eta_2^2}
\ \Longrightarrow\ \|X\|_F^2=\sum_{i,j}|x_{ij}|^2\le\dfrac{\sum_{i,j}|e_{ij}|^2+\sum_{i,j}|f_{ij}|^2}{\eta_2^2}=\dfrac{\|E\|_F^2+\|F\|_F^2}{\eta_2^2}$,

as was to be shown. The case $F=0$ can be handled similarly.

Lemma 2.3. Let $\Omega\in\mathbb{C}^{s\times s}$ and $\Gamma\in\mathbb{C}^{t\times t}$ be two Hermitian matrices, and let $E,F\in\mathbb{C}^{s\times t}$. If there exist $\alpha\ge0$ and $\delta>0$ such that

(2.5)  $\|\Omega\|_2\le\alpha$ and $\|\Gamma^{-1}\|_2^{-1}\ge\alpha+\delta$, or
(2.6)  $\|\Omega^{-1}\|_2^{-1}\ge\alpha+\delta$ and $\|\Gamma\|_2\le\alpha$,

then $\Omega X-X\Gamma=\Omega E+F\Gamma$ has a unique solution $X\in\mathbb{C}^{s\times t}$, and moreover, for any unitarily invariant norm ||| · |||,

|||X||| ≤ $\sqrt[q]{|||E|||^q+|||F|||^q}\,\big/\,\underline\eta_p$,

where $\underline\eta_p\overset{\rm def}{=}\varrho_p(\alpha,\alpha+\delta)$ and $1/p+1/q=1$. If, in addition, $F=0$, we have a better bound

|||X||| ≤ |||E|||$/\underline\eta_c$,

where $\underline\eta_c=\delta/\alpha$ when (2.5) holds, and $\underline\eta_c=\delta/(\alpha+\delta)$ when (2.6) holds.

Proof. First of all, the conditions of this lemma imply $\lambda(\Omega)\cap\lambda(\Gamma)=\emptyset$, so $X$ exists uniquely by Lemma 2.2. In what follows we consider the case (2.5); the case (2.6) is analogous. Post-multiply $\Omega X-X\Gamma=\Omega E+F\Gamma$ by $\Gamma^{-1}$ to get

(2.7)  $\Omega X\Gamma^{-1}-X=\Omega E\Gamma^{-1}+F$.

Under the assumptions $\|\Omega\|_2\le\alpha$ and $\|\Gamma^{-1}\|_2^{-1}\ge\alpha+\delta$ (so that $\|\Gamma^{-1}\|_2\le\frac1{\alpha+\delta}$), we have

|||ΩXΓ⁻¹ − X||| ≥ |||X||| − |||ΩXΓ⁻¹||| ≥ |||X||| − $\|\Omega\|_2$|||X|||$\|\Gamma^{-1}\|_2$ ≥ $\bigl(1-\tfrac{\alpha}{\alpha+\delta}\bigr)$|||X|||

and, by Hölder's inequality,

|||ΩEΓ⁻¹ + F||| ≤ |||ΩEΓ⁻¹||| + |||F||| ≤ $\tfrac{\alpha}{\alpha+\delta}$|||E||| + |||F||| ≤ $\sqrt[p]{1+\tfrac{\alpha^p}{(\alpha+\delta)^p}}\ \sqrt[q]{|||E|||^q+|||F|||^q}$.

By (2.7), we deduce that

$\Bigl(1-\dfrac{\alpha}{\alpha+\delta}\Bigr)$|||X||| ≤ $\sqrt[p]{1+\dfrac{\alpha^p}{(\alpha+\delta)^p}}\ \sqrt[q]{|||E|||^q+|||F|||^q}$,

from which the desired inequality follows. Also, the case $F=0$ can be handled similarly.

Lemma 2.4. Let $\Omega\in\mathbb{C}^{s\times s}$ and $\Gamma\in\mathbb{C}^{t\times t}$ be two nonnegative definite Hermitian matrices, and let $E\in\mathbb{C}^{s\times t}$.
If $\lambda(\Omega)\cap\lambda(\Gamma)=\emptyset$, then $\Omega X-X\Gamma=\Omega^{1/2}E\Gamma^{1/2}$ has a unique solution $X\in\mathbb{C}^{s\times t}$, and moreover

$\|X\|_F\le\|E\|_F/\eta_\chi$,

where $\eta_\chi\overset{\rm def}{=}\min_{\omega\in\lambda(\Omega),\,\gamma\in\lambda(\Gamma)}\chi(\omega,\gamma)$.

Proof. For any $s\times s$ unitary $P$ and $t\times t$ unitary $Q$, the substitutions $\Omega\leftarrow P^*\Omega P$, $\Gamma\leftarrow Q^*\Gamma Q$, $X\leftarrow P^*XQ$, and $E\leftarrow P^*EQ$ leave the lemma unchanged, so we may assume without loss of generality that $\Omega=\operatorname{diag}(\omega_1,\dots,\omega_s)$ and $\Gamma=\operatorname{diag}(\gamma_1,\dots,\gamma_t)$. Write $X=(x_{ij})$ and $E=(e_{ij})$. Entrywise, $\Omega X-X\Gamma=\Omega^{1/2}E\Gamma^{1/2}$ reads $\omega_ix_{ij}-x_{ij}\gamma_j=\sqrt{\omega_i}\,e_{ij}\sqrt{\gamma_j}$. As long as $\omega_i\ne\gamma_j$, $x_{ij}$ exists uniquely, and

$|x_{ij}|^2=\dfrac{|e_{ij}|^2}{[\chi(\omega_i,\gamma_j)]^2}\le\dfrac{|e_{ij}|^2}{\eta_\chi^2}$;

summing over $1\le i\le s$ and $1\le j\le t$ leads to the desired inequality.

Lemma 2.5. Let $\Omega\in\mathbb{C}^{s\times s}$ and $\Gamma\in\mathbb{C}^{t\times t}$ be two nonnegative definite Hermitian matrices, and let $E\in\mathbb{C}^{s\times t}$. If there exist $\alpha\ge0$ and $\delta>0$ such that (2.5) or (2.6) holds, then $\Omega X-X\Gamma=\Omega^{1/2}E\Gamma^{1/2}$ has a unique solution $X\in\mathbb{C}^{s\times t}$, and moreover

|||X||| ≤ |||E|||$/\underline\eta_\chi$,

where $\underline\eta_\chi\overset{\rm def}{=}\chi(\alpha,\alpha+\delta)$.

Proof. The existence and uniqueness of $X$ are easy to see because the conditions of this lemma imply $\lambda(\Omega)\cap\lambda(\Gamma)=\emptyset$. In what follows we consider the case (2.5) only; the case (2.6) is analogous. Post-multiply $\Omega X-X\Gamma=\Omega^{1/2}E\Gamma^{1/2}$ by $\Gamma^{-1}$ to get

(2.8)  $\Omega X\Gamma^{-1}-X=\Omega^{1/2}E\Gamma^{-1/2}$.

Under the assumptions $\|\Omega\|_2\le\alpha$ and $\|\Gamma^{-1}\|_2^{-1}\ge\alpha+\delta$ (so that $\|\Gamma^{-1}\|_2\le\frac1{\alpha+\delta}$), we have, as in the proof of Lemma 2.3,

|||ΩXΓ⁻¹ − X||| ≥ $\Bigl(1-\dfrac{\alpha}{\alpha+\delta}\Bigr)$|||X|||,

and

|||Ω^{1/2}EΓ^{-1/2}||| ≤ $\|\Omega^{1/2}\|_2$|||E|||$\|\Gamma^{-1/2}\|_2$ ≤ $\sqrt{\alpha}$|||E|||$\dfrac1{\sqrt{\alpha+\delta}}$.

By (2.8), we deduce that

$\Bigl(1-\dfrac{\alpha}{\alpha+\delta}\Bigr)$|||X||| ≤ $\sqrt{\dfrac{\alpha}{\alpha+\delta}}$|||E|||,

from which the desired inequality follows, since $\underline\eta_\chi=\chi(\alpha,\alpha+\delta)=\delta/\sqrt{\alpha(\alpha+\delta)}$.

Remark 2.1. For the Sylvester equation $\Omega X-X\Gamma=S$ with $S$ having no special structure, Bhatia, Davis, and McIntosh [2] also proved bounds on |||X|||, independent of $X$'s dimensions, under only the conditions that $\Omega$ and $\Gamma$ are normal and $\lambda(\Omega)\cap\lambda(\Gamma)=\emptyset$. It is easy to see that (2.5) or (2.6) describes spectral distributions similar to Figure 2.2.
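As with Lemma 2.2, the $\chi$-based bound of Lemma 2.4 is easy to sanity-check numerically on diagonal matrices, where the equation decouples entrywise. A sketch (Python with NumPy; the data are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Nonnegative definite diagonal Omega and Gamma with disjoint spectra.
omega = np.array([4.0, 9.0])
gamma = np.array([0.25, 1.0])
E = rng.standard_normal((2, 2))

# Solve Omega X - X Gamma = Omega^{1/2} E Gamma^{1/2} entrywise:
# (omega_i - gamma_j) x_ij = sqrt(omega_i) * e_ij * sqrt(gamma_j).
W, G = omega[:, None], gamma[None, :]
X = np.sqrt(W) * E * np.sqrt(G) / (W - G)

# eta_chi = min chi(omega_i, gamma_j), with chi(a, b) = |a - b| / sqrt(a*b).
eta_chi = np.min(np.abs(W - G) / np.sqrt(W * G))

# The Frobenius-norm bound of Lemma 2.4:
assert np.linalg.norm(X, 'fro') <= np.linalg.norm(E, 'fro') / eta_chi
```

Here each entry satisfies $|x_{ij}|=|e_{ij}|/\chi(\omega_i,\gamma_j)$ exactly, so the bound follows by replacing each $\chi(\omega_i,\gamma_j)$ with its minimum $\eta_\chi$.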
An open question thus arises naturally: could Lemmas 2.3 and 2.5 be extended to normal matrices $\Omega$ and $\Gamma$ under the sole condition $\lambda(\Omega)\cap\lambda(\Gamma)=\emptyset$?

3. Relative perturbation theorems for eigenspace variations. Let $A$ and $\widetilde{A}$ be two Hermitian matrices whose eigendecompositions are given by (1.1), where $U=(U_1,U_2)$ and $\widetilde{U}=(\widetilde{U}_1,\widetilde{U}_2)$ are unitary and the $\Lambda_i$'s and $\widetilde\Lambda_j$'s are defined as in (1.2) and (1.3). Define

$R\overset{\rm def}{=}\widetilde{A}U_1-U_1\Lambda_1=(\widetilde{A}-A)U_1$.

Notice that

$\widetilde{U}_2^*R=\widetilde{U}_2^*\widetilde{A}U_1-\widetilde{U}_2^*U_1\Lambda_1=\widetilde\Lambda_2\widetilde{U}_2^*U_1-\widetilde{U}_2^*U_1\Lambda_1$

and

$\widetilde{U}_2^*R=\widetilde{U}_2^*\bigl[D^*AD(I-D^{-1})+(D^*-I)A\bigr]U_1=\widetilde\Lambda_2\widetilde{U}_2^*(I-D^{-1})U_1+\widetilde{U}_2^*(D^*-I)U_1\Lambda_1$.

Thus, we have

(3.1)  $\widetilde\Lambda_2\widetilde{U}_2^*U_1-\widetilde{U}_2^*U_1\Lambda_1=\widetilde\Lambda_2\widetilde{U}_2^*(I-D^{-1})U_1+\widetilde{U}_2^*(D^*-I)U_1\Lambda_1$.

Let $X\overset{\rm def}{=}\widetilde{U}_2^*D^*U_1=\widetilde{U}_2^*U_1-\widetilde{U}_2^*(I-D^*)U_1$.³ Another formulation of (3.1) is

(3.2)  $\widetilde\Lambda_2X-X\Lambda_1=\widetilde\Lambda_2\widetilde{U}_2^*(D^*-D^{-1})U_1$.

Both (3.1) and (3.2) are Sylvester equations with the special structures that are vital to the development of our perturbation theorems below. Notice by Lemma 2.1 that

(3.3)  |||sin Θ(U₁, Ũ₁)||| = |||$\widetilde{U}_2^*U_1$||| ≤ |||X||| + |||$\widetilde{U}_2^*(I-D^*)U_1$|||.

Equation (3.1) makes $\varrho_2$ a natural choice for measuring the relative gap between $\lambda(\Lambda_1)$ and $\lambda(\widetilde\Lambda_2)$, while (3.2) makes the classical measurement a natural choice.

Theorem 3.1. Let $A$ and $\widetilde{A}=D^*AD$ be two $n\times n$ Hermitian matrices with eigendecompositions (1.1)–(1.3), where $D$ is nonsingular. If $\lambda(\Lambda_1)\cap\lambda(\widetilde\Lambda_2)=\emptyset$, then

(3.4)  $\|\sin\Theta(U_1,\widetilde{U}_1)\|_F\le\dfrac{\sqrt{\|(I-D^{-1})U_1\|_F^2+\|(I-D^*)U_1\|_F^2}}{\eta_2}$,

(3.5)  $\|\sin\Theta(U_1,\widetilde{U}_1)\|_F\le\|(I-D^*)U_1\|_F+\dfrac{\|(D^*-D^{-1})U_1\|_F}{\eta_c}$,

³ Or $\widetilde\Lambda_2Y-Y\Lambda_1=\widetilde{U}_2^*(D^*-D^{-1})U_1\Lambda_1$, where $Y=\widetilde{U}_2^*U_1-\widetilde{U}_2^*(I-D^{-1})U_1$. Such a formulation can be used to produce perturbation bounds different from, but in the same spirit as, (3.5) and (3.8), whose derivations rely on (3.2).
where

$\eta_2\overset{\rm def}{=}\min_{\mu\in\lambda(\Lambda_1),\,\widetilde\mu\in\lambda(\widetilde\Lambda_2)}\varrho_2(\mu,\widetilde\mu)$  and  $\eta_c\overset{\rm def}{=}\min_{\mu\in\lambda(\Lambda_1),\,\widetilde\mu\in\lambda(\widetilde\Lambda_2)}\dfrac{|\mu-\widetilde\mu|}{|\widetilde\mu|}$.

Proof. Lemma 2.2 and (3.1) yield (3.4), whereas Lemma 2.2, (3.2), and (3.3) yield (3.5).

Remark 3.1. Without assuming any multiplicative structure in perturbing $A$ to $\widetilde{A}$, we end up with the Sylvester equation $\widetilde\Lambda_2\widetilde{U}_2^*U_1-\widetilde{U}_2^*U_1\Lambda_1=\widetilde{U}_2^*R$, which leads to

$\|\sin\Theta(U_1,\widetilde{U}_1)\|_F\le\dfrac{\|R\|_F}{\min_{\mu\in\lambda(\Lambda_1),\,\widetilde\mu\in\lambda(\widetilde\Lambda_2)}|\mu-\widetilde\mu|}$.

This is a Davis–Kahan $\sin\theta$ theorem. Our other theorems in this section relate to Davis–Kahan $\sin\theta$ theorems analogously.

Remark 3.2. Let $U_{1,\rm sub}$ be a submatrix consisting of a few (or just one) of $U_1$'s columns, and let $\Lambda_{1,\rm sub}$ be the corresponding eigenvalue matrix. Equations (3.1) and (3.2) imply that

$\widetilde\Lambda_2\widetilde{U}_2^*U_{1,\rm sub}-\widetilde{U}_2^*U_{1,\rm sub}\Lambda_{1,\rm sub}=\widetilde\Lambda_2\widetilde{U}_2^*(I-D^{-1})U_{1,\rm sub}+\widetilde{U}_2^*(D^*-I)U_{1,\rm sub}\Lambda_{1,\rm sub}$,
$\widetilde\Lambda_2X_{\rm sub}-X_{\rm sub}\Lambda_{1,\rm sub}=\widetilde\Lambda_2\widetilde{U}_2^*(D^*-D^{-1})U_{1,\rm sub}$,

where $X_{\rm sub}\overset{\rm def}{=}\widetilde{U}_2^*D^*U_{1,\rm sub}=\widetilde{U}_2^*U_{1,\rm sub}-\widetilde{U}_2^*(I-D^*)U_{1,\rm sub}$. So our approach needs no modification when it comes to bounding the closeness of either $\mathcal{R}(U_{1,\rm sub})$ or $\mathcal{R}(U_1)$ to $\mathcal{R}(\widetilde{U}_1)$. It can be seen that all the theorems in this section remain valid if $U_1$ is replaced by $U_{1,\rm sub}$ and the relative gaps are redefined as those between $\lambda(\Lambda_{1,\rm sub})$ and $\lambda(\widetilde\Lambda_2)$.

Eisenstat and Ipsen [7] obtained the following: under the assumptions of Theorem 3.1,

(3.6)  $\|\sin\Theta(U_1,\widetilde{U}_1)\|_2\le\sqrt{k}\,\Bigl(\|I-D^*\|_2+\dfrac{\|I-D^{-*}D^{-1}\|_2}{\eta_c}\Bigr)$.

It is a good bound for $k=1$, but for $k\ge2$ it is less competitive. To compare this inequality with (3.5), we notice that

$\|\sin\Theta(U_1,\widetilde{U}_1)\|_2\le\|\sin\Theta(U_1,\widetilde{U}_1)\|_F$,
$\|(I-D^*)U_1\|_F\le\sqrt{k}\,\|(I-D^*)U_1\|_2\le\sqrt{k}\,\|I-D^*\|_2$,
$\|(D^*-D^{-1})U_1\|_F\le\sqrt{k}\,\|D^*-D^{-1}\|_2\le\sqrt{k}\,\|D^*\|_2\|I-D^{-*}D^{-1}\|_2$,
$\|I-D^{-*}D^{-1}\|_2\le\|D^{-*}\|_2\|D^*-D^{-1}\|_2$.

Thus (3.5) and (3.6) imply that

(3.5a)  $\|\sin\Theta(U_1,\widetilde{U}_1)\|_2\le\sqrt{k}\,\Bigl(\|I-D^*\|_2+\dfrac{\|D^*\|_2\|I-D^{-*}D^{-1}\|_2}{\eta_c}\Bigr)$,

(3.6a)  $\|\sin\Theta(U_1,\widetilde{U}_1)\|_2\le\sqrt{k}\,\Bigl(\|I-D^*\|_2+\dfrac{\|D^{-*}\|_2\|D^*-D^{-1}\|_2}{\eta_c}\Bigr)$.

Now for inequalities (3.5) and (3.6) to be of any significance at all, $D$ must be fairly close to the identity matrix, in which case $\|D^*\|_2\approx1$ and thus (3.5a), a weakened (3.5), is about as good as (3.6). Here is an example for which (3.5) improves (3.6) by at least a factor of $\sqrt{k}$. Take $D=I-\epsilon ww^*$, where $\epsilon>0$ is a small number and $w$ is a vector with $\|w\|_2=1$. Then $D^{-1}=I+\epsilon ww^*/(1-\epsilon)$. Thus

$D^*-D^{-1}=-\dfrac{\epsilon(2-\epsilon)}{1-\epsilon}\,ww^*$  and  $I-D^{-*}D^{-1}=-\dfrac{\epsilon(2-\epsilon)}{(1-\epsilon)^2}\,ww^*$.

Hence (3.5) yields

(3.5b)  $\|\sin\Theta(U_1,\widetilde{U}_1)\|_2\le\epsilon\Bigl(1+\dfrac{2-\epsilon}{(1-\epsilon)\eta_c}\Bigr)$,

and (3.6) becomes

(3.6b)  $\|\sin\Theta(U_1,\widetilde{U}_1)\|_2\le\sqrt{k}\,\epsilon\Bigl(1+\dfrac{2-\epsilon}{(1-\epsilon)^2\eta_c}\Bigr)$.

It can be seen that when $\Theta(U_1,\widetilde{U}_1)$ is almost a multiple of the identity and both $I-D^*$ and $I-D^{-*}D^{-1}$ are almost of rank 1, (3.6a) is not much weaker than (3.6), and in this case (3.5) improves (3.6a) by nearly a factor of $k$. But in any event the improvement can be by a factor of at most $k$.

Now we compare (3.4) and (3.5). Although $\eta_c\ge\eta_2$ always, and $\eta_c$ may be much larger than $\eta_2$, it appears that (3.4) and (3.5) are comparable up to a constant factor unless $D$ is much closer to a unitary matrix than to the identity matrix,⁴ in which case the second term on the right-hand side of (3.5) becomes negligible and we consequently expect (3.5) to be sharper than (3.4). On the one hand, (3.5) always produces a bound that is weaker than (3.4) by a factor of at most $\sqrt{4+2\sqrt2}$:

$\|\sin\Theta(U_1,\widetilde{U}_1)\|_F\le\dfrac{\sqrt2\,\|(I-D^*)U_1\|_F}{\eta_2}+\dfrac{\|(D^*-I)U_1\|_F+\|(I-D^{-1})U_1\|_F}{\eta_2}\le\sqrt{4+2\sqrt2}\;\dfrac{\sqrt{\|(I-D^{-1})U_1\|_F^2+\|(I-D^*)U_1\|_F^2}}{\eta_2}$,

where we have used $\eta_2\le\sqrt2$ and $\eta_2\le\eta_c$. On the other hand, (3.4) cannot be much worse than (3.5) in general, also by a constant factor, at least for the interesting cases when $E\overset{\rm def}{=}I-D$ is tiny. In fact, the following argument shows that when $\eta_2$ and $\eta_c$ are not too far apart, (3.4) may be sharper. Assume that $D$ differs from $I$ not much more than from its closest unitary matrix.
We have

$\sqrt{\|I-D^{-1}\|_F^2+\|I-D^*\|_F^2}\approx\sqrt2\,\|E\|_F+O(\|E\|_F^2)$,  $\|D^*-D^{-1}\|_F\approx2\|E\|_F+O(\|E\|_F^2)$.

Then (3.4) and (3.5) become

(3.4c)  $\|\sin\Theta(U_1,\widetilde{U}_1)\|_F\le\dfrac{\sqrt2\,\|E\|_F}{\eta_2}+O(\|E\|_F^2)$,

(3.5c)  $\|\sin\Theta(U_1,\widetilde{U}_1)\|_F\le\|E\|_F+\dfrac{2\|E\|_F}{\eta_c}+O(\|E\|_F^2)$.

⁴ By this we mean that $\|I-D\|_2\gg\|D^*-D^{-1}\|_2$.

Write $\eta_c=\gamma\eta_2$, where $1\le\gamma$, and let $\eta_2=\varrho_2(\lambda_s,\widetilde\lambda_t)$ be attained at $\lambda_s\in\lambda(\Lambda_1)$ and $\widetilde\lambda_t\in\lambda(\widetilde\Lambda_2)$. Then $\eta_c\le|\lambda_s-\widetilde\lambda_t|/|\widetilde\lambda_t|$ implies $|\lambda_s|/|\widetilde\lambda_t|\ge\sqrt{\gamma^2-1}$, and so $\eta_c\ge\sqrt{\gamma^2-1}-1$. The ratio of the right-hand side of (3.4c) over that of (3.5c) is (terms of $O(\|E\|_F^2)$ are ignored)

$\dfrac{\sqrt2/(\eta_c/\gamma)}{1+2/\eta_c}=\dfrac{\sqrt2\,\gamma}{\eta_c+2}$.

Now for $1\le\gamma\le\sqrt2$, this ratio is less than 1, which means that (3.4c) is sharper; for $\gamma>\sqrt2$, the ratio is bounded by

$\dfrac{\sqrt2\,\gamma}{\eta_c+2}\le\dfrac{\sqrt2\,\gamma}{\sqrt{\gamma^2-1}+1}\le\sqrt2$,

since $\sqrt2\,\gamma/(\sqrt{\gamma^2-1}+1)$ is monotonically increasing for $\sqrt2\le\gamma$, with limit $\sqrt2$. This means that when $I-D$ is tiny and $D$ is about equally close to a unitary matrix as to the identity matrix, (3.4) cannot be worse than (3.5) by a factor of more than $\sqrt2$.

Theorem 3.2. Let $A$ and $\widetilde{A}=D^*AD$ be two $n\times n$ Hermitian matrices with eigendecompositions (1.1)–(1.3), where $D$ is nonsingular. Assume that the spectra of $\Lambda_1$ and $\widetilde\Lambda_2$ distribute as in Figure 2.2. Then for any unitarily invariant norm ||| · |||,

(3.7)  |||sin Θ(U₁, Ũ₁)||| ≤ $\sqrt[q]{|||(I-D^{-1})U_1|||^q+|||(I-D^*)U_1|||^q}\,\big/\,\underline\eta_p$,

(3.8)  |||sin Θ(U₁, Ũ₁)||| ≤ |||$(I-D^*)U_1$||| + |||$(D^*-D^{-1})U_1$|||$/\underline\eta_c$,

where $\underline\eta_p\overset{\rm def}{=}\varrho_p(\alpha,\alpha+\delta)$ (with $1/p+1/q=1$) and

$\underline\eta_c\overset{\rm def}{=}\begin{cases}\delta/(\alpha+\delta)&\text{for Figure 2.2(a)},\\[2pt]\delta/\alpha&\text{for Figure 2.2(b)}.\end{cases}$

Proof. Lemma 2.3 and (3.1) yield (3.7), whereas Lemma 2.3, (3.2), and (3.3) yield (3.8).

Theorems 3.1 and 3.2 deal with rather restrictive perturbations to $A$. In what follows we show how similar ideas apply to a more realistic situation, in which a scaled $A$ is much better conditioned. Consider a nonnegative definite Hermitian matrix $A=S^*HS\in\mathbb{C}^{n\times n}$ perturbed to $\widetilde{A}=S^*\widetilde{H}S$, where $S$ is a scaling matrix and usually diagonal, though this is not necessary for the theorems below.
The elements of $S$ can vary wildly, and $S$ can even be singular. $H$ is nonsingular and usually much better conditioned than $A$ itself. Set $\Delta H\overset{\rm def}{=}\widetilde{H}-H$. Notice that

$A=S^*HS=(H^{1/2}S)^*H^{1/2}S$,
$\widetilde{A}=S^*H^{1/2}\bigl(I+H^{-1/2}(\Delta H)H^{-1/2}\bigr)H^{1/2}S=\Bigl(\bigl(I+H^{-1/2}(\Delta H)H^{-1/2}\bigr)^{1/2}H^{1/2}S\Bigr)^*\bigl(I+H^{-1/2}(\Delta H)H^{-1/2}\bigr)^{1/2}H^{1/2}S$.

Set

$B=S^*H^{1/2}$,  $\widetilde{B}=S^*H^{1/2}\bigl(I+H^{-1/2}(\Delta H)H^{-1/2}\bigr)^{1/2}\overset{\rm def}{=}BD$,

where $D=\bigl(I+H^{-1/2}(\Delta H)H^{-1/2}\bigr)^{1/2}$. Given the eigendecompositions (1.1)–(1.3) of $A$ and $\widetilde{A}$, it can be seen that $B$ and $\widetilde{B}$ admit the following SVDs:

$B=(U_1,U_2)\begin{pmatrix}\Lambda_1^{1/2}&\\&\Lambda_2^{1/2}\end{pmatrix}\begin{pmatrix}V_1^*\\V_2^*\end{pmatrix}$,  $\widetilde{B}=(\widetilde{U}_1,\widetilde{U}_2)\begin{pmatrix}\widetilde\Lambda_1^{1/2}&\\&\widetilde\Lambda_2^{1/2}\end{pmatrix}\begin{pmatrix}\widetilde{V}_1^*\\\widetilde{V}_2^*\end{pmatrix}$,

where the $U_i$ and $\widetilde{U}_i$ are the same as in (1.1), and $(V_1,V_2)$ and $(\widetilde{V}_1,\widetilde{V}_2)$ (with $V_1$ and $\widetilde{V}_1$ having $k$ columns) are unitary. We have

$\widetilde{A}-A=\widetilde{B}\widetilde{B}^*-BB^*=\widetilde{B}D^*B^*-\widetilde{B}D^{-1}B^*=\widetilde{B}(D^*-D^{-1})B^*$.

Pre- and post-multiply this equation by $\widetilde{U}_2^*$ and $U_1$, respectively, to get

(3.9)  $\widetilde\Lambda_2\widetilde{U}_2^*U_1-\widetilde{U}_2^*U_1\Lambda_1=\widetilde\Lambda_2^{1/2}\widetilde{V}_2^*(D^*-D^{-1})V_1\Lambda_1^{1/2}$,

a Sylvester equation. Notice that for any unitarily invariant norm,

|||Ṽ₂*(D* − D⁻¹)V₁||| ≤ |||D* − D⁻¹||| = |||$\bigl(I+H^{-1/2}(\Delta H)H^{-1/2}\bigr)^{1/2}-\bigl(I+H^{-1/2}(\Delta H)H^{-1/2}\bigr)^{-1/2}$||| = |||$\bigl(I+H^{-1/2}(\Delta H)H^{-1/2}\bigr)^{-1/2}H^{-1/2}(\Delta H)H^{-1/2}$||| ≤ $\|\bigl(I+H^{-1/2}(\Delta H)H^{-1/2}\bigr)^{-1/2}\|_2$ |||$H^{-1/2}(\Delta H)H^{-1/2}$||| ≤ $\dfrac{\|H^{-1}\|_2\,|||\Delta H|||}{\sqrt{1-\|H^{-1}\|_2\|\Delta H\|_2}}$.

Equation (3.9) makes the relative distance $\chi$ a natural choice. Lemmas 2.4 and 2.5, together with (3.9), produce the following two theorems.

Theorem 3.3. Let $A=S^*HS$ and $\widetilde{A}=S^*\widetilde{H}S$ be two $n\times n$ Hermitian matrices with eigendecompositions (1.1)–(1.3). Assume that $H$ is positive definite and $\|H^{-1}\|_2\|\Delta H\|_2<1$. If $\eta_\chi\overset{\rm def}{=}\min_{\mu\in\lambda(\Lambda_1),\,\widetilde\mu\in\lambda(\widetilde\Lambda_2)}\chi(\mu,\widetilde\mu)>0$, then

(3.10)  $\|\sin\Theta(U_1,\widetilde{U}_1)\|_F\le\dfrac{\|D-D^{-1}\|_F}{\eta_\chi}\le\dfrac{\|H^{-1}\|_2\,\|\Delta H\|_F}{\sqrt{1-\|H^{-1}\|_2\|\Delta H\|_2}\;\eta_\chi}$,

where $D=(I+H^{-1/2}(\Delta H)H^{-1/2})^{1/2}=D^*$.

Theorem 3.4. Let $A=S^*HS$ and $\widetilde{A}=S^*\widetilde{H}S$ be two $n\times n$ Hermitian matrices with eigendecompositions (1.1)–(1.3). Assume that $H$ is positive definite and $\|H^{-1}\|_2\|\Delta H\|_2<1$, and that the spectra of $\Lambda_1$ and $\widetilde\Lambda_2$ distribute as in Figure 2.2.
Then for any unitarily invariant norm ||| · |||,

(3.11)  |||sin Θ(U₁, Ũ₁)||| ≤ $\dfrac{|||D-D^{-1}|||}{\underline\eta_\chi}\le\dfrac{\|H^{-1}\|_2\,|||\Delta H|||}{\sqrt{1-\|H^{-1}\|_2\|\Delta H\|_2}\;\underline\eta_\chi}$,

where $\underline\eta_\chi\overset{\rm def}{=}\chi(\alpha,\alpha+\delta)$ and $D=(I+H^{-1/2}(\Delta H)H^{-1/2})^{1/2}=D^*$.

Remark 3.3. Our approach extends straightforwardly to diagonalizable matrices. Consider $A$ and $\widetilde{A}=D_1^*AD_2$, where $D_1$ and $D_2$ are nonsingular. Suppose that both $A$ and $\widetilde{A}$ are diagonalizable, and let

$A(X_1,X_2)=(X_1,X_2)\begin{pmatrix}\Lambda_1&\\&\Lambda_2\end{pmatrix}$  and  $\widetilde{A}(\widetilde{X}_1,\widetilde{X}_2)=(\widetilde{X}_1,\widetilde{X}_2)\begin{pmatrix}\widetilde\Lambda_1&\\&\widetilde\Lambda_2\end{pmatrix}$,

where $(X_1,X_2)$ and $(\widetilde{X}_1,\widetilde{X}_2)$ are nonsingular, $X_1$ and $\widetilde{X}_1$ have $k$ columns, and the $\Lambda_i$ and $\widetilde\Lambda_j$ are defined as in (1.2) and (1.3), with the $\lambda_i$'s and $\widetilde\lambda_j$'s possibly complex. Partition

$(X_1,X_2)^{-1}=\begin{pmatrix}Y_1^*\\Y_2^*\end{pmatrix}$  and  $(\widetilde{X}_1,\widetilde{X}_2)^{-1}=\begin{pmatrix}\widetilde{Y}_1^*\\\widetilde{Y}_2^*\end{pmatrix}$,

with $Y_1^*$ and $\widetilde{Y}_1^*$ having $k$ rows. Define $R\overset{\rm def}{=}\widetilde{A}X_1-X_1\Lambda_1=(\widetilde{A}-A)X_1$. We have

$\widetilde{Y}_2^*R=\widetilde{Y}_2^*\widetilde{A}X_1-\widetilde{Y}_2^*X_1\Lambda_1=\widetilde\Lambda_2\widetilde{Y}_2^*X_1-\widetilde{Y}_2^*X_1\Lambda_1$,
$\widetilde{Y}_2^*R=\widetilde{Y}_2^*\bigl[\widetilde{A}(I-D_2^{-1})+(D_1^*-I)A\bigr]X_1=\widetilde\Lambda_2\widetilde{Y}_2^*(I-D_2^{-1})X_1+\widetilde{Y}_2^*(D_1^*-I)X_1\Lambda_1$.

Thus we have the following perturbation equations:

(3.12)  $\widetilde\Lambda_2\widetilde{Y}_2^*X_1-\widetilde{Y}_2^*X_1\Lambda_1=\widetilde\Lambda_2\widetilde{Y}_2^*(I-D_2^{-1})X_1+\widetilde{Y}_2^*(D_1^*-I)X_1\Lambda_1$,
(3.13)  $\widetilde\Lambda_2Z-Z\Lambda_1=\widetilde\Lambda_2\widetilde{Y}_2^*(D_1^*-D_2^{-1})X_1$,

where $Z\overset{\rm def}{=}\widetilde{Y}_2^*D_1^*X_1=\widetilde{Y}_2^*X_1-\widetilde{Y}_2^*(I-D_1^*)X_1$, from which various bounds on $\sin\Theta(X_1,\widetilde{X}_1)$ can be derived under certain conditions. For example, let

$\eta_2\overset{\rm def}{=}\min_{\mu\in\lambda(\Lambda_1),\,\widetilde\mu\in\lambda(\widetilde\Lambda_2)}\varrho_2(\mu,\widetilde\mu)$.

If $\eta_2>0$, then by Lemma 2.2 we have

$\|\widetilde{Y}_2^*X_1\|_F\le\dfrac1{\eta_2}\sqrt{\|\widetilde{Y}_2^*(I-D_2^{-1})X_1\|_F^2+\|\widetilde{Y}_2^*(D_1^*-I)X_1\|_F^2}\le\dfrac1{\eta_2}\,\|\widetilde{Y}_2^*\|_2\|X_1\|_2\sqrt{\|I-D_2^{-1}\|_F^2+\|D_1^*-I\|_F^2}$.

Notice that by Lemma 2.1

$\|\sin\Theta(X_1,\widetilde{X}_1)\|_F=\|(\widetilde{Y}_2^*\widetilde{Y}_2)^{-1/2}\widetilde{Y}_2^*X_1(X_1^*X_1)^{-1/2}\|_F\le\|(\widetilde{Y}_2^*\widetilde{Y}_2)^{-1/2}\|_2\,\|\widetilde{Y}_2^*X_1\|_F\,\|(X_1^*X_1)^{-1/2}\|_2$.

Then a bound on $\|\sin\Theta(X_1,\widetilde{X}_1)\|_F$ is immediately available.

4. Relative perturbation theorems for singular subspace variation.
Let e be two m × n (m ≥ n) (complex) matrices with SVDs B and B (4.1) Σ1 B = (U1 , U2 ) 0 0 ∗ 0 V1 Σ2 V2∗ 0 e1 Σ e2 ) 0 e = (U e1 , U and B 0 à ! 0 e∗ V 1 e2 , Σ Ve2∗ 0 ¡ k m−k ¡ k m−k e1 U e2 are m × m unitary, and V = e = U1 U2 and U U where U = k n−k ¡ ¡ k n−k e2 are n × n unitary, 1 ≤ k < n, and V1 V2 and Ve = Ve1 U (4.2) Σ1 = diag(σ1 , . . . , σk ), Σ2 = diag(σk+1 , . . . , σn ), e 2 = diag(e e 1 = diag(e σ1 , . . . , σ ek ), Σ σk+1 , . . . , σ en ). Σ (4.3) Define residuals (4.4) def e e RR = BV 1 − U1 Σ1 = (B − B)V1 def e ∗ e ∗ − B ∗ )U1 . and RL = B U1 − V1 Σ1 = (B Our development for the singular value problems more or less resembles what we did for the eigenvalue problems. Most comments and comparisons we made in section 3 apply here, too, with perhaps a little change. Nonetheless there is a little bit of complication here, namely we shall work with two residuals, RR and RL , and end up with bounding solutions to two coupled matrix equations. The proofs of theorems in this section are relatively long and are postponed to the next section. e = D∗ BD2 be two m×n (m ≥ n) (complex) matrices Theorem 4.1. Let B and B 1 with SVDs (4.1)–(4.3), where D1 and D2 are nonsingular. Let def (4.5) η2 = min %2 (µ, µ e) e2 ) µ∈σ(Σ1 ), e µ∈σext (Σ and def ηc = min e2 ) µ∈σ(Σ1 ), e µ∈σext (Σ |µ − µ e| , |e µ| e 2 ) ≡ σ(Σ e 2 )∪{0} if m > n, and σext (Σ e 2 ) ≡ σ(Σ e 2 ) otherwise. If ηc , η2 > 0, where σext (Σ then (4.6) q e1 )k2 + k sin Θ(V1 , Ve1 )k2 k sin Θ(U1 , U F F q ∗ 2 k(I − D1 )U1 kF + k(I − D1−1 )U1 k2F + k(I − D2∗ )V1 k2F + k(I − D2−1 )V1 k2F ≤ , η2 (4.7) q e1 )k2 + k sin Θ(V1 , Ve1 )k2 ≤ k sin Θ(U1 , U F F + 1 ηc q k(I − D1∗ )U1 k2F + k(I − D2∗ )V1 k2F q k(D1∗ − D1−1 )U1 k2F + k(D2∗ − D2−1 )V1 k2F . 485 RELATIVE PERTURBATION THEORY II This theorem is an extension of a Wedin sin θ theorem [16], where no multiplicative e is assumed. 
Wedin proved

(4.8)  $\sqrt{\|\sin\Theta(U_1,\widetilde U_1)\|_F^2 + \|\sin\Theta(V_1,\widetilde V_1)\|_F^2} \le \dfrac{\sqrt{\|R_R\|_F^2 + \|R_L\|_F^2}}{\delta},$

where $\delta \overset{\rm def}{=} \min_{\mu\in\sigma(\Sigma_1),\,\widetilde\mu\in\sigma_{\rm ext}(\widetilde\Sigma_2)} |\mu-\widetilde\mu|$. Our other theorems in this section relate to Wedin's $\sin\theta$ theorems analogously.

Remark 4.1. Ghost singular values $0$ are appended to $\sigma(\widetilde\Sigma_2)$ when $m>n$. This is not necessary for the sensitivity of the $V$-factor alone; rather, it is the $U$-factor that depends on the ghost singular values. A finer analysis can be given to illustrate this point. To keep the theorem relatively concise, we shall not go into this matter further.

Theorem 4.2. Let $B$ and $\widetilde B = D_1^* B D_2$ be two $m\times n$ ($m\ge n$) (complex) matrices with SVDs (4.1)–(4.3), where $D_1$ and $D_2$ are nonsingular. If there exist $\alpha\ge 0$ and $\delta>0$ such that

$\min_{1\le i\le k}\sigma_i \ge \alpha+\delta$ and $\max_{1\le j\le n-k}\widetilde\sigma_{k+j} \le \alpha,$

then for any unitarily invariant norm $|||\cdot|||$

(4.9)  $\max\bigl\{|||\sin\Theta(U_1,\widetilde U_1)|||,\ |||\sin\Theta(V_1,\widetilde V_1)|||\bigr\} \le \dfrac{1}{\eta_p}\max\Bigl\{\sqrt[q]{|||(I-D_2^{-1})V_1|||^q + |||(D_1^*-I)U_1|||^q},\ \sqrt[q]{|||(I-D_1^{-1})U_1|||^q + |||(D_2^*-I)V_1|||^q}\Bigr\},$

(4.10)  $\left|\left|\left|\begin{pmatrix}\sin\Theta(U_1,\widetilde U_1)\\ \sin\Theta(V_1,\widetilde V_1)\end{pmatrix}\right|\right|\right| \le \dfrac{1}{\eta_p}\sqrt[q]{\left|\left|\left|\begin{pmatrix}(I-D_1^{-1})U_1\\ (I-D_2^{-1})V_1\end{pmatrix}\right|\right|\right|^q + \left|\left|\left|\begin{pmatrix}(D_1^*-I)U_1\\ (D_2^*-I)V_1\end{pmatrix}\right|\right|\right|^q},$

where $\eta_p \overset{\rm def}{=} \varrho_p(\alpha,\alpha+\delta)$, and

(4.11)  $\max\bigl\{|||\sin\Theta(U_1,\widetilde U_1)|||,\ |||\sin\Theta(V_1,\widetilde V_1)|||\bigr\} \le \max\bigl\{|||(I-D_1^*)U_1|||,\ |||(I-D_2^*)V_1|||\bigr\} + \dfrac{1}{\eta_c}\max\bigl\{|||(D_1^*-D_1^{-1})U_1|||,\ |||(D_2^*-D_2^{-1})V_1|||\bigr\},$

(4.12)  $\left|\left|\left|\begin{pmatrix}\sin\Theta(U_1,\widetilde U_1)\\ \sin\Theta(V_1,\widetilde V_1)\end{pmatrix}\right|\right|\right| \le \left|\left|\left|\begin{pmatrix}(I-D_1^*)U_1\\ (I-D_2^*)V_1\end{pmatrix}\right|\right|\right| + \dfrac{1}{\eta_c}\left|\left|\left|\begin{pmatrix}(D_1^*-D_1^{-1})U_1\\ (D_2^*-D_2^{-1})V_1\end{pmatrix}\right|\right|\right|,$

where $\eta_c \overset{\rm def}{=} \delta/\alpha$.

We state both (4.9) and (4.10), and both (4.11) and (4.12), for theoretical reasons: (4.9) differs from (4.10) in many ways from the theoretical point of view, and the two are independent of each other.

Remark 4.2. Intuitively, $D_2$ should not affect $\mathcal R(U_1)$ much as long as it is close to a unitary matrix. In fact, if $D_2$ is unitary it does not affect $\mathcal R(U_1)$ at all. This suggests that when one or both $D_i$'s are closer to unitary matrices (see footnote 5) than to the identity matrix, better bounds may be possible.
Li [13] indeed presented theorems showing that $D_1$ contributes to $\sin\Theta(V_1,\widetilde V_1)$ by its departure from some unitary matrix rather than from the identity matrix, and that similar conclusions hold for $D_2$ and $\sin\Theta(U_1,\widetilde U_1)$. The reader is referred to Li [13] for details.

Remark 4.3. Theorems 4.1 and 4.2, applied to a special case [13] in which one of the $D_i$'s is the identity matrix and the other one takes the form $\begin{pmatrix}I & X\\ & I\end{pmatrix}$, yield a deflation criterion that triggers sooner and is cheaper in the Demmel–Kahan QR algorithm [5] for computing the singular value system of a bidiagonal matrix; see [10].

Theorems 4.1 and 4.2 deal with rather restrictive perturbations to $B$. In what follows we show how to apply them to a more realistic situation in which a scaled $B$ is much better conditioned. Consider $B = GS \in \mathbb{C}^{n\times n}$, which is perturbed to $\widetilde B = \widetilde G S \in \mathbb{C}^{n\times n}$, where $S$ is a scaling matrix and $G$ is nonsingular. Interesting cases are when $G$ is much better conditioned than $B$. Set $\Delta G \overset{\rm def}{=} \widetilde G - G$. If $\|(\Delta G)G^{-1}\|_2 < 1$, then $\widetilde G = G + \Delta G = [I + (\Delta G)G^{-1}]G$ is nonsingular also.

Theorem 4.3. Let $B = GS \in \mathbb{C}^{n\times n}$ and $\widetilde B = \widetilde G S \in \mathbb{C}^{n\times n}$ with SVDs (4.1)–(4.3), where $G$ is nonsingular. Assume $\|(\Delta G)G^{-1}\|_2 < 1$. If

$\eta_2 \overset{\rm def}{=} \min_{\mu\in\sigma(\Sigma_1),\,\widetilde\mu\in\sigma(\widetilde\Sigma_2)} \varrho_2(\mu,\widetilde\mu) > 0,$

then

(4.13)  $\sqrt{\|\sin\Theta(U_1,\widetilde U_1)\|_F^2 + \|\sin\Theta(V_1,\widetilde V_1)\|_F^2} \le \dfrac{\sqrt{\|(\Delta G)G^{-1}U_1\|_F^2 + \|[I+G^{-*}(\Delta G)^*]^{-1}G^{-*}(\Delta G)^* U_1\|_F^2}}{\eta_2} \le \|G^{-1}\|_2\sqrt{1 + \dfrac{1}{(1-\|G^{-1}\|_2\|\Delta G\|_2)^2}}\,\dfrac{\|\Delta G\|_F}{\eta_2}.$

Proof. Write $\widetilde B = \widetilde G S = [I+(\Delta G)G^{-1}]GS = D_1^* B D_2$, where $D_1^* = I+(\Delta G)G^{-1}$ and $D_2 = I$. Apply Theorem 4.1 to get (4.13).

Theorem 4.4. Let $B = GS \in \mathbb{C}^{n\times n}$ and $\widetilde B = \widetilde G S \in \mathbb{C}^{n\times n}$ with SVDs (4.1)–(4.3), where $G$ is nonsingular. Assume $\|(\Delta G)G^{-1}\|_2 < 1$. If there exist $\alpha\ge 0$ and $\delta>0$ such that

$\min_{\mu\in\sigma(\Sigma_1)} \mu \ge \alpha+\delta$ and $\max_{\widetilde\mu\in\sigma(\widetilde\Sigma_2)} \widetilde\mu \le \alpha$, or $\min_{\mu\in\sigma(\Sigma_1)} \mu \le \alpha$ and $\max_{\widetilde\mu\in\sigma(\widetilde\Sigma_2)} \widetilde\mu \ge \alpha+\delta,$

Footnote 5. When both $D_1$ and $D_2$ are unitary, $\widetilde B = D_1^* B D_2 = D_1^* U\Sigma V^* D_2$ is an SVD of $\widetilde B$, which implies $\widetilde U = D_1^* U$ and $\widetilde V = D_2^* V$.
Thus perturbations to singular subspaces, in this case, are independent of the gap between $\sigma(\Sigma_1)$ and $\sigma(\widetilde\Sigma_2)$, as long as they are disjoint.

Then, for any unitarily invariant norm $|||\cdot|||$,

(4.14)  $\max\bigl\{|||\sin\Theta(U_1,\widetilde U_1)|||,\ |||\sin\Theta(V_1,\widetilde V_1)|||\bigr\} \le \dfrac{\max\bigl\{|||(\Delta G)G^{-1}U_1|||,\ |||[I+G^{-*}(\Delta G)^*]^{-1}G^{-*}(\Delta G)^* U_1|||\bigr\}}{\eta_\infty} \le \dfrac{\|G^{-1}\|_2\,|||\Delta G|||}{(1-\|G^{-1}\|_2\|\Delta G\|_2)\,\eta_\infty},$

(4.15)  $\left|\left|\left|\begin{pmatrix}\sin\Theta(U_1,\widetilde U_1)\\ \sin\Theta(V_1,\widetilde V_1)\end{pmatrix}\right|\right|\right| \le \dfrac{\sqrt[q]{|||(\Delta G)G^{-1}U_1|||^q + |||[I+G^{-*}(\Delta G)^*]^{-1}G^{-*}(\Delta G)^* U_1|||^q}}{\eta_p} \le \|G^{-1}\|_2\sqrt[q]{1 + \dfrac{1}{(1-\|G^{-1}\|_2\|\Delta G\|_2)^q}}\,\dfrac{|||\Delta G|||}{\eta_p}.$

Proof. Again write $\widetilde B = \widetilde G S = [I+(\Delta G)G^{-1}]GS = D_1^* B D_2$, where $D_1^* = I+(\Delta G)G^{-1}$ and $D_2 = I$. Apply Theorem 4.2 to get (4.14) and (4.15).

Remark 4.4. Better bounds, especially when $(\Delta G)G^{-1}$ is nearly a skew-Hermitian matrix, can be proved for the angle $\Theta(V_1,\widetilde V_1)$. To prevent the paper from getting too long, we refer the reader to Li [13].

Remark 4.5. Theorems 4.3 and 4.4 can be extended to cover nonsquare matrices. Assume $B = GS$ and $\widetilde B = \widetilde G S$ are $m\times n$ ($m\ge n$); $S$ is a scaling matrix, both $G$ and $\widetilde G$ are $m\times n$, and $G$ has full column rank. Let $G^\dagger = (G^*G)^{-1}G^*$ be the pseudoinverse of $G$. Notice that $G^\dagger G = I$. Then

$\widetilde B = \widetilde G S = (G+\Delta G)S = (I+(\Delta G)G^\dagger)GS = (I+(\Delta G)G^\dagger)B.$

If $\|(\Delta G)G^\dagger\|_2 \le \|G^\dagger\|_2\|\Delta G\|_2 < 1$, then $\widetilde G$ has full column rank, too. Now applying Theorems 4.1 and 4.2, we find that Theorems 4.3 and 4.4 remain valid with $G^{-1}$ replaced by $G^\dagger$, and $\sigma(\widetilde\Sigma_2)$ by $\sigma_{\rm ext}(\widetilde\Sigma_2)$.

5. Proofs of Theorems 4.1 and 4.2. We can always augment $B$ and $\widetilde B$ by $m\times(m-n)$ zero blocks on their right to make them square. The augmented $B$ and $\widetilde B$ have straightforward SVDs based on those of $B$ and $\widetilde B$. It turns out that doing so does not affect the $U$-factors, and the $V$-factors are affected in a trivial way such that $|||\sin\Theta(V_1,\widetilde V_1)|||$ stays the same; see [13]. In what follows we shall deal with the square case only.
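Before turning to the proofs, Theorem 4.3 can be sanity-checked numerically on a graded matrix $B = GS$: even when the scaling $S$ spans many orders of magnitude, the final bound in (4.13) is governed by $G$ and $\Delta G$ alone. A minimal sketch with hypothetical sizes and scales (NumPy assumed; real square case, $\varrho_2(\mu,\widetilde\mu)=|\mu-\widetilde\mu|/\sqrt{\mu^2+\widetilde\mu^2}$ as in [12]):

```python
# Numerical sanity check of Theorem 4.3: B = G S with a heavily graded scaling S
# but a well-conditioned G; the bound (4.13) does not see the grading of S.
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 2

G = np.eye(n) + 0.3 * rng.standard_normal((n, n))   # well conditioned
S = np.diag(np.logspace(0, -8, n))                   # grading over 8 decades
dG = 1e-4 * rng.standard_normal((n, n))
B, Bt = G @ S, (G + dG) @ S

U, s, Vh = np.linalg.svd(B)
Ut, st, Vth = np.linalg.svd(Bt)
lhs = np.sqrt(np.linalg.norm(Ut[:, k:].T @ U[:, :k], 'fro')**2 +
              np.linalg.norm(Vth.T[:, k:].T @ Vh.T[:, :k], 'fro')**2)

eta2 = min(abs(m - mt) / np.hypot(m, mt) for m in s[:k] for mt in st[k:])
Ginv = np.linalg.inv(G)
nrm2 = np.linalg.norm(Ginv, 2) * np.linalg.norm(dG, 2)   # must be < 1
rhs = (np.linalg.norm(Ginv, 2) *
       np.sqrt(1 + 1 / (1 - nrm2)**2) * np.linalg.norm(dG, 'fro') / eta2)
assert nrm2 < 1 and lhs <= rhs
```

Although the singular values of $B$ here range over eight decades, the right-hand side involves only $\|G^{-1}\|_2$, $\|\Delta G\|$, and the relative gap, as the theorem asserts.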
Let $R_R = \widetilde B V_1 - U_1\Sigma_1 = (\widetilde B - B)V_1$ and $R_L = \widetilde B^* U_1 - V_1\Sigma_1 = (\widetilde B^* - B^*)U_1$. When $m=n$, the SVDs (4.1)–(4.3) read

(5.1)  $B = (U_1,U_2)\begin{pmatrix}\Sigma_1 & \\ & \Sigma_2\end{pmatrix}\begin{pmatrix}V_1^*\\ V_2^*\end{pmatrix}$ and $\widetilde B = (\widetilde U_1,\widetilde U_2)\begin{pmatrix}\widetilde\Sigma_1 & \\ & \widetilde\Sigma_2\end{pmatrix}\begin{pmatrix}\widetilde V_1^*\\ \widetilde V_2^*\end{pmatrix}.$

Notice that

$\widetilde U_2^* R_R = \widetilde U_2^*\widetilde B V_1 - \widetilde U_2^* U_1\Sigma_1 = \widetilde\Sigma_2\widetilde V_2^* V_1 - \widetilde U_2^* U_1\Sigma_1,$

$\widetilde U_2^* R_R = \widetilde U_2^*\bigl[\widetilde B(I-D_2^{-1}) + (D_1^*-I)B\bigr]V_1 = \widetilde\Sigma_2\widetilde V_2^*(I-D_2^{-1})V_1 + \widetilde U_2^*(D_1^*-I)U_1\Sigma_1$

to get

(5.2)  $\widetilde\Sigma_2\widetilde V_2^* V_1 - \widetilde U_2^* U_1\Sigma_1 = \widetilde\Sigma_2\widetilde V_2^*(I-D_2^{-1})V_1 + \widetilde U_2^*(D_1^*-I)U_1\Sigma_1.$

On the other hand,

$\widetilde V_2^* R_L = \widetilde V_2^*\widetilde B^* U_1 - \widetilde V_2^* V_1\Sigma_1 = \widetilde\Sigma_2\widetilde U_2^* U_1 - \widetilde V_2^* V_1\Sigma_1,$

$\widetilde V_2^* R_L = \widetilde V_2^*\bigl[\widetilde B^*(I-D_1^{-1}) + (D_2^*-I)B^*\bigr]U_1 = \widetilde\Sigma_2\widetilde U_2^*(I-D_1^{-1})U_1 + \widetilde V_2^*(D_2^*-I)V_1\Sigma_1,$

which produce

(5.3)  $\widetilde\Sigma_2\widetilde U_2^* U_1 - \widetilde V_2^* V_1\Sigma_1 = \widetilde\Sigma_2\widetilde U_2^*(I-D_1^{-1})U_1 + \widetilde V_2^*(D_2^*-I)V_1\Sigma_1.$

Equations (5.2) and (5.3) take an equivalent form as a single matrix equation with dimensions doubled:

(5.4)  $\begin{pmatrix} & \widetilde\Sigma_2\\ \widetilde\Sigma_2 & \end{pmatrix}\begin{pmatrix}\widetilde U_2^* U_1\\ \widetilde V_2^* V_1\end{pmatrix} - \begin{pmatrix}\widetilde U_2^* U_1\\ \widetilde V_2^* V_1\end{pmatrix}\Sigma_1 = \begin{pmatrix} & \widetilde\Sigma_2\\ \widetilde\Sigma_2 & \end{pmatrix}\begin{pmatrix}\widetilde U_2^*(I-D_1^{-1})U_1\\ \widetilde V_2^*(I-D_2^{-1})V_1\end{pmatrix} + \begin{pmatrix}\widetilde U_2^*(D_1^*-I)U_1\\ \widetilde V_2^*(D_2^*-I)V_1\end{pmatrix}\Sigma_1.$

Equations (5.2)–(5.4) can also be rearranged in such a way that sharper bounds can be proved when the $D_i$'s are closer to unitary matrices than to the identity matrix. Write

$X \overset{\rm def}{=} \widetilde V_2^* D_2^* V_1 = \widetilde V_2^* V_1 - \widetilde V_2^*(I-D_2^*)V_1$ and $Y \overset{\rm def}{=} \widetilde U_2^* D_1^* U_1 = \widetilde U_2^* U_1 - \widetilde U_2^*(I-D_1^*)U_1.$

We have, from (5.2) and (5.3),

(5.5)  $\widetilde\Sigma_2 X - Y\Sigma_1 = \widetilde\Sigma_2\widetilde V_2^*(D_2^*-D_2^{-1})V_1,$

(5.6)  $\widetilde\Sigma_2 Y - X\Sigma_1 = \widetilde\Sigma_2\widetilde U_2^*(D_1^*-D_1^{-1})U_1,$

and, from (5.4),

(5.7)  $\begin{pmatrix} & \widetilde\Sigma_2\\ \widetilde\Sigma_2 & \end{pmatrix}\begin{pmatrix}Y\\ X\end{pmatrix} - \begin{pmatrix}Y\\ X\end{pmatrix}\Sigma_1 = \begin{pmatrix} & \widetilde\Sigma_2\\ \widetilde\Sigma_2 & \end{pmatrix}\begin{pmatrix}\widetilde U_2^*(D_1^*-D_1^{-1})U_1\\ \widetilde V_2^*(D_2^*-D_2^{-1})V_1\end{pmatrix}.$

Equations (5.2)–(5.4) make $\varrho_p$ a natural choice for measuring the relative gaps between $\sigma(\Sigma_1)$ and $\sigma(\widetilde\Sigma_2)$, while (5.5)–(5.7) make the classical measurement (2.2) a natural choice.

Remark 5.1. Just as in Remark 3.1, the perturbation equations (5.2)–(5.7) can be modified to bound the closeness of the singular subspaces spanned by a few selected columns of $U_1$ and $V_1$ to $\mathcal R(\widetilde U_1)$ and $\mathcal R(\widetilde V_1)$.

Proof of Theorem 4.1.
Notice that the eigenvalues of $\begin{pmatrix} & \widetilde\Sigma_2\\ \widetilde\Sigma_2 & \end{pmatrix}$ are $\pm\widetilde\sigma_{k+j}$, that those of $\begin{pmatrix} & \Sigma_1\\ \Sigma_1 & \end{pmatrix}$ are $\pm\sigma_i$, and that

$\varrho_2(\sigma_i,-\widetilde\sigma_{k+j}) \ge \varrho_2(\sigma_i,\widetilde\sigma_{k+j})$ and $\varrho_2(-\sigma_i,\widetilde\sigma_{k+j}) \ge \varrho_2(\sigma_i,\widetilde\sigma_{k+j}).$

By Lemma 2.2 and (5.4), we have

$\|\widetilde U_2^* U_1\|_F^2 + \|\widetilde V_2^* V_1\|_F^2 \le \frac{1}{\eta_2^2}\Bigl[\|\widetilde U_2^*(I-D_1^{-1})U_1\|_F^2 + \|\widetilde U_2^*(D_1^*-I)U_1\|_F^2 + \|\widetilde V_2^*(I-D_2^{-1})V_1\|_F^2 + \|\widetilde V_2^*(D_2^*-I)V_1\|_F^2\Bigr] \le \frac{1}{\eta_2^2}\Bigl[\|(I-D_1^{-1})U_1\|_F^2 + \|(D_1^*-I)U_1\|_F^2 + \|(I-D_2^{-1})V_1\|_F^2 + \|(D_2^*-I)V_1\|_F^2\Bigr],$

which gives (4.6). By Lemma 2.2 and (5.7), we have

$\sqrt{\|X\|_F^2 + \|Y\|_F^2} \le \frac{1}{\eta_c}\sqrt{\|\widetilde U_2^*(D_1^*-D_1^{-1})U_1\|_F^2 + \|\widetilde V_2^*(D_2^*-D_2^{-1})V_1\|_F^2} \le \frac{1}{\eta_c}\sqrt{\|(D_1^*-D_1^{-1})U_1\|_F^2 + \|(D_2^*-D_2^{-1})V_1\|_F^2},$

which, together with

$\sqrt{\|\widetilde V_2^* V_1\|_F^2 + \|\widetilde U_2^* U_1\|_F^2} = \sqrt{\|X + \widetilde V_2^*(I-D_2^*)V_1\|_F^2 + \|Y + \widetilde U_2^*(I-D_1^*)U_1\|_F^2} \le \sqrt{\|X\|_F^2 + \|Y\|_F^2} + \sqrt{\|\widetilde V_2^*(I-D_2^*)V_1\|_F^2 + \|\widetilde U_2^*(I-D_1^*)U_1\|_F^2},$

imply (4.7).

Remark 5.2. Without assuming the multiplicative structure in the perturbation of $B$ to $\widetilde B$, we end up with

$\widetilde\Sigma_2\widetilde V_2^* V_1 - \widetilde U_2^* U_1\Sigma_1 = \widetilde U_2^* R_R$ and $\widetilde\Sigma_2\widetilde U_2^* U_1 - \widetilde V_2^* V_1\Sigma_1 = \widetilde V_2^* R_L,$

which lead to Wedin $\sin\theta$ theorems, e.g., (4.8).

Lemma 5.1. Let $\Omega\in\mathbb{C}^{s\times s}$ and $\Gamma\in\mathbb{C}^{t\times t}$ be two Hermitian matrices, and let $E, \widetilde E, F, \widetilde F \in \mathbb{C}^{s\times t}$. If there exist $\alpha\ge 0$ and $\delta>0$ such that

(5.8)  $\|\Omega\|_2 \le \alpha$ and $\|\Gamma^{-1}\|_2^{-1} \ge \alpha+\delta$

or

(5.9)  $\|\Omega^{-1}\|_2^{-1} \ge \alpha+\delta$ and $\|\Gamma\|_2 \le \alpha,$

then $\Omega X - Y\Gamma = \Omega E + F\Gamma$ and $\Omega Y - X\Gamma = \Omega\widetilde E + \widetilde F\Gamma$ have a unique solution $X, Y \in \mathbb{C}^{s\times t}$, and moreover for any unitarily invariant norm $|||\cdot|||$,

(5.10)  $\max\{|||X|||,|||Y|||\} \le \dfrac{1}{\eta_p}\max\Bigl\{\sqrt[q]{|||E|||^q + |||F|||^q},\ \sqrt[q]{|||\widetilde E|||^q + |||\widetilde F|||^q}\Bigr\},$

where $\eta_p \overset{\rm def}{=} \varrho_p(\alpha,\alpha+\delta)$. If, in addition, $F = \widetilde F = 0$, we have a better bound

$\max\{|||X|||,|||Y|||\} \le \dfrac{1}{\eta_c}\max\{|||E|||,|||\widetilde E|||\},$

where $\eta_c = \delta/\alpha$ when (5.8) holds and $\eta_c = \delta/(\alpha+\delta)$ when (5.9) holds.

Proof. The proof of the existence and uniqueness of $X, Y \in \mathbb{C}^{s\times t}$ is left to the reader. We present a proof of (5.10) for the case (5.8); a proof for the other case is analogous.
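In practice, the coupled pair in Lemma 5.1 can be solved directly: adding and subtracting the two equations decouples them into two ordinary Sylvester equations in $S = X+Y$ and $D = X-Y$. A minimal sketch with hypothetical random data, assuming SciPy's `solve_sylvester` (which solves $AX + XB = Q$) is available:

```python
# The coupled pair  Omega X - Y Gamma = C1,  Omega Y - X Gamma = C2  (Lemma 5.1
# with C1 = Omega E + F Gamma etc.) decouples under S = X + Y, D = X - Y into
# two ordinary Sylvester equations.  Hypothetical random data.
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(2)
s, t = 3, 4
W = rng.standard_normal((s, s)); Omega = 0.05 * (W + W.T)             # ||Omega||_2 small
W = rng.standard_normal((t, t)); Gamma = np.eye(t) + 0.05 * (W + W.T)  # eigenvalues near 1
C1 = rng.standard_normal((s, t)); C2 = rng.standard_normal((s, t))

Ssum = solve_sylvester(Omega, -Gamma, C1 + C2)   # Omega*S - S*Gamma = C1 + C2
Dif  = solve_sylvester(Omega,  Gamma, C1 - C2)   # Omega*D + D*Gamma = C1 - C2
X, Y = (Ssum + Dif) / 2, (Ssum - Dif) / 2

assert np.allclose(Omega @ X - Y @ Gamma, C1)
assert np.allclose(Omega @ Y - X @ Gamma, C2)
```

The spectra of $\Omega$ and $\pm\Gamma$ are disjoint here (mirroring condition (5.8)), which is exactly what makes each decoupled Sylvester equation uniquely solvable.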
Consider first the subcase $|||X||| \ge |||Y|||$. Postmultiply $\Omega Y - X\Gamma = \Omega\widetilde E + \widetilde F\Gamma$ by $\Gamma^{-1}$ to get

(5.11)  $\Omega Y\Gamma^{-1} - X = \Omega\widetilde E\Gamma^{-1} + \widetilde F.$

Then we have, by $\|\Omega\|_2 \le \alpha$ and $\|\Gamma^{-1}\|_2^{-1} \ge \alpha+\delta$ (so $\|\Gamma^{-1}\|_2 \le \frac{1}{\alpha+\delta}$), that

$|||\Omega Y\Gamma^{-1} - X||| \ge |||X||| - |||\Omega Y\Gamma^{-1}||| \ge |||X||| - \|\Omega\|_2\,|||Y|||\,\|\Gamma^{-1}\|_2 \ge |||X||| - \frac{\alpha}{\alpha+\delta}|||Y||| \ge |||X||| - \frac{\alpha}{\alpha+\delta}|||X||| = \Bigl(1-\frac{\alpha}{\alpha+\delta}\Bigr)|||X|||$

and

$|||\Omega\widetilde E\Gamma^{-1} + \widetilde F||| \le |||\Omega\widetilde E\Gamma^{-1}||| + |||\widetilde F||| \le \|\Omega\|_2\,|||\widetilde E|||\,\|\Gamma^{-1}\|_2 + |||\widetilde F||| \le \frac{\alpha}{\alpha+\delta}|||\widetilde E||| + |||\widetilde F||| \le \sqrt[p]{1 + \frac{\alpha^p}{(\alpha+\delta)^p}}\;\sqrt[q]{|||\widetilde E|||^q + |||\widetilde F|||^q}.$

By (5.11), we deduce that

$\Bigl(1-\frac{\alpha}{\alpha+\delta}\Bigr)|||X||| \le \sqrt[p]{1 + \frac{\alpha^p}{(\alpha+\delta)^p}}\;\sqrt[q]{|||\widetilde E|||^q + |||\widetilde F|||^q},$

which produces: if $|||X||| \ge |||Y|||$, then $|||X||| \le \frac{1}{\eta_p}\sqrt[q]{|||\widetilde E|||^q + |||\widetilde F|||^q}$. Similarly, if $|||X||| < |||Y|||$, then from $\Omega X - Y\Gamma = \Omega E + F\Gamma$ we can obtain $|||Y||| \le \frac{1}{\eta_p}\sqrt[q]{|||E|||^q + |||F|||^q}$. Inequality (5.10) now follows. The case $F = \widetilde F = 0$ can be handled analogously.

Proof of Theorem 4.2. By (5.2) and (5.3) and Lemma 5.1, we have

$\max\bigl\{|||\widetilde U_2^* U_1|||,\ |||\widetilde V_2^* V_1|||\bigr\} \le \frac{1}{\eta_p}\max\Bigl\{\sqrt[q]{|||\widetilde V_2^*(I-D_2^{-1})V_1|||^q + |||\widetilde U_2^*(D_1^*-I)U_1|||^q},\ \sqrt[q]{|||\widetilde U_2^*(I-D_1^{-1})U_1|||^q + |||\widetilde V_2^*(D_2^*-I)V_1|||^q}\Bigr\} \le \frac{1}{\eta_p}\max\Bigl\{\sqrt[q]{|||(I-D_2^{-1})V_1|||^q + |||(D_1^*-I)U_1|||^q},\ \sqrt[q]{|||(I-D_1^{-1})U_1|||^q + |||(D_2^*-I)V_1|||^q}\Bigr\},$

as required for (4.9). Lemma 2.3 and (5.4) yield that

(5.12)  $\left|\left|\left|\begin{pmatrix}\widetilde U_2^* U_1\\ \widetilde V_2^* V_1\end{pmatrix}\right|\right|\right| \le \frac{1}{\eta_p}\sqrt[q]{\left|\left|\left|\begin{pmatrix}\widetilde U_2^*(I-D_1^{-1})U_1\\ \widetilde V_2^*(I-D_2^{-1})V_1\end{pmatrix}\right|\right|\right|^q + \left|\left|\left|\begin{pmatrix}\widetilde U_2^*(D_1^*-I)U_1\\ \widetilde V_2^*(D_2^*-I)V_1\end{pmatrix}\right|\right|\right|^q},$

since the conditions of Theorem 4.2 imply

$\left\|\begin{pmatrix} & \widetilde\Sigma_2\\ \widetilde\Sigma_2 & \end{pmatrix}\right\|_2 \le \alpha$ and $\left\|\begin{pmatrix} & \Sigma_1\\ \Sigma_1 & \end{pmatrix}^{-1}\right\|_2 \le \frac{1}{\alpha+\delta}.$

Since $\widetilde U_2^* U_1$ and $\sin\Theta(U_1,\widetilde U_1)$ have the same nonzero singular values, and so do $\widetilde V_2^* V_1$ and $\sin\Theta(V_1,\widetilde V_1)$,

(5.13)  $\left|\left|\left|\begin{pmatrix}\sin\Theta(U_1,\widetilde U_1)\\ \sin\Theta(V_1,\widetilde V_1)\end{pmatrix}\right|\right|\right| = \left|\left|\left|\begin{pmatrix}\widetilde U_2^* U_1\\ \widetilde V_2^* V_1\end{pmatrix}\right|\right|\right|.$

Note also

$\begin{pmatrix}\widetilde U_2^*(I-D_1^{-1})U_1\\ \widetilde V_2^*(I-D_2^{-1})V_1\end{pmatrix} = \begin{pmatrix}\widetilde U_2^* & \\ & \widetilde V_2^*\end{pmatrix}\begin{pmatrix}(I-D_1^{-1})U_1\\ (I-D_2^{-1})V_1\end{pmatrix}$ and $\begin{pmatrix}\widetilde U_2^*(D_1^*-I)U_1\\ \widetilde V_2^*(D_2^*-I)V_1\end{pmatrix} = \begin{pmatrix}\widetilde U_2^* & \\ & \widetilde V_2^*\end{pmatrix}\begin{pmatrix}(D_1^*-I)U_1\\ (D_2^*-I)V_1\end{pmatrix}.$

Thus we arrive at

(5.14)  $\left|\left|\left|\begin{pmatrix}\widetilde U_2^*(I-D_1^{-1})U_1\\ \widetilde V_2^*(I-D_2^{-1})V_1\end{pmatrix}\right|\right|\right| \le \left|\left|\left|\begin{pmatrix}(I-D_1^{-1})U_1\\ (I-D_2^{-1})V_1\end{pmatrix}\right|\right|\right|,$

(5.15)  $\left|\left|\left|\begin{pmatrix}\widetilde U_2^*(D_1^*-I)U_1\\ \widetilde V_2^*(D_2^*-I)V_1\end{pmatrix}\right|\right|\right| \le \left|\left|\left|\begin{pmatrix}(D_1^*-I)U_1\\ (D_2^*-I)V_1\end{pmatrix}\right|\right|\right|.$

Inequality (4.10) is a consequence of (5.12)–(5.15). By (5.5) and (5.6) and Lemma 5.1, we have

$\max\{|||X|||,|||Y|||\} \le \frac{1}{\eta_c}\max\bigl\{|||\widetilde V_2^*(D_2^*-D_2^{-1})V_1|||,\ |||\widetilde U_2^*(D_1^*-D_1^{-1})U_1|||\bigr\} \le \frac{1}{\eta_c}\max\bigl\{|||(D_2^*-D_2^{-1})V_1|||,\ |||(D_1^*-D_1^{-1})U_1|||\bigr\}.$

Notice that

$|||\widetilde V_2^* V_1||| \le |||X||| + |||\widetilde V_2^*(I-D_2^*)V_1|||$ and $|||\widetilde U_2^* U_1||| \le |||Y||| + |||\widetilde U_2^*(I-D_1^*)U_1|||.$

Inequality (4.11) now follows. Similarly, apply Lemma 2.3 to (5.7) to get (4.12).

6. Conclusions. We have developed a relative perturbation theory for eigenspace and singular subspace variations under multiplicative perturbations. In this theory, the Davis–Kahan $\sin\theta$ theorems and Wedin $\sin\theta$ theorems from the classical perturbation theory are extended. Our unifying treatment covers almost all previously studied cases over the last six years or so and yet produces sharper bounds. Straightforward extensions of the underlying theory of this paper to diagonalizable matrices are outlined.

The theory is built upon bounds on solutions to various Sylvester equations with structured right-hand sides. This provides technical links to the classical Davis and Kahan development for eigenvalue problems and to Wedin's development for singular value problems. Although these equations are used as tools in this paper to study eigenspace and singular subspace variations, we believe they deserve at least as much attention as the bounds they lead to.

Acknowledgment. I thank Professor W. Kahan for his consistent encouragement and support, and Professor J. Demmel and Professor B. N. Parlett for helpful discussions. The referees' constructive comments for improving the presentation are greatly appreciated.

REFERENCES

[1] J. Barlow and J. Demmel, Computing accurate eigensystems of scaled diagonally dominant matrices, SIAM J. Numer. Anal., 27 (1990), pp. 762–791.
[2] R. Bhatia, C. Davis, and A.
McIntosh, Perturbation of spectral subspaces and solution of linear operator equations, Linear Algebra Appl., 52–53 (1983), pp. 45–67.
[3] C. Davis and W. Kahan, The rotation of eigenvectors by a perturbation. III, SIAM J. Numer. Anal., 7 (1970), pp. 1–46.
[4] J. Demmel and W. Gragg, On computing accurate singular values and eigenvalues of matrices with acyclic graphs, Linear Algebra Appl., 185 (1993), pp. 203–217.
[5] J. Demmel and W. Kahan, Accurate singular values of bidiagonal matrices, SIAM J. Sci. Statist. Comput., 11 (1990), pp. 873–912.
[6] J. Demmel and K. Veselić, Jacobi's method is more accurate than QR, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 1204–1245.
[7] S. C. Eisenstat and I. C. F. Ipsen, Relative perturbation bounds for eigenspaces and singular vector subspaces, in Proceedings of the Fifth SIAM Conference on Applied Linear Algebra, J. G. Lewis, ed., SIAM, Philadelphia, 1994, pp. 62–66.
[8] S. C. Eisenstat and I. C. F. Ipsen, Relative perturbation techniques for singular value problems, SIAM J. Numer. Anal., 32 (1995), pp. 1972–1988.
[9] W. Kahan, Accurate Eigenvalues of a Symmetric Tridiagonal Matrix, Technical Report CS41, Computer Science Department, Stanford University, Stanford, CA, 1966 (revised June 1968).
[10] R.-C. Li, On Deflating Bidiagonal Matrices, manuscript, Department of Mathematics, University of California, Berkeley, CA, 1994.
[11] R.-C. Li, On perturbations of matrix pencils with real spectra, Math. Comp., 62 (1994), pp. 231–265.
[12] R.-C. Li, Relative perturbation theory: (I) Eigenvalue and singular value variations, SIAM J. Matrix Anal. Appl., 19 (1998), pp. 956–982.
[13] R.-C. Li, Relative Perturbation Theory: (II) Eigenspace and Singular Subspace Variations, Technical Report UCB//CSD-94-856, Computer Science Division, Department of EECS, University of California at Berkeley, 1994; also LAPACK Working Note 85 (revised January 1996 and April 1996, available at http://www.netlib.org/lapack/lawns/lawn85.ps).
[14] R.
Mathias, Spectral perturbation bounds for positive definite matrices, SIAM J. Matrix Anal. Appl., 18 (1997), pp. 959–980.
[15] G. W. Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press, Boston, 1990.
[16] P.-Å. Wedin, Perturbation bounds in connection with singular value decomposition, BIT, 12 (1972), pp. 99–111.
[17] P.-Å. Wedin, On angles between subspaces, in Matrix Pencils, B. Kågström and A. Ruhe, eds., Springer-Verlag, New York, 1983, pp. 263–285.