SIAM J. MATRIX ANAL. APPL.                         © 1998 Society for Industrial and Applied Mathematics
Vol. 20, No. 2, pp. 471–492

RELATIVE PERTURBATION THEORY:
II. EIGENSPACE AND SINGULAR SUBSPACE VARIATIONS∗

REN-CANG LI†
Abstract. The classical perturbation theory for Hermitian matrix eigenvalue and singular value
problems provides bounds on invariant subspace variations that are proportional to the reciprocals
of absolute gaps between subsets of spectra or subsets of singular values. These bounds may be
bad news for invariant subspaces corresponding to clustered eigenvalues or clustered singular values
of much smaller magnitudes than the norms of the matrices under consideration. In this paper, we
consider how eigenspaces of a Hermitian matrix A change when it is perturbed to Ã = D∗AD and
how singular spaces of a (nonsquare) matrix B change when it is perturbed to B̃ = D1∗BD2, where
D, D1, and D2 are nonsingular. It is proved that under these kinds of perturbations, the changes of
invariant subspaces are proportional to the reciprocals of relative gaps between subsets of spectra or
subsets of singular values. The classical Davis–Kahan sin θ theorems and Wedin sin θ theorems are
extended.
Key words. multiplicative perturbation, relative perturbation theory, relative gap, eigenvector,
singular vector, structured Sylvester equation, graded matrix
AMS subject classifications. 15A18, 15A42, 65F15, 65F35, 65G99
PII. S0895479896298506
1. Introduction. Let A and Ã be two n × n Hermitian matrices with eigendecompositions

(1.1)  A = (U1, U2) diag(Λ1, Λ2) (U1, U2)∗  and  Ã = (Ũ1, Ũ2) diag(Λ̃1, Λ̃2) (Ũ1, Ũ2)∗,

where U = (U1, U2) and Ũ = (Ũ1, Ũ2) are unitary, U1 and Ũ1 have k columns, U2 and Ũ2 have n − k columns, and

(1.2)  Λ1 = diag(λ1, . . . , λk),  Λ2 = diag(λk+1, . . . , λn),
(1.3)  Λ̃1 = diag(λ̃1, . . . , λ̃k),  Λ̃2 = diag(λ̃k+1, . . . , λ̃n).

Suppose now that A and Ã are close and that the spectrum of Λ1 and that of Λ̃2
(or the spectrum of Λ̃1 and that of Λ2) are well separated. The question is, How
∗ Received by the editors February 2, 1996; accepted for publication (in revised form) by
B. Kagstrom April 29, 1998; published electronically November 23, 1998. Part of this work was
done during the summer of 1994 while the author was at the Department of Mathematics, University
of California at Berkeley. A preliminary version of this paper appeared as Technical Report
UCB//CSD-94-856, Computer Science Division, Department of EECS, University of California at
Berkeley. This material is based in part upon work supported by Argonne National Laboratory
under grant 20552402, the University of Tennessee through the Advanced Research Projects Agency
under contract DAAL03-91-C-0047, the National Science Foundation under grant ASC-9005933, the
National Science Infrastructure grants CDA-8722788 and CDA-9401156, a Householder Fellowship
in Scientific Computing at Oak Ridge National Laboratory, and the Applied Mathematical Sciences
Research Program, Office of Energy Research, United States Department of Energy contract
DE-AC05-96OR22464 with Lockheed Martin Energy Research Corp.
http://www.siam.org/journals/simax/20-2/29850.html
† Mathematical Science Section, Oak Ridge National Laboratory, P.O. Box 2008, Bldg. 6012,
Oak Ridge, TN 37831-6367. Current address: Department of Mathematics, University of Kentucky,
Lexington, KY 40506 ([email protected]).
close are the eigenspaces spanned by the columns of Ui and Ũi? This question has
been answered well by four celebrated theorems, the so-called sin θ, tan θ, sin 2θ, and
tan 2θ theorems due to Davis and Kahan [3], for arbitrary additive perturbations, in
the sense that the perturbations to A can be arbitrary. It is proved that the
changes of invariant subspaces are proportional to the reciprocals of absolute gaps
between subsets of spectra.
In the case of multiplicative perturbations, when A is perturbed to Ã = D∗AD,
Eisenstat and Ipsen [7] first attacked the question by bounding the angles between a
one-dimensional eigenspace of A and Ã's eigenspace spanned by the columns of Ũ1,
and they ultimately obtained bounds for the angle between A's eigenspace spanned
by the columns of U1 and Ã's eigenspace spanned by the columns of Ũ1. The study
suggests that the changes of invariant subspaces are proportional to the reciprocals of
relative gaps between subsets of spectra.
This paper will study the same question, but using a different approach. It is
explained that bounding the angle between the eigenspaces is related to bounding the
solutions to Sylvester equations ΩX − XΓ = S, where S has very special structures.
Our approach is more or less along the lines of Davis and Kahan [3] and Bhatia,
Davis, and McIntosh [2], where no special structures for S are known. There are
a number of advantages of the new approach over Eisenstat and Ipsen's. The new
approach can deal directly with an eigenspace and its perturbed one, unlike Eisenstat
and Ipsen's approach, and consequently gets rid of the unpleasant factor √k in Eisenstat
and Ipsen's bounds; the new approach makes no distinction in treating an eigenvector
and an eigenspace, or two eigenspaces of the same or different dimensions; the new
approach is mathematically more elegant and makes it possible to extend the Davis–
Kahan theorems to all unitarily invariant norms.
A similar question for singular value decompositions will be answered also.
Although special, multiplicative perturbations cover componentwise relative perturbations
to entries of symmetric tridiagonal matrices with zero diagonal [5, 9] and to entries of
bidiagonal and biacyclic matrices [1, 4, 5], and, more realistically, perturbations
in graded nonnegative definite Hermitian matrices [6, 14], in graded matrices of
singular value problems [6, 14], and more [8].
The rest of this paper is organized as follows. Section 2 serves two purposes: to
present essential preliminary definitions and lemmas, and to briefly discuss technical
similarities and differences between our extensions and the classical Davis–Kahan
theorems. Section 3 details relative perturbation theorems for A and Ã = D∗AD
and for nonnegative definite Hermitian matrices A that themselves may be very ill
conditioned but can be scaled to well-conditioned ones. Section 3 also remarks on
how to extend our approach to perturbations of diagonalizable matrices. Section 4
develops analogous relative perturbation theorems for the singular value problems.
The proofs of the theorems in section 4 turn out to be quite long and therefore are
postponed to section 5.
2. Preliminaries. Throughout this paper, we follow the notation used in the first
part of this series [12].

2.1. Relative distances. We shall use the following two kinds of relative distances
to measure relative accuracies in numerical approximations: ρp and χ are
defined for α, α̃ ∈ C by

(2.1)  ρp(α, α̃) = |α − α̃| / (|α|^p + |α̃|^p)^{1/p}  for 1 ≤ p ≤ ∞,  and  χ(α, α̃) = |α − α̃| / √(|αα̃|),
with the convention 0/0 = 0 for convenience. Both have better mathematical properties
than the classical measurement |δ|, the relative error in α̃ = α(1 + δ) as an
approximation to α:

(2.2)  δ = relative error in α̃ = (α̃ − α)/α,

which is, however, good enough and actually more convenient to use in numerical
computations. So we shall also present bounds using this classical measurement. The
use of any particular relative distance in our perturbation bounds comes naturally
with their derivations. From the numerical point of view, any one of the relative
distances is just as good as the others, but theoretically they provide bounds of very
different features. It can be proved that these relative distances are topologically
equivalent; see [12] for details.
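To make the two relative distances concrete, here is a small numerical sketch (ours, not part of the paper); the helper names rho_p, chi, and classical are our own:

```python
import math

def rho_p(a, b, p=2):
    # rho_p(a, b) = |a - b| / (|a|^p + |b|^p)^(1/p), with the convention 0/0 = 0
    num = abs(a - b)
    return 0.0 if num == 0 else num / (abs(a) ** p + abs(b) ** p) ** (1.0 / p)

def chi(a, b):
    # chi(a, b) = |a - b| / sqrt(|a b|), with the convention 0/0 = 0
    num = abs(a - b)
    return 0.0 if num == 0 else num / math.sqrt(abs(a * b))

def classical(a, b):
    # the classical measurement |delta| for b = a(1 + delta)
    return abs((b - a) / a)

# a 5% classical relative error and the two relative distances it induces
print(rho_p(1.0, 1.05), chi(1.0, 1.05), classical(1.0, 1.05))
```

Note that ρ2 never exceeds √2 (attained for α = −α̃), which is used later in section 3.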
2.2. Angles between two subspaces. Since this paper concerns the variations
of subspaces, we need some metrics to measure the differences between two subspaces.
In this, we follow Davis and Kahan [3] and Stewart and Sun [15, Chapters I and II].
Wedin [17] presented an illuminating discussion on angles between two subspaces,
too. Let X, X̃ ∈ C^{n×k} (n > k) have full column rank k, and define the angle matrix
Θ(X, X̃) between X and X̃ as

(2.3)  Θ(X, X̃) := arccos((X∗X)^{−1/2} X∗X̃ (X̃∗X̃)^{−1} X̃∗X (X∗X)^{−1/2})^{1/2}.

The canonical angles between the subspaces R(X) and R(X̃) are defined
to be the singular values of the Hermitian matrix Θ(X, X̃), where R(X) denotes the
subspace spanned by X's columns. The following lemma is well known. For a proof
of it, the reader is referred to, e.g., Li [11, Lemma 2.1].
Lemma 2.1. Suppose that (X̃, X̃1) ∈ C^{n×n} is a nonsingular matrix, where X̃ has
k columns and X̃1 has n − k columns, and partition

(X̃, X̃1)^{−1} = [Ỹ∗ ; Ỹ1∗],  with Ỹ∗ having k rows and Ỹ1∗ having n − k rows.

Then for any unitarily invariant norm ||| · |||,

|||sin Θ(X, X̃)||| = |||(Ỹ1∗Ỹ1)^{−1/2} Ỹ1∗X (X∗X)^{−1/2}|||.
In this lemma, as well as in many other places in the rest of this paper, we talk about the
"same" unitarily invariant norm ||| · ||| that applies to matrices of different dimensions
at the same time. Such applications of a unitarily invariant norm are understood
in the following sense: first there is a unitarily invariant norm ||| · ||| on C^{M×N} for
sufficiently large integers M and N; then, for a matrix X ∈ C^{m×n} (m ≤ M and
n ≤ N), |||X||| is defined by appending X with zero blocks to make it M × N and then
taking the unitarily invariant norm of the enlarged matrix.
Taking X = U1 and X̃ = Ũ1 as in (1.1), by Lemma 2.1 one has

(2.4)  Θ(U1, Ũ1) = arccos(U1∗Ũ1Ũ1∗U1)^{1/2}  and  |||sin Θ(U1, Ũ1)||| = |||Ũ2∗U1|||.
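A quick numerical check of the second relation in (2.4) (our own sketch, with NumPy): for column blocks of two random orthogonal matrices, the sines of the canonical angles between R(U1) and R(Ũ1) coincide with the singular values of Ũ2∗U1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2

def random_orthogonal(n):
    # random n x n orthogonal matrix via QR of a Gaussian matrix
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

U, Ut = random_orthogonal(n), random_orthogonal(n)
U1, Ut1, Ut2 = U[:, :k], Ut[:, :k], Ut[:, k:]

cosines = np.linalg.svd(U1.T @ Ut1, compute_uv=False)     # cosines of canonical angles
sines = np.linalg.svd(Ut2.T @ U1, compute_uv=False)       # claimed sines of the same angles
sines_from_cosines = np.sqrt(np.clip(1.0 - cosines**2, 0.0, None))
print(np.sort(sines), np.sort(sines_from_cosines))
```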
[Fig. 2.1. The spectrum of Λ1 and that of Λ̃2 are disjoint.]
[Fig. 2.2. The spectrum of Λ1 and that of Λ̃2 are separated by two intervals (panels (a) and (b), with endpoints −α, 0, α and gap δ), and one of the spectra scatters around the origin.]
Occasionally, it is also of interest to measure how far a lower-dimensional subspace is
away from a higher-dimensional one. In such a situation, an angle matrix can still be
defined as in (2.3), but with X having fewer columns than X̃. If we do so, Lemma 2.1
remains valid. However, Θ( · , · ) is no longer symmetric with respect to its arguments.
In the case of eigenspace variation in which we are interested, we may take U1,sub, a
submatrix consisting of a few (or just one) of U1's columns, and ask how close A's
eigenspace R(U1,sub) is to Ã's eigenspace R(Ũ1). Still, we have

(2.4a)  Θ(U1,sub, Ũ1) = arccos(U1,sub∗Ũ1Ũ1∗U1,sub)^{1/2}  and  |||sin Θ(U1,sub, Ũ1)||| = |||Ũ2∗U1,sub|||.
2.3. Separation of spectra. In deriving bounds on the sines of the angles
between the eigenspaces R(U1) and R(Ũ1), certain disjointness between Λ1's spectrum
and Λ̃2's (or between Λ̃1's and Λ2's) is assumed. Depending on what matrix norms are
used, two different kinds of separations, one stronger than the other, are considered.
For bounds in the Frobenius norm, only disjointness is required, as in Davis and
Kahan [3]; see Figure 2.1. For bounds in all unitarily invariant norms, not only is the
disjointness between the two spectra required, but they also have to be separated by
two intervals; see Figure 2.2. Such a separation requirement is similar to that of Davis and
Kahan's sin θ theorems in all unitarily invariant norms, but it differs from theirs in
that here either the spectrum of Λ1 or that of Λ̃2 has to scatter around the origin.
This substantial difference exists for a reason. In the absolute perturbation theory,
shifting A and Ã by a scalar μ to A − μI and Ã − μI retains every relevant object—
eigenspaces, the residuals such as R = ÃU1 − U1Λ1, and most of all the absolute gap
δ—and thus the positions of the intervals in Davis and Kahan's assumptions are not
intrinsically important. For our relative case, the emphasis is on issues associated with
eigenvalues of relatively (much) smaller magnitudes than the norm of the matrix, and
shifting fundamentally affects the underlying properties of the problem.
Classical Davis–Kahan theorems use the absolute gap—the minimum distance—
between λ(Λ1) and λ(Λ̃2); our relative perturbation theorems, however, will use
relative gaps measured by any one of the relative "distances" (2.1) and (2.2). We shall
use the notation ηp, ηχ, and ηc to denote three different kinds of relative gaps defined in
terms of ρp, χ, and the classical measurement in the case of Figure 2.1; we underline
the η's to indicate the stronger separation by intervals as in Figure 2.2. The use of
different relative gaps comes quite naturally with different perturbation equations,
which yield various bounds on the sines of the angles. It appears that ηχ and η̲χ
are natural choices for nonnegative definite Hermitian matrices, and bounds that use
them are normally sharper than bounds that use other kinds of relative gaps. ηp, ηc,
and their underlined versions are natural choices for arbitrary Hermitian matrices, and bounds
that use them are comparable, as we shall see in later sections; but mathematically,
bounds with ηp are more beautiful because it is defined in terms of ρp, which is a metric
on R [12] and thus treats Λ1 and Λ̃2 equally, while ηc is perhaps more convenient
to use in actual numerical approximations.
All our perturbation bounds in this paper use the relative gaps mentioned above.
However, sometimes it is more convenient to have bounds that use relative gaps
between λ(Λ1) and λ(Λ2) rather than λ(Λ̃2). For this purpose, Li [13] presented
inequality relations between relative gaps for λ(Λ1) and λ(Λ2) and those for λ(Λ1)
and λ(Λ̃2).
Finally, we remark that if we are interested in how close A's eigenspace (spanned
by a few (not all) columns of U1) is to Ã's eigenspace R(Ũ1), spectrum separation
assumptions will then be required only between the subset of λ(Λ1) corresponding to
those selected columns of U1 and the spectrum of Λ̃2.
2.4. On Sylvester equations ΩX − XΓ = S with structured S. This
subsection illustrates another technical similarity and difference of our development
of relative perturbation theory to Davis and Kahan's classical development [3], where
Ω and Γ are two self-adjoint operators, and S is an arbitrary operator in certain norm
ideals. In our case, however, S takes one of the forms

ΩE + FΓ  and  Ω^{1/2}EΓ^{1/2}.

In what follows, we shall try to exploit these situations, and by doing so, we are able to
derive better bounds on the solution X.

Lemma 2.2. Let Ω ∈ C^{s×s} and Γ ∈ C^{t×t} be two Hermitian matrices,¹ and let
E, F ∈ C^{s×t}. If λ(Ω) ∩ λ(Γ) = ∅, then ΩX − XΓ = ΩE + FΓ has a unique solution
X ∈ C^{s×t}, and moreover,

‖X‖F ≤ √(‖E‖F² + ‖F‖F²) / η2,

where η2 := min_{ω∈λ(Ω), γ∈λ(Γ)} ρ2(ω, γ). If, in addition, F = 0, we have a better bound

‖X‖F ≤ ‖E‖F / ηc,

where² ηc := min_{ω∈λ(Ω), γ∈λ(Γ)} |ω − γ|/|ω|.
Proof. For any s × s unitary P and t × t unitary Q, the substitutions

Ω ← P∗ΩP,  Γ ← Q∗ΓQ,  X ← P∗XQ,  E ← P∗EQ,  and  F ← P∗FQ

leave the lemma unchanged, so we may assume without loss of generality that Ω =
diag(ω1, ω2, . . . , ωs) and Γ = diag(γ1, γ2, . . . , γt).
Write X = (xij), E = (eij), and F = (fij). Entrywise, ΩX − XΓ = ΩE + FΓ
reads ωixij − xijγj = ωieij + fijγj. Thus xij exists uniquely provided ωi ≠ γj, which
holds since λ(Ω) ∩ λ(Γ) = ∅; and moreover,

|(ωi − γj)xij|² = |ωieij + fijγj|² ≤ (|ωi|² + |γj|²)(|eij|² + |fij|²)

by the Cauchy–Schwarz inequality. This implies

|xij|² ≤ (|eij|² + |fij|²)/[ρ2(ωi, γj)]² ≤ (|eij|² + |fij|²)/η2²
  ⇒  ‖X‖F² = Σ_{i,j} |xij|² ≤ (Σ_{i,j} |eij|² + Σ_{i,j} |fij|²)/η2² = (‖E‖F² + ‖F‖F²)/η2²,

as was to be shown. The case F = 0 can be handled similarly.

¹ Lemmas 2.2–2.5 are actually true for normal matrices Ω and Γ.
² Notice that ηc ≥ η2. This can be seen as follows. Assume ηc = |ω − γ|/|ω| for some ω ∈ λ(Ω),
γ ∈ λ(Γ). Then ηc ≥ ρ2(ω, γ) ≥ η2.
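A numerical sketch of Lemma 2.2 (ours, under the diagonal reduction used in the proof): for diagonal Ω and Γ with disjoint spectra, the structured Sylvester equation is solved entrywise and the Frobenius-norm bound checked.

```python
import numpy as np

rng = np.random.default_rng(1)
omega = np.array([2.0, 3.0, 4.0, 5.0])      # lambda(Omega)
gamma = np.array([0.5, 1.0, 1.5])           # lambda(Gamma), disjoint from omega
E = rng.standard_normal((4, 3))
F = rng.standard_normal((4, 3))

W, G = omega[:, None], gamma[None, :]
X = (W * E + F * G) / (W - G)               # entrywise solution of Omega X - X Gamma = Omega E + F Gamma

eta2 = (np.abs(W - G) / np.sqrt(W**2 + G**2)).min()              # min rho_2(omega_i, gamma_j)
bound = np.hypot(np.linalg.norm(E, 'fro'), np.linalg.norm(F, 'fro')) / eta2
residual = np.linalg.norm(W * X - X * G - (W * E + F * G))
print(np.linalg.norm(X, 'fro'), bound, residual)
```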
Lemma 2.3. Let Ω ∈ C^{s×s} and Γ ∈ C^{t×t} be two Hermitian matrices, and let
E, F ∈ C^{s×t}. If there exist α ≥ 0 and δ > 0 such that

(2.5)  ‖Ω‖2 ≤ α  and  ‖Γ−1‖2^{−1} ≥ α + δ,

or

(2.6)  ‖Ω−1‖2^{−1} ≥ α + δ  and  ‖Γ‖2 ≤ α,

then ΩX − XΓ = ΩE + FΓ has a unique solution X ∈ C^{s×t}, and moreover for any
unitarily invariant norm ||| · |||,

|||X||| ≤ (|||E||| + |||F|||) / η̲p,

where η̲p := ρp(α, α + δ). If, in addition, F = 0, we have a better bound

|||X||| ≤ |||E||| / η̲c,

where η̲c = δ/α when (2.5) holds, and η̲c = δ/(α + δ) when (2.6) holds.
Proof. First of all, the conditions of this lemma imply λ(Ω) ∩ λ(Γ) = ∅; thus X
exists uniquely by Lemma 2.2. In what follows, we consider the case (2.5); the other
case (2.6) is analogous. Post-multiply ΩX − XΓ = ΩE + FΓ by Γ−1 to get

(2.7)  ΩXΓ−1 − X = ΩEΓ−1 + F.

Under the assumptions ‖Ω‖2 ≤ α and ‖Γ−1‖2^{−1} ≥ α + δ ⇒ ‖Γ−1‖2 ≤ 1/(α + δ), we have

|||ΩXΓ−1 − X||| ≥ |||X||| − |||ΩXΓ−1||| ≥ |||X||| − ‖Ω‖2 |||X||| ‖Γ−1‖2
  ≥ |||X||| − α |||X||| / (α + δ) = (1 − α/(α + δ)) |||X|||,

and

|||ΩEΓ−1 + F||| ≤ |||ΩEΓ−1||| + |||F||| ≤ ‖Ω‖2 |||E||| ‖Γ−1‖2 + |||F|||
  ≤ α |||E||| / (α + δ) + |||F||| ≤ (1 + α^p/(α + δ)^p)^{1/p} (|||E||| + |||F|||).

By (2.7), we deduce that

(1 − α/(α + δ)) |||X||| ≤ (1 + α^p/(α + δ)^p)^{1/p} (|||E||| + |||F|||),

from which the desired inequality follows. Also, the case F = 0 can be handled
similarly.
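A numerical sketch of Lemma 2.3 (our own construction), using the spectral norm as the unitarily invariant norm and the configuration (2.5): the spectrum of Ω lies inside [−α, α] while that of Γ stays outside (−(α+δ), α+δ).

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, delta, p = 1.0, 1.0, 2
omega = np.array([0.5, -0.8, 1.0])          # ||Omega||_2 <= alpha = 1
gamma = np.array([2.0, -2.5])               # min |gamma| = 2 = alpha + delta
E = rng.standard_normal((3, 2))
F = rng.standard_normal((3, 2))

W, G = omega[:, None], gamma[None, :]
X = (W * E + F * G) / (W - G)               # entrywise solution (diagonal case)

eta_p = delta / (alpha**p + (alpha + delta)**p) ** (1.0 / p)    # rho_p(alpha, alpha + delta)
lhs = np.linalg.norm(X, 2)                  # spectral norm is unitarily invariant
rhs = (np.linalg.norm(E, 2) + np.linalg.norm(F, 2)) / eta_p
print(lhs, rhs)
```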
Lemma 2.4. Let Ω ∈ C^{s×s} and Γ ∈ C^{t×t} be two nonnegative definite Hermitian
matrices, and let E ∈ C^{s×t}. If λ(Ω) ∩ λ(Γ) = ∅, then ΩX − XΓ = Ω^{1/2}EΓ^{1/2} has a
unique solution X ∈ C^{s×t}, and moreover,

‖X‖F ≤ ‖E‖F / ηχ,

where ηχ := min_{ω∈λ(Ω), γ∈λ(Γ)} χ(ω, γ).
Proof. For any s × s unitary P and t × t unitary Q, the substitutions

Ω ← P∗ΩP,  Γ ← Q∗ΓQ,  X ← P∗XQ,  and  E ← P∗EQ

leave the lemma unchanged, so we may assume without loss of generality that Ω =
diag(ω1, ω2, . . . , ωs) and Γ = diag(γ1, γ2, . . . , γt).
Write X = (xij) and E = (eij). Entrywise, ΩX − XΓ = Ω^{1/2}EΓ^{1/2} reads ωixij −
xijγj = √ωi eij √γj. As long as ωi ≠ γj, xij exists uniquely, and

|xij|² = |eij|²/[χ(ωi, γj)]² ≤ |eij|²/ηχ²;

summing this over 1 ≤ i ≤ s and 1 ≤ j ≤ t leads to the desired inequality.
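The same kind of sketch for Lemma 2.4 (ours): nonnegative definite diagonal Ω and Γ with disjoint spectra, with the entrywise solution and the χ-gap bound checked.

```python
import numpy as np

rng = np.random.default_rng(3)
omega = np.array([4.0, 6.0, 9.0])           # nonnegative, disjoint from gamma
gamma = np.array([1.0, 2.0])
E = rng.standard_normal((3, 2))

W, G = omega[:, None], gamma[None, :]
X = np.sqrt(W) * E * np.sqrt(G) / (W - G)   # entrywise solution of Omega X - X Gamma = Omega^{1/2} E Gamma^{1/2}

eta_chi = (np.abs(W - G) / np.sqrt(W * G)).min()    # min chi(omega_i, gamma_j)
print(np.linalg.norm(X, 'fro'), np.linalg.norm(E, 'fro') / eta_chi)
```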
Lemma 2.5. Let Ω ∈ C^{s×s} and Γ ∈ C^{t×t} be two nonnegative definite Hermitian
matrices, and let E ∈ C^{s×t}. If there exist α ≥ 0 and δ > 0 such that (2.5) or (2.6)
holds, then ΩX − XΓ = Ω^{1/2}EΓ^{1/2} has a unique solution X ∈ C^{s×t}, and moreover,

|||X||| ≤ |||E||| / η̲χ,

where η̲χ := χ(α, α + δ).
Proof. The existence and uniqueness of X are easy to see because the conditions
of this lemma imply λ(Ω) ∩ λ(Γ) = ∅. In what follows, we consider the case (2.5)
only; the other case (2.6) is analogous. Post-multiply ΩX − XΓ = Ω^{1/2}EΓ^{1/2} by Γ−1
to get

(2.8)  ΩXΓ−1 − X = Ω^{1/2}EΓ^{−1/2}.

Under the assumptions ‖Ω‖2 ≤ α and ‖Γ−1‖2^{−1} ≥ α + δ ⇒ ‖Γ−1‖2 ≤ 1/(α + δ), we have

|||ΩXΓ−1 − X||| ≥ (1 − α/(α + δ)) |||X|||,

as in the proof of Lemma 2.3, and

|||Ω^{1/2}EΓ^{−1/2}||| ≤ ‖Ω^{1/2}‖2 |||E||| ‖Γ^{−1/2}‖2 ≤ √α |||E||| / √(α + δ).

By (2.8), we deduce that

(1 − α/(α + δ)) |||X||| ≤ √(α/(α + δ)) |||E|||,

from which the desired inequality follows.
Remark 2.1. For the Sylvester equation ΩX − XΓ = S, with S having no special
structure, Bhatia, Davis, and McIntosh [2] also proved bounds on |||X|||, independent of X's
dimensions, under the conditions only that Ω and Γ are normal and λ(Ω) ∩ λ(Γ) =
∅. It is easy to see that (2.5) or (2.6) describes spectral distributions similar to
Figure 2.2. Thus an open question naturally arises: could Lemmas 2.3 and 2.5 be
extended to normal matrices Ω and Γ with λ(Ω) ∩ λ(Γ) = ∅ only?
3. Relative perturbation theorems for eigenspace variations. Let A and
Ã be two Hermitian matrices whose eigendecompositions are

(1.1)  A = (U1, U2) diag(Λ1, Λ2) (U1, U2)∗  and  Ã = (Ũ1, Ũ2) diag(Λ̃1, Λ̃2) (Ũ1, Ũ2)∗,

where U = (U1, U2) and Ũ = (Ũ1, Ũ2) are unitary, U1 and Ũ1 have k columns, and the
Λi's and Λ̃j's are defined as in (1.2) and (1.3). Define

R := ÃU1 − U1Λ1 = (Ã − A)U1.
Notice that

Ũ2∗R = Ũ2∗ÃU1 − Ũ2∗U1Λ1 = Λ̃2Ũ2∗U1 − Ũ2∗U1Λ1,

and

Ũ2∗R = Ũ2∗[D∗AD(I − D−1) + (D∗ − I)A]U1
     = Λ̃2Ũ2∗(I − D−1)U1 + Ũ2∗(D∗ − I)U1Λ1.

Thus, we have

(3.1)  Λ̃2Ũ2∗U1 − Ũ2∗U1Λ1 = Λ̃2Ũ2∗(I − D−1)U1 + Ũ2∗(D∗ − I)U1Λ1.

Let X := Ũ2∗D∗U1 = Ũ2∗U1 − Ũ2∗(I − D∗)U1.³ Another formulation of (3.1) is

(3.2)  Λ̃2X − XΛ1 = Λ̃2Ũ2∗(D∗ − D−1)U1.

Both (3.1) and (3.2) are in the form of Sylvester equations with special structures
which are vital to the development of our following perturbation theorems. Notice by
Lemma 2.1 that

(3.3)  |||sin Θ(U1, Ũ1)||| = |||Ũ2∗U1||| ≤ |||X||| + |||Ũ2∗(I − D∗)U1|||.

(3.1) makes ρ2 a natural choice for measuring the relative gap between λ(Λ1) and
λ(Λ̃2), while (3.2) makes the classical measurement a natural choice.
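The perturbation equation (3.2) is an exact identity, which the following sketch (ours, real symmetric case in NumPy) confirms numerically: with X = Ũ2∗D∗U1, the residual of (3.2) is at roundoff level.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 5, 2
A = rng.standard_normal((n, n)); A = (A + A.T) / 2      # real symmetric A
D = np.eye(n) + 0.01 * rng.standard_normal((n, n))      # nonsingular, near I
At = D.T @ A @ D                                        # multiplicative perturbation D* A D

w, U = np.linalg.eigh(A)
wt, Ut = np.linalg.eigh(At)
U1, L1 = U[:, :k], np.diag(w[:k])
Ut2, Lt2 = Ut[:, k:], np.diag(wt[k:])

X = Ut2.T @ D.T @ U1
lhs = Lt2 @ X - X @ L1
rhs = Lt2 @ Ut2.T @ (D.T - np.linalg.inv(D)) @ U1       # right-hand side of (3.2)
print(np.linalg.norm(lhs - rhs))
```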
Theorem 3.1. Let A and Ã = D∗AD be two n × n Hermitian matrices with
eigendecompositions (1.1)–(1.3), where D is nonsingular. If λ(Λ1) ∩ λ(Λ̃2) = ∅, then

(3.4)  ‖sin Θ(U1, Ũ1)‖F ≤ √(‖(I − D−1)U1‖F² + ‖(I − D∗)U1‖F²) / η2,

(3.5)  ‖sin Θ(U1, Ũ1)‖F ≤ ‖(I − D∗)U1‖F + ‖(D∗ − D−1)U1‖F / ηc,

where

η2 := min_{μ∈λ(Λ1), μ̃∈λ(Λ̃2)} ρ2(μ, μ̃)  and  ηc := min_{μ∈λ(Λ1), μ̃∈λ(Λ̃2)} |μ − μ̃|/|μ̃|.

Proof. Lemma 2.2 and (3.1) yield (3.4), whereas Lemma 2.2, (3.2), and (3.3)
yield (3.5).

³ Or Λ̃2Y − YΛ1 = Ũ2∗(D∗ − D−1)U1Λ1, where Y = Ũ2∗U1 − Ũ2∗(I − D−1)U1. Such a formulation
can be used to produce perturbation bounds different from but of the same spirit as (3.5) and (3.8),
whose derivations rely on (3.2).
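A numerical sketch of Theorem 3.1 (our own test case): a symmetric matrix with a cluster of tiny eigenvalues is perturbed multiplicatively by a D close to I, and the Frobenius-norm bound (3.4) is checked.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 6, 2
A = np.diag([1e-6, 2e-6, 1.0, 2.0, 3.0, 4.0])           # tiny clustered eigenvalues
D = np.eye(n) + 1e-3 * rng.standard_normal((n, n))
At = D.T @ A @ D

w, U = np.linalg.eigh(A)
wt, Ut = np.linalg.eigh(At)
U1, Ut2 = U[:, :k], Ut[:, k:]

sin_theta = np.linalg.norm(Ut2.T @ U1, 'fro')           # ||sin Theta(U1, Ut1)||_F by (2.4)
eta2 = min(abs(m - mt) / np.hypot(m, mt) for m in w[:k] for mt in wt[k:])
Dinv = np.linalg.inv(D)
bound = np.hypot(np.linalg.norm((np.eye(n) - Dinv) @ U1, 'fro'),
                 np.linalg.norm((np.eye(n) - D.T) @ U1, 'fro')) / eta2
print(sin_theta, bound)
```

Note that the relative gap η2 here is close to 1 even though the absolute gap inside the tiny cluster is of order 10^-6, which is exactly the situation where the relative theory pays off.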
Remark 3.1. Without assuming the multiplicative structure in perturbing A to
Ã, we end up with the Sylvester equation Λ̃2Ũ2∗U1 − Ũ2∗U1Λ1 = Ũ2∗R, which leads to

‖sin Θ(U1, Ũ1)‖F ≤ ‖R‖F / min_{μ∈λ(Λ1), μ̃∈λ(Λ̃2)} |μ − μ̃|.

This is a Davis–Kahan sin θ theorem. Our other theorems in this section relate to
Davis–Kahan sin θ theorems analogously.
Remark 3.2. Let U1,sub be a submatrix consisting of a few (or just one) of U1's
columns, and let Λ1,sub be the corresponding eigenvalue matrix. (3.1) and (3.2) imply
that

Λ̃2Ũ2∗U1,sub − Ũ2∗U1,subΛ1,sub = Λ̃2Ũ2∗(I − D−1)U1,sub + Ũ2∗(D∗ − I)U1,subΛ1,sub,
Λ̃2Xsub − XsubΛ1,sub = Λ̃2Ũ2∗(D∗ − D−1)U1,sub,

where Xsub := Ũ2∗D∗U1,sub = Ũ2∗U1,sub − Ũ2∗(I − D∗)U1,sub. So our approach needs
no modification when it comes to bounding the closeness of either R(U1,sub) to R(Ũ1)
or R(U1) to R(Ũ1). It can be seen that all the theorems in this section remain valid if
U1 is replaced by U1,sub and the relative gaps are redefined as those between λ(Λ1,sub)
and λ(Λ̃2).
Eisenstat and Ipsen [7] obtained the following: under the assumptions of Theorem 3.1,

(3.6)  ‖sin Θ(U1, Ũ1)‖2 ≤ √k (‖I − D∗‖2 + ‖I − D−∗D−1‖2 / ηc).

It is a good bound for k = 1. But for k ≥ 2, it is less competitive. To compare this
inequality with (3.5), we notice that

‖sin Θ(U1, Ũ1)‖2 ≤ ‖sin Θ(U1, Ũ1)‖F,
‖(I − D∗)U1‖F ≤ √k ‖(I − D∗)U1‖2 ≤ √k ‖I − D∗‖2,
‖(D∗ − D−1)U1‖F ≤ √k ‖D∗ − D−1‖2 ≤ √k ‖D∗‖2 ‖I − D−∗D−1‖2,
‖I − D−∗D−1‖2 ≤ ‖D−∗‖2 ‖D∗ − D−1‖2.

Thus (3.5) and (3.6) imply that

(3.5a)  ‖sin Θ(U1, Ũ1)‖2 ≤ √k (‖I − D∗‖2 + ‖D∗‖2 ‖I − D−∗D−1‖2 / ηc),
(3.6a)  ‖sin Θ(U1, Ũ1)‖F ≤ k (‖I − D∗‖F + ‖D−∗‖2 ‖D∗ − D−1‖F / ηc).

Now for inequalities (3.5) and (3.6) to be of any significance at all, D must be fairly
close to the identity matrix, under which ‖D∗‖2 ≈ 1 and thus (3.5a)—a weakened
(3.5)—is about as good as (3.6). Here is an example for which (3.5) improves (3.6)
by at least a factor of √k. Take D = I − εww∗, where ε > 0 is a small number and
w a vector with ‖w‖2 = 1. Then D−1 = I + εww∗/(1 − ε). Thus,

D∗ − D−1 = −(2 − ε) ε/(1 − ε) ww∗  and  I − D−∗D−1 = −(2 + ε/(1 − ε)) ε/(1 − ε) ww∗.

Hence (3.5) yields

(3.5b)  ‖sin Θ(U1, Ũ1)‖2 ≤ ε + (2 − ε) ε / ((1 − ε) ηc),

and (3.6) becomes

(3.6b)  ‖sin Θ(U1, Ũ1)‖2 ≤ √k (ε + (2 + ε/(1 − ε)) ε / ((1 − ε) ηc)).

It can be seen that when Θ(U1, Ũ1) is almost a multiple of the identity and both I − D∗
and I − D−∗D−1 are almost of rank 1, (3.6a) is not much weaker than (3.6), and in this
case (3.5) improves (3.6a) by nearly a factor of k. But in any event the improvement
can be by a factor at most k.
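The closed-form expressions for D−1, D∗ − D−1, and I − D−∗D−1 in the rank-one example above can be confirmed numerically; the following sketch (ours) does so for a random unit vector w.

```python
import numpy as np

rng = np.random.default_rng(6)
n, eps = 5, 1e-3
w = rng.standard_normal(n); w /= np.linalg.norm(w)      # unit vector
W = np.outer(w, w)
D = np.eye(n) - eps * W
Dinv = np.linalg.inv(D)

# D^{-1} = I + eps/(1-eps) w w^T
err_inv = np.linalg.norm(Dinv - (np.eye(n) + eps / (1 - eps) * W))
# D^* - D^{-1} = -(2-eps) eps/(1-eps) w w^T
err_dd = np.linalg.norm((D.T - Dinv) + (2 - eps) * eps / (1 - eps) * W)
# I - D^{-*} D^{-1} = -(2 + eps/(1-eps)) eps/(1-eps) w w^T
err_ii = np.linalg.norm((np.eye(n) - Dinv.T @ Dinv)
                        + (2 + eps / (1 - eps)) * eps / (1 - eps) * W)
print(err_inv, err_dd, err_ii)
```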
Now we compare (3.4) and (3.5). Although ηc ≥ η2 always, and ηc may be
much larger than η2, it appears (3.4) and (3.5) are comparable up to a constant factor
unless D is much closer to a unitary matrix than to the identity matrix,⁴ in which case
the second term on the right-hand side of (3.5) becomes negligible and consequently
we expect (3.5) to be sharper than (3.4). On one hand, (3.5) can always produce a
bound that is only weaker than (3.4) by a factor of √(4 + 2√2):

‖sin Θ(U1, Ũ1)‖F ≤ √2 ‖(I − D∗)U1‖F / η2 + (‖(D∗ − I)U1‖F + ‖(I − D−1)U1‖F) / η2
  ≤ √(4 + 2√2) √(‖(I − D−1)U1‖F² + ‖(I − D∗)U1‖F²) / η2,

where we have used η2 ≤ √2 and η2 ≤ ηc. On the other hand, (3.4) cannot be much
worse than (3.5) in general, also by a constant factor, at least for the interesting cases
when E := I − D is tiny. In fact, the following arguments show that when η2 and
ηc are not too far apart, (3.4) may be sharper. Assume that D differs from I not
much worse than from its closest unitary matrix. We have

√(‖I − D−1‖F² + ‖I − D∗‖F²) ≈ √2 ‖E‖F + O(‖E‖F²),
‖D∗ − D−1‖F ≈ 2‖E‖F + O(‖E‖F²).

(3.4) and (3.5) become

(3.4c)  ‖sin Θ(U1, Ũ1)‖F ≤ √2 ‖E‖F / η2 + O(‖E‖F²),
(3.5c)  ‖sin Θ(U1, Ũ1)‖F ≤ ‖E‖F + 2‖E‖F / ηc + O(‖E‖F²).

⁴ By this we mean that ‖I − D‖2 ≫ ‖D∗ − D−1‖2.
Write ηc = γη2, where 1 ≤ γ, and let ηc = |λs − λ̃t|/|λ̃t|. Then ηc = γη2 ≤ γρ2(λs, λ̃t)
implies |λs|/|λ̃t| ≤ √(γ² − 1), and so ηc ≥ √(γ² − 1) − 1. The ratio of the right-hand
side of (3.4c) over that of (3.5c) is (terms of O(‖E‖²) are ignored)

(√2/(ηc/γ)) / (1 + 2/ηc) = √2 γ / (ηc + 2).

Now for 1 ≤ γ ≤ √2, this ratio is less than 1, which means that (3.4c) is sharper; for
γ > √2, this ratio is bounded by

√2 γ / (ηc + 2) ≤ √2 γ / (√(γ² − 1) + 1) ≤ √2,

since √2 γ/(√(γ² − 1) + 1) is monotonically increasing for √2 ≤ γ. This means that
when I − D is tiny and D is about equally close to a unitary matrix as to the
identity matrix, (3.4) cannot be worse than (3.5) by a factor of more than √2.
Theorem 3.2. Let A and Ã = D∗AD be two n × n Hermitian matrices with
eigendecompositions (1.1)–(1.3), where D is nonsingular. Assume that the spectra of
Λ1 and Λ̃2 distribute as in Figure 2.2. Then for any unitarily invariant norm ||| · |||,

(3.7)  |||sin Θ(U1, Ũ1)||| ≤ (|||(I − D−1)U1||| + |||(I − D∗)U1|||) / η̲p,

(3.8)  |||sin Θ(U1, Ũ1)||| ≤ |||(I − D∗)U1||| + |||(D∗ − D−1)U1||| / η̲c,

where

η̲p := ρp(α, α + δ)  and  η̲c := δ/(α + δ) if Figure 2.2(a),  δ/α if Figure 2.2(b).

Proof. Lemma 2.3 and (3.1) yield (3.7), whereas Lemma 2.3, (3.2), and (3.3) yield
(3.8).
Theorems 3.1 and 3.2 deal with rather restrictive perturbations to A. In what
follows we show how similar ideas can be applied to a more realistic situation in which
a scaled A is much better conditioned. Consider a nonnegative definite Hermitian matrix
A = S∗HS ∈ C^{n×n} which is perturbed to Ã = S∗H̃S, where S is a scaling matrix,
usually diagonal, though this is not necessary for the theorems below. The elements
of S can vary wildly, and S can even be singular. H is nonsingular and usually better
conditioned than A itself. Set ∆H := H̃ − H. Notice that

A = S∗HS = (H^{1/2}S)∗(H^{1/2}S),
Ã = S∗H^{1/2}(I + H^{−1/2}(∆H)H^{−1/2})H^{1/2}S
  = ((I + H^{−1/2}(∆H)H^{−1/2})^{1/2} H^{1/2}S)∗ ((I + H^{−1/2}(∆H)H^{−1/2})^{1/2} H^{1/2}S).

Set

B = S∗H^{1/2},  B̃ = S∗H^{1/2}(I + H^{−1/2}(∆H)H^{−1/2})^{1/2} = BD,
where D = (I + H^{−1/2}(∆H)H^{−1/2})^{1/2}. Given the eigendecompositions of A and Ã
as in (1.1)–(1.3), it can be seen that B and B̃ admit the following SVDs:

B = (U1, U2) diag(Λ1^{1/2}, Λ2^{1/2}) (V1, V2)∗,
B̃ = (Ũ1, Ũ2) diag(Λ̃1^{1/2}, Λ̃2^{1/2}) (Ṽ1, Ṽ2)∗,

where the Ui, Ũi are the same as in (1.1), and V = (V1, V2) and Ṽ = (Ṽ1, Ṽ2) are
unitary, with V1 and Ṽ1 having k columns.
We have

Ã − A = B̃B̃∗ − BB∗ = B̃D∗B∗ − B̃D−1B∗ = B̃(D∗ − D−1)B∗.

Pre- and post-multiply the equations by Ũ2∗ and U1, respectively, to get

(3.9)  Λ̃2Ũ2∗U1 − Ũ2∗U1Λ1 = Λ̃2^{1/2} Ṽ2∗(D∗ − D−1)V1 Λ1^{1/2},

a Sylvester equation. Notice that for any unitarily invariant norm,

|||Ṽ2∗(D∗ − D−1)V1||| ≤ |||D∗ − D−1|||
  = |||(I + H^{−1/2}(∆H)H^{−1/2})^{1/2} − (I + H^{−1/2}(∆H)H^{−1/2})^{−1/2}|||
  ≤ ‖(I + H^{−1/2}(∆H)H^{−1/2})^{−1/2}‖2 |||H^{−1/2}(∆H)H^{−1/2}|||
  ≤ ‖H−1‖2 |||∆H||| / √(1 − ‖H−1‖2 ‖∆H‖2).

(3.9) makes the relative distance χ a natural choice. Lemmas 2.4, 2.5, and (3.9) produce
the following two theorems.
Theorem 3.3. Let A = S∗HS and Ã = S∗H̃S be two n × n Hermitian
matrices with eigendecompositions (1.1)–(1.3). Assume H is positive definite and
‖H−1‖2 ‖∆H‖2 < 1. If ηχ := min_{μ∈λ(Λ1), μ̃∈λ(Λ̃2)} χ(μ, μ̃) > 0, then

(3.10)  ‖sin Θ(U1, Ũ1)‖F ≤ ‖D − D−1‖F / ηχ ≤ ‖H−1‖2 ‖∆H‖F / (√(1 − ‖H−1‖2 ‖∆H‖2) ηχ),

where D = (I + H^{−1/2}(∆H)H^{−1/2})^{1/2} = D∗.
Theorem 3.4. Let A = S∗HS and Ã = S∗H̃S be two n × n Hermitian matrices
with eigendecompositions (1.1)–(1.3). Assume H is positive definite and
‖H−1‖2 ‖∆H‖2 < 1, and that the spectra of Λ1 and Λ̃2 distribute as in Figure 2.2. Then for any
unitarily invariant norm ||| · |||,

(3.11)  |||sin Θ(U1, Ũ1)||| ≤ |||D − D−1||| / η̲χ ≤ ‖H−1‖2 |||∆H||| / (√(1 − ‖H−1‖2 ‖∆H‖2) η̲χ),

where η̲χ := χ(α, α + δ) and D = (I + H^{−1/2}(∆H)H^{−1/2})^{1/2} = D∗.
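A numerical sketch of Theorem 3.3 on a graded matrix A = S∗HS (the grading and perturbation sizes below are our own choices): S is wildly graded, H is well conditioned, ∆H is a small symmetric perturbation, and the bound (3.10) is checked.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 5, 2
S = np.diag([1e-3, 1e-2, 1.0, 1e2, 1e3])                # wildly graded scaling
H = 0.1 * rng.standard_normal((n, n)); H = (H + H.T) / 2 + 5.0 * np.eye(n)   # well conditioned
dH = 1e-4 * rng.standard_normal((n, n)); dH = (dH + dH.T) / 2

A, At = S @ H @ S, S @ (H + dH) @ S
w, U = np.linalg.eigh(A)
wt, Ut = np.linalg.eigh(At)
U1, Ut2 = U[:, :k], Ut[:, k:]

sin_theta = np.linalg.norm(Ut2.T @ U1, 'fro')
eta_chi = min(abs(m - mt) / np.sqrt(abs(m * mt)) for m in w[:k] for mt in wt[k:])
h = np.linalg.norm(np.linalg.inv(H), 2)                  # ||H^{-1}||_2
bound = h * np.linalg.norm(dH, 'fro') / (np.sqrt(1 - h * np.linalg.norm(dH, 2)) * eta_chi)
print(sin_theta, bound)
```

Even though A is extremely ill conditioned, ηχ is large for the tiny eigenvalue block, so the eigenspace is insensitive to this kind of structured perturbation.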
Remark 3.3. Our approach can be extended straightforwardly to diagonalizable
matrices. Consider A and Ã = D1∗AD2, where D1 and D2 are nonsingular. Suppose
that both A and Ã are diagonalizable, and let

A(X1, X2) = (X1, X2) diag(Λ1, Λ2)  and  Ã(X̃1, X̃2) = (X̃1, X̃2) diag(Λ̃1, Λ̃2),

where (X1, X2) and (X̃1, X̃2) are nonsingular with X1 and X̃1 having k columns, and
Λi and Λ̃j are defined as in (1.2) and (1.3) with the λi's and λ̃j's possibly complex.
Partition

(X1, X2)−1 = [Y1∗ ; Y2∗]  and  (X̃1, X̃2)−1 = [Ỹ1∗ ; Ỹ2∗],

where Y1∗ and Ỹ1∗ have k rows. Define R := ÃX1 − X1Λ1 = (Ã − A)X1. We have

Ỹ2∗R = Ỹ2∗ÃX1 − Ỹ2∗X1Λ1 = Λ̃2Ỹ2∗X1 − Ỹ2∗X1Λ1,
Ỹ2∗R = Ỹ2∗[Ã(I − D2−1) + (D1∗ − I)A]X1 = Λ̃2Ỹ2∗(I − D2−1)X1 + Ỹ2∗(D1∗ − I)X1Λ1.

Thus we have the following perturbation equations:

(3.12)  Λ̃2Ỹ2∗X1 − Ỹ2∗X1Λ1 = Λ̃2Ỹ2∗(I − D2−1)X1 + Ỹ2∗(D1∗ − I)X1Λ1,
(3.13)  Λ̃2Z − ZΛ1 = Λ̃2Ỹ2∗(D1∗ − D2−1)X1,

where Z := Ỹ2∗D1∗X1 = Ỹ2∗X1 − Ỹ2∗(I − D1∗)X1, from which various bounds on
sin Θ(X1, X̃1) can be derived under certain conditions. For example, let

η2 := min_{μ∈λ(Λ1), μ̃∈λ(Λ̃2)} ρ2(μ, μ̃).

If η2 > 0, then by Lemma 2.2 we have

‖Ỹ2∗X1‖F ≤ (1/η2) √(‖Ỹ2∗(I − D2−1)X1‖F² + ‖Ỹ2∗(D1∗ − I)X1‖F²)
  ≤ (1/η2) ‖Ỹ2∗‖2 ‖X1‖2 √(‖I − D2−1‖F² + ‖D1∗ − I‖F²).

Notice that by Lemma 2.1,

‖sin Θ(X1, X̃1)‖F = ‖(Ỹ2∗Ỹ2)^{−1/2} Ỹ2∗X1 (X1∗X1)^{−1/2}‖F
  ≤ ‖(Ỹ2∗Ỹ2)^{−1/2}‖2 ‖Ỹ2∗X1‖F ‖(X1∗X1)^{−1/2}‖2.

Then a bound on ‖sin Θ(X1, X̃1)‖F is immediately available.
4. Relative perturbation theorems for singular subspace variation. Let
B and B̃ be two m × n (m ≥ n) (complex) matrices with SVDs

(4.1)  B = (U1, U2) [diag(Σ1, Σ2) ; 0] (V1, V2)∗  and  B̃ = (Ũ1, Ũ2) [diag(Σ̃1, Σ̃2) ; 0] (Ṽ1, Ṽ2)∗,

where U = (U1, U2) and Ũ = (Ũ1, Ũ2) are m × m unitary with U1 and Ũ1 having k
columns, V = (V1, V2) and Ṽ = (Ṽ1, Ṽ2) are n × n unitary with V1 and Ṽ1 having k
columns, 1 ≤ k < n, the zero blocks are (m − n) × n, and

(4.2)  Σ1 = diag(σ1, . . . , σk),  Σ2 = diag(σk+1, . . . , σn),
(4.3)  Σ̃1 = diag(σ̃1, . . . , σ̃k),  Σ̃2 = diag(σ̃k+1, . . . , σ̃n).

Define residuals

(4.4)  RR := B̃V1 − U1Σ1 = (B̃ − B)V1  and  RL := B̃∗U1 − V1Σ1 = (B̃∗ − B∗)U1.

Our development for the singular value problems more or less resembles what we did
for the eigenvalue problems. Most comments and comparisons we made in section 3
apply here, too, with perhaps a little change. Nonetheless, there is a bit of
complication here; namely, we shall work with two residuals, RR and RL, and end up
with bounding solutions to two coupled matrix equations. The proofs of theorems in
this section are relatively long and are postponed to the next section.
Theorem 4.1. Let $B$ and $\widetilde B = D_1^* B D_2$ be two $m\times n$ ($m \ge n$) (complex) matrices with SVDs (4.1)–(4.3), where $D_1$ and $D_2$ are nonsingular. Let
$$
(4.5)\qquad \eta_2 \stackrel{\rm def}{=} \min_{\mu\in\sigma(\Sigma_1),\ \widetilde\mu\in\sigma_{\rm ext}(\widetilde\Sigma_2)} \varrho_2(\mu, \widetilde\mu)
\quad\text{and}\quad
\eta_c \stackrel{\rm def}{=} \min_{\mu\in\sigma(\Sigma_1),\ \widetilde\mu\in\sigma_{\rm ext}(\widetilde\Sigma_2)} \frac{|\mu - \widetilde\mu|}{|\widetilde\mu|},
$$
where $\sigma_{\rm ext}(\widetilde\Sigma_2) \equiv \sigma(\widetilde\Sigma_2)\cup\{0\}$ if $m > n$, and $\sigma_{\rm ext}(\widetilde\Sigma_2) \equiv \sigma(\widetilde\Sigma_2)$ otherwise. If $\eta_c, \eta_2 > 0$, then
$$
(4.6)\qquad \sqrt{\|\sin\Theta(U_1, \widetilde U_1)\|_F^2 + \|\sin\Theta(V_1, \widetilde V_1)\|_F^2}
\le \frac{\sqrt{\|(I - D_1^*)U_1\|_F^2 + \|(I - D_1^{-1})U_1\|_F^2 + \|(I - D_2^*)V_1\|_F^2 + \|(I - D_2^{-1})V_1\|_F^2}}{\eta_2},
$$
$$
(4.7)\qquad \sqrt{\|\sin\Theta(U_1, \widetilde U_1)\|_F^2 + \|\sin\Theta(V_1, \widetilde V_1)\|_F^2}
\le \sqrt{\|(I - D_1^*)U_1\|_F^2 + \|(I - D_2^*)V_1\|_F^2}
+ \frac{1}{\eta_c}\sqrt{\|(D_1^* - D_1^{-1})U_1\|_F^2 + \|(D_2^* - D_2^{-1})V_1\|_F^2}.
$$
This theorem is an extension of a Wedin $\sin\theta$ theorem [16], where no multiplicative structure in the perturbation of $B$ to $\widetilde B$ is assumed. He proved
$$
(4.8)\qquad \sqrt{\|\sin\Theta(U_1, \widetilde U_1)\|_F^2 + \|\sin\Theta(V_1, \widetilde V_1)\|_F^2}
\le \frac{\sqrt{\|R_R\|_F^2 + \|R_L\|_F^2}}{\delta},
$$
where $\delta \stackrel{\rm def}{=} \min_{\mu\in\sigma(\Sigma_1),\ \widetilde\mu\in\sigma_{\rm ext}(\widetilde\Sigma_2)} |\mu - \widetilde\mu|$. Our other theorems in this section relate to Wedin $\sin\theta$ theorems analogously.
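To see how such bounds behave in practice, here is a small numerical sketch (ours, not part of the paper) that checks the Frobenius-norm bound (4.6) of Theorem 4.1 on a column-graded matrix with real data, again assuming $\varrho_2(\mu,\widetilde\mu) = |\mu - \widetilde\mu|/\sqrt{\mu^2 + \widetilde\mu^2}$; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 6, 5, 2

B = rng.standard_normal((m, n)) @ np.diag(10.0 ** -np.arange(n))  # graded columns
D1 = np.eye(m) + 1e-4 * rng.standard_normal((m, m))
D2 = np.eye(n) + 1e-4 * rng.standard_normal((n, n))
Bt = D1.T @ B @ D2                       # multiplicative perturbation B~ = D1* B D2

U, s, Vh = np.linalg.svd(B)
Ut, st, Vht = np.linalg.svd(Bt)
U1, V1 = U[:, :k], Vh[:k].T

def sin_theta(Q1, Qt1):
    # singular values of Qt1^* Q1 are the cosines of the canonical angles
    c = np.clip(np.linalg.svd(Qt1.T @ Q1, compute_uv=False), 0.0, 1.0)
    return np.sqrt(1.0 - c**2)

lhs = np.sqrt(np.sum(sin_theta(U[:, :k], Ut[:, :k])**2)
              + np.sum(sin_theta(Vh[:k].T, Vht[:k].T)**2))

# sigma_ext appends ghost zeros when m > n
s_ext = np.concatenate([st[k:], [0.0] * (m - n)])
eta2 = min(abs(a - b) / np.hypot(a, b) for a in s[:k] for b in s_ext)

I_m, I_n = np.eye(m), np.eye(n)
rhs = np.sqrt(
    np.linalg.norm((I_m - D1.T) @ U1, 'fro')**2
    + np.linalg.norm((I_m - np.linalg.inv(D1)) @ U1, 'fro')**2
    + np.linalg.norm((I_n - D2.T) @ V1, 'fro')**2
    + np.linalg.norm((I_n - np.linalg.inv(D2)) @ V1, 'fro')**2) / eta2
assert lhs <= rhs
```

Note that the right-hand side involves only the departure of $D_1$, $D_2$ from the identity, not the absolute gap that enters Wedin's bound (4.8).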
Remark 4.1. Ghost singular values $0$ are appended to $\sigma(\widetilde\Sigma_2)$ when $m > n$. This is not necessary for the sensitivity of the $V$-factor alone; rather, it is the $U$-factor that depends on the ghost singular values. A finer analysis can be given to illustrate this point. To keep the theorem relatively concise, we shall not go into this matter further.
Theorem 4.2. Let $B$ and $\widetilde B = D_1^* B D_2$ be two $m\times n$ ($m \ge n$) (complex) matrices with SVDs (4.1)–(4.3), where $D_1$ and $D_2$ are nonsingular. If there exist $\alpha \ge 0$ and $\delta > 0$ such that
$$
\min_{1\le i\le k} \sigma_i \ge \alpha + \delta \quad\text{and}\quad \max_{1\le j\le n-k} \widetilde\sigma_{k+j} \le \alpha,
$$
then for any unitarily invariant norm $|||\cdot|||$,
$$
(4.9)\qquad \max\left\{|||\sin\Theta(U_1, \widetilde U_1)|||,\ |||\sin\Theta(V_1, \widetilde V_1)|||\right\}
\le \frac{1}{\eta_p}\max\left\{\sqrt[q]{|||(I - D_2^{-1})V_1|||^q + |||(D_1^* - I)U_1|||^q},\ \sqrt[q]{|||(I - D_1^{-1})U_1|||^q + |||(D_2^* - I)V_1|||^q}\right\},
$$
$$
(4.10)\qquad \left|\left|\left|\begin{pmatrix}\sin\Theta(U_1, \widetilde U_1) & \\ & \sin\Theta(V_1, \widetilde V_1)\end{pmatrix}\right|\right|\right|
\le \frac{1}{\eta_p}\sqrt[q]{\left|\left|\left|\begin{pmatrix}(I - D_1^{-1})U_1 & \\ & (I - D_2^{-1})V_1\end{pmatrix}\right|\right|\right|^q
+ \left|\left|\left|\begin{pmatrix}(D_1^* - I)U_1 & \\ & (D_2^* - I)V_1\end{pmatrix}\right|\right|\right|^q},
$$
where $\eta_p \stackrel{\rm def}{=} \varrho_p(\alpha, \alpha + \delta)$, and
$$
(4.11)\qquad \max\left\{|||\sin\Theta(U_1, \widetilde U_1)|||,\ |||\sin\Theta(V_1, \widetilde V_1)|||\right\}
\le \max\left\{|||(I - D_1^*)U_1|||,\ |||(I - D_2^*)V_1|||\right\}
+ \frac{1}{\eta_c}\max\left\{|||(D_1^* - D_1^{-1})U_1|||,\ |||(D_2^* - D_2^{-1})V_1|||\right\},
$$
$$
(4.12)\qquad \left|\left|\left|\begin{pmatrix}\sin\Theta(U_1, \widetilde U_1) & \\ & \sin\Theta(V_1, \widetilde V_1)\end{pmatrix}\right|\right|\right|
\le \left|\left|\left|\begin{pmatrix}(I - D_1^*)U_1 & \\ & (I - D_2^*)V_1\end{pmatrix}\right|\right|\right|
+ \frac{1}{\eta_c}\left|\left|\left|\begin{pmatrix}(D_1^* - D_1^{-1})U_1 & \\ & (D_2^* - D_2^{-1})V_1\end{pmatrix}\right|\right|\right|,
$$
where $\eta_c \stackrel{\rm def}{=} \delta/\alpha$.
We have both (4.9) and (4.10), and both (4.11) and (4.12), for theoretical considerations. In fact (4.9) differs from (4.10) in many ways from the theoretical point of view, and the two are independent: neither bound implies the other.
Remark 4.2. Intuitively, $D_2$ should not affect $R(U_1)$ much as long as it is close to a unitary matrix. In fact, if $D_2$ is unitary it does not affect $R(U_1)$ at all. This suggests that when one or both $D_i$'s are closer to unitary matrices$^5$ than to the identity matrix, better bounds may be possible. Li [13] indeed presented theorems showing that $D_1$ contributes to $\sin\Theta(V_1, \widetilde V_1)$ by its departure from some unitary matrix rather than from the identity matrix, and that similar conclusions hold for $D_2$ and $\sin\Theta(U_1, \widetilde U_1)$. The reader is referred to Li [13] for details.
Remark 4.3. Theorems 4.1 and 4.2, applied to a special case [13] in which one of the $D_i$'s is the identity matrix and the other takes the form $\left(\begin{smallmatrix} I & X\\ & I \end{smallmatrix}\right)$, yield a deflation criterion that triggers sooner and is cheaper in the Demmel–Kahan QR algorithm [5] for computing the singular value system of a bidiagonal matrix; see [10].
Theorems 4.1 and 4.2 deal with rather restrictive perturbations to $B$. In what follows we show how to apply them to a more realistic situation, when a scaled $B$ is much better conditioned. Consider $B = GS \in \mathbb{C}^{n\times n}$, which is perturbed to $\widetilde B = \widetilde G S \in \mathbb{C}^{n\times n}$, where $S$ is a scaling matrix and $G$ is nonsingular. Interesting cases are when $G$ is much better conditioned than $B$. Set $\Delta G \stackrel{\rm def}{=} \widetilde G - G$. If $\|(\Delta G)G^{-1}\|_2 < 1$, then $\widetilde G = G + \Delta G = [I + (\Delta G)G^{-1}]G$ is nonsingular also.
Theorem 4.3. Let $B = GS \in \mathbb{C}^{n\times n}$ and $\widetilde B = \widetilde G S \in \mathbb{C}^{n\times n}$ with SVDs (4.1)–(4.3), where $G$ is nonsingular. Assume $\|(\Delta G)G^{-1}\|_2 < 1$. If
$$
\eta_2 \stackrel{\rm def}{=} \min_{\mu\in\sigma(\Sigma_1),\ \widetilde\mu\in\sigma(\widetilde\Sigma_2)} \varrho_2(\mu, \widetilde\mu) > 0,
$$
then
$$
(4.13)\qquad \sqrt{\|\sin\Theta(U_1, \widetilde U_1)\|_F^2 + \|\sin\Theta(V_1, \widetilde V_1)\|_F^2}
\le \frac{\sqrt{\|(\Delta G)G^{-1}U_1\|_F^2 + \|[I + G^{-*}(\Delta G)^*]^{-1}G^{-*}(\Delta G)^*U_1\|_F^2}}{\eta_2}
\le \frac{1}{\eta_2}\,\|G^{-1}\|_2\sqrt{1 + \frac{1}{(1 - \|G^{-1}\|_2\|\Delta G\|_2)^2}}\;\|\Delta G\|_F.
$$
Proof. Write $\widetilde B = \widetilde G S = [I + (\Delta G)G^{-1}]GS = D_1^* B D_2$, where $D_1^* = I + (\Delta G)G^{-1}$ and $D_2 = I$. Apply Theorem 4.1 to get (4.13).
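A quick numerical check of the computable (rightmost) bound in (4.13) can be sketched as follows. This is our illustration, not from the paper; the grading matrix $S$, the perturbation size, and the relative-gap formula $\varrho_2(\mu,\widetilde\mu) = |\mu - \widetilde\mu|/\sqrt{\mu^2 + \widetilde\mu^2}$ (Part I) are the assumed ingredients, and real data is used so adjoints are transposes.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 5, 2

G = np.eye(n) + 0.3 * rng.standard_normal((n, n))   # well-conditioned factor
S = np.diag(10.0 ** -np.arange(n))                  # harsh grading
B = G @ S
dG = 1e-5 * rng.standard_normal((n, n))             # additive change to G only
Bt = (G + dG) @ S

U, s, Vh = np.linalg.svd(B)
Ut, st, Vht = np.linalg.svd(Bt)

def sin_theta_sq(Q1, Qt1):
    c = np.clip(np.linalg.svd(Qt1.T @ Q1, compute_uv=False), 0.0, 1.0)
    return np.sum(1.0 - c**2)

lhs = np.sqrt(sin_theta_sq(U[:, :k], Ut[:, :k])
              + sin_theta_sq(Vh[:k].T, Vht[:k].T))

eta2 = min(abs(a - b) / np.hypot(a, b) for a in s[:k] for b in st[k:])
gi = np.linalg.norm(np.linalg.inv(G), 2)
rhs = gi * np.sqrt(1 + 1 / (1 - gi * np.linalg.norm(dG, 2))**2) \
      * np.linalg.norm(dG, 'fro') / eta2
assert lhs <= rhs
```

The bound depends on $G$ and $\Delta G$ only, not on the (possibly huge) condition number of $B = GS$ itself.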
Theorem 4.4. Let $B = GS \in \mathbb{C}^{n\times n}$ and $\widetilde B = \widetilde G S \in \mathbb{C}^{n\times n}$ with SVDs (4.1)–(4.3), where $G$ is nonsingular. Assume $\|(\Delta G)G^{-1}\|_2 < 1$. If there exist $\alpha \ge 0$ and $\delta > 0$ such that
$$
\min_{\mu\in\sigma(\Sigma_1)} \mu \ge \alpha + \delta \quad\text{and}\quad \max_{\widetilde\mu\in\sigma(\widetilde\Sigma_2)} \widetilde\mu \le \alpha
$$
or
$$
\max_{\mu\in\sigma(\Sigma_1)} \mu \le \alpha \quad\text{and}\quad \min_{\widetilde\mu\in\sigma(\widetilde\Sigma_2)} \widetilde\mu \ge \alpha + \delta,
$$

$^5$When both $D_1$ and $D_2$ are unitary, $\widetilde B = D_1^* B D_2 = D_1^* U\Sigma V^* D_2$ is an SVD of $\widetilde B$, which implies $\widetilde U = D_1^* U$ and $\widetilde V = D_2^* V$. Thus perturbations to singular subspaces, in this case, are independent of the gap between $\sigma(\Sigma_1)$ and $\sigma(\widetilde\Sigma_2)$, as long as they are disjoint.
then for any unitarily invariant norm $|||\cdot|||$,
$$
(4.14)\qquad \max\left\{|||\sin\Theta(U_1, \widetilde U_1)|||,\ |||\sin\Theta(V_1, \widetilde V_1)|||\right\}
\le \frac{\max\left\{|||(\Delta G)G^{-1}U_1|||,\ |||[I + G^{-*}(\Delta G)^*]^{-1}G^{-*}(\Delta G)^*U_1|||\right\}}{\eta_\infty}
\le \frac{\|G^{-1}\|_2}{1 - \|G^{-1}\|_2\|\Delta G\|_2}\;\frac{|||\Delta G|||}{\eta_\infty},
$$
$$
(4.15)\qquad \left|\left|\left|\begin{pmatrix}\sin\Theta(U_1, \widetilde U_1) & \\ & \sin\Theta(V_1, \widetilde V_1)\end{pmatrix}\right|\right|\right|
\le \frac{\sqrt[q]{|||(\Delta G)G^{-1}U_1|||^q + |||[I + G^{-*}(\Delta G)^*]^{-1}G^{-*}(\Delta G)^*U_1|||^q}}{\eta_p}
\le \frac{1}{\eta_p}\,\|G^{-1}\|_2\sqrt[q]{1 + \frac{1}{(1 - \|G^{-1}\|_2\|\Delta G\|_2)^q}}\;|||\Delta G|||,
$$
where $\eta_p \stackrel{\rm def}{=} \varrho_p(\alpha, \alpha + \delta)$ and $\eta_\infty = \varrho_\infty(\alpha, \alpha + \delta)$.
Proof. Again write $\widetilde B = \widetilde G S = [I + (\Delta G)G^{-1}]GS = D_1^* B D_2$, where $D_1^* = I + (\Delta G)G^{-1}$ and $D_2 = I$. Apply Theorem 4.2 to get (4.14) and (4.15).
Remark 4.4. Better bounds, especially when $(\Delta G)G^{-1}$ is nearly a skew-Hermitian matrix, can be proved for the angle $\Theta(V_1, \widetilde V_1)$. To prevent the paper from getting too long, we refer the reader to Li [13].
Remark 4.5. Theorems 4.3 and 4.4 can be extended to cover nonsquare matrices. Assume $B = GS$ and $\widetilde B = \widetilde G S$ are $m\times n$ ($m \ge n$); $S$ is a scaling matrix and both $G$ and $\widetilde G$ are $m\times n$; $G$ has full column rank. Let $G^\dagger = (G^*G)^{-1}G^*$ be the pseudoinverse of $G$. Notice that $G^\dagger G = I$. Then
$$
\widetilde B = \widetilde G S = (G + \Delta G)S = [I + (\Delta G)G^\dagger]GS = [I + (\Delta G)G^\dagger]B.
$$
If $\|(\Delta G)G^\dagger\|_2 \le \|G^\dagger\|_2\|\Delta G\|_2 < 1$, then $\widetilde G$ has full column rank, too. Now applying Theorems 4.1 and 4.2, we find that Theorems 4.3 and 4.4 remain valid with $G^{-1}$ replaced by $G^\dagger$, and $\sigma(\widetilde\Sigma_2)$ by $\sigma_{\rm ext}(\widetilde\Sigma_2)$.
5. Proofs of Theorems 4.1 and 4.2. We can always augment $B$ and $\widetilde B$ by $m\times(m - n)$ zero blocks to their rights to make them square. The augmented $B$ and $\widetilde B$ will have straightforward SVDs based on those of $B$ and $\widetilde B$. It turns out doing so does not affect the $U$-factors, and the $V$-factors are affected in a trivial way such that $|||\sin\Theta(V_1, \widetilde V_1)|||$ stays the same; see [13]. In what follows we shall deal with the square case only.
Let $R_R = \widetilde B V_1 - U_1\Sigma_1 = (\widetilde B - B)V_1$ and $R_L = \widetilde B^* U_1 - V_1\Sigma_1 = (\widetilde B^* - B^*)U_1$. When $m = n$, the SVDs (4.1)–(4.3) read
$$
(5.1)\qquad B = (U_1, U_2)\begin{pmatrix}\Sigma_1 & \\ & \Sigma_2\end{pmatrix}\begin{pmatrix}V_1^*\\ V_2^*\end{pmatrix}
\quad\text{and}\quad
\widetilde B = (\widetilde U_1, \widetilde U_2)\begin{pmatrix}\widetilde\Sigma_1 & \\ & \widetilde\Sigma_2\end{pmatrix}\begin{pmatrix}\widetilde V_1^*\\ \widetilde V_2^*\end{pmatrix}.
$$
Notice that
$$
\widetilde U_2^* R_R = \widetilde U_2^* \widetilde B V_1 - \widetilde U_2^* U_1\Sigma_1
= \widetilde\Sigma_2 \widetilde V_2^* V_1 - \widetilde U_2^* U_1\Sigma_1,
$$
$$
\widetilde U_2^* R_R = \widetilde U_2^*\bigl[\widetilde B(I - D_2^{-1}) + (D_1^* - I)B\bigr]V_1
= \widetilde\Sigma_2 \widetilde V_2^*(I - D_2^{-1})V_1 + \widetilde U_2^*(D_1^* - I)U_1\Sigma_1
$$
to get
$$
(5.2)\qquad \widetilde\Sigma_2 \widetilde V_2^* V_1 - \widetilde U_2^* U_1\Sigma_1
= \widetilde\Sigma_2 \widetilde V_2^*(I - D_2^{-1})V_1 + \widetilde U_2^*(D_1^* - I)U_1\Sigma_1.
$$
On the other hand,
$$
\widetilde V_2^* R_L = \widetilde V_2^* \widetilde B^* U_1 - \widetilde V_2^* V_1\Sigma_1
= \widetilde\Sigma_2 \widetilde U_2^* U_1 - \widetilde V_2^* V_1\Sigma_1,
$$
$$
\widetilde V_2^* R_L = \widetilde V_2^*\bigl[\widetilde B^*(I - D_1^{-1}) + (D_2^* - I)B^*\bigr]U_1
= \widetilde\Sigma_2 \widetilde U_2^*(I - D_1^{-1})U_1 + \widetilde V_2^*(D_2^* - I)V_1\Sigma_1,
$$
which produce
$$
(5.3)\qquad \widetilde\Sigma_2 \widetilde U_2^* U_1 - \widetilde V_2^* V_1\Sigma_1
= \widetilde\Sigma_2 \widetilde U_2^*(I - D_1^{-1})U_1 + \widetilde V_2^*(D_2^* - I)V_1\Sigma_1.
$$
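Since (5.2) and (5.3) are exact algebraic identities, they are easy to sanity-check numerically. The sketch below is our illustration, not from the paper; it uses real matrices, so every adjoint is a transpose.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 5, 2
B = rng.standard_normal((n, n))
D1 = np.eye(n) + 0.05 * rng.standard_normal((n, n))
D2 = np.eye(n) + 0.05 * rng.standard_normal((n, n))
Bt = D1.T @ B @ D2

U, s, Vh = np.linalg.svd(B)
Ut, st, Vht = np.linalg.svd(Bt)
U1, V1 = U[:, :k], Vh[:k].T
Ut2, Vt2 = Ut[:, k:], Vht[k:].T
S1, St2 = np.diag(s[:k]), np.diag(st[k:])
I = np.eye(n)

# (5.2): St2 Vt2* V1 - Ut2* U1 S1 = St2 Vt2*(I - D2^{-1})V1 + Ut2*(D1* - I)U1 S1
lhs = St2 @ Vt2.T @ V1 - Ut2.T @ U1 @ S1
rhs = St2 @ Vt2.T @ (I - np.linalg.inv(D2)) @ V1 + Ut2.T @ (D1.T - I) @ U1 @ S1
assert np.allclose(lhs, rhs)

# (5.3): St2 Ut2* U1 - Vt2* V1 S1 = St2 Ut2*(I - D1^{-1})U1 + Vt2*(D2* - I)V1 S1
lhs = St2 @ Ut2.T @ U1 - Vt2.T @ V1 @ S1
rhs = St2 @ Ut2.T @ (I - np.linalg.inv(D1)) @ U1 + Vt2.T @ (D2.T - I) @ V1 @ S1
assert np.allclose(lhs, rhs)
```

The identities hold for any valid choice of SVD factors, so sign or ordering ambiguities in the computed decompositions do not matter.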
(5.2) and (5.3) take an equivalent form as a single matrix equation with dimensions doubled:
$$
(5.4)\qquad
\begin{pmatrix} & \widetilde\Sigma_2\\ \widetilde\Sigma_2 & \end{pmatrix}
\begin{pmatrix}\widetilde U_2^* U_1 & \\ & \widetilde V_2^* V_1\end{pmatrix}
-
\begin{pmatrix}\widetilde U_2^* U_1 & \\ & \widetilde V_2^* V_1\end{pmatrix}
\begin{pmatrix} & \Sigma_1\\ \Sigma_1 & \end{pmatrix}
=
\begin{pmatrix} & \widetilde\Sigma_2\\ \widetilde\Sigma_2 & \end{pmatrix}
\begin{pmatrix}\widetilde U_2^*(I - D_1^{-1})U_1 & \\ & \widetilde V_2^*(I - D_2^{-1})V_1\end{pmatrix}
+
\begin{pmatrix}\widetilde U_2^*(D_1^* - I)U_1 & \\ & \widetilde V_2^*(D_2^* - I)V_1\end{pmatrix}
\begin{pmatrix} & \Sigma_1\\ \Sigma_1 & \end{pmatrix}.
$$
(5.2)–(5.4) can also be rearranged in such a way that sharper bounds can be proved when the $D_i$'s are closer to unitary matrices than to the identity matrix. Write
$$
X \stackrel{\rm def}{=} \widetilde V_2^* D_2^* V_1 = \widetilde V_2^* V_1 - \widetilde V_2^*(I - D_2^*)V_1
\quad\text{and}\quad
Y \stackrel{\rm def}{=} \widetilde U_2^* D_1^* U_1 = \widetilde U_2^* U_1 - \widetilde U_2^*(I - D_1^*)U_1.
$$
We have, from (5.2) and (5.3),
$$
(5.5)\qquad \widetilde\Sigma_2 X - Y\Sigma_1 = \widetilde\Sigma_2 \widetilde V_2^*(D_2^* - D_2^{-1})V_1,
$$
$$
(5.6)\qquad \widetilde\Sigma_2 Y - X\Sigma_1 = \widetilde\Sigma_2 \widetilde U_2^*(D_1^* - D_1^{-1})U_1,
$$
and, from (5.4),
$$
(5.7)\qquad
\begin{pmatrix} & \widetilde\Sigma_2\\ \widetilde\Sigma_2 & \end{pmatrix}
\begin{pmatrix}Y & \\ & X\end{pmatrix}
-
\begin{pmatrix}Y & \\ & X\end{pmatrix}
\begin{pmatrix} & \Sigma_1\\ \Sigma_1 & \end{pmatrix}
=
\begin{pmatrix} & \widetilde\Sigma_2\\ \widetilde\Sigma_2 & \end{pmatrix}
\begin{pmatrix}\widetilde U_2^*(D_1^* - D_1^{-1})U_1 & \\ & \widetilde V_2^*(D_2^* - D_2^{-1})V_1\end{pmatrix}.
$$
(5.2)–(5.4) make $\varrho_p$ a natural choice for measuring the relative gaps between $\sigma(\Sigma_1)$ and $\sigma(\widetilde\Sigma_2)$, while (5.5)–(5.7) make the classical measurement (2.2) a natural choice.
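Equations (5.5) and (5.6) are likewise exact identities; the numerical sanity check below (our illustration, real data) confirms that their right-hand sides involve each $D_i$ only through $D_i^* - D_i^{-1}$, its departure from a unitary (here orthogonal) matrix.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 5, 2
B = rng.standard_normal((n, n))
D1 = np.eye(n) + 0.05 * rng.standard_normal((n, n))
D2 = np.eye(n) + 0.05 * rng.standard_normal((n, n))
Bt = D1.T @ B @ D2

U, s, Vh = np.linalg.svd(B)
Ut, st, Vht = np.linalg.svd(Bt)
U1, V1 = U[:, :k], Vh[:k].T
Ut2, Vt2 = Ut[:, k:], Vht[k:].T
S1, St2 = np.diag(s[:k]), np.diag(st[k:])

# X and Y as defined above
X = Vt2.T @ D2.T @ V1
Y = Ut2.T @ D1.T @ U1

# (5.5): St2 X - Y S1 involves D2 only through D2^* - D2^{-1}
assert np.allclose(St2 @ X - Y @ S1,
                   St2 @ Vt2.T @ (D2.T - np.linalg.inv(D2)) @ V1)
# (5.6): St2 Y - X S1 involves D1 only through D1^* - D1^{-1}
assert np.allclose(St2 @ Y - X @ S1,
                   St2 @ Ut2.T @ (D1.T - np.linalg.inv(D1)) @ U1)
```

In particular, if $D_1$ and $D_2$ are exactly orthogonal, both right-hand sides vanish, consistent with footnote 5.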
Remark 5.1. Just as in Remark 3.1, perturbation equations (5.2)–(5.7) can be modified to bound the closeness of the singular subspaces spanned by a few selected columns of $U_1$ and $V_1$ to $R(\widetilde U_1)$ and $R(\widetilde V_1)$.
Proof of Theorem 4.1. Notice that the eigenvalues of $\left(\begin{smallmatrix} & \Sigma_1\\ \Sigma_1 & \end{smallmatrix}\right)$ are $\pm\sigma_i$, that those of $\left(\begin{smallmatrix} & \widetilde\Sigma_2\\ \widetilde\Sigma_2 & \end{smallmatrix}\right)$ are $\pm\widetilde\sigma_{k+j}$, and that
$$
\varrho_2(\sigma_i, -\widetilde\sigma_{k+j}) \ge \varrho_2(\sigma_i, \widetilde\sigma_{k+j})
\quad\text{and}\quad
\varrho_2(-\sigma_i, \widetilde\sigma_{k+j}) \ge \varrho_2(\sigma_i, \widetilde\sigma_{k+j}).
$$
By Lemma 2.2 and (5.4), we have
$$
\|\widetilde U_2^* U_1\|_F^2 + \|\widetilde V_2^* V_1\|_F^2
\le \frac{1}{\eta_2^2}\Bigl[\|\widetilde U_2^*(I - D_1^{-1})U_1\|_F^2 + \|\widetilde U_2^*(D_1^* - I)U_1\|_F^2
+ \|\widetilde V_2^*(I - D_2^{-1})V_1\|_F^2 + \|\widetilde V_2^*(D_2^* - I)V_1\|_F^2\Bigr]
$$
$$
\le \frac{1}{\eta_2^2}\Bigl[\|(I - D_1^{-1})U_1\|_F^2 + \|(D_1^* - I)U_1\|_F^2
+ \|(I - D_2^{-1})V_1\|_F^2 + \|(D_2^* - I)V_1\|_F^2\Bigr],
$$
which gives (4.6). By Lemma 2.2 and (5.7), we have
$$
\sqrt{\|X\|_F^2 + \|Y\|_F^2}
\le \frac{1}{\eta_c}\sqrt{\|\widetilde U_2^*(D_1^* - D_1^{-1})U_1\|_F^2 + \|\widetilde V_2^*(D_2^* - D_2^{-1})V_1\|_F^2}
\le \frac{1}{\eta_c}\sqrt{\|(D_1^* - D_1^{-1})U_1\|_F^2 + \|(D_2^* - D_2^{-1})V_1\|_F^2},
$$
which, together with
$$
\sqrt{\|\widetilde V_2^* V_1\|_F^2 + \|\widetilde U_2^* U_1\|_F^2}
= \sqrt{\|X + \widetilde V_2^*(I - D_2^*)V_1\|_F^2 + \|Y + \widetilde U_2^*(I - D_1^*)U_1\|_F^2}
\le \sqrt{\|X\|_F^2 + \|Y\|_F^2} + \sqrt{\|\widetilde V_2^*(I - D_2^*)V_1\|_F^2 + \|\widetilde U_2^*(I - D_1^*)U_1\|_F^2},
$$
imply (4.7).
Remark 5.2. Without assuming the multiplicative structure in the perturbation of $B$ to $\widetilde B$, we shall end up with
$$
\widetilde\Sigma_2 \widetilde V_2^* V_1 - \widetilde U_2^* U_1\Sigma_1 = \widetilde U_2^* R_R
\quad\text{and}\quad
\widetilde\Sigma_2 \widetilde U_2^* U_1 - \widetilde V_2^* V_1\Sigma_1 = \widetilde V_2^* R_L,
$$
which lead to Wedin $\sin\theta$ theorems, e.g., (4.8).
Lemma 5.1. Let $\Omega \in \mathbb{C}^{s\times s}$ and $\Gamma \in \mathbb{C}^{t\times t}$ be two Hermitian matrices, and let $E, \widetilde E, F, \widetilde F \in \mathbb{C}^{s\times t}$. If there exist $\alpha \ge 0$ and $\delta > 0$ such that
$$
(5.8)\qquad \|\Omega\|_2 \le \alpha \quad\text{and}\quad \|\Gamma^{-1}\|_2^{-1} \ge \alpha + \delta
$$
or
$$
(5.9)\qquad \|\Omega^{-1}\|_2^{-1} \ge \alpha + \delta \quad\text{and}\quad \|\Gamma\|_2 \le \alpha,
$$
then the pair of equations $\Omega X - Y\Gamma = \Omega E + F\Gamma$ and $\Omega Y - X\Gamma = \Omega\widetilde E + \widetilde F\Gamma$ has a unique solution $X, Y \in \mathbb{C}^{s\times t}$, and moreover for any unitarily invariant norm $|||\cdot|||$,
$$
(5.10)\qquad \max\{|||X|||, |||Y|||\}
\le \frac{1}{\eta_p}\max\left\{\sqrt[q]{|||E|||^q + |||F|||^q},\ \sqrt[q]{|||\widetilde E|||^q + |||\widetilde F|||^q}\right\},
$$
where $\eta_p \stackrel{\rm def}{=} \varrho_p(\alpha, \alpha + \delta)$. If, in addition, $F = \widetilde F = 0$, we have a better bound
$$
\max\{|||X|||, |||Y|||\} \le \frac{1}{\eta_c}\max\bigl\{|||E|||,\ |||\widetilde E|||\bigr\},
$$
where $\eta_c = \delta/\alpha$ when (5.8) holds and $\eta_c = \delta/(\alpha + \delta)$ when (5.9) holds.
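For concreteness, the coupled equations of Lemma 5.1 can be solved by vectorization and the bound (5.10) checked in the Frobenius norm, i.e., $p = q = 2$, where $\eta_p = \delta/\sqrt{\alpha^2 + (\alpha+\delta)^2}$. The sketch below is ours, set up under case (5.8) with diagonal $\Omega$ and $\Gamma$ for simplicity; the dense Kronecker solve is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
s, t = 4, 3
alpha, delta = 1.0, 0.5

# Hermitian Omega, Gamma realizing case (5.8): ||Omega||_2 <= alpha and
# ||Gamma^{-1}||_2^{-1} >= alpha + delta (diagonal here for simplicity)
Omega = np.diag(rng.uniform(-alpha, alpha, s))
Gamma = np.diag(rng.uniform(alpha + delta, 3.0, t))
E, Et, F, Ft = (rng.standard_normal((s, t)) for _ in range(4))

# Solve Omega X - Y Gamma = Omega E + F Gamma,
#       Omega Y - X Gamma = Omega Et + Ft Gamma   by vectorization:
# vec(Omega X) = (I kron Omega) vec X, vec(Y Gamma) = (Gamma^T kron I) vec Y.
A11 = np.kron(np.eye(t), Omega)
A12 = -np.kron(Gamma.T, np.eye(s))
M = np.block([[A11, A12], [A12, A11]])
b = np.concatenate([(Omega @ E + F @ Gamma).ravel(order='F'),
                    (Omega @ Et + Ft @ Gamma).ravel(order='F')])
xy = np.linalg.solve(M, b)
X = xy[:s * t].reshape((s, t), order='F')
Y = xy[s * t:].reshape((s, t), order='F')

# the equations are satisfied ...
assert np.allclose(Omega @ X - Y @ Gamma, Omega @ E + F @ Gamma)
assert np.allclose(Omega @ Y - X @ Gamma, Omega @ Et + Ft @ Gamma)

# ... and (5.10) holds with the Frobenius norm (p = q = 2)
fro = lambda Z: np.linalg.norm(Z, 'fro')
eta_p = delta / np.hypot(alpha, alpha + delta)
bound = max(np.hypot(fro(E), fro(F)), np.hypot(fro(Et), fro(Ft))) / eta_p
assert max(fro(X), fro(Y)) <= bound
```

In practice one would use a structured Sylvester solver (e.g., Bartels–Stewart) rather than forming the $O((st)^3)$ Kronecker system explicitly.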
Proof. The proof of the existence and uniqueness of $X, Y \in \mathbb{C}^{s\times t}$ is left to the reader. We present a proof of (5.10) for the case (5.8); a proof for the other case is analogous. Consider first the subcase $|||X||| \ge |||Y|||$. Postmultiply $\Omega Y - X\Gamma = \Omega\widetilde E + \widetilde F\Gamma$ by $\Gamma^{-1}$ to get
$$
(5.11)\qquad \Omega Y\Gamma^{-1} - X = \Omega\widetilde E\Gamma^{-1} + \widetilde F.
$$
Then we have, by $\|\Omega\|_2 \le \alpha$ and $\|\Gamma^{-1}\|_2^{-1} \ge \alpha + \delta \Rightarrow \|\Gamma^{-1}\|_2 \le \frac{1}{\alpha+\delta}$, that
$$
|||\Omega Y\Gamma^{-1} - X||| \ge |||X||| - |||\Omega Y\Gamma^{-1}|||
\ge |||X||| - \|\Omega\|_2\,|||Y|||\,\|\Gamma^{-1}\|_2
\ge |||X||| - \alpha\frac{1}{\alpha+\delta}|||Y|||
\ge |||X||| - \alpha\frac{1}{\alpha+\delta}|||X|||
= \left(1 - \frac{\alpha}{\alpha+\delta}\right)|||X|||
$$
and
$$
|||\Omega\widetilde E\Gamma^{-1} + \widetilde F||| \le |||\Omega\widetilde E\Gamma^{-1}||| + |||\widetilde F|||
\le \|\Omega\|_2\,|||\widetilde E|||\,\|\Gamma^{-1}\|_2 + |||\widetilde F|||
\le \frac{\alpha}{\alpha+\delta}|||\widetilde E||| + |||\widetilde F|||
\le \sqrt[p]{1 + \frac{\alpha^p}{(\alpha+\delta)^p}}\;\sqrt[q]{|||\widetilde E|||^q + |||\widetilde F|||^q}.
$$
By (5.11), we deduce that
$$
\left(1 - \frac{\alpha}{\alpha+\delta}\right)|||X|||
\le \sqrt[p]{1 + \frac{\alpha^p}{(\alpha+\delta)^p}}\;\sqrt[q]{|||\widetilde E|||^q + |||\widetilde F|||^q},
$$
which produces that if $|||X||| \ge |||Y|||$, then $|||X||| \le \frac{1}{\eta_p}\sqrt[q]{|||\widetilde E|||^q + |||\widetilde F|||^q}$. Similarly, if $|||X||| < |||Y|||$, from $\Omega X - Y\Gamma = \Omega E + F\Gamma$ we can obtain $|||Y||| \le \frac{1}{\eta_p}\sqrt[q]{|||E|||^q + |||F|||^q}$. Inequality (5.10) now follows. The case $F = \widetilde F = 0$ can be handled analogously.
Proof of Theorem 4.2. By (5.2) and (5.3) and Lemma 5.1, we have
$$
\max\bigl\{|||\widetilde U_2^* U_1|||,\ |||\widetilde V_2^* V_1|||\bigr\}
\le \frac{1}{\eta_p}\max\left\{\sqrt[q]{|||\widetilde V_2^*(I - D_2^{-1})V_1|||^q + |||\widetilde U_2^*(D_1^* - I)U_1|||^q},\ \sqrt[q]{|||\widetilde U_2^*(I - D_1^{-1})U_1|||^q + |||\widetilde V_2^*(D_2^* - I)V_1|||^q}\right\}
$$
$$
\le \frac{1}{\eta_p}\max\left\{\sqrt[q]{|||(I - D_2^{-1})V_1|||^q + |||(D_1^* - I)U_1|||^q},\ \sqrt[q]{|||(I - D_1^{-1})U_1|||^q + |||(D_2^* - I)V_1|||^q}\right\},
$$
as required. Lemma 2.3 and (5.4) yield that
$$
(5.12)\qquad \left|\left|\left|\begin{pmatrix}\widetilde U_2^* U_1 & \\ & \widetilde V_2^* V_1\end{pmatrix}\right|\right|\right|
\le \frac{1}{\eta_p}\sqrt[q]{\left|\left|\left|\begin{pmatrix}\widetilde U_2^*(I - D_1^{-1})U_1 & \\ & \widetilde V_2^*(I - D_2^{-1})V_1\end{pmatrix}\right|\right|\right|^q
+ \left|\left|\left|\begin{pmatrix}\widetilde U_2^*(D_1^* - I)U_1 & \\ & \widetilde V_2^*(D_2^* - I)V_1\end{pmatrix}\right|\right|\right|^q},
$$
since the conditions of Theorem 4.2 imply
$$
\left\|\begin{pmatrix} & \widetilde\Sigma_2\\ \widetilde\Sigma_2 & \end{pmatrix}\right\|_2 \le \alpha,
\qquad
\left\|\begin{pmatrix} & \Sigma_1\\ \Sigma_1 & \end{pmatrix}^{-1}\right\|_2 \le \frac{1}{\alpha + \delta}.
$$
Since $\widetilde U_2^* U_1$ and $\sin\Theta(U_1, \widetilde U_1)$ have the same nonzero singular values and so do $\widetilde V_2^* V_1$ and $\sin\Theta(V_1, \widetilde V_1)$,
$$
(5.13)\qquad \left|\left|\left|\begin{pmatrix}\sin\Theta(U_1, \widetilde U_1) & \\ & \sin\Theta(V_1, \widetilde V_1)\end{pmatrix}\right|\right|\right|
= \left|\left|\left|\begin{pmatrix}\widetilde U_2^* U_1 & \\ & \widetilde V_2^* V_1\end{pmatrix}\right|\right|\right|.
$$
sin Θ(V1 , Ve1 ) Ve2∗ V1 Note also
Ã
e ∗ (I − D−1 )U1
U
2
1
Ã
! Ã
Ve2∗ (I − D2−1 )V1
e ∗ (D∗ − I)U1
U
2
1
Ve2∗ (D2∗ − I)V1
=
! Ã
=
e∗
U
2
e∗
U
2
!
Ve2∗
Ve2∗
!
(I − D1−1 )U1
(D1∗ − I)U1
(D2∗ − I)V1
Thus we arrive at
(5.14)
Ã
e ∗
U2 (I − D1−1 )U1
(5.15)
Ã
e ∗ ∗
U2 (D1 − I)U1
! (I − D1−1 )U1
≤
Ve2∗ (I − D2−1 )V1 ! (D1∗ − I)U1
≤
Ve2∗ (D2∗ − I)V1 ,
(I − D2−1 )V1 .
(D2∗ − I)V1 Inequality (4.10) is a consequence of (5.12)–(5.15). By (5.5) and (5.6) and Lemma 5.1,
we have
o
n
1
e ∗ ∗
−1
max Ve2∗ (D2∗ − D2−1 )V1 , U
(D
−
D
)U
max {|||X||| , |||Y |||} ≤
1 2
1
1
ηc
ª
1
max (D2∗ − D2−1 )V1 , (D1∗ − D1−1 )U1 .
≤
ηc
Notice that
,
(I − D2−1 )V1
e ∗ V2 V1 ≤ |||X||| + Ve2∗ (I − D2∗ )V1 ,
e ∗
e ∗ ∗
U2 U1 ≤ |||X||| + U
2 (I − D1 )U1 .
Inequality (4.11) now follows. Similarly apply Lemma 2.3 to (5.7) to get (4.12).
.
6. Conclusions. We have developed a relative perturbation theory for eigenspace
and singular subspace variations under multiplicative perturbations. In the theory,
extensions of Davis–Kahan sin θ theorems and Wedin sin θ theorems from the classical
perturbation theory are made. Our unifying treatment covers almost all previously
studied cases over the last six years or so and yet produces sharper bounds. Straightforward extensions of the underlying theory of this paper to diagonalizable matrices
are outlined.
The theory is built upon bounds on solutions to various Sylvester equations with structured right-hand sides. This provides technical links to the classical Davis and Kahan development for eigenvalue problems and to Wedin's development for singular value problems. Although these equations are used in this paper only as tools to study eigenspace and singular subspace variations, we believe they deserve at least as much attention as the bounds they lead to.
Acknowledgment. I thank Professor W. Kahan for his consistent encouragement and support and Professor J. Demmel and Professor B. N. Parlett for helpful
discussions. The referees’ constructive comments for improving the presentation are
greatly appreciated.
REFERENCES
[1] J. Barlow and J. Demmel, Computing accurate eigensystems of scaled diagonally dominant
matrices, SIAM J. Numer. Anal., 27 (1990), pp. 762–791.
[2] R. Bhatia, C. Davis, and A. McIntosh, Perturbation of spectral subspaces and solution of
linear operator equations, Linear Algebra Appl., 52–53 (1983), pp. 45–67.
[3] C. Davis and W. Kahan, The rotation of eigenvectors by a perturbation. III, SIAM J. Numer.
Anal., 7 (1970), pp. 1–46.
[4] J. Demmel and W. Gragg, On computing accurate singular values and eigenvalues of matrices
with acyclic graphs, Linear Algebra Appl., 185 (1993), pp. 203–217.
[5] J. Demmel and W. Kahan, Accurate singular values of bidiagonal matrices, SIAM J. Sci.
Statist. Comput., 11 (1990), pp. 873–912.
[6] J. Demmel and K. Veselić, Jacobi’s method is more accurate than QR, SIAM J. Matrix Anal.
Appl., 13 (1992), pp. 1204–1245.
[7] S. C. Eisenstat and I. C. F. Ipsen, Relative perturbation bounds for eigenspaces and singular
vector subspaces, in Proceedings of the Fifth SIAM Conference on Applied Linear Algebra,
J. G. Lewis, ed., SIAM, Philadelphia, 1994, pp. 62–66.
[8] S. C. Eisenstat and I. C. F. Ipsen, Relative perturbation techniques for singular value problems, SIAM J. Numer. Anal., 32 (1995), pp. 1972–1988.
[9] W. Kahan, Accurate Eigenvalues of a Symmetric Tridiagonal Matrix, Technical Report CS41,
Computer Science Department, Stanford University, Stanford, CA, 1966 (revised June
1968).
[10] R.-C. Li, On Deflating Bidiagonal Matrices, manuscript, Department of Mathematics, University of California, Berkeley, CA, 1994.
[11] R.-C. Li, On perturbations of matrix pencils with real spectra, Math. Comp., 62 (1994), pp. 231–
265.
[12] R.-C. Li, Relative perturbation theory: (I). Eigenvalue and singular value variations, SIAM J.
Matrix Anal. Appl., 19 (1998), pp. 956–982.
[13] R.-C. Li, Relative perturbation theory: (II) Eigenspace and singular subspace variations, Technical Report UCB//CSD-94-856, Computer Science Division, Department of EECS, University of California at Berkeley, 1994, also LAPACK Working notes # 85 (revised January
1996 and April 1996, available at http://www.netlib.org/lapack/lawns/lawn85.ps).
[14] R. Mathias, Spectral Perturbation Bounds for Positive Definite Matrices, SIAM J. Matrix
Anal. Appl., 18 (1997), pp. 959–980.
[15] G. W. Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press, Boston, 1990.
[16] P.-Å. Wedin, Perturbation bounds in connection with singular value decomposition, BIT, 12
(1972), pp. 99–111.
[17] P.-Å. Wedin, On angles between subspaces, in Matrix Pencils, B. Kågström and A. Ruhe, eds.,
Springer–Verlag, New York, 1983, pp. 263–285.