Maximization of the Sum of the Trace Ratio on the Stiefel Manifold

Lei-Hong Zhang*        Ren-Cang Li†

Technical Report 2013-04
http://www.uta.edu/math/preprint/

July 14, 2013
Abstract

We are concerned with the maximization of

    tr(V^T AV)/tr(V^T BV) + tr(V^T CV)

over the Stiefel manifold {V ∈ R^{m×ℓ} | V^T V = I_ℓ} (ℓ < m), where B is a given symmetric and positive definite matrix, A and C are symmetric matrices, and tr(·) is the trace of a square matrix. This is a subspace version of the maximization problem studied in [Zhang, Comput. Optim. Appl., 54 (2013), 111-139], which arises from real-world applications in, for example, the downlink of a multi-user MIMO system and the sparse Fisher discriminant analysis in pattern recognition. We establish necessary conditions for both the local and global maximizers and connect the problem with a nonlinear extreme eigenvalue problem. The necessary condition for the global maximizers offers deep insights into the problem on the one hand and leads to a self-consistent-field (SCF) iteration on the other hand. We analyze the global and local convergence of the SCF iteration, and show that the necessary condition for the global maximizers is fulfilled at any convergent point of the sequences of approximations generated by the SCF iteration. This is one of the advantages of the SCF iteration over optimization-based methods. Preliminary numerical tests are reported and show that the SCF iteration is much more efficient than manifold-based optimization methods.

Key words. Trace ratio, Rayleigh quotient, the Stiefel manifold, nonlinear eigenvalue problem, optimality conditions, Self-Consistent-Field iteration, eigenspace

AMS subject classifications. 65F15, 65F30, 62H30, 15A18
* Department of Applied Mathematics, Shanghai University of Finance and Economics, 777 Guoding Road, Shanghai 200433, People's Republic of China. Email: [email protected]. Supported in part by the National Natural Science Foundation of China NSFC-11101257 and the Basic Academic Discipline Program, the 11th five year plan of 211 Project for Shanghai University of Finance and Economics. Part of this work was done while this author was a visiting scholar at the Department of Mathematics, University of Texas at Arlington from February 2013 to January 2014.
† Department of Mathematics, University of Texas at Arlington, P.O. Box 19408, Arlington, TX 76019-0408, USA. Email: [email protected]. Supported in part by the National Science Foundation under Grant No. DMS-1115834.
Contents

1 Introduction                                                3
2 Preliminaries                                               4
  2.1 Angles between subspaces                                4
  2.2 Unitarily invariant norm                                5
3 First and second optimality conditions                      6
  3.1 First-order optimality condition                        7
  3.2 Second-order optimality condition                       8
4 A necessary condition for local maximizers                 11
5 A necessary condition for global maximizers                12
6 A scheme to make progress at a KKT point                   17
7 Self-consistent-field iteration                            20
  7.1 Self-Consistent-Field (SCF) Iteration                  20
  7.2 A divergent example for SCF iteration                  21
  7.3 Global convergence for the case C = ηB (η > 0)         22
  7.4 Local convergence for the general case                 26
8 Numerical experiments                                      33
9 Concluding remarks and future research                     36
1 Introduction

We consider the maximization problem:

    max_{V^T V = I_ℓ}  { tr(V^T AV)/tr(V^T BV) + tr(V^T CV) },                        (1.1)

where tr(·) stands for the trace of a square matrix, A, B, and C ∈ R^{m×m} are real symmetric with B positive definite, and the integer ℓ < m.
The problem (1.1) for the case ℓ = 1 was investigated in [36]. It arises from real-world applications in, e.g., the downlink of a multi-user MIMO system [20] and the sparse Fisher discriminant analysis in pattern recognition [6, 8, 16, 36]. In [36], the optimality conditions for (1.1) with ℓ = 1, including necessary conditions for local and global maximizers, are established. It is interesting to note that these conditions connect the optimization problem (1.1) with ℓ = 1 to a nonlinear eigenvalue problem, which offers fresh insights into (1.1) and injects new ideas for designing efficient algorithms.
The present paper deals with the subspace version of (1.1) and generalizes these interesting properties from ℓ = 1 to ℓ > 1. We point out that although the problem (1.1) with ℓ > 1 and with ℓ = 1 share certain similar properties, the subspace version is intrinsically harder, entailing new techniques for its investigation.
Our major contributions in this paper are the following.
1. Any critical point (aka KKT point) V of (1.1) is a solution to a nonlinear eigenvalue problem

       E(V)V = V [V^T E(V)V],                                                        (1.2)

   where E(V) ∈ R^{m×m}, to be given in section 3, is symmetric and depends on V.

2. A necessary condition for the critical point V to be a local maximizer is that the largest eigenvalue of V^T E(V)V is no smaller than the 2ℓ-th largest eigenvalue of E(V). (Note that the eigenvalues of V^T E(V)V are part of those of E(V) because of (1.2).)
3. A necessary condition for the critical point V to be a global maximizer is that V is an orthonormal eigenbasis¹ of E(V) corresponding to its ℓ largest eigenvalues. This necessary condition, together with (1.2), lends itself to a self-consistent-field (SCF) iteration for computing a global maximizer.

4. We propose a scheme to move away from a KKT point V, local maximizers included, that does not satisfy the necessary condition for a global maximizer in the previous item.

5. We analyze the convergence behavior of the SCF iteration and prove a global convergence result for the special case C = ηB (η > 0). Locally linear and quadratic convergence for the general case is also established.
¹ We follow [26, p. 242] in using the term eigenbasis to refer to a matrix X whose columns are linearly independent and span an invariant subspace of another matrix H; X is an orthonormal eigenbasis if its columns are orthonormal.
We note that there is no guarantee that the SCF iteration will deliver a global maximizer at convergence, but it does deliver a KKT point that satisfies the necessary condition for the global maximizers. This is actually an advantage of the SCF iteration over some optimization-based methods [1, 18], which in general only converge to a KKT point that may or may not satisfy the necessary condition for the global maximizers. Numerical tests are presented to show that the SCF iteration in general is much more efficient than some manifold-based optimization methods [2, 7], both in accuracy and in running time.
The rest of this paper is organized as follows. Section 2 collects some preliminaries
that are needed in the convergence analysis of the SCF iteration. Section 3 develops the
first and second order optimality conditions for local maximizers using the traditional
optimality conditions which relate (1.1) to a special nonlinear eigenvalue problem (1.2).
In sections 4 and 5, further characterizations of a local maximizer (section 4) and global
maximizer (section 5) in terms of the eigenvalues of the associated nonlinear eigenvalue
problem are obtained. A scheme to break out of a local maximizer/KKT point is proposed
in section 6. The SCF iteration and its convergence analysis are given in section 7. Section
8 reports our numerical experiments. Finally, we present our conclusions in Section 9.
Notation. R^{n×m} is the set of all n × m real matrices, R^n = R^{n×1}, and R = R^1. I_n (or simply I if its dimension is clear from the context) is the n × n identity matrix. All vectors are column vectors and are in bold. For Z ∈ R^{m×n}, Z^T denotes its transpose and R(Z) its column space, spanned by its column vectors. For Z ∈ R^{n×n}, eig(Z) = {λ_i(Z), 1 ≤ i ≤ n} is the set of the eigenvalues of Z, and if all λ_i(Z) are real, they are arranged in descending order:

    λ_1(Z) ≥ λ_2(Z) ≥ ··· ≥ λ_n(Z).

The symmetric part of Z is sym(Z) := (Z + Z^T)/2. ||Z||_2, ||Z||_F, and |||Z||| are the spectral norm, the Frobenius norm, and a general unitarily invariant norm, respectively.
2 Preliminaries

2.1 Angles between subspaces

Consider two subspaces X and Y of R^m and suppose

    k := dim(X) ≤ dim(Y) =: ℓ.                                                       (2.1)

Let X ∈ R^{m×k} and Y ∈ R^{m×ℓ} be orthonormal basis matrices of X and Y, respectively, i.e.,

    X^T X = I_k, X = R(X),   and   Y^T Y = I_ℓ, Y = R(Y),

and denote by σ_j for 1 ≤ j ≤ k, in descending order σ_1 ≥ ··· ≥ σ_k, the singular values of Y^T X. The k canonical angles θ_j(X, Y) from² X to Y are defined by

    0 ≤ θ_j(X, Y) := arccos σ_j ≤ π/2   for 1 ≤ j ≤ k.                               (2.2)

They are in ascending order, i.e., θ_1(X, Y) ≤ ··· ≤ θ_k(X, Y). Set

    Θ(X, Y) = diag(θ_1(X, Y), ..., θ_k(X, Y)).                                       (2.3)

It can be seen that the angles so defined are independent of the orthonormal basis matrices X and Y, which are not unique. A different way to define these angles is through orthogonal projections onto X and Y [31].

When k = 1, i.e., X is a vector, there is only one canonical angle from X to Y and we simply write θ(X, Y).

In what follows, we sometimes place a vector or matrix in one or both arguments of θ_j(·,·), θ(·,·), and Θ(·,·) with the understanding that it refers to the subspace spanned by the vector or by the columns of the matrix argument.

||sin Θ(X, Y)||_2 defines a distance metric between X and Y. It can be proved that

    ||sin Θ(X, Y)||_2 = sin θ_k(X, Y) = ||X^T Y_⊥||_2,

where Y_⊥ is an orthonormal basis matrix of the orthogonal complement of Y.

² If k = ℓ, we may say that these angles are between X and Y.
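For readers who wish to compute these quantities, here is a minimal NumPy sketch (our own illustration, not part of the paper) of the canonical angles and of the distance ||sin Θ(X, Y)||_2:

import numpy as np

def canonical_angles(X, Y):
    # Canonical angles (ascending) from span(X) to span(Y); X, Y have orthonormal
    # columns with k = X.shape[1] <= Y.shape[1], as in (2.1)-(2.2).
    s = np.linalg.svd(Y.T @ X, compute_uv=False)   # sigma_1 >= ... >= sigma_k
    return np.arccos(np.clip(s, 0.0, 1.0))         # theta_1 <= ... <= theta_k

def sin_theta_dist(X, Y):
    # ||sin Theta(X, Y)||_2 equals the sine of the largest canonical angle.
    return np.sin(canonical_angles(X, Y)).max()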
2.2 Unitarily invariant norm

A matrix norm |||·||| is called a unitarily invariant norm on C^{m×n} (the set of m×n complex matrices) if it is a matrix norm and has the following two properties:

1. |||XBY||| = |||B||| for all unitary matrices X and Y of apt sizes and B ∈ C^{m×n};
2. |||B||| = ||B||_2, the spectral norm of B, if rank(B) = 1.

Two commonly used unitarily invariant norms are

    the spectral norm:    ||B||_2 = max_j σ_j,
    the Frobenius norm:   ||B||_F = sqrt( Σ_j σ_j² ),

where σ_1, σ_2, ..., σ_{min{m,n}} are the singular values of B. The trace norm

    ||B||_tr = Σ_j σ_j

is a unitarily invariant norm, too.

In this article, for convenience, any |||·||| we use is generic to matrix sizes in the sense that it applies to matrices of all sizes. Examples include the matrix spectral norm ||·||_2, the Frobenius norm ||·||_F, and the trace norm. Two important properties of unitarily invariant norms are

    ||X||_2 ≤ |||X|||,   |||XYZ||| ≤ ||X||_2 · |||Y||| · ||Z||_2                      (2.4)

for any matrices X, Y, and Z of compatible sizes. By [11, p. 176],

    |tr(X)| ≤ ||X||_tr ≤ n ||X||_2   for X ∈ C^{n×n}.                                (2.5)
Lemma 2.1 ([4]). Let H and M be two Hermitian matrices, and let S be a matrix of compatible size as determined by the Sylvester equation HY − YM = S. If either eig(H) is contained in a closed interval that contains no eigenvalue of M, or vice versa, then the equation has a unique solution Y, and moreover

    |||Y||| ≤ (1/τ) |||S|||,

where τ = min |μ − ω| over all μ ∈ eig(M) and ω ∈ eig(H).
3 First and second optimality conditions

To simplify our presentation, we introduce the following notation

    φ_A(V) := tr(V^T AV),   φ_B(V) := tr(V^T BV),   φ_C(V) := tr(V^T CV),            (3.1)

for any V ∈ R^{m×ℓ}. The objective function in (1.1) can now be written as

    f(V) := φ_A(V)/φ_B(V) + φ_C(V),                                                  (3.2)

subject to the constraint V^T V = I_ℓ. Let

    O^{m×ℓ} := {V ∈ R^{m×ℓ} : V^T V = I_ℓ}.

We remark that maximizing f(V) over O^{m×ℓ} is much more difficult than maximizing each term, i.e., φ_A(V)/φ_B(V) and φ_C(V), individually on O^{m×ℓ}. A classical result for

    max_{V ∈ O^{m×ℓ}} tr(V^T CV)                                                     (3.3)

reveals that any solution V of (3.3) is an orthonormal eigenbasis associated with the ℓ largest eigenvalues of C, and there is no local but non-global maximizer (i.e., any local maximizer is also a global maximizer). A recent result in [24] for the trace ratio problem

    max_{V ∈ O^{m×ℓ}} tr(V^T AV)/tr(V^T BV)                                          (3.4)

shows that no local but non-global maximizer exists, either. This nice property ensures that any monotonic and convergent iteration is capable of finding a global maximizer for (3.3) or (3.4) numerically. However, when f(V) is to be maximized over O^{m×ℓ}, the story is very different because local but non-global maximizers can occur, as a numerical example for the case ℓ = 1 in [36] shows.

Necessary and sufficient conditions for global maximizers of an optimization problem are generally difficult to establish. This is the case for maximizing f(V) here, too. In this section, we first resort to the traditional approach to derive first-order and second-order optimality conditions. They usually follow from the classical Lagrange multiplier theory (e.g., [18]), but considering the nice geometric properties of the underlying Riemannian manifold O^{m×ℓ}, we prefer to treat the function f(V) as the restriction f|_{O^{m×ℓ}}(V) : O^{m×ℓ} → R that is to be maximized as an unconstrained optimization problem. For more discussions on the optimality conditions of nonlinear programming on manifolds, the reader is referred to [1, 35] and the references therein.
We can view O^{m×ℓ} as an embedded submanifold (the Stiefel manifold, see e.g., [1, 3, 7, 10]) of the Euclidean space R^{m×ℓ}. The tangent space T_V O^{m×ℓ} at any V ∈ O^{m×ℓ} can be expressed as [1, p. 42], [3, Theorem 1]

    T_V O^{m×ℓ} := {X ∈ R^{m×ℓ} : X^T V + V^T X = 0}                                          (3.5a)
                 = {X = VK + (I_m − VV^T)J : K = −K^T ∈ R^{ℓ×ℓ}, J ∈ R^{m×ℓ}}.                (3.5b)

On T_V O^{m×ℓ}, we introduce the standard inner product (or the Frobenius inner product)

    <X, Y> = tr(X^T Y),   for all X, Y ∈ T_V O^{m×ℓ}.

It is known that the orthogonal projection of Z ∈ R^{m×ℓ} onto the tangent space T_V O^{m×ℓ} at V is given by [1, (3.35)], [3, Corollary 1]

    Π_T(Z) := V (V^T Z − Z^T V)/2 + (I_m − VV^T)Z                                             (3.6a)
            = Z − V (V^T Z + Z^T V)/2 = Z − V sym(V^T Z) ∈ T_V O^{m×ℓ}.                       (3.6b)
3.1 First-order optimality condition

Using the projection defined by (3.6), we can find the gradient of f|_{O^{m×ℓ}}(V). In fact,

    ∂f(V)/∂V = 2 [ A/φ_B(V) − (φ_A(V)/[φ_B(V)]²) B + C ] V =: 2 E(V) V.                      (3.7)

Thus the gradient grad f|_{O^{m×ℓ}}(V) is given by (see e.g., [1, (3.37)])

    grad f|_{O^{m×ℓ}}(V) = Π_T(∂f(V)/∂V) = 2 { E(V)V − V [V^T E(V)V] } =: 2 { E(V)V − V M_V }.  (3.8)

The first-order optimality condition grad f|_{O^{m×ℓ}}(V) = 0 leads to

Theorem 3.1. If V ∈ O^{m×ℓ} is a local maximizer of (1.1), then

    E(V)V = V M_V,                                                                            (3.9)

where E(V) is defined in (3.7) and M_V := V^T E(V)V in (3.8). Therefore eig(M_V) ⊂ eig(E(V)), and V is an orthonormal eigenbasis of E(V) associated with its eigenvalues given by eig(M_V).
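To make the quantities above concrete, here is a minimal NumPy sketch (our own illustration under the paper's assumptions that A, B, C are symmetric with B positive definite; the function names are ours, not the paper's) of the objective (3.2), the matrix E(V) of (3.7), and the Riemannian gradient (3.8):

import numpy as np

def f_obj(V, A, B, C):
    # f(V) = phi_A(V)/phi_B(V) + phi_C(V), cf. (3.2).
    return np.trace(V.T @ A @ V) / np.trace(V.T @ B @ V) + np.trace(V.T @ C @ V)

def E_of_V(V, A, B, C):
    # E(V) = A/phi_B(V) - (phi_A(V)/phi_B(V)^2) B + C, cf. (3.7).
    phiA = np.trace(V.T @ A @ V)
    phiB = np.trace(V.T @ B @ V)
    return A / phiB - (phiA / phiB**2) * B + C

def riem_grad(V, A, B, C):
    # grad f|_O(V) = 2 [E(V)V - V (V^T E(V) V)], cf. (3.8); it vanishes at a KKT point (3.9).
    EV = E_of_V(V, A, B, C) @ V
    return 2.0 * (EV - V @ (V.T @ EV))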
Remark 3.1. Under the condition of Theorem 3.1, tr(M_V) = φ_C(V). Since f(VZ) ≡ f(V) for any orthogonal matrix Z ∈ R^{ℓ×ℓ}, if V is a local maximizer, so is VZ. That is, any local maximizer V gives rise to a set of local maximizers {VZ : Z^T Z = I_ℓ}, and they all produce the same objective value. Within the set, there is one V_0 that makes M_{V_0} diagonal:

    M_{V_0} = diag(λ_{π_1}(E(V_0)), ..., λ_{π_ℓ}(E(V_0))),   V_0 = [v_1, ..., v_ℓ],          (3.10)

where v_i ∈ R^m is the unit eigenvector corresponding to λ_{π_i}(E(V_0)), and

    λ_{π_1}(E(V_0)) ≥ ··· ≥ λ_{π_ℓ}(E(V_0)).

This observation is also reflected by the equation (3.9), which is equivalent to

    E(VZ) VZ = VZ M_{VZ}

for any given orthogonal matrix Z ∈ R^{ℓ×ℓ}.
3.2 Second-order optimality condition

Next we establish the second-order optimality condition for (1.1). Set

    g(V) := grad f|_{O^{m×ℓ}}(V).

The second-order optimality condition relies on the Riemannian Hessian operator. For f|_{O^{m×ℓ}}(V), the symmetric Riemannian Hessian operator at a point V ∈ O^{m×ℓ},

    hess f|_{O^{m×ℓ}}(V) : T_V O^{m×ℓ} → T_V O^{m×ℓ},   X ↦ ∇_X grad f|_{O^{m×ℓ}}(V),

involves the so-called affine connection ∇. If the natural Riemannian connection [1, section 5.5] is used as the affine connection, the action of the Riemannian Hessian of f|_{O^{m×ℓ}}(V) on a tangent "vector" X ∈ T_V O^{m×ℓ} can be expressed by [1, Definition 5.5.1 and Proposition 5.3.2]

    hess f|_{O^{m×ℓ}}(V)[X] = Π_T(Dg(V)[X]),                                                 (3.11)

where Dg(V)[X] represents the classical directional derivative at V ∈ O^{m×ℓ} along X. It is readily derived, by (3.8), that

    (1/2) Dg(V)[X] = E(V)X + {DE(V)[X]}V − X V^T E(V)V
                     − V X^T E(V)V − V V^T {DE(V)[X]}V − V V^T E(V)X
                   = E(V)X − V [X^T E(V)V + V^T E(V)X]
                     − X V^T E(V)V + (I_m − VV^T){DE(V)[X]}V,                                (3.12)
and

    DE(V)[X] = −(2/[tr(V^T BV)]²) [tr(X^T BV) A + tr(X^T AV) B]
               + (4 tr(V^T AV) tr(X^T BV)/[tr(V^T BV)]³) B
             =: G(V, X).                                                                      (3.13)
Thus, by (3.11) and by noting

    Π_T([I_m − VV^T]Z_1) = (I_m − VV^T)Z_1   for Z_1 ∈ R^{m×ℓ},
    Π_T(V Z_2) = 0                            for Z_2 = Z_2^T ∈ R^{ℓ×ℓ},

we have

    hess f|_{O^{m×ℓ}}(V)[X] = 2 [ E(V)X − V sym(V^T E(V)X) − X V^T E(V)V
                                + V sym(V^T X V^T E(V)V) + G(V,X)V − V V^T G(V,X)V ].        (3.14)

Throughout the rest of this section, V ∈ O^{m×ℓ} is assumed to be a critical point (aka a KKT point), i.e., satisfying (3.9). We shall investigate the second-order optimality condition at V. To this end, we need to evaluate

    hess f|_{O^{m×ℓ}}(V)[X, X] = <hess f|_{O^{m×ℓ}}(V)[X], X>   for X ∈ T_V O^{m×ℓ}.

Notice that for any symmetric matrix Z ∈ R^{ℓ×ℓ}, it follows from (3.5a) that

    <X, VZ> = tr(X^T VZ) = tr(Z V^T X) = −tr(Z X^T V) = −tr(X^T VZ) = −<X, VZ>,

implying <X, VZ> = 0. Therefore, from (3.14) we have

    <hess f|_{O^{m×ℓ}}(V)[X], X> = 2<X, E(V)X> − 2<X, X V^T E(V)V + G(V,X)V>,                (3.15)

and consequently, we can state the second-order optimality condition as follows.

Theorem 3.2. If V is a local maximizer of (1.1), then

    tr(X^T E(V)X) − tr(X M_V X^T) − tr(X^T G(V,X)V) ≤ 0   for any X ∈ T_V O^{m×ℓ},           (3.16)

where G(V,X) is defined in (3.13), and M_V = V^T E(V)V as in (3.8). On the other hand, if V ∈ O^{m×ℓ} satisfies (3.9) and (3.16) holds with strict inequality, then V is a strict local maximizer.

Proof. The second-order necessary condition for V to be a local maximizer is [18, 35]

    <X, hess f|_{O^{m×ℓ}}(V)[X]> ≤ 0   for any X ∈ T_V O^{m×ℓ},

which yields (3.16) because of (3.15). The second part is the second-order sufficient optimality condition [35].

The second-order optimality condition in Theorem 3.2 is presented in terms of the tangent vector X ∈ T_V O^{m×ℓ}. In the next theorem, we describe the optimality condition in terms of an arbitrary J ∈ R^{m×ℓ}.
Theorem 3.3. If V is a local maximizer of (1.1), then for all J ∈ R^{m×ℓ}

    tr(J^T E(V)J) + tr(V^T J M_V J^T V) − tr(J^T V M_V V^T J) − tr(J M_V J^T)
      + 4 tr(J^T [I_m − VV^T]BV) tr(J^T [I_m − VV^T]CV) / φ_B(V) ≤ 0.                        (3.17)

On the other hand, if V ∈ O^{m×ℓ} satisfies (3.9) and (3.17) is strict for any nonzero J ∈ R^{m×ℓ}, then V is a strict local maximizer.
Proof. By (3.5b), we know that
␣
(
TV Omˆℓ “ X “ V K ` pIm ´ V V J qJ : K “ ´K J P Rℓˆℓ , J P Rmˆℓ ,
i.e., X “ V K ` pIm ´ V V J qJ P TV Omˆℓ for any skew-symmetric K and arbitrary J P
Rmˆℓ . For such an X, we have
EpV qX “ EpV qV K ` EpV qJ ´ EpV qV V J J
“ V MV K ` EpV qJ ´ V MV V J J,
(3.18)
where we have used (3.9), and
trpX J EpV qXq “ trpK J MV Kq ` trpK J V J EpV qJq
´ trpK J MV V J Jq ` trpJ J rIm ´ V V J sEpV qJq
“ trpK J MV Kq ` trpJ J rIm ´ V V J sEpV qJq
“ trpK J MV Kq ` trpJ J EpV qJq ´ trpJ J V MV V J Jq.
It can be verified that X J X “ K J K ` J J pIm ´ V V J qJ. Thus
trpX J XMV q “ trpK J KMV q ` trpJ J JMV q ´ trpJ J V V J JMV q
“ trpKMV K J q ` trpJ J JMV q ´ trpV J JMV J J V q
“ trpK J MV Kq ` trpJ J JMV q ´ trpV J JMV J J V q,
where we have used K “ ´K J for the last equality. Hence,
trpX J EpV qXq ´ trpXMV X J q
“ trpJ J EpV qJq ` trpV J JMV J J V q ´ trpJ J V MV V J Jq ´ trpJMV J J q. (3.19)
Lastly, we compute trpX J GpV, XqV q as follows.
4ϕA pV qrtrpX J BV qs2 4 trpX J BV q trpX J AV q
´
rϕB pV qs3
ϕB pV q
J
4 trpX BV q
“
rϕA pV q trpX J BV q ´ ϕB pV q trpX J AV qs
rϕB pV qs3
trpX J GpV, XqV q “
4 trpX J BV q
xX, rϕA pV qB ´ ϕB pV qAsV y
rϕB pV qs3
4 trpX J BV q
“
xX, rϕB pV qs2 rC ´ EpV qsV y
rϕB pV qs3
4 trpX J BV q
“
xX, rC ´ EpV qsV y.
ϕB pV q
“
(3.20)
Note from (3.18) that
xX, EpV qV y “ xV, V MV K ` EpV qJ ´ V MV V J Jy
“ trpMV Kq ` trpJ J EpV qV q ´ trpMV V J Jq
“ trpJ J V MV q ´ trpMV V J Jq “ 0,
(3.21)
and
xX, CV y “ xV K ` pIm ´ V V J qJ, CV y “ trpJ J rIm ´ V V J sCV q,
xX, BV y “ xV K ` pIm ´ V V J qJ, BV y “ trpJ J rIm ´ V V J sBV q
which, together with (3.16), (3.19), (3.20) and (3.21), lead to (3.17). The second part
follows directly from Theorem 3.2.
4 A necessary condition for local maximizers

The purpose of this section is to establish a necessary condition for a local maximizer V of (1.1) in terms of the eigenvalues of E(V). The same will be done for a global maximizer in section 5.

Suppose V is a local maximizer of (1.1). According to Theorem 3.1 and Remark 3.1, eig(M_V) ⊂ eig(E(V)), i.e.,

    eig(M_V) = {λ_{π_i}(E(V)), i = 1, 2, ..., ℓ},                                            (4.1)

where 1 ≤ π_1 < π_2 < ··· < π_ℓ ≤ m.

Theorem 4.1. Let V ∈ O^{m×ℓ} be a local maximizer of (1.1), and denote eig(M_V) by (4.1). Then

    λ_{π_1}(E(V)) ≥ λ_{2ℓ}(E(V)).                                                            (4.2)
Proof. Assume, to the contrary, that (4.2) were false, i.e.,
λπ1 pEpV qq ă λ2ℓ pEpV qq.
(4.3)
Let J1 P Rmˆ2ℓ be an orthonormal eigenbasis of EpV q corresponding to its 2ℓ largest
eigenvalues λi pEpV qq for 1 ď i ď 2ℓ. Then J1J V “ 0 because V is the orthonormal
eigenbasis of EpV q corresponding to its eigenvalues λπi pEpV qq for 1 ď i ď ℓ, all of which
were assumed less than λ2ℓ pEpV qq by (4.3). For any Q P R2ℓˆℓ with orthonormal columns,
let J “ J1 Q P Rmˆℓ . Then J J J “ Iℓ and J J V “ QJ J1J V “ 0, and thus
trpJ J EpV qJq ą trpV J EpV qV q “ trpMV q “ trpJ J JMV q “ trpJMV J J q.
(4.4)
Therefore, it follows from (3.17) that
trpJ J EpV qJq ´ trpMV q ` 4
trpJ J BV q trpJ J CV q
ď 0.
ϕB pV q
(4.5)
On the other hand, we note that
x J q,
p Σ1 W
J J BV “ QJ pJ1J BV q “ QJ pU
p Σ1 W
x J is the SVD of J J BV and U
p P R2ℓˆℓ , Σ1 P Rℓˆℓ and W
x P Rℓˆℓ . Since Q
where U
1
p , Qs P R2ℓˆ2ℓ is orthogonal. We
is arbitrary so far, we now let Q be the one such that rU
have
xJ “ 0
p Σ1 W
J J BV “ QJ pJ1J BV q “ QJ U
which leads to
trpJ J EpV qJq ´ trpMV q ď 0,
contradicting (4.4). Therefore (4.3) cannot be true. That leaves (4.2).
Theorem 4.1 basically says that a local maximizer V can never be an orthonormal eigenbasis of E(V) associated with ℓ eigenvalues that are all less than λ_{2ℓ}(E(V)). For the special case ℓ = 1, Theorem 4.1 implies that any local maximizer v ∈ R^m can only be an eigenvector associated with the largest or the second largest eigenvalue of E(v). This is exactly what was proved in [36, Theorem 3.3], even though the matrix E(v) in [36] takes a different form. Theorem 4.1 can thus be considered a generalization of [36, Theorem 3.3] for the necessary condition of a local maximizer.
5 A necessary condition for global maximizers

In this section and the next, C is assumed to be positive definite. This assumption is not essential because one can always shift C to C + ξI_m for some ξ ∈ R to make C positive definite. The shift does not change the maximizer V.

Theorem 5.1. Suppose C is positive definite. If V is a global maximizer of (1.1), then it must be an orthonormal eigenbasis³ of E(V) corresponding to its ℓ largest eigenvalues λ_i(E(V)) for 1 ≤ i ≤ ℓ.

This result generalizes the necessary condition for the vector version [36, Theorem 3.5] of the problem (1.1), i.e., the case ℓ = 1. Its proof, similar to [36], hinges on the relationship between

    Δf(J, V) := f(J) − f(V)   and   ΔE(J, V) := tr(J^T E(V)J) − tr(V^T E(V)V).               (5.1)

This is established in the following lemma.

³ R(V) is not unique unless λ_ℓ(E(V)) > λ_{ℓ+1}(E(V)).
Lemma 5.1. For any J, V ∈ O^{m×ℓ}, we have

    Δf(J, V) = f(J) − f(V)
             = [ φ_B(V) ΔE(J, V) + (φ_B(J) − φ_B(V))(φ_C(J) − φ_C(V)) ] / φ_B(J).            (5.2)
Proof. Note that trpV J EpV qV q “ ϕC pV q and
ϕA pJq
ϕA pV qϕB pJq
` ϕC pJq ´ ϕC pV q
´
ϕB pV q
rϕB pV qs2
ˆ
˙
ϕB pJq ϕA pJq ϕA pV q
“
´
` ϕC pJq ´ ϕC pV q
ϕB pV q ϕB pJq ϕB pV q
‰
ϕB pJq “
“
∆f pJ, V q ´ ϕC pJq ` ϕC pV q ` ϕC pJq ´ ϕC pV q
ϕB pV q
ϕB pJq
ϕB pV q ´ ϕB pJq
“
∆f pJ, V q `
rϕC pJq ´ ϕC pV qs
ϕB pV q
ϕB pV q
∆EpJ, V q “
solving which for ∆f pJ, V q leads to (5.2).
Rewrite (5.2) as

    Δf(J, V) = f(J) − f(V) = [ φ_B(V) ΔE(J, V) + δ(J, V) ] / φ_B(J),                         (5.3)

where

    δ(J, V) := [φ_B(J) − φ_B(V)][φ_C(J) − φ_C(V)].                                           (5.4)
We are now ready to prove Theorem 5.1.
Proof of Theorem 5.1. We prove it by contradiction. Suppose V “ rvv 1 , . . . , v ℓ s is a global
maximizer but is not an orthonormal eigenbasis associated with the ℓ largest eigenvalues of
EpV q. We shall construct U P Omˆℓ such that f pU q ą f pV q, contradicting the assumption
that V is a global maximizer. So the conclusion of the theorem holds.
As we commented in Remark 3.1, we may assume, without loss of generality, that
EpV qvv i “ λπi pEpV qqvv i ,
i “ 1, 2, . . . , ℓ,
where pλπi pEpV qq, v i q is the πi th largest eigenpair of EpV q and
λπ1 pEpV qq ě λπ2 pEpV qq ě . . . ě λπℓ pEpV qq.
According to our assumption, we know that there exists at least one i such that
λπi pEpV qq ă λi pEpV qq.
Let p be the first such i, i.e.,

    p := min { i : λ_{π_i}(E(V)) < λ_i(E(V)), 1 ≤ i ≤ ℓ }.                                   (5.5)
There is a unit eigenvector u of EpV q such that4
u “ λp pEpV qqu
u,
EpV qu
and u Jv i “ 0 for i “ 1, 2, . . . , ℓ.
(5.6)
Now, for any α, β P R, define
u ` βvv p , v p`1 , . . . , v ℓ s
Uα,β “ rvv 1 , v 2 , . . . , v p´1 , αu
us e J
“ V ´ rp1
´ βqvv p ´ αu
p
looooooooomooooooooon
“: y
“V
´ yeJ
p
P Rmˆℓ ,
(5.7)
where e p is the pth column of Im . It can be verified that
Uα,β P Omˆℓ
for all α, β P R satisfying α2 ` β 2 “ 1,
and if also α ‰ 0,
J
∆EpUα,β , V q “ trpUα,β
EpV qUα,β q ´ trpMV q “ α2 rλp pEpV qq ´ λπp pEpV qqs ą 0.
(5.8)
The rest of the proof is to look for some α, β P R satisfying α2 ` β 2 “ 1 such
that f pUα,β q ą f pV q. To this end, we recall (5.1). So, equivalently, we need to make
∆f pUα,β , V q ą 0. By (5.3) and (5.8), it suffices to find α, β P R, α2 ` β 2 “ 1, α ‰ 0, and
δpUα,β , V q “ 0 which is equivalent to either ϕB pUα,β q “ ϕB pV q or ϕC pUα,β q “ ϕC pV q.
We have
J
Uα,β
V “ V J Uα,β “ Iℓ ´ p1 ´ βqeepe J
p,
J
trpUα,β
rIm
J
´ V V sBV q “
“
J
J
trpUα,β
BV q ´ ϕB pV q ` p1 ´ βq trpeepe J
p V BV
J
´ trpeep y J BV q ` p1 ´ βq trpeepe J
p V BV q
(5.9)
q
J
“ ´yy J BV e p ` p1 ´ βqeeJ
p V BV e p
v p ` αu
uJ Bvv p ` p1 ´ βqvv J
vp
“ ´p1 ´ βqvv J
p Bv
p Bv
uJ Bvv p ,
“ αu
(5.10)
and similarly,
J
uJ Cvv p .
trpUα,β
rIm ´ V V J sCV q “ αu
(5.11)
Since V is a global maximizer, we have, by (5.9), (5.10), (5.11), and by applying Theorem 3.2 to Uα,β ,
J
trpUα,β
EpV qUα,β q ´ trpMV q ` 4
uJ Bvv p qpu
uJ Cvv p q
α2 pu
ď 0.
ϕB pV q
(5.12)
This inequality, together with (5.8), imply that
uJ Bvv p qpu
uJ Cvv p q ă 0.
pu
4
Such u can be found even if λp “ λp´1 “ λπp´1 .
(5.13)
On the other hand, we have, by (5.7),
J
ϕB pUα,β q “ trpUα,β
BUα,β q
“ ϕB pV q ´ 2 trpeepy J BV q ` trpeepy J Byye J
pq
“ ϕB pV q ´ 2yy J Bvv p ` y J Byy
2 J
u Bu
u ` 2αβu
uJ Bvv p ` pβ 2 ´ 1qvv J
vp ,
“ ϕB pV q ` α
p Bv
loooooooooooooooooooooooooomoooooooooooooooooooooooooon
(5.14)
“: hB pα,βq
2 J
u ` 2αβu
uJ Cvv p ` pβ 2 ´ 1qvv J
vp .
ϕC pUα,β q “ ϕC pV q ` α
u Cu
p Cv
loooooooooooooooooooooooooomoooooooooooooooooooooooooon
(5.15)
“: hC pα,βq
Because V is a global maximizer, we have from (5.3) that
0 ě f pU1,0 q ´ f pV q “
ϕB pV q∆EpU1,0 , V q ` δpU1,0 , V q
.
ϕB pU1,0 q
(5.16)
J EpV qU q ´ trpM q ą 0, and therefore, (5.16) implies
By (5.8), ∆EpU1,0 , V q “ trpU1,0
1,0
V
0 ą δpU1,0 , V q “ rϕB pU1,0 q ´ ϕB pV qsrϕC pU1,0 q ´ ϕC pV qs
“ hB p1, 0q hC p1, 0q
uJ Bu
u ´ vJ
v p sru
uJ Cu
u ´ vJ
v p s.
“ ru
p Bv
p Cv
(5.17)
There are two possible cases implied by (5.17):
"
u ą vJ
v p and u J Cu
u ă vJ
vp,
Case 1 : u J Bu
p Bv
p Cv
J
J
J
u ă v p Bvv p and u Cu
u ą vJ
Case 2 : u Bu
Cv
p vp.
(5.18)
We will show that for each case, there exist real α ‰ 0 and real β ‰ 0 satisfying α2 `β 2 “ 1
and f pUα,β q ą f pV q. The solutions of either hB pα, βq “ 0 or hC pα, βq “ 0 turn out to be
ones we are looking for.
Suppose Case 1. Consider the equation hB pα, βq “ 0 which can be regarded as a
quadratic equation of α with a parameter β. Since
uJ Bvv p q2 ´ 4pβ 2 ´ 1qpu
uJ Bu
uqpvv J
v p q ą 0 if |β| ă 1,
4pβu
p Bv
there are two roots
uJ Bvv p ˘
´βu
αB,˘ pβq “
b
uJ Bvv p q2 ´ pβ 2 ´ 1qpu
uJ Bu
uqpvv J
vpq
pβu
p Bv
u
u J Bu
.
(5.19)
We claim that
2
ψB pβq :“ αB,`
pβq ` β 2 ´ 1 “ 0
has at least one nonzero root β̃ in p´1, 1q. To prove this claim, we note that
ψB p0q “
vp
vJ
p Bv
v p ).
u ą vJ
´ 1 ă 0 (because Case 1 implies u J Bu
p Bv
J
u
u Bu
15
(5.20)
On the other hand, from (5.13), we know u^T B v_p ≠ 0. Now

    ψ_B(−1) = 2 u^T B v_p / (u^T B u) > 0    if u^T B v_p > 0,
    ψ_B(+1) = −2 u^T B v_p / (u^T B u) > 0   if u^T B v_p < 0.

So there exists at least one β̃ in either (−1, 0) or (0, 1), depending on whether u^T B v_p > 0 or u^T B v_p < 0, such that ψ_B(β̃) = 0.
For Case 2, we consider the equation h_C(α, β) = 0. The same argument, with B changed to⁵ C, shows that there exists at least one β̃ in either (−1, 0) or (0, 1), depending on whether u^T C v_p > 0 or u^T C v_p < 0, such that ψ_C(β̃) = 0, where ψ_C(β) := α_{C,+}²(β) + β² − 1 and

    α_{C,+}(β) = [ −β u^T C v_p + sqrt( (β u^T C v_p)² − (β² − 1)(u^T C u)(v_p^T C v_p) ) ] / (u^T C u).   (5.21)
.
(5.21)
u
u J Cu
Let α̃ “ αB,` pβ̃q for Case 1 and α̃ “ αC,` pβ̃q for Case 2. This α̃ ‰ 0 and α̃2 ` β̃ 2 “ 1.
So ∆EpUα̃,β̃ , V q ą 0, and either ϕB pUα̃,β̃ q “ ϕB pV q for Case 1 or ϕC pUα̃,β̃ q “ ϕC pV q for
Case 2, i.e., always δpUα̃,β̃ , V q “ 0. By (5.3), ∆f pUα̃,β̃ , V q “ f pUα̃,β̃ q ´ f pV q ą 0, i.e.,
f pUα̃,β̃ q ą f pV q which is a contradiction since V is a global maximizer. The proof is
completed.
As an interesting application of Theorem 5.1, we can characterize a subset of the global maximizers in some special cases, including C = ηB for some η > 0.

Theorem 5.2. Suppose C is positive definite. Let V be a global maximizer of (1.1) and let E_1(E(V)) be the set of all orthonormal eigenbases corresponding to the ℓ largest eigenvalues of E(V). If

    δ(J, V) = [φ_B(J) − φ_B(V)][φ_C(J) − φ_C(V)] ≥ 0   for any J ∈ E_1(E(V)),                (5.22)

then any J ∈ E_1(E(V)) is a global maximizer of (1.1). Moreover, for any V̄ ∈ E_1(E(V)), we have either φ_B(V) = φ_B(V̄) or φ_C(V) = φ_C(V̄).
Proof. For any V̄ P E1 pEpV qq, we have ∆EpV̄ , V q “ 0 by (5.1). Use (5.22) to get
f pV̄ q ´ f pV q “
δpV̄ , V q
ϕB pV q∆EpV̄ , V q ` δpV̄ , V q
“
ě 0,
ϕB pV̄ q
ϕB pV̄ q
which implies V̄ is a global maximizer, and furthermore, δpV̄ , V q “ 0 implying either
ϕB pV q “ ϕB pV̄ q or ϕC pV q “ ϕC pV̄ q.
For the special case C = ηB (η > 0), δ(V̄, V) = η[φ_B(V) − φ_B(V̄)]² ≥ 0 for any V and V̄ of apt sizes. Therefore, by Theorem 5.2, if V is a global maximizer, then any V̄ ∈ E_1(E(V)) is also a global maximizer. Moreover, φ_B(V) = φ_B(V̄) and φ_C(V) = φ_C(V̄), which imply φ_A(V) = φ_A(V̄) as well.

⁵ This is where the positive definiteness of C is required.
6 A scheme to make progress at a KKT point

As mentioned at the beginning of section 5, C is assumed positive definite in this section, too.

In parallel to the vector version of the problem (1.1) [36, Section 3.3], we suggest a strategy for selecting the starting point for any iterative method that is capable of converging to KKT points monotonically. We argue that, under a mild condition (Assumption (6.2) below), as long as the necessary condition in Theorem 5.1 for a global maximizer is not satisfied, this strategy can be applied to return another approximation with a larger objective value.

Suppose V̄ is a KKT point, i.e., it satisfies (3.9), but the eigenvalues of

    M_{V̄} = V̄^T E(V̄) V̄

do not consist of the ℓ largest eigenvalues of E(V̄); then V̄ is not a global maximizer, according to Theorem 5.1. Such a V̄ could be obtained, for example, by an iterative procedure which is no longer able to make further progress and has to stop.

Owing to the constructive proof of Theorem 5.1 in the previous section, we can explain an idea to break out of the KKT point to a new point Ṽ_0 that satisfies f(Ṽ_0) > f(V̄). Thus the iterative procedure that reached V̄ in the first place can continue from the new point Ṽ_0 to the next KKT point.

Let the eigen-decomposition of M_{V̄} be

    M_{V̄} = U diag(λ_{π_1}(E(V̄)), ..., λ_{π_ℓ}(E(V̄))) U^T,                                  (6.1a)
    λ_{π_1}(E(V̄)) ≥ ··· ≥ λ_{π_ℓ}(E(V̄)).                                                    (6.1b)
Set V = V̄ U, and note E(V̄) = E(V). Define p and u by (5.5) and (5.6), respectively. According to the proof of Theorem 5.1, if

    0 ≤ δ(U_{1,0}, V) = (u^T B u − v_p^T B v_p)(u^T C u − v_p^T C v_p),

then f(U_{1,0}) > f(V) = f(V̄), where

    U_{1,0} = [v_1, ..., v_{p−1}, u, v_{p+1}, ..., v_ℓ].

So we may simply take Ṽ_0 = U_{1,0}.

Otherwise, suppose δ(U_{1,0}, V) < 0. This is the situation where we need the following assumption,⁶ which should often hold:

    Assumption: (u^T B v_p)(u^T C v_p) ≠ 0.                                                  (6.2)

⁶ Recall (5.13) in the proof of Theorem 5.1: although (u^T B v_p)(u^T C v_p) < 0 there, only its implication that both u^T B v_p and u^T C v_p are nonzero was used later in the proof.

That δ(U_{1,0}, V) < 0 results in two mutually exclusive cases given in (5.18):
• For Case 1, solve ψ_B(β̃) := α_{B,+}²(β̃) + β̃² − 1 = 0 for

      β̃ ∈ (0, 1)    if u^T B v_p < 0,
      β̃ ∈ (−1, 0)   if u^T B v_p > 0,                                                       (6.3)

  where α_{B,+}(β) is given by (5.19);

• For Case 2, solve ψ_C(β̃) := α_{C,+}²(β̃) + β̃² − 1 = 0 for

      β̃ ∈ (0, 1)    if u^T C v_p < 0,
      β̃ ∈ (−1, 0)   if u^T C v_p > 0,                                                       (6.4)

  where α_{C,+}(β) is given by (5.21).
Finally, let α̃ = α_{B,+}(β̃) for Case 1 and α̃ = α_{C,+}(β̃) for Case 2, respectively, and take Ṽ_0 = U_{α̃,β̃}, where U_{α,β} is defined by (5.7). By the proof of Theorem 5.1,

    f(Ṽ_0) = f(U_{α̃,β̃}) > f(V) = f(V̄).

Conceivably, we might instead seek Ṽ_0 by maximizing f(U_{α,β}) over α, β ∈ R subject to α² + β² = 1, since a Ṽ_0 with a larger objective value than that of the U_{α̃,β̃} just defined could then be obtained. It turns out, however, that maximizing f(U_{α,β}) is not trivial and may have to be solved iteratively, too.
The above construction requires (6.2). Can it fail? The following proposition says it cannot if V̄ is a local but non-global maximizer.

Proposition 6.1. Suppose C is positive definite, and let V̄ be a local but non-global maximizer of (1.1). Let the eigen-decomposition of M_{V̄} be given by (6.1), and set

    V = V̄ U = [v_1, ..., v_ℓ],   p := min { i : λ_{π_i}(E(V̄)) < λ_i(E(V̄)), 1 ≤ i ≤ ℓ },

and let u be a unit vector with

    E(V̄) u = λ_p(E(V̄)) u   and   V̄^T u = 0.

Then (u^T B v_p)(u^T C v_p) < 0. As a result, (6.2) holds.
Proof. By Theorem 5.1, we know that 1 ď p ď ℓ. For any two α, β P R with α2 ` β 2 “ 1,
let Uα,β P Omˆℓ be as defined in (5.7). Since V is also a local but non-global maximizer,
by (5.9), (5.10) and (5.11) and by applying Theorem 3.2 to Uα,β , we have the inequality
J EpV qU
(5.12). By (5.8), trpUα,β
α,β q ´ trpMV q ą 0 for 0 ă |α| ă 1. Now use (5.12) to
J
J
u Bvv p qpu
u Cvv p q ă 0.
conclude pu
In general, for a KKT point that is not a local maximizer, we do not know whether the assumption (6.2) fails or not. But we propose the following workaround. If (u^T B v_p)(u^T C v_p) = 0 and p < ℓ, we simply let p := p + 1 and check whether (u^T B v_p)(u^T C v_p) is nonzero. This process repeats until (u^T B v_p)(u^T C v_p) ≠ 0 is encountered, or until p = ℓ but still (u^T B v_p)(u^T C v_p) = 0, the worst case, for which the workaround fails, too. But that should be rare. The complete algorithm, including this workaround, is summarized in Algorithm 6.1.
Algorithm 6.1 Breaking out of a KKT point

Given a KKT point V̄ which is not an orthonormal eigenbasis corresponding to the ℓ largest eigenvalues of E(V̄), this algorithm computes Ṽ_0 ∈ O^{m×ℓ} such that f(Ṽ_0) ≥ f(V̄).

 1: compute the eigen-decomposition of M_{V̄} as in (6.1), and let V = V̄ U;
 2: compute p and u by (5.5) and (5.6), respectively;
 3: while (u^T B v_p)(u^T C v_p) = 0 and p ≤ ℓ do
 4:    p := p + 1;
 5: end while
 6: if p ≤ ℓ then
 7:    if 0 ≤ δ(U_{1,0}, V) then
 8:       Ṽ_0 = U_{1,0};
 9:    else
10:       solve (6.3) in Case 1 or (6.4) in Case 2 for β̃;
11:       α̃ = α_{B,+}(β̃) for Case 1 or α̃ = α_{C,+}(β̃) for Case 2;
12:       Ṽ_0 = U_{α̃,β̃};
13:    end if
14: else
15:    Ṽ_0 = V̄;   /* no improvement is made */
16: end if
The above strategy requires solving either ψ_B(β) = 0 or ψ_C(β) = 0. Both equations can be turned into quadratic equations in β² of the form

    α_4 β⁴ + α_2 β² + α_0 = 0,                                                               (6.5)

where, for ψ_X(β) = 0 with X = B or C, upon setting

    x_11 = u^T X u,   x_22 = v_p^T X v_p,   x_12 = u^T X v_p,   η = x_12² − x_11 x_22,

the coefficients of (6.5) are given by

    α_4 = 4 x_12² η − (x_11² + x_12² + η)²,
    α_2 = 4 x_11 x_12² x_22 + 2 x_11 (x_11 − x_22)(x_11² + x_12² + η),
    α_0 = −x_11² (x_11 − x_22)².

Depending on the case, one can solve (6.5) for the right β. Note that α_2² − 4 α_0 α_4 = 16 x_11⁴ x_12⁴ ≥ 0, so the roots in β² are always real. Alternatively, one can certainly employ an iterative method such as bisection or the Newton iteration to solve ψ_X(β) = 0 for β directly.
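As an illustration of this root-finding step (our own NumPy sketch with hypothetical function names, not part of Algorithm 6.1 itself), one may solve (6.5) and then discard the spurious roots introduced by squaring by checking ψ_X(β) = 0 directly:

import numpy as np

def alpha_plus(beta, x11, x22, x12):
    # alpha_{X,+}(beta) as in (5.19)/(5.21), with x11 = u^T X u, x22 = v_p^T X v_p, x12 = u^T X v_p.
    d = (beta * x12) ** 2 - (beta ** 2 - 1.0) * x11 * x22
    return (-beta * x12 + np.sqrt(d)) / x11

def break_out_root(x11, x22, x12):
    # Solve the quadratic (6.5) in t = beta^2 and return (alpha, beta) with alpha^2 + beta^2 = 1.
    eta = x12 ** 2 - x11 * x22
    P = x11 ** 2 + x12 ** 2 + eta
    a4 = 4.0 * x12 ** 2 * eta - P ** 2
    a2 = 4.0 * x11 * x12 ** 2 * x22 + 2.0 * x11 * (x11 - x22) * P
    a0 = -x11 ** 2 * (x11 - x22) ** 2
    ts = np.roots([a4, a2, a0])
    ts = ts[np.isreal(ts) & (ts.real > 0.0) & (ts.real < 1.0)].real
    sign = 1.0 if x12 < 0.0 else -1.0                  # beta in (0,1) or (-1,0), cf. (6.3)/(6.4)
    for t in ts:
        beta = sign * np.sqrt(t)
        alpha = alpha_plus(beta, x11, x22, x12)
        if abs(alpha ** 2 + beta ** 2 - 1.0) < 1e-10:  # keep only genuine roots of psi_X
            return alpha, beta
    return None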
Algorithm 7.1 A Self-Consistent-Field iteration

Given V_0 ∈ O^{m×ℓ} and a tolerance tol, this algorithm computes an approximate maximizer for the optimization problem (1.1).

 1: for k = 0, 1, ... do
 2:    compute an orthonormal eigenbasis V_{k+1} of E(V_k) associated with its ℓ largest eigenvalues;
 3:    if r_k := ||E(V_{k+1})V_{k+1} − V_{k+1} M_{V_{k+1}}||_2 ≤ tol then
 4:       BREAK;
 5:    end if
 6: end for
 7: return V_{k+1} as an approximate maximizer.
7 Self-consistent-field iteration

7.1 Self-Consistent-Field (SCF) Iteration

Theorem 5.1, on one hand, sheds some light on the optimization problem (1.1) by connecting it to a special nonlinear extreme eigenvalue problem

    E(V)V = V M_V,   eig(M_V) = {λ_i(E(V)), i = 1, 2, ..., ℓ}.                               (7.1)

On the other hand, it naturally lends itself to Algorithm 7.1, a Self-Consistent-Field (SCF) iterative method for solving (1.1). Several remarks are in order for the algorithm.

1. The SCF iteration is currently one of the most widely used algorithms for solving the Kohn-Sham equations in electronic structure calculations (see e.g., [14, 23, 28, 29, 32, 34]). In dimension reduction and feature extraction, the effective algorithm discussed in [17, 30, 37, 38] is also an SCF iteration, for the trace quotient (or trace ratio) optimization problem.

2. The quantity r_k defined at Line 3 serves as the residual of the approximation V_{k+1} for the nonlinear eigenvalue problem (7.1).

3. If the sequence {V_k} converges to V̄, then V̄ is not only a KKT point of (1.1) but also satisfies the necessary condition in Theorem 5.1 for a global maximizer. This is one of the major advantages of the SCF iteration over the optimization-based methods in [1, 7, 18], which are primarily concerned with the monotonic change of the objective value and converge to a KKT point that may or may not satisfy the necessary condition in Theorem 5.1. Because of this, conceivably the SCF iteration is more likely to reach a global maximizer than an optimization-based method.
4. The major computational cost of Algorithm 7.1 lies at Line 2, where a dominant orthonormal eigenbasis of size m × ℓ for E(V_k) has to be computed at every iteration. One may employ any state-of-the-art eigensolver, for example, the QR algorithm [5, 9] if m is modest. But the QR algorithm requires O(m³) flops and O(m²) storage, and is thus impractical for large m. In the latter case, an iterative method should be used, for example, the Lanczos method [19, 21], the Jacobi-Davidson iteration [25], or a conjugate gradient type method [12], to name a few.
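For concreteness, a minimal dense-matrix sketch of Algorithm 7.1 (our own illustration, not the authors' code; E_of_V denotes the helper sketched after Theorem 3.1) reads:

import numpy as np

def scf(A, B, C, V0, tol=1e-10, maxit=200):
    # Self-Consistent-Field iteration: V_{k+1} spans the dominant l-dimensional eigenspace of E(V_k).
    V, ell = V0, V0.shape[1]
    r = np.inf
    for _ in range(maxit):
        _, U = np.linalg.eigh(E_of_V(V, A, B, C))     # eigenvalues returned in ascending order
        V = U[:, -ell:]                               # orthonormal eigenbasis of the l largest eigenvalues
        E_new = E_of_V(V, A, B, C)
        r = np.linalg.norm(E_new @ V - V @ (V.T @ E_new @ V), 2)   # residual r_k of (7.1)
        if r <= tol:
            break
    return V, r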
7.2 A divergent example for SCF iteration

Originally, the SCF iteration referred to the method for solving the Kohn-Sham equation in electronic structure calculations [22, 33]. The Kohn-Sham equation is a nonlinear eigenvalue problem in PDE form which, after a certain discretization, becomes a nonlinear matrix eigenvalue problem whose matrix depends on the eigenvectors to be computed, as opposed to the other type of nonlinear matrix eigenvalue problems whose matrices depend on the eigenvalues to be computed. This distinguishing feature of dependency on the eigenvectors is shared by the current nonlinear eigenvalue problem (7.1), making the SCF iteration a natural method to use.

While the SCF iteration is simple to implement, its convergence in general is not well understood and depends closely on the targeted nonlinear eigenvalue problem. For the trace ratio optimization problem (the case C = 0) in linear discriminant analysis (LDA) [37], it is shown that SCF converges globally to a global maximizer regardless of the initial guess V_0, and the convergence is locally quadratic under a generic condition. However, in [32] an artificial nonlinear eigenvalue problem is constructed to show that the SCF iteration may generate a sequence of approximate solutions that contains two convergent subsequences alternating with each other, neither of which converges to the solution of the nonlinear eigenvalue problem. For our problem (7.1), in general SCF can produce sequences with no subsequence converging to a KKT point, as the following example illustrates.
Example 7.1. Let m = 3, ℓ = 2 and

    A = [ 11  5  8        B = [ 7  7  7        C = [ 15 10  9
           5 10  9              7 10  8              10  7  6
           8  9  5 ],           7  8  8 ],            9  6  6 ].

With V_0 = [e_1, e_2], the sequences {f(V_k)} and {r_k} generated by the SCF iteration oscillate and diverge (see Figure 7.1). We observe that there are two convergent subsequences of {f(V_k)} (see Figure 7.1), but neither approaches a solution to (1.1). This example rules out the possibility of establishing a global convergence analysis for the general case based on the monotonic increase of the objective values.
There are some strategies to curb the oscillation behavior of SCF, for example, a trust-region SCF iteration, which has shown some effectiveness in the case of the Kohn-Sham equation [23, 29, 34]; we will not pursue this here but leave it to future work.

Although in general there is not much we can say about SCF's global convergence behavior, for certain special cases we do know more. One particular example is the case C = 0 investigated in [37]. In the next subsection, we add another example to the list of special cases by establishing a global convergence result for the case C = ηB (η > 0).
[Figure 7.1: f(V_k) (left) and r_k (right) generated by SCF with V_0 = [e_1, e_2] for Example 7.1.]
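For the record, a few lines (again only an illustrative sketch, reusing the scf and E_of_V helpers introduced above) reproduce the oscillation reported in Example 7.1 and Figure 7.1:

import numpy as np

A = np.array([[11., 5., 8.], [5., 10., 9.], [8., 9., 5.]])
B = np.array([[ 7., 7., 7.], [7., 10., 8.], [7., 8., 8.]])
C = np.array([[15., 10., 9.], [10., 7., 6.], [9., 6., 6.]])
V0 = np.eye(3)[:, :2]                           # V_0 = [e_1, e_2]
V, r = scf(A, B, C, V0, tol=1e-10, maxit=30)
# f(V_k) and r_k keep oscillating between two accumulation points instead of settling down.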
7.3 Global convergence for the case C = ηB (η > 0)

Lemma 7.1. Suppose V, V̄ ∈ O^{m×ℓ}. Then there exists an orthogonal matrix Z ∈ R^{ℓ×ℓ} such that

    |||sin Θ(V, V̄)||| ≤ |||V Z − V̄||| ≤ √2 |||sin Θ(V, V̄)|||,                               (7.2)

for any unitarily invariant norm |||·|||.
Proof. Suppose for the moment that 2ℓ ď m. There exist orthogonal matrices Q P Rmˆm ,
Zi P Rℓˆℓ such that [27, p.40]
ℓ
ℓ
QV Z1 “
ℓ
m´2ℓ
»
ℓ
fi
I
– 0 fl,
0
ℓ
QV̄ Z2 “
ℓ
m´2ℓ
»
fi
Γ
– Σ fl,
0
where
Γ “ diagpγ1 , . . . , γℓ q,
γ1 ě ¨ ¨ ¨ ě γℓ ě 0,
Σ “ diagpσ1 , . . . , σℓ q,
0 ď σ1 ď ¨ ¨ ¨ ď σℓ ,
γi “ cos θi pV, V̄ q, σi “ sin θi pV, V̄ q, for 1 ď i ď ℓ.
Take Z “ Z1 Z2J . The singular values of V Z ´ V̄ are
b
θi
p1 ´ cos θi q2 ` sin2 θi “ 2 sin for 1 ď i ď ℓ,
2
where θi “ θi pV, V̄ q. The inequalities in (7.2) are now simple consequences, upon noticing
sin θ ď 2 sin
?
θ
sin θ
“
ď 2 sin θ
2
cos θ{2
for 0 ď θ ď
π
.
2
The case for 2ℓ ą m can be dealt with in the same way, except using [27, p.41]
m´ℓ
m´ℓ
QV Z1 “
2ℓ´m
m´ℓ
»
I
– 0
0
m´ℓ
2ℓ´m
0
I
0
fi
fl,
m´ℓ
QV̄ Z2 “
2ℓ´m
m´ℓ
22
»
Γ
– 0
Σ
2ℓ´m
0
I
0
fi
fl,
where
Γ “ diagpγ1 , . . . , γm´ℓ q,
γ1 ě ¨ ¨ ¨ ě γm´ℓ ě 0,
Σ “ diagpσ1 , . . . , σm´ℓ q,
0 ď σ1 ď ¨ ¨ ¨ ď σm´ℓ ,
γi “ cos θ2ℓ´m`i pV, V̄ q, σi “ sin θ2ℓ´m`i pV, V̄ q, for 1 ď i ď m ´ ℓ,
and θi pV, V̄ q “ 0 for 1 ď i ď 2ℓ ´ m.
The inequality (7.3) below can also be deduced from [13, Theorem 2.1], but since (7.3) is much simpler to prove, a short proof is given for self-containedness.

Lemma 7.2. Let H ∈ R^{m×m} be a symmetric matrix. For V, V̄ ∈ O^{m×ℓ}, we have

    |tr(V̄^T H V̄) − tr(V^T H V)| ≤ 2√2 ||H||_2 ε̃ ≤ 2√2 ℓ ||H||_2 ε,                          (7.3)

where

    ε̃ = ||sin Θ(V, V̄)||_tr,   ε = ||sin Θ(V, V̄)||_2.                                       (7.4)

In particular, for the function f defined by (3.2),

    lim_{||sin Θ(V̄, V)||_2 → 0} f(V) = f(V̄).                                                (7.5)
Proof. Let Z be the one in Lemma 7.1, and ∆V “ V̄ ´ V Z. We have
trpV̄ J H V̄ q ´ trpV J HV q “ trpV̄ J H V̄ q ´ trpZ J V J HV Zq
“ trpV̄ J H V̄ q ´ trpV̄ J HV Zq ` trpV̄ J HV Zq ´ trpZ J V J HV Zq
“ trpV̄ J H∆V q ` trpr∆V sJ HV Zq.
Therefore use (2.4) and (2.5) to get
| trpV̄ J H V̄ q ´ trpV J HV q| ď }V̄ J H∆V }tr ` }r∆V sJ HV Z}tr
ď 2}H}2 }∆V }tr
which is the first inequality in (7.3). The second inequality there holds because ϵ̃ ď ℓϵ.
Theorem 7.1. Suppose A ∈ R^{m×m} is symmetric and C = ηB ∈ R^{m×m} (η > 0) is symmetric and positive definite. Let {V_k} be the sequence generated by Algorithm 7.1 with an arbitrarily given V_0 ∈ O^{m×ℓ}, and let V_k = R(V_k), the column space of V_k. Then the following statements hold.

1. {V_k} has a convergent subsequence {V_{k_j}} in the sense that there is an ℓ-dimensional subspace V̄ ⊂ R^m such that

       lim_{j→∞} ||sin Θ(V_{k_j}, V̄)||_2 = 0.                                               (7.6)

2. The sequence {f(V_k)} monotonically increases and converges to f(V̄), where V̄ is an orthonormal basis matrix of V̄.

3. V̄ is an invariant subspace of E(V̄) corresponding to its ℓ largest eigenvalues. In other words, the orthonormal basis matrix of any limit point of {V_k} satisfies the necessary condition given in Theorem 5.1.

4. If λ_ℓ(E(V̄)) > λ_{ℓ+1}(E(V̄)), i.e., the dominant ℓ-dimensional eigenspace of E(V̄) is unique, then for any positive integer q, we have

       lim_{j→∞} ||sin Θ(V_{k_j ± q}, V̄)||_2 = 0.                                           (7.7)

   As an interesting consequence, if max |k_{j+1} − k_j| over all j ≥ 1 is finite, then ||sin Θ(V_k, V̄)||_2 → 0 as k → ∞.
Proof. All subspaces of dimension ℓ in Rm form a Grassmann manifold which is compact
[15, p.57] with the metric given by } sin Θp ¨ , ¨ q}2 . tVk u is a sequence on the manifold
and thus has a convergent subsequence tVkj u that converges to an ℓ-dimensional subspace
V̄ Ă Rm in the sense of (7.6). This is item 1.
Before we prove item 2, we notice that C “ ηB leads to
MV “ V J EpV qV “ V J CV “ ηV J BV
for any V P Omˆℓ .
(7.8)
Now we have from (5.3)
∆fk : “ f pVk`1 q ´ f pVk q
“
‰
2
J EpV qV
ϕB pVk q trpVk`1
k k`1 q ´ ηϕB pVk q ` η rϕB pVk`1 q ´ ϕB pVk qs
“
.
ϕB pVk`1 q
(7.9)
Because Vk`1 is an orthonormal eigenbasis of EpVk q corresponding to its ℓ largest eigenvalues, implying
J
trpVk`1
EpVk qVk`1 q ě trpVkJ EpVk qVk q “ ηϕB pVk q
by (7.8), it follows that ∆fk ě 0. Thus tf pVk qu is nondecreasing and convergent since
t|f pVk q|u is bounded. Since } sin ΘpVkj , V̄ q}2 Ñ 0 as j Ñ 8, by (7.5) we conclude that
lim f pVk q “ lim f pVkj q “ f pV̄ q.
jÑ8
kÑ8
This proves item 2.
Making use of ∆fk Ñ 0, we conclude from (7.9)
J
lim rtrpVk`1
EpVk qVk`1 q ´ ηϕB pVk qs “ 0,
(7.10)
lim rϕB pVk`1 q ´ ϕB pVk qs “ 0.
(7.11)
kÑ8
kÑ8
Again since } sin ΘpVkj , V̄ q}2 Ñ 0 as j Ñ 8, by (7.5) and (7.11) we have
lim ϕB pVkj `1 q “ lim ϕB pVkj q “ ϕB pV̄ q,
jÑ8
jÑ8
(7.12)
lim ϕA pVkj q “ ϕA pV̄ q.
jÑ8
(7.13)
Combine (7.10) with (7.12) to get
lim trpVkJj `1 EpVkj qVkj `1 q “ η lim ϕB pVkj q “ ηϕB pV̄ q.
jÑ8
jÑ8
(7.14)
According to the SCF iteration, we have
Ăk`1
EpVk qVk`1 “ Vk`1 M
with
Ăk`1 “ V J EpVk qVk`1 .
M
k`1
Notice that
∆Ek : “ EpVk q ´ EpV̄ q
ˆ
˙
ˆ
˙
1
1
ϕA pV̄ q
ϕA pVk q
“
´
A`
´
B.
ϕB pVk q ϕB pV̄ q
rϕB pV̄ qs2 rϕB pVk qs2
Use (7.12) and (7.13) to see
lim ∆Ekj “ 0.
(7.15)
jÑ8
Therefore, by the continuity of eigenvalues with respect to matrix entries [27], we know
Ăk `1 q “ lim
lim trpM
j
jÑ8
jÑ8
ℓ
ÿ
λs pEpVkj qq “
ℓ
ÿ
λs pEpV̄ qq.
(7.16)
s“1
s“1
Together, (7.14) and (7.16) yield
Ăk `1 q “
lim trpM
j
jÑ8
ℓ
ÿ
λs pEpV̄ qq “ ηϕB pV̄ q.
(7.17)
s“1
This shows that ηϕB pV̄ q is the sum of the ℓ largest eigenvalues of EpV̄ q. Therefore by
(7.8),
ℓ
ÿ
trpV̄ J EpV̄ qV̄ q “ ηϕB pV̄ q “
λs pEpV̄ qq.
s“1
So V̄ must be an orthonormal eigenbasis of EpV̄ q corresponding to its ℓ largest eigenvalues.
This proves item 3.
We prove item 4 by induction on q. The assumption that λℓ pEpV̄ qq ą λℓ`1 pEpV̄ qq
implies that the eigenspace of EpV̄ q associated with its first ℓ largest eigenvalues is unique,
i.e., V̄ is unique even though its orthonormal basis matrix V̄ is not. EpVkj q Ñ EpV̄ q as
j Ñ 8 according to (7.15) and thus for sufficiently large j, λℓ pEpVkj qq ą λℓ`1 pEpVkj qq
and the eigenspace Vkj `1 of EpVkj q associated with its first ℓ largest eigenvalues is unique
and converges to V̄ by Davis-Kahan sin 2θ theorem [4], i.e., (7.7) holds for q “ 1. Suppose
now (7.7) holds for q, and we need to show it is also true for q ` 1. To see this, we simply
rename kj ` q to kj and apply the argument we just did.
7.4 Local convergence for the general case

We now investigate the local convergence behavior of the SCF iteration for the general case. Throughout this subsection,

    ε̃ = ||sin Θ(V, V̄)||_tr,   ε = ||sin Θ(V, V̄)||_2,                                       (7.4)

where V, V̄ ∈ O^{m×ℓ}; then ε ≤ ε̃ ≤ ℓε. R_A and R_B are as defined by (7.19) below with H ∈ {A, B}.
Lemma 7.3 is a refinement of Lemma 7.2 when R(V̄) is an approximate invariant subspace of H.

Lemma 7.3. Let H ∈ R^{m×m} be a symmetric matrix. For V, V̄ ∈ O^{m×ℓ}, we have

    |φ_H(V̄) − φ_H(V)| ≤ (2√2 ||R_H||_2 + 4||H||_2 ε) ε̃,                                    (7.18)

where φ_H(V) := tr(V^T H V), and

    R_H = H V̄ − V̄ H_1,   H_1 := V̄^T H V̄.                                                  (7.19)
Proof. Let Z P Rℓˆℓ be the one in Lemma 7.1, and ∆V “ V̄ ´ V Z. We have
|ϕH pV̄ q ´ ϕH pV q| “ |ϕH pV̄ q ´ ϕH pV Zq|
ď | trp∆V J H V̄ q ` trpV̄ J H∆V q| ` | trp∆V J H∆V q|.
(7.20)
Use H V̄ “ RH ` V̄ H1 to get
trp∆V J H V̄ q ` trpV̄ J H∆V q
J
“ trp∆V J RH q ` trp∆V J V̄ H1 q ` trpRH
∆V q ` trpH1 V̄ J ∆V q
“ 2 trp∆V J RH q ` trp∆V J V̄ H1 q ` trpV̄ J ∆V H1 q
“ 2 trp∆V J RH q ` trpr∆V J V̄ ` V̄ J ∆V sH1 q
“ 2 trp∆V J RH q ` trp∆V J ∆V H1 q,
(7.21)
where we have used ∆V J V̄ ` V̄ J ∆V “ ∆V J ∆V which can be verified by substituting in
∆V “ V̄ ´ V Z. Combine (7.21) with (7.20) to get
|ϕH pV̄ q ´ ϕH pV q| ď 2| trp∆V J RH q| ` | trp∆V J ∆V H1 q| ` | trp∆V J H∆V q|
ď 2}∆V J RH }tr ` }∆V J ∆V H1 }tr ` }∆V J H∆V }tr
(by (2.5))
ď 2}∆V }tr }RH }2 ` }∆V }2 }∆V }tr p}H1 }2 ` }H}2 q,
and then use Lemma 7.1 and }H1 }2 ď }H}2 to conclude (7.18).
Note that R_H = 0 is equivalent to ||sin Θ(V̄, H V̄)||_2 = 0. To establish the main theorem of this subsection, Theorem 7.2, we additionally introduce the following notation:

    ω_B := Σ_{j=m−ℓ+1}^{m} λ_j(B),   Ω_B := Σ_{j=1}^{ℓ} λ_j(B).                              (7.22)

Recall that B is positive definite but A may be indefinite. We have, for any V ∈ O^{m×ℓ},

    0 < ω_B ≤ φ_B(V) ≤ Ω_B   and   |φ_A(V)| ≤ Ω_A := max{ |Σ_{j=1}^{ℓ} λ_j(A)|, |Σ_{j=m−ℓ+1}^{m} λ_j(A)| }.   (7.23)
Lemma 7.4. For V, V̄ P Omˆℓ , we have
ˇ
ˇ
ˇ 1
2ℓ}B}2
1 ˇˇ 2}B}2
ˇ
ˇ ϕB pV q ´ ϕ pV̄ q ˇ ď ω 2 ϵ̃ ď ω 2 ϵ,
B
B
“?B
‰
“?
‰
ˇ
ˇ
ˇ 1
ˇ 2 2 }RB }2 ` 2}B}2 ϵ
2ℓ 2 }RB }2 ` 2}B}2 ϵ
1
ˇ
ˇ
ϵ̃ ď
ϵ,
ˇ ϕB pV q ´ ϕ pV̄ q ˇ ď
ω2
ω2
B
B
(7.24a)
(7.24b)
B
and
ˇ
ˇ
ˇ ϕA pV̄ q
ϕA pV q ˇˇ
ˇ
ˇ rϕ pV̄ qs2 ´ rϕB pV qs2 ˇ ď
B
ˇ
ˇ
ˇ ϕA pV̄ q
ϕA pV q ˇˇ
ˇ
ˇ rϕ pV̄ qs2 ´ rϕB pV qs2 ˇ ď
B
‰
2 “
2
2ΩA ΩB }B}2 ` ΩB
}A}2 ϵ̃,
4
ωB
2 ” ?
2
4 2 2pΩB }RA }2 ` 2ΩA ΩB }RB }2 q
ωB
ı
2
`2pΩB
}A}2 ` 2ΩA ΩB }B}2 qϵ ϵ̃.
(7.25a)
(7.25b)
Proof. By Lemmas 7.2 and 7.3, we can write for H P tA, Bu
ϕH pV q “ ϕH pV̄ q ` ϵH ,
(7.26a)
|ϵH | ď 2}H}2 ϵ̃ ď 2ℓ}H}2 ϵ,
?
?
|ϵH | ď p2 2 }RH }2 ` 4}H}2 ϵqϵ̃ ď ℓp2 2 }RH }2 ` 4}H}2 ϵqϵ.
(7.26b)
(7.26c)
Therefore
ˇ
ˇ ˇ
ˇ
ˇ 1
ˇ ˇ
ˇ
1
´ϵ
B
ˇ
ˇ ˇ
ˇ
ˇ ϕB pV q ´ ϕ pV̄ q ˇ “ ˇ ϕ pV qϕ pV̄ q ˇ ,
B
B
B
ˇ
ˇ ˇ
ˇ
ˇ ϕA pV̄ q
ϕA pV q ˇˇ ˇˇ rϕA pV̄ q ´ ϕA pV qsrϕB pV qs2 ` ϕA pV qtrϕB pV qs2 ´ rϕB pV̄ qs2 u ˇˇ
ˇ
ˇ
ˇ rϕ pV̄ qs2 ´ rϕB pV qs2 ˇ “ ˇ
rϕB pV qϕB pV̄ qs2
B
Ω 2 |ϵA | ` 2ΩA ΩB |ϵB |
ď B
.
4
ωB
Now use (7.26b) and (7.26c) to bound ϵB to get (7.24a) and (7.24b), respectively; use
(7.26b) to bound ϵA and ϵB to get (7.25a); use (7.26c) to bound ϵA and ϵB to get (7.25b).
2 and ω 4
Remark 7.1. In this lemma and many expressions in what follows, we find ωB
B
in the denominators of various fractions, as the results of bounding ϕB pV qϕB pV̄ q and
rϕB pV qϕB pV̄ qs2 from below, respectively, simply by using (7.23). In cases where ωB is
rather tiny relative to ΩB , it may make various upper bound estimates unnecessarily too
big. A better option may be to use
ϕB pV qϕB pV̄ q “ ϕB pV̄ qrϕB pV̄ q ` ϵB s
followed by bounding ϵB . We omit the detail.
Lemma 7.5. For V, V̄ ∈ O^{m×ℓ}, we have

    ||E(V̄) − E(V)||_2 ≤ (χ_1 + χ_2)||B||_2 ε =: χ ε,                                        (7.27)

where

    χ_1 = 2ℓ ||A||_2 / ω_B²   and   χ_2 = 2ℓ Ω_B [2Ω_A ||B||_2 + Ω_B ||A||_2] / ω_B⁴.        (7.28)

Proof. It can be verified that

    ΔE := E(V) − E(V̄)
        = [1/φ_B(V) − 1/φ_B(V̄)] A + [φ_A(V̄)/[φ_B(V̄)]² − φ_A(V)/[φ_B(V)]²] B.               (7.29)

Therefore

    ||ΔE||_2 ≤ |1/φ_B(V) − 1/φ_B(V̄)| ||A||_2 + |φ_A(V̄)/[φ_B(V̄)]² − φ_A(V)/[φ_B(V)]²| ||B||_2.

Now use (7.24a) and (7.25a) to complete the proof.
Remark 7.2. Sharper bounds than (7.27) are possible through using (7.24b) and (7.25b)
instead. But the use of these possible sharper bounds does not lead to an essentially
different conclusion later in our main theorem – Theorem 7.2.
Lemma 7.6. Let H ∈ R^{m×m} be a symmetric matrix, let V_+, V̄ ∈ O^{m×ℓ}, and let R_H be defined by (7.19). Let V̄_⊥ ∈ O^{m×(m−ℓ)} be such that [V̄, V̄_⊥] is orthogonal. Then for any unitarily invariant norm |||·|||,

    |||V̄_⊥^T H V_+||| ≤ √2 ||H||_2 |||sin Θ(V_+, V̄)||| + |||R_H|||.                         (7.30)

Proof. Let Z ∈ R^{ℓ×ℓ} be as in Lemma 7.1 applied to V_+ and V̄, so that |||V̄ − V_+ Z||| ≤ √2 |||sin Θ(V_+, V̄)|||. Now notice

    V̄_⊥^T H V_+ = V̄_⊥^T H (V_+ Z − V̄) Z^T + V̄_⊥^T H V̄ Z^T
                 = V̄_⊥^T H (V_+ Z − V̄) Z^T + V̄_⊥^T (R_H + V̄ H_1) Z^T
                 = V̄_⊥^T H (V_+ Z − V̄) Z^T + V̄_⊥^T R_H Z^T

to get |||V̄_⊥^T H V_+||| ≤ ||H||_2 |||V_+ Z − V̄||| + |||R_H||| ≤ √2 ||H||_2 |||sin Θ(V_+, V̄)||| + |||R_H|||.
Lemma 7.7. Suppose V̄ ∈ O^{m×ℓ} is an orthonormal eigenbasis of E(V̄) associated with its ℓ largest eigenvalues, and let

    δ := λ_ℓ(E(V̄)) − λ_{ℓ+1}(E(V̄)).

Define R_B and B_1 by (7.19) with H = B. For V ∈ O^{m×ℓ}, let V_+ ∈ O^{m×ℓ} be an orthonormal eigenbasis of E(V) associated with its ℓ largest eigenvalues. If

    δ > (χ + √2 χ_2 ||B||_2) ε,                                                              (7.31)

then

    ||sin Θ(V_+, V̄)||_2 ≤ [ τ_1 ||R_B||_2 ε + τ_2 ε² ] / [ δ − (χ + √2 χ_2 ||B||_2) ε ],      (7.32)

where χ and χ_2 are defined in (7.27) and (7.28), and

    τ_1 = 2√2 ℓ ||A||_2 / ω_B² + χ_2,   τ_2 = 4ℓ ||A||_2 ||B||_2 / ω_B².                      (7.33)
Proof. By the assumption EpV qV` “ V` Λ1 , where Λ1 P Rℓˆℓ and eigpΛ1 q “ tλi pEpV qquℓi“1 .
Let
R1 :“ EpV̄ qV` ´ V` Λ1 “ rEpV̄ q ´ EpV qsV` .
(7.34)
Let V̄K P Omˆpm´ℓq such that rV̄ , V̄K s is orthogonal. By the assumption, V̄K is the orthonormal basis matrix associated the smallest m ´ ℓ eigenvalues of EpV̄ q. So we can
write
V̄KJ EpV̄ q “ Λp2 V̄KJ ,
where Λp2 P Rpm´ℓqˆpm´ℓq and eigpΛp2 q “ tλi pEpV̄ qqum
i“ℓ`1 . We have
V̄KJ R1 “ V̄KJ rEpV̄ qV` ´ V` Λ1 s
“ Λp2 V̄ J V` ´ V̄ J V` Λ1 .
K
K
(7.35)
Since |λi pEpV qq ´ λi pEpV̄ qq| ď }EpV̄ q ´ EpV q}2 ď χ ϵ by [27, p.203] and Lemma 7.5, we
have, for i ď ℓ and ℓ ` 1 ď j,
λi pEpV qq ´ λj pEpV̄ qq ě λℓ pEpV qq ´ λℓ`1 pEpV̄ qq
ě λℓ pEpV̄ qq ´ χϵ ´ λℓ`1 pEpV̄ qq
“ δ ´ χϵ.
(7.36)
Therefore by Lemma 2.1, we have
} sin ΘpV` , V̄ q}2 “ }VKJ V` }2 ď
To bound }V̄KJ R1 }2 , we have, using (7.34) and (7.29),
}V̄KJ R1 }2 “ }V̄KJ rEpV̄ q ´ EpV qsV` }2
}V̄KJ R1 }2
.
δ ´ χϵ
(7.37)
ˇ
ˇ
ˇ
ˇ ϕA pV̄ q
1
ϕA pV q ˇˇ J
1 ˇˇ J
ˇ
´
}
V̄
AV
}
`
´
` 2
ˇ rϕ pV̄ qs2 rϕB pV qs2 ˇ }V̄K BV` }2
ϕB pV q ϕB pV̄ q ˇ K
B
“: ε1 ` ε2 .
(7.38)
ˇ
ˇ
ď ˇˇ
For ε1 , we use (7.24b) to get
?
2 2ℓ}RB }2 ϵ ` 4ℓ}B}2 ϵ2
ε1 ď
}A}2 .
2
ωB
(7.39)
For ε2 , we use (7.25a) to get
ı
”
?
ε2 ď χ2 ϵ}V̄KJ BV` }2 ď χ2 ϵ }B}2 2} sin ΘpV` , V̄ q}2 ` }RB }2 .
(7.40)
Finally, combine (7.37) – (7.40) to arrive at, under (7.31),
} sin ΘpV` , V̄ q}2 ď
ε1 ` χ2 }RB }2 ϵ
?
,
δ ´ pχ ` 2χ2 }B}2 qϵ
which, together (7.39), lead to (7.32).
Remark 7.3. The inequality (7.37) remains valid in any unitarily invariant norm:

    |||sin Θ(V_+, V̄)||| = |||V̄_⊥^T V_+||| ≤ |||V̄_⊥^T R_1||| / (δ − χε).                     (7.37')

The machinery built so far in the various lemmas allows us to bound |||V̄_⊥^T R_1||| in a similar way. The outcome would be interesting theoretically, but may add little to our understanding of SCF beyond Lemma 7.7 as it is. The same remark applies to the next theorem, in which the analysis is in terms of ||·||_2 but could be carried out in any unitarily invariant norm, too.
Theorem 7.2. Suppose V̄ ∈ O^{m×ℓ} is an orthonormal eigenbasis of E(V̄) associated with its ℓ largest eigenvalues, and let

    δ := λ_ℓ(E(V̄)) − λ_{ℓ+1}(E(V̄)).

Define R_A, A_1 and R_B, B_1 by (7.19) with H = A and H = B, respectively, and let χ, χ_2, τ_1, and τ_2 be as defined in (7.27), (7.28), and (7.33). Apply the SCF iteration (Algorithm 7.1) to generate a sequence {V_k}, given V_0.

1. Given any 0 < t < 1 and 0 < ν < 1, if

       ||R_B||_2 ≤ tν δ/τ_1,                                                                 (7.41)

   then for any V_0 ∈ O^{m×ℓ} satisfying

       ||sin Θ(V_0, V̄)||_2 < (1 − t)νδ / [ τ_2 + ν(χ + √2 χ_2 ||B||_2) ],                     (7.42)

   the sequence of subspaces V_k := R(V_k) converges to V̄ := R(V̄) at least linearly, and moreover

       ||sin Θ(V_{k+1}, V̄)||_2 ≤ ν ||sin Θ(V_k, V̄)||_2   for k = 0, 1, ....                  (7.43)

2. If R_B = 0, then for any V_0 ∈ O^{m×ℓ} such that

       ||sin Θ(V_0, V̄)||_2 < δ / [ τ_2 + χ + √2 χ_2 ||B||_2 ],                                (7.44)

   the sequence {V_k} converges to V̄ quadratically, and moreover

       ||sin Θ(V_{k+1}, V̄)||_2 ≤ τ_2 ||sin Θ(V_k, V̄)||_2² / [ δ − (χ + √2 χ_2 ||B||_2) ||sin Θ(V_k, V̄)||_2 ]   (7.45)

   for k = 0, 1, ....

3. If R_A = R_B = 0, then for any V_0 ∈ O^{m×ℓ} satisfying

       ||sin Θ(V_0, V̄)||_2 < 2δ / [ χ + sqrt(χ² + 4τ_3 δ) ],                                  (7.46)

   we have V_1 = V̄, i.e., convergence in one step, where

       τ_3 := 4√2 ℓ ||A||_2 ||B||_2 / ω_B² + (4√2 ℓ ||B||_2 / ω_B⁴) [ ||A||_2 Ω_B² + 2Ω_A Ω_B ||B||_2 ].   (7.47)
Proof. It suffices to prove (7.43) and (7.45) just for $k = 0$ and to verify that (7.42) and (7.44) hold with $V_0$ replaced by $V_1$.

For clarity, we drop the subscript 0 from $V_0$ and write $V_+$ for $V_1$. Let
$$\epsilon = \|\sin\Theta(V,\bar V)\|_2, \qquad \epsilon_+ = \|\sin\Theta(V_+,\bar V)\|_2.$$
By Lemma 7.7, we have
$$\epsilon_+ \le \frac{\tau_1\,\|R_B\|_2 + \tau_2\,\epsilon}{\delta - (\chi + \sqrt{2}\,\chi_2\,\|B\|_2)\,\epsilon}\times\epsilon \tag{7.48}$$
if $\delta > (\chi + \sqrt{2}\,\chi_2\,\|B\|_2)\,\epsilon$. As we will see, this condition is always satisfied under (7.42) or (7.44) in the respective cases.

For item 1, we would like to have
$$\frac{\tau_1\,\|R_B\|_2 + \tau_2\,\epsilon}{\delta - (\chi + \sqrt{2}\,\chi_2\,\|B\|_2)\,\epsilon} \le \nu$$
$$\Leftrightarrow\quad \tau_1\,\|R_B\|_2 + \tau_2\,\epsilon \le \nu\bigl[\delta - (\chi + \sqrt{2}\,\chi_2\,\|B\|_2)\,\epsilon\bigr]$$
$$\Leftrightarrow\quad \tau_1\,\|R_B\|_2 + \bigl[\tau_2 + \nu(\chi + \sqrt{2}\,\chi_2\,\|B\|_2)\bigr]\epsilon \le \nu\delta$$
$$\Leftarrow\quad \tau_1\,\|R_B\|_2 \le t\nu\delta \quad\text{and}\quad \bigl[\tau_2 + \nu(\chi + \sqrt{2}\,\chi_2\,\|B\|_2)\bigr]\epsilon \le (1-t)\nu\delta. \tag{7.49}$$
The two inequalities in (7.49) hold by the assumption $\|R_B\|_2 \le t\nu\,\delta/\tau_1$ and by (7.42). Also, the second inequality in (7.49) implies $\delta > (\chi + \sqrt{2}\,\chi_2\,\|B\|_2)\,\epsilon$.
For item 2, $R_B = 0$, so (7.48) becomes
$$\epsilon_+ \le \frac{\tau_2\,\epsilon}{\delta - (\chi + \sqrt{2}\,\chi_2\,\|B\|_2)\,\epsilon}\times\epsilon. \tag{7.50}$$
Now we would like to have
$$\frac{\tau_2\,\epsilon}{\delta - (\chi + \sqrt{2}\,\chi_2\,\|B\|_2)\,\epsilon} < 1$$
$$\Leftrightarrow\quad \tau_2\,\epsilon < \delta - (\chi + \sqrt{2}\,\chi_2\,\|B\|_2)\,\epsilon$$
$$\Leftrightarrow\quad \bigl[\tau_2 + \chi + \sqrt{2}\,\chi_2\,\|B\|_2\bigr]\epsilon < \delta. \tag{7.51}$$
The last inequality in (7.51) is equivalent to the assumption (7.44). Also, this inequality implies $\delta > (\chi + \sqrt{2}\,\chi_2\,\|B\|_2)\,\epsilon$.
For item 3, to exploit the additional condition $R_A = 0$, we return to (7.37) and (7.38). By (7.24b) and Lemma 7.6 for $H = A$, we have
$$\varepsilon_1 \le \frac{4\ell\,\|B\|_2\,\epsilon^2}{\omega_B^2}\cdot\sqrt{2}\,\|A\|_2\,\epsilon_+ \le \frac{4\sqrt{2}\,\|A\|_2\,\|B\|_2\,\ell}{\omega_B^2}\,\epsilon^2\epsilon_+. \tag{7.52}$$
For $\varepsilon_2$, we use (7.25b) and Lemma 7.6 for $H = B$ to get
$$\varepsilon_2 \le \left|\frac{\phi_A(\bar V)}{[\phi_B(\bar V)]^2} - \frac{\phi_A(V)}{[\phi_B(V)]^2}\right|\,\sqrt{2}\,\|B\|_2\,\epsilon_+ \le \frac{4\sqrt{2}\,\ell\,\|B\|_2}{\omega_B^4}\bigl[\|A\|_2\,\Omega_B^2 + 2\,\Omega_A\Omega_B\,\|B\|_2\bigr]\,\epsilon^2\epsilon_+. \tag{7.53}$$
Combine (7.37), (7.38), (7.52), and (7.53) to arrive at
$$\epsilon_+ \le \frac{\tau_3}{\delta - \chi\epsilon}\,\epsilon^2\epsilon_+,$$
where $\tau_3$ is defined by (7.47). Equivalently, $\epsilon_+(\delta - \chi\epsilon - \tau_3\epsilon^2) \le 0$, provided $\delta - \chi\epsilon > 0$. Thus $\epsilon_+ = 0$ if $\delta - \chi\epsilon - \tau_3\epsilon^2 > 0$. This last inequality holds under (7.46), since the right-hand side of (7.46) is exactly the positive root of $\tau_3 x^2 + \chi x - \delta = 0$; the same condition also ensures $\delta - \chi\epsilon > 0$.
Remark 7.4. The condition (7.41) is rather strong, being the result of a number of upper-bound estimations. In actual computations, SCF may still converge even if (7.41) fails. Nevertheless, the condition is very useful for understanding SCF's local convergence behavior. It reveals a remarkable intrinsic connection between the convergence speed of SCF and how close $\bar V$ is to an eigenspace of $B$: the closer $\bar V$ is to an eigenspace of $B$, the faster the convergence. In fact, when $\bar V$ spans an eigenspace of $B$, the convergence is quadratic; in the even more extreme case when $\bar V$ spans an eigenspace of both $A$ and $B$, the convergence is instant for a sufficiently accurate $V_0$. A very particular case, $AB = BA$ and $CB = BC$, falls into item 3 of Theorem 7.2. Under these assumptions, for any $V \in \mathbb{O}^{m\times\ell}$, $E(V)H = HE(V)$ for $H \in \{A, B\}$, and thus
$$E(\bar V)\bar V = \bar V M_{\bar V} \;\Rightarrow\; E(\bar V)H\bar V = H\bar V M_{\bar V},$$
which implies that $H\bar V$ is also an eigenbasis of $E(\bar V)$ corresponding to its $\ell$ largest eigenvalues. With the aid of $\lambda_\ell(E(\bar V)) > \lambda_{\ell+1}(E(\bar V))$, we conclude that for $H \in \{A, B\}$,
$$\sin\Theta(\bar V, H\bar V) = 0 \;\Leftrightarrow\; R_H = 0.$$
8  Numerical experiments
We report in this section our numerical experiments on the SCF iteration. Recall Example 7.1, purposely constructed to demonstrate that global convergence cannot be guaranteed in general. In our experience on randomly generated problems, however, the SCF iteration has always been fast and globally convergent.

With various pairs $(m, \ell)$, we have tested the SCF iteration on numerous random symmetric and positive definite matrices $A$, $B$, $C$ and random initial $V_0$, as generated in MATLAB by

    A = randn(m,m);  A = A'*A;
    B = randn(m,m);  B = B'*B;
    C = randn(m,m);  C = C'*C;
    [V0, R0] = qr(randn(m,ell), 0);   % two-output qr so that V0 is the orthonormal factor

We mentioned before that making $C$ positive definite does not lose any generality. The same can be said about $A$: shift $A$ to $A + \xi B$ so that it is positive definite while, at the same time, the maximizers remain unchanged.
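For completeness, the following is one simple way, in MATLAB, to choose such a shift $\xi$ when $A$ is merely symmetric. The particular formula for $\xi$ is our own illustrative choice (the text does not prescribe one); any $\xi > -\lambda_{\min}(A)/\lambda_{\min}(B)$ works.

    % Illustrative shift (not from the original text): xi*lambda_min(B) > -lambda_min(A)
    % guarantees A + xi*B is symmetric positive definite, while
    % tr(V'*(A+xi*B)*V)/tr(V'*B*V) = tr(V'*A*V)/tr(V'*B*V) + xi,
    % so the maximizers of f are unchanged.
    xi = max(0, -min(eig(A))/min(eig(B))) + 1;
    A  = A + xi*B;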
For comparison purposes, we also tested Stiefel manifold-based optimization methods, coded in sg_min [2, section 9.4] (see also [7]), to minimize $-f(V)$; since sg_min minimizes a function on $\mathbb{O}^{m\times\ell}$, we supply $-f(V)$ instead of $f(V)$. The MATLAB code sg_min (version 2.4.3) is available at http://web.mit.edu/~ripper/www/sgmin.html. The tested methods include the Fletcher-Reeves CG iterative search, the Polak-Ribière CG iterative search, the Newton iterative search, and the dog-leg Newton iterative search, all accessible through sg_min by

    [f,V] = sg_min(V0, 'frcg',   'euclidean');
    [f,V] = sg_min(V0, 'prcg',   'euclidean');
    [f,V] = sg_min(V0, 'newton', 'euclidean');
    [f,V] = sg_min(V0, 'dog',    'euclidean');

respectively. In order to run sg_min, we provide (an illustrative MATLAB sketch of the first two items is given after this list):

1. the objective function $-f(V)$ in the MATLAB file F.m,
2. the first derivative information in dF.m,
$$\mathrm{dF}(V) := -\frac{\partial f(V)}{\partial V} = -2E(V)V,$$
3. the second derivative information in ddF.m,
$$\mathrm{ddF}(V, X) := \frac{d}{dt}\,\mathrm{dF}(V(t))\Big|_{t=0} = -2\bigl\{E(V)X + \mathrm{D}E(V)[X]\,V\bigr\} = -2\bigl\{E(V)X + G(V,X)V\bigr\},$$
where $V(t)$ is any smooth curve on $\mathbb{O}^{m\times\ell}$ with $V = V(0)$ and $X = \dot V(0) \in \mathrm{T}_V\mathbb{O}^{m\times\ell}$, and $G(V,X)$ is defined by (3.12).
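As a concrete illustration of items 1 and 2 above, a minimal (hypothetical) realization of F.m and dF.m might look as follows. It assumes $E(V) = A/\operatorname{tr}(V^{\mathsf T}BV) - \bigl(\operatorname{tr}(V^{\mathsf T}AV)/[\operatorname{tr}(V^{\mathsf T}BV)]^2\bigr)B + C$, which is the form implied by the gradient formula above, and that $A$, $B$, $C$ are shared with the callback files as global variables; the actual way data is passed to sg_min's callbacks may differ.

    % F.m -- negated objective (illustrative sketch; assumes globals A, B, C)
    function y = F(V)
    global A B C
    y = -( trace(V'*A*V)/trace(V'*B*V) + trace(V'*C*V) );

    % dF.m -- negated gradient, dF(V) = -2*E(V)*V (illustrative sketch)
    function D = dF(V)
    global A B C
    phiA = trace(V'*A*V);  phiB = trace(V'*B*V);
    EV = A/phiB - (phiA/phiB^2)*B + C;   % assumed form of E(V)
    D  = -2*EV*V;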
Table 8.1: The numbers of outer iterations averaged over 20 random tests

      m      SCF     sg_min
                     frcg      prcg      Newton    dog
    100     7.00     42.40    220.40      9.00    12.80
    200     6.80     56.70    341.20     10.10    13.20
    500     6.00     95.10    679.00     10.50    14.50
   1000     6.00     85.50    677.60     11.10    15.40
   2000     5.00    114.50    648.80     11.80    16.30
All calculations are carried out in MATLAB 7.10.0 (R2010a) on a PC with an Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz. For the SCF iteration, when $m \le 500$, we use the MATLAB function eig to find an orthonormal eigenbasis $V_{k+1}$ of $E(V_k)$ associated with its $\ell$ largest eigenvalues at Line 2 of Algorithm 7.1, and we use eigs if $m > 500$. The SCF iteration is terminated whenever
$$r_k := \|E(V_{k+1})V_{k+1} - V_{k+1}M_{V_{k+1}}\|_2 \le \mathrm{tol} = 10^{-10} \quad\text{or}\quad k \ge 50.$$
For sg_min, the default options are used.
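For concreteness, a minimal MATLAB sketch of the SCF loop with this stopping rule is given below. It is illustrative only: it takes $E(V) = A/\operatorname{tr}(V^{\mathsf T}BV) - \bigl(\operatorname{tr}(V^{\mathsf T}AV)/[\operatorname{tr}(V^{\mathsf T}BV)]^2\bigr)B + C$ (the form implied by the gradient formula $\partial f(V)/\partial V = 2E(V)V$), assumes $M_{V_{k+1}} = V_{k+1}^{\mathsf T}E(V_{k+1})V_{k+1}$, and uses eig throughout rather than switching to eigs for large $m$.

    % SCF sketch (illustrative; see the assumptions stated above)
    Efun = @(V) A/trace(V'*B*V) - (trace(V'*A*V)/trace(V'*B*V)^2)*B + C;
    tol = 1e-10;  maxit = 50;
    V = V0;
    for k = 0:maxit-1
        EV = Efun(V);                     % E(V_k)
        [Q, D] = eig((EV + EV')/2);       % symmetrize against round-off
        [~, idx] = sort(diag(D), 'descend');
        V = Q(:, idx(1:ell));             % V_{k+1}: eigenbasis for the ell largest eigenvalues
        EVnew = Efun(V);                  % E(V_{k+1})
        M  = V'*EVnew*V;                  % assumed M_{V_{k+1}}
        rk = norm(EVnew*V - V*M);         % residual r_k (matrix 2-norm)
        if rk <= tol, break; end
    end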
For $\ell = 3$ and various $m$, Tables 8.1, 8.2, and 8.3 report the numbers of outer iterations, the residuals $r_k$, and the CPU times (measured by the MATLAB function cputime), averaged over 20 random tests, for each method. Similar numerical behaviors are also observed for other $\ell$.
Table 8.2: Residuals $r_k$ averaged over 20 random tests

      m      SCF           sg_min
                           frcg         prcg         Newton       dog
    100     1.7392E-11     1.0016E-4    1.0426E-4    6.0403E-5    9.9159E-5
    200     1.4309E-11     1.8052E-4    2.0508E-4    1.4730E-4    2.3812E-4
    500     1.2952E-11     5.1048E-4    5.0872E-4    3.3755E-4    6.1155E-4
   1000     1.3626E-11     1.0374E-3    1.0266E-3    8.3783E-4    1.1495E-3
   2000     5.1391E-11     2.1672E-3    2.0579E-3    1.6901E-3    2.4546E-3
Table 8.3: CPU times (in seconds) averaged over 20 random tests

      m       SCF        sg_min
                         frcg         prcg          Newton       dog
    100      0.1825      0.6848       1.6817        0.3822       0.7254
    200      0.7831      1.6287       5.7798        1.1185       3.1590
    500      3.8813     19.3270      84.3076       10.8545      66.6748
   1000      9.5020     70.5156     379.2478       56.0278     520.1073
   2000     29.2954    370.6475    1435.5852      298.2989    4075.0503

Tables 8.1, 8.2, and 8.3 clearly suggest that, on the random problems, SCF is far superior to the Stiefel manifold-based optimization methods implemented in sg_min in accuracy, in the number of iterative steps, and in running time. It is worth pointing out that SCF shows a remarkable global convergence behavior: numerically, all the methods converge to the same objective value for each test problem, but SCF reaches a solution with residual about $10^{-11}$ in only about 6 iterations, while the Fletcher-Reeves CG (frcg) takes from 6 up to 20 times, the Polak-Ribière CG (prcg) from 30 up to 128 times, the Newton iteration (Newton) from 1.3 up to 2 times, and the dog-leg Newton iteration (dog) from 1.7 up to 3 times as many iterations to reduce the residuals to only about $10^{-4}$.
Our last remark of this section concerns the relationship between the accuracy of the objective value $f(V)$ and the residual $r_k$ of the computed solution for each method. Note that
$$r_k = \|E(V_{k+1})V_{k+1} - V_{k+1}M_{V_{k+1}}\|_2 \le \|E(\bar V)V_{k+1} - V_{k+1}M_{V_{k+1}}\|_2 + \|[E(V_{k+1}) - E(\bar V)]V_{k+1}\|_2 = \|E(\bar V)V_{k+1} - V_{k+1}M_{V_{k+1}}\|_2 + O(\epsilon_k),$$
$$r_k \ge \|E(\bar V)V_{k+1} - V_{k+1}M_{V_{k+1}}\|_2 - \|[E(V_{k+1}) - E(\bar V)]V_{k+1}\|_2 = \|E(\bar V)V_{k+1} - V_{k+1}M_{V_{k+1}}\|_2 + O(\epsilon_k),$$
by Lemma 7.5, where $\epsilon_k := \|\sin\Theta(V_k,\bar V)\|_2$. For sufficiently tiny $\epsilon_k$, if
$$\delta := \lambda_\ell(E(\bar V)) - \lambda_{\ell+1}(E(\bar V)) > 0,$$
then by the Davis-Kahan $\sin\theta$ theorem [4],
$$\epsilon_k \le \frac{1}{\delta}\,\|E(\bar V)V_{k+1} - V_{k+1}M_{V_{k+1}}\|_2 + O(\epsilon_k).$$
Hence, roughly speaking, $r_k$ and $\epsilon_k$ are of the same order of magnitude. Since the gradient of $f(V)$ at $\bar V$ vanishes,
$$f(V_k) = f(\bar V) + O(\epsilon_k^2) = f(\bar V) + O(r_k^2). \tag{8.1}$$
This explains well Table 8.4, in which the objective values $f(V_k)$ computed by the Stiefel manifold-based optimization methods match the respective values by SCF to about 11 decimal digits, even though the residuals of the former methods are about the square roots of those of SCF.
Table 8.4: The computed objective values at convergence of a typical test problem

      m     SCF                  sg_min
                                 frcg                 prcg                 Newton               dog
    100     1.139615532720E+3    1.139615532719E+3    1.139615532719E+3    1.139615532719E+3    1.139615532719E+3
    200     2.233485585827E+3    2.233485585826E+3    2.233485585826E+3    2.233485585826E+3    2.233485585824E+3
    500     5.814017245337E+3    5.814017245334E+3    5.814017245335E+3    5.814017245332E+3    5.814017245321E+3
   1000     1.173828254978E+4    1.173828254976E+4    1.173828254977E+4    1.173828254977E+4    1.173828254971E+4
   2000     2.368365236550E+4    2.368365236545E+4    2.368365236546E+4    2.368365236548E+4    2.368365236535E+4
9  Concluding remarks and future research
Analogously to the fact that maximizing $\operatorname{tr}(V^{\mathsf T}CV)$ over $\mathbb{O}^{m\times\ell}$ is a subspace version of $\max_{\|v\|_2=1} v^{\mathsf T}Cv$, whose solution is the dominant eigenpair of $C$, what we have studied here,
$$\max_{V^{\mathsf T}V=I_\ell} f(V) = \max_{V^{\mathsf T}V=I_\ell}\left\{\frac{\operatorname{tr}(V^{\mathsf T}AV)}{\operatorname{tr}(V^{\mathsf T}BV)} + \operatorname{tr}(V^{\mathsf T}CV)\right\}, \tag{1.1}$$
is a subspace version of optimizing the sum of two Rayleigh quotients on the unit sphere. We have made a number of discoveries, including the connection between (1.1) and the nonlinear eigenvalue problem (3.9), and the elegant necessary characterization of a global maximizer in terms of the extreme eigenvalues of (3.9), which leads to the SCF iteration for solving (1.1). Our numerical experiments suggest that the SCF iteration is very effective and much more efficient than Stiefel manifold-based optimization methods. We also explain how to march out of a KKT point that is not a global maximizer.
Although (3.9) is about maximization, $\min f(V)$, if needed, can be turned into a problem of the form (1.1) (with a different $f$). In fact,
$$\min f(V) = -\max[-f(V)] = -\max\left\{\frac{\operatorname{tr}(V^{\mathsf T}[-A]V)}{\operatorname{tr}(V^{\mathsf T}BV)} + \operatorname{tr}(V^{\mathsf T}[-C]V)\right\}.$$
While the SCF iteration has appeared very efficient in our numerical tests so far, certain theoretical and numerical issues remain unanswered. These issues include 1) sufficient conditions for the global maximizers, 2) further convergence analysis of the SCF iteration (like that for Algorithm 7.1 but for the more general case), and 3) modifications, if any, of the SCF iteration to ensure its global convergence. They, among others, will be the subjects of our future research.
We mentioned that (1.1) is a subspace version of optimizing the sum of two Rayleigh quotients on the unit sphere (the case $\ell = 1$). But (1.1) is only one among various conceivable subspace versions. Other versions may deserve attention, too. For instance, it is easy to see that when $\ell = 1$, (1.1) is the same as
$$\max_{V^{\mathsf T}V=I_\ell}\bigl\{\operatorname{tr}\bigl((V^{\mathsf T}AV)(V^{\mathsf T}BV)^{-1}\bigr) + \operatorname{tr}(V^{\mathsf T}CV)\bigr\}. \tag{9.1}$$
Thus (9.1) can be considered as another subspace version. We believe similar lines of argument to ours so far can be used to investigate (9.1).

It is interesting to note that (9.1) is essentially equivalent to
$$\max_{\operatorname{rank}(Y)=\ell}\Bigl\{\operatorname{tr}\bigl((Y^{\mathsf T}\widetilde AY)(Y^{\mathsf T}\widetilde BY)^{-1}\bigr) + \operatorname{tr}\bigl((Y^{\mathsf T}\widetilde CY)(Y^{\mathsf T}\widetilde DY)^{-1}\bigr)\Bigr\}, \tag{9.2}$$
where $\widetilde A$ and $\widetilde C$ are symmetric, and $\widetilde B$ and $\widetilde D$ are symmetric and positive definite. In fact, we can set
$$Z := \widetilde D^{1/2}Y, \quad A := \widetilde D^{-1/2}\widetilde A\widetilde D^{-1/2}, \quad B := \widetilde D^{-1/2}\widetilde B\widetilde D^{-1/2}, \quad C := \widetilde D^{-1/2}\widetilde C\widetilde D^{-1/2}$$
to translate (9.2) into
$$\max_{\operatorname{rank}(Z)=\ell}\bigl\{\operatorname{tr}\bigl((Z^{\mathsf T}AZ)(Z^{\mathsf T}BZ)^{-1}\bigr) + \operatorname{tr}\bigl((Z^{\mathsf T}CZ)(Z^{\mathsf T}Z)^{-1}\bigr)\bigr\}. \tag{9.3}$$
Now substitute $Z = VR$ (the thin QR factorization of $Z$) into (9.3) to arrive at (9.1).
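The equivalence can also be verified numerically. The following short MATLAB sketch is ours, for illustration only (all variable names are ad hoc); it evaluates the objectives of (9.2), (9.3), and (9.1) at corresponding points and observes that they agree up to round-off.

    % Illustrative numerical check (ours, not from the text) of (9.2) -> (9.3) -> (9.1)
    m = 8;  ell = 3;
    At = randn(m); At = At + At';          % symmetric "A tilde"
    Ct = randn(m); Ct = Ct + Ct';          % symmetric "C tilde"
    Bt = randn(m); Bt = Bt*Bt' + eye(m);   % SPD "B tilde"
    Dt = randn(m); Dt = Dt*Dt' + eye(m);   % SPD "D tilde"
    Y  = randn(m, ell);                    % any rank-ell Y
    g92 = trace((Y'*At*Y)/(Y'*Bt*Y)) + trace((Y'*Ct*Y)/(Y'*Dt*Y));  % objective in (9.2)
    Dh = sqrtm(Dt);  Z = Dh*Y;             % Z = D^(1/2)*Y
    A = Dh\At/Dh;  B = Dh\Bt/Dh;  C = Dh\Ct/Dh;                     % congruence by D^(-1/2)
    g93 = trace((Z'*A*Z)/(Z'*B*Z)) + trace((Z'*C*Z)/(Z'*Z));        % objective in (9.3)
    [V, R] = qr(Z, 0);                     % thin QR factorization Z = V*R
    g91 = trace((V'*A*V)/(V'*B*V)) + trace(V'*C*V);                 % objective in (9.1)
    disp([g92, g93, g91])                  % the three values agree up to round-off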
References
[1] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms On Matrix Manifolds.
Princeton University Press, Princeton, NJ, 2008.
[2] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst (editors). Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. SIAM, Philadelphia, 2000.
[3] M. T. Chu and N. T. Trendafilov. The orthogonally constrained regression revisited. J.
Comput. Graph. Stat., 10(4):746–771, 2001.
[4] C. Davis and W. Kahan. The rotation of eigenvectors by a perturbation. III. SIAM J. Numer.
Anal., 7:1–46, 1970.
[5] J. Demmel. Applied Numerical Linear Algebra. SIAM, Philadelphia, PA, 1997.
[6] M. Murat Dundar, Glenn Fung, Jinbo Bi, Sandilya Sathyakama, and Bharat Rao. Sparse Fisher discriminant analysis for computer aided detection. In Proceedings of the SIAM International Conference on Data Mining, 2005.
[7] A. Edelman, T. A. Arias, and S. T. Smith. The geometry of algorithms with orthogonality
constraints. SIAM J. Matrix Anal. Appl., 20(2):303–353, 1999.
[8] E. Fung and M. K. Ng. On sparse Fisher discriminant method for microarray data analysis.
Bioinformation, 2(5):230–234, 2007.
[9] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press,
Baltimore, Maryland, 3rd edition, 1996.
[10] U. Helmke and J. B. Moore. Optimization and Dynamical Systems. Springer-Verlag, London,
UK, 1994.
[11] R. A. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press,
Cambridge, 1991.
[12] Andrew V. Knyazev. Toward the optimal preconditioned eigensolver: Locally optimal block
preconditioned conjugate gradient method. SIAM J. Sci. Comput., 23(2):517–541, 2001.
[13] Andrew V. Knyazev and Merico E. Argentati. Rayleigh-Ritz majorization error bounds with
applications to FEM. SIAM J. Matrix Anal. Appl., 31(3):1521–1537, 2010.
[14] R. M. Martin. Electronic Structure: Basic Theory and Practical Methods. Cambridge University Press, Cambridge, UK, 2004.
[15] J. W. Milnor and J. D. Stasheff. Characteristic Classes. Princeton University Press & University of Tokyo Press, 1974.
[16] M. K. Ng, L.-Z. Liao, and L. Zhang. On sparse linear discriminant analysis algorithm for
high-dimensional data classification. Numer. Linear Algebra Appl., 18(2):223–235, 2011.
[17] T. Ngo, M. Bellalij, and Y. Saad. The trace ratio optimization problem for dimensionality
reduction. SIAM J. Matrix Anal. Appl., 31(5):2950–2971, 2010.
[18] J. Nocedal and S. Wright. Numerical Optimization. Springer, 2nd edition, 2006.
[19] B. N. Parlett. The Symmetric Eigenvalue Problem. SIAM, Philadelphia, 1998.
[20] G. Primolevo, O. Simeone, and U. Spagnolini. Towards a joint optimization of scheduling and beamforming for MIMO downlink. In Spread Spectrum Techniques and Applications, 2006 IEEE Ninth International Symposium on, pages 493-497, 2006.
[21] Y. Saad. Numerical Methods for Large Eigenvalue Problems. Manchester University Press,
Manchester, UK, 1992.
[22] Yousef Saad, James R. Chelikowsky, and Suzanne M. Shontz. Numerical methods for electronic
structure calculations of materials. SIAM Rev., 52(1):3–54, 2010.
[23] V. R. Saunders and I. H. Hillier. A "level-shifting" method for converging closed shell Hartree-Fock wave functions. Internat. J. Quantum Chem., 7(4):699-705, 1973.
[24] H. Shen, K. Diepold, and K. Hüper. A geometric revisit to the trace quotient problem. In
Proceedings of the 19th International Symposium of Mathematical Theory of Networks and
Systems (MTNS 2010), 2010.
[25] Gerard L. G. Sleijpen and Henk A. Van der Vorst. A Jacobi–Davidson iteration method for
linear eigenvalue problems. SIAM J. Matrix Anal. Appl., 17(2):401–425, 1996.
[26] G. W. Stewart. Matrix Algorithms, Vol. II: Eigensystems. SIAM, Philadelphia, 2001.
[27] G. W. Stewart and Ji-Guang Sun. Matrix Perturbation Theory. Academic Press, Boston,
1990.
[28] A. Szabo and N. S. Ostlund. Modern Quantum Chemistry: An Introduction To Advanced
Electronic Structure Theory. Dover, New York, 1996.
[29] L. Thøgersen, J. Olsen, D. Yeager, P. Jørgensen, P. Salek, and T. Helgaker. The trust-region self-consistent field method: Towards a black-box optimization in Hartree-Fock and Kohn-Sham theories. J. Chem. Phys., 121(1):16-27, 2004.
[30] Huan Wang, Shuicheng Yan, Dong Xu, Xiaoou Tang, and T. Huang. Trace ratio vs. ratio trace
for dimensionality reduction. In Computer Vision and Pattern Recognition, 2007. CVPR ’07.
IEEE Conference on, pages 1–8, 2007.
[31] P.-Å. Wedin. On angles between subspaces. In B. Kågström and A. Ruhe, editors, Matrix
Pencils, pages 263–285, New York, 1983. Springer.
[32] C. Yang, W. Gao, and J. C. Meza. On the convergence of the self-consistent field iteration
for a class of nonlinear eigenvalue problems. SIAM J. Matrix Anal. Appl., 30(4):1773–1788,
2009.
[33] C. Yang, J. C. Meza, B. Lee, and L.-W. Wang. KSSOLV—a MATLAB toolbox for solving
the Kohn-Sham equations. ACM Trans. Math. Software, 36(2):1–35, 2009.
[34] C. Yang, J. C. Meza, and L.-W. Wang. A trust region direct constrained minimization algorithm for the Kohn-Sham equation. SIAM J. Sci. Comput., 29(5):1854-1875, 2007.
[35] W. H. Yang, L.-H. Zhang, and R. Y. Song. Optimality conditions of the nonlinear programming on Riemannian manifolds. Pacific J. Optim., 2013. To appear.
[36] L.-H. Zhang. On optimizing the sum of the Rayleigh quotient and the generalized Rayleigh
quotient on the unit sphere. Comput. Opt. Appl., 54(1):111–139, 2013.
[37] L.-H. Zhang, L.-Z. Liao, and M. K. Ng. Fast algorithms for the generalized Foley-Sammon
discriminant analysis. SIAM J. Matrix Anal. Appl., 31(4):1584–1605, 2010.
[38] L.-H. Zhang, L.-Z. Liao, and M. K. Ng. Superlinear convergence of a general algorithm for
the generalized Foley-Sammon discriminant analysis. J. Optim. Theory Appl., 157(3):853–865,
2013.