Maximization of the Sum of the Trace Ratio on the Stiefel Manifold

Lei-Hong Zhang*    Ren-Cang Li†

Technical Report 2013-04
http://www.uta.edu/math/preprint/

July 14, 2013

Abstract

We are concerned with the maximization of
$$\frac{\operatorname{tr}(V^{\top}AV)}{\operatorname{tr}(V^{\top}BV)} + \operatorname{tr}(V^{\top}CV)$$
over the Stiefel manifold $\{V \in \mathbb{R}^{m\times\ell} : V^{\top}V = I_{\ell}\}$ ($\ell < m$), where $B$ is a given symmetric and positive definite matrix, $A$ and $C$ are symmetric matrices, and $\operatorname{tr}(\,\cdot\,)$ is the trace of a square matrix. This is a subspace version of the maximization problem studied in [Zhang, Comput. Optim. Appl., 54 (2013), 111-139], which arises from real-world applications in, for example, the downlink of a multi-user MIMO system and the sparse Fisher discriminant analysis in pattern recognition. We establish necessary conditions for both the local and global maximizers and connect the problem with a nonlinear extreme eigenvalue problem. The necessary condition for the global maximizers offers deep insights into the problem on the one hand and leads to a self-consistent-field (SCF) iteration on the other hand. We analyze the global and local convergence of the SCF iteration, and show that the necessary condition for the global maximizers is fulfilled at any convergent point of the sequences of approximations generated by the SCF iteration. This is one of the advantages of the SCF iteration over optimization-based methods. Preliminary numerical tests are reported and show that the SCF iteration is much more efficient than manifold-based optimization methods.

Key words. Trace ratio, Rayleigh quotient, the Stiefel manifold, nonlinear eigenvalue problem, optimality conditions, Self-Consistent-Field iteration, eigenspace

AMS subject classifications. 65F15, 65F30, 62H30, 15A18

* Department of Applied Mathematics, Shanghai University of Finance and Economics, 777 Guoding Road, Shanghai 200433, People's Republic of China. Email: [email protected]. Supported in part by the National Natural Science Foundation of China NSFC-11101257 and the Basic Academic Discipline Program, the 11th five year plan of 211 Project for Shanghai University of Finance and Economics. Part of this work was done while this author was a visiting scholar at the Department of Mathematics, University of Texas at Arlington from February 2013 to January 2014.

† Department of Mathematics, University of Texas at Arlington, P.O. Box 19408, Arlington, TX 76019-0408, USA. Email: [email protected]. Supported in part by the National Science Foundation under Grant No. DMS-1115834.

Contents

1 Introduction  3
2 Preliminaries  4
  2.1 Angles between subspaces  4
  2.2 Unitarily invariant norm  5
3 First and second optimality conditions  6
  3.1 First-order optimality condition  7
  3.2 Second-order optimality condition  8
4 A necessary condition for local maximizers  11
5 A necessary condition for global maximizers  12
6 A scheme to make progress at a KKT point  17
7 Self-consistent-field iteration  20
  7.1 Self-Consistent-Field (SCF) Iteration  20
  7.2 A divergent example for SCF iteration  21
  7.3 Global convergence for the case C = ηB (η > 0)  22
  7.4 Local convergence for the general case  26
8 Numerical experiments  33
9 Concluding remarks and future research  36

1 Introduction

We consider the maximization problem:
$$\max_{V^{\top}V = I_{\ell}} \left\{ \frac{\operatorname{tr}(V^{\top}AV)}{\operatorname{tr}(V^{\top}BV)} + \operatorname{tr}(V^{\top}CV) \right\}, \qquad (1.1)$$
where $\operatorname{tr}(\,\cdot\,)$ stands for the trace of a square matrix, $A$, $B$, and $C \in \mathbb{R}^{m\times m}$ are real symmetric with $B$ positive definite, and the integer $\ell < m$.

The problem (1.1) for the case $\ell = 1$ was investigated in [36]. It can arise from real-world applications in, e.g., the downlink of a multi-user MIMO system [20] and the sparse Fisher discriminant analysis in pattern recognition [6, 8, 16, 36]. In [36], the optimality conditions for (1.1) with $\ell = 1$, including necessary conditions for local and global maximizers, are established. It is interesting to note that these conditions connect the optimization problem (1.1) with $\ell = 1$ to a nonlinear eigenvalue problem, which offers fresh insights into (1.1) and injects new ideas for designing efficient algorithms.

The present paper attempts to deal with the subspace version (1.1) and to generalize these interesting properties for $\ell = 1$ to $\ell > 1$. We point out that although the problem (1.1) with $\ell > 1$ and with $\ell = 1$ share certain similar properties, the subspace version is intrinsically harder, entailing new techniques for investigation. Our major contributions in this paper are the following.

1. Any critical point (aka KKT point) $V$ of (1.1) is a solution to a nonlinear eigenvalue problem
$$E(V)V = V\left[V^{\top}E(V)V\right], \qquad (1.2)$$
where $E(V) \in \mathbb{R}^{m\times m}$, to be given in section 3, is symmetric and depends on $V$.

2. A necessary condition for the critical point $V$ to be a local maximizer is that the largest eigenvalue of $V^{\top}E(V)V$ is no smaller than the $(2\ell)$th largest eigenvalue of $E(V)$. (Note the eigenvalues of $V^{\top}E(V)V$ are part of those of $E(V)$ because of (1.2).)

3. A necessary condition for the critical point $V$ to be a global maximizer is that $V$ is an orthonormal eigenbasis¹ of $E(V)$ corresponding to its $\ell$ largest eigenvalues.

This necessary condition in item 3, together with (1.2), lends itself to a self-consistent-field (SCF) iteration for computing a global maximizer.

4. We propose a scheme to march out of a KKT point $V$, local maximizers included, that does not satisfy the necessary condition for a global maximizer in the previous item.

5. We analyze the convergence behavior of the SCF iteration and prove a global convergence result for the special case $C = \eta B$ ($\eta > 0$). Locally linear and quadratic convergence for the general case is also established.

¹ We follow [26, p.242] in using the term eigenbasis to refer to a matrix $X$ whose columns are linearly independent and span an invariant subspace of another matrix $H$. $X$ is an orthonormal eigenbasis if its columns are orthonormal vectors.

We note that there is no guarantee that the SCF iteration will deliver a global maximizer at convergence, but rather a KKT point that satisfies the necessary condition for the global maximizers. This is actually an advantage of the SCF iteration over some optimization-based methods [1, 18] which in general only converge to a KKT point that may or may not satisfy the necessary condition for the global maximizers. Numerical tests are presented to show that the SCF iteration in general is much more efficient than some manifold-based optimization methods [2, 7], both in accuracy and in running time.

The rest of this paper is organized as follows. Section 2 collects some preliminaries that are needed in the convergence analysis of the SCF iteration.
Section 3 develops the first and second order optimality conditions for local maximizers using the traditional optimality conditions which relate (1.1) to a special nonlinear eigenvalue problem (1.2). In sections 4 and 5, further characterizations of a local maximizer (section 4) and global maximizer (section 5) in terms of the eigenvalues of the associated nonlinear eigenvalue problem are obtained. A scheme to break out of a local maximizer/KKT point is proposed in section 6. The SCF iteration and its convergence analysis are given in section 7. Section 8 reports our numerical experiments. Finally, we present our conclusions in Section 9. Notation. Rnˆm is the set of all n ˆ m real matrices, Rn “ Rnˆ1 , and R “ R1 . In (or simply I if its dimension is clear from the context) is the n ˆ n identity matrix. All vectors are column vectors and are in bold. For Z P Rmˆn , Z J denotes its transpose, RpZq is its column space, spanned by its column vectors. For Z P Rnˆn , eigpZq “ tλi pZq, 1 ď i ď mu is the set of the eigenvalues of Z, and if also all λi pZq are real, they will be arranged in descending order: λ1 pZq ě λ2 pZq ě ¨ ¨ ¨ ě λm pZq. The symmetric part of Z is sympZq :“ 12 pZ ` Z J q. }Z}2 , }Z}F , and ~Z~ are the spectral norm, the Frobenius norm, and a general unitarily invariant norm, respectively. 2 Preliminaries 2.1 Angles between subspaces Consider two subspaces X and Y of Rm and suppose k :“ dimpXq ď dimpYq “: ℓ. (2.1) Let X P Rmˆk and Y P Rmˆℓ be orthonormal basis matrices of X and Y, respectively, i.e., X J X “ Ik , X “ RpXq, and Y J Y “ Iℓ , Y “ RpY q, and denote by σj for 1 ď j ď k in descending order, i.e., σ1 ě ¨ ¨ ¨ ě σk , the singular values of Y J X. The k canonical angles θj pX, Yq from2 X to Y are defined by 0 ď θj pX, Yq :“ arccos σj ď 2 π 2 for 1 ď j ď k. If k “ ℓ, we may says that these angles are between X and Y. 4 (2.2) They are in ascending order, i.e., θ1 pX, Yq ď ¨ ¨ ¨ ď θk pX, Yq. Set ΘpX, Yq “ diagpθ1 pX, Yq, . . . , θk pX, Yqq. (2.3) It can be seen that angles so defined are independent of the orthonormal basis matrices X and Y which are not unique. A different way to define these angles is through orthogonal projections onto X and Y [31]. When k “ 1, i.e., X is a vector, there is only one canonical angle from X to Y and so we will simply write θpX, Yq. In what follows, we sometimes place a vector or matrix in one of or both arguments of θj p ¨ , ¨ q, θp ¨ , ¨ q, and Θp ¨ , ¨ q with the understanding that it is about the subspace spanned by the vector or the columns of the matrix argument. } sin ΘpX, Yq}2 defines a distance metric between X and Y. It can be proved that } sin ΘpX, Yq}2 “ sin θk pX, Yq “ }X J YK }2 , where YK is an orthonormal basis matrix of the orthogonal complement of Y. 2.2 Unitarily invariant norm A matrix norm ~¨~ is called a unitarily invariant norm on Cmˆn (the set of mˆn complex matrices) if it is a matrix norm and has the following two properties 1. ~XBY ~ “ ~B~ for all unitary matrices X and Y of apt sizes and B P Cmˆn . 2. ~B~ “ }B}2 , the spectral norm of B, if rankpBq “ 1. Two commonly used unitarily invariant norms are the spectral norm: the Frobenius norm: }B}2 “ max břj σj , 2 }B}F “ j σj , where σ1 , σ2 , . . . , σmintm,nu are the singular values of B. The trace norm ÿ }B}tr “ σj j is a unitarily invariant norm, too. In this article, for convenience, any ~ ¨ ~ we use is generic to matrix sizes in the sense that it applies to matrices of all sizes. 
Examples include the matrix spectral norm } ¨ }2 , the Frobenius norm } ¨ }F , and the trace norm. Two important properties of unitarily invariant norms are }X}2 ď ~X~, ~XY Z~ ď }X}2 ¨ ~Y ~ ¨ }Z}2 (2.4) for any matrices X, Y , and Z of compatible sizes. By [11, p.176], | trpXq| ď }X}tr ď n}X}2 for X P Cnˆn . 5 (2.5) Lemma 2.1 ([4]). Let H and M be two Hermitian matrices, and let S be a matrix of a compatible size as determined by the Sylvester equation HY ´ Y M “ S. If either eigpHq is in a close interval that contains no eigenvalue of M or vice versa, then the equation has a unique solution Y , and moreover ~Y ~ ď 1 ~S~, τ where τ “ min |µ ´ ω| over all µ P eigpM q and ω P eigpHq. 3 First and second optimality conditions To simplify our presentation, we introduce the following notations ϕA pV q :“ trpV J AV q, ϕB pV q :“ trpV J BV q, ϕC pV q :“ trpV J CV q, (3.1) for any V P Rmˆℓ . The objective function in (1.1) now can be written as f pV q :“ ϕA pV q ` ϕC pV q, ϕB pV q (3.2) subject to the constraint V J V “ Iℓ . Let Omˆℓ :“ tV P Rmˆℓ : V J V “ Iℓ u. We remark that maximizing f pV q over Omˆℓ is much difficult than maximizing each q mˆℓ . A classical result for term, i.e., ϕϕBA pV pV q and ϕC pV q individually on O max trpV J CV q V POmˆℓ (3.3) reveals that any solution V of (3.3) is an orthonormal eigenbasis associated with the ℓ largest eigenvalues of C and there is no local but non-global maximizer (i.e., any local maximizer is also a global maximizer). A recent result in [24] for the trace ratio problem max V POmˆℓ trpV J AV q trpV J BV q (3.4) shows that no local but non-global maximizer exists, either. This nice property ensures that any monotonic and convergent iteration is capable of finding a global maximizer for (3.3) or (3.4) numerically. However, when f pV q is to be maximized over Omˆℓ , the story is very different because local but non-global maximizers can happen as a numerical example for the case ℓ “ 1 in [36] shows. Often necessary and sufficient conditions for global maximizers of an optimization problem are generally difficult to establish. This is the case for maximizing f pV q here, too. In this section, we will first resort to the traditional approach to derive first-order and second-order optimality conditions. They usually follow from the classical Lagrange 6 multiplier theory (e.g., [18]), but considering the nice geometry properties of the underlying Riemannian manifold Omˆℓ , we prefer to treating the function f pV q as the restriction f|Omˆℓ pV q : Omˆℓ Ñ R that is to be maximized as an unconstrained optimization problem. For more discussions on the optimality conditions of the nonlinear programming on manifolds, the reader is referred to [1, 35] and the references therein. We can view Omˆℓ as an embedded submanifold (the Stiefel manifold, see e.g., [1, 3, 7, 10]) of the Euclidean space Rmˆℓ . The tangent space TV Omˆℓ at any V P Omˆℓ can be expressed as [1, p.42], [3, Theorem 1] TV Omˆℓ : “ tX P Rmˆℓ : X J V ` V J X “ 0u (3.5a) “ tX “ V K ` pIm ´ V V qJ : K “ ´K P R J J ℓˆℓ ,J P R mˆℓ u. (3.5b) On TV Omˆℓ , we introduce the standard inner product (or the Frobenius inner product) @ X, Y P TV Omˆℓ . xX, Y y “ trpX J Y q, It is known that the orthogonal projection of Z P Rmˆℓ onto the tangent space TV Omˆℓ at V is given by [1, (3.35)], [3, Corollary 1] ˆ J ˙ V Z ´ Z JV ΠT pZq : “ V ` pIm ´ V V J qZ (3.6a) 2 V JZ ` Z JV “ Z ´ V sympV J Zq P TV Omˆℓ . 
(3.6b) “Z ´V 2 3.1 First-order optimality condition Using the projection defined by (3.6), we can find the gradient of f|Omˆℓ pV q. In fact, „ ȷ 1 ϕA pV q Bf pV q “2 A ´B ` C V “ 2 EpV qV. (3.7) BV ϕB pV q rϕB pV qs2 looooooooooooooooooomooooooooooooooooooon “: EpV q Thus the gradient grad f|Omˆℓ pV q is given by (see e.g., [1, (3.37)]) ˆ ˙ ␣ “ J ‰( Bf pV q grad f|Omˆℓ pV q “ ΠT “ 2 EpV qV ´ V loooooomoooooon V EpV qV . BV (3.8) “: MV The first-order optimality condition grad f|Omˆℓ pV q “ 0 leads to Theorem 3.1. If V P Omˆℓ is a local maximizer of (1.1), then EpV qV “ V MV , (3.9) where EpV q is defined in (3.7), and MV is defined in (3.8). Therefore eigpMV q Ă eigpEpV qq, and V is an orthonormal eigenbasis of EpV q associated with its eigenvalues given by eigpMV q. 7 Remark 3.1. Under the condition of Theorem 3.1, trpMV q “ ϕC pV q. Since f pV Zq ” f pV q for any orthogonal matrix Z P Rℓˆℓ , if V is a local maximizer, so is V Z. That is any local maximizer V gives rise to a set of local maximizers tV Z : Z J Z “ Iℓ u and they all produce the same objective value. Within the set, there is one V0 that makes MV0 diagonal: MV0 “ diagpλπ1 pEpV0 qq, . . . , λπℓ pEpV0 qqq, V0 “ rvv 1 , . . . , v ℓ s, (3.10) where v i P Rm is the unit eigenvector corresponding to λπi pEpV0 qq, and λπ1 pEpV0 qq ě ¨ ¨ ¨ ě λπℓ pEpV0 qq. This observation is also reflected by the equation (3.9) which is equivalent to EpV Zq V Z “ V ZMV Z , for any given orthogonal matrix Z P Rℓˆℓ . 3.2 Second-order optimality condition Next we will establish the second-order optimality condition for (1.1). Set gpV q :“ grad f|Omˆℓ pV q. The second-order optimality condition relies on the Riemannian Hessian operator. For f|Omˆℓ pV q, the symmetric Riemannian Hessian operator at a point V P Omˆℓ hess f|Omˆℓ pV q : TV Omˆℓ Ñ TV Omˆℓ , X ÞÑ ▽X grad f|Omˆℓ pV q involves the so-called affine connection ▽. If the natural Riemannian connection [1, section 5.5] is used as the affine connection, the action of Riemannian Hessian of f|Omˆℓ pV q on a tangent “vector” X P TV Omˆℓ can be expressed by [1, Definition 5.5.1 and Proposition 5.3.2] hess f|Omˆℓ pV qrXs “ ΠT pDgpV qrXsq, (3.11) where DgpV qrXs represents the classical directional derivative at V P Omˆℓ along X. It is readily to derive, by (3.8), that ␣ ( 1 DgpV qrXs “ EpV qX ` DEpV qrXs V ´ XV J EpV qV 2 ␣ ( ´ V X J EpV qV ´ V V J DEpV qrXs V ´ V V J EpV qX “ ‰ “ EpV qX ´ V X J EpV qV ` V J EpV qX ␣ ( ´ XV J EpV qV ` pIm ´ V V J q DEpV qrXs V, (3.12) and DEpV qrXs “ ´ ” ı 4 trpV J AV q trpX J BV q 2 J J B trpX BV qA ` trpX AV qB ` rtrpV J BV qs2 rtrpV J BV qs3 8 trpV J AV q trpX J BV q trpX J BV qA ` trpX J AV qB B ´ 2 rtrpV J BV qs3 rtrpV J BV qs2 “: GpV, Xq. “4 (3.13) Thus, by (3.11) and by noting ΠT prIm ´ V V J sZ1 q “ pIm ´ V V J qZ1 for Z1 P Rmˆℓ , for Z2 “ Z2J P Rℓˆℓ , ΠT pV Z2 q “ 0 we have ” hess f|Omˆℓ pV qrXs “ 2 EpV qX ´ V sympV J EpV qXq ´ XV J EpV qV ı `V sympV J XV J EpV qV q ` GpV, XqV ´ V V J GpV, XqV . (3.14) Throughout the rest of this section, V P Omˆℓ is assumed to be a critical point (aka a KKT point), i.e., satisfying (3.9). We shall investigate the second-order optimality condition at V . To this end, we need to find out hess f|Omˆℓ pV qrX, Xs “ xhess f|Omˆℓ pV qrXs, Xy for X P TV Omˆℓ . Notice that for any symmetric matrix Z P Rℓˆℓ , it follows from (3.5a) that xX, V Zy “ trpX J V Zq “ trpZV J Xq “ ´ trpZX J V q “ ´ trpX J V Zq “ ´xX, V Zy, implying xX, V Zy “ 0. 
Therefore, from (3.14) we have xhess f|Omˆℓ pV qrXs, Xy “ 2xX, EpV qXy ´ 2xX, XV J EpV qV ` GpV, XqV y, (3.15) and consequently, we can state the second-order optimality condition as follows. Theorem 3.2. If V is a local maximizer of (1.1), then trpX J EpV qXq ´ trpXMV X J q ´ trpX J GpV, XqV q ď 0 for any X P TV Omˆℓ , (3.16) where GpV, Xq is defined in (3.13), and MV “ V J EpV qV in (3.8). On the other hand, if V P Omˆℓ satisfies (3.9) and if (3.16) is a strict inequality, then V is a strict local maximizer. Proof. The second-order necessary condition for V to be a local maximizer is [18, 35] xX, hess f|Omˆℓ pV qrXsy ď 0 for any X P TV Omˆℓ which yields (3.16) because of (3.15). The second part is the second-order sufficient optimality condition [35]. The second-order optimality condition in Theorem 3.2 is presented in terms of the tangent vector X P TV Omˆℓ . In the next theorem, we describe the optimality condition in terms of any J P Rmˆℓ . 9 Theorem 3.3. If V is a local maximizer of (1.1), then for all J P Rmˆℓ trpJ J EpV qJq ` trpV J JMV J J V q ´ trpJ J V MV V J Jq ´ trpJMV J J q `4 trpJ J rIm ´ V V J sBV q trpJ J rIm ´ V V J sCV q ď 0. (3.17) ϕB pV q On the other hand, if V P Omˆℓ satisfies (3.9) and if (3.17) is strict for any nonzero J P Rmˆℓ , then V is a strict local maximizer. Proof. By (3.5b), we know that ␣ ( TV Omˆℓ “ X “ V K ` pIm ´ V V J qJ : K “ ´K J P Rℓˆℓ , J P Rmˆℓ , i.e., X “ V K ` pIm ´ V V J qJ P TV Omˆℓ for any skew-symmetric K and arbitrary J P Rmˆℓ . For such an X, we have EpV qX “ EpV qV K ` EpV qJ ´ EpV qV V J J “ V MV K ` EpV qJ ´ V MV V J J, (3.18) where we have used (3.9), and trpX J EpV qXq “ trpK J MV Kq ` trpK J V J EpV qJq ´ trpK J MV V J Jq ` trpJ J rIm ´ V V J sEpV qJq “ trpK J MV Kq ` trpJ J rIm ´ V V J sEpV qJq “ trpK J MV Kq ` trpJ J EpV qJq ´ trpJ J V MV V J Jq. It can be verified that X J X “ K J K ` J J pIm ´ V V J qJ. Thus trpX J XMV q “ trpK J KMV q ` trpJ J JMV q ´ trpJ J V V J JMV q “ trpKMV K J q ` trpJ J JMV q ´ trpV J JMV J J V q “ trpK J MV Kq ` trpJ J JMV q ´ trpV J JMV J J V q, where we have used K “ ´K J for the last equality. Hence, trpX J EpV qXq ´ trpXMV X J q “ trpJ J EpV qJq ` trpV J JMV J J V q ´ trpJ J V MV V J Jq ´ trpJMV J J q. (3.19) Lastly, we compute trpX J GpV, XqV q as follows. 4ϕA pV qrtrpX J BV qs2 4 trpX J BV q trpX J AV q ´ rϕB pV qs3 ϕB pV q J 4 trpX BV q “ rϕA pV q trpX J BV q ´ ϕB pV q trpX J AV qs rϕB pV qs3 trpX J GpV, XqV q “ 10 4 trpX J BV q xX, rϕA pV qB ´ ϕB pV qAsV y rϕB pV qs3 4 trpX J BV q “ xX, rϕB pV qs2 rC ´ EpV qsV y rϕB pV qs3 4 trpX J BV q “ xX, rC ´ EpV qsV y. ϕB pV q “ (3.20) Note from (3.18) that xX, EpV qV y “ xV, V MV K ` EpV qJ ´ V MV V J Jy “ trpMV Kq ` trpJ J EpV qV q ´ trpMV V J Jq “ trpJ J V MV q ´ trpMV V J Jq “ 0, (3.21) and xX, CV y “ xV K ` pIm ´ V V J qJ, CV y “ trpJ J rIm ´ V V J sCV q, xX, BV y “ xV K ` pIm ´ V V J qJ, BV y “ trpJ J rIm ´ V V J sBV q which, together with (3.16), (3.19), (3.20) and (3.21), lead to (3.17). The second part follows directly from Theorem 3.2. 4 A necessary condition for local maximizers The purpose of this section is to establish a necessary condition for a local maximizer V of (1.1) in terms of the eigenvalues of EpV q. The same will be done for a global maximizer in section 5. Suppose V is a local maximizer of (1.1). According to Theorem 3.1 and Remark 3.1, eigpMV q Ă eigpEpV qq, i.e., eigpMV q “ tλπi pEpV qq, i “ 1, 2, . . . , ℓu, (4.1) where 1 ď π1 ă π2 ¨ ¨ ¨ ă πℓ ď m. Theorem 4.1. 
Let V P Omˆℓ be a local maximizer of (1.1), and denote eigpMV q by (4.1). Then λπ1 pEpV qq ě λ2ℓ pEpV qq. (4.2) Proof. Assume, to the contrary, that (4.2) were false, i.e., λπ1 pEpV qq ă λ2ℓ pEpV qq. (4.3) Let J1 P Rmˆ2ℓ be an orthonormal eigenbasis of EpV q corresponding to its 2ℓ largest eigenvalues λi pEpV qq for 1 ď i ď 2ℓ. Then J1J V “ 0 because V is the orthonormal eigenbasis of EpV q corresponding to its eigenvalues λπi pEpV qq for 1 ď i ď ℓ, all of which 11 were assumed less than λ2ℓ pEpV qq by (4.3). For any Q P R2ℓˆℓ with orthonormal columns, let J “ J1 Q P Rmˆℓ . Then J J J “ Iℓ and J J V “ QJ J1J V “ 0, and thus trpJ J EpV qJq ą trpV J EpV qV q “ trpMV q “ trpJ J JMV q “ trpJMV J J q. (4.4) Therefore, it follows from (3.17) that trpJ J EpV qJq ´ trpMV q ` 4 trpJ J BV q trpJ J CV q ď 0. ϕB pV q (4.5) On the other hand, we note that x J q, p Σ1 W J J BV “ QJ pJ1J BV q “ QJ pU p Σ1 W x J is the SVD of J J BV and U p P R2ℓˆℓ , Σ1 P Rℓˆℓ and W x P Rℓˆℓ . Since Q where U 1 p , Qs P R2ℓˆ2ℓ is orthogonal. We is arbitrary so far, we now let Q be the one such that rU have xJ “ 0 p Σ1 W J J BV “ QJ pJ1J BV q “ QJ U which leads to trpJ J EpV qJq ´ trpMV q ď 0, contradicting (4.4). Therefore (4.3) cannot be true. That leaves (4.2). Theorem 4.1 basically says that any local maximizer V will never be an orthonormal eigenbasis of EpV q associated with the ℓ eigenvalues all less than λ2ℓ pEpV qq. For the special case with ℓ “ 1, Theorem 4.1 implies that any local maximizer v P Rm can only be the eigenvector associated with the largest or the second largest eigenvalue of Epvv q. This is exactly what was proved in [36, Theorem 3.3], even though the matrix Epvv q in [36] takes a different form. Theorem 4.1 can be considered as a generalization of [36, Theorem 3.3] for the necessary condition of a local maximizer. 5 A necessary condition for global maximizers In this section and the next, C is assumed to be positive definite. This assumption is not essential because one can always shift C to C ` ξIm for some ξ P R to make C positive definite. The shift does not change the maximizer V . Theorem 5.1. Suppose C is positive definite. If V is a global maximizer of (1.1), then it must be an orthonormal eigenbasis3 of EpV q corresponding to its ℓ largest eigenvalues λi pEpV qq for 1 ď i ď ℓ. This result generalizes the necessary condition for the vector version [36, Theorem 3.5] of the problem (1.1), i.e., the case ℓ “ 1. Its proof, similar to [36], hinges on the relationship between ∆f pJ, V q :“ f pJq ´ f pV q and ∆EpJ, V q :“ trpJ J EpV qJq ´ trpV J EpV qV q. This is established in the following lemma. 3 RpV q is not unique unless λℓ pEpV qq ą λℓ`1 pEpV qq. 12 (5.1) Lemma 5.1. For any J, V P Omˆℓ , we have ∆f pJ, V q “ f pJq ´ f pV q ϕB pV q∆EpJ, V q ` rϕB pJq ´ ϕB pV qsrϕC pJq ´ ϕC pV qs “ . ϕB pJq (5.2) Proof. Note that trpV J EpV qV q “ ϕC pV q and ϕA pJq ϕA pV qϕB pJq ` ϕC pJq ´ ϕC pV q ´ ϕB pV q rϕB pV qs2 ˆ ˙ ϕB pJq ϕA pJq ϕA pV q “ ´ ` ϕC pJq ´ ϕC pV q ϕB pV q ϕB pJq ϕB pV q ‰ ϕB pJq “ “ ∆f pJ, V q ´ ϕC pJq ` ϕC pV q ` ϕC pJq ´ ϕC pV q ϕB pV q ϕB pJq ϕB pV q ´ ϕB pJq “ ∆f pJ, V q ` rϕC pJq ´ ϕC pV qs ϕB pV q ϕB pV q ∆EpJ, V q “ solving which for ∆f pJ, V q leads to (5.2). Rewrite (5.2) as ∆f pJ, V q “ f pJq ´ f pV q “ ϕB pV q∆EpJ, V q ` δpJ, V q , ϕB pJq (5.3) where δpJ, V q :“ rϕB pJq ´ ϕB pV qsrϕC pJq ´ ϕC pV qs. (5.4) We are now ready to prove Theorem 5.1. Proof of Theorem 5.1. We prove it by contradiction. Suppose V “ rvv 1 , . . . 
, v ℓ s is a global maximizer but is not an orthonormal eigenbasis associated with the ℓ largest eigenvalues of EpV q. We shall construct U P Omˆℓ such that f pU q ą f pV q, contradicting the assumption that V is a global maximizer. So the conclusion of the theorem holds. As we commented in Remark 3.1, we may assume, without loss of generality, that EpV qvv i “ λπi pEpV qqvv i , i “ 1, 2, . . . , ℓ, where pλπi pEpV qq, v i q is the πi th largest eigenpair of EpV q and λπ1 pEpV qq ě λπ2 pEpV qq ě . . . ě λπℓ pEpV qq. According to our assumption, we know that there exists at least one i such that λπi pEpV qq ă λi pEpV qq. Let p be the first such an i, i.e., p :“ min ti : λπi pEpV qq ă λi pEpV qqu. 1ďiďℓ 13 (5.5) There is a unit eigenvector u of EpV q such that4 u “ λp pEpV qqu u, EpV qu and u Jv i “ 0 for i “ 1, 2, . . . , ℓ. (5.6) Now, for any α, β P R, define u ` βvv p , v p`1 , . . . , v ℓ s Uα,β “ rvv 1 , v 2 , . . . , v p´1 , αu us e J “ V ´ rp1 ´ βqvv p ´ αu p looooooooomooooooooon “: y “V ´ yeJ p P Rmˆℓ , (5.7) where e p is the pth column of Im . It can be verified that Uα,β P Omˆℓ for all α, β P R satisfying α2 ` β 2 “ 1, and if also α ‰ 0, J ∆EpUα,β , V q “ trpUα,β EpV qUα,β q ´ trpMV q “ α2 rλp pEpV qq ´ λπp pEpV qqs ą 0. (5.8) The rest of the proof is to look for some α, β P R satisfying α2 ` β 2 “ 1 such that f pUα,β q ą f pV q. To this end, we recall (5.1). So, equivalently, we need to make ∆f pUα,β , V q ą 0. By (5.3) and (5.8), it suffices to find α, β P R, α2 ` β 2 “ 1, α ‰ 0, and δpUα,β , V q “ 0 which is equivalent to either ϕB pUα,β q “ ϕB pV q or ϕC pUα,β q “ ϕC pV q. We have J Uα,β V “ V J Uα,β “ Iℓ ´ p1 ´ βqeepe J p, J trpUα,β rIm J ´ V V sBV q “ “ J J trpUα,β BV q ´ ϕB pV q ` p1 ´ βq trpeepe J p V BV J ´ trpeep y J BV q ` p1 ´ βq trpeepe J p V BV q (5.9) q J “ ´yy J BV e p ` p1 ´ βqeeJ p V BV e p v p ` αu uJ Bvv p ` p1 ´ βqvv J vp “ ´p1 ´ βqvv J p Bv p Bv uJ Bvv p , “ αu (5.10) and similarly, J uJ Cvv p . trpUα,β rIm ´ V V J sCV q “ αu (5.11) Since V is a global maximizer, we have, by (5.9), (5.10), (5.11), and by applying Theorem 3.2 to Uα,β , J trpUα,β EpV qUα,β q ´ trpMV q ` 4 uJ Bvv p qpu uJ Cvv p q α2 pu ď 0. ϕB pV q (5.12) This inequality, together with (5.8), imply that uJ Bvv p qpu uJ Cvv p q ă 0. pu 4 Such u can be found even if λp “ λp´1 “ λπp´1 . 14 (5.13) On the other hand, we have, by (5.7), J ϕB pUα,β q “ trpUα,β BUα,β q “ ϕB pV q ´ 2 trpeepy J BV q ` trpeepy J Byye J pq “ ϕB pV q ´ 2yy J Bvv p ` y J Byy 2 J u Bu u ` 2αβu uJ Bvv p ` pβ 2 ´ 1qvv J vp , “ ϕB pV q ` α p Bv loooooooooooooooooooooooooomoooooooooooooooooooooooooon (5.14) “: hB pα,βq 2 J u ` 2αβu uJ Cvv p ` pβ 2 ´ 1qvv J vp . ϕC pUα,β q “ ϕC pV q ` α u Cu p Cv loooooooooooooooooooooooooomoooooooooooooooooooooooooon (5.15) “: hC pα,βq Because V is a global maximizer, we have from (5.3) that 0 ě f pU1,0 q ´ f pV q “ ϕB pV q∆EpU1,0 , V q ` δpU1,0 , V q . ϕB pU1,0 q (5.16) J EpV qU q ´ trpM q ą 0, and therefore, (5.16) implies By (5.8), ∆EpU1,0 , V q “ trpU1,0 1,0 V 0 ą δpU1,0 , V q “ rϕB pU1,0 q ´ ϕB pV qsrϕC pU1,0 q ´ ϕC pV qs “ hB p1, 0q hC p1, 0q uJ Bu u ´ vJ v p sru uJ Cu u ´ vJ v p s. “ ru p Bv p Cv (5.17) There are two possible cases implied by (5.17): " u ą vJ v p and u J Cu u ă vJ vp, Case 1 : u J Bu p Bv p Cv J J J u ă v p Bvv p and u Cu u ą vJ Case 2 : u Bu Cv p vp. (5.18) We will show that for each case, there exist real α ‰ 0 and real β ‰ 0 satisfying α2 `β 2 “ 1 and f pUα,β q ą f pV q. The solutions of either hB pα, βq “ 0 or hC pα, βq “ 0 turn out to be ones we are looking for. 
Suppose Case 1. Consider the equation hB pα, βq “ 0 which can be regarded as a quadratic equation of α with a parameter β. Since uJ Bvv p q2 ´ 4pβ 2 ´ 1qpu uJ Bu uqpvv J v p q ą 0 if |β| ă 1, 4pβu p Bv there are two roots uJ Bvv p ˘ ´βu αB,˘ pβq “ b uJ Bvv p q2 ´ pβ 2 ´ 1qpu uJ Bu uqpvv J vpq pβu p Bv u u J Bu . (5.19) We claim that 2 ψB pβq :“ αB,` pβq ` β 2 ´ 1 “ 0 has at least one nonzero root β̃ in p´1, 1q. To prove this claim, we note that ψB p0q “ vp vJ p Bv v p ). u ą vJ ´ 1 ă 0 (because Case 1 implies u J Bu p Bv J u u Bu 15 (5.20) On the other hand, from (5.13), we know u J Bvv p ‰ 0. Now uJ Bvv p 2u ą 0 if u J Bvv p ą 0, u u J Bu uJ Bvv p 2u ą 0 if u J Bvv p ă 0. ψB p`1q “ ´ J u Bu u ψB p´1q “ So there exists at least one β̃ in either p´1, 0q or p0, 1q, depending on wether uJ Bvv p ą 0 or u J Bvv p ă 0, such that ψB pβ̃q “ 0. For Case 2, we consider the equation hC pα, βq “ 0. The same argument, except changing B to5 C, concludes that there exists at least one β̃ in either p´1, 0q or p0, 1q, depending on wether u J Cvv p ą 0 or u J Cvv p ă 0, such that ψC pβ̃q “ 0, where ψC pβq :“ 2 pβq ` β 2 ´ 1, and αC,` b uJ Cvv p ` pβu uJ Cvv p q2 ´ pβ 2 ´ 1qpu uJ Cu uqpvv J vpq ´βu p Cv αC,` pβq “ . (5.21) u u J Cu Let α̃ “ αB,` pβ̃q for Case 1 and α̃ “ αC,` pβ̃q for Case 2. This α̃ ‰ 0 and α̃2 ` β̃ 2 “ 1. So ∆EpUα̃,β̃ , V q ą 0, and either ϕB pUα̃,β̃ q “ ϕB pV q for Case 1 or ϕC pUα̃,β̃ q “ ϕC pV q for Case 2, i.e., always δpUα̃,β̃ , V q “ 0. By (5.3), ∆f pUα̃,β̃ , V q “ f pUα̃,β̃ q ´ f pV q ą 0, i.e., f pUα̃,β̃ q ą f pV q which is a contradiction since V is a global maximizer. The proof is completed. As an interesting application of Theorem 5.1, we can give a characterization of a subset of the global maximizers in some special cases, including C “ ηB for some η ą 0. Theorem 5.2. Suppose C is positive definite. Let V be a global maximizer to (1.1) and E1 pEpV qq be the set of all orthonormal eigenbasis corresponding to the ℓ largest eigenvalues of EpV q. If δpJ, V q “ rϕB pJq ´ ϕB pV qsrϕC pJq ´ ϕC pV qs ě 0 for any J P E1 pEpV qq, (5.22) then J is a global maximizer of (1.1). Moreover, for any V̄ P E1 pEpV qq, we have either ϕB pV q “ ϕB pV̄ q or ϕC pV q “ ϕC pV̄ q. Proof. For any V̄ P E1 pEpV qq, we have ∆EpV̄ , V q “ 0 by (5.1). Use (5.22) to get f pV̄ q ´ f pV q “ δpV̄ , V q ϕB pV q∆EpV̄ , V q ` δpV̄ , V q “ ě 0, ϕB pV̄ q ϕB pV̄ q which implies V̄ is a global maximizer, and furthermore, δpV̄ , V q “ 0 implying either ϕB pV q “ ϕB pV̄ q or ϕC pV q “ ϕC pV̄ q. For the special case C “ ηB pη ą 0q, δpV̄ , V q “ ηrϕB pV q ´ ϕB pV̄ qs2 ě 0 for any V and V̄ of apt sizes. Therefore, by Theorem 5.2, if V is a global maximizer, then any V̄ P E1 pEpV qq is also a global maximizer. Moreover, ϕB pV q “ ϕB pV̄ q and ϕC pV q “ ϕC pV̄ q which imply ϕA pV q “ ϕA pV̄ q also. 5 This is where that C is positive definite is required. 16 6 A scheme to make progress at a KKT point As we mentioned at the beginning of section 5, C is assumed positive definite in this section, too. In parallel to the vector version of the problem (1.1) [36, Section 3.3], we suggest a strategy to select the starting point for any iterative method that is capable of converging to KKT points monotonically. We argue that, under a mild condition (Assumption (6.2) below), as long as the necessary condition in Theorem 5.1 for a global maximizer is not satisfied, such strategy can be applied to return another approximation with a larger objective value. 
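Both the construction in the proof of Theorem 5.1 and the strategy of this section require a nonzero root β̃ of ψB(β) = 0 (or ψC(β) = 0) in (−1, 0) or (0, 1). Since ψB changes sign on the relevant interval, any scalar root-finder can be used; the MATLAB fragment below is only a sketch of one way to do it for Case 1 of (5.18), with Case 2 handled identically after replacing B by C (cf. (5.21)). The helper name and the use of fzero are our choices, not part of the construction.

% Sketch only: compute (alpha, beta) with alpha^2 + beta^2 = 1 and h_B(alpha, beta) = 0,
% following (5.19)-(5.20).  Case 1 of (5.18) and u'*B*vp ~= 0 are assumed, so psi_B
% changes sign on the chosen interval.
function [alpha, beta] = root_of_psiB(u, vp, B)
  b11 = u'*B*u;  b12 = u'*B*vp;  b22 = vp'*B*vp;
  alphaB = @(beta) (-beta*b12 + sqrt((beta*b12)^2 - (beta^2 - 1)*b11*b22)) / b11;  % (5.19)
  psiB   = @(beta) alphaB(beta)^2 + beta^2 - 1;                                    % (5.20)
  if b12 < 0
      beta = fzero(psiB, [0, 1]);    % psi_B(0) < 0 in Case 1, psi_B(1) > 0 when u'*B*vp < 0
  else
      beta = fzero(psiB, [-1, 0]);   % psi_B(-1) > 0 when u'*B*vp > 0
  end
  alpha = alphaB(beta);              % then U_{alpha,beta} in (5.7) gives phi_B(U) = phi_B(V)
end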
Suppose V̄ is a KKT point, i.e., satisfying (3.9), but the eigenvalues of MV̄ “ V̄ J EpV̄ qV̄ do not consist of the ℓ largest eigenvalues of EpV̄ q; so V̄ is not a global maximizer, according to Theorem 5.1. Such a V̄ could be obtained, for example, by an iterative procedure which may no longer be able to make further progress but to stop. Owing to the constructive proof of Theorem 5.1 in the previous section, we will explain an idea to break out of the KKT point to a new point Vr0 that satisfies f pVr0 q ą f pV̄ q. Thus the iterative procedure that got to V̄ in the first place can continue at the new point Vr0 to the next KKT point. Let the eigen-decomposition of MV̄ be MV̄ “ U diagpλπ1 pEpV̄ qq, . . . , λπℓ pEpV̄ qqqU J , (6.1a) λπ1 pEpV̄ qq ě ¨ ¨ ¨ ě λπℓ pEpV̄ qq. (6.1b) Set V “ V̄ U , and note EpV̄ q “ EpV q. Define p and u by (5.5) and (5.6), respectively. According to the proof of Theorem 5.1, if uJ Bu u ´ vJ v p qpu uJ Cu u ´ vJ v p q, 0 ď δpU1,0 , V q “ pu p Bv p Cv then f pU1,0 q ą f pV q “ f pV̄ q, where U1,0 “ rvv 1 , . . . , v p´1 , u , v p`1 , . . . , v ℓ s. So we may simply take Vr0 “ U1,0 . Otherwise, suppose δpU1,0 , V q ă 0. This is the situation where we need the following assumption6 that should be often true. uJ Bvv p qpu uJ Cvv p q ‰ 0. Assumption: pu (6.2) That δpU1,0 , V q ă 0 results in two mutually exclusive cases given in (5.18): 6 uJ Bvv p qpu uJ Cvv p q ă 0, only its implication Recall (5.13) in the proof of Theorem 5.1, where although pu J J that both u Bvv p and u Cvv p are nonzero were used later in the proof. 17 2 pβ̃q ` β̃ 2 ´ 1 “ 0 for • For Case 1, solve ψB pβ̃q :“ αB,` # β̃ P p 0, 1q, β̃ P p´1, 0q, if u J Bvv p ă 0, if u J Bvv p ą 0, (6.3) where αB,` pβq is given by (5.19); 2 pβ̃q ` β̃ 2 ´ 1 “ 0 for • For Case 2, solve ψC pβ̃q :“ αC,` # β̃ P p 0, 1q, β̃ P p´1, 0q, if u J Cvv p ă 0, if u J Cvv p ą 0, (6.4) where αC,` pβq is given by (5.21). Finally, let α̃ “ αB,` pβ̃q for Case 1 and α̃ “ αC,` pβ̃q for Case 2, respectively, and take Vr0 “ Uα̃,β̃ , where Uα,β is defined by (5.7). By the proof of Theorem 5.1, f pVr0 q “ f pUα̃,β̃ q ą f pV q “ f pV̄ q. Conceivably, we may seek Vr0 by maximizing f pUα,β q over α, β P R subject to α2 `β 2 “ 1 because a better Vr0 in terms of a larger value of the objective function will be gotten than Uα̃,β̃ we just defined. It turns out that maximizing f pUα,β q is not trivial and may have to be solved iteratively, too. The above construction requires (6.2). Can it fail? The following proposition says it cannot if V̄ is a local but non-global maximizer. Proposition 6.1. Suppose C is positive definite, and let V̄ be a local but non-global maximizer V̄ of (1.1). Let the eigen-decomposition of MV̄ be given by (6.1), and set V “ V̄ U “ rvv 1 , . . . , v ℓ s, p :“ min ti : λπi pEpV̄ qq ă λi pEpV̄ qqu, 1ďiďℓ u “ λp pEpV̄ qqu u EpV̄ qu and V̄ Ju “ 0. uJ Bvv p qpu uJ Cvv p q ă 0. As a result, (6.2) holds. Then pu Proof. By Theorem 5.1, we know that 1 ď p ď ℓ. For any two α, β P R with α2 ` β 2 “ 1, let Uα,β P Omˆℓ be as defined in (5.7). Since V is also a local but non-global maximizer, by (5.9), (5.10) and (5.11) and by applying Theorem 3.2 to Uα,β , we have the inequality J EpV qU (5.12). By (5.8), trpUα,β α,β q ´ trpMV q ą 0 for 0 ă |α| ă 1. Now use (5.12) to J J u Bvv p qpu u Cvv p q ă 0. conclude pu In general for a KKT point that is not a local maximizer, we do not know if the assumpuJ Bvv p qpu uJ Cvv p q “ 0 tion (6.2) fail or not. But we propose the following workaround. If pu uJ Bvv p qpu uJ Cvv p q is nonzero. 
This proand p ă ℓ, we can simply let p :“ p ` 1 and check if pu J J u Bvv p qpu u Cvv p q ‰ 0 is encountered, or p “ ℓ but still pu uJ Bvv p qpu uJ Cvv p q “ cess repeats until pu 18 Algorithm 6.1 Breaking out of a KKT point Given A KKT point V̄ which is not an orthonormal eigenbasis corresponding to the ℓ largest eigenvalues of EpV̄ q, this algorithm computes Vr0 P Omˆℓ such that f pVr0 q ě f pV̄ q. 1: compute the eigen-decomposition of MV̄ as in (6.1), and let V “ V̄ U ; 2: compute p and u by (5.5) and (5.6), respectively; uJ Bvv p qpu uJ Cvv p q “ 0 and p ď ℓ do 3: while pu 4: p :“ p ` 1; 5: end while 6: if p ď ℓ then if 0 ď δpU1,0 , V q then 7: 8: Vr0 “ U1,0 ; 9: else 10: solve (6.3) in Case 1 or (6.4) in Case 2 for β̃; 11: α̃ “ αB,` pβ̃q if Case 1 or α̃ “ αC,` pβ̃q if Case 2; 12: Vr0 “ Uα̃,β̃ ; 13: end if 14: else 15: Vr0 “ V̄ ; /* no improvement is made */ 16: end if 0, the worst case for which the workaround fails, too. But that should be rare. The complete algorithm, including this workaround, is summarized in Algorithm 6.1. The above strategy requires solving either ψB pβq “ 0 or ψC pβq “ 0. Both equations can be turned into quadratic equations in β 2 of the form: α4 β 4 ` α2 β 2 ` α0 “ 0, (6.5) where for ψX pβq “ 0 with X “ B or C, the coefficients of (6.5) are given by u, x11 “ u J Xu vp, x22 “ v J p Xv x12 “ u J Xvv p , η “ x212 ´ x11 x22 , α4 “ 4x12 η ´ px211 ` x212 ` ηq2 , α2 “ 4x11 x12 x22 ` 2x11 px11 ´ x22 qpx211 ` x212 ` ηq, α0 “ ´x211 px11 ´ x22 q2 . Depending on the cases, one can solve (6.5) for the right β. Note α22 ´ 4α0 α4 “ 16x211 x212 rx12 px211 ´ x222 q ` x222 s. Alternatively, one can certainly employ some iterative methods such as the bisection or the Newton iteration to solve ψX pβq “ 0 for β. 19 Algorithm 7.1 A Self-Consistent-Field iteration Given V0 P Omˆℓ and a tolerance tol, this algorithm computes an approximate maximizer for the optimization problem (1.1). 1: for k “ 0, 1, . . . do 2: compute an orthonormal eigenbasis Vk`1 of EpVk q associated with its ℓ largest eigenvalues; 3: if rk :“ }EpVk`1 qVk`1 ´ Vk`1 MVk`1 }2 ď tol then BREAK; 4: 5: end if 6: end for 7: return Vk`1 as an approximate maximizer. 7 Self-consistent-field iteration 7.1 Self-Consistent-Field (SCF) Iteration Theorem 5.1, on one hand, sheds some lights on the optimization problem (1.1) by connecting it to a special nonlinear extreme eigenvalue problem EpV qV “ V MV , eigpMV q “ tλi pEpV qq, i “ 1, 2, . . . , ℓu. (7.1) On the other hand, it naturally lends itself to Algorithm 7.1, a Self-Consistent-Field (SCF) iterative method for solving (1.1). Several remarks are in order for the algorithm. 1. The SCF iteration is currently one of the most widely used algorithms for solving the Kohn-Sham equations in electronic structure calculations (see e.g., [14, 23, 28, 29, 32, 34]). In dimension reduction and feature extraction, the effective algorithm discussed in [17, 30, 37, 38] is also an SCF iteration for the trace quotient (or trace ratio) optimization problem. 2. The quantity rk defined at Line 3 serves as the residual for the approximate Vk`1 of the nonlinear eigenvalue problem (7.1); 3. If the sequence tVk u converges to V̄ , then not only V̄ is a KKT point of (1.1), but also satisfies the necessary condition in Theorem 5.1 for a global maximizer. 
This is one of the major advantages of the SCF iteration over optimization-based methods in [1, 7, 18] which primarily concern the monotonic change of the objective value and converge to a KKT point that may or may not satisfy the necessary condition in Theorem 5.1. Because of this, conceivably the SCF iteration is more likely to achieve a global maximizer than any optimization-based method. 4. The major computational cost of Algorithm 7.1 lies at Line 2, where a dominant orthonormal eigenbasis of m ˆ ℓ for EpVk q has to be computed every iteration. One may employ any state-of-the-art eigensolver, for example, the QR algorithm [5, 9] if m is modest. But the QR algorithm require Opm3 q flops and Opm2 q storage, thus 20 it is impractical for large m. In the latter case, certain iterative method should be used, for example, the Lanczos method [19, 21], the Jacobi-Davidson iteration [25], the conjugate gradient type method [12], to name a few. 7.2 A divergent example for SCF iteration Originally, the SCF iteration is referred to the one for solving the Kohn-Sham equation in electronic structure calculations [22, 33]. The Kohn-Sham equation is a nonlinear eigenvalue problem in PDE which after certain disretization becomes a nonlinear matrix eigenvalue problem whose matrix depends on the eigenvectors to be computed, as oppose to the other type of nonlinear matrix eigenvalue problems whose matrices depend on the eigenvalues to be computed. This distinguishing feature of dependency on the eigenvectors is shared by the current nonlinear eigenvalue problem (7.1), making the SCF iteration a natural method to use. While the SCF iteration is simple to implement, its convergence in general is not well understood, depending closely on the targeted nonlinear eigenvalue problem. For the trace ratio optimization problem (the case C “ 0) in the linear discriminant analysis (LDA) [37], it is shown that SCF globally converges to a global maximizer regardless the initial guess V0 , and the convergence is locally quadratic under a generic condition. However, in [32] an artificial nonlinear eigenvalue problem is constructed to show that the SCF iteration may generate a sequence of approximate solutions that contain two convergent subsequences alternating each other neither of which converges to the solution for the nonlinear eigenvalue problem. For our problem (7.1), in general SCF can produce sequences with no subsequences converging to a KKT point as the following example illustrates. Example 7.1. Let m “ 3, ℓ “ 2 and » fi » fi » fi 15 10 9 7 7 7 11 5 8 A “ – 5 10 9fl , B “ –7 10 8fl and C “ –10 7 6fl . 7 8 8 9 6 6 8 9 5 With V0 “ ree1 , e 2 s, the sequences tf pVk qu and trk u generated by the SCF iteration oscillate and diverge (see Figure 7.1). We observe that there are two convergent subsequences (see Figure 7.1) of tf pVk qu, but neither approaches a solution to (1.1). This example compromises the possibility of establishing a global convergence analysis for the general case based on the monotonic increase of the objective values. 3 There are some strategies to curb the oscillation behavior of SCF, for example, a trustregion SCF iteration, which has shown some effectiveness in the case of the Kohn-Sham equation [23, 29, 34], but we will not pursue this here but in our future work. Despite that in general there isn’t much we can say about SCF’s global convergence behavior, for certain special cases we do know more. One particular example is the case when C “ 0 investigated in [37]. 
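For reference, here is a minimal MATLAB sketch of Algorithm 7.1 for small dense problems; the function name, the symmetrization step, the use of eig, and the iteration cap are our choices rather than part of the algorithm. Running it with the data of Example 7.1 and V0 = [e1, e2] is one way to observe the oscillation of f(Vk) and rk reported above; for large m an iterative eigensolver (e.g., eigs) would replace eig at Line 2, as noted in remark 4.

% Minimal sketch of Algorithm 7.1 (SCF); A, B, C symmetric with B positive definite,
% V0 with orthonormal columns.  Not tuned for large m.
function [V, fval, rk] = scf_trace_ratio(A, B, C, V0, tol, maxit)
  EofV = @(V) A/trace(V'*B*V) - (trace(V'*A*V)/trace(V'*B*V)^2)*B + C;   % E(V) in (3.7)
  V = V0;  ell = size(V0, 2);
  for k = 1:maxit
      E = EofV(V);  E = (E + E')/2;          % symmetrize against rounding
      [U, D] = eig(E);
      [~, idx] = sort(diag(D), 'descend');
      V = U(:, idx(1:ell));                  % orthonormal eigenbasis for the ell largest eigenvalues
      Enew = EofV(V);  Enew = (Enew + Enew')/2;
      rk = norm(Enew*V - V*(V'*Enew*V));     % residual r_k as at Line 3 of Algorithm 7.1
      if rk <= tol, break; end
  end
  fval = trace(V'*A*V)/trace(V'*B*V) + trace(V'*C*V);    % objective f(V) of (1.1)
end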
In the next subsection, we will add another example to the list of special cases by establishing a global convergence result for the case C “ ηB pη ą 0q. 21 28.689 0.18 28.6885 0.16 0.14 28.688 rk f(Vk) 0.12 28.6875 0.1 28.687 0.08 28.6865 0.06 28.686 28.6855 0.04 0 5 10 15 iteration k 20 25 0.02 30 0 5 10 15 iteration k 20 25 30 Figure 7.1: f pVk q (left) and rk (right) by SCF with V0 “ ree1 , e 2 s for Example 7.1 7.3 Global convergence for the case C “ ηB pη ą 0q Lemma 7.1. Suppose V, V̄ P Omˆℓ . Then there exists an orthogonal matrix Z P Rℓˆℓ such that ? ~ sin ΘpV, V̄ q~ ď }V Z ´ V̄ }p ď 2 ~ sin ΘpV, V̄ q~, (7.2) for any unitarily invariant norm ~ ¨ ~. Proof. Suppose for the moment that 2ℓ ď m. There exist orthogonal matrices Q P Rmˆm , Zi P Rℓˆℓ such that [27, p.40] ℓ ℓ QV Z1 “ ℓ m´2ℓ » ℓ fi I – 0 fl, 0 ℓ QV̄ Z2 “ ℓ m´2ℓ » fi Γ – Σ fl, 0 where Γ “ diagpγ1 , . . . , γℓ q, γ1 ě ¨ ¨ ¨ ě γℓ ě 0, Σ “ diagpσ1 , . . . , σℓ q, 0 ď σ1 ď ¨ ¨ ¨ ď σℓ , γi “ cos θi pV, V̄ q, σi “ sin θi pV, V̄ q, for 1 ď i ď ℓ. Take Z “ Z1 Z2J . The singular values of V Z ´ V̄ are b θi p1 ´ cos θi q2 ` sin2 θi “ 2 sin for 1 ď i ď ℓ, 2 where θi “ θi pV, V̄ q. The inequalities in (7.2) are now simple consequences, upon noticing sin θ ď 2 sin ? θ sin θ “ ď 2 sin θ 2 cos θ{2 for 0 ď θ ď π . 2 The case for 2ℓ ą m can be dealt with in the same way, except using [27, p.41] m´ℓ m´ℓ QV Z1 “ 2ℓ´m m´ℓ » I – 0 0 m´ℓ 2ℓ´m 0 I 0 fi fl, m´ℓ QV̄ Z2 “ 2ℓ´m m´ℓ 22 » Γ – 0 Σ 2ℓ´m 0 I 0 fi fl, where Γ “ diagpγ1 , . . . , γm´ℓ q, γ1 ě ¨ ¨ ¨ ě γm´ℓ ě 0, Σ “ diagpσ1 , . . . , σm´ℓ q, 0 ď σ1 ď ¨ ¨ ¨ ď σm´ℓ , γi “ cos θ2ℓ´m`i pV, V̄ q, σi “ sin θ2ℓ´m`i pV, V̄ q, for 1 ď i ď m ´ ℓ, and θi pV, V̄ q “ 0 for 1 ď i ď 2ℓ ´ m. The inequality (7.3) below can also be deduced from [13, Theorem 2.1], but since (7.3) is much simpler to prove, a short proof is given for self-containedness. Lemma 7.2. Let H P Rmˆm be a symmetric matrix. For V, V̄ P Omˆℓ , we have ? ? | trpV̄ J H V̄ q ´ trpV J HV q| ď 2 2 }H}2 ϵ̃ ď 2 2 ℓ}H}2 ϵ. (7.3) where ϵ̃ “ } sin ΘpV, V̄ q}tr , ϵ “ } sin ΘpV, V̄ q}2 . (7.4) In particular, for the function f defined by (3.2), lim f pV q “ f pV̄ q. (7.5) } sin ΘpV̄ ,V q}2 Ñ0 Proof. Let Z be the one in Lemma 7.1, and ∆V “ V̄ ´ V Z. We have trpV̄ J H V̄ q ´ trpV J HV q “ trpV̄ J H V̄ q ´ trpZ J V J HV Zq “ trpV̄ J H V̄ q ´ trpV̄ J HV Zq ` trpV̄ J HV Zq ´ trpZ J V J HV Zq “ trpV̄ J H∆V q ` trpr∆V sJ HV Zq. Therefore use (2.4) and (2.5) to get | trpV̄ J H V̄ q ´ trpV J HV q| ď }V̄ J H∆V }tr ` }r∆V sJ HV Z}tr ď 2}H}2 }∆V }tr which is the first inequality in (7.3). The second inequality there holds because ϵ̃ ď ℓϵ. Theorem 7.1. Suppose A P Rmˆm is symmetric and C “ ηB P Rmˆm pη ą 0q is symmetric and positive definite. Let tVk u be the sequence generated by Algorithm 7.1 with an arbitrarily given V0 P Omˆℓ . Let Vk “ RpVk q, the column space of Vk . Then the following statements hold. 1. tVk u has a convergent subsequence tVkj u in the sense that there is an ℓ-dimensional subspace V̄ Ă Rm such that lim } sin ΘpVkj , V̄q}2 “ 0. jÑ8 (7.6) 2. The sequence tf pVk qu monotonically increases and converges to f pV̄ q, where V̄ is an orthonormal basis matrix of V̄. 23 3. V̄ is an invariant subspace of EpV̄ q corresponding to its ℓ largest eigenvalues. In the other word, the orthonormal basis matrix of any limit point of tVk u satisfies the necessary condition given in Theorem 5.1. 4. 
If λℓ pEpV̄ qq ą λℓ`1 pEpV̄ qq, i.e., the dominant ℓ-dimensional eigenspace of EpV̄ q is unique, then for any positive integer q, we have lim } sin ΘpVkj ˘q , V̄ q}2 “ 0. jÑ8 (7.7) As an interesting consequence, if max |kj`1 ´ kj | over all 1 ď j ď 8 is finite, then } sin ΘpVk , V̄ q}2 Ñ 0 as k Ñ 8. Proof. All subspaces of dimension ℓ in Rm form a Grassmann manifold which is compact [15, p.57] with the metric given by } sin Θp ¨ , ¨ q}2 . tVk u is a sequence on the manifold and thus has a convergent subsequence tVkj u that converges to an ℓ-dimensional subspace V̄ Ă Rm in the sense of (7.6). This is item 1. Before we prove item 2, we notice that C “ ηB leads to MV “ V J EpV qV “ V J CV “ ηV J BV for any V P Omˆℓ . (7.8) Now we have from (5.3) ∆fk : “ f pVk`1 q ´ f pVk q “ ‰ 2 J EpV qV ϕB pVk q trpVk`1 k k`1 q ´ ηϕB pVk q ` η rϕB pVk`1 q ´ ϕB pVk qs “ . ϕB pVk`1 q (7.9) Because Vk`1 is an orthonormal eigenbasis of EpVk q corresponding to its ℓ largest eigenvalues, implying J trpVk`1 EpVk qVk`1 q ě trpVkJ EpVk qVk q “ ηϕB pVk q by (7.8), it follows that ∆fk ě 0. Thus tf pVk qu is nondecreasing and convergent since t|f pVk q|u is bounded. Since } sin ΘpVkj , V̄ q}2 Ñ 0 as j Ñ 8, by (7.5) we conclude that lim f pVk q “ lim f pVkj q “ f pV̄ q. jÑ8 kÑ8 This proves item 2. Making use of ∆fk Ñ 0, we conclude from (7.9) J lim rtrpVk`1 EpVk qVk`1 q ´ ηϕB pVk qs “ 0, (7.10) lim rϕB pVk`1 q ´ ϕB pVk qs “ 0. (7.11) kÑ8 kÑ8 Again since } sin ΘpVkj , V̄ q}2 Ñ 0 as j Ñ 8, by (7.5) and (7.11) we have lim ϕB pVkj `1 q “ lim ϕB pVkj q “ ϕB pV̄ q, jÑ8 jÑ8 24 (7.12) lim ϕA pVkj q “ ϕA pV̄ q. jÑ8 (7.13) Combine (7.10) with (7.12) to get lim trpVkJj `1 EpVkj qVkj `1 q “ η lim ϕB pVkj q “ ηϕB pV̄ q. jÑ8 jÑ8 (7.14) According to the SCF iteration, we have Ăk`1 EpVk qVk`1 “ Vk`1 M with Ăk`1 “ V J EpVk qVk`1 . M k`1 Notice that ∆Ek : “ EpVk q ´ EpV̄ q ˆ ˙ ˆ ˙ 1 1 ϕA pV̄ q ϕA pVk q “ ´ A` ´ B. ϕB pVk q ϕB pV̄ q rϕB pV̄ qs2 rϕB pVk qs2 Use (7.12) and (7.13) to see lim ∆Ekj “ 0. (7.15) jÑ8 Therefore, by the continuity of eigenvalues with respect to matrix entries [27], we know Ăk `1 q “ lim lim trpM j jÑ8 jÑ8 ℓ ÿ λs pEpVkj qq “ ℓ ÿ λs pEpV̄ qq. (7.16) s“1 s“1 Together, (7.14) and (7.16) yield Ăk `1 q “ lim trpM j jÑ8 ℓ ÿ λs pEpV̄ qq “ ηϕB pV̄ q. (7.17) s“1 This shows that ηϕB pV̄ q is the sum of the ℓ largest eigenvalues of EpV̄ q. Therefore by (7.8), ℓ ÿ trpV̄ J EpV̄ qV̄ q “ ηϕB pV̄ q “ λs pEpV̄ qq. s“1 So V̄ must be an orthonormal eigenbasis of EpV̄ q corresponding to its ℓ largest eigenvalues. This proves item 3. We prove item 4 by induction on q. The assumption that λℓ pEpV̄ qq ą λℓ`1 pEpV̄ qq implies that the eigenspace of EpV̄ q associated with its first ℓ largest eigenvalues is unique, i.e., V̄ is unique even though its orthonormal basis matrix V̄ is not. EpVkj q Ñ EpV̄ q as j Ñ 8 according to (7.15) and thus for sufficiently large j, λℓ pEpVkj qq ą λℓ`1 pEpVkj qq and the eigenspace Vkj `1 of EpVkj q associated with its first ℓ largest eigenvalues is unique and converges to V̄ by Davis-Kahan sin 2θ theorem [4], i.e., (7.7) holds for q “ 1. Suppose now (7.7) holds for q, and we need to show it is also true for q ` 1. To see this, we simply rename kj ` q to kj and apply the argument we just did. 25 7.4 Local convergence for the general case We now will investigate the local convergence behavior of the SCF iteration for the general case. Throughout this subsection, ϵ̃ “ } sin ΘpV, V̄ q}tr , ϵ “ } sin ΘpV, V̄ q}2 , (7.4) where V, V̄ P Omˆℓ . Then ϵ ď ϵ̃ ď ℓϵ. RA and RB as defined by (7.19) below with H P tA, Bu. 
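When checking the bounds of this subsection numerically, the quantities ϵ, ϵ̃ and RH are easy to form. A small MATLAB helper is sketched below under the assumption that V and V̄ have orthonormal columns of the same dimension ℓ; it uses the SVD characterization of the canonical angles from section 2.1, and the helper name is ours.

% Sketch: sines of the canonical angles between R(V) and R(Vbar) (section 2.1)
% and the residual R_H of (7.19), assuming V, Vbar in O^{m x ell} and H symmetric.
function [sinTheta, RH] = sines_and_residual(V, Vbar, H)
  c = svd(Vbar'*V);                        % cosines of the canonical angles, cf. (2.2)
  c = min(max(c, 0), 1);                   % guard against rounding outside [0, 1]
  sinTheta = sqrt(1 - c.^2);               % sin(theta_i); eps = max entry, eps-tilde = sum
  RH = H*Vbar - Vbar*(Vbar'*H*Vbar);       % R_H = H*Vbar - Vbar*H_1 in (7.19)
end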
Lemma 7.3 is a refinement of Lemma 7.2 when RpV̄ q is an approximate invariant subspace of H. Lemma 7.3. Let H P Rmˆm be a symmetric matrix. For V, V̄ P Omˆℓ , we have ? |ϕH pV̄ q ´ ϕH pV q| ď p2 2 }RH }2 ` 4}H}2 ϵqϵ̃, (7.18) where ϕH pV q :“ V J HV , and RH “ H V̄ ´ V̄ ploomoon V̄ J H V̄ q. (7.19) “:H1 Proof. Let Z P Rℓˆℓ be the one in Lemma 7.1, and ∆V “ V̄ ´ V Z. We have |ϕH pV̄ q ´ ϕH pV q| “ |ϕH pV̄ q ´ ϕH pV Zq| ď | trp∆V J H V̄ q ` trpV̄ J H∆V q| ` | trp∆V J H∆V q|. (7.20) Use H V̄ “ RH ` V̄ H1 to get trp∆V J H V̄ q ` trpV̄ J H∆V q J “ trp∆V J RH q ` trp∆V J V̄ H1 q ` trpRH ∆V q ` trpH1 V̄ J ∆V q “ 2 trp∆V J RH q ` trp∆V J V̄ H1 q ` trpV̄ J ∆V H1 q “ 2 trp∆V J RH q ` trpr∆V J V̄ ` V̄ J ∆V sH1 q “ 2 trp∆V J RH q ` trp∆V J ∆V H1 q, (7.21) where we have used ∆V J V̄ ` V̄ J ∆V “ ∆V J ∆V which can be verified by substituting in ∆V “ V̄ ´ V Z. Combine (7.21) with (7.20) to get |ϕH pV̄ q ´ ϕH pV q| ď 2| trp∆V J RH q| ` | trp∆V J ∆V H1 q| ` | trp∆V J H∆V q| ď 2}∆V J RH }tr ` }∆V J ∆V H1 }tr ` }∆V J H∆V }tr (by (2.5)) ď 2}∆V }tr }RH }2 ` }∆V }2 }∆V }tr p}H1 }2 ` }H}2 q, and then use Lemma 7.1 and }H1 }2 ď }H}2 to conclude (7.18). Note that RH “ 0 is equivalent to } sin ΘpV̄ , H V̄ q}2 “ 0. To establish the main theorem, Theorem 7.2, of this subsection, we additionally introduce the following notations ωB :“ m ÿ λj pBq, ΩB :“ ℓ ÿ j“1 j“m´ℓ`1 26 λj pBq. (7.22) Recall that B is positive definite but A may be indefinite. We have for any V P Omˆℓ , $ˇ ˇ ˇ ˇ, ˇ ˇ ℓ m &ˇˇ ÿ ˇ. ˇ ˇÿ ˇ 0 ă ωB ď ϕB pV q ď ΩB and |ϕA pV q| ď ΩA :“ max ˇˇ λj pAqˇˇ , ˇ λj pAqˇ . %ˇ ˇ ˇˇ j“1 j“m´ℓ`1 (7.23) Lemma 7.4. For V, V̄ P Omˆℓ , we have ˇ ˇ ˇ 1 2ℓ}B}2 1 ˇˇ 2}B}2 ˇ ˇ ϕB pV q ´ ϕ pV̄ q ˇ ď ω 2 ϵ̃ ď ω 2 ϵ, B B “?B ‰ “? ‰ ˇ ˇ ˇ 1 ˇ 2 2 }RB }2 ` 2}B}2 ϵ 2ℓ 2 }RB }2 ` 2}B}2 ϵ 1 ˇ ˇ ϵ̃ ď ϵ, ˇ ϕB pV q ´ ϕ pV̄ q ˇ ď ω2 ω2 B B (7.24a) (7.24b) B and ˇ ˇ ˇ ϕA pV̄ q ϕA pV q ˇˇ ˇ ˇ rϕ pV̄ qs2 ´ rϕB pV qs2 ˇ ď B ˇ ˇ ˇ ϕA pV̄ q ϕA pV q ˇˇ ˇ ˇ rϕ pV̄ qs2 ´ rϕB pV qs2 ˇ ď B ‰ 2 “ 2 2ΩA ΩB }B}2 ` ΩB }A}2 ϵ̃, 4 ωB 2 ” ? 2 4 2 2pΩB }RA }2 ` 2ΩA ΩB }RB }2 q ωB ı 2 `2pΩB }A}2 ` 2ΩA ΩB }B}2 qϵ ϵ̃. (7.25a) (7.25b) Proof. By Lemmas 7.2 and 7.3, we can write for H P tA, Bu ϕH pV q “ ϕH pV̄ q ` ϵH , (7.26a) |ϵH | ď 2}H}2 ϵ̃ ď 2ℓ}H}2 ϵ, ? ? |ϵH | ď p2 2 }RH }2 ` 4}H}2 ϵqϵ̃ ď ℓp2 2 }RH }2 ` 4}H}2 ϵqϵ. (7.26b) (7.26c) Therefore ˇ ˇ ˇ ˇ ˇ 1 ˇ ˇ ˇ 1 ´ϵ B ˇ ˇ ˇ ˇ ˇ ϕB pV q ´ ϕ pV̄ q ˇ “ ˇ ϕ pV qϕ pV̄ q ˇ , B B B ˇ ˇ ˇ ˇ ˇ ϕA pV̄ q ϕA pV q ˇˇ ˇˇ rϕA pV̄ q ´ ϕA pV qsrϕB pV qs2 ` ϕA pV qtrϕB pV qs2 ´ rϕB pV̄ qs2 u ˇˇ ˇ ˇ ˇ rϕ pV̄ qs2 ´ rϕB pV qs2 ˇ “ ˇ rϕB pV qϕB pV̄ qs2 B Ω 2 |ϵA | ` 2ΩA ΩB |ϵB | ď B . 4 ωB Now use (7.26b) and (7.26c) to bound ϵB to get (7.24a) and (7.24b), respectively; use (7.26b) to bound ϵA and ϵB to get (7.25a); use (7.26c) to bound ϵA and ϵB to get (7.25b). 2 and ω 4 Remark 7.1. In this lemma and many expressions in what follows, we find ωB B in the denominators of various fractions, as the results of bounding ϕB pV qϕB pV̄ q and rϕB pV qϕB pV̄ qs2 from below, respectively, simply by using (7.23). In cases where ωB is 27 rather tiny relative to ΩB , it may make various upper bound estimates unnecessarily too big. A better option may be to use ϕB pV qϕB pV̄ q “ ϕB pV̄ qrϕB pV̄ q ` ϵB s followed by bounding ϵB . We omit the detail. Lemma 7.5. For V, V̄ P Omˆℓ , we have }EpV̄ q ´ EpV q}2 ď pχ 1 ` χ2 q}B}2 ϵ, loooooooomoooooooon (7.27) “:χ where χ1 “ 2ℓ}A}2 2 ωB Proof. It can be verified that „ ∆E :“ EpV q ´ EpV̄ q “ and χ2 “ 2ℓΩB r2ΩA }B}2 ` ΩB }A}2 s . 4 ωB ȷ „ ȷ 1 1 ϕA pV̄ q ϕA pV q ´ A` ´ B. 
ϕB pV q ϕB pV̄ q rϕB pV̄ qs2 rϕB pV qs2 (7.28) (7.29) Therefore ˇ ˇ ˇ ˇ ˇ 1 ˇ ϕA pV̄ q 1 ˇˇ ϕA pV q ˇˇ ˇ ˇ }∆E}2 ď ˇ ´ ´ }A}2 ` ˇ }B}2 . ϕB pV q ϕB pV̄ q ˇ rϕB pV̄ qs2 rϕB pV qs2 ˇ Now use (7.24a) and (7.25a) to complete the proof. Remark 7.2. Sharper bounds than (7.27) are possible through using (7.24b) and (7.25b) instead. But the use of these possible sharper bounds does not lead to an essentially different conclusion later in our main theorem – Theorem 7.2. Lemma 7.6. Let H P Rmˆm be a symmetric matrix, and V` , V̄ P Omˆℓ , and RH be defined by (7.19). Let V̄K P Omˆpm´ℓq such that rV̄ , V̄K s is orthogonal. Then for any unitarily invariant norm ~ ¨ ~ ? ~V̄KJ HV` ~ ď 2 }H}2 ~ sin ΘpV` , V̄ q~ ` ~RH ~ . (7.30) Proof. Let Z P Rℓˆℓ be the one in Lemma 7.1 applied to V` and V̄ such that ~V̄ ´V` Z~ ď ? 2~ sin ΘpV` , V̄ q~. Now notice V̄KJ HV` “ V̄KJ HpV` Z ´ V̄ qZ J ` V̄KJ H V̄ Z J “ V̄KJ HpV` Z ´ V̄ qZ J ` V̄KJ pRH ` V̄ H1 qZ J “ V̄KJ HpV` Z ´ V̄ qZ J ` V̄KJ RH Z J to get ~V̄KJ HV` ~ ď }H}2 ~V` Z ´ V̄ ~ ` ~RH ~ ď 28 ? 2 }H}2 ~ sin ΘpV` , V̄ q~ ` ~RH ~. Lemma 7.7. Suppose V̄ P Omˆℓ is an orthonormal eigenbasis of EpV̄ q associated with its ℓ largest eigenvalues, and let δ :“ λℓ pEpV̄ qq ´ λℓ`1 pEpV̄ qq. Define RB and B1 by (7.19) with H “ B. For V P Omˆℓ , let V` P Omˆℓ be an orthonormal eigenbasis of EpV q associated with its ℓ largest eigenvalues. If ? δ ą pχ ` 2χ2 }B}2 qϵ, (7.31) then } sin ΘpV` , V̄ q}2 ď τ1 }RB }2 ϵ ` τ2 ϵ2 ? , δ ´ pχ ` 2χ2 }B}2 qϵ where χ and χ2 are defined in (7.27) and (7.28), and ? 2 2ℓ}A}2 4ℓ}A}2 }B}2 τ1 “ ` χ2 , τ2 “ . 2 2 ωB ωB (7.32) (7.33) Proof. By the assumption EpV qV` “ V` Λ1 , where Λ1 P Rℓˆℓ and eigpΛ1 q “ tλi pEpV qquℓi“1 . Let R1 :“ EpV̄ qV` ´ V` Λ1 “ rEpV̄ q ´ EpV qsV` . (7.34) Let V̄K P Omˆpm´ℓq such that rV̄ , V̄K s is orthogonal. By the assumption, V̄K is the orthonormal basis matrix associated the smallest m ´ ℓ eigenvalues of EpV̄ q. So we can write V̄KJ EpV̄ q “ Λp2 V̄KJ , where Λp2 P Rpm´ℓqˆpm´ℓq and eigpΛp2 q “ tλi pEpV̄ qqum i“ℓ`1 . We have V̄KJ R1 “ V̄KJ rEpV̄ qV` ´ V` Λ1 s “ Λp2 V̄ J V` ´ V̄ J V` Λ1 . K K (7.35) Since |λi pEpV qq ´ λi pEpV̄ qq| ď }EpV̄ q ´ EpV q}2 ď χ ϵ by [27, p.203] and Lemma 7.5, we have, for i ď ℓ and ℓ ` 1 ď j, λi pEpV qq ´ λj pEpV̄ qq ě λℓ pEpV qq ´ λℓ`1 pEpV̄ qq ě λℓ pEpV̄ qq ´ χϵ ´ λℓ`1 pEpV̄ qq “ δ ´ χϵ. (7.36) Therefore by Lemma 2.1, we have } sin ΘpV` , V̄ q}2 “ }VKJ V` }2 ď To bound }V̄KJ R1 }2 , we have, using (7.34) and (7.29), }V̄KJ R1 }2 “ }V̄KJ rEpV̄ q ´ EpV qsV` }2 29 }V̄KJ R1 }2 . δ ´ χϵ (7.37) ˇ ˇ ˇ ˇ ϕA pV̄ q 1 ϕA pV q ˇˇ J 1 ˇˇ J ˇ ´ } V̄ AV } ` ´ ` 2 ˇ rϕ pV̄ qs2 rϕB pV qs2 ˇ }V̄K BV` }2 ϕB pV q ϕB pV̄ q ˇ K B “: ε1 ` ε2 . (7.38) ˇ ˇ ď ˇˇ For ε1 , we use (7.24b) to get ? 2 2ℓ}RB }2 ϵ ` 4ℓ}B}2 ϵ2 ε1 ď }A}2 . 2 ωB (7.39) For ε2 , we use (7.25a) to get ı ” ? ε2 ď χ2 ϵ}V̄KJ BV` }2 ď χ2 ϵ }B}2 2} sin ΘpV` , V̄ q}2 ` }RB }2 . (7.40) Finally, combine (7.37) – (7.40) to arrive at, under (7.31), } sin ΘpV` , V̄ q}2 ď ε1 ` χ2 }RB }2 ϵ ? , δ ´ pχ ` 2χ2 }B}2 qϵ which, together (7.39), lead to (7.32). Remark 7.3. The inequality (7.37) remains valid in any unitarily invariant norm: ~ sin ΘpV` , V̄ q~ “ ~VKJ V` ~ ď ~V̄KJ R1 ~ . δ ´ χϵ (7.37’) The machinery we have built so far in various lemmas allows us to easily bound ~V̄KJ R1 ~ in the similar way. The outcome will be interesting theoretically, but may add little to our understanding of SCF than Lemma 7.7 as is. This same remark applies to the next theorem in which the analysis is in terms of } ¨ }2 but can be done in terms of any unitarily invariant norm, too. 
Theorem 7.2. Suppose V̄ P Omˆℓ is an orthonormal eigenbasis of EpV̄ q associated with its ℓ largest eigenvalues, and let δ :“ λℓ pEpV̄ qq ´ λℓ`1 pEpV̄ qq. Define RA , A1 and RB , B1 by (7.19) with H “ A and H “ B, respectively, and let χ, χ2 , τ1 , and τ2 be as defined in (7.27), (7.28), and (7.33), respectively. Apply the SCF iteration (Algorithm 7.1) to generate a sequence tVk u, given V0 . 1. Given any 0 ă t ă 1 and 0 ă ν ă 1, if }RB }2 ď tν δ{τ1 , (7.41) then for any V0 P Omˆℓ satisfying } sin ΘpV0 , V̄ q}2 ă p1 ´ tqνδ ? τ2 ` νpχ ` 2χ2 }B}2 q (7.42) the sequence tVk :“ RpVk qu converges to V̄ :“ RpV̄ q at least linearly, and moreover } sin ΘpVk`1 , V̄ q}2 ď ν} sin ΘpVk , V̄ q}2 30 for k “ 0, 1, . . .. (7.43) 2. If RB “ 0, then for any V0 P Omˆℓ such that } sin ΘpV0 , V̄ q}2 ă δ ? , τ2 ` χ ` 2χ2 }B}2 (7.44) the sequence tVk u converges to V̄ quadratically, and moreover } sin ΘpVk`1 , V̄ q}2 ď δ ´ pχ ` ? τ2 } sin ΘpVk , V̄ q}22 2χ2 }B}2 q} sin ΘpVk , V̄ q}2 (7.45) for k “ 0, 1, . . .. 3. If RA “ RB “ 0, then for any V0 P Omˆℓ satisfying } sin ΘpV0 , V̄ q}2 ă 2δ χ` a χ2 ` 4τ3 δ then V1 “ V̄, i.e., convergence in one step, where ? ? ı 4 2}A}2 }B}2 ℓ 4 2ℓ}B}2 ” 2 τ3 :“ ` }A} Ω ` 2Ω Ω }B} 2 B 2 . A B 2 4 ωB ωB (7.46) (7.47) Proof. It suffices to prove (7.43) and (7.45) just for k “ 0 and verify that (7.42) and (7.44) hold with V0 replaced by V1 . For clarity, we drop the subscript 0 to V0 and write V` for V1 . Let ϵ “ } sin ΘpV, V̄ q}2 , ϵ` “ } sin ΘpV` , V̄ q}2 . By Lemma 7.7, we have ϵ` ď τ1 }RB }2 ` τ2 ϵ ? ˆϵ δ ´ pχ ` 2χ2 }B}2 qϵ (7.48) ? if δ ą pχ ` 2χ2 }B}2 qϵ. As we will see, this condition is always satisfied under (7.42) or (7.44) in the respective cases. For item 1, we like to have ô ô ð τ1 }RB }2 ` τ2 ϵ ? ďν δ ´ pχ ` 2χ2 }B}2 qϵ ” ı ? τ1 }RB }2 ` τ2 ϵ ď ν δ ´ pχ ` 2χ2 }B}2 qϵ ı ” ? τ1 }RB }2 ` τ2 ` νpχ ` 2χ2 }B}2 q ϵ ď νδ ” ı ? τ1 }RB }2 ď tνδ, τ2 ` νpχ ` 2χ2 }B}2 q ϵ ď p1 ´ tqνδ. (7.49) The two inequalities in (7.49) hold by the assumption }RB }2 ď tν δ{τ1 and (7.42). Also ? the second inequality in (7.49) implies δ ą pχ ` 2χ2 }B}2 qϵ. For item 2, RB “ 0; so (7.48) becomes ϵ` ď τ ϵ ?2 ˆϵ δ ´ pχ ` 2χ2 }B}2 qϵ 31 (7.50) Now we like to have ô ô τ ϵ ?2 ă1 δ ´ pχ ` 2χ2 }B}2 qϵ ? τ2 ϵ ă δ ´ pχ ` 2χ2 }B}2 qϵ ” ı ? τ2 ` pχ ` 2χ2 }B}2 q ϵ ă δ. (7.51) The inequality in (7.51) is equivalent to the assumption (7.44). Also this inequality implies ? δ ą pχ ` 2χ2 }B}2 qϵ. For item 3, to exploit the additional condition RA “ 0, we return to (7.37) and (7.38). By (7.24b) and Lemma 7.6 for H “ A, we have ? 4ℓ}B}2 ϵ ? 4 2}A}2 }B}2 ℓ 2 ϵ1 ď ¨ 2}A}2 ϵ` ď ϵ ϵ` . (7.52) 2 2 ωB ωB For ϵ2 , we use (7.25b) and Lemma 7.6 for H “ B to get ˇ ˇ ˇ ϕA pV̄ q ϕA pV q ˇˇ ? ˇ ϵ2 ď ˇ ´ 2}B}2 ϵ` rϕB pV̄ qs2 rϕB pV qs2 ˇ ? ı 4 2ℓ}B}2 ” 2 ď }A} Ω ` 2Ω Ω }B} ϵ2 ϵ` . 2 2 A B B 4 ωB (7.53) Combine (7.37), (7.38), (7.52), and (7.53) to arrive at ϵ` ď τ3 2 ϵ ϵ` δ ´ χϵ where τ3 is defined by (7.47). Equivalently, ϵ` pδ ´ χϵ ´ τ3 ϵ2 q ď 0, provided δ ´ χϵ ą 0. Thus ϵ` “ 0 if δ ´ χϵ ´ τ3 ϵ2 ą 0. This last inequality holds under (7.46) which also ensures δ ´ χϵ ą 0. Remark 7.4. The condition (7.41) is rather strong, as the result of a number of upper bound estimations. In actual computations, SCF may still converge even if it fails. However, it is very useful to our understanding of SCF’s local convergence behaviors. The condition reveals a remarkable intrinsic connection between the convergence speed of SCF and that V̄ being close to an eigenspace of B. The closer V̄ is to an eigenspace of B, the faster the convergence will be. 
In fact, when $\bar V$ spans an eigenspace of $B$, the convergence is quadratic. More extreme still, when $\bar V$ spans an eigenspace of both $A$ and $B$, the convergence is instant for a sufficiently accurate $V_0$. A very particular case, $AB=BA$ and $CB=BC$, falls into item 3 of Theorem 7.2. Under these assumptions, for any $V\in\mathbb{O}^{m\times\ell}$, $E(V)H=HE(V)$ for $H\in\{A,B\}$, and thus
$$E(\bar V)\bar V=\bar V M_{\bar V} \;\Rightarrow\; E(\bar V)H\bar V=H\bar V M_{\bar V},$$
which implies that the columns of $H\bar V$ also span the eigenspace of $E(\bar V)$ corresponding to its $\ell$ largest eigenvalues. With the aid of $\lambda_\ell(E(\bar V))>\lambda_{\ell+1}(E(\bar V))$, we conclude that for $H\in\{A,B\}$, $\sin\Theta(\bar V,H\bar V)=0$, which is equivalent to $R_H=0$.

8 Numerical experiments

We report in this section our numerical experiments with the SCF iteration. Recall Example 7.1, which was purposely constructed to demonstrate that global convergence cannot be guaranteed in general. On randomly generated problems, however, our experience has been that the iteration is always fast and globally convergent. With various pairs $(m,\ell)$, we have tested the SCF iteration on numerous random symmetric positive definite matrices $A$, $B$, $C$ and random initial $V_0$, generated in MATLAB by

   A = randn(m,m); A = A'*A;
   B = randn(m,m); B = B'*B;
   C = randn(m,m); C = C'*C;
   [V0,~] = qr(randn(m,ell),0);

We mentioned before that making $C$ positive definite does not lose any generality. The same can be said about $A$: shifting $A$ to $A+\xi B$ makes it positive definite while leaving the maximizers unchanged.

For comparison purposes, we also tested Stiefel manifold-based optimization methods, coded into sg_min [2, section 9.4] (see also [7]), to minimize $-f(V)$; the MATLAB code sg_min (version 2.4.3) is available at http://web.mit.edu/~ripper/www/sgmin.html. They include the Fletcher-Reeves CG iterative search, the Polak-Ribière CG iterative search, the Newton iterative search, and the dog-leg Newton iterative search, all accessible through sg_min by

   [f,V] = sg_min(V0,'frcg','euclidean');
   [f,V] = sg_min(V0,'prcg','euclidean');
   [f,V] = sg_min(V0,'newton','euclidean');
   [f,V] = sg_min(V0,'dog','euclidean');

respectively. In order to run sg_min, we provide

1. the objective function $-f(V)$ in the MATLAB file F.m (since sg_min minimizes a function on $\mathbb{O}^{m\times\ell}$, we use $-f(V)$ instead of $f(V)$),

2. the first derivative information in dF.m,
$$\mathrm{dF}(V) := -\frac{\partial f(V)}{\partial V} = -2E(V)V,$$

3. the second derivative information in ddF.m,
$$\mathrm{ddF}(V,X) := \frac{\mathrm{d}}{\mathrm{d}t}\,\mathrm{dF}(V(t))\Big|_{t=0} = -2\bigl\{E(V)X + DE(V)[X]\,V\bigr\} = -2\bigl\{E(V)X + G(V,X)V\bigr\},$$
where $V(t)$ is any smooth curve on $\mathbb{O}^{m\times\ell}$ with $V=V(0)$ and $X=\dot V(0)\in T_V\mathbb{O}^{m\times\ell}$, and $G(V,X)$ is defined by (3.12).

Table 8.1: The numbers of outer iterations averaged over 20 random tests

                          sg_min
   m       SCF     frcg      prcg     Newton     dog
   100     7.00    42.40    220.40     9.00     12.80
   200     6.80    56.70    341.20    10.10     13.20
   500     6.00    95.10    679.00    10.50     14.50
   1000    6.00    85.50    677.60    11.10     15.40
   2000    5.00   114.50    648.80    11.80     16.30

All calculations are carried out in MATLAB 7.10.0 (R2010a) on a PC with an Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz. For the SCF iteration, we use the MATLAB function eig to find an orthonormal eigenbasis $V_{k+1}$ of $E(V_k)$ associated with its $\ell$ largest eigenvalues at Line 2 of Algorithm 7.1 when $m\le 500$, and eigs when $m>500$. The SCF iteration is terminated whenever
$$r_k := \|E(V_{k+1})V_{k+1} - V_{k+1}M_{V_{k+1}}\|_2 \le \mathrm{tol} = 10^{-10} \quad\text{or}\quad k\ge 50.$$
For sg_min, the default options are used. For $\ell=3$ and various $m$, Tables 8.1, 8.2, and 8.3 report the numbers of outer iterations, the residuals $r_k$, and the CPU times (measured by the MATLAB function cputime), averaged over 20 random tests, for each method.
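To make the setup above concrete, the following MATLAB fragment sketches one random test run of the SCF iteration. It is a minimal sketch rather than a reproduction of our actual test code: it assumes that $E(V)$ has the form implied by the gradient $\partial f(V)/\partial V = 2E(V)V$, namely $E(V) = C + A/\mathrm{tr}(V^TBV) - [\mathrm{tr}(V^TAV)/\mathrm{tr}(V^TBV)^2]\,B$, and that $M_V = V^TE(V)V$; it also uses eig throughout, whereas eigs would be substituted for large $m$.

   % Minimal sketch of one random SCF test run (under the assumptions stated above).
   m = 100; ell = 3; tol = 1e-10; maxit = 50;
   A = randn(m,m); A = A'*A;                  % random symmetric positive definite A
   B = randn(m,m); B = B'*B;                  % random symmetric positive definite B
   C = randn(m,m); C = C'*C;                  % random symmetric positive definite C
   [V,~] = qr(randn(m,ell),0);                % random initial V0 with orthonormal columns

   % Assumed form of E(V), read off from the gradient f'(V) = 2*E(V)*V:
   E = @(V) C + A/trace(V'*B*V) - (trace(V'*A*V)/trace(V'*B*V)^2)*B;

   for k = 1:maxit
       Ek = (E(V) + E(V)')/2;                 % symmetrize against rounding
       [Q,D] = eig(Ek);                       % eig for moderate m; eigs for large m
       [~,p] = sort(diag(D),'descend');
       V = Q(:,p(1:ell));                     % orthonormal eigenbasis, ell largest eigenvalues
       Ev = E(V);
       M  = V'*Ev*V;                          % assumed M_V = V'*E(V)*V
       rk = norm(Ev*V - V*M, 2);              % residual r_k
       if rk <= tol, break; end
   end
   fval = trace(V'*A*V)/trace(V'*B*V) + trace(V'*C*V);   % objective f(V) at convergence

Each sweep of the loop costs essentially one (partial) eigendecomposition of $E(V_k)$; combined with the small iteration counts reported in Table 8.1, this accounts for the CPU-time advantage of SCF seen in Table 8.3.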
Similar numerical behaviors are also observed for other values of $\ell$.

Table 8.2: Residuals $r_k$ averaged over 20 random tests

                                 sg_min
   m        SCF          frcg        prcg        Newton       dog
   100    1.7392E-11   1.0016E-4   1.0426E-4   6.0403E-5   9.9159E-5
   200    1.4309E-11   1.8052E-4   2.0508E-4   1.4730E-4   2.3812E-4
   500    1.2952E-11   5.1048E-4   5.0872E-4   3.3755E-4   6.1155E-4
   1000   1.3626E-11   1.0374E-3   1.0266E-3   8.3783E-4   1.1495E-3
   2000   5.1391E-11   2.1672E-3   2.0579E-3   1.6901E-3   2.4546E-3

Table 8.3: CPU times (in seconds, measured by cputime) averaged over 20 random tests

                                 sg_min
   m        SCF         frcg         prcg        Newton        dog
   100     0.1825      0.6848       1.6817       0.3822       0.7254
   200     0.7831      1.6287       5.7798       1.1185       3.1590
   500     3.8813     19.3270      84.3076      10.8545      66.6748
   1000    9.5020     70.5156     379.2478      56.0278     520.1073
   2000   29.2954    370.6475    1435.5852     298.2989    4075.0503

Tables 8.1, 8.2, and 8.3 clearly suggest that, on these random problems, SCF is far superior to the Stiefel manifold-based optimization methods implemented in sg_min in accuracy, number of iterative steps, and running time. It is worth pointing out that SCF also shows a remarkable global convergence behavior: numerically, all the methods converge to the same objective value for each test problem, but SCF reaches a solution with residual about $10^{-11}$ in only about 6 iterations, while the Fletcher-Reeves CG (frcg) takes from 6 up to 20 times, the Polak-Ribière CG (prcg) from 30 up to 128 times, the Newton iteration (Newton) from 1.3 up to 2 times, and the dog-leg Newton iteration (dog) from 1.7 up to 3 times as many iterations to reduce the residuals to only about $10^{-4}$.

Our last remark of this section concerns the relationship between the accuracy of the objective value $f(V)$ and the residual $r_k$ of the computed solution for each method. Note that
$$r_k = \|E(V_{k+1})V_{k+1}-V_{k+1}M_{V_{k+1}}\|_2 \le \|E(\bar V)V_{k+1}-V_{k+1}M_{V_{k+1}}\|_2 + \|[E(V_{k+1})-E(\bar V)]V_{k+1}\|_2 = \|E(\bar V)V_{k+1}-V_{k+1}M_{V_{k+1}}\|_2 + O(\epsilon_k),$$
$$r_k \ge \|E(\bar V)V_{k+1}-V_{k+1}M_{V_{k+1}}\|_2 - \|[E(V_{k+1})-E(\bar V)]V_{k+1}\|_2 = \|E(\bar V)V_{k+1}-V_{k+1}M_{V_{k+1}}\|_2 + O(\epsilon_k)$$
by Lemma 7.5, where $\epsilon_k := \|\sin\Theta(V_k,\bar V)\|_2$. For sufficiently tiny $\epsilon_k$, if $\delta := \lambda_\ell(E(\bar V))-\lambda_{\ell+1}(E(\bar V))>0$, then by the Davis-Kahan $\sin\theta$ theorem [4],
$$\epsilon_k \le \frac{1}{\delta}\,\|E(\bar V)V_{k+1}-V_{k+1}M_{V_{k+1}}\|_2 + O(\epsilon_k).$$
Hence, roughly speaking, $r_k$ and $\epsilon_k$ are of the same order of magnitude. Since the gradient of $f(V)$ vanishes at $\bar V$,
$$f(V_k) = f(\bar V)+O(\epsilon_k^2) = f(\bar V)+O(r_k^2). \tag{8.1}$$
This explains well Table 8.4, in which the objective values $f(V_k)$ obtained by the Stiefel manifold-based optimization methods match the respective ones by SCF to about 11 decimal digits, even though the residuals of the former are about the square roots of those by SCF.
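Before turning to Table 8.4 itself, we note that relation (8.1) is easy to check numerically. The fragment below is a hypothetical check, reusing the assumed $E(V)$ and $M_V$ from the sketch of the experimental setup; V_scf and V_opt denote, respectively, a converged SCF solution (used as a proxy for $\bar V$) and an sg_min solution for the same test problem, and are not variables produced by the code shown earlier.

   % Hypothetical check of (8.1): objective gap should be of the order of rk^2.
   f   = @(V) trace(V'*A*V)/trace(V'*B*V) + trace(V'*C*V);
   M   = V_opt'*E(V_opt)*V_opt;
   rk  = norm(E(V_opt)*V_opt - V_opt*M, 2);     % residual of the sg_min solution
   gap = abs(f(V_opt) - f(V_scf));              % objective-value discrepancy
   fprintf('objective gap %.2e vs squared residual %.2e\n', gap, rk^2);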
Table 8.4: The computed objective values at convergence for a typical test problem (the frcg, prcg, Newton, and dog columns are the sg_min searches)

   m           SCF                  frcg                 prcg                Newton                dog
   100    1.139615532720E+3   1.139615532719E+3   1.139615532719E+3   1.139615532719E+3   1.139615532719E+3
   200    2.233485585827E+3   2.233485585826E+3   2.233485585826E+3   2.233485585826E+3   2.233485585824E+3
   500    5.814017245337E+3   5.814017245334E+3   5.814017245335E+3   5.814017245332E+3   5.814017245321E+3
   1000   1.173828254978E+4   1.173828254976E+4   1.173828254977E+4   1.173828254977E+4   1.173828254971E+4
   2000   2.368365236550E+4   2.368365236545E+4   2.368365236546E+4   2.368365236548E+4   2.368365236535E+4

9 Concluding remarks and future research

Analogously to the fact that maximizing $\mathrm{tr}(V^TCV)$ on $\mathbb{O}^{m\times\ell}$ is a subspace version of $\max_{\|v\|_2=1} v^TCv$, whose solution is the dominant eigenpair of $C$, what we have studied here,
$$\max_{V^TV=I_\ell} f(V) = \max_{V^TV=I_\ell}\left\{\frac{\mathrm{tr}(V^TAV)}{\mathrm{tr}(V^TBV)}+\mathrm{tr}(V^TCV)\right\}, \tag{1.1}$$
is a subspace version of optimizing the sum of two Rayleigh quotients on the unit sphere. We have made a number of discoveries, including the connection between (1.1) and the nonlinear eigenvalue problem (3.9), and the elegant necessary characterization of a global maximizer in terms of the extreme eigenvalues of (3.9), which leads to the SCF iteration for solving (1.1). Our numerical experiments suggest that the SCF iteration is very effective, much more efficient than Stiefel manifold-based optimization methods. We also explained how to march out of a KKT point that is not a global maximizer.

Although (1.1) is a maximization problem, minimizing $f(V)$, if needed, can be turned into a problem of the form (1.1) (with a different $f$). In fact,
$$\min f(V) = -\max[-f(V)] = -\max\left\{\frac{\mathrm{tr}(V^T[-A]V)}{\mathrm{tr}(V^TBV)}+\mathrm{tr}(V^T[-C]V)\right\}.$$

While the SCF iteration has appeared very efficient in our numerical tests so far, certain theoretical and numerical issues remain unanswered. These issues include 1) sufficient conditions for the global maximizers, 2) further convergence analysis of the SCF iteration (in the spirit of that for Algorithm 7.1, but for the more general case), and 3) possible modifications of the SCF iteration to ensure its global convergence. They, among others, will be the subjects of our future research.

We mentioned that (1.1) is a subspace version of optimizing the sum of two Rayleigh quotients on the unit sphere (the case $\ell=1$). But (1.1) is only one among various conceivable subspace versions, and other versions may deserve attention, too. For instance, it is easy to see that when $\ell=1$, (1.1) is the same as
$$\max_{V^TV=I_\ell}\bigl\{\mathrm{tr}\bigl((V^TAV)(V^TBV)^{-1}\bigr)+\mathrm{tr}(V^TCV)\bigr\}. \tag{9.1}$$
Thus (9.1) can be considered as another subspace version. We believe that lines of argument similar to ours can be used to investigate (9.1). It is interesting to note that (9.1) is essentially equivalent to
$$\max_{\mathrm{rank}(Y)=\ell}\bigl\{\mathrm{tr}\bigl((Y^T\widetilde AY)(Y^T\widetilde BY)^{-1}\bigr)+\mathrm{tr}\bigl((Y^T\widetilde CY)(Y^T\widetilde DY)^{-1}\bigr)\bigr\}, \tag{9.2}$$
where $\widetilde A$ and $\widetilde C$ are symmetric, and $\widetilde B$ and $\widetilde D$ are symmetric and positive definite. In fact, we can set
$$Z := \widetilde D^{1/2}Y,\quad A := \widetilde D^{-1/2}\widetilde A\,\widetilde D^{-1/2},\quad B := \widetilde D^{-1/2}\widetilde B\,\widetilde D^{-1/2},\quad C := \widetilde D^{-1/2}\widetilde C\,\widetilde D^{-1/2}$$
to translate (9.2) into
$$\max_{\mathrm{rank}(Z)=\ell}\bigl\{\mathrm{tr}\bigl((Z^TAZ)(Z^TBZ)^{-1}\bigr)+\mathrm{tr}\bigl((Z^TCZ)(Z^TZ)^{-1}\bigr)\bigr\}. \tag{9.3}$$
Now substituting $Z=VR$ (the thin QR factorization of $Z$) into (9.3) yields (9.1).

References

[1] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ, 2008.

[2] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst (editors).
Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide. SIAM, Philadelphia, 2000.

[3] M. T. Chu and N. T. Trendafilov. The orthogonally constrained regression revisited. J. Comput. Graph. Stat., 10(4):746–771, 2001.

[4] C. Davis and W. Kahan. The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal., 7:1–46, 1970.

[5] J. Demmel. Applied Numerical Linear Algebra. SIAM, Philadelphia, PA, 1997.

[6] M. Murat Dundar, Glenn Fung, Jinbo Bi, Sandilya Sathyakama, and Bharat Rao. Sparse Fisher discriminant analysis for computer aided detection. In Proceedings of the SIAM International Conference on Data Mining, 2005.

[7] A. Edelman, T. A. Arias, and S. T. Smith. The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl., 20(2):303–353, 1999.

[8] E. Fung and M. K. Ng. On sparse Fisher discriminant method for microarray data analysis. Bioinformation, 2(5):230–234, 2007.

[9] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, Maryland, 3rd edition, 1996.

[10] U. Helmke and J. B. Moore. Optimization and Dynamical Systems. Springer-Verlag, London, UK, 1994.

[11] R. A. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press, Cambridge, 1991.

[12] Andrew V. Knyazev. Toward the optimal preconditioned eigensolver: Locally optimal block preconditioned conjugate gradient method. SIAM J. Sci. Comput., 23(2):517–541, 2001.

[13] Andrew V. Knyazev and Merico E. Argentati. Rayleigh-Ritz majorization error bounds with applications to FEM. SIAM J. Matrix Anal. Appl., 31(3):1521–1537, 2010.

[14] R. M. Martin. Electronic Structure: Basic Theory and Practical Methods. Cambridge University Press, Cambridge, UK, 2004.

[15] J. W. Milnor and J. D. Stasheff. Characteristic Classes. Princeton University Press & University of Tokyo Press, 1974.

[16] M. K. Ng, L.-Z. Liao, and L. Zhang. On sparse linear discriminant analysis algorithm for high-dimensional data classification. Numer. Linear Algebra Appl., 18(2):223–235, 2011.

[17] T. Ngo, M. Bellalij, and Y. Saad. The trace ratio optimization problem for dimensionality reduction. SIAM J. Matrix Anal. Appl., 31(5):2950–2971, 2010.

[18] J. Nocedal and S. Wright. Numerical Optimization. Springer, 2nd edition, 2006.

[19] B. N. Parlett. The Symmetric Eigenvalue Problem. SIAM, Philadelphia, 1998.

[20] G. Primolevo, O. Simeone, and U. Spagnolini. Towards a joint optimization of scheduling and beamforming for MIMO downlink. In 2006 IEEE Ninth International Symposium on Spread Spectrum Techniques and Applications, pages 493–497, 2006.

[21] Y. Saad. Numerical Methods for Large Eigenvalue Problems. Manchester University Press, Manchester, UK, 1992.

[22] Yousef Saad, James R. Chelikowsky, and Suzanne M. Shontz. Numerical methods for electronic structure calculations of materials. SIAM Rev., 52(1):3–54, 2010.

[23] V. R. Saunders and I. H. Hillier. A "level-shifting" method for converging closed shell Hartree-Fock wave functions. Internat. J. Quantum Chem., 7(4):699–705, 1973.

[24] H. Shen, K. Diepold, and K. Hüper. A geometric revisit to the trace quotient problem. In Proceedings of the 19th International Symposium of Mathematical Theory of Networks and Systems (MTNS 2010), 2010.

[25] Gerard L. G. Sleijpen and Henk A. Van der Vorst. A Jacobi–Davidson iteration method for linear eigenvalue problems. SIAM J. Matrix Anal. Appl., 17(2):401–425, 1996.

[26] G. W. Stewart. Matrix Algorithms, Vol. II: Eigensystems. SIAM, Philadelphia, 2001.

[27] G. W.
Stewart and Ji-Guang Sun. Matrix Perturbation Theory. Academic Press, Boston, 1990.

[28] A. Szabo and N. S. Ostlund. Modern Quantum Chemistry: An Introduction to Advanced Electronic Structure Theory. Dover, New York, 1996.

[29] L. Thøgersen, J. Olsen, D. Yeager, P. Jørgensen, P. Salek, and T. Helgaker. The trust-region self-consistent field method: Towards a black-box optimization in Hartree-Fock and Kohn-Sham theories. J. Chem. Phys., 121(1):16–27, 2004.

[30] Huan Wang, Shuicheng Yan, Dong Xu, Xiaoou Tang, and T. Huang. Trace ratio vs. ratio trace for dimensionality reduction. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), pages 1–8, 2007.

[31] P.-Å. Wedin. On angles between subspaces. In B. Kågström and A. Ruhe, editors, Matrix Pencils, pages 263–285, New York, 1983. Springer.

[32] C. Yang, W. Gao, and J. C. Meza. On the convergence of the self-consistent field iteration for a class of nonlinear eigenvalue problems. SIAM J. Matrix Anal. Appl., 30(4):1773–1788, 2009.

[33] C. Yang, J. C. Meza, B. Lee, and L.-W. Wang. KSSOLV—a MATLAB toolbox for solving the Kohn-Sham equations. ACM Trans. Math. Software, 36(2):1–35, 2009.

[34] C. Yang, J. C. Meza, and L.-W. Wang. A trust region direct constrained minimization algorithm for the Kohn-Sham equation. SIAM J. Sci. Comput., 29(5):1854–1875, 2007.

[35] W. H. Yang, L.-H. Zhang, and R. Y. Song. Optimality conditions of the nonlinear programming on Riemannian manifolds. Pacific J. Optim., 2013. To appear.

[36] L.-H. Zhang. On optimizing the sum of the Rayleigh quotient and the generalized Rayleigh quotient on the unit sphere. Comput. Optim. Appl., 54(1):111–139, 2013.

[37] L.-H. Zhang, L.-Z. Liao, and M. K. Ng. Fast algorithms for the generalized Foley-Sammon discriminant analysis. SIAM J. Matrix Anal. Appl., 31(4):1584–1605, 2010.

[38] L.-H. Zhang, L.-Z. Liao, and M. K. Ng. Superlinear convergence of a general algorithm for the generalized Foley-Sammon discriminant analysis. J. Optim. Theory Appl., 157(3):853–865, 2013.