Algebraic two-level convergence theory for singular systems

Yvan Notay∗
Service de Métrologie Nucléaire, Université Libre de Bruxelles (C.P. 165/84), 50, Av. F.D. Roosevelt, B-1050 Brussels, Belgium. email: [email protected]

Report GANMN 15-01, July 2015
Revised December 2015, July 2016

Abstract

We consider the algebraic convergence theory that gives its theoretical foundations to classical algebraic multigrid methods. All the main results constitutive of the approach are properly extended to singular compatible systems, including the recent sharp convergence estimates for both symmetric and nonsymmetric systems. In fact, all results carry over smoothly to the singular case, which therefore does not require a specific treatment (except for a proper handling of issues associated with singular coarse grid matrices). Regarding problems with a low-dimensional null space, the presented results thus mainly confirm what has often been observed at a more practical level in many previous works. As illustrated by the discussion of the application to the curl-curl equation, the potential impact is greater for problems with a large-dimensional null space. Indeed, it turns out that the design of multilevel methods can then actually be easier for the singular case than it is for nearby regularized problems.

Key words. singular system, multigrid, multilevel, convergence analysis, preconditioning, AMG

AMS subject classification. 65F08, 65F10, 65F50

∗ Yvan Notay is Research Director of the Fonds de la Recherche Scientifique – FNRS

1 Introduction

In this paper, we consider two-level iterative solution methods for compatible singular linear systems Au = b ; b ∈ R(A) . (1.1) Such systems arise in numerous applications; they may be associated with Markov chains, discretized partial differential equations with Neumann boundary conditions, graph Laplacians, etc. In many cases, the systems are large and badly conditioned. In such a context, multilevel algorithms are often the methods of choice [?, ?, ?], and they have indeed been applied successfully to singular systems in a number of works; see, e.g., [?, ?, ?, ?, ?, ?, ?, ?, ?, ?].

For general iterative methods, theoretical issues associated with singularities have been studied for a long time and are now fairly well understood. However, regarding multilevel methods, most previous works focus on practical aspects, and the state of the art is less advanced with respect to theory. A noteworthy contribution is the one in [?], where McCormick’s bound [?] is extended to symmetric positive semidefinite matrices. However, whereas this theory allows one to analyze multilevel algorithms of V-cycle type, it does not yield sharp two-level convergence estimates [?]. The analysis in [?] is also dedicated to multilevel convergence estimates, which are by nature less tractable than bounds for the two-level case.

Two-level analysis is actually more widely used, even for methods that involve many levels in practice. This is particularly true for the algebraic analysis initiated by Brandt [?], which inspired many works on algebraic multigrid (AMG) methods; see [?, ?, ?, ?, ?] for examples. Moreover, the original theory has been improved in, e.g., [?, ?, ?], finally yielding sharp bounds in [?, ?] for symmetric positive definite (SPD) matrices, whereas the approach has been extended to nonsymmetric matrices in [?] (we refer to [?, ?] for extensive reviews). Nevertheless, the extension of these results to singular systems has not been discussed so far.
This is the first motivation of this paper, where we address the involved issues and provide a generalization of all the main results constitutive of the approach. We also analyze in detail a peculiarity of multilevel algorithms for singular systems: often, the coarse grid matrix is “naturally” singular as well, raising several questions. The first one is the compatibility of the coarse systems that are to be solved during the course of the iterations, which is effectively addressed in most works on the topic. Here we moreover analyze whether the choice of the generalized inverse has or does not have an influence on the following of the iterations. It turns out that there is actually no influence when the system matrix A is symmetric and positive semidefinite. Hence, whereas sometimes opposite choices are made in the literature (compare [?], [?] and [?]), here we show that they are in fact equally valid. The first outcome of our analysis is thus that all known theoretical results for regular problems carry over smoothly to the singular case. Regarding the many methods inspired by these results (e.g., [?, ?, ?, ?, ?, ?]), we then conclude that their extension to singular problems should not raise particular difficulties, even though their detailed investigation 2 lies outside the scope of the present study. Actually, here we mainly give a theoretical confirmation of a well known fact, as the practical observation that multilevel algorithms can be successfully applied to solve singular problems traces back to the early times of multigrid [?], and these methods are, for instance, nowadays routinely used in the context of pressure correction methods for Navier–Stokes problems [?]. More generally, the presented results complement the many works where multilevel algorithms have been successfully applied to solve singular problems (e.g., [?, ?, ?, ?, ?, ?, ?, ?]). Now, going to the technical details, it turns out that designing an efficient two-level algorithm can even be easier in the singular case, as the constraints to satisfy for fast convergence are actually weaker. When the null space is low-dimensional, as in all previous works (to the best of our knowledge), the difference is anecdotal, and has hardly any practical consequence. But, when the null space is large-dimensional, the impact is significant, and opens the way to the design of algorithms specifically tailored for singular problems, that can be more efficient than their regular counterpart. This other outcome of our theoretical analysis is further discussed below in the context of the curl-curl equation, for which multilevel algorithms have been proposed in, e.g., [?, ?], excluding, however, the singular case. The remainder of this paper is organized as follows. In Section 2, we recall some basic facts about the iterative solution of singular systems. In Section 3, we develop our analysis; more precisely, we first state the general framework (Section 3.1), and next we analyze the influence of the generalized inverse that defines the coarse grid correction (Section 3.2); we then proceed (Section 3.3) with our extension to the singular case of the main results in [?] that apply to general (nonsymmetric) matrices; thereafter, we discuss the particular cases of symmetric positive semidefinite matrices (Section 3.4) and of matrices positive semidefinite in Rn (Section 3.5). Problems with large-dimensional null space are discussed in Section 4, and conclusions are drawn in Section 5. 
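Before entering the analysis, the compatibility requirement b ∈ R(A) in (1.1) can be made concrete with a short numerical sketch. The example below is an illustration written for this text (not code from the original paper): it builds two typical compatible singular systems, a small graph Laplacian and a one-dimensional Laplacian with Neumann boundary conditions, and checks compatibility by comparing the ranks of A and [A | b]; the matrices, right-hand sides and tolerances are arbitrary choices made only for the example.

```python
import numpy as np

def is_compatible(A, b, tol=1e-10):
    """Check b in R(A) by comparing rank(A) with rank([A | b])."""
    rA = np.linalg.matrix_rank(A, tol)
    rAb = np.linalg.matrix_rank(np.column_stack([A, b]), tol)
    return rA == rAb

# Example 1: Laplacian of a small undirected graph (zero row sums, hence singular).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4
A_graph = np.zeros((n, n))
for i, j in edges:
    A_graph[i, i] += 1.0
    A_graph[j, j] += 1.0
    A_graph[i, j] -= 1.0
    A_graph[j, i] -= 1.0

# Example 2: 1D Laplacian with Neumann boundary conditions (also zero row sums).
m = 8
A_neu = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
A_neu[0, 0] = A_neu[-1, -1] = 1.0

for A in (A_graph, A_neu):
    # A right-hand side in R(A) is obtained by applying A to an arbitrary vector ...
    b_ok = A @ np.arange(A.shape[0], dtype=float)
    # ... whereas the constant vector is not compatible: for these symmetric matrices,
    # compatibility means orthogonality to the (constant) null space vector.
    b_bad = np.ones(A.shape[0])
    print(is_compatible(A, b_ok), is_compatible(A, b_bad))   # True False
```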
Notation For any matrix C , R(C) denotes its range and N (C) its null space. If C is square, σ(C) denotes its spectrum. If W1 and W2 are two subspaces, W1 + W2 denotes their (not necessarily direct) sum; i.e., the set of linear combinations of vectors from W1 and W2 . 2 General iterative methods for singular systems As mentioned in the preceding section, there is a rich literature on the iterative solution of singular systems. Here we only recall the results that are useful for our later developments, or that help to put them in proper perspective. We also analyze in detail the conditions under which two different preconditioners may be considered equivalent, because this is crucial for a correct understanding of the results in the next section. Noting B the preconditioning matrix, stationary iterations to solve (1.1) correspond to um+1 = um + B(b − A um ) , 3 m = 0, 1, ... . (2.1) Often instead of B one writes B −1 , where B is the preconditioner or the main operator in a splitting of A . However, in the singular case, B does not need to be invertible, hence using B −1 would entail some loss of generality. Letting u be any particular solution to (1.1), there holds u − um = T m (u − u0 ) , where T = I −BA (2.2) is the iteration matrix. It is well known (see [?] for an early reference) that um converges to a particular solution for any b ∈ R(A) if and only if T m converges to a projector onto N (A) . In, e.g., [?, Theorem 2.1], this is shown to hold if and only if the three following conditions are satisfied: N (B A) = N (A) , 1 is a semi-simple eigenvalue of T , and all other eigenvalues are less than 1 in modulus. This yields the following lemma, observing that the first two conditions are equivalent to the assumption that the eigenvalue 0 of B A has algebraic multiplicity equal to k , where k = dim(N (A)) . (See [?] for a further discussion of this topic.) Lemma 2.1. Let A and B be n × n matrices, and let k = dim(N (A)) . The iterative scheme defined by (2.1) converges to a particular solution of (1.1) for any b ∈ R(A) if and only if the two following conditions hold: (1) The algebraic multiplicity of the eigenvalue 0 of B A is equal to k; (2) ρeff = max λ∈σ(B A) |1 − λ| < 1 . λ6=0 Krylov subspace methods have also been widely studied in presence of singularity; see, e.g., [?, ?, ?]. Here we only recall the main conclusions from [?] about GMRES (the conjugate gradient method is discussed below together with the positive semidefinite case). The following lemma is based on the observation that the condition (1) in Lemma 2.1 is in fact sufficient to apply the framework developed in Section 2.5 of [?], where a GMRES process for a singular system is decomposed into the components belonging to the range of the matrix and the components orthogonal to it. Lemma 2.2. Let A and B be n × n matrices, and let k = dim(N (A)) . If the algebraic multiplicity of the eigenvalue 0 of B A is equal to k , then GMRES iterations to solve (1.1) are well defined with either left or right preconditioning B . For any b ∈ R(A) , the convergence of such a GMRES process is the same as that of (a) in the case of left preconditioning, a regular GMRES process with system matrix T QBA B A QBA , where QBA is a matrix whose columns form an orthonormal basis of R(B A) ; (b) in the case of right preconditioning, a regular GMRES process with system matrix T QAB A B QAB , where QAB is a matrix whose columns form an orthonormal basis of R(A B) . 
Moreover, the eigenvalues of both matrices QTBA B A QBA and QTAB A B QAB are, counting multiplicities, equal to the n − k nonzero eigenvalues of B A . 4 Proof. In Section 2.5 of [?] it is shown how GMRES for a compatible system can be decomposed into the components belonging to the range of the (preconditioned) matrix, and those orthogonal to it. This decomposition provides the required results as soon as it is applicable, that is, as soon as QTBA B A QBA (case (a)) or QTAB A B QAB (case (b)) are nonsingular. But, by [?, Theorem 1.3.22], QTBA B A QBA has the same nonzero eigenvalues (counting multiplicities) as QBA QTBA B A , which is itself equal to B A since QBA QTBA is a projector onto R(B A) . Since our assumptions imply that B A has exactly n − k nonzero eigenvalues, where n − k is also the row and column size of QTBA B A QBA , the conclusions follows for (a). Similarly, one finds that QTAB A B QAB has the same nonzero eigenvalues as A B , which by virtue of [?, Theorem 1.3.22] again, also coincide with the nonzero eigenvalues of B A . When A is singular, different preconditioners may lead to the same iteration matrix (see [?] for a detailed discussion). More generally, different preconditioners act equivalently when, applied to a vector in R(A) , they provide the same result up to a component in N (A) . Indeed, on the one hand, considering either stationary iterations or Krylov subspace methods for compatible systems, it is clear that the preconditioner is applied only to vectors r in R(A) ; thus, only the action on vectors of that subspace matters. On the other hand, if two preconditioners applied to a same vector provide the same result up to a component in N (A) , this latter influences which particular solution will be finally obtained, but does not otherwise influence the following of the iterations. This follows because the subsequent multiplication by A will ignore this component; hence, e.g., GMRES parameters are not affected and the next residual vector will be the same. In the following lemma, we give conditions under which such extended equivalence holds, and we furthermore show that, under these conditions, the preconditioned matrices have same spectrum. Lemma 2.3. Let A , B1 and B2 be n × n matrices. If T T πR(A T ) B1 πR(A) = πR(AT ) B2 πR(A) , where πR(AT ) and πR(A) are projectors onto, respectively, R(AT ) and R(A) , then, for any r in R(A) , B1 r − B2 r ∈ N (A) . Moreover, B1 A and B2 A have the same eigenvalues (counting multiplicity). Proof. One has, since r ∈ R(A) implies πR(A) r = r , B1 r − B2 r = (B1 − B2 ) πR(A) r T T = πR(A T ) (B1 − B2 ) πR(A) r + (I − πR(AT ) ) (B1 − B2 ) πR(A) r , T and the last right hand side belongs to N (A) because πR(A T ) (B1 −B2 )πR(A) = 0 by assumpT T ⊥ tion, whereas, R(I − πR(AT ) ) = N (πR(AT ) ) = R(πR(AT ) ) = R(AT )⊥ = N (A) . The statement about the eigenvalues follows for the fact that, for i = 1 , 2 , there holds that Bi A = T T Bi πR(A) A πR(A T ) has, by [?, Theorem 1.3.22], the same eigenvalues as πR(AT ) Bi πR(A) A . 5 The symmetric positive semidefinite case An important particular case is when A is symmetric and positive semidefinite. Then, the convergence can be analyzed with respect to the seminorm k · kA = (· , A ·)1/2 , and it is easy to see that ku − um kA does not depend on the particular solution u . Moreover, if the preconditioning matrix B is SPD, condition (1) of Lemma 2.1 is automatically satisfied, whereas one may consistently use the preconditioned conjugate gradient method [?]. 
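As a small numerical illustration of the preceding paragraph (added for this text, not taken from the paper's own experiments), the sketch below runs a few preconditioned conjugate gradient steps on a compatible singular SPD system with a simple SPD (diagonal) preconditioning matrix, and checks that the error measured in the A-seminorm is the same whichever particular solution is used as reference. The test matrix, preconditioner and iteration count are arbitrary choices made only for the example.

```python
import numpy as np

# Singular but compatible SPD system: 1D Neumann Laplacian (null space = constants).
n = 32
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0
b = A @ np.sin(np.arange(n))           # b is in R(A) by construction

B = np.diag(1.0 / np.diag(A))          # SPD preconditioning matrix (Jacobi)

def pcg(A, B, b, x0, nsteps):
    """Plain preconditioned conjugate gradients, applied to the singular system."""
    x = x0.copy()
    r = b - A @ x
    z = B @ r
    p = z.copy()
    iterates = [x.copy()]
    for _ in range(nsteps):
        Ap = A @ p
        alpha = (r @ z) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        z_new = B @ r_new
        beta = (r_new @ z_new) / (r @ z)
        p = z_new + beta * p
        r, z = r_new, z_new
        iterates.append(x.copy())
    return iterates

its = pcg(A, B, b, np.zeros(n), nsteps=8)

# Two different particular solutions: the minimum-norm one, and a shifted one.
u1 = np.linalg.pinv(A) @ b
u2 = u1 + 5.0 * np.ones(n)             # still a solution, since A @ ones = 0
for m, x in enumerate(its):
    e1, e2 = u1 - x, u2 - x
    # The A-seminorm of the error does not depend on the particular solution.
    print(m, np.sqrt(e1 @ A @ e1), np.sqrt(e2 @ A @ e2))
```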
These observations are summarized in the following lemma, where we also use the fact that, by Lemma 2.3, the requirement on B can be relaxed: it needs only to be equivalent to some SPD preconditioning matrix. Lemma 2.4. Let A be a symmetric positive semidefinite n × n matrix, and let B be a n × n matrix such that πR(A) B πR(A) = πR(A) Bb πR(A) , (2.3) for some SPD preconditioning matrix Bb , where πR(A) is the orthogonal projector onto R(A) . Then B A has, counting multiplicities, k times the eigenvalue 0 and n − k positive eigenvalues, where k = dim(N (A)) . Moreover, considering the linear system (1.1) and letting u be any particular solution, the following propositions hold. (1) When performing stationary iterations (2.1), ku − um kA ≤ (ρeff )m ku − u0 kA (2.4) where ρeff = max |1 − λ|. (2.5) λ∈σ(B A) λ6=0 (2) When performing conjugate gradient iterations with preconditioning B , the convergence is the same as that of a regular conjugate gradient process in Rn−k with SPD preconditioning and for which the eigenvalues of the preconditioned matrix are equal to the nonzero eigenvalues of B A . Moreover, √ κeff − 1 k ku − u0 kA (2.6) ku − uk kA ≤ 2 √ κeff + 1 where κeff = maxλ∈σ(B A) λ . min λ∈σ(B A) λ (2.7) λ6=0 3 3.1 Analysis of two-level preconditioning General framework We assume that the system matrix A is a real n × n matrix, and we consider the two-level schemes that are described by the iteration matrix ν ν TTL = I − M2−1 A 2 (I − P Agc R A) I − M1−1 A 1 , (3.1) 6 where M1 (resp. M2 ) is a nonsingular 1 n × n matrix which defines the pre-smoother (resp. post-smoother), and where ν1 (resp. ν2 ) is the corresponding number of smoothing steps; P is the prolongation matrix, of size n × nc with nc < n , R is the restriction matrix, of size nc × n , and Ac is the coarse grid matrix; we restrict ourselves to Galerkin coarse grid matrices; i.e., we assume Ac = R A P , (3.2) but we do not assume that Ac is nonsingular; in (3.1), the notation Agc stands for a generalized inverse of Ac ; that is, see, e.g., [?], any matrix satisfying Ac Agc Ac = Ac . (3.3) This definition is motivated as follows: since (3.3) implies Ac Agc rc = rc ∀rc ∈ R(Ac ) , the vector Agc rc is indeed a particular solution to any compatible system Ac uc = rc . Clearly, if Ac is singular, there are infinitely many generalized inverses, and the iterative process may also be influenced by the choice of the latter. This will be addressed in the next subsection, where we also discuss conditions under which the coarse grid systems are always compatible (i.e., conditions under which, during the course of the iterations, Agc is indeed applied only to vectors in R(Ac )). To alleviate the presentation, we shall assume that one performs at most one presmoothing step and one post-smoothing step. Thus, either ν1 = 1 , ν2 = 0 (no postsmoothing), or ν1 = 0 , ν2 = 1 (no pre-smoothing) or ν1 = ν2 = 1 . For future reference, note that, letting, if ν1 = 1 , ν2 = 0 M1 Y = (3.4) M2 if ν1 = 0 , ν2 = 1 −1 M1 (M1 + M2 − A) M2 if ν1 = ν2 = 1 , the two-level preconditioner corresponding to (3.1) is ν ν BTL = Y −1 + I − M2−1 A 2 P Agc R I − A M1−1 1 , (3.5) which thus in particular satisfies I − BTL A = TTL . In view of (3.4), we assume that M1 + M2 − A is nonsingular when ν1 = ν2 = 1 . 
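To fix ideas, the following sketch (an illustration written for this text, not code from the paper) applies the two-level preconditioner (3.5) with ν1 = ν2 = 1 to a residual vector in the usual error-correction form: pre-smoothing with M1, a coarse grid correction using the Galerkin coarse matrix Ac = R A P and the Moore–Penrose pseudoinverse as one admissible generalized inverse satisfying (3.3), and post-smoothing with M2. The test matrix (a singular Neumann Laplacian), the aggregation-based prolongation and the Gauss–Seidel-type smoothers are arbitrary choices made only for the example; the action is cross-checked against the explicit formula (3.5) with Y from (3.4).

```python
import numpy as np
from numpy.linalg import inv, pinv

# Singular test matrix: 1D Neumann Laplacian (symmetric positive semidefinite,
# null space spanned by the constant vector).
n = 16
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0

# Aggregation-based prolongation: pairs of fine unknowns share one coarse unknown,
# so the constant (null space) vector is in the range of P.
P = np.kron(np.eye(n // 2), np.array([[1.0], [1.0]]))
R = P.T
Ac = R @ A @ P                  # Galerkin coarse matrix; here it is singular
Acg = pinv(Ac)                  # Moore-Penrose inverse: one admissible choice in (3.3)

M1 = np.triu(A)                 # pre-smoother  (upper triangular part of A)
M2 = np.tril(A)                 # post-smoother (its transpose, since A is symmetric)

def apply_BTL(r):
    """Action of the two-level preconditioner (3.5), written as a V(1,1)-type cycle."""
    x = np.linalg.solve(M1, r)                  # pre-smoothing from the zero initial guess
    rc = R @ (r - A @ x)                        # restricted residual
    x = x + P @ (Acg @ rc)                      # coarse grid correction
    x = x + np.linalg.solve(M2, r - A @ x)      # post-smoothing
    return x

# Cross-check against the explicit formula (3.5), with Y as in (3.4).
Yinv = inv(M2) @ (M1 + M2 - A) @ inv(M1)
BTL = Yinv + (np.eye(n) - inv(M2) @ A) @ P @ Acg @ R @ (np.eye(n) - A @ inv(M1))

r = A @ np.sin(np.arange(n))    # residual of a compatible system lies in R(A)
print(np.allclose(apply_BTL(r), BTL @ r))       # True: same action
# Compatibility of the coarse system: the restricted residual is orthogonal to N(Ac),
# which is spanned here by the constant coarse vector.
print(abs(np.ones(P.shape[1]) @ (R @ r)) < 1e-10)   # True
```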
We target simple smoothers such as damped Jacobi (Mi = ω −1 diag(A)) or Gauss–Seidel (Mi = upp(A) or Mi = low(A)); technically, smoothers based on singular preconditioners can be allowed if assumption (1) of Lemma 2.1 is satisfied, but this would make the presentation significantly more cumbersome because the matrices Y and X defined below would then require a thorough discussion. 1 7 3.2 The coarse grid correction The coarse grid correction step corresponds to the term P Agc R in (3.1) or (3.5): given the residual vector e r obtained after pre-smoothing, in consists in the following substeps. 1. Restrict e r on the coarse grid: 2. Solve the coarse problem A uc = rc : 3. Prolongate uc on the fine grid: rc = R e r uc = Agc rc e = P uc u In the regular case, Ac is logically assumed nonsingular and hence Agc = A−1 c . In the singular case, however, Ac is often “naturally” singular. For instance, the constant vector is often in the range of the prolongation (then a constant on the coarse grid remains a constant on the fine grid after prolongation). However, the constant vector is also typically a null space vector in the singular case (consider for instance Laplacian of graphs or discretized partial differential equations with Neumann boundary conditions). Then, Ac = R A P is singular since A P vc = 0 for all vc such that P vc ∈ N (A) . Similarly, RT vc will be in the left null space of Ac for any vc such that RT vc ∈ N (AT ) . This arises with Markov chains for which the range of RT typically contains the constant vector which is also a left null space vector. Fortunately, such singular modes are harmless. Indeed, on the one hand, as shown in, e.g., [?, ?], the coarse systems Ac uc = rc to solve are compatible . This follows because, during the course of the iterations, only residual vectors e r belonging to R(A) are restricted T on the coarse grid, whereas e r ∈ R(A) if and only if nL r = 0 for all nL ∈ N (AT ) . Then, T T T T vc rc = vc (R e r) = (R vc ) e r = 0 for any vc such that RT vc ∈ N (AT ) ; that is, the restricted residual is orthogonal to all left null space vectors of Ac that are inherited from left null space vectors of A in the range of RT . If Ac has no other singular mode, the coarse system is therefore compatible. On the other hand, if, similarly, all right null space vectors of Ac are inherited from right null space vectors of A in the range of P , all null space components on the coarse grid (i.e., all vectors in N (Ac )) are prolongated into null space components on the fine grid, which, as seen in the preceding section, do not influence the convergence process. Therefore, which solution of the coarse grid system is picked up by the specific choice of Agc does not really matter. These observations amount to stating that the coarse grid correction term P Agc R , is independent of the choice of Agc when multiplied to the left by a projector onto R(AT ) and to the right by a projector onto R(A) . They are formally proved in the following lemma, where we use the condition dim(R(P ) ∩ N (A)) = dim(R(RT ) ∩ N (AT )) = dim(N (Ac )) (3.6) as mathematical formulation of the assumption that all vectors in N (Ac ) correspond to null space vectors of A in the range of P , and all vectors in N (ATc ) correspond to left null space vectors of A in the range of RT . Note that dim(N (Ac )) is in fact at least as large as both dim(R(P ) ∩ N (A)) and dim(R(RT ) ∩ N (AT )) . Thus, (3.6) amounts to assuming that Ac has no additional singularity. 
This may be seen as the natural extension 8 of the assumption that Ac is nonsingular when A is nonsingular (observe that it is trivially satisfied when Ac is nonsingular). Lemma 3.1. Let A be a n × n matrix, let P and R be full rank matrices of size respectively n × nc and nc × n , and let Ac = R A P . If (3.6) holds, one has, for any pair of matrices Agc (1) and Agc (2) satisfying (3.3), T g (1) T g (2) πR(A R πR(A) = πR(A R πR(A) , T ) P Ac T ) P Ac where πR(AT ) and πR(A) are projectors onto, respectively, R(AT ) and R(A) . Proof. Let Nc,R be a nc × kc matrix of rank kc such that the columns of P Nc,R form a basis of R(P ) ∩ N (A) . By the condition dim(R(P ) ∩ N (A)) = dim(N (Ac )) , the columns T ⊥ of Nc,R form also a basis of N (Ac ) . Since N (πR(A = R(AT )⊥ = N (A) , T ) ) = R(πR(AT ) ) T one has πR(A T ) P Nc,R = 0 . Similarly, for Nc,L forming a basis of the left null space of Ac , T R πR(A) = 0 . Thus, for i = 1 , 2 , one finds that Nc,L T g (i) πR(A R πR(A) = T ) P Ac T T −1 T T T πR(A Nc,R Agc (i) I − Nc,L (Nc,L Nc,L )−1 Nc,L R πR(A) . T ) P I − Nc,R (Nc,R Nc,R ) The conclusions follows, because, whatever Agc (i) satisfying (3.3), the matrix g (i) T T T T I − Nc,R (Nc,R Nc,R )−1 Nc,R Ac I − Nc,L (Nc,L Nc,L )−1 Nc,L is the unique generalized inverse of Ac with range R(Nc,R )⊥ and null space R(Nc,L ) [?, Chapter 2, Theorem 10(c)]. In the nonsymmetric case, the assumption (3.6) cannot always be fulfilled in practice. For instance, when solving linear systems associated with Markov chains, the left null space vector is the constant vector which is typically in the range of RT , but the right null space vector is unknown and in general not in the range of P . It follows that the coarse grid systems to solve are indeed compatible, but the choice of the generalized inverse has some influence, as the null space component on the coarse grid is prolongated in nontrivial components on the fine grid .2 Our main result in the next section uses the assumption (3.6) and therefore cannot directly cover such cases. However, it can be applied indirectly via the following lemma which shows that, if the used generalized inverse has the same rank as Ac (as, e.g., the Moore–Penrose inverse), then the coarse grid correction is in fact identical to that obtained e for which the corresponding Galerkin with some truncated prolongation Pe and restriction R e e coarse grid matrix R A P is nonsingular. (The Lemma is also a technical result needed for the proof of Theorem 3.3 below.) 2 It is generally observed that the best results are obtained with the Moore–Penrose inverse which provide a minimal norm solution [?, ?], but so far no analysis supports this fact. 9 Lemma 3.2. Let A be a n × n matrix, let P and R be full rank matrices of size respectively n × nc and nc × n , and let Ac = R A P . Let Agc be a matrix satisfying (3.3) that has the same rank as Ac , and let VL , VR be nc × n ec matrices whose columns form a basis of R(Agc ) and R(Agc T ) , respectively, where n ec is the rank of Ac . e = V T R are such that R e A Pe is The n × n ec matrix Pe = P VL and the n ec × n matrix R R nonsingular and −1 e A Pe e. P Ag R = Pe R R (3.7) c Moreover, if dim(R(P ) ∩ N (A)) = dim(N (Ac )) , (3.8) R Pe + N A = R P + N A , (3.9) dim(R(RT ) ∩ N (AT )) = dim(N (Ac )) , (3.10) eT + N AT = R RT + N AT . R R (3.11) there holds whereas, if there holds e A Pe = V T R A P VL = V T Ac VL . As shown in [?, p. 52], if Ag Proof. 
One finds R c R R has the same rank as Ac , then its range is a subspace complementary to N (Ac ) . It follows that for any w 6= 0 , Ac VL w is a non trivial vector in R(Ac ) , which further implies that VRT Ac VL w 6= 0 , because otherwise Ac VL w would be moreover a nontrivial vector in R(VR )⊥ = R(Agc T )⊥ = N (Agc ) , which is not possible because, as also shown in [?, p. 52], N (Agc ) is a subspace complementary to R(Ac ) . e A Pe = V T Ac VL is nonsingular, Since, as just shown, VRT Ac VL w 6= 0 for any w 6= 0 , R R which also shows that all three terms VR , Ac , and VL have rank n ec . Then, Theorem 11(d) in Chapter 2 of [?] shows that Agc = VL VRT Ac VL −1 −1 e A Pe VRT = VL R VRT , and (3.7) follows straightforwardly. To show (3.9), let PN be a basis of R(P ) ∩ N (A) . One has PN = P Nc for some nc × kc matrix Nc of rank kc = dim(N (Ac )) . Clearly, all columns of Nc belong to the null space of Ac , and hence form a basis of that subspace by the condition (3.8). Moreover, we have g already seen that N (Ac ) = R(Nc ) is complementary to R(Ac ) = R(VL ) . Hence the matrix V = VL Nc is nc × nc and nonsingular. Therefore, the range of P is equal to that of P V , which is itself equal to R(Pe) + R(PN ) . The equality (3.9) follows (remember that R(PN ) ⊂ N (A)). The proof of (3.11) is similar. 10 3.3 Main result for general nonsymmetric matrices We are now ready to state our generalization of the main result in [?]. The only restrictive assumption is (3.6), which has already been commented on in the preceding subsection; observe also that it is always satisfied when Ac is nonsingular. It is also worth noting that, applied to a nonsingular matrix A , the statements in Theorem 3.3 in fact reproduce the main statement in Theorem 2.1 of [?], which is recovered explicitly using the following simplifications: firstly, the main assumption (3.6) amounts in fact to dim(N (Ac )) = 0 , i.e., to assume that Ac is nonsingular, and hence Agc = A−1 c ; next b b (3.12) is most easily satisfied using P = P , R = R ; finally N (A) = {0} implies V = Rn . Theorem 3.3. Let A be an n × n matrix, let M1 , M2 be n × n nonsingular matrices, and let ν1 , ν2 be nonnegative integers such that either ν1 = 1 , ν2 = 0 (no post-smoothing), or ν1 = 0 , ν2 = 1 (no pre-smoothing) or ν1 = ν2 = 1 . Let P and R be full rank matrices of size respectively n × nc and nc × n , let Ac = R A P , and assume that (3.6) holds. Assume that M1 + M2 − A is nonsingular if ν1 = ν2 = 1 , and let BTL be defined by (3.5), where Y is defined by (3.4) and where Agc is a matrix satisfying (3.3). Let k = dim(N (A)) and, noting n ec the rank of Ac , let Pb be any n × (e nc + k) matrix b and R be any (e nc + k) × n matrix such that R(Pb) = R(P ) + N (A) and bT ) = R(RT ) + N (AT ) . R(R Letting X be defined by M1 X = M2 M2 (M1 + M2 − A)−1 M1 if if if ν1 = 1 , ν 2 = 0 ν1 = 0 , ν 2 = 1 ν1 = ν2 = 1 , (3.12) (3.13) b is nonsingular, the matrix BTL A has k times the eigenvalue 0 and n if (Pb X R) ec times the eigenvalue 1. The other eigenvalues are the inverses of the n − n ec − k nonzero eigenvalues of the generalized eigenvalue problem b X Pb −1 R bX v = µ A v , X I − Pb R v∈V , (3.14) where V is any subspace complementary to N (A) . Proof. The thesis about the eigenvalues of BTL A can be translated into a thesis on the eigenvalues of TTL = I − BTL A , which, by [?, Theorem 1.3.22], has the same eigenvalues (counting multiplicity) as ν ν TeTL = (I − P Agc R A) I − M1−1 A 1 I − M2−1 A 2 = (I − P Agc R A) I − X −1 A . 
It follows that, instead of analyzing the eigenvalues of BTL A , we may equivalently analyze the eigenvalues of BeTL A , where BeTL = X −1 + P Agc R I − A X −1 11 is such that I − BeTL A = TeTL . The main idea of the proof is then the following. We first derive an explicit expression for the inverse BTL of BeTL , and we use it to prove that the algebraic multiplicity of the eigenvalue 0 of BeTL A is equal to k as claimed. Next we use the fact that the inverses of the nonzero eigenvalues of BeTL A are the finite eigenvalues of the generalized eigenvalue problem BTL z = µ A z . We then observe that this latter admits n ec times the eigenvalue 1, and finally conclude the proof by showing that, regarding the remaining n − n ec − k eigenvalues, this generalized eigenvalue problem is in fact equivalent to (3.14). ec = R e X Pe is nonsingular, where R e and Pe The developments below require that X are defined as in Lemma 3.2. This is, however, no loss of generality because in case this matrix would be singular, we can make the proof for a slightly perturbed case obtained by exchanging X for X + ε I . Since both the eigenvalues of BTL A and of (3.14) depend continuously on ε , the needed results for the original matrices are obtained by considering the limit for ε → 0 . Before entering the core of the proof, it is also worth deriving some auxiliary results related to e R e , P = P NR and R = NLT where NL and NR are n × k matrices whose columns span a basis of, respectively, N (AT ) and N (A) . By (3.9) and (3.11) (which apply because of (3.6)), these matrices are full b = GR R for some nonsingular GP and GR . It follows rank and such that Pb = P GP and R b b that the nonsingularity of R X P implies that of R X P . Moreover: b X Pb −1 R b = P R X P −1 R ; Pb R (3.15) that is, it suffices to prove the required result for the particular case where Pb = P and b = R . On the other hand, note for future reference that R ! ec e X NR X R . (3.16) RX P = NLT X Pe NLT X NR ec implies then that of the Schur This matrix being nonsingular, the nonsingularity of X T T −1 e e e complement Sc = NL X NR − NL X P Xc R X NR . ec is nonsingular, we We now enter the main part of the proof. Assuming thus that X −1 ec R e (X − A) . Indeed, use Lemma 3.2 to show that the inverse of BeTL is BTL = X − X Pe X ec = R e A Pe : setting A e I − A X −1 e −1 R e (X − A) X −1 + PeA e−1 R BTL BeTL = X − X Pe X c c e (I − A X −1 ) e−1 e −1 e −1 e e e−1 R = I + X Pe A c − Xc − Xc R(X − A)P Ac = I. This allows us to prove the first claim, related to the multiplicity of the eigenvalue 0 of BTL A . First, because BTL is nonsingular, the geometric multiplicity is equal to k = 12 dim(N (A)) . Next, the algebraic multiplicity exceeds the geometric one if and only if there exist nontrivial vectors z and v such that BTL A z = NR v , implying A z = BTL NR v and therefore NLT BTL NR v = 0 ; but this is not possible because the matrix NLT BTL NR = e −1 R e X NR is equal to the Schur complement Sc that is shown above to NLT X − sX Pe X c be nonsingular. To pursue, the nonzero eigenvalues of BTL A are thus the inverses of the finite eigenvalues of e −1 R e (X − A) z = µ A z . X − X Pe X (3.17) c eT ) is a left Since R(NL ) is a left eigenspace associated with infinite eigenvalues, and R(R eigenspace associated with the eigenvalue 1, the (right) eigenvectors associated with other eigenvalues have to satisfy, besides (3.17), T −1 e e e NL X − X P Xc R X z = 0 (3.18) and eAz = 0 . 
R (3.19) On the other hand, exploiting a block factorization of (3.16), one sees that, for any z satisfying (3.18), −1 P RX P RX z −1 eX z e e X NR e −1 R I R X I −X c c e = P NR e −1 I NLT X z −NLT X Pe X Sc−1 I c eX z e −1 e X NR e −1 R R X I −X c c e = P NR Sc−1 0 I −1 e R eX z . = Pe X c Therefore, with the two conditions (3.18) and (3.19), the problem (3.17) is equivalent to −1 X −X P RX P RX z = µAz , (3.20) which in fact amounts to (3.14) because of (3.15). It remains thus only to check the equivalence of the additional conditions. With respect eT ) is a left to (3.14), we state that we consider the nonzero eigenvalues. Because R(R eigenspace associated with the eigenvalue 0 for (3.20), this indeed amounts to imposing (3.19). On the other hand, the condition z ∈ V is in general not equivalent to (3.18). However, N (A) is the null space of A , and is also included in the null space of the left hand side matrix of (3.14). Therefore, if this latter equation is satisfied by some z , it is also satisfied by z + n for the same eigenvalue µ whatever n ∈ N (A) . Hence, which subspace V is used is unimportant regarding the eigenvalues, and we may thus consider the subspace induced by the condition (3.18) providing that it is complementary to N (A) 13 as needed. Finally, this latter requirement is checked by observing that no vector in N (A) can satisfy (3.18). Indeed, otherwise, one would have −1 e T e e NL X − X P Xc R X NR v = 0 ec−1 R e X NR is equal for some non trivial v , which is not possible since NLT X − X Pe X to the Schur complement Sc that is shown above to be nonsingular. When applied to singular A , Theorem 3.3 first tells us that the algebraic multiplicity of the eigenvalue 0 of the preconditioned matrix is equal to dim(N (A)) . Hence, Lemma 2.2 can be applied to analyze GMRES iterations, whereas condition (1) of Lemma 2.1 holds, and hence the convergence of stationary iterations is guaranteed if the nonzero eigenvalues λ of B A satisfy |1 − λ| < 1 , which holds if and only if the nonzero eigenvalues µ of (3.14) satisfy |1 − µ−1 | < 1 . 3.4 The symmetric positive semidefinite case Here we address the generalization of the results for SPD matrices originally obtained in [?, ?, ?, ?, ?, ?]. We use the fact that these results can be recovered as corollaries of the analysis for general nonsymmetric matrices considered in Section 3.3, even though the original proofs were quite different. More precisely, this can be done in two steps. Firstly, the sharp estimate in [?, ?] can be obtained by particularizing to the symmetric case the results in [?, Theorem 2.1]. Next, as also discussed in [?, ?], the previous (non sharp) estimates can be derived by combining the sharp bound with some further inequalities. We accordingly proceed in two steps here: Theorem 3.4 below extends the bound in [?, ?] to symmetric positive semidefinite matrices, whereas Theorem 3.5 generalizes the inequalities that allow us to deduce more tractable bounds, in particular those that appeared earlier in [?, ?, ?, ?]. To make things clear, the assumption used in this subsection are: A is symmetric and positive (semi)definite , R = P T and M1 is SPD M2 is SPD M1 = M2T and M1 + M1T − A is SPD if ν1 = 1 , ν2 = 0 if ν1 = 0 , ν2 = 1 if ν1 = ν2 = 1 . (3.21) (3.22) Then, the two potentially restrictive assumptions of Theorem 3.3 are automatically satisfied. On the one hand, X in (3.13) is SPD and hence PbT X Pb is always non singular. 
On the other hand, (3.6) also always holds: because A positive semidefinite implies that Ac is positive semidefinite as well, one has, for any vc , Ac vc = 0 ⇐⇒ vcT (P T A P )vc = 0 ⇐⇒ (P vc )T A(P vc ) = 0 ⇐⇒ A(P vc ) = 0 ; 14 that is, any vector in the null space of Ac corresponds to a vector in R(P ) ∩ N (A) . Regarding the case ν1 = ν2 = 1 the positive definiteness of M1 + M1T − A implies that I − Y −1 A = (I − M1−T A)(I − M1−1 A) defines a convergent iteration, see [?, ?, ?]; moreover, 0 ≤ λ ≤ 1 ∀λ ∈ σ(Y −1 A) = σ(X −1 A) . (3.23) In practice, there are two important particular cases. First, one may use the same SPD matrix M1 = M2 = M for pre- and post-smoothing, and then 2M − A is SPD as needed if and only if I − M −1 A defines a convergent iteration. Second, one often uses symmetrized Gauss–Seidel smoothing, for which M1 = upp(A) and M2 = low(A) ; then M1 + M1T − A = diag(A) is SPD as soon as all diagonal entries of A are positive. Our generalization of the sharp bounds for the SPD case can be stated as follows. b = PbT , and assume in Theorem 3.4. Let the assumption of Theorem 3.3 hold, select R addition (3.21), (3.22) (which imply (3.6) and the nonsingularity of PbT X Pb). Then, BTL A has k times the eigenvalue 0 and n − k positive eigenvalues satisfying λ , (3.24) max λ ≤ max 1 , max −1 λ∈σ(BTL A) min λ∈σ(BTL A) λ6=0 λ∈σ(Y −1 λ = min 1 , KX , where v X I − Pb PbT X Pb T KX = sup A) −1 (3.25) PbT X vT A v v∈Rn v6∈N (A) v . (3.26) Moreover, if ν1 = ν2 = 1 , BTL is SPD and max λ = 1, (3.27) λ∈σ(BTL A) λ6=1 κeff = maxλ∈σ(BTL A) λ min λ∈σ(BTL A) λ = KX , (3.28) λ6=0 ρeff = 1 − min λ∈σ(BTL A) λ6=0 −1 λ = 1 − KX . (3.29) Proof. The eigenvalue estimates are straightforward corollaries of (3.14) under the given additional conditions. The matrix in the numerator of KX often has a complicated form (except if one deliberately chooses to analyze a simplified scheme with, say, only a single smoothing step). This matrix can be seen as the product of X with projectors: letting π̂ = I − Pb(PbT X Pb)−1 PbT X , −1 one has X I − Pb PbT X Pb PbT X = π̂ T X π̂ . The idea behind more tractable estimates (including earlier bounds) is to substitute a simpler matrix for X , and/or to use simpler projectors. Of course, inequalities are then needed to deduce eigenvalue bounds from the associated quantities. These are proved in the following theorem. 15 b = PbT , and assume in Theorem 3.5. Let the assumption of Theorem 3.3 hold, select R addition (3.21), (3.22) (which imply (3.6) and the nonsingularity of PbT X Pb). (1) The quantity −1 vT X I − Pb PbT X Pb PbT X v KX = sup , (3.30) vT A v v∈Rn v6∈N (A) is also the smallest constant K such that kA(I − P Agc P T A)vk2X −1 ≥ K −1 k(I − P Agc P T A)vk2A ∀v ∈ R(A) . (3.31) (2) Letting, for any projector π with null space equal to R(Pb) , KX (π) = sup v∈Rn v6∈N (A) vT π T X π v , vT A v (3.32) there holds KX = min KX (π) (3.33) π: projector with N (π)=R(Pb) with arg minπ = I − Pb(PbT X Pb)−1 PbT X . Moreover, kπk2X vT X v T v6∈N (A) v A v inf ≤ KX (π) ≤ KX kπk2X . (3.34) (3) Let δX be the largest constant δ such that kA vk2X −1 ≥ δ k(I − P Agc P T A)vk2A ∀v ∈ R(A) . (3.35) Setting π e = I − N (N T X N )−1 N T X I − P Agc P T A , (3.36) where N is any n × k matrix whose columns form a basis of N (A) , there holds δX = KX (e π) −1 . (4) If ν1 = ν2 = 1 , δX is also the largest constant δ such that k(I − M1−T A)vk2A ≤ kvk2A − δ k I − P Agc P T A vk2A (3.37) ∀v ∈ Rn . 
(3.38) (5) For any n × n SPD matrices G , defining KG as in (3.30) with G instead of X , there holds T T v v KG ≤ KX ≤ max vvT X KG . (3.39) min vvT X Gv Gv v6=0 v6=0 16 Proof. Preliminaries. Since Ac Agc is a projector onto R(Ac ) [?, p. 52], letting Nc be a matrix whose columns form a basis of N (Ac ) , we have Ac Agc = I − Ec NcT for some Ec . Similarly, Ac Agc T = (Agc Ac )T (Ac is symmetric but Agc may be unsymmetric) is a projector onto R(Ac ) and, therefore, Ac Agc T = I − Fc NcT for some Fc . Letting Q = P Agc P T A , one has then A(I −Q) P = A P (I −Agc Ac ) = A P Nc FcT = 0 (remember that R(P Nc ) ⊂ N (A)). Similarly, one finds P T A(I − Q) = (I − Ac Agc )P T A = 0 and P T (I − QT )A = 0 , and one may check that QT A Q = A P Agc T Ac Agc P T A = A P (Ac Agc )T Agc P T A = A Q , which further implies (I − QT )A(I − Q) = (I − QT )A . e for some positive α , then KX = Proof of statement (1). First note that if X = α X α KXe , whereas the smallest admissible K in (3.31) also scales linearly with α . Hence we may assume without loss of generality that X is scaled in such a way that maxλ∈σ(X −1 A) λ ≤ 1 , which implies KX ≤ 1 . Consider then the generalized eigenvalue problem (I − QT )A X −1 A(I − Q) z = µ A z , z ∈ R(A) . (3.40) Because A is SPD on R(A) and the left hand side matrix is symmetric nonnegative definite, the eigenvalues are nonnegative and there exists a set of eigenvectors {zi } forming a basis of R(A) and further satisfying zTi A zj = δij . Moreover, multiplying both sides to the left by P T , one sees that P T A zi = 0 for all eigenvectors associated with positive eigenvalues (as seen above, P T (I − QT )A = 0). On the other hand, the eigenvectors zi associated with the eigenvalue 0 are P such that (I − Q)zi ∈ N (A) . Then expand v ∈ R(A) on this eigenvector basis: v = i αi zi . Using the just seen properties of zi , one finds X X kA(I − P Agc P T A)vk2X −1 = |αi |2 µi and k(I − P Agc P T A)vk2A = |αi |2 . i: µi >0 i: µi >0 It follows that the smallest possible K in (3.31) is the inverse of the smallest nonzero eigenvalue of (3.40) µmin , and one may further note that µmin ≤ maxλ∈σ(X −1 A) λ ≤ 1. On the other hand, let zi be any eigenvector of (3.40) associated with a positive eigenvalue µi . Since, by the condition P T A zi = 0 , the equation (3.40) implies A zi − (I − QT )A X −1 A zi = (I − QT )(I − A X −1 )(A zi ) = (1 − µi ) (A zi ) , one sees that xi = A zi is an eigenvector of the matrix Z = (I − QT )(I − A X −1 ) associated with the eigenvalue 1 − µi . Conversely, since Z has N (A) as left eigenspace associated with the eigenvalue 1, all its (right) eigenvectors xi associated with eigenvalues different from 1 are orthogonal to N (A) ; i.e., they belong to the range of A and can thus be written xi = A zi for zi ∈ R(A) . Since P T Z A = 0 , such zi satisfy P T A zi = 0 when the eigenvalue λi is nonzero, and are therefore eigenvectors of (3.40) with eigenvalue µi = 1−λi . Below we show that Z has further the same eigenvalues as the iteration matrix I − BTL A , and hence that these µi = 1 − λi coincide with the eigenvalues of BTL A that are 17 different from 0 and 1. Using KX ≤ 1 and (3.25) of Theorem 3.4 then proves the required result. To conclude this part of the proof, we now show that Z has the same eigenvalues as I − BTL A : Z has the same eigenvalues as its transpose ν ν Z T = (I − X −1 A)(I − Q) = I − M1−1 A 1 I − M2−1 A 2 (I − Q) which, by [?, Theorem 1.3.22], has the same spectrum as I − BTL A . Proof of statements (2), (4) and (5). 
Regarding (3.33) and (3.39), these are standard results for the regular case; see [?, ?, ?, ?]. We refer the reader to these works because the related developments carry straightforwardly over the singular case, since they are based on inequalities involving only the numerators of the expressions defining KX and KX (π) . Similarly, the equivalence between (3.35) and (3.38) is based on kA vk2X = vT A(M −1 + M −T − M −1 A M −T )A v = vT A − (I − A M −1 )A(I − M −T A) v , which raises no particular comment for the singular case. Hence, for regarding the statements (2), (4) and (5), we are left with the proof of (3.34). Noting that vT π T X π v 2 , kπkX = sup vT X v v∈Rn the left inequality immediately follows. Further, using πX = I − Pb(PbT X Pb)−1 PbT X , one has π πX = π , and hence T T T v T π T X π v = v T πX π X π πX v ≤ kπk2X vT πX X πX v , from which we deduce the right inequality. Proof of statement (3). Remembering that (I − QT )A(I − Q) = (I − QT )A (see Preliminaries), one may check that the largest possible δ in (3.35) is the inverse of the largest eigenvalue of the generalized eigenvalue problem (I − QT )A v = µ A X −1 A v , v∈W , (3.41) where W is any fixed subspace complementary to N (A) . On the other hand, the right hand side of (3.32) with π = π e is the largest eigenvalue of (I − QT ) X I − N (N T X N )−1 N T X (I − Q) v = µ A v , v∈W . (3.42) Next, R(X −1 A) is complementary to N (A) : if A X −1 A v = 0 while X −1 A v is nontrivial, w = A v is a nontrivial vector for which wT X −1 w = 0 , which contradicts the fact that X is SPD. We can thus select W = R(X −1 A) , and replace v in (3.42) by X −1 A w , where w is again a vector in W ; that is, (3.42) is equivalent to (I − QT ) X I − N (N T X N )−1 N T X (I − Q) X −1 A w = µ A X −1 A w , 18 v ∈ W . (3.43) Now, multiplying both sides of (3.43) to the left by P T , the left hand side is zero because P T (I − QT ) = Fc NcT P T with P Nc ⊂ N (A) (see Preliminaries). Hence any eigenvector corresponding to a nonzero eigenvalue satisfies P T A X −1 A v = 0 . Similarly, any eigenvector of (3.41) corresponding to a nonzero eigenvalue satisfies P T A X−1 A w = 0 . Further, when P T A X −1 A w = 0 holds, (I − Q)X −1 A w = I − P Agc P T A X −1 A w = X −1 A w and (3.43) amounts to (I − QT ) X I − N (N T X N )−1 N T X I − P Agc P T A X −1 A w = (I − QT ) I − X N (N T X N )−1 N T A w = (I − QT )A w , showing that the problems (3.41) and (3.43) have indeed the same nonzero eigenvalues and associated eigenvectors. As for general nonsymmetric matrices, the obtained results can be applied to nonsingular A , and one then simply recovers the known results for the regular case. For instance, McCormick’s bound [?] (extended to the singular case in [?]) amounts to analyzing the −1 constant δ in either (3.35) or (3.38). The inequality KX ≤ δX (that can be deduced by combining (3.37) with (3.33)) extends here to the singular case of a well known fact, whereas (3.37) and (3.34) (applied to π e), which originate from [?, ?], allow us to see more closely the relationship between this bound and sharp two-level estimates. It follows that the analyses that exploit these results in the regular case carry over smoothly to the singular case. For instance, classical AMG methods [?, ?, ?] were originally developed for symmetric M-matrices, motivated by an analysis of the constant Kdiag(A) combined with an analysis of the constant in the right inequality (3.39) (referred to as smoothing property). 
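The identity κeff = KX of Theorem 3.4 (for ν1 = ν2 = 1) can be checked numerically on a small singular example. The sketch below is written for this text under the assumptions (3.21)–(3.22), with symmetrized Gauss–Seidel smoothing and an aggregation prolongation whose range contains the null space (so that Pb = P); it computes KX from (3.26) restricted to R(A) and compares it with the effective condition number obtained from the nonzero eigenvalues of BTL A, which are taken as 1 minus the spectrum of the iteration matrix (3.1). All concrete choices (matrix, prolongation, tolerances) are arbitrary and made only for illustration.

```python
import numpy as np
from numpy.linalg import inv, pinv, eigh
from scipy.linalg import eigh as geigh        # generalized symmetric eigensolver

# Singular symmetric positive semidefinite test problem and two-level ingredients.
n = 16
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0                     # Neumann Laplacian, null space = constants
P = np.kron(np.eye(n // 2), np.array([[1.0], [1.0]]))   # constants in R(P), so Pb = P
Ac = P.T @ A @ P
M1, M2 = np.triu(A), np.tril(A)               # symmetrized Gauss-Seidel, M2 = M1^T

# X from (3.13) for nu1 = nu2 = 1; here M1 + M2 - A = diag(A) is SPD.
X = M2 @ inv(M1 + M2 - A) @ M1

# K_X from (3.26): largest eigenvalue of pi^T X pi versus A, restricted to R(A).
pi = np.eye(n) - P @ inv(P.T @ X @ P) @ P.T @ X
NUM = pi.T @ X @ pi
w, V = eigh(A)
Q = V[:, w > 1e-10]                           # orthonormal basis of R(A)
KX = geigh(Q.T @ NUM @ Q, Q.T @ A @ Q, eigvals_only=True)[-1]

# Nonzero spectrum of BTL A, obtained as 1 - spectrum of the iteration matrix (3.1).
TTL = (np.eye(n) - inv(M2) @ A) @ (np.eye(n) - P @ pinv(Ac) @ P.T @ A) \
      @ (np.eye(n) - inv(M1) @ A)
mu = np.real(np.linalg.eigvals(np.eye(n) - TTL))
mu_nonzero = mu[mu > 1e-10]
kappa_eff = mu_nonzero.max() / mu_nonzero.min()

print(f"K_X       = {KX:.6f}")
print(f"kappa_eff = {kappa_eff:.6f}")         # matches K_X, as in (3.28)
print(f"rho_eff   = {1.0 - 1.0 / KX:.6f}")    # as in (3.29)
```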
Below we show explicitly how these results can easily be extended to singular M-matrices. 19 Application to classical AMG We show here how the above theorems can be used to extend to singular systems the classical two-level convergence estimate for Ruge-Stüben AMG methods [?, ?, ?]. This analysis applies when the system matrix A is a symmetric M-matrices with nonnegative row sum, hence we consider here symmetric M-matrices with zero row sum (singular Mmatrices with nonnegative row sum have always zero row sum). We further assume that A is irreducible, implying that the null space has dimension 1 and is thus spanned by the constant vector. Now, the convergence proof is based on the approach just mentioned, where one combines an analysis of KD with the smoothing property analysis. Regarding this latter, observe that both X and D in (3.39) are SPD in the singular case as well. Hence available results carry straightforwardly over the singular case. For instance, for symmetric M-matrices, it can be shown that this constant is at most 4 when X correspond to symmetrized Gauss–Seidel smoothing (i.e., forward Gauss–Seidel as pre-smoother and backward Gauss-Seidel as post-smoother); see, e.g., Theorem A.3.1 in [?] and the subsequent discussion, which can be applied verbatim to singular M-matrices. We thus now focus on the analysis of KD , which, see Theorem 3.4, also directly governs the convergence of a simplified two-level scheme with a single step of (damped) Jacobi smoothing. The AMG algorithm first partitions the unknowns set into coarse and fine unknowns. Next, a prolongation P is set up that has the form JF C P = , ICC where ICC is the identity for the coarse unknowns and JF C an interpolation operator for fine unknowns. For the considered class of matrices, the row sum of JF C should be 1 when the row sum of the corresponding row is zero; that is, everywhere in the singular case. Then, the range of P contains the constant vector; i.e., the null space of A , and we can therefore apply our theoretical results with Pb = P . In fact, in this case the “direct” interpolation rule [?, p 448] amounts to associating to each fine unknown i a set Pi of coarse neighboring unknowns, and set, for all j , ( |aij | P if j ∈ Pi k∈Pi |aik | (JF C )ij = 0 otherwise . Now, the coarsening algorithms, that define the fine/coarse partitioning and select the set Pi for each fine unknown, are actually designed in such a way that the following constant is only moderately larger than 1: P k6=i |aik | τ = max P . i∈F k∈Pi |aik | 20 This is motivated by theoretical considerations: one finds, for all v ∈ Rn (using Schwarz inequality and remembering that aij ≤ 0 for i 6= j whereas, in the singular case, aii = P k6=i |aik | for all i), !2 X i∈F aii vi − X (JF C )ij vj j∈Pi !2 |aij | P (vi − vj ) = aii k∈Pi |aik | i∈F j∈Pi ! ! X X X |aij | |aij | P P (vi − vj )2 ≤ aii |a | |a | ik ik k∈Pi k∈Pi j∈Pi i∈F j∈Pi XX ≤ τ |aij |(vi − vj )2 X X i∈F j∈Pi ≤ τ XX |aij |(vi − vj )2 i∈F j∈C ≤ −τ X X aij (vi − vj )2 2 i j = τ vT A v . And, with the help of Theorem 3.5, nothing more is needed to show that KD ≤ τ . Indeed, a valid “simplified” projector in (3.32) is π = I − P 0 ICC IF F −JF C = , 0 0 for which one has P KD (π) = 3.5 sup i∈F 2 P aii vi − j∈Pi (JF C )ij vj vT A v v∈Rn v6∈N (A) . 
The nonsymmetric positive semidefinite case Besides the case considered above where the system matrix is symmetric and positive semidefinite, more insight can also be gained for nonsymmetric matrices which are positive (semi)definite in Rn ; that is, for matrices A such that vT A v ≥ 0 ∀v ∈ Rn , (3.44) or, equivalently, such that the symmetric part A = 12 (A + AT ) is positive (semi)definite. For nonsingular matrices, this case is discussed in [?, ?], and we want to extend here these results to the singular case, taking benefit from the fact that they are corollaries of the general result for nonsymmetric matrices considered in the preceding subsection. 21 Note that with (3.44), vT (A+AT )v = 0 is possible only for null space vectors of A+AT . This implies that the left and right null space of A coincide, and thus further coincide with the null space of A + AT : A v = 0 ⇒ vT (A + AT )v = 0 ⇒ (A + AT )v = 0 ⇒ AT v = 0 . b such that R b = PbT . Thus, if R = P T , one may always select R Now, to obtain a clear localization of the eigenvalues from Theorem 3.3, we need not only to assume (3.44) and R = P T , but also that X is SPD. This latter assumption is particularly restrictive and means that, in practice, the analysis is in general possible only for a simplified two-level scheme using only a single (either pre- or post-) smoothing step with a simple smoother such as damped Jacobi (then X is equal to either M1 or M2 ). Altogether, the additional assumptions used in the the following theorem are thus A is positive (semi)definite in Rn , R = P T , and X is SPD . (3.45) Theorem 3.6. Let the assumption of Theorem 3.3 hold, assume in addition (3.45), and b = PbT . Then, BTL A has k times the eigenvalue 0, n select R ec times the eigenvalue 1, and the n − n ec − k other eigenvalues λ satisfy |λ − αX | ≤ αX , (3.46) −1 <e(λ) ≥ KX , (3.47) wT AT X −1 A w , 2 wT AS w (3.48) where, setting AS = 21 (A + AT ) , αX = KX = sup v∈Cn \N (A) b T A v=0 P sup −1 vT X I − Pb PbT X Pb PbT X v vT A v v∈Rn v6∈N (A) . (3.49) Moreover, if, for some number α there holds k(α I − X −1 A)wkX ≤ α kwkX ∀w : PbT A w = 0 , (3.50) then αX ≤ α . Proof. Exploiting (3.14), the proof of (3.47) given in Corollary 2.2 of [?] applies verbatim to the singular case as well. Next, any v satisfying (3.14) for µ 6= 0 also satisfies PbT A v = 0 (and hence vT AT Pb = T 0 ), as seen by multiplying both sides to the left by PbT . Then, multiplying both sides of (3.14) to the left by vT AT X −1 yields, with µ = λ−1 , vT AT v = λ−1 vT AT X −1 A v . 22 Thus, since v ∈ V implies v 6∈ N (A) , <e(λ−1 ) = <e(vT AT v) vT AS v = ≥ β −1 . vT AT X −1 A v vT AT X −1 A v The last inequality is in fact equivalent to (3.46) because <e (λR + i λI )−1 = λR λ2R + λ2I ≥ 1 β ⇐⇒ λR − β 2 2 + λ2I ≤ β2 4 . Finally, wT AT X −1 A w w T AS w ≤ 2α ⇐⇒ wT AT X −1 A w ≥ α wT A + AT w ⇐⇒ ⇐⇒ wT (α I − AT X −1 )X(α I − X −1 A) − α2 X w ≤ 0 k(α I − X −1 A)wkX ≤ α kwkX , showing that (3.50) implies αX ≤ α as claimed. 4 Problems with large-dimensional null space At first, our analysis provides mainly theoretical bases to the extension of methods initially developed for regular problems. However, going into the details, it turns out that obtaining satisfactory convergence can actually be easier for singular problems, at least in the positive semidefinite case. Indeed, null space vectors need not to be included in the prolongation P : if they are not, they are added to form the extended prolongation Pb , so that KX in (3.26) is finite in any case. 
On the contrary, if a positive semidefinite matrix is perturbed to become positive definite, all null space vectors become near null space ones, and they must be well approximated in the range of P . This can be seen with the help of Theorem 3.5, considering for instance Aε = A0 +ε I , where A0 is symmetric and positive semidefinite and ε a positive parameter. For any null space vector v of A0 , (3.30) implies that (using Pb = P since the problem is nonsingular for ε > 0) (I − P (P T P )−1 P T v2 vT (I − P (P T P )−1 P T v = ε−1 . KI ≥ ε vT v kvk2 The numerator of the right hand side is the square norm of the orthogonal projection of v onto R(P )⊥ and is thus sufficiently small to compensate for the ε−1 only if v has a close approximation in R(P ) . If not, KI will be large, which also means, in view of (3.39), that KX will be large for any well conditioned smoother. Since smoothers used with AMG methods are typically well conditioned, we have thus in the regular case an additional constraint with respect to the singular case. 23 Now, so far multilevel methods have been (to the best of our knowledge) only applied to matrices with a low-dimensional null space, for which the above observation is a mathematical curiosity without practical consequence. However, it can be crucial for the successful application of multilevel methods to problems with much larger null space. An example of such problem is offered by the discretization of curl ν curl E + σ E = f in Ω ⊂ R3 , (4.1) using Whitney–Nédélec edge finite elements [?, ?]. This problem arises as a special case of the Maxwell equations, with then ν > 0 and σ ≥ 0 , see [?, ?] for details, where the multigrid solution of the resulting linear system is also discussed in the case σ > 0 . This system (Kν + σ M ) u = b is symmetric positive definite when σ > 0 but only positive semidefinite when σ = 0 . This latter case corresponds to magnetostatics problems [?, ?], for which the null space is huge: at the continuous level, any vector field E corresponding to the gradient of a function ϕ is in the kernel of the curl operator. At the discrete level, this implies that the dimension of the null space of Kν is large (about one third of the matrix size in the numerical example below). Perhaps for this reason, so far, multilevel methods have been designed only for the regular case (σ > 0), even though, as in [?], the final goal is to solve the singular instance, σ playing the role of a regularization parameter. Clearly, this huge null space of Kν makes the design of multilevel methods especially challenging for small positive σ . In fact, since it is impractical to have all null space vectors of Kν in the range of P , state of the art methods resort to specialized smoothers; see, e.g., [?, ?]. Now, what prevented, e.g. in [?], the direct application of the so defined approach to the singular case (σ = 0)? We identify two obstacles. 1. The structure of the coarse grid matrix is similar to that of the fine grid one; i.e., the coarse grid matrix has also a large-dimensional null space, and issues associated with singular coarse grid matrices may then seem nontrivial. 2. The used hybrid smoother from [?] is not properly defined for σ = 0 . However, our results offer proper answers to both these concerns. Regarding the first one, we know that the coarse systems are compatible and may be solved with any generalized inverse; i.e., there is no real issue here. 
Regarding the second concern, we observe that the initial motivation for the hybrid smoother is the proper handling of near null space vectors not in the range of P . In the singular case, these near null space vectors become true null space vectors, and hence are in any case in the range of the extended prolongation Pb . Thus the specific reason why a specialized smoother is needed disappears, and it is sensible to expect satisfactory convergence with classical (Gauss–Seidel or damped Jacobi) smoothers. Technically, it means that one uses the same smoother as in [?, ?], but skipping the “hybrid part” when σ = 0.

                Damped Jacobi           Sym. Gauss–Seidel
Smoother        Std     Hyb     Std     Std     Hyb     Std
σ               0.01    0.01    0       0.01    0.01    0
h−1 = 20        1.00    0.84    0.84    1.00    0.72    0.72
h−1 = 40        1.00    0.85    0.85    1.00    0.76    0.76

Table 1: ρeff for the singular and regularized variants of the discrete problem, using a two-level method with different smoothing schemes: “Std” refers to standard smoothing as defined in this work, and “Hyb” to the hybrid smoother described in [?, ?, ?].

We tested this idea on the linear system resulting from the finite element discretization of (4.1) where Ω is the unit cube, using a uniform mesh of hexahedral (cubic) edge elements with edge length h . We use a two-level method as defined in this paper with a prolongation based on nodal aggregation as described in [?]; we refer to [?, Chapter 6] for more details on the prolongation setup, and, more generally, for a thorough description of the experiment setting. In Table 1, we report the effective spectral radius for both the singular case (σ = 0) and the regularized one, comparing for the latter the hybrid smoother with the standard one (only the standard smoother is well defined when σ = 0). For damped Jacobi smoothing, we consider no pre-smoothing and a single post-smoothing step with damping factor ω = 1/3 , which ensures in this example that (3.23) holds, and hence that (3.29) is actually satisfied with both damped Jacobi and symmetrized Gauss–Seidel smoothers. The results confirm the expectations: standard smoothers perform just as well for the singular problem as their hybrid counterpart does in the regularized case. Hence, if the goal is to solve the singular system, it is neither needed nor useful to regularize the problem and then use a specific smoother to tackle the induced near singularity.

5 Conclusions

We have provided a generalization to singular matrices of all the main theoretical results constitutive of the approach developed in [?, ?, ?, ?, ?, ?, ?]. This has been done in a way that allows the extension to singular problems of the many methods inspired by these results. Comparing, for instance, Theorem 3.4 applied to singular and nonsingular A , one sees that the main convergence parameters will be essentially the same for two nearby matrices, provided that the extended prolongation Pb to be used in the singular case is equal to the standard prolongation P for the nonsingular one, which happens when the range of the prolongation contains the null space of the singular instance and thus the near null space of nearby nonsingular matrices.

Moreover, we have shown that singularities arising in the coarse grid matrices are mostly harmless, and, further, that the course of the iterations is often independent of the choice of the generalized inverse; i.e., independent of which particular solution of the coarse grid system is picked up. A noteworthy exception is the application to Markov chains.
Then, 25 the convergence may depend upon the choice of the generalized inverse, which deserves further investigations. On the other hand, our results show that the convergence conditions can actually be weaker in the singular case, because null space vectors need not to be well approximated in the range of the prolongation, as required for near null space vectors in the regular case. Discussing an example, we showed how this observation could lead to the development of more efficient algorithms for singular problems in the case where the null space is largedimensional. Acknowledgments Anonymous referees are acknowledged for suggestions that helped to improve the original manuscript. I thank Artem Napov for fruitful discussions and the code that allowed the numerical experiment reported in Section 4 to be run. 26