Algebraic two-level convergence theory for singular systems
Yvan Notay∗
Service de Métrologie Nucléaire
Université Libre de Bruxelles (C.P. 165/84)
50, Av. F.D. Roosevelt, B-1050 Brussels, Belgium.
email : [email protected]
Report GANMN 15-01
July 2015
Revised December 2015, July 2016
Abstract
We consider the algebraic convergence theory that gives its theoretical foundations
to classical algebraic multigrid methods. All the main results constitutive of the
approach are properly extended to singular compatible systems, including the recent
sharp convergence estimates for both symmetric and nonsymmetric systems. In fact,
all results carry over smoothly to the singular case, which therefore does not require a
specific treatment (except a proper handling of issues associated with singular coarse
grid matrices). Regarding problems with a low-dimensional null space, the presented
results thus mainly confirm what has often been observed at a more practical level in
many previous works. As illustrated by the discussion of the application to the curl-curl
equation, the potential impact is greater for problems with a large-dimensional null
space. Indeed, it turns out that the design of multilevel methods can then actually
be easier for the singular case than it is for nearby regularized problems.
Key words. singular system, multigrid, multilevel, convergence analysis, preconditioning, AMG
AMS subject classification. 65F08, 65F10, 65F50
∗ Yvan Notay is Research Director of the Fonds de la Recherche Scientifique – FNRS
1 Introduction
In this paper, we consider two level iterative solution methods for compatible singular
linear systems
Au = b ;
b ∈ R(A) .
(1.1)
Such systems arise in numerous applications; they may be associated with Markov chains,
discretized partial differential equations with Neumann boundary conditions, Laplacians
of graphs, etc. In many cases, the systems are large and badly conditioned. In such
a context, multilevel algorithms are often methods of choice [?, ?, ?], and these have
effectively been successfully applied to singular systems in a number of works; see, e.g.,
[?, ?, ?, ?, ?, ?, ?, ?, ?, ?].
For general iterative methods, theoretical issues associated with singularities have been
studied for a long time and are now fairly well understood. However, regarding multilevel
methods, most previous works focus on practical aspects, and the state of the art is less
advanced with respect to theory. A noteworthy contribution is the one in [?], where
McCormick’s bound [?] is extended to symmetric positive semidefinite matrices. However,
whereas this theory allows one to analyze multilevel algorithms of V-cycle type, it does
not yield sharp two-level convergence estimates [?]. The analysis in [?] is also dedicated to
multi-level convergence estimates which are by nature less tractable than bounds for the
two-level case.
Two-level analysis is actually more widely used even for methods that involve many
levels in practice. This is particularly true for the algebraic analysis initiated by Brandt [?]
and which inspired many works on algebraic multigrid (AMG) methods; see, e.g., [?, ?, ?, ?, ?]
for examples. Moreover, the original theory has been improved in, e.g., [?, ?, ?], yielding
finally sharp bounds in [?, ?] for symmetric positive definite (SPD) matrices, whereas the
approach has been extended to nonsymmetric matrices in [?] (we refer to [?, ?] for extensive
reviews). Nevertheless, the extension of these results to singular systems has not been
discussed so far. This is the first motivation of this paper, where we address the involved
issues and provide a generalization of all the main results constitutive of the approach.
We also analyze in detail a peculiarity of multilevel algorithms for singular systems:
often, the coarse grid matrix is “naturally” singular as well, raising several questions. The
first one is the compatibility of the coarse systems that are to be solved during the course of
the iterations, which is effectively addressed in most works on the topic. Here we moreover
analyze whether or not the choice of the generalized inverse has an influence on the
subsequent course of the iterations. It turns out that there is actually no influence when
the system matrix A is symmetric and positive semidefinite. Hence, whereas sometimes
opposite choices are made in the literature (compare [?], [?] and [?]), here we show that
they are in fact equally valid.
The first outcome of our analysis is thus that all known theoretical results for regular
problems carry over smoothly to the singular case. Regarding the many methods inspired
by these results (e.g., [?, ?, ?, ?, ?, ?]), we then conclude that their extension to singular
problems should not raise particular difficulties, even though their detailed investigation
lies outside the scope of the present study. Actually, here we mainly give a theoretical
confirmation of a well known fact, as the practical observation that multilevel algorithms
can be successfully applied to solve singular problems traces back to the early times of
multigrid [?], and these methods are, for instance, nowadays routinely used in the context of
pressure correction methods for Navier–Stokes problems [?]. More generally, the presented
results complement the many works where multilevel algorithms have been successfully
applied to solve singular problems (e.g., [?, ?, ?, ?, ?, ?, ?, ?]).
Now, going to the technical details, it turns out that designing an efficient two-level
algorithm can even be easier in the singular case, as the constraints to satisfy for fast
convergence are actually weaker. When the null space is low-dimensional, as in all previous works (to the best of our knowledge), the difference is anecdotal, and has hardly
any practical consequence. But, when the null space is large-dimensional, the impact is
significant, and opens the way to the design of algorithms specifically tailored for singular
problems, which can be more efficient than their regular counterparts. This other outcome of
our theoretical analysis is further discussed below in the context of the curl-curl equation,
for which multilevel algorithms have been proposed in, e.g., [?, ?], excluding, however, the
singular case.
The remainder of this paper is organized as follows. In Section 2, we recall some basic
facts about the iterative solution of singular systems. In Section 3, we develop our analysis;
more precisely, we first state the general framework (Section 3.1), and next we analyze the
influence of the generalized inverse that defines the coarse grid correction (Section 3.2);
we then proceed (Section 3.3) with our extension to the singular case of the main results
in [?] that apply to general (nonsymmetric) matrices; thereafter, we discuss the particular
cases of symmetric positive semidefinite matrices (Section 3.4) and of matrices positive
semidefinite in Rn (Section 3.5). Problems with large-dimensional null space are discussed
in Section 4, and conclusions are drawn in Section 5.
Notation
For any matrix C , R(C) denotes its range and N (C) its null space. If C is square, σ(C)
denotes its spectrum. If W1 and W2 are two subspaces, W1 + W2 denotes their (not
necessarily direct) sum; i.e., the set of linear combinations of vectors from W1 and W2 .
2 General iterative methods for singular systems
As mentioned in the preceding section, there is a rich literature on the iterative solution of
singular systems. Here we only recall the results that are useful for our later developments,
or that help to put them in proper perspective. We also analyze in detail the conditions
under which two different preconditioners may be considered equivalent, because this is
crucial for a correct understanding of the results in the next section.
Denoting by B the preconditioning matrix, stationary iterations to solve (1.1) correspond to

  u_{m+1} = u_m + B (b − A u_m) ,     m = 0, 1, ... .     (2.1)
Often, one writes B^{-1} instead of B , where B is then the preconditioner or the main operator in
a splitting of A . However, in the singular case, B need not be invertible, hence
using B^{-1} would entail some loss of generality.
Letting u be any particular solution to (1.1), there holds

  u − u_m = T^m (u − u_0) ,

where

  T = I − B A     (2.2)

is the iteration matrix.
It is well known (see [?] for an early reference) that u_m converges to a particular
solution for any b ∈ R(A) if and only if T^m converges to a projector onto N (A) . In, e.g.,
[?, Theorem 2.1], this is shown to hold if and only if the three following conditions are
satisfied: N (B A) = N (A) , 1 is a semi-simple eigenvalue of T , and all other eigenvalues
are less than 1 in modulus. This yields the following lemma, observing that the first two
conditions are equivalent to the assumption that the eigenvalue 0 of B A has algebraic
multiplicity equal to k , where k = dim(N (A)) . (See [?] for a further discussion of this
topic.)
Lemma 2.1. Let A and B be n × n matrices, and let k = dim(N (A)) . The iterative
scheme defined by (2.1) converges to a particular solution of (1.1) for any b ∈ R(A) if
and only if the two following conditions hold:
(1) the algebraic multiplicity of the eigenvalue 0 of B A is equal to k ;
(2) ρ_eff = max_{λ∈σ(B A), λ≠0} |1 − λ| < 1 .
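As a concrete illustration (not from the original text), the two conditions of Lemma 2.1 can be checked numerically on a small singular system; the sketch below, which assumes NumPy is available, uses the Laplacian of a path graph (null space spanned by the constant vector) together with a damped Jacobi preconditioning matrix B.

```python
import numpy as np

# Path-graph Laplacian on 4 nodes: singular, N(A) = span{(1,1,1,1)}, k = 1
A = np.array([[ 1., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  1.]])
B = 0.5 * np.diag(1.0 / np.diag(A))          # damped Jacobi, omega = 1/2

lam = np.linalg.eigvals(B @ A)
k = int(np.sum(np.abs(lam) < 1e-12))         # algebraic multiplicity of the eigenvalue 0
rho_eff = np.max(np.abs(1.0 - lam[np.abs(lam) > 1e-12]))

# Stationary iterations (2.1) for a compatible right-hand side b = A x
rng = np.random.default_rng(0)
b = A @ rng.standard_normal(4)
u = np.zeros(4)
for _ in range(200):
    u = u + B @ (b - A @ u)
residual = np.linalg.norm(A @ u - b)
```

Here condition (1) holds with k = 1 and ρ_eff < 1, so the iterations converge to a particular solution even though A is singular.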
Krylov subspace methods have also been widely studied in the presence of singularities;
see, e.g., [?, ?, ?]. Here we only recall the main conclusions from [?] about GMRES (the
conjugate gradient method is discussed below together with the positive semidefinite case).
The following lemma is based on the observation that the condition (1) in Lemma 2.1 is
in fact sufficient to apply the framework developed in Section 2.5 of [?], where a GMRES
process for a singular system is decomposed into the components belonging to the range
of the matrix and the components orthogonal to it.
Lemma 2.2. Let A and B be n × n matrices, and let k = dim(N (A)) . If the algebraic
multiplicity of the eigenvalue 0 of B A is equal to k , then GMRES iterations to solve
(1.1) are well defined with either left or right preconditioning B . For any b ∈ R(A) , the
convergence of such a GMRES process is the same as that of
(a) in the case of left preconditioning, a regular GMRES process with system matrix
Q_{BA}^T B A Q_{BA} , where Q_{BA} is a matrix whose columns form an orthonormal basis of R(B A) ;
(b) in the case of right preconditioning, a regular GMRES process with system matrix
Q_{AB}^T A B Q_{AB} , where Q_{AB} is a matrix whose columns form an orthonormal basis of R(A B) .
Moreover, the eigenvalues of both matrices Q_{BA}^T B A Q_{BA} and Q_{AB}^T A B Q_{AB} are, counting
multiplicities, equal to the n − k nonzero eigenvalues of B A .
4
Proof. In Section 2.5 of [?] it is shown how GMRES for a compatible system can be
decomposed into the components belonging to the range of the (preconditioned) matrix,
and those orthogonal to it. This decomposition provides the required results as soon as
it is applicable, that is, as soon as Q_{BA}^T B A Q_{BA} (case (a)) or Q_{AB}^T A B Q_{AB} (case (b)) are
nonsingular. But, by [?, Theorem 1.3.22], Q_{BA}^T B A Q_{BA} has the same nonzero eigenvalues
(counting multiplicities) as Q_{BA} Q_{BA}^T B A , which is itself equal to B A since Q_{BA} Q_{BA}^T is a
projector onto R(B A) . Since our assumptions imply that B A has exactly n − k nonzero
eigenvalues, where n − k is also the row and column size of Q_{BA}^T B A Q_{BA} , the conclusion
follows for (a). Similarly, one finds that Q_{AB}^T A B Q_{AB} has the same nonzero eigenvalues
as A B , which, by virtue of [?, Theorem 1.3.22] again, also coincide with the nonzero
eigenvalues of B A .
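The eigenvalue statement in Lemma 2.2 is easy to verify numerically. The following sketch (an illustration added here, assuming NumPy) builds an orthonormal basis Q of R(B A) from an SVD and compares the spectrum of Q^T B A Q with the nonzero eigenvalues of B A.

```python
import numpy as np

# Singular A (path-graph Laplacian, k = 1) with Jacobi preconditioning matrix B
A = np.array([[ 1., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  1.]])
B = np.diag(1.0 / np.diag(A))
M = B @ A                                    # preconditioned matrix (left preconditioning)

U, s, Vt = np.linalg.svd(M)
Q = U[:, s > 1e-12]                          # orthonormal basis of R(B A)

eig_proj = np.linalg.eigvals(Q.T @ M @ Q)    # spectrum of the reduced (regular) matrix
eig_full = np.linalg.eigvals(M)
eig_nonzero = eig_full[np.abs(eig_full) > 1e-12]
```

As the lemma predicts, the reduced (n − k) × (n − k) matrix carries exactly the nonzero eigenvalues of B A.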
When A is singular, different preconditioners may lead to the same iteration matrix
(see [?] for a detailed discussion). More generally, different preconditioners act equivalently
when, applied to a vector in R(A) , they provide the same result up to a component in
N (A) . Indeed, on the one hand, considering either stationary iterations or Krylov subspace
methods for compatible systems, it is clear that the preconditioner is applied only to vectors
r in R(A) ; thus, only the action on vectors of that subspace matters. On the other hand,
if two preconditioners applied to a same vector provide the same result up to a component
in N (A) , the latter influences which particular solution will finally be obtained, but does
not otherwise affect the course of the iterations. This follows because the subsequent
multiplication by A will ignore this component; hence, e.g., GMRES parameters are not
affected and the next residual vector will be the same.
In the following lemma, we give conditions under which such extended equivalence
holds, and we furthermore show that, under these conditions, the preconditioned matrices
have the same spectrum.
Lemma 2.3. Let A , B_1 and B_2 be n × n matrices. If

  π_{R(A^T)}^T B_1 π_{R(A)} = π_{R(A^T)}^T B_2 π_{R(A)} ,

where π_{R(A^T)} and π_{R(A)} are projectors onto, respectively, R(A^T) and R(A) , then, for any
r in R(A) ,

  B_1 r − B_2 r ∈ N (A) .

Moreover, B_1 A and B_2 A have the same eigenvalues (counting multiplicity).
Proof. One has, since r ∈ R(A) implies π_{R(A)} r = r ,

  B_1 r − B_2 r = (B_1 − B_2) π_{R(A)} r
               = π_{R(A^T)}^T (B_1 − B_2) π_{R(A)} r + (I − π_{R(A^T)}^T) (B_1 − B_2) π_{R(A)} r ,

and the last right hand side belongs to N (A) because π_{R(A^T)}^T (B_1 − B_2) π_{R(A)} = 0 by
assumption, whereas R(I − π_{R(A^T)}^T) = N (π_{R(A^T)}^T) = R(π_{R(A^T)})^⊥ = R(A^T)^⊥ = N (A) . The
statement about the eigenvalues follows from the fact that, for i = 1 , 2 , there holds
B_i A = B_i π_{R(A)} A π_{R(A^T)}^T , which has, by [?, Theorem 1.3.22], the same eigenvalues as
π_{R(A^T)}^T B_i π_{R(A)} A .
The symmetric positive semidefinite case
An important particular case is when A is symmetric and positive semidefinite. Then, the
convergence can be analyzed with respect to the seminorm ‖·‖_A = (· , A ·)^{1/2} , and it is
easy to see that ‖u − u_m‖_A does not depend on the particular solution u . Moreover, if the
preconditioning matrix B is SPD, condition (1) of Lemma 2.1 is automatically satisfied,
whereas one may consistently use the preconditioned conjugate gradient method [?]. These
observations are summarized in the following lemma, where we also use the fact that, by
Lemma 2.3, the requirement on B can be relaxed: it needs only to be equivalent to some
SPD preconditioning matrix.
Lemma 2.4. Let A be a symmetric positive semidefinite n × n matrix, and let B be an n × n
matrix such that

  π_{R(A)} B π_{R(A)} = π_{R(A)} B̂ π_{R(A)} ,     (2.3)

for some SPD preconditioning matrix B̂ , where π_{R(A)} is the orthogonal projector onto
R(A) . Then B A has, counting multiplicities, k times the eigenvalue 0 and n − k positive
eigenvalues, where k = dim(N (A)) .
Moreover, considering the linear system (1.1) and letting u be any particular solution,
the following propositions hold.
(1) When performing stationary iterations (2.1),

  ‖u − u_m‖_A ≤ (ρ_eff)^m ‖u − u_0‖_A ,     (2.4)

where

  ρ_eff = max_{λ∈σ(B A), λ≠0} |1 − λ| .     (2.5)

(2) When performing conjugate gradient iterations with preconditioning B , the convergence
is the same as that of a regular conjugate gradient process in R^{n−k} with SPD
preconditioning and for which the eigenvalues of the preconditioned matrix are equal to the
nonzero eigenvalues of B A . Moreover,

  ‖u − u_k‖_A ≤ 2 ( (√κ_eff − 1) / (√κ_eff + 1) )^k ‖u − u_0‖_A ,     (2.6)

where

  κ_eff = max_{λ∈σ(B A)} λ / min_{λ∈σ(B A), λ≠0} λ .     (2.7)
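Proposition (2) can be observed by running a plain preconditioned conjugate gradient iteration directly on a singular compatible system; the sketch below (added here for illustration, assuming NumPy) uses a path-graph Laplacian with Jacobi preconditioning and converges exactly as a regular process of dimension n − k would.

```python
import numpy as np

n = 20
# Path-graph Laplacian: symmetric positive semidefinite, k = dim(N(A)) = 1
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0
B = np.diag(1.0 / np.diag(A))                # Jacobi: an SPD preconditioning matrix

rng = np.random.default_rng(2)
b = A @ rng.standard_normal(n)               # compatible right-hand side

# Plain preconditioned CG, applied verbatim to the singular system
u = np.zeros(n)
r = b - A @ u
z = B @ r
p = z.copy()
rz = r @ z
for _ in range(2 * n):
    Ap = A @ p
    alpha = rz / (p @ Ap)
    u += alpha * p
    r -= alpha * Ap
    if np.linalg.norm(r) < 1e-12 * np.linalg.norm(b):
        break
    z = B @ r
    rz_new = r @ z
    p = z + (rz_new / rz) * p
    rz = rz_new
rel_res = np.linalg.norm(A @ u - b) / np.linalg.norm(b)
```

No special treatment of the singularity is needed: since b ∈ R(A), all residuals remain in R(A) and CG behaves as on a regular SPD system.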
3 Analysis of two-level preconditioning

3.1 General framework
We assume that the system matrix A is a real n × n matrix, and we consider the two-level
schemes that are described by the iteration matrix

  T_TL = (I − M_2^{-1} A)^{ν_2} (I − P A_c^g R A) (I − M_1^{-1} A)^{ν_1} ,     (3.1)
where M_1 (resp. M_2) is a nonsingular¹ n × n matrix which defines the pre-smoother (resp.
post-smoother), and where ν_1 (resp. ν_2) is the corresponding number of smoothing steps;
P is the prolongation matrix, of size n × n_c with n_c < n , R is the restriction matrix, of
size n_c × n , and A_c is the coarse grid matrix; we restrict ourselves to Galerkin coarse grid
matrices; i.e., we assume

  A_c = R A P ,     (3.2)

but we do not assume that A_c is nonsingular; in (3.1), the notation A_c^g stands for a
generalized inverse of A_c ; that is, see, e.g., [?], any matrix satisfying

  A_c A_c^g A_c = A_c .     (3.3)
This definition is motivated as follows: since (3.3) implies
  A_c A_c^g r_c = r_c     ∀ r_c ∈ R(A_c) ,

the vector A_c^g r_c is indeed a particular solution to any compatible system A_c u_c = r_c .
Clearly, if Ac is singular, there are infinitely many generalized inverses, and the iterative
process may also be influenced by the choice of the latter. This will be addressed in the
next subsection, where we also discuss conditions under which the coarse grid systems are
always compatible (i.e., conditions under which, during the course of the iterations, Agc is
indeed applied only to vectors in R(Ac )).
To alleviate the presentation, we shall assume that one performs at most one pre-smoothing step and one post-smoothing step. Thus, either ν_1 = 1 , ν_2 = 0 (no post-smoothing), or ν_1 = 0 , ν_2 = 1 (no pre-smoothing), or ν_1 = ν_2 = 1 .
For future reference, note that, letting

  Y = M_1                            if ν_1 = 1 , ν_2 = 0 ,
      M_2                            if ν_1 = 0 , ν_2 = 1 ,     (3.4)
      M_1 (M_1 + M_2 − A)^{-1} M_2   if ν_1 = ν_2 = 1 ,

the two-level preconditioner corresponding to (3.1) is

  B_TL = Y^{-1} + (I − M_2^{-1} A)^{ν_2} P A_c^g R (I − A M_1^{-1})^{ν_1} ,     (3.5)

which thus in particular satisfies I − B_TL A = T_TL . In view of (3.4), we assume that
M_1 + M_2 − A is nonsingular when ν_1 = ν_2 = 1 .
¹ We target simple smoothers such as damped Jacobi (M_i = ω^{-1} diag(A)) or Gauss–Seidel (M_i = upp(A)
or M_i = low(A)); technically, smoothers based on singular preconditioners can be allowed if assumption
(1) of Lemma 2.1 is satisfied, but this would make the presentation significantly more cumbersome because
the matrices Y and X defined below would then require a thorough discussion.
3.2 The coarse grid correction
The coarse grid correction step corresponds to the term P A_c^g R in (3.1) or (3.5): given the
residual vector r̃ obtained after pre-smoothing, it consists of the following substeps.
1. Restrict r̃ on the coarse grid:            r_c = R r̃ ;
2. Solve the coarse problem A_c u_c = r_c :  u_c = A_c^g r_c ;
3. Prolongate u_c on the fine grid:           ũ = P u_c .
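In code, the three substeps above take only a few lines; the following sketch (illustrative, assuming NumPy, with the Moore–Penrose inverse as one admissible choice of A_c^g) applies them to a residual in R(A).

```python
import numpy as np

def coarse_grid_correction(A, P, R, r_tilde):
    Ac = R @ A @ P                     # Galerkin coarse grid matrix (3.2)
    rc = R @ r_tilde                   # 1. restrict the residual
    uc = np.linalg.pinv(Ac) @ rc       # 2. solve the (possibly singular) coarse problem
    return P @ uc                      # 3. prolongate back to the fine grid

# Example: singular path-graph Laplacian with pairwise aggregation
n, nc = 8, 4
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0
P = np.kron(np.eye(nc), np.ones((2, 1)))
R = P.T
r = A @ np.arange(float(n))            # a residual in R(A)
u_tilde = coarse_grid_correction(A, P, R, r)
```

Note that although A_c is singular here, the restricted residual R r̃ is in R(A_c), so the coarse solve is well posed; this is precisely the compatibility discussed next.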
In the regular case, A_c is logically assumed nonsingular and hence A_c^g = A_c^{-1} . In the
singular case, however, A_c is often “naturally” singular. For instance, the constant vector is
singular case, however, Ac is often “naturally” singular. For instance, the constant vector is
often in the range of the prolongation (then a constant on the coarse grid remains a constant
on the fine grid after prolongation). However, the constant vector is also typically a null
space vector in the singular case (consider for instance Laplacian of graphs or discretized
partial differential equations with Neumann boundary conditions). Then, Ac = R A P is
singular since A P vc = 0 for all vc such that P vc ∈ N (A) . Similarly, RT vc will be in the
left null space of Ac for any vc such that RT vc ∈ N (AT ) . This arises with Markov chains
for which the range of RT typically contains the constant vector which is also a left null
space vector.
Fortunately, such singular modes are harmless. Indeed, on the one hand, as shown in,
e.g., [?, ?], the coarse systems A_c u_c = r_c to be solved are compatible. This follows because,
during the course of the iterations, only residual vectors r̃ belonging to R(A) are restricted
on the coarse grid, whereas r̃ ∈ R(A) if and only if n_L^T r̃ = 0 for all n_L ∈ N (A^T) . Then,
v_c^T r_c = v_c^T (R r̃) = (R^T v_c)^T r̃ = 0 for any v_c such that R^T v_c ∈ N (A^T) ; that is, the
restricted residual is orthogonal to all left null space vectors of A_c that are inherited from
left null space vectors of A in the range of R^T . If A_c has no other singular mode, the
coarse system is therefore compatible.
On the other hand, if, similarly, all right null space vectors of Ac are inherited from
right null space vectors of A in the range of P , all null space components on the coarse
grid (i.e., all vectors in N (Ac )) are prolongated into null space components on the fine
grid, which, as seen in the preceding section, do not influence the convergence process.
Therefore, which solution of the coarse grid system is picked up by the specific choice of
Agc does not really matter.
These observations amount to stating that the coarse grid correction term P A_c^g R is
independent of the choice of A_c^g when multiplied on the left by a projector onto R(A^T) and
on the right by a projector onto R(A) . They are formally proved in the following lemma,
where we use the condition

  dim(R(P) ∩ N (A)) = dim(R(R^T) ∩ N (A^T)) = dim(N (A_c))     (3.6)
as mathematical formulation of the assumption that all vectors in N (Ac ) correspond to
null space vectors of A in the range of P , and all vectors in N (ATc ) correspond to left
null space vectors of A in the range of RT . Note that dim(N (Ac )) is in fact at least as
large as both dim(R(P ) ∩ N (A)) and dim(R(RT ) ∩ N (AT )) . Thus, (3.6) amounts to
assuming that Ac has no additional singularity. This may be seen as the natural extension
of the assumption that Ac is nonsingular when A is nonsingular (observe that it is trivially
satisfied when Ac is nonsingular).
Lemma 3.1. Let A be an n × n matrix, let P and R be full rank matrices of size respectively
n × n_c and n_c × n , and let A_c = R A P .
If (3.6) holds, one has, for any pair of matrices A_c^{g(1)} and A_c^{g(2)} satisfying (3.3),

  π_{R(A^T)}^T P A_c^{g(1)} R π_{R(A)} = π_{R(A^T)}^T P A_c^{g(2)} R π_{R(A)} ,

where π_{R(A^T)} and π_{R(A)} are projectors onto, respectively, R(A^T) and R(A) .
Proof. Let N_{c,R} be an n_c × k_c matrix of rank k_c such that the columns of P N_{c,R} form a
basis of R(P) ∩ N (A) . By the condition dim(R(P) ∩ N (A)) = dim(N (A_c)) , the columns
of N_{c,R} also form a basis of N (A_c) . Since N (π_{R(A^T)}^T) = R(π_{R(A^T)})^⊥ = R(A^T)^⊥ = N (A) ,
one has π_{R(A^T)}^T P N_{c,R} = 0 . Similarly, for N_{c,L} forming a basis of the left null space of A_c ,
one finds that N_{c,L}^T R π_{R(A)} = 0 . Thus, for i = 1 , 2 ,

  π_{R(A^T)}^T P A_c^{g(i)} R π_{R(A)}
    = π_{R(A^T)}^T P (I − N_{c,R} (N_{c,R}^T N_{c,R})^{-1} N_{c,R}^T) A_c^{g(i)} (I − N_{c,L} (N_{c,L}^T N_{c,L})^{-1} N_{c,L}^T) R π_{R(A)} .

The conclusion follows because, whatever A_c^{g(i)} satisfying (3.3), the matrix

  (I − N_{c,R} (N_{c,R}^T N_{c,R})^{-1} N_{c,R}^T) A_c^{g(i)} (I − N_{c,L} (N_{c,L}^T N_{c,L})^{-1} N_{c,L}^T)

is the unique generalized inverse of A_c with range R(N_{c,R})^⊥ and null space R(N_{c,L}) [?,
Chapter 2, Theorem 10(c)].
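Lemma 3.1 can also be checked numerically (an illustration added here, assuming NumPy). Starting from the Moore–Penrose inverse of a singular A_c, any matrix A_c^g + n_c c^T + d m_c^T with A_c n_c = 0 and m_c^T A_c = 0 is another generalized inverse, and both give the same projected coarse grid correction; for symmetric A the two projectors coincide.

```python
import numpy as np

n, nc = 8, 4
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0                     # symmetric, N(A) = span{1}
P = np.kron(np.eye(nc), np.ones((2, 1)))
R = P.T
Ac = R @ A @ P                                # singular; (3.6) holds here

ones_c = np.ones((nc, 1))                     # right and left null vector of Ac
rng = np.random.default_rng(3)
c = rng.standard_normal((nc, 1))
d = rng.standard_normal((nc, 1))
Ag1 = np.linalg.pinv(Ac)                      # Moore-Penrose inverse
Ag2 = Ag1 + ones_c @ c.T + d @ ones_c.T       # another generalized inverse of Ac

one = np.ones((n, 1))
pi = np.eye(n) - one @ one.T / n              # orthogonal projector onto R(A) = R(A^T)

lhs = pi @ P @ Ag1 @ R @ pi
rhs = pi @ P @ Ag2 @ R @ pi
```

The projected corrections agree exactly, so either generalized inverse may be used in the two-level scheme.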
In the nonsymmetric case, the assumption (3.6) cannot always be fulfilled in practice.
For instance, when solving linear systems associated with Markov chains, the left null
space vector is the constant vector which is typically in the range of RT , but the right null
space vector is unknown and in general not in the range of P . It follows that the coarse
grid systems to solve are indeed compatible, but the choice of the generalized inverse has
some influence, as the null space component on the coarse grid is prolongated into nontrivial
components on the fine grid.²
Our main result in the next section uses the assumption (3.6) and therefore cannot
directly cover such cases. However, it can be applied indirectly via the following lemma,
which shows that, if the used generalized inverse has the same rank as A_c (as, e.g., the
Moore–Penrose inverse), then the coarse grid correction is in fact identical to that obtained
with some truncated prolongation P̃ and restriction R̃ for which the corresponding Galerkin
coarse grid matrix R̃ A P̃ is nonsingular. (The lemma is also a technical result needed for
the proof of Theorem 3.3 below.)
² It is generally observed that the best results are obtained with the Moore–Penrose inverse, which
provides a minimal norm solution [?, ?], but so far no analysis supports this fact.
Lemma 3.2. Let A be an n × n matrix, let P and R be full rank matrices of size respectively
n × n_c and n_c × n , and let A_c = R A P .
Let A_c^g be a matrix satisfying (3.3) that has the same rank as A_c , and let V_L , V_R be
n_c × ñ_c matrices whose columns form a basis of R(A_c^g) and R(A_c^{gT}) , respectively, where
ñ_c is the rank of A_c .
The n × ñ_c matrix P̃ = P V_L and the ñ_c × n matrix R̃ = V_R^T R are such that R̃ A P̃ is
nonsingular and

  P A_c^g R = P̃ (R̃ A P̃)^{-1} R̃ .     (3.7)

Moreover, if

  dim(R(P) ∩ N (A)) = dim(N (A_c)) ,     (3.8)

there holds

  R(P̃) + N (A) = R(P) + N (A) ,     (3.9)

whereas, if

  dim(R(R^T) ∩ N (A^T)) = dim(N (A_c)) ,     (3.10)

there holds

  R(R̃^T) + N (A^T) = R(R^T) + N (A^T) .     (3.11)
Proof. One finds R̃ A P̃ = V_R^T R A P V_L = V_R^T A_c V_L . As shown in [?, p. 52], if A_c^g
has the same rank as A_c , then its range is a subspace complementary to N (A_c) . It
follows that, for any w ≠ 0 , A_c V_L w is a nontrivial vector in R(A_c) , which further implies
that V_R^T A_c V_L w ≠ 0 , because otherwise A_c V_L w would moreover be a nontrivial vector in
R(V_R)^⊥ = R(A_c^{gT})^⊥ = N (A_c^g) , which is not possible because, as also shown in [?, p. 52],
N (A_c^g) is a subspace complementary to R(A_c) .
Since, as just shown, V_R^T A_c V_L w ≠ 0 for any w ≠ 0 , R̃ A P̃ = V_R^T A_c V_L is nonsingular,
which also shows that all three terms V_R , A_c , and V_L have rank ñ_c . Then, Theorem 11(d)
in Chapter 2 of [?] shows that

  A_c^g = V_L (V_R^T A_c V_L)^{-1} V_R^T = V_L (R̃ A P̃)^{-1} V_R^T ,

and (3.7) follows straightforwardly.
To show (3.9), let P_N be a basis of R(P) ∩ N (A) . One has P_N = P N_c for some n_c × k_c
matrix N_c of rank k_c = dim(N (A_c)) . Clearly, all columns of N_c belong to the null space
of A_c , and hence form a basis of that subspace by the condition (3.8). Moreover, we have
already seen that N (A_c) = R(N_c) is complementary to R(A_c^g) = R(V_L) . Hence the matrix
V = ( V_L   N_c ) is n_c × n_c and nonsingular. Therefore, the range of P is equal to that of
P V , which is itself equal to R(P̃) + R(P_N) . The equality (3.9) follows (remember that
R(P_N) ⊂ N (A)). The proof of (3.11) is similar.
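The construction in Lemma 3.2 is easily reproduced for the Moore–Penrose inverse, for which R(A_c^g) = R(A_c^T) and R(A_c^{gT}) = R(A_c); the sketch below (illustrative, assuming NumPy) extracts V_L and V_R from an SVD of A_c and verifies (3.7).

```python
import numpy as np

n, nc = 8, 4
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0
P = np.kron(np.eye(nc), np.ones((2, 1)))
R = P.T
Ac = R @ A @ P
Acg = np.linalg.pinv(Ac)                      # generalized inverse with rank(Acg) = rank(Ac)

U, s, Vt = np.linalg.svd(Ac)
rank_c = int(np.sum(s > 1e-12))               # rank of Ac (n_tilde_c in the text)
VL = Vt.T[:, :rank_c]                         # basis of R(Acg) = R(Ac^T)
VR = U[:, :rank_c]                            # basis of R(Acg^T) = R(Ac)

Pt = P @ VL                                   # truncated prolongation P_tilde
Rt = VR.T @ R                                 # truncated restriction R_tilde
Act = Rt @ A @ Pt                             # nonsingular coarse grid matrix
```

The truncated Galerkin matrix R̃ A P̃ is indeed nonsingular, and the coarse grid correction operator is unchanged.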
3.3 Main result for general nonsymmetric matrices
We are now ready to state our generalization of the main result in [?]. The only restrictive
assumption is (3.6), which has already been commented on in the preceding subsection;
observe also that it is always satisfied when A_c is nonsingular.
It is also worth noting that, applied to a nonsingular matrix A , the statements in
Theorem 3.3 in fact reproduce the main statement in Theorem 2.1 of [?], which is recovered
explicitly using the following simplifications: firstly, the main assumption (3.6) amounts in
fact to dim(N (A_c)) = 0 , i.e., to assuming that A_c is nonsingular, and hence A_c^g = A_c^{-1} ;
next, (3.12) is most easily satisfied using P̂ = P , R̂ = R ; finally, N (A) = {0} implies V = R^n .
Theorem 3.3. Let A be an n × n matrix, let M_1 , M_2 be n × n nonsingular matrices, and
let ν_1 , ν_2 be nonnegative integers such that either ν_1 = 1 , ν_2 = 0 (no post-smoothing), or
ν_1 = 0 , ν_2 = 1 (no pre-smoothing), or ν_1 = ν_2 = 1 . Let P and R be full rank matrices of
size respectively n × n_c and n_c × n , let A_c = R A P , and assume that (3.6) holds.
Assume that M_1 + M_2 − A is nonsingular if ν_1 = ν_2 = 1 , and let B_TL be defined by
(3.5), where Y is defined by (3.4) and where A_c^g is a matrix satisfying (3.3).
Let k = dim(N (A)) and, denoting ñ_c the rank of A_c , let P̂ be any n × (ñ_c + k) matrix
and R̂ be any (ñ_c + k) × n matrix such that

  R(P̂) = R(P) + N (A)     and     R(R̂^T) = R(R^T) + N (A^T) .     (3.12)

Letting X be defined by

  X = M_1                            if ν_1 = 1 , ν_2 = 0 ,
      M_2                            if ν_1 = 0 , ν_2 = 1 ,     (3.13)
      M_2 (M_1 + M_2 − A)^{-1} M_1   if ν_1 = ν_2 = 1 ,

if R̂ X P̂ is nonsingular, the matrix B_TL A has k times the eigenvalue 0 and ñ_c times the
eigenvalue 1. The other eigenvalues are the inverses of the n − ñ_c − k nonzero eigenvalues
of the generalized eigenvalue problem

  X (I − P̂ (R̂ X P̂)^{-1} R̂ X) v = μ A v ,     v ∈ V ,     (3.14)

where V is any subspace complementary to N (A) .
Proof. The thesis about the eigenvalues of B_TL A can be translated into a thesis on the
eigenvalues of T_TL = I − B_TL A , which, by [?, Theorem 1.3.22], has the same eigenvalues
(counting multiplicity) as

  T̃_TL = (I − P A_c^g R A) (I − M_1^{-1} A)^{ν_1} (I − M_2^{-1} A)^{ν_2} = (I − P A_c^g R A) (I − X^{-1} A) .

It follows that, instead of analyzing the eigenvalues of B_TL A , we may equivalently analyze
the eigenvalues of B̃_TL A , where

  B̃_TL = X^{-1} + P A_c^g R (I − A X^{-1})

is such that I − B̃_TL A = T̃_TL . The main idea of the proof is then the following. We first
derive an explicit expression for the inverse B̄_TL of B̃_TL , and we use it to prove that the
algebraic multiplicity of the eigenvalue 0 of B̃_TL A is equal to k as claimed. Next we use
the fact that the inverses of the nonzero eigenvalues of B̃_TL A are the finite eigenvalues of
the generalized eigenvalue problem B̄_TL z = μ A z . We then observe that this latter admits
ñ_c times the eigenvalue 1, and finally conclude the proof by showing that, regarding the
remaining n − ñ_c − k eigenvalues, this generalized eigenvalue problem is in fact equivalent
to (3.14).
The developments below require that X̃_c = R̃ X P̃ is nonsingular, where R̃ and P̃
are defined as in Lemma 3.2. This is, however, no loss of generality because, in case this
matrix would be singular, we can make the proof for a slightly perturbed case obtained
by exchanging X for X + ε I . Since both the eigenvalues of B_TL A and of (3.14) depend
continuously on ε , the needed results for the original matrices are obtained by considering
the limit for ε → 0 .
Before entering the core of the proof, it is also worth deriving some auxiliary results
related to

  P̄ = ( P̃   N_R )     and     R̄ = ( R̃^T   N_L )^T ,

where N_L and N_R are n × k matrices whose columns form a basis of, respectively, N (A^T)
and N (A) . By (3.9) and (3.11) (which apply because of (3.6)), these matrices are full
rank and such that P̂ = P̄ G_P and R̂ = G_R R̄ for some nonsingular G_P and G_R . It follows
that the nonsingularity of R̂ X P̂ implies that of R̄ X P̄ . Moreover:

  P̂ (R̂ X P̂)^{-1} R̂ = P̄ (R̄ X P̄)^{-1} R̄ ;     (3.15)

that is, it suffices to prove the required result for the particular case where P̂ = P̄ and
R̂ = R̄ . On the other hand, note for future reference that
  R̄ X P̄ = ( X̃_c          R̃ X N_R
             N_L^T X P̃    N_L^T X N_R ) .     (3.16)

This matrix being nonsingular, the nonsingularity of X̃_c then implies that of the Schur
complement S_c = N_L^T X N_R − N_L^T X P̃ X̃_c^{-1} R̃ X N_R .
We now enter the main part of the proof. Assuming thus that X̃_c is nonsingular, we
use Lemma 3.2 to show that the inverse of B̃_TL is B̄_TL = X − X P̃ X̃_c^{-1} R̃ (X − A) . Indeed,
setting Ã_c = R̃ A P̃ :

  B̄_TL B̃_TL = (X − X P̃ X̃_c^{-1} R̃ (X − A)) (X^{-1} + P̃ Ã_c^{-1} R̃ (I − A X^{-1}))
            = I + X P̃ (Ã_c^{-1} − X̃_c^{-1} − X̃_c^{-1} R̃ (X − A) P̃ Ã_c^{-1}) R̃ (I − A X^{-1})
            = I .
This allows us to prove the first claim, related to the multiplicity of the eigenvalue 0
of B_TL A . First, because B̃_TL is nonsingular, the geometric multiplicity is equal to k =
dim(N (A)) . Next, the algebraic multiplicity exceeds the geometric one if and only if
there exist nontrivial vectors z and v such that B̃_TL A z = N_R v , implying A z = B̄_TL N_R v
and therefore N_L^T B̄_TL N_R v = 0 ; but this is not possible because the matrix N_L^T B̄_TL N_R =
N_L^T (X − X P̃ X̃_c^{-1} R̃ X) N_R is equal to the Schur complement S_c that is shown above to
be nonsingular.
To pursue, the nonzero eigenvalues of B̃_TL A are thus the inverses of the finite eigenvalues
of

  (X − X P̃ X̃_c^{-1} R̃ (X − A)) z = μ A z .     (3.17)

Since R(N_L) is a left eigenspace associated with infinite eigenvalues, and R(R̃^T) is a left
eigenspace associated with the eigenvalue 1, the (right) eigenvectors associated with other
eigenvalues have to satisfy, besides (3.17),

  N_L^T (X − X P̃ X̃_c^{-1} R̃ X) z = 0     (3.18)

and

  R̃ A z = 0 .     (3.19)
On the other hand, exploiting a block factorization of (3.16), one sees that, for any z
satisfying (3.18),

  P̄ (R̄ X P̄)^{-1} R̄ X z
    = ( P̃  N_R ) ( I   −X̃_c^{-1} R̃ X N_R ) ( X̃_c^{-1}   0       ) ( I                      0 ) ( R̃ X z     )
                  ( 0    I                ) ( 0          S_c^{-1} ) ( −N_L^T X P̃ X̃_c^{-1}   I ) ( N_L^T X z )
    = ( P̃  N_R ) ( I   −X̃_c^{-1} R̃ X N_R ) ( X̃_c^{-1} R̃ X z )
                  ( 0    I                ) ( 0               )
    = P̃ X̃_c^{-1} R̃ X z ,

where the second equality holds because, by (3.18), N_L^T X z − N_L^T X P̃ X̃_c^{-1} R̃ X z = 0 .
Therefore, with the two conditions (3.18) and (3.19), the problem (3.17) is equivalent to

  (X − X P̄ (R̄ X P̄)^{-1} R̄ X) z = μ A z ,     (3.20)

which in fact amounts to (3.14) because of (3.15).
which in fact amounts to (3.14) because of (3.15).
It remains thus only to check the equivalence of the additional conditions. With respect
eT ) is a left
to (3.14), we state that we consider the nonzero eigenvalues. Because R(R
eigenspace associated with the eigenvalue 0 for (3.20), this indeed amounts to imposing
(3.19). On the other hand, the condition z ∈ V is in general not equivalent to (3.18).
However, N (A) is the null space of A , and is also included in the null space of the left
hand side matrix of (3.14). Therefore, if this latter equation is satisfied by some z , it
is also satisfied by z + n for the same eigenvalue µ whatever n ∈ N (A) . Hence, which
subspace V is used is unimportant regarding the eigenvalues, and we may thus consider
the subspace induced by the condition (3.18) providing that it is complementary to N (A)
13
as needed. Finally, this latter requirement is checked by observing that no vector in N (A)
can satisfy (3.18). Indeed, otherwise, one would have
−1 e
T
e
e
NL X − X P Xc R X NR v = 0
ec−1 R
e X NR is equal
for some non trivial v , which is not possible since NLT X − X Pe X
to the Schur complement Sc that is shown above to be nonsingular.
When applied to singular A , Theorem 3.3 first tells us that the algebraic multiplicity of
the eigenvalue 0 of the preconditioned matrix is equal to dim(N (A)) . Hence, Lemma 2.2
can be applied to analyze GMRES iterations, whereas condition (1) of Lemma 2.1 holds,
and hence the convergence of stationary iterations is guaranteed if the nonzero eigenvalues
λ of B A satisfy |1 − λ| < 1 , which holds if and only if the nonzero eigenvalues µ of (3.14)
satisfy |1 − µ−1 | < 1 .
3.4  The symmetric positive semidefinite case
Here we address the generalization of the results for SPD matrices originally obtained in
[?, ?, ?, ?, ?, ?]. We use the fact that these results can be recovered as corollaries of
the analysis for general nonsymmetric matrices considered in Section 3.3, even though the
original proofs were quite different. More precisely, this can be done in two steps. Firstly,
the sharp estimate in [?, ?] can be obtained by particularizing to the symmetric case
the results in [?, Theorem 2.1]. Next, as also discussed in [?, ?], the previous (non sharp)
estimates can be derived by combining the sharp bound with some further inequalities. We
accordingly proceed in two steps here: Theorem 3.4 below extends the bound in [?, ?] to
symmetric positive semidefinite matrices, whereas Theorem 3.5 generalizes the inequalities
that allow us to deduce more tractable bounds, in particular those that appeared earlier
in [?, ?, ?, ?].
To make things clear, the assumptions used in this subsection are:

    A is symmetric and positive (semi)definite ,   R = P^T                        (3.21)

and

    M_1 is SPD                                   if ν_1 = 1 , ν_2 = 0 ,
    M_2 is SPD                                   if ν_1 = 0 , ν_2 = 1 ,           (3.22)
    M_1 = M_2^T and M_1 + M_1^T − A is SPD       if ν_1 = ν_2 = 1 .
Then, the two potentially restrictive assumptions of Theorem 3.3 are automatically
satisfied. On the one hand, X in (3.13) is SPD and hence P̂^T X P̂ is always nonsingular. On
the other hand, (3.6) also always holds: because A positive semidefinite implies that A_c is
positive semidefinite as well, one has, for any v_c ,

    A_c v_c = 0 ⇐⇒ v_c^T (P^T A P) v_c = 0 ⇐⇒ (P v_c)^T A (P v_c) = 0 ⇐⇒ A (P v_c) = 0 ;
that is, any vector in the null space of Ac corresponds to a vector in R(P ) ∩ N (A) .
Regarding the case ν_1 = ν_2 = 1 , the positive definiteness of M_1 + M_1^T − A implies that
I − Y −1 A = (I − M1−T A)(I − M1−1 A) defines a convergent iteration, see [?, ?, ?]; moreover,
0 ≤ λ ≤ 1 ∀λ ∈ σ(Y −1 A) = σ(X −1 A) .
(3.23)
In practice, there are two important particular cases. First, one may use the same SPD
matrix M1 = M2 = M for pre- and post-smoothing, and then 2M − A is SPD as needed if
and only if I − M −1 A defines a convergent iteration. Second, one often uses symmetrized
Gauss–Seidel smoothing, for which M1 = upp(A) and M2 = low(A) ; then M1 + M1T − A =
diag(A) is SPD as soon as all diagonal entries of A are positive.
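The symmetrized Gauss–Seidel claim can be verified directly on a small singular M-matrix. The expression used below for the resulting X is the usual one, X = M_1 (M_1 + M_1^T − A)^{-1} M_1^T; this is an assumption of the sketch, since X is defined by (3.13) earlier in the paper.

```python
import numpy as np

# 1D zero-row-sum M-matrix (singular, null space = constants).
n = 6
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0

M1 = np.triu(A)        # upp(A): one sweep of the symmetrized Gauss-Seidel scheme
M2 = np.tril(A)        # low(A) = M1^T since A is symmetric

# M1 + M1^T - A collapses to diag(A), SPD as soon as the diagonal is positive.
assert np.allclose(M1 + M1.T - A, np.diag(np.diag(A)))
assert np.all(np.diag(A) > 0)

# Consequently X = M1 (M1 + M1^T - A)^{-1} M1^T is SPD as well.
X = M1 @ np.diag(1.0 / np.diag(A)) @ M1.T
assert np.all(np.linalg.eigvalsh(X) > 0)
```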
Our generalization of the sharp bounds for the SPD case can be stated as follows.
Theorem 3.4. Let the assumptions of Theorem 3.3 hold, select R̂ = P̂^T , and assume in
addition (3.21), (3.22) (which imply (3.6) and the nonsingularity of P̂^T X P̂). Then, B_TL A
has k times the eigenvalue 0 and n − k positive eigenvalues satisfying

    max_{λ∈σ(B_TL A)} λ ≤ max( 1 , max_{λ∈σ(Y^{-1}A)} λ ) ,                       (3.24)

    min_{λ∈σ(B_TL A), λ≠0} λ = min( 1 , K_X^{-1} ) ,                              (3.25)

where

    K_X = sup_{v∈R^n, v∉N(A)}  ( v^T X ( I − P̂ (P̂^T X P̂)^{-1} P̂^T X ) v ) / ( v^T A v ) .   (3.26)

Moreover, if ν_1 = ν_2 = 1 , B_TL is SPD and

    max_{λ∈σ(B_TL A)} λ = 1 ,                                                     (3.27)

    κ_eff = ( max_{λ∈σ(B_TL A)} λ ) / ( min_{λ∈σ(B_TL A), λ≠0} λ ) = K_X ,        (3.28)

    ρ_eff = 1 − min_{λ∈σ(B_TL A), λ≠0} λ = 1 − K_X^{-1} .                         (3.29)
Proof. The eigenvalue estimates are straightforward corollaries of (3.14) under the
given additional conditions.
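Theorem 3.4 can be illustrated numerically on a small compatible singular system. The sketch below assumes the standard definitions of the symmetrized smoother X and of the two-level iteration (both given earlier in the paper); it checks (3.27) and (3.28) for a zero-row-sum Laplacian, pairwise aggregation (so that P̂ = P), symmetrized Gauss–Seidel smoothing with ν_1 = ν_2 = 1, and a coarse system solved with a generalized inverse.

```python
import numpy as np

# Singular model problem: 1D Laplacian with zero row sums (null space = constants).
n = 8
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0

# Pairwise aggregation: R(P) contains the constant vector, so P^ = P here.
P = np.zeros((n, n // 2))
for i in range(n // 2):
    P[2 * i, i] = P[2 * i + 1, i] = 1.0

M1 = np.tril(A)                                  # pre-smoother (forward Gauss-Seidel)
M2 = M1.T                                        # post-smoother (backward Gauss-Seidel)
X = M1 @ np.linalg.inv(M1 + M1.T - A) @ M1.T     # symmetrized smoother, nu1 = nu2 = 1

I = np.eye(n)
Ac = P.T @ A @ P                                 # singular coarse grid matrix
Q = P @ np.linalg.pinv(Ac) @ P.T @ A             # coarse correction, generalized inverse
E = (I - np.linalg.solve(M2, A)) @ (I - Q) @ (I - np.linalg.solve(M1, A))
lam = np.sort(np.linalg.eigvals(I - E).real)     # spectrum of B_TL A

nonzero = lam[lam > 1e-8]                        # the k = 1 zero eigenvalue is filtered out
kappa_eff = nonzero.max() / nonzero.min()

# K_X of (3.26), evaluated on a basis Z of R(A) to quotient out the null space.
pi_hat = I - P @ np.linalg.solve(P.T @ X @ P, P.T @ X)
S = pi_hat.T @ X @ pi_hat
U, s, _ = np.linalg.svd(A)
Z = U[:, s > 1e-8]
KX = np.linalg.eigvals(np.linalg.solve(Z.T @ A @ Z, Z.T @ S @ Z)).real.max()

assert abs(nonzero.max() - 1.0) < 1e-8           # (3.27): largest eigenvalue is 1
assert abs(kappa_eff - KX) < 1e-6 * KX           # (3.28): kappa_eff equals K_X
```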
The matrix in the numerator of KX often has a complicated form (except if one deliberately chooses to analyze a simplified scheme with, say, only a single smoothing step). This
matrix can be seen as the product of X with projectors: letting π̂ = I − P̂ (P̂^T X P̂)^{-1} P̂^T X ,
one has X ( I − P̂ (P̂^T X P̂)^{-1} P̂^T X ) = π̂^T X π̂ . The idea behind more tractable estimates
(including earlier bounds) is to substitute a simpler matrix for X , and/or to use simpler
projectors. Of course, inequalities are then needed to deduce eigenvalue bounds from the
associated quantities. These are proved in the following theorem.
Theorem 3.5. Let the assumptions of Theorem 3.3 hold, select R̂ = P̂^T , and assume in
addition (3.21), (3.22) (which imply (3.6) and the nonsingularity of P̂^T X P̂).
(1) The quantity

    K_X = sup_{v∈R^n, v∉N(A)}  ( v^T X ( I − P̂ (P̂^T X P̂)^{-1} P̂^T X ) v ) / ( v^T A v ) ,   (3.30)

is also the smallest constant K such that

    ‖A (I − P A_c^g P^T A) v‖²_{X^{-1}} ≥ K^{-1} ‖(I − P A_c^g P^T A) v‖²_A    ∀v ∈ R(A) .   (3.31)

(2) Letting, for any projector π with null space equal to R(P̂) ,

    K_X(π) = sup_{v∈R^n, v∉N(A)}  ( v^T π^T X π v ) / ( v^T A v ) ,               (3.32)

there holds

    K_X = min_{π : projector with N(π)=R(P̂)} K_X(π)                              (3.33)

with arg min_π = I − P̂ (P̂^T X P̂)^{-1} P̂^T X . Moreover,

    ‖π‖²_X  inf_{v∉N(A)} ( v^T X v ) / ( v^T A v )  ≤  K_X(π)  ≤  K_X ‖π‖²_X .   (3.34)

(3) Let δ_X be the largest constant δ such that

    ‖A v‖²_{X^{-1}} ≥ δ ‖(I − P A_c^g P^T A) v‖²_A    ∀v ∈ R(A) .                 (3.35)

Setting

    π̃ = ( I − N (N^T X N)^{-1} N^T X ) ( I − P A_c^g P^T A ) ,                   (3.36)

where N is any n × k matrix whose columns form a basis of N(A) , there holds

    δ_X = K_X(π̃)^{-1} .                                                          (3.37)

(4) If ν_1 = ν_2 = 1 , δ_X is also the largest constant δ such that

    ‖(I − M_1^{-T} A) v‖²_A ≤ ‖v‖²_A − δ ‖(I − P A_c^g P^T A) v‖²_A    ∀v ∈ R^n .   (3.38)

(5) For any n × n SPD matrix G , defining K_G as in (3.30) with G instead of X ,
there holds

    ( min_{v≠0} ( v^T X v )/( v^T G v ) ) K_G  ≤  K_X  ≤  ( max_{v≠0} ( v^T X v )/( v^T G v ) ) K_G .   (3.39)
Proof. Preliminaries. Since A_c A_c^g is a projector onto R(A_c) [?, p. 52], letting N_c be a
matrix whose columns form a basis of N(A_c) , we have A_c A_c^g = I − E_c N_c^T for some E_c .
Similarly, A_c A_c^{gT} = (A_c^g A_c)^T (A_c is symmetric but A_c^g may be unsymmetric) is a projector
onto R(A_c) and, therefore, A_c A_c^{gT} = I − F_c N_c^T for some F_c . Letting Q = P A_c^g P^T A , one
has then A (I − Q) P = A P (I − A_c^g A_c) = A P N_c F_c^T = 0 (remember that R(P N_c) ⊂ N(A)).
Similarly, one finds P^T A (I − Q) = (I − A_c A_c^g) P^T A = 0 and P^T (I − Q^T) A = 0 , and one
may check that Q^T A Q = A P A_c^{gT} A_c A_c^g P^T A = A P (A_c A_c^g)^T A_c^g P^T A = A Q , which further
implies (I − Q^T) A (I − Q) = (I − Q^T) A .
Proof of statement (1). First note that if X = α X̃ for some positive α , then K_X =
α K_X̃ , whereas the smallest admissible K in (3.31) also scales linearly with α . Hence we
may assume without loss of generality that X is scaled in such a way that max_{λ∈σ(X^{-1}A)} λ ≤ 1 ,
which implies K_X^{-1} ≤ 1 . Consider then the generalized eigenvalue problem

    (I − Q^T) A X^{-1} A (I − Q) z = μ A z ,    z ∈ R(A) .                        (3.40)

Because A is SPD on R(A) and the left hand side matrix is symmetric nonnegative definite,
the eigenvalues are nonnegative and there exists a set of eigenvectors {z_i} forming a basis
of R(A) and further satisfying z_i^T A z_j = δ_ij . Moreover, multiplying both sides to the left
by P^T , one sees that P^T A z_i = 0 for all eigenvectors associated with positive eigenvalues
(as seen above, P^T (I − Q^T) A = 0). On the other hand, the eigenvectors z_i associated
with the eigenvalue 0 are such that (I − Q) z_i ∈ N(A) . Then expand v ∈ R(A) on this
eigenvector basis: v = Σ_i α_i z_i . Using the just seen properties of z_i , one finds

    ‖A (I − P A_c^g P^T A) v‖²_{X^{-1}} = Σ_{i: μ_i>0} |α_i|² μ_i    and    ‖(I − P A_c^g P^T A) v‖²_A = Σ_{i: μ_i>0} |α_i|² .

It follows that the smallest possible K in (3.31) is the inverse of the smallest nonzero
eigenvalue μ_min of (3.40), and one may further note that μ_min ≤ max_{λ∈σ(X^{-1}A)} λ ≤ 1 .
On the other hand, let zi be any eigenvector of (3.40) associated with a positive eigenvalue µi . Since, by the condition P T A zi = 0 , the equation (3.40) implies
A zi − (I − QT )A X −1 A zi = (I − QT )(I − A X −1 )(A zi ) = (1 − µi ) (A zi ) ,
one sees that xi = A zi is an eigenvector of the matrix
Z = (I − QT )(I − A X −1 )
associated with the eigenvalue 1 − µi . Conversely, since Z has N (A) as left eigenspace
associated with the eigenvalue 1, all its (right) eigenvectors xi associated with eigenvalues
different from 1 are orthogonal to N (A) ; i.e., they belong to the range of A and can
thus be written xi = A zi for zi ∈ R(A) . Since P T Z A = 0 , such zi satisfy P T A zi = 0
when the eigenvalue λi is nonzero, and are therefore eigenvectors of (3.40) with eigenvalue
µi = 1−λi . Below we show that Z has further the same eigenvalues as the iteration matrix
I − BTL A , and hence that these µi = 1 − λi coincide with the eigenvalues of BTL A that are
different from 0 and 1. Using K_X^{-1} ≤ 1 and (3.25) of Theorem 3.4 then proves the required
result.
To conclude this part of the proof, we now show that Z has the same eigenvalues as
I − B_TL A : Z has the same eigenvalues as its transpose

    Z^T = (I − X^{-1} A)(I − Q) = (I − M_1^{-1} A)^{ν_1} (I − M_2^{-1} A)^{ν_2} (I − Q)

which, by [?, Theorem 1.3.22], has the same spectrum as I − B_TL A .
Proof of statements (2), (4) and (5). Regarding (3.33) and (3.39), these are standard results for the regular case; see [?, ?, ?, ?]. We refer the reader to these works
because the related developments carry over straightforwardly to the singular case, since
they are based on inequalities involving only the numerators of the expressions defining K_X and K_X(π) . Similarly, the equivalence between (3.35) and (3.38) is based on

    ‖A v‖²_{X^{-1}} = v^T A (M^{-1} + M^{-T} − M^{-1} A M^{-T}) A v = v^T ( A − (I − A M^{-1}) A (I − M^{-T} A) ) v ,

which raises no particular comment for the singular case.
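The identity invoked here is purely algebraic and holds for any nonsingular M, singular A included, as a quick numerical check confirms:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 7

B = rng.standard_normal((n - 1, n))
A = B.T @ B                                  # symmetric positive semidefinite, singular

M = np.tril(A)                               # any nonsingular M works for the identity
Minv = np.linalg.inv(M)
Xinv = Minv + Minv.T - Minv @ A @ Minv.T     # X^{-1} for nu1 = nu2 = 1, M2 = M1^T

v = rng.standard_normal(n)
I = np.eye(n)
lhs = v @ A @ Xinv @ A @ v                   # ||A v||^2 in the X^{-1} inner product
rhs = v @ (A - (I - A @ Minv) @ A @ (I - Minv.T @ A)) @ v
assert np.isclose(lhs, rhs)
```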
Hence, regarding statements (2), (4) and (5), we are left with the proof of (3.34).
Noting that

    ‖π‖²_X = sup_{v∈R^n}  ( v^T π^T X π v ) / ( v^T X v ) ,

the left inequality immediately follows. Further, using π_X = I − P̂ (P̂^T X P̂)^{-1} P̂^T X , one
has π π_X = π , and hence

    v^T π^T X π v = v^T π_X^T π^T X π π_X v ≤ ‖π‖²_X  v^T π_X^T X π_X v ,

from which we deduce the right inequality.
Proof of statement (3). Remembering that (I − Q^T) A (I − Q) = (I − Q^T) A (see
Preliminaries), one may check that the largest possible δ in (3.35) is the inverse of the
largest eigenvalue of the generalized eigenvalue problem

    (I − Q^T) A v = μ A X^{-1} A v ,    v ∈ W ,                                   (3.41)

where W is any fixed subspace complementary to N(A) . On the other hand, the right
hand side of (3.32) with π = π̃ is the largest eigenvalue of

    (I − Q^T) X ( I − N (N^T X N)^{-1} N^T X ) (I − Q) v = μ A v ,    v ∈ W .     (3.42)

Next, R(X^{-1} A) is complementary to N(A) : if A X^{-1} A v = 0 while X^{-1} A v is nontrivial, w = A v is a nontrivial vector for which w^T X^{-1} w = 0 , which contradicts the fact
that X is SPD. We can thus select W = R(X^{-1} A) , and replace v in (3.42) by X^{-1} A w ,
where w is again a vector in W ; that is, (3.42) is equivalent to

    (I − Q^T) X ( I − N (N^T X N)^{-1} N^T X ) (I − Q) X^{-1} A w = μ A X^{-1} A w ,    w ∈ W .   (3.43)
Now, multiplying both sides of (3.43) to the left by P^T , the left hand side is zero because
P^T (I − Q^T) = F_c N_c^T P^T with R(P N_c) ⊂ N(A) (see Preliminaries). Hence any eigenvector w of (3.43) corresponding to a nonzero eigenvalue satisfies P^T A X^{-1} A w = 0 . Similarly, any eigenvector v of (3.41) corresponding to a nonzero eigenvalue satisfies P^T A X^{-1} A v = 0 .
Further, when P^T A X^{-1} A w = 0 holds, (I − Q) X^{-1} A w = ( I − P A_c^g P^T A ) X^{-1} A w = X^{-1} A w
and (3.43) amounts to

    (I − Q^T) X ( I − N (N^T X N)^{-1} N^T X ) ( I − P A_c^g P^T A ) X^{-1} A w
        = (I − Q^T) ( I − X N (N^T X N)^{-1} N^T ) A w
        = (I − Q^T) A w ,

showing that the problems (3.41) and (3.43) have indeed the same nonzero eigenvalues and
associated eigenvectors.
As for general nonsymmetric matrices, the obtained results can be applied to nonsingular A , and one then simply recovers the known results for the regular case. For instance,
McCormick's bound [?] (extended to the singular case in [?]) amounts to analyzing the
constant δ in either (3.35) or (3.38). The inequality K_X ≤ δ_X^{-1} (which can be deduced
by combining (3.37) with (3.33)) extends to the singular case a well known fact,
whereas (3.37) and (3.34) (applied to π̃), which originate from [?, ?], allow us to see more
closely the relationship between this bound and sharp two-level estimates.
It follows that the analyses that exploit these results in the regular case carry over
smoothly to the singular case. For instance, classical AMG methods [?, ?, ?] were originally
developed for symmetric M-matrices, motivated by an analysis of the constant Kdiag(A)
combined with an analysis of the constant in the right inequality (3.39) (referred to as
smoothing property). Below we show explicitly how these results can easily be extended to
singular M-matrices.
Application to classical AMG
We show here how the above theorems can be used to extend to singular systems the
classical two-level convergence estimate for Ruge–Stüben AMG methods [?, ?, ?]. This
analysis applies when the system matrix A is a symmetric M-matrix with nonnegative
row sum; hence we consider here symmetric M-matrices with zero row sum (singular
M-matrices with nonnegative row sum always have zero row sum). We further assume that
A is irreducible, implying that the null space has dimension 1 and is thus spanned by the
constant vector.
Now, the convergence proof is based on the approach just mentioned, where one combines an analysis of K_D with the smoothing property analysis. Regarding the latter,
observe that both X and D in (3.39) are SPD in the singular case as well. Hence available results carry over straightforwardly to the singular case. For instance, for symmetric
M-matrices, it can be shown that this constant is at most 4 when X corresponds to symmetrized Gauss–Seidel smoothing (i.e., forward Gauss–Seidel as pre-smoother and backward Gauss–Seidel as post-smoother); see, e.g., Theorem A.3.1 in [?] and the subsequent
discussion, which can be applied verbatim to singular M-matrices.
We thus now focus on the analysis of KD , which, see Theorem 3.4, also directly governs
the convergence of a simplified two-level scheme with a single step of (damped) Jacobi
smoothing.
The AMG algorithm first partitions the set of unknowns into coarse and fine unknowns.
Next, a prolongation P is set up that has the form

    P = [ J_FC ]
        [ I_CC ] ,

where I_CC is the identity for the coarse unknowns and J_FC an interpolation operator for
the fine unknowns. For the considered class of matrices, the row sum of J_FC should be 1
when the row sum of the corresponding row of A is zero; that is, everywhere in the singular
case. Then, the range of P contains the constant vector, i.e., the null space of A , and we
can therefore apply our theoretical results with P̂ = P . In fact, in this case the "direct"
interpolation rule [?, p. 448] amounts to associating to each fine unknown i a set P_i of
coarse neighboring unknowns, and setting, for all j ,

    (J_FC)_ij = |a_ij| / Σ_{k∈P_i} |a_ik|    if j ∈ P_i ,
              = 0                            otherwise .
Now, the coarsening algorithms, which define the fine/coarse partitioning and select the
set P_i for each fine unknown, are actually designed in such a way that the following constant
is only moderately larger than 1:

    τ = max_{i∈F}  ( Σ_{k≠i} |a_ik| ) / ( Σ_{k∈P_i} |a_ik| ) .
This is motivated by theoretical considerations: one finds, for all v ∈ R^n (using the Schwarz
inequality and remembering that a_ij ≤ 0 for i ≠ j whereas, in the singular case, a_ii =
Σ_{k≠i} |a_ik| for all i),

    Σ_{i∈F} a_ii ( v_i − Σ_{j∈P_i} (J_FC)_ij v_j )²
        = Σ_{i∈F} a_ii ( Σ_{j∈P_i} ( |a_ij| / Σ_{k∈P_i} |a_ik| ) (v_i − v_j) )²
        ≤ Σ_{i∈F} a_ii ( Σ_{j∈P_i} |a_ij| / Σ_{k∈P_i} |a_ik| ) ( Σ_{j∈P_i} ( |a_ij| / Σ_{k∈P_i} |a_ik| ) (v_i − v_j)² )
        ≤ τ Σ_{i∈F} Σ_{j∈P_i} |a_ij| (v_i − v_j)²
        ≤ τ Σ_{i∈F} Σ_{j∈C} |a_ij| (v_i − v_j)²
        ≤ −(τ/2) Σ_i Σ_j a_ij (v_i − v_j)²
        = τ v^T A v .
And, with the help of Theorem 3.5, nothing more is needed to show that
KD ≤ τ .
Indeed, a valid "simplified" projector in (3.32) is

    π = I − P [ 0  I_CC ] = [ I_FF  −J_FC ]
                            [  0      0   ] ,

for which one has

    K_D(π) = sup_{v∈R^n, v∉N(A)}  ( Σ_{i∈F} a_ii ( v_i − Σ_{j∈P_i} (J_FC)_ij v_j )² ) / ( v^T A v ) .
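The chain of inequalities above can be checked numerically. The sketch below (with an illustrative model matrix and interpolation sets, not taken from the paper) computes τ and the largest eigenvalue of the pencil associated with K_D(π), and verifies K_D ≤ K_D(π) ≤ τ.

```python
import numpy as np

# Illustrative singular M-matrix and direct interpolation sets.
n = 7
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] = A[-1, -1] = 1.0
fine = [1, 3, 5]
Pi = {1: [0, 2], 3: [2, 4], 5: [4, 6]}

# tau: worst ratio of total off-diagonal mass to the mass kept in P_i.
tau = max(sum(abs(A[i, k]) for k in range(n) if k != i) /
          sum(abs(A[i, k]) for k in Pi[i]) for i in fine)

# Simplified projector pi = [[I_FF, -J_FC], [0, 0]] (coarse rows are zero).
pi_mat = np.zeros((n, n))
for i in fine:
    w = sum(abs(A[i, k]) for k in Pi[i])
    pi_mat[i, i] = 1.0
    for j in Pi[i]:
        pi_mat[i, j] = -abs(A[i, j]) / w

# K_D(pi): largest eigenvalue of the pencil (pi^T D pi, A) on range(A).
D = np.diag(np.diag(A))
S = pi_mat.T @ D @ pi_mat
U, s, _ = np.linalg.svd(A)
Z = U[:, s > 1e-8]
KD_pi = np.linalg.eigvals(np.linalg.solve(Z.T @ A @ Z, Z.T @ S @ Z)).real.max()

assert KD_pi <= tau + 1e-9            # K_D <= K_D(pi) <= tau
```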
3.5  The nonsymmetric positive semidefinite case
Besides the case considered above where the system matrix is symmetric and positive
semidefinite, more insight can also be gained for nonsymmetric matrices which are positive
(semi)definite in Rn ; that is, for matrices A such that
vT A v ≥ 0
∀v ∈ Rn ,
(3.44)
or, equivalently, such that the symmetric part A_S = (1/2)(A + A^T) is positive (semi)definite.
For nonsingular matrices, this case is discussed in [?, ?], and we want to extend here these
results to the singular case, taking benefit from the fact that they are corollaries of the
general result for nonsymmetric matrices considered in the preceding subsection.
Note that with (3.44), v^T (A + A^T) v = 0 is possible only for null space vectors of A + A^T .
This implies that the left and right null spaces of A coincide, and thus further coincide with
the null space of A + A^T :

    A v = 0 ⇒ v^T (A + A^T) v = 0 ⇒ (A + A^T) v = 0 ⇒ A^T v = 0 .

Thus, if R = P^T , one may always select R̂ such that R̂ = P̂^T .
Now, to obtain a clear localization of the eigenvalues from Theorem 3.3, we need not
only to assume (3.44) and R = P T , but also that X is SPD. This latter assumption is
particularly restrictive and means that, in practice, the analysis is in general possible only
for a simplified two-level scheme using only a single (either pre- or post-) smoothing step
with a simple smoother such as damped Jacobi (then X is equal to either M1 or M2 ).
Altogether, the additional assumptions used in the following theorem are thus
A is positive (semi)definite in Rn , R = P T , and X is SPD .
(3.45)
Theorem 3.6. Let the assumptions of Theorem 3.3 hold, assume in addition (3.45), and
select R̂ = P̂^T . Then, B_TL A has k times the eigenvalue 0, ñ_c times the eigenvalue 1, and
the n − ñ_c − k other eigenvalues λ satisfy

    |λ − α_X| ≤ α_X ,                                                             (3.46)

    ℜe(λ) ≥ K_X^{-1} ,                                                            (3.47)

where, setting A_S = (1/2)(A + A^T) ,

    α_X = sup_{w∈C^n\N(A), P̂^T A w = 0}  ( w^T A^T X^{-1} A w ) / ( 2 w^T A_S w ) ,   (3.48)

    K_X = sup_{v∈R^n, v∉N(A)}  ( v^T X ( I − P̂ (P̂^T X P̂)^{-1} P̂^T X ) v ) / ( v^T A v ) .   (3.49)

Moreover, if, for some number α , there holds

    ‖(α I − X^{-1} A) w‖_X ≤ α ‖w‖_X    ∀w : P̂^T A w = 0 ,                       (3.50)

then α_X ≤ α .
Proof. Exploiting (3.14), the proof of (3.47) given in Corollary 2.2 of [?] applies verbatim to the singular case as well.
Next, any v satisfying (3.14) for μ ≠ 0 also satisfies P̂^T A v = 0 (and hence v^T A^T P̂ = 0^T ),
as seen by multiplying both sides to the left by P̂^T . Then, multiplying both sides of
(3.14) to the left by v^T A^T X^{-1} yields, with μ = λ^{-1} ,

    v^T A^T v = λ^{-1} v^T A^T X^{-1} A v .
Thus, since v ∈ V implies v ∉ N(A) ,

    ℜe(λ^{-1}) = ℜe(v^T A^T v) / (v^T A^T X^{-1} A v) = (v^T A_S v) / (v^T A^T X^{-1} A v) ≥ β^{-1} ,

where β = 2 α_X . The last inequality is in fact equivalent to (3.46) because

    ℜe( (λ_R + i λ_I)^{-1} ) = λ_R / (λ_R² + λ_I²) ≥ 1/β   ⟺   ( λ_R − β/2 )² + λ_I² ≤ β²/4 .
Finally,

    ( w^T A^T X^{-1} A w ) / ( w^T A_S w ) ≤ 2 α
        ⟺   w^T A^T X^{-1} A w ≤ α w^T (A + A^T) w
        ⟺   w^T ( (α I − A^T X^{-1}) X (α I − X^{-1} A) − α² X ) w ≤ 0
        ⟺   ‖(α I − X^{-1} A) w‖_X ≤ α ‖w‖_X ,

showing that (3.50) implies α_X ≤ α as claimed.
4  Problems with large-dimensional null space
At first sight, our analysis mainly provides theoretical bases for the extension of methods initially
developed for regular problems. However, going into the details, it turns out that obtaining
satisfactory convergence can actually be easier for singular problems, at least in the positive
semidefinite case. Indeed, null space vectors need not be included in the prolongation
P : if they are not, they are added to form the extended prolongation P̂ , so that K_X in
(3.26) is finite in any case.
On the contrary, if a positive semidefinite matrix is perturbed to become positive definite, all null space vectors become near null space ones, and they must be well approximated
in the range of P . This can be seen with the help of Theorem 3.5, considering for instance
A_ε = A_0 + ε I , where A_0 is symmetric and positive semidefinite and ε a positive parameter.
For any null space vector v of A_0 , (3.30) implies that (using P̂ = P since the problem is
nonsingular for ε > 0)

    K_I ≥ ( v^T (I − P (P^T P)^{-1} P^T) v ) / ( ε v^T v ) = ε^{-1} ‖(I − P (P^T P)^{-1} P^T) v‖² / ‖v‖² .
The numerator of the right hand side is the squared norm of the orthogonal projection of v
onto R(P )⊥ and is thus sufficiently small to compensate for the ε−1 only if v has a close
approximation in R(P ) . If not, KI will be large, which also means, in view of (3.39),
that KX will be large for any well conditioned smoother. Since smoothers used with AMG
methods are typically well conditioned, we have thus in the regular case an additional
constraint with respect to the singular case.
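A small experiment makes the ε^{-1} blow-up concrete: with an injection prolongation whose range misses the constant vector, K_I grows like ε^{-1}. The setup is illustrative, not the paper's experiment.

```python
import numpy as np

n = 8
A0 = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A0[0, 0] = A0[-1, -1] = 1.0         # zero row sums: N(A0) = constants
P = np.eye(n)[:, ::2]               # injection: R(P) misses the constant vector

v = np.ones(n)                      # null space vector of A0
Proj = np.eye(n) - P @ np.linalg.solve(P.T @ P, P.T)   # orthogonal projector onto R(P)-perp

KI = {}
for eps in (1e-2, 1e-4):
    lower = (v @ Proj @ v) / (eps * v @ v)             # the lower bound derived above
    KI[eps] = np.linalg.eigvals(
        np.linalg.solve(A0 + eps * np.eye(n), Proj)).real.max()
    assert KI[eps] >= (1 - 1e-10) * lower

assert KI[1e-4] > 50 * KI[1e-2]     # K_I grows like 1/eps
```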
Now, so far multilevel methods have (to the best of our knowledge) only been applied to
matrices with a low-dimensional null space, for which the above observation is a mathematical curiosity without practical consequence. However, it can be crucial for the successful
application of multilevel methods to problems with a much larger null space. An example of
such a problem is offered by the discretization of
curl ν curl E + σ E = f
in Ω ⊂ R3 ,
(4.1)
using Whitney–Nédélec edge finite elements [?, ?]. This problem arises as a special case
of the Maxwell equations, with then ν > 0 and σ ≥ 0 ; see [?, ?] for details, where the
multigrid solution of the resulting linear system is also discussed in the case σ > 0 . This
system
(Kν + σ M ) u = b
is symmetric positive definite when σ > 0 but only positive semidefinite when σ = 0 . This
latter case corresponds to magnetostatics problems [?, ?], for which the null space is huge:
at the continuous level, any vector field E corresponding to the gradient of a function ϕ is
in the kernel of the curl operator. At the discrete level, this implies that the dimension of
the null space of Kν is large (about one third of the matrix size in the numerical example
below). Perhaps for this reason, so far, multilevel methods have been designed only for the
regular case (σ > 0), even though, as in [?], the final goal is to solve the singular instance,
σ playing the role of a regularization parameter.
Clearly, this huge null space of Kν makes the design of multilevel methods especially
challenging for small positive σ . In fact, since it is impractical to have all null space vectors
of Kν in the range of P , state of the art methods resort to specialized smoothers; see, e.g.,
[?, ?]. Now, what prevented, e.g. in [?], the direct application of the so defined approach
to the singular case (σ = 0)? We identify two obstacles.
1. The structure of the coarse grid matrix is similar to that of the fine grid one; i.e.,
the coarse grid matrix has also a large-dimensional null space, and issues associated
with singular coarse grid matrices may then seem nontrivial.
2. The hybrid smoother used, taken from [?], is not properly defined for σ = 0 .
However, our results offer proper answers to both these concerns. Regarding the first one,
we know that the coarse systems are compatible and may be solved with any generalized
inverse; i.e., there is no real issue here.
Regarding the second concern, we observe that the initial motivation for the hybrid
smoother is the proper handling of near null space vectors not in the range of P . In
the singular case, these near null space vectors become true null space vectors, and hence
are in any case in the range of the extended prolongation P̂ . Thus the specific reason
why a specialized smoother is needed disappears, and it is sensible to expect satisfactory
convergence with classical (Gauss-Seidel or damped Jacobi) smoothers. Technically, it
means that one uses the same smoother as in [?, ?], but skipping the “hybrid part” when
σ = 0.
    ------------------------------------------------------------------
                        Damped Jacobi          Sym. Gauss–Seidel
    σ                   0.01   0.01    0       0.01   0.01    0
    Smoother            Std    Hyb    Std      Std    Hyb    Std
    ------------------------------------------------------------------
    h^{-1} = 20         1.00   0.84   0.84     1.00   0.72   0.72
    h^{-1} = 40         1.00   0.85   0.85     1.00   0.76   0.76
    ------------------------------------------------------------------

Table 1: ρ_eff for the singular and regularized variants of the discrete problem, using a
two-level method with different smoothing schemes: "Std" refers to standard smoothing
as defined in this work, and "Hyb" to the hybrid smoother described in [?, ?, ?].
We tested this idea on the linear system resulting from the finite element discretization
of (4.1) where Ω is the unit cube, using a uniform mesh of hexahedral (cubic) edge elements
with edge length h . We use a two-level method as defined in this paper with a prolongation
based on nodal aggregation as described in [?]; we refer to [?, Chapter 6] for more details on
the prolongation setup, and, more generally, for a thorough description of the experiment
setting. In Table 1, we report the effective spectral radius for both the singular case
(σ = 0) and the regularized one, comparing for this latter the hybrid smoother with the
standard one (only the standard smoother is well defined when σ = 0). For damped Jacobi
smoothing, we consider no pre-smoothing and a single post-smoothing step with damping
factor ω = 1/3 , which ensures in this example that (3.23) holds, and hence that (3.29) is
actually satisfied with both damped Jacobi and symmetrized Gauss-Seidel smoothers.
The results confirm the expectations: standard smoothers just perform as well for the
singular problem as their hybrid counterpart in the regularized case. Hence if the goal is
to solve the singular system, it is neither needed nor useful to regularize the problem and
further use a specific smoother to tackle the induced near singularity.
5  Conclusions
We have provided a generalization to singular matrices of all the main theoretical results
constitutive of the approach developed in [?, ?, ?, ?, ?, ?, ?]. This has been done in a
way that allows the extension to singular problems of the many methods inspired by these
results. Comparing, for instance, Theorem 3.4 applied to singular and nonsingular A , one
sees that the main convergence parameters will be essentially the same for two nearby
matrices, provided that the extended prolongation P̂ to be used in the singular case is
equal to the standard prolongation P for the nonsingular one, which happens when the
range of the prolongation contains the null space of the singular instance and thus the near
null space of nearby nonsingular matrices.
Moreover, we have shown that singularities arising in the coarse grid matrices are mostly
harmless, and, further, that the course of the iterations is often independent of the choice
of the generalized inverse; i.e., independent of which particular solution of the coarse grid
system is picked. A noteworthy exception is the application to Markov chains. Then,
the convergence may depend upon the choice of the generalized inverse, which deserves
further investigations.
On the other hand, our results show that the convergence conditions can actually be
weaker in the singular case, because null space vectors need not be well approximated
in the range of the prolongation, as required for near null space vectors in the regular case.
Discussing an example, we showed how this observation can lead to the development of
more efficient algorithms for singular problems in the case where the null space is large-dimensional.
Acknowledgments
Anonymous referees are acknowledged for suggestions that helped to improve the original
manuscript. I thank Artem Napov for fruitful discussions and the code that allowed the
numerical experiment reported in Section 4 to be run.