
SYMMETRIC INDEFINITE PRECONDITIONERS FOR SADDLE POINT
PROBLEMS WITH APPLICATIONS TO PDE-CONSTRAINED
OPTIMIZATION PROBLEMS
JOACHIM SCHÖBERL∗ AND WALTER ZULEHNER†
Abstract. We consider large scale sparse linear systems in saddle point form, or, equivalently, large scale
quadratic optimization problems with linear equality constraints. A natural property in this context is the coercivity
of the cost functional only on the set satisfying the constraints. To enforce the coercivity everywhere, an augmented
Lagrangian approach is usually chosen. However, the adjustment of the involved parameters is a critical issue.
We will present a different approach that is not based on such an explicit augmentation technique. A symmetric
and indefinite preconditioner for the saddle point problem will be constructed and analyzed. Under appropriate
conditions the preconditioned saddle point system is symmetric and positive definite with respect to a particular
scalar product. Therefore, conjugate gradient acceleration can be used.
If applied to a typical infinite-dimensional optimization problem from optimal control with an L2-type cost
functional and an elliptic state equation with distributed control, it is shown that the convergence rate for solving
the discretized problem with the proposed preconditioned conjugate gradient method is independent of the mesh
size of the underlying subdivision and also independent of some involved regularization parameter. Numerical
experiments are presented for such a model problem in order to illustrate the theoretical results.
Key words. saddle point problems, indefinite preconditioners, conjugate gradient methods, optimization
problems with PDE-constraints
AMS subject classifications. 65F10, 15A12, 49M15
1. Introduction. In this paper we consider large scale sparse linear systems of equations in
saddle point form
    [ A   B^T ] [ x ]   [ f ]
    [ B    0  ] [ p ] = [ g ],
where A is a real, symmetric and positive semi-definite n-by-n matrix, B is a real m-by-n matrix
with full rank m ≤ n, and B^T denotes the transpose of B. Such systems typically result
from the discretization of mixed variational problems for systems of partial differential equations
(in short: PDEs), see Brezzi and Fortin [4], in particular, from the discretization of optimization
problems with PDE-constraints. A natural property of such a problem is that A is positive definite
on the kernel of B, i.e.:
    (Ax, x) > 0   for all x ∈ ker B with x ≠ 0,                              (1.1)
where (x, y) denotes the Euclidean scalar product. This condition guarantees in combination with
the full rank of B that the matrix
    K = [ A   B^T ]
        [ B    0  ]
is non-singular.
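As a minimal numerical illustration of this situation (ours, not from the paper; Python with NumPy is assumed), a matrix A that is singular on the whole space but positive definite on ker B still yields a non-singular K:

    import numpy as np

    A = np.diag([2.0, 1.0, 0.0])      # symmetric, positive SEMI-definite, singular
    B = np.array([[0.0, 0.0, 1.0]])   # full row rank; ker B = span{e1, e2}
    # On ker B = {x : x3 = 0} we have (Ax, x) = 2 x1^2 + x2^2 > 0, so (1.1) holds.
    K = np.block([[A, B.T], [B, np.zeros((1, 1))]])
    print(np.linalg.matrix_rank(A))   # 2: A itself is singular
    print(abs(np.linalg.det(K)))      # 2.0: K is nevertheless non-singular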
As long as the matrix A is positive definite not only on ker B but on the whole space R^n,
the Schur complement S = B A^{-1} B^T is well-defined. Then several approaches for an efficient
solution procedure have been proposed, see the review article by Benzi, Golub and Liesen [2]. In
this paper, however, we will focus on systems where A is positive definite in a stable way (to be
specified later) only on ker B, a typical situation for certain classes of optimization problems with
PDE-constraints.
∗ Center for Computational Engineering Science, RWTH Aachen, D-52074 Aachen, Germany ([email protected]).
† Institute of Computational Mathematics, Johannes Kepler University, A-4040 Linz, Austria ([email protected]). The work was supported in part by the Austrian Science Foundation (FWF) under the grant SFB F013.
One strategy to enforce the definiteness on the whole space R^n is the augmented Lagrangian
approach, where the matrix A is replaced by a matrix of the form A_W = A + B^T W B with an
appropriate matrix W, see e.g. Fortin and Glowinski [5]. This does not change the solution of the
equation, and A_W becomes positive definite if W is positive definite. It is, however, a delicate issue
to choose the matrix W in order to obtain good convergence properties, see the discussion in Golub
and Greif [6]. Another strategy is to use positive definite block diagonal preconditioners, either
applied to the original system, see e.g. Rusten and Winther [8], Silvester and Wathen [9], or to the
augmented system, see e.g. Golub, Greif and Varah [7]. Then the preconditioned system remains
indefinite, so the conjugate gradient method is not applicable; other Krylov subspace
methods suitable for indefinite systems, such as MINRES, have been proposed to accelerate the
iteration. In Bramble and Pasciak [3] a preconditioning technique was introduced leading to a
symmetric and positive definite system, which can then be solved by the conjugate gradient method.
However, this technique requires a symmetric and positive definite approximation Â with Â < A.
Thus it could be applied only after augmentation, if A is not positive definite on the whole space
R^n.
In this paper we will take a different approach and construct preconditioners K̂ for the original
system matrix K (without augmentation) which nevertheless also work well in the case that A is
positive definite only on the kernel of B. It will be shown that the preconditioned matrix K̂^{-1}K
is even symmetric and positive definite in some appropriate scalar product. Therefore, conjugate
gradient acceleration can be applied. In contrast to Bramble and Pasciak [3], this new technique
requires a symmetric and positive definite approximation Â with Â > A, which is easier to achieve
and can be applied even if A is only positive definite on the kernel of B.
The paper is organized as follows: In Section 2 the class of preconditioners is introduced and
analyzed. Section 3 describes how the algebraic conditions for the preconditioners are linked to
the conditions of the theorem of Brezzi for mixed variational problems, and a general strategy for
constructing the preconditioners is sketched. In Section 4 a model problem from optimal control
is discussed and preconditioners are constructed which are robust with respect to the mesh size as
well as to the involved regularization parameter. Numerical experiments are presented in Section
5, followed by some concluding remarks.
Throughout the paper the following notations are used: M < N (N > M) iff N − M is
positive definite, and M ≤ N (N ≥ M) iff N − M is positive semi-definite, for symmetric matrices
M and N. For a symmetric and positive definite matrix M the associated scalar product (v, w)_M
and norm ‖v‖_M are given by

    (v, w)_M = (M v, w)   and   ‖v‖_M = (v, v)_M^{1/2},

where (v, w) (without index) denotes the Euclidean scalar product. The Euclidean norm of a
vector v is denoted by ‖v‖ (without index).
2. A Class of Symmetric and Indefinite Preconditioners. A well-known class of preconditioners is given by

    K̂ = [ Â   B^T               ]
         [ B   B Â^{-1} B^T − Ŝ ],
where Â and Ŝ are symmetric and positive definite matrices, see Bank, Welfert and Yserentant
[1]. More precisely, we will assume that Â and Ŝ are preconditioners, i.e., an efficient evaluation
of Â^{-1}w and Ŝ^{-1}q is available.
We have the following factorization

    K̂ = [ I          0 ] [ Â   B^T ]
         [ B Â^{-1}   I ] [ 0   −Ŝ  ],

which implies that K̂ is non-singular and that the solution of a linear system

    K̂ [ w ] = [ s ]
       [ q ]   [ t ]
reduces to the consecutive solution of the following three linear systems:
    Â ŵ = s,
    Ŝ q  = B ŵ − t,
    Â w  = s − B^T q.

So, one application of the preconditioner K̂ requires two applications of the preconditioner Â and
one application of the preconditioner Ŝ.
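As an illustration, a minimal Python sketch of this three-solve application; the helpers solve_Ahat and solve_Shat are hypothetical stand-ins for whatever realizes the actions of Â^{-1} and Ŝ^{-1}:

    import numpy as np

    def apply_Khat_inverse(s, t, B, solve_Ahat, solve_Shat):
        """One application of K̂^{-1}: solve K̂ (w, q) = (s, t)."""
        w_hat = solve_Ahat(s)              # Â ŵ = s
        q = solve_Shat(B @ w_hat - t)      # Ŝ q  = B ŵ - t
        w = solve_Ahat(s - B.T @ q)        # Â w  = s - B^T q
        return w, q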
In Bank, Welfert and Yserentant [1] and later in Zulehner [10], this preconditioner has been
analyzed for the case that A is positive definite. One important part of the analysis easily carries
over to the case considered here:
Theorem 2.1. Assume that A ≥ 0, condition (1.1) is satisfied, and rank B = m. Let Â > 0
and Ŝ > 0.
1. If

    Â ≥ A   and   Ŝ ≤ B Â^{-1} B^T,                                          (2.1)

then all eigenvalues of K̂^{-1}K are real and positive.
2. If

    Â > A   and   Ŝ < B Â^{-1} B^T,                                          (2.2)

then K̂^{-1}K is symmetric and positive definite with respect to the scalar product

    ( (x, p), (w, q) )_D = ((Â − A)x, w) + ((B Â^{-1} B^T − Ŝ)p, q).          (2.3)
Proof. Apply Theorem 5.2 from Zulehner [10] to the regularized matrices A + εI and Â + εI
for ε > 0, take the limit ε → 0 and observe that K is non-singular.
However, the estimates for the extreme eigenvalues of K̂^{-1}K were derived under the assumption that A is positive definite on the whole space. In particular, the estimate for the smallest
eigenvalue degenerates if directly applied to the case considered here. In this paper this gap will
be closed.
First of all, we have to discuss reasonable assumptions on Â and Ŝ, which measure the quality
of these preconditioners. Comparing the matrix K and the preconditioner K̂ it seems natural
to consider Â as an approximation to A, at least on ker B, and to consider Ŝ as an approximation
to the so-called inexact Schur complement H, given by

    H = B Â^{-1} B^T.

In order to quantify the quality of these approximations, we introduce two positive constants α
and β with

    (Ax, x) ≥ α (Âx, x)   for all x ∈ ker B   and   B Â^{-1} B^T ≤ β Ŝ.

Observe that we will still require at least condition (2.1), therefore α ≤ 1 and β ≥ 1. The closer
α and β are to 1, the better we expect the preconditioner K̂ to perform.
Now we have
Theorem 2.2. Assume that A ≥ 0, condition (1.1) is satisfied, and rank B = m. Let Â > 0
and Ŝ > 0 with

    α (Âx, x) ≤ (Ax, x)   for all x ∈ ker B   and   A ≤ Â,                   (2.4)
and

    Ŝ ≤ B Â^{-1} B^T ≤ β Ŝ.                                                  (2.5)

Then

    λ_max(K̂^{-1}K) ≤ β + √(β² − β)

and

    λ_min(K̂^{-1}K) ≥ (1/2) [ 2 + α − 1/β − √((2 + α − 1/β)² − 4α) ].
Proof. The upper bound directly follows from Theorem 5.2 in Zulehner [10], again by considering the regularized matrices A + εI and Â + εI for ε > 0 with ε → 0.
For the lower bound we consider an eigenvalue λ of the matrix K̂^{-1}K:

    K [ x ] = λ K̂ [ x ]
      [ p ]        [ p ],
which is equivalent to the eigenvalue problem

    K [ x ] = µ D [ x ]
      [ p ]       [ p ]

with

    λ = µ/(1 + µ)   and   D = K̂ − K = [ Â − A   0                 ]
                                        [ 0       B Â^{-1} B^T − Ŝ ],
or, in an equivalent variational form:

    (Ax, w) + (Bw, p) = µ ((Â − A)x, w)              for all w ∈ R^n,
    (Bx, q)           = µ ((B Â^{-1} B^T − Ŝ)p, q)   for all q ∈ R^m.
Now, two cases are distinguished. First, consider the case µ ≤ 0: since λ must be positive by
Theorem 2.1, necessarily µ < −1 (the case µ = −1 is excluded since K̂ is non-singular), and then
λ = µ/(1 + µ) > 1. So, in this case, the eigenvalues λ are bounded from below by 1.
Next, we consider the remaining case µ > 0. Let

    W = ker B,   W^⊥ = {x ∈ R^n : (Âx, w) = 0 for all w ∈ W}.

Then there is a unique representation of x of the following form:

    x = x1 + x2   with x1 ∈ W and x2 ∈ W^⊥.
Now the variational form reads:

    (Ax1, w1) + (Ax2, w1)            = µ [ ((Â − A)x1, w1) − (Ax2, w1) ]    for all w1 ∈ W,
    (Ax1, w2) + (Ax2, w2) + (Bw2, p) = µ [ −(Ax1, w2) + ((Â − A)x2, w2) ]   for all w2 ∈ W^⊥,
    (Bx2, q)                         = µ ((B Â^{-1} B^T − Ŝ)p, q)           for all q ∈ R^m.

From the first equation we obtain for w1 = x1:

    α (x1, x1)_Â ≤ (Ax1, x1) = µ ((Â − A)x1, x1) − (µ + 1)(Ax2, x1).
Using

    (Aw2, w1) = −((Â − A)w2, w1) ≤ ((Â − A)w1, w1)^{1/2} ((Â − A)w2, w2)^{1/2}
              ≤ √(1 − α) ‖w1‖_Â ‖w2‖_Â   for all w1 ∈ W, w2 ∈ W^⊥,

it follows that

    α (x1, x1)_Â ≤ µ (1 − α) (x1, x1)_Â + (µ + 1) √(1 − α) ‖x1‖_Â ‖x2‖_Â,

which implies

    α ‖x1‖_Â ≤ µ (1 − α) ‖x1‖_Â + (µ + 1) √(1 − α) ‖x2‖_Â.
From the second equation we obtain

    sup_{w2 ∈ W^⊥} (Bw2, p)/‖w2‖_Â
        = sup_{w2 ∈ W^⊥} [ −(µ + 1)(Ax1, w2) + ((µ(Â − A) − A)x2, w2) ] / ‖w2‖_Â
        ≤ (µ + 1) √(1 − α) ‖x1‖_Â + (µ + 1) ‖x2‖_Â.
From the third equation we obtain

    sup_{0≠q} (Bx2, q)/‖q‖_H = sup_{0≠q} µ ((B Â^{-1} B^T − Ŝ)p, q)/‖q‖_H ≤ µ (1 − 1/β) ‖p‖_H.
Observe that, for the left-hand sides of the last two inequalities, we have the following well-known
representations:

    sup_{w2 ∈ W^⊥} (Bw2, p)/‖w2‖_Â = sup_{w ∈ R^n} (Bw, p)/‖w‖_Â = (B Â^{-1} B^T p, p)^{1/2} = ‖p‖_H

and

    sup_{0≠q ∈ R^m} (Bx2, q)/‖q‖_H = (B^T H^{-1} B x2, x2)^{1/2} = (Â^{-1} B^T H^{-1} B x2, x2)_Â^{1/2}
                                   = (x2, x2)_Â^{1/2} = ‖x2‖_Â,
since P = Â^{-1} B^T H^{-1} B is a projector onto W^⊥, so P x2 = x2 for x2 ∈ W^⊥.
Hence, in summary,

    [ α           −√(1 − α)   0 ] [ ‖x1‖_Â ]       [ 1 − α      √(1 − α)   0       ] [ ‖x1‖_Â ]
    [ −√(1 − α)   −1          1 ] [ ‖x2‖_Â ]  ≤ µ  [ √(1 − α)   1          0       ] [ ‖x2‖_Â ],
    [ 0            1          0 ] [ ‖p‖_H  ]       [ 0          0          1 − 1/β ] [ ‖p‖_H  ]

or, in short:

    K e ≤ µ D e                                                               (2.6)
with

    K = [ α           −√(1 − α)   0 ]       D = [ 1 − α      √(1 − α)   0       ]       e = [ ‖x1‖_Â ]
        [ −√(1 − α)   −1          1 ],          [ √(1 − α)   1          0       ],          [ ‖x2‖_Â ].
        [ 0            1          0 ]           [ 0          0          1 − 1/β ]           [ ‖p‖_H  ]

Since K^{-1} is non-negative element-wise, it follows

    e ≤ µ K^{-1} D e.
Elementary calculations show that

    ν+ = (1/(2α)) [ 2 − α − 1/β + √((2 − α − 1/β)² + 4α(1 − 1/β)) ]
is a non-negative eigenvalue of K^{-1}D with component-wise non-negative left eigenvector l^T, given
by

    l^T = ( √(1 − α),  1,  αν+ + α − 1 ).
Then

    l^T e ≤ µ ν+ l^T e.

Obviously, l^T e ≥ 0. The case l^T e = 0 is easily excluded by taking into account (2.6) and
e ≠ 0. Therefore, after dividing by l^T e > 0, we obtain

    µ ≥ 1/ν+.
Consequently,

    λ = µ/(1 + µ) ≥ 1/(1 + ν+) = (1/2) [ 2 + α − 1/β − √((2 + α − 1/β)² − 4α) ].
This lower bound is obviously smaller than 1, which was the lower bound for the first case µ ≤ 0.
This completes the proof.
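The statement of Theorem 2.2 is easy to check numerically. The following toy experiment (an illustration with dense matrices; for simplicity A is even chosen positive definite on the whole space, which is permitted) constructs Â and Ŝ satisfying (2.4) and (2.5) and compares the extreme eigenvalues of K̂^{-1}K with the bounds:

    import numpy as np
    from scipy.linalg import eigh, eigvals, null_space, solve

    rng = np.random.default_rng(0)
    n, m = 8, 3
    C = rng.standard_normal((n, n)); A = C.T @ C       # A > 0 here, for simplicity
    B = rng.standard_normal((m, n))                    # full rank m (generically)

    Ahat = 1.5 * A                                     # Â > A, so (2.4) holds
    H = B @ solve(Ahat, B.T)                           # inexact Schur complement
    Shat = H / 1.5                                     # Ŝ ≤ B Â^{-1} B^T ≤ 1.5 Ŝ

    Z = null_space(B)                                  # basis of ker B
    alpha = eigh(Z.T @ A @ Z, Z.T @ Ahat @ Z, eigvals_only=True)[0]
    beta = eigh(H, Shat, eigvals_only=True)[-1]        # = 1.5 by construction

    K = np.block([[A, B.T], [B, np.zeros((m, m))]])
    Khat = np.block([[Ahat, B.T], [B, H - Shat]])
    lam = np.sort(eigvals(solve(Khat, K)).real)

    lam_max_bound = beta + np.sqrt(beta**2 - beta)
    c = 2 + alpha - 1 / beta
    lam_min_bound = 0.5 * (c - np.sqrt(c**2 - 4 * alpha))
    print(lam_min_bound <= lam[0], lam[-1] <= lam_max_bound)   # True True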
Under the slightly stronger conditions

    α (Âx, x) ≤ (Ax, x)   for all x ∈ ker B   and   A < Â,                   (2.7)

and

    Ŝ < B Â^{-1} B^T ≤ β Ŝ,                                                  (2.8)

it is justified by Theorem 2.1 to apply the standard conjugate gradient method to the preconditioned system

    K̂^{-1} K [ x ] = K̂^{-1} [ f ]                                            (2.9)
              [ p ]           [ g ]
with respect to the scalar product (2.3). It is well-known that the error e^(k) for the k-th iterate
(x^(k), p^(k))^T measured in the corresponding energy norm can be estimated by

    e^(k) ≤ (2 q^k / (1 + q^{2k})) e^(0)   with   q = (√κ(K̂^{-1}K) − 1) / (√κ(K̂^{-1}K) + 1),

where κ(K̂^{-1}K) denotes the relative condition number:

    κ(K̂^{-1}K) = λ_max(K̂^{-1}K) / λ_min(K̂^{-1}K).
From Theorem 2.2 the following upper bound for the relative condition number follows:

    κ(K̂^{-1}K) ≤ 2 (β + √(β² − β)) / [ 2 + α − 1/β − √((2 + α − 1/β)² − 4α) ] ≡ κ(α, β).
This shows that the convergence rate q can be bounded by α and β only. If the preconditioners
are chosen such that α and β are independent of certain parameters like the mesh size h of the
discretization or some involved regularization parameter ν, then the convergence rate is also robust
with respect to such parameters.
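As a quick orientation, a small Python helper (an illustration of the formulas above, not code from the paper) evaluating κ(α, β) and the resulting convergence factor q:

    import math

    def kappa_bound(alpha, beta):
        """Upper bound κ(α, β) for the relative condition number of K̂^{-1}K."""
        lam_max = beta + math.sqrt(beta**2 - beta)
        lam_min = 0.5 * (2 + alpha - 1/beta
                         - math.sqrt((2 + alpha - 1/beta)**2 - 4*alpha))
        return lam_max / lam_min

    def cg_factor(kappa):
        return (math.sqrt(kappa) - 1) / (math.sqrt(kappa) + 1)

    print(kappa_bound(2/3, 4/3))             # ≈ 4.4, cf. κ(2/3, 4/3) ≈ 4 in Section 4
    print(cg_factor(kappa_bound(2/3, 4/3)))  # ≈ 0.35, cf. q ≈ 1/3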
Furthermore, for α → 1 and β → 1, the lower and upper bounds for the eigenvalues in Theorem
2.2 both approach 1 (implying that all eigenvalues of the preconditioned matrix K̂^{-1}K approach 1),
leading to a relative condition number approaching 1 and a convergence factor q approaching 0.
In the limit case α = 1 and β = 1 one can easily derive from the conditions (2.4) and (2.5)
the following representations for the preconditioners:

    Â = A + B^T W B   and   Ŝ = B Â^{-1} B^T

for some matrix W ≥ 0. Then we obtain for K̂:

    K̂ = [ A + B^T W B   B^T ]
         [ B              0  ].
From the considerations above it follows in this case that all eigenvalues of K̂^{-1}K must be equal
to 1. Moreover, it can easily be shown that

    [ I − K̂^{-1} K ]² = 0.

So, the corresponding Richardson method terminates at the solution after two steps.
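This two-step termination is easy to observe numerically; a small sketch (an illustration with dense matrices and the hypothetical choice W = I):

    import numpy as np

    rng = np.random.default_rng(1)
    n, m = 7, 2
    C = rng.standard_normal((n, n)); A = C.T @ C
    B = rng.standard_normal((m, n))
    W = np.eye(m)                                     # any W ≥ 0 would do

    Ahat = A + B.T @ W @ B                            # limit case: Â = A + B^T W B
    Shat = B @ np.linalg.solve(Ahat, B.T)             # Ŝ = B Â^{-1} B^T
    K = np.block([[A, B.T], [B, np.zeros((m, m))]])
    Khat = np.block([[Ahat, B.T], [B, np.zeros((m, m))]])  # B Â^{-1} B^T - Ŝ = 0

    E = np.eye(n + m) - np.linalg.solve(Khat, K)      # Richardson iteration matrix
    print(np.linalg.norm(E @ E))                      # ≈ 1e-13: (I - K̂^{-1}K)² = 0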
In a simplified way one could describe the proposed strategy as follows: Good preconditioners
Â can be interpreted as good approximations to some augmented matrix A + B^T W B, but we do
not change the matrix A itself in the system matrix K. This seems to be only a slight variant of
the augmented Lagrangian approach, where A itself is replaced by A + B^T W B in K. However, the
actual construction of the preconditioner is not based on first selecting some augmentation matrix
W and then preconditioning the augmented matrix. Instead, as will be detailed in the next
section, it naturally follows from the analysis of an underlying (infinite-dimensional) variational
problem, whose discretization leads to the discussed large scale linear systems of equations in
saddle point form.
3. Application to mixed variational problems. Let the (infinite-dimensional) variational
problem be of the following form:
Find x ∈ X and p ∈ Q such that

    a(x, w) + b(w, p) = ⟨F, w⟩   for all w ∈ X,
    b(x, q)           = ⟨G, q⟩   for all q ∈ Q.
Here, X and Q are real Hilbert spaces, a : X × X → R and b : X × Q → R are bilinear forms,
F : X → R and G : Q → R are continuous linear functionals, and ⟨F, w⟩ (⟨G, q⟩) denotes the
evaluation of F (G) at the element w (q).
The existence and uniqueness of a solution to this mixed variational problem is well-established
(theorem of Brezzi, see Brezzi and Fortin [4]) under the following conditions:
1. The bilinear form a is bounded. Then, of course, the norm ‖a‖ satisfies the condition

    a(w, w) ≤ ‖a‖ ‖w‖²_X   for all w ∈ X.

2. The bilinear form a is coercive on ker B = {w ∈ X : b(w, q) = 0 for all q ∈ Q}: There
exists a constant α0 > 0 such that

    a(w, w) ≥ α0 ‖w‖²_X   for all w ∈ ker B.

3. The bilinear form b is bounded: Then, of course, the norm ‖b‖ satisfies the condition

    sup_{0≠w∈X} b(w, q)/‖w‖_X ≤ ‖b‖ ‖q‖_Q   for all q ∈ Q.
4. The bilinear form b satisfies the inf-sup condition: There exists a constant k0 > 0 such that

    sup_{0≠w∈X} b(w, q)/‖w‖_X ≥ k0 ‖q‖_Q   for all q ∈ Q.
Under the additional assumptions that
5. the bilinear form a is symmetric on X:

    a(x, w) = a(w, x)   for all x, w ∈ X, and

6. the bilinear form a is non-negative on X:

    a(w, w) ≥ 0   for all w ∈ X,

the theorem of Brezzi implies the equivalence of the mixed variational problem to the following
constrained optimization problem:
Find x ∈ X such that

    J(x) = min_{w ∈ Xg} J(w)                                                  (3.1)

with

    J(w) = (1/2) a(w, w) − ⟨F, w⟩

and

    Xg = {w ∈ X : b(w, q) = ⟨G, q⟩ for all q ∈ Q}.
For discretizing the infinite-dimensional problem the spaces X and Q are replaced by finite-dimensional sub-spaces Xh ⊂ X and Qh ⊂ Q, which results in the following finite-dimensional
variational problem:
Find xh ∈ Xh and ph ∈ Qh such that

    a(xh, wh) + b(wh, ph) = ⟨F, wh⟩   for all wh ∈ Xh,
    b(xh, qh)             = ⟨G, qh⟩   for all qh ∈ Qh.

By introducing suitable basis functions in Xh and Qh, we finally obtain the following saddle point
problem in matrix-vector notation:

    Ah x_h + Bh^T p_h = f_h,
    Bh x_h            = g_h,

where x_h and p_h denote the corresponding vectors of coefficients with respect to these basis
functions.
We assume that the conditions of the theorem of Brezzi are also satisfied in Xh and Qh . This
is trivial for the first and the third condition, but not at all for the second and fourth condition.
To simplify the notation the same symbols are used to denote the constants.
These conditions read in matrix-vector notation:

    Ah ≤ ‖a‖ X_h,                                                             (3.2)
    (Ah w_h, w_h) ≥ α0 (X_h w_h, w_h)   for all w_h ∈ ker Bh,                 (3.3)
    Bh X_h^{-1} Bh^T ≤ ‖b‖² Q_h,                                              (3.4)
    Bh X_h^{-1} Bh^T ≥ k0² Q_h.                                               (3.5)
Here, X_h and Q_h denote the matrices representing the scalar products (x, w)_X and (p, q)_Q as
bilinear forms on Xh and Qh, respectively:

    (xh, wh)_X = (X_h x_h, w_h),   (ph, qh)_Q = (Q_h p_h, q_h).

For the third and fourth condition we used the well-known representation

    sup_{0≠wh∈Xh} b(wh, qh)/‖wh‖_X = (Bh X_h^{-1} Bh^T q_h, q_h)^{1/2}.
Comparing with the conditions (2.7) and (2.8) it is reasonable to choose for Âh a properly scaled
preconditioner of X_h, and for Ŝh a properly scaled preconditioner of Q_h, say

    Âh = (1/σ) X̂h   and   Ŝh = (σ/τ) Q̂h                                      (3.6)

for some real parameters σ > 0 and τ > 0 and with preconditioners X̂h and Q̂h satisfying, e.g.,
the spectral estimates

    (1 − qX) (X̂h w_h, w_h) ≤ (wh, wh)_X ≤ (X̂h w_h, w_h)                      (3.7)

and

    (1 − qQ) (Q̂h q_h, q_h) ≤ (qh, qh)_Q ≤ (Q̂h q_h, q_h).                     (3.8)
and
Combining all estimates we easily obtain the following result:
Lemma 3.1. Assume that (3.2) - (3.8) hold. Then the conditions (2.7) and (2.8) are satisfied
with
α = σ (1 − qX ) α0
and
β = τ kbk2,
if the parameters σ and τ are chosen such that
1
1
and τ >
.
σ<
kak
(1 − qX )(1 − qQ )k02
Proof. We have

    Ah ≤ ‖a‖ X_h ≤ ‖a‖ X̂h = σ ‖a‖ Âh < Âh

if σ < 1/‖a‖. Next,

    (Ah w_h, w_h) ≥ α0 (X_h w_h, w_h) ≥ (1 − qX) α0 (X̂h w_h, w_h) = α (Âh w_h, w_h)

for all w_h ∈ ker Bh, with α = σ (1 − qX) α0. Next,

    Bh Âh^{-1} Bh^T = σ Bh X̂h^{-1} Bh^T ≤ σ Bh X_h^{-1} Bh^T ≤ σ ‖b‖² Q_h ≤ σ ‖b‖² Q̂h = τ ‖b‖² Ŝh = β Ŝh

with β = τ ‖b‖². Finally,

    Bh Âh^{-1} Bh^T = σ Bh X̂h^{-1} Bh^T ≥ σ (1 − qX) Bh X_h^{-1} Bh^T ≥ σ (1 − qX) k0² Q_h
                    ≥ σ (1 − qX)(1 − qQ) k0² Q̂h = τ (1 − qX)(1 − qQ) k0² Ŝh > Ŝh

if τ > 1/[(1 − qX)(1 − qQ) k0²].
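A small sketch (an illustration, with a hypothetical safety factor to keep the inequalities strict) of this parameter choice in Python:

    def choose_parameters(norm_a, norm_b, alpha0, k0, q_X, q_Q, safety=0.9):
        """Pick admissible σ, τ per Lemma 3.1 and return the resulting α, β."""
        sigma = safety / norm_a                              # σ < 1/‖a‖
        tau = 1 / (safety * (1 - q_X) * (1 - q_Q) * k0**2)   # τ > 1/[(1-qX)(1-qQ)k0²]
        return sigma, tau, sigma * (1 - q_X) * alpha0, tau * norm_b**2

    # Model problem of Section 4 (‖a‖ = ‖b‖ = 1, α0 = 2/3, k0² = 3/4) with
    # exact preconditioners (q_X = q_Q = 0):
    print(choose_parameters(1.0, 1.0, 2/3, (3/4)**0.5, 0.0, 0.0))
    # -> σ = 0.9, τ ≈ 1.48, α = 0.6, β ≈ 1.48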
Good and efficient preconditioners X̂h and Q̂h are usually available, as will be shown for
a model problem in the next section. Therefore, the quantities qX and qQ, which measure the
quality of these preconditioners, are typically very small, say 0.1.
Roughly speaking, the parameter σ has to be sufficiently small, while the parameter τ has
to be sufficiently large in order to guarantee the conditions (2.7) and (2.8). On the other hand,
in order to obtain low upper bounds for the condition number of the preconditioned matrix, α
should be as large as possible and β should be as small as possible, i.e., σ should be as large as
possible and τ should be as small as possible. This, of course, requires at least a rough quantitative
knowledge of the constants ‖a‖ and k0, which are involved in the choice of σ and τ.
Next, we will study a particular model problem from optimal control, where the parameters
‖a‖, α0, ‖b‖, and k0 are known:
4. A model problem from optimal control. Let Ω ⊂ R^d be an open and bounded set.
We consider the following optimization problem with PDE-constraints:
Find the state y ∈ H¹(Ω) and the control u ∈ L²(Ω) such that

    J(y, u) = min_{(z,v) ∈ H¹(Ω)×L²(Ω)} J(z, v),

subject to the state equation with distributed control u on the right-hand side

    −∆y + y = u in Ω,   ∂y/∂n = 0 on ∂Ω,

where the cost functional is given by

    J(y, u) = (1/2) ∫_Ω (y − y_d)² dx + (ν/2) ∫_Ω u² dx.
More precisely, we prescribe the state equation in weak form:

    ∫_Ω ∇y · ∇q dx + ∫_Ω y q dx = ∫_Ω u q dx   for all q ∈ H¹(Ω).

Let X = Y × U with Y = H¹(Ω), U = L²(Ω) and Q = H¹(Ω). With x = (y, u) ∈ X,
w = (z, v) ∈ X and q ∈ Q we introduce the following bilinear forms and linear functionals:

    a(x, w) = ∫_Ω y z dx + ν ∫_Ω u v dx,
    b(w, q) = ∫_Ω ∇z · ∇q dx + ∫_Ω z q dx − ∫_Ω v q dx,
    ⟨F, w⟩  = ∫_Ω y_d z dx,
    ⟨G, q⟩  = 0.
With this setting the optimization problem is of the standard form (3.1).
The conditions of the theorem of Brezzi can easily be verified for the Hilbert spaces X =
Y × U and Q introduced above and equipped with the standard scalar products (y, z)_{H¹(Ω)} in Y,
(u, v)_{L²(Ω)} in U and (p, q)_{H¹(Ω)} in Q. Then, however, the parameters ‖a‖, α0, ‖b‖ and k0 depend
on the regularization parameter ν, eventually resulting in convergence rates also depending on ν.
With a different scaling of the scalar products in Y, U and Q we obtain parameters ‖a‖, α0,
‖b‖ and k0 independent of ν, eventually leading to preconditioners with convergence rates robust
in ν: In particular, we consider the following new scalar products (y, z)_Y in Y = H¹(Ω), (u, v)_U
in U = L²(Ω) and (p, q)_Q in Q = H¹(Ω):

    (y, z)_Y = (y, z)_{L²(Ω)} + √ν (y, z)_{H¹(Ω)},   (u, v)_U = ν (u, v)_{L²(Ω)}

and

    (p, q)_Q = (1/ν) (p, q)_{L²(Ω)} + (1/√ν) (p, q)_{H¹(Ω)}

and we set (x, w)_X = (y, z)_Y + (u, v)_U for x = (y, u), w = (z, v) ∈ X = Y × U.
With these definitions of the scalar products the following properties can be verified:
Lemma 4.1.
1. The bilinear form a is bounded:

    a(x, x) ≤ ‖x‖²_X   for all x ∈ X.

2. The bilinear form a is coercive on ker B:

    a(x, x) ≥ α0 ‖x‖²_X   for all x ∈ ker B,   with α0 = 2/3.

3. The bilinear form b is bounded:

    sup_{0≠w∈X} b(w, q)/‖w‖_X ≤ ‖q‖_Q   for all q ∈ Q.

4. The bilinear form b satisfies the inf-sup condition:

    sup_{0≠w∈X} b(w, q)/‖w‖_X ≥ k0 ‖q‖_Q   for all q ∈ Q,   with k0 = √(3/4).
Proof. (1) is trivial. For (2) take x = (y, u) ∈ ker B. Then:

    (y, q)_{H¹(Ω)} = (u, q)_{L²(Ω)}   for all q ∈ H¹(Ω).

In particular, it follows for q = y:

    ‖y‖²_{H¹(Ω)} = (u, y)_{L²(Ω)} ≤ ‖u‖_{L²(Ω)} ‖y‖_{L²(Ω)},

which implies

    ‖x‖²_X = ‖y‖²_Y + ‖u‖²_U ≤ ‖y‖²_{L²(Ω)} + √ν ‖y‖_{L²(Ω)} ‖u‖_{L²(Ω)} + ν ‖u‖²_{L²(Ω)}.

Then

    a(x, x) ≥ α0 ‖x‖²_X

is certainly satisfied if

    a(x, x) ≥ α0 [ ‖y‖²_{L²(Ω)} + √ν ‖y‖_{L²(Ω)} ‖u‖_{L²(Ω)} + ν ‖u‖²_{L²(Ω)} ],

which is equivalent to

    (1 − α0) ‖y‖²_{L²(Ω)} − α0 √ν ‖y‖_{L²(Ω)} ‖u‖_{L²(Ω)} + (1 − α0) ν ‖u‖²_{L²(Ω)} ≥ 0.

This is obviously the case for α0 = 2/3, since

    (1/3) ‖y‖²_{L²(Ω)} − (2/3) √ν ‖y‖_{L²(Ω)} ‖u‖_{L²(Ω)} + (1/3) ν ‖u‖²_{L²(Ω)}
        = (1/3) ( ‖y‖_{L²(Ω)} − √ν ‖u‖_{L²(Ω)} )².
To show (3) and (4) we start with the following formula:

    sup_{0≠w∈X} b(w, q)²/‖w‖²_X
        = sup_{0≠(z,v)∈Y×U} [ (z, q)_{H¹(Ω)} − (v, q)_{L²(Ω)} ]² / ( ‖z‖²_Y + ‖v‖²_U )
        = sup_{0≠z∈Y} (z, q)²_{H¹(Ω)}/‖z‖²_Y + sup_{0≠v∈U} (v, q)²_{L²(Ω)}/‖v‖²_U
        = sup_{0≠z∈Y} (z, q)²_{H¹(Ω)}/‖z‖²_Y + (1/ν) ‖q‖²_{L²(Ω)}.
Then (3) easily follows from the estimates

    sup_{0≠z∈Y} (z, q)²_{H¹(Ω)}/‖z‖²_Y + (1/ν) ‖q‖²_{L²(Ω)}
        ≤ sup_{0≠z∈Y} ‖z‖²_{H¹(Ω)} ‖q‖²_{H¹(Ω)}/‖z‖²_Y + (1/ν) ‖q‖²_{L²(Ω)}
        = sup_{0≠z∈Y} ‖z‖²_{H¹(Ω)} ‖q‖²_{H¹(Ω)} / ( ‖z‖²_{L²(Ω)} + √ν ‖z‖²_{H¹(Ω)} ) + (1/ν) ‖q‖²_{L²(Ω)}
        ≤ (1/√ν) ‖q‖²_{H¹(Ω)} + (1/ν) ‖q‖²_{L²(Ω)} = ‖q‖²_Q.
For (4) observe that:

    sup_{0≠z∈Y} (z, q)²_{H¹(Ω)}/‖z‖²_Y + (1/ν) ‖q‖²_{L²(Ω)}
        ≥ ‖q‖⁴_{H¹(Ω)}/‖q‖²_Y + (1/ν) ‖q‖²_{L²(Ω)}
        = ‖q‖⁴_{H¹(Ω)} / ( ‖q‖²_{L²(Ω)} + √ν ‖q‖²_{H¹(Ω)} ) + (1/ν) ‖q‖²_{L²(Ω)}.

The inf-sup condition

    sup_{0≠w∈X} b(w, q)/‖w‖_X ≥ k0 ‖q‖_Q

is certainly satisfied if

    ‖q‖⁴_{H¹(Ω)} / ( ‖q‖²_{L²(Ω)} + √ν ‖q‖²_{H¹(Ω)} ) + (1/ν) ‖q‖²_{L²(Ω)}
        ≥ k0² ‖q‖²_Q = k0² [ (1/ν) ‖q‖²_{L²(Ω)} + (1/√ν) ‖q‖²_{H¹(Ω)} ],

which is equivalent to

    (1 − k0²) ‖q‖⁴_{H¹(Ω)} + (1 − 2k0²) (1/√ν) ‖q‖²_{L²(Ω)} ‖q‖²_{H¹(Ω)} + (1 − k0²) (1/ν) ‖q‖⁴_{L²(Ω)} ≥ 0.

This is obviously the case for k0² = 3/4, since

    (1/4) ‖q‖⁴_{H¹(Ω)} − (1/(2√ν)) ‖q‖²_{L²(Ω)} ‖q‖²_{H¹(Ω)} + (1/(4ν)) ‖q‖⁴_{L²(Ω)}
        = (1/4) ( ‖q‖²_{H¹(Ω)} − (1/√ν) ‖q‖²_{L²(Ω)} )².
By the theorem of Brezzi it now follows that the optimization problem is equivalent to the
following mixed variational problem:
Find x ∈ H¹(Ω) × L²(Ω) and p ∈ H¹(Ω) such that

    a(x, w) + b(w, p) = ⟨F, w⟩   for all w ∈ H¹(Ω) × L²(Ω),
    b(x, q)           = 0        for all q ∈ H¹(Ω).
For the spaces Yh = Uh = Qh we choose, as an example, the space of piecewise linear and
continuous functions on a simplicial subdivision of Ω. By introducing the standard nodal basis,
we finally obtain the following saddle point problem in matrix-vector notation:

    Ah x_h + Bh^T p_h = f_h,
    Bh x_h            = 0,

with

    Ah = [ Mh    0   ]   and   Bh = [ Kh   −Mh ],
         [ 0   ν Mh ]

where Mh denotes the mass matrix representing the L²(Ω) inner product on Yh, and Kh denotes
the stiffness matrix representing the bilinear form of the state equation, here (∇y, ∇q)_{L²(Ω)} +
(y, q)_{L²(Ω)}, on Yh.
For the matrices X_h and Q_h representing the scalar products (x, w)_X = (y, z)_Y + (u, v)_U and
(p, q)_Q on Xh and Qh we obtain

    X_h = [ Y_h    0   ]   and   Q_h = (1/ν) Y_h
          [ 0    ν Mh ]

with

    Y_h = √ν Kh + Mh.

Observe that Y_h is the stiffness matrix representing the bilinear form √ν (∇y, ∇q)_{L²(Ω)} + (√ν +
1) (y, q)_{L²(Ω)} on Yh.
It is easy to see that Lemma 4.1 remains valid with the same constants if Y, U, Q are replaced
by the finite-dimensional spaces Yh, Uh, Qh, as long as Uh = Qh ⊂ Yh.
As discussed before, it is reasonable to use for Âh a (scaled) preconditioner for X_h, and to
use for Ŝh a (scaled) preconditioner for Q_h. For Y_h, which appears in the first diagonal block of
X_h and in Q_h, we use, e.g., a standard multigrid preconditioner Ŷh for the second-order elliptic
differential operator represented by the bilinear form √ν (∇y, ∇q)_{L²(Ω)} + (√ν + 1) (y, q)_{L²(Ω)}.
For the well-conditioned matrix Mh, which appears in the second diagonal block of X_h, a simple
preconditioner M̂h, e.g. a few steps of a symmetric Gauss-Seidel iteration, is used. So, eventually
we set

    Âh = (1/σ) X̂h = (1/σ) [ Ŷh    0   ]   and   Ŝh = (σ/(τ ν)) Ŷh            (4.1)
                           [ 0   ν M̂h ]

with real parameters σ > 0 and τ > 0.
So, in summary, the preconditioner

    K̂h = [ Âh   Bh^T                  ]
          [ Bh   Bh Âh^{-1} Bh^T − Ŝh ]

for the matrix

    Kh = [ Ah   Bh^T ]
         [ Bh    0   ]

is given by (4.1), where Ŷh is a preconditioner for the second-order elliptic differential operator
represented by the bilinear form √ν (∇y, ∇q)_{L²(Ω)} + (√ν + 1) (y, q)_{L²(Ω)} and M̂h is a preconditioner
for the mass matrix.
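To make the construction concrete, the following Python sketch assembles a 1D analogue of these matrices (an illustration only; the experiments in Section 5 are three-dimensional, and here Ŷh and M̂h are simply taken as the exact matrices Y_h and M_h instead of multigrid and Gauss-Seidel preconditioners):

    import numpy as np

    def p1_matrices(N):
        """Mass matrix Mh and matrix Kh of (∇y,∇q) + (y,q) for piecewise linear
        elements on a uniform grid of N elements on (0, 1)."""
        h = 1.0 / N
        e = np.ones(N + 1)
        M = h / 6 * (4 * np.diag(e) + np.diag(e[:-1], 1) + np.diag(e[:-1], -1))
        M[0, 0] = M[-1, -1] = h / 3
        L = (2 * np.diag(e) - np.diag(e[:-1], 1) - np.diag(e[:-1], -1)) / h
        L[0, 0] = L[-1, -1] = 1 / h
        return M, L + M

    nu = 1e-2
    Mh, Kh = p1_matrices(32)
    Yh = np.sqrt(nu) * Kh + Mh
    Ah = np.block([[Mh, 0 * Mh], [0 * Mh, nu * Mh]])
    Bh = np.hstack([Kh, -Mh])
    Xh = np.block([[Yh, 0 * Yh], [0 * Yh, nu * Mh]])

    sigma, tau = 0.9, 1.1 / 0.75          # cf. Section 5: σ = 0.9, τ = 1.1/k0²
    Ahat = Xh / sigma                     # (4.1) with "exact" preconditioners
    Shat = sigma / (tau * nu) * Yh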
It is reasonable to assume that

    (1 − qX) (Ŷh y_h, y_h) ≤ (yh, yh)_Y ≤ (Ŷh y_h, y_h)                       (4.2)

and

    (1 − qX) (M̂h u_h, u_h) ≤ (uh, uh)_{L²(Ω)} ≤ (M̂h u_h, u_h)                (4.3)

for some small value qX ∈ (0, 1). The factor qX describes the quality of the preconditioners Ŷh
and M̂h.
Then the discussion of the last section shows that the conditions (2.7) and (2.8) are satisfied
with

    α = σ (1 − qX) (2/3)   and   β = τ

for parameters σ and τ satisfying

    σ < 1   and   τ > 4 / [3 (1 − qX)²].

In particular, assuming that qX ≈ 0, we can expect α ≈ 2/3 and β ≈ 4/3 for σ ≈ 1 and
τ ≈ 4/3, leading to a rough estimate of the condition number κ ≈ κ(2/3, 4/3) ≈ 4, which implies
a convergence factor q ≈ 1/3 for the conjugate gradient method.
Remark 1. Observe that the mass matrix resulting from the first term of the L²-type cost
functional is approximated by a preconditioner for a second-order differential operator, at first sight
a very unusual approach, but completely natural in the context of the conditions of the theorem of
Brezzi. The more straightforward approach, namely approximating the mass matrix by some
lumped mass matrix, which is easy to invert, leads to a much more complicated inexact Schur
complement, which can be interpreted as a discretized fourth-order differential operator and is
much harder to approximate. With our choice of the preconditioner for the mass matrix, the inexact
Schur complement remains a discretized second-order differential operator.
5. Numerical Experiments. We consider the optimal control problem from the previous
section on the unit cube Ω = (0, 1)³ and with homogeneous data y_d ≡ 0. Starting from an initial
mesh of 24 tetrahedra (starting level l = 1) we obtain a hierarchy of nested meshes by uniform
refinement up to some final level l = L. On each tetrahedral mesh piecewise linear and continuous
finite elements are used for Yh = Uh = Qh.
The discretized mixed problem is solved on the finest mesh (level l = L) by using the conjugate
gradient method for the preconditioned system (2.9) with the scalar product (2.3) as described
before. For the preconditioner we use the proposed symmetric block preconditioner, where Ŷh is
one V-cycle of the multigrid method with m1 forward Gauss-Seidel steps for pre-smoothing and m1
backward Gauss-Seidel steps for post-smoothing (in short V(m1, m1)) for the second-order elliptic
differential operator represented by the bilinear form √ν (∇y, ∇q)_{L²(Ω)} + (√ν + 1) (y, q)_{L²(Ω)}. For
M̂h we use m2 steps of the symmetric Gauss-Seidel method (in short SGS(m2)).
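For completeness, a dense-matrix Python sketch of this conjugate gradient iteration in the scalar product (2.3) (an illustration; in the actual computations the solves with K̂h are realized by the three-solve procedure of Section 2 with the multigrid and Gauss-Seidel components):

    import numpy as np

    def pcg_D(K, Khat, b, tol=1e-8, maxit=500):
        """CG for K̂^{-1}K x = K̂^{-1} b, SPD w.r.t. (u, v)_D = (D u, v)."""
        D = Khat - K                           # matrix of (2.3); SPD under (2.7), (2.8)
        x = np.zeros_like(b)
        rt = np.linalg.solve(Khat, b)          # preconditioned residual K̂^{-1}(b - Kx)
        p = rt.copy()
        rho = rt @ D @ rt
        r0 = np.linalg.norm(b)
        for k in range(1, maxit + 1):
            Mp = np.linalg.solve(Khat, K @ p)  # K̂^{-1} K p
            gamma = rho / (p @ D @ Mp)
            x += gamma * p
            rt -= gamma * Mp
            if np.linalg.norm(b - K @ x) <= tol * r0:
                return x, k
            rho_new = rt @ D @ rt
            p = rt + (rho_new / rho) * p
            rho = rho_new
        return x, maxit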
Starting values x_h^(0) and p_h^(0) are generated randomly. The exact solution of the problem is
the trivial solution x_h = 0 and p_h = 0. The quality of an approximation (x_h^(k), p_h^(k)) is measured
either by the energy norm e^(k) of the error, which here is given by

    e^(k) = ‖ (x_h^(k), p_h^(k)) ‖_{D_h K̂_h^{-1} K_h}   with D_h = K̂_h − K_h,

or by the residual r^(k):

    r^(k) = ‖ K_h (x_h^(k), p_h^(k)) ‖.
ph Figure 5.1 shows a typical convergence history (number of iterations versus e (k) /e(0) and
r /r(0) for level L = 5 (number of unknowns 3 × 17.985) and regularization parameter ν = 1
using a V (3, 3)-cyle for Ŷh and SGS(3) for M̂h and parameters σ = 0.9 and τ = 1.1/k02 .
Table 5.1 shows that the number of iterations does not depend on the level of refinement. L
denotes the level of refinement, n the total number of all unknowns y h , uh and ph , and k the
number of iterations needed to satisfy the stopping rule
(k)
r(k) ≤ ε r(0)
with ε = 10−8 .
Table 5.1
Dependence of the number of iterations on the mesh size for fixed ν = 1.

    level L    number of unknowns n    iterations k
    3          1,107                   14
    4          7,395                   15
    5          53,955                  15
    6          412,035                 16
    7          3,200,227               15
Table 5.2 shows that the number of iterations does not depend on the regularization parameter
ν. The results are given for refinement level L = 5.
[Figure: two convergence curves, "energy norms" and "residuals", decreasing from 1 to about 1e-16 over roughly 40 iterations.]
Fig. 5.1. Convergence history: number of iterations versus relative accuracy.
Table 5.2
Dependence of the number of iterations on ν for fixed refinement level L = 5.

    ν             10^{-4}   10^{-2}   1    10^{2}   10^{4}
    iterations k  15        14        15   14       15
6. Concluding remarks. A class of symmetric and indefinite preconditioners has been
analyzed for saddle point problems where A is positive definite only on the kernel of B. The
preconditioned system is symmetric and positive definite and can therefore be solved by a
conjugate gradient method. The construction of the preconditioners is guided by the conditions
of the theorem of Brezzi. In particular, for the considered optimal control problem with L²-type
cost functional and elliptic state equation with distributed control, the mass matrix for the state
variable, originating from the L²-type cost functional, is preconditioned by a preconditioner for a
modified elliptic equation, at first sight a very unusual approach, but quite natural in the context
of the theorem of Brezzi. The presented technique results in preconditioners robust in the mesh
size for a large class of saddle point problems. For the considered model problem it is also robust
with respect to the regularization parameter.
REFERENCES
[1] R. E. Bank, B. D. Welfert, and H. Yserentant. A class of iterative methods for solving saddle point problems. Numer. Math., 56:645–666, 1990.
[2] M. Benzi, G. H. Golub, and J. Liesen. Numerical solution of saddle point problems. Acta Numerica, 14:1–137, 2005.
[3] J. H. Bramble and J. E. Pasciak. A preconditioning technique for indefinite systems resulting from mixed approximations of elliptic problems. Math. Comp., 50:1–17, 1988.
[4] F. Brezzi and M. Fortin. Mixed and Hybrid Finite Element Methods. Springer-Verlag, 1991.
[5] M. Fortin and R. Glowinski. Augmented Lagrangian Methods: Application to the Numerical Solution of Boundary-Value Problems. North-Holland, Amsterdam, 1983.
[6] G. H. Golub and C. Greif. On solving block-structured indefinite linear systems. SIAM J. Sci. Comput., 24(6):2076–2092, 2003.
[7] G. H. Golub, C. Greif, and J. M. Varah. An algebraic analysis of a block diagonal preconditioner for saddle point problems. SIAM J. Matrix Anal. Appl., 27(3):779–792, 2006.
[8] T. Rusten and R. Winther. A preconditioned iterative method for saddle-point problems. SIAM J. Matrix Anal. Appl., 13:887–904, 1992.
[9] D. Silvester and A. Wathen. Fast iterative solutions of stabilized Stokes systems. Part II: Using block diagonal preconditioners. SIAM J. Numer. Anal., 31:1352–1367, 1994.
[10] W. Zulehner. Analysis of iterative methods for saddle point problems: a unified approach. Math. Comp., 71:479–505, 2002.