MULTIPLE LOCAL MINIMA OF PDE-CONSTRAINED
OPTIMISATION PROBLEMS VIA DEFLATION
P. E. FARRELL∗
Abstract. Nonconvex optimisation problems constrained by partial differential equations (PDEs)
may permit distinct local minima. In this paper we present a numerical technique, called deflation,
for computing multiple local solutions of such optimisation problems. The basic approach is to apply a nonlinear transformation to the Karush–Kuhn–Tucker optimality conditions that eliminates
previously found solutions from consideration. Starting from some initial guess, Newton’s method is
used to find a stationary point of the Lagrangian; this solution is then deflated away, and Newton’s
method is initialised from the same initial guess to find other solutions. In this paper, we investigate
how the Schur complement preconditioners widely used in PDE-constrained optimisation perform
after deflation. We prove an upper bound on the number of new distinct eigenvalues of a matrix after
an arbitrary additive perturbation; from this it follows that for diagonalisable operators the number
of Krylov iterations required for exact convergence of the Newton step at most doubles compared to
the undeflated problem. While deflation is not guaranteed to converge to all minima, these results
indicate the approach scales to arbitrary-dimensional problems if a scalable Schur complement preconditioner is available. The technique is demonstrated on a discretised nonconvex PDE-constrained
optimisation problem with approximately ten million degrees of freedom.
Key words. distinct eigenvalues, deflation, Newton’s method, multiple local minima, PDE-constrained optimisation.
AMS subject classifications. 35Q93, 90C26, 65K10, 65N30
1. Introduction. Many situations in science, engineering and economics can be
modelled as the optimisation of some functional constrained by a system of partial
differential equations along with suitable boundary conditions. Important examples
include the optimisation of aircraft to minimise drag and maximise lift [22] and variational data assimilation in numerical weather prediction [23]. In general, one wishes
to compute the feasible input that attains the global minimum of the functional to
be minimised; in practice, rigorous global optimisation techniques are too expensive
for high-dimensional problems, and heuristic strategies are employed instead [32]. In
other situations, it is desirable to compute as many local minima of the functional as
possible: for example, the aesthetics of the optimal designs may be crucial, but it is
difficult to quantitatively formulate this as a constraint on an optimisation problem.
Another situation where this arises is multimodal Bayesian inference [38], where multiple modes of the posterior are characterised by multiple local minima of the negative
logarithm of the posterior density.
In this paper we propose a new strategy for computing distinct local minima of
PDE-constrained optimisation problems. Suppose one solution of the Karush–Kuhn–
Tucker optimality conditions has been computed (e.g. with some inexact Newton
method) and additional solutions are sought. A new deflated problem is constructed
∗ Mathematical Institute, University of Oxford, Oxford, UK. Center for Biomedical Computing,
Simula Research Laboratory, Oslo, Norway ([email protected]). This research is
funded by EPSRC grants EP/K030930/1 and EP/M019721/1, an ARCHER RAP award, and a Center of Excellence grant from the Research Council of Norway to the Center for Biomedical Computing
at Simula Research Laboratory. The author would like to acknowledge the use of the University of
Oxford Advanced Research Computing (ARC) facility in carrying out this work [34]. Experiments
presented in this paper were carried out using the HPC facilities of the University of Luxembourg
[40]. This work used the ARCHER UK National Supercomputing Service. The author would like
to thank A. J. Wathen, J. S. Hale, M. A. Saunders, C. Beentjes, and N. I. M. Gould for useful
discussions.
with two properties: first, that other solutions of the original problem are also solutions of the deflated problem, and vice versa; and second, that Newton’s method
(or another rootfinding technique) applied to the deflated problem will never converge to the solution already available [9]. In this manner, Newton’s method may be
started from the same initial guess and converge to a new solution of the optimality
conditions. The process is repeated until Newton’s method fails to converge. This
procedure will also identify local maxima and saddle points, but it is straightforward
to characterise these after as many stationary points as possible have been found. In
fact, in many applications it is desirable to compute saddle points also [25, 7, 14].
This deflation approach is not guaranteed to find all local minima, as it is not
possible to guarantee that the nonlinear rootfinding algorithm employed will converge
from arbitrarily bad initial guesses, even if a solution to the deflated problem exists.
However, when deflation is combined with a robust line search such as NLEQ-ERR
[6], many stationary points can be found, even with uninformed initial guesses.
The key computational problem to solve is the efficient preconditioning of the
linear systems that arise in Newton’s method applied to the deflated problem. If a
good preconditioner is available, then the deflation approach can scale to problems
of arbitrarily large dimension. Let J_F and P_F be the undeflated discretised Jacobian
and associated preconditioner, and let J_G and P_G be their deflated counterparts (J_G
follows from the chain rule; P_G is to be defined). In previous work [9], we proposed a
preconditioning strategy for P_G in terms of P_F and showed that ‖P_G^{-1} J_G − I‖ could be
bounded by a small multiple of ‖P_F^{-1} J_F − I‖. However, it is not necessary for
a good preconditioner to control ‖P_F^{-1} J_F − I‖. As pointed out in [29], it is sufficient
that P_F^{-1} J_F have a minimal polynomial of small degree; the authors of [29] give an example where
P_F^{-1} in no sense approximates J_F^{-1}, yet an optimal Krylov method converges in
three iterations when applied to P_F^{-1} J_F. The previous results of [9] do not address
this case.
In this paper we analyse the number of distinct eigenvalues of a matrix after an
additive perturbation. We prove a new theorem relating the number of distinct eigenvalues after perturbation to the prior number of distinct eigenvalues, the rank of the
update, and the degree of nondiagonalisability of the matrix. Applying this theorem
to the case of deflation, we prove that if the preconditioned operator is diagonalisable,
the number of distinct eigenvalues of P_G^{-1} J_G can be no more than twice the number
of distinct eigenvalues of P_F^{-1} J_F. Hence, by studying the resulting minimal polynomial, we show that any Krylov method with an optimality or Galerkin property
will converge exactly in no more than twice the number of iterations required for the
undeflated problem, in exact arithmetic. This holds regardless of the number of solutions deflated, i.e. the number of Krylov iterations required only doubles once, rather
than doubling after each deflation. This analysis is an essential step in extending
deflation to large-scale optimisation problems, as such preconditioners are ubiquitous
in optimisation [30].
We now contrast the proposed deflation approach to other related approaches.
The approach of Goldstein and Price [11] attempts to find superior local minima by
performing Taylor expansions around a minimum already obtained and minimising
auxiliary functions that involve higher derivatives of the function concerned. In the
context of PDE-constrained optimisation, the need for further derivatives (beyond
Hessians of the Lagrangian) is unattractive, because the problem may not possess
sufficient regularity and the number of auxiliary PDEs to solve in the computation
of derivatives of the reduced functional is exponential in the degree. Another class
of algorithms, multistart methods, attempts to find multiple local minima by starting
independent local optimisation runs from random initial points [26]. The tunnelling
method of Levy and Gomez [24] is the closest in spirit to the approach proposed here:
tunnelling employs a local optimisation algorithm to identify an initial minimum x∗
of a functional J, and then seeks other values of the controls that attain the same
value, by finding a root of the function J(x) − J(x∗ ) = 0 (“tunnelling” into another
valley). If such an x is computed, the local optimisation algorithm is initialised from
this new point and the process repeated. In order to ensure that the tunnelling phase
converges to a solution other than x∗ , the authors add a pole to the function to
ensure nonconvergence to x∗ , exactly the deflation concept investigated in this work.
However, there are several key differences. Levy and Gomez apply deflation to the
tunnelling function, whereas here deflation is applied to the optimality conditions
directly. Tunnelling attempts to find one global minimum, ignoring undesired local
minima, whereas here we are concerned with identifying as many stationary points
as possible. Finally, Levy and Gomez do not discuss strategies for dealing with the
density of the Jacobian induced by deflation: all examples considered have small,
dense Jacobians, and hence there is no need for the analysis of preconditioned Krylov
methods, as is central to this work.
2. Deflation. In this section we review the deflation strategy for nonlinear equations proposed in [9].
Consider the following generic PDE-constrained optimisation problem:
\[ \underset{y \in Y,\; u \in U}{\text{minimise}} \quad J(y, u) \quad \text{subject to} \quad c(y, u) = 0, \tag{2.1} \]
where y is the state, u is the control, Y is the Hilbert space of states, U is the Hilbert
space of controls (without any side constraints), J(y, u) : Y × U → R is the functional
to be minimised, and c(y, u) : Y × U → Z represents the PDE constraint relating y
and u. In general, problem (2.1) will permit multiple local minima; the objective of
this paper is to identify as many stationary points of this problem as possible.
Under suitable regularity conditions [39, 19], minima of this problem must satisfy
the first-order necessary conditions

\[ F \equiv \begin{pmatrix} L_y \\ L_u \\ L_p \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \tag{2.2} \]
where L is the associated Lagrangian
\[ L = J(y, u) + (p, c), \tag{2.3} \]
with p the adjoint variable associated with the PDE constraint. Equation (2.2) forms
a set of three coupled PDEs whose solutions form a superset of the local minima
(local maxima and saddle points also satisfy (2.2)). If many solutions of (2.2) could
be found, then it would be straightforward to postprocess these to identify multiple
local minima of problem (2.1).
After discretisation on a given mesh, Newton’s method may be applied to compute
solutions of (2.2).¹

¹ The deflation approach can be extended to the case where discretisation is deferred until after the Newton–Kantorovitch iteration is applied [6], as deflation can be formulated at the continuous level [9].

For concreteness we will consider Newton’s method globalised with
line search, but we remark that the results of this paper apply without modification to
trust region methods also. Let F : R^n → R^n be the discretised residual, let J_F ∈ R^{n×n}
be its Jacobian, and let x_0 = (y_0, u_0, p_0) be an initial guess for the minimum. Newton’s
method consists of solving the linear system

\[ P_F(x_n)^{-1} J_F(x_n)\, \delta x_{n+1} = -P_F(x_n)^{-1} F(x_n), \tag{2.4} \]

where P_F is a preconditioner (to be discussed in section 3). This is followed by an update

\[ x_{n+1} = x_n + \mu\, \delta x_{n+1}, \tag{2.5} \]

with µ chosen by line search such as the affine-covariant algorithm of Deuflhard [6].
Assume that Newton’s method applied to F converges to a solution r. The core
idea of deflation is to construct a new nonlinear problem G that retains all other
solutions to the problem, but for which Newton’s method will never converge to r
again. Specifically, we construct a new residual G with two properties:
1. For all x ≠ r, F(x) = 0 ⟺ G(x) = 0.
2. Given any sequence x_i → r,

\[ \liminf_{x_i \to r} \|G(x_i)\| > 0. \tag{2.6} \]
The former guarantees that solutions of G are also solutions of F , whereas the latter
ensures that Newton’s method will not converge to r again, even starting from the
same initial guess x0 . Newton’s method may in turn be applied to G; supposing
Newton’s method converges to another root, this solution may in turn be deflated
and the procedure repeated. In this way many solutions of (2.2) may be found, even
starting from the same initial guess.
There are many ways to construct a residual G with the desired properties; several
techniques are investigated in [9]. The approach considered in this paper is to define
\[ G(x) = \left( \frac{1}{\|x - r\|^p} + \alpha \right) F(x), \tag{2.7} \]
with typical parameter values p = 2 and α = 1. The pole strength p governs the rate
at which the function approaches the introduced singularity, while the shift parameter
α ensures that the deflated residual recovers the behaviour of the original residual far
from previously found solutions, as ‖x − r‖ → ∞. For more details see [9].
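To make the construction above concrete, the following is a minimal sketch in Python/NumPy (an illustration of the idea only, not the implementation used for the experiments in this paper; the test residual and all names are ours). It deflates roots of F(x) = x² − 1, applied componentwise, and restarts plain Newton from the same initial guess after every success:

```python
import numpy as np

def newton(F, J, x0, tol=1e-10, maxit=100):
    # Plain undamped Newton; the experiments below instead use the
    # NLEQ-ERR line search of Deuflhard [6].
    x = x0.copy()
    for _ in range(maxit):
        r = F(x)
        if np.linalg.norm(r) < tol:
            return x
        try:
            x = x - np.linalg.solve(J(x), r)
        except np.linalg.LinAlgError:
            return None
    return None  # interpreted as "no further solutions found"

def deflate(F, J, roots, p=2.0, alpha=1.0):
    # G(x) = eta(x) F(x) with eta(x) = prod_r (1/||x - r||^p + alpha),
    # the product form of (2.7) over all previously found roots.
    def eta(x):
        return np.prod([1.0 / np.linalg.norm(x - r)**p + alpha for r in roots])
    def G(x):
        return eta(x) * F(x)
    def JG(x):
        # J_G = eta J_F + F (grad eta)^T; cf. (3.1) below.
        d = np.zeros_like(x)
        for r in roots:
            nrm = np.linalg.norm(x - r)
            d += -p * nrm**(-p - 2) * (x - r) / (1.0 / nrm**p + alpha)
        d *= eta(x)
        return eta(x) * J(x) + np.outer(F(x), d)
    return G, JG

# Illustrative residual: F(x) = x^2 - 1 componentwise, four roots in 2D.
F = lambda x: x**2 - 1.0
J = lambda x: np.diag(2.0 * x)
x0, roots = np.array([2.0, 3.0]), []
while True:
    G, JG = deflate(F, J, roots)   # with roots == [], G is just F
    r = newton(G, JG, x0)
    if r is None:
        break
    roots.append(r)                # deflate it away and retry from x0
print(roots)
```

The same loop applies verbatim when F is the discretised residual of (2.2); the remaining scalability question is the preconditioning of the deflated Jacobian, to which we now turn.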
3. Preconditioners for PDE-constrained optimisation. The ability to efficiently solve a PDE-constrained optimisation problem with an inexact Newton method
typically hinges on the availability of a good preconditioner for the Karush–Kuhn–
Tucker system. This has been the focus of intense research in recent years [10, 3, 33,
30, 5].
Suppose F : R^N → R^N is the residual of the undeflated discretised nonlinear
problem F(x) = 0, with Jacobian J_F and preconditioner P_F. Suppose G(x) = ηF(x) = 0
is the deflated problem; its Jacobian at a point x is given by

\[ J_G = \eta J_F + f d^T, \tag{3.1} \]

where η ∈ R is the deflation factor, f ∈ R^N is the undeflated residual, and d ∈ R^N is the derivative of the deflation factor. This motivates the choice of deflated preconditioner

\[ P_G = \eta P_F + f d^T, \tag{3.2} \]
where the action of P_G^{-1} can be expressed in terms of two actions of P_F^{-1} via the
Sherman–Morrison formula, yielding

\[ P_G^{-1} J_G = P_F^{-1} J_F - \frac{\eta^{-1} P_F^{-1} f d^T \left( P_F^{-1} J_F - I \right)}{1 + \eta^{-1} d^T P_F^{-1} f}. \tag{3.3} \]
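In practice (3.3) is applied rather than formed. A minimal sketch (hypothetical helper names; apply_PFinv stands for whatever action of P_F^{-1} is available, such as a multigrid cycle): the vector P_F^{-1} f can be computed once and cached, so that each subsequent application of P_G^{-1} costs one fresh action of P_F^{-1} plus two inner products.

```python
import numpy as np

def make_deflated_prec(apply_PFinv, f, d, eta):
    # Returns the action of P_G^{-1} = (eta * P_F + f d^T)^{-1}
    # via the Sherman-Morrison formula.
    PFinv_f = apply_PFinv(f) / eta          # cache eta^{-1} P_F^{-1} f
    denom = 1.0 + d @ PFinv_f               # Sherman-Morrison denominator
    def apply_PGinv(v):
        w = apply_PFinv(v) / eta            # eta^{-1} P_F^{-1} v
        return w - PFinv_f * ((d @ w) / denom)
    return apply_PGinv
```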
In [9], we prove that

\[ \|P_G^{-1} J_G - I\| \le \left( s(f, d) + 1 \right) \cdot \|P_F^{-1} J_F - I\|, \tag{3.4} \]
where s(·, ·) is well-behaved far from previous solutions. This is satisfactory in the
case where P_F^{-1} does indeed attempt to approximate J_F^{-1}, and hence ‖P_F^{-1} J_F − I‖
is expected to be small. In [9], this preconditioner was demonstrated to control the
growth of the number of Krylov iterations with algebraic multigrid on a scalar-valued
discretised Allen–Cahn problem with up to ∼10^{10} degrees of freedom.
However, as pointed out in [29], controlling ‖P_F^{-1} J_F − I‖ is not necessary for the
development of effective preconditioners. Given a problem of the form

\[ J_F = \begin{pmatrix} A & B^T \\ C & 0 \end{pmatrix} \tag{3.5} \]

with A nonsingular, the preconditioner

\[ P_F = \begin{pmatrix} A & 0 \\ 0 & C A^{-1} B^T \end{pmatrix} \tag{3.6} \]
will cause an optimal Krylov method to converge in three iterations, as P_F^{-1} J_F has a
minimal polynomial of degree three. However, the preconditioned Jacobian is

\[ P_F^{-1} J_F = \begin{pmatrix} I & A^{-1} B^T \\ (C A^{-1} B^T)^{-1} C & 0 \end{pmatrix}, \tag{3.7} \]

and since P_F^{-1} J_F is in no sense close to I, the previous theorem on the performance of preconditioners for
the deflated problem does not apply, and we have no guarantees of good performance
after deflation for systems with such preconditioners.
We now remedy this deficiency. We prove a general theorem regarding the number
of distinct eigenvalues of matrices perturbed by an additive update: we will show that
the number of distinct eigenvalues of the perturbed matrix can be bounded in terms
of the rank of the perturbation and the diagonalisability of the original matrix. In
the context of deflation, the preconditioned operator is typically diagonalisable, and
the perturbation is always rank-one: in this case, the theorem states that the number
of distinct eigenvalues can at most double. From this, we will show that linear systems with the deflated
Jacobian can be solved by optimal Krylov methods in at most twice the number
of iterations required for the undeflated Jacobian.
3.1. The number of distinct eigenvalues after a rank-one update. From
(3.3), it is clear that the preconditioned operator after any number of deflations is
a rank-one update of the undeflated preconditioned operator. This motivates the
analysis of the number of distinct eigenvalues of a matrix after perturbation by a rank-one update. The eigenvalues of a matrix after a rank-one perturbation are of interest in
a wide variety of applications and have been studied extensively in various particular
cases, with most work focussing on the case of symmetric perturbations [41, 12, 4, 21].
The most general results concern the Jordan form of the matrix after “generic” rank-one perturbations, i.e. the set of rank-one perturbations for which the analysis does
not hold has Lebesgue measure zero [20, 37, 28, 27]. In this work we will first prove
a result regarding symmetric and nonsymmetric diagonalisable matrices perturbed
by arbitrary rank-one updates, and then extend it to the general case of arbitrary
matrices perturbed by updates of arbitrary rank.
Let Λ(M) be the set of distinct eigenvalues of a matrix M, let m_a(M, λ) be the
algebraic multiplicity of λ as an eigenvalue of M, and let m_g(M, λ) be the geometric
multiplicity of λ as an eigenvalue of M. We now state the central theorem of this
paper.

Theorem 3.1. Let A ∈ R^{n×n} be diagonalisable, and let B ∈ R^{n×n} with rank(B) = 1. If C = A + B, then |Λ(C)| ≤ 2|Λ(A)|.
Proof. Clearly |Λ(C)| = |Λ(C) ∩ Λ(A)| + |Λ(C) \ Λ(A)|, and the first term is
bounded by |Λ(A)|. We seek an upper bound for the quantity

\[ \sum_{\substack{\lambda \in \Lambda(C) \\ \lambda \notin \Lambda(A)}} m_a(C, \lambda), \tag{3.8} \]

as this bounds the number of new eigenvalues that the rank-one perturbation can
introduce. (Every eigenvalue λ of C must have m_a(C, λ) ≥ 1.) Since

\[ \sum_{\substack{\lambda \in \Lambda(C) \\ \lambda \notin \Lambda(A)}} m_a(C, \lambda) + \sum_{\lambda \in \Lambda(A)} m_a(C, \lambda) = n \tag{3.9} \]

for fixed matrix size n (with the convention that m_a(C, λ) = 0 ⟺ λ ∉ Λ(C)), the
bound on the number of new eigenvalues introduced is maximised when

\[ \sum_{\lambda \in \Lambda(A)} m_a(C, \lambda) \tag{3.10} \]

is minimised.
Let λ ∈ Λ(A) be a distinct eigenvalue of A. We first investigate m_g(C, λ), the
geometric multiplicity of λ as an eigenvalue of the perturbed matrix C. Using the
fact that rank(X + Y) ≤ rank(X) + rank(Y), we derive a lower bound for m_g(C, λ):

\begin{align}
\operatorname{rank}(A + B - \lambda I) &\le \operatorname{rank}(A - \lambda I) + \operatorname{rank}(B) \tag{3.11a} \\
\implies \quad n - \dim \ker(A + B - \lambda I) &\le n - \dim \ker(A - \lambda I) + 1 \tag{3.11b} \\
\implies \quad m_g(C, \lambda) &\ge m_g(A, \lambda) - 1. \tag{3.11c}
\end{align}

Hence, the geometric multiplicity of an eigenvalue can decrease by at most one under
perturbation by a rank-one operator.
Recall that m_a(M, λ) ≥ m_g(M, λ) for all M and λ. It therefore follows that

\begin{align}
\sum_{\lambda \in \Lambda(A)} m_a(C, \lambda) &\ge \sum_{\lambda \in \Lambda(A)} m_g(C, \lambda) \tag{3.12a} \\
&\ge \sum_{\lambda \in \Lambda(A)} \left( m_g(A, \lambda) - 1 \right) \quad \text{(by (3.11c))} \tag{3.12b} \\
&= \sum_{\lambda \in \Lambda(A)} \left( m_a(A, \lambda) - 1 \right) \quad \text{(by diagonalisability of } A \text{)} \tag{3.12c} \\
&= n - |\Lambda(A)|. \tag{3.12d}
\end{align}
The maximal number of new eigenvalues is introduced when (3.12) is an equality,
in which case

\[ \sum_{\substack{\lambda \in \Lambda(C) \\ \lambda \notin \Lambda(A)}} m_a(C, \lambda) = n - (n - |\Lambda(A)|) = |\Lambda(A)|. \tag{3.13} \]

Hence

\[ |\Lambda(C)| = |\Lambda(C) \cap \Lambda(A)| + |\Lambda(C) \setminus \Lambda(A)| \le |\Lambda(A)| + |\Lambda(A)| = 2\,|\Lambda(A)|. \tag{3.14} \]
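Theorem 3.1 is easy to probe numerically. The sketch below (ours; the tolerance-based counting of distinct eigenvalues is necessarily heuristic in floating point) builds a diagonalisable A with k distinct eigenvalues, each of multiplicity at least two, and applies a generic rank-one update:

```python
import numpy as np

def n_distinct(eigs, tol=1e-6):
    # Heuristic clustering of (possibly complex) eigenvalues.
    eigs = sorted(eigs, key=lambda z: (z.real, z.imag))
    reps = []
    for z in eigs:
        if not reps or abs(z - reps[-1]) > tol:
            reps.append(z)
    return len(reps)

rng = np.random.default_rng(0)
n, k = 40, 4
mu = rng.standard_normal(k)                         # k distinct eigenvalues
labels = np.concatenate([np.arange(k), np.arange(k),
                         rng.integers(0, k, n - 2 * k)])
V = rng.standard_normal((n, n))                     # generically invertible
A = V @ np.diag(mu[labels]) @ np.linalg.inv(V)      # |Lambda(A)| = k
B = np.outer(rng.standard_normal(n), rng.standard_normal(n))  # rank one
kA = n_distinct(np.linalg.eigvals(A))
kC = n_distinct(np.linalg.eigvals(A + B))
print(kA, kC, kC <= 2 * kA)   # generically kC = 2*kA, never more
```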
With this theorem, we can now bound the number of Krylov iterations required
after deflation.
Corollary 3.2. If A = P_F^{-1} J_F is diagonalisable, a Newton step for the deflated
problem with C = P_G^{-1} J_G can be solved in at most twice as many Krylov iterations as
that of the undeflated problem.

Proof. If C is diagonalisable, then the degree of its minimal polynomial is |Λ(C)|, which by Theorem 3.1 is at most 2|Λ(A)|, and
the number of distinct eigenvalues bounds the number of Krylov iterations required to
solve systems with the deflated Jacobian. Now consider the case where C is nondiagonalisable; note
that the set of rank-one perturbations that yield nondiagonalisable C is of Lebesgue
measure zero [28, 37]. From (3.11c) we know that the number of Jordan blocks
associated with an eigenvalue λ ∈ Λ(A) can decrease by at most one in C. Since by
diagonalisability all Jordan blocks of A are of size 1 × 1, the largest Jordan block of
C can be at most of size |Λ(A)| × |Λ(A)|. It is straightforward to see that for any
arrangement of new Jordan blocks with sizes summing to |Λ(A)|, the number of Krylov
iterations required is bounded by twice that of the undeflated problem.
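The corollary can be illustrated directly with a dense stand-in (a sketch; note that SciPy's GMRES tolerance keyword is rtol in recent releases and tol in older ones):

```python
import numpy as np
from scipy.sparse.linalg import gmres

rng = np.random.default_rng(1)
n, k = 200, 5
mu = 2.0 + rng.random(k)                 # k distinct eigenvalues, away from zero
labels = np.concatenate([np.arange(k), np.arange(k),
                         rng.integers(0, k, n - 2 * k)])
V = np.eye(n) + 0.01 * rng.standard_normal((n, n))   # well-conditioned eigvecs
A = V @ np.diag(mu[labels]) @ np.linalg.inv(V)
B = np.outer(rng.standard_normal(n), rng.standard_normal(n)) / (2 * n)
b = rng.standard_normal(n)

def iters(Amat):
    # Count GMRES iterations to a tight tolerance, without restarts.
    count = [0]
    gmres(Amat, b, rtol=1e-10, restart=n, maxiter=n,
          callback=lambda pr_norm: count.__setitem__(0, count[0] + 1),
          callback_type='pr_norm')
    return count[0]

print(iters(A), iters(A + B))   # roughly k, and at most about 2k, iterations
```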
3.2. Various extensions. While Theorem 3.1 is sufficient to analyse the case
of current interest arising in deflation, it is possible to extend it in various ways to
give a more complete picture regarding the number of distinct eigenvalues of a matrix
after perturbation.
The first extension considers perturbations of higher rank.

Theorem 3.3. Let A ∈ R^{n×n} be diagonalisable, and let B ∈ R^{n×n} with rank(B) = r.
If C = A + B, then |Λ(C)| ≤ (r + 1)|Λ(A)|.

Proof. Examining the proof of Theorem 3.1, the fact that B is rank-one is only
used in (3.11b). Generalising this yields

\[ m_g(C, \lambda) \ge m_g(A, \lambda) - r, \tag{3.15} \]

and hence, generalising (3.12d),

\[ \sum_{\lambda \in \Lambda(A)} m_a(C, \lambda) \ge n - r\,|\Lambda(A)|, \tag{3.16} \]

and the result follows.
The second extension considers perturbations to nondiagonalisable matrices.
Definition 3.4 (Defectivity of an eigenvalue). The defectivity of an eigenvalue,
d(M, λ) ≥ 0, is the difference between its algebraic and geometric multiplicities,

\[ d(M, \lambda) \equiv m_a(M, \lambda) - m_g(M, \lambda). \tag{3.17} \]
Given the defectivities of the eigenvalues, we can define the defectivity of a matrix.
Definition 3.5 (Defectivity of a matrix). The defectivity of a matrix, d(M), is
the sum of the defectivities of its eigenvalues:

\[ d(M) \equiv \sum_{\lambda \in \Lambda(M)} \left( m_a(M, \lambda) - m_g(M, \lambda) \right). \tag{3.18} \]
This is a quantitative measure of its degree of nondiagonalisability: a matrix is
diagonalisable if and only if it has defectivity zero.
Remark 1. The defectivity of a matrix is clear from its Jordan form: it is the
number of off-diagonal ones.
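Since the numerical computation of Jordan forms is ill-posed, defectivity is best illustrated in exact arithmetic. A small SymPy example (ours, illustrative only):

```python
from sympy import Matrix

# 4x4 example: eigenvalue 2 with m_a = 3, m_g = 2 (one 2x2 Jordan block
# and one 1x1 block), plus a simple eigenvalue 5.
A = Matrix([[2, 1, 0, 0],
            [0, 2, 0, 0],
            [0, 0, 2, 0],
            [0, 0, 0, 5]])
P, J = A.jordan_form()
# By Remark 1, d(A) equals the number of off-diagonal ones in the Jordan form.
d = sum(1 for i in range(J.rows - 1) if J[i, i + 1] == 1)
print(d)   # 1, since m_a(A, 2) - m_g(A, 2) = 3 - 2 = 1
```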
We can now extend Theorem 3.3 to nondiagonalisable matrices.
Theorem 3.6. Let A ∈ R^{n×n} have defectivity 0 ≤ d(A) ≤ n, and let B ∈ R^{n×n} with
rank(B) = r. If C = A + B, then |Λ(C)| ≤ (r + 1)|Λ(A)| + d(A).

Proof. Diagonalisability of A is used precisely once in the proof of Theorem 3.1,
in passing from (3.12b) to (3.12c). Generalising this, we find

\begin{align}
\sum_{\lambda \in \Lambda(A)} m_a(C, \lambda) &\ge \sum_{\lambda \in \Lambda(A)} m_g(C, \lambda) \tag{3.19} \\
&\ge \sum_{\lambda \in \Lambda(A)} \left( m_g(A, \lambda) - r \right) \tag{3.20} \\
&= \sum_{\lambda \in \Lambda(A)} \left( m_a(A, \lambda) - r - d(A, \lambda) \right) \tag{3.21} \\
&= n - r\,|\Lambda(A)| - d(A), \tag{3.22}
\end{align}

and the result follows.
4. Inexact inner solves. Theorem 3.1 pertains to deflation when exact arithmetic and exact inner solves are used. However, when inexact inner solves are used,
available results in specific cases indicate that the resulting spectrum of the preconditioned operator is the union of disjoint regions clustered around the eigenvalues
associated with exact inner solves [31, 15]. On the (optimistic) assumption that this
is typical, we give a preliminary analysis of the resulting spectrum after deflation.
Theorem 4.1. Let A be diagonalisable, let V be the matrix of eigenvectors of
A, and let κ(V) = ‖V‖₂ ‖V^{-1}‖₂. Let B(λ, δ) be an eigenvalue cluster of A centred at
λ with radius δ. Then after perturbation by B = uv^T, the associated eigenvalues of
C = A + B are contained within B(λ, δ + κ(V)√(u^T u · v^T v)).

Proof. Consider an eigenvalue λ_C of C. By the Bauer–Fike theorem [2], there
exists a λ_A ∈ Λ(A) such that

\[ |\lambda_C - \lambda_A| \le \kappa(V) \|B\|_2. \tag{4.1} \]

Without loss of generality assume λ_A ∈ B(λ, δ). Noting that ‖B‖₂ = √(u^T u · v^T v), we have

\begin{align}
|\lambda - \lambda_C| &\le |\lambda - \lambda_A| + |\lambda_C - \lambda_A| \tag{4.2} \\
&\le \delta + \kappa(V) \sqrt{u^T u \; v^T v}, \tag{4.3}
\end{align}

and hence λ_C ∈ B(λ, δ + κ(V)√(u^T u · v^T v)).
Remark 2. It is possible to extend Theorem 4.1 to the case of nondiagonalisable
A by employing an extension of the Bauer–Fike theorem based on the Schur decomposition [13, Theorem 7.2.3].
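The Bauer–Fike mechanism in the proof is straightforward to check numerically (a sketch with synthetic data; u plays the role of the small preconditioned residual and v that of the derivative of the deflation factor):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
V = np.eye(n) + 0.05 * rng.standard_normal((n, n))  # mildly non-normal eigenvectors
lam = rng.standard_normal(n)
A = V @ np.diag(lam) @ np.linalg.inv(V)
u = 1e-3 * rng.standard_normal(n)                   # small "residual"
v = rng.standard_normal(n)
bound = np.linalg.cond(V, 2) * np.linalg.norm(u) * np.linalg.norm(v)
# Theorem 4.1: every eigenvalue of C = A + u v^T lies within `bound`
# of some eigenvalue of A.
eigC = np.linalg.eigvals(A + np.outer(u, v))
dist = max(min(abs(m - l) for l in lam) for m in eigC)
print(dist <= bound, dist, bound)
```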
If the undeflated preconditioned operator A is symmetric, then κ(V ) = 1. Recall
that in the context of deflation B is the outer product of the preconditioned undeflated residual and the derivative of the deflation factor. Theorem 4.1 states that the
spreading of eigenvalue clusters due to deflation will be small when this residual is
small (near to solutions of the original problem), when the deflation factor is changing
slowly (far away from deflated solutions), and when the preconditioned operator is
close to normality. In the symmetric case, such as in optimisation, this result could
be combined with analysis of the Krylov method employed to bound its convergence
as a function of the residual and the deflation factor. (In the nonsymmetric case the
eigenvalue distribution does not govern the convergence of optimal Krylov methods
[16].)
5. A linear-quartic PDE-constrained optimisation problem. Corollary
3.2 compares the number of Krylov iterations for exact convergence required for a
Newton step at the same iterate. We now consider an experiment to compare the
performance of a block triangular preconditioner as the number of solutions deflated
is varied. Consider the following quartic variant of the problem to identify an optimal
heat source under homogeneous Dirichlet boundary conditions:
\[ \underset{y \in H_0^1,\; u \in L^2}{\text{minimise}} \quad \frac{1}{2} \int_\Omega (y - y_A)^2 (y - y_B)^2 + \frac{\beta}{2} \int_\Omega u^2 \tag{5.1} \]

subject to

\begin{align}
-\nabla^2 y &= u \quad \text{in } \Omega, \tag{5.2} \\
y &= 0 \quad \text{on } \partial\Omega. \tag{5.3}
\end{align}
Whereas the standard problem with a quadratic misfit term has a unique solution [39],
here the quartic misfit term induces nonlinearity and nonconvexity in the optimisation
problem, and causes it to permit at least two local minima: one (u, y) pair with y
close to yA , and another pair with y close to yB . The Karush–Kuhn–Tucker necessary
conditions are: find (y, u, p) ∈ H_0^1 × L^2 × H_0^1 such that

\begin{align}
\int_\Omega \bar{y} \left[ (y - y_A)(y - y_B)^2 + (y - y_A)^2 (y - y_B) \right] + \int_\Omega \nabla p \cdot \nabla \bar{y} &= 0, \nonumber \\
\beta \int_\Omega \bar{u} u - \int_\Omega p \bar{u} &= 0, \tag{5.4} \\
\int_\Omega \nabla \bar{p} \cdot \nabla y - \int_\Omega \bar{p} u &= 0, \nonumber
\end{align}

for all (ȳ, ū, p̄) ∈ H_0^1 × L^2 × H_0^1.
Upon taking the Newton linearisation and discretising with finite elements, we find

\[ \begin{pmatrix} M_p & 0 & K \\ 0 & \beta M & -M \\ K & -M & 0 \end{pmatrix} \begin{pmatrix} \delta y \\ \delta u \\ \delta p \end{pmatrix} = -F(y, u, p), \tag{5.5} \]

where F is the discretisation of the residual (5.4), K is the standard stiffness matrix,
M is the standard mass matrix, and

\[ (M_p)_{ij} = \int_\Omega \left[ (y - y_B)^2 + 4(y - y_A)(y - y_B) + (y - y_A)^2 \right] \phi_i \phi_j, \tag{5.6} \]

i.e. a mass matrix with a spatially varying coefficient. Due to the spatially varying
coefficient, this matrix is indefinite.
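To fix ideas about the block structure of (5.5), here is a small assembly sketch with 1D finite-difference stand-ins (ours; the actual experiments use 2D piecewise linear finite elements, and the sign-changing coefficient in M_p is faked):

```python
import numpy as np
import scipy.sparse as sp

n, beta, h = 100, 1e-2, 1.0 / 101
K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h      # ~ stiffness
M = sp.diags([1.0, 4.0, 1.0], [-1, 0, 1], shape=(n, n)) * (h / 6)  # ~ mass
coeff = np.cos(np.linspace(0.0, 4.0 * np.pi, n))   # changes sign, cf. (5.6)
Mp = sp.diags(coeff * h)          # lumped stand-in: symmetric, sign-indefinite

# Newton/KKT system (5.5), unknowns ordered (dy, du, dp):
Jac = sp.bmat([[Mp,   None,       K],
               [None, beta * M,  -M],
               [K,    -M,       None]], format='csr')
```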
In order to employ a block-triangular preconditioner, a variable must be chosen for elimination to construct the Schur complement. The typical choice in PDE-constrained optimisation is to eliminate the adjoint variable p, as its associated diagonal block is always zero and hence the diagonal preconditioner of [29] may be
employed. This also has the advantage that the top-left block whose inverse action
must be computed in the Schur complement action is itself block-diagonal. However,
on taking the Schur complement with respect to p, we find

\[ S_p = -K M_p^{-1} K + \frac{1}{\beta} M. \tag{5.7} \]
Each action of S_p requires the action of M_p^{-1}, and as noted above this matrix is
indefinite and expensive to solve. Therefore, to avoid this difficulty, we instead choose
to eliminate the state y. This yields the Schur complement

\[ S_y = \beta K M^{-1} K + M_p, \tag{5.8} \]

which avoids the necessity of solving linear systems involving M_p for every Schur
complement action. This Schur complement suggests the approximations [33]

\begin{align}
\hat{S}_y &= \beta K M^{-1} K, \tag{5.9} \\
\tilde{S}_y &= M_p, \tag{5.10}
\end{align}

with inverses

\begin{align}
\hat{S}_y^{-1} &= \beta^{-1} K^{-1} M K^{-1}, \tag{5.11} \\
\tilde{S}_y^{-1} &= M_p^{-1}. \tag{5.12}
\end{align}
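Continuing the sketch above (and assuming its names K, M, Mp, n and beta), the action of S_y and the approximation (5.11) can be set up matrix-free:

```python
import scipy.sparse.linalg as spla

M_lu = spla.splu(M.tocsc())
K_lu = spla.splu(K.tocsc())
Sy = spla.LinearOperator((n, n),          # (5.8): beta K M^{-1} K + Mp
        matvec=lambda x: beta * (K @ M_lu.solve(K @ x)) + Mp @ x)
Sy_hat_inv = spla.LinearOperator((n, n),  # (5.11): beta^{-1} K^{-1} M K^{-1}
        matvec=lambda x: K_lu.solve(M @ K_lu.solve(x)) / beta)
# A loose inner solve for S_y preconditioned by (5.11), as used below:
# x, info = spla.gmres(Sy, rhs, M=Sy_hat_inv, rtol=1e-1)
```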
As data, we take

\begin{align}
y_A &= x_1 (1 - x_1)\, x_2 (1 - x_2), \tag{5.13} \\
y_B &= \sin(2\pi x_1) \sin(2\pi x_2), \tag{5.14}
\end{align}

on the unit square Ω = [0, 1]². The problem was discretised with piecewise linear finite
elements for all variables and first solved on a mesh of 50 × 50 elements (7803 degrees
of freedom) with NLEQ-ERR and LU, yielding 9 coarse grid solutions from the initial
guess (y, u, p) = (0, 0, 0), including the expected local minima. These solutions were
then used as initial guesses in a grid sequencing technique on a mesh of 1825 × 1825
elements (just over 10 million degrees of freedom) using the preconditioned Krylov
solver to be described, yielding 9 fine grid solutions. Then, for each initial guess, the
number of other fine grid solutions deflated was varied (from 0 to 8) and the number
of outer Krylov iterations required to converge was recorded.
On the fine grid, flexible GMRES [35] was employed as the outer Krylov solver
for (5.5), as this allows for nonlinear preconditioning (inner Krylov solvers) to be
employed. Each Newton step was terminated with a relative tolerance of 10^{-5} and
an absolute tolerance of 10^{-10} in the ℓ² norm. As the bottom right block of the
Jacobian is nonzero with this choice of Schur complement, the full Schur complement
preconditioner
\[ P^{-1} = \begin{pmatrix} I & -A^{-1} B \\ 0 & I \end{pmatrix} \begin{pmatrix} A^{-1} & 0 \\ 0 & S^{-1} \end{pmatrix} \begin{pmatrix} I & 0 \\ -B^T A^{-1} & I \end{pmatrix} \tag{5.15} \]
                      Solutions deflated
Initial guess    0    1    2    3    4    5    6    7    8
     0           6    6    5    6    6    6    6    6    6
     1          14   14   13   14   11   12   14   10   12
     2           9    9    9    9    9    9    9   11    8
     3          10   12    8   11   10   13   12   11   11
     4          11   13   14   13   15   16   14   13   14
     5           6    8    8    9    9    8    9    8    9
     6           8    7    8    7    8    9    8    8   10
     7           8    8    9    8    9    9    8    9    8
     8          11   11    9   11    9    9    9    9   11

Table 5.1: The number of outer Krylov iterations required to converge (summed over all Newton steps), as the number of deflations is varied, with the full Schur complement preconditioner (5.15). The number of iterations does not systematically increase as more solutions are deflated, and sometimes decreases.
was employed; with exact inner solvers, this would converge in one FGMRES iteration
[3]. Since no spectrally equivalent operator to the Schur complement was available,
FGMRES was also chosen as the inner Krylov solver for S_y, with a weak relative convergence criterion of 10^{-1} in the ℓ² norm. The preconditioner for S_y alternated between
200 iterations of (5.11) and 100 iterations of (5.12); although in most cases (5.12)
was not used, this alternation was essential to overcome stalling of some Schur complement solves. The innermost solves of M and K were performed with conjugate
gradients [18] preconditioned by algebraic multigrid [17, 1, 8], while the innermost
solve of M_p was performed with GMRES [36] preconditioned by Jacobi’s method.
These strong innermost solvers were found to be essential for the convergence of the
Schur complement solve, and hence for the convergence of the Newton step.
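Because the inner solves are themselves inexactly converged Krylov iterations, the preconditioner seen by the outer method differs at every application; this is precisely what a flexible method tolerates and standard GMRES does not. For reference, a minimal right-preconditioned FGMRES in the style of [35] (a sketch without restarts or breakdown handling, not the production solver used for these experiments):

```python
import numpy as np

def fgmres(A, b, apply_prec, tol=1e-8, maxiter=50):
    # Flexible GMRES with x0 = 0: apply_prec may be a different operator
    # (e.g. an inner Krylov solve to a loose tolerance) at every iteration.
    n = b.shape[0]
    beta = np.linalg.norm(b)
    V = np.zeros((n, maxiter + 1)); V[:, 0] = b / beta  # Arnoldi basis
    Z = np.zeros((n, maxiter))                          # preconditioned basis
    H = np.zeros((maxiter + 1, maxiter))
    for j in range(maxiter):
        Z[:, j] = apply_prec(V[:, j])
        w = A @ Z[:, j]
        for i in range(j + 1):                          # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
        e1 = np.zeros(j + 2); e1[0] = beta
        y = np.linalg.lstsq(H[:j + 2, :j + 1], e1, rcond=None)[0]
        if np.linalg.norm(H[:j + 2, :j + 1] @ y - e1) < tol * beta:
            break
    return Z[:, :j + 1] @ y, j + 1   # solution and iteration count
```

Here apply_prec would wrap the block preconditioner (5.15), with its Schur complement block itself solved by an inner (F)GMRES iteration to relative tolerance 10^{-1}.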
The Krylov iterations required as the number of deflations was varied are shown
in table 5.1. The behaviour observed is better than the theory would suggest: the
number of iterations required barely increases, and sometimes decreases, even as more
solutions are deflated. The deflated problems can be solved at a modest overhead
relative to the original problem, even with many deflations.
It is sometimes of practical interest to consider preconditioners that yield nondiagonalisable matrices, such as the upper triangular preconditioner

\[ P^{-1} = \begin{pmatrix} I & -A^{-1} B \\ 0 & I \end{pmatrix} \begin{pmatrix} A^{-1} & 0 \\ 0 & S^{-1} \end{pmatrix}. \tag{5.16} \]

With exact inner solvers, the resulting preconditioned operator is nondiagonalisable
with minimal polynomial (λ − 1)², and an optimal Krylov method would converge
in two iterations. The results for the analogous experiment are shown in table 5.2.
Again, the deflated problems can still be solved at a modest overhead relative to the
original problem, even as the number of deflations is increased.
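The minimal-polynomial claim is easy to verify on a random dense analogue (a sketch; A, B and D stand for generic blocks, not the specific matrices of (5.5)):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 40, 15
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)  # SPD (1,1) block
B = rng.standard_normal((n, m))
D = rng.standard_normal((m, m)); D = D + D.T                  # nonzero (2,2) block
J = np.block([[A, B], [B.T, D]])
S = D - B.T @ np.linalg.solve(A, B)                           # exact Schur complement
P = np.block([[A, B], [np.zeros((m, n)), S]])                 # cf. (5.16)
E = np.linalg.solve(P, J) - np.eye(n + m)
print(np.linalg.norm(E @ E))   # ~ 1e-12: minimal polynomial divides (lambda - 1)^2
```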
6. Conclusions. We have proven that, in exact arithmetic and with exact inner
solves, block triangular preconditioners for the Newton step of a deflated problem
can take no more than twice as many iterations as for the corresponding undeflated
problem, so long as the preconditioned operator is diagonalisable. This holds no
                      Solutions deflated
Initial guess    0    1    2    3    4    5    6    7    8
     0           7    7    6    7    7    7    6    7    7
     1          12   17   17   20   16   14   18   16   21
     2           9   11   11   12   13   14   12   12   12
     3          10   15   12   14   12   13   11   13   14
     4          17   18   20   19   14   15   17   17   21
     5          10    9   13   11   15   14   14   12   14
     6          11    9   11   12   12   12   12   10   13
     7          11   12   13   11   13   15   13   11   14
     8           8   12   12   13   13   14   13   10   10

Table 5.2: The number of outer Krylov iterations required to converge (summed over all Newton steps), as the number of deflations is varied, with the upper triangular preconditioner (5.16). The increase in iterations is modest, and the number of iterations does not systematically increase as more solutions are deflated.
matter how many other solutions are deflated, as the update is always rank-one. We
have also performed a preliminary analysis of the practical case where inexact inner
solvers are employed. These results indicate that deflation scales to the computation
of multiple solutions of arbitrary-dimensional optimisation problems, so long as the
underlying undeflated problem can be efficiently solved.
References.
[1] M. F. Adams, H. H. Bayraktar, T. M. Keaveny, and P. Papadopoulos, Ultrascalable implicit finite element analyses in solid mechanics with over
a half a billion degrees of freedom, in ACM/IEEE Proceedings of SC2004: High
Performance Networking and Computing, Pittsburgh, Pennsylvania, 2004.
[2] F. L. Bauer and C. T. Fike, Norms and exclusion theorems, Numerische
Mathematik, 2 (1960), pp. 137–141.
[3] M. Benzi, G. H. Golub, and J. Liesen, Numerical solution of saddle point
problems, Acta Numerica, 14 (2005), pp. 1–137.
[4] J. R. Bunch, C. P. Nielsen, and D. C. Sorensen, Rank-one modification
of the symmetric eigenproblem, Numerische Mathematik, 31 (1978), pp. 31–48.
[5] Y. Choi, C. Farhat, W. Murray, and M. A. Saunders, A practical factorization of a Schur complement for PDE-constrained distributed optimal control,
Journal of Scientific Computing, (2014), pp. 1–22.
[6] P. Deuflhard, Newton Methods for Nonlinear Problems, vol. 35, Springer-Verlag, 2011.
[7] W. E, W. Ren, and E. Vanden-Eijnden, String method for the study of rare
events, Physical Review B, 66 (2002), p. 052301.
[8] R. D. Falgout, An introduction to algebraic multigrid computing, Computing
in Science & Engineering, 8 (2006), pp. 24–33.
[9] P. E. Farrell, Á. Birkisson, and S. W. Funke, Deflation techniques for
finding distinct solutions of nonlinear partial differential equations, SIAM Journal
on Scientific Computing, 37 (2015), pp. A2026–A2045.
[10] P. E. Gill, W. Murray, D. B. Ponceleón, and M. A. Saunders, Preconditioners for indefinite systems arising in optimization, SIAM Journal on Matrix
Analysis and Applications, 13 (1992), pp. 292–311.
[11] A. A. Goldstein and J. F. Price, On descent from local minima, Mathematics
of Computation, 25 (1971), pp. 569–574.
[12] G. H. Golub, Some modified matrix eigenvalue problems, SIAM Review, 15
(1973), pp. 318–334.
[13] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins
University Press, 4th ed., 2012.
[14] N. I. M. Gould, C. Ortner, and D. Packwood, An efficient dimer method
with preconditioning and linesearch, 2014. arXiv:1407.2817 [math.OC].
[15] N. I. M. Gould and V. Simoncini, Spectral analysis of saddle point matrices
with indefinite leading blocks, SIAM Journal on Matrix Analysis and Applications, 31 (2010), pp. 1152–1171.
[16] A. Greenbaum, V. Pták, and Z. Strakoš, Any nonincreasing convergence
curve is possible for GMRES, SIAM Journal on Matrix Analysis and Applications, 17 (1996), pp. 465–469.
[17] V. E. Henson and U. M. Yang, BoomerAMG: A parallel algebraic multigrid
solver and preconditioner, Applied Numerical Mathematics, 41 (2002), pp. 155–
177.
[18] M. R. Hestenes and E. Stiefel, Methods of Conjugate Gradients for solving
linear systems, Journal of Research of the National Bureau of Standards, 49
(1952), pp. 409–436.
[19] M. Hinze, R. Pinnau, M. Ulbrich, and S. Ulbrich, Optimization with
PDE constraints, vol. 23 of Mathematical Modelling: Theory and Applications,
Springer, 2009.
[20] L. Hörmander and A. Melin, A remark on perturbations of compact operators, Mathematica Scandinavica, 75 (1994), pp. 255–262.
[21] I. C. F. Ipsen and B. Nadler, Refined perturbation bounds for eigenvalues of
Hermitian and non-Hermitian matrices, SIAM Journal on Matrix Analysis and
Applications, 31 (2009), pp. 40–53.
[22] A. Jameson, Aerodynamic design via control theory, Journal of Scientific Computing, 3 (1988), pp. 233–260.
[23] F.-X. Le Dimet and O. Talagrand, Variational algorithms for analysis and
assimilation of meteorological observations: theoretical aspects, Tellus A, 38A
(1986), pp. 97–110.
[24] A. V. Levy and S. Gomez, The tunneling method applied to global optimization,
in Numerical Optimization 1984, P. T. Boggs, ed., SIAM, June 1985.
[25] Y. Li and J. Zhou, A minimax method for finding multiple critical points and
its applications to semilinear PDEs, SIAM Journal on Scientific Computing, 23
(2001), pp. 840–865.
[26] R. Martí, Multi-start methods, in Handbook of Metaheuristics, F. Glover and
G. A. Kochenberger, eds., vol. 57 of International Series in Operations Research
& Management Science, Springer, 2003, pp. 355–368.
[27] C. Mehl, V. Mehrmann, A. C. M. Ran, and L. Rodman, Jordan forms of
real and complex matrices under rank one perturbations, Operators and Matrices,
7 (2013), pp. 381–398.
[28] J. Moro and F. M. Dopico, Low rank perturbation of Jordan structure, SIAM
Journal on Matrix Analysis and Applications, 25 (2003), pp. 495–506.
[29] M. F. Murphy, G. H. Golub, and A. J. Wathen, A note on preconditioning
for indefinite linear systems, SIAM Journal on Scientific Computing, 21 (2000),
pp. 1969–1972.
[30] J. W. Pearson and A. J. Wathen, A new approximation of the Schur complement in preconditioners for PDE-constrained optimization, Numerical Linear
Algebra with Applications, 19 (2012), pp. 816–829.
[31] J. Pestana and A. J. Wathen, Natural preconditioning and iterative methods
for saddle point systems, SIAM Review, 57 (2015), pp. 71–91.
[32] J. D. Pintér, Global optimization: software, test problems, and applications,
in Handbook of Global Optimization, P. M. Pardalos and H. E. Romeijn, eds.,
Springer, 2002, ch. 15, pp. 515–569.
[33] T. Rees, H. S. Dollar, and A. J. Wathen, Optimal solvers for PDE-constrained optimization, SIAM Journal on Scientific Computing, 32 (2010),
pp. 271–298.
[34] A. Richards, University of Oxford Advanced Research Computing, 2015. doi:10.5281/zenodo.22558.
[35] Y. Saad, A flexible inner-outer preconditioned GMRES algorithm, SIAM Journal
on Scientific Computing, 14 (1993), pp. 461–469.
[36] Y. Saad and M. Schultz, GMRES: a generalized minimal residual algorithm
for solving nonsymmetric linear systems, SIAM Journal on Scientific and Statistical Computing, 7 (1986), pp. 856–869.
[37] S. V. Savchenko, Typical changes in spectral properties under perturbations by
a rank-one operator, Mathematical Notes, 74 (2003), pp. 557–568.
[38] A. M. Stuart, Inverse problems: a Bayesian perspective, Acta Numerica, 19
(2010), pp. 451–559.
[39] F. Tröltzsch, Optimal control of partial differential equations: Theory, methods
and applications, vol. 112 of Graduate Studies in Mathematics, AMS, 2010.
[40] S. Varrette, P. Bouvry, H. Cartiaux, and F. Georgatos, Management
of an academic HPC cluster: the UL experience, in Proc. of the 2014 Intl. Conf.
on High Performance Computing & Simulation (HPCS 2014), Bologna, Italy,
July 2014, IEEE, pp. 959–967.
[41] J. H. Wilkinson, The Algebraic Eigenvalue Problem, vol. 87 of Monographs on
Numerical Analysis, Oxford University Press, 1965.