MULTIPLE LOCAL MINIMA OF PDE-CONSTRAINED OPTIMISATION PROBLEMS VIA DEFLATION

P. E. FARRELL∗

∗Mathematical Institute, University of Oxford, Oxford, UK, and Center for Biomedical Computing, Simula Research Laboratory, Oslo, Norway ([email protected]). This research is funded by EPSRC grants EP/K030930/1 and EP/M019721/1, an ARCHER RAP award, and a Center of Excellence grant from the Research Council of Norway to the Center for Biomedical Computing at Simula Research Laboratory. The author would like to acknowledge the use of the University of Oxford Advanced Research Computing (ARC) facility in carrying out this work [34]. Experiments presented in this paper were carried out using the HPC facilities of the University of Luxembourg [40]. This work used the ARCHER UK National Supercomputing Service. The author would like to thank A. J. Wathen, J. S. Hale, M. A. Saunders, C. Beentjes, and N. I. M. Gould for useful discussions.

Abstract. Nonconvex optimisation problems constrained by partial differential equations (PDEs) may permit distinct local minima. In this paper we present a numerical technique, called deflation, for computing multiple local solutions of such optimisation problems. The basic approach is to apply a nonlinear transformation to the Karush–Kuhn–Tucker optimality conditions that eliminates previously found solutions from consideration. Starting from some initial guess, Newton's method is used to find a stationary point of the Lagrangian; this solution is then deflated away, and Newton's method is initialised from the same initial guess to find other solutions. In this paper, we investigate how the Schur complement preconditioners widely used in PDE-constrained optimisation perform after deflation. We prove an upper bound on the number of new distinct eigenvalues of a matrix after an arbitrary additive perturbation; from this it follows that for diagonalisable operators the number of Krylov iterations required for exact convergence of the Newton step at most doubles compared to the undeflated problem. While deflation is not guaranteed to converge to all minima, these results indicate the approach scales to arbitrary-dimensional problems if a scalable Schur complement preconditioner is available. The technique is demonstrated on a discretised nonconvex PDE-constrained optimisation problem with approximately ten million degrees of freedom.

Key words. distinct eigenvalues, deflation, Newton's method, multiple local minima, PDE-constrained optimisation.

AMS subject classifications. 35Q93, 90C26, 65K10, 65N30

1. Introduction. Many situations in science, engineering and economics can be modelled as the optimisation of some functional constrained by a system of partial differential equations along with suitable boundary conditions. Important examples include the optimisation of aircraft to minimise drag and maximise lift [22] and variational data assimilation in numerical weather prediction [23]. In general, one wishes to compute the feasible input that attains the global minimum of the functional to be minimised; in practice, rigorous global optimisation techniques are too expensive for high-dimensional problems, and heuristic strategies are employed instead [32]. In other situations, it is desirable to compute as many local minima of the functional as possible: for example, the aesthetics of the optimal designs may be crucial, but it is difficult to formulate aesthetics quantitatively as a constraint on an optimisation problem. Another situation where this arises is multimodal Bayesian inference [38], where multiple modes of the posterior are characterised by multiple local minima of the negative logarithm of the posterior density.

In this paper we propose a new strategy for computing distinct local minima of PDE-constrained optimisation problems. Suppose one solution of the Karush–Kuhn–Tucker optimality conditions has been computed (e.g. with some inexact Newton method) and additional solutions are sought. A new deflated problem is constructed
with two properties: first, that all other solutions of the original problem are also solutions of the deflated problem, and vice versa; and second, that Newton's method (or another rootfinding technique) applied to the deflated problem will never converge to the solution already available [9]. In this manner, Newton's method may be started from the same initial guess and converge to a new solution of the optimality conditions. The process is repeated until Newton's method fails to converge. This procedure will also identify local maxima and saddle points, but it is straightforward to characterise these after as many stationary points as possible have been found. In fact, in many applications it is desirable to compute saddle points also [25, 7, 14].

This deflation approach is not guaranteed to find all local minima, as it is not possible to guarantee that the nonlinear rootfinding algorithm employed will converge from arbitrarily bad initial guesses, even if a solution to the deflated problem exists. However, when deflation is combined with a robust line search such as NLEQ-ERR [6], many stationary points can be found, even with uninformed initial guesses.

The key computational problem to solve is the efficient preconditioning of the linear systems that arise in Newton's method applied to the deflated problem. If a good preconditioner is available, then the deflation approach can scale to problems of arbitrarily large dimension. Let $J_F$ and $P_F$ be the undeflated discretised Jacobian and associated preconditioner, and let $J_G$ and $P_G$ be their deflated counterparts ($J_G$ follows from the chain rule; $P_G$ is to be defined). In previous work [9], we proposed a preconditioning strategy for $P_G$ in terms of $P_F$ and showed that $\|P_G^{-1} J_G - I\|$ could be bounded by a small multiple of $\|P_F^{-1} J_F - I\|$. However, it is not necessary for a good preconditioner to control $\|P_F^{-1} J_F - I\|$. As pointed out in [29], it is sufficient that $P_F^{-1} J_F$ have a minimal polynomial of small degree; the authors of [29] give an example where $P_F^{-1}$ in no sense approximates $J_F^{-1}$, yet an optimal Krylov method applied to $P_F^{-1} J_F$ will converge in three iterations. The previous results of [9] do not address this case.

In this paper we analyse the number of distinct eigenvalues of a matrix after an additive perturbation. We prove a new theorem relating the number of distinct eigenvalues after perturbation to the prior number of distinct eigenvalues, the rank of the update, and the degree of nondiagonalisability of the matrix. Applying this theorem to the case of deflation, we prove that if the preconditioned operator is diagonalisable, the number of distinct eigenvalues of $P_G^{-1} J_G$ can be no more than twice the number of distinct eigenvalues of $P_F^{-1} J_F$.
Hence, by studying the resulting minimal polynomial, we show that any Krylov method with an optimality or Galerkin property will converge exactly, in exact arithmetic, in no more than twice the number of iterations required for the undeflated problem. This holds regardless of the number of solutions deflated: the number of Krylov iterations required only doubles once, rather than doubling after each deflation. This analysis is an essential step in extending deflation to large-scale optimisation problems, as such preconditioners are ubiquitous in optimisation [30].

We now contrast the proposed deflation approach with other related approaches. The approach of Goldstein and Price [11] attempts to find superior local minima by performing Taylor expansions around a minimum already obtained and minimising auxiliary functions that involve higher derivatives of the function concerned. In the context of PDE-constrained optimisation, the need for further derivatives (beyond Hessians of the Lagrangian) is unattractive, because the problem may not possess sufficient regularity, and the number of auxiliary PDEs to solve in the computation of derivatives of the reduced functional is exponential in the degree. Another class of algorithms, multistart methods, attempts to find multiple local minima by starting independent local optimisation runs from random initial points [26]. The tunnelling method of Levy and Gomez [24] is the closest in spirit to the approach proposed here: tunnelling employs a local optimisation algorithm to identify an initial minimum $x^*$ of a functional $J$, and then seeks other values of the controls that attain the same value, by finding a root of $J(x) - J(x^*) = 0$ ("tunnelling" into another valley). If such an $x$ is computed, the local optimisation algorithm is initialised from this new point and the process repeated. In order to ensure that the tunnelling phase converges to a solution other than $x^*$, the authors add a pole to the function to ensure nonconvergence to $x^*$, which is exactly the deflation concept investigated in this work. However, there are several key differences. Levy and Gomez apply deflation to the tunnelling function, whereas here deflation is applied to the optimality conditions directly. Tunnelling attempts to find one global minimum, ignoring undesired local minima, whereas here we are concerned with identifying as many stationary points as possible. Finally, Levy and Gomez do not discuss strategies for dealing with the density of the Jacobian induced by deflation: all examples they consider have small, dense Jacobians, and hence there is no need for the analysis of preconditioned Krylov methods that is central to this work.

2. Deflation. In this section we review the deflation strategy for nonlinear equations proposed in [9]. Consider the following generic PDE-constrained optimisation problem:

\[
\min_{y \in Y,\; u \in U} J(y, u) \quad \text{subject to} \quad c(y, u) = 0, \tag{2.1}
\]

where $y$ is the state, $u$ is the control, $Y$ is the Hilbert space of states, $U$ is the Hilbert space of controls (without any side constraints), $J : Y \times U \to \mathbb{R}$ is the functional to be minimised, and $c : Y \times U \to Z$ represents the PDE constraint relating $y$ and $u$. In general, problem (2.1) will permit multiple local minima; the objective of this paper is to identify as many stationary points of this problem as possible.
Under suitable regularity conditions [39, 19], minima of this problem must satisfy the first-order necessary conditions

\[
F \equiv \begin{pmatrix} L_y \\ L_u \\ L_p \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \tag{2.2}
\]

where $L$ is the associated Lagrangian

\[
L = J(y, u) + (p, c), \tag{2.3}
\]

with $p$ the adjoint variable associated with the PDE constraint. Equation (2.2) forms a set of three coupled PDEs whose solutions form a superset of the local minima (local maxima and saddle points also satisfy (2.2)). If many solutions of (2.2) could be found, then it would be straightforward to postprocess these to identify multiple local minima of problem (2.1).

After discretisation on a given mesh, Newton's method may be applied to compute solutions of (2.2).¹ For concreteness we will consider Newton's method globalised with line search, but we remark that the results of this paper apply without modification to trust region methods also. Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be the discretised residual, let $J_F \in \mathbb{R}^{n \times n}$ be its Jacobian, and let $x_0 = (y_0, u_0, p_0)$ be an initial guess for the minimum. Newton's method consists of solving the linear system

\[
P_F(x_n)^{-1} J_F(x_n)\, \delta x_{n+1} = -P_F(x_n)^{-1} F(x_n), \tag{2.4}
\]

where $P_F$ is a preconditioner (to be discussed in section 3). This is followed by an update

\[
x_{n+1} = x_n + \mu\, \delta x_{n+1}, \tag{2.5}
\]

with $\mu$ chosen by a line search such as the affine-covariant algorithm of Deuflhard [6].

¹The deflation approach can be extended to the case where discretisation is deferred until after the Newton–Kantorovitch iteration is applied [6], as deflation can be formulated at the continuous level [9].

Assume that Newton's method applied to $F$ converges to a solution $r$. The core idea of deflation is to construct a new nonlinear problem $G$ that retains all other solutions of the problem, but for which Newton's method will never converge to $r$ again. Specifically, we construct a new residual $G$ with two properties:

1. For all $x \neq r$, $F(x) = 0 \iff G(x) = 0$.
2. Given any sequence $x_i \to r$,
\[
\liminf_{x_i \to r} \|G(x_i)\| > 0. \tag{2.6}
\]

The former guarantees that solutions of $G$ are also solutions of $F$, whereas the latter ensures that Newton's method will not converge to $r$ again, even starting from the same initial guess $x_0$. Newton's method may in turn be applied to $G$; supposing Newton's method converges to another root, this solution may in turn be deflated and the procedure repeated. In this way many solutions of (2.2) may be found, even starting from the same initial guess.

There are many ways to construct a residual $G$ with the desired properties; several techniques are investigated in [9]. The approach considered in this paper is to define

\[
G(x) = \left( \frac{1}{\|x - r\|^p} + \alpha \right) F(x), \tag{2.7}
\]

with typical parameter values $p = 2$ and $\alpha = 1$. The pole strength $p$ governs the rate at which the function approaches the introduced singularity, while the shift parameter $\alpha$ ensures that the deflated residual recovers the behaviour of the original residual far from previously found solutions, as $\|x - r\| \to \infty$. For more details see [9].
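To make the deflation construction concrete, the following minimal sketch (in Python with NumPy; the helper names `deflate` and `newton` are ours for illustration, not from [9]) applies (2.7), extended multiplicatively over previously found roots, to a toy scalar residual with two roots. An undamped Newton iteration with a finite-difference Jacobian stands in for the globalised methods discussed above; starting from the very same initial guess, the deflated problem yields the second root.

```python
import numpy as np

def deflate(F, roots, p=2, alpha=1.0):
    """Deflated residual G(x) = prod_r (1/||x - r||^p + alpha) F(x), cf. (2.7)."""
    def G(x):
        eta = 1.0
        for r in roots:
            eta *= 1.0 / np.linalg.norm(x - r)**p + alpha
        return eta * F(x)
    return G

def newton(F, x0, tol=1e-10, maxit=50, h=1e-7):
    """Undamped Newton with a finite-difference Jacobian (toy problems only)."""
    x = np.array(x0, dtype=float)
    for _ in range(maxit):
        f = F(x)
        if np.linalg.norm(f) < tol:
            return x
        n = x.size
        J = np.empty((n, n))
        for j in range(n):
            e = np.zeros(n)
            e[j] = h
            J[:, j] = (F(x + e) - f) / h   # forward-difference Jacobian column
        x = x - np.linalg.solve(J, f)
    raise RuntimeError("Newton did not converge")

F = lambda x: np.array([x[0]**2 - 1.0])   # toy residual with roots at +1 and -1
r1 = newton(F, [0.5])                     # converges to +1
r2 = newton(deflate(F, [r1]), [0.5])      # same initial guess, now finds -1
print(r1, r2)
```

In practice $F$ would be the discretised residual of (2.2), the Jacobian would be assembled analytically, and the iteration would be globalised with a robust line search such as NLEQ-ERR.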
3. Preconditioners for PDE-constrained optimisation. The ability to efficiently solve a PDE-constrained optimisation problem with an inexact Newton method typically hinges on the availability of a good preconditioner for the Karush–Kuhn–Tucker system. This has been the focus of intense research in recent years [10, 3, 33, 30, 5]. Suppose $F : \mathbb{R}^N \to \mathbb{R}^N$ is the residual of the undeflated discretised nonlinear problem $F(x) = 0$, with Jacobian $J_F$ and preconditioner $P_F$. Suppose $G(x) = \eta F(x) = 0$ is the deflated problem; its Jacobian at a point $x$ is given by

\[
J_G = \eta J_F + f d^T, \tag{3.1}
\]

where $\eta \in \mathbb{R}$ is the deflation factor, $f \in \mathbb{R}^N$ is the undeflated residual, and $d \in \mathbb{R}^N$ is the derivative of the deflation factor. This motivates the choice of deflated preconditioner

\[
P_G = \eta P_F + f d^T, \tag{3.2}
\]

where the action of $P_G^{-1}$ can be expressed in terms of two actions of $P_F^{-1}$ via the Sherman–Morrison formula, giving

\[
P_G^{-1} J_G = P_F^{-1} J_F - \frac{\eta^{-1} P_F^{-1} f d^T \left( P_F^{-1} J_F - I \right)}{1 + \eta^{-1} d^T P_F^{-1} f}. \tag{3.3}
\]

In [9], we prove that

\[
\|P_G^{-1} J_G - I\| \le \left( s(f, d) + 1 \right) \cdot \|P_F^{-1} J_F - I\|, \tag{3.4}
\]

where $s(\cdot, \cdot)$ is well-behaved far from previous solutions. This is satisfactory in the case where $P_F^{-1}$ does indeed attempt to approximate $J_F^{-1}$, and hence $\|P_F^{-1} J_F - I\|$ is expected to be small. In [9], this preconditioner was demonstrated to control the growth of the number of Krylov iterations with algebraic multigrid on a scalar-valued discretised Allen–Cahn problem with up to $\sim 10^{10}$ degrees of freedom.
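To illustrate (3.2) and (3.3) concretely, the following sketch (Python with NumPy; `apply_PG_inv` and the dense test matrices are our own illustrative constructions) applies $P_G^{-1}$ to a vector using exactly two solves with $P_F$, and checks the result against a direct solve with the explicitly assembled $P_G$.

```python
import numpy as np

def apply_PG_inv(PF_solve, f, d, eta, b):
    """Apply P_G^{-1} = (eta * P_F + f d^T)^{-1} to b via Sherman-Morrison,
    at the cost of two applications of P_F^{-1}."""
    y = PF_solve(b) / eta                    # (eta P_F)^{-1} b
    z = PF_solve(f) / eta                    # (eta P_F)^{-1} f
    return y - z * (d @ y) / (1.0 + d @ z)   # rank-one correction

rng = np.random.default_rng(0)
n = 6
PF = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned stand-in for P_F
f, d, b = rng.standard_normal((3, n))
eta = 2.0

x = apply_PG_inv(lambda v: np.linalg.solve(PF, v), f, d, eta, b)
PG = eta * PF + np.outer(f, d)
print(np.linalg.norm(PG @ x - b))                  # ~ machine precision
```

Since $f$, $d$ and $\eta$ are fixed within a Newton step, the vector $z$ can be computed once and reused, so the marginal cost of each subsequent preconditioner application is a single action of $P_F^{-1}$.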
However, as pointed out in [29], controlling $\|P_F^{-1} J_F - I\|$ is not necessary for the development of effective preconditioners. Given a problem of the form

\[
J_F = \begin{pmatrix} A & B^T \\ C & 0 \end{pmatrix} \tag{3.5}
\]

with $A$ nonsingular, the preconditioner

\[
P_F = \begin{pmatrix} A & 0 \\ 0 & C A^{-1} B^T \end{pmatrix} \tag{3.6}
\]

will cause an optimal Krylov method to converge in three iterations, as $P_F^{-1} J_F$ has a minimal polynomial of degree three. However, the preconditioned Jacobian is

\[
P_F^{-1} J_F = \begin{pmatrix} I & A^{-1} B^T \\ (C A^{-1} B^T)^{-1} C & 0 \end{pmatrix}, \tag{3.7}
\]

and since $P_F^{-1} J_F \not\approx I$, the previous theorem on the performance of preconditioners for the deflated problem does not apply, and we have no guarantees of good performance after deflation of systems with such preconditioners.

We now remedy this deficiency. We prove a general theorem regarding the number of distinct eigenvalues of matrices perturbed by an additive update: we will show that the number of distinct eigenvalues of the perturbed matrix can be bounded in terms of the rank of the perturbation and the diagonalisability of the original matrix. In the context of deflation, the preconditioned operator is typically diagonalisable, and the perturbation is always rank-one: in this case, the theorem states that the number of distinct eigenvalues can at most double. From this, we will show that systems with the deflated Jacobian can be solved with optimal Krylov methods in at most twice the number of iterations required for the undeflated Jacobian.

3.1. The number of distinct eigenvalues after a rank-one update. From (3.3), it is clear that the preconditioned operator after any number of deflations is a rank-one update of the undeflated preconditioned operator. This motivates the analysis of the number of distinct eigenvalues of matrices after perturbation by a rank-one update. The spectrum of a matrix after a rank-one perturbation is of interest in a wide variety of applications and has been studied extensively in various particular cases, with most work focussing on the case of symmetric perturbations [41, 12, 4, 21]. The most general results concern the Jordan form of the matrix after "generic" rank-one perturbations, i.e. where the set of rank-one perturbations for which the analysis does not hold has Lebesgue measure zero [20, 37, 28, 27]. In this work we will first prove a result regarding symmetric and nonsymmetric diagonalisable matrices perturbed by arbitrary rank-one updates, and then extend it to the general case of arbitrary matrices perturbed by updates of arbitrary rank.

Let $\Lambda(M)$ be the set of distinct eigenvalues of a matrix $M$, let $m_a(M, \lambda)$ be the algebraic multiplicity of $\lambda$ as an eigenvalue of $M$, and let $m_g(M, \lambda)$ be the geometric multiplicity of $\lambda$ as an eigenvalue of $M$. We now state the central theorem of this paper.

Theorem 3.1. Let $A \in \mathbb{R}^{n \times n}$ be diagonalisable, and $B \in \mathbb{R}^{n \times n}$ with $\operatorname{rank}(B) = 1$. If $C = A + B$, then $|\Lambda(C)| \le 2\,|\Lambda(A)|$.

Proof. Clearly $|\Lambda(C)| = |\Lambda(C) \cap \Lambda(A)| + |\Lambda(C) \setminus \Lambda(A)|$, and the first term is bounded by $|\Lambda(A)|$. We seek an upper bound for the quantity

\[
\sum_{\lambda \in \Lambda(C),\ \lambda \notin \Lambda(A)} m_a(C, \lambda), \tag{3.8}
\]

as this bounds the number of new eigenvalues that the rank-one perturbation can introduce. (Every eigenvalue $\lambda$ of $C$ must have $m_a(C, \lambda) \ge 1$.) Since

\[
\sum_{\lambda \in \Lambda(C),\ \lambda \notin \Lambda(A)} m_a(C, \lambda) + \sum_{\lambda \in \Lambda(A)} m_a(C, \lambda) = n \tag{3.9}
\]

for fixed matrix size $n$ (with the convention that $m_a(C, \lambda) = 0 \iff \lambda \notin \Lambda(C)$), the bound on the number of new eigenvalues introduced is maximised when

\[
\sum_{\lambda \in \Lambda(A)} m_a(C, \lambda) \tag{3.10}
\]

is minimised.

Let $\lambda \in \Lambda(A)$ be a distinct eigenvalue of $A$. We first investigate $m_g(C, \lambda)$, the geometric multiplicity of $\lambda$ as an eigenvalue of the perturbed matrix $C$. Using the fact that $\operatorname{rank}(X + Y) \le \operatorname{rank}(X) + \operatorname{rank}(Y)$, we derive a lower bound for $m_g(C, \lambda)$:

\begin{align}
& \operatorname{rank}(A + B - \lambda I) \le \operatorname{rank}(A - \lambda I) + \operatorname{rank}(B) \tag{3.11a}\\
\implies\ & n - \dim \ker(A + B - \lambda I) \le n - \dim \ker(A - \lambda I) + 1 \tag{3.11b}\\
\implies\ & m_g(C, \lambda) \ge m_g(A, \lambda) - 1. \tag{3.11c}
\end{align}

Hence the geometric multiplicity of an eigenvalue can decrease by at most one under perturbation by a rank-one operator. Recall that $m_a(M, \lambda) \ge m_g(M, \lambda)$ for all $M$ and $\lambda$. It therefore follows that

\begin{align}
\sum_{\lambda \in \Lambda(A)} m_a(C, \lambda) &\ge \sum_{\lambda \in \Lambda(A)} m_g(C, \lambda) \tag{3.12a}\\
&\ge \sum_{\lambda \in \Lambda(A)} \left( m_g(A, \lambda) - 1 \right) \quad \text{(by (3.11c))} \tag{3.12b}\\
&= \sum_{\lambda \in \Lambda(A)} \left( m_a(A, \lambda) - 1 \right) \quad \text{(by diagonalisability of $A$)} \tag{3.12c}\\
&= n - |\Lambda(A)|. \tag{3.12d}
\end{align}

The maximal number of new eigenvalues is introduced when (3.12) is an equality, and then

\[
\sum_{\lambda \in \Lambda(C),\ \lambda \notin \Lambda(A)} m_a(C, \lambda) = n - (n - |\Lambda(A)|) = |\Lambda(A)|. \tag{3.13}
\]

Hence

\[
|\Lambda(C)| = |\Lambda(C) \cap \Lambda(A)| + |\Lambda(C) \setminus \Lambda(A)| \le |\Lambda(A)| + |\Lambda(A)| = 2\,|\Lambda(A)|. \tag{3.14}
\]

With this theorem, we can now bound the number of Krylov iterations required after deflation.

Corollary 3.2. If $A = P_F^{-1} J_F$ is diagonalisable, a Newton step for the deflated problem with $C = P_G^{-1} J_G$ can be solved in at most twice as many Krylov iterations as that of the undeflated problem.

Proof. If $C$ is diagonalisable, then the degree of its minimal polynomial is $|\Lambda(C)|$, and the number of distinct eigenvalues bounds the number of Krylov iterations required to solve systems with the deflated Jacobian. Now consider the case where $C$ is nondiagonalisable; note that the set of rank-one perturbations that yield nondiagonalisable $C$ has Lebesgue measure zero [28, 37]. From (3.11c) we know that the number of Jordan blocks associated with an eigenvalue $\lambda \in \Lambda(A)$ can decrease by at most one in $C$. Since by diagonalisability all Jordan blocks of $A$ are of size $1 \times 1$, the largest Jordan block of $C$ can be at most of size $|\Lambda(A)| \times |\Lambda(A)|$. It is straightforward to see that for any arrangement of new Jordan blocks with sizes summing to $|\Lambda(A)|$, the degree of the minimal polynomial, and hence the number of Krylov iterations required, is bounded by twice that of the undeflated problem.
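Theorem 3.1 is easy to probe numerically. The following sketch (Python with NumPy; the test matrices are our own construction) builds a diagonalisable $A$ with $|\Lambda(A)| = 3$, perturbs it by a random rank-one matrix, and counts the distinct eigenvalues before and after.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# Diagonalisable A with distinct eigenvalues {1, 2, 3} of multiplicities 3, 2, 1.
Q = rng.standard_normal((n, n))
A = Q @ np.diag([1., 1., 1., 2., 2., 3.]) @ np.linalg.inv(Q)
B = np.outer(rng.standard_normal(n), rng.standard_normal(n))   # rank one

def n_distinct(M, tol=1e-6):
    """Count numerically distinct eigenvalues, merging values within tol."""
    reps = []
    for z in np.linalg.eigvals(M):
        if all(abs(z - r) > tol for r in reps):
            reps.append(z)
    return len(reps)

print(n_distinct(A), n_distinct(A + B))   # 3 and (generically) 5 <= 2*3
```

Generically one observes five distinct eigenvalues of $C = A + B$: the eigenvalues 1 and 2 survive, since their geometric multiplicities can drop by at most one, and at most $|\Lambda(A)| = 3$ new eigenvalues appear, consistent with the bound $|\Lambda(C)| \le 2\,|\Lambda(A)| = 6$.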
3.2. Various extensions. While Theorem 3.1 is sufficient to analyse the case of current interest arising in deflation, it is possible to extend it in various ways to give a more complete picture of the number of distinct eigenvalues of a matrix after perturbation. The first extension considers perturbations of higher rank.

Theorem 3.3. Let $A \in \mathbb{R}^{n \times n}$ be diagonalisable, and $B \in \mathbb{R}^{n \times n}$ with $\operatorname{rank}(B) = r$. If $C = A + B$, then $|\Lambda(C)| \le (r + 1)\,|\Lambda(A)|$.

Proof. Examining the proof of Theorem 3.1, the fact that $B$ is rank-one is used only in (3.11b). Generalising this yields

\[
m_g(C, \lambda) \ge m_g(A, \lambda) - r, \tag{3.15}
\]

and hence, generalising (3.12d),

\[
\sum_{\lambda \in \Lambda(A)} m_a(C, \lambda) \ge n - r\,|\Lambda(A)|, \tag{3.16}
\]

and the result follows.

The second extension considers perturbations to nondiagonalisable matrices.

Definition 3.4 (Defectivity of an eigenvalue). The defectivity $d(M, \lambda) \ge 0$ of an eigenvalue is the difference between its algebraic and geometric multiplicities,

\[
d(M, \lambda) \equiv m_a(M, \lambda) - m_g(M, \lambda). \tag{3.17}
\]

Given the defectivities of the eigenvalues, we can define the defectivity of a matrix.

Definition 3.5 (Defectivity of a matrix). The defectivity $d(M)$ of a matrix is the sum of the defectivities of its eigenvalues:

\[
d(M) \equiv \sum_{\lambda \in \Lambda(M)} \left( m_a(M, \lambda) - m_g(M, \lambda) \right). \tag{3.18}
\]

This is a quantitative measure of the degree of nondiagonalisability: a matrix is diagonalisable if and only if it has defectivity zero.

Remark 1. The defectivity of a matrix is clear from its Jordan form: it is the number of off-diagonal ones.

We can now extend Theorem 3.3 to nondiagonalisable matrices.

Theorem 3.6. Let $A \in \mathbb{R}^{n \times n}$ have defectivity $0 \le d(A) \le n$, and $B \in \mathbb{R}^{n \times n}$ with $\operatorname{rank}(B) = r$. If $C = A + B$, then $|\Lambda(C)| \le (r + 1)\,|\Lambda(A)| + d(A)$.

Proof. Diagonalisability of $A$ is used precisely once in the proof of Theorem 3.1, in going from (3.12b) to (3.12c). Generalising this, we find

\begin{align}
\sum_{\lambda \in \Lambda(A)} m_a(C, \lambda) &\ge \sum_{\lambda \in \Lambda(A)} m_g(C, \lambda) \tag{3.19}\\
&\ge \sum_{\lambda \in \Lambda(A)} \left( m_g(A, \lambda) - r \right) \tag{3.20}\\
&= \sum_{\lambda \in \Lambda(A)} \left( m_a(A, \lambda) - r - d(A, \lambda) \right) \tag{3.21}\\
&= n - r\,|\Lambda(A)| - d(A), \tag{3.22}
\end{align}

and the result follows.

4. Inexact inner solves. Theorem 3.1 pertains to deflation when exact arithmetic and exact inner solves are used. When inexact inner solves are used, the available results in specific cases indicate that the resulting spectrum of the preconditioned operator is the union of disjoint regions clustered around the eigenvalues associated with exact inner solves [31, 15]. On the (optimistic) assumption that this is typical, we give a preliminary analysis of the resulting spectrum after deflation.

Theorem 4.1. Let $A$ be diagonalisable. Let $V$ be the matrix of eigenvectors of $A$, and let $\kappa(V) = \|V\|_2 \|V^{-1}\|_2$. Let $B(\lambda, \delta)$ be an eigenvalue cluster of $A$ centred at $\lambda$ with radius $\delta$. Then after perturbation by $B = uv^T$, the associated eigenvalues of $C = A + B$ are contained within $B(\lambda, \delta + \kappa(V)\sqrt{u^T u \, v^T v})$.

Proof. Consider an eigenvalue $\lambda_C$ of $C$. By the Bauer–Fike theorem [2], there exists a $\lambda_A \in \Lambda(A)$ such that $|\lambda_C - \lambda_A| \le \kappa(V)\,\|B\|_2$. Without loss of generality assume $\lambda_A \in B(\lambda, \delta)$. Note that

\[
\|B\|_2 = \sqrt{u^T u \, v^T v}. \tag{4.1}
\]

Then

\begin{align}
|\lambda - \lambda_C| &\le |\lambda - \lambda_A| + |\lambda_C - \lambda_A| \tag{4.2}\\
&\le \delta + \kappa(V)\sqrt{u^T u \, v^T v}, \tag{4.3}
\end{align}

and $\lambda_C \in B(\lambda, \delta + \kappa(V)\sqrt{u^T u \, v^T v})$.

Remark 2. It is possible to extend Theorem 4.1 to the case of nondiagonalisable $A$ by employing an extension of the Bauer–Fike theorem based on the Schur decomposition [13, Theorem 7.2.3].

If the undeflated preconditioned operator $A$ is symmetric, then $\kappa(V) = 1$. Recall that in the context of deflation $B$ is the outer product of the preconditioned undeflated residual and the derivative of the deflation factor. Theorem 4.1 states that the spreading of eigenvalue clusters due to deflation will be small when this residual is small (near solutions of the original problem), when the deflation factor is changing slowly (far away from deflated solutions), and when the preconditioned operator is close to normality. In the symmetric case, such as in optimisation, this result could be combined with an analysis of the Krylov method employed to bound its convergence as a function of the residual and the deflation factor. (In the nonsymmetric case the eigenvalue distribution does not govern the convergence of optimal Krylov methods [16].)
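The Bauer–Fike bound underlying Theorem 4.1 is also straightforward to check numerically. A minimal sketch (Python with NumPy; the random test matrices are our own choice) verifies that every eigenvalue of $A + uv^T$ lies within $\kappa(V)\,\|u\|_2 \|v\|_2$ of some eigenvalue of $A$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
A = rng.standard_normal((n, n))
lamA, V = np.linalg.eig(A)
kappa = np.linalg.cond(V)            # kappa(V) = ||V||_2 ||V^{-1}||_2

u, v = rng.standard_normal((2, n))
lamC = np.linalg.eigvals(A + np.outer(u, v))

# ||u v^T||_2 = ||u||_2 ||v||_2 = sqrt(u^T u v^T v)
bound = kappa * np.linalg.norm(u) * np.linalg.norm(v)
dist = max(min(abs(lc - la) for la in lamA) for lc in lamC)
print(dist, "<=", bound)
```

For nonnormal $A$ the factor $\kappa(V)$ can make the bound very pessimistic, which is one reason the clustering statement of Theorem 4.1 is most informative when the preconditioned operator is close to normal.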
5. A linear-quartic PDE-constrained optimisation problem. Corollary 3.2 compares the number of Krylov iterations required for exact convergence of a Newton step at the same iterate. We now consider an experiment comparing the performance of a block triangular preconditioner as the number of solutions deflated is varied. Consider the following quartic variant of the problem of identifying an optimal heat source under homogeneous Dirichlet boundary conditions:

\[
\min_{y \in H_0^1,\; u \in L^2} \; \frac{1}{2} \int_\Omega (y - y_A)^2 (y - y_B)^2 + \frac{\beta}{2} \int_\Omega u^2 \tag{5.1}
\]

subject to

\[
-\nabla^2 y = u \quad \text{in } \Omega, \tag{5.2}
\]
\[
y = 0 \quad \text{on } \partial\Omega. \tag{5.3}
\]

Whereas the standard problem with a quadratic misfit term has a unique solution [39], here the quartic misfit term induces nonlinearity and nonconvexity in the optimisation problem, and causes it to permit at least two local minima: one $(u, y)$ pair with $y$ close to $y_A$, and another pair with $y$ close to $y_B$. The Karush–Kuhn–Tucker necessary conditions are: find $(y, u, p) \in H_0^1 \times L^2 \times H_0^1$ such that

\begin{align}
\int_\Omega \left( (y - y_A)(y - y_B)^2 + (y - y_A)^2 (y - y_B) \right) \bar{y} + \int_\Omega \nabla p \cdot \nabla \bar{y} &= 0, \notag\\
\beta \int_\Omega u \bar{u} - \int_\Omega p \bar{u} &= 0, \tag{5.4}\\
\int_\Omega \nabla \bar{p} \cdot \nabla y - \int_\Omega \bar{p} u &= 0, \notag
\end{align}

for all $(\bar{y}, \bar{u}, \bar{p}) \in H_0^1 \times L^2 \times H_0^1$. Upon taking the Newton linearisation and discretising with finite elements, we find

\[
\begin{pmatrix} M_p & 0 & K \\ 0 & \beta M & -M \\ K & -M & 0 \end{pmatrix}
\begin{pmatrix} \delta y \\ \delta u \\ \delta p \end{pmatrix}
= -F(y, u, p), \tag{5.5}
\]

where $F$ is the discretisation of the residual (5.4), $K$ is the standard stiffness matrix, $M$ is the standard mass matrix, and

\[
(M_p)_{ij} = \int_\Omega \left( (y - y_B)^2 + 4(y - y_A)(y - y_B) + (y - y_A)^2 \right) \phi_i \phi_j, \tag{5.6}
\]

i.e. a mass matrix with a spatially varying coefficient. Due to this spatially varying coefficient, $M_p$ is in general indefinite.

In order to employ a block-triangular preconditioner, a variable must be chosen for elimination to construct the Schur complement. The typical choice in PDE-constrained optimisation is to eliminate the adjoint variable $p$, as its associated diagonal block is always zero and hence the diagonal preconditioner of [29] may be employed. This also has the advantage that the top-left block, whose inverse action must be computed in the Schur complement action, is itself block-diagonal. However, on taking the Schur complement with respect to $p$, we find

\[
S_p = -K M_p^{-1} K + \frac{1}{\beta} M. \tag{5.7}
\]

Each action of $S_p$ requires the action of $M_p^{-1}$, and as noted above this matrix is indefinite and expensive to solve. To avoid this difficulty, we instead choose to eliminate the state $y$. This yields the Schur complement

\[
S_y = \beta K M^{-1} K + M_p, \tag{5.8}
\]

which avoids the need to solve linear systems involving $M_p$ for every Schur complement action. This Schur complement suggests the approximations [33]

\[
\hat{S}_y = \beta K M^{-1} K, \tag{5.9}
\]
\[
\tilde{S}_y = M_p, \tag{5.10}
\]

with inverses

\[
\hat{S}_y^{-1} = \beta^{-1} K^{-1} M K^{-1}, \tag{5.11}
\]
\[
\tilde{S}_y^{-1} = M_p^{-1}. \tag{5.12}
\]
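For instance, the action of $\hat{S}_y^{-1}$ in (5.11) requires only two solves with $K$ and one product with $M$. A minimal sketch (Python with SciPy; the matrices and the direct inner solver are placeholders of our own choosing, whereas in the experiment below the inner solves with $K$ are performed inexactly by CG preconditioned with algebraic multigrid):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def make_Sy_hat_inv(K, M, beta):
    """Return the action of S_y_hat^{-1} = beta^{-1} K^{-1} M K^{-1}, cf. (5.11)."""
    Ksolve = spla.factorized(K.tocsc())    # direct solver as a stand-in for CG/AMG
    return lambda b: Ksolve(M @ Ksolve(b)) / beta

# Toy placeholder matrices: 1D stiffness stencil and a lumped mass stand-in.
n = 100
K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
M = sp.identity(n) / n
Sy_hat_inv = make_Sy_hat_inv(K, M, beta=1e-2)
x = Sy_hat_inv(np.ones(n))
```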
As data, we take

\[
y_A = x_1 (1 - x_1)\, x_2 (1 - x_2), \tag{5.13}
\]
\[
y_B = \sin(2\pi x_1)\, \sin(2\pi x_2), \tag{5.14}
\]

on the unit square $\Omega = [0, 1]^2$. The problem was discretised with piecewise linear finite elements for all variables and first solved on a mesh of $50 \times 50$ elements (7803 degrees of freedom) with NLEQ-ERR and LU factorisation, yielding 9 coarse grid solutions from the initial guess $(y, u, p) = (0, 0, 0)$, including the expected local minima. These solutions were then used as initial guesses in a grid sequencing technique on a mesh of $1825 \times 1825$ elements (just over 10 million degrees of freedom) using the preconditioned Krylov solver described below, yielding 9 fine grid solutions. Then, for each initial guess, the number of other fine grid solutions deflated was varied (from 0 to 8) and the number of outer Krylov iterations required to converge was recorded.

On the fine grid, flexible GMRES [35] was employed as the outer Krylov solver for (5.5), as this allows for nonlinear preconditioning (inner Krylov solvers) to be employed. Each Newton step was terminated with a relative tolerance of $10^{-5}$ and an absolute tolerance of $10^{-10}$ in the $\ell^2$ norm. As the bottom right block of the Jacobian is nonzero with this choice of Schur complement, the full Schur complement preconditioner

\[
P^{-1} = \begin{pmatrix} I & -A^{-1} B \\ 0 & I \end{pmatrix}
\begin{pmatrix} A^{-1} & 0 \\ 0 & S^{-1} \end{pmatrix}
\begin{pmatrix} I & 0 \\ -B^T A^{-1} & I \end{pmatrix} \tag{5.15}
\]

was employed; with exact inner solvers, this would converge in one FGMRES iteration [3]. Since no spectrally equivalent operator to the Schur complement was available, FGMRES was also chosen as the inner Krylov solver for $S_y$, with a weak relative convergence criterion of $10^{-1}$ in the $\ell^2$ norm. The preconditioner for $S_y$ alternated between 200 iterations of (5.11) and 100 iterations of (5.12); although in most cases (5.12) was not used, this alternation was essential to overcome stalling of some Schur complement solves. The innermost solves of $M$ and $K$ were performed with conjugate gradients [18] preconditioned by algebraic multigrid [17, 1, 8], while the innermost solve of $M_p$ was performed with GMRES [36] preconditioned by Jacobi's method. These strong innermost solvers were found to be essential for the convergence of the Schur complement solve, and hence for the convergence of the Newton step.

The Krylov iterations required as the number of deflations was varied are shown in Table 5.1.

                           Solutions deflated
Initial guess    0    1    2    3    4    5    6    7    8
     0           6    6    5    6    6    6    6    6    6
     1          14   14   13   14   11   12   14   10   12
     2           9    9    9    9    9    9    9   11    8
     3          10   12    8   11   10   13   12   11   11
     4          11   13   14   13   15   16   14   13   14
     5           6    8    8    9    9    8    9    8    9
     6           8    7    8    7    8    9    8    8   10
     7           8    8    9    8    9    9    8    9    8
     8          11   11    9   11    9    9    9    9   11

Table 5.1: The number of outer Krylov iterations required to converge (summed over all Newton steps), as the number of deflations is varied, with the full Schur complement preconditioner (5.15). The number of iterations does not systematically increase as more solutions are deflated, and sometimes decreases.

The behaviour observed is better than the theory would suggest: the number of iterations required barely increases, and sometimes decreases, even as more solutions are deflated. The deflated problems can be solved at a modest overhead relative to the original problem, even with many deflations.

It is sometimes of practical interest to consider preconditioners that yield nondiagonalisable matrices, such as the upper triangular preconditioner

\[
P^{-1} = \begin{pmatrix} I & -A^{-1} B \\ 0 & I \end{pmatrix}
\begin{pmatrix} A^{-1} & 0 \\ 0 & S^{-1} \end{pmatrix}. \tag{5.16}
\]

With exact inner solvers, the resulting preconditioned operator is nondiagonalisable with minimal polynomial $(\lambda - 1)^2$, and an optimal Krylov method would converge in two iterations. The results for the analogous experiment are shown in Table 5.2.

                           Solutions deflated
Initial guess    0    1    2    3    4    5    6    7    8
     0           7    7    6    7    7    7    6    7    7
     1          12   17   17   20   16   14   18   16   21
     2           9   11   11   12   13   14   12   12   12
     3          10   15   12   14   12   13   11   13   14
     4          17   18   20   19   14   15   17   17   21
     5          10    9   13   11   15   14   14   12   14
     6          11    9   11   12   12   12   12   10   13
     7          11   12   13   11   13   15   13   11   14
     8           8   12   12   13   13   14   13   10   10

Table 5.2: The number of outer Krylov iterations required to converge (summed over all Newton steps), as the number of deflations is varied, with the upper triangular preconditioner (5.16). The increase in iterations is modest, and the number of iterations does not systematically increase as more solutions are deflated.

Again, the deflated problems can still be solved at a modest overhead relative to the original problem, even as the number of deflations is increased.
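As a sanity check on (5.16), the following sketch (Python with NumPy; the dense blocks are our own placeholders) builds a small block system with a nonzero bottom right block, forms the block upper triangular preconditioner with the exact Schur complement, and confirms that $T = P^{-1} J$ satisfies $(T - I)^2 = 0$, so that an optimal Krylov method converges in two iterations.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 5, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)   # nonsingular (1,1) block
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, m))                   # nonzero (2,2) block

J = np.block([[A, B], [B.T, C]])
S = C - B.T @ np.linalg.solve(A, B)               # exact Schur complement
P = np.block([[A, B], [np.zeros((m, n)), S]])     # block upper triangular P

T = np.linalg.solve(P, J)                         # preconditioned operator P^{-1} J
N = T - np.eye(n + m)
print(np.linalg.norm(N @ N))                      # ~ machine precision: (T - I)^2 = 0
```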
6. Conclusions. We have proven that, in exact arithmetic and with exact inner solves, block triangular preconditioners for the Newton step of a deflated problem can take no more than twice as many iterations as for the corresponding undeflated problem, so long as the preconditioned operator is diagonalisable. This holds no matter how many other solutions are deflated, as the update is always rank-one. We have also performed a preliminary analysis of the practical case where inexact inner solvers are employed. These results indicate that deflation scales to the computation of multiple solutions of arbitrary-dimensional optimisation problems, so long as the underlying undeflated problem can be solved efficiently.

References.

[1] M. F. Adams, H. H. Bayraktar, T. M. Keaveny, and P. Papadopoulos, Ultrascalable implicit finite element analyses in solid mechanics with over a half a billion degrees of freedom, in ACM/IEEE Proceedings of SC2004: High Performance Networking and Computing, Pittsburgh, Pennsylvania, 2004.
[2] F. L. Bauer and C. T. Fike, Norms and exclusion theorems, Numerische Mathematik, 2 (1960), pp. 137–141.
[3] M. Benzi, G. H. Golub, and J. Liesen, Numerical solution of saddle point problems, Acta Numerica, 14 (2005), pp. 1–137.
[4] J. R. Bunch, C. P. Nielsen, and D. C. Sorensen, Rank-one modification of the symmetric eigenproblem, Numerische Mathematik, 31 (1978), pp. 31–48.
[5] Y. Choi, C. Farhat, W. Murray, and M. A. Saunders, A practical factorization of a Schur complement for PDE-constrained distributed optimal control, Journal of Scientific Computing, (2014), pp. 1–22.
[6] P. Deuflhard, Newton Methods for Nonlinear Problems, vol. 35, Springer-Verlag, 2011.
[7] W. E, W. Ren, and E. Vanden-Eijnden, String method for the study of rare events, Physical Review B, 66 (2002), p. 052301.
[8] R. D. Falgout, An introduction to algebraic multigrid computing, Computing in Science & Engineering, 8 (2006), pp. 24–33.
[9] P. E. Farrell, Á. Birkisson, and S. W. Funke, Deflation techniques for finding distinct solutions of nonlinear partial differential equations, SIAM Journal on Scientific Computing, 37 (2015), pp. A2026–A2045.
[10] P. E. Gill, W. Murray, D. B. Ponceleón, and M. A. Saunders, Preconditioners for indefinite systems arising in optimization, SIAM Journal on Matrix Analysis and Applications, 13 (1992), pp. 292–311.
[11] A. A. Goldstein and J. F. Price, On descent from local minima, Mathematics of Computation, 25 (1971), pp. 569–574.
[12] G. H. Golub, Some modified matrix eigenvalue problems, SIAM Review, 15 (1973), pp. 318–334.
[13] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, 4th ed., 2012.
[14] N. I. M. Gould, C. Ortner, and D. Packwood, An efficient dimer method with preconditioning and linesearch, 2014. arXiv:1407.2817 [math.OC].
[15] N. I. M. Gould and V. Simoncini, Spectral analysis of saddle point matrices with indefinite leading blocks, SIAM Journal on Matrix Analysis and Applications, 31 (2010), pp. 1152–1171.
[16] A. Greenbaum, V. Pták, and Z. Strakoš, Any nonincreasing convergence curve is possible for GMRES, SIAM Journal on Matrix Analysis and Applications, 17 (1996), pp. 465–469.
[17] V. E. Henson and U. M. Yang, BoomerAMG: A parallel algebraic multigrid solver and preconditioner, Applied Numerical Mathematics, 41 (2002), pp. 155–177.
[18] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, Journal of Research of the National Bureau of Standards, 49 (1952), pp. 409–436.
[19] M. Hinze, R. Pinnau, M. Ulbrich, and S. Ulbrich, Optimization with PDE Constraints, vol. 23 of Mathematical Modelling: Theory and Applications, Springer, 2009.
[20] L. Hörmander and A. Melin, A remark on perturbations of compact operators, Mathematica Scandinavica, 75 (1994), pp. 255–262.
[21] I. C. F. Ipsen and B. Nadler, Refined perturbation bounds for eigenvalues of Hermitian and non-Hermitian matrices, SIAM Journal on Matrix Analysis and Applications, 31 (2009), pp. 40–53.
[22] A. Jameson, Aerodynamic design via control theory, Journal of Scientific Computing, 3 (1988), pp. 233–260.
[23] F.-X. Le Dimet and O. Talagrand, Variational algorithms for analysis and assimilation of meteorological observations: theoretical aspects, Tellus A, 38A (1986), pp. 97–110.
[24] A. V. Levy and S. Gomez, The tunneling method applied to global optimization, in Numerical Optimization 1984, P. T. Boggs, ed., SIAM, June 1985.
[25] Y. Li and J. Zhou, A minimax method for finding multiple critical points and its applications to semilinear PDEs, SIAM Journal on Scientific Computing, 23 (2001), pp. 840–865.
[26] R. Martí, Multi-start methods, in Handbook of Metaheuristics, F. Glover and G. A. Kochenberger, eds., vol. 57 of International Series in Operations Research & Management Science, Springer, 2003, pp. 355–368.
[27] C. Mehl, V. Mehrmann, A. C. M. Ran, and L. Rodman, Jordan forms of real and complex matrices under rank one perturbations, Operators and Matrices, 7 (2013), pp. 381–398.
[28] J. Moro and F. M. Dopico, Low rank perturbation of Jordan structure, SIAM Journal on Matrix Analysis and Applications, 25 (2003), pp. 495–506.
[29] M. F. Murphy, G. H. Golub, and A. J. Wathen, A note on preconditioning for indefinite linear systems, SIAM Journal on Scientific Computing, 21 (2000), pp. 1969–1972.
[30] J. W. Pearson and A. J. Wathen, A new approximation of the Schur complement in preconditioners for PDE-constrained optimization, Numerical Linear Algebra with Applications, 19 (2012), pp. 816–829.
[31] J. Pestana and A. J. Wathen, Natural preconditioning and iterative methods for saddle point systems, SIAM Review, 57 (2015), pp. 71–91.
[32] J. D. Pintér, Global optimization: software, test problems, and applications, in Handbook of Global Optimization, P. M. Pardalos and H. E. Romeijn, eds., Springer, 2002, ch. 15, pp. 515–569.
[33] T. Rees, H. S. Dollar, and A. J. Wathen, Optimal solvers for PDE-constrained optimization, SIAM Journal on Scientific Computing, 32 (2010), pp. 271–298.
[34] A. Richards, University of Oxford Advanced Research Computing, 2015. doi:10.5281/zenodo.22558.
[35] Y. Saad, A flexible inner-outer preconditioned GMRES algorithm, SIAM Journal on Scientific Computing, 14 (1993), pp. 461–469.
[36] Y. Saad and M. Schultz, GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM Journal on Scientific and Statistical Computing, 7 (1986), pp. 856–869.
[37] S. V. Savchenko, Typical changes in spectral properties under perturbations by a rank-one operator, Mathematical Notes, 74 (2003), pp. 557–568.
[38] A. M. Stuart, Inverse problems: a Bayesian perspective, Acta Numerica, 19 (2010), pp. 451–559.
[39] F. Tröltzsch, Optimal Control of Partial Differential Equations: Theory, Methods and Applications, vol. 112 of Graduate Studies in Mathematics, AMS, 2010.
[40] S. Varrette, P. Bouvry, H. Cartiaux, and F. Georgatos, Management of an academic HPC cluster: the UL experience, in Proc. of the 2014 Intl. Conf. on High Performance Computing & Simulation (HPCS 2014), Bologna, Italy, July 2014, IEEE, pp. 959–967.
[41] J. H. Wilkinson, The Algebraic Eigenvalue Problem, vol. 87 of Monographs on Numerical Analysis, Oxford University Press, 1965.