Stability and Sensitivity Analysis in Optimal Control of Partial Differential Equations

Dr. rer. nat. Roland Griesse

Cumulative Habilitation Thesis
Faculty of Natural Sciences
Karl-Franzens University Graz
October 2007

Contents

Preface

Chapter 1. Stability and Sensitivity Analysis
1. Lipschitz Stability of Solutions for Elliptic Optimal Control Problems with Pointwise State Constraints
2. Lipschitz Stability of Solutions for Elliptic Optimal Control Problems with Pointwise Mixed Control-State Constraints
3. Sensitivity Analysis for Optimal Control Problems Involving the Navier-Stokes Equations
4. Sensitivity Analysis for Optimal Boundary Control Problems of a 3D Reaction-Diffusion System

Chapter 2. Numerical Methods and Applications
5. Local Quadratic Convergence of SQP for Elliptic Optimal Control Problems with Mixed Control-State Constraints
6. Update Strategies for Perturbed Nonsmooth Equations
7. Quantitative Stability Analysis of Optimal Solutions in PDE-Constrained Optimization
8. Numerical Sensitivity Analysis for the Quantity of Interest in PDE-Constrained Optimization
9. On the Interplay Between Interior Point Approximation and Parametric Sensitivities in Optimal Control

Bibliography

Preface

The topic of this thesis is stability and sensitivity analysis in optimal control of partial differential equations. Stability refers to the continuous behavior of optimal solutions under perturbations of the problem data, while sensitivity indicates a differentiable dependence.

This thesis is divided into two chapters. Chapter 1 provides a short overview of the topic and its theoretical foundations. The individual sections give an introduction to the author's contributions concerning new stability and sensitivity results for several problem classes, in particular optimal control problems with state constraints (Section 1) and mixed control-state constraints (Section 2), as well as problems involving the Navier-Stokes equations (Section 3) and boundary control problems for a system of coupled reaction-diffusion equations (Section 4). Chapter 1 is based on the following publications.

1. R. Griesse: Lipschitz Stability of Solutions to Some State-Constrained Elliptic Optimal Control Problems, Journal of Analysis and its Applications, 25(4), p.435–444, 2006
2. W. Alt, R. Griesse, N. Metla and A. Rösch: Lipschitz Stability for Elliptic Optimal Control Problems with Mixed Control-State Constraints, submitted to Applied Mathematics and Optimization, 2006
3. R. Griesse, M. Hintermüller and M. Hinze: Differential Stability of Control Constrained Optimal Control Problems for the Navier-Stokes Equations, Numerical Functional Analysis and Optimization 26(7–8), p.829–850, 2005
4. R. Griesse and S. Volkwein: Parametric Sensitivity Analysis for Optimal Boundary Control of a 3D Reaction-Diffusion System, in: Large-Scale Nonlinear Optimization, G. Di Pillo and M. Roma (editors), volume 83 of Nonconvex Optimization and its Applications, p.127–149, Springer, Berlin, 2006

Chapter 2 addresses a number of applications based on the concepts of stability and sensitivity of infinite-dimensional optimization problems, and of optimal control problems in particular.
The applications include the local convergence of the SQP (sequential quadratic programming) method for optimal control problems with mixed control-state constraints (Section 5), accurate update strategies for solutions of perturbed problems (Section 6), the quantitative stability analysis of optimal solutions (Section 7), and the efficient evaluation of first- and second-order sensitivity derivatives of a quantity of interest (Section 8). Finally, the relationship between the sensitivity derivatives of optimization problems in function space and the sensitivity derivatives of their relaxations in the context of interior point methods is investigated (Section 9). Chapter 2 is based on the following publications.

5. R. Griesse, N. Metla and A. Rösch: Local Quadratic Convergence of SQP for Elliptic Optimal Control Problems with Mixed Control-State Constraints, submitted to: ESAIM: Control, Optimisation, and Calculus of Variations, 2007
6. R. Griesse, T. Grund and D. Wachsmuth: Update Strategies for Perturbed Nonsmooth Equations, to appear in: Optimization Methods and Software, 2007
7. K. Brandes and R. Griesse: Quantitative Stability Analysis of Optimal Solutions in PDE-Constrained Optimization, Journal of Computational and Applied Mathematics, 206(2), p.809–826, 2007
8. R. Griesse and B. Vexler: Numerical Sensitivity Analysis for the Quantity of Interest in PDE-Constrained Optimization, SIAM Journal on Scientific Computing, 29(1), p.22–48, 2007
9. R. Griesse and M. Weiser: On the Interplay Between Interior Point Approximation and Parametric Sensitivities in Optimal Control, Journal of Mathematical Analysis and Applications, 337(2), p.771–793, 2008

An effort was made to use a consistent notation throughout the introductory paragraphs which link the individual papers. As a consequence, the notation used in the introduction to each section may differ slightly from the notation used in the actual publication. Moreover, all manuscripts have been typeset again from their LaTeX sources in order to achieve a uniform layout. In some cases, this may have led to an updated bibliography or a different numbering scheme.

All of the above publications were written after the completion of the author's Ph.D. degree in February of 2003. In addition, the following publications were completed in the same period of time.

10. R. Griesse and D. Lorenz: A Semismooth Newton Method for Tikhonov Functionals with Sparsity Constraints, submitted, 2007
11. R. Griesse and K. Kunisch: Optimal Control for a Stationary MHD System in Velocity-Current Formulation, SIAM Journal on Control and Optimization, 45(5), p.1822–1845, 2006
12. A. Borzì and R. Griesse: Distributed Optimal Control of Lambda-Omega Systems, Journal of Numerical Mathematics 14(1), p.17–40, 2006
13. A. Borzì and R. Griesse: Experiences with a Space-Time Multigrid Method for the Optimal Control of a Chemical Turbulence Model, International Journal for Numerical Methods in Fluids 47(8–9), p.879–885, 2005
14. R. Griesse and S. Volkwein: A Primal-Dual Active Set Strategy for Optimal Boundary Control of a Reaction-Diffusion System, SIAM Journal on Control and Optimization 44(2), p.467–494, 2005
15. R. Griesse and A.J. Meir: Modeling of an MHD Free Surface Problem Arising in CZ Crystal Growth, submitted, 2007
16. J.C. de los Reyes and R. Griesse: State-Constrained Optimal Control of the Stationary Navier-Stokes Equations, submitted, 2006
17. R. Griesse, A.J. Meir and K. Kunisch: Control Issues in Magnetohydrodynamics, in: Optimal Control of Free Boundaries, Mathematisches Forschungsinstitut Oberwolfach, Report No. 8/2007, p.20–23, 2007
18. R. Griesse and A.J. Meir: Modeling of an MHD Free Surface Problem Arising in CZ Crystal Growth, in: Proceedings of the 5th IMACS Symposium on Mathematical Modelling (5th MATHMOD), I. Troch, F. Breitenecker (editors), ARGESIM Report 30, Vienna, 2006
19. R. Griesse and K. Kunisch: Optimal Control in Magnetohydrodynamics, in: Optimal Control of Coupled Systems of PDE, Mathematisches Forschungsinstitut Oberwolfach, Report No. 18/2005, p.1011–1014, 2005
20. R. Griesse and A. Walther: Towards Matrix-Free AD-Based Preconditioning of KKT Systems in PDE-Constrained Optimization, Proceedings of the GAMM 2005 Annual Scientific Meeting, PAMM 5(1), p.47–50, 2005
21. R. Griesse and S. Volkwein: A Semi-Smooth Newton Method for Optimal Boundary Control of a Nonlinear Reaction-Diffusion System, Proceedings of the Sixteenth International Symposium on Mathematical Theory of Networks and Systems (MTNS), Leuven, Belgium, 2004

A complete and updated list of publications can be found online at
http://www.ricam.oeaw.ac.at/people/page/griesse/publications.html

Acknowledgment. The publications which form the basis of this thesis were written during my postdoctoral appointments at Karl-Franzens University of Graz (supported by the SFB 003 Optimization and Control), and at the Johann Radon Institute for Computational and Applied Mathematics (RICAM), Austrian Academy of Sciences, in Linz. I would like to express my gratitude to Prof. Karl Kunisch for giving me the opportunity to work in these two tremendous environments—both scientifically and otherwise—for his continuous support and many inspiring discussions. I would also like to thank Prof. Heinz Engl, director of RICAM, for the opportunity to be part of this fantastic institute. The support of several project proposals by the Austrian Science Fund (FWF) is gratefully acknowledged.

My sincere thanks go to former and current colleagues and co-workers in Graz and Linz, who contributed greatly in making the recent years very enjoyable and successful. I would like to mention in particular Stefan Volkwein, Georg Stadler, Juan Carlos de los Reyes, Alfio Borzì, and Michael Hintermüller in Graz, and Arnd Rösch, Boris Vexler, Marco Discacciati, Nataliya Metla, Svetlana Cherednichenko, Klaus Krumbiegel, Olaf Benedix, Martin Bernauer, Frank Schmidt, Sven Beuchler, Joachim Schöberl, Herbert Egger, Georg Regensburger, Martin Giese, Jörn Sass, and, of course, Annette Weihs, Florian Tischler, Doris Nikolaus, Magdalena Fuchs and Wolfgang Forsthuber in Linz. Many thanks also to all co-authors who have not yet been mentioned, for their effort and time. Last but not least, I would like to thank Julia for her constant love and support.

Linz, October 2007

CHAPTER 1
Stability and Sensitivity Analysis

Stability and sensitivity are important concepts in continuous optimization. Stability refers to the continuous dependence of an optimal solution on the problem data. In other words, stability ensures the well-posedness of the problem. On the other hand, sensitivity information allows further quantification of the solution's dependence on problem data, using appropriate notions of differentiability. For a general account of perturbation analysis for infinite-dimensional optimization problems, we refer to the book of Bonnans and Shapiro [2000].
In this chapter, we consider the notions of stability and sensitivity of optimal control problems involving partial differential equations (PDEs). To fix ideas, we use as an example the following prototypical distributed optimal control problem for the Poisson equation, subject to perturbations δ:
\[
(P(\delta))\qquad
\begin{aligned}
\text{Minimize}\quad & \tfrac{1}{2}\|y - y_d\|_{L^2(\Omega)}^2 + \tfrac{\gamma}{2}\|u\|_{L^2(\Omega)}^2 - (\delta_1, y)_\Omega - (\delta_2, u)_\Omega\\
\text{subject to}\quad & -\Delta y = u + \delta_3 \ \text{in } \Omega, \qquad y = 0 \ \text{on } \Gamma.
\end{aligned}
\]
The state y and control u are sought in H_0^1(Ω) and L^2(Ω), respectively, and we assume a positive control cost parameter γ > 0. We note that the system of necessary and sufficient optimality conditions associated to (P(δ)) is given by
\[
(0.1)\qquad
\begin{aligned}
-\Delta p + y - y_d &= \delta_1 \quad \text{in } \Omega, & p &= 0 \quad \text{on } \Gamma,\\
\gamma\, u - p &= \delta_2 \quad \text{in } \Omega,\\
-\Delta y - u &= \delta_3 \quad \text{in } \Omega, & y &= 0 \quad \text{on } \Gamma,
\end{aligned}
\]
where p is the adjoint state, and δ appears as a right hand side perturbation. The understanding of problems of type (P(δ)) is key to the analysis of nonlinear optimal control problems which depend on a general perturbation parameter π, which may enter nonlinearly. Properties of nonlinear problems can be deduced from properties of (P(δ)) by means of an implicit function theorem, as outlined below.

In addition to problem (P(δ)), we consider some variations with pointwise control constraints, pointwise state constraints, or pointwise mixed control-state constraints. This leads us to consider
\[
\begin{aligned}
(P_{cc}(\delta))\quad & \text{Solve } (P(\delta)) \text{ s.t. } u_a \le u \le u_b \ \text{a.e. in } \Omega,\\
(P_{sc}(\delta))\quad & \text{Solve } (P(\delta)) \text{ s.t. } y_a \le y \le y_b \ \text{in } \Omega,\\
(P_{mc}(\delta))\quad & \text{Solve } (P(\delta)) \text{ s.t. } y_c \le \varepsilon\, u + y \le y_d \ \text{in } \Omega.
\end{aligned}
\]

The Control Constrained Case. Lipschitz stability properties of problems of type (P_cc(δ)) were first investigated in Unger [1997] and Malanowski and Tröltzsch [2000] for the elliptic case, and in Malanowski and Tröltzsch [1999] for the parabolic case. We give here a brief account of their results, applied to our model problem (P_cc(δ)). Problems with pointwise state constraints and mixed control-state constraints will be addressed in Sections 1 and 2, respectively.

Assumption 0.1: Suppose that Ω ⊂ R^d, d ≥ 1, is a bounded Lipschitz domain and that γ > 0 and y_d ∈ L^2(Ω) hold.

It is well known that (P_cc(δ)) possesses a unique solution (y^δ, u^δ) ∈ H_0^1(Ω) × U_ad, where U_ad := {u ∈ L^2(Ω) : u_a ≤ u ≤ u_b a.e. in Ω}, provided that U_ad ≠ ∅. The solution and the associated unique adjoint state p^δ ∈ H_0^1(Ω) are characterized by the following optimality system:
\[
(0.2)\qquad
\begin{aligned}
-\Delta p^\delta + y^\delta - y_d &= \delta_1 \quad \text{in } \Omega, & p^\delta &= 0 \quad \text{on } \Gamma,\\
-\Delta y^\delta - u^\delta &= \delta_3 \quad \text{in } \Omega, & y^\delta &= 0 \quad \text{on } \Gamma,\\
(\gamma\, u^\delta - p^\delta - \delta_2,\, u - u^\delta)_\Omega &\ge 0 \quad \text{for all } u \in U_{ad}.
\end{aligned}
\]
We begin by reviewing a Lipschitz stability result for the solution. For related results concerning optimal control of parabolic equations, we refer to Malanowski and Tröltzsch [1999], Tröltzsch [2000].

Theorem 0.2 (Malanowski and Tröltzsch [2000]): There exists a constant L_2 such that
\[
\|y^\delta - y^{\delta'}\|_{H^1(\Omega)} + \|u^\delta - u^{\delta'}\|_{L^2(\Omega)} + \|p^\delta - p^{\delta'}\|_{H^1(\Omega)} \le L_2\, \|\delta - \delta'\|_{[L^2(\Omega)]^3}
\]
holds for every δ, δ' ∈ [L^2(Ω)]^3.

When the perturbations and other problem data are more regular, a stronger result can be obtained:

Corollary 0.3 (compare Malanowski and Tröltzsch [2000]): If y_d, u_a, u_b ∈ L^∞(Ω), then there exists a constant L_∞ such that
\[
\|y^\delta - y^{\delta'}\|_{L^\infty(\Omega)} + \|u^\delta - u^{\delta'}\|_{L^\infty(\Omega)} + \|p^\delta - p^{\delta'}\|_{L^\infty(\Omega)} \le L_\infty\, \|\delta - \delta'\|_{[L^\infty(\Omega)]^3}
\]
holds for every δ, δ' ∈ [L^∞(Ω)]^3. Indeed, the assumption on y_d, δ_1 and δ_3 can be relaxed depending on the regularity of the solutions of the state and adjoint PDEs, i.e., depending on the dimension of Ω and the smoothness of its boundary Γ.
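To make the right-hand-side perturbation structure of (0.1) concrete, the following minimal finite-difference sketch is included. It is not taken from the thesis; the grid, data, and parameter values are illustrative assumptions. It assembles the coupled optimality system of the unconstrained prototype (P(δ)) in one space dimension and compares the solutions belonging to two perturbations, mirroring the Lipschitz estimates above in a discrete setting.

```python
# Minimal 1D finite-difference sketch (not from the thesis) of the perturbed
# optimality system (0.1) for the unconstrained prototype (P(delta)):
#   -p'' + y - y_d = delta1,  gamma*u - p = delta2,  -y'' - u = delta3,
# with homogeneous Dirichlet conditions. All names and values are illustrative.
import numpy as np

n, gamma = 200, 1e-2
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
y_d = np.sin(np.pi * x)                       # illustrative desired state

# Discrete negative Laplacian with homogeneous Dirichlet boundary conditions
A = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
I = np.eye(n)

# Coupled system in the unknowns (y, u, p); delta enters only the right-hand side
K = np.block([[I,                np.zeros((n, n)), A],
              [np.zeros((n, n)), gamma * I,        -I],
              [A,                -I,               np.zeros((n, n))]])

def solve(delta1, delta2, delta3):
    rhs = np.concatenate([y_d + delta1, delta2, delta3])
    return np.split(np.linalg.solve(K, rhs), 3)   # (y, u, p)

y0, u0, p0 = solve(np.zeros(n), np.zeros(n), np.zeros(n))
d1, d2, d3 = 0.1 * np.cos(2 * np.pi * x), np.zeros(n), 0.05 * np.ones(n)
y1, u1, p1 = solve(d1, d2, d3)

num = np.sqrt(h) * np.linalg.norm(np.concatenate([y1 - y0, u1 - u0, p1 - p0]))
den = np.sqrt(h) * np.linalg.norm(np.concatenate([d1, d2, d3]))
print("discrete L2 solution change / perturbation change:", num / den)
```

Because δ enters only the right hand side of the fixed matrix K, the discrete solution map is affine in δ, and the printed ratio is bounded independently of the chosen perturbation. The constrained variants (P_cc(δ)), (P_sc(δ)), (P_mc(δ)) require the finer arguments discussed in the following paragraphs and in Sections 1 and 2.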
Sensitivity Analysis in the Control Constrained Case. We now address differentiability properties of the parameter-to-solution map
\[
\delta \mapsto \xi(\delta) := (\xi^y(\delta), \xi^u(\delta), \xi^p(\delta)) = (y^\delta, u^\delta, p^\delta).
\]
We refer to Malanowski [2002, 2003a] for the original contributions in the elliptic and parabolic cases, respectively. Due to the presence of inequality constraints, ξ is a nonlinear function of the perturbation δ. We remark that the optimal control can be expressed as
\[
u^\delta = \Pi_{U_{ad}}\Bigl(\frac{p^\delta + \delta_2}{\gamma}\Bigr),
\]
where Π_{U_ad} denotes the pointwise projection onto the set U_ad. Hence the differentiability properties of ξ are essentially those of the projection. Naturally, the subset of Ω where the projection is active or strongly active will play a role, compare Figure 0.1. We define
\[
\widehat U_{ad,\delta} := \{u \in L^2(\Omega) : \widehat u_a \le u \le \widehat u_b\}
\]
with bounds
\[
\widehat u_a = \begin{cases} 0 & \text{where } \gamma^{-1}(p^\delta + \delta_2) \le u_a \text{ or } > u_b,\\ -\infty & \text{elsewhere,}\end{cases}
\qquad
\widehat u_b = \begin{cases} 0 & \text{where } \gamma^{-1}(p^\delta + \delta_2) < u_a \text{ or } \ge u_b,\\ +\infty & \text{elsewhere.}\end{cases}
\]

[Figure 0.1 shows the term γ^{-1}(p^δ + δ_2) inside the projection operator plotted against the bounds u_a and u_b.]

Figure 0.1. Illustration of the admissible set for the sensitivity derivative. In the left-most and right-most parts of the domain, one of the constraints is strongly active, i.e., γ^{-1}(p^δ + δ_2) > u_b or < u_a holds, and the derivative of u^δ vanishes, i.e., û_a = û_b = 0. The derivative points into the interior of the admissible region where one of the constraints is weakly active, i.e., where γ^{-1}(p^δ + δ_2) ∈ {u_a, u_b} holds. In the center part of the domain, neither constraint is active, and the derivative is not constrained, i.e., û_b = −û_a = ∞ holds.

Theorem 0.4 (Malanowski [2003a]): For every δ ∈ [L^2(Ω)]^3, the map ξ is directionally differentiable with values in H_0^1(Ω) × L^2(Ω) × H_0^1(Ω). The directional derivative Dξ(δ; δ̂) at δ in the direction of δ̂ is given by the unique solution and corresponding unique adjoint state of
\[
(DQP(\delta, \widehat\delta))\qquad
\begin{aligned}
\text{Minimize}\quad & \tfrac{1}{2}\|y\|_{L^2(\Omega)}^2 + \tfrac{\gamma}{2}\|u\|_{L^2(\Omega)}^2 - (\widehat\delta_1, y)_\Omega - (\widehat\delta_2, u)_\Omega\\
\text{subject to}\quad & -\Delta y = u + \widehat\delta_3 \ \text{in } \Omega, \qquad y = 0 \ \text{on } \Gamma, \qquad u \in \widehat U_{ad,\delta}.
\end{aligned}
\]
Moreover, differentiability with respect to higher L^p norms was also obtained in Malanowski [2002, 2003a], and the directional derivative was shown to have the Bouligand property, i.e., the remainder term is of order o(‖δ̂‖) uniformly in all directions δ̂.

The original proof of Theorem 0.4 was based on a pointwise construction of the limit of a sequence of finite differences, and Lebesgue's Dominated Convergence Theorem was used to obtain a limit in L^2(Ω). Recently, a more direct proof of Theorem 0.4 has been obtained in Griesse, Grund, and Wachsmuth [to appear], which exploits Bouligand differentiability of the projection Π_{U_ad}. We refer to Section 6 for details.

Remark 0.5: We remark that in general Û_{ad,δ} is not a linear space and thus the directional derivative Dξ(δ; δ̂) may depend nonlinearly on the direction δ̂. However, in the presence of strict complementarity, i.e., if the set {x ∈ Ω : γ^{-1}(p^δ + δ_2)(x) = u_a or u_b} has measure zero, then Û_{ad,δ} becomes a linear space, and Dξ(δ; δ̂) does depend linearly on the direction δ̂.
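The pointwise case distinction above is easy to operationalize. The following small NumPy sketch is not part of the thesis; the grid and data are illustrative assumptions. It evaluates the bounds û_a, û_b on a grid and labels each point as strongly active, weakly active, or inactive, in the sense of Figure 0.1.

```python
# Pointwise evaluation of the bounds u_hat_a, u_hat_b that define the admissible
# set for the sensitivity derivative (Theorem 0.4). Illustrative data only.
import numpy as np

def derivative_bounds(p, delta2, gamma, ua, ub):
    """Return (u_hat_a, u_hat_b) on a grid, following the piecewise definition."""
    q = (p + delta2) / gamma                     # term inside the projection
    u_hat_a = np.where((q <= ua) | (q > ub), 0.0, -np.inf)
    u_hat_b = np.where((q < ua) | (q >= ub), 0.0, np.inf)
    return u_hat_a, u_hat_b

# Illustrative stand-ins for the adjoint state p^delta and the data
x = np.linspace(0.0, 1.0, 11)
p = np.sin(np.pi * x)
delta2 = np.zeros_like(x)
gamma, ua, ub = 1.0, 0.2, 0.8

lo, hi = derivative_bounds(p, delta2, gamma, ua, ub)
for xi, a, b in zip(x, lo, hi):
    label = ("strongly active" if a == 0.0 and b == 0.0 else
             "weakly active"   if a == 0.0 or b == 0.0 else
             "inactive")
    print(f"x = {xi:.1f}: bounds [{a}, {b}]  {label}")
```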
Nonlinear Optimal Control Problems. As mentioned earlier, the stability and sensitivity analysis for nonlinear problems can be reduced to that for linear-quadratic problems by means of an implicit function theorem. Due to the presence of inequality constraints and the variational inequality in (0.2), the classical Implicit Function Theorem is not applicable. To fix ideas, we consider the model problem
\[
(P_{cc}(\pi))\qquad
\begin{aligned}
\text{Minimize}\quad & \int_\Omega \varphi(x, y, u)\, dx\\
\text{subject to}\quad & -\Delta y + \beta\, y^3 + \alpha\, y = u + f \ \text{in } \Omega, \qquad y = 0 \ \text{on } \Gamma,\\
& u_a \le u \le u_b \ \text{a.e. in } \Omega.
\end{aligned}
\]
Problem (P_cc(π)) depends on the parameter π = (α, β, f) ∈ R^2 × L^2(Ω) =: P. Under appropriate assumptions (see, e.g., [Tröltzsch, 2005, Satz 4.18]), for any local optimal solution (ȳ, ū) of (P_cc(π)), there exists a unique adjoint state p̄ such that the following system of necessary optimality conditions is satisfied:
\[
(0.3)\qquad
\begin{aligned}
-\Delta \bar p + 3\beta\, \bar y^2 \bar p + \alpha\, \bar p &= \varphi_y(\cdot, \bar y, \bar u) \quad \text{in } \Omega, & \bar p &= 0 \quad \text{on } \Gamma,\\
-\Delta \bar y + \beta\, \bar y^3 + \alpha\, \bar y &= \bar u + f \quad \text{in } \Omega, & \bar y &= 0 \quad \text{on } \Gamma,\\
(\varphi_u(\cdot, \bar y, \bar u) - \bar p,\, u - \bar u)_\Omega &\ge 0 \quad \text{for all } u \in U_{ad}.
\end{aligned}
\]
To make (0.3) accessible to an implicit function theorem, we write it as an equivalent generalized equation,
\[
(0.4)\qquad 0 \in F(y, u, p; \pi) + N(u).
\]
Here, F is defined as
\[
F(y, u, p; \pi) = \begin{pmatrix}
-\Delta p + 3\beta\, y^2 p + \alpha\, p - \varphi_y(\cdot, y, u)\\
-\Delta y + \beta\, y^3 + \alpha\, y - u - f\\
\varphi_u(\cdot, y, u) - p
\end{pmatrix}
\]
and it maps F : X × P → Z where
\[
X = \bigl(H_0^1(\Omega) \cap L^\infty(\Omega)\bigr) \times L^2(\Omega) \times \bigl(H_0^1(\Omega) \cap L^\infty(\Omega)\bigr), \qquad
Z = [H^{-1}(\Omega)]^2 \times L^2(\Omega),
\]
when the differential operators are understood in their weak form. The set-valued part N(u) is related to the normal cone of U_ad at u, and we define
\[
N(u) = \{0\} \times \{0\} \times \{\mu \in L^2(\Omega) : (\mu, v - u)_\Omega \le 0 \ \text{for all } v \in U_{ad}\}
\]
in case u ∈ U_ad, whereas N(u) = ∅ if u ∉ U_ad. For generalized equations such as (0.4), we have the following Implicit Function Theorem.

Theorem 0.6 ([Dontchev, 1995, Theorem 2.4]): Let X be a Banach space and let P, Z be normed linear spaces. Suppose that F : X × P → Z is a function and N : X → Z is a set-valued map. Let x̄ ∈ X be a solution to
\[
(0.5)\qquad 0 \in F(x; \pi) + N(x)
\]
for π = π_0, and let W be a neighborhood of 0 ∈ Z. Suppose that
(i) F is Lipschitz in π, uniformly in x at (x̄, π_0), and F(x̄, ·) is directionally differentiable at π_0 with directional derivative D_π F(x̄, π_0; δπ) for all δπ ∈ P,
(ii) F is partially Fréchet differentiable with respect to x in a neighborhood of (x̄, π_0), and its partial derivative F_x is continuous in both x and π at (x̄, π_0),
(iii) there exists a function ξ : W → X such that ξ(0) = x̄,
\[
\delta \in F(\bar x, \pi_0) + F_x(\bar x, \pi_0)\bigl(\xi(\delta) - \bar x\bigr) + N(\xi(\delta)) \quad \text{for all } \delta \in W,
\]
and ξ is Lipschitz continuous.
Then there exist neighborhoods U of x̄ and V of π_0 and a function π ↦ Ξ(π) = x(π) from V to U such that Ξ(π_0) = x̄, Ξ(π) is a solution of (0.5) for every π ∈ V, and Ξ is Lipschitz continuous.
If, in addition, X̂ ⊃ X is a normed linear space such that
(iv) ξ : W → X̂ is directionally (or Bouligand) differentiable at 0 with derivative Dξ(0; δ̂) for all δ̂ ∈ Z,
then π ↦ Ξ(π) ∈ X̂ is also directionally (or Bouligand) differentiable at π_0 and its derivative is given by
\[
(0.6)\qquad D\Xi(\pi_0; \delta\pi) = D\xi\bigl(0; -D_\pi F(\bar x, \pi_0; \delta\pi)\bigr)
\]
for any δπ ∈ P.

Definition 0.7 (Robinson [1980]): The property (iii) is termed the strong regularity of the generalized equation (0.5) at x̄ and π_0.

This implicit function theorem can be applied to the generalized equation (0.4) with the setting x = (y, u, p). Assumptions (i) and (ii) are readily verified if φ is of class C^2. When we use ξ(δ) = (y^δ, u^δ, p^δ), the linearized generalized equation in assumption (iii) is the system of necessary optimality conditions for a linear-quadratic approximation of (P_cc(π)), perturbed by δ:
\[
(AQP_{cc}(\delta))\qquad
\begin{aligned}
\text{Minimize}\quad & \frac{1}{2}\int_\Omega \begin{pmatrix} y - \bar y & u - \bar u\end{pmatrix}
\begin{pmatrix}\varphi_{yy} & \varphi_{yu}\\ \varphi_{uy} & \varphi_{uu}\end{pmatrix}
\begin{pmatrix} y - \bar y\\ u - \bar u\end{pmatrix} dx
+ 3\beta \int_\Omega \bar y\, \bar p\, (y - \bar y)^2\, dx\\
& \quad + \int_\Omega \varphi_y\,(y - \bar y) + \varphi_u\,(u - \bar u)\, dx - (\delta_1, y)_\Omega - (\delta_2, u)_\Omega\\
\text{subject to}\quad & -\Delta y + (3\beta\, \bar y^2 + \alpha)\, y = u + f + 2\beta\, \bar y^3 + \delta_3 \ \text{in } \Omega, \qquad y = 0 \ \text{on } \Gamma,\\
& u_a \le u \le u_b \ \text{a.e. in } \Omega.
\end{aligned}
\]
If second-order sufficient conditions hold at (ȳ, ū) and p̄, then (AQP_cc(δ)) is a strictly convex problem and it has a unique solution (y^δ, u^δ, p^δ) ∈ X, which depends Lipschitz continuously on δ ∈ Z, so that assumption (iii) is satisfied. This can be proved along the lines of Theorem 0.2. As in Corollary 0.3, stability w.r.t. L^∞(Ω) norms can be obtained as well by changing X and Z appropriately. Finally, Theorem 0.4 implies that also assumption (iv) is satisfied, so that ξ(δ) can be shown to be directionally and Bouligand differentiable, as was done for a similar problem in Malanowski [2002, 2003a].

Following this overview of techniques and results for the control constrained case, the following sections provide complementary results for optimal control problems with state constraints (Section 1) and mixed control-state constraints (Section 2). In Sections 3 and 4, we address again control constrained problems, but with more involved dynamics, which are given by the time-dependent Navier-Stokes equations or a semilinear reaction-diffusion system, respectively. Each section begins with an introduction, followed by the corresponding publication.

1. Lipschitz Stability of Solutions for Elliptic Optimal Control Problems with Pointwise State Constraints

R. Griesse: Lipschitz Stability of Solutions to Some State-Constrained Elliptic Optimal Control Problems, Journal of Analysis and its Applications, 25(4), p.435–444, 2006

In this publication we derive Lipschitz stability results with respect to perturbations for optimal control problems involving linear and semilinear elliptic partial differential equations as well as pointwise state constraints. The problem setting in the linear case with distributed control is very similar to our model problem (P_sc(δ)) above, which we repeat here for easy reference:
\[
(P_{sc}(\delta))\qquad
\begin{aligned}
\text{Minimize}\quad & \tfrac{1}{2}\|y - y_d\|_{L^2(\Omega)}^2 + \tfrac{\gamma}{2}\|u\|_{L^2(\Omega)}^2 - (\delta_1, y)_\Omega - (\delta_2, u)_\Omega\\
\text{subject to}\quad & -\Delta y = u + \delta_3 \ \text{in } \Omega, \qquad y = 0 \ \text{on } \Gamma,\\
& y_a \le y \le y_b \ \text{in } \Omega.
\end{aligned}
\]
We work in sufficiently smooth domains Ω ⊂ R^d, d ≤ 3, so that the state y will belong to W = H^2(Ω) ∩ H_0^1(Ω), which embeds continuously into C_0(Ω). In this setting, we can allow perturbations δ_1 ∈ W^*, the dual space of W, so that the term (δ_1, y)_Ω in the objective is replaced by ⟨δ_1, y⟩_{W^*,W}. Following standard arguments, one can show that (P_sc(δ)) has a unique solution (y^δ, u^δ) ∈ W × L^2(Ω) for any given δ ∈ Z := W^* × [L^2(Ω)]^2, provided that the feasible set
\[
\{(y, u) \in W \times L^2(\Omega) : y = Su \ \text{and} \ y_a \le y \le y_b \ \text{in } \Omega\}
\]
is nonempty, where S : L^2(Ω) → W denotes the solution operator of −Δy = u in Ω, y = 0 on Γ. We prove

Theorem 1.1 ([Griesse, 2006, Theorem 2.3]): There exists L_2 > 0 such that
\[
\|y^\delta - y^{\delta'}\|_{H^2(\Omega)} + \|u^\delta - u^{\delta'}\|_{L^2(\Omega)} \le L_2\, \|\delta - \delta'\|_Z.
\]

This result was obtained from a variational argument, without reference to the adjoint state or Lagrange multiplier, hence no Slater condition is required up to here. However, whenever a Slater condition holds, it is known from Casas [1986] that there exists a unique measure μ ∈ M(Ω) = C_0(Ω)^* and a unique adjoint state satisfying
\[
\begin{aligned}
-\Delta p &= -(y - y_d) - \mu + \delta_1 \quad \text{in } \Omega, & p &= 0 \quad \text{on } \Gamma,\\
-\Delta y &= u + \delta_3 \quad \text{in } \Omega, & y &= 0 \quad \text{on } \Gamma,\\
\gamma\, u - p &= \delta_2 \quad \text{in } \Omega,\\
\langle \tilde y, \mu\rangle &\le \langle y, \mu\rangle \quad \text{for all } \tilde y \in W \cap Y_{ad},
\end{aligned}
\]
see Proposition 2.4 of the following paper. The adjoint equation has to be understood in a very weak sense. We may easily derive a Lipschitz estimate for p^δ from the third equation,
\[
\|p^\delta - p^{\delta'}\|_{L^2(\Omega)} \le (\gamma L_2 + 1)\, \|\delta - \delta'\|_Z.
\]
However, a Lipschitz estimate for pδ in higher norms is not available, in contrast to the control constrained case, compare Theorem 0.2 and Corollary 0.3. As outlined in the introduction, the Implicit Function Theorem 0.6 can be used to derive Lipschitz stability results in the presence of semilinear equations. In view of the findings above for the linear-quadratic case, we choose X = W × [L2 (Ω)]2 as the space for the unknowns and Z = W ∗ × [L2 (Ω)]2 as the space of perturbations. We refer to Theorem 3.10 of the following publication for an application of this technique. The case of Robin boundary control of a linear elliptic equation with state constraints is treated as well. However, the same technique as above then only admits the Lipschitz estimate kpδ − pδ0 kL2 (Γ) ≤ (γ L2 + 1)kδ − δ 0 kZ on the boundary Γ. Therefore, Lipschitz stability results for the case of boundary control of semilinear equations remain an open problem. 14 Stability and Sensitivity Analysis LIPSCHITZ STABILITY OF SOLUTIONS TO SOME STATE-CONSTRAINED ELLIPTIC OPTIMAL CONTROL PROBLEMS ROLAND GRIESSE Abstract. In this paper, optimal control problems with pointwise state constraints for linear and semilinear elliptic partial differential equations are studied. The problems are subject to perturbations in the problem data. Lipschitz stability with respect to perturbations of the optimal control and the state and adjoint variables is established initially for linear–quadratic problems. Both the distributed and Neumann boundary control cases are treated. Based on these results, and using an implicit function theorem for generalized equations, Lipschitz stability is also shown for an optimal control problem involving a semilinear elliptic equation. 1. Introduction In this paper, we consider optimal control problems on bounded domains Ω ⊂ RN of the form: γ 1 ky − yd k2L2 (Ω) + ku − ud k2L2 (Ω) (1.1) Minimize 2 2 for the control u and state y, subject to linear or semilinear elliptic partial differential equations. For instance, in the linear case with distributed control u we have −∆y + a0 y = u on Ω , y=0 on ∂Ω, (1.2a) while the boundary control case reads ∂y + β y = u on ∂Ω. (1.2b) ∂n Instead of the Laplace operator, an elliptic operator in divergence form is also permitted. Moreover, the problem is subject to pointwise state constraints −∆y + a0 y = f on Ω , ya ≤ y ≤ yb on Ω (or Ω), (1.3) where ya and yb are the lower and upper bound functions, respectively. Unless otherwise specified, ya and yb may be arbitrary functions with values in R ∪ {±∞} such that ya ≤ yb holds everywhere. Problems of type (1.1)–(1.3) appear as subproblems after linearization of semilinear state-constrained optimal control problems, such as the example considered in Section 3, but they are also of independent interest. Under suitable conditions, one can show the existence of an adjoint state and a Lagrange multiplier associated with the state constraint (1.3). We refer to [9] for distributed control of elliptic equations and [6, 10, 12, 13] for their boundary control. We also mention [7, 8, 33] and [3–5, 7, 11, 31–33] for distributed and boundary control, respectively, of parabolic equations. In the distributed case, the optimality system comprises the state equation −∆y + a0 y = u on Ω (1.4) the adjoint equation −∆λ = −(y − yd ) − µ on Ω the optimality condition γ(u − ud ) − λ = 0 on Ω, (1.5) (1.6) 1. State Constrained Optimal Control Problems 15 and a complementarity condition for the multiplier µ associated with the state constraint (1.3). 
In this paper, we extend the above-mentioned results by proving the Lipschitz stability of solutions for semilinear and linear elliptic state-constrained optimal control problems with respect to perturbations of the problem data. We begin by showing that the linear–quadratic problem (1.1)–(1.3) admits solutions which depend Lipschitz continuously on particular perturbations δ = (δ1 , δ2 , δ3 ) of the right hand sides in the first order optimality system (1.4)–(1.6), i.e., −∆λ + (y − yd ) + µ = δ1 on Ω γ(u − ud ) − λ = δ2 on Ω −∆y + a0 y − u = δ3 on Ω in the case of distributed control. The perturbations δ1 and δ2 generate additional linear terms in the objective (1.1). Our main result for the linear–quadratic cases is given in Theorems 2.3 and 4.3, for distributed and boundary control, respectively. It has numerous applications: Firstly, it may serve as a starting point to prove the convergence of numerical algorithms for nonlinear state-constrained optimal control problems. The central notion in this context is the strong regularity property of the first order necessary conditions, which precisely requires their linearization to possess the Lipschitz stability proved in this paper, compare [2]. Secondly, proofs of convergence of the discrete to the continuous solution as the mesh size tends to zero are also based on the strong regularity property, see, e.g., [26]. Thirdly, our results ensure the well-posedness of problem (1.1)–(1.3) in the following sense: If the optimality system is solved only up to a residual δ (for instance, when solving it numerically), our stability result implies that the approximate solution found is the exact and nearby solution of a perturbed problem. Fourthly, our results can be used to prove the Lipschitz stability for optimal control problems with semilinear elliptic equations and with respect to more general perturbations by means of Dontchev’s implicit function theorem for generalized equations, see [14]. We illustrate this technique in Section 3. To the author’s knowledge, the Lipschitz dependence of solutions in optimal control of partial differential equations (PDEs) in the presence of pointwise state constraints has not yet been studied. Most existing results concern control-constrained problems: Malanowski and Tröltzsch [28] prove Lipschitz dependence of solutions for a controlconstrained optimal control problem for a linear elliptic PDE subject to nonlinear Neumann boundary control. In the course of their proof, the authors establish the Lipschitz property also for the linear–quadratic problem obtained by linearization of the first order necessary conditions. In [36], Tröltzsch proves the Lipschitz stability for a linear–quadratic optimal control problem involving a parabolic PDE. In Malanowski and Tröltzsch [27], this result is extended to obtain Lipschitz stability in the case of a semilinear parabolic equation. In the same situation, Malanowski [25] has recently proved parameter differentiability. This result is extended in [18, 19] to an optimal control problem governed by a system of semilinear parabolic equations, and numerical results are provided there. All of the above citations cover the case of pointwise control constraints. Note also that the general theory developed in [23] does not apply to the problems treated in the present paper since the hypothesis of surjectivity [23, (H3)] is not satisfied for bilateral state constraints (1.3). 
The case of state-constrained optimal control problems governed by ordinary differential equations was studied in [15, 24]. The analysis in these papers relies heavily on the property that the state constraint multiplier µ is Lipschitz on the interval [0, T ] of interest (see, e.g., [22]), so it cannot be applied to the present situation. 16 Stability and Sensitivity Analysis The remainder of this paper is organized as follows: In Section 2, we establish the Lipschitz continuity with respect to perturbations of optimal solutions in the linear– quadratic distributed control case, in the presence of pointwise state constraints. In Section 3, we use these results to obtain Lipschitz stability also for a problem governed by a semilinear equation with distributed control, and with respect to a wider set of perturbations. Finally, Section 4 is devoted to the case of Neumann (co-normal) boundary control in the linear–quadratic case. Throughout, let Ω be a bounded domain in RN for some N ∈ N, and let Ω denote its closure. By C(Ω) we denote the space of continuous functions on Ω, endowed with the norm of uniform convergence. C0 (Ω) is the subspace of C(Ω) of functions with zero trace on the boundary. The dual spaces of C(Ω) and C0 (Ω) are known to be M(Ω) and M(Ω), the spaces of finite signed regular measures with the total variation norm, see for instance [17, Proposition 7.16] or [35, Theorem 6.19]. Finally, we denote by W m,p (Ω) the Sobolev space of functions on Ω whose distributional derivatives up to order m are in Lp (Ω), see Adams [1]. In particular, we write H m (Ω) instead of W m,2 (Ω). The space W0m,p (Ω) is the closure of Cc∞ (Ω) (the space of infinitely differentiable functions on Ω with compact support) in W m,p (Ω). 2. Linear–quadratic distributed control Throughout this section, we are concerned with optimal control problems governed by a state equation with an elliptic operator in divergence form and distributed control. As delineated in the introduction, the problem depends on perturbation parameters δ = (δ1 , δ2 , δ3 ): Z 1 γ 2 2 Minimize ky − yd kL2 (Ω) + ku − ud kL2 (Ω) − hy, δ1 iW,W 0 − u δ2 (2.1) 2 2 Ω over u ∈ L2 (Ω) s.t. −div (A∇y) + a0 y = u + δ3 y=0 and ya ≤ y ≤ yb 2 on Ω (2.2) on ∂Ω (2.3) on Ω. (2.4) H01 (Ω) so that the pointwise state conWe work with the state space W = H (Ω) ∩ straint (2.4) is meaningful. The perturbations are introduced below. Let us fix the standing assumption for this section: Assumption 2.1. Let Ω be a bounded domain in RN (N ∈ {1, 2, 3}) with C 1,1 boundary ∂Ω, see [20, p. 5]. The state equation is governed by an operator with N × N symmetric coefficient matrix A with entries aij which are Lipschitz continuous on Ω. We assume the condition of uniform ellipticity: There exists m0 > 0 such that ξ >A ξ ≥ m0 |ξ|2 for all ξ ∈ RN and almost all x ∈ Ω. The coefficient a0 ∈ L∞ (Ω) is assumed to be nonnegative a.e. on Ω. Moreover, yd and ud denote desired states and controls in L2 (Ω), respectively, while γ is a positive number. The bounds ya and yb may be arbitrary functions on Ω such that the admissible set KW = {y ∈ W : ya ≤ y ≤ yb on Ω} is nonempty. The following result allows us to define the solution operator Tδ : L2 (Ω) → W such that y = Tδ (u) satisfies (2.2)–(2.3) for given δ and u. For the proof we refer to [20, Theorems 2.4.2.5 and 2.3.3.2]: 1. State Constrained Optimal Control Problems 17 Proposition 2.2 (The State Equation). 
Given u and δ3 in L2 (Ω), the state equation (2.2)–(2.3) has a unique solution y ∈ W in the sense that (2.2) is satisfied almost everywhere on Ω. The solution verifies the a priori estimate kykH 2 (Ω) ≤ cA ku + δ3 kL2 (Ω) . (2.5) In order to apply the results of this section to prove the Lipschitz stability of solutions in the semilinear case in Section 3, we consider here very general perturbations (δ1 , δ2 , δ3 ) ∈ W 0 × L2 (Ω) × L2 (Ω), where W 0 is the dual of the state space W. Of course, this comprises more regular perturbations. In particular, (2.1) includes perturbations of the desired state in view of Z 1 1 2 2 ky − (yd + δ1 )kL2 (Ω) = ky − yd kL2 (Ω) − y δ1 + c 2 2 Ω where c is a constant. Likewise, δ2 covers perturbations in the desired control ud , and δ3 accounts for perturbations in the right hand side of the PDE. We can now state the main result of this section which proves the Lipschitz stability of the optimal state and control with respect to perturbations. It relies on a variational argument and does not invoke any dual variables. Theorem 2.3 (Lipschitz Continuity). For any δ = (δ1 , δ2 , δ3 ) ∈ W 0 × L2 (Ω) × L2 (Ω), problem (2.1)–(2.4) has a unique solution. Moreover, there exists a constant L > 0 such that for any two pertubations (δ10 , δ20 , δ30 ) and (δ100 , δ200 , δ300 ), the corresponding solutions of (2.1)–(2.4) satisfy ky 0 − y 00 kH 2 (Ω) + ku0 − u00 kL2 (Ω) ≤ L kδ10 − δ100 kW 0 + kδ20 − δ200 kL2 (Ω) + kδ30 − δ300 kL2 (Ω) . Proof. Let δ ∈ W 0 × L2 (Ω) × L2 (Ω) be arbitrary. We introduce the shifted control variable v := u + δ3 and define γ 1 fe(y, v, δ) = ky − yd k2L2 (Ω) + kv − ud − δ3 k2L2 (Ω) 2 Z2 − hy, δ1 iW,W 0 − Obviously, our problem is now to minimize fe(y, v, δ) Ω (v − δ3 ) δ2 . subject to (y, v) ∈ M 2 where M = {(y, v) ∈ KW × L (Ω) : −div (A∇y) + a0 y = v on Ω}. Due to Assumption 2.1, the feasible set M is nonempty, closed and convex and also independent of δ. In view of γ > 0 and the a priori estimate (2.5), the objective is strictly convex. It is also weakly lower semicontinuous and radially unbounded, hence it is a standard result from convex analysis [16, Chapter II, Proposition 1.2] that (2.1)–(2.4) has a unique solution (y, u) ∈ W × L2 (Ω) for any δ. A necessary and sufficient condition for optimality is fey (y, v, δ)(y − y) + fev (y, v, δ)(v − v) ≥ 0 for all (y, v) ∈ M. (2.6) Now let δ 0 and δ 00 be two perturbations with corresponding solutions (y 0 , v 0 ) and (y 00 , v 00 ). From the variational inequality (2.6), evaluated at (y 0 , v 0 ) and with (y, v) = (y 00 , v 00 ) we obtain Z Z 00 0 (y − yd )(y − y ) + γ (v 0 − ud − δ30 )(v 00 − v 0 ) Ω Ω Z 00 − hy − y 0 , δ10 iW,W 0 − (v 00 − v 0 ) δ20 ≥ 0 Ω 18 Stability and Sensitivity Analysis By interchanging the roles of (y 0 , v 0 ) and (y 00 , v 00 ) and adding the inequalities, we obtain ky 0 − y 00 k2L2 (Ω) + γ kv 0 − v 00 k2L2 (Ω) ≤ hy 0 − y 00 , δ10 − δ100 iW,W 0 + γ Z Ω (v 0 − v 00 )(δ30 − δ300 ) + Z Ω (v 0 − v 00 )(δ20 − δ200 ) ≤ ky 0 − y 00 kH 2 (Ω) kδ10 − δ100 kW 0 + kv 0 − v 00 kL2 (Ω) γ kδ30 − δ300 kL2 (Ω) + kδ20 − δ200 kL2 (Ω) . Using the a priori estimate (2.5), the left hand side can be replaced by γ γ 0 kv − v 00 k2L2 (Ω) + 2 ky 0 − y 00 k2H 2 (Ω) . 2 2cA Now we apply Young’s inequality to the right hand side and absorb the terms involving the state and control into the left hand side, which yields the Lipschitz stability of y and v, hence also of u. 
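To make the final absorption step of the preceding proof explicit, one possible choice of weights in Young's inequality (an illustrative choice, not spelled out in the paper) is
\[
\|y'-y''\|_{H^2(\Omega)}\,\|\delta_1'-\delta_1''\|_{W'} \le \frac{\gamma}{4c_A^2}\,\|y'-y''\|_{H^2(\Omega)}^2 + \frac{c_A^2}{\gamma}\,\|\delta_1'-\delta_1''\|_{W'}^2,
\]
\[
\|v'-v''\|_{L^2(\Omega)}\bigl(\gamma\|\delta_3'-\delta_3''\|_{L^2(\Omega)} + \|\delta_2'-\delta_2''\|_{L^2(\Omega)}\bigr) \le \frac{\gamma}{4}\,\|v'-v''\|_{L^2(\Omega)}^2 + \frac{1}{\gamma}\bigl(\gamma\|\delta_3'-\delta_3''\|_{L^2(\Omega)} + \|\delta_2'-\delta_2''\|_{L^2(\Omega)}\bigr)^2.
\]
Absorbing the two solution terms into the left hand side leaves
\[
\frac{\gamma}{4c_A^2}\,\|y'-y''\|_{H^2(\Omega)}^2 + \frac{\gamma}{4}\,\|v'-v''\|_{L^2(\Omega)}^2 \le \frac{c_A^2}{\gamma}\,\|\delta_1'-\delta_1''\|_{W'}^2 + \frac{1}{\gamma}\bigl(\gamma\|\delta_3'-\delta_3''\|_{L^2(\Omega)} + \|\delta_2'-\delta_2''\|_{L^2(\Omega)}\bigr)^2,
\]
which yields the asserted Lipschitz estimate with a constant L depending only on γ and c_A.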
As a precursor for the semilinear case in Section 3, we recall in Proposition 2.4 a known result concerning the adjoint state and the Lagrange multiplier associated with problem (2.1)–(2.4). Proposition 2.4. Let δ ∈ W 0 × L2 (Ω) × L2 (Ω) be a given perturbation and let (y, u) be the corresponding unique solution of (2.1)–(2.4). If KW has nonempty interior, then there exists a unique adjoint variable λ ∈ L2 (Ω) and unique Lagrange multiplier µ ∈ W 0 such that the following holds: Z Z Z − λ div (A∇y) + a0 λy = − (y − yd )y + hy, δ1 − µiW,W 0 ∀y ∈ W (2.7) Ω Ω Ω hy, µiW,W 0 ≤ hy, µiW,W 0 γ(u − ud ) − λ = δ2 ∀y ∈ KW (2.8) on Ω. (2.9) Proof. Let ye be an interior point of KW . Since Tδ0 (u) is an isomorphism from L2 (Ω) → W, u e can be chosen such that ye = Tδ (u) + Tδ0 (u)(e u − u), hence a Slater condition is satisfied. The rest of the proof can be carried out along the lines of Casas [9], or using the abstract multiplier theorem [10, Theorem 5.2]. In the proposition above, we have assumed that KW has nonempty interior. This is not a very restrictive assumption, as any ye ∈ KW satisfying ye − ya ≥ ε and yb − ye ≥ ε on Ω for some ε > 0 is an interior point of KW . Remark 2.5. 1. In [9], it was shown that the state constraint multiplier µ is indeed a measure in M(Ω), i.e., µ has better regularity than just W 0 . However, in the following section we will not be able to use this extra regularity. 2. In view of the previous statement, if δ1 ∈ M(Ω), then so is the right hand side −(y − yd ) + δ1 − µ of the adjoint equation (2.7) and thus the adjoint state λ is an element of W01,s (Ω) for all s ∈ [1, NN−1 ), see [9]. 3. Note that we do not have a stability result for the Lagrange multiplier µ so that we cannot use (2.7) to derive a stability result for the adjoint state λ even in the presence of regular perturbations. This observation is very much in contrast with the control-constrained case, where the control-constraint multiplier does not appear in the adjoint equation’s right hand side and hence the stability of λ can be obtained using an a priori estimate for the adjoint PDE. 4. Nevertheless, from the optimality condition (2.9) we can derive the Lipschitz estimate kλ0 − λ00 kL2 (Ω) ≤ (γL + 1) kδ 0 − δ 00 k (2.10) 1. State Constrained Optimal Control Problems 19 for the adjoint states belonging to two perturbations δ 0 and δ 00 . However, we use here that the control is distributed on all of Ω. We close this section by another observation: Let δ 0 and δ 00 be two perturbations with associated optimal states y 0 and y 00 and Lagrange multipliers µ0 and µ00 . Then hy 0 − y 00 , µ0 − µ00 iW,W 0 ≤ 0 holds, as can be inferred directly from (2.8). 3. A semilinear distributed control problem In this section we show how the Lipschitz stability results for state-constrained linear–quadratic optimal control problems can be transferred to semilinear problems using an appropriate implicit function theorem for generalized equations, see Dontchev [14] and also Robinson [34]. To illustrate this technique, we consider the following parameter-dependent problem P(p): Minimize over s.t. γ 1 ky − yd k2L2 (Ω) + ku − ud k2L2 (Ω) 2 2 u ∈ L2 (Ω) −D∆y + βy 3 + αy = u + f y=0 and (3.1) ya ≤ y ≤ yb on Ω (3.2) on ∂Ω (3.3) on Ω. (3.4) The semilinear state equation is a stationary Ginzburg–Landau model, see [21]. We work again with the state space W = H 2 (Ω) ∩ H01 (Ω). Throughout this section, we make the following standing assumption: Assumption 3.1. Let Ω be a bounded domain in RN (N ∈ {1, 2, 3}) with C 1,1 boundary. 
Let D, α and β be positive numbers, and let f ∈ L2 (Ω). Moreover, let yd and ud be in L2 (Ω) and γ > 0. The bounds ya and yb may be arbitrary functions on Ω such that the admissible set KW = {y ∈ W : ya ≤ y ≤ yb on Ω} has nonempty interior. The results obtained in this section can immediately be generalized to the state equation −div (A∇y) + φ(y) = u + f with appropriate assumptions on the semilinear term φ(y). However, we prefer to consider an example which explicitly contains a number of parameters which otherwise would be hidden in the nonlinearity. In the example above, we can take p = (yd , ud , f, D, α, β, γ) ∈ Π = [L2 (Ω)]3 × R4 as the perturbation parameter and we introduce Π+ = {p ∈ P : D > 0, α > 0, β > 0, γ > 0}. In the sequel, we refer to problem (3.1)–(3.4) as P(p) when we wish to emphasize its dependence on the parameter p. Note that in contrast to the previous section, the parameter p now appears in a more complicated fashion which cannot be expressed solely as right hand side perturbations of the optimality system. Proposition 3.2 (The State Equation). For fixed parameter p ∈ Π+ and for any given u in L2 (Ω), the state equation (3.2)–(3.3) has a unique solution y ∈ W in the sense that y satisfies (3.2) almost everywhere on Ω. The solution depends Lipschitz continuously on the data, i.e., there exists c > 0 such that ky − y 0 kH01 (Ω) ≤ c ku − u0 kL2 (Ω) 20 Stability and Sensitivity Analysis holds for all u, u0 in L2 (Ω). Moreover, the nonlinear solution map Tp : L2 (Ω) → H 2 (Ω) ∩ H01 (Ω) defined by u 7→ y is Fréchet differentiable. Its derivative Tp0 (u)δu at u in the direction of δu is given by the unique solution δy of −D∆δy + (3βy 2 + α) δy = δu δy = 0 on Ω on ∂Ω where y = Tp (u). Moreover, Tp0 (u) is an isomorphism from L2 (Ω) → W. Proof. Existence and uniqueness in H01 (Ω) of the solution for (3.2)–(3.3) and the assertion of Lipschitz continuity follow from the theory of monotone operators, see [37, p. 557], applied to A : H01 (Ω) 3 y 7→ −D∆y + βy 3 + αy − f ∈ H −1 (Ω). Note that A is strongly monotone, coercive, and hemicontinuous. The solution’s H 2 (Ω) regularity now follows from considering βy 3 an additional source term, which is in L2 (Ω) due to the Sobolev Embedding Theorem (see [1, p. 97]). Fréchet differentiability of the solution map is a consequence of the implicit function theorem, see, e.g., [38, p. 250]. The isomorphism property of Tp0 (u) follows from Proposition 2.2. Note that 3βy 2 + α ∈ L∞ (Ω) since y ∈ L∞ (Ω). Before we turn to the main discussion, we state the following existence result for global minimizers: Lemma 3.3. For any given parameter p ∈ Π+ , P(p) has a global optimal solution. Proof. The proof follows a standard argument and is therefore only sketched. Let {(yn , un )} be a feasible minimizing sequence for the objective (3.1). Then {un } is bounded in L2 (Ω) and, by Lipschitz continuity of the solution map, {yn } is bounded in H01 (Ω). Extracting weakly convergent subsequences, one shows that the weak limit satisfies the state equation (3.2)–(3.3). By compactness of the embedding H01 (Ω) ,→ L2 (Ω) (see [1, p. 144]) and extracting a pointwise a.e. convergent subsequence of {yn }, one sees that the limit satisfies the state constraint (3.4). Weak lower semicontinuity of the objective (3.1) completes the proof. For the remainder of this section, let p∗ = (yd∗ , u∗d , f ∗ , α∗ , β ∗ , γ ∗ ) ∈ Π+ denote a fixed reference parameter. 
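As an aside, the solution operator T_p of Proposition 3.2 is straightforward to realize numerically. The following minimal 1D finite-difference sketch is not part of the paper; the discretization, parameter values, and control are illustrative assumptions. It evaluates T_p(u) by Newton's method, whose Jacobian is the same linearized operator −DΔ + (3βy^2 + α) that appears in the derivative T_p'(u).

```python
# Minimal 1D finite-difference sketch (not from the paper) of evaluating the
# solution operator T_p(u) of the semilinear state equation (3.2)-(3.3),
#   -D y'' + beta*y^3 + alpha*y = u + f,   y(0) = y(1) = 0,
# by Newton's method. All data below are illustrative assumptions.
import numpy as np

n = 100
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
D, alpha, beta = 1.0, 1.0, 5.0              # illustrative parameter p
f = np.zeros(n)
u = 10.0 * np.sin(2 * np.pi * x)            # illustrative control

# Discrete negative Laplacian with homogeneous Dirichlet conditions
L = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2

def solve_state(u, y0=None, tol=1e-10, maxit=30):
    """Newton iteration for  D*L@y + beta*y^3 + alpha*y - u - f = 0."""
    y = np.zeros(n) if y0 is None else y0.copy()
    for _ in range(maxit):
        residual = D * (L @ y) + beta * y**3 + alpha * y - u - f
        if np.linalg.norm(residual) < tol:
            break
        # Jacobian: D*L + diag(3*beta*y^2 + alpha), cf. the linearized equation
        J = D * L + np.diag(3 * beta * y**2 + alpha)
        y -= np.linalg.solve(J, residual)
    return y

y = solve_state(u)
print("max |y| =", np.abs(y).max())
```

Since the monotone structure of the semilinear term makes the Jacobian positive definite, the iteration converges from the zero initial guess in this setting.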
Our strategy for proving the Lipschitz dependence of solutions for P(p) near p∗ with respect to changes in the parameter p is as follows: 1. We verify a Slater condition and show that for every local optimal solution of P(p∗ ), there exists an adjoint state and a Lagrange multiplier satisfying a certain first order necessary optimality system (Proposition 3.5). 2. We pick a solution (y ∗ , u∗ , λ∗ ) of the first order optimality system (for instance the global minimizer) and rewrite the optimality system as a generalized equation. 3. We linearize this generalized equation and introduce new perturbations δ which correspond to right hand side perturbations of the optimality system. We identify this generalized equation with the optimality system of an auxiliary linear-quadratic optimal control problem AQP(δ), see Lemma 3.7. 4. We assume a coercivity condition (AC) for the Hessian of the Lagrangian at (y ∗ , u∗ , λ∗ ) and use the results obtained in Section 2 to prove the existence and uniqueness of solutions to AQP(δ) and their Lipschitz continuity with respect to δ. Consequently, the solutions to the linearized generalized equation from Step 3 are unique and depend Lipschitz continuously on δ (Proposition 3.9). 5. In virtue of an implicit function theorem for generalized equations [14], the solutions of the optimality system for P(p) near p∗ are shown to be locally unique and to depend Lipschitz continuously on the perturbation p (Theorem 3.10). 1. State Constrained Optimal Control Problems 21 6. We verify that the coercivity condition (AC) implies second order sufficient conditions, which are then shown to be stable under perturbations, to the effect that solutions of the optimality system are indeed local optimal solutions of the perturbed problem (Theorem 3.11). We refer to the individual steps as Step 1–Step 6 and begin with Step 1. For the proof of adjoint states and Lagrange multipliers, we verify the following Slater condition: Lemma 3.4 (Slater Condition). Let p ∈ Π+ and let u be a local optimal solution for problem P(p) with optimal state y = Tp (u). Then there exists u ep ∈ L2 (Ω) such that ye := Tp (u) + Tp0 (u)(e up − u) (3.5) lies in the interior of the set of admissible states KW . Proof. By Assumption 3.1 there exists an interior point ye of KW . Since Tp0 (u) is an isomorphism, u e can be chosen such that (3.5) is satisfied. Using this Slater condition, the following result follows directly from the abstract multiplier theorem in [10, Theorem 5.2]: Proposition 3.5 (Lagrange Multipliers). Let p ∈ Π+ and let (y, u) ∈ W × L2 (Ω) be a local optimal solution for problem P(p). Then there exists a unique adjoint variable λ ∈ L2 (Ω) and unique Lagrange multiplier µ ∈ W 0 such that Z Z Z (3.6) −D λ∆y + (3β|y|2 + α)λ y = − (y − yd ) y − hy, µiW,W 0 ∀y ∈ W Ω Ω Ω hy, µiW,W 0 ≤ hy, µiW,W 0 γ(u − ud ) − λ = 0 ∀y ∈ KW (3.7) on Ω. (3.8) From now on, we denote by (y ∗ , u∗ , λ∗ ) a local optimal solution of (3.1)–(3.4) for the parameter p∗ with corresponding adjoint state λ∗ and multiplier µ∗ . Our next Step 2 is to rewrite the optimality system as a generalized equation in the form 0 ∈ F (y, u, λ; p)+N (y) where N is a set-valued operator which represents the variational inequality (3.7) using the dual cone of the admissible set KW . We define F : W × L2 (Ω) × L2 (Ω) × Π → W 0 × L2 (Ω) × L2 (Ω) −D∆λ + (3βy 2 + α)λ + (y − yd ) F (y, u, λ; p) = γ(u − ud ) − λ 3 −D∆y + βy + αy − u − f and N (y) = {µ ∈ W 0 : hy − y, µiΩ ≤ 0 for all y ∈ KW } × {0} × {0} ⊂ Z if y ∈ KW , and N (y)R= ∅ else. 
The term ∆λ is understood in the sense of distributions, i.e., h∆λ, φiW 0 ,W = Ω λ∆φ for all φ ∈ W. It is now easy to check that the optimality system (3.2)–(3.3), (3.6)–(3.7) is equivalent to the generalized equation 0 ∈ F (y, u, λ; p) + N (y). (3.9) + Hence a solution (y, u, λ) of (3.9) for given p ∈ Π will be called a critical point. For future reference, we summarize the following evident properties of the operator F : Lemma 3.6 (Properties of F ). (a) F is partially Fréchet differentiable with respect to (y, u, λ) in a neighborhood of (y ∗ , u∗ , λ∗ ; p∗ ). (This partial derivative is denoted by F 0 .) (b) The map (y, u, λ; p) 7→ F 0 (y, u, λ; p) is continuous at (y ∗ , u∗ , λ∗ ; p∗ ). 22 Stability and Sensitivity Analysis (c) F is Lipschitz in p, uniformly in (y, u, λ) at (y ∗ , u∗ , λ∗ ), i.e., there exist L > 0 and neighborhoods U of (y ∗ , u∗ , λ∗ ) in W × L2 (Ω) × L2 (Ω) and V of p∗ in P such that kF (y, u, λ; p1 ) − F (y, u, λ; p2 )k ≤ L kp1 − p2 kP for all (y, u, λ) ∈ U and all p1 , p2 ∈ V . In Step 3 we set up the following linearization: δ ∈ F (y ∗ , u∗ , λ∗ ; p∗ ) + F 0 (y ∗ , u∗ , λ∗ ; p∗ ) y−y ∗ u−u∗ λ−λ∗ ! + N (y). (3.10) For the present example, (3.10) reads −D∗ ∆λ + (3β ∗ |y ∗ |2 +α∗ )λ + 6β ∗ y ∗ λ∗ (y−y ∗ ) + y−yd∗ δ1 δ2 ∈ + N (y). (3.11) γ ∗ (u − u∗d ) − λ ∗ ∗ ∗ 2 ∗ ∗ ∗ 3 ∗ δ3 −D ∆y + (3β |y | + α )y − 2β (y ) − u − f We confirm in Lemma 3.7 below that (3.11) is exactly the first order optimality system for the following auxiliary linear–quadratic optimal control problem, termed AQP(δ): Z γ∗ 1 ∗ 2 ∗ ky − yd kL2 (Ω) + 3β y ∗ λ∗ (y − y ∗ )2 + ku − u∗d k2L2 (Ω) Minimize 2 2 Ω Z (3.12) − hy, δ1 iW,W 0 − u δ2 Ω u ∈ L2 (Ω) over s.t. −D∗ ∆y + (3β ∗ |y ∗ |2 + α∗ ) y = u + f ∗ + 2β ∗ (y ∗ )3 + δ3 y=0 and ya ≤ y ≤ yb on Ω (3.13) on ∂Ω (3.14) on Ω. (3.15) Lemma 3.7. Let δ ∈ W 0 ×L2 (Ω)×L2 (Ω) be arbitrary. If (y, u) ∈ W ×L2 (Ω) is a local optimal solution for AQP(δ), then there exists a unique adjoint variable λ ∈ L2 (Ω) and unique Lagrange muliplier µ ∈ W 0 such that (3.11) is satisfied with µ ∈ N (y). Proof. We note that the state equation (3.13)–(3.14) defines an affine solution operator T : L2 (Ω) → W which turns out to satisfy T (u) = Tp∗ (u∗ ) + Tp0 ∗ (u∗ )(u − u∗ + δ3 ). Hence if u is a local optimal solution of (3.12)–(3.15) with optimal state y = T (u), then ye and u ep∗ − δ3 , taken from Lemma 3.4, satisfy the Slater condition ye = T (u) + 0 ∗ T (u)(e up − δ3 − u) with ye in the interior of KW . Along the lines of Casas [9], or using the abstract multiplier theorem [10, Theorem 5.2], one proves as in Proposition 2.4 that there exist λ ∈ L2 (Ω) and µ ∈ W 0 such that Z Z −D∗ λ∆y + (3β ∗ |y ∗ |2 + α∗ )λ Ω Ω + 6β ∗ y ∗ λ∗ (y − y ∗ ) + y − yd∗ ] y = hy, δ1 − µiW,W 0 ∀y ∈ W γ ∗ (u − u∗d ) − λ = δ2 on Ω hµ, y − yiW 0 ,W ≤ 0 ∀y ∈ KW hold. That is, −D∗ ∆λ + (3β ∗ |y ∗ |2 + α∗ )λ + 6β ∗ y ∗ λ∗ (y − y ∗ ) + y − yd∗ − δ1 + µ = 0, and µ ∈ N (y) holds. Hence, (3.11) is satisfied. 1. State Constrained Optimal Control Problems 23 In order that AQP(δ) has a unique global solution, we assume the following coercivity property: Assumption 3.8. Suppose that at the reference solution (y ∗ , u∗ ) with corresponding adjoint state λ∗ , there exists ρ > 0 such that Z γ∗ 1 kyk2L2 (Ω) + 3β ∗ y ∗ λ∗ |y|2 + kuk2L2 (Ω) ≥ ρ kyk2H 2 (Ω) + kuk2L2 (Ω) (AC) 2 2 Ω holds for all (y, u) ∈ W × L2 (Ω) which obey −D∗ ∆y + (3β ∗ |y ∗ |2 + α∗ ) y = u y=0 ∗ on Ω (3.16a) on ∂Ω. 
(3.16b) ∗ ∗ Note that Assumption 3.8 is satisfied if β ky λ kL2 (Ω) is sufficiently small, since then the second term in (AC) can be absorbed into the third. Proposition 3.9. Suppose that Assumption 3.8 holds and let δ ∈ W 0 × L2 (Ω) × L2 (Ω) be given. Then AQP(δ) is strictly convex and thus it has a unique global solution. The generalized equation (3.11) is a necessary and sufficient condition for local optimality, hence (3.11) is also uniquely solvable. Moreover, the solution depends Lipschitz continuously on δ. Proof. Due to (AC), the quadratic part of the objective (3.12) is strictly convex, independent of δ. Hence we may repeat the proof of Theorem 2.3 with only minor modifications due to the now different objective (3.12). The existence of a unique adjoint state follows as in Proposition 2.4 and it is Lipschitz in δ by (2.10). We conclude that for any given δ, AQP(δ) has a unique solution (y, u) and adjoint state λ which depend Lipschitz continuously on δ. In addition, the necessary conditions (3.11) are sufficient, hence the generalized equation (3.10) is uniquely solvable and its solution depends Lipschitz continuously on δ. We note in passing that the property assured by Proposition 3.9 is called strong regularity of the generalized equation (3.9). We are now in the position to give our main theorem (Step 5): Theorem 3.10 (Lipschitz Stability for P(p)). Let Assumption 3.8 be satisfied. Then there are numbers ε, ε0 > 0 such that for any two parameter vectors (yd0 , u0d , f 0 , D0 , α0 , β 0 , γ 0 ) and (yd00 , u00d , f 00 , D00 , α00 , β 00 , γ 00 ) in the ε-ball around p∗ in Π, there are critical points (y 0 , u0 , λ0 ) and (y 00 , u00 , λ00 ), i.e., solutions of (3.9), which are unique in the ε0 -ball of (y ∗ , u∗ , λ∗ ). These solutions depend Lipschitz continuously on the parameter perturbation, i.e., there exists L > 0 such that ky 0 − y 00 kH 2 (Ω) + ku0 − u00 kL2 (Ω) + kλ0 − λ00 kL2 (Ω) ≤ L kyd0 − yd00 k2L2 (Ω) + ku0d − u00d k2L2 (Ω) + kf 0 − f 00 kL2 (Ω) + |D0 − D00 | + |α0 − α00 | + |β 0 − β 00 | + |γ 0 − γ 00 | . Proof. Using the properties of F (Lemma 3.6) and the strong regularity of the first order necessary optimality conditions (3.9) (Proposition 3.9), the claim follows directly from the implicit function theorem for generalized equations [14, Theorem 2.4 and Corollary 2.5]. In the sequel, we denote these critical points by (yp , up , λp ). Finally, in Step 6 we are concerned with second order sufficient conditions: Theorem 3.11 (Second Order Sufficient Conditions). Suppose that Assumption 3.8 holds and that ya , yb ∈ H 2 (Ω). Then second order sufficient conditions are satisfied at (y ∗ , u∗ ). Moreover, there exists ε > 0 (possibly smaller than above) such that second 24 Stability and Sensitivity Analysis order sufficient conditions hold also at the perturbed critical points in the ε-ball around p∗ . Hence they are indeed local minimizers of the perturbed problems P(p). Proof. In order to apply the theory of Maurer [29], we make the following identifications: G1 (y, u) = ∆y − β y 3 − α y + u + f K1 = {0} ⊂ Y1 = L2 (Ω) G2 (y, u) = (y − ya , yb − y)> K2 = [{ϕ ∈ H 2 (Ω) : ϕ ≥ 0 on Ω}]2 ⊂ Y2 = [H 2 (Ω)]2 . Note that K2 is a convex closed cone of Y2 with nonempty interior. For instance, ϕ ≡ 1 is an interior point. Since Π+ is open, one has p ∈ Π+ for all p such that kp − p∗ k < ε for sufficiently small ε. Consequently, the Slater condition (Lemma 3.4) is satisfied also at the perturbed critical points. 
That is, there exists u ep such that ye = Tp (up ) + Tp0 (up )(e up − up ) holds. This entails that (yp , up ) is a regular point in the sense of [29, equation (2.3)] with the choice 0 Tp (up )(e up − up ) . h= u ep − up The multiplier theorem [29, Theorem 2.1] yields the existence of λp and nonnegative − 0 µ+ p , µp ∈ W which coincide with our adjoint variable and state constraint multiplier − via µp = µ+ p − µp . We continue by defining the Lagrangian L(y, u, λ, µ+ , µ− ; p) = γ 1 ky − yd k2L2 (Ω) + ku − ud k2L2 (Ω) 2 Z 2 + −∆y + βy 3 + αy − u − f λ Ω + hya − y, µ− iW,W 0 + hy − yb , µ+ iW,W 0 . By coercivity assumption (AC), abbreviating x = (y, u), we find that the Lagrangian’s second derivative with respect to x, Z 1 γ∗ Lxx (y ∗ , u∗ , λ∗ ; p∗ )(x, x) = kyk2L2 (Ω) + 3β ∗ y ∗ λ∗ |y|2 + kuk2L2 (Ω) 2 2 Ω (which no longer depends on µ) is coercive on the space of all (y, u) satisfying (3.16), thus, in particular, the second order sufficient conditions [29, Theorem 2.3] are satisfied at the nominal critical point (y ∗ , u∗ , λ∗ ). We now show that (AC) continues to hold at the perturbed Kuhn-Tucker points. The technique of proof is inspired by [27, Lemma 5.2]. For a parameter p from the ε-ball around p∗ , we denote by (yp , up , λp ) the corresponding solution of the first order necessary conditions (3.9). One easily sees that Lxx (yp , up , λp ; p)(x, x) − Lxx (y ∗ , u∗ , λ∗ ; p∗ )(x, x) ≤ c1 ε0 kxk2 (3.17) holds for some c1 > 0 and for all x = (y, u) ∈ W × L2 (Ω), the norm being the usual norm of the product space. For arbitrary u ∈ L2 (Ω), let y satisfy the linear PDE −D∆y + (3βyp2 + α)y = u y=0 on Ω (3.18a) on ∂Ω. (3.18b) Let y be the solution to (3.16) corresponding to the control u, then y − y satisfies −D∗ ∆y + (3β ∗ |y ∗ |2+α∗ )y = (3β ∗ |y ∗ |2+α∗ )− (3βyp2+α) y + (D−D∗ )∆y on Ω 1. State Constrained Optimal Control Problems 25 and y = 0 on ∂Ω, i.e., by the standard a priori estimate and boundedness of k3βyp2 + αkL∞ (Ω) near p∗ , ky − ykH 2 (Ω) ≤ c2 ε0 kykH 2 (Ω) (3.20) holds with some c2 > 0. Using the triangle inequality, we obtain from (3.20) ky − ykH 2 (Ω) ≤ c2 ε0 kykH 2 (Ω) . 1 − c2 ε0 We have thus proved that for any x = (y, u) which satisfies (3.18), there exists x = (y, u) which satisfies (3.16) such that kx − xk ≤ c2 ε0 kxk. 1 − c2 ε0 (3.21) Using the estimate from Maurer and Zowe [30, Lemma 5.5], it follows from (3.21) that Lxx (y ∗ , u∗ , λ∗ ; p∗ )(x, x) ≥ ρ0 kxk2 (3.22) holds with some ρ0 > 0. Combining (3.17) and (3.22) finally yields Lxx (yp , up , λp ; p)(x, x) ≥ Lxx (y ∗ , u∗ , λ∗ ; p∗ )(x, x) − c1 ε0 kxk2 ≥ (ρ0 − c1 ε0 )kxk2 which proves that (AC) holds at the perturbed Kuhn-Tucker points, possibly after further reducing ε0 . Concluding as above for the nominal solution, the second order sufficient conditions in [29, Theorem 2.3] imply that (yp , up ) is in fact a local optimal solution for our problem (3.1)–(3.4). 4. Linear–quadratic boundary control In this section, we briefly cover the case of optimal boundary control of a linear elliptic equation with quadratic objective. Due to the similarity of the arguments to the ones used in Section 2, they are kept short. We consider the optimal control problem, subject to perturbations δ = (δ1 , δ2 , δ3 ): Z Z γ 1 2 2 (4.1) ky − yd kL2 (Ω) + ku − ud kL2 (∂Ω) − y dδ1 − u δ2 Minimize 2 2 Ω ∂Ω u ∈ L2 (∂Ω) over s.t. −div (A∇y) + a0 y = f ∂y/∂nA + β y = u + δ3 and ya ≤ y ≤ yb on Ω (4.2) on ∂Ω (4.3) on Ω. (4.4) where ∂/∂nA denotes the co-normal derivative of y corresponding to A, i.e., ∂y/∂nA = n>A∇y. 
The standing assumption for this section is the following one:

Assumption 4.1. Let $\Omega$ be a bounded domain in $\mathbb{R}^N$ ($N \in \{1,2\}$) with $C^{1,1}$ boundary $\partial\Omega$, see [20, p. 5]. The state equation is governed by an operator with symmetric $N \times N$ coefficient matrix $A$ with entries $a_{ij}$ which are Lipschitz continuous on $\bar\Omega$. We assume the condition of uniform ellipticity: there exists $m_0 > 0$ such that $\xi^\top A\, \xi \ge m_0 |\xi|^2$ for all $\xi \in \mathbb{R}^N$ and almost all $x \in \Omega$. The coefficient $a_0 \in L^\infty(\Omega)$ is assumed to satisfy $\operatorname{ess\,inf} a_0 > 0$, while $\beta \in L^\infty(\partial\Omega)$ is nonnegative. Finally, the source term $f$ is an element of $L^2(\Omega)$. Again, $y_d \in L^2(\Omega)$ and $u_d \in L^2(\partial\Omega)$ denote desired states and controls, while $\gamma$ is a positive number. The bounds $y_a$ and $y_b$ may be arbitrary functions on $\Omega$ such that the admissible set
$$K_{C(\bar\Omega)} = \{y \in C(\bar\Omega) : y_a \le y \le y_b \text{ on } \Omega\}$$
is nonempty.

Note that we restrict ourselves to one- and two-dimensional domains, as in three dimensions we would need the control $u \in L^s(\partial\Omega)$ for some $s > 2$ to obtain solutions in $C(\bar\Omega)$ for which a pointwise state constraint is meaningful.

Proposition 4.2 (The State Equation). Under Assumption 4.1, and given $u$ and $\delta_3$ in $L^2(\partial\Omega)$, the state equation (4.2)–(4.3) has a unique solution $y \in H^1(\Omega) \cap C(\bar\Omega)$ in the weak sense:
$$\int_\Omega A \nabla y \cdot \nabla \bar y + \int_\Omega a_0\, y\, \bar y + \int_{\partial\Omega} \beta\, y\, \bar y = \int_\Omega f\, \bar y + \int_{\partial\Omega} (u + \delta_3)\, \bar y \quad \text{for all } \bar y \in H^1(\Omega). \qquad (4.5)$$
The solution verifies the a priori estimate
$$\|y\|_{H^1(\Omega)} + \|y\|_{C(\bar\Omega)} \le c_A \big( \|u\|_{L^2(\partial\Omega)} + \|\delta_3\|_{L^2(\partial\Omega)} + \|f\|_{L^2(\Omega)} \big).$$

Proof. Uniqueness and existence of the solution in $H^1(\Omega)$ and the a priori bound in $H^1(\Omega)$ follow directly from the Lax-Milgram Theorem applied to the variational equation (4.5). The proof of $C(\bar\Omega)$ regularity and the corresponding a priori estimate follow from Casas [10, Theorem 3.1] if $\beta\, y$ is considered a right hand side term.

The perturbations are taken as $(\delta_1, \delta_2, \delta_3) \in \mathcal{M}(\bar\Omega) \times L^2(\partial\Omega) \times L^2(\partial\Omega)$. They comprise in particular perturbations of the desired state $y_d$ and control $u_d$. Notice that $\delta_3$ affects only the boundary data so that, as in the proof of Theorem 2.3, we can absorb this perturbation into the control and obtain an admissible set independent of $\delta$.

Theorem 4.3 (Lipschitz Continuity). For any $\delta = (\delta_1, \delta_2, \delta_3) \in \mathcal{M}(\bar\Omega) \times L^2(\partial\Omega) \times L^2(\partial\Omega)$, problem (4.1)–(4.4) has a unique solution. Moreover, there exists a constant $L > 0$ such that for any two $(\delta_1', \delta_2', \delta_3')$ and $(\delta_1'', \delta_2'', \delta_3'')$, the corresponding solutions of (4.1)–(4.4) satisfy
$$\|y' - y''\|_{H^1(\Omega)} + \|y' - y''\|_{C(\bar\Omega)} + \|u' - u''\|_{L^2(\partial\Omega)} \le L \big( \|\delta_1' - \delta_1''\|_{\mathcal{M}(\bar\Omega)} + \|\delta_2' - \delta_2''\|_{L^2(\partial\Omega)} + \|\delta_3' - \delta_3''\|_{L^2(\partial\Omega)} \big).$$

Similar to the distributed control case, if $K_{C(\bar\Omega)}$ has nonempty interior, one can prove the existence of an adjoint state $\lambda \in W^{1,s}(\Omega)$ for all $s \in [1, \tfrac{N}{N-1})$ and a Lagrange multiplier $\mu \in \mathcal{M}(\bar\Omega)$ such that
$$\langle \mu, \bar y - y \rangle_{\mathcal{M}(\bar\Omega), C(\bar\Omega)} \le 0 \quad \forall\, \bar y \in K_{C(\bar\Omega)} \qquad (4.6a)$$
$$\gamma (u - u_d) - \lambda = \delta_2 \quad \text{on } \partial\Omega \qquad (4.6b)$$
$$-\operatorname{div}(A \nabla\lambda) + a_0\, \lambda = -(y - y_d) - \mu_\Omega + \delta_1|_\Omega \quad \text{on } \Omega \qquad (4.6c)$$
$$\partial\lambda/\partial n_A + \beta\, \lambda = -\mu_{\partial\Omega} + \delta_1|_{\partial\Omega} \quad \text{on } \partial\Omega \qquad (4.6d)$$
where (4.6c) is understood in the sense of distributions, and (4.6d) holds in the sense of traces (see Casas [10]). The measures $\mu_\Omega$ and $\mu_{\partial\Omega}$ are obtained by restricting $\mu$ to $\Omega$ and $\partial\Omega$, respectively, and the same splitting applies to $\delta_1$. Note that again, we have no stability result for the Lagrange multiplier $\mu$, and hence we cannot derive a stability result for the adjoint state $\lambda$ from (4.6c)–(4.6d). We merely obtain from (4.6b) that on the boundary $\partial\Omega$,
$$\|\lambda' - \lambda''\|_{L^2(\partial\Omega)} \le (\gamma L + 1)\, \|\delta' - \delta''\|$$
holds.
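For completeness, this boundary estimate is an immediate consequence of the gradient equation (4.6b) together with Theorem 4.3 (using the product norm on the perturbation space):
$$\lambda' - \lambda'' = \gamma(u' - u'') - (\delta_2' - \delta_2'') \ \text{on } \partial\Omega
\quad\Longrightarrow\quad
\|\lambda' - \lambda''\|_{L^2(\partial\Omega)} \le \gamma \|u' - u''\|_{L^2(\partial\Omega)} + \|\delta_2' - \delta_2''\|_{L^2(\partial\Omega)} \le (\gamma L + 1)\, \|\delta' - \delta''\|.$$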
Unless the state constraint is restricted to the boundary ∂Ω, this difficulty prevents the treatment of a semilinear boundary control case along the lines of Section 3. 1. State Constrained Optimal Control Problems 27 5. Conclusion In this paper, we have proved the Lipschitz stability with respect to perturbations of solutions to pointwise state-constrained optimal control problems for elliptic equations. For distributed control, it was shown how the stability result for linear state equations can be extended to the semilinear case, using an implicit function theorem for generalized equations. In the boundary control case, this method seems not applicable since we are lacking a stability estimate for the state constraint multiplier and thus for the adjoint state on the domain Ω. This is due to the fact that the control variable and the state constraint act on different parts of the domain Ω. Acknowledgments The author would like to thank the anonymous referees for their suggestions which have led to a significant improvement of the presentation. This work was supported in part by the Austrian Science Fund under SFB F003 ”Optimization and Control”. References [1] Adams, R, Sobolev Spaces. New York: Academic Press 1975. [2] Alt, W., The Lagrange-Newton method for infinite-dimensional optimization problems. Numer. Funct. Anal. Optim. 11 (1990), 201 – 224. [3] Arada, N. and Raymond, J. P., Optimality conditions for state-constrained Dirichlet boundary control problems. J. Optim. Theory Appl. 102 (1999)(1), 51 – 68. [4] Arada, N. and Raymond, J. P., Optimal control problems with mixed control-state constraints. SIAM J. Control Optim. 39 (2000)(5), 1391 – 1407. [5] Arada, N. and Raymond, J. P., Dirichlet boundary control of semilinear parabolic equations (II): Problems with pointwise state constraints. Appl. Math. Optim. 45 (2002)(2), 145 – 167. [6] Bergounioux, M., On boundary state constrained control problems. Numer. Funct. Anal. Optim. 14 (1993)(5–6), 515 – 543. [7] Bergounioux, M., Optimal control of parabolic problems with state constraints: A penalization method for optimality conditions. Appl. Math. Optim. 29 (1994)(3), 285 – 307. [8] Bergounioux, M. and Tröltzsch, F., Optimality conditions and generalized bang-bang principle for a state-constrained semilinear parabolic problem. Numer. Funct. Anal. Optim. 17 (1996)(5– 6), 517 – 536. [9] Casas, E., Control of an elliptic problem with pointwise state constraints. SIAM J. Control Optim. 24 (1986)(6), 1309 – 1318. [10] Casas, E., Boundary control of semilinear elliptic equations with pointwise state constraints. SIAM J. Control Optim. 31 (1993)(4), 993 – 1006. [11] Casas, E., Raymond, J. P. and Zidani, H., Pontryagin’s principle for local solutions of control problems with mixed control-state constraints. SIAM J. Control Optim., 39 (2000)(4), 1182 – 1203. [12] Casas, E. and Tröltzsch, F., Second-order necessary optimality conditions for some stateconstrained control problems of semilinear elliptic equations. Appl. Math. Optim. 39 (1999)(2), 211 – 227. [13] Casas, E., Tröltzsch, F. and Unger, A., Second order aufficient optimality conditions for some state-constrained control problems of semilinear elliptic equations. SIAM J. Control Optim. 38 (2000)(5), 1369 – 1391. [14] Dontchev, A., Implicit function theorems for generalized equations. Math. Programming 70 (1995), 91 – 106. [15] Dontchev, A. and Hager, W., Lipschitzian stability for state constrained nonlinear optimal control problems. SIAM J. Control Optim. 36 (1998)(2), 698 – 718. 
[16] Ekeland, I. and Temam, R., Convex Analysis and Variational Problems. Amsterdam: NorthHolland 1976. [17] Folland, G., Real Analysis. New York: Wiley 1984. [18] Griesse, R., Parametric sensitivity analysis in optimal control of a reaction-diffusion system (I): Solution differentiability. Numer. Funct. Anal. Optim. 25 (2004)(1–2), 93 – 117. [19] Griesse, R., Parametric sensitivity analysis in optimal control of a reaction-diffusion system (II): Practical methods and examples. Optim. Methods Softw. 19 (2004)(2), 217 – 242. [20] Grisvard, P., Elliptic Problems in Nonsmooth Domains. Boston: Pitman 1985. 28 Stability and Sensitivity Analysis [21] Gunzburger, M., Hou, L. and Svobodny, T., Finite element approximations of an optimal control problem associated with the scalar Ginzburg–Landau equation. Comput. Math. Appl. 21 (1991)(2–3), 123 – 131. [22] Hager, W.: Lipschitz continuity for constrained processes. SIAM J. Control Optim. 17 (1979), 321 – 338. [23] Ito, K. and Kunisch, K., Sensitivity analayis of solutions to optimization problems in Hilbert spaces with applications to optimal control and estimation. J. Diff. Equations 99 (1992)(1), 1 – 40. [24] Malanowski, K., Stability and sensitivity of solutions to nonlinear optimal control problems. Appl. Math. Optim. 32 (1995)(2), 111 – 141. [25] Malanowski, K., Sensitivity analysis for parametric optimal control of semilinear parabolic equations. J. Convex Anal. 9 (2002)(2), 543 – 561. [26] Malanowski, K., Büskens, C. and Maurer, H., Convergence of approximations to nonlinear optimal control problems. In: Mathematical Programming with Data Perturbations (ed.: A. Fiacco). Lecture Notes Pure Appl. Math. 195. New York: Dekker 1998, pp. 253 – 284 [27] Malanowski, K. and Tröltzsch, F., Lipschitz stability of solutions to parametric optimal control for parabolic equations. Z. Anal. Anwendungen 18 (1999)(2), 469 – 489. [28] Malanowski, K. and Tröltzsch, F., Lipschitz stability of solutions to parametric optimal control for elliptic equations. Control Cybernet. 29 (2000), 237 – 256. [29] Maurer, H., First and second order sufficient optimality conditions in mathematical programming and optimal control. Math. Programming Study 14 (1981), 163 – 177. [30] Maurer, H. and Zowe, J., First and second order necessary and sufficient optimality conditions for infinite-dimensional programming problems. Math. Programming 16 (1979), 98 – 110. [31] Raymond. J. P., Nonlinear boundary control of semilinear parabolic problems with pointwise state constraints. Discrete Contin. Dynam. Systems Series A 3 (1997)(3), 341 – 370. [32] Raymond. J. P., Pontryagin’s principle for state-constrained control problems governed by parabolic equations with unbounded controls. SIAM J.Control Optim. 36 (1998)(6), 1853 – 1879. [33] Raymond, J. P. and Tröltzsch, F., Second order sufficient optimality conditions for nonlinear parabolic control problems with state constraints. Discrete Contin. Dynam. Systems Series A 6 (2000)(2), 431 – 450. [34] Robinson, St. M., Strongly regular generalized equations. Math. Oper. Res. 5 (1980)(1), 43 – 62. [35] Rudin, W., Real and Complex Analysis. New York: McGraw–Hill 1987. [36] Tröltzsch, F., Lipschitz stability of solutions of linear-quadratic parabolic control problems with respect to perturbations. Dynam. Contin. Discrete Impuls. Systems Series A 7 (2000)(2), 289 – 306. [37] Zeidler, E., Nonlinear Functional Analysis and its Applications (Vol. II/B). New York: Springer 1990. 
[38] Zeidler, E., Applied Functional Analysis: Main Principles and their Applications. New York: Springer 1995. 2. Mixed State Constrained Optimal Control Problems 29 2. Lipschitz Stability of Solutions for Elliptic Optimal Control Problems with Pointwise Mixed Control-State Constraints W. Alt, R. Griesse, N. Metla and A. Rösch: Lipschitz Stability for Elliptic Optimal Control Problems with Mixed Control-State Constraints, submitted In this manuscript, we analyze an optimal control problem of type (Pmc (δ)), but with additional pure control constraints. The problem under consideration is Minimize (Pmcc (δ)) subject to γ 1 ky − yd k2L2 (Ω) + ku − ud k2L2 (Ω) − (δ1 , y)Ω − (δ2 , u)Ω 2 2 −∆y = u + δ3 in Ω, y=0 and on Γ. u − δ4 ≥ 0 ε u + y − δ 5 ≥ yc in Ω, in Ω. Here, ε and γ are positive numbers. From the point of view of Lipschitz stability, the perturbation of the inequality constraints by δ4 , δ5 , poses no particular difficulty. These perturbations are included in order to treat problems with nonlinear constraints in the future. We consider only one-sided constraints in order to simplify the discussion about the existence of regular Lagrange multipliers. Invoking a result from Rösch and Tröltzsch [2006], we prove in Lemma 2.5 of the manuscript below that for any given δ ∈ Z := L2 (Ω) × L∞ (Ω) × L2 (Ω) × L∞ (Ω) × L∞ (Ω), the unique solution (y δ , uδ ) of (Pmcc (δ)) is characterized by the existence of Lagrange multipliers µ1,2 ∈ L∞ (Ω) and an adjoint state p ∈ H 2 (Ω) ∩ H01 (Ω) satisfying (2.1) −∆p = −(y − yd ) + δ1 + µ2 in Ω, p=0 on Γ −∆y = u + δ3 in Ω, y=0 on Γ γ (u − ud ) − δ2 − p − µ1 − εµ2 = 0 a.e. in Ω, 0 ≤ µ1 ⊥ u ≥ 0 a.e. in Ω, 0 ≤ µ2 ⊥ εu + y − yc ≥ 0 a.e. in Ω. However, the Lagrange multipliers and adjoint state need not be unique, and thus one cannot prove Lipschitz stability without further assumptions (see Remark 2.6 and Proposition 3.5 of the manuscript). Remark 2.1: In the absence of the first inequality constraint u − δ4 ≥ 0, i.e., in the case µ1 = 0, we see that −∆p + ε−1 p = −(y δ − yd ) + δ1 + ε−1 γ (uδ − ud ) − ε−1 δ2 holds on Ω. In view of the uniqueness of (y δ , uδ ), also p and finally µ2 must be unique. One may now proceed in a straightforward way, testing the adjoint equation by y δ −y δ0 , testing the state equation by pδ − pδ0 and the gradient equation by uδ − uδ0 , to obtain a result analogous to Theorem 0.2 (see p. 8): ky δ − y δ0 kH 2 (Ω) + kuδ − uδ0 kL2 (Ω) + kpδ − pδ0 kH 2 (Ω) + kµ2,δ − µ2,δ0 kL2 (Ω) ≤ Lkδ − δ 0 k[L2 (Ω)]3 . We conclude that the additional level of difficulty in (Pmcc (δ)) is not caused by the mixed constraints alone but by the simultaneous presence of the two inequality constraints on the same set Ω. The assumption which allows us to overcome this difficulty is 30 Stability and Sensitivity Analysis Assumption 2.2: Suppose that there exists σ > 0 such that S1σ := {x ∈ Ω : 0 ≤ u0 ≤ σ} S2σ := {x ∈ Ω : 0 ≤ εu0 + y0 − yc ≤ σ} satisfy S1σ ∩ S2σ = ∅. We proceed by showing that there exists G > 0 such that for any δ ∈ Z satisfying (2.2) the active sets kδkZ ≤ G σ, Aδ1 := {x ∈ Ω : uδ = 0} Aδ2 := {x ∈ Ω : ε uδ + y δ − yc = 0} corresponding to (Pmcc (δ)) do not intersect, see Lemma 4.1 of the manuscript below. Consequently, the Lagrange multipliers and adjoint state are unique and will be denoted by µ1,δ , µ2,δ , and pδ , respectively. Our main result is: Theorem 2.3 ([Alt, Griesse, Metla, and Rösch, 2006, Theorem 4.2, Corollary 4.4]): Suppose that δ, δ 0 ∈ Z satisfy (2.2). 
Then there exists L∞ > 0 such that ky δ − y δ0 kH 2 (Ω) + kuδ − uδ0 kL∞ (Ω) + kpδ − pδ0 kH 2 (Ω) + kµ1,δ − µ1,δ0 kL∞ (Ω) + kµ2,δ − µ2,δ0 kL∞ (Ω) ≤ L∞ kδ − δ 0 kZ . Remark 2.4: It is possible to replace the space H 2 (Ω) ∩ H01 (Ω) for the state and adjoint state by H01 (Ω) ∩ L∞ (Ω), and thus relax the regularity requirement for Ω. 2. Mixed State Constrained Optimal Control Problems 31 LIPSCHITZ STABILITY FOR ELLIPTIC OPTIMAL CONTROL PROBLEMS WITH MIXED CONTROL-STATE CONSTRAINTS WALTER ALT, ROLAND GRIESSE, NATALIYA METLA, AND ARND RÖSCH Abstract. A family of linear-quadratic optimal control problems with pointwise mixed state-control constaints governed by linear elliptic partial differential equations is considered. All data depend on a vector parameter of perturbations. Lipschitz stability with respect to perturbations of the optimal control, the state and adjoint variables, and the Lagrange multipliers is established. 1. Introduction In this paper we consider the following class of linear-quadratic optimal control problems: Z Z γ 1 2 2 y δ1 dx − u δ2 dx (P(δ)) Minimize ky − yd kL2 (Ω) + ku − ud kL2 (Ω) − 2 2 Ω Ω subject to u ∈ L2 (Ω) and the elliptic state equation Ay = u + δ3 y=0 on Ω on ∂Ω (1.1) as well as pointwise constraints u − δ4 > 0 on Ω εu + y − δ5 > yc on Ω. (1.2) Above, Ω is a bounded domain in RN , N ∈ {2, 3}, which is convex or has a C 1,1 boundary. In (1.1), A is an elliptic operator in H01 (Ω) specified below, and ε and γ are positive numbers. The desired state yd is a function in L2 (Ω), while the desired control ud and the bound yc are functions in L∞ (Ω). Problem (P(δ)) depends on a parameter δ = (δ1 , δ2 , δ3 , δ4 , δ5 ) ∈ L2 (Ω) × L∞ (Ω) × 2 L (Ω) × L∞ (Ω) × L∞ (Ω). The main contribution of this paper is to prove, in L∞ (Ω), the Lipschitz stability of the unique optimal solution of (P(δ)) with respect to perturbations in δ. The stability analysis for linear-quadratic problems plays an essential role in the analysis of nonlinear optimal control problems, in the convergence of the SQP method, and in the convergence of solutions to a discretized problem to solutions of the continuous problem. Problems with mixed control-state constraints are important as Lavrientiev-type regularizations of pointwise state-constrained problems [15–17], but they are also interesting in their own right. In the former case, ε is a small parameter tending to zero. For the purpose of this paper, we consider ε to be fixed. Note that in addition to the mixed control-state constraints, a pure control constraint is present on the same domain. Let us put our work into perspective. One of the fundamental results in stability analysis of solutions to optimization problems is Robinson’s implicit function theorem for generalized equations (see [18]). Further developments and applications of Robinson’s result to parametric control problems involving control constraints and discretizations of control problems can be found e.g. in [2–4,6,7,10,13]. For more references see 32 Stability and Sensitivity Analysis the bibliography in [12], where the stability of optimal solutions involving nonlinear ordinary differential equations and control-state constraints was investigated. Problems of type (P(δ)) were investigated in [19] and the existence of regular (L2 ) Lagrange multipliers was proved, but no perturbations were considered. For elliptic partial differential equations, Lipschitz stability results are available only for problems with pointwise pure control constraints [14] and pure state constraints [8]. 
The presence of simultaneous control and mixed constraints (1.2) complicates our analysis. The multipliers associated to these constraints are present in every equation involving the adjoint state. Therefore, the direct estimation of the norm of the adjoint state, which was used in [8, 14], is not possible in the present situation. In addition, the simultaneous constraints preclude the transformation used in [17], where a mixed control-state constraint was converted to a pure control constraint by defining a new control v := εu + y. While this transformation simplifies our mixed constraint to v > yc + δ5 , it also converts the simple constraint u > δ4 into the mixed constraint v − y > εδ4 and nothing is gained. In order to prove the Lipschitz stability result, we need to assume that the active sets for mixed and control constraints are well separated at the reference problem δ = 0. The outline of the paper is as follows: In Section 2, we investigate some basic properties of problem (P(δ)) for a fixed parameter δ. In particular, we state a projection formula for the Lagrange multipliers. Section 3 is devoted to the Lipschitz stability analysis of an auxiliary optimal control problem. This auxiliary problem is introduced to exclude the possibility of overlapping active sets for both types of constraints. In Section 4, we prove that the solutions of the auxiliary and the original problems coincide and obtain our main results. 2. Properties of the Optimal Control Problem In this section we investigate the elliptic optimal control problem (P(δ)) with pointwise mixed control-state constraints for a fixed parameter δ. With δ = 0 the corresponding problem is considered the unperturbed problem (reference problem). Throughout, (·, ·) denotes the scalar product in L2 (Ω) or L2 (Ω)N , respectively. The following assumptions (A1)–(A3) are assumed to hold throughout the paper. Assumption. (A1) Let Ω be a bounded domain in RN , N ∈ {2, 3} which is convex or has C 1,1 boundary ∂Ω. (A2) The operator A : H01 (Ω) → H −1 (Ω) is defined as hAy, vi = a[y, v], where a[y, v] = ((∇v), A0 ∇y) + (b> ∇y, v) + (cy, v). A0 is an N × N matrix with Lipschitz continuous entries on Ω such that ξ > A0 (x)ξ > m0 |ξ|2 holds with some m0 > 0 for all ξ ∈ RN and almost all x ∈ Ω. Moreover, b ∈ L∞ (Ω)N and c ∈ L∞ (Ω). The bilinear form a[·, ·] is not necessarily symmetric but it is assumed to be continuous and coercive, i.e., a[y, v] 6 c kykH 1 (Ω) kvkH 1 (Ω) 2 a[y, y] > c kykH 1 (Ω) for all y, v ∈ H01 (Ω) with some positive constants c and c. A simple example is a[y, v] = (∇y, ∇v), corresponding to A = −∆. (A3) For the remaining data, we assume ε > 0, γ > 0, yd ∈ L2 (Ω), ud , yc ∈ L∞ (Ω) and δ ∈ Z, where Z := L2 (Ω) × L∞ (Ω) × L2 (Ω) × L∞ (Ω) × L∞ (Ω). Under these assumptions we show in this section that (P(δ)) possesses a unique solution and we characterize this solution. 2. Mixed State Constrained Optimal Control Problems Definition 2.1. A function y is called a weak solution of the elliptic PDE Ay = f on Ω y=0 on ∂Ω 33 (2.1) if y ∈ H01 (Ω) and a[y, v] = (f, v) holds for all v ∈ H01 (Ω). It is known that (2.1) has a unique weak solution in Y := H 2 (Ω) ∩ H01 (Ω). Lemma 2.2. Let assumptions (A1)–(A2) hold. For any given right hand side f ∈ L2 (Ω), there exists a unique weak solution of (2.1) in the space Y . It satisfies the a priori estimate kykH 2 (Ω) 6 CΩ kf kL2 (Ω) . (2.2) Moreover, the maximum principle holds, i.e., f > 0 a.e. on Ω implies y > 0 a.e. on Ω. Proof. 
The proof of H 2 (Ω)-regularity and the a priori estimate can be found in [9, Theorem 2.4.2.5]. For the proof of the maximum principle, we use v = y − = − min{0, y} ∈ H01 (Ω) as a test function [11]. We obtain c ky − k2H 1 (Ω) 6 a[y − , y − ] = −a[y, y − ] = −(f, y − ) 6 0, hence y − = 0 and y > 0 almost everywhere on Ω. The previous lemma gives rise to the definition of the linear solution mapping S : L2 (Ω) 3 f 7−→ y = Sf ∈ Y. We recall that due to the Sobolev embedding theorem [1], there exist C∞ > 0 and C2 > 0 such that kykL∞ (Ω) 6 C∞ kykH 2 (Ω) ∀y ∈ H 2 (Ω) kykL2 (Ω) 6 C2 kykL∞ (Ω) ∀y ∈ L∞ (Ω). Lemma 2.3. For any δ ∈ Z, problem (P(δ)) admits a feasible pair (y, u) satisfying (1.1)–(1.2). Proof. Let δ ∈ Z be given and let us define 1 u(x) := C∞ CΩ kδ3 kL2 (Ω) + kyc kL∞ (Ω) + kδ5 kL∞ (Ω) + kδ4 kL∞ (Ω) = const > 0 ε for all x ∈ Ω. Then u − δ4 > 0 holds a.e. on Ω. Moreover, we define y := S(u + δ3 ) and estimate εu + y − yc − δ5 = C∞ CΩ kδ3 kL2 (Ω) + kyc kL∞ (Ω) + kδ5 kL∞ (Ω) + ε kδ4 kL∞ (Ω) + Su + Sδ3 − yc − δ5 > C∞ CΩ kδ3 kL2 (Ω) + ε kδ4 kL∞ (Ω) + Su + Sδ3 . Due to Lemma 2.2, we have kSδ3 kL∞ (Ω) 6 C∞ kSδ3 kH 2 (Ω) 6 C∞ CΩ kδ3 kL2 (Ω) and Su > 0. It follows that εu + y − yc − δ5 > ε kδ4 kL∞ (Ω) > 0 a.e. on Ω, hence (1.1)–(1.2) are satisfied. For future reference, we define the cost functional associated to (P(δ)) Z Z γ 1 2 2 y δ1 dx − u δ2 dx J(y, u, δ) := ky − yd kL2 (Ω) + ku − ud kL2 (Ω) − 2 2 Ω Ω and the reduced cost functional ˜ δ) := J(S(u + δ3 ), u, δ). J(u, 34 Stability and Sensitivity Analysis Lemma 2.4. For any δ ∈ Z, problem (P(δ)) has a unique global optimal solution. Proof. Let δ ∈ Z be given and let us define Mδ := {u ∈ L2 (Ω) : u > δ4 , εu + S(u + δ3 ) − δ5 > yc a.e. on Ω}. Note that Mδ is a convex subset of L2 (Ω) since S is a linear operator. Mδ is nonempty due to Lemma 2.3. It is easy to see that the reduced cost functional ˜ δ) ∈ R is strictly convex on Mδ , radially unbounded and weakly Mδ 3 u 7−→ J(u, lower semicontinuous. Due to a classical result from convex analysis, see e.g., [21], (P(δ)) has a unique global solution. Let us define the Lagrange functional L : Y × L2 (Ω) × Y × L2 (Ω) × L2 (Ω) → R L(y, u, p, µ1 , µ2 ) = J(y, u, δ) + a[y, p] − (p, u + δ3 ) − (µ1 , u − δ4 ) − (µ2 , εu + y − yc − δ5 ). From the general Kuhn Tucker theory in Banach spaces, one expects that the optimal solution of (P(δ)) has associated Lagrange multipliers p ∈ L2 (Ω) and µi ∈ L∞ (Ω)∗ . However, for the problem (P(δ)) under consideration and other control problems of bottleneck type, the existence of regular Lagrange multipliers µi ∈ L∞ (Ω) was shown in [19, Theorem 7.3], which implies p ∈ Y . Lemma 2.5 (Optimality System). Let δ ∈ Z be given. (i) Suppose that (y, u) is the unique global solution of (P(δ)). Then there exist Lagrange multipliers µi ∈ L∞ (Ω), i = 1, 2, and an adjoint state p ∈ Y such that (y − yd , v) − (δ1 , v) + a[v, p] − (µ2 , v) = 0 ∀v ∈ H01 (Ω) (2.3) γ(u − ud , v) − (δ2 , v) − (p, v) − (µ1 , v) − (εµ2 , v) = 0 ∀v ∈ L2 (Ω) (2.4) ∀v ∈ H01 (Ω) (2.5) a.e. on Ω (2.6) a[y, v] − (u, v) − (δ3 , v) = 0 µ1 (u − δ4 ) = 0 µ1 > 0, u − δ4 > 0 µ2 (εu + y − yc − δ5 ) = 0 µ2 > 0, εu + y − yc − δ5 > 0 is satisfied. (ii) On the other hand, if (y ∗ , u∗ , p∗ , µ∗1 , µ∗2 ) ∈ Y × L2 (Ω) × Y × L2 (Ω) × L2 (Ω) satiesfies (2.3)–(2.6), then (y ∗ , u∗ ) is the unique global optimum of (P(δ)). Proof. Part (i) was proved in [19, Theorem 7.3]. For part (ii), let (y, u) be any admissible pair for (P(δ)), i.e., satisfying (1.1)–(1.2). 
We consider the difference γ 1 2 2 ky − y ∗ kL2 (Ω) + ku − u∗ kL2 (Ω) 2 2 + (y − y ∗ , y ∗ − yd ) − (y − y ∗ , δ1 ) + γ (u − u∗ , u∗ − ud ) − (u − u∗ , δ2 ), J(y, u, δ) − J(y ∗ , u∗ , δ) = 2 2 2 where we used kak − kbk = ka − bk + 2(a − b, b). To evaluate the terms in the scalar products, we use equations (2.3)–(2.5). First, (2.4) yields γ(u∗ − ud , u − u∗ ) − (δ2 , u − u∗ ) = (p∗ , u − u∗ ) + (µ∗1 , u − u∗ ) + (εµ∗2 , u − u∗ ). Since both (y, u) and (y ∗ , u∗ ) satisfy (2.5), we obtain for their difference that a[y − y ∗ , p∗ ] = (u − u∗ , p∗ ) 2. Mixed State Constrained Optimal Control Problems 35 holds. Finally, using v = y − y ∗ in (2.3) for p∗ , we get (y ∗ − yd , y − y ∗ ) − (δ1 , y − y ∗ ) + a[y − y ∗ , p∗ ] = (µ∗2 , y − y ∗ ). Hence we conclude J(y, u, δ) − J(y ∗ , u∗ , δ) = 1 γ 2 2 ky − y ∗ kL2 (Ω) + ku − u∗ kL2 (Ω) 2 2 + (y − y ∗ + ε(u − u∗ ), µ∗2 ) + (u − u∗ , µ∗1 ). Note that by (2.6), we obtain µ∗1 (u∗ − δ4 ) = 0 and µ∗1 (u − δ4 ) > 0 a.e. on Ω, hence (u − u∗ , µ∗1 ) > 0. Similarly, one obtains (y − y ∗ + ε(u − u∗ ), µ∗2 ) > 0. Consequently, we have γ 2 J(y, u, δ) − J(y ∗ , u∗ , δ) > ku − u∗ kL2 (Ω) 2 which shows that (y ∗ , u∗ ) is the unique global solution. Remark 2.6. The Lagrange multipliers µi and the adjoint state p associated to the unique solution of (P(δ)) need not be unique. Consider the following example on an arbitrary bounded domain Ω with Lipschitz boundary: γ 1 2 2 Minimize kykL2 (Ω) + ku − ud kL2 (Ω) 2 2 −∆y = u on Ω, u > 0 on Ω subject to y = 0 on ∂Ω, εu + y > 0 on Ω. Suppose that ud := −γ −1 (ε + S1), where 1 denotes the constant function 1. Due to the maximum principle (Lemma 2.2), ud 6 −γ −1 ε holds a.e. on Ω. Apparently, y = u = 0 is the unique solution of this problem. Any tuple (p, µ1 , µ2 ) satisfying (2.3), (2.4) and (2.6), i.e., −∆p = µ2 p=0 on Ω on ∂Ω µ1 > 0, µ2 > 0 a.e. on Ω −γud − p − µ1 − εµ2 = 0 a.e. on Ω is a set of Lagrange multipliers for the problem. It is easy to check that (p, µ1 , µ2 ) = (S1, 0, 1) and (p, µ1 , µ2 ) = (0, ε + S1, 0) both satisfy this system, and so does any convex combination. The L∞ -regularity of the Lagrange multipliers and the control will be shown by means of a projection formula. This idea was introduced in [20]. However, in that paper the situation was simpler since both inequalities could not be active simultaneously. Lemma 2.7. Suppose that δ ∈ Z and (y, u, p, µ1 , µ2 ) ∈ Y ×L2 (Ω)×Y ×L2 (Ω)×L2 (Ω) satisfy (2.4) and (2.6). Then the following projection formula o n 1 (2.7) µ1 + εµ2 = max 0, γ(max{δ4 , (yc + δ5 − y)} − ud ) − p − δ2 ε is valid. Moreover, u, µ1 , µ2 ∈ L∞ (Ω) hold. Proof. From (2.6), we obtain 1 u > δ4 hence u > max{δ4 , (yc + δ5 − y)}. u > yc +δε5 −y , ε (2.8) Plugging this into (2.4) we get 1 µ1 + εµ2 = γ(u − ud ) − p − δ2 > γ max{δ4 , (yc + δ5 − y)} − ud − p − δ2 . ε Since µ1 + εµ2 > 0, we have n o 1 µ1 + εµ2 > max 0, γ(max{δ4 , (yc + δ5 − y)} − ud ) − p − δ2 . ε (2.9) 36 Stability and Sensitivity Analysis We proceed by distinguishing two subsets of Ω. (a) On Ω1 = {x ∈ Ω : µ1 (x) > 0 or µ2 > 0}, at least one of the inequality constraints is active. Thus (2.8) yields u = max{δ4 , 1ε (yc + δ5 − y)}, equality holds in (2.9), and (2.7) follows. (b) On Ω2 = {x ∈ Ω : µ1 (x) = µ2 (x) = 0}, the left hand side in (2.9) is zero and again (2.7) follows. To show the boundedness of u, µ1 and µ2 , we see that the expression inside the inner max-function in (2.7) is an L∞ (Ω)-function due to assumption (A3) and the fact that y, p ∈ H 2 (Ω) which embeds into L∞ (Ω). 
The L∞ (Ω)-regularity is preserved by the max-function. Consequently, we have µ1 + εµ2 ∈ L∞ (Ω). Moreover, the estimate 0 6 µ1 6 µ1 + εµ2 6 kµ1 + εµ2 kL∞ (Ω) shows that µ1 ∈ L∞ (Ω), and similarly µ2 ∈ L∞ (Ω). Finally, equation (2.4), i.e u= 1 (p + µ1 + εµ2 + δ2 ) + ud γ a.e. on Ω yields u ∈ L∞ (Ω). (2.10) We have noted above that the Lagrange multipliers µi and the adjoint state p need not be unique. Hence it is impossible to prove the Lipschitz stability of these quantities without further assumptions. As a remedy, we impose a condition at the solution (y0 , u0 ) of the reference problem (P(0)) which ensures that the active sets are well separated. This leads us to the following definition: Definition 2.8. Let σ > 0 be real number. We define two subsets S1σ = {x ∈ Ω : 0 6 u0 (x) 6 σ} S2σ = {x ∈ Ω : 0 6 εu0 (x) + y0 (x) − yc (x) 6 σ}, called the security sets of level σ for (P(0)). The sets Aδ1 = {x ∈ Ω : uδ (x) − δ4 (x) = 0} Aδ2 = {x ∈ Ω : εuδ (x) + yδ (x) − yc (x) − δ5 (x) = 0} are called the active sets of problem (P(δ)). From now on we emphasize the dependence of the problem on the parameter δ and denote the unique solution of (P(δ)) by (yδ , uδ ). Assumption. (A4) We require that S1σ ∩ S2σ = Ø for some fixed σ > 0. Note that A01 ⊂ S1σ , and A02 ⊂ S2σ , i.e. A01 ∩ A02 = Ø and the active sets at the reference problem (P(0)) do not intersect. We will show in the remainder of the paper that (A4) implies that also Aδ1 ∩ Aδ2 = Ø for δ sufficiently small. More precisely, we will determine a function g(σ) such that Aδ1 ∩ Aδ2 = Ø for all kδkZ 6 g(σ). It will be shown that this assumption also guarantees the uniqueness and Lipschitz stability of the Lagrange multipliers and adjoint states. As an intermediate step, we consider in Section 3 a family of auxiliary problems (Paux (δ)), in which the active sets are separated by construction. This technique was suggested in [12] in the context of ordinary differential equations. 2. Mixed State Constrained Optimal Control Problems 37 3. Stability Analysis for an Auxiliary Problem In this section we introduce an auxiliary optimal control problem, in which we restrict the inequality constraints (1.2) to the disjoint sets S1σ and S2σ , respectively. Assumptions (A1)–(A4) are taken to hold throughout the remainder of the paper. We consider Z Z 1 γ 2 2 min ky − yd kL2 (Ω) + ku − ud kL2 (Ω) − y δ1 dx − u δ2 dx (Paux (δ)) 2 2 Ω Ω subject to the elliptic state equation Ay = u + δ3 y=0 on Ω (3.1) on ∂Ω and the pointwise constraints u − δ4 > 0 εu + y − δ5 > yc on S1σ (3.2) on S2σ . With analogous arguments as for (P(δ)), it is easy to see that (Paux (δ)) has a unique aux ∞ aux solution (yδaux , uaux δ ) ∈ Y ×L (Ω) with associated Lagrange multipliers (µ1,δ , µ2,δ ) ∈ ∞ σ ∞ σ aux L (S1 ) × L (S2 ) and adjoint state pδ ∈ Y which satisfy the following necessary and sufficient optimality system: aux (yδaux − yd , v) − (δ1 , v) + a[v, paux δ ] − (µ2,δ , v) = 0 ∀v ∈ H01 (Ω) aux aux γ(uaux − ud , v) − (δ2 , v) − (paux δ δ , v) − (µ1,δ , v) − (εµ2,δ , v) = 0 ∀v ∈ L2 (Ω) a[yδaux , v] − (uaux δ , v) − (δ3 , v) = 0 aux µaux − δ4 ) = 0 1,δ (uδ µaux 1,δ uaux δ > 0, > 0, εuaux δ + yδaux ) − δ4 > 0 aux µaux + yδaux − yc − δ5 ) = 0 2,δ (εuδ µaux 2,δ ∀v ∈ H01 (Ω) a.e. on S1σ ) − yc − δ 5 > 0 a.e. on S2σ In order to give a meaning to the scalar products in the first and second equation, aux the Lagrange multipliers µaux 1,δ and µ2,δ are extended from their respective domains σ σ of definition S1 and S2 to Ω by zero. Lemma 3.1. 
The Lagrange multipliers and adjoint state for (Paux (δ)) are unique. Proof. We exploit that S1σ ∩ S2σ = Ø by Assumption (A4) and multiply the second σ equation with the characteristic function χS1σ . Since µaux 2,δ = 0 on S1 , we obtain aux µaux − ud ) − δ2 − paux δ 1,δ = γ(uδ Likewise, by multiplying with χS2σ , we obtain a.e. on S1σ . 1 γ(uaux − ud ) − δ2 − paux a.e. on S2σ . δ δ ε We plug this expression into the adjoint equation and obtain 1 aux a0 [v, paux − yd , v) + γ(uaux − ud , χS2σ · v) − (δ2 , χS2σ · v) δ ] = (δ1 , v) − (yδ δ ε for all v ∈ H01 (Ω), where 1 a0 [v, p] := a[v, p] + (p , χS2σ · v) ε µaux 2,δ = 38 Stability and Sensitivity Analysis is a modification of the original bilinear form. Note that a0 [y, v] 6 c + ε−1 kykH 1 (Ω) kvkH 1 (Ω) 2 a0 [y, y] > c kykH 1 (Ω) and thus the problem a0 [v, p] = (f, v) for all v ∈ H01 (Ω) admits a unique solution which satisfies the a priori estimate kpkH 2 (Ω) 6 CΩ∗ kf kL2 (Ω) , (3.3) paux δ compare Lemma 2.2. Note that the equation for contains only known data and aux the unique solution (yδaux , uaux ), hence p is also unique. From the equations for δ δ aux µaux and µ we conclude the uniqueness of the Lagrange multipliers. 1,δ 2,δ 3.1. Stability Analysis in L2 . As delineated in the introduction, the original problem depends on perturbation parameters δ ∈ Z. In particular, (P(δ)) includes perturbations of the desired state in view of Z 1 1 2 2 ky − (yd + δ1 )kL2 (Ω) = ky − yd kL2 (Ω) − yδ1 + c, 2 2 Ω where c is a constant. In the same way, δ2 covers perturbations in the desired control ud , and δ3 accounts for perturbations in the right hand side of the PDE, while δ4 and δ5 are perturbations of the inequality constraints (1.2). Now we can state the main result of this section concerning the Lipschitz stability of the optimal state and control with respect to perturbations for (Paux (δ)). Proposition 3.2. Let Assumptions (A1)–(A4) be satisfied. Then there exists a constant Laux > 0 such that for any δ, δ 0 ∈ Z, the corresponding unique solutions of the auxiliary problem satisfy aux kyδaux − yδaux kH 2 (Ω) + kuaux 0 δ 0 − uδ kL2 (Ω) 6 Laux kδ 0 − δk[L2 (Ω)]5 . This result can be obtained from a general result on strong regularity for generalized equations, see [5, Theorem 5.20]. Nevertheless, we give here a short direct proof. We begin with an auxiliary result. Lemma 3.3. The Lagrange multipliers associated to the solutions (yδaux , uaux δ ) and aux , uaux (δ)) and (Paux (δ 0 )) satisfy (yδaux 0 δ 0 ) of (P aux aux aux aux aux aux aux µaux − yδaux + ε(uaux 2,δ 0 − µ2,δ , yδ 0 δ 0 − uδ ) + µ1,δ 0 − µ1,δ , uδ 0 − uδ aux 0 aux aux 0 6 µaux 2,δ 0 − µ2,δ , δ5 − δ5 + µ1,δ 0 − µ1,δ , δ4 − δ4 . Proof. Using the complementarity conditions in the optimality system, we infer and aux −µaux − δ4 ) = 0 1,δ (uδ aux 0 −µaux 1,δ 0 (uδ 0 − δ4 ) = 0 aux µaux − δ4 ) > 0 1,δ 0 (uδ aux 0 µaux 1,δ (uδ 0 − δ4 ) > 0 aux aux aux aux 0 µaux 6 µaux 1,δ 0 − µ1,δ , uδ 0 − uδ 1,δ 0 − µ1,δ , δ4 − δ4 follows. Similarly, one obtains the second part. Proof of Proposition 3.2. Let δ, δ 0 ∈ Z be arbitrary. We abbreviate δu := uaux − uaux δ δ0 and similarly for the remaining quantitites. We consider the respective optimality systems and start with the adjoint equation using v = δy as test function. We obtain 2 kδykL2 (Ω) = (δ10 − δ1 , δy) + (δµ2 , δy) − a[δy, δp]. 2. 
Mixed State Constrained Optimal Control Problems 39 Testing the difference of the second equations in the optimality system with v = δu yields 2 γ kδukL2 (Ω) = (δ20 − δ2 , δu) + (δp, δu) + (δµ1 , δu) + ε(δµ2 , δu). From the state equation, tested with δp, we get a[δy, δp] − (δu, δp) − (δ30 − δ3 , δp) = 0. Adding these equations yields 2 2 kδykL2 (Ω) + γ kδukL2 (Ω) = (δ10 − δ1 , δy) + (δ20 − δ2 , δu) − (δ30 − δ3 , δp) + (δµ2 , δy) + (δµ1 , δu) + ε(δµ2 , δu). Applying Lemma 3.3 shows that 2 2 kδykL2 (Ω) + γ kδukL2 (Ω) 6 (δ10 − δ1 , δy) + (δ20 − δ2 , δu) − (δ30 − δ3 , δp) + (δµ2 , δ50 − δ5 ) + (δµ1 , δ40 − δ4 ). Cauchy’s and Young’s inequality imply that γ 1 1 1 2 2 2 2 2 kδykL2 (Ω) + kδukL2 (Ω) 6 kδ10 − δ1 kL2 (Ω) + kδ 0 − δ2 kL2 (Ω) + κ kδpkL2 (Ω) 2 2 2 2γ 2 1 1 2 2 2 + kδ 0 − δ3 kL2 (Ω) + κ kδµ2 kL2 (Ω) + kδ 0 − δ5 kL2 (Ω) 4κ 3 4κ 5 1 2 2 + κ kδµ1 kL2 (Ω) + (3.4) kδ 0 − δ4 kL2 (Ω) , 4κ 4 where κ > 0 will be specified below. The difference of the adjoint states satisfies 1 a0 [v, δp] = (δ10 − δ1 , v) − (δy, v) + γ(δu , χS2σ · v) − (δ20 − δ2 , χS2σ · v) , ε 0 where a [·, ·] was defined in the proof of Lemma 3.1. By (3.3) we can estimate the difference of the adjoint states, kδpkL2 (Ω) 6 kδpkH 2 (Ω) 6 CΩ∗ kδ10 − δ1 kL2 (Ω) + kδykL2 (Ω) γ 1 kδukL2 (S σ ) + kδ20 − δ2 kL2 (S σ ) . (3.5) 2 2 ε ε Moreover, with the representation of the Lagrange multipliers from Lemma 3.1, we find + kδµ1 kL2 (Ω) = kδµ1 kS σ (Ω) 6 γ kδukL2 (Ω) + kδ20 − δ2 kL2 (Ω) + kδpkL2 (Ω) 1 1 γ kδukL2 (Ω) + kδ20 − δ2 kL2 (Ω) + kδpkL2 (Ω) . ε Plugging these estimates into (3.4), we obtain 1 γ c2 2 2 2 kδykL2 (Ω) + kδukL2 (Ω) 6 c1 + + c3 κ kδ 0 − δk[L2 (Ω)]5 2 2 κ 2 2 + c4 κ kδykL2 (Ω) + kδukL2 (Ω) kδµ2 kL2 (Ω) = kδµ2 kS σ (Ω) 6 2 where c1 , . . . , c4 depend only on γ, ε and CΩ∗ . Now we choose κ > 0 such that c4 κ < 21 min{1, γ}. We obtain kδuk2L2 (Ω) 6 L0 · kδ 0 − δk2[L2 (Ω)]5 . Using a priori estimate (2.2), Lipschitz stability for the state follows: kδyk2H 2 (Ω) and the proof is complete. 6 L1 · kδ 0 − δk2[L2 (Ω)]5 40 Stability and Sensitivity Analysis Corollary 3.4. There exists a constant L2 > 0 such that for any δ, δ 0 ∈ Z, the corresponding adjoint states of the auxiliary problem satisfy aux kpaux δ 0 − pδ kH 2 (Ω) 6 L2 · kδ 0 − δk[L2 (Ω)]5 . This result is evident directly from (3.5) and Proposition 3.2. 3.2. Stability Analysis in L∞ . The considerations in Section 3.1 describe the stability behavior of the auxiliary problem (Paux (δ)). However, the results are not strong enough to apply them to the original problem (P(δ)). Indeed, we will make this precise in the following remark. This is the reason why we consider stability estimates in L∞ in this subsection. The key in showing the desired estimates is the projection formula, Lemma 2.7. We emphasize that the uniform second order growth condition holds only with respect to the L2 -norm. Therefore, general stability results (e.g. [5, Theorem 5.20]) cannot be applied here. Proposition 3.5. Suppose that Assumptions (A1)–(A3) hold and that (y0 , u0 ) is the optimal solution of (P(0)) which satisfies the separation assumption (A4). Moreover, we assume that the active set A01 contains an open ball B such that µ1,0 > M > 0 holds on B. Then for every R > 0 there exists δ ∈ [L2 (Ω)]5 with kδk[L2 (Ω)]5 < R such that the dual variables for (P(δ)) are not unique. Consequently, the dual variables cannot be Lipschitz stable with respect to perturbations. 
Note that this implies in particular that the generalized equation representing the optimality system of (P(0)) is not strongly regular, see [5, Definition 5.12]. The proof is given in the appendix. Let us now start with the L∞ stability estimates for (Paux (δ)). Lemma 3.6. Let (A1)–(A4) be satisfied. Then there exists a constant L3 > 0 such that for any δ, δ 0 ∈ Z, the corresponding unique solutions of the auxiliary problem satisfy aux 0 kuaux δ 0 − uδ kL∞ (Ω) 6 L3 · kδ − δkZ . Proof. From the projection formula (2.7) we have almost everywhere on Ω aux aux aux µaux 1,δ 0 − µ1,δ + ε(µ2,δ 0 − µ2,δ ) o n 1 0 )} − ud ) − paux − δ = max 0, γ(max{δ40 , (yc + δ50 − yδaux 0 0 δ 2 ε n o 1 − max 0, γ(max{δ4 , (yc + δ5 − yδaux )} − ud ) − paux − δ 2 . δ ε Using max{a, b} − max{c, d} 6 max{a − c, b − d} twice and the fact that e 6 f implies max{0, e} 6 max{0, f }, we continue n 1 1 6 max 0, γ max{δ40 , (yc + δ50 − yδaux )} − max{δ4 , (yc + δ5 − yδaux )} 0 ε ε o aux aux 0 − (pδ0 − pδ ) − (δ2 − δ2 ) n 1 − yδaux ) 6 max 0, γ max δ40 − δ4 , (δ50 − δ5 ) − (yδaux 0 ε o aux aux 0 − (pδ0 − pδ ) − (δ2 − δ2 ) o n 1 0 kδ5 − δ5 kL∞ (Ω) + kyδaux − yδaux kL∞ (Ω) 6 γ max kδ40 − δ4 kL∞ (Ω) , 0 ε aux 0 + kpaux − p k + kδ 0 δ δ 2 − δ2 kL∞ (Ω) . L∞ (Ω) 2. Mixed State Constrained Optimal Control Problems 41 From the embedding of H 2 (Ω) into L∞ (Ω) we have aux ε−1 kyδaux − yδaux kL∞ (Ω) + kpaux 0 δ 0 − pδ kL∞ (Ω) aux − yδaux kH 2 (Ω) + kpaux 6 C∞ ε−1 kyδaux 0 δ 0 − pδ kH 2 (Ω) . By Proposition 3.2 and Corollary 3.4, the right hand side can be estimated by C∞ ε−1 Laux + L2 kδ 0 − δk[L2 (Ω)]5 . Collecting terms and replacing the norm in [L2 (Ω)]5 by the stronger norm in Z, we obtain aux aux aux 0 0 µaux a.e. on Ω. 1,δ 0 − µ1,δ + ε(µ2,δ 0 − µ2,δ ) 6 L3 kδ − δkZ Since the same inequality is obtained by exchanging the roles of δ and δ 0 , we have aux aux aux 0 0 kµaux 1,δ 0 − µ1,δ + ε(µ2,δ 0 − µ2,δ )kL∞ (Ω) 6 L3 kδ − δkZ . The claim then follows from applying the estimates above to aux = uaux δ 0 − uδ 1 0 aux aux aux aux aux (paux δ 0 − pδ ) + (µ1,δ 0 − µ1,δ ) + ε(µ2,δ 0 − µ2,δ ) + (δ2 − δ2 ) . γ Corollary 3.7. For δ 0 = 0 the previous lemma implies ku0 − uaux δ kL∞ (Ω) 6 L3 kδkZ . 4. Stability Analysis for the Original Problem In this section we formulate the main result of Lipschitz continuity for the primal and dual variables of (P(δ)). We have seen in Proposition 3.5 that the structure of the active sets of (P(δ)) can change dramatically even for arbitrarily small perturbations with respect to the L2 norm. By contrast, the stability estimates in L∞ with respect to the norm of Z are strong enough in order for the constraints to stay inactive outside of the security sets for small perturbations. This implies that for sufficiently small δ, the solutions of (P(δ)) and (Paux (δ)) coincide. We will admit δ ∈ Z which satisfy the condition kδkZ 6 g(σ) := min{g1 (σ), g2 (σ)}, where g1 (σ) := σ L3 +1 and g2 (σ) := (4.1) σ εL3 +C∞ C2 L1 +1 . Lemma 4.1. Suppose that kδkZ 6 g(σ) and that (yδaux , uaux δ ) is the unique solution aux of (Paux (δ)) with adjoint state paux and Lagrange multipliers (µaux δ 1,δ , µ2,δ ). Then the solution is feasible for the original problem (P(δ)). When the multipliers are extended aux aux aux by zero outside S1σ and S2σ , respectively, the tuple (yδaux , uaux δ , pδ , µ1,δ , µ2,δ ) satisfies aux aux the optimality system (2.3)–(2.6). In particular, (yδ , uδ ) is the unique solution of (P(δ)). aux Proof. 
The pair (yδaux , uaux (δ)), i.e., δ ) is feasible for (P uaux − δ4 > 0 on S1σ δ εuaux + yδaux − δ5 > yc δ on S2σ and we have to show εuaux δ uaux − δ4 > 0 on Ω \ S1σ δ + yδaux − δ5 > yc on Ω \ S2σ . 42 Stability and Sensitivity Analysis As u0 > σ holds a.e. on Ω \ S1σ , we have uaux − δ4 = u0 + uaux − u0 − δ4 δ δ > u0 − ku0 − uaux δ kL∞ (Ω) − kδ4 kL∞ (Ω) > σ − L3 kδkZ − kδ4 kL∞ (Ω) > σ − (L3 + 1) g1 (σ) = 0 almost everywhere on Ω \ S1σ . As for the second inequality, we have εu0 + y0 − yc > σ on Ω \ S2σ and consequently εuaux + yδaux − yc − δ5 = εu0 + y0 − yc + ε(uaux − u0 ) + (yδaux − y0 ) − δ5 δ δ aux > εu0 + y0 − yc − εku0 − uaux kL∞ (Ω) − kδ5 kL∞ (Ω) δ kL∞ (Ω) − ky0 − yδ > σ − εL3 kδkZ − C∞ C2 L1 kδkZ − kδ5 kL∞ (Ω) > σ − (εL3 + C∞ C2 L1 + 1) g2 (σ) = 0 almost everyhere on Ω \ S2σ . aux We extend the multipliers (µaux 1,δ , µ2,δ ) by zero to all of Ω. Then it is easy to see aux aux aux that (yδaux , uaux δ , pδ , µ1,δ , µ2,δ ) satisfies the optimality system (2.3)–(2.6), which is a sufficient condition for optimality of (P(δ)) by Lemma 2.5. Theorem 4.2. There exists a constant L > 0 such that for any δ, δ 0 ∈ Z satisfying (4.1), the unique solutions (yδ , uδ ) and (yδ0 , uδ0 ) of (P(δ)) and (P(δ 0 )) satisfy kyδ0 − yδ kH 2 (Ω) + kuδ0 − uδ kL∞ (Ω) 6 L · kδ 0 − δkZ . (4.2) 0 Proof. By the previous lemma, (yδ , uδ ) = (yδaux , uaux δ ) and the same for δ . Hence we aux can apply the Lipschitz stability results for (P (δ)), Proposition 3.2 and Lemma 3.6, to obtain (4.2). Corollary 4.3. For any δ ∈ Z satisfying (4.1), we have Aδ1 ⊂ S1σ and Aδ2 ⊂ S2σ , hence Aδ1 ∩ Aδ2 = Ø. Moreover, the Lagrange multipliers and adjoint state for (P(δ)) are unique and coincide with those for (Paux (δ)). Proof. We consider a point x∗ ∈ Aδ1 , so uδ (x∗ ) − δ4 (x∗ ) = 0 holds. u0 (x∗ ) = u0 (x∗ ) − uδ (x∗ ) + uδ (x∗ ) − δ4 (x∗ ) + δ4 (x∗ ) 6 ku0 − uδ kL∞ (Ω) + kδ4 kL∞ (Ω) 6 L3 kδkZ + kδkZ 6 σ, where we have used Corollary 3.7. This shows x∗ ∈ S1σ and Aδ1 ⊂ S1σ . Analogously, Aδ2 ⊂ S2σ and by Assumption (A4), we have Aδ1 ∩ Aδ2 = Ø. Using the same arguments as in Lemma 3.1, we see that the Lagrange multipliers µi,δ and adjoint state p for aux aux aux (P(δ)) are unique. In Lemma 4.1, the tuple (yδaux , uaux δ , pδ , µ1,δ , µ2,δ ) was shown to satisfy the optimality system (2.3)–(2.6) for (P(δ)), so in particular the Lagrange multipliers and adjoint state for (P(δ)) coincide with those for (Paux (δ)). The previous corollary allows us to use the symbols pδ , µ1,δ and µ2,δ without ambiguity for kδkZ 6 g(σ). Finally, we obtain a Lipschitz stability result also for these quantities: Corollary 4.4. There exist constants L4 , L5 and L6 > 0 such that for any δ, δ 0 ∈ Z satisfying (4.1), the unique adjoint state and Lagrange multipliers (pδ , µ1,δ , µ2,δ ) and 2. Mixed State Constrained Optimal Control Problems 43 (pδ0 , µ1,δ0 , µ2,δ0 ) associated to the solutions of (P(δ)) and (P(δ 0 )), respectively, satisfy kpδ0 − pδ kH 2 (Ω) 6 L2 · kδ 0 − δk[L2 (Ω)]5 kµ1,δ0 − µ1,δ kL∞ (Ω) 6 L5 · kδ 0 − δkZ kµ2,δ0 − µ2,δ kL∞ (Ω) 6 L6 · kδ 0 − δkZ . Proof. The first claim follows from Corollary 3.4 and the equality pδ = paux from the δ previous corollary. From the proof of Lemma 3.6, we have kµ1,δ0 − µ1,δ + ε(µ2,δ0 − µ2,δ )kL∞ (Ω) 6 L2 kδ 0 − δkZ . Since µ1,δ0 − µ1,δ is zero outside S1σ , we get max kµ1,δ0 − µ1,δ kL∞ (Ω) , ε kµ2,δ0 − µ2,δ kL∞ (Ω) 6 L2 kδ 0 − δkZ and the claim follows. Acknowledgement This work was partially supported by the Austrian Science Fund FWF under project number P18056-N12. Appendix A. 
Proof of Proposition 3.5 Let (y0 , u0 , p0 , µ1,0 , µ2,0 ) be any solution of the optimality system (2.3)–(2.6) for (P(0)). Due to the separation assumption (A4), this is also a solution of the optimality system for (Paux (0)). Since the solution of the optimality system for (Paux (0)) is unique, see Lemma 3.1, this must hold for (P(0)) as well. In particular, (y0 , u0 , p0 , µ1,0 , µ2,0 ) = aux aux aux (y0aux , uaux 0 , p0 , µ1,0 , µ2,0 ). Let us denote by B the open ball centered at ξ ∈ Ω contained in A01 such that µ1,0 > M > 0 holds on B. Let r > 0 such that Br (ξ) ⊂ B and kε u0 + y0 − yc kL∞ (Ω) |Br |1/2 < R. We choose δ1 = · · · = δ4 ≡ 0 and ( ε u 0 + y0 − yc δ5 = 0 in Br in Ω \ Br . It follows immediately that kδk[L2 (Ω)]5 < R. It is also easy to see that (y0 , u0 ) is feasible for (P(δ)). Moreover, (y0 , u0 , p0 , µ1,0 , µ2,0 ) satisfies the optimality system for (P(δ)). However, we will show that this solution of the optimality system for (P(δ)) is not unique with respect to the dual variables. We choose κ > 0 and ( κ, in Br (ξ) µ2 = µ2,0 , elsewhere and let p be corresponding solution of (2.3). We set ( µ1,0 − ε µ2 + p0 − p, in Br (ξ) µ1 = µ1,0 , elsewhere. It is easy to check that (p, µ1 , µ2 ) satisfies (2.4). It remains to show that µ1 > 0 holds. We find µ1 > M − ε κ − kp0 − pkL∞ (Ω) > M − ε κ − κ CΩ C∞ χBr (ξ) L2 (Ω) > M − ε κ − κ CΩ C∞ |Br (ξ)|1/2 > M − ε κ − κ CΩ C∞ |Ω|1/2 in Br (ξ). Consequently, µ1 > 0 holds on all of Ω for sufficiently small κ. Therefore, the tuple (y0 , u0 , p, µ1 , µ2 ) satisfies the optimality system (2.3)–(2.6) and it is different from (y0 , u0 , p0 , µ1,0 , µ2,0 ) in view of κ > 0. 44 Stability and Sensitivity Analysis References [1] R. Adams. Sobolev Spaces. Academic Press, New York, 1975. [2] W. Alt. Local stability of solutions to differentiable optimization problems in Banach spaces. Journal of Optimization Theory and Application, 70:443–466, 1991. [3] W. Alt. Discretization and mesh-independence of Newton’s method for generalized equations. In Antony V. Fiacco, editor, Mathematical Programming with Data Perturbations V, volume 195 of Lecture Notes in Pure and Applied Mathematics, pages 1–30. Marcel Dekker, 1997. [4] W. Alt and K. Malanowski. The Lagrange-Newton method for nonlinear optimal control problems. Computational Optimization and Application, 2:77–100, 1993. [5] F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems. Springer, Berlin, 2000. [6] A. L. Dontchev and W. W. Hager. Implicit functions, Lipschitz maps, and stability in optimization. Mathematics of Operations Research, 19:753–768, 1994. [7] A. L. Dontchev, W. W. Hager, A. B. Poore, and B. Yang. Optimality, stability, and convergence in nonlinear control. Applied Mathematics and Optimization, 31:297–326, 1995. [8] R. Griesse. Lipschitz stability of solutions to some state-constrained elliptic optimal control problems. to appear in: Journal of Analysis and its Applications, 2005. [9] P. Grisvard. Elliptic Problems in Nonsmooth Domains. Pitman, Boston, 1985. [10] K. Ito and K. Kunisch. Sensitivity analysis of solutions to optimization problems in Hilbert spaces with applications to optimal control and estimation. Journal of Differential Equations, 99:1–40, 1992. [11] D. Kinderlehrer and G. Stampacchia. An Introduction to Variational Inequalities and Their Applications. Academic Press, New York, 1980. [12] K. Malanowski. Stability and sensitivity analysis for optimal control problems with control-state constraints. 
Dissertationes Mathematicae (Rozprawy Matematyczne), 394, 2001. [13] K. Malanowski, C. Büskens, and H. Maurer. Convergence of approximations to nonlinear optimal control problems. In Antony V. Fiacco, editor, Mathematical Programming with Data Perturbations V, volume 195 of Lecture Notes in Pure and Applied Mathematics, pages 253–284. Marcel Dekker, 1997. [14] K. Malanowski and F. Tröltzsch. Lipschitz stability of solutions to parametric optimal control for elliptic equations. Control and Cybernetics, 29:237–256, 2000. [15] C. Meyer, U. Prüfert, and F. Tröltzsch. On two numerical methods for state-constrained elliptic control probelms. submitted, 2005. [16] C. Meyer, A. Rösch, and F. Tröltzsch. Optimal control of PDEs with regularized pointwise state constraints. Computational Optimization and Applications, 33(2–3):209–228, 2005. [17] C. Meyer and F. Tröltzsch. On an elliptic optimal control problem with pointwise mixed controlstate constraints. In A. Seeger, editor, Recent Advances in Optimization. Proceedings of the 12th French-German-Spanish Conference on Optimization, volume 563 of Lecture Notes in Economics and Mathematical Systems, pages 187–204, New York, 2006. Springer. [18] Stephen M. Robinson. Strongly regular generalized equations. Mathematics of Operations Research, 5:43–62, 1980. [19] A. Rösch and F. Tröltzsch. Existence of regular Lagrange multipliers for elliptic optimal control problems with pointwise control-state constraints. SIAM Journal on Control and Optimization, 45(2):548–564, 2006. [20] A. Rösch and D. Wachsmuth. Regularity of solutions for an optimal control problem with mixed control-state constraints. submitted, 2005. [21] E. Zeidler. Applied Functional Analysis: Main Principles and their Applications. Springer, New York, 1995. 3. Sensitivity Analysis for NSE Opt. Control Problems 45 3. Sensitivity Analysis for Optimal Control Problems Involving the Navier-Stokes Equations R. Griesse, M. Hintermüller and M. Hinze: Differential Stability of Control Constrained Optimal Control Problems for the Navier-Stokes Equations, Numerical Functional Analysis and Optimization 26(7–8), p.829–850, 2005 The Navier-Stokes equations govern the flow of an (here incompressible) viscous fluid and thus have numerous applications. We consider here the optimal control problem with distributed (vector-valued) control and pointwise (componentwise) control constraints, Z Z Z αT αQ T |y − yQ |2 dx dt + |y(·, T ) − yT |2 dx Minimize 2 0 Ω 2 Ω Z Z Z Z γ T αR T | curl y|2 dx dt + |u|2 dx dt + 2 0 Ω 2 0 Ω yt + (y · ∇)y − ν∆y + ∇p = u in Q := Ω × (0, T ), (3.1) div y = 0 in Q, subject to y = 0 on Σ := ∂Ω × (0, T ), y(·, 0) = y0 in Ω, and ua ≤ u ≤ ub a.e. in Q. This optimal control problem and its solutions are considered to be functions of a number of perturbation parameters, namely of the scalars αQ , αT , αR and desired state functions yQ , yT appearing in the objective, of the viscosity ν (the inverse of the Reynolds number), and of the initial conditions y0 in the state equation. In our notation from the introduction of Chapter 1, we denote the vector of perturbation parameters by π = (ν, αQ , αT , αR , γ, yQ , yT , y0 ) ∈ P := R5 × L2 (Q) × H × V. Before the publication of this paper, the Lipschitz stability of local optimal solutions with respect to such parameters had been investigated in Roubı́ček and Tröltzsch [2003] for the steady-state case and in Hintermüller and Hinze [2006], Wachsmuth [2005] for the time-dependent case. 
We take this analysis one step further and prove that under second-order sufficient conditions, the dependence of local optimal solutions on π is indeed directionally differentiable. As outlined in the introduction of Chapter 1, this analysis can be carried out by rewriting the optimality system in terms of a generalized equation. It is then sufficient to analyze a linearization of this generalized equation and employ the Implicit Function Theorem 0.6. The core step is proved in Theorem 3.9 of the paper under discussion, which establishes the directional differentiability of the linearized optimality system with respect to certain perturbations δ. We work here with divergence-free spaces, which avoids the need of dealing with perturbations in the incompressibility condition of the linearized forward and adjoint equations. The differentiability property of local optimal solutions of (3.1) with respect to π allows a second-order Taylor expansion of the minimum value function, which is calculated and discussed in Section 5 of the paper. The steady-state case is easier and is briefly treated in Section 6. 46 Stability and Sensitivity Analysis DIFFERENTIAL STABILITY OF CONTROL CONSTRAINED OPTIMAL CONTROL PROBLEMS FOR THE NAVIER-STOKES EQUATIONS ROLAND GRIESSE, MICHAEL HINZE, AND MICHAEL HINTERMÜLLER Abstract. Distributed optimal control problems for the time-dependent and the stationary Navier-Stokes equations subject to pointwise control constraints are considered. Under a coercivity condition on the Hessian of the Lagrange function, optimal solutions are shown to be directionally differentiable functions of perturbation parameters such as the Reynolds number, the desired trajectory, or the initial conditions. The derivative is characterized as the solution of an auxiliary linear-quadratic optimal control problem. Thus, it can be computed at relatively low cost. Taylor expansions of the minimum value function are provided as well. 1. Introduction Perturbation theory for continuous minimization problems is of fundamental importance since many real world applications are embedded in families of optimization problems. Frequently, these families are generated by scalar or vector-valued parameters, such as the Reynolds number in fluid flow, desired state trajectories, initial conditions for time-dependent problems, and many more. From a theoretical as well as numerical algorithmic point of view the behavior of optimal solutions under variations of the parameters is of interest: • The knowledge of smoothness properties of the parameter-to-solution map allows to establish a qualitative theory. • On the numerical level one can exploit stability results for proving convergence of numerical schemes, or to develop algorithms with real time features. In fact, based on a known nominal local solution of the optimization problem, the solution of a nearby problem obtained by small variations of one or more parameters is approximated by the solution of a typically simpler minimization problem than the original one. Motivated by these aspects, in the present paper we contribute to the presently ongoing investigation of stability properties of PDE-constrained optimal control problems. 
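To make the approximation idea above concrete, a first-order update based on a known nominal solution and its parametric derivative might look as follows in a discretized setting. This is an illustrative sketch, not part of the paper: solve_nominal and directional_derivative are hypothetical placeholders for a solver of the nominal problem and for the auxiliary linear-quadratic problem that characterizes the derivative, and controls and parameters are assumed to be plain arrays after discretization.

# Hypothetical first-order sensitivity update (all callables are placeholders).
def sensitivity_update(solve_nominal, directional_derivative, p_star, p_new):
    """Predict the solution of the perturbed problem from the nominal one."""
    u_star = solve_nominal(p_star)                # nominal optimal control
    dp = p_new - p_star                           # parameter perturbation
    du = directional_derivative(p_star, dp)       # derivative in direction dp
    return u_star + du                            # first-order Taylor predictor

Since the derivative provided below is only directional (positively homogeneous in the perturbation direction), the correction du must in general be recomputed for each direction dp; it is nevertheless cheap to obtain, because it solves a linear-quadratic auxiliary problem rather than the full nonlinear one.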
Due to its importance in many applications in hydrodynamics, medicine, environmental or ocean sciences, our work is based on the following control constrained optimal control problem for the transient Navier-Stokes equations, i.e., we aim to Z Z Z αT αQ T |y − yQ |2 dx dt + |y(·, T ) − yT |2 dx 2 0 Ω 2 Ω Z Z Z Z αR T γ T + | curl y|2 dx dt + |u|2 dx dt (1.1) 2 0 Ω 2 0 Ω minimize J(y, u) = 3. Sensitivity Analysis for NSE Opt. Control Problems 47 subject to the instationary Navier-Stokes system with distributed control u on a fixed domain Ω ⊂ R2 given by yt + (y · ∇)y − ν∆y + ∇π = u in Q := Ω × (0, T ), (1.2) div y = 0 in Q, (1.3) y=0 y(·, 0) = y0 on in Σ := ∂Ω × (0, T ), Ω, (1.4) (1.5) in Q. (1.6) and pointwise control constraints of the form a(x, t) ≤ u(x, t) ≤ b(x, t) In (1.1)–(1.6) we have ν, γ > 0, and αQ , αT , αR ≥ 0. Further, we assume that the data yQ , yT and y0 are sufficiently smooth; for more details see the subsequent sections. We frequently refer to (1.1)–(1.6) as (P). The optimal control problem (P) and its solutions are considered to be functions of a number of perturbation parameters, namely of the scalars αQ , αT , αR and desired state functions yQ , yT appearing in the objective J, of the viscosity ν (the inverse of the Reynolds number), and of the initial conditions y0 in the state equation. To emphasize the dependence on such a parameter vector p, we also write (P(p)) instead of (P). The main result of our paper states that under a coercivity condition on the Hessian of the Lagrangian of (P(p∗ )), where p∗ denotes some nominal (or reference) parameter, an optimal solution is directionally differentiable with respect to p ∈ B(p∗ ) with B(p∗ ) some sufficiently small neighborhood of p∗ . We also characterize this derivative as the solution of a linear-quadratic optimal control problem which involves the linearized Navier-Stokes equations as well as pointwise inequality constraints on the control similar to (1.6). While this work is primarily concerned with analysis, in a forthcoming paper we focus on the algorithmic implications alluded to above. Let us relate our work to recent efforts in the field: On the one hand, optimal control problems for the Navier-Stokes equations (without dependence on a parameter) have received a formidable amount of attention in recent years. Here we only mention [5, 9] for steady-state problems and [1, 10, 11, 14, 27] for the time-dependent case. On the other hand, a number of stability results for solutions to a variety of control-constrained optimal control problems have been developed recently. As in the present paper, these analyses concern the behavior of optimal solutions under perturbations of finite or infinite dimensional parameters in the problem. We refer to, e.g., [18,24] for Lipschitz stability in optimal control of linear and semilinear parabolic equations, and [7,16] for recent results on differentiability properties. Related results for linear elliptic problems with nonlinear boundary control can be found in [17, 19]. Further, Lipschitz stability for state-constrained elliptic optimal control problems is the subject of [8]. For optimal control problems involving the Navier-Stokes equations with distributed control, Lipschitz stability results have been obtained in [22] for the steady-state and in [12, 28] for the time-dependent case. However, differential stability results are still missing and are the focus of the present paper. 
It is known that both Lipschitz and differential stability hinge on the condition of strong regularity of the first order necessary conditions at a nominal solution; see Dontchev [6] and Remark 3.8 below. The strong regularity of such a system is a consequence of a coercivity condition on the Hessian of the Lagrangian, which is closely related to second order sufficient conditions; compare Remark 4.2. Strong regularity is also the basis of convergence proofs for numerical algorithms; see [2] for the general Lagrange-Newton method and [12] for a SQP semismooth Newton-type algorithm for the control of the time-dependent Navier-Stokes equations. The plan of the paper is as follows: Section 2 introduces some notation and the function space setting used throughout the paper. In Section 3 we recall the first order 48 Stability and Sensitivity Analysis optimality system (OS) for our problem (P). We state the coercivity condition needed (Assumption 3.4) to prove the strong regularity and to establish differential stability results for a linearized version (LOS) of (OS) (see Theorem 3.9). Our main result is given in Section 4: By an implicit function theorem for generalized equations, the directional differentiability property carries over to the nonlinear optimality system (OS), and the directional derivatives can be characterized. Additionally, we find that our coercivity assumption implies the second order sufficient condition of [26], which guarantees that critical points are indeed strict local optimizers. We proceed in Section 5 by presenting Taylor expansions of the optimal value function about a given nominal parameter value. Section 6 covers the case of the stationary Navier-Stokes equations. Due to the similarity of the arguments involved, we only state the results briefly. 2. Preliminaries For the reader’s convenience we now collect the preliminaries for a proper analytical formulation of our problem (P). Throughout, we assume that Ω ⊂ R2 is a bounded domain with C 2 boundary ∂Ω. For given final time T > 0, we denote by Q the timespace cylinder Q = Ω × (0, T ) and by Σ its lateral boundary Σ = ∂Ω × (0, T ). We begin with defining the spaces H = closure in [L2 (Ω)]2 of {v ∈ [C0∞ (Ω)]2 : div v = 0} V = closure in [H 1 (Ω)]2 of {v ∈ [C0∞ (Ω)]2 : div v = 0}. These spaces form a Gelfand triple (see [23]): V ,→ H = H 0 ,→ V 0 , where V 0 denotes the dual of V , and analogously for H 0 . Next we introduce the Hilbert spaces Wqp = {v ∈ Lp (0, T ; V ) : vt ∈ Lq (0, T ; V 0 )}, endowed with the norm kvkWqp = kvkLp (V ) + kvt kLq (V 0 ) . We use W = W22 . Further, we define H 2,1 = {v ∈ L2 (0, T ; H 2 (Ω) ∩ V ) : vt ∈ L2 (0, T ; H)}, endowed with the norm kvkH 2,1 = kvkL2 (H 2 (Ω)) + kvt kL2 (L2 (Ω)) . Here and elsewhere, vt refers to the distributional derivative of v with respect to the time variable. For the sake of brevity, we simply write L2 (V ) instead of L2 (0, T ; V ), etc. Depending on the context, by h·, ·i we denote the duality pairing of either V and V 0 or L2 (V ) and L2 (V 0 ), respectively. Additionally, by (·, ·) we denote the scalar products of L2 (Ω) and L2 (Q). In the sequel, we will find it convenient to write L2 (Ω) or L2 (Q) when we actually refer to [L2 (Ω)]2 or [L2 (Q)]2 , respectively. In the following lemma, we recall some results about W and H 2,1 . The proofs can be found in [4, 15, 20]; compare also [13]: Lemma 2.1 (Properties of W and H 2,1 ). (a) The space W is continuously embedded in the space C([0, T ]; H). 
(b) The space W is compactly embedded in the space L2 (H) ⊆ L2 (Q). (c) The space H 2,1 is continuously embedded in the space C([0, T ]; V ). The time-dependent Navier-Stokes equations (1.2)–(1.5) are understood in their weak form with divergence-free and boundary conditions incorporated in the space V . 3. Sensitivity Analysis for NSE Opt. Control Problems 49 That is, y ∈ W is a weak solution to the system (1.2)–(1.5) with given u ∈ L2 (V 0 ) if and only if yt + (y · ∇)y − ν∆y = u y(·, 0) = y0 in L2 (V 0 ), (2.1) in H. (2.2) As usual, the pressure term ∇π cancels out due to the solenoidal, i.e., divergence-free, function space setting. There holds, (compare [3, 23]): Lemma 2.2 (Navier-Stokes Equations). For every y0 ∈ H and u ∈ L2 (V 0 ), there exists a unique weak solution y ∈ W of (1.2)–(1.5). The map H × L2 (V 0 ) 3 (y0 , u) 7→ y ∈ W is locally Lipschitz continuous. Likewise, for every y0 ∈ V and u ∈ L2 (Q), there exists a unique weak solution y ∈ H 2,1 of (1.2)–(1.5). The map V × L2 (Q) 3 (y0 , u) 7→ y ∈ H 2,1 is locally Lipschitz continuous. For the linearized Navier-Stokes system, we have (compare [14]): Lemma 2.3 (Linearized Navier-Stokes Equations). Assume that y ∗ ∈ W and let f ∈ L2 (V 0 ) and g ∈ H. Then the linearized Navier-Stokes system yt + (y ∗ · ∇)y + (y · ∇)y ∗ − ν∆y = f y(·, 0) = g in L2 (V 0 ) in H has a unique solution y ∈ W , which depends continuously on the data: kykW ≤ c (kf kL2 (V 0 ) + kgkL2 (Ω) ) (2.3) where the constant c is independent of f and g. Likewise, if y ∗ ∈ W ∩ L∞ (V ) ∩ L2 (H 2 (Ω)), f ∈ L2 (Q) and g ∈ V , then y ∈ H 2,1 holds with continuous dependence on the data: kykH 2,1 ≤ c (kf kL2 (Q) + kgkH 1 (Ω) ). (2.4) Subsequently, we need the following result for the adjoint system (see [14, Proposition 2.4]): Lemma 2.4 (Adjoint Equation). Assume that y ∗ ∈ W ∩ L∞ (V ) and let f ∈ L2 (V 0 ) and g ∈ H. Then the adjoint equation −λt + (∇y ∗ )> λ − (y ∗ · ∇)λ − ν∆λ = f λ(·, T ) = g in W 0 in H has a unique solution in λ ∈ W , which depends continuously on the data: kλkW ≤ c (kf kL2 (V 0 ) + kgkL2 (Ω) ) (2.5) where c is independent of f and g. Next we define the Lagrange function L : W × U × W → R of (P): αT αQ ky − yQ k2L2 (Q) + ky(·, T ) − yT k2L2 (Ω) 2 2 Z T αR γ 2 2 + k curl ykL2 (Q) + kukL2 (Q) + hyt + (y · ∇)y − ν∆y, λi dt 2 2 0 Z − (u, λ) + (y(·, 0) − y0 )λ(·, 0) dx L(y, u, λ) = Ω (2.6) where we took care of the fact that the Lagrange multiplier belonging to the constraint y(·, 0) = y0 is identical to λ(·, 0) ∈ H, which is the adjoint state at the initial time. 50 Stability and Sensitivity Analysis The Lagrangian is infinitely continuously differentiable and its second derivatives with respect to y and u read Lyy (y, u, λ)(y1 , y2 ) = αQ (y1 , y2 ) + αT (y1 (·, T ), y2 (·, T )) + αR (curl y1 , curl y2 ) Z Z + ((y1 · ∇)y2 )λ dx dt + ((y2 · ∇)y1 )λ dx dt (2.7) Q Q Luu (y, u, λ)(u1 , u2 ) = γ(u1 , u2 ) while Lyu and Luy vanish. In order to complete the proper description of problem (P), we recall for y ∈ R2 the definition ∂ ∂ ∂ y − y 2 1 ∂y ∂x ∂y ∂ ∂ . y2 − y1 and curl curl y = curl y = ∂ ∂ ∂ ∂x ∂y − ∂x ∂x y2 − ∂y y1 It is straightforward to check that for y ∈ W , curl y ∈ L2 (Q) and curl curl y ∈ L2 (V 0 ). 3. Differential Stability of the Linearized Optimality System In the present section we recall the first order optimality system (OS) associated with our problem (P). We reformulate it as a generalized equation (GE) and introduce its linearization (LGE). 
Then we prove directional differentiability of the solutions to the linearized generalized equation (LGE). By virtue of an implicit function theorem for generalized equations due to Robinson [21] and Dontchev [6], the differentiability property carries over to the solution map of the original nonlinear optimality system (OS), as is detailed in Section 4. Let us begin by specifying the analytical setting for our problem (P). To this end, we define the control space U = L2 (Q) and the closed convex subset of admissible controls Uad = {u ∈ L2 (Q) : a(x, t) ≤ u(x, t) ≤ b(x, t) a.e. on Q} ⊂ U, where a(x, t) and b(x, t) are the bounds in L2 (Q). The inequalities are understood componentwise. This choice of the control space motivates to use H 2,1 as the state space, presumed the initial condition y0 is smooth enough. We can now write (P) in the compact form Minimize J(y, u) over H 2,1 × Uad subject to (2.1)–(2.2). As announced earlier, we consider (P) in dependence on the parameter vector p = (ν, αQ , αT , αR , γ, yQ , yT , y0 ) ∈ P = R5 × L2 (Q) × H × V, which involves both quantities appearing in the objective function and in the governing equations. To ensure well-posedness of (P), we invoke the following assumption on p: Assumption 3.1. We assume the viscosity parameter ν is positive and that the initial conditions y0 are given in V . The weights in the objective satisfy αQ , αT , αR ≥ 0 and γ > 0. Moreover, the desired trajectory and terminal states are yQ ∈ L2 (Q) and yT ∈ H, respectively. Under Assumption 3.1 it is standard to argue existence of a solution to (P); see, e.g., [1]. A solution (y, u) ∈ H 2,1 × Uad is characterized by the following lemma. 3. Sensitivity Analysis for NSE Opt. Control Problems 51 Lemma 3.2 (Optimality System). Let Assumption 3.1 hold, and let (y, u) ∈ H 2,1 × Uad be a local minimizer of (P). Then there exists a unique adjoint state λ ∈ W such that the following optimality system is satisfied: − λt + (∇y)> λ − (y · ∇)λ − ν∆λ = −αQ (y − yQ ) − αR curl curl y Z Q λ(·, T ) = −αT (y(·, T ) − yT ) (γu − λ)(u − u) dx dt ≥ 0 in W 0 in H for all u ∈ Uad yt + (y · ∇)y − ν∆y = u y(·, 0) = y0 (OS) in L2 (V 0 ) in H . As motivated in Section 2, we have stated the state and adjoint equations in their weak form and in the solenoidal setting to eliminate the pressure π and the corresponding adjoint pressure. In order to reformulate the optimality system (OS) as a generalized equation we introduce the set-valued mapping N3 (u) : L2 (Q) → L2 (Q) as the dual cone of the set of admissible controls Uad at u, i.e., N3 (u) = {v ∈ L2 (Q) : (v, u − u) ≤ 0 for all u ∈ Uad } (3.1) if u ∈ Uad , and N3 (u) = ∅ in case u 6∈ Uad . It is easily seen that the variational inequality in (OS) is equivalent to 0 ∈ γu − λ + N3 (u). Next we introduce the set-valued mapping N (u) = (0, 0, N3 (u), 0, 0)> and define F = (F1 , F2 , F3 , F4 , F5 )> as F1 (y, u, λ, p) = − λt + (∇y)> λ − (y · ∇)λ − ν∆λ + αQ (y − yQ ) + αR curl curl y, F2 (y, u, λ, p) = λ(·, T ) + αT (y(·, T ) − yT ), F3 (y, u, λ, p) = γu − λ, F4 (y, u, λ, p) = yt + (y · ∇)y − ν∆y − u, (3.2) F5 (y, u, λ, p) = y(·, 0) − y0 with F : H 2,1 × U × W × P → L2 (V 0 ) × H × L2 (Q) × L2 (Q) × V. Note that the parameter p appears as an additional argument. The optimality system (OS) can now be rewritten as the generalized equation 0 ∈ F(y, u, λ, p) = F (y, u, λ, p) + N (u). 1 (GE) Note that F(·, p) is a C function; compare [12]. From now on, let p∗ denote a reference (or nominal) parameter with associated solution (y ∗ , u∗ , λ∗ ). 
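The inclusion 0 ∈ γu − λ + N₃(u) derived above can be resolved pointwise: it states that u coincides with the componentwise projection of λ/γ onto the admissible box defined by a and b, a representation that reappears in the proof of Theorem 3.9 below. A minimal Python sketch of this projection for grid functions follows; the array names are illustrative assumptions.

```python
import numpy as np

def project_onto_box(v, a, b):
    """Componentwise projection onto the box a <= u <= b (the set Uad)."""
    return np.minimum(b, np.maximum(a, v))

def control_from_adjoint(lam, gamma, a, b):
    """Resolve 0 in gamma*u - lambda + N3(u) pointwise: u = P_Uad(lambda / gamma)."""
    return project_onto_box(lam / gamma, a, b)
```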
Our goal is to show that the solution map p 7→ (yp , up , λp ) for (GE) is well-defined near p∗ and that it is directionally differentiable at p∗ . By the work of Robinson [21] and Dontchev [6], it is sufficient to show that the solutions to the linearized generalized equation y − y∗ ∗ ∗ ∗ ∗ 0 ∗ ∗ ∗ ∗ δ ∈ F(y , u , λ , p ) + F (y , u , λ , p ) u − u∗ + N (u) (LGE) λ − λ∗ 52 Stability and Sensitivity Analysis have these properties for sufficiently small δ. This fact is appealing since one has to deal with a linearization of F instead of the fully nonlinear system. In addition, one only needs to consider perturbations δ which, unlike p, appear solely on the left hand side of the equation. Note that F is the gradient of the Lagrangian L (see (2.6)), and F 0 , the derivative with respect to (y, u, λ), is its Hessian. Throughout this section we work under the following assumption: ∗ ∗ ∗ Assumption 3.3. Let p∗ = (ν ∗ , αQ , αT∗ , αR , γ ∗ , yQ , yT∗ , y0∗ ) ∈ P = R5 ×L2 (Q)×H ×V be a given reference or nominal parameter such that Assumption 3.1 is satisfied. Moreover, let (y ∗ , u∗ , λ∗ ) be a given nominal solution to the first order necessary conditions (OS). A short calculation shows that the linearized generalized equality (LGE) is identical to the system − λt + (∇y ∗ )> λ − (y ∗ · ∇)λ − ν ∗ ∆λ ∗ ∗ = − αQ (y − yQ ) − αR curl curl y − (∇(y − y ∗ ))> λ∗ + ((y − y ∗ ) · ∇)λ∗ + δ1 Z Q λ(·, T ) = − αT∗ (y(·, T ) − yT∗ ) + δ2 in W 0 in H ∗ (γ u − λ − δ3 )(u − u) dx dt ≥ 0 for all u ∈ Uad (LOS) yt + (y ∗ · ∇)y + (y · ∇)y ∗ − ν ∗ ∆y = u + δ4 + (y ∗ · ∇)y ∗ y(·, 0) = y0∗ + δ5 in L2 (V 0 ) in H. In turn, (LOS) can be interpreted as the first order optimality system for the linear quadratic problem (AQP(δ)), depending on δ: Z ∗ Z T Z αQ αT∗ ∗ 2 Minimize |y − yQ | dx dt + |y(·, T ) − yT∗ |2 dx 2 0 Ω 2 Ω Z TZ Z Z α∗ γ∗ T + R | curl y|2 dx dt + |u|2 dx dt − hδ1 , yiL2 (V 0 ),L2 (V ) 2 0 Ω 2 0 Ω Z TZ − (δ2 , y(·, T )) − (δ3 , u) + ((y − y ∗ ) · ∇)(y − y ∗ )λ∗ dx dt 0 Ω subject to the linearized Navier-Stokes system given above in (LOS) and u ∈ Uad . Note that the nominal solution (y ∗ , u∗ , λ∗ ) satisfies both the nonlinear optimality system (OS) and the linearized optimality system (LOS) for δ = 0. The following coercivity condition is crucial for proving Lipschitz continuity and directional differentiability of the function δ 7→ (yδ , uδ , λδ ) which maps a perturbation δ to a solution of (AQP(δ)): Assumption 3.4 (Coercivity). Suppose that there exists ρ > 0 such that the coercivity condition Υ(y, u) := ∗ αQ α∗ α∗ γ∗ kyk2L2 (Q) + T ky(·, T )k2L2 (Ω) + R k curl yk2L2 (Q) + kuk2L2 (Q) 2 2 2 2 Z TZ + (3.3) ((y · ∇)y)λ∗ dx dt ≥ ρ kuk2L2 (Q) 0 Ω holds at least for all u = u1 − u2 where u1 , u2 ∈ Uad , i.e., for all u ∈ L2 (Q) which satisfy |u(x, t)| ≤ b(x, t) − a(x, t) a.e. on Q (in the componentwise sense), and for the 3. Sensitivity Analysis for NSE Opt. Control Problems 53 corresponding states y ∈ H 2,1 satisfying the linear PDE yt + (y ∗ · ∇)y + (y · ∇)y ∗ − ν ∗ ∆y = u y(·, 0) = 0 in L2 (V 0 ) , (3.4) in H. (3.5) Remark 3.5 (Strict Convexity). Let C = {(y, u) | u ∈ Uad , y satisfies (3.4)–(3.5)}. The Coercivity Assumption 3.4 immediately implies that C 3 (y, u) 7→ Υ(y, u) is strictly convex over C. Since the quadratic part of the objective (3.3) in (AQP(δ)) coincides with Υ, (3.3) is also strictly convex over C. 
The same holds for the objective (3.7) in the auxiliary problem (DQP(δ̂)) below so that the strict convexity will allow us to conclude uniqueness of the sensitivity derivative in the proof of Theorem 3.9 later on. Finally, we notice that Υ(y, u) is equal to 12 Lxx (y ∗ , u∗ , λ∗ )(x, x) with p = p∗ and x = (y, u, λ); compare (2.7). Remark 3.6 (Smallness of the Adjoint). Obviously the only term in (3.3) which can spoil the coercivity condition is the term involving λ∗ , which originates from the state equation’s nonlinearity. Hence, for the coercivity condition to be satisfied, it is sufficient that the nominal adjoint variable λ∗ is sufficiently small in an appropriate norm. In fact, for λ∗ = 0 condition (3.3) holds with ρ = γ ∗ /2 > 0. A first consequence of the coercivity assumption is the Lipschitz continuity of the map δ 7→ (yδ , uδ , λδ ). We refer to [25] for the Burgers equation, to [22] for the stationary Navier-Stokes equations and to [12, 28] for the instationary case. Lemma 3.7 (Lipschitz Stability). Under Assumptions 3.3 and 3.4, there exists a unique solution (yδ , uδ , λδ ) to (LOS) and thus to (LGE) for every δ. The mapping δ 7→ (yδ , uδ , λδ ) is Lipschitz continuous from L2 (V 0 ) × H × L2 (Q) × L2 (Q) × V to H 2,1 × U × W . Remark 3.8 (Strong Regularity). The Lipschitz stability property established by Lemma 3.7 above is called strong regularity of the generalized equation (GE) at the nominal critical point (y ∗ , u∗ , λ∗ , p∗ ). Strong regularity implies that the Lipschitz continuity and differentiability properties of the map δ 7→ (yδ , uδ , λδ ) are inherited by the map p 7→ (yp , up , λp ) in view of the implicit function theorem for generalized equations; see [21] and [6]. This is utilized below in Section 4. Note that in the absence of control constraints, the operator N (u) is identical to {0}, and strong regularity becomes bounded invertability of the Hessian of the Lagrangian F 0 , which is also required by the classical implicit function theorem. To study the directional differentiability of the map δ 7→ (yδ , uδ , λδ ), we introduce the following definitions: At the nominal solution (y ∗ , u∗ , λ∗ ), we define (up to sets of measure zero) Q+ = {(x, t) ∈ Q : u∗ (x, t) = a(x, t)} and Q− = {(x, t) ∈ Q : u∗ (x, t) = b(x, t)} collecting the points where the constraint u∗ ∈ Uad is active. We again point out that indeed there is one such set for each component of u, but we can continue to use our notation without ambiguity. From the variational inequality in (OS) one infers that γu − λ ∈ L2 (Q) acts as a Lagrange multiplier for the constraint u ∈ Uad . Hence we define the sets ∗ ∗ ∗ Q+ 0 = {(x, t) ∈ Q : (γ u − λ )(x, t) > 0} and ∗ ∗ ∗ Q− 0 = {(x, t) ∈ Q : (γ u − λ )(x, t) < 0} 54 Stability and Sensitivity Analysis − + − where the constraint is said to be strongly active. Note that Q+ 0 ⊂ Q and Q0 ⊂ Q hold true. Finally, we set bad = {u ∈ L2 (Q) : u ≥ 0 on Q− , u ≤ 0 on Q+ , u = 0 on Q+ ∪ Q− }. U 0 0 (3.6) bad contains the admissible control variations (see Theorem 3.9 below) and The set U reflects the fact that on Q− , where the nominal control u∗ is equal to the lower bound a, any admissible sequence of controls can approach it only from above; analogously for Q+ . In addition, the control variation is zero to first order on the strongly active + subsets Q− 0 and Q0 . We now turn to the main result of this section, which is to prove directional differentiability of the map δ 7→ (yδ , uδ , λδ ). 
This extends the proof of Lipschitz stability of the same map in [12,22,28]. It turns out that the coercivity Assumption 3.4 is already sufficient to obtain our new result. Subsequently we denote by ”→” convergence with respect to the strong topology and by ”*” convergence with respect to the weak topology. Theorem 3.9. Under Assumptions 3.3 and 3.4, the mapping δ 7→ (yδ , uδ , λδ ) is directionally differentiable at δ = 0. The derivative in the direction of δ̂ = (δ̂1 , δ̂2 , δ̂3 , δ̂4 , δ̂5 )> ∈ L2 (V 0 ) × H × L2 (Q) × L2 (Q) × V is given by the unique solution (ŷ, û) ∈ H 2,1 × U and adjoint variable λ̂ ∈ W of the linear-quadratic problem (DQP(δ̂)) Z ∗ Z T Z αQ αT∗ 2 Minimize |y| dx dt + |y(·, T )|2 dx 2 0 Ω 2 Ω Z TZ Z Z α∗ γ∗ T + R | curl y|2 dx dt + |u|2 dx dt − δ̂1 , y L2 (V 0 ),L2 (V ) 2 0 Ω 2 0 Ω Z TZ − (δ̂2 , y(·, T )) − (δ̂3 , u) + ((y · ∇)y)λ∗ dx dt (3.7) 0 Ω subject to the linearized Navier-Stokes system yt + (y · ∇)y ∗ + (y ∗ · ∇)y − ν ∗ ∆y = u + δ̂4 y(·, 0) = δ̂5 in L2 (V 0 ), in H (3.8) bad . Its first order conditions are and u ∈ U − λt + (∇y ∗ )> λ − (y ∗ · ∇)λ − ν ∗ ∆λ ∗ ∗ = − αQ y − αR curl curl y − (∇y)> λ∗ + (y · ∇)λ∗ + δ̂1 λ(·, T ) = − αT∗ y(·, T ) + δ̂2 Z (γ ∗ u − λ − δ̂3 )(u − u) dx dt ≥ 0 Q in W 0 , in H, bad , for all u ∈ U (3.9) (3.10) plus the state equation (3.8). Proof. Let δ̂ ∈ L2 (V 0 ) × H × L2 (Q) × L2 (Q) × V be any given direction of perturbation and let {τn } be a sequence of real numbers such that τn & 0. We set δn = τn δ̂ and denote the solution of (AQP(δn )) by (yn , un , λn ). Note that (y ∗ , u∗ , λ∗ ) is the solution of (AQP(0)). Then, by virtue of Lemma 3.7, we have yn − y ∗ un − u∗ λn − λ∗ ≤ L kδ̂k + (3.11) τn 2,1 + τn 2 τn H L (Q) W with some Lipschitz constant L > 0. Since H 2,1 is a Hilbert space, we can extract a weakly convergent subsequence (still denoted by index n) and use compactness of the 3. Sensitivity Analysis for NSE Opt. Control Problems 55 embedding of H 2,1 into L2 (Q) (see Lemma 2.1) to obtain: yn − y ∗ * ŷ τn in H 2,1 and → ŷ in L2 (Q). (3.12) for some ŷ ∈ H 2,1 . In the case of λ, the same argument with H 2,1 replaced by W applies and we obtain λn − λ∗ * λ̂ in W and → λ̂ in L2 (Q) (3.13) τn for some λ̂ ∈ W . By taking yet another subsequence in (3.12) and (3.13), the convergence can be taken to hold pointwise almost everywhere in Q. Let us now denote by PUad (u) the pointwise projection of any function u onto the admissible set Uad . From the variational inequality in (LOS) it follows that 1 un = PUad (λn + τn δ̂3 ) ∈ Uad . γ∗ Following the technique in [7, 16], by distinguishing the cases of inactive, active and strongly active control, one shows that the pointwise limit in the control component is 1 bad . (λ̂ + δ̂3 ) ∈ U û = PUbad γ∗ By Lebesgue’s Dominated Convergence Theorem with a suitable upper bound (see [7]), we obtain the strong convergence in the control component: un − u∗ → û in L2 (Q). (3.14) τn Now we prove that the limit ŷ introduced in (3.12) satisfies the state equation (3.8), i.e., ŷt + (y ∗ · ∇)ŷ + (ŷ · ∇)y ∗ − ν ∗ ∆ŷ = û + δ̂4 ŷ(·, 0) = δ̂5 in L2 (V 0 ) (3.15) in H. (3.16) Recalling the linear state equation in (LOS), we observe that the quotient qn = (yn − y ∗ )/τn satisfies (qn )t + (y ∗ · ∇)qn + (qn · ∇)y ∗ − ν ∗ ∆qn = un − u∗ + δ̂4 τn in L2 (V 0 ) whose left and right hand sides converge weakly in L2 (Q) to (3.15) since the left hand side maps qn ∈ H 2,1 to an element of L2 (Q), linearly and continuously. Likewise, (3.16) is satisfied. Similarly, one proves that the limit λ̂ satisfies (3.9). 
To complete the proof, we need to show that the convergence in (3.12) and (3.13) is strong in H 2,1 and W , respectively. To this end, note that (yn − y ∗ )/τn − ŷ satisfies the linear state equation (3.15) with û replaced by (un − u∗ )/τn − û and δ̂4 replaced by zero. The a priori estimate (2.4) now yields the desired convergence as the right hand side tends to zero in L2 (Q), i.e., we have yn − y ∗ → ŷ τn in H 2,1 . (3.17) By a similar argument for the adjoint equation (3.9), using the a priori estimate (2.5), we find λn − λ∗ → λ̂ in W. (3.18) τn We recall that so far the convergence only holds for a subsequence. However, the whole argument remains valid if in the beginning, one starts with an arbitrary subsequence 56 Stability and Sensitivity Analysis of {τn }. Then the limit (ŷ, û, λ̂) again satisfies the first order optimality system (3.8)– (3.10). Since the critical point is unique in view of the strict convexity of the objective (3.7) guaranteed by Coercivity Assumption 3.4 and Remark 3.5, this limit is always the same, regardless of the initial subsequence. Hence the convergence in (3.14), (3.17) and (3.18) extends to the whole sequence, which proves that (ŷ, û, λ̂) is the desired directional derivative. Finally, it is straightforward to verify that (3.8)–(3.10) are the first order conditions for the linear-quadratic problem (DQP(δ̂)). 4. Differential Stability of the Nonlinear Optimality System By the implicit function theorems for generalized equations [6, 21], the properties of the solutions for the linearized optimality system (LOS) carry over to the solutions of the nonlinear optimality system (OS). In [22] and [28], this was exploited to show Lipschitz stability of the map p 7→ (yp , up , λp ) by proving the same property for δ 7→ (yδ , uδ , λδ ), in the presence of the stationary and instationary Navier-Stokes equations, respectively. We can now continue this analysis and prove that both Lipschitz continuity and directional differentiability hold. Our main result is: Theorem 4.1. Under Assumptions 3.3 and 3.4, there is a neighborhood B(p∗ ) of p∗ such that for all p ∈ B(p∗ ) there exists a solution (yp , up , λp ) to the first order conditions (OS) of the perturbed problem (P(p)). This solution is unique in a neighborhood of (y ∗ , u∗ , λ∗ ). The optimal control u, the corresponding state y and the adjoint variable λ are Lipschitz continuous functions of p in B(p∗ ) and directionally differentiable at p∗ . In the direction of p̂ = (ν̂, α̂Q , α̂T , α̂R , γ̂, ŷQ , ŷT , ŷ0 ) ∈ P = R5 × L2 (Q) × H × V, bad and the adjoint this derivative is given by the unique solution (ŷ, û) ∈ H 2,1 × U variable of the linear-quadratic problem (DQP(δ̂)) in the direction δ̂ = (δ̂1 , δ̂2 , δ̂3 , δ̂4 , δ̂5 )> = −Fp (y ∗ , u∗ , λ∗ , p∗ ) p̂ ∗ ∗ ŷQ − α̂R curl curl y ∗ ) + αQ ν̂∆λ∗ − α̂Q (y ∗ − yQ −α̂T (y ∗ (·, T ) − yT∗ ) + αT∗ ŷT = −γ̂u∗ . ∗ ν̂∆y ŷ0 (4.1) Proof. For the local uniqueness of the solution (yp , up , λp ) and its Lipschitz continuity, it is enough to verify that F is Lipschitz with respect to p near p∗ , uniformly in a neighborhood of (y ∗ , u∗ , λ∗ ). For instance, for F1 we have (see (3.2)) kF1 (y, u, λ, p1 ) − F1 (y, u, λ, p2 )kL2 (V 0 ) 1 2 1 1 2 ≤ |ν1 − ν2 | k∆λkL2 (V 0 ) + |αQ − αQ | kykL2 (Q) + |αQ | kyQ − yQ kL2 (Q) 1 2 2 1 2 + |αQ − αQ | kyQ kL2 (Q) + |αR − αR |k curl curl ykL2 (V 0 ) ≤ L kp1 − p2 k, where L depends on the diameters of the neighborhoods of (y ∗ , u∗ , λ∗ ) and p∗ only. 
The claim now follows from the implicit function theorem for generalized equations, see Dontchev [6, Theorem 2.4]. Directional differentiability follows from the same theorem, since it is easily seen that F is Fréchet differentiable with respect to p. The next remark clarifies that the Coercivity Assumption 3.4 implies that a second order sufficient optimality condition holds at the reference point (y ∗ , u∗ , λ∗ ), which, thus, is a strict local minimizer. 3. Sensitivity Analysis for NSE Opt. Control Problems 57 Remark 4.2 (Second Order Sufficiency). Recently, second order sufficient optimality conditions for (y ∗ , u∗ , λ∗ ) were proved in [26]. One of these conditions requires that ∗ αQ α∗ α∗ γ∗ kyk2L2 (Q) + T ky(·, T )k2L2 (Ω) + R k curl yk2L2 (Q) + kuk2L2 (Q) 2 2 2 Z2 ∗ + ((y · ∇)y)λ dx ≥ ρ kuk2Lq (Q) Ω (4.2) with q = 4/3 and some ρ > 0 holds for all pairs (y, u) where y solves (3.4) and u ∈ L2 (Q) satisfies u = u − u∗ with u ∈ Uad . Additionally, u may be chosen zero on so-called -strongly active subsets of Ω. Hence, any such u is in Uad − Uad = {u1 − u2 | u1 , u2 ∈ Uad }. Consequently, Assumption 3.4 implies that (4.2) holds for all q ≤ 2, and, by [26, Theorem 4.12], there exist α, β > 0 such that J(y, u) ≥ J(y ∗ , u∗ ) + αku − u∗ k2L4/3 (Q) holds for all admissible pairs with ku − u∗ kL2 (Q) ≤ β. In particular, (y ∗ , u∗ ) is a strict local minimizer in the sense of L2 (Q). Corollary 4.3 (Strict Local Optimality). As was already mentioned in [22, Corollary 3.5] for the stationary case, the Coercivity Assumption 3.4 and thus the second order sufficient condition (4.2) are stable under small perturbation of p∗ . That is, (3.3) ∗ ∗ ∗ , yT∗ , y0∗ ) in , γ ∗ , yQ , αT∗ , αR continues to hold, possibly with a smaller ρ, if p∗ = (ν ∗ , αQ ∗ (3.3)–(3.4) is replaced by a parameter p sufficiently close to p . As a consequence, possibly by shrinking the neighborhood U of p∗ mentioned in Theorem 4.1, the corresponding (yp , up ) are strict local minimizers for the perturbed problems (P(p)). Remark 4.4 (Strict Complementarity). Assume that û is the directional derivative of bad in the nominal control u∗ for p = p∗ , in a given direction p̂. From the definition of U (3.6) it becomes evident that in general −û can not be the directional derivative in the direction of −p̂ since it may not be admissible. That is, the directional derivative is in general not linear in the direction but only positively homogeneous. However, linearity − + − does hold if the sets Q+ 0 \ Q and Q0 \ Q are null sets, or, in other words, if strict complementarity holds at the nominal solution (y ∗ , u∗ , λ∗ ). Remark 4.5. Recall that by Assumption 3.3 one or more of the parameters αQ , αT and αR may have a nominal value of zero. That is, every neighborhood of p∗ contains parameter vectors with negative α entries. According to Corollary 4.3 however, the terms associated to these negative α values are absorbed by the ρkuk2 term in the Coercivity Assumption 3.4 for small enough perturbations, so that the perturbed problems remain locally convex. 5. Taylor Expansions of the Minimum Value Function This section is concerned with a Taylor expansion of the minimum value function p 7→ Φ(p) = J(yp , up ) 58 Stability and Sensitivity Analysis in a neighborhood of the nominal parameter p∗ . 
The following theorem proves that α̂T ∗ α̂Q ∗ ∗ 2 ∗ ∗ ky − yQ kL2 (Q) − αQ (y ∗ − yQ , ŷQ ) + ky (·, T ) − yT∗ k2L2 (Ω) DΦ(p∗ ; p̂) = 2 2 α̂R γ̂ − αT∗ (y ∗ (·, T ) − yT∗ , ŷT ) + k curl y ∗ k2L2 (Q) + ku∗ k2L2 (Q) 2 2 Z T Z + ν̂(∇y ∗ , ∇λ∗ ) dt − ŷ0 λ∗ (·, 0) dx (5.1) 0 Ω ∗ ∗ ∗ , y − y Q ) − αQ (ŷQ , y − y Q ) − αQ (y ∗ − yQ , ŷQ ) D2 Φ(p∗ ; p, p̂) = α̂Q (y ∗ − yQ + α̂T (y ∗ (·, T ) − yT∗ , y(·, T ) − y T ) − αT∗ (ŷT , y(·, T ) − y T ) − αT (y ∗ (·, T ) − yT∗ , ŷT ) + α̂R (curl y ∗ , curl y) + γ̂(u∗ , u) Z Z T + ŷ0 λ(·, 0) dx ν̂(∇y, ∇λ∗ ) + ν̂(∇y ∗ , ∇λ) dt − Ω 0 (5.2) are its first and second order directional derivatives. Here, p̂ = (ν̂, α̂Q , α̂T , α̂R , γ̂, ŷQ , ŷT , ŷ0 ) ∈ P = R5 × L2 (Q) × H × V and similarly p denote two given directions, and (ŷ, û, λ̂) and (y, u, λ) are the directional derivatives of the nominal solution in p∗ in the directions of p̂ and p, respectively, according to Theorem 4.1. Theorem 5.1. The minimum value function possesses the Taylor expansion 1 Φ(p∗ + τ p̂) = Φ(p∗ ) + τ DΦ(p∗ ; p̂) + τ 2 D2 Φ(p∗ ; p̂, p̂) + o(τ 2 ) 2 with the first and second directional derivatives given by (5.1)–(5.2). (5.3) Proof. It is known that the first order derivative of the value function equals the partial derivative of the Lagrangian (2.6) with respect to the parameter, i.e., DΦ(p∗ ; p̂) = Lp (y ∗ , u∗ , λ∗ , p∗ )(p̂); see, e.g., [16], which proves (5.1). For the second derivative, one has to compute the total derivative of (5.1) with respect to p, which yields (5.2). The estimate (5.3) then follows from the Taylor formula. Remark 5.2. From (5.1) we conclude that a first order Taylor expansion can be easily obtained without computing the sensitivity differentials (ŷ, û, λ̂). 6. Optimal Control of the Stationary Navier-Stokes Equations In this section we briefly comment on the case of distributed control for the stationary Navier-Stokes equations. Due to the similarity of the arguments, we only give the main results and the formulas. First of all, our problem (P) now reads: Z Z Z αR γ αΩ 2 2 |y − yΩ | dx + | curl y| dx + |u|2 dx Minimize J(y, u) = 2 Ω 2 Ω 2 Ω subject to the stationary Navier-Stokes system with distributed control u: (y · ∇)y − ν∆y + ∇π = u div y = 0 y=0 in Ω in Ω on ∂Ω and control constraints u ∈ Uad , where Uad = {u ∈ L2 (Ω) : a(x) ≤ u(x) ≤ b(x) a.e. on Ω} ⊂ U = L2 (Ω). The parameter vector reduces to p = (ν, αΩ , αR , γ, yΩ ) ∈ P = R4 × L2 (Ω). 3. Sensitivity Analysis for NSE Opt. Control Problems 59 Again, the Navier-Stokes system is understood in weak form, i.e., (y · ∇)y − ν∆y = u in V 0 . The Lagrangian in the stationary case reads αR γ αΩ ky − yΩ k2L2 (Ω) + k curl yk2L2 (Ω) + kuk2L2 (Ω) L(y, u, λ) = 2 2 2 + h(y · ∇)y − ν∆y, λi − (u, λ). The first order optimality system is given by (∇y)> λ − (y · ∇)λ − ν∆λ = −αΩ (y − yΩ ) − αR curl curl y in V 0 , Z for all u ∈ Uad (γu − λ)(u − u) dx ≥ 0 Ω (y · ∇)y − ν∆y = u (OS) in V , and F : V × U × V × P → V 0 × L2 (Ω) × V 0 now reads: F1 (y, u, λ, p) = (∇y)> λ − (y · ∇)λ − ν∆λ + αΩ (y − yΩ ) + αR curl curl y F2 (y, u, λ, p) = γu − λ F3 (y, u, λ, p) = (y · ∇)y − ν∆y − u. The conditions paralleling Assumptions 3.3 and 3.4 are: ∗ ∗ ∗ Assumption 6.1 (Nominal Point). Let p∗ = (ν ∗ , αΩ , αR , γ ∗ , yΩ ) ∈ P = R4 × L2 (Ω) ∗ ∗ be a given reference or nominal parameter such that αΩ , αR ≥ 0 and γ ∗ > 0 hold and ∗ ∈ L2 (Ω). Moreover, let (y ∗ , u∗ , λ∗ ) be a given solution to the first order necessary yΩ conditions (OS), termed a nominal solution. Assumption 6.2 (Coercivity). 
Suppose that there exists ρ > 0 such that the coercivity condition Z ∗ ∗ αR γ∗ αΩ 2 2 2 kykL2 (Ω) + k curl ykL2 (Ω) + kukL2 (Ω) + ((y · ∇)y)λ∗ dx ≥ ρ kuk2L2 (Ω) 2 2 2 Ω (6.1) holds for all u ∈ Uad − Uad ⊂ L2 (Ω), i.e., for all u ∈ L2 (Ω) which satisfy |u(x)| ≤ b(x) − a(x) a.e. on Ω (in the componentwise sense), and for the corresponding states y ∈ V satisfying the linear PDE (y ∗ · ∇)y + (y · ∇)y ∗ − ν ∗ ∆y = u in V 0 . (6.2) Under Assumptions 6.1 and 6.2, the results and remarks of Section 3 remain valid with the obvious modifications. In particular, we have Theorem 6.3. Under Assumptions 6.1 and 6.2, the mapping δ 7→ (yδ , uδ , λδ ) is directionally differentiable at δ = 0. The derivative in the direction of δ̂ = (δ̂1 , δ̂2 , δ̂3 )> ∈ V 0 × L2 (Ω) × V 0 is given by the unique solution (ŷ, û) ∈ V × U and adjoint variable λ̂ ∈ V of the auxiliary QP problem (DQP(δ̂)) Z Z Z α∗ α∗ γ∗ Minimize Ω |y|2 dx + R | curl y|2 dx + |u|2 dx − δ̂1 , y 2 Ω 2 Ω 2 Ω Z − (δ̂2 , u) + ((y · ∇)y)λ∗ dx (6.3) Ω subject to the stationary linearized Navier-Stokes system (y · ∇)y ∗ + (y ∗ · ∇)y − ν ∗ ∆y = u + δ̂3 in V0 (6.4) 60 Stability and Sensitivity Analysis bad . Its first order conditions are and u ∈ U ∗ ∗ (∇y ∗ )> λ − (y ∗ · ∇)λ − ν ∗ ∆λ = − αΩ y − αR curl curl y Z Ω − (∇y)> λ∗ + (y · ∇)λ∗ + δ̂1 (γ ∗ u − λ − δ̂2 )(u − u) dx ≥ 0 in V 0 bad for all u ∈ U plus the linear state equation (6.4). Also, results analogous to the ones of Section 4 remain valid. In particular, the map p 7→ (yp , up , λp ) is directionally differentiable at p∗ with the derivative given by the solution and adjoint variable of (DQP(δ̂)) in the direction of δ̂ = (δ̂1 , δ̂2 , δ̂3 )> = −Fp (y ∗ , u∗ , λ∗ , p∗ ) p̂ ∗ ∗ ν̂∆λ∗ − α̂Ω (y ∗ − yΩ ) + αΩ ŷΩ − α̂R curl curl y ∗ . = −γ̂u∗ ∗ ν̂∆y Finally, the directional derivatives of the minimum value function are α̂R α̂Ω ∗ ∗ 2 ∗ ∗ ky − yΩ kL2 (Ω) − αΩ (y ∗ − yΩ , ŷΩ ) + k curl y ∗ k2L2 (Ω) DΦ(p∗ ; p̂) = 2 2 γ̂ + ku∗ k2L2 (Ω) + ν̂(∇y ∗ , ∇λ∗ ) 2 ∗ ∗ ∗ D2 Φ(p∗ ; p, p̂) = α̂Ω (y ∗ − yΩ , y − y Ω ) − αΩ (ŷΩ , y − y Ω ) − αΩ (y ∗ − yΩ , ŷΩ ) + α̂R (curl y ∗ , curl y) + γ̂(u∗ , u) + ν̂(∇y, ∇λ∗ ) + ν̂(∇y ∗ , ∇λ). Acknowledgments The third author acknowledges support of the Sonderforschungsbereich 609 Elektromagnetische Strömungskontrolle in Metallurgie, Kristallzüchtung und Elektrochemie, located at the Technische Universität Dresden, and supported by the German Research Foundation. References [1] F. Abergel and R. Temam. On some optimal control problems in fluid mechanics. Theoretical and Computational Fluid Mechanics, 1(6):303–325, 1990. [2] W. Alt. The Lagrange-Newton method for infinite-dimensional optimization problems. Numerical Functional Analysis and Optimization, 11:201–224, 1990. [3] P. Constantin and C. Foias. Navier-Stokes Equations. The University of Chicago Press, Chicago, 1988. [4] R. Dautray and J. L. Lions. Mathematical Analysis and Numerical Methods for Science and Technology, volume 5. Springer, Berlin, 2000. [5] M. Desai and K. Ito. Optimal Controls of Navier-Stokes Equations. SIAM Journal on Control and Optimization, 32:1428–1446, 1994. [6] A. Dontchev. Implicit function theorems for generalized equations. Mathematical Programming, 70:91–106, 1995. [7] R. Griesse. Parametric sensitivity analysis in optimal control of a reaction-diffusion system— Part I: Solution differentiability. Numerical Functional Analysis and Optimization, 25(1–2):93– 117, 2004. [8] R. Griesse. Lipschitz stability of solutions to some state-constrained elliptic optimal control problems. 
Journal of Analysis and its Applications, 25:435–455, 2006. [9] M. Gunzburger, L. Hou, and T. Svobodny. Analysis and finite element approximation of optimal control problems for the stationary Navier-Stokes equations with distributed and Neumann controls. Mathematics of Computation, 57(195):123–151, 1991. [10] M. Gunzburger and S. Manservisi. Analysis and approximation of the velocity tracking problem for Navier-Stokes flows with distribued controls. SIAM Journal on Numerical Analysis, 37(5):1481–1512, 2000. 3. Sensitivity Analysis for NSE Opt. Control Problems 61 [11] M. Gunzburger and S. Manservisi. The velocity tracking problem for Navier-Stokes flows with boundary control. SIAM Journal on Control and Optimization, 39(2):594–634, 2000. [12] M. Hintermüller and M. Hinze. An SQP Semi-Smooth Newton-Type Algorithm Applied to the Instationary Navier-Stokes System Subject to Control Constraints. SIAM Journal on Optimization, 16(4):1177–1200, 2006. [13] M. Hinze. Optimal and instantaneous control of the instationary Navier–Stokes equations. Habilitation Thesis, Fachbereich Mathematik, Technische Universität Berlin, 2000. [14] M. Hinze and K. Kunisch. Second order methods for optimal control of time-dependent fluid flow. SIAM Journal on Control and Optimization, 40(3):925–946, 2001. [15] J. L. Lions. Quelques méthodes de résolution des problemès aux limites non linéaires. Dunod Gauthier-Villars, Paris, 1969. [16] K. Malanowski. Sensitivity analysis for parametric optimal control of semilinear parabolic equations. Journal of Convex Analysis, 9(2):543–561, 2002. [17] K. Malanowski. Solution differentiability of parametric optimal control for elliptic equations. In E. W. Sachs and R. Tichatschke, editors, System Modeling and Optimization XX, Proceedings of the 20th IFIP TC 7 Conference, pages 271–285. Kluwer Academic Publishers, 2003. [18] K. Malanowski and F. Tröltzsch. Lipschitz stability of solutions to parametric optimal control for parabolic equations. Journal of Analysis and its Applications, 18(2):469–489, 1999. [19] K. Malanowski and F. Tröltzsch. Lipschitz stability of solutions to parametric optimal control for elliptic equations. Control and Cybernetics, 29:237–256, 2000. [20] P. Neittaanmäki and D. Tiba. Optimal Control of Nonlinear Parabolic Systems. Marcel Dekker, New York, 1994. [21] S. Robinson. Strongly regular generalized equations. Mathematics of Operations Research, 5(1):43–62, 1980. [22] T. Roubı́ček and F. Tröltzsch. Lipschitz stability of optimal controls for the steady-state NavierStokes equations. Control and Cybernetics, 32(3):683–705, 2003. [23] R. Temam. Navier-Stokes Equations, Theory and Numerical Analysis. North-Holland, Amsterdam, 1984. [24] F. Tröltzsch. Lipschitz stability of solutions of linear-quadratic parabolic control problems with respect to perturbations. Dynamics of Continuous, Discrete and Impulsive Systems Series A Mathematical Analysis, 7(2):289–306, 2000. [25] F. Tröltzsch and S. Volkwein. The SQP method for control constrained optimal control of the Burgers equation. ESAIM: Control, Optimisation and Calculus of Variations, 6:649–674, 2001. [26] F. Tröltzsch and D. Wachsmuth. Second-order sufficient optimality conditions for the optimal control of Navier-Stokes equations. ESAIM: Control, Optimisation and Calculus of Variations, 12(1):93–119, 2006. [27] M. Ulbrich. Constrained optimal control of Navier-Stokes flow by semismooth Newton methods. Systems and Control Letters, 48:297–311, 2003. [28] D. Wachsmuth. 
Regularity and stability of optimal controls of instationary Navier-Stokes equations. Control and Cybernetics, 34:387–410, 2005.

4. Sensitivity Analysis for Optimal Boundary Control Problems of a 3D Reaction-Diffusion System

R. Griesse and S. Volkwein: Parametric Sensitivity Analysis for Optimal Boundary Control of a 3D Reaction-Diffusion System, in: Large-Scale Nonlinear Optimization, G. Di Pillo and M. Roma (editors), volume 83 of Nonconvex Optimization and its Applications, p.127–149, Springer, Berlin, 2006

This paper extends the previous stability and sensitivity analysis to a class of time-dependent semilinear parabolic boundary optimal control problems. More precisely, we consider here the reaction-diffusion optimal control problem in three space dimensions:
\[
\begin{aligned}
\text{Minimize}\quad & \frac{\beta_1}{2}\|c_1(T)-c_{1T}\|^2_{L^2(\Omega)} + \frac{\beta_2}{2}\|c_2(T)-c_{2T}\|^2_{L^2(\Omega)} + \frac{\gamma}{2}\|u-u_d\|^2_{L^2(0,T)} + \frac{1}{\varepsilon}\max\Big\{0,\int_0^T u(t)\,dt - u_c\Big\}^3\\
\text{subject to}\quad & c_{1,t} = D_1\Delta c_1 - k_1 c_1 c_2 && \text{in } Q := \Omega\times(0,T),\\
& c_{2,t} = D_2\Delta c_2 - k_2 c_1 c_2 && \text{in } Q,\\
& D_1\,\partial c_1/\partial n = 0 && \text{on } \Sigma := \partial\Omega\times(0,T),\\
& D_2\,\partial c_2/\partial n = u(t)\,\alpha(x,t) && \text{on } \Sigma_c,\\
& D_2\,\partial c_2/\partial n = 0 && \text{on } \Sigma_n,\\
& c_1(\cdot,0) = c_{10} && \text{in } \Omega,\\
& c_2(\cdot,0) = c_{20} && \text{in } \Omega,
\end{aligned} \tag{4.1}
\]
and ua ≤ u ≤ ub a.e. in (0, T). Here, ci denotes the concentration of the ith reactant, and Di and ki are diffusion and reaction constants. The state equation, the optimal control problem and a primal-dual active set method in function space had been analyzed previously by the authors in Griesse and Volkwein [2005] and the extended preprint Griesse and Volkwein [2003]. In the paper under discussion, we establish the Lipschitz stability and directional differentiability of local optimal solutions of (4.1) with respect to the perturbation parameter π = (Di, ki, βi, γ, uc, ε, ci0, ciT, ud)i=1,2, provided that second-order sufficient conditions hold. As before, we proceed by proving the Lipschitz stability and directional differentiability for the linearized optimality system (Propositions 3.2 and 3.3 in the paper). The proof requires the compactness of the spatial trace operator τ : W(0,T) → L2(0,T; L2(Γ)). The main result, Theorem 4.1, then follows from the Implicit Function Theorem 0.6. Numerical results for the nominal, the perturbed and the sensitivity problems are also provided; see Section 5 of the paper. In particular, sensitivity derivatives of the optimal control and optimal state are calculated and interpreted.

PARAMETRIC SENSITIVITY ANALYSIS FOR OPTIMAL BOUNDARY CONTROL OF A 3D REACTION-DIFFUSION SYSTEM

ROLAND GRIESSE AND STEFAN VOLKWEIN

Abstract. A boundary optimal control problem for an instationary nonlinear reaction-diffusion equation system in three spatial dimensions is presented. The control is subject to pointwise control constraints and a penalized integral constraint. Under a coercivity condition on the Hessian of the Lagrange function, an optimal solution is shown to be a directionally differentiable function of perturbation parameters such as the reaction and diffusion constants or desired and initial states. The solution’s derivative, termed parametric sensitivity, is characterized as the solution of an auxiliary linear-quadratic optimal control problem. A numerical example illustrates the utility of parametric sensitivities which allow a quantitative and qualitative perturbation analysis of optimal solutions.

1.
Introduction Parametric sensitivity analysis for optimal control problems governed by partial differential equations (PDE) is concerned with the behavior of optimal solutions under perturbations of system data. The subject matter of the present paper is an optimal boundary control problem for a time-dependent coupled system of semilinear parabolic reaction-diffusion equations. The equations model a chemical or biological process where the species involved are subject to diffusion and reaction among each other. The goal in the optimal control problem is to drive the reaction-diffusion model from the given initial state as close as possible to a desired terminal state. However, the control has to be chosen within given upper and lower bounds which are motivated by physical or technological considerations. In practical applications, it is unlikely that all parameters in the model are precisely known a priori. Therefore, we embed the optimal control problem into a family of problems, which depend on a parameter vector p. In our case, p can comprise physical parameters such as reaction and diffusion constants, but also desired terminal states, etc. In this paper we prove that under a coercivity condition on the Hessian of the Lagrange function, local solutions of the optimal control problem depend Lipschitz continuously and directionally differentiably on the parameter p. Moreover, we characterize the derivative as the solution of an additional linear-quadratic optimal control problem, known as the sensitivity problem. If these sensitivities are computed ”offline”, i.e., along with the optimal solution of the nominal (unperturbed) problem belonging to the expected parameter value p0 , a first order Taylor approximation can give a real-time (”online”) estimate of the perturbed solution. Let us put the current paper into a wider perspective: Lipschitz dependence and differentiability properties of parameter-dependent optimal control problems for PDEs have been investigated in the recent papers [6, 11–14, 16, 18]. In particular, sensitivity results have been derived in [6] for a two-dimensional reaction-diffusion model with distributed control. In contrast, we consider here the more difficult situation in three spatial dimensions and with boundary control and present both theoretical and numerical results. Other numerical results can be found in [3, 7]. 1 64 Stability and Sensitivity Analysis The main part of the paper is organized as follows: In Section 2, we introduce the reaction-diffusion system at hand and the corresponding optimal control problem. We also state its first order optimality conditions. Since this problem, without parameter dependence, has been thoroughly investigated in [9], we only briefly recall the main results. Section 3 is devoted to establishing the so-called strong regularity property for the optimality system. This necessitates the investigation of the linearized optimality system for which the solution is shown to be Lipschitz and differentiable with respect to perturbations. In Section 4, these properties for the linearized problem are shown to carry over to the original nonlinear optimality system, in virtue of a suitable implicit function theorem. Finally, we present some numerical results in Section 5 in order to further illustrate the concept of parametric sensitivities. Necessarily all numerical results are based on a discretized version of our infinitedimensional problem. 
Nevertheless we prefer to carry out the analysis in the continuous setting so that smoothness properties of the involved quantities become evident which could then be used for instance to determine rates of convergence under refinements of the discretization etc. In view of our problem involving a nonlinear time-dependent system of partial differential equations, its discretization yields a large scale nonlinear optimization problem, albeit with a special structure. 2. The Reaction-Diffusion Optimal Boundary Control Problem Reaction-diffusion equations model chemical or biological processes where the species involved are subject to diffusion and reaction among each other. As an example, we consider the reaction A + B → C which obeys the law of mass action. To simplify the discussion, we assume that the backward reaction C → A + B is negligible and that the forward reaction proceeds with a constant (not temperature-dependent) rate. This leads to a coupled semilinear parabolic system for the respective concentrations (c1 , c2 , c3 ) as follows: ∂ c1 (t, x) = d1 ∆c1 (t, x) − k1 c1 (t, x)c2 (t, x) for all (t, x) ∈ Q, (2.1a) ∂t ∂ c2 (t, x) = d2 ∆c2 (t, x) − k2 c1 (t, x)c2 (t, x) for all (t, x) ∈ Q, (2.1b) ∂t ∂ c3 (t, x) = d3 ∆c3 (t, x) + k3 c1 (t, x)c2 (t, x) for all (t, x) ∈ Q. (2.1c) ∂t The scalars di and ki , i = 1, . . . , 3, are the diffusion and reaction constants, respectively. Here and throughout, let Ω ⊂ R3 denote the domain of reaction and let Q = (0, T ) × Ω be the time-space cylinder where T > 0 is the given final time. We suppose that the boundary Γ = ∂Ω is Lipschitz and can be decomposed into two disjoint parts Γ = Γn ∪ Γc , where Γc denotes the control boundary. Moreover, we let Σn = (0, T ) × Γn and Σc = (0, T ) × Γc . We impose the following Neumann boundary conditions: ∂c1 d1 (t, x) = 0 for all (t, x) ∈ Σ, (2.2a) ∂n ∂c2 (t, x) = u(t) α(t, x) for all (t, x) ∈ Σc , (2.2b) d2 ∂n ∂c2 (t, x) = 0 for all (t, x) ∈ Σn , (2.2c) d2 ∂n ∂c3 (t, x) = 0 for all (t, x) ∈ Σ. (2.2d) d3 ∂n Equation (2.2b) prescribes the boundary flux of the second substance B by means of a given shape function α(t, x) ≥ 0, modeling, e.g., the location of a spray nozzle revolving with time around one of the surfaces of Ω, while u(t) denotes the control 4. Sensitivity Analysis for Reaction-Diffusion Problems 65 intensity at time t which is to be determined. The remaining homogeneous Neumann boundary conditions simply correspond to a ”no-outflow” condition of the substances through the boundary of the reaction vessel Ω. In order to complete the description of the model, we impose initial conditions for all three substances involved, i.e., c1 (0, x) = c10 (x) for all x ∈ Ω, (2.3a) c2 (0, x) = c20 (x) c3 (0, x) = c30 (x) for all x ∈ Ω, for all x ∈ Ω. (2.3b) (2.3c) Our goal is to drive the reaction-diffusion model (2.1)–(2.3) from the given initial state near a desired terminal state. Hence, we introduce the cost functional Z 1 β1 |c1 (T ) − c1T |2 + β2 |c2 (T ) − c2T |2 dx J1 (c1 , c2 , u) = 2 Ω Z γ T + |u − ud |2 dt. 2 0 Here and in the sequel, we will find it convenient to abbreviate the notation and write c1 (T ) instead of c1 (T, ·) or omit the arguments altogether when no ambiguity arises. In the cost functional, β1 , β2 and γ are non-negative weights, c1T and c2T are the desired terminal states, and ud is some desired (or expected) control. In order to shorten the notation, we have assumed that the objective J1 does not depend on the product concentration c3 . 
This allows us to delete the product concentration c3 from the equations altogether and consider only the system for (c1 , c2 ). All results obtained can be extended to the three-component system in a straightforward way. The control u : [0, T ] → R is subject to pointwise box constraints ua (t) ≤ u(t) ≤ ub (t). It is reasonable to assume that ua (t) ≥ 0, which together with α(t, x) ≥ 0 implies that the second (controlled) substance B can not be withdrawn through the boundary. The presence of an upper limit ub is motivated by technological reasons. In addition to the pointwise constraint, it may be desirable to limit the total amount of substance B added during the process, i.e., to impose a constraint like Z T u(t) dt ≤ uc . 0 In the current investigation, we do not enforce this inequality directly but instead we add a penalization term )3 ( Z T 1 u(t) dt − uc J2 (u) = max 0, ε 0 to the objective, which then assumes the final form J(c1 , c2 , u) = J1 (c1 , c2 , u) + J2 (u). (2.4) Our optimal control problem can now be stated as problem (P) Minimize J(c1 , c2 , u) s.t. (2.1a)–(2.1b), (2.2a)–(2.2c) and (2.3a)–(2.3b) and ua (t) ≤ u(t) ≤ ub (t) hold. (P) 2.1. State Equation and Optimality System. The results in this section draw from the investigations carried out in [9] and are stated here for convenience and without proof. Our problem (P) can be posed in the setting u ∈ U = L2 (0, T ) (c1 , c2 ) ∈ Y = W (0, T ) × W (0, T ). 66 Stability and Sensitivity Analysis That is, we consider the state equation (2.1a)–(2.1b), (2.2a)–(2.2c) and (2.3a)–(2.3b) in its weak form, see Remark 2.4 and Section 2.2 for details. Here and throughout, L2 (0, T ) denotes the usual Sobolev space [1] of square-integrable functions on the interval (0, T ) and the Hilbert space W (0, T ) is defined as ∂ W (0, T ) = ϕ ∈ L2 (0, T ; H 1(Ω)) : ϕ ∈ L2 (0, T ; H 1 (Ω)′ ) . ∂t containing functions of different regularity in space and time. Here, H 1 (Ω) is again the usual Sobolev space and H 1 (Ω)′ is its dual. At this point we note for later reference the compact embedding [17, Chapter 3, Theorem 2.1] W (0, T ) ֒→֒→ L2 (0, T ; H s (Ω)) for any 1/2 < s < 1 (2.5) involving the fractional-order space H s (Ω). For convenience of notation, we define the admissible set Uad = {u ∈ U : ua (t) ≤ u(t) ≤ ub (t)}. Let us summarize the fundamental results about the state equation and problem (P). We begin with the following assumption which is needed throughout the paper: Assumption 2.1. (a) Let Ω ⊂ R3 be a bounded open domain with Lipschitz continuous boundary Γ = ∂Ω, which is partitioned into the control part Γc and the remainder Γn . Let di and ki , i = 1, 2 be positive constants, and assume that α ∈ L∞ (0, T ; L2(Γc )) is non-negative. The initial conditions ci0 , i = 1, 2 are supposed to be in L2 (Ω). T > 0 is the given final time of the process. (b) For the control problem, we assume desired terminal states ciT ∈ L2 (Ω), i = 1, 2, and desired control ud ∈ L2 (0, T ) to be given. Moreover, let β1 , β2 be non-negative and γ be positive. Finally, we assume that the penalization parameter ε is positive and that uc ∈ R and ua and ub are in L∞ (0, T ) such RT that 0 ua (t) dt ≤ uc . Theorem 2.2. Under Assumption 2.1(a), the state equation (2.1a)–(2.1b), (2.2a)– (2.2c) and (2.3a)–(2.3b) has a unique weak solution (c1 , c2 ) ∈ W (0, T ) × W (0, T ) for any given u ∈ L2 (0, T ). The solution satisfies the a priori estimate kc1 kW (0,T ) + kc2 kW (0,T ) ≤ C 1 + kc10 kL2 (Ω) + kc20 kL2 (Ω) + kukL2 (0,T ) with some constant C > 0. 
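The objective J = J1 + J2 from (2.4) combines the quadratic tracking terms with the cubic penalization of the integral constraint. The following minimal Python sketch evaluates it for discretized data; the uniform grids, the trapezoidal quadrature and all variable names are assumptions made here purely for illustration.

```python
import numpy as np

def objective_value(c1_T, c2_T, c1T_des, c2T_des, u, u_des, t, cell_volume,
                    beta1, beta2, gamma, eps, u_c):
    """Evaluate J(c1, c2, u) = J1 + J2 as in (2.4) on a simple grid.

    c1_T, c2_T       -- terminal states c1(T), c2(T) on a spatial grid
    c1T_des, c2T_des -- desired terminal states c1T, c2T on the same grid
    u, u_des, t      -- control, desired control and the time grid
    cell_volume      -- volume of one (uniform) spatial grid cell
    """
    # tracking-type part J1
    J1 = (0.5 * beta1 * np.sum((c1_T - c1T_des) ** 2) * cell_volume
          + 0.5 * beta2 * np.sum((c2_T - c2T_des) ** 2) * cell_volume
          + 0.5 * gamma * np.trapz((u - u_des) ** 2, t))
    # penalized integral constraint J2 = (1/eps) * max(0, int_0^T u dt - u_c)^3
    J2 = (1.0 / eps) * max(0.0, np.trapz(u, t) - u_c) ** 3
    return J1 + J2
```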
In order to state the system of first order necessary optimality conditions, we introduce the active sets A− (u) = {t ∈ [0, T ] : u(t) = ua (t)} A+ (u) = {t ∈ [0, T ] : u(t) = ub (t)} for any given control u ∈ Uad . Theorem 2.3. Under Assumption 2.1, the optimal control problem (P) possesses at least one global solution in Y × Uad . If (c1 , c2 , u) ∈ Y × Uad is a local solution, then 4. Sensitivity Analysis for Reaction-Diffusion Problems there exists a unique adjoint variable (λ1 , λ2 ) ∈ Y satisfying ∂ − λ1 − d1 ∆λ1 = −k1 c2 λ1 − k2 c2 λ2 ∂t ∂ − λ2 − d2 ∆λ2 = −k1 c1 λ1 − k2 c1 λ2 ∂t ∂λ1 =0 d1 ∂n ∂λ2 d2 =0 ∂n λ1 (T ) = −β1 (c1 (T ) − c1T ) λ2 (T ) = −β2 (c2 (T ) − c2T ) 67 in Q, (2.6a) in Q, (2.6b) on Σ, (2.6c) on Σ, (2.6d) in Ω, (2.6e) in Ω (2.6f) 2 in the weak sense, and a unique Lagrange multiplier ξ ∈ L (0, T ) such that the optimality condition o2 Z n Z T 3 u(t) dt − uc − α(t, x) λ2 (t, x) dx + ξ(t) = 0 (2.7) γ(u(t) − ud (t)) + max 0, ε Γc 0 holds for almost all t ∈ [0, T ], together with the complementarity condition ξ|A− (u) ≤ 0, ξ|A+ (u) ≥ 0. (2.8) Remark 2.4. The partial differential equations throughout this paper are always meant in their weak form. In case of the state and adjoint equations (2.1)–(2.3) and (2.6), respectively, the weak forms are precisely stated in Section 2.2 below, see the definition of F . However, we prefer to write the equations in their strong form to make them easier understandable. Solutions to the optimality system (2.6)–(2.8), including the state equation, can be found numerically by employing, e.g., semismooth Newton or primal-dual active set methods, see [8, 10, 19] and [2, 9], respectively. In the sequel, we will often find it convenient to use the abbreviations y = (c1 , c2 ) for the vector of state variables, x = (y, u) for state/control pairs, and λ = (λ1 , λ2 ) for the vector of adjoint states. In passing, we define the Lagrangian associated to our problem (P), Z T Z Z ∂ L(x, λ) = J(x) + c1 , λ1 + d1 ∇c1 ∇λ1 dx + k1 c1 c2 λ1 dx dt ∂t 0 Ω Ω Z Z Z Z T ∂ c2 , λ2 + d2 ∇c2 ∇λ2 dx + k2 c1 c2 λ2 dx − d2 α u λ2 dx dt + ∂t Ω ∂Ω 0 Z ZΩ (c2 (0) − c20 ) λ2 (0) dx (2.9) (c1 (0) − c10 ) λ1 (0) dx + + Ω Ω for any x = (c1 , c2 , u) ∈ Y × U and λ = (λ1 , λ2 ) ∈ Y . The bracket hu, vi denotes the duality between u ∈ H 1 (Ω)′ and v ∈ H 1 (Ω). The Lagrangian is twice continuously differentiable, and its Hessian with respect to the state and control variables is readily seen to be Lxx (x, λ)(x, x) = β1 kc1 (T )k2L2 (Ω) + β2 kc2 (T )k2L2 (Ω) + γkuk2L2 (0,T ) ) Z ( Z !2 Z T T 6 + max 0, u(t) dt − uc u(t) dt + 2 (k1 λ1 + k2 λ2 ) c1 c2 dx dt. ε 0 0 Q (2.10) The Hessian is a bounded bilinear form, i.e., there exists a constant C > 0 such that Lxx (x, λ)(x1 , x2 ) ≤ C kx1 kY ×U kx2 kY ×U 68 Stability and Sensitivity Analysis holds for all (x1 , x2 ) ∈ [Y × U ]2 . 2.2. Parameter Dependence. As announced in the introduction, we consider problem (P) in dependence on a vector of parameters p and emphasize this by writing (P(p)). It is our goal to investigate the behavior of locally optimal solutions of (P(p)), or solutions of the optimality system (2.6)–(2.8) for that matter, as p deviates from its given nominal value p∗ . In practice, the parameter vector p can be thought of as problem data which may be subject to perturbation or uncertainty. The nominal value p∗ is then simply the expected value of the data. 
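Once a discretized candidate solution is available, the optimality condition (2.7) stated above can be solved for the multiplier ξ and the sign conditions (2.8) on the active sets checked directly. The sketch below does this under the assumptions of a uniform time grid and a uniform surface mesh on Γc; all names and tolerances are illustrative and not part of the paper.

```python
import numpy as np

def check_optimality(u, u_des, lam2_on_Gc, alpha_on_Gc, surf_cell, t,
                     u_a, u_b, gamma, eps, u_c, tol=1e-8):
    """Recover xi from (2.7) and test the sign conditions (2.8).

    lam2_on_Gc, alpha_on_Gc -- traces of lambda_2 and alpha on the control
                               boundary Gamma_c (arrays of shape [time, node])
    surf_cell               -- area of one (uniform) boundary cell
    """
    penalty = (3.0 / eps) * max(0.0, np.trapz(u, t) - u_c) ** 2
    boundary_term = np.sum(alpha_on_Gc * lam2_on_Gc, axis=1) * surf_cell
    xi = -(gamma * (u - u_des) + penalty - boundary_term)   # solve (2.7) for xi

    lower = u <= u_a + tol           # A^-(u): u(t) = u_a(t)
    upper = u >= u_b - tol           # A^+(u): u(t) = u_b(t)
    inactive = ~(lower | upper)

    ok = (np.all(xi[lower] <= tol) and np.all(xi[upper] >= -tol)
          # xi vanishes where neither bound is active (standard KKT fact)
          and np.all(np.abs(xi[inactive]) <= tol))
    return xi, ok
```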
Our main result (Theorem 4.1) states that under a coercivity condition on the Hessian (2.10) of the Lagrange function, the solution of the optimality system belonging to (P(p)) depends directionally differentiably on p. The derivatives are called parametric sensitivities since they yield the sensitivities of their underlying quantities with respect to perturbations in the parameter. Our analysis can be used to predict the solution at p near the nominal value p∗ using a Taylor expansion. This can be exploited to devise a solution algorithm for (P(p)) with real-time capabilities, provided that the nominal solution to (P(p∗ )) along with the sensitivities are computed beforehand (”offline”). In addition, the sensitivities allow a qualitative perturbation analysis of optimal solutions. In our current problem, we take p = (d1 , d2 , k1 , k2 , β1 , β2 , γ, uc, ε, c10 , c20 , c1T , c2T , ud ) ∈ R9 × L2 (Ω)4 × L2 (0, T ) =: Π (2.11) as the vector of perturbation parameters. Note that p belongs to an infinite-dimensional Hilbert space and that, besides containing physical parameters such as the reaction and diffusion constants ki and di , it comprises non-physical data such as the penalization parameter ε. In order to carry out our analysis, it is convenient to rewrite the optimality system (2.6)–(2.8) plus the state equation as a generalized equation, involving a set-valued operator. We notice that the complementarity condition (2.8) together with (2.7) is equivalent to the variational inequality Z 0 T ξ(t)(u(t) − u(t)) dt ≤ 0 ∀u ∈ Uad . (2.12) This can also be expressed as ξ ∈ N (u) where N (u) = {v ∈ L2 (0, T ) : Z 0 T v (u − u) dt ≤ 0 for all u ∈ Uad } if u ∈ Uad , and N (u) = ∅ if u 6∈ Uad . This set-valued operator is known as the dual cone of Uad at u (after identification of L2 (0, T ) with its dual). To rewrite the remaining components of the optimality system into operator form, we introduce F : W (0, T ) × L2 (0, T ) × W (0, T ) × Q → Z with the target space Z given by Z = L2 (0, T ; H 1 (Ω)′ )2 × L2 (Ω)2 × L2 (0, T ) × L2 (0, T ; H 1 (Ω)′ )2 × L2 (Ω)2 . 4. Sensitivity Analysis for Reaction-Diffusion Problems 69 The components of F are given next. Wherever it appears, φ denotes an arbitrary function in L2 (0, T ; H 1(Ω)). For reasons of brevity, we introduce K = k1 λ1 + k2 λ2 . Z F1 (y, u, λ, p)(φ) = Z F2 (y, u, λ, p)(φ) = T 0 T 0 Z Z ∂ Kc2 φ dx dt ∇λ1 · ∇φ dx + − λ1 , φ + d1 ∂t Ω Ω Z Z ∂ − λ2 , φ + d2 Kc1 φ dx dt ∇λ2 · ∇φ dx + ∂t Ω Ω F3 (y, u, λ, p) = λ1 (T ) + β1 (c1 (T ) − c1T ) F4 (y, u, λ, p) = λ2 (T ) + β2 (c2 (T ) − c2T ) )2 Z ( Z T 3 α λ2 dx − u(t) dt − uc F5 (y, u, λ, p) = γ(u − ud ) + max 0, ε Γc 0 Z T Z Z ∂ c1 , φ + d1 ∇c1 · ∇φ dx + k1 c1 c2 φ dx dt ∂t 0 Ω Ω Z Z Z T ∂ c2 , φ + d2 ∇c2 · ∇φ dx + k2 c1 c2 φ dx dt F7 (y, u, λ, p)(φ) = ∂t Ω Ω 0 Z α u φ dx dt − F6 (y, u, λ, p)(φ) = Σ F8 (y, u, λ, p) = c1 (0) − c10 F9 (y, u, λ, p) = c2 (0) − c20 . At this point it is not difficult to see that the optimality system (2.6)–(2.8), including the state equation (2.1a)–(2.1b), (2.2a)–(2.2c) and (2.3a)–(2.3b), is equivalent to the generalized equation 0 ∈ F (y, u, λ, p) + N (u) (2.13) where we have set N (u) = (0, 0, 0, 0, N (u), 0, 0, 0, 0)⊤ ⊂ Z. In the next section, we will investigate the following linearization around a given solution (y ∗ , u∗ , λ∗ ) of (2.13) and for the given parameter p∗ . This linearization depends on a new parameter δ ∈ Z: y − y∗ δ ∈ F (y ∗ , u∗ , λ∗ , p∗ ) + F ′ (y ∗ , u∗ , λ∗ , p∗ ) u − u∗ + N (u). 
λ − λ∗ (2.14) Herein F ′ denotes the Fréchet derivative of F with respect to (y, u, λ). Note that F is the gradient of the Lagrangian L and F ′ is its Hessian whose ”upper-left block” was already mentioned in (2.10). 3. Properties of the Linearized Problem In order to become more familiar with the linearized generalized equation (2.14), we write it in its strong form, assuming smooth perturbations δ = (δ1 , . . . , δ5 ). For better readability, the given parameter p∗ is still denoted as in (2.11), without additional ∗ 70 Stability and Sensitivity Analysis in every component. We obtain from the linearizations of F1 through F4 : ∂ λ1 − d1 ∆λ1 + K c∗2 + K ∗ c2 = K ∗ c∗2 + δ1 ∂t ∂ − λ2 − d2 ∆λ2 + K c∗1 + K ∗ c1 = K ∗ c∗1 + δ2 ∂t ∂λ1 = δ1 |Σ d1 ∂n ∂λ2 = δ2 |Σ d2 ∂n λ1 (T ) = −β1 (c1 (T ) − c1T ) + δ3 − λ2 (T ) = −β2 (c2 (T ) − c2T ) + δ4 in Q, (3.1a) in Q, (3.1b) on Σ, (3.1c) on Σ, (3.1d) in Ω, (3.1e) in Ω, ∗ k1 λ∗1 where we have abbreviated K = k1 λ1 + k2 λ2 and K = + components F6 through F9 we obtain a linearized state equation: ∂ c1 − d1 ∆c1 + k1 c1 c∗2 + k1 c∗1 c2 = k1 c∗1 c∗2 + δ6 ∂t ∂ c2 − d2 ∆c2 + k2 c1 c∗2 + k2 c∗1 c2 = k2 c∗1 c∗2 + δ7 ∂t ∂c1 = δ6 |Σ d1 ∂n ∂c2 = α u + δ7 |Σ d2 ∂n c1 (0) = c10 + δ8 c2 (0) = c20 + δ9 (3.1f) k2 λ∗2 . From the in Q, (3.2a) in Q, (3.2b) on Σ, (3.2c) on Σ, (3.2d) in Ω, (3.2e) in Ω. (3.2f) Finally, the component F5 becomes the variational inequality Z T ξ(t)(u(t) − u(t)) dt ≤ 0 ∀u ∈ Uad (3.3) 0 where in analogy to the original problem, ξ ∈ L2 (0, T ) is defined through o2 Z n Z T 3 ∗ α λ2 dx − δ5 γ(u − ud ) + max 0, u (t) dt − uc − ε Γc 0 n Z T oZ T 6 u∗ (t) dt − uc (u(t) − u∗ (t)) dt + ξ(t) = 0. (3.4) + max 0, ε 0 0 In turn, the system (3.1)–(3.4) is easily recognized as the optimality system for an auxiliary linear quadratic optimization problem, which we term (AQP(δ)): Z Z 1 ∗ ∗ c1T c1 (T ) dx − β2 c2T c2 (T ) dx Minimize Lxx (x , λ )(x, x) − β1 2 Ω Ω o2 Z T n Z T 3 + max 0, u∗ (t) dt − uc u(t) dt ε 0 0 ! ! Z T n Z T o Z T 6 u∗ (t) dt − uc − max 0, u∗ (t) dt u(t) dt ε 0 0 0 Z T Z −γ ud u dt − (k1 λ∗1 + k2 λ∗2 )(c∗1 c2 + c1 c∗2 ) dx dt 0 Q − hδ1 , c1 i − hδ2 , c2 i − Ω Z Z Z δ3 c1 (T ) − Ω δ4 c2 (T ) − 0 T δ5 u dt (3.5) 4. Sensitivity Analysis for Reaction-Diffusion Problems 71 subject to the linearized state equation (3.2) above and u ∈ Uad . The bracket hδ1 , c1 i here denotes the duality between L2 (0, T ; H 1 (Ω)) and its dual L2 (0, T ; H 1 (Ω)′ ). In order for (AQP(δ)) to have a strictly convex objective and thus to have a unique solution, we require the following assumption: Assumption 3.1 (Coercivity Condition). We assume that there exists ρ > 0 such that Lxx (x∗ , λ∗ )(x, x) ≥ ρ kxk2Y ×U holds for all x = (c1 , c2 , u) ∈ Y × U which satisfy the linearized state equation (3.2) in weak form, with all right hand sides except the term α u replaced by zero. Sufficient conditions for Assumption 3.1 to hold are given in [9, Theorem 3.15]. We now prove our first result for the auxiliary problem (AQP(δ)): Proposition 3.2 (Lipschitz Stability for the Linearized Problem). Under Assumption 2.1, holding for the parameter p∗ , and Assumption 3.1, (AQP(δ)) has a unique solution which depends Lipschitz continuously on the parameter δ ∈ Z. That is, there exists L > 0 such that for all δ̂, δ̌ ∈ Z with corresponding solutions (x̂, λ̂) and (x̌, λ̌), kĉ1 − č1 kW (0,T ) + kĉ2 − č2 kW (0,T ) + kû − ǔkL2 (0,T ) + kλ̂1 − λ̌1 kW (0,T ) + kλ̂2 − λ̌2 kW (0,T ) ≤ L kδ̂ − δ̌kZ hold. Proof. The proof follows the technique of [18] and is therefore kept relatively short here. 
Throughout, we denote by capital letters the differences we wish to estimate, i.e., C1 = ĉ1 − č1 , etc. To improve readability, we omit the differentials dx and dt in integrals whenever possible. We begin by testing the weak form of the adjoint equation (3.1) by C1 and C2 , and testing the weak form of the state equation (3.2) by Λ1 and Λ2 , using integration by parts with respect to time and plugging in the initial and terminal conditions from (3.1) and (3.2). One obtains 2 Z 2 β1 kC1 (T )k + β2 kC2 k + 2 ∗ Z K C1 C2 + α UΛ Σ Z Z C2 (T )∆4 − hΛ1 , ∆6 i − hΛ2 , ∆7 i C1 (T )∆3 + = − hC1 , ∆1 i − hC2 , ∆2 i + Ω Ω Z Z − Λ1 (0)∆8 − Λ2 (0)∆9 . (3.6) Q Ω Ω From the variational inequality (3.3), using u = û or u = ǔ as test functions, we get Z − Σ 2 α U Λ2 ≤ −γ kU k + Z 0 ) Z !2 ( Z T T 6 ∗ U dt . (3.7) U ∆5 − max 0, u (t) dt − uc ε 0 0 T 72 Stability and Sensitivity Analysis Unless otherwise stated, all norms are the natural norms for the respective terms. Adding the inequality (3.7) to (3.6) above and collecting terms yields Lxx (x∗ , λ∗ )((C1 , C2 , U ), (C1 , C2 , U )) Z T Z Z C2 (T )∆4 + C1 (T )∆3 + U ∆5 ≤ − hC1 , ∆1 i − hC2 , ∆2 i + Ω Ω 0 Z Z Λ2 (0)∆9 Λ1 (0)∆8 − − hΛ1 , ∆6 i − hΛ2 , ∆7 i − Ω Ω 9 1 X k∆i k2 ≤ κ (1 + c2 ) kC1 k2 + kC2 k2 + kΛ1 k2 + kΛ2 k2 + κ kU k2 + 4κ i=1 (3.8) where the last inequality has been obtained using Hölder’s inequality, the embedding W (0, T ) ֒→ C([0, T ]; L2 (Ω)) and Young’s inequality in the form ab ≤ κa2 + b2 /(4κ). The number κ > 0 denotes a sufficiently small constant which will be determined later at our convenience. Here and throughout, generic constants are denoted by c. They may take different values in different locations. In order to make use of the Coercivity Assumption 3.1, we decompose Ci = zi + wi , i = 1, 2 and consider their respective equations, see (3.2). The z components account for the control influence while the w components arise from the perturbation differences ∆1 , . . . , ∆4 . We have on Q, Σ and Ω, respectively, ∂ ∂ z1 −d1 ∆z1 +k1 z1 c∗2 +k1 c∗1 z2 = 0, w1 −d1 ∆w1 +k1 w1 c∗2 + k1 c∗1 w2 = ∆6 ∂t ∂t ∂ ∂ z2 −d2 ∆z2 +k2 z1 c∗2 +k2 c∗1 z2 = 0, w2 −d2 ∆w2 +k2 w1 c∗2 + k2 c∗1 w2 = ∆7 ∂t ∂t ∂z1 ∂w1 d1 = 0, d1 = ∆6 |Σ ∂n ∂n ∂w2 ∂z2 = α U, d2 = ∆7 |Σ d2 ∂n ∂n z1 (0) = 0, w1 (0) = ∆8 z2 (0) = 0, w2 (0) = ∆9 . Note that for (z1 , z2 , U ), the Coercivity Assumption 3.1 applies and that standard a priori estimates yield kz1 k + kz2 k ≤ ckU k and kw1 k + kw2 k ≤ c(k∆6 k + k∆7 k + k∆8 k + k∆9 k). Using the generic estimates kzi k2 ≥ kCi k2 − 2kCi kkwi k + kwi k2 and kzi k ≤ kCi k + kwi k, the embedding W (0, T ) ֒→ C([0, T ]; L2 (Ω)) and the coercivity assumption, we obtain Lxx (x∗ , λ∗ )((C1 , C2 , U ), (C1 , C2 , U )) = Lxx (x∗ , λ∗ )((z1 , z2 , U ), (z1 , z2 , U )) Z Z β β1 + β1 z1 (T )w1 (T ) + β2 z2 (T )w2 (T ) + kw1 (T )k2 + kw2 (T )k2 2 2 Ω Z Ω ∗ K (w1 z2 + z1 w2 + w1 w2 ) + Q ≥ ρ kC1 k2 + kC2 k2 + kU k2 − 2ρ kC1 kkw1 k + kC2 kkw2 k − β1 c kw1 k kC1 k + kw1 k − β2 c kw2 k kC2 k + kw2 k − c kK ∗ kL2 (Q) kw1 kkC2 k + kC1 kkw2 k + 3kw1 kkw2 k . (3.9) 4. Sensitivity Analysis for Reaction-Diffusion Problems 73 For the last term, we have employed Hölder’s inequality and the embedding W (0, T ) ֒→ L4 (Q), see [4, p. 7]. 
Combining the inequalities (3.8) and (3.9) yields ρ kC1 k2 + kC2 k2 + kU k2 ≤ 2ρ kC1 kkw1 k + kC2 kkw2 k + β1 c kw1 k kC1 k + kw1 k + β2 c kw2 k kC2 k + kw2 k + c kK ∗ kL2 (Ω) kw1 kkC2 k + kC1 kkw2 k + 3kw1 kkw2 k 9 + κ κ 1 X k∆i k2 + (1 + c2 ) kC1 k2 + kC2 k2 + kΛ1 k2 + kΛ2 k2 + kU k2 8κ i=1 2 2 (3.10) and the last two terms can be absorbed in the left hand side when choosing κ > 0 sufficiently small and observing that Λ1 and Λ2 depend continuously on the data C1 and C2 . By the a priori estimate stated above, wi , i = 1, 2, can be estimated against the data ∆7 , . . . , ∆9 . Using again Young’s inequality on the terms kCi kkwj k and absorbing the quantities of type κkCi k2 into the left hand side, we obtain the Lipschitz dependence of (C1 , C2 , U ) on ∆1 , . . . , ∆9 . Invoking once more the continuous dependence of Λi on (C1 , C2 ), Lipschitz stability is seen to hold also for the adjoint variable. If (x∗ , λ∗ ) is a solution to the optimality system (2.6)–(2.8) and state equation, then the previous theorem implies that the generalized equation (2.13) is strongly regular at this solution, compare [15]. Before showing that the Coercivity Assumption 3.1 implies also directional differentiability of the solution of (AQP(δ)) in dependence on δ, we introduce the strongly active subsets for the solution (y ∗ , u∗ , λ∗ ) with multiplier ξ ∗ given by (2.7), A0− (u∗ ) = {t ∈ [0, T ] : ξ ∗ (t) < 0} A0+ (u∗ ) = {t ∈ [0, T ] : ξ ∗ (t) > 0} Note that necessarily u∗ = ua on A0− (u∗ ) and u∗ = ub on A0+ (u∗ ) hold in view of the variational inequality (2.12). Based on the notion of strongly active sets, we define bad , the set of admissible control variations: U 0 ∗ 0 ∗ u = 0 on A− (u ) ∪ A+ (u ) bad ⇔ u ∈ L2 (0, T ) and u ≥ 0 on A− (u∗ ) u∈U u ≤ 0 on A+ (u∗ ). This definition reflects the fact that if the solution u∗ associated to the parameter value p∗ is equal to the lower bound ua at some point t ∈ [0, T ], we can approach it only from above (and vice versa for the upper bound). In addition, if the control constraint is strongly active at some point, i.e., if it has a nonzero multiplier ξ ∗ there, the variation is zero. Proposition 3.3 (Differentiability for the Linearized Problem). Under Assumptions 2.1 and 3.1, the unique solution to (AQP(δ)) depends directionally differentiably on the parameter δ ∈ Z. The directional derivative in the direction of δ̂ ∈ Z is given by the solution of the auxiliary linear quadratic problem (DQP(δ̂)), Z Z 1 Minimize Lxx (x∗ , λ∗ )(x, x) − δ̂1 , c1 − δ̂2 , c2 − δ̂3 c1 (T ) − δ̂4 c2 (T ) 2 Ω Ω Z T δ̂5 u dt − 0 74 Stability and Sensitivity Analysis bad and the linearized state equation subject to u ∈ U ∂ c1 − d1 ∆c1 + k1 c1 c∗2 + k1 c∗1 c2 = δ̂6 ∂t ∂ c2 − d2 ∆c2 + k2 c1 c∗2 + k2 c∗1 c2 = δ̂7 ∂t ∂c1 = δ̂6 |Σ d1 ∂n ∂c2 = α u + δ̂7 |Σ d2 ∂n in Q (3.11a) in Q (3.11b) on Σ (3.11c) on Σ (3.11d) c1 (0) = δ̂8 in Ω (3.11e) c2 (0) = δ̂9 in Ω. (3.11f) Proof. Let δ̂ ∈ Z be any given direction of perturbation and let {τn } be a sequence of real numbers such that τn ց 0. We set δn = τn δ̂ and denote the solution of (AQP(δn )) by (cn1 , cn2 , un , λn1 , λn2 ). Note that (c∗1 , c∗2 , u∗ , λ∗1 , λ∗2 ) is the solution of (AQP(0)). Then, by virtue of Proposition 3.2, we have n c1 − c∗1 cn2 − c∗2 un − u∗ λn1 − λ∗1 λn2 − λ∗2 + + + + τn τn τn τn τn ≤ L kδ̂k (3.12) 2 in the norms of W (0, T ), L (0, T ), and Z, respectively, and with some Lipschitz constant L > 0. 
We can thus extract weakly convergent subsequences (still denoted by index n) and use the compact embedding of W (0, T ) into L2 (Q) to obtain un − u∗ ⇀ ũ in L2 (0, T ) (3.13) τn cn1 − c∗1 ⇀ ĉ1 in W (0, T ) and → ĉ1 in L2 (Q) (3.14) τn and similarly for the remaining components. Taking yet another subsequence, all components except the control are seen also to converge pointwise almost everywhere in Q. From here, we only sketch the remainder of the proof since it closely parallels the ones given in [6, 12]. In addition to the arguments given there, our analysis relies on the strong convergence (and thus pointwise convergence almost everywhere on [0, T ] of a subsequence) of Z Z λn2 − λ∗2 α λ̂2 in L2 (0, T ) (3.15) → α n τ Γc Γc which follows from the compact embedding of W (0, T ) into L2 (0, T ; H s (Ω)) for 1/2 < s < 1 (see (2.5)) and the continuity of the trace operator H s (Ω) → L2 (Γc ). One expresses un as the pointwise projection of un + ξ n /γ onto the admissible set Uad with ξ n given by (3.4) evaluated at (un , λn2 ). Using (3.13) and (3.15), one shows that (un − u∗ )/τ n possesses a pointwise convergent subsequence (still denoted by index n). Distinguishing cases, one finds the pointwise limit û of (un −u∗ )/τ n to be the pointwise bad . Using a suitable projection of limn→∞ (un + ξ n /γ) onto the new admissible set U upper bound in Lebesgue’s Dominated Convergence Theorem, one shows that û is also the limit in the sense of L2 (0, T ) and thus û = ũ must hold. It remains to show that the limit (ĉ1 , ĉ2 , û, λ̂1 , λ̂2 ) satisfy the first order optimality system for (DQP(δ̂)) (which is routine) and that the limits actually hold in their strong senses in W (0, T ) (which follows from standard a priori estimates). Since we could have started with a subsequence of τ n in the first place and since the limit (ĉ1 , ĉ2 , û, λ̂1 , λ̂2 ) must always be the same in view of the Coercivity Assumption 3.1, the convergence extends to the whole sequence. 4. Sensitivity Analysis for Reaction-Diffusion Problems 75 4. Properties of the Nonlinear Problem In the current section, we shall prove that the solutions to the original nonlinear generalized equation (2.13) depend on p in the same way as the solutions to the linearized generalized equation (2.14) depend on δ. To this end, we invoke an implicit function theorem for generalized equations. Throughout this section, let again p∗ be a given nominal (or unperturbed or expected) value of the parameter vector p = (d1 , d2 , k1 , k2 , β1 , β2 , γ, uc, ε, c10 , c20 , c1T , c2T , ud ) ∈ R9 × L2 (Ω)4 × L2 (0, T ) =: Π satisfying Assumption 2.1. Moreover, let (x∗ , λ∗ ) = (c∗1 , c∗2 , u∗ , λ∗1 , λ∗2 ) be a solution of the first order necessary conditions (2.6)–(2.8) plus the state equation, or, in other words, of the generalized equation (2.13). Theorem 4.1 (Lipschitz Continuity and Directional Differentiability). Under Assumptions 2.1 and 3.1, there exists a neighborhood B(p∗ ) ⊂ Π of p∗ and a neighborhood B(y ∗ , u∗ , λ∗ ) ⊂ Y × U × Y and a Lipschitz continuous function B(p∗ ) ∋ p 7→ (yp , up , λp ) ∈ B(y ∗ , u∗ , λ∗ ) such that (yp , up , λp ) solves the optimality system (2.6)–(2.8) plus the state equation for parameter p and such that it is the only critical point in B(y ∗ , u∗ , λ∗ ). Moreover, the map p 7→ (yp , up , λp ) is directionally differentiable, and its derivative in the direction p̂ ∈ Π is given by the unique solution of (DQP(δ̂)), in the direction of δ̂ = −Fp (y ∗ , u∗ , λ∗ , p∗ ) p̂. Proof. 
The proof is based on the implicit function theorem for generalized equations from [5, 15]. It relies on the strong regularity property, which was shown in Proposition 3.2. It remains to verify that F is Lipschitz in p near p∗ , uniformly in a neighborhood of (y ∗ , u∗ , λ∗ ), and that F is differentiable with respect to p, which is straightforward. The formula for its derivative is given in the remark below. Remark 4.2. In order to compute the parametric sensitivities of the nominal solution (c∗1 , c∗2 , u∗ , λ∗1 , λ∗2 ) for (P(p∗ )) in a perturbation direction p̂, we need to solve the linearquadratic problem (DQP(δ̂)) with δ̂ = − Fp (y ∗ , u∗ , λ∗ , p∗ ) p̂ Z = − dˆ1 ∇λ∗1 ∇ · +(k̂1 λ∗1 + k̂2 λ∗2 )c∗2 ·, Q β̂1 (c∗1 (T ) − c∗1T ) − β1∗ ĉ1T , dˆ2 Z Q ∇λ∗2 ∇ · +(k̂1 λ∗1 + k̂2 λ∗2 )c∗1 ·, β̂2 (c∗2 (T ) − c∗2T ) − β2∗ ĉ2T , 6 3ε̂ γ̂(u∗ − u∗d ) − γ ∗ ûd − ∗ 2 I 2 − ∗ I ûc , (ε ) ε Z Z Z Z k̂2 c∗1 c∗2 ·, k̂1 c∗1 c∗2 ·, dˆ2 ∇c∗2 · ∇ · + dˆ1 ∇c∗1 · ∇ · + Q − ĉ10 , Q Q − ĉ20 Q ⊤ , o n R T where I means max 0, 0 u∗ (t) dt − u∗c . We close this section by remarking that the parametric sensitivities allow to compute a second-order expansion of the value of the objective, see [6,12] for details. In addition, the Coercivity Assumption 3.1 implies that second order sufficient conditions hold at the nominal and also at the perturbed solutions, so that points satisfying the first order necessary conditions are indeed strict local optimizers. 76 Stability and Sensitivity Analysis 5. Numerical Results In this section, we present some numerical results and show evidence that the parametric sensitivities yield valuable information which is useful in making qualitative and quantitative estimates of the solution under perturbations. In our example, the three-dimensional geometry of the problem is given by the annular cylinder between the planes z = 0 and z = 0.5 with inner radius 0.4 and outer radius 1.0 whose rotational axis is the z-axis (Figure 5.1). The control boundary Γc is the upper annulus, and we use the control shape function α(t, x) = exp −5 (x1 − 0.7 cos(2πt))2 + (x2 − 0.7 sin(2πt))2 . which corresponds to a nozzle circling for t ∈ [0, 1] once around in counter-clockwise direction at a radius of 0.7. For fixed t, α is a function which decays exponentially with the square of the distance from the current location of the nozzle. The problem was discretized using the finite element method on a mesh consisting of 1797 points and 7519 tetrahedra. The ’triangulation’ of the domain Ω by tetrahedra is also shown in Figure 5.1. In the time direction, the interval [0, T ] was uniformly divided into 100 parts. By controlling the second substance B, we wish to steer the concentration of Figure 5.1. Domain Ω ⊂ R3 and its triangulation with tetrahedra the first substance A to zero at terminal time T = 1, i.e., we choose β1∗ = 1 β2∗ = 0 c∗1T ≡ 0 The control cost parameter is γ ∗ = 10−2 and the control bounds are chosen as ua ≡ 1 ub ≡ 5. The chemical reaction is governed by equations (2.1)–(2.3) with parameters d∗1 = 0.15 d∗2 = 0.20 k1∗ = 1.0 k2∗ = 1.0. As initial concentrations, we use c∗10 ≡ 1.0 c∗20 ≡ 0.0. The discrete optimal solution without the contribution from the penalized integral constraint J2 (corresponding to ε = ∞) yields Z T u∗ (t) dt = 4.2401, J1 (c∗1 , c∗2 , u∗ ) = 0.2413. 0 In order for this constraint to become relevant, we choose u∗c = 3.5 and enforce it using the penalization parameter ε∗ = 1. Details on the numerical implementation are given in [8, 9]. 
For the discretization described above, we obtain a problem size of approximately 726 000 variables, including the adjoint states, which takes a couple of minutes to solve on a standard desktop PC.

Figure 5.2. Left: Optimal control u∗ (thick solid), true perturbed control u_p (thin solid) and predicted control (circles). Right: Parametric sensitivity du_{p∗}/dp in the direction of p − p∗.

In Figures 5.3–5.4 (left columns) and Figure 5.2 (left), we show the individual components of the optimal solution. We note that the optimal control lies on the upper bound in the first part of the time interval, then in the interior of the admissible interval [1, 5], and finally on the lower bound. From Figure 5.3 (left) we infer that as time advances, substance A decays and approaches the desired value of zero to the extent permitted by the control cost parameter γ and the control bounds. Figure 5.4 (left) nicely shows the influence of the revolving control nozzle on the upper surface of the annular cylinder, adding amounts of substance B over time which then diffuse towards the interior of the reaction vessel and react with substance A.

In order to illustrate the sensitivity calculus, we perturb the reaction constants k1∗ and k2∗ by 50%, taking k1 = 1.5 and k2 = 1.5 as their new values. With the reaction now proceeding faster, one presumes that the desired goal of consuming substance A within the given time interval will be achieved to a higher degree, which will in fact be confirmed below from sensitivity information. Figure 5.2 (left) shows, next to the nominal control, the solution obtained by a first-order Taylor approximation using the sensitivity of the control variable, i.e.,

u_p ≈ u_{p∗} + (du_{p∗}/dp) (p − p∗).

To allow a comparison, the true perturbed solution is also depicted, which of course required the repeated solution of the nonlinear optimal control problem (P(p)). It is remarkable how well the perturbed solution can be predicted in the face of a 50% perturbation using the sensitivity information, without recomputing the solution to the nonlinear problem. We observe that the perturbed control is lower than the nominal one in the first part of the time interval, later to become higher. This behavior cannot easily be predicted without any sensitivity information at hand.

Besides, a qualitative analysis of the state sensitivities reveals more interesting information. We have argued above that with the reaction proceeding faster, the control goal can more easily be reached. This can be inferred from Figure 5.3 (right column), showing that the sensitivity derivatives of the first substance are negative throughout, i.e., the perturbed solution comes closer in a pointwise sense to the desired zero terminal state (to first order). The sensitivities for the second state component (see Figure 5.4, right column) nicely reflect the expected behavior inferred from the control sensitivities, see Figure 5.2 (right). As the perturbed control is initially lower than the unperturbed one after leaving the upper bound, the sensitivity of the second substance is below zero there. Later, it becomes positive, as does the sensitivity for the control variable.

Figure 5.3. Concentrations of substance A (left) and its sensitivity (right) at times t = 0.25, t = 0.50, t = 0.75, and t = 1.00.

Figure 5.4. Concentrations of substance B (left) and its sensitivity (right) at times t = 0.25, t = 0.50, t = 0.75, and t = 1.00.

References

[1] R. Adams. Sobolev Spaces. Academic Press, New York, 1975.
[2] M. Bergounioux, M. Haddou, M. Hintermüller, and K. Kunisch. A comparison of a Moreau-Yosida-based active set strategy and interior point methods for constrained optimal control problems. SIAM Journal on Optimization, 11(2):495–521, 2000.
[3] C. Büskens and R. Griesse. Parametric sensitivity analysis of perturbed PDE optimal control problems with state and control constraints. Journal of Optimization Theory and Applications, 131(1):17–35, 2006.
[4] E. DiBenedetto. Degenerate Parabolic Equations. Springer, Berlin, 1993.
[5] A. Dontchev. Implicit function theorems for generalized equations. Mathematical Programming, 70:91–106, 1995.
[6] R. Griesse. Parametric sensitivity analysis in optimal control of a reaction-diffusion system—Part I: Solution differentiability. Numerical Functional Analysis and Optimization, 25(1–2):93–117, 2004.
[7] R. Griesse. Parametric sensitivity analysis in optimal control of a reaction-diffusion system—Part II: Practical methods and examples. Optimization Methods and Software, 19(2):217–242, 2004.
[8] R. Griesse and S. Volkwein. A semi-smooth Newton method for optimal boundary control of a nonlinear reaction-diffusion system. In Proceedings of the Sixteenth International Symposium on Mathematical Theory of Networks and Systems (MTNS), Leuven, Belgium, 2004.
[9] R. Griesse and S. Volkwein. A primal-dual active set strategy for optimal boundary control of a nonlinear reaction-diffusion system. SIAM Journal on Control and Optimization, 44(2):467–494, 2005.
[10] M. Hintermüller, K. Ito, and K. Kunisch. The primal-dual active set strategy as a semismooth Newton method. SIAM Journal on Optimization, 13(3):865–888, 2002.
[11] K. Malanowski. Sensitivity analysis for parametric optimal control of semilinear parabolic equations. Journal of Convex Analysis, 9(2):543–561, 2002.
[12] K. Malanowski. Solution differentiability of parametric optimal control for elliptic equations. In E. W. Sachs and R. Tichatschke, editors, System Modeling and Optimization XX, Proceedings of the 20th IFIP TC 7 Conference, pages 271–285. Kluwer Academic Publishers, 2003.
[13] K. Malanowski and F. Tröltzsch. Lipschitz stability of solutions to parametric optimal control for parabolic equations. Journal of Analysis and its Applications, 18(2):469–489, 1999.
[14] K. Malanowski and F. Tröltzsch. Lipschitz stability of solutions to parametric optimal control for elliptic equations. Control and Cybernetics, 29:237–256, 2000.
[15] S. Robinson. Strongly regular generalized equations. Mathematics of Operations Research, 5(1):43–62, 1980.
[16] T. Roubíček and F. Tröltzsch. Lipschitz stability of optimal controls for the steady-state Navier-Stokes equations. Control and Cybernetics, 32(3):683–705, 2003.
[17] R. Temam. Navier-Stokes Equations, Theory and Numerical Analysis. North-Holland, Amsterdam, 1984.
[18] F. Tröltzsch. Lipschitz stability of solutions of linear-quadratic parabolic control problems with respect to perturbations. Dynamics of Continuous, Discrete and Impulsive Systems Series A: Mathematical Analysis, 7(2):289–306, 2000.
[19] M. Ulbrich. Semismooth Newton methods for operator equations in function spaces. SIAM Journal on Control and Optimization, 13(3):805–842, 2003.
CHAPTER 2

Numerical Methods and Applications

Besides their theoretical interest, the concepts of stability and sensitivity of optimization problems, and of optimal control problems in particular, have a number of applications. We address some of them in this chapter, along with numerical methods for the computation of sensitivity derivatives and related quantities.

First of all, Newton's method, when applied to a generalized equation, exhibits local quadratic convergence whenever the generalized equation 0 ∈ F(w) + N(w) is strongly regular (see Definition 0.7 on p. 11) and F is sufficiently smooth. In the context of optimal control problems, Newton's method amounts to an SQP (sequential quadratic programming) approach. Based on our Lipschitz stability results for optimal control problems with mixed control-state constraints (Section 2), we establish the local quadratic convergence of SQP for semilinear problems with such constraints in Section 5 below.

In addition, we have considered in Chapter 1 the differentiability of local optimal solutions of various optimal control problems with control constraints. These problems can be written in abstract form as

    Minimize J(y, u; π)   subject to   e(y, u; π) = 0   and   ua ≤ u ≤ ub a.e.        (Pcc(π))

Using the Implicit Function Theorem 0.6, we have shown in various cases the existence of a local map π ↦ Ξ(π) = (Ξy(π), Ξu(π), Ξp(π)) near the nominal parameter π0, which is Lipschitz and directionally differentiable. The computation of one directional derivative DΞ(π0; δπ) amounts to the solution of a linear-quadratic optimal control problem with the same type of control constraints, compare (DQP(δ, δ̂)) on p. 9, Theorem 4.1 of Griesse, Hintermüller, and Hinze [2005] (Section 3), or Theorem 4.1 of Griesse and Volkwein [2006] (Section 4).

We address in this chapter a number of questions related to these sensitivity derivatives:
(1) How can the solution of a perturbed problem Ξ(π) be recovered from the solution of the nominal problem Ξ(π0) and derivative information, as accurately as possible? (Section 6)
(2) What is the worst-case perturbation which has the greatest impact on the solution or on a quantity of interest depending on the solution? (Section 7)
(3) How can first and second-order derivatives of such a quantity of interest be evaluated efficiently? (Section 8)
(4) What is the relationship between the sensitivity derivatives of (Pcc(π)) and of its relaxation arising in interior point approaches? (Section 9)

5. Local Quadratic Convergence of SQP for Elliptic Optimal Control Problems with Mixed Control-State Constraints

R. Griesse, N. Metla and A. Rösch: Local Quadratic Convergence of SQP for Elliptic Optimal Control Problems with Mixed Control-State Constraints, submitted to: ESAIM: Control, Optimisation, and Calculus of Variations, 2007

In this paper, we show the local quadratic convergence behavior of the sequential quadratic programming (SQP) approach for the solution of semilinear elliptic optimal control problems of the type

    Minimize ∫Ω φ(x, y, u) dx
    subject to   A y + d(x, y) = u in Ω,   y = 0 on Γ,                                 (Pmcc)
                 u ≥ 0 in Ω,   and   ε u + y ≥ yc in Ω,

where A is a uniformly elliptic second-order differential operator and d is a monotone nonlinearity.
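Before turning to the analysis, it may help to fix the algorithmic picture: each SQP step linearizes the first-order optimality system at the current iterate and solves the resulting linear-quadratic subproblem, as made precise below. The following loop is a purely conceptual sketch and not the authors' code; solve_qp_subproblem and distance are hypothetical placeholders for a solver of the subproblem and for a norm on the iterate space.

# Conceptual sketch of the SQP (Lagrange-Newton) iteration studied in this
# section: each step performs one Newton step on the generalized equation
# 0 in F(w) + N(w) by solving a linearized (quadratic) subproblem at the
# current iterate w_k = (y_k, u_k, p_k, mu1_k, mu2_k).  Both function
# arguments are hypothetical placeholders, not part of the paper.
def sqp_iteration(w0, solve_qp_subproblem, distance, tol=1e-10, max_iter=25):
    """Run the SQP loop; 'history' collects the step lengths, which should
    shrink rapidly near a solution at which strong regularity holds."""
    w = w0
    history = []
    for _ in range(max_iter):
        w_new = solve_qp_subproblem(w)   # one linear-quadratic subproblem
        step = distance(w_new, w)
        history.append(step)
        w = w_new
        if step < tol:                   # heuristic stopping rule
            break
    return w, history

In the notation introduced later, one call to the subproblem solver corresponds to one of the quadratic programming subproblems treated in Section 5 of the paper.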
The SQP method was considered previously for optimal control problems with control constraints only, see for instance Unger [1997], Heinkenschloss and Tröltzsch [1998], Tröltzsch [1999], Tröltzsch and Volkwein [2001], Hintermüller and Hinze [2006] and Wachsmuth [2007]. The first-order optimality system for (Pmcc ) is a generalization of the system (2.1) for the linear-quadratic problem given on p. 29. It is re-written (see Section 4 of the paper) as a generalized equation (5.1) 0 ∈ F (w) + N (w), where in contrast to the previous cases, F now comprises also the inequality constraints, and w = (y, u, p, µ1 , µ2 ) comprises state, control and adjoint variables as well as Lagrange multipliers. This approach would allow nonlinear inequality constraints as well, which will be considered in an upcoming publication. Given a current iterate wk , Newton’s method, applied to (5.1), produces the new iterate as a solution of (5.2) 0 ∈ F (wk ) + F 0 (wk )(wk+1 − wk ) + N (wk+1 ). It can be verified that (5.2) is equivalent to one step of the SQP method (see Section 5 of the paper). However, for the convergence analysis it is convenient to think in terms of Newton’s method. Let us briefly outline how the local quadratic convergence is shown, compare Alt [1990, 1994]. Suppose that w∗ is a solution of (5.1). We write the Newton step (5.2) as a perturbed step taken at w∗ : (5.3) δ k+1 ∈ F (w∗ ) + F 0 (w∗ )(wk+1 − w∗ ) + N (wk+1 ) where δ k+1 := F (w∗ ) − F (wk ) + F 0 (w∗ )(wk+1 − w∗ ) − F 0 (wk )(wk+1 − wk ). By the fact that w∗ is a solution of (5.1), it also solves (5.4) 0 ∈ F (w∗ ) + F 0 (w∗ )(w∗ − w∗ ) + N (w∗ ), Under the condition that (5.1) is strongly regular at w∗ , we get from (5.3) and (5.4) that kwk+1 − w∗ k ≤ L kδ k+1 k (5.5) It remains to show that kδ k+1 k ≤ c1 kwk+1 − wk k2 + c2 kwk − w∗ kkwk+1 − w∗ k, 5. SQP for Mixed-Constrained Optimal Control Problems 83 which follows from differentiability and Lipschitz properties of F , i.e., from the properties of d and φ. Given that kwk − w∗ k is sufficiently small, the second term can be hidden in the left hand side of (5.5), which yields the local quadratic convergence. The strong regularity of (5.1) at w∗ follows from our results in Alt et al. [2006], see Section 2 of this thesis, under the assumption that the active sets at the solution w∗ are well separated and that second-order sufficient conditions hold, see Theorem 6.7 of the paper under discussion. 84 Numerical Methods and Applications LOCAL QUADRATIC CONVERGENCE OF SQP FOR ELLIPTIC OPTIMAL CONTROL PROBLEMS WITH MIXED CONTROL-STATE CONSTRAINTS ROLAND GRIESSE, NATALIYA METLA, AND ARND RÖSCH Abstract. Semilinear elliptic optimal control problems with pointwise control and mixed control-state constraints are considered. Necessary and sufficient optimality conditions are given. The equivalence of the SQP method and Newton’s method for a generalized equation is discussed. Local quadratic convergence of the SQP method is proved. 1. Introduction This paper is concerned with the local convergence analysis of the sequential quadratic programming (SQP) method for the following class of semilinear optimal control problems: Z Minimize f (y, u) := φ(ξ, y(ξ), u(ξ)) dξ (P) Ω ∞ subject to u ∈ L (Ω) and the elliptic state equation A y + d(ξ, y) = u y=0 in Ω, on ∂Ω, (1.1) as well as pointwise constraints u>0 εu + y > yc in Ω, in Ω. (1.2) Here and throughout, Ω is a bounded domain in RN , N ∈ {2, 3}, which is convex or has a C 1,1 boundary ∂Ω. 
In (1.1), A is an elliptic operator in H01 (Ω) specified below, and ε is a positive number. The bound yc is a function in L∞ (Ω). Problems with mixed control-state constraints are important as Lavrientiev-type regularizations of pointwise state-constrained problems [10–12], but they are also interesting in their own right. In the former case, ε is a small parameter tending to zero. For the purpose of this paper, we consider ε to be fixed. Note that in addition to the mixed control-state constraint, a pure control constraint is present on the same domain. Since problem (P) is nonconvex, different local minima may occur. SQP methods have proved to be fast solution methods for nonlinear programming problems. A large body of literature exists concerning the analysis of these methods for finite-dimensional problems. For a convergence analysis in a general Banach space setting with equality and inequality constraints, we refer to [2, 3]. The main contribution of this paper is the proof of local quadratic convergence of the SQP method, applied to (P). To our knowledge, such convergence results in the context of PDE-constrained optimization are so far only available for purely controlconstrained problems [7, 17, 19]. Following [2], we exploit the equivalence between the SQP and the Lagrange-Newton methods, i.e., Newton’s method, applied to a generalized (set-valued) equation representing necessary conditions of optimality. We concentrate on specific issues arising due to the semilinear state equation, e.g., the 5. SQP for Mixed-Constrained Optimal Control Problems 85 careful choice of suitable function spaces. An important step is the verification of the so-called strong regularity of the generalized equation, which is made difficult by the simultaneous presence of pure control and mixed control-state constraints (1.2). The key idea was recently developed in [4]. We remark that strong regularity is known to be closely related to second-order sufficient conditions (SSC). For problems with pure control constraints, SSC are well understood and they are close to the necessary ones when so-called strongly active subsets are used, see, e.g., [17, 19, 20]. However, the situation is more difficult for problems with mixed control-state constraints [14, 16] or even pure state constraints. In order to avoid a more technical discussion, we presently employ relatively strong SSC and refer to future work for their refinement. We also refer to an upcoming publication concerning the numerical application of the SQP method to problems of type (P). The material in this paper is organized as follows. In Section 2, we state our main assumptions and recall some properties about the state equation. Necessary and sufficient optimality conditions for problem (P) are stated in Section 3, and their reformulation as a generalized equation is given in Section 4. Section 5 addresses the equivalence of the SQP and Lagrange-Newton methods. Section 6 is devoted to the proof of strong regularity of the generalized equation. Finally, Section 7 completes the convergence analysis of the SQP method. A number of auxiliary results have been collected in the Appendix. We denote by Lp (Ω) and H m (Ω) the usual Lebesgue and Sobolev spaces [1], and (·, ·) is the scalar product in L2 (Ω) or [L2 (Ω)]N , respectively. H01 (Ω) is the subspace of H 1 (Ω) with zero boundary traces, and H −1 (Ω) is its dual. The continuous embedding of a normed space X into a normed space Y is denoted by X ,→ Y . 
Throughout, we denote by BrX (x) the open ball of radius r around x, in the topology of X. In particular, we write Br∞ (x) for the open ball with respect to the L∞ (Ω) norm. Throughout, c, c1 etc. denote generic positive constants whose value may change from instance to instance. 2. Assumptions and Properties of the State Equation The following assumptions (A1)–(A4) are taken to hold throughout the paper. Assumption. (A1) Let Ω be a bounded domain in RN , N ∈ {2, 3} which is convex or has C 1,1 boundary ∂Ω. The bound yc is in L∞ (Ω), and ε > 0. (A2) The operator A : H01 (Ω) → H −1 (Ω) is defined as A y(v) = a[y, v], where a[y, v] = ((∇v), A0 ∇y) + (b> ∇y, v) + (cy, v). A0 is an N × N matrix with Lipschitz continuous entries on Ω such that ρ>A0 (ξ) ρ > m0 |ρ|2 holds with some m0 > 0 for all ρ ∈ RN and almost all ξ ∈ Ω. Moreover, b ∈ L∞ (Ω)N and c ∈ L∞ (Ω). The bilinear form a[·, ·] is not necessarily symmetric but it is assumed to be continuous and coercive, i.e., a[y, v] 6 c kykH 1 (Ω) kvkH 1 (Ω) 2 a[y, y] > c kykH 1 (Ω) for all y, v ∈ H01 (Ω) with some positive constants c and c. A simple example is a[y, v] = (∇y, ∇v), corresponding to A = −∆. (A3) d(ξ, y) belongs to the C 2 -class of functions with respect to y for almost all ξ ∈ Ω. Moreover, dyy is assumed be a locally bounded and locally Lipschitzcontinuous function with respect to y, i.e., the following conditions hold true: 86 Numerical Methods and Applications there exists K > 0 such that |d(ξ, 0)| + |dy (ξ, 0)| + |dyy (ξ, 0)| 6 Kd , and for any M > 0, there exists Ld (M ) > 0 such that |dyy (ξ, y1 ) − dyy (ξ, y2 )| 6 Ld (M ) |y1 − y2 | a.e. in Ω for all y1 , y2 ∈ R satisfying |y1 |, |y2 | 6 M . Additionally dy (ξ, y) > 0 a.e. in Ω, for all y ∈ R. (A4) The function φ = φ(ξ, y, u) is measurable with respect to ξ ∈ Ω for each y and u, and of class C 2 with respect to y and u for almost all ξ ∈ Ω. Moreover, the second derivatives are assumed to be locally bounded and locally Lipschitz-continuous functions, i.e., the following conditions hold: there exist Ky , Ku , Kyu > 0 such that |φ(ξ, 0, 0)| + |φy (ξ, 0, 0)| + |φyy (ξ, 0, 0)| 6 Ky , |φyu (ξ, 0, 0)| 6 Kyu , |φ(ξ, 0, 0)| + |φu (ξ, 0, 0)| + |φuu (ξ, 0, 0)| 6 Ku , Moreover, for any M > 0, there exists Lφ (M ) > 0 such that |φyy (ξ, y1 , u1 ) − φyy (ξ, y2 , u2 )| 6 Lφ (M ) |y1 − y2 | + |u1 − u2 | , |φyu (ξ, y1 , u1 ) − φyu (ξ, y2 , u2 )| 6 Lφ (M ) |y1 − y2 | + |u1 − u2 | , |φuy (ξ, y1 , u1 ) − φuy (ξ, y2 , u2 )| 6 Lφ (M ) |y1 − y2 | + |u1 − u2 | , |φuu (ξ, y1 , u1 ) − φuu (ξ, y2 , u2 )| 6 Lφ (M ) |y1 − y2 | + |u1 − u2 | for all yi , ui ∈ R satisfying |yi |, |ui | 6 M , i = 1, 2. In addition, φuu (ξ, y, u) > m > 0 a.e. in Ω, for all (y, u) ∈ R2 . In the sequel, we will simply write d(y) instead of d(ξ, y) etc. As a consequence of (A3)–(A4), the Nemyckii operators d(·) and φ(·) are twice continuously Fréchet differentiable with respect to the L∞ (Ω) norms, and their derivatives are locally Lipschitz continuous, see Lemma A.1. The necessity of using L∞ (Ω) norms for general nonlinearities d and φ motivates our choice Y := H 2 (Ω) ∩ H01 (Ω) ∞ as a state space, since Y ,→ L (Ω). Remark 2.1. In case Ω has only a Lipschitz boundary, our results remain true when Y is replaced by H01 (Ω) ∩ L∞ (Ω). Recall that a function y ∈ H01 (Ω) ∩ L∞ (Ω) is called a weak solution of (1.1) with u ∈ L2 (Ω) if a[y, v] + (d(y), v) = (u, v) holds for all v ∈ H01 (Ω). Lemma 2.2. Under assumptions (A1)–(A3) and for any given u ∈ L2 (Ω), the semilinear equation (1.1) possesses a unique weak solution y ∈ Y . 
It satisfies the a priori estimate kykH 1 (Ω) + kykL∞ (Ω) 6 CΩ kukL2 (Ω) + 1 with a constant CΩ independent of u. Proof. The existence and uniqueness of a weak solution y ∈ H01 (Ω) ∩ L∞ (Ω) is a standard result [18, Theorem 4.8]. It satisfies kykH 1 (Ω) + kykL∞ (Ω) 6 CΩ (kukL2 (Ω) + 1) =: M with some constant CΩ independent of u. Lemma A.1 implies that d(y) ∈ L∞ (Ω). Using the embedding L∞ (Ω) ,→ L2 (Ω), we conclude that the difference u−d(y) belongs to L2 (Ω). Owing to assumption (A1), y ∈ H 2 (Ω), see for instance [6, Theorem 2.2.2.3]. 5. SQP for Mixed-Constrained Optimal Control Problems 87 We will frequently also need the corresponding result for the linearized equation A y + dy (y) y = u in Ω, y = 0 on ∂Ω. (2.1) Lemma 2.3. Under assumptions (A1)–(A3) and given y ∈ L∞ (Ω), the linearized PDE (2.1) possesses a unique weak solution y ∈ Y for any given u ∈ L2 (Ω). It satisfies the a priori estimate kykH 2 (Ω) 6 CΩ (y) kukL2 (Ω) with a constant CΩ (y) independent of u. Proof. According to (A3) and Lemma A.1, dy (y) is a nonnegative coefficient in L∞ (Ω). The claim thus follows again from standard arguments, see, e.g., [6, Theorem 2.2.2.3]. 3. Necessary and Sufficient Optimality Conditions In this section, we introduce necessary and sufficient optimality conditions for problem (P). For convenience, we define the Lagrange functional L : Y × L∞ (Ω) × Y × L∞ (Ω) × L∞ (Ω) → R as L(y, u, p, µ1 , µ2 ) = f (y, u) + a[y, p] + (p, d(y) − u) − (µ1 , u) − (µ2 , εu + y − yc ). Here, µi are Lagrange multipliers associated to the inequality constraints, and p is the adjoint state. The existence of regular Lagrange multipliers µ1 , µ2 ∈ L∞ (Ω) was shown in [15, Theorem 7.3], which implies the following lemma: Lemma 3.1. Suppose that (y, u) ∈ Y × L∞ (Ω) is a local optimal solution of (P). Then there exist regular Lagrange multipliers µ1 , µ2 ∈ L∞ (Ω) and an adjoint state p ∈ Y such that the first-oder necessary optimality conditions Ly (y, u, p, µ1 , µ2 ) = 0, Lu (y, u, p, µ1 , µ2 ) = 0, Lp (y, u, p, µ1 , µ2 ) = 0, u > 0, µ1 > 0, µ1 u = 0, (FON) εu + y − yc > 0, µ2 > 0, µ2 (εu + y − yc ) = 0 hold. Remark 3.2. The Lagrange multipliers and adjoint state associated to a local optimal solution of (P) need not be unique if the active sets {ξ ∈ Ω : u = 0} and {ξ ∈ Ω : εu + y − yc = 0} intersect nontrivially. This situation will be excluded by Assumption (A6) below. Conditions (FON) are also stated in explicit form in (4.1) below. To guarantee that x = (y, u) with associated multipliers λ = (µ1 , µ2 , p) is a local solution of (P), we introduce the following second-order sufficient optimality condition (SSC): There exists a constant α > 0 such that 2 Lxx (x, λ)(δx, δx) > α kδxk[L2 (Ω)]2 (3.1) for all δx = (δy, δu) ∈ Y × L∞ (Ω) which satisfy the linearized equation Aδy + dy (y) · δy = δu δy = 0 in Ω, on ∂Ω. In (3.1), the Hessian of the Lagrange functional is given by Z > δy δy φyy (y, u) + dyy (y) p φyu (y, u) Lxx (x, λ)(δx, δx) := dξ. φuy (y, u) φuu (y, u) δu Ω δu (3.2) 88 Numerical Methods and Applications For convenience, we will use the abbreviation X := Y × L∞ (Ω) = H 2 (Ω) ∩ H01 (Ω) × L∞ (Ω) in the sequel. Assumption. (A5) We assume that x∗ = (y ∗ , u∗ ) ∈ X, together with associated Lagrange multipliers λ∗ = (p∗ , µ∗1 , µ∗2 ) ∈ Y × [L∞ (Ω)]2 , satisfies both (FON) and (SSC). As mentioned in the introduction, we are aware of the fact that there exist weaker sufficient conditions which take into account strongly active sets. 
However, this further complicates the convergence analysis of SQP and is therefore postponed to later work. Definition 3.3. (a) A pair x = (y, u) ∈ X is called an admissible point if it satisfies (1.1) and (1.2). (b) A point x̄ ∈ X is called a strict local optimal solution in the sense of L∞ (Ω) if there exists ε > 0 such that the inequality f (x̄) < f (x) holds for all admissible x ∈ X \ {x̄} with kx − x̄k[L∞ (Ω)]2 6 ε. Theorem 3.4. Under Assumptions (A1)–(A5), there exists β > 0 and ε > 0 such that 2 f (x) > f (x∗ ) + β kx − x∗ k[L2 (Ω)]2 holds for all admissible x ∈ X with kx − x∗ k[L∞ (Ω)]2 6 ε. In particular, x∗ is a strict local optimal solution in the sense of L∞ (Ω). Proof. The proof uses the two-norm discrepancy principle, see [8, Theorem 3.5]. Let x ∈ X be an admissible point, which implies a[y, p∗ ] + (p∗ , d(y) − u) = 0 and u > 0, εu + y − yc > 0 a.e. in Ω. In view of µ∗1 , µ∗2 > 0, we can estimate the cost functional f by the Lagrange functional: f (x) > f (x) + a[y, p∗ ] + (p∗ , d(y) − u) − (µ∗1 , u) − (µ∗2 , εu + y − yc ) = L(x, λ∗ ). (3.3) The Lagrange functional is twice continuously differentiable with respect to the L∞ (Ω) norms, as is easily seen from Lemma A.1. Hence it possesses a Taylor expansion L(x, λ∗ ) = L(x∗ , λ∗ ) + Lx (x∗ , λ∗ )(x − x∗ ) + Lxx (x + θ(x − x∗ ), λ∗ )(x − x∗ , x − x∗ ) for all x ∈ X, where θ ∈ (0, 1). Since the pair (x∗ , λ∗ ) satisfies (FON), we have f (x∗ ) = L(x∗ , λ∗ ) + Lx (x∗ , λ)(x − x∗ ), which implies L(x, λ∗ ) = f (x∗ ) + Lxx (x∗ , λ∗ )(x − x∗ , x − x∗ ) + Lxx (x∗ + θ(x − x∗ ), λ∗ ) − Lxx (x∗ , λ∗ ) (x − x∗ , x − x∗ ). We cannot use (SSC) directly since x satisfies the semilinear equation (1.1) instead of the linearized one (3.2). However, Lemma A.2 implies that there exist ε > 0 and α0 > 0 such that 2 L(x, λ∗ ) > f (x∗ ) + α0 kx − x∗ k[L2 (Ω)]2 + Lxx (x∗ + θ(x − x∗ ), λ∗ ) − Lxx (x∗ , λ∗ ) (x − x∗ , x − x∗ ), (3.4) 5. SQP for Mixed-Constrained Optimal Control Problems 89 given that kx − x∗ k[L∞ (Ω)]2 6 ε. Moreover, the Hessian of the Lagrange functional satisfies the following local Lipschitz condition (see Lemma A.1 and also [18, Lemma 4.24]): | Lxx (x∗ + θ(x − x∗ ), λ∗ ) − Lxx (x∗ , λ∗ ) (x − x∗ , x − x∗ )| 2 6 c kx − x∗ k[L∞ (Ω)]2 kx − x∗ k[L2 (Ω)]2 (3.5) for all kx − x∗ k[L∞ (Ω)]2 6 ε. Summarizing (3.3)–(3.5), we can estimate 2 f (x) > f (x∗ ) + β kx − x∗ k[L2 (Ω)]2 , where β := α0 − c kx − x∗ k[L∞ (Ω)]2 > α0 − c ε > 0 when ε is taken sufficiently small. 4. Generalized Equation We recall the necessary optimality conditions (FON) for problem (P), which read in explicit form a[v, p] + (dy (y)p, v) + (φy (y, u), v) − (µ2 , v) = 0, v ∈ H01 (Ω) (φu (y, u), v) − (p, v) − (µ1 , v) − (εµ2 , v) = 0, v ∈ L2 (Ω) 1 (4.1) a[y, v] + (d(y), v) − (u, v) = 0, v ∈ H0 (Ω) µ1 > 0, u > 0, µ1 u = 0 a.e. in Ω. µ2 > 0, εu + y − yc > 0, µ2 (εu + y − yc ) = 0 As was mentioned in the introduction, the local convergence analysis of SQP is based on its interpretation as Newton’s method for a generalized (set-valued) equation 0 ∈ F (y, u, p, µ1 , µ2 ) + N (y, u, p, µ1 , µ2 ) (4.2) equivalent to (4.1). We define K := {µ ∈ L∞ (Ω) : µ > 0 a.e. in Ω}, the cone of nonnegative functions in L∞ (Ω), and the dual cone N1 : L∞ (Ω) −→ P (L∞ (Ω)), ( {z ∈ L∞ (Ω) : (z, µ − ν) > 0 ∀ν ∈ K} if µ ∈ K, N1 (µ) := ∅ if µ 6∈ K. 
Here P (L∞ ) denotes the power set of L∞ (Ω), i.e., the set of all subsets In (4.2), F contains the single-valued part of (4.1), i.e., a[·, p] + (dy (y) p, ·) + (φy (y, u), ·) − (µ2 , ·) φu (y, u) − p − µ1 − εµ2 a[y, ·] + (d(y), ·) − (u, ·) F (y, u, p, µ1 , µ2 )(·) = u εu + y − yc of L∞ (Ω). and N is a set-valued function N (y, u, p, µ1 , µ2 ) = ({0}, {0}, {0}, N1 (µ1 ), N1 (µ2 ))> . Note that the generalized equation (4.2) is nonlinear, since it contains the nonlinear functions d, dy , φy and φu . 90 Numerical Methods and Applications Remark 4.1. Let W := Y × L∞ (Ω) × Y × L∞ (Ω) × L∞ (Ω), Z := L2 (Ω) × L∞ (Ω) × L2 (Ω) × L∞ (Ω) × L∞ (Ω). Then F : W −→ Z and N : W −→ P (Z). Owing to Assumptions (A3) and (A4), F is continuously Fréchet differentiable with respect to the L∞ (Ω) norms, see Lemma A.1. Lemma 4.2. The first-order necessary conditions (4.1) and the generalized equation (4.2) are equivalent. Proof. (4.2) ⇒ (4.1): This is immediate for the first three components. For the fourth component we have − u ∈ N1 (µ1 ) ⇒ µ1 ∈ K ⇒ µ1 (ξ) > 0 This implies and (−u, µ1 − ν) > 0 and for all ν ∈ K − u(ξ)(µ1 (ξ) − ν) > 0 µ1 (ξ) = 0 ⇒ for all ν > 0, a.e. in Ω. u(ξ) > 0 µ1 (ξ) > 0 ⇒ u(ξ) = 0, which shows the first complementarity system in (4.1). The second follows analogously. (4.1) ⇒ (4.2): This is again immediate for the first three components. From the first complementarity system in (4.1) we infer that u(ξ) ν > 0 for all ν > 0, ⇒ − u(ξ)(µ1 (ξ) − ν) > 0 ⇒ − (u, µ1 − ν) > 0 a.e. in Ω for all ν > 0, a.e. in Ω for all ν ∈ K. In view of µ1 ∈ K, this implies −u ∈ N1 (µ1 ). Again, −(εu + y − yc ) ∈ N1 (µ2 ) follows analogously. 5. SQP Method In this section we briefly recall the SQP (sequential quadratic programming) method for the solution of problem (P). We also discuss its equivalence with Newton’s method, applied to the generalized equation (4.2), which is often called the Lagrange-Newton approach. Throughout the rest of the paper we use the notation wk := (xk , λk ) = (y k , uk , pk , µk1 , µk2 ) ∈ W to denote an iterate of either method. SQP methods break down the solution of (P) into a sequence of quadratic programming problems. At any given iterate wk , one solves 1 (QPk ) Minimize fx (xk )(x − xk ) + Lxx (xk , λk )(x − xk , x − xk ) 2 subject to x = (y, u) ∈ Y × L∞ (Ω), the linear state equation A y + d(y k ) + dy (y k )(y − y k ) = u y=0 in Ω, on ∂Ω, (5.1) and inequality constraints u > 0 in Ω, εu + y − yc > 0 in Ω. The solution (which needs to be shown to exist) x = (y, u) ∈ Y × L∞ (Ω), (5.2) 5. SQP for Mixed-Constrained Optimal Control Problems 91 together with the adjoint state and Lagrange multipliers λ = (p, µ1 , µ2 ) ∈ Y × L∞ (Ω) × L∞ (Ω), will serve as the next iterate wk+1 . Lemma 5.1. There exists R > 0 such that (QPk ) has a unique global solution x = ∞ ∗ ∗ (y, u) ∈ X, provided that (xk , pk ) ∈ BR (x , p ). Proof. For every u ∈ L2 (Ω), the linearized PDE (5.1) has a unique solution y ∈ Y by Lemma 2.3. We define the feasible set M k := {x = (y, u) ∈ Y × L2 (Ω) satisfying (5.1) and (5.2)}. The set M k is non-empty, which follows from [4, Lemma 2.3] using δ3 = −d(y k ) + dy (y k ) y k . The proof uses the maximum principle for the differential operator Ay + dy (y k ) y. Clearly, M k is also closed and convex. The cost functional of (QPk ) can be decomposed into quadratic and affine parts in x. 
Lemma A.3 shows that there exists R > 0 and α00 > 0 such that 2 Lxx (xk , λk ) x, x > α00 kxk[L2 (Ω)]2 for all (y, u) ∈ X satisfying A y + dy (y k ) y = u in Ω with homogeneous Dirichlet ∞ ∗ ∗ (x , p ). This implies that the boundary conditions, provided that (xk , pk ) ∈ BR cost functional is uniformly convex, continuous (i.e., weakly lower semicontinuous) and radially unbounded, which shows the unique solvability of (QPk ) in Y × L2 (Ω). Using the optimality system (5.3) below, we can conclude as in [4, Lemma 2.7] that u ∈ L∞ (Ω). The solution (y, u) of (QPk ) and its Lagrange multipliers (p, µ1 , µ2 ) are characterized by the first order optimality system (compare [4, Lemma 2.5]): a[v, p] + (dy (y k ) p, v) + (φy (y k , uk ), v) + (φyu (y k , uk )(u − uk ), v) k k k k k 1 + (φyy (y , u ) + dyy (y ) p )(y − y ), v − (µ2 , v) = 0, v ∈ H0 (Ω) k k k k k (φu (y , u ), v) + (φuu (y , u )(u − u ), v) k k k 2 +(φuy (y , u )(y − y ), v) − (p, v) − (µ1 , v) − (εµ2 , v) = 0, v ∈ L (Ω) k k k 1 a[y, v] + (d(y ), v) + (dy (y )(y − y ), v) − (u, v) = 0, v ∈ H0 (Ω) µ1 > 0, u > 0, µ1 u = 0 a.e. in Ω. µ2 > 0, εu + y − yc > 0, µ2 (εu + y − yc ) = 0 (5.3) Note that due to the convexity of the cost functional, (5.3) is both necessary and ∞ ∗ ∗ sufficient for optimality, provided that (xk , pk ) ∈ BR (x , p ). Remark 5.2. The Lagrange multipliers (µ1 , µ2 ) and the adjoint state p in (5.3) need not be unique, compare [4, Remark 2.6]. Non-uniqueness can occur only if µ1 and µ2 are simulateneously nonzero on a set of positive measure. We recall for convenience the generalized equation (4.2), 0 ∈ F (w) + N (w). (5.4) Given the iterate wk , Newton’s method yields the next iterate wk+1 as the solution of the linearized generalized equation 0 ∈ F (wk ) + F 0 (wk )(w − wk ) + N (w). (5.5) Analogously to Lemma 4.2, one can show: Lemma 5.3. System (5.3) and the linearized generalized equation (5.5) are equivalent. 92 Numerical Methods and Applications 6. Strong Regularity The local convergence analysis of Newton’s method (5.5) for the solution of (5.4) is based on a perturbation argument. It will be carried out in Section 7. The main ingredient in the proof is the local Lipschitz stability of solutions w = w(η) of 0 ∈ F (η) + F 0 (η)(w − η) + N (w) (6.1) ∗ with respect to the parameter η near w . The difficulty arises due to the fact that η enters nonlinearly in (6.1). Therefore, we employ an implicit function theorem due to Dontchev [5] to derive this result. This theorem requires the so-called strong regularity of (5.4), i.e., the Lipschitz stability of solutions w = w(δ) of δ ∈ F (w∗ ) + F 0 (w∗ )(w − w∗ ) + N (w) (6.2) with respect to the new perturbation parameter δ, which enters linearly. The parameter δ belongs to the image space of F Z := L2 (Ω) × L∞ (Ω) × L2 (Ω) × L∞ (Ω) × L∞ (Ω), see Remark 4.1. Note that w∗ is a solution of both (5.4) and (6.2) for δ = 0. Definition 6.1 (see [13]). The generalized equation (5.4) is called strongly regular at w∗ if there exist radii r1 > 0, r2 > 0 and a positive constant Lδ such that for all perturbations δ ∈ BrZ1 (0), the following hold: (1) the linearized equation (6.2) has a solution wδ = w(δ) ∈ BrW2 (w∗ ) (2) wδ is the only solution of (6.2) in BrW2 (w∗ ) (3) wδ satisfies the Lipschitz condition kwδ − wδ0 kW 6 Lδ kδ − δ 0 kZ for all δ, δ 0 ∈ BrZ1 (0). 
The verification of strong regularity is based on the interpretation of (6.2) as the optimality system of the following QP problem, which depends on the perturbation δ: 1 (LQP(δ)) Minimize fx (x∗ )(x − x∗ ) + Lxx (x∗ , λ∗ ) x − x∗ , x − x∗ 2 − [δ1 , δ2 ], x − x∗ subject to x = (y, u) ∈ Y × L∞ (Ω), the linear state equation A y + d(y ∗ ) + dy (y ∗ )(y − y ∗ ) = u + δ3 in Ω, y=0 on ∂Ω, (6.3) and inequality constraints u > δ4 in Ω, εu + y − yc > δ5 in Ω. (6.4) As before, it is easy to check that the necessary optimality conditions of (LQP(δ)) are equivalent to (6.2). Lemma 6.2. For any δ ∈ Z, problem (LQP(δ)) possesses a unique global solution xδ = (yδ , uδ ) ∈ X. If λδ = (pδ , µ1,δ , µ2,δ ) ∈ Y × L∞ (Ω) × L∞ (Ω) are associated Lagrange multipliers, then (xδ , λδ ) satisfies (6.2). On other hand, if any (xδ , λδ ) ∈ W satisfies (6.2), then xδ is the unique global solution of (LQP(δ)), and λδ are associated adjoint state and Lagrange multipliers. Proof. For any given δ ∈ Z, let us denote by Mδ the set of all x = (y, u) ∈ Y × L2 (Ω) satisfying (6.3) and (6.4). Then Mδ is nonempty (as can be shown along the lines 5. SQP for Mixed-Constrained Optimal Control Problems 93 of [4, Lemma 2.3]), convex and closed. Moreover, (A5) implies that the cost functional fδ (x) of (LQP(δ)) satisfies α 2 fδ (x) > kxk[L2 (Ω)]2 + linear terms in x 2 for all x satisfying (6.3). As in the proof of Lemma 5.1, we conclude that (LQP(δ)) has a unique solution xδ = (yδ , uδ ) ∈ X. Suppose that λδ = (pδ , µ1,δ , µ2,δ ) ∈ Y × L∞ (Ω) × L∞ (Ω) are associated Lagrange multipliers, i.e., the necessary optimality conditions of (LQP(δ)) are satisfied. As argued above, it is easy to check that then (6.2) holds. On the other hand, suppose that any (xδ , λδ ) ∈ W satisfies (6.2), i.e., the necessary optimality conditions of (LQP(δ)). As fδ is strictly convex, these conditions are likewise sufficient for optimality, and the minimizer xδ is unique. The proof of Lipschitz stability of solutions for problems of type (LQP(δ)) has recently been achieved in [4]. The main difficulty consisted in overcoming the nonuniqueness of the associated adjoint state and Lagrange multipliers. We follow the same technique here. Definition 6.3. Let σ > 0 be real number. We define two subsets of Ω, S1σ = {ξ ∈ Ω : 0 6 u∗ (ξ) 6 σ} S2σ = {ξ ∈ Ω : 0 6 εu∗ (ξ) + y ∗ (ξ) − yc (ξ) 6 σ}, called the security sets of level σ for (P). Assumption. (A6) We require that S1σ ∩ S2σ = ∅ for some fixed σ > 0. From now on, we suppose (A1)–(A6) to hold. Assumption (A6) implies that the active sets A∗1 = {ξ ∈ Ω : u∗ (ξ) = 0} A∗2 = {ξ ∈ Ω : εu∗ (ξ) + y ∗ (ξ) − yc (ξ) = 0} are well separated. This in turn implies the uniqueness of the Lagrange multipliers and adjoint state (p∗ , µ∗1 , µ∗2 ). Due to a continuity argument, the same conclusions hold for the solution and Lagrange multipliers of (LQP(δ)) for sufficiently small δ, as proved in the following theorem. Theorem 6.4. There exist G > 0 and Lδ > 0 such that kδkZ 6 G σ implies: (1) The Lagrange multipliers λδ = (pδ , µ1,δ , µ2,δ ) for (LQP(δ)) are unique. (2) For any such δ and δ 0 , the corresponding solutions and Lagrange multipliers of (LQP(δ)) satisfy kxδ0 − xδ kY ×L∞ (Ω) + kλδ0 − λδ kY ×L∞ (Ω)×L∞ (Ω) 6 Lδ kδ 0 − δkZ . (6.5) Proof. The proof employs the technique introduced in [4], so we will only revisit the main steps here. In contrast to the linear quadratic problem considered in [4], the cost functional and PDE in (LQP(δ)) are slightly more general. 
To overcome potential non-uniqueness of Lagrange multipliers, one introduces an auxiliary problem with solutions (yδaux , uaux δ ), in which the inequality constraints (6.4) are considered only on the disjoint sets S1σ and S2σ , respectively. Then the associated Lagrange multipliers aux µaux are unique, see [4, Lemma 3.1]. For any two i,δ , i = 1, 2, and adjoint state pδ 0 perturbations δ, δ ∈ Z we abbreviate δu := uaux − uaux δ δ0 94 Numerical Methods and Applications and similarly for the remaining quantitites. From the optimality conditions of the auxiliary problem one deduces 2 2 (A5) α (kδykL2 (Ω) + kδukL2 (Ω) ) 6 Lxx (y ∗ , u∗ )(δx, δx) = (δ10 − δ1 , δy) + (δ20 − δ2 , δu) − (δ30 − δ3 , δp) + (δµ2 , δy) + (δµ1 , δu) + ε (δµ2 , δu) 6 (δ10 − δ1 , δy) + (δ20 − δ2 , δu) − (δ30 − δ3 , δp) + (δµ1 , δ40 − δ4 ) + (δµ2 , δ50 − δ5 ). The last inequality follows from [4, Lemma 3.3]. Young’s inequality yields α 2 2 (kδykL2 (Ω) + kδukL2 (Ω) ) 2 2 1 2 2 2 2 6 max , kδ − δ 0 k[L2 (Ω)]5 + κ kδpkL2 (Ω) + kδµ1 kL2 (Ω) + kδµ2 kL2 (Ω) , (6.6) α 4κ where κ > 0 is specified below. The difference of the adjoint states satisfies a[v, δp] + (dy (y ∗ ) δp, v) = −(φyy (y ∗ , u∗ ) δy, v) − (dyy (y ∗ ) p∗ δy, v) − (φyu (y ∗ , u∗ ) δu, v) + (δ1 − δ10 , v) + (δµ2 , v) for all v ∈ H01 (Ω). δµ1 = and εδµ2 = (6.7) The differences in the Lagrange multipliers are given by φuu (y ∗ , u∗ ) δu + φuy (y ∗ , u∗ ) δy − δp − (δ2 − δ20 ) 0 in S1σ in Ω \ S1σ (6.8) φuu (y ∗ , u∗ ) δu + φuy (y ∗ , u∗ ) δy − δp − (δ2 − δ20 ) 0 in S2σ , in Ω \ S2σ (6.9) The substitution of δµ2 into (6.7) yields 1 a[v, δp] + (dy (y ∗ ) δp, v) + (δp, χS2σ · v) ε = −(φyy (y ∗ , u∗ ) δy, v) − (dyy (y ∗ ) p∗ , δy) − φyu (y ∗ , u∗ ) δu + (δ1 − δ10 , v) 1 1 + (φuu (y ∗ , u∗ ) δu, χS2σ · v) + (φuy (y ∗ , u∗ ) δy, χS2σ · v) − (δ2 − δ20 , χS2σ · v). ε ε A standard a priori estimate (compare Lemma 2.3) implies kδpkL2 (Ω) 6 kδpkY 6 c kδykL2 (Ω) + kδukL2 (Ω) + kδ1 − δ10 kL2 (Ω) + kδ2 − δ20 kL2 (Ω) . From (6.8) and (6.9), we infer that kδµ1 kL2 (Ω) and kδµ2 kL2 (Ω) can be estimated a similar expression. Plugging these estimates into (6.6), and choosing κ sufficiently small, we get 2 2 2 kδykL2 (Ω) + kδukL2 (Ω) 6 caux kδ − δ 0 k[L2 (Ω)]5 . By a priori estimates for the linearized and adjoint PDEs, we immediately obtain Lipschitz stability for δy and thus for δp with respect to the H 2 (Ω)-norm. The projection formula (compare [4, Lemma 2.7] and also Lemma A.1) n yc + δ5 − yδaux aux ∗ ∗ µaux − u∗ 1,δ + εµ2,δ = max 0, φuu (y , u ) max δ4 , ε o + φuy (y ∗ , u∗ ) (yδaux − y ∗ ) + φu (y ∗ , u∗ ) − paux − δ2 δ aux yields the L∞ (Ω)-regularity for the Lagrange multipliers (µaux 1,δ , µ2,δ ) and the control aux uδ . As in [4, Lemma 3.5], we conclude kδµ1 + ε δµ2 kL∞ (Ω) 6 c kδ 0 − δkZ . 5. SQP for Mixed-Constrained Optimal Control Problems 95 From the optimality system we have φuu (y ∗ , u∗ ) δu = δµ1 + ε δµ2 − φuy (y ∗ , u∗ ) δy + δp + (δ2 − δ20 ), which implies by Assumption (A4) m kδukL∞ (Ω) 6 c kδµ1 + ε δµ2 kL∞ (Ω) + kδykL∞ (Ω) + kδpkL∞ (Ω) + kδ2 − δ20 kL∞ (Ω) and yields the desired L∞ -stability for the control of the auxiliary problem. As in [4, Lemma 4.1], one shows that for kδkZ 6 G σ (for a certain constant G > 0), the solution (yδaux , uaux δ ) of the auxiliary problem coincides with the solution of (LQP(δ)). Likewise, the Lagrange multipliers and adjoint states of both problems coincide and are Lipschitz stable in L∞ (Ω) and Y , respectively (see [4, Lemma 4.4]). Remark 6.5. Theorem 6.4, together with Lemma 6.2, proves the strong regularity of (5.4) at w∗ . 
In order to apply the implicit function theorem, we verify that (6.1) satisfies a Lipschitz condition with respect to η, uniformly in a neighborhood of w∗ . Lemma 6.6. For any radii r3 > 0, r4 > 0 there exists L > 0 such that for any η1 , η2 ∈ BrW3 (w∗ ) and for all w ∈ BrW4 (w∗ ) there holds the Lipschitz condition kF (η1 ) + F 0 (η1 )(w − η1 ) − F (η2 ) − F 0 (η2 )(w − η2 )kZ 6 L kη1 − η2 kW . (6.10) Proof. Let us denote ηi = (yi , ui , pi , µi1 , µi2 ) ∈ BrW3 (w∗ ) and w = (y, u, p, µ1 , µ2 ) ∈ BrW4 (w∗ ), with r3 , r4 > 0 arbitrary. A simple calculation shows F (η1 ) + F 0 (η1 )(w − η1 ) − F (η2 ) − F 0 (η2 )(w − η2 ) = (f1 (y1 , u1 ) − f1 (y2 , u2 ), f2 (y1 , u1 ) − f2 (y2 , u2 ), f3 (y1 ) − f3 (y2 ), 0, 0)> , where f1 (yi , ui ) = dy (yi ) p + φy (yi , ui ) + [φyy (yi , ui ) + dyy (yi ) pi ](y − yi ) + φyu (yi , ui )(u − ui ) f2 (yi , ui ) = φu (yi , ui ) + φuy (yi , ui )(y − yi ) + φuu (yi , ui )(u − ui ) f3 (yi ) = d(yi ) + dy (yi )(y − yi ). We consider only the Lipschitz condition for f3 , the rest follows analogously. Using the triangle inequality, we obtain kf3 (y1 ) − f3 (y2 )kL2 (Ω) 6 kd(y1 ) − d(y2 )kL2 (Ω) + kdy (y1 )(y2 − y1 )kL2 (Ω) + k(dy (y1 ) − dy (y2 ))(y − y2 )kL2 (Ω) 6 kd(y1 ) − d(y2 )kL2 (Ω) + kdy (y1 )kL∞ (Ω) ky2 − y1 kL2 (Ω) + kdy (y1 ) − dy (y2 )kL∞ (Ω) ky − y2 kL2 (Ω) . The properties of d, see Lemma A.1, imply that kdy (y1 )kL∞ (Ω) is uniformly bounded for all y1 ∈ Br∞3 (y ∗ ). Moreover, ky − y2 kL2 (Ω) 6 ky − y ∗ kL2 (Ω) + ky ∗ − y2 kL2 (Ω) 6 c (r3 + r4 ) holds. Together with the Lipschitz properties of d and dy , see again Lemma A.1, we obtain kf3 (y1 ) − f3 (y2 )kL2 (Ω) 6 L ky1 − y2 kL∞ (Ω) for some constant L > 0. Using Theorem 6.4 and Lemma 6.6, the main result of this section follows directly from Dontchev’s implicit function theorem [5, Theorem 2.1]: 96 Numerical Methods and Applications Theorem 6.7. There exist radii r5 > 0, r6 > 0 such that for any parameter η ∈ BrW5 (w∗ ), there exists a solution w(η) ∈ BrW6 (w∗ ) of (6.1), which is unique in this neighborhood. Moreover, there exists a constant Lη > 0 such that for each η1 , η2 ∈ BrW5 (w∗ ), the Lipschitz estimate kw(η1 ) − w(η2 )kW 6 Lη kη1 − η2 kW holds. 7. Local Convergence Analysis of SQP This section is devoted to the local quadratic convergence analysis of the SQP method. As was shown in Section 5, the SQP method is equivalent to Newton’s method (5.5), applied to the generalized equation (5.4). It is convenient to carry out the convergence analysis on the level of generalized equations. As mentioned in the previous section, the key property is the local Lipschitz stability of solutions w(η) of (6.1) and w(δ) of (6.2), as proved in Theorems 6.7 and 6.4, respectively. In the proof of our main result, the iterates wk are considered perturbations of the solution w∗ of (5.4) and play the role of the parameter η. We recall the function spaces W := Y × L∞ (Ω) × Y × L∞ (Ω) × L∞ (Ω) Y := H 2 (Ω) ∩ H01 (Ω) Z := L2 (Ω) × L∞ (Ω) × L2 (Ω) × L∞ (Ω) × L∞ (Ω) Theorem 7.1. There exists a radius r > 0 and a constant CSQP > 0 such that for each starting point w0 ∈ BrW (w∗ ), the sequence of iterates wk generated by (5.5) is well-defined in BrW (w∗ ) and satisfy k+1 2 w − w∗ W 6 CSQP wk − w∗ W . Proof. Suppose that the iterate wk ∈ BrW (w∗ ) is given. The radius r satisfying r5 > r > 0 will be specified below. From Theorem 6.7, we infer the existence of a solution wk+1 of (5.5) which is unique in BrW6 (w∗ ). 
That is, we have 0 ∈ F (w∗ ) + F 0 (w∗ )(w∗ − w∗ ) + N (w∗ ), k 0 k 0 ∈ F (w ) + F (w )(w k+1 k − w ) + N (w (7.1a) k+1 ). (7.1b) Adding and subtracting the terms F 0 (w∗ )(wk+1 − w∗ ) and F (w∗ ) to (7.1b), we obtain δ k+1 ∈ F (w∗ ) + F 0 (w∗ )(wk+1 − w∗ ) + N (wk+1 ) (7.2) where δ k+1 := F (w∗ ) − F (wk ) + F 0 (w∗ )(wk+1 − w∗ ) − F 0 (wk )(wk+1 − wk ). From Lemma 6.6 with η1 := w∗ , η2 := wk , w := wk+1 , and r3 := r5 , r4 := r6 , we get k+1 δ 6 L wk − w∗ < L r, (7.3) Z W k+1 6 G σ holds whenever where L depends only on the radii. That is, δ Z r6 Gσ , L which we impose on r. Lemma 6.2 shows that (7.1a) and (7.2) are equivalent to problem (LQP(δ)) for δ = 0 and δ = δ k+1 , respectively. From Theorem 6.4, we thus obtain k+1 w − w∗ W 6 Lδ δ k+1 − 0Z . (7.4) 5. SQP for Mixed-Constrained Optimal Control Problems 97 It remains to verify that δ k+1 Z is quadratic in wk − w∗ W . We estimate k+1 δ 6 F (w∗ ) − F (wk ) − F 0 (wk )(w∗ − wk ) Z Z + (F 0 (w∗ ) − F 0 (wk ))(wk+1 − w∗ )Z . As kin the proof of Theorem 3.4, the first term is bounded by a constant times w − w∗ 2 ∞ 5 . Moreover, the Lipschitz properties of the terms in F 0 imply that [L (Ω)] the second term is bounded by a constant times wk − w∗ ∞ 5 wk+1 − w∗ 2 5 . [L (Ω)] We thus conclude k+1 δ 6 c1 wk − w∗ 2 + c2 wk − w∗ wk+1 − w∗ , Z W W W [L (Ω)] (7.5) where the constants depend only on the radius r5 . We finally choose r as o n Gσ 1 . , r = min r5 , L Lδ max{2 c2 , c1 + c2 Lδ L} Then (7.3)–(7.5) imply wk+1 ∈ BrW (w∗ ) since k+1 w − w∗ W < Lδ c1 r + c2 wk+1 − w∗ W r 6 Lδ c1 + c2 Lδ L r2 6 r. Moreover, (7.4)–(7.5) yield k+1 2 w − w∗ W 6 Lδ c1 wk − w∗ W + c2 Lδ r wk+1 − w∗ W and thus k+1 2 w − w∗ W 6 CSQP wk − w∗ W holds with CSQP = Lδ c1 1−c2 Lδ r . Clearly, Theorem 7.1 proves the local quadratic convergence of the SQP method. Recall that the iterates wk are defined by means of Theorem 6.7, as the local unique solutions, Lagrange multipliers and adjoint states of (QPk ). Indeed, we can now prove that wk+1 = (xk+1 , λk+1 ) is globally unique, provided that wk is already sufficiently close to w∗ . Corollary 7.2. There exists a radius r0 > 0 such that wk ∈ BrW0 (w∗ ) implies that (QPk ) has a unique global solution xk+1 . The associated Lagrange multipliers and adjoint state λk+1 = (µk+1 , µk+1 , pk+1 ) are also unique. The iterate wk+1 lies again 1 2 W ∗ ∗ in Br0 (x , λ ). Proof. We first observe that Theorem 7.1 remains valid (with the same constant CSQP ) if r is taken to be smaller than chosen in the proof. Here, we set o n σ , R, r , r0 = min σ, c∞ + ε where R and r are the radii from Lemma 5.1 and Theorem 7.1, respectively, and c∞ is the embedding constant of H 2 (Ω) ,→ L∞ (Ω). Suppose that wk ∈ BrW0 (w∗ ) holds. Then Lemma 5.1 implies that (QPk ) possesses a globally unique solution xk+1 ∈ Y × L∞ (Ω). The corresponding active sets are defined by Ak+1 := {ξ ∈ Ω : uk+1 (ξ) = 0} 1 Ak+1 := {ξ ∈ Ω : εuk+1 (ξ) + y k+1 (ξ) − yc (ξ) = 0}. 2 We show that A1k+1 ⊂ S1σ and Ak+1 ⊂ S2σ . For almost every ξ ∈ Ak+1 , we have 2 1 ∗ ∗ ∗ k+1 k+1 0 u (ξ) = u (ξ) − u (ξ) 6 u − u 6 r 6 σ, L∞ (Ω) 98 Numerical Methods and Applications since Theorem 7.1 implies that wk+1 ∈ BrW0 (w∗ ) and thus in particular uk+1 ∈ Br∞0 (u∗ ). By the same argument, for almost every ξ ∈ Ak+1 we obtain 2 y ∗ (ξ) + ε u∗ (ξ) − yc (ξ) = y ∗ (ξ) + ε u∗ (ξ) − y k+1 (ξ) − ε uk+1 (ξ) 6 y ∗ − y k+1 ∞ + ε u∗ − uk+1 ∞ L (Ω) L (Ω) 0 6 (c∞ + ε) r 6 σ. 
Owing to Assumption (A6), the active sets Ak+1 and Ak+1 are disjoint, and one can 1 2 k+1 show as in [4, Lemma 3.1] that the Lagrange multipliers µ1 , µk+1 and adjoint state 2 pk+1 are unique. 8. Conclusion We have studied a class of distributed optimal control problems with semilinear elliptic state equation and a mixed control-state constraint as well as a pure control constraint on the domain Ω. We have assumed that (y ∗ , u∗ ) is a solution and (p∗ , µ∗1 , µ∗2 ) are Lagrange multipliers which satisfy second-order sufficient optimality conditions (A5). Moreover, the active sets at the solution were assumed to be well separated (A6). We have shown the local quadratic convergence of the SQP method towards this solution. In particular, we have proved that the quadratic subproblems possess global unique solutions and unique Lagrange multipliers. Appendix A. Auxiliary Results In this appendix we collect some auxiliary results. We begin with a standard result for the Nemyckii operators d(·) and φ(·) whose proof can be found, e.g., in [18, Lemma 4.10, Satz 4.20]. Throughout, we impose Assumptions (A1)–(A5). Lemma A.1. The Nemyckii operator d(·) maps L∞ (Ω) into L∞ (Ω) and it is twice continuously differentiable in these spaces. For arbitrary M > 0, the Lipschitz condition kdyy (y1 ) − dyy (y2 )kL∞ (Ω) 6 Ld (M ) ky1 − y2 kL∞ (Ω) holds for all yi ∈ L∞ (Ω) such that kyi kL∞ (Ω) 6 M , i = 1, 2. In particular, kdyy (y)kL∞ (Ω) 6 Kd + Ld (M ) M holds for all y ∈ L∞ (Ω) such that kykL∞ (Ω) 6 M . The same properties, with different constants, are valid for dy (·) and d(·). Analogous results hold for φ and its derivatives up to second-order, for all (y, u) ∈ [L∞ (Ω)]2 such that kyi kL∞ (Ω) + kui kL∞ (Ω) 6 M . The remaining results address the coercivity of the second derivative of the Lagrangian, considered at different lienarization points and for perturbed PDEs. Recall that (x∗ , λ∗ ) ∈ W satisfies the second-order sufficient conditions (SSC) with coercivity constant α > 0, see (3.1). Lemma A.2. There exists ε > 0 and α0 > 0 such that 2 Lxx (x∗ , λ∗ )(x − x∗ , x − x∗ ) > α0 kx − x∗ k[L2 (Ω)]2 (A.1) holds for all x = (y, u) ∈ Y × L∞ (Ω) which satisfy the semilinear PDE (1.1) and kx − x∗ k[L∞ (Ω)]2 6 ε. Proof. Let x = (y, u) satisfy (1.1). We define δx = (δy, δu) ∈ Y × L∞ (Ω) as A δy + dy (y ∗ ) δy = δu on Ω with homogeneous Dirichlet boundary conditions. Then the error e := y ∗ − y − δy satisfies the linear PDE A e + dy (y ∗ ) e = f on Ω (A.2) 5. SQP for Mixed-Constrained Optimal Control Problems 99 with homogeneous Dirichlet boundary conditions and f := d(y) − d(y ∗ ) − dy (y ∗ )(y − y ∗ ). We estimate kf kL2 (Ω) Z 1 ∗ ∗ ∗ ∗ = dy (y + s(y − y )) − dy (y ) ds (y − y ) 0 L2 (Ω) Z 1 6L s ds ky − y ∗ kL∞ (Ω) ky − y ∗ kL2 (Ω) 0 L 6 ky − y ∗ kL∞ (Ω) kδykL2 (Ω) + kekL2 (Ω) . 2 In view of Lemma A.1, dy (y ∗ ) ∈ L∞ (Ω) holds and it is a standard result that the unique solution e of (A.2) satisfies an a priori estimate kekL∞ (Ω) 6 c kf kL2 (Ω) . In view of the embedding L∞ (Ω) ,→ L2 (Ω) we obtain Lε kδykL2 (Ω) + kekL2 (Ω) . 2 For sufficiently small ε > 0, we can absorb the last term in the left hand side and obtain kekL2 (Ω) 6 c00 (ε) kδykL2 (Ω) kekL2 (Ω) 6 c0 where c00 (ε) & 0 as ε & 0. A straightforward application of [9, Lemma 5.5] concludes the proof. Lemma A.3. There exists R > 0 and α00 > 0 such that 2 Lxx (xk , λk )(x, x) > α00 kxk[L2 (Ω)]2 holds for all (y, u) ∈ Y × L2 (Ω): A y + dy (y k ) y = u in Ω (A.3) y=0 on ∂Ω, k provided that x − x∗ L∞ (Ω) + pk − p∗ L∞ (Ω) < R. Proof. 
Let (y, u) be an arbitrary pair satisfying (A.3) and define ŷ ∈ Y as the unique solution of A ŷ + dy (y ∗ ) ŷ = u in Ω ŷ = 0 on ∂Ω, for the same control u as above. Then δy := y − ŷ satisfies A δy + dy (y ∗ ) δy = dy (y ∗ ) − dy (y k ) y in Ω with homogeneous boundary conditions. A standard a priori estimate and the triangle inequality yield kδykL2 (Ω) 6 dy (y ∗ ) − dy (y k )L∞ (Ω) kykL2 (Ω) 6 dy (y ∗ ) − dy (y k )L∞ (Ω) kŷkL2 (Ω) + kδykL2 (Ω) . Due to the Lipschitz property of dy (·) with respect to L∞ (Ω), there exists a function c(R) tending to 0 as R → 0, such that dy (y ∗ ) − dy (y k )L∞ (Ω) 6 c(R), provided that k y − y ∗ ∞ < R. For sufficiently small R, the term kδyk 2 can be absorbed in L (Ω) the left hand side, and we obtain kδykL2 (Ω) 6 c0 (R) kŷkL2 (Ω) , L (Ω) 100 Numerical Methods and Applications where c0 (R) has the same property as c(R). Again, [9, Lemma 5.5] implies that there exists α0 > 0 and R > 0 such that 2 L00xx (x∗ , λ∗ )(x, x) > α0 kxkL2 (Ω) , provided that y k − y ∗ L∞ (Ω) < R. Note that L00xx depends only on x and the adjoint state p. Owing to its Lipschitz property, we further conclude that L00xx (xk , λk )(x, x) = L00xx (x∗ , λ∗ )(x, x) + L00xx (xk , λk ) − L00xx (x∗ , λ∗ ) (x, x) 2 2 > α0 kxkL2 (Ω) − L (xk , pk ) − (x∗ , p∗ )L∞ (Ω) kxkL2 (Ω) 2 2 > α0 − L R kxkL2 (Ω) =: α00 kxkL2 (Ω) , ∞ ∗ ∗ given that (xk , pk ) ∈ BR (x , p ). For sufficiently small R, we obtain α00 > 0, which completes the proof. Acknowledgement This work was supported by the Austrian Science Fund FWF under project number P18056-N12. References [1] R. Adams. Sobolev Spaces. Academic Press, New York-London, 1975. Pure and Applied Mathematics, Vol. 65. [2] W. Alt. The Lagrange-Newton method for infinite-dimensional optimization problems. Numerical Functional Analysis and Optimization, 11:201–224, 1990. [3] W. Alt. Local convergence of the Lagrange-Newton method with applications to optimal control. Control and Cybernetics, 23(1–2):87–105, 1994. [4] W. Alt, R. Griesse, N. Metla, and A. Rösch. Lipschitz stability for elliptic optimal control problems with mixed control-state constraints. submitted, 2006. [5] A. Dontchev. Implicit function theorems for generalized equations. Mathematical Programming, 70:91–106, 1995. [6] P. Grisvard. Elliptic Problems in Nonsmooth Domains. Pitman, Boston, 1985. [7] M. Heinkenschloss and F. Tröltzsch. Analysis of the Lagrange-SQP-Newton Method for the Control of a Phase-Field Equation. Control Cybernet., 28:177–211, 1998. [8] H. Maurer. First and Second Order Sufficient Optimality Conditions in Mathematical Programming and Optimal Control. Mathematical Programming Study, 14:163–177, 1981. Mathematical programming at Oberwolfach (Proc. Conf., Math. Forschungsinstitut, Oberwolfach, 1979). [9] H. Maurer and J. Zowe. First and second order necessary and sufficient optimality conditions for infinite-dimensional programming problems. Mathematical Programming, 16:98–110, 1979. [10] C. Meyer, U. Prüfert, and F. Tröltzsch. On two numerical methods for state-constrained elliptic control problems. Optimization Methods and Software, to appear. [11] C. Meyer, A. Rösch, and F. Tröltzsch. Optimal control of PDEs with regularized pointwise state constraints. Computational Optimization and Applications, 33(2–3):209–228, 2005. [12] C. Meyer and F. Tröltzsch. On an elliptic optimal control problem with pointwise mixed controlstate constraints. In A. Seeger, editor, Recent Advances in Optimization. 
Proceedings of the 12th French-German-Spanish Conference on Optimization, volume 563 of Lecture Notes in Economics and Mathematical Systems, pages 187–204, New York, 2006. Springer. [13] S. Robinson. Strongly regular generalized equations. Mathematics of Operations Research, 5(1):43–62, 1980. [14] A. Rösch and F. Tröltzsch. Sufficient second-order optimality conditions for a parabolic optimal control problem with pointwise control-state constraints. SIAM Journal on Control and Optimization, 42(1):138–154, 2003. [15] A. Rösch and F. Tröltzsch. Existence of regular Lagrange multipliers for elliptic optimal control problems with pointwise control-state constraints. SIAM Journal on Control and Optimization, 45(2):548–564, 2006. [16] A. Rösch and F. Tröltzsch. Sufficient second-order optimality conditions for an elliptic optimal control problem with pointwise control-state constraints. SIAM Journal on Optimization, 17(3):776–794, 2006. 5. SQP for Mixed-Constrained Optimal Control Problems 101 [17] F. Tröltzsch. On the Lagrange-Newton-SQP method for the optimal control of semilinear parabolic equations. SIAM Journal on Control and Optimization, 38(1):294–312, 1999. [18] F. Tröltzsch. Optimale Steuerung partieller Differentialgleichungen. Theorie, Verfahren und Anwendungen. Vieweg, Wiesbaden, 2005. [19] F. Tröltzsch and S. Volkwein. The SQP method for control constrained optimal control of the Burgers equation. ESAIM: Control, Optimisation and Calculus of Variations, 6:649–674, 2001. [20] F. Tröltzsch and D. Wachsmuth. Second-order sufficient optimality conditions for the optimal control of Navier-Stokes equations. ESAIM: Control, Optimisation and Calculus of Variations, 12(1):93–119, 2006. 102 Numerical Methods and Applications 6. Update Strategies for Perturbed Nonsmooth Equations R. Griesse, T. Grund and D. Wachsmuth: Update Strategies for Perturbed Nonsmooth Equations, to appear in: Optimization Methods and Software, 2007 This paper addresses the question how the optimal control of a perturbed problem (with parameter π) can be recovered from the optimal control of the nominal problem (with parameter π0 ), and from derivative information. Our analysis is carried out in a general setting where the unknown function u is the solution of a nonsmooth equation (6.1) u = ΠUad g(π) − G(π) u . Here G(π) is a linear and monotone operator with smoothing properties, and π is a perturbation parameter. We denote the unique solution of (6.1) by Ξu (π), see Lemma 3.1. Example: In the context of an optimal control problem such as (Pcc (δ)) on p. 7, δ plays the role of π and we have G(π) = S ? S/γ, where S is the solution operator of the PDE, S ? is its adjoint, and g(π) = S ? yd /γ. One of the results of this paper (see Theorem 4.2) is the Bouligand differentiability of the projection ΠUad between Lp spaces with a norm gap, which generalizes a previous result in Malanowski [2003b]. The directional derivative of the projection ΠUad is given by another projection whose upper and lower bounds are either zero or ±∞, depending on whether the projection is active or not. This was already observed in Theorem 0.4, see p. 9. This norm gap is responsible for the observation that the Taylor expansion (6.2) Ξu (π0 ) + DΞu (π0 ; π − π0 ) does not yield error estimates in L∞ , and neither does the modification (6.3) ΠUad Ξu (π0 ) + DΞu (π0 ; π − π0 ) , see Theorem 7.1. Note that, in contrast to (6.2), the expression (6.3) produces a feasible estimate for the solution Ξu (π) of the perturbed problem. 
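To illustrate the structure of this directional derivative, the following Python sketch evaluates the pointwise projection onto [a, b] and its directional (Bouligand) derivative on vectors of nodal values. The grid, the bounds a, b and the sample functions phi0 and v are illustrative assumptions; only the case distinction (derivative zero on the strongly active set, a one-sided cone projection in the boundary cases, the identity elsewhere) reflects the result quoted above.

    import numpy as np

    def proj(u, a, b):
        # pointwise projection onto the admissible interval [a, b]
        return np.clip(u, a, b)

    def proj_derivative(phi0, v, a, b):
        # Directional derivative of the projection at phi0 in direction v; it is again a
        # pointwise projection whose bounds are 0 or +/- infinity, depending on activity.
        # (The degenerate case a(x) = b(x) is not treated in this sketch.)
        phi0 = np.asarray(phi0, dtype=float)
        v = np.asarray(v, dtype=float)
        d = v.copy()
        d[(phi0 < a) | (phi0 > b)] = 0.0        # strictly active: derivative vanishes
        lo = np.isclose(phi0, a)
        hi = np.isclose(phi0, b)
        d[lo] = np.maximum(0.0, v[lo])          # at the lower bound: projection onto [0, inf)
        d[hi] = np.minimum(0.0, v[hi])          # at the upper bound: projection onto (-inf, 0]
        return d

    # Illustrative nodal data on a one-dimensional grid.
    x = np.linspace(0.0, 1.0, 201)
    a, b = -1.0, 1.0
    phi0 = 1.5 * np.sin(2.0 * np.pi * x)        # nominal argument of the projection
    v = 0.3 * np.cos(2.0 * np.pi * x)           # perturbation direction
    u0 = proj(phi0, a, b)
    du = proj_derivative(phi0, v, a, b)
    print(float(np.max(np.abs(du[np.abs(phi0) > 1.0]))))   # 0.0: no update on the strongly active set

The vanishing of the derivative on the strongly active set is precisely the mechanism which, as discussed next, prevents the primal updates from capturing changes of the active set.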
We propose in this paper an alternative update strategy, which uses an adjoint variable given by the solution of

(6.4) φ = g(π) − G(π) ΠUad(φ).

The essential observation here is that the order of the projection and smoothing operations is reversed with respect to (6.1). Primal and adjoint variables are related by u = ΠUad(φ) and φ = g(π) − G(π) u. We denote the unique solution of (6.4) by Ξφ(π) and propose the update formula

(6.5) ΠUad(Ξφ(π0) + DΞφ(π0; π − π0)).

We are then able to prove L∞ error estimates for (6.5), see Theorem 7.1 of the paper. We also show that the nominal solution Ξu(π0) of (6.1) as well as the derivative DΞu(π0; π − π0) can be efficiently computed by a generalized (semismooth) Newton method, see Bergounioux, Ito, and Kunisch [1999], Hintermüller, Ito, and Kunisch [2002], Ulbrich [2003]. It turns out that the adjoint quantities Ξφ(π0) and DΞφ(π0; π − π0) appear naturally in the Newton iterations and thus incur no additional work, see Section 6 of the paper.

As our main application, we re-interpret these update strategies in the context of optimal control problems with control constraints (Section 8). Suppose that the optimal control u and the adjoint state p are related by u = ΠUad(p/γ), as for instance in the model problem (Pcc(δ)), see p. 7. Then p = γφ holds, and our proposed strategy (6.5) amounts to the update formula

(6.6) ΠUad((Ξp(π0) + DΞp(π0; π − π0))/γ)

based on the adjoint state. We also note that (6.2) and (6.3) lack the ability to accurately predict the behavior of the active sets under the change from π0 to π. The reason is that DΞu(π0; ·) is zero on the strongly active subsets. In contrast, (6.5) and (6.6) can predict such a change. We refer to Figure 6.1 below for an illustration. The paper concludes with numerical results which confirm the theoretical findings and show that indeed (6.5) yields much better results in recovering the solution of perturbed problems. As we remark in Section 7 of the paper, however, the full potential can only be revealed in nonlinear applications, where the solution of the derivative problem is significantly less expensive than the solution of the original problem.

Figure 6.1. The top left figure shows the nominal or unperturbed situation, where u0 = ΠUad(φ0) holds. (We use the notation u0 = Ξu(π0) and φ0 = Ξφ(π0) here.) In the top right figure, π0 has changed to π and u has been updated by (6.3). One clearly sees that the change of the active set is missed since DΞu(π0; ·) is zero on the strongly active subset. The lower left figure shows φ updated by Ξφ(π0) + DΞφ(π0; π − π0). Finally, the bottom right figure displays the situation where u has been updated by (6.5). The change of the active set is now captured.

UPDATE STRATEGIES FOR PERTURBED NONSMOOTH EQUATIONS

ROLAND GRIESSE, THOMAS GRUND AND DANIEL WACHSMUTH

Abstract. Nonsmooth operator equations in function spaces are considered, which depend on perturbation parameters. The nonsmoothness arises from a projection onto an admissible interval. Lipschitz stability in L∞ and Bouligand differentiability in Lp of the parameter-to-solution map are derived.
An adjoint problem is introduced for which Lipschitz stability and Bouligand differentiability in L∞ are obtained. Three different update strategies, which recover a perturbed from an unperturbed solution, are analyzed. They are based on Taylor expansions of the primal and adjoint variables, where the latter admits error estimates in L∞ . Numerical results are provided. 1. Introduction In this work we consider nonsmooth operator equations of the form u = Π[a,b] (g(θ) − G(θ)u), (Oθ ) where the unknown u ∈ L2 (D) is defined on some bounded domain D ⊂ RN , and θ is a parameter. Moreover, Π[a,b] denotes the pointwise projection onto the set Uad = {u ∈ L2 (D) : a(x) ≤ u(x) ≤ b(x) a.e. on D}. Such nonsmooth equations appear as a reformulation of the variational inequality Find u ∈ Uad s.t. hu + G(θ)u − g(θ), v − ui ≥ 0 for all v ∈ Uad . (VI θ ) Applications of (VI θ ) abound, and we mention in particular control-constrained optimal control problems. Throughout, G(θ) : L2 (D) → L2+δ (D) is a bounded and monotone linear operator with smoothing properties, such as a solution operator to a differential equation, and g(θ) ∈ L∞ (D). Both G and g may depend nonlinearly and also in a nonsmooth way on a parameter θ in some normed linear space Θ. Under conditions made precise in Section 2, (Oθ ) has a unique solution u[θ] for any given θ. We are concerned here with the behavior of u[θ] under perturbations of the parameter. In particular, we establish the directional differentiability of the nonsmooth map u[·] with uniformly vanishing remainder, a concept called Bouligand differentiability (B-differentiability for short). We prove B-differentiability of u[·] : Θ → Lp (D) for p ∈ [1, ∞), which is a sharp result and allows a Taylor expansion of u[·] around a reference parameter θ0 with error estimates in Lp (D). Based on this Taylor expansion, we analyze three update strategies C1 (θ) := u0 + u′ [θ0 ](θ − θ0 ) C2 (θ) := Π[a,b] u0 + u′ [θ0 ](θ − θ0 ) C3 (θ) := Π[a,b] φ0 + φ′ [θ0 ](θ − θ0 ) which allow to recover approximations of the perturbed solution u[θ] from the reference solution u0 = u[θ0 ] and derivative information. Our main result is that (C3 ), which 6. Update Strategies for Perturbed Nonsmooth Equations 105 involves a dual (adjoint) variable satisfying φ = g(θ) − G(θ)Π[a,b] φ, ∞ allows error estimates in L (D) while the other strategies do not. We therefore advocate to use update strategy (C3 ). As an important application, our setting accomodates linear–quadratic optimal control problems, where u is the control variable, S represents the control–to–state map associated to a linear elliptic or parabolic partial differential equation and G = S ⋆ S. Then (Oθ ) are necessary and sufficient optimality conditions. We shall elaborate on this case later on. In the context of optimal control, B-differentiability of optimal solutions for semilinear problems has been investigated in [4, 6]. We provide here a simplified proof in the linear case. The outline of the paper is as follows: In Section 2, we specify the problem setting and recall the concept of B-differentiability. In Sections 3 and 4, we prove the Lipschitz stability of the solution map u[·] into L∞ (D) and its B-differentiability into Lp (D), p < ∞. Section 5 is devoted to the analysis of the adjoint problem, for which we prove B-differentiability into L∞ (D). In Section 6, we discuss the application of the semismooth Newton method to the original problem and the problem associated with the derivative. 
We analyze the three update strategies (C1 )–(C3 ) in Section 7 and prove error estimates. In Section 8 we apply our results to the optimal control of a linear elliptic partial differential equation and report on numerical results confirming the superiority of the adjoint-based strategy (C3 ). Throughout, c and L denote generic positive constants which take different values in different locations. 2. Problem Setting Let us specify the standing assumptions for problem (Oθ ) taken to hold throughout the paper. We assume that D ⊂ RN is a bounded and measurable domain, N ≥ 1. By Lp (D), 1 ≤ p ≤ ∞, we denote the Lebuesge spaces of p-integrable or essentially bounded functions on D. We write hu, vi to denote the scalar product of two functions u, v ∈ L2 (D). The norm in Lp (D) is denoted by k · kp or simply k · k in the case p = 2. The space of bounded linear operators from Lp (D) to Lq (D) is denoted by L(Lp (D), Lq (D)) and its norm by k · kp→q . The lower and upper bounds a, b : D → [−∞, ∞] for the admissible set are functions satisfying a(x) ≤ b(x) a.e. on D. We assume the existence of an admissible function u∞ ∈ L∞ (D) ∩ Uad . Hence, the admissible set Uad = {u ∈ L2 (D) : a(x) ≤ u(x) ≤ b(x) a.e. on D} is nonempty, convex and closed but not necessarily bounded in L2 (D). Π[a,b] denotes the pointwise projection of a function on D onto Uad , i.e., Π[a,b] u = max{a, min{u, b}} pointwise on D. Note that Π[a,b] : Lp (D) → Lp (D) is Lipschitz continuous with Lipschitz constant 1 for all p ∈ [1, ∞]. Finally, let Θ be the normed linear space of parameters with norm k · k and let θ0 ∈ Θ be a given reference parameter. We recall two definitions: Definition 2.1. A function f : X → Y is said to be locally Lipschitz continuous at x0 ∈ X if there exists an open neighborhood of x0 and L > 0 such that kf (x) − f (y)kY ≤ Lkx − ykX holds for all x, y in the said neighborhood of x0 . In addition, f is said to be locally Lipschitz continuous if it is locally Lipschitz continuous at all x0 ∈ X. 106 Numerical Methods and Applications Definition 2.2. A function f : X → Y between normed linear spaces X and Y is said to be B-differentiable at x0 ∈ X if there exists ε > 0 and a positively homogeneous operator f ′ (x0 ) : X → Y such that f (x) = f (x0 ) + f ′ (x0 )(x − x0 ) + r(x0 ; x − x0 ) holds for all x ∈ X, where the remainder satisfies kr(x0 ; x − x0 )kY /kx − x0 kX → 0 as kx − x0 kX → 0. In addition, f is said to be B-differentiable if it is B-differentiable at all x0 ∈ X. The B-derivative is also called a directional Fréchet derivative, see [1]. Recall that an operator A : X → Y is said to be positively homogeneous if A(λx) = λA(x) holds for all λ ≥ 0 and all x ∈ X. Let us specify the standing assumptions for the function g: (1) g is locally Lipschitz continuous from Θ to L∞ (D) (2) g is B-differentiable from Θ to L∞ (D). Moreover, we assume that G : Θ → L(L2 (D), L2 (D)) satisfies the following smoothing properties with some δ > 0: (3) G(θ) is bounded from Lp (D) to Lp+δ (D) for all p ∈ [2, ∞) and all θ ∈ Θ (4) G(θ) is bounded from Lp (D) to L∞ (D) for all p > p0 and all θ ∈ Θ. In addition, we demand that G(θ) : L2 (D) → L2 (D) is monotone for all θ ∈ Θ: hG(θ)(u − v), u − vi ≥ 0 for all u, v ∈ L2 (D), and that (5) G is locally Lipschitz continuous from Θ to L(L2 (D), L2 (D)) (6) G is locally Lipschitz continuous from Θ to L(L∞ (D), L∞ (D)). Finally, we assume that (7) G is B-differentiable from Θ to L(Lp0 +δ (D), L∞ (D)). Remark 2.3. 
For control-constrained optimal control problems, G = S ⋆ S where S is the solution operator of the differential equation involved. An example is presented in Section 8. If assumptions (1)–(2) and (5)–(7) hold only at a specified parameter θ0 and (3)–(4) hold only in a neighborhood of θ0 , the subsequent analysis remains valid locally. Remark 2.4. The assumptions (1)–(7) can be changed if G does not map into L∞ (D) but only into Ls (D) for some s ∈ (2, ∞). (1’) (2’) (3’) (5’) (6’) (7’) g is locally Lipschitz continuous from Θ to Ls (D) g is B-differentiable from Θ to Ls (D). G(θ) is bounded from Lp (D) to Lp+δ (D) for all p ∈ [2, s − δ] and all θ ∈ Θ G is locally Lipschitz continuous from Θ to L(L2 (D), L2 (D)) G is locally Lipschitz continuous from Θ to L(Ls (D), Ls (D)). G is B-differentiable from Θ to L(Ls (D), Ls (D)). In this case, the results of Proposition 3.2 and Theorems 4.5 and 5.2 change accordingly. In particular, our main result Theorem 7.1 remains true if ∞ is replaced by s. In the sequel, we will need the B-derivative of a composite function. A similar result for a related differentiation concept can be found in [8, Prop. 3.6]. Lemma 2.5. Consider normed linear spaces X, Y, Z and mappings F : Y → Z, G : X → Y . Assume that the mapping G is B-differentiable at θ0 ∈ X and that F is Bdifferentiable at G(θ0 ). Furthermore assume that G is locally Lipschitz continuous at θ0 6. Update Strategies for Perturbed Nonsmooth Equations 107 and that F ′ (G(θ0 )) is locally Lipschitz continuous at 0. Then the mapping H : X → Z defined by H = F ◦ G is B-differentiable at θ0 with the derivative H ′ (θ0 ) = F ′ (G(θ0 )) ◦ G′ (θ0 ). Proof. Applying B-differentiability of F and G we obtain F (G(θ)) − F (G(θ0 )) = F ′ (G(θ0 )) (G(θ) − G(θ0 )) + rF , = F ′ (G(θ0 )) (G′ (θ0 )(θ − θ0 ) + rG ) + rF (2.1) with the remainder terms rF and rG satisfying krF kZ → 0 as kG(θ) − G(θ0 )kY → 0 kG(θ) − G(θ0 )kY and krG kY → 0 as kθ − θ0 kX → 0 kθ − θ0 kX respectively. Now let us write F ′ (G(θ0 )) (G′ (θ0 )(θ − θ0 ) + rG ) = F ′ (G(θ0 ))G′ (θ0 )(θ − θ0 ) + F ′ (G(θ0 )) (G′ (θ0 )(θ − θ0 ) + rG ) − F ′ (G(θ0 ))G′ (θ0 )(θ − θ0 ). (2.2) Putting (2.1) and (2.2) together, we get an expression for the remainder term F (G(θ)) − F (G(θ0 )) − F ′ (G(θ0 ))G′ (θ0 )(θ − θ0 ) = rF + F ′ (G(θ0 )) (G′ (θ0 )(θ − θ0 ) + rG ) − F ′ (G(θ0 ))G′ (θ0 )(θ − θ0 ) (2.3) ′ Note that G (θ0 )(θ − θ0 ) and rG are small in the norm of Y whenever θ − θ0 is small in the norm of X. Since F ′ (G(θ0 )) is locally Lipschitz continuous at 0, we can estimate kF (G(θ)) − F (G(θ0 )) − F ′ (G(θ0 ))G′ (θ0 )(θ − θ0 )kZ ≤ krF kZ + cF ′ krG kY . It remains to prove that the right-hand side, divided by kθ − θ0 kX , vanishes for kθ − θ0 kX → 0. This is true for krG kY . So we have to investigate krF kZ : krF kZ krF kZ krF kZ kG(θ) − G(θ0 )kY = ≤ cG kθ − θ0 kX kG(θ) − G(θ0 )kY kθ − θ0 kX kG(θ) − G(θ0 )kY by the local Lipschitz continuity of G at θ0 . For kθ − θ0 kX → 0 it follows kG(θ) − G(θ0 )kY → 0. Hence, the right-hand side vanishes for kθ − θ0 kX → 0. And the proof is complete. Combining locally Lipschitz continuity and B-differentiability, we can prove a useful continuity result for the B-derivative. Lemma 2.6. Consider normed linear spaces X, Y and the mapping G : X → Y . Let G be B-differentiable and locally Lipschitz continuous at θ0 ∈ X. Then it holds kG′ (θ0 )(θ − θ0 )kY → 0 for kθ − θ0 kX → 0, i.e. the B-derivative is continuous in the origin with respect to the direction. Proof. 
By local Lipschitz continuity of G at θ0 , there exist ǫ > 0 and L > 0 such that kG(θ) − G(θ0 )kY ≤ Lkθ − θ0 kX Let us write ∀θ ∈ X : kθ − θ0 kX < ǫ. G(θ) = G(θ0 ) + G′ (θ0 )(θ − θ0 ) + rG with the remainder rG satisfying krG kY → 0 as kθ − θ0 kX → 0. kθ − θ0 kX Then, we have kG′ (θ0 )(θ − θ0 )kY ≤ Lkθ − θ0 kX + krG kY , 108 Numerical Methods and Applications and it follows that the right-hand side tends to zero as kθ − θ0 kX → 0. 3. Lipschitz Stability of the Solution Map In this section we draw some simple conclusions from the assumptions made in Section 2. We recall that our problem (Oθ ) is equivalent to the following variational inequality: Find u ∈ Uad s.t. hu + G(θ)u − g(θ), v − ui ≥ 0 for all v ∈ Uad . (VI θ ) We begin by proving the Lipschitz stability of solutions u[θ] with respect to the L2 (D) norm. Lemma 3.1. For any given θ ∈ Θ, (Oθ ) has a unique solution u[θ] ∈ L2 (D). The solution map u[·] is locally Lipschitz continuous from Θ to L2 (D). Proof. Let θ ∈ Θ be given and let F (u) = u + G(θ)u − g(θ). By monotonicity of G(θ) it follows that hF (u1 ) − F (u2 ), u1 − u2 i ≥ ku1 − u2 k2 , hence F is strongly monotone. This implies the unique solvability of (VI θ ) and thus of (Oθ ), see, for instance, [3]. If θ′ ∈ Θ is another parameter, then we obtain from (VI θ ) hu + G(θ)u − g(θ), u′ − ui + hu′ + G(θ′ )u′ − g(θ′ ), u − u′ i ≥ 0. Inserting the term G(θ′ )u − G(θ′ )u and using the monotonicity of G(θ′ ), we obtain ku′ − uk2 ≤ (kG(θ) − G(θ′ )k2→2 kuk + kg(θ) − g(θ′ )k) ku′ − uk. This proves the local Lipschitz continuity of u[·] at any given parameter θ: Suppose that θ and θ′ are in some ball of radius ε around θ0 such that, by Assumption (5), kG(θ) − G(θ′ )k2→2 ≤ Lkθ − θ′ k. If we set u0 = u[θ0 ], then ku − u0 k ≤ Lkθ − θ0kku0 k ≤ εLku0k and thus kuk ≤ εLku0k + ku0 k. Hence ku′ − uk ≤ Lkθ − θ′ k(1 + εL)ku0 k. By exploiting the smoothing properties of G(θ), this result can be strenghtened: Proposition 3.2. The solution map u[·] is locally Lipschitz continuous from Θ to L∞ (D). Proof. We use a bootstrapping argument to show that the solution u[θ] lies in L∞ (D). The fact that g(θ) ∈ L∞ (D) and the smoothing property (3) of G(θ) yield g(θ) − G(θ)u[θ] ∈ L2+δ (D). By the properties of the projection, it follows from (Oθ ) that u[θ] ∈ L2+δ (D). Repeating this argument until 2 + nδ > p0 , we find u[θ] ∈ L∞ (D) by Assumption (4). We prove without loss of generality the local Lipschitz continuity of u[·] at the reference parameter θ0 . Let θ and θ′ be any two parameters in a ball of radius ε around θ0 such that kG(θ) − G(θ′ )k∞→∞ ≤ Lkθ − θ′ k and kg(θ) − g(θ′ )k∞ ≤ Lkθ − θ′ k hold. Using the Lipschitz continuity of the projection, we obtain ku − u′ k2+δ ≤ kg(θ) − g(θ′ )k2+δ + kG(θ)u − G(θ′ )u′ k2+δ ≤ c kg(θ) − g(θ′ )k∞ + kG(θ)(u − u′ )k2+δ + k(G(θ) − G(θ′ ))u′ k2+δ ≤ c L kθ − θ′ k + c ku − u′ k + c L kθ − θ′ kku′ k∞ for some c > 0 and hence the local Lipschitz stability for u[·] in L2+δ (D) follows. Repeating this argument until 2 + nδ > p0 , we obtain the local Lipschitz stability for u[·] in L∞ (D). 6. Update Strategies for Perturbed Nonsmooth Equations 109 4. B-Differentiability of the Solution Map In this section we study the differentiability properties of the solution map u[·], which depend on the properties of the projection. We extend the results of [5]. Let us define a set I[a, b, u0 ] by u(x) = 0 u0 (x) 6∈ [a(x), b(x)] u(x) = 0 if u0 (x) = a(x) = b(x) . 
I[a, b, u0 ] = u ∈ L2 (D) : u(x) ≥ 0 u0 (x) = a(x) u(x) ≤ 0 u0 (x) = b(x) The pointwise projection on this set is denoted by ΠI[a,b,u0 ] . By construction it holds for u0 , u, a, b ∈ L2 (D), a ≤ b ΠI[a,b,u0 ] (u) = −ΠI[−b,−a,−u0 ] (−u), ΠI[a,+∞,u0 ] (u) = ΠI[0,+∞,u0 −a] (u), ΠI[a,b,u0 ] (u) = ΠI[a,+∞,u0 ] ΠI[−∞,b,u0 ] (u) . (4.1) It turns out that ΠI[a,b,u0 ] is the B-derivative of the projection onto the admissible set Π[a,b] . We start with the proof of B-differentiability of the projection on the cone of non-negative functions. Theorem 4.1. The projection Π[0,+∞] is B-differentiable from Lp (D) to Lq (D) for 1 ≤ q < p ≤ ∞. And it holds where Π[0,+∞] (u) = Π[0,+∞] (u0 ) + ΠI[0,+∞,u0 ] (u − u0 ) + r1 (4.2a) kr1 kq → 0 as ku − u0 kp → 0. ku − u0 kp (4.2b) Remark 4.2. The claim for the case p = ∞ was proven in [5]. A counterexample was given there, which shows that the projection is not B-differentiable from L∞ (D) to L∞ (D). Proof of Theorem 4.1. Clearly, the function ΠI[0,+∞,u0 ] is positively homogeneous. Let us define the function r as the remainder term r = Π[0,+∞] (u) − Π[0,+∞] (u0 ) − ΠI[0,+∞,u0 ] (u − u0 ). A short calculation shows that r(x) = ( |u(x)| 0 if u(x)u0 (x) < 0 otherwise (4.3) (4.4) holds, see also the discussion in [5]. It implies the estimate r(x) ≤ |u(x) − u0 (x)|. Now suppose that 1 ≤ q < p ≤ ∞. It remains to prove krkq → 0 as ku − u0 kp → 0. ku − u0 kp (4.5) We will argue by contradiction. Assume that (4.5) does not hold. Then there exists ǫ > 0 such that for all δ > 0 there is a function uδ with kuδ − u0 kp < δ and satisfying krδ kq ≥ ǫ. kuδ − u0 kp (4.6) Here, rδ is the remainder term defined as in (4.3). Let us choose a sequence {δk } with limk→∞ δk = 0, uk = uδk , and rk := rδk . By Egoroff’s Theorem, for each σ > 0 there 110 Numerical Methods and Applications exists a set Dσ ⊂ D with meas(D \ Dσ ) < σ such that the convergence uk → u0 is uniform on Dσ . It allows us to estimate !1/q Z 1/q Z q q krk kq ≤ |uk (x) − u0 (x)| dx + |rk (x)| dx D\Dσ 1 1 ≤ σ q − p kuk − u0 kp + Z Dσ Dσ 1/q . |rk (x)|q dx Here, the second addend needs more investigation. Let us define a subset Dσ,k of Dσ by ′ ′ Dσ,k = x ∈ Dσ : 0 < |u0 (x)| < sup |uk (x ) − u0 (x )| . x′ ∈Dσ Then by construction it holds rk (x) = 0 on Dσ \ Dσ,k , compare (4.4). Observe that meas(Dσ,k ) → 0 as k → ∞ due to the uniform convergence of uk to u0 on Dσ . And we can proceed with Z 1/q 1 1 krk kq ≤ σ q − p kuk − u0 kp + |rk (x)|q dx Dσ =σ 1 1 q−p 1 !1/q Z kuk − u0 kp + 1 Dσ,k q |rk (x)| dx 1 1 ≤ σ q − p kuk − u0 kp + meas(Dσ,k ) q − p kuk − u0 kp , which is a contradiction to (4.6). Now, we calculate the B-derivative of Π[a,b] using the chain rule developed in Lemma 2.5. Theorem 4.3. The projection Π[a,b] is B-differentiable from Lp (D) to Lq (D) for 1 ≤ q < p ≤ ∞. And it holds where Π[a,b] (u) = Π[a,b] (u0 ) + ΠI[a,b,u0 ] (u − u0 ) + r1 (4.7a) kr1 kq → 0 as ku − u0 kp → 0. ku − u0 kp (4.7b) Proof. The projection Π[a,b] can be written as a composition of two projections on the set of non-negative functions as Π[a,b] (u) = Π[0,+∞] b − Π[0,+∞] (b − u) − a + a. The projection Π[0,+∞] and its B-derivative ΠI[0,+∞,u0 ] are Lipschitz continuous. Thus, the B-differentiability of Π[a,b] follows by Lemma 2.5. The chain rule yields the derivative Π′[a,b] (u0 )(u − u0 ) = ΠI[0,+∞,b−Π[0,+∞] (b−u0 )−a] −ΠI[0,+∞,b−u0 ] (−(u − u0 )) = ΠI[0,+∞,b−Π[0,+∞] (b−u0 )−a] ΠI[−∞,b,u0 ] (u − u0 ) = ΠI [a,+∞,Π[−∞,b] (u0 )] ΠI[−∞,b,u0 ] (u − u0 ) . Here, we used the properties (4.1) of the projection ΠI . 
It remains to prove that the right-hand side is equal to ΠI[a,b,u0 ] (u − u0 ). To this end, let us introduce the following disjoint subsets of D: D1 := {x ∈ D : u0 (x) ≤ b(x)}, D2 := {x ∈ D : b(x) < u0 (x)}. 6. Update Strategies for Perturbed Nonsmooth Equations 111 Let us denote by χDi the characteristic function of the set Di . The projection ΠI is additive with respect to functions with disjoint support, i.e. ΠI[a,b,u0 ] (v) = ΠI[a,b,u0 ] (χD1 v) + ΠI[a,b,u0 ] (χD2 v) holds for all a, b, u0 , v. Since Π′[a,b] (u0 )(u − u0 ) is a composition of such projections, we can split Π′[a,b] (u0 )(u − u0 ) = Π′[a,b] (u0 )(χD1 (u − u0 )) + Π′[a,b] (u0 )(χD2 (u − u0 )). Furthermore, it holds ΠI[a,b,u0 ] (χDi v) = ΠI[a,b,χDi u0 ] (χDi v). At first, we have χD1 Π[−∞,b] (χD1 u0 ) = χD1 u0 . Π′[a,b] (u0 )(χD1 (u − u0 )) = ΠI [a,+∞,Π[−∞,b] (u0 )] ΠI[−∞,b,u0 ] (χD1 (u − u0 )) = ΠI[a,+∞,u0 ] ΠI[−∞,b,u0 ] (χD1 (u − u0 )) = ΠI[a,b,u0 ] (χD1 (u − u0 )). The last equality follows from the third property of ΠI in (4.1). For the second set D2 , we have ΠI[−∞,b,u0 ] (χD2 (u − u0 )) = 0, since u0 (x) is not admissible for x ∈ D2 . For the same reason, we get also ΠI[a,b,u0 ] (χD2 (u − u0 )) = 0, which gives Π′[a,b] (u0 )(χD2 (u − u0 )) = 0 = ΠI[a,b,u0 ] (χD2 (u − u0 )). Consequently, we obtain Π′[a,b] (u0 )(u − u0 ) = Π′[a,b] (u0 )(χD1 (u − u0 )) + Π′[a,b] (u0 )(χD2 (u − u0 )) = ΠI[a,b,u0 ] (χD1 (u − u0 )) + ΠI[a,b,u0 ] (χD2 (u − u0 )) = ΠI[a,b,u0 ] (u − u0 ), and the claim is proven. Let us remark that the result of the last two Theorems is sharp with respect to the choice of function spaces: Remark 4.4. The projection is not B-differentiable from Lp (D) to Lp (D) for any p, as the following example shows. Take a = 0, b = +∞, D = (0, 1). We choose u0 (x) = −1 and ( 1 if x ∈ (0, 1/k) uk (x) = −1 otherwise. In this case, the remainder term given by (4.4) is r1,k = (uk − u0 )/2. Therefore it holds 1 kr1,k kp = 6→ 0 for k → ∞. kuk − u0 kp 2 As a side result of the previous theorem, however, we get for α ∈ (−∞, 1) kr1,k kp → 0 for k → ∞. kuk − u0 kα p We are now in the position to prove B-differentiability of the solution mapping u[θ] of our non-smooth equation (Oθ ). 112 Numerical Methods and Applications Theorem 4.5. The solution mapping u[θ] of problem (Oθ ) is B-differentiable from Θ to Lp (D), 2 ≤ p < ∞. The Bouligand derivative of u[·] at θ0 in direction θ, henceforth called u′ [θ0 ]θ, is the unique solution of the non-smooth equation u = ΠI[a,b,φ0 ] (g ′ (θ0 )θ − G(θ0 )u − (G′ (θ0 )θ)u0 ) (Oθ′ 0 ;θ ) where u0 = u[θ0 ] and φ0 = g(θ0 ) − G(θ0 )u0 . Proof. The problem (Oθ′ 0 ;θ ) is equivalent to finding a solution u ∈ I[a, b, φ0 ] of the variational inequality hu + G(θ0 )u + (G′ (θ0 )θ)u0 − g ′ (θ0 )θ, v − ui ≥ 0 ∀v ∈ I[a, b, φ0 ]. By monotonicity of G(θ0 ) this variational inequality is uniquely solvable, compare Lemma 3.1. Moreover, the projection ΠI[a,b,φ0 ] is positively homogeneous. So the mapping θ 7→ u′ [θ0 ]θ is positively homogeneous as well. Now, let us take θ1 ∈ Θ and u1 := u[θ1 ]. Let p ∈ [2, ∞) be fixed. Further, let ud be the solution of (Oθ′ 0 ;θ ) for θ = θ1 − θ0 , i.e. ud = ΠI[a,b,φ0 ] (g ′ (θ0 )(θ1 − θ0 ) − G(θ0 )ud − G′ (θ0 )(θ1 − θ0 )u0 ). (4.8) Let us investigate the difference u1 − u0 . We obtain by B-differentiability of the projection from Lp+δ (D) to Lp (D) u1 − u0 = Π[a,b] (g(θ1 ) − G(θ1 )u1 ) − Π[a,b] (g(θ0 ) − G(θ0 )u0 ) = ΠI[a,b,g(θ0 )−G(θ0 )u0 ] (g(θ1 ) − G(θ1 )u1 − g(θ0 ) + G(θ0 )u0 ) + r1 (4.9) = ΠI[a,b,φ0 ] (g(θ1 ) − G(θ1 )u1 − g(θ0 ) + G(θ0 )u0 ) + r1 . 
The remainder term r1 satisfies kr1 kp →0 kg(θ1 ) − G(θ1 )u1 − g(θ0 ) + G(θ0 )u0 kp+δ as kg(θ1 ) − G(θ1 )u1 − g(θ0 ) + G(θ0 )u0 kp+δ → 0. Applying Lipschitz continuity of u[·], G, and g, we get kg(θ1 ) − G(θ1 )u1 − g(θ0 ) + G(θ0 )u0 kp+δ ≤ c (kθ1 − θ0 k + ku1 − u0 kp ) ≤ c kθ1 − θ0 k. Hence, we find for the remainder term kr1 kp → 0 as kθ1 − θ0 k → 0. kθ1 − θ0 k (4.10) Let us rewrite (4.9) as u1 − u0 − r1 = ΠI[a,b,φ0 ] g(θ1 ) − g(θ0 ) − G(θ0 )(u1 − u0 ) − (G(θ1 ) − G(θ0 ))u1 = ΠI[a,b,φ0 ] g ′ (θ0 )(θ1 − θ0 ) + r1g − G(θ0 )(u1 − u0 ) − (G′ (θ0 )(θ1 − θ0 ) + r1G )u1 = ΠI[a,b,φ0 ] g ′ (θ0 )(θ1 − θ0 ) − G(θ0 )(u1 − u0 − r1 ) − G′ (θ0 )(θ1 − θ0 )u1 + r1g + r1G u1 − G(θ0 )r1 = ΠI[a,b,φ0 ] g ′ (θ0 )(θ1 − θ0 ) − G(θ0 )(u1 − u0 − r1 ) − G′ (θ0 )(θ1 − θ0 )u1 + r1∗ with a remainder term r1∗ = r1g + r1G u1 − G(θ0 )r1 satisfying kr1∗ kp → 0 as kθ1 − θ0 k → 0. (4.11) kθ1 − θ0 k We can interpret ur := u1 − u0 − r1 as the solution of the non-smooth equation ur = ΠI[a,b,φ0 ] g ′ (θ0 )(θ1 − θ0 ) − G(θ0 )ur − G′ (θ0 )(θ1 − θ0 )u1 + r1∗ , 6. Update Strategies for Perturbed Nonsmooth Equations 113 which is similar to (4.8) but perturbed by −G′ (θ0 )(θ1 − θ0 )(u1 − u0 ) + r1∗ . Analogously as in Section 3, it can be shown that the solution mapping of that equation is Lipschitz continuous in the data, i.e., the map r ∋ Lp (D) 7→ u ∈ Lp (D), where u = ΠI[a,b,φ0 ] (−G(θ0 )u + r), is Lipschitz continuous. So we can estimate ku1 − u0 − r1 − ud kp = kur − ud kp ≤ c kG′ (θ0 )(θ1 − θ0 )(u1 − u0 )kp + c kr1∗ kp ≤ c kG′ (θ0 )(θ1 − θ0 )(u1 − u0 )k∞ + c kr1∗ kp . (4.12) Using the assumptions on G, we obtain by Lemma 2.6 kG′ (θ0 )(θ1 − θ0 )k∞→∞ → 0 as kθ1 − θ0 k → 0. The mapping θ 7→ u[θ] is locally Lipschitz continuous from Θ to L∞ (D), see Proposition 3.2. Both properties imply kG′ (θ0 )(θ1 − θ0 )(u1 − u0 )k∞ → 0 as kθ1 − θ0 k → 0. kθ1 − θ0 k (4.13) Combining (4.11)–(4.13) yields in turn ku1 − u0 − r1 − ud kp → 0 as kθ1 − θ0 k → 0. kθ1 − θ0 k (4.14) Finally, we have ku1 − (u0 + ud )kp ≤ ku1 − u0 − r1 − ud kp + kr1 kp and consequently by (4.10) and (4.14) ku1 − (u0 + ud )kp → 0 as kθ1 − θ0 k → 0. kθ1 − θ0 k Hence, ud is the Bouligand derivative of u[·] at θ0 in the direction θ1 − θ0 . (4.15) Remark 4.6. This result cannot be strengthened. The map u[θ] cannot be Bouligand from Θ to L∞ (D). To see this, consider the case G = 0. It trivially fulfills all requirements of Section 2. Then u[θ] = Π[a,b] (g(θ)) holds, but the projection Π[a,b] is not B-differentiable from L∞ (D) to L∞ (D), see Remark 4.4. Lemma 4.7. The B-derivative u′ [θ0 ] satisfies for all α ∈ (−∞, 1) ku[θ0 ] + u′ [θ0 ](θ1 − θ0 ) − u[θ1 ]k∞ → 0 as kθ1 − θ0 k → 0. kθ1 − θ0 kα Proof. Here, we will follow the steps of the proof of the previous theorem. Let α be less than 1. The limiting factors in the proof are the remainder terms r1 and r1∗ . We obtain for r1 and r1∗ due to Remark 4.4 the property kr1∗ k∞ kr1 k∞ → 0 and → 0 as kθ1 − θ0 k → 0. kθ1 − θ0 kα kθ1 − θ0 kα Combining these with estimates (4.12)–(4.15) completes the proof. 5. Properties of the Adjoint Problem In this section we investigate an adjoint problem defined by φ = g(θ) − G(θ)Π[a,b] (φ). (Dθ ) If we interpret (Oθ ) as an optimal control problem with control constraints, see Section 8, then problem (Dθ ) is an equation for the adjoint state. The primal and adjoint formulations are closely connected: If u[θ] is the unique solution of (Oθ ) then φ := g(θ) − G(θ)u[θ] (5.1) 114 Numerical Methods and Applications is a solution of (Dθ ), which means that (Dθ ) admits at least one solution. 
And if φ is a solution of the dual (adjoint) equation (Dθ ) then the projection u = Π[a,b] (φ[θ]) is the unique solution of the original problem (Oθ ). Now, let us briefly answer the question of uniqueness of adjoint solutions. If φ1 and φ2 are two solutions of (Dθ ), then both Π[a,b] (φ1 ) and Π[a,b] (φ2 ) are solutions of (Oθ ). By Lemma 3.1 this problem has a unique solution, hence Π[a,b] (φ1 ) = Π[a,b] (φ2 ). For the difference φ1 − φ2 we have φ1 − φ2 = g(θ) − G(θ)Π[a,b] (φ1 ) − g(θ) − G(θ)Π[a,b] (φ2 ) = −G(θ)(Π[a,b] (φ1 ) − Π[a,b] (φ2 )) = 0, which implies in fact the unique solvability of (Dθ ). In the following, we denote this unique solution by φ[θ]. An immediate conclusion of the considerations in Section 3 is the Lipschitz property of φ[·]. Corollary 5.1. The mapping φ[θ] is locally Lipschitz from Θ to L∞ (D). Thus, we found that φ[·] inherits Lipschitz continuity from u[·]. However, in contrast to the primal map u[·], the adjoint map φ[·] is B-differentiable into L∞ (D). The property which allows us to prove this result is that in (Dθ ), the smoothing operator G(θ) is applied after the projection Π[a,b] . Theorem 5.2. The mapping φ[θ] is B-differentiable from Θ to L∞ (D). The Bderivative of φ[·] at θ0 in direction θ, henceforth called φ′ [θ0 ]θ, is the solution of the non-smooth equation φ = g ′ (θ0 )θ − G(θ0 )ΠI[a,b,φ0 ] (φ) − (G′ (θ0 )θ)Π[a,b] (φ0 ), (5.2) where φ0 = φ[θ0 ] = g(θ0 ) − G(θ0 )u[θ0 ]. Proof. Due to the linearity of G, the B-derivative of H(θ) := G(θ)u[θ] at θ0 , in the direction of θ, can be written as H ′ (θ0 )θ = G(θ0 )u′ [θ0 ]θ + (G′ (θ0 )θ)u0 , where u0 = u[θ0 ]. By Theorem 4.5, u[·] is B-differentiable from Θ to Lp0 +δ (D). Together with the B-differentiability of G(·) from Θ to L(Lp0 +δ (D), L∞ (D)), the relationship φ[θ] = g(θ) − G(θ)u[θ] implies B-differentiability of φ[·] from Θ to L∞ (D). The formula (5.2) is obtained by differentiating equation (Dθ ). We now discuss the use of the derivative of φ[θ] to obtain an update rule for the primal variable u[θ]. Suppose that u0 = u[θ0 ] and φ0 = φ[θ0 ] are the solutions of the primal and dual problems at the reference parameter θ0 . We use the following construction as a first-order approximation of u[θ]: ũ[θ0 , θ − θ0 ] := C3 (θ) = Π[a,b] φ0 + φ′ [θ0 ](θ − θ0 ) . (5.3) We can prove that the L∞ -norm of the remainder u[θ] − ũ[θ0 , θ − θ0 ], divided by kθ − θ0 k, vanishes as θ → θ0 . This is a stronger result than can be obtained using merely the B-differentiability. There, the remainder u[θ]−u[θ0 ]−u′ [θ0 ](θ−θ0 ), divided by kθ − θ0 k, vanishes only in weaker Lp -norms. We refer to Section 7 for a comparison of this advanced update rule with the conventional rules (C1 ) and (C2 ). Corollary 5.3. Let ũ[θ0 , θ − θ0 ] be given by (5.3). Then ku[θ] − ũ[θ0 , θ − θ0 ]k∞ → 0 as θ → θ0 . kθ − θ0 k Proof. By construction, we have u[θ] − ũ[θ0 , θ − θ0 ] = Π[a,b] (φ[θ]) − Π[a,b] φ[θ0 ] + φ′ [θ0 ](θ − θ0 ) . 6. Update Strategies for Perturbed Nonsmooth Equations 115 The projection is Lipschitz from L∞ (D) to L∞ (D), hence we can estimate ku[θ] − ũ[θ0 , θ − θ0 ]k∞ ≤ kφ[θ] − φ[θ0 ] − φ′ [θ0 ](θ − θ0 )k∞ . We know already by Theorem 5.2 that φ[θ] is B-differentiable at θ0 from Θ to L∞ (D). Thus, it holds for kθ − θ0 k → 0 kφ[θ] − φ[θ0 ] + φ′ [θ0 ](θ − θ0 )k∞ → 0. kθ − θ0 k for θ − θ0 → 0. Consequently, we get the same behavior for the remainder u[θ] − ũ[θ0 , θ − θ0 ], which proves the claim. 
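As an illustration of how the three candidate updates are formed once the derivative problems have been solved, consider the following Python sketch on a vector of nodal values. The vectors u0, phi0, dphi and the bounds are illustrative assumptions; only the defining formulas of (C1)–(C3) and the relation u′[θ0]θ = ΠI[a,b,φ0](φ′[θ0]θ) are taken from the preceding analysis.

    import numpy as np

    def update_strategies(u0, phi0, du, dphi, a, b):
        # the three first-order approximations of the perturbed solution u[theta]
        C1 = u0 + du                          # plain Taylor step; in general infeasible
        C2 = np.clip(u0 + du, a, b)           # projected Taylor step; feasible
        C3 = np.clip(phi0 + dphi, a, b)       # adjoint-based update (5.3); feasible
        return C1, C2, C3

    # Illustrative nodal data (not the example of Section 8).
    x = np.linspace(0.0, 1.0, 101)
    a, b = -1.0, 1.0
    phi0 = 3.0 * np.sin(np.pi * x)            # nominal adjoint quantity
    u0 = np.clip(phi0, a, b)                  # nominal solution u0 = Pi_[a,b](phi0)
    dphi = 0.5 * np.cos(np.pi * x)            # derivative of the adjoint quantity
    du = np.where((phi0 > a) & (phi0 < b), dphi, 0.0)   # primal derivative: cut off on the strongly active set
    C1, C2, C3 = update_strategies(u0, phi0, du, dphi, a, b)
    print(C1.max(), C2.max(), C3.max())       # only C2 and C3 respect the upper bound b = 1

All three candidates are first-order accurate in Lp for p < ∞, but only the adjoint-based variant C3 admits the corresponding remainder estimate in L∞, which is the reason to prefer it.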
In the next section we discuss how the quantities u[θ0 ], φ[θ0 ] and the required directional derivatives of these quantities can be computed. It turns out that the derivative φ′ [θ0 ](θ−θ0 ) is available at no additional cost when evaluating u′ [θ0 ](θ−θ0 ), so the new update rule (C3 ) incurs no additional cost. On the other hand, it is also easily possible to obtain φ′ [θ0 ](θ − θ0 ) a posteriori from ′ u [θ0 ](θ − θ0 ). Once u′ [θ0 ](θ − θ0 ) is known, φ′ [θ0 ](θ − θ0 ) can be computed from φ′ [θ0 ](θ − θ0 ) = g ′ (θ0 )(θ − θ0 ) − G(θ0 )u′ [θ0 ](θ − θ0 ) − (G′ (θ0 )(θ − θ0 ))u0 . Hence the a posteriori computation of φ′ involves only the application of G and G′ and it is not necessary to solve any additional non-smooth equations. For optimal control problems the quantity φ′ [θ0 ](θ − θ0 ) is closely related to the adjoint state of the problem belonging to u′ [θ0 ](θ − θ0 ). 6. Computation of the Solution and its Derivative In this section we address the question how to solve problem (Oθ ) for the nominal parameter θ0 and the derivative problem (Oθ′ 0 ;θ ) algorithmically. In the recent past, generalized Newton methods in function spaces have been developed [2, 10], where a generalized set-valued derivative plays the role of the Fréchet derivative in the classical Newton method. The semismooth Newton concept can be applied here, in view of the smoothing properties of the operator G(θ0 ). Let us consider the following nonsmooth equation: F (u) := −u + g(θ0 ) − G(θ0 )u − max{0, g(θ0 ) − G(θ0 )u − b} − min{0, g(θ0 ) − G(θ0 )u − a} = 0. (6.1) It is easy to check that (6.1) holds if and only if u solves (Oθ ) at θ0 . Following [2], we infer that F is Newton differentiable as a map from Lp (D) to p L (D) for any p ∈ [2, ∞]. The usual norm gap in the min and max functions is compensated by the smoothing properties of G(θ0 ). The generalized derivative of F is set-valued, and we take F ′ (u) δu = −G(θ0 ) δu − δu + χA+ (u) G(θ0 ) δu + χA− (u) G(θ0 ) δu as a particular choice. Here, A+ (u) = {x ∈ D : g(θ0 ) − G(θ0 )u − b ≥ 0} A(u) = A+ (u) ∪ A− (u) A− (u) = {x ∈ D : g(θ0 ) − G(θ0 )u − a ≤ 0} I(u) = D \ A(u) are the so-called active and inactive sets, and χA is the characteristic function of a measurable set A. A generalized Newton step F ′ (u) δu = −F (u) can be computed 116 Numerical Methods and Applications by splitting the unknown δu into its parts supported on the active and inactive sets. Then a simple calculation shows that on A+ (u) : δu|A+ (u) = b − u on A− (u) : δu|A− (u) = a − u on I(u) : (G(θ0 ) + I) δu|I(u) = g(θ0 ) − G(θ0 )u − u − G(θ0 ) δu|A(u) . Lemma 6.1. For given u ∈ Lp (D) where 2 ≤ p ≤ ∞, the generalized Newton step F ′ (u) δu = −F (u) has a unique solution δu ∈ Lp (D). Proof. We only need to verify that the step on the inactive set I(u) is indeed uniquely solvable. This follows from the strong monotonicity of G(θ0 ) + I, considered as an operator from L2 (I(u)) to itself, compare the proof of Lemma 3.1. Hence the unique solution has an a priori regularity δu ∈ L2 (D). The terms of lowest regularity on the right hand sides are the terms −u. Hence δu inherits the Lp (D) regularity of u. Note that in case b or a are equal to ±∞ on a subset of D, this subset can not intersect A+ (u) or A− (u) and thus the update δu lies in L∞ (D), provided that u ∈ L∞ (D), even if the bounds take on infinite values. By the previous lemma, the generalized Newton iteration is well-defined. For a convergence analysis, we refer to [2, 10]. 
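The following Python sketch is a finite-dimensional transcription of this generalized Newton step, stated as Algorithm 1 below, for the case where G(θ0) is represented by a monotone matrix. The matrix G, the data vector g and the bounds of the small test case are illustrative assumptions and not the discretization used in Section 8.

    import numpy as np

    def semismooth_newton(G, g, a, b, tol=1e-10, max_iter=50):
        # Generalized Newton method for  u = clip(g - G u, a, b),  cf. (6.1) and Algorithm 1.
        # G: (n, n) monotone matrix standing in for G(theta0); g: (n,) data; a, b: bounds.
        n = g.size
        u = np.zeros(n)
        for _ in range(max_iter):
            phi = g - G @ u                                    # adjoint quantity, obtained for free
            r = phi - u - np.maximum(0.0, phi - b) - np.minimum(0.0, phi - a)
            if np.max(np.abs(r)) <= tol:
                break
            act_hi = phi - b >= 0.0                            # active set for the upper bound
            act_lo = phi - a <= 0.0                            # active set for the lower bound
            inact = ~(act_hi | act_lo)
            du = np.zeros(n)
            du[act_hi] = (np.broadcast_to(b, u.shape) - u)[act_hi]
            du[act_lo] = (np.broadcast_to(a, u.shape) - u)[act_lo]
            if inact.any():
                # Newton step on the inactive set: (G + I) du_I = phi - u - G du_A
                rhs = (phi - u - G @ du)[inact]
                du[inact] = np.linalg.solve(G[np.ix_(inact, inact)] + np.eye(int(inact.sum())), rhs)
            u = u + du
        return u, g - G @ u

    # Small synthetic test (monotone G with small norm, chosen only for illustration).
    rng = np.random.default_rng(1)
    B = rng.standard_normal((20, 20))
    G = (B @ B.T) / 500.0
    g = rng.standard_normal(20)
    u0, phi0 = semismooth_newton(G, g, a=-0.5, b=0.5)
    print(np.max(np.abs(u0 - np.clip(phi0, -0.5, 0.5))))       # residual of (6.1), close to zero

With modified bounds and data the same routine solves the derivative problems, which is the content of Algorithm 2 below; the adjoint quantity φ is a by-product of every iteration, so the data needed for the update rule (C3) incur no extra cost.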
For completeness, we state the semismooth Newton method for problem (Oθ ) below (Algorithm 1). Note that the dual variable Algorithm 1 Semismooth Newton algorithm to compute u0 and φ0 . 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: Choose u0 ∈ L∞ (D) and set n := 0 Set φn := g(θ0 ) − G(θ0 )un Set rn := F (un ) = φn − un − max{0, φn − b} − min{0, φn − a} while krn k∞ > tol do Set δu|A+ (un ) := b − un on A+ (un ) Set δu|A− (un ) := a − un on A− (un ) Solve (G(θ0 ) + I) δu|I(un) = φn − un − G(θ0 )δu|A(un ) on I(un ) Set un+1 := un + δu Set φn+1 := g(θ0 ) − G(θ0 )un+1 Set rn+1 := F (un+1 ) = φn+1 − un+1 − max{0, φn+1 − b} − min{0, φn+1 − a} Set n := n + 1 end while Set u0 := un and φ0 := φn φ0 appears naturally as an auxiliary quantity in the iteration, so it is available at no extra cost. With minor modifications, the same routine solves the derivative problems (Oθ′ 0 ;θ ) for u′ [θ0 ](θ) and (5.2) for φ′ [θ0 ](θ) simultaneously. Similarly as before, we consider the nonsmooth equation Fb (b u) := −b u + g ′ (θ0 )θ − G(θ0 )b u − (G′ (θ0 )θ)u0 − max{0, g ′(θ0 )θ − G(θ0 )b u − (G′ (θ0 )θ)u0 − bb} − min{0, g ′ (θ0 )θ − G(θ0 )b u − (G′ (θ0 )θ)u0 − b a} = 0. (6.2) Hats indicate variables that are associated with derivatives. The new bounds b a and bb depend on the solution and adjoint solution u0 and φ0 of the reference problem, through the definition of I[a, b, φ0 ] in Section 4: ( ( 0 where u0 = a or φ0 6∈ [a, b] bb = 0 where u0 = b or φ0 6∈ [a, b] b a= −∞ elsewhere ∞ elsewhere. (6.3) 6. Update Strategies for Perturbed Nonsmooth Equations 117 The active and inactive sets Ab+ (b u) etc. for the derivative problem are taken with respect to the bounds b a and bb. For the ease of reference, we also state the semismooth Newton method for the derivative problems u b = u′ [θ0 ]θ and φb = φ′ [θ0 ]θ, see Algorithm 2. Note that these quantities satisfy u′ [θ0 ](θ) = ΠI[a,b,φ0 ] φ′ [θ0 ]θ φ′ [θ0 ](θ) = g ′ (θ0 )θ − G(θ0 )u′ [θ0 ]θ − (G′ (θ0 )θ)u0 , so each can be computed from the other. Algorithm 2 Semismooth Newton algorithm to compute u′ [θ0 ]θ and φ′ [θ0 ]θ. 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: Choose u b0 ∈ L∞ (D) and set n := 0 Set the bounds b a and bb according to (6.3) ′ b Set φn := g (θ0 )θ − G(θ0 )b un − (G′ (θ0 )θ)u0 b b Set rbn := F (b un ) = φn − u bn − max{0, φbn − bb} − min{0, φbn − b a} while kb rn k∞ > tol do Set δu|Ab+ (bun ) := bb − u bn on Ab+ (b un ) − Set δu b− := b a−u bn on Ab (b un ) |A (b un ) b b un ) Solve (G(θ0 ) + I) δu|I(b bn − G(θ0 )δu|A(b b un ) = φn − u b un ) on I(b Set u bn+1 := u bn + δu Set φbn+1 := g ′ (θ0 )θ − G(θ0 )b un+1 − (G′ (θ0 )θ)u0 bn+1 − b Set rbn+1 := Fb(b un+1 ) = φbn+1 − u bn+1 − max{0, φbn+1 − bb} − min{0, φ a} Set n := n + 1 end while Set u′ [θ0 ]θ := u bn and φ′ [θ0 ]θ := φbn 7. Update Strategies and Error Estimates In this section, we analyze three different update strategies for the solution of (Oθ ). Suppose that θ0 ∈ Θ is a given reference parameter, and that u0 = u[θ0 ] is the unique solution of (Oθ ) associated to this parameter. Our goal is to analyze strategies to approximate the perturbed solution u[θ] using the known reference solution u0 and derivative information u′ [θ0 ] or φ′ [θ0 ]. Such strategies are particularly useful if they provide a reasonable approximation of the perturbed solution at lower numerical effort than is required by the repeated solution of the perturbed problem. We will see below that our strategies fulfill this condition to some degree. 
However, the full potential of these update schemes can only be revealed in nonlinear applications, where the solution of the derivative problem is significantly less expensive then the solution of the original problem. This deserves further investigation. The three strategies we are considering are: C1 (θ) := u0 + u′ [θ0 ](θ − θ0 ) C2 (θ) := Π[a,b] C3 (θ) := Π[a,b] u0 + u [θ0 ](θ − θ0 ) φ0 + φ′ [θ0 ](θ − θ0 ) . ′ (C1 ) (C2 ) (C3 ) Apparently, all of the above yield approximations of u[θ] in the vicinity of θ0 . Strategies (C1 )–(C2 ) are based exclusively on primal quantities, while (C3 ) invokes adjoint quantities. Note that in the equations (Oθ ) and (Dθ ), the orders of the smoothing operation G and the projection Π are reversed. Our main result is: 118 Numerical Methods and Applications Theorem 7.1. The update strategies (C1 )–(C3 ) admit the following approximation properties: kC1 (θ) − u[θ]kp → 0 as kθ − θ0 k → 0 for all p ∈ [2, ∞) (7.1) kθ − θ0 k kC2 (θ) − u[θ]kp → 0 as kθ − θ0 k → 0 for all p ∈ [2, ∞) (7.2) kθ − θ0 k kC3 (θ) − u[θ]kp → 0 as kθ − θ0 k → 0 for all p ∈ [2, ∞]. (7.3) kθ − θ0 k Strategies (C2 ) and (C3 ) yield feasible approximations, i.e., Ci (θ) ∈ Uad for i = 2, 3. The error term for (C2 ) is not larger than the term for (C1 ). Proof. Equation (7.1) follows immediately from the B-differentiability result for u[·], Theorem 4.5. For the second strategy, we have kC2 (θ) − u[θ]kp = kΠ[a,b] u0 + u′ [θ0 ](θ − θ0 ) − u[θ]kp = kΠ[a,b] u0 + u′ [θ0 ](θ − θ0 ) − Π[a,b] (u[θ])kp ≤ ku0 + u′ [θ0 ](θ − θ0 ) − u[θ]kp = kC1 (θ) − u[θ]kp , by the Lipschitz property of the projection, and the result follows as before. Finally, (7.3) was proven in Corollary 5.3. Note that (C3 ) admits an estimate for the remainder quotient in L∞ (D), while the others do not. However, the remainder itself can be estimated in L∞ as the following corollary shows: Corollary 7.2. Strategies (C1 )–(C3 ) admit the following approximation property: kCi (θ) − u[θ]k∞ → 0 as kθ − θ0 k → 0, for i = 1, 2, 3. Proof. For strategy (C1 ), the claim was proven in Lemma 4.7 with α = 0. For (C2 ), we estimate as in the proof of Theorem 7.1 and obtain kC2 (θ) − u[θ]k∞ = kΠ[a,b] u0 + u′ [θ0 ](θ − θ0 ) − u[θ]k∞ = kΠ[a,b] u0 + u′ [θ0 ](θ − θ0 ) − Π[a,b] (u[θ])k∞ ≤ ku0 + u′ [θ0 ](θ − θ0 ) − u[θ]k∞ = kC1 (θ) − u[θ]k∞ The claim for (C3 ) follows directly from (7.3). All three update strategies come at practically the same numerical cost, namely the solution of one derivative problem. Note that both u′ [θ0 ](θ − θ0 ) and φ′ [θ0 ](θ − θ0 ) are computed simultaneously by Algorithm 2. The additional projection in (C2 ) and (C3 ) is inexpensive. However, only (C2 ) and (C3 ) yield feasible approximations of the perturbed solution, and only for (C3 ) the remainder quotient (7.3) goes to zero in L∞ (D) as θ → θ0 . Therefore, we advocate the use of the (C3 ) strategy to compute corrections of the nominal solution u0 in the presence of perturbations. In the next section, our findings are supported by numerical experiments. 8. Applications in Optimal Control In this section, we present some applications of our results in the context of optimal control and report on numerical experiments. As an example, we treat a class of elliptic boundary control problems. The case of distributed control is simpler and therefore omitted. Numerical results are given which illustrate the performance of the update strategies analyzed in Section 7 and support the superiority of scheme (C3 ). 6. Update Strategies for Perturbed Nonsmooth Equations 119 8.1. 
Boundary Control of an Elliptic Equation. Let us suppose that Ω ⊂ RN , N ∈ {2, 3} is a bounded domain with Lipschitz continuous boundary Γ. We define the elliptic differential operator Ay(x) = −∇ · (A(x)∇y(x)) ⊤ N ×N where A(x) = A(x) ∈ R has entries in L∞ (Ω) such that A is uniformly elliptic, ⊤ 2 i.e., y A(x)y ≥ ̺|y| holds uniformly in Ω with some ̺ > 0. We consider the elliptic partial differential equation with boundary control Ay + c0 y = 0 on Ω (8.1) ∂y + αy = u on Γ ∂nA where c0 ∈ L∞ (Ω), c0 ≥ 0, α ∈ L∞ (Γ), α ≥ 0 such that kαkL2 (Γ) + kc0 kL2 (Ω) > 0. It is well known that (8.1) has a unique solution y = Su for every u ∈ L2 (Γ). The adjoint operator S ⋆ maps a given f to the trace of the unique solution of Ap + c0 p = f on Ω (8.2) ∂p + αp = 0 on Γ. ∂nA Lemma 8.1 (see [9]). The following are bounded linear operators: (1) S : L2 (Γ) → Lp (Ω) for all p ∈ [2, ∞). (2) S ⋆ : Lr (Ω) → L∞ (Γ) for all r ∈ (N/2, ∞]. We set D = Γ and consider the elliptic boundary optimal control problem: γ 1 (Eθ ) Find u ∈ Uad which minimizes kSu − θk2L2 (Ω) + kuk2 2 2 with γ > 0. For the parameter space, i.e., desired states, it is sufficient to choose Θ = L2 (Ω) in order to satisfy the assumptions of Section 2. It is well known that for any given θ ∈ Θ, a necessary and sufficient optimality condition for (Eθ ) is 1 u = Π[a,b] − S ⋆ (Su − θ) (8.3) γ which fits our setting (Oθ ) with the choice 1 1 g(θ) = S ⋆ θ G(θ) = S ⋆ S. γ γ Using Lemma 8.1, one readily verifies the conditions of Section 2. Note that p[θ] := γ g(θ) − G(θ)u[θ] = −S ⋆ (Su[θ] − θ) = γφ[θ] is the usual adjoint state belonging to problem (Eθ ), which satisfies (8.2) with f = −(Su[θ] − θ). 8.2. Numerical Results. We will verify our analytical results by means of the following example: We consider as a specific choice of (8.1) −∆y + y = 0 on Ω ∂y = u on Γ ∂n on Ω = (0, 1) × (0, 1). As bounds, we have a = −10 and b = 2. The control cost factor is γ = 0.1 and the nominal parameter is θ0 (x1 , x2 ) = x21 + x22 . The discretization is carried out with piecewise linear and globally continuous finite elements on a grid with 3121 vertices and 5600 triangles, which is refined near the boundary of Ω, see Figure 8.1. We refer to the corresponding finite element space as Vh ⊂ H 1 (Ω) and its restriction to the boundary is Bh . During the optimization loop (Algorithm 1), the discretized variables u and φ are taken as elements of Bh while the 120 Numerical Methods and Applications intermediate quantities Su as well as the adjoint state −S ⋆ (Su − θ), before restriction to the boundary, are taken in Vh . The computation of the active sets in the generalized Newton’s method is done in a simple way, by determining those vertices of the given grid at which φ ≥ b (or ≤ a) are satisfied. As a caveat, we remark that our convergence results (7.1)–(7.3) for the update strategies (C1 ) through (C3 ) cannot be observed when all quantities are confined to any fixed grid. The reason is that in this entirely static finite-dimensional problem, all Lp -norms are equivalent and hence the numerical results show no difference in the approximation qualities of the different strategies. In order to obtain more accurate results while keeping a fixed grid for the ease of implementation, we apply three postprocessing steps during the computation, see [7]. The exact procedure used is outlined below as Algorithm 3 and we explain the individual steps. 
Once the nominal solution $u_0 \in B_h$ is computed as described above (step 1:), the final $\tilde u_0 \notin B_h$ is obtained by a postprocessing step, i.e., by a pointwise exact projection of the piecewise linear function $\phi_0 \in B_h$ to the interval $[a,b]$, observing that the intersection of $\phi_0$ with the bounds does not usually coincide with boundary vertices of the finite element grid (step 2:). The nominal solution is shown in Figures 8.1 and 8.2.

Figure 8.1. Mesh refined near the boundary (left). The right figure shows the nominal control $u_0$ (solid) and dual quantity $\phi_0$ (dashed), unrolled from the lower left corner of the domain in counterclockwise direction.

Figure 8.2. Nominal state $Su_0$ (left) and nominal desired state $\theta_0$ (right).

A sequence of perturbed solutions $u[\theta_i]$ corresponding to parameters $\{\theta_i\}_{i=1}^n$ near $\theta_0$ is computed in the same way (step 3:), i.e., with the simple active set strategy on the fixed grid and a postprocessing step. In the numerical experiments, every parameter $\theta_i$ is obtained by a random perturbation of the finite element coordinates of the desired state $\theta_0$. This allows us to verify that the error estimates of Theorem 7.1 are indeed uniform with respect to the perturbation direction. The perturbations have specified norms, namely
$$\|\theta_i - \theta_0\|_2 = \texttt{logspace(0,-2.5,n)}_i = 10^{-2.5\,\frac{i-1}{n-1}}, \quad i = 1, \dots, n,$$
where $n = 61$. The derivative problems for $u'[\theta_0](\theta_i - \theta_0)$ and $\phi'[\theta_0](\theta_i - \theta_0)$ involve bounds which take only the values $\widehat a, \widehat b \in \{0, \pm\infty\}$ and depend on the nominal solution $u_0$ and adjoint quantity $\phi_0$, see (6.3). These bounds are expressed in terms of constant values on the intervals of the boundary grid (step 4:), and again the simple active set strategy on the original grid is used to solve the derivative problems $u'[\theta_0](\theta - \theta_0)$ and $\phi'[\theta_0](\theta - \theta_0)$, see (step 5:), for the various perturbation directions $\theta_i - \theta_0$. Then two postprocessing steps follow. In the first (step 6:), $\widehat a$ and $\widehat b$ are determined from (6.3) more accurately than before, using the true intersection points of the nominal adjoint variable $\phi_0$ with the original bounds $a$ and $b$. In the second (step 7:), the derivative $u'[\theta_0](\theta - \theta_0)$ is postprocessed and set to the true projection of $\phi'[\theta_0](\theta - \theta_0)$ to the improved bounds $\widehat a$ and $\widehat b$. The exact procedure used to verify our theoretical results is outlined below as Algorithm 3.

Algorithm 3 The discretized procedure used to obtain the numerical results.
1: Run Algorithm 1 on the fixed grid (Figure 8.1). Active sets are determined by boundary mesh points. The results $u_0$ and $\phi_0$ are elements of $B_h$. The state $Su_0$ and adjoint state $-S^\star(Su_0 - \theta_0)$ are elements of $V_h$.
2: Obtain an improved solution $\tilde u_0 = \Pi_{[a,b]}(\phi_0)$ by carrying out the exact projection (postprocessing) of the adjoint quantity $\phi_0 \in B_h$ to the bounds $a$ and $b$. $\tilde u_0$ is no longer in $B_h$.
3: Repeat steps 1: and 2: for a sequence of perturbations $\{\theta_i\}_{i=1}^n$ near $\theta_0$ to obtain solutions $u_i$ and, by postprocessing, improved solutions $\tilde u_i$, $i = 1, \dots, n$. (This is to form the difference quotients (7.1)–(7.3) later.)
4: Compute the bounds $\widehat a$ and $\widehat b$ by (6.3) as functions which are constant (possibly $\pm\infty$) on the intervals of the boundary grid.
5: Run Algorithm 2 on the fixed grid (Figure 8.1), for the given sequence of perturbation directions $\theta_i - \theta_0$, $i = 1, \dots, n$. One obtains the derivatives $u'[\theta_0](\theta_i - \theta_0)$ and dual derivatives $\phi'[\theta_0](\theta_i - \theta_0)$, both elements of $B_h$.
6: Obtain an improved choice for the bounds $\widehat a$ and $\widehat b$ by determining the exact transition points in (6.3).
7: Obtain an improved derivative $\tilde u'[\theta_0](\theta_i - \theta_0)$ by carrying out the exact projection (postprocessing) of the dual derivative $\phi'[\theta_0](\theta - \theta_0)$ to the improved bounds $\widehat a$ and $\widehat b$.

Figure 8.3 (left) shows the behavior of the approximation errors $\|C_i(\theta_i) - u[\theta_i]\|_p$, while Figure 8.3 (right) shows the behavior of the error quotients $\|C_i(\theta_i) - u[\theta_i]\|_p \,/\, \|\theta_i - \theta_0\|_{L^2(\Omega)}$, as in (7.1)–(7.3). In the numerator, the $L^p(\Gamma)$ norms for $p \in \{2, \infty\}$ are used. The scales in Figure 8.3 are doubly logarithmic and they are the same for each of the plots.

Figure 8.3. Approximation errors $\|C_i(\theta) - u[\theta]\|_p$ (left) and error quotients (7.1)–(7.3) (right) in different $L^p(\Gamma)$ norms, plotted against the size of the perturbation $\|\theta_i - \theta_0\|_2$ in a double logarithmic scale. Top row refers to strategy (C1), middle row to (C2), bottom row to (C3). In each plot, the upper line corresponds to $p = \infty$, the lower to $p = 2$.

Using the procedure for the discretized problems outlined in Algorithm 3, we observe the following results:
(1) The approximation error for strategy (C2) is indeed smaller (approximately by a factor of 2) than the error using strategy (C1), see Figure 8.3 (first and second row), as expected from Theorem 7.1.
(2) The approximation error for strategy (C3) is in turn smaller (approximately by a factor of 7) than the error using strategy (C2), see Figure 8.3 (second and third row).
(3) As predicted by Theorem 7.1, the error quotient in the $L^\infty(\Gamma)$ norm does not tend to zero for strategies (C1) and (C2), see Figure 8.3 (top right and middle right).
(4) Theorem 7.1 predicts the approximation error and its quotient for strategy (C3) to tend to zero in particular in the $L^\infty(\Gamma)$-norm. In the experiments, we observe that the approximation error tends to a constant (approximately $6.3 \cdot 10^{-14}$, see Figure 8.3 (bottom left)). This is to be expected as we reach the discretization limit on the given grid.
To summarize, Theorem 7.1 is confirmed by the numerical results.
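As a concrete illustration of how the three strategies are formed from the output of Algorithms 1 and 2, consider the following hedged Python sketch for the discretized setting; the variable names are assumptions and do not refer to the code used for the experiments above.

```python
import numpy as np

def update_strategies(u0, phi0, du, dphi, a, b):
    """Form the approximations (C1)-(C3) of the perturbed solution u[theta].
    u0, phi0: nominal solution and dual quantity from Algorithm 1;
    du, dphi: directional derivatives u'[theta_0](theta - theta_0) and
              phi'[theta_0](theta - theta_0) from Algorithm 2;
    a, b:     bound vectors of the original problem."""
    C1 = u0 + du                          # (C1): plain first-order update
    C2 = np.clip(u0 + du, a, b)           # (C2): projected primal update
    C3 = np.clip(phi0 + dphi, a, b)       # (C3): projected dual update
    return C1, C2, C3
```

Each of the three vectors would then be compared with a recomputed solution $u[\theta]$ in the discrete $L^2$ and $L^\infty$ norms, which is the comparison underlying Figure 8.3.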
The update strategy (C3 ), which involves the dual variable φ, performs significantly better than the strategies based on the primal variable u. We can also offer a geometric interpretation for this: The derivative u′ [θ0 ] of the primal variable u0 is given by a projection and it is zero on the so-called strongly active sets, i.e., where φ0 6∈ [a, b], compare Theorem 4.5 and (6.3). Consequently, the primal-based strategies (C1 ) and (C2 ) can only predict a possible growth of the active sets from u0 to u[θ], and not their shrinking. On the other hand, the derivative of the dual variable φ′ [θ0 ] (Theorem 5.2) has a different structure and it can capture the change of active sets more accurately. Since u′ [θ0 ] and φ′ [θ0 ] are available simultaneously, see Algorithm 2, we advocate the use of strategy (C3 ) to recover a perturbed from an unperturbed solution. References [1] F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems. Springer, Berlin, 2000. [2] M. Hintermüller, K. Ito, and K. Kunisch. The primal-dual active set strategy as a semismooth Newton method. SIAM Journal on Optimization, 13(3):865–888, 2002. [3] D. Kinderlehrer and G. Stampacchia. An Introduction to Variational Inequalities and their Applications. Academic Press, New York, 1980. [4] K. Malanowski. Sensitivity analysis for parametric optimal control of semilinear parabolic equations. Journal of Convex Analysis, 9(2):543–561, 2002. [5] K. Malanowski. Remarks on differentiability of metric projections onto cones of nonnegative functions. Journal of Convex Analysis, 10(1):285–294, 2003. [6] K. Malanowski. Solution differentiability of parametric optimal control for elliptic equations. In System modeling and optimization, Proceedings of the IFIP TC7 Conference, volume 130, pages 271–285. Kluwer, 2003. [7] C. Meyer and A. Rösch. Superconvergence properties of optimal control problems. SIAM Journal on Control and Optimization, 43(3):970–985, 2004. [8] A. Shapiro. On concepts of directional differentiability. Journal of Optimization Theory and Applications, 66(3):477–487, 1990. [9] F. Tröltzsch. Optimale Steuerung partieller Differentialgleichungen. Vieweg, Wiesbaden, 2005. [10] M. Ulbrich. Semismooth Newton methods for operator equations in function spaces. SIAM Journal on Optimization, 13:805–842, 2003. 124 Numerical Methods and Applications 7. Quantitative Stability Analysis of Optimal Solutions in PDE-Constrained Optimization K. Brandes and R. Griesse: Quantitative Stability Analysis of Optimal Solutions in PDE-Constrained Optimization, Journal of Computational and Applied Mathematics, 206(2), p.809–826, 2007 The derivative of an optimal solution with respect to parameter perturbations naturally lends itself to the quantitative assessment of that solution’s stability. In this paper, we address the question on how to identify the perturbation direction which has the greatest impact on the solution, or on a quantity of interest depending on the solution. We address only the case without inequality constraints, because we exploit in particular the linearity of the map δπ 7→ DΞ(π0 ; δπ). However, the results can be easily extended to problems with inequality constraints if strict complementarity holds, compare Remark 0.5 on p. 10. We employ here the setting of a generic optimal control problem in Banach spaces, (7.1) min J(y, u, π) (y,u) subject to e(y, u, π) = 0. with Lagrangian L(y, u, p; π) = J(y, u, π) + hp, e(y, u, π)i . 
Under differentiability and constraint qualification assumptions, a local optimal solution of (7.1) for the nominal parameter π0 ∈ P satisfies, together with its adjoint state, Ly (y0 , u0 , p0 ; π0 ) = Lu (y0 , u0 , p0 ; π0 ) = Lp (y0 , u0 , p0 ; π0 ) = 0. Differentiating totally with respect to π yields δy δu := Ξ0 (π0 ) δπ = K−1 B δπ, δp where Lyy Lyu e?y Lyπ K = Luy Luu e?u B = − Luπ eπ ey eu 0 and everything is evaluated at Ξ(π0 ) = (y0 , u0 , p0 ) and parameter π0 . The Implicit Function Theorem justifies the existence and representation of the derivative since K is indeed boundedly invertible under (assumed) second-order sufficient optimality conditions. As stated above, it is our goal to analyze the rate of change of an observed quantity q(y, u, p) ∈ H which depends on the optimal solution, which in turn depends on the parameter π. Here, the space of observations H may be a finite or infinite dimensional Hilbert space, and the same holds for the space of parameters P . Let us denote the derivative of q at (y0 , u0 , p0 ) by Π. By the chain rule, q is totally differentiable with respect to the parameter π, and our operator of interest is (7.2) A := ΠK−1 B Note that A is the Fréchet derivative of the map π 7→ q(Ξ(π)) at π0 . In other words, it represents the linear relation between a perturbation direction δπ and the first order change in the observed quantity q, when π changes from π0 + δπ. The desired information regarding perturbation directions of greatest impact is contained in A and can be retrieved by a (partial) singular value decomposition (SVD) of A. (This requires A to be compact, which is naturally the case in many situations, see Example 2.9 in the paper.) The right singular vectors, in descending order with 7. Quantitative Stability Analysis 125 respect to the singular values, disclose the perturbation directions of greatest impact on the observed quantity q, in the norm of the observation space H. The corresponding left singular values yield the respective directional derivatives of q under these perturbations. The largest singular value allows to quantify the first-order stability of q, which may be an important piece of information in practical applications. The remainder of the paper deals with the practical evaluation of a partial and approximate SVD of A, after a Galerkin discretization. One of the challenges consists in overcoming the occurrence of Cholesky factors of mass matrices, which naturally appear when one computes with respect to given bases of finite dimensional subspaces of P and H. We achieve this goal by exchanging the SVD for an eigen decomposition of the associated Jordan-Wielandt matrix, to which we apply a suitable similarity transformation, see Section 3 of the paper. We describe an algorithm which allows to construct, using standard iterative eigen decomposition software such as Matlab’s eigs routine, a partial SVD entirely in terms of coordinate vectors with respect to the chosen bases, and without the need of modifying any scalar products. In Section 4 of the paper, we present numerical examples. We deal explicitly with the cases of low and high dimensional parameter and observation spaces P and H. 126 Numerical Methods and Applications QUANTITATIVE STABILITY ANALYSIS OF OPTIMAL SOLUTIONS IN PDE-CONSTRAINED OPTIMIZATION KERSTIN BRANDES AND ROLAND GRIESSE Abstract. PDE-constrained optimization problems under the influence of perturbation parameters are considered. A quantitative stability analysis for local optimal solutions is performed. 
The perturbation directions of greatest impact on an observed quantity are characterized using the singular value decomposition of a certain linear operator. An efficient numerical method is proposed to compute a partial singular value decomposition for discretized problems, with an emphasis on infinite-dimensional parameter and observation spaces. Numerical examples are provided. 1. Introduction In this work we consider nonlinear infinite-dimensional equality-constrained optimization problems, subject to a parameter p in the problem data: min f (x, p) x subject to e(x, p) = 0. (1.1) The optimization variable x and the parameter p are in some Banach and Hilbert spaces, respectively, and f and e are twice continuously differentiable. In particular, we have in mind optimal control problems for partial differential equations (PDE). When solving practical optimal control problems which describe the behavior of physical systems, uncertainty in the physical parameters is virtually unavoidable. In (1.1), the uncertain data is expressed in terms of a parameter p for which a nominal or expected value p0 is available but whose actual value is unknown. Having solved problem (1.1) for p = p0 , it is thus natural and sometimes crucial to assess the stability of the optimal solution with respect to unforeseen changes in the problem data. In this contribution we quantify the first-order stability properties of a local optimal solution of (1.1), and more generally, the stability properties of an observed quantity depending on the solution. We make use of the singular value decomposition (SVD) for compact operators. Moreover, we propose a practical and efficient procedure to approximate the corresponding singular system. The right singular vectors corresponding to the largest singular values represent the perturbation directions of greatest impact on the observed quantity. The singular values themselves provide an upper bound for the influence of unit perturbations. Altogether, this information allows practitioners to assess the stability properties of any given optimal solution, and to avoid the perturbations of greatest impact. Let us briefly relate our effort to previous results in the field. The differentiability properties of optimal solutions with respect to p in the context of PDE-constrained optimization were studied in, e.g., [4,10]. The impact of given perturbations on optimal solutions and the optimal value of the objective has also been discussed there. For the dependence of a scalar quantity of interest on perturbations we refer to [6]. All of these results admit pointwise inequality constraints for the control variable. For simplicity of the presentation, we elaborate on the case without inequality constraints. However, our results extend to problems with inequality (control) constraints in the presence of strict complementarity, see Remark 3.6. 7. Quantitative Stability Analysis 127 The material is organized as follows: In Section 2, we perform a first order perturbation analysis of solutions for (1.1) in the infinite-dimensional setting of PDEconstrained optimization, and discuss their stability properties using the singular value decomposition of a certain compact linear map. In Section 3 we focus on the discretized problem and propose a practical and efficient method to compute the most significant part of the singular system. Finally, we present numerical examples in Section 4. For normed linear spaces X and Y , L(X, Y ) denotes the space of bounded linear operators from X into Y . 
The standard notation Lp (Ω) and H 1 (Ω) for Sobolev spaces is used, see [1]. 2. Infinite-Dimensional Perturbation Analysis As mentioned in the introduction, we are mainly interested in the analysis of optimal control problems involving PDEs. Hence we re-state problem (1.1) as min f (y, u, p) subject to e(y, u, p) = 0 y,u (2.1) where the optimization variable x = (y, u) splits into a state variable y ∈ Y and a control or design variable u ∈ U and where e : Y × U → Z ⋆ represents the weak form of a stationary or non-stationary partial differential equation. Throughout, Y , U and Z are reflexive Banach spaces and Z ⋆ denotes the dual of Z. Problem (2.1) depends on a parameter p taken from a Hilbert space P , which is not optimized for but which represents perturbations or uncertainty in the problem data. We emphasize that p may be finite- or infinite-dimensional. For future reference, it will be convenient to define the Lagrangian of problem (2.1) as L(y, u, λ, p) = f (y, u, p) + hλ, e(y, u, p)i . (2.2) The following two results are well known [11]: Lemma 2.1 (First-Order Necessary Conditions). Let f and e be continuously differentiable with respect to (y, u). Moreover, let (y, u) be a local optimal solution for problem (2.1) for some given parameter p. If ey (y, u, p) ∈ L(Y, Z ⋆ ) is onto, then there exists a unique Lagrange multiplier λ ∈ Z such that the following optimality system is satisfied: Ly (y, u, λ, p) = fy (y, u, p) + hλ, ey (y, u, p)i = 0 (2.3) Lu (y, u, λ, p) = fu (y, u, p) + hλ, eu (y, u, p)i = 0 (2.4) Lλ (y, u, λ, p) = e(y, u, p) = 0. (2.5) In the context of optimal control, λ is called the adjoint state. A triple (y, u, λ) satisfying (2.3)–(2.5) is called a critical point. Lemma 2.2 (Second-Order Sufficient Conditions). Let (y, u, λ) be a critical point such that ey (y, u, p) is onto and let f and e be twice continuously differentiable with respect to (y, u). Suppose that there exists ρ > 0 such that Lxx (y, u, λ, p)(x, x) ≥ ρ kxk2Y ×U holds for all x ∈ ker ex (y, u, p). Then (y, u) is a strict local optimal solution of (2.1). Let us fix the standing assumptions for the rest of the paper: Assumption 2.3. (1) Let f and e be twice continuously differentiable with respect to (y, u, p). (2) Let p0 be a given nominal or expected value of the parameter, and let (y0 , u0 ) be a local optimal solution of (2.1) for p0 . (3) Suppose that ey (y0 , u0 , p0 ) is onto and that λ0 is the unique adjoint state. (4) Suppose that the second-order sufficient conditions of Lemma 2.2 hold at (y0 , u0 , λ0 ). 128 Numerical Methods and Applications Remark 2.4. For the sake of the generality of the presentation, we abstain from using more specific, i.e., weaker, second-order sufficient conditions for optimal control problems with PDEs, see, e.g., [16, 17]. In case the setting of a specific problem at hand requires refined second-order conditions and a careful choice of function spaces, the subsequent ideas still remain valid, compare Example 2.5. Let us define now the Karush-Kuhn-Tucker (KKT) operator Lyy Lyu e⋆y K = Luy Luu e⋆u ey eu 0 (2.6) where all terms are evaluated at the nominal solution (y0 , u0 , λ0 ) and the nominal parameter p0 , and e⋆y and e⋆u denote the adjoint operators of ey and eu , respectively. Note that K is self-adjoint. Here and in the sequel, when no ambiguity arises, we will frequently omit the function arguments. Under the conditions of Assumption 2.3, K is boundedly invertible as an element of L(Y × U × Z, Y ⋆ × U ⋆ × Z ⋆ ). 
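To make the block structure of (2.6) concrete, here is a hedged sketch of how a discretized KKT operator could be assembled and applied once the individual blocks are available as sparse matrices. Identifying the adjoints $e_y^\star$, $e_u^\star$ with matrix transposes presumes that the blocks are already expressed with respect to suitable inner products; all names are assumptions for illustration and are not part of the paper.

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def solve_kkt(Lyy, Lyu, Luu, ey, eu, rhs):
    """Assemble a discretized analogue of the KKT operator (2.6) from given
    sparse blocks (second derivatives of the Lagrangian and the linearized
    constraint) and solve one system K x = rhs, e.g. one of the sensitivity
    systems considered below."""
    K = sp.bmat([[Lyy,   Lyu, ey.T],
                 [Lyu.T, Luu, eu.T],
                 [ey,    eu,  None]], format="csc")   # None = zero block
    return spla.spsolve(K, rhs)
```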
Example 2.5 (Optimal Control of the Stationary Navier-Stokes System). As mentioned in Remark 2.4, nonlinear PDE-constrained problems may require refined secondorder sufficient conditions. Consider, for instance, the distributed optimal control problem for the stationary Navier-Stokes equations, min y,u s.t. γ 1 ky − yd k2[L2 (Ω)]N + kuk2[L2(Ω)]N 2 2 −ν∆y + (y · ∇)y + ∇p = u on Ω div y = 0 on Ω y = 0 on ∂Ω on some bounded Lipschitz domain Ω ⊂ RN , N ∈ {2, 3}. Suitable function spaces for the problem are Y = Z = closure in [H 1 (Ω)]N of {v ∈ [C0∞ (Ω)]N : div v = 0}, U = [L2 (Ω)]N . In [17, Theorem 3.16] it was proved that the condition Z kyk2[L2(Ω)]N + γkuk2[L2(Ω)]N + 2 (y · ∇)yλ0 ≥ ρ kuk2[L4/3(Ω)]N Ω for some ρ > 0 and all (y, u) satisfying the linearized state equation at (y0 , u0 ) is a second-order sufficient condition of optimality for a critical point (y0 , u0 , λ0 ). Hence this weaker condition may replace Assumption 2.3(4) for this problem. Still, it can be proved along the lines of [4, 10] that K is boundedly invertible as an element of L(Y × [L4/3 (Ω)]N × Z, Y ⋆ × [L4 (Ω)]N × Z ⋆ ). The subsequent ideas remain valid when U is replaced by L4/3 (Ω). From the bounded invertibility of K, we can easily derive the differentiability of the parameter-to-solution map from the implicit function theorem [2]: Lemma 2.6. There exist neighborhoods B1 of p0 and B2 of (y0 , u0 , λ0 ) and a continuously differentiable function Ψ : B1 → B2 such that for all p ∈ B1 , Ψ(p) is the unique solution in B2 of (2.3)–(2.5). The Fréchet derivative of Ψ at p0 is given by Lyp Ψ′ (p0 ) = −K−1 Lup (2.7) ep where the right hand side is evaluated at the nominal solution (y0 , u0 , λ0 ) and p0 . 7. Quantitative Stability Analysis 129 In particular, we infer from Lemma 2.6 that for a given perturbation direction p, the directional derivatives of the nominal optimal state and optimal control and the corresponding adjoint state (y, u, λ) are given by the unique solution of the linear system in Y ⋆ × U ⋆ × Z ⋆ y Lyp where B = − Lup . K u = B p (2.8) ep λ These directional derivatives are called the parametric sensitivities of the state, control and adjoint variables. They describe the first-order change in these variables as p changes from p0 to p0 + p. It is worth noting that these sensitivities can be characterized alternatively as the unique solution x = (y, u) and adjoint state of the following auxiliary problem with quadratic objective and linear constraint: 1 min Lxx (y0 , u0 , λ0 , p0 )(x, x) + Lxp (y0 , u0 , λ0 , p0 )(x, p) y,u 2 subject to ey (y0 , u0 , p0 ) y + eu (y0 , u0 , p0 ) u = −ep (y0 , u0 , p0 ) p. (2.9) Hence, computing the parametric sensitivity in a given direction p amounts to solving one linear-quadratic problem (2.9). We recall that it is our goal to analyze the stability properties of an observed quantity q : Y × U × Z ∋ (y, u, λ) 7→ q(y, u, λ) ∈ H depending on the solution, where H is another finite- or infinite-dimensional Hilbert space and q is differentiable. By the chain rule, the first-order change in the observed quantity, as p changes from p0 to p0 + p, is given by Π(y, u, λ) := q ′ (y0 , u0 , λ0 )(y, u, λ). (2.10) ′ We refer to Π = q (y0 , u0 , λ0 ) ∈ L(Y × U × Z, H) as the observation operator. Due to (2.8), we have the following linear relation between perturbation direction p and first order change in the observed quantity: Π(y, u, λ) = ΠK−1 B p. Example 2.7 (Observation Operators). 
(i) If one is interested in the impact of perturbations on the optimal state on some subset Ω′ of the computational domain Ω, one has q(y, u, λ) = y|Ω′ and, due to linearity, Π = q holds. (ii) If the quantity of interest is the impact of perturbations on the average value R of the control variable, one chooses q(y, u, λ) = u where the integral extends over the control domain. It is the bounded linear map ΠK−1 B that we now focus our attention on. The maximum impact of all perturbations (of unit size) on the observed quantity is given by the operator norm kΠK−1 BkL(P,H) = sup p6=0 kΠK−1 B pkH . kpkP (2.11) To simplify the notation, we will also use the abbreviation A := ΠK−1 B. In general, the operator norm need not be attained for any direction p. Therefore, and in order to perform the singular value decomposition, we make the following assumption: 130 Numerical Methods and Applications Assumption 2.8. Suppose that A is compact from P to H. To demonstrate that this assumption is not overly restrictive, we discuss several important examples. Recall that in PDE-constrained optimization, Y and Z are infinite-dimensional function spaces. Hence, K−1 cannot be compact since then its spectrum would contain 0 which entails non-invertibility of K−1 . (Of course, if all of Y , U and Z are finite-dimensional, Assumption 2.8 holds trivially.) Example 2.9 (Compactness of A). (i) If at least one of the parameter or observation spaces P or H is finite-dimensional, A is trivially compact. (ii) For sufficiently regular perturbations, B and thus A is compact: Consider the standard distributed optimal control problem with Y = Z = H01 (Ω), U = L2 (Ω), where Ω is a bounded domain with Lipschitz boundary in RN , N ≥ 1, yd , ud ∈ L2 (Ω), and γ 1 ky − yd k2L2 (Ω) + ku − ud k2L2 (Ω) 2 2 e(y, u, p)(ϕ) = (∇y, ∇ϕ) − (u, ϕ) − hp, ϕiH −1 (Ω),H01 (Ω) , f (y, u) = ϕ ∈ H01 (Ω), which corresponds to −∆y = u+p on Ω and y = 0 on ∂Ω. It is straightforward to verify that B = (0, 0, id)⊤ . By compact embedding, see [1], B is compact from P = L(N +2)/(2N )+ε (Ω) into Y ⋆ ×U ⋆ ×Z ⋆ for any ε > 0, and in particular for the Hilbert space P = L2 (Ω) in any dimension N . Hence A = ΠK−1 B is compact for P = L2 (Ω) and arbitrary linear and bounded observation operators Π. (iii) In the previous example, neither B nor K−1 B is compact if P = H −1 (Ω). In that case, one has to choose an observation space of sufficiently low regularity, so that Π and hence A is compact. For instance, in the previous example, Π(y, u, λ) = y is compact into H = L2 (Ω) due to the compact embedding of H01 (Ω) into L2 (Ω). We refer to Section 4 for more examples and return to the issue of computing the operator norm (2.11). This can be achieved by the singular value decomposition [3, Ch. 2.2]: Lemma 2.10. There exists a countable system {(σn , vn , un )}n∈N such that {σn }n∈N is non-increasing and non-negative, {(σn2 , vn )} ⊂ R×P is a complete orthonormal system of eigenpairs for AH A (spanning the closure of the range of AH ), and {(σn2 , un )} ⊂ R × H is a complete orthonormal system of eigenpairs for AAH (spanning the closure of the range of A). In addition, Avn = σn un holds and we have A p = ΠK−1 B p = ∞ X σn (p, vn )P un (2.12) n=1 for all p ∈ P , where the series converges in H. Every value in {σn }n∈N appears with finite multiplicity. In Lemma 2.10, AH : H → P denotes the Hilbert space adjoint of A and (·, ·)P is the scalar product of P . 
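A small self-contained computation may help to fix ideas before we introduce the terminology; it uses a random matrix and plain Euclidean inner products (an assumption purely for illustration — the spaces $P$ and $H$ above carry their own inner products, which is exactly what Section 3 deals with).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 7))     # stand-in for a finite-dimensional map A

U, S, Vt = np.linalg.svd(A, full_matrices=False)
v1 = Vt[0]                           # leading right singular vector

# The largest singular value is the operator norm, and it is attained at v1:
assert np.isclose(np.linalg.norm(A @ v1), S[0])
assert np.isclose(np.linalg.norm(A, 2), S[0])
```

In the infinite-dimensional setting, this is precisely the content of Proposition 2.11 below.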
A system according to Lemma 2.10 is called a singular system for A, with singular values σn , left singular vectors un ∈ H, and right singular vectors vn ∈ P . Knowledge of the singular system will not only allow us to compute the operator norm (2.11) and the direction(s) p for which this bound is attained, but in addition, we obtain a complete sequence of perturbation directions in decreasing order of importance with regard to the perturbations in the observed quantity. This is formulated in the following proposition: 7. Quantitative Stability Analysis 131 Proposition 2.11. Let {(σn , vn , un )}n∈N be a singular system for A. Then the operator norm in (2.11) is given by σ1 . Moreover, the supremum is attained exactly for all non-zero vectors p ∈ span{v1 , . . . , vk } =: V1 , where k is the largest integer such that σ1 = σk . Similarly, when A is restricted to V1⊥ , its operator norm is given by σk+1 and it is attained exactly for all non-zero vectors p ∈ span{vk+1 , . . . , vl }, where l is the largest integer such that σk+1 = σl , and so on. Proof. The claim follows directly from the properties of the singular system. Proposition 2.11 shows that the question of greatest impact of arbitrary perturbations on the observed quantity is answered by the singular value decomposition (SVD) of A. It is well known that SVD is closely related to principal components analysis (PCA) in statistics and image processing [8], and proper orthogonal decomposition (POD) in dynamical systems, compare [13, 18]. To our knowledge, however, this technique has not been exploited for the quantitative stability analysis of optimization problems. In the following section we focus on an efficient algorithm for the numerical computation of the largest singular values and left and right singular vectors for a discretized version of problem (2.1). 3. Numerical Stability Analysis In this section, we propose an efficient algorithm for the numerical computation of the singular system for a discretized (matrix) version of ΠK−1 B. The convergence of the singular system of the discretized problem to the singular system of the continuous problem will be discussed elsewhere. In practice, it will be sufficient to compute only a partial SVD, starting with the largest singular value, down to a certain threshold, in order to collect the perturbation directions of greatest impact with respect to the observed quantity. The method we propose makes use of existing standard software which iteratively approximates the extreme eigenpairs of non-symmetric matrices, and it will be efficient in the following sense: It is unnecessary to assemble the (discretized) matrix ΠK−1 B, which is prohibitive for high-dimensional parameter and observation spaces. Only matrix–vector products with K−1 B are required, i.e., the solution of sensitivity problems (2.8), and the inexpensive application of the observation operator Π. In particular, we avoid the computation of certain Cholesky factors which relate the Euclidean norms of coordinate vectors and the function space norms of the functions represented by them, see below. We discretize problem (2.1) by a Galerkin procedure, e.g., the finite element or wavelet method. To this end, we introduce finite-dimensional subspaces Yh ⊂ Y , Uh ⊂ U and Zh ⊂ Z, which inherit the norms from the larger spaces. The discretized problem reads min f (y, u, p) subject to e(y, u, p)(ϕ) = 0 for all ϕ ∈ Zh , y,u (3.1) where (y, u) ∈ Yh × Uh . 
In the general case of an infinite-dimensional parameter space, we also choose a finite-dimensional subspace Ph ⊂ P . Should any of the spaces be finite-dimensional in the first place, we leave it unchanged by discretization. Suppose that for the given parameter p0 ∈ Ph , a critical point for the discretized problem has been computed by a suitable method, for instance, by sequential quadratic programming (SQP) methods [12, 15]. That is, (yh , uh , λh ) ∈ Yh × Uh × Zh satisfies the discretized optimality system, compare (2.3)–(2.5): fy (yh , uh , p0 )(δyh ) + hλh , ey (yh , uh , p0 )(δyh )i = 0 for all δyh ∈ Yh (3.2) fu (yh , uh , p0 )(δuh ) + hλh , eu (yh , uh , p0 )(δuh )i = 0 for all δuh ∈ Uh (3.3) e(yh , uh , p0 )(δzh ) = 0 for all δzh ∈ Zh . (3.4) 132 Numerical Methods and Applications We consider the discrete analog of the sensitivity system (2.8), i.e., yh δyh E δyh E D D for all (δyh , δuh , δzh ) ∈ Yh × Uh × Zh , Kh uh , δuh = B h ph , δuh δzh δzh λh (3.5) where Kh and B h are defined as before in (2.6) and (2.8), evaluated at the critical point (yh , uh , λh ). The perturbation direction ph is taken from the discretized parameter space Ph . Assumption 3.1. Suppose that the critical point (yh , uh , λh ) is sufficiently close to the local solution of the continuous problem (y0 , u0 , λ0 ), such that second-order sufficient conditions hold for the discretized problem. That is, ey (yh , uh , p0 ) maps Yh onto Zh , and there exists ρ′ > 0 such that Lxx (yh , uh , λh , p0 )(x, x) ≥ ρ′ kxk2Y ×U for all x ∈ Yh × Uh satisfying hex (yh , uh , p0 )x, ϕi = 0 for all ϕ ∈ Zh . Under Assumption 3.1, the KKT operator Kh at the discrete solution is invertible and equation (3.5) gives rise to a linear map (Kh )−1 B h : Ph → Yh × Uh × Zh which acts between finite-dimensional spaces and thus is automatically bounded. There is no need to discretize the observation space H since ΠK−1 B, restricted to Ph , has finite-dimensional range. Nevertheless, we define for convenience the subspace of H, Rh = range of Πh (Kh )−1 B h considered as a map Ph → H, where Πh = q ′ (yh , uh , λh ), compare (2.10). We recall that it is our goal to calculate the portion of the singular system for Πh (Kh )−1 B h : Ph → Rh which belongs to the largest singular values. At this point, we introduce a basis for the discretized parameter space Ph , say Ph = span {ϕ1 , . . . , ϕm }. Likewise, we define a space Hh by Hh := span {ψ1 , . . . , ψn } such that Hh ⊃ Rh . Both the systems {ϕi } and {ψj } are assumed linearly independent without loss of generality. As the range space Rh is usually not known exactly, we allow the functions ψj to span a larger space Hh . For instance, in case of the state observation operator Πh (y h , uh , λh ) = y h , we may choose {ψj }nj=1 to be identical to the finite element basis of the state space Yh , which certainly contains the range space Rh . For the application of numerical procedures, we need to switch to a coordinate representation of the elements of the discretized parameter and observation spaces Ph and Hh . Note that a function p ∈ Ph can be identified with its coordinate vector p = (p1 , . . . , pm )⊤ with respect to the given basis. In other words, Rm and Ph are isomorphic, and the isomorphism and its inverse are given by the expansion and coordinate maps EP : Rm ∋ p 7→ m X pi ϕi ∈ Ph i=1 CP = EP−1 : Ph → Rm . We also introduce the mass matrix associated to the chosen basis of Ph , MP = (mij )m i,j=1 , mij = (ϕi , ϕj )P . 7. 
Quantitative Stability Analysis 133 In case of a discretization by orthogonal wavelets, MP is the identity matrix, while in the finite element case, MP is a sparse symmetric positive definite matrix. In any case, we have the following relation between the Euclidean norm of the coordinate vector p and the norm of the element p ∈ Ph represented by it: 1/2 kpk2P = p⊤ MP p = kMP pk22 , 1/2 1/2⊤ 1/2 where MP is the Cholesky factor of MP = MP MP , and k · k2 denotes the Euclidean norm of vectors in Rm or Rn . Similarly as above, we define expansion and −1 coordinate maps EH : Rn → Hh and CH = EH and the mass matrix MH = (mij )ni,j=1 , mij = (ψi , ψj )H to obtain ⊤ 1/2 khk2H = h MH h = kMH hk22 Pn for an element h = j=1 hj ψj ∈ Hh with coordinate vector h = (h1 , . . . , hn )⊤ . Any numerical procedure which solves the sensitivity problem (3.5) and applies the observation operator Πh does not directly implement the operator Πh (Kh )−1 B h . Rather, it realizes its representation in the coordinate systems given by the bases of Ph and Hh , i.e., Ah := CH Πh (Kh )−1 B h EP ∈ Rn×m . As mentioned earlier, the proposed method will employ matrix-vector products with Ah . Every matrix-vector product requires the solution of a discretized sensitivity equation (3.5) followed by the application of the observation operator. Note that there is a discrepancy in the operator Ah being given in terms of coordinate vectors and the requirement that the SVD should respect the norms of the spaces Ph and Hh . One way to overcome this discrepancy is to exchange the Euclidean scalar products in the SVD routine at hand by scalar products with respect to the mass matrices MP and Mh , respectively. In the sequel, we describe an alternative approach based on iterative eigen decomposition software, without the need of modifying any scalar products. By the relations between coordinate vectors and functions, we have kΠh (Kh )−1 B h kL(Ph ,Hh ) = = kΠh (Kh )−1 B h ph kH kph kP ph ∈Ph \{0} sup kΠh (Kh )−1 B h EP pkH kEH Ah pkH = sup 1/2 kEP pkP p∈Rm \{0} p∈Rm \{0} kM pk2 sup = 1/2 kMH Ah pk2 sup 1/2 p∈Rm \{0} kMP pk2 = P −1/2 1/2 h kMH A MP sup kp′ k2 p′ ∈Rm \{0} p′ k2 . (3.6) −1/2 The last manipulation is a coordinate transformation in Ph , and MP denotes the inverse of the Cholesky factor of MP . This transformation shows that a finitedimensional SVD procedure which employs the standard Euclidean vector norms in −1/2 1/2 the image and pre-image spaces should target the matrix MH Ah MP . Coordinate vectors referring to the new coordinate systems will be indicated by a prime. We have the relationships 1/2 1/2 p′ = MP p and kp′ k2 = kMP pk2 = kpkP . Hence the Euclidean norm of the transformed coordinate vector equals the norm of the function represented by it. The corresponding basis can in principle be obtained by an orthonormalization procedure with respect to the scalar product in P , starting from the previously chosen basis {ϕi }. Assembling the mass matrices and forming the 134 Numerical Methods and Applications 1/2 1/2 Cholesky factors MH and MP , however, will be too costly in general. Therefore, we propose the following strategy which avoids the Cholesky factors altogether. It is based on the following Jordan-Wielandt Lemma, see, e.g., [14, Theorem I.4.2]: −1/2 1/2 is equivalent to the Lemma 3.2. The singular value decomposition of MH Ah MP eigen decomposition of the symmetric Jordan-Wielandt matrix ! 
1/2 −1/2 0 MH Ah MP ∈ R(m+n)×(m+n) J= 1/2⊤ −1/2⊤ h⊤ 0 A MH MP min{m,n} are in the following sense: The eigenvalues of J are exactly ±σi , where {σi }i=1 1/2 h −1/2 the singular values of MH A MP , plus a suitable number of zeros. The eigenvectors vi′ belonging to the nonnegative eigenvalues σi , i = 1, . . . , min{m, n}, can be partitioned into vi′ = (l′i , r′i )⊤ , where r′i ∈ Rm and l′i ∈ Rn . After normalization, r′i 1/2 −1/2 and l′i are the right and left singular vectors of MH Ah MP . 1/2 −1/2 Exchanging the singular value decomposition of MH Ah MP for an eigen decomposition of the Jordan-Wielandt matrix J does not resolve the issue of forming the 1/2 1/2 Cholesky factors MH and MP . To this end, we apply a similarity transform to J using the similarity matrices ! ! 1/2 −1/2 MH 0 MH 0 −1 X = X= 1/2 . −1/2 , 0 MP 0 MP Then the transformed matrix XJX −1 = 0 MP−1 Ah⊤ MH Ah 0 (3.7) 1/2 −1/2 has the same eigenvalues as J, including the desired singular values of MH Ah MP Lemma 3.3. The transformed matrix has the form 0 CH Πh (Kh )−1 B h EP −1 XJX = , CP (B h )⋆ (Kh )−1 (Πh )⋆ EH 0 . (3.8) where (B h )⋆ : Yh × Uh × Zh → Ph and (Πh )⋆ : Hh → Yh × Uh × Zh are the adjoint operators of B h and Πh , respectively. Proof. We only need to consider the lower left block. By transposing Ah , we obtain ⋆ Ah⊤ = EP⋆ (B h )⋆ (Kh )−1 (Πh )⋆ CH since Kh is symmetric. By definition, the adjoint operator EP⋆ satisfies hEP⋆ ξ, piRm = hξ, EP piP for all ξ ∈ Ph and p ∈ Rm . Hence, we obtain p⊤ (EP⋆ ξ) = hξ, m X pi ϕi iP = p⊤ MP (CP ξ) i=1 and thus EP⋆ = MP CP . Moreover, −1 ⋆ −1 −1 −1 ⋆ ⋆ −1 CH = (EH ) = (EH ) = (MH CH )−1 = CH MH = EH MH holds. Consequently, MP−1 Ah⊤ MH = CP (B h )⋆ (Kh )−1 (Πh )⋆ EH as claimed. Remark 3.4. Algorithmically, evaluating a matrix-vector product with (3.8) and a given coordinate vector (h, p)⊤ ∈ Rn ×Rm amounts to solving two sensitivity problems: (1) The first problem is (3.5) with the perturbation direction p = EP p ∈ Ph . 7. Quantitative Stability Analysis 135 (2) For the second problem, the right hand side operator B h in (3.5) is replaced by (Πh )⋆ , and the observation operator Πh is replaced by (B h )⋆ . The direction of evaluation is h = EH h ∈ Hh . Step (2) requires a modification of the original sensitivity problem (3.5). As an alternative, one may apply the following duality argument to (3.7): The vector MP−1 Ah⊤ MH h ⊤ is equal to the transpose of h MH Ah MP−1 . In case that the dimension of the parameter space m is small, the inversion of MP and the solution of m sensitivity problems to get Ah MP−1 may be feasible. (1) (2) (1)⊤ M H wi Let us denote by wi = (wi , wi )⊤ the eigenvectors of XJX −1 belonging to the nonnegative eigenvalues σi , i = 1, . . . , min{m, n}. This similarity transformation with X and X −1 does indeed avoid the Cholesky factors of the mass matrices, as will become clear in the sequel. Recall that the eigenvalues of XJX −1 are ±σi , plus a suitable number of zeros, where σi are the desired singular values. Hence the largest singular values correspond to the eigenvalues of largest magnitude, which can be conveniently computed iteratively, e.g., by an implicitly restarted Arnoldi process [19, Ch. 6.4]. Available software routines include the library ArPack (DNAUPD and DNEUPD), see [9], and Matlab’s eigs function. In case that the parameter space (or the observation space) is lowdimensional, we may also compute the matrix XJX −1 explicitly, see Sections 4.1 and 4.2, but these cases are not considered typical for our applications. 
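The following Python sketch indicates how the matrix-vector products with the transformed Jordan-Wielandt matrix (3.7)-(3.8) could be wired into an iterative eigensolver. Here `apply_A`, `apply_AT`, `MH` and `solve_MP` are assumed callbacks and matrices supplied by the discretization (one sensitivity solve plus observation, its adjoint counterpart, the observation-space mass matrix, and an application of $M_P^{-1}$); they are not functions defined in the paper.

```python
import numpy as np
import scipy.sparse.linalg as spla

def leading_singular_pairs(apply_A, apply_AT, MH, solve_MP, n, m, s):
    """Approximate the s largest singular values via the eigenvalues of
    largest magnitude of X J X^{-1} = [[0, A], [MP^{-1} A^T MH, 0]],
    cf. (3.7)-(3.8); no Cholesky factors of mass matrices are required."""
    def matvec(x):
        h, p = x[:n], x[n:]
        top = apply_A(p)                       # upper right block: A^h p
        bottom = solve_MP(apply_AT(MH @ h))    # lower left block: MP^{-1} A^{h,T} MH h
        return np.concatenate([top, bottom])

    op = spla.LinearOperator((n + m, n + m), matvec=matvec, dtype=float)
    vals, vecs = spla.eigs(op, k=2 * s, which="LM")  # eigenvalues come in +/- pairs
    keep = np.argsort(-vals.real)[:s]                # keep the s positive ones
    return vals.real[keep], vecs[:, keep].real
```

The right and left singular vectors would then be recovered from the two blocks of each eigenvector by the normalizations (3.10) and (3.11) discussed next.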
We now discuss how to recover the desired partial singular value decomposition from the partial eigen decomposition of XJX −1 . For later reference, we note the following property of the eigenvectors of (3.7), which is readily verified: wi (1) (2)⊤ = wi (2) M P wi . (3.9) Note also that the eigenvectors wi of XJX −1 and vi′ of J are related by wi = Xvi′ . 1/2 −1/2 As the left and right singular vectors of MH Ah MP are just a partitioning of vi′ according to Lemma 3.2, we get ! ′ (1) li ′ −1 wi = vi = X (2) , r′i wi which in turn seems to bring up the Cholesky factors we wish to avoid. However, r′i is a coordinate vector with respect to an artificial (orthonormal) basis of Ph , which does not in general coincide with our chosen basis {ϕi }. Going back to this natural basis and normalizing, we arrive at ri = (2) (2)⊤ wi wi (2) 1/2 M P wi (3.10) Now ri is the coordinate representation of the desired i-th right singular vector with respect to the basis {ϕi }. Due to the normalization, the function represented by ri has P -norm one. We also wish to find the coordinate representation li of the response of the system (2) h A , given the perturbation input ri . As ri is a multiple of wi and thus part of an (1) eigenvector of XJX −1 , we infer from (3.7) that Ah maps ri to a multiple of wi . We are thus led to define li = (1) (1)⊤ wi wi (1) 1/2 M H wi . (3.11) 136 Numerical Methods and Applications (1) Despite the individual normalizations of wi the same proportionality constant: (2) and wi , li and ri are still related by Ah ri = σi li , (3.12) as can be easily verified using (3.9). We have thus proved our main result: Theorem 3.5. Suppose that σi > 0 is an eigenvalue of the matrix XJX −1 with (2) (1) eigenvector wi = (wi , wi )⊤ . Let ri and li be given by (3.10) and (3.11), respectively and let ri = EP ri ∈ Ph and li = EH li ∈ Hh be the functions represented by them. Then the following relations are satisfied: (a) kri kP = kli kH = 1. (b) The perturbation ri invokes the first order change σi li of magnitude σi in the observed quantity. In terms of coordinate vectors, Ah ri = σi li . Based on these considerations, we propose to compute the desired singular value −1/2 1/2 by iteratively approximation the extreme eigenvalues decomposition of MH Ah MP and corresponding eigenvectors of XJX −1 . This avoids the Cholesky factors of the mass matrices, as desired. We summarize the proposed procedure in Algorithm 1. Algorithm 1 Given: discretized spaces Yh , Uh , Zh and Ph , Hh , a discrete critical point (yh , uh , λh ) satisfying (3.2)–(3.4) for p0 ∈ Ph , a routine evaluating XJX −1 (h, p)⊤ for any given coordinate vector (h, p)⊤ , see Remark 3.4 Desired: a user-defined number s of singular values and perturbation directions (right singular vectors) in coordinate representation, which are of greatest first order impact with respect to the observed quantity 1: 2: 3: 4: Call a routine which iteratively computes the 2s eigenvalues λ1 ≥ λ2 ≥ . . . ≥ λs ≥ 0 ≥ λs+1 ≥ . . . ≥ λ2s of largest absolute value and corresponding eigenvectors wi of XJX −1. Set σi := λi for i = 1, . . . , s. (2) (1) Split wi into (wi , wi ) of lengths n and m, respectively, for i = 1, . . . , s. Compute vectors ri and li for i = 1, . . . , s according to (3.10) and (3.11). Remark 3.6. The singular value decomposition of A and Ah relies on the linearity of the map p 7→ (y, u, λ), which maps a perturbation direction p to the directional derivative of the optimal solution and adjoint state, compare (2.7)–(2.8). 
For optimal control problems with pointwise control constraints a(x) ≤ u(x) ≤ b(x) almost everywhere on the control domain, the derivative need not be linear with respect to the direction, see [4, 10]. The presence of strict complementarity, however, restores the linearity. The procedure outlined above carries over to this case, with only minor modifications of the operators Kh and B h on the so-called active sets, compare also [6]. 4. Numerical Examples We consider as an example the optimal control problem Z 1 γ minimize − y(x) dx + kuk2L2 (C) 4 Ω 2 −κ∆y = χ u on Ω C s.t. κ ∂ y = α (y − y∞ ) on ∂Ω. ∂n (4.1) 7. Quantitative Stability Analysis 137 It represents the optimal heating of a room Ω = (−1, 1)2 ⊂ R2 to maximal average temperature y, subject to quadratic control costs. Heating is achieved through two radiators on some part of the domain C ⊂ Ω, and the heating power u serves as a distributed control variable. κ denotes the constant heat diffusivity, while α is the heat transfer coefficient with the environment. The latter has constant temperature y∞ . α is taken to be zero at the walls but greater than zero at the two windows, see Figure 4.1. FEM Mesh window 1 1 0.8 radiator 1 0.6 0.4 0.2 0 −0.2 −0.4 −0.6 radiator 2 window 2 −0.8 −1 −1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1 Figure 4.1. Layout of the domain and an intermediate finite element mesh with 4225 vertices (degrees of freedom). In the sequel, we consider the window heat transfer coefficients as perturbation parameters. As its nominal value, we take 0 at the walls α(x) = 1 at the lower (larger) window # 2 2 at the upper (smaller) window # 1. We will explore how the optimal temperature y changes under changes of α. Our example fits in the framework of Section 2 with Z 1 γ f (y, u) = − y(x) dx + kuk2L2 (C) 4 Ω 2 e(y, u, p)(ϕ) = κ(∇y, ∇ϕ)Ω − (u, ϕ)C − (α(y − y∞ ), ϕ)∂Ω . Suitable function spaces for the problem are Y = H 1 (Ω), U = L2 (C), Z = H 1 (Ω), P = L2 (W1 ) × L2 (W2 ). f and e are infinitely differentiable w.r.t. (y, u, p). For any given (y, u, p) ∈ Y × U × P , ey (y, u, p) : Y → Z ⋆ is onto and even boundedly invertible. Moreover, the problem is strictly convex and thus has a unique global solution which satisfies the second-order condition. The KKT operator is boundedly invertible. As state observation operator, we will use Π(y, u, λ) = y ∈ H = L2 (Ω). Compactness of A then follows from compactness of the embedding Y ֒→ H. Hence the example satisfies the Assumptions 2.3 and 2.8. Note that the parameter enters only in the PDE and not in the objective. The problem is discretized using standard linear continuous finite elements for the state and adjoint, and discontinuous piecewise constant elements for the control. In order to estimate the order of convergence for the singular values, a hierarchy of uniformly refined triangular meshes is used. An intermediate mesh is shown in Figure 4.1 (right). Since the problem has a quadratic objective and a linear PDE constraint, its solution requires the solution of only one linear system involving K. Here and throughout, 138 Numerical Methods and Applications systems involving K were solved using the conjugate gradient method applied to the reduced Hessian operator −1 −1 ⋆ −ey eu −ey eu Lyy Lyu Kred = , Luy Luu id id see, e.g., [5,7] for details. The state and adjoint partial differential equations are solved using a sparse direct solver. 
Figure 4.2 shows the nominal solution (yh , uh ) in the case κ = 1, γ = 0.005, y∞ = 0 C = (−0.8, 0.0) × (0.4, 0.8) ∪ (−0.75, 0.75) × (−0.8, −0.6) W1 = (−0.75, 0) × {1}, W2 = (−0.75, 0.75) × {−1}. This setup describes the goal to heat up the room to a maximal average temperature (taking control costs into account) at an environmental temperature of 0◦ C. One clearly sees how heat is lost through the two windows. Nominal control 1 100 0.8 0.6 95 0.4 0.2 90 0 −0.2 85 −0.4 −0.6 80 −0.8 −1 −1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1 Figure 4.2. Nominal solution: Optimal state (left) and optimal control (right). In the sequel, we consider three variations of this problem. In every case, the insulation of the two windows, i.e., the heat transfer coefficient α restricted to the window areas, serves as a perturbation parameter. In Problem 1, this parameter is constant for each window and it is a spatial function in Problems 2 and 3. The optimal temperature y is the basis of the observation in all cases. In Problems 1 and 3, we observe the temperature at every point. In Problem 2, we consider only the average temperature throughout the room. Hence, these problems cover all cases where at least one of the parameter or observation spaces P and H is infinite-dimensional and high-dimensional after discretization. All examples are implemented using Matlab’s PDE toolbox. In every case, we use Matlab’s eigs function with standard tolerances to compute a partial eigen decomposition of the matrix XJX −1 . For Problems 1 and 2, we assemble this matrix explicitly according to (3.7). For Problem 3, we provide matrix-vector products with XJX −1 according to (3.8). Every matrix-vector product comes at the expense of the solution of two sensitivity problems (3.5), compare Remark 3.4. 4.1. Problem 1: Few Parameters, Large Observation Space. We begin by considering perturbations of the heat transfer coefficient on each window, i.e., p = (α|W1 , α|W2 ) ∈ R2 . 7. Quantitative Stability Analysis 139 That is, we study the effect of replacing the windows by others with different insulation properties. While the parameter space is only two-dimensional, we consider an infinite-dimensional observation space and observe the effect of the perturbations on the overall temperature throughout the room. That is, we have the observation operator Π(y, u, λ) = y, and the space H is taken as L2 (Ω). Hence the mass matrix MH in the discrete observation space is given by the L2 (Ω)-inner products of the linear continuous finite element basis on the respective grid. The mass matrix in the parameter space MP is chosen as 0.75 0 MP = 0 1.50 and it is generated by the L2 -inner product of the constant functions of value one on W1 and W2 . It thus reflects the lengths of the two windows and allows a comparison with Problem 3 later on. Since the matrix Ah ∈ Rn×2 has only two columns, it can be formed explicitly by solving only two sensitivity systems. From there, we easily set up XJX −1 according to (3.7) to avoid Cholesky factors of mass matrices, and perform an iterative partial eigen decomposition. Note that since Ah has only two nonzero singular values, only four eigenvalues of XJX −1 are needed. Table 4.1 shows the convergence of the singular values as the mesh is uniformly refined. In addition, the number of degrees of freedom of each finite element mesh and the total number of variables in the optimization problem is shown. 
The last column lists the number of QP steps, i.e., solutions of (3.5) with matrix $K^h$, which were necessary to obtain convergence of the (partial) eigen decomposition. For this problem, the number of QP solves is always two since $A^h \in \mathbb{R}^{n \times 2}$ was assembled explicitly. Note also that our original problem (4.1) is linear-quadratic, hence finding the nominal solution requires only one solution with $K^h$, and computing the singular values and vectors is twice as expensive.

  # dof     # var        σ1     rate       σ2     rate    # A^h p
     81       168      5.0572     --     1.1886     --       2
    289       626     11.8804   0.93     2.2487   0.81       2
  1 089     2 394     13.3803   0.32     2.5896   0.40       2
  4 225     9 530     16.6974   1.15     3.2168   1.29       2
 16 641    38 136     18.8838   2.31     3.5678   2.38       2
 66 049   151 898     19.3367   2.48     3.6283   1.87       2
263 169   605 946     19.4352     --     3.6510     --       2

Table 4.1. Degrees of freedom and total number of discrete state, control and adjoint variables on a hierarchy of finite element grids. Singular values and estimated rate of convergence w.r.t. grid size h for Problem 1. Number of sensitivity problems (3.5) solved.

In this and the subsequent problems, we observed monotone convergence of the computed singular values. The estimated rate of convergence given in the tables was calculated according to
$$\log\!\Big(\frac{|\sigma_h - \sigma_*|}{|\sigma_{2h} - \sigma_*|}\Big) \Big/ \log\frac12,$$
where $\sigma_*$ is the respective singular value on the finest mesh, and $\sigma_h$ and $\sigma_{2h}$ denote the same value on two neighboring intermediate meshes. The exact rate of convergence is difficult to predict from the table and clearly deserves further investigation.

On the finest mesh, we obtain as singular values and right singular vectors
$$\sigma_1 = 19.3367, \quad r_1 = (-0.5103, -0.7324)^\top, \qquad \sigma_2 = 3.6283, \quad r_2 = (-1.0358, 0.3609)^\top.$$
Recall that $r_1$ and $r_2$ represent piecewise constant functions $r_1$ and $r_2$ on $W_1 \cup W_2$ whose values on $W_1$ and $W_2$ are given by the upper and lower entries, respectively, see Figure 4.3 (right). The corresponding left singular vectors are shown in Figure 4.3 (left).

Figure 4.3. Problem 1: First and second left singular vectors $l_1$ and $l_2$ (left) and first and second right singular vectors (right), lower window (red) and upper window (blue).

These results can be interpreted as follows: Of all perturbations of unit size (with respect to the scalar product given by $M_P$), the nominal state (from Figure 4.2) is perturbed most (in the $L^2(\Omega)$-norm) when both windows are better insulated, with the ratio of the improvement given by the ratio of the entries of the right singular vector $r_1$. The effect of this perturbation direction on the observed quantity (the optimal state) is represented by the first left singular vector $l_1 = E_H l_1$, multiplied by $\sigma_1$, compare (3.12). Due to the improved insulation at both windows, $l_1$ is positive, i.e., the optimal temperature increases throughout the domain $\Omega$ when $p$ changes from $p_0$ to $p_0 + r_1$. Since the second entry in $r_1$ is greater in magnitude, the effect on the optimal temperature is more pronounced near the lower window, see Figure 4.3 (top left). Since the parameter space is only two-dimensional, the second right singular vector $r_2$ represents the unit perturbation of lowest impact on the optimal state. Figure 4.3 (bottom left) shows the corresponding second left singular vector.
Note that ‖l1‖L2(Ω) = ‖l2‖L2(Ω) = 1 and that l1 and l2 are perpendicular with respect to the inner product of L2(Ω). The singular value σ2 shows that any given perturbation of the heat transfer coefficients of unit size has at least an impact of 3.6283 on the optimal state in the L2(Ω)-norm, to first order. This should be viewed in relation to the L2(Ω)-norm of the nominal solution, which is 48.3982. The data obtained from the singular value decomposition can be used to decide whether the observed quantity depending on the optimal solution is sufficiently stable with respect to perturbations. This decision should take into account the expected range of parameter variations and the tolerable variations in the observed quantity.

4.2. Problem 2: Many Parameters, Small Observation Space. In contrast to the previous situation, we now consider the window heat transfer coefficients to be spatially variable. That is, we have parameters p = (α(x)|W1, α(x)|W2) ∈ L2(W1) × L2(W2). As an observed quantity, we choose the scalar value of the temperature averaged over the entire room. Hence the observation space is H = R and

Π(y, u, λ) = (1/4) ∫Ω y(x) dx.

Such a scalar output quantity is often called a quantity of interest. The weight in the observation space is MH = 1 and the mass matrix in the parameter space is the boundary mass matrix on W1 ∪ W2 with respect to piecewise constant functions on the boundary of the respective finite element grid. The matrix Ah ∈ R1×m now has only one row. It is thus strongly advisable to compute its transpose, which requires only one solution of a linear system with Kh. This transposition technique was already used in [6] to compute derivatives of a quantity of interest depending on an optimal solution in the presence of perturbations. As above, we show in Table 4.2 the convergence behavior of the only non-zero singular value of Ah.

  # dof     # var        σ1     rate   # Ah p
     81       168     2.5381      –       1
    289       626     5.9245    0.93      1
  1 089     2 394     6.6786    0.32      1
  4 225     9 530     8.3316    1.15      1
 16 641    38 136     9.4157    2.31      1
 66 049   151 898     9.6393    2.47      1
263 169   605 946     9.6887      –       1

Table 4.2. Problem 2: Singular value and estimated rate of convergence w.r.t. grid size h. Number of sensitivity problems (3.5) solved.

Figure 4.4 (right) displays the right singular vector r1 = EP r1 belonging to this problem. From this we infer that the largest increase in average temperature is achieved when the insulation at the larger (lower) window is improved to a higher degree than that of the smaller (upper) window, although the nominal insulation of the larger (lower) window is already twice as good. It is interesting to note that for the maximum impact on the average temperature, the insulation should be improved primarily near the edges of the windows. Again, the sensitivity y of the optimal state belonging to the perturbation of greatest impact is positive throughout (Figure 4.4 (left)).

Figure 4.4. Problem 2: Parametric sensitivity y (left) of the optimal state belonging to the first right singular vector r1 (right). Lower window (red) and upper window (blue).

4.3. Problem 3: Many Parameters, Large Observation Space. The final example features both large parameter and observation spaces, so that assembling the matrices Ah and XJX−1 as in the previous examples is prohibitive. Instead, we supply only matrix-vector products of XJX−1 to the iterative eigen solver.
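The matrix-free eigenvalue computation used for Problem 3 can be sketched in a few lines. The following Python fragment is purely illustrative and not from the paper: it replaces Matlab's eigs by SciPy's ARPACK interface, it works with the plain symmetric Jordan-Wielandt matrix of a random stand-in operator A instead of the similarity-transformed matrix XJX−1 from (3.8), and the callbacks apply_A and apply_At are hypothetical placeholders for the sensitivity solves (3.5) that each matrix-vector product costs in the actual application.

import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

# toy stand-in for the sensitivity operator: in the application, A is never
# assembled, and each product with A or A^T costs one sensitivity solve (3.5)
m, n = 500, 200                                  # observation / parameter space dims (toy)
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))

apply_A  = lambda p: A @ p                       # placeholder for a sensitivity solve
apply_At = lambda w: A.T @ w                     # placeholder for an adjoint sensitivity solve

# symmetric Jordan-Wielandt matrix [[0, A], [A^T, 0]]: its positive eigenvalues are the
# singular values of A, and the eigenvectors stack the left and right singular vectors
def matvec(x):
    return np.concatenate([apply_A(x[m:]), apply_At(x[:m])])

J = LinearOperator((m + n, m + n), matvec=matvec, dtype=float)

k = 4                                            # number of wanted singular values
sigma = np.sort(eigsh(J, k=k, which='LA', return_eigenvectors=False))[::-1]
print(sigma)
print(np.linalg.svd(A, compute_uv=False)[:k])    # dense check, only feasible in the toy case

In the toy case the result can be checked against a dense SVD; in the application, only the matrix-vector products are available and the iterative solver is the only option.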
This situation is considered typical for many applications. The parameter space is chosen as in Problem 2, and the observation is the temperature on all of Ω as in Problem 1. Table 4.3 shows again the convergence of the singular values as the mesh is uniformly refined.

  # dof     # var        σ1     rate        σ2     rate   # Ah p
     81       168     5.0771      –      1.1947      –      40
    289       626    11.9262    0.93     2.3426    0.83     68
  1 089     2 394    13.4326    0.32     2.6603    0.35     68
  4 225     9 530    16.7587    1.15     3.3093    1.20     68
 16 641    38 136    18.9500    2.31     3.7092    2.31     68
 66 049   151 898    19.4037    2.48     3.7896    2.31     68
263 169   605 946    19.5024      –      3.8099      –      68

Table 4.3. Problem 3: Singular values and estimated rate of convergence w.r.t. grid size h for Problem 3. Number of sensitivity problems (3.5) solved.

Note that the parameter space of Problem 1 (two constant heat transfer coefficients) is a two-dimensional subspace of the current high-dimensional parameter space. Hence, we expect the singular values for Problem 3 to be greater than those for Problem 1. This is confirmed by comparing Tables 4.1 and 4.3. However, the first two singular values σ1 and σ2 are only slightly larger than in Problem 1. In particular, the augmentation of the parameter space does not lead to additional perturbation directions of an impact comparable to the impact of r1. Comparing the right singular vector r1, Figure 4.5 (top right), with the right singular vector r1 = (−0.5103, −0.7324)⊤ from Problem 1, representing a piecewise constant function, we infer that the stronger insulation near the edges of the windows does not significantly increase the impact on the optimal state. We also observe that the first right singular vector r1 (Figure 4.5 (top right)) describing the perturbation of largest impact on the optimal state is very similar to the right singular vector in Problem 2, see Figure 4.4 (right), although the observed quantities are different in Problems 2 and 3.

Figure 4.5. Problem 3: First and second left singular vectors (left) and first and second right singular vectors (right), lower window (red) and upper window (blue).

Finally, we present in Figure 4.6 the distribution of the largest 20 singular values. Their fast decay shows that only a few singular values and the corresponding right singular vectors capture the practically significant perturbation directions of high impact for the problem at hand.

Figure 4.6. Problem 3: First 20 singular values.

5. Conclusion

In this paper, we presented an approach for the quantitative stability analysis of local optimal solutions in PDE-constrained optimization. The singular value decomposition of a compact linear operator was used in order to determine the perturbation direction of greatest impact on an observed quantity which in turn depends on the solution. After a Galerkin discretization, mass matrices and their Cholesky factors naturally appear in the singular value decomposition of the discretized operator.
In order to avoid forming these Cholesky factors, we described a similarity transformation of the Jordan-Wielandt matrix. A matrix-vector multiplication with this transformed matrix amounts to the solution of two sensitivity problems. The desired (partial) singular value decomposition can be obtained using standard iterative eigen decomposition software, e.g., implicitly restarted Arnoldi methods. We presented a number of numerical examples to validate the proposed method and to explain the results in the context of a concrete problem. The order of convergence of the singular values deserves further investigation. We observed that the numerical effort even for the computation of few singular values may be large compared to the solution of the nominal problem itself. In order to accelerate the computation of the desired singular values and vectors, however, it may be sufficient to compute them on a coarser grid. In addition, parallel implementations of eigen solvers can be used. References [1] R. Adams and J. Fournier. Sobolev Spaces. Academic Press, New York, second edition, 2003. [2] K. Deimling. Nonlinear Functional Analysis. Springer, Berlin, 1985. [3] H. Engl, M. Hanke, and A. Neubauer. Regularization of Inverse Problems. Kluwer Academic Publishers, Boston, 1996. [4] R. Griesse. Parametric sensitivity analysis in optimal control of a reaction-diffusion system— Part I: Solution differentiability. Numerical Functional Analysis and Optimization, 25(1–2):93– 117, 2004. [5] R. Griesse. Parametric sensitivity analysis in optimal control of a reaction-diffusion system— Part II: Practical methods and examples. Optimization Methods and Software, 19(2):217–242, 2004. [6] R. Griesse and B. Vexler. Numerical sensitivity analysis for the quantity of interest in PDEconstrained optimization. SIAM Journal on Scientific Computing, 29(1):22–48, 2007. [7] M. Hinze and K. Kunisch. Second order methods for optimal control of time-dependent fluid flow. SIAM Journal on Control and Optimization, 40(3):925–946, 2001. [8] I. Jolliffe. Principal Component Analysis. Springer, New York, second edition, 2002. [9] R. B. Lehoucq, D. C. Sorensen, and C. Yang. Arpack User’s Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. Software, Environments, and Tools. SIAM, Philadelphia, 1998. [10] K. Malanowski. Sensitivity analysis for parametric optimal control of semilinear parabolic equations. Journal of Convex Analysis, 9(2):543–561, 2002. [11] H. Maurer and J. Zowe. First and second order necessary and sufficient optimality conditions for infinite-dimensional programming problems. Mathematical Programming, 16:98–110, 1979. [12] J. Nocedal and S. Wright. Numerical Optimization. Springer, New York, 1999. [13] L. Sirovich. Turbulence and the dynamics of coherent structures. I. Quarterly of Applied Mathematics, 45(3):561–571, 1987. [14] G. Stewart and J.-G. Sun. Matrix Perturbation Thoery. Academic Press, New York, 1990. [15] F. Tröltzsch. On the Lagrange-Newton-SQP method for the optimal control of semilinear parabolic equations. SIAM Journal on Control and Optimization, 38(1):294–312, 1999. [16] F. Tröltzsch. Optimale Steuerung partieller Differentialgleichungen. Vieweg, Wiesbaden, 2005. [17] F. Tröltzsch and D. Wachsmuth. Second-order sufficient optimality conditions for the optimal control of Navier-Stokes equations. ESAIM: Control, Optimisation and Calculus of Variations, 12(1):93–119, 2006. [18] S. Volkwein. 
Interpretation of proper orthogonal decomposition as singular value decomposition and HJB-based feedback design. In Proceedings of the Sixteenth International Symposium on Mathematical Theory of Networks and Systems (MTNS), Leuven, Belgium, 2004.
[19] D. Watkins. Fundamentals of Matrix Computations. Wiley-Interscience, New York, 2002.

8. Numerical Sensitivity Analysis for the Quantity of Interest in PDE-Constrained Optimization

R. Griesse and B. Vexler: Numerical Sensitivity Analysis for the Quantity of Interest in PDE-Constrained Optimization, SIAM Journal on Scientific Computing, 29(1), p.22–48, 2007

As in the previous section, we consider the situation of an observed quantity which depends on the solution of an optimization problem subject to perturbations. This quantity of interest I, or output functional, is real-valued and may differ from the cost functional J used during optimization. Using the notation of the previous section, we introduce the Lagrangian L(y, u, p; π) = J(y, u, π) + ⟨p, e(y, u; π)⟩ and the reduced cost functional j(π) = (J ◦ Ξ)(π). We recall that the first-order directional derivatives of j with respect to perturbations satisfy

Dj(π0; δπ) = D(J ◦ Ξ)(π0; δπ) = Lπ(Ξ(π0)) δπ.   (8.1)

This is due to the fact that the partial derivatives Ly, Lu and Lp vanish at Ξ(π0). Moreover, (8.1) continues to hold in the presence of control constraints, see, e.g., Malanowski [2002], or Proposition 3.16 of the paper. The situation is different for a quantity of interest I ≠ J, since the evaluation of its directional derivative D(I ◦ Ξ)(π0; δπ) requires the solution of one sensitivity problem to find DΞ(π0; δπ), see Proposition 3.5 and Theorem 3.6 below. The focus of the present paper is on the efficient evaluation of the gradient and Hessian of the reduced quantity of interest i(π) = (I ◦ Ξ)(π). It extends results from Becker and Vexler [2005], where the gradient case was investigated. Naturally, the gradient and Hessian information of i and j can be used, for instance, to predict the value of these quantities for perturbed problem settings (compare Section 6).

From the discussion above we conclude that the straightforward evaluation of the gradient i′(π0) requires the solution of dim P sensitivity problems (one for each direction), where dim P is the dimension of the parameter space. Using a duality (transposition) argument, we are able to reduce this effort to only one sensitivity problem. With the same idea, the evaluation of the Hessian can be accomplished by solving 1 + dim P sensitivity problems, rather than O((dim P)²) in a straightforward approach. The duality trick is outlined in Section 3.1 of the paper, and it is elaborated on in Sections 3.2 and 3.3 for problems without and with control constraints, respectively. In the control-constrained case, we need to assume strict complementarity, and, in order for the second derivative of Ξ(π) to exist, we need to make the additional assumption that the active sets do not change when moving from the nominal to the perturbed problem, see the text before Assumption 3.19. The paper concludes with the presentation of an algorithm for the evaluation of the gradient and Hessian of the reduced cost functional and quantity of interest j(π) and i(π), and with two numerical examples which verify the proposed method.
The first example is a parameter estimation problem for the incompressible Navier-Stokes equations, and the quantity of interest is either the parameter to be identified, or the drag of a cylinder located in the flow. The scalar perturbation parameter enters one of the boundary conditions, and no inequality constraints are present. The second example is the control-constrained boundary optimal control problem for the reaction-diffusion system considered in Section 4 of this thesis. The quantity of interest is the total amount of control action over time, and the infinite-dimensional parameter is one of the initial states of the system.

NUMERICAL SENSITIVITY ANALYSIS FOR THE QUANTITY OF INTEREST IN PDE-CONSTRAINED OPTIMIZATION

ROLAND GRIESSE AND BORIS VEXLER

Abstract. In this paper, we consider the efficient computation of derivatives of a functional (the quantity of interest) which depends on the solution of a PDE-constrained optimization problem with inequality constraints and which may be different from the cost functional. The optimization problem is subject to perturbations in the data. We derive conditions under which the quantity of interest possesses first and second order derivatives with respect to the perturbation parameters. An algorithm for the efficient evaluation of these derivatives is developed, with considerable savings over a direct approach, especially in the case of high-dimensional parameter spaces. The computational cost is shown to be small compared to that of the overall optimization algorithm. Numerical experiments involving a parameter identification problem for Navier-Stokes flow and an optimal control problem for a reaction-diffusion system are presented which demonstrate the efficiency of the method.

1. Introduction

In this paper we consider PDE-constrained optimization problems with inequality constraints. The optimization problems are formulated in a general setting including optimal control as well as parameter identification problems. The problems are subject to perturbations in the data. We suppose that we are given a quantity of interest (output functional), which depends on both the state and the control variables and which may be different from the cost functional used during the optimization. The quantity of interest is shown to possess first and, under tighter assumptions, second order derivatives with respect to the perturbation parameters. In the presence of control constraints, strict complementarity and compactness of certain derivatives of the state equation are assumed; for second order derivatives, stability of the active set is required in addition. The precise conditions are given in Section 3.

The main contribution of this paper is to devise an efficient algorithm to evaluate these sensitivity derivatives, which offers considerable savings over a direct approach, especially in the case of high-dimensional parameter spaces. We show that the derivatives of the quantity of interest can be computed with little additional numerical effort in comparison to the corresponding derivatives of the cost functional. Moreover, the computational cost for the evaluation of the gradient of the quantity of interest is independent of the dimension of the parameter space and low compared to that of the overall optimization algorithm. The cost to evaluate the Hessian grows linearly with the dimension of the parameter space. We refer to Table 3.1 for details.
The parametric derivatives of the quantity of interest offer a significant amount of additional information on top of an optimal solution. The derivative information can be used to assess the stability of an optimal solution, or to compute a Taylor expansion which allows the fast prediction of the perturbed value of the quantity of interest in a neighborhood of a reference parameter. We note that a quantity of interest different from the cost functional is often natural. For instance, an optimization problem in fluid flow may aim at minimizing the drag of a given body, e.g., by adjusting the boundary conditions. The quantity of interest, 148 Numerical Methods and Applications however, may be the lift coefficient of the optimal configuration. We also mention the applicability of our results to bi-level optimization problems where the outer variable is the ”perturbation” parameter and the outer objective is the output functional, whose derivatives are needed to employ efficient optimization algorithms. The necessity to compute higher order derivatives may impose possible limitations to the applicability of the methods presented in this paper. Second order derivatives of the cost functional and the PDE constraint are required to evaluate the gradient of the quantity of interest, and third order derivatives are required to evaluate the Hessian. Let us put our work into perspective. The existence of first and second order sensitivity derivatives of the objective function (cost functional) in optimal control of PDEs with control constraints has been proved in [7, 17]. Moreover, [8] addresses the numerical computation of these derivatives. Recently, the computation of the gradient of the quantity of interest in the absence of inequality constraints has been discussed in [3]. Problem Setting. We consider the PDE-constrained optimization problem in the following abstract form: The state variable u in an appropriate Hilbert space V with scalar product (·, ·)V is determined by a partial differential equation (state equation) in weak form: a(u, q, p)(φ) = f (φ) ∀φ ∈ V, (1.1) where q denotes the control, or more generally, design variable in the Hilbert space Q = L2 (ω) with the standard scalar product (·, ·). Typically, ω is a subset of the computational domain Ω or a subset of its boundary ∂Ω. In case of finite dimensional controls we set Q = Rn and identify this space with L2 (ω) where ω = {1, 2, . . . , n} to keep the notation consistent. The parameter p from a normed linear space P describes the perturbations of the data. For fixed p ∈ P, the semi-linear form a(·, ·, p)(·) is defined on the Hilbert space V × Q × V. Semi-linear forms are written with two parentheses, the first one refers to the nonlinear arguments, whereas the second one embraces all linear arguments. The partial derivatives of the semi-linear form a(·, ·, p)(·) are denoted by a0u (·, ·, p)(·, ·), a0q (·, ·, p)(·, ·) etc. The linear functional f ∈ V 0 represents the right hand side of the state equation, where V 0 denotes the dual space of V. For the cost functional (objective functional) we assume the form α J(u, p) + kq − qk2Q , (1.2) 2 which is typical in PDE-constrained optimization problems. Here, α > 0 is a regularization parameter and q ∈ Q is a reference control. The functional J : V × P → R is also subject to perturbation. It is possible to extend our analysis to more general cost functionals than (1.2). 
In particular, only notational changes are necessary if J contains linear terms in q, and if α and q also depend on the perturbation parameter. However, full generality of the cost functional comes at the expense of additional assumptions which would unnecessarily complicate the discussion. In order to cover additional control constraints we introduce a nonempty closed convex subset Qad ⊂ Q by: Qad = {q ∈ Q | b− (x) ≤ q(x) ≤ b+ (x) a.e. on ω}, with bounds b− ≤ b+ ∈ Q. In the case of finite dimensional controls the inequality b− ≤ q ≤ b+ is meant to hold componentwise. The problem under consideration is to minimize (1.2) over Qad × V subject to the state equation (1.1) (OP(p)) 8. Numerical Sensitivity Analysis for the Quantity of Interest 149 for fixed p ∈ P. We assume that in a neighbourhood of a reference parameter p0 , there exist functions u = U (p) and q = Q(p), which map the perturbation parameter p to a local solution (u, q) of the problem (OP(p)). In Section 3, we give sufficient conditions ensuring the existence and differentiability of these functions. Our results complement previous findings in [7, 10, 17]. The quantity of interest is denoted by a functional I : V × Q × P → R. (1.3) This gives rise to the definition of the reduced quantity of interest i : P → R, i(p) = I(U (p), Q(p), p). (1.4) Likewise, we denote by j : P → R the reduced cost functional: j(p) = J(U (p), p) + α kQ(p) − qk2Q . 2 (1.5) As stated above, the main contribution of this paper is to devise an efficient algorithm to evaluate the first and second derivatives of the reduced quantity of interest i(p). The outline of the paper is as follows: In the next section we specify the first order necessary optimality conditions for the problem under consideration. We recall a primal-dual active set method for its solution. The core step of this method is described to some detail since it is also used for the problems arising during the sensitivity computation. In Section 3 we use duality arguments for the efficient evaluation of the first and second order sensitivities of the quantity of interest with respect to perturbation parameters. Throughout, we compare the standard sensitivity analysis for the reduced cost functional j(p) with our analysis for the reduced quantity of interest i(p). In the last section we discuss two numerical examples illustrating our approach. The first example deals with a parameter identification problem for a channel flow described by the incompressible Navier-Stokes equations. In the second example we consider the optimal control of time-dependent three-species reaction-diffusion equations under control constraints. 2. Optimization algorithm In this section we recall the first order necessary conditions for the problem (OP(p)) and describe the optimization algorithm with active set strategy which we use in our numerical examples. In particular, we specify the Newton step taking into account the active sets since the sensitivity problems arising in Section 3 are solved by the same technique. Throughout the paper we make the following assumption: Assumption 2.1. (1) Let a(·, ·, ·)(·) be three times continuously differentiable with respect to (u, q, p). (2) Let J(·, ·) be three times continuously differentiable with respect to (u, p). (3) Let I(·, ·, ·) be twice continuously differentiable with respect to (u, q, p). 
In order to establish the optimality system, we introduce the Lagrangian L : V × Q × V × P → R as follows: L(u, q, z, p) = J(u, p) + α kq − qk2Q + f (z) − a(u, q, p)(z), 2 (2.1) 150 Numerical Methods and Applications where z ∈ V denotes the adjoint state. The first order necessary conditions for the problem (OP(p)) read: L0u (u, q, z, p)(δu) 0 Lq (u, q, z, p)(δq − q) L0z (u, q, z, p)(δz) = 0 ∀δu ∈ V, ≥ 0 ∀δq ∈ Qad , (2.2) (2.3) = (2.4) 0 ∀δz ∈ V. They can be explicitly rewritten as follows: Ju0 (u, p)(δu) − a0u (u, q, p)(δu, z) = 0 ∀δu ∈ V, α(q − q, δq − q) − a0q (u, q, p)(δq − q, z) ≥ 0 ∀δq ∈ Qad , f (δz) − a(u, q, p)(δz) = 0 ∀δz ∈ V. (2.5) (2.6) (2.7) For given u, q, z, p, we introduce an additional Lagrange multiplier µ ∈ L2 (ω) by the following identification: (µ, δq) := −L0q (u, q, z, p)(δq) = −α(q − q, δq) + a0q (u, q, p)(δq, z) ∀δq ∈ L2 (ω). The variational inequality (2.6) is known to be equivalent to the following pointwise conditions almost everywhere on ω : q(x) = b− (x) ⇒ µ ≤ 0, (2.8) q(x) = b+ (x) ⇒ µ ≥ 0, (2.9) b− (x) < q(x) < b+ (x) ⇒ µ = 0. (2.10) In addition to the necessary conditions above, in the following lemma we recall second order sufficient optimality conditions: Lemma 2.2 (Sufficient optimality conditions). Let x = (u, q, z) satisfy the first order necessary conditions (2.2)–(2.4) of (OP(p)). Moreover, let a0u (u, q, p) : V → V 0 be surjective. If there exists ρ > 0 such that δu δq L00uu (x, p) L00uq (x, p) δu ≥ ρ kδuk2V + kδqk2Q 00 00 Lqu (x, p) Lqq (x, p) δq holds for all (δu, δq) satisfying the linear (tangent) PDE a0u (u, q, p)(δu, ϕ) + a0q (u, q, p)(δq, ϕ) = 0 ∀ϕ ∈ V, then (u, q) is a strict local optimal solution of (OP(p)). For the proof we refer to [18]. For the solution of the first order necessary conditions (2.5)–(2.7) for fixed p ∈ P, we employ a nonlinear primal-dual active set strategy, see [4, 12, 15, 20]. In the following we sketch the corresponding algorithm on the continuous level: 8. Numerical Sensitivity Analysis for the Quantity of Interest 151 Nonlinear primal-dual active set strategy (1) Choose initial guess u0 , q 0 , z 0 , µ0 and c > 0 and set n = 1 (2) While not converged (3) Determine the active sets An+ and An− An− = {x ∈ ω | q n−1 + µn−1 /c − b− ≤ 0} An+ = {x ∈ ω | q n−1 + µn−1 /c − b+ ≥ 0} (4) Solve the equality-constrained optimization problem Minimize J(un , p) + α n kq − qk2Q over V × Q 2 subject to (1.1) and to q n (x) = b− (x) on An− q n (x) = b+ (x) on An+ with adjoint variable z n (5) Set µn = −α(q n − q) + a0q (un , q n , p)(·, z n ) (6) Set n = n + 1 and go to 2. Remark 2.3. (1) The initial guess for the Lagrange multiplier µ0 can be taken according to step 5. Another possibility is choosing µ0 = 0 and q 0 ∈ Qad , which leads to solving the optimization problem (step 4) without control constraints in the first iteration. (2) The convergence in step 2 can determined conveniently from agreement of the active sets in two consecutive iterations. Later on, the above algorithm is applied on the discrete level. The concrete discretization schemes are described in Section 4 for each individual example. Clearly, the main step in the primal-dual algorithm is the solution of the equalityconstrained nonlinear optimization problem in step 4. We shall describe the Lagrange Newton SQP method for its solution in some detail since exactly the same procedure may be used to solve the sensitivity problems in Section 3, which are the main focus of our paper. 
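For a finite-dimensional quadratic model problem with simple bounds, the nonlinear strategy above reduces to a short loop of linear algebra. The following Python sketch is an illustration only (synthetic data, not the discretization used in the paper); it mirrors steps (3)-(5): the active sets are predicted from q + μ/c, the bound values are prescribed there, the remaining components are obtained from an equality-constrained solve, the multiplier is updated, and the iteration stops once the active sets repeat (compare Remark 2.3).

import numpy as np

def pdas_box_qp(H, f, lo, hi, c=1.0, max_iter=50):
    """Primal-dual active set iteration for  min 1/2 q'Hq - f'q  s.t.  lo <= q <= hi
    (finite-dimensional analogue of the strategy above; H symmetric positive definite)."""
    n = len(f)
    q = np.clip(np.zeros(n), lo, hi)           # feasible initial guess, mu0 = 0 (Remark 2.3)
    mu = np.zeros(n)
    for it in range(max_iter):
        Am = q + mu / c - lo <= 0.0            # predicted lower active set (step 3)
        Ap = q + mu / c - hi >= 0.0            # predicted upper active set (step 3)
        if it > 0 and np.array_equal(Am, Am_old) and np.array_equal(Ap, Ap_old):
            break                              # active sets repeat: (q, mu) is optimal
        I = ~(Am | Ap)                         # inactive set
        q = np.where(Am, lo, np.where(Ap, hi, 0.0))   # prescribe bound values (step 4)
        if I.any():                            # equality-constrained solve on the inactive set
            q[I] = np.linalg.solve(H[np.ix_(I, I)], f[I] - H[np.ix_(I, ~I)] @ q[~I])
        mu = f - H @ q                         # multiplier update (step 5): mu = -(Hq - f)
        mu[I] = 0.0
        Am_old, Ap_old = Am, Ap
    return q, mu

# small synthetic test problem (illustrative data only)
rng = np.random.default_rng(1)
n = 50
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)
f = rng.standard_normal(n) * n
lo, hi = -0.2 * np.ones(n), 0.3 * np.ones(n)
q, mu = pdas_box_qp(H, f, lo, hi)
print(np.all(q >= lo - 1e-10), np.all(q <= hi + 1e-10))   # computed control is feasible

The sign conventions follow (2.8)-(2.10): the multiplier is non-positive on the lower active set and non-negative on the upper active set, so an index with a wrongly signed multiplier is released again in the next prediction step.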
For given active and inactive sets A = A+ ∪ A− and I = ω \ A, let us define the ”restriction” operator RI : L2 (ω) → L2 (ω) by RI (q) = q · χI , where χI is a characteristic function of the set I. Similarly, the operators RA , RA+ and RA− are defined. Note that RI etc. are obviously self-adjoint. 152 Numerical Methods and Applications The first order necessary conditions for the purely equality-constrained problem in step 4 are (compare (2.2)–(2.4), respectively (2.5)–(2.7)): L0u (u, q, z, p)(δu) L0q (u, q, z, p)(δq) q − b− q − b+ L0z (u, q, z, p)(δz) = 0 ∀δu ∈ V, = 2 0 ∀δq ∈ L (I ), (2.12) An− An+ (2.13) (2.14) = 0 on = 0 on = (2.11) n 0 ∀δz ∈ V, (2.15) with the inactive set I n = ω\(An− ∪An+ ). Using the restriction operators, (2.12)–(2.14) can be reformulated as L0q (u, q, z, p)(RI n δq) + (q − b− , RAn− δq) + (q − b+ , RAn+ δq) = 0 ∀δq ∈ Q. The Lagrange Newton SQP method is defined as Newton’s method, applied to (2.11)– (2.15). To this end, we define B as the Hessian operator of the Lagrangian L, i.e. 00 Luu (x, p)(·, ·) L00uq (x, p)(·, ·) L00uz (x, p)(·, ·) (2.16) B(x, p) = L00qu (x, p)(·, ·) L00qq (x, p)(·, ·) L00qz (x, p)(·, ·) L00zu (x, p)(·, ·) L00zq (x, p)(·, ·) 0 To shorten the notation, we abbreviate x = (u, q, z) and X = V × Q × V. Note that B(x, p) is a bilinear operator on the space X . By ”multiplication” of B with an element δx ∈ X from the left, we mean the insertion of the components of δx into the first argument. Similarly we define the ”multiplication” of B with an element δx ∈ X from the right as insertion of the components of δx into the second argument. When only one element is inserted, B is interpreted as a linear operator B : X → X 0 . In the sequel, we shall omit the (·, ·) notation if no ambiguity arises. In the absence of control constraints, the Newton update (∆u, ∆q, ∆z) for (2.11)– (2.15) at the current iterate (uk , qk , zk ) is given by the solution of 0 Lu (xk , p) ∆u B(xk , p) ∆q = − L0q (xk , p) . (2.17) ∆z L0z (xk , p) With non-empty active sets An− and An+ , however, (2.17) is replaced by L0u (xk , p) ∆u 0 e k , p) ∆q = − RI n Lq (xk , p) + RAn (qk − b− ) + RAn (qk − b+ ) B(x − + ∆z L0z (xk , p) where e k , p) = B(x id RI n id B(xk , p) id RI n id + 0 (2.18) RAn . 0 (2.19) e is obtained from B by replacing those components in the derivatives In other words, B with respect to the control q by the identity which belong to the active set. In our practical realization, we reduce the system (2.18) to the control space L2 (ω) using Schur complement techniques, see, e.g., [16]. The reduced system is solved iteratively using the conjugate gradient method, where each step requires the evaluation of a matrix–vector product for the reduced Hessian, which in turn requires the solution of one tangent and one dual problem, see, e.g., [13], or [2] for a detailed description of this procedure in the context of space-time finite element discretization of the problem. In fact, the reduced system needs to be solved only on the currently inactive part L2 (I n ) of the control space since on the active sets, the update ∆q satisfies the trivial relation RAn± (∆q) = RAn± (b± − qk−1 ). 8. Numerical Sensitivity Analysis for the Quantity of Interest 153 The Newton step is completed by applying the update (uk+1 , qk+1 , zk+1 ) = (uk , qk , zk )+ (∆u, ∆q, ∆z). 3. Sensitivity analysis In this section we analyze the behavior of local optimal solutions for (OP(p)) under perturbations of the parameter p. 
We derive formulas for the first and second order derivatives of the reduced quantity of interest and develop an efficient method for their evaluation. To set the stage, we outline the main ideas in Section 3.1 by means of a finitedimensional optimization problem, without partitioning the optimization variables into states and controls, and in the absence of control constraints. To facilitate the discussion of the infinite-dimensional case, we treat the case of no control constraints in Section 3.2 and turn to problems with these constraints in Section 3.3. Throughout, we compare the standard sensitivity analysis for the reduced cost functional j(p) (1.5) with our analysis for the reduced quantity of interest i(p) (1.4). The main results can be found in Theorems 3.6 for the unconstrained case and Theorems 3.18 and 3.21 for the case with control constraints. An algorithm at the end of Section 3 summarizes the necessary steps to evaluate the various sensitivity quantities. 3.1. Outline of ideas. Let us consider the nonlinear finite-dimensional equalityconstrained optimization problem Minimize J(x, p) s.t. g(x, p) = 0, (3.1) where x ∈ Rn denotes the optimization variable, p ∈ Rd is the perturbation parameter, and g : Rn × Rd → Rm collects a number of equality constraints. The Lagrangian of (3.1) is L(x, p) = J(x, p) − z > g(x, p), and under standard constraint qualifications, a local minimizer x0 of (3.1) at the reference parameter p0 has an associated Lagrange multiplier z0 ∈ Rm such that L0x (x0 , z0 , p0 ) = Jx0 (x0 , p0 ) − z0> gx0 (x0 , p0 ) = 0 L0z (x0 , z0 , p0 ) = g(x0 , p0 ) = 0 (3.2) holds. If we assume second order sufficient conditions to hold in addition, then the implicit function theorem yields the local existence of functions X(p) and Z(p) which satisfy (3.2) with p instead of p0 , and X(p0 ) = x0 and Z(p0 ) = z0 hold. Moreover, (3.2) can be differentiated totally with respect to the parameter and we obtain 00 0 00 Lxp (x0 , z0 , p0 ) δp X (p0 ) δp Lxx (x0 , z0 , p0 ) gx0 (x0 , p0 )> . (3.3) = − gp0 (x0 , p0 ) δp Z 0 (p0 ) δp gx0 (x0 , p0 ) 0 The solution of (3.3) is a directional derivative of X(p) (and Z(p)) at p = p0 , and we note that it is equivalent to the solution of a linear-quadratic optimization problem. Hence the evaluation of the full Jacobian X 0 (p0 ) requires d = dim P solves of (3.3) with different δp. In our context of large-scale problems, iterative solvers need to be used and the numerical effort to evaluate the full Jacobian scales linearly with the number of right hand sides, i.e., with the dimension of the parameter space d = dim P. We adapt the definition of the reduced cost functional and the reduced quantity of interest to our current setting, j(p) = J(X(p), p) and i(p) = I(X(p), p). Since we wish to compare the effort to compute the first and second order derivatives of both, we begin by recalling the following result: 154 Numerical Methods and Applications Lemma 3.1. Under the conditions above, the reduced cost functional is twice differentiable and j 0 (p0 ) δp = L0p (x0 , z0 , p0 ) δp b + L00 (x0 , z0 , p0 )Z 0 (p0 ) δp b b = δp> L00 (x0 , z0 , p0 )X 0 (p0 ) δp δp> j 00 (p0 ) δp px pz b . + L00pp (x0 , z0 , p0 ) δp Proof. We have j(p) = L(X(p), Z(p), p) and hence by the chain rule j 0 (p0 ) = L0x (x0 , z0 , p0 )X 0 (p0 ) + L0z (x0 , z0 , p0 )Z 0 (p0 ) + L0p (x0 , z0 , p0 ), where the first two terms vanish in view of (3.2). Differentiating again totally with respect to p yields the expression for the second derivative. 
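To make the finite-dimensional outline concrete, the following Python example (synthetic data, purely illustrative) solves a small linear-quadratic instance of (3.1), obtains the directional sensitivities by differentiating the KKT conditions totally with respect to p (the analogue of (3.3), up to the sign convention for the multiplier block induced by L = J − zᵀg), and checks Lemma 3.1 numerically: j′(p0)δp agrees with L′p(x0, z0, p0)δp without any sensitivity solve.

import numpy as np

rng = np.random.default_rng(0)
n, m, d = 6, 2, 3                                   # sizes of x, g and p (toy example)
B  = rng.standard_normal((m, n))                    # g(x, p) = B x - c(p)
a0, Aa = rng.standard_normal(n), rng.standard_normal((n, d))   # a(p) = a0 + Aa p
c0, C  = rng.standard_normal(m), rng.standard_normal((m, d))   # c(p) = c0 + C  p

K = np.block([[np.eye(n), -B.T], [B, np.zeros((m, m))]])       # KKT matrix for L = J - z'g

def solve_kkt(p):
    """Nominal problem  min 1/2 ||x - a(p)||^2  s.t.  B x = c(p), solved via its KKT system."""
    sol = np.linalg.solve(K, np.concatenate([a0 + Aa @ p, c0 + C @ p]))
    return sol[:n], sol[n:]                         # x(p), z(p)

p0, dp = rng.standard_normal(d), rng.standard_normal(d)
x0, z0 = solve_kkt(p0)

# sensitivity system: total derivative of the KKT conditions w.r.t. p (analogue of (3.3))
Fp_dp = np.concatenate([-Aa @ dp, -C @ dp])         # partial p-derivative of the KKT residual
dx, dz = np.split(np.linalg.solve(K, -Fp_dp), [n])

h = 1e-6
xh, _ = solve_kkt(p0 + h * dp)
print(np.max(np.abs(dx - (xh - x0) / h)))           # X'(p0)dp matches a finite difference

# Lemma 3.1: j'(p0)dp = L_p(x0, z0, p0)dp -- no sensitivity solve is needed for the gradient
def j(p):
    x, _ = solve_kkt(p)
    return 0.5 * np.linalg.norm(x - (a0 + Aa @ p)) ** 2

Lp_dp = -(x0 - (a0 + Aa @ p0)) @ (Aa @ dp) + z0 @ (C @ dp)
print(Lp_dp, (j(p0 + h * dp) - j(p0)) / h)          # the two values agree up to O(h)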
Lemma 3.1 shows that the evaluation of the gradient of j(·) does not require any linear solves of the sensitivity system (3.3), while the evaluation of the Hessian requires d = dim P such solves. The corresponding results for the infinite-dimensional case can be found below in Propositions 3.5 and 3.16 for the unconstrained and control constrained cases. We will show now that the derivatives of the reduced quantity of interest i(·) can be evaluated efficiently, requiring just one additional system solve. This is a significant improvement over a direct approach, compare Table 3.1. From a first look at i0 (p0 ) δp = Ix0 (x0 , p0 )X 0 (p0 ) δp + Ip0 (x0 , p0 ) δp it seems that the evaluation of the gradient i0 (p0 ) requires d = dim P solves of the system (3.3). This is referred to as the direct approach in Table 3.1. However, using (3.3), we may rewrite this as L00xp (x0 , z0 , p0 ) δp i0 (p0 ) δp = − Ix0 (x0 , p0 ), 0 B0−1 + Ip0 (x0 , p0 ) δp, gp0 (x0 , p0 ) δp where B0 is the matrix on the left hand side of (3.3). Realizing that Ix0 (x0 , p0 ) has just one row, evaluating the term in square brackets amounts to only one linear system solve. We define the dual quantities (v, y) by 0 v I (x , p ) B0> =− x 0 0 y 0 and finally obtain i0 (p0 ) δp = v > L00xp (x0 , z0 , p0 ) δp + y > L00zp (x0 , z0 , p0 ) δp + Ip0 (x0 , p0 ) δp. (3.4) We refer to this as a dual approach. In our context, B0 is symmetric and hence the computation of the dual quantities requires just one solve of (3.3) with a modified right hand side, see again Table 3.1. For the second derivative, we differentiate (3.4) totally with respect to p. From the chain rule we infer that the sensitivities X 0 (p0 ) and Z 0 (p0 ) now come into play. In addition, v and y need to be differentiated with respect to p, but again a duality technique can be used in order to avoid computing these extra terms. Hence the extra computational cost to evaluate the Hessian of i(·) amounts to d = dim P solves for the evaluation of the sensitivity matrices X 0 (p0 ) and Z 0 (p0 ), see Table 3.1. Details can be found in the proofs of Theorems 3.6 for the unconstrained case and Theorems 3.18 and 3.21 for the case with control constraints. 8. Numerical Sensitivity Analysis for the Quantity of Interest 155 3.2. The case of no control constraints. Throughout this and the following section, we denote by p0 ∈ P a given reference parameter and by x0 = (u0 , q0 , z0 ) a solution to the corresponding first order optimality system (2.11)–(2.15). Moreover, we make the following regularity assumption which we require throughout: Assumption 3.2. Let the derivative a0u (u0 , q0 , p0 ) : V → V 0 be both surjective and injective, so that it possesses a continuous inverse. In the case of no control constraints, i.e., Qad = Q, the first order necessary conditions (2.11)–(2.15) simplify to L0u (u, q, z, p)(δu) = L0q (u, q, z, p)(δq) = 0 ∀δu ∈ V, 0 ∀δq ∈ Q, (3.5) (3.6) L0z (u, q, z, p)(δz) 0 ∀δz ∈ V. (3.7) = The analysis in this subsection is based on the classical implicit function theorem. We denote by B0 = B(x0 , p0 ) the previously defined Hessian operator at the given reference solution. For the results in this section we require that B0 is boundedly invertible. This property follows from the second order sufficient conditions, see for instance [14]: Lemma 3.3. Let the second order sufficient conditions set forth in Lemma 2.2 hold at x0 for OP(p0 ). Then B0 is boundedly invertible. 
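In the finite-dimensional model of Section 3.1, the dual approach (3.4) can be contrasted with the direct approach in a few lines; the invertibility guaranteed by Lemma 3.3 is what makes the single transposed solve well-defined. The sketch below is again synthetic and illustrative only: it uses a linear quantity of interest I(x, p) = wᵀx + vᵀp on the same toy equality-constrained least-squares structure as above, and for brevity the dim P sensitivity solves of the direct approach are carried out as one multiple right-hand-side solve.

import numpy as np

rng = np.random.default_rng(0)
n, m, d = 6, 2, 40                                  # deliberately many parameters
B  = rng.standard_normal((m, n))
a0, Aa = rng.standard_normal(n), rng.standard_normal((n, d))
c0, C  = rng.standard_normal(m), rng.standard_normal((m, d))
w,  v  = rng.standard_normal(n), rng.standard_normal(d)        # I(x, p) = w'x + v'p

K  = np.block([[np.eye(n), -B.T], [B, np.zeros((m, m))]])       # KKT matrix of the toy problem
Fp = np.vstack([-Aa, -C])                                       # p-derivative of the KKT residual

# direct approach: one sensitivity solve per parameter direction
# (done here as a single multi-right-hand-side solve for brevity)
S = np.linalg.solve(K, -Fp)                                     # columns hold X'(p0)e_k, Z'(p0)e_k
grad_direct = S[:n].T @ w + v

# dual approach, cf. (3.4): a single solve with the transposed KKT matrix
vy = np.linalg.solve(K.T, np.concatenate([w, np.zeros(m)]))
grad_dual = -Fp.T @ vy + v

print(np.max(np.abs(grad_direct - grad_dual)))                  # both gradients of i coincide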
The following lemma is a direct application of the implicit function theorem (see [5]) to the first order optimality system (3.5)–(3.7). Lemma 3.4. Let B0 be boundedly invertible. Then there exist neighborhoods N (p0 ) ⊂ P of p0 and N (x0 ) ⊂ X of x0 and a continuously differentiable function (U, Q, Z) : N (p0 ) → N (x0 ) with the following properties: (a) For every p ∈ N (p0 ), (U (p), Q(p), Z(p)) is the unique solution to the system (3.5)–(3.7) in the neighborhood N (x0 ). (b) (U (p0 ), Q(p0 ), Z(p0 )) = (u0 , q0 , z0 ) holds. (c) The derivative of (U, Q, Z) at p0 in the direction δp ∈ P is given by the unique solution of 0 00 Lup (x0 , p0 )(·, δp) U (p0 )(δp) B0 Q0 (p0 )(δp) = − L00qp (x0 , p0 )(·, δp) . (3.8) Z 0 (p0 )(δp) L00zp (x0 , p0 )(·, δp) In the following proposition we recall the first and second order sensitivity derivatives of the cost functional j(p), compare [17]. Proposition 3.5. Let B0 be boundedly invertible. Then the reduced cost functional j(p) = J(U (p), p) + α2 kQ(p) − qk2Q is twice continuously differentiable in N (p0 ). The first order derivative at p0 in the direction δp ∈ P is given by j 0 (p0 )(δp) = L0p (x0 , p0 )(δp). (3.9) b we have For the second order derivative in the directions of δp and δp, b = L00 (x0 , p0 )(U 0 (p)(δp), δp) b + L00 (x0 , p0 )(Q0 (p)(δp), δp) b j 00 (p0 )(δp, δp) up qp b + L00 (x0 , p0 )(δp, δp). b +L00zp (x0 , p0 )(Z 0 (p)(δp), δp) pp Proof. Since (U (p), Q(p)) satisfies the state equation, we have j(p) = L(U (p), Q(p), Z(p), p) (3.10) 156 Numerical Methods and Applications for all p ∈ N (p0 ). By the chain rule, the derivative of j(p) reads j 0 (p0 )(δp) = L0u (x0 , p0 )(U 0 (p0 )(δp)) + L0q (x0 , p0 )(Q0 (p0 )(δp)) + L0z (x0 , p0 )(Z 0 (p0 )(δp)) +L0p (x0 , p0 )(δp). The three terms in the first line vanish in view of the optimality system (3.5)–(3.7). b yields (3.10), Differentiating (3.9) again totally with respect to p in the direction of δp which completes the proof. The previous proposition allows to evaluate the first order derivative of the reduced cost functional without computing the sensitivity derivatives of the state, control and adjoint variables. That is, the effort to evaluate j 0 (p0 ) is negligible compared to the effort required to solve the optimization problem. In order to obtain second order derivative j 00 (p0 ), however, the sensitivity derivatives have to be computed according to formula (3.8). This corresponds to the solution of one additional linear-quadratic optimization problem per perturbation direction δp, whose optimality system is given by (3.8). We now turn to our main result in the absence of control constraints. In the following theorem, we show that the first and second order derivatives of the quantity of interest can be evaluated at practically the same effort as those of the cost functional. To this end, we use a duality technique (see Section 3.1) and formulate the following dual problem for the dual variables v ∈ V, r ∈ Q and y ∈ V: 0 Iu (q0 , u0 , p0 ) v B0 r = − Iq0 (q0 , u0 , p0 ) . y 0 (3.11) We remark that this dual problem involves the same operator matrix B0 as the sensitivity problem (3.8) since B0 is self-adjoint. Theorem 3.6. Let B0 be boundedly invertible. Then the reduced quantity of interest i(p) defined in (1.4) is twice continuously differentiable in N (p0 ). The first order derivative at p0 in the direction δp ∈ P is given by i0 (p0 )(δp) = L00up (x0 , p0 )(v, δp) + L00qp (x0 , p0 )(r, δp) + L00zp (x0 , p0 )(y, δp) + Ip0 (u0 , q0 , p0 )(δp). 
(3.12) b we have For the second order derivative in the directions of δp and δp, b = hv, ηi i00 (p0 )(δp, δp) V×V 0 + hr, κiQ×Q0 + hy, σiV×V 0 0 0 > 00 00 00 b U (p0 )(δp) Iuu (q0 , u0 , p0 ) Iuq (q0 , u0 , p0 ) Iup (q0 , u0 , p0 ) U (p0 )(δp) 00 00 00 b (q0 , u0 , p0 ) Iqq (q0 , u0 , p0 ) Iqp (q0 , u0 , p0 ) Q0 (p0 )(δp) +Q0 (p0 )(δp) Iqu . 00 00 00 b δp Ipu (q0 , u0 , p0 ) Ipq (q0 , u0 , p0 ) Ipp (q0 , u0 , p0 ) δp (3.13) 8. Numerical Sensitivity Analysis for the Quantity of Interest 157 Here, (η, κ, σ) ∈ V 0 × Q0 × V 0 is given by 000 b Lupp ()(·, δp, δp) η b κ = L000 qpp ()(·, δp, δp) b σ L000 zpp ()(·, δp, δp) 0 000 0 000 0 b b b L000 upu ()(·, δp, U (p0 )(δp)) + Lupq ()(·, δp, Q (p0 )(δp) + Lupz ()(·, δp, Z (p0 )(δp) 0 000 0 000 0 b b b + L000 qpu ()(·, δp, U (p0 )(δp)) + Lqpq ()(·, δp, Q (p0 )(δp) + Lqpz ()(·, δp, Z (p0 )(δp) 0 000 0 b b L000 zpu ()(·, δp, U (p0 )(δp)) + Lzpq ()(·, δp, Q (p0 )(δp) U 0 (p0 )(δp) b + B 0 ()(Q0 (p0 )(δp)) b + B 0 ()(Z 0 (p0 )(δp)) b + B 0 ()(δp) b Q0 (p0 )(δp) . + Bu0 ()(U 0 (p0 )(δp)) q z p Z 0 (p0 )(δp) Remark 3.7. (a) In the definition of (η, κ, σ) we have abbreviated the evaluation at the point (x0 , p0 ) by (). (b) The bracket h·, ·iV×V 0 in (3.13) denotes the duality pairing between V and its dual space V 0 . For instance, the evaluation of hv, ηiV×V 0 amounts to plugging in v instead of · in the definition of η. A similar notation is used for the control space Q. (c) It is tedious but straightforward to check that (3.13) coincides with (3.10) if the quantity of interest is chosen equal to the cost functional. In this case, it follows from (3.11) that the dual quantities v and r vanish and y = z0 holds. Proof. (of Theorem 3.6) From the definition of the reduced quantity of interest (1.4), we infer that i0 (p0 )(δp) = Iu0 (u0 , q0 , p0 )(U 0 (p0 )(δp)) + Iq0 (u0 , q0 , p0 )(Q0 (p0 )(δp)) + Ip0 (u0 , q0 , p0 )(δp) (3.14) holds. In virtue of (3.8) and (3.11), the sum of the first two terms equals > 00 0 > 00 Lup (x0 , p0 )(·, δp) Lup (x0 , p0 )(·, δp) Iu (u0 , q0 , p0 ) v −1 − Iq0 (u0 , q0 , p0 ) B0 L00qp (x0 , p0 )(·, δp) = r L00qp (x0 , p0 )(·, δp) y L00zp (x0 , p0 )(·, δp) L00zp (x0 , p0 )(·, δp) 0 which implies (3.12). In order to obtain the second derivative, we differentiate (3.14) b This yields totally with respect to p in the direction of δp. b = i00 (p0 )(δp, δp) 0 0 > 00 00 00 b U (p0 )(δp) Iuu (q0 , u0 , p0 ) Iuq (q0 , u0 , p0 ) Iup (q0 , u0 , p0 ) U (p0 )(δp) 00 00 00 Q0 (p0 )(δp) Iqu b (q0 , u0 , p0 ) Iqq (q0 , u0 , p0 ) Iqp (q0 , u0 , p0 ) Q0 (p0 )(δp) 00 00 00 b δp Ipu (q0 , u0 , p0 ) Ipq (q0 , u0 , p0 ) Ipp (q0 , u0 , p0 ) δp 0 > 00 b U (p0 )(δp, δp) Iu (u0 , q0 , p0 ) 00 0 b + Iq (u0 , q0 , p0 ) Q (p0 )(δp, δp) . (3.15) b 0 Z 00 (p0 )(δp, δp) b we obtain From differentiating (3.8) totally with respect to p in the direction of δp, b U 00 (p0 )(δp, δp) η b B0 Q00 (p0 )(δp, δp) (3.16) = − κ . b σ Z 00 (p0 )(δp, δp) From here, (3.13) follows. The main statement of the previous theorem is that the first and second order derivatives of the reduced quantity of interest can be evaluated at the additional 158 Numerical Methods and Applications Table 3.1. Number of linear-quadratic problems to be solved to evaluate the derivatives of j(p) and i(p). reduced cost functional j(p) gradient Hessian reduced quantity of interest i(p) dual approach direct approach 0 dim P 1 1 + dim P dim P (dim P) (dim P+1)/2 expense of just one dual problem (3.11), compared to the evaluation of the reduced cost functional’s derivatives. 
More precisely, computing the gradient of i(p) at p0 requires only the solution of (3.11). In addition, in order to compute the Hessian of i(p) at p0 , the sensitivity quantities U 0 (p0 ), Q0 (p0 ) and Z 0 (p0 ) need to be evaluated in the directions of a collection of basis vectors of the parameter space P. That is, dim P sensitivity problems (3.8) need to be solved. These are exactly the same problems which have to be solved for the computation of the Hessian of the reduced cost functional, see Table 3.1. Note that in the combined effort 1 + dim P, ”1” refers to the same dual problem (3.11) that has already been solved during the computation of the gradient of i(p). In case that the space P is infinite-dimensional, it needs to be discretized first. Finally, in order to evaluate the second order Taylor expansion for a given direction δp, 1 i(p0 + δp) ≈ i(p0 ) + i0 (p0 )(δp) + i00 (p0 )(δp, δp), 2 the same dual problem (3.11) and one sensitivity problem (3.8) in the direction of δp are needed, see Table 3.1. Note that the sensitivity and dual problems (3.8) and (3.11), respectively, are solved by the technique described in Section 2. The solution of such problem amounts to the computation of one additional QP step (2.17), with different right hand side. Therefore, the numerical effort to compute, e.g., the second order Taylor expansion for a given direction is typically low compared to the solution of the nonlinear optimization problem OP(p0 ) 3.3. The control-constrained case. The analysis is based on the notion of strong regularity for the problem OP(p). Strong regularity extends the previous assumption of bounded invertibility of B0 used throughout Section 3.2. Below, we make use of µ0 ∈ Q given by the following identification: (µ0 , δq) = −L0q (x0 , p0 )(δq) ∀δq ∈ Q. (3.17) This quantity acts as a Lagrange multiplier for the control constraint q ∈ Qad . For the definition of strong regularity we introduce the following linearized optimality system which depends on ε = (εu , εq , εz ) ∈ V × Q × V: (LOS(ε)) L00uu (x0 , p0 )(δu, u − u0 ) + L00uq (x0 , p0 )(δu, q − q0 ) +L00uz (x0 , p0 )(δu, z − z0 ) + L0u (x0 , p0 )(δu) + (εu , δu)V = 0 ∀δu ∈ V L00uq (x0 , p0 )(u − u0 , δq − q) + L00qq (x0 , p0 )(δq − q, q − q0 ) 0 +Lq (x0 , p0 )(δq − q) + (εq , δq − q) L00zu (x0 , p0 )(δz, u − u0 ) + L00zq (x0 , p0 )(δz, q − q0 ) +L0z (x0 , p0 )(δz) + (εz , δz)V In the sequel, we refer to (3.18)–(3.20) as (LOS(ε)). + L00qz (x0 , p0 )(δq ≥0 ∀δq ∈ Qad = 0 ∀δz ∈ V (3.18) − q, z − z0 ) (3.19) (3.20) 8. Numerical Sensitivity Analysis for the Quantity of Interest 159 Definition 3.8 (Strong Regularity). Let p0 ∈ P be a given reference parameter and let x0 = (u0 , q0 , z0 ) be a solution to the corresponding first order optimality system (2.5)– (2.7). If there exist neighborhoods N (0) ⊂ X = V × Q × V of 0 and N (x0 ) ⊂ X of x0 such that the following conditions hold: (a) For every ε ∈ N (0), there exists a solution (uε , q ε , z ε ) to the linearized optimality system (3.18)–(3.20). (b) (uε , q ε , z ε ) is the unique solution of (3.18)–(3.20) in N (x0 ). (c) (uε , q ε , z ε ) depends Lipschitz-continuously on ε, i.e., there exists L > 0 such that kuε1 − uε2 kV + kq ε1 − q ε2 kQ + kz ε1 − z ε2 kV ≤ L kε1 − ε2 kX (3.21) holds for all ε1 , ε2 ∈ N (0), then the first order optimality system (2.5)–(2.7) is called strongly regular at x0 . Note that (u0 , q0 , z0 ) solves (3.18)–(3.20) for ε = 0. 
It is not difficult to see that in the case of no control constraints, i.e., Q = Qad , strong regularity is nothing else than bounded invertibility of B0 which we had to assume in Section 3.2. In the following lemma we show that strong regularity holds under suitable second order sufficient optimality conditions, in analogy to Lemma 3.3. The proof can be carried out using the techniques presented in [21]. Lemma 3.9. Let the second order sufficient optimality conditions set forth in Lemma 2.2 hold at x0 for OP(p0 ). Then for any ε ∈ X , (3.18)–(3.20) has a unique solution (uε , q ε , z ε ) and the map X 3 ε 7→ (uε , q ε , z ε ) ∈ X (3.22) is Lipschitz continuous. That is, the optimality system is strongly regular at x0 . In the next step, we proceed to prove that the solution (uε , q ε , z ε ) of the linearized optimality system (3.18)–(3.20) is directionally differentiable with respect to the perturbation ε. To this end, we need the following assumption: Assumption 3.10. At the reference point (u0 , q0 , z0 ), let the following linear operators be compact: (1) V 3 u 7→ a00qu (u0 , q0 , p0 )(·, u, z0 ) ∈ Q0 (2) Q 3 q 7→ a00qq (u0 , q0 , p0 )(·, q, z0 ) ∈ Q0 (3) V 3 z 7→ a0q (u0 , q0 , p0 )(·, z) ∈ Q0 Remark 3.11. The previous assumption is satisfied for the following important classes of PDE-constrained optimization problems on bounded domains Ω ⊂ Rd , d ∈ {1, 2, 3}: (a) If (OP(p)) is a distributed optimal control problem for a semilinear elliptic PDE, e.g., −∆u = f (u) + q on Ω with V = H01 (Ω) and Q = L2 (Ω), then a00qu = a00qq = 0 and a0q is the compact injection of V into Q. (b) In the case of Neumann boundary control on ∂Ω, e.g., −∆u = f (u) on Ω and ∂ u = q on ∂Ω, ∂n we have V = H 1 (Ω) and Q = L2 (∂Ω). Again, a00qu = a00qq = 0 and a0q is the compact Dirichlet trace operator from V to Q. 160 Numerical Methods and Applications (c) For bilinear control problems, e.g., −∆u = qu + f on Ω with V = H01 (Ω), Q = L2 (Ω) and an appropriate admissible set Qad , we have a00qq = 0. Moreover, the operators u 7→ a00qu (u0 , q0 , p0 )(·, u, z0 ) = (uz0 , ·) and z 7→ a0q (u0 , q0 , z0 ) = (u0 z, ·) are compact from V to Q0 since the pointwise product of two functions in V embeds compactly into Q. (d) For parabolic equations such as ut = ∆u + f (u) + q with solutions in V = {u ∈ L2 (0, T ; H01 (Ω)) : ut ∈ L2 (0, T ; H −1 (Ω)} we have a00qu = a00qq = 0 and a0q is the compact injection of V into Q = L2 (Ω × (0, T )). (e) Finally, Assumption 3.10 is always satisfied if the space Q is finite-dimensional. This includes all cases of parameter identification problems without any additional restrictions on the coupling between the parameters q and the state variable u. For instance, the Arrhenius law leads to reaction-diffusion equations of the form −∆u = f (u) + equ on Ω with unknown Arrhenius parameter q ∈ R. bad , defined as For the following theorem, we introduce the admissible set Q bad = {q̂ ∈ Q : bb− (x) ≤ q̂(x) ≤ bb+ (x) a.e. on ω} Q with bounds bb− (x) bb+ (x) ( 0 if µ0 (x) 6= 0 or q0 (x) = b− (x) = −∞ else ( 0 if µ0 (x) 6= 0 or q0 (x) = b+ (x) = +∞ else. Theorem 3.12. Let the second order sufficient optimality conditions set forth in Lemma 2.2 hold at x0 for OP(p0 ) in addition to Assumption 3.10. Then the map (3.22) is directionally differentiable at ε = 0 in every direction δε = (δεu , δεq , δεz ) ∈ X . 
The directional derivative is given by the unique solution (û, q̂) and adjoint variable ẑ of the following linear-quadratic optimal control problem, termed DQP(δε): L00uu (x0 , p0 ) L00uq (x0 , p0 ) 1 û û q̂ + (û, δεu )V + (q̂, δεq ) Minimize L00qu (x0 , p0 ) L00qq (x0 , p0 ) q̂ 2 (DQP(δε)) bad and subject to q̂ ∈ Q a0u (u0 , q0 , p0 )(û, φ) + a0q (u0 , q0 , p0 )(q̂, φ) + (δεz , φ) = 0 for all φ ∈ V. The first order optimality conditions for this problem read: L00uu (x0 , p0 )(δu, û) + L00uq (x0 , p0 )(δu, q̂) +L00uz (x0 , p0 )(δu, ẑ) + (δεu , δu) = 0 ∀δu ∈ V (3.23) ∀δq ∈ Q̂ad (3.24) ∀δz ∈ V. (3.25) L00uq (x0 , p0 )(û, δq − q̂) + L00qq (x0 , p0 )(δq − q̂, q̂) +L00qz (x0 , p0 )(δq − q̂, ẑ) + (δεq , δq − q̂) ≥ 0 L00zu (x0 , p0 )(δz, û) + L00zq (x0 , p0 )(δz, q̂) +(δεz , δz) = 0 8. Numerical Sensitivity Analysis for the Quantity of Interest 161 Proof. Let δε = (δεu , δεq , δεz ) ∈ X be given and let {τn } ⊂ R+ denote a sequence converging to zero. We denote by (un , qn , zn ) ∈ X the unique solution of LOS(εn ) where εn = τn δε. Note that (u0 , q0 , z0 ) is the unique solution of LOS(0) and that (un , qn , zn ) → (u0 , q0 , z0 ) strongly in X . From Lemma 3.9 we infer that un − u0 + qn − q0 + zn − z0 ≤ L kδεkX . τn τn τn V Q V This implies that a subsequence (still denoted by index n) of the difference quotients converges weakly to some limit element (û, q̂, ẑ) ∈ X . The proof proceeds with the construction of the pointwise limit qe of (qn − q0 )/τn , which is later shown to coincide with q̂. It is well known that the variational inequality (3.19) in LOS(εn ) can be equivalently rewritten as qn (x) = Π[b− (x),b+ (x)] dn (x) a.e. on ω, (3.26) where Π[b− (x),b+ (x)] is the projection onto the interval [b− (x), b+ (x)] and 1 00 a (u0 , q0 , p0 )(·, un − u0 , z0 ) + a00qq (u0 , q0 , p0 )(·, qn − q0 , z0 ) dn = q̄ + α qu +a0q (u0 , q0 , p0 )(·, zn ) − εqn ∈ Q. (3.27) The linear operators in (3.27) are understood as their Riesz representations in Q. Similarly, we have q0 (x) = Π[b− (x),b+ (x)] d0 (x) a.e. on ω, where 1 0 a (u0 , q0 , p0 )(·, z0 ) ∈ Q. (3.28) α q Note that dn → d0 strongly in Q since the Fréchet derivatives in (3.27) are bounded linear operators. From the compactness properties in Assumption 3.10 we infer that d0 = q̄ + dn − d0 → dˆ strongly in Q, τn where 1 00 dˆ = aqu (u0 , q0 , p0 )(·, û, z0 ) + a00qq (u0 , q0 , p0 )(·, q̂, z0 ) + a0q (u0 , q0 , p0 )(·, ẑ) − δεq . α By taking another subsequence, we obtain that dn → d0 and (dn − d0 )/τn → dˆ hold also pointwise a.e. on ω. The construction of the pointwise limit qn (x) − q0 (x) τn uses the following partition of ω into five disjoint subsets: qe(x) = lim n→∞ ω = ω I ∪ ω0+ ∪ (ω + \ ω0+ ) ∪ ω0− ∪ (ω − \ ω0− ) (3.29) where ω I = {x ∈ ω : b− (x) < q0 (x) < b+ (x)} (inactive) (3.30a) ω0+ + = {x ∈ ω : µ0 (x) > 0} (upper strongly active) (3.30b) ω = {x ∈ ω : q0 (x) = b+ (x)} (upper active) (3.30c) ω0− − = {x ∈ ω : µ0 (x) < 0} (lower strongly active) (3.30d) (lower active). (3.30e) ω = {x ∈ ω : q0 (x) = b− (x)} The Lagrange multiplier µ0 belonging to the constraint q0 ∈ Qad defined in (3.17) allows the following representation: µ0 = α(d0 − q0 ). (3.31) Note that the five sets in (3.29) are guaranteed to be disjoint if b− (x) < b+ (x) holds a.e. on ω. However, one can easily check that qe is well-defined also in the case that 162 Numerical Methods and Applications the bounds coincide on all or part of ω. 
We now distinguish 5 cases according to the sets in (3.29): Case 1: For almost every x in the inactive subset ω I , we have q0 (x) = d0 (x) and qn (x) = dn (x) for all sufficiently large n. Therefore, qe(x) = lim n→∞ qn (x) − q0 (x) ˆ = d(x). τn Case 2: For almost every x ∈ ω0+ , µ0 (x) > 0 implies d0 (x) > q0 (x) by (3.31). Therefore, q0 (x) = b+ (x) and dn (x) > q0 (x) for sufficiently large n. Hence qn = b+ (x) for these n and qe(x) = lim n→∞ qn (x) − q0 (x) = 0. τn Case 3: For almost every x ∈ ω + \ ω0+ , we have q0 (x) = b+ (x) = d0 (x). ˆ (a) If d(x) > 0, then dn (x) > b+ (x) for sufficiently large n. Therefore, qn (x) = b+ (x) for these n and hence qe(x) = 0. ˆ = 0, then (qn (x)−q0 (x))/τn = min{0, dn (x)−b+ (x)}/τ n for sufficiently (b) If d(x) large n, hence qe(x) = 0. ˆ < 0, then dn (x) < b+ (x) and hence qn (x) = dn (x) for sufficiently large (c) If d(x) ˆ n. Therefore, qe(x) = d(x) holds. Case 3 can be summarized as qe(x) = lim n→∞ qn (x) − q0 (x) ˆ = min{0, d(x)}. τn Case 4: For almost every x ∈ ω0− , we obtain, similarly to Case 2, qe(x) = lim n→∞ qn (x) − q0 (x) = 0. τn Case 5: For almost every x ∈ ω − \ ω0− , we obtain, similarly to Case 3, qe(x) = lim n→∞ qn (x) − q0 (x) ˆ = max{0, d(x)}. τn Summarizing all previous cases, we have shown that ˆ qe(x) = Π[bb− (x),bb+ (x)] (d(x)). (3.32) We proceed by showing that qn − q0 → qe strongly in Q = L2 (ω). τn (3.33) From the Lipschitz continuity of the projection Π, it follows that qn − q0 1 ˆ τn − qe = τn (ΠQad (dn ) − ΠQad (d0 )) − ΠQbad (d) Q Q dn − d0 ˆ ˆ ≤ τn + kdkQ → 2kdkQ . Q From Lebesgue’s Dominated Convergence Theorem, (3.33) follows. Consequently, we have qe = q̂. The projection formula (3.32) is equivalent to the variational inequality (3.24). Using the equations (3.18) and (3.20) for (un , qn , zn ) and for (u0 , q0 , z0 ), we infer that the weak limit (û, q̂, ẑ) satisfies (3.23) and (3.25). It is readily checked that (3.23)–(3.25) are the first order necessary conditions for (DQP(δε)). In view of the second order sufficient optimality conditions (Lemma 2.2), (DQP(δε)) is strictly 8. Numerical Sensitivity Analysis for the Quantity of Interest 163 convex and thus it has a unique solution. In view of Assumption 3.2 and (3.25), we obtain un − u0 qn − q0 − û ≤ C − q̂ τn τn V Q where C is independent of n. Hence û is also the strong limit of the difference quotient in V. The same arguments holds for ẑ. Our whole argument remains valid if in the beginning, we start with an arbitrary subsequence of {τn }. Since the limit (û, q̂, ẑ) is always the same, the convergence extends to the whole sequence. From the previous theorem we derive the following important corollary. The proof follows from a direct application of the implicit function theorem for generalized equations, see [6, Theorem 2.4]. Corollary 3.13. Under the conditions of the previous theorem, there exist neighborhoods N (p0 ) ⊂ P of p0 and N (x0 ) ⊂ X of x0 and a directionally differentiable function (U, Q, Z) : N (p0 ) → N (x0 ) with the following properties: (a) For every p ∈ N (p0 ), (U (p), Q(p), Z(p)) is the unique solution to the system (2.5)–(2.7) in the neighborhood N (x0 ). (b) (U (p0 ), Q(p0 ), Z(p0 )) = (u0 , q0 , z0 ) holds. (c) The directional derivative of (U, Q, Z) at p0 in the direction δp ∈ P is given by the derivative of ε 7→ (uε , q ε , z ε ) at ε = 0 in the direction 00 Lup (x0 , p0 )(·, δp) (3.34) δε = L00qp (x0 , p0 )(·, δp) , L00zp (x0 , p0 )(·, δp) i.e., by the solution and adjoint (û, q̂, ẑ) of DQP(δε). 
We remark that computing the sensitivity derivative of (U, Q, Z) for a given direction δp amounts to solving the linear-quadratic optimal control problem DQP(δε) for δε given by (3.34). Note that this problem, like the original one OP(p0 ), is subject to pointwise inequality constraints for the control variable. Due to the structure bad , the directional derivative of (U, Q, Z) is in general not a of the admissible set Q linear function of the direction δp, but only positively homogeneous. Note however bad is a linear space (which follows from a condition known as if the admissible set Q strict complementarity, see below), then the directional derivative becomes linear in the direction (i.e., it is the Gateaux differential). Definition 3.14 (Strict complementarity). Strict complementarity is said to hold at (x0 , p0 ) if n o x ∈ ω : q0 (x) ∈ {b− (x), b+ (x)} and µ0 (x) = 0 is a set of measure zero. A consequence of the strict complementarity condition is that the sensitivity derivatives are characterized by a linear system of equations set forth in the following e was defined in (2.19) and that RI denotes the multilemma. We recall that B plication of a function in L2 (ω) with the characteristic function of the inactive set ω I = {x ∈ ω : b− (x) < q0 (x) < b+ (x)}, see Section 2. Lemma 3.15. Under the conditions of Theorem 3.12 and if strict complementarity holds at (x0 , p0 ), then the directional derivative of (U, Q, Z) is characterized by the following linear system of equations: 0 00 Lup (x0 , p0 )(·, δp) U (p0 )(δp) e 0 , p0 ) Q0 (p0 )(δp) = − RI L00qp (x0 , p0 )(·, δp) . B(x (3.35) Z 0 (p0 )(δp) L00zp (x0 , p0 )(·, δp) 164 Numerical Methods and Applications e 0 , p0 ) : X → X 0 is boundedly invertible. Moreover, the operator B(x bad defined Proof. In virtue of the strict complementarity property, the admissible set Q in Theorem 3.12 becomes n o bad = q̂ ∈ Q : q̂(x) = 0 where q0 (x) ∈ {b− (x), b+ (x)} . Q Consequently, the variational inequality (3.24) simplifies to the following equation for bad : Q0 (p0 )(δp) ∈ Q L00qu (x0 , p0 )(δq, U 0 (p0 )(δp)) + L00qq (x0 , p0 )(δq, Q0 (p0 )(δp)) bad , + L00qz (x0 , p0 )(δq, Z 0 (p0 )(δp)) = −L00qp (x0 , p0 )(δq, δp) ∀δq ∈ Q which is equivalent to the middle equation in (3.35). The first and third equation in (3.35) coincide with (3.23) and (3.25), which proves the first claim. From Theorem 3.12 e 0 , p0 ) is bijective. Since it a continuous linear operator from we conclude that B(x X → X 0 , so is its inverse. We are now in the position to recall the first and second order sensitivity derivatives of the reduced cost functional j(p), compare again [17]. Note that we do not make use of strict complementarity in the following proposition. Proposition 3.16. Under the conditions of Theorem 3.12, the reduced cost functional α j(p) = J(U (p), p) + kQ(p) − qk2Q 2 is continuously differentiable in N (p0 ). The derivative at p0 in the direction δp ∈ P is given by j 0 (p)(δp) = L0p (x0 , p0 )(δp). (3.36) Additionally, the second order directional derivatives of the reduced cost function j exist, and are given by the following formula: b = L00 (x0 , p0 )(U 0 (p0 )(δp), δp) b + L00 (x0 , p0 )(Q0 (p0 )(δp), δp) b j 00 (p0 )(δp, δp) up qp b + L00 (x0 , p0 )(δp, δp). b + L00zp (x0 , p0 )(Z 0 (p0 )(δp), δp) (3.37) pp Proof. As in the unconstrained case there holds: j 0 (p0 )(δp) = L0u (x0 , p0 )(U 0 (p0 )(δp)) + L0q (x0 , p0 )(Q0 (p0 )(δp)) + L0z (x0 , p0 )(Z 0 (p0 )(δp)) + L0p (x0 , p0 )(δp). and the terms L0u and L0z vanish. 
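The role of strict complementarity and of the restriction operator R_I in Lemma 3.15 can be illustrated on a small finite-dimensional analogue. The quadratic program below is not the operator equation (3.35) of the paper; it is a sketch of the same mechanism: under strict complementarity the sensitivity vanishes on the active set and is obtained from a linear system restricted to the inactive set.

```python
import numpy as np

# Finite-dimensional analogue of Lemma 3.15: q(p) = argmin 0.5 q'Hq + q'(c + p*g)
# subject to q >= 0, with a scalar parameter p.  All data are made up.
n = 6
H = 6.0 * np.eye(n) + np.ones((n, n))      # symmetric positive definite
c = np.array([-6.0, -5.0, -4.0, 4.0, 5.0, 6.0])
g = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])

def solve_qp(p):
    """Projected gradient iteration for the bound-constrained QP (illustrative only)."""
    q = np.zeros(n)
    step = 1.0 / np.linalg.norm(H, 2)
    for _ in range(2000):
        q = np.maximum(0.0, q - step * (H @ q + c + p * g))
    return q

p0, tau = 0.0, 1e-5
q0 = solve_qp(p0)
mu0 = H @ q0 + c + p0 * g                  # multiplier of q >= 0: zero where q0 > 0, positive where q0 = 0
inactive = q0 > 1e-8

# Analogue of (3.35): restrict to the inactive set and solve a linear system.
dq = np.zeros(n)
dq[inactive] = np.linalg.solve(H[np.ix_(inactive, inactive)], -g[inactive])

# Compare with a difference quotient of the solution map p -> q(p).
dq_fd = (solve_qp(p0 + tau) - q0) / tau
print(np.max(np.abs(dq - dq_fd)))          # expected: of the order of the solver tolerance
```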
Moreover, L0q (x0 , p0 )(Q0 (p0 )(δp)) = −(µ0 , Q0 (p0 )(δp) = 0 since Q0 (p0 )(δp) is zero on the strongly active set and µ0 vanishes on its complement. The formula for the second order derivative follows as in Proposition 3.5 by total directional differentiation of the first order formula. Remark 3.17. We note that the expressions for the first and second order derivatives in Proposition 3.16 are the same as in the unconstrained case, see Proposition 3.5. We now turn to our main result in the control-constrained case, concerning the differentiability and efficient evaluation of the sensitivity derivatives for the reduced quantity of interest (1.4). We recall that in the unconstrained case, we have made use of a duality argument for the efficient computation of the first and second order derivatives, see Section 3.2. However, in the presence of control constraints, this technique seems to be applicable only in the case of strict complementarity since otherwise, the derivatives (U 0 (p0 )(δp), ξ 0 (p0 )(δp), Z 0 (p0 )(δp)) do not depend linearly 8. Numerical Sensitivity Analysis for the Quantity of Interest 165 on the direction δp. In analogy to (3.11) and (3.35), we define the dual quantities (e v , re, ye) ∈ X by 0 Iu (q0 , u0 , p0 ) ve e 0 , p0 ) re = − RI Iq0 (q0 , u0 , p0 ) . B(x (3.38) ye 0 Theorem 3.18. Under the conditions of Theorem 3.12, the reduced quantity of interest i(p) is directionally differentiable at the reference parameter p0 . If in addition, strict complementarity holds at (x0 , p0 ), then the first order directional derivative at p0 in the direction δp ∈ P is given by i0 (p0 )(δp) = L00up (x0 , p0 )(e v , δp) + L00qp (x0 , p0 )(RI re, δp) + L00zp (x0 , p0 )(e y , δp) + Ip0 (u0 , q0 , p0 )(δp). (3.39) Proof. The proof is carried out similar to the proof of Theorem 3.6 using Lemma 3.15. Our next goal is to consider second order derivatives of the reduced quantity of interest. In order to apply the approach used in the unconstrained case, we rely on the existence of second order directional derivatives of (U, Q, Z) at p0 . However, these second order derivatives do not exist without further assumptions, as seen from the following simple consideration: Suppose that near a given reference parameter p0 = 0, the local optimal control is given by Q(p)(x) = max{0, x + p} ∈ L2 (ω) for x ∈ ω = (−1, 1) and p ∈ R. (An appropriate optimal control problem (OP(p)) can be easily constructed.) Then Q0 (p)(x) = H(x + p) (the Heaviside function), which is not directionally differentiable with respect to p and values in L2 (ω). Note that the point x = −p of discontinuity marks the boundary between the active and inactive sets of (OP(p)). Hence we conclude that the reason for the non-existence of the second order directional derivatives of Q lies in the change of the active set with p. The preceding argument leads to the following assumption: Assumption 3.19. There exists a neighborhood N (p0 ) ⊂ P of the reference parameter p0 such that for every p ∈ N (p0 ), strict complementarity holds at the solution (U (p), Q(p), Z(p)), and the active sets coincide with those of (u0 , q0 , z0 ). Remark 3.20. The previous assumption seems difficult to satisfy in the general case. However, if the control variable is finite-dimensional and strict complementarity is assumed at the reference solution (u0 , q0 , z0 ), then Assumption 3.19 is satisfied since the Lagrange multiplier µ(p) = −L0q (U (p), Q(p), Z(p), p) is continuous with respect to p and has values in Rn . 
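The counterexample preceding Assumption 3.19 can also be checked numerically: the difference quotients of p ↦ Q'(p) = H(· + p) blow up in L2(−1, 1) like τ^(−1/2), so no second directional derivative exists. A short sketch (grid and step sizes chosen arbitrarily):

```python
import numpy as np

# Difference quotients of the Heaviside family H(. + p) in L2(-1, 1).
x = np.linspace(-1.0, 1.0, 200001)
h = x[1] - x[0]
H = lambda s: (s > 0.0).astype(float)     # Heaviside function

p0 = 0.0
for tau in [1e-1, 1e-2, 1e-3]:
    quotient = (H(x + p0 + tau) - H(x + p0)) / tau
    l2_norm = np.sqrt(np.sum(quotient**2) * h)
    print(tau, l2_norm, tau**-0.5)        # the L2 norm of the quotient grows like tau**(-1/2)
```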
We now proceed to our main result concerning second order derivatives of the reduced quantity of interest. In the theorem below, we use again () to denote evaluation at the point (x0 , p0 ). Theorem 3.21. Under the conditions of Theorem 3.12 and Assumption 3.19, the reduced quantity of interest i(p) is twice directionally differentiable at p0 . The second b are given by order directional derivatives in the directions of δp and δp b = he i00 (p0 )(δp, δp) v , ηiV×V 0 + he r, κiQ×Q0 + he y , σiV×V 0 0 0 > 00 00 00 b U (p0 )(δp) Iuu (q0 , u0 , p0 ) Iuq (q0 , u0 , p0 ) Iup (q0 , u0 , p0 ) U (p0 )(δp) 00 00 00 b (q0 , u0 , p0 ) Iqq (q0 , u0 , p0 ) Iqp (q0 , u0 , p0 ) Q0 (p0 )(δp) +Q0 (p0 )(δp) Iqu . 00 00 00 b δp Ipu (q0 , u0 , p0 ) Ipq (q0 , u0 , p0 ) Ipp (q0 , u0 , p0 ) δp (3.40) 166 Numerical Methods and Applications Here, (η, κ, σ) ∈ V 0 × Q0 × V 0 is given, as in the unconstrained case, by 000 b Lupp ()(·, δp, δp) η b κ = L000 qpp ()(·, δp, δp) b σ L000 zpp ()(·, δp, δp) 0 000 0 000 0 b b b L000 upu ()(·, δp, U (p0 )(δp)) + Lupq ()(·, δp, Q (p0 )(δp) + Lupz ()(·, δp, Z (p0 )(δp) 0 000 0 000 0 b b b + L000 qpu ()(·, δp, U (p0 )(δp)) + Lqpq ()(·, δp, Q (p0 )(δp) + Lqpz ()(·, δp, Z (p0 )(δp) 0 000 0 b b L000 zpu ()(·, δp, U (p0 )(δp)) + Lzpq ()(·, δp, Q (p0 )(δp) U 0 (p0 )(δp) b Q0 (p0 )(δp) . b +B b +B b +B e 0 ()(U 0 (p0 )(δp)) eq0 ()(Q0 (p0 )(δp)) ez0 ()(Z 0 (p0 )(δp)) ep0 ()(δp) + B u Z 0 (p0 )(δp) (3.41) Proof. The proof uses the same argument as the proof of Theorem 3.6. Note that in e (p), Q(p), Z(p), p) is totally directionally differentiable view of Assumption 3.19, B(U b the derivative is with respect to p at p0 . In the direction δp, b b +B b +B b +B eu0 ()(U 0 (p0 )(δp)) eq0 ()(Q0 (p0 )(δp)) ez0 ()(Z 0 (p0 )(δp)) ep0 ()(δp). B Due to the constant active sets, these partial derivatives have the following form: e 0 () = B u id RI id Bu0 (x0 , p0 ) id RI id , e 0 , p0 ), see Lemma 3.15, the second etc. In view of the bounded invertibility of B(x order partial derivatives of (U, Q, Z) at p0 exist by the Implicit Function Theorem. They satisfy the analogue of equation (3.16). We conclude this section by outlining an algorithm which collects the necessary steps b to evaluate the first and second order sensitivity derivatives j 0 (p0 ) δp and j 00 (p0 )(δp, δp) 0 00 b for given δp, δp b ∈ P. We suppose that the original as well as i (p0 ) δp and i (p0 )(δp, δp) optimization problem (OP(p)) has been solved, e.g., by the primal-dual active set approach in Section 2, for the nominal parameter p0 . We denote by A± and I the active and inactive sets belonging to the nominal solution (u0 , q0 ) and adjoint state e 0 , p0 ) appearing in equations (3.35) and (3.38), we refer z0 . For the definition of B(x to (2.19). 8. Numerical Sensitivity Analysis for the Quantity of Interest 167 Evaluation of sensitivity derivatives (1) Evaluate j 0 (p0 ) δp according to (3.36) (2) Compute the sensitivities U 0 (p0 ) δp, Q0 (p0 ) δp and Z 0 (p0 ) δp from (3.35) b according to (3.37) (3) Evaluate j 00 (p0 )(δp, δp) (4) Compute the dual quantities (e v , re, ye) from (3.38) (5) Evaluate i0 (p0 ) δp according to (3.39) b (6) Compute the sensitivities U 0 (p0 ) δp, b and Q0 (p0 ) δp b from (3.35) Z 0 (p0 ) δp (7) Compute the auxiliary quantities (η, κ, σ) from (3.41) b according to (3.40) (8) Evaluate i00 (p0 )(δp, δp) 4. Numerical Examples In this section we illustrate our approach using two examples from different areas. 
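Before turning to the examples, the saving promised by the dual quantities in steps (4)–(5) of the algorithm above can be made concrete on the small quadratic program already used as an analogue in Section 3. This is only a sketch under made-up data: the parameter enters linearly through a matrix G, the quantity of interest is a fixed linear functional w'q, and the full gradient with respect to the m-dimensional parameter is obtained either from m reduced solves (direct approach) or from a single dual solve.

```python
import numpy as np

# Small QP analogue: q(p) = argmin 0.5 q'Hq + q'(c + G p), q >= 0, with quantity of
# interest i(p) = w'q(p).  Compare the direct and the dual gradient evaluation.
n, m = 6, 4
H = 6.0 * np.eye(n) + np.ones((n, n))
c = np.array([-6.0, -5.0, -4.0, 4.0, 5.0, 6.0])
rng = np.random.default_rng(1)
G = rng.standard_normal((n, m))
w = rng.standard_normal(n)

# Nominal solution at p0 = 0 by projected gradient, as in the previous sketch.
q = np.zeros(n)
step = 1.0 / np.linalg.norm(H, 2)
for _ in range(2000):
    q = np.maximum(0.0, q - step * (H @ q + c))
I = q > 1e-8                                 # inactive set (strict complementarity assumed)
H_II, G_I, w_I = H[np.ix_(I, I)], G[I, :], w[I]

# Direct approach: one reduced sensitivity solve per parameter component (m solves).
dq_dp = np.zeros((n, m))
dq_dp[I, :] = np.linalg.solve(H_II, -G_I)
grad_direct = dq_dp.T @ w

# Dual approach: a single solve for a dual quantity, independent of m = dim P.
r = np.linalg.solve(H_II, w_I)
grad_dual = -G_I.T @ r

print(np.allclose(grad_direct, grad_dual))   # expected: True
```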
The first example is concerned with a parameter identification problem for the stationary Navier-Stokes system. No inequality constraints are present in this problem, and first and second order derivatives of the quantity of interest are obtained. In the second example, we consider a control-constrained optimal control problem for an instationary reaction-diffusion system subject to an infinite-dimensional parameter, which demonstrates the full potential of our approach. 4.1. Example 1. In this section we illustrate our approach using as an example a parameter identification flow problem without inequality constraints. We consider the configuration sketched in Figure 4.1. Γ0 ξ2 Γ1 Γ2 ξ1 Γ0 ΓC ξ3 ξ4 Γ3 Figure 4.1. Configuration of the system of pipes with measurement points The (stationary) flow in this system of pipes around the cylinder ΓC is described by incompressible Navier-Stokes equations, with unknown viscosity q: −q∆v + v · ∇v + ∇p ∇·v v v ∂v q ∂n − pn ∂v q ∂n − pn = = = = = = f 0 0 vin πn 0 in in on on on on Ω, Ω, Γ 0 ∪ ΓC , Γ1 , Γ2 , Γ3 . (4.1) 168 Numerical Methods and Applications Here, the state variable u = (v, p) consists of the velocity v = (v 1 , v 2 ) ∈ H 1 (Ω)2 and the pressure p ∈ L2 (Ω). The inflow Dirichlet boundary condition on Γ1 is given by a parabolic inflow vin . The outflow boundary conditions of the Neumann type are prescribed on Γ2 and Γ3 involving the perturbation parameter π ∈ P = R. (unlike previous sections, we denote the perturbation parameter by π to avoid the confusion with the pressure p.) Physically, the perturbation parameter π describes the pressure difference between Γ2 and Γ3 , see [11] for detailed discussion of this type of outflow boundary conditions. The reference parameter is chosen π0 = 0.029. The aim is to estimate the unknown viscosity q ∈ Q = R using the measurements of the velocity in four given points, see Figure 4.1. By the least squares approach, this results in the following parameter identification problem: Minimize 4 X 2 X i=1 j=1 (v j (ξi ) − v̄ij )2 + αq 2 , subject to (4.1). Here, v̄ij are the measured values of the components of the velocity at the point ξi and α is a regularization parameter. For a priori error analysis for finite element discretization of parameter identification problems with pointwise measurements we refer to [19]. The sensitivity analysis of previous sections allows to study the dependence on the perturbation parameter π. To illustrate this, we define two functionals describing the possible quantities of interest: I1 (u, q) = q, I2 (u, q) = cd (u), where cd (u) is the drag coefficient on the cylinder ΓC defined as: Z cd (u) = c0 n · σ · d ds, (4.2) ΓC with a chosen direction d = (1, 0), given constant c0 , and the stress tensor σ given by: ν σ = (∇v + (∇v)T ) − pI. 2 For the discretization of the state equation we use conforming finite elements on a shape-regular quadrilateral mesh Th . The trial and test spaces consist of cell-wise bilinear shape-functions for both pressure and velocities. We add further terms to the finite element formulation in order to obtain a stable formulation with respect to both the pressure-velocity coupling and convection dominated flow. This type of stabilization techniques is based on local projections of the pressure (LPS-method) first introduced in [1]. The resulting parameter identification problem is solved by Newton’s method on the parameter space as described in [3] which is known to be mesh-independent. 
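The structure of this parameter identification problem (a reduced least-squares functional built from pointwise measurements, minimized over a low-dimensional parameter space) can be illustrated on a deliberately simple stand-in. The sketch below replaces the stabilized Navier-Stokes solver by a one-dimensional Poisson model with unknown diffusion coefficient and uses a generic scalar minimizer instead of the Newton method of [3]; all data are synthetic.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy analogue of Example 1: identify the "viscosity" q in -q u'' = f on (0,1),
# u(0) = u(1) = 0, from four point measurements of the state.
N = 199
h = 1.0 / (N + 1)
L = (np.diag(2.0 * np.ones(N)) - np.diag(np.ones(N - 1), 1)
     - np.diag(np.ones(N - 1), -1)) / h**2          # 1D Laplacian (3-point stencil)
f = np.ones(N)

def state(q):
    return np.linalg.solve(q * L, f)

idx = [39, 79, 119, 159]                             # measurement points xi_1, ..., xi_4
u_meas = state(0.03)[idx]                            # synthetic data, "true" q = 0.03
alpha = 1e-8                                         # regularization parameter

def j(q):                                            # reduced least-squares functional
    return 0.5 * np.sum((state(q)[idx] - u_meas) ** 2) + 0.5 * alpha * q**2

# The paper applies Newton's method in the parameter space; a scalar solver suffices here.
res = minimize_scalar(j, bounds=(0.01, 0.1), method="bounded")
print(res.x)                                         # expected: close to 0.03
```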
The nonlinear state equation is likewise solved by Newton’s method, whereas the linear sub-problems are computed using a standard multi-grid algorithm. With these ingredients, the total numerical cost for the solution of this parameter identification problem on a given mesh behaves like O(N ), where N is the number of degrees of freedom (dof) for the state equation. For the reduced quantities of interest i1 (π) and i2 (π) we compute the first and second derivatives using the representations from Theorem 3.6. In Table 4.1 we collect the values of these derivatives for a sequence of uniformly refined meshes. In order to verify the computed sensitivity derivatives, we make a comparison with the derivatives computed by the second order difference quotients. To this end we choose ε = 10−4 and compute: dil = il (π0 + ε) − il (π0 − ε) , 2ε ddil = il (π0 + ε) − 2il (π0 ) + il (π0 − ε) , ε2 8. Numerical Sensitivity Analysis for the Quantity of Interest 169 Table 4.1. The values of i1 (π) and its derivatives on a sequence of uniformly refined meshes cells dofs 60 240 960 3840 15360 i01 (π) i1 (π) 270 1.0176e–2 900 1.0086e–2 3240 1.0013e–2 12240 1.0003e–2 47520 1.0000e–2 i001 (π) –3.9712e–1 1.4065e–1 –3.9386e–1 –3.2022e–1 –3.9613e–1 –8.5278e–1 –3.9940e–1 –1.0168e–0 –4.0030e–1 –1.0601e–0 Table 4.2. The values of i2 (π) and its derivatives on a sequence of uniformly refined meshes cells dofs i2 (π) 60 270 3.9511e–1 240 900 3.9106e–1 960 3240 3.9293e–1 3840 12240 3.9242e–1 15360 47520 3.9235e–1 i02 (π) i002 (π) –13.4846 9.89988 –13.8759 –4.09824 –13.8151 16.5239 –13.7357 19.3916 –13.7144 19.9385 by solving the optimization problem additionally for π = π0 − ε and π = π0 + ε. The results are shown in Table 4.3. Remark 4.1. The relative errors in Table 4.3 are of the order of the estimated finite difference truncation error. We therefore consider the correctness of our method to have been verified to within the accuracy of this test. The same holds for Example 2 and Table 4.4 below. Table 4.3. Comparison of the computed derivatives of il (l = 1, 2) with difference quotients, on the finest grid l i0l 1 2 –0.399403 –13.73574 dil dil −i0l i0l i00l –0.399404 2.5e–6 –1.01676 –13.73573 –7.3e–7 19.3916 ddil ddil −i00 l i00 l –1.01678 2.0e–5 19.3917 5.2e–6 4.2. Example 2. The second example concerns a control-constrained optimal control problem for an instationary reaction-diffusion model in 3 spatial dimensions. As the problem setup was described in detail in [9], we will be brief here. The reactiondiffusion state equation is given by (c1 )t = D1 ∆c1 − k1 c1 c2 in Ω × (0, T ), (4.3a) (c2 )t = D2 ∆c2 − k2 c1 c2 in Ω × (0, T ), (4.3b) where ci denotes the concentration of the i-th substance, hence u = (c1 , c2 ) is the state variable. Ω is a domain in R3 , in this case an annular cylinder (Figure 4.2), and T is the given final time. The control q enters through the inhomogeneous boundary 170 Numerical Methods and Applications conditions ∂c1 = 0 in ∂Ω × (0, T ), ∂n ∂c2 D2 = q(t) α(t, x) in ∂Ωc × (0, T ), ∂n ∂c2 D2 = 0 in (∂Ω \ ∂Ωc ) × (0, T ), ∂n and α is a given shape function on the boundary, modeling a revolving nozzle control surface ∂Ωc , the upper annulus. Initial conditions D1 (4.4a) (4.4b) (4.4c) on the c1 (0, x) = c10 (x) in Ω, (4.5a) c2 (0, x) = c20 (x) in Ω (4.5b) are also given. 
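Before completing the description of Example 2, we record the generic form of the difference-quotient check used in Table 4.3 (and again in Table 4.4 below). In the sketch, the scalar function i is only a hypothetical stand-in for the reduced quantity of interest, since the actual check requires the PDE solves described above; the values of π0 and ε are those used in Example 1.

```python
import numpy as np

# Central difference quotients used to verify the computed sensitivity derivatives,
# applied to an explicitly known scalar function in place of i(pi).
def i(pi):
    return np.sin(3.0 * pi) + 0.5 * pi**2

pi0, eps = 0.029, 1e-4
di  = (i(pi0 + eps) - i(pi0 - eps)) / (2 * eps)
ddi = (i(pi0 + eps) - 2 * i(pi0) + i(pi0 - eps)) / eps**2

exact_di  = 3.0 * np.cos(3.0 * pi0) + pi0
exact_ddi = -9.0 * np.sin(3.0 * pi0) + 1.0

# Truncation errors are O(eps^2); for ddi, floating point cancellation adds roughly
# O(machine_eps / eps^2), which is why the second-derivative check is less accurate.
print(abs(di - exact_di) / abs(exact_di))
print(abs(ddi - exact_ddi) / abs(exact_ddi))
```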
The objective to be minimized is
\[
J(c_1, c_2, q) = \frac{1}{2} \int_\Omega \big( \alpha_1 |c_1(T,\cdot) - c_{1T}|^2 + \alpha_2 |c_2(T,\cdot) - c_{2T}|^2 \big)\, dx + \frac{\gamma}{2} \int_0^T |q - q_d|^2\, dt + \frac{1}{\varepsilon} \max\Big\{ 0,\, \int_0^T q(t)\, dt - q_c \Big\}^3,
\]
i.e., it contains contributions from the deviation of the concentrations at the given terminal time T from the desired ones c_{iT}, plus a control cost and a term stemming from a penalization of excessive total control action.

We consider here the particular setup described in [9, Example 1], where substance c_1 is to be driven to zero at time T (i.e., we have α_1 = 1 and α_2 = 0) from the given uniform initial state c_{10} ≡ 1. This problem features a number of parameters, and the differentiability of optimal solutions with respect to these parameters was proved in [10]; hence we may apply the results of Section 3.

The nominal as well as the sensitivity and dual problems were solved using a primal-dual active set strategy, see [9, 15]. The nominal control is depicted in Figure 4.2. One clearly sees that the upper and lower bounds with values 5 and 1, respectively, are active at the beginning and at the end of the time interval. All computations were carried out using piecewise linear finite elements on a tetrahedral grid with roughly 3300 vertices, 13200 tetrahedra and 100 time steps.

Figure 4.2. Optimal (unperturbed) control q (left) and computational domain (right)

Since the control variable is infinite-dimensional and control constraints are active in the solution, the active sets will in general change even under arbitrarily small perturbations; hence second order derivatives of the reduced quantity of interest i(p) may not exist (see the discussion before Assumption 3.19).

We choose as quantity of interest the total amount of control action
\[
I(u, q) = \int_0^T q(t)\, dt.
\]
In contrast to the previous example, we consider now an infinite-dimensional parameter p = c_{10}, the initial value of the first substance. After discretization on the given spatial grid, the parameter space has a dimension dim P ≈ 3300. A look at Table 3.1 now reveals the potential of our method: the direct evaluation of the derivative i'(p_0) would have required the solution of 3300 auxiliary linear-quadratic problems, an unbearable effort. By our dual approach, however, we need to solve only one additional such problem (3.38) for the dual quantities. The derivative i'(p_0) is shown in Figure 4.3 as a distributed function on Ω.

Figure 4.3. Gradient of the quantity of interest

In the unperturbed setup, the terminal state c_1(T) is everywhere above the desired state c_{1T} ≡ 0. By increasing the value of the initial state c_{10}, the desired terminal state becomes even more difficult to reach, which leads to an increased control effort and thus an increased value of the quantity of interest. This is reflected by the sign of the function in Figure 4.3, which is everywhere positive. Moreover, one can identify the region of Ω where perturbations in the initial state have the greatest impact on the value of the quantity of interest.

In order to check the derivative, we use again a comparison with a difference quotient in the given direction δp ≡ 1. Table 4.4 shows the analogue of Table 4.3 with ε = 10^{-2} for this example.

Table 4.4. Comparison of the computed derivatives of i with difference quotients

    i'          di          (di - i')/i'
    0.222770    0.222463    -1.4e-3

5.
Conclusion In this paper, we considered PDE-constrained optimization problems with inequality constraints, which depend on a perturbation parameter p. The differentiability of optimal solutions with respect to this parameter is shown in Theorem 3.12. This result complements previous findings in [7, 17] and makes precise the compactness assumptions needed for the proof. 172 Numerical Methods and Applications We obtained sensitivity results for a quantity of interest which depends on the optimal solution and is different from the cost functional. The main contribution of this paper is to devise an efficient algorithm to evaluate these sensitivity derivatives. Using a duality technique, we showed that the numerical cost of evaluating the gradient or the Hessian of the quantity of interest is only marginally higher than the evaluation of the gradient or the Hessian of the cost functional. The small additional effort is spent for the solution of one additional linear-quadratic optimization problem for a suitable dual quantity. A comparison with a direct approach for the evaluation of the gradient and the Hessian revealed the tremendous savings of the dual approach especially in the case of a high-dimenensional parameter space. Two numerical examples confirmed the correctness of our derivative formulae and illustrated the applicability of our results. References [1] R. Becker and M. Braack. A finite element pressure gradient stabilization for the stokes equations based on local projections. Calcolo, 38(4):173–199, 2001. [2] R. Becker, D. Meidner, and B. Vexler. Efficient numerical solution of parabolic optimization problems by finite element methods. submitted, 2005. [3] R. Becker and B. Vexler. Mesh refinement and numerical sensitivity analysis for parameter calibration of partial differential equations. Journal of Computational Physics, 206(1):95–110, 2005. [4] M. Bergounioux, K. Ito, and K. Kunisch. Primal-dual strategy for constrained optimal control problems. SIAM Journal on Control and Optimization, 37(4):1176–1194, 1999. [5] J. Dieudonné. Foundations of Modern Analysis. Academic Press, New York, 1969. [6] A. Dontchev. Implicit function theorems for generalized equations. Math. Program., 70:91–106, 1995. [7] R. Griesse. Parametric sensitivity analysis in optimal control of a reaction-diffusion system— Part I: Solution differentiability. Numerical Functional Analysis and Optimization, 25(1–2):93– 117, 2004. [8] R. Griesse. Parametric sensitivity analysis in optimal control of a reaction-diffusion system— Part II: Practical methods and examples. Optimization Methods and Software, 19(2):217–242, 2004. [9] R. Griesse and S. Volkwein. A primal-dual active set strategy for optimal boundary control of a nonlinear reaction-diffusion system. SIAM Journal on Control and Optimization, 44(2):467–494, 2005. [10] R. Griesse and S. Volkwein. Parametric sensitivity analysis for optimal boundary control of a 3D reaction-diffusion system. In G. Di Pillo and M. Roma, editors, Large-Scale Nonlinear Optimization, volume 83 of Nonconvex Optimization and its Applications, pages 127–149, Berlin, 2006. Springer. [11] J. Heywood, R. Rannacher, and S. Turek. Artificial boundaries and flux and pressure conditions for the incompressible navier–stokes equations. International Journal for Numerical Methods in Fluids, 22(5):325–352, 1996. [12] M. Hintermüller, K. Ito, and K. Kunisch. The primal-dual active set strategy as a semismooth Newton method. SIAM Journal on Optimization, 13(3):865–888, 2002. [13] M. Hinze and K. 
Kunisch. Second Order Methods for Optimal Control of Time-Dependent Fluid Flow. SIAM Journal on Control and Optimization, 40(3):925–946, 2001. [14] K. Ito and K. Kunisch. Augmented Lagrangian-SQP Methods in Hilbert Spaces and Application to Control in the Coefficients Problem. SIAM Journal on Optimization, 6(1):96–125, 1996. [15] K. Ito and K. Kunisch. The primal-dual active set method for nonlinear optimal control problems with bilateral constraints. SIAM Journal on Control and Optimization, 43(1):357–376, 2004. [16] F. Kupfer. An infinite-dimensional convergence theory for reduced SQP methods in Hilbert space. SIAM Journal on Optimization, 6:126–163, 1996. [17] K. Malanowski. Sensitivity analysis for parametric optimal control of semilinear parabolic equations. Journal of Convex Analysis, 9(2):543–561, 2002. [18] H. Maurer and J. Zowe. First and second order necessary and sufficient optimality conditions for infinite-dimensional programming problems. Mathematical Programming, 16:98–110, 1979. [19] R. Rannacher and B. Vexler. A priori error estimates for the finite element discretization of elliptic parameter identification problems with pointwise measurements. SIAM Journal on Control and Optimization, 44(5):1844–1863, 2005. [20] A. Rösch and K. Kunisch. A primal-dual active set strategy for a general class of constrained optimal control problems. SIAM Journal on Optimization, 13(2):321–334, 2002. 8. Numerical Sensitivity Analysis for the Quantity of Interest 173 [21] F. Tröltzsch. Lipschitz stability of solutions of linear-quadratic parabolic control problems with respect to perturbations. Dynamics of Continuous, Discrete and Impulsive Systems Series A Mathematical Analysis, 7(2):289–306, 2000. 174 Numerical Methods and Applications 9. On the Interplay Between Interior Point Approximation and Parametric Sensitivities in Optimal Control R. Griesse and M. Weiser: On the Interplay Between Interior Point Approximation and Parametric Sensitivities in Optimal Control, to appear in: Journal of Mathematical Analysis and Applications, 2007 In all previous publications in this thesis, the primal-dual active set method (see Bergounioux et al. [1999], Hintermüller et al. [2002]) was routinely used in order to compute optimal solutions and sensitivity derivatives. Interior point methods offer an alternative approach to this task. We consider here the classical variant which employs a relaxation (u − ua ) η = µ of the complementarity conditions arising in the presence of, say, a one-sided control constraint u ≥ ua . When the homotopy parameter µ tends to zero, the corresponding solutions define the so-called central path. We investigate the interplay between the function space interior point method and parametric sensitivity derivatives for optimization problems of the following kind: Z Z Z 1 1 u(Ku) dx + α u2 dx + f u dx Minimize J(u; π) = 2 Ω 2 Ω (9.1) Ω subject to u − ua ≥ 0 a.e. in Ω. Here, K is a self-adjoint and positive semidefinite operator in L2 (Ω), which maps compactly into L∞ (Ω), f ∈ L∞ (Ω), and α ≥ α0 > 0. The perturbation parameter π may enter K, α and f in a Lipschitz and differentiable way, see Assumption 2.1 of the paper. This setting accommodates in particular optimal control problems, where K = S ? S and S is the solution operator of the underlying PDE. The interior point approach leads to the following relaxed optimality system for (9.1): Ju (u : π) − η (9.2) F (u, η; π, µ) = = 0. 
(u − ua ) η − µ The solutions of (9.2) are considered to be functions of both the homotopy parameter µ, viewed as an inner parameter, and the outer parameter π: Ξ(π, µ) = (Ξu (π, µ), Ξη (π, µ)) = v(π, µ). Our main results are the following estimates for the convergence of the interior point approximations v(π, µ) and their sensitivity derivatives vπ (π, µ) to the exact counterparts at µ = 0: kv(π, µ) − v(π, 0)kLq (Ω) ≤ c µ(1+q)/(2q) 1/(2q) kvπ (π, µ) − vπ (π, 0)kLq (Ω) ≤ cµ (Theorem 4.6) (Theorem 4.8) for all µ < µ0 and q ∈ [2, ∞). In other words, the sensitivity derivatives lag behind by √ a factor of µ as µ & 0. By excluding a neighborhood of the boundary of the active set, the convergence rates can be improved by an order of 1/4 (see Theorem 4.9). These findings are confirmed by three numerical examples in Section 5 of the paper. The first example is a simple problem with K ≡ 0. An elliptic optimal control problem serves as a second example, where the parameter π shifts the desired state. As a third example, we consider an obstacle problem, which fits our setting after switching to its dual formulation with regularization. 9. Parametric Sensitivities and Interior Point Methods 175 ON THE INTERPLAY BETWEEN INTERIOR POINT APPROXIMATION AND PARAMETRIC SENSITIVITIES IN OPTIMAL CONTROL ROLAND GRIESSE AND MARTIN WEISER Abstract. Infinite-dimensional parameter-dependent optimization problems of the form ’min J(u; p) subject to g(u) ≥ 0’ are studied, where u is sought in an L∞ function space, J is a quadratic objective functional, and g represents pointwise linear constraints. This setting covers in particular control constrained optimal control problems. Sensitivities with respect to the parameter p of both, optimal solutions of the original problem, and of its approximation by the classical primaldual interior point approach are considered. The convergence of the latter to the former is shown as the homotopy parameter µ goes to zero, and error bounds in various Lq norms are derived. Several numerical examples illustrate the results. 1. Introduction In this paper we study infinite-dimensional optimization problems of the form min J(u; p) s.t. g(u) ≥ 0 u (1.1) where u denotes the optimization variable, and p is a parameter in the problem which is not optimized for. The optimization variable u will be called the control variable throughout. It is sought in a suitable function space defined over a domain Ω. The function g(u) represents a pointwise constraint for the control. For simplicity of the presentation, we restrict ourselves here to the case of a scalar control, quadratic functionals J, and linear constraints. The exact setting is given in Section 2 and accomodates in particular optimal control of elliptic partial differential equations. Let us set the dependence of (1.1) on the parameter aside. In the recent past, a lot of effort has been devoted to the development of infinite-dimensional algorithms capable of solving such inequality-constrained problems. Among them are active set strategies [1, 5–7, 11] and interior point methods [12, 14, 15]. In the latter class, the complementarity condition holding for the constraint g(u) ≥ 0 and the corresponding Lagrange multiplier η ≥ 0 is relaxed to g(u)η = µ almost everywhere with µ denoting the duality gap homotopy parameter. When µ is driven to zero, the corresponding relaxed solutions (u(µ), η(µ)) define the so-called central path. 
In a different line of research, the parameter dependence of solutions for optimal control problems with partial differential equations and pointwise control constraints has been investigated. Differentiability results have been obtained for elliptic [9] and for parabolic problems [4, 8]. Under certain coercivity assumptions for second order derivatives, the solutions u(p) were shown to be at least directionally differentiable with respect to the parameter p. These derivatives, often called parametric sensitivities, allow to assess a solution’s stability properties and to design real-time capable update schemes. This paper intends to investigate the interplay between function space interior point methods and parametric sensitivity analysis for optimization problems. The solutions v(p, µ) = (u(p, µ), η(p, µ)) of the interior-point relaxed optimality systems depend on both the homotopy parameter µ, viewed as an inner parameter, and the outer 176 Numerical Methods and Applications parameter p. Our main results are, under appropriate assumptions, convergence of the interior point approximation and its parametric sensitivity to their exact counterparts: kv(p, µ) − v(p, 0)kLq ≤ c µ(1+q)/(2q) 1/(2q) kvp (p, µ) − vp (p, 0)kLq ≤ cµ (Theorem 4.6) (Theorem 4.8) for all µ < µ0 and q ∈ [2, ∞). By excluding a neighborhood of the boundary of the active set, the convergence rates can be improved by an order of 1/4 (Theorem 4.9). These convergence rates are confirmed by several numerical examples. The examples include a distributed elliptic optimal control problem with pointwise control constraints as well as a dualized and regularized obstacle problem. The outline of the paper is as follows: In Section 2 we define the setting for our problem. Section 3 is devoted to the parametric sensitivity analysis of problem (1.1). In Section 4 we establish our main convergence results, which are confirmed by numerical examples in Section 5. Throughout, c denotes a generic positive constant which is independent of the homotopy parameter µ and the choice of the norm q. It has different values in different locations. In case q = ∞, expressions like (r − q)/(2q) are understood in the sense of their limit. By L(X, Y ), we denote the space of linear and continuous operators from X to Y . The (partial) Fréchet derivatives of a function G(u, p) are denoted by Gu (u, p) and Gp (u, p), respectively. In contrast, we denote the (partial) directional derivative of G in the direction δp by Dp (G(u, p); δp). 2. Problem Setting In this section, we define the problem setting and standing assumptions taken to hold throughout the paper. We consider the infinite-dimensional optimization problem min J(u; p) s.t. g(u) ≥ 0. u (2.1) Here, u ∈ L∞ (Ω) is the control variable, defined on a bounded domain Ω ⊂ Rd . For ease of notation, we shall denote the standard Lebesgue spaces Lq (Ω) by Lq . The problem depends on a parameter p from some normed linear space P . The objective J : L∞ × P → R is assumed to have the following form: Z Z Z 1 1 f (x, p) u(x) dx u(x)((K(p)u)(x)) dx + α(x, p)[u(x)]2 dx + J(u; p) = 2 Ω 2 Ω Ω (2.2) Assumption 2.1. 
We assume that p∗ ∈ P is a given reference parameter and that the following holds for p in a fixed neighborhood Ve of p∗ : (a) K(p) : L2 → L∞ is a linear compact operator which is self-adjoint and positive semidefinite as an operator L2 → L2 , (b) p 7→ K(p) ∈ L(L∞ , L∞ ) is Lipschitz continuous and differentiable, (c) p 7→ α(p) ∈ L∞ is Lipschitz continuous and differentiable, (d) α := inf{ess inf α(p) : p ∈ Ve } > 0, (e) p 7→ f (p) ∈ L∞ is Lipschitz continuous and differentiable. R Note that since Ω α(x, p)[u(x)]2 dx ≥ α kuk2L2 , J is strictly convex. In addition, J is weakly lower semicontinuous and radially unbounded and hence (2.1) admits a global unique minimizer u(p) ∈ L∞ over any nonempty convex closed subset of L∞ . This setting accomodates in particular optimal control problems with parameter-dependent desired state yd and objective α 1 J(u; p) = kSu − yd (p)k2L2 + kuk2L2 2 2 9. Parametric Sensitivities and Interior Point Methods 177 where Su is the unique solution of, e.g., a second-order elliptic partial differential equation with distributed control u and K = S ⋆ S. For simplicity of notation, we will from now on omit the argument p from K, α and f . From (2.2) we infer that the objective is differentiable with respect to the norm of L2 and we identify Ju with its Riesz representative, i.e., we have Ju (u; p) = Ku + αu + f. Note that for u ∈ Lq , Ju (u; p) ∈ Lq holds for all q ∈ [2, ∞]. Likewise, we write Juu (u; p) = K + αI for the second derivative, meaning that Z Z α v1 v2 . v2 (Kv1 ) + Juu (u; p)(v1 , v2 ) = Ω Ω Let us now turn to the constraints which are given in terms of a Nemyckii operator involving a twice differentiable real function g : R → R with Lipschitz continuous derivatives. For simplicity, we restrict ourselves here to linear control constraints g(u) = u − a ≥ 0 a.e. on Ω (2.3) with lower bound a ∈ L∞ . The general case is commented on when appropriate. For later reference, we define the admissible set Uad = {u ∈ L∞ : g(u) ≥ 0 a.e. on Ω}. In this setting, the existence of a regular Lagrange multiplier can be proved: Lemma 2.2. u is the unique global optimal solution for problem (2.1) if and only if there exists a Lagrange multiplier η ∈ L∞ such that the optimality conditions Ju (u; p) − gu (u)⋆ η = 0, g(u) ≥ 0, and η ≥ 0 (2.4) g(u) η hold. Proof. The minimizer u is characterized by the variational inequality Ju (u; p)(u − u) ≥ 0 for all u ∈ Uad which can be pointwisely decomposed as Ju (u; p) = 0 where g(u) > 0 and Ju (u; p) ≥ 0 where g(u) = 0. Hence, η := Ju (u; p) ∈ L∞ is a multiplier for problem (2.1) such that (2.4) is satisfied. In the general case, the derivative gu (u) extends to a continuous operator from Lq to Lq (see [14]) and gu (u)⋆ above denotes its L2 adjoint. In view of our choice (2.3) we have gu (u)⋆ = I. 3. Parametric Sensitivity Analysis In this section we derive a differentiability result for the unrelaxed solution v(p, 0) with respect to changes in the parameter. K, α and f are evaluated at p∗ . Moreover, (u∗ , η ∗ ) = v(p∗ , 0) ∈ L∞ × L∞ is the unique solution of (2.4). In order to formulate our result, it is useful to define the weakly/strongly active and inactive subsets for the reference control u∗ : Ω0 = {x ∈ Ω : g(u∗ ) = 0 and η ∗ = 0} Ω+ = {x ∈ Ω : g(u∗ ) = 0 and η ∗ > 0} Ωi = {x ∈ Ω : g(u∗ ) > 0 and η ∗ = 0} which form a partition of Ω unique up to sets of measure zero. In addition, we define bad = {u ∈ L∞ : u = 0 a.e. on Ω+ and u ≥ 0 a.e. on Ω0 }. U 178 Numerical Methods and Applications Theorem 3.1. 
Suppose that Assumption 2.1 holds. Then there exist neighborhoods V ⊂ Ve of p∗ and U of u∗ and a map V ∋ p 7→ (u(p), η(p)) ∈ L∞ × L∞ such that u(p) is the unique solution of (2.1) in U and η(p) is the unique Lagrange multiplier. Moreover, this map is Lipschitz continuous (in the norm of L∞ ) and directionally differentiable at p∗ (in the norm of Lq for all q ∈ [2, ∞)). For any given direction δp, the derivatives δu and δη are the unique solution and Lagrange multiplier in L∞ × L∞ of the auxiliary problem Z Z 1 1 δu(x)((Kδu)(x)) dx + α(x)[δu(x)]2 dx + Jup (u∗ ; p∗ )(δu, δp) min δu 2 Ω 2 Ω bad . s.t. δu ∈ U (3.1) That is, δu and δη satisfy Kδu + αδu − δη = −Jup (u∗ ; p∗ )(δu, δp) δu δη = 0 bad δu ∈ U a.e. on Ω, δη ≥ 0 a.e. on Ω0 . (3.2) Proof. The main tool in deriving the result is the implicit function theorem for genb = Lq and eralized equations [3], see Appendix A, which we apply with X = L∞ , X W = Z = L∞ . We formulate (2.4) as a generalized equation. To this end, let G(u; p) = Ju (u; p) and Z N (u) = {ϕ ∈ L∞ : Ω ϕ (u − u) ≤ 0 for all u ∈ Uad } if u ∈ Uad while N (u) = ∅ otherwise. It is readily seen that (2.4) is equivalent to the generalized equation 0 ∈ G(u; p) + N (u). (3.3) Conditions (i) and (ii) of Theorem A.1 are a direct consequence of Assumption 2.1. The verification of conditions (iii) and (iv) proceeds in three steps: construction of the function ξ, the proof of its Lipschitz continuity, and the proof of directional differentiability. Step 1: We set up the linearization of (3.3) with respect to u, δ ∈ G(u∗ ; p∗ ) + Gu (u∗ ; p∗ )(u − u∗ ) + N (u), which can be written as δ ∈ Ku + αu + f + N (u). (3.4) These are the first order necessary conditions for a perturbation of problem (2.1) with R an additional linear term − Ω δ(x) u(x) dx in the objective, which does not disturb the strict convexity. Consequently, (3.4) is sufficient for optimality and thus uniquely solvable for any given δ. This defines the map ξ : L∞ ∋ δ 7→ u = ξ(δ) ∈ L∞ in Theorem A.1. Step 2: In order to prove that ξ is Lipschitz, let u′ and u′′ be the unique solutions of (3.4) belonging to δ ′ and δ ′′ . Then (3.4) readily yields Z Z (αu′ + Ku′ + f − δ ′ )(u′′ − u′ ) + (αu′′ + Ku′′ + f − δ ′′ )(u′ − u′′ ) ≥ 0. Ω Ω From there, we obtain Z Z α (u′′ − u′ )2 ≤ kδ ′′ − δ ′ kL2 ku′′ − u′ kL2 − (u′′ − u′ )K(u′′ − u′ ). α ku′′ − u′ k2L2 ≤ Ω Ω 9. Parametric Sensitivities and Interior Point Methods 179 Due to positive semidefiniteness of K, 1 c ku′′ − u′ kL2 ≤ kδ ′ − δ ′′ kL2 ≤ kδ ′ − δ ′′ kL∞ α α follows. To derive the L∞ estimate, we employ a pointwise argument. Let us denote by Pu(x) = max{u(x), a(x)} the pointwise projection of a function to the admissible set Uad . As (3.4) is equivalent to δ(x) − (Ku)(x) − f (x) , u(x) = P α(x) and the projection is Lipschitz with constant 1, we find that 1 ′′ |u′′ (x) − u′ (x)| ≤ |δ (x) − δ ′ (x)| + |(K(u′′ − u′ ))(x)| α(x) 1 ′′ ≤ kδ − δ ′ kL∞ + kKkL2 →L∞ ku′′ − u′ kL2 , α ′′ from where the desired ku − u′ kL∞ ≤ c kδ ′ − δ ′′ kL∞ follows. Since kη ′′ − η ′ kL∞ = kJu (u′′ ; p∗ ) − Ju (u′ ; p∗ ) − δ ′ + δ ′′ kL∞ ≤ kK(u′′ − u′ )kL∞ + kαkL∞ ku′′ − u′ kL∞ + kδ ′′ − δ ′ kL∞ holds, we have Lipschitz continuity also for the Lagrange multiplier. In Step 3 we deduce that u = ξ(δ) in (3.4) depends directionally differentiably on δ. To this end, let δb ∈ L∞ be a given direction, let {τn } be a real sequence such that b We consider τn ց 0 and let us define un to be the solution of (3.4) for δn = τn δ. 
∗ the difference quotient (un − u )/τn which, by the Lipschitz stability shown above, is b L . Hence we can extract a bounded in L∞ and thus in L2 by a constant times kδk ∞ subsequence such that un − u∗ ⇀u b in L2 . τn By compactness, K((un − u∗ )/τn ) → K u b in L∞ holds. Hence the sequence dn = −(Kun + f − δn )/α converges uniformly to d∗ = −(Ku∗ + f )/α and (dn − d∗ )/τn converges uniformly to db = (δb − K u b)/α. We now construct a pointwise limit of the difference quotient taking advantage of the decomposition of Ω. Note that α(u∗ −d∗ ) = η ∗ and un = Pdn and likewise u∗ = Pd∗ hold. On Ωi , we have d∗ > a and thus dn > a for sufficiently large n, which entails that un − u∗ Pdn − Pd∗ dn − d∗ = = → db on Ωi . τn τn τn On Ω+ , η ∗ > 0 implies d∗ < a, hence dn < a for sufficiently large n and thus Pdn − Pd∗ 0−0 un − u∗ = = → 0 on Ω+ . τn τn τn Finally on Ω0 we have η ∗ = 0 and thus d∗ = a so that un − u∗ Pdn − Pd∗ Pdn − a b0 = = → max d, on Ω0 . τn τn τn Hence we have constructed a pointwise limit u e = lim(un − u∗ )/τn on Ω. As ∗ ∗ dn − d∗ un − u un − u b + |d| + |e u| ≤ −u e ≤ τn τn τn b for any q ∈ [2, ∞), we and the right hand side converges pointwise and in Lq to 2 |d| infer from Lebesgue’s Dominated Convergence Theorem that un − u∗ →u e in Lq for all q ∈ [2, ∞) τn 180 Numerical Methods and Applications and hence u e=u b must hold. As for the Lagrange multiplier, we observe that ∗ un − u∗ b Ju (un ; p∗ ) − Ju (u∗ ; p∗ ) − δn un − u∗ ηn − η +α = =K −δ τn τn τn τn −→ ηb := K u b + αb u − δb in Lq for all q ∈ [2, ∞). It is straightforward to check that (b u, ηb) are the unique solution and Lagrange multiplier in L∞ × L∞ of the auxiliary problem Z Z Z 1 1 2 b u(x) dx s.t. u ∈ U bad . min δ(x) u(x)((Ku)(x)) dx + α(x)[u(x)] dx − u 2 Ω 2 Ω Ω (3.5) b = Lq and We are now in the position to apply Theorem A.1 with X = L∞ , X Z = L∞ . It follows that there exists a map V ∋ p 7→ u(p) ∈ U ⊂ L∞ mapping p to the unique solution of (3.3). Lemma 2.2 shows that u(p) is also the unique solution of our problem (2.1). Moreover, u(p∗ ) = u∗ holds, and u(p) is directionally differentiable at p∗ into Lq for any q ∈ [2, ∞). By the first equation in (2.4), i.e., η(p) = Ju (u(p); p), the same holds for η(p). The derivative (δu, δη) in the direction of δp is given by the unique solution and Lagrange multiplier of (3.5) with δb = −Jup (u∗ ; p∗ )(·, δp), whose necessary and sufficient optimality conditions coincide with (3.2). This completes the proof. Remark 3.2. (1) The directional derivative map P ∋ δp 7→ (δu, δη) ∈ L∞ × L∞ (3.6) is positively homogeneous in the direction δp but may be nonlinear. However, k(δu, δη)k∞ ≤ c kδpkP holds with c independent of the direction. (2) In case of Ω0 being a set of measure zero, we say that strict complementarity holds at the solution u(p∗ , 0). As a consequence, the admissible set for the bad is a linear space and the map (3.6) is linear. sensitivities U 4. Convergence of Solutions and Parametric Sensitivities As mentioned in the introduction, we consider an interior point regularization of problem (2.1) by means of the classical primal-dual relaxation of the first order necessary conditions (2.4). That is, we introduce the homotopy parameter µ ≥ 0 and define the relaxed optimality system by Ju (u; p) − η F (u, η; p, µ) = = 0. (4.1) g(u) η − µ As opposed to the previous section, we write again p instead of p∗ for the fixed reference parameter. Lemma 4.1. For each µ > 0 there exists a unique admissible solution of (4.1). Proof. A proof is given in [10]. 
For convenience, we sketch the main ideas here. The interior point equation (4.1) is the optimality system for the primal interior point formulation Z min J(u; p) − µ Ω ln(g(u)) dx of (1.1). For each ǫ > 0, this functional is lower semicontinuous on the set Mǫ := {u ∈ L∞ : g(u) ≥ ǫ}, such that by convexity and coercivity a unique minimizer uǫ (µ) exists. Moreover, if ǫ is sufficiently small, uǫ (µ) = u(µ) ∈ int Mǫ holds, such that u(µ) and the associated multiplier satisfy (4.1). 9. Parametric Sensitivities and Interior Point Methods We denote the solution of (4.1) by v(p, µ) := 181 u(p, µ) . η(p, µ) It defines the central path homotopy as µ ց 0 for fixed parameter p. This section is devoted to the convergence analysis of v(p, µ) → v(p, 0) and of vp (p, µ) → vp (p, 0) as µ ց 0. We will establish orders of convergence for the full scale of Lq norms. In order to avoid cluttered notation with operator norms, we assume throughout that δp is an arbitrary parameter direction of unit norm, and we use up (p, µ) vp (p, µ) = ηp (p, µ) to denote the directional derivative of v(p, µ) in this direction, whose existence is guaranteed by Theorem 3.1 in case µ = 0 and by Lemma 4.7 below for µ > 0. Moreover, we shall omit function arguments when appropriate. To begin with, we establish the invertibility of the Karush-Kuhn-Tucker operator √ belonging to problem (2.1). Note that gη = µ implies that g + η ≥ 2 µ. Lemma 4.2. For any µ > 0, the derivative Fv (v(p, µ); p, µ) is boundedly invertible from Lq → Lq for all q ∈ [2, ∞] and satisfies b . kFv−1 (·)(a, b)kLq ≤ c kakLq + g + η Lq Proof. Obviously, F is differentiable with respect to v = (u, η). In view of linearity of the inequality constraint, we need to consider the system Juu −gu⋆ ū a = η gu g η̄ b where the matrix elements are evaluated at u(p, µ) and η(p, µ), respectively. We introduce the almost active set ΩA = {x ∈ Ω : g ≤ η} and its complement ΩI = Ω\ΩA , the almost inactive set. The associated characteristic functions χA and χI = 1 − χA , respectively, can be interpreted as orthogonal projectors onto the subspaces L2 (ΩA ) and L2 (ΩI ). Dividing the second row by η, we obtain a Juu −gu⋆ ū . = gu (χA + χI ) ηg (χA + χI )η̄ (χA + χI ) ηb Eliminating η b χI η̄ = χI − gu ū g η and multiplying the second row by −1 leads to the reduced system # " Juu + gu⋆ χI ηg gu −gu⋆ a + gu⋆ χI gb ū . = −gu −χA gη χA η̄ −χA ηb This linear saddle point problem satisfies the assumptions of Lemma B.1 in [2] (see also Appendix B) with V = L2 (Ω) and M = L2 (ΩA ): the upper left block is uniformly elliptic (with constant α independent of µ) and uniformly bounded since η/g ≤ 1 on ΩI , the off-diagonal blocks satisfy an inf-sup-condition (independently of µ), and the negative semidefinite lower right block is uniformly bounded since g/η ≤ 1 on ΩA . Therefore, the operator’s inverse is bounded independently of µ. Using that g ≤ η on ΩA and η ≤ g on ΩI , we obtain k(ū, χA η̄)kL2 ≤ c k(a + gu⋆ χI b/g, χAb/η)kL2 ≤ c (kakL2 + kb/(g + η)kL2 ) . 182 Numerical Methods and Applications Having the L2 -estimate at hand, we can move the spatially coupling operator K to the right hand side and apply the saddle point lemma pointwisely (with V = M = R) to # " a + gu⋆ χI gb − K ū α + gu⋆ χI ηg gu −gu⋆ ū = . gu χA ηg χA η̄ χA ηb Since K : L2 → L∞ is compact, we obtain |(ū, χA η̄)(x)| ≤ c|(a + gu⋆ χI b/g − K ū, χA b/η)| ≤ c (|a| + |b|/(g + η) + kKkL2→L∞ kūkL2 ) ≤ c (|a| + |b|/(g + η) + kakL2 + kb/(g + η)kL2 ) for almost all x ∈ Ω. 
From this we conclude that k(ū, χA η̄)kLq ≤ c(kakLq + kb/(g + η)kLq for all q ≥ 2. Moreover, kχI η̄kLq η b = χI − gu ū g η Lq ≤ 2kb/(g + η)kLq + c(kakLq + kb/(g + η)kLq ) ≤ c(kakLq + kb/(g + η)kLq ) holds, which proves the claim. Remark 4.3. For more complex settings with multicomponent u ∈ Ln∞ and g : Rn → Rm , the proof is essentially the same. The almost active and inactive sets ΩA and ΩI have to be defined for each component of g separately. The only nontrivial change is to show the inf-sup-condition for gu . In order to prove convergence of the parametric sensitivities, we will need the strong complementarity (cf. [12]) of the non-relaxed solution. Assumption 4.4. Suppose there exists c > 0 such that the solution v(p, 0) satisfies |{x ∈ Ω : g(u(p, 0)) + η(p, 0) ≤ ǫ}| ≤ c ǫr (4.2) for all ǫ > 0 and some 0 < r ≤ 1. Note that Assumption 4.4 entails that the set Ω0 of weakly active constraints has measure zero, as \ |Ω0 | = | {x ∈ Ω : g(u(p, 0)) + η(p, 0) ≤ ǫ}| ≤ lim c ǫr = 0. ǫց0 ǫ>0 In other words, strict complementarity holds at the solution u(p, 0). In our examples, Assumption 4.4 is satisfied with r = 1. For convenience, we state a special case of Theorem 8.8 from [13] for use in the current setting. Lemma 4.5. Assume that f ∈ Lq , 1 ≤ q < ∞ satisfies {x ∈ Ω : |f (x)| > s} ≤ ψ(s), 0 ≤ s < ∞, for some integrable function ψ. Then, kf kqLq ≤q Z ∞ 0 sq−1 ψ(s) ds. We now prove a bound for the derivative vµ of the central path with respect to the duality gap parameter µ. 9. Parametric Sensitivities and Interior Point Methods 183 Theorem 4.6. Suppose that Assumption 4.4 holds. Then the map µ 7→ v(µ, p) is differentiable and the slope of the central path is bounded by kvµ (p, µ)kLq ≤ c µ(r−q)/(2q) , q ∈ [2, ∞]. (4.3) In particular, the a priori error estimate kv(p, µ) − v(p, 0)kLq ≤ c µ(r+q)/(2q) (4.4) holds. Proof. By the implicit function theorem, the derivative vµ is given by Fv (v(p, µ); p, µ) vµ (p, µ) = −Fµ (v(p, µ); p, µ) = 0 . 1 Hence from Lemma 4.2 above we obtain kvµ (p, µ)kL∞ ≤ c k(g + η)−1 kL∞ ≤ c µ−1/2 . √ The latter inequality holds since gη = µ implies that g + η ≥ 2 µ. Now let µn , n ∈ N be a positive sequence converging to zero. We may estimate for n>m Z kv(p, µn ) − v(p, µm )kL∞ ≤ µm µn Z kvµ (p, µ)kL∞ dµ ≤ c µm µn µ−1/2 dµ √ 1/2 ≤ c µm , ≤ c µ1/2 − µ m n which is less than any ǫ > 0 for sufficiently large m ≥ mǫ . Thus, v(p, µn ) is a Cauchy sequence with limit point v. Using continuity of L∞ ∋ v 7→ (Ju (u; p) − η, g(u)η) we find v = v(p; 0). The limit n → ∞ now yields kv(p, µ) − v(p, 0)kL∞ ≤ c √ µ, (4.5) which proves (4.3) and (4.4) for the case q = ∞. From (4.5) and (4.2) we obtain |{x ∈ Ω : g(u(p, µ)) + η(p, µ) < ǫ}| ( √ 0, if ǫ ≤ 2 µ ≤ √ |{x ∈ Ω : g(u(p, 0)) + η(p, 0) < ǫ + c µ}| otherwise ( √ 0, if ǫ ≤ 2 µ ≤ √ c (ǫ + c µ)r otherwise with c independent of r. Using Lemmas 4.2 and 4.5 we estimate for q ∈ [2, ∞) kvµ kqLq ≤ cq k(g + η)−1 kqLq ≤ cq q with ( ψ(s) = 0, √ c (s−1 + µ)r Z 0 ∞ sq−1 ψ(s) ds √ if s ≥ (2 µ)−1 otherwise 184 Numerical Methods and Applications and obtain kvµ kqLq ≤c ≤c q+1 q+1 Z q 0 Z q = cq+1 q √ (2 µ)−1 √ (2 µ)−1 0 3 r Z sq−1 (s−1 + s q−1 √ (2 µ)−1 3 −1 s 2 √ r µ) ds r ds sq−1−r ds 2 0 3 r (2√µ)−1 q+1 q sq−r 0 =c q−r 2 q+1 q 3r 2−q µ(r−q)/2 . ≤c q−r This implies (4.3). As before in the proof of Theorem 4.6, integration over µ then yields (4.4). Lemma 4.7. Along the central path, the solutions v(p, µ) are Fréchet differentiable w.r.t. p. 
There exists µ0 > 0 such that the parametric sensitivities are bounded independently of µ: kvp (p, µ)kL∞ ≤ c for all µ < µ0 . Proof. By the implicit function theorem and Lemma 4.2, vp exists and satisfies J (u(p, µ); p) Fv (v(p, µ); p, µ) vp (p, µ) = −Fp (v(p, µ); p, µ) = − up . (4.6) 0 and kvp kL∞ ≤ c kJup (u(p, µ); p)kL∞ holds. By (4.4), ku(p, µ)kL∞ is bounded, and by Assumption 2.1, the same holds for kJup (u(p, µ); p)kL∞ . Theorem 4.8. Suppose that Assumption 4.4 holds. Then there exist constants µ0 > 0 and c independent of µ such that kvp (p, µ) − vp (p, 0)kLq ≤ cµr/(2q) for all µ < µ0 and q ∈ [2, ∞), where vp (p, 0) is the parametric sensitivity of the original problem. Proof. We begin with the sensitivity equation (4.6) and differentiate it totally with respect to µ, which yields Fvv (vp , vµ ) + Fvµ vp + Fv vpµ = −Fpv vµ − Fpµ . First we observe Fvµ = 0, Fpµ = 0 and a Jupu uµ =: −Fvv (vp , vµ ) − Fpµ vµ = − . b ηp gu uµ + up gu⋆ ηµ (4.7) (4.8) In view of Assumption 2.1, Jupu is a fixed element of L(Lq , Lq ). Hence by Theorem 4.6, we have kakLq ≤ c µ(r−q)/(2q) for all q ∈ [2, ∞). The quantities (uµ , ηµ ) and (up , ηp ) can be estimated by Theorem 4.6 and Lemma 4.7, respectively, which entails kbkLq ≤ c kηp kL∞ kuµ kLq + kup kL∞ kηµ kLq ≤ c µ(r−q)/(2q) for all q ∈ [2, ∞) 9. Parametric Sensitivities and Interior Point Methods 185 and sufficiently small µ. We have seen that (4.7) reduces to Fv (vpµ ) = (a, b)⊤ . Applying Lemma 4.2 yields kvpµ kLq ≤ c kakLq + kb/(g + η)kLq ≤ c µ(r−q)/(2q) + µ(r−q)/(2q)−1/2 ≤ c µ(r−2q)/(2q) and thus kvpµ kLq ≤ c µ(r−2q)/(2q) for all q ∈ [2, ∞). Integrating over µ > 0 as before, we obtain the error estimate q kvp (p, µ) − vkLq ≤ c µr/(2q) , r where v = limµց0 vp (p, µ). Taking the limit µ ց 0 of (4.6) and using continuity of L∞ × L2 ∋ (v, vp ) 7→ Fv (v) vp + Fp (v) ∈ L2 , we have Fv (v(p, 0); p, 0) v + Fp (v(p, 0); p, 0) = 0, that is, Juu (u(p, 0); p, 0) u − gu (u(p; 0)) η = −Jup (u(p, 0); p) η(p, 0)gu (u(p, 0)) u + g(u(p, 0)) η = 0. (4.9) (4.10) From (4.10) we deduce that u=0 on the strongly active set Ω+ η=0 on the inactive set Ωi , which together with (4.9) uniquely characterize the exact sensitivity, see Theorem 3.1. Note that strict complementarity holds at u(p, 0), i.e., Ω0 is a null set in view of Assumption 4.4. Hence the limit v is equal to the sensitivity derivative vp (p, 0) of the unrelaxed problem. Comparing the results of Theorem 4.6 and 4.8, we observe that the convergence of √ the sensitivities lags behind the convergence of the solutions by a factor of µ, see also Table 4.1. Therefore Theorem 4.8 does not provide any convergence in L∞ . This was to be expected since under mild assumptions, up (p, µ) is a continuous function on Ω for all µ > 0 while the limit up (p, 0) exhibits discontinuities at junction points, compare Figure 5.1. It turns out that the convergence rates are limited by effects on the transition regions, where g(u) + η is small. However, sufficiently far away from the boundary of the active set, we can improve the L∞ estimates by r/4: Theorem 4.9. Suppose that Assumption 4.4 holds. For β > 0 define the β-determined set as Dβ = {x ∈ Ω : g(u(p, 0)) + η(p, 0) ≥ β}. Then the following estimates hold: kv(p, µ) − v(p, 0)kL∞ (Dβ ) ≤ cµ(r+2)/4 r/4 kvp (p, µ) − vp (p, 0)kL∞ (Dβ ) ≤ cµ (4.11) (4.12) Proof. First we note that due to the uniform convergence on the central path there is some µ̄ > 0, such that g(u(p, µ)) + η(p, µ) ≥ β/2 for all µ ≤ µ̄ and almost all x ∈ Dβ . 
We recall that the derivative of the solutions on the central path vµ is given by 0 Fv (v(p, µ); p, µ) vµ (p, µ) = −Fµ (v(p, µ); p, µ) = . 1 186 Numerical Methods and Applications We return to (5.2) in the proof of Lemma 4.2 with a = 0 and b = 1. Pointwise application of the saddle point lemma on Dβ yields kvµ kL∞ (Dβ ) ≤ k(g + η)−1 kL∞ (Dβ ) + kKkL2→L∞ kuµ kL2 (Ω) 2 ≤ + c µ(r−2)/4 for all µ ≤ µ̄ β by Theorem 4.6. Integration over µ proves (4.11). Similarly, vpµ is defined by (4.7) with a and b given by (4.8). Thus we have kvpµ kL∞ (Dβ ) ≤ c kbkL∞ (Dβ ) k(g + η)−1 kL∞ (Dβ ) + kKkL2→L∞ kvpµ kL2 (Ω) 2 ≤ c µ−1/2 · + µ(r−4)/4 β ≤ c µ(r−4)/4 . Integration over µ verifies the claim (4.12). Before we turn to our numerical results, we summarize in Table 4.1 the convergence results proved. norm Lq (Ω) L∞ (Ω) L∞ (Dβ ) v(p, µ) → v(p, 0) vp (p, µ) → vp (p, 0) (r + q)/(2q) 1/2 (r + 2)/4 r/(2q) — r/4 Table 4.1. Convergence rates for Lq , q ∈ [2, ∞), and L∞ of the solutions and their sensitivities along the central path. Remark 4.10. One may ask oneself whether the interior point relaxation of the sensitivity problem (3.1) for vp (p, 0) coincides with the sensitivity problem (4.7) for vp (p, µ) on the path µ > 0. This, however, cannot be the case, as (3.1) includes equality constraints for up (p, 0) on the strongly active set Ω+ , whereas (4.7) shows no such restrictions. 5. Numerical Examples 5.1. An Introductory Example. We start with a simple but instructive example: Z 1 min (u(x) − x − p)2 dx s.t. u(x) ≥ 0 Ω 2 on Ω = (−1, 1). The simplicity arises from the fact that this problem is spatially decoupled and K = 0 holds. Nevertheless, several interesting properties of parametric sensitivities and their interior point approximations may be explored. The solution is given by u(p, 0) = max(0, x + p) with sensitivity ( 1, x + p > 0 up (p, 0) = 0, x + p < 0. The interior point approximations are p + x 1p (p + x)2 + 4µ u(p, µ) = + 2 2 and their sensitivities s 1 p+x 1 up (p, µ) = + . 2 2 (p + x)2 + 4µ 9. Parametric Sensitivities and Interior Point Methods Control sensitivities Control 1 1 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0 0 −1 −0.8 −0.6 −0.4 −0.2 0 187 0.2 0.4 0.6 0.8 1 −1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1 Figure 5.1. Interior point solutions (left) and their sensitivities (right) for µ ∈ [10−6 , 10−1 ]. Convergence of solution in different norms 0 Convergence of sensitivity in different norms 0 10 10 −1 10 L1 L1 L2 L2 L4 L4 L8 L8 L L ∞ ∞ −2 −1 10 10 −3 10 −4 −2 10 10 −5 10 −6 10 −3 −6 10 −5 10 −4 −3 10 10 −2 10 −1 10 10 −6 10 −5 10 −4 −3 10 mu 10 −2 10 mu Figure 5.2. Convergence behavior of solutions (left) and their sensitivities (right) for q ∈ {2, 4, 8, ∞}. Finally, the Lagrange multiplier and its sensitivity are given by η(p, µ) = u(p, µ) − x − p ηp (p, µ) = up (p, µ) − 1. As a reference parameter, we choose p = 0. From the solution we infer that {x ∈ Ω : g(u(p, 0)) + η(p, 0) ≤ ǫ} = [−ǫ, ǫ] so Assumption 4.4 is satisfied with r = 1. A sequence of solutions obtained for a discretization of Ω with 212 points and µ ∈ [10−6 , 10−1 ] is depicted in Figure 5.1. The error of the solution ku(p, µ)−u(p, 0)kLq and the sensitivities kup (p, µ) − up (p, 0)kLq in different Lq norms are given in the double logarithmic Figure 5.2. Similar plots can be obtained for the multiplier and its sensitivities. Table 5.1 shows that the predicted convergence rates for q ∈ [2, ∞] are in very good accordance with those observed numerically. 
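Since closed-form expressions for u(p, µ), u_p(p, µ) and their limits at µ = 0 are available for this example, the rates of Table 5.1 can be reproduced with a few lines. The grid size and the values of µ in the following sketch are chosen for illustration only; the observed rates should come out close to the predicted values 0.625 and 0.125 for q = 4.

```python
import numpy as np

# Numerical check of the convergence rates of Table 5.1 for the introductory
# example (p = 0, r = 1), using the closed-form interior point solutions above.
x = np.linspace(-1.0, 1.0, 2**15 + 1)
h = x[1] - x[0]
p = 0.0

def u(mu):        # u(p, mu); mu = 0 gives max(0, x + p)
    return 0.5 * (p + x) + 0.5 * np.sqrt((p + x) ** 2 + 4.0 * mu)

def up(mu):       # u_p(p, mu); mu = 0 gives the Heaviside function
    if mu == 0.0:
        return (x + p > 0.0).astype(float)
    return 0.5 + 0.5 * (p + x) / np.sqrt((p + x) ** 2 + 4.0 * mu)

def lq_norm(v, q):
    return (np.sum(np.abs(v) ** q) * h) ** (1.0 / q)

mu1, mu2, q = 1e-6, 1e-4, 4.0
rate_u  = np.log(lq_norm(u(mu1) - u(0.0), q)   / lq_norm(u(mu2) - u(0.0), q))   / np.log(mu1 / mu2)
rate_up = np.log(lq_norm(up(mu1) - up(0.0), q) / lq_norm(up(mu2) - up(0.0), q)) / np.log(mu1 / mu2)
print(rate_u,  (1.0 + q) / (2.0 * q))   # observed vs predicted (r + q)/(2q) with r = 1
print(rate_up, 1.0 / (2.0 * q))         # observed vs predicted r/(2q) with r = 1
```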
Table 5.1 shows that the predicted convergence rates for q ∈ [2, ∞] are in very good accordance with those observed numerically. The numerical convergence rates are estimated from
\[ \frac{\log\bigl(\|u(p,\mu_1) - u(p,0)\|_{L^q} \,/\, \|u(p,\mu_2) - u(p,0)\|_{L^q}\bigr)}{\log(\mu_1/\mu_2)} \tag{5.1} \]
and the same expression with u replaced by $u_p$, where µ1 and µ2 are the smallest and the middle value of the sequence of µ values used. The corresponding rates for the multiplier are identical.

        control                  control sensitivity
q       predicted   observed     predicted   observed
1       —           0.9132       —           0.4960
2       0.7500      0.7476       0.2500      0.2481
4       0.6250      0.6221       0.1250      0.1214
8       0.5625      0.5571       0.0625      0.0565
∞       0.5000      0.5000       —           —

Table 5.1. Predicted and observed convergence rates in different L^q norms for the control and its sensitivity.

Our theory does not provide L^q estimates for q < 2. However, since exact solutions are available here, we can calculate
\[ \|u(p,\mu) - u(p,0)\|_{L^1} = \frac12\bigl(\sqrt{1+4\mu} - 1\bigr) + \mu\,\ln\frac{\sqrt{1+4\mu}+1}{\sqrt{1+4\mu}-1}, \qquad \|u_p(p,\mu) - u_p(p,0)\|_{L^1} = 1 + \sqrt{4\mu} - \sqrt{1+4\mu}. \]
Hence the L^1 convergence orders approach 1 and 1/2, respectively, as µ ց 0, see Table 5.1.

5.2. An Optimal Control Example. In this section, we consider a linear-quadratic optimal control problem involving an elliptic partial differential equation:
\[ \min_u\ J(u;p) = \frac12\|Su - y_d + p\|_{L^2}^2 + \frac{\alpha}{2}\|u\|_{L^2}^2 \quad\text{s.t.}\quad u - a \ge 0 \ \text{ and } \ b - u \ge 0, \]
where Ω = (0, 1) ⊂ R and y = Su is the unique solution of the Poisson equation
\[ -\Delta y = u \ \text{ on } \Omega, \qquad y(0) = y(1) = 0. \]
The linear solution operator maps u ∈ L² into Su ∈ H² ∩ H0¹. Moreover, S⋆ = S holds and K = S⋆S is compact from L² into L^∞, so that the problem fits into our setting. To complete the problem specification, we choose α = 10^{-4}, a ≡ −40, b ≡ 40 and y_d = sin(3πx) as desired state. The reference parameter is p = 0. The presence of upper and lower bounds for the control requires a straightforward extension of our convergence results, which is readily obtained and verified by this example.

To illustrate our results, we discretize the problem using the standard 3-point finite difference stencil on a uniform grid with 512 points. The interior point relaxed problem is solved for a sequence of duality gap parameters µ ∈ [10^{-7}, 10^{-1}] by applying Newton's method to the discretized optimality system. The corresponding sensitivity problems require only one additional Newton step each since p ∈ R. To obtain a reference solution, the unrelaxed problem for µ = 0 is solved using a primal-dual active set strategy [1, 5], which is also used to find the solution of the sensitivity problem at µ = 0. The sequence of solutions u(p,µ) and sensitivity derivatives $u_p(p,\mu)$ is shown in Figure 5.3. As in the previous example, the errors of the solution $\|u(p,\mu) - u(p,0)\|_{L^q}$ and of the sensitivities $\|u_p(p,\mu) - u_p(p,0)\|_{L^q}$ in different $L^q$ norms are given in the double logarithmic Figure 5.4.

Figure 5.3. Interior point solutions (left) and their sensitivities (right) for µ ∈ [10^{-7}, 10^{-1}].

Figure 5.4. Convergence behavior of solutions (left) and their sensitivities (right) for q ∈ {2, 4, 8, ∞}.
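A compact way to reproduce this experiment is sketched below. It is not the implementation used for the figures: for brevity it applies damped Newton steps to the reduced log-barrier optimality condition rather than to the primal-dual system, uses dense linear algebra, and omits L² quadrature weights. The sensitivity is obtained from one extra linear solve with the final Jacobian, in the spirit of the single additional Newton step mentioned in the text.

```python
import numpy as np

# Reduced log-barrier sketch of the optimal control example of Section 5.2.
n = 512
h = 1.0 / (n + 1)
xg = np.linspace(h, 1.0 - h, n)
alpha, a, b, p = 1e-4, -40.0, 40.0, 0.0
yd = np.sin(3 * np.pi * xg)

# 3-point finite difference discretization of -y'' = u with y(0) = y(1) = 0
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
S = np.linalg.inv(A)                 # discrete solution operator y = S u (dense, illustration only)

def solve_relaxed(mu, u0=None, tol=1e-10, maxit=100):
    """Damped Newton iteration for  alpha*u + S(Su - yd + p) - mu/(u-a) + mu/(b-u) = 0."""
    u = np.zeros(n) if u0 is None else u0.copy()
    for _ in range(maxit):
        F = alpha * u + S @ (S @ u - yd + p) - mu / (u - a) + mu / (b - u)
        J = alpha * np.eye(n) + S @ S + np.diag(mu / (u - a)**2 + mu / (b - u)**2)
        du = np.linalg.solve(J, -F)
        # fraction-to-the-boundary damping keeps a < u < b
        with np.errstate(divide="ignore", invalid="ignore"):
            tmax = np.min(np.where(du < 0, (a - u) / du, np.where(du > 0, (b - u) / du, np.inf)))
        u = u + min(1.0, 0.99 * tmax) * du
        if np.linalg.norm(du, np.inf) < tol:
            break
    return u, J

# warm-started path following over decreasing mu; the sensitivity u_p(p, mu) then requires
# only one extra linear solve, since dF/dp = S @ ones
u = None
for mu in 10.0 ** np.arange(-1, -8, -1):
    u, J = solve_relaxed(mu, u)
u_p = np.linalg.solve(J, -S @ np.ones(n))
```

Warm-starting along the decreasing sequence of µ values keeps the Newton iteration in the region of fast local convergence.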
In order to compare the predicted convergence rates with the observed ones, we need to estimate the exponent r in the strong complementarity Assumption 4.4. To this end, we analyze the discrete solution u(p,0) together with its Lagrange multiplier $\eta(p,0) = J_u(u(p,0);p)$, whose positive and negative parts are multipliers for the lower and upper constraints, respectively. A finite sequence of estimates is generated according to
\[ r_n \approx \frac{\log\bigl(|\Omega_n| / |\Omega_{\min}|\bigr)}{\log\bigl(\epsilon_n / \epsilon_{\min}\bigr)}, \]
where $\epsilon_{\min}$ is the smallest value of ǫ > 0 such that $\{x \in \Omega : u(p,0) - a + \eta^+(p,0) \le \epsilon\}$ contains 10 grid points, and $|\Omega_{\min}|$ is the measure of the corresponding set. Similarly, we define $\epsilon_{\max}$ as the maximum value of $u(p,0) - a + \eta^+(p,0)$ on Ω and
\[ \epsilon_n = \exp\Bigl(\log(\epsilon_{\min}) + \frac{n}{20}\bigl(\log(\epsilon_{\max}) - \log(\epsilon_{\min})\bigr)\Bigr), \qquad n = 0, \dots, 20. \]
$|\Omega_n|$ is again the measure of the corresponding set. For the current example, we obtain the sequence $\{r_n\}$ shown in Figure 5.5. From the slope of the line in the left part of the figure, we deduce the estimate r = 1. The same result is found for the upper bound.

Figure 5.5. Sequence of estimates $r_n$ for the exponent in the strong complementarity assumption (measure of the corresponding set plotted over ǫ).

        control                  control sensitivity      state       state sensitivity
q       predicted   observed     predicted   observed     observed    observed
1       —           0.8403       —           0.4894       0.8731      0.5096
2       0.7500      0.7136       0.2500      0.2470       0.8739      0.4934
4       0.6250      0.5961       0.1250      0.1169       0.8739      0.4710
8       0.5625      0.5387       0.0625      0.0484       0.8765      0.4482
∞       0.5000      0.4978       —           —            0.8801      0.4015

Table 5.2. Predicted and observed convergence rates in different L^q norms for the control and its sensitivity, and observed rates for the state and its sensitivity.

Table 5.2 shows again the predicted and observed convergence rates for the control and its sensitivity, as well as the observed rates for the state y = Su and its sensitivity. All observed rates are estimated using (5.1) with µ1 and µ2 being the two smallest nonzero values of µ used. Again, the observed convergence rates for the control are in good agreement with the predicted ones and confirm our analysis for q ∈ [2, ∞]. Since in 1D, the solution operator S is continuous from L¹ to L^∞, the observed rates for the control in L¹ carry over to the state variables in L^q for all q ∈ [2, ∞], and likewise to the adjoint states. Similarly, the L¹ rates for the control sensitivity carry over to the L^q rates for the state and adjoint sensitivities.
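The estimation of r just described can be sketched as follows. The arrays u0 and eta0 (the discrete u(p,0) and η(p,0)) and the mesh size h are assumed to be available from the computation above; these are hypothetical variable names used only for illustration.

```python
import numpy as np

def estimate_r(d, h, n_points=10, n_steps=20):
    """Estimates r_n = log(|Omega_n|/|Omega_min|) / log(eps_n/eps_min) for the
    distance-to-activity function d, e.g. d = u(p,0) - a + eta^+(p,0) for the lower bound."""
    d = np.sort(d)
    eps_min = d[n_points - 1]              # smallest eps such that {d <= eps} contains 10 grid points
    eps_max = d[-1]
    omega_min = h * np.count_nonzero(d <= eps_min)
    rates = []
    for n in range(1, n_steps + 1):        # n = 0 gives eps_n = eps_min and is skipped
        eps_n = np.exp(np.log(eps_min) + n / n_steps * (np.log(eps_max) - np.log(eps_min)))
        omega_n = h * np.count_nonzero(d <= eps_n)
        rates.append(np.log(omega_n / omega_min) / np.log(eps_n / eps_min))
    return rates

# lower bound:  r_seq = estimate_r(u0 - a + np.maximum(eta0, 0.0), h)
# upper bound (analogously, using the negative part of the multiplier):
#               r_seq = estimate_r(b - u0 + np.maximum(-eta0, 0.0), h)
```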
5.3. A Regularized Obstacle Problem. Here we consider the obstacle problem
\[ \min_{u \in H_0^1}\ \|\nabla u\|_{L^2}^2 + p\,\langle u, l\rangle \quad\text{s.t.}\quad u \ge -1 \tag{5.2} \]
on Ω = (0, 1)² ⊂ R², which, however, does not fit into the theoretical framework set up in Section 2. Formally dualizing (5.2) leads to
\[ \min_{\eta \in H^{-1}}\ \langle \eta, -\Delta^{-1}\eta\rangle + p\,\langle \eta, \Delta^{-1} l\rangle \quad\text{s.t.}\quad \eta \ge 0, \]
where ∆ : H0¹ → H^{-1} denotes the Laplace operator. Adding a regularization term for the Lagrange multiplier η, we obtain
\[ \min_{\eta \in L^2}\ \langle \eta, -\Delta^{-1}\eta\rangle + p\,\langle \eta, \Delta^{-1} l\rangle + \frac{\alpha}{2}\|\eta\|_{L^2}^2 \quad\text{s.t.}\quad \eta \ge 0. \tag{5.3} \]
This dualized and regularized variant of the original obstacle problem (5.2) fits into the theoretical framework presented above. The original constraint function u + 1 is the Lagrange multiplier associated with (5.3).

For the numerical results we choose α = 1, p = 1, and an arbitrary linear term l = 45(2 sin(xy) + sin(−10x) cos(8y − 1.25)), which results in a nonsymmetric contact region. The problem has been discretized on a uniform Cartesian grid of 512 × 512 points using the standard 5-point finite difference stencil. Intermediate iterates and sensitivities computed on a coarser grid are shown in Figure 5.6. The convergence behaviour is illustrated in Figure 5.7. Again, the observed convergence rates are in good agreement with the values predicted for r = 1. For larger values of q, the numerical convergence rate of $u_p(\mu)$ is greater than predicted. This can be attributed to the discretization, since for very small µ, linear convergence to the solution of the discretized problem is observed.

Figure 5.6. Interior point solution u(µ) (left) and sensitivities $u_p(\mu)$ (right) for the regularized obstacle problem at µ = 5.7 · 10^{-4}.

Figure 5.7. Numerically observed convergence rates of interior point iterates (top markers) and sensitivities (bottom markers) for different values of q ∈ [1, 1000]. Thin lines denote the analytically predicted values.
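For completeness, here is a sketch of how the relaxed dual problem (5.3) can be treated numerically. It is not the implementation behind Figures 5.6 and 5.7: it works on a much coarser grid with dense linear algebra, omits quadrature weights, and applies damped Newton steps to the reduced log-barrier condition obtained by differentiating the objective in (5.3). The recovery of u uses the identification of u + 1 with the multiplier of η ≥ 0 mentioned above, which the barrier method approximates by µ/η.

```python
import numpy as np

# Coarse-grid sketch of a log-barrier treatment of (5.3); the text uses 512 x 512 points.
m = 31                                     # interior grid points per direction (coarse, for illustration)
h = 1.0 / (m + 1)
X, Y = np.meshgrid(np.linspace(h, 1 - h, m), np.linspace(h, 1 - h, m), indexing="ij")
alpha, p = 1.0, 1.0
lv = (45.0 * (2 * np.sin(X * Y) + np.sin(-10 * X) * np.cos(8 * Y - 1.25))).ravel()

# 5-point stencil for -Laplace with homogeneous Dirichlet conditions; A is SPD and A ~ -Delta,
# hence -Delta^{-1} eta = Ainv @ eta and Delta^{-1} l = -Ainv @ lv.
T = (2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2
A = np.kron(T, np.eye(m)) + np.kron(np.eye(m), T)
Ainv = np.linalg.inv(A)                    # dense inverse, affordable only on coarse grids

def solve_eta(mu, eta0, tol=1e-10, maxit=100):
    """Damped Newton for  2*Ainv@eta - p*Ainv@lv + alpha*eta - mu/eta = 0
    (the factor 2 comes from differentiating the quadratic form <eta, -Delta^{-1} eta>)."""
    eta = eta0.copy()
    for _ in range(maxit):
        F = 2 * Ainv @ eta - p * Ainv @ lv + alpha * eta - mu / eta
        J = 2 * Ainv + alpha * np.eye(m * m) + np.diag(mu / eta**2)
        de = np.linalg.solve(J, -F)
        with np.errstate(divide="ignore"):
            tmax = np.min(np.where(de < 0, -eta / de, np.inf))   # keep eta > 0
        eta = eta + min(1.0, 0.99 * tmax) * de
        if np.linalg.norm(de, np.inf) < tol:
            break
    return eta

mu = 5.7e-4
eta = solve_eta(mu, np.ones(m * m))
u = (mu / eta - 1.0).reshape(m, m)         # u + 1 is approximated by the barrier multiplier mu/eta
```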
References

[1] M. Bergounioux, K. Ito, and K. Kunisch. Primal-dual strategy for constrained optimal control problems. SIAM Journal on Control and Optimization, 37(4):1176–1194, 1999.
[2] D. Braess and C. Blömer. A multigrid method for a parameter dependent problem in solid mechanics. Numerische Mathematik, 57:747–761, 1990.
[3] A. Dontchev. Implicit function theorems for generalized equations. Mathematical Programming, 70:91–106, 1995.
[4] R. Griesse. Parametric sensitivity analysis in optimal control of a reaction-diffusion system—Part I: Solution differentiability. Numerical Functional Analysis and Optimization, 25(1–2):93–117, 2004.
[5] M. Hintermüller, K. Ito, and K. Kunisch. The primal-dual active set strategy as a semismooth Newton method. SIAM Journal on Optimization, 13(3):865–888, 2002.
[6] M. Hintermüller and K. Kunisch. Path-following methods for a class of constrained minimization problems in function space. SIAM Journal on Optimization, 17:159–187, 2006.
[7] M. Hinze. A variational discretization concept in control constrained optimization: the linear-quadratic case. Computational Optimization and Applications, 30:45–63, 2005.
[8] K. Malanowski. Sensitivity analysis for parametric optimal control of semilinear parabolic equations. Journal of Convex Analysis, 9(2):543–561, 2002.
[9] K. Malanowski. Solution differentiability of parametric optimal control for elliptic equations. In E. W. Sachs and R. Tichatschke, editors, System Modeling and Optimization XX, Proceedings of the 20th IFIP TC 7 Conference, pages 271–285. Kluwer Academic Publishers, 2003.
[10] U. Prüfert, F. Tröltzsch, and M. Weiser. The convergence of an interior point method for an elliptic control problem with mixed control-state constraints. Technical Report 36–2004, Institute of Mathematics, TU Berlin, Germany, 2004.
[11] M. Ulbrich. Semismooth Newton methods for operator equations in function spaces. SIAM Journal on Optimization, 13:805–842, 2003.
[12] M. Ulbrich and S. Ulbrich. Superlinear convergence of affine-scaling interior-point Newton methods for infinite-dimensional nonlinear problems with pointwise bounds. SIAM Journal on Control and Optimization, 38(6):1938–1984, 2000.
[13] M. Väth. Integration Theory. A Second Course. World Scientific, Singapore, 2002.
[14] M. Weiser. Interior point methods in function space. SIAM Journal on Control and Optimization, 44(5):1766–1786, 2005.
[15] M. Weiser, T. Gänzler, and A. Schiela. Control reduced primal interior point methods. Report 04-38, ZIB, 2004.

Appendix A. An Implicit Function Theorem

For the sake of easy reference, we state here an implicit function theorem which is an adaptation of [3, Theorem 2.4].

Theorem A.1 (Implicit Function Theorem). Let X be a Banach space and let P, Z be normed linear spaces. Suppose that G : X × P → Z is a function and N : X → Z is a set-valued map. Let u* ∈ X be a solution to
\[ 0 \in G(u,p) + N(u) \tag{A.1} \]
for p = p*, and let W be a neighborhood of 0 ∈ Z. Suppose that
(i) G is Lipschitz in p, uniformly in u at (u*, p*), and G(u*, ·) is directionally differentiable at p* with directional derivative $D_p G(u^*, p^*; \delta p)$ for all δp ∈ P,
(ii) G is partially Fréchet differentiable with respect to u in a neighborhood of (u*, p*), and its partial derivative G_u is continuous in both u and p at (u*, p*),
(iii) there exists a Lipschitz continuous function ξ : W → X such that ξ(0) = u* and
\[ \delta \in G(u^*, p^*) + G_u(u^*, p^*)\bigl(\xi(\delta) - u^*\bigr) + N(\xi(\delta)) \quad\text{for all } \delta \in W. \]
Then there exist neighborhoods U of u* and V of p* and a function p ↦ u(p) from V to U such that u(p*) = u*, u(p) is a solution of (A.1) for every p ∈ V, and u(·) is Lipschitz continuous.

If, in addition, $\widehat X \supset X$ is a normed linear space such that
(iv) ξ : W → $\widehat X$ is directionally differentiable at 0 with derivative Dξ(0; δ) for all δ ∈ Z,
then p ↦ u(p) ∈ $\widehat X$ is also directionally differentiable at p*, and the derivative is given by $D\xi(0; -D_p G(u^*, p^*; \delta p))$ for all δp ∈ P.

Appendix B. A Saddle Point Lemma

For convenience we state here the saddle point lemma by Braess and Blömer [2, Lemma B.1].

Lemma B.1. Let V and M be Hilbert spaces. Assume that the following conditions hold:
(1) The continuous linear operator B : V → M* satisfies the inf-sup condition: there exists a constant β > 0 such that
\[ \inf_{\zeta \in M}\ \sup_{v \in V}\ \frac{\langle \zeta, Bv\rangle}{\|v\|_V\,\|\zeta\|_M} \ \ge\ \beta. \]
(2) The continuous linear operator A : V → V* is symmetric, positive definite on the nullspace of B, and positive semidefinite on the whole space V: there exists a constant α > 0 such that
\[ \langle v, Av\rangle \ge \alpha\,\|v\|_V^2 \quad\text{for all } v \in \ker B \qquad\text{and}\qquad \langle v, Av\rangle \ge 0 \quad\text{for all } v \in V. \]
(3) The continuous linear operator D : M → M* is symmetric and positive semidefinite.
Then the operator
\[ \begin{pmatrix} A & B^\star \\ B & -D \end{pmatrix} : V \times M \to V^\star \times M^\star \]
is invertible. The inverse is bounded by a constant depending only on α, β, and the norms of A, B, and D.
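As a quick finite-dimensional illustration of Lemma B.1 (a sketch with synthetic matrices, unrelated to the specific operators used above), one can verify numerically that conditions (1)–(3) indeed yield an invertible block operator:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 12, 5
B = rng.standard_normal((m, n))               # inf-sup condition <=> smallest singular value of B is positive
Z = np.linalg.svd(B)[2][m:].T                  # columns form an orthonormal basis of ker B
A = Z @ Z.T                                    # symmetric, PSD on R^n, PD on ker B (with alpha = 1)
D = 0.1 * np.eye(m)                            # symmetric positive semidefinite

K = np.block([[A, B.T], [B, -D]])
print("smallest singular value of B:", np.linalg.svd(B, compute_uv=False).min())
print("condition number of the block operator:", np.linalg.cond(K))   # finite, i.e. invertible
```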