Stability and Sensitivity Analysis
in Optimal Control
of Partial Differential Equations
Dr. rer. nat. Roland Griesse
Cumulative Habilitation Thesis
Faculty of Natural Sciences
Karl-Franzens University Graz
October 2007
Contents

Preface

Chapter 1. Stability and Sensitivity Analysis
1. Lipschitz Stability of Solutions for Elliptic Optimal Control Problems with Pointwise State Constraints
2. Lipschitz Stability of Solutions for Elliptic Optimal Control Problems with Pointwise Mixed Control-State Constraints
3. Sensitivity Analysis for Optimal Control Problems Involving the Navier-Stokes Equations
4. Sensitivity Analysis for Optimal Boundary Control Problems of a 3D Reaction-Diffusion System

Chapter 2. Numerical Methods and Applications
5. Local Quadratic Convergence of SQP for Elliptic Optimal Control Problems with Mixed Control-State Constraints
6. Update Strategies for Perturbed Nonsmooth Equations
7. Quantitative Stability Analysis of Optimal Solutions in PDE-Constrained Optimization
8. Numerical Sensitivity Analysis for the Quantity of Interest in PDE-Constrained Optimization
9. On the Interplay Between Interior Point Approximation and Parametric Sensitivities in Optimal Control

Bibliography
Preface
The topic of this thesis is stability and sensitivity analysis in optimal control of partial
differential equations. Stability refers to the continuous behavior of optimal solutions
under perturbations of the problem data, while sensitivity indicates a differentiable
dependence.
This thesis is divided into two chapters. Chapter 1 provides a short overview of the
topic and its theoretical foundations. The individual sections give an introduction to
the author’s contributions concerning new stability and sensitivity results for several
problem classes, in particular optimal control problems with state constraints (Section 1) and mixed control-state constraints (Section 2), as well as problems involving
the Navier-Stokes equations (Section 3) and boundary control problems for a system of
coupled reaction-diffusion equations (Section 4). Chapter 1 is based on the following
publications.
1. R. Griesse: Lipschitz Stability of Solutions to Some State-Constrained Elliptic
Optimal Control Problems, Journal of Analysis and its Applications, 25(4),
p.435–444, 2006
2. W. Alt, R. Griesse, N. Metla and A. Rösch: Lipschitz Stability for Elliptic
Optimal Control Problems with Mixed Control-State Constraints, submitted
to Applied Mathematics and Optimization, 2006
3. R. Griesse, M. Hintermüller and M. Hinze: Differential Stability of Control Constrained Optimal Control Problems for the Navier-Stokes Equations,
Numerical Functional Analysis and Optimization 26(7–8), p.829–850, 2005
4. R. Griesse and S. Volkwein: Parametric Sensitivity Analysis for Optimal
Boundary Control of a 3D Reaction-Diffusion System, in: Large-Scale Nonlinear Optimization, G. Di Pillo and M. Roma (editors), volume 83 of Nonconvex Optimization and its Applications, p.127–149, Springer, Berlin, 2006
Chapter 2 addresses a number of applications based on the concepts of stability and
sensitivity of infinite dimensional optimization problems, and of optimal control problems in particular. The applications include the local convergence of the SQP (sequential quadratic programming) method for optimal control problems with mixed
control-state constraints (Section 5), accurate update strategies for solutions of perturbed problems (Section 6), the quantitative stability analysis of optimal solutions
(Section 7), and the efficient evaluation of first and second-order sensitivity derivatives
of a quantity of interest (Section 8). Finally, the relationship between the sensitivity
derivatives of optimization problems in function space, and the sensitivity derivatives
of their relaxations in the context of interior point methods is investigated (Section 9).
Chapter 2 is based on the following publications.
5. R. Griesse, N. Metla and A. Rösch: Local Quadratic Convergence of SQP
for Elliptic Optimal Control Problems with Mixed Control-State Constraints,
submitted to: ESAIM: Control, Optimisation, and Calculus of Variations,
2007
6. R. Griesse, T. Grund and D. Wachsmuth: Update Strategies for Perturbed
Nonsmooth Equations, to appear in: Optimization Methods and Software,
2007
7. K. Brandes and R. Griesse: Quantitative Stability Analysis of Optimal Solutions in PDE-Constrained Optimization, Journal of Computational and
Applied Mathematics, 206(2), p.809–826, 2007
8. R. Griesse and B. Vexler: Numerical Sensitivity Analysis for the Quantity
of Interest in PDE-Constrained Optimization, SIAM Journal on Scientific
Computing, 29(1), p.22–48, 2007
9. R. Griesse and M. Weiser: On the Interplay Between Interior Point Approximation and Parametric Sensitivities in Optimal Control, Journal of Mathematical Analysis and Applications, 337(2), p.771–793, 2008
An effort was made to use a consistent notation throughout the introductory paragraphs which link the individual papers. As a consequence, the notation used in the
introduction to each section may differ slightly from the notation used in the actual
publication. Moreover, all manuscripts have been typeset again from their LaTeX
sources, in order to achieve a uniform layout. In some cases, this may have led to an
updated bibliography, or a different numbering scheme.
All of the above publications were written after the completion of the author’s Ph.D.
degree in February of 2003. In addition, the following publications were completed in
the same period of time.
10. R. Griesse and D. Lorenz: A Semismooth Newton Method for Tikhonov Functionals with Sparsity Constraints, submitted, 2007
11. R. Griesse and K. Kunisch: Optimal Control for a Stationary MHD System in
Velocity-Current Formulation, SIAM Journal on Control and Optimization,
45(5), p.1822–1845, 2006
12. A. Borzì and R. Griesse: Distributed Optimal Control of Lambda-Omega
Systems, Journal of Numerical Mathematics 14(1), p.17–40, 2006
13. A. Borzì and R. Griesse: Experiences with a Space-Time Multigrid Method for
the Optimal Control of a Chemical Turbulence Model, International Journal
for Numerical Methods in Fluids 47(8–9), p.879–885, 2005
14. R. Griesse and S. Volkwein: A Primal-Dual Active Set Strategy for Optimal
Boundary Control of a Reaction-Diffusion System, SIAM Journal on Control
and Optimization 44(2), p.467–494, 2005
15. R. Griesse and A.J. Meir: Modeling of an MHD Free Surface Problem Arising
in CZ Crystal Growth, submitted, 2007
16. J.C. de los Reyes and R. Griesse: State-Constrained Optimal Control of the
Stationary Navier-Stokes Equations, submitted, 2006
17. R. Griesse, A.J. Meir and K. Kunisch: Control Issues in Magnetohydrodynamics, in: Optimal Control of Free Boundaries, Mathematisches Forschungsinstitut Oberwolfach, Report No. 8/2007, p.20–23, 2007
18. R. Griesse and A.J. Meir: Modeling of an MHD Free Surface Problem Arising in CZ Crystal Growth, in: Proceedings of the 5th IMACS Symposium on
Mathematical Modelling (5th MATHMOD), I. Troch, F. Breitenecker (editors), ARGESIM Report 30, Vienna, 2006
19. R. Griesse and K. Kunisch: Optimal Control in Magnetohydrodynamics, in:
Optimal Control of Coupled Systems of PDE, Mathematisches Forschungsinstitut Oberwolfach, Report No. 18/2005, p.1011–1014, 2005
20. R. Griesse and A. Walther: Towards Matrix-Free AD-Based Preconditioning of KKT Systems in PDE-Constrained Optimization, Proceedings of the
GAMM 2005 Annual Scientific Meeting, PAMM 5(1), p.47–50, 2005
21. R. Griesse and S. Volkwein: A Semi-Smooth Newton Method for Optimal
Boundary Control of a Nonlinear Reaction-Diffusion System, Proceedings of
the Sixteenth International Symposium on Mathematical Theory of Networks
and Systems (MTNS), Leuven, Belgium, 2004
A complete and updated list of publications can be found online at
http://www.ricam.oeaw.ac.at/people/page/griesse/publications.html
Acknowledgment. The publications which form the basis of this thesis were
written during my postdoctoral appointments at Karl-Franzens University of Graz
(supported by the SFB 003 Optimization and Control), and at the Johann Radon
Institute for Computational and Applied Mathematics (RICAM), Austrian Academy
of Sciences, in Linz. I would like to express my gratitude to Prof. Karl Kunisch for
giving me the opportunity to work in these two tremendous environments—both scientifically and otherwise—for his continuous support and many inspiring discussions.
I would also like to thank Prof. Heinz Engl, director of RICAM, for the opportunity
to be part of this fantastic institute. The support of several project proposals by the
Austrian Science Fund (FWF) is gratefully acknowledged.
My sincere thanks go to former and current colleagues and co-workers in Graz and Linz,
who contributed greatly in making the recent years very enjoyable and successful. I
would like to mention in particular Stefan Volkwein, Georg Stadler, Juan Carlos de los
Reyes, Alfio Borzì, and Michael Hintermüller in Graz, and Arnd Rösch, Boris Vexler,
Marco Discacciati, Nataliya Metla, Svetlana Cherednichenko, Klaus Krumbiegel, Olaf
Benedix, Martin Bernauer, Frank Schmidt, Sven Beuchler, Joachim Schöberl, Herbert
Egger, Georg Regensburger, Martin Giese, Jörn Sass, and, of course, Annette Weihs,
Florian Tischler, Doris Nikolaus, Magdalena Fuchs and Wolfgang Forsthuber in Linz.
Many thanks also to all co-authors who have not yet been mentioned, for their effort
and time.
Last but not least, I would like to thank Julia for her constant love and support.
Linz, October 2007
CHAPTER 1
Stability and Sensitivity Analysis
Stability and sensitivity are important concepts in continuous optimization. Stability
refers to the continuous dependence of an optimal solution on the problem data. In
other words, stability ensures the well-posedness of the problem. On the other hand,
sensitivity information allows further quantification of the solution’s dependence on
problem data, using appropriate notions of differentiability. For a general account of
perturbation analysis for infinite-dimensional optimization problems, we refer to the
book of Bonnans and Shapiro [2000].
In this chapter, we consider the notions of stability and sensitivity of optimal control
problems involving partial differential equations (PDEs). To fix ideas, we use as an
example the following prototypical distributed optimal control problem for the Poisson
equation, subject to perturbations δ.
(P(δ))    Minimize    ½ ‖y − y_d‖²_{L²(Ω)} + (γ/2) ‖u‖²_{L²(Ω)} − (δ_1, y)_Ω − (δ_2, u)_Ω

          subject to  −∆y = u + δ_3 in Ω,    y = 0 on Γ.
The state y and control u are sought in H01 (Ω) and L2 (Ω), respectively, and we assume
a positive control cost parameter γ > 0. We note that the system of necessary and
sufficient optimality conditions associated to (P(δ)) is given by
(0.1)    −∆p + y − y_d = δ_1 in Ω,    p = 0 on Γ,
         γ u − p = δ_2 in Ω,
         −∆y − u = δ_3 in Ω,    y = 0 on Γ,
where p is the adjoint state, and δ appears as a right hand side perturbation. The
understanding of problems of type (P(δ)) is key to the analysis of nonlinear optimal
control problems which depend on a general perturbation parameter π, which may
enter nonlinearly. Properties of nonlinear problems can be deduced from properties of
(P(δ)) by means of an implicit function theorem, as outlined below.
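For orientation, (0.1) can be obtained formally from the Lagrangian of (P(δ)); the following short derivation is ours (not part of the original text) and uses the sign convention implied by (0.1):

% Formal derivation sketch (ours): Lagrangian of (P(delta)) with adjoint p
\mathcal{L}(y,u,p) = \tfrac{1}{2}\|y-y_d\|_{L^2(\Omega)}^2
  + \tfrac{\gamma}{2}\|u\|_{L^2(\Omega)}^2
  - (\delta_1,y)_\Omega - (\delta_2,u)_\Omega
  + (p,\,-\Delta y - u - \delta_3)_\Omega .
% Stationarity with respect to y, u and p (with y, p in H^1_0(\Omega)) gives
\mathcal{L}_y = 0:\; -\Delta p + y - y_d = \delta_1, \qquad
\mathcal{L}_u = 0:\; \gamma u - p = \delta_2, \qquad
\mathcal{L}_p = 0:\; -\Delta y - u = \delta_3 ,
% which is exactly the system (0.1).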
In addition to problem (P(δ)), we consider some variations with pointwise control
constraints, pointwise state constraints, or pointwise mixed control-state constraints.
This leads us to consider
(P_cc(δ))    Solve (P(δ)) s.t. u_a ≤ u ≤ u_b a.e. in Ω,
(P_sc(δ))    Solve (P(δ)) s.t. y_a ≤ y ≤ y_b in Ω,
(P_mc(δ))    Solve (P(δ)) s.t. y_c ≤ ε u + y ≤ y_d in Ω.
The Control Constrained Case. Lipschitz stability properties of problems of
type (Pcc (δ)) were first investigated in Unger [1997] and Malanowski and Tröltzsch
[2000] for the elliptic case and in Malanowski and Tröltzsch [1999] for the parabolic
case. We give here a brief account of their results, applied to our model problem
(Pcc (δ)). Problems with pointwise state constraints and mixed control-state constraints will be addressed in Sections 1 and 2, respectively.
Assumption 0.1:
Suppose that Ω ⊂ R^d, d ≥ 1, is a bounded Lipschitz domain and that γ > 0 and y_d ∈ L²(Ω) hold.
It is well known that (P_cc(δ)) possesses a unique solution (y^δ, u^δ) ∈ H₀¹(Ω) × U_ad,

    U_ad := {u ∈ L²(Ω) : u_a ≤ u ≤ u_b a.e. in Ω},

provided that U_ad ≠ ∅. The solution and the associated unique adjoint state p^δ ∈ H₀¹(Ω) are characterized by the following optimality system:

(0.2)    −∆p^δ + y^δ − y_d = δ_1 in Ω,    p^δ = 0 on Γ,
         −∆y^δ − u^δ = δ_3 in Ω,    y^δ = 0 on Γ,
         (γ u^δ − p^δ − δ_2, u − u^δ)_Ω ≥ 0    for all u ∈ U_ad.
We begin by reviewing a Lipschitz stability result for the solution. For related results concerning optimal control of parabolic equations, we refer to Malanowski and
Tröltzsch [1999], Tröltzsch [2000].
Theorem 0.2 (Malanowski and Tröltzsch [2000]):
There exists a constant L₂ such that

    ‖y^δ − y^{δ′}‖_{H¹(Ω)} + ‖u^δ − u^{δ′}‖_{L²(Ω)} + ‖p^δ − p^{δ′}‖_{H¹(Ω)} ≤ L₂ ‖δ − δ′‖_{[L²(Ω)]³}

holds for every δ, δ′ ∈ [L²(Ω)]³.
When the perturbations and other problem data are more regular, a stronger result
can be obtained:
Corollary 0.3 (compare Malanowski and Tröltzsch [2000]):
If y_d, u_a, u_b ∈ L^∞(Ω), then there exists a constant L_∞ such that

    ‖y^δ − y^{δ′}‖_{L^∞(Ω)} + ‖u^δ − u^{δ′}‖_{L^∞(Ω)} + ‖p^δ − p^{δ′}‖_{L^∞(Ω)} ≤ L_∞ ‖δ − δ′‖_{[L^∞(Ω)]³}

holds for every δ, δ′ ∈ [L^∞(Ω)]³.
Indeed, the assumption on yd , δ1 and δ3 can be relaxed depending on the regularity
of the solutions of the state and adjoint PDEs, i.e., depending on the dimension of Ω
and the smoothness of its boundary Γ.
Sensitivity Analysis in the Control Constrained Case. We now address
differentiability properties of the parameter-to-solution map
    δ ↦ ξ(δ) := (ξ^y(δ), ξ^u(δ), ξ^p(δ)) = (y^δ, u^δ, p^δ).
We refer to Malanowski [2002, 2003a] for the original contributions in the elliptic and
parabolic cases, respectively. Due to the presence of inequality constraints, ξ is a
nonlinear function of the perturbation δ. We remark that the optimal control can be
expressed as

    u^δ := Π_{U_ad}( (p^δ + δ_2) / γ ),

where Π_{U_ad} denotes the pointwise projection onto the set U_ad. Hence the differentiability properties of ξ are essentially those of the projection. Naturally, the subset of Ω where the projection is active or strongly active will play a role, compare Figure 0.1.
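A short justification of the projection formula (ours, not part of the original text): since U_ad is defined by pointwise bounds, the variational inequality in (0.2) decouples pointwise, and for a.e. x ∈ Ω the value u^δ(x) minimizes a strictly convex quadratic over [u_a(x), u_b(x)]:

% Sketch (ours): pointwise reduction of the variational inequality in (0.2).
% For a.e. x, u^\delta(x) solves
\min_{v \in [u_a(x),\,u_b(x)]} \; \tfrac{\gamma}{2} v^2 - \bigl(p^\delta(x) + \delta_2(x)\bigr) v ,
% whose unconstrained minimizer is \gamma^{-1}(p^\delta(x)+\delta_2(x)); clipping it
% to the admissible interval yields
u^\delta(x) = \Pi_{[u_a(x),\,u_b(x)]}\Bigl( \tfrac{p^\delta(x) + \delta_2(x)}{\gamma} \Bigr),
% which is the formula above with \Pi_{U_{ad}} acting pointwise.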
We define

    Û_{ad,δ} := {u ∈ L²(Ω) : û_a ≤ u ≤ û_b},
[Figure 0.1 plots the term inside the projection operator, γ⁻¹(p^δ + δ_2), against the bounds u_a and u_b, with the domain divided into strongly active, weakly active and inactive regions.]

Figure 0.1. Illustration of the admissible set for the sensitivity derivative. In the left-most and right-most parts of the domain, one of the constraints is strongly active, i.e., γ⁻¹(p^δ + δ_2) > u_b or < u_a holds, and the derivative of u^δ vanishes, i.e., û_a = û_b = 0. The derivative points into the interior of the admissible region where one of the constraints is weakly active, i.e., where γ⁻¹(p^δ + δ_2) ∈ {u_a, u_b} holds. In the center part of the domain, neither constraint is active, and the derivative is not constrained, i.e., û_b = −û_a = ∞ holds.
with bounds

    û_a = 0 where γ⁻¹(p^δ + δ_2) ≤ u_a or > u_b,    û_a = −∞ elsewhere,
    û_b = 0 where γ⁻¹(p^δ + δ_2) < u_a or ≥ u_b,    û_b = +∞ elsewhere.
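As an illustrative numerical aid (not part of the original text), the following sketch evaluates these pointwise bounds on a grid. The array q stands for a discretization of γ⁻¹(p^δ + δ_2); the function name derivative_bounds and all example values are ours.

    import numpy as np

    def derivative_bounds(q, ua, ub):
        """Pointwise bounds (u_hat_a, u_hat_b) of the set hat{U}_{ad,delta}.

        q      : array approximating gamma^{-1}(p^delta + delta_2)
        ua, ub : scalar (or array) control bounds with ua <= ub
        """
        # u_hat_a = 0 where q <= ua or q > ub, and -inf elsewhere
        u_hat_a = np.where((q <= ua) | (q > ub), 0.0, -np.inf)
        # u_hat_b = 0 where q < ua or q >= ub, and +inf elsewhere
        u_hat_b = np.where((q < ua) | (q >= ub), 0.0, np.inf)
        return u_hat_a, u_hat_b

    # Example on a 1D grid with constant bounds u_a = 0.2, u_b = 0.8
    x = np.linspace(0.0, 1.0, 11)
    q = np.sin(np.pi * x)          # plays the role of gamma^{-1}(p^delta + delta_2)
    u_hat_a, u_hat_b = derivative_bounds(q, ua=0.2, ub=0.8)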
Theorem 0.4 (Malanowski [2003a]):
For every δ ∈ [L²(Ω)]³, the map ξ is directionally differentiable with values in H₀¹(Ω) × L²(Ω) × H₀¹(Ω). The directional derivative Dξ(δ; δ̂) at δ in the direction of δ̂ is given by the unique solution and corresponding unique adjoint state of

(DQP(δ, δ̂))    Minimize    ½ ‖y‖²_{L²(Ω)} + (γ/2) ‖u‖²_{L²(Ω)} − (δ̂_1, y)_Ω − (δ̂_2, u)_Ω

                subject to  −∆y = u + δ̂_3 in Ω,    y = 0 on Γ,
                and u ∈ Û_{ad,δ}.
Moreover, differentiability with respect to higher L^p norms was also obtained in Malanowski [2002, 2003a], and the directional derivative was shown to have the Bouligand property, i.e., the remainder term is of order o(‖δ̂‖) uniformly in all directions δ̂.
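Spelled out (our paraphrase of the standard definition), the Bouligand property of ξ at δ means

% Bouligand (B-)differentiability of xi at delta (standard definition, our wording):
\xi(\delta + \hat\delta) \;=\; \xi(\delta) + D\xi(\delta;\hat\delta) + o(\|\hat\delta\|)
\qquad \text{as } \|\hat\delta\| \to 0,
% i.e., the remainder is uniform over all directions, not merely along fixed rays
% as in plain directional differentiability.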
The original proof of Theorem 0.4 was based on a pointwise construction of the limit
of a sequence of finite differences, and Lebesgue’s Dominated Convergence Theorem
was used to obtain a limit in L2 (Ω).
Recently, a more direct proof of Theorem 0.4 has been obtained in Griesse, Grund, and
Wachsmuth [to appear], which exploits Bouligand differentiability of the projection
ΠUad . We refer to Section 6 for details.
Remark 0.5:
We remark that in general Û_{ad,δ} is not a linear space and thus the directional derivative Dξ(δ; δ̂) may depend nonlinearly on the direction δ̂. However, in the presence of strict complementarity, i.e., if

    |{x ∈ Ω : γ⁻¹(p^δ + δ_2) = u_a or u_b}| = 0

holds, then Û_{ad,δ} becomes a linear space, and Dξ(δ; δ̂) does depend linearly on the direction δ̂.
Nonlinear Optimal Control Problems. As mentioned earlier, the stability and sensitivity analysis for nonlinear problems can be reduced to that for linear-quadratic problems by means of an implicit function theorem. Due to the presence of inequality constraints and the variational inequality in (0.2), the classical Implicit Function Theorem is not applicable. To fix ideas, we consider the model problem

(P_cc(π))    Minimize    ∫_Ω ϕ(x, y, u) dx

             subject to  −∆y + β y³ + α y = u + f in Ω,    y = 0 on Γ,
             and u_a ≤ u ≤ u_b a.e. in Ω.
Problem (P_cc(π)) depends on the parameter

    π = (α, β, f) ∈ R² × L²(Ω) =: P.
Under appropriate assumptions (see, e.g., [Tröltzsch, 2005, Satz 4.18]), for any local optimal solution (ȳ, ū) of (P_cc(π)), there exists a unique adjoint state p̄ such that the following system of necessary optimality conditions is satisfied:

(0.3)    −∆p̄ + 3 β ȳ² p̄ + α p̄ = ϕ_y(·, ȳ, ū) in Ω,    p̄ = 0 on Γ,
         −∆ȳ + β ȳ³ + α ȳ = ū + f in Ω,    ȳ = 0 on Γ,
         (ϕ_u(·, ȳ, ū) − p̄, u − ū)_Ω ≥ 0    for all u ∈ U_ad.
To make (0.3) accessible to an implicit function theorem, we write it as an equivalent generalized equation,

(0.4)    0 ∈ F(y, u, p; π) + N(u).

Here, F is defined as

    F(y, u, p; π) = ( −∆p + 3 β y² p + α p − ϕ_y(·, y, u),
                      −∆y + β y³ + α y − u − f,
                      ϕ_u(·, y, u) − p ),

and it maps F : X × P → Z, where

    X = (H₀¹(Ω) ∩ L^∞(Ω)) × L²(Ω) × (H₀¹(Ω) ∩ L^∞(Ω)),
    Z = [H⁻¹(Ω)]² × L²(Ω),

when the differential operators are understood in their weak form. The set-valued part N(u) is related to the normal cone of U_ad at u, and we define

    N(u) = {0} × {0} × {µ ∈ L²(Ω) : (µ, v − u)_Ω ≤ 0 for all v ∈ U_ad}

in case u ∈ U_ad, whereas N(u) = ∅ if u ∉ U_ad.
For generalized equations such as (0.4), we have the following Implicit Function Theorem.
Theorem 0.6 ([Dontchev, 1995, Theorem 2.4]):
Let X be a Banach space and let P, Z be normed linear spaces. Suppose that F : X × P → Z is a function and N : X → Z is a set-valued map. Let x ∈ X be a solution to

(0.5)    0 ∈ F(x; π) + N(x)

for π = π₀, and let W be a neighborhood of 0 ∈ Z. Suppose that
(i) F is Lipschitz in π, uniformly in x at (x, π₀), and F(x, ·) is directionally differentiable at π₀ with directional derivative D_π F((x, π₀); δπ) for all δπ ∈ P,
(ii) F is partially Fréchet differentiable with respect to x in a neighborhood of (x, π₀), and its partial derivative F_x is continuous in both x and π at (x, π₀),
(iii) there exists a function ξ : W → X such that ξ(0) = x, δ ∈ F(x, π₀) + F_x(x, π₀)(ξ(δ) − x) + N(ξ(δ)) for all δ ∈ W, and ξ is Lipschitz continuous.
Then there exist neighborhoods U of x and V of π₀ and a function

    π ↦ Ξ(π) = x(π)

from V to U such that Ξ(π₀) = x, Ξ(π) is a solution of (0.5) for every π ∈ V, and Ξ is Lipschitz continuous.
If, in addition, X̂ ⊃ X is a normed linear space such that
(iv) ξ : W → X̂ is directionally (or Bouligand) differentiable at 0 with derivative Dξ(0; δ̂) for all δ̂ ∈ Z,
then π ↦ Ξ(π) ∈ X̂ is also directionally (or Bouligand) differentiable at π₀ and its derivative is given by

(0.6)    DΞ(π₀; δπ) = Dξ(0; −D_π F((x, π₀); δπ))

for any δπ ∈ P.
Definition 0.7 (Robinson [1980]):
The property (iii) is termed the strong regularity of the generalized equation (0.5)
at x and π0 .
This implicit function theorem can be applied to the generalized equation (0.4) with the setting x = (ȳ, ū, p̄). Assumptions (i) and (ii) are readily verified if ϕ is of class C². When we use

    ξ(δ) = (y^δ, u^δ, p^δ),

the linearized generalized equation in assumption (iii) is the system of necessary optimality conditions for a linear-quadratic approximation of (P_cc(π)), perturbed by δ:
(AQP_cc(δ))
    Minimize    ½ ∫_Ω (y − ȳ, u − ū) ( ϕ_yy  ϕ_yu ; ϕ_uy  ϕ_uu ) (y − ȳ, u − ū)ᵀ dx + 3 β ∫_Ω ȳ p̄ (y − ȳ)² dx
                + ∫_Ω ϕ_y (y − ȳ) + ϕ_u (u − ū) dx − (δ_1, y)_Ω − (δ_2, u)_Ω

    subject to  −∆y + (3 β ȳ² + α) y = u + f + 2 β ȳ³ + δ_3 in Ω,    y = 0 on Γ,
    and u_a ≤ u ≤ u_b a.e. in Ω.
If second-order sufficient conditions hold at (ȳ, ū) and p̄, then (AQP_cc(δ)) is a strictly convex problem and it has a unique solution (y^δ, u^δ, p^δ) ∈ X, which depends Lipschitz continuously on δ ∈ Z, so that assumption (iii) is satisfied. This can be proved along the lines of Theorem 0.2. As in Corollary 0.3, stability w.r.t. L^∞(Ω) norms can be obtained as well by changing X and Z appropriately. Finally, Theorem 0.4 implies
that also assumption (iv) is satisfied, so that ξ(δ) can be shown to be directionally
and Bouligand differentiable, as was done for a similar problem in Malanowski [2002,
2003a].
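To illustrate how formula (0.6) connects the abstract result to (P_cc(π)) with π = (α, β, f), here is our own computation (not in the original text) of the parameter derivative of F:

% Our computation: directional derivative of F from (0.4) with respect to
% pi = (alpha, beta, f) at (bar y, bar u, bar p; pi_0) in the direction
% delta pi = (delta alpha, delta beta, delta f):
D_\pi F\bigl((\bar y,\bar u,\bar p),\pi_0\bigr)\,\delta\pi
  = \begin{pmatrix}
      \delta\alpha\,\bar p + 3\,\delta\beta\,\bar y^{\,2}\,\bar p \\
      \delta\alpha\,\bar y + \delta\beta\,\bar y^{\,3} - \delta f \\
      0
    \end{pmatrix},
% so that, by (0.6), the right-hand side perturbation entering the linearized
% problem is delta = -D_pi F(...)*delta pi, componentwise
\delta_1 = -\delta\alpha\,\bar p - 3\,\delta\beta\,\bar y^{\,2}\,\bar p, \qquad
\delta_3 = \delta f - \delta\alpha\,\bar y - \delta\beta\,\bar y^{\,3}, \qquad
\delta_2 = 0.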
Following this overview of techniques and results for the control constrained case,
the following sections provide complementary results for optimal control problems
with state constraints (Section 1), and mixed control-state constraints (Section 2).
In Sections 3 and 4, we address again control constrained problems, but with more
involved dynamics, which are given by the time-dependent Navier-Stokes equations
or a semilinear reaction-diffusion system, respectively. Each section begins with an
introduction, followed by the corresponding publication.
1. Lipschitz Stability of Solutions for Elliptic Optimal Control Problems
with Pointwise State Constraints
R. Griesse: Lipschitz Stability of Solutions to Some State-Constrained Elliptic Optimal
Control Problems, Journal of Analysis and its Applications, 25(4), p.435–444, 2006
In this publication we derive Lipschitz stability results with respect to perturbations
for optimal control problems involving linear and semilinear elliptic partial differential
equations as well as pointwise state constraints. The problem setting in the linear case
with distributed control is very similar to our model problem (Psc (δ)) above, which
we repeat here for easy reference:
(P_sc(δ))    Minimize    ½ ‖y − y_d‖²_{L²(Ω)} + (γ/2) ‖u‖²_{L²(Ω)} − (δ_1, y)_Ω − (δ_2, u)_Ω

             subject to  −∆y = u + δ_3 in Ω,    y = 0 on Γ,
             and y_a ≤ y ≤ y_b in Ω.
We work in sufficiently smooth domains Ω ⊂ R^d, d ≤ 3, so that the state y will belong to

    W = H²(Ω) ∩ H₀¹(Ω),

which embeds continuously into C₀(Ω). In this setting, we can allow perturbations δ_1 ∈ W*, the dual space of W, so that the term (δ_1, y)_Ω in the objective is replaced by ⟨δ_1, y⟩_{W*,W}. Following standard arguments, one can show that (P_sc(δ)) has a unique solution (y^δ, u^δ) ∈ W × L²(Ω) for any given

    δ ∈ Z := W* × [L²(Ω)]²,

provided that the feasible set

    {(y, u) ∈ W × L²(Ω) : y = S u and y_a ≤ y ≤ y_b in Ω}

is nonempty, where S : L²(Ω) → W denotes the solution operator of −∆y = u in Ω, y = 0 on Γ. We prove
Theorem 1.1 ([Griesse, 2006, Theorem 2.3]):
There exists L₂ > 0 such that

    ‖y^δ − y^{δ′}‖_{H²(Ω)} + ‖u^δ − u^{δ′}‖_{L²(Ω)} ≤ L₂ ‖δ − δ′‖_Z.
This result was obtained from a variational argument, without reference to the adjoint
state or Lagrange multiplier, hence no Slater condition is required up to here. However,
whenever a Slater condition holds, it is known from Casas [1986] that there exists a unique measure µ ∈ M(Ω) = C₀(Ω)* and a unique adjoint state satisfying

    −∆p = −(y − y_d) − µ + δ_1 in Ω,    p = 0 on Γ,
    −∆y = u + δ_3 in Ω,    y = 0 on Γ,
    γ u − p = δ_2 in Ω,
    ⟨ȳ, µ⟩ ≤ ⟨y, µ⟩    for all ȳ ∈ W ∩ Y_ad,
see Proposition 2.4 of the following paper. The adjoint equation has to be understood in a very weak sense. We may easily derive a Lipschitz estimate for p^δ from the third equation,

    ‖p^δ − p^{δ′}‖_{L²(Ω)} ≤ (γ L₂ + 1) ‖δ − δ′‖_Z.
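Indeed (our one-line justification, not in the original text), the third equation gives p^δ = γ u^δ − δ_2, so

% Our short derivation of the adjoint estimate from gamma u - p = delta_2:
p^{\delta} - p^{\delta'}
  = \gamma\,(u^{\delta} - u^{\delta'}) - (\delta_2 - \delta_2')
\;\Longrightarrow\;
\|p^{\delta} - p^{\delta'}\|_{L^2(\Omega)}
  \le \gamma\,\|u^{\delta} - u^{\delta'}\|_{L^2(\Omega)} + \|\delta_2 - \delta_2'\|_{L^2(\Omega)}
  \le (\gamma L_2 + 1)\,\|\delta - \delta'\|_Z ,
% using Theorem 1.1 for the control term.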
However, a Lipschitz estimate for pδ in higher norms is not available, in contrast to
the control constrained case, compare Theorem 0.2 and Corollary 0.3.
As outlined in the introduction, the Implicit Function Theorem 0.6 can be used to
derive Lipschitz stability results in the presence of semilinear equations. In view of
the findings above for the linear-quadratic case, we choose X = W × [L2 (Ω)]2 as the
space for the unknowns and Z = W ∗ × [L2 (Ω)]2 as the space of perturbations. We
refer to Theorem 3.10 of the following publication for an application of this technique.
The case of Robin boundary control of a linear elliptic equation with state constraints
is treated as well. However, the same technique as above then only admits the Lipschitz
estimate

    ‖p^δ − p^{δ′}‖_{L²(Γ)} ≤ (γ L₂ + 1) ‖δ − δ′‖_Z
on the boundary Γ. Therefore, Lipschitz stability results for the case of boundary
control of semilinear equations remain an open problem.
LIPSCHITZ STABILITY OF SOLUTIONS TO SOME
STATE-CONSTRAINED ELLIPTIC OPTIMAL CONTROL
PROBLEMS
ROLAND GRIESSE
Abstract. In this paper, optimal control problems with pointwise state constraints for linear and semilinear elliptic partial differential equations are studied.
The problems are subject to perturbations in the problem data. Lipschitz stability
with respect to perturbations of the optimal control and the state and adjoint variables is established initially for linear–quadratic problems. Both the distributed
and Neumann boundary control cases are treated. Based on these results, and
using an implicit function theorem for generalized equations, Lipschitz stability is
also shown for an optimal control problem involving a semilinear elliptic equation.
1. Introduction
In this paper, we consider optimal control problems on bounded domains Ω ⊂ RN
of the form:
(1.1)    Minimize    ½ ‖y − y_d‖²_{L²(Ω)} + (γ/2) ‖u − u_d‖²_{L²(Ω)}
for the control u and state y, subject to linear or semilinear elliptic partial differential
equations. For instance, in the linear case with distributed control u we have
(1.2a)    −∆y + a_0 y = u on Ω,    y = 0 on ∂Ω,

while the boundary control case reads

(1.2b)    −∆y + a_0 y = f on Ω,    ∂y/∂n + β y = u on ∂Ω.

Instead of the Laplace operator, an elliptic operator in divergence form is also permitted. Moreover, the problem is subject to pointwise state constraints

(1.3)    y_a ≤ y ≤ y_b on Ω (or Ω̄),
where ya and yb are the lower and upper bound functions, respectively. Unless otherwise specified, ya and yb may be arbitrary functions with values in R ∪ {±∞} such
that ya ≤ yb holds everywhere. Problems of type (1.1)–(1.3) appear as subproblems
after linearization of semilinear state-constrained optimal control problems, such as
the example considered in Section 3, but they are also of independent interest.
Under suitable conditions, one can show the existence of an adjoint state and a
Lagrange multiplier associated with the state constraint (1.3). We refer to [9] for
distributed control of elliptic equations and [6, 10, 12, 13] for their boundary control.
We also mention [7, 8, 33] and [3–5, 7, 11, 31–33] for distributed and boundary control,
respectively, of parabolic equations. In the distributed case, the optimality system
comprises

    the state equation          −∆y + a_0 y = u on Ω,              (1.4)
    the adjoint equation        −∆λ = −(y − y_d) − µ on Ω,         (1.5)
    the optimality condition    γ (u − u_d) − λ = 0 on Ω,          (1.6)
and a complementarity condition for the multiplier µ associated with the state constraint (1.3).
In this paper, we extend the above-mentioned results by proving the Lipschitz
stability of solutions for semilinear and linear elliptic state-constrained optimal control
problems with respect to perturbations of the problem data. We begin by showing
that the linear–quadratic problem (1.1)–(1.3) admits solutions which depend Lipschitz
continuously on particular perturbations δ = (δ1 , δ2 , δ3 ) of the right hand sides in the
first order optimality system (1.4)–(1.6), i.e.,
    −∆λ + (y − y_d) + µ = δ_1 on Ω,
    γ (u − u_d) − λ = δ_2 on Ω,
    −∆y + a_0 y − u = δ_3 on Ω,
in the case of distributed control. The perturbations δ1 and δ2 generate additional
linear terms in the objective (1.1). Our main result for the linear–quadratic cases
is given in Theorems 2.3 and 4.3, for distributed and boundary control, respectively.
It has numerous applications: Firstly, it may serve as a starting point to prove the
convergence of numerical algorithms for nonlinear state-constrained optimal control
problems. The central notion in this context is the strong regularity property of
the first order necessary conditions, which precisely requires their linearization to
possess the Lipschitz stability proved in this paper, compare [2]. Secondly, proofs of
convergence of the discrete to the continuous solution as the mesh size tends to zero are
also based on the strong regularity property, see, e.g., [26]. Thirdly, our results ensure
the well-posedness of problem (1.1)–(1.3) in the following sense: If the optimality
system is solved only up to a residual δ (for instance, when solving it numerically), our
stability result implies that the approximate solution found is the exact and nearby
solution of a perturbed problem. Fourthly, our results can be used to prove the
Lipschitz stability for optimal control problems with semilinear elliptic equations and
with respect to more general perturbations by means of Dontchev’s implicit function
theorem for generalized equations, see [14]. We illustrate this technique in Section 3.
To the author’s knowledge, the Lipschitz dependence of solutions in optimal control
of partial differential equations (PDEs) in the presence of pointwise state constraints
has not yet been studied. Most existing results concern control-constrained problems:
Malanowski and Tröltzsch [28] prove Lipschitz dependence of solutions for a control-constrained optimal control problem for a linear elliptic PDE subject to nonlinear
Neumann boundary control. In the course of their proof, the authors establish the
Lipschitz property also for the linear–quadratic problem obtained by linearization of
the first order necessary conditions. In [36], Tröltzsch proves the Lipschitz stability for
a linear–quadratic optimal control problem involving a parabolic PDE. In Malanowski
and Tröltzsch [27], this result is extended to obtain Lipschitz stability in the case of
a semilinear parabolic equation. In the same situation, Malanowski [25] has recently
proved parameter differentiability. This result is extended in [18, 19] to an optimal
control problem governed by a system of semilinear parabolic equations, and numerical
results are provided there. All of the above citations cover the case of pointwise control
constraints. Note also that the general theory developed in [23] does not apply to the
problems treated in the present paper since the hypothesis of surjectivity [23, (H3)] is
not satisfied for bilateral state constraints (1.3).
The case of state-constrained optimal control problems governed by ordinary differential equations was studied in [15, 24]. The analysis in these papers relies heavily
on the property that the state constraint multiplier µ is Lipschitz on the interval [0, T ]
of interest (see, e.g., [22]), so it cannot be applied to the present situation.
The remainder of this paper is organized as follows: In Section 2, we establish the
Lipschitz continuity with respect to perturbations of optimal solutions in the linear–
quadratic distributed control case, in the presence of pointwise state constraints. In
Section 3, we use these results to obtain Lipschitz stability also for a problem governed
by a semilinear equation with distributed control, and with respect to a wider set of
perturbations. Finally, Section 4 is devoted to the case of Neumann (co-normal)
boundary control in the linear–quadratic case.
Throughout, let Ω be a bounded domain in R^N for some N ∈ N, and let Ω̄ denote its closure. By C(Ω̄) we denote the space of continuous functions on Ω̄, endowed with the norm of uniform convergence. C₀(Ω) is the subspace of C(Ω̄) of functions with zero trace on the boundary. The dual spaces of C(Ω̄) and C₀(Ω) are known to be M(Ω̄) and M(Ω), the spaces of finite signed regular measures with the total variation norm, see for instance [17, Proposition 7.16] or [35, Theorem 6.19]. Finally, we denote by W^{m,p}(Ω) the Sobolev space of functions on Ω whose distributional derivatives up to order m are in L^p(Ω), see Adams [1]. In particular, we write H^m(Ω) instead of W^{m,2}(Ω). The space W₀^{m,p}(Ω) is the closure of C_c^∞(Ω) (the space of infinitely differentiable functions on Ω with compact support) in W^{m,p}(Ω).
2. Linear–quadratic distributed control
Throughout this section, we are concerned with optimal control problems governed
by a state equation with an elliptic operator in divergence form and distributed control.
As delineated in the introduction, the problem depends on perturbation parameters
δ = (δ1 , δ2 , δ3 ):
(2.1)    Minimize    ½ ‖y − y_d‖²_{L²(Ω)} + (γ/2) ‖u − u_d‖²_{L²(Ω)} − ⟨y, δ_1⟩_{W,W′} − ∫_Ω u δ_2

         over        u ∈ L²(Ω)

(2.2)    s.t.        −div(A∇y) + a_0 y = u + δ_3 on Ω,
(2.3)                y = 0 on ∂Ω,
(2.4)    and         y_a ≤ y ≤ y_b on Ω.

We work with the state space W = H²(Ω) ∩ H₀¹(Ω), so that the pointwise state constraint (2.4) is meaningful. The perturbations are introduced below. Let us fix the
standing assumption for this section:
Assumption 2.1. Let Ω be a bounded domain in R^N (N ∈ {1, 2, 3}) with C^{1,1} boundary ∂Ω, see [20, p. 5]. The state equation is governed by an operator with N × N symmetric coefficient matrix A with entries a_ij which are Lipschitz continuous on Ω. We assume the condition of uniform ellipticity: There exists m_0 > 0 such that

    ξᵀ A ξ ≥ m_0 |ξ|²    for all ξ ∈ R^N and almost all x ∈ Ω.

The coefficient a_0 ∈ L^∞(Ω) is assumed to be nonnegative a.e. on Ω. Moreover, y_d and u_d denote desired states and controls in L²(Ω), respectively, while γ is a positive number. The bounds y_a and y_b may be arbitrary functions on Ω such that the admissible set K_W = {y ∈ W : y_a ≤ y ≤ y_b on Ω} is nonempty.
The following result allows us to define the solution operator

    T_δ : L²(Ω) → W

such that y = T_δ(u) satisfies (2.2)–(2.3) for given δ and u. For the proof we refer to [20, Theorems 2.4.2.5 and 2.3.3.2]:
Proposition 2.2 (The State Equation). Given u and δ_3 in L²(Ω), the state equation (2.2)–(2.3) has a unique solution y ∈ W in the sense that (2.2) is satisfied almost everywhere on Ω. The solution verifies the a priori estimate

(2.5)    ‖y‖_{H²(Ω)} ≤ c_A ‖u + δ_3‖_{L²(Ω)}.
In order to apply the results of this section to prove the Lipschitz stability of solutions in the semilinear case in Section 3, we consider here very general perturbations

    (δ_1, δ_2, δ_3) ∈ W′ × L²(Ω) × L²(Ω),

where W′ is the dual of the state space W. Of course, this comprises more regular perturbations. In particular, (2.1) includes perturbations of the desired state in view of

    ½ ‖y − (y_d + δ_1)‖²_{L²(Ω)} = ½ ‖y − y_d‖²_{L²(Ω)} − ∫_Ω y δ_1 + c,

where c is a constant. Likewise, δ_2 covers perturbations in the desired control u_d, and δ_3 accounts for perturbations in the right hand side of the PDE.
We can now state the main result of this section which proves the Lipschitz stability
of the optimal state and control with respect to perturbations. It relies on a variational
argument and does not invoke any dual variables.
Theorem 2.3 (Lipschitz Continuity). For any δ = (δ_1, δ_2, δ_3) ∈ W′ × L²(Ω) × L²(Ω), problem (2.1)–(2.4) has a unique solution. Moreover, there exists a constant L > 0 such that for any two perturbations (δ_1′, δ_2′, δ_3′) and (δ_1″, δ_2″, δ_3″), the corresponding solutions of (2.1)–(2.4) satisfy

    ‖y′ − y″‖_{H²(Ω)} + ‖u′ − u″‖_{L²(Ω)} ≤ L ( ‖δ_1′ − δ_1″‖_{W′} + ‖δ_2′ − δ_2″‖_{L²(Ω)} + ‖δ_3′ − δ_3″‖_{L²(Ω)} ).
Proof. Let δ ∈ W′ × L²(Ω) × L²(Ω) be arbitrary. We introduce the shifted control variable v := u + δ_3 and define

    f̃(y, v, δ) = ½ ‖y − y_d‖²_{L²(Ω)} + (γ/2) ‖v − u_d − δ_3‖²_{L²(Ω)} − ⟨y, δ_1⟩_{W,W′} − ∫_Ω (v − δ_3) δ_2.

Obviously, our problem is now to

    minimize f̃(y, v, δ)    subject to (y, v) ∈ M,

where M = {(y, v) ∈ K_W × L²(Ω) : −div(A∇y) + a_0 y = v on Ω}. Due to Assumption 2.1, the feasible set M is nonempty, closed and convex and also independent of
δ. In view of γ > 0 and the a priori estimate (2.5), the objective is strictly convex.
It is also weakly lower semicontinuous and radially unbounded, hence it is a standard
result from convex analysis [16, Chapter II, Proposition 1.2] that (2.1)–(2.4) has a
unique solution (y, u) ∈ W × L2 (Ω) for any δ.
A necessary and sufficient condition for optimality of (ȳ, v̄) is

(2.6)    f̃_y(ȳ, v̄, δ)(y − ȳ) + f̃_v(ȳ, v̄, δ)(v − v̄) ≥ 0    for all (y, v) ∈ M.
Now let δ′ and δ″ be two perturbations with corresponding solutions (y′, v′) and (y″, v″). From the variational inequality (2.6), evaluated at (y′, v′) and with (y, v) = (y″, v″), we obtain

    ∫_Ω (y′ − y_d)(y″ − y′) + γ ∫_Ω (v′ − u_d − δ_3′)(v″ − v′) − ⟨y″ − y′, δ_1′⟩_{W,W′} − ∫_Ω (v″ − v′) δ_2′ ≥ 0.
By interchanging the roles of (y′, v′) and (y″, v″) and adding the inequalities, we obtain
    ‖y′ − y″‖²_{L²(Ω)} + γ ‖v′ − v″‖²_{L²(Ω)}
        ≤ ⟨y′ − y″, δ_1′ − δ_1″⟩_{W,W′} + γ ∫_Ω (v′ − v″)(δ_3′ − δ_3″) + ∫_Ω (v′ − v″)(δ_2′ − δ_2″)
        ≤ ‖y′ − y″‖_{H²(Ω)} ‖δ_1′ − δ_1″‖_{W′} + ‖v′ − v″‖_{L²(Ω)} ( γ ‖δ_3′ − δ_3″‖_{L²(Ω)} + ‖δ_2′ − δ_2″‖_{L²(Ω)} ).

Using the a priori estimate (2.5), the left hand side can be replaced by

    (γ/2) ‖v′ − v″‖²_{L²(Ω)} + (γ / (2 c_A²)) ‖y′ − y″‖²_{H²(Ω)}.
Now we apply Young’s inequality to the right hand side and absorb the terms involving
the state and control into the left hand side, which yields the Lipschitz stability of y
and v, hence also of u.
As a precursor for the semilinear case in Section 3, we recall in Proposition 2.4 a
known result concerning the adjoint state and the Lagrange multiplier associated with
problem (2.1)–(2.4).
Proposition 2.4. Let δ ∈ W′ × L²(Ω) × L²(Ω) be a given perturbation and let (y, u) be the corresponding unique solution of (2.1)–(2.4). If K_W has nonempty interior, then there exists a unique adjoint variable λ ∈ L²(Ω) and a unique Lagrange multiplier µ ∈ W′ such that the following holds:

(2.7)    ∫_Ω −λ div(A∇ȳ) + ∫_Ω a_0 λ ȳ = −∫_Ω (y − y_d) ȳ + ⟨ȳ, δ_1 − µ⟩_{W,W′}    ∀ȳ ∈ W,
(2.8)    ⟨ȳ, µ⟩_{W,W′} ≤ ⟨y, µ⟩_{W,W′}    ∀ȳ ∈ K_W,
(2.9)    γ (u − u_d) − λ = δ_2 on Ω.
Proof. Let ỹ be an interior point of K_W. Since T_δ′(u) is an isomorphism from L²(Ω) → W, ũ can be chosen such that ỹ = T_δ(u) + T_δ′(u)(ũ − u), hence a Slater condition is satisfied. The rest of the proof can be carried out along the lines of Casas [9], or using the abstract multiplier theorem [10, Theorem 5.2].
In the proposition above, we have assumed that K_W has nonempty interior. This is not a very restrictive assumption, as any ỹ ∈ K_W satisfying ỹ − y_a ≥ ε and y_b − ỹ ≥ ε on Ω for some ε > 0 is an interior point of K_W.
Remark 2.5.
1. In [9], it was shown that the state constraint multiplier µ is indeed a measure
in M(Ω), i.e., µ has better regularity than just W 0 . However, in the following section
we will not be able to use this extra regularity.
2. In view of the previous statement, if δ1 ∈ M(Ω), then so is the right hand side
−(y − yd ) + δ1 − µ of the adjoint equation (2.7) and thus the adjoint state λ is an
element of W₀^{1,s}(Ω) for all s ∈ [1, N/(N−1)), see [9].
3. Note that we do not have a stability result for the Lagrange multiplier µ so that
we cannot use (2.7) to derive a stability result for the adjoint state λ even in the
presence of regular perturbations. This observation is very much in contrast with the
control-constrained case, where the control-constraint multiplier does not appear in the
adjoint equation’s right hand side and hence the stability of λ can be obtained using an
a priori estimate for the adjoint PDE.
4. Nevertheless, from the optimality condition (2.9) we can derive the Lipschitz estimate

(2.10)    ‖λ′ − λ″‖_{L²(Ω)} ≤ (γ L + 1) ‖δ′ − δ″‖
for the adjoint states belonging to two perturbations δ′ and δ″. However, we use here that the control is distributed on all of Ω.
We close this section by another observation: Let δ′ and δ″ be two perturbations with associated optimal states y′ and y″ and Lagrange multipliers µ′ and µ″. Then

    ⟨y′ − y″, µ′ − µ″⟩_{W,W′} ≤ 0

holds, as can be inferred directly from (2.8).
3. A semilinear distributed control problem
In this section we show how the Lipschitz stability results for state-constrained linear–quadratic optimal control problems can be transferred to semilinear problems using an appropriate implicit function theorem for generalized equations, see
Dontchev [14] and also Robinson [34]. To illustrate this technique, we consider the
following parameter-dependent problem P(p):
(3.1)    Minimize    ½ ‖y − y_d‖²_{L²(Ω)} + (γ/2) ‖u − u_d‖²_{L²(Ω)}

         over        u ∈ L²(Ω)

(3.2)    s.t.        −D ∆y + β y³ + α y = u + f on Ω,
(3.3)                y = 0 on ∂Ω,
(3.4)    and         y_a ≤ y ≤ y_b on Ω.
The semilinear state equation is a stationary Ginzburg–Landau model, see [21]. We
work again with the state space W = H 2 (Ω) ∩ H01 (Ω). Throughout this section, we
make the following standing assumption:
Assumption 3.1. Let Ω be a bounded domain in RN (N ∈ {1, 2, 3}) with C 1,1 boundary. Let D, α and β be positive numbers, and let f ∈ L2 (Ω). Moreover, let yd and ud
be in L2 (Ω) and γ > 0. The bounds ya and yb may be arbitrary functions on Ω such
that the admissible set KW = {y ∈ W : ya ≤ y ≤ yb on Ω} has nonempty interior.
The results obtained in this section can immediately be generalized to the state
equation
−div (A∇y) + φ(y) = u + f
with appropriate assumptions on the semilinear term φ(y). However, we prefer to
consider an example which explicitly contains a number of parameters which otherwise would be hidden in the nonlinearity. In the example above, we can take
p = (y_d, u_d, f, D, α, β, γ) ∈ Π = [L²(Ω)]³ × R⁴ as the perturbation parameter and we introduce

    Π⁺ = {p ∈ P : D > 0, α > 0, β > 0, γ > 0}.
In the sequel, we refer to problem (3.1)–(3.4) as P(p) when we wish to emphasize its
dependence on the parameter p. Note that in contrast to the previous section, the
parameter p now appears in a more complicated fashion which cannot be expressed
solely as right hand side perturbations of the optimality system.
Proposition 3.2 (The State Equation). For fixed parameter p ∈ Π+ and for any
given u in L2 (Ω), the state equation (3.2)–(3.3) has a unique solution y ∈ W in the
sense that y satisfies (3.2) almost everywhere on Ω. The solution depends Lipschitz
continuously on the data, i.e., there exists c > 0 such that
    ‖y − y′‖_{H₀¹(Ω)} ≤ c ‖u − u′‖_{L²(Ω)}
holds for all u, u′ in L²(Ω). Moreover, the nonlinear solution map

    T_p : L²(Ω) → H²(Ω) ∩ H₀¹(Ω)

defined by u ↦ y is Fréchet differentiable. Its derivative T_p′(u)δu at u in the direction of δu is given by the unique solution δy of

    −D ∆δy + (3 β y² + α) δy = δu on Ω,    δy = 0 on ∂Ω,

where y = T_p(u). Moreover, T_p′(u) is an isomorphism from L²(Ω) → W.
Proof. Existence and uniqueness in H01 (Ω) of the solution for (3.2)–(3.3) and the assertion of Lipschitz continuity follow from the theory of monotone operators, see [37,
p. 557], applied to
    A : H₀¹(Ω) ∋ y ↦ −D ∆y + β y³ + α y − f ∈ H⁻¹(Ω).
Note that A is strongly monotone, coercive, and hemicontinuous. The solution’s H 2 (Ω)
regularity now follows from considering βy³ as an additional source term, which is in
L2 (Ω) due to the Sobolev Embedding Theorem (see [1, p. 97]). Fréchet differentiability
of the solution map is a consequence of the implicit function theorem, see, e.g., [38,
p. 250]. The isomorphism property of Tp0 (u) follows from Proposition 2.2. Note that
3βy 2 + α ∈ L∞ (Ω) since y ∈ L∞ (Ω).
Before we turn to the main discussion, we state the following existence result for
global minimizers:
Lemma 3.3. For any given parameter p ∈ Π+ , P(p) has a global optimal solution.
Proof. The proof follows a standard argument and is therefore only sketched. Let
{(yn , un )} be a feasible minimizing sequence for the objective (3.1). Then {un } is
bounded in L2 (Ω) and, by Lipschitz continuity of the solution map, {yn } is bounded
in H01 (Ω). Extracting weakly convergent subsequences, one shows that the weak limit
satisfies the state equation (3.2)–(3.3). By compactness of the embedding H₀¹(Ω) ↪
L2 (Ω) (see [1, p. 144]) and extracting a pointwise a.e. convergent subsequence of {yn },
one sees that the limit satisfies the state constraint (3.4). Weak lower semicontinuity
of the objective (3.1) completes the proof.
For the remainder of this section, let p* = (y_d*, u_d*, f*, D*, α*, β*, γ*) ∈ Π⁺ denote a fixed reference parameter. Our strategy for proving the Lipschitz dependence of
solutions for P(p) near p∗ with respect to changes in the parameter p is as follows:
1. We verify a Slater condition and show that for every local optimal solution of
P(p∗ ), there exists an adjoint state and a Lagrange multiplier satisfying a certain first
order necessary optimality system (Proposition 3.5).
2. We pick a solution (y ∗ , u∗ , λ∗ ) of the first order optimality system (for instance
the global minimizer) and rewrite the optimality system as a generalized equation.
3. We linearize this generalized equation and introduce new perturbations δ which
correspond to right hand side perturbations of the optimality system. We identify
this generalized equation with the optimality system of an auxiliary linear-quadratic
optimal control problem AQP(δ), see Lemma 3.7.
4. We assume a coercivity condition (AC) for the Hessian of the Lagrangian at
(y ∗ , u∗ , λ∗ ) and use the results obtained in Section 2 to prove the existence and uniqueness of solutions to AQP(δ) and their Lipschitz continuity with respect to δ. Consequently, the solutions to the linearized generalized equation from Step 3 are unique
and depend Lipschitz continuously on δ (Proposition 3.9).
5. In virtue of an implicit function theorem for generalized equations [14], the
solutions of the optimality system for P(p) near p∗ are shown to be locally unique and
to depend Lipschitz continuously on the perturbation p (Theorem 3.10).
6. We verify that the coercivity condition (AC) implies second order sufficient
conditions, which are then shown to be stable under perturbations, to the effect that
solutions of the optimality system are indeed local optimal solutions of the perturbed
problem (Theorem 3.11).
We refer to the individual steps as Step 1–Step 6 and begin with Step 1. For
the proof of adjoint states and Lagrange multipliers, we verify the following Slater
condition:
Lemma 3.4 (Slater Condition). Let p ∈ Π⁺ and let u be a local optimal solution for problem P(p) with optimal state y = T_p(u). Then there exists ũ_p ∈ L²(Ω) such that

(3.5)    ỹ := T_p(u) + T_p′(u)(ũ_p − u)

lies in the interior of the set of admissible states K_W.
Proof. By Assumption 3.1 there exists an interior point ỹ of K_W. Since T_p′(u) is an isomorphism, ũ can be chosen such that (3.5) is satisfied.
Using this Slater condition, the following result follows directly from the abstract
multiplier theorem in [10, Theorem 5.2]:
Proposition 3.5 (Lagrange Multipliers). Let p ∈ Π⁺ and let (y, u) ∈ W × L²(Ω) be a local optimal solution for problem P(p). Then there exists a unique adjoint variable λ ∈ L²(Ω) and a unique Lagrange multiplier µ ∈ W′ such that

(3.6)    −D ∫_Ω λ ∆ȳ + ∫_Ω (3 β |y|² + α) λ ȳ = −∫_Ω (y − y_d) ȳ − ⟨ȳ, µ⟩_{W,W′}    ∀ȳ ∈ W,
(3.7)    ⟨ȳ, µ⟩_{W,W′} ≤ ⟨y, µ⟩_{W,W′}    ∀ȳ ∈ K_W,
(3.8)    γ (u − u_d) − λ = 0 on Ω.
From now on, we denote by (y ∗ , u∗ , λ∗ ) a local optimal solution of (3.1)–(3.4) for
the parameter p∗ with corresponding adjoint state λ∗ and multiplier µ∗ .
Our next Step 2 is to rewrite the optimality system as a generalized equation in the form 0 ∈ F(y, u, λ; p) + N(y), where N is a set-valued operator which represents the variational inequality (3.7) using the dual cone of the admissible set K_W. We define

    F : W × L²(Ω) × L²(Ω) × Π → W′ × L²(Ω) × L²(Ω),

    F(y, u, λ; p) = ( −D ∆λ + (3 β y² + α) λ + (y − y_d),
                      γ (u − u_d) − λ,
                      −D ∆y + β y³ + α y − u − f )

and

    N(y) = {µ ∈ W′ : ⟨ȳ − y, µ⟩_Ω ≤ 0 for all ȳ ∈ K_W} × {0} × {0} ⊂ Z

if y ∈ K_W, and N(y) = ∅ else. The term ∆λ is understood in the sense of distributions, i.e., ⟨∆λ, φ⟩_{W′,W} = ∫_Ω λ ∆φ for all φ ∈ W.
It is now easy to check that the optimality system (3.2)–(3.3), (3.6)–(3.8) is equivalent to the generalized equation

(3.9)    0 ∈ F(y, u, λ; p) + N(y).

Hence a solution (y, u, λ) of (3.9) for given p ∈ Π⁺ will be called a critical point. For future reference, we summarize the following evident properties of the operator F:
Lemma 3.6 (Properties of F ).
(a) F is partially Fréchet differentiable with respect to (y, u, λ) in a neighborhood
of (y ∗ , u∗ , λ∗ ; p∗ ). (This partial derivative is denoted by F 0 .)
(b) The map (y, u, λ; p) 7→ F 0 (y, u, λ; p) is continuous at (y ∗ , u∗ , λ∗ ; p∗ ).
(c) F is Lipschitz in p, uniformly in (y, u, λ) at (y ∗ , u∗ , λ∗ ), i.e., there exist L > 0
and neighborhoods U of (y ∗ , u∗ , λ∗ ) in W × L2 (Ω) × L2 (Ω) and V of p∗ in P
such that
    ‖F(y, u, λ; p₁) − F(y, u, λ; p₂)‖ ≤ L ‖p₁ − p₂‖_P
for all (y, u, λ) ∈ U and all p1 , p2 ∈ V .
In Step 3 we set up the following linearization:

(3.10)    δ ∈ F(y*, u*, λ*; p*) + F′(y*, u*, λ*; p*) (y − y*, u − u*, λ − λ*)ᵀ + N(y).
For the present example, (3.10) reads

(3.11)    (δ_1, δ_2, δ_3)ᵀ ∈ ( −D* ∆λ + (3 β* |y*|² + α*) λ + 6 β* y* λ* (y − y*) + y − y_d*,
                              γ* (u − u_d*) − λ,
                              −D* ∆y + (3 β* |y*|² + α*) y − 2 β* (y*)³ − u − f* ) + N(y).
We confirm in Lemma 3.7 below that (3.11) is exactly the first order optimality system
for the following auxiliary linear–quadratic optimal control problem, termed AQP(δ):
(3.12)    Minimize    ½ ‖y − y_d*‖²_{L²(Ω)} + 3 β* ∫_Ω y* λ* (y − y*)² + (γ*/2) ‖u − u_d*‖²_{L²(Ω)}
                      − ⟨y, δ_1⟩_{W,W′} − ∫_Ω u δ_2

          over        u ∈ L²(Ω)

(3.13)    s.t.        −D* ∆y + (3 β* |y*|² + α*) y = u + f* + 2 β* (y*)³ + δ_3 on Ω,
(3.14)                y = 0 on ∂Ω,
(3.15)    and         y_a ≤ y ≤ y_b on Ω.
Lemma 3.7. Let δ ∈ W′ × L²(Ω) × L²(Ω) be arbitrary. If (y, u) ∈ W × L²(Ω) is a local optimal solution for AQP(δ), then there exists a unique adjoint variable λ ∈ L²(Ω) and a unique Lagrange multiplier µ ∈ W′ such that (3.11) is satisfied with µ ∈ N(y).

Proof. We note that the state equation (3.13)–(3.14) defines an affine solution operator T : L²(Ω) → W which turns out to satisfy

    T(u) = T_{p*}(u*) + T_{p*}′(u*)(u − u* + δ_3).

Hence if u is a local optimal solution of (3.12)–(3.15) with optimal state y = T(u), then ỹ and ũ_{p*} − δ_3, taken from Lemma 3.4, satisfy the Slater condition ỹ = T(u) + T′(u)(ũ_{p*} − δ_3 − u) with ỹ in the interior of K_W. Along the lines of Casas [9], or using the abstract multiplier theorem [10, Theorem 5.2], one proves as in Proposition 2.4 that there exist λ ∈ L²(Ω) and µ ∈ W′ such that

    −D* ∫_Ω λ ∆ȳ + ∫_Ω [ (3 β* |y*|² + α*) λ + 6 β* y* λ* (y − y*) + y − y_d* ] ȳ = ⟨ȳ, δ_1 − µ⟩_{W,W′}    ∀ȳ ∈ W,
    γ* (u − u_d*) − λ = δ_2 on Ω,
    ⟨µ, ȳ − y⟩_{W′,W} ≤ 0    ∀ȳ ∈ K_W

hold. That is,

    −D* ∆λ + (3 β* |y*|² + α*) λ + 6 β* y* λ* (y − y*) + y − y_d* − δ_1 + µ = 0,

and µ ∈ N(y) holds. Hence, (3.11) is satisfied.
In order that AQP(δ) has a unique global solution, we assume the following coercivity property:
Assumption 3.8. Suppose that at the reference solution (y*, u*) with corresponding adjoint state λ*, there exists ρ > 0 such that

(AC)    ½ ‖y‖²_{L²(Ω)} + 3 β* ∫_Ω y* λ* |y|² + (γ*/2) ‖u‖²_{L²(Ω)} ≥ ρ ( ‖y‖²_{H²(Ω)} + ‖u‖²_{L²(Ω)} )

holds for all (y, u) ∈ W × L²(Ω) which obey

(3.16a)    −D* ∆y + (3 β* |y*|² + α*) y = u on Ω,
(3.16b)    y = 0 on ∂Ω.

Note that Assumption 3.8 is satisfied if β* ‖y* λ*‖_{L²(Ω)} is sufficiently small, since then the second term in (AC) can be absorbed into the third.
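A sketch of this absorption argument (ours, not part of the paper; it uses the embedding H²(Ω) ↪ L⁴(Ω) for N ≤ 3 and the a priori estimate for (3.16), with generic constants c_4 and c_A introduced for illustration):

% Our sketch of the absorption argument behind the preceding remark.
% For (y,u) satisfying (3.16), Hoelder's inequality, the embedding
% H^2(Omega) \hookrightarrow L^4(Omega) (constant c_4) and the a priori
% estimate \|y\|_{H^2} \le c_A \|u\|_{L^2} give
3\beta^* \int_\Omega y^* \lambda^* |y|^2
  \;\ge\; -\,3\beta^* \|y^*\lambda^*\|_{L^2(\Omega)} \,\|y\|_{L^4(\Omega)}^2
  \;\ge\; -\,3\beta^* c_4^2\, \|y^*\lambda^*\|_{L^2(\Omega)} \,\|y\|_{H^2(\Omega)}^2
  \;\ge\; -\,3\beta^* c_4^2 c_A^2\, \|y^*\lambda^*\|_{L^2(\Omega)} \,\|u\|_{L^2(\Omega)}^2 ,
% so the second term of (AC) is dominated by the third as soon as
% 3 beta^* c_4^2 c_A^2 \|y^* lambda^*\|_{L^2} is smaller than gamma^*/2,
% leaving a positive margin for the constant rho.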
Proposition 3.9. Suppose that Assumption 3.8 holds and let δ ∈ W 0 × L2 (Ω) × L2 (Ω)
be given. Then AQP(δ) is strictly convex and thus it has a unique global solution. The
generalized equation (3.11) is a necessary and sufficient condition for local optimality, hence (3.11) is also uniquely solvable. Moreover, the solution depends Lipschitz
continuously on δ.
Proof. Due to (AC), the quadratic part of the objective (3.12) is strictly convex,
independent of δ. Hence we may repeat the proof of Theorem 2.3 with only minor
modifications due to the now different objective (3.12). The existence of a unique
adjoint state follows as in Proposition 2.4 and it is Lipschitz in δ by (2.10). We
conclude that for any given δ, AQP(δ) has a unique solution (y, u) and adjoint state
λ which depend Lipschitz continuously on δ. In addition, the necessary conditions
(3.11) are sufficient, hence the generalized equation (3.10) is uniquely solvable and its
solution depends Lipschitz continuously on δ.
We note in passing that the property assured by Proposition 3.9 is called strong
regularity of the generalized equation (3.9). We are now in the position to give our
main theorem (Step 5):
Theorem 3.10 (Lipschitz Stability for P(p)). Let Assumption 3.8 be satisfied. Then there are numbers ε, ε′ > 0 such that for any two parameter vectors (y_d′, u_d′, f′, D′, α′, β′, γ′) and (y_d″, u_d″, f″, D″, α″, β″, γ″) in the ε-ball around p* in Π, there are critical points (y′, u′, λ′) and (y″, u″, λ″), i.e., solutions of (3.9), which are unique in the ε′-ball of (y*, u*, λ*). These solutions depend Lipschitz continuously on the parameter perturbation, i.e., there exists L > 0 such that

    ‖y′ − y″‖_{H²(Ω)} + ‖u′ − u″‖_{L²(Ω)} + ‖λ′ − λ″‖_{L²(Ω)}
        ≤ L ( ‖y_d′ − y_d″‖_{L²(Ω)} + ‖u_d′ − u_d″‖_{L²(Ω)} + ‖f′ − f″‖_{L²(Ω)} + |D′ − D″| + |α′ − α″| + |β′ − β″| + |γ′ − γ″| ).
Proof. Using the properties of F (Lemma 3.6) and the strong regularity of the first
order necessary optimality conditions (3.9) (Proposition 3.9), the claim follows directly
from the implicit function theorem for generalized equations [14, Theorem 2.4 and
Corollary 2.5].
In the sequel, we denote these critical points by (yp , up , λp ). Finally, in Step 6 we
are concerned with second order sufficient conditions:
Theorem 3.11 (Second Order Sufficient Conditions). Suppose that Assumption 3.8
holds and that ya , yb ∈ H 2 (Ω). Then second order sufficient conditions are satisfied at
(y ∗ , u∗ ). Moreover, there exists ε > 0 (possibly smaller than above) such that second
order sufficient conditions hold also at the perturbed critical points in the ε-ball around
p∗ . Hence they are indeed local minimizers of the perturbed problems P(p).
Proof. In order to apply the theory of Maurer [29], we make the following identifications:
    G_1(y, u) = ∆y − β y³ − α y + u + f,    K_1 = {0} ⊂ Y_1 = L²(Ω),
    G_2(y, u) = (y − y_a, y_b − y)ᵀ,        K_2 = [{ϕ ∈ H²(Ω) : ϕ ≥ 0 on Ω}]² ⊂ Y_2 = [H²(Ω)]².
Note that K2 is a convex closed cone of Y2 with nonempty interior. For instance,
ϕ ≡ 1 is an interior point. Since Π+ is open, one has p ∈ Π+ for all p such that
kp − p∗ k < ε for sufficiently small ε. Consequently, the Slater condition (Lemma 3.4)
is satisfied also at the perturbed critical points. That is, there exists ũ_p such that ỹ = T_p(u_p) + T_p′(u_p)(ũ_p − u_p) holds. This entails that (y_p, u_p) is a regular point in the sense of [29, equation (2.3)] with the choice

    h = ( T_p′(u_p)(ũ_p − u_p), ũ_p − u_p )ᵀ.
The multiplier theorem [29, Theorem 2.1] yields the existence of λ_p and nonnegative µ_p⁺, µ_p⁻ ∈ W′ which coincide with our adjoint variable and state constraint multiplier via µ_p = µ_p⁺ − µ_p⁻.
We continue by defining the Lagrangian

    L(y, u, λ, µ⁺, µ⁻; p) = ½ ‖y − y_d‖²_{L²(Ω)} + (γ/2) ‖u − u_d‖²_{L²(Ω)}
                            + ∫_Ω ( −∆y + β y³ + α y − u − f ) λ
                            + ⟨y_a − y, µ⁻⟩_{W,W′} + ⟨y − y_b, µ⁺⟩_{W,W′}.
By coercivity assumption (AC), abbreviating x = (y, u), we find that the Lagrangian’s
second derivative with respect to x,
    L_xx(y*, u*, λ*; p*)(x, x) = ½ ‖y‖²_{L²(Ω)} + 3 β* ∫_Ω y* λ* |y|² + (γ*/2) ‖u‖²_{L²(Ω)}
(which no longer depends on µ) is coercive on the space of all (y, u) satisfying (3.16),
thus, in particular, the second order sufficient conditions [29, Theorem 2.3] are satisfied
at the nominal critical point (y ∗ , u∗ , λ∗ ).
We now show that (AC) continues to hold at the perturbed Kuhn-Tucker points.
The technique of proof is inspired by [27, Lemma 5.2]. For a parameter p from the
ε-ball around p∗ , we denote by (yp , up , λp ) the corresponding solution of the first order
necessary conditions (3.9). One easily sees that
(3.17)    | L_xx(y_p, u_p, λ_p; p)(x, x) − L_xx(y*, u*, λ*; p*)(x, x) | ≤ c_1 ε′ ‖x‖²
holds for some c1 > 0 and for all x = (y, u) ∈ W × L2 (Ω), the norm being the usual
norm of the product space. For arbitrary u ∈ L2 (Ω), let y satisfy the linear PDE
−D∆y + (3βyp2 + α)y = u
y=0
on Ω
(3.18a)
on ∂Ω.
(3.18b)
Let y be the solution to (3.16) corresponding to the control u, then y − y satisfies
−D∗ ∆y + (3β ∗ |y ∗ |2+α∗ )y = (3β ∗ |y ∗ |2+α∗ )− (3βyp2+α) y + (D−D∗ )∆y on Ω
1. State Constrained Optimal Control Problems
25
and y = 0 on ∂Ω, i.e., by the standard a priori estimate and boundedness of k3βyp2 +
αkL∞ (Ω) near p∗ ,
ky − ykH 2 (Ω) ≤ c2 ε0 kykH 2 (Ω)
(3.20)
holds with some c2 > 0. Using the triangle inequality, we obtain from (3.20)
ky − ykH 2 (Ω) ≤
c2 ε0
kykH 2 (Ω) .
1 − c2 ε0
We have thus proved that for any x = (y, u) which satisfies (3.18), there exists x =
(y, u) which satisfies (3.16) such that
kx − xk ≤
c2 ε0
kxk.
1 − c2 ε0
(3.21)
Using the estimate from Maurer and Zowe [30, Lemma 5.5], it follows from (3.21) that
Lxx (y ∗ , u∗ , λ∗ ; p∗ )(x, x) ≥ ρ0 kxk2
(3.22)
holds with some ρ0 > 0. Combining (3.17) and (3.22) finally yields
Lxx (yp , up , λp ; p)(x, x) ≥ Lxx (y ∗ , u∗ , λ∗ ; p∗ )(x, x) − c1 ε0 kxk2
≥ (ρ0 − c1 ε0 )kxk2
which proves that (AC) holds at the perturbed Kuhn-Tucker points, possibly after
further reducing ε0 . Concluding as above for the nominal solution, the second order
sufficient conditions in [29, Theorem 2.3] imply that (yp , up ) is in fact a local optimal
solution for our problem (3.1)–(3.4).
4. Linear–quadratic boundary control
In this section, we briefly cover the case of optimal boundary control of a linear elliptic equation with a quadratic objective. Since the arguments are similar to those used in Section 2, the presentation is kept short. We consider the following optimal control problem, subject to perturbations $\delta = (\delta_1, \delta_2, \delta_3)$:
$$\text{Minimize} \quad \frac{1}{2}\|y - y_d\|_{L^2(\Omega)}^2 + \frac{\gamma}{2}\|u - u_d\|_{L^2(\partial\Omega)}^2 - \int_\Omega y\,d\delta_1 - \int_{\partial\Omega} u\,\delta_2 \tag{4.1}$$
over $u \in L^2(\partial\Omega)$, subject to
$$-\operatorname{div}(A\nabla y) + a_0\,y = f \quad \text{on } \Omega, \tag{4.2}$$
$$\partial y/\partial n_A + \beta\,y = u + \delta_3 \quad \text{on } \partial\Omega, \tag{4.3}$$
and
$$y_a \le y \le y_b \quad \text{on } \Omega, \tag{4.4}$$
where $\partial/\partial n_A$ denotes the co-normal derivative corresponding to $A$, i.e., $\partial y/\partial n_A = n^\top A\nabla y$. The standing assumption for this section is the following one:
Assumption 4.1. Let $\Omega$ be a bounded domain in $\mathbb{R}^N$ ($N \in \{1,2\}$) with $C^{1,1}$ boundary $\partial\Omega$, see [20, p. 5]. The state equation is governed by an operator with an $N\times N$ symmetric coefficient matrix $A$ whose entries $a_{ij}$ are Lipschitz continuous on $\Omega$. We assume uniform ellipticity: there exists $m_0 > 0$ such that
$$\xi^\top A(x)\,\xi \ge m_0\,|\xi|^2 \quad \text{for all } \xi \in \mathbb{R}^N \text{ and almost all } x \in \Omega.$$
The coefficient $a_0 \in L^\infty(\Omega)$ is assumed to satisfy $\operatorname{ess\,inf} a_0 > 0$, while $\beta \in L^\infty(\partial\Omega)$ is nonnegative. Finally, the source term $f$ is an element of $L^2(\Omega)$. Again, $y_d \in L^2(\Omega)$ and $u_d \in L^2(\partial\Omega)$ denote the desired state and control, while $\gamma$ is a positive number. The bounds $y_a$ and $y_b$ may be arbitrary functions on $\Omega$ such that the admissible set $K_{C(\Omega)} = \{y \in C(\Omega) : y_a \le y \le y_b \text{ on } \Omega\}$ is nonempty.
Note that we restrict ourselves to one- and two-dimensional domains, as in three
dimensions we would need the control u ∈ Ls (∂Ω) for some s > 2 to obtain solutions
in C(Ω) for which a pointwise state constraint is meaningful.
Proposition 4.2 (The State Equation). Under Assumption 4.1, and given $u$ and $\delta_3$ in $L^2(\partial\Omega)$, the state equation (4.2)–(4.3) has a unique solution $y \in H^1(\Omega) \cap C(\Omega)$ in the weak sense:
$$\int_\Omega A\nabla y\cdot\nabla \bar y + \int_\Omega a_0\,y\,\bar y + \int_{\partial\Omega} \beta\,y\,\bar y = \int_\Omega f\,\bar y + \int_{\partial\Omega} (u+\delta_3)\,\bar y \quad \text{for all } \bar y \in H^1(\Omega). \tag{4.5}$$
The solution satisfies the a priori estimate
$$\|y\|_{H^1(\Omega)} + \|y\|_{C(\Omega)} \le c_A\big(\|u\|_{L^2(\partial\Omega)} + \|\delta_3\|_{L^2(\partial\Omega)} + \|f\|_{L^2(\Omega)}\big).$$

Proof. Uniqueness and existence of the solution in $H^1(\Omega)$ and the a priori bound in $H^1(\Omega)$ follow directly from the Lax–Milgram theorem applied to the variational equation (4.5). The $C(\Omega)$ regularity and the corresponding a priori estimate follow from Casas [10, Theorem 3.1] if $\beta y$ is treated as a right hand side term.
The perturbations are taken as (δ1 , δ2 , δ3 ) ∈ M(Ω) × L2 (∂Ω) × L2 (∂Ω). They comprise in particular perturbations of the desired state yd and control ud . Notice that δ3
affects only the boundary data so that, as in the proof of Theorem 2.3, we can absorb
this perturbation into the control and obtain an admissible set independent of δ.
Theorem 4.3 (Lipschitz Continuity). For any $\delta = (\delta_1, \delta_2, \delta_3) \in M(\Omega) \times L^2(\partial\Omega) \times L^2(\partial\Omega)$, problem (4.1)–(4.4) has a unique solution. Moreover, there exists a constant $L > 0$ such that for any two perturbations $(\delta_1', \delta_2', \delta_3')$ and $(\delta_1'', \delta_2'', \delta_3'')$, the corresponding optimal solutions satisfy
$$\|y' - y''\|_{H^1(\Omega)} + \|y' - y''\|_{C(\Omega)} + \|u' - u''\|_{L^2(\partial\Omega)} \le L\big(\|\delta_1' - \delta_1''\|_{M(\Omega)} + \|\delta_2' - \delta_2''\|_{L^2(\partial\Omega)} + \|\delta_3' - \delta_3''\|_{L^2(\partial\Omega)}\big).$$
Similarly to the distributed control case, if $K_{C(\Omega)}$ has nonempty interior, one can prove the existence of an adjoint state $\lambda \in W^{1,s}(\Omega)$ for all $s \in [1, \frac{N}{N-1})$ and a Lagrange multiplier $\mu \in M(\Omega)$ such that
$$\langle \mu, \bar y - y\rangle_{M(\Omega),C(\Omega)} \le 0 \quad \forall\, \bar y \in K_{C(\Omega)}, \tag{4.6a}$$
$$\gamma(u - u_d) - \lambda = \delta_2 \quad \text{on } \partial\Omega, \tag{4.6b}$$
$$-\operatorname{div}(A\nabla\lambda) + a_0\,\lambda = -(y - y_d) - \mu_\Omega + \delta_{1,\Omega} \quad \text{on } \Omega, \tag{4.6c}$$
$$\frac{\partial\lambda}{\partial n_A} + \beta\,\lambda = -\mu_{\partial\Omega} + \delta_{1,\partial\Omega} \quad \text{on } \partial\Omega, \tag{4.6d}$$
where (4.6c) is understood in the sense of distributions and (4.6d) holds in the sense of traces (see Casas [10]). The measures $\mu_\Omega$ and $\mu_{\partial\Omega}$ are obtained by restricting $\mu$ to $\Omega$ and $\partial\Omega$, respectively, and the same splitting applies to $\delta_1$.

Note that, again, we have no stability result for the Lagrange multiplier $\mu$, and hence we cannot derive a stability result for the adjoint state $\lambda$ from (4.6c)–(4.6d). We merely obtain from (4.6b) that, on the boundary $\partial\Omega$,
$$\|\lambda' - \lambda''\|_{L^2(\partial\Omega)} \le (\gamma L + 1)\,\|\delta' - \delta''\|$$
holds. Unless the state constraint is restricted to the boundary $\partial\Omega$, this difficulty prevents the treatment of a semilinear boundary control case along the lines of Section 3.
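For completeness, the boundary estimate above can be verified in one line from (4.6b) and Theorem 4.3; the following sketch uses no ingredients beyond those already stated.

```latex
% From (4.6b), \lambda = \gamma(u - u_d) - \delta_2 on \partial\Omega, hence
\[
  \|\lambda' - \lambda''\|_{L^2(\partial\Omega)}
  \le \gamma\,\|u' - u''\|_{L^2(\partial\Omega)} + \|\delta_2' - \delta_2''\|_{L^2(\partial\Omega)}
  \le (\gamma L + 1)\,\|\delta' - \delta''\|,
\]
% where the first step is the triangle inequality and the second uses the Lipschitz
% estimate of Theorem 4.3.
```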
5. Conclusion
In this paper, we have proved the Lipschitz stability, with respect to perturbations, of solutions to pointwise state-constrained optimal control problems for elliptic equations. For distributed control, it was shown how the stability result for linear state equations can be extended to the semilinear case, using an implicit function theorem for generalized equations. In the boundary control case, this method does not seem applicable, since we lack a stability estimate for the state constraint multiplier and thus for the adjoint state on the domain $\Omega$. This is due to the fact that the control variable and the state constraint act on different sets, namely $\partial\Omega$ and $\Omega$.
Acknowledgments
The author would like to thank the anonymous referees for their suggestions which
have led to a significant improvement of the presentation. This work was supported
in part by the Austrian Science Fund under SFB F003 "Optimization and Control".
References
[1] Adams, R., Sobolev Spaces. New York: Academic Press 1975.
[2] Alt, W., The Lagrange-Newton method for infinite-dimensional optimization problems. Numer.
Funct. Anal. Optim. 11 (1990), 201 – 224.
[3] Arada, N. and Raymond, J. P., Optimality conditions for state-constrained Dirichlet boundary
control problems. J. Optim. Theory Appl. 102 (1999)(1), 51 – 68.
[4] Arada, N. and Raymond, J. P., Optimal control problems with mixed control-state constraints.
SIAM J. Control Optim. 39 (2000)(5), 1391 – 1407.
[5] Arada, N. and Raymond, J. P., Dirichlet boundary control of semilinear parabolic equations (II):
Problems with pointwise state constraints. Appl. Math. Optim. 45 (2002)(2), 145 – 167.
[6] Bergounioux, M., On boundary state constrained control problems. Numer. Funct. Anal. Optim.
14 (1993)(5–6), 515 – 543.
[7] Bergounioux, M., Optimal control of parabolic problems with state constraints: A penalization
method for optimality conditions. Appl. Math. Optim. 29 (1994)(3), 285 – 307.
[8] Bergounioux, M. and Tröltzsch, F., Optimality conditions and generalized bang-bang principle
for a state-constrained semilinear parabolic problem. Numer. Funct. Anal. Optim. 17 (1996)(5–
6), 517 – 536.
[9] Casas, E., Control of an elliptic problem with pointwise state constraints. SIAM J. Control
Optim. 24 (1986)(6), 1309 – 1318.
[10] Casas, E., Boundary control of semilinear elliptic equations with pointwise state constraints.
SIAM J. Control Optim. 31 (1993)(4), 993 – 1006.
[11] Casas, E., Raymond, J. P. and Zidani, H., Pontryagin’s principle for local solutions of control
problems with mixed control-state constraints. SIAM J. Control Optim., 39 (2000)(4), 1182 –
1203.
[12] Casas, E. and Tröltzsch, F., Second-order necessary optimality conditions for some state-constrained control problems of semilinear elliptic equations. Appl. Math. Optim. 39 (1999)(2), 211 – 227.
[13] Casas, E., Tröltzsch, F. and Unger, A., Second order sufficient optimality conditions for some state-constrained control problems of semilinear elliptic equations. SIAM J. Control Optim. 38 (2000)(5), 1369 – 1391.
[14] Dontchev, A., Implicit function theorems for generalized equations. Math. Programming 70
(1995), 91 – 106.
[15] Dontchev, A. and Hager, W., Lipschitzian stability for state constrained nonlinear optimal control problems. SIAM J. Control Optim. 36 (1998)(2), 698 – 718.
[16] Ekeland, I. and Temam, R., Convex Analysis and Variational Problems. Amsterdam: North-Holland 1976.
[17] Folland, G., Real Analysis. New York: Wiley 1984.
[18] Griesse, R., Parametric sensitivity analysis in optimal control of a reaction-diffusion system (I):
Solution differentiability. Numer. Funct. Anal. Optim. 25 (2004)(1–2), 93 – 117.
[19] Griesse, R., Parametric sensitivity analysis in optimal control of a reaction-diffusion system (II):
Practical methods and examples. Optim. Methods Softw. 19 (2004)(2), 217 – 242.
[20] Grisvard, P., Elliptic Problems in Nonsmooth Domains. Boston: Pitman 1985.
[21] Gunzburger, M., Hou, L. and Svobodny, T., Finite element approximations of an optimal control problem associated with the scalar Ginzburg–Landau equation. Comput. Math. Appl. 21
(1991)(2–3), 123 – 131.
[22] Hager, W.: Lipschitz continuity for constrained processes. SIAM J. Control Optim. 17 (1979),
321 – 338.
[23] Ito, K. and Kunisch, K., Sensitivity analysis of solutions to optimization problems in Hilbert spaces with applications to optimal control and estimation. J. Diff. Equations 99 (1992)(1), 1 – 40.
[24] Malanowski, K., Stability and sensitivity of solutions to nonlinear optimal control problems.
Appl. Math. Optim. 32 (1995)(2), 111 – 141.
[25] Malanowski, K., Sensitivity analysis for parametric optimal control of semilinear parabolic equations. J. Convex Anal. 9 (2002)(2), 543 – 561.
[26] Malanowski, K., Büskens, C. and Maurer, H., Convergence of approximations to nonlinear optimal control problems. In: Mathematical Programming with Data Perturbations (ed.: A. Fiacco).
Lecture Notes Pure Appl. Math. 195. New York: Dekker 1998, pp. 253 – 284
[27] Malanowski, K. and Tröltzsch, F., Lipschitz stability of solutions to parametric optimal control
for parabolic equations. Z. Anal. Anwendungen 18 (1999)(2), 469 – 489.
[28] Malanowski, K. and Tröltzsch, F., Lipschitz stability of solutions to parametric optimal control
for elliptic equations. Control Cybernet. 29 (2000), 237 – 256.
[29] Maurer, H., First and second order sufficient optimality conditions in mathematical programming
and optimal control. Math. Programming Study 14 (1981), 163 – 177.
[30] Maurer, H. and Zowe, J., First and second order necessary and sufficient optimality conditions
for infinite-dimensional programming problems. Math. Programming 16 (1979), 98 – 110.
[31] Raymond, J. P., Nonlinear boundary control of semilinear parabolic problems with pointwise
state constraints. Discrete Contin. Dynam. Systems Series A 3 (1997)(3), 341 – 370.
[32] Raymond, J. P., Pontryagin's principle for state-constrained control problems governed by parabolic equations with unbounded controls. SIAM J. Control Optim. 36 (1998)(6), 1853 – 1879.
[33] Raymond, J. P. and Tröltzsch, F., Second order sufficient optimality conditions for nonlinear
parabolic control problems with state constraints. Discrete Contin. Dynam. Systems Series A 6
(2000)(2), 431 – 450.
[34] Robinson, St. M., Strongly regular generalized equations. Math. Oper. Res. 5 (1980)(1), 43 – 62.
[35] Rudin, W., Real and Complex Analysis. New York: McGraw–Hill 1987.
[36] Tröltzsch, F., Lipschitz stability of solutions of linear-quadratic parabolic control problems with
respect to perturbations. Dynam. Contin. Discrete Impuls. Systems Series A 7 (2000)(2), 289 –
306.
[37] Zeidler, E., Nonlinear Functional Analysis and its Applications (Vol. II/B). New York: Springer
1990.
[38] Zeidler, E., Applied Functional Analysis: Main Principles and their Applications. New York:
Springer 1995.
2. Lipschitz Stability of Solutions for Elliptic Optimal Control Problems
with Pointwise Mixed Control-State Constraints
W. Alt, R. Griesse, N. Metla and A. Rösch: Lipschitz Stability for Elliptic Optimal
Control Problems with Mixed Control-State Constraints, submitted
In this manuscript, we analyze an optimal control problem of type (Pmc(δ)), but with additional pure control constraints. The problem under consideration is
$$(\mathbf{P}_{mcc}(\delta))\qquad \text{Minimize}\quad \frac{1}{2}\|y-y_d\|_{L^2(\Omega)}^2 + \frac{\gamma}{2}\|u-u_d\|_{L^2(\Omega)}^2 - (\delta_1,y)_\Omega - (\delta_2,u)_\Omega$$
subject to
$$-\Delta y = u + \delta_3 \ \text{ in } \Omega, \qquad y = 0 \ \text{ on } \Gamma,$$
and
$$u - \delta_4 \ge 0 \ \text{ in } \Omega, \qquad \varepsilon u + y - \delta_5 \ge y_c \ \text{ in } \Omega.$$
Here, $\varepsilon$ and $\gamma$ are positive numbers. From the point of view of Lipschitz stability, the perturbation of the inequality constraints by $\delta_4$ and $\delta_5$ poses no particular difficulty.
These perturbations are included in order to treat problems with nonlinear constraints
in the future. We consider only one-sided constraints in order to simplify the discussion
about the existence of regular Lagrange multipliers. Invoking a result from Rösch and
Tröltzsch [2006], we prove in Lemma 2.5 of the manuscript below that for any given
δ ∈ Z := L2 (Ω) × L∞ (Ω) × L2 (Ω) × L∞ (Ω) × L∞ (Ω),
the unique solution $(y^\delta, u^\delta)$ of $(\mathbf{P}_{mcc}(\delta))$ is characterized by the existence of Lagrange multipliers $\mu_1, \mu_2 \in L^\infty(\Omega)$ and an adjoint state $p \in H^2(\Omega)\cap H_0^1(\Omega)$ satisfying
$$\begin{aligned}
-\Delta p &= -(y - y_d) + \delta_1 + \mu_2 \ \text{ in }\Omega, & p &= 0 \ \text{ on }\Gamma,\\
-\Delta y &= u + \delta_3 \ \text{ in }\Omega, & y &= 0 \ \text{ on }\Gamma,\\
\gamma(u - u_d) - \delta_2 - p - \mu_1 - \varepsilon\mu_2 &= 0 && \text{a.e. in }\Omega,\\
0 \le \mu_1 \ \perp\ u - \delta_4 &\ge 0 && \text{a.e. in }\Omega,\\
0 \le \mu_2 \ \perp\ \varepsilon u + y - \delta_5 - y_c &\ge 0 && \text{a.e. in }\Omega.
\end{aligned} \tag{2.1}$$
However, the Lagrange multipliers and adjoint state need not be unique, and thus
one cannot prove Lipschitz stability without further assumptions (see Remark 2.6 and
Proposition 3.5 of the manuscript).
Remark 2.1:
In the absence of the first inequality constraint $u - \delta_4 \ge 0$, i.e., in the case $\mu_1 = 0$, we see that
$$-\Delta p + \varepsilon^{-1} p = -(y^\delta - y_d) + \delta_1 + \varepsilon^{-1}\gamma\,(u^\delta - u_d) - \varepsilon^{-1}\delta_2$$
holds on $\Omega$. In view of the uniqueness of $(y^\delta, u^\delta)$, $p$ and hence also $\mu_2$ must be unique. One may now proceed in a straightforward way, testing the adjoint equation with $y^\delta - y^{\delta'}$, the state equation with $p^\delta - p^{\delta'}$, and the gradient equation with $u^\delta - u^{\delta'}$, to obtain a result analogous to Theorem 0.2 (see p. 8):
$$\|y^\delta - y^{\delta'}\|_{H^2(\Omega)} + \|u^\delta - u^{\delta'}\|_{L^2(\Omega)} + \|p^\delta - p^{\delta'}\|_{H^2(\Omega)} + \|\mu_{2,\delta} - \mu_{2,\delta'}\|_{L^2(\Omega)} \le L\,\|\delta - \delta'\|_{[L^2(\Omega)]^3}.$$
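The displayed equation for $p$ is obtained by eliminating $\mu_2$; spelled out, using only the relations of (2.1) with $\mu_1 = 0$:

```latex
\[
  \mu_2 = \varepsilon^{-1}\big(\gamma(u^\delta - u_d) - p - \delta_2\big)
  \quad\Longrightarrow\quad
  -\Delta p = -(y^\delta - y_d) + \delta_1
            + \varepsilon^{-1}\gamma(u^\delta - u_d) - \varepsilon^{-1} p - \varepsilon^{-1}\delta_2 ,
\]
% which is the stated equation after moving the term \varepsilon^{-1} p to the left hand side.
```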
We conclude that the additional level of difficulty in (Pmcc (δ)) is not caused by the
mixed constraints alone but by the simultaneous presence of the two inequality constraints on the same set Ω.
The assumption which allows us to overcome this difficulty is
Assumption 2.2:
Suppose that there exists $\sigma > 0$ such that the sets
$$S_1^\sigma := \{x\in\Omega : 0 \le u_0 \le \sigma\}, \qquad S_2^\sigma := \{x\in\Omega : 0 \le \varepsilon u_0 + y_0 - y_c \le \sigma\}$$
satisfy $S_1^\sigma \cap S_2^\sigma = \emptyset$.
We proceed by showing that there exists $G > 0$ such that for any $\delta\in Z$ satisfying
$$\|\delta\|_Z \le G\,\sigma, \tag{2.2}$$
the active sets
$$A_1^\delta := \{x\in\Omega : u^\delta - \delta_4 = 0\}, \qquad A_2^\delta := \{x\in\Omega : \varepsilon u^\delta + y^\delta - \delta_5 - y_c = 0\}$$
corresponding to $(\mathbf{P}_{mcc}(\delta))$ do not intersect; see Lemma 4.1 of the manuscript below. Consequently, the Lagrange multipliers and adjoint state are unique; they will be denoted by $\mu_{1,\delta}$, $\mu_{2,\delta}$, and $p_\delta$, respectively. Our main result is:
Theorem 2.3 ([Alt, Griesse, Metla, and Rösch, 2006, Theorem 4.2, Corollary 4.4]):
Suppose that $\delta, \delta' \in Z$ satisfy (2.2). Then there exists $L_\infty > 0$ such that
$$\|y^\delta - y^{\delta'}\|_{H^2(\Omega)} + \|u^\delta - u^{\delta'}\|_{L^\infty(\Omega)} + \|p^\delta - p^{\delta'}\|_{H^2(\Omega)} + \|\mu_{1,\delta} - \mu_{1,\delta'}\|_{L^\infty(\Omega)} + \|\mu_{2,\delta} - \mu_{2,\delta'}\|_{L^\infty(\Omega)} \le L_\infty\,\|\delta - \delta'\|_Z.$$
Remark 2.4:
It is possible to replace the space H 2 (Ω) ∩ H01 (Ω) for the state and adjoint state by
H01 (Ω) ∩ L∞ (Ω), and thus relax the regularity requirement for Ω.
LIPSCHITZ STABILITY FOR ELLIPTIC OPTIMAL CONTROL
PROBLEMS WITH MIXED CONTROL-STATE CONSTRAINTS
WALTER ALT, ROLAND GRIESSE, NATALIYA METLA, AND ARND RÖSCH
Abstract. A family of linear-quadratic optimal control problems with pointwise mixed control-state constraints governed by linear elliptic partial differential equations is considered. All data depend on a vector parameter of perturbations. Lipschitz stability with respect to perturbations of the optimal control, the state and adjoint variables, and the Lagrange multipliers is established.
1. Introduction
In this paper we consider the following class of linear-quadratic optimal control problems:
$$(\mathbf{P}(\delta))\qquad \text{Minimize}\quad \frac{1}{2}\|y - y_d\|_{L^2(\Omega)}^2 + \frac{\gamma}{2}\|u - u_d\|_{L^2(\Omega)}^2 - \int_\Omega y\,\delta_1\,dx - \int_\Omega u\,\delta_2\,dx$$
subject to $u \in L^2(\Omega)$ and the elliptic state equation
$$Ay = u + \delta_3 \ \text{ on }\Omega, \qquad y = 0 \ \text{ on }\partial\Omega, \tag{1.1}$$
as well as the pointwise constraints
$$u - \delta_4 \ge 0 \ \text{ on }\Omega, \qquad \varepsilon u + y - \delta_5 \ge y_c \ \text{ on }\Omega. \tag{1.2}$$
Above, Ω is a bounded domain in RN , N ∈ {2, 3}, which is convex or has a C 1,1
boundary. In (1.1), A is an elliptic operator in H01 (Ω) specified below, and ε and γ
are positive numbers. The desired state yd is a function in L2 (Ω), while the desired
control ud and the bound yc are functions in L∞ (Ω).
Problem $(\mathbf{P}(\delta))$ depends on a parameter $\delta = (\delta_1,\delta_2,\delta_3,\delta_4,\delta_5) \in L^2(\Omega)\times L^\infty(\Omega)\times L^2(\Omega)\times L^\infty(\Omega)\times L^\infty(\Omega)$. The main contribution of this paper is to prove, in $L^\infty(\Omega)$, the Lipschitz stability of the unique optimal solution of $(\mathbf{P}(\delta))$ with respect to perturbations in $\delta$. The stability analysis of linear-quadratic problems plays an essential role in the analysis of nonlinear optimal control problems, in the convergence of the SQP method, and in the convergence of solutions of discretized problems to solutions of the continuous problem.
Problems with mixed control-state constraints are important as Lavrentiev-type regularizations of pointwise state-constrained problems [15–17], but they are also interesting in their own right. In the former case, $\varepsilon$ is a small parameter tending to zero.
For the purpose of this paper, we consider ε to be fixed. Note that in addition to
the mixed control-state constraints, a pure control constraint is present on the same
domain.
Let us put our work into perspective. One of the fundamental results in stability
analysis of solutions to optimization problems is Robinson’s implicit function theorem
for generalized equations (see [18]). Further developments and applications of Robinson’s result to parametric control problems involving control constraints and discretizations of control problems can be found e.g. in [2–4,6,7,10,13]. For more references see
the bibliography in [12], where the stability of optimal solutions involving nonlinear
ordinary differential equations and control-state constraints was investigated.
Problems of type (P(δ)) were investigated in [19] and the existence of regular (L2 )
Lagrange multipliers was proved, but no perturbations were considered. For elliptic
partial differential equations, Lipschitz stability results are available only for problems
with pointwise pure control constraints [14] and pure state constraints [8].
The presence of simultaneous control and mixed constraints (1.2) complicates our
analysis. The multipliers associated to these constraints are present in every equation
involving the adjoint state. Therefore, the direct estimation of the norm of the adjoint
state, which was used in [8, 14], is not possible in the present situation. In addition,
the simultaneous constraints preclude the transformation used in [17], where a mixed
control-state constraint was converted to a pure control constraint by defining a new
control $v := \varepsilon u + y$. While this transformation simplifies our mixed constraint to $v \ge y_c + \delta_5$, it also converts the simple constraint $u \ge \delta_4$ into the mixed constraint $v - y \ge \varepsilon\delta_4$, and nothing is gained. In order to prove the Lipschitz stability result,
we need to assume that the active sets for mixed and control constraints are well
separated at the reference problem δ = 0.
The outline of the paper is as follows: In Section 2, we investigate some basic properties of problem (P(δ)) for a fixed parameter δ. In particular, we state a projection
formula for the Lagrange multipliers. Section 3 is devoted to the Lipschitz stability
analysis of an auxiliary optimal control problem. This auxiliary problem is introduced
to exclude the possibility of overlapping active sets for both types of constraints. In
Section 4, we prove that the solutions of the auxiliary and the original problems coincide and obtain our main results.
2. Properties of the Optimal Control Problem
In this section we investigate the elliptic optimal control problem (P(δ)) with
pointwise mixed control-state constraints for a fixed parameter $\delta$. The problem corresponding to $\delta = 0$ is referred to as the unperturbed (reference) problem.
Throughout, (·, ·) denotes the scalar product in L2 (Ω) or L2 (Ω)N , respectively.
The following assumptions (A1)–(A3) are assumed to hold throughout the paper.
Assumption.
(A1) Let $\Omega$ be a bounded domain in $\mathbb{R}^N$, $N\in\{2,3\}$, which is convex or has a $C^{1,1}$ boundary $\partial\Omega$.
(A2) The operator $A : H_0^1(\Omega)\to H^{-1}(\Omega)$ is defined as $\langle Ay, v\rangle = a[y,v]$, where
$$a[y,v] = (A_0\nabla y, \nabla v) + (b^\top\nabla y, v) + (c\,y, v).$$
$A_0$ is an $N\times N$ matrix with Lipschitz continuous entries on $\Omega$ such that $\xi^\top A_0(x)\,\xi \ge m_0|\xi|^2$ holds with some $m_0 > 0$ for all $\xi\in\mathbb{R}^N$ and almost all $x\in\Omega$. Moreover, $b\in L^\infty(\Omega)^N$ and $c\in L^\infty(\Omega)$. The bilinear form $a[\cdot,\cdot]$ is not necessarily symmetric, but it is assumed to be continuous and coercive, i.e.,
$$a[y,v] \le \bar c\,\|y\|_{H^1(\Omega)}\|v\|_{H^1(\Omega)}, \qquad a[y,y] \ge \underline c\,\|y\|_{H^1(\Omega)}^2$$
for all $y, v\in H_0^1(\Omega)$ with some positive constants $\bar c$ and $\underline c$. A simple example is $a[y,v] = (\nabla y, \nabla v)$, corresponding to $A = -\Delta$.
(A3) For the remaining data, we assume $\varepsilon > 0$, $\gamma > 0$, $y_d\in L^2(\Omega)$, $u_d, y_c\in L^\infty(\Omega)$ and $\delta\in Z$, where
$$Z := L^2(\Omega)\times L^\infty(\Omega)\times L^2(\Omega)\times L^\infty(\Omega)\times L^\infty(\Omega).$$
Under these assumptions we show in this section that (P(δ)) possesses a unique
solution and we characterize this solution.
Definition 2.1. A function $y$ is called a weak solution of the elliptic PDE
$$Ay = f \ \text{ on }\Omega, \qquad y = 0 \ \text{ on }\partial\Omega \tag{2.1}$$
if $y \in H_0^1(\Omega)$ and $a[y,v] = (f,v)$ holds for all $v \in H_0^1(\Omega)$.

It is known that (2.1) has a unique weak solution in
$$Y := H^2(\Omega)\cap H_0^1(\Omega).$$
Lemma 2.2. Let assumptions (A1)–(A2) hold. For any given right hand side $f \in L^2(\Omega)$, there exists a unique weak solution of (2.1) in the space $Y$. It satisfies the a priori estimate
$$\|y\|_{H^2(\Omega)} \le C_\Omega\,\|f\|_{L^2(\Omega)}. \tag{2.2}$$
Moreover, the maximum principle holds, i.e., $f \ge 0$ a.e. on $\Omega$ implies $y \ge 0$ a.e. on $\Omega$.

Proof. The proof of $H^2(\Omega)$-regularity and the a priori estimate can be found in [9, Theorem 2.4.2.5]. For the proof of the maximum principle, we use $v = y^- = -\min\{0,y\} \in H_0^1(\Omega)$ as a test function [11]. We obtain
$$\underline c\,\|y^-\|_{H^1(\Omega)}^2 \le a[y^-,y^-] = -a[y,y^-] = -(f,y^-) \le 0,$$
hence $y^- = 0$ and $y \ge 0$ almost everywhere on $\Omega$.
The previous lemma gives rise to the definition of the linear solution mapping
$$S : L^2(\Omega) \ni f \longmapsto y = Sf \in Y.$$
We recall that, due to the Sobolev embedding theorem [1], there exist $C_\infty > 0$ and $C_2 > 0$ such that
$$\|y\|_{L^\infty(\Omega)} \le C_\infty\|y\|_{H^2(\Omega)} \ \ \forall y\in H^2(\Omega), \qquad \|y\|_{L^2(\Omega)} \le C_2\|y\|_{L^\infty(\Omega)} \ \ \forall y\in L^\infty(\Omega).$$
Lemma 2.3. For any $\delta\in Z$, problem $(\mathbf{P}(\delta))$ admits a feasible pair $(y,u)$ satisfying (1.1)–(1.2).

Proof. Let $\delta\in Z$ be given and define the constant function
$$u(x) := \frac{1}{\varepsilon}\Big(C_\infty C_\Omega\|\delta_3\|_{L^2(\Omega)} + \|y_c\|_{L^\infty(\Omega)} + \|\delta_5\|_{L^\infty(\Omega)}\Big) + \|\delta_4\|_{L^\infty(\Omega)} = \mathrm{const} \ge 0$$
for all $x\in\Omega$. Then $u - \delta_4 \ge 0$ holds a.e. on $\Omega$. Moreover, we define $y := S(u+\delta_3)$ and estimate
$$\varepsilon u + y - y_c - \delta_5 = C_\infty C_\Omega\|\delta_3\|_{L^2(\Omega)} + \|y_c\|_{L^\infty(\Omega)} + \|\delta_5\|_{L^\infty(\Omega)} + \varepsilon\|\delta_4\|_{L^\infty(\Omega)} + Su + S\delta_3 - y_c - \delta_5 \ge C_\infty C_\Omega\|\delta_3\|_{L^2(\Omega)} + \varepsilon\|\delta_4\|_{L^\infty(\Omega)} + Su + S\delta_3.$$
Due to Lemma 2.2, we have $\|S\delta_3\|_{L^\infty(\Omega)} \le C_\infty\|S\delta_3\|_{H^2(\Omega)} \le C_\infty C_\Omega\|\delta_3\|_{L^2(\Omega)}$ and $Su\ge 0$. It follows that
$$\varepsilon u + y - y_c - \delta_5 \ge \varepsilon\,\|\delta_4\|_{L^\infty(\Omega)} \ge 0 \quad \text{a.e. on }\Omega,$$
hence (1.1)–(1.2) are satisfied.
For future reference, we define the cost functional associated with $(\mathbf{P}(\delta))$,
$$J(y,u,\delta) := \frac{1}{2}\|y-y_d\|_{L^2(\Omega)}^2 + \frac{\gamma}{2}\|u-u_d\|_{L^2(\Omega)}^2 - \int_\Omega y\,\delta_1\,dx - \int_\Omega u\,\delta_2\,dx,$$
and the reduced cost functional
$$\tilde J(u,\delta) := J(S(u+\delta_3), u, \delta).$$
Lemma 2.4. For any $\delta\in Z$, problem $(\mathbf{P}(\delta))$ has a unique global optimal solution.

Proof. Let $\delta\in Z$ be given and define
$$M_\delta := \{u\in L^2(\Omega) : u \ge \delta_4,\ \varepsilon u + S(u+\delta_3) - \delta_5 \ge y_c \ \text{a.e. on }\Omega\}.$$
Note that $M_\delta$ is a convex subset of $L^2(\Omega)$ since $S$ is a linear operator, and $M_\delta$ is nonempty due to Lemma 2.3. It is easy to see that the reduced cost functional $M_\delta\ni u\longmapsto \tilde J(u,\delta)\in\mathbb{R}$ is strictly convex on $M_\delta$, radially unbounded, and weakly lower semicontinuous. By a classical result from convex analysis, see e.g. [21], $(\mathbf{P}(\delta))$ has a unique global solution.
Let us define the Lagrange functional $\mathcal{L} : Y\times L^2(\Omega)\times Y\times L^2(\Omega)\times L^2(\Omega)\to\mathbb{R}$,
$$\mathcal{L}(y,u,p,\mu_1,\mu_2) = J(y,u,\delta) + a[y,p] - (p,\ u+\delta_3) - (\mu_1,\ u-\delta_4) - (\mu_2,\ \varepsilon u + y - y_c - \delta_5).$$
From the general Kuhn–Tucker theory in Banach spaces, one expects the optimal solution of $(\mathbf{P}(\delta))$ to possess associated Lagrange multipliers $p\in L^2(\Omega)$ and $\mu_i\in L^\infty(\Omega)^*$. However, for the problem $(\mathbf{P}(\delta))$ under consideration and other control problems of bottleneck type, the existence of regular Lagrange multipliers $\mu_i\in L^\infty(\Omega)$ was shown in [19, Theorem 7.3], which implies $p\in Y$.
Lemma 2.5 (Optimality System). Let $\delta\in Z$ be given.
(i) Suppose that $(y,u)$ is the unique global solution of $(\mathbf{P}(\delta))$. Then there exist Lagrange multipliers $\mu_i\in L^\infty(\Omega)$, $i=1,2$, and an adjoint state $p\in Y$ such that
$$(y-y_d,\ v) - (\delta_1,\ v) + a[v,p] - (\mu_2,\ v) = 0 \quad \forall v\in H_0^1(\Omega), \tag{2.3}$$
$$\gamma(u-u_d,\ v) - (\delta_2,\ v) - (p,\ v) - (\mu_1,\ v) - (\varepsilon\mu_2,\ v) = 0 \quad \forall v\in L^2(\Omega), \tag{2.4}$$
$$a[y,v] - (u,\ v) - (\delta_3,\ v) = 0 \quad \forall v\in H_0^1(\Omega), \tag{2.5}$$
$$\left.\begin{aligned} &\mu_1\,(u-\delta_4) = 0, \quad \mu_1\ge 0, \quad u-\delta_4\ge 0, \\ &\mu_2\,(\varepsilon u + y - y_c - \delta_5) = 0, \quad \mu_2\ge 0, \quad \varepsilon u + y - y_c - \delta_5\ge 0 \end{aligned}\right\} \quad \text{a.e. on }\Omega \tag{2.6}$$
is satisfied.
(ii) Conversely, if $(y^*,u^*,p^*,\mu_1^*,\mu_2^*)\in Y\times L^2(\Omega)\times Y\times L^2(\Omega)\times L^2(\Omega)$ satisfies (2.3)–(2.6), then $(y^*,u^*)$ is the unique global optimum of $(\mathbf{P}(\delta))$.
Proof. Part (i) was proved in [19, Theorem 7.3]. For part (ii), let $(y,u)$ be any admissible pair for $(\mathbf{P}(\delta))$, i.e., satisfying (1.1)–(1.2). We consider the difference
$$J(y,u,\delta) - J(y^*,u^*,\delta) = \frac{1}{2}\|y-y^*\|_{L^2(\Omega)}^2 + \frac{\gamma}{2}\|u-u^*\|_{L^2(\Omega)}^2 + (y-y^*,\ y^*-y_d) - (y-y^*,\ \delta_1) + \gamma(u-u^*,\ u^*-u_d) - (u-u^*,\ \delta_2),$$
where we used $\|a\|^2 - \|b\|^2 = \|a-b\|^2 + 2(a-b,\,b)$. To evaluate the terms in the scalar products, we use equations (2.3)–(2.5). First, (2.4) yields
$$\gamma(u^*-u_d,\ u-u^*) - (\delta_2,\ u-u^*) = (p^*,\ u-u^*) + (\mu_1^*,\ u-u^*) + (\varepsilon\mu_2^*,\ u-u^*).$$
Since both $(y,u)$ and $(y^*,u^*)$ satisfy (2.5), we obtain for their difference that
$$a[y-y^*,\ p^*] = (u-u^*,\ p^*)$$
holds. Finally, using $v = y-y^*$ in (2.3) for $p^*$, we get
$$(y^*-y_d,\ y-y^*) - (\delta_1,\ y-y^*) + a[y-y^*,\ p^*] = (\mu_2^*,\ y-y^*).$$
Hence we conclude
$$J(y,u,\delta) - J(y^*,u^*,\delta) = \frac{1}{2}\|y-y^*\|_{L^2(\Omega)}^2 + \frac{\gamma}{2}\|u-u^*\|_{L^2(\Omega)}^2 + \big(y-y^* + \varepsilon(u-u^*),\ \mu_2^*\big) + (u-u^*,\ \mu_1^*).$$
Note that by (2.6), we have $\mu_1^*(u^*-\delta_4) = 0$ and $\mu_1^*(u-\delta_4) \ge 0$ a.e. on $\Omega$, hence $(u-u^*,\ \mu_1^*) \ge 0$. Similarly, one obtains $\big(y-y^* + \varepsilon(u-u^*),\ \mu_2^*\big) \ge 0$. Consequently, we have
$$J(y,u,\delta) - J(y^*,u^*,\delta) \ge \frac{\gamma}{2}\|u-u^*\|_{L^2(\Omega)}^2,$$
which shows that $(y^*,u^*)$ is the unique global solution.
Remark 2.6. The Lagrange multipliers $\mu_i$ and the adjoint state $p$ associated with the unique solution of $(\mathbf{P}(\delta))$ need not be unique. Consider the following example on an arbitrary bounded domain $\Omega$ with Lipschitz boundary:
$$\text{Minimize}\quad \frac{1}{2}\|y\|_{L^2(\Omega)}^2 + \frac{\gamma}{2}\|u-u_d\|_{L^2(\Omega)}^2$$
subject to
$$-\Delta y = u \ \text{ on }\Omega, \qquad y = 0 \ \text{ on }\partial\Omega, \qquad u\ge 0 \ \text{ on }\Omega, \qquad \varepsilon u + y \ge 0 \ \text{ on }\Omega.$$
Suppose that $u_d := -\gamma^{-1}(\varepsilon + S\mathbf{1})$, where $\mathbf{1}$ denotes the constant function $1$. Due to the maximum principle (Lemma 2.2), $u_d \le -\gamma^{-1}\varepsilon$ holds a.e. on $\Omega$. Apparently, $y = u = 0$ is the unique solution of this problem. Any tuple $(p,\mu_1,\mu_2)$ satisfying (2.3), (2.4) and (2.6), i.e.,
$$-\Delta p = \mu_2 \ \text{ on }\Omega, \qquad p = 0 \ \text{ on }\partial\Omega, \qquad \mu_1 \ge 0,\ \mu_2 \ge 0 \ \text{ a.e. on }\Omega, \qquad -\gamma u_d - p - \mu_1 - \varepsilon\mu_2 = 0 \ \text{ a.e. on }\Omega,$$
is a set of Lagrange multipliers for the problem. It is easy to check that $(p,\mu_1,\mu_2) = (S\mathbf{1}, 0, \mathbf{1})$ and $(p,\mu_1,\mu_2) = (0, \varepsilon + S\mathbf{1}, 0)$ both satisfy this system, and so does any convex combination.
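The algebra of this example is easy to check numerically. The following sketch does so on the one-dimensional interval $\Omega=(0,1)$ with $A = -d^2/dx^2$, for which $S\mathbf{1}$ has the closed form $x(1-x)/2$; this is only an illustration of Remark 2.6, not part of the paper, and the remark itself is stated for a general Lipschitz domain.

```python
import numpy as np

# Numerical illustration of Remark 2.6 on Omega = (0,1) with A = -d^2/dx^2
# (a 1D stand-in; the remark is stated for a general Lipschitz domain).
# S1 denotes the solution of -p'' = 1, p(0) = p(1) = 0, i.e. S1(x) = x(1-x)/2.
eps, gamma = 0.1, 1.0
x = np.linspace(0.0, 1.0, 1001)
S1 = 0.5 * x * (1.0 - x)
u_d = -(eps + S1) / gamma                  # the special desired control of the remark

# At the solution y = u = 0, the gradient equation (2.4) reduces pointwise to
#   -gamma*u_d - p - mu1 - eps*mu2 = 0.
def gradient_residual(p, mu1, mu2):
    return -gamma * u_d - p - mu1 - eps * mu2

# choice A: (p, mu1, mu2) = (S1, 0, 1); note that -S1'' = 1 = mu2
res_a = gradient_residual(S1, np.zeros_like(x), np.ones_like(x))
# choice B: (p, mu1, mu2) = (0, eps + S1, 0); note that -0'' = 0 = mu2
res_b = gradient_residual(np.zeros_like(x), eps + S1, np.zeros_like(x))

print(np.abs(res_a).max(), np.abs(res_b).max())   # both residuals are zero
```

Both residuals vanish identically, confirming that the two multiplier tuples (and all their convex combinations) satisfy the gradient equation.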
The L∞ -regularity of the Lagrange multipliers and the control will be shown by
means of a projection formula. This idea was introduced in [20]. However, in that
paper the situation was simpler since both inequalities could not be active simultaneously.
Lemma 2.7. Suppose that $\delta\in Z$ and $(y,u,p,\mu_1,\mu_2)\in Y\times L^2(\Omega)\times Y\times L^2(\Omega)\times L^2(\Omega)$ satisfy (2.4) and (2.6). Then the projection formula
$$\mu_1 + \varepsilon\mu_2 = \max\Big\{0,\ \gamma\big(\max\{\delta_4,\ \tfrac{1}{\varepsilon}(y_c+\delta_5-y)\} - u_d\big) - p - \delta_2\Big\} \tag{2.7}$$
is valid. Moreover, $u, \mu_1, \mu_2\in L^\infty(\Omega)$ hold.

Proof. From (2.6), we obtain
$$u \ge \delta_4, \qquad u \ge \frac{y_c+\delta_5-y}{\varepsilon}, \qquad\text{hence}\qquad u \ge \max\{\delta_4,\ \tfrac{1}{\varepsilon}(y_c+\delta_5-y)\}. \tag{2.8}$$
Plugging this into (2.4), we get
$$\mu_1+\varepsilon\mu_2 = \gamma(u-u_d) - p - \delta_2 \ge \gamma\big(\max\{\delta_4,\ \tfrac{1}{\varepsilon}(y_c+\delta_5-y)\} - u_d\big) - p - \delta_2.$$
Since $\mu_1+\varepsilon\mu_2\ge 0$, we have
$$\mu_1+\varepsilon\mu_2 \ge \max\Big\{0,\ \gamma\big(\max\{\delta_4,\ \tfrac{1}{\varepsilon}(y_c+\delta_5-y)\} - u_d\big) - p - \delta_2\Big\}. \tag{2.9}$$
We proceed by distinguishing two subsets of $\Omega$.
(a) On $\Omega_1 = \{x\in\Omega : \mu_1(x) > 0 \text{ or } \mu_2(x) > 0\}$, at least one of the inequality constraints is active. Thus (2.8) yields $u = \max\{\delta_4,\ \tfrac{1}{\varepsilon}(y_c+\delta_5-y)\}$, equality holds in (2.9), and (2.7) follows.
(b) On $\Omega_2 = \{x\in\Omega : \mu_1(x) = \mu_2(x) = 0\}$, the left hand side of (2.9) is zero and again (2.7) follows.
To show the boundedness of $u$, $\mu_1$ and $\mu_2$, note that the expression inside the inner max-function in (2.7) is an $L^\infty(\Omega)$-function due to assumption (A3) and the fact that $y, p\in H^2(\Omega)$, which embeds into $L^\infty(\Omega)$. The $L^\infty(\Omega)$-regularity is preserved by the max-function. Consequently, we have $\mu_1+\varepsilon\mu_2\in L^\infty(\Omega)$. Moreover, the estimate
$$0 \le \mu_1 \le \mu_1+\varepsilon\mu_2 \le \|\mu_1+\varepsilon\mu_2\|_{L^\infty(\Omega)}$$
shows that $\mu_1\in L^\infty(\Omega)$, and similarly $\mu_2\in L^\infty(\Omega)$. Finally, equation (2.4), i.e.,
$$u = \frac{1}{\gamma}\big(p + \mu_1 + \varepsilon\mu_2 + \delta_2\big) + u_d \quad \text{a.e. on }\Omega, \tag{2.10}$$
yields $u\in L^\infty(\Omega)$.
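The projection formula is purely pointwise, which makes it easy to evaluate once nodal values of $y$, $p$ and the data are available. The following sketch is illustrative only; the arrays are hypothetical placeholders and not output of the paper's analysis. It recovers $\mu_1+\varepsilon\mu_2$ from (2.7) and the control from (2.10).

```python
import numpy as np

# Pointwise evaluation of (2.7) and (2.10) on a sampling grid.  All arrays are
# placeholders (hypothetical nodal values); in an actual computation they would
# come from a discretization of the optimality system.
def multipliers_and_control(y, p, u_d, y_c, d2, d4, d5, eps, gamma):
    inner = np.maximum(d4, (y_c + d5 - y) / eps)         # max{delta4, (y_c+delta5-y)/eps}
    m = np.maximum(0.0, gamma * (inner - u_d) - p - d2)  # m = mu1 + eps*mu2 by (2.7)
    u = (p + m + d2) / gamma + u_d                       # control from (2.10)
    return m, u

# a tiny synthetic example
rng = np.random.default_rng(0)
n = 5
y, p = rng.standard_normal(n), rng.standard_normal(n)
u_d, y_c = np.zeros(n), np.full(n, -1.0)
d2, d4, d5 = np.zeros(n), np.zeros(n), np.zeros(n)
m, u = multipliers_and_control(y, p, u_d, y_c, d2, d4, d5, eps=0.5, gamma=1.0)
print(m, u)
```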
We have noted above that the Lagrange multipliers µi and the adjoint state p
need not be unique. Hence it is impossible to prove the Lipschitz stability of these
quantities without further assumptions. As a remedy, we impose a condition at the
solution (y0 , u0 ) of the reference problem (P(0)) which ensures that the active sets are
well separated. This leads us to the following definition:
Definition 2.8. Let $\sigma > 0$ be a real number. The sets
$$S_1^\sigma = \{x\in\Omega : 0\le u_0(x)\le\sigma\}, \qquad S_2^\sigma = \{x\in\Omega : 0\le\varepsilon u_0(x)+y_0(x)-y_c(x)\le\sigma\}$$
are called the security sets of level $\sigma$ for $(\mathbf{P}(0))$. The sets
$$A_1^\delta = \{x\in\Omega : u_\delta(x)-\delta_4(x)=0\}, \qquad A_2^\delta = \{x\in\Omega : \varepsilon u_\delta(x)+y_\delta(x)-y_c(x)-\delta_5(x)=0\}$$
are called the active sets of problem $(\mathbf{P}(\delta))$.
From now on we emphasize the dependence of the problem on the parameter δ and
denote the unique solution of (P(δ)) by (yδ , uδ ).
Assumption.
(A4) We require that S1σ ∩ S2σ = Ø for some fixed σ > 0.
Note that $A_1^0\subset S_1^\sigma$ and $A_2^0\subset S_2^\sigma$, i.e., $A_1^0\cap A_2^0 = \emptyset$: the active sets of the reference problem $(\mathbf{P}(0))$ do not intersect. We will show in the remainder of the paper that (A4) implies that also $A_1^\delta\cap A_2^\delta = \emptyset$ for $\delta$ sufficiently small. More precisely, we will determine a function $g(\sigma)$ such that $A_1^\delta\cap A_2^\delta = \emptyset$ for all $\|\delta\|_Z \le g(\sigma)$. It will be shown that this assumption also guarantees the uniqueness and Lipschitz stability of the Lagrange multipliers and adjoint states.
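On a discretization, the separation assumption (A4) can be checked directly from nodal values of the reference solution. The following sketch is illustrative only; the grid, the reference solution and the bound $y_c$ are made up for the example.

```python
import numpy as np

# Discrete check of the separation assumption (A4): given nodal values of the
# reference solution (u0, y0) and of the bound y_c on a grid, mark the security
# sets S1^sigma, S2^sigma and test whether they intersect.
def security_sets(u0, y0, y_c, eps, sigma):
    s1 = (u0 >= 0.0) & (u0 <= sigma)
    mixed = eps * u0 + y0 - y_c
    s2 = (mixed >= 0.0) & (mixed <= sigma)
    return s1, s2

eps, sigma = 0.1, 0.05
x = np.linspace(0.0, 1.0, 201)
u0 = np.maximum(0.0, x - 0.5)            # control constraint active for x <= 0.5
y0 = 0.2 * np.sin(np.pi * x)
slack = np.maximum(0.0, 0.9 - x)         # prescribed slack of the mixed constraint
y_c = eps * u0 + y0 - slack              # so that eps*u0 + y0 - y_c = slack

s1, s2 = security_sets(u0, y0, y_c, eps, sigma)
print("separation (A4) holds:", not np.any(s1 & s2))   # True for this synthetic data
```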
As an intermediate step, we consider in Section 3 a family of auxiliary problems
(Paux (δ)), in which the active sets are separated by construction. This technique was
suggested in [12] in the context of ordinary differential equations.
3. Stability Analysis for an Auxiliary Problem
In this section we introduce an auxiliary optimal control problem, in which we
restrict the inequality constraints (1.2) to the disjoint sets S1σ and S2σ , respectively.
Assumptions (A1)–(A4) are taken to hold throughout the remainder of the paper. We
consider
$$(\mathbf{P}^{aux}(\delta))\qquad \min\ \frac{1}{2}\|y-y_d\|_{L^2(\Omega)}^2 + \frac{\gamma}{2}\|u-u_d\|_{L^2(\Omega)}^2 - \int_\Omega y\,\delta_1\,dx - \int_\Omega u\,\delta_2\,dx$$
subject to the elliptic state equation
$$Ay = u+\delta_3 \ \text{ on }\Omega, \qquad y = 0 \ \text{ on }\partial\Omega, \tag{3.1}$$
and the pointwise constraints
$$u - \delta_4 \ge 0 \ \text{ on } S_1^\sigma, \qquad \varepsilon u + y - \delta_5 \ge y_c \ \text{ on } S_2^\sigma. \tag{3.2}$$
With arguments analogous to those for $(\mathbf{P}(\delta))$, it is easy to see that $(\mathbf{P}^{aux}(\delta))$ has a unique solution $(y_\delta^{aux}, u_\delta^{aux})\in Y\times L^\infty(\Omega)$ with associated Lagrange multipliers $(\mu_{1,\delta}^{aux}, \mu_{2,\delta}^{aux})\in L^\infty(S_1^\sigma)\times L^\infty(S_2^\sigma)$ and adjoint state $p_\delta^{aux}\in Y$ which satisfy the following necessary and sufficient optimality system:
$$(y_\delta^{aux}-y_d,\ v) - (\delta_1,\ v) + a[v,\ p_\delta^{aux}] - (\mu_{2,\delta}^{aux},\ v) = 0 \quad \forall v\in H_0^1(\Omega),$$
$$\gamma(u_\delta^{aux}-u_d,\ v) - (\delta_2,\ v) - (p_\delta^{aux},\ v) - (\mu_{1,\delta}^{aux},\ v) - (\varepsilon\mu_{2,\delta}^{aux},\ v) = 0 \quad \forall v\in L^2(\Omega),$$
$$a[y_\delta^{aux},\ v] - (u_\delta^{aux},\ v) - (\delta_3,\ v) = 0 \quad \forall v\in H_0^1(\Omega),$$
$$\left.\begin{aligned} &\mu_{1,\delta}^{aux}\,(u_\delta^{aux}-\delta_4) = 0, \\ &\mu_{1,\delta}^{aux}\ge 0, \quad u_\delta^{aux}-\delta_4\ge 0 \end{aligned}\right\} \quad \text{a.e. on }S_1^\sigma,$$
$$\left.\begin{aligned} &\mu_{2,\delta}^{aux}\,(\varepsilon u_\delta^{aux}+y_\delta^{aux}-y_c-\delta_5) = 0, \\ &\mu_{2,\delta}^{aux}\ge 0, \quad \varepsilon u_\delta^{aux}+y_\delta^{aux}-y_c-\delta_5\ge 0 \end{aligned}\right\} \quad \text{a.e. on }S_2^\sigma.$$
In order to give a meaning to the scalar products in the first and second equations, the Lagrange multipliers $\mu_{1,\delta}^{aux}$ and $\mu_{2,\delta}^{aux}$ are extended from their respective domains of definition $S_1^\sigma$ and $S_2^\sigma$ to $\Omega$ by zero.
Lemma 3.1. The Lagrange multipliers and the adjoint state for $(\mathbf{P}^{aux}(\delta))$ are unique.

Proof. We exploit that $S_1^\sigma\cap S_2^\sigma = \emptyset$ by Assumption (A4) and multiply the second equation by the characteristic function $\chi_{S_1^\sigma}$. Since $\mu_{2,\delta}^{aux} = 0$ on $S_1^\sigma$, we obtain
$$\mu_{1,\delta}^{aux} = \gamma(u_\delta^{aux}-u_d) - \delta_2 - p_\delta^{aux} \quad \text{a.e. on }S_1^\sigma.$$
Likewise, by multiplying by $\chi_{S_2^\sigma}$, we obtain
$$\mu_{2,\delta}^{aux} = \frac{1}{\varepsilon}\big(\gamma(u_\delta^{aux}-u_d) - \delta_2 - p_\delta^{aux}\big) \quad \text{a.e. on }S_2^\sigma.$$
We plug this expression into the adjoint equation and obtain
$$a_0[v,\ p_\delta^{aux}] = (\delta_1,\ v) - (y_\delta^{aux}-y_d,\ v) + \frac{1}{\varepsilon}\Big(\gamma(u_\delta^{aux}-u_d,\ \chi_{S_2^\sigma}\cdot v) - (\delta_2,\ \chi_{S_2^\sigma}\cdot v)\Big)$$
for all $v\in H_0^1(\Omega)$, where
$$a_0[v,p] := a[v,p] + \frac{1}{\varepsilon}\,(p,\ \chi_{S_2^\sigma}\cdot v)$$
is a modification of the original bilinear form. Note that
$$a_0[y,v] \le (\bar c + \varepsilon^{-1})\,\|y\|_{H^1(\Omega)}\|v\|_{H^1(\Omega)}, \qquad a_0[y,y] \ge \underline c\,\|y\|_{H^1(\Omega)}^2,$$
and thus the problem $a_0[v,p] = (f,v)$ for all $v\in H_0^1(\Omega)$ admits a unique solution which satisfies the a priori estimate
$$\|p\|_{H^2(\Omega)} \le C_\Omega^*\,\|f\|_{L^2(\Omega)}, \tag{3.3}$$
compare Lemma 2.2. Note that the equation for $p_\delta^{aux}$ contains only known data and the unique solution $(y_\delta^{aux}, u_\delta^{aux})$, hence $p_\delta^{aux}$ is also unique. From the equations for $\mu_{1,\delta}^{aux}$ and $\mu_{2,\delta}^{aux}$ we conclude the uniqueness of the Lagrange multipliers.
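For completeness, the Lax–Milgram argument applies because the additional term does not destroy coercivity:

```latex
\[
  a_0[y,y] \;=\; a[y,y] + \frac{1}{\varepsilon}\int_{S_2^\sigma} |y|^2\,dx
  \;\ge\; a[y,y] \;\ge\; \underline{c}\,\|y\|_{H^1(\Omega)}^2
  \qquad \text{for all } y \in H_0^1(\Omega),
\]
% since the added term is nonnegative; continuity follows as displayed above.
```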
3.1. Stability Analysis in $L^2$. As delineated in the introduction, the original problem depends on perturbation parameters $\delta\in Z$. In particular, $(\mathbf{P}(\delta))$ includes perturbations of the desired state in view of
$$\frac{1}{2}\|y-(y_d+\delta_1)\|_{L^2(\Omega)}^2 = \frac{1}{2}\|y-y_d\|_{L^2(\Omega)}^2 - \int_\Omega y\,\delta_1 + c,$$
where $c$ is a constant. In the same way, $\delta_2$ covers perturbations of the desired control $u_d$, and $\delta_3$ accounts for perturbations of the right hand side of the PDE, while $\delta_4$ and $\delta_5$ are perturbations of the inequality constraints (1.2).
Now we can state the main result of this section concerning the Lipschitz stability of the optimal state and control with respect to perturbations for $(\mathbf{P}^{aux}(\delta))$.
Proposition 3.2. Let Assumptions (A1)–(A4) be satisfied. Then there exists a constant $L^{aux} > 0$ such that for any $\delta, \delta'\in Z$, the corresponding unique solutions of the auxiliary problem satisfy
$$\|y_{\delta'}^{aux}-y_\delta^{aux}\|_{H^2(\Omega)} + \|u_{\delta'}^{aux}-u_\delta^{aux}\|_{L^2(\Omega)} \le L^{aux}\,\|\delta'-\delta\|_{[L^2(\Omega)]^5}.$$
This result can be obtained from a general result on strong regularity for generalized
equations, see [5, Theorem 5.20]. Nevertheless, we give here a short direct proof. We
begin with an auxiliary result.
Lemma 3.3. The Lagrange multipliers associated with the solutions $(y_\delta^{aux}, u_\delta^{aux})$ and $(y_{\delta'}^{aux}, u_{\delta'}^{aux})$ of $(\mathbf{P}^{aux}(\delta))$ and $(\mathbf{P}^{aux}(\delta'))$ satisfy
$$\big(\mu_{2,\delta'}^{aux}-\mu_{2,\delta}^{aux},\ y_{\delta'}^{aux}-y_\delta^{aux}+\varepsilon(u_{\delta'}^{aux}-u_\delta^{aux})\big) + \big(\mu_{1,\delta'}^{aux}-\mu_{1,\delta}^{aux},\ u_{\delta'}^{aux}-u_\delta^{aux}\big) \le \big(\mu_{2,\delta'}^{aux}-\mu_{2,\delta}^{aux},\ \delta_5'-\delta_5\big) + \big(\mu_{1,\delta'}^{aux}-\mu_{1,\delta}^{aux},\ \delta_4'-\delta_4\big).$$

Proof. Using the complementarity conditions in the optimality system, we infer
$$-\mu_{1,\delta}^{aux}(u_\delta^{aux}-\delta_4) = 0, \qquad -\mu_{1,\delta'}^{aux}(u_{\delta'}^{aux}-\delta_4') = 0, \qquad \mu_{1,\delta'}^{aux}(u_\delta^{aux}-\delta_4)\ge 0, \qquad \mu_{1,\delta}^{aux}(u_{\delta'}^{aux}-\delta_4')\ge 0,$$
and
$$\big(\mu_{1,\delta'}^{aux}-\mu_{1,\delta}^{aux},\ u_{\delta'}^{aux}-u_\delta^{aux}\big) \le \big(\mu_{1,\delta'}^{aux}-\mu_{1,\delta}^{aux},\ \delta_4'-\delta_4\big)$$
follows. Similarly, one obtains the second part.
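For the reader's convenience, the elementary rearrangement behind the last step reads as follows (stated for the first pair of multipliers; the second pair is treated in the same way):

```latex
\begin{aligned}
\big(\mu^{aux}_{1,\delta'}-\mu^{aux}_{1,\delta},\,u^{aux}_{\delta'}-u^{aux}_{\delta}\big)
&= \underbrace{\big(\mu^{aux}_{1,\delta'},\,u^{aux}_{\delta'}-\delta_4'\big)}_{=\,0}
 + \underbrace{\big(\mu^{aux}_{1,\delta},\,u^{aux}_{\delta}-\delta_4\big)}_{=\,0}
 - \underbrace{\big(\mu^{aux}_{1,\delta'},\,u^{aux}_{\delta}-\delta_4\big)}_{\ge\,0}
 - \underbrace{\big(\mu^{aux}_{1,\delta},\,u^{aux}_{\delta'}-\delta_4'\big)}_{\ge\,0}
 + \big(\mu^{aux}_{1,\delta'}-\mu^{aux}_{1,\delta},\,\delta_4'-\delta_4\big) \\
&\le \big(\mu^{aux}_{1,\delta'}-\mu^{aux}_{1,\delta},\,\delta_4'-\delta_4\big).
\end{aligned}
```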
Proof of Proposition 3.2. Let $\delta, \delta'\in Z$ be arbitrary. We abbreviate
$$\delta u := u_{\delta'}^{aux} - u_\delta^{aux}$$
and similarly for the remaining quantities. We consider the respective optimality systems and start with the adjoint equation, using $v = \delta y$ as a test function. We obtain
$$\|\delta y\|_{L^2(\Omega)}^2 = (\delta_1'-\delta_1,\ \delta y) + (\delta\mu_2,\ \delta y) - a[\delta y,\ \delta p].$$
Testing the difference of the second equations in the optimality system with $v = \delta u$ yields
$$\gamma\,\|\delta u\|_{L^2(\Omega)}^2 = (\delta_2'-\delta_2,\ \delta u) + (\delta p,\ \delta u) + (\delta\mu_1,\ \delta u) + \varepsilon(\delta\mu_2,\ \delta u).$$
From the state equation, tested with $\delta p$, we get
$$a[\delta y,\ \delta p] - (\delta u,\ \delta p) - (\delta_3'-\delta_3,\ \delta p) = 0.$$
Adding these equations yields
$$\|\delta y\|_{L^2(\Omega)}^2 + \gamma\,\|\delta u\|_{L^2(\Omega)}^2 = (\delta_1'-\delta_1,\ \delta y) + (\delta_2'-\delta_2,\ \delta u) - (\delta_3'-\delta_3,\ \delta p) + (\delta\mu_2,\ \delta y) + (\delta\mu_1,\ \delta u) + \varepsilon(\delta\mu_2,\ \delta u).$$
Applying Lemma 3.3 shows that
$$\|\delta y\|_{L^2(\Omega)}^2 + \gamma\,\|\delta u\|_{L^2(\Omega)}^2 \le (\delta_1'-\delta_1,\ \delta y) + (\delta_2'-\delta_2,\ \delta u) - (\delta_3'-\delta_3,\ \delta p) + (\delta\mu_2,\ \delta_5'-\delta_5) + (\delta\mu_1,\ \delta_4'-\delta_4).$$
Cauchy's and Young's inequalities imply that
$$\frac{1}{2}\|\delta y\|_{L^2(\Omega)}^2 + \frac{\gamma}{2}\|\delta u\|_{L^2(\Omega)}^2 \le \frac{1}{2}\|\delta_1'-\delta_1\|_{L^2(\Omega)}^2 + \frac{1}{2\gamma}\|\delta_2'-\delta_2\|_{L^2(\Omega)}^2 + \kappa\,\|\delta p\|_{L^2(\Omega)}^2 + \frac{1}{4\kappa}\|\delta_3'-\delta_3\|_{L^2(\Omega)}^2 + \kappa\,\|\delta\mu_2\|_{L^2(\Omega)}^2 + \frac{1}{4\kappa}\|\delta_5'-\delta_5\|_{L^2(\Omega)}^2 + \kappa\,\|\delta\mu_1\|_{L^2(\Omega)}^2 + \frac{1}{4\kappa}\|\delta_4'-\delta_4\|_{L^2(\Omega)}^2, \tag{3.4}$$
where $\kappa > 0$ will be specified below. The difference of the adjoint states satisfies
$$a_0[v,\ \delta p] = (\delta_1'-\delta_1,\ v) - (\delta y,\ v) + \frac{1}{\varepsilon}\Big(\gamma(\delta u,\ \chi_{S_2^\sigma}\cdot v) - (\delta_2'-\delta_2,\ \chi_{S_2^\sigma}\cdot v)\Big),$$
where $a_0[\cdot,\cdot]$ was defined in the proof of Lemma 3.1. By (3.3) we can estimate the difference of the adjoint states,
$$\|\delta p\|_{L^2(\Omega)} \le \|\delta p\|_{H^2(\Omega)} \le C_\Omega^*\Big(\|\delta_1'-\delta_1\|_{L^2(\Omega)} + \|\delta y\|_{L^2(\Omega)} + \frac{\gamma}{\varepsilon}\|\delta u\|_{L^2(S_2^\sigma)} + \frac{1}{\varepsilon}\|\delta_2'-\delta_2\|_{L^2(S_2^\sigma)}\Big). \tag{3.5}$$
Moreover, with the representation of the Lagrange multipliers from Lemma 3.1, we find
$$\|\delta\mu_1\|_{L^2(\Omega)} = \|\delta\mu_1\|_{L^2(S_1^\sigma)} \le \gamma\,\|\delta u\|_{L^2(\Omega)} + \|\delta_2'-\delta_2\|_{L^2(\Omega)} + \|\delta p\|_{L^2(\Omega)},$$
$$\|\delta\mu_2\|_{L^2(\Omega)} = \|\delta\mu_2\|_{L^2(S_2^\sigma)} \le \frac{1}{\varepsilon}\Big(\gamma\,\|\delta u\|_{L^2(\Omega)} + \|\delta_2'-\delta_2\|_{L^2(\Omega)} + \|\delta p\|_{L^2(\Omega)}\Big).$$
Plugging these estimates into (3.4), we obtain
$$\frac{1}{2}\|\delta y\|_{L^2(\Omega)}^2 + \frac{\gamma}{2}\|\delta u\|_{L^2(\Omega)}^2 \le \Big(c_1 + \frac{c_2}{\kappa} + c_3\,\kappa\Big)\|\delta'-\delta\|_{[L^2(\Omega)]^5}^2 + c_4\,\kappa\Big(\|\delta y\|_{L^2(\Omega)}^2 + \|\delta u\|_{L^2(\Omega)}^2\Big),$$
where $c_1,\dots,c_4$ depend only on $\gamma$, $\varepsilon$ and $C_\Omega^*$. Now we choose $\kappa > 0$ such that $c_4\kappa < \frac{1}{2}\min\{1,\gamma\}$. We obtain
$$\|\delta u\|_{L^2(\Omega)}^2 \le L_0\,\|\delta'-\delta\|_{[L^2(\Omega)]^5}^2.$$
Using the a priori estimate (2.2), Lipschitz stability of the state follows:
$$\|\delta y\|_{H^2(\Omega)}^2 \le L_1\,\|\delta'-\delta\|_{[L^2(\Omega)]^5}^2,$$
and the proof is complete.
Corollary 3.4. There exists a constant $L_2 > 0$ such that for any $\delta, \delta'\in Z$, the corresponding adjoint states of the auxiliary problem satisfy
$$\|p_{\delta'}^{aux} - p_\delta^{aux}\|_{H^2(\Omega)} \le L_2\,\|\delta'-\delta\|_{[L^2(\Omega)]^5}.$$
This follows directly from (3.5) and Proposition 3.2.
3.2. Stability Analysis in $L^\infty$. The considerations in Section 3.1 describe the stability behavior of the auxiliary problem $(\mathbf{P}^{aux}(\delta))$. However, these results are not strong enough to be transferred to the original problem $(\mathbf{P}(\delta))$; we make this precise in Proposition 3.5 below. This is the reason why we consider stability estimates in $L^\infty$ in this subsection. The key to the desired estimates is the projection formula of Lemma 2.7. We emphasize that the uniform second order growth condition holds only with respect to the $L^2$-norm. Therefore, general stability results (e.g. [5, Theorem 5.20]) cannot be applied here.
Proposition 3.5. Suppose that Assumptions (A1)–(A3) hold and that $(y_0, u_0)$ is the optimal solution of $(\mathbf{P}(0))$ which satisfies the separation assumption (A4). Moreover, we assume that the active set $A_1^0$ contains an open ball $B$ such that $\mu_{1,0} \ge M > 0$ holds on $B$. Then for every $R > 0$ there exists $\delta\in[L^2(\Omega)]^5$ with $\|\delta\|_{[L^2(\Omega)]^5} < R$ such that the dual variables for $(\mathbf{P}(\delta))$ are not unique. Consequently, the dual variables cannot be Lipschitz stable with respect to perturbations.
Note that this implies in particular that the generalized equation representing the
optimality system of (P(0)) is not strongly regular, see [5, Definition 5.12]. The proof is
given in the appendix. Let us now start with the L∞ stability estimates for (Paux (δ)).
Lemma 3.6. Let (A1)–(A4) be satisfied. Then there exists a constant $L_3 > 0$ such that for any $\delta, \delta'\in Z$, the corresponding unique solutions of the auxiliary problem satisfy
$$\|u_{\delta'}^{aux} - u_\delta^{aux}\|_{L^\infty(\Omega)} \le L_3\,\|\delta'-\delta\|_Z.$$
Proof. From the projection formula (2.7) we have, almost everywhere on $\Omega$,
$$\mu_{1,\delta'}^{aux} - \mu_{1,\delta}^{aux} + \varepsilon(\mu_{2,\delta'}^{aux} - \mu_{2,\delta}^{aux}) = \max\Big\{0,\ \gamma\big(\max\{\delta_4',\ \tfrac{1}{\varepsilon}(y_c+\delta_5'-y_{\delta'}^{aux})\} - u_d\big) - p_{\delta'}^{aux} - \delta_2'\Big\} - \max\Big\{0,\ \gamma\big(\max\{\delta_4,\ \tfrac{1}{\varepsilon}(y_c+\delta_5-y_\delta^{aux})\} - u_d\big) - p_\delta^{aux} - \delta_2\Big\}.$$
Using $\max\{a,b\} - \max\{c,d\} \le \max\{a-c,\ b-d\}$ twice and the fact that $e\le f$ implies $\max\{0,e\}\le\max\{0,f\}$, we continue
$$\le \max\Big\{0,\ \gamma\big(\max\{\delta_4',\ \tfrac{1}{\varepsilon}(y_c+\delta_5'-y_{\delta'}^{aux})\} - \max\{\delta_4,\ \tfrac{1}{\varepsilon}(y_c+\delta_5-y_\delta^{aux})\}\big) - (p_{\delta'}^{aux}-p_\delta^{aux}) - (\delta_2'-\delta_2)\Big\}$$
$$\le \max\Big\{0,\ \gamma\,\max\big\{\delta_4'-\delta_4,\ \tfrac{1}{\varepsilon}\big((\delta_5'-\delta_5) - (y_{\delta'}^{aux}-y_\delta^{aux})\big)\big\} - (p_{\delta'}^{aux}-p_\delta^{aux}) - (\delta_2'-\delta_2)\Big\}$$
$$\le \gamma\,\max\Big\{\|\delta_4'-\delta_4\|_{L^\infty(\Omega)},\ \tfrac{1}{\varepsilon}\big(\|\delta_5'-\delta_5\|_{L^\infty(\Omega)} + \|y_{\delta'}^{aux}-y_\delta^{aux}\|_{L^\infty(\Omega)}\big)\Big\} + \|p_{\delta'}^{aux}-p_\delta^{aux}\|_{L^\infty(\Omega)} + \|\delta_2'-\delta_2\|_{L^\infty(\Omega)}.$$
From the embedding of $H^2(\Omega)$ into $L^\infty(\Omega)$ we have
$$\varepsilon^{-1}\|y_{\delta'}^{aux}-y_\delta^{aux}\|_{L^\infty(\Omega)} + \|p_{\delta'}^{aux}-p_\delta^{aux}\|_{L^\infty(\Omega)} \le C_\infty\big(\varepsilon^{-1}\|y_{\delta'}^{aux}-y_\delta^{aux}\|_{H^2(\Omega)} + \|p_{\delta'}^{aux}-p_\delta^{aux}\|_{H^2(\Omega)}\big).$$
By Proposition 3.2 and Corollary 3.4, the right hand side can be estimated by
$$C_\infty\big(\varepsilon^{-1}L^{aux} + L_2\big)\,\|\delta'-\delta\|_{[L^2(\Omega)]^5}.$$
Collecting terms and replacing the norm in $[L^2(\Omega)]^5$ by the stronger norm in $Z$, we obtain
$$\mu_{1,\delta'}^{aux}-\mu_{1,\delta}^{aux}+\varepsilon(\mu_{2,\delta'}^{aux}-\mu_{2,\delta}^{aux}) \le L_3\,\|\delta'-\delta\|_Z \quad \text{a.e. on }\Omega.$$
Since the same inequality is obtained by exchanging the roles of $\delta$ and $\delta'$, we have
$$\|\mu_{1,\delta'}^{aux}-\mu_{1,\delta}^{aux}+\varepsilon(\mu_{2,\delta'}^{aux}-\mu_{2,\delta}^{aux})\|_{L^\infty(\Omega)} \le L_3\,\|\delta'-\delta\|_Z.$$
The claim then follows from applying the estimates above to
$$u_{\delta'}^{aux}-u_\delta^{aux} = \frac{1}{\gamma}\Big((p_{\delta'}^{aux}-p_\delta^{aux}) + (\mu_{1,\delta'}^{aux}-\mu_{1,\delta}^{aux}) + \varepsilon(\mu_{2,\delta'}^{aux}-\mu_{2,\delta}^{aux}) + (\delta_2'-\delta_2)\Big).$$
Corollary 3.7. For $\delta' = 0$ the previous lemma implies
$$\|u_0 - u_\delta^{aux}\|_{L^\infty(\Omega)} \le L_3\,\|\delta\|_Z.$$
4. Stability Analysis for the Original Problem
In this section we formulate the main Lipschitz continuity result for the primal and dual variables of $(\mathbf{P}(\delta))$. We have seen in Proposition 3.5 that the structure of the active sets of $(\mathbf{P}(\delta))$ can change dramatically even for arbitrarily small perturbations with respect to the $L^2$ norm. By contrast, the stability estimates in $L^\infty$ with respect to the norm of $Z$ are strong enough for the constraints to stay inactive outside of the security sets for small perturbations. This implies that for sufficiently small $\delta$, the solutions of $(\mathbf{P}(\delta))$ and $(\mathbf{P}^{aux}(\delta))$ coincide.
We will admit $\delta\in Z$ which satisfy the condition
$$\|\delta\|_Z \le g(\sigma) := \min\{g_1(\sigma),\ g_2(\sigma)\}, \tag{4.1}$$
where $g_1(\sigma) := \dfrac{\sigma}{L_3+1}$ and $g_2(\sigma) := \dfrac{\sigma}{\varepsilon L_3 + C_\infty C_2 L_1 + 1}$.
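As a small illustration of condition (4.1), the admissible perturbation radius can be evaluated once the constants are known; the numerical values in the snippet below are hypothetical placeholders, not constants derived in the paper.

```python
# Illustration of the admissible perturbation radius g(sigma) from (4.1).
# L1, L3 are the Lipschitz constants of Proposition 3.2 and Lemma 3.6,
# C_inf, C_2 the embedding constants of Section 2; the values are made up.
def perturbation_radius(sigma, eps, L1, L3, C_inf, C_2):
    g1 = sigma / (L3 + 1.0)
    g2 = sigma / (eps * L3 + C_inf * C_2 * L1 + 1.0)
    return min(g1, g2)

print(perturbation_radius(sigma=0.05, eps=0.1, L1=2.0, L3=5.0, C_inf=1.5, C_2=1.0))
```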
Lemma 4.1. Suppose that $\|\delta\|_Z \le g(\sigma)$ and that $(y_\delta^{aux}, u_\delta^{aux})$ is the unique solution of $(\mathbf{P}^{aux}(\delta))$ with adjoint state $p_\delta^{aux}$ and Lagrange multipliers $(\mu_{1,\delta}^{aux}, \mu_{2,\delta}^{aux})$. Then the solution is feasible for the original problem $(\mathbf{P}(\delta))$. When the multipliers are extended by zero outside $S_1^\sigma$ and $S_2^\sigma$, respectively, the tuple $(y_\delta^{aux}, u_\delta^{aux}, p_\delta^{aux}, \mu_{1,\delta}^{aux}, \mu_{2,\delta}^{aux})$ satisfies the optimality system (2.3)–(2.6). In particular, $(y_\delta^{aux}, u_\delta^{aux})$ is the unique solution of $(\mathbf{P}(\delta))$.

Proof. The pair $(y_\delta^{aux}, u_\delta^{aux})$ is feasible for $(\mathbf{P}^{aux}(\delta))$, i.e.,
$$u_\delta^{aux} - \delta_4 \ge 0 \ \text{ on } S_1^\sigma, \qquad \varepsilon u_\delta^{aux} + y_\delta^{aux} - \delta_5 \ge y_c \ \text{ on } S_2^\sigma,$$
and we have to show
$$u_\delta^{aux} - \delta_4 \ge 0 \ \text{ on } \Omega\setminus S_1^\sigma, \qquad \varepsilon u_\delta^{aux} + y_\delta^{aux} - \delta_5 \ge y_c \ \text{ on } \Omega\setminus S_2^\sigma.$$
As $u_0 \ge \sigma$ holds a.e. on $\Omega\setminus S_1^\sigma$, we have
$$u_\delta^{aux} - \delta_4 = u_0 + u_\delta^{aux} - u_0 - \delta_4 \ge u_0 - \|u_0 - u_\delta^{aux}\|_{L^\infty(\Omega)} - \|\delta_4\|_{L^\infty(\Omega)} \ge \sigma - L_3\,\|\delta\|_Z - \|\delta_4\|_{L^\infty(\Omega)} \ge \sigma - (L_3+1)\,g_1(\sigma) = 0$$
almost everywhere on $\Omega\setminus S_1^\sigma$. As for the second inequality, we have $\varepsilon u_0 + y_0 - y_c \ge \sigma$ on $\Omega\setminus S_2^\sigma$ and consequently
$$\varepsilon u_\delta^{aux} + y_\delta^{aux} - y_c - \delta_5 = \varepsilon u_0 + y_0 - y_c + \varepsilon(u_\delta^{aux}-u_0) + (y_\delta^{aux}-y_0) - \delta_5 \ge \varepsilon u_0 + y_0 - y_c - \varepsilon\|u_0-u_\delta^{aux}\|_{L^\infty(\Omega)} - \|y_0-y_\delta^{aux}\|_{L^\infty(\Omega)} - \|\delta_5\|_{L^\infty(\Omega)} \ge \sigma - \varepsilon L_3\|\delta\|_Z - C_\infty C_2 L_1\|\delta\|_Z - \|\delta_5\|_{L^\infty(\Omega)} \ge \sigma - (\varepsilon L_3 + C_\infty C_2 L_1 + 1)\,g_2(\sigma) = 0$$
almost everywhere on $\Omega\setminus S_2^\sigma$.
We extend the multipliers $(\mu_{1,\delta}^{aux}, \mu_{2,\delta}^{aux})$ by zero to all of $\Omega$. Then it is easy to see that $(y_\delta^{aux}, u_\delta^{aux}, p_\delta^{aux}, \mu_{1,\delta}^{aux}, \mu_{2,\delta}^{aux})$ satisfies the optimality system (2.3)–(2.6), which is a sufficient condition for optimality in $(\mathbf{P}(\delta))$ by Lemma 2.5.
Theorem 4.2. There exists a constant $L > 0$ such that for any $\delta, \delta'\in Z$ satisfying (4.1), the unique solutions $(y_\delta, u_\delta)$ and $(y_{\delta'}, u_{\delta'})$ of $(\mathbf{P}(\delta))$ and $(\mathbf{P}(\delta'))$ satisfy
$$\|y_{\delta'}-y_\delta\|_{H^2(\Omega)} + \|u_{\delta'}-u_\delta\|_{L^\infty(\Omega)} \le L\,\|\delta'-\delta\|_Z. \tag{4.2}$$

Proof. By the previous lemma, $(y_\delta, u_\delta) = (y_\delta^{aux}, u_\delta^{aux})$, and the same holds for $\delta'$. Hence we can apply the Lipschitz stability results for $(\mathbf{P}^{aux}(\delta))$, Proposition 3.2 and Lemma 3.6, to obtain (4.2).
Corollary 4.3. For any $\delta\in Z$ satisfying (4.1), we have $A_1^\delta\subset S_1^\sigma$ and $A_2^\delta\subset S_2^\sigma$, hence $A_1^\delta\cap A_2^\delta = \emptyset$. Moreover, the Lagrange multipliers and adjoint state for $(\mathbf{P}(\delta))$ are unique and coincide with those for $(\mathbf{P}^{aux}(\delta))$.

Proof. We consider a point $x^*\in A_1^\delta$, so $u_\delta(x^*) - \delta_4(x^*) = 0$ holds. Then
$$u_0(x^*) = u_0(x^*) - u_\delta(x^*) + u_\delta(x^*) - \delta_4(x^*) + \delta_4(x^*) \le \|u_0-u_\delta\|_{L^\infty(\Omega)} + \|\delta_4\|_{L^\infty(\Omega)} \le L_3\,\|\delta\|_Z + \|\delta\|_Z \le \sigma,$$
where we have used Corollary 3.7. This shows $x^*\in S_1^\sigma$, hence $A_1^\delta\subset S_1^\sigma$. Analogously, $A_2^\delta\subset S_2^\sigma$, and by Assumption (A4) we have $A_1^\delta\cap A_2^\delta = \emptyset$. Using the same arguments as in Lemma 3.1, we see that the Lagrange multipliers $\mu_{i,\delta}$ and the adjoint state $p_\delta$ for $(\mathbf{P}(\delta))$ are unique. In Lemma 4.1, the tuple $(y_\delta^{aux}, u_\delta^{aux}, p_\delta^{aux}, \mu_{1,\delta}^{aux}, \mu_{2,\delta}^{aux})$ was shown to satisfy the optimality system (2.3)–(2.6) for $(\mathbf{P}(\delta))$, so in particular the Lagrange multipliers and adjoint state for $(\mathbf{P}(\delta))$ coincide with those for $(\mathbf{P}^{aux}(\delta))$.
The previous corollary allows us to use the symbols $p_\delta$, $\mu_{1,\delta}$ and $\mu_{2,\delta}$ without ambiguity for $\|\delta\|_Z \le g(\sigma)$. Finally, we obtain a Lipschitz stability result also for these quantities:

Corollary 4.4. There exist constants $L_4$, $L_5$ and $L_6 > 0$ such that for any $\delta, \delta'\in Z$ satisfying (4.1), the unique adjoint states and Lagrange multipliers $(p_\delta, \mu_{1,\delta}, \mu_{2,\delta})$ and $(p_{\delta'}, \mu_{1,\delta'}, \mu_{2,\delta'})$ associated with the solutions of $(\mathbf{P}(\delta))$ and $(\mathbf{P}(\delta'))$, respectively, satisfy
$$\|p_{\delta'} - p_\delta\|_{H^2(\Omega)} \le L_4\,\|\delta'-\delta\|_{[L^2(\Omega)]^5}, \qquad \|\mu_{1,\delta'} - \mu_{1,\delta}\|_{L^\infty(\Omega)} \le L_5\,\|\delta'-\delta\|_Z, \qquad \|\mu_{2,\delta'} - \mu_{2,\delta}\|_{L^\infty(\Omega)} \le L_6\,\|\delta'-\delta\|_Z.$$

Proof. The first claim follows (with $L_4 = L_2$) from Corollary 3.4 and the equality $p_\delta = p_\delta^{aux}$ from the previous corollary. From the proof of Lemma 3.6, we have
$$\|\mu_{1,\delta'} - \mu_{1,\delta} + \varepsilon(\mu_{2,\delta'} - \mu_{2,\delta})\|_{L^\infty(\Omega)} \le L_3\,\|\delta'-\delta\|_Z.$$
Since $\mu_{1,\delta'} - \mu_{1,\delta}$ vanishes outside $S_1^\sigma$, $\mu_{2,\delta'} - \mu_{2,\delta}$ vanishes outside $S_2^\sigma$, and $S_1^\sigma\cap S_2^\sigma = \emptyset$, we get
$$\max\big\{\|\mu_{1,\delta'} - \mu_{1,\delta}\|_{L^\infty(\Omega)},\ \varepsilon\,\|\mu_{2,\delta'} - \mu_{2,\delta}\|_{L^\infty(\Omega)}\big\} \le L_3\,\|\delta'-\delta\|_Z,$$
and the claim follows.
Acknowledgement
This work was partially supported by the Austrian Science Fund FWF under project
number P18056-N12.
Appendix A. Proof of Proposition 3.5

Let $(y_0, u_0, p_0, \mu_{1,0}, \mu_{2,0})$ be any solution of the optimality system (2.3)–(2.6) for $(\mathbf{P}(0))$. Due to the separation assumption (A4), this is also a solution of the optimality system for $(\mathbf{P}^{aux}(0))$. Since the solution of the optimality system for $(\mathbf{P}^{aux}(0))$ is unique, see Lemma 3.1, the same must hold for $(\mathbf{P}(0))$. In particular, $(y_0, u_0, p_0, \mu_{1,0}, \mu_{2,0}) = (y_0^{aux}, u_0^{aux}, p_0^{aux}, \mu_{1,0}^{aux}, \mu_{2,0}^{aux})$.
Let us denote by $B$ the open ball centered at $\xi\in\Omega$ contained in $A_1^0$ such that $\mu_{1,0}\ge M > 0$ holds on $B$. Let $r > 0$ be such that $B_r(\xi)\subset B$ and
$$\|\varepsilon u_0 + y_0 - y_c\|_{L^\infty(\Omega)}\,|B_r|^{1/2} < R.$$
We choose $\delta_1 = \dots = \delta_4 \equiv 0$ and
$$\delta_5 = \begin{cases} \varepsilon u_0 + y_0 - y_c & \text{in } B_r, \\ 0 & \text{in } \Omega\setminus B_r. \end{cases}$$
It follows immediately that $\|\delta\|_{[L^2(\Omega)]^5} < R$. It is also easy to see that $(y_0, u_0)$ is feasible for $(\mathbf{P}(\delta))$. Moreover, $(y_0, u_0, p_0, \mu_{1,0}, \mu_{2,0})$ satisfies the optimality system for $(\mathbf{P}(\delta))$. However, we will show that this solution of the optimality system for $(\mathbf{P}(\delta))$ is not unique with respect to the dual variables.
We choose $\kappa > 0$ and
$$\mu_2 = \begin{cases} \kappa & \text{in } B_r(\xi), \\ \mu_{2,0} & \text{elsewhere}, \end{cases}$$
and let $p$ be the corresponding solution of (2.3). We set
$$\mu_1 = \begin{cases} \mu_{1,0} - \varepsilon\mu_2 + p_0 - p & \text{in } B_r(\xi), \\ \mu_{1,0} & \text{elsewhere}. \end{cases}$$
It is easy to check that $(p, \mu_1, \mu_2)$ satisfies (2.4). It remains to show that $\mu_1\ge 0$ holds. In $B_r(\xi)$ we find
$$\mu_1 \ge M - \varepsilon\kappa - \|p_0 - p\|_{L^\infty(\Omega)} \ge M - \varepsilon\kappa - \kappa\,C_\Omega C_\infty\,\|\chi_{B_r(\xi)}\|_{L^2(\Omega)} \ge M - \varepsilon\kappa - \kappa\,C_\Omega C_\infty\,|B_r(\xi)|^{1/2} \ge M - \varepsilon\kappa - \kappa\,C_\Omega C_\infty\,|\Omega|^{1/2}.$$
Consequently, $\mu_1\ge 0$ holds on all of $\Omega$ for sufficiently small $\kappa$. Therefore, the tuple $(y_0, u_0, p, \mu_1, \mu_2)$ satisfies the optimality system (2.3)–(2.6) and differs from $(y_0, u_0, p_0, \mu_{1,0}, \mu_{2,0})$ in view of $\kappa > 0$.
References
[1] R. Adams. Sobolev Spaces. Academic Press, New York, 1975.
[2] W. Alt. Local stability of solutions to differentiable optimization problems in Banach spaces.
Journal of Optimization Theory and Applications, 70:443–466, 1991.
[3] W. Alt. Discretization and mesh-independence of Newton’s method for generalized equations. In
Antony V. Fiacco, editor, Mathematical Programming with Data Perturbations V, volume 195
of Lecture Notes in Pure and Applied Mathematics, pages 1–30. Marcel Dekker, 1997.
[4] W. Alt and K. Malanowski. The Lagrange-Newton method for nonlinear optimal control problems. Computational Optimization and Applications, 2:77–100, 1993.
[5] F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems. Springer, Berlin,
2000.
[6] A. L. Dontchev and W. W. Hager. Implicit functions, Lipschitz maps, and stability in optimization. Mathematics of Operations Research, 19:753–768, 1994.
[7] A. L. Dontchev, W. W. Hager, A. B. Poore, and B. Yang. Optimality, stability, and convergence
in nonlinear control. Applied Mathematics and Optimization, 31:297–326, 1995.
[8] R. Griesse. Lipschitz stability of solutions to some state-constrained elliptic optimal control
problems. to appear in: Journal of Analysis and its Applications, 2005.
[9] P. Grisvard. Elliptic Problems in Nonsmooth Domains. Pitman, Boston, 1985.
[10] K. Ito and K. Kunisch. Sensitivity analysis of solutions to optimization problems in Hilbert
spaces with applications to optimal control and estimation. Journal of Differential Equations,
99:1–40, 1992.
[11] D. Kinderlehrer and G. Stampacchia. An Introduction to Variational Inequalities and Their
Applications. Academic Press, New York, 1980.
[12] K. Malanowski. Stability and sensitivity analysis for optimal control problems with control-state
constraints. Dissertationes Mathematicae (Rozprawy Matematyczne), 394, 2001.
[13] K. Malanowski, C. Büskens, and H. Maurer. Convergence of approximations to nonlinear optimal
control problems. In Antony V. Fiacco, editor, Mathematical Programming with Data Perturbations V, volume 195 of Lecture Notes in Pure and Applied Mathematics, pages 253–284. Marcel
Dekker, 1997.
[14] K. Malanowski and F. Tröltzsch. Lipschitz stability of solutions to parametric optimal control
for elliptic equations. Control and Cybernetics, 29:237–256, 2000.
[15] C. Meyer, U. Prüfert, and F. Tröltzsch. On two numerical methods for state-constrained elliptic
control problems. Submitted, 2005.
[16] C. Meyer, A. Rösch, and F. Tröltzsch. Optimal control of PDEs with regularized pointwise state
constraints. Computational Optimization and Applications, 33(2–3):209–228, 2005.
[17] C. Meyer and F. Tröltzsch. On an elliptic optimal control problem with pointwise mixed control-state constraints. In A. Seeger, editor, Recent Advances in Optimization. Proceedings of the 12th
French-German-Spanish Conference on Optimization, volume 563 of Lecture Notes in Economics
and Mathematical Systems, pages 187–204, New York, 2006. Springer.
[18] Stephen M. Robinson. Strongly regular generalized equations. Mathematics of Operations Research, 5:43–62, 1980.
[19] A. Rösch and F. Tröltzsch. Existence of regular Lagrange multipliers for elliptic optimal control
problems with pointwise control-state constraints. SIAM Journal on Control and Optimization,
45(2):548–564, 2006.
[20] A. Rösch and D. Wachsmuth. Regularity of solutions for an optimal control problem with mixed
control-state constraints. submitted, 2005.
[21] E. Zeidler. Applied Functional Analysis: Main Principles and their Applications. Springer, New
York, 1995.
3. Sensitivity Analysis for Optimal Control Problems Involving the
Navier-Stokes Equations
R. Griesse, M. Hintermüller and M. Hinze: Differential Stability of Control Constrained Optimal Control Problems for the Navier-Stokes Equations, Numerical Functional Analysis and Optimization 26(7–8), p.829–850, 2005
The Navier-Stokes equations govern the flow of a (here incompressible) viscous fluid and thus have numerous applications. We consider the optimal control problem with distributed (vector-valued) control and pointwise (componentwise) control constraints,
$$\text{Minimize}\quad \frac{\alpha_Q}{2}\int_0^T\!\!\int_\Omega |y-y_Q|^2\,dx\,dt + \frac{\alpha_T}{2}\int_\Omega |y(\cdot,T)-y_T|^2\,dx + \frac{\alpha_R}{2}\int_0^T\!\!\int_\Omega |\operatorname{curl} y|^2\,dx\,dt + \frac{\gamma}{2}\int_0^T\!\!\int_\Omega |u|^2\,dx\,dt$$
subject to
$$\begin{cases} y_t + (y\cdot\nabla)y - \nu\Delta y + \nabla p = u & \text{in } Q := \Omega\times(0,T), \\ \operatorname{div} y = 0 & \text{in } Q, \\ y = 0 & \text{on } \Sigma := \partial\Omega\times(0,T), \\ y(\cdot,0) = y_0 & \text{in } \Omega, \end{cases} \tag{3.1}$$
and
$$u_a \le u \le u_b \quad \text{a.e. in } Q.$$
This optimal control problem and its solutions are considered to be functions of a
number of perturbation parameters, namely of the scalars αQ , αT , αR and desired
state functions yQ , yT appearing in the objective, of the viscosity ν (the inverse of
the Reynolds number), and of the initial conditions y0 in the state equation. In our
notation from the introduction of Chapter 1, we denote the vector of perturbation
parameters by
π = (ν, αQ , αT , αR , γ, yQ , yT , y0 ) ∈ P := R5 × L2 (Q) × H × V.
Before the publication of this paper, the Lipschitz stability of local optimal solutions with respect to such parameters had been investigated in Roubíček and Tröltzsch [2003] for the steady-state case and in Hintermüller and Hinze [2006] and Wachsmuth [2005] for
the time-dependent case. We take this analysis one step further and prove that under
second-order sufficient conditions, the dependence of local optimal solutions on π is
indeed directionally differentiable.
As outlined in the introduction of Chapter 1, this analysis can be carried out by
rewriting the optimality system in terms of a generalized equation. It is then sufficient
to analyze a linearization of this generalized equation and employ the Implicit Function
Theorem 0.6.
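Schematically, the reformulation has the following form; the symbols in this sketch are illustrative and not taken verbatim from the paper.

```latex
% With U_{ad} := \{ u \in L^2(Q) : u_a \le u \le u_b \ \text{a.e. in } Q \}, the first order
% optimality system of (3.1) is written as the generalized equation
\[
   0 \;\in\; F\big(y, u, \lambda; \pi\big) + \big(0,\ N_{U_{ad}}(u),\ 0\big),
\]
% where F collects the state equation, the adjoint equation and the derivative of the
% Lagrangian with respect to u, and N_{U_{ad}}(u) denotes the normal cone of U_{ad} at u.
% Directional differentiability of \pi \mapsto (y, u, \lambda) then follows from an implicit
% function theorem applied to a linearization of this inclusion.
```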
The core step is proved in Theorem 3.9 of the paper under discussion, which establishes
the directional differentiability of the linearized optimality system with respect to
certain perturbations δ. We work here with divergence-free spaces, which avoids the need to deal with perturbations in the incompressibility condition of the linearized forward and adjoint equations.
The differentiability property of local optimal solutions of (3.1) with respect to π allows
a second-order Taylor expansion of the minimum value function, which is calculated
and discussed in Section 5 of the paper. The steady-state case is easier and is briefly
treated in Section 6.
DIFFERENTIAL STABILITY OF CONTROL CONSTRAINED
OPTIMAL CONTROL PROBLEMS FOR THE NAVIER-STOKES
EQUATIONS
ROLAND GRIESSE, MICHAEL HINZE, AND MICHAEL HINTERMÜLLER
Abstract. Distributed optimal control problems for the time-dependent and the
stationary Navier-Stokes equations subject to pointwise control constraints are
considered. Under a coercivity condition on the Hessian of the Lagrange function,
optimal solutions are shown to be directionally differentiable functions of perturbation parameters such as the Reynolds number, the desired trajectory, or the
initial conditions. The derivative is characterized as the solution of an auxiliary
linear-quadratic optimal control problem. Thus, it can be computed at relatively
low cost. Taylor expansions of the minimum value function are provided as well.
1. Introduction
Perturbation theory for continuous minimization problems is of fundamental importance since many real world applications are embedded in families of optimization
problems. Frequently, these families are generated by scalar or vector-valued parameters, such as the Reynolds number in fluid flow, desired state trajectories, initial
conditions for time-dependent problems, and many more. From a theoretical as well
as numerical algorithmic point of view the behavior of optimal solutions under variations of the parameters is of interest:
• The knowledge of smoothness properties of the parameter-to-solution map
allows one to establish a qualitative theory.
• On the numerical level, one can exploit stability results to prove convergence of numerical schemes or to develop algorithms with real-time features. In
fact, based on a known nominal local solution of the optimization problem,
the solution of a nearby problem obtained by small variations of one or more
parameters is approximated by the solution of a typically simpler minimization
problem than the original one.
Motivated by these aspects, in the present paper we contribute to the
ongoing investigation of stability properties of PDE-constrained optimal control problems. Due to its importance in many applications in hydrodynamics, medicine, environmental or ocean sciences, our work is based on the following control constrained
optimal control problem for the transient Navier-Stokes equations, i.e., we aim to
minimize J(y, u) = (αQ /2) ∫₀ᵀ ∫_Ω |y − yQ |² dx dt + (αT /2) ∫_Ω |y(·, T ) − yT |² dx
                  + (αR /2) ∫₀ᵀ ∫_Ω | curl y|² dx dt + (γ/2) ∫₀ᵀ ∫_Ω |u|² dx dt        (1.1)
subject to the instationary Navier-Stokes system with distributed control u on a fixed
domain Ω ⊂ R2 given by
yt + (y · ∇)y − ν∆y + ∇π = u    in Q := Ω × (0, T ),        (1.2)
div y = 0                        in Q,                       (1.3)
y = 0                            on Σ := ∂Ω × (0, T ),       (1.4)
y(·, 0) = y0                     in Ω,                       (1.5)
and pointwise control constraints of the form
a(x, t) ≤ u(x, t) ≤ b(x, t)      in Q.                       (1.6)
In (1.1)–(1.6) we have ν, γ > 0, and αQ , αT , αR ≥ 0. Further, we assume that
the data yQ , yT and y0 are sufficiently smooth; for more details see the subsequent
sections. We frequently refer to (1.1)–(1.6) as (P).
The optimal control problem (P) and its solutions are considered to be functions
of a number of perturbation parameters, namely of the scalars αQ , αT , αR and desired
state functions yQ , yT appearing in the objective J, of the viscosity ν (the inverse
of the Reynolds number), and of the initial conditions y0 in the state equation. To
emphasize the dependence on such a parameter vector p, we also write (P(p)) instead
of (P). The main result of our paper states that under a coercivity condition on the
Hessian of the Lagrangian of (P(p∗ )), where p∗ denotes some nominal (or reference)
parameter, an optimal solution is directionally differentiable with respect to p ∈ B(p∗ )
with B(p∗ ) some sufficiently small neighborhood of p∗ . We also characterize this
derivative as the solution of a linear-quadratic optimal control problem which involves
the linearized Navier-Stokes equations as well as pointwise inequality constraints on
the control similar to (1.6). While this work is primarily concerned with analysis, in
a forthcoming paper we focus on the algorithmic implications alluded to above.
Let us relate our work to recent efforts in the field: On the one hand, optimal control
problems for the Navier-Stokes equations (without dependence on a parameter) have
received a formidable amount of attention in recent years. Here we only mention [5, 9]
for steady-state problems and [1, 10, 11, 14, 27] for the time-dependent case. On the
other hand, a number of stability results for solutions to a variety of control-constrained
optimal control problems have been developed recently. As in the present paper, these
analyses concern the behavior of optimal solutions under perturbations of finite or
infinite dimensional parameters in the problem. We refer to, e.g., [18,24] for Lipschitz
stability in optimal control of linear and semilinear parabolic equations, and [7,16] for
recent results on differentiability properties. Related results for linear elliptic problems
with nonlinear boundary control can be found in [17, 19]. Further, Lipschitz stability
for state-constrained elliptic optimal control problems is the subject of [8].
For optimal control problems involving the Navier-Stokes equations with distributed
control, Lipschitz stability results have been obtained in [22] for the steady-state and
in [12, 28] for the time-dependent case. However, differential stability results are still
missing and are the focus of the present paper.
It is known that both Lipschitz and differential stability hinge on the condition
of strong regularity of the first order necessary conditions at a nominal solution; see
Dontchev [6] and Remark 3.8 below. The strong regularity of such a system is a
consequence of a coercivity condition on the Hessian of the Lagrangian, which is closely
related to second order sufficient conditions; compare Remark 4.2. Strong regularity
is also the basis of convergence proofs for numerical algorithms; see [2] for the general
Lagrange-Newton method and [12] for an SQP semismooth Newton-type algorithm for
the control of the time-dependent Navier-Stokes equations.
The plan of the paper is as follows: Section 2 introduces some notation and the
function space setting used throughout the paper. In Section 3 we recall the first order
optimality system (OS) for our problem (P). We state the coercivity condition needed
(Assumption 3.4) to prove the strong regularity and to establish differential stability
results for a linearized version (LOS) of (OS) (see Theorem 3.9). Our main result
is given in Section 4: By an implicit function theorem for generalized equations, the
directional differentiability property carries over to the nonlinear optimality system
(OS), and the directional derivatives can be characterized. Additionally, we find
that our coercivity assumption implies the second order sufficient condition of [26],
which guarantees that critical points are indeed strict local optimizers. We proceed in
Section 5 by presenting Taylor expansions of the optimal value function about a given
nominal parameter value. Section 6 covers the case of the stationary Navier-Stokes
equations. Due to the similarity of the arguments involved, we only state the results
briefly.
2. Preliminaries
For the reader’s convenience we now collect the preliminaries for a proper analytical
formulation of our problem (P). Throughout, we assume that Ω ⊂ R2 is a bounded
domain with C 2 boundary ∂Ω. For given final time T > 0, we denote by Q the time-space cylinder Q = Ω × (0, T ) and by Σ its lateral boundary Σ = ∂Ω × (0, T ). We
begin with defining the spaces
H = closure in [L2 (Ω)]2 of {v ∈ [C0∞ (Ω)]2 : div v = 0}
V = closure in [H 1 (Ω)]2 of {v ∈ [C0∞ (Ω)]2 : div v = 0}.
These spaces form a Gelfand triple (see [23]): V ,→ H = H 0 ,→ V 0 , where V 0 denotes
the dual of V , and analogously for H 0 . Next we introduce the Hilbert spaces
Wqp = {v ∈ Lp (0, T ; V ) : vt ∈ Lq (0, T ; V 0 )},
endowed with the norm
kvkWqp = kvkLp (V ) + kvt kLq (V 0 ) .
We use W = W22 . Further, we define
H 2,1 = {v ∈ L2 (0, T ; H 2 (Ω) ∩ V ) : vt ∈ L2 (0, T ; H)},
endowed with the norm
kvkH 2,1 = kvkL2 (H 2 (Ω)) + kvt kL2 (L2 (Ω)) .
Here and elsewhere, vt refers to the distributional derivative of v with respect to the
time variable. For the sake of brevity, we simply write L2 (V ) instead of L2 (0, T ; V ),
etc.
Depending on the context, by h·, ·i we denote the duality pairing of either V and V 0 or
L2 (V ) and L2 (V 0 ), respectively. Additionally, by (·, ·) we denote the scalar products
of L2 (Ω) and L2 (Q). In the sequel, we will find it convenient to write L2 (Ω) or L2 (Q)
when we actually refer to [L2 (Ω)]2 or [L2 (Q)]2 , respectively.
In the following lemma, we recall some results about W and H 2,1 . The proofs can
be found in [4, 15, 20]; compare also [13]:
Lemma 2.1 (Properties of W and H 2,1 ).
(a) The space W is continuously embedded in the space C([0, T ]; H).
(b) The space W is compactly embedded in the space L2 (H) ⊆ L2 (Q).
(c) The space H 2,1 is continuously embedded in the space C([0, T ]; V ).
The time-dependent Navier-Stokes equations (1.2)–(1.5) are understood in their
weak form with divergence-free and boundary conditions incorporated in the space V .
That is, y ∈ W is a weak solution to the system (1.2)–(1.5) with given u ∈ L2 (V 0 ) if
and only if
yt + (y · ∇)y − ν∆y = u    in L2 (V 0 ),        (2.1)
y(·, 0) = y0               in H.                (2.2)
As usual, the pressure term ∇π cancels out due to the solenoidal, i.e., divergence-free,
function space setting. The following holds (compare [3, 23]):
Lemma 2.2 (Navier-Stokes Equations). For every y0 ∈ H and u ∈ L2 (V 0 ), there
exists a unique weak solution y ∈ W of (1.2)–(1.5). The map H × L2 (V 0 ) ∋ (y0 , u) 7→
y ∈ W is locally Lipschitz continuous. Likewise, for every y0 ∈ V and u ∈ L2 (Q),
there exists a unique weak solution y ∈ H 2,1 of (1.2)–(1.5). The map V × L2 (Q) ∋
(y0 , u) 7→ y ∈ H 2,1 is locally Lipschitz continuous.
For the linearized Navier-Stokes system, we have (compare [14]):
Lemma 2.3 (Linearized Navier-Stokes Equations). Assume that y ∗ ∈ W and let
f ∈ L2 (V 0 ) and g ∈ H. Then the linearized Navier-Stokes system
yt + (y∗ · ∇)y + (y · ∇)y∗ − ν∆y = f    in L2 (V 0 )
y(·, 0) = g                             in H
has a unique solution y ∈ W , which depends continuously on the data:
kykW ≤ c (kf kL2 (V 0 ) + kgkL2 (Ω) )
(2.3)
where the constant c is independent of f and g. Likewise, if y ∗ ∈ W ∩ L∞ (V ) ∩
L2 (H 2 (Ω)), f ∈ L2 (Q) and g ∈ V , then y ∈ H 2,1 holds with continuous dependence
on the data:
kykH 2,1 ≤ c (kf kL2 (Q) + kgkH 1 (Ω) ).
(2.4)
Subsequently, we need the following result for the adjoint system (see [14, Proposition 2.4]):
Lemma 2.4 (Adjoint Equation). Assume that y ∗ ∈ W ∩ L∞ (V ) and let f ∈ L2 (V 0 )
and g ∈ H. Then the adjoint equation
−λt + (∇y∗ )> λ − (y∗ · ∇)λ − ν∆λ = f    in W 0
λ(·, T ) = g                             in H
has a unique solution λ ∈ W , which depends continuously on the data:
kλkW ≤ c (kf kL2 (V 0 ) + kgkL2 (Ω) )
(2.5)
where c is independent of f and g.
Next we define the Lagrange function L : W × U × W → R of (P):
L(y, u, λ) = (αQ /2) ky − yQ k²L2 (Q) + (αT /2) ky(·, T ) − yT k²L2 (Ω) + (αR /2) k curl yk²L2 (Q) + (γ/2) kuk²L2 (Q)
             + ∫₀ᵀ hyt + (y · ∇)y − ν∆y, λi dt − (u, λ) + ∫_Ω (y(·, 0) − y0 ) λ(·, 0) dx        (2.6)
where we took care of the fact that the Lagrange multiplier belonging to the constraint
y(·, 0) = y0 is identical to λ(·, 0) ∈ H, which is the adjoint state at the initial time.
The Lagrangian is infinitely continuously differentiable and its second derivatives with
respect to y and u read
Lyy (y, u, λ)(y1 , y2 ) = αQ (y1 , y2 ) + αT (y1 (·, T ), y2 (·, T )) + αR (curl y1 , curl y2 )
                          + ∫_Q ((y1 · ∇)y2 ) λ dx dt + ∫_Q ((y2 · ∇)y1 ) λ dx dt,        (2.7)
Luu (y, u, λ)(u1 , u2 ) = γ (u1 , u2 ),
while Lyu and Luy vanish.
In order to complete the proper description of problem (P), we recall for y ∈ R2 the definition
curl y = ∂y2 /∂x − ∂y1 /∂y    and    curl curl y = ( ∂/∂y ( ∂y2 /∂x − ∂y1 /∂y ) , −∂/∂x ( ∂y2 /∂x − ∂y1 /∂y ) )> .
It is straightforward to check that for y ∈ W , curl y ∈ L2 (Q) and curl curl y ∈ L2 (V 0 ).
3. Differential Stability of the Linearized Optimality System
In the present section we recall the first order optimality system (OS) associated
with our problem (P). We reformulate it as a generalized equation (GE) and introduce
its linearization (LGE). Then we prove directional differentiability of the solutions to
the linearized generalized equation (LGE). By virtue of an implicit function theorem
for generalized equations due to Robinson [21] and Dontchev [6], the differentiability
property carries over to the solution map of the original nonlinear optimality system
(OS), as is detailed in Section 4.
Let us begin by specifying the analytical setting for our problem (P). To this end,
we define the control space U = L2 (Q) and the closed convex subset of admissible
controls
Uad = {u ∈ L2 (Q) : a(x, t) ≤ u(x, t) ≤ b(x, t) a.e. on Q} ⊂ U,
where a(x, t) and b(x, t) are the bounds in L2 (Q). The inequalities are understood
componentwise. This choice of the control space motivates the use of H 2,1 as the state space, provided the initial condition y0 is smooth enough. We can now write (P) in
the compact form
Minimize J(y, u) over H 2,1 × Uad subject to (2.1)–(2.2).
As announced earlier, we consider (P) in dependence on the parameter vector
p = (ν, αQ , αT , αR , γ, yQ , yT , y0 ) ∈ P = R5 × L2 (Q) × H × V,
which involves both quantities appearing in the objective function and in the governing
equations.
To ensure well-posedness of (P), we invoke the following assumption on p:
Assumption 3.1. We assume the viscosity parameter ν is positive and that the initial
conditions y0 are given in V . The weights in the objective satisfy αQ , αT , αR ≥ 0 and
γ > 0. Moreover, the desired trajectory and terminal states are yQ ∈ L2 (Q) and
yT ∈ H, respectively.
Under Assumption 3.1 it is standard to argue existence of a solution to (P); see,
e.g., [1]. A solution (y, u) ∈ H 2,1 × Uad is characterized by the following lemma.
Lemma 3.2 (Optimality System). Let Assumption 3.1 hold, and let (y, u) ∈ H 2,1 ×
Uad be a local minimizer of (P). Then there exists a unique adjoint state λ ∈ W such
that the following optimality system is satisfied:
− λt + (∇y)> λ − (y · ∇)λ − ν∆λ = −αQ (y − yQ ) − αR curl curl y    in W 0 ,
λ(·, T ) = −αT (y(·, T ) − yT )                                      in H,
∫_Q (γu − λ)(ū − u) dx dt ≥ 0                                       for all ū ∈ Uad ,        (OS)
yt + (y · ∇)y − ν∆y = u                                             in L2 (V 0 ),
y(·, 0) = y0                                                        in H.
As motivated in Section 2, we have stated the state and adjoint equations in their
weak form and in the solenoidal setting to eliminate the pressure π and the corresponding adjoint pressure.
In order to reformulate the optimality system (OS) as a generalized equation we
introduce the set-valued mapping N3 (u) : L2 (Q) → L2 (Q) as the dual cone of the set
of admissible controls Uad at u, i.e.,
N3 (u) = {v ∈ L2 (Q) : (v, ū − u) ≤ 0 for all ū ∈ Uad }        (3.1)
if u ∈ Uad , and N3 (u) = ∅ in case u ∉ Uad . It is easily seen that the variational
inequality in (OS) is equivalent to
0 ∈ γu − λ + N3 (u).
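For illustration only (this computation is not part of the paper), the inclusion above can be checked pointwise: with box constraints, 0 ∈ γu − λ + N3 (u) holds at a point if and only if u coincides there with the projection of λ/γ onto [a, b]. A minimal NumPy sketch with made-up sample values:

```python
import numpy as np

# Illustrative check (not from the paper): pointwise, 0 ∈ γu − λ + N3(u) with
# box constraints a ≤ u ≤ b is equivalent to u = P_[a,b](λ/γ).  All data below
# are made-up sample values at a few points of Q.
rng = np.random.default_rng(1)
gamma = 0.5
a, b = -1.0, 1.0                      # constant bounds for simplicity
lam = rng.normal(scale=2.0, size=8)   # sampled adjoint values λ(x, t)

u = np.clip(lam / gamma, a, b)        # candidate control from the projection

# Verify the variational inequality (γu − λ)(v − u) ≥ 0 for admissible test values v.
g = gamma * u - lam
for v in np.linspace(a, b, 21):
    assert np.all(g * (v - u) >= -1e-12)
print("projection formula satisfies the pointwise variational inequality")
```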
Next we introduce the set-valued mapping
N (u) = (0, 0, N3 (u), 0, 0)>
and define F = (F1 , F2 , F3 , F4 , F5 )> as
F1 (y, u, λ, p) = − λt + (∇y)> λ − (y · ∇)λ − ν∆λ
+ αQ (y − yQ ) + αR curl curl y,
F2 (y, u, λ, p) = λ(·, T ) + αT (y(·, T ) − yT ),
F3 (y, u, λ, p) = γu − λ,
F4 (y, u, λ, p) = yt + (y · ∇)y − ν∆y − u,
F5 (y, u, λ, p) = y(·, 0) − y0        (3.2)
with
F : H 2,1 × U × W × P → L2 (V 0 ) × H × L2 (Q) × L2 (Q) × V.
Note that the parameter p appears as an additional argument. The optimality system
(OS) can now be rewritten as the generalized equation
0 ∈ F(y, u, λ, p) = F (y, u, λ, p) + N (u).        (GE)
Note that F(·, p) is a C¹ function; compare [12].
From now on, let p∗ denote a reference (or nominal) parameter with associated
solution (y ∗ , u∗ , λ∗ ). Our goal is to show that the solution map p 7→ (yp , up , λp ) for
(GE) is well-defined near p∗ and that it is directionally differentiable at p∗ . By the
work of Robinson [21] and Dontchev [6], it is sufficient to show that the solutions to
the linearized generalized equation
δ ∈ F(y∗ , u∗ , λ∗ , p∗ ) + F ′ (y∗ , u∗ , λ∗ , p∗ ) (y − y∗ , u − u∗ , λ − λ∗ )> + N (u)        (LGE)
have these properties for sufficiently small δ. This fact is appealing since one has to
deal with a linearization of F instead of the fully nonlinear system. In addition, one
only needs to consider perturbations δ which, unlike p, appear solely on the left hand
side of the equation. Note that F is the gradient of the Lagrangian L (see (2.6)), and
F ′ , the derivative with respect to (y, u, λ), is its Hessian.
Throughout this section we work under the following assumption:
Assumption 3.3. Let p∗ = (ν∗ , αQ∗ , αT∗ , αR∗ , γ∗ , yQ∗ , yT∗ , y0∗ ) ∈ P = R5 × L2 (Q) × H × V be a given reference or nominal parameter such that Assumption 3.1 is satisfied. Moreover, let (y∗ , u∗ , λ∗ ) be a given nominal solution to the first order necessary conditions (OS).
A short calculation shows that the linearized generalized equation (LGE) is identical to the system
− λt + (∇y∗ )> λ − (y∗ · ∇)λ − ν∗ ∆λ = − αQ∗ (y − yQ∗ ) − αR∗ curl curl y − (∇(y − y∗ ))> λ∗ + ((y − y∗ ) · ∇)λ∗ + δ1    in W 0 ,
λ(·, T ) = − αT∗ (y(·, T ) − yT∗ ) + δ2    in H,
∫_Q (γ∗ u − λ − δ3 )(ū − u) dx dt ≥ 0    for all ū ∈ Uad ,        (LOS)
yt + (y∗ · ∇)y + (y · ∇)y∗ − ν∗ ∆y = u + δ4 + (y∗ · ∇)y∗    in L2 (V 0 ),
y(·, 0) = y0∗ + δ5    in H.
In turn, (LOS) can be interpreted as the first order optimality system for the linear
quadratic problem (AQP(δ)), depending on δ:
Minimize  (αQ∗ /2) ∫₀ᵀ ∫_Ω |y − yQ∗ |² dx dt + (αT∗ /2) ∫_Ω |y(·, T ) − yT∗ |² dx
          + (αR∗ /2) ∫₀ᵀ ∫_Ω | curl y|² dx dt + (γ∗ /2) ∫₀ᵀ ∫_Ω |u|² dx dt − hδ1 , yiL2 (V 0 ),L2 (V )
          − (δ2 , y(·, T )) − (δ3 , u) + ∫₀ᵀ ∫_Ω ((y − y∗ ) · ∇)(y − y∗ ) λ∗ dx dt
subject to the linearized Navier-Stokes system given above in (LOS) and u ∈ Uad .
Note that the nominal solution (y ∗ , u∗ , λ∗ ) satisfies both the nonlinear optimality
system (OS) and the linearized optimality system (LOS) for δ = 0.
The following coercivity condition is crucial for proving Lipschitz continuity and
directional differentiability of the function δ 7→ (yδ , uδ , λδ ) which maps a perturbation
δ to a solution of (AQP(δ)):
Assumption 3.4 (Coercivity). Suppose that there exists ρ > 0 such that the coercivity condition
Υ(y, u) := (αQ∗ /2) kyk²L2 (Q) + (αT∗ /2) ky(·, T )k²L2 (Ω) + (αR∗ /2) k curl yk²L2 (Q) + (γ∗ /2) kuk²L2 (Q)
           + ∫₀ᵀ ∫_Ω ((y · ∇)y) λ∗ dx dt ≥ ρ kuk²L2 (Q)        (3.3)
holds at least for all u = u1 − u2 where u1 , u2 ∈ Uad , i.e., for all u ∈ L2 (Q) which
satisfy |u(x, t)| ≤ b(x, t) − a(x, t) a.e. on Q (in the componentwise sense), and for the
corresponding states y ∈ H 2,1 satisfying the linear PDE
yt + (y ∗ · ∇)y + (y · ∇)y ∗ − ν ∗ ∆y = u
y(·, 0) = 0
in L2 (V 0 ) ,
(3.4)
in H.
(3.5)
Remark 3.5 (Strict Convexity).
Let C = {(y, u) | u ∈ Uad , y satisfies (3.4)–(3.5)}. The Coercivity Assumption 3.4
immediately implies that C ∋ (y, u) 7→ Υ(y, u) is strictly convex over C. Since the
quadratic part of the objective (3.3) in (AQP(δ)) coincides with Υ, (3.3) is also
strictly convex over C. The same holds for the objective (3.7) in the auxiliary problem
(DQP(δ̂)) below so that the strict convexity will allow us to conclude uniqueness of
the sensitivity derivative in the proof of Theorem 3.9 later on.
Finally, we notice that Υ(y, u) is equal to (1/2) Lxx (y∗ , u∗ , λ∗ )(x, x) with p = p∗ and
x = (y, u, λ); compare (2.7).
Remark 3.6 (Smallness of the Adjoint). Obviously the only term in (3.3) which
can spoil the coercivity condition is the term involving λ∗ , which originates from the
state equation’s nonlinearity. Hence, for the coercivity condition to be satisfied, it is
sufficient that the nominal adjoint variable λ∗ is sufficiently small in an appropriate
norm. In fact, for λ∗ = 0 condition (3.3) holds with ρ = γ ∗ /2 > 0.
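To make Remark 3.6 concrete, one can inspect a finite-dimensional analogue of (3.3) after discretization. The following sketch is purely illustrative: the matrices are placeholders standing in for a discretized solution operator of (3.4)–(3.5), the α∗-weighted state terms, and the λ∗-dependent trilinear term; none of them are taken from the paper.

```python
import numpy as np

# Toy finite-dimensional analogue of condition (3.3), not taken from the paper:
# after discretization one may think of Υ as a quadratic form u ↦ u^T H u with
# H = S^T Q S + (γ*/2) I + B, where S is a hypothetical discrete control-to-state
# map realizing (3.4)-(3.5), Q collects the α*-weighted state terms and B the
# trilinear term involving λ*.  In line with Remark 3.6, a small B (small adjoint
# λ*) cannot destroy the γ*/2 contribution.
rng = np.random.default_rng(0)
n = 30
S = 0.1 * rng.standard_normal((n, n))     # placeholder discrete solution operator
Q = np.eye(n)                             # placeholder state weighting
gamma_star = 1.0
B = 0.01 * rng.standard_normal((n, n))    # placeholder term stemming from λ*

H = S.T @ Q @ S + 0.5 * gamma_star * np.eye(n) + 0.5 * (B + B.T)
rho = np.linalg.eigvalsh(0.5 * (H + H.T)).min()
print(f"smallest eigenvalue (toy coercivity constant): {rho:.4f}")
print("toy analogue of (3.3) holds:", rho > 0)
```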
A first consequence of the coercivity assumption is the Lipschitz continuity of the
map δ 7→ (yδ , uδ , λδ ). We refer to [25] for the Burgers equation, to [22] for the stationary Navier-Stokes equations and to [12, 28] for the instationary case.
Lemma 3.7 (Lipschitz Stability). Under Assumptions 3.3 and 3.4, there exists a
unique solution (yδ , uδ , λδ ) to (LOS) and thus to (LGE) for every δ. The mapping
δ 7→ (yδ , uδ , λδ ) is Lipschitz continuous from L2 (V 0 ) × H × L2 (Q) × L2 (Q) × V to
H 2,1 × U × W .
Remark 3.8 (Strong Regularity). The Lipschitz stability property established by Lemma
3.7 above is called strong regularity of the generalized equation (GE) at the nominal
critical point (y ∗ , u∗ , λ∗ , p∗ ). Strong regularity implies that the Lipschitz continuity
and differentiability properties of the map δ 7→ (yδ , uδ , λδ ) are inherited by the map
p 7→ (yp , up , λp ) in view of the implicit function theorem for generalized equations;
see [21] and [6]. This is utilized below in Section 4. Note that in the absence of control constraints, the operator N (u) is identical to {0}, and strong regularity becomes
bounded invertibility of the Hessian of the Lagrangian F ′ , which is also required by
the classical implicit function theorem.
To study the directional differentiability of the map δ 7→ (yδ , uδ , λδ ), we introduce
the following definitions: At the nominal solution (y ∗ , u∗ , λ∗ ), we define (up to sets of
measure zero)
Q⁻ = {(x, t) ∈ Q : u∗ (x, t) = a(x, t)} and
Q⁺ = {(x, t) ∈ Q : u∗ (x, t) = b(x, t)}
collecting the points where the constraint u∗ ∈ Uad is active. We again point out that
indeed there is one such set for each component of u, but we can continue to use our
notation without ambiguity. From the variational inequality in (OS) one infers that
γu − λ ∈ L2 (Q) acts as a Lagrange multiplier for the constraint u ∈ Uad . Hence we
define the sets
Q₀⁺ = {(x, t) ∈ Q : (γ∗ u∗ − λ∗ )(x, t) > 0} and
Q₀⁻ = {(x, t) ∈ Q : (γ∗ u∗ − λ∗ )(x, t) < 0}
where the constraint is said to be strongly active. Note that Q₀⁺ ⊂ Q⁻ and Q₀⁻ ⊂ Q⁺ hold true. Finally, we set
Ûad = {u ∈ L2 (Q) : u ≥ 0 on Q⁻ , u ≤ 0 on Q⁺ , u = 0 on Q₀⁺ ∪ Q₀⁻ }.        (3.6)
The set Ûad contains the admissible control variations (see Theorem 3.9 below) and reflects the fact that on Q⁻ , where the nominal control u∗ is equal to the lower bound a, any admissible sequence of controls can approach it only from above; analogously for Q⁺ . In addition, the control variation is zero to first order on the strongly active subsets Q₀⁻ and Q₀⁺ .
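The following sketch (illustrative only, with made-up discrete data) shows how the sets Q⁻, Q⁺, Q₀⁺, Q₀⁻ and the admissibility test for a variation in Ûad would look for a control sampled on a grid:

```python
import numpy as np

# Illustration only (made-up discrete data): classify grid points of Q into the
# active sets Q-, Q+ and the strongly active subsets, and test whether a control
# variation belongs to the cone Ûad from (3.6).
rng = np.random.default_rng(2)
n = 12
a = -np.ones(n); b = np.ones(n)                 # bounds a(x,t), b(x,t) on a grid
lam = rng.normal(scale=1.5, size=n)             # sampled nominal adjoint λ*
gamma_star = 1.0
u_star = np.clip(lam / gamma_star, a, b)        # nominal control from the projection
mult = gamma_star * u_star - lam                # multiplier γ*u* − λ*

Q_minus = np.isclose(u_star, a)                 # lower bound active
Q_plus = np.isclose(u_star, b)                  # upper bound active
Q0_plus = mult > 0                              # strongly active (multiplier > 0)
Q0_minus = mult < 0                             # strongly active (multiplier < 0)

def in_U_hat_ad(v, tol=1e-12):
    """Check membership of a variation v in the cone Ûad of (3.6)."""
    return (np.all(v[Q_minus] >= -tol) and np.all(v[Q_plus] <= tol)
            and np.all(np.abs(v[Q0_plus | Q0_minus]) <= tol))

v = np.where(Q0_plus | Q0_minus, 0.0,
             np.where(Q_minus, 0.3, np.where(Q_plus, -0.3, 0.7)))
print("admissible variation:", in_U_hat_ad(v))
```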
We now turn to the main result of this section, which is to prove directional differentiability of the map δ 7→ (yδ , uδ , λδ ). This extends the proof of Lipschitz stability of
the same map in [12,22,28]. It turns out that the coercivity Assumption 3.4 is already
sufficient to obtain our new result.
Subsequently we denote by "→" convergence with respect to the strong topology and by "⇀" convergence with respect to the weak topology.
Theorem 3.9. Under Assumptions 3.3 and 3.4, the mapping δ 7→ (yδ , uδ , λδ ) is directionally differentiable at δ = 0. The derivative in the direction of δ̂ = (δ̂1 , δ̂2 , δ̂3 , δ̂4 , δ̂5 )> ∈
L2 (V 0 ) × H × L2 (Q) × L2 (Q) × V is given by the unique solution (ŷ, û) ∈ H 2,1 × U
and adjoint variable λ̂ ∈ W of the linear-quadratic problem (DQP(δ̂))
Minimize  (αQ∗ /2) ∫₀ᵀ ∫_Ω |y|² dx dt + (αT∗ /2) ∫_Ω |y(·, T )|² dx
          + (αR∗ /2) ∫₀ᵀ ∫_Ω | curl y|² dx dt + (γ∗ /2) ∫₀ᵀ ∫_Ω |u|² dx dt − hδ̂1 , yiL2 (V 0 ),L2 (V )
          − (δ̂2 , y(·, T )) − (δ̂3 , u) + ∫₀ᵀ ∫_Ω ((y · ∇)y) λ∗ dx dt        (3.7)
subject to the linearized Navier-Stokes system
yt + (y · ∇)y∗ + (y∗ · ∇)y − ν∗ ∆y = u + δ̂4    in L2 (V 0 ),        (3.8)
y(·, 0) = δ̂5                                   in H,
and u ∈ Ûad . Its first order conditions are
− λt + (∇y∗ )> λ − (y∗ · ∇)λ − ν∗ ∆λ = − αQ∗ y − αR∗ curl curl y − (∇y)> λ∗ + (y · ∇)λ∗ + δ̂1    in W 0 ,        (3.9)
λ(·, T ) = − αT∗ y(·, T ) + δ̂2    in H,
∫_Q (γ∗ u − λ − δ̂3 )(ū − u) dx dt ≥ 0    for all ū ∈ Ûad ,        (3.10)
plus the state equation (3.8).
Proof. Let δ̂ ∈ L2 (V 0 ) × H × L2 (Q) × L2 (Q) × V be any given direction of perturbation
and let {τn } be a sequence of real numbers such that τn ↘ 0. We set δn = τn δ̂ and
denote the solution of (AQP(δn )) by (yn , un , λn ). Note that (y ∗ , u∗ , λ∗ ) is the solution
of (AQP(0)). Then, by virtue of Lemma 3.7, we have
k(yn − y∗ )/τn kH 2,1 + k(un − u∗ )/τn kL2 (Q) + k(λn − λ∗ )/τn kW ≤ L kδ̂k        (3.11)
with some Lipschitz constant L > 0. Since H 2,1 is a Hilbert space, we can extract a
weakly convergent subsequence (still denoted by index n) and use compactness of the
embedding of H 2,1 into L2 (Q) (see Lemma 2.1) to obtain:
(yn − y∗ )/τn ⇀ ŷ    in H 2,1    and    (yn − y∗ )/τn → ŷ    in L2 (Q)        (3.12)
for some ŷ ∈ H 2,1 . In the case of λ, the same argument with H 2,1 replaced by W
applies and we obtain
(λn − λ∗ )/τn ⇀ λ̂    in W    and    (λn − λ∗ )/τn → λ̂    in L2 (Q)        (3.13)
for some λ̂ ∈ W . By taking yet another subsequence in (3.12) and (3.13), the convergence can be taken to hold pointwise almost everywhere in Q. Let us now denote by
PUad (u) the pointwise projection of any function u onto the admissible set Uad . From
the variational inequality in (LOS) it follows that
un = PUad ( (1/γ∗ )(λn + τn δ̂3 ) ) ∈ Uad .
Following the technique in [7, 16], by distinguishing the cases of inactive, active and
strongly active control, one shows that the pointwise limit in the control component
is
û = PÛad ( (1/γ∗ )(λ̂ + δ̂3 ) ) ∈ Ûad .
By Lebesgue’s Dominated Convergence Theorem with a suitable upper bound (see [7]),
we obtain the strong convergence in the control component:
(un − u∗ )/τn → û    in L2 (Q).        (3.14)
Now we prove that the limit ŷ introduced in (3.12) satisfies the state equation (3.8),
i.e.,
ŷt + (y∗ · ∇)ŷ + (ŷ · ∇)y∗ − ν∗ ∆ŷ = û + δ̂4    in L2 (V 0 ),        (3.15)
ŷ(·, 0) = δ̂5                                   in H.                (3.16)
Recalling the linear state equation in (LOS), we observe that the quotient qn =
(yn − y ∗ )/τn satisfies
(qn )t + (y∗ · ∇)qn + (qn · ∇)y∗ − ν∗ ∆qn = (un − u∗ )/τn + δ̂4    in L2 (V 0 )
whose left and right hand sides converge weakly in L2 (Q) to (3.15) since the left hand
side maps qn ∈ H 2,1 to an element of L2 (Q), linearly and continuously. Likewise,
(3.16) is satisfied. Similarly, one proves that the limit λ̂ satisfies (3.9). To complete
the proof, we need to show that the convergence in (3.12) and (3.13) is strong in H 2,1
and W , respectively. To this end, note that (yn − y ∗ )/τn − ŷ satisfies the linear state
equation (3.15) with û replaced by (un − u∗ )/τn − û and δ̂4 replaced by zero. The a
priori estimate (2.4) now yields the desired convergence as the right hand side tends
to zero in L2 (Q), i.e., we have
(yn − y∗ )/τn → ŷ    in H 2,1 .        (3.17)
By a similar argument for the adjoint equation (3.9), using the a priori estimate (2.5),
we find
(λn − λ∗ )/τn → λ̂    in W.        (3.18)
We recall that so far the convergence only holds for a subsequence. However, the whole
argument remains valid if in the beginning, one starts with an arbitrary subsequence
of {τn }. Then the limit (ŷ, û, λ̂) again satisfies the first order optimality system (3.8)–
(3.10). Since the critical point is unique in view of the strict convexity of the objective
(3.7) guaranteed by Coercivity Assumption 3.4 and Remark 3.5, this limit is always
the same, regardless of the initial subsequence. Hence the convergence in (3.14), (3.17)
and (3.18) extends to the whole sequence, which proves that (ŷ, û, λ̂) is the desired
directional derivative.
Finally, it is straightforward to verify that (3.8)–(3.10) are the first order conditions
for the linear-quadratic problem (DQP(δ̂)).
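The pointwise mechanism exploited in the proof, namely the directional differentiability of the projection onto [a, b], can be observed in a one-dimensional example. The numbers below are illustrative only; this is not part of the paper's argument.

```python
import numpy as np

# One-dimensional illustration (not from the paper): u(δ) = P_[a,b]((λ + δ)/γ) is
# directionally differentiable but in general only positively homogeneous.  Here
# λ/γ sits exactly at the lower bound without strict complementarity, so the
# derivative in direction +1 is 1/γ while the derivative in direction -1 is 0.
a, b, gamma = 0.0, 1.0, 2.0
lam = gamma * a                                   # borderline case: λ/γ = a, multiplier = 0

def u(delta):
    return np.clip((lam + delta) / gamma, a, b)

for direction in (+1.0, -1.0):
    taus = np.array([1e-1, 1e-2, 1e-3, 1e-4])
    quotients = (u(taus * direction) - u(0.0)) / taus
    print(f"direction {direction:+.0f}: difference quotients -> {quotients}")
# The limits are 1/γ = 0.5 and 0.0, respectively, i.e. the map is not linear in δ.
```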
4. Differential Stability of the Nonlinear Optimality System
By the implicit function theorems for generalized equations [6, 21], the properties
of the solutions for the linearized optimality system (LOS) carry over to the solutions of the nonlinear optimality system (OS). In [22] and [28], this was exploited to
show Lipschitz stability of the map p 7→ (yp , up , λp ) by proving the same property for
δ 7→ (yδ , uδ , λδ ), in the presence of the stationary and instationary Navier-Stokes equations, respectively. We can now continue this analysis and prove that both Lipschitz
continuity and directional differentiability hold. Our main result is:
Theorem 4.1. Under Assumptions 3.3 and 3.4, there is a neighborhood B(p∗ ) of p∗
such that for all p ∈ B(p∗ ) there exists a solution (yp , up , λp ) to the first order conditions (OS) of the perturbed problem (P(p)). This solution is unique in a neighborhood
of (y ∗ , u∗ , λ∗ ). The optimal control u, the corresponding state y and the adjoint variable λ are Lipschitz continuous functions of p in B(p∗ ) and directionally differentiable
at p∗ . In the direction of
p̂ = (ν̂, α̂Q , α̂T , α̂R , γ̂, ŷQ , ŷT , ŷ0 ) ∈ P = R5 × L2 (Q) × H × V,
this derivative is given by the unique solution (ŷ, û) ∈ H 2,1 × Ûad and the adjoint variable of the linear-quadratic problem (DQP(δ̂)) in the direction
δ̂ = (δ̂1 , δ̂2 , δ̂3 , δ̂4 , δ̂5 )> = −Fp (y∗ , u∗ , λ∗ , p∗ ) p̂
  = ( ν̂∆λ∗ − α̂Q (y∗ − yQ∗ ) + αQ∗ ŷQ − α̂R curl curl y∗ ,
      −α̂T (y∗ (·, T ) − yT∗ ) + αT∗ ŷT ,
      −γ̂ u∗ ,
      ν̂∆y∗ ,
      ŷ0 )> .        (4.1)
Proof. For the local uniqueness of the solution (yp , up , λp ) and its Lipschitz continuity,
it is enough to verify that F is Lipschitz with respect to p near p∗ , uniformly in a
neighborhood of (y ∗ , u∗ , λ∗ ). For instance, for F1 we have (see (3.2))
kF1 (y, u, λ, p1 ) − F1 (y, u, λ, p2 )kL2 (V 0 )
  ≤ |ν¹ − ν²| k∆λkL2 (V 0 ) + |αQ¹ − αQ²| kykL2 (Q) + |αQ¹| kyQ¹ − yQ²kL2 (Q)
    + |αQ¹ − αQ²| kyQ²kL2 (Q) + |αR¹ − αR²| k curl curl ykL2 (V 0 )
  ≤ L kp1 − p2 k,
where L depends on the diameters of the neighborhoods of (y ∗ , u∗ , λ∗ ) and p∗ only.
The claim now follows from the implicit function theorem for generalized equations,
see Dontchev [6, Theorem 2.4]. Directional differentiability follows from the same
theorem, since it is easily seen that F is Fréchet differentiable with respect to p. The next remark clarifies that the Coercivity Assumption 3.4 implies that a second
order sufficient optimality condition holds at the reference point (y ∗ , u∗ , λ∗ ), which,
thus, is a strict local minimizer.
Remark 4.2 (Second Order Sufficiency). Recently, second order sufficient optimality
conditions for (y ∗ , u∗ , λ∗ ) were proved in [26]. One of these conditions requires that
(αQ∗ /2) kyk²L2 (Q) + (αT∗ /2) ky(·, T )k²L2 (Ω) + (αR∗ /2) k curl yk²L2 (Q) + (γ∗ /2) kuk²L2 (Q)
  + ∫₀ᵀ ∫_Ω ((y · ∇)y) λ∗ dx dt ≥ ρ kuk²Lq (Q)        (4.2)
with q = 4/3 and some ρ > 0 holds for all pairs (y, u) where y solves (3.4) and u ∈ L2 (Q) satisfies u = ū − u∗ with ū ∈ Uad . Additionally, u may be chosen zero on so-called ε-strongly active subsets of Ω.
Hence, any such u is in Uad − Uad = {u1 − u2 | u1 , u2 ∈ Uad }. Consequently, Assumption 3.4 implies that (4.2) holds for all q ≤ 2, and, by [26, Theorem 4.12], there exist
α, β > 0 such that
J(y, u) ≥ J(y ∗ , u∗ ) + αku − u∗ k2L4/3 (Q)
holds for all admissible pairs with ku − u∗ kL2 (Q) ≤ β. In particular, (y ∗ , u∗ ) is a strict
local minimizer in the sense of L2 (Q).
Corollary 4.3 (Strict Local Optimality). As was already mentioned in [22, Corollary 3.5] for the stationary case, the Coercivity Assumption 3.4 and thus the second
order sufficient condition (4.2) are stable under small perturbation of p∗ . That is, (3.3)
continues to hold, possibly with a smaller ρ, if p∗ = (ν∗ , αQ∗ , αT∗ , αR∗ , γ∗ , yQ∗ , yT∗ , y0∗ ) in (3.3)–(3.4) is replaced by a parameter p sufficiently close to p∗ . As a consequence, possibly by shrinking the neighborhood B(p∗ ) of p∗ mentioned in Theorem 4.1, the corresponding (yp , up ) are strict local minimizers for the perturbed problems (P(p)).
Remark 4.4 (Strict Complementarity). Assume that û is the directional derivative of the nominal control u∗ for p = p∗ , in a given direction p̂. From the definition of Ûad in (3.6) it becomes evident that in general −û cannot be the directional derivative in the direction of −p̂ since it may not be admissible. That is, the directional derivative is in general not linear in the direction but only positively homogeneous. However, linearity does hold if the sets Q⁻ \ Q₀⁺ and Q⁺ \ Q₀⁻ are null sets, or, in other words, if strict complementarity holds at the nominal solution (y∗ , u∗ , λ∗ ).
Remark 4.5. Recall that by Assumption 3.3 one or more of the parameters αQ , αT
and αR may have a nominal value of zero. That is, every neighborhood of p∗ contains
parameter vectors with negative α entries. According to Corollary 4.3 however, the
terms associated to these negative α values are absorbed by the ρkuk2 term in the Coercivity Assumption 3.4 for small enough perturbations, so that the perturbed problems
remain locally convex.
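A practical consequence of Theorem 4.1, sketched here with made-up grid values rather than data from the paper, is a first-order prediction of the perturbed optimal control; the clipping to the bounds is added only as a numerical safeguard and is not part of the theorem.

```python
import numpy as np

# Sketch (not an algorithm from the paper): given the nominal control u* and the
# sensitivity û from Theorem 4.1, a first-order prediction of the perturbed optimal
# control is u* + τ·û.  u_star and u_hat below are made-up grid values.
def first_order_update(u_star, u_hat, tau, a, b):
    """Taylor-type prediction of the optimal control for the parameter p* + τ·p̂."""
    return np.clip(u_star + tau * u_hat, a, b)

u_star = np.array([0.0, 0.4, 1.0, 0.7])   # nominal optimal control on a coarse grid
u_hat = np.array([0.0, 0.3, -0.2, 0.5])   # directional derivative (sensitivity)
print(first_order_update(u_star, u_hat, tau=0.1, a=0.0, b=1.0))
```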
5. Taylor Expansions of the Minimum Value Function
This section is concerned with a Taylor expansion of the minimum value function
p 7→ Φ(p) = J(yp , up )
in a neighborhood of the nominal parameter p∗ . The following theorem proves that
DΦ(p∗ ; p̂) = (α̂Q /2) ky∗ − yQ∗ k²L2 (Q) − αQ∗ (y∗ − yQ∗ , ŷQ ) + (α̂T /2) ky∗ (·, T ) − yT∗ k²L2 (Ω)
             − αT∗ (y∗ (·, T ) − yT∗ , ŷT ) + (α̂R /2) k curl y∗ k²L2 (Q) + (γ̂/2) ku∗ k²L2 (Q)
             + ∫₀ᵀ ν̂ (∇y∗ , ∇λ∗ ) dt − ∫_Ω ŷ0 λ∗ (·, 0) dx        (5.1)

D²Φ(p∗ ; p̄, p̂) = α̂Q (y∗ − yQ∗ , ȳ − ȳQ ) − αQ∗ (ŷQ , ȳ − ȳQ ) − ᾱQ (y∗ − yQ∗ , ŷQ )
               + α̂T (y∗ (·, T ) − yT∗ , ȳ(·, T ) − ȳT ) − αT∗ (ŷT , ȳ(·, T ) − ȳT ) − ᾱT (y∗ (·, T ) − yT∗ , ŷT )
               + α̂R (curl y∗ , curl ȳ) + γ̂ (u∗ , ū)
               + ∫₀ᵀ ν̂ (∇ȳ, ∇λ∗ ) + ν̂ (∇y∗ , ∇λ̄) dt − ∫_Ω ŷ0 λ̄(·, 0) dx        (5.2)

are its first and second order directional derivatives. Here,
p̂ = (ν̂, α̂Q , α̂T , α̂R , γ̂, ŷQ , ŷT , ŷ0 ) ∈ P = R5 × L2 (Q) × H × V
and similarly p̄ denote two given directions, and (ŷ, û, λ̂) and (ȳ, ū, λ̄) are the directional derivatives of the nominal solution in p∗ in the directions of p̂ and p̄, respectively, according to Theorem 4.1.
Theorem 5.1. The minimum value function possesses the Taylor expansion
Φ(p∗ + τ p̂) = Φ(p∗ ) + τ DΦ(p∗ ; p̂) + (τ²/2) D²Φ(p∗ ; p̂, p̂) + o(τ²)        (5.3)
with the first and second directional derivatives given by (5.1)–(5.2).
Proof. It is known that the first order derivative of the value function equals the partial
derivative of the Lagrangian (2.6) with respect to the parameter, i.e., DΦ(p∗ ; p̂) =
Lp (y ∗ , u∗ , λ∗ , p∗ )(p̂); see, e.g., [16], which proves (5.1). For the second derivative, one
has to compute the total derivative of (5.1) with respect to p, which yields (5.2). The
estimate (5.3) then follows from the Taylor formula.
Remark 5.2. From (5.1) we conclude that a first order Taylor expansion can be easily
obtained without computing the sensitivity differentials (ŷ, û, λ̂).
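In practice, (5.3) can be evaluated without re-solving the perturbed problem once Φ(p∗) and the directional derivatives are known. The following sketch uses hypothetical numbers in place of the quantities from (5.1)–(5.2):

```python
# Sketch of how (5.3) can be used in practice (illustrative numbers, not from the
# paper): once Φ(p*), DΦ(p*; p̂) and D²Φ(p*; p̂, p̂) have been evaluated via
# (5.1)-(5.2), the optimal value of a perturbed problem is predicted without
# re-solving it.
def predicted_value(phi0, dphi, d2phi, tau):
    """Second-order Taylor prediction of Φ(p* + τ·p̂) according to (5.3)."""
    return phi0 + tau * dphi + 0.5 * tau**2 * d2phi

phi0, dphi, d2phi = 3.21, -0.45, 0.08     # hypothetical values of Φ and its derivatives
for tau in (0.05, 0.1, 0.2):
    print(tau, predicted_value(phi0, dphi, d2phi, tau))
```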
6. Optimal Control of the Stationary Navier-Stokes Equations
In this section we briefly comment on the case of distributed control for the stationary Navier-Stokes equations. Due to the similarity of the arguments, we only give
the main results and the formulas.
First of all, our problem (P) now reads:
Minimize J(y, u) = (αΩ /2) ∫_Ω |y − yΩ |² dx + (αR /2) ∫_Ω | curl y|² dx + (γ/2) ∫_Ω |u|² dx
subject to the stationary Navier-Stokes system with distributed control u:
(y · ∇)y − ν∆y + ∇π = u    in Ω,
div y = 0                  in Ω,
y = 0                      on ∂Ω,
and control constraints u ∈ Uad , where
Uad = {u ∈ L2 (Ω) : a(x) ≤ u(x) ≤ b(x) a.e. on Ω} ⊂ U = L2 (Ω).
The parameter vector reduces to
p = (ν, αΩ , αR , γ, yΩ ) ∈ P = R4 × L2 (Ω).
Again, the Navier-Stokes system is understood in weak form, i.e.,
(y · ∇)y − ν∆y = u    in V 0 .
The Lagrangian in the stationary case reads
L(y, u, λ) = (αΩ /2) ky − yΩ k²L2 (Ω) + (αR /2) k curl yk²L2 (Ω) + (γ/2) kuk²L2 (Ω) + h(y · ∇)y − ν∆y, λi − (u, λ).
The first order optimality system is given by
(∇y)> λ − (y · ∇)λ − ν∆λ = −αΩ (y − yΩ ) − αR curl curl y    in V 0 ,
∫_Ω (γu − λ)(ū − u) dx ≥ 0    for all ū ∈ Uad ,        (OS)
(y · ∇)y − ν∆y = u    in V 0 ,
and F : V × U × V × P → V 0 × L2 (Ω) × V 0 now reads:
F1 (y, u, λ, p) = (∇y)> λ − (y · ∇)λ − ν∆λ + αΩ (y − yΩ ) + αR curl curl y
F2 (y, u, λ, p) = γu − λ
F3 (y, u, λ, p) = (y · ∇)y − ν∆y − u.
The conditions paralleling Assumptions 3.3 and 3.4 are:
Assumption 6.1 (Nominal Point). Let p∗ = (ν∗ , αΩ∗ , αR∗ , γ∗ , yΩ∗ ) ∈ P = R4 × L2 (Ω) be a given reference or nominal parameter such that αΩ∗ , αR∗ ≥ 0 and γ∗ > 0 hold and yΩ∗ ∈ L2 (Ω). Moreover, let (y∗ , u∗ , λ∗ ) be a given solution to the first order necessary conditions (OS), termed a nominal solution.
Assumption 6.2 (Coercivity). Suppose that there exists ρ > 0 such that the coercivity
condition
(αΩ∗ /2) kyk²L2 (Ω) + (αR∗ /2) k curl yk²L2 (Ω) + (γ∗ /2) kuk²L2 (Ω) + ∫_Ω ((y · ∇)y) λ∗ dx ≥ ρ kuk²L2 (Ω)        (6.1)
holds for all u ∈ Uad − Uad ⊂ L2 (Ω), i.e., for all u ∈ L2 (Ω) which satisfy |u(x)| ≤
b(x) − a(x) a.e. on Ω (in the componentwise sense), and for the corresponding states
y ∈ V satisfying the linear PDE
(y∗ · ∇)y + (y · ∇)y∗ − ν∗ ∆y = u    in V 0 .        (6.2)
Under Assumptions 6.1 and 6.2, the results and remarks of Section 3 remain valid
with the obvious modifications. In particular, we have
Theorem 6.3. Under Assumptions 6.1 and 6.2, the mapping δ 7→ (yδ , uδ , λδ ) is directionally differentiable at δ = 0. The derivative in the direction of δ̂ = (δ̂1 , δ̂2 , δ̂3 )> ∈
V 0 × L2 (Ω) × V 0 is given by the unique solution (ŷ, û) ∈ V × U and adjoint variable
λ̂ ∈ V of the auxiliary QP problem (DQP(δ̂))
Minimize  (αΩ∗ /2) ∫_Ω |y|² dx + (αR∗ /2) ∫_Ω | curl y|² dx + (γ∗ /2) ∫_Ω |u|² dx − hδ̂1 , yi
          − (δ̂2 , u) + ∫_Ω ((y · ∇)y) λ∗ dx        (6.3)
subject to the stationary linearized Navier-Stokes system
(y · ∇)y∗ + (y∗ · ∇)y − ν∗ ∆y = u + δ̂3    in V 0        (6.4)
and u ∈ Ûad . Its first order conditions are
(∇y∗ )> λ − (y∗ · ∇)λ − ν∗ ∆λ = − αΩ∗ y − αR∗ curl curl y − (∇y)> λ∗ + (y · ∇)λ∗ + δ̂1    in V 0 ,
∫_Ω (γ∗ u − λ − δ̂2 )(ū − u) dx ≥ 0    for all ū ∈ Ûad ,
plus the linear state equation (6.4).
Also, results analogous to the ones of Section 4 remain valid. In particular, the
map p 7→ (yp , up , λp ) is directionally differentiable at p∗ with the derivative given by
the solution and adjoint variable of (DQP(δ̂)) in the direction of
δ̂ = (δ̂1 , δ̂2 , δ̂3 )> = −Fp (y∗ , u∗ , λ∗ , p∗ ) p̂ = ( ν̂∆λ∗ − α̂Ω (y∗ − yΩ∗ ) + αΩ∗ ŷΩ − α̂R curl curl y∗ ,  −γ̂ u∗ ,  ν̂∆y∗ )> .
Finally, the directional derivatives of the minimum value function are
DΦ(p∗ ; p̂) = (α̂Ω /2) ky∗ − yΩ∗ k²L2 (Ω) − αΩ∗ (y∗ − yΩ∗ , ŷΩ ) + (α̂R /2) k curl y∗ k²L2 (Ω)
             + (γ̂/2) ku∗ k²L2 (Ω) + ν̂ (∇y∗ , ∇λ∗ ),
D²Φ(p∗ ; p̄, p̂) = α̂Ω (y∗ − yΩ∗ , ȳ − ȳΩ ) − αΩ∗ (ŷΩ , ȳ − ȳΩ ) − ᾱΩ (y∗ − yΩ∗ , ŷΩ )
               + α̂R (curl y∗ , curl ȳ) + γ̂ (u∗ , ū) + ν̂ (∇ȳ, ∇λ∗ ) + ν̂ (∇y∗ , ∇λ̄).
Acknowledgments
The third author acknowledges support of the Sonderforschungsbereich 609 Elektromagnetische Strömungskontrolle in Metallurgie, Kristallzüchtung und Elektrochemie,
located at the Technische Universität Dresden, and supported by the German Research
Foundation.
References
[1] F. Abergel and R. Temam. On some optimal control problems in fluid mechanics. Theoretical
and Computational Fluid Mechanics, 1(6):303–325, 1990.
[2] W. Alt. The Lagrange-Newton method for infinite-dimensional optimization problems. Numerical
Functional Analysis and Optimization, 11:201–224, 1990.
[3] P. Constantin and C. Foias. Navier-Stokes Equations. The University of Chicago Press, Chicago,
1988.
[4] R. Dautray and J. L. Lions. Mathematical Analysis and Numerical Methods for Science and
Technology, volume 5. Springer, Berlin, 2000.
[5] M. Desai and K. Ito. Optimal Controls of Navier-Stokes Equations. SIAM Journal on Control
and Optimization, 32:1428–1446, 1994.
[6] A. Dontchev. Implicit function theorems for generalized equations. Mathematical Programming,
70:91–106, 1995.
[7] R. Griesse. Parametric sensitivity analysis in optimal control of a reaction-diffusion system—
Part I: Solution differentiability. Numerical Functional Analysis and Optimization, 25(1–2):93–
117, 2004.
[8] R. Griesse. Lipschitz stability of solutions to some state-constrained elliptic optimal control
problems. Journal of Analysis and its Applications, 25:435–455, 2006.
[9] M. Gunzburger, L. Hou, and T. Svobodny. Analysis and finite element approximation of optimal control problems for the stationary Navier-Stokes equations with distributed and Neumann
controls. Mathematics of Computation, 57(195):123–151, 1991.
[10] M. Gunzburger and S. Manservisi. Analysis and approximation of the velocity tracking problem for Navier-Stokes flows with distributed controls. SIAM Journal on Numerical Analysis,
37(5):1481–1512, 2000.
[11] M. Gunzburger and S. Manservisi. The velocity tracking problem for Navier-Stokes flows with
boundary control. SIAM Journal on Control and Optimization, 39(2):594–634, 2000.
[12] M. Hintermüller and M. Hinze. An SQP Semi-Smooth Newton-Type Algorithm Applied to the
Instationary Navier-Stokes System Subject to Control Constraints. SIAM Journal on Optimization, 16(4):1177–1200, 2006.
[13] M. Hinze. Optimal and instantaneous control of the instationary Navier–Stokes equations. Habilitation Thesis, Fachbereich Mathematik, Technische Universität Berlin, 2000.
[14] M. Hinze and K. Kunisch. Second order methods for optimal control of time-dependent fluid
flow. SIAM Journal on Control and Optimization, 40(3):925–946, 2001.
[15] J. L. Lions. Quelques méthodes de résolution des problèmes aux limites non linéaires. Dunod
Gauthier-Villars, Paris, 1969.
[16] K. Malanowski. Sensitivity analysis for parametric optimal control of semilinear parabolic equations. Journal of Convex Analysis, 9(2):543–561, 2002.
[17] K. Malanowski. Solution differentiability of parametric optimal control for elliptic equations. In
E. W. Sachs and R. Tichatschke, editors, System Modeling and Optimization XX, Proceedings
of the 20th IFIP TC 7 Conference, pages 271–285. Kluwer Academic Publishers, 2003.
[18] K. Malanowski and F. Tröltzsch. Lipschitz stability of solutions to parametric optimal control
for parabolic equations. Journal of Analysis and its Applications, 18(2):469–489, 1999.
[19] K. Malanowski and F. Tröltzsch. Lipschitz stability of solutions to parametric optimal control
for elliptic equations. Control and Cybernetics, 29:237–256, 2000.
[20] P. Neittaanmäki and D. Tiba. Optimal Control of Nonlinear Parabolic Systems. Marcel Dekker,
New York, 1994.
[21] S. Robinson. Strongly regular generalized equations. Mathematics of Operations Research,
5(1):43–62, 1980.
[22] T. Roubı́ček and F. Tröltzsch. Lipschitz stability of optimal controls for the steady-state Navier-Stokes equations. Control and Cybernetics, 32(3):683–705, 2003.
[23] R. Temam. Navier-Stokes Equations, Theory and Numerical Analysis. North-Holland, Amsterdam, 1984.
[24] F. Tröltzsch. Lipschitz stability of solutions of linear-quadratic parabolic control problems with
respect to perturbations. Dynamics of Continuous, Discrete and Impulsive Systems Series A
Mathematical Analysis, 7(2):289–306, 2000.
[25] F. Tröltzsch and S. Volkwein. The SQP method for control constrained optimal control of the
Burgers equation. ESAIM: Control, Optimisation and Calculus of Variations, 6:649–674, 2001.
[26] F. Tröltzsch and D. Wachsmuth. Second-order sufficient optimality conditions for the optimal
control of Navier-Stokes equations. ESAIM: Control, Optimisation and Calculus of Variations,
12(1):93–119, 2006.
[27] M. Ulbrich. Constrained optimal control of Navier-Stokes flow by semismooth Newton methods.
Systems and Control Letters, 48:297–311, 2003.
[28] D. Wachsmuth. Regularity and stability of optimal controls of instationary Navier-Stokes equations. Control and Cybernetics, 34:387–410, 2005.
4. Sensitivity Analysis for Optimal Boundary Control Problems of a 3D
Reaction-Diffusion System
R. Griesse and S. Volkwein: Parametric Sensitivity Analysis for Optimal Boundary
Control of a 3D Reaction-Diffusion System, in: Large-Scale Nonlinear Optimization,
G. Di Pillo and M. Roma (editors), volume 83 of Nonconvex Optimization and its
Applications, p.127–149, Springer, Berlin, 2006
This paper extends the previous stability and sensitivity analysis to a class of time-dependent semilinear parabolic boundary optimal control problems. More precisely, we
consider here the reaction-diffusion optimal control problem in three space dimensions:
Minimize  (β1 /2) kc1 (T ) − c1T k²L2 (Ω) + (β2 /2) kc2 (T ) − c2T k²L2 (Ω)
          + (γ/2) ku − ud k²L2 (0,T ) + (1/ε) max{ 0, ∫₀ᵀ u(t) dt − uc }³

subject to    c1,t = D1 ∆c1 − k1 c1 c2            in Q := Ω × (0, T ),
              c2,t = D2 ∆c2 − k2 c1 c2            in Q,
              D1 ∂c1 /∂n = 0                      on Σ := ∂Ω × (0, T ),
              D2 ∂c2 /∂n = u(t) α(x, t)           on Σc ,        (4.1)
              D2 ∂c2 /∂n = 0                      on Σn ,
              c1 (·, 0) = c10                     in Ω,
              c2 (·, 0) = c20                     in Ω,

and ua ≤ u ≤ ub a.e. in (0, T ).
Here, ci denotes the concentration of the ith reactant, and Di and ki are diffusion and
reaction constants.
The state equation, the optimal control problem and a primal-dual active set method
in function space had been analyzed previously by the authors in Griesse and Volkwein
[2005] and the extended preprint Griesse and Volkwein [2003].
In the paper under discussion, we establish the Lipschitz stability and directional
differentiability of local optimal solutions of (4.1) with respect to the perturbation
parameter
π = (Di , ki , βi , γ, uc , ε, ci0 , ciT , ud )i=1,2 ,
provided that second-order sufficient conditions hold. As before, we proceed by proving the Lipschitz stability and directional differentiability for the linearized optimality
system (Propositions 3.2 and 3.3 in the paper). The proof requires the compactness
of the spatial trace operator τ : W (0, T ) → L2 (0, T ; L2 (Γ)). The main result, Theorem 4.1, then follows from the Implicit Function Theorem 0.6.
Numerical results for the nominal, the perturbed and the sensitivity problems are
also provided, see Section 5 of the paper. In particular, sensitivity derivatives of the
optimal control and optimal state are calculated and interpreted.
PARAMETRIC SENSITIVITY ANALYSIS FOR OPTIMAL
BOUNDARY CONTROL OF A 3D REACTION-DIFFUSION
SYSTEM
ROLAND GRIESSE AND STEFAN VOLKWEIN
Abstract. A boundary optimal control problem for an instationary nonlinear
reaction-diffusion equation system in three spatial dimensions is presented. The
control is subject to pointwise control constraints and a penalized integral constraint. Under a coercivity condition on the Hessian of the Lagrange function,
an optimal solution is shown to be a directionally differentiable function of perturbation parameters such as the reaction and diffusion constants or desired and
initial states. The solution’s derivative, termed parametric sensitivity, is characterized as the solution of an auxiliary linear-quadratic optimal control problem.
A numerical example illustrates the utility of parametric sensitivities which allow
a quantitative and qualitative perturbation analysis of optimal solutions.
1. Introduction
Parametric sensitivity analysis for optimal control problems governed by partial
differential equations (PDE) is concerned with the behavior of optimal solutions under
perturbations of system data. The subject matter of the present paper is an optimal
boundary control problem for a time-dependent coupled system of semilinear parabolic
reaction-diffusion equations. The equations model a chemical or biological process
where the species involved are subject to diffusion and reaction among each other.
The goal in the optimal control problem is to drive the reaction-diffusion model from
the given initial state as close as possible to a desired terminal state. However, the
control has to be chosen within given upper and lower bounds which are motivated by
physical or technological considerations.
In practical applications, it is unlikely that all parameters in the model are precisely known a priori. Therefore, we embed the optimal control problem into a family
of problems, which depend on a parameter vector p. In our case, p can comprise physical parameters such as reaction and diffusion constants, but also desired terminal
states, etc. In this paper we prove that under a coercivity condition on the Hessian
of the Lagrange function, local solutions of the optimal control problem depend Lipschitz continuously and directionally differentiably on the parameter p. Moreover, we
characterize the derivative as the solution of an additional linear-quadratic optimal
control problem, known as the sensitivity problem. If these sensitivities are computed
”offline”, i.e., along with the optimal solution of the nominal (unperturbed) problem
belonging to the expected parameter value p0 , a first order Taylor approximation can
give a real-time (”online”) estimate of the perturbed solution.
Let us put the current paper into a wider perspective: Lipschitz dependence and
differentiability properties of parameter-dependent optimal control problems for PDEs
have been investigated in the recent papers [6, 11–14, 16, 18]. In particular, sensitivity
results have been derived in [6] for a two-dimensional reaction-diffusion model with
distributed control. In contrast, we consider here the more difficult situation in three
spatial dimensions and with boundary control and present both theoretical and numerical results. Other numerical results can be found in [3, 7].
The main part of the paper is organized as follows: In Section 2, we introduce the
reaction-diffusion system at hand and the corresponding optimal control problem. We
also state its first order optimality conditions. Since this problem, without parameter
dependence, has been thoroughly investigated in [9], we only briefly recall the main
results. Section 3 is devoted to establishing the so-called strong regularity property for
the optimality system. This necessitates the investigation of the linearized optimality
system for which the solution is shown to be Lipschitz and differentiable with respect
to perturbations. In Section 4, these properties for the linearized problem are shown to
carry over to the original nonlinear optimality system, in virtue of a suitable implicit
function theorem. Finally, we present some numerical results in Section 5 in order to
further illustrate the concept of parametric sensitivities.
Necessarily all numerical results are based on a discretized version of our infinite-dimensional problem. Nevertheless we prefer to carry out the analysis in the continuous
setting so that smoothness properties of the involved quantities become evident which
could then be used for instance to determine rates of convergence under refinements
of the discretization etc. In view of our problem involving a nonlinear time-dependent
system of partial differential equations, its discretization yields a large scale nonlinear
optimization problem, albeit with a special structure.
2. The Reaction-Diffusion Optimal Boundary Control Problem
Reaction-diffusion equations model chemical or biological processes where the species
involved are subject to diffusion and reaction among each other. As an example, we
consider the reaction A + B → C which obeys the law of mass action. To simplify
the discussion, we assume that the backward reaction C → A + B is negligible and
that the forward reaction proceeds with a constant (not temperature-dependent) rate.
This leads to a coupled semilinear parabolic system for the respective concentrations
(c1 , c2 , c3 ) as follows:
∂c1 /∂t (t, x) = d1 ∆c1 (t, x) − k1 c1 (t, x) c2 (t, x)    for all (t, x) ∈ Q,        (2.1a)
∂c2 /∂t (t, x) = d2 ∆c2 (t, x) − k2 c1 (t, x) c2 (t, x)    for all (t, x) ∈ Q,        (2.1b)
∂c3 /∂t (t, x) = d3 ∆c3 (t, x) + k3 c1 (t, x) c2 (t, x)    for all (t, x) ∈ Q.        (2.1c)
The scalars di and ki , i = 1, . . . , 3, are the diffusion and reaction constants, respectively. Here and throughout, let Ω ⊂ R3 denote the domain of reaction and let
Q = (0, T ) × Ω be the time-space cylinder where T > 0 is the given final time. We
suppose that the boundary Γ = ∂Ω is Lipschitz and can be decomposed into two
disjoint parts Γ = Γn ∪ Γc , where Γc denotes the control boundary. Moreover, we let
Σn = (0, T ) × Γn and Σc = (0, T ) × Γc . We impose the following Neumann boundary
conditions:
d1 ∂c1 /∂n (t, x) = 0               for all (t, x) ∈ Σ,        (2.2a)
d2 ∂c2 /∂n (t, x) = u(t) α(t, x)    for all (t, x) ∈ Σc ,       (2.2b)
d2 ∂c2 /∂n (t, x) = 0               for all (t, x) ∈ Σn ,       (2.2c)
d3 ∂c3 /∂n (t, x) = 0               for all (t, x) ∈ Σ.         (2.2d)
Equation (2.2b) prescribes the boundary flux of the second substance B by means
of a given shape function α(t, x) ≥ 0, modeling, e.g., the location of a spray nozzle
revolving with time around one of the surfaces of Ω, while u(t) denotes the control
intensity at time t which is to be determined. The remaining homogeneous Neumann
boundary conditions simply correspond to a ”no-outflow” condition of the substances
through the boundary of the reaction vessel Ω.
In order to complete the description of the model, we impose initial conditions for
all three substances involved, i.e.,
c1 (0, x) = c10 (x)    for all x ∈ Ω,        (2.3a)
c2 (0, x) = c20 (x)    for all x ∈ Ω,        (2.3b)
c3 (0, x) = c30 (x)    for all x ∈ Ω.        (2.3c)
Our goal is to drive the reaction-diffusion model (2.1)–(2.3) from the given initial
state near a desired terminal state. Hence, we introduce the cost functional
J1 (c1 , c2 , u) = (1/2) ∫_Ω ( β1 |c1 (T ) − c1T |² + β2 |c2 (T ) − c2T |² ) dx + (γ/2) ∫₀ᵀ |u − ud |² dt.
Here and in the sequel, we will find it convenient to abbreviate the notation and write
c1 (T ) instead of c1 (T, ·) or omit the arguments altogether when no ambiguity arises.
In the cost functional, β1 , β2 and γ are non-negative weights, c1T and c2T are the
desired terminal states, and ud is some desired (or expected) control. In order to
shorten the notation, we have assumed that the objective J1 does not depend on the
product concentration c3 . This allows us to delete the product concentration c3 from
the equations altogether and consider only the system for (c1 , c2 ). All results obtained
can be extended to the three-component system in a straightforward way.
The control u : [0, T ] → R is subject to pointwise box constraints ua (t) ≤ u(t) ≤
ub (t). It is reasonable to assume that ua (t) ≥ 0, which together with α(t, x) ≥ 0
implies that the second (controlled) substance B can not be withdrawn through the
boundary. The presence of an upper limit ub is motivated by technological reasons.
In addition to the pointwise constraint, it may be desirable to limit the total amount
of substance B added during the process, i.e., to impose a constraint like
∫₀ᵀ u(t) dt ≤ uc .
In the current investigation, we do not enforce this inequality directly but instead we
add a penalization term
J2 (u) = (1/ε) max{ 0, ∫₀ᵀ u(t) dt − uc }³
to the objective, which then assumes the final form
J(c1 , c2 , u) = J1 (c1 , c2 , u) + J2 (u).        (2.4)
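For illustration (not taken from the paper), the penalty term and the total objective can be evaluated for a discretized control as follows; the terminal-state contribution of J1 is a placeholder number, since computing it would require a PDE solve, and all other data are hypothetical.

```python
import numpy as np

def trapezoid(f, t):
    """Composite trapezoidal rule on a (possibly non-uniform) grid t."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))

def J2(u, t, u_c, eps):
    """Penalization term (1/ε)·max{0, ∫₀ᵀ u dt − u_c}³ of the objective (2.4)."""
    return max(0.0, trapezoid(u, t) - u_c) ** 3 / eps

T, n = 1.0, 101
t = np.linspace(0.0, T, n)
u = 0.8 + 0.2 * np.sin(2 * np.pi * t)     # hypothetical control intensity u(t)
u_d = np.zeros_like(u)
gamma, eps, u_c = 1.0, 1e-2, 0.5

terminal_part = 0.137                     # placeholder for the β-weighted terminal terms of J1
control_part = 0.5 * gamma * trapezoid((u - u_d) ** 2, t)
print("J2 =", J2(u, t, u_c, eps))
print("J  =", terminal_part + control_part + J2(u, t, u_c, eps))
```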
Our optimal control problem can now be stated as problem (P)
Minimize J(c1 , c2 , u)   s.t. (2.1a)–(2.1b), (2.2a)–(2.2c) and (2.3a)–(2.3b) and ua (t) ≤ u(t) ≤ ub (t) hold.        (P)
2.1. State Equation and Optimality System. The results in this section draw
from the investigations carried out in [9] and are stated here for convenience and
without proof. Our problem (P) can be posed in the setting
u ∈ U = L2 (0, T )
(c1 , c2 ) ∈ Y = W (0, T ) × W (0, T ).
That is, we consider the state equation (2.1a)–(2.1b), (2.2a)–(2.2c) and (2.3a)–(2.3b)
in its weak form, see Remark 2.4 and Section 2.2 for details. Here and throughout,
L2 (0, T ) denotes the usual Sobolev space [1] of square-integrable functions on the
interval (0, T ) and the Hilbert space W (0, T ) is defined as
W (0, T ) = { ϕ ∈ L2 (0, T ; H 1 (Ω)) : ∂ϕ/∂t ∈ L2 (0, T ; H 1 (Ω)′ ) },
containing functions of different regularity in space and time. Here, H 1 (Ω) is again the
usual Sobolev space and H 1 (Ω)′ is its dual. At this point we note for later reference
the compact embedding [17, Chapter 3, Theorem 2.1]
W (0, T ) ↪↪ L2 (0, T ; H s (Ω))    for any 1/2 < s < 1        (2.5)
involving the fractional-order space H s (Ω). For convenience of notation, we define the
admissible set
Uad = {u ∈ U : ua (t) ≤ u(t) ≤ ub (t)}.
Let us summarize the fundamental results about the state equation and problem
(P). We begin with the following assumption which is needed throughout the paper:
Assumption 2.1.
(a) Let Ω ⊂ R3 be a bounded open domain with Lipschitz continuous boundary Γ = ∂Ω, which is partitioned into the control part Γc and
the remainder Γn . Let di and ki , i = 1, 2 be positive constants, and assume
that α ∈ L∞ (0, T ; L2(Γc )) is non-negative. The initial conditions ci0 , i = 1, 2
are supposed to be in L2 (Ω). T > 0 is the given final time of the process.
(b) For the control problem, we assume desired terminal states ciT ∈ L2 (Ω), i =
1, 2, and desired control ud ∈ L2 (0, T ) to be given. Moreover, let β1 , β2
be non-negative and γ be positive. Finally, we assume that the penalization
parameter ε is positive and that uc ∈ R and ua and ub are in L∞ (0, T ) such
that ∫₀ᵀ ua (t) dt ≤ uc .
Theorem 2.2. Under Assumption 2.1(a), the state equation (2.1a)–(2.1b), (2.2a)–
(2.2c) and (2.3a)–(2.3b) has a unique weak solution (c1 , c2 ) ∈ W (0, T ) × W (0, T ) for
any given u ∈ L2 (0, T ). The solution satisfies the a priori estimate
kc1 kW (0,T ) + kc2 kW (0,T ) ≤ C ( 1 + kc10 kL2 (Ω) + kc20 kL2 (Ω) + kukL2 (0,T ) )
with some constant C > 0.
In order to state the system of first order necessary optimality conditions, we introduce the active sets
A− (u) = {t ∈ [0, T ] : u(t) = ua (t)}
A+ (u) = {t ∈ [0, T ] : u(t) = ub (t)}
for any given control u ∈ Uad .
Theorem 2.3. Under Assumption 2.1, the optimal control problem (P) possesses at
least one global solution in Y × Uad . If (c1 , c2 , u) ∈ Y × Uad is a local solution, then
there exists a unique adjoint variable (λ1 , λ2 ) ∈ Y satisfying
∂
− λ1 − d1 ∆λ1 = −k1 c2 λ1 − k2 c2 λ2
∂t
∂
− λ2 − d2 ∆λ2 = −k1 c1 λ1 − k2 c1 λ2
∂t
∂λ1
=0
d1
∂n
∂λ2
d2
=0
∂n
λ1 (T ) = −β1 (c1 (T ) − c1T )
λ2 (T ) = −β2 (c2 (T ) − c2T )
67
in Q,
(2.6a)
in Q,
(2.6b)
on Σ,
(2.6c)
on Σ,
(2.6d)
in Ω,
(2.6e)
in Ω
(2.6f)
2
in the weak sense, and a unique Lagrange multiplier ξ ∈ L (0, T ) such that the optimality condition
o2 Z
n Z T
3
u(t) dt − uc − α(t, x) λ2 (t, x) dx + ξ(t) = 0 (2.7)
γ(u(t) − ud (t)) + max 0,
ε
Γc
0
holds for almost all t ∈ [0, T ], together with the complementarity condition
ξ|A− (u) ≤ 0,
ξ|A+ (u) ≥ 0.
(2.8)
Remark 2.4. The partial differential equations throughout this paper are always meant
in their weak form. In case of the state and adjoint equations (2.1)–(2.3) and (2.6),
respectively, the weak forms are precisely stated in Section 2.2 below, see the definition
of F . However, we prefer to write the equations in their strong form to make them
easier to understand.
Solutions to the optimality system (2.6)–(2.8), including the state equation, can be
found numerically by employing, e.g., semismooth Newton or primal-dual active set
methods, see [8, 10, 19] and [2, 9], respectively.
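Eliminating ξ from (2.7)–(2.8) shows that the optimal control is the pointwise projection of ud − (1/γ)[(3/ε) max{0, ∫_0^T u dt − uc}² − ∫_{Γc} α λ2 dx] onto the interval [ua, ub]. The following minimal Python sketch carries out this projection step on a time grid; the arrays involved, in particular the boundary integrals of α λ2 per time step (here called alpha_lambda2) are hypothetical placeholders assumed to be supplied by a PDE solver, and since the integral of u itself enters the penalty term, in practice this formula is used inside a fixed-point or semismooth Newton loop as in [8, 9].

    import numpy as np

    def project_control(u_d, u_a, u_b, alpha_lambda2, int_u, u_c, gamma, eps):
        # Pointwise projection form of the optimality condition (2.7)-(2.8).
        # u_d, u_a, u_b, alpha_lambda2: arrays over the time grid;
        # alpha_lambda2[i] approximates the integral of alpha*lambda_2 over
        # Gamma_c at time t_i (assumed to come from a PDE solve, not shown).
        # int_u: current value of the integral of u over (0, T).
        penalty = 3.0 / eps * max(0.0, int_u - u_c) ** 2   # derivative of J2
        q = u_d - (penalty - alpha_lambda2) / gamma        # unconstrained stationary point
        return np.clip(q, u_a, u_b)                        # projection onto [u_a, u_b]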
In the sequel, we will often find it convenient to use the abbreviations y = (c1 , c2 )
for the vector of state variables, x = (y, u) for state/control pairs, and λ = (λ1 , λ2 )
for the vector of adjoint states. In passing, we define the Lagrangian associated to our
problem (P),
L(x, λ) = J(x) + ∫_0^T [ ⟨∂c1/∂t, λ1⟩ + d1 ∫_Ω ∇c1 · ∇λ1 dx + k1 ∫_Ω c1 c2 λ1 dx ] dt
  + ∫_0^T [ ⟨∂c2/∂t, λ2⟩ + d2 ∫_Ω ∇c2 · ∇λ2 dx + k2 ∫_Ω c1 c2 λ2 dx − ∫_{∂Ω} α u λ2 dx ] dt
  + ∫_Ω (c1(0) − c10) λ1(0) dx + ∫_Ω (c2(0) − c20) λ2(0) dx   (2.9)
for any x = (c1, c2, u) ∈ Y × U and λ = (λ1, λ2) ∈ Y. The bracket ⟨u, v⟩ denotes the
duality between u ∈ H 1 (Ω)′ and v ∈ H 1 (Ω). The Lagrangian is twice continuously
differentiable, and its Hessian with respect to the state and control variables is readily
seen to be
Lxx(x, λ)(x, x) = β1 ‖c1(T)‖²_{L2(Ω)} + β2 ‖c2(T)‖²_{L2(Ω)} + γ ‖u‖²_{L2(0,T)}
  + (6/ε) max{0, ∫_0^T u(t) dt − uc} (∫_0^T u(t) dt)² + 2 ∫_Q (k1 λ1 + k2 λ2) c1 c2 dx dt.   (2.10)
The Hessian is a bounded bilinear form, i.e., there exists a constant C > 0 such that
Lxx(x, λ)(x1, x2) ≤ C ‖x1‖_{Y×U} ‖x2‖_{Y×U}
holds for all (x1, x2) ∈ [Y × U]².
2.2. Parameter Dependence. As announced in the introduction, we consider problem (P) in dependence on a vector of parameters p and emphasize this by writing
(P(p)). It is our goal to investigate the behavior of locally optimal solutions of (P(p)),
or solutions of the optimality system (2.6)–(2.8) for that matter, as p deviates from
its given nominal value p∗ . In practice, the parameter vector p can be thought of
as problem data which may be subject to perturbation or uncertainty. The nominal
value p∗ is then simply the expected value of the data. Our main result (Theorem
4.1) states that under a coercivity condition on the Hessian (2.10) of the Lagrange
function, the solution of the optimality system belonging to (P(p)) depends directionally differentiably on p. The derivatives are called parametric sensitivities since they
yield the sensitivities of their underlying quantities with respect to perturbations in
the parameter. Our analysis can be used to predict the solution at p near the nominal value p∗ using a Taylor expansion. This can be exploited to devise a solution
algorithm for (P(p)) with real-time capabilities, provided that the nominal solution to
(P(p∗)) along with the sensitivities are computed beforehand ("offline"). In addition,
the sensitivities allow a qualitative perturbation analysis of optimal solutions.
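A minimal Python sketch of this offline/online splitting is given below; solve_nominal and solve_sensitivity are hypothetical placeholders for the PDE-constrained solvers that compute the nominal solution and a directional sensitivity, and the dictionaries simply collect the solution components.

    import numpy as np

    def offline(p_nominal, direction, solve_nominal, solve_sensitivity):
        # Offline phase: nominal solution and its parametric sensitivity in a
        # fixed perturbation direction (both require full optimization solves).
        x_nominal = solve_nominal(p_nominal)                   # states, control, adjoints
        dx = solve_sensitivity(x_nominal, p_nominal, direction)
        return x_nominal, dx

    def online_predict(x_nominal, dx, step):
        # Online phase: first-order Taylor prediction for p = p* + step*direction,
        # evaluated without re-solving the perturbed problem (P(p)).
        return {key: x_nominal[key] + step * dx[key] for key in x_nominal}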
In our current problem, we take
p = (d1 , d2 , k1 , k2 , β1 , β2 , γ, uc, ε, c10 , c20 , c1T , c2T , ud )
∈ R9 × L2 (Ω)4 × L2 (0, T ) =: Π
(2.11)
as the vector of perturbation parameters. Note that p belongs to an infinite-dimensional Hilbert space and that, besides containing physical parameters such as the
reaction and diffusion constants ki and di , it comprises non-physical data such as the
penalization parameter ε.
In order to carry out our analysis, it is convenient to rewrite the optimality system
(2.6)–(2.8) plus the state equation as a generalized equation, involving a set-valued
operator. We notice that the complementarity condition (2.8) together with (2.7) is
equivalent to the variational inequality
∫_0^T ξ(t) (v(t) − u(t)) dt ≤ 0   for all v ∈ Uad.   (2.12)
This can also be expressed as ξ ∈ N(u), where
N(u) = {w ∈ L2(0, T) : ∫_0^T w (v − u) dt ≤ 0 for all v ∈ Uad}
if u ∈ Uad, and N(u) = ∅ if u ∉ Uad. This set-valued operator is known as the normal cone of Uad at u (after identification of L2(0, T) with its dual). To rewrite the
remaining components of the optimality system into operator form, we introduce
F : W(0, T)² × L2(0, T) × W(0, T)² × Π → Z
with the target space Z given by
Z = L2(0, T; H1(Ω)′)² × L2(Ω)² × L2(0, T) × L2(0, T; H1(Ω)′)² × L2(Ω)².
The components of F are given next. Wherever it appears, φ denotes an arbitrary
function in L2 (0, T ; H 1(Ω)). For reasons of brevity, we introduce K = k1 λ1 + k2 λ2 .
F1(y, u, λ, p)(φ) = ∫_0^T [ ⟨−∂λ1/∂t, φ⟩ + d1 ∫_Ω ∇λ1 · ∇φ dx + ∫_Ω K c2 φ dx ] dt
F2(y, u, λ, p)(φ) = ∫_0^T [ ⟨−∂λ2/∂t, φ⟩ + d2 ∫_Ω ∇λ2 · ∇φ dx + ∫_Ω K c1 φ dx ] dt
F3(y, u, λ, p) = λ1(T) + β1 (c1(T) − c1T)
F4(y, u, λ, p) = λ2(T) + β2 (c2(T) − c2T)
F5(y, u, λ, p) = γ(u − ud) + (3/ε) max{0, ∫_0^T u(t) dt − uc}² − ∫_{Γc} α λ2 dx
F6(y, u, λ, p)(φ) = ∫_0^T [ ⟨∂c1/∂t, φ⟩ + d1 ∫_Ω ∇c1 · ∇φ dx + k1 ∫_Ω c1 c2 φ dx ] dt
F7(y, u, λ, p)(φ) = ∫_0^T [ ⟨∂c2/∂t, φ⟩ + d2 ∫_Ω ∇c2 · ∇φ dx + k2 ∫_Ω c1 c2 φ dx ] dt − ∫_Σ α u φ dx dt
F8(y, u, λ, p) = c1(0) − c10
F9(y, u, λ, p) = c2(0) − c20.
At this point it is not difficult to see that the optimality system (2.6)–(2.8), including
the state equation (2.1a)–(2.1b), (2.2a)–(2.2c) and (2.3a)–(2.3b), is equivalent to the
generalized equation
0 ∈ F(y, u, λ, p) + N(u)   (2.13)
where we have set N(u) = (0, 0, 0, 0, N(u), 0, 0, 0, 0)⊤ ⊂ Z. In the next section, we will investigate the following linearization around a given solution (y∗, u∗, λ∗) of (2.13) and for the given parameter p∗. This linearization depends on a new parameter δ ∈ Z:
δ ∈ F(y∗, u∗, λ∗, p∗) + F′(y∗, u∗, λ∗, p∗) (y − y∗, u − u∗, λ − λ∗)⊤ + N(u).   (2.14)
Herein F′ denotes the Fréchet derivative of F with respect to (y, u, λ). Note that F is the gradient of the Lagrangian L and F′ is its Hessian, whose "upper-left block" was already mentioned in (2.10).
3. Properties of the Linearized Problem
In order to become more familiar with the linearized generalized equation (2.14), we
write it in its strong form, assuming smooth perturbations δ = (δ1 , . . . , δ5 ). For better
readability, the given parameter p∗ is still denoted as in (2.11), without additional ∗ in every component. We obtain from the linearizations of F1 through F4:
−∂λ1/∂t − d1 Δλ1 + K c2∗ + K∗ c2 = K∗ c2∗ + δ1   in Q,   (3.1a)
−∂λ2/∂t − d2 Δλ2 + K c1∗ + K∗ c1 = K∗ c1∗ + δ2   in Q,   (3.1b)
d1 ∂λ1/∂n = δ1|Σ   on Σ,   (3.1c)
d2 ∂λ2/∂n = δ2|Σ   on Σ,   (3.1d)
λ1(T) = −β1 (c1(T) − c1T) + δ3   in Ω,   (3.1e)
λ2(T) = −β2 (c2(T) − c2T) + δ4   in Ω,   (3.1f)
where we have abbreviated K = k1 λ1 + k2 λ2 and K∗ = k1 λ1∗ + k2 λ2∗. From the components F6 through F9 we obtain a linearized state equation:
∂c1/∂t − d1 Δc1 + k1 c1 c2∗ + k1 c1∗ c2 = k1 c1∗ c2∗ + δ6   in Q,   (3.2a)
∂c2/∂t − d2 Δc2 + k2 c1 c2∗ + k2 c1∗ c2 = k2 c1∗ c2∗ + δ7   in Q,   (3.2b)
d1 ∂c1/∂n = δ6|Σ   on Σ,   (3.2c)
d2 ∂c2/∂n = α u + δ7|Σ   on Σ,   (3.2d)
c1(0) = c10 + δ8   in Ω,   (3.2e)
c2(0) = c20 + δ9   in Ω.   (3.2f)
Finally, the component F5 becomes the variational inequality
∫_0^T ξ(t) (v(t) − u(t)) dt ≤ 0   for all v ∈ Uad,   (3.3)
where, in analogy to the original problem, ξ ∈ L2(0, T) is defined through
γ(u − ud) + (3/ε) max{0, ∫_0^T u∗(t) dt − uc}² − ∫_{Γc} α λ2 dx − δ5
  + (6/ε) max{0, ∫_0^T u∗(t) dt − uc} ∫_0^T (u(t) − u∗(t)) dt + ξ(t) = 0.   (3.4)
In turn, the system (3.1)–(3.4) is easily recognized as the optimality system for an
auxiliary linear quadratic optimization problem, which we term (AQP(δ)):
Minimize (1/2) Lxx(x∗, λ∗)(x, x) − β1 ∫_Ω c1T c1(T) dx − β2 ∫_Ω c2T c2(T) dx
  + (3/ε) max{0, ∫_0^T u∗(t) dt − uc}² ∫_0^T u(t) dt
  − (6/ε) max{0, ∫_0^T u∗(t) dt − uc} (∫_0^T u∗(t) dt) (∫_0^T u(t) dt)
  − γ ∫_0^T ud u dt − ∫_Q (k1 λ1∗ + k2 λ2∗)(c1∗ c2 + c1 c2∗) dx dt
  − ⟨δ1, c1⟩ − ⟨δ2, c2⟩ − ∫_Ω δ3 c1(T) dx − ∫_Ω δ4 c2(T) dx − ∫_0^T δ5 u dt   (3.5)
subject to the linearized state equation (3.2) above and u ∈ Uad. The bracket ⟨δ1, c1⟩
here denotes the duality between L2 (0, T ; H 1 (Ω)) and its dual L2 (0, T ; H 1 (Ω)′ ). In
order for (AQP(δ)) to have a strictly convex objective and thus to have a unique
solution, we require the following assumption:
Assumption 3.1 (Coercivity Condition). We assume that there exists ρ > 0 such that
Lxx(x∗, λ∗)(x, x) ≥ ρ ‖x‖²_{Y×U}
holds for all x = (c1, c2, u) ∈ Y × U which satisfy the linearized state equation (3.2) in weak form, with all right-hand sides except the term α u replaced by zero.
Sufficient conditions for Assumption 3.1 to hold are given in [9, Theorem 3.15]. We
now prove our first result for the auxiliary problem (AQP(δ)):
Proposition 3.2 (Lipschitz Stability for the Linearized Problem).
Under Assumption 2.1, holding for the parameter p∗ , and Assumption 3.1, (AQP(δ))
has a unique solution which depends Lipschitz continuously on the parameter δ ∈ Z.
That is, there exists L > 0 such that for all δ̂, δ̌ ∈ Z with corresponding solutions (x̂, λ̂) and (x̌, λ̌), the estimate
‖ĉ1 − č1‖_{W(0,T)} + ‖ĉ2 − č2‖_{W(0,T)} + ‖û − ǔ‖_{L2(0,T)} + ‖λ̂1 − λ̌1‖_{W(0,T)} + ‖λ̂2 − λ̌2‖_{W(0,T)} ≤ L ‖δ̂ − δ̌‖_Z
holds.
Proof. The proof follows the technique of [18] and is therefore kept relatively short
here. Throughout, we denote by capital letters the differences we wish to estimate,
i.e., C1 = ĉ1 − č1 , etc. To improve readability, we omit the differentials dx and dt in
integrals whenever possible. We begin by testing the weak form of the adjoint equation
(3.1) by C1 and C2 , and testing the weak form of the state equation (3.2) by Λ1 and
Λ2 , using integration by parts with respect to time and plugging in the initial and
terminal conditions from (3.1) and (3.2). One obtains
β1 ‖C1(T)‖² + β2 ‖C2(T)‖² + 2 ∫_Q K∗ C1 C2 + ∫_Σ α U Λ2
  = −⟨C1, Δ1⟩ − ⟨C2, Δ2⟩ + ∫_Ω C1(T) Δ3 + ∫_Ω C2(T) Δ4 − ⟨Λ1, Δ6⟩ − ⟨Λ2, Δ7⟩ − ∫_Ω Λ1(0) Δ8 − ∫_Ω Λ2(0) Δ9.   (3.6)
From the variational inequality (3.3), using v = û and v = ǔ as test functions, we get
−∫_Σ α U Λ2 ≤ −γ ‖U‖² + ∫_0^T U Δ5 dt − (6/ε) max{0, ∫_0^T u∗(t) dt − uc} (∫_0^T U dt)².   (3.7)
Unless otherwise stated, all norms are the natural norms for the respective terms.
Adding the inequality (3.7) to (3.6) above and collecting terms yields
Lxx(x∗, λ∗)((C1, C2, U), (C1, C2, U))
  ≤ −⟨C1, Δ1⟩ − ⟨C2, Δ2⟩ + ∫_Ω C1(T) Δ3 + ∫_Ω C2(T) Δ4 + ∫_0^T U Δ5 − ⟨Λ1, Δ6⟩ − ⟨Λ2, Δ7⟩ − ∫_Ω Λ1(0) Δ8 − ∫_Ω Λ2(0) Δ9
  ≤ κ (1 + c²) (‖C1‖² + ‖C2‖² + ‖Λ1‖² + ‖Λ2‖²) + κ ‖U‖² + (1/(4κ)) Σ_{i=1}^{9} ‖Δi‖²   (3.8)
where the last inequality has been obtained using Hölder’s inequality, the embedding
W(0, T) ↪ C([0, T]; L2(Ω)) and Young's inequality in the form ab ≤ κa² + b²/(4κ).
The number κ > 0 denotes a sufficiently small constant which will be determined later
at our convenience. Here and throughout, generic constants are denoted by c. They
may take different values in different locations.
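For completeness, the particular form of Young's inequality used above follows from the elementary estimate ab ≤ a²/2 + b²/2 applied to rescaled factors:
\[
ab = \bigl(\sqrt{2\kappa}\,a\bigr)\Bigl(\frac{b}{\sqrt{2\kappa}}\Bigr) \le \kappa a^{2} + \frac{b^{2}}{4\kappa},\qquad a, b \ge 0,\ \kappa > 0.
\]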
In order to make use of the Coercivity Assumption 3.1, we decompose Ci = zi + wi ,
i = 1, 2 and consider their respective equations, see (3.2). The z components account
for the control influence while the w components arise from the perturbation differences
Δ6, . . . , Δ9. We have on Q, Σ and Ω, respectively,
∂z1/∂t − d1 Δz1 + k1 z1 c2∗ + k1 c1∗ z2 = 0,     ∂w1/∂t − d1 Δw1 + k1 w1 c2∗ + k1 c1∗ w2 = Δ6,
∂z2/∂t − d2 Δz2 + k2 z1 c2∗ + k2 c1∗ z2 = 0,     ∂w2/∂t − d2 Δw2 + k2 w1 c2∗ + k2 c1∗ w2 = Δ7,
d1 ∂z1/∂n = 0,     d1 ∂w1/∂n = Δ6|Σ,
d2 ∂z2/∂n = α U,     d2 ∂w2/∂n = Δ7|Σ,
z1(0) = 0,     w1(0) = Δ8,
z2(0) = 0,     w2(0) = Δ9.
Note that for (z1, z2, U), the Coercivity Assumption 3.1 applies and that standard a priori estimates yield ‖z1‖ + ‖z2‖ ≤ c ‖U‖ and ‖w1‖ + ‖w2‖ ≤ c (‖Δ6‖ + ‖Δ7‖ + ‖Δ8‖ + ‖Δ9‖). Using the generic estimates ‖zi‖² ≥ ‖Ci‖² − 2‖Ci‖ ‖wi‖ + ‖wi‖² and ‖zi‖ ≤ ‖Ci‖ + ‖wi‖, the embedding W(0, T) ↪ C([0, T]; L2(Ω)) and the coercivity assumption, we obtain
Lxx(x∗, λ∗)((C1, C2, U), (C1, C2, U)) = Lxx(x∗, λ∗)((z1, z2, U), (z1, z2, U))
  + 2 β1 ∫_Ω z1(T) w1(T) + 2 β2 ∫_Ω z2(T) w2(T) + β1 ‖w1(T)‖² + β2 ‖w2(T)‖²
  + 2 ∫_Q K∗ (w1 z2 + z1 w2 + w1 w2)
  ≥ ρ (‖C1‖² + ‖C2‖² + ‖U‖²) − 2ρ (‖C1‖ ‖w1‖ + ‖C2‖ ‖w2‖)
  − β1 c ‖w1‖ (‖C1‖ + ‖w1‖) − β2 c ‖w2‖ (‖C2‖ + ‖w2‖)
  − c ‖K∗‖_{L2(Q)} (‖w1‖ ‖C2‖ + ‖C1‖ ‖w2‖ + 3 ‖w1‖ ‖w2‖).   (3.9)
For the last term, we have employed Hölder's inequality and the embedding W(0, T) ↪ L4(Q), see [4, p. 7]. Combining the inequalities (3.8) and (3.9) yields
ρ (‖C1‖² + ‖C2‖² + ‖U‖²) ≤ 2ρ (‖C1‖ ‖w1‖ + ‖C2‖ ‖w2‖)
  + β1 c ‖w1‖ (‖C1‖ + ‖w1‖) + β2 c ‖w2‖ (‖C2‖ + ‖w2‖)
  + c ‖K∗‖_{L2(Q)} (‖w1‖ ‖C2‖ + ‖C1‖ ‖w2‖ + 3 ‖w1‖ ‖w2‖)
  + (1/(8κ)) Σ_{i=1}^{9} ‖Δi‖² + (κ/2) (1 + c²) (‖C1‖² + ‖C2‖² + ‖Λ1‖² + ‖Λ2‖²) + (κ/2) ‖U‖²   (3.10)
and the last two terms can be absorbed in the left hand side when choosing κ > 0
sufficiently small and observing that Λ1 and Λ2 depend continuously on the data
C1 and C2. By the a priori estimate stated above, wi, i = 1, 2, can be estimated against the data Δ6, . . . , Δ9. Using again Young's inequality on the terms ‖Ci‖ ‖wj‖ and absorbing the quantities of type κ ‖Ci‖² into the left hand side, we obtain the
Lipschitz dependence of (C1 , C2 , U ) on ∆1 , . . . , ∆9 . Invoking once more the continuous
dependence of Λi on (C1 , C2 ), Lipschitz stability is seen to hold also for the adjoint
variable.
If (x∗ , λ∗ ) is a solution to the optimality system (2.6)–(2.8) and state equation, then
the previous theorem implies that the generalized equation (2.13) is strongly regular
at this solution, compare [15]. Before showing that the Coercivity Assumption 3.1
implies also directional differentiability of the solution of (AQP(δ)) in dependence on
δ, we introduce the strongly active subsets for the solution (y ∗ , u∗ , λ∗ ) with multiplier
ξ ∗ given by (2.7),
A0− (u∗ ) = {t ∈ [0, T ] : ξ ∗ (t) < 0}
A0+ (u∗ ) = {t ∈ [0, T ] : ξ ∗ (t) > 0}
Note that necessarily u∗ = ua on A0− (u∗ ) and u∗ = ub on A0+ (u∗ ) hold in view of the
variational inequality (2.12). Based on the notion of strongly active sets, we define Ûad, the set of admissible control variations:
u ∈ Ûad  ⟺  u ∈ L2(0, T) and  u = 0 on A0−(u∗) ∪ A0+(u∗),  u ≥ 0 on A−(u∗),  u ≤ 0 on A+(u∗).
This definition reflects the fact that if the solution u∗ associated to the parameter
value p∗ is equal to the lower bound ua at some point t ∈ [0, T ], we can approach
it only from above (and vice versa for the upper bound). In addition, if the control
constraint is strongly active at some point, i.e., if it has a nonzero multiplier ξ ∗ there,
the variation is zero.
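In a discretized setting this definition becomes a simple pointwise rule. The sketch below (our illustration; the arrays u_star and xi_star for u∗ and ξ∗ on a time grid, and the activity tolerance tol, are assumptions) maps an arbitrary variation v into Ûad.

    import numpy as np

    def project_variation(v, u_star, xi_star, u_a, u_b, tol=1e-12):
        # Pointwise projection of a control variation v onto the set of
        # admissible variations: zero on the strongly active set, one-sided
        # on the (weakly) active sets, unconstrained elsewhere.
        w = v.copy()
        lower = np.abs(u_star - u_a) <= tol        # A_-(u*):  u* = u_a
        upper = np.abs(u_star - u_b) <= tol        # A_+(u*):  u* = u_b
        strong = np.abs(xi_star) > tol             # strongly active points
        w[lower] = np.maximum(w[lower], 0.0)       # variations >= 0 on A_-
        w[upper] = np.minimum(w[upper], 0.0)       # variations <= 0 on A_+
        w[strong & (lower | upper)] = 0.0          # zero where xi* is nonzero
        return w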
Proposition 3.3 (Differentiability for the Linearized Problem).
Under Assumptions 2.1 and 3.1, the unique solution to (AQP(δ)) depends directionally differentiably on the parameter δ ∈ Z. The directional derivative in the direction
of δ̂ ∈ Z is given by the solution of the auxiliary linear quadratic problem (DQP(δ̂)),
Minimize (1/2) Lxx(x∗, λ∗)(x, x) − ⟨δ̂1, c1⟩ − ⟨δ̂2, c2⟩ − ∫_Ω δ̂3 c1(T) − ∫_Ω δ̂4 c2(T) − ∫_0^T δ̂5 u dt
subject to u ∈ Ûad and the linearized state equation
∂c1/∂t − d1 Δc1 + k1 c1 c2∗ + k1 c1∗ c2 = δ̂6   in Q,   (3.11a)
∂c2/∂t − d2 Δc2 + k2 c1 c2∗ + k2 c1∗ c2 = δ̂7   in Q,   (3.11b)
d1 ∂c1/∂n = δ̂6|Σ   on Σ,   (3.11c)
d2 ∂c2/∂n = α u + δ̂7|Σ   on Σ,   (3.11d)
c1(0) = δ̂8   in Ω,   (3.11e)
c2(0) = δ̂9   in Ω.   (3.11f)
Proof. Let δ̂ ∈ Z be any given direction of perturbation and let {τn } be a sequence of
real numbers such that τn ց 0. We set δn = τn δ̂ and denote the solution of (AQP(δn ))
by (cn1 , cn2 , un , λn1 , λn2 ). Note that (c∗1 , c∗2 , u∗ , λ∗1 , λ∗2 ) is the solution of (AQP(0)). Then,
by virtue of Proposition 3.2, we have
‖(c1ⁿ − c1∗)/τn‖ + ‖(c2ⁿ − c2∗)/τn‖ + ‖(uⁿ − u∗)/τn‖ + ‖(λ1ⁿ − λ1∗)/τn‖ + ‖(λ2ⁿ − λ2∗)/τn‖ ≤ L ‖δ̂‖   (3.12)
in the norms of W(0, T), L2(0, T), and Z, respectively, and with some Lipschitz constant L > 0. We can thus extract weakly convergent subsequences (still denoted by index n) and use the compact embedding of W(0, T) into L2(Q) to obtain
(uⁿ − u∗)/τn ⇀ ũ   in L2(0, T),   (3.13)
(c1ⁿ − c1∗)/τn ⇀ ĉ1   in W(0, T)   and   → ĉ1   in L2(Q),   (3.14)
and similarly for the remaining components. Taking yet another subsequence, all
components except the control are seen also to converge pointwise almost everywhere
in Q. From here, we only sketch the remainder of the proof since it closely parallels the
ones given in [6, 12]. In addition to the arguments given there, our analysis relies on
the strong convergence (and thus pointwise convergence almost everywhere on [0, T ]
of a subsequence) of
∫_{Γc} α (λ2ⁿ − λ2∗)/τn → ∫_{Γc} α λ̂2   in L2(0, T),   (3.15)
Γc
which follows from the compact embedding of W (0, T ) into L2 (0, T ; H s (Ω)) for 1/2 <
s < 1 (see (2.5)) and the continuity of the trace operator H s (Ω) → L2 (Γc ). One
expresses un as the pointwise projection of un + ξ n /γ onto the admissible set Uad
with ξ n given by (3.4) evaluated at (un , λn2 ). Using (3.13) and (3.15), one shows that
(un − u∗ )/τ n possesses a pointwise convergent subsequence (still denoted by index n).
Distinguishing cases, one finds the pointwise limit û of (uⁿ − u∗)/τn to be the pointwise projection of limn→∞ (uⁿ + ξⁿ/γ) onto the new admissible set Ûad. Using a suitable upper bound in Lebesgue's Dominated Convergence Theorem, one shows that û is also the limit in the sense of L2(0, T) and thus û = ũ must hold. It remains to show that the limit (ĉ1, ĉ2, û, λ̂1, λ̂2) satisfies the first order optimality system for (DQP(δ̂))
(which is routine) and that the limits actually hold in their strong senses in W (0, T )
(which follows from standard a priori estimates). Since we could have started with a
subsequence of τ n in the first place and since the limit (ĉ1 , ĉ2 , û, λ̂1 , λ̂2 ) must always
be the same in view of the Coercivity Assumption 3.1, the convergence extends to the
whole sequence.
4. Properties of the Nonlinear Problem
In the current section, we shall prove that the solutions to the original nonlinear
generalized equation (2.13) depend on p in the same way as the solutions to the
linearized generalized equation (2.14) depend on δ. To this end, we invoke an implicit
function theorem for generalized equations. Throughout this section, let again p∗ be
a given nominal (or unperturbed or expected) value of the parameter vector
p = (d1 , d2 , k1 , k2 , β1 , β2 , γ, uc, ε, c10 , c20 , c1T , c2T , ud )
∈ R9 × L2 (Ω)4 × L2 (0, T ) =: Π
satisfying Assumption 2.1. Moreover, let (x∗ , λ∗ ) = (c∗1 , c∗2 , u∗ , λ∗1 , λ∗2 ) be a solution
of the first order necessary conditions (2.6)–(2.8) plus the state equation, or, in other
words, of the generalized equation (2.13).
Theorem 4.1 (Lipschitz Continuity and Directional Differentiability). Under Assumptions 2.1 and 3.1, there exists a neighborhood B(p∗ ) ⊂ Π of p∗ and a neighborhood
B(y ∗ , u∗ , λ∗ ) ⊂ Y × U × Y and a Lipschitz continuous function
B(p∗ ) ∋ p 7→ (yp , up , λp ) ∈ B(y ∗ , u∗ , λ∗ )
such that (yp , up , λp ) solves the optimality system (2.6)–(2.8) plus the state equation
for parameter p and such that it is the only critical point in B(y ∗ , u∗ , λ∗ ). Moreover, the map p 7→ (yp , up , λp ) is directionally differentiable, and its derivative in
the direction p̂ ∈ Π is given by the unique solution of (DQP(δ̂)), in the direction of
δ̂ = −Fp (y ∗ , u∗ , λ∗ , p∗ ) p̂.
Proof. The proof is based on the implicit function theorem for generalized equations
from [5, 15]. It relies on the strong regularity property, which was shown in Proposition 3.2. It remains to verify that F is Lipschitz in p near p∗ , uniformly in a
neighborhood of (y ∗ , u∗ , λ∗ ), and that F is differentiable with respect to p, which
is straightforward. The formula for its derivative is given in the remark below.
Remark 4.2. In order to compute the parametric sensitivities of the nominal solution
(c∗1 , c∗2 , u∗ , λ∗1 , λ∗2 ) for (P(p∗ )) in a perturbation direction p̂, we need to solve the linearquadratic problem (DQP(δ̂)) with
δ̂ = −Fp(y∗, u∗, λ∗, p∗) p̂
   = −( d̂1 ∫_Q ∇λ1∗ · ∇(·) + (k̂1 λ1∗ + k̂2 λ2∗) c2∗ (·),
        d̂2 ∫_Q ∇λ2∗ · ∇(·) + (k̂1 λ1∗ + k̂2 λ2∗) c1∗ (·),
        β̂1 (c1∗(T) − c1T∗) − β1∗ ĉ1T,
        β̂2 (c2∗(T) − c2T∗) − β2∗ ĉ2T,
        γ̂ (u∗ − ud∗) − γ∗ ûd − (3 ε̂/(ε∗)²) I² − (6/ε∗) I ûc,
        d̂1 ∫_Q ∇c1∗ · ∇(·) + k̂1 ∫_Q c1∗ c2∗ (·),
        d̂2 ∫_Q ∇c2∗ · ∇(·) + k̂2 ∫_Q c1∗ c2∗ (·),
        −ĉ10,
        −ĉ20 )⊤,
where I denotes max{0, ∫_0^T u∗(t) dt − uc∗}. We close this section by remarking that
the parametric sensitivities allow one to compute a second-order expansion of the value of the objective, see [6, 12] for details. In addition, the Coercivity Assumption 3.1 implies
that second order sufficient conditions hold at the nominal and also at the perturbed
solutions, so that points satisfying the first order necessary conditions are indeed strict
local optimizers.
5. Numerical Results
In this section, we present some numerical results and show evidence that the
parametric sensitivities yield valuable information which is useful in making qualitative
and quantitative estimates of the solution under perturbations. In our example, the
three-dimensional geometry of the problem is given by the annular cylinder between
the planes z = 0 and z = 0.5 with inner radius 0.4 and outer radius 1.0 whose rotational
axis is the z-axis (Figure 5.1). The control boundary Γc is the upper annulus, and we
use the control shape function
α(t, x) = exp(−5 [(x1 − 0.7 cos(2πt))² + (x2 − 0.7 sin(2πt))²]),
which corresponds to a nozzle circling for t ∈ [0, 1] once around in counter-clockwise
direction at a radius of 0.7. For fixed t, α is a function which decays exponentially
with the square of the distance from the current location of the nozzle. The problem
was discretized using the finite element method on a mesh consisting of 1797 points
and 7519 tetrahedra. The ’triangulation’ of the domain Ω by tetrahedra is also shown
in Figure 5.1. In the time direction, the interval [0, T ] was uniformly divided into 100
parts.
Figure 5.1. Domain Ω ⊂ R³ and its triangulation with tetrahedra.
By controlling the second substance B, we wish to steer the concentration of the first substance A to zero at terminal time T = 1, i.e., we choose
β1∗ = 1,   β2∗ = 0,   c1T∗ ≡ 0.
The control cost parameter is γ∗ = 10⁻² and the control bounds are chosen as ua ≡ 1 and ub ≡ 5.
The chemical reaction is governed by equations (2.1)–(2.3) with parameters
d1∗ = 0.15,   d2∗ = 0.20,   k1∗ = 1.0,   k2∗ = 1.0.
As initial concentrations, we use
c10∗ ≡ 1.0,   c20∗ ≡ 0.0.
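The control shape function α introduced above can be evaluated directly; the short Python sketch below simply reproduces the given formula (the decay rate 5 and the radius 0.7 are the values stated in the text, everything else is generic).

    import numpy as np

    def alpha(t, x1, x2, radius=0.7, decay=5.0):
        # Shape function of the nozzle revolving once around the annulus for t in [0, 1];
        # x1, x2 are the first two spatial coordinates of a point on the control boundary.
        cx = radius * np.cos(2.0 * np.pi * t)
        cy = radius * np.sin(2.0 * np.pi * t)
        return np.exp(-decay * ((x1 - cx) ** 2 + (x2 - cy) ** 2))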
The discrete optimal solution without the contribution from the penalized integral
constraint J2 (corresponding to ε = ∞) yields
∫_0^T u∗(t) dt = 4.2401,   J1(c1∗, c2∗, u∗) = 0.2413.
In order for this constraint to become relevant, we choose u∗c = 3.5 and enforce it
using the penalization parameter ε∗ = 1. Details on the numerical implementation
are given in [8, 9]. For the discretization described above, we obtain a problem size of
approximately 726 000 variables, including the adjoint states, which takes a couple of
minutes to solve on a standard desktop PC.
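The reported problem size can be checked by a quick count, assuming (our reading, not stated explicitly in the text) that both concentrations and both adjoint states are stored at every mesh point and at all 101 time levels, plus one scalar control value per time step:

    nodes = 1797          # mesh points
    time_levels = 101     # 100 uniform intervals on [0, 1]
    unknown_fields = 4    # c1, c2, lambda1, lambda2
    control_dofs = 100    # one control value per time step

    total = unknown_fields * nodes * time_levels + control_dofs
    print(total)          # 726088, i.e. approximately 726 000 variables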
Figure 5.2. Left: Optimal control u∗ (thick solid), true perturbed control up (thin solid) and predicted control (circles). Right: Parametric sensitivity du_{p∗}/dp in the direction of p − p∗.
In Figures 5.3–5.4 (left columns) and Figure 5.2 (left), we show the individual
components of the optimal solution. We note that the optimal control lies on the upper
bound in the first part of the time interval, then in the interior of the admissible interval
[1, 5] and finally on the lower bound. From Figure 5.3 (left) we infer that as time
advances, substance A decays and approaches the desired value of zero to the extent
permitted by the control cost parameter γ and the control bounds. Figure 5.4 (left)
nicely shows the influence of the revolving control nozzle on the upper surface of the
annular cylinder, adding amounts of substance B over time which then diffuse towards
the interior of the reaction vessel and react with substance A.
In order to illustrate the sensitivity calculus, we perturb the reaction constants k1∗
and k2∗ by 50%, taking
k1 = 1.5 and k2 = 1.5
as their new values. With the reaction now proceeding faster, one presumes that the
desired goal of consuming substance A within the given time interval will be achieved
to a higher degree, which will in fact be confirmed below from sensitivity information.
Figure 5.2 (left) shows, next to the nominal control, the solution obtained by a first
order Taylor approximation using the sensitivity of the control variable, i.e.,
up ≈ up∗ + (du_{p∗}/dp) (p − p∗).
To allow a comparison, the true perturbed solution is also depicted, which of course
required the repeated solution of the nonlinear optimal control problem (P(p)). It
is remarkable how well the perturbed solution can be predicted in the face of a 50% perturbation using the sensitivity information, without recomputing the solution to the nonlinear problem. We observe that the perturbed control is lower than the nominal one in the first part of the time interval and becomes higher later on. This behavior cannot easily be predicted without any sensitivity information at hand. Besides,
a qualitative analysis of the state sensitivities reveals more interesting information.
We have argued above that with the reaction proceeding faster, the control goal can
more easily be reached. This can be inferred from Figure 5.3 (right column), showing
that the sensitivity derivatives of the first substance are negative throughout, i.e., the
perturbed solution comes closer in a pointwise sense to the desired zero terminal state
(to first order). The sensitivities for the second state component (see Figure 5.4, right
column) nicely reflect the expected behavior inferred from the control sensitivities, see
Figure 5.2 (right). As the perturbed control is initially lower than the unperturbed one after leaving the upper bound, the sensitivity of the second substance is below zero there. Later, it becomes positive, as does the sensitivity for the control variable.
Figure 5.3. Concentrations of substance A (left) and its sensitivity (right) at times t = 0.25, t = 0.50, t = 0.75, and t = 1.00.
Figure 5.4. Concentrations of substance B (left) and its sensitivity (right) at times t = 0.25, t = 0.50, t = 0.75, and t = 1.00.
References
[1] R. Adams. Sobolev Spaces. Academic Press, New York, 1975.
[2] M. Bergounioux, M. Haddou, M. Hintermüller, and K. Kunisch. A comparison of a Moreau-Yosida-based active set strategy and interior point methods for constrained optimal control
problems. SIAM Journal on Optimization, 11(2):495–521, 2000.
[3] C. Büskens and R. Griesse. Parametric sensitivity analysis of perturbed PDE optimal control
problems with state and control constraints. Journal of Optimization Theory and Applications,
131(1):17–35, 2006.
[4] E. DiBenedetto. Degenerate Parabolic Equations. Springer, Berlin, 1993.
[5] A. Dontchev. Implicit function theorems for generalized equations. Mathematical Programming,
70:91–106, 1995.
[6] R. Griesse. Parametric sensitivity analysis in optimal control of a reaction-diffusion system—
Part I: Solution differentiability. Numerical Functional Analysis and Optimization, 25(1–2):93–
117, 2004.
[7] R. Griesse. Parametric sensitivity analysis in optimal control of a reaction-diffusion system—
Part II: Practical methods and examples. Optimization Methods and Software, 19(2):217–242,
2004.
[8] R. Griesse and S. Volkwein. A semi-smooth Newton method for optimal boundary control of a
nonlinear reaction-diffusion system. In Proceedings of the Sixteenth International Symposium on
Mathematical Theory of Networks and Systems (MTNS), Leuven, Belgium, 2004.
[9] R. Griesse and S. Volkwein. A primal-dual active set strategy for optimal boundary control of a
nonlinear reaction-diffusion system. SIAM Journal on Control and Optimization, 44(2):467–494,
2005.
[10] M. Hintermüller, K. Ito, and K. Kunisch. The primal-dual active set strategy as a semismooth
Newton method. SIAM Journal on Optimization, 13(3):865–888, 2002.
[11] K. Malanowski. Sensitivity analysis for parametric optimal control of semilinear parabolic equations. Journal of Convex Analysis, 9(2):543–561, 2002.
[12] K. Malanowski. Solution differentiability of parametric optimal control for elliptic equations. In
E. W. Sachs and R. Tichatschke, editors, System Modeling and Optimization XX, Proceedings
of the 20th IFIP TC 7 Conference, pages 271–285. Kluwer Academic Publishers, 2003.
[13] K. Malanowski and F. Tröltzsch. Lipschitz stability of solutions to parametric optimal control
for parabolic equations. Journal of Analysis and its Applications, 18(2):469–489, 1999.
[14] K. Malanowski and F. Tröltzsch. Lipschitz stability of solutions to parametric optimal control
for elliptic equations. Control and Cybernetics, 29:237–256, 2000.
[15] S. Robinson. Strongly regular generalized equations. Mathematics of Operations Research,
5(1):43–62, 1980.
[16] T. Roubíček and F. Tröltzsch. Lipschitz stability of optimal controls for the steady-state Navier-Stokes equations. Control and Cybernetics, 32(3):683–705, 2003.
[17] R. Temam. Navier-Stokes Equations, Theory and Numerical Analysis. North-Holland, Amsterdam, 1984.
[18] F. Tröltzsch. Lipschitz stability of solutions of linear-quadratic parabolic control problems with
respect to perturbations. Dynamics of Continuous, Discrete and Impulsive Systems Series A
Mathematical Analysis, 7(2):289–306, 2000.
[19] M. Ulbrich. Semismooth Newton methods for operator equations in function spaces. SIAM Journal on Optimization, 13(3):805–842, 2003.
CHAPTER 2
Numerical Methods and Applications
Besides their theoretical interest, the concepts of stability and sensitivity of optimization problems, and of optimal control problems in particular, have a number of applications. We address some of them in this chapter, along with numerical methods for the computation of sensitivity derivatives and related quantities.
First of all, Newton’s method, when applied to a generalized equation, exhibits local
quadratic convergence whenever the generalized equation
0 ∈ F (w) + N (w)
is strongly regular (see Definition 0.7 on p. 11) and F is sufficiently smooth. In the
context of optimal control problems, Newton’s method amounts to an SQP (sequential
quadratic programming) approach. Based on our Lipschitz stability results for optimal
control problems with mixed control-state constraints (Section 2), we establish the
local quadratic convergence of SQP for semilinear problems with such constraints in
Section 5 below.
In addition, we have considered in Chapter 1 the differentiability of local optimal
solutions of various optimal control problems with control constraints. These problems
can be written in abstract form as
Minimize J(y, u; π) subject to e(y, u; π) = 0 and ua ≤ u ≤ ub a.e.   (Pcc(π))
Using the Implicit Function Theorem 0.6, we have shown in various cases the existence
of a local map
π 7→ Ξ(π) = (Ξy (π), Ξu (π), Ξp (π))
near the nominal parameter π0 , which is Lipschitz and directionally differentiable.
The computation of one directional derivative DΞ(π0 ; δπ) amounts to the solution of
a linear-quadratic optimal control problem with the same type of control constraints,
compare (DQP(δ, δ̂)) on p. 9, Theorem 4.1 of Griesse, Hintermüller, and Hinze [2005] (Section 3), or Theorem 4.1 of Griesse and Volkwein [2006] (Section 4).
We address in this chapter a number of questions related to these sensitivity derivatives:
(1) How can the solution of a perturbed problem Ξ(π) be recovered from the
solution of the nominal problem Ξ(π0 ) and derivative information, as accurately as possible? (Section 6)
(2) What is the worst-case perturbation which has the greatest impact on the
solution or a quantity of interest depending on the solution? (Section 7)
(3) How can first and second-order derivatives of such a quantity of interest be
evaluated efficiently? (Section 8)
(4) What is the relationship between the sensitivity derivatives of (Pcc (π)) and
of its relaxation arising in interior point approaches? (Section 9)
5. Local Quadratic Convergence of SQP for Elliptic Optimal Control
Problems with Mixed Control-State Constraints
R. Griesse, N. Metla and A. Rösch: Local Quadratic Convergence of SQP for Elliptic Optimal Control Problems with Mixed Control-State Constraints, submitted to:
ESAIM: Control, Optimisation, and Calculus of Variations, 2007
In this paper, we show the local quadratic convergence behavior of the sequential
quadratic programming (SQP) approach for the solution of semilinear elliptic optimal
control problems of the type
Minimize ∫_Ω φ(x, y, u) dx
subject to   A y + d(x, y) = u in Ω,   y = 0 on Γ,
and   u ≥ 0 in Ω,   ε u + y ≥ yc in Ω,   (Pmcc)
where A is a uniformly elliptic second-order differential operator and d is a monotone
nonlinearity. The SQP method was considered previously for optimal control problems with control constraints only, see for instance Unger [1997], Heinkenschloss and
Tröltzsch [1998], Tröltzsch [1999], Tröltzsch and Volkwein [2001], Hintermüller and
Hinze [2006] and Wachsmuth [2007].
The first-order optimality system for (Pmcc ) is a generalization of the system (2.1)
for the linear-quadratic problem given on p. 29. It is re-written (see Section 4 of the
paper) as a generalized equation
0 ∈ F(w) + N(w),   (5.1)
where in contrast to the previous cases, F now comprises also the inequality constraints, and w = (y, u, p, µ1 , µ2 ) comprises state, control and adjoint variables as well
as Lagrange multipliers. This approach would allow nonlinear inequality constraints
as well, which will be considered in an upcoming publication.
Given a current iterate wk , Newton’s method, applied to (5.1), produces the new
iterate as a solution of
0 ∈ F(wk) + F′(wk)(wk+1 − wk) + N(wk+1).   (5.2)
It can be verified that (5.2) is equivalent to one step of the SQP method (see Section 5
of the paper). However, for the convergence analysis it is convenient to think in terms
of Newton’s method. Let us briefly outline how the local quadratic convergence is
shown, compare Alt [1990, 1994]. Suppose that w∗ is a solution of (5.1). We write the
Newton step (5.2) as a perturbed step taken at w∗ :
δk+1 ∈ F(w∗) + F′(w∗)(wk+1 − w∗) + N(wk+1)   (5.3)
where
δk+1 := F(w∗) − F(wk) + F′(w∗)(wk+1 − w∗) − F′(wk)(wk+1 − wk).
By the fact that w∗ is a solution of (5.1), it also solves
0 ∈ F(w∗) + F′(w∗)(w∗ − w∗) + N(w∗).   (5.4)
Under the condition that (5.1) is strongly regular at w∗, we get from (5.3) and (5.4) that
‖wk+1 − w∗‖ ≤ L ‖δk+1‖.   (5.5)
It remains to show that
‖δk+1‖ ≤ c1 ‖wk+1 − wk‖² + c2 ‖wk − w∗‖ ‖wk+1 − w∗‖,
which follows from differentiability and Lipschitz properties of F, i.e., from the properties of d and φ. Given that ‖wk − w∗‖ is sufficiently small, the second term can be hidden in the left hand side of (5.5), which yields the local quadratic convergence.
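To spell out this last step (an elementary manipulation only hinted at above): writing ek = ‖wk − w∗‖ and using ‖wk+1 − wk‖ ≤ ek+1 + ek, the two displayed estimates combine to
\[
e_{k+1} \le L c_1 (e_{k+1}+e_k)^2 + L c_2\, e_k\, e_{k+1}
        = L c_1 e_k^{2} + \bigl(L c_1 (e_{k+1}+2e_k) + L c_2 e_k\bigr)\, e_{k+1},
\]
so that, once ek and ek+1 are small enough for the bracket to be at most 1/2, one obtains e_{k+1} ≤ 2 L c_1 e_k², i.e., the local quadratic rate.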
The strong regularity of (5.1) at w∗ follows from our results in Alt et al. [2006], see
Section 2 of this thesis, under the assumption that the active sets at the solution w∗
are well separated and that second-order sufficient conditions hold, see Theorem 6.7
of the paper under discussion.
LOCAL QUADRATIC CONVERGENCE OF SQP FOR ELLIPTIC
OPTIMAL CONTROL PROBLEMS WITH MIXED
CONTROL-STATE CONSTRAINTS
ROLAND GRIESSE, NATALIYA METLA, AND ARND RÖSCH
Abstract. Semilinear elliptic optimal control problems with pointwise control
and mixed control-state constraints are considered. Necessary and sufficient optimality conditions are given. The equivalence of the SQP method and Newton’s
method for a generalized equation is discussed. Local quadratic convergence of
the SQP method is proved.
1. Introduction
This paper is concerned with the local convergence analysis of the sequential quadratic programming (SQP) method for the following class of semilinear optimal control
problems:
Minimize f(y, u) := ∫_Ω φ(ξ, y(ξ), u(ξ)) dξ   (P)
subject to u ∈ L∞(Ω) and the elliptic state equation
A y + d(ξ, y) = u   in Ω,
y = 0   on ∂Ω,   (1.1)
as well as pointwise constraints
u ≥ 0   in Ω,
εu + y ≥ yc   in Ω.   (1.2)
Here and throughout, Ω is a bounded domain in RN , N ∈ {2, 3}, which is convex or
has a C 1,1 boundary ∂Ω. In (1.1), A is an elliptic operator in H01 (Ω) specified below,
and ε is a positive number. The bound yc is a function in L∞ (Ω).
Problems with mixed control-state constraints are important as Lavrentiev-type
regularizations of pointwise state-constrained problems [10–12], but they are also interesting in their own right. In the former case, ε is a small parameter tending to
zero. For the purpose of this paper, we consider ε to be fixed. Note that in addition
to the mixed control-state constraint, a pure control constraint is present on the same
domain. Since problem (P) is nonconvex, different local minima may occur.
SQP methods have proved to be fast solution methods for nonlinear programming
problems. A large body of literature exists concerning the analysis of these methods
for finite-dimensional problems. For a convergence analysis in a general Banach space
setting with equality and inequality constraints, we refer to [2, 3].
The main contribution of this paper is the proof of local quadratic convergence of
the SQP method, applied to (P). To our knowledge, such convergence results in the
context of PDE-constrained optimization are so far only available for purely control-constrained problems [7, 17, 19]. Following [2], we exploit the equivalence between
the SQP and the Lagrange-Newton methods, i.e., Newton’s method, applied to a
generalized (set-valued) equation representing necessary conditions of optimality. We
concentrate on specific issues arising due to the semilinear state equation, e.g., the
careful choice of suitable function spaces. An important step is the verification of the
so-called strong regularity of the generalized equation, which is made difficult by the
simultaneous presence of pure control and mixed control-state constraints (1.2). The
key idea was recently developed in [4].
We remark that strong regularity is known to be closely related to second-order
sufficient conditions (SSC). For problems with pure control constraints, SSC are well
understood and they are close to the necessary ones when so-called strongly active
subsets are used, see, e.g., [17, 19, 20]. However, the situation is more difficult for
problems with mixed control-state constraints [14, 16] or even pure state constraints.
In order to avoid a more technical discussion, we presently employ relatively strong
SSC and refer to future work for their refinement. We also refer to an upcoming
publication concerning the numerical application of the SQP method to problems of
type (P).
The material in this paper is organized as follows. In Section 2, we state our
main assumptions and recall some properties about the state equation. Necessary
and sufficient optimality conditions for problem (P) are stated in Section 3, and their
reformulation as a generalized equation is given in Section 4. Section 5 addresses the
equivalence of the SQP and Lagrange-Newton methods. Section 6 is devoted to the
proof of strong regularity of the generalized equation. Finally, Section 7 completes the
convergence analysis of the SQP method. A number of auxiliary results have been
collected in the Appendix.
We denote by Lp (Ω) and H m (Ω) the usual Lebesgue and Sobolev spaces [1], and
(·, ·) is the scalar product in L2 (Ω) or [L2 (Ω)]N , respectively. H01 (Ω) is the subspace of
H 1 (Ω) with zero boundary traces, and H −1 (Ω) is its dual. The continuous embedding
of a normed space X into a normed space Y is denoted by X ,→ Y . Throughout, we
denote by BrX (x) the open ball of radius r around x, in the topology of X. In particular,
we write Br∞ (x) for the open ball with respect to the L∞ (Ω) norm. Throughout, c,
c1 etc. denote generic positive constants whose value may change from instance to
instance.
2. Assumptions and Properties of the State Equation
The following assumptions (A1)–(A4) are taken to hold throughout the paper.
Assumption.
(A1) Let Ω be a bounded domain in RN , N ∈ {2, 3} which is convex or has C 1,1
boundary ∂Ω. The bound yc is in L∞ (Ω), and ε > 0.
(A2) The operator A : H01(Ω) → H−1(Ω) is defined as A y(v) = a[y, v], where
a[y, v] = (∇v, A0 ∇y) + (b⊤ ∇y, v) + (c y, v).
A0 is an N × N matrix with Lipschitz continuous entries on Ω such that ρ⊤ A0(ξ) ρ ≥ m0 |ρ|² holds with some m0 > 0 for all ρ ∈ RN and almost all ξ ∈ Ω. Moreover, b ∈ L∞(Ω)N and c ∈ L∞(Ω). The bilinear form a[·, ·] is not necessarily symmetric but it is assumed to be continuous and coercive, i.e.,
a[y, v] ≤ c1 ‖y‖_{H1(Ω)} ‖v‖_{H1(Ω)},
a[y, y] ≥ c0 ‖y‖²_{H1(Ω)}
for all y, v ∈ H01(Ω) with some positive constants c1 and c0. A simple example is a[y, v] = (∇y, ∇v), corresponding to A = −Δ.
(A3) d(ξ, y) belongs to the C² class of functions with respect to y for almost all ξ ∈ Ω. Moreover, dyy is assumed to be a locally bounded and locally Lipschitz-continuous function with respect to y, i.e., the following conditions hold true: there exists Kd > 0 such that
|d(ξ, 0)| + |dy(ξ, 0)| + |dyy(ξ, 0)| ≤ Kd,
and for any M > 0, there exists Ld(M) > 0 such that
|dyy(ξ, y1) − dyy(ξ, y2)| ≤ Ld(M) |y1 − y2|   a.e. in Ω
for all y1, y2 ∈ R satisfying |y1|, |y2| ≤ M. Additionally, dy(ξ, y) ≥ 0 a.e. in Ω, for all y ∈ R.
(A4) The function φ = φ(ξ, y, u) is measurable with respect to ξ ∈ Ω for each y and u, and of class C² with respect to y and u for almost all ξ ∈ Ω. Moreover, the second derivatives are assumed to be locally bounded and locally Lipschitz-continuous functions, i.e., the following conditions hold: there exist Ky, Ku, Kyu > 0 such that
|φ(ξ, 0, 0)| + |φy(ξ, 0, 0)| + |φyy(ξ, 0, 0)| ≤ Ky,
|φyu(ξ, 0, 0)| ≤ Kyu,
|φ(ξ, 0, 0)| + |φu(ξ, 0, 0)| + |φuu(ξ, 0, 0)| ≤ Ku.
Moreover, for any M > 0, there exists Lφ(M) > 0 such that
|φyy(ξ, y1, u1) − φyy(ξ, y2, u2)| ≤ Lφ(M) (|y1 − y2| + |u1 − u2|),
|φyu(ξ, y1, u1) − φyu(ξ, y2, u2)| ≤ Lφ(M) (|y1 − y2| + |u1 − u2|),
|φuy(ξ, y1, u1) − φuy(ξ, y2, u2)| ≤ Lφ(M) (|y1 − y2| + |u1 − u2|),
|φuu(ξ, y1, u1) − φuu(ξ, y2, u2)| ≤ Lφ(M) (|y1 − y2| + |u1 − u2|)
for all yi, ui ∈ R satisfying |yi|, |ui| ≤ M, i = 1, 2. In addition, φuu(ξ, y, u) ≥ m > 0 a.e. in Ω, for all (y, u) ∈ R².
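As a concrete illustration (our own example, not taken from the paper), the monotone nonlinearity and tracking-type integrand
\[
d(\xi, y) = y^{3}, \qquad \varphi(\xi, y, u) = \tfrac12\,(y - y_d(\xi))^{2} + \tfrac{\nu}{2}\,u^{2}, \qquad y_d \in L^{\infty}(\Omega),\ \nu > 0,
\]
satisfy (A3)–(A4): here d_y(ξ, y) = 3y² ≥ 0, d_yy(ξ, y) = 6y is locally bounded and locally Lipschitz, φ_uu ≡ ν > 0, and all second derivatives of φ are constant.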
In the sequel, we will simply write d(y) instead of d(ξ, y) etc. As a consequence of
(A3)–(A4), the Nemyckii operators d(·) and φ(·) are twice continuously Fréchet differentiable with respect to the L∞ (Ω) norms, and their derivatives are locally Lipschitz
continuous, see Lemma A.1.
The necessity of using L∞ (Ω) norms for general nonlinearities d and φ motivates
our choice
Y := H²(Ω) ∩ H01(Ω)
as a state space, since Y ↪ L∞(Ω).
Remark 2.1. In case Ω has only a Lipschitz boundary, our results remain true when
Y is replaced by H01 (Ω) ∩ L∞ (Ω).
Recall that a function y ∈ H01 (Ω) ∩ L∞ (Ω) is called a weak solution of (1.1) with
u ∈ L2 (Ω) if a[y, v] + (d(y), v) = (u, v) holds for all v ∈ H01 (Ω).
Lemma 2.2. Under assumptions (A1)–(A3) and for any given u ∈ L2 (Ω), the semilinear equation (1.1) possesses a unique weak solution y ∈ Y . It satisfies the a priori
estimate
‖y‖_{H1(Ω)} + ‖y‖_{L∞(Ω)} ≤ CΩ (‖u‖_{L2(Ω)} + 1)
with a constant CΩ independent of u.
Proof. The existence and uniqueness of a weak solution y ∈ H01 (Ω) ∩ L∞ (Ω) is a
standard result [18, Theorem 4.8]. It satisfies
‖y‖_{H1(Ω)} + ‖y‖_{L∞(Ω)} ≤ CΩ (‖u‖_{L2(Ω)} + 1) =: M
with some constant CΩ independent of u. Lemma A.1 implies that d(y) ∈ L∞ (Ω).
Using the embedding L∞ (Ω) ,→ L2 (Ω), we conclude that the difference u−d(y) belongs
to L2 (Ω). Owing to assumption (A1), y ∈ H 2 (Ω), see for instance [6, Theorem 2.2.2.3].
We will frequently also need the corresponding result for the linearized equation
A y + dy(ȳ) y = u   in Ω,   y = 0   on ∂Ω.   (2.1)
Lemma 2.3. Under assumptions (A1)–(A3) and given ȳ ∈ L∞(Ω), the linearized PDE (2.1) possesses a unique weak solution y ∈ Y for any given u ∈ L2(Ω). It satisfies the a priori estimate
‖y‖_{H2(Ω)} ≤ CΩ(ȳ) ‖u‖_{L2(Ω)}
with a constant CΩ(ȳ) independent of u.
Proof. According to (A3) and Lemma A.1, dy(ȳ) is a nonnegative coefficient in L∞(Ω). The claim thus follows again from standard arguments, see, e.g., [6, Theorem 2.2.2.3].
3. Necessary and Sufficient Optimality Conditions
In this section, we introduce necessary and sufficient optimality conditions for problem (P). For convenience, we define the Lagrange functional
L : Y × L∞ (Ω) × Y × L∞ (Ω) × L∞ (Ω) → R
as
L(y, u, p, µ1 , µ2 ) = f (y, u) + a[y, p] + (p, d(y) − u) − (µ1 , u) − (µ2 , εu + y − yc ).
Here, µi are Lagrange multipliers associated to the inequality constraints, and p is
the adjoint state. The existence of regular Lagrange multipliers µ1 , µ2 ∈ L∞ (Ω) was
shown in [15, Theorem 7.3], which implies the following lemma:
Lemma 3.1. Suppose that (y, u) ∈ Y × L∞(Ω) is a local optimal solution of (P). Then there exist regular Lagrange multipliers µ1, µ2 ∈ L∞(Ω) and an adjoint state p ∈ Y such that the first-order necessary optimality conditions
Ly(y, u, p, µ1, µ2) = 0,   Lu(y, u, p, µ1, µ2) = 0,   Lp(y, u, p, µ1, µ2) = 0,
u ≥ 0,   µ1 ≥ 0,   µ1 u = 0   a.e. in Ω,
εu + y − yc ≥ 0,   µ2 ≥ 0,   µ2 (εu + y − yc) = 0   a.e. in Ω   (FON)
hold.
Remark 3.2. The Lagrange multipliers and adjoint state associated to a local optimal
solution of (P) need not be unique if the active sets {ξ ∈ Ω : u = 0} and {ξ ∈ Ω : εu +
y − yc = 0} intersect nontrivially. This situation will be excluded by Assumption (A6)
below.
Conditions (FON) are also stated in explicit form in (4.1) below. To guarantee
that x = (y, u) with associated multipliers λ = (µ1 , µ2 , p) is a local solution of (P),
we introduce the following second-order sufficient optimality condition (SSC):
There exists a constant α > 0 such that
Lxx(x, λ)(δx, δx) ≥ α ‖δx‖²_{[L2(Ω)]²}   (3.1)
for all δx = (δy, δu) ∈ Y × L∞(Ω) which satisfy the linearized equation
A δy + dy(y) δy = δu   in Ω,   δy = 0   on ∂Ω.
In (3.1), the Hessian of the Lagrange functional is given by
Lxx(x, λ)(δx, δx) := ∫_Ω (δy, δu) [ φyy(y, u) + dyy(y) p ,  φyu(y, u) ;  φuy(y, u) ,  φuu(y, u) ] (δy, δu)⊤ dξ.   (3.2)
For convenience, we will use the abbreviation
X := Y × L∞(Ω) = (H²(Ω) ∩ H01(Ω)) × L∞(Ω)
in the sequel.
Assumption.
(A5) We assume that x∗ = (y ∗ , u∗ ) ∈ X, together with associated Lagrange multipliers λ∗ = (p∗ , µ∗1 , µ∗2 ) ∈ Y × [L∞ (Ω)]2 , satisfies both (FON) and (SSC).
As mentioned in the introduction, we are aware of the fact that there exist weaker
sufficient conditions which take into account strongly active sets. However, this further
complicates the convergence analysis of SQP and is therefore postponed to later work.
Definition 3.3.
(a) A pair x = (y, u) ∈ X is called an admissible point if it satisfies (1.1) and
(1.2).
(b) A point x̄ ∈ X is called a strict local optimal solution in the sense of L∞(Ω) if there exists ε > 0 such that the inequality f(x̄) < f(x) holds for all admissible x ∈ X \ {x̄} with ‖x − x̄‖_{[L∞(Ω)]²} ≤ ε.
Theorem 3.4. Under Assumptions (A1)–(A5), there exist β > 0 and ε > 0 such that
f(x) ≥ f(x∗) + β ‖x − x∗‖²_{[L2(Ω)]²}
holds for all admissible x ∈ X with ‖x − x∗‖_{[L∞(Ω)]²} ≤ ε. In particular, x∗ is a strict local optimal solution in the sense of L∞(Ω).
Proof. The proof uses the two-norm discrepancy principle, see [8, Theorem 3.5]. Let
x ∈ X be an admissible point, which implies
a[y, p∗] + (p∗, d(y) − u) = 0   and   u ≥ 0,   εu + y − yc ≥ 0   a.e. in Ω.
In view of µ1∗, µ2∗ ≥ 0, we can estimate the cost functional f by the Lagrange functional:
f(x) ≥ f(x) + a[y, p∗] + (p∗, d(y) − u) − (µ1∗, u) − (µ2∗, εu + y − yc) = L(x, λ∗).   (3.3)
The Lagrange functional is twice continuously differentiable with respect to the L∞(Ω) norms, as is easily seen from Lemma A.1. Hence it possesses a Taylor expansion
L(x, λ∗) = L(x∗, λ∗) + Lx(x∗, λ∗)(x − x∗) + Lxx(x∗ + θ(x − x∗), λ∗)(x − x∗, x − x∗)
for all x ∈ X, where θ ∈ (0, 1). Since the pair (x∗, λ∗) satisfies (FON), we have
f(x∗) = L(x∗, λ∗) + Lx(x∗, λ∗)(x − x∗),
which implies
L(x, λ∗) = f(x∗) + Lxx(x∗, λ∗)(x − x∗, x − x∗) + (Lxx(x∗ + θ(x − x∗), λ∗) − Lxx(x∗, λ∗))(x − x∗, x − x∗).
We cannot use (SSC) directly since x satisfies the semilinear equation (1.1) instead of the linearized one (3.2). However, Lemma A.2 implies that there exist ε > 0 and α0 > 0 such that
L(x, λ∗) ≥ f(x∗) + α0 ‖x − x∗‖²_{[L2(Ω)]²} + (Lxx(x∗ + θ(x − x∗), λ∗) − Lxx(x∗, λ∗))(x − x∗, x − x∗),   (3.4)
given that ‖x − x∗‖_{[L∞(Ω)]²} ≤ ε. Moreover, the Hessian of the Lagrange functional satisfies the following local Lipschitz condition (see Lemma A.1 and also [18, Lemma 4.24]):
|(Lxx(x∗ + θ(x − x∗), λ∗) − Lxx(x∗, λ∗))(x − x∗, x − x∗)| ≤ c ‖x − x∗‖_{[L∞(Ω)]²} ‖x − x∗‖²_{[L2(Ω)]²}   (3.5)
for all ‖x − x∗‖_{[L∞(Ω)]²} ≤ ε. Summarizing (3.3)–(3.5), we can estimate
f(x) ≥ f(x∗) + β ‖x − x∗‖²_{[L2(Ω)]²},
where
β := α0 − c ‖x − x∗‖_{[L∞(Ω)]²} ≥ α0 − c ε > 0
when ε is taken sufficiently small.
4. Generalized Equation
We recall the necessary optimality conditions (FON) for problem (P), which read in explicit form
a[v, p] + (dy(y) p, v) + (φy(y, u), v) − (µ2, v) = 0   for all v ∈ H01(Ω),
(φu(y, u), v) − (p, v) − (µ1, v) − (ε µ2, v) = 0   for all v ∈ L2(Ω),
a[y, v] + (d(y), v) − (u, v) = 0   for all v ∈ H01(Ω),
µ1 ≥ 0,   u ≥ 0,   µ1 u = 0   a.e. in Ω,
µ2 ≥ 0,   εu + y − yc ≥ 0,   µ2 (εu + y − yc) = 0   a.e. in Ω.   (4.1)
As was mentioned in the introduction, the local convergence analysis of SQP is based
on its interpretation as Newton’s method for a generalized (set-valued) equation
0 ∈ F(y, u, p, µ1, µ2) + N(y, u, p, µ1, µ2)   (4.2)
equivalent to (4.1). We define
K := {µ ∈ L∞(Ω) : µ ≥ 0 a.e. in Ω},
the cone of nonnegative functions in L∞(Ω), and the normal cone mapping N1 : L∞(Ω) → P(L∞(Ω)),
N1(µ) := {z ∈ L∞(Ω) : (z, µ − ν) ≥ 0 for all ν ∈ K}   if µ ∈ K,   and   N1(µ) := ∅   if µ ∉ K.
Here P(L∞(Ω)) denotes the power set of L∞(Ω), i.e., the set of all subsets of L∞(Ω). In (4.2), F contains the single-valued part of (4.1), i.e.,
F(y, u, p, µ1, µ2)(·) = ( a[·, p] + (dy(y) p, ·) + (φy(y, u), ·) − (µ2, ·),
                         φu(y, u) − p − µ1 − ε µ2,
                         a[y, ·] + (d(y), ·) − (u, ·),
                         u,
                         εu + y − yc )⊤,
and N is a set-valued function
N(y, u, p, µ1, µ2) = ({0}, {0}, {0}, N1(µ1), N1(µ2))⊤.
Note that the generalized equation (4.2) is nonlinear, since it contains the nonlinear
functions d, dy , φy and φu .
Remark 4.1. Let
W := Y × L∞ (Ω) × Y × L∞ (Ω) × L∞ (Ω),
Z := L2 (Ω) × L∞ (Ω) × L2 (Ω) × L∞ (Ω) × L∞ (Ω).
Then F : W −→ Z and N : W −→ P (Z). Owing to Assumptions (A3) and (A4), F is
continuously Fréchet differentiable with respect to the L∞ (Ω) norms, see Lemma A.1.
Lemma 4.2. The first-order necessary conditions (4.1) and the generalized equation
(4.2) are equivalent.
Proof. (4.2) ⇒ (4.1): This is immediate for the first three components. For the fourth component we have
−u ∈ N1(µ1)   ⇒   µ1 ∈ K and (−u, µ1 − ν) ≥ 0 for all ν ∈ K
              ⇒   µ1(ξ) ≥ 0 and −u(ξ)(µ1(ξ) − ν) ≥ 0 for all ν ≥ 0, a.e. in Ω.
This implies
µ1(ξ) = 0 ⇒ u(ξ) ≥ 0,      µ1(ξ) > 0 ⇒ u(ξ) = 0,
which shows the first complementarity system in (4.1). The second follows analogously.
(4.1) ⇒ (4.2): This is again immediate for the first three components. From the first complementarity system in (4.1) we infer that
−u(ξ)(µ1(ξ) − ν) = u(ξ) ν ≥ 0   for all ν ≥ 0, a.e. in Ω,
and hence
−(u, µ1 − ν) ≥ 0   for all ν ∈ K.
In view of µ1 ∈ K, this implies −u ∈ N1(µ1). Again, −(εu + y − yc) ∈ N1(µ2) follows analogously.
5. SQP Method
In this section we briefly recall the SQP (sequential quadratic programming) method
for the solution of problem (P). We also discuss its equivalence with Newton’s method,
applied to the generalized equation (4.2), which is often called the Lagrange-Newton
approach. Throughout the rest of the paper we use the notation
wk := (xk , λk ) = (y k , uk , pk , µk1 , µk2 ) ∈ W
to denote an iterate of either method. SQP methods break down the solution of (P)
into a sequence of quadratic programming problems. At any given iterate wk , one
solves
Minimize fx(xk)(x − xk) + (1/2) Lxx(xk, λk)(x − xk, x − xk)   (QPk)
subject to x = (y, u) ∈ Y × L∞(Ω), the linear state equation
A y + d(yk) + dy(yk)(y − yk) = u   in Ω,   y = 0   on ∂Ω,   (5.1)
and the inequality constraints
u ≥ 0   in Ω,   εu + y − yc ≥ 0   in Ω.   (5.2)
The solution (which needs to be shown to exist) x = (y, u) ∈ Y × L∞(Ω), together with the adjoint state and Lagrange multipliers λ = (p, µ1, µ2) ∈ Y × L∞(Ω) × L∞(Ω), will serve as the next iterate wk+1.
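Schematically, the resulting iteration has the following structure; solve_qp is a hypothetical placeholder for a solver of the subproblem (QPk), e.g. a primal-dual active set method, and the stopping test is the simplest possible one.

    import numpy as np

    def sqp(w0, solve_qp, max_iter=20, tol=1e-8):
        # Schematic SQP / Lagrange-Newton loop: w = (y, u, p, mu1, mu2) collects
        # state, control, adjoint state and multipliers as arrays in a dict;
        # solve_qp(w) is assumed to return the solution of (QP_k) linearized at w.
        w = w0
        for k in range(max_iter):
            w_new = solve_qp(w)                     # one Newton step on 0 in F(w) + N(w)
            step = max(np.linalg.norm(w_new[key] - w[key]) for key in w)
            w = w_new
            if step < tol:                          # near the solution: quadratic decrease
                break
        return w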
Lemma 5.1. There exists R > 0 such that (QPk) has a unique global solution x = (y, u) ∈ X, provided that (xk, pk) ∈ B_R^∞(x∗, p∗).
Proof. For every u ∈ L2 (Ω), the linearized PDE (5.1) has a unique solution y ∈ Y by
Lemma 2.3. We define the feasible set
M k := {x = (y, u) ∈ Y × L2 (Ω) satisfying (5.1) and (5.2)}.
The set M k is non-empty, which follows from [4, Lemma 2.3] using δ3 = −d(y k ) +
dy (y k ) y k . The proof uses the maximum principle for the differential operator Ay +
dy (y k ) y. Clearly, M k is also closed and convex.
The cost functional of (QPk ) can be decomposed into quadratic and affine parts in
x. Lemma A.3 shows that there exists R > 0 and α00 > 0 such that
2
Lxx (xk , λk ) x, x > α00 kxk[L2 (Ω)]2
for all (y, u) ∈ X satisfying A y + dy (y k ) y = u in Ω with homogeneous Dirichlet
∞ ∗ ∗
(x , p ). This implies that the
boundary conditions, provided that (xk , pk ) ∈ BR
cost functional is uniformly convex, continuous (i.e., weakly lower semicontinuous)
and radially unbounded, which shows the unique solvability of (QPk ) in Y × L2 (Ω).
Using the optimality system (5.3) below, we can conclude as in [4, Lemma 2.7] that
u ∈ L∞ (Ω).
The solution (y, u) of (QPk) and its Lagrange multipliers (p, µ1, µ2) are characterized by the first order optimality system (compare [4, Lemma 2.5]):
a[v, p] + (dy(yk) p, v) + (φy(yk, uk), v) + (φyu(yk, uk)(u − uk), v)
    + ((φyy(yk, uk) + dyy(yk) pk)(y − yk), v) − (µ2, v) = 0   for all v ∈ H01(Ω),
(φu(yk, uk), v) + (φuu(yk, uk)(u − uk), v) + (φuy(yk, uk)(y − yk), v)
    − (p, v) − (µ1, v) − (ε µ2, v) = 0   for all v ∈ L2(Ω),
a[y, v] + (d(yk), v) + (dy(yk)(y − yk), v) − (u, v) = 0   for all v ∈ H01(Ω),
µ1 ≥ 0,   u ≥ 0,   µ1 u = 0   a.e. in Ω,
µ2 ≥ 0,   εu + y − yc ≥ 0,   µ2 (εu + y − yc) = 0   a.e. in Ω.   (5.3)
Note that due to the convexity of the cost functional, (5.3) is both necessary and sufficient for optimality, provided that (xk, pk) ∈ B_R^∞(x∗, p∗).
Remark 5.2. The Lagrange multipliers (µ1 , µ2 ) and the adjoint state p in (5.3) need
not be unique, compare [4, Remark 2.6]. Non-uniqueness can occur only if µ1 and µ2
are simultaneously nonzero on a set of positive measure.
We recall for convenience the generalized equation (4.2),
0 ∈ F(w) + N(w).   (5.4)
Given the iterate wk, Newton's method yields the next iterate wk+1 as the solution of the linearized generalized equation
0 ∈ F(wk) + F′(wk)(w − wk) + N(w).   (5.5)
Analogously to Lemma 4.2, one can show:
Lemma 5.3. System (5.3) and the linearized generalized equation (5.5) are equivalent.
6. Strong Regularity
The local convergence analysis of Newton's method (5.5) for the solution of (5.4) is based on a perturbation argument. It will be carried out in Section 7. The main ingredient in the proof is the local Lipschitz stability of solutions $w = w(\eta)$ of
\[
0 \in F(\eta) + F'(\eta)(w - \eta) + N(w) \tag{6.1}
\]
with respect to the parameter $\eta$ near $w^*$. The difficulty arises due to the fact that $\eta$ enters nonlinearly in (6.1). Therefore, we employ an implicit function theorem due to Dontchev [5] to derive this result. This theorem requires the so-called strong regularity of (5.4), i.e., the Lipschitz stability of solutions $w = w(\delta)$ of
\[
\delta \in F(w^*) + F'(w^*)(w - w^*) + N(w) \tag{6.2}
\]
with respect to the new perturbation parameter $\delta$, which enters linearly. The parameter $\delta$ belongs to the image space of $F$,
\[
Z := L^2(\Omega) \times L^\infty(\Omega) \times L^2(\Omega) \times L^\infty(\Omega) \times L^\infty(\Omega),
\]
see Remark 4.1. Note that $w^*$ is a solution of both (5.4) and (6.2) for $\delta = 0$.
Definition 6.1 (see [13]). The generalized equation (5.4) is called strongly regular at $w^*$ if there exist radii $r_1 > 0$, $r_2 > 0$ and a positive constant $L_\delta$ such that for all perturbations $\delta \in B_{r_1}^Z(0)$, the following hold:
(1) the linearized equation (6.2) has a solution $w_\delta = w(\delta) \in B_{r_2}^W(w^*)$,
(2) $w_\delta$ is the only solution of (6.2) in $B_{r_2}^W(w^*)$,
(3) $w_\delta$ satisfies the Lipschitz condition
\[
\|w_\delta - w_{\delta'}\|_W \le L_\delta\, \|\delta - \delta'\|_Z
\]
for all $\delta, \delta' \in B_{r_1}^Z(0)$.
The verification of strong regularity is based on the interpretation of (6.2) as the optimality system of the following QP problem, which depends on the perturbation $\delta$:
\[
(\mathrm{LQP}(\delta))\qquad \text{Minimize}\quad f_x(x^*)(x - x^*) + \tfrac{1}{2}\,L_{xx}(x^*, \lambda^*)(x - x^*,\, x - x^*) - \bigl( [\delta_1, \delta_2],\, x - x^* \bigr)
\]
subject to $x = (y, u) \in Y \times L^\infty(\Omega)$, the linear state equation
\[
A\,y + d(y^*) + d_y(y^*)(y - y^*) = u + \delta_3 \quad\text{in } \Omega, \qquad y = 0 \quad\text{on } \partial\Omega, \tag{6.3}
\]
and the inequality constraints
\[
u \ge \delta_4 \quad\text{in } \Omega, \qquad \varepsilon u + y - y_c \ge \delta_5 \quad\text{in } \Omega. \tag{6.4}
\]
As before, it is easy to check that the necessary optimality conditions of $(\mathrm{LQP}(\delta))$ are equivalent to (6.2).
Lemma 6.2. For any $\delta \in Z$, problem $(\mathrm{LQP}(\delta))$ possesses a unique global solution $x_\delta = (y_\delta, u_\delta) \in X$. If $\lambda_\delta = (p_\delta, \mu_{1,\delta}, \mu_{2,\delta}) \in Y \times L^\infty(\Omega) \times L^\infty(\Omega)$ are associated Lagrange multipliers, then $(x_\delta, \lambda_\delta)$ satisfies (6.2). On the other hand, if any $(x_\delta, \lambda_\delta) \in W$ satisfies (6.2), then $x_\delta$ is the unique global solution of $(\mathrm{LQP}(\delta))$, and $\lambda_\delta$ are associated adjoint state and Lagrange multipliers.
Proof. For any given $\delta \in Z$, let us denote by $M_\delta$ the set of all $x = (y, u) \in Y \times L^2(\Omega)$ satisfying (6.3) and (6.4). Then $M_\delta$ is nonempty (as can be shown along the lines of [4, Lemma 2.3]), convex and closed. Moreover, (A5) implies that the cost functional $f_\delta(x)$ of $(\mathrm{LQP}(\delta))$ satisfies
\[
f_\delta(x) \ge \frac{\alpha}{2}\, \|x\|^2_{[L^2(\Omega)]^2} + \text{linear terms in } x
\]
for all $x$ satisfying (6.3). As in the proof of Lemma 5.1, we conclude that $(\mathrm{LQP}(\delta))$ has a unique solution $x_\delta = (y_\delta, u_\delta) \in X$.
Suppose that $\lambda_\delta = (p_\delta, \mu_{1,\delta}, \mu_{2,\delta}) \in Y \times L^\infty(\Omega) \times L^\infty(\Omega)$ are associated Lagrange multipliers, i.e., the necessary optimality conditions of $(\mathrm{LQP}(\delta))$ are satisfied. As argued above, it is easy to check that then (6.2) holds. On the other hand, suppose that any $(x_\delta, \lambda_\delta) \in W$ satisfies (6.2), i.e., the necessary optimality conditions of $(\mathrm{LQP}(\delta))$. As $f_\delta$ is strictly convex, these conditions are likewise sufficient for optimality, and the minimizer $x_\delta$ is unique.
The proof of Lipschitz stability of solutions for problems of type (LQP(δ)) has
recently been achieved in [4]. The main difficulty consisted in overcoming the nonuniqueness of the associated adjoint state and Lagrange multipliers. We follow the
same technique here.
Definition 6.3. Let $\sigma > 0$ be a real number. We define two subsets of $\Omega$,
\[
S_1^\sigma = \{ \xi \in \Omega : 0 \le u^*(\xi) \le \sigma \}, \qquad
S_2^\sigma = \{ \xi \in \Omega : 0 \le \varepsilon u^*(\xi) + y^*(\xi) - y_c(\xi) \le \sigma \},
\]
called the security sets of level $\sigma$ for (P).
Assumption.
(A6) We require that $S_1^\sigma \cap S_2^\sigma = \emptyset$ for some fixed $\sigma > 0$.
From now on, we suppose (A1)–(A6) to hold. Assumption (A6) implies that the active sets
\[
A_1^* = \{ \xi \in \Omega : u^*(\xi) = 0 \}, \qquad
A_2^* = \{ \xi \in \Omega : \varepsilon u^*(\xi) + y^*(\xi) - y_c(\xi) = 0 \}
\]
are well separated. This in turn implies the uniqueness of the Lagrange multipliers and adjoint state $(p^*, \mu_1^*, \mu_2^*)$. Due to a continuity argument, the same conclusions hold for the solution and Lagrange multipliers of $(\mathrm{LQP}(\delta))$ for sufficiently small $\delta$, as proved in the following theorem.
Theorem 6.4. There exist $G > 0$ and $L_\delta > 0$ such that $\|\delta\|_Z \le G\,\sigma$ implies:
(1) The Lagrange multipliers $\lambda_\delta = (p_\delta, \mu_{1,\delta}, \mu_{2,\delta})$ for $(\mathrm{LQP}(\delta))$ are unique.
(2) For any such $\delta$ and $\delta'$, the corresponding solutions and Lagrange multipliers of $(\mathrm{LQP}(\delta))$ satisfy
\[
\|x_{\delta'} - x_\delta\|_{Y \times L^\infty(\Omega)} + \|\lambda_{\delta'} - \lambda_\delta\|_{Y \times L^\infty(\Omega) \times L^\infty(\Omega)} \le L_\delta\, \|\delta' - \delta\|_Z . \tag{6.5}
\]
Proof. The proof employs the technique introduced in [4], so we will only revisit the main steps here. In contrast to the linear-quadratic problem considered in [4], the cost functional and PDE in $(\mathrm{LQP}(\delta))$ are slightly more general. To overcome the potential non-uniqueness of Lagrange multipliers, one introduces an auxiliary problem with solutions $(y^{\mathrm{aux}}_\delta, u^{\mathrm{aux}}_\delta)$, in which the inequality constraints (6.4) are imposed only on the disjoint sets $S_1^\sigma$ and $S_2^\sigma$, respectively. Then the associated Lagrange multipliers $\mu^{\mathrm{aux}}_{i,\delta}$, $i = 1, 2$, and the adjoint state $p^{\mathrm{aux}}_\delta$ are unique, see [4, Lemma 3.1]. For any two perturbations $\delta, \delta' \in Z$ we abbreviate
\[
\delta u := u^{\mathrm{aux}}_{\delta'} - u^{\mathrm{aux}}_{\delta}
\]
and similarly for the remaining quantities. From the optimality conditions of the auxiliary problem one deduces
\[
\begin{aligned}
\alpha\,\bigl( \|\delta y\|_{L^2(\Omega)}^2 + \|\delta u\|_{L^2(\Omega)}^2 \bigr)
&\overset{(A5)}{\le} L_{xx}(y^*, u^*)(\delta x, \delta x) \\
&= (\delta_1' - \delta_1, \delta y) + (\delta_2' - \delta_2, \delta u) - (\delta_3' - \delta_3, \delta p) + (\delta\mu_2, \delta y) + (\delta\mu_1, \delta u) + \varepsilon\,(\delta\mu_2, \delta u) \\
&\le (\delta_1' - \delta_1, \delta y) + (\delta_2' - \delta_2, \delta u) - (\delta_3' - \delta_3, \delta p) + (\delta\mu_1, \delta_4' - \delta_4) + (\delta\mu_2, \delta_5' - \delta_5).
\end{aligned}
\]
The last inequality follows from [4, Lemma 3.3]. Young's inequality yields
\[
\frac{\alpha}{2}\bigl( \|\delta y\|_{L^2(\Omega)}^2 + \|\delta u\|_{L^2(\Omega)}^2 \bigr)
\le \max\Bigl\{ \frac{2}{\alpha}, \frac{1}{4\kappa} \Bigr\}\, \|\delta - \delta'\|^2_{[L^2(\Omega)]^5}
+ \kappa \bigl( \|\delta p\|^2_{L^2(\Omega)} + \|\delta\mu_1\|^2_{L^2(\Omega)} + \|\delta\mu_2\|^2_{L^2(\Omega)} \bigr), \tag{6.6}
\]
where $\kappa > 0$ is specified below. The difference of the adjoint states satisfies
\[
a[v, \delta p] + (d_y(y^*)\,\delta p, v)
= -(\phi_{yy}(y^*, u^*)\,\delta y, v) - (d_{yy}(y^*)\,p^*\,\delta y, v) - (\phi_{yu}(y^*, u^*)\,\delta u, v) + (\delta_1 - \delta_1', v) + (\delta\mu_2, v) \tag{6.7}
\]
for all $v \in H_0^1(\Omega)$. The differences in the Lagrange multipliers are given by
\[
\delta\mu_1 = \begin{cases} \phi_{uu}(y^*, u^*)\,\delta u + \phi_{uy}(y^*, u^*)\,\delta y - \delta p - (\delta_2 - \delta_2') & \text{in } S_1^\sigma, \\ 0 & \text{in } \Omega \setminus S_1^\sigma, \end{cases} \tag{6.8}
\]
and
\[
\varepsilon\,\delta\mu_2 = \begin{cases} \phi_{uu}(y^*, u^*)\,\delta u + \phi_{uy}(y^*, u^*)\,\delta y - \delta p - (\delta_2 - \delta_2') & \text{in } S_2^\sigma, \\ 0 & \text{in } \Omega \setminus S_2^\sigma. \end{cases} \tag{6.9}
\]
The substitution of $\delta\mu_2$ into (6.7) yields
\[
\begin{aligned}
a[v, \delta p] + (d_y(y^*)\,\delta p, v) + \frac{1}{\varepsilon}\,(\delta p, \chi_{S_2^\sigma}\, v)
&= -(\phi_{yy}(y^*, u^*)\,\delta y, v) - (d_{yy}(y^*)\,p^*\,\delta y, v) - (\phi_{yu}(y^*, u^*)\,\delta u, v) + (\delta_1 - \delta_1', v) \\
&\quad + \frac{1}{\varepsilon}\,(\phi_{uu}(y^*, u^*)\,\delta u, \chi_{S_2^\sigma}\, v) + \frac{1}{\varepsilon}\,(\phi_{uy}(y^*, u^*)\,\delta y, \chi_{S_2^\sigma}\, v) - \frac{1}{\varepsilon}\,(\delta_2 - \delta_2', \chi_{S_2^\sigma}\, v).
\end{aligned}
\]
A standard a priori estimate (compare Lemma 2.3) implies
\[
\|\delta p\|_{L^2(\Omega)} \le \|\delta p\|_Y \le c\,\bigl( \|\delta y\|_{L^2(\Omega)} + \|\delta u\|_{L^2(\Omega)} + \|\delta_1 - \delta_1'\|_{L^2(\Omega)} + \|\delta_2 - \delta_2'\|_{L^2(\Omega)} \bigr).
\]
From (6.8) and (6.9), we infer that $\|\delta\mu_1\|_{L^2(\Omega)}$ and $\|\delta\mu_2\|_{L^2(\Omega)}$ can be estimated by a similar expression. Plugging these estimates into (6.6) and choosing $\kappa$ sufficiently small, we get
\[
\|\delta y\|^2_{L^2(\Omega)} + \|\delta u\|^2_{L^2(\Omega)} \le c_{\mathrm{aux}}\, \|\delta - \delta'\|^2_{[L^2(\Omega)]^5}.
\]
By a priori estimates for the linearized and adjoint PDEs, we immediately obtain Lipschitz stability for $\delta y$ and thus for $\delta p$ with respect to the $H^2(\Omega)$-norm.
The projection formula (compare [4, Lemma 2.7] and also Lemma A.1)
\[
\mu^{\mathrm{aux}}_{1,\delta} + \varepsilon\,\mu^{\mathrm{aux}}_{2,\delta}
= \max\Bigl\{ 0,\; \phi_{uu}(y^*, u^*)\Bigl( \max\Bigl\{ \delta_4,\, \frac{y_c + \delta_5 - y^{\mathrm{aux}}_\delta}{\varepsilon} \Bigr\} - u^* \Bigr)
+ \phi_{uy}(y^*, u^*)\,(y^{\mathrm{aux}}_\delta - y^*) + \phi_u(y^*, u^*) - p^{\mathrm{aux}}_\delta - \delta_2 \Bigr\}
\]
yields the $L^\infty(\Omega)$-regularity of the Lagrange multipliers $(\mu^{\mathrm{aux}}_{1,\delta}, \mu^{\mathrm{aux}}_{2,\delta})$ and the control $u^{\mathrm{aux}}_\delta$. As in [4, Lemma 3.5], we conclude
\[
\|\delta\mu_1 + \varepsilon\,\delta\mu_2\|_{L^\infty(\Omega)} \le c\,\|\delta' - \delta\|_Z .
\]
From the optimality system we have
\[
\phi_{uu}(y^*, u^*)\,\delta u = \delta\mu_1 + \varepsilon\,\delta\mu_2 - \phi_{uy}(y^*, u^*)\,\delta y + \delta p + (\delta_2 - \delta_2'),
\]
which implies by Assumption (A4)
\[
m\,\|\delta u\|_{L^\infty(\Omega)} \le c\,\bigl( \|\delta\mu_1 + \varepsilon\,\delta\mu_2\|_{L^\infty(\Omega)} + \|\delta y\|_{L^\infty(\Omega)} + \|\delta p\|_{L^\infty(\Omega)} + \|\delta_2 - \delta_2'\|_{L^\infty(\Omega)} \bigr)
\]
and yields the desired $L^\infty$-stability for the control of the auxiliary problem.
As in [4, Lemma 4.1], one shows that for $\|\delta\|_Z \le G\,\sigma$ (for a certain constant $G > 0$), the solution $(y^{\mathrm{aux}}_\delta, u^{\mathrm{aux}}_\delta)$ of the auxiliary problem coincides with the solution of $(\mathrm{LQP}(\delta))$. Likewise, the Lagrange multipliers and adjoint states of both problems coincide and are Lipschitz stable in $L^\infty(\Omega)$ and $Y$, respectively (see [4, Lemma 4.4]).
Remark 6.5. Theorem 6.4, together with Lemma 6.2, proves the strong regularity of
(5.4) at w∗ .
In order to apply the implicit function theorem, we verify that (6.1) satisfies a
Lipschitz condition with respect to η, uniformly in a neighborhood of w∗ .
Lemma 6.6. For any radii $r_3 > 0$, $r_4 > 0$ there exists $L > 0$ such that for any $\eta_1, \eta_2 \in B_{r_3}^W(w^*)$ and for all $w \in B_{r_4}^W(w^*)$ there holds the Lipschitz condition
\[
\|F(\eta_1) + F'(\eta_1)(w - \eta_1) - F(\eta_2) - F'(\eta_2)(w - \eta_2)\|_Z \le L\, \|\eta_1 - \eta_2\|_W . \tag{6.10}
\]
Proof. Let us denote $\eta_i = (y_i, u_i, p_i, \mu_1^i, \mu_2^i) \in B_{r_3}^W(w^*)$ and $w = (y, u, p, \mu_1, \mu_2) \in B_{r_4}^W(w^*)$, with $r_3, r_4 > 0$ arbitrary. A simple calculation shows
\[
F(\eta_1) + F'(\eta_1)(w - \eta_1) - F(\eta_2) - F'(\eta_2)(w - \eta_2)
= \bigl( f_1(y_1, u_1) - f_1(y_2, u_2),\; f_2(y_1, u_1) - f_2(y_2, u_2),\; f_3(y_1) - f_3(y_2),\; 0,\; 0 \bigr)^\top,
\]
where
\[
\begin{aligned}
f_1(y_i, u_i) &= d_y(y_i)\,p + \phi_y(y_i, u_i) + [\phi_{yy}(y_i, u_i) + d_{yy}(y_i)\,p_i](y - y_i) + \phi_{yu}(y_i, u_i)(u - u_i), \\
f_2(y_i, u_i) &= \phi_u(y_i, u_i) + \phi_{uy}(y_i, u_i)(y - y_i) + \phi_{uu}(y_i, u_i)(u - u_i), \\
f_3(y_i) &= d(y_i) + d_y(y_i)(y - y_i).
\end{aligned}
\]
We consider only the Lipschitz condition for $f_3$; the rest follows analogously. Using the triangle inequality, we obtain
\[
\begin{aligned}
\|f_3(y_1) - f_3(y_2)\|_{L^2(\Omega)}
&\le \|d(y_1) - d(y_2)\|_{L^2(\Omega)} + \|d_y(y_1)(y_2 - y_1)\|_{L^2(\Omega)} + \|(d_y(y_1) - d_y(y_2))(y - y_2)\|_{L^2(\Omega)} \\
&\le \|d(y_1) - d(y_2)\|_{L^2(\Omega)} + \|d_y(y_1)\|_{L^\infty(\Omega)}\, \|y_2 - y_1\|_{L^2(\Omega)} + \|d_y(y_1) - d_y(y_2)\|_{L^\infty(\Omega)}\, \|y - y_2\|_{L^2(\Omega)}.
\end{aligned}
\]
The properties of $d$, see Lemma A.1, imply that $\|d_y(y_1)\|_{L^\infty(\Omega)}$ is uniformly bounded for all $y_1 \in B_{r_3}^\infty(y^*)$. Moreover, $\|y - y_2\|_{L^2(\Omega)} \le \|y - y^*\|_{L^2(\Omega)} + \|y^* - y_2\|_{L^2(\Omega)} \le c\,(r_3 + r_4)$ holds. Together with the Lipschitz properties of $d$ and $d_y$, see again Lemma A.1, we obtain
\[
\|f_3(y_1) - f_3(y_2)\|_{L^2(\Omega)} \le L\, \|y_1 - y_2\|_{L^\infty(\Omega)}
\]
for some constant $L > 0$.
Using Theorem 6.4 and Lemma 6.6, the main result of this section follows directly
from Dontchev’s implicit function theorem [5, Theorem 2.1]:
Theorem 6.7. There exist radii $r_5 > 0$, $r_6 > 0$ such that for any parameter $\eta \in B_{r_5}^W(w^*)$, there exists a solution $w(\eta) \in B_{r_6}^W(w^*)$ of (6.1), which is unique in this neighborhood. Moreover, there exists a constant $L_\eta > 0$ such that for each $\eta_1, \eta_2 \in B_{r_5}^W(w^*)$, the Lipschitz estimate
\[
\|w(\eta_1) - w(\eta_2)\|_W \le L_\eta\, \|\eta_1 - \eta_2\|_W
\]
holds.
7. Local Convergence Analysis of SQP
This section is devoted to the local quadratic convergence analysis of the SQP
method. As was shown in Section 5, the SQP method is equivalent to Newton’s
method (5.5), applied to the generalized equation (5.4). It is convenient to carry out
the convergence analysis on the level of generalized equations. As mentioned in the
previous section, the key property is the local Lipschitz stability of solutions w(η) of
(6.1) and w(δ) of (6.2), as proved in Theorems 6.7 and 6.4, respectively. In the proof
of our main result, the iterates wk are considered perturbations of the solution w∗ of
(5.4) and play the role of the parameter η. We recall the function spaces
\[
W := Y \times L^\infty(\Omega) \times Y \times L^\infty(\Omega) \times L^\infty(\Omega), \qquad
Y := H^2(\Omega) \cap H_0^1(\Omega), \qquad
Z := L^2(\Omega) \times L^\infty(\Omega) \times L^2(\Omega) \times L^\infty(\Omega) \times L^\infty(\Omega).
\]
Theorem 7.1. There exist a radius $r > 0$ and a constant $C_{\mathrm{SQP}} > 0$ such that for each starting point $w^0 \in B_r^W(w^*)$, the sequence of iterates $w^k$ generated by (5.5) is well defined in $B_r^W(w^*)$ and satisfies
\[
\|w^{k+1} - w^*\|_W \le C_{\mathrm{SQP}}\, \|w^k - w^*\|^2_W .
\]
Proof. Suppose that the iterate $w^k \in B_r^W(w^*)$ is given. The radius $r$ satisfying $r_5 \ge r > 0$ will be specified below. From Theorem 6.7, we infer the existence of a solution $w^{k+1}$ of (5.5) which is unique in $B_{r_6}^W(w^*)$. That is, we have
\[
0 \in F(w^*) + F'(w^*)(w^* - w^*) + N(w^*), \tag{7.1a}
\]
\[
0 \in F(w^k) + F'(w^k)(w^{k+1} - w^k) + N(w^{k+1}). \tag{7.1b}
\]
Adding and subtracting the terms $F'(w^*)(w^{k+1} - w^*)$ and $F(w^*)$ in (7.1b), we obtain
\[
\delta^{k+1} \in F(w^*) + F'(w^*)(w^{k+1} - w^*) + N(w^{k+1}), \tag{7.2}
\]
where
\[
\delta^{k+1} := F(w^*) - F(w^k) + F'(w^*)(w^{k+1} - w^*) - F'(w^k)(w^{k+1} - w^k).
\]
From Lemma 6.6 with $\eta_1 := w^*$, $\eta_2 := w^k$, $w := w^{k+1}$, and $r_3 := r_5$, $r_4 := r_6$, we get
\[
\|\delta^{k+1}\|_Z \le L\,\|w^k - w^*\|_W < L\,r, \tag{7.3}
\]
where $L$ depends only on the radii. That is, $\|\delta^{k+1}\|_Z \le G\,\sigma$ holds whenever
\[
r \le \frac{G\,\sigma}{L},
\]
which we impose on $r$.
Lemma 6.2 shows that (7.1a) and (7.2) are equivalent to problem $(\mathrm{LQP}(\delta))$ for $\delta = 0$ and $\delta = \delta^{k+1}$, respectively. From Theorem 6.4, we thus obtain
\[
\|w^{k+1} - w^*\|_W \le L_\delta\,\|\delta^{k+1} - 0\|_Z. \tag{7.4}
\]
It remains to verify that $\|\delta^{k+1}\|_Z$ is quadratic in $\|w^k - w^*\|_W$. We estimate
\[
\|\delta^{k+1}\|_Z \le \|F(w^*) - F(w^k) - F'(w^k)(w^* - w^k)\|_Z + \|(F'(w^*) - F'(w^k))(w^{k+1} - w^*)\|_Z.
\]
As in the proof of Theorem 3.4, the first term is bounded by a constant times $\|w^k - w^*\|^2_{[L^\infty(\Omega)]^5}$. Moreover, the Lipschitz properties of the terms in $F'$ imply that the second term is bounded by a constant times $\|w^k - w^*\|_{[L^\infty(\Omega)]^5}\,\|w^{k+1} - w^*\|_{[L^2(\Omega)]^5}$. We thus conclude
\[
\|\delta^{k+1}\|_Z \le c_1\,\|w^k - w^*\|^2_W + c_2\,\|w^k - w^*\|_W\,\|w^{k+1} - w^*\|_W, \tag{7.5}
\]
where the constants depend only on the radius $r_5$. We finally choose $r$ as
\[
r = \min\Bigl\{ r_5,\; \frac{G\,\sigma}{L},\; \frac{1}{L_\delta \max\{ 2 c_2,\; c_1 + c_2 L_\delta L \}} \Bigr\}.
\]
Then (7.3)–(7.5) imply $w^{k+1} \in B_r^W(w^*)$ since
\[
\|w^{k+1} - w^*\|_W < L_\delta\bigl( c_1\,r + c_2\,\|w^{k+1} - w^*\|_W \bigr)\,r \le L_\delta\bigl( c_1 + c_2 L_\delta L \bigr)\,r^2 \le r.
\]
Moreover, (7.4)–(7.5) yield
\[
\|w^{k+1} - w^*\|_W \le L_\delta c_1\,\|w^k - w^*\|^2_W + c_2 L_\delta\,r\,\|w^{k+1} - w^*\|_W
\]
and thus
\[
\|w^{k+1} - w^*\|_W \le C_{\mathrm{SQP}}\,\|w^k - w^*\|^2_W
\]
holds with $C_{\mathrm{SQP}} = \dfrac{L_\delta c_1}{1 - c_2 L_\delta r}$.
Clearly, Theorem 7.1 proves the local quadratic convergence of the SQP method. Recall that the iterates $w^k$ are defined by means of Theorem 6.7, as the locally unique solutions, Lagrange multipliers and adjoint states of $(\mathrm{QP}^k)$. Indeed, we can now prove that $w^{k+1} = (x^{k+1}, \lambda^{k+1})$ is globally unique, provided that $w^k$ is already sufficiently close to $w^*$.
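To illustrate the practical meaning of the quadratic estimate in Theorem 7.1, the following tiny Python loop iterates the error recursion $e_{k+1} = C_{\mathrm{SQP}}\, e_k^2$; the constant and the initial error are arbitrary illustrative values (one needs $C_{\mathrm{SQP}}\, e_0 < 1$), not quantities computed from the problem.

# error recursion implied by Theorem 7.1; illustrative constants only
C_SQP, e = 2.0, 1e-1
for k in range(5):
    e = C_SQP * e * e
    print(k + 1, e)
# prints 2e-02, 8e-04, 1.28e-06, 3.28e-12, 2.15e-23:
# the number of correct digits roughly doubles in every step
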
Corollary 7.2. There exists a radius $r' > 0$ such that $w^k \in B_{r'}^W(w^*)$ implies that $(\mathrm{QP}^k)$ has a unique global solution $x^{k+1}$. The associated Lagrange multipliers and adjoint state $\lambda^{k+1} = (\mu_1^{k+1}, \mu_2^{k+1}, p^{k+1})$ are also unique. The iterate $w^{k+1}$ lies again in $B_{r'}^W(x^*, \lambda^*)$.
Proof. We first observe that Theorem 7.1 remains valid (with the same constant $C_{\mathrm{SQP}}$) if $r$ is taken to be smaller than chosen in the proof. Here, we set
\[
r' = \min\Bigl\{ \sigma,\; \frac{\sigma}{c_\infty + \varepsilon},\; R,\; r \Bigr\},
\]
where $R$ and $r$ are the radii from Lemma 5.1 and Theorem 7.1, respectively, and $c_\infty$ is the embedding constant of $H^2(\Omega) \hookrightarrow L^\infty(\Omega)$.
Suppose that $w^k \in B_{r'}^W(w^*)$ holds. Then Lemma 5.1 implies that $(\mathrm{QP}^k)$ possesses a globally unique solution $x^{k+1} \in Y \times L^\infty(\Omega)$. The corresponding active sets are defined by
\[
A_1^{k+1} := \{ \xi \in \Omega : u^{k+1}(\xi) = 0 \}, \qquad
A_2^{k+1} := \{ \xi \in \Omega : \varepsilon u^{k+1}(\xi) + y^{k+1}(\xi) - y_c(\xi) = 0 \}.
\]
We show that $A_1^{k+1} \subset S_1^\sigma$ and $A_2^{k+1} \subset S_2^\sigma$. For almost every $\xi \in A_1^{k+1}$, we have
\[
u^*(\xi) = u^*(\xi) - u^{k+1}(\xi) \le \|u^* - u^{k+1}\|_{L^\infty(\Omega)} \le r' \le \sigma,
\]
since Theorem 7.1 implies that $w^{k+1} \in B_{r'}^W(w^*)$ and thus in particular $u^{k+1} \in B_{r'}^\infty(u^*)$. By the same argument, for almost every $\xi \in A_2^{k+1}$ we obtain
\[
y^*(\xi) + \varepsilon u^*(\xi) - y_c(\xi) = y^*(\xi) + \varepsilon u^*(\xi) - y^{k+1}(\xi) - \varepsilon u^{k+1}(\xi)
\le \|y^* - y^{k+1}\|_{L^\infty(\Omega)} + \varepsilon\,\|u^* - u^{k+1}\|_{L^\infty(\Omega)}
\le (c_\infty + \varepsilon)\,r' \le \sigma.
\]
Owing to Assumption (A6), the active sets $A_1^{k+1}$ and $A_2^{k+1}$ are disjoint, and one can show as in [4, Lemma 3.1] that the Lagrange multipliers $\mu_1^{k+1}$, $\mu_2^{k+1}$ and the adjoint state $p^{k+1}$ are unique.
8. Conclusion
We have studied a class of distributed optimal control problems with semilinear
elliptic state equation and a mixed control-state constraint as well as a pure control constraint on the domain Ω. We have assumed that (y ∗ , u∗ ) is a solution and
(p∗ , µ∗1 , µ∗2 ) are Lagrange multipliers which satisfy second-order sufficient optimality
conditions (A5). Moreover, the active sets at the solution were assumed to be well
separated (A6). We have shown the local quadratic convergence of the SQP method
towards this solution. In particular, we have proved that the quadratic subproblems
possess global unique solutions and unique Lagrange multipliers.
Appendix A. Auxiliary Results
In this appendix we collect some auxiliary results. We begin with a standard result for the Nemyckii operators d(·) and φ(·) whose proof can be found, e.g., in [18,
Lemma 4.10, Satz 4.20]. Throughout, we impose Assumptions (A1)–(A5).
Lemma A.1. The Nemyckii operator $d(\cdot)$ maps $L^\infty(\Omega)$ into $L^\infty(\Omega)$ and it is twice continuously differentiable in these spaces. For arbitrary $M > 0$, the Lipschitz condition
\[
\|d_{yy}(y_1) - d_{yy}(y_2)\|_{L^\infty(\Omega)} \le L_d(M)\, \|y_1 - y_2\|_{L^\infty(\Omega)}
\]
holds for all $y_i \in L^\infty(\Omega)$ such that $\|y_i\|_{L^\infty(\Omega)} \le M$, $i = 1, 2$. In particular,
\[
\|d_{yy}(y)\|_{L^\infty(\Omega)} \le K_d + L_d(M)\, M
\]
holds for all $y \in L^\infty(\Omega)$ such that $\|y\|_{L^\infty(\Omega)} \le M$. The same properties, with different constants, are valid for $d_y(\cdot)$ and $d(\cdot)$. Analogous results hold for $\phi$ and its derivatives up to second order, for all $(y, u) \in [L^\infty(\Omega)]^2$ such that $\|y_i\|_{L^\infty(\Omega)} + \|u_i\|_{L^\infty(\Omega)} \le M$.
The remaining results address the coercivity of the second derivative of the Lagrangian, considered at different linearization points and for perturbed PDEs. Recall that $(x^*, \lambda^*) \in W$ satisfies the second-order sufficient conditions (SSC) with coercivity constant $\alpha > 0$, see (3.1).
Lemma A.2. There exist $\varepsilon > 0$ and $\alpha' > 0$ such that
\[
L_{xx}(x^*, \lambda^*)(x - x^*,\, x - x^*) \ge \alpha'\, \|x - x^*\|^2_{[L^2(\Omega)]^2} \tag{A.1}
\]
holds for all $x = (y, u) \in Y \times L^\infty(\Omega)$ which satisfy the semilinear PDE (1.1) and $\|x - x^*\|_{[L^\infty(\Omega)]^2} \le \varepsilon$.
Proof. Let $x = (y, u)$ satisfy (1.1). We define $\delta x = (\delta y, \delta u) \in Y \times L^\infty(\Omega)$ by
\[
A\,\delta y + d_y(y^*)\,\delta y = \delta u \quad\text{in } \Omega
\]
with homogeneous Dirichlet boundary conditions. Then the error $e := y^* - y - \delta y$ satisfies the linear PDE
\[
A\,e + d_y(y^*)\,e = f \quad\text{in } \Omega \tag{A.2}
\]
with homogeneous Dirichlet boundary conditions and
\[
f := d(y) - d(y^*) - d_y(y^*)(y - y^*).
\]
We estimate
\[
\|f\|_{L^2(\Omega)}
= \Bigl\| \int_0^1 \bigl[ d_y(y^* + s(y - y^*)) - d_y(y^*) \bigr]\, ds\; (y - y^*) \Bigr\|_{L^2(\Omega)}
\le L \int_0^1 s\, ds\; \|y - y^*\|_{L^\infty(\Omega)}\, \|y - y^*\|_{L^2(\Omega)}
\le \frac{L}{2}\, \|y - y^*\|_{L^\infty(\Omega)}\, \bigl( \|\delta y\|_{L^2(\Omega)} + \|e\|_{L^2(\Omega)} \bigr).
\]
In view of Lemma A.1, $d_y(y^*) \in L^\infty(\Omega)$ holds, and it is a standard result that the unique solution $e$ of (A.2) satisfies an a priori estimate
\[
\|e\|_{L^\infty(\Omega)} \le c\, \|f\|_{L^2(\Omega)}.
\]
In view of the embedding $L^\infty(\Omega) \hookrightarrow L^2(\Omega)$ we obtain
\[
\|e\|_{L^2(\Omega)} \le c'\, \frac{L\,\varepsilon}{2}\, \bigl( \|\delta y\|_{L^2(\Omega)} + \|e\|_{L^2(\Omega)} \bigr).
\]
For sufficiently small $\varepsilon > 0$, we can absorb the last term in the left-hand side and obtain
\[
\|e\|_{L^2(\Omega)} \le c''(\varepsilon)\, \|\delta y\|_{L^2(\Omega)},
\]
where $c''(\varepsilon) \searrow 0$ as $\varepsilon \searrow 0$. A straightforward application of [9, Lemma 5.5] concludes the proof.
Lemma A.3. There exist $R > 0$ and $\alpha'' > 0$ such that
\[
L_{xx}(x^k, \lambda^k)(x, x) \ge \alpha''\, \|x\|^2_{[L^2(\Omega)]^2}
\]
holds for all $(y, u) \in Y \times L^2(\Omega)$ satisfying
\[
A\,y + d_y(y^k)\,y = u \quad\text{in } \Omega, \qquad y = 0 \quad\text{on } \partial\Omega, \tag{A.3}
\]
provided that $\|x^k - x^*\|_{L^\infty(\Omega)} + \|p^k - p^*\|_{L^\infty(\Omega)} < R$.
Proof. Let $(y, u)$ be an arbitrary pair satisfying (A.3) and define $\hat y \in Y$ as the unique solution of
\[
A\,\hat y + d_y(y^*)\,\hat y = u \quad\text{in } \Omega, \qquad \hat y = 0 \quad\text{on } \partial\Omega,
\]
for the same control $u$ as above. Then $\delta y := y - \hat y$ satisfies
\[
A\,\delta y + d_y(y^*)\,\delta y = \bigl( d_y(y^*) - d_y(y^k) \bigr)\,y \quad\text{in } \Omega
\]
with homogeneous boundary conditions. A standard a priori estimate and the triangle inequality yield
\[
\|\delta y\|_{L^2(\Omega)}
\le \|d_y(y^*) - d_y(y^k)\|_{L^\infty(\Omega)}\, \|y\|_{L^2(\Omega)}
\le \|d_y(y^*) - d_y(y^k)\|_{L^\infty(\Omega)}\, \bigl( \|\hat y\|_{L^2(\Omega)} + \|\delta y\|_{L^2(\Omega)} \bigr).
\]
Due to the Lipschitz property of $d_y(\cdot)$ with respect to $L^\infty(\Omega)$, there exists a function $c(R)$ tending to 0 as $R \to 0$ such that $\|d_y(y^*) - d_y(y^k)\|_{L^\infty(\Omega)} \le c(R)$, provided that $\|y^k - y^*\|_{L^\infty(\Omega)} < R$. For sufficiently small $R$, the term $\|\delta y\|_{L^2(\Omega)}$ can be absorbed in the left-hand side, and we obtain
\[
\|\delta y\|_{L^2(\Omega)} \le c'(R)\, \|\hat y\|_{L^2(\Omega)},
\]
where $c'(R)$ has the same property as $c(R)$. Again, [9, Lemma 5.5] implies that there exist $\alpha' > 0$ and $R > 0$ such that
\[
L_{xx}(x^*, \lambda^*)(x, x) \ge \alpha'\, \|x\|^2_{L^2(\Omega)},
\]
provided that $\|y^k - y^*\|_{L^\infty(\Omega)} < R$.
Note that $L_{xx}$ depends only on $x$ and the adjoint state $p$. Owing to its Lipschitz property, we further conclude that
\[
L_{xx}(x^k, \lambda^k)(x, x)
= L_{xx}(x^*, \lambda^*)(x, x) + \bigl( L_{xx}(x^k, \lambda^k) - L_{xx}(x^*, \lambda^*) \bigr)(x, x)
\ge \alpha'\, \|x\|^2_{L^2(\Omega)} - L\, \|(x^k, p^k) - (x^*, p^*)\|_{L^\infty(\Omega)}\, \|x\|^2_{L^2(\Omega)}
\ge \bigl( \alpha' - L\,R \bigr)\, \|x\|^2_{L^2(\Omega)} =: \alpha''\, \|x\|^2_{L^2(\Omega)},
\]
given that $(x^k, p^k) \in B_R^\infty(x^*, p^*)$. For sufficiently small $R$, we obtain $\alpha'' > 0$, which completes the proof.
Acknowledgement
This work was supported by the Austrian Science Fund FWF under project number
P18056-N12.
References
[1] R. Adams. Sobolev Spaces. Academic Press, New York-London, 1975. Pure and Applied Mathematics, Vol. 65.
[2] W. Alt. The Lagrange-Newton method for infinite-dimensional optimization problems. Numerical
Functional Analysis and Optimization, 11:201–224, 1990.
[3] W. Alt. Local convergence of the Lagrange-Newton method with applications to optimal control.
Control and Cybernetics, 23(1–2):87–105, 1994.
[4] W. Alt, R. Griesse, N. Metla, and A. Rösch. Lipschitz stability for elliptic optimal control
problems with mixed control-state constraints. submitted, 2006.
[5] A. Dontchev. Implicit function theorems for generalized equations. Mathematical Programming,
70:91–106, 1995.
[6] P. Grisvard. Elliptic Problems in Nonsmooth Domains. Pitman, Boston, 1985.
[7] M. Heinkenschloss and F. Tröltzsch. Analysis of the Lagrange-SQP-Newton Method for the
Control of a Phase-Field Equation. Control Cybernet., 28:177–211, 1998.
[8] H. Maurer. First and Second Order Sufficient Optimality Conditions in Mathematical Programming and Optimal Control. Mathematical Programming Study, 14:163–177, 1981. Mathematical
programming at Oberwolfach (Proc. Conf., Math. Forschungsinstitut, Oberwolfach, 1979).
[9] H. Maurer and J. Zowe. First and second order necessary and sufficient optimality conditions for
infinite-dimensional programming problems. Mathematical Programming, 16:98–110, 1979.
[10] C. Meyer, U. Prüfert, and F. Tröltzsch. On two numerical methods for state-constrained elliptic
control problems. Optimization Methods and Software, to appear.
[11] C. Meyer, A. Rösch, and F. Tröltzsch. Optimal control of PDEs with regularized pointwise state
constraints. Computational Optimization and Applications, 33(2–3):209–228, 2005.
[12] C. Meyer and F. Tröltzsch. On an elliptic optimal control problem with pointwise mixed control-state constraints. In A. Seeger, editor, Recent Advances in Optimization. Proceedings of the 12th
French-German-Spanish Conference on Optimization, volume 563 of Lecture Notes in Economics
and Mathematical Systems, pages 187–204, New York, 2006. Springer.
[13] S. Robinson. Strongly regular generalized equations. Mathematics of Operations Research,
5(1):43–62, 1980.
[14] A. Rösch and F. Tröltzsch. Sufficient second-order optimality conditions for a parabolic optimal control problem with pointwise control-state constraints. SIAM Journal on Control and
Optimization, 42(1):138–154, 2003.
[15] A. Rösch and F. Tröltzsch. Existence of regular Lagrange multipliers for elliptic optimal control
problems with pointwise control-state constraints. SIAM Journal on Control and Optimization,
45(2):548–564, 2006.
[16] A. Rösch and F. Tröltzsch. Sufficient second-order optimality conditions for an elliptic optimal control problem with pointwise control-state constraints. SIAM Journal on Optimization,
17(3):776–794, 2006.
[17] F. Tröltzsch. On the Lagrange-Newton-SQP method for the optimal control of semilinear parabolic equations. SIAM Journal on Control and Optimization, 38(1):294–312, 1999.
[18] F. Tröltzsch. Optimale Steuerung partieller Differentialgleichungen. Theorie, Verfahren und Anwendungen. Vieweg, Wiesbaden, 2005.
[19] F. Tröltzsch and S. Volkwein. The SQP method for control constrained optimal control of the
Burgers equation. ESAIM: Control, Optimisation and Calculus of Variations, 6:649–674, 2001.
[20] F. Tröltzsch and D. Wachsmuth. Second-order sufficient optimality conditions for the optimal
control of Navier-Stokes equations. ESAIM: Control, Optimisation and Calculus of Variations,
12(1):93–119, 2006.
6. Update Strategies for Perturbed Nonsmooth Equations
R. Griesse, T. Grund and D. Wachsmuth: Update Strategies for Perturbed Nonsmooth
Equations, to appear in: Optimization Methods and Software, 2007
This paper addresses the question how the optimal control of a perturbed problem
(with parameter π) can be recovered from the optimal control of the nominal problem
(with parameter π0 ), and from derivative information. Our analysis is carried out in a
general setting where the unknown function u is the solution of a nonsmooth equation
\[
u = \Pi_{U_{ad}}\bigl( g(\pi) - G(\pi)\,u \bigr). \tag{6.1}
\]
Here G(π) is a linear and monotone operator with smoothing properties, and π is
a perturbation parameter. We denote the unique solution of (6.1) by Ξu (π), see
Lemma 3.1.
Example:
In the context of an optimal control problem such as $(P_{cc}(\delta))$ on p. 7, $\delta$ plays the role of $\pi$ and we have $G(\pi) = S^\star S/\gamma$, where $S$ is the solution operator of the PDE, $S^\star$ is its adjoint, and $g(\pi) = S^\star y_d/\gamma$.
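A minimal finite-dimensional sketch of this setting, using a 1D Poisson solve discretized by finite differences (the grid, $\gamma$ and $y_d$ below are illustrative choices of our own, and the discrete adjoint is simply taken to be the transpose):

import numpy as np

n, gamma = 100, 1e-2
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)
# stiffness matrix of -y'' with homogeneous Dirichlet conditions; S = A^{-1} is
# the discrete solution operator and S^T its discrete adjoint
A = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)) / h**2
S = np.linalg.inv(A)
yd = x * (1 - x)                 # hypothetical desired state
G = S.T @ S / gamma              # G(pi) = S* S / gamma
g = S.T @ yd / gamma             # g(pi) = S* y_d / gamma
# G is symmetric positive semidefinite and hence monotone
print(np.allclose(G, G.T), np.linalg.eigvalsh(G).min() >= -1e-12)
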
One of the results of this paper (see Theorem 4.2) is the Bouligand differentiability of
the projection ΠUad between Lp spaces with a norm gap, which generalizes a previous
result in Malanowski [2003b]. The directional derivative of the projection ΠUad is
given by another projection whose upper and lower bounds are either zero or ±∞,
depending on whether the projection is active or not. This was already observed in
Theorem 0.4, see p. 9. This norm gap is responsible for the observation that the Taylor
expansion
\[
\Xi_u(\pi_0) + D\Xi_u(\pi_0;\, \pi - \pi_0) \tag{6.2}
\]
does not yield error estimates in $L^\infty$, and neither does the modification
\[
\Pi_{U_{ad}}\bigl( \Xi_u(\pi_0) + D\Xi_u(\pi_0;\, \pi - \pi_0) \bigr), \tag{6.3}
\]
see Theorem 7.1. Note that, in contrast to (6.2), the expression (6.3) produces a feasible approximation of the solution $\Xi_u(\pi)$ of the perturbed problem.
We propose in this paper an alternative update strategy, which uses an adjoint variable given by the solution of
\[
\phi = g(\pi) - G(\pi)\,\Pi_{U_{ad}}(\phi). \tag{6.4}
\]
The essential observation here is that the order of the projection and smoothing operations is reversed with respect to (6.1). Primal and adjoint variables are related by $u = \Pi_{U_{ad}}(\phi)$ and $\phi = g(\pi) - G(\pi)\,u$. We denote the unique solution of (6.4) by $\Xi_\phi(\pi)$ and propose the update formula
\[
\Pi_{U_{ad}}\bigl( \Xi_\phi(\pi_0) + D\Xi_\phi(\pi_0;\, \pi - \pi_0) \bigr). \tag{6.5}
\]
We are then able to prove L∞ error estimates for (6.5), see Theorem 7.1 of the paper.
We also show that the nominal solution Ξu (π0 ) of (6.1) as well as the derivative
DΞu (π0 ; π − π0 ) can be efficiently computed by a generalized (semismooth) Newton
method, see Bergounioux, Ito, and Kunisch [1999], Hintermüller, Ito, and Kunisch
[2002], Ulbrich [2003]. It turns out that the adjoint quantities Ξφ (π0 ) and DΞφ (π0 ; π −
π0 ) appear naturally in the Newton iterations and thus incur no additional work, see
Section 6 of the paper.
As our main application, we re-interpret these update strategies in the context of optimal control problems with control constraints (Section 8). Suppose that the optimal control $u$ and the adjoint state $p$ are related by $u = \Pi_{U_{ad}}(p/\gamma)$, as for instance in the model problem $(P_{cc}(\delta))$, see p. 7. Then $p = \gamma\,\phi$ holds, and our proposed strategy (6.5) amounts to the update formula
\[
\Pi_{U_{ad}}\bigl( \bigl[ \Xi_p(\pi_0) + D\Xi_p(\pi_0;\, \pi - \pi_0) \bigr] / \gamma \bigr) \tag{6.6}
\]
based on the adjoint state.
We also note that (6.2) and (6.3) lack the ability to accurately predict the behavior of the active sets under the change from $\pi_0$ to $\pi$. The reason is that $D\Xi_u(\pi_0; \cdot)$ is zero on the strongly active subsets. In contrast, (6.5) and (6.6) can predict such a change. We refer to Figure 6.1 below for an illustration.
The paper concludes with numerical results which confirm the theoretical findings
and show that indeed (6.5) yields much better results in recovering the solution of
perturbed problems. As we remark in Section 7 of the paper, however, the full potential
can only be revealed in nonlinear applications, where the solution of the derivative
problem is significantly less expensive than the solution of the original problem.
[Figure 6.1 consisted of four panels plotting the control $u$ and the adjoint variable $\phi$ against the x-axis, together with the bound $u_b$, the nominal quantities $u_0$ and $\phi_0$, and the respective updated quantities; only the caption is reproduced here.]
Figure 6.1. The top left figure shows the nominal or unperturbed situation, where $u_0 = \Pi_{U_{ad}}(\phi_0)$ holds. (We use the notation $u_0 = \Xi_u(\pi_0)$ and $\phi_0 = \Xi_\phi(\pi_0)$ here.) In the top right figure, $\pi_0$ has changed to $\pi$ and $u$ has been updated by (6.3). One clearly sees that the change of the active set is missed since $D\Xi_u(\pi_0; \cdot)$ is zero on the strongly active subset. The lower left figure shows $\phi$ updated by $\Xi_\phi(\pi_0) + D\Xi_\phi(\pi_0;\, \pi - \pi_0)$. Finally, the bottom right figure displays the situation where $u$ has been updated by (6.5). The change of the active set is now captured.
UPDATE STRATEGIES FOR PERTURBED NONSMOOTH
EQUATIONS
ROLAND GRIESSE, THOMAS GRUND AND DANIEL WACHSMUTH
Abstract. Nonsmooth operator equations in function spaces are considered,
which depend on perturbation parameters. The nonsmoothness arises from a
projection onto an admissible interval. Lipschitz stability in L∞ and Bouligand
differentiability in Lp of the parameter-to-solution map are derived. An adjoint
problem is introduced for which Lipschitz stability and Bouligand differentiability
in L∞ are obtained. Three different update strategies, which recover a perturbed
from an unperturbed solution, are analyzed. They are based on Taylor expansions
of the primal and adjoint variables, where the latter admits error estimates in L∞ .
Numerical results are provided.
1. Introduction
In this work we consider nonsmooth operator equations of the form
\[
u = \Pi_{[a,b]}\bigl( g(\theta) - G(\theta)u \bigr), \tag{$O_\theta$}
\]
where the unknown $u \in L^2(D)$ is defined on some bounded domain $D \subset \mathbb{R}^N$, and $\theta$ is a parameter. Moreover, $\Pi_{[a,b]}$ denotes the pointwise projection onto the set
\[
U_{ad} = \{ u \in L^2(D) : a(x) \le u(x) \le b(x) \text{ a.e. on } D \}.
\]
Such nonsmooth equations appear as a reformulation of the variational inequality
\[
\text{Find } u \in U_{ad} \text{ s.t. } \langle u + G(\theta)u - g(\theta),\, v - u \rangle \ge 0 \quad\text{for all } v \in U_{ad}. \tag{$VI_\theta$}
\]
Applications of (VI θ ) abound, and we mention in particular control-constrained optimal control problems.
Throughout, G(θ) : L2 (D) → L2+δ (D) is a bounded and monotone linear operator
with smoothing properties, such as a solution operator to a differential equation, and
g(θ) ∈ L∞ (D). Both G and g may depend nonlinearly and also in a nonsmooth way
on a parameter θ in some normed linear space Θ. Under conditions made precise in
Section 2, (Oθ ) has a unique solution u[θ] for any given θ. We are concerned here
with the behavior of u[θ] under perturbations of the parameter. In particular, we
establish the directional differentiability of the nonsmooth map u[·] with uniformly
vanishing remainder, a concept called Bouligand differentiability (B-differentiability
for short). We prove B-differentiability of u[·] : Θ → Lp (D) for p ∈ [1, ∞), which is
a sharp result and allows a Taylor expansion of u[·] around a reference parameter θ0
with error estimates in Lp (D).
Based on this Taylor expansion, we analyze three update strategies
\[
C_1(\theta) := u_0 + u'[\theta_0](\theta - \theta_0), \qquad
C_2(\theta) := \Pi_{[a,b]}\bigl( u_0 + u'[\theta_0](\theta - \theta_0) \bigr), \qquad
C_3(\theta) := \Pi_{[a,b]}\bigl( \phi_0 + \phi'[\theta_0](\theta - \theta_0) \bigr),
\]
which allow us to recover approximations of the perturbed solution $u[\theta]$ from the reference solution $u_0 = u[\theta_0]$ and derivative information. Our main result is that $(C_3)$, which involves a dual (adjoint) variable satisfying
\[
\phi = g(\theta) - G(\theta)\,\Pi_{[a,b]}(\phi),
\]
allows error estimates in $L^\infty(D)$ while the other strategies do not. We therefore advocate the use of update strategy $(C_3)$.
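The difference between the strategies can already be seen in a toy example. The sketch below uses test data of our own and takes $G = 0$ (which trivially satisfies the assumptions of Section 2, compare Remark 4.6), so that $u[\theta] = \Pi_{[a,b]}(g(\theta))$ and $\phi[\theta] = g(\theta)$ are available in closed form:

import numpy as np

n = 1000
x = np.linspace(0.0, 1.0, n)
a = 0.0                                   # lower bound only, b = +infinity

def g(theta):                             # hypothetical parameter-dependent data
    return np.sin(2 * np.pi * x) + theta

theta0, theta = 0.0, 0.3
u0, phi0 = np.maximum(g(theta0), a), g(theta0)   # u[theta0] and phi[theta0], since G = 0
dg = g(theta) - g(theta0)                 # equals g'(theta0)(theta - theta0); g is affine in theta

# B-derivative of u[.]: projection of dg onto I[a, +inf, phi0]
du = np.where(phi0 < a, 0.0, np.where(phi0 == a, np.maximum(dg, 0.0), dg))

C1 = u0 + du
C2 = np.maximum(u0 + du, a)
C3 = np.maximum(phi0 + dg, a)             # adjoint-based update (C3)
u_exact = np.maximum(g(theta), a)
for name, C in [("C1", C1), ("C2", C2), ("C3", C3)]:
    print(name, np.max(np.abs(C - u_exact)))
# C1 and C2 miss the change of the active set near the zeros of g(theta0) and show a
# maximum-norm error of the size of the perturbation; C3 is exact in this linear example.
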
As an important application, our setting accommodates linear-quadratic optimal control problems, where $u$ is the control variable, $S$ represents the control-to-state map associated with a linear elliptic or parabolic partial differential equation, and $G = S^\star S$. Then $(O_\theta)$ represents the necessary and sufficient optimality conditions. We shall elaborate on this case later on.
In the context of optimal control, B-differentiability of optimal solutions for semilinear problems has been investigated in [4, 6]. We provide here a simplified proof in
the linear case.
The outline of the paper is as follows: In Section 2, we specify the problem setting
and recall the concept of B-differentiability. In Sections 3 and 4, we prove the Lipschitz
stability of the solution map u[·] into L∞ (D) and its B-differentiability into Lp (D),
p < ∞. Section 5 is devoted to the analysis of the adjoint problem, for which we
prove B-differentiability into L∞ (D). In Section 6, we discuss the application of the
semismooth Newton method to the original problem and the problem associated with
the derivative. We analyze the three update strategies (C1 )–(C3 ) in Section 7 and
prove error estimates. In Section 8 we apply our results to the optimal control of a
linear elliptic partial differential equation and report on numerical results confirming
the superiority of the adjoint-based strategy (C3 ).
Throughout, c and L denote generic positive constants which take different values
in different locations.
2. Problem Setting
Let us specify the standing assumptions for problem (Oθ ) taken to hold throughout
the paper. We assume that D ⊂ RN is a bounded and measurable domain, N ≥ 1.
By $L^p(D)$, $1 \le p \le \infty$, we denote the Lebesgue spaces of $p$-integrable or essentially
bounded functions on D. We write hu, vi to denote the scalar product of two functions
u, v ∈ L2 (D). The norm in Lp (D) is denoted by k · kp or simply k · k in the case
p = 2. The space of bounded linear operators from Lp (D) to Lq (D) is denoted by
L(Lp (D), Lq (D)) and its norm by k · kp→q .
The lower and upper bounds a, b : D → [−∞, ∞] for the admissible set are functions
satisfying a(x) ≤ b(x) a.e. on D. We assume the existence of an admissible function
u∞ ∈ L∞ (D) ∩ Uad . Hence, the admissible set
Uad = {u ∈ L2 (D) : a(x) ≤ u(x) ≤ b(x) a.e. on D}
is nonempty, convex and closed but not necessarily bounded in L2 (D). Π[a,b] denotes
the pointwise projection of a function on D onto Uad , i.e.,
Π[a,b] u = max{a, min{u, b}}
pointwise on D. Note that Π[a,b] : Lp (D) → Lp (D) is Lipschitz continuous with
Lipschitz constant 1 for all p ∈ [1, ∞].
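On a discretized domain the projection is a componentwise clipping operation; a minimal numpy sketch (the test values below are illustrative):

import numpy as np

def project(u, a, b):
    # pointwise projection: max{a, min{u, b}}; a and b may contain -inf or +inf
    return np.maximum(a, np.minimum(u, b))

u = np.array([-2.0, 0.3, 1.7])
a = np.array([0.0, 0.0, 0.0])
b = np.array([1.0, 1.0, np.inf])
print(project(u, a, b))                   # [0.  0.3 1.7]
# the Lipschitz constant 1 is evident: |project(u) - project(v)| <= |u - v| pointwise
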
Finally, let Θ be the normed linear space of parameters with norm k · k and let
θ0 ∈ Θ be a given reference parameter. We recall two definitions:
Definition 2.1. A function f : X → Y is said to be locally Lipschitz continuous at
x0 ∈ X if there exists an open neighborhood of x0 and L > 0 such that
kf (x) − f (y)kY ≤ Lkx − ykX
holds for all x, y in the said neighborhood of x0 . In addition, f is said to be locally
Lipschitz continuous if it is locally Lipschitz continuous at all x0 ∈ X.
Definition 2.2. A function f : X → Y between normed linear spaces X and Y is
said to be B-differentiable at x0 ∈ X if there exists ε > 0 and a positively homogeneous
operator f ′ (x0 ) : X → Y such that
f (x) = f (x0 ) + f ′ (x0 )(x − x0 ) + r(x0 ; x − x0 )
holds for all x ∈ X, where the remainder satisfies kr(x0 ; x − x0 )kY /kx − x0 kX → 0 as
kx − x0 kX → 0. In addition, f is said to be B-differentiable if it is B-differentiable at
all x0 ∈ X.
The B-derivative is also called a directional Fréchet derivative, see [1]. Recall that
an operator A : X → Y is said to be positively homogeneous if A(λx) = λA(x) holds
for all λ ≥ 0 and all x ∈ X.
Let us specify the standing assumptions for the function g:
(1) g is locally Lipschitz continuous from Θ to L∞ (D)
(2) g is B-differentiable from Θ to L∞ (D).
Moreover, we assume that G : Θ → L(L2 (D), L2 (D)) satisfies the following smoothing
properties with some δ > 0:
(3) G(θ) is bounded from Lp (D) to Lp+δ (D) for all p ∈ [2, ∞) and all θ ∈ Θ
(4) G(θ) is bounded from Lp (D) to L∞ (D) for all p > p0 and all θ ∈ Θ.
In addition, we demand that G(θ) : L2 (D) → L2 (D) is monotone for all θ ∈ Θ:
hG(θ)(u − v), u − vi ≥ 0
for all u, v ∈ L2 (D),
and that
(5) G is locally Lipschitz continuous from Θ to L(L2 (D), L2 (D))
(6) G is locally Lipschitz continuous from Θ to L(L∞ (D), L∞ (D)).
Finally, we assume that
(7) G is B-differentiable from Θ to L(Lp0 +δ (D), L∞ (D)).
Remark 2.3. For control-constrained optimal control problems, G = S ⋆ S where S is
the solution operator of the differential equation involved. An example is presented in
Section 8. If assumptions (1)–(2) and (5)–(7) hold only at a specified parameter θ0
and (3)–(4) hold only in a neighborhood of θ0 , the subsequent analysis remains valid
locally.
Remark 2.4. The assumptions (1)–(7) can be changed if G does not map into L∞ (D)
but only into Ls (D) for some s ∈ (2, ∞).
(1') $g$ is locally Lipschitz continuous from $\Theta$ to $L^s(D)$,
(2') $g$ is B-differentiable from $\Theta$ to $L^s(D)$,
(3') $G(\theta)$ is bounded from $L^p(D)$ to $L^{p+\delta}(D)$ for all $p \in [2, s - \delta]$ and all $\theta \in \Theta$,
(5') $G$ is locally Lipschitz continuous from $\Theta$ to $\mathcal{L}(L^2(D), L^2(D))$,
(6') $G$ is locally Lipschitz continuous from $\Theta$ to $\mathcal{L}(L^s(D), L^s(D))$,
(7') $G$ is B-differentiable from $\Theta$ to $\mathcal{L}(L^s(D), L^s(D))$.
In this case, the results of Proposition 3.2 and Theorems 4.5 and 5.2 change accordingly. In particular, our main result Theorem 7.1 remains true if $\infty$ is replaced by $s$.
In the sequel, we will need the B-derivative of a composite function. A similar result
for a related differentiation concept can be found in [8, Prop. 3.6].
Lemma 2.5. Consider normed linear spaces $X, Y, Z$ and mappings $F : Y \to Z$, $G : X \to Y$. Assume that $G$ is B-differentiable at $\theta_0 \in X$ and that $F$ is B-differentiable at $G(\theta_0)$. Furthermore, assume that $G$ is locally Lipschitz continuous at $\theta_0$ and that $F'(G(\theta_0))$ is locally Lipschitz continuous at 0. Then the mapping $H : X \to Z$ defined by $H = F \circ G$ is B-differentiable at $\theta_0$ with the derivative
\[
H'(\theta_0) = F'(G(\theta_0)) \circ G'(\theta_0).
\]
Proof. Applying the B-differentiability of $F$ and $G$ we obtain
\[
F(G(\theta)) - F(G(\theta_0))
= F'(G(\theta_0))\bigl( G(\theta) - G(\theta_0) \bigr) + r_F
= F'(G(\theta_0))\bigl( G'(\theta_0)(\theta - \theta_0) + r_G \bigr) + r_F \tag{2.1}
\]
with the remainder terms $r_F$ and $r_G$ satisfying
\[
\frac{\|r_F\|_Z}{\|G(\theta) - G(\theta_0)\|_Y} \to 0 \text{ as } \|G(\theta) - G(\theta_0)\|_Y \to 0
\quad\text{and}\quad
\frac{\|r_G\|_Y}{\|\theta - \theta_0\|_X} \to 0 \text{ as } \|\theta - \theta_0\|_X \to 0,
\]
respectively. Now let us write
\[
F'(G(\theta_0))\bigl( G'(\theta_0)(\theta - \theta_0) + r_G \bigr)
= F'(G(\theta_0))G'(\theta_0)(\theta - \theta_0)
+ F'(G(\theta_0))\bigl( G'(\theta_0)(\theta - \theta_0) + r_G \bigr) - F'(G(\theta_0))G'(\theta_0)(\theta - \theta_0). \tag{2.2}
\]
Putting (2.1) and (2.2) together, we get an expression for the remainder term
\[
F(G(\theta)) - F(G(\theta_0)) - F'(G(\theta_0))G'(\theta_0)(\theta - \theta_0)
= r_F + F'(G(\theta_0))\bigl( G'(\theta_0)(\theta - \theta_0) + r_G \bigr) - F'(G(\theta_0))G'(\theta_0)(\theta - \theta_0). \tag{2.3}
\]
Note that $G'(\theta_0)(\theta - \theta_0)$ and $r_G$ are small in the norm of $Y$ whenever $\theta - \theta_0$ is small in the norm of $X$. Since $F'(G(\theta_0))$ is locally Lipschitz continuous at 0, we can estimate
\[
\|F(G(\theta)) - F(G(\theta_0)) - F'(G(\theta_0))G'(\theta_0)(\theta - \theta_0)\|_Z \le \|r_F\|_Z + c_{F'}\,\|r_G\|_Y.
\]
It remains to prove that the right-hand side, divided by $\|\theta - \theta_0\|_X$, vanishes as $\|\theta - \theta_0\|_X \to 0$. This is true for $\|r_G\|_Y$, so we have to investigate $\|r_F\|_Z$:
\[
\frac{\|r_F\|_Z}{\|\theta - \theta_0\|_X}
= \frac{\|r_F\|_Z}{\|G(\theta) - G(\theta_0)\|_Y}\,\frac{\|G(\theta) - G(\theta_0)\|_Y}{\|\theta - \theta_0\|_X}
\le c_G\, \frac{\|r_F\|_Z}{\|G(\theta) - G(\theta_0)\|_Y}
\]
by the local Lipschitz continuity of $G$ at $\theta_0$. For $\|\theta - \theta_0\|_X \to 0$ it follows that $\|G(\theta) - G(\theta_0)\|_Y \to 0$. Hence the right-hand side vanishes as $\|\theta - \theta_0\|_X \to 0$, and the proof is complete.
Combining local Lipschitz continuity and B-differentiability, we can prove a useful continuity result for the B-derivative.
Lemma 2.6. Consider normed linear spaces $X, Y$ and the mapping $G : X \to Y$. Let $G$ be B-differentiable and locally Lipschitz continuous at $\theta_0 \in X$. Then it holds $\|G'(\theta_0)(\theta - \theta_0)\|_Y \to 0$ as $\|\theta - \theta_0\|_X \to 0$, i.e., the B-derivative is continuous at the origin with respect to the direction.
Proof. By local Lipschitz continuity of $G$ at $\theta_0$, there exist $\epsilon > 0$ and $L > 0$ such that
\[
\|G(\theta) - G(\theta_0)\|_Y \le L\,\|\theta - \theta_0\|_X \quad \text{for all } \theta \in X \text{ with } \|\theta - \theta_0\|_X < \epsilon.
\]
Let us write
\[
G(\theta) = G(\theta_0) + G'(\theta_0)(\theta - \theta_0) + r_G
\]
with the remainder $r_G$ satisfying $\|r_G\|_Y / \|\theta - \theta_0\|_X \to 0$ as $\|\theta - \theta_0\|_X \to 0$. Then we have
\[
\|G'(\theta_0)(\theta - \theta_0)\|_Y \le L\,\|\theta - \theta_0\|_X + \|r_G\|_Y,
\]
and it follows that the right-hand side tends to zero as $\|\theta - \theta_0\|_X \to 0$.
3. Lipschitz Stability of the Solution Map
In this section we draw some simple conclusions from the assumptions made in
Section 2. We recall that our problem (Oθ ) is equivalent to the following variational
inequality:
Find u ∈ Uad s.t. hu + G(θ)u − g(θ), v − ui ≥ 0
for all v ∈ Uad .
(VI θ )
We begin by proving the Lipschitz stability of solutions u[θ] with respect to the L2 (D)
norm.
Lemma 3.1. For any given θ ∈ Θ, (Oθ ) has a unique solution u[θ] ∈ L2 (D). The
solution map u[·] is locally Lipschitz continuous from Θ to L2 (D).
Proof. Let θ ∈ Θ be given and let F (u) = u + G(θ)u − g(θ). By monotonicity of G(θ)
it follows that hF (u1 ) − F (u2 ), u1 − u2 i ≥ ku1 − u2 k2 , hence F is strongly monotone.
This implies the unique solvability of (VI θ ) and thus of (Oθ ), see, for instance, [3].
If θ′ ∈ Θ is another parameter, then we obtain from (VI θ )
hu + G(θ)u − g(θ), u′ − ui + hu′ + G(θ′ )u′ − g(θ′ ), u − u′ i ≥ 0.
Inserting the term G(θ′ )u − G(θ′ )u and using the monotonicity of G(θ′ ), we obtain
ku′ − uk2 ≤ (kG(θ) − G(θ′ )k2→2 kuk + kg(θ) − g(θ′ )k) ku′ − uk.
This proves the local Lipschitz continuity of u[·] at any given parameter θ: Suppose
that θ and θ′ are in some ball of radius ε around θ0 such that, by Assumption (5),
$\|G(\theta) - G(\theta')\|_{2\to 2} \le L\,\|\theta - \theta'\|$. If we set $u_0 = u[\theta_0]$, then $\|u - u_0\| \le L\,\|\theta - \theta_0\|\,\|u_0\| \le \varepsilon L\,\|u_0\|$ and thus $\|u\| \le \varepsilon L\,\|u_0\| + \|u_0\|$. Hence $\|u' - u\| \le L\,\|\theta - \theta'\|\,(1 + \varepsilon L)\,\|u_0\|$. By exploiting the smoothing properties of $G(\theta)$, this result can be strengthened:
Proposition 3.2. The solution map u[·] is locally Lipschitz continuous from Θ to
L∞ (D).
Proof. We use a bootstrapping argument to show that the solution u[θ] lies in L∞ (D).
The fact that g(θ) ∈ L∞ (D) and the smoothing property (3) of G(θ) yield g(θ) −
G(θ)u[θ] ∈ L2+δ (D). By the properties of the projection, it follows from (Oθ ) that
u[θ] ∈ L2+δ (D). Repeating this argument until 2 + nδ > p0 , we find u[θ] ∈ L∞ (D) by
Assumption (4).
We prove without loss of generality the local Lipschitz continuity of u[·] at the
reference parameter θ0 . Let θ and θ′ be any two parameters in a ball of radius ε
around θ0 such that kG(θ) − G(θ′ )k∞→∞ ≤ Lkθ − θ′ k and kg(θ) − g(θ′ )k∞ ≤ Lkθ − θ′ k
hold. Using the Lipschitz continuity of the projection, we obtain
ku − u′ k2+δ ≤ kg(θ) − g(θ′ )k2+δ + kG(θ)u − G(θ′ )u′ k2+δ
≤ c kg(θ) − g(θ′ )k∞ + kG(θ)(u − u′ )k2+δ + k(G(θ) − G(θ′ ))u′ k2+δ
≤ c L kθ − θ′ k + c ku − u′ k + c L kθ − θ′ kku′ k∞
for some c > 0 and hence the local Lipschitz stability for u[·] in L2+δ (D) follows.
Repeating this argument until 2 + nδ > p0 , we obtain the local Lipschitz stability for
u[·] in L∞ (D).
4. B-Differentiability of the Solution Map
In this section we study the differentiability properties of the solution map u[·],
which depend on the properties of the projection. We extend the results of [5]. Let us
define a set $I[a, b, u_0]$ by
\[
I[a, b, u_0] = \Bigl\{ u \in L^2(D) :\;
u(x) = 0 \text{ where } u_0(x) \notin [a(x), b(x)],\;
u(x) = 0 \text{ if } u_0(x) = a(x) = b(x),\;
u(x) \ge 0 \text{ where } u_0(x) = a(x),\;
u(x) \le 0 \text{ where } u_0(x) = b(x) \Bigr\}.
\]
The pointwise projection onto this set is denoted by $\Pi_{I[a,b,u_0]}$. By construction it holds for $u_0, u, a, b \in L^2(D)$, $a \le b$,
\[
\Pi_{I[a,b,u_0]}(u) = -\Pi_{I[-b,-a,-u_0]}(-u), \qquad
\Pi_{I[a,+\infty,u_0]}(u) = \Pi_{I[0,+\infty,u_0-a]}(u), \qquad
\Pi_{I[a,b,u_0]}(u) = \Pi_{I[a,+\infty,u_0]}\bigl( \Pi_{I[-\infty,b,u_0]}(u) \bigr). \tag{4.1}
\]
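Since $I[a, b, u_0]$ is defined by pointwise sign conditions, the projection $\Pi_{I[a,b,u_0]}$ also acts pointwise; the following sketch spells out the case distinction on a discretized domain (the test arrays are of our own choosing):

import numpy as np

def project_I(v, a, b, u0):
    # pointwise projection onto I[a, b, u0] as defined above
    out = v.copy()
    zero  = (u0 < a) | (u0 > b) | (a == b)   # u0 infeasible, or a = b: component forced to 0
    lower = (u0 == a) & ~zero                # lower bound touched: project onto [0, +inf)
    upper = (u0 == b) & ~zero                # upper bound touched: project onto (-inf, 0]
    out[zero] = 0.0
    out[lower] = np.maximum(v[lower], 0.0)
    out[upper] = np.minimum(v[upper], 0.0)
    return out                               # components with a < u0 < b remain unchanged

u0 = np.array([0.0, 0.5, 1.0, 1.5])
a, b = np.zeros(4), np.ones(4)
print(project_I(np.array([-1.0, -1.0, -1.0, -1.0]), a, b, u0))   # [ 0. -1. -1.  0.]
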
It turns out that ΠI[a,b,u0 ] is the B-derivative of the projection onto the admissible
set Π[a,b] . We start with the proof of B-differentiability of the projection on the cone
of non-negative functions.
Theorem 4.1. The projection $\Pi_{[0,+\infty]}$ is B-differentiable from $L^p(D)$ to $L^q(D)$ for $1 \le q < p \le \infty$, and it holds
\[
\Pi_{[0,+\infty]}(u) = \Pi_{[0,+\infty]}(u_0) + \Pi_{I[0,+\infty,u_0]}(u - u_0) + r_1, \tag{4.2a}
\]
where
\[
\frac{\|r_1\|_q}{\|u - u_0\|_p} \to 0 \quad\text{as } \|u - u_0\|_p \to 0. \tag{4.2b}
\]
Remark 4.2. The claim for the case p = ∞ was proven in [5]. A counterexample
was given there, which shows that the projection is not B-differentiable from L∞ (D)
to L∞ (D).
Proof of Theorem 4.1. Clearly, the function $\Pi_{I[0,+\infty,u_0]}$ is positively homogeneous. Let us define the function $r$ as the remainder term
\[
r = \Pi_{[0,+\infty]}(u) - \Pi_{[0,+\infty]}(u_0) - \Pi_{I[0,+\infty,u_0]}(u - u_0). \tag{4.3}
\]
A short calculation shows that
\[
r(x) = \begin{cases} |u(x)| & \text{if } u(x)\,u_0(x) < 0, \\ 0 & \text{otherwise} \end{cases} \tag{4.4}
\]
holds, see also the discussion in [5]. It implies the estimate $r(x) \le |u(x) - u_0(x)|$. Now suppose that $1 \le q < p \le \infty$. It remains to prove
\[
\frac{\|r\|_q}{\|u - u_0\|_p} \to 0 \quad\text{as } \|u - u_0\|_p \to 0. \tag{4.5}
\]
We will argue by contradiction. Assume that (4.5) does not hold. Then there exists $\epsilon > 0$ such that for all $\delta > 0$ there is a function $u_\delta$ with $\|u_\delta - u_0\|_p < \delta$ satisfying
\[
\frac{\|r_\delta\|_q}{\|u_\delta - u_0\|_p} \ge \epsilon. \tag{4.6}
\]
Here, $r_\delta$ is the remainder term defined as in (4.3). Let us choose a sequence $\{\delta_k\}$ with $\lim_{k\to\infty} \delta_k = 0$, $u_k = u_{\delta_k}$, and $r_k := r_{\delta_k}$. By Egoroff's Theorem, for each $\sigma > 0$ there exists a set $D_\sigma \subset D$ with $\mathrm{meas}(D \setminus D_\sigma) < \sigma$ such that the convergence $u_k \to u_0$ is uniform on $D_\sigma$. It allows us to estimate
\[
\|r_k\|_q
\le \Bigl( \int_{D \setminus D_\sigma} |u_k(x) - u_0(x)|^q\, dx \Bigr)^{1/q} + \Bigl( \int_{D_\sigma} |r_k(x)|^q\, dx \Bigr)^{1/q}
\le \sigma^{\frac1q - \frac1p}\, \|u_k - u_0\|_p + \Bigl( \int_{D_\sigma} |r_k(x)|^q\, dx \Bigr)^{1/q}.
\]
Here, the second addend needs more investigation. Let us define a subset $D_{\sigma,k}$ of $D_\sigma$ by
\[
D_{\sigma,k} = \Bigl\{ x \in D_\sigma : 0 < |u_0(x)| < \sup_{x' \in D_\sigma} |u_k(x') - u_0(x')| \Bigr\}.
\]
Then by construction it holds $r_k(x) = 0$ on $D_\sigma \setminus D_{\sigma,k}$, compare (4.4). Observe that $\mathrm{meas}(D_{\sigma,k}) \to 0$ as $k \to \infty$ due to the uniform convergence of $u_k$ to $u_0$ on $D_\sigma$. And we can proceed with
\[
\|r_k\|_q
\le \sigma^{\frac1q - \frac1p}\, \|u_k - u_0\|_p + \Bigl( \int_{D_{\sigma,k}} |r_k(x)|^q\, dx \Bigr)^{1/q}
\le \sigma^{\frac1q - \frac1p}\, \|u_k - u_0\|_p + \mathrm{meas}(D_{\sigma,k})^{\frac1q - \frac1p}\, \|u_k - u_0\|_p.
\]
Since $1/q - 1/p > 0$, the right-hand side can be made smaller than $\epsilon\, \|u_k - u_0\|_p$ by first choosing $\sigma$ small and then $k$ large, which is a contradiction to (4.6).
Now, we calculate the B-derivative of $\Pi_{[a,b]}$ using the chain rule developed in Lemma 2.5.
Theorem 4.3. The projection $\Pi_{[a,b]}$ is B-differentiable from $L^p(D)$ to $L^q(D)$ for $1 \le q < p \le \infty$, and it holds
\[
\Pi_{[a,b]}(u) = \Pi_{[a,b]}(u_0) + \Pi_{I[a,b,u_0]}(u - u_0) + r_1, \tag{4.7a}
\]
where
\[
\frac{\|r_1\|_q}{\|u - u_0\|_p} \to 0 \quad\text{as } \|u - u_0\|_p \to 0. \tag{4.7b}
\]
Proof. The projection $\Pi_{[a,b]}$ can be written as a composition of two projections onto the set of non-negative functions as
\[
\Pi_{[a,b]}(u) = \Pi_{[0,+\infty]}\bigl( b - \Pi_{[0,+\infty]}(b - u) - a \bigr) + a.
\]
The projection $\Pi_{[0,+\infty]}$ and its B-derivative $\Pi_{I[0,+\infty,u_0]}$ are Lipschitz continuous. Thus, the B-differentiability of $\Pi_{[a,b]}$ follows by Lemma 2.5. The chain rule yields the derivative
\[
\Pi'_{[a,b]}(u_0)(u - u_0)
= \Pi_{I[0,+\infty,\; b - \Pi_{[0,+\infty]}(b-u_0) - a]}\bigl( -\Pi_{I[0,+\infty,\; b-u_0]}(-(u - u_0)) \bigr)
= \Pi_{I[0,+\infty,\; b - \Pi_{[0,+\infty]}(b-u_0) - a]}\bigl( \Pi_{I[-\infty,b,\;u_0]}(u - u_0) \bigr)
= \Pi_{I[a,+\infty,\; \Pi_{[-\infty,b]}(u_0)]}\bigl( \Pi_{I[-\infty,b,\;u_0]}(u - u_0) \bigr).
\]
Here, we used the properties (4.1) of the projection ΠI . It remains to prove that the
right-hand side is equal to ΠI[a,b,u0 ] (u − u0 ). To this end, let us introduce the following
disjoint subsets of D:
D1 := {x ∈ D : u0 (x) ≤ b(x)},
D2 := {x ∈ D : b(x) < u0 (x)}.
Let us denote by χDi the characteristic function of the set Di . The projection ΠI is
additive with respect to functions with disjoint support, i.e.
ΠI[a,b,u0 ] (v) = ΠI[a,b,u0 ] (χD1 v) + ΠI[a,b,u0 ] (χD2 v)
holds for all a, b, u0 , v. Since Π′[a,b] (u0 )(u − u0 ) is a composition of such projections,
we can split
Π′[a,b] (u0 )(u − u0 ) = Π′[a,b] (u0 )(χD1 (u − u0 )) + Π′[a,b] (u0 )(χD2 (u − u0 )).
Furthermore, it holds $\Pi_{I[a,b,u_0]}(\chi_{D_i} v) = \Pi_{I[a,b,\chi_{D_i} u_0]}(\chi_{D_i} v)$. At first, we have $\chi_{D_1}\,\Pi_{[-\infty,b]}(\chi_{D_1} u_0) = \chi_{D_1} u_0$ and therefore
\[
\Pi'_{[a,b]}(u_0)(\chi_{D_1}(u - u_0))
= \Pi_{I[a,+\infty,\; \Pi_{[-\infty,b]}(u_0)]}\bigl( \Pi_{I[-\infty,b,\;u_0]}(\chi_{D_1}(u - u_0)) \bigr)
= \Pi_{I[a,+\infty,\;u_0]}\bigl( \Pi_{I[-\infty,b,\;u_0]}(\chi_{D_1}(u - u_0)) \bigr)
= \Pi_{I[a,b,\;u_0]}(\chi_{D_1}(u - u_0)).
\]
The last equality follows from the third property of ΠI in (4.1).
For the second set D2 , we have
ΠI[−∞,b,u0 ] (χD2 (u − u0 )) = 0,
since u0 (x) is not admissible for x ∈ D2 . For the same reason, we get also
ΠI[a,b,u0 ] (χD2 (u − u0 )) = 0,
which gives
Π′[a,b] (u0 )(χD2 (u − u0 )) = 0 = ΠI[a,b,u0 ] (χD2 (u − u0 )).
Consequently, we obtain
Π′[a,b] (u0 )(u − u0 ) = Π′[a,b] (u0 )(χD1 (u − u0 )) + Π′[a,b] (u0 )(χD2 (u − u0 ))
= ΠI[a,b,u0 ] (χD1 (u − u0 )) + ΠI[a,b,u0 ] (χD2 (u − u0 ))
= ΠI[a,b,u0 ] (u − u0 ),
and the claim is proven.
Let us remark that the result of the last two theorems is sharp with respect to the choice of function spaces:
Remark 4.4. The projection is not B-differentiable from $L^p(D)$ to $L^p(D)$ for any $p$, as the following example shows. Take $a = 0$, $b = +\infty$, $D = (0, 1)$. We choose $u_0(x) = -1$ and
\[
u_k(x) = \begin{cases} 1 & \text{if } x \in (0, 1/k), \\ -1 & \text{otherwise.} \end{cases}
\]
In this case, the remainder term given by (4.4) is $r_{1,k} = (u_k - u_0)/2$. Therefore it holds
\[
\frac{\|r_{1,k}\|_p}{\|u_k - u_0\|_p} = \frac{1}{2} \not\to 0 \quad\text{for } k \to \infty.
\]
As a side result of the previous theorem, however, we get for $\alpha \in (-\infty, 1)$
\[
\frac{\|r_{1,k}\|_p}{\|u_k - u_0\|_p^\alpha} \to 0 \quad\text{for } k \to \infty.
\]
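A quick numerical check of this counterexample (a direct implementation of the remainder formula (4.4); the values of $p$ and $k$ are arbitrary):

import numpy as np

p, n = 2, 10**6
x = np.linspace(0, 1, n, endpoint=False) + 0.5 / n    # midpoint grid on D = (0, 1)
u0 = -np.ones(n)
norm = lambda v: np.mean(np.abs(v) ** p) ** (1.0 / p)
for k in [10, 100, 1000]:
    uk = np.where(x < 1.0 / k, 1.0, -1.0)
    r = np.where(uk * u0 < 0, np.abs(uk), 0.0)        # remainder according to (4.4)
    print(k, norm(r) / norm(uk - u0))                 # prints 0.5 for every k
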
We are now in the position to prove B-differentiability of the solution mapping u[θ]
of our non-smooth equation (Oθ ).
Theorem 4.5. The solution mapping u[θ] of problem (Oθ ) is B-differentiable from Θ
to Lp (D), 2 ≤ p < ∞. The Bouligand derivative of u[·] at θ0 in direction θ, henceforth
called u′ [θ0 ]θ, is the unique solution of the non-smooth equation
\[
u = \Pi_{I[a,b,\phi_0]}\bigl( g'(\theta_0)\theta - G(\theta_0)u - (G'(\theta_0)\theta)\,u_0 \bigr) \tag{$O'_{\theta_0;\theta}$}
\]
where $u_0 = u[\theta_0]$ and $\phi_0 = g(\theta_0) - G(\theta_0)u_0$.
Proof. The problem (Oθ′ 0 ;θ ) is equivalent to finding a solution u ∈ I[a, b, φ0 ] of the
variational inequality
hu + G(θ0 )u + (G′ (θ0 )θ)u0 − g ′ (θ0 )θ, v − ui ≥ 0
∀v ∈ I[a, b, φ0 ].
By monotonicity of G(θ0 ) this variational inequality is uniquely solvable, compare
Lemma 3.1. Moreover, the projection ΠI[a,b,φ0 ] is positively homogeneous. So the
mapping θ 7→ u′ [θ0 ]θ is positively homogeneous as well.
Now, let us take θ1 ∈ Θ and u1 := u[θ1 ]. Let p ∈ [2, ∞) be fixed. Further, let ud
be the solution of (Oθ′ 0 ;θ ) for θ = θ1 − θ0 , i.e.
ud = ΠI[a,b,φ0 ] (g ′ (θ0 )(θ1 − θ0 ) − G(θ0 )ud − G′ (θ0 )(θ1 − θ0 )u0 ).
(4.8)
Let us investigate the difference u1 − u0 . We obtain by B-differentiability of the
projection from Lp+δ (D) to Lp (D)
u1 − u0 = Π[a,b] (g(θ1 ) − G(θ1 )u1 ) − Π[a,b] (g(θ0 ) − G(θ0 )u0 )
= ΠI[a,b,g(θ0 )−G(θ0 )u0 ] (g(θ1 ) − G(θ1 )u1 − g(θ0 ) + G(θ0 )u0 ) + r1
(4.9)
= ΠI[a,b,φ0 ] (g(θ1 ) − G(θ1 )u1 − g(θ0 ) + G(θ0 )u0 ) + r1 .
The remainder term $r_1$ satisfies
\[
\frac{\|r_1\|_p}{\|g(\theta_1) - G(\theta_1)u_1 - g(\theta_0) + G(\theta_0)u_0\|_{p+\delta}} \to 0
\]
as $\|g(\theta_1) - G(\theta_1)u_1 - g(\theta_0) + G(\theta_0)u_0\|_{p+\delta} \to 0$. Applying the Lipschitz continuity of $u[\cdot]$, $G$, and $g$, we get
\[
\|g(\theta_1) - G(\theta_1)u_1 - g(\theta_0) + G(\theta_0)u_0\|_{p+\delta} \le c\,\bigl( \|\theta_1 - \theta_0\| + \|u_1 - u_0\|_p \bigr) \le c\,\|\theta_1 - \theta_0\|.
\]
Hence, we find for the remainder term
\[
\frac{\|r_1\|_p}{\|\theta_1 - \theta_0\|} \to 0 \quad\text{as } \|\theta_1 - \theta_0\| \to 0. \tag{4.10}
\]
Let us rewrite (4.9) as
u1 − u0 − r1 = ΠI[a,b,φ0 ] g(θ1 ) − g(θ0 ) − G(θ0 )(u1 − u0 ) − (G(θ1 ) − G(θ0 ))u1
= ΠI[a,b,φ0 ] g ′ (θ0 )(θ1 − θ0 ) + r1g − G(θ0 )(u1 − u0 )
− (G′ (θ0 )(θ1 − θ0 ) + r1G )u1
= ΠI[a,b,φ0 ] g ′ (θ0 )(θ1 − θ0 ) − G(θ0 )(u1 − u0 − r1 )
− G′ (θ0 )(θ1 − θ0 )u1 + r1g + r1G u1 − G(θ0 )r1
= ΠI[a,b,φ0 ] g ′ (θ0 )(θ1 − θ0 ) − G(θ0 )(u1 − u0 − r1 )
− G′ (θ0 )(θ1 − θ0 )u1 + r1∗
with a remainder term $r_1^* = r_1^g + r_1^G u_1 - G(\theta_0) r_1$ satisfying
\[
\frac{\|r_1^*\|_p}{\|\theta_1 - \theta_0\|} \to 0 \quad\text{as } \|\theta_1 - \theta_0\| \to 0. \tag{4.11}
\]
We can interpret $u_r := u_1 - u_0 - r_1$ as the solution of the non-smooth equation
\[
u_r = \Pi_{I[a,b,\phi_0]}\bigl( g'(\theta_0)(\theta_1 - \theta_0) - G(\theta_0)u_r - G'(\theta_0)(\theta_1 - \theta_0)u_1 + r_1^* \bigr),
\]
which is similar to (4.8) but perturbed by $-G'(\theta_0)(\theta_1 - \theta_0)(u_1 - u_0) + r_1^*$. Analogously to Section 3, it can be shown that the solution mapping of that equation is Lipschitz continuous in the data, i.e., the map $L^p(D) \ni r \mapsto u \in L^p(D)$, where $u = \Pi_{I[a,b,\phi_0]}(-G(\theta_0)u + r)$, is Lipschitz continuous.
So we can estimate
\[
\|u_1 - u_0 - r_1 - u_d\|_p = \|u_r - u_d\|_p
\le c\,\|G'(\theta_0)(\theta_1 - \theta_0)(u_1 - u_0)\|_p + c\,\|r_1^*\|_p
\le c\,\|G'(\theta_0)(\theta_1 - \theta_0)(u_1 - u_0)\|_\infty + c\,\|r_1^*\|_p. \tag{4.12}
\]
Using the assumptions on $G$, we obtain by Lemma 2.6
\[
\|G'(\theta_0)(\theta_1 - \theta_0)\|_{\infty\to\infty} \to 0 \quad\text{as } \|\theta_1 - \theta_0\| \to 0.
\]
The mapping $\theta \mapsto u[\theta]$ is locally Lipschitz continuous from $\Theta$ to $L^\infty(D)$, see Proposition 3.2. Both properties imply
\[
\frac{\|G'(\theta_0)(\theta_1 - \theta_0)(u_1 - u_0)\|_\infty}{\|\theta_1 - \theta_0\|} \to 0 \quad\text{as } \|\theta_1 - \theta_0\| \to 0. \tag{4.13}
\]
Combining (4.11)–(4.13) yields in turn
\[
\frac{\|u_1 - u_0 - r_1 - u_d\|_p}{\|\theta_1 - \theta_0\|} \to 0 \quad\text{as } \|\theta_1 - \theta_0\| \to 0. \tag{4.14}
\]
Finally, we have
\[
\|u_1 - (u_0 + u_d)\|_p \le \|u_1 - u_0 - r_1 - u_d\|_p + \|r_1\|_p
\]
and consequently by (4.10) and (4.14)
\[
\frac{\|u_1 - (u_0 + u_d)\|_p}{\|\theta_1 - \theta_0\|} \to 0 \quad\text{as } \|\theta_1 - \theta_0\| \to 0. \tag{4.15}
\]
Hence, $u_d$ is the Bouligand derivative of $u[\cdot]$ at $\theta_0$ in the direction $\theta_1 - \theta_0$.
Remark 4.6. This result cannot be strengthened: the map $u[\theta]$ cannot be B-differentiable from $\Theta$ to $L^\infty(D)$. To see this, consider the case $G = 0$. It trivially fulfills all requirements of Section 2. Then $u[\theta] = \Pi_{[a,b]}(g(\theta))$ holds, but the projection $\Pi_{[a,b]}$ is not B-differentiable from $L^\infty(D)$ to $L^\infty(D)$, see Remark 4.4.
Lemma 4.7. The B-derivative $u'[\theta_0]$ satisfies for all $\alpha \in (-\infty, 1)$
\[
\frac{\|u[\theta_0] + u'[\theta_0](\theta_1 - \theta_0) - u[\theta_1]\|_\infty}{\|\theta_1 - \theta_0\|^\alpha} \to 0 \quad\text{as } \|\theta_1 - \theta_0\| \to 0.
\]
Proof. Here, we will follow the steps of the proof of the previous theorem. Let $\alpha$ be less than 1. The limiting factors in the proof are the remainder terms $r_1$ and $r_1^*$. Due to Remark 4.4, we obtain for $r_1$ and $r_1^*$ the property
\[
\frac{\|r_1\|_\infty}{\|\theta_1 - \theta_0\|^\alpha} \to 0 \quad\text{and}\quad \frac{\|r_1^*\|_\infty}{\|\theta_1 - \theta_0\|^\alpha} \to 0 \quad\text{as } \|\theta_1 - \theta_0\| \to 0.
\]
Combining these with estimates (4.12)–(4.15) completes the proof.
5. Properties of the Adjoint Problem
In this section we investigate an adjoint problem defined by
φ = g(θ) − G(θ)Π[a,b] (φ).
(Dθ )
If we interpret (Oθ ) as an optimal control problem with control constraints, see Section 8, then problem (Dθ ) is an equation for the adjoint state. The primal and adjoint
formulations are closely connected: If u[θ] is the unique solution of (Oθ ) then
φ := g(θ) − G(θ)u[θ]
(5.1)
is a solution of (Dθ ), which means that (Dθ ) admits at least one solution. And if φ is
a solution of the dual (adjoint) equation (Dθ ) then the projection u = Π[a,b] (φ[θ]) is
the unique solution of the original problem (Oθ ).
Now, let us briefly answer the question of uniqueness of adjoint solutions. If φ1 and
φ2 are two solutions of (Dθ ), then both Π[a,b] (φ1 ) and Π[a,b] (φ2 ) are solutions of (Oθ ).
By Lemma 3.1 this problem has a unique solution, hence Π[a,b] (φ1 ) = Π[a,b] (φ2 ). For
the difference φ1 − φ2 we have
φ1 − φ2 = g(θ) − G(θ)Π[a,b] (φ1 ) − g(θ) − G(θ)Π[a,b] (φ2 )
= −G(θ)(Π[a,b] (φ1 ) − Π[a,b] (φ2 )) = 0,
which implies in fact the unique solvability of (Dθ ). In the following, we denote this
unique solution by φ[θ]. An immediate conclusion of the considerations in Section 3
is the Lipschitz property of φ[·].
Corollary 5.1. The mapping φ[θ] is locally Lipschitz from Θ to L∞ (D).
Thus, we found that φ[·] inherits Lipschitz continuity from u[·]. However, in contrast
to the primal map u[·], the adjoint map φ[·] is B-differentiable into L∞ (D). The
property which allows us to prove this result is that in (Dθ ), the smoothing operator
G(θ) is applied after the projection Π[a,b] .
Theorem 5.2. The mapping $\phi[\theta]$ is B-differentiable from $\Theta$ to $L^\infty(D)$. The B-derivative of $\phi[\cdot]$ at $\theta_0$ in direction $\theta$, henceforth called $\phi'[\theta_0]\theta$, is the solution of the non-smooth equation
\[
\phi = g'(\theta_0)\theta - G(\theta_0)\,\Pi_{I[a,b,\phi_0]}(\phi) - (G'(\theta_0)\theta)\,\Pi_{[a,b]}(\phi_0), \tag{5.2}
\]
where $\phi_0 = \phi[\theta_0] = g(\theta_0) - G(\theta_0)u[\theta_0]$.
Proof. Due to the linearity of G, the B-derivative of H(θ) := G(θ)u[θ] at θ0 , in the
direction of θ, can be written as
H ′ (θ0 )θ = G(θ0 )u′ [θ0 ]θ + (G′ (θ0 )θ)u0 ,
where u0 = u[θ0 ]. By Theorem 4.5, u[·] is B-differentiable from Θ to Lp0 +δ (D).
Together with the B-differentiability of G(·) from Θ to L(Lp0 +δ (D), L∞ (D)), the relationship φ[θ] = g(θ) − G(θ)u[θ] implies B-differentiability of φ[·] from Θ to L∞ (D).
The formula (5.2) is obtained by differentiating equation (Dθ ).
We now discuss the use of the derivative of φ[θ] to obtain an update rule for the
primal variable u[θ]. Suppose that u0 = u[θ0 ] and φ0 = φ[θ0 ] are the solutions of
the primal and dual problems at the reference parameter θ0 . We use the following
construction as a first-order approximation of u[θ]:
\[
\tilde u[\theta_0, \theta - \theta_0] := C_3(\theta) = \Pi_{[a,b]}\bigl( \phi_0 + \phi'[\theta_0](\theta - \theta_0) \bigr). \tag{5.3}
\]
We can prove that the L∞ -norm of the remainder u[θ] − ũ[θ0 , θ − θ0 ], divided by
kθ − θ0 k, vanishes as θ → θ0 . This is a stronger result than can be obtained using
merely the B-differentiability. There, the remainder u[θ]−u[θ0 ]−u′ [θ0 ](θ−θ0 ), divided
by kθ − θ0 k, vanishes only in weaker Lp -norms. We refer to Section 7 for a comparison
of this advanced update rule with the conventional rules (C1 ) and (C2 ).
Corollary 5.3. Let $\tilde u[\theta_0, \theta - \theta_0]$ be given by (5.3). Then
\[
\frac{\|u[\theta] - \tilde u[\theta_0, \theta - \theta_0]\|_\infty}{\|\theta - \theta_0\|} \to 0 \quad\text{as } \theta \to \theta_0.
\]
Proof. By construction, we have
\[
u[\theta] - \tilde u[\theta_0, \theta - \theta_0] = \Pi_{[a,b]}(\phi[\theta]) - \Pi_{[a,b]}\bigl( \phi[\theta_0] + \phi'[\theta_0](\theta - \theta_0) \bigr).
\]
The projection is Lipschitz continuous from $L^\infty(D)$ to $L^\infty(D)$, hence we can estimate
\[
\|u[\theta] - \tilde u[\theta_0, \theta - \theta_0]\|_\infty \le \|\phi[\theta] - \phi[\theta_0] - \phi'[\theta_0](\theta - \theta_0)\|_\infty.
\]
We already know by Theorem 5.2 that $\phi[\theta]$ is B-differentiable at $\theta_0$ from $\Theta$ to $L^\infty(D)$. Thus it holds
\[
\frac{\|\phi[\theta] - \phi[\theta_0] - \phi'[\theta_0](\theta - \theta_0)\|_\infty}{\|\theta - \theta_0\|} \to 0 \quad\text{as } \|\theta - \theta_0\| \to 0.
\]
Consequently, we get the same behavior for the remainder $u[\theta] - \tilde u[\theta_0, \theta - \theta_0]$, which proves the claim.
In the next section we discuss how the quantities u[θ0], φ[θ0] and the required directional derivatives of these quantities can be computed. It turns out that the derivative φ′[θ0](θ − θ0) is available at no additional cost when evaluating u′[θ0](θ − θ0), so the new update rule (C3) is no more expensive to evaluate than the primal-based rules.
On the other hand, it is also easily possible to obtain φ′[θ0](θ − θ0) a posteriori from u′[θ0](θ − θ0). Once u′[θ0](θ − θ0) is known, φ′[θ0](θ − θ0) can be computed from
  φ′[θ0](θ − θ0) = g′(θ0)(θ − θ0) − G(θ0)u′[θ0](θ − θ0) − (G′(θ0)(θ − θ0))u0.
Hence the a posteriori computation of φ′ involves only the application of G and G′
and it is not necessary to solve any additional non-smooth equations. For optimal
control problems the quantity φ′ [θ0 ](θ − θ0 ) is closely related to the adjoint state of
the problem belonging to u′ [θ0 ](θ − θ0 ).
6. Computation of the Solution and its Derivative
In this section we address the question of how to solve problem (Oθ) for the nominal parameter θ0 and the derivative problem (O′θ0;θ) algorithmically. In the recent past,
generalized Newton methods in function spaces have been developed [2, 10], where a
generalized set-valued derivative plays the role of the Fréchet derivative in the classical
Newton method. The semismooth Newton concept can be applied here, in view of the
smoothing properties of the operator G(θ0 ).
Let us consider the following nonsmooth equation:
F (u) := −u + g(θ0 ) − G(θ0 )u − max{0, g(θ0 ) − G(θ0 )u − b}
− min{0, g(θ0 ) − G(θ0 )u − a} = 0. (6.1)
It is easy to check that (6.1) holds if and only if u solves (Oθ ) at θ0 .
Following [2], we infer that F is Newton differentiable as a map from Lp(D) to Lp(D) for any p ∈ [2, ∞]. The usual norm gap in the min and max functions is
compensated by the smoothing properties of G(θ0 ). The generalized derivative of F
is set-valued, and we take
F ′ (u) δu = −G(θ0 ) δu − δu + χA+ (u) G(θ0 ) δu + χA− (u) G(θ0 ) δu
as a particular choice. Here,
  A+(u) = {x ∈ D : g(θ0) − G(θ0)u − b ≥ 0},   A−(u) = {x ∈ D : g(θ0) − G(θ0)u − a ≤ 0},
  A(u) = A+(u) ∪ A−(u),   I(u) = D \ A(u)
are the so-called active and inactive sets, and χA is the characteristic function of a
measurable set A. A generalized Newton step F ′ (u) δu = −F (u) can be computed
by splitting the unknown δu into its parts supported on the active and inactive sets.
Then a simple calculation shows that
on A+ (u) : δu|A+ (u) = b − u
on A− (u) : δu|A− (u) = a − u
on I(u) : (G(θ0 ) + I) δu|I(u) = g(θ0 ) − G(θ0 )u − u − G(θ0 ) δu|A(u) .
Lemma 6.1. For given u ∈ Lp (D) where 2 ≤ p ≤ ∞, the generalized Newton step
F ′ (u) δu = −F (u) has a unique solution δu ∈ Lp (D).
Proof. We only need to verify that the step on the inactive set I(u) is indeed uniquely
solvable. This follows from the strong monotonicity of G(θ0 ) + I, considered as an
operator from L2 (I(u)) to itself, compare the proof of Lemma 3.1. Hence the unique
solution has an a priori regularity δu ∈ L2 (D). The terms of lowest regularity on the
right hand sides are the terms −u. Hence δu inherits the Lp(D) regularity of u. Note that if b or a is equal to ±∞ on a subset of D, this subset cannot intersect A+(u) or A−(u), and thus the update δu lies in L∞(D), provided that u ∈ L∞(D), even if the bounds take on infinite values.
By the previous lemma, the generalized Newton iteration is well-defined. For a
convergence analysis, we refer to [2, 10]. For completeness, we state the semismooth
Newton method for problem (Oθ) below (Algorithm 1).

Algorithm 1 Semismooth Newton algorithm to compute u0 and φ0.
1: Choose u0 ∈ L∞(D) and set n := 0
2: Set φn := g(θ0) − G(θ0)un
3: Set rn := F(un) = φn − un − max{0, φn − b} − min{0, φn − a}
4: while ‖rn‖∞ > tol do
5:   Set δu|A+(un) := b − un on A+(un)
6:   Set δu|A−(un) := a − un on A−(un)
7:   Solve (G(θ0) + I) δu|I(un) = φn − un − G(θ0) δu|A(un) on I(un)
8:   Set un+1 := un + δu
9:   Set φn+1 := g(θ0) − G(θ0)un+1
10:  Set rn+1 := F(un+1) = φn+1 − un+1 − max{0, φn+1 − b} − min{0, φn+1 − a}
11:  Set n := n + 1
12: end while
13: Set u0 := un and φ0 := φn

Note that the dual variable
φ0 appears naturally as an auxiliary quantity in the iteration, so it is available at no
extra cost. With minor modifications, the same routine solves the derivative problems
(O′θ0;θ) for u′[θ0](θ) and (5.2) for φ′[θ0](θ) simultaneously. Similarly as before, we
consider the nonsmooth equation
  F̂(û) := −û + g′(θ0)θ − G(θ0)û − (G′(θ0)θ)u0
           − max{0, g′(θ0)θ − G(θ0)û − (G′(θ0)θ)u0 − b̂}
           − min{0, g′(θ0)θ − G(θ0)û − (G′(θ0)θ)u0 − â} = 0.   (6.2)
Hats indicate variables that are associated with derivatives. The new bounds â and b̂ depend on the solution and adjoint solution u0 and φ0 of the reference problem, through the definition of I[a, b, φ0] in Section 4:
  â = 0 where u0 = a or φ0 ∉ [a, b],   â = −∞ elsewhere;
  b̂ = 0 where u0 = b or φ0 ∉ [a, b],   b̂ = +∞ elsewhere.   (6.3)
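To make (6.3) concrete, the following is a minimal computational sketch in Python. It assumes that u0 and φ0 are available as arrays of nodal values on a common grid and that the equalities u0 = a, u0 = b are tested up to a small tolerance; the function and variable names are illustrative only and not part of the paper.

```python
import numpy as np

def derivative_bounds(u0, phi0, a, b, tol=1e-12):
    """Pointwise evaluation of the bounds (6.3) for the derivative problem.

    u0, phi0 : nodal values of the nominal solution and of the adjoint quantity
    a, b     : bounds of the original problem (scalars or arrays)
    Returns arrays a_hat, b_hat with values in {0, -inf} and {0, +inf}.
    """
    outside = (phi0 < a) | (phi0 > b)                 # phi0 not in [a, b]
    a_hat = np.where((np.abs(u0 - a) <= tol) | outside, 0.0, -np.inf)
    b_hat = np.where((np.abs(u0 - b) <= tol) | outside, 0.0, np.inf)
    return a_hat, b_hat
```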
The active and inactive sets Â+(û) etc. for the derivative problem are taken with respect to the bounds â and b̂. For ease of reference, we also state the semismooth Newton method for the derivative problems û = u′[θ0]θ and φ̂ = φ′[θ0]θ, see Algorithm 2. Note that these quantities satisfy
  u′[θ0]θ = ΠI[a,b,φ0](φ′[θ0]θ),
  φ′[θ0]θ = g′(θ0)θ − G(θ0)u′[θ0]θ − (G′(θ0)θ)u0,
so each can be computed from the other.
Algorithm 2 Semismooth Newton algorithm to compute u′[θ0]θ and φ′[θ0]θ.
1: Choose û0 ∈ L∞(D) and set n := 0
2: Set the bounds â and b̂ according to (6.3)
3: Set φ̂n := g′(θ0)θ − G(θ0)ûn − (G′(θ0)θ)u0
4: Set r̂n := F̂(ûn) = φ̂n − ûn − max{0, φ̂n − b̂} − min{0, φ̂n − â}
5: while ‖r̂n‖∞ > tol do
6:   Set δu|Â+(ûn) := b̂ − ûn on Â+(ûn)
7:   Set δu|Â−(ûn) := â − ûn on Â−(ûn)
8:   Solve (G(θ0) + I) δu|Î(ûn) = φ̂n − ûn − G(θ0) δu|Â(ûn) on Î(ûn)
9:   Set ûn+1 := ûn + δu
10:  Set φ̂n+1 := g′(θ0)θ − G(θ0)ûn+1 − (G′(θ0)θ)u0
11:  Set r̂n+1 := F̂(ûn+1) = φ̂n+1 − ûn+1 − max{0, φ̂n+1 − b̂} − min{0, φ̂n+1 − â}
12:  Set n := n + 1
13: end while
14: Set u′[θ0]θ := ûn and φ′[θ0]θ := φ̂n
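After discretization, Algorithms 1 and 2 reduce to a few lines of linear algebra. The listing below is only a minimal finite-dimensional sketch, not the implementation used for the experiments in Section 8: it assumes that G(θ0) and g(θ0) have already been discretized into a symmetric positive semidefinite matrix G and a vector g of nodal values, and the dense linear algebra is chosen for clarity.

```python
import numpy as np

def semismooth_newton(G, g, a, b, tol=1e-10, max_iter=50):
    """Discretized Algorithm 1: solve u = Pi_[a,b](g - G u); returns (u, phi).

    G    : (n, n) symmetric positive semidefinite matrix (discretized G(theta_0))
    g    : (n,) vector (discretized g(theta_0))
    a, b : bounds (scalars or arrays; entries may be -inf/+inf as in (6.3))
    """
    n = len(g)
    a = np.broadcast_to(np.asarray(a, dtype=float), (n,))
    b = np.broadcast_to(np.asarray(b, dtype=float), (n,))
    u = np.zeros(n)
    for _ in range(max_iter):
        phi = g - G @ u
        r = phi - u - np.maximum(0.0, phi - b) - np.minimum(0.0, phi - a)
        if np.max(np.abs(r)) <= tol:
            break
        Ap = phi - b >= 0.0                 # active set A+(u_n)
        Am = phi - a <= 0.0                 # active set A-(u_n)
        I = ~(Ap | Am)                      # inactive set I(u_n)
        du = np.zeros(n)
        du[Ap] = b[Ap] - u[Ap]
        du[Am] = a[Am] - u[Am]
        # Newton step on the inactive set: (G + I) du_I = phi - u - G du_A
        rhs = (phi - u - G @ du)[I]
        du[I] = np.linalg.solve(G[np.ix_(I, I)] + np.eye(int(I.sum())), rhs)
        u = u + du
    return u, g - G @ u
```

With G unchanged, g replaced by the derivative data g′(θ0)θ − (G′(θ0)θ)u0, and the bounds replaced by â and b̂ from (6.3), the same routine realizes Algorithm 2, which illustrates why u′[θ0]θ and φ′[θ0]θ come at essentially the same cost as u0 and φ0.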
7. Update Strategies and Error Estimates
In this section, we analyze three different update strategies for the solution of (Oθ ).
Suppose that θ0 ∈ Θ is a given reference parameter, and that u0 = u[θ0 ] is the unique
solution of (Oθ ) associated to this parameter. Our goal is to analyze strategies to
approximate the perturbed solution u[θ] using the known reference solution u0 and
derivative information u′ [θ0 ] or φ′ [θ0 ]. Such strategies are particularly useful if they
provide a reasonable approximation of the perturbed solution at lower numerical effort
than is required by the repeated solution of the perturbed problem. We will see below
that our strategies fulfill this condition to some degree. However, the full potential
of these update schemes can only be revealed in nonlinear applications, where the
solution of the derivative problem is significantly less expensive than the solution of
the original problem. This deserves further investigation.
The three strategies we are considering are:
  C1(θ) := u0 + u′[θ0](θ − θ0),   (C1)
  C2(θ) := Π[a,b](u0 + u′[θ0](θ − θ0)),   (C2)
  C3(θ) := Π[a,b](φ0 + φ′[θ0](θ − θ0)).   (C3)
Clearly, all of the above yield approximations of u[θ] in the vicinity of θ0. Strategies (C1) and (C2) are based exclusively on primal quantities, while (C3) invokes adjoint quantities. Note that in the equations (Oθ) and (Dθ), the orders of the smoothing operation G and the projection Π are reversed.
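In a discretized setting the three strategies reduce to a few array operations once the nominal quantities and the directional derivatives have been computed, e.g., with a routine like the sketch after Algorithm 2. The following lines are again only an illustration, and the variable names are assumptions.

```python
import numpy as np

def update_strategies(u0, phi0, du, dphi, a, b):
    """First-order approximations (C1)-(C3) of the perturbed solution u[theta].

    u0, phi0 : nominal solution and dual quantity at theta_0
    du, dphi : directional derivatives u'[theta_0](theta - theta_0) and
               phi'[theta_0](theta - theta_0)
    """
    C1 = u0 + du
    C2 = np.clip(u0 + du, a, b)       # projection onto [a, b]
    C3 = np.clip(phi0 + dphi, a, b)
    return C1, C2, C3
```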
Our main result is:
Theorem 7.1. The update strategies (C1)–(C3) admit the following approximation properties:
  ‖C1(θ) − u[θ]‖p / ‖θ − θ0‖ → 0 as ‖θ − θ0‖ → 0 for all p ∈ [2, ∞),   (7.1)
  ‖C2(θ) − u[θ]‖p / ‖θ − θ0‖ → 0 as ‖θ − θ0‖ → 0 for all p ∈ [2, ∞),   (7.2)
  ‖C3(θ) − u[θ]‖p / ‖θ − θ0‖ → 0 as ‖θ − θ0‖ → 0 for all p ∈ [2, ∞].   (7.3)
Strategies (C2 ) and (C3 ) yield feasible approximations, i.e., Ci (θ) ∈ Uad for i = 2, 3.
The error term for (C2 ) is not larger than the term for (C1 ).
Proof. Equation (7.1) follows immediately from the B-differentiability result for u[·],
Theorem 4.5. For the second strategy, we have
‖C2(θ) − u[θ]‖p = ‖Π[a,b](u0 + u′[θ0](θ − θ0)) − u[θ]‖p
                = ‖Π[a,b](u0 + u′[θ0](θ − θ0)) − Π[a,b](u[θ])‖p
                ≤ ‖u0 + u′[θ0](θ − θ0) − u[θ]‖p
                = ‖C1(θ) − u[θ]‖p,
by the Lipschitz property of the projection, and the result follows as before. Finally,
(7.3) was proven in Corollary 5.3.
Note that (C3 ) admits an estimate for the remainder quotient in L∞ (D), while the
others do not. However, the remainder itself can be estimated in L∞ as the following
corollary shows:
Corollary 7.2. Strategies (C1 )–(C3 ) admit the following approximation property:
kCi (θ) − u[θ]k∞ → 0 as kθ − θ0 k → 0,
for i = 1, 2, 3.
Proof. For strategy (C1 ), the claim was proven in Lemma 4.7 with α = 0. For (C2 ),
we estimate as in the proof of Theorem 7.1 and obtain
‖C2(θ) − u[θ]‖∞ = ‖Π[a,b](u0 + u′[θ0](θ − θ0)) − u[θ]‖∞
                = ‖Π[a,b](u0 + u′[θ0](θ − θ0)) − Π[a,b](u[θ])‖∞
                ≤ ‖u0 + u′[θ0](θ − θ0) − u[θ]‖∞
                = ‖C1(θ) − u[θ]‖∞.
The claim for (C3 ) follows directly from (7.3).
All three update strategies come at practically the same numerical cost, namely the
solution of one derivative problem. Note that both u′ [θ0 ](θ − θ0 ) and φ′ [θ0 ](θ − θ0 )
are computed simultaneously by Algorithm 2. The additional projection in (C2 ) and
(C3) is inexpensive. However, only (C2) and (C3) yield feasible approximations of the perturbed solution, and only for (C3) does the remainder quotient (7.3) go to zero in L∞(D) as θ → θ0. Therefore, we advocate the use of the (C3) strategy to compute
corrections of the nominal solution u0 in the presence of perturbations.
In the next section, our findings are supported by numerical experiments.
8. Applications in Optimal Control
In this section, we present some applications of our results in the context of optimal
control and report on numerical experiments. As an example, we treat a class of elliptic
boundary control problems. The case of distributed control is simpler and therefore
omitted. Numerical results are given which illustrate the performance of the update
strategies analyzed in Section 7 and support the superiority of scheme (C3 ).
8.1. Boundary Control of an Elliptic Equation. Let us suppose that Ω ⊂ RN, N ∈ {2, 3}, is a bounded domain with Lipschitz continuous boundary Γ. We define the elliptic differential operator
  Ay(x) = −∇ · (A(x)∇y(x)),
where A(x) = A(x)⊤ ∈ RN×N has entries in L∞(Ω) such that A is uniformly elliptic, i.e., y⊤A(x)y ≥ ϱ|y|2 holds uniformly in Ω with some ϱ > 0. We consider the elliptic
partial differential equation with boundary control
  Ay + c0 y = 0 on Ω,   ∂y/∂nA + αy = u on Γ,   (8.1)
where c0 ∈ L∞(Ω), c0 ≥ 0, α ∈ L∞(Γ), α ≥ 0 such that ‖α‖L2(Γ) + ‖c0‖L2(Ω) > 0.
It is well known that (8.1) has a unique solution y = Su for every u ∈ L2(Γ). The adjoint operator S⋆ maps a given f to the trace of the unique solution of
  Ap + c0 p = f on Ω,   ∂p/∂nA + αp = 0 on Γ.   (8.2)
Lemma 8.1 (see [9]). The following are bounded linear operators:
(1) S : L2 (Γ) → Lp (Ω) for all p ∈ [2, ∞).
(2) S ⋆ : Lr (Ω) → L∞ (Γ) for all r ∈ (N/2, ∞].
We set D = Γ and consider the elliptic boundary optimal control problem:
  (Eθ)   Find u ∈ Uad which minimizes  (1/2)‖Su − θ‖2L2(Ω) + (γ/2)‖u‖2
with γ > 0. For the parameter space, i.e., the desired states, it is sufficient to choose Θ = L2(Ω) in order to satisfy the assumptions of Section 2. It is well known that for any given θ ∈ Θ, a necessary and sufficient optimality condition for (Eθ) is
  u = Π[a,b](−(1/γ) S⋆(Su − θ)),   (8.3)
which fits our setting (Oθ) with the choice
  g(θ) = (1/γ) S⋆θ,   G(θ) = (1/γ) S⋆S.
Using Lemma 8.1, one readily verifies the conditions of Section 2. Note that
  p[θ] := γ(g(θ) − G(θ)u[θ]) = −S⋆(Su[θ] − θ) = γφ[θ]
is the usual adjoint state belonging to problem (Eθ), which satisfies (8.2) with f = −(Su[θ] − θ).
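As an aside, we indicate how this identification can look after discretization. The sketch below assumes that the solution operator S has been assembled as a matrix mapping boundary control coefficients to state coefficients and that M_Omega, M_Gamma are the mass matrices of the state and boundary control spaces; these names, and the dense linear algebra, are illustrative assumptions rather than the discretization described in Section 8.2.

```python
import numpy as np

def discrete_problem_data(S, M_Omega, M_Gamma, theta, gamma):
    """Coefficient-level analogues of g(theta) = S* theta / gamma, G = S* S / gamma.

    S       : (n_state, n_ctrl) matrix, discretized control-to-state map
    M_Omega : (n_state, n_state) mass matrix realizing the L^2(Omega) inner product
    M_Gamma : (n_ctrl, n_ctrl) mass matrix realizing the L^2(Gamma) inner product
    theta   : (n_state,) coefficient vector of the desired state; gamma > 0
    """
    # Riesz representation of S* in coefficients: M_Gamma^{-1} S^T M_Omega
    g = np.linalg.solve(M_Gamma, S.T @ (M_Omega @ theta)) / gamma
    G = np.linalg.solve(M_Gamma, S.T @ M_Omega @ S) / gamma
    return g, G   # usable, e.g., in the semismooth Newton sketch of Section 6
```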
8.2. Numerical Results. We will verify our analytical results by means of the following example: We consider as a specific choice of (8.1)
  −∆y + y = 0 on Ω,   ∂y/∂n = u on Γ,
on Ω = (0, 1) × (0, 1). As bounds, we have a = −10 and b = 2. The control cost factor is γ = 0.1 and the nominal parameter is θ0(x1, x2) = x1² + x2².
The discretization is carried out with piecewise linear and globally continuous finite
elements on a grid with 3121 vertices and 5600 triangles, which is refined near the
boundary of Ω, see Figure 8.1. We refer to the corresponding finite element space as
Vh ⊂ H 1 (Ω) and its restriction to the boundary is Bh . During the optimization loop
(Algorithm 1), the discretized variables u and φ are taken as elements of Bh while the
intermediate quantities Su as well as the adjoint state −S ⋆ (Su − θ), before restriction
to the boundary, are taken in Vh . The computation of the active sets in the generalized
Newton’s method is done in a simple way, by determining those vertices of the given
grid at which φ ≥ b (or φ ≤ a) is satisfied.
As a caveat, we remark that our convergence results (7.1)–(7.3) for the update
strategies (C1 ) through (C3 ) cannot be observed when all quantities are confined to
any fixed grid. The reason is that in this entirely static finite-dimensional problem,
all Lp -norms are equivalent and hence the numerical results show no difference in the
approximation qualities of the different strategies.
In order to obtain more accurate results while keeping a fixed grid for the ease
of implementation, we apply three postprocessing steps during the computation, see
[7]. The exact procedure used is outlined below as Algorithm 3 and we explain the
individual steps. Once the nominal solution u0 ∈ Bh is computed as described above (step 1:), the final ũ0 ∉ Bh is obtained by a postprocessing step, i.e., by a pointwise exact projection of the piecewise linear function φ0 ∈ Bh onto the interval [a, b], observing that the intersections of φ0 with the bounds do not usually coincide with boundary vertices of the finite element grid (step 2:). The nominal solution is shown in Figures 8.1 and 8.2.
Figure 8.1. Mesh refined near the boundary (left). The right figure
shows the nominal control u0 (solid) and dual quantity φ0 (dashed),
unrolled from the lower left corner of the domain in counterclockwise
direction.
Figure 8.2. Nominal state Su0 (left) and nominal desired state θ0 (right).
A sequence of perturbed solutions u[θi ] corresponding to parameters {θi }ni=1 near θ0
is computed in the same way (step 3:), i.e., with the simple active set strategy on the
fixed grid and a postprocessing step. In the numerical experiments, every parameter θi
is obtained by a random perturbation of the finite element coordinates of the desired
state θ0 . This allows us to verify that the error estimates of Theorem 7.1 are indeed
uniform with respect to the perturbation direction. The perturbations have specified
norms, namely
i−1
{kθi − θ0 k2 }ni=1 = logspace(0,-2.5,n) = 10−2.5· n−1 ,
i = 1, . . . , n,
where n = 61.
The derivative problems for u′[θ0](θi − θ0) and φ′[θ0](θi − θ0) involve bounds which take only the values â, b̂ ∈ {0, ±∞} and depend on the nominal solution u0 and adjoint quantity φ0, see (6.3). These bounds are expressed in terms of constant values on the intervals of the boundary grid (step 4:), and again the simple active set strategy on the original grid is used to solve the derivative problems u′[θ0](θ − θ0) and φ′[θ0](θ − θ0), see (step 5:), for the various perturbation directions θi − θ0. Then two postprocessing steps follow. In the first (step 6:), â and b̂ are determined from (6.3) more accurately than before, using the true intersection points of the nominal adjoint variable φ0 with the original bounds a and b. In the second (step 7:), the derivative u′[θ0](θ − θ0) is postprocessed and set to the true projection of φ′[θ0](θ − θ0) onto the improved bounds â and b̂. The exact procedure used to verify our theoretical results is outlined below as Algorithm 3.
Algorithm 3 The discretized procedure used to obtain the numerical results.
1: Run Algorithm 1 on the fixed grid (Figure 8.1). Active sets are determined by boundary mesh points. The results u0 and φ0 are elements of Bh. The state Su0 and adjoint state −S⋆(Su0 − θ0) are elements of Vh.
2: Obtain an improved solution ũ0 = Π[a,b](φ0) by carrying out the exact projection (postprocessing) of the adjoint quantity φ0 ∈ Bh to the bounds a and b. ũ0 is no longer in Bh.
3: Repeat steps 1: and 2: for a sequence of perturbations {θi}ni=1 near θ0 to obtain solutions ui and, by postprocessing, improved solutions ũi, i = 1, . . . , n. (This is to form the difference quotients (7.1)–(7.3) later.)
4: Compute the bounds â and b̂ by (6.3) as functions which are constant (possibly ±∞) on the intervals of the boundary grid.
5: Run Algorithm 2 on the fixed grid (Figure 8.1), for the given sequence of perturbation directions θi − θ0, i = 1, . . . , n. One obtains the derivatives u′[θ0](θi − θ0) and dual derivatives φ′[θ0](θi − θ0), both elements of Bh.
6: Obtain an improved choice for the bounds â and b̂ by determining the exact transition points in (6.3).
7: Obtain an improved derivative ũ′[θ0](θi − θ0) by carrying out the exact projection (postprocessing) of the dual derivative φ′[θ0](θi − θ0) to the improved bounds â and b̂.
Figure 8.3 (left) shows the behavior of the approximation errors
  ‖approximation error_i‖p = ‖Ci(θi) − u[θi]‖p,
while Figure 8.3 (right) shows the behavior of the error quotients
  ‖approximation error_i‖p / ‖size of perturbation‖L2(Ω) = ‖Ci(θi) − u[θi]‖p / ‖θi − θ0‖L2(Ω)
as in (7.1)–(7.3).
Figure 8.3. Approximation errors kCi (θ) − u[θ]kp (left) and error
quotients (7.1)–(7.3) (right) in different Lp (Γ) norms, plotted against
the size of the perturbation kθi − θ0 k2 in a double logarithmic scale.
Top row refers to strategy (C1 ), middle row to (C2 ), bottom row to
(C3 ). In each plot, the upper line corresponds to p = ∞, the lower to
p = 2.
In the numerator, the Lp(Γ) norms for p ∈ {2, ∞} are used. The
scales in Figure 8.3 are doubly logarithmic and they are the same for each of the plots.
Using the procedure for the discretized problems outlined in Algorithm 3, we observe
the following results:
(1) The approximation error for strategy (C2 ) is indeed smaller (approximately
by a factor of 2) than the error using strategy (C1 ), see Figure 8.3 (first and
second row), as expected from Theorem 7.1.
(2) The approximation error for strategy (C3 ) is in turn smaller (approximately
by a factor of 7) than the error using strategy (C2 ), see Figure 8.3 (second and
third row).
(3) As predicted by Theorem 7.1, the error quotient in the L∞ (Γ) norm does not
tend to zero for strategies (C1 ) and (C2 ), see Figure 8.3 (top right and middle
right).
(4) Theorem 7.1 predicts the approximation error and its quotient for strategy
(C3 ) to tend to zero in particular in the L∞ (Γ)-norm. In the experiments,
we observe that the approximation error tends to a constant (approximately
6.3 · 10−14 , see Figure 8.3 (bottom left)). This is to be expected as we reach
the discretization limit on the given grid.
To summarize, Theorem 7.1 is confirmed by the numerical results. The update
strategy (C3 ), which involves the dual variable φ, performs significantly better than the
strategies based on the primal variable u. We can also offer a geometric interpretation
for this: The derivative u′ [θ0 ] of the primal variable u0 is given by a projection and it is
zero on the so-called strongly active sets, i.e., where φ0 ∉ [a, b], compare Theorem 4.5
and (6.3). Consequently, the primal-based strategies (C1 ) and (C2 ) can only predict a
possible growth of the active sets from u0 to u[θ], and not their shrinking. On the other
hand, the derivative of the dual variable φ′ [θ0 ] (Theorem 5.2) has a different structure
and it can capture the change of active sets more accurately. Since u′ [θ0 ] and φ′ [θ0 ]
are available simultaneously, see Algorithm 2, we advocate the use of strategy (C3 ) to
recover a perturbed solution from an unperturbed one.
References
[1] F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems. Springer, Berlin,
2000.
[2] M. Hintermüller, K. Ito, and K. Kunisch. The primal-dual active set strategy as a semismooth
Newton method. SIAM Journal on Optimization, 13(3):865–888, 2002.
[3] D. Kinderlehrer and G. Stampacchia. An Introduction to Variational Inequalities and their Applications. Academic Press, New York, 1980.
[4] K. Malanowski. Sensitivity analysis for parametric optimal control of semilinear parabolic equations. Journal of Convex Analysis, 9(2):543–561, 2002.
[5] K. Malanowski. Remarks on differentiability of metric projections onto cones of nonnegative
functions. Journal of Convex Analysis, 10(1):285–294, 2003.
[6] K. Malanowski. Solution differentiability of parametric optimal control for elliptic equations. In
System modeling and optimization, Proceedings of the IFIP TC7 Conference, volume 130, pages
271–285. Kluwer, 2003.
[7] C. Meyer and A. Rösch. Superconvergence properties of optimal control problems. SIAM Journal
on Control and Optimization, 43(3):970–985, 2004.
[8] A. Shapiro. On concepts of directional differentiability. Journal of Optimization Theory and
Applications, 66(3):477–487, 1990.
[9] F. Tröltzsch. Optimale Steuerung partieller Differentialgleichungen. Vieweg, Wiesbaden, 2005.
[10] M. Ulbrich. Semismooth Newton methods for operator equations in function spaces. SIAM Journal on Optimization, 13:805–842, 2003.
7. Quantitative Stability Analysis of Optimal Solutions in
PDE-Constrained Optimization
K. Brandes and R. Griesse: Quantitative Stability Analysis of Optimal Solutions in
PDE-Constrained Optimization, Journal of Computational and Applied Mathematics,
206(2), p.809–826, 2007
The derivative of an optimal solution with respect to parameter perturbations naturally lends itself to the quantitative assessment of that solution’s stability. In this
paper, we address the question of how to identify the perturbation direction which
has the greatest impact on the solution, or on a quantity of interest depending on the
solution. We address only the case without inequality constraints, because we exploit
in particular the linearity of the map δπ 7→ DΞ(π0 ; δπ). However, the results can
be easily extended to problems with inequality constraints if strict complementarity
holds, compare Remark 0.5 on p. 10.
We employ here the setting of a generic optimal control problem in Banach spaces,
  min(y,u) J(y, u, π)   subject to e(y, u, π) = 0,   (7.1)
with Lagrangian
  L(y, u, p; π) = J(y, u, π) + ⟨p, e(y, u, π)⟩.
Under differentiability and constraint qualification assumptions, a local optimal solution of (7.1) for the nominal parameter π0 ∈ P satisfies, together with its adjoint
state,
Ly (y0 , u0 , p0 ; π0 ) = Lu (y0 , u0 , p0 ; π0 ) = Lp (y0 , u0 , p0 ; π0 ) = 0.
Differentiating totally with respect to π yields
  (δy, δu, δp)⊤ := Ξ′(π0) δπ = K−1B δπ,
where
  K = ( Lyy  Lyu  e⋆y
        Luy  Luu  e⋆u
        ey   eu    0 ),     B = −( Lyπ
                                   Luπ
                                   eπ )
and everything is evaluated at Ξ(π0 ) = (y0 , u0 , p0 ) and parameter π0 . The Implicit
Function Theorem justifies the existence and representation of the derivative since
K is indeed boundedly invertible under (assumed) second-order sufficient optimality
conditions.
As stated above, it is our goal to analyze the rate of change of an observed quantity
q(y, u, p) ∈ H which depends on the optimal solution, which in turn depends on the
parameter π. Here, the space of observations H may be a finite or infinite dimensional
Hilbert space, and the same holds for the space of parameters P . Let us denote the
derivative of q at (y0, u0, p0) by Π. By the chain rule, q is totally differentiable with respect to the parameter π, and our operator of interest is
  A := ΠK−1B.   (7.2)
Note that A is the Fréchet derivative of the map π ↦ q(Ξ(π)) at π0. In other words, it represents the linear relation between a perturbation direction δπ and the first-order change in the observed quantity q, when π changes from π0 to π0 + δπ.
The desired information regarding perturbation directions of greatest impact is contained in A and can be retrieved by a (partial) singular value decomposition (SVD)
of A. (This requires A to be compact, which is naturally the case in many situations,
see Example 2.9 in the paper.) The right singular vectors, in descending order with
respect to the singular values, disclose the perturbation directions of greatest impact
on the observed quantity q, in the norm of the observation space H. The corresponding left singular vectors, scaled by the singular values, yield the respective directional derivatives of q under these perturbations. The largest singular value allows us to quantify the first-order stability of
q, which may be an important piece of information in practical applications.
The remainder of the paper deals with the practical evaluation of a partial and approximate SVD of A, after a Galerkin discretization. One of the challenges consists
in overcoming the occurrence of Cholesky factors of mass matrices, which naturally
appear when one computes with respect to given bases of finite dimensional subspaces
of P and H. We achieve this goal by exchanging the SVD for an eigen decomposition
of the associated Jordan-Wielandt matrix, to which we apply a suitable similarity
transformation, see Section 3 of the paper. We describe an algorithm which allows one to construct, using standard iterative eigen decomposition software such as Matlab's
eigs routine, a partial SVD entirely in terms of coordinate vectors with respect to the
chosen bases, and without the need of modifying any scalar products.
In Section 4 of the paper, we present numerical examples. We deal explicitly with the
cases of low and high dimensional parameter and observation spaces P and H.
QUANTITATIVE STABILITY ANALYSIS OF OPTIMAL
SOLUTIONS IN PDE-CONSTRAINED OPTIMIZATION
KERSTIN BRANDES AND ROLAND GRIESSE
Abstract. PDE-constrained optimization problems under the influence of perturbation parameters are considered. A quantitative stability analysis for local
optimal solutions is performed. The perturbation directions of greatest impact on
an observed quantity are characterized using the singular value decomposition of
a certain linear operator. An efficient numerical method is proposed to compute
a partial singular value decomposition for discretized problems, with an emphasis
on infinite-dimensional parameter and observation spaces. Numerical examples
are provided.
1. Introduction
In this work we consider nonlinear infinite-dimensional equality-constrained optimization problems, subject to a parameter p in the problem data:
  minx f(x, p)   subject to e(x, p) = 0.   (1.1)
The optimization variable x and the parameter p are in some Banach and Hilbert
spaces, respectively, and f and e are twice continuously differentiable. In particular, we
have in mind optimal control problems for partial differential equations (PDE). When
solving practical optimal control problems which describe the behavior of physical
systems, uncertainty in the physical parameters is virtually unavoidable. In (1.1), the
uncertain data is expressed in terms of a parameter p for which a nominal or expected
value p0 is available but whose actual value is unknown. Having solved problem (1.1)
for p = p0 , it is thus natural and sometimes crucial to assess the stability of the optimal
solution with respect to unforeseen changes in the problem data.
In this contribution we quantify the first-order stability properties of a local optimal solution of (1.1), and more generally, the stability properties of an observed
quantity depending on the solution. We make use of the singular value decomposition
(SVD) for compact operators. Moreover, we propose a practical and efficient procedure to approximate the corresponding singular system. The right singular vectors
corresponding to the largest singular values represent the perturbation directions of
greatest impact on the observed quantity. The singular values themselves provide an
upper bound for the influence of unit perturbations. Altogether, this information allows practitioners to assess the stability properties of any given optimal solution, and
to avoid the perturbations of greatest impact.
Let us briefly relate our effort to previous results in the field. The differentiability
properties of optimal solutions with respect to p in the context of PDE-constrained
optimization were studied in, e.g., [4,10]. The impact of given perturbations on optimal
solutions and the optimal value of the objective has also been discussed there. For
the dependence of a scalar quantity of interest on perturbations we refer to [6]. All
of these results admit pointwise inequality constraints for the control variable. For
simplicity of the presentation, we elaborate on the case without inequality constraints.
However, our results extend to problems with inequality (control) constraints in the
presence of strict complementarity, see Remark 3.6.
The material is organized as follows: In Section 2, we perform a first order perturbation analysis of solutions for (1.1) in the infinite-dimensional setting of PDE-constrained optimization, and discuss their stability properties using the singular value
decomposition of a certain compact linear map. In Section 3 we focus on the discretized
problem and propose a practical and efficient method to compute the most significant
part of the singular system. Finally, we present numerical examples in Section 4.
For normed linear spaces X and Y , L(X, Y ) denotes the space of bounded linear
operators from X into Y . The standard notation Lp (Ω) and H 1 (Ω) for Sobolev spaces
is used, see [1].
2. Infinite-Dimensional Perturbation Analysis
As mentioned in the introduction, we are mainly interested in the analysis of optimal
control problems involving PDEs. Hence we re-state problem (1.1) as
  miny,u f(y, u, p)   subject to e(y, u, p) = 0   (2.1)
where the optimization variable x = (y, u) splits into a state variable y ∈ Y and a
control or design variable u ∈ U and where e : Y × U → Z ⋆ represents the weak form
of a stationary or non-stationary partial differential equation. Throughout, Y , U and
Z are reflexive Banach spaces and Z ⋆ denotes the dual of Z. Problem (2.1) depends
on a parameter p taken from a Hilbert space P , which is not optimized for but which
represents perturbations or uncertainty in the problem data. We emphasize that p
may be finite- or infinite-dimensional.
For future reference, it will be convenient to define the Lagrangian of problem (2.1)
as
L(y, u, λ, p) = f (y, u, p) + hλ, e(y, u, p)i .
(2.2)
The following two results are well known [11]:
Lemma 2.1 (First-Order Necessary Conditions). Let f and e be continuously differentiable with respect to (y, u). Moreover, let (y, u) be a local optimal solution for
problem (2.1) for some given parameter p. If ey (y, u, p) ∈ L(Y, Z ⋆ ) is onto, then there
exists a unique Lagrange multiplier λ ∈ Z such that the following optimality system is
satisfied:
Ly (y, u, λ, p) = fy (y, u, p) + hλ, ey (y, u, p)i = 0
(2.3)
Lu (y, u, λ, p) = fu (y, u, p) + hλ, eu (y, u, p)i = 0
(2.4)
Lλ (y, u, λ, p) = e(y, u, p) = 0.
(2.5)
In the context of optimal control, λ is called the adjoint state. A triple (y, u, λ)
satisfying (2.3)–(2.5) is called a critical point.
Lemma 2.2 (Second-Order Sufficient Conditions). Let (y, u, λ) be a critical point such
that ey (y, u, p) is onto and let f and e be twice continuously differentiable with respect
to (y, u). Suppose that there exists ρ > 0 such that Lxx (y, u, λ, p)(x, x) ≥ ρ kxk2Y ×U
holds for all x ∈ ker ex (y, u, p). Then (y, u) is a strict local optimal solution of (2.1).
Let us fix the standing assumptions for the rest of the paper:
Assumption 2.3.
(1) Let f and e be twice continuously differentiable with respect to (y, u, p).
(2) Let p0 be a given nominal or expected value of the parameter, and let (y0 , u0 )
be a local optimal solution of (2.1) for p0 .
(3) Suppose that ey (y0 , u0 , p0 ) is onto and that λ0 is the unique adjoint state.
(4) Suppose that the second-order sufficient conditions of Lemma 2.2 hold at (y0 , u0 , λ0 ).
Remark 2.4. For the sake of the generality of the presentation, we abstain from
using more specific, i.e., weaker, second-order sufficient conditions for optimal control
problems with PDEs, see, e.g., [16, 17]. In case the setting of a specific problem at
hand requires refined second-order conditions and a careful choice of function spaces,
the subsequent ideas still remain valid, compare Example 2.5.
Let us define now the Karush-Kuhn-Tucker (KKT) operator
  K = ( Lyy  Lyu  e⋆y
        Luy  Luu  e⋆u
        ey   eu    0 )   (2.6)
where all terms are evaluated at the nominal solution (y0 , u0 , λ0 ) and the nominal
parameter p0 , and e⋆y and e⋆u denote the adjoint operators of ey and eu , respectively.
Note that K is self-adjoint. Here and in the sequel, when no ambiguity arises, we will
frequently omit the function arguments.
Under the conditions of Assumption 2.3, K is boundedly invertible as an element
of L(Y × U × Z, Y ⋆ × U ⋆ × Z ⋆ ).
Example 2.5 (Optimal Control of the Stationary Navier-Stokes System). As mentioned in Remark 2.4, nonlinear PDE-constrained problems may require refined second-order sufficient conditions. Consider, for instance, the distributed optimal control problem for the stationary Navier-Stokes equations,
  miny,u  (1/2)‖y − yd‖2[L2(Ω)]N + (γ/2)‖u‖2[L2(Ω)]N
  s.t.  −ν∆y + (y · ∇)y + ∇p = u on Ω,   div y = 0 on Ω,   y = 0 on ∂Ω
on some bounded Lipschitz domain Ω ⊂ RN , N ∈ {2, 3}. Suitable function spaces for
the problem are
Y = Z = closure in [H 1 (Ω)]N of {v ∈ [C0∞ (Ω)]N : div v = 0},
U = [L2 (Ω)]N .
In [17, Theorem 3.16] it was proved that the condition
  ‖y‖2[L2(Ω)]N + γ‖u‖2[L2(Ω)]N + 2∫Ω (y · ∇)y λ0 ≥ ρ‖u‖2[L4/3(Ω)]N
for some ρ > 0 and all (y, u) satisfying the linearized state equation at (y0 , u0 ) is a
second-order sufficient condition of optimality for a critical point (y0 , u0 , λ0 ). Hence
this weaker condition may replace Assumption 2.3(4) for this problem. Still, it can
be proved along the lines of [4, 10] that K is boundedly invertible as an element of
L(Y × [L4/3 (Ω)]N × Z, Y ⋆ × [L4 (Ω)]N × Z ⋆ ). The subsequent ideas remain valid when
U is replaced by L4/3 (Ω).
From the bounded invertibility of K, we can easily derive the differentiability of the
parameter-to-solution map from the implicit function theorem [2]:
Lemma 2.6. There exist neighborhoods B1 of p0 and B2 of (y0 , u0 , λ0 ) and a continuously differentiable function Ψ : B1 → B2 such that for all p ∈ B1 , Ψ(p) is the unique
solution in B2 of (2.3)–(2.5). The Fréchet derivative of Ψ at p0 is given by
  Ψ′(p0) = −K−1 ( Lyp
                  Lup
                  ep ),   (2.7)
where the right hand side is evaluated at the nominal solution (y0, u0, λ0) and p0.
In particular, we infer from Lemma 2.6 that for a given perturbation direction p,
the directional derivatives of the nominal optimal state and optimal control and the
corresponding adjoint state (y, u, λ) are given by the unique solution of the linear
system in Y⋆ × U⋆ × Z⋆
  K (y, u, λ)⊤ = B p,   where   B = −( Lyp
                                       Lup
                                       ep ).   (2.8)
These directional derivatives are called the parametric sensitivities of the state, control
and adjoint variables. They describe the first-order change in these variables as p
changes from p0 to p0 + p.
It is worth noting that these sensitivities can be characterized alternatively as the
unique solution x = (y, u) and adjoint state of the following auxiliary problem with
quadratic objective and linear constraint:
  miny,u  (1/2) Lxx(y0, u0, λ0, p0)(x, x) + Lxp(y0, u0, λ0, p0)(x, p)
  subject to  ey(y0, u0, p0) y + eu(y0, u0, p0) u = −ep(y0, u0, p0) p.   (2.9)
Hence, computing the parametric sensitivity in a given direction p amounts to solving
one linear-quadratic problem (2.9).
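For a discretized problem, (2.8) is a saddle point system in which the perturbation enters only through the right hand side. The following is a small self-contained sketch, assuming the second derivatives of the Lagrangian and the constraint derivatives have been assembled as dense matrices; the names are placeholders and do not refer to a particular PDE discretization.

```python
import numpy as np

def parametric_sensitivities(Lyy, Lyu, Luu, Ey, Eu, Lyp, Lup, Ep, dp):
    """Solve the sensitivity system (2.8) for a perturbation direction dp.

    Lyy, Lyu, Luu : blocks of L_xx at the nominal solution (L_uy = L_yu^T)
    Ey, Eu, Ep    : discretized e_y, e_u, e_p
    Lyp, Lup      : mixed derivatives L_yp, L_up
    Returns the directional derivatives (dy, du, dlam) of state, control, adjoint.
    """
    ny, nu, nz = Lyy.shape[0], Luu.shape[0], Ey.shape[0]
    K = np.block([[Lyy,   Lyu,   Ey.T],
                  [Lyu.T, Luu,   Eu.T],
                  [Ey,    Eu,    np.zeros((nz, nz))]])
    B = -np.vstack([Lyp, Lup, Ep])
    sol = np.linalg.solve(K, B @ dp)
    return sol[:ny], sol[ny:ny + nu], sol[ny + nu:]
```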
We recall that it is our goal to analyze the stability properties of an observed
quantity
q : Y × U × Z ∋ (y, u, λ) ↦ q(y, u, λ) ∈ H
depending on the solution, where H is another finite- or infinite-dimensional Hilbert
space and q is differentiable. By the chain rule, the first-order change in the observed
quantity, as p changes from p0 to p0 + p, is given by
  Π(y, u, λ) := q′(y0, u0, λ0)(y, u, λ).   (2.10)
We refer to Π = q′(y0, u0, λ0) ∈ L(Y × U × Z, H) as the observation operator. Due to
(2.8), we have the following linear relation between perturbation direction p and first
order change in the observed quantity:
Π(y, u, λ) = ΠK−1 B p.
Example 2.7 (Observation Operators).
(i) If one is interested in the impact of perturbations on the optimal state on some
subset Ω′ of the computational domain Ω, one has q(y, u, λ) = y|Ω′ and, due
to linearity, Π = q holds.
(ii) If the quantity of interest is the impact of perturbations on the average value of the control variable, one chooses q(y, u, λ) = ∫ u, where the integral extends over the control domain.
It is the bounded linear map ΠK−1 B that we now focus our attention on. The
maximum impact of all perturbations (of unit size) on the observed quantity is given
by the operator norm
  ‖ΠK−1B‖L(P,H) = sup_{p≠0} ‖ΠK−1B p‖H / ‖p‖P.   (2.11)
To simplify the notation, we will also use the abbreviation
A := ΠK−1 B.
In general, the operator norm need not be attained for any direction p. Therefore,
and in order to perform the singular value decomposition, we make the following
assumption:
Assumption 2.8. Suppose that A is compact from P to H.
To demonstrate that this assumption is not overly restrictive, we discuss several
important examples. Recall that in PDE-constrained optimization, Y and Z are
infinite-dimensional function spaces. Hence, K−1 cannot be compact since then its
spectrum would contain 0 which entails non-invertibility of K−1 . (Of course, if all of
Y , U and Z are finite-dimensional, Assumption 2.8 holds trivially.)
Example 2.9 (Compactness of A).
(i) If at least one of the parameter or observation spaces P or H is finite-dimensional,
A is trivially compact.
(ii) For sufficiently regular perturbations, B and thus A is compact: Consider the
standard distributed optimal control problem with Y = Z = H01 (Ω), U =
L2 (Ω), where Ω is a bounded domain with Lipschitz boundary in RN , N ≥ 1,
yd , ud ∈ L2 (Ω), and
  f(y, u) = (1/2)‖y − yd‖2L2(Ω) + (γ/2)‖u − ud‖2L2(Ω),
  e(y, u, p)(ϕ) = (∇y, ∇ϕ) − (u, ϕ) − ⟨p, ϕ⟩H−1(Ω),H01(Ω),   ϕ ∈ H01(Ω),
which corresponds to −∆y = u+p on Ω and y = 0 on ∂Ω. It is straightforward
to verify that B = (0, 0, id)⊤ . By compact embedding, see [1], B is compact
from P = L(N +2)/(2N )+ε (Ω) into Y ⋆ ×U ⋆ ×Z ⋆ for any ε > 0, and in particular
for the Hilbert space P = L2 (Ω) in any dimension N . Hence A = ΠK−1 B is
compact for P = L2 (Ω) and arbitrary linear and bounded observation operators
Π.
(iii) In the previous example, neither B nor K−1 B is compact if P = H −1 (Ω). In
that case, one has to choose an observation space of sufficiently low regularity,
so that Π and hence A is compact. For instance, in the previous example,
Π(y, u, λ) = y is compact into H = L2 (Ω) due to the compact embedding of
H01 (Ω) into L2 (Ω).
We refer to Section 4 for more examples and return to the issue of computing the
operator norm (2.11). This can be achieved by the singular value decomposition [3,
Ch. 2.2]:
Lemma 2.10. There exists a countable system {(σn , vn , un )}n∈N such that {σn }n∈N is
non-increasing and non-negative, {(σn2 , vn )} ⊂ R×P is a complete orthonormal system
of eigenpairs for AH A (spanning the closure of the range of AH ), and {(σn2 , un )} ⊂
R × H is a complete orthonormal system of eigenpairs for AAH (spanning the closure
of the range of A). In addition, Avn = σn un holds and we have
  A p = ΠK−1B p = ∑n=1,…,∞ σn (p, vn)P un   (2.12)
for all p ∈ P , where the series converges in H. Every value in {σn }n∈N appears with
finite multiplicity.
In Lemma 2.10, AH : H → P denotes the Hilbert space adjoint of A and (·, ·)P
is the scalar product of P . A system according to Lemma 2.10 is called a singular
system for A, with singular values σn , left singular vectors un ∈ H, and right singular
vectors vn ∈ P . Knowledge of the singular system will not only allow us to compute
the operator norm (2.11) and the direction(s) p for which this bound is attained, but
in addition, we obtain a complete sequence of perturbation directions in decreasing
order of importance with regard to the perturbations in the observed quantity. This
is formulated in the following proposition:
Proposition 2.11. Let {(σn , vn , un )}n∈N be a singular system for A. Then the operator norm in (2.11) is given by σ1 . Moreover, the supremum is attained exactly for all
non-zero vectors p ∈ span{v1 , . . . , vk } =: V1 , where k is the largest integer such that
σ1 = σk . Similarly, when A is restricted to V1⊥ , its operator norm is given by σk+1
and it is attained exactly for all non-zero vectors p ∈ span{vk+1 , . . . , vl }, where l is
the largest integer such that σk+1 = σl , and so on.
Proof. The claim follows directly from the properties of the singular system.
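When both P and H are low-dimensional, Proposition 2.11 can be applied directly to an explicitly assembled matrix. The following toy sketch, with a random stand-in for A and the Euclidean norms standing in for the norms of P and H, merely illustrates how the operator norm (2.11) and the worst-case direction are read off from the singular system; in the PDE setting, A is never formed explicitly, see Section 3.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))        # stand-in for Pi K^{-1} B  (H = R^4, P = R^6)

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

worst_direction = Vt[0]                # right singular vector v_1 (unit norm)
print("operator norm (2.11):", sigma[0])
print("first-order change  :", A @ worst_direction)   # equals sigma_1 * u_1
print("check               :", np.allclose(A @ worst_direction, sigma[0] * U[:, 0]))
```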
Proposition 2.11 shows that the question of greatest impact of arbitrary perturbations on the observed quantity is answered by the singular value decomposition (SVD)
of A. It is well known that SVD is closely related to principal components analysis
(PCA) in statistics and image processing [8], and proper orthogonal decomposition
(POD) in dynamical systems, compare [13, 18]. To our knowledge, however, this technique has not been exploited for the quantitative stability analysis of optimization
problems.
In the following section we focus on an efficient algorithm for the numerical computation of the largest singular values and left and right singular vectors for a discretized
version of problem (2.1).
3. Numerical Stability Analysis
In this section, we propose an efficient algorithm for the numerical computation of
the singular system for a discretized (matrix) version of ΠK−1 B. The convergence of
the singular system of the discretized problem to the singular system of the continuous
problem will be discussed elsewhere. In practice, it will be sufficient to compute only
a partial SVD, starting with the largest singular value, down to a certain threshold,
in order to collect the perturbation directions of greatest impact with respect to the
observed quantity. The method we propose makes use of existing standard software
which iteratively approximates the extreme eigenpairs of non-symmetric matrices, and
it will be efficient in the following sense: It is unnecessary to assemble the (discretized)
matrix ΠK−1 B, which is prohibitive for high-dimensional parameter and observation
spaces. Only matrix–vector products with K−1 B are required, i.e., the solution of
sensitivity problems (2.8), and the inexpensive application of the observation operator
Π. In particular, we avoid the computation of certain Cholesky factors which relate the
Euclidean norms of coordinate vectors and the function space norms of the functions
represented by them, see below.
We discretize problem (2.1) by a Galerkin procedure, e.g., the finite element or
wavelet method. To this end, we introduce finite-dimensional subspaces Yh ⊂ Y ,
Uh ⊂ U and Zh ⊂ Z, which inherit the norms from the larger spaces. The discretized
problem reads
  miny,u f(y, u, p)   subject to e(y, u, p)(ϕ) = 0 for all ϕ ∈ Zh,   (3.1)
where (y, u) ∈ Yh × Uh . In the general case of an infinite-dimensional parameter space,
we also choose a finite-dimensional subspace Ph ⊂ P . Should any of the spaces be
finite-dimensional in the first place, we leave it unchanged by discretization.
Suppose that for the given parameter p0 ∈ Ph , a critical point for the discretized
problem has been computed by a suitable method, for instance, by sequential quadratic
programming (SQP) methods [12, 15]. That is, (yh , uh , λh ) ∈ Yh × Uh × Zh satisfies
the discretized optimality system, compare (2.3)–(2.5):
fy (yh , uh , p0 )(δyh ) + hλh , ey (yh , uh , p0 )(δyh )i = 0 for all δyh ∈ Yh
(3.2)
fu (yh , uh , p0 )(δuh ) + hλh , eu (yh , uh , p0 )(δuh )i = 0 for all δuh ∈ Uh
(3.3)
e(yh , uh , p0 )(δzh ) = 0 for all δzh ∈ Zh .
(3.4)
We consider the discrete analog of the sensitivity system (2.8), i.e.,
  ⟨Kh (yh, uh, λh)⊤, (δyh, δuh, δzh)⊤⟩ = ⟨Bh ph, (δyh, δuh, δzh)⊤⟩   for all (δyh, δuh, δzh) ∈ Yh × Uh × Zh,   (3.5)
where Kh and B h are defined as before in (2.6) and (2.8), evaluated at the critical point
(yh , uh , λh ). The perturbation direction ph is taken from the discretized parameter
space Ph .
Assumption 3.1. Suppose that the critical point (yh , uh , λh ) is sufficiently close to the
local solution of the continuous problem (y0 , u0 , λ0 ), such that second-order sufficient
conditions hold for the discretized problem. That is, ey (yh , uh , p0 ) maps Yh onto Zh ,
and there exists ρ′ > 0 such that Lxx (yh , uh , λh , p0 )(x, x) ≥ ρ′ kxk2Y ×U for all x ∈
Yh × Uh satisfying hex (yh , uh , p0 )x, ϕi = 0 for all ϕ ∈ Zh .
Under Assumption 3.1, the KKT operator Kh at the discrete solution is invertible
and equation (3.5) gives rise to a linear map
(Kh )−1 B h : Ph → Yh × Uh × Zh
which acts between finite-dimensional spaces and thus is automatically bounded.
There is no need to discretize the observation space H since ΠK−1 B, restricted to
Ph , has finite-dimensional range. Nevertheless, we define for convenience the subspace
of H,
  Rh = range of Πh(Kh)−1Bh considered as a map Ph → H,
where Πh = q′(yh, uh, λh), compare (2.10).
We recall that it is our goal to calculate the portion of the singular system for
Πh (Kh )−1 B h : Ph → Rh which belongs to the largest singular values. At this point,
we introduce a basis for the discretized parameter space Ph , say
Ph = span {ϕ1 , . . . , ϕm }.
Likewise, we define a space Hh by
Hh := span {ψ1 , . . . , ψn } such that Hh ⊃ Rh .
Both the systems {ϕi } and {ψj } are assumed linearly independent without loss of
generality. As the range space Rh is usually not known exactly, we allow the functions
ψj to span a larger space Hh . For instance, in case of the state observation operator
Πh (y h , uh , λh ) = y h , we may choose {ψj }nj=1 to be identical to the finite element basis
of the state space Yh , which certainly contains the range space Rh .
For the application of numerical procedures, we need to switch to a coordinate
representation of the elements of the discretized parameter and observation spaces Ph
and Hh . Note that a function p ∈ Ph can be identified with its coordinate vector
p = (p1 , . . . , pm )⊤ with respect to the given basis. In other words, Rm and Ph
are isomorphic, and the isomorphism and its inverse are given by the expansion and
coordinate maps
  EP : Rm ∋ p ↦ ∑i=1,…,m pi ϕi ∈ Ph,   CP = EP−1 : Ph → Rm.
We also introduce the mass matrix associated to the chosen basis of Ph,
  MP = (mij)i,j=1,…,m,   mij = (ϕi, ϕj)P.
In case of a discretization by orthogonal wavelets, MP is the identity matrix, while in
the finite element case, MP is a sparse symmetric positive definite matrix. In any case,
we have the following relation between the Euclidean norm of the coordinate vector p
and the norm of the element p ∈ Ph represented by it:
  ‖p‖2P = p⊤MP p = ‖MP^{1/2} p‖22,
where MP^{1/2} is the Cholesky factor of MP = MP^{1/2⊤} MP^{1/2}, and ‖·‖2 denotes the Euclidean norm of vectors in Rm or Rn. Similarly as above, we define expansion and coordinate maps EH : Rn → Hh and CH = EH−1 and the mass matrix
  MH = (mij)i,j=1,…,n,   mij = (ψi, ψj)H
to obtain
  ‖h‖2H = h⊤MH h = ‖MH^{1/2} h‖22
for an element h = ∑j=1,…,n hj ψj ∈ Hh with coordinate vector h = (h1, . . . , hn)⊤.
Any numerical procedure which solves the sensitivity problem (3.5) and applies
the observation operator Πh does not directly implement the operator Πh (Kh )−1 B h .
Rather, it realizes its representation in the coordinate systems given by the bases of
Ph and Hh , i.e.,
Ah := CH Πh (Kh )−1 B h EP ∈ Rn×m .
As mentioned earlier, the proposed method will employ matrix-vector products with
Ah . Every matrix-vector product requires the solution of a discretized sensitivity
equation (3.5) followed by the application of the observation operator.
Note that there is a discrepancy in the operator Ah being given in terms of coordinate vectors and the requirement that the SVD should respect the norms of the spaces
Ph and Hh . One way to overcome this discrepancy is to exchange the Euclidean scalar
products in the SVD routine at hand by scalar products with respect to the mass
matrices MP and MH, respectively. In the sequel, we describe an alternative approach
based on iterative eigen decomposition software, without the need of modifying any
scalar products.
By the relations between coordinate vectors and functions, we have
  ‖Πh(Kh)−1Bh‖L(Ph,Hh) = sup_{ph∈Ph\{0}} ‖Πh(Kh)−1Bh ph‖H / ‖ph‖P
    = sup_{p∈Rm\{0}} ‖Πh(Kh)−1Bh EP p‖H / ‖EP p‖P = sup_{p∈Rm\{0}} ‖EH Ah p‖H / ‖MP^{1/2} p‖2
    = sup_{p∈Rm\{0}} ‖MH^{1/2} Ah p‖2 / ‖MP^{1/2} p‖2 = sup_{p′∈Rm\{0}} ‖MH^{1/2} Ah MP^{−1/2} p′‖2 / ‖p′‖2.   (3.6)
The last manipulation is a coordinate transformation in Ph, and MP^{−1/2} denotes the inverse of the Cholesky factor of MP. This transformation shows that a finite-dimensional SVD procedure which employs the standard Euclidean vector norms in the image and pre-image spaces should target the matrix MH^{1/2} Ah MP^{−1/2}.
Coordinate vectors referring to the new coordinate systems will be indicated by a
prime. We have the relationships
  p′ = MP^{1/2} p   and   ‖p′‖2 = ‖MP^{1/2} p‖2 = ‖p‖P.
Hence the Euclidean norm of the transformed coordinate vector equals the norm of
the function represented by it. The corresponding basis can in principle be obtained
by an orthonormalization procedure with respect to the scalar product in P , starting
from the previously chosen basis {ϕi }. Assembling the mass matrices and forming the
Cholesky factors MH^{1/2} and MP^{1/2}, however, will be too costly in general. Therefore,
we propose the following strategy which avoids the Cholesky factors altogether. It is
based on the following Jordan-Wielandt Lemma, see, e.g., [14, Theorem I.4.2]:
Lemma 3.2. The singular value decomposition of MH^{1/2} Ah MP^{−1/2} is equivalent to the eigen decomposition of the symmetric Jordan-Wielandt matrix
  J = (              0              MH^{1/2} Ah MP^{−1/2}
        MP^{−1/2⊤} Ah⊤ MH^{1/2⊤}              0           ) ∈ R(m+n)×(m+n)
in the following sense: The eigenvalues of J are exactly ±σi, where {σi}i=1,…,min{m,n} are the singular values of MH^{1/2} Ah MP^{−1/2}, plus a suitable number of zeros. The eigenvectors vi′ belonging to the nonnegative eigenvalues σi, i = 1, . . . , min{m, n}, can be partitioned into vi′ = (l′i, r′i)⊤, where r′i ∈ Rm and l′i ∈ Rn. After normalization, r′i and l′i are the right and left singular vectors of MH^{1/2} Ah MP^{−1/2}.
Exchanging the singular value decomposition of MH^{1/2} Ah MP^{−1/2} for an eigen decomposition of the Jordan-Wielandt matrix J does not resolve the issue of forming the Cholesky factors MH^{1/2} and MP^{1/2}. To this end, we apply a similarity transform to J using the similarity matrices
  X = ( MH^{−1/2}      0
            0      MP^{−1/2} ),     X−1 = ( MH^{1/2}      0
                                               0       MP^{1/2} ).
Then the transformed matrix
  XJX−1 = (      0          Ah
            MP−1 Ah⊤ MH      0 )   (3.7)
has the same eigenvalues as J, including the desired singular values of MH^{1/2} Ah MP^{−1/2}.
Lemma 3.3. The transformed matrix has the form
  XJX−1 = (                0                  CH Πh(Kh)−1Bh EP
            CP (Bh)⋆(Kh)−1(Πh)⋆ EH                    0        ),   (3.8)
where (B h )⋆ : Yh × Uh × Zh → Ph and (Πh )⋆ : Hh → Yh × Uh × Zh are the adjoint
operators of B h and Πh , respectively.
Proof. We only need to consider the lower left block. By transposing Ah, we obtain
  Ah⊤ = EP⋆ (Bh)⋆ (Kh)−1 (Πh)⋆ CH⋆
since Kh is symmetric. By definition, the adjoint operator EP⋆ satisfies ⟨EP⋆ ξ, p⟩Rm = ⟨ξ, EP p⟩P for all ξ ∈ Ph and p ∈ Rm. Hence, we obtain
  p⊤(EP⋆ ξ) = ⟨ξ, ∑i=1,…,m pi ϕi⟩P = p⊤MP (CP ξ)
and thus EP⋆ = MP CP. Moreover,
  CH⋆ = (EH−1)⋆ = (EH⋆)−1 = (MH CH)−1 = CH−1 MH−1 = EH MH−1
holds. Consequently,
  MP−1 Ah⊤ MH = CP (Bh)⋆ (Kh)−1 (Πh)⋆ EH
as claimed.
Remark 3.4. Algorithmically, evaluating a matrix-vector product with (3.8) and a given coordinate vector (h, p)⊤ ∈ Rn × Rm amounts to solving two sensitivity problems:
(1) The first problem is (3.5) with the perturbation direction p = EP p ∈ Ph.
(2) For the second problem, the right hand side operator Bh in (3.5) is replaced by (Πh)⋆, and the observation operator Πh is replaced by (Bh)⋆. The direction of evaluation is h = EH h ∈ Hh.
Step (2) requires a modification of the original sensitivity problem (3.5). As an alternative, one may apply the following duality argument to (3.7): The vector MP−1 Ah⊤ MH h is equal to the transpose of h⊤ MH Ah MP−1. In case that the dimension of the parameter space m is small, the inversion of MP and the solution of m sensitivity problems to get Ah MP−1 may be feasible.
Let us denote by wi = (wi(1), wi(2))⊤ the eigenvectors of XJX−1 belonging to the nonnegative eigenvalues σi, i = 1, . . . , min{m, n}. This similarity transformation with X and X−1 does indeed avoid the Cholesky factors of the mass matrices, as will become clear in the sequel.
Recall that the eigenvalues of XJX −1 are ±σi , plus a suitable number of zeros,
where σi are the desired singular values. Hence the largest singular values correspond
to the eigenvalues of largest magnitude, which can be conveniently computed iteratively, e.g., by an implicitly restarted Arnoldi process [19, Ch. 6.4]. Available software
routines include the library ArPack (DNAUPD and DNEUPD), see [9], and Matlab’s
eigs function. In case that the parameter space (or the observation space) is low-dimensional, we may also compute the matrix XJX−1 explicitly, see Sections 4.1 and
4.2, but these cases are not considered typical for our applications.
We now discuss how to recover the desired partial singular value decomposition
from the partial eigen decomposition of XJX −1 . For later reference, we note the
following property of the eigenvectors of (3.7), which is readily verified:
  wi(1)⊤ MH wi(1) = wi(2)⊤ MP wi(2).   (3.9)
Note also that the eigenvectors wi of XJX −1 and vi′ of J are related by wi = Xvi′ .
As the left and right singular vectors of MH^{1/2} Ah MP^{−1/2} are just a partitioning of vi′ according to Lemma 3.2, we get
  ( l′i
    r′i ) = vi′ = X−1 ( wi(1)
                        wi(2) ),
which in turn seems to bring up the Cholesky factors we wish to avoid. However, r′i is
a coordinate vector with respect to an artificial (orthonormal) basis of Ph , which does
not in general coincide with our chosen basis {ϕi }. Going back to this natural basis
and normalizing, we arrive at
  ri = wi(2) / (wi(2)⊤ MP wi(2))^{1/2}.   (3.10)
Now ri is the coordinate representation of the desired i-th right singular vector with
respect to the basis {ϕi }. Due to the normalization, the function represented by ri
has P -norm one.
We also wish to find the coordinate representation l_i of the response of the system A^h, given the perturbation input r_i. As r_i is a multiple of w_i^(2) and thus part of an eigenvector of XJX^{-1}, we infer from (3.7) that A^h maps r_i to a multiple of w_i^(1). We are thus led to define
\[
l_i = \frac{w_i^{(1)}}{\bigl(w_i^{(1)\top} M_H\, w_i^{(1)}\bigr)^{1/2}}. \tag{3.11}
\]
Despite the individual normalizations of w_i^(1) and w_i^(2), l_i and r_i are still related by the same proportionality constant:
A^h r_i = σ_i l_i,   (3.12)
as can be easily verified using (3.9). We have thus proved our main result:
Theorem 3.5. Suppose that σ_i > 0 is an eigenvalue of the matrix XJX^{-1} with eigenvector w_i = (w_i^(1), w_i^(2))^⊤. Let r_i and l_i be given by (3.10) and (3.11), respectively, and let r_i = E_P r_i ∈ P_h and l_i = E_H l_i ∈ H_h be the functions represented by them. Then the following relations are satisfied:
(a) ‖r_i‖_P = ‖l_i‖_H = 1.
(b) The perturbation r_i invokes the first order change σ_i l_i of magnitude σ_i in the observed quantity. In terms of coordinate vectors, A^h r_i = σ_i l_i.
Based on these considerations, we propose to compute the desired singular value decomposition of M_H^{1/2} A^h M_P^{-1/2} by iteratively approximating the extreme eigenvalues and corresponding eigenvectors of XJX^{-1}. This avoids the Cholesky factors of the mass matrices, as desired. We summarize the proposed procedure in Algorithm 1.
Algorithm 1
Given:   discretized spaces Y_h, U_h, Z_h and P_h, H_h,
         a discrete critical point (y_h, u_h, λ_h) satisfying (3.2)–(3.4) for p_0 ∈ P_h,
         a routine evaluating XJX^{-1} (h, p)^⊤ for any given coordinate vector (h, p)^⊤, see Remark 3.4
Desired: a user-defined number s of singular values and perturbation directions (right singular vectors) in coordinate representation, which are of greatest first order impact with respect to the observed quantity
1: Call a routine which iteratively computes the 2s eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_s ≥ 0 ≥ λ_{s+1} ≥ ... ≥ λ_{2s} of largest absolute value and corresponding eigenvectors w_i of XJX^{-1}.
2: Set σ_i := λ_i for i = 1, ..., s.
3: Split w_i into (w_i^(1), w_i^(2)) of lengths n and m, respectively, for i = 1, ..., s.
4: Compute vectors r_i and l_i for i = 1, ..., s according to (3.10) and (3.11).
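The following self-contained sketch (synthetic random data, not the finite element example of Section 4) illustrates steps 1–4 of Algorithm 1 on a small dense test matrix, using the block form established in Lemma 3.3 (top right block A^h, bottom left block M_P^{-1} A^{h⊤} M_H). The computed pairs (σ_i, r_i, l_i) are checked against Theorem 3.5 (b) and against the singular values of M_H^{1/2} A^h M_P^{-1/2}; for a runnable check the mass matrices are taken diagonal.

import numpy as np

rng = np.random.default_rng(0)
n, m, s = 7, 3, 2
A = rng.standard_normal((n, m))                 # stands in for A^h (coordinate matrix)
MH = np.eye(n) + 0.1 * np.diag(rng.random(n))   # SPD (here diagonal) mass matrix in H_h
MP = np.eye(m) + 0.1 * np.diag(rng.random(m))   # SPD (here diagonal) mass matrix in P_h

# Transformed Jordan-Wielandt matrix: top right A^h, bottom left M_P^{-1} A^{h,T} M_H
T = np.block([[np.zeros((n, n)), A],
              [np.linalg.solve(MP, A.T @ MH), np.zeros((m, m))]])

vals, vecs = np.linalg.eig(T)                   # step 1 (here: dense eigen decomposition)
order = np.argsort(-vals.real)
sigmas = vals.real[order[:s]]                   # step 2: the s largest eigenvalues
W = vecs.real[:, order[:s]]

for i in range(s):
    w1, w2 = W[:n, i], W[n:, i]                 # step 3: split the eigenvector
    l = w1 / np.sqrt(w1 @ MH @ w1)              # (3.11)
    r = w2 / np.sqrt(w2 @ MP @ w2)              # (3.10)
    assert np.allclose(A @ r, sigmas[i] * l)    # Theorem 3.5 (b)

# cross-check against the singular values of M_H^{1/2} A^h M_P^{-1/2}
S = np.diag(np.sqrt(np.diag(MH))) @ A @ np.diag(1.0 / np.sqrt(np.diag(MP)))
print(sigmas, np.linalg.svd(S, compute_uv=False)[:s])

The two printed lists should agree, which indicates that the similarity transformation reproduces the singular values without forming Cholesky factors of the mass matrices.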
Remark 3.6. The singular value decomposition of A and Ah relies on the linearity
of the map p ↦ (y, u, λ), which maps a perturbation direction p to the directional
derivative of the optimal solution and adjoint state, compare (2.7)–(2.8). For optimal control problems with pointwise control constraints a(x) ≤ u(x) ≤ b(x) almost
everywhere on the control domain, the derivative need not be linear with respect to
the direction, see [4, 10]. The presence of strict complementarity, however, restores
the linearity. The procedure outlined above carries over to this case, with only minor
modifications of the operators Kh and B h on the so-called active sets, compare also [6].
4. Numerical Examples
We consider as an example the optimal control problem
\[
\text{minimize} \;\; -\frac{1}{4}\int_\Omega y(x)\,dx + \frac{\gamma}{2}\,\|u\|^2_{L^2(C)}
\qquad \text{s.t.} \quad
\begin{cases} -\kappa\,\Delta y = \chi_C\, u & \text{on } \Omega, \\ \kappa\,\dfrac{\partial y}{\partial n} = \alpha\,(y - y_\infty) & \text{on } \partial\Omega. \end{cases} \tag{4.1}
\]
It represents the optimal heating of a room Ω = (−1, 1)2 ⊂ R2 to maximal average
temperature y, subject to quadratic control costs. Heating is achieved through two
radiators on some part of the domain C ⊂ Ω, and the heating power u serves as a
distributed control variable. κ denotes the constant heat diffusivity, while α is the
heat transfer coefficient with the environment. The latter has constant temperature
y∞ . α is taken to be zero at the walls but greater than zero at the two windows, see
Figure 4.1.
Figure 4.1. Layout of the domain and an intermediate finite element mesh with 4225 vertices (degrees of freedom). The plot marks the two radiators (radiator 1, radiator 2) and the two windows (window 1, window 2).
In the sequel, we consider the window heat transfer coefficients as perturbation parameters. As their nominal value, we take
\[
\alpha(x) = \begin{cases} 0 & \text{at the walls,} \\ 1 & \text{at the lower (larger) window \#2,} \\ 2 & \text{at the upper (smaller) window \#1.} \end{cases}
\]
We will explore how the optimal temperature y changes under changes of α. Our example fits in the framework of Section 2 with
\[
f(y, u) = -\frac{1}{4}\int_\Omega y(x)\,dx + \frac{\gamma}{2}\,\|u\|^2_{L^2(C)}, \qquad
e(y, u, p)(\varphi) = \kappa\,(\nabla y, \nabla\varphi)_\Omega - (u, \varphi)_C - (\alpha\,(y - y_\infty), \varphi)_{\partial\Omega}.
\]
Suitable function spaces for the problem are
Y = H^1(Ω),   U = L^2(C),   Z = H^1(Ω),   P = L^2(W_1) × L^2(W_2).
f and e are infinitely differentiable w.r.t. (y, u, p). For any given (y, u, p) ∈ Y × U × P ,
ey (y, u, p) : Y → Z ⋆ is onto and even boundedly invertible. Moreover, the problem is
strictly convex and thus has a unique global solution which satisfies the second-order
condition. The KKT operator is boundedly invertible. As state observation operator,
we will use Π(y, u, λ) = y ∈ H = L^2(Ω). Compactness of A then follows from compactness of the embedding Y ↪ H. Hence the example satisfies the Assumptions 2.3
and 2.8. Note that the parameter enters only in the PDE and not in the objective.
The problem is discretized using standard linear continuous finite elements for the
state and adjoint, and discontinuous piecewise constant elements for the control. In
order to estimate the order of convergence for the singular values, a hierarchy of
uniformly refined triangular meshes is used. An intermediate mesh is shown in Figure 4.1 (right).
Since the problem has a quadratic objective and a linear PDE constraint, its solution
requires the solution of only one linear system involving K. Here and throughout,
systems involving K were solved using the conjugate gradient method applied to the reduced Hessian operator
\[
K_{\mathrm{red}} = \begin{pmatrix} -e_y^{-1} e_u \\ \mathrm{id} \end{pmatrix}^{\!\star}
\begin{pmatrix} L_{yy} & L_{yu} \\ L_{uy} & L_{uu} \end{pmatrix}
\begin{pmatrix} -e_y^{-1} e_u \\ \mathrm{id} \end{pmatrix},
\]
see, e.g., [5,7] for details. The state and adjoint partial differential equations are solved
using a sparse direct solver.
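The matrix-free application of K_red inside a conjugate gradient iteration can be sketched as follows; the dense stand-ins ey, eu, Lyy, Lyu, Luu are hypothetical small matrices chosen for illustration, not the finite element operators of this example.

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(1)
ny, nu = 20, 5
ey = np.eye(ny) + 0.1 * rng.standard_normal((ny, ny))   # stand-in for the discrete e_y
eu = rng.standard_normal((ny, nu))                       # stand-in for the discrete e_u
Lyy = np.eye(ny)                                         # Hessian blocks of the Lagrangian
Lyu = np.zeros((ny, nu))
Luu = 0.005 * np.eye(nu)                                 # gamma times a control mass matrix

def apply_Kred(du):
    dy = -np.linalg.solve(ey, eu @ du)                   # tangent solve: dy = -e_y^{-1} e_u du
    lam = -np.linalg.solve(ey.T, Lyy @ dy + Lyu @ du)    # adjoint solve
    return Lyu.T @ dy + Luu @ du + eu.T @ lam            # assemble K_red du

Kred = LinearOperator((nu, nu), matvec=apply_Kred, dtype=float)
du, info = cg(Kred, rng.standard_normal(nu), atol=1e-12)
assert info == 0                                         # CG converged

Each application of K_red costs one tangent and one adjoint solve, which is exactly the pattern exploited for the sensitivity problems above.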
Figure 4.2 shows the nominal solution (y_h, u_h) in the case
κ = 1,   γ = 0.005,   y_∞ = 0,
C = (−0.8, 0.0) × (0.4, 0.8) ∪ (−0.75, 0.75) × (−0.8, −0.6),
W_1 = (−0.75, 0) × {1},   W_2 = (−0.75, 0.75) × {−1}.
This setup describes the goal to heat up the room to a maximal average temperature
(taking control costs into account) at an environmental temperature of 0◦ C. One
clearly sees how heat is lost through the two windows.
Figure 4.2. Nominal solution: Optimal state (left) and optimal control (right).
In the sequel, we consider three variations of this problem. In every case, the
insulation of the two windows, i.e., the heat transfer coefficient α restricted to the
window areas, serves as a perturbation parameter. In Problem 1, this parameter is
constant for each window and it is a spatial function in Problems 2 and 3. The optimal
temperature y is the basis of the observation in all cases. In Problems 1 and 3, we
observe the temperature at every point. In Problem 2, we consider only the average
temperature throughout the room. Hence, these problems cover all cases where at
least one of the parameter or observation spaces P and H is infinite-dimensional and
high-dimensional after discretization.
All examples are implemented using Matlab’s PDE toolbox. In every case, we
use Matlab’s eigs function with standard tolerances to compute a partial eigen
decomposition of the matrix XJX −1 . For Problems 1 and 2, we assemble this matrix
explicitly according to (3.7). For Problem 3, we provide matrix-vector products with
XJX −1 according to (3.8). Every matrix-vector product comes at the expense of the
solution of two sensitivity problems (3.5), compare Remark 3.4.
4.1. Problem 1: Few Parameters, Large Observation Space. We begin by
considering perturbations of the heat transfer coefficient on each window, i.e.,
p = (α|W1 , α|W2 ) ∈ R2 .
That is, we study the effect of replacing the windows by others with different insulation properties. While the parameter space is only two-dimensional, we consider
an infinite-dimensional observation space and observe the effect of the perturbations
on the overall temperature throughout the room. That is, we have the observation
operator Π(y, u, λ) = y, and the space H is taken as L2 (Ω). Hence the mass matrix
MH in the discrete observation space is given by the L2 (Ω)-inner products of the linear continuous finite element basis on the respective grid. The mass matrix in the
parameter space M_P is chosen as
\[
M_P = \begin{pmatrix} 0.75 & 0 \\ 0 & 1.50 \end{pmatrix}
\]
and it is generated by the L2 -inner product of the constant functions of value one on
W1 and W2 . It thus reflects the lengths of the two windows and allows a comparison
with Problem 3 later on.
Since the matrix Ah ∈ Rn×2 has only two columns, it can be formed explicitly by
solving only two sensitivity systems. From there, we easily set up XJX −1 according
to (3.7) to avoid Cholesky factors of mass matrices, and perform an iterative partial
eigen decomposition. Note that since Ah has only two nonzero singular values, only
four eigenvalues of XJX −1 are needed.
Table 4.1 shows the convergence of the singular values as the mesh is uniformly
refined. In addition, the number of degrees of freedom of each finite element mesh
and the total number of variables in the optimization problem is shown. The last
column lists the number of QP steps, i.e., solutions of (3.5) with matrix Kh , which
were necessary to obtain convergence of the (partial) eigen decomposition. For this
problem, the number of QP solves is always two since Ah ∈ Rn×2 was assembled
explicitly. Note also that our original problem (4.1) is linear-quadratic, hence finding
the nominal solution requires only one solution with Kh and computing the singular
values and vectors is twice as expensive.
# dof     # var     σ1        rate    σ2       rate    # A^h p
81        168       5.0572            1.1886           2
289       626       11.8804   0.93    2.2487   0.81    2
1 089     2 394     13.3803   0.32    2.5896   0.40    2
4 225     9 530     16.6974   1.15    3.2168   1.29    2
16 641    38 136    18.8838   2.31    3.5678   2.38    2
66 049    151 898   19.3367   2.48    3.6283   1.87    2
263 169   605 946   19.4352           3.6510           2

Table 4.1. Degrees of freedom and total number of discrete state, control and adjoint variables on a hierarchy of finite element grids. Singular values and estimated rate of convergence w.r.t. grid size h for Problem 1. Number of sensitivity problems (3.5) solved.
In this and the subsequent problems, we observed monotone convergence of the computed singular values. The estimated rate of convergence given in the tables was calculated according to
\[
\frac{\log\bigl(|\sigma_h - \sigma_*| / |\sigma_{2h} - \sigma_*|\bigr)}{\log(1/2)},
\]
where σ_* is the respective singular value on the finest mesh, and σ_h and σ_{2h} are the same value on two neighboring intermediate meshes. The exact rate of convergence is difficult to predict from the table and clearly deserves further investigation.
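As a quick check, the rate 2.31 reported for σ_1 on the mesh with 16 641 degrees of freedom can be reproduced from the values in Table 4.1:

import math
sigma_star = 19.4352     # sigma_1 on the finest mesh (263 169 dof)
sigma_h    = 18.8838     # sigma_1 on the mesh with 16 641 dof
sigma_2h   = 16.6974     # sigma_1 on the next coarser mesh (4 225 dof)
rate = math.log(abs(sigma_h - sigma_star) / abs(sigma_2h - sigma_star)) / math.log(0.5)
print(round(rate, 2))    # 2.31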
On the finest mesh, we obtain as singular values and right singular vectors
\[
\sigma_1 = 19.3367, \quad r_1 = \begin{pmatrix} -0.5103 \\ -0.7324 \end{pmatrix}, \qquad
\sigma_2 = 3.6283, \quad r_2 = \begin{pmatrix} -1.0358 \\ 0.3609 \end{pmatrix}.
\]
Recall that r_1 and r_2 represent piecewise constant functions r_1 and r_2 on W_1 ∪ W_2 whose values on W_1 and W_2 are given by the upper and lower entries, respectively, see Figure 4.3 (right). The corresponding left singular vectors are shown in Figure 4.3 (left).

Figure 4.3. Problem 1: First and second left singular vectors l_1 and l_2 (left) and first and second right singular vectors (right), lower window (red) and upper window (blue).

These results can be interpreted as follows: Of all perturbations of
unit size (with respect to the scalar product given by MP ), the nominal state (from
Figure 4.2) is perturbed most (in the L2 (Ω)-norm) when both windows are better insulated with the ratio of the improvement given by the ratio of the entries of the right
singular vector r1 . The effect of this perturbation direction on the observed quantity
(the optimal state) is represented by the first left singular vector l1 = EH l1 , multiplied
by σ1 , compare (3.12). Due to the improved insulation at both windows, l1 is positive,
i.e., the optimal temperature increases throughout the domain Ω when p changes from
p0 to p0 + r1 . Since the second entry in r1 is greater in magnitude, the effect on the
optimal temperature is more pronounced near the lower window, see Figure 4.3 (top
left).
Since the parameter space is only two-dimensional, the second right singular vector r2 represents the unit perturbation of lowest impact on the optimal state. Figure 4.3 (bottom left) shows the corresponding second left singular vector. Note that
‖l_1‖_{L^2(Ω)} = ‖l_2‖_{L^2(Ω)} = 1 and that l_1 and l_2 are perpendicular with respect to the
inner product of L2 (Ω). The singular value σ2 shows that any given perturbation
of the heat transfer coefficients of unit size has at least an impact of 3.6283 on the
optimal state in the L2 (Ω)-norm, to first order. This should be viewed in relation to
the L2 (Ω)-norm of the nominal solution, which is 48.3982.
The data obtained from the singular value decomposition can be used to decide
whether the observed quantity depending on the optimal solution is sufficiently stable
with respect to perturbations. This decision should take into account the expected
range of parameter variations and the tolerable variations in the observed quantity.
4.2. Problem 2: Many Parameters, Small Observation Space. In contrast to
the previous situation, we now consider the window heat transfer coefficients to be
spatially variable. That is, we have parameters
p = (α(x)|W1 , α(x)|W2 ) ∈ L2 (W1 ) × L2 (W2 ).
As an observed quantity, we choose the scalar value of the temperature averaged over the entire room. Hence the observation space is H = R and
\[
\Pi(y, u, \lambda) = \frac{1}{4}\int_\Omega y(x)\,dx.
\]
Such a scalar output quantity is often called a quantity of interest. The weight in
the observation space is MH = 1 and the mass matrix in the parameter space is the
boundary mass matrix on W1 ∪ W2 with respect to piecewise constant functions on
the boundary of the respective finite element grid.
The matrix Ah ∈ R1×m now has only one row. It is thus strongly advisable to
compute its transpose which requires only one solution of a linear system with Kh . This
transposition technique was already used in [6] to compute derivatives of a quantity of
interest depending on an optimal solution in the presence of perturbations. As above,
we show in Table 4.2 the convergence behavior of the only non-zero singular value of
Ah .
# dof     # var     σ1       rate    # A^h p
81        168       2.5381           1
289       626       5.9245   0.93    1
1 089     2 394     6.6786   0.32    1
4 225     9 530     8.3316   1.15    1
16 641    38 136    9.4157   2.31    1
66 049    151 898   9.6393   2.47    1
263 169   605 946   9.6887           1

Table 4.2. Problem 2: Singular value and estimated rate of convergence w.r.t. grid size h. Number of sensitivity problems (3.5) solved.
Figure 4.4 (right) displays the right singular vector r1 = EP r1 belonging to this
problem. From this we infer that the largest increase in average temperature is
achieved when the insulation at the larger (lower) window is improved to a higher
degree than that of the smaller (upper) window, although the nominal insulation of
the larger (lower) window is already twice as good. It is interesting to note that for
the maximum impact on the average temperature, the insulation should be improved
primarily near the edges of the windows. Again, the sensitivity y of the optimal
state belonging to the perturbation of greatest impact is positive throughout (Figure 4.4 (left)).
Figure 4.4. Problem 2: Parametric sensitivity y (left) of the optimal state belonging to the first right singular vector r_1 (right). Lower window (red) and upper window (blue).
4.3. Problem 3: Many Parameters, Large Observation Space. The final example features both large parameter and observation spaces, so that assembling the
matrices Ah and XJX −1 as in the previous examples is prohibitive. Instead, we supply only matrix-vector products of XJX −1 to the iterative eigen solver. This situation
is considered typical for many applications.
The parameter space is chosen as in Problem 2, and the observation is the temperature on all of Ω as in Problem 1. Table 4.3 shows again the convergence of the
singular values as the mesh is uniformly refined.
# dof     # var     σ1        rate    σ2       rate    # A^h p
81        168       5.0771            1.1947           40
289       626       11.9262   0.93    2.3426   0.83    68
1 089     2 394     13.4326   0.32    2.6603   0.35    68
4 225     9 530     16.7587   1.15    3.3093   1.20    68
16 641    38 136    18.9500   2.31    3.7092   2.31    68
66 049    151 898   19.4037   2.48    3.7896   2.31    68
263 169   605 946   19.5024           3.8099           68

Table 4.3. Problem 3: Singular values and estimated rate of convergence w.r.t. grid size h. Number of sensitivity problems (3.5) solved.
Note that the parameter space of Problem 1 (two constant heat transfer coefficients)
is a two-dimensional subspace of the current high-dimensional parameter space. Hence,
we expect the singular values for Problem 3 to be greater than those for Problem 1.
This is confirmed by comparing Tables 4.1 and 4.3. However, the first two singular
values σ1 and σ2 are only slightly larger than in Problem 1. In particular, the augmentation of the parameter space does not lead to additional perturbation directions
of an impact comparable to the impact of r1 . Comparing the right singular vector
r1 , Figure 4.5 (top right), with the right singular vector r1 = (−0.5103, −0.7324)⊤
from Problem 1, representing a piecewise constant function, we infer that the stronger
insulation near the edges of the windows does not significantly increase the impact on
the optimal state.
Figure 4.5. Problem 3: First and second left singular vectors (left) and first and second right singular vectors (right), lower window (red) and upper window (blue).

We also observe that the first right singular vector r_1 (Figure 4.5 (top right)) describing the perturbation of largest impact on the optimal state is very similar to
the right singular vector in Problem 2, see Figure 4.4 (right), although the observed
quantities are different in Problems 2 and 3.
Finally, we present in Figure 4.6 the distribution of the largest 20 singular values.
Their fast decay shows that only a few singular values and the corresponding right singular vectors capture the practically significant perturbation directions of high impact
for the problem at hand.
Figure 4.6. Problem 3: First 20 singular values.
5. Conclusion
In this paper, we presented an approach for the quantitative stability analysis of
local optimal solutions in PDE-constrained optimization. The singular value decomposition of a compact linear operator was used in order to determine the perturbation
direction of greatest impact on an observed quantity which in turn depends on the
solution. After a Galerkin discretization, mass matrices and their Cholesky factors
naturally appear in the singular value decomposition of the discretized operator. In
order to avoid forming these Cholesky factors, we described a similarity transformation
of the Jordan-Wielandt matrix. A matrix-vector multiplication with this transformed
matrix amounts to the solution of two sensitivity problems. The desired (partial)
singular value decomposition can be obtained using standard iterative eigen decomposition software, e.g., implicitly restarted Arnoldi methods.
We presented a number of numerical examples to validate the proposed method and
to explain the results in the context of a concrete problem. The order of convergence
of the singular values deserves further investigation. We observed that the numerical
effort even for the computation of few singular values may be large compared to the
solution of the nominal problem itself. In order to accelerate the computation of the
desired singular values and vectors, however, it may be sufficient to compute them on
a coarser grid. In addition, parallel implementations of eigen solvers can be used.
References
[1] R. Adams and J. Fournier. Sobolev Spaces. Academic Press, New York, second edition, 2003.
[2] K. Deimling. Nonlinear Functional Analysis. Springer, Berlin, 1985.
[3] H. Engl, M. Hanke, and A. Neubauer. Regularization of Inverse Problems. Kluwer Academic
Publishers, Boston, 1996.
[4] R. Griesse. Parametric sensitivity analysis in optimal control of a reaction-diffusion system—
Part I: Solution differentiability. Numerical Functional Analysis and Optimization, 25(1–2):93–
117, 2004.
[5] R. Griesse. Parametric sensitivity analysis in optimal control of a reaction-diffusion system—
Part II: Practical methods and examples. Optimization Methods and Software, 19(2):217–242,
2004.
[6] R. Griesse and B. Vexler. Numerical sensitivity analysis for the quantity of interest in PDE-constrained optimization. SIAM Journal on Scientific Computing, 29(1):22–48, 2007.
[7] M. Hinze and K. Kunisch. Second order methods for optimal control of time-dependent fluid
flow. SIAM Journal on Control and Optimization, 40(3):925–946, 2001.
[8] I. Jolliffe. Principal Component Analysis. Springer, New York, second edition, 2002.
[9] R. B. Lehoucq, D. C. Sorensen, and C. Yang. ARPACK Users' Guide: Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. Software, Environments, and
Tools. SIAM, Philadelphia, 1998.
[10] K. Malanowski. Sensitivity analysis for parametric optimal control of semilinear parabolic equations. Journal of Convex Analysis, 9(2):543–561, 2002.
[11] H. Maurer and J. Zowe. First and second order necessary and sufficient optimality conditions for
infinite-dimensional programming problems. Mathematical Programming, 16:98–110, 1979.
[12] J. Nocedal and S. Wright. Numerical Optimization. Springer, New York, 1999.
[13] L. Sirovich. Turbulence and the dynamics of coherent structures. I. Quarterly of Applied Mathematics, 45(3):561–571, 1987.
[14] G. Stewart and J.-G. Sun. Matrix Perturbation Theory. Academic Press, New York, 1990.
[15] F. Tröltzsch. On the Lagrange-Newton-SQP method for the optimal control of semilinear parabolic equations. SIAM Journal on Control and Optimization, 38(1):294–312, 1999.
[16] F. Tröltzsch. Optimale Steuerung partieller Differentialgleichungen. Vieweg, Wiesbaden, 2005.
[17] F. Tröltzsch and D. Wachsmuth. Second-order sufficient optimality conditions for the optimal
control of Navier-Stokes equations. ESAIM: Control, Optimisation and Calculus of Variations,
12(1):93–119, 2006.
[18] S. Volkwein. Interpretation of proper orthogonal decomposition as singular value decomposition
and HJB-based feedback design. In Proceedings of the Sixteenth International Symposium on
Mathematical Theory of Networks and Systems (MTNS), Leuven, Belgium, 2004.
[19] D. Watkins. Fundamentals of Matrix Computations. Wiley-Interscience, New York, 2002.
8. Numerical Sensitivity Analysis for the Quantity of Interest in
PDE-Constrained Optimization
R. Griesse and B. Vexler: Numerical Sensitivity Analysis for the Quantity of Interest in
PDE-Constrained Optimization, SIAM Journal on Scientific Computing, 29(1), p.22–
48, 2007
As in the previous section, we consider the situation of an observed quantity which
depends on the solution of an optimization problem subject to perturbations. This
quantity of interest I, or output functional, is real-valued and may differ from the
cost functional J used during optimization.
Using the notation of the previous section, we introduce the Lagrangian L(y, u, p; π) = J(y, u, π) + ⟨p, e(y, u; π)⟩ and the reduced cost functional
j(π) = (J ∘ Ξ)(π).
We recall that the first-order directional derivatives of j with respect to perturbations satisfy
Dj(π_0; δπ) = D(J ∘ Ξ)(π_0; δπ) = L_π(Ξ(π_0)) δπ.   (8.1)
This is due to the fact that the partial derivatives L_y, L_u and L_p vanish at Ξ(π_0). Moreover, (8.1) continues to hold in the presence of control constraints, see, e.g., Malanowski [2002], or Proposition 3.16 of the paper.
The situation is different for a quantity of interest I ≠ J, since the evaluation of its
directional derivative D(I ◦ Ξ)(π0 ; δπ) requires the solution of one sensitivity problem
to find DΞ(π0 ; δπ), see Proposition 3.5 and Theorem 3.6 below.
The focus of the present paper is on the efficient evaluation of the gradient and Hessian
of the reduced quantity of interest
i(π) = (I ◦ Ξ)(π).
It extends results from Becker and Vexler [2005], where the gradient case was investigated. Naturally, the gradient and Hessian information of i and j can be used,
for instance, to predict the value of these quantities for perturbed problem settings
(compare Section 6).
From the discussion above we conclude that the straightforward evaluation of the gradient i′(π_0) requires the solution of dim P sensitivity problems (one for each direction), where dim P is the dimension of the parameter space. Using a duality (transposition) argument, we are able to reduce this effort to only one sensitivity problem. With the same idea, the evaluation of the Hessian can be accomplished by solving 1 + dim P sensitivity problems, rather than O((dim P)^2) in a straightforward approach. The duality trick is outlined in Section 3.1 of the paper, and it is elaborated on in Sections 3.2 and 3.3 for problems without and with control constraints, respectively. In the control constrained case, we need to assume strict complementarity, and, in order for the second derivative of Ξ(π) to exist, we need to make the additional assumption that the active sets do not change when moving from the nominal to the perturbed problem, see the text before Theorem 3.19.
The paper concludes with the presentation of an algorithm for the evaluation of the
gradient and Hessian of the reduced cost functional and quantity of interest j(π)
and i(π), and with two numerical examples which verify the proposed method. The
first example is a parameter estimation problem for the incompressible Navier-Stokes
equation, and the quantity of interest is either the parameter to be identified, or the
drag of a cylinder located in the flow. The scalar perturbation parameter enters one
of the boundary conditions, and no inequality constraints are present. The second
example is the control constrained boundary optimal control problem for the reaction-diffusion system considered in Section 4 of this thesis. The quantity of interest is the
total amount of control action over time, and the infinite-dimensional parameter is
one of the initial states of the system.
NUMERICAL SENSITIVITY ANALYSIS FOR THE QUANTITY OF
INTEREST IN PDE-CONSTRAINED OPTIMIZATION
ROLAND GRIESSE AND BORIS VEXLER
Abstract. In this paper, we consider the efficient computation of derivatives of
a functional (the quantity of interest) which depends on the solution of a PDE-constrained optimization problem with inequality constraints and which may be different from the cost functional. The optimization problem is subject to perturbations in the data. We derive conditions under which the quantity of interest possesses first and second order derivatives with respect to the perturbation
parameters. An algorithm for the efficient evaluation of these derivatives is developed, with considerable savings over a direct approach, especially in the case
of high-dimensional parameter spaces. The computational cost is shown to be
small compared to that of the overall optimization algorithm. Numerical experiments involving a parameter identification problem for Navier-Stokes flow and
an optimal control problem for a reaction-diffusion system are presented which
demonstrate the efficiency of the method.
1. Introduction
In this paper we consider PDE-constrained optimization problems with inequality
constraints. The optimization problems are formulated in a general setting including
optimal control as well as parameter identification problems. The problems are subject
to perturbation in the data. We suppose to be given a quantity of interest (output
functional), which depends on both the state and the control variables and which may
be different from the cost functional used during the optimization.
The quantity of interest is shown to possess first and, under tighter assumptions,
second order derivatives with respect to the perturbation parameters. In the presence
of control constraints, strict complementarity and compactness of certain derivatives
of the state equation are assumed; for second order derivatives, stability of the active
set is required in addition. The precise conditions are given in Section 3. The main
contribution of this paper is to devise an efficient algorithm to evaluate these sensitivity
derivatives, which offers considerable savings over a direct approach, especially in
the case of high-dimensional parameter spaces. We show that the derivatives of the
quantity of interest can be computed with only little additional numerical effort in
comparison to the corresponding derivatives of the cost functional. Moreover, the
computational cost for the evaluation of the gradient of the quantity of interest is
independent of the dimension of the parameter space and low compared to that of the
overall optimization algorithm. The cost to evaluate the Hessian grows linearly with
the dimension of the parameter space. We refer to Table 3.1 for details.
The parametric derivatives of the quantity of interest offer a significant amount of
additional information on top of an optimal solution. The derivative information can
be used to assess the stability of an optimal solution, or to compute a Taylor expansion
which allows the fast prediction of the perturbed value of the quantity of interest in a
neighborhood of a reference parameter.
We note that a quantity of interest different from the cost functional is often natural.
For instance, an optimization problem in fluid flow may aim at minimizing the drag
of a given body, e.g., by adjusting the boundary conditions. The quantity of interest,
however, may be the lift coefficient of the optimal configuration. We also mention the
applicability of our results to bi-level optimization problems where the outer variable is
the "perturbation" parameter and the outer objective is the output functional, whose
derivatives are needed to employ efficient optimization algorithms.
The necessity to compute higher order derivatives may impose possible limitations
to the applicability of the methods presented in this paper. Second order derivatives
of the cost functional and the PDE constraint are required to evaluate the gradient
of the quantity of interest, and third order derivatives are required to evaluate the
Hessian.
Let us put our work into perspective. The existence of first and second order
sensitivity derivatives of the objective function (cost functional) in optimal control of
PDEs with control constraints has been proved in [7, 17]. Moreover, [8] addresses the
numerical computation of these derivatives. Recently, the computation of the gradient
of the quantity of interest in the absence of inequality constraints has been discussed
in [3].
Problem Setting. We consider the PDE-constrained optimization problem in the
following abstract form: The state variable u in an appropriate Hilbert space V with
scalar product (·, ·)V is determined by a partial differential equation (state equation)
in weak form:
a(u, q, p)(φ) = f (φ) ∀φ ∈ V,
(1.1)
where q denotes the control, or more generally, design variable in the Hilbert space
Q = L2 (ω) with the standard scalar product (·, ·). Typically, ω is a subset of the
computational domain Ω or a subset of its boundary ∂Ω. In case of finite dimensional
controls we set Q = Rn and identify this space with L2 (ω) where ω = {1, 2, . . . , n} to
keep the notation consistent. The parameter p from a normed linear space P describes
the perturbations of the data.
For fixed p ∈ P, the semi-linear form a(·, ·, p)(·) is defined on the Hilbert space V × Q × V. Semi-linear forms are written with two parentheses, the first one refers to the nonlinear arguments, whereas the second one embraces all linear arguments. The partial derivatives of the semi-linear form a(·, ·, p)(·) are denoted by a′_u(·, ·, p)(·, ·), a′_q(·, ·, p)(·, ·) etc. The linear functional f ∈ V′ represents the right hand side of the state equation, where V′ denotes the dual space of V. For the cost functional (objective functional) we assume the form
\[
J(u, p) + \frac{\alpha}{2}\,\|q - \bar q\|^2_Q, \tag{1.2}
\]
which is typical in PDE-constrained optimization problems. Here, α > 0 is a regularization parameter and q̄ ∈ Q is a reference control. The functional J : V × P → R is also subject to perturbation. It is possible to extend our analysis to more general cost functionals than (1.2). In particular, only notational changes are necessary if J contains linear terms in q, and if α and q̄ also depend on the perturbation parameter. However, full generality of the cost functional comes at the expense of additional assumptions which would unnecessarily complicate the discussion.
In order to cover additional control constraints we introduce a nonempty closed
convex subset Qad ⊂ Q by:
Qad = {q ∈ Q | b− (x) ≤ q(x) ≤ b+ (x) a.e. on ω},
with bounds b− ≤ b+ ∈ Q. In the case of finite dimensional controls the inequality
b− ≤ q ≤ b+ is meant to hold componentwise.
The problem under consideration is to
minimize (1.2) over Qad × V
subject to the state equation (1.1)
(OP(p))
for fixed p ∈ P. We assume that in a neighbourhood of a reference parameter p0 ,
there exist functions u = U (p) and q = Q(p), which map the perturbation parameter
p to a local solution (u, q) of the problem (OP(p)). In Section 3, we give sufficient
conditions ensuring the existence and differentiability of these functions. Our results
complement previous findings in [7, 10, 17].
The quantity of interest is denoted by a functional
I : V × Q × P → R.
(1.3)
This gives rise to the definition of the reduced quantity of interest i : P → R,
i(p) = I(U (p), Q(p), p).
(1.4)
Likewise, we denote by j : P → R the reduced cost functional:
\[
j(p) = J(U(p), p) + \frac{\alpha}{2}\,\|Q(p) - \bar q\|^2_Q. \tag{1.5}
\]
As stated above, the main contribution of this paper is to devise an efficient algorithm to evaluate the first and second derivatives of the reduced quantity of interest i(p).
The outline of the paper is as follows: In the next section we specify the first order necessary optimality conditions for the problem under consideration. We recall
a primal-dual active set method for its solution. The core step of this method is
described to some detail since it is also used for the problems arising during the sensitivity computation. In Section 3 we use duality arguments for the efficient evaluation
of the first and second order sensitivities of the quantity of interest with respect to
perturbation parameters. Throughout, we compare the standard sensitivity analysis for the reduced cost functional j(p) with our analysis for the reduced quantity of
interest i(p). In the last section we discuss two numerical examples illustrating our approach. The first example deals with a parameter identification problem for a channel
flow described by the incompressible Navier-Stokes equations. In the second example we consider the optimal control of time-dependent three-species reaction-diffusion
equations under control constraints.
2. Optimization algorithm
In this section we recall the first order necessary conditions for the problem (OP(p))
and describe the optimization algorithm with active set strategy which we use in our
numerical examples. In particular, we specify the Newton step taking into account the
active sets since the sensitivity problems arising in Section 3 are solved by the same
technique.
Throughout the paper we make the following assumption:
Assumption 2.1.
(1) Let a(·, ·, ·)(·) be three times continuously differentiable with respect to (u, q, p).
(2) Let J(·, ·) be three times continuously differentiable with respect to (u, p).
(3) Let I(·, ·, ·) be twice continuously differentiable with respect to (u, q, p).
In order to establish the optimality system, we introduce the Lagrangian L : V × Q × V × P → R as follows:
\[
L(u, q, z, p) = J(u, p) + \frac{\alpha}{2}\,\|q - \bar q\|^2_Q + f(z) - a(u, q, p)(z), \tag{2.1}
\]
where z ∈ V denotes the adjoint state. The first order necessary conditions for the problem (OP(p)) read:
L′_u(u, q, z, p)(δu) = 0   ∀δu ∈ V,   (2.2)
L′_q(u, q, z, p)(δq − q) ≥ 0   ∀δq ∈ Q_ad,   (2.3)
L′_z(u, q, z, p)(δz) = 0   ∀δz ∈ V.   (2.4)
They can be explicitly rewritten as follows:
J′_u(u, p)(δu) − a′_u(u, q, p)(δu, z) = 0   ∀δu ∈ V,   (2.5)
α(q − q̄, δq − q) − a′_q(u, q, p)(δq − q, z) ≥ 0   ∀δq ∈ Q_ad,   (2.6)
f(δz) − a(u, q, p)(δz) = 0   ∀δz ∈ V.   (2.7)
For given u, q, z, p, we introduce an additional Lagrange multiplier µ ∈ L^2(ω) by the following identification:
(µ, δq) := −L′_q(u, q, z, p)(δq) = −α(q − q̄, δq) + a′_q(u, q, p)(δq, z)   ∀δq ∈ L^2(ω).
The variational inequality (2.6) is known to be equivalent to the following pointwise conditions almost everywhere on ω:
q(x) = b_−(x)  ⇒  µ ≤ 0,   (2.8)
q(x) = b_+(x)  ⇒  µ ≥ 0,   (2.9)
b_−(x) < q(x) < b_+(x)  ⇒  µ = 0.   (2.10)
In addition to the necessary conditions above, in the following lemma we recall
second order sufficient optimality conditions:
Lemma 2.2 (Sufficient optimality conditions). Let x = (u, q, z) satisfy the first order necessary conditions (2.2)–(2.4) of (OP(p)). Moreover, let a′_u(u, q, p) : V → V′ be surjective. If there exists ρ > 0 such that
\[
\begin{pmatrix} \delta u \\ \delta q \end{pmatrix}^{\!\top}
\begin{pmatrix} L''_{uu}(x, p) & L''_{uq}(x, p) \\ L''_{qu}(x, p) & L''_{qq}(x, p) \end{pmatrix}
\begin{pmatrix} \delta u \\ \delta q \end{pmatrix}
\;\ge\; \rho\,\bigl(\|\delta u\|_V^2 + \|\delta q\|_Q^2\bigr)
\]
holds for all (δu, δq) satisfying the linear (tangent) PDE
a′_u(u, q, p)(δu, φ) + a′_q(u, q, p)(δq, φ) = 0   ∀φ ∈ V,
then (u, q) is a strict local optimal solution of (OP(p)).
For the proof we refer to [18].
For the solution of the first order necessary conditions (2.5)–(2.7) for fixed p ∈ P, we
employ a nonlinear primal-dual active set strategy, see [4, 12, 15, 20]. In the following
we sketch the corresponding algorithm on the continuous level:
Nonlinear primal-dual active set strategy
(1) Choose initial guess u^0, q^0, z^0, µ^0 and c > 0 and set n = 1
(2) While not converged
(3) Determine the active sets A^n_+ and A^n_-:
    A^n_- = {x ∈ ω | q^{n−1} + µ^{n−1}/c − b_− ≤ 0}
    A^n_+ = {x ∈ ω | q^{n−1} + µ^{n−1}/c − b_+ ≥ 0}
(4) Solve the equality-constrained optimization problem
    Minimize J(u^n, p) + (α/2) ‖q^n − q̄‖²_Q over V × Q
    subject to (1.1) and to
      q^n(x) = b_−(x) on A^n_-
      q^n(x) = b_+(x) on A^n_+
    with adjoint variable z^n
(5) Set µ^n = −α(q^n − q̄) + a′_q(u^n, q^n, p)(·, z^n)
(6) Set n = n + 1 and go to 2.
Remark 2.3.
(1) The initial guess for the Lagrange multiplier µ^0 can be taken according to step 5. Another possibility is choosing µ^0 = 0 and q^0 ∈ Q_ad, which leads to solving the optimization problem (step 4) without control constraints in the first iteration.
(2) The convergence in step 2 can be determined conveniently from agreement of the active sets in two consecutive iterations.
Later on, the above algorithm is applied on the discrete level. The concrete discretization schemes are described in Section 4 for each individual example.
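On the discrete level, the active set update of step (3) and the stopping test of Remark 2.3 (2) could be sketched as follows; this is a minimal illustration, and solve_equality_constrained is a hypothetical callback standing in for steps (4)–(5).

import numpy as np

def active_sets(q, mu, b_minus, b_plus, c=1.0):
    """Pointwise test of step (3); all arguments are nodal coefficient vectors."""
    A_minus = q + mu / c - b_minus <= 0.0
    A_plus = q + mu / c - b_plus >= 0.0
    return A_minus, A_plus

def primal_dual_active_set(q, mu, b_minus, b_plus, solve_equality_constrained,
                           c=1.0, max_iter=30):
    """Outer loop; solve_equality_constrained(A_minus, A_plus) is a hypothetical
    callback for steps (4)-(5), returning the new control q and multiplier mu."""
    A_minus, A_plus = active_sets(q, mu, b_minus, b_plus, c)
    for _ in range(max_iter):
        q, mu = solve_equality_constrained(A_minus, A_plus)
        new_minus, new_plus = active_sets(q, mu, b_minus, b_plus, c)
        if np.array_equal(new_minus, A_minus) and np.array_equal(new_plus, A_plus):
            break                        # active sets agree: converged (Remark 2.3)
        A_minus, A_plus = new_minus, new_plus
    return q, mu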
Clearly, the main step in the primal-dual algorithm is the solution of the equality-constrained nonlinear optimization problem in step 4. We shall describe the Lagrange
Newton SQP method for its solution in some detail since exactly the same procedure
may be used to solve the sensitivity problems in Section 3, which are the main focus
of our paper.
For given active and inactive sets A = A+ ∪ A− and I = ω \ A, let us define the
"restriction" operator R_I : L^2(ω) → L^2(ω) by
RI (q) = q · χI ,
where χI is a characteristic function of the set I. Similarly, the operators RA , RA+
and RA− are defined. Note that RI etc. are obviously self-adjoint.
The first order necessary conditions for the purely equality-constrained problem in step 4 are (compare (2.2)–(2.4), respectively (2.5)–(2.7)):
L′_u(u, q, z, p)(δu) = 0   ∀δu ∈ V,   (2.11)
L′_q(u, q, z, p)(δq) = 0   ∀δq ∈ L^2(I^n),   (2.12)
q − b_− = 0   on A^n_-,   (2.13)
q − b_+ = 0   on A^n_+,   (2.14)
L′_z(u, q, z, p)(δz) = 0   ∀δz ∈ V,   (2.15)
with the inactive set I^n = ω \ (A^n_- ∪ A^n_+). Using the restriction operators, (2.12)–(2.14) can be reformulated as
L′_q(u, q, z, p)(R_{I^n} δq) + (q − b_−, R_{A^n_-} δq) + (q − b_+, R_{A^n_+} δq) = 0   ∀δq ∈ Q.
The Lagrange Newton SQP method is defined as Newton's method, applied to (2.11)–(2.15). To this end, we define B as the Hessian operator of the Lagrangian L, i.e.
\[
B(x, p) = \begin{pmatrix} L''_{uu}(x, p)(\cdot,\cdot) & L''_{uq}(x, p)(\cdot,\cdot) & L''_{uz}(x, p)(\cdot,\cdot) \\ L''_{qu}(x, p)(\cdot,\cdot) & L''_{qq}(x, p)(\cdot,\cdot) & L''_{qz}(x, p)(\cdot,\cdot) \\ L''_{zu}(x, p)(\cdot,\cdot) & L''_{zq}(x, p)(\cdot,\cdot) & 0 \end{pmatrix}. \tag{2.16}
\]
To shorten the notation, we abbreviate x = (u, q, z) and X = V × Q × V. Note that B(x, p) is a bilinear operator on the space X. By "multiplication" of B with an element δx ∈ X from the left, we mean the insertion of the components of δx into the first argument. Similarly we define the "multiplication" of B with an element δx ∈ X from the right as insertion of the components of δx into the second argument. When only one element is inserted, B is interpreted as a linear operator B : X → X′. In the sequel, we shall omit the (·, ·) notation if no ambiguity arises.
In the absence of control constraints, the Newton update (∆u, ∆q, ∆z) for (2.11)–(2.15) at the current iterate (u_k, q_k, z_k) is given by the solution of
\[
B(x_k, p) \begin{pmatrix} \Delta u \\ \Delta q \\ \Delta z \end{pmatrix} = - \begin{pmatrix} L'_u(x_k, p) \\ L'_q(x_k, p) \\ L'_z(x_k, p) \end{pmatrix}. \tag{2.17}
\]
With non-empty active sets A^n_- and A^n_+, however, (2.17) is replaced by
\[
\widetilde B(x_k, p) \begin{pmatrix} \Delta u \\ \Delta q \\ \Delta z \end{pmatrix} = - \begin{pmatrix} L'_u(x_k, p) \\ R_{I^n} L'_q(x_k, p) + R_{A^n_-}(q_k - b_-) + R_{A^n_+}(q_k - b_+) \\ L'_z(x_k, p) \end{pmatrix}, \tag{2.18}
\]
where
\[
\widetilde B(x_k, p) = \begin{pmatrix} \mathrm{id} & & \\ & R_{I^n} & \\ & & \mathrm{id} \end{pmatrix} B(x_k, p) \begin{pmatrix} \mathrm{id} & & \\ & R_{I^n} & \\ & & \mathrm{id} \end{pmatrix} + \begin{pmatrix} 0 & & \\ & R_{A^n} & \\ & & 0 \end{pmatrix}. \tag{2.19}
\]
In other words, B̃ is obtained from B by replacing by the identity those components of the derivatives with respect to the control q which belong to the active set. In our practical realization, we reduce the system (2.18) to the control space L^2(ω) using Schur complement techniques, see, e.g., [16]. The reduced system is solved iteratively using the conjugate gradient method, where each step requires the evaluation of a matrix–vector product for the reduced Hessian, which in turn requires the solution of one tangent and one dual problem, see, e.g., [13], or [2] for a detailed description of this procedure in the context of space-time finite element discretization of the problem. In fact, the reduced system needs to be solved only on the currently inactive part L^2(I^n) of the control space since on the active sets, the update ∆q satisfies the trivial relation R_{A^n_±}(∆q) = R_{A^n_±}(b_± − q_{k−1}).
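In a matrix setting, the modification (2.19) simply masks the control rows and columns belonging to the active set. A dense sketch (with a boolean mask active of length n_q marking the active control indices; not taken from the implementation described above) could read:

import numpy as np

def modified_hessian(B, n_u, n_q, active):
    """B: dense Hessian/KKT matrix in (u, q, z) ordering; active: boolean mask of
    length n_q marking the active control indices. Mirrors (2.19)."""
    n = B.shape[0]
    R_I = np.ones(n)                      # diagonal of diag(id, R_I, id)
    R_I[n_u:n_u + n_q][active] = 0.0
    R_A = np.zeros(n)                     # diagonal of diag(0, R_A, 0)
    R_A[n_u:n_u + n_q][active] = 1.0
    return R_I[:, None] * B * R_I[None, :] + np.diag(R_A)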
The Newton step is completed by applying the update (uk+1 , qk+1 , zk+1 ) = (uk , qk , zk )+
(∆u, ∆q, ∆z).
3. Sensitivity analysis
In this section we analyze the behavior of local optimal solutions for (OP(p)) under
perturbations of the parameter p. We derive formulas for the first and second order
derivatives of the reduced quantity of interest and develop an efficient method for their
evaluation.
To set the stage, we outline the main ideas in Section 3.1 by means of a finite-dimensional optimization problem, without partitioning the optimization variables
into states and controls, and in the absence of control constraints. To facilitate the
discussion of the infinite-dimensional case, we treat the case of no control constraints
in Section 3.2 and turn to problems with these constraints in Section 3.3. Throughout,
we compare the standard sensitivity analysis for the reduced cost functional j(p) (1.5)
with our analysis for the reduced quantity of interest i(p) (1.4). The main results can
be found in Theorem 3.6 for the unconstrained case and Theorems 3.18 and 3.21 for
the case with control constraints. An algorithm at the end of Section 3 summarizes
the necessary steps to evaluate the various sensitivity quantities.
3.1. Outline of ideas. Let us consider the nonlinear finite-dimensional equality-constrained optimization problem
Minimize J(x, p)  s.t.  g(x, p) = 0,   (3.1)
where x ∈ R^n denotes the optimization variable, p ∈ R^d is the perturbation parameter, and g : R^n × R^d → R^m collects a number of equality constraints. The Lagrangian of (3.1) is L(x, z, p) = J(x, p) − z^⊤ g(x, p), and under standard constraint qualifications, a local minimizer x_0 of (3.1) at the reference parameter p_0 has an associated Lagrange multiplier z_0 ∈ R^m such that
L′_x(x_0, z_0, p_0) = J′_x(x_0, p_0) − z_0^⊤ g′_x(x_0, p_0) = 0,
L′_z(x_0, z_0, p_0) = g(x_0, p_0) = 0   (3.2)
holds. If we assume second order sufficient conditions to hold in addition, then the
implicit function theorem yields the local existence of functions X(p) and Z(p) which
satisfy (3.2) with p instead of p0 , and X(p0 ) = x0 and Z(p0 ) = z0 hold. Moreover,
(3.2) can be differentiated totally with respect to the parameter and we obtain
\[
\begin{pmatrix} L''_{xx}(x_0, z_0, p_0) & g'_x(x_0, p_0)^\top \\ g'_x(x_0, p_0) & 0 \end{pmatrix}
\begin{pmatrix} X'(p_0)\, \delta p \\ Z'(p_0)\, \delta p \end{pmatrix}
= - \begin{pmatrix} L''_{xp}(x_0, z_0, p_0)\, \delta p \\ g'_p(x_0, p_0)\, \delta p \end{pmatrix}. \tag{3.3}
\]
The solution of (3.3) is a directional derivative of X(p) (and Z(p)) at p = p_0, and we note that it is equivalent to the solution of a linear-quadratic optimization problem. Hence the evaluation of the full Jacobian X′(p_0) requires d = dim P solves of (3.3)
with different δp. In our context of large-scale problems, iterative solvers need to
be used and the numerical effort to evaluate the full Jacobian scales linearly with the
number of right hand sides, i.e., with the dimension of the parameter space d = dim P.
We adapt the definition of the reduced cost functional and the reduced quantity
of interest to our current setting, j(p) = J(X(p), p) and i(p) = I(X(p), p). Since we
wish to compare the effort to compute the first and second order derivatives of both,
we begin by recalling the following result:
Lemma 3.1. Under the conditions above, the reduced cost functional is twice differentiable and
j′(p_0) δp = L′_p(x_0, z_0, p_0) δp,
\[
\delta p^\top j''(p_0)\,\widehat{\delta p} = \delta p^\top \Bigl[\, L''_{px}(x_0, z_0, p_0)\, X'(p_0)\,\widehat{\delta p} + L''_{pz}(x_0, z_0, p_0)\, Z'(p_0)\,\widehat{\delta p} + L''_{pp}(x_0, z_0, p_0)\,\widehat{\delta p} \,\Bigr].
\]
Proof. We have j(p) = L(X(p), Z(p), p) and hence by the chain rule j′(p_0) = L′_x(x_0, z_0, p_0) X′(p_0) + L′_z(x_0, z_0, p_0) Z′(p_0) + L′_p(x_0, z_0, p_0), where the first two terms vanish in view of (3.2). Differentiating again totally with respect to p yields the expression for the second derivative.
Lemma 3.1 shows that the evaluation of the gradient of j(·) does not require any
linear solves of the sensitivity system (3.3), while the evaluation of the Hessian requires
d = dim P such solves. The corresponding results for the infinite-dimensional case
can be found below in Propositions 3.5 and 3.16 for the unconstrained and control
constrained cases.
We will show now that the derivatives of the reduced quantity of interest i(·) can be
evaluated efficiently, requiring just one additional system solve. This is a significant
improvement over a direct approach, compare Table 3.1.
From a first look at
i′(p_0) δp = I′_x(x_0, p_0) X′(p_0) δp + I′_p(x_0, p_0) δp
it seems that the evaluation of the gradient i′(p_0) requires d = dim P solves of the system (3.3). This is referred to as the direct approach in Table 3.1. However, using (3.3), we may rewrite this as
\[
i'(p_0)\,\delta p = -\Bigl[\bigl( I'_x(x_0, p_0),\; 0 \bigr)\, B_0^{-1}\Bigr] \begin{pmatrix} L''_{xp}(x_0, z_0, p_0)\,\delta p \\ g'_p(x_0, p_0)\,\delta p \end{pmatrix} + I'_p(x_0, p_0)\,\delta p,
\]
where B_0 is the matrix on the left hand side of (3.3). Realizing that I′_x(x_0, p_0) has just one row, evaluating the term in square brackets amounts to only one linear system solve. We define the dual quantities (v, y) by
\[
B_0^\top \begin{pmatrix} v \\ y \end{pmatrix} = - \begin{pmatrix} I'_x(x_0, p_0)^\top \\ 0 \end{pmatrix}
\]
and finally obtain
i′(p_0) δp = v^⊤ L″_xp(x_0, z_0, p_0) δp + y^⊤ L″_zp(x_0, z_0, p_0) δp + I′_p(x_0, p_0) δp.   (3.4)
We refer to this as a dual approach. In our context, B0 is symmetric and hence the
computation of the dual quantities requires just one solve of (3.3) with a modified
right hand side, see again Table 3.1.
For the second derivative, we differentiate (3.4) totally with respect to p. From the chain rule we infer that the sensitivities X′(p_0) and Z′(p_0) now come into play. In addition, v and y need to be differentiated with respect to p, but again a duality technique can be used in order to avoid computing these extra terms. Hence the extra computational cost to evaluate the Hessian of i(·) amounts to d = dim P solves for the evaluation of the sensitivity matrices X′(p_0) and Z′(p_0), see Table 3.1. Details can be found in the proofs of Theorem 3.6 for the unconstrained case and Theorems 3.18 and 3.21 for the case with control constraints.
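The bookkeeping of this subsection can be verified on a small randomly generated equality-constrained quadratic program. The sketch below is purely illustrative (the data Q, G, C, E, a, b are made up, and the multiplier sign is chosen so that B_0 = [[L_xx, g_x^⊤], [g_x, 0]] is the symmetric KKT matrix); it computes i′(p_0) once by the dual approach (3.4) and once by the direct approach and checks that both agree.

import numpy as np

rng = np.random.default_rng(2)
n, m, d = 6, 2, 4                        # dimensions of x, g(x, p) and p
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)              # Hessian of J w.r.t. x (SPD)
c, C = rng.standard_normal(n), rng.standard_normal((n, d))     # J = 0.5 x'Qx - (c + Cp)'x
G, g0, E = rng.standard_normal((m, n)), rng.standard_normal(m), rng.standard_normal((m, d))
a, b = rng.standard_normal(n), rng.standard_normal(d)          # I(x, p) = a'x + b'p
p0 = rng.standard_normal(d)

# nominal KKT system at p0 (constraint g(x, p) = Gx - g0 - Ep = 0)
B0 = np.block([[Q, G.T], [G, np.zeros((m, m))]])
x0z0 = np.linalg.solve(B0, np.concatenate([c + C @ p0, g0 + E @ p0]))
i0 = a @ x0z0[:n] + b @ p0               # nominal value of the quantity of interest

# direct approach: dim P sensitivity solves (3.3), then i'(p0) = I_x X'(p0) + I_p
XZ = np.linalg.solve(B0, np.vstack([C, E]))         # columns: (X'(p0) e_k, Z'(p0) e_k)
grad_direct = a @ XZ[:n, :] + b

# dual approach (3.4): a single extra solve with right hand side (I_x, 0)
vy = np.linalg.solve(B0, -np.concatenate([a, np.zeros(m)]))
v, y = vy[:n], vy[n:]
grad_dual = -(v @ C) - (y @ E) + b                  # v'L_xp + y'L_zp + I_p, with L_xp = -C, L_zp = -E

assert np.allclose(grad_direct, grad_dual)
print(i0, grad_dual)

With the gradient (and, if desired, the Hessian) at hand, the second order Taylor expansion of i around p_0 is then available at the cost reported in Table 3.1.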
3.2. The case of no control constraints. Throughout this and the following section, we denote by p0 ∈ P a given reference parameter and by x0 = (u0 , q0 , z0 ) a
solution to the corresponding first order optimality system (2.11)–(2.15). Moreover,
we make the following regularity assumption which we require throughout:
Assumption 3.2. Let the derivative a′_u(u_0, q_0, p_0) : V → V′ be both surjective and injective, so that it possesses a continuous inverse.
In the case of no control constraints, i.e., Q_ad = Q, the first order necessary conditions (2.11)–(2.15) simplify to
L′_u(u, q, z, p)(δu) = 0   ∀δu ∈ V,   (3.5)
L′_q(u, q, z, p)(δq) = 0   ∀δq ∈ Q,   (3.6)
L′_z(u, q, z, p)(δz) = 0   ∀δz ∈ V.   (3.7)
The analysis in this subsection is based on the classical implicit function theorem.
We denote by B0 = B(x0 , p0 ) the previously defined Hessian operator at the given
reference solution. For the results in this section we require that B0 is boundedly
invertible. This property follows from the second order sufficient conditions, see for
instance [14]:
Lemma 3.3. Let the second order sufficient conditions set forth in Lemma 2.2 hold
at x0 for OP(p0 ). Then B0 is boundedly invertible.
The following lemma is a direct application of the implicit function theorem (see [5])
to the first order optimality system (3.5)–(3.7).
Lemma 3.4. Let B_0 be boundedly invertible. Then there exist neighborhoods N(p_0) ⊂ P of p_0 and N(x_0) ⊂ X of x_0 and a continuously differentiable function (U, Q, Z) : N(p_0) → N(x_0) with the following properties:
(a) For every p ∈ N(p_0), (U(p), Q(p), Z(p)) is the unique solution to the system (3.5)–(3.7) in the neighborhood N(x_0).
(b) (U(p_0), Q(p_0), Z(p_0)) = (u_0, q_0, z_0) holds.
(c) The derivative of (U, Q, Z) at p_0 in the direction δp ∈ P is given by the unique solution of
\[
B_0 \begin{pmatrix} U'(p_0)(\delta p) \\ Q'(p_0)(\delta p) \\ Z'(p_0)(\delta p) \end{pmatrix} = - \begin{pmatrix} L''_{up}(x_0, p_0)(\cdot, \delta p) \\ L''_{qp}(x_0, p_0)(\cdot, \delta p) \\ L''_{zp}(x_0, p_0)(\cdot, \delta p) \end{pmatrix}. \tag{3.8}
\]
In the following proposition we recall the first and second order sensitivity derivatives of the cost functional j(p), compare [17].
Proposition 3.5. Let B_0 be boundedly invertible. Then the reduced cost functional j(p) = J(U(p), p) + (α/2) ‖Q(p) − q̄‖²_Q is twice continuously differentiable in N(p_0). The first order derivative at p_0 in the direction δp ∈ P is given by
j′(p_0)(δp) = L′_p(x_0, p_0)(δp).   (3.9)
For the second order derivative in the directions of δp and δp̂, we have
\[
j''(p_0)(\delta p, \widehat{\delta p}) = L''_{up}(x_0, p_0)(U'(p)(\delta p), \widehat{\delta p}) + L''_{qp}(x_0, p_0)(Q'(p)(\delta p), \widehat{\delta p}) + L''_{zp}(x_0, p_0)(Z'(p)(\delta p), \widehat{\delta p}) + L''_{pp}(x_0, p_0)(\delta p, \widehat{\delta p}). \tag{3.10}
\]
Proof. Since (U(p), Q(p)) satisfies the state equation, we have
j(p) = L(U(p), Q(p), Z(p), p)
for all p ∈ N(p_0). By the chain rule, the derivative of j(p) reads
j′(p_0)(δp) = L′_u(x_0, p_0)(U′(p_0)(δp)) + L′_q(x_0, p_0)(Q′(p_0)(δp)) + L′_z(x_0, p_0)(Z′(p_0)(δp)) + L′_p(x_0, p_0)(δp).
The three terms in the first line vanish in view of the optimality system (3.5)–(3.7). Differentiating (3.9) again totally with respect to p in the direction of δp̂ yields (3.10), which completes the proof.
The previous proposition allows us to evaluate the first order derivative of the reduced cost functional without computing the sensitivity derivatives of the state, control and adjoint variables. That is, the effort to evaluate j′(p_0) is negligible compared to the effort required to solve the optimization problem. In order to obtain the second order derivative j″(p_0), however, the sensitivity derivatives have to be computed according to formula (3.8). This corresponds to the solution of one additional linear-quadratic optimization problem per perturbation direction δp, whose optimality system is given by (3.8).
We now turn to our main result in the absence of control constraints. In the
following theorem, we show that the first and second order derivatives of the quantity
of interest can be evaluated at practically the same effort as those of the cost functional.
To this end, we use a duality technique (see Section 3.1) and formulate the following
dual problem for the dual variables v ∈ V, r ∈ Q and y ∈ V:
\[
B_0 \begin{pmatrix} v \\ r \\ y \end{pmatrix} = - \begin{pmatrix} I'_u(q_0, u_0, p_0) \\ I'_q(q_0, u_0, p_0) \\ 0 \end{pmatrix}. \tag{3.11}
\]
We remark that this dual problem involves the same operator matrix B0 as the sensitivity problem (3.8) since B0 is self-adjoint.
Theorem 3.6. Let B_0 be boundedly invertible. Then the reduced quantity of interest i(p) defined in (1.4) is twice continuously differentiable in N(p_0). The first order derivative at p_0 in the direction δp ∈ P is given by
\[
i'(p_0)(\delta p) = L''_{up}(x_0, p_0)(v, \delta p) + L''_{qp}(x_0, p_0)(r, \delta p) + L''_{zp}(x_0, p_0)(y, \delta p) + I'_p(u_0, q_0, p_0)(\delta p). \tag{3.12}
\]
For the second order derivative in the directions of δp and δp̂, we have
\[
i''(p_0)(\delta p, \widehat{\delta p}) = \langle v, \eta \rangle_{V \times V'} + \langle r, \kappa \rangle_{Q \times Q'} + \langle y, \sigma \rangle_{V \times V'}
+ \begin{pmatrix} U'(p_0)(\delta p) \\ Q'(p_0)(\delta p) \\ \delta p \end{pmatrix}^{\!\top}
\begin{pmatrix} I''_{uu}(q_0, u_0, p_0) & I''_{uq}(q_0, u_0, p_0) & I''_{up}(q_0, u_0, p_0) \\ I''_{qu}(q_0, u_0, p_0) & I''_{qq}(q_0, u_0, p_0) & I''_{qp}(q_0, u_0, p_0) \\ I''_{pu}(q_0, u_0, p_0) & I''_{pq}(q_0, u_0, p_0) & I''_{pp}(q_0, u_0, p_0) \end{pmatrix}
\begin{pmatrix} U'(p_0)(\widehat{\delta p}) \\ Q'(p_0)(\widehat{\delta p}) \\ \widehat{\delta p} \end{pmatrix}. \tag{3.13}
\]
Here, (η, κ, σ) ∈ V′ × Q′ × V′ is given by
\[
\begin{pmatrix} \eta \\ \kappa \\ \sigma \end{pmatrix}
= \begin{pmatrix} L'''_{upp}()(\cdot, \delta p, \widehat{\delta p}) \\ L'''_{qpp}()(\cdot, \delta p, \widehat{\delta p}) \\ L'''_{zpp}()(\cdot, \delta p, \widehat{\delta p}) \end{pmatrix}
+ \begin{pmatrix} L'''_{upu}()(\cdot, \delta p, U'(p_0)(\widehat{\delta p})) + L'''_{upq}()(\cdot, \delta p, Q'(p_0)(\widehat{\delta p})) + L'''_{upz}()(\cdot, \delta p, Z'(p_0)(\widehat{\delta p})) \\ L'''_{qpu}()(\cdot, \delta p, U'(p_0)(\widehat{\delta p})) + L'''_{qpq}()(\cdot, \delta p, Q'(p_0)(\widehat{\delta p})) + L'''_{qpz}()(\cdot, \delta p, Z'(p_0)(\widehat{\delta p})) \\ L'''_{zpu}()(\cdot, \delta p, U'(p_0)(\widehat{\delta p})) + L'''_{zpq}()(\cdot, \delta p, Q'(p_0)(\widehat{\delta p})) \end{pmatrix}
+ \Bigl[ B'_u()(U'(p_0)(\widehat{\delta p})) + B'_q()(Q'(p_0)(\widehat{\delta p})) + B'_z()(Z'(p_0)(\widehat{\delta p})) + B'_p()(\widehat{\delta p}) \Bigr]
\begin{pmatrix} U'(p_0)(\delta p) \\ Q'(p_0)(\delta p) \\ Z'(p_0)(\delta p) \end{pmatrix}.
\]
Remark 3.7.
(a) In the definition of (η, κ, σ) we have abbreviated the evaluation at the point (x_0, p_0) by ().
(b) The bracket ⟨·, ·⟩_{V×V′} in (3.13) denotes the duality pairing between V and its dual space V′. For instance, the evaluation of ⟨v, η⟩_{V×V′} amounts to plugging in v instead of · in the definition of η. A similar notation is used for the control space Q.
(c) It is tedious but straightforward to check that (3.13) coincides with (3.10) if the quantity of interest is chosen equal to the cost functional. In this case, it follows from (3.11) that the dual quantities v and r vanish and y = z_0 holds.
Proof. (of Theorem 3.6) From the definition of the reduced quantity of interest (1.4), we infer that
\[
i'(p_0)(\delta p) = I'_u(u_0, q_0, p_0)(U'(p_0)(\delta p)) + I'_q(u_0, q_0, p_0)(Q'(p_0)(\delta p)) + I'_p(u_0, q_0, p_0)(\delta p) \tag{3.14}
\]
holds. In virtue of (3.8) and (3.11), the sum of the first two terms equals
\[
- \begin{pmatrix} I'_u(u_0, q_0, p_0) \\ I'_q(u_0, q_0, p_0) \\ 0 \end{pmatrix}^{\!\top} B_0^{-1} \begin{pmatrix} L''_{up}(x_0, p_0)(\cdot, \delta p) \\ L''_{qp}(x_0, p_0)(\cdot, \delta p) \\ L''_{zp}(x_0, p_0)(\cdot, \delta p) \end{pmatrix}
= \begin{pmatrix} v \\ r \\ y \end{pmatrix}^{\!\top} \begin{pmatrix} L''_{up}(x_0, p_0)(\cdot, \delta p) \\ L''_{qp}(x_0, p_0)(\cdot, \delta p) \\ L''_{zp}(x_0, p_0)(\cdot, \delta p) \end{pmatrix},
\]
which implies (3.12). In order to obtain the second derivative, we differentiate (3.14) totally with respect to p in the direction of δp̂. This yields
\[
i''(p_0)(\delta p, \widehat{\delta p}) =
\begin{pmatrix} U'(p_0)(\delta p) \\ Q'(p_0)(\delta p) \\ \delta p \end{pmatrix}^{\!\top}
\begin{pmatrix} I''_{uu}(q_0, u_0, p_0) & I''_{uq}(q_0, u_0, p_0) & I''_{up}(q_0, u_0, p_0) \\ I''_{qu}(q_0, u_0, p_0) & I''_{qq}(q_0, u_0, p_0) & I''_{qp}(q_0, u_0, p_0) \\ I''_{pu}(q_0, u_0, p_0) & I''_{pq}(q_0, u_0, p_0) & I''_{pp}(q_0, u_0, p_0) \end{pmatrix}
\begin{pmatrix} U'(p_0)(\widehat{\delta p}) \\ Q'(p_0)(\widehat{\delta p}) \\ \widehat{\delta p} \end{pmatrix}
+ \begin{pmatrix} I'_u(u_0, q_0, p_0) \\ I'_q(u_0, q_0, p_0) \\ 0 \end{pmatrix}^{\!\top}
\begin{pmatrix} U''(p_0)(\delta p, \widehat{\delta p}) \\ Q''(p_0)(\delta p, \widehat{\delta p}) \\ Z''(p_0)(\delta p, \widehat{\delta p}) \end{pmatrix}. \tag{3.15}
\]
From differentiating (3.8) totally with respect to p in the direction of δp̂, we obtain
\[
B_0 \begin{pmatrix} U''(p_0)(\delta p, \widehat{\delta p}) \\ Q''(p_0)(\delta p, \widehat{\delta p}) \\ Z''(p_0)(\delta p, \widehat{\delta p}) \end{pmatrix} = - \begin{pmatrix} \eta \\ \kappa \\ \sigma \end{pmatrix}. \tag{3.16}
\]
From here, (3.13) follows.
The main statement of the previous theorem is that the first and second order derivatives of the reduced quantity of interest can be evaluated at the additional expense of just one dual problem (3.11), compared to the evaluation of the reduced cost functional's derivatives. More precisely, computing the gradient of i(p) at p0 requires only the solution of (3.11). In addition, in order to compute the Hessian of i(p) at p0, the sensitivity quantities U'(p0), Q'(p0) and Z'(p0) need to be evaluated in the directions of a collection of basis vectors of the parameter space P. That is, dim P sensitivity problems (3.8) need to be solved. These are exactly the same problems which have to be solved for the computation of the Hessian of the reduced cost functional, see Table 3.1. Note that in the combined effort 1 + dim P, the "1" refers to the same dual problem (3.11) that has already been solved during the computation of the gradient of i(p). In case the space P is infinite-dimensional, it needs to be discretized first. Finally, in order to evaluate the second order Taylor expansion for a given direction δp,
\[
i(p_0 + \delta p) \approx i(p_0) + i'(p_0)(\delta p) + \tfrac12\, i''(p_0)(\delta p, \delta p),
\]
the same dual problem (3.11) and one sensitivity problem (3.8) in the direction of δp are needed, see Table 3.1.

Table 3.1. Number of linear-quadratic problems to be solved to evaluate the derivatives of j(p) and i(p).

                reduced cost        reduced quantity of interest i(p)
                functional j(p)     dual approach       direct approach
  gradient      0                   1                   dim P
  Hessian       dim P               1 + dim P           (dim P)(dim P + 1)/2
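To make the counting in Table 3.1 concrete, the following NumPy sketch (not part of the original text) reproduces the duality trick in a plain finite-dimensional setting: the self-adjoint KKT operator is a symmetric matrix B0, the parameter enters the right-hand side linearly, and the gradient of a linear quantity of interest is obtained either from dim P sensitivity solves or from a single solve with the same operator B0, in analogy to (3.8) and (3.11). All matrices and functions in the sketch are illustrative assumptions.

# Minimal finite-dimensional sketch of the duality technique (illustrative data).
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 40                        # size of the KKT system, dim of parameter space P
A = rng.standard_normal((n, n))
B0 = A + A.T + 2 * n * np.eye(n)    # symmetric ("self-adjoint") and invertible
C = rng.standard_normal((n, m))     # right-hand side b(p) = b0 + C p
b0 = rng.standard_normal(n)
c = rng.standard_normal(n)          # i(p) = c^T x(p) plays the role of the reduced QoI

p0 = rng.standard_normal(m)
x0 = np.linalg.solve(B0, b0 + C @ p0)   # analogue of (U, Q, Z)(p0)

# Direct approach: one sensitivity solve per basis direction of P (m solves).
grad_direct = np.array([c @ np.linalg.solve(B0, C[:, j]) for j in range(m)])

# Dual approach: one single solve with the self-adjoint operator B0, cf. (3.11).
w = np.linalg.solve(B0, c)          # dual variable, analogue of (v, r, y)
grad_dual = C.T @ w                 # components c^T B0^{-1} C e_j

assert np.allclose(grad_direct, grad_dual)
print("gradient of i agrees; solves: direct =", m, ", dual = 1")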
Note that the sensitivity and dual problems (3.8) and (3.11), respectively, are solved by the technique described in Section 2. The solution of such a problem amounts to the computation of one additional QP step (2.17), with a different right-hand side. Therefore, the numerical effort to compute, e.g., the second order Taylor expansion for a given direction is typically low compared to the solution of the nonlinear optimization problem OP(p0).
3.3. The control-constrained case. The analysis is based on the notion of strong
regularity for the problem OP(p). Strong regularity extends the previous assumption
of bounded invertibility of B0 used throughout Section 3.2.
Below, we make use of µ0 ∈ Q given by the following identification:
(µ0 , δq) = −L0q (x0 , p0 )(δq) ∀δq ∈ Q.
(3.17)
This quantity acts as a Lagrange multiplier for the control constraint q ∈ Qad . For the
definition of strong regularity we introduce the following linearized optimality system
which depends on ε = (εu , εq , εz ) ∈ V × Q × V:
\begin{align}
& L''_{uu}(x_0,p_0)(\delta u, u-u_0) + L''_{uq}(x_0,p_0)(\delta u, q-q_0) + L''_{uz}(x_0,p_0)(\delta u, z-z_0) \notag\\
&\qquad + L'_u(x_0,p_0)(\delta u) + (\varepsilon_u, \delta u)_V = 0 \quad \forall \delta u \in V \tag{3.18}\\
& L''_{uq}(x_0,p_0)(u-u_0, \delta q - q) + L''_{qq}(x_0,p_0)(\delta q - q, q-q_0) + L''_{qz}(x_0,p_0)(\delta q - q, z-z_0) \notag\\
&\qquad + L'_q(x_0,p_0)(\delta q - q) + (\varepsilon_q, \delta q - q) \ge 0 \quad \forall \delta q \in Q_{ad} \tag{3.19}\\
& L''_{zu}(x_0,p_0)(\delta z, u-u_0) + L''_{zq}(x_0,p_0)(\delta z, q-q_0) \notag\\
&\qquad + L'_z(x_0,p_0)(\delta z) + (\varepsilon_z, \delta z)_V = 0 \quad \forall \delta z \in V \tag{3.20}
\end{align}
In the sequel, we refer to (3.18)–(3.20) as (LOS(ε)).
Definition 3.8 (Strong Regularity). Let p0 ∈ P be a given reference parameter and let
x0 = (u0 , q0 , z0 ) be a solution to the corresponding first order optimality system (2.5)–
(2.7). If there exist neighborhoods N (0) ⊂ X = V × Q × V of 0 and N (x0 ) ⊂ X of x0
such that the following conditions hold:
(a) For every ε ∈ N (0), there exists a solution (uε , q ε , z ε ) to the linearized optimality system (3.18)–(3.20).
(b) (uε , q ε , z ε ) is the unique solution of (3.18)–(3.20) in N (x0 ).
(c) (uε , q ε , z ε ) depends Lipschitz-continuously on ε, i.e., there exists L > 0 such
that
\[
\|u^{\varepsilon_1} - u^{\varepsilon_2}\|_V + \|q^{\varepsilon_1} - q^{\varepsilon_2}\|_Q + \|z^{\varepsilon_1} - z^{\varepsilon_2}\|_V \le L\, \|\varepsilon_1 - \varepsilon_2\|_X \tag{3.21}
\]
holds for all ε1 , ε2 ∈ N (0),
then the first order optimality system (2.5)–(2.7) is called strongly regular at x0 .
Note that (u0 , q0 , z0 ) solves (3.18)–(3.20) for ε = 0. It is not difficult to see that in
the case of no control constraints, i.e., Q = Qad , strong regularity is nothing else than
bounded invertibility of B0 which we had to assume in Section 3.2. In the following
lemma we show that strong regularity holds under suitable second order sufficient
optimality conditions, in analogy to Lemma 3.3. The proof can be carried out using
the techniques presented in [21].
Lemma 3.9. Let the second order sufficient optimality conditions set forth in Lemma 2.2
hold at x0 for OP(p0 ). Then for any ε ∈ X , (3.18)–(3.20) has a unique solution
(uε , q ε , z ε ) and the map
\[
X \ni \varepsilon \mapsto (u^\varepsilon, q^\varepsilon, z^\varepsilon) \in X \tag{3.22}
\]
is Lipschitz continuous. That is, the optimality system is strongly regular at x0 .
In the next step, we proceed to prove that the solution (uε , q ε , z ε ) of the linearized
optimality system (3.18)–(3.20) is directionally differentiable with respect to the perturbation ε. To this end, we need the following assumption:
Assumption 3.10. At the reference point (u0 , q0 , z0 ), let the following linear operators
be compact:
(1) V ∋ u ↦ a''_{qu}(u0, q0, p0)(·, u, z0) ∈ Q'
(2) Q ∋ q ↦ a''_{qq}(u0, q0, p0)(·, q, z0) ∈ Q'
(3) V ∋ z ↦ a'_q(u0, q0, p0)(·, z) ∈ Q'
Remark 3.11. The previous assumption is satisfied for the following important classes of PDE-constrained optimization problems on bounded domains Ω ⊂ R^d, d ∈ {1, 2, 3}:
(a) If (OP(p)) is a distributed optimal control problem for a semilinear elliptic PDE, e.g.,
\[
-\Delta u = f(u) + q \quad \text{on } \Omega
\]
with V = H¹₀(Ω) and Q = L²(Ω), then a''_{qu} = a''_{qq} = 0 and a'_q is the compact injection of V into Q.
(b) In the case of Neumann boundary control on ∂Ω, e.g.,
\[
-\Delta u = f(u) \ \text{ on } \Omega \qquad \text{and} \qquad \frac{\partial u}{\partial n} = q \ \text{ on } \partial\Omega,
\]
we have V = H¹(Ω) and Q = L²(∂Ω). Again, a''_{qu} = a''_{qq} = 0 and a'_q is the compact Dirichlet trace operator from V to Q.
(c) For bilinear control problems, e.g.,
\[
-\Delta u = q\,u + f \quad \text{on } \Omega
\]
with V = H¹₀(Ω), Q = L²(Ω) and an appropriate admissible set Q_ad, we have a''_{qq} = 0. Moreover, the operators u ↦ a''_{qu}(u0, q0, p0)(·, u, z0) = (u z0, ·) and z ↦ a'_q(u0, q0, p0)(·, z) = (u0 z, ·) are compact from V to Q' since the pointwise product of two functions in V embeds compactly into Q.
(d) For parabolic equations such as
\[
u_t = \Delta u + f(u) + q
\]
with solutions in V = {u ∈ L²(0, T; H¹₀(Ω)) : u_t ∈ L²(0, T; H⁻¹(Ω))}, we have a''_{qu} = a''_{qq} = 0 and a'_q is the compact injection of V into Q = L²(Ω × (0, T)).
(e) Finally, Assumption 3.10 is always satisfied if the space Q is finite-dimensional. This includes all cases of parameter identification problems without any additional restrictions on the coupling between the parameters q and the state variable u. For instance, the Arrhenius law leads to reaction-diffusion equations of the form
\[
-\Delta u = f(u) + e^{qu} \quad \text{on } \Omega
\]
with unknown Arrhenius parameter q ∈ R.
For the following theorem, we introduce the admissible set Q̂_ad, defined as
\[
\hat Q_{ad} = \{\hat q \in Q : \hat b_-(x) \le \hat q(x) \le \hat b_+(x) \ \text{a.e. on } \omega\}
\]
with bounds
\[
\hat b_-(x) = \begin{cases} 0 & \text{if } \mu_0(x) \ne 0 \text{ or } q_0(x) = b_-(x) \\ -\infty & \text{else,} \end{cases}
\qquad
\hat b_+(x) = \begin{cases} 0 & \text{if } \mu_0(x) \ne 0 \text{ or } q_0(x) = b_+(x) \\ +\infty & \text{else.} \end{cases}
\]
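On the discrete level, these bounds are assembled pointwise from the nominal solution. The following NumPy sketch (not part of the original text, with hypothetical array inputs) illustrates the case distinction above.

# Pointwise assembly of the sensitivity bounds (illustrative sketch).
import numpy as np

def sensitivity_bounds(q0, mu0, bm, bp, tol=1e-12):
    """Return (b_hat_minus, b_hat_plus) for discretized q0, mu0 and bounds bm, bp."""
    lower_fixed = (np.abs(mu0) > tol) | np.isclose(q0, bm)   # b_hat_- = 0 here
    upper_fixed = (np.abs(mu0) > tol) | np.isclose(q0, bp)   # b_hat_+ = 0 here
    b_hat_minus = np.where(lower_fixed, 0.0, -np.inf)
    b_hat_plus = np.where(upper_fixed, 0.0, np.inf)
    return b_hat_minus, b_hat_plus

# Example: inactive points get (-inf, inf); strongly active points get (0, 0).
q0 = np.array([1.0, 5.0, 3.0]); mu0 = np.array([0.0, 2.0, 0.0])
print(sensitivity_bounds(q0, mu0, bm=1.0, bp=5.0))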
Theorem 3.12. Let the second order sufficient optimality conditions set forth in Lemma 2.2 hold at x0 for OP(p0), in addition to Assumption 3.10. Then the map (3.22) is directionally differentiable at ε = 0 in every direction δε = (δε_u, δε_q, δε_z) ∈ X. The directional derivative is given by the unique solution (û, q̂) and adjoint variable ẑ of the following linear-quadratic optimal control problem, termed DQP(δε):
\[
\text{(DQP($\delta\varepsilon$))} \qquad
\text{Minimize} \quad \frac12 \begin{pmatrix} \hat u \\ \hat q \end{pmatrix}^{\!\top}
\begin{pmatrix} L''_{uu}(x_0,p_0) & L''_{uq}(x_0,p_0) \\ L''_{qu}(x_0,p_0) & L''_{qq}(x_0,p_0) \end{pmatrix}
\begin{pmatrix} \hat u \\ \hat q \end{pmatrix}
+ (\hat u, \delta\varepsilon_u)_V + (\hat q, \delta\varepsilon_q)
\]
subject to q̂ ∈ Q̂_ad and
\[
a'_u(u_0,q_0,p_0)(\hat u, \varphi) + a'_q(u_0,q_0,p_0)(\hat q, \varphi) + (\delta\varepsilon_z, \varphi) = 0 \quad \text{for all } \varphi \in V.
\]
The first order optimality conditions for this problem read:
\begin{align}
& L''_{uu}(x_0,p_0)(\delta u, \hat u) + L''_{uq}(x_0,p_0)(\delta u, \hat q) + L''_{uz}(x_0,p_0)(\delta u, \hat z) + (\delta\varepsilon_u, \delta u) = 0 \quad \forall \delta u \in V \tag{3.23}\\
& L''_{uq}(x_0,p_0)(\hat u, \delta q - \hat q) + L''_{qq}(x_0,p_0)(\delta q - \hat q, \hat q) + L''_{qz}(x_0,p_0)(\delta q - \hat q, \hat z) + (\delta\varepsilon_q, \delta q - \hat q) \ge 0 \quad \forall \delta q \in \hat Q_{ad} \tag{3.24}\\
& L''_{zu}(x_0,p_0)(\delta z, \hat u) + L''_{zq}(x_0,p_0)(\delta z, \hat q) + (\delta\varepsilon_z, \delta z) = 0 \quad \forall \delta z \in V. \tag{3.25}
\end{align}
Proof. Let δε = (δεu , δεq , δεz ) ∈ X be given and let {τn } ⊂ R+ denote a sequence
converging to zero. We denote by (un , qn , zn ) ∈ X the unique solution of LOS(εn )
where εn = τn δε. Note that (u0 , q0 , z0 ) is the unique solution of LOS(0) and that
(un , qn , zn ) → (u0 , q0 , z0 ) strongly in X . From Lemma 3.9 we infer that
\[
\Bigl\| \frac{u_n - u_0}{\tau_n} \Bigr\|_V + \Bigl\| \frac{q_n - q_0}{\tau_n} \Bigr\|_Q + \Bigl\| \frac{z_n - z_0}{\tau_n} \Bigr\|_V \le L\, \|\delta\varepsilon\|_X.
\]
This implies that a subsequence (still denoted by index n) of the difference quotients converges weakly to some limit element (û, q̂, ẑ) ∈ X. The proof proceeds with the construction of the pointwise limit q̃ of (q_n − q_0)/τ_n, which is later shown to coincide with q̂. It is well known that the variational inequality (3.19) in LOS(ε_n) can be equivalently rewritten as
\[
q_n(x) = \Pi_{[b_-(x),\,b_+(x)]}\bigl(d_n(x)\bigr) \quad \text{a.e. on } \omega, \tag{3.26}
\]
where Π_{[b₋(x), b₊(x)]} is the projection onto the interval [b₋(x), b₊(x)] and
\[
d_n = \bar q + \frac{1}{\alpha}\Bigl[ a''_{qu}(u_0,q_0,p_0)(\cdot, u_n - u_0, z_0) + a''_{qq}(u_0,q_0,p_0)(\cdot, q_n - q_0, z_0) + a'_q(u_0,q_0,p_0)(\cdot, z_n) - \varepsilon^q_n \Bigr] \in Q. \tag{3.27}
\]
The linear operators in (3.27) are understood as their Riesz representations in Q. Similarly, we have q_0(x) = Π_{[b₋(x), b₊(x)]}(d_0(x)) a.e. on ω, where
\[
d_0 = \bar q + \frac{1}{\alpha}\, a'_q(u_0,q_0,p_0)(\cdot, z_0) \in Q. \tag{3.28}
\]
Note that d_n → d_0 strongly in Q since the Fréchet derivatives in (3.27) are bounded linear operators. From the compactness properties in Assumption 3.10 we infer that
\[
\frac{d_n - d_0}{\tau_n} \to \hat d \quad \text{strongly in } Q,
\]
where
\[
\hat d = \frac{1}{\alpha}\Bigl[ a''_{qu}(u_0,q_0,p_0)(\cdot, \hat u, z_0) + a''_{qq}(u_0,q_0,p_0)(\cdot, \hat q, z_0) + a'_q(u_0,q_0,p_0)(\cdot, \hat z) - \delta\varepsilon_q \Bigr].
\]
By taking another subsequence, we obtain that d_n → d_0 and (d_n − d_0)/τ_n → d̂ hold also pointwise a.e. on ω. The construction of the pointwise limit
\[
\tilde q(x) = \lim_{n\to\infty} \frac{q_n(x) - q_0(x)}{\tau_n}
\]
uses the following partition of ω into five disjoint subsets:
\[
\omega = \omega^I \cup \omega_0^+ \cup (\omega^+ \setminus \omega_0^+) \cup \omega_0^- \cup (\omega^- \setminus \omega_0^-), \tag{3.29}
\]
where
\begin{align}
\omega^I &= \{x \in \omega : b_-(x) < q_0(x) < b_+(x)\} && \text{(inactive)} \tag{3.30a}\\
\omega_0^+ &= \{x \in \omega : \mu_0(x) > 0\} && \text{(upper strongly active)} \tag{3.30b}\\
\omega^+ &= \{x \in \omega : q_0(x) = b_+(x)\} && \text{(upper active)} \tag{3.30c}\\
\omega_0^- &= \{x \in \omega : \mu_0(x) < 0\} && \text{(lower strongly active)} \tag{3.30d}\\
\omega^- &= \{x \in \omega : q_0(x) = b_-(x)\} && \text{(lower active)} \tag{3.30e}
\end{align}
The Lagrange multiplier µ_0 belonging to the constraint q_0 ∈ Q_ad defined in (3.17) allows the following representation:
\[
\mu_0 = \alpha\,(d_0 - q_0). \tag{3.31}
\]
Note that the five sets in (3.29) are guaranteed to be disjoint if b₋(x) < b₊(x) holds a.e. on ω. However, one can easily check that q̃ is well-defined also in the case that the bounds coincide on all or part of ω. We now distinguish five cases according to the sets in (3.29):
Case 1: For almost every x in the inactive subset ω^I, we have q_0(x) = d_0(x) and q_n(x) = d_n(x) for all sufficiently large n. Therefore,
\[
\tilde q(x) = \lim_{n\to\infty} \frac{q_n(x) - q_0(x)}{\tau_n} = \hat d(x).
\]
Case 2: For almost every x ∈ ω_0^+, µ_0(x) > 0 implies d_0(x) > q_0(x) by (3.31). Therefore, q_0(x) = b_+(x) and d_n(x) > q_0(x) for sufficiently large n. Hence q_n(x) = b_+(x) for these n and
\[
\tilde q(x) = \lim_{n\to\infty} \frac{q_n(x) - q_0(x)}{\tau_n} = 0.
\]
Case 3: For almost every x ∈ ω^+ \ ω_0^+, we have q_0(x) = b_+(x) = d_0(x).
(a) If d̂(x) > 0, then d_n(x) > b_+(x) for sufficiently large n. Therefore, q_n(x) = b_+(x) for these n and hence q̃(x) = 0.
(b) If d̂(x) = 0, then (q_n(x) − q_0(x))/τ_n = min{0, d_n(x) − b_+(x)}/τ_n for sufficiently large n, hence q̃(x) = 0.
(c) If d̂(x) < 0, then d_n(x) < b_+(x) and hence q_n(x) = d_n(x) for sufficiently large n. Therefore, q̃(x) = d̂(x) holds.
Case 3 can be summarized as
\[
\tilde q(x) = \lim_{n\to\infty} \frac{q_n(x) - q_0(x)}{\tau_n} = \min\{0, \hat d(x)\}.
\]
Case 4: For almost every x ∈ ω_0^-, we obtain, similarly to Case 2,
\[
\tilde q(x) = \lim_{n\to\infty} \frac{q_n(x) - q_0(x)}{\tau_n} = 0.
\]
Case 5: For almost every x ∈ ω^- \ ω_0^-, we obtain, similarly to Case 3,
\[
\tilde q(x) = \lim_{n\to\infty} \frac{q_n(x) - q_0(x)}{\tau_n} = \max\{0, \hat d(x)\}.
\]
Summarizing all previous cases, we have shown that
\[
\tilde q(x) = \Pi_{[\hat b_-(x),\,\hat b_+(x)]}\bigl(\hat d(x)\bigr). \tag{3.32}
\]
We proceed by showing that
\[
\frac{q_n - q_0}{\tau_n} \to \tilde q \quad \text{strongly in } Q = L^2(\omega). \tag{3.33}
\]
From the Lipschitz continuity of the projection Π, it follows that
\[
\Bigl\| \frac{q_n - q_0}{\tau_n} - \tilde q \Bigr\|_Q
= \Bigl\| \frac{1}{\tau_n}\bigl(\Pi_{Q_{ad}}(d_n) - \Pi_{Q_{ad}}(d_0)\bigr) - \Pi_{\hat Q_{ad}}(\hat d) \Bigr\|_Q
\le \Bigl\| \frac{d_n - d_0}{\tau_n} \Bigr\|_Q + \|\hat d\|_Q \to 2\,\|\hat d\|_Q.
\]
From Lebesgue's Dominated Convergence Theorem, (3.33) follows. Consequently, we have q̃ = q̂. The projection formula (3.32) is equivalent to the variational inequality
(3.24). Using the equations (3.18) and (3.20) for (un , qn , zn ) and for (u0 , q0 , z0 ), we
infer that the weak limit (û, q̂, ẑ) satisfies (3.23) and (3.25). It is readily checked
that (3.23)–(3.25) are the first order necessary conditions for (DQP(δε)). In view of
the second order sufficient optimality conditions (Lemma 2.2), (DQP(δε)) is strictly
convex and thus it has a unique solution. In view of Assumption 3.2 and (3.25), we
obtain
\[
\Bigl\| \frac{u_n - u_0}{\tau_n} - \hat u \Bigr\|_V \le C \Bigl\| \frac{q_n - q_0}{\tau_n} - \hat q \Bigr\|_Q,
\]
where C is independent of n. Hence û is also the strong limit of the difference quotient in V. The same argument holds for ẑ. Our whole argument remains valid if in the
beginning, we start with an arbitrary subsequence of {τn }. Since the limit (û, q̂, ẑ) is
always the same, the convergence extends to the whole sequence.
From the previous theorem we derive the following important corollary. The proof
follows from a direct application of the implicit function theorem for generalized equations, see [6, Theorem 2.4].
Corollary 3.13. Under the conditions of the previous theorem, there exist neighborhoods N (p0 ) ⊂ P of p0 and N (x0 ) ⊂ X of x0 and a directionally differentiable function
(U, Q, Z) : N (p0 ) → N (x0 ) with the following properties:
(a) For every p ∈ N (p0 ), (U (p), Q(p), Z(p)) is the unique solution to the system (2.5)–(2.7) in the neighborhood N (x0 ).
(b) (U (p0 ), Q(p0 ), Z(p0 )) = (u0 , q0 , z0 ) holds.
(c) The directional derivative of (U, Q, Z) at p0 in the direction δp ∈ P is given
by the derivative of ε ↦ (u^ε, q^ε, z^ε) at ε = 0 in the direction
\[
\delta\varepsilon = \begin{pmatrix} L''_{up}(x_0,p_0)(\cdot,\delta p) \\ L''_{qp}(x_0,p_0)(\cdot,\delta p) \\ L''_{zp}(x_0,p_0)(\cdot,\delta p) \end{pmatrix}, \tag{3.34}
\]
i.e., by the solution and adjoint (û, q̂, ẑ) of DQP(δε).
We remark that computing the sensitivity derivative of (U, Q, Z) for a given direction δp amounts to solving the linear-quadratic optimal control problem DQP(δε) for δε given by (3.34). Note that this problem, like the original one OP(p0), is subject to pointwise inequality constraints for the control variable. Due to the structure of the admissible set Q̂_ad, the directional derivative of (U, Q, Z) is in general not a linear function of the direction δp, but only positively homogeneous. Note, however, that if the admissible set Q̂_ad is a linear space (which follows from a condition known as strict complementarity, see below), then the directional derivative becomes linear in the direction (i.e., it is the Gateaux differential).
Definition 3.14 (Strict complementarity). Strict complementarity is said to hold at (x0, p0) if
\[
\{\, x \in \omega : q_0(x) \in \{b_-(x), b_+(x)\} \ \text{and} \ \mu_0(x) = 0 \,\}
\]
is a set of measure zero.
A consequence of the strict complementarity condition is that the sensitivity derivatives are characterized by a linear system of equations set forth in the following lemma. We recall that B̃ was defined in (2.19) and that R_I denotes the multiplication of a function in L²(ω) with the characteristic function of the inactive set ω^I = {x ∈ ω : b₋(x) < q₀(x) < b₊(x)}, see Section 2.
Lemma 3.15. Under the conditions of Theorem 3.12 and if strict complementarity holds at (x0, p0), then the directional derivative of (U, Q, Z) is characterized by the following linear system of equations:
\[
\tilde B(x_0,p_0) \begin{pmatrix} U'(p_0)(\delta p) \\ Q'(p_0)(\delta p) \\ Z'(p_0)(\delta p) \end{pmatrix}
= - \begin{pmatrix} L''_{up}(x_0,p_0)(\cdot,\delta p) \\ R_I\, L''_{qp}(x_0,p_0)(\cdot,\delta p) \\ L''_{zp}(x_0,p_0)(\cdot,\delta p) \end{pmatrix}. \tag{3.35}
\]
Moreover, the operator B̃(x0, p0) : X → X' is boundedly invertible.
Proof. In virtue of the strict complementarity property, the admissible set Q̂_ad defined in Theorem 3.12 becomes
\[
\hat Q_{ad} = \{\hat q \in Q : \hat q(x) = 0 \ \text{where } q_0(x) \in \{b_-(x), b_+(x)\}\}.
\]
Consequently, the variational inequality (3.24) simplifies to the following equation for Q'(p0)(δp) ∈ Q̂_ad:
\[
L''_{qu}(x_0,p_0)(\delta q, U'(p_0)(\delta p)) + L''_{qq}(x_0,p_0)(\delta q, Q'(p_0)(\delta p)) + L''_{qz}(x_0,p_0)(\delta q, Z'(p_0)(\delta p))
= - L''_{qp}(x_0,p_0)(\delta q, \delta p) \quad \forall \delta q \in \hat Q_{ad},
\]
which is equivalent to the middle equation in (3.35). The first and third equations in (3.35) coincide with (3.23) and (3.25), which proves the first claim. From Theorem 3.12 we conclude that B̃(x0, p0) is bijective. Since it is a continuous linear operator from X to X', so is its inverse.
We are now in the position to recall the first and second order sensitivity derivatives
of the reduced cost functional j(p), compare again [17]. Note that we do not make use
of strict complementarity in the following proposition.
Proposition 3.16. Under the conditions of Theorem 3.12, the reduced cost functional
\[
j(p) = J(U(p), p) + \frac{\alpha}{2}\, \|Q(p) - q\|_Q^2
\]
is continuously differentiable in N(p0). The derivative at p0 in the direction δp ∈ P is given by
\[
j'(p_0)(\delta p) = L'_p(x_0, p_0)(\delta p). \tag{3.36}
\]
Additionally, the second order directional derivatives of the reduced cost functional j exist, and are given by the following formula:
\[
j''(p_0)(\delta p, \widehat{\delta p}) = L''_{up}(x_0,p_0)(U'(p_0)(\delta p), \widehat{\delta p}) + L''_{qp}(x_0,p_0)(Q'(p_0)(\delta p), \widehat{\delta p}) + L''_{zp}(x_0,p_0)(Z'(p_0)(\delta p), \widehat{\delta p}) + L''_{pp}(x_0,p_0)(\delta p, \widehat{\delta p}). \tag{3.37}
\]
Proof. As in the unconstrained case, there holds
\[
j'(p_0)(\delta p) = L'_u(x_0,p_0)(U'(p_0)(\delta p)) + L'_q(x_0,p_0)(Q'(p_0)(\delta p)) + L'_z(x_0,p_0)(Z'(p_0)(\delta p)) + L'_p(x_0,p_0)(\delta p),
\]
and the terms L'_u and L'_z vanish. Moreover,
\[
L'_q(x_0,p_0)(Q'(p_0)(\delta p)) = -(\mu_0,\, Q'(p_0)(\delta p)) = 0
\]
since Q'(p0)(δp) is zero on the strongly active set and µ0 vanishes on its complement. The formula for the second order derivative follows as in Proposition 3.5 by total directional differentiation of the first order formula.
Remark 3.17. We note that the expressions for the first and second order derivatives
in Proposition 3.16 are the same as in the unconstrained case, see Proposition 3.5.
We now turn to our main result in the control-constrained case, concerning the
differentiability and efficient evaluation of the sensitivity derivatives for the reduced
quantity of interest (1.4). We recall that in the unconstrained case, we have made
use of a duality argument for the efficient computation of the first and second order
derivatives, see Section 3.2. However, in the presence of control constraints, this
technique seems to be applicable only in the case of strict complementarity since
otherwise, the derivatives (U'(p0)(δp), Q'(p0)(δp), Z'(p0)(δp)) do not depend linearly
on the direction δp. In analogy to (3.11) and (3.35), we define the dual quantities (ṽ, r̃, ỹ) ∈ X by
\[
\tilde B(x_0,p_0) \begin{pmatrix} \tilde v \\ \tilde r \\ \tilde y \end{pmatrix}
= - \begin{pmatrix} I'_u(q_0,u_0,p_0) \\ R_I\, I'_q(q_0,u_0,p_0) \\ 0 \end{pmatrix}. \tag{3.38}
\]
Theorem 3.18. Under the conditions of Theorem 3.12, the reduced quantity of interest i(p) is directionally differentiable at the reference parameter p0. If, in addition, strict complementarity holds at (x0, p0), then the first order directional derivative at p0 in the direction δp ∈ P is given by
\[
i'(p_0)(\delta p) = L''_{up}(x_0,p_0)(\tilde v, \delta p) + L''_{qp}(x_0,p_0)(R_I \tilde r, \delta p) + L''_{zp}(x_0,p_0)(\tilde y, \delta p) + I'_p(u_0,q_0,p_0)(\delta p). \tag{3.39}
\]
Proof. The proof is carried out similarly to the proof of Theorem 3.6, using Lemma 3.15.
Our next goal is to consider second order derivatives of the reduced quantity of
interest. In order to apply the approach used in the unconstrained case, we rely
on the existence of second order directional derivatives of (U, Q, Z) at p0 . However,
these second order derivatives do not exist without further assumptions, as seen from
the following simple consideration: Suppose that near a given reference parameter
p0 = 0, the local optimal control is given by Q(p)(x) = max{0, x + p} ∈ L2 (ω) for
x ∈ ω = (−1, 1) and p ∈ R. (An appropriate optimal control problem (OP(p)) can be
easily constructed.) Then Q'(p)(x) = H(x + p) (the Heaviside function), which is not
directionally differentiable with respect to p with values in L²(ω). Note that the point
x = −p of discontinuity marks the boundary between the active and inactive sets of
(OP(p)). Hence we conclude that the reason for the non-existence of the second order
directional derivatives of Q lies in the change of the active set with p.
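The blow-up can also be seen numerically. The following NumPy sketch (not part of the original text) evaluates the L² norm of the difference quotients of p ↦ H(· + p) on ω = (−1, 1): the norm grows like τ^(−1/2), so no second directional derivative of Q can exist.

# Difference quotients of the Heaviside family H(. + p) in L2(-1, 1).
import numpy as np

x = np.linspace(-1.0, 1.0, 2_000_001)
h = x[1] - x[0]
p = 0.0
H = lambda s: (s > 0).astype(float)          # Heaviside function

for tau in [1e-1, 1e-2, 1e-3]:
    dq = (H(x + p + tau) - H(x + p)) / tau   # difference quotient of Q'(p)
    norm = np.sqrt(np.sum(dq**2) * h)
    print(f"tau = {tau:.0e}:  ||dq||_L2 = {norm:.2f}  (tau^-1/2 = {tau**-0.5:.2f})")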
The preceding argument leads to the following assumption:
Assumption 3.19. There exists a neighborhood N (p0 ) ⊂ P of the reference parameter p0 such that for every p ∈ N (p0 ), strict complementarity holds at the solution
(U (p), Q(p), Z(p)), and the active sets coincide with those of (u0 , q0 , z0 ).
Remark 3.20. The previous assumption seems difficult to satisfy in the general case.
However, if the control variable is finite-dimensional and strict complementarity is
assumed at the reference solution (u0 , q0 , z0 ), then Assumption 3.19 is satisfied since
the Lagrange multiplier µ(p) = −L0q (U (p), Q(p), Z(p), p) is continuous with respect to
p and has values in Rn .
We now proceed to our main result concerning second order derivatives of the
reduced quantity of interest. In the theorem below, we use again () to denote evaluation
at the point (x0 , p0 ).
Theorem 3.21. Under the conditions of Theorem 3.12 and Assumption 3.19, the reduced quantity of interest i(p) is twice directionally differentiable at p0. The second order directional derivatives in the directions of δp and δp̂ are given by
\[
i''(p_0)(\delta p, \widehat{\delta p})
= \langle \tilde v, \eta\rangle_{V\times V'} + \langle \tilde r, \kappa\rangle_{Q\times Q'} + \langle \tilde y, \sigma\rangle_{V\times V'}
+ \begin{pmatrix} U'(p_0)(\delta p) \\ Q'(p_0)(\delta p) \\ \delta p \end{pmatrix}^{\!\top}
\begin{pmatrix} I''_{uu}(q_0,u_0,p_0) & I''_{uq}(q_0,u_0,p_0) & I''_{up}(q_0,u_0,p_0) \\
I''_{qu}(q_0,u_0,p_0) & I''_{qq}(q_0,u_0,p_0) & I''_{qp}(q_0,u_0,p_0) \\
I''_{pu}(q_0,u_0,p_0) & I''_{pq}(q_0,u_0,p_0) & I''_{pp}(q_0,u_0,p_0) \end{pmatrix}
\begin{pmatrix} U'(p_0)(\widehat{\delta p}) \\ Q'(p_0)(\widehat{\delta p}) \\ \widehat{\delta p} \end{pmatrix}.
\tag{3.40}
\]
Here, (η, κ, σ) ∈ V' × Q' × V' is given, as in the unconstrained case, by
\[
\begin{pmatrix} \eta \\ \kappa \\ \sigma \end{pmatrix}
= \begin{pmatrix} L'''_{upp}()(\cdot,\delta p,\widehat{\delta p}) \\ L'''_{qpp}()(\cdot,\delta p,\widehat{\delta p}) \\ L'''_{zpp}()(\cdot,\delta p,\widehat{\delta p}) \end{pmatrix}
+ \begin{pmatrix}
L'''_{upu}()(\cdot,\delta p,U'(p_0)(\widehat{\delta p})) + L'''_{upq}()(\cdot,\delta p,Q'(p_0)(\widehat{\delta p})) + L'''_{upz}()(\cdot,\delta p,Z'(p_0)(\widehat{\delta p})) \\
L'''_{qpu}()(\cdot,\delta p,U'(p_0)(\widehat{\delta p})) + L'''_{qpq}()(\cdot,\delta p,Q'(p_0)(\widehat{\delta p})) + L'''_{qpz}()(\cdot,\delta p,Z'(p_0)(\widehat{\delta p})) \\
L'''_{zpu}()(\cdot,\delta p,U'(p_0)(\widehat{\delta p})) + L'''_{zpq}()(\cdot,\delta p,Q'(p_0)(\widehat{\delta p}))
\end{pmatrix}
+ \Bigl[ \tilde B'_u()(U'(p_0)(\widehat{\delta p})) + \tilde B'_q()(Q'(p_0)(\widehat{\delta p})) + \tilde B'_z()(Z'(p_0)(\widehat{\delta p})) + \tilde B'_p()(\widehat{\delta p}) \Bigr]
\begin{pmatrix} U'(p_0)(\delta p) \\ Q'(p_0)(\delta p) \\ Z'(p_0)(\delta p) \end{pmatrix}.
\tag{3.41}
\]
Proof. The proof uses the same argument as the proof of Theorem 3.6. Note that in view of Assumption 3.19, B̃(U(p), Q(p), Z(p), p) is totally directionally differentiable with respect to p at p0. In the direction δp̂, the derivative is
\[
\tilde B'_u()(U'(p_0)(\widehat{\delta p})) + \tilde B'_q()(Q'(p_0)(\widehat{\delta p})) + \tilde B'_z()(Z'(p_0)(\widehat{\delta p})) + \tilde B'_p()(\widehat{\delta p}).
\]
Due to the constant active sets, these partial derivatives have the following form:
\[
\tilde B'_u() =
\begin{pmatrix} \mathrm{id} & & \\ & R_I & \\ & & \mathrm{id} \end{pmatrix}
B'_u(x_0, p_0)
\begin{pmatrix} \mathrm{id} & & \\ & R_I & \\ & & \mathrm{id} \end{pmatrix},
\]
etc. In view of the bounded invertibility of B̃(x0, p0), see Lemma 3.15, the second order partial derivatives of (U, Q, Z) at p0 exist by the Implicit Function Theorem. They satisfy the analogue of equation (3.16).
We conclude this section by outlining an algorithm which collects the necessary steps to evaluate the first and second order sensitivity derivatives j'(p0)δp and j''(p0)(δp, δp̂) as well as i'(p0)δp and i''(p0)(δp, δp̂) for given δp, δp̂ ∈ P. We suppose that the original optimization problem (OP(p)) has been solved, e.g., by the primal-dual active set approach in Section 2, for the nominal parameter p0. We denote by A± and I the active and inactive sets belonging to the nominal solution (u0, q0) and adjoint state z0. For the definition of B̃(x0, p0) appearing in equations (3.35) and (3.38), we refer to (2.19).

Evaluation of sensitivity derivatives
(1) Evaluate j'(p0)δp according to (3.36).
(2) Compute the sensitivities U'(p0)δp, Q'(p0)δp and Z'(p0)δp from (3.35).
(3) Evaluate j''(p0)(δp, δp̂) according to (3.37).
(4) Compute the dual quantities (ṽ, r̃, ỹ) from (3.38).
(5) Evaluate i'(p0)δp according to (3.39).
(6) Compute the sensitivities U'(p0)δp̂, Q'(p0)δp̂ and Z'(p0)δp̂ from (3.35).
(7) Compute the auxiliary quantities (η, κ, σ) from (3.41).
(8) Evaluate i''(p0)(δp, δp̂) according to (3.40).
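The following Python sketch (not part of the original text) merely records the data flow of these eight steps. All solver callables are placeholders whose names and signatures are assumptions, standing in for the discretized problems (3.35) and (3.38) and the evaluation formulas (3.36), (3.37), (3.39)-(3.41).

# Schematic driver for the eight evaluation steps above (placeholder callables).
def evaluate_sensitivity_derivatives(dp, dp_hat,
                                     eval_jprime,            # formula (3.36)
                                     solve_sensitivity,      # one solve of (3.35)
                                     eval_jsecond,           # formula (3.37)
                                     solve_dual,             # one solve of (3.38)
                                     eval_iprime,            # formula (3.39)
                                     eval_eta_kappa_sigma,   # formula (3.41)
                                     eval_isecond):          # formula (3.40)
    j1 = eval_jprime(dp)                                      # step (1)
    sens_dp = solve_sensitivity(dp)                           # step (2)
    j2 = eval_jsecond(sens_dp, dp, dp_hat)                    # step (3)
    duals = solve_dual()                                      # step (4): (v~, r~, y~)
    i1 = eval_iprime(duals, dp)                               # step (5)
    sens_dp_hat = solve_sensitivity(dp_hat)                   # step (6)
    aux = eval_eta_kappa_sigma(sens_dp, sens_dp_hat, dp, dp_hat)        # step (7)
    i2 = eval_isecond(duals, aux, sens_dp, sens_dp_hat, dp, dp_hat)     # step (8)
    return j1, j2, i1, i2

Note that the two solver callables are called once per direction only; the operator behind them is the same B̃(x0, p0) in every call, which is what makes factorization reuse attractive in an implementation.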
4. Numerical Examples
In this section we illustrate our approach using two examples from different areas.
The first example is concerned with a parameter identification problem for the stationary Navier-Stokes system. No inequality constraints are present in this problem,
and first and second order derivatives of the quantity of interest are obtained. In
the second example, we consider a control-constrained optimal control problem for an
instationary reaction-diffusion system subject to an infinite-dimensional parameter,
which demonstrates the full potential of our approach.
4.1. Example 1. In this section we illustrate our approach using as an example a
parameter identification flow problem without inequality constraints. We consider the
configuration sketched in Figure 4.1.
Figure 4.1. Configuration of the system of pipes with measurement points (boundary parts Γ0, Γ1, Γ2, Γ3, cylinder ΓC, measurement points ξ1, ξ2, ξ3, ξ4).
The (stationary) flow in this system of pipes around the cylinder ΓC is described by the incompressible Navier-Stokes equations, with unknown viscosity q:
\begin{align}
-q\,\Delta v + v\cdot\nabla v + \nabla p &= f && \text{in } \Omega, \notag\\
\nabla\cdot v &= 0 && \text{in } \Omega, \notag\\
v &= 0 && \text{on } \Gamma_0 \cup \Gamma_C, \tag{4.1}\\
v &= v_{in} && \text{on } \Gamma_1, \notag\\
q\,\tfrac{\partial v}{\partial n} - p\,n &= \pi\, n && \text{on } \Gamma_2, \notag\\
q\,\tfrac{\partial v}{\partial n} - p\,n &= 0 && \text{on } \Gamma_3. \notag
\end{align}
Here, the state variable u = (v, p) consists of the velocity v = (v 1 , v 2 ) ∈ H 1 (Ω)2 and
the pressure p ∈ L2 (Ω). The inflow Dirichlet boundary condition on Γ1 is given by
a parabolic inflow vin . The outflow boundary conditions of the Neumann type are
prescribed on Γ2 and Γ3, involving the perturbation parameter π ∈ P = R. (Unlike previous sections, we denote the perturbation parameter by π to avoid confusion with the pressure p.) Physically, the perturbation parameter π describes the pressure
difference between Γ2 and Γ3 , see [11] for detailed discussion of this type of outflow
boundary conditions. The reference parameter is chosen π0 = 0.029.
The aim is to estimate the unknown viscosity q ∈ Q = R using the measurements
of the velocity in four given points, see Figure 4.1. By the least squares approach, this
results in the following parameter identification problem:
\[
\text{Minimize} \quad \sum_{i=1}^{4} \sum_{j=1}^{2} \bigl( v^j(\xi_i) - \bar v_i^j \bigr)^2 + \alpha\, q^2, \quad \text{subject to (4.1)}.
\]
Here, v̄ij are the measured values of the components of the velocity at the point ξi
and α is a regularization parameter. For a priori error analysis for finite element
discretization of parameter identification problems with pointwise measurements we
refer to [19].
The sensitivity analysis of previous sections allows to study the dependence on the
perturbation parameter π. To illustrate this, we define two functionals describing the
possible quantities of interest:
\[
I_1(u, q) = q, \qquad I_2(u, q) = c_d(u),
\]
where c_d(u) is the drag coefficient on the cylinder Γ_C defined as
\[
c_d(u) = c_0 \int_{\Gamma_C} n \cdot \sigma \cdot d \; ds, \tag{4.2}
\]
with a chosen direction d = (1, 0), given constant c_0, and the stress tensor σ given by
\[
\sigma = \frac{\nu}{2}\bigl(\nabla v + (\nabla v)^T\bigr) - p\,I.
\]
For the discretization of the state equation we use conforming finite elements on a
shape-regular quadrilateral mesh Th . The trial and test spaces consist of cell-wise bilinear shape-functions for both pressure and velocities. We add further terms to the finite
element formulation in order to obtain a stable formulation with respect to both the
pressure-velocity coupling and convection-dominated flow. This type of stabilization technique is based on local projections of the pressure (the LPS method), first introduced
in [1]. The resulting parameter identification problem is solved by Newton’s method
on the parameter space as described in [3] which is known to be mesh-independent.
The nonlinear state equation is likewise solved by Newton’s method, whereas the linear sub-problems are computed using a standard multi-grid algorithm. With these
ingredients, the total numerical cost for the solution of this parameter identification
problem on a given mesh behaves like O(N ), where N is the number of degrees of
freedom (dof) for the state equation.
For the reduced quantities of interest i1 (π) and i2 (π) we compute the first and
second derivatives using the representations from Theorem 3.6. In Table 4.1 we collect
the values of these derivatives for a sequence of uniformly refined meshes.
Table 4.1. The values of i1(π) and its derivatives on a sequence of uniformly refined meshes

  cells    dofs    i1(π)       i1'(π)       i1''(π)
  60       270     1.0176e–2   –3.9712e–1   1.4065e–1
  240      900     1.0086e–2   –3.9386e–1   –3.2022e–1
  960      3240    1.0013e–2   –3.9613e–1   –8.5278e–1
  3840     12240   1.0003e–2   –3.9940e–1   –1.0168e–0
  15360    47520   1.0000e–2   –4.0030e–1   –1.0601e–0

Table 4.2. The values of i2(π) and its derivatives on a sequence of uniformly refined meshes

  cells    dofs    i2(π)       i2'(π)     i2''(π)
  60       270     3.9511e–1   –13.4846   9.89988
  240      900     3.9106e–1   –13.8759   –4.09824
  960      3240    3.9293e–1   –13.8151   16.5239
  3840     12240   3.9242e–1   –13.7357   19.3916
  15360    47520   3.9235e–1   –13.7144   19.9385

In order to verify the computed sensitivity derivatives, we make a comparison with the derivatives computed by second order difference quotients. To this end we choose ε = 10⁻⁴ and compute
\[
di_l = \frac{i_l(\pi_0 + \varepsilon) - i_l(\pi_0 - \varepsilon)}{2\varepsilon},
\qquad
ddi_l = \frac{i_l(\pi_0 + \varepsilon) - 2\,i_l(\pi_0) + i_l(\pi_0 - \varepsilon)}{\varepsilon^2},
\]
by solving the optimization problem additionally for π = π0 − ε and π = π0 + ε. The results are shown in Table 4.3.
Remark 4.1. The relative errors in Table 4.3 are of the order of the estimated finite
difference truncation error. We therefore consider the correctness of our method to
have been verified to within the accuracy of this test. The same holds for Example 2
and Table 4.4 below.
Table 4.3. Comparison of the computed derivatives of il (l = 1, 2) with difference quotients, on the finest grid

  l    i_l'        di_l        (di_l − i_l')/i_l'    i_l''      ddi_l      (ddi_l − i_l'')/i_l''
  1    –0.399403   –0.399404   2.5e–6                –1.01676   –1.01678   2.0e–5
  2    –13.73574   –13.73573   –7.3e–7               19.3916    19.3917    5.2e–6
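The check itself is a generic computation. The following NumPy sketch (not part of the original text) implements the difference quotients used in Tables 4.3 and 4.4 for an arbitrary callable; the trigonometric test function and its derivatives are illustrative stand-ins, since the real i(π) requires solving the optimization problem for each π.

# Central-difference verification of first and second derivatives (toy i(pi)).
import numpy as np

def difference_check(i, i_prime, i_second, pi0, eps=1e-4):
    di = (i(pi0 + eps) - i(pi0 - eps)) / (2 * eps)
    ddi = (i(pi0 + eps) - 2 * i(pi0) + i(pi0 - eps)) / eps**2
    return di, (di - i_prime) / i_prime, ddi, (ddi - i_second) / i_second

i = lambda pi: np.exp(-pi) * np.sin(3 * pi)       # placeholder quantity of interest
pi0 = 0.029
ip = np.exp(-pi0) * (3 * np.cos(3 * pi0) - np.sin(3 * pi0))        # exact i'(pi0)
ipp = np.exp(-pi0) * (-8 * np.sin(3 * pi0) - 6 * np.cos(3 * pi0))  # exact i''(pi0)
print(difference_check(i, ip, ipp, pi0))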
4.2. Example 2. The second example concerns a control-constrained optimal control
problem for an instationary reaction-diffusion model in 3 spatial dimensions. As the
problem setup was described in detail in [9], we will be brief here. The reaction-diffusion state equation is given by
\begin{align}
(c_1)_t &= D_1 \Delta c_1 - k_1 c_1 c_2 && \text{in } \Omega \times (0,T), \tag{4.3a}\\
(c_2)_t &= D_2 \Delta c_2 - k_2 c_1 c_2 && \text{in } \Omega \times (0,T), \tag{4.3b}
\end{align}
where ci denotes the concentration of the i-th substance, hence u = (c1 , c2 ) is the
state variable. Ω is a domain in R3 , in this case an annular cylinder (Figure 4.2), and
T is the given final time. The control q enters through the inhomogeneous boundary
conditions
\begin{align}
D_1 \frac{\partial c_1}{\partial n} &= 0 && \text{in } \partial\Omega \times (0,T), \tag{4.4a}\\
D_2 \frac{\partial c_2}{\partial n} &= q(t)\,\alpha(t,x) && \text{in } \partial\Omega_c \times (0,T), \tag{4.4b}\\
D_2 \frac{\partial c_2}{\partial n} &= 0 && \text{in } (\partial\Omega \setminus \partial\Omega_c) \times (0,T), \tag{4.4c}
\end{align}
and α is a given shape function on the boundary, modeling a revolving nozzle on the control surface ∂Ω_c, the upper annulus. Initial conditions
\begin{align}
c_1(0, x) &= c_{10}(x) && \text{in } \Omega, \tag{4.5a}\\
c_2(0, x) &= c_{20}(x) && \text{in } \Omega \tag{4.5b}
\end{align}
are also given. The objective to be minimized is
\[
J(c_1, c_2, q) = \frac12 \int_\Omega \Bigl( \alpha_1 |c_1(T,\cdot) - c_{1T}|^2 + \alpha_2 |c_2(T,\cdot) - c_{2T}|^2 \Bigr)\,dx
+ \frac{\gamma}{2} \int_0^T |q - q_d|^2\,dt
+ \frac{1}{\varepsilon} \max\Bigl\{0,\ \int_0^T q(t)\,dt - q_c \Bigr\}^3,
\]
i.e., it contains contributions from deviation of the concentrations at the given terminal
time T from the desired ones ciT , plus control cost and a term stemming from a
penalization of excessive total control action. We consider here the particular setup
described in [9, Example 1], where substance c1 is to be driven to zero at time T (i.e.,
we have α1 = 1 and α2 = 0) from given uniform initial state c10 ≡ 1. This problem
features a number of parameters, and differentiability of optimal solutions with respect
to these parameters was proved in [10], hence, we may apply the results of Section 3.
The nominal as well as the sensitivity and dual problems were solved using a primal-dual active set strategy, see [9, 15]. The nominal control is depicted in Figure 4.2. One
clearly sees that the upper and lower bounds with values 5 and 1, respectively, are
active in the beginning and end of the time interval. All computations were carried out
using piecewise linear finite elements on a tetrahedral grid with roughly 3300 vertices,
13200 tetrahedra and 100 time steps.
Figure 4.2. Optimal (unperturbed) control q(t) over the time interval [0, 1] (left) and computational domain (right).
Since the control variable is infinite-dimensional and control constraints are active
in the solution, the active sets will in general change even under arbitrarily small
perturbations, hence second order derivatives of the reduced quantity of interest i(p)
may not exist (see the discussion before Assumption 3.19).
We choose as quantity of interest the total amount of control action
\[
I(u, q) = \int_0^T q(t)\,dt.
\]
In contrast to the previous example, we consider now an infinite-dimensional parameter
p = c10 , the initial value of the first substance. After discretization on the given spatial
grid, the parameter space has a dimension dim P ≈ 3300. A look at Table 3.1 now
reveals the potential of our method: The direct evaluation of the derivative i0 (p0 ) would
have required the solution of 3300 auxiliary linear-quadratic problems, an unbearable
effort. By our dual approach, however, we need to solve only one additional such
problem (3.38) for the dual quantities. The derivative i0 (p0 ) is shown in Figure 4.3
as a distributed function on Ω, see Figure 4.3. In the unperturbed setup, the terminal state c1(T) is everywhere above the desired state c1T ≡ 0. By increasing the value of the initial state c10, the desired terminal state is even more difficult to reach, which leads to an increased control effort and thus an increased value of the quantity of interest. This is reflected by the sign of the function in Figure 4.3, which is everywhere positive. Moreover, one can identify the region of Ω where perturbations in the initial state have the greatest impact on the value of the quantity of interest.

Figure 4.3. Gradient of the quantity of interest.
In order to check the derivative, we use again a comparison with a difference quotient
in the given direction of δp ≡ 1. Table 4.4 shows the analogue of Table 4.3 with
ε = 10−2 for this example.
Table 4.4. Comparison of the computed derivative of i with a difference quotient

  i'         di         (di − i')/i'
  0.222770   0.222463   –1.4e–3
5. Conclusion
In this paper, we considered PDE-constrained optimization problems with inequality constraints, which depend on a perturbation parameter p. The differentiability
of optimal solutions with respect to this parameter is shown in Theorem 3.12. This
result complements previous findings in [7, 17] and makes precise the compactness
assumptions needed for the proof.
172
Numerical Methods and Applications
We obtained sensitivity results for a quantity of interest which depends on the
optimal solution and is different from the cost functional. The main contribution of
this paper is to devise an efficient algorithm to evaluate these sensitivity derivatives.
Using a duality technique, we showed that the numerical cost of evaluating the gradient
or the Hessian of the quantity of interest is only marginally higher than the evaluation
of the gradient or the Hessian of the cost functional. The small additional effort is spent
for the solution of one additional linear-quadratic optimization problem for a suitable
dual quantity. A comparison with a direct approach for the evaluation of the gradient
and the Hessian revealed the tremendous savings of the dual approach especially in the
case of a high-dimensional parameter space. Two numerical examples confirmed the
correctness of our derivative formulae and illustrated the applicability of our results.
References
[1] R. Becker and M. Braack. A finite element pressure gradient stabilization for the Stokes equations
based on local projections. Calcolo, 38(4):173–199, 2001.
[2] R. Becker, D. Meidner, and B. Vexler. Efficient numerical solution of parabolic optimization
problems by finite element methods. submitted, 2005.
[3] R. Becker and B. Vexler. Mesh refinement and numerical sensitivity analysis for parameter
calibration of partial differential equations. Journal of Computational Physics, 206(1):95–110,
2005.
[4] M. Bergounioux, K. Ito, and K. Kunisch. Primal-dual strategy for constrained optimal control
problems. SIAM Journal on Control and Optimization, 37(4):1176–1194, 1999.
[5] J. Dieudonné. Foundations of Modern Analysis. Academic Press, New York, 1969.
[6] A. Dontchev. Implicit function theorems for generalized equations. Math. Program., 70:91–106,
1995.
[7] R. Griesse. Parametric sensitivity analysis in optimal control of a reaction-diffusion system—
Part I: Solution differentiability. Numerical Functional Analysis and Optimization, 25(1–2):93–
117, 2004.
[8] R. Griesse. Parametric sensitivity analysis in optimal control of a reaction-diffusion system—
Part II: Practical methods and examples. Optimization Methods and Software, 19(2):217–242,
2004.
[9] R. Griesse and S. Volkwein. A primal-dual active set strategy for optimal boundary control of a
nonlinear reaction-diffusion system. SIAM Journal on Control and Optimization, 44(2):467–494,
2005.
[10] R. Griesse and S. Volkwein. Parametric sensitivity analysis for optimal boundary control of
a 3D reaction-diffusion system. In G. Di Pillo and M. Roma, editors, Large-Scale Nonlinear
Optimization, volume 83 of Nonconvex Optimization and its Applications, pages 127–149, Berlin,
2006. Springer.
[11] J. Heywood, R. Rannacher, and S. Turek. Artificial boundaries and flux and pressure conditions
for the incompressible Navier–Stokes equations. International Journal for Numerical Methods in
Fluids, 22(5):325–352, 1996.
[12] M. Hintermüller, K. Ito, and K. Kunisch. The primal-dual active set strategy as a semismooth
Newton method. SIAM Journal on Optimization, 13(3):865–888, 2002.
[13] M. Hinze and K. Kunisch. Second Order Methods for Optimal Control of Time-Dependent Fluid
Flow. SIAM Journal on Control and Optimization, 40(3):925–946, 2001.
[14] K. Ito and K. Kunisch. Augmented Lagrangian-SQP Methods in Hilbert Spaces and Application
to Control in the Coefficients Problem. SIAM Journal on Optimization, 6(1):96–125, 1996.
[15] K. Ito and K. Kunisch. The primal-dual active set method for nonlinear optimal control problems
with bilateral constraints. SIAM Journal on Control and Optimization, 43(1):357–376, 2004.
[16] F. Kupfer. An infinite-dimensional convergence theory for reduced SQP methods in Hilbert space.
SIAM Journal on Optimization, 6:126–163, 1996.
[17] K. Malanowski. Sensitivity analysis for parametric optimal control of semilinear parabolic equations. Journal of Convex Analysis, 9(2):543–561, 2002.
[18] H. Maurer and J. Zowe. First and second order necessary and sufficient optimality conditions for
infinite-dimensional programming problems. Mathematical Programming, 16:98–110, 1979.
[19] R. Rannacher and B. Vexler. A priori error estimates for the finite element discretization of elliptic
parameter identification problems with pointwise measurements. SIAM Journal on Control and
Optimization, 44(5):1844–1863, 2005.
[20] A. Rösch and K. Kunisch. A primal-dual active set strategy for a general class of constrained
optimal control problems. SIAM Journal on Optimization, 13(2):321–334, 2002.
[21] F. Tröltzsch. Lipschitz stability of solutions of linear-quadratic parabolic control problems with
respect to perturbations. Dynamics of Continuous, Discrete and Impulsive Systems Series A
Mathematical Analysis, 7(2):289–306, 2000.
9. On the Interplay Between Interior Point Approximation and
Parametric Sensitivities in Optimal Control
R. Griesse and M. Weiser: On the Interplay Between Interior Point Approximation and
Parametric Sensitivities in Optimal Control, to appear in: Journal of Mathematical
Analysis and Applications, 2007
In all previous publications in this thesis, the primal-dual active set method (see
Bergounioux et al. [1999], Hintermüller et al. [2002]) was routinely used in order to
compute optimal solutions and sensitivity derivatives. Interior point methods offer an
alternative approach to this task. We consider here the classical variant which employs
a relaxation (u − ua ) η = µ of the complementarity conditions arising in the presence
of, say, a one-sided control constraint u ≥ ua . When the homotopy parameter µ tends
to zero, the corresponding solutions define the so-called central path.
We investigate the interplay between the function space interior point method and
parametric sensitivity derivatives for optimization problems of the following kind:
\[
\text{Minimize} \quad J(u; \pi) = \frac12 \int_\Omega u\,(Ku)\,dx + \frac12 \int_\Omega \alpha\, u^2\,dx + \int_\Omega f\,u\,dx
\quad \text{subject to } u - u_a \ge 0 \ \text{a.e. in } \Omega. \tag{9.1}
\]
Here, K is a self-adjoint and positive semidefinite operator in L2 (Ω), which maps
compactly into L∞ (Ω), f ∈ L∞ (Ω), and α ≥ α0 > 0. The perturbation parameter
π may enter K, α and f in a Lipschitz and differentiable way, see Assumption 2.1 of
the paper. This setting accommodates in particular optimal control problems, where
K = S ? S and S is the solution operator of the underlying PDE.
The interior point approach leads to the following relaxed optimality system for (9.1):
\[
F(u, \eta; \pi, \mu) = \begin{pmatrix} J_u(u; \pi) - \eta \\ (u - u_a)\,\eta - \mu \end{pmatrix} = 0. \tag{9.2}
\]
The solutions of (9.2) are considered to be functions of both the homotopy parameter
µ, viewed as an inner parameter, and the outer parameter π:
Ξ(π, µ) = (Ξu (π, µ), Ξη (π, µ)) = v(π, µ).
Our main results are the following estimates for the convergence of the interior point
approximations v(π, µ) and their sensitivity derivatives vπ (π, µ) to the exact counterparts at µ = 0:
\[
\|v(\pi,\mu) - v(\pi,0)\|_{L^q(\Omega)} \le c\, \mu^{(1+q)/(2q)} \qquad \text{(Theorem 4.6)}
\]
\[
\|v_\pi(\pi,\mu) - v_\pi(\pi,0)\|_{L^q(\Omega)} \le c\, \mu^{1/(2q)} \qquad \text{(Theorem 4.8)}
\]
for all µ < µ0 and q ∈ [2, ∞). In other words, the sensitivity derivatives lag behind by a factor of √µ as µ ↘ 0. By excluding a neighborhood of the boundary of the active
set, the convergence rates can be improved by an order of 1/4 (see Theorem 4.9).
These findings are confirmed by three numerical examples in Section 5 of the paper.
The first example is a simple problem with K ≡ 0. An elliptic optimal control problem
serves as a second example, where the parameter π shifts the desired state. As a third
example, we consider an obstacle problem, which fits our setting after switching to its
dual formulation with regularization.
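The rate of Theorem 4.6 can be reproduced on a self-contained toy instance (not the paper's Example 1). The following NumPy sketch assumes K = 0, α = 1, u_a = 0 and f(x) = -x on Ω = (-1, 1); for this case the relaxed system (9.2) has the closed-form pointwise solution u(µ)(x) = (x + sqrt(x² + 4µ))/2, with u(0)(x) = max(0, x), and the observed L² error of the control component decays approximately like µ^{3/4}, matching (1+q)/(2q) for q = 2.

# Empirical check of the central path convergence rate for a toy problem.
import numpy as np

x = np.linspace(-1.0, 1.0, 200_001)
h = x[1] - x[0]
u0 = np.maximum(0.0, x)                       # exact solution at mu = 0

def l2_err(mu):
    u_mu = 0.5 * (x + np.sqrt(x**2 + 4.0 * mu))   # central path, closed form
    return np.sqrt(np.sum((u_mu - u0)**2) * h)

mus = np.logspace(-2.0, -6.0, 9)
errs = np.array([l2_err(m) for m in mus])
slope = np.polyfit(np.log(mus), np.log(errs), 1)[0]
print(f"observed L2 rate of the control: mu^{slope:.2f}  (theory for q = 2: mu^0.75)")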
ON THE INTERPLAY BETWEEN INTERIOR POINT
APPROXIMATION AND PARAMETRIC SENSITIVITIES IN
OPTIMAL CONTROL
ROLAND GRIESSE AND MARTIN WEISER
Abstract. Infinite-dimensional parameter-dependent optimization problems of
the form ’min J(u; p) subject to g(u) ≥ 0’ are studied, where u is sought in an L∞
function space, J is a quadratic objective functional, and g represents pointwise
linear constraints. This setting covers in particular control constrained optimal
control problems. Sensitivities with respect to the parameter p of both, optimal
solutions of the original problem, and of its approximation by the classical primaldual interior point approach are considered. The convergence of the latter to the
former is shown as the homotopy parameter µ goes to zero, and error bounds in
various Lq norms are derived. Several numerical examples illustrate the results.
1. Introduction
In this paper we study infinite-dimensional optimization problems of the form
\[
\min_u\ J(u; p) \quad \text{s.t.}\quad g(u) \ge 0 \tag{1.1}
\]
where u denotes the optimization variable, and p is a parameter in the problem which
is not optimized for. The optimization variable u will be called the control variable throughout. It is sought in a suitable function space defined over a domain Ω.
The function g(u) represents a pointwise constraint for the control. For simplicity of
the presentation, we restrict ourselves here to the case of a scalar control, quadratic
functionals J, and linear constraints. The exact setting is given in Section 2 and
accomodates in particular optimal control of elliptic partial differential equations.
Let us set the dependence of (1.1) on the parameter aside. In the recent past, a
lot of effort has been devoted to the development of infinite-dimensional algorithms
capable of solving such inequality-constrained problems. Among them are active set
strategies [1, 5–7, 11] and interior point methods [12, 14, 15]. In the latter class, the
complementarity condition holding for the constraint g(u) ≥ 0 and the corresponding
Lagrange multiplier η ≥ 0 is relaxed to g(u)η = µ almost everywhere with µ denoting
the duality gap homotopy parameter. When µ is driven to zero, the corresponding
relaxed solutions (u(µ), η(µ)) define the so-called central path.
In a different line of research, the parameter dependence of solutions for optimal
control problems with partial differential equations and pointwise control constraints
has been investigated. Differentiability results have been obtained for elliptic [9] and
for parabolic problems [4, 8]. Under certain coercivity assumptions for second order
derivatives, the solutions u(p) were shown to be at least directionally differentiable with
respect to the parameter p. These derivatives, often called parametric sensitivities,
allow to assess a solution’s stability properties and to design real-time capable update
schemes.
This paper intends to investigate the interplay between function space interior point
methods and parametric sensitivity analysis for optimization problems. The solutions
v(p, µ) = (u(p, µ), η(p, µ)) of the interior-point relaxed optimality systems depend
on both the homotopy parameter µ, viewed as an inner parameter, and the outer
parameter p. Our main results are, under appropriate assumptions, convergence of the
interior point approximation and its parametric sensitivity to their exact counterparts:
\[
\|v(p,\mu) - v(p,0)\|_{L^q} \le c\, \mu^{(1+q)/(2q)} \qquad \text{(Theorem 4.6)}
\]
\[
\|v_p(p,\mu) - v_p(p,0)\|_{L^q} \le c\, \mu^{1/(2q)} \qquad \text{(Theorem 4.8)}
\]
for all µ < µ0 and q ∈ [2, ∞). By excluding a neighborhood of the boundary of
the active set, the convergence rates can be improved by an order of 1/4 (Theorem 4.9). These convergence rates are confirmed by several numerical examples. The
examples include a distributed elliptic optimal control problem with pointwise control
constraints as well as a dualized and regularized obstacle problem.
The outline of the paper is as follows: In Section 2 we define the setting for our problem. Section 3 is devoted to the parametric sensitivity analysis of problem (1.1). In
Section 4 we establish our main convergence results, which are confirmed by numerical
examples in Section 5.
Throughout, c denotes a generic positive constant which is independent of the
homotopy parameter µ and the choice of the norm q. It has different values in different
locations. In case q = ∞, expressions like (r − q)/(2q) are understood in the sense of
their limit.
By L(X, Y ), we denote the space of linear and continuous operators from X to Y .
The (partial) Fréchet derivatives of a function G(u, p) are denoted by Gu (u, p) and
Gp (u, p), respectively. In contrast, we denote the (partial) directional derivative of G
in the direction δp by Dp (G(u, p); δp).
2. Problem Setting
In this section, we define the problem setting and standing assumptions taken to
hold throughout the paper. We consider the infinite-dimensional optimization problem
\[
\min_u\ J(u; p) \quad \text{s.t.}\quad g(u) \ge 0. \tag{2.1}
\]
Here, u ∈ L∞ (Ω) is the control variable, defined on a bounded domain Ω ⊂ Rd . For
ease of notation, we shall denote the standard Lebesgue spaces Lq (Ω) by Lq .
The problem depends on a parameter p from some normed linear space P . The
objective J : L∞ × P → R is assumed to have the following form:
\[
J(u; p) = \frac12 \int_\Omega u(x)\,\bigl(K(p)u\bigr)(x)\,dx + \frac12 \int_\Omega \alpha(x,p)\,[u(x)]^2\,dx + \int_\Omega f(x,p)\,u(x)\,dx. \tag{2.2}
\]
Assumption 2.1. We assume that p∗ ∈ P is a given reference parameter and that
the following holds for p in a fixed neighborhood Ve of p∗ :
(a) K(p) : L2 → L∞ is a linear compact operator which is self-adjoint and positive
semidefinite as an operator L2 → L2 ,
(b) p 7→ K(p) ∈ L(L∞ , L∞ ) is Lipschitz continuous and differentiable,
(c) p 7→ α(p) ∈ L∞ is Lipschitz continuous and differentiable,
(d) α := inf{ess inf α(p) : p ∈ Ve } > 0,
(e) p 7→ f (p) ∈ L∞ is Lipschitz continuous and differentiable.
Note that since ∫_Ω α(x, p)[u(x)]² dx ≥ α‖u‖²_{L²}, J is strictly convex. In addition, J is
weakly lower semicontinuous and radially unbounded and hence (2.1) admits a global
unique minimizer u(p) ∈ L∞ over any nonempty convex closed subset of L∞ . This
setting accomodates in particular optimal control problems with parameter-dependent
desired state yd and objective
α
1
J(u; p) = kSu − yd (p)k2L2 + kuk2L2
2
2
9. Parametric Sensitivities and Interior Point Methods
177
where Su is the unique solution of, e.g., a second-order elliptic partial differential
equation with distributed control u and K = S ⋆ S. For simplicity of notation, we will
from now on omit the argument p from K, α and f .
From (2.2) we infer that the objective is differentiable with respect to the norm of
L2 and we identify Ju with its Riesz representative, i.e., we have
Ju (u; p) = Ku + αu + f.
Note that for u ∈ Lq , Ju (u; p) ∈ Lq holds for all q ∈ [2, ∞]. Likewise, we write
Juu (u; p) = K + αI for the second derivative, meaning that
\[
J_{uu}(u; p)(v_1, v_2) = \int_\Omega v_2\,(K v_1)\,dx + \int_\Omega \alpha\, v_1 v_2\,dx.
\]
Ω
Let us now turn to the constraints which are given in terms of a Nemyckii operator
involving a twice differentiable real function g : R → R with Lipschitz continuous
derivatives. For simplicity, we restrict ourselves here to linear control constraints
g(u) = u − a ≥ 0
a.e. on Ω
(2.3)
with lower bound a ∈ L∞ . The general case is commented on when appropriate. For
later reference, we define the admissible set
Uad = {u ∈ L∞ : g(u) ≥ 0 a.e. on Ω}.
In this setting, the existence of a regular Lagrange multiplier can be proved:
Lemma 2.2. u is the unique global optimal solution for problem (2.1) if and only if
there exists a Lagrange multiplier η ∈ L∞ such that the optimality conditions
\[
\begin{pmatrix} J_u(u; p) - g_u(u)^\star \eta \\ g(u)\,\eta \end{pmatrix} = 0, \qquad g(u) \ge 0, \quad \text{and} \quad \eta \ge 0 \tag{2.4}
\]
hold.
Proof. The minimizer u is characterized by the variational inequality
Ju (u; p)(u − u) ≥ 0 for all u ∈ Uad
which can be pointwisely decomposed as Ju (u; p) = 0 where g(u) > 0 and Ju (u; p) ≥ 0
where g(u) = 0. Hence, η := Ju (u; p) ∈ L∞ is a multiplier for problem (2.1) such
that (2.4) is satisfied.
In the general case, the derivative gu (u) extends to a continuous operator from Lq
to Lq (see [14]) and gu (u)⋆ above denotes its L2 adjoint. In view of our choice (2.3)
we have gu (u)⋆ = I.
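As a concrete illustration (not part of the original paper), the following NumPy sketch discretizes a small instance of (2.1) with a smoothing kernel as K and checks the pointwise conditions of Lemma 2.2 numerically. The minimizer is computed here by iterating the pointwise projection u = P((-Ku - f)/α), a simple scheme chosen only because ‖K‖/α < 1 in this toy setup; it is not the paper's solver.

# Discrete toy instance of (2.1): solve and verify the optimality conditions (2.4).
import numpy as np

n = 400
x = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
K = 0.2 * np.exp(-10.0 * (x[:, None] - x[None, :])**2) * h   # small smoothing operator
alpha, a = 1.0, 0.0                                           # cost weight, lower bound
f = np.sin(2 * np.pi * x) - 0.3

u = np.zeros(n)
for _ in range(200):
    u = np.maximum(a, (-K @ u - f) / alpha)                   # u = P((-Ku - f)/alpha)

eta = K @ u + alpha * u + f                                   # multiplier eta = J_u(u; p)
print("eta >= 0:", bool(np.all(eta > -1e-8)),
      " max |eta*(u-a)| =", float(np.max(np.abs(eta * (u - a)))))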
3. Parametric Sensitivity Analysis
In this section we derive a differentiability result for the unrelaxed solution v(p, 0)
with respect to changes in the parameter. K, α and f are evaluated at p∗ . Moreover,
(u∗ , η ∗ ) = v(p∗ , 0) ∈ L∞ × L∞ is the unique solution of (2.4).
In order to formulate our result, it is useful to define the weakly/strongly active
and inactive subsets for the reference control u∗ :
Ω0 = {x ∈ Ω : g(u∗ ) = 0 and η ∗ = 0}
Ω+ = {x ∈ Ω : g(u∗ ) = 0 and η ∗ > 0}
Ωi = {x ∈ Ω : g(u∗ ) > 0 and η ∗ = 0}
which form a partition of Ω unique up to sets of measure zero. In addition, we define
\[
\hat U_{ad} = \{u \in L^\infty : u = 0 \ \text{a.e. on } \Omega_+ \ \text{and} \ u \ge 0 \ \text{a.e. on } \Omega_0\}.
\]
178
Numerical Methods and Applications
Theorem 3.1. Suppose that Assumption 2.1 holds. Then there exist neighborhoods V ⊂ Ṽ of p* and U of u* and a map
\[
V \ni p \mapsto (u(p), \eta(p)) \in L^\infty \times L^\infty
\]
such that u(p) is the unique solution of (2.1) in U and η(p) is the unique Lagrange multiplier. Moreover, this map is Lipschitz continuous (in the norm of L^∞) and directionally differentiable at p* (in the norm of L^q for all q ∈ [2, ∞)). For any given direction δp, the derivatives δu and δη are the unique solution and Lagrange multiplier in L^∞ × L^∞ of the auxiliary problem
\[
\min_{\delta u}\ \frac12 \int_\Omega \delta u(x)\,(K\delta u)(x)\,dx + \frac12 \int_\Omega \alpha(x)[\delta u(x)]^2\,dx + J_{up}(u^*; p^*)(\delta u, \delta p)
\quad \text{s.t. } \delta u \in \hat U_{ad}. \tag{3.1}
\]
That is, δu and δη satisfy
\[
K\delta u + \alpha\,\delta u - \delta\eta = -J_{up}(u^*; p^*)(\cdot, \delta p), \qquad
\delta u\,\delta\eta = 0 \ \text{a.e. on } \Omega, \qquad
\delta u \in \hat U_{ad}, \qquad
\delta\eta \ge 0 \ \text{a.e. on } \Omega_0. \tag{3.2}
\]
Proof. The main tool in deriving the result is the implicit function theorem for generalized equations [3], see Appendix A, which we apply with X = L^∞, X̂ = L^q and W = Z = L^∞. We formulate (2.4) as a generalized equation. To this end, let G(u; p) = J_u(u; p) and
\[
N(u) = \Bigl\{ \varphi \in L^\infty : \int_\Omega \varphi\,(\bar u - u)\,dx \le 0 \ \text{for all } \bar u \in U_{ad} \Bigr\} \quad \text{if } u \in U_{ad},
\]
while N(u) = ∅ otherwise. It is readily seen that (2.4) is equivalent to the generalized equation
\[
0 \in G(u; p) + N(u). \tag{3.3}
\]
Conditions (i) and (ii) of Theorem A.1 are a direct consequence of Assumption 2.1.
The verification of conditions (iii) and (iv) proceeds in three steps: construction
of the function ξ, the proof of its Lipschitz continuity, and the proof of directional
differentiability.
Step 1: We set up the linearization of (3.3) with respect to u,
δ ∈ G(u∗ ; p∗ ) + Gu (u∗ ; p∗ )(u − u∗ ) + N (u),
which can be written as
δ ∈ Ku + αu + f + N (u).
(3.4)
These are the first order necessary conditions for a perturbation of problem (2.1) with an additional linear term −∫_Ω δ(x) u(x) dx in the objective, which does not disturb
the strict convexity. Consequently, (3.4) is sufficient for optimality and thus uniquely
solvable for any given δ. This defines the map ξ : L∞ ∋ δ 7→ u = ξ(δ) ∈ L∞ in
Theorem A.1.
Step 2: In order to prove that ξ is Lipschitz, let u′ and u′′ be the unique solutions
of (3.4) belonging to δ' and δ''. Then (3.4) readily yields
\[
\int_\Omega (\alpha u' + K u' + f - \delta')(u'' - u')\,dx + \int_\Omega (\alpha u'' + K u'' + f - \delta'')(u' - u'')\,dx \ge 0.
\]
From there, we obtain
\[
\alpha\, \|u'' - u'\|_{L^2}^2 \le \int_\Omega \alpha\,(u'' - u')^2\,dx
\le \|\delta'' - \delta'\|_{L^2}\, \|u'' - u'\|_{L^2} - \int_\Omega (u'' - u')\,K(u'' - u')\,dx.
\]
Due to the positive semidefiniteness of K,
\[
\|u'' - u'\|_{L^2} \le \frac{1}{\alpha}\, \|\delta' - \delta''\|_{L^2} \le \frac{c}{\alpha}\, \|\delta' - \delta''\|_{L^\infty}
\]
follows. To derive the L^∞ estimate, we employ a pointwise argument. Let us denote by Pu(x) = max{u(x), a(x)} the pointwise projection of a function to the admissible set U_ad. As (3.4) is equivalent to
\[
u(x) = P\!\left( \frac{\delta(x) - (Ku)(x) - f(x)}{\alpha(x)} \right),
\]
and the projection is Lipschitz with constant 1, we find that
\[
|u''(x) - u'(x)| \le \frac{1}{\alpha(x)} \Bigl( |\delta''(x) - \delta'(x)| + |(K(u'' - u'))(x)| \Bigr)
\le \frac{1}{\alpha} \Bigl( \|\delta'' - \delta'\|_{L^\infty} + \|K\|_{L^2 \to L^\infty}\, \|u'' - u'\|_{L^2} \Bigr),
\]
from where the desired ‖u'' − u'‖_{L^∞} ≤ c‖δ' − δ''‖_{L^∞} follows. Since
kη ′′ − η ′ kL∞ = kJu (u′′ ; p∗ ) − Ju (u′ ; p∗ ) − δ ′ + δ ′′ kL∞
≤ kK(u′′ − u′ )kL∞ + kαkL∞ ku′′ − u′ kL∞ + kδ ′′ − δ ′ kL∞
holds, we have Lipschitz continuity also for the Lagrange multiplier.
In Step 3 we deduce that u = ξ(δ) in (3.4) depends directionally differentiably on
δ. To this end, let δb ∈ L∞ be a given direction, let {τn } be a real sequence such that
b We consider
τn ց 0 and let us define un to be the solution of (3.4) for δn = τn δ.
∗
the difference quotient (un − u )/τn which, by the Lipschitz stability shown above, is
b L . Hence we can extract a
bounded in L∞ and thus in L2 by a constant times kδk
∞
subsequence such that
un − u∗
⇀u
b in L2 .
τn
By compactness, K((un − u∗)/τn) → Kû in L∞ holds. Hence the sequence dn = −(Kun + f − δn)/α converges uniformly to d∗ = −(Ku∗ + f)/α, and (dn − d∗)/τn converges uniformly to d̂ = (δ̂ − Kû)/α. We now construct a pointwise limit of the difference quotient taking advantage of the decomposition of Ω. Note that α(u∗ − d∗) = η∗ and un = Pdn and likewise u∗ = Pd∗ hold. On Ωi, we have d∗ > a and thus dn > a for sufficiently large n, which entails that
$$\frac{u_n - u^*}{\tau_n} = \frac{Pd_n - Pd^*}{\tau_n} = \frac{d_n - d^*}{\tau_n} \;\to\; \hat d \quad\text{on } \Omega_i.$$
On Ω+ , η ∗ > 0 implies d∗ < a, hence dn < a for sufficiently large n and thus
$$\frac{u_n - u^*}{\tau_n} = \frac{Pd_n - Pd^*}{\tau_n} = \frac{0 - 0}{\tau_n} \;\to\; 0 \quad\text{on } \Omega_+.$$
Finally on Ω0 we have η ∗ = 0 and thus d∗ = a so that
$$\frac{u_n - u^*}{\tau_n} = \frac{Pd_n - Pd^*}{\tau_n} = \frac{Pd_n - a}{\tau_n} \;\to\; \max\{\hat d, 0\} \quad\text{on } \Omega_0.$$
Hence we have constructed a pointwise limit ũ = lim(un − u∗)/τn on Ω. As
$$\Big|\frac{u_n - u^*}{\tau_n} - \tilde u\Big| \;\le\; \Big|\frac{u_n - u^*}{\tau_n}\Big| + |\tilde u| \;\le\; \Big|\frac{d_n - d^*}{\tau_n}\Big| + |\hat d|$$
and the right hand side converges pointwise and in Lq to 2|d̂| for any q ∈ [2, ∞), we infer from Lebesgue's Dominated Convergence Theorem that
$$\frac{u_n - u^*}{\tau_n} \;\to\; \tilde u \quad\text{in } L^q \text{ for all } q\in[2,\infty)$$
and hence ũ = û must hold. As for the Lagrange multiplier, we observe that
$$\frac{\eta_n - \eta^*}{\tau_n} = \frac{J_u(u_n;p^*) - J_u(u^*;p^*) - \delta_n}{\tau_n} = K\,\frac{u_n - u^*}{\tau_n} + \alpha\,\frac{u_n - u^*}{\tau_n} - \hat\delta \;\longrightarrow\; \hat\eta := K\hat u + \alpha\hat u - \hat\delta \quad\text{in } L^q \text{ for all } q\in[2,\infty).$$
It is straightforward to check that (û, η̂) are the unique solution and Lagrange multiplier in L∞ × L∞ of the auxiliary problem
$$\min_{u}\ \frac12\int_\Omega u(x)\,(Ku)(x)\,dx \;+\; \frac12\int_\Omega \alpha(x)\,[u(x)]^2\,dx \;-\; \int_\Omega \hat\delta(x)\,u(x)\,dx \quad\text{s.t. } u \in \widehat U_{\mathrm{ad}}. \tag{3.5}$$
We are now in a position to apply Theorem A.1 with X = L∞, X̂ = Lq and
Z = L∞ . It follows that there exists a map V ∋ p 7→ u(p) ∈ U ⊂ L∞ mapping p to
the unique solution of (3.3). Lemma 2.2 shows that u(p) is also the unique solution of
our problem (2.1). Moreover, u(p∗ ) = u∗ holds, and u(p) is directionally differentiable
at p∗ into Lq for any q ∈ [2, ∞). By the first equation in (2.4), i.e., η(p) = Ju (u(p); p),
the same holds for η(p). The derivative (δu, δη) in the direction of δp is given by the
unique solution and Lagrange multiplier of (3.5) with δ̂ = −Jup(u∗; p∗)(·, δp), whose
necessary and sufficient optimality conditions coincide with (3.2). This completes the
proof.
Remark 3.2.
(1) The directional derivative map
$$P \ni \delta p \mapsto (\delta u, \delta\eta) \in L^\infty \times L^\infty \tag{3.6}$$
is positively homogeneous in the direction δp but may be nonlinear. However, ‖(δu, δη)‖L∞ ≤ c ‖δp‖P holds with c independent of the direction.
(2) In case of Ω0 being a set of measure zero, we say that strict complementarity holds at the solution u(p∗, 0). As a consequence, the admissible set Ûad for the sensitivities is a linear space and the map (3.6) is linear.
4. Convergence of Solutions and Parametric Sensitivities
As mentioned in the introduction, we consider an interior point regularization of
problem (2.1) by means of the classical primal-dual relaxation of the first order necessary conditions (2.4). That is, we introduce the homotopy parameter µ ≥ 0 and define
the relaxed optimality system by
$$F(u, \eta; p, \mu) = \begin{pmatrix} J_u(u; p) - \eta \\ g(u)\,\eta - \mu \end{pmatrix} = 0. \tag{4.1}$$
As opposed to the previous section, we write again p instead of p∗ for the fixed reference
parameter.
Lemma 4.1. For each µ > 0 there exists a unique admissible solution of (4.1).
Proof. A proof is given in [10]. For convenience, we sketch the main ideas here. The
interior point equation (4.1) is the optimality system for the primal interior point
formulation
$$\min\ J(u; p) - \mu\int_\Omega \ln(g(u))\,dx$$
of (1.1). For each ǫ > 0, this functional is lower semicontinuous on the set Mǫ :=
{u ∈ L∞ : g(u) ≥ ǫ}, such that by convexity and coercivity a unique minimizer uǫ (µ)
exists. Moreover, if ǫ is sufficiently small, uǫ (µ) = u(µ) ∈ int Mǫ holds, such that u(µ)
and the associated multiplier satisfy (4.1).
We denote the solution of (4.1) by
$$v(p, \mu) := \begin{pmatrix} u(p, \mu) \\ \eta(p, \mu) \end{pmatrix}.$$
It defines the central path homotopy as µ ց 0 for fixed parameter p.
This section is devoted to the convergence analysis of v(p, µ) → v(p, 0) and of
vp (p, µ) → vp (p, 0) as µ ց 0. We will establish orders of convergence for the full scale
of Lq norms.
In order to avoid cluttered notation with operator norms, we assume throughout
that δp is an arbitrary parameter direction of unit norm, and we use
$$v_p(p, \mu) = \begin{pmatrix} u_p(p, \mu) \\ \eta_p(p, \mu) \end{pmatrix}$$
to denote the directional derivative of v(p, µ) in this direction, whose existence is
guaranteed by Theorem 3.1 in case µ = 0 and by Lemma 4.7 below for µ > 0.
Moreover, we shall omit function arguments when appropriate.
To begin with, we establish the invertibility of the Karush-Kuhn-Tucker operator belonging to problem (2.1). Note that gη = µ implies that g + η ≥ 2√µ.
Lemma 4.2. For any µ > 0, the derivative Fv(v(p, µ); p, µ) is boundedly invertible from Lq → Lq for all q ∈ [2, ∞] and satisfies
$$\|F_v^{-1}(\cdot)(a, b)\|_{L^q} \;\le\; c\,\Big(\|a\|_{L^q} + \Big\|\frac{b}{g+\eta}\Big\|_{L^q}\Big).$$
Proof. Obviously, F is differentiable with respect to v = (u, η). In view of linearity of
the inequality constraint, we need to consider the system
$$\begin{pmatrix} J_{uu} & -g_u^\star \\ \eta\,g_u & g \end{pmatrix}\begin{pmatrix} \bar u \\ \bar\eta \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix},$$
where the matrix elements are evaluated at u(p, µ) and η(p, µ), respectively. We
introduce the almost active set ΩA = {x ∈ Ω : g ≤ η} and its complement ΩI = Ω\ΩA ,
the almost inactive set. The associated characteristic functions χA and χI = 1 − χA ,
respectively, can be interpreted as orthogonal projectors onto the subspaces L2 (ΩA )
and L2(ΩI). Dividing the second row by η, we obtain
$$\begin{pmatrix} J_{uu} & -g_u^\star \\ g_u & (\chi_A+\chi_I)\,\frac{g}{\eta} \end{pmatrix}\begin{pmatrix} \bar u \\ (\chi_A+\chi_I)\,\bar\eta \end{pmatrix} = \begin{pmatrix} a \\ (\chi_A+\chi_I)\,\frac{b}{\eta} \end{pmatrix}.$$
Eliminating
$$\chi_I\,\bar\eta \;=\; \chi_I\,\frac{\eta}{g}\Big(\frac{b}{\eta} - g_u\,\bar u\Big)$$
and multiplying the second row by −1 leads to the reduced system
$$\begin{pmatrix} J_{uu} + g_u^\star\,\chi_I\,\frac{\eta}{g}\,g_u & -g_u^\star \\ -g_u & -\chi_A\,\frac{g}{\eta} \end{pmatrix}\begin{pmatrix} \bar u \\ \chi_A\,\bar\eta \end{pmatrix} = \begin{pmatrix} a + g_u^\star\,\chi_I\,\frac{b}{g} \\ -\chi_A\,\frac{b}{\eta} \end{pmatrix}.$$
This linear saddle point problem satisfies the assumptions of Lemma B.1 in [2] (see
also Appendix B) with V = L2 (Ω) and M = L2 (ΩA ): the upper left block is uniformly
elliptic (with constant α independent of µ) and uniformly bounded since η/g ≤ 1 on
ΩI , the off-diagonal blocks satisfy an inf-sup-condition (independently of µ), and the
negative semidefinite lower right block is uniformly bounded since g/η ≤ 1 on ΩA .
Therefore, the operator’s inverse is bounded independently of µ. Using that g ≤ η on
ΩA and η ≤ g on ΩI , we obtain
$$\|(\bar u, \chi_A\bar\eta)\|_{L^2} \;\le\; c\,\big\|\big(a + g_u^\star\chi_I\,b/g,\ \chi_A\,b/\eta\big)\big\|_{L^2} \;\le\; c\,\big(\|a\|_{L^2} + \|b/(g+\eta)\|_{L^2}\big).$$
Having the L2 estimate at hand, we can move the spatially coupling operator K to the right-hand side and apply the saddle point lemma pointwise (with V = M = R) to
$$\begin{pmatrix} \alpha + g_u^\star\,\chi_I\,\frac{\eta}{g}\,g_u & -g_u^\star \\ g_u & \chi_A\,\frac{g}{\eta} \end{pmatrix}\begin{pmatrix} \bar u \\ \chi_A\,\bar\eta \end{pmatrix} = \begin{pmatrix} a + g_u^\star\,\chi_I\,\frac{b}{g} - K\bar u \\ \chi_A\,\frac{b}{\eta} \end{pmatrix}.$$
Since K : L2 → L∞ is compact, we obtain
$$|(\bar u, \chi_A\bar\eta)(x)| \;\le\; c\,\big|\big(a + g_u^\star\chi_I\,b/g - K\bar u,\ \chi_A\,b/\eta\big)(x)\big| \;\le\; c\,\big(|a| + |b|/(g+\eta) + \|K\|_{L^2\to L^\infty}\|\bar u\|_{L^2}\big) \;\le\; c\,\big(|a| + |b|/(g+\eta) + \|a\|_{L^2} + \|b/(g+\eta)\|_{L^2}\big)$$
for almost all x ∈ Ω. From this we conclude that
$$\|(\bar u, \chi_A\bar\eta)\|_{L^q} \;\le\; c\,\big(\|a\|_{L^q} + \|b/(g+\eta)\|_{L^q}\big)$$
for all q ≥ 2. Moreover,
$$\|\chi_I\bar\eta\|_{L^q} = \Big\|\chi_I\,\frac{\eta}{g}\Big(\frac{b}{\eta} - g_u\bar u\Big)\Big\|_{L^q} \;\le\; 2\,\|b/(g+\eta)\|_{L^q} + c\,\big(\|a\|_{L^q} + \|b/(g+\eta)\|_{L^q}\big) \;\le\; c\,\big(\|a\|_{L^q} + \|b/(g+\eta)\|_{L^q}\big)$$
holds, which proves the claim.
Remark 4.3. For more complex settings with multicomponent u ∈ L^n_∞ and g : R^n → R^m, the proof is essentially the same. The almost active and inactive sets ΩA and ΩI
have to be defined for each component of g separately. The only nontrivial change is
to show the inf-sup-condition for gu .
In order to prove convergence of the parametric sensitivities, we will need the strong
complementarity (cf. [12]) of the non-relaxed solution.
Assumption 4.4. Suppose there exists c > 0 such that the solution v(p, 0) satisfies
$$|\{x \in \Omega : g(u(p,0)) + \eta(p,0) \le \epsilon\}| \;\le\; c\,\epsilon^r \tag{4.2}$$
for all ǫ > 0 and some 0 < r ≤ 1.
Note that Assumption 4.4 entails that the set Ω0 of weakly active constraints has measure zero, as
$$|\Omega_0| = \Big|\bigcap_{\epsilon>0}\{x\in\Omega : g(u(p,0)) + \eta(p,0) \le \epsilon\}\Big| \;\le\; \lim_{\epsilon\searrow 0} c\,\epsilon^r = 0.$$
In other words, strict complementarity holds at the solution u(p, 0). In our examples,
Assumption 4.4 is satisfied with r = 1.
For convenience, we state a special case of Theorem 8.8 from [13] for use in the
current setting.
Lemma 4.5. Assume that f ∈ Lq, 1 ≤ q < ∞, satisfies
$$|\{x\in\Omega : |f(x)| > s\}| \;\le\; \psi(s), \qquad 0 \le s < \infty,$$
for some integrable function ψ. Then
$$\|f\|_{L^q}^q \;\le\; q\int_0^\infty s^{q-1}\,\psi(s)\,ds.$$
We now prove a bound for the derivative vµ of the central path with respect to the
duality gap parameter µ.
Theorem 4.6. Suppose that Assumption 4.4 holds. Then the map µ ↦ v(p, µ) is differentiable and the slope of the central path is bounded by
$$\|v_\mu(p,\mu)\|_{L^q} \;\le\; c\,\mu^{(r-q)/(2q)}, \qquad q\in[2,\infty]. \tag{4.3}$$
In particular, the a priori error estimate
$$\|v(p,\mu) - v(p,0)\|_{L^q} \;\le\; c\,\mu^{(r+q)/(2q)} \tag{4.4}$$
holds.
Proof. By the implicit function theorem, the derivative vµ is given by
$$F_v(v(p,\mu);p,\mu)\,v_\mu(p,\mu) = -F_\mu(v(p,\mu);p,\mu) = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
Hence from Lemma 4.2 above we obtain
$$\|v_\mu(p,\mu)\|_{L^\infty} \;\le\; c\,\|(g+\eta)^{-1}\|_{L^\infty} \;\le\; c\,\mu^{-1/2}.$$
The latter inequality holds since gη = µ implies that g + η ≥ 2√µ.
Now let µn, n ∈ N, be a positive sequence converging to zero. We may estimate for n > m
$$\|v(p,\mu_n) - v(p,\mu_m)\|_{L^\infty} \;\le\; \int_{\mu_n}^{\mu_m} \|v_\mu(p,\mu)\|_{L^\infty}\,d\mu \;\le\; c\int_{\mu_n}^{\mu_m} \mu^{-1/2}\,d\mu \;\le\; c\,\big(\mu_m^{1/2} - \mu_n^{1/2}\big) \;\le\; c\,\mu_m^{1/2},$$
which is less than any ǫ > 0 for sufficiently large m ≥ mǫ. Thus, v(p, µn) is a Cauchy sequence with limit point v. Using continuity of L∞ ∋ v ↦ (Ju(u; p) − η, g(u)η) we find v = v(p, 0). The limit n → ∞ now yields
$$\|v(p,\mu) - v(p,0)\|_{L^\infty} \;\le\; c\,\sqrt\mu, \tag{4.5}$$
which proves (4.3) and (4.4) for the case q = ∞. From (4.5) and (4.2) we obtain
$$|\{x\in\Omega : g(u(p,\mu)) + \eta(p,\mu) < \epsilon\}| \;\le\; \begin{cases} 0, & \text{if } \epsilon \le 2\sqrt\mu, \\ |\{x\in\Omega : g(u(p,0)) + \eta(p,0) < \epsilon + c\sqrt\mu\}|, & \text{otherwise,} \end{cases} \;\le\; \begin{cases} 0, & \text{if } \epsilon \le 2\sqrt\mu, \\ c\,(\epsilon + c\sqrt\mu)^r, & \text{otherwise,} \end{cases}$$
with c independent of r. Using Lemmas 4.2 and 4.5 we estimate for q ∈ [2, ∞)
$$\|v_\mu\|_{L^q}^q \;\le\; c^q\,\|(g+\eta)^{-1}\|_{L^q}^q \;\le\; c^q\,q\int_0^\infty s^{q-1}\,\psi(s)\,ds$$
with
$$\psi(s) = \begin{cases} 0, & \text{if } s \ge (2\sqrt\mu)^{-1}, \\ c\,(s^{-1} + \sqrt\mu)^r, & \text{otherwise,} \end{cases}$$
and obtain
$$\|v_\mu\|_{L^q}^q \;\le\; c^{q+1}\,q\int_0^{(2\sqrt\mu)^{-1}} s^{q-1}\big(s^{-1} + \sqrt\mu\big)^r\,ds \;\le\; c^{q+1}\,q\int_0^{(2\sqrt\mu)^{-1}} s^{q-1}\Big(\tfrac32\,s^{-1}\Big)^r\,ds = c^{q+1}\,q\,\Big(\tfrac32\Big)^r\int_0^{(2\sqrt\mu)^{-1}} s^{q-1-r}\,ds = c^{q+1}\,\frac{q}{q-r}\,\Big(\tfrac32\Big)^r\,\Big[s^{q-r}\Big]_0^{(2\sqrt\mu)^{-1}} \;\le\; c^{q+1}\,\frac{q}{q-r}\,3^r\,2^{-q}\,\mu^{(r-q)/2}.$$
This implies (4.3). As before, integration over µ then yields (4.4).
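For convenience, the integration step invoked in this proof can be spelled out; this is merely an expansion of the argument above, not an additional result. With (4.3) and v(p, 0) = lim_{µ↘0} v(p, µ) one obtains, for q ∈ [2, ∞),
$$\|v(p,\mu) - v(p,0)\|_{L^q} \;\le\; \int_0^\mu \|v_\mu(p,s)\|_{L^q}\,ds \;\le\; c\int_0^\mu s^{(r-q)/(2q)}\,ds = \frac{2cq}{r+q}\,\mu^{(r+q)/(2q)},$$
which is (4.4); note that (r − q)/(2q) > −1, so the integral is finite.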
Lemma 4.7. Along the central path, the solutions v(p, µ) are Fréchet differentiable
w.r.t. p. There exists µ0 > 0 such that the parametric sensitivities are bounded independently of µ:
kvp (p, µ)kL∞ ≤ c for all µ < µ0 .
Proof. By the implicit function theorem and Lemma 4.2, vp exists and satisfies
$$F_v(v(p,\mu);p,\mu)\,v_p(p,\mu) = -F_p(v(p,\mu);p,\mu) = -\begin{pmatrix} J_{up}(u(p,\mu);p) \\ 0 \end{pmatrix}, \tag{4.6}$$
and ‖vp‖L∞ ≤ c ‖Jup(u(p, µ); p)‖L∞ holds. By (4.4), ‖u(p, µ)‖L∞ is bounded, and by Assumption 2.1, the same holds for ‖Jup(u(p, µ); p)‖L∞.
Theorem 4.8. Suppose that Assumption 4.4 holds. Then there exist constants µ0 > 0
and c independent of µ such that
$$\|v_p(p,\mu) - v_p(p,0)\|_{L^q} \;\le\; c\,\mu^{r/(2q)} \quad\text{for all } \mu < \mu_0 \text{ and } q\in[2,\infty),$$
where vp (p, 0) is the parametric sensitivity of the original problem.
Proof. We begin with the sensitivity equation (4.6) and differentiate it totally with respect to µ, which yields
$$F_{vv}(v_p, v_\mu) + F_{v\mu}\,v_p + F_v\,v_{p\mu} = -F_{pv}\,v_\mu - F_{p\mu}. \tag{4.7}$$
First we observe Fvµ = 0, Fpµ = 0 and
$$-F_{vv}(v_p, v_\mu) - F_{pv}\,v_\mu = -\begin{pmatrix} J_{upu}\,u_\mu \\ \eta_p\,g_u\,u_\mu + u_p\,g_u^\star\,\eta_\mu \end{pmatrix} =: \begin{pmatrix} a \\ b \end{pmatrix}. \tag{4.8}$$
In view of Assumption 2.1, Jupu is a fixed element of L(Lq, Lq). Hence by Theorem 4.6, we have
$$\|a\|_{L^q} \;\le\; c\,\mu^{(r-q)/(2q)} \quad\text{for all } q\in[2,\infty).$$
The quantities (uµ, ηµ) and (up, ηp) can be estimated by Theorem 4.6 and Lemma 4.7, respectively, which entails
$$\|b\|_{L^q} \;\le\; c\,\big(\|\eta_p\|_{L^\infty}\|u_\mu\|_{L^q} + \|u_p\|_{L^\infty}\|\eta_\mu\|_{L^q}\big) \;\le\; c\,\mu^{(r-q)/(2q)} \quad\text{for all } q\in[2,\infty)$$
and sufficiently small µ. We have seen that (4.7) reduces to Fv(vpµ) = (a, b)⊤. Applying Lemma 4.2 yields
$$\|v_{p\mu}\|_{L^q} \;\le\; c\,\big(\|a\|_{L^q} + \|b/(g+\eta)\|_{L^q}\big) \;\le\; c\,\big(\mu^{(r-q)/(2q)} + \mu^{(r-q)/(2q)-1/2}\big) \;\le\; c\,\mu^{(r-2q)/(2q)}$$
and thus
$$\|v_{p\mu}\|_{L^q} \;\le\; c\,\mu^{(r-2q)/(2q)} \quad\text{for all } q\in[2,\infty).$$
Integrating over µ > 0 as before, we obtain the error estimate
$$\|v_p(p,\mu) - v\|_{L^q} \;\le\; c\,\frac{q}{r}\,\mu^{r/(2q)},$$
where v = lim_{µ↘0} vp(p, µ). Taking the limit µ ↘ 0 of (4.6) and using continuity of L∞ × L2 ∋ (v, vp) ↦ Fv(v) vp + Fp(v) ∈ L2, we have
$$F_v(v(p,0);p,0)\,v + F_p(v(p,0);p,0) = 0,$$
that is,
$$J_{uu}(u(p,0);p)\,u - g_u(u(p,0))\,\eta = -J_{up}(u(p,0);p), \tag{4.9}$$
$$\eta(p,0)\,g_u(u(p,0))\,u + g(u(p,0))\,\eta = 0. \tag{4.10}$$
From (4.10) we deduce that u = 0 on the strongly active set Ω+ and η = 0 on the inactive set Ωi,
which together with (4.9) uniquely characterize the exact sensitivity, see Theorem 3.1.
Note that strict complementarity holds at u(p, 0), i.e., Ω0 is a null set in view of
Assumption 4.4. Hence the limit v is equal to the sensitivity derivative vp (p, 0) of the
unrelaxed problem.
Comparing the results of Theorems 4.6 and 4.8, we observe that the convergence of the sensitivities lags behind the convergence of the solutions by a factor of √µ, see
also Table 4.1. Therefore Theorem 4.8 does not provide any convergence in L∞ . This
was to be expected since under mild assumptions, up (p, µ) is a continuous function
on Ω for all µ > 0 while the limit up (p, 0) exhibits discontinuities at junction points,
compare Figure 5.1.
It turns out that the convergence rates are limited by effects on the transition
regions, where g(u) + η is small. However, sufficiently far away from the boundary of
the active set, we can improve the L∞ estimates by r/4:
Theorem 4.9. Suppose that Assumption 4.4 holds. For β > 0 define the β-determined
set as
Dβ = {x ∈ Ω : g(u(p, 0)) + η(p, 0) ≥ β}.
Then the following estimates hold:
$$\|v(p,\mu) - v(p,0)\|_{L^\infty(D_\beta)} \;\le\; c\,\mu^{(r+2)/4}, \tag{4.11}$$
$$\|v_p(p,\mu) - v_p(p,0)\|_{L^\infty(D_\beta)} \;\le\; c\,\mu^{r/4}. \tag{4.12}$$
Proof. First we note that due to the uniform convergence on the central path there is
some µ̄ > 0, such that g(u(p, µ)) + η(p, µ) ≥ β/2 for all µ ≤ µ̄ and almost all x ∈ Dβ .
We recall that the derivative of the solutions on the central path vµ is given by
$$F_v(v(p,\mu);p,\mu)\,v_\mu(p,\mu) = -F_\mu(v(p,\mu);p,\mu) = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
We return to the pointwise system in the proof of Lemma 4.2 with a = 0 and b = 1. Pointwise application of the saddle point lemma on Dβ yields
$$\|v_\mu\|_{L^\infty(D_\beta)} \;\le\; \|(g+\eta)^{-1}\|_{L^\infty(D_\beta)} + \|K\|_{L^2\to L^\infty}\,\|u_\mu\|_{L^2(\Omega)} \;\le\; \frac{2}{\beta} + c\,\mu^{(r-2)/4} \quad\text{for all } \mu \le \bar\mu$$
by Theorem 4.6. Integration over µ proves (4.11). Similarly, vpµ is defined by (4.7) with a and b given by (4.8). Thus we have
$$\|v_{p\mu}\|_{L^\infty(D_\beta)} \;\le\; c\,\big(\|b\|_{L^\infty(D_\beta)}\,\|(g+\eta)^{-1}\|_{L^\infty(D_\beta)} + \|K\|_{L^2\to L^\infty}\,\|v_{p\mu}\|_{L^2(\Omega)}\big) \;\le\; c\,\Big(\mu^{-1/2}\cdot\frac{2}{\beta} + \mu^{(r-4)/4}\Big) \;\le\; c\,\mu^{(r-4)/4}.$$
Integration over µ verifies the claim (4.12).
Before we turn to our numerical results, we summarize in Table 4.1 the convergence
results proved.
  norm        v(p,µ) → v(p,0)     vp(p,µ) → vp(p,0)
  Lq(Ω)         (r+q)/(2q)             r/(2q)
  L∞(Ω)            1/2                    —
  L∞(Dβ)         (r+2)/4                 r/4

Table 4.1. Convergence rates for Lq, q ∈ [2, ∞), and L∞ of the solutions and their sensitivities along the central path.
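For later reference, the predicted rates of Table 4.1 in the Lq norms can be encoded in a small helper (a hypothetical convenience function, not part of the original material; it is used only to fill the "predicted" columns of Tables 5.1 and 5.2 below):

    def predicted_rates(r, q):
        """Predicted convergence rates from Table 4.1 for the L^q norm, q in [2, inf]."""
        if q == float("inf"):
            return 0.5, None                   # solutions: 1/2; sensitivities: no L-infinity rate
        return (r + q) / (2 * q), r / (2 * q)  # (solutions, sensitivities)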
Remark 4.10. One may ask whether the interior point relaxation of the sensitivity problem (3.1) for vp(p, 0) coincides with the sensitivity problem (4.6) for vp(p, µ) on the path µ > 0. This, however, cannot be the case, as (3.1) includes equality constraints for up(p, 0) on the strongly active set Ω+, whereas (4.6) shows no such restrictions.
5. Numerical Examples
5.1. An Introductory Example. We start with a simple but instructive example:
$$\min\ \int_\Omega \frac12\,(u(x) - x - p)^2\,dx \quad\text{s.t. } u(x) \ge 0$$
on Ω = (−1, 1). The simplicity arises from the fact that this problem is spatially
decoupled and K = 0 holds. Nevertheless, several interesting properties of parametric
sensitivities and their interior point approximations may be explored.
The solution is given by u(p, 0) = max(0, x + p) with sensitivity
$$u_p(p,0) = \begin{cases} 1, & x + p > 0, \\ 0, & x + p < 0. \end{cases}$$
The interior point approximations are
$$u(p,\mu) = \frac{p+x}{2} + \frac12\sqrt{(p+x)^2 + 4\mu}$$
and their sensitivities
$$u_p(p,\mu) = \frac12 + \frac12\,\frac{p+x}{\sqrt{(p+x)^2 + 4\mu}}.$$
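These closed forms follow from the pointwise optimality condition of the barrier formulation used in the proof of Lemma 4.1 (with g(u) = u); the short check is added here for the reader's convenience:
$$u - x - p - \frac{\mu}{u} = 0 \quad\Longleftrightarrow\quad u^2 - (x+p)\,u - \mu = 0 \quad\Longrightarrow\quad u(p,\mu) = \frac{(x+p) + \sqrt{(x+p)^2 + 4\mu}}{2},$$
where the positive root is taken; differentiating with respect to p yields the sensitivity formula above.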
Figure 5.1. Interior point solutions (left) and their sensitivities
(right) for µ ∈ [10−6 , 10−1 ].
Figure 5.2. Convergence behavior of solutions (left) and their sensitivities (right) for q ∈ {2, 4, 8, ∞}.
Finally, the Lagrange multiplier and its sensitivity are given by
η(p, µ) = u(p, µ) − x − p
ηp (p, µ) = up (p, µ) − 1.
As a reference parameter, we choose p = 0. From the solution we infer that
{x ∈ Ω : g(u(p, 0)) + η(p, 0) ≤ ǫ} = [−ǫ, ǫ]
so Assumption 4.4 is satisfied with r = 1.
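Indeed, the formulas above give, for p = 0,
$$g(u(0,0)) + \eta(0,0) = \max(0,x) + \big(\max(0,x) - x\big) = |x|,$$
so that the set in (4.2) is [−ǫ, ǫ] with measure 2ǫ, i.e., (4.2) holds with r = 1 and c = 2.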
A sequence of solutions obtained for a discretization of Ω with 2^12 points and
µ ∈ [10−6 , 10−1 ] is depicted in Figure 5.1. The error of the solution ku(p, µ)−u(p, 0)kLq
and the sensitivities kup (p, µ) − up (p, 0)kLq in different Lq norms are given in the
double logarithmic Figure 5.2. Similar plots can be obtained for the multiplier and its
sensitivities.
Table 5.1 shows that the predicted convergence rates for q ∈ [2, ∞] are in very good agreement with those observed numerically. The numerical convergence rates
            control               control sensitivity
  q     predicted  observed     predicted  observed
  1        —        0.9132         —        0.4960
  2      0.7500     0.7476       0.2500     0.2481
  4      0.6250     0.6221       0.1250     0.1214
  8      0.5625     0.5571       0.0625     0.0565
  ∞      0.5000     0.5000         —          —

Table 5.1. Predicted and observed convergence rates in different Lq norms for the control and its sensitivity.
are estimated from
$$\frac{\log\dfrac{\|u(p,\mu_1) - u(p,0)\|_{L^q}}{\|u(p,\mu_2) - u(p,0)\|_{L^q}}}{\log\dfrac{\mu_1}{\mu_2}} \tag{5.1}$$
and the same expression with u replaced by up , where µ1 and µ2 are the smallest and
the middle value of the sequence of µ values used. The corresponding rates for the
multiplier are identical. Our theory does not provide Lq estimates for q < 2. However,
since exact solutions are available here, we can calculate
$$\|u(p,\mu) - u(p,0)\|_{L^1} = \frac12\Big(\sqrt{1+4\mu} - 1\Big) + \mu\,\ln\frac{\sqrt{1+4\mu}+1}{\sqrt{1+4\mu}-1},$$
$$\|u_p(p,\mu) - u_p(p,0)\|_{L^1} = 1 + \sqrt{4\mu} - \sqrt{1+4\mu}.$$
Hence the L1 convergence orders approach 1 and 1/2, respectively, as µ ↘ 0, see Table 5.1.
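To make the rate study easy to reproduce, here is a minimal Python sketch (not the code used to produce Table 5.1) that evaluates the closed-form expressions above on a grid of 2^12 midpoints, approximates the Lq errors by midpoint quadrature, and applies formula (5.1); the grid and the set of µ values are illustrative assumptions.

    import numpy as np

    N, p = 2**12, 0.0
    h = 2.0 / N
    x = -1.0 + h * (np.arange(N) + 0.5)          # midpoints of a uniform grid on (-1, 1)

    def u(mu):                                   # interior point solution; mu = 0 gives max(0, x + p)
        return 0.5 * (p + x) + 0.5 * np.sqrt((p + x)**2 + 4.0 * mu)

    def up(mu):                                  # parametric sensitivity u_p(p, mu)
        if mu == 0.0:
            return (x + p > 0).astype(float)
        return 0.5 + 0.5 * (p + x) / np.sqrt((p + x)**2 + 4.0 * mu)

    def lq_err(f, q):                            # midpoint-rule approximation of the L^q norm
        return np.abs(f).max() if q == np.inf else (h * np.sum(np.abs(f)**q))**(1.0 / q)

    mus = np.logspace(-1, -6, 11)                # mu = 1e-1, ..., 1e-6
    for q in (1, 2, 4, 8, np.inf):
        err = [lq_err(u(mu) - u(0.0), q) for mu in mus]
        i_mid, i_min = len(mus) // 2, len(mus) - 1
        # observed rate (5.1): mu_1 the smallest, mu_2 the middle value of the sequence
        rate = np.log(err[i_min] / err[i_mid]) / np.log(mus[i_min] / mus[i_mid])
        print(f"q = {q}: observed control rate {rate:.4f}")

Replacing u by up in the loop gives the corresponding sensitivity rates; up to discretization effects, the numbers should be close to the observed column of Table 5.1.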
5.2. An Optimal Control Example. In this section, we consider a linear-quadratic
optimal control problem involving an elliptic partial differential equation:
$$\min_u\ J(u; p) = \frac12\,\|Su - y_d + p\|_{L^2}^2 + \frac\alpha2\,\|u\|_{L^2}^2 \quad\text{s.t. } u - a \ge 0 \ \text{ and } \ b - u \ge 0,$$
where Ω = (0, 1) ⊂ R and y = Su is the unique solution of the Poisson equation
$$-\Delta y = u \ \text{ on } \Omega, \qquad y(0) = y(1) = 0.$$
The linear solution operator maps u ∈ L2 into Su ∈ H 2 ∩ H01 . Moreover, S ⋆ = S
holds and K = S ⋆ S is compact from L2 into L∞ so that the problem fits into our
setting. To complete the problem specification, we choose α = 10−4 , a ≡ −40, b ≡ 40
and yd = sin(3πx) as desired state. The reference parameter is p = 0. The presence
of upper and lower bounds for the control requires a straightforward extension of our
convergence results which is readily obtained and verified by this example.
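To fix ideas, the following Python sketch shows how such a computation can be set up; it is an illustrative sketch, not the implementation used for the reported results. The Poisson operator is discretized by the standard 3-point stencil (as described in the next paragraph), S and K = S⋆S are formed as dense matrices, and the relaxed system (4.1) is solved by an unglobalized Newton iteration that is warm-started along the central path. For brevity only the lower bound g(u) = u − a is enforced here, and the uniform quadrature weight of the discrete L2 inner product is omitted (it merely rescales the multiplier); all names and numerical choices are assumptions made for the sketch.

    import numpy as np

    n = 512                                        # interior grid points on (0, 1)
    h = 1.0 / (n + 1)
    x = h * np.arange(1, n + 1)
    alpha, a, p = 1e-4, -40.0, 0.0
    yd = np.sin(3.0 * np.pi * x)                   # desired state

    # 3-point stencil for -y'' with homogeneous Dirichlet conditions, y = S u
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    S = np.linalg.inv(A)
    K = S.T @ S                                    # discrete K = S* S
    f = S.T @ (p - yd)                             # fixed part of J_u(u; p)

    def relaxed_solution(mu, u, eta, tol=1e-10, maxit=50):
        """Newton's method for the relaxed optimality system (4.1):
           K u + alpha u + f - eta = 0,  (u - a) * eta - mu = 0."""
        for _ in range(maxit):
            r1 = K @ u + alpha * u + f - eta
            r2 = (u - a) * eta - mu
            if max(np.abs(r1).max(), np.abs(r2).max()) < tol:
                break
            J = np.block([[K + alpha * np.eye(n), -np.eye(n)],
                          [np.diag(eta),           np.diag(u - a)]])
            step = np.linalg.solve(J, -np.concatenate([r1, r2]))
            u, eta = u + step[:n], eta + step[n:]
        return u, eta

    u, eta = np.zeros(n), np.ones(n)
    for mu in np.logspace(-1, -7, 13):             # follow the central path mu -> 0
        u, eta = relaxed_solution(mu, u, eta)

In this setting the parametric sensitivity (up, ηp) for the scalar parameter p is obtained from one extra linear solve with the final Newton matrix and right-hand side (−Sᵀ𝟙, 0), since Jup = Sᵀ𝟙 when p enters the tracking term additively; this corresponds to the single additional Newton step per sensitivity problem mentioned below. Enforcing the upper bound b − u ≥ 0 as in the reported results requires a second multiplier and is omitted in this sketch.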
To illustrate our results, we discretize the problem using the standard 3-point finite
difference stencil on a uniform grid with 512 points. The interior point relaxed problem is solved for a sequence of duality gap parameters µ ∈ [10−7 , 10−1 ] by applying
Newton’s method to the discretized optimality system. The corresponding sensitivity
problems require only one additional Newton step each since p ∈ R. To obtain a reference solution, the unrelaxed problem for µ = 0 is solved using a primal-dual active
set strategy [1, 5], which is also used to find the solution of the sensitivity problem at
µ = 0. The sequence of solutions u(p, µ) and sensitivity derivatives up (p, µ) is shown in
Figure 5.3. As in the previous example, the error of the solution ku(p, µ) − u(p, 0)kLq
and the sensitivities kup (p, µ) − up (p, 0)kLq in different Lq norms are given in the
Figure 5.3. Interior point solutions (left) and their sensitivities
(right) for µ ∈ [10−7 , 10−1 ].
Figure 5.4. Convergence behavior of solutions (left) and their sensitivities (right) for q ∈ {2, 4, 8, ∞}.
double logarithmic Figure 5.4. In order to compare the predicted convergence rates
with the observed ones, we need to estimate the exponent r in the strong complementarity Assumption 4.4. To this end, we analyze the discrete solution u(p, 0) together
with its Lagrange multiplier η(p, 0) = Ju (u(p, 0); p) whose positive and negative parts
are multipliers for the lower and upper constraints, respectively. A finite sequence of
estimates is generated according to
$$r_n \approx \frac{\log\big(|\Omega_n| / |\Omega_{\min}|\big)}{\log\big(\epsilon_n / \epsilon_{\min}\big)},$$
where ǫmin is the smallest value of ǫ > 0 such that {x ∈ Ω : u(p, 0) − a + η⁺(p, 0) ≤ ǫ}
contains 10 grid points. |Ωmin | is the measure of the corresponding set. Similarly, we
define ǫmax as the maximum value of u(p, 0) − a + η⁺(p, 0) on Ω and
$$\epsilon_n = \exp\Big(\log(\epsilon_{\min}) + \frac{n}{20}\,\big(\log(\epsilon_{\max}) - \log(\epsilon_{\min})\big)\Big), \qquad n = 0, \dots, 20.$$
|Ωn | is again the measure of the corresponding set. For the current example, we obtain
the sequence {rn } shown in Figure 5.5. From the slope of the line in the left part of
Figure 5.5. Sequence of estimates rn for the exponent in the strong
complementarity assumption.
            control               control sensitivity        state       state sensitivity
  q     predicted  observed     predicted  observed        observed        observed
  1        —        0.8403         —        0.4894          0.8731          0.5096
  2      0.7500     0.7136       0.2500     0.2470          0.8739          0.4934
  4      0.6250     0.5961       0.1250     0.1169          0.8739          0.4710
  8      0.5625     0.5387       0.0625     0.0484          0.8765          0.4482
  ∞      0.5000     0.4978         —          —             0.8801          0.4015

Table 5.2. Predicted and observed convergence rates in different Lq norms for the control and its sensitivity, and observed rates for the state and its sensitivity.
the figure, we deduce the estimate r = 1. The same result is found for the upper
bound.
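As an illustration of the recipe above, a possible implementation of the estimates rn for the lower bound reads as follows (a sketch with hypothetical names; u and eta denote the discrete solution and multiplier on a uniform grid of mesh size h):

    import numpy as np

    def estimate_r(u, eta, a, h, levels=20, n_min=10):
        """Estimates r_n = log(|Omega_n|/|Omega_min|) / log(eps_n/eps_min), where
           Omega_n = {x : u - a + eta_plus <= eps_n}, following the recipe above."""
        s = u - a + np.maximum(eta, 0.0)          # u(p,0) - a + eta^+(p,0) on the grid
        eps_min = np.sort(s)[n_min - 1]           # smallest eps whose sublevel set has 10 points
        eps_max = s.max()
        omega_min = n_min * h                     # measure |Omega_min|
        rates = []
        for n in range(1, levels + 1):            # n = 0 is excluded (eps_n = eps_min)
            eps_n = np.exp(np.log(eps_min) + n / levels * (np.log(eps_max) - np.log(eps_min)))
            omega_n = np.count_nonzero(s <= eps_n) * h
            rates.append(np.log(omega_n / omega_min) / np.log(eps_n / eps_min))
        return np.array(rates)

The upper bound is treated analogously with b − u and the negative part η⁻ of the multiplier.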
Table 5.2 shows again the predicted and observed convergence rates for the control
and its sensitivity, as well as the observed rates for the state y = Su and its sensitivity.
All observed rates are estimated using (5.1) with µ1 and µ2 being the two smallest
nonzero values of µ used. Again, the observed convergence rates for the control are in
good agreement with the predicted ones and confirm our analysis for q ∈ [2, ∞]. Since
in 1D, the solution operator S is continuous from L1 to L∞ , the observed rates for the
control in L1 carry over to the state variables in Lq for all q ∈ [2, ∞], and likewise to
the adjoint states. Similarly, the L1 rates for the control sensitivity carry over to the
Lq rates for the state and adjoint sensitivities.
5.3. A Regularized Obstacle Problem. Here we consider the obstacle problem
$$\min_{u\in H_0^1}\ \|\nabla u\|_{L^2}^2 + p\,\langle u, l\rangle \quad\text{s.t. } u \ge -1 \tag{5.2}$$
on Ω = (0, 1)² ⊂ R², which, however, does not fit into the theoretical frame set in Section 2. Formally dualizing (5.2) leads to
$$\min_{\eta\in H^{-1}}\ \langle\eta, -\Delta^{-1}\eta\rangle + p\,\langle\eta, \Delta^{-1}l\rangle \quad\text{s.t. } \eta \ge 0,$$
Figure 5.6. Interior point solution u(µ) (left) and sensitivities up (µ)
(right) for the regularized obstacle problem at µ = 5.7 · 10−4 .
where ∆ : H01 → H−1 denotes the Laplace operator. Adding a regularization term for the Lagrange multiplier η, we obtain
$$\min_{\eta\in L^2}\ \langle\eta, -\Delta^{-1}\eta\rangle + p\,\langle\eta, \Delta^{-1}l\rangle + \frac\alpha2\,\|\eta\|_{L^2}^2 \quad\text{s.t. } \eta \ge 0. \tag{5.3}$$
This dualized and regularized variant of the original obstacle problem (5.2) fits into
the theoretical frame presented above. The original constraint u + 1 is the Lagrange
multiplier associated to (5.3). For the numerical results we choose α = 1, p = 1, and an
arbitrary linear term l = 45(2 sin(xy)+sin(−10x) cos(8y−1.25)), which results in a nice
nonsymmetric contact region. The problem has been discretized on a uniform Cartesian
grid of 512×512 points using the standard 5-point finite difference stencil. Intermediate
iterates and sensitivities computed on a coarser grid are shown in Figure 5.6. The
convergence behaviour is illustrated in Figure 5.7. Again, the observed convergence
rates are in good agreement with the predicted values for r = 1. For larger values
of q the numerical convergence rate of up (µ) is greater than predicted. This can be
attributed to the discretization, since for very small µ the linear convergence to the
solution of the discretized problem is observed.
References
[1] M. Bergounioux, K. Ito, and K. Kunisch. Primal-dual strategy for constrained optimal control
problems. SIAM Journal on Control and Optimization, 37(4):1176–1194, 1999.
[2] D. Braess and C. Blömer. A multigrid method for a parameter dependent problem in solid
mechanics. Numerische Mathematik, 57:747–761, 1990.
[3] A. Dontchev. Implicit function theorems for generalized equations. Math. Program., 70:91–106,
1995.
[4] R. Griesse. Parametric sensitivity analysis in optimal control of a reaction-diffusion system—
Part I: Solution differentiability. Numerical Functional Analysis and Optimization, 25(1–2):93–
117, 2004.
[5] M. Hintermüller, K. Ito, and K. Kunisch. The primal-dual active set strategy as a semismooth
Newton method. SIAM Journal on Optimization, 13(3):865–888, 2002.
[6] M. Hintermüller and K. Kunisch. Path-following methods for a class of constrained minimization problems in function space. SIAM Journal on Optimization, 17:159–187, 2006.
[7] M. Hinze. A variational discretization concept in control constrained optimization: the linear-quadratic case. Computational Optimization and Applications, 30:45–63, 2005.
[8] K. Malanowski. Sensitivity analysis for parametric optimal control of semilinear parabolic equations. Journal of Convex Analysis, 9(2):543–561, 2002.
[9] K. Malanowski. Solution differentiability of parametric optimal control for elliptic equations. In
E. W. Sachs and R. Tichatschke, editors, System Modeling and Optimization XX, Proceedings
of the 20th IFIP TC 7 Conference, pages 271–285. Kluwer Academic Publishers, 2003.
[10] U. Prüfert, F. Tröltzsch, and M. Weiser. The convergence of an interior point method for an
elliptic control problem with mixed control-state constraints. Technical Report 36–2004, Institute
of Mathematics, TU Berlin, Germany, 2004.
Figure 5.7. Numerically observed convergence rates of interior point
iterates (top markers) and sensitivities (bottom markers) for different
values of q ∈ [1, 1000]. Thin lines denote the analytically predicted
values.
[11] M. Ulbrich. Semismooth Newton methods for operator equations in function spaces. SIAM Journal on Optimization, 13:805–842, 2003.
[12] M. Ulbrich and S. Ulbrich. Superlinear convergence of affine-scaling interior-point Newton methods for infinite-dimensional nonlinear problems with pointwise bounds. SIAM Journal on Control
and Optimization, 38(6):1938–1984, 2000.
[13] M. Väth. Integration Theory. A second course. World Scientific, Singapore, 2002.
[14] M. Weiser. Interior point methods in function space. SIAM Journal on Control and Optimization,
44(5):1766–1786, 2005.
[15] M. Weiser, T. Gänzler, and A. Schiela. Control reduced primal interior point methods. Report
04-38, ZIB, 2004.
Appendix A. An Implicit Function Theorem
For the sake of easy reference we state here an implicit function theorem which is
an adaptation of [3, Theorem 2.4].
Theorem A.1 (Implicit Function Theorem). Let X be a Banach space and let P, Z
be normed linear spaces. Suppose that G : X × P → Z is a function and N : X → Z
is a set-valued map. Let u∗ ∈ X be a solution to
$$0 \in G(u, p) + N(u) \tag{A.1}$$
for p = p∗, and let W be a neighborhood of 0 ∈ Z. Suppose that
(i) G is Lipschitz in p, uniformly in u at (u∗ , p∗ ), and G(u∗ , ·) is directionally
differentiable at p∗ with directional derivative Dp (G(u∗ , p∗ ); δp) for all δp ∈ P ,
(ii) G is partially Fréchet differentiable with respect to u in a neighborhood of
(u∗ , p∗ ), and its partial derivative Gu is continuous in both u and p at (u∗ , p∗ ),
(iii) there exists a function ξ : W → X such that ξ(0) = u∗ , δ ∈ G(u∗ , p∗ ) +
Gu (u∗ , p∗ )(ξ(δ) − u∗ ) + N (ξ(δ)) for all δ ∈ W, and ξ is Lipschitz continuous.
Then there exist neighborhoods U of u∗ and V of p∗ and a function p 7→ u(p) from V
to U such that u(p∗ ) = u∗ , u(p) is a solution of (A.1) for every p ∈ V , and u(·) is
Lipschitz continuous.
If, in addition, X̂ ⊃ X is a normed linear space such that
(iv) ξ : W → X̂ is directionally differentiable at 0 with derivative Dξ(0; δ̂) for all δ̂ ∈ Z,
then p ↦ u(p) ∈ X̂ is also directionally differentiable at p∗ and the derivative is given by Dξ(0; −Dp G(u∗, p∗; δp)) for all δp ∈ P.
Appendix B. A Saddle Point Lemma
For convenience we state here the saddle point lemma by Braess and Blömer [2,
Lemma B.1].
Lemma B.1. Let V and M be Hilbert spaces. Assume the following conditions hold:
(1) The continuous linear operator B : V → M ∗ satisfies the inf-sup-condition:
There exists a constant β > 0 such that
$$\inf_{\zeta\in M}\ \sup_{v\in V}\ \frac{\langle\zeta, Bv\rangle}{\|v\|_V\,\|\zeta\|_M} \;\ge\; \beta.$$
(2) The continuous linear operator A : V → V ∗ is symmetric positive definite on
the nullspace of B and positive semidefinite on the whole space V : There exists
a constant α > 0 such that
$$\langle v, Av\rangle \ge \alpha\,\|v\|_V^2 \ \text{ for all } v\in\ker B \qquad\text{and}\qquad \langle v, Av\rangle \ge 0 \ \text{ for all } v\in V.$$
(3) The continuous linear operator D : M → M ∗ is symmetric positive semidefinite.
Then, the operator
$$\begin{pmatrix} A & B^* \\ B & -D \end{pmatrix} : V\times M \to V^*\times M^*$$
is invertible. The inverse is bounded by a constant depending only on α, β, and the
norms of A, B, and D.
Bibliography
W. Alt. The Lagrange-Newton method for infinite-dimensional optimization problems.
Numerical Functional Analysis and Optimization, 11:201–224, 1990.
W. Alt. Local convergence of the Lagrange-Newton method with applications to
optimal control. Control and Cybernetics, 23(1–2):87–105, 1994.
W. Alt, R. Griesse, N. Metla, and A. Rösch. Lipschitz stability for elliptic optimal
control problems with mixed control-state constraints. submitted, 2006.
R. Becker and B. Vexler. Mesh refinement and numerical sensitivity analysis for
parameter calibration of partial differential equations. Journal of Computational
Physics, 206(1):95–110, 2005.
M. Bergounioux, K. Ito, and K. Kunisch. Primal-dual strategy for constrained optimal
control problems. SIAM Journal on Control and Optimization, 37(4):1176–1194,
1999.
F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems. Springer,
Berlin, 2000.
K. Brandes and R. Griesse. Quantitative stability analysis of optimal solutions in PDE-constrained optimization. Journal of Computational and Applied Mathematics, 206
(2):908–926, 2007. doi: http://dx.doi.org/10.1016/j.cam.2006.08.038.
E. Casas. Control of an elliptic problem with pointwise state constraints. SIAM
Journal on Control and Optimization, 24(6):1309–1318, 1986.
A. Dontchev. Implicit function theorems for generalized equations. Mathematical
Programming, 70:91–106, 1995.
R. Griesse. Lipschitz stability of solutions to some state-constrained elliptic optimal
control problems. Journal of Analysis and its Applications, 25:435–455, 2006.
R. Griesse and B. Vexler. Numerical sensitivity analysis for the quantity of interest
in PDE-constrained optimization. SIAM Journal on Scientific Computing, 29(1):
22–48, 2007.
R. Griesse and S. Volkwein. A primal-dual active set strategy for optimal boundary
control of a nonlinear reaction-diffusion system. SIAM Journal on Control and Optimization, 44(2):467–494, 2005. doi: http://dx.doi.org/10.1137/S0363012903438696.
R. Griesse and S. Volkwein. Analysis for optimal boundary control for a three-dimensional reaction-diffusion system. Report No. 277, Special Research Center
F003, Project Area II: Continuous Optimization and Control, University of Graz &
Technical University of Graz, Austria, 2003.
R. Griesse and S. Volkwein. Parametric sensitivity analysis for optimal boundary
control of a 3D reaction-diffusion system. In G. Di Pillo and M. Roma, editors,
Large-Scale Nonlinear Optimization, volume 83 of Nonconvex Optimization and its
Applications, pages 127–149, Berlin, 2006. Springer.
R. Griesse and M. Weiser.
On the interplay between interior point
approximation and parametric sensitivities in optimal control.
Journal
of Mathematical Analysis and Applications, 337(2):771–793, 2008.
doi:
http://dx.doi.org/10.1016/j.jmaa.2007.03.106.
R. Griesse, M. Hintermüller, and M. Hinze.
Differential stability of control
constrained optimal control problems for the Navier-Stokes equations.
Numerical Functional Analysis and Optimization, 26(7–8):829–850, 2005.
doi:
http://dx.doi.org/10.1080/01630560500434278.
R. Griesse, N. Metla, and A. Rösch. Local quadratic convergence of SQP for elliptic
optimal control problems with mixed control-state constraints. submitted, 2007.
R. Griesse, T. Grund, and D. Wachsmuth. Update strategies for perturbed nonsmooth
equations. Optimization Methods and Software, to appear.
M. Heinkenschloss and F. Tröltzsch. Analysis of the Lagrange-SQP-Newton Method
for the Control of a Phase-Field Equation. Control Cybernet., 28:177–211, 1998.
M. Hintermüller and M. Hinze. An SQP Semi-Smooth Newton-Type Algorithm Applied to the Instationary Navier-Stokes System Subject to Control Constraints.
SIAM Journal on Optimization, 16(4):1177–1200, 2006.
M. Hintermüller, K. Ito, and K. Kunisch. The primal-dual active set strategy as a
semismooth Newton method. SIAM Journal on Optimization, 13(3):865–888, 2002.
K. Malanowski. Sensitivity analysis for parametric optimal control of semilinear parabolic equations. Journal of Convex Analysis, 9(2):543–561, 2002.
K. Malanowski. Solution differentiability of parametric optimal control for elliptic
equations. In E. W. Sachs and R. Tichatschke, editors, System Modeling and Optimization XX, Proceedings of the 20th IFIP TC 7 Conference, pages 271–285. Kluwer
Academic Publishers, 2003a.
K. Malanowski. Remarks on differentiability of metric projections onto cones of nonnegative functions. Journal of Convex Analysis, 10(1):285–294, 2003b.
K. Malanowski and F. Tröltzsch. Lipschitz stability of solutions to parametric optimal
control for elliptic equations. Control and Cybernetics, 29:237–256, 2000.
K. Malanowski and F. Tröltzsch. Lipschitz stability of solutions to parametric optimal
control for parabolic equations. Journal of Analysis and its Applications, 18(2):
469–489, 1999.
S. Robinson. Strongly regular generalized equations. Mathematics of Operations Research, 5(1):43–62, 1980.
A. Rösch and F. Tröltzsch. Existence of regular Lagrange multipliers for a nonlinear
elliptic optimal control problem with pointwise control-state constraints. SIAM
Journal on Control and Optimization, 45(2):548–564, 2006.
T. Roubíček and F. Tröltzsch. Lipschitz stability of optimal controls for the steady-state Navier-Stokes equations. Control and Cybernetics, 32(3):683–705, 2003.
F. Tröltzsch. Lipschitz stability of solutions of linear-quadratic parabolic control problems with respect to perturbations. Dynamics of Continuous, Discrete and Impulsive
Systems Series A Mathematical Analysis, 7(2):289–306, 2000.
F. Tröltzsch. Optimale Steuerung partieller Differentialgleichungen. Vieweg, Wiesbaden, 2005.
F. Tröltzsch. On the Lagrange-Newton-SQP method for the optimal control of semilinear parabolic equations. SIAM Journal on Control and Optimization, 38(1):294–312,
1999.
F. Tröltzsch and S. Volkwein. The SQP method for control constrained optimal control
of the Burgers equation. ESAIM: Control, Optimisation and Calculus of Variations,
6:649–674, 2001.
M. Ulbrich. Semismooth Newton methods for operator equations in function spaces.
SIAM Journal on Optimization, 13(3):805–842, 2003.
A. Unger. Hinreichende Optimalitätsbedingungen zweiter Ordnung und Konvergenz
des SQP-Verfahrens für semilineare elliptische Randsteuerprobleme. PhD thesis,
Chemnitz University of Technology, Germany, 1997.
D. Wachsmuth. Regularity and stability of optimal controls of instationary Navier-Stokes equations. Control and Cybernetics, 34:387–410, 2005.
D. Wachsmuth. Analysis of the SQP method for optimal control problems governed
by the instationary Navier-Stokes equations based on Lp theory. SIAM Journal on
Control and Optimization, 46:1133–1153, 2007.