KKT optimality conditions
Dimitar Dimitrov
Örebro University
May, 2011
1 / 35
Topics addressed in this material
KKT (first-order) necessary conditions
nonlinear equality constraints
nonlinear inequality constraints
second-order necessary conditions (nonlinear constraints)
Fritz John (first-order) necessary conditions
2 / 35
KKT necessary conditions (linear constraints) - recap
Consider the problem
minimize   f0 (x)
  x∈Rn
subject to   Ax ≤ b,   A ∈ Rmi×n ,
             Cx = d,   C ∈ Rme×n .
Let x⋆ be a local minimizer of the above problem, where f0 is continuously
differentiable from Rn to R. Then the following conditions are satisfied.
First-order necessary conditions
Ax⋆ ≤ b,                                   primal feasibility condition
Cx⋆ = d,                                   primal feasibility condition
λ⋆ ≥ 0,                                    dual feasibility condition
(Ax⋆ − b)T λ⋆ = 0,                         complementarity condition
∇f0 (x⋆ ) + AT λ⋆ + CT ν ⋆ = 0,            stationarity condition
Additional assumption
If in addition, we assume that the normals to the active constraints at x⋆ are linearly
independent, then the Lagrange multipliers ν ⋆ and λ⋆ that satisfy the above
conditions are unique [7], pp. 316.
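As an illustration (a minimal numerical sketch of my own, not taken from the slides), the five conditions above can be checked mechanically once a candidate (x⋆ , λ⋆ , ν ⋆ ) is available. The toy problem below minimizes 0.5‖x‖² subject to −x1 ≤ 0 and x1 + x2 = 1, which has x⋆ = (0.5, 0.5), λ⋆ = 0, ν ⋆ = −0.5.

# A minimal sketch (illustrative toy problem, not from the slides):
# minimize 0.5*||x||^2  subject to  -x1 <= 0  and  x1 + x2 = 1.
import numpy as np

A = np.array([[-1.0, 0.0]])    # inequality constraints:  A x <= b
b = np.array([0.0])
C = np.array([[1.0, 1.0]])     # equality constraints:    C x  = d
d = np.array([1.0])

x   = np.array([0.5, 0.5])     # candidate minimizer
lam = np.array([0.0])          # candidate multiplier for A x <= b
nu  = np.array([-0.5])         # candidate multiplier for C x  = d

grad_f0 = x                    # gradient of 0.5*||x||^2 at x

print(np.all(A @ x <= b + 1e-12), np.allclose(C @ x, d))   # primal feasibility
print(np.all(lam >= 0))                                    # dual feasibility
print(np.isclose((A @ x - b) @ lam, 0.0))                  # complementarity
print(np.allclose(grad_f0 + A.T @ lam + C.T @ nu, 0.0))    # stationarity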
3 / 35
Stationarity condition (linear constraints) - restatement
By noting that the gradients of the linear constraints hi (x) = ciT x − di = 0 and
fi (x) = aiT x − bi ≤ 0 are given by ∇hi = ci and ∇fi = ai , respectively,
we can restate the stationarity condition as

∇f0 (x⋆ ) + ∑_{i=1}^{mi} λ⋆i ∇fi + ∑_{i=1}^{me} νi⋆ ∇hi = 0,

or

∇f0 (x⋆ ) + ∑_{i∈A(x⋆ )} λ⋆i ∇fi + ∑_{i=1}^{me} νi⋆ ∇hi = 0,

where A(x⋆ ) is a set containing the indexes of the active inequality constraints at x⋆ .
Using the Lagrangian function
The stationarity condition can be stated as
∇x L(x⋆ , λ⋆ , ν ⋆ ) = 0,
where L(x, λ, ν) is the Lagrangian function given by
L(x, λ, ν) = f0 (x) + ∑_{i=1}^{mi} λi fi (x) + ∑_{i=1}^{me} νi hi (x).
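A minimal sketch of this statement in code (the numbers below are illustrative, not from the slides): ∇x L is simply the objective gradient plus the multiplier-weighted constraint gradients.

import numpy as np

def grad_lagrangian_x(grad_f0, grad_f, grad_h, lam, nu):
    # grad_f0: (n,); grad_f: (mi, n) with rows ∇fi; grad_h: (me, n) with rows ∇hi
    return grad_f0 + grad_f.T @ lam + grad_h.T @ nu

# Toy data: one equality constraint, no inequality constraints.
grad_f0 = np.array([1.0, 1.0])
grad_h  = np.array([[-2.0, -2.0]])
print(grad_lagrangian_x(grad_f0, np.zeros((0, 2)), grad_h, np.zeros(0), np.array([0.5])))
# prints [0. 0.]: the stationarity condition ∇x L = 0 holds for these values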
4 / 35
KKT necessary conditions (nonlinear constraints)
Regular point
A feasible vector x̃ at which the gradients of the active constraints are linearly
independent is called regular [7], pp. 285.
Consider the problem
minimize   f0 (x)
  x∈Rn
subject to   fi (x) ≤ 0,   i = 1, . . . , mi ,
             hi (x) = 0,   i = 1, . . . , me .
Let x⋆ be a local minimizer of the above problem, where fi and hi are continuously
differentiable from Rn to R (for all i). Then, if x⋆ is regular, there exist unique
Lagrange multipliers λ⋆ = (λ⋆1 , . . . , λ⋆mi ), ν ⋆ = (ν1⋆ , . . . , νme⋆ ) that satisfy the
following conditions [7], pp. 316.
First-order conditions
fi (x⋆ ) ≤ 0,         i = 1, . . . , mi ,     primal feasibility condition
hi (x⋆ ) = 0,         i = 1, . . . , me ,     primal feasibility condition
λ⋆i ≥ 0,              i = 1, . . . , mi ,     dual feasibility condition
λ⋆i fi (x⋆ ) = 0,     i = 1, . . . , mi ,     complementarity condition
∇f0 (x⋆ ) + ∑_{i=1}^{mi} λ⋆i ∇fi (x⋆ ) + ∑_{i=1}^{me} νi⋆ ∇hi (x⋆ ) = 0,     stationarity condition.
5 / 35
KKT necessary conditions (nonlinear constraints) - summary
The problem
minimize   f0 (x)
  x∈Rn
subject to   fi (x) ≤ 0,   i = 1, . . . , mi ,
             hi (x) = 0,   i = 1, . . . , me .
First-order conditions
fi (x⋆ ) ≤ 0,         i = 1, . . . , mi ,     primal feasibility condition
hi (x⋆ ) = 0,         i = 1, . . . , me ,     primal feasibility condition
λ⋆i ≥ 0,              i = 1, . . . , mi ,     dual feasibility condition
λ⋆i fi (x⋆ ) = 0,     i = 1, . . . , mi ,     complementarity condition
∇f0 (x⋆ ) + ∑_{i=1}^{mi} λ⋆i ∇fi (x⋆ ) + ∑_{i=1}^{me} νi⋆ ∇hi (x⋆ ) = 0,     stationarity condition.
KKT (necessary) conditions for x⋆ to be a local minimizer for the problem
x⋆ is regular
there exist unique Lagrange multipliers λ⋆ = (λ⋆1 , . . . , λ⋆mi ), ν ⋆ = (ν1⋆ , . . . , νme⋆ )
that satisfy the first-order conditions
6 / 35
Example (nonlinear constraints)
(Figure: level sets of f0 in the (x1 , x2 )-plane, the constraint circle, and the normalized gradients ∇f0 (x⋆ ) and ∇h(x⋆ ) at x⋆ .)
Consider the problem ([3], pp. 308, [7], pp. 284)
minimize   f0 (x) = x1 + x2
  x∈R2
subject to   h(x) = x1² + x2² − 2 = 0.
This is a problem with a linear objective function f0 (x) and one nonlinear equality
constraint h(x) = 0. At the solution x⋆ , the gradient of the constraint ∇h(x⋆ ) is
orthogonal to the level set of the objective function at x⋆ , and hence ∇h(x⋆ ) and ∇f0 (x⋆ )
are parallel, i.e., there is a scalar ν ⋆ such that
∇f0 (x⋆ ) + ν ⋆ ∇h(x⋆ ) = 0.
Clearly, in this example x⋆ is regular (because ∇h(x⋆ ) ≠ 0).
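A quick numerical check (my own computation; solving the problem by hand gives x⋆ = (−1, −1) with ν ⋆ = 1/2, values not stated explicitly on the slide):

import numpy as np

x_star  = np.array([-1.0, -1.0])
nu_star = 0.5

grad_f0 = np.array([1.0, 1.0])     # ∇f0 for f0(x) = x1 + x2
grad_h  = 2.0 * x_star             # ∇h  for h(x) = x1² + x2² − 2

print(np.isclose(x_star @ x_star - 2.0, 0.0))         # primal feasibility: h(x⋆) = 0
print(np.allclose(grad_f0 + nu_star * grad_h, 0.0))   # stationarity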
7 / 35
Example (nonlinear constraints)
(Figure: level sets of f0 , the disc f1 (x) ≤ 0, the minimizer x⋆ with the normalized gradients ∇f0 (x⋆ ) and ∇f1 (x⋆ ), and the point x̃ with the normalized gradient ∇f1 (x̃).)
Consider the problem ([3], pp. 310)
minimize   f0 (x) = x1 + x2
  x∈R2
subject to   f1 (x) = x1² + x2² − 2 ≤ 0.
This is a problem with a linear objective function f0 (x) and one nonlinear inequality
constraint f1 (x) ≤ 0. At the solution x⋆ , the gradient of the constraint ∇f1 (x⋆ ) is
orthogonal to the level set of the objective function at x⋆ , and the following equality holds
∇f0 (x⋆ ) + λ⋆ ∇f1 (x⋆ ) = 0,
for λ⋆ = 1/2 ≥ 0. Note that at the point x̃ = (1, 1), ∇f0 (x̃) + λ∇f1 (x̃) = 0 holds as
well, however λ = −1/2 ≤ 0.
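A small check of the multiplier signs at the two points (my own computation; x⋆ = (−1, −1) is the minimizer, x̃ = (1, 1) is the point discussed above):

import numpy as np

grad_f0 = np.array([1.0, 1.0])

for point in (np.array([-1.0, -1.0]), np.array([1.0, 1.0])):
    grad_f1 = 2.0 * point                          # ∇f1 = (2x1, 2x2)
    # least-squares solution of ∇f0 + λ ∇f1 = 0 (here it is solved exactly)
    lam = np.linalg.lstsq(grad_f1.reshape(-1, 1), -grad_f0, rcond=None)[0][0]
    print(point, lam, lam >= 0)                    # λ = 0.5 at x⋆, λ = −0.5 at x̃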
8 / 35
Example (nonlinear constraints)
(Figure: the feasible set {f1 ≤ 0, f2 ≤ 0}, the minimizer x⋆ , and the normalized gradients ∇f0 (x⋆ ), ∇f1 (x⋆ ), ∇f2 (x⋆ ).)
Consider the problem ([3], pp. 313)
minimize   f0 (x) = x1 + x2
  x∈R2
subject to   f1 (x) = x1² + x2² − 2 ≤ 0,
             f2 (x) = −x2 ≤ 0.
At the solution x⋆ = (−√2, 0), −∇f0 (x⋆ ) belongs to the normal cone to the feasible
set at point x⋆ , hence, there is λ⋆ ≥ 0 that satisfies
∇f0 (x⋆ ) + λ⋆1 ∇f1 (x⋆ ) + λ⋆2 ∇f2 (x⋆ ) = 0.
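A sketch (my own computation) of solving for the two multipliers of the active constraints at x⋆ = (−√2, 0); the resulting values λ⋆1 = 1/(2√2) and λ⋆2 = 1 are not stated on the slide.

import numpy as np

x_star  = np.array([-np.sqrt(2.0), 0.0])
grad_f0 = np.array([1.0, 1.0])
grad_f1 = 2.0 * x_star                 # ∇f1 = (2x1, 2x2)
grad_f2 = np.array([0.0, -1.0])        # ∇f2 = (0, −1)

# stationarity: [∇f1 ∇f2] λ = −∇f0
G   = np.column_stack([grad_f1, grad_f2])
lam = np.linalg.solve(G, -grad_f0)
print(lam, np.all(lam >= 0))           # ≈ [0.3536, 1.0], both nonnegative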
9 / 35
We saw that when dealing only with linear constraints, the regularity of x⋆ affects
only the uniqueness of the Lagrange multipliers. In the general case (when nonlinear
constraints are present), however, it can have additional implications (as we show with
the following three examples).
Consider the problem ([2], pp. 78)

minimize   f0 (x) = x1 + x2
  x∈R2
subject to   h1 (x) = (x1 + 1)² + x2² − 1 = 0,
             h2 (x) = (x1 − 1)² + x2² − 1 = 0.

(Figure: level sets of f0 , the two constraint circles, and the gradients ∇h1 (x⋆ ), ∇h2 (x⋆ ) at the single feasible point x⋆ .)

This is a problem with a linear objective function f0 (x) and two nonlinear equality
constraints h1 (x) = 0, h2 (x) = 0. At the solution x⋆ (which is in fact the only
feasible point), no linear combination of the gradients of the two constraints is
equal to ∇f0 (x⋆ ), i.e., there is no ν ⋆ such that

∇f0 (x⋆ ) + ν1⋆ ∇h1 (x⋆ ) + ν2⋆ ∇h2 (x⋆ ) = 0.
Hence, without the assumption that x⋆ is a regular point, the first-order conditions
are clearly not necessary for x⋆ to be a local minimizer.
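Numerically, the nonexistence of multipliers shows up as a nonzero least-squares residual (a sketch of my own, using the single feasible point x⋆ = (0, 0)):

import numpy as np

x_star  = np.array([0.0, 0.0])
grad_f0 = np.array([1.0, 1.0])
grad_h1 = np.array([2.0 * (x_star[0] + 1.0), 2.0 * x_star[1]])   # ( 2, 0)
grad_h2 = np.array([2.0 * (x_star[0] - 1.0), 2.0 * x_star[1]])   # (−2, 0)

G = np.column_stack([grad_h1, grad_h2])
nu, *_ = np.linalg.lstsq(G, -grad_f0, rcond=None)
print(np.linalg.norm(grad_f0 + G @ nu))   # ≈ 1, not 0: no ν satisfies stationarity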
10 / 35
(Figure: the feasible set defined by x2 ≤ x1³ and x2 ≥ 0, the point x⋆ , and the normalized gradient ∇f0 (x⋆ ).)
Consider the problem ([5], pp. 203)
minimize   f0 (x) = x1 + x2
  x∈R2
subject to   f1 (x) = x2 − x1³ ≤ 0,
             f2 (x) = −x2 ≤ 0.
This is a problem with a linear objective function f0 (x), one nonlinear inequality
constraint f1 (x) ≤ 0 and one linear inequality constraint f2 (x) ≤ 0. There are
infinitely many feasible points; however, at x⋆ (where both inequality constraints are
active), there is no λ⋆ satisfying
∇f0 (x⋆ ) + λ⋆1 ∇f1 (x⋆ ) + λ⋆2 ∇f2 (x⋆ ) = 0.
11 / 35
(Figure: the feasible set defined by x1⁵ ≤ x2 ≤ x1³ and x2 ≥ 0, the point x⋆ , and the normalized gradient ∇f0 (x⋆ ).)
Consider the problem ([6], pp. 116)
minimize   f0 (x) = x1 + x2
  x∈R2
subject to   f1 (x) = x2 − x1³ ≤ 0,
             f2 (x) = x1⁵ − x2 ≤ 0,
             f3 (x) = −x2 ≤ 0.
At the solution x⋆ , all inequality constraints are active and their gradients are
collinear. There is no λ⋆ satisfying
∇f0 (x⋆ ) + λ⋆1 ∇f1 (x⋆ ) + λ⋆2 ∇f2 (x⋆ ) + λ⋆3 ∇f3 (x⋆ ) = 0.
12 / 35
(Figure: the feasible set, the maximizer x⋆ = (1, 1), and the normalized gradients ∇f0 (x⋆ ), ∇f1 (x⋆ ), ∇f2 (x⋆ ).)
maximize   f0 (x) = x1 + x2
  x∈R2
subject to   f1 (x) = x2 − x1³ ≤ 0,
             f2 (x) = x1⁵ − x2 ≤ 0,
             f3 (x) = −x2 ≤ 0.
Only the first two inequality constraints are active at the solution x⋆ = (1, 1), which
satisfies the KKT necessary conditions with λ⋆1 = 3, λ⋆2 = 2 and λ⋆3 = 0. This can be
verified (recalling that maximizing f0 is equivalent to minimizing −f0 , so the gradients
of the active constraints must reproduce ∇f0 (x⋆ )) by solving the equation

λ⋆1 ∇f1 (x⋆ ) + λ⋆2 ∇f2 (x⋆ ) = ∇f0 (x⋆ ),   i.e.,   −3λ⋆1 + 5λ⋆2 = 1,   λ⋆1 − λ⋆2 = 1,

which gives λ⋆1 = 3, λ⋆2 = 2.
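The same computation in code (my own sketch; the gradients are evaluated at (1, 1)):

import numpy as np

grad_f0 = np.array([1.0, 1.0])     # ∇f0 at (1, 1)
grad_f1 = np.array([-3.0, 1.0])    # ∇f1 = (−3x1², 1) at (1, 1)
grad_f2 = np.array([5.0, -1.0])    # ∇f2 = (5x1⁴, −1) at (1, 1)

G   = np.column_stack([grad_f1, grad_f2])
lam = np.linalg.solve(G, grad_f0)  # maximization: active gradients reproduce ∇f0
print(lam)                         # [3. 2.]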
13 / 35
What is the source of the problem?
In order to answer the above question, we need to revisit our definition of feasible
directions.
Feasible direction [6], pp. 88
Suppose that we are at a point x̃ ∈ S ⊆ Rn . ∆x ∈ Rn defines a feasible direction at x̃,
if a “small” step in the direction ∆x does not lead outside of the set S. In other words,
∃δ > 0 such that x̃ + α∆x ∈ S, for all α ∈ [0, δ].

(Figure: a feasible point x̃ on the unit circle.)
Wait a minute ... feasible directions?
In the general case, when dealing with nonlinear constraints, there may be no feasible
directions, as defined above. The figure depicts an equality constraint
h(x) = x1² + x2² − 1 = 0.
Even an infinitesimally small step in any direction from a feasible point x̃ would lead
us outside of the set S (i.e., the boundary of the unit circle).
14 / 35
Feasible curves - nonlinear constraints
It is convenient to define vector valued functions h(x) : Rn → Rme and f (x) : Rn → Rmi ,

h(x) = (h1 (x), . . . , hme (x)),   f (x) = (f1 (x), . . . , fmi (x)).
The set of points x such that h(x) = 0 is called a surface.
In order to retain feasibility with respect to a set of nonlinear equality constraints, we
have to move on the surface defined by them. We call a curve on which the function
h(x) remains identically equal to zero a feasible curve (with respect to the equality
constraints).
Directed curve (also known as an arc)
We define an arc passing through a point x̃ as a set of points

{α(θ) : θ ∈ [θ̲, θ̄]},

parametrized by a single variable θ ranging from θ̲ to θ̄. We assume that α(0) = x̃. A
feasible arc should satisfy, for all θ,

h(α(θ)) = 0,   f (α(θ)) ≤ 0.

We will consider only curves that are continuously differentiable, i.e., dα(θ)/dθ exists.
15 / 35
Example (feasible arc) [8], pp. 516
Consider the constraint
h(x) = x1² + x2² + x3² − 3 = 0.
There are infinitely many arcs passing through the feasible point x̃ = (√2, 0, 1).
Some of them are feasible arcs. Two such feasible arcs are depicted on the figure.

 √
√2 cos(θ1 )

α1 (θ1 ) =
2 sin(θ1 )  ,
1
 √

3 cos(θ2 + γ)
,
0
α2 (θ2 ) =  √
3 sin(θ2 + γ)
where γ = √1 is an offest so that
3
α2 (0) = x̃, and θ1 , θ2 ∈ [−π, π].
1.5
1
x̃
0.5
0
−0.5
−1
−1.5
−1.5
−1
1
−0.5
0
0.5
0
1
1.5
−1
One can readily verify that h(α1 (θ1 )) = h(α2 (θ2 )) = 0. The tangents to both curves
at x̃ are depicted as well. They are given by


 √

− 3 sin(γ)
√0
dα
(θ
)
dα1 (θ1 ) 2
2
.
=
=
α̇1 (0) =
2  , α̇2 (0) =
√0
dθ1 θ1 =0
dθ2 θ2 =0
0
3
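A quick numerical check (my own) that both arcs stay on the surface and that their tangents at x̃ agree with the formulas above:

import numpy as np

gamma = np.arcsin(1.0 / np.sqrt(3.0))

def alpha1(t):
    return np.array([np.sqrt(2.0) * np.cos(t), np.sqrt(2.0) * np.sin(t), 1.0])

def alpha2(t):
    return np.array([np.sqrt(3.0) * np.cos(t + gamma), 0.0, np.sqrt(3.0) * np.sin(t + gamma)])

def h(x):
    return x @ x - 3.0

print(max(abs(h(alpha1(t))) for t in np.linspace(-np.pi, np.pi, 9)))   # ~0: feasible
print(max(abs(h(alpha2(t))) for t in np.linspace(-np.pi, np.pi, 9)))   # ~0: feasible

eps = 1e-6                                        # central differences for the tangents
print((alpha1(eps) - alpha1(-eps)) / (2 * eps))   # ≈ (0, √2, 0)
print((alpha2(eps) - alpha2(-eps)) / (2 * eps))   # ≈ (−√3 sin γ, 0, √3 cos γ) = (−1, 0, √2)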
16 / 35
If x⋆ is a local minimizer of f0 , then it is a local minimizer of f0 along any feasible arc
passing through x⋆ [8], pp. 515.
Suppose that α(θ) is any such arc, with α(0) = x⋆ . Then, if θ = 0 is a local
minimizer of the one-dimensional function f0 (α(θ)), the derivative of f0 (α(θ)) with
respect to θ must vanish at θ = 0. Using the chain rule leads to
df0 (α(θ))/dθ |θ=0 = ∇f0 (α(θ))T α̇(θ) |θ=0 = ∇f0 (x⋆ )T α̇(0) = 0.
Let us define the set of all tangents to feasible arcs through x⋆
T (x⋆ ) := {p : p = α̇(0), for some feasible arc α(θ) through x⋆ }.
With the help of T (x⋆ ) we can state the following optimality condition
If x⋆ is a local minimizer of f0 , then
∇f0 (x⋆ )T p = 0,
for all p ∈ T (x⋆ ).
The above condition is not practical, since (in general) it is not easy to represent the
set of all feasible arcs explicitly.
17 / 35
The tangent cone
T (x⋆ ) is called the tangent cone
If we assume (for convenience) that 0 ∈ T (x⋆ ), then the set T (x⋆ ) is a cone. This
can be seen by noting that if p ∈ T (x⋆ ), then µp ∈ T (x⋆ ) for any nonnegative µ. We
require µ to be nonnegative, because our arc has direction.
By noting that “a tangent is a limiting direction of a feasible sequence” [3], pp. 316,
we can define the tangent cone in an alternative way
Tangent cone [7], pp. 343.
Given a subset X of Rn and a vector x ∈ X , a vector p is said to be a tangent of X
at x if either p = 0 or there exists a sequence {x(k) } ⊂ X such that x(k) ≠ x for all
k and

x(k) → x,   (x(k) − x)/‖x(k) − x‖ → p/‖p‖.
The set of all tangents of X at x is called the tangent cone of X at x.
In words, the above definition states that a nonzero vector p is tangent to X at x if it
is possible to approach x with a feasible sequence {x(k) }, such that the normalized
direction sequence x(k) − x converges to the normalized direction p.
18 / 35
Example (tangent cone, equality constraint) [3], pp. 317
Consider again the equality constraint
h(x) = x1² + x2² − 1 = 0.
The figure shows two feasible sequences approaching x̃ (one depicted in red and the
other one in green). The green and red arrows represent
(x(k) − x)/‖x(k) − x‖,
for some choices of feasible points x(k) (depicted with green and red squares).

(Figure: the unit circle, the point x̃, the normalized gradient ∇h(x̃), and the two feasible sequences with their normalized directions.)
Any vector of the form p = (0, δ), where δ ∈ R, is tangent to a feasible sequence at x̃.
Observation
In this particular example, the tangent cone T (x̃) is equivalent to the set of vectors v
that satisfy
∇h(x̃)T v = 0.
Or in other words, T (x̃) = N (∇h(x̃)T ).
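A numerical illustration of the “limiting direction of a feasible sequence” view (my own sketch; x̃ = (−1, 0) is an assumed placement consistent with the tangents p = (0, δ) above):

import numpy as np

x_tilde = np.array([-1.0, 0.0])
for k in range(1, 6):
    t   = 1.0 / 2**k                              # shrinking angle away from x̃
    x_k = np.array([-np.cos(t), np.sin(t)])       # feasible: stays on the circle
    p   = (x_k - x_tilde) / np.linalg.norm(x_k - x_tilde)
    print(k, p)                                   # converges to (0, 1)

print(2.0 * x_tilde)                              # ∇h(x̃) = (−2, 0); note (0, 1) ⟂ ∇h(x̃)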
19 / 35
Example (tangent cone, equality constraints)
Consider again the example where only one feasible point x̃ = (0, 0) exists. In this
case, the tangent cone is T (x̃) = {0}, since there are no feasible sequences that
converge to x̃.

(Figure: the two constraint circles, the single feasible point x̃, and the gradients ∇h1 (x̃), ∇h2 (x̃).)

Only the vector p = (0, 0) is “tangent” to a “feasible sequence” at x̃.
Observation
In this particular example, the tangent cone T (x̃) is not equivalent to the set of
vectors v that satisfy
∇h1 (x̃)T v = 0,   ∇h2 (x̃)T v = 0.
20 / 35
Example (tangent cone, inequality constraint), [3], pp. 318
Consider again the feasible set defined by the constraint
f1 (x) = x1² + x2² − 1 ≤ 0.
Three feasible sequences approaching x̃ are depicted. Any feasible arc must satisfy
f1 (α(θ)) ≤ 0, for all θ.

(Figure: the unit disc, the point x̃, the normalized gradient ∇f1 (x̃), and three feasible sequences approaching x̃.)
All vectors of the form p = (δ1 , δ2 ), where δ1 ∈ R+ , are tangent to a feasible
sequence at x̃.
Observation
In this particular example, the tangent cone T (x̃) is equivalent to the set of vectors v
that satisfy
∇f1 (x̃)T v ≤ 0.
21 / 35
Example (tangent cone, inequality constraints)
Consider again the feasible set in the figure. Two feasible sequences approaching
x̃ are depicted. The green arrows represent
(x(k) − x)/‖x(k) − x‖,
for some choices of feasible points x(k) (depicted with green and red squares).
The red arrows are omitted. Any feasible arc must satisfy f1 (α(θ)) ≤ 0, for all θ.

(Figure: the feasible set, the point x̃, and the two feasible sequences approaching x̃.)
Only vectors of the form p = (δ, 0), where δ ∈ R+ , are tangent to a feasible sequence
at x̃.
Observation
In this particular example, the tangent cone T (x̃) is not equivalent to the set of
vectors v that satisfy


∇f1 (x̃)T v ≤ 0,   ∇f2 (x̃)T v ≤ 0,   ∇f3 (x̃)T v ≤ 0.
22 / 35
Example (tangent cone, equality and inequality constraints) [6], pp. 116
Consider the feasible set defined by the constraints
h1 (x) = x1² + x2² − 1 = 0,
f1 (x) = −x2 ≤ 0.
The figure shows one feasible sequence approaching x̃ (depicted in red). The red
arrows represent
(x(k) − x)/‖x(k) − x‖,
for some choices of feasible points x(k) (depicted with red squares).

(Figure: the circle h1 (x) = 0, the halfplane x2 ≥ 0, the point x̃, the normalized gradients ∇h1 (x̃) and ∇f1 (x̃), and one feasible sequence approaching x̃.)
Any vector of the form p = (0, δ), where δ ∈ R+ , is tangent to a feasible sequence at
x̃.
Observation
In this particular example, the tangent cone T (x̃) is equivalent to the set of vectors v
that satisfy both
∇h1 (x̃)T v = 0,
∇f1 (x̃)T v ≤ 0.
23 / 35
Approximation of T (x⋆ )
The tangent cone is not very amenable to manipulation [5], pp. 202, and it is
convenient to consider a way to approximate it. Our observations so far suggest that,
except for some “special cases”, a linear approximation is reasonable.
Define a set of linearized constraints at point x̃
Let the ith row of matrix C contain the gradient of the ith equality constraint, and
the ith row of matrix Â contain the gradient of the ith active inequality constraint at
x̃, i.e.,

C(x̃) = [∇h1 (x̃)T ; . . . ; ∇hme (x̃)T ] ∈ Rme×n ,   Â(x̃) = [∇fi (x̃)T , i ∈ A(x̃)] ∈ R|A(x̃)|×n ,

where |A(x̃)| denotes the cardinality of the set A(x̃) (i.e., the number of active
inequality constraints at x̃). Define a set F (x̃) of linearized constraints at point x̃ as

F (x̃) = {p ∈ Rn : C(x̃)p = 0, Â(x̃)p ≤ 0},

[3], pp. 316.
Note that the set F (x̃) is a cone (why?).
T (x̃) ⊆ F (x̃)
In all our examples we observed that the tangent cone at x̃ is a subset of F (x̃). This
is indeed the case in general [5], pp. 202.
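A small sketch (my own) of building C(x̃) and Â(x̃) and testing membership in F(x̃), reusing the circle-plus-halfplane example from the previous slide with the assumed placement x̃ = (−1, 0):

import numpy as np

x_t   = np.array([-1.0, 0.0])
C     = np.array([[2.0 * x_t[0], 2.0 * x_t[1]]])   # row: ∇h1(x̃)ᵀ for h1 = x1² + x2² − 1
A_hat = np.array([[0.0, -1.0]])                    # row: ∇f1(x̃)ᵀ for f1 = −x2 (active at x̃)

def in_F(p, tol=1e-9):
    return bool(np.allclose(C @ p, 0.0, atol=tol) and np.all(A_hat @ p <= tol))

print(in_F(np.array([0.0, 1.0])))    # True:  matches the tangent directions (0, δ), δ ≥ 0
print(in_F(np.array([0.0, -1.0])))   # False: violates Â(x̃) p ≤ 0
print(in_F(np.array([1.0, 0.0])))    # False: violates C(x̃) p = 0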
24 / 35
When does T (x̃) = F(x̃) hold?
T (x̃) = F (x̃) holds if [5], pp. 203
All active constraints at x̃ are linear.
All active constraints at x̃ have linearly independent gradients (i.e., x̃ is a regular
point).
Essentially, the regularity assumption guarantees that the linear approximation F (x̃)
captures the essential geometric features of the feasible set near x̃ [3], pp. 315.
We can think of F (x̃) as a set of first-order feasible variations (a subspace when only
equality constraints are present). Let us consider, for example, the equality constraint
h1 (x) = 0. The Taylor-series expansion about a feasible point x̃ along the direction p gives
h1 (x̃ + p) ≈ h1 (x̃) + ∇h1 (x̃)T p = ∇h1 (x̃)T p.
The direction p retains feasibility with respect to h1 (x̃ + p) = 0, to first order, when it
satisfies ∇h1 (x̃)T p = 0. In the case of an active inequality constraint f1 (x) ≤ 0 we have
f1 (x̃ + p) ≈ f1 (x̃) + ∇f1 (x̃)T p = ∇f1 (x̃)T p.
Hence, using a similar argument, we conclude that ∇f1 (x̃)T p ≤ 0 is desirable.
25 / 35
Regularity assumption (again) - only nonlinear equality constraints
Worth noting ... (1)
Suppose that there are only nonlinear equality constraints. Again, let the ith row of
matrix C contain the gradient of the ith equality constraint, i.e.,

C(x⋆ ) = [∇h1 (x⋆ )T ; . . . ; ∇hme (x⋆ )T ] ∈ Rme×n .

Then, if x⋆ is regular, i.e., rank(C(x⋆ )) = me , T (x⋆ ) = N (C(x⋆ )).
Of course, in this case the set F (x⋆ ) is simply given by
F (x⋆ ) = {p ∈ Rn : C(x⋆ )p = 0}.
Worth noting ... (2)
For a nice note regarding the “true set of feasible variations” T (x̃) and first-order
feasible variations F (x̃) see [7], pp. 287.
26 / 35
Constraint qualifications
[3], pp. 319
“Constraint qualifications are conditions under which the linearized feasible set F (x̃)
is similar to the tangent cone T (x̃). In fact, most constraint qualifications ensure that
these two sets are identical. As mentioned earlier, these conditions ensure that the
F (x̃), which is constructed by linearizing the algebraic description of the feasible set
at x̃, captures the essential geometric features of the feasible set in the vicinity of x̃,
as represented by T (x̃).”
The condition that the gradients of the active constraints at x̃ are linearly
independent is known as the Linear Independence Constraint Qualification (LICQ).
There is a variety of other constraint qualifications that range from abstract (and
difficult to check) to more specific (and easily verifiable), but somewhat restrictive in
many situations [6], pp. 112. LICQ is a good example of the latter type of constraint
qualification (it could pose strong assumptions in many practical situations
[6], pp. 132).
We will encounter the so-called Slater’s constraint qualification when we deal with
convex optimization problems.
27 / 35
A word of caution
“It is important to note that the definition of tangent cone does not rely on the
algebraic specification of the feasible set, only on its geometry. The linearized feasible
direction set does, however, depend on the definition of the constraint functions.”
[3], pp. 316
Example
Let
h(x1 , x2 ) = x1 = 0.
This equality constraint yields the x2 axis, and every point on that axis is regular
because the gradient at each point is equal to ∇h = (1, 0). If instead we define
h(x1 , x2 ) = x1² = 0,
again the feasible set is the x2 axis but now no point on it is regular, because
∇h = (2x1 , 0) = (0, 0), since x1 = 0. [9], pp. 325
28 / 35
Second-order necessary conditions [8], pp. 519
Here, we assume that fi , hi (for all i) are twice continuously differentiable.
Recall that
if x⋆ is a local minimizer of f0 , then it is a local minimizer of f0 along any feasible arc
passing through x⋆ .
Suppose that α(θ) is any such arc, with α(0) = x⋆ . Then, if θ = 0 is a local
minimizer of the one-dimensional function f0 (α(θ)), the second derivative of f0 (α(θ))
with respect to θ must be nonnegative at θ = 0. Using the chain rule leads to
d²f0 (α(θ))/dθ² = d(∇f0 (α(θ))T α̇(θ))/dθ = α̇(θ)T ∇²f0 (α(θ))α̇(θ) + ∇f0 (α(θ))T α̈(θ) ≥ 0.
Hence, at θ = 0 (by using α(0) = x⋆ and p = α̇(0)) we have
d²f0 (x⋆ )/dθ² = pT ∇²f0 (x⋆ )p + ∇f0 (x⋆ )T α̈(0) ≥ 0.   (1)
“The second derivative along an arc depends not only on the Hessian of the objective,
but also on the curvature of the constraints (that is, on the term α̈(0))”
29 / 35
Getting rid of α̈(0)
For simplicity of presentation, we assume that there are only equality constraints
(however, the results can be extended to inequality constraints in a straightforward
way)
Note that
since h(α(θ)) is constant along any feasible arc, its second derivative with respect to θ
must vanish for all θ (and in particular at θ = 0).
d²hi (α(θ))/dθ² = d(∇hi (α(θ))T α̇(θ))/dθ = α̇(θ)T ∇²hi (α(θ))α̇(θ) + ∇hi (α(θ))T α̈(θ) = 0.
Hence, at θ = 0 we have
d²hi (x⋆ )/dθ² = pT ∇²hi (x⋆ )p + ∇hi (x⋆ )T α̈(0) = 0.
Next, we multiply by νi⋆ and sum over all i to obtain

∑_{i=1}^{me} νi⋆ d²hi (x⋆ )/dθ² = pT (∑_{i=1}^{me} νi⋆ ∇²hi (x⋆ )) p + ∑_{i=1}^{me} νi⋆ ∇hi (x⋆ )T α̈(0) = 0,   (2)

where the last term equals −∇f0 (x⋆ )T α̈(0), since from the first-order conditions we have
∇f0 (x⋆ ) + ∑_{i=1}^{me} νi⋆ ∇hi (x⋆ ) = 0.
30 / 35
Getting rid of α̈(0) (continued)
Summing (1) and (2) leads to
pT (∇²f0 (x⋆ ) + ∑_{i=1}^{me} νi⋆ ∇²hi (x⋆ )) p ≥ 0,
for all tangent vectors p ∈ T (x⋆ ). The matrix in the brackets is the Hessian of the
Lagrangian function with respect to x at the point (x⋆ , ν ⋆ ), i.e.,
∇²xx L(x⋆ , ν ⋆ ) = ∇²f0 (x⋆ ) + ∑_{i=1}^{me} νi⋆ ∇²hi (x⋆ ),
where the Lagrangian function is given by
L(x, ν) = f0 (x) + ∑_{i=1}^{me} νi hi (x).
Note that since p ∈ T (x⋆ ), it follows that p ∈ N (C(x⋆ )), hence we can express p as
p = Z(x⋆ )pz ,
for some pz , where the columns of Z(x⋆ ) span N (C(x⋆ )).
31 / 35
Projected Hessian of the Lagrangian
Two equivalent conditions:
pT ∇²xx L(x⋆ , ν ⋆ ) p ≥ 0,   for all tangent vectors p ∈ T (x⋆ ).

pzT Z(x⋆ )T ∇²xx L(x⋆ , ν ⋆ ) Z(x⋆ ) pz ≥ 0,   for all pz ∈ Rz , where z is the dimension of the null space of C(x⋆ ).
Second-order necessary condition
The matrix Z(x⋆ )T ∇2xx L(x⋆ , ν ⋆ )Z(x⋆ ) is called the projected Hessian of the
Lagrangian, and it should be positive semidefinite at a local minimizer x⋆ .
Note that the second-order necessary conditions for a nonlinear constrained problem
depend on a “special combination” of the Hessian of the objective function and
Hessian matrices of the constraints.
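A sketch (my own, reusing the equality-constrained circle example from earlier, where x⋆ = (−1, −1) and ν ⋆ = 1/2) of forming the projected Hessian and checking that it is positive semidefinite:

import numpy as np

x_star, nu_star = np.array([-1.0, -1.0]), 0.5

hess_f0 = np.zeros((2, 2))              # f0(x) = x1 + x2
hess_h  = 2.0 * np.eye(2)               # h(x)  = x1² + x2² − 2
hess_L  = hess_f0 + nu_star * hess_h    # ∇²xx L(x⋆, ν⋆)

C = np.array([[2.0 * x_star[0], 2.0 * x_star[1]]])   # C(x⋆) = ∇h(x⋆)ᵀ

# columns of Z span N(C(x⋆)) (taken from the SVD of C)
_, s, Vt = np.linalg.svd(C)
Z = Vt[len(s):].T

projected = Z.T @ hess_L @ Z
print(projected)                                        # [[1.]]
print(np.all(np.linalg.eigvalsh(projected) >= -1e-12))  # True: positive semidefinite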
32 / 35
Fritz John necessary conditions
It is interesting to see that we can formulate necessary conditions for optimality
without the regularity assumption. We illustrate this on the following nonlinear
inequality constrained problem
minimize   f0 (x)
  x∈Rn
subject to   fi (x) ≤ 0,   i = 1, . . . , mi .
If x⋆ is a local minimizer for the above problem, then there exist multipliers λ⋆i ,
i = 0, . . . , mi (not all equal to zero), such that
∑_{i=0}^{mi} λ⋆i ∇fi (x⋆ ) = 0,
λ⋆i fi (x⋆ ) = 0,   i = 1, . . . , mi ,
λ⋆i ≥ 0,   i = 0, . . . , mi .
For a proof (using Farkas’ lemma) see [6], pp. 122.
Essentially, a multiplier λ⋆0 is assigned to the gradient of the objective function, so
(loosely speaking) in cases when x⋆ is not regular, λ⋆0 = 0 can be chosen. Of course
setting λ⋆0 = 0 means that the objective function plays no role in the optimality
conditions; this is a rather unexpected (and, generally speaking, unwanted) situation
[6], pp. 124.
33 / 35
Example (Fritz John necessary conditions)
Consider again the problem ([5], pp. 203)

minimize   f0 (x) = x1 + x2
  x∈R2
subject to   f1 (x) = x2 − x1³ ≤ 0,
             f2 (x) = −x2 ≤ 0.

(Figure: the feasible set defined by x2 ≤ x1³ and x2 ≥ 0, the point x⋆ , and the normalized gradient ∇f0 (x⋆ ).)

Clearly, the solution x⋆ is not regular, and there is no λ⋆ = (λ⋆1 , λ⋆2 ) satisfying
∇f0 (x⋆ ) + λ⋆1 ∇f1 (x⋆ ) + λ⋆2 ∇f2 (x⋆ ) = 0.

The Fritz John necessary conditions

λ⋆0 ∇f0 (x⋆ ) + λ⋆1 ∇f1 (x⋆ ) + λ⋆2 ∇f2 (x⋆ ) = λ⋆0 (1, 1) + λ⋆1 (0, 1) + λ⋆2 (0, −1) = 0,
λ⋆0 , λ⋆1 , λ⋆2 ≥ 0   (not all λ⋆i equal to zero),

are satisfied for λ⋆0 = 0, and λ⋆1 = δ, λ⋆2 = δ, for any δ > 0. Note that the
complementarity condition is satisfied because all constraints are active at x⋆ .
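A small check of the Fritz John conditions at x⋆ = (0, 0) with the particular choice δ = 1, i.e., λ⋆ = (0, 1, 1) (my own sketch):

import numpy as np

x_star = np.array([0.0, 0.0])
lam    = np.array([0.0, 1.0, 1.0])                  # (λ0, λ1, λ2), not all zero

grads = np.array([[1.0, 1.0],                        # ∇f0(x⋆)
                  [-3.0 * x_star[0]**2, 1.0],        # ∇f1(x⋆)
                  [0.0, -1.0]])                      # ∇f2(x⋆)

f_vals = np.array([x_star[1] - x_star[0]**3,         # f1(x⋆)
                   -x_star[1]])                      # f2(x⋆)

print(np.allclose(lam @ grads, 0.0))                 # stationarity with λ0 = 0
print(np.all(lam >= 0) and np.any(lam > 0))          # valid multipliers
print(np.allclose(lam[1:] * f_vals, 0.0))            # complementarity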
34 / 35
[1] S. Boyd and L. Vandenberghe, “Convex Optimization,” Cambridge University Press, 2004.
[2] P. E. Gill, W. Murray, and M. H. Wright, “Practical Optimization,” Emerald, 2007.
[3] J. Nocedal and S. J. Wright, “Numerical Optimization,” Springer.
[4] D. Bertsimas and J. N. Tsitsiklis, “Introduction to Linear Optimization,” Athena Scientific, 1997.
[5] R. Fletcher, “Practical Methods of Optimization,” Wiley (2nd edition), 1987.
[6] N. Andréasson, A. Evgrafov, and M. Patriksson, “An Introduction to Continuous Optimization: Foundations and Fundamental Algorithms,” 2005.
[7] D. P. Bertsekas, “Nonlinear Programming,” Athena Scientific (3rd print), 2008.
[8] I. Griva, S. G. Nash, and A. Sofer, “Linear and Nonlinear Optimization,” SIAM, 2009.
[9] D. G. Luenberger and Y. Ye, “Linear and Nonlinear Programming,” Springer, 2010.
35 / 35