MS-E2139 Nonlinear Programming (Kimmo Berg)

5 Optimality for inequality constrained problem
min_{x∈S} f(x),  S = {x ∈ X : gi(x) ≤ 0, 1 ≤ i ≤ m},

where gi : Rⁿ → R, X ⊂ Rⁿ is open, and g = (g1, . . . , gm)ᵀ.
Definition 5.1. d ∈ Rⁿ is a descent direction of f at x0 if ∃δ > 0 s.t. f(x0 + λd) < f(x0), ∀λ ∈ (0, δ). The cone of descent directions of f at x0 is denoted by F.
Definition 5.2. Let S ⊂ Rⁿ, x0 ∈ cl S. The cone of feasible directions of S at x0 is D = {d ∈ Rⁿ : d ≠ 0̄, x0 + λd ∈ S, ∀λ ∈ (0, δ), for some δ > 0}.
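These definitions can be explored numerically. Below is a minimal sketch (not part of the notes) that tests whether a given direction d behaves like a descent direction of f at x0 by sampling small step lengths λ; the function and point are hypothetical, and a finite sample can only suggest, not prove, membership in F.

```python
import numpy as np

def looks_like_descent_direction(f, x0, d, lams=np.logspace(-8, -2, 7)):
    """Check f(x0 + lam*d) < f(x0) for a few small lam > 0 (Definition 5.1)."""
    fx0 = f(x0)
    return all(f(x0 + lam * d) < fx0 for lam in lams)

# Hypothetical example: f(x) = (x1-3)^2 + (x2-2)^2 at x0 = (2, 1); d = -grad f(x0).
f = lambda x: (x[0] - 3) ** 2 + (x[1] - 2) ** 2
print(looks_like_descent_direction(f, np.array([2.0, 1.0]), np.array([2.0, 2.0])))  # True
```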
Theorem (geometric optimality). x∗ is a local minimum iff there are no feasible descent directions, i.e., D ∩ F = ∅.
Theorem (4.2.2). Let f be differentiable at x∗ ∈ S and let F0 = {d : ∇f(x∗)ᵀd < 0}. If x∗ is a local minimum then F0 ∩ D = ∅.
Proof. F0 ⊆ F, so F0 ∩ D ⊆ F ∩ D = ∅ by geometric optimality.
Note. The condition is sufficient if f is pseudoconvex and x − x∗ ∈ D for all x ∈ S ∩ Nε(x∗), for some ε > 0.
Definition 5.3. The index set of active constraints at x0 is denoted by I = {i : gi(x0) = 0} and the corresponding cone is G0 = {d : ∇gi(x0)ᵀd < 0, ∀i ∈ I}.
Theorem (4.2.4). If gi, i ∉ I, are continuous at x0 and gi, i ∈ I, are differentiable at x0, then G0 ⊆ D.
Proof. Since x0 ∈ X open, ∃δ1 > 0 s.t. x0 + λd ∈ X, ∀λ ∈ (0, δ1). Since gi(x0) < 0 and gi is continuous for i ∉ I, we have gi(x0 + λd) < 0, i ∉ I, ∀λ ∈ (0, δ2). If d ∈ G0 then ∇gi(x0)ᵀd < 0, i ∈ I, and by Theorem 4.1.2, gi(x0 + λd) < gi(x0) = 0, ∀λ ∈ (0, δ3). Thus, x0 + λd ∈ S when λ ∈ (0, min(δ1, δ2, δ3)).
Note that G0 ⊆ D ⊆ G0′, where G0′ = {d ≠ 0̄ : ∇gi(x0)ᵀd ≤ 0, i ∈ I}. Also, D = G0 if gi, i ∈ I, are strictly pseudoconvex, and D = G0′ if they are strictly pseudoconcave.
Theorem (4.2.5, road to FJ). Let x∗ ∈ S, gi, i ∉ I, be continuous at x∗, and gi, i ∈ I, be differentiable at x∗. If x∗ is a local minimum then F0 ∩ G0 = ∅.
Proof. By Theorem 4.2.2, F0 ∩ D = ∅, and by Theorem 4.2.4, F0 ∩ G0 ⊆ F0 ∩ D = ∅.
Note that the condition is sufficient if f is pseudoconvex at x∗ and gi, i ∈ I, are strictly pseudoconvex on Nε(x∗) for some ε > 0.
Example (4.2.6).
min (x1 − 3)² + (x2 − 2)²
s.t. x1² + x2² ≤ 5,
x1 + x2 ≤ 3, x1 ≥ 0, x2 ≥ 0.
x∗ = (2, 1), I = {1, 2}, ∇f(x∗) = −(2, 2), ∇g1(x∗) = (4, 2), ∇g2(x∗) = (1, 1).
As it should be, F0 ∩ G0 = ∅; in general this alone does not imply that F0 ∩ D = ∅. The problem does not satisfy the sufficient conditions since g2 is not strictly pseudoconvex, and thus x∗ cannot be declared a local optimum merely from F0 ∩ G0 = ∅. However, F0 ∩ G0′ = ∅ ⇒ F0 ∩ D = ∅, and with this we can say that x∗ is a local minimum. The feasible set is convex and the objective is strictly convex, and thus x∗ is in fact the unique global minimum.
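The emptiness of F0 ∩ G0 in this example can also be checked numerically. A minimal sketch (not part of the original notes): collect ∇f(x∗) and the active ∇gi(x∗) as rows of A and maximize a common slack s subject to Ad ≤ −s·1̄ and |dj| ≤ 1; an optimal slack of zero certifies that no d with Ad < 0̄ exists.

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[-2.0, -2.0],   # grad f(x*)  at x* = (2, 1)
              [ 4.0,  2.0],   # grad g1(x*)
              [ 1.0,  1.0]])  # grad g2(x*)
m, n = A.shape

c = np.zeros(n + 1); c[-1] = -1.0                  # maximize s
A_ub = np.hstack([A, np.ones((m, 1))])             # A d + s*1 <= 0
b_ub = np.zeros(m)
bounds = [(-1, 1)] * n + [(0, 1)]                  # |d_j| <= 1, 0 <= s <= 1
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print("max common slack:", -res.fun)               # ~0, so F0 ∩ G0 = ∅
```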
The idea is to use the separation theorems (Gordan and Motzkin) together with geometric optimality to prove the algebraic conditions: first the Fritz John (FJ) and finally the Karush-Kuhn-Tucker (KKT) conditions. The FJ conditions are more general, but typically too many (nonoptimal) points satisfy them. By making more assumptions about the problem and its constraints, the so-called constraint qualification (CQ) conditions, we can get rid of these nonoptimal points, and we obtain the KKT conditions from the FJ conditions.
Note that we cannot handle equality constraints with the same technique by the following simple trick: we could replace h(x) = 0 by h(x) ≤ 0 and −h(x) ≤ 0, but then geometric optimality would not work since G0 = ∅ at every feasible point.
Theorem (4.2.8, FJ necessary). If x∗ is a local minimum then ∃ u0, ui, i ∈ I, s.t.

u0∇f(x∗) + Σ_{i∈I} ui∇gi(x∗) = 0̄,
u0, ui ≥ 0, i ∈ I, uj ≠ 0 for some j = 0 or j = i ∈ I,    (FJ1)

where the last condition could be written as (u0, uI) ≠ (0, 0̄). If also gi, i ∉ I, are differentiable at x∗ then

u0∇f(x∗) + Σ_{i=1}^m ui∇gi(x∗) = 0̄,
uigi(x∗) = 0, ∀i = 1, . . . , m,    (FJ2)
u0, u ≥ 0, (u0, u) ≠ (0, 0̄).
Proof. Since x∗ is a local minimum, Theorem 4.2.5 implies F0 ∩ G0 = ∅. Let m′ ≤ m be the number of indexes in I and let A ∈ R^{(m′+1)×n} have rows ∇f(x∗)ᵀ and ∇gi(x∗)ᵀ, i ∈ I. Geometric optimality now means that ∄d ∈ Rⁿ s.t. Ad < 0̄. Theorem 2.4.9 (Gordan) implies that ∃p ≥ 0̄, p ≠ 0̄, p ∈ R^{m′+1}, s.t. Aᵀp = 0̄. Let us denote p = (u0, u1, . . . , u_{m′}). Thus, we have (FJ1). The second equation in (FJ2), the complementary slackness condition, means that ui = 0, i ∉ I, and it gives (FJ2).
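The Gordan step of this proof can be mimicked numerically: find p ≥ 0̄, p ≠ 0̄ with Aᵀp = 0̄ by solving a feasibility LP with the normalization Σ pi = 1. A minimal sketch (not part of the notes), reusing the gradients of Example 4.2.6 as the rows of A:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[-2.0, -2.0],   # grad f(x*)
              [ 4.0,  2.0],   # grad g1(x*)
              [ 1.0,  1.0]])  # grad g2(x*)
m, n = A.shape

# Feasibility LP: A^T p = 0, sum(p) = 1, p >= 0 (zero objective).
A_eq = np.vstack([A.T, np.ones((1, m))])
b_eq = np.append(np.zeros(n), 1.0)
res = linprog(np.zeros(m), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * m)
print(res.x)   # FJ multipliers (u0, u1, u2); for this data roughly (1/3, 0, 2/3)
```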
Note that if u0 = 0 then the conditions give no information about the objective.
Example. min f(x) s.t. g1(x) ≤ 0 and g2(x) ≤ 0. Any feasible x0 at which both constraints are active and ∇g1(x0) = −∇g2(x0) gives G0 = ∅, and x0 is an FJ point (take u0 = 0 and u1 = u2 > 0).
There are too many FJ points and more assumptions are needed.
Theorem (4.2.13, KKT necessary). Assume ∇gi(x∗), i ∈ I, are linearly independent. If x∗ is a local minimum then ∃ ui ∈ R, i ∈ I, s.t.

∇f(x∗) + Σ_{i∈I} ui∇gi(x∗) = 0̄,
ui ≥ 0, i ∈ I.    (KKT1)

If also gi, i ∉ I, are differentiable at x∗ then

∇f(x∗) + ∇g(x∗)ᵀu = 0̄,  (Lagrange optimality)
uigi(x∗) = 0, ∀i = 1, . . . , m,  (complementary slackness)    (KKT2)
u ≥ 0.  (dual feasibility)

The scalars ui are called the Lagrange multipliers or dual variables.
Example.
min (x1 − 3)² + (x2 − 2)²
s.t. x1² + x2² ≤ 5,
x1 + 2x2 ≤ 4, x1 ≥ 0, x2 ≥ 0.
x∗ = (2, 1), I = {1, 2}, ∇f(x∗) = (−2, −2), ∇g1(x∗) = (4, 2), ∇g2(x∗) = (1, 2). We can choose the multipliers, e.g., u0 = 3 > 0, u1 = 1 > 0, u2 = 2 > 0 and u3 = u4 = 0. These satisfy both the FJ and KKT conditions (the Lagrange multipliers are (1/3, 2/3, 0, 0)).
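A quick numeric check of this example (a sketch, not part of the notes): solve the 2×2 system ∇f(x∗) + u1∇g1(x∗) + u2∇g2(x∗) = 0̄ for the active constraints and verify u ≥ 0, which confirms that x∗ = (2, 1) is a KKT point.

```python
import numpy as np

grad_f = np.array([-2.0, -2.0])        # gradient of (x1-3)^2 + (x2-2)^2 at (2, 1)
G = np.array([[4.0, 1.0],              # columns: grad g1(x*) and grad g2(x*)
              [2.0, 2.0]])
u = np.linalg.solve(G, -grad_f)        # Lagrange optimality for the active constraints
print(u, np.all(u >= 0))               # approximately [0.333 0.667] True
```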
Example.
min −x1
s.t. x2 − (1 − x1)³ ≤ 0,
−x2 ≤ 0.
x∗ = (1, 0), I = {1, 2}, ∇f(x∗) = (−1, 0), ∇g1(x∗) = (0, 1), ∇g2(x∗) = (0, −1). The constraint gradients are linearly dependent. We can choose u0 = 0 and u1 = u2 > 0 arbitrarily so that the FJ conditions hold. Note that the optimum does not satisfy the KKT conditions and there are no Lagrange multipliers.
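This failure can also be seen numerically: a least-squares attempt to solve ∇f(x∗) + u1∇g1(x∗) + u2∇g2(x∗) = 0̄ leaves a nonzero residual, because ∇f(x∗) is not in the span of the constraint gradients. A minimal sketch (not part of the notes):

```python
import numpy as np

grad_f = np.array([-1.0, 0.0])
G = np.array([[0.0,  0.0],     # columns: grad g1(x*) = (0, 1), grad g2(x*) = (0, -1)
              [1.0, -1.0]])
u, *_ = np.linalg.lstsq(G, -grad_f, rcond=None)
print(np.linalg.norm(G @ u + grad_f))  # 1.0 > 0: no multipliers satisfy KKT
```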
Sufficient conditions
Theorem (4.2.16, KKT sufficient). Assume f and gI are convex. If x∗ is a KKT point then x∗ is a global minimum. If the convexities hold in Nε(x∗) for some ε > 0 then x∗ is a local minimum.
Extension: Production planning in continuous time*
This example is dynamic optimization and it is from Luenberger: Optimization by Vector Space Methods, p. 234. Let us examine a production planning problem where the decision variable is the production rate r(t) = ż(t), t ∈ (0, 1), and z(t) is the amount of products manufactured. It is assumed that there are no inventory costs and that the demand rate d(t) = ṡ(t) is known, where s(t) is the amount of sold units. It is assumed that the demand must be met:

z(0) + ∫₀ᵗ r(y)dy ≥ ∫₀ᵗ d(y)dy  ⇔  z(t) ≥ s(t).

This means that the products available at time 0 plus the production should cover the demand at all time instances.
min 1/2 ∫₀¹ r²(t)dt
s.t. ż(t) = r(t), z(t) ≥ s(t), z(0) > 0.
For example, z(0) = 1/2 and

s(t) = 2t, 0 ≤ t ≤ 1/2,
s(t) = 1, 1/2 ≤ t ≤ 1.

The sales rate is constant up to t = 1/2 and after that there are no sales. The space where the problem is solved is chosen as X = Z = C[0, 1], the space of continuous functions on [0, 1], i.e., it is assumed that z(t) = z(0) + ∫₀ᵗ r(k)dk is continuous. Note that the minimum may not be in this space if the function could have jumps. The dual space of continuous functions is NBV[0, 1], the normalized functions of bounded variation, which may have a finite number of finite jumps. The Lagrange multiplier will belong to this space.
The Lagrange function is defined as

φ(r, u) = 1/2 ∫₀¹ r²(t)dt + ∫₀¹ (s(t) − z(t))du(t),

where u ∈ NBV[0, 1] and u is nondecreasing. We can simplify the equation by the Leibniz integration rule and integration by parts:

∫₀¹ ∫₀ᵗ r(y)dy du(t) = [∫₀ᵗ r(y)dy · u(t)]₀¹ − ∫₀¹ r(t)u(t)dt.
Now, we get

φ(r, u) = 1/2 ∫₀¹ r²(t)dt + ∫₀¹ (s(t) − z(0))du(t) − ∫₀¹ ∫₀ᵗ r(y)dy du(t)
        = 1/2 ∫₀¹ r²(t)dt + ∫₀¹ (s(t) − z(0))du(t) + ∫₀¹ r(t)u(t)dt − u(1) ∫₀¹ r(t)dt,

since u(0) = 0 from normalization. The optimality conditions give

∂φ/∂r = r∗(t) + u∗(t) − u∗(1) ≥ 0, ∀t,
r∗(t)(r∗(t) + u∗(t) − u∗(1)) = 0, ∀t,
u∗(t) varies only when z(t) = s(t),
u∗(t) is nondecreasing.
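A discretized version of this problem can be solved directly to illustrate the conditions above. The sketch below (my own illustration, not Luenberger's derivation) approximates r by a piecewise constant rate on a uniform grid and solves the resulting finite-dimensional problem with SLSQP; the grid size N and the solver choice are arbitrary assumptions.

```python
import numpy as np
from scipy.optimize import minimize

N, z0 = 50, 0.5
dt = 1.0 / N
t = (np.arange(N) + 1) * dt                      # grid t_1, ..., t_N = 1
s = np.where(t <= 0.5, 2 * t, 1.0)               # cumulative sales s(t)

cost = lambda r: 0.5 * np.sum(r ** 2) * dt       # 1/2 * integral of r^2
z_of = lambda r: z0 + np.cumsum(r) * dt          # z(t_k) = z(0) + integral of r

cons = {"type": "ineq",
        "fun": lambda r: z_of(r) - s,            # z(t_k) - s(t_k) >= 0
        "jac": lambda r: np.tril(np.ones((N, N))) * dt}
res = minimize(cost, x0=np.ones(N), jac=lambda r: r * dt,
               constraints=[cons], method="SLSQP")
print("cost:", res.fun, "min slack:", np.min(z_of(res.x) - s))
```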
The economic interpretation of the Lagrange multiplier is the same as in the finite-dimensional case. Let J be the total cost; then

ΔJ = ∫₀¹ Δs(t)du(t) = −∫₀¹ Δṡ(t)u(t)dt + Δs(1)u(1) − Δs(0)u(0).

Since Δs(0) = 0 and u(1) = 0, we have

ΔJ = −∫₀¹ Δd(t)u(t)dt,
i.e., −u(t) is the unit cost or the shadow price of extra demand. Now, this price
is zero when t > 1/2.
6 Equality and inequality constrained problem
For geometric optimality and feasible directions, we need more restrictive assumptions on the equality constraints and more mathematical machinery. The
next theorem gives the conditions that guarantee regularity in the constraints.
min f(x)
s.t. g(x) ≤ 0̄ ∈ Rᵐ,
h(x) = 0̄ ∈ Rˡ.
Definition 6.1. H0 = {d : ∇hi(x)ᵀd = 0, i = 1, . . . , l}.