Lecture 8: Nonlinear Objective with Nonlinear Constraints

Professor: Kevin Ross
Scribe: Megha Mohabey (2010), Damian Eads (2006), Pritam Roy (2005)
Oct 19, 2010
1 Karush-Kuhn-Tucker (KKT) Conditions

1.1 KKT Conditions
The text (Hillier & Lieberman) gives the standard form as

max f(x)
s.t. gi(x) ≤ bi for i = 1, 2, . . . , m
x_j ≥ 0 for j = 1, 2, . . . , n

where f(x) and the gi(x) can be any functions.
The KKT conditions assume that f(x), g1(x), . . . , gm(x) are differentiable functions satisfying certain regularity conditions.
A point x∗ = (x∗1, . . . , x∗n) can be optimal for the optimization only if there exist m numbers u1, . . . , um satisfying

∂f/∂x_j − Σ_{i=1}^{m} ui ∂gi/∂x_j ≤ 0            (1)
x∗_j (∂f/∂x_j − Σ_{i=1}^{m} ui ∂gi/∂x_j) = 0     (2)
    at x = x∗, for j = 1, 2, . . . , n
gi(x∗) − bi ≤ 0                                   (3)
ui (gi(x∗) − bi) = 0                              (4)
    for i = 1, 2, . . . , m
x∗_j ≥ 0 for j = 1, 2, . . . , n
ui ≥ 0 for i = 1, 2, . . . , m
These are necessary conditions.
Sufficient conditions: gi(x) is convex for i = 1, . . . , m and f(x) is concave.
1.2 Example
Example in text (H&L page 574 of 8th edition)
maximize f (x) = ln(x1 + 1) + x2
subject to 2x1 + x2 ≤ 3
and x1 , x2 ≥ 0
Can we find an optimal point using KKT? The KKT conditions for this problem are

1/(x1 + 1) − 2u1 ≤ 0             (5)
x1 (1/(x1 + 1) − 2u1) = 0        (6)
1 − u1 ≤ 0                       (7)
x2 (1 − u1) = 0                  (8)
2x1 + x2 − 3 ≤ 0                 (9)
u1 (2x1 + x2 − 3) = 0            (10)
x1 ≥ 0, x2 ≥ 0                   (11)
u1 ≥ 0                           (12)
• From Eq (7), u1 ≥ 1, and from Eq (11), x1 ≥ 0; therefore in Eq (5), 1/(x1 + 1) − 2u1 < 0.
• Thus from Eq (6), x1 = 0.
• Since u1 ≥ 1, u1 ≠ 0; thus Eq (10) gives 2x1 + x2 − 3 = 0, and with x1 = 0 this yields x2 = 3.
• From Eq (8), since x2 ≠ 0, u1 = 1.
• Thus x2 = 3, x1 = 0, u1 = 1 satisfies the KKT conditions and is therefore a candidate optimum. We can easily check the sufficient conditions to confirm.
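As a quick sanity check (added here, not part of the original notes), the chain of deductions above can be verified numerically: the candidate (x1, x2) = (0, 3) with multiplier u1 = 1 satisfies conditions (5)–(12).

```python
# Numerical check that (x1, x2, u1) = (0, 3, 1) satisfies KKT
# conditions (5)-(12) for max ln(x1+1) + x2 s.t. 2x1 + x2 <= 3.

x1, x2, u1 = 0.0, 3.0, 1.0

# Stationarity and complementarity in x, Eqs (5)-(8)
g1 = 1.0 / (x1 + 1.0) - 2.0 * u1   # Eq (5): must be <= 0
g2 = 1.0 - u1                      # Eq (7): must be <= 0
assert g1 <= 0 and x1 * g1 == 0    # Eqs (5) and (6)
assert g2 <= 0 and x2 * g2 == 0    # Eqs (7) and (8)

# Primal feasibility and complementary slackness, Eqs (9)-(10)
slack = 2.0 * x1 + x2 - 3.0
assert slack <= 0 and u1 * slack == 0

# Nonnegativity, Eqs (11)-(12)
assert x1 >= 0 and x2 >= 0 and u1 >= 0
print("KKT conditions hold at (0, 3) with u1 = 1")
```

Note that the constraint is tight (2·0 + 3 − 3 = 0), consistent with u1 = 1 > 0.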
2 KKT Conditions (Other standard form)
The Karush-Kuhn-Tucker (KKT) conditions are necessary conditions for optimality of a solution to a constrained nonlinear program, and are also sufficient under suitable convexity assumptions. Now we will look at the KKT conditions for a different standard form
min f (x)
subject to
Ci (x) = 0 for i ∈ E
Ci (x) ≥ 0 for i ∈ I
The Lagrangian Lf of a function f is defined by

Lf(x, λ) = f(x) − Σ_{i=1}^{m} λi Ci(x)
We say a constraint Ci is binding at x if Ci(x) = 0. All equality constraints must be binding. Thus, we define the active set A(x) as the set of binding constraints at x,

A(x) = E ∪ {i ∈ I | Ci(x) = 0}
The Linear Independence Constraint Qualification (LICQ) is a regularity condition at x∗. It requires the gradients of the constraints in the active set A(x∗),

∇Ci(x∗) for all i ∈ A(x∗),

to be linearly independent. If x∗ is a local minimum and LICQ holds at x∗, then there exists a Lagrange multiplier vector λ∗, with each component λ∗i corresponding to a constraint.
2.1 First-order Condition
We require
(a) the gradient of the Lagrangian to be zero,

∇x Lf(x∗, λ∗) = ( ∂Lf/∂x1 (x∗, λ∗), ∂Lf/∂x2 (x∗, λ∗), . . . , ∂Lf/∂xn (x∗, λ∗) ) = 0
(b) the constraints to be obeyed,
Ci (x∗ ) = 0 for i ∈ E
Ci (x∗ ) ≥ 0 for i ∈ I
and
(c) the product of each constraint Ci and its respective Lagrange multiplier λ∗i to be zero,

λ∗i Ci(x∗) = 0 ∀ i ∈ E ∪ I
These conditions can be restated in terms of the active set of x∗ to give

0 = ∇x Lf(x∗, λ∗) = ∇f(x∗) − Σ_{i∈A(x∗)} λ∗i ∇Ci(x∗)
2.2 Second-order Necessary Conditions
Suppose x∗ is a local minimum of f and the LICQ is satisfied at x∗. Let λ∗ be a Lagrange multiplier vector satisfying the first-order conditions. Then

wᵀ ∇xx Lf(x∗, λ∗) w ≥ 0

for all w satisfying ∇Ci(x∗)ᵀ w = 0 for i ∈ A(x∗). Note that ∇xx Lf is the Hessian of Lf; its ij-th entry is the partial derivative ∂²Lf/∂xi∂xj.
2.3 Second-order Sufficient Conditions
The second-order sufficient conditions are the same as the necessary conditions except we require

wᵀ ∇xx Lf(x∗, λ∗) w > 0

when w ≠ 0.
2.4 Example
Given the nonlinear programming problem,
minimize f (x) = x1 + x2
subject to C(x) = x1² + x2² − 2 = 0
is x∗ = [−1, −1]ᵀ optimal? We compute the gradient of the sole constraint and evaluate it at x∗,

∇x C = [2x1, 2x2]ᵀ,   ∇x C(x∗) = [−2, −2]ᵀ
We derive the Lagrangian,
Lf(x, λ) = x1 + x2 − λ(x1² + x2² − 2)
Note that since there is only one constraint, λ is treated as a scalar.
2.4.1 First-order Condition
To satisfy the first-order condition, we first take the gradient of the Lagrangian,

∇x Lf(x, λ) = [1 − 2λx1, 1 − 2λx2]ᵀ

We must find a λ so that this gradient at x∗ is zero,

∇x Lf(x∗, λ) = [1 − 2λ(−1), 1 − 2λ(−1)]ᵀ = [1 + 2λ, 1 + 2λ]ᵀ = 0

When λ = −1/2 the first-order condition is satisfied.

2.4.2 Second-order Condition
The first step is to compute the Hessian of the Lagrangian,

∇xx Lf(x, λ) = [ −2λ 0 ; 0 −2λ ]

and then plug in λ∗ = −1/2 to obtain the identity matrix,

∇xx Lf(x∗, λ∗) = [ 1 0 ; 0 1 ]
Up to scaling, there is only one vector w that satisfies wᵀ ∇x C(x∗) = 0, namely w = [1, −1]ᵀ, so we only need to check that wᵀ ∇xx Lf(x∗, λ∗) w ≥ 0 for this one w. We observe that

wᵀ ∇xx Lf(x∗, λ∗) w = wᵀ I w = [1 −1] [1, −1]ᵀ = 2

which is greater than or equal to zero, so the necessary condition is satisfied, and strictly greater than zero, so the sufficient condition is satisfied.
2.4.3 Summary
Having satisfied the first- and second-order KKT conditions, x∗ must be an optimal solution to the problem stated.
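The computations above can be verified numerically; the following Python sketch (an addition, not from the original notes) checks the candidate point x∗ = (−1, −1) with λ∗ = −1/2.

```python
# Verify the first- and second-order conditions for
# min x1 + x2  s.t.  x1^2 + x2^2 - 2 = 0  at x* = (-1, -1), lam* = -1/2.
import numpy as np

x_star = np.array([-1.0, -1.0])
lam = -0.5

# Gradient of L(x, lam) = x1 + x2 - lam*(x1^2 + x2^2 - 2)
grad_L = np.array([1.0, 1.0]) - lam * 2.0 * x_star
assert np.allclose(grad_L, 0.0)            # first-order condition holds

# Hessian of the Lagrangian: -2*lam*I, which is the identity at lam = -1/2
H = -2.0 * lam * np.eye(2)

# w = [1, -1] spans the tangent space of the constraint at x*
w = np.array([1.0, -1.0])
grad_C = 2.0 * x_star                      # gradient of x1^2 + x2^2 - 2
assert np.isclose(w @ grad_C, 0.0)         # w is orthogonal to grad C(x*)
assert w @ H @ w > 0                       # second-order sufficient condition
print("x* = (-1, -1) passes the first- and second-order tests")
```

The quadratic form evaluates to 2, matching the hand computation above.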
Consider the example again,

minimize f(x) = x1 + x2
subject to C(x) = x1² + x2² − 2 ≥ 0

with ∇x C = [2x1, 2x2]ᵀ. Compare this with the related problem

minimize f(x) = x1 + x2
subject to C(x) = 2 − x1² − x2² ≥ 0

(The figure in the original notes shades the two feasible regions.) Thus we can see that the sign of λ matters.
Exercise: check problems in this class, e.g. max x1 + x2 and min x1² + x2².
3 Duality in Nonlinear Programming
Let us recall the following facts from LP duality.
• If the primal problem minimizes a function, then the dual problem maximizes, and vice versa.
• The dual of the dual is the primal.
• The objective values bound each other (weak duality).
• If one problem has an optimal solution, then so does the other, and the objective values are the same (strong duality).
Duality theory in NLP differs from LP in several ways:
• It is not as straightforward as its linear counterpart.
• Lagrange multipliers are used as the dual variables.
• A problem has to be solved for the λ-values.
We care about duality of an optimization problem because
• the dual problem can give an estimate of the Lagrange multipliers,
• good multiplier estimates lead to good solution estimates for the primal problem,
• sometimes dual problems can be solved directly.
3.1 Duality Example From a Game
Let us assume that there are two players P1 and P2 playing a two-player game, where
• P1 chooses from X and P2 chooses from Y at the same time.
• If one player wins, then the other loses, i.e. it is a zero-sum game.
• We assume that both players are rational and want to minimize the worst possible outcome.
Let P1 pay F(x, y) to P2, so

worst for P1 = best for P2 = F∗(x) = max_{y∈Y} F(x, y)

P1 wants this function to have as small a value as possible, so the problem P1 wants to solve is

min_{x∈X} max_{y∈Y} F(x, y)

Similarly, P2 would like to solve

max_{y∈Y} min_{x∈X} F(x, y)

We note that the two players are solving dual problems. Moreover, the two problems satisfy weak duality,

max_{y∈Y} min_{x∈X} F(x, y) ≤ min_{x∈X} max_{y∈Y} F(x, y)    (13)
3.1.1 Example
Let the payoff matrix be

A = [ −1 4 ; 2 3 ]

and let P1 choose rows and P2 choose columns. Here the optimal solutions of both problems are the same. But we can produce counterexamples where the two solutions differ: if

A = [ −1 4 ; 2 1 ]

then the max-min problem has optimal value 1, whereas the min-max problem has optimal value 2.
Duality Gap: the difference between the solution of one problem and that of the other.
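The pure-strategy min-max and max-min values of both matrices can be computed directly; this NumPy sketch (an illustration added here, not from the notes) reproduces the values above.

```python
# Pure-strategy min-max vs. max-min for the two payoff matrices above.
# P1 picks a row to minimize the payment; P2 picks a column to maximize it.
import numpy as np

def minmax_maxmin(A):
    min_max = A.max(axis=1).min()   # P1: best row against worst-case column
    max_min = A.min(axis=0).max()   # P2: best column against worst-case row
    return min_max, max_min

A1 = np.array([[-1, 4], [2, 3]])
print(minmax_maxmin(A1))   # both values equal 3: no duality gap

A2 = np.array([[-1, 4], [2, 1]])
print(minmax_maxmin(A2))   # min-max = 2, max-min = 1: duality gap of 1
```

For A1 the common value 3 is a saddle point of the matrix (row 2, column 2); A2 has no pure-strategy saddle point, hence the gap.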
3.2 Strong Duality, Weak Duality and Convex Duality
We have seen in the previous section that weak duality holds for NLP problems. For strong duality, let us define the term saddle point.
Saddle Point: a pair (x∗, y∗) with x∗ ∈ X, y∗ ∈ Y is a saddle point if

F(x∗, y) ≤ F(x∗, y∗) ≤ F(x, y∗)    (14)

for all x ∈ X, y ∈ Y. The idea is that x∗ minimizes F(x, y∗) over x ∈ X, and y∗ maximizes F(x∗, y) over y ∈ Y.
Theorem of Strong Duality:
maxy∈Y minx∈X F (x, y) = minx∈X maxy∈Y F (x, y)
(15)
holds if and only if there exists a pair (x∗ , y ∗ ) satisfying the saddle point condition.
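As an added illustration (not from the original notes), take F(x, y) = x² − y² on X = Y = [−1, 1]: the pair (0, 0) is a saddle point, so by the theorem the min-max and max-min values coincide, which we can check on a discretized grid.

```python
# Strong duality for F(x, y) = x^2 - y^2 on X = Y = [-1, 1],
# checked on a grid. (0, 0) is a saddle point, so both values are 0.
import numpy as np

xs = np.linspace(-1, 1, 201)              # grid includes 0 exactly
F = xs[:, None] ** 2 - xs[None, :] ** 2   # F[i, j] = xs[i]^2 - xs[j]^2

min_max = F.max(axis=1).min()   # min over x of max over y
max_min = F.min(axis=0).max()   # max over y of min over x
assert np.isclose(min_max, max_min) and np.isclose(min_max, 0.0)
print("saddle value:", min_max)
```

Dropping the saddle point (e.g. restricting Y so that y = 0 is excluded) reopens a gap between the two values.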
3.3 Lagrangian Duality
Min-max duality is the basis for the more general Lagrangian duality.
The primal problem is to minimize f(x) subject to g(x) ≥ 0 for x ∈ X. Let us define

L(x, λ) = f(x) − λᵀ g(x)    (16)

L∗(x) = max_{λ≥0} L(x, λ) = max_{λ≥0} (f(x) − λᵀ g(x))    (17)

Hence we have

L∗(x) = f(x) if g(x) ≥ 0 (x feasible), = ∞ otherwise    (18)

Now the primal min-max problem is

min_{x∈X} L∗(x)

Hence, the dual function is

L∗(λ) = min_{x∈X} L(x, λ)    (19)

So the corresponding max-min problem is

max_{λ≥0} L∗(λ), i.e. max_{λ≥0} min_{x∈X} (f(x) − λᵀ g(x))    (20)

3.3.1 Weak Duality Theorem
Let x be a feasible solution to the primal problem and (x, λ) a feasible solution pair for the dual problem. Then f(x) − λᵀ g(x) ≤ f(x) leads to

max_{λ≥0} L∗(λ) ≤ min_{x∈X} {f(x) : g(x) ≥ 0}    (21)

Duality Gap: the difference between the solutions of the primal and dual problems.
3.4 Convex Duality
Suppose f is convex in "minimize f(x) subject to g(x) ≥ 0", the gi's are all concave, all functions are continuously differentiable, and x∗ is a regular point of the constraints. Let λ∗ be the Lagrange multiplier of x∗. Then (x∗, λ∗) is dual feasible, and the primal and dual objective values are equal.
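As a worked illustration (the problem is hypothetical, added here): take minimize f(x) = x² subject to g(x) = x − 1 ≥ 0. Then L(x, λ) = x² − λ(x − 1), the inner minimum over x is at x = λ/2, and the dual function is L∗(λ) = −λ²/4 + λ, maximized at λ∗ = 2 with value 1, matching the primal optimum f(x∗) = 1 at x∗ = 1.

```python
# Convex duality with zero gap for: minimize x^2 s.t. x - 1 >= 0.
# Dual function (derived by hand): L*(lam) = -lam^2/4 + lam.
import numpy as np

lams = np.linspace(0, 4, 4001)         # grid over dual-feasible lam >= 0
dual = -lams**2 / 4 + lams             # dual function values on the grid

lam_star = lams[dual.argmax()]         # maximizer of the dual
dual_opt = dual.max()
primal_opt = 1.0                       # f(x*) = 1 at x* = 1

print(lam_star, dual_opt, primal_opt)  # lam* ~ 2; both optima ~ 1
assert np.isclose(dual_opt, primal_opt)   # zero duality gap
```

Here f is convex and g is concave (affine), so the theorem applies and strong duality holds.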