Feasible directions

Feasible directions
Consider the problem:
min_{x∈U} f(x),
where U = {x : gi (x) ≤ 0, i = 1, . . . , m} ⊂ Rn is a nonempty set
and x̃ ∈ cl U (the closure of U).
◮ The cone of feasible directions of U at x̃:
  D = {d : d ≠ 0, x̃ + td ∈ U ∀t ∈ (0, δ) for some δ > 0}
◮ The cone of improving directions:
  F = {d : f(x̃ + td) < f(x̃) ∀t ∈ (0, δ) for some δ > 0}
◮ Define F0 = {d : ∇f(x̃)⊤d < 0}.
Thm. 6.1
Assume that f (x) is differentiable at point x̃ ∈ U.
1. If x̃ is a local minimum, then F0 ∩ D = ∅.
2. Conversely, suppose that F0 ∩ D = ∅, f(x) is convex and at x̃
there exists ε > 0 such that
d = x − x̃ ∈ D, ∀x ∈ U ∩ B(x̃, ε),
where B(x̃, ε) = {x : ‖x − x̃‖ < ε}. Then x̃ is a local minimum.
Obs! For convex functions F0 = F; in general this does not hold.
Algebraic characterization of D
◮ Problem NLP:
  Minimize f(x)
  subject to gi(x) ≤ 0, i = 1, . . . , m.
◮ The index set of the active constraints at x̃:
  I(x̃) = {i : gi(x̃) = 0}.
Lemma 6.2
If the functions gi(x), i ∈ I(x̃), are differentiable at x̃ and the
functions gi are continuous for i ∉ I(x̃), then
D = {d : ∇gi(x̃)⊤d ≤ 0, i ∈ I(x̃)} = G0.
Optimality criterion
Thm. 6.3
Assume that x̃ ∈ U and that f (x) and gi (x), i = 1, . . . , m, are
differentiable at x̃.
1. If x̃ ∈ U is a local minimum, then F0 ∩ G0 = ∅.
2. If F0 ∩ G0 = ∅, f(x) is convex, gi(x), i ∈ I(x̃), are
differentiable at x̃ and gi are continuous for i ∉ I(x̃), then x̃ is a
local minimum.
Example 1
Minimize (x1 − 1)² + (x2 − 1)²
subject to x1 + x2 − 1 ≤ 0.
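A quick numerical check (a minimal sketch, not from the original slides): the unconstrained minimizer (1, 1) is infeasible, so the constraint is active at the solution and the KKT system becomes linear and can be solved directly.
Matlab-code (sketch):
% Stationarity: 2(x1-1) + u = 0, 2(x2-1) + u = 0; active constraint: x1 + x2 = 1
M   = [2 0 1; 0 2 1; 1 1 0];
rhs = [2; 2; 1];
sol = M \ rhs;               % sol = [x1; x2; u]
x = sol(1:2), u = sol(3)     % expect x = (1/2, 1/2) and u = 1 >= 0, so x is a KKT point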
Farkas lemma
Farkas lemma
Let A ∈ Rm×n. Then exactly one of the following alternatives holds:
1. There exists a vector d ∈ Rn such that Ad < 0.
2. There exists a nonzero vector p ∈ Rm_+ such that A⊤p = 0.
Karush-Kuhn-Tucker conditions
Thm. 6.4
Let x̃ be a feasible solution and I(x̃) as before. Suppose that f(x)
and gi(x) are differentiable and that the vectors ∇gi(x̃), i ∈ I(x̃),
are linearly independent. If x̃ is a local minimum, then there exist
scalars ui, i ∈ I(x̃), such that
∇f(x̃) + Σ_{i∈I(x̃)} ui ∇gi(x̃) = 0,
ui ≥ 0, i ∈ I(x̃).
Proof of KKT-conditions
(i) Thm. 6.3 ⇒ there does not exist a vector d ∈ F0 ∩ G0, i.e. the set
{d : ∇f(x̃)⊤d < 0, ∇gi(x̃)⊤d < 0, i ∈ I(x̃)} = ∅.
(ii) Define the matrix A whose rows are ∇f(x̃)⊤ and ∇gi(x̃)⊤, i ∈ I(x̃).
Hence the system Ad < 0 is inconsistent. Then by the Farkas lemma the system
p ≥ 0, A⊤p = 0, p ≠ 0
has a solution.
(iii) Denote p = [p0, pI(x̃)]. Then by (ii) we have
p0 ∇f(x̃) + Σ_{i∈I(x̃)} pi ∇gi(x̃) = 0.
Proof continues
(iv) We must have p0 > 0, because otherwise we would violate the
linear independence of ∇gi(x̃), i ∈ I(x̃).
(v) The statement follows by setting ui = pi/p0.
◮ The scalars ui are the Lagrange multipliers.
◮ x̃ ∈ U is the primal feasibility condition.
◮ The dual feasibility condition:
∇f(x̃) + Σ_{i∈I(x̃)} ui ∇gi(x̃) = 0, ui ≥ 0, i ∈ I(x̃).
Alternative form of KKT
◮ If i ∉ I(x̃), then set ui = 0.
◮ KKT-conditions:
∇f(x̃) + Σ_{i=1}^m ui ∇gi(x̃) = 0
ui ≥ 0, i = 1, . . . , m
ui gi(x̃) = 0, i = 1, . . . , m.
◮ In vector form:
∇f(x̃) + ∇g(x̃)⊤u = 0
u⊤g(x̃) = 0
u ≥ 0.
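As an illustration (a sketch with assumed data, not from the slides), the vector-form conditions can be checked numerically at a candidate pair (x, u); here the objective of Example 1 is combined with a second, inactive constraint −x1 ≤ 0 to show that ui = 0 for i ∉ I(x̃).
Matlab-code (sketch):
gradf = @(x) [2*(x(1)-1); 2*(x(2)-1)];   % gradient of (x1-1)^2 + (x2-1)^2
C = [1 1; -1 0]; d = [1; 0];             % g(x) = C*x - d <= 0
x = [0.5; 0.5];
u = [1; 0];                              % u2 = 0: the second constraint is inactive
stationarity    = gradf(x) + C'*u        % should be [0; 0]
complementarity = u.*(C*x - d)           % each component should be 0
feasible        = all(C*x - d <= 0) && all(u >= 0)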
KKT sufficient conditions
Thm. 6.5
Let x̃ be a KKT solution and assume that all constraints are
active. Then
(a) If there exists ε > 0 such that the functions f and gi(x) are
convex and differentiable over B(x̃, ε) ∩ U, then x̃ is a local
minimum.
(b) Assume that f and gi(x) are differentiable. If f is
pseudoconvex, i.e. ∇f(z)⊤(y − z) ≥ 0 implies f(y) ≥ f(z),
and the level sets of gi(x), i = 1, . . . , m, are convex, then x̃
is the global optimal solution.
Obs! For a convex optimization problem the KKT conditions are
also sufficient.
Proof of the sufficiency
Let x ∈ U be any point. Then

f(x̃) ≤ f(x̃) − Σ_{i=1}^m λi gi(x)                 (λi ≥ 0, gi(x) ≤ 0)
     = f(x̃) − Σ_{i∈I(x̃)} λi (gi(x) − gi(x̃))      (λi = 0 for i ∉ I(x̃), gi(x̃) = 0 for i ∈ I(x̃))
     ≤ f(x̃) − Σ_{i∈I(x̃)} λi ∇gi(x̃)⊤(x − x̃)      (gi convex)
     = f(x̃) + ∇f(x̃)⊤(x − x̃)                      (KKT)
     ≤ f(x)                                         (f is convex)
Interpretation
◮ The KKT-conditions express: there exists a function
Fx̃(x) = f(x) + Σ_{i=1}^m λi(x̃) gi(x),
depending on x̃, such that
f(x̃) = inf_{x∈U} f(x) ⇒ ∇x Fx̃(x̃) = 0 and f(x̃) = Fx̃(x̃).
◮ If λ(x̃) is known, we end up with the necessary condition of an
unconstrained optimization problem.
Saddle point
◮ Let V and M be any sets and L : V × M → R a function.
◮ The point (x̃, λ) ∈ V × M is a saddle point if
sup_{µ∈M} L(x̃, µ) = L(x̃, λ) = inf_{x∈V} L(x, λ).
Saddle point property
Thm. 6.6
If (x̃, λ) is a saddle point of a function L : V × M → R, then
sup_{µ∈M} inf_{x∈V} L(x, µ) = L(x̃, λ) = inf_{x∈V} sup_{µ∈M} L(x, µ).
Proof:
◮ ∀(x̄, µ̄) ∈ V × M : inf_{x∈V} L(x, µ̄) ≤ L(x̄, µ̄) ≤ sup_{µ∈M} L(x̄, µ)
◮ Since (x̃, λ) is a saddle point, we have
inf_{x∈V} sup_{µ∈M} L(x, µ) ≤ sup_{µ∈M} L(x̃, µ) = L(x̃, λ)
= inf_{x∈V} L(x, λ) ≤ sup_{µ∈M} inf_{x∈V} L(x, µ).
A general non-linear programming problem
◮ The cost function f : Rn → R.
◮ The constraint set U = {x ∈ Rn : gi(x) ≤ 0, 1 ≤ i ≤ m}.
◮ The problem (P): f(x̃) = inf_{x∈U} f(x).
◮ Every solution x̃ of the problem (P) is also the first argument
of a saddle point (x̃, λ) of a suitable Lagrangian function
L : Rn × M → R.
◮ Conversely, if (x̃, λ) is a saddle point of this Lagrangian, then
x̃ is a solution of (P).
◮ The second argument λ is nothing else than the vector from
Rm_+ which appears in the KKT-conditions.
The Lagrangian
The Lagrangian associated with the problem (P):
L(x, µ) = f(x) + Σ_{i=1}^m µi gi(x).
Thm. 6.7
1. If (x̃, λ) ∈ Rn × Rm_+ is a saddle point of the Lagrangian L,
then the point x̃ ∈ U is a solution of the problem (P).
2. Suppose that the functions f(x) and gi(x) are convex and
differentiable at a point x̃ at which the constraints are
satisfied. Then, if x̃ is a solution of (P), there exists at least
one vector λ ∈ Rm_+ such that the pair (x̃, λ) is a saddle point
of the Lagrangian.
The associated unconstrained problem
◮ If one knew λ, then the constrained problem (P) would be
replaced by the unconstrained problem (Pλ):
find xλ ∈ Rn such that L(xλ, λ) = inf_{x∈Rn} L(x, λ).
◮ How do we find λ ∈ Rm_+?
◮ The dual problem (Q):
find λ ∈ Rm_+ such that G(λ) = sup_{µ∈Rm_+} G(µ),
where
G(µ) = inf_{x∈Rn} L(x, µ).
The dual problem
◮ The dual problem is a constrained optimization problem;
◮ but its constraints are easily dealt with: µi ≥ 0, i = 1, . . . , m.
◮ The associated projection operator is quite simple:
P+(µi) = max{0, µi}.
◮ The constraints gi(x) ≤ 0 are, in general, much harder to deal
with numerically: there is no simple projection onto U.
Primal-dual problem
Thm. 6.8
1. Suppose that the constraint functions gi(x), i = 1, . . . , m,
are continuous and that for every µ ∈ Rm_+ the problem (Pµ)
has a unique solution xµ which depends continuously on the
dual variable µ. Then, if λ is any solution of the dual problem
(Q), the solution xλ of the corresponding problem (Pλ) is a
solution of the primal problem (P).
2. Suppose that the primal problem (P) has at least one
solution x̃, and that the functions f (x), gi (x) are convex and
differentiable and x̃ ∈ U. Then the dual problem (Q) has at
least one solution.
Quadratic optimization problem
◮ The quadratic functional
f(x) = ½ x⊤Ax − b⊤x,
where A is symmetric and positive definite, and b is a given vector.
◮ The constraint set
U = {x ∈ Rn : Cx ≤ d},
where C ∈ Rm×n, d ∈ Rm.
The primal-dual formulation
◮ The Lagrangian functional
L(x, µ) = ½ x⊤Ax − b⊤x + µ⊤(Cx − d)
        = ½ x⊤Ax − (b − C⊤µ)⊤x − µ⊤d
◮ Thm 6.X ⇒ the primal problem (P) has a unique solution x̃.
◮ Thm 6.7 ⇒ it has at least one saddle point (x̃, λ).
◮ Uniqueness of the saddle point ⇔ uniqueness of the solution of
the dual problem.
◮ We have to verify: for every µ ∈ Rm_+ the problem (Pµ) has a
unique solution.
Uniqueness of (Pµ)
The function L(·, µ) is quadratic,
L(x, µ) = ½ x⊤Ax − bµ⊤x + cµ,  with bµ = b − C⊤µ, cµ = −µ⊤d,
and it has a unique minimum xµ satisfying
Axµ = b − C⊤µ ⇔ xµ = A⁻¹(b − C⊤µ),
which depends continuously on µ.
The dual function G (µ)
◮ By the uniqueness of xµ and (Axµ)⊤x = bµ⊤x we have
G(µ) = L(xµ, µ) = ½ xµ⊤Axµ − bµ⊤xµ + cµ = −½ bµ⊤xµ + cµ
     = −½ (b − C⊤µ)⊤xµ − µ⊤d
     = −½ (b − C⊤µ)⊤[A⁻¹(b − C⊤µ)] − µ⊤d.
◮ After some algebraic gymnastics
−G(µ) = ½ µ⊤CA⁻¹C⊤µ − (CA⁻¹b − d)⊤µ + ½ b⊤A⁻¹b.
◮ The matrix CA⁻¹C⊤ is symmetric, and it is positive definite if
and only if Rank(C) = m (≤ n).
◮ When Rank C = m ⇔ Im C = Rm ⇔ N(C⊤) = {0}, the dual
problem has a unique solution.
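A small numerical cross-check (hypothetical data A, b, C, d chosen only for illustration): evaluate G(µ) once as L(xµ, µ) and once from the closed-form quadratic above; the two values should agree.
Matlab-code (sketch):
A = [3 1; 1 2]; b = [1; 1];      % symmetric positive definite A, vector b
C = [1 1; -1 0]; d = [1; 0];     % constraint set Cx <= d
mu  = [0.3; 0.7];                % any mu >= 0
xmu = A \ (b - C'*mu);           % unique minimizer of L(., mu)
G1  = 0.5*xmu'*A*xmu - (b - C'*mu)'*xmu - mu'*d;             % G(mu) = L(xmu, mu)
G2  = -(0.5*mu'*(C*(A\C'))*mu - (C*(A\b) - d)'*mu + 0.5*b'*(A\b));
disp([G1 G2])                    % the two numbers should coincide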
Duality gap
◮ Primal problem: p∗ = min_{g(x)≤0} f(x)
◮ Dual problem: d∗ = max_{u≥0} G(u)
◮ Weak duality: d∗ ≤ p∗ is always valid, even for non-convex
problems.
◮ Strong duality: d∗ = p∗.
◮ Strong duality holds for convex problems (Thm. 6.7).
◮ The optimal duality gap: p∗ − d∗ ≥ 0.
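A worked illustration (a sketch; the closed form of G below is derived here for Example 1 and is not in the slides): for min (x1 − 1)² + (x2 − 1)² subject to x1 + x2 − 1 ≤ 0, the inner minimizer of the Lagrangian is x1 = x2 = 1 − u/2, so G(u) = u − u²/2, and the duality gap is zero.
Matlab-code (sketch):
u = linspace(0, 3, 301);
G = u - u.^2/2;                   % dual function of Example 1
dstar = max(G)                    % dual optimum, attained at u = 1
pstar = (0.5-1)^2 + (0.5-1)^2     % primal optimum at x = (1/2, 1/2)
gap   = pstar - dstar             % zero: strong duality for this convex problem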
Tax interpretation of duality
◮ The variables xi, i = 1, . . . , n, describe the operation mode
of a company.
◮ The operation cost is f(x).
◮ Each of the constraints gj(x) ≤ 0, j = 1, . . . , m, represents
some resource or regulatory limit (labour, warehouse space, raw
materials, etc.).
◮ Problem: find the optimal operation mode x for the company
under the given limitations:
p∗ = min_{g(x)≤0} f(x),
where g(x) = [g1(x), g2(x), . . . , gm(x)].
Penalty for violations
◮ Assumption: the limits can be violated.
◮ The penalty is linear in the amount of violation, with cost ui gi(x):
  if ui gi(x) > 0, the company pays a penalty;
  if ui gi(x) < 0, the company receives money for selling spare capacity.
◮ ui is the price in euros for breaking the constraint gi(x) ≤ 0,
per unit of violation.
◮ Total cost:
L(x, u) = f(x) + u⊤g(x).
Interpretation
◮ The dual problem: d∗ = max_{u≥0} G(u).
◮ Weak duality: the optimal cost, if violations are allowed, is
less than in the original situation, even under the most
unfavourable prices ui.
◮ Strong duality: the dual optimum λ is a set of prices for
which the company gains no advantage by violating the
constraints. They are called the shadow prices.
Equality constraints
◮ Primal problem:
min f(x)
subject to g(x) ≤ 0, h(x) = 0,
where g(x) = [g1(x), . . . , gm(x)]⊤ and h(x) = [h1(x), . . . , hp(x)]⊤.
◮ Lagrangian:
L(x, u, v) = f(x) + Σ_{i=1}^m ui gi(x) + Σ_{k=1}^p vk hk(x).
KKT optimality conditions
KKT-conditions
∇x L(x, u, v) = ∇f(x) + Σ_{i=1}^m ui ∇gi(x) + Σ_{k=1}^p vk ∇hk(x) = 0
ui gi(x) = 0, ui ≥ 0, gi(x) ≤ 0, i = 1, . . . , m
hk(x) = 0, k = 1, . . . , p.
The dual function
G(u, v) = inf_{x∈Rn} L(x, u, v)
is always concave.
Dual problem: max_{u≥0, v} G(u, v).
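A small illustration (assumed example, not from the slides): with an equality constraint the multiplier v carries no sign restriction, and for a quadratic objective the KKT system is again linear.
Matlab-code (sketch):
% min x1^2 + x2^2  s.t.  h(x) = x1 + x2 - 2 = 0
% Stationarity: 2*x1 + v = 0, 2*x2 + v = 0; feasibility: x1 + x2 = 2
M   = [2 0 1; 0 2 1; 1 1 0];
rhs = [0; 0; 2];
sol = M \ rhs;                 % sol = [x1; x2; v]
x = sol(1:2), v = sol(3)       % expect x = (1, 1) and v = -2 (v may be negative)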
Uzawa’s method
◮ The primal problem (P):
min_{x∈U} f(x),                                   (1)
where U = {x ∈ Rn : gi(x) ≤ 0, i = 1, . . . , m}.
◮ The objective: construct an algorithm which enables us to
approximate a solution of (P).
◮ The dual problem (Q):
max_{µ∈Rm_+} G(µ) = G(λ).
◮ We apply the gradient method to the dual problem, using the
projection operator (P+λ)i = max{λi, 0}, 1 ≤ i ≤ m.
◮ Generate the sequence {λ(k)}k∈N:
λ(k+1) = P+[λ(k) + ρk ∇G(λ(k))], k ≥ 0.
The gradient of the dual function
◮ Let λ ∈ Rm_+ and λ + µ ∈ Rm_+.
◮ Let xλ, xλ+µ be the solutions of the problem
min_x f(x) + α⊤g(x),  α = λ or λ + µ.
◮ Then we have the inequalities
L(xλ, λ) ≤ L(xλ+µ, λ),  L(xλ+µ, λ + µ) ≤ L(xλ, λ + µ)
⇓
µ⊤g(xλ+µ) ≤ G(λ + µ) − G(λ) ≤ µ⊤g(xλ).
The gradient of the dual function
◮ Then there exists 0 ≤ θ ≤ 1 such that
G(λ + µ) − G(λ) = µ⊤g(xλ) + θ µ⊤[g(xλ+µ) − g(xλ)].
◮ Assumption: gi(x), 1 ≤ i ≤ m, are continuous. Then
G(λ + µ) − G(λ) = µ⊤g(xλ) + ‖µ‖ ε(µ),  lim_{µ→0} ε(µ) = 0.
◮ The gradient of the dual function is ∇G(λ) = g(xλ).
◮ xλ is the solution of the unconstrained problem
min_x f(x) + λ⊤g(x).
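A numerical sanity check (sketch with hypothetical QP data): for the quadratic problem of the earlier slides, G(µ) is known in closed form, so ∇G(λ) = g(xλ) can be compared with a finite-difference approximation.
Matlab-code (sketch):
A = [3 1; 1 2]; b = [1; 1]; C = [1 1; -1 0]; d = [1; 0];
G = @(mu) -0.5*(b - C'*mu)'*(A\(b - C'*mu)) - mu'*d;   % dual function of the QP
lam  = [0.4; 0.2];
xlam = A \ (b - C'*lam);                               % x_lambda
g_analytic = C*xlam - d;                               % g(x_lambda)
h = 1e-6; g_numeric = zeros(2,1);
for i = 1:2
    e = zeros(2,1); e(i) = h;
    g_numeric(i) = (G(lam + e) - G(lam - e)) / (2*h);  % central difference
end
disp([g_analytic g_numeric])                           % columns should agree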
Uzawa’s algorithm
◮ Initialization: choose any λ(0) ∈ Rm_+.
◮ Define a sequence of pairs (x(k), λ(k)) ∈ Rn × Rm_+ by means
of the recurrence formulae:
Calculate x(k):  min_x f(x) + (λ(k))⊤g(x)
Calculate λ(k+1):  λi(k+1) = max{λi(k) + ρk gi(x(k)), 0}
◮ The stepsize ρk is defined by the gradient method.
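A minimal Uzawa sketch (the problem data, step size and starting point are illustrative choices, not from the slides); the inner minimization is done here with Matlab's fminsearch:
Matlab-code (sketch):
f   = @(x) (x(1)-1)^2 + (x(2)-1)^2;        % objective of Example 1 above
g   = @(x) x(1) + x(2) - 1;                % single constraint g(x) <= 0
rho = 0.5; lam = 0; x = [0; 0];
for k = 1:100
    x   = fminsearch(@(z) f(z) + lam*g(z), x);   % x(k): minimize the Lagrangian
    lam = max(lam + rho*g(x), 0);                % lambda(k+1): projected ascent step
end
disp([x' lam])                                   % approx. (0.5, 0.5) and lambda = 1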
The convergence of Uzawa's method
◮ f(x) is elliptic: (∇f(x) − ∇f(y))⊤(x − y) ≥ α‖x − y‖²
◮ The constraints are linear inequalities:
g(x) ≤ 0 ⇔ Cx − d ≤ 0
◮ The norm of C: ‖C‖ = max_{x≠0} ‖Cx‖m / ‖x‖n
Thm 6.9
If
0 < ρk < 2α / ‖C‖²,
then the sequence {x(k)}k∈N converges to the unique solution of
the primal problem (P).
Furthermore, if Rank(C) = m, then also λ(k) → λ, which is the
unique solution of the dual problem (Q).
Example 1
Example 1
Solve
min_{x1+x2≥2} x1² + x2²
using Uzawa's method.
Solution: Choose λ0 = 1 and the stepsize ρ = 1.
1. min_x x1² + x2² − x1 − x2 + 2 ⇒ 2x1 − 1 = 0, 2x2 − 1 = 0 ⇒ x(0) = (1/2, 1/2).
   Projection phase:
   λ1 = max{λ0 + ρ∇G(λ0), 0} = max{1 + (2 − x1(0) − x2(0)), 0} = 2.
2. min_x x1² + x2² − 2x1 − 2x2 + 4 ⇒ 2x1 − 2 = 0, 2x2 − 2 = 0 ⇒ x(1) = (1, 1).
   Projection phase: λ2 = max{2 + (2 − x1(1) − x2(1)), 0} = 2. STOP!!!
The optimal solution has been found.
Example 2
Example 2
min_{x1²+x2²−1≤0} x1 + x2
The steps in Uzawa's algorithm:
◮ min_x x1 + x2 + λ(x1² + x2² − 1) ⇒ xλ = (−1/(2λ), −1/(2λ))
◮ The gradient of the dual function: ∇G(λ) = g(xλ) = 1/(2λ²) − 1
◮ The projection: λnext = max{λ + ρ(1/(2λ²) − 1), 0}
◮ What is the correct step size?
The step size ρ = 1
Matlab-code:
>> r = 1; (the stepsize)
>> l(1) = 1; (λ0 = 1)
>> for k = 1 : 20
l(k + 1) = max(l(k) + r/(2*l(k)^2) - r, 0);
end
The result
ρ = 1 (the iterates oscillate):
λ = (1.0000, 0.5, 1.5, 0.7222, 0.6808, 0.7596, 0.6262, 0.9013, 0.5168, 1.3888,
     0.6481, 0.8386, 0.5496, 1.2049, 0.5493, 1.2064, 0.5499, 1.2032, 0.5486,
     1.21, 0.5515)⊤
ρ = 1/2 ⇒ Convergence:
Λ = (1.000000, 0.750000, 0.694444, 0.712844, 0.704828, 0.708066, 0.706712,
     0.707271, 0.707039, 0.707135, 0.707095, 0.707112, 0.707105, 0.707108,
     0.707106, 0.707107, 0.707107, 0.707107, 0.707107, 0.707107, 0.707107)⊤
Linearly constrained quadratic optimization
◮ In the case of the quadratic functional
f(x) = ½ x⊤Ax − b⊤x
subject to the constraints U = {x : Cx − d ≤ 0},
◮ one step of Uzawa's algorithm is
calculate x(k):    Ax(k) − b + C⊤λ(k) = 0
calculate λ(k+1):  λ(k+1) = max{λ(k) + ρ(Cx(k) − d), 0}
◮ If A is symmetric and positive definite, then Uzawa's
algorithm converges provided
0 < ρ < 2σ1(A) / ‖C‖²,
where σ1(A) is the smallest eigenvalue of A.
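A compact implementation of this iteration (sketch; the data encode Example 1 above, min x1² + x2² subject to x1 + x2 ≥ 2, and are only illustrative):
Matlab-code (sketch):
A = [2 0; 0 2]; b = [0; 0];              % f(x) = x1^2 + x2^2
C = [-1 -1];    d = -2;                  % Cx <= d encodes x1 + x2 >= 2
rho = 0.9 * 2*min(eig(A)) / norm(C)^2;   % step size inside the convergence bound
lam = 1;
for k = 1:200
    x   = A \ (b - C'*lam);              % solve A x(k) = b - C' lambda(k)
    lam = max(lam + rho*(C*x - d), 0);   % projected dual update
end
disp([x' lam])                           % expect x = (1, 1), lambda = 2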
Proof of Uzawa’s method
1. f(x) is elliptic;
2. U is non-empty and convex
⇓
The problems
min_{x∈U} f(x),   min_{x∈Rn} f(x) + u⊤(Cx − d)
have unique solutions x̃ and x(u).
Moreover, the dual problem (Q) max_{u≥0} G(u) has at least one
solution (Thm. 6.7).
Proof continues
Some facts:
1. There exists at least one λ ∈ Rm_+ such that the pair (x(λ), λ)
is a saddle point of the Lagrangian L(x, u).
2. ∇f(x(λ)) + C⊤λ = 0.
3. (Cx(λ) − d)⊤(u − λ) ≤ 0, ∀u ∈ Rm_+, since
L(x(λ), λ) = max_{u∈Rm_+} L(x(λ), u).
The last relation can be written as
(λ − (λ + ρ(Cx(λ) − d)))⊤(u − λ) ≥ 0, ∀u ∈ Rm_+.
So: λ is the projection of λ + ρ∇G(λ) onto Rm_+.
Proof
In short:
∇f(x(λ)) + C⊤λ = 0,
λ = P+(λ + ρ(Cx(λ) − d)).
By the definition of Uzawa's algorithm:
∇f(x(k)) + C⊤λ(k) = 0,
λ(k+1) = P+(λ(k) + ρ(Cx(k) − d)).
Hence we obtain (the projection does not increase distances):
∇f(x(k)) − ∇f(x(λ)) + C⊤(λ(k) − λ) = 0,
‖λ(k+1) − λ‖ ≤ ‖λ(k) − λ + ρC(x(k) − x(λ))‖.
Convergence of the sequence x (k)
Taking squares and using the ellipticity:
‖λ(k+1) − λ‖² ≤ ‖λ(k) − λ‖² + 2ρ(C⊤(λ(k) − λ))⊤(x(k) − x(λ))
                + ρ²‖C(x(k) − x(λ))‖²
             = ‖λ(k) − λ‖² − 2ρ(∇f(x(k)) − ∇f(x(λ)))⊤(x(k) − x(λ))
                + ρ²‖C(x(k) − x(λ))‖²
             ≤ ‖λ(k) − λ‖² − ρ(2α − ρ‖C‖²)‖x(k) − x(λ)‖²
If 0 < ρ < 2α/‖C‖², then ‖λ(k+1) − λ‖ ≤ ‖λ(k) − λ‖.
Convergence
The sequence {‖λ(k) − λ‖}k is monotone decreasing and bounded below
⇒ lim_{k→∞} {‖λ(k+1) − λ‖² − ‖λ(k) − λ‖²} = 0.
On the other hand,
ρ(2α − ρ‖C‖²)‖x(k) − x(λ)‖² ≤ ‖λ(k) − λ‖² − ‖λ(k+1) − λ‖² → 0
as k → ∞. Since 2α − ρ‖C‖² > 0, this gives x(k) → x(λ) = x̃.
Convergence of {λ(k) }k
◮ ‖λ(k) − λ‖ converges ⇒ {λ(k)}k is bounded.
◮ There exists a converging subsequence {λ(l)}l with limit λ̃.
◮ Passing to the limit in ∇f(x(l)) + C⊤λ(l) = 0 gives
∇f(x(λ)) + C⊤λ̃ = 0.
◮ If Rank(C) = m, then the dual problem has a unique solution
λ, for which (the primal problem being uniquely solvable)
∇f(x(λ)) + C⊤λ = 0.
◮ λ = λ̃, because Ker(C⊤) = {0}.
◮ The convergence follows for the whole sequence.