Constrained Convex Optimization

Sebastian Urban
[email protected]
Convexity
Why do we need convex functions for optimization?
Non-convex functions can have multiple local minima with ∇f(x) = 0.
Gradient descent will not necessarily find the global minimum.
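The point above can be checked numerically. A minimal sketch (the function f(x) = x⁴ − 3x² + x, the learning rate, and the start points are illustrative choices, not from the slides): plain gradient descent started on either side of the central hump converges to a different stationary point.

```python
import numpy as np

def grad_descent(df, x0, lr=0.01, steps=2000):
    # Plain gradient descent: repeatedly step along the negative gradient.
    x = x0
    for _ in range(steps):
        x = x - lr * df(x)
    return x

# A non-convex function with two local minima (an illustrative choice,
# not taken from the slides): f(x) = x^4 - 3 x^2 + x.
f = lambda x: x**4 - 3 * x**2 + x
df = lambda x: 4 * x**3 - 6 * x + 1

left = grad_descent(df, -2.0)    # started left of the hump
right = grad_descent(df, 2.0)    # started right of the hump
print(left, f(left))
print(right, f(right))
```

Both end points have (numerically) zero gradient, but their function values differ: the start point decides which local minimum gradient descent finds.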
Convex function
Definition (convex function)
A function f : X → R defined on a convex set X is called convex if, for
any two points x , y ∈ X and any t ∈ [0, 1],
f (tx + (1 − t)y ) ≤ tf (x ) + (1 − t)f (y ) .
If a function is convex, the straight line connecting any two points on its graph lies on or above the graph.
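The defining inequality can be tested on sampled points. A sketch (the test functions and sample grids are assumptions for illustration); note that a finite sample can refute convexity but never prove it.

```python
import numpy as np

def is_convex_on_samples(f, xs, ts, tol=1e-12):
    # Check f(t x + (1 - t) y) <= t f(x) + (1 - t) f(y) on sampled points.
    # A finite sample can only refute convexity, never prove it.
    for x in xs:
        for y in xs:
            for t in ts:
                if f(t * x + (1 - t) * y) > t * f(x) + (1 - t) * f(y) + tol:
                    return False
    return True

xs = np.linspace(-3, 3, 21)
ts = np.linspace(0, 1, 11)
print(is_convex_on_samples(lambda x: x**2, xs, ts))       # x^2 satisfies the inequality
print(is_convex_on_samples(lambda x: np.sin(x), xs, ts))  # sin violates it on [-3, 3]
```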
Concave function
Definition (concave function)
A function f : X → R is concave if and only if −f is convex.
First-order convexity condition
[Figure: the chord from (x, f(x)) to (y, f(y)) lies above the graph at (1 − t)x + ty.]

Convexity imposes a rate-of-rise constraint on the function. Rearranging the convexity inequality

f((1 − t)x + ty) ≤ (1 − t)f(x) + tf(y)

gives

f(y) − f(x) ≥ [f((1 − t)x + ty) − f(x)] / t .

The difference between f(y) and f(x) is bounded by the function values between x and y.
First-order convexity condition
f(y) − f(x) ≥ [f(x + t(y − x)) − f(x)] / t

Let t → 0 and apply the definition of the directional derivative,

f(y) − f(x) ≥ (y − x)ᵀ ∇f(x) .
Theorem
Suppose f : X → R is a differentiable function and X is convex. Then f is convex if and only if, for all x, y ∈ X,

f(y) ≥ f(x) + (y − x)ᵀ ∇f(x) .
Proof.
See Boyd p. 70.
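The first-order condition can be illustrated numerically. A sketch for the convex quadratic f(x) = xᵀAx with A symmetric positive definite (an assumed example, not from the slides); its gradient is 2Ax.

```python
import numpy as np

# Convex quadratic f(x) = x^T A x with A symmetric positive definite
# (an assumed example, not taken from the slides); gradient is 2 A x.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
f = lambda x: x @ A @ x
grad = lambda x: 2 * A @ x

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    # First-order condition: the tangent plane at x underestimates f at y.
    assert f(y) >= f(x) + (y - x) @ grad(x) - 1e-9
print("f(y) >= f(x) + (y - x)^T grad f(x) held on all sampled pairs")
```

For this f the gap f(y) − f(x) − (y − x)ᵀ∇f(x) equals (y − x)ᵀA(y − x) ≥ 0 exactly, so the inequality holds for every pair, not just the sampled ones.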
Minimization of convex functions
Given a function f0 : RN → R, minimize f0 (x ).
Local minima have zero gradient
At a (local) minimum point x ∗ the gradient must
be zero,
∇f0 (x ∗ ) = 0 ,
otherwise we could follow the gradient to get an
even lower value,
f0(x∗ − ε∇f0(x∗)) = f0(x∗) − ε‖∇f0(x∗)‖₂² + O(ε²‖∇f0(x∗)‖₂²) < f0(x∗)

for sufficiently small ε.
Zero gradient implies minimum for convex functions
If f0 is a convex function and ∇f0 (x ∗ ) = 0 then x ∗ is a (global)
minimum of f0 . Using the first-order convexity criterion, we have
f0 (y ) ≥ f0 (x ∗ ) + (y − x ∗ )T ∇f0 (x ∗ ) = f0 (x ∗ )
for all y .
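A sketch of "zero gradient implies global minimum" for a convex quadratic f0(x) = xᵀAx − bᵀx (A and b are assumed example values, not from the slides): solving ∇f0(x∗) = 0 directly yields a point that beats every random sample.

```python
import numpy as np

# Convex quadratic f0(x) = x^T A x - b^T x (A, b are assumed example values).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])   # symmetric positive definite, so f0 is convex
b = np.array([1.0, -1.0])
f0 = lambda x: x @ A @ x - b @ x
grad = lambda x: 2 * A @ x - b

# Solve grad f0(x*) = 0 directly: 2 A x* = b.
x_star = np.linalg.solve(2 * A, b)
assert np.allclose(grad(x_star), 0)

# The stationary point is the global minimum over a cloud of random samples:
rng = np.random.default_rng(1)
samples = rng.normal(size=(1000, 2), scale=5.0)
assert all(f0(x_star) <= f0(y) + 1e-9 for y in samples)
print("stationary point", x_star, "is the global minimum")
```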
Convex set
Definition (convex set)
A set X in a vector space is called convex if, for all x , y ∈ X and any
t ∈ [0, 1],
[(1 − t)x + ty ] ∈ X .
A convex set.
A non-convex set.
Constrained Optimization
Optimization with inequality constraints
Constrained optimization problem
Given f0 : RN → R and fi : RN → R,

minimize f0(x)
subject to fi(x) ≤ 0,  i = 1, . . . , m .

Feasibility
A point x ∈ RN is called feasible if and only if it satisfies all constraints fi(x) ≤ 0, i = 1, . . . , m, of the optimization problem.

Minimum and minimizer
We call the optimal value the minimum p∗, and the point where the minimum is attained the minimizer x∗. Thus p∗ = f0(x∗).
Minimization with inequality constraints
Example
minimize f0 (x ) = −(x1 + x2 )
subject to f1 (x ) = x12 + x22 − 1 ≤ 0
• If the minimum is in the interior of the circle (f1(x∗) < 0), we have reached it when ∇f0(x∗) = 0.
• If the minimum is on the circle (f1(x∗) = 0), we have reached it when the negative gradient −∇f0(x∗) has no component that we could follow without changing the value of f1.

[Figure: the feasible disc f1(x) ≤ 0 with ∇f1, −∇f0 and the constrained minimum; f0(x) is color coded.]

Thus at the minimizer x∗ we have

−∇f0(x∗) = α ∇f1(x∗)

with α ≥ 0 to ensure that −∇f0 and ∇f1 are not anti-parallel.
Multiple inequality constraints
Given f0 : RN → R and fi : RN → R,

minimize f0(x)
subject to fi(x) ≤ 0,  i = 1, . . . , m .

For multiple constraints f1, . . . , fm we have reached the minimum when the negative gradient has no component that we could follow without changing the value of any constraint.
This is the case when −∇f0 is a nonnegative linear combination of the ∇fi’s,

−∇f0(x∗) = Σ_{i=1}^m αi ∇fi(x∗) ,   αi ≥ 0 .
Lagrangian
minimize f0 (x )
subject to fi (x ) ≤ 0,
i = 1, . . . , m
Definition (Lagrangian)
We define the Lagrangian L : RN × Rm → R associated with the above problem as

L(x, α) = f0(x) + Σ_{i=1}^m αi fi(x) .
We refer to αi ≥ 0 as the Lagrange multiplier associated with the
inequality constraint fi (x ) ≤ 0.
Calculating the gradient of L w.r.t. x shows that our optimality criterion at x∗ is recovered,

∇x L(x∗, α) = ∇f0(x∗) + Σ_{i=1}^m αi ∇fi(x∗) = 0 .
Interpretation of the Lagrangian
We can rewrite the problem,
minimize f0 (x )
subject to fi (x ) ≤ 0,
i = 1, . . . , m ,
as the unconstrained problem

minimize f0(x) + Σ_{i=1}^m I(fi(x))

where the indicator function,

I(u) = 0 for u ≤ 0,   I(u) = ∞ for u > 0,

ensures that solutions that violate the constraints are rejected.
If we “approximate” I(fi(x)) by the linear underestimator αi fi(x) with αi ≥ 0, the objective becomes

minimize L(x, α) = f0(x) + Σ_{i=1}^m αi fi(x) .
Interpretation of the Lagrangian
minimize f0 (x )
subject to fi (x ) ≤ 0,
i = 1, . . . , m
L(x, α) = f0(x) + Σ_{i=1}^m αi fi(x)
(Animation (lagrangian2.avi) only visible in Adobe Reader.)
For every α the minimum of L(x , α) w.r.t. x is a lower bound on the
optimal value of the constrained problem.
Interpretation of the Lagrangian
minimize f0 (x )
subject to fi (x ) ≤ 0,
i = 1, . . . , m
L(x, α) = f0(x) + Σ_{i=1}^m αi fi(x)

[Figure: magenta line shows min_x L(x, α) for α ∈ [0, 5].]

How does it relate to the solution of the constrained optimization problem?
Lagrange dual function
Definition (Lagrange dual function)
The Lagrange dual function g : Rm → R is the minimum of the Lagrangian over x given α,

g(α) = min_x L(x, α) = min_x ( f0(x) + Σ_{i=1}^m αi fi(x) ) .
It is concave since it is the pointwise minimum of a family of affine
functions of α. (Exercise)
Lagrange dual function
Theorem
For any α ≥ 0 the value of the Lagrange dual function g(α) is a lower
bound of the minimum p ∗ of the constrained problem,
g(α) ≤ p ∗ .
Proof.
Suppose α ≥ 0 and x̃ is a feasible point, that is fi(x̃) ≤ 0. Then we have

Σ_{i=1}^m αi fi(x̃) ≤ 0

and therefore

L(x̃, α) = f0(x̃) + Σ_{i=1}^m αi fi(x̃) ≤ f0(x̃) .

Hence

g(α) = min_x L(x, α) ≤ L(x̃, α) ≤ f0(x̃) .
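A numerical spot-check of this bound, using the circle example from earlier (the optimum p∗ = −√2 is taken from the worked example later in these slides; the α grid and the use of scipy's generic minimizer are illustrative choices).

```python
import numpy as np
from scipy.optimize import minimize

# Circle example: minimize -(x1 + x2) subject to x1^2 + x2^2 - 1 <= 0.
f0 = lambda x: -(x[0] + x[1])
f1 = lambda x: x[0]**2 + x[1]**2 - 1
L = lambda x, a: f0(x) + a * f1(x)

p_star = -np.sqrt(2)  # known optimum of this example

# g(alpha) = min_x L(x, alpha), computed numerically for several alpha > 0.
for a in [0.3, 0.5, 1 / np.sqrt(2), 1.0, 2.0]:
    res = minimize(lambda x: L(x, a), x0=np.zeros(2))
    g = res.fun
    assert g <= p_star + 1e-6   # weak duality: g(alpha) <= p*
    print(f"alpha={a:.3f}  g(alpha)={g:.4f}  <=  p*={p_star:.4f}")
```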
Lagrange dual problem
For each α ≥ 0 the Lagrange dual function g(α) gives us a lower bound
on the optimal value p ∗ of the original optimization problem.
What is the best (highest) lower bound?
Lagrange dual problem
maximize g(α) = min_x L(x, α)
subject to αi ≥ 0,  i = 1, . . . , m .
The maximum d ∗ of the Lagrange dual problem is the best lower bound
on p ∗ that we can achieve by using the Lagrangian.
Dualities
Weak duality (always)
Since g(α) is a lower bound of p ∗ we have weak duality,
d ∗ ≤ p∗ .
The difference between the solution of the original and the dual problem,
p∗ − d ∗ ≥ 0 ,
is called the (optimal) duality gap.
Strong duality (under certain conditions)
Under certain conditions we have
d ∗ = p∗ ,
i.e. the solution to the Lagrange dual problem is a solution of the original
constrained optimization problem.
We then say that strong duality holds.
Constraint qualifications for convex problems
Consider the constrained optimization problem,
minimize f0 (x )
subject to fi (x ) ≤ 0 .
Slater’s constraint qualification
If f0, f1, . . . , fm are convex and there exists an x ∈ RN such that

fi(x) < 0,   i = 1, . . . , m ,

or the constraints are affine, that is

fi(x) = wiᵀ x + bi ≤ 0 ,
the duality gap is zero.
Proof.
See Boyd p. 234.
Dual solution
Problem:
Even when d∗ = p∗, we do not know if the minimizer
x̃ = arg minx L(x , α∗ ) of the dual problem equals the minimizer x ∗ of
the original problem.
So far the minimizer x̃ of the Lagrangian could differ from the minimizer
x ∗ of the original problem.
Dual solution
Let x ∗ be a minimizer of the primal problem and α∗ a maximizer of the
dual problem. If strong duality holds, then
L(x∗, α∗) = g(α∗) = min_x L(x, α∗) .
Proof.
Let x̃ be a minimizer of L(x, α∗) over x. We have

d∗ = g(α∗) = min_x L(x, α∗) = L(x̃, α∗)      (x̃ minimizer)
   = f0(x̃) + Σ_{i=1}^m αi∗ fi(x̃)            (Lagrangian)
   ≤ f0(x∗) + Σ_{i=1}^m αi∗ fi(x∗)           (x̃ minimizer over x)
   ≤ f0(x∗) = p∗                             (α∗ ≥ 0; x∗ feasible: fi(x∗) ≤ 0)

and since d∗ = p∗, we have g(α∗) = L(x̃, α∗) = L(x∗, α∗).
Dual solution
Corollary
Let x ∗ be a minimizer of the primal problem and α∗ a maximizer of the
dual problem. If strong duality holds and L(x , α) is convex in x , then
x∗ = arg min_x L(x, α∗) .
Recipe for solving constrained optimization problems
The constrained optimization problem,
minimize f0 (x )
subject to fi (x ) ≤ 0
i = 1, . . . , m ,
with f0, f1, . . . , fm convex can be solved as follows:
1. Calculate the Lagrangian

   L(x, α) = f0(x) + Σ_{i=1}^m αi fi(x) .

2. Obtain the Lagrange dual function by solving ∇x L(x, α) = 0 for the minimizer x∗(α) and substituting it back, g(α) = L(x∗(α), α).

3. Solve the dual problem

   maximize g(α)
   subject to αi ≥ 0,  i = 1, . . . , m .
The solution of this problem is also the solution of the original problem if
Slater’s condition is satisfied.
Example
minimize f0 (x ) = −(x1 + x2 )
subject to f1 (x ) = x12 + x22 − 1 ≤ 0
Example
minimize f0 (x ) = −(x1 + x2 )
subject to f1 (x ) = x12 + x22 − 1 ≤ 0
Write down the Lagrangian
L(x, α) = −x1 − x2 + α(x1² + x2² − 1) .

Calculate the derivative of L(x, α) w.r.t. x and set it to zero,

∇x L(x, α) = −(1, 1)ᵀ + 2αx = 0   ⇒   x1 = x2 = 1/(2α) .

Substituting this back into L(x, α) gives the dual function

g(α) = 1/(2α) − α − 1/α .
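This derivation can be reproduced symbolically. A sketch using sympy (declaring α positive so that dividing by it is safe):

```python
import sympy as sp

# Symbolic re-derivation of the slide's computation.
x1, x2, a = sp.symbols('x1 x2 alpha', positive=True)
L = -(x1 + x2) + a * (x1**2 + x2**2 - 1)

# Stationarity of the Lagrangian in x gives x1 = x2 = 1/(2 alpha):
sol = sp.solve([sp.diff(L, x1), sp.diff(L, x2)], [x1, x2])
print(sol)

# Substituting back yields the dual function g(alpha):
g = sp.simplify(L.subs(sol))
print(g)

# Agrees with g(alpha) = 1/(2 alpha) - alpha - 1/alpha from the slide:
assert sp.simplify(g - (1 / (2 * a) - a - 1 / a)) == 0
```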
Example
minimize f0 (x ) = −(x1 + x2 )
subject to f1 (x ) = x12 + x22 − 1 ≤ 0
g(α) = 1/(2α) − α − 1/α
We must now maximize the dual function g(α) subject to α ≥ 0.
Since g(α) is concave, we set its derivative to zero and solve for α.
dg/dα = 1/α² − 1 − 1/(2α²) = 0
⇒ α² = 1/2
⇒ α1,2 = ±1/√2
⇒ α∗ = 1/√2   since we require α ≥ 0
Example
minimize f0 (x ) = −(x1 + x2 )
subject to f1 (x ) = x12 + x22 − 1 ≤ 0
g(α) = 1/(2α) − α − 1/α ,   α∗ = 1/√2

The optimization problem is convex and Slater’s condition holds, therefore the minimal value of f0(x) is

p∗ = g(1/√2) = −√2 .

We calculate the minimizer x∗ by substituting α = 1/√2 into x1 = x2 = 1/(2α) and get

x1∗ = x2∗ = 1/√2 .
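The analytic solution can be cross-checked with a generic constrained solver. A sketch using scipy's SLSQP (note scipy's convention: an 'ineq' constraint means fun(x) ≥ 0, so we pass 1 − x1² − x2² ≥ 0, i.e. f1(x) ≤ 0):

```python
import numpy as np
from scipy.optimize import minimize

# Solve: minimize -(x1 + x2) subject to x1^2 + x2^2 <= 1.
res = minimize(
    lambda x: -(x[0] + x[1]),
    x0=np.zeros(2),
    constraints=[{'type': 'ineq', 'fun': lambda x: 1 - x[0]**2 - x[1]**2}],
)
print(res.x, res.fun)

# Matches the analytic solution p* = -sqrt(2), x* = (1/sqrt(2), 1/sqrt(2)):
assert np.allclose(res.x, [1 / np.sqrt(2), 1 / np.sqrt(2)], atol=1e-4)
assert abs(res.fun - (-np.sqrt(2))) < 1e-5
```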
Example
minimize f0 (x ) = −(x1 + x2 )
subject to f1 (x ) = x12 + x22 − 1 ≤ 0
g(α) = 1/(2α) − α − 1/α

[Figure: the feasible disc f1(x) ≤ 0 with ∇f1, −∇f0 and the constrained minimum.]

α∗ = 1/√2 ,   p∗ = −√2 ,   x1∗ = x2∗ = 1/√2
Properties of the solution of a
constrained optimization problem
What can we say about the solution?
Is there yet another method to solve a constrained optimization problem?
KKT conditions for convex problems
Consider the constrained optimization problem,
minimize f0 (x )
subject to fi (x ) ≤ 0 ,
with f0, f1, . . . , fm convex, and suppose strong duality holds.
KKT conditions
If and only if x∗, α∗ are points that satisfy the Karush-Kuhn-Tucker (KKT) conditions,

fi(x∗) ≤ 0           primal feasibility,
αi∗ ≥ 0              dual feasibility,
αi∗ fi(x∗) = 0       complementary slackness,
∇x L(x∗, α∗) = 0     x∗ minimizes Lagrangian,
for i = 1, . . . , m, then x ∗ and α∗ are optimal solutions of the constrained
optimization problem and the corresponding Lagrange dual problem.
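All four conditions can be verified at the analytic solution of the running circle example (x∗ = (1/√2, 1/√2), α∗ = 1/√2):

```python
import numpy as np

# Running example: f0(x) = -(x1 + x2), f1(x) = x1^2 + x2^2 - 1.
x = np.array([1.0, 1.0]) / np.sqrt(2)
alpha = 1 / np.sqrt(2)

f1 = x @ x - 1
grad_f0 = np.array([-1.0, -1.0])
grad_f1 = 2 * x

assert f1 <= 1e-12                                # primal feasibility
assert alpha >= 0                                 # dual feasibility
assert abs(alpha * f1) < 1e-12                    # complementary slackness
assert np.allclose(grad_f0 + alpha * grad_f1, 0)  # stationarity of the Lagrangian
print("all four KKT conditions hold")
```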
KKT conditions for convex problems
Proof.
x ∗ , α∗ optimal solutions ⇒ KKT conditions.
The problem is convex, so x∗ minimizes the Lagrangian L(x, α∗) and therefore

∇x L(x∗, α∗) = 0 .

By strong duality we have

L(x∗, α∗) = f0(x∗) + Σ_{i=1}^m αi∗ fi(x∗) = f0(x∗)

and with αi∗ ≥ 0 and fi(x∗) ≤ 0 it follows that

αi∗ fi(x∗) = 0 ,   i = 1, . . . , m .
KKT conditions for convex problems
Proof.
KKT conditions ⇒ x ∗ , α∗ optimal solutions.
Since α∗i ≥ 0 the Lagrangian L(x , α∗ ) is convex in x because it is a sum
of convex functions. From ∇x L(x ∗ , α∗ ) = 0 it follows that x ∗ is a
minimizer of L(x , α∗ ) and thus
g(α∗) = L(x∗, α∗) = f0(x∗) + Σ_{i=1}^m αi∗ fi(x∗) = f0(x∗)

since αi∗ fi(x∗) = 0.
Let x̃ be a minimizer of the primal problem. Since g(α∗ ) is a lower
bound of f0 (x ) for feasible x we have
f0 (x ∗ ) = g(α∗ ) ≤ f0 (x̃ ) ≤ f0 (x ∗ )
and thus
p ∗ = f0 (x̃ ) = f0 (x ∗ ) = g(α∗ ) .
Complementary slackness
The KKT condition
αi fi (x ∗ ) = 0
can only be satisfied if αi = 0 or fi (x ∗ ) = 0.
If the constraint fi is inactive (fi (x ∗ ) < 0), we have αi = 0.
αi > 0 is only possible if the constraint is active (fi (x ∗ ) = 0).
Explanation:
If the minimum is in the interior of the
feasible area (inactive constraint) we have
∇f0 (x ) = 0 .
[Figure: the feasible disc f1(x) ≤ 0 with ∇f1, −∇f0 and the constrained minimum.]
However if the minimum is on the boundary
(active constraint) we can only demand
−∇f0 (x ) = α1 ∇f1 (x )
with α1 > 0.
Complementary slackness: example
minimize f0 (x ) = −(x1 + x2 )
subject to f1 (x ) = x12 + x22 − 1 ≤ 0
Solution:  α∗ = 1/√2 ,  x1∗ = x2∗ = 1/√2 .

Because α∗ > 0 we know that constraint f1 is active, i.e.

f1(x∗) = (1/√2)² + (1/√2)² − 1 = 0 .

[Figure: the feasible disc f1(x) ≤ 0 with ∇f1, −∇f0 and the constrained minimum.]

f1 is actively limiting the achievable minimum, because if we moved further towards the minimum of f0 we would get f1(x) > 0.
Summary
• The Lagrange formalism reformulates a constrained optimization problem into a dual problem.
• The KKT conditions characterize the solution and sometimes allow us to obtain it without minimization.
• Convexity is necessary to guarantee optimality.
Constrained optimization with equality constraints
Given f0 : RN → R, fi : RN → R and hi : RN → R,
minimize f0 (x )
subject to fi (x ) ≤ 0,
i = 1, . . . , m,
hi (x ) = 0,
i = 1, . . . , p .
Lagrangian:

L(x, α, β) = f0(x) + Σ_{i=1}^m αi fi(x) + Σ_{i=1}^p βi hi(x)

with αi ≥ 0 for i = 1, . . . , m (no sign constraint on the βi’s).

Use the same solution method.
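A minimal sketch of the extended Lagrangian on a small equality-constrained problem of my own choosing (minimize x1² + x2² subject to x1 + x2 − 1 = 0; not from the slides):

```python
import sympy as sp

# Extended Lagrangian for: minimize x1^2 + x2^2 s.t. h(x) = x1 + x2 - 1 = 0.
# (An assumed illustrative problem, not taken from the slides.)
x1, x2, beta = sp.symbols('x1 x2 beta')
L = x1**2 + x2**2 + beta * (x1 + x2 - 1)

# Stationarity in x together with the equality constraint gives a square system.
sol = sp.solve([sp.diff(L, x1), sp.diff(L, x2), x1 + x2 - 1], [x1, x2, beta])
print(sol)

# The multiplier of the equality constraint comes out negative here,
# which is fine: no sign restriction applies to the betas.
assert sol[beta] == -1
assert sol[x1] == sp.Rational(1, 2) and sol[x2] == sp.Rational(1, 2)
```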
Literature
S. Boyd, L. Vandenberghe
Convex Optimization
Cambridge University Press 2004
http://www.stanford.edu/~boyd/cvxbook/