Lecture 10
Convex optimization
Kin Cheong Sou
Department of Mathematical Sciences
Chalmers University of Technology and Göteborg University
November 26, 2013
Convex sets and functions
Convex optimization
A set S ⊆ R^n is a convex set if

    x1, x2 ∈ S, λ ∈ (0, 1) =⇒ λx1 + (1 − λ)x2 ∈ S.

A function f : R^n → R is a convex function on the convex set S if

    x1, x2 ∈ S, λ ∈ (0, 1) =⇒ f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2).
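The defining inequality can be probed numerically. The sketch below (my illustration, not from the lecture) samples random pairs and convex combinations; finding one violation disproves convexity, while finding none is only evidence, not a proof.

```python
import numpy as np

def is_convex_sample(f, dim, n_trials=10000, seed=0):
    """Heuristic convexity check: sample random pairs x1, x2 and lambdas,
    and test f(l*x1 + (1-l)*x2) <= l*f(x1) + (1-l)*f(x2) up to tolerance."""
    rng = np.random.default_rng(seed)
    for _ in range(n_trials):
        x1, x2 = rng.normal(size=dim), rng.normal(size=dim)
        lam = rng.uniform()
        lhs = f(lam * x1 + (1 - lam) * x2)
        rhs = lam * f(x1) + (1 - lam) * f(x2)
        if lhs > rhs + 1e-9:
            return False   # found a violating pair: f is not convex
    return True            # no violation found (evidence only, not a proof)

print(is_convex_sample(lambda x: np.sum(x**2), 3))     # convex quadratic
print(is_convex_sample(lambda x: np.sin(x).sum(), 3))  # not convex
```

A `False` answer is conclusive; a `True` answer merely failed to find a counterexample.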
TMA947 – Lecture 10
Convex optimization
2 / 38
Convex optimization
A convex optimization problem is

    f* = infimum f(x), subject to x ∈ S,

where f : R^n → R is a convex function on S and S ⊆ R^n is a convex set. In standard form:

    minimize_x f(x)
    subject to g_i(x) ≤ 0,   i = 1, ..., m
               h_j(x) = 0,   j = 1, ..., k

◮ f is a convex function,
◮ g_i are convex functions, i = 1, ..., m,
◮ h_j are affine functions, j = 1, ..., k. Why not just convex h_j's?
Convex equality constraints are linear
◮ For convex h(x), the set {x | h(x) = 0} need not be convex!
◮ Consider h : R^2 → R, h(x) = ‖x‖₂ − 1.
◮ The set {x | h(x) = 0} = {x ∈ R^2 | ‖x‖₂ = 1} is the edge of a circle, not including the inside. Clearly not convex!
(Figure: the circle {x | h(x) = 0} versus the disc {x | h(x) ≤ 0}.)
◮ The only convex equality constraints are linear constraints.
Local minimum = global minimum
Consider the convex optimization problem

    minimize_x f(x), subject to x ∈ S.   (CP)

If x* is a local minimum of the convex optimization problem (CP), then x* is also a global minimum of (CP).
Proof: The idea is as follows:
◮ If x* is not a global minimum, then there exists y ∈ S with f(y) < f(x*).
◮ For any 0 < θ < 1, define z(θ) = θx* + (1 − θ)y. Then z(θ) ∈ S and f(z(θ)) ≤ θf(x*) + (1 − θ)f(y) < f(x*), by convexity of S and f.
◮ For θ close to 1, z(θ) is arbitrarily close to x*, so f(z(θ)) ≥ f(x*), as x* is a local minimum. Contradiction!
Algorithmic implication
◮ Most algorithms for constrained optimization problems find only KKT points.
◮ Examples include the gradient projection method, the penalty method, the interior point method, etc. (see Lectures 12 and 13).
◮ Without additional assumptions, KKT points need not be global minima.
◮ With convexity and Slater's constraint qualification, KKT points are global minima (see Theorem 5.49 and Corollary 5.51 in the text).
Certifying convexity
Let S ⊂ R^n and let f : S → R be a function whose convexity is to be decided.
◮ Often, the definition f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2), for x1, x2 ∈ S and 0 < λ < 1, is not the handiest test for convexity.
◮ First-order condition: if f ∈ C¹ and S is convex and open, then f is convex on S if and only if

    f(y) ≥ f(x) + ∇f(x)^T (y − x),   for all x, y ∈ S.

◮ Second-order condition: if f ∈ C² and S is convex and open, then f is convex on S if and only if ∇²f(x) is positive semidefinite:

    ∇²f(x) ⪰ 0,   for all x ∈ S.

◮ We can restrict f to a line: f is convex on S iff g(t) := f(x + tv) is convex for all x ∈ S, v ∈ R^n and t ∈ R such that x + tv ∈ S.
Certifying convexity, examples
◮ Show that f(x, y) = x²/y is convex on S = {(x, y) ∈ R² | y > 0}.
  ◮ S is a convex set, and f ∈ C². So f is convex because

      ∇²f(x, y) = (2/y³) [  y²  −xy ]  ⪰ 0.
                         [ −xy   x² ]

◮ Show that f(X) = log(det(X⁻¹)) is convex for X ≻ 0.
  ◮ S = {X ∈ R^{n×n} | X ≻ 0} is a convex set (check the definition!)
  ◮ Let V ∈ R^{n×n} be any symmetric matrix. Define

      g(t) := f(X + tV) = − log(det(X + tV))
            = − log(det(X)) − log(det(I + tX^{−1/2} V X^{−1/2}))
            = − log(det(X)) − Σ_i log(1 + t λ_i(X^{−1/2} V X^{−1/2})).

  − log(·) convex =⇒ g(t) convex for all X + tV ≻ 0 =⇒ f(X) is convex.
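The second-order condition in the first example can be sanity-checked numerically (my sketch, not part of the lecture): evaluate the Hessian at random points of S and verify positive semidefiniteness.

```python
import numpy as np

# Numerical companion to the second-order condition: for f(x, y) = x**2 / y
# on S = {y > 0}, the Hessian is (2/y**3) * [[y**2, -x*y], [-x*y, x**2]],
# which should be positive semidefinite at every point of S.
rng = np.random.default_rng(1)
for _ in range(1000):
    x = rng.normal()
    y = rng.uniform(0.1, 10.0)               # stay inside S = {y > 0}
    H = (2.0 / y**3) * np.array([[y * y, -x * y],
                                 [-x * y, x * x]])
    assert np.linalg.eigvalsh(H).min() >= -1e-8
print("Hessian PSD at all sampled points of S")
```

In fact the matrix equals (2/y³)(y, −x)(y, −x)^T, a nonnegative multiple of a rank-one PSD matrix, so its eigenvalues are 0 and (2/y³)(x² + y²) ≥ 0.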
Convexity preserving operations
We can construct convex functions (sets) from known convex functions (sets). For example,
◮ A nonnegative weighted sum of convex functions is a convex function.
◮ Affine function composition: if f(x) is convex, then g(y) := f(Ay + b) is also convex.
◮ If S is a convex set, then {Ax + b | x ∈ S} is convex. Also, {x | Ax + b ∈ S} is convex.
◮ Many more rules...
See the text (Chapter 3) and "Convex Optimization" by Boyd and Vandenberghe (available free from the authors' website).
Common convex problems
Some types of convex optimization problems are extensively studied, because of their applicability and the availability of specialized efficient algorithms.
◮ Linear programming (LP) (see Lecture 8)
◮ (Convex) quadratic programming (convex QP)
◮ Second-order cone programming (SOCP)
◮ Semidefinite programming (SDP)
◮ Geometric programming (GP)
◮ Sum-of-squares programming (SOSP)
◮ and more...
Quadratic programming (QP)
A (convex) quadratic program (convex QP) minimizes a convex quadratic objective function, subject to linear constraints:

    minimize_{x ∈ R^n} x^T Q x + p^T x
    subject to Ax ≤ b,

where Q ∈ R^{n×n} with Q ⪰ 0 is given, and p, A and b are given.
◮ Many applications: distance (and projection) to a polyhedron; subproblem in more general problems (sequential quadratic programming, Lecture 13).
◮ When Q ⪰ 0, many efficient algorithms are available (e.g., the active-set method, interior point method, conjugate gradient method, etc.).
◮ When Q ⋡ 0, the problem is NP-hard (no polynomial-time algorithm unless P = NP, which is considered unlikely).
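In the unconstrained case the convex QP has a closed-form solution, which makes a compact sanity check (my illustration; the constrained case requires one of the methods listed above):

```python
import numpy as np

# Sketch (illustration only): dropping the constraints, the convex QP
# minimize x^T Q x + p^T x with Q positive definite has a unique minimizer
# given by the stationarity condition 2*Q*x + p = 0.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])             # symmetric positive definite
p = np.array([-1.0, 2.0])

x_star = np.linalg.solve(2.0 * Q, -p)  # x* = -(2Q)^{-1} p
grad_norm = np.linalg.norm(2.0 * Q @ x_star + p)
print(x_star, grad_norm)               # gradient vanishes at the minimizer
```

Positive definiteness of Q guarantees the linear system is nonsingular and the stationary point is the global minimum.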
Second-order cone programming (SOCP)
◮ A second-order cone program (SOCP) minimizes a linear objective function, subject to second-order cone constraints:

    minimize_{x ∈ R^n} f^T x
    subject to ‖A_i x + b_i‖₂ ≤ c_i^T x + d_i,   i = 1, ..., m.

◮ Can model the (convex) quadratically-constrained quadratic program (QCQP)

    minimize_{x ∈ R^n} x^T Q₀ x + p₀^T x
    subject to x^T Q_i x + p_i^T x + r_i ≤ 0,   i = 1, ..., m,

where Q_i ⪰ 0 for i = 0, 1, ..., m. Thus, SOCP generalizes QP.
◮ Can model the hyperbolic constraint

    x^T x ≤ yz, y, z ≥ 0   ⇐⇒   ‖(2x, y − z)‖₂ ≤ y + z, y, z ≥ 0.
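The hyperbolic reformulation can be verified on random samples (my sketch): squaring the cone constraint gives 4x^Tx + (y − z)² ≤ (y + z)², which is exactly 4x^Tx ≤ 4yz.

```python
import numpy as np

# Numerical check of the hyperbolic-constraint reformulation used in SOCP:
#   x^T x <= y*z, y >= 0, z >= 0   <=>   ||(2x, y - z)||_2 <= y + z.
rng = np.random.default_rng(0)
for _ in range(50000):
    x = rng.normal(size=2)
    y, z = rng.normal(size=2)
    hyperbolic = (x @ x <= y * z) and (y >= 0) and (z >= 0)
    soc = np.linalg.norm(np.append(2 * x, y - z)) <= y + z
    assert hyperbolic == soc
print("reformulation agrees on all samples")
```

Note the cone constraint alone implies y, z ≥ 0: it forces y + z ≥ 0 and yz ≥ x^Tx ≥ 0.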
Semidefinite programming (SDP)
Semidefinite programming (SDP) usually comes in two forms:
◮ Standard form

    minimize_{X ∈ R^{n×n}} tr(CX)
    subject to tr(A_i X) = b_i,   i = 1, ..., m,
               X ⪰ 0,

where C, A₁, ..., A_m are symmetric matrices.
◮ Inequality form

    maximize_{y ∈ R^m} b^T y
    subject to A₁y₁ + ... + A_m y_m ⪯ C.

◮ SDP is also called linear matrix inequality (LMI) optimization.
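Checking feasibility of the inequality-form (LMI) constraint reduces to an eigenvalue computation, as in this small sketch (matrices chosen by me for illustration):

```python
import numpy as np

# Feasibility check for the inequality-form SDP constraint
# A1*y1 + ... + Am*ym <= C (in the PSD order): C - sum(yi * Ai) must be PSD.
def lmi_feasible(y, A_list, C, tol=1e-9):
    S = C - sum(yi * Ai for yi, Ai in zip(y, A_list))
    return bool(np.linalg.eigvalsh(S).min() >= -tol)

A1 = np.array([[1.0, 0.0], [0.0, 0.0]])
A2 = np.array([[0.0, 0.0], [0.0, 1.0]])
C = np.eye(2)
print(lmi_feasible([0.5, 0.5], [A1, A2], C))  # True:  C - 0.5*I is PSD
print(lmi_feasible([2.0, 0.0], [A1, A2], C))  # False: C - 2*A1 has eigenvalue -1
```

The same eigenvalue test underlies interior point methods for SDP, which need the slack matrix C − Σ y_i A_i to stay positive definite.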
Convex optimization, main messages
◮ A convex optimization problem minimizes a convex function over a convex feasible set.
◮ For a convex optimization problem, local minima are global minima.
  ◮ Many efficient local optimization solvers become global optimization solvers.
◮ Recognizing convexity is not trivial... but there are tools to help:
  ◮ Many equivalent conditions for convexity
  ◮ Many well-known convex optimization problem paradigms
  ◮ Many "tricks" to establish convexity based on convexity preserving operations
◮ Convex optimization is very powerful in modeling problems.
Convex problem, non-differentiable objective
For a convex problem with convex objective f and convex feasible set S:

    minimize_x f(x), subject to x ∈ S,

◮ Most algorithms assume some smoothness of f. For example, the gradient descent method

    x^{k+1} ← x^k − α_k ∇f(x^k)

requires that f is differentiable.
◮ For convex problems, we can relax the differentiability assumption because of the subgradient method, to be detailed:

    x^{k+1} ← x^k − α_k p^k,

where p^k is a subgradient of f at x^k. But what is a subgradient?
Subgradient
Definition
Let S ⊆ R^n be a nonempty convex set and let f : S → R be a convex function. Then p is called a subgradient of f at x̄ ∈ S if

    f(x) ≥ f(x̄) + p^T (x − x̄),   for any x ∈ S.

◮ We define the set of all subgradients of f at x̄, the subdifferential of f at x̄, as

    ∂f(x̄) = {p ∈ R^n | f(x) ≥ f(x̄) + p^T (x − x̄), for all x ∈ S}.

◮ So the subdifferential is the set of all subgradients; it may contain more than one element.
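A concrete one-dimensional illustration (my sketch): for f(x) = |x|, which is convex but not differentiable at 0, the subdifferential at 0 is the whole interval [−1, 1].

```python
import numpy as np

# Check the subgradient inequality f(x) >= f(xbar) + p*(x - xbar) on a grid.
# For f = abs and xbar = 0 it reads |x| >= p*x, which holds iff p in [-1, 1].
def is_subgradient(p, xbar, f, xs):
    return all(f(x) >= f(xbar) + p * (x - xbar) - 1e-12 for x in xs)

f = abs
xs = np.linspace(-5, 5, 1001)
print(is_subgradient(0.7, 0.0, f, xs))   # True:  0.7 lies in [-1, 1]
print(is_subgradient(1.3, 0.0, f, xs))   # False: 1.3 lies outside [-1, 1]
```

The grid test is only a finite check, but for this piecewise-linear f it matches the exact subdifferential.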
Subgradient, differentiable functions
Lemma
Let S be a nonempty convex set and f : S → R a convex function. Suppose that f is differentiable at x̄ ∈ int S, meaning that ∇f(x̄) exists. Then

    ∂f(x̄) = {∇f(x̄)}.

Figure: When f is differentiable, the graph of f lies above its tangent f(x̄) + ∇f(x̄)^T (x − x̄), and ∂f(x̄) = {∇f(x̄)}.
Subgradient, non-differentiable functions
When f is not differentiable at x̄, ∂f(x̄) may not be a singleton.
Figure: Example of three subgradients p1, p2, p3 of f at the point x̄, each defining a supporting line f(x̄) + p_i^T (x − x̄) below the graph of f.
Existence of subgradient
◮ Now we know what a subgradient is, but does one always exist?
◮ Yes, it is (almost) true that a subgradient exists when f is convex.
Theorem
Let S be a convex set and f : S → R be a convex function. For each x̄ ∈ int S, there always exists a vector p such that

    f(x) ≥ f(x̄) + p^T (x − x̄),   for any x ∈ S.

◮ The statement holds for all x̄ ∈ int S, but at the boundary of S something strange might happen... when f is not continuous.
◮ Why is the theorem true? We show it via a geometric approach. We need two concepts: the epigraph and the supporting hyperplane theorem.
Epigraph
Let S ⊆ R^n and f : S → R. The epigraph of f with respect to S is

    epi_S f := {(x, α) ∈ S × R | f(x) ≤ α}.

Figure: epi_S f is the region above the graph of f over S.
The graph of the function f (all points (x, f(x))) is in the boundary of epi_S f.
Epigraph, convex case
Theorem
Let S ⊆ R^n be a nonempty and convex set, and let f : S → R. Then f is convex if and only if epi_S f is a convex set.
Figure: for convex f on a convex S, epi_S f is a convex set.
Epigraph, convex case
Theorem
Let S ⊆ R^n be a nonempty and convex set, and let f : S → R. Then f is convex if and only if epi_S f is a convex set.
Proof: First assume f is convex. Let (x, α_x) and (y, α_y) be in epi_S f. Then, for 0 < θ < 1,

    θα_x + (1 − θ)α_y ≥ θf(x) + (1 − θ)f(y) ≥ f(θx + (1 − θ)y)

=⇒ θ(x, α_x) + (1 − θ)(y, α_y) ∈ epi_S f. Thus, epi_S f is a convex set.
Next, assume epi_S f is a convex set. For any x, y ∈ S, (x, f(x)) and (y, f(y)) are in epi_S f. Thus, for any 0 < θ < 1, it holds that

    θ(x, f(x)) + (1 − θ)(y, f(y)) ∈ epi_S f =⇒ θf(x) + (1 − θ)f(y) ≥ f(θx + (1 − θ)y).
Supporting hyperplane theorem
Theorem
Let C ⊆ R^n be a nonempty and convex set. Let x̄ be a point on the boundary of C. Then there exists a supporting hyperplane to C at x̄, meaning that there exists v ≠ 0_n such that

    v^T (x − x̄) ≤ 0,   for all x ∈ C.

Figure: the unit disc C is supported at x̄ = (1/√2)(1, 1)^T by the hyperplane v^T (x − x̄) = 0 with normal v = (1, 1)^T.
Subgradient by supporting hyperplane, I
◮ For x̄ ∈ S, (x̄, f(x̄)) is a point on the boundary of epi_S f (convex).
◮ Thus, there exists v such that v^T (x − x̄, z − f(x̄)) ≤ 0 for all (x, z) ∈ epi_S f.
Figure: a supporting hyperplane v^T (x − x̄, z − f(x̄)) = 0 of epi_S f at the boundary point (x̄, f(x̄)).
◮ Only when the hyperplane is "non-vertical" does v define a subgradient at x̄!
Subgradient by supporting hyperplane, II
◮ Applying the supporting hyperplane theorem to epi_S f at (x̄, f(x̄)) yields

    v^T (x − x̄, z − f(x̄)) ≤ 0,   for all (x, z) ∈ epi_S f.

◮ Write v = (u, t) ∈ R^n × R. For all (x, z) ∈ epi_S f,

    u^T (x − x̄) + t(z − f(x̄)) ≤ 0 =⇒ t ≤ 0 (otherwise the LHS → ∞ as z → ∞).

◮ If t < 0 (i.e., the hyperplane is non-vertical), replacing z with f(x) yields

    u^T (x − x̄) + t(f(x) − f(x̄)) ≤ 0 =⇒ f(x) ≥ f(x̄) − (u/t)^T (x − x̄) =⇒ −u/t ∈ ∂f(x̄) (i.e., −u/t is a subgradient).

◮ When will t < 0? If t = 0, then u^T (x − x̄) ≤ 0 for all x ∈ S. This is impossible when x̄ ∈ int S (v ≠ 0 forces u ≠ 0, and x = x̄ + εu ∈ S for small ε > 0 gives u^T (x − x̄) = ε‖u‖² > 0).
Subgradient, summary
◮ Now we know a subgradient exists for convex f over int S. But...
◮ How do we find a subgradient when we know at least one exists?
  ◮ We will see one case (later in this lecture) when dealing with the Lagrangian dual function.
◮ What is the use of a subgradient? We use it to define...
  ◮ Optimality conditions
  ◮ The subgradient method
for convex optimization problems with non-differentiable objective functions.
Optimality conditions with subgradients
Proposition (optimality of a convex function over R^n)
Let f : R^n → R be a convex function. The following are equivalent:
1. f is globally minimized at x* ∈ R^n;
2. 0_n ∈ ∂f(x*).
Figure: at a global minimizer x̄, the horizontal line f(x̄) + 0^T (x − x̄) lies below the graph of f, so 0_n ∈ ∂f(x̄).
◮ N.B. A more complete theorem is available in the text (Proposition 6.19)!
Subgradient methods
We want to define an optimization method based on subgradients.
◮ Since we are solving a minimization problem, we would like to move in the negative subgradient direction −p (i.e., x^{k+1} ← x^k − α_k p^k).
◮ But note: −p need not be a descent direction. (See the figure.)
Figure: level curves of f(x) = |x1| + 2|x2| and a subgradient p whose negative is not a descent direction.
◮ It can, however, move us "towards" the optimal solution.
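This behavior can be demonstrated numerically on the figure's function (my sketch; the point and subgradient are chosen for illustration):

```python
import numpy as np

# f(x) = |x1| + 2|x2| is minimized at the origin. At xbar = (1, 0),
# p = (1, 2) is a valid subgradient: f(x) >= f(xbar) + p^T (x - xbar)
# reduces to |x1| + 2|x2| >= x1 + 2*x2, which always holds. Yet -p is
# NOT a descent direction -- the step still moves us closer to x* = 0.
f = lambda x: abs(x[0]) + 2 * abs(x[1])
xbar = np.array([1.0, 0.0])
p = np.array([1.0, 2.0])
xnew = xbar - 0.1 * p                                # one step along -p

print(f(xnew) > f(xbar))                             # True: f increased
print(np.linalg.norm(xnew) < np.linalg.norm(xbar))   # True: closer to x*
```

So a subgradient step can increase the objective while still making progress in distance to the optimum, which is exactly why the method tracks f_best.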
Subgradient methods
A subgradient direction p defines a "cutting plane": for all x ∈ S with f(x) ≤ f(x̄),

    f(x) ≥ f(x̄) + p^T (x − x̄) =⇒ p^T x ≤ p^T x̄

(i.e., the halfspace containing x*).
Figure: iterates x1, x2, x3, x4 with steps along −p gradually "carve" the optimal solution set out of the feasible set.
Subgradient methods
So now we can define a subgradient method.
Subgradient method (unconstrained)
Step 0: Initiate x^0, f^0_best = f(x^0). k := 0.
Step 1: Find a subgradient p^k of f at x^k.
Step 2: Update x^{k+1} = x^k − α_k p^k (α_k is the step length in iteration k).
Step 3: Let f^{k+1}_best = min{f^k_best, f(x^{k+1})}.
Step 4: If some termination criterion is fulfilled, stop. Otherwise, let k := k + 1 and go to Step 1.
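The steps above can be sketched in a few lines, here run on the non-differentiable convex function f(x) = |x1| + 2|x2| with a diminishing step (test problem and step rule are my choices for illustration):

```python
import numpy as np

# Unconstrained subgradient method on f(x) = |x1| + 2|x2| (minimum 0 at
# the origin). np.sign picks one valid subgradient at the kinks (sign(0)=0
# is also a legitimate choice, since 0 lies in the subdifferential there).
def f(x):
    return abs(x[0]) + 2 * abs(x[1])

def subgrad(x):
    return np.array([np.sign(x[0]), 2 * np.sign(x[1])])

x = np.array([3.0, -2.0])
f_best = f(x)
for k in range(1, 5001):
    p = subgrad(x)               # Step 1: a subgradient at x^k
    x = x - (1.0 / k) * p        # Step 2: diminishing step alpha_k = 1/k
    f_best = min(f_best, f(x))   # Step 3: track the best value so far
print(f_best)                    # close to the optimal value 0
```

Because individual iterates oscillate around the kink, f(x^k) is not monotone; the f_best bookkeeping of Step 3 is what converges.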
Subgradient methods
A simple extension if we consider minimizing f over the convex set S.
Subgradient method (over a convex set)
Step 0: Initiate x^0 ∈ S, f^0_best = f(x^0). k := 0.
Step 1: Find a subgradient p^k of f at x^k.
Step 2: Update x^{k+1} = Proj_S(x^k − α_k p^k).
Step 3: Let f^{k+1}_best = min{f^k_best, f(x^{k+1})}.
Step 4: If some termination criterion is fulfilled, stop. Otherwise, let k := k + 1 and go to Step 1.
Step length rules
Examples of step size rules:
◮ Constant step size: α_k = α.
◮ Square summable but not summable:

    Σ_{k=0}^∞ α_k² < ∞,   Σ_{k=0}^∞ α_k = ∞.

For example: α_k = a/(b + ck).
Depending on the step size rule, different convergence results can be shown.
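The practical difference between the two rules shows up already in one dimension (a sketch on f(x) = |x|, my test problem; the constants a = b = c = 1 are arbitrary):

```python
import numpy as np

# Effect of the step-size rule on f(x) = |x| with subgradient sign(x):
# a constant step stalls at accuracy of order alpha, while the
# square-summable-but-not-summable rule a/(b + c*k) drives f_best to 0.
def run(step, iters=2000, x0=5.3):
    x, best = x0, abs(x0)
    for k in range(iters):
        x = x - step(k) * np.sign(x)
        best = min(best, abs(x))
    return best

best_const = run(lambda k: 0.5)               # constant alpha = 0.5
best_dimin = run(lambda k: 1.0 / (1.0 + k))   # a = b = c = 1
print(best_const, best_dimin)
```

With the constant step the iterates end up bouncing across the minimum with a fixed amplitude; the diminishing rule shrinks the oscillation and f_best keeps improving.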
Subgradient methods, summary
◮ Now we know subgradient methods can solve convex optimization problems with non-differentiable objective functions. But...
◮ What are typical problems with non-differentiable objectives?
◮ To use subgradient methods, we need to find subgradients. Are they easy to find?
It turns out that the Lagrangian dual problem is naturally suited for subgradient methods.
The Lagrangian dual
Consider the Lagrangian relaxation of the problem to find

    f* = infimum f(x), subject to g(x) ≤ 0, x ∈ X.

We first construct the Lagrangian function

    L(x, µ) = f(x) + µ^T g(x),

and define the dual function as

    q(µ) = infimum_{x ∈ X} L(x, µ).
The Lagrangian dual
We then define the Lagrangian dual problem, which is to find

    q* = supremum q(µ), subject to µ ≥ 0.

◮ As we have shown, the dual function q is always concave, so the dual problem is a convex problem.
◮ q is, however, not differentiable in general.
◮ Therefore, subgradient methods are often utilized to solve the dual problem.
Subgradients for the dual
But how do we find subgradients to q at a point µ?
◮ In order to evaluate q(µ), we need to solve the problem

    q(µ) = infimum_{x ∈ X} L(x, µ) = infimum_{x ∈ X} f(x) + µ^T g(x).

Let the solution set of this problem be

    X(µ) = argmin_{x ∈ X} L(x, µ).

◮ If we take any x ∈ X(µ), then g(x) will be a subgradient. (We will show this soon.)
◮ So when evaluating the dual function q at the point µ, we obtain a subgradient to q.
Subgradients for the dual
Proposition
Assume that X is nonempty and compact. Then the following hold.
a) Let µ ∈ R^m. If x ∈ X(µ), then g(x) is a subgradient to q at µ, that is, g(x) ∈ ∂q(µ).
b) Let µ ∈ R^m. Then

    ∂q(µ) = conv{g(x) | x ∈ X(µ)}.

Proof: See Proposition 6.20 in the text.
Subgradients for the dual
Subgradient method for the Lagrangian dual problem
Step 0: Initiate µ^0, q^0_best = q(µ^0), k := 0.
Step 1: Solve the problem

    q(µ^k) = infimum_{x ∈ X} L(x, µ^k).

Let the solution to the problem be x^k; g(x^k) is then a subgradient to q at µ^k.
Step 2: Update µ^{k+1} = [µ^k + α_k g(x^k)]_+ (project onto the nonnegative orthant).
Step 3: Let q^{k+1}_best = max{q^k_best, q(µ^{k+1})}.
Step 4: If some termination criterion is fulfilled, stop. Otherwise, let k := k + 1 and go to Step 1.
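The method can be sketched on a toy problem where the inner infimum is available in closed form (the problem, the set X, and the step rule are my choices for illustration): minimize f(x) = x² subject to g(x) = 1 − x ≤ 0 over X = [−3, 3]. Then L(x, µ) = x² + µ(1 − x) is minimized over X at x = clip(µ/2), the optimum is x* = 1 with f* = q* = 1, and the dual optimum is µ* = 2.

```python
import numpy as np

# Dual subgradient method on: minimize x^2 s.t. 1 - x <= 0, X = [-3, 3].
# L(x, mu) = x^2 + mu*(1 - x) has its minimizer over X at clip(mu/2),
# so evaluating q(mu) also hands us the subgradient g(x) = 1 - x.
def x_of_mu(mu):
    return np.clip(mu / 2.0, -3.0, 3.0)   # argmin of L(., mu) over X

def q(mu):
    x = x_of_mu(mu)
    return x * x + mu * (1.0 - x)

mu, q_best = 0.0, q(0.0)
for k in range(1, 2001):
    x = x_of_mu(mu)                       # Step 1: evaluate the dual
    g = 1.0 - x                           # subgradient of q at mu
    mu = max(mu + (1.0 / k) * g, 0.0)     # Step 2: ascent step + projection
    q_best = max(q_best, q(mu))           # Step 3: track the best dual value
print(mu, q_best)                         # approach mu* = 2, q* = 1
```

Note the ascent direction: since q is concave and we maximize, the update moves along +g(x^k), with the projection keeping µ ≥ 0 as in Step 2.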