Convex Optimization

ANDREW TULLOCH

TRINITY COLLEGE
THE UNIVERSITY OF CAMBRIDGE
Contents

1 Introduction
  1.1 Setup
  1.2 Convexity
2 Convexity
3 Cones and Generalized Inequalities
4 Conjugate Functions
  4.1 The Legendre-Fenchel Transform
  4.2 Duality Correspondences
5 Duality in Optimization
6 First-order Methods
7 Interior Point Methods
8 Support Vector Machines
  8.1 Machine Learning
  8.2 Linear Classifiers
  8.3 Kernel Trick
9 Total Variation
  9.1 Meyer's G-norm
  9.2 Non-local Regularization
    9.2.1 How to choose w(x, y)?
10 Relaxation
  10.1 Mumford-Shah model
11 Bibliography
1 Introduction

1.1 Setup
A setup consists of a function f : U → Ω (the energy), an input image I, and a reconstructed candidate image u. We seek the minimizer of the problem

min_{u ∈ U} f(u, I)    (1.1)

where f is typically of the form

f(u, I) = l(u, I) + r(u)    (1.2)

with l the cost (data) term and r the regulariser.
Example 1.1 (Rudin-Osher-Fatemi).

min_u ½ ∫_Ω (u − I)² dx + ∫_Ω ‖∇u‖ dx    (1.3)

• Will reduce contrast
• Will not introduce new jumps

Example 1.2 (Total variation (TV) L1).

min_u ½ ∫_Ω |u − I| dx + λ ∫_Ω ‖∇u‖ dx    (1.4)

• In general does not cause contrast loss.
• Can show that if I = B_r(0), then

u = I if r ≥ λ/2, and u = c if r < λ/2.    (1.5)
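The threshold behaviour in (1.5) can be checked numerically. Below is a minimal sketch (not from the notes; grid size, radius r and λ are illustrative assumptions) comparing the discrete TV-L1 energy (1.4) of keeping a disc-shaped image against flattening it to a constant.

```python
# Hedged sketch: discrete TV-L1 energy (1.4) for a disc image I = B_r(0).
# Grid size, radius r and lambda below are illustrative choices.
import numpy as np

def tv(u):
    # isotropic total variation via forward differences
    gx = np.diff(u, axis=0, append=u[-1:, :])
    gy = np.diff(u, axis=1, append=u[:, -1:])
    return np.sqrt(gx**2 + gy**2).sum()

def tv_l1_energy(u, I, lam):
    return 0.5 * np.abs(u - I).sum() + lam * tv(u)

n = 128
xs = np.arange(n) - n / 2
X, Y = np.meshgrid(xs, xs, indexing="ij")

lam = 5.0
for r in [5, 60]:                                    # small vs. large disc
    I = (X**2 + Y**2 <= r**2).astype(float)
    e_keep = tv_l1_energy(I, I, lam)                 # u = I (keep the disc)
    e_flat = tv_l1_energy(np.zeros_like(I), I, lam)  # u = constant
    print(r, "keep:", round(e_keep, 1), "flat:", round(e_flat, 1))
```

For the small disc the constant image has lower energy (the disc is removed), while the large disc is kept, matching the case distinction in (1.5) up to the choice of constants.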
Example 1.3 (Inpainting).

min_u ½ ∫_{Ω\A} |u − I| dx + λ ∫_Ω ‖∇u‖ dx    (1.6)

1.2 Convexity
Theorem 1.4. If a function f is convex, every local optimizer is a global optimizer.

Theorem 1.5. If a function f is convex, it can (very often) be efficiently optimized (in time polynomial in the number of bits of the input).
In computer vision, problems are:
• Usually large-scale (10^5–10^7 variables)
• Usually non-differentiable
Definition 1.6. f is lower semicontinuous at x_0 if

f(x_0) ≤ lim inf_{x → x_0} f(x) = min{α ∈ R̄ | (x^k) → x_0, f(x^k) → α}    (1.7)
Theorem 1.7. Let f : R^n → R̄. Then the following are equivalent:
(i) f is lower semicontinuous,
(ii) epi f is closed in R^n × R,
(iii) lev_{≤α} f = {x ∈ R^n | f(x) ≤ α} is closed for all α ∈ R̄.
Proof. ((i) ⇒ (ii)) Take (x^k, α^k) ∈ epi f with (x^k, α^k) → (x, α) ∈ R^n × R. Then by lower semicontinuity of f, f(x) ≤ lim inf_{k→∞} f(x^k) ≤ lim inf_{k→∞} α^k = α, so (x, α) ∈ epi f.

((ii) ⇒ (iii)) epi f closed implies epi f ∩ (R^n × {α_0}) is closed for all α_0 ∈ R, which holds if and only if

{x ∈ R^n | f(x) ≤ α_0}    (1.8)

is closed for all α_0 ∈ R. If α_0 = +∞, lev_{≤+∞} f = R^n. If α_0 = −∞, lev_{≤−∞} f = ∩_{k∈N} lev_{≤−k} f, an intersection of closed sets.

((iii) ⇒ (i)) For x^j → x_0, take a subsequence x^k → x_0 with

f(x^k) → lim inf_{x → x_0} f(x) = c.    (1.9)

If c ∈ R, then for every ε > 0 there exists K_0 = K(ε) such that f(x^k) ≤ c + ε for all k > K_0. Then x^k ∈ lev_{≤c+ε} f, so x_0 ∈ lev_{≤c+ε} f. As ε was arbitrary, x_0 ∈ lev_{≤c} f, so f(x_0) ≤ c = lim inf_{x→x_0} f(x). If c = +∞ we are done, since f(x_0) ≤ +∞ = lim inf. If c = −∞, the same argument applies with f(x^k) ≤ −k for k ∈ N.
Example 1.8. f ( x ) = x is lower semi-continuous, but does not have a
minimizer.
Definition 1.9. f : Rn → R̄ is level-bounded if and only if lev≤α f is
bounded for all α ∈ R.
This is also known as coercivity.
Example 1.10. f(x) = x² is level-bounded. f(x) = 1/|x| is not level-bounded.
Theorem 1.11. f : R^n → R̄ is level-bounded if and only if f(x^k) → ∞ whenever ‖x^k‖ → ∞.
Proof. Exercise.
Theorem 1.12. Let f : R^n → R̄ be lower semicontinuous, level-bounded, and proper. Then inf_x f(x) ∈ (−∞, +∞) and arg min f = {x | f(x) ≤ inf f} is nonempty and compact.
Proof.

arg min f = {x | f(x) ≤ inf_x f(x)}    (1.10)
          = {x | f(x) ≤ inf f + 1/k, ∀k ∈ N}    (1.11)
          = ∩_{k∈N} lev_{≤ inf f + 1/k} f    (1.12)

If inf f is −∞, just replace inf f + 1/k by α_k with α_k > −∞, e.g. α_k = −k, k ∈ N.

These level sets are bounded (by level-boundedness), closed (since f is lower semicontinuous), and non-empty (since 1/k > 0). Hence they are compact and nested, so their intersection, and thus arg min f, is non-empty and compact.

We need to show that inf f ≠ −∞. If inf f = −∞, then there would exist x ∈ arg min f with f(x) = −∞. Since f is proper, no such x exists. Thus inf f ≠ −∞.
Remark 1.13. For Theorem 1.12, it suffices to have lev≤α f bounded and
non-empty for at least one α ∈ R.
Proposition 1.14. We have the following properties for lower semicontinuity.
(i) If f, g are lower semicontinuous, then f + g is lower semicontinuous.
(ii) If f is lower semicontinuous and λ ≥ 0, then λf is lower semicontinuous.
(iii) If f : R^n → R̄ is lower semicontinuous and g : R^m → R^n is continuous, then f ◦ g is lower semicontinuous.
2 Convexity
Definition 2.1. (i) f : Rn → R̄ is convex if
f ((1 − τ ) x + τy) ≤ (1 − τ ) f ( x ) + τ f (y)
(2.1)
for all x, y ∈ Rn , τ ∈ (0, 1).
(ii) A set C ⊆ Rn is convex if and only if I(C ) is convex.
(iii) f is strictly convex if and only if (2.1) holds strictly whenever
x 6= y and f ( x ), f (y) ∈ R.
Remark 2.2. C ⊆ Rn is convex if and only if for all x, y ∈ C, the connecting line segment is contained in C.
Exercise 2.3. Show that

{x | a^T x + b ≥ 0}    (2.2)

is convex, and that

f(x) = a^T x + b    (2.3)

is convex.
Definition 2.4. Let x_0, . . . , x_m ∈ R^n and λ_0, . . . , λ_m ≥ 0 with ∑_{i=0}^m λ_i = 1. Then ∑_i λ_i x_i is a convex combination of the x_i.
Theorem 2.5. f : R^n → R̄ is convex if and only if

f(∑_{i=1}^n λ_i x_i) ≤ ∑_{i=1}^n λ_i f(x_i)    (2.4)

for all convex combinations. C ⊆ R^n is convex if and only if C contains all convex combinations of its elements.
Proof. Write

∑_{i=1}^m λ_i x_i = λ_m x_m + (1 − λ_m) ∑_{i=1}^{m−1} (λ_i / (1 − λ_m)) x_i,    (2.5)

which is a convex combination of two points, and proceed by induction on m.
The set version follows by applying this to I(C).
Proposition 2.6. f : Rn → R̄ is convex implies that the domain of f is
convex.
Proposition 2.7. f : Rn → R̄ is convex if and only if epi f is convex in
Rn × R if and only if
{( x, α) ∈ Rn × R| f ( x ) < α}
(2.6)
is convex in Rn × R.
Proof. epi f is convex if and only if for all τ ∈ (0, 1) and all (x, α), (y, β) ∈ epi f (i.e. α ≥ f(x), β ≥ f(y)) we have

f((1 − τ)x + τy) ≤ (1 − τ)α + τβ    (2.7)
⟺ ∀τ ∈ (0, 1), ∀x, y : f((1 − τ)x + τy) ≤ (1 − τ)f(x) + τf(y)    (2.8)
⟺ f is convex.    (2.9)
Proposition 2.8. f : Rn → R̄ is convex implies lev≤α f is convex for all
α ∈ R.
Proof. For x, y ∈ lev_{≤α} f,

f((1 − τ)x + τy) ≤ (1 − τ)f(x) + τf(y) ≤ α,

which implies lev_{≤α} f is convex. If α = ∞ then lev_{≤α} f = R^n, which is convex.
Theorem 2.9. f : Rn → R̄ is convex. Then
(i) arg min f is convex
(ii) x is a local minimizer of f implies x is a global minimizer of f
(iii) f is strictly convex and proper implies f has at most one global minimizer.
Proof. (i) If f ≡ ∞ then arg min f = ∅, which is convex. If f ≢ ∞ then arg min f = lev_{≤ inf f} f, which is convex by the previous proposition.

(ii) Assume x is a local minimizer and there exists y with f(y) < f(x). Then

f((1 − τ)x + τy) ≤ (1 − τ)f(x) + τf(y) < f(x),

since f(y) < f(x). Taking τ → 0 shows that x cannot be a local minimizer, and thus no such y exists.

(iii) Assume x ≠ y both minimize f, which implies f(x) = f(y). Then by strict convexity

f(½x + ½y) < ½f(x) + ½f(y) = f(x) = f(y),

contradicting that x, y are global minimizers.
Proposition 2.10. To construct convex functions:

(i) Let f_i, i ∈ I, be convex. Then

f(x) = sup_{i∈I} f_i(x)    (2.10)

is convex.

(ii) Let f_i, i ∈ I, be strictly convex with I finite. Then

sup_{i∈I} f_i    (2.11)

is strictly convex.

(iii) If C_i, i ∈ I, are convex sets, then

∩_{i∈I} C_i    (2.12)

is convex.

(iv) If f_k, k ∈ N, are convex, then

lim sup_{k→∞} f_k(x)    (2.13)

is convex.
Proof. Exercise.
Example 2.11. (i) C, D convex does not imply C ∪ D is convex (e.g. disjoint sets).
(ii) f : R^n → R convex and C convex implies f + I(C) is convex.
(iii) f(x) = |x| = max{x, −x} is convex.
(iv) f(x) = ‖x‖_p, p ≥ 1, is convex, as

‖·‖_p = sup_{‖y‖_q ≤ 1} ⟨·, y⟩,  1/p + 1/q = 1.    (2.14)
Theorem 2.12. Let C ⊆ Rn be open and convex. Let f : C → R be
differentiable. Then the following are equivalent:
(i) f is [strictly] convex
(ii) ⟨y − x, ∇f(y) − ∇f(x)⟩ ≥ 0 for all x, y ∈ C [with > 0 for x ≠ y]
(iii) f(x) + ⟨∇f(x), y − x⟩ ≤ f(y) for all x, y ∈ C [with < for x ≠ y]
(iv) If f is additionally twice differentiable: ∇²f(x) is positive semidefinite for all x ∈ C.

(ii) is monotonicity of ∇f.

Proof. Exercise (reduce to n = 1, then extend using the fact that a function is convex on R^n iff it is convex along every line).
Remark 2.13. Note that the converse of the strict version of (iv) does not hold: f(x) = x⁴ is strictly convex but f''(0) = 0.
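As a quick illustration (a sketch with an assumed test function, not from the notes), the monotonicity condition (ii) can be checked numerically for f(x) = x⁴, and the failure of strict positive definiteness at 0 observed directly.

```python
# Hedged sketch: check the gradient monotonicity condition (Theorem 2.12 (ii))
# for f(x) = x^4 on random pairs, and evaluate f'' at 0 (Remark 2.13).
import numpy as np

f = lambda x: x**4
grad = lambda x: 4 * x**3
hess = lambda x: 12 * x**2

rng = np.random.default_rng(0)
pairs = rng.uniform(-2, 2, size=(1000, 2))
mono = [(y - x) * (grad(y) - grad(x)) for x, y in pairs]
print("monotone gradient:", min(mono) >= 0)   # condition (ii) holds
print("f''(0) =", hess(0.0))                  # Hessian vanishes at 0
```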
Proposition 2.14. The following results hold for convex functions.

(i) If f_1, . . . , f_m : R^n → R̄ are convex and λ_1, . . . , λ_m ≥ 0, then

f = ∑_i λ_i f_i    (2.15)

is convex, and strictly convex if there exists i such that λ_i > 0 and f_i is strictly convex.

(ii) f_i : R^n → R convex implies f(x_1, . . . , x_m) = ∑_i f_i(x_i) is convex, and strictly convex if all f_i are strictly convex.

(iii) f : R^n → R̄ convex, A ∈ R^{n×n}, b ∈ R^n implies g(x) = f(Ax + b) is convex.
Remark 2.15. The operator norm

‖M‖ = sup_{‖x‖ ≤ 1} ‖Mx‖

is convex in M.
Proposition 2.16. (i) C_1, . . . , C_m convex implies C_1 × · · · × C_m is convex.
(ii) C ⊆ R^n convex, A ∈ R^{m×n}, b ∈ R^m implies L(C) is convex, with L(x) = Ax + b.
(iii) C ⊆ R^m convex, ... ⇒ L^*(C) is convex.
(iv) f, g convex implies f + g is convex.
(v) f convex, λ ≥ 0 implies λf is convex.
Definition 2.17. For S ⊆ R^n and y ∈ R^n, define the projection of y onto S as

Π_S(y) = arg min_{x ∈ S} ‖x − y‖_2    (2.16)
Proposition 2.18. If C ⊆ R^n is convex, closed, and non-empty, then Π_C is single-valued, that is, the projection is unique.

Proof. Write

Π_C(y) = arg min_x ½‖x − y‖_2² + I_C(x) =: arg min_x f(x)    (2.17)

To show uniqueness: ½‖x − y‖² is strictly convex, since ∇²(½‖x − y‖²) = I > 0, so f is strictly convex. f is proper (as C ≠ ∅), and thus f has at most one minimizer.

To show existence: f is proper and lower semicontinuous (the first term is continuous, the second is lsc since C is closed). It is level-bounded, as ‖x − y‖_2² → ∞ when ‖x‖_2 → ∞ and I_C ≥ 0. Thus arg min f ≠ ∅.
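For concreteness, here is a hedged sketch (sets and data chosen purely for illustration) of two projections with closed forms, together with a numerical check that no other feasible point is closer, consistent with uniqueness.

```python
# Hedged sketch: Pi_C for two closed convex sets with closed-form projections
# (the unit l2 ball and a box); the sampling check below is only illustrative.
import numpy as np

def proj_ball(y, radius=1.0):
    nrm = np.linalg.norm(y)
    return y if nrm <= radius else radius * y / nrm

def proj_box(y, lo=-1.0, hi=1.0):
    return np.clip(y, lo, hi)

rng = np.random.default_rng(1)
y = 3.0 * rng.normal(size=5)
p = proj_ball(y)

# no randomly sampled feasible point is closer to y than the projection
cand = rng.normal(size=(10000, 5))
cand /= np.maximum(1.0, np.linalg.norm(cand, axis=1, keepdims=True))
print(np.linalg.norm(y - p) <= np.linalg.norm(y - cand, axis=1).min() + 1e-12)
```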
Definition 2.19. Let S ⊆ R^n be arbitrary. Then

con S = ∩_{C convex, C ⊇ S} C    (2.18)

is the convex hull of S.
Theorem 2.21. Let S ⊆ R^n. Then

con S = {∑_{i=0}^n λ_i x_i | x_i ∈ S, λ_i ≥ 0, ∑_{i=0}^n λ_i = 1} =: D    (2.19)

Proof. D is convex and contains S: if x, y ∈ D, then (1 − τ)x + τy ∈ D. Thus con S ⊆ D.
From a previous theorem, con S convex implies con S contains all convex combinations of points in S, thus con S ⊇ D.
Thus con S = D.
Definition 2.22. For a set C ⊆ Rn , define cl C as
cl C = { x ∈ Rn |for all open neighborhoods N of x, N ∩ C 6= ∅}
(2.20)
int C as
int C = { x ∈ Rn |there exists an open neighborhood N of x with N ⊆ C }
(2.21)
∂C (the boundary) as
∂C = cl C \ int C
(2.22)
Remark 2.23.

cl G = ∩_{S closed, S ⊇ G} S    (2.23)
3 Cones and Generalized Inequalities
Definition 3.1. K ⊆ Rn is a cone if and only if 0 ∈ K and λx ∈ K for
all x ∈ K, λ ≥ 0.
Definition 3.2. A cone K is pointed if and only if, for all x_1, . . . , x_n ∈ K,

x_1 + · · · + x_n = 0 ⟺ x_1 = · · · = x_n = 0    (3.1)
Example 3.3. (i) R^n is a cone, but (for n ≥ 1) it is not pointed.
(ii) {( x1 , x2 ) ∈ R2 | x2 > 0} is not a cone.
(iii) {( x1 , x2 ) ∈ R2 | x2 ≥ 0} is a cone, but is not pointed.
(iv) { x ∈ Rn | xi ≥ 0, i ∈ 1, . . . , n} is a pointed cone.
Proposition 3.4. K ⊆ Rn be any set. Then the following are equivalent:
(i) K is a convex cone.
(ii) K is a cone and K + K ⊆ K.
(iii) K ≠ ∅ and ∑_{i=0}^n α_i x_i ∈ K for all x_i ∈ K, α_i ≥ 0.
Proof. Exercise.
Proposition 3.5. If K is a convex cone, then K is pointed if and only if
K ∩ (−K ) = {0}.
Proof. Exercise.
Definition 3.6.
∂ f ( x ) = {v ∈ Rn | f ( x ) + hv, y − x i ≤ f (y)∀y}
(3.2)
f'(x) = lim_{t→0} (f(x + t) − f(x)) / t    (3.3)
Proposition 3.7. Let f , g : Rn → R̄ be convex. Then
(i) f differentiable at x implies ∂ f ( x ) = {∇ f ( x )}.
(ii) f differentiable at x, g( x ) ∈ R implies
∂( f + g)( x ) = ∇ f ( x ) + ∂g( x )
(3.4)
Proof. We have

(i) Equivalent to (ii) with g = 0.

(ii) To show LHS ⊇ RHS: if v ∈ ∂g(x), then g(x) + ⟨v, y − x⟩ ≤ g(y), which together with Theorem 3.12 in the notes gives

⟨∇f(x), y − x⟩ + f(x) ≤ f(y)    (3.5)
⟨v + ∇f(x), y − x⟩ + (f + g)(x) ≤ (f + g)(y)    (3.6)
⟺ v + ∇f(x) ∈ ∂(f + g)(x)    (3.7)

For the other direction, let v ∈ ∂(f + g)(x). Then

lim inf_{z→x} [f(z) + g(z) − f(x) − g(x) − ⟨v, z − x⟩] / ‖z − x‖ ≥ 0    (3.8)
⇒ lim inf_{z→x} [g(z) − g(x) − ⟨v − ∇f(x), z − x⟩] / ‖z − x‖ ≥ 0    (3.9)
⇒ lim inf_{t↓0} [g(z(t)) − g(x) − ⟨v − ∇f(x), z(t) − x⟩] / (t‖y − x‖) ≥ 0, with z(t) = (1 − t)x + ty    (3.10)
⇒ lim inf_{t↓0} [t(g(y) − g(x)) − t⟨v − ∇f(x), y − x⟩] / (t‖y − x‖) ≥ 0    (3.11)
⇒ g(y) − g(x) − ⟨v − ∇f(x), y − x⟩ ≥ 0 ⇒ v − ∇f(x) ∈ ∂g(x)    (3.12)
⇒ v ∈ ∇f(x) + ∂g(x).    (3.13)

Here (3.9) uses the differentiability of f at x, and (3.11) uses g(z(t)) − g(x) ≤ t(g(y) − g(x)) by convexity.
Theorem 3.8. Let f : R^n → R̄ be proper. Then

x ∈ arg min f ⟺ 0 ∈ ∂f(x)    (3.14)

Proof.

0 ∈ ∂f(x) ⟺ ⟨0, y − x⟩ + f(x) ≤ f(y) ∀y ⟺ f(x) ≤ f(y) ∀y.    (3.15)
Definition 3.9. For a convex set C ⊆ R^n and a point x ∈ C,

N_C(x) = {v ∈ R^n | ⟨v, y − x⟩ ≤ 0 ∀y ∈ C}    (3.16)

is the “normal cone” of C at x. By convention, N_C(x) = ∅ for all x ∉ C.
Proposition 3.10. Let C be convex and C 6= ∅. Then
∂I(C ) ( x ) = NC ( x )
(3.17)
Proof. For x ∈ C, we have

∂I_C(x) = {v | I_C(x) + ⟨v, y − x⟩ ≤ I_C(y) ∀y} = {v | ⟨v, y − x⟩ ≤ 0 ∀y ∈ C} = N_C(x).    (3.18)

For x ∉ C, ∂I_C(x) = ∅ (the subgradient inequality cannot hold since I_C(x) = +∞) and N_C(x) = ∅ by convention.
Proposition 3.11. Let C ⊆ R^n be closed, non-empty, and convex, and let x ∈ R^n. Then

y = Π_C(x) ⟺ x − y ∈ N_C(y)    (3.19)

Proof.

y = Π_C(x) ⟺ y minimizes g(y') := ½‖y' − x‖² + I_C(y').    (3.20)

This holds if and only if 0 ∈ ∂g(y), if and only if 0 ∈ y − x + ∂I_C(y) = y − x + N_C(y), i.e. x − y ∈ N_C(y).
Proposition 3.12. Let f : R^n → R̄ be convex and proper. Then

∂f(x) = ∅ if x ∉ dom f, and ∂f(x) = {v ∈ R^n | (v, −1) ∈ N_{epi f}(x, f(x))} if x ∈ dom f.    (3.21)

If x ∈ dom f,

N_{dom f}(x) = {v ∈ R^n | (v, 0) ∈ N_{epi f}(x, f(x))}    (3.22)
Example 3.13. The subdifferential of f : R → R, f(x) = |x|, is

∂f(x) = {sign x} if x ≠ 0, and ∂f(x) = [−1, 1] if x = 0.    (3.23)
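Combining Theorem 3.8 with Proposition 3.7 (ii) and this example gives the standard optimality condition for ℓ1-regularised problems. The following is a hedged sketch (the soft-thresholding formula is the standard consequence, not derived in the notes) for g(x) = ½(x − a)² + λ|x|.

```python
# Hedged sketch: 0 in dg(x) = (x - a) + lam * d|.|(x) for
# g(x) = 0.5*(x - a)^2 + lam*|x| is solved by soft-thresholding.
import numpy as np

def soft_threshold(a, lam):
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def zero_in_subdifferential(x, a, lam, tol=1e-12):
    # check 0 in (x - a) + lam * d|.|(x), using (3.23)
    if x != 0.0:
        return abs((x - a) + lam * np.sign(x)) <= tol
    return abs(a) <= lam + tol   # need (a - x)/lam = a/lam in [-1, 1]

for a in [-3.0, -0.2, 0.0, 0.7, 2.5]:
    x = soft_threshold(a, lam=1.0)
    print(a, x, zero_in_subdifferential(x, a, 1.0))
```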
Definition 3.14. For C ⊆ R^n, define the affine hull as

aff(C) = ∩_{A affine, C ⊆ A} A    (3.24)

and the relative interior as

rint(C) = {x ∈ R^n | there exists an open neighborhood N of x such that N ∩ aff(C) ⊆ C}    (3.25)
Example 3.15.
aff(Rn ) = Rn
(3.26)
aff(Rn ) = Rn
(3.27)
rint([0, 1]2 ) = (0, 1)2
(3.28)
rint([0, 1] × {0}) = (0, 1) × {0}
(3.29)
Exercise 3.16. We know

int(A ∩ B) ⊆ int A ∩ int B    (3.30)

Does

rint(A ∩ B) ⊆ rint A ∩ rint B    (3.31)

hold?
Proposition 3.17. Let f : R^n → R̄ be convex. Then

(i) g(x) = f(x + y) ⇒ ∂g(x) = ∂f(x + y); g(x) = f(λx) ⇒ ∂g(x) = λ∂f(λx); g(x) = λf(x) ⇒ ∂g(x) = λ∂f(x).

(ii) Let f : R^n → R̄ be proper and convex, and A ∈ R^{n×m} such that

{Ay | y ∈ R^m} ∩ rint dom f ≠ ∅.    (3.32)

Then for x ∈ dom(f ◦ A) we have

∂(f ◦ A)(x) = A^T ∂f(Ax)    (3.33)

(iii) Let f_1, . . . , f_m : R^n → R̄ be proper and convex, with rint dom f_1 ∩ · · · ∩ rint dom f_m ≠ ∅. Then for f = f_1 + · · · + f_m,

∂f(x) = ∂f_1(x) + · · · + ∂f_m(x)    (3.34)
4 Conjugate Functions

4.1 The Legendre-Fenchel Transform
Definition 4.1. For f : R^n → R̄, define

con f(x) = sup_{g ≤ f, g convex} g(x),    (4.1)

the convex hull of f.
Proposition 4.2. con f is the greatest convex function majorized by f .
Definition 4.3. For f : R^n → R̄, the (lower) closure of f is defined as

cl f(x) = lim inf_{y→x} f(y)    (4.2)
Proposition 4.4. For f : Rn → R̄,
epi(cl f ) = cl(epi f )
(4.3)
In particular, if f is convex, then cl f is convex.
Proof. Exercise.
Proposition 4.5. If f : R^n → R̄, then

(cl f)(x) = sup_{g ≤ f, g lsc} g(x)    (4.4)

Proof.
Fill in
Theorem 4.6. Let C ⊆ R^n be closed and convex. Then

C = ∩_{(b,β) s.t. C ⊆ H_{b,β}} H_{b,β}    (4.5)

where

H_{b,β} = {x ∈ R^n | ⟨x, b⟩ − β ≤ 0}    (4.6)

Proof.
Theorem 4.7. Let f : R^n → R̄ be proper, lsc, and convex. Then

f(x) = sup_{g ≤ f, g affine} g(x)    (4.7)

Proof.
This is quite an involved proof.
Definition 4.8. For f : R^n → R̄, let f^* : R^n → R̄ be defined by

f^*(v) = sup_{x ∈ R^n} {⟨v, x⟩ − f(x)},    (4.8)

the conjugate of f. The mapping f ↦ f^* is the Legendre-Fenchel transform.
Remark 4.9. For v ∈ R^n,

f^*(v) = sup_{x ∈ R^n} {⟨v, x⟩ − f(x)}    (4.9)
⇒ f^*(v) ≥ ⟨v, x⟩ − f(x) ∀x ∈ R^n ⟺ f(x) ≥ ⟨v, x⟩ − f^*(v) ∀x ∈ R^n.    (4.10)

Thus x ↦ ⟨v, x⟩ − f^*(v) is the largest affine function with gradient v majorized by f.
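On a grid, the conjugate can be evaluated directly from Definition 4.8. The sketch below (grid ranges and the test function f(x) = x²/2 are illustrative choices) also checks the biconjugation statement of Theorem 4.10 (iii).

```python
# Hedged sketch: discrete Legendre-Fenchel transform f*(v) = max_x (v*x - f(x))
# on a grid, for f(x) = x^2/2, which equals its own conjugate.
import numpy as np

def conjugate(vals, xs, vs):
    # evaluate f*(v) = max over grid points x of v*x - f(x)
    return np.max(vs[:, None] * xs[None, :] - vals[None, :], axis=1)

xs = np.linspace(-5, 5, 2001)
f = 0.5 * xs**2
vs = np.linspace(-3, 3, 601)

f_star = conjugate(f, xs, vs)
print(np.max(np.abs(f_star - 0.5 * vs**2)))    # ~0: f* = f for this f

inner = np.abs(xs) <= 2                         # stay inside the v-grid
f_bistar = conjugate(f_star, vs, xs[inner])
print(np.max(np.abs(f_bistar - f[inner])))      # ~0: f** = f (Theorem 4.10)
```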
Theorem 4.10. Assume f : R^n → R̄. Then

(i) f^* = (con f)^* = (cl f)^* = (cl con f)^*, and f ≥ f^{**} = (f^*)^*, the biconjugate of f.

(ii) If con f is proper, then f^*, f^{**} are proper, lower semicontinuous, and convex, and f^{**} = cl con f.

(iii) If f : R^n → R̄ is proper, lower semicontinuous, and convex, then

f^{**} = f    (4.11)
Proof. We have

(v, β) ∈ epi f^* ⟺ β ≥ ⟨v, x⟩ − f(x) ∀x ∈ R^n ⟺ f(x) ≥ ⟨v, x⟩ − β ∀x ∈ R^n    (4.12)
We claim that for an affine function h, we have h ≤ f ⇐⇒ h ≤
con f ⇐⇒ h ≤ cl f ⇐⇒ h ≤ cl con f . This is shown as con f is the
largest convex function less than or equal to f , and h is convex. Same
for cl, cl con, etc.
Thus in (4.12) we can replace f by con f , cl f , cl con f , which gives
our required result.
We also have

f^{**}(y) = sup_{v ∈ R^n} {⟨v, y⟩ − sup_{x ∈ R^n} {⟨v, x⟩ − f(x)}}    (4.13)
          ≤ sup_{v ∈ R^n} {⟨v, y⟩ − ⟨v, y⟩ + f(y)}    (4.14)
          = f(y)    (4.15)
For the second part, we have that con f is proper, and claim that cl con f is proper, lower semicontinuous, and convex. Lower semicontinuity is immediate. Convexity is given by the previous proposition that f convex implies cl f convex. Properness is to be shown in an exercise.
Applying the previous theorem,

cl con f(x) = sup_{g ≤ cl con f, g affine} g(x)    (4.16)
            = sup_{(v,β) ∈ epi f^*} {⟨v, x⟩ − β}    (4.17)
            = sup_{(v,β) ∈ epi f^*} {⟨v, x⟩ − f^*(v)}    (4.18)
            = sup_{v ∈ dom f^*} {⟨v, x⟩ − f^*(v)}    (4.19)
            = sup_{v ∈ R^n} {⟨v, x⟩ − f^*(v)} = f^{**}(x),    (4.20)

with affine g ≤ cl con f of the form g(x) = ⟨v, x⟩ − β, where (v, β) ∈ epi(cl con f)^* ⟺ (v, β) ∈ epi f^*.
To show f ? is proper, lower semicontinuous, and convex, we have
epi f ? is the intersection of closed convex sets, and therefore closed
and convex, and hence f ? is lower semicontinuous and convex.
To show properness: con f proper implies there exists x_0 ∈ R^n with con f(x_0) < ∞. Then f^*(v) = (con f)^*(v) ≥ ⟨v, x_0⟩ − con f(x_0) > −∞ for every v.

If f^* ≡ +∞, then cl con f = f^{**} = sup_v {⟨v, x⟩ − f^*(v)} ≡ −∞, contradicting that cl con f is proper. Thus f^* is proper, lower semicontinuous, and convex. Applying the same reasoning to f^* (we need con f^* = f^* proper, which holds by the previous result), f^{**} is also proper, lower semicontinuous, and convex.

For part (iii), apply (ii): f convex implies con f = f, and con f is proper (as f is proper); f is lsc and convex, and thus f^{**} = cl con f = cl f = f.
4.2 Duality Correspondences
Theorem 4.11. Let f : R^n → R̄ be proper, lower semicontinuous, and convex. Then:

(i) ∂f^* = (∂f)^{-1}

(ii) v ∈ ∂f(x) ⟺ f(x) + f^*(v) = ⟨v, x⟩ ⟺ x ∈ ∂f^*(v).

(iii)

∂f(x) = arg max_{v'} {⟨v', x⟩ − f^*(v')}    (4.21)
∂f^*(v) = arg max_{x'} {⟨v, x'⟩ − f(x')}    (4.22)
Proof. (i) This is immediate from (ii).

(ii)

f(x) + f^*(v) = ⟨v, x⟩    (4.24)
⟺ f^*(v) = ⟨v, x⟩ − f(x)    (4.25)
⟺ x ∈ arg max_{x'} {⟨v, x'⟩ − f(x')}    (4.26)
...    (4.27)

Finish off proof
Proposition 4.12. Let f : R^n → R̄ be proper, lower semicontinuous, and convex. Then

(f(·) − ⟨a, ·⟩)^* = f^*(· + a)    (4.28)
(f(· + b))^* = f^*(·) − ⟨·, b⟩    (4.29)
(f(·) + c)^* = f^*(·) − c    (4.30)
(λf(·))^* = λf^*(·/λ), λ > 0    (4.31)
(λf(·/λ))^* = λf^*(·)    (4.32)
Proof. Exercise
Proposition 4.13. Let f_i : R^n → R̄, i = 1, . . . , m, be proper and f(x_1, . . . , x_m) = ∑_i f_i(x_i). Then f^*(v_1, . . . , v_m) = ∑_i f_i^*(v_i).
Definition 4.14. For any set S ⊆ R^n, define the support function

G_S(v) = sup_{x ∈ S} ⟨v, x⟩ = (δ_S)^*(v)    (4.33)
Definition 4.15. A function h : Rn → R̄ is positively homogeneous if
0 ∈ dom h and h(λx ) = λh( x ) for all λ > 0, x ∈ Rn .
Proposition 4.16. The set of positively homogeneous proper lower semicontinuous convex functions and the set of closed convex nonempty sets are in one-to-one correspondence through the Legendre-Fenchel transform:

δ_C ↔ G_C    (4.34)

and

x ∈ ∂G_C(v) ⟺ x ∈ C and G_C(v) = ⟨v, x⟩ ⟺ v ∈ N_C(x) = ∂δ_C(x)    (4.35, 4.36)

The set of closed convex cones is in one-to-one correspondence with itself:

δ_K ↔ δ_{K^*},  K^* = {v ∈ R^n | ⟨v, x⟩ ≤ 0 ∀x ∈ K}    (4.37, 4.38)

and

x ∈ N_{K^*}(v) ⟺ x ∈ K, v ∈ K^*, ⟨x, v⟩ = 0 ⟺ v ∈ N_K(x)    (4.39, 4.40)
5 Duality in Optimization
Definition 5.1. For f : R^n × R^m → R̄ proper, lower semicontinuous, and convex, we define the primal and dual problems

inf_{x ∈ R^n} φ(x),  φ(x) = f(x, 0)    (5.1)
sup_{y ∈ R^m} ψ(y),  ψ(y) = −f^*(0, y)    (5.2)

and the inf-projections

p(u) = inf_{x ∈ R^n} f(x, u)    (5.3)
q(v) = inf_{y ∈ R^m} f^*(v, y)    (5.4)

f is the perturbation function for φ, and p is the associated projection function.
Consider the problem

inf_x ½‖x − z‖² + δ_{≥0}(Ax − b) = inf_x φ(x)    (5.5)

and the perturbed problem

f(x, u) = ½‖x − z‖² + δ_{≥0}(Ax − b + u)    (5.6)
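A hedged numerical sketch of this example follows: the dual expression below is the standard Lagrangian dual of (5.5) (derived separately, with signs possibly differing from the convention of Definition 5.1), and the data are synthetic; it only illustrates weak duality (Theorem 5.3).

```python
# Hedged sketch: weak duality for the projection problem (5.5),
#   phi(x) = 0.5*||x - z||^2 + delta_{>=0}(Ax - b).
# An assumed standard Lagrangian-dual computation (x = z + A^T y) gives
#   psi(y) = -0.5*||A^T y||^2 + <y, b - A z>,  y >= 0.
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 8
A = rng.normal(size=(m, n))
z = rng.normal(size=n)
x_feas = rng.normal(size=n)
b = A @ x_feas - 1.0                       # x_feas is strictly feasible

def phi(x):
    return 0.5 * np.sum((x - z)**2) if np.all(A @ x - b >= -1e-9) else np.inf

def psi(y):
    return -0.5 * np.sum((A.T @ y)**2) + y @ (b - A @ z)

# projected gradient ascent on the concave dual over y >= 0
y = np.zeros(m)
step = 1.0 / np.linalg.norm(A @ A.T, 2)
for _ in range(2000):
    y = np.maximum(0.0, y + step * (b - A @ z - A @ (A.T @ y)))

print(psi(y), "<=", phi(x_feas))           # any dual value <= any primal value
```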
Proposition 5.2. Assume f satisfies the assumptions in Definition 5.1. Then

(i) φ, −ψ are convex and lower semicontinuous
(ii) p, q are convex
(iii) p(0) = inf_x φ(x)
(iv) p^{**}(0) = sup_y ψ(y)
(v) inf_x φ(x) < ∞ ⟺ 0 ∈ dom p
(vi) sup_y ψ(y) > −∞ ⟺ 0 ∈ dom q
Proof. (i) φ is clearly convex. For ψ: f^* is lower semicontinuous and convex, which implies −ψ is lower semicontinuous and convex.

(ii) Look at the strict epigraph of p:

E = {(u, α) ∈ R^m × R | p(u) < α}    (5.7)
  = {(u, α) ∈ R^m × R | ∃x : f(x, u) < α}    (5.8)
  = L({(u, α, x) ∈ R^m × R × R^n | f(x, u) < α}),  L : (u, α, x) ↦ (u, α),    (5.9, 5.10)

i.e. the image of a convex set under a linear map. As E is convex, p must be convex. Similarly with q.
(iii), (iv) For p(0), this holds by definition. For p^{**}(0),

p^*(y) = sup_u {⟨y, u⟩ − p(u)}    (5.11)
       = sup_{u,x} {⟨y, u⟩ − f(x, u)}    (5.12)
       = f^*(0, y)    (5.13)

and

p^{**}(0) = sup_y {⟨0, y⟩ − p^*(y)}    (5.14)
          = sup_y −f^*(0, y)    (5.15)
          = sup_y ψ(y)    (5.16)

(v) By definition, 0 ∈ dom p ⟺ p(0) = inf_x f(x, 0) < ∞.
complete proof
Theorem 5.3. Let f be as in Definition 5.1. Then weak duality holds:

p(0) = inf_x φ(x) ≥ sup_y ψ(y) = p^{**}(0)    (5.17)

and under certain conditions the inf and sup are equal and finite (strong duality):
p(0) ∈ R and p is lower semicontinuous at 0 if and only if inf φ(x) = sup ψ(y) ∈ R.
Definition 5.4.
inf φ − sup ψ
(5.18)
is the duality gap
Proof. (⇐) p^{**} ≤ cl p ≤ p, so cl p(0) = p(0), i.e. p is lower semicontinuous at 0.
(⇒) cl p is lower semicontinuous and convex, hence proper, and sup ψ = (p^*)^*(0) = (cl p)^{**}(0) = cl p(0) = p(0) = inf φ.
Proposition 5.5.
Fill in notes from lecture
6 First-order Methods
The idea is to do gradient descent on our objective function f, with step size τ_k.
Definition 6.1. Let f : R^n → R̄, then

(i)
F_{τ_k} f(x^k) = x^k − τ_k ∂f(x^k) = (I − τ_k ∂f)(x^k)    (6.1)

(ii)
B_{τ_k} f(x^k) = (I + τ_k ∂f)^{-1}(x^k)    (6.2)

so
    (6.3)
Fill in
Proposition 6.2. If f : R^n → R̄ is proper, lower semicontinuous, and convex, then for τ_k > 0,

B_{τ_k} f(x) = arg min_y { 1/(2τ_k) ‖y − x‖_2² + f(y) }    (6.4)

Proof.

y ∈ arg min{. . .} ⟺ 0 ∈ (1/τ_k)(y − x) + ∂f(y)    (6.5)
⟺ x ∈ y + τ_k ∂f(y) = (I + τ_k ∂f)(y).    (6.6)
Missed lecture material
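A hedged sketch of the forward and backward operators in action follows (the splitting into a smooth term plus λ‖·‖₁, and the fact that B_τ of λ‖·‖₁ is soft-thresholding, are standard assumptions consistent with Proposition 6.2, not derived here).

```python
# Hedged sketch: forward-backward iteration x <- B(F(x)) for
#   min 0.5*||Ax - b||^2 + lam*||x||_1   (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 60
A = rng.normal(size=(m, n))
b = rng.normal(size=m)
lam = 1.0

def F(x, tau):                        # forward (gradient) step on the smooth part
    return x - tau * A.T @ (A @ x - b)

def B(x, tau):                        # backward (proximal) step for lam*||.||_1
    return np.sign(x) * np.maximum(np.abs(x) - tau * lam, 0.0)

tau = 1.0 / np.linalg.norm(A, 2)**2   # step size 1/L with L = ||A^T A||_2
x = np.zeros(n)
for _ in range(500):
    x = B(F(x, tau), tau)

print(0.5 * np.sum((A @ x - b)**2) + lam * np.abs(x).sum(), np.count_nonzero(x))
```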
7 Interior Point Methods
Theorem 7.1 (Problem). Consider the problem

inf_x ⟨c, x⟩ + δ_K(Ax − b)    (7.1)

with K a closed convex cone, i.e. the constraint Ax ≥_K b.

Notation.

The idea is to replace δ_K by a smooth approximation F, with F(y) → ∞ as y approaches the boundary of K, and to solve inf_x ⟨c, x⟩ + (1/t)F(Ax − b), equivalently inf_x t⟨c, x⟩ + F(Ax − b).
Proposition 7.2. Let F be the canonical barrier for K. Then F is smooth on dom F = int K and strictly convex, with

F(tx) = F(x) − Θ_F ln t,    (7.2)

and

(i) −∇F(x) ∈ dom F
(ii) ⟨∇F(x), x⟩ = −Θ_F
(iii) −∇F(−∇F(x)) = x
(iv) −∇F(tx) = −(1/t)∇F(x).
For the problem

inf_x ⟨c, x⟩  such that Ax ≥_K b    (7.3)

with K a closed, convex, self-dual cone, we have

inf_x ⟨c, x⟩ + δ_K(Ax − b)    (7.4)
⇒ sup_y −⟨b, y⟩ − h^*(−A^T y − c) − k^*(y)    (7.5)
⟺ sup_y ⟨b, y⟩ s.t. y ≥_K 0, A^T y = c    (7.6)
⟺ inf_x ⟨c, x⟩ + F(Ax − b)    (7.7)
⟺ sup_y ⟨b, y⟩ − F(y) − δ_{A^T y = c}    (7.8)
More conditions on, etc
Proposition 7.3. For t > 0, define

x(t) = arg min_x {t⟨c, x⟩ + F(Ax − b)}    (7.9)
y(t) = arg min_y {−t⟨b, y⟩ + F(y) + δ_{A^T y = c}}    (7.10)
z(t) = (x(t), y(t))    (7.11)

These paths exist, are unique, and

(x, y) = (x(t), y(t)) ⟺ A^T y = c and ty + ∇F(Ax − b) = 0.    (7.12)
Proof. The first part follows by definition.

For (7.12), the primal optimality condition is 0 = tc + A^T ∇F(Ax − b). The dual optimality condition is 0 ∈ −tb + ∇F(y) + N_{{z | A^T z = c}}(y), which is

⟺ 0 ∈ −tb + ∇F(y) + (∅ if A^T y ≠ c, range A if A^T y = c)    (7.13)
⟺ 0 ∈ −tb + ∇F(y) + range A, A^T y = c    (7.14)

Assume x satisfies the primal optimality condition and set ty := −∇F(Ax − b). Then

tc + A^T ∇F(Ax − b) = 0 ⟺ tc + A^T(−ty) = 0    (7.15)
⟺ A^T y = c    (7.16)
A few more steps...
Proposition 7.4. If x, y are feasible, then

φ(x) − ψ(y) = ⟨y, Ax − b⟩    (7.17)

Moreover, for (x(t), y(t)) on the central path,

φ(x(t)) − ψ(y(t)) = Θ_F / t    (7.18)
Proof.

φ(x) − ψ(y) = ⟨c, x⟩ − ⟨b, y⟩    (7.19)
            = ⟨A^T y, x⟩ − ⟨b, y⟩    (7.20)
            = ⟨Ax − b, y⟩    (7.21)

For the second part, we have

φ(x(t)) − ψ(y(t)) = ⟨y(t), Ax(t) − b⟩    (7.22)
                  = ⟨−(1/t)∇F(Ax(t) − b), Ax(t) − b⟩    (7.23)
                  = Θ_F / t,    (7.24)

using Proposition 7.2 (ii).
Fill in from lecture notes
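The central-path relation of Proposition 7.4 can be seen on a toy one-dimensional problem. The following sketch assumes the standard logarithmic barrier F(y) = −log y for K = R_{≥0} (so Θ_F = 1); everything else follows in closed form.

```python
# Hedged toy sketch: minimise c*x subject to x >= 0, barrier F(y) = -log(y).
# The barrier problem min_x t*c*x - log(x) has solution x(t) = 1/(t*c),
# y(t) follows from t*y + F'(x) = 0, and the gap is Theta_F / t = 1/t.
c = 2.0
for t in [1.0, 10.0, 100.0, 1000.0]:
    x_t = 1.0 / (t * c)          # stationarity: t*c - 1/x = 0
    y_t = 1.0 / (t * x_t)        # equals c, so A^T y = c holds (A = I here)
    gap = c * x_t - 0.0          # phi(x(t)) - psi(y(t)); dual optimum is b^T y = 0
    print(t, x_t, y_t, gap, 1.0 / t)
```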
8 Support Vector Machines

8.1 Machine Learning
Given X = { x1 , . . . , x n } a sample set, xi ∈ F ⊆ Rm , with F our
feature space, and C a set of classes. We seek to find
hθ : F → C, θ ∈ Θ
(8.1)
with Θ our parameter space.
Our task is to find θ such that hθ is the “best” mapping from X
into C.
(i) Unsupervised learning (only X is known, usually |C | not known)
• Clustering
• Outlier detection
• Mapping to lower dimensional subspace
(ii) Supervised learning (|C| known). We have training data T = {(x_1, y_1), . . . , (x_n, y_n)}, with y_i ∈ C.
We seek to find

θ = arg min_{θ ∈ Θ} f(θ, T),  f(θ, T) = ∑_{i=1}^n g(y_i, h_θ(x_i)) + R(θ)    (8.2)
8.2 Linear Classifiers
The idea is to consider

h_θ : R^m → {−1, 1},  θ = (w, b)    (8.3)
h_θ(x) = sign(⟨w, x⟩ + b)    (8.4)
We want to consider maximum margin classifiers, satisfying

max_{w,b, w≠0} min_i y_i (⟨w/‖w‖, x_i⟩ + b/‖w‖)    (8.5)

which can be rewritten as

max_{w,b} c    (8.6)

such that

c ≤ y_i (⟨w/‖w‖, x_i⟩ + b/‖w‖)    (8.7)
⟺ ‖w‖c ≤ y_i (⟨w, x_i⟩ + b)    (8.8)

or just

min_{w,b} ½‖w‖²    (8.10)

such that

1 ≤ y_i (⟨w, x_i⟩ + b)    (8.11)
Definition 8.1. In standard form,

inf_{w,b} k(w, b) + h(M(w; b) − e)    (8.12)

where (w; b) denotes the stacked vector.
The conjugates are

k(w, b) = ½‖w‖²,  k^*(u, c) = ½‖u‖² + δ_{{0}}(c)    (8.13, 8.14)
h(z) = δ_{≥0}(z),  h^*(v) = δ_{≤0}(v)    (8.15)
In saddle point form,

inf_{w,b} sup_z ½‖w‖_2² + ⟨M(w; b) − e, z⟩ − δ_{≤0}(z)    (8.16)
The dual problem is

sup_z −⟨e, z⟩ − δ_{≤0}(z) − ½‖−∑_{i=1}^n y_i x_i z_i‖_2² − δ_{{0}}(⟨y, z⟩)    (8.17)

and thus

inf_z ½‖∑_i y_i x_i z_i‖_2² + ⟨e, z⟩    (8.18)

such that z ≤ 0, ⟨y, z⟩ = 0.
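Below is a hedged sketch of solving a simplified version of this dual on synthetic data: the bias b is dropped, so the constraint ⟨y, z⟩ = 0 disappears, and the substitution α = −z ≥ 0 turns (8.18) into a non-negative quadratic programme solved by projected gradient; none of these choices come from the notes.

```python
# Hedged sketch: no-bias hard-margin SVM dual
#   max_alpha sum(alpha) - 0.5*alpha^T Q alpha, alpha >= 0, Q_ij = y_i y_j <x_i, x_j>,
# solved by projected gradient ascent on synthetic, well-separated data.
import numpy as np

rng = np.random.default_rng(0)
n, d = 40, 2
X = np.vstack([rng.normal(3, 0.5, (n // 2, d)), rng.normal(-3, 0.5, (n // 2, d))])
y = np.hstack([np.ones(n // 2), -np.ones(n // 2)])

Q = (y[:, None] * X) @ (y[:, None] * X).T     # Q_ij = y_i y_j <x_i, x_j>
alpha = np.zeros(n)
step = 1.0 / np.linalg.norm(Q, 2)
for _ in range(5000):
    alpha = np.maximum(0.0, alpha + step * (1.0 - Q @ alpha))

w = X.T @ (alpha * y)                         # recover the primal weights
print("min margin:", (y * (X @ w)).min())     # close to 1 for separable data
```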
The optimality conditions are
    (8.19)
Fill in rest of optimality conditions

We use the fact that if k(x) + h(Ax + b) is our primal, then the dual is

−⟨b, y⟩ − k^*(−A^T y) − h^*(y).

Explanation of support vectors
8.3 Kernel Trick

The idea is to embed our features into a higher dimensional space via a mapping function φ. Then our decision function takes the form

h_θ(x) = sign(⟨φ(x), w⟩ + b)    (8.20)
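In practice w is expanded as w = ∑_i α_i y_i φ(x_i) (a standard representer-style form assumed here, not derived in the notes), so only kernel evaluations k(x_i, x) = ⟨φ(x_i), φ(x)⟩ are needed. A minimal sketch with an RBF kernel:

```python
# Hedged sketch: kernelised decision function; x_train, y_train, alpha and b
# are assumed to come from a previously solved dual problem.
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    return np.exp(-gamma * np.sum((a - b)**2))

def decision(x, x_train, y_train, alpha, b):
    s = sum(a_i * y_i * rbf_kernel(x_i, x)
            for a_i, y_i, x_i in zip(alpha, y_train, x_train))
    return np.sign(s + b)
```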
9 Total Variation

9.1 Meyer's G-norm
Idea: a regulariser that favours textured regions.

‖u‖_G = inf{‖v‖_∞ | div v = u, v ∈ L^∞(R^d, R^d)}    (9.1)

Discretized, we have

‖u‖_G = inf_v {δ_C^*(v) + δ_{−G^T v = u}},  C = {v | ∑_x ‖v(x)‖_2 ≤ 1}    (9.2, 9.3)
      = inf_v sup_w {δ_C^*(v) − ⟨w, G^T v + u⟩}    (9.4)
      = sup_w {−⟨w, u⟩ − sup_v {⟨v, Gw⟩ − δ_C^*(v)}}    (9.5)
      = sup_w {−⟨w, u⟩ − δ_C(Gw)}    (9.6)
      = sup_w {⟨w, u⟩ − δ_C(−Gw)}    (9.7)
      = sup{⟨w, u⟩ | TV(w) ≤ 1}    (9.8)

and so ‖·‖_G is the dual to TV:

‖·‖_G = δ^*_{B_TV}    (9.9)

where B_TV = {u | TV(u) ≤ 1}.
Similarly,

sup{⟨u, w⟩ | ‖w‖_G ≤ 1}    (9.10)
= sup{⟨u, w⟩ | ∃v : w = −G^T v, ‖v‖_∞ ≤ 1}    (9.11)
= sup{⟨u, −G^T v⟩ | ‖v‖_∞ ≤ 1}    (9.12)
= TV(u)    (9.13)
Why is ‖·‖_G good at separating noise?

arg min_u ½‖u − g‖_2²  s.t. ‖u‖_G ≤ λ    (9.14)
= Π_{λB_{‖·‖_G}}(g) = g − B_{λTV}(g)    (9.15)

where B_{λTV} is the backward (proximal) operator of λTV from Chapter 6 (Moreau's identity).
9.2 Non-local Regularization

In real-world images, a large ‖∇u‖ does not always mean noise.
Definition 9.1. Let Ω = {1, . . . , n}. Given u ∈ R^n, x, y ∈ Ω, and w : Ω² → R_{≥0}, define

∂_y u(x) = (u(y) − u(x)) w(x, y)    (9.16)
∇_w u(x) = (∂_y u(x))_{y∈Ω}    (9.17)

A suitable divergence is div_w v(x) = ∑_{y∈Ω} (v(x, y) − v(y, x)) w(x, y), adjoint to ∇_w with respect to the Euclidean inner products:

⟨−div_w v, u⟩ = ⟨v, ∇_w u⟩    (9.18)
Non-local regularisers are

J(u) = ∑_{x∈Ω} g(∇_w u(x))    (9.19)

with, for example,

TV_NL(u) = ∑_{x∈Ω} ‖∇_w u(x)‖_2    (9.20)
TV_NL^d(u) = ∑_{x∈Ω} ∑_{y∈Ω} |∂_y u(x)|    (9.21)

This reduces to classical TV if the weights are chosen as constant (w(x, y) = 1/h).
9.2.1 How to choose w(x, y)?

(i) Large if neighborhoods of x, y are similar with respect to a distance metric.
(ii) Sparse, otherwise we have n² terms in the regulariser (n is the number of pixels).

A possible choice (see the sketch below):

d_u(x, y) = ∫_Ω K_G(t) (u(y + t) − u(x + t))² dt    (9.22)
A(x) = arg min_A {∑_{y∈A} d_u(x, y) | A ⊆ S(x), |A| = k}    (9.23)
w(x, y) = 1 if y ∈ A(x) and x ∈ A(y), and 0 otherwise,    (9.24)

with K_G a Gaussian kernel of variance G².
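The following hedged sketch computes such weights for a one-dimensional signal; patch size, search window S(x), k and the Gaussian width are illustrative choices, not values from the notes.

```python
# Hedged sketch of (9.22)-(9.24): Gaussian-weighted patch distances,
# keep the k most similar neighbours in a local search window, then symmetrise.
import numpy as np

def nonlocal_weights(u, patch=3, search=10, k=4, sigma=1.0):
    n = len(u)
    offs = np.arange(-patch, patch + 1)
    kern = np.exp(-offs**2 / (2 * sigma**2))              # K_G in (9.22)
    pad = np.pad(u, patch, mode="edge")
    patches = np.stack([pad[i:i + 2 * patch + 1] for i in range(n)])

    w = np.zeros((n, n))
    for x in range(n):
        lo, hi = max(0, x - search), min(n, x + search + 1)
        cand = np.array([y for y in range(lo, hi) if y != x])
        d = np.array([np.sum(kern * (patches[y] - patches[x])**2) for y in cand])
        for y in cand[np.argsort(d)[:k]]:                  # A(x): k most similar
            w[x, y] = 1.0
    return w * w.T                                         # (9.24): y in A(x) and x in A(y)

u = np.sin(np.linspace(0, 6 * np.pi, 200))
u += 0.1 * np.random.default_rng(0).normal(size=200)
W = nonlocal_weights(u)
print(W.shape, int(W.sum()))
```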
10 Relaxation
Example 10.1 (Segmentation). Find C ⊆ Ω that fits the given data and prior knowledge about typical shapes.

Theorem 10.2 (Chan-Vese). Given g : Ω → R, Ω ⊆ R^d, consider

inf_{C ⊆ Ω, c_1, c_2 ∈ R} f_CV(C, c_1, c_2)    (10.1)
f_CV(C, c_1, c_2) = ∫_C (g − c_1)² dx + ∫_{Ω\C} (g − c_2)² dx + λ H^{d−1}(∂C)    (10.2)

Confirm form of the regulariser

Thus we fit c_1 to the shape, c_2 to the outside of the shape, and C is the region of the shape.
10.1 Mumford-Shah model

inf_{K ⊆ Ω closed, u ∈ C^∞(Ω\K)} f(K, u)    (10.3)
f(K, u) = ∫_Ω (g − u)² dx + λ ∫_{Ω\K} ‖∇u‖_2² dx + ν H^{d−1}(∂K).    (10.4)

Chan-Vese is a special case of this, obtained by forcing u = c_1 I(C) + c_2 (1 − I(C)).
inf_{C ⊆ Ω, c_1, c_2 ∈ R} ∫_C (g − c_1)² dx + ∫_{Ω\C} (g − c_2)² dx + λ H^{d−1}(∂C)    (10.5)

inf_{u ∈ BV(Ω,{0,1}), c_1, c_2 ∈ R} ∫_Ω u(g − c_1)² + (1 − u)(g − c_2)² dx + λ TV(u)    (10.6)

Fix c_1, c_2. Then

inf_{u ∈ BV(Ω,{0,1})} ∫_Ω u S dx + λ TV(u),  S := (g − c_1)² − (g − c_2)²    (C)

Replacing {0, 1} by its convex hull, we obtain

inf_{u ∈ BV(Ω,[0,1])} ∫_Ω u · S dx + λ TV(u)    (R)
This is a “convex relaxation”:
(i) Replace the non-convex energy by a convex approximation.
(ii) Replace non-convex f by con f.

Can the minima of (R) have values ∉ {0, 1}? Assume u_1, u_2 ∈ BV(Ω, {0, 1}) both minimize (R) and (C), with u_1 ≠ u_2. Then

½u_1 + ½u_2 ∉ BV(Ω, {0, 1})    (10.7)

is also a minimizer of (R).
Proposition 10.3. Let c_1, c_2 be fixed. If u minimizes (R) and u ∈ BV(Ω, {0, 1}), then u minimizes (C), and (with u = 1_C) the set C minimizes f_CV(·, c_1, c_2).

Proof. Let C' ⊆ Ω. Then 1_{C'} ∈ BV(Ω, [0, 1]), and so

f(1_{C'}) ≥ f(1_C)    (10.8)

for all C' ⊆ Ω.
Definition 10.4. Let L = BV(Ω, [0, 1]). For u ∈ L, α ∈ [0, 1],

u_α(x) = I({u > α})(x) = 1 if u(x) > α, and 0 if u(x) ≤ α.    (10.9)

Then f : L → R satisfies the general coarea condition (gcc) if and only if

f(u) = ∫_0^1 f(u_α) dα    (10.10)
Upper bound?
Proposition 10.5. Let s ∈ L^∞(Ω), Ω bounded. Then f as in (R) satisfies gcc.

Proof. If f_1, f_2 satisfy gcc, then f_1 + f_2 satisfies gcc. The term λTV satisfies gcc by the coarea formula for total variation. For the linear term we have

∫_Ω u(x) S(x) dx = ∫_Ω (∫_0^1 I({u(x) > α}) S(x) dα) dx    (10.11)
                 = ∫_0^1 ∫_Ω I({u(x) > α}) S(x) dx dα,    (10.12)

using u(x) = ∫_0^1 I({u(x) > α}) dα.
Theorem 10.6. Assume f : BV(Ω, [0, 1]) → R satisfies gcc and

u^* ∈ arg min_{u ∈ BV(Ω,[0,1])} f(u).    (10.13)

Then for almost every α ∈ [0, 1], u^*_α is a minimizer of f over BV(Ω, {0, 1}), i.e.

u^*_α ∈ arg min_{u ∈ BV(Ω,{0,1})} f(u).    (10.14)

Proof. Let

S = {α | f(u^*_α) ≠ f(u^*)}.    (10.15)

If L(S) = 0, then for a.e. α,

f(u^*_α) = f(u^*) = inf_{u ∈ BV(Ω,[0,1])} f(u)    (10.16)
If L(S) > 0, then there exists ε > 0 such that

L(S_ε) > 0,  S_ε = {α | f(u^*_α) ≥ f(u^*) + ε},    (10.17)

which implies

f(u^*) = ∫_{[0,1]\S_ε} f(u^*) dα + ∫_{S_ε} f(u^*) dα    (10.18)
       ≤ ∫_0^1 f(u^*_α) dα − ε L(S_ε)    (10.19)
       < ∫_0^1 f(u^*_α) dα,    (10.20)

which contradicts gcc.
Remark 10.7. Discretization problem: consider

min_{u ∈ R^n} f(u),  f(u) = ⟨s, u⟩ + λ ∑_i ‖(Gu)_i‖_2,  such that 0 ≤ u ≤ 1,    (10.21)

versus

min_{u ∈ R^n} f(u),  f(u) = ⟨s, u⟩ + λ ∑_i ‖(Gu)_i‖_2,  such that u ∈ {0, 1}^n.    (10.22)

If u^* solves the relaxation, does u^*_α solve the combinatorial problem? Only if the discretized energy f satisfies gcc.
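As a rough illustration of the relax-then-threshold idea (a sketch only: the smoothed TV term, the fixed step size, and the one-dimensional data are simplifying assumptions, not part of the notes), one can solve a small discretized instance of (R) by projected gradient and then threshold at α = ½.

```python
# Hedged sketch: projected gradient on <s,u> + lam*TV_smooth(u) over [0,1]^n,
# followed by thresholding u at alpha = 0.5 as suggested by Theorem 10.6.
import numpy as np

rng = np.random.default_rng(0)
g = np.concatenate([np.full(50, 0.2), np.full(50, 0.8)]) + 0.1 * rng.normal(size=100)
c1, c2 = 0.8, 0.2
s = (g - c1)**2 - (g - c2)**2             # the weight S in (C)/(R)
lam, eps, step = 0.5, 1e-2, 0.05

def grad(u):
    du = np.diff(u)
    w = du / np.sqrt(du**2 + eps)         # derivative of the smoothed |du|
    tv_grad = np.zeros_like(u)
    tv_grad[:-1] -= w
    tv_grad[1:] += w
    return s + lam * tv_grad

u = np.full(100, 0.5)
for _ in range(3000):
    u = np.clip(u - step * grad(u), 0.0, 1.0)

u_thr = (u > 0.5).astype(float)
energy = lambda v: s @ v + lam * np.abs(np.diff(v)).sum()
print(energy(u), energy(u_thr))           # thresholding changes the energy only slightly
```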
11 Bibliography