L. Vandenberghe
EE236C (Spring 2016)
5. Subgradient method
• subgradient method
• convergence analysis
• optimal step size when $f^\star$ is known
• alternating projections
• optimality
5-1
Subgradient method
to minimize a nondifferentiable convex function $f$: choose $x^{(0)}$ and repeat

$$x^{(k)} = x^{(k-1)} - t_k g^{(k-1)}, \qquad k = 1, 2, \ldots$$

where $g^{(k-1)}$ is any subgradient of $f$ at $x^{(k-1)}$
Step size rules
• fixed step: $t_k$ constant
• fixed length: $t_k \|g^{(k-1)}\|_2 = \|x^{(k)} - x^{(k-1)}\|_2$ is constant
• diminishing: $t_k \to 0$, $\sum_{k=1}^{\infty} t_k = \infty$
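A minimal sketch of the iteration in Python (not from the slides; the names `f`, `subgrad`, and `step_rule` are illustrative placeholders for user-supplied oracles). Since the method is not a descent method, the best iterate is tracked separately:

```python
import numpy as np

def subgradient_method(f, subgrad, x0, step_rule, num_iters=1000):
    """Subgradient method x(k) = x(k-1) - t_k g(k-1).

    f(x) returns the objective value, subgrad(x) returns any g in the
    subdifferential of f at x, and step_rule(k, g) returns the step size t_k.
    """
    x = np.asarray(x0, dtype=float)
    f_best, x_best = f(x), x.copy()
    for k in range(1, num_iters + 1):
        g = subgrad(x)
        x = x - step_rule(k, g) * g          # x(k) = x(k-1) - t_k g(k-1)
        if f(x) < f_best:                    # not a descent method, so keep
            f_best, x_best = f(x), x.copy()  # track of the best point so far
    return x_best, f_best

# the three step size rules above, as examples
fixed_step   = lambda k, g: 0.01                      # t_k constant
fixed_length = lambda k, g: 0.01 / np.linalg.norm(g)  # t_k ||g||_2 constant
diminishing  = lambda k, g: 0.01 / np.sqrt(k)         # t_k -> 0, sum t_k = inf
```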
5-2
Assumptions
• $f$ has finite optimal value $f^\star$ and a minimizer $x^\star$
• $f$ is convex, $\operatorname{dom} f = \mathbf{R}^n$
• $f$ is Lipschitz continuous with constant $G > 0$:

$$|f(x) - f(y)| \leq G \|x - y\|_2 \qquad \forall x, y$$

this is equivalent to $\|g\|_2 \leq G$ for all $x$ and $g \in \partial f(x)$ (see next page)
5-3
Proof.
• assume $\|g\|_2 \leq G$ for all subgradients; choose $g_y \in \partial f(y)$, $g_x \in \partial f(x)$:

$$g_x^T (x - y) \geq f(x) - f(y) \geq g_y^T (x - y)$$

by the Cauchy–Schwarz inequality,

$$G \|x - y\|_2 \geq f(x) - f(y) \geq -G \|x - y\|_2$$

• assume $\|g\|_2 > G$ for some $g \in \partial f(x)$; take $y = x + g / \|g\|_2$:

$$f(y) \geq f(x) + g^T (y - x) = f(x) + \|g\|_2 > f(x) + G$$

since $\|y - x\|_2 = 1$, this contradicts the Lipschitz inequality
5-4
Analysis
• the subgradient method is not a descent method
• the key quantity in the analysis is the distance to the optimal set
with $x^+ = x^{(i)}$, $x = x^{(i-1)}$, $g = g^{(i-1)}$, $t = t_i$:

$$\begin{aligned}
\|x^+ - x^\star\|_2^2 &= \|x - t g - x^\star\|_2^2 \\
&= \|x - x^\star\|_2^2 - 2 t g^T (x - x^\star) + t^2 \|g\|_2^2 \\
&\leq \|x - x^\star\|_2^2 - 2 t (f(x) - f^\star) + t^2 \|g\|_2^2
\end{aligned}$$

(the last step uses the subgradient inequality $f^\star \geq f(x) + g^T (x^\star - x)$)
combine the inequalities for $i = 1, \ldots, k$ (bounding each $f(x^{(i-1)}) - f^\star$ below by $f_{\mathrm{best}}^{(k)} - f^\star$), with $f_{\mathrm{best}}^{(k)} = \min_{0 \leq i < k} f(x^{(i)})$:

$$2 \Big( \sum_{i=1}^{k} t_i \Big) \big( f_{\mathrm{best}}^{(k)} - f^\star \big) \leq \|x^{(0)} - x^\star\|_2^2 - \|x^{(k)} - x^\star\|_2^2 + \sum_{i=1}^{k} t_i^2 \|g^{(i-1)}\|_2^2 \leq \|x^{(0)} - x^\star\|_2^2 + \sum_{i=1}^{k} t_i^2 \|g^{(i-1)}\|_2^2$$
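As a quick numerical sanity check of this combined inequality (a sketch; the test function $f(x) = \|x\|_1$, for which $x^\star = 0$ and $f^\star = 0$, and all data are made up for illustration):

```python
import numpy as np

# check: 2 (sum t_i)(f_best - f_star) <= ||x0 - x_star||^2 + sum t_i^2 ||g_i||^2
# on f(x) = ||x||_1, where x_star = 0 and f_star = 0
rng = np.random.default_rng(1)
x = rng.standard_normal(20)
x0 = x.copy()
f_best, sum_t, sum_t2g2 = np.linalg.norm(x, 1), 0.0, 0.0
for i in range(1, 101):
    g = np.sign(x)                    # a subgradient of the 1-norm at x
    t = 0.1 / i                       # a diminishing step size
    x = x - t * g
    f_best = min(f_best, np.linalg.norm(x, 1))
    sum_t += t
    sum_t2g2 += t**2 * (g @ g)
assert 2 * sum_t * f_best <= x0 @ x0 + sum_t2g2
```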
5-5
Fixed step size: $t_i = t$

$$f_{\mathrm{best}}^{(k)} - f^\star \leq \frac{\|x^{(0)} - x^\star\|_2^2}{2kt} + \frac{G^2 t}{2}$$

• does not guarantee convergence of $f_{\mathrm{best}}^{(k)}$
• for large $k$, $f_{\mathrm{best}}^{(k)}$ is approximately $G^2 t / 2$-suboptimal
Fixed step length: $t_i = s / \|g^{(i-1)}\|_2$

$$f_{\mathrm{best}}^{(k)} - f^\star \leq \frac{G \|x^{(0)} - x^\star\|_2^2}{2ks} + \frac{Gs}{2}$$

• does not guarantee convergence of $f_{\mathrm{best}}^{(k)}$
• for large $k$, $f_{\mathrm{best}}^{(k)}$ is approximately $Gs/2$-suboptimal
5-6
Diminishing step size: $t_i \to 0$, $\sum_{i=1}^{\infty} t_i = \infty$

$$f_{\mathrm{best}}^{(k)} - f^\star \leq \frac{\|x^{(0)} - x^\star\|_2^2 + G^2 \sum_{i=1}^{k} t_i^2}{2 \sum_{i=1}^{k} t_i}$$

can show that $\Big( \sum_{i=1}^{k} t_i^2 \Big) \Big/ \Big( \sum_{i=1}^{k} t_i \Big) \to 0$; hence, $f_{\mathrm{best}}^{(k)}$ converges to $f^\star$
5-7
Example: 1-norm minimization
minimize $\;\|Ax - b\|_1$

• subgradient is given by $A^T \operatorname{sign}(Ax - b)$
• example with $A \in \mathbf{R}^{500 \times 100}$, $b \in \mathbf{R}^{500}$
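A minimal sketch of this example in Python; the slides do not specify how $A$ and $b$ are generated, so the random data here is an assumption:

```python
import numpy as np

# hypothetical problem data; the slides only give the dimensions
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 100))
b = rng.standard_normal(500)

x = np.zeros(100)
f_best = np.linalg.norm(A @ x - b, 1)
s = 0.01                                 # fixed step length
for k in range(1, 3001):
    g = A.T @ np.sign(A @ x - b)         # subgradient of ||Ax - b||_1 at x
    x = x - (s / np.linalg.norm(g)) * g  # t_k = s / ||g(k-1)||_2
    f_best = min(f_best, np.linalg.norm(A @ x - b, 1))
```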
Fixed step length $t_k = s / \|g^{(k-1)}\|_2$ for $s = 0.1$, $0.01$, $0.001$:

[figure: two semilog plots of $(f(x^{(k)}) - f^\star)/f^\star$ and $(f_{\mathrm{best}}^{(k)} - f^\star)/f^\star$ versus $k$, one curve per step length $s$]
5-8
Diminishing step size: $t_k = 0.01/\sqrt{k}$ and $t_k = 0.01/k$

[figure: semilog plot of $(f_{\mathrm{best}}^{(k)} - f^\star)/f^\star$ versus $k$ for the two step size rules]
5-9
Optimal step size for fixed number of iterations
from page 5-5: if $s_i = t_i \|g^{(i-1)}\|_2$ and $\|x^{(0)} - x^\star\|_2 \leq R$:

$$f_{\mathrm{best}}^{(k)} - f^\star \leq \frac{R^2 + \sum_{i=1}^{k} s_i^2}{2 \sum_{i=1}^{k} s_i / G}$$

• for given $k$, the bound is minimized by the fixed step length $s_i = s = R/\sqrt{k}$
• the resulting bound after $k$ steps is

$$f_{\mathrm{best}}^{(k)} - f^\star \leq \frac{GR}{\sqrt{k}}$$

• guarantees accuracy $f_{\mathrm{best}}^{(k)} - f^\star \leq \epsilon$ in $k = O(1/\epsilon^2)$ iterations
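To verify the first bullet (a short calculation added here, not on the original slide): with $s_i = s$ the bound becomes

$$h(s) = \frac{G(R^2 + k s^2)}{2ks} = \frac{G}{2} \Big( \frac{R^2}{ks} + s \Big), \qquad h'(s) = \frac{G}{2} \Big( 1 - \frac{R^2}{k s^2} \Big) = 0 \;\Longrightarrow\; s = \frac{R}{\sqrt{k}},$$

and substituting back gives $h(R/\sqrt{k}) = GR/\sqrt{k}$.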
5-10
Optimal step size when $f^\star$ is known

• the right-hand side in the first inequality of page 5-5 is minimized by

$$t_i = \frac{f(x^{(i-1)}) - f^\star}{\|g^{(i-1)}\|_2^2}$$

• the optimized bound is

$$\frac{\big( f(x^{(i-1)}) - f^\star \big)^2}{\|g^{(i-1)}\|_2^2} \leq \|x^{(i-1)} - x^\star\|_2^2 - \|x^{(i)} - x^\star\|_2^2$$

• applying this recursively (with $\|x^{(0)} - x^\star\|_2 \leq R$ and $\|g^{(i)}\|_2 \leq G$) gives

$$f_{\mathrm{best}}^{(k)} - f^\star \leq \frac{GR}{\sqrt{k}}$$
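This rule (often called the Polyak step size) drops directly into the earlier sketch; below is a minimal illustration on a toy problem with known $f^\star$ (the toy problem and all names are made up for this example):

```python
import numpy as np

def polyak_subgradient(f, subgrad, x0, f_star, num_iters=500, tol=1e-12):
    """Subgradient method with the optimal step t_i = (f(x) - f_star) / ||g||^2."""
    x = np.asarray(x0, dtype=float)
    f_best = f(x)
    for _ in range(num_iters):
        fx = f(x)
        if fx - f_star <= tol:           # stop once (numerically) optimal,
            break                        # avoiding a zero division below
        g = subgrad(x)
        x = x - ((fx - f_star) / (g @ g)) * g
        f_best = min(f_best, f(x))
    return x, f_best

# toy example: f(x) = ||x||_1, with f_star = 0 attained at x = 0
f = lambda x: np.linalg.norm(x, 1)
subgrad = lambda x: np.sign(x)           # a subgradient of the 1-norm
x, f_best = polyak_subgradient(f, subgrad, np.ones(5), f_star=0.0)
```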
5-11
Exercise: find point in intersection of convex sets
find a point in the intersection of $m$ closed convex sets $C_1, \ldots, C_m$:

minimize $\;f(x) = \max\{f_1(x), \ldots, f_m(x)\}$

where $f_j(x) = \inf_{y \in C_j} \|x - y\|_2$ is the Euclidean distance of $x$ to $C_j$

• $f^\star = 0$ if the intersection is nonempty
• (from page 4-14): $g \in \partial f(\hat{x})$ if $g \in \partial f_j(\hat{x})$ and $C_j$ is the farthest set from $\hat{x}$
• (from page 4-20): a subgradient $g \in \partial f_j(\hat{x})$ follows from the projection $P_j(\hat{x})$ on $C_j$:

$$g = 0 \;\; \text{(if } \hat{x} \in C_j \text{)}, \qquad g = \frac{1}{\|\hat{x} - P_j(\hat{x})\|_2} \big( \hat{x} - P_j(\hat{x}) \big) \;\; \text{(if } \hat{x} \notin C_j \text{)}$$

note that $\|g\|_2 = 1$ if $\hat{x} \notin C_j$
5-12
Subgradient method
• the optimal step size (page 5-11) for $f^\star = 0$ and $\|g^{(i-1)}\|_2 = 1$ is $t_i = f(x^{(i-1)})$
• at iteration $k$, find the farthest set $C_j$ (with $f(x^{(k-1)}) = f_j(x^{(k-1)})$), and take

$$x^{(k)} = x^{(k-1)} - \frac{f(x^{(k-1)})}{f_j(x^{(k-1)})} \big( x^{(k-1)} - P_j(x^{(k-1)}) \big) = P_j(x^{(k-1)})$$

at each step, we project the current point onto the farthest set
• a version of the alternating projections algorithm
• for m = 2, projections alternate onto one set, then the other
• later, we will see faster versions of this that are almost as simple
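A minimal sketch for $m = 2$ with two sets whose projections are closed-form; the concrete sets here (a hyperplane and a box) are an illustrative choice, not from the slides:

```python
import numpy as np

# C1 = {x : a^T x = c} (hyperplane), C2 = {x : lo <= x <= hi} (box)
a, c = np.array([1.0, 2.0]), 3.0
lo, hi = np.zeros(2), np.ones(2)

def proj_hyperplane(x):
    return x - ((a @ x - c) / (a @ a)) * a

def proj_box(x):
    return np.clip(x, lo, hi)

projections = [proj_hyperplane, proj_box]
x = np.array([5.0, -2.0])
for _ in range(100):
    # subgradient step with t_i = f(x(i-1)): project onto the farthest set
    j = int(np.argmax([np.linalg.norm(x - P(x)) for P in projections]))
    x = projections[j](x)
```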
5-13
Optimality of the subgradient method
can the $f_{\mathrm{best}}^{(k)} - f^\star \leq GR/\sqrt{k}$ bound on page 5-10 be improved?
Problem class
• $f$ is convex, with a minimizer $x^\star$
• we know a starting point $x^{(0)}$ with $\|x^{(0)} - x^\star\|_2 \leq R$
• we know the Lipschitz constant $G$ of $f$ on $\{x \mid \|x - x^{(0)}\|_2 \leq R\}$
• $f$ is defined by an oracle: given $x$, the oracle returns $f(x)$ and a subgradient $g \in \partial f(x)$
Algorithm class: $k$ iterations of any method that chooses $x^{(i)}$ in

$$x^{(0)} + \operatorname{span}\{g^{(0)}, g^{(1)}, \ldots, g^{(i-1)}\}$$
5-14
Test problem and oracle
$$f(x) = \max_{i=1,\ldots,k} x_i + \frac{1}{2} \|x\|_2^2, \qquad x^{(0)} = 0$$

• solution: $x^\star = -\frac{1}{k} (\underbrace{1, \ldots, 1}_{k}, \underbrace{0, \ldots, 0}_{n-k})$ and $f^\star = -\frac{1}{2k}$
• $R = \|x^{(0)} - x^\star\|_2 = 1/\sqrt{k}$ and $G = 1 + 1/\sqrt{k}$
• oracle returns the subgradient $e_{\hat{\jmath}} + x$ where $\hat{\jmath} = \min\{j \mid x_j = \max_{i=1,\ldots,k} x_i\}$
Iteration: for $i = 0, \ldots, k - 1$, entries $x^{(i)}_{i+1}, \ldots, x^{(i)}_k$ are zero, so $f(x^{(i)}) \geq \max_{j \leq k} x^{(i)}_j \geq 0$; therefore

$$f_{\mathrm{best}}^{(k)} - f^\star = \min_{i < k} f(x^{(i)}) - f^\star \geq -f^\star = \frac{1}{2k} = \frac{GR}{2(1 + \sqrt{k})}$$

Conclusion: the $O(1/\sqrt{k})$ bound cannot be improved
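A minimal sketch of this test function and oracle in Python (the names are illustrative); it plugs directly into the subgradient-method sketch from page 5-2:

```python
import numpy as np

def make_test_problem(k):
    """f(x) = max_{i=1..k} x_i + (1/2)||x||_2^2 on R^n, n >= k."""
    def f(x):
        return np.max(x[:k]) + 0.5 * (x @ x)
    def subgrad(x):
        j = int(np.argmax(x[:k]))  # argmax returns the smallest maximizing index
        g = x.copy()
        g[j] += 1.0                # oracle returns e_j + x
        return g
    return f, subgrad

f, subgrad = make_test_problem(k=50)
x0 = np.zeros(100)                 # x(0) = 0 with n = 100 >= k; after k steps no
                                   # such method beats GR / (2 (1 + sqrt(k)))
```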
5-15
Summary: subgradient method
• handles general nondifferentiable convex problem
• often leads to very simple algorithms
• convergence can be very slow
• no good stopping criterion
• theoretical complexity: $O(1/\epsilon^2)$ iterations to find an $\epsilon$-suboptimal point
• an 'optimal' first-order method: the $O(1/\epsilon^2)$ bound cannot be improved
5-16
References
• S. Boyd, lecture notes and slides for EE364b, Convex Optimization II
• Yu. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course (2004); §3.2.1 contains the example on page 5-15 of this lecture
5-17