Structured Alignment Methods in Machine Learning

Affine Invariant Linear Convergence Analysis for Frank-Wolfe Algorithms
Simon Lacoste-Julien
Martin Jaggi
NIPS FW Workshop – Dec. 10th 2013
slow convergence of Frank-Wolfe...
[Figure: iterates of standard FW compared with FW with away steps; one away step is marked]
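The contrast between standard FW and FW with away steps can be sketched on a toy problem. This is a minimal illustration, not the code behind the slides: the objective f(x) = 0.5 * ||x - b||^2, the probability simplex as domain, the starting point, and the helper name `frank_wolfe_simplex` are all assumptions chosen so that the optimum lies on the boundary of the domain.

```python
import numpy as np

def frank_wolfe_simplex(b, x0, steps, away=False):
    """Minimise f(x) = 0.5 * ||x - b||^2 over the probability simplex.

    On the simplex the vertices are the unit vectors e_i, so the entries of x
    are themselves the convex-combination weights of the active vertices.
    """
    x = np.array(x0, dtype=float)
    gaps = []
    for _ in range(steps):
        grad = x - b
        s = int(np.argmin(grad))            # FW vertex from the linear oracle
        d_fw = -x.copy()
        d_fw[s] += 1.0                      # "towards" direction e_s - x
        gap = float(-grad @ d_fw)           # FW duality gap
        gaps.append(gap)
        if gap <= 1e-12:
            break
        direction, gamma_max, dropped = d_fw, 1.0, None
        if away:
            active = np.flatnonzero(x > 0)
            v = int(active[np.argmax(grad[active])])   # worst active vertex
            d_away = x.copy()
            d_away[v] -= 1.0                # "away" direction x - e_v
            if float(-grad @ d_away) > gap: # away direction descends faster
                direction = d_away
                gamma_max = x[v] / (1.0 - x[v])
                dropped = v
        # Exact line search for this quadratic (Hessian = identity), clipped.
        gamma = min(gamma_max,
                    float(-grad @ direction) / float(direction @ direction))
        x = x + gamma * direction
        if dropped is not None and gamma >= gamma_max:
            x[dropped] = 0.0                # drop step: vertex leaves active set
    return x, gaps

# Assumed toy instance: the optimum (0.5, 0.5, 0) lies on a face of the simplex.
b = np.array([0.5, 0.5, -0.5])
x0 = np.full(3, 1.0 / 3.0)
x_fw, gaps_fw = frank_wolfe_simplex(b, x0, steps=50, away=False)
x_aw, gaps_aw = frank_wolfe_simplex(b, x0, steps=50, away=True)
print("standard FW final gap:", gaps_fw[-1])
print("away-step FW final gap:", gaps_aw[-1])
```

In this instance standard FW can only shrink the weight on the unwanted vertex multiplicatively, while the away step removes it outright.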
Previous convergence results
assumption: f is strongly convex
(with Lipschitz gradient)
[Wolfe 70, Guélat & Marcotte 86]:
• Frank-Wolfe algorithm converges linearly if solution x* is
in relative interior of D
• Frank-Wolfe with away steps converges linearly with a
constant depending on the distance between x* and the
boundary of D in the optimal face containing x*
Problems:
• constant could be arbitrarily close to zero -> not a true linear
convergence result
• constant depends on unknown x*
• analysis is not affine invariant
(FW alg. is invariant to affine transformations of variables)
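The invariance of the algorithm itself can be checked numerically. The sketch below is an illustration under assumed data: the triangle `V`, the target `b`, the matrix `M`, and the helper `fw_quadratic` are hypothetical choices, not taken from the slides. It runs standard FW with exact line search on f(x) = 0.5 * ||x - b||^2 over D, then on the reparametrised problem g(y) = f(M y) over M^{-1} D, and compares the iterates after mapping back.

```python
import numpy as np

def fw_quadratic(A, b, vertices, steps):
    """Standard FW with exact line search for f(x) = 0.5 * ||A x - b||^2.

    `vertices` holds the vertices of the polytope D as rows.
    """
    x = vertices[0].astype(float)
    iterates = [x.copy()]
    for _ in range(steps):
        grad = A.T @ (A @ x - b)
        s = vertices[np.argmin(vertices @ grad)]   # linear minimisation oracle
        d = s - x
        curv = float((A @ d) @ (A @ d))            # curvature along d
        if curv == 0.0:
            break
        gamma = min(1.0, max(0.0, float(-grad @ d) / curv))
        x = x + gamma * d
        iterates.append(x.copy())
    return np.array(iterates)

V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # triangle D (assumed)
b = np.array([0.9, 0.8])                            # target outside D (assumed)
M = np.array([[2.0, 1.0], [0.0, 0.5]])              # invertible map, x = M y

xs = fw_quadratic(np.eye(2), b, V, steps=10)

# Same problem in y-coordinates: g(y) = f(M y) over the polytope M^{-1} D.
V_y = np.linalg.solve(M, V.T).T
ys = fw_quadratic(M, b, V_y, steps=10)

# Mapping the y-iterates back with M should reproduce the x-iterates.
print(np.max(np.abs(ys @ M.T - xs)))
```

Both runs pick the same vertices and step sizes because the oracle scores and the line search depend only on quantities that the change of variables preserves.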

Our contribution:

we give an affine invariant analysis of the
global linear convergence of
Frank-Wolfe with away steps with constant
bounded away from zero:
thm:
where:
geometric strong convexity constant (new!)
curvature constant
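For reference, the curvature constant commonly used in affine invariant Frank-Wolfe analyses is written out below. This is the standard definition for the classical algorithm; the formula on the slide was not recoverable, and the constant used for the away-step variant may differ, so treat this as background rather than the exact quantity in the theorem.

```latex
C_f := \sup_{\substack{x,\, s \in D,\ \gamma \in [0,1] \\ y = x + \gamma (s - x)}}
       \frac{2}{\gamma^2} \Big( f(y) - f(x) - \langle y - x, \nabla f(x) \rangle \Big)
```

The definition involves only function values and inner products of gradients with differences of points in D, which is why it does not change under affine transformations of the variables. When the gradient of f is L-Lipschitz, C_f is at most L times the squared diameter of D.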
‘Condition number’ of domain!
[Figure labels: ‘width’ and diameter of D; the bound combines the condition number of f with the ‘eccentricity’ of D]
eccentricity in dimension d:
• probability simplex:
• unit cube:
Summary
(for Frank-Wolfe with away steps with line-search)

• Provide first truly global linear convergence rate for a
Frank-Wolfe type algorithm which doesn’t need to
compute any constants (in contrast to [Garber & Hazan 13])
• The analysis is affine invariant
• The constant can be bounded using the condition number of f
and a purely geometric quantity, the ‘eccentricity’ of D
constants...
[Figure labels: towards vertex, away vertex]
Bounding
theorem:
Convergence proof sketch: