Affine Invariant Linear Convergence Analysis for Frank-Wolfe Algorithms
Simon Lacoste-Julien, Martin Jaggi
NIPS FW Workshop – Dec. 10th 2013

Slow convergence of Frank-Wolfe...
[Figure: iterates of standard FW compared with FW with away steps; one away step is marked]
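The contrast between standard FW and FW with away steps can be tried out on a toy problem. The following is a minimal sketch, not taken from the slides: it assumes the objective f(x) = 0.5 * ||x - b||^2 over the probability simplex, with a target b chosen so that the solution x* = (0.5, 0.5, 0) lies on the boundary of D. The function names, the tolerance, and the choice of b are all illustrative assumptions. On the simplex the vertices are the unit vectors and the weight of vertex i is just x[i], which keeps the away step short.

```python
def grad(x, b):
    # Gradient of the assumed toy objective f(x) = 0.5 * ||x - b||^2.
    return [xi - bi for xi, bi in zip(x, b)]


def objective(x, b):
    return 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))


def frank_wolfe(b, x0, away_steps, max_iter=1000, tol=1e-10):
    """FW over the probability simplex with exact line search.

    Returns the final iterate and the number of iterations used.
    """
    x = list(x0)
    n = len(x)
    for k in range(max_iter):
        g = grad(x, b)
        gx = sum(gi * xi for gi, xi in zip(g, x))
        s = min(range(n), key=lambda i: g[i])  # FW vertex e_s
        gap_fw = gx - g[s]                     # <-g, e_s - x>
        if gap_fw <= tol:
            return x, k
        use_away = False
        if away_steps:
            # Active vertices are the coordinates with positive weight.
            active = [i for i in range(n) if x[i] > 0.0]
            v = max(active, key=lambda i: g[i])  # away vertex e_v
            gap_away = g[v] - gx                 # <-g, x - e_v>
            use_away = gap_away > gap_fw
        if use_away:
            # Move away from e_v; the weight of e_v may not go below zero.
            d = [xi - (1.0 if i == v else 0.0) for i, xi in enumerate(x)]
            gamma_max = x[v] / (1.0 - x[v])
            slope = gap_away
        else:
            # Move towards e_s.
            d = [(1.0 if i == s else 0.0) - xi for i, xi in enumerate(x)]
            gamma_max = 1.0
            slope = gap_fw
        # Exact line search for this quadratic: gamma = <-g, d> / ||d||^2.
        gamma = min(gamma_max, slope / sum(di * di for di in d))
        x = [xi + gamma * di for xi, di in zip(x, d)]
        if use_away and gamma == gamma_max:
            x[v] = 0.0  # drop step: remove e_v from the active set exactly
    return x, max_iter


# Illustrative instance: x* = (0.5, 0.5, 0) is on the boundary of the simplex.
b = [0.5, 0.5, -0.2]
x0 = [0.0, 0.0, 1.0]
x_fw, k_fw = frank_wolfe(b, x0, away_steps=False)
x_afw, k_afw = frank_wolfe(b, x0, away_steps=True)
print("standard FW:   iterations =", k_fw, " f =", objective(x_fw, b))
print("FW with away:  iterations =", k_afw, " f =", objective(x_afw, b))
```

In this instance standard FW can only shrink the weight on the third vertex by a factor (1 - gamma) per step, and gamma tends to zero as the iterates zig-zag towards x*, whereas a single away step removes that vertex from the active set.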
Previous convergence results
assumption: f is strongly convex (with Lipschitz gradient)
[Wolfe 70, Guélat & Marcotte 86]:
• Frank-Wolfe algorithm converges linearly if solution x* is in relative interior of D
• Frank-Wolfe with away steps converges linearly with a constant depending on the distance between x* and the boundary of D in the optimal face containing x*
Problems:
• constant could be arbitrarily close to zero -> not a true linear convergence result
• constant depends on unknown x*
• analysis is not affine invariant (FW alg. is invariant to affine transformations of variables)

Our contribution:
• we give an affine invariant analysis of the global linear convergence of Frank-Wolfe with away steps with constant bounded away from zero:
thm: [theorem formula not recoverable from the extracted slide] where:
• curvature constant
• geometric strong convexity constant (new!)

‘Condition number’ of domain!
[Figure: the domain D with its diameter and its ‘width’ marked]
Quantities in the bound:
• diameter and ‘width’ of D
• condition number of f
• ‘eccentricity’ of D
eccentricity in dimension d:
• probability simplex:
• unit cube:

Summary (for Frank-Wolfe with away steps with line-search)
• first truly global linear convergence rate for a Frank-Wolfe type algorithm that doesn’t need to compute any constants (in contrast to [Garber & Hazan 13])
• analysis is affine invariant
• can bound constant with condition number and purely geometric quantity ‘eccentricity’

constants...
[Figure labels: towards vertex, away vertex]

Bounding theorem:

Convergence proof sketch: