An introduction to complexity analysis for nonconvex optimization
Philippe Toint (with Coralia Cartis and Nick Gould)
FUNDP – University of Namur, Belgium
Séminaire Résidentiel Interdisciplinaire, Saint Hubert, January 2011
The problem and how to characterize a solution
The problem
We consider the unconstrained nonlinear programming problem:

    minimize f(x)

for x ∈ ℝⁿ and f : ℝⁿ → ℝ smooth.

Important special case: the nonlinear least-squares problem

    minimize f(x) = ½ ‖F(x)‖²

for x ∈ ℝⁿ and F : ℝⁿ → ℝᵐ smooth.
Philippe Toint (NAXYS)
Complexity of nonconvex optimization
St Hubert, January 2011
2 / 20
A typical application of nonlinear least-squares
Consider a (physical, chemical, biological, . . . ) process evolving over time:
y = P(t)
and a parametrized model for this process
y = M(t, x)
for which observations {yᵢ ≈ P(tᵢ)}, i = 1, . . . , nobs, are known.
How to choose x, the model parameters? Often:

    x* = arg minₓ ½ Σᵢ₌₁ⁿᵒᵇˢ ‖yᵢ − M(tᵢ, x)‖²₂

Examples in sciences, engineering, economy, medicine, psychology, . . .
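To make the setup concrete, here is a minimal sketch (not from the slides) that fits a hypothetical exponential-decay model M(t, x) = x₀ e^{−x₁ t} to synthetic observations with a plain Gauss-Newton loop; the model, data, and starting point are all illustrative assumptions.

```python
import numpy as np

# Hypothetical parametrized model M(t, x) = x0 * exp(-x1 * t)
# (an illustrative choice; any smooth model fits the framework).
def model(t, x):
    return x[0] * np.exp(-x[1] * t)

def gauss_newton(t, y, x, iters=100):
    """Minimize f(x) = 0.5 * sum_i ||y_i - M(t_i, x)||^2 by Gauss-Newton."""
    for _ in range(iters):
        r = y - model(t, x)                      # residuals y_i - M(t_i, x)
        # Jacobian of the residuals w.r.t. x (analytic for this model)
        J = np.column_stack([-np.exp(-x[1] * t),
                             x[0] * t * np.exp(-x[1] * t)])
        # linearized least-squares subproblem: find s with J s ≈ -r
        s, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x = x + s
    return x

# Synthetic "observations" y_i = P(t_i) generated from known true parameters.
t_obs = np.linspace(0.0, 2.0, 30)
x_true = np.array([2.0, 1.5])
y_obs = model(t_obs, x_true)
x_fit = gauss_newton(t_obs, y_obs, np.array([1.5, 1.2]))
```

Since the synthetic data are noise-free, the fitted parameters recover the true ones; with noisy data the same loop returns the least-squares estimate instead.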
Unconstrained optimization algorithms
More generally, how to find

    x* = arg minₓ f(x)

(assuming the problem is well-defined)?
Typically, generate a sequence of iterates {xₖ}ₖ₌₀^∞ such that

    {f(xₖ)}ₖ₌₀^∞ is decreasing

and “hope” that, for some solution x*, {xₖ}ₖ₌₀^∞ → x*!
How to generate the iterates?
A (good?) sequence of iterates {xₖ}ₖ₌₀^∞ is generated by unconstrained optimization algorithms:
  • search methods (no derivatives of f used)
  • gradient methods (steepest descent)
  • Newton’s method and its variants ensuring global convergence:
      – trust-region methods
      – cubic regularization methods
      – (linesearch methods)
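As a sketch of the simplest of these methods (an illustration, not the slides' algorithm), a fixed-step steepest-descent loop produces exactly such a decreasing sequence {f(xₖ)}; the fixed step size is an assumption made here for brevity, where practical codes would use a linesearch or trust region to enforce decrease.

```python
import numpy as np

def steepest_descent(grad, x0, step=0.1, eps_g=1e-6, max_iter=10_000):
    """Iterate x_{k+1} = x_k - step * grad(x_k) until ||grad(x_k)|| <= eps_g."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps_g:   # first-order stopping test
            return x, k
        x = x - step * g                 # move along the negative gradient
    return x, max_iter

# Usage: minimize f(x) = 0.5 * ||x||^2, whose gradient is simply x.
x_star, iters = steepest_descent(lambda x: x, x0=[4.0, -3.0])
```

On this quadratic the iterates contract by a factor 0.9 per step, so the gradient tolerance is reached in roughly 150 iterations.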
How is an (approximate) solution recognized?
Stop the algorithm as soon as
  • the slope of f is (approximately) zero, i.e.

        ‖∇ₓ f(xₖ)‖ ≤ εg        (first-order optimality)

  • the curvature of f is (approximately) non-negative, i.e.

        λmin[∇ₓₓ f(xₖ)] ≥ −εH   (2nd-order optimality)

for some (small) εg > 0 and εH > 0.
THE COMPLEXITY QUESTION:
How many iterations are needed at most?
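The two stopping tests above can be sketched directly in code (a minimal illustration; `grad` and `hess` are assumed callables returning the gradient vector and symmetric Hessian matrix at x):

```python
import numpy as np

def is_approx_second_order_point(grad, hess, x, eps_g=1e-6, eps_H=1e-4):
    """Check ||grad f(x)|| <= eps_g and lambda_min[hess f(x)] >= -eps_H."""
    first_order = np.linalg.norm(grad(x)) <= eps_g
    # smallest eigenvalue of the symmetric Hessian
    second_order = np.linalg.eigvalsh(hess(x)).min() >= -eps_H
    return first_order and second_order
```

For f(x) = x₀² + x₁² the origin passes both tests, while for the saddle f(x) = x₀² − x₁² the origin passes the first test but fails the curvature test.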
Complexity for achieving approximate first-order optimality
The complexity question
THE COMPLEXITY QUESTION:
How many iterations are needed at most?
The answer
  • needs assumptions on the smoothness of f [unspecified here]
  • (convex) vs. NONCONVEX
  • strongly depends on the algorithm:
      – the model of f being used (linear/quadratic/cubic)
      – the model minimization (global vs. local)
      – the cost of an iteration
  • is typically very pessimistic
(usually quite tricky and technical. . . )
A first approach to first-order optimality
Consider achieving (approximate) first-order optimality:

SURPRISE nr 1: a bound exists!
(and is independent of problem dimension)

    Gradient methods            O(1/εg²)    Nesterov
    First-order trust-region    O(1/εg²)    Gratton, Sartenaer and T.
How to prove such results?

1. Prove that, at “successful iterations” (j ∈ S),

       f(xⱼ) − f(xⱼ₊₁) ≥ κr ‖∇ₓ f(xⱼ₊₁)‖^α .

2. Assume ‖∇ₓ f(xⱼ)‖ ≥ εg for all j = 0, . . . , k.
   Then, for nS(k) = |S ∩ {1, . . . , k}|,

       nS(k) εg^α ≤ Σ_{j=0, j∈S}^{k} ‖∇ₓ f(xⱼ₊₁)‖^α
                  ≤ (1/κr) Σ_{j=0, j∈S}^{k} [f(xⱼ) − f(xⱼ₊₁)]
                  ≤ [f(x₀) − f(xₖ₊₁)]/κr ≤ [f(x₀) − f*]/κr

   and thus

       nS(k) ≤ [f(x₀) − f*] / (κr εg^α).

3. Prove that k ≤ κs nS(k).

For steepest descent and first-order trust-region methods, α = 2, which yields the O(1/εg²) bound.
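The counting argument can be watched numerically. The sketch below (an illustration under assumed constants, not from the slides) runs gradient descent with step 1/L on a convex quadratic, where the standard inequality f(xⱼ) − f(xⱼ₊₁) ≥ ‖∇f(xⱼ)‖²/(2L) supplies the decrease bound of step 1 with α = 2 and κr = 1/(2L), so at most (f(x₀) − f*)·2L/εg² iterations can have ‖∇f(xⱼ)‖ ≥ εg:

```python
import numpy as np

A = np.diag([1.0, 10.0])             # f(x) = 0.5 x'Ax, with f_* = 0
L = np.linalg.eigvalsh(A).max()      # Lipschitz constant of the gradient
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

eps_g = 1e-3
x0 = np.array([1.0, 1.0])
x, iters = x0, 0
while np.linalg.norm(grad(x)) > eps_g:
    x_new = x - grad(x) / L
    # the per-iteration decrease bound used in step 1 of the proof
    assert f(x) - f(x_new) >= np.linalg.norm(grad(x))**2 / (2 * L) - 1e-12
    x, iters = x_new, iters + 1

# the complexity bound of step 2: iterations with ||grad|| >= eps_g
bound = (f(x0) - 0.0) * 2 * L / eps_g**2
```

Here every iteration is "successful", so the observed iteration count sits (far) below the worst-case bound, illustrating how pessimistic the bound typically is.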
More on first-order optimality (1)
SURPRISE nr 2:
a better bound exists for (cubic) regularization methods

    With global model min    O(1/εg^{3/2})    Nesterov
    With local model min     O(1/εg^{3/2})    Cartis, Gould and T.
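A sketch of what one (cubic) regularization step computes. Assuming the standard characterization of the cubic model's global minimizer from the Nesterov–Polyak / adaptive-regularization literature (an assumption of this sketch, not a specific slide): the step s minimizing m(s) = f + g's + ½ s'Hs + (σ/3)‖s‖³ satisfies (H + λI)s = −g with λ = σ‖s‖ and H + λI positive semidefinite, and λ can be found by bisection on the increasing scalar function φ(λ) = λ − σ‖s(λ)‖:

```python
import numpy as np

def cubic_reg_step(g, H, sigma, tol=1e-10):
    """Global minimizer of g's + 0.5 s'Hs + (sigma/3)||s||^3 (sigma > 0)."""
    g, H = np.asarray(g, float), np.asarray(H, float)
    lam_min = np.linalg.eigvalsh(H).min()
    lo = max(0.0, -lam_min) + 1e-12            # keep H + lam*I invertible
    step = lambda lam: np.linalg.solve(H + lam * np.eye(len(g)), -g)
    hi = lo + 1.0
    while hi - sigma * np.linalg.norm(step(hi)) < 0:   # grow the bracket
        hi *= 2.0
    while hi - lo > tol:                        # bisection on phi(lam)
        mid = 0.5 * (lo + hi)
        if mid - sigma * np.linalg.norm(step(mid)) < 0:
            lo = mid
        else:
            hi = mid
    return step(hi)
```

In one dimension with g = 1, H = 1, σ = 1 the optimality condition (1 + |s|)s = −1 gives s = −(√5 − 1)/2, which the bisection recovers.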
More on first-order optimality (2)
MOREOVER: the better bound (for cubic regularization) is
  • sharp
  • optimal for 2nd-order methods
Explicit counter-example built by Hermite interpolation
(Cartis, Gould and T.)

[Figure: objective values of the counter-example (nearly flat, ≈ 2.222 × 10⁴) over iterations 0–9]
More on first-order optimality (3)
IN ADDITION:
the not-so-good bound for steepest descent is also sharp
Another explicit counter-example built by Hermite interpolation
(Cartis, Gould and T.)

[Figure: objective values of the steepest-descent counter-example (nearly flat, ≈ 2.5 × 10⁴) over iterations 0–7]
And then...
SURPRISE nr 3:
Newton’s method may need as many iterations
as steepest descent (in its worst case)!!!
[Figure: contour plot of the counter-example objective (level values ≈ 2.5 × 10⁴)]
⇒ Second-order information useless in the worst case!
Other results
We can also prove that
  • the (better) bound for cubic regularization extends to
      – methods using finite-difference gradients
      – derivative-free methods
        (but now depends also on dimension)
  • the boundedness of level sets has no impact on the complexity bound
  • the not-so-good bound for steepest descent extends to composite
    non-smooth functions
  • much better results hold for the convex case
  • also on special function classes (gradient dominated, . . . )
Complexity for achieving approximate 2nd-order optimality
Finding weak unconstrained minimizers
We are now interested in finding xₖ such that

    ‖∇ₓ f(xₖ)‖ ≤ εg   and   λmin[∇ₓₓ f(xₖ)] ≥ −εH

(needs second-order information)

For the cubic regularization:

    With global model min    O(1/εg³)    Nesterov
    With local model min     O(1/εg³)    Cartis, Gould and T.
Finding weak unconstrained minimizers (2)
But also:

                       Trust-region         Cubic reg.
    εH ≤ εg            O(εH⁻³)  sharp       O(εg^{−3/2})  sharp
    εg < εH < √εg      O(εH⁻³)  sharp       O(εH^{−{3,5}})
    √εg ≤ εH           O(εH⁻³)  sharp       O(εg^{−[2,5/2]})  “sharp”

Practically sensible: εH ≈ √εg
Constrained problems
Constrained optimization
Consider the constrained nonlinear programming problem:

    minimize f(x)  subject to  x ∈ F

for x ∈ ℝⁿ and f : ℝⁿ → ℝ smooth, and where F is convex.
Typical: bounds on the variables.
Main ideas:
  • exploit (cheap) projections on convex sets
  • prove global convergence + function-evaluation complexity
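The "cheap projections" idea is easiest to see for the typical case of bounds on the variables, where projecting onto F is just a componentwise clip; a minimal sketch (the projected-gradient step shown is an illustrative use, not the slides' algorithm):

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto F = {x : lo <= x <= hi}: a componentwise clip."""
    return np.minimum(np.maximum(np.asarray(x, float), lo), hi)

# One projected-gradient step x+ = P_F(x - t * grad f(x)), staying feasible.
def projected_gradient_step(x, grad, t, lo, hi):
    return project_box(x - t * grad(x), lo, hi)
```

Because the projection costs only O(n), feasibility is maintained essentially for free at every iteration, which is what makes projection-based methods attractive for convexly constrained problems.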
A cubic regularization algorithm for the constrained case
For projection variants to achieve (approximate) first-order optimality:

SURPRISE nr 4:
The same bounds hold as for the unconstrained case!!!

    First-order cubic regularization    O(1/εg²)
    2nd-order cubic regularization      O(1/εg^{3/2})
                                        Cartis, Gould and T.

⇒ Convex constraints irrelevant for first-order complexity!
Finally. . .
Conclusions and perspectives
  • strong position of the cubic regularization approach
  • worst-case analysis not irrelevant for algorithm design
  • challenging emerging area with many open questions

Many thanks for your attention!
Further reading
C. Cartis, N. I. M. Gould and Ph. L. Toint,
An adaptive cubic regularisation algorithm for nonconvex optimization with convex constraints and its
function-evaluation complexity,
IMA Journal of Numerical Analysis, submitted, 2011.
C. Cartis, N. I. M. Gould and Ph. L. Toint,
On the complexity of steepest descent, Newton’s and regularized Newton’s methods for nonconvex unconstrained
optimization,
SIAM Journal on Optimization, vol. 20(6), pp. 2833–2852, 2010.
C. Cartis, N. I. M. Gould and Ph. L. Toint,
Adaptive cubic overestimation methods for unconstrained optimization. Part II: worst-case function-evaluation
complexity,
Mathematical Programming A, to appear, 2011.
C. Cartis, N. I. M. Gould and Ph. L. Toint,
On the oracle complexity of first-order and derivative-free algorithms for smooth nonconvex minimization,
Rapport NAXYS-03-2010, 2010.
C. Cartis, N. I. M. Gould and Ph. L. Toint,
Complexity bounds for second-order optimality in unconstrained optimization,
Rapport NAXYS-11-2010, 2010.
C. Cartis, N. I. M. Gould and Ph. L. Toint,
Evaluation complexity of adaptive cubic regularization methods for convex unconstrained optimization,
Rapport NAXYS-05-2010, 2010.
S. Gratton, A. Sartenaer and Ph. L. Toint,
Recursive Trust-Region Methods for Multiscale Nonlinear Optimization,
SIAM Journal on Optimization, vol. 19(1), pp. 414–444, 2008.
Yu. Nesterov and B. T. Polyak,
Cubic regularization of Newton method and its global performance,
Mathematical Programming A, vol. 108(1), pp. 177–205, 2006.