Introduction to optimization methods and line search
Jussi Hakanen
Post-doctoral researcher
[email protected]
spring 2014
TIES483 Nonlinear optimization
How to find optimal solutions?
Trial and error → widely used in practice, but it is not efficient and good solutions are easily missed
Better to use a systematic way to find an optimal solution
Typically we know only
– function value(s) at the current trial point
– possibly gradients at the current trial point
How can we know which solution is
optimal?
How can we find optimal solutions?
Optimality conditions
How can we know that a solution is optimal?
One way is to utilize optimality conditions
Necessary optimality conditions = conditions that an optimal solution has to satisfy (satisfying them does not guarantee optimality)
Sufficient optimality conditions = conditions
that guarantee optimality when satisfied
First order conditions (involve first order derivatives) and second order conditions (involve second order derivatives)
Global vs. local minimizers
A solution x* ∈ S is a global minimizer if f(x*) ≤ f(x) for all x ∈ S
A solution x* ∈ S is a local minimizer if there exists an ε > 0 s.t. f(x*) ≤ f(x) for all x ∈ S with ||x − x*|| < ε
Convexity: for a convex problem, a local minimizer is also a global minimizer
Global minimizers are preferred, but local minimizers are usually easier to identify
Solving an optimization problem
Find optimal values x* for the variables
Some problems can be solved analytically
– e.g. min x² when x ≥ 3 → x* = 3
Usually impossible to solve analytically
Must be solved numerically β†’
approximation of the solution
– In mathematical optimization a starting point is
iteratively improved
Numerical solution
Modelling β†’ mathematical model of the
problem
Numerical methods β†’ numerical simulation
model for the mathematical model
Optimization method β†’ solve the problem
utilizing the numerical simulation model
So: modelling → simulation → optimization
Optimization method
Algorithm: a mathematical description
1. Choose a stopping parameter ε > 0, a starting point x^1 and a symmetric positive definite n×n matrix D_1 (e.g. D_1 = I). Set y^1 = x^1 and h = j = 1.
2. If ||∇f(y^j)|| < ε, stop. Otherwise, set d^j = −D_j ∇f(y^j). Let λ_j be a solution of min f(y^j + λd^j), s.t. λ ≥ 0. Set y^{j+1} = y^j + λ_j d^j. If j = n, set y^1 = x^{h+1} = y^{n+1}, h = h + 1, j = 1 and repeat (2).
3. Compute D_{j+1}. Set j = j + 1 and go to (2).
Method: an algorithm together with the numerical techniques needed to carry out its steps
Software: a method implemented as a
computer programme
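To illustrate the software level, here is a minimal Python sketch of the algorithm above. The slide does not specify the update rule for D_{j+1} or a concrete line search, so this sketch (with illustrative names) keeps D as the identity and uses a crude backtracking step; a real quasi-Newton method would update D in step 3 (e.g. with a DFP or BFGS formula).

import numpy as np

def quasi_newton_sketch(f, grad, x1, eps=1e-6, max_outer=100):
    # Rough sketch of the algorithm on this slide.
    # Assumptions not given on the slide: D stays the identity (step 3 is a
    # placeholder) and the exact minimization over lambda is replaced by a
    # simple backtracking search.
    n = len(x1)
    x = np.asarray(x1, dtype=float)
    for _ in range(max_outer):            # outer loop over h
        y = x.copy()
        D = np.eye(n)                     # step 1: D_1 = I
        for j in range(n):                # inner loop over j = 1, ..., n
            g = grad(y)
            if np.linalg.norm(g) < eps:   # step 2: stopping test
                return y
            d = -D @ g                    # search direction
            lam, best, t = 0.0, f(y), 1.0
            while t > 1e-12:              # crude backtracking line search
                if f(y + t * d) < best:
                    lam, best = t, f(y + t * d)
                    break
                t *= 0.5
            y = y + lam * d               # new inner iterate
            # step 3 would update D here (e.g. a DFP/BFGS formula)
        x = y                             # x^{h+1} = y^{n+1}
    return x

# e.g. quasi_newton_sketch(lambda x: float((x**2).sum()), lambda x: 2*x, [3.0, -2.0])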
Structure of optimization methods
Typically
– Constraint handling
converts the problem to (a
series of) unconstrained
problems
– In unconstrained
optimization a search
direction is determined at
each iteration
– The best solution in the
search direction is found
with line search
[Diagram: constraint handling method → unconstrained optimization → line search]
Local optimization methods
Find a (closest) local optimum
Fast
Usually utilize derivatives
Mathematical convergence guarantees
For example
– Direct search methods (pattern search, Hooke
& Jeeves, Nelder & Mead, …)
– Gradient based methods (steepest descent,
Newton's method, quasi-Newton method,
conjugate gradient, SQP, interior point
methods…)
Global optimization methods
Try to get as close to global optimum as
possible
No mathematical convergence guarantees
Do not assume much about the problem
Slow, require many function evaluations
Heuristic, contain randomness
Most well known are nature-inspired
methods (TIES451 Selected topics in soft computing)
– based on improving a population of solutions at
a time instead of a single solution
Hybrid methods
Combination of global and local methods
Try to combine the benefits of both
– rough estimate with a global method, fine tune
with a local method
Challenge: how should the methods be combined?
– e.g. when to switch from global to local? (speed
vs. accuracy)
Line search
What did you find out
about line search?
Line search
The idea of line search is to optimize a given
function with respect to a single variable
Optimization algorithms for multivariable problems iteratively generate search directions along which better solutions are sought
– Line search is used to find the better solutions along these directions!
An exact minimum is not required, only an approximation of it within a given tolerance ε > 0
– it is enough to know that x* ∈ [a*, b*] where b* − a* < ε
Optimality conditions
Necessary: Let f: R → R be differentiable. If x* is a local minimizer, then f'(x*) = 0. In addition, if f is twice continuously differentiable and x* is a local minimizer, then f''(x*) ≥ 0.
Sufficient: Let f: R → R be twice continuously differentiable. If f'(x*) = 0 and f''(x*) > 0, then x* is a strict local minimizer.
Examples
𝑓 π‘₯ = (π‘₯ βˆ’ 2)2 βˆ’4
𝑓 β€² π‘₯ = 2π‘₯ βˆ’ 4
𝑓 β€²β€² π‘₯ = 2
If π‘₯ βˆ— = 2, then both the
necessary and sufficient
optimality conditions are
satisfied
𝑓 π‘₯ = (π‘₯ βˆ’ 2)2 βˆ’4
Examples
𝑓 π‘₯ = (π‘₯ βˆ’ 2)3 βˆ’4
𝑓′ π‘₯ = 3 π‘₯ βˆ’ 2 2
𝑓 β€²β€² π‘₯ = 6π‘₯ βˆ’ 12
If π‘₯ βˆ— = 2, then the necessary
optimality conditions are
satisfied although π‘₯ βˆ— = 2 is
not a local minimizer
– It is a saddle point
Sufficient optimality
conditions are not satisfied
in π‘₯ βˆ— = 2
𝑓 π‘₯ = (π‘₯ βˆ’ 2)3 βˆ’4
Note on optimality conditions
If f is not differentiable, then a local minimizer can be at a point where f is
1) not differentiable or
2) discontinuous
𝑓 π‘₯ = π‘₯
Finding a unimodal interval
Most line search methods assume that the search is started from a unimodal interval [a, b]
f is unimodal in [a, b] if there is exactly one x* ∈ [a, b] s.t. for all x^1, x^2 ∈ [a, b] with x^1 < x^2 it holds that
– if x^2 < x*, then f(x^1) > f(x^2) and
– if x^1 > x*, then f(x^1) < f(x^2)
Search with fixed steps
Let (A, B) be the interval where we want to find a minimum of f
Compute the values of f at P equally spaced points x^i in (A, B)
– x^i = A + (i/(P + 1))(B − A), i = 1, …, P
When points x^j, x^{j+1} and x^{j+2} are found s.t. f(x^j) > f(x^{j+1}) < f(x^{j+2}), we know that there exists at least one local minimizer in (x^j, x^{j+2})
The interval can be further reduced
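A small Python sketch of this fixed-step bracketing (the function name and return convention are illustrative):

def bracket_with_fixed_steps(f, A, B, P):
    # Evaluate f at P equally spaced points in (A, B) and return an
    # interval (x_j, x_{j+2}) containing at least one local minimizer,
    # or None if no triple with f(x_j) > f(x_{j+1}) < f(x_{j+2}) is found.
    xs = [A + i * (B - A) / (P + 1) for i in range(1, P + 1)]
    fs = [f(x) for x in xs]
    for j in range(len(xs) - 2):
        if fs[j] > fs[j + 1] < fs[j + 2]:
            return xs[j], xs[j + 2]
    return None

# e.g. bracket_with_fixed_steps(lambda x: (x - 2)**2 - 4, 0.0, 5.0, 9) returns (1.5, 2.5)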
Line search methods
Assume that f is unimodal in [a, b]
The general idea is to reduce the interval [a, b] so that the minimizer is still included in it
An approximation of the minimizer is found
when the length of the interval is smaller than
a pre-determined tolerance
Line search methods can be divided into
– Elimination methods
– Interpolation methods (often use derivatives)
The method of bisection
Elimination method
1) Choose a small but significant constant 2ε > 0 and an allowable length L > 0 for the final interval. Let [a^1, b^1] be the original (unimodal) interval. Set h = 1.
2) If b^h − a^h < L, stop. Minimizer x* ∈ [a^h, b^h]. Otherwise, compute the values of f at x^h = (a^h + b^h)/2 − ε and y^h = (a^h + b^h)/2 + ε.
3) If f(x^h) < f(y^h), set a^{h+1} = a^h and b^{h+1} = y^h. Otherwise, set a^{h+1} = x^h and b^{h+1} = b^h. Set h = h + 1 and go to step 2).
π‘₯β„Ž
π‘Žβ„Ž
spring 2014
π‘¦β„Ž
2πœ–
TIES483 Nonlinear optimization
π‘β„Ž
The method of bisection (cont.)
Efficiency:
– Length of the interval after h iterations is (1/2^h)(b − a) + 2ε(1 − 1/2^h)
– Number of iterations required if the final length should be L is (why?)
h = −ln((L − 2ε)/(b − a − 2ε)) / ln 2
– For each iteration, the objective function is evaluated 2 times (at x^h and y^h) → in total 2h evaluations
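A small illustrative snippet applying the iteration-count formula (rounding up to the next integer, which the formula above leaves implicit):

import math

def bisection_iterations(L, eps, a, b):
    # Number of iterations needed so that the final interval length is below L.
    return math.ceil(-math.log((L - 2 * eps) / (b - a - 2 * eps)) / math.log(2))

# e.g. bisection_iterations(L=1e-3, eps=1e-6, a=0.0, b=5.0) gives 13,
# i.e. about 26 function evaluations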
Golden section
Assume that we want to separate a subinterval (of length y) from an interval of length L such that L/y = y/(L − y)
Then y = ((√5 − 1)/2) L ≈ 0.618 L
The interval is then said to be divided in the ratio of the golden section
Theorem: Divide an interval [a, b] in the ratio of the golden section first from the right (point d) and then from the left (point c). Then point c divides the interval [a, d] in the ratio of the golden section and point d does the same for [c, b].
π‘Ž
spring 2014
𝑐
𝑑
TIES483 Nonlinear optimization
𝑏
Golden section search
Elimination method, closely related to Fibonacci search. Let C = (√5 − 1)/2.
1) Choose an allowable length L > 0 for the final interval. Let [a^1, b^1] be the original (unimodal) interval. Set x^1 = a^1 + (1 − C)(b^1 − a^1) = b^1 − C(b^1 − a^1) and y^1 = a^1 + C(b^1 − a^1). Compute f(x^1) and f(y^1). Set h = 1.
2) If b^h − a^h < L, stop. Minimizer x* ∈ [a^h, b^h]. Otherwise, if f(x^h) ≤ f(y^h) go to step 4).
3) Set a^{h+1} = x^h and b^{h+1} = b^h. Further set x^{h+1} = y^h and y^{h+1} = a^{h+1} + C(b^{h+1} − a^{h+1}). Compute f(y^{h+1}) and go to step 5).
4) Set a^{h+1} = a^h and b^{h+1} = y^h. Further set y^{h+1} = x^h and x^{h+1} = a^{h+1} + (1 − C)(b^{h+1} − a^{h+1}). Compute f(x^{h+1}).
5) Set h = h + 1 and go to step 2).
π‘₯β„Ž
π‘¦β„Ž
TIES483 Nonlinear optimization
𝑏h
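A minimal Python sketch of the golden section steps above, assuming a unimodal starting interval (names are illustrative):

import math

def golden_section_search(f, a, b, L=1e-4):
    # Golden section search: shrink [a, b] until its length is below L.
    # Only one new function evaluation is needed per iteration.
    C = (math.sqrt(5) - 1) / 2
    x, y = a + (1 - C) * (b - a), a + C * (b - a)
    fx, fy = f(x), f(y)
    while b - a >= L:
        if fx <= fy:                 # minimizer lies in [a, y]
            b, y, fy = y, x, fx
            x = a + (1 - C) * (b - a)
            fx = f(x)
        else:                        # minimizer lies in [x, b]
            a, x, fx = x, y, fy
            y = a + C * (b - a)
            fy = f(y)
    return a, b

# e.g. golden_section_search(lambda t: (t - 2)**2 - 4, 0.0, 5.0)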
Golden section search (cont.)
Efficiency
– Length of the interval after h iterations is C^h (b − a)
– Number of iterations required if the final length should be L is (why?)
h = ln(L/(b − a)) / ln C
– For each iteration (except the last), the objective function is evaluated one time (at x^{h+1} or y^{h+1}), plus at two points in the beginning (x^1 and y^1) → in total h + 1 evaluations
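An illustrative snippet applying this formula, for comparison with the bisection count computed earlier:

import math

def golden_section_iterations(L, a, b):
    # Number of iterations needed so that C^h * (b - a) < L.
    C = (math.sqrt(5) - 1) / 2
    return math.ceil(math.log(L / (b - a)) / math.log(C))

# e.g. golden_section_iterations(L=1e-3, a=0.0, b=5.0) gives 18,
# i.e. about 19 function evaluations versus roughly 26 for bisection
# on the same data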
Quadratic interpolation
The idea is to approximate f with a quadratic polynomial whose minimizer is known
The second order Taylor polynomial is used:
𝑝 π‘₯ =𝑓
π‘₯β„Ž
+
𝑓′
π‘₯β„Ž
π‘₯βˆ’
π‘₯β„Ž
+
1 β€²β€²
𝑓
2
π‘₯β„Ž
π‘₯βˆ’
2
β„Ž
π‘₯
If 𝑓 β€²β€² π‘₯ β„Ž β‰  0, then 𝑝(π‘₯) has a critical point in π‘₯ β„Ž+1
𝑓′(π‘₯ β„Ž )
β„Ž+1
β„Ž+1
β„Ž
when 𝑝′(π‘₯
)=0β†’π‘₯
=π‘₯ βˆ’
β„Ž
𝑓′′(π‘₯ )
Newton's method for solving f'(x) = 0!!! (see the sketch below)
Interpolation can also be applied in the case
where no derivatives are available (find out the
idea by yourself)
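A minimal Python sketch of the Newton iteration above for a one-dimensional line search, assuming both derivatives are available (names are illustrative):

def newton_line_search(df, d2f, x0, eps=1e-8, max_iter=50):
    # Newton's method for solving f'(x) = 0: repeatedly jump to the
    # critical point of the quadratic Taylor model around the current point.
    x = x0
    for _ in range(max_iter):
        g, h = df(x), d2f(x)
        if abs(g) < eps or h == 0:   # stop at an (approximate) critical point
            break
        x = x - g / h                # minimizer of the quadratic model
    return x

# e.g. newton_line_search(lambda t: 2 * t - 4, lambda t: 2.0, x0=0.0) returns 2.0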
Programming assignment
Form the pairs!!!
Start programming by implementing some line
search method
Any programming language is ok
Test your implementation with some
optimization problems where you know the
minimizer
Topic of the lectures on January 20th
& 22nd
Mon, Jan 20th: unconstrained optimization with
multiple variables, optimality conditions and
methods that don't utilize gradient information
(=direct search methods)
Wed, Jan 22nd: methods that utilize gradient
information
Study this before the lecture!
Questions to be considered
– What kind of optimality conditions exist?
– What kind of techniques do direct search methods use to find a local minimizer?
– How is gradient information utilized?