
Tutorial 11
Unconstrained optimization
Steepest descent
Newton’s method
Why Function Optimization?
There are three main reasons why most problems in robotics, vision,
and arguably every other science or endeavor take on the form of
optimization problems:
• The desired goal may not be achievable, and so we try to get as
close as possible to it.
• There may be more than one way to achieve the goal, and so we can choose
one by assigning a quality measure to all the solutions and selecting the
best one.
• We may not know how to solve the system of equations f(x) = 0, so
instead we minimize the norm ||f(x)||, which is a scalar function of
the unknown vector x.
Characteristics of Optimization Algorithms
$$\mathbf{x}^{*} = \arg\min_{\mathbf{x}\in\mathbb{R}^{n}} f(\mathbf{x})$$
1. Stability
Under what conditions will the minimum be reached?
2. Convergence speed
f ( x k 1 )  f ( x )  c x k  x *
*
N
N – the order of the algorithm (usually N=1,2, rarely 3)
3. Complexity
$$O(n^{M})$$
How much time (CPU operations) each iteration takes.
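To make the meaning of the order N concrete, here is a small Python illustration (mine, not from the tutorial): with the same constant c, a linearly convergent sequence (N = 1) shrinks the error by a fixed factor per step, while a quadratically convergent one (N = 2) roughly squares it.

```python
# Error recursions e_{k+1} = c * e_k**N for N = 1 (linear) and N = 2 (quadratic).
c = 0.5
e_lin = e_quad = 1e-1
for k in range(1, 6):
    e_lin = c * e_lin            # N = 1: error shrinks by a constant factor
    e_quad = c * e_quad ** 2     # N = 2: correct digits roughly double per step
    print(f"step {k}: linear {e_lin:.1e}, quadratic {e_quad:.1e}")
```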
Line search
Line search could run as follows. Let
$$h(\alpha) = f(\mathbf{x}_{k} + \alpha\,\mathbf{p}_{k})$$
be the scalar function of α representing the possible values of f(x) in the
direction of p_k. Let (a, b, c) be three values of α such that the single
(constrained) minimum α* lies between a and c: a < α* < c.
Then the following algorithm approaches α* arbitrarily closely:
If h(a) ≥ h(c):
    u = (a + b)/2
    If h(u) < h(b): (a, b, c) = (a, u, b)
    Else:           (a, b, c) = (u, b, c)
If h(a) < h(c):
    u = (b + c)/2
    If h(u) < h(b): (a, b, c) = (b, u, c)
    Else:           (a, b, c) = (a, b, u)

[Figure: the points a, u, b, c bracketing the minimum of h(α)]
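Below is a minimal Python sketch of this bracketing loop (the function name, tolerance, and iteration cap are my own choices, not from the tutorial); it assumes h is unimodal on [a, c] with the minimum α* strictly inside.

```python
def bracket_line_search(h, a, b, c, tol=1e-8, max_iter=200):
    """Shrink the bracket (a, b, c) around the single minimum of h(alpha)."""
    for _ in range(max_iter):
        if c - a < tol:                 # bracket is small enough: stop
            break
        if h(a) >= h(c):                # probe the left half of the bracket
            u = (a + b) / 2
            a, b, c = (a, u, b) if h(u) < h(b) else (u, b, c)
        else:                           # h(a) < h(c): probe the right half
            u = (b + c) / 2
            a, b, c = (b, u, c) if h(u) < h(b) else (a, b, u)
    return b                            # current best estimate of alpha*

# Example: f(x, y) = (x-1)^2 + (2y-2)^2 along the direction p = (2, 8) from (0, 0),
# i.e. h(alpha) = f(2*alpha, 8*alpha); the exact minimizer is alpha* = 17/130.
h = lambda alpha: (2 * alpha - 1) ** 2 + (2 * 8 * alpha - 2) ** 2
print(bracket_line_search(h, 0.0, 0.25, 1.0))   # ~0.1308
```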
Taylor Series
The Taylor series for a scalar function f(x) is given by
$$f(x+\varepsilon) = f(x) + \frac{\varepsilon}{1!}\, f'(x) + \frac{\varepsilon^{2}}{2!}\, f''(x) + \dots + \frac{\varepsilon^{m}}{m!}\, f^{(m)}(x) + R_{m},$$
where
$$R_{m} = \frac{\varepsilon^{m+1}}{(m+1)!}\, f^{(m+1)}(x + \theta\varepsilon), \qquad 0 \le \theta \le 1.$$
The Taylor series can be derived by successive differentiation of the
polynomial representation of f(x):
$$f(x) = \sum_{k=0}^{\infty} c_{k}\, x^{k}$$
For a function of n variables, the expression is
$$f(\mathbf{x}+\boldsymbol{\varepsilon}) = \sum_{k=0}^{m} \frac{1}{k!} \left( \varepsilon_{1}\frac{\partial}{\partial x_{1}} + \varepsilon_{2}\frac{\partial}{\partial x_{2}} + \dots + \varepsilon_{n}\frac{\partial}{\partial x_{n}} \right)^{k} f(\mathbf{x}) + R_{m}$$
2D Taylor Series: Example
Consider an elliptic function $f(x, y) = (x-1)^2 + (2y-2)^2$ and find
the first three terms of its Taylor expansion around (0, 0).
$$f(\mathbf{0}+\Delta\mathbf{x}) = f(\mathbf{0}) + \frac{1}{1!} \left(\frac{\partial f}{\partial \mathbf{x}}\right)^{T} \Delta\mathbf{x} + \frac{1}{2!}\, \Delta\mathbf{x}^{T} \begin{pmatrix} \dfrac{\partial^{2} f}{\partial x^{2}} & \dfrac{\partial^{2} f}{\partial x\,\partial y} \\[6pt] \dfrac{\partial^{2} f}{\partial y\,\partial x} & \dfrac{\partial^{2} f}{\partial y^{2}} \end{pmatrix} \Delta\mathbf{x} + R_{3}$$

$$f(\mathbf{0}+\Delta\mathbf{x}) = 5 + \frac{1}{1!} \begin{pmatrix} -2 & -8 \end{pmatrix} \Delta\mathbf{x} + \frac{1}{2!}\, \Delta\mathbf{x}^{T} \begin{pmatrix} 2 & 0 \\ 0 & 8 \end{pmatrix} \Delta\mathbf{x} + R_{3}$$
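A quick numerical check of this expansion (my own sketch using numpy, not part of the tutorial): because f is quadratic, the three terms above reproduce f exactly, i.e. R₃ = 0 for any displacement Δx.

```python
import numpy as np

f = lambda x, y: (x - 1) ** 2 + (2 * y - 2) ** 2

f0 = f(0.0, 0.0)                       # f(0) = 5
g0 = np.array([-2.0, -8.0])            # gradient of f at (0, 0)
H0 = np.array([[2.0, 0.0],
               [0.0, 8.0]])            # Hessian of f (constant)

dx = np.array([0.3, -0.7])             # an arbitrary displacement
taylor2 = f0 + g0 @ dx + 0.5 * dx @ H0 @ dx
assert np.isclose(taylor2, f(*dx))     # exact: R_3 vanishes for a quadratic f
```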
Steepest Descent: Example
Consider the same elliptic function $f(x, y) = (x-1)^2 + (2y-2)^2$
and find the first step of steepest descent, starting from (0, 0).
[Figure: contours of f with the steepest-descent direction $-\nabla f(0)$ at the starting point]
$$f(x, y) = x^{2} - 2x + 1 + 4y^{2} - 8y + 4$$
$$\nabla f(0, 0) = (-2,\, -8)$$
Moving from (0, 0) along the direction $-\nabla f(0,0) = (2, 8)$, i.e. $(x, y) = (2\alpha, 8\alpha)$, gives
$$f(\alpha) = (2\alpha)^{2} - 2(2\alpha) + 1 + 4(8\alpha)^{2} - 8(8\alpha) + 4$$
Now the line search could be applied. Instead, we set the derivative to zero directly:
$$f'(\alpha) = 8\alpha - 4 + 512\alpha - 64 = 0, \quad \text{i.e. } 130\alpha - 17 = 0 \;\Rightarrow\; \alpha = \frac{17}{130};$$
$$(x_{1}, y_{1}) = \left(\frac{17}{65},\, \frac{68}{65}\right); \qquad f\!\left(\frac{17}{65}, \frac{68}{65}\right) = \dots$$
Is it a minimum? Next step?
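The step above can be checked numerically; the sketch below (my own helper names, not from the tutorial) also hints at the answer: after an exact line search the new gradient is orthogonal to the previous direction, so (x₁, y₁) is not yet the minimum (1, 1) and the next step turns by 90°.

```python
import numpy as np

f = lambda p: (p[0] - 1) ** 2 + (2 * p[1] - 2) ** 2
grad_f = lambda p: np.array([2 * (p[0] - 1), 8 * (p[1] - 1)])

x0 = np.array([0.0, 0.0])
d = -grad_f(x0)                          # steepest-descent direction (2, 8)

alpha = 17 / 130                         # exact minimizer of f along the line
x1 = x0 + alpha * d                      # (17/65, 68/65)
print(x1, f(x1))                         # the value dropped from f(x0) = 5

# Exact line search makes the new gradient orthogonal to the old direction,
# so steepest descent zig-zags toward the minimum (1, 1).
assert abs(grad_f(x1) @ d) < 1e-12
```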
Newton’s Method
Steepest descent uses only the gradient term of the Taylor expansion
to find the minimization direction and therefore has a linear convergence
rate. Newton's method also uses the second derivatives to find both the
direction and the step size, and is applicable where the function f(x) near
the minimum x* can be approximated by a paraboloid:
$$f(\mathbf{x}_{k}+\Delta\mathbf{x}) = f(\mathbf{x}_{k}) + \frac{1}{1!} \left(\frac{\partial f}{\partial \mathbf{x}}\right)^{T} \Delta\mathbf{x} + \frac{1}{2!}\, \Delta\mathbf{x}^{T} \begin{pmatrix} \dfrac{\partial^{2} f}{\partial x^{2}} & \dfrac{\partial^{2} f}{\partial x\,\partial y} \\[6pt] \dfrac{\partial^{2} f}{\partial y\,\partial x} & \dfrac{\partial^{2} f}{\partial y^{2}} \end{pmatrix} \Delta\mathbf{x} + R_{3},$$

in other words, if the Hessian H is positive definite (PD):

$$f(\mathbf{x}_{k}+\Delta\mathbf{x}) \approx f(\mathbf{x}_{k}) + \mathbf{g}_{k}^{T}\, \Delta\mathbf{x}_{k} + \frac{1}{2}\, \Delta\mathbf{x}_{k}^{T} H_{k}\, \Delta\mathbf{x}_{k}$$
A minimum of this quadratic approximation requires
$$\frac{\partial}{\partial \Delta\mathbf{x}}\, f(\mathbf{x}_{k}+\Delta\mathbf{x}) = \mathbf{g}_{k}^{T} + \Delta\mathbf{x}_{k}^{T} H_{k} = 0 \;\Rightarrow\; \Delta\mathbf{x}_{k} = -H_{k}^{-1}\, \mathbf{g}_{k}$$
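A minimal Python sketch of this update rule (the function name and stopping rule are my own); in practice the step is obtained by solving H_k Δx = −g_k rather than forming H_k⁻¹ explicitly.

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Iterate x_{k+1} = x_k - H_k^{-1} g_k until the gradient (nearly) vanishes."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # stationary point reached
            break
        dx = np.linalg.solve(hess(x), -g)  # Newton step: H_k dx = -g_k
        x = x + dx
    return x
```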
Newton’s Method: Example
Consider the similar elliptic function $f(\mathbf{x}) = (x_{1}-1)^{2} + 4(x_{2}-2)^{2}$ and find the
first step of Newton's method, starting from (0, 0).
1
x1  H1 g1
-f’(0)
2
2
g1  


16


 2 0

H1  
 0 8
1
2
Δx1   
0


0    2  1 

1   16 2

8
1
In this simple case, the description of the function by the first three
Taylor terms is exact, and the first iteration converges to the minimum.
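A one-line numerical confirmation of this step (my own check with numpy, not from the slides): starting from (0, 0), a single Newton step lands exactly on the minimum (1, 2).

```python
import numpy as np

grad = lambda x: np.array([2 * (x[0] - 1), 8 * (x[1] - 2)])  # gradient of f
hess = lambda x: np.array([[2.0, 0.0], [0.0, 8.0]])          # constant Hessian

x0 = np.array([0.0, 0.0])
x1 = x0 + np.linalg.solve(hess(x0), -grad(x0))               # Newton step
print(x1)                                                    # [1. 2.], the exact minimum
```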
Complexity 1/2
For example, for a quadratic function
$$f(\mathbf{x}) = c + \mathbf{a}^{T}\mathbf{x} + \frac{1}{2}\, \mathbf{x}^{T} Q\, \mathbf{x},$$
steepest descent takes many iterations to converge in the general case
Q ≠ I, while Newton's method requires only one step.
However, this single iteration of Newton's method is more
expensive, because it requires both the gradient $\mathbf{g}_{k}$ and the Hessian $H_{k}$ to
be evaluated, for a total of $n + \binom{n}{2}$ derivatives. In addition, the
Hessian must be inverted, or, at least, a system
$$\mathbf{a} + Q\,\mathbf{x}_{k} = 0$$
must be solved. The explicit solution of this system requires about $O(n^{3})$
operations and $O(n^{2})$ memory, which is very expensive.
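As a sketch of this cost remark (my own example with numpy; n and the random Hessian are illustrative), the Newton system is usually solved with a dense factorization, roughly O(n³) work and O(n²) memory, without ever forming the inverse:

```python
import numpy as np

n = 2000
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)        # a dense symmetric positive-definite "Hessian"
g = rng.standard_normal(n)         # a "gradient"

dx = np.linalg.solve(H, -g)        # O(n^3) operations, O(n^2) memory (factor and solve)
# dx = -np.linalg.inv(H) @ g       # same result, but explicitly inverting H is wasteful
```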
Complexity 2/2
In contrast, steepest descent requires only the gradient $\mathbf{g}_{k}$ for selecting the
step direction $\mathbf{p}_{k}$, and a line search in the direction $\mathbf{p}_{k}$ to find the step size.
These cheaper iterations can be advantageous over the faster convergence of
Newton's method when the dimensionality of x is large, which can exceed many
thousands.
In the next tutorial we will discuss the method of conjugate
gradients, which is motivated by the desire to accelerate convergence
with respect to the steepest descent method, but without paying the
computation and storage cost of Newton's method.