Tutorial 11 (M4CS 2005)
Unconstrained Optimization: Steepest Descent and Newton's Method

Why Function Optimization?

There are three main reasons why most problems in robotics, vision, and arguably every other science or endeavor take the form of optimization problems:

• The desired goal may not be achievable, so we try to get as close to it as possible.
• There may be more than one way to achieve the goal, so we can choose among the solutions by assigning a quality to each of them and selecting the best one.
• We may not know how to solve the system of equations $f(\mathbf{x}) = 0$, so instead we minimize the norm $\|f(\mathbf{x})\|$, which is a scalar function of the unknown vector $\mathbf{x}$.

Characteristics of Optimization Algorithms

We seek $\mathbf{x}^* = \arg\min_{\mathbf{x} \in \mathbb{R}^n} f(\mathbf{x})$. Three properties characterize an algorithm:

1. Stability: under what conditions will the minimum be reached?
2. Convergence speed: $f(\mathbf{x}_{k+1}) - f(\mathbf{x}^*) \le c \, \|\mathbf{x}_k - \mathbf{x}^*\|^N$, where N is the order of the algorithm (usually N = 1 or 2, rarely 3).
3. Complexity $O(n^M)$: how much time (CPU operations) each iteration takes.

Line Search

A line search could run as follows. Let $h(\alpha) = f(\mathbf{x}_k + \alpha \mathbf{p}_k)$ be the scalar function of α representing the possible values of $f(\mathbf{x})$ in the direction $\mathbf{p}_k$. Let (a, b, c) be three values of α such that a single (constrained) minimum α* lies between a and c, that is a < α* < c, with h(b) ≤ h(a) and h(b) ≤ h(c). Then the following algorithm approaches α* arbitrarily closely:

If h(a) ≥ h(c): u = (a + b)/2;
    if h(u) < h(b): (a, b, c) = (a, u, b)
    else: (a, b, c) = (u, b, c)
If h(a) < h(c): u = (b + c)/2;
    if h(u) < h(b): (a, b, c) = (b, u, c)
    else: (a, b, c) = (a, b, u)

(Figure: the bracket points a, u, b, c on the α axis.)

Taylor Series

The Taylor series of a scalar function f(x) is

$$f(x + \delta) = f(x) + \frac{f'(x)}{1!}\,\delta + \frac{f''(x)}{2!}\,\delta^2 + \dots + \frac{f^{(m)}(x)}{m!}\,\delta^m + R_m, \quad \text{where } R_m = \frac{f^{(m+1)}(\xi)}{(m+1)!}\,\delta^{m+1}$$

for some ξ between x and x + δ. The Taylor series can be derived by successive differentiation of the polynomial representation of f(x),

$$f(x) = \sum_{k \ge 0} c_k x^k,$$

evaluating the derivatives at the expansion point to identify the coefficients $c_k$. For a function of n variables the expansion reads

$$f(\mathbf{x} + \boldsymbol{\delta}) = \sum_{k=0}^{m} \frac{1}{k!} \left( \delta_1 \frac{\partial}{\partial x_1} + \delta_2 \frac{\partial}{\partial x_2} + \dots + \delta_n \frac{\partial}{\partial x_n} \right)^{k} f(\mathbf{x}) + R_m$$

2D Taylor Series: Example

Consider an elliptic function $f(x, y) = (x-1)^2 + (2y-2)^2$ and find the first three terms of its Taylor expansion about 0:

$$f(\mathbf{0} + \Delta\mathbf{x}) = f(\mathbf{0}) + \frac{1}{1!} \begin{pmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{pmatrix} \Delta\mathbf{x} + \frac{1}{2!}\, \Delta\mathbf{x}^T \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix} \Delta\mathbf{x} + R_3$$

$$f(\mathbf{0} + \Delta\mathbf{x}) = 5 + \begin{pmatrix} -2 & -8 \end{pmatrix} \Delta\mathbf{x} + \frac{1}{2}\, \Delta\mathbf{x}^T \begin{pmatrix} 2 & 0 \\ 0 & 8 \end{pmatrix} \Delta\mathbf{x}, \qquad R_3 = 0,$$

where $R_3$ vanishes because f is exactly quadratic.

Steepest Descent: Example

Consider the same elliptic function $f(x, y) = (x-1)^2 + (2y-2)^2$ and find the first step of steepest descent from (0, 0). Expanding,

$$f(x, y) = x^2 - 2x + 1 + 4y^2 - 8y + 4, \qquad -\nabla f(0, 0) = (2, 8).$$

Along the direction (2, 8),

$$h(\alpha) = f(2\alpha, 8\alpha) = (2\alpha)^2 - 2(2\alpha) + 1 + 4(8\alpha)^2 - 8(8\alpha) + 4 = 260\alpha^2 - 68\alpha + 5.$$

Now the line search could be applied. Instead, since h(α) is quadratic, we set its derivative to zero:

$$h'(\alpha) = 520\alpha - 68 = 0 \;\Rightarrow\; 130\alpha - 17 = 0 \;\Rightarrow\; \alpha = \frac{17}{130},$$

$$(x_1, y_1) = \frac{17}{130}\,(2, 8) = \left( \frac{17}{65}, \frac{68}{65} \right), \qquad f\!\left( \frac{17}{65}, \frac{68}{65} \right) = \frac{36}{65} \approx 0.554.$$

Is it a minimum? What is the next step?

Newton's Method

Steepest descent uses only the gradient term of the Taylor expansion to find the minimization direction, and therefore has a linear convergence rate. Newton's method uses the second derivatives as well, to find both the direction and the step, and is applicable where the function $f(\mathbf{x})$ near the minimum $\mathbf{x}^*$ can be approximated by a paraboloid, in other words where the Hessian H is positive definite (PD). Keeping the first three Taylor terms, as in the 2D example above,

$$f(\mathbf{x}_k + \Delta\mathbf{x}) \approx f(\mathbf{x}_k) + \mathbf{g}_k^T \Delta\mathbf{x}_k + \frac{1}{2}\, \Delta\mathbf{x}_k^T \mathbf{H}_k \Delta\mathbf{x}_k,$$

where $\mathbf{g}_k$ and $\mathbf{H}_k$ are the gradient and the Hessian of f at $\mathbf{x}_k$. A minimum of this quadratic model requires its gradient with respect to $\Delta\mathbf{x}_k$ to vanish:

$$\nabla f(\mathbf{x}_k + \Delta\mathbf{x}) = \mathbf{g}_k + \mathbf{H}_k \Delta\mathbf{x}_k = \mathbf{0} \;\Rightarrow\; \Delta\mathbf{x}_k = -\mathbf{H}_k^{-1} \mathbf{g}_k.$$
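To make the last few slides concrete, here is a minimal NumPy sketch (not part of the original tutorial; the names f, grad, and bracket_search are illustrative). It applies the bracketing line search from the earlier slide to the steepest-descent example above, recovering the step α = 17/130, and then takes a single Newton step $\Delta\mathbf{x} = -\mathbf{H}^{-1}\mathbf{g}$, which lands on the minimum (1, 1) because f is exactly quadratic.

```python
import numpy as np

# The elliptic function from the examples above, with its gradient and
# constant Hessian written out by hand.
def f(p):
    x, y = p
    return (x - 1)**2 + (2*y - 2)**2

def grad(p):
    x, y = p
    return np.array([2*x - 2, 8*y - 8])

H = np.array([[2.0, 0.0],
              [0.0, 8.0]])

def bracket_search(h, a, b, c, tol=1e-8, max_iter=500):
    """Shrink a bracket a < b < c with h(b) <= h(a), h(c), following the
    update rules of the line-search slide, until the bracket is narrow."""
    for _ in range(max_iter):
        if c - a < tol:
            break
        if h(a) >= h(c):
            u = 0.5 * (a + b)
            a, b, c = (a, u, b) if h(u) < h(b) else (u, b, c)
        else:
            u = 0.5 * (b + c)
            a, b, c = (b, u, c) if h(u) < h(b) else (a, b, u)
    return b

x0 = np.array([0.0, 0.0])
pk = -grad(x0)                        # steepest-descent direction (2, 8)
h = lambda alpha: f(x0 + alpha * pk)  # h(alpha) = 260 a^2 - 68 a + 5

alpha = bracket_search(h, 0.0, 0.1, 1.0)
print(alpha, 17/130)                  # both ~0.1307692...
print(x0 + alpha * pk)                # ~(17/65, 68/65) = (0.2615..., 1.0461...)

# Newton's method: one step, solving H dx = -g instead of inverting H.
dx = np.linalg.solve(H, -grad(x0))
print(x0 + dx)                        # [1. 1.], the exact minimum
```

Note that the Newton step is computed with a linear solve rather than an explicit inverse; this is cheaper and numerically safer, and it is the operation whose cost the complexity discussion below refers to.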
Newton's Method: Example

Consider the similar elliptic function $f(\mathbf{x}) = (x_1 - 1)^2 + 4(x_2 - 2)^2$, whose minimum is at (1, 2), and find the first step of Newton's method from $\mathbf{x}_0 = (0, 0)$:

$$\mathbf{g}_1 = \begin{pmatrix} -2 \\ -16 \end{pmatrix}, \qquad \mathbf{H}_1 = \begin{pmatrix} 2 & 0 \\ 0 & 8 \end{pmatrix},$$

$$\Delta\mathbf{x}_1 = -\mathbf{H}_1^{-1} \mathbf{g}_1 = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/8 \end{pmatrix} \begin{pmatrix} 2 \\ 16 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}.$$

In this simple case the description of the function by the first three Taylor terms is exact, and the first iteration converges to the minimum.

Complexity 1/2

For example, for a quadratic function

$$f(\mathbf{x}) = c + \mathbf{a}^T \mathbf{x} + \frac{1}{2}\, \mathbf{x}^T \mathbf{Q} \mathbf{x},$$

steepest descent takes many iterations to converge in the general case Q ≠ I, while Newton's method requires only one step. However, this single iteration of Newton's method is more expensive: it requires both the gradient $\mathbf{g}_k$ and the Hessian $\mathbf{H}_k$ to be evaluated, that is n first derivatives plus n(n + 1)/2 distinct second derivatives, $O(n^2)$ in total. In addition, the Hessian must be inverted or, at least, the system

$$\mathbf{a} + \mathbf{Q} \mathbf{x} = \mathbf{0}$$

must be solved. The explicit solution of this system requires about $O(n^3)$ operations and $O(n^2)$ memory, which is very expensive.

Complexity 2/2

In contrast, steepest descent requires only the gradient $\mathbf{g}_k$ to select the step direction $\mathbf{p}_k$, plus a line search along $\mathbf{p}_k$ to find the step size. These cheaper iterations can outweigh the faster convergence of Newton's method when the dimensionality of $\mathbf{x}$ is large, possibly many thousands; the sketch below illustrates the trade-off. In the next tutorial we will discuss the method of conjugate gradients, which is motivated by the desire to accelerate convergence with respect to the steepest descent method, but without paying the computation and storage cost of Newton's method.
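As a rough illustration of this trade-off, here is a small NumPy sketch (again illustrative, not from the tutorial; the problem is a randomly generated positive definite quadratic). Exact-line-search steepest descent takes dozens of cheap matrix-vector iterations, while Newton's method reaches the same minimizer with a single $O(n^3)$ linear solve.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)          # symmetric positive definite Hessian
a = rng.standard_normal(n)

# Quadratic f(x) = c + a^T x + 0.5 x^T Q x, with gradient g(x) = a + Q x.
grad = lambda x: a + Q @ x

# Steepest descent with exact line search (closed form for a quadratic:
# alpha = (g^T g) / (g^T Q g) minimizes h(alpha) = f(x - alpha * g)).
x = np.zeros(n)
for k in range(10_000):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:
        break
    alpha = (g @ g) / (g @ Q @ g)
    x = x - alpha * g
print("steepest descent iterations:", k)

# Newton's method: the minimizer satisfies a + Q x = 0, one O(n^3) solve.
x_newton = np.linalg.solve(Q, -a)
print("||x_sd - x_newton|| =", np.linalg.norm(x - x_newton))
```

For n in the thousands, storing Q costs $O(n^2)$ memory and each Newton solve $O(n^3)$ time, while a steepest-descent iteration needs only gradient evaluations; this is the regime that the next tutorial's conjugate gradient method targets.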