Mathematics
Performance of Numerical Optimization Routines
Sponsoring Faculty Member: Dr. Jon Ernstberger
Kayla S. Cline
Introduction
Nonlinear iterative optimization refines an initial estimate in order to reach a root of a function, possibly subject to certain constraints. We also use nonlinear optimization on models to achieve more accurate fits to data by approximating roots of objective functions such as

J(q) = \sum_{i=1}^{N} \left[ f(x_i, q) - \hat{y}(x_i) \right]^2    (1)

where q is a vector of parameters of a mathematical model f(x_i, q) and \hat{y}(x_i) is the corresponding data point. We will explore different methods for
optimization and compare each method for efficiency and accuracy. First, we will
look at the effectiveness of the Newton, Secant, and Chord methods for approximating
roots, with regard to the type of function and the accuracy of the initial guess. Then,
we will briefly explore the application of built-in optimization tools in MATLAB,
in particular fminsearch, a Nelder-Mead based method.
Newton-Based Root Finders
For a function f(x), a root is defined as a value x* such that f(x*) = 0. Iterative, Newton-based methods can be used to solve for a root of a given function. We can apply these techniques to optimization problems by iteratively estimating an input parameter to a model in order to reach an approximate root of a related objective function.
Derivation of Newton’s Method
Consider the Taylor polynomial for a function f(x):

f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + E(x)    (2)

where E(x) is the error term [1] defined by

E(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - x_0)^{n+1}

for some \xi between x_0 and x.
We will use this polynomial as a starting point to develop an iterative method for estimating the root of the function. Near the point of expansion, the higher-order terms of the Taylor polynomial approach zero, so we can consider them negligible. Truncating those terms gives a linear approximation of f(x):

f(x) \approx f(x_0) + f'(x_0)(x - x_0)    (3)

Since we are solving for a root, we set this approximation equal to 0 and solve for x:

x = x_0 - \frac{f(x_0)}{f'(x_0)}    (4)
Now we have an equation that approximates the value x at which f(x) equals zero, given an initial estimate x_0. From this, we can derive the general form of the Newton-Raphson Method, given as

x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}    (5)

This defines an iterative process that produces a closer approximation of the root with each iteration.
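To make the iteration concrete, a minimal MATLAB sketch of Equation (5) is shown below. The function name and the fixed iteration count are illustrative choices and are not taken from the code distributed with this paper.

function x = newton_root(f, fprime, x0, nmax)
% NEWTON_ROOT  Approximate a root of f with Newton's Method (Equation (5)).
%   f      - handle to the function f(x)
%   fprime - handle to its derivative f'(x)
%   x0     - initial estimate of the root
%   nmax   - number of Newton iterations to perform
x = x0;
for n = 1:nmax
    x = x - f(x) / fprime(x);   % Newton step: x_{n+1} = x_n - f(x_n)/f'(x_n)
end
end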
Quadratic Example
Now we will approximate a root of a function f(x) using Newton's Method. Consider the quadratic equation f(x) = x^2 - x - 6, whose graph is shown in Figure 1(a). We will iterate Newton's Method to approximate a zero of f(x). We initiate the method by choosing x_0 = 6. The first step of Newton's Method is

x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} = 6 - \frac{6^2 - 6 - 6}{2(6) - 1} = 6 - \frac{24}{11} \approx 3.818    (6)
Figure 1: (a) Graph of f(x) = x^2 - x - 6 and (b) Newton's Method for the Quadratic Example
Now we have a closer approximation to the root of the function. We repeat the same process using x_1 as our new estimate, and continue until the iterates converge to the value of x at the root of f. Newton's Method successfully converges to the root x = 3.
Since Newton's Method is derivative based, the derivative at the initial estimate may determine the root to which the method converges. The derivative at the initial estimate determines where the tangent line places the next approximation, which in turn determines the direction in which Newton's Method moves toward a root. The method converges to x = 3, rather than to the root x = -2, because our initial estimate x_0 = 6 was closer to x = 3 than to x = -2.
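Using the sketch above, this example can be reproduced with anonymous function handles (newton_root is the illustrative name introduced earlier, not a built-in routine):

f  = @(x) x.^2 - x - 6;          % f(x) = x^2 - x - 6, roots at x = 3 and x = -2
fp = @(x) 2*x - 1;               % derivative f'(x)
x  = newton_root(f, fp, 6, 10)   % starting from x0 = 6, the iterates approach x = 3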
Convergence Criteria
According to C. T. Kelley in Iterative Methods for Optimization [3], several criteria must be met in order to guarantee convergence to a root. The criteria to converge to a root x* on [a,b] are:

- The initial estimate x_0 is in the interval [a,b]. An initial estimate sufficiently far outside [a,b] may result in divergence because of the difference in derivative values.
- The function f must be continuously differentiable (f \in C^2[a,b]). For convergence, the derivatives must exist with no continuity issues.
- The derivative of the function at x* cannot be equal to zero (f'(x*) \neq 0). We must not divide by zero in order to maintain numerical precision.
- A root x* must exist for our function, so that f(x*) = 0.
Failure to meet even one of these criteria may result in failure of the algorithm for
Newton’s Method.
Problems with Newton’s Method
While Newton's Method is a state-of-the-art approach for approximating roots of functions, we may encounter several problems depending on our initial estimate or on the type of function. Consider functions for which the derivative f'(x) approaches zero near the root. As the derivative evaluations become smaller, we divide by numbers closer and closer to zero, so the steps grow larger; Newton's Method will then diverge and may never reach the true root.

This is demonstrated by the function f(x) = e^x. For this example, Newton's Method never converges to a root because no root exists. Algebraically, the method simplifies to

x_{n+1} = x_n - \frac{e^{x_n}}{e^{x_n}} = x_n - 1

Consequently, the algorithm ultimately diverges to negative infinity. Other functions, such as f(x) = arctan(x), shown in Figure 2, will diverge if the initial estimate is not extremely close to the actual root of the function.
Figure 2: Graph of f(x) = arctan(x).

Different types of functions can also cause a breakdown of quadratic convergence in Newton's Method. Consider the function f(x) = x^2, which has a double root at x = 0. The generalized Newton iteration becomes

x_{n+1} = x_n - \frac{x_n^2}{2x_n} = \frac{x_n}{2}

which generates a slowed, linear convergence rate. A modification of the Newton Step from Equation (5), for functions containing a root x = p of multiplicity m with m > 1, is given as

x_{n+1} = x_n - m\,\frac{f(x_n)}{f'(x_n)}    (7)
To ensure quadratic convergence for functions containing repeated roots, we can instead rewrite the Newton Step in terms of \mu(x) = f(x)/f'(x), giving us a modified version of Newton's Method given as

x_{n+1} = x_n - \frac{\mu(x_n)}{\mu'(x_n)} = x_n - \frac{f(x_n)\,f'(x_n)}{[f'(x_n)]^2 - f(x_n)\,f''(x_n)}    (8)

This does not resolve every problem associated with f'(x*) = 0. However, it does ensure that functions with repeated roots maintain a quadratic convergence rate.
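As a quick check of Equation (8), consider again f(x) = x^2, for which the unmodified iteration converged only linearly. Substituting f = x^2, f' = 2x, and f'' = 2 into the modified step gives

x_{n+1} = x_n - \frac{x_n^2 \cdot 2x_n}{(2x_n)^2 - x_n^2 \cdot 2} = x_n - \frac{2x_n^3}{2x_n^2} = 0

so the double root at x = 0 is recovered in a single step.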
Algorithmic Improvements
Several alterations can be made to Newton's Method to produce a better algorithmic implementation. Simple algorithmic constraints can be added that improve the runtime of Newton's Method and provide a way to terminate the algorithm if divergence occurs. Further, variations of the Newton Step can avoid expensive derivative evaluations, which may result in faster convergence to the root.
Algorithmic Enhancements
There are three main modifications that we can add to the algorithm to ensure that
we do not continue to iterate once the estimate of the root is sufficient. We can set
a maximum number of iterations to ensure that, if Newton’s Method diverges, we
terminate the iterations and output an error message. This allows the user to
diagnose problems with the defined function while using computer resources
efficiently.
We must also understand what it means to converge to a root. If Newton's Method is converging to the root x* = 3, by the eighth iteration we may have x_8 = 3.00001 yet never reach the true root x* = 3 exactly. However, if the next iteration takes only an extremely small step toward the root, we can say that we have sufficiently converged to the root value. This is done by using the absolute difference of successive iterates to determine the termination of the algorithm, or

|x_{n+1} - x_n| \leq \tau    (9)

where \tau is an extremely small tolerance (e.g., 1 \times 10^{-6}).
We can also use this same type of constraint on the function evaluation. As Newton's Method converges to the root, the function evaluation converges to f(x*) = 0. Therefore, we can set a tolerance such that the algorithm terminates when the function evaluation is close enough to zero, denoted as

|f(x_{n+1})| \leq \tau    (10)
The combined effort of these convergence inequalities works to ensure that a
sufficient number of iterations are completed.
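A minimal sketch of how these three safeguards might be combined in MATLAB is shown below; the function name, argument names, and error-handling style are illustrative assumptions rather than the implementation distributed with this paper.

function [x, niter] = newton_safe(f, fprime, x0, nmax, xtol, ftol)
% NEWTON_SAFE  Newton's Method with the three termination criteria above.
%   nmax - maximum number of iterations (guards against divergence)
%   xtol - tolerance on the step size |x_{n+1} - x_n|, Equation (9)
%   ftol - tolerance on the residual |f(x_{n+1})|, Equation (10)
x = x0;
for niter = 1:nmax
    xnew = x - f(x) / fprime(x);                           % Newton step, Equation (5)
    done = abs(xnew - x) <= xtol || abs(f(xnew)) <= ftol;   % Equations (9) and (10)
    x = xnew;
    if done
        return                                             % sufficiently converged
    end
end
error('Newton''s Method did not converge in %d iterations.', nmax);
end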
Modification of the Newton Step
Several iterations, each requiring a function evaluation and a derivative evaluation,
can be time consuming. This is especially true as the calculations involve several
multiplications, divisions, or nonlinear function calls, each taking extra
computational time. Therefore, we use a process called the Secant Method. Instead of calculating the derivative at each iteration, we use function evaluations at each iteration to approximate the derivative. For our example, we approximate f'(x) using a forward-difference approximation, given as

f'(x) \approx \frac{f(x + h) - f(x)}{h}    (11)

where h is small. This creates a secant line instead of the tangent line used in Newton's Method. This potentially less expensive approximation of the derivative may result in faster convergence to the local root of the function.
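One way to realize this in MATLAB is sketched below; the function name and the choice of perturbation h are illustrative assumptions.

function x = secant_root(f, x0, nmax, h)
% SECANT_ROOT  Newton-type iteration with a forward-difference derivative.
%   h - small perturbation used in Equation (11), e.g., 1e-6
x = x0;
for n = 1:nmax
    dfdx = (f(x + h) - f(x)) / h;   % forward-difference slope, Equation (11)
    x = x - f(x) / dfdx;            % Newton-style update with the approximate slope
end
end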
Function evaluations may require significant time in these algorithms. If the function is continuously differentiable, then as we approach a root the curve is either increasing or decreasing, so the derivative should maintain the same sign as the iterations of Newton's Method get closer to the root value. The Chord Method is a modified Newton's Method that takes advantage of these similar derivative values. Instead of calculating f'(x) at every iteration of Newton's Method, f'(x) is recalculated only every m-th iteration, where m is a number such as m = 50. This may result in a greater number of iterations with fewer derivative calculations, and potentially quicker convergence to an approximation of the root.
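A minimal MATLAB sketch of the Chord Method as described here is given below; the function name and refresh logic are illustrative assumptions.

function x = chord_root(f, fprime, x0, nmax, m)
% CHORD_ROOT  Chord Method: reuse each derivative value for m iterations.
x = x0;
dfdx = fprime(x);             % initial derivative evaluation
for n = 1:nmax
    if mod(n, m) == 0
        dfdx = fprime(x);     % recalculate the derivative every m-th iteration
    end
    x = x - f(x) / dfdx;      % Newton-style step with the stored slope
end
end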
Comparison of Newton, Secant, and Chord Methods
We now run each of these methods on the same function for the purpose of comparison. Consider the cubic function

f(x) = (x - 1)^3 - 4x + 4    (12)

with roots at x* = -1, 1, and 3. Let our initial estimate be x_0 = 30. Because of the derivative evaluation and tangent line at the initial estimate, we expect the methods to converge to the root x = 3. When each method is implemented, Newton's Method and the Secant Method converge to x = 3, each with different runtimes, iterations, and function evaluations, as shown in Table 1. The first three iterations of Newton's Method for this function are shown in Figure 3.

Figure 3: Newton's Method for f(x) = (x - 1)^3 - 4x + 4
There are several differences in the runtimes and numbers of iterations between the methods. Newton's Method took approximately twice as long to converge as the Secant Method. The Chord Method had the longest runtime, due to the divergence of the method. The Secant Method converged in the same number of iterations as Newton's Method, while the Chord Method required many more iterations but far fewer derivative evaluations.

Not all three methods were able to converge to the root x = 3. The Chord Method diverged and returned "Not a Number" due to a loss of numerical precision. Table 1 demonstrates that the convergence of the Chord Method may depend on the type of function.
Specification              Newton      Secant      Chord
Run Time (Seconds)         0.000837    0.000360    0.00143
Total Iterations           13          13          48
Derivative Calculations    13          13          3
Root Found                 x = 3       x = 3       Not a Number

Table 1: Results with x_0 = 30 for f(x) = (x - 1)^3 - 4x + 4
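A small harness along the following lines could be used to repeat this comparison; the routine names and parameter choices (iteration caps, perturbation h, refresh interval m) are the illustrative ones from the sketches above, and the absolute runtimes will differ by machine.

f  = @(x) (x - 1).^3 - 4*x + 4;   % the cubic of Equation (12)
fp = @(x) 3*(x - 1).^2 - 4;       % its derivative
x0 = 30;                          % initial estimate used in Table 1

tic; xNewton = newton_root(f, fp, x0, 50);    tNewton = toc;
tic; xSecant = secant_root(f, x0, 50, 1e-6);  tSecant = toc;
tic; xChord  = chord_root(f, fp, x0, 50, 50); tChord  = toc;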
Fitting Models to Data Using MATLAB Methods
Several of these methods can be used to locate the minimum of a multi-dimensional objective function (i.e., J(q): \mathbb{R}^N \rightarrow \mathbb{R}) in order to fit a model to data. Several such methods are also included in MATLAB. We will focus on the routine fminsearch, which uses the Nelder-Mead algorithm and requires an initial estimate of the model parameters to locate the minimum of an objective function defined in a manner similar to

J(q) = \sum_{i=1}^{N} \left[ f(x_i, q) - \hat{y}(x_i) \right]^2    (13)
From the initial estimate, fminsearch will create two new estimates to form a
simplex, such as the one shown in Figure 4 for a two-dimensional plane.
Method
A simplex is a geometric object of N + 1 vertices in a space of N dimensions. The objective function is evaluated at each of the N + 1 vertices created by fminsearch. The vertex with the worst value of the objective function is reflected through the centroid of the remaining vertices, forming a line segment along which the objective function is evaluated at several trial points. A new simplex is formed using the point at which the objective function attains its lowest value, and the process is repeated. For fminsearch, this continues until the new line segment formed in each iteration falls below a determined minimum length. Figure 4 illustrates the Nelder-Mead algorithm converging to a point in a two-dimensional space [4].

Figure 4: (a) One Iteration of Nelder-Mead and (b) Several Iterations Converging to (3, 2)
Problems with Nelder-Mead
Although this is an efficient algorithm, the convergence to the local minimum of
an objective function is directly based on the accuracy of the initial parameter
estimate. Due to the nature of the algorithm, it will converge to the nearest
minimum of the objective function, which may only be a local minimum instead of
a global minimum. Therefore, it may not be able to locate the optimal parameters
for the model. Finally, fminsearch does not take into account any constraints for
the model parameters, which limits its use and can result in nonsensical parameter
estimations.
Application
Consider the output generated by y(x_i) = x_i^3 + 2x_i^2 + 3x_i + 4. We will use this to create an optimization problem that demonstrates the effectiveness of the algorithm. We generate several data points to which the cubic model is a perfect fit. We then rewrite the parameters as q = [a, b, c, d] of the model f(x, q) = q(1)x^3 + q(2)x^2 + q(3)x + q(4). We want to show that fminsearch will take in an incorrect initial estimate of the parameters and match them with our original coefficients. Let our initial estimate be q_0 = [5, 2, 7, 8] and our objective function be given as

J(q) = \sum_{i=1}^{N} \left[ f(x_i, q) - \hat{y}(x_i) \right]^2    (14)

where \hat{y}(x_i) is the corresponding generated data point for f(x_i).
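A MATLAB sketch of this setup is shown below. The data range, number of sample points, and variable names are illustrative assumptions; fminsearch itself is the built-in Nelder-Mead routine discussed above.

% Generate synthetic data from the true cubic y(x) = x^3 + 2x^2 + 3x + 4.
xdata = linspace(-5, 5, 50);
yhat  = xdata.^3 + 2*xdata.^2 + 3*xdata + 4;

% Cubic model with parameter vector q = [a, b, c, d].
model = @(q, x) q(1)*x.^3 + q(2)*x.^2 + q(3)*x + q(4);

% Least-squares objective function, Equation (14).
J = @(q) sum((model(q, xdata) - yhat).^2);

% Deliberately incorrect initial parameter estimate.
q0 = [5 2 7 8];

% Nelder-Mead search for the minimizing parameters.
qf = fminsearch(J, q0);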
The objective function evaluation at the initial estimate is J(q_0) = 1.80 \times 10^{13}. However, after a search time of 1.93 seconds, the algorithm reports that the optimal values for the parameters are q_f = [0.999, 2.000, 3.000, 3.999]. At these values, the objective function evaluation is J(q_f) = 4.90 \times 10^{-5}. Adjusting constraints on the maximum number of iterations and on the tolerance values can affect the accuracy of the output parameters.
Conclusion
Nonlinear optimization involves iterative processes that include root finders and
data fitting. Newton-based root finders offer an efficient method for locating the
root of a function. However, the speed and accuracy of these methods are
dependent upon the initial estimates and types of functions. Newton-based
methods, as well as the Nelder-Mead algorithm, can be applied to mathematical
models to achieve more accurate fits to data. These methods can be applied in
research throughout many science, technology, engineering, and mathematical
fields.
References
[1] Kendall E. Atkinson. "The Taylor Polynomial Error Formula". 2003.
[2] Richard L. Burden and J. Douglas Faires. Numerical Analysis. Cengage Learning, 1993.
[3] C. T. Kelley. Iterative Methods for Optimization. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1999.
[4] John H. Mathews and Kurtis K. Fink. Numerical Methods Using MATLAB. Prentice-Hall, Upper Saddle River, NJ, 2004.
[5] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, New York, NY, 2007.
MATLAB Implementation of Algorithms
Implementations of these algorithms in MATLAB software can be found at:
http://home.lagrange.edu/jernstberger/kscline/kscline_code.zip