An Evaluation of Mathematical Software
That Solves Systems of Nonlinear
Equations
K. L. HIEBERT
Sandia National Laboratories
The "state of the art" of mathematical software that solves systems of nonlinear equations is
evaluated. The evaluation is based on a comparison of eight readily available FORTRAN codes.
Theoretical and software aspects of the codes, as well as their performance on a carefully designed set
of test problems, are used in the evaluation.
Categories and Subject Descriptors: D.2.8 [Software Engineering]: Metrics--performance measures; G.1.5 [Numerical Analysis]: Roots of Nonlinear Equations--systems of equations; G.4
[Mathematics of Computing]: Mathematical Software--certification and testing
General Terms: Algorithms, Performance
Additional Key Words and Phrases: Brent's method; Brown's method; quasi-Newton; Powell's hybrid
method
1. INTRODUCTION
An integral part of the development of mathematical software is the testing of
the software. One common type of testing is comparison testing, that is, comparing
the performance of two or more codes that claim to be able to solve the same
type of problem so as to determine which code is the "best." As anyone who has
ever been involved in testing software knows, testing to find the "best" code is an
all but impossible task and very dependent on the definition of "best."
There are many difficulties involved in testing, especially comparison testing.
One difficulty is the design of the tests. The design should depend on the purpose
of the comparison. For instance, many times new codes claim to be more robust
than existing codes. Under these circumstances, the tests need to be designed to
determine the limits of the codes. A second difficulty is implementing the tests.
It is usually impossible to make the tests totally equivalent because of the
variations in the codes. However, the differences between the codes should be
understood in order to minimize their effects on what the codes are asked to do.
And a third difficulty is the analysis of the results of the tests. The design and
implementation of the tests should be kept in mind while the analysis is done.
The analysis can be very difficult because most often the results are not clear-cut
as to which code is the "best."
One criticism about comparison testing is that too often the testing is done by
the author of one of the codes in the comparison to prove his code is, if not the
"best," better than the other codes in some sense. The testing can be unfair
This article was sponsored by the U.S. Department of Energy under Contract DE-AC04-76DP00789.
Author's address: Numerical Mathematics Division 5642, Sandia National Laboratories, Albuquerque,
NM 87185.
© 1982 ACM 0098-3500/82/0300-0005 $00.00
ACM Transactions on Mathematical Software, Vol. 8, No. 1, March 1982, Pages 5-20.
because of the emphasis placed on one capability of the author's code that the
other codes may not even claim to have. In this case, if the differences between
the codes and the design of the tests are not explained, the results can be
misleading. Another criticism is that even when the testing is done very carefully,
the results are soon outdated because one of the codes in the comparison changes
or a new code is developed.
The comparison of software that solves systems of nonlinear equations,
discussed in this paper, was done in order to advise a committee from several national laboratories on what software is available in this area and which
codes should be added to their common mathematical software library. Eight
FORTRAN codes were collected and used in the comparison. The author had
nothing to do with the development of any of these codes. The tests were designed
to determine how well the codes will perform under different conditions. (The
main areas of interest are the ability of the code to find solutions and the behavior
of the code when it is unable to find a solution.) Much time was spent in analyzing
the differences between the codes and implementing the tests so that the
comparison would be unbiased. As expected, when the analysis of the results was
completed, there was no clear-cut "best" code.
Instead of detailing the results of these tests (which are already outdated
because two of the codes have changed since the testing), we want to comment
on the "state of the art" of mathematical software that solves systems of nonlinear
equations. First, we discuss the theoretical differences in the codes, that is, the
different methods the codes implement. Second, we discuss the software differences. Some software issues are unimportant and just a matter of preference, but
others are very important. Next, the design and implementation of the testing is
discussed. It was impossible to implement the tests equivalently without changing the codes. We also discuss the differences in the codes that make the implementation so difficult. And finally, we make some comments on how well this
generation of software performs and how the next generation could improve. For
a detailed report on the testing and the results, see [6].
2. GENERAL INFORMATION
The eight FORTRAN codes in the comparison are C05NAF (NAG, [10]), BRENT
and HYBRD (MINPACK, [8]), NS01A (Harwell, [4]), QN and SOSNLE (Sandia,
[14]), ZONE (PORT, [11]), and ZSYSTM (IMSL, [7]). Comments on two additional codes, C05PAF (NAG) and ZONEJ (PORT), are made, but the codes are
not included in the comparison.
In this section we want to comment on obvious differences in the codes and
how some of those differences affected the testing. Most of this information is
summarized in Table I.
Methods Implemented
The codes implement three different methods: Brown or Brent, quasi-Newton,
and Powell's hybrid. The following is a brief review of the basic ideas involved in
these methods.
¹ In the latest version of the MINPACK Library, BRENT has been removed, HYBRD has some
modifications, and the code HYBRJ has been added.
Table I. (Summary of the methods and software characteristics of the codes; the tabular content is not recoverable from this extraction.)
The basis of these methods is Newton's method. Consider the problem:
f(x) = 0,  f: Rⁿ → Rⁿ,  x ∈ Rⁿ.   (1)
By Taylor's theorem, we can expand f and form the following linear system:
0 = f(x*) ≈ f(x) + J(x)(x* - x)   (2)
where x is the current iterate, x* is the solution, and J(x) is the Jacobian of f
evaluated at x. By rewriting (2), we can solve for the Newton step Δx to improve
our current approximation for the solution
J(x) Δx = -f(x).   (3)
Under the assumptions that f is sufficiently smooth and J, the Jacobian, is a
dense matrix, some important facts about Newton's method are (a) it is a local
method, that is, the initial guess x₀ must be sufficiently close to x*, the solution,
for the method to converge; (b) it is Q-quadratically convergent; (c) solving (3)
requires O(n³/3) arithmetic operations; and (d) if the Jacobian is approximated
by finite differencing, n² function evaluations are required. In this section, by one
function evaluation we mean the evaluation of any one of the n component
functions of f as defined in (1). In the following sections, the codes count each call
to a subroutine which evaluates all n functions simultaneously as one function
evaluation. The methods that the codes implement are modifications of Newton's
method which try to improve on some aspect of it.
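Since the methods below are all measured against this basic iteration, a minimal sketch of the discrete Newton iteration may help fix ideas. This is an illustration in Python, not the implementation used by any of the codes tested; the step size h and the stopping test are arbitrary choices for the sketch:

```python
import numpy as np

def fd_jacobian(f, x, h=1e-7):
    """Forward-difference Jacobian: n calls to f, each evaluating all n
    components, i.e. n**2 component-function evaluations (fact (d) above)."""
    n = x.size
    fx = f(x)
    J = np.empty((n, n))
    for j in range(n):
        xh = x.copy()
        xh[j] += h
        J[:, j] = (f(xh) - fx) / h
    return J

def newton(f, x0, tol=1e-10, maxit=50):
    """Discrete Newton iteration: solve J(x) dx = -f(x), eq. (3), each step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        fx = f(x)
        if np.linalg.norm(fx) < tol:
            break
        dx = np.linalg.solve(fd_jacobian(f, x), -fx)
        x = x + dx
    return x
```

Because the method is local, the sketch simply assumes x0 is close enough to a root; none of the globalization devices used by the hybrid codes appear here.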
The Brown and Brent methods are concerned with efficiency. Both methods
can be classified as discrete Newton methods because each uses finite differencing
to approximate the Jacobian. They are not just the standard discrete Newton
method. Each iteration of these methods constructs a sequence of minor iterates
y⁰ = xᵢ, y¹, ..., yⁿ⁻¹, xᵢ₊₁ = yⁿ
such that each yᵏ is a zero of the linearization of the component function fⱼ for
j ≤ k. By processing each row of the Jacobian in this manner, only n²/2 of
the partial derivatives need to be approximated. The standard discrete Newton
method requires all n² partial derivatives to be approximated. Brown's method uses
Gaussian elimination to solve for the yᵏ, and Brent's method uses orthogonal
decomposition. For the solution of linear systems, orthogonal decomposition has
a smaller upper bound on the error than does Gaussian elimination, but in
practice there is generally little difference in their stability. Under the same
assumptions of a sufficiently smooth f and the Jacobian being dense, advantages of
the Brown and Brent methods are (a) local quadratic convergence; and (b)
because the function values are supplied one at a time, scaling problems involved
in approximating the Jacobian can be handled more efficiently. Disadvantages of
these methods are (a) they are local methods and require good initial guesses; (b)
they require O(n³/3) arithmetic operations to solve (3); (c) convergence and
divergence can depend on the order of the equations; and (d) they might not be
practical for problems in which evaluating one function at a time is difficult,
inconvenient, or impossible. SOSNLE (Sandia) and ZSYSTM (IMSL) implement
Brown's method, and BRENT (MINPACK) implements Brent's method.
Quasi-Newton methods are also concerned with efficiency. In a quasi-Newton
method an approximation to the full Jacobian, which is updated instead of
recalculated at each iteration, is used. One example is the Broyden update. If Bₖ
is the approximation of the Jacobian at the kth iterate, then
Bₖ₊₁ = Bₖ + (y - Bₖs)sᵀ / (sᵀs)   (4)
where y = f(xₖ₊₁) - f(xₖ) and s = xₖ₊₁ - xₖ, is the approximation to the Jacobian
at the (k+1)st iterate. In another form of Broyden's update, the inverse Broyden
update, the inverse of the approximate Jacobian is stored and updated at each
iteration, substantially reducing the amount of work in solving the linearized
system. Again under the same assumptions, advantages of quasi-Newton methods,
in particular the inverse Broyden update, are (a) local superlinear convergence,
(b) only O(n²) arithmetic operations per iteration, and (c) only n function
evaluations per iteration. The main disadvantage is that it is a local method and
requires a good initial guess. QN (Sandia) is the only code in our study that
implements strictly a quasi-Newton method. It uses the inverse Broyden update.
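The update in (4) takes only a few lines. The following Python sketch is an illustration of the formula, not code taken from QN; it shows the rank-one correction and its defining secant property Bₖ₊₁s = y:

```python
import numpy as np

def broyden_update(B, s, y):
    """Broyden's rank-one update, eq. (4).

    B is the current Jacobian approximation, s = x_{k+1} - x_k, and
    y = f(x_{k+1}) - f(x_k). The result satisfies the secant condition
    B_new @ s == y, and agrees with B on vectors orthogonal to s.
    """
    return B + np.outer(y - B @ s, s) / (s @ s)
```

The inverse Broyden update used by QN maintains B⁻¹ directly via the analogous rank-one formula, which is what reduces the linear-algebra cost per iteration to O(n²).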
In an attempt to improve the local properties but keep the efficiency of a quasi-Newton method, Powell, in 1970, suggested a "hybrid" method that combined
quasi-Newton with the gradient method [12]. As stated before, the Newton step
Δx is the solution of
J(x) Δx = -f(x).
The gradient step g is -μ∇f(x) = -μJᵀ(x)f(x), where μ is chosen to minimize
‖f(x) - μJ(x)Jᵀ(x)f(x)‖.
The hybrid step is a linear combination of the Newton and gradient steps. The
hybrid step δ can also be expressed as the solution of
(λI + Jᵀ(x)J(x)) δ = -Jᵀ(x)f(x).
Those familiar with nonlinear least squares will recognize this as a Levenberg/
Marquardt step. In the Levenberg/Marquardt method the Jacobian is supplied
analytically or approximated by finite differences. Powell's hybrid method uses
a Broyden update to approximate the Jacobian. The advantages are the difference
in the operation count (O(n²) arithmetic operations for Powell's hybrid as opposed
to O(n³/3) for Levenberg/Marquardt) and the number of function evaluations in
updating the Jacobian. Four codes use variations of Powell's original idea. They
are C05NAF (NAG), HYBRD (MINPACK), NS01A (Harwell), and ZONE
(PORT).
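The Levenberg/Marquardt-type system above is easy to illustrate. In this Python sketch (an illustration of the equation, not of any tested code), λ = 0 reproduces the Newton step for a nonsingular square Jacobian, while large λ shrinks the step toward the negative gradient direction:

```python
import numpy as np

def hybrid_step(J, f, lam):
    """Solve (lam*I + J^T J) delta = -J^T f for the hybrid step delta."""
    n = J.shape[1]
    return np.linalg.solve(lam * np.eye(n) + J.T @ J, -(J.T @ f))
```

In practice λ is adjusted adaptively (as in the trust-region logic of the hybrid codes); the sketch takes it as a fixed input.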
Two additional codes that solve systems of nonlinear equations, C05PAF
(NAG) and ZONEJ (PORT), require the user to supply the Jacobian. 2 The NAG
code C05PAF simply calls the NAG code E04GAF, which is a Levenberg/
Marquardt nonlinear least squares solver. For this reason C05PAF was not
included in the comparison. (For comments on the performance of the code
E04GAF, see [5].) The only difference between ZONE and ZONEJ is that when
² The new code in the MINPACK Library, HYBRJ, requires the user to supply the Jacobian.
a new Jacobian is required (i.e., to get Bo, the original approximation to the
Jacobian), ZONE uses a PORT subroutine to calculate the finite difference
approximation to the Jacobian and ZONEJ uses the user-supplied subroutine
that supplies the Jacobian. Because using the analytical Jacobian can supply
much more information about the system and using a subroutine to approximate
the Jacobian by finite differencing would be no different from using ZONE,
ZONEJ was not included in the comparison.
Software Issues
On the surface the codes are very similar. One difference is the length of the call
list. For instance, HYBRD has 23 items in its call list, while ZONE has only 6.
The codes with the longer call list tend to have more options for the users; for
example, HYBRD can internally scale the variables or allow the user to provide
scaling, take advantage of banded Jacobians, and allow the user to set the initial
step-length bound; ZONE has none of these options. The MINPACK codes,
BRENT and HYBRD, have the longest call lists, but the MINPACK Library also
has easy-to-use versions of the codes with shorter call lists.
A more substantial difference is the amount of storage the codes require. In the
environment of large computers, the amount of storage required by the code is
usually not important. However, for minicomputers storage can be a very valid
concern. The Brown and Brent codes require less storage than the quasi-Newton
and hybrid codes. Of the Brown and Brent codes, BRENT requires twice as much
storage as SOSNLE and ZSYSTM. Of the hybrid codes, C05NAF and NS01A
require twice as much storage as HYBRD.
The software characteristics of the codes are summarized in Table I.
Return Criteria
Understanding the return criteria is very important to the testing. First, we need
to know why the codes return in order to make the tests as equivalent as possible.
Second, the performance of the codes was evaluated on why the codes returned
and the x value they returned. (Each of the codes has a parameter in which to
return a flag indicating why the codes returned.)
We divided the return criteria into two classes, root acceptance and other
returns. The first class of return criteria, root acceptance, is the most important
to our testing. These criteria vary substantially. The codes claim solutions (accept
roots) based on either the relative and/or absolute change in x, or the size of the
functions, or both. The other returns imply that the code has not found a solution
and is returning for one of several reasons, such as too much work, the iterates
are not improving, or the Jacobian appears to be singular. (See Table II for more
details.) The return criteria are discussed in more detail in the next section,
where the setting of tolerances is described.
3. TESTING
The Test Problems
As stated in the Introduction, because software is becoming more robust, the
testing must become more elaborate to determine the limits of the software. We
Table II. Return criteria: (a) root acceptance; (b) other returns. (The tabular entries, showing which criteria each code uses, are not recoverable from this extraction; criteria marked ᵃ have their parameters set by the code, and the user cannot change them.)
feel that our test set represents many varied problems and was designed to
investigate the robustness of today's software.
The main set of test problems consists of the 23 problems listed in [9]. All of
the problems appear in other sources as well. The problems include Powell's
singular function, the Helical valley function, Brown's almost linear function, a
discrete boundary value problem function, a discrete integral equation function,
banded functions, and the Chebyquad function, which does not have a root when
the number of variables is equal to 8. A secondary set of test problems consists of
three chemical equilibrium problems found in [3].
The testing of the codes to investigate their robustness follows the suggestions
made by Moré in [9]. This testing was done on the main set of test problems and
not on the chemical equilibrium problems.
To test the robustness with respect to the initial starting value, 15 problems
were run using Xs, 10 * Xs, and 100 * Xs as the initial starting value, where Xs is
the "standard" starting value. Four problems were run using Xs and 10 * Xs and
four problems using only Xs. This increased the number of test problems
to 57.
An underlying assumption is that codes perform better when all the functions
are approximately the same magnitude and all the variables are of the same
magnitude. Tests for robustness with respect to poor scaling were also implemented. Therefore, to test for poor variable scaling, we modified our test set as
follows:
f̂(x) = f(Xx);
and to test for poor function scaling, the test set was modified by
f̂(x) = X f(x).
In both cases f is the original set of 57 problems and X is a diagonal matrix. The
entries of X are σ(I) = 10^(5(2I - N - 1)/(N - 1)), I = 1, ..., N.
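For concreteness, the diagonal entries used in these scaling tests can be generated as follows; this is a direct Python transcription of the formula above:

```python
def scaling_entries(N):
    """Entries sigma(I) = 10**(5*(2*I - N - 1)/(N - 1)) for I = 1, ..., N.

    The entries run from 10**-5 up to 10**5, so the scaled problems mix
    components differing in magnitude by ten orders.
    """
    return [10.0 ** (5.0 * (2 * I - N - 1) / (N - 1)) for I in range(1, N + 1)]
```

For N = 3, for example, the entries are 10⁻⁵, 1, and 10⁵.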
It should be noted that in the documentation of the codes, BRENT, SOSNLE,
ZSYSTM, and HYBRD gave no warning about poor scaling. QN gives the
suggestion that if QN returns with IFLAG = 5, 6, or 7, perhaps the user should
rescale the variables or functions. C05NAF recommends that the user scale the
variables so that their magnitudes are similar. NS01A gives the strongest warning:
"The functions fᵢ should be scaled so that all are similar in magnitude. The same
applies to the variables xᵢ.... If these recommendations are not followed, the
routine may waste computing time."
Another test on the behavior of the codes was suggested by John Dennis and
Pablo Barrera [1]. They have been experimenting with systems of nonlinear
equations where the functions cannot be evaluated very accurately. An example
of this type of problem is when evaluating the function implies solving a system
of differential equations. In this case the user not only knows that there is error
in the function values but also has some idea of the size of the error. While none
of the codes claim to be able to handle this type of problem, we felt it would be
interesting to see how the codes would perform under these circumstances. To
implement this test, the test set was modified in the following manner:
f̂(x) = f(x) + 10⁻⁴ ε,
where ε is a random number uniformly distributed over the range (0, 1). These
problems will be referred to as fuzzy functions.
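The modification can be sketched as a simple wrapper. This Python illustration stands in for the test-harness change, and the default noise level of 10⁻⁴ is an assumption here, since the exponent is garbled in the source:

```python
import numpy as np

def make_fuzzy(f, noise=1e-4, rng=None):
    """Wrap f so every component picks up uniform noise in [0, noise).

    NOTE: noise=1e-4 is an assumed level, not a value taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng

    def fuzzy(x):
        fx = np.asarray(f(x), dtype=float)
        return fx + noise * rng.uniform(0.0, 1.0, size=fx.shape)

    return fuzzy
```

Because the perturbation is redrawn on every call, repeated evaluations at the same x differ, which is exactly the property that defeats root-acceptance tests tighter than the noise.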
The chemical equilibrium problems were included as a secondary set of test
problems because the system of nonlinear equations that results from this type
of problem is usually very poorly scaled and difficult to solve. In fact, the behavior
of most methods for solving systems of nonlinear equations has been so unsatisfactory that other methods have been devised to try to solve these problems; that
is, the most popular methods involve reformulating the problem so that optimization methods can be used. Another difficulty with these problems is that the
variables (which represent the amount of the chemicals) must be nonnegative.
None of the codes are really equipped to constrain the variables. (Because QN
requires the user to supply a bound on the variables, QN can be run in such a
way that negative solutions could be avoided.) One nice property of these
problems is that some of the equations are linear, for example, those that come
from the mass balance equations. If the equations are ordered so that the linear
equations come before the nonlinear equations, the Brown and Brent methods
take advantage of the linear equations by effectively reducing the size of the
problem by the number of linear equations.
Setting Parameters
As indicated earlier, the return criteria vary substantially. This fact made setting
of parameters (tolerances) to achieve equivalent tests impossible. No changes
were made to the existing codes to equalize the return criteria, because we wanted
to know how the original codes perform. In this section, we want to discuss some
of the difficulties in setting the parameters. For detailed information on how the
parameters were set, see [6].
Recall that we divided the return criteria into two classes, root acceptance and
other returns. The root acceptance criteria are more important in our testing,
because we want the codes to have equivalent tasks in finding a solution. The
difficulty is that three codes C05NAF, NS01A, and ZONE do not claim a solution
based on the change in x, but only on the size of the functions. In contrast, QN
can only claim solutions based on the change in x, not on the size of the functions.
Similarly, SOSNLE claims solutions based on the change in x and the very special
case when all of the functions are zero. BRENT, ZSYSTM, and HYBRD use
either the change in x or the size of the functions to claim a solution.
For setting the tolerances used in the root acceptance criteria, the codes were
divided into two groups, the Brown and Brent codes and the quasi-Newton and
hybrid codes. All of the Brown and Brent codes can claim solutions based on
both the change in x and the size of the functions. (SOSNLE requires all the
functions to be zero before a solution is claimed.) Therefore, for these codes, we
set the tolerance used in the change-in-x criterion to machine precision and the
tolerance used for the size of the functions to zero. At least for these codes,
claiming a solution was an equivalent task. In the other group of codes, all but
QN can claim a solution based on the size of the functions. For these codes we set
the tolerance used for the size of the functions criterion to machine precision.
(C05NAF will return with an error condition if the tolerance is less than machine
precision.) Although QN cannot actually claim a solution based on the size of the
functions, it does return when ‖f(x_new)‖ ≤ 10⁻¹⁰ ‖f(x₀)‖. When QN did return
because of this condition and ‖f(x_new)‖ was less than machine precision, we
credited QN with claiming a solution. (For almost all the problems, QN claimed
a solution based on the change in x.) HYBRD can also claim solutions based on
the change in x. For both HYBRD and QN the tolerance used by the change-in-x criterion was set to machine precision.
There are arguments for and against using machine precision as the tolerance
in the root acceptance criteria. One argument for using a larger tolerance is to
determine whether the codes are performing extra work to achieve the requested
accuracy. This can be a very important factor when solving certain types of
problems, for example, boundary value problems, which require a system of
nonlinear equations to be solved at each step and/or the function evaluations are
very expensive. In a case like this, not much accuracy is required, and the time to
solve each system of nonlinear equations can be very important because it is done
so often. A very strong argument for using machine precision as the tolerance is
that difficulties with the codes can be discovered when operating at limiting
precision. We chose the latter argument.
The fuzzy functions were first run using the tolerances set to machine precision
(fuzzy functions with strong tolerances). They were run a second time with the
tolerances set to size of the noise in the functions (fuzzy functions with weak
tolerances). For the fuzzy functions, a returned x value was accepted as a solution
if both the difference in the returned value and the known solution value for each
variable and the size of each function value were less than the noise in the
functions.
Most of the parameters used in the other return criteria were set by the
individual codes. One exception is the parameter for "too much work," either too
many function evaluations or too many iterations. Four codes returned because
of too many function evaluations and four codes returned because of too many
iterations. Because the number of function evaluations per iteration can vary
from code to code, the number of function evaluations was used as our return
criterion. A counter was added to the user-supplied subroutine that evaluates the
functions that counted the number of calls made to the subroutine. (Recall by
one function evaluation t hat we mean here the evaluation of all n component
functions. Th e Brown and Brent codes require only one component function
value with each call to the function subroutine; therefore n calls to their usersupplied subroutines were counted as one function evaluation.) Owing to the
counter, we could assume that none of the codes would stop because of too much
work before an equal number of function evaluations were performed.
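The counting convention just described can be sketched as a simple wrapper; this is illustrative Python standing in for the counter added to the FORTRAN test harness:

```python
class CountedFunction:
    """Count calls to the user-supplied function.

    For the quasi-Newton and hybrid codes one call evaluates all n
    components and counts as one function evaluation; for the Brown and
    Brent codes, n single-component calls would be counted as one.
    """

    def __init__(self, f):
        self.f = f
        self.nfev = 0

    def __call__(self, x):
        self.nfev += 1
        return self.f(x)
```

Wrapping each test function this way gives every code a common budget of function evaluations regardless of how it counts iterations internally.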
Most of the codes have a return indicating that the iterates are not improving
(or are even diverging). The criterion for determining this condition is based on
the size of the change in x or the size of the functions over the last k iterations or
l Jacobian evaluations. The parameters used in this type of criterion are set by
the codes.
A special option of interest is that of internal variable scaling. HYBRD is the
only code in the comparison that provides this option. In order to judge how well
this option worked, HYBRD was run twice on the problems, once using the
internal variable scaling option and once not using it.
4. NUMERICAL RESULTS
The performance of the codes was measured by why the codes returned and what
x value they returned. We group the performance into the following eight
categories.
(1) The code claimed a solution and it returned with a solution.
(2) The code did not claim a solution but it returned with a solution.
(3) The code claimed a solution but did not return with a solution.
The code returned before finding a solution because of
(4) too many function evaluations;
(5) inability to improve or divergence;
(6) a singular Jacobian or nearby stationary point.
The run was aborted before the code could return because of
(7) an overflow or indefinite condition;
(8) the time limit (7 seconds of CPU time on a CDC 7600).
Instead of presenting the vast amount of data generated by the testing in this
paper, we again refer the reader to [6], where the performance data are presented
in numerical tables and by using Chernoff faces. Information about the efficiency
of the codes with respect to the main problem tests, such as tables of function
evaluation counts and time, can also be found in [6]. In the remainder of this
section we comment on some aspects of the performance of the codes.
Recall that the Brent method uses orthogonal decomposition and the Brown
method uses Gaussian elimination. On the basis of the performance of BRENT,
a Brent code, and SOSNLE and ZSYSTM, both Brown codes, there appears to
be no advantage of the Brent method over the Brown method. The implementation appears to be more important than the actual method for these codes. For
example, out of 171 problems (no scaling, variable scaling, and function scaling)
both BRENT and SOSNLE found 98 solutions and ZSYSTM found 95. BRENT
and SOSNLE had little trouble with overflow or indefinite conditions, but
ZSYSTM returned 33 times for this reason. The only return condition for which
SOSNLE and ZSYSTM performed similarly is the return because of a singular
Jacobian. SOSNLE returned 15 times, ZSYSTM returned 17 times, and BRENT
never returned because of a singular Jacobian. (For these problems, BRENT
returned because of lack of progress.)
Also recall that the purpose of the hybrid method was to take advantage of the
efficiency of the quasi-Newton method and improve upon its local convergence.
QN is the only strictly quasi-Newton code in the comparison. For the 57 problems
with no scaling, the hybrid codes found approximately twice as many solutions as
QN, no matter which starting value was used. This seems to indicate that the
hybrid method is an improvement over the quasi-Newton method. However, the
results with respect to the variable and function scaling do not support that
conclusion. For the variable and function scaled problem sets, QN found as many
and sometimes more solutions than the hybrid codes C05NAF, NS01A, and
ZONE. For the 57 function scaled problems, QN and NS01A each found 26
solutions, C05NAF found 23, and ZONE found 22. For the 57 variable scaled
problems, C05NAF found 8 solutions, QN and NS01A each found 7, and ZONE
found only 3. For the function scaled problems, QN did almost as well as HYBRD
(HYBRD found 35 solutions), but HYBRD did much better for the variable
scaled problems (HYBRD found 30 solutions). One distinct difference between
the performance of QN and that of the hybrid codes is that QN returned much
more often because of a singular Jacobian (66 out of 171 problems). One important
difference among the hybrid codes is that NS01A had much more difficulty with
overflow or indefinite conditions.
It is the author's opinion that when a code is having difficulty solving a problem
(which is a definite possibility), it is more desirable for the code to detect this
condition and return because of "lack of progress" than not to detect it and return
only after the maximum number of function evaluations have been performed.
Therefore, a code should check for and return because of lack of progress.
ZSYSTM is the only code in the comparison that does not. However, C05NAF
and NS01A, as well as ZSYSTM, returned much more often than the other codes
because of too many function evaluations. C05NAF and NS01A use the same
checks for lack of progress. This result indicates that C05NAF and NS01A are
not using appropriate checks for lack of progress.
The following comments refer to the fuzzy function problems. Because of their
root acceptance criterion based on the size of the functions, when the strong
tolerances were used (note the tolerance was smaller than the noise in the
functions), all the hybrid codes, except HYBRD, could not claim solutions.
Therefore, these codes did find solutions, but they could not claim them and
returned for other reasons. HYBRD, QN, and the Brown and Brent codes, on the
other hand, could claim solutions because they use the change-in-x criterion.
ZSYSTM found 29 solutions and QN found 18 solutions; however, neither code
claimed any of the solutions. HYBRD claimed all 19 solutions it found.
When the weak tolerances were used, the hybrid codes, ZSYSTM, and QN claimed the solutions they had found but had been unable to claim when the strong tolerances were used. HYBRD found and claimed the same solutions it had found and claimed when the strong tolerances were used. HYBRD also claimed 36 additional "solutions" that were not solutions. For each of these problems, HYBRD claimed the solution based on the change-in-x root acceptance criterion. The other codes, BRENT and SOSNLE, found very few solutions and usually returned because of lack of improvement in the iterates.
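The two root acceptance criteria at issue here can be sketched as predicates. This is a schematic Python illustration; the tolerance forms and noise sizes are chosen for the example, not taken from any of the codes.

```python
def accept_by_function_size(fvals, ftol):
    """Claim a root when every residual component is within ftol of zero."""
    return max(abs(v) for v in fvals) <= ftol

def accept_by_change_in_x(x_new, x_old, xtol):
    """Claim a root when successive iterates agree to a relative tolerance."""
    return all(abs(a - b) <= xtol * (abs(b) + xtol)
               for a, b in zip(x_new, x_old))

# A residual contaminated by noise of size ~1e-8 fails a "strong" function
# tolerance near machine precision but passes a "weak" one set to the noise.
noisy_residual = [3e-9, -8e-9]
strong = accept_by_function_size(noisy_residual, 1e-15)   # fails
weak = accept_by_function_size(noisy_residual, 1e-7)      # passes

# The change-in-x test can claim the same point regardless of the noise.
by_x = accept_by_change_in_x([1.0 + 1e-12, 2.0], [1.0, 2.0], 1e-8)
```

This is the mechanism behind the behavior described above: with function noise larger than the strong tolerance, only a change-in-x test can claim the root at all.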
The following comments are with respect to the three chemical equilibrium
problems.
Problem 1, defined as

f1 = x2 - 10 = 0
f2 = x1 x2 - 5 × 10^4 = 0,
is a small problem but demonstrates the characteristics of these types of problems,
that is, poor scaling and some linear equations. The hybrid codes could not solve
this problem with the initial values given. When the order of the equations was
changed, the Brown and Brent codes did no better than the hybrid codes.
Problem 2 is defined as

f1 = x1 + x2 + x4 - 0.001 = 0
f2 = x5 + x6 - 55 = 0
f3 = x1 + x2 + x3 + 2x5 + x6 - 110.001 = 0
f4 = x1 - 0.1x2 = 0
f5 = x1 - 10^4 x3 x4 = 0
f6 = x5 - 5.5 × 10^14 x3 x6 = 0.
Note that four of the six equations are linear. This problem is the worst scaled of the three. Problem 2 was run twice. For the first run the functions were not scaled. For the second run, in an attempt to equalize the size of the functions, the functions were scaled by (10^3, 2 × 10^-2, 10^-2, 10, 10^-4, 10^-14). For the first run the Brown and Brent codes did the best: SOSNLE, ZSYSTM, and BRENT solved the problem for all four starting values. QN solved it for three of the starting values. (For x0 = (0, 0, 0, 0, 0, 0), QN returned because of a singular Jacobian.) HYBRD solved it for only two starting values, ZONE for only one, and C05NAF and NS01A returned because of too much work for all four starting values. For this problem the quasi-Newton code performed better than all the hybrid codes. The scaling had little effect on the performance of the Brown and
Brent codes and QN, but it did improve the performance of the hybrid codes.
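Function scaling of the kind applied to Problem 2 amounts to multiplying each residual component by a fixed factor before the solver sees it. A minimal Python sketch follows, using a hypothetical badly scaled two-equation system; the factors below are illustrative, not the vector used for Problem 2.

```python
def scaled(f, s):
    """Wrap a residual function f so component i is multiplied by s[i]."""
    def g(x):
        return [si * fi for si, fi in zip(s, f(x))]
    return g

# Hypothetical residuals whose components differ in size by twelve
# orders of magnitude.
def f(x):
    return [1e6 * (x[0] - 1.0), 1e-6 * (x[1] - 2.0)]

# Equalizing factors bring both components to O(1) at a typical point.
g = scaled(f, [1e-6, 1e6])
r = g([2.0, 3.0])   # both components are now of order 1
```

The wrapped function g, not f, is what would be handed to the solver; the solution set is unchanged since each equation is multiplied by a nonzero constant.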
With the scaling, HYBRD found a solution for all four starting values, NS01A for two, and C05NAF for one. It should be noted that C05NAF claimed erroneous solutions for the other three starting values when the scaling was used. This demonstrates the hazards of using just the size of f for root acceptance.
Problem 3, defined as

f1 = x1 + x4 - 3 = 0
f2 = 2x1 + x2 + x4 + x7 + x8 + x9 + 2x10 - R = 0
f3 = 2x2 + 2x5 + x6 + x7 - 8 = 0
f4 = 2x3 + x9 - 4R = 0
f5 = x1 x5 - 1.93 × 10^-1 x2 x4 = 0
f6 = x6 (x2)^(1/2) - 2.597 × 10^-3 (x2 x4 TOT)^(1/2) = 0
f7 = x7 (x4)^(1/2) - 3.448 × 10^-3 (x1 x4 TOT)^(1/2) = 0
f8 = x8 x4 - 1.799 × 10^-5 x2 TOT = 0
f9 = x9 x4 - 2.155 × 10^-4 x1 (x3 TOT)^(1/2) = 0
f10 = x10 (x4)^2 - 3.846 × 10^-5 (x4)^2 TOT = 0,

describes the combustion of propane in air. (TOT is the sum of the variables, and R is the amount of air present in the combustion. If R is too small, the mass balance equations cannot be satisfied.) This problem was very sensitive to the
starting values used. Every code got into trouble by taking a step that caused some of the variables to become negative, which, in turn, caused a negative argument for the square root function. By replacing variables 1 through 4 with the square of the variable, the codes performed much better. However, this technique is not trouble-free; it can cause additional scaling difficulties and singular Jacobians, and in this case there are no longer any linear equations. With this change of variables, the performance of the Brown and Brent codes was not substantially different from that of the hybrid codes.
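The squaring device used for Problem 3 can be sketched in a few lines. The toy two-equation system below is hypothetical (it is not Problem 3), chosen only so that the whole example fits a simple Newton iteration with a forward-difference Jacobian.

```python
def newton_2x2(f, y0, tol=1e-10, max_iter=50, h=1e-7):
    """Plain Newton iteration for a 2x2 system, forward-difference Jacobian."""
    y = list(y0)
    for _ in range(max_iter):
        r = f(y)
        if max(abs(v) for v in r) < tol:
            break
        J = [[0.0, 0.0], [0.0, 0.0]]
        for j in range(2):
            yp = list(y)
            yp[j] += h
            rp = f(yp)
            for i in range(2):
                J[i][j] = (rp[i] - r[i]) / h
        det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
        # solve J * dy = -r by Cramer's rule
        y[0] += (-r[0] * J[1][1] + r[1] * J[0][1]) / det
        y[1] += (-r[1] * J[0][0] + r[0] * J[1][0]) / det
    return y

# Toy residuals in the original variables x, which must stay nonnegative.
def f_x(x):
    return [x[0] + x[1] - 3.0, x[0] * x[1] - 2.0]

# Change of variables x_i = y_i**2: the iterates in y are unconstrained,
# yet the recovered x is nonnegative by construction.
def f_y(y):
    return f_x([y[0] ** 2, y[1] ** 2])

y = newton_2x2(f_y, [1.5, 0.5])
x = [y[0] ** 2, y[1] ** 2]
```

Note the drawbacks mentioned in the text are visible in the sketch: the Jacobian of f_y carries an extra factor of 2y_i per column, so it is singular whenever any y_i = 0, and linear equations in x become quadratic in y.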
5. CONCLUSIONS
T h e following conclusions are based on the performance of the codes on the main
set of problems.
A very surprising conclusion is t h a t the starting value seemed to have little
effect on the performance of the codes. This was just as true for the codes
implementing the "local methods," t h a t is, the Brown, Brent, and quasi-Newton
methods, as for the codes implementing Powell's hybrid method. It was also true
for each type of test, t h a t is, no scaling, variable scaling, function scaling, and
fuzzy functions.
A n o t h e r surprising conclusion is that the hybrid m e t h o d is not necessarily
m u c h of an i m p r o v e m e n t over the quasi-Newton method. It appears t h a t m u c h
of the i m p r o v e m e n t depends on the implementation of the hybrid method.
A general conclusion is t h a t some authors are m u c h more careful t h a n others
about how t h e y implement their codes. For instance, out of the 171 problems in
problem sets involving no scaling, variable scaling, and function scaling, ZSYSTM
aborted 33 problems because of an overflow or indefinite condition. Over the
same problem sets, SOSNLE, which implements the same method as ZSYSTM,
namely, Brown's method, had only 3 overflow or indefinite conditions. Similarly
for the hybrid codes, NS01A had 29 overflow or indefinite conditions, while
C05NAF had only 2.
On the original set of problems (no scaling), all the codes performed quite
similarly. The main difference is that the hybrid codes generally solved more
problems than the Brown and Brent codes and twice as many problems as the
quasi-Newton code.
With respect to the function scaled problems, the Brown and Brent codes
performed the best, that is, these codes found the most solutions. In fact, there is
essentially no difference between the performance of the Brown and Brent codes
on the problems with no scaling and their performance on the function scaled
problems. This is also true of the performance of QN, the quasi-Newton code.
These results indicate that the gradient step used by the hybrid codes is not very
beneficial when poor function scaling is present. The difference in the number of
problems solved between the Brown and Brent codes and QN is due in part to
the difficulty QN had in "finding" singular Jacobians (the return indicated that
the Jacobian formed by differencing appeared to be singular). Because the
function values must be supplied one at a time to the Brown and Brent codes,
they were able to adjust for the poor function scaling when approximating the
Jacobian.
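The Jacobian-by-differencing mentioned above can be sketched as follows. The per-variable step scaled to |x_j| is one common choice and is an assumption of this sketch; it does not reproduce the per-function adjustment available to the Brown and Brent codes, only the generic column-at-a-time differencing.

```python
def fd_jacobian(f, x, n, rel=1e-7):
    """Forward-difference Jacobian, one column (one variable) at a time."""
    fx = f(x)
    cols = []
    for j in range(n):
        # A step proportional to the size of x_j guards against poor
        # variable scaling; a badly chosen uniform step can make the
        # differenced Jacobian appear singular.
        h = rel * max(abs(x[j]), 1.0)
        xp = list(x)
        xp[j] += h
        fp = f(xp)
        cols.append([(fp[i] - fx[i]) / h for i in range(n)])
    # transpose the list of columns into rows
    return [[cols[j][i] for j in range(n)] for i in range(n)]

# Hypothetical residuals with analytic Jacobian [[2*x0, 0], [x1, x0]];
# at x = (2, 3) that is [[4, 0], [3, 2]].
J = fd_jacobian(lambda x: [x[0] ** 2, x[0] * x[1]], [2.0, 3.0], 2)
```

With n + 1 full residual evaluations per Jacobian, a code that receives the whole residual vector at once has no opportunity to rescale individual components, which is the contrast drawn above with the Brown and Brent interfaces.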
None of the codes performed very well on the variable scaled problems. Based on the number of solutions found, HYBRD performed the best. Recall that HYBRD is the only code in the comparison that has the option of internally scaling the variables. The surprising fact is that HYBRD's performance with no scaling by the code was better than its performance with internal variable scaling. This was true on all the problem sets. HYBRD has proved to be a very carefully implemented code; however, the internal scaling option is not a positive addition to the code. This definitely indicates the need for more work in the area of how to handle poor variable scaling.
With respect to the fuzzy functions, simply changing the tolerances used in the root acceptance criteria does not improve the performance of the codes. The performance of the codes that claim solutions based on the size of the functions did not really change between using the strong tolerances (machine precision) and the weak tolerances (the noise in a function). The codes solved the same problems in both cases. The only difference was that these codes could claim the solutions they found when the weak tolerance was used. The performance of the codes that use the change-in-x criterion varied. With the strong tolerances, ZSYSTM and QN found solutions but did not claim them, and HYBRD claimed the solutions it found. When the weak tolerance was used, ZSYSTM and QN claimed the solutions (ZSYSTM actually found a few more), but HYBRD got into trouble by claiming erroneous solutions (always based on the change in x). This indicates that simply changing the tolerances given to the codes is not enough to solve this kind of problem. Work needs to be done on modifying the root acceptance criteria to include information about the accuracy of the functions.
Conclusions based on the chemical equilibrium problems are that these are
still difficult problems for the codes and usually require special attention for the
codes to be able to solve them. If it is convenient to use the Brown or Brent codes
(recall that they require the user to supply one function value at a time), then the
poor scaling of the problem can be handled without special attention. The Brown
and Brent codes can also take advantage of the linearity of the problems if the
equations are ordered properly. If the user properly scales the functions, then
HYBRD appears to perform reasonably well. None of the codes have the ability
to constrain the solution; therefore, some technique, such as a change of variables,
must be used to prevent negative solutions.
In summary, the state of the art for solving unconstrained systems of nonlinear
equations seems to be as follows:
(1) The starting value has little effect on the performance of the codes.
(2) Poor function scaling has essentially no effect on the performance of the
Brown and Brent codes. This is due partly to the fact that the function values
are supplied one at a time.
(3) The hybrid method has not necessarily been an improvement over the quasi-Newton method, especially when poor variable or function scaling is present.
(4) Much more work needs to be done on how to handle poor variable scaling.
The following are areas in which the software might expand:
(1) In order to solve problems where the functions cannot be evaluated accurately, the codes should be able to incorporate information about the accuracy
of the functions.
(2) Add the ability to handle constraints such as nonnegativity and/or simple
bounds on the variables.
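As a sketch of point (2), the simplest form of such a capability is to project each trial iterate back into a box of simple bounds. This is a schematic illustration under that assumption, not a proposal for any particular code.

```python
def clip(v, lo, hi):
    """Clamp a scalar into the interval [lo, hi]."""
    return min(max(v, lo), hi)

def projected_step(x, dx, lo, hi):
    """Take a step and project the result back into the box [lo, hi]."""
    return [clip(xi + di, l, h) for xi, di, l, h in zip(x, dx, lo, hi)]

# A Newton-like step that would drive the first variable negative is
# clipped to the nonnegativity bound instead.
trial = projected_step([0.5, 2.0], [-2.0, 0.25], [0.0, 0.0], [10.0, 10.0])
```

Projection of this kind would make the change-of-variables workaround used for Problem 3, with its attendant scaling and singularity difficulties, unnecessary.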
Received August 1980; revised February 1981; accepted October 1981
ACM Transactions on Mathematical Software, Vol. 8, No. 1, March 1982.