
SIAM J. NUMER. ANAL.
Vol. 6, No. 4, December 1969
A QUADRATICALLY CONVERGENT NEWTON-LIKE
METHOD BASED UPON GAUSSIAN ELIMINATION*
KENNETH M. BROWN†
1. Introduction. Let a real-valued twice continuously differentiable system
of N nonlinear equations in N unknowns be given as
(1.1)  $f_1(x_1, x_2, \ldots, x_N) = 0,$
       $f_2(x_1, x_2, \ldots, x_N) = 0,$
       $\vdots$
       $f_N(x_1, x_2, \ldots, x_N) = 0,$
or in vector notation as

(1.2)  $f(x) = 0$.
In this paper we present an iterative method for the numerical solution of (1.1).
The method is a variation of Newton’s method incorporating Gaussian elimination
in such a way that the most recent information is always used at each step of the
algorithm. After specifying the method in terms of an iteration function, we prove
that the iteration converges locally and that the convergence is quadratic in
nature. Computer results are given and a comparison is made with Newton’s
method; these results illustrate the effectiveness of the method for nonlinear
systems containing linear or mildly nonlinear equations.
2. Notation. We shall introduce most of the notation as needed; however,
some comments concerning the symbols for partial differentiation are in order here.
If we are given a function G(u, v) in which
$u = u(x, y), \qquad v = v(x, y),$

then we adopt the following conventions:
(2.1)  $G_x \equiv \dfrac{\partial G}{\partial x} = G_u \dfrac{\partial u}{\partial x} + G_v \dfrac{\partial v}{\partial x} \equiv G_u u_x + G_v v_x, \qquad G_y \equiv G_u u_y + G_v v_y,$
whereas
(2.2)  $G_1 \equiv G_u, \qquad G_2 \equiv G_v;$
thus if we let $f_i$ denote the $i$th function of the system (1.1), $f_{ix_j} \equiv \partial f_i/\partial x_j$ is meant in the sense of (2.1) and $f_{ij} \equiv (f_i)_j$ is meant in the sense of (2.2).
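For instance (an illustrative example, not from the original text): if $G(u, v) = uv$ with $u = x + y$ and $v = xy$, then $G_1 \equiv G_u = v$ in the sense of (2.2), whereas $G_x = G_u u_x + G_v v_x = v \cdot 1 + u \cdot y$ in the sense of (2.1).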
* Received by the editors December 16, 1968, and in revised form June 21, 1969.
]" Department of Computer Science, Cornell University, Ithaca, New York 14850. This work
constitutes a portion of the author’s doctoral thesis at the Department of Computer Sciences, Division
of Mathematical Sciences, Purdue University, Lafayette, Indiana, 1966.
3. Newton's method. Let $x^0 = (x_1^0, \ldots, x_N^0)$ be an initial approximation to a real solution $r = (r_1, \ldots, r_N)$ of (1.1). The iteration for Newton's method is given by

(3.1)  $x^{n+1} = x^n - [J(f^n)]^{-1} f^n, \qquad n = 0, 1, 2, \ldots,$

where $J(f)$ is the Jacobian matrix $[\partial f_i/\partial x_j]$ and the superscript $n$ means that all functions involved are to be evaluated at $x = x^n$. The following local convergence
theorem is well known for Newton’s method (see e.g. [7]).
THEOREM 3.1. If
(i) in a closed region R whose interior contains a root $x = r$ of (1.2), each $f_i$ is twice continuously differentiable for $i = 1, \ldots, N$,
(ii) $J(f)$ is nonsingular at $x = r$, and
(iii) $x^0$ is chosen in R sufficiently close to $x = r$,
then the iteration (3.1) is convergent to $r$.
L. V. Kantorovich [6, p. 708] has given a nonlocal convergence theorem for Newton's method which guarantees the existence of the solution $r$ from information available in a sphere about $x^0$.
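For later reference in the comparisons of § 7, a minimal Python sketch of iteration (3.1) might read as follows; the name `newton`, the tolerance, and the iteration cap are our own illustrative choices, not part of the original paper.

```python
import numpy as np

def newton(f, jac, x0, tol=1e-15, max_iter=100):
    """Iteration (3.1): x^{n+1} = x^n - [J(f^n)]^{-1} f^n."""
    x = np.asarray(x0, dtype=float)
    for n in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx, np.inf) < tol:
            return x, n
        # Solve J(x^n) * delta = -f(x^n) rather than forming the inverse.
        x = x + np.linalg.solve(jac(x), -fx)
    return x, max_iter
```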
4. Description of the method. The algorithm which we develop is essentially
a modified Newton’s method based on Gaussian elimination; we approximate the
forward triangularization of the full Jacobian matrix by working with one row at a
time, eliminating one variable for each row treated. We shall assume that conditions
(i), (ii) and (iii) of Theorem 3.1 hold. Let $x^n$ denote an $n$th approximation to the root $x = r$ of (1.1). The method consists of applying the following steps:
Step 1. Expand $f_1(x)$ in a Taylor series about the point $x^n$; retain only first order terms and thus obtain the linear approximation

(4.1)  $f_1(x) \approx f_1(x^n) + f_{1x_1}(x^n)(x_1 - x_1^n) + f_{1x_2}(x^n)(x_2 - x_2^n) + \cdots + f_{1x_N}(x^n)(x_N - x_N^n).$

Equate the right-hand side of (4.1) to zero and solve for that variable, say $x_N$,
whose corresponding partial derivative is largest in absolute value. If we assume the hypotheses of Theorem 3.1, such an explicit solution can always be carried out provided that $x^n$ is sufficiently close to $r$ since, then, $J(f^n)$ will be close to $J(f)|_r$, a nonsingular matrix; this implies that at least one of the $f_{1x_j}(x^n)$ is different from zero. Thus we have
(4.2)  $x_N = x_N^n - \sum_{j=1}^{N-1} (f_{1x_j}/f_{1x_N})(x_j - x_j^n) - f_1/f_{1x_N},$

where $f_1 \equiv f_1(x^n)$ and $f_{1x_j} \equiv f_{1x_j}(x^n)$. The constants $f_{1x_j}(x^n)/f_{1x_N}(x^n)$, $j = 1, \ldots, N-1$, and $f_1(x^n)/f_{1x_N}(x^n)$ are saved for later use. For purposes of clarity we rename the left-hand side of (4.2) as $b_N$:

(4.3)  $b_N(x_1, x_2, \ldots, x_{N-1}) = \text{right-hand side of (4.2)},$

and define $b_N^n = b_N(x_1^n, x_2^n, \ldots, x_{N-1}^n)$. We note that $b_N^n = x_N^n - f_1^n/f_{1x_N}^n$.

Step 2. Define the function $g_2$ of the $N-1$ variables $x_1, \ldots, x_{N-1}$ by

(4.4)  $g_2(x_1, \ldots, x_{N-1}) = f_2(x_1, \ldots, x_{N-1}, b_N(x_1, \ldots, x_{N-1})),$
and $g_2^n$ by

$g_2^n = f_2(x_1^n, \ldots, x_{N-1}^n, b_N^n).$
Expand $g_2$ in a Taylor series, this time about the point $(x_1^n, \ldots, x_{N-1}^n)$; linearize and solve for that variable, say $x_{N-1}$, whose corresponding partial derivative is largest in magnitude, obtaining
(4.5)  $x_{N-1} = x_{N-1}^n - \sum_{j=1}^{N-2} (g_{2x_j}^n/g_{2x_{N-1}}^n)(x_j - x_j^n) - g_2^n/g_{2x_{N-1}}^n.$

Here $g_{2x_j}$ is obtained by differentiation of (4.4) using the chain rule and is given by

$g_{2x_j} = f_{2j} + f_{2N} \cdot b_{Nx_j} = f_{2j} + f_{2N} \cdot (-f_{1j}/f_{1N}).$
Renaming the left-hand side of (4.5) as $b_{N-1}$, a function of the $N-2$ remaining variables, we have

(4.6)  $b_{N-1}(x_1, x_2, \ldots, x_{N-2}) = \text{right-hand side of (4.5)}.$

Again the ratios formed, $g_{2x_j}^n/g_{2x_{N-1}}^n$, $j = 1, \ldots, N-2$, and $g_2^n/g_{2x_{N-1}}^n$, are saved for future use.
We shall show in Theorem 4.1 that under the hypotheses for Newton’s method
this process is well-defined, i.e., that there actually exists at least one nonvanishing
partial derivative at each stage of the process.
Step 3. Define $g_3 = f_3(x_1, \ldots, x_{N-2}, b_{N-1}, b_N)$, where $b_{N-1}$ and $b_N$ are obtained by back-substitution in the linear system (4.3) and (4.6), and repeat the process of expansion, linearization and elimination of one variable, saving the ratios formed. We note that $g_{3x_j}$ is obtained by differentiating

$f_3(x_1, \ldots, x_{N-2}, b_{N-1}(x_1, \ldots, x_{N-2}), b_N(x_1, \ldots, x_{N-2}, b_{N-1}(x_1, \ldots, x_{N-2})))$

with respect to $x_j$ at the point $(x_1^n, \ldots, x_{N-2}^n)$; thus,

$g_{3x_j} = f_{3j} + f_{3,N-1} \cdot (-g_{2x_j}^n/g_{2x_{N-1}}^n) + f_{3N} \cdot [(-f_{1j}/f_{1N}) + (f_{1,N-1}/f_{1N}) \cdot (g_{2x_j}^n/g_{2x_{N-1}}^n)].$
We continue in this manner, replacing one variable at a time, each $g_k$ being expanded about the point $(x_1^n, \ldots, x_{N-k+1}^n)$.
Step N. At this stage we have $g_N = f_N(x_1, b_2, b_3, \ldots, b_N)$, where the $b_j$'s are obtained by back-substitution in the $N-1$ rowed triangularized linear system built up (i.e., the extension of the system begun in (4.3) and (4.6)), which now has the form (omitting arguments):

(4.7)  $b_i = x_i^n - \sum_{j=1}^{i-1} (g_{N-i+1,x_j}/g_{N-i+1,x_i})(b_j - x_j^n) - g_{N-i+1}/g_{N-i+1,x_i}, \qquad i = N, N-1, \ldots, 2$

(with $g_1 \equiv f_1$ and $b_1 \equiv x_1$) so that $g_N$ is a function of just $x_1$. Now expanding, linearizing and solving for $x_1$, we obtain $x_1 = x_1^n - g_N^n/g_{Nx_1}^n$. We use the point $x_1$
thus obtained as the improved approximation $x_1^{n+1}$ to the first component $r_1$ of the root $r$, call it $b_1$, and back-solve the $b_j$ system (4.7) to get improved approximations to the other components $r_j$. Here $x_j^{n+1}$ will equal the corresponding $b_j$
obtained when back-solving. We note that the most recent information available
is immediately used in the construction of the next function argument, similar
to what is done in the Gauss-Seidel process for linear [3, pp. 194-203] and nonlinear [8, p. 503] systems of equations. The algorithm outlined in § 4 will be called "Algorithm 4" hereafter.
Remark. As the following example shows, Algorithm 4 is not mathematically
equivalent to Newton’s method.
Example. Consider the nonlinear system
$f(x, y) = x^2 - 2y + 1 = 0,$
$g(x, y) = x + 2y^2 - 3 = 0,$

which has a solution at $(1, 1)^T$. For $x^0 = (0, 0)^T$ we obtain

$x^1 = (3, 0.5)^T$ from Newton's method, whereas
$x^1 = (2.5, 0.5)^T$ from Algorithm 4.
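To spell out the Algorithm 4 step (a worked check of the figures above, not part of the original text): at $x^0 = (0, 0)^T$ the partials of $f$ are $f_x = 2x = 0$ and $f_y = -2$, so $y$ is the pivot variable, and the linearization $1 + 0 \cdot x - 2y = 0$ gives $b_2(x) = 0.5$. Substituting, $g_2(x) = g(x, 0.5) = x - 2.5$, whose zero is $x = 2.5$; back-substitution then returns $y = b_2(2.5) = 0.5$, so that $x^1 = (2.5, 0.5)^T$.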
Matrix formulation. Recall that for the sake of definiteness we eliminated the variables in the order $x_N, x_{N-1}, \ldots, x_2$. If we use the chain rule for differentiation to expand each derivative $g_{ix_j}$ which appears in Algorithm 4, we obtain the following matrix representation for the forward part of the method:
$$H_n(x^{n+1} - x^n) = -g^n,$$

where $H = (h_{ij})$ is given by $h_{1j} = f_{1j}$, $j = 1, \ldots, N$, and

(4.8)  $h_{ij} = (-1)^{i+1} \begin{vmatrix} f_{1j} & f_{1,N-i+2} & \cdots & f_{1N} \\ f_{2j} & f_{2,N-i+2} & \cdots & f_{2N} \\ \vdots & \vdots & & \vdots \\ f_{ij} & f_{i,N-i+2} & \cdots & f_{iN} \end{vmatrix} \bigg/ \begin{vmatrix} f_{1,N-i+2} & \cdots & f_{1N} \\ \vdots & & \vdots \\ f_{i-1,N-i+2} & \cdots & f_{i-1,N} \end{vmatrix}, \qquad i = 2, \ldots, N,$
and where the argument of each $f_{ij}$ is the progressive argument generated successively in Algorithm 4.
We observe that $h_{ij} = 0$ for $j > N - i + 1$; i.e., $H$ is transverse upper triangular. More important, the $h_{ij}$ have precisely the same form as the corresponding elements obtained by transverse triangularization of the Jacobian matrix $J$ using Gaussian elimination with partial pivoting. The argument of each $f_{ij}$ in the triangularized form of $J$ is simply $x^n$; however, at the root, the two types of arguments coincide (see Lemma 6.1).
Let us illustrate the matrix formulation explicitly for the 3 × 3 case. Here

$$H = \begin{pmatrix} f_{11} & f_{12} & f_{13} \\[6pt] -\begin{vmatrix} f_{11} & f_{13} \\ f_{21} & f_{23} \end{vmatrix} \Big/ f_{13} & -\begin{vmatrix} f_{12} & f_{13} \\ f_{22} & f_{23} \end{vmatrix} \Big/ f_{13} & 0 \\[12pt] \begin{vmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{vmatrix} \Big/ \begin{vmatrix} f_{12} & f_{13} \\ f_{22} & f_{23} \end{vmatrix} & 0 & 0 \end{pmatrix}$$
and the arguments are given by the following:

$x^n$ for $f_{1j}$,
$(x_1^n, x_2^n, b_3(x_1^n, x_2^n))$ for $f_{2j}$, and
$(x_1^n, b_2(x_1^n), b_3(x_1^n, b_2(x_1^n)))$ for $f_{3j}$, where $j = 1, 2, 3$.
THEOREM 4.1. Under the hypotheses for Newton's method given in Theorem 3.1, there exists a nonvanishing partial derivative $g_{ix_j}$ at the $i$th step of the elimination process defined by Algorithm 4.

Proof. Let $r$ again denote a solution of (1.2). When the $g_{ix_j}$ obtained at each step of Algorithm 4 are evaluated at the point $(r_1, r_2, \ldots, r_{N-i+1})$, the resulting values are equal to the elements, say $T_{ij}(r)$, which appear when triangularizing $J(f)|_r$ relative to its minor diagonal by Gaussian elimination with pivoting. (This follows from our mode of construction of the $g_{ix_j}$, or can be proven formally using induction when comparing the $T_{ij}(r)$ with (4.8) above.) We assumed, however, $J(f)|_r$ to be nonsingular; hence no row of $[T_{ij}]$ may contain all zeros. The conclusion now follows from our assumptions of twice continuous differentiability and sufficient closeness to $r$, since a matrix which is sufficiently close to an invertible matrix is itself invertible.
5. The iteration function. We may formalize the process by writing down the method in terms of the iteration function $F = (F_1, \ldots, F_N)$ used, beginning with a starting guess $x^0$ to form successively

(5.1)  $x^{n+1} = F(x^n), \qquad n = 0, 1, 2, \ldots.$
The iteration function, $F$, for Algorithm 4 is given by

(5.2)  $F_i(x_1, \ldots, x_N) = x_i - \sum_{j=1}^{i-1} (g_{N-i+1,x_j}/g_{N-i+1,x_i})(F_j - x_j) - g_{N-i+1}/g_{N-i+1,x_i}, \qquad i = 1, 2, \ldots, N,$
where as usual we define $\sum_{j=p}^{q} \equiv 0$ whenever $p > q$, and where

(5.3)  $g_1 = f_1(x_1, x_2, \ldots, x_N),$
       $\vdots$
       $g_{N-i+1} = f_{N-i+1}(x_1, \ldots, x_i, b_{i+1}, b_{i+2}, \ldots, b_N),$
       $\vdots$
       $g_N = f_N(x_1, b_2, b_3, \ldots, b_N).$
The $b_{i+1}, b_{i+2}, \ldots, b_N$, $i = 1, \ldots, N-1$, are themselves functions of the $x_j$ and are obtained recursively by successive substitution in the system

(5.4)  $b_{k+1} = x_{k+1} - \sum_{j=i+1}^{k} (g_{N-k,x_j}/g_{N-k,x_{k+1}})(b_j - x_j) - g_{N-k}/g_{N-k,x_{k+1}}, \qquad k = i, i+1, \ldots, N-1.$

For purposes of completeness, define $b_1 \equiv x_1$.
THEOREM 5.1. Any fixed point $x = r$ of the iteration function $F$ defined by (5.2)-(5.4) is a solution of the original system $f(x) = 0$.

Proof. Since $r = F(r)$, it follows from (5.2) that

(5.5)  $g_{N-i+1}(r_1, \ldots, r_i) = 0$ for $i = 1, 2, \ldots, N,$

and from (5.3) that

(5.6)  $f_{N-i+1}(r_1, \ldots, r_i, b_{i+1}, b_{i+2}, \ldots, b_N) = 0$ for $i = 1, 2, \ldots, N.$

As a consequence of (5.4), using (5.5), we have $b_{k+1} = r_{k+1}$, $k = i, i+1, \ldots, N-1$; but this implies from (5.6) that $f_{N-i+1}(r) = 0$, $i = 1, \ldots, N$.
6. Local quadratic convergence of the method. Henrici gives the following result for iteration functions [5, p. 104].

THEOREM 6.1. Let the functions $F_1, \ldots, F_N$ be defined in a region R, and let them satisfy the following conditions:
(i) The first partial derivatives of $F_1, \ldots, F_N$ exist and are continuous in R.
(ii) The system $x = F(x)$ has a solution $r$ in the interior of R such that $J(F)|_r = 0$, the zero matrix.
Then there exists a number $\varepsilon > 0$ such that algorithm (5.1) converges to $r$ for any choice of the starting point $x^0$ which satisfies $\|r - x^0\| < \varepsilon$, where $\|\cdot\|$ denotes the Euclidean norm.
We now establish the local quadratic convergence of Algorithm 4 by showing
that the Jacobian matrix of the iteration function F defined by (5.2)-(5.4) is the zero
matrix at the root $x = r$, and then appealing to Theorem 6.1. We remark that since
the order of the elimination of the variables may change from step to step, the
coordinatized representation of the iteration function will perhaps change from
point to point in the iteration sequence. At the root $x = r$, we shall assume the coordinatized representation induced by eliminating the variables in the order $x_N, x_{N-1}, \ldots, x_2, x_1$.
DEFINITION 6.1. $R_{N-i+1} \equiv (r_1, \ldots, r_{N-i+1})$, $i = 1, \ldots, N$.

LEMMA 6.1. For $x = r$, $b_i = r_i$ and $g_i(R_{N-i+1}) = 0$, $i = 1, \ldots, N$.

Proof. The conclusions follow directly from (5.4) and (5.3).
LEMMA 6.2. If $b_i = x_i - (g_j/g_{jx_i})$ and $g_{jx_i}(R_{N-j+1}) \neq 0$, then $b_{ix_i}(R_{N-j+1}) = 0$.

Proof. With all functions evaluated at $R_{N-j+1}$,

$b_{ix_i} = 1 - \dfrac{(g_{jx_i})^2 - g_j g_{jx_ix_i}}{(g_{jx_i})^2},$

and since, by Lemma 6.1, $g_j(R_{N-j+1}) = 0$, we obtain the required result.
LEMMA 6.3. For each $i = N-1, N-2, \ldots, 1$, $g_{N-i+1,x_j}(R_i) = 0$ whenever $j = i+1, i+2, \ldots, N$.

Proof. We must avoid the temptation to say that the result is true immediately, since $g_{N-i+1}$ seems to be independent of the variables $x_{i+1}, x_{i+2}, \ldots, x_N$. In reality, the $b_j$, $j = i+1, \ldots, N$, which appear as arguments of the function $g_{N-i+1}$ do depend on $x_{i+1}, \ldots, x_N$. The proof is accomplished by an induction within an induction, first on the row subscripts taken in the order $N-1, N-2, \ldots, 1$ and then on the relevant column subscripts. Lemmas 6.1 and 6.2 are invoked repeatedly.
THEOREM 6.2. $J(F)|_{x=r} = 0$, the zero matrix.

Proof. Now $F_{1x_1}(r) = 0$ by Lemma 6.2; moreover, for $j = 2, 3, \ldots, N$, $F_{1x_j}(r) = 0 - g_{Nx_j}(r)/g_{Nx_1}(r) = 0$ by Lemma 6.3. Now assume that for each $i = 1, 2, \ldots, k-1$, $2 \le k \le N$, we have shown

(6.1)  $F_{ix_j}(r) = 0$

for all $j = 1, \ldots, N$. We now show $F_{kx_j}(r) = 0$ for $j = 1, 2, \ldots, N$. Once accomplished, repeating the argument $N - k$ times will establish the required result.

From (5.2) with $i = k$, we first obtain, by using (6.1) and Lemma 6.1,

$F_{kx_m}(r) = -\dfrac{g_{N-k+1,x_m}(R_k)}{g_{N-k+1,x_k}(R_k)}(0 - 1) - \dfrac{g_{N-k+1,x_m}(R_k)}{g_{N-k+1,x_k}(R_k)} = 0$

for all $m = 1, 2, \ldots, k-1$; moreover,

$F_{kx_k}(r) = 1 - \dfrac{(g_{N-k+1,x_k}(R_k))^2}{(g_{N-k+1,x_k}(R_k))^2} = 0.$

Finally, for $m = k+1, k+2, \ldots, N$,

$F_{kx_m}(r) = 0 - \dfrac{g_{N-k+1,x_m}(R_k)}{g_{N-k+1,x_k}(R_k)} = 0$

by Lemma 6.3. Thus the theorem is established.
7. Numerical results. The structure of Algorithm 4 suggests that it should be
best suited to nonlinear systems in which the initial equations are nearly linear; i.e.,
in Algorithm 4, information generated from the first equations of the system enters
into computations performed with the remaining equations. This contrasts
sharply with Newton’s method in which all equations are treated simultaneously.
Example 7.1. We consider an extreme case of the situation presented in the
previous paragraph, namely a system in which all but the last equation are linear:
$$f_i(x) = -(N+1) + 2x_i + \sum_{\substack{j=1 \\ j \neq i}}^{N} x_j, \qquad i = 1, \ldots, N-1,$$

$$f_N(x) = -1 + \prod_{j=1}^{N} x_j.$$
In the computer results given in Table 1 for several values of N, the starting guess used for all cases was $\overline{0.5}$ (a vector of length N each component of which is 0.5). Algorithm 4 converged to the root $\overline{1}$ in each instance, and for $N = 5$ Newton's method converged to the root given approximately by $(-0.579, -0.579, -0.579, -0.579, 8.90)^T$. We observe that the Jacobian matrix of the system is nonsingular at the starting guess and at the two roots. In the table, "diverged" means that $\|x^n\|_\infty \to \infty$, whereas "converged" means that each component of $x^{n+1}$ agreed with the corresponding component of $x^n$ to 15 significant digits and $\|f(x^{n+1})\|_\infty < 10^{-15}$.
TABLE 1
Computer results for Example 7.1

  N     Algorithm 4                  Newton's Method
  5     converged in 6 iterations    converged in 18 iterations
  10    converged in 7 iterations    diverged ($\|x^n\|_\infty \sim 10^3$)
  15    converged in 8 iterations    diverged ($\|x^n\|_\infty \sim 10^5$)
  20    converged in 8 iterations    diverged ($\|x^n\|_\infty \sim 10^6$)
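Assuming the `brown_step` sketch from § 4, the system of Example 7.1 can be coded as follows; the driver loop is illustrative, not the exact program behind Table 1.

```python
import numpy as np

def make_system(N):
    """Example 7.1: N-1 linear equations plus one product equation."""
    fs = [lambda x, i=i: -(N + 1) + 2 * x[i] + sum(x[j] for j in range(N) if j != i)
          for i in range(N - 1)]
    fs.append(lambda x: -1.0 + np.prod(x))
    return fs

fs = make_system(5)
x = np.full(5, 0.5)        # starting guess (0.5, ..., 0.5)
for _ in range(10):
    x = brown_step(fs, x)  # from the sketch in Section 4
print(x)                   # should approach the root (1, ..., 1); cf. Table 1
```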
Example 7.2. In the following system, $f_1(x_1, x_2)$ is approximately linear near the roots:

$f_1(x_1, x_2) = x_1^2 - x_2 - 1,$
$f_2(x_1, x_2) = (x_1 - 2)^2 + (x_2 - 0.5)^2 - 1.$

The system has roots at

$r_1 = (1.54634288, 1.39117631)^T$; and
$r_2 = (1.06734609, 0.139227667)^T$.

The starting guess used was $(0.1, 2.0)^T$. The computer results are given in Table 2.
TABLE 2
Computer results for Example 7.2

  Method        Result
  Newton        converged to $r_1$ in 24 iterations
  Algorithm 4   converged to $r_2$ in 10 iterations
Example 7.3. The following system was studied first by Freudenstein and Roth [4] and later by Broyden [2]:

$f_1(x_1, x_2) = -13 + x_1 + ((-x_2 + 5)x_2 - 2)x_2,$
$f_2(x_1, x_2) = -29 + x_1 + ((x_2 + 1)x_2 - 14)x_2.$

This system has a solution at $(5, 4)^T$. In the computer results summarized in Table 3, the starting guess used in each case was $(15, -2)^T$.
TABLE 3
Computer results for Example 7.3

  Method                                Result
  Newton                                converged in 42 iterations
  Broyden's I [2, p. 591]               diverged
  Broyden's II [2, p. 591]              diverged
  Damped Newton (discrete form) [9]     diverged
  Algorithm 4                           converged in 10 iterations
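The Freudenstein-Roth system can be set up the same way (again assuming the `brown_step` sketch of § 4; an illustration, not the original test driver):

```python
import numpy as np

fs = [lambda x: -13 + x[0] + ((-x[1] + 5) * x[1] - 2) * x[1],
      lambda x: -29 + x[0] + ((x[1] + 1) * x[1] - 14) * x[1]]
x = np.array([15.0, -2.0])  # starting guess of Table 3
for _ in range(15):
    x = brown_step(fs, x)
print(x)                    # should approach the solution (5, 4)^T; cf. Table 3
```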
Remark. In [1] we have given a description of the discretized form of Algorithm 4 in which the analytic partial derivatives are approximated by first difference quotients. This discretized version of the method requires only $(N^2/2 + 3N/2)$ function evaluations per iterative step as compared with $(N^2 + N)$ evaluations for the discretized Newton's method. Experimental evidence shows a quadratic type of convergence behavior for this discretized form of Algorithm 4, but rigorous convergence results are yet to be obtained. In the latter connection, recent work by Ortega and Rheinboldt [7] appears extremely useful.
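For $N = 10$, for instance, the counts work out to $10^2/2 + 3 \cdot 10/2 = 65$ evaluations per step for the discretized Algorithm 4 against $10^2 + 10 = 110$ for the discretized Newton's method.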
Acknowledgments. The author wishes to thank Professor Samuel D. Conte,
his thesis advisor, for his abundant encouragement and support, and, in particular,
for many helpful discussions relative to this research. The author also wishes to
thank Professor Peter Henrici who guided and stimulated the author’s early work
in numerical analysis and who proposed research along the present lines. Finally,
the author wishes to thank Mr. D. Jordan, Mr. R. Tenney, and Mr. R. Weber for
assistance with the numerical experiments.
REFERENCES
[1] K. M. BROWN, Solution of simultaneous non-linear equations, Comm. ACM, 10 (1967), pp. 728-729.
[2] C. G. BROYDEN, A class of methods for solving nonlinear simultaneous equations, Math. Comp.,
19 (1965), pp. 577-593.
[3] S. D. CONTE, Elementary Numerical Analysis, McGraw-Hill, New York, 1965.
[4] F. FREUDENSTEIN AND B. ROTH, Numerical solutions of systems of nonlinear equations, J. Assoc.
Comput. Mach., 10 (1963), pp. 550-556.
[5] P. K. HENRICI, Elements of Numerical Analysis, John Wiley, New York, 1964.
[6] L. V. KANTOROVICH AND G. P. AKILOV, Functional Analysis in Normed Spaces, Pergamon, Oxford,
England, 1964.
[7] J. M. ORTEGA AND W. C. RHEINBOLDT, Iterative Methods for Nonlinear Operator Equations,
Blaisdell, Waltham, Massachusetts, to appear.
[8] J. M. ORTEGA AND M. L. ROCKOFF, Nonlinear difference equations and Gauss-Seidel type iterative
methods, this Journal, 3 (1966), pp. 497-513.
[9] H. SPÄTH, The damped Taylor's series method for minimizing a sum of squares and for solving systems of nonlinear equations, Comm. ACM, 10 (1967), pp. 726-728.