SIAM J. NUMER. ANAL. Vol. 6, No. 4, December 1969

A QUADRATICALLY CONVERGENT NEWTON-LIKE METHOD BASED UPON GAUSSIAN ELIMINATION*

KENNETH M. BROWN†

1. Introduction. Let a real-valued twice continuously differentiable system of N nonlinear equations in N unknowns be given as

(1.1)    f_1(x_1, x_2, ..., x_N) = 0,
         f_2(x_1, x_2, ..., x_N) = 0,
         ...
         f_N(x_1, x_2, ..., x_N) = 0,

or in vector notation as

(1.2)    f(x) = 0.

In this paper we present an iterative method for the numerical solution of (1.1). The method is a variation of Newton's method incorporating Gaussian elimination in such a way that the most recent information is always used at each step of the algorithm. After specifying the method in terms of an iteration function, we prove that the iteration converges locally and that the convergence is quadratic in nature. Computer results are given and a comparison is made with Newton's method; these results illustrate the effectiveness of the method for nonlinear systems containing linear or mildly nonlinear equations.

2. Notation. We shall introduce most of the notation as needed; however, some comments concerning the symbols for partial differentiation are in order here. If we are given a function G(u, v) in which u = u(x, y) and v = v(x, y), then we adopt the following conventions:

(2.1)    G_x ≡ ∂G/∂x = G_u ∂u/∂x + G_v ∂v/∂x ≡ G_u u_x + G_v v_x,
         G_y ≡ ∂G/∂y = G_u u_y + G_v v_y,

whereas

(2.2)    G_1 ≡ G_u,    G_2 ≡ G_v;

thus if we let f_i denote the ith function of the system (1.1), f_{i x_j} ≡ ∂f_i/∂x_j is meant in the sense of (2.1) and f_{ij} ≡ (f_i)_j is meant in the sense of (2.2).

* Received by the editors December 16, 1968, and in revised form June 21, 1969.
† Department of Computer Science, Cornell University, Ithaca, New York 14850. This work constitutes a portion of the author's doctoral thesis at the Department of Computer Sciences, Division of Mathematical Sciences, Purdue University, Lafayette, Indiana, 1966.

3. Newton's method.
Let x^0 = (x_1^0, ..., x_N^0) be an initial approximation to a real solution r = (r_1, ..., r_N) of (1.1). The iteration for Newton's method is given by

(3.1)    x^{n+1} = x^n − [J(f^n)]^{-1} f^n,    n = 0, 1, 2, ...,

where J(f) is the Jacobian matrix [∂f_i/∂x_j] and the superscript n means that all functions involved are to be evaluated at x = x^n. The following local convergence theorem is well known for Newton's method (see, e.g., [7]).

THEOREM 3.1. If (i) in a closed region R whose interior contains a root x = r of (1.2), each f_i is twice continuously differentiable for i = 1, ..., N, (ii) J(f) is nonsingular at x = r, and (iii) x^0 is chosen in R sufficiently close to x = r, then the iteration (3.1) is convergent to r.

L. V. Kantorovich [6, p. 708] has given a nonlocal convergence theorem for Newton's method which guarantees the existence of the solution r from information available in a sphere about x^0.

4. Description of the method. The algorithm which we develop is essentially a modified Newton's method based on Gaussian elimination; we approximate the forward triangularization of the full Jacobian matrix by working with one row at a time, eliminating one variable for each row treated. We shall assume that conditions (i), (ii) and (iii) of Theorem 3.1 hold. Let x^n denote an nth approximation to the root x = r of (1.1). The method consists of applying the following steps.

Step 1. Expand f_1(x) in a Taylor series about the point x^n; retain only first order terms and thus obtain the linear approximation

(4.1)    f_1(x) ≈ f_1(x^n) + f_{1x_1}(x^n)(x_1 − x_1^n) + f_{1x_2}(x^n)(x_2 − x_2^n) + ... + f_{1x_N}(x^n)(x_N − x_N^n).

Equate the right-hand side of (4.1) to zero and solve for that variable, say x_N, whose corresponding partial derivative is largest in absolute value. If we assume the hypotheses of Theorem 3.1, such an explicit solution can always be carried out provided that x^n is sufficiently close to r since, then, J(f^n) will be close to J(f)|_r, a nonsingular matrix; this implies that at least one of the f_{1x_j}(x^n) is different from zero.
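The elimination in Step 1 amounts to choosing the pivot variable with the largest first partial derivative and expressing it affinely in the remaining variables. A minimal Python sketch of this step follows; the function and variable names are illustrative, not from the paper.

```python
def step1_eliminate(f1_val, grad1, xn):
    """Linearize f1 about xn (value f1_val, gradient grad1) and solve
    the linearization for the variable whose partial derivative is
    largest in absolute value, as in Step 1 of the method."""
    N = len(xn)
    p = max(range(N), key=lambda j: abs(grad1[j]))  # pivot index

    def b(x):
        # Affine expression for the pivot variable x_p in terms of the
        # other variables (entry p of x is ignored).
        s = sum(grad1[j] * (x[j] - xn[j]) for j in range(N) if j != p)
        return xn[p] - (f1_val + s) / grad1[p]

    return p, b

# Illustration with f1(x, y) = x^2 - 2y + 1 at xn = (0, 0):
# f1 = 1 and grad f1 = (0, -2), so y is eliminated and b is constant 0.5.
p, b = step1_eliminate(1.0, (0.0, -2.0), (0.0, 0.0))
```

Since the gradient component for x vanishes at this particular point, the returned function b is constant: every choice of the remaining variable yields the same value 0.5 for the eliminated variable.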
Thus we have

(4.2)    x_N = x_N^n − Σ_{j=1}^{N−1} (f_{1x_j}/f_{1x_N})(x_j − x_j^n) − f_1/f_{1x_N},

where f_1 ≡ f_1(x^n) and f_{1x_j} ≡ f_{1x_j}(x^n). The constants f_{1x_j}(x^n)/f_{1x_N}(x^n), j = 1, ..., N−1, and f_1(x^n)/f_{1x_N}(x^n) are saved for later use. For purposes of clarity we rename the left-hand side of (4.2) as b_N:

(4.3)    b_N(x_1, x_2, ..., x_{N−1}) = right-hand side of (4.2),

and define b_N^n ≡ b_N(x_1^n, x_2^n, ..., x_{N−1}^n). We note that b_N^n = x_N^n − f_1^n/f_{1x_N}^n.

Step 2. Define the function g_2 of the N−1 variables x_1, ..., x_{N−1} by

(4.4)    g_2(x_1, ..., x_{N−1}) = f_2(x_1, ..., x_{N−1}, b_N(x_1, ..., x_{N−1})),

and g_2^n by g_2^n = f_2(x_1^n, ..., x_{N−1}^n, b_N^n). Expand g_2 in a Taylor series, this time about the point (x_1^n, ..., x_{N−1}^n), linearize, and solve for that variable, say x_{N−1}, whose corresponding partial derivative is largest in magnitude, obtaining

(4.5)    x_{N−1} = x_{N−1}^n − Σ_{j=1}^{N−2} (g_{2x_j}/g_{2x_{N−1}})(x_j − x_j^n) − g_2/g_{2x_{N−1}}.

Here g_{2x_j} is obtained by differentiation of (4.4) using the chain rule and is given by

         g_{2x_j} = f_{2j} + f_{2N} · b_{Nx_j} = f_{2j} + f_{2N} · (−f_{1x_j}/f_{1x_N}).

Renaming the left-hand side of (4.5) as b_{N−1}, a function of the N−2 remaining variables, we have

(4.6)    b_{N−1}(x_1, x_2, ..., x_{N−2}) = right-hand side of (4.5).

Again the ratios formed, g_{2x_j}/g_{2x_{N−1}}, j = 1, ..., N−2, and g_2/g_{2x_{N−1}}, are saved for future use. We shall show in Theorem 4.1 that under the hypotheses for Newton's method this process is well defined, i.e., that there actually exists at least one nonvanishing partial derivative at each stage of the process.

Step 3. Define g_3 = f_3(x_1, ..., x_{N−2}, b_{N−1}, b_N), where b_{N−1} and b_N are obtained by back-substitution in the linear system (4.3) and (4.6), and repeat the process of expansion, linearization and elimination of one variable, saving the ratios formed. We note that g_{3x_j} is obtained by differentiating

         f_3(x_1, ..., x_{N−2}, b_{N−1}(x_1, ..., x_{N−2}), b_N(x_1, ..., x_{N−2}, b_{N−1}(x_1, ..., x_{N−2})))

with respect to x_j at the point (x_1^n, ..., x_{N−2}^n); thus,

         g_{3x_j} = f_{3j} + f_{3,N−1} · (−g_{2x_j}/g_{2x_{N−1}}) + f_{3N} · [(−f_{1x_j}/f_{1x_N}) + (f_{1x_{N−1}}/f_{1x_N}) · (g_{2x_j}/g_{2x_{N−1}})].

We continue in this manner, replacing one variable at a time, each g_k being expanded about the point (x_1^n, ..., x_{N−k+1}^n).

Step N.
At this stage we have g_N = f_N(x_1, b_2, b_3, ..., b_N), where the b_j's are obtained by back-substitution in the (N−1)-rowed triangularized linear system built up (i.e., the extension of the system begun in (4.3) and (4.6)), which now has the form (omitting arguments)

(4.7)    b_i = x_i^n − Σ_{j=1}^{i−1} (g_{N−i+1,x_j}/g_{N−i+1,x_i})(x_j − x_j^n) − g_{N−i+1}/g_{N−i+1,x_i},    i = N, N−1, ..., 2,

(with g_1 ≡ f_1 and b_1 ≡ x_1), so that g_N is a function of just x_1. Now expanding, linearizing and solving for x_1, we obtain x_1 = x_1^n − g_N/g_{Nx_1}. We use the point x_1 thus obtained as the improved approximation x_1^{n+1} to the first component r_1 of the root r, call it b_1, and back-solve the b_j system (4.7) to get improved approximations to the other components r_j. Here x_j^{n+1} will equal the corresponding b_j obtained when back-solving. We note that the most recent information available is immediately used in the construction of the next function argument, similar to what is done in the Gauss–Seidel process for linear [3, pp. 194–203] and nonlinear [8, p. 503] systems of equations.

The algorithm outlined in §4 will be called "Algorithm 4" hereafter.

Remark. As the following example shows, Algorithm 4 is not mathematically equivalent to Newton's method.

Example. Consider the nonlinear system

         f(x, y) = x^2 − 2y + 1 = 0,
         g(x, y) = x + 2y^2 − 3 = 0,

which has a solution at (1, 1)^T. For x^0 = (0, 0)^T we obtain x^1 = (3, 0.5)^T from Newton's method, whereas x^1 = (2.5, 0.5)^T from Algorithm 4.

Matrix formulation. Recall that for the sake of definiteness we eliminated the variables in the order x_N, x_{N−1}, ..., x_2. If we use the chain rule for differentiation to expand each derivative g_{ix_j} which appears in Algorithm 4, we obtain the following matrix representation for the forward part of the method:

         H · (x^{n+1} − x^n) = −g,

where H = (h_{ij}) is given by

(4.8)    h_{ij} = (−1)^{i+1} det( f_{1j} f_{1,N−i+2} ... f_{1N} ; f_{2j} f_{2,N−i+2} ... f_{2N} ; ... ; f_{ij} f_{i,N−i+2} ... f_{iN} )
                              / det( f_{1,N−i+2} ... f_{1N} ; ... ; f_{i−1,N−i+2} ... f_{i−1,N} ),    j = 1, ..., N−i+1,

in which det( · ; · ) lists the rows of a determinant, and where the argument of each f_{kj} is the progressive argument generated successively in Algorithm 4.
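The Remark's 2 × 2 example can be checked directly. The sketch below implements one Newton step (3.1) and one step of Algorithm 4 specialized to this system (hand-rolled illustrative code, not the paper's program); it reproduces the first iterates (3, 0.5)^T and (2.5, 0.5)^T quoted above, and iterating the Algorithm 4 step drives the iterates to the solution (1, 1)^T.

```python
# The Remark's system: f(x, y) = x^2 - 2y + 1, g(x, y) = x + 2y^2 - 3.
def f1(x, y): return x * x - 2 * y + 1
def f2(x, y): return x + 2 * y * y - 3
def grad1(x, y): return (2 * x, -2.0)
def grad2(x, y): return (1.0, 4 * y)

def newton_step(x, y):
    """One step of (3.1); the 2x2 linear system is solved by Cramer's rule."""
    a, b = grad1(x, y)
    c, d = grad2(x, y)
    det = a * d - b * c
    v1, v2 = f1(x, y), f2(x, y)
    return x + (-v1 * d + b * v2) / det, y + (-a * v2 + c * v1) / det

def brown_step(x, y):
    """One step of Algorithm 4 for N = 2: eliminate the variable with the
    largest |partial| of f1, then solve the linearized g for the other."""
    a = grad1(x, y)
    v = f1(x, y)
    p = 0 if abs(a[0]) >= abs(a[1]) else 1   # pivot variable of f1
    q = 1 - p                                # remaining variable
    xq = (x, y)[q]

    def b(t):                                # pivot expressed in t = x_q
        return (x, y)[p] - (v + a[q] * (t - xq)) / a[p]

    def point(t):                            # full point with pivot replaced
        pt = [0.0, 0.0]
        pt[q], pt[p] = t, b(t)
        return pt[0], pt[1]

    g = f2(*point(xq))                       # g = f2 with pivot replaced by b
    a2 = grad2(*point(xq))
    gp = a2[q] + a2[p] * (-a[q] / a[p])      # chain rule, as in (4.4)-(4.5)
    return point(xq - g / gp)

x = (0.0, 0.0)
for _ in range(12):
    x = brown_step(*x)                       # approaches (1, 1)
```

The first Algorithm 4 step eliminates y (since |f_y| = 2 exceeds |f_x| = 0 at the origin), which is exactly why its first iterate differs from Newton's.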
We observe that h_{ij} = 0 for j > N + 1 − i, i.e., H is transverse upper triangular. More important, the h_{ij} have precisely the same form as the corresponding elements obtained by transverse triangularization of the Jacobian matrix J using Gaussian elimination with partial pivoting. The argument of each f_{ij} in the triangularized form of J is simply x^n; however, at the root, the two types of arguments coincide (see Lemma 6.1).

Let us illustrate the matrix formulation explicitly for the 3 × 3 case. Here

         h_{11} = f_{11},    h_{12} = f_{12},    h_{13} = f_{13};
         h_{21} = −det( f_{11} f_{13} ; f_{21} f_{23} ) / f_{13},
         h_{22} = −det( f_{12} f_{13} ; f_{22} f_{23} ) / f_{13},
         h_{23} = 0;
         h_{31} = det( f_{11} f_{12} f_{13} ; f_{21} f_{22} f_{23} ; f_{31} f_{32} f_{33} ) / det( f_{12} f_{13} ; f_{22} f_{23} ),
         h_{32} = h_{33} = 0,

and the arguments are given by the following: x^n for f_{1j}; (x_1^n, x_2^n, b_3(x_1^n, x_2^n)) for f_{2j}; and (x_1^n, b_2(x_1^n), b_3(x_1^n, b_2(x_1^n))) for f_{3j}, where j = 1, 2, 3.

THEOREM 4.1. Under the hypotheses for Newton's method given in Theorem 3.1, there exists a nonvanishing partial derivative g_{i,x_j} at the ith step of the elimination process defined by Algorithm 4.

Proof. Let r again denote a solution of (1.2). When the g_{ix_j} obtained at each step of Algorithm 4 are evaluated at the point (r_1, r_2, ..., r_{N−i+1}), the resulting values are equal to the elements, say T_{ij}(r), which appear when triangularizing J(f)|_r relative to its minor diagonal by Gaussian elimination with pivoting. (This follows from our mode of construction of the g_{ix_j}, or can be proven formally using induction when comparing the T_{ij}(r) with (4.8) above.) We assumed, however, J(f)|_r to be nonsingular; hence no row of [T_{ij}] may contain all zeros. The conclusion now follows from our assumptions of twice continuous differentiability and sufficient closeness to r, since a matrix which is sufficiently close to an invertible matrix is itself invertible.

5. The iteration function.
We may formalize the process by writing down the method in terms of the iteration function F = (F_1, ..., F_N) used, beginning with a starting guess x^0 to form successively

(5.1)    x^{n+1} = F(x^n),    n = 0, 1, 2, ....

The iteration function F for Algorithm 4 is given by

(5.2)    F_i(x_1, ..., x_N) = x_i − Σ_{j=1}^{i−1} (g_{N−i+1,x_j}/g_{N−i+1,x_i})(F_j − x_j) − g_{N−i+1}/g_{N−i+1,x_i},    i = 1, 2, ..., N,

where as usual we define Σ_{j=p}^{q} ≡ 0 whenever p > q, and where

(5.3)    g_1 = f_1(x_1, x_2, ..., x_N),
         g_{N−i+1} = f_{N−i+1}(x_1, ..., x_i, b_{i+1}, b_{i+2}, ..., b_N),    i = 1, ..., N−1,

so that, in particular, g_N = f_N(x_1, b_2, b_3, ..., b_N). The b_{i+1}, b_{i+2}, ..., b_N, i = 1, ..., N−1, are themselves functions of the x_j and are obtained recursively by successive substitution in the system

(5.4)    b_{k+1} = x_{k+1} − Σ_{j=i+1}^{k} (g_{N−k,x_j}/g_{N−k,x_{k+1}})(b_j − x_j) − g_{N−k}/g_{N−k,x_{k+1}},    k = i, i+1, ..., N−1.

For purposes of completeness, define b_1 ≡ x_1.

THEOREM 5.1. Any fixed point x = r of the iteration function F defined by (5.2)–(5.4) is a solution of the original system f(x) = 0.

Proof. Since r = F(r), it follows from (5.2) that, for i = 1, 2, ..., N,

(5.5)    g_{N−i+1}(r_1, ..., r_i) = 0,

and from (5.3) that

(5.6)    f_{N−i+1}(r_1, ..., r_i, b_{i+1}, b_{i+2}, ..., b_N) = 0,    i = 1, 2, ..., N.

As a consequence of (5.4), using (5.5), we have b_{k+1} = r_{k+1}, k = i, i+1, ..., N−1; but this implies from (5.6) that f_{N−i+1}(r) = 0, i = 1, ..., N.

6. Local quadratic convergence of the method. Henrici gives the following result for iteration functions [5, p. 104].

THEOREM 6.1. Let the functions F_1, ..., F_N be defined in a region R, and let them satisfy the following conditions: (i) The first partial derivatives of F_1, ..., F_N exist and are continuous in R. (ii) The system x = F(x) has a solution r in the interior of R such that J(F)|_r = 0, the zero matrix. Then there exists a number ε > 0 such that algorithm (5.1) converges to r for any choice of the starting point x^0 which satisfies ||r − x^0|| ≤ ε, where ||·|| denotes the Euclidean norm.
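As a numerical illustration of the quadratic convergence this section establishes, one can track the errors e_n = ||x^n − r|| for the 2 × 2 example of §4, whose solution is r = (1, 1)^T. The sketch below re-implements the N = 2 step of Algorithm 4 for that system (illustrative code, not from the paper) and records the error at each iterate; empirically the errors satisfy e_{n+1} ≤ e_n^2 along this run, consistent with quadratic convergence.

```python
from math import hypot

# The 2x2 example of Section 4; solution r = (1, 1).
def f1(x, y): return x * x - 2 * y + 1
def f2(x, y): return x + 2 * y * y - 3
def grad1(x, y): return (2 * x, -2.0)
def grad2(x, y): return (1.0, 4 * y)

def brown_step(x, y):
    """One step of Algorithm 4 specialized to N = 2 (see Section 4)."""
    a = grad1(x, y)
    v = f1(x, y)
    p = 0 if abs(a[0]) >= abs(a[1]) else 1      # pivot variable of f1
    q = 1 - p
    xq = (x, y)[q]

    def b(t):                                   # pivot expressed in t = x_q
        return (x, y)[p] - (v + a[q] * (t - xq)) / a[p]

    def point(t):
        pt = [0.0, 0.0]
        pt[q], pt[p] = t, b(t)
        return pt[0], pt[1]

    a2 = grad2(*point(xq))
    gp = a2[q] + a2[p] * (-a[q] / a[p])         # chain rule derivative
    return point(xq - f2(*point(xq)) / gp)

# Track e_n = ||x^n - r||_2; the error is squared (up to a constant)
# at every step, the signature of quadratic convergence.
x, errors = (0.0, 0.0), []
for _ in range(9):
    errors.append(hypot(x[0] - 1.0, x[1] - 1.0))
    x = brown_step(*x)
```

The error-ratio check is restricted to iterates well above machine precision, since once e_n reaches rounding level the quadratic contraction can no longer be observed in floating point.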
We now establish the local quadratic convergence of Algorithm 4 by showing that the Jacobian matrix of the iteration function F defined by (5.2)–(5.4) is the zero matrix at the root x = r, and then appealing to Theorem 6.1. We remark that since the order of the elimination of the variables may change from step to step, the coordinatized representation of the iteration function will perhaps change from point to point in the iteration sequence. At the root x = r, we shall assume the coordinatized representation induced by eliminating the variables in the order x_N, x_{N−1}, ..., x_2, x_1.

DEFINITION 6.1. R_{N−i+1} ≡ (r_1, ..., r_{N−i+1}).

LEMMA 6.1. For x = r, b_i = r_i and g_i(R_{N−i+1}) = 0, i = 1, ..., N.

Proof. The conclusions follow directly from (5.4) and (5.3).

LEMMA 6.2. If b_i = x_i − (g_j/g_{jx_i}) and g_{jx_i}(R_{N−j+1}) ≠ 0, then b_{ix_i}(R_{N−j+1}) = 0.

Proof. With all functions evaluated at R_{N−j+1},

         b_{ix_i} = 1 − [(g_{jx_i})^2 − g_j g_{jx_ix_i}]/(g_{jx_i})^2 = g_j g_{jx_ix_i}/(g_{jx_i})^2,

and since, by Lemma 6.1, g_j(R_{N−j+1}) = 0, we obtain the required result.

LEMMA 6.3. For each i = N−1, N−2, ..., 1, g_{N−i+1,x_j}(R_i) = 0 whenever j = i+1, i+2, ..., N.

Proof. We must avoid the temptation to say that the result is true immediately since g_{N−i+1} seems to be independent of the variables x_{i+1}, x_{i+2}, ..., x_N. In reality, the b_j, j = i+1, ..., N, which appear as arguments of the function g_{N−i+1} do depend on x_{i+1}, ..., x_N. The proof is accomplished by an induction within an induction, first on the row subscripts taken in the order N−1, N−2, ..., 1 and then on the relevant column subscripts. Lemmas 6.1 and 6.2 are invoked repeatedly.

THEOREM 6.2. J(F)|_{x=r} = 0, the zero matrix.

Proof. Now F_{1x_1}(r) = 0 by Lemma 6.2; moreover, for j = 2, 3, ..., N, F_{1x_j}(r) = 0 − g_{Nx_j}(r)/g_{Nx_1}(r) = 0 by Lemma 6.3. Now assume that for each i = 1, 2, ..., k−1, 2 ≤ k ≤ N, we have shown

(6.1)    F_{ix_j}(r) = 0

for all j = 1, ..., N. We now show F_{kx_j}(r) = 0 for j = 1, 2, ..., N. Once accomplished, repeating the argument N−1 times will establish the required result. From (5.2) with i = k, we first obtain, by using (6.1) and Lemma 6.1,

         F_{kx_m}(r) = g_{N−k+1,x_m}(R_k)/g_{N−k+1,x_k}(R_k) − g_{N−k+1,x_m}(R_k)/g_{N−k+1,x_k}(R_k) = 0

for all m = 1, 2, ..., k−1; moreover,

         F_{kx_k}(r) = 1 − (g_{N−k+1,x_k}(R_k))^2/(g_{N−k+1,x_k}(R_k))^2 = 0.

Finally, for m = k+1, k+2, ..., N,

         F_{kx_m}(r) = 0 − g_{N−k+1,x_m}(R_k)/g_{N−k+1,x_k}(R_k) = 0

by Lemma 6.3. Thus the theorem is established.

7. Numerical results. The structure of Algorithm 4 suggests that it should be best suited to nonlinear systems in which the initial equations are nearly linear; i.e., in Algorithm 4, information generated from the first equations of the system enters into computations performed with the remaining equations. This contrasts sharply with Newton's method, in which all equations are treated simultaneously.

Example 7.1. We consider an extreme case of the situation presented in the previous paragraph, namely a system in which all but the last equation are linear:

         f_i(x) = −(N+1) + 2x_i + Σ_{j=1, j≠i}^{N} x_j,    i = 1, ..., N−1,

         f_N(x) = −1 + Π_{j=1}^{N} x_j.

In the computer results given in Table 1 for several values of N, the starting guess used for all cases was 0.5 (a vector of length N each component of which is 0.5). Algorithm 4 converged to the root 1 in each instance, and for N = 5 Newton's method converged to the root given approximately by (−0.579, −0.579, −0.579, −0.579, 8.90)^T. We observe that the Jacobian matrix of the system is nonsingular at the starting guess and at the two roots. In the table, "diverged" means that ||x^n||_∞ → ∞, whereas "converged" means that each component of x^{n+1} agreed with the corresponding component of x^n to 15 significant digits and ||f(x^{n+1})||_∞ < 10^{−15}.

TABLE 1
Computer results for Example 7.1

    N    Algorithm 4                  Newton's method
    5    converged in 6 iterations    converged in 18 iterations
    10   converged in 7 iterations    diverged, ||x^n||_∞ > 10^3
    15   converged in 8 iterations    diverged, ||x^n||_∞ > 10^5
    20   converged in 8 iterations    diverged, ||x^n||_∞ > 10^6

Example 7.2.
In the following system, f_1(x_1, x_2) is approximately linear near the roots:

         f_1(x_1, x_2) = x_1^2 − x_2 − 1,
         f_2(x_1, x_2) = (x_1 − 2)^2 + (x_2 − 0.5)^2 − 1.

The system has roots at r^1 = (1.54634288, 1.39117631)^T and r^2 = (1.06734609, 0.139227667)^T. The starting guess used was (0.1, 2.0)^T. The computer results are given in Table 2.

TABLE 2
Computer results for Example 7.2

    Method         Result
    Newton         converged to r^1 in 24 iterations
    Algorithm 4    converged to r^1 in 10 iterations

Example 7.3. The following system was studied first by Freudenstein and Roth [4] and later by Broyden [2]:

         f_1(x_1, x_2) = −13 + x_1 + ((−x_2 + 5)x_2 − 2)x_2,
         f_2(x_1, x_2) = −29 + x_1 + ((x_2 + 1)x_2 − 14)x_2.

This system has a solution at (5, 4)^T. In the computer results summarized in Table 3, the starting guess used in each case was (15, −2)^T.

TABLE 3
Computer results for Example 7.3

    Method                               Result
    Newton                               converged in 42 iterations
    Broyden's I [2, p. 591]              diverged
    Broyden's II [2, p. 591]             diverged
    Damped Newton (discrete form) [9]    diverged
    Algorithm 4                          converged in 10 iterations

Remark. In [1] we have given a description of the discretized form of Algorithm 4 in which the analytic partial derivatives are approximated by first difference quotients. This discretized version of the method requires only (N^2/2 + 3N/2) function evaluations per iterative step as compared with (N^2 + N) evaluations for the discretized Newton's method. Experimental evidence shows a quadratic type of convergence behavior for this discretized form of Algorithm 4, but rigorous convergence results are yet to be obtained. In the latter connection, recent work by Ortega and Rheinboldt [7] appears extremely useful.

Acknowledgments. The author wishes to thank Professor Samuel D. Conte, his thesis advisor, for his abundant encouragement and support, and, in particular, for many helpful discussions relative to this research.
The author also wishes to thank Professor Peter Henrici, who guided and stimulated the author's early work in numerical analysis and who proposed research along the present lines. Finally, the author wishes to thank Mr. D. Jordan, Mr. R. Tenney, and Mr. R. Weber for assistance with the numerical experiments.

REFERENCES

[1] K. M. BROWN, Solution of simultaneous non-linear equations, Comm. ACM, 10 (1967), pp. 728–729.
[2] C. G. BROYDEN, A class of methods for solving nonlinear simultaneous equations, Math. Comp., 19 (1965), pp. 577–593.
[3] S. D. CONTE, Elementary Numerical Analysis, McGraw-Hill, New York, 1965.
[4] F. FREUDENSTEIN AND B. ROTH, Numerical solutions of systems of nonlinear equations, J. Assoc. Comput. Mach., 10 (1963), pp. 550–556.
[5] P. K. HENRICI, Elements of Numerical Analysis, John Wiley, New York, 1964.
[6] L. V. KANTOROVICH AND G. P. AKILOV, Functional Analysis in Normed Spaces, Pergamon, Oxford, England, 1964.
[7] J. M. ORTEGA AND W. C. RHEINBOLDT, Iterative Methods for Nonlinear Operator Equations, Blaisdell, Waltham, Massachusetts, to appear.
[8] J. M. ORTEGA AND M. L. ROCKOFF, Nonlinear difference equations and Gauss–Seidel type iterative methods, this Journal, 3 (1966), pp. 497–513.
[9] H. SPÄTH, The damped Taylor's series method for minimizing a sum of squares and for solving systems of nonlinear equations, Comm. ACM, 10 (1967), pp. 726–728.