GLOBAL CONVERGENCE OF DAMPED NEWTON'S METHOD FOR NONSMOOTH EQUATIONS VIA THE PATH SEARCH

DANIEL RALPH

Mathematics of Operations Research, Vol. 19, No. 2 (May 1994), pp. 352-389. Published by INFORMS.

A natural damping of Newton's method for nonsmooth equations is presented. This damping, via the path search instead of the traditional line search, enlarges the domain of convergence of Newton's method and therefore is said to be globally convergent. Convergence behavior is like that of line search damped Newton's method for smooth equations, including Q-quadratic convergence rates under appropriate conditions. Applications of the path search include damping Robinson-Newton's method for nonsmooth normal equations corresponding to nonlinear complementarity problems and variational inequalities, hence damping both Wilson's method (sequential quadratic programming) for nonlinear programming and Josephy-Newton's method for generalized equations. Computational examples from nonlinear programming are given.

Received January 18, 1991; revised October 27, 1992.
AMS 1980 subject classification. Primary: 49D15, 90C30, 90C33.
IAOR 1973 subject classification. Main: Optimization. Cross references: Programming: Nonlinear.
OR/MS Index 1978 subject classification. Primary: 653 Programming/Nonlinear.
Key words. Damped Newton's method, global convergence, first-order approximation, line search, path search, nonsmooth equations, normal mappings, variational inequalities, generalized equations, complementarity problems, nonlinear programming.

1. Introduction. This paper presents a novel, natural damping of (local) Newton's method, essentially Robinson-Newton's method (Robinson 1988), for solving nonsmooth equations. This damping, via the so-called path search instead of the traditional line search, enlarges the domain of convergence of Newton's method and therefore is said to be globally convergent. It is natural in the sense that the convergence behavior of the method is fundamentally the same as that of the traditional line search damped Newton's method applied to smooth equations, which, under appropriate conditions, is roughly described as linear convergence far from a solution and superlinear, possibly quadratic, convergence near a solution. Such convergence results are amongst the strongest results of their kind, but also require strong conditions; see further discussion below. We also investigate the nonmonotone path search (§§3, 4), which is an easy extension of the nonmonotone line search (Grippo, Lampariello and Lucidi 1986).
An immediate application is damping (Robinson-) Newton's method for solving the nonsmooth normal equations (Robinson 1992), yielding a damping procedure both for Wilson's method, or sequential quadratic programming (Wilson 1963, Fletcher 1987), for nonlinear programs, and for Josephy-Newton's method (Josephy 1979) for generalized equations. Path search damped Newton's method likewise applies to the normal equation formulation of variational inequalities and nonlinear complementarity problems, as described in §5.

For the moment, our prototype of a nonsmooth function will be

    F_+(x) := F(x_+) + x - x_+,  ∀x ∈ R^N,

where F: R^N → R^N is a smooth function, and x_+ is the vector in R^N whose ith component is max{x_i, 0}. F_+ is only nonsmooth on the boundaries of orthants in R^N. The system F_+(x) = 0 is actually the normal formulation of the nonlinear complementarity problem: find z ∈ R^N such that

    z ≥ 0,  F(z) ≥ 0,  ⟨z, F(z)⟩ = 0,

where vector inequalities are taken in a componentwise fashion. Note that if x solves the normal equation F_+(x) = 0, then z := x_+ solves the nonlinear complementarity problem; and, conversely, a solution z of the latter yields a solution x := z - F(z) of the former.

We need some notation to aid further explanation. Suppose X and Y are both N-dimensional normed spaces (though Banach spaces can be dealt with) and f: X → Y. We wish to solve the equation

    f(x) = 0,  x ∈ X.

For the traditional damped Newton's method (Ortega and Rheinboldt 1970, Burdakov 1980) we assume f is a continuously differentiable function. Suppose x^k is the kth iterate of the algorithm, and x̂^{k+1} is the zero of the "linearization"

    A_k(x) := f(x^k) + ∇f(x^k)(x - x^k),  x ∈ X.

Since A_k approximates f, x̂^{k+1} at least formally approximates a zero of f. Newton's method defines x^{k+1} := x̂^{k+1}, so x̂^{k+1} is called the Newton iterate. Newton's method is damped by a line search; that is, by choosing x^{k+1} on the line segment from x^k to x̂^{k+1} such that the decrease in ‖f(·)‖ after moving from x^k to x^{k+1} is close to the decrease predicted by ‖A_k(·)‖. The normed residual ‖f(·)‖ is the merit function for determining x^{k+1}. We refer to such methods as line search damped. Some convergence results under different line search strategies are set out by O. P. Burdakov (1980).

The computational success of line search damped Newton's method relies on uniformly bounded invertibility of the Jacobians ∇f(x^k), which yields two key properties of the linearizations A_k that are independent of k: first, A_k is a first-order approximation of f at x^k, i.e., f(x) = A_k(x) + o(x - x^k) where ‖o(x - x^k)‖/‖x - x^k‖ → 0 as x → x^k; and, secondly, A_k goes to zero "rapidly" on the path p^k(t) := x^k + t(x̂^{k+1} - x^k) as t goes from 0 to 1, by which we mean that, given our choice of A_k and x̂^{k+1},

    A_k(p^k(t)) = f(x^k) + t∇f(x^k)(x̂^{k+1} - x^k) = (1 - t)f(x^k).

We call p^k the Newton path.
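To make the prototype concrete, here is a minimal numerical sketch of the normal map F_+ and of the solution correspondence just stated; the function names and the sample F are illustrative choices of ours, not from the paper.

    import numpy as np

    def normal_map(F, x):
        """Normal map F_+(x) = F(x_+) + x - x_+ for the complementarity
        problem over the nonnegative orthant; x_+ is the componentwise
        positive part of x."""
        x_plus = np.maximum(x, 0.0)
        return F(x_plus) + x - x_plus

    # Example with F(z) = M z + q. If x solves F_+(x) = 0 then z = x_+
    # solves the complementarity problem z >= 0, F(z) >= 0, <z, F(z)> = 0.
    M = np.array([[2.0, 1.0], [1.0, 2.0]])
    q = np.array([-1.0, -1.0])
    F = lambda z: M @ z + q

    x = np.array([1.0 / 3.0, 1.0 / 3.0])        # here F_+(x) = 0
    z = np.maximum(x, 0.0)
    print(normal_map(F, x), z, F(z), z @ F(z))  # residual ~ 0, z solves NLCP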
Observe that f(p^k(t)) = (1 - t)f(x^k) + o(t), where o(t)/t → 0 as t ↓ 0, uniformly for all k. Thus, for a fixed σ ∈ (0, 1) and all sufficiently small positive t independent of k,

    ‖f(p^k(t))‖ ≤ (1 - σt)‖f(x^k)‖.

So there exists a step length t_k ∈ (0, 1] such that not only does the previous inequality hold at t = t_k, but there is a positive lower bound on t_k uniformly for all k. A (monotone) line search determines such a step length; then the damped Newton iterate is x^{k+1} := p^k(t_k). We see that the normed residual ‖f(·)‖ is decreased by at least a constant factor after each iteration. As the residual converges to zero, the iterates converge to a solution and, within a neighborhood of this solution, superlinear convergence of the iterates is observed.

We propose a damping of Newton's method suitable for a nonsmooth equation f(x) = 0, using first-order approximations A_k and paths p^k with the properties summarized above. For example, suppose f = F_+ from above, and the approximation to f at x^k is

    A_k(x) := F(x_+^k) + ∇F(x_+^k)(x_+ - x_+^k) + x - x_+,  ∀x ∈ R^N.

The approximation error f(x) - A_k(x) is the difference between F(x_+) and the linearization F(x_+^k) + ∇F(x_+^k)(x_+ - x_+^k), i.e., the relationship of f to A_k is like that of a smooth function to its linearization about a given point. Newton's method, essentially Robinson-Newton's method (Robinson 1988) in this context, defines x^{k+1} as the Newton iterate x̂^{k+1} = A_k^{-1}(0), as in the smooth case. A naive damping of this method is a line search along the interval [x^k, x̂^{k+1}]. Instead, we follow a path p^k from x^k (t = 0) to x̂^{k+1} (t = 1) on which A_k goes to zero rapidly: p^k(t), the Newton path, is defined by the equation

    A_k(p^k(t)) = (1 - t)f(x^k).

By assuming p^k(t) - x^k = O(t), we have that

    f(p^k(t)) = A_k(p^k(t)) + o(p^k(t) - x^k) = (1 - t)f(x^k) + o(t);

hence there is a path length t_k ∈ (0, 1] such that ‖f(p^k(t_k))‖ ≤ (1 - σt_k)‖f(x^k)‖. A path search determines the path length t_k; then the damped Newton iterate is x^{k+1} := p^k(t_k). It is not even necessary that x̂^{k+1} exist, so long as p^k can be defined on an interval [0, T_k] for suitable T_k ∈ (0, 1]. Assuming each A_k is invertible near x^k, uniformly in some sense for all k, the path search ensures that the residuals f(x^k) converge to zero at least linearly, and eventually we obtain superlinear convergence of the damped iterates to a solution.

We have outlined path search damped Newton's method. Since A_k need not be affine (it is piecewise smooth above), p^k need not be affine. In this case, there is no basis for the line search on [x^k, x̂^{k+1}]. It is even possible that both ‖A_k(x^k + t(x̂^{k+1} - x^k))‖ and ‖f(x^k + t(x̂^{k+1} - x^k))‖ initially increase as t increases from 0, causing the line search to fail altogether. The path search, however, may still be numerically and theoretically sound.
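As an illustration, here is a minimal sketch, in the same numpy style as above, of the piecewise-smooth approximation A_k for f = F_+; the helper names are ours, and JF is assumed to return the Jacobian of F.

    import numpy as np

    def make_A_k(F, JF, xk):
        """Approximation A_k of the normal map F_+ at xk: F is linearized
        about xk_+, while the nonsmooth projection x -> x_+ is kept exact."""
        xk_plus = np.maximum(xk, 0.0)
        Fk, Jk = F(xk_plus), JF(xk_plus)
        def A_k(x):
            x_plus = np.maximum(x, 0.0)
            return Fk + Jk @ (x_plus - xk_plus) + x - x_plus
        return A_k

Note that A_k(x^k) = F_+(x^k), and f(x) - A_k(x) = F(x_+) - [F(x^k_+) + JF(x^k_+)(x_+ - x^k_+)] is exactly the linearization error of the smooth function F, as claimed above.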
The remainder of the introduction is devoted to related literature. For purposes of comparison, we observe that the approach proposed here is distinguished from most, if not all, others by the use of nonsmooth Newton paths p^k. The sections to come are as follows: §2. Notation and preliminary results. §3. Motivation: Line search damped Newton's method for smooth equations. §4. Path search damped Newton's method for nonsmooth equations. §5. Applications.

One of the simplest approaches to (damping) Newton's method for nonsmooth equations is suggested by J.-S. Pang (1990): set

    B_k(x) := f(x^k) + f'(x^k; x - x^k),

where we assume the existence of the directional derivative f'(x; d) at each x in each direction d. As before, suppose x̂^{k+1} solves B_k(x) = 0. Assuming f is also locally Lipschitz, f'(x; ·) is a so-called B(ouligand)-derivative (Robinson 1987), and we say the Newton method given by x^{k+1} := x̂^{k+1} is B-Newton's method. Now f(x) = B_k(x) + o(x - x^k), and B_k is affine on the interval [x^k, x̂^{k+1}], hence damping B-Newton's method by a line search makes sense. A major difficulty, however, is that the closer x^k is to a point of nondifferentiability of f, the smaller the neighborhood of x^k in which B_k accurately approximates f. This difficulty is reflected in the dearth of strong global convergence results for such a scheme. For instance, the global convergence result (Pang 1990, Theorem 6) requires the existence of an accumulation point x* of the sequence of iterates (x^k) at which ‖f(·)‖² is differentiable; this requirement defeats the aim of solving nonsmooth equations. Harker and Xiao (1990) apply part of the theory of Pang (1990) to the nonsmooth normal mapping f = F_+ associated with the nonlinear complementarity problem and provide some computational experience of line search damped B-Newton's method. For further discussion, see §5.2.

A global Newton method with convergence results comparable to ours is given by J.-S. Pang (1991), for the "min" equation formulation of variational inequalities. The min equation for the nonlinear complementarity problem, above, is

    0 = f(x) := min{x, F(x)},

where the min operation is taken componentwise. In Pang (1991) a modified B-Newton method, with line search damping, is given for this min equation. This method is globally convergent under a regularity condition at each iterate. The regularity condition at the kth iterate ensures invertibility of the equation defining the modified B-Newton method, and is quite similar to the local invertibility property of the approximation A_k of F_+ needed in path search damped Newton's method. Nevertheless, the approach and analysis of Pang (1991) are quite different from ours, and have some of the flavor of Han, Pang and Rangaraj (1992), discussed next.

Line search damping of general Newton methods for nonsmooth equations is considered by S.-P. Han, J.-S. Pang and N. Rangaraj (1992). Given a general iteration function G(x^k, ·) from R^N to R^N, the associated Newton method defines x^{k+1} := x̂^{k+1} where x̂^{k+1} is such that

    f(x^k) + G(x^k, x̂^{k+1} - x^k) = 0.

To incorporate a line search on [x^k, x̂^{k+1}], conditions on the iteration function G(x^k, ·) are given to ensure a certain rate of decrease of ‖f(x^k + t(x̂^{k+1} - x^k))‖ as t increases, at least for all small t > 0 uniformly in k. Such conditions, in general, preclude f(x^k) + G(x^k, ·) from being a good approximation of f near x^k, uniformly in k. In effect, Han, Pang and Rangaraj (1992) give conditions on G(x^k, ·) such that line search damping is viable on the interval [x^k, x̂^{k+1}], while the approximation properties of G(x^k, ·) with respect to f are not of importance. In contrast, we must abandon the line search to retain what we believe are the fundamental properties of the approximations A_k that make the classical Newton's method successful.

Local and global Newton methods for locally Lipschitz, semismooth functions f are presented by L. Qi (1993). Such functions are B-differentiable.
The proposed Newton iterate x̂^{k+1} satisfies f(x^k) + M_k(x̂^{k+1} - x^k) = 0, where the matrix M_k is any member of a refinement of the Clarke generalized Jacobian (Clarke 1983) of f at x^k. When f is piecewise smooth, for example f(x) is F_+(x) or min{x, F(x)}, M_k can be chosen very simply: roughly, let f_k be one of the smooth functions that defines f on a "piece of smoothness" containing x^k, and take M_k := ∇f_k(x^k). The undamped method in this case reduces to the method of M. Kojima and S. Shindo (1986). Global convergence properties of the line search damped method are similar to those of line search damped B-Newton's method (Pang 1990), and can be enhanced using the framework of Han, Pang and Rangaraj (1992).

P. Marcotte and J.-P. Dussault (1987) have shown global convergence of Josephy-Newton's method (Josephy 1979) with line search damping, for monotone variational inequalities over compact convex sets, using a nonsmooth merit function. Related work using M. Fukushima's smooth merit function is given in Fukushima (1992), and by K. Taji, M. Fukushima and T. Ibaraki (1993), and J. H. Wu, M. Florian and P. Marcotte (1990), for line search damped Josephy-Newton's method applied to strongly monotone variational inequalities over closed convex, but not necessarily compact, sets. Our results are sharper when they apply, i.e., in the strongly monotone case; see Proposition 11 and the comments after its proof.

It can be argued that line search damping of a Newton method for nonsmooth equations lacks the elegance of path search damping; putting it differently, path search damping seems better suited than other damping methods to the standard merit function, the normed residual. This view, however, may have little bearing on the ultimate usefulness of the different damping procedures. For example, in §5 we use a modification of Lemke's algorithm to determine each path p^k; hence this implementation might be prey to the same exponential behavior as the original Lemke's algorithm for certain pathological problems, a difficulty not observed when applying line search damped B-Newton's method to such problems (Harker and Xiao 1990).

It has been pointed out by Liqun Qi that conditions sufficient for global and superlinear convergence of nonsmooth Newton methods are weaker than the assumptions made in this paper, which include a local invertibility property of A_k (see Theorem 9 and Proposition 10). For superlinear convergence see F. Bonnans (1990), M. Kojima and S. Shindo (1986), J.-S. Pang (1993), and L. Qi (1993). It seems, however, that substantially weaker conditions require the existence of the solution point a priori, whereas our procedure actually proves the existence of such a point. In fact the same criticism, and rebuttal of that criticism, can be applied to extensions of the classical Kantorovich-Newton theorem (Ortega and Rheinboldt 1970, Theorem 12.6.2) to generalized equations (Josephy 1979, Theorem 2.2) and nonsmooth equations (Robinson 1988, Theorem 3.2). Implications of our assumptions are explored further after Proposition 10. The weakest assumptions for global convergence, known to the author, are for Gauss-Newton-like methods, rather than Newton-like methods, such as the method of J.-S. Pang and S. A. Gabriel (1993). In this basic paper we prefer to make strong assumptions in order to use principles underlying the classical Newton method, and obtain strong results.
A Gauss-Newton approach to nonsmooth systems prior to Pang and Gabriel (1993) is given by J.-S. Pang, S.-P. Han and N. Rangaraj (1991). Also, M. C. Ferris and S. Lucidi (1994) present conditions on merit functions used in line search damping that are sufficient for global convergence.

Finally we draw the reader's attention to the starting point of this research, K. Park (1989), in which homotopy methods via point-based approximations are considered for solving nonsmooth equations. Stephen M. Robinson has suggested that, while Park takes the viewpoint of path following, this paper proposes "multiple path following".

2. Notation and preliminary results. Most of the interest for us is in finite dimensions, when both X and Y are N-dimensional real spaces, R^N, with arbitrary norms. Our most fundamental result, however, Theorem 9, is valid for complete normed spaces X, Y over R; hence this generality of X and Y will be used unless otherwise stated. Throughout, f is a function mapping X to Y, and B_X, B_Y denote the closed unit balls in X, Y respectively. The unit ball may be written B when the context is clear.

We write t ↓ 0 to mean t → 0, t > 0. By o(t) (as t ↓ 0) we mean a (vector) function of the scalar t such that o(t)/t → 0 in norm as t ↓ 0. Likewise, by O(t) (as t ↓ 0) we mean O(t)/t is bounded as t ↓ 0. When d ∈ X, o(d) denotes a function of d such that o(d)/‖d‖ → 0 as d → 0, d ≠ 0.

The function f is Lipschitz (of modulus l > 0) on a subset X_0 of X if ‖f(x) - f(x')‖ is bounded above by l‖x - x'‖ for any points x, x' in X_0. We say f is locally Lipschitz if it is Lipschitz near each x ∈ X. A function g from a subset X_0 of X to Y is said to be continuously invertible, or Lipschitzianly invertible (of modulus l > 0), if it is bijective and its inverse mapping is continuous, or Lipschitz (of modulus l), respectively. Such a function g is continuously invertible, or Lipschitzianly invertible (of modulus l), near a point x ∈ X_0 if, for some neighborhoods U of x in X and V of g(x) in Y, the restricted mapping g|_{U∩X_0}: U ∩ X_0 → V: x' ↦ g(x') is continuously invertible, or Lipschitzianly invertible (of modulus l). In defining this restricted mapping it is tacitly assumed that g(U ∩ X_0) ⊂ V.

We are interested in approximating f when it is not necessarily differentiable.

DEFINITION 1. Let X_0 ⊂ X.
(1) Let x ∈ X. A first-order approximation of f at x is a mapping Â: X → Y such that Â(x') - f(x') = o(x - x'). A first-order approximation of f on X_0 is a mapping 𝒜 on X_0 such that for each x ∈ X_0, 𝒜(x) is a first-order approximation of f at x.
(2) Let 𝒜 be a first-order approximation of f on X_0. 𝒜 is a uniform first-order approximation (with respect to X_0) if there exists Δ: (0, ∞) → [0, ∞] with Δ(t) = o(t), such that for any x, x' ∈ X_0,

    (1)    ‖𝒜(x)(x') - f(x')‖ ≤ Δ(‖x - x'‖).

𝒜 is a uniform first-order approximation near x⁰ ∈ X_0 if, for some Δ(t) as above, (1) holds for x, x' near x⁰.

The idea of a path will be needed to define the path search damping of Newton's method.

DEFINITION 2. A path (in X) is a continuous function p: [0, T] → X where T ∈ [0, 1]. The domain of p is [0, T], denoted dom(p).
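To make Definition 1 concrete, the following small check (an illustrative sketch of ours, not from the paper) evaluates the error ratio ‖A(x)(x') - f(x')‖/‖x' - x‖ for the approximation of the normal map F_+ from the introduction, with a sample one-dimensional F; the ratio should vanish as x' → x.

    # First-order approximation check for f = F_+ with F(z) = z**2 + 1, N = 1:
    # A(x)(x') = F(x_+) + F'(x_+)(x'_+ - x_+) + x' - x'_+.
    def f(x):
        xp = max(x, 0.0)
        return (xp ** 2 + 1.0) + x - xp

    def A(x, xprime):
        c, cp = max(x, 0.0), max(xprime, 0.0)
        return (c ** 2 + 1.0) + 2.0 * c * (cp - c) + xprime - cp

    x = 0.5
    for h in (1e-1, 1e-2, 1e-3, 1e-4):
        print(h, abs(A(x, x + h) - f(x + h)) / h)  # error ratio -> 0 with h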
We note a path lifting result.

LEMMA 3. Let Φ: X → Y, x ∈ X and Φ(x) ≠ 0. Suppose the restricted mapping Φ̄ := Φ|_U: U → V is continuously invertible, where U and V are neighborhoods of x and Φ(x), respectively. If U is open, and ε > 0 is such that Φ(x) + εB_Y ⊂ V, then, for 0 < T ≤ min{ε/‖Φ(x)‖, 1}, the unique path p of domain [0, T] such that

    p(0) = x,  Φ(p(t)) = (1 - t)Φ(x)  ∀t ∈ [0, T]

is given by p(t) = Φ̄^{-1}((1 - t)Φ(x)) ∀t ∈ [0, T].

PROOF. Let U, ε and T be as above, and observe that p: [0, T] → X: t ↦ Φ̄^{-1}((1 - t)Φ(x)) is a well-defined mapping. Suppose q: [0, T] → X is a path such that q(0) = x and, for t ∈ [0, T], Φ(q(t)) = (1 - t)Φ(x). Clearly q(t) = p(t) if and only if q(t) ∈ U, so consider

    t̄ := sup{t ∈ [0, T] | q(s) ∈ U, ∀s ∈ [0, t]}.

If t̄ ∈ [0, T) and q(s) ∈ U for each s ∈ [0, t̄), then q(t̄) = p(t̄) by continuity of p and q; so q(t̄) belongs to the open set U. Therefore, by continuity of q, q(t') belongs to U for each t' ∈ [0, T] near t̄. It follows that t̄ cannot be less than T, hence t̄ = T, and q(t) = p(t) for each t ∈ [0, T]. ∎

For a nonempty, closed convex set C in R^N and each x ∈ R^N, π_C(x) denotes the nearest point in C to x with respect to the Euclidean norm. The existence and uniqueness of the projected point π_C(x) is classical in the generality of Hilbert spaces. We refer the reader to Brezis (1973, Example 2.8.2 and Prop. 2.6.1), where we also see that the projection operator π_C is Lipschitz of modulus 1. The normal cone to C at z is

    N_C(z) := {y ∈ R^N | ⟨y, c - z⟩ ≤ 0, ∀c ∈ C}  if z ∈ C,  and ∅ otherwise.

The tangent cone to C at z, denoted T_C(z), is {x ∈ R^N | ⟨x, y⟩ ≤ 0, ∀y ∈ N_C(z)} if z ∈ C, and the empty set otherwise.

Next we present the normal mappings of Robinson (1992). These will be our source of applications (§5).

DEFINITION 4. Let C be a closed, convex, nonempty set in R^N and F: C → R^N. The normal mapping induced by F and C is

    F_C := F ∘ π_C + I - π_C,

where I is the identity operator on R^N.

Our applications will concern finding a zero of a normal mapping such as F_C above, where C is usually polyhedral convex. If F is smooth, its derivative mapping is denoted ∇F.

DEFINITION 5. Let F and C be as in Definition 4. F is continuously differentiable (on C) if the restriction of F to the relative interior of C, ri C, is continuously differentiable and, for each relative boundary point c̄ of C, the limit

    ∇F(c̄) := lim_{c → c̄, c ∈ ri C} ∇F(c)

exists.

If F, as above, is continuously differentiable (on C) then it is locally Lipschitz, just as in the classical case when C is a linear subspace; hence F_C is locally Lipschitz. We relate the normal mapping F_C to the set mapping F + N_C.

LEMMA 6. Suppose F, C are as in Definition 4, and F is locally Lipschitz. The mapping F_C is Lipschitzianly invertible near x if and only if, for some neighborhoods U, V of π_C(x), F_C(x) respectively, (F + N_C)^{-1} ∩ U is a (single-valued) Lipschitz function when restricted to V.

PROOF. It is well known (Brezis 1973, Example 2.8.2) that c = π_C(x) if and only if c ∈ C and ⟨x - c, c' - c⟩ ≤ 0, ∀c' ∈ C; that is, c ∈ C and x - c ∈ N_C(c). It follows that, with c = π_C(x),

    (2)    ξ = F_C(x)  if and only if  ξ ∈ (F + N_C)(c) and x = ξ + (I - F)(c);

thus c = π_C(ξ + (I - F)(c)). So for U ⊂ R^N, (F + N_C)(U) equals F_C(π_C^{-1}(U)) and, for ξ ∈ R^N, we get

    (3)    (F + N_C)^{-1}(ξ) ∩ U = [π_C(F_C^{-1}(ξ))] ∩ U.

Let x⁰ ∈ X and ξ⁰ := F_C(x⁰). Suppose F_C is Lipschitzianly invertible near x⁰, so, for some δ > 0 and neighborhood V⁰ of ξ⁰, the restriction of F_C^{-1} ∩ (x⁰ + 2δB) to V⁰ is a Lipschitz function that maps V⁰ into x⁰ + 2δB. Let

    U := (I - F)^{-1}(x⁰ - ξ⁰ + δB),  V := F_C(π_C^{-1}(U)) ∩ (ξ⁰ + εB),

where ε ∈ (0, δ) is chosen such that ξ⁰ + εB ⊂ V⁰. Observe U is a neighborhood of π_C(x⁰), since I - F is a continuous function which maps π_C(x⁰) to x⁰ - ξ⁰. Also V is a neighborhood of ξ⁰, since U is a neighborhood of π_C(x⁰) and F_C is continuously invertible near x⁰. Moreover, for ξ ∈ V and x ∈ F_C^{-1}(ξ) with π_C(x) ∈ U, we have

    x = ξ + (I - F)(π_C(x)) ∈ (ξ⁰ + δB) + (x⁰ - ξ⁰ + δB) ⊂ x⁰ + 2δB.
Thus ∅ ≠ [π_C(F_C^{-1}(ξ))] ∩ U ⊂ π_C[F_C^{-1}(ξ) ∩ (x⁰ + 2δB)]. As the set on the right is a singleton, this statement combined with (3) yields

    (F + N_C)^{-1}(ξ) ∩ U = π_C[F_C^{-1}(ξ) ∩ (x⁰ + 2δB)].

In particular (F + N_C)^{-1} ∩ U, as a mapping on V, is a Lipschitz function.

Conversely, suppose the restriction of (F + N_C)^{-1} ∩ U to V is a Lipschitz function, where U and V are respective neighborhoods of π_C(x⁰) and ξ⁰. Let U⁰ be a neighborhood of π_C(x⁰) in U such that F is Lipschitz on U⁰, and V⁰ := (F + N_C)(U⁰) ∩ V. So the restriction of (F + N_C)^{-1} ∩ U⁰ to V⁰ is a Lipschitz function, and V⁰ is a neighborhood of ξ⁰. Let U¹ := π_C^{-1}(U⁰), a neighborhood of x⁰. Using (2), we have

    F_C^{-1} ∩ U¹ = I + [I - F] ∘ [(F + N_C)^{-1} ∩ U⁰].

So the restriction of F_C^{-1} ∩ U¹ to V⁰ is a Lipschitz function, and it follows that F_C is Lipschitzianly invertible near x⁰. ∎

Finally, we restate Robinson (1988, Lemma 2.3), a generalization of the Banach perturbation lemma.

LEMMA 7. Let Ω be a set in X. Let g and g' be functions from X into Y such that g|_Ω: Ω → g(Ω) is Lipschitzianly invertible of modulus L > 0, and g - g' is Lipschitz on Ω of modulus η > 0. Let x⁰ ∈ Ω and δ > 0. If
(a) Ω ⊃ x⁰ + δB_X,
(b) g(Ω) ⊃ g(x⁰) + (δ/L)B_Y,
(c) ηL < 1,
then g'|_Ω: Ω → g'(Ω) is Lipschitzianly invertible of modulus L/(1 - ηL) > 0, and g'(Ω) ⊃ g'(x⁰) + (1 - ηL)(δ/L)B_Y.

PROOF. Define the perturbation function h(·) := g'(·) - g(·) - [g'(x⁰) - g(x⁰)]. Let ḡ := g|_Ω and observe that the Lipschitz property of ḡ^{-1} gives

    1/L ≤ 1/sup{‖ḡ^{-1}(y) - ḡ^{-1}(y')‖/‖y - y'‖ | y, y' ∈ g(Ω), y ≠ y'}
        = inf{‖g(x) - g(x')‖/‖x - x'‖ | x, x' ∈ Ω, x ≠ x'}.

Thus, according to Robinson (1988, Lemma 2.3), g + h satisfies

    (1/L) - η ≤ inf{‖(g + h)(x) - (g + h)(x')‖/‖x - x'‖ | x, x' ∈ Ω, x ≠ x'}

and (g + h)(Ω) contains g(x⁰) + (1 - ηL)(δ/L)B_Y. Therefore (g + h)|_Ω: Ω → (g + h)(Ω) is invertible and, as above, its inverse is Lipschitz of modulus 1/[(1/L) - η] = L/(1 - ηL). The claimed properties of g' hold because g' = g + h + g'(x⁰) - g(x⁰). ∎

3. Motivation: Line search damped Newton's method for smooth equations. Let f: X → Y be smooth, that is, continuously differentiable. We wish to solve the nonlinear equation

    f(x) = 0,  x ∈ X.

Suppose x^k ∈ X (k ∈ {0, 1, ...}) and there exists the Newton iterate x̂^{k+1}, i.e., x̂^{k+1} solves the equation

    f(x^k) + ∇f(x^k)(x - x^k) = 0,  x ∈ X.

Newton's method is inductively given by setting x^{k+1} := x̂^{k+1}.

The algorithm is also called local Newton's method because the Kantorovich-Newton theorem (Ortega and Rheinboldt 1970, Theorem 12.6.2), probably the best known convergence result for Newton's method, shows convergence of the Newton iterates to a solution in a ball of radius δ > 0 about the starting point x⁰. Assumptions include that ∇f(x⁰) is boundedly invertible; then δ is constructed small enough to ensure, by continuity of ∇f and the Banach perturbation lemma, that ∇f(x) is boundedly invertible at each x ∈ x⁰ + δB_X. Further conditions guarantee δ can also be chosen large enough such that the sequence of iterates remains in the δ-ball of x⁰, and convergence follows. It is well known (Burdakov 1980), however, that the domain of convergence of the algorithm can be substantially enlarged using line search damping, reviewed below, which preserves the asymptotic convergence properties of the local method.
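For reference before reviewing the line search, here is a minimal numpy sketch of the local (undamped) method; f and jac are assumed user-supplied callables, and the names are illustrative.

    import numpy as np

    def local_newton(f, jac, x0, tol=1e-10, max_iter=50):
        """Local Newton's method: x_{k+1} solves the linearization
        f(x_k) + Jf(x_k)(x - x_k) = 0; no damping, so convergence is
        guaranteed only close enough to a solution."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            if np.linalg.norm(f(x)) <= tol:
                break
            x = x + np.linalg.solve(jac(x), -f(x))
        return x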
Choose line search parameters σ, τ ∈ (0, 1). As we saw in the introduction, the Newton path p^k(t) := x^k + t(x̂^{k+1} - x^k) is such that f(p^k(t)) = (1 - t)f(x^k) + o(t) for t ≥ 0; hence for sufficiently small t > 0 and f(x^k) ≠ 0, we have Monotone Descent of the norm of the residual:

    (MD)    ‖f(p^k(t))‖ ≤ (1 - σt)‖f(x^k)‖.

In the Armijo line search familiar in optimization (McCormick 1983, Chapter 6, §1), the step length t_k is defined as τ^l, where l is the smallest integer in {0, 1, ...} such that (MD) holds at t = τ^l. The (line search) damped Newton iterate is x^{k+1} := p^k(t_k).

More recently, in the context of unconstrained optimization, Grippo, Lampariello and Lucidi (1986) have developed a line search using a Nonmonotone Descent condition that often yields better computational results than monotone damping. Let M ∈ ℕ, the memory length of the procedure, and relax the progress criterion (MD) to

    (NmD)    ‖f(p^k(t))‖ ≤ (1 - σt)max{‖f(x^{k+1-j})‖ | j = 1, ..., min{M, k + 1}}.

This is identical to (MD) when M = 1. The nonmonotone Armijo line search would determine t_k = τ^l similarly to the monotone case. We abstract the general properties of an unspecified Nonmonotone Line search procedure:

    (NmLs)    If (NmD) holds at t = 1, let t_k := 1. Otherwise, choose any t_k ∈ [0, 1] such that (NmD) holds at t = t_k and
              t_k ≥ τ sup{T ∈ [0, 1] | (NmD) holds ∀t ∈ [0, T]}.

It is easy to see that the above nonmonotone Armijo line search produces a step length that fulfills (NmLs). The parameter τ need not be explicitly used in damped Newton's algorithm, however, so other line search procedures in which τ is not specified may be valid; only the existence of τ, independent of k, is needed to prove convergence. The formal algorithm is now given.

Line search damped Newton's method. Given x⁰ ∈ X, the sequence (x^k) is inductively defined for k = 0, 1, ... as follows.
    If f(x^k) = 0, stop.
    Find x̂^{k+1} := x^k - ∇f(x^k)^{-1}f(x^k).
    Line search: Let p^k(t) := x^k + t(x̂^{k+1} - x^k) for t ∈ [0, 1]. Find t_k ∈ [0, 1] satisfying (NmLs).
    Define x^{k+1} := p^k(t_k).

We present a basic convergence result for the line search damped, or so-called global, Newton's method. It is a corollary of Proposition 10.

PROPOSITION 8. Let f: R^N → R^N be continuously differentiable, a₀ > 0 and

    X₀ := {x ∈ R^N | ‖f(x)‖ ≤ a₀}.

Let σ, τ ∈ (0, 1) and M ∈ ℕ be the line search parameters governing (NmLs). Suppose X₀ is bounded, and ∇f(x) is invertible for each x ∈ X₀. Then for each x⁰ ∈ X₀, line search damped Newton's method is well defined and the sequence (x^k) converges to a zero x* of f.

The residuals converge to zero at least at an R-linear rate: for some constant ρ ∈ (0, 1) and all k ≥ 0,

    ‖f(x^k)‖ ≤ ρ max{‖f(x^{k-j})‖ | j = 1, ..., min{M, k}} ≤ ρ^k‖f(x⁰)‖.

The rate of convergence of (x^k) to x* is Q-superlinear. In particular, if ∇f is Lipschitz near x* then (x^k) converges Q-quadratically to x*:

    ‖x^{k+1} - x*‖ ≤ d‖x^k - x*‖²

for some constant d > 0 and all sufficiently large k.

The convergence properties of the method depend both on the uniform accuracy of each of the approximations A_k of f at x^k for all k, i.e.,

    ‖f(x) - A_k(x)‖/‖x - x^k‖ → 0 as x → x^k (x ≠ x^k), uniformly ∀k,

and on the uniformly bounded invertibility of the approximations A_k:

    sup_k ‖∇f(x^k)^{-1}‖ < ∞.

These uniformness properties are disguised in the boundedness (hence compactness) hypothesis on X₀.
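The following numpy sketch implements the scheme just stated with a nonmonotone Armijo backtracking rule; it is illustrative only, and f and jac are assumed user-supplied.

    import numpy as np

    def damped_newton(f, jac, x0, sigma=1e-4, tau=0.5, M=1,
                      tol=1e-10, max_iter=100):
        """Line search damped Newton's method with the nonmonotone
        Armijo rule: accept the largest t = tau**l satisfying (NmD)."""
        x = np.asarray(x0, dtype=float)
        res = [np.linalg.norm(f(x))]             # residual history for (NmD)
        for _ in range(max_iter):
            if res[-1] <= tol:
                break
            d = np.linalg.solve(jac(x), -f(x))   # x_hat^{k+1} - x^k
            ref = max(res[-M:])                  # last min{M, k+1} residuals
            t = 1.0
            while np.linalg.norm(f(x + t * d)) > (1.0 - sigma * t) * ref:
                t *= tau                         # backtrack: t = tau**l
            x = x + t * d
            res.append(np.linalg.norm(f(x)))
        return x

With M = 1 this is the monotone Armijo search (MD); larger M permits temporary increases of the residual.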
4. Path search damped Newton's method for nonsmooth equations. We want to solve the nonlinear and, in general, nonsmooth equation

    f(x) = 0,  x ∈ X,

where f: X → Y. We proceed as in the smooth case, the main difference being the use of first-order approximations of the function f instead of linearizations. Suppose x^k ∈ X (k ∈ {0, 1, ...}) and A_k := 𝒜(x^k), where 𝒜 is a first-order approximation of f on X. Recall (Definition 2) a path is a continuous mapping of the form p: [0, T] → X where T ∈ [0, 1]. Assume there exists a path p^k: [0, 1] → X such that, for t ∈ [0, 1],

    (4)    A_k(p^k(t)) = (1 - t)f(x^k);

p^k is the Newton path. Assume also that p^k(t) - x^k = O(t). Note that x̂^{k+1} := p^k(1) is the Newton iterate, a solution of the equation A_k(x) = 0.

In nonsmooth Newton's method, the next iterate is given by x^{k+1} := x̂^{k+1}, just as in smooth Newton's method. For nonsmooth functions having a uniform first-order approximation we call this Robinson-Newton's method, since the ideas behind local convergence results are essentially the same as those employed in the seminal paper (Robinson 1988), although a special uniform first-order approximation called the point-based approximation is required there (see discussion after Proposition 10). (Robinson 1988, Theorem 3.2) is the nonsmooth version of the classical Kantorovich-Newton convergence theorem (Ortega and Rheinboldt 1970, Theorem 12.6.2). Applications of Robinson-Newton's method include sequential quadratic programming (Fletcher 1987), or Wilson's method (Wilson 1963), for nonlinear programming; and Josephy-Newton's method (Josephy 1979) for generalized equations (see also §5). As in the smooth case, however, convergence of the method is shown within a ball of radius δ > 0 about x⁰. The first-order approximation A⁰ := 𝒜(x⁰) of f at x⁰ is assumed to be Lipschitzianly invertible near x⁰ and then, for small enough δ, the continuity properties of 𝒜(·) are used in Lemma 7 to show that 𝒜(x) is Lipschitzianly invertible near each x ∈ x⁰ + δB_X. Further assumptions guarantee that Robinson-Newton's method generates a sequence lying in this δ-ball of x⁰, and convergence follows.

We propose to enlarge the domain of convergence using a path search. Now A_k is a first-order approximation of f at x^k, and o(p^k(t) - x^k) = o(t), assuming p^k(t) - x^k = O(t). With (4) we find that

    (5)    f(p^k(t)) = A_k(p^k(t)) + o(p^k(t) - x^k) = (1 - t)f(x^k) + o(t),

i.e., f moves toward zero rapidly on the path p^k as t increases from 0, at least initially. In the spirit of §3, we fix σ, τ ∈ (0, 1). As before, assuming f(x^k) ≠ 0, we have for all sufficiently small positive t,

    ‖f(p^k(t))‖ ≤ (1 - σt)‖f(x^k)‖.

So the nonmonotone descent condition below, formally identical to that given in §3, is valid given any memory size M ∈ ℕ and all small positive t:

    (NmD)    ‖f(p^k(t))‖ ≤ (1 - σt)max{‖f(x^{k+1-j})‖ | j = 1, ..., min{M, k + 1}}.

The path search is any procedure satisfying: if (NmD) holds at t = 1, let t_k := 1; otherwise, choose any t_k ∈ [0, 1] such that (NmD) holds at t = t_k and

    t_k ≥ τ sup{T ∈ [0, 1] | (NmD) holds ∀t ∈ [0, T]}.

The (path search) damped Newton iterate is x^{k+1} := p^k(t_k). The path search takes t_k = 1, hence the Newton iterate x̂^{k+1} = p^k(1), if possible, and otherwise a path length t_k large enough to prevent premature convergence under further conditions. Also, as in §3, τ need not be specified explicitly.

However, the path search given above is too restrictive in practice; in particular it assumes existence of the Newton iterate x̂^{k+1} = A_k^{-1}(0). Motivated by computation (§5) we only assume the path p^k: [0, T_k] → X can be constructed for some T_k ∈ (0, 1].
The path is constructed by iteratively extending the domain [0, T_k] until either it cannot be extended further (e.g., T_k = 1) or the progress criterion (NmD) is violated at t = T_k. The path length t_k is then chosen with reference to this upper bound T_k. Of course we are still assuming that the path p^k satisfies the Path conditions:

    (P)    p^k(0) = x^k,  A_k(p^k(t)) = (1 - t)f(x^k), ∀t ∈ dom(p^k),

where dom(p^k) = [0, T_k]. The idea of extending the path p^k is justified by Lemma 3, taking Φ = A_k and x = p^k(T_k), which says that if A_k is continuously invertible near p^k(T_k) and T_k < 1, then p^k can be defined over a larger domain (i.e., T_k is strictly increased) while still satisfying (P).

We use the following Nonmonotone Path search, in which p^k: [0, T_k] → X is supposed to satisfy (P):

    (NmPs)    If (NmD) holds at t = T_k, let t_k := T_k. Otherwise, choose any t_k ∈ [0, T_k] such that (NmD) holds at t = t_k and
              t_k ≥ τ sup{T ∈ [0, T_k] | (NmD) holds ∀t ∈ [0, T]}.

Now we give the algorithm formally. As the choice of t_k may be determined during the construction of p^k (see §5), we have not separated the construction of p^k from the path search in the algorithm.

Path search damped Newton's method. Given x⁰ ∈ X, the sequence (x^k) is inductively defined for k = 0, 1, ... as follows.
    If f(x^k) = 0, stop.
    Path search: Let A_k := 𝒜(x^k). Construct a path p^k: [0, T_k] → X, T_k ∈ (0, 1], satisfying (P) such that if T_k < 1 then either A_k is not continuously invertible near p^k(T_k), or (NmD) fails at t = T_k. Find t_k ∈ [0, T_k] satisfying (NmPs).
    Define x^{k+1} := p^k(t_k).
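Before stating the convergence theorem, here is an illustrative numpy sketch of the loop just described; make_path is an assumed user-supplied constructor returning a Newton path p and its endpoint T_k per condition (P), and the backtracking rule stands in for a generic (NmPs) procedure.

    import numpy as np

    def path_search_newton(f, make_path, x0, sigma=1e-4, tau=0.5, M=1,
                           tol=1e-10, max_iter=100):
        """Path search damped Newton's method (sketch). make_path(xk)
        returns (p, Tk) with p(t) solving A_k(p(t)) = (1 - t) f(xk)
        on [0, Tk], as in condition (P)."""
        x = np.asarray(x0, dtype=float)
        res = [np.linalg.norm(f(x))]
        for _ in range(max_iter):
            if res[-1] <= tol:
                break
            p, Tk = make_path(x)
            ref = max(res[-M:])                  # nonmonotone reference
            t = Tk
            while np.linalg.norm(f(p(t))) > (1.0 - sigma * t) * ref:
                t *= tau                         # backtrack along the path
            x = p(t)
            res.append(np.linalg.norm(f(x)))
        return x

The only change from the smooth sketch in §3 is that backtracking moves along the nonlinear path p^k rather than the segment [x^k, x̂^{k+1}].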
Our main result, showing the convergence properties of global Newton's method, is now given. The first two assumptions on the first-order approximation 𝒜 correspond, in the smooth case, to uniform continuity of ∇f on X₀ and uniformly bounded invertibility of ∇f(x) for x ∈ X₀, respectively. The purpose of the third, technical assumption is to guarantee the existence of paths used by the algorithm (cf. the continuation property of Rheinboldt (1969)). With the exception of dealing with the paths p^k, the proof uses techniques well developed in standard convergence theory of damped algorithms (e.g., Ortega and Rheinboldt 1970, McCormick 1983, Fletcher 1987).

THEOREM 9. Let f: X → Y be continuous, a₀ > 0 and X₀ := {x ∈ X | ‖f(x)‖ ≤ a₀}. Let σ, τ ∈ (0, 1) and M ∈ ℕ be the parameters governing the path search (NmPs). Suppose
(1) 𝒜 is a uniform first-order approximation of f on X₀.
(2) 𝒜(x) is uniformly Lipschitzianly invertible near each x ∈ X₀, meaning for some constants δ, ε, L > 0 and for each x ∈ X₀, there are sets U_x and V_x containing x + δB_X and f(x) + εB_Y respectively, such that 𝒜(x)|_{U_x}: U_x → V_x is Lipschitzianly invertible of modulus L.
(3) For each x ∈ X₀ and T ∈ (0, 1], if p: [0, T) → X is such that p(0) = x and, for each t ∈ [0, T), 𝒜(x)(p(t)) = (1 - t)f(x) and 𝒜(x) is continuously invertible near p(t), then there exists p(T) := lim_{t↑T} p(t) with 𝒜(x)(p(T)) = (1 - T)f(x).

Then for any x⁰ ∈ X₀, path search damped Newton's method is well defined such that the sequence (x^k) converges to a zero x* of f. The residuals converge to zero at least at an R-linear rate: for some constant ρ ∈ (0, 1) and all k ≥ 0,

    ‖f(x^k)‖ ≤ ρ max{‖f(x^{k-j})‖ | j = 1, ..., min{M, k}} ≤ ρ^k‖f(x⁰)‖.

The rate of convergence of (x^k) to x* is Q-superlinear; indeed, if for c > 0 and all points x near x* we have ‖𝒜(x)(x*) - f(x*)‖ ≤ c‖x - x*‖², then the rate of convergence is Q-quadratic:

    ‖x^{k+1} - x*‖ ≤ cL‖x^k - x*‖²

for sufficiently large k.

PROOF. Assume, without loss of generality, that f(x^k) ≠ 0 for each k. Let δ, ε, L be the constants given by Hypothesis (2) of the theorem and, for each k, let A_k be the Lipschitzianly invertible mapping 𝒜(x^k)|_{U_{x^k}}: U_{x^k} → V_{x^k} given there. We may also assume without loss of generality that U_{x^k} is open: replace U_{x^k} by its interior, δ by any δ̄ ∈ (0, δ), ε by any ε̄ ∈ (0, min{ε, δ/L}), and V_{x^k} by A_k(U_{x^k}). The only point which may be unclear is that A_k(U_{x^k}) contains A_k(x^k) + ε̄B: let y ∈ A_k(x^k) + ε̄B; then x := A_k^{-1}(y) ∩ U_{x^k} is an element of U_{x^k} such that ‖x - x^k‖ ≤ L‖y - A_k(x^k)‖ < Lε̄ ≤ δ, hence y ∈ A_k(U_{x^k}). Let Δ(t) = o(t) be the uniform bound on the accuracy of 𝒜(x), x ∈ X₀, as given by Definition 1; and Δ(0) := 0 for convenience.

We show the algorithm is well defined. Suppose x^k ∈ X₀. Lemma 3 can be used to show existence of a (unique) continuous function p: I → X of largest domain I, with respect to the conditions
    p(0) = x^k;
    either I = [0, 1], or I = [0, T) for some T ∈ (0, 1]; and
    for each t ∈ I, A_k(p(t)) = (1 - t)f(x^k), and A_k is continuously invertible near p(t).
If I = [0, 1], p^k := p is a path acceptable to the algorithm. If I = [0, T) then, by Hypothesis (3), we can extend p continuously to domain [0, T] by p(T) := lim_{t↑T} p(t), for which A_k(p(T)) = (1 - T)f(x^k). In this case, by maximality of I, A_k is not continuously invertible near p(T), so the extension p: [0, T] → X is acceptable as p^k. This shows that existence of the path p^k, required by the algorithm, is guaranteed if x^k ∈ X₀. The fact that each x^k belongs to X₀ follows from the inequality

    max{‖f(x^{k-j})‖ | j = 1, ..., min{M, k}} ≤ ‖f(x⁰)‖,

which is easily shown by induction.

Recall that p^k: [0, T_k] → X, 0 < T_k ≤ 1, is the path determined by the algorithm. We will show that T_k and the path length t_k are bounded away from zero. We aim to find a positive constant γ such that for each k,

    T_k ≥ s_k := min{γ/‖f(x^k)‖, 1}

and

    (6)    p^k(t) = A_k^{-1}((1 - t)f(x^k)), ∀t ∈ [0, s_k];
    (7)    (NmD) holds ∀t ∈ [0, s_k].

To show these we need several other facts, the first of which is given by Lemma 3 when Φ := A_k and U := U_{x^k}:

    (8)    p^k(t) = A_k^{-1}((1 - t)f(x^k)), ∀0 ≤ t ≤ min{ε/‖f(x^k)‖, T_k}.

Now if T_k < 1 and (NmD) holds at t = T_k then, by choice of p^k, A_k is not continuously invertible near p^k(T_k); thus

    T_k ≥ ε/‖f(x^k)‖.

Another fact is that

    (9)    (NmD) holds for 0 ≤ t ≤ min{γ/‖f(x^k)‖, T_k},

where γ ∈ (0, ε] will be specified below. Recall the path search parameter σ ∈ (0, 1). We choose β̄ > 0 such that Δ(β) ≤ β(1 - σ)/L for 0 ≤ β ≤ β̄. Then for 0 ≤ t ≤ min{β̄/(L‖f(x^k)‖), T_k} we have

    ‖p^k(t) - x^k‖ = ‖A_k^{-1}((1 - t)f(x^k)) - A_k^{-1}(f(x^k))‖    by (8)
                   ≤ L‖(1 - t)f(x^k) - f(x^k)‖ = tL‖f(x^k)‖          by Hypothesis (2).

So for such t,

    (10)    ‖p^k(t) - x^k‖ ≤ β̄,

hence

    Δ(‖p^k(t) - x^k‖) ≤ ‖p^k(t) - x^k‖(1 - σ)/L ≤ t(1 - σ)‖f(x^k)‖,

and furthermore,

    ‖f(p^k(t))‖ ≤ ‖A_k(p^k(t))‖ + Δ(‖p^k(t) - x^k‖)                  by Hypothesis (1)
               ≤ (1 - t)‖f(x^k)‖ + t(1 - σ)‖f(x^k)‖                  by choice of p^k and (10)
               = (1 - σt)‖f(x^k)‖
               ≤ (1 - σt)max{‖f(x^{k+1-j})‖ | j = 1, ..., min{M, k + 1}}.

Let γ := min{β̄/L, ε} (∈ (0, ε]). We have verified (9). If T_k < 1 and (NmD) holds at t = T_k we have already seen that T_k ≥ ε/‖f(x^k)‖; hence T_k ≥ s_k. If (NmD) is violated at t = T_k then, with the statement (9), we see that T_k ≥ γ/‖f(x^k)‖. So we always have T_k ≥ s_k. Statement (6) now immediately follows from (8) because ε ≥ γ and T_k ≥ s_k. Likewise, statement (7) immediately follows from (9).
Next we show that t_k ≥ τs_k for every k (recall τ ∈ (0, 1) is another path search parameter). From the rules of (NmPs), if (NmD) holds at t = T_k then t_k := T_k ≥ s_k > τs_k; otherwise t_k is chosen to satisfy

    t_k ≥ τ sup{T ∈ [0, T_k] | (NmD) holds ∀t ∈ [0, T]} ≥ τs_k    by (7).

Since each iterate x^k belongs to X₀, we get

    t_k ≥ τ min{γ/‖f(x^k)‖, 1} ≥ τ min{γ/a₀, 1} =: t̄ > 0.

Therefore, by a short induction argument, for each k ≥ 0,

    (11)    ‖f(x^k)‖ ≤ ρ max{‖f(x^{k-j})‖ | j = 1, ..., min{M, k}} ≤ ρ^k‖f(x⁰)‖,

where ρ := (1 - σt̄)^{1/M}. This validates the claim of linear convergence of the residuals. As a result, for some K₁ ≥ 0 and each k ≥ K₁ we have γ/‖f(x^k)‖ ≥ 1, whence s_k = 1. So p^k(1) = A_k^{-1}(0) (by (6)) and (NmD) holds at t = 1 (by (7)). Also T_k = 1, since 1 = s_k ≤ T_k ≤ 1; hence (NmPs) determines t_k := 1 and the damped Newton iterate is the Newton iterate: x^{k+1} = A_k^{-1}(0). For k ≥ K₁,

    ‖x^{k+1} - x^k‖ = ‖A_k^{-1}(0) - A_k^{-1}(f(x^k))‖
                    ≤ L‖0 - f(x^k)‖                      by Hypothesis (2)
                    ≤ Lρ^k‖f(x⁰)‖                         by (11).

So if K, K' ∈ ℕ, K₁ ≤ K < K', then

    ‖x^K - x^{K'}‖ ≤ Σ_{k=K}^{K'-1} ‖x^{k+1} - x^k‖ ≤ Σ_{k=K}^{K'-1} Lρ^k‖f(x⁰)‖ ≤ L‖f(x⁰)‖ρ^K/(1 - ρ) → 0  as K → ∞, K' > K.

This shows that (x^k) is a Cauchy sequence, hence convergent in the complete normed space X with limit, say, x*. Since ‖f(x^k)‖ → 0, continuity of f yields f(x*) = 0.

Finally, note that for all sufficiently large k, x* ∈ x^k + δB_X ⊂ U_{x^k}. For such k ≥ K₁, using f(x*) = 0, Hypothesis (2) yields

    ‖x^{k+1} - x*‖ = ‖A_k^{-1}(0) - A_k^{-1}(A_k(x*))‖ ≤ L‖A_k(x*) - f(x*)‖ ≤ LΔ(‖x^k - x*‖).

The second inequality demonstrates Q-superlinear convergence. With the first inequality we see that if ‖𝒜(x)(x*) - f(x*)‖ ≤ c‖x - x*‖² for some c > 0 and all x near x*, then

    ‖x^{k+1} - x*‖ ≤ cL‖x^k - x*‖²

for sufficiently large k. ∎

We can say more about the rate of convergence of the residuals: the inequality

    ‖f(x^k)‖ ≤ ρ max{‖f(x^{k-j})‖ | j = 1, ..., min{M, k}}

shows that convergence is M-step Q-linear, and, if f is Lipschitz near x* as it must be in the smooth case, the residuals converge at a Q-superlinear or Q-quadratic rate.

We will find the following version of Theorem 9 useful in applications (§5.1).

PROPOSITION 10. Let f: R^N → R^N be continuous, a₀ > 0 and

    X₀ := {x ∈ R^N | ‖f(x)‖ ≤ a₀}.

Let σ, τ ∈ (0, 1) and M ∈ ℕ be the parameters governing the path search (NmPs). Suppose X₀ is bounded, 𝒜 is a first-order approximation of f on X₀, and for each x ∈ X₀ the following hold:
(1) 𝒜 is a uniform first-order approximation of f near x.
(2) 𝒜(x) is invertible near x, and continuous.
(3) There exists l_x > 0 such that if 𝒜(x) is invertible near any x' ∈ R^N, then it is Lipschitzianly invertible of modulus l_x near x'.
(4) There are η_x: (0, ∞) → [0, ∞] and a neighborhood U_x of x such that lim_{s↓0} η_x(s) = 0 and 𝒜(x¹) - 𝒜(x²) is Lipschitz of modulus η_x(‖x¹ - x²‖) on U_x, for x¹, x² ∈ U_x.
Then for X = Y = R^N, the hypotheses (and conclusions) of Theorem 9 hold.

PROOF. We first strengthen Hypothesis (2): each x ∈ X₀ has an open neighborhood U_x such that
(2') for some scalars δ_x, ε_x, L_x > 0 and each x' ∈ x + δ_xB_X, the mapping 𝒜(x')|_{U_x}: U_x → 𝒜(x')(U_x) is Lipschitzianly invertible of modulus L_x, and 𝒜(x')(U_x) contains f(x') + ε_xB_Y.

To see this, appeal to Hypotheses (2)-(4). There are neighborhoods U_x and V_x of x and f(x), respectively, and l_x > 0 for which 𝒜(x)|_{U_x}: U_x → V_x is Lipschitzianly invertible of modulus l_x > 0; and there is η_x: [0, ∞) → [0, ∞] such that lim_{s↓0} η_x(s) = 0, η_x(0) = 0, and 𝒜(x¹) - 𝒜(x²) is Lipschitz of modulus η_x(‖x¹ - x²‖) for x¹, x² ∈ U_x. Also 𝒜(x) is continuous.
So there is δ_x > 0 such that

    x + 2δ_xB_X ⊂ U_x,  𝒜(x)(x + δ_xB_X) + (δ_x/l_x)B_Y ⊂ V_x,  η_x(s) ≤ 1/(2l_x) ∀s ∈ [0, δ_x].

For x' ∈ x + δ_xB_X, we apply Lemma 7 with g := 𝒜(x), g' := 𝒜(x'), L := l_x, η := η_x(‖x - x'‖), x⁰ := x', and Ω := U_x: 𝒜(x')|_{U_x}: U_x → 𝒜(x')(U_x) is Lipschitzianly invertible of modulus 2l_x, and 𝒜(x')(U_x) contains f(x') + (δ_x/(2l_x))B_Y. Let L_x := 2l_x and ε_x := δ_x/(2l_x); (2') is confirmed.

By Hypothesis (1) and the above, each x ∈ X₀ has a neighborhood U_x such that for some Δ_x(t) = o(t),

    (12)    ‖𝒜(x')(x'') - f(x'')‖ ≤ Δ_x(‖x' - x''‖),  x', x'' ∈ U_x,

and (2') holds. Assume without loss of generality that U_x contains x + δ_xB_X, and let O_x denote the interior of x + δ_xB_X. Since X₀ is compact we may cover it by finitely many neighborhoods (O_{x_i}) corresponding to a finite sequence (x_i) ⊂ X₀. For each i and x = x_i we have Δ_{x_i}(t) = o(t) such that (12) holds, and scalars δ_{x_i}, ε_{x_i}, L_{x_i} satisfying (2'). Now each x ∈ X₀ is an interior point of O_{x_i} for some i; indeed there exists δ̄ > 0, independent of x, such that x + δ̄B_X ⊂ O_{x_i} for some i (if not, an easy sequential compactness argument on X₀ leads to a contradiction). Define Δ(t) := max_i Δ_{x_i}(t) for 0 ≤ t ≤ δ̄ and Δ(t) := ∞ for t > δ̄; then Δ(t) = o(t). Since any two points of X₀ separated by distance less than δ̄ lie in some O_{x_i}, (12) yields

    ‖𝒜(x')(x'') - f(x'')‖ ≤ Δ(‖x' - x''‖), ∀x', x'' ∈ X₀,

which confirms Hypothesis (1) of Theorem 9.

Hypothesis (2) of Theorem 9 also holds, with δ := δ̄, ε := min_i ε_{x_i}, and L := max_i L_{x_i}. For each x ∈ X₀ we take U_x := U_{x_i} for i such that x + δ̄B_X ⊂ O_{x_i}, and V_x := 𝒜(x)(U_{x_i}). Note that x + δ̄B_X ⊂ x_i + δ_{x_i}B_X, hence x + δ̄B_X ⊂ U_x and, by (2'), 𝒜(x)|_{U_x}: U_x → V_x is Lipschitzianly invertible of modulus L and V_x ⊃ f(x) + εB_Y.

To show Hypothesis (3) of Theorem 9, let x ∈ X₀, T ∈ (0, 1] and p: [0, T) → X be such that, for each t ∈ [0, T), 𝒜(x)(p(t)) = (1 - t)f(x) and 𝒜(x) is continuously invertible near p(t). We only need show p is Lipschitz on (0, T), say of modulus l̄, in which case, first, {p(t) | t ∈ (0, T)} is bounded, hence p(t) has an accumulation point y as t ↑ T; secondly, for t ∈ (0, T),

    ‖p(t) - y‖ ≤ limsup_{s↑T} ‖p(t) - p(s)‖ ≤ l̄|t - T|,

so lim_{t↑T} p(t) = y; and finally, 𝒜(x)(y) = (1 - T)f(x) by continuity of 𝒜(x).

Let t, t' ∈ (0, T); without loss of generality assume t' > t. Let l_x > 0 be the Lipschitz constant given by Hypothesis (3), and choose a neighborhood U of p(t) such that 𝒜(x)^{-1} ∩ U is Lipschitz of modulus l_x near (1 - t)f(x). So there is γ > 0 such that (t - γ, t + γ) is contained in (0, T), and p(s) = [𝒜(x)^{-1} ∩ U]((1 - s)f(x)) for s ∈ (t - γ, t + γ). Then p is Lipschitz of modulus l̄ := l_x‖f(x)‖ on (t - γ, t + γ). As [t, t'] is compact, it can be covered by finitely many open intervals in (0, T), on each of which p is Lipschitz of modulus l̄. Hence there is a finite sequence t₀ = t < t₁ < ... < t_K = t' such that p is Lipschitz of modulus l̄ on [t_{k-1}, t_k] for each k = 1, ..., K. We conclude:

    ‖p(t) - p(t')‖ ≤ Σ_{k=1}^{K} ‖p(t_{k-1}) - p(t_k)‖ ≤ Σ_{k=1}^{K} l̄(t_k - t_{k-1}) = l̄(t' - t). ∎

Hypotheses (1) and (4) of the proposition specify a relaxed local version of the point-based approximation of f at x, used to define Robinson-Newton's method (Robinson 1988). A point-based approximation of f on Ω ⊂ X is a function 𝒜 on Ω, each value 𝒜(x) of which is itself a mapping from Ω to Y, such that for some K > 0 and every x¹, x² ∈ Ω,
(a) ‖𝒜(x¹)(x²) - f(x²)‖ ≤ (1/2)K‖x¹ - x²‖², and
(b) 𝒜(x¹) - 𝒜(x²) is Lipschitz of modulus K‖x¹ - x²‖.
Suppose 𝒜 is a point-based approximation of f on a neighborhood Ω of x and, for each x¹ ∈ Ω, we extend the domain of 𝒜(x¹) to X by arbitrarily defining the values 𝒜(x¹)(x²) ∈ Y, x² ∈ X \ Ω. Then property (a) implies Hypothesis (1), and property (b) implies Hypothesis (4).

Liqun Qi has noted that the conditions of the previous two convergence results preclude from consideration such functions as f: R → R: x ↦ |x|, because f does not have a first-order approximation at 0 that is invertible near 0. This particular example is easily accommodated: Theorem 9 is valid if X₀ is replaced by X₀ \ f^{-1}(0). A similar recasting of Proposition 10 is possible, though care is needed because compactness of X₀ does not imply compactness of X₀ \ f^{-1}(0). This leads to a more serious question, however: Can we deal with a function that is badly behaved at the current iterate x^k ∈ X₀? In light of Lemma 7, the hypotheses of both Theorem 9 and Proposition 10 imply that f must be Lipschitzianly invertible near each point of X₀ (though we can restrict our attention to points of X₀ at which f is nonzero), hence at x^k. This is an implicit restriction on the kind of mapping to which the analysis applies.

5. Applications.

5.1. Variational inequalities. The Variational Inequality is the problem of finding a vector z ∈ R^N such that

    (VI)    z ∈ C,  ⟨F(z), c - z⟩ ≥ 0, ∀c ∈ C,

where F is a function from C to R^N and C is a nonempty closed convex set in R^N. We will also assume that F is continuously differentiable (see Definition 5) and usually that C is polyhedral. See Harker and Pang (1990) for a survey of variational inequalities which includes analysis, algorithms and applications. Equivalent to (VI) are the Generalized Equation (Robinson 1979)

    (GE)    0 ∈ F(z) + N_C(z),

where N_C(z) is the normal cone to C at z (see §2), and the Normal Equation (Robinson 1992)

    (NE)    0 = F(π_C(x)) + x - π_C(x),

where π_C(x) is the Euclidean projection of x to its nearest point in C. So (NE) is just 0 = F_C(x). We will work with (NE) because it is a (nonsmooth) equation. Note that (VI) and (GE) are, but for notation, identical, whereas the equivalence between each of these two problems and (NE) is indirect: if z solves (VI) or (GE) then x = z - F(z) solves (NE), while if x solves (NE) then z = π_C(x) solves (VI) and (GE). In fact we have a stronger result from Lemma 6, assuming F is locally Lipschitz: F_C is Lipschitzianly invertible near x if and only if for some neighborhoods U, V of π_C(x), F_C(x) respectively, (F + N_C)^{-1} ∩ U is a Lipschitz function when restricted to V. This result has links to strongly regular generalized equations (see Robinson (1980), and the proof of Proposition 13).

A first-order approximation of f := F_C at x is obtained by linearizing F about c := π_C(x):

    (13)    𝒜(x)(x') := F(c) + ∇F(c)(π_C(x') - c) + x' - π_C(x'),  ∀x' ∈ R^N.

So 𝒜(x)(·) is ∇F(c)_C(·) shifted by the vector F(c) - ∇F(c)(c). Therefore 𝒜(x) is continuous, and for any x¹, x² ∈ R^N,

    ‖𝒜(x¹)(x²) - f(x²)‖ ≤ ‖π_C(x¹) - π_C(x²)‖ · sup_{0≤s≤1} ‖∇F(π_C(x¹)) - ∇F(sπ_C(x¹) + (1 - s)π_C(x²))‖

by the vector mean value theorem (Ortega and Rheinboldt 1970, Theorem 3.2.3). Now π_C(x + B) is compact, so ∇F is uniformly continuous on the convex hull of this set, and it follows from the above inequality and Lipschitz continuity of π_C that 𝒜 is a uniform first-order approximation of f near x.
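Here is an illustrative numpy sketch of the approximation (13); proj_C is an assumed user-supplied Euclidean projector onto C, and all names are ours.

    import numpy as np

    def make_approx(F, JF, proj_C, x):
        """First-order approximation (13) of the normal map F_C at x:
        F is linearized about c = proj_C(x); the projection stays exact."""
        c = proj_C(x)
        Fc, Jc = F(c), JF(c)
        def A(xp):
            cp = proj_C(xp)
            return Fc + Jc @ (cp - c) + xp - cp
        return A

    # With C the nonnegative orthant, proj_C = lambda v: np.maximum(v, 0.0)
    # recovers the complementarity-problem approximation used in Section 5.2.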
Similar arguments ensure that, for s > 0 and

    η_x(s) := sup{‖∇F(π_C(x̄¹)) - ∇F(π_C(x̄²))‖ | x̄¹, x̄² ∈ x + B, ‖x̄¹ - x̄²‖ ≤ s},

if x¹, x² ∈ x + B then 𝒜(x¹) - 𝒜(x²) is Lipschitz of modulus η_x(‖x¹ - x²‖), and η_x(s) → 0 as s ↓ 0. We have verified the first and fourth assumptions of Proposition 10 for each x ∈ R^N.

To apply Proposition 10 we still must find a constant a₀ > 0 such that the level set

    X₀ := {x ∈ R^N | ‖f(x)‖ ≤ a₀}

is bounded and only contains points x near which 𝒜(x) is Lipschitzianly invertible, etc. One case we will deal with is when F is strongly monotone of modulus λ > 0, that is,

    ⟨F(c) - F(c'), c - c'⟩ ≥ λ‖c - c'‖²,  ∀c, c' ∈ C.

Another case to consider is when C is polyhedral convex; here π_C is piecewise linear, hence, for each x, 𝒜(x) is piecewise linear too. In this situation, Robinson's homeomorphism theorem (Robinson 1992, Theorem 4.3) says 𝒜(x) is homeomorphic if and only if it is coherently oriented, i.e., its determinants on each of its full-dimensional pieces of linearity have the same (nonzero) sign. (Robinson 1992, Theorem 5.2) also provides homeomorphism results near a point x, via the critical cone (Robinson 1987) to C at x, which is the intersection of the tangent cone to C at c = π_C(x), T_C(c), with the linear subspace orthogonal to x - c, (x - c)^⊥. See also Ralph (1993). These results provide testable conditions for the second assumption of Proposition 10.

PROPOSITION 11. Let C be a nonempty closed convex set in R^N, F: C → R^N be continuously differentiable, σ, τ ∈ (0, 1) and M ∈ ℕ be the parameters governing the path search (NmPs), and a₀ > 0. Let X₀ := {x ∈ R^N | ‖F_C(x)‖ ≤ a₀}. Suppose, in addition, either
(1) F is strongly monotone; or
(2) X₀ is bounded; C is polyhedral; and for each x ∈ X₀ and c := π_C(x), ∇F(c)_C is invertible near x or, equivalently, ∇F(c)_K is coherently oriented, where K := T_C(c) ∩ (x - c)^⊥.
Let path search damped Newton's method for solving F_C(x) = 0 be defined using the first-order approximation (13), starting from x⁰ ∈ X₀. Then the damped Newton iterates x^k converge to a zero x* of F_C. The residuals F_C(x^k) and iterates x^k converge at Q-superlinear rates to zero and x*, respectively; indeed these rates are Q-quadratic if ∇F is Lipschitz near π_C(x*).

PROOF. We have seen that Hypotheses (1) and (4) of Proposition 10 hold. It is left to verify the remaining hypotheses. The claim of Q-superlinear or Q-quadratic convergence of F_C(x^k) is then explained by the comments after Theorem 9, given that F_C is Lipschitz near x*.

(1) Let F be strongly monotone of modulus λ > 0. It is well known for such F (Harker and Pang 1990, Corollary 3.2) that the generalized equation y ∈ F(c) + N_C(c) has a unique solution c ∈ C for each y ∈ R^N. Suppose c^i ∈ C and y^i ∈ (F + N_C)(c^i), for i = 1, 2. Using the Cauchy-Schwarz inequality, and the definitions of the normal cone and strong monotonicity, we get

    ‖y¹ - y²‖ ‖c¹ - c²‖ ≥ ⟨y¹ - y², c¹ - c²⟩
        = ⟨y¹ - F(c¹), c¹ - c²⟩ + ⟨F(c¹) - F(c²), c¹ - c²⟩ + ⟨F(c²) - y², c¹ - c²⟩
        ≥ λ‖c¹ - c²‖².

Thus ‖y¹ - y²‖ ≥ λ‖c¹ - c²‖. Since (F + N_C)^{-1} maps points to nonempty sets, this bound implies (F + N_C)^{-1} is actually a function on R^N that is Lipschitz (of modulus λ^{-1}). So C₀ := (F + N_C)^{-1}(a₀B) is compact and it follows, by continuity of F, that F(C₀) is compact too. Recall from Lemma 6, equation (2), that y = F_C(x) if and only if x = y - F(c) + c for some c ∈ (F + N_C)^{-1}(y). This yields

    X₀ = {y - F(c) + c | y ∈ a₀B, c ∈ (F + N_C)^{-1}(y)} ⊂ a₀B - F(C₀) + C₀,

which shows that X₀ is bounded.
It also follows from strong monotonicity of F that for any x ∈ R^N and c := π_C(x), ∇F(c) is strongly monotone, hence, as above, (∇F(c) + N_C)^{-1} is a Lipschitz mapping. Since ∇F(c) is also Lipschitz on R^N we see that, as in Lemma 6, the normal mapping ∇F(c)_C is Lipschitzianly invertible; hence 𝒜(x) is Lipschitzianly invertible. As 𝒜(x) is also continuous, Hypotheses (2) and (3) of Proposition 10 are satisfied.

(2) Under the assumptions of this proposition, we have seen that, by virtue of Robinson (1992), Hypothesis (2) of Proposition 10 holds. We are also given that X₀ is bounded, so it is only left to show that Hypothesis (3) of Proposition 10 is valid. Since C is polyhedral convex, 𝒜(x)(·) is piecewise linear for each x ∈ R^N; that is, 𝒜(x) is represented on each of finitely many polyhedral convex sets covering R^N by an affine mapping z ↦ M_iz + b^i, where {M_i} ⊂ R^{N×N} and {b^i} ⊂ R^N. Since 𝒜(x) is invertible near x, at least one of the matrices M_i must be invertible; let

    l_x := max{‖M_i^{-1}‖ | M_i is invertible}.

If 𝒜(x) is invertible near some x' ∈ R^N, then it has a local inverse P: V → U, where V is a neighborhood of 𝒜(x)(x') and U is a neighborhood of x'. We assume without loss of generality that V is convex, and take any y, y' ∈ V. Now the interval joining P(y) and P(y') is covered by finitely many subintervals {I_j}, on each of which 𝒜(x) is represented by some invertible mapping M_iz + b^i. Hence the interval joining y and y' is covered by the subintervals {𝒜(x)(I_j)}, on each of which P is Lipschitz of modulus l_x. It follows that ‖P(y) - P(y')‖ ≤ l_x‖y - y'‖. ∎

Fukushima (1992), Taji, Fukushima and Ibaraki (1993), and Wu, Florian and Marcotte (1990) show convergence of line search damped schemes based on Josephy-Newton's method (Josephy (1979) and below) for strongly monotone variational inequalities. Superlinear convergence is shown under the additional assumptions of polyhedrality of C, and a nondegeneracy condition implying smoothness of F_C at the solution point to which the iterates converge. When F is monotone, but not necessarily strongly monotone, and C is compact as well as being convex and nonempty, Marcotte and Dussault (1987) give a globally convergent damped Josephy-Newton's method; in this situation, however, it seems that neither our analysis nor the analysis of Fukushima (1992), Taji, Fukushima and Ibaraki (1993) and Wu, Florian and Marcotte (1990) applies.

Josephy-Newton's method. In Josephy-Newton's method (Josephy 1979) for solving (GE), given the kth iterate c^k, the next iterate is defined to be a solution ĉ^{k+1} of the linearized generalized equation

    0 ∈ F(c^k) + ∇F(c^k)(c - c^k) + N_C(c),  c ∈ R^N.

The equivalence between Josephy-Newton's method on (GE) and Robinson-Newton's method on the associated (NE) is well known: if x^k is such that π_C(x^k) = c^k and x̂^{k+1} is a zero of 𝒜(x^k), i.e.,

    0 = F(c^k) + ∇F(c^k)(π_C(x̂^{k+1}) - c^k) + x̂^{k+1} - π_C(x̂^{k+1}),

then ĉ^{k+1} = π_C(x̂^{k+1}). Hence the Josephy-Newton iterate is the Euclidean projection onto C of the Robinson-Newton iterate. Path search damping of Robinson-Newton's method produces x^{k+1} on the Newton path p^k from x^k to x̂^{k+1}; the projection π_C(x^{k+1}), which lies on a path from c^k to ĉ^{k+1}, is a damped Josephy-Newton iterate.

For practical purposes, suppose C is polyhedral convex. Let us describe the construction of the path p^k given the current iterate x^k. From (13), with c^k := π_C(x^k),

    A_k(x) := 𝒜(x^k)(x) = F(c^k) + ∇F(c^k)(π_C(x) - c^k) + x - π_C(x).

As A_k is piecewise linear, p^k is also piecewise linear.
We construct $p^k$, piece by affine piece, using a pivotal method. Starting from $t = 0$ ($p^k(0) = x^k$) and ignoring degeneracy, each pivot increases $t$ to the next breakpoint in the derivative of $p^k$ while maintaining the equation
$$A^k(p^k(t)) = (1 - t) f(x^k),$$
thereby extending the domain of $p^k$. We continue to pivot so long as pivoting is possible and our latest breakpoint $t$ satisfies the nonmonotone descent condition (NmD). If, after a pivot, (NmD) fails, then we line search on the interval $[p^k(t_{\mathrm{old}}), p^k(t)]$ to find $x^{k+1}$, where $t_{\mathrm{old}}$ is the value of $t$ at the previous breakpoint. The line search makes sense here because $p^k$ is affine between successive breakpoints, hence affine on $[t_{\mathrm{old}}, t]$. It is easy to see that the Armijo line search applied with parameters $\sigma, \tau \in (0,1)$ produces $t_k \in [t_{\mathrm{old}}, t]$ that fulfills (NmPS). On the other hand, if (NmD) holds at every breakpoint $t$ then we must eventually stop, because $t = 1$ or further pivots are not possible (i.e., $A^k$ is not continuously invertible at $p^k(t)$). In this case we take $t_k =_{\mathrm{def}} t$ and $x^{k+1} =_{\mathrm{def}} p^k(t)$.

This procedure can be thought of as a forward path search: we construct the path as $t$ moves forward from $0$ to $t_k$, testing (NmD) after each pivot that extends the path. This is an efficient strategy if (NmD) fails after the first few pivots. A backward path search extends the path as far as possible, that is until $t_k = 1$ or the path cannot be extended past $t_k$, while recording each pivot point $p^k(t)$ and the corresponding value of $t$. If (NmD) holds at the final value of $t$, i.e., at $t_k$, take $x^{k+1} =_{\mathrm{def}} p^k(t_k)$; otherwise search through the list of $(p^k(t), t)$ pairs until one, with $t = t_1$ say, is found such that $t = t_1$ satisfies (NmD) but (NmD) fails at the next value of $t$ seen in the list, say $t = t_2$. A line search is now performed on the interval $[p^k(t_1), p^k(t_2)]$.

For the sake of generality, we mention a class of nonsmooth functions with "natural" first-order approximations that contains each normal mapping $f = F_C$ such that $F$ is smooth and $C$ is closed, convex and nonempty. Consider the class of functions of the form $f = H \circ h$, where $D \subset \mathbb{R}^K$, $h: \mathbb{R}^N \to D$ is locally Lipschitz, and $H: D \to \mathbb{R}^N$ is smooth (e.g., $D$ is convex and $H$ is continuously differentiable). A subclass of this class was highlighted in Robinson (1988) in the context of point-based approximations. Similarly to the above, it is easy to see that
$$\mathcal{A}(x)(x') =_{\mathrm{def}} H(h(x)) + \nabla H(h(x))[h(x') - h(x)], \qquad \forall x, x' \in \mathbb{R}^N,$$
defines a first-order approximation of $f$ on $\mathbb{R}^N$ satisfying Assumptions (1) and (4) of Proposition 10. $F_C$ is given by setting $K =_{\mathrm{def}} 2N$, $D =_{\mathrm{def}} C \times \mathbb{R}^N$, $h(x) =_{\mathrm{def}} (\pi_C(x), x - \pi_C(x))$ and $H(a, b) =_{\mathrm{def}} F(a) + b$.

5.2. Nonlinear complementarity problems. We now confine our attention to $C =_{\mathrm{def}} \mathbb{R}^N_+$, the nonnegative orthant in $\mathbb{R}^N$. The problems (VI), (GE) and (NE) are equivalent to the NonLinear Complementarity Problem:
$$(\mathrm{NLCP}) \qquad z \ge 0, \quad F(z) \ge 0, \quad \langle z, F(z) \rangle = 0,$$
where vector inequalities are taken componentwise. The normal mapping induced by $F$ and $\mathbb{R}^N_+$ is denoted by $F_+$, so $F_+ = F_{\mathbb{R}^N_+}$. The normal equation form of (NLCP) is $F_+(x) = 0$, or
$$F(x_+) + x - x_+ = 0,$$
where $x_+$ denotes $\pi_{\mathbb{R}^N_+}(x)$. The associated first-order approximation (13) becomes
$$(14) \qquad \mathcal{A}(x)(x') =_{\mathrm{def}} F(x_+) + \nabla F(x_+)(x'_+ - x_+) + x' - x'_+, \qquad \forall x' \in \mathbb{R}^N.$$
(NLCP) has many applications, for example in nonlinear programming (§5.3) and economic equilibria problems (Harker and Pang 1990, Harker and Xiao 1990, Pang and Gabriel 1993).
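As a quick numerical check (hypothetical test function and data) that (14) really is a first-order approximation of $F_+$ at $x$, note that the error $F_+(x') - \mathcal{A}(x)(x')$ should be $o(\|x' - x\|)$:

```python
# The error F_+(x') - A(x)(x') reduces to the linearization error of F
# between projections, which is o(||x' - x||) since projection is nonexpansive.
import numpy as np

def F(c):
    return np.array([np.exp(c[0]) - 1 + c[1], c[0]**2 + 2*c[1] - 1])

def dF(c):
    return np.array([[np.exp(c[0]), 1.0], [2*c[0], 2.0]])

def F_plus(x):
    xp = np.maximum(x, 0.0)
    return F(xp) + x - xp

def A(x, xprime):                     # A(x)(x') from (14)
    xp, xpp = np.maximum(x, 0.0), np.maximum(xprime, 0.0)
    return F(xp) + dF(xp) @ (xpp - xp) + xprime - xpp

x = np.array([0.7, -0.3])
d = np.array([1.0, 2.0])
for s in [1e-1, 1e-2, 1e-3, 1e-4]:
    err = np.linalg.norm(F_plus(x + s*d) - A(x, x + s*d))
    print(s, err / s)                 # the ratio tends to 0 as s -> 0
```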
More notation is needed. Let $I$ be the identity matrix in $\mathbb{R}^{N \times N}$. Given a matrix $M \in \mathbb{R}^{N \times N}$ and index sets $J, J' \subset \{1, \dots, N\}$, let $M_{J,J'}$ be the (possibly vacuous) submatrix of $M$ of elements $M_{i,j}$ where $(i,j) \in J \times J'$. Also let $\bar J$ be the complement $\{1,\dots,N\} \setminus J$ of $J$, and $M/M_{J,J}$ be the Schur complement of $M_{J,J}$ with respect to $M$,
$$M/M_{J,J} =_{\mathrm{def}} \begin{cases} M_{\bar J,\bar J} - M_{\bar J,J}(M_{J,J})^{-1} M_{J,\bar J} & \text{if } \emptyset \ne J \ne \{1,\dots,N\}, \\ M & \text{if } J = \emptyset, \\ \text{vacuous} & \text{if } J = \{1,\dots,N\}, \end{cases}$$
assuming $M_{J,J}$ is invertible or vacuous. Also, $M$ is a P-matrix if it has positive principal minors.

PROPOSITION 12. Let $F: \mathbb{R}^N \to \mathbb{R}^N$ be continuously differentiable, and $\sigma, \tau \in (0,1)$, $M \in \mathbb{N}$ be the parameters governing the path search (NmPS). Suppose $a_0 > 0$ and
$$X_0 =_{\mathrm{def}} \{x \in \mathbb{R}^N : \|F_+(x)\| \le a_0\}$$
is bounded. Suppose for each $x \in X_0$ the normal mapping $\nabla F(x_+)_+$ is Lipschitzianly invertible near $x$ or, equivalently, the following (possibly vacuous) conditions hold:

$\nabla F(x_+)_{J,J}$ is invertible, where $J =_{\mathrm{def}} \{i : x_i > 0\}$;

$\nabla F(x_+)_{\mathcal{J},\mathcal{J}} / \nabla F(x_+)_{J,J}$ is a P-matrix, where $\mathcal{J} =_{\mathrm{def}} \{i : x_i \ge 0\}$.

Let path search damped Newton's method for solving $F_+(x) = 0$ be defined using the first-order approximation (14). Then for any $x^0 \in X_0$, the damped Newton iterates $x^k$ converge to a zero $x^*$ of $F_+$. The residuals $F_+(x^k)$ and iterates $x^k$ converge at Q-superlinear rates to zero and $x^*$, respectively; indeed these rates are Q-quadratic if $\nabla F$ is Lipschitz near $x^*_+$.

PROOF. If the above conditions on $\nabla F(x_+)$ are equivalent to the assertion that $\nabla F(x_+)_+$ is Lipschitzianly invertible near $x$, then the result is a corollary of Proposition 11(2). The required equivalence follows from Robinson (1992, Proposition 4.1 and Theorem 5.2). $\square$

This result is similar in statement to Harker and Xiao (1990, Theorem 3) on line search damped B-Newton's method, but the conclusions of Harker and Xiao (1990) are weaker. The convergence results of Harker and Xiao (1990) are derived directly from Pang (1990), hence, as noted in the Introduction, require existence of a limit point of the damped Newton iterates at which $\|F_+(\cdot)\|^2$ is differentiable. Note that no differentiability properties of the residual, or of its norm squared, are needed in Proposition 12.

On a practical level, let us compare the damped method used in Proposition 12 with line search damped B-Newton's method. In the latter case we use the approximation
$$B^k(x) =_{\mathrm{def}} F_+(x^k) + F'_+(x^k; x - x^k),$$
where the B-derivative $F'_+(x^k; d)$ equals $\nabla F(x^k_+)\hat d + d - \hat d$, and $\hat d_i$ is $d_i$ if $x^k_i > 0$, $\max\{d_i, 0\}$ if $x^k_i = 0$, and $0$ otherwise. Solving $B^k(x) = 0$ involves factoring out the linear part of $F'_+(x^k; \cdot)$, which corresponds to the nonzero components of $x^k$, and solving the remaining piecewise linear problem of reduced size. On one hand, this generally requires less work per iteration than using the full piecewise linear approximation (14) to carry out a path search. On the other hand, since the approximation $B^k$ of $F_+$ is generally not as accurate as $A^k$, damped B-Newton's method may require more iterations to reduce the size of the residual to a given level.
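For illustration, the B-derivative above can be evaluated and compared with a one-sided finite difference. A small sketch (Python; $F$ and the test point are hypothetical):

```python
# F_+'(x; d) = dF(x_+) d_hat + d - d_hat, with d_hat defined componentwise
# from the sign of x, checked against a one-sided finite difference.
import numpy as np

def F(c):  return np.array([c[0]**2 + c[1], np.sin(c[0]) + 2*c[1]])
def dF(c): return np.array([[2*c[0], 1.0], [np.cos(c[0]), 2.0]])

def F_plus(x):
    xp = np.maximum(x, 0.0)
    return F(xp) + x - xp

def B_deriv(x, d):
    d_hat = np.where(x > 0, d, np.where(x == 0, np.maximum(d, 0.0), 0.0))
    return dF(np.maximum(x, 0.0)) @ d_hat + d - d_hat

x = np.array([1.5, 0.0])          # one component sits exactly on the boundary
d = np.array([-0.4, -1.0])
fd = (F_plus(x + 1e-7 * d) - F_plus(x)) / 1e-7
print(np.allclose(fd, B_deriv(x, d), atol=1e-5))   # True
```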
5.3. Nonlinear programming. Our computational examples are all optimization problems in NonLinear Programming. A general form of the nonlinear programming problem is
$$\min \theta(z) \quad \text{subject to} \quad z \in D, \ g(z) = 0,$$
where $D$ is a nonempty polyhedral convex set in $\mathbb{R}^n$, and $\theta: D \to \mathbb{R}$ and $g: D \to \mathbb{R}^m$ are smooth functions. Under a constraint qualification the standard first-order conditions necessary for optimality of this problem are of the form (GE), where $N =_{\mathrm{def}} n + m$,
$$C =_{\mathrm{def}} D \times \mathbb{R}^m, \qquad F(z, y) =_{\mathrm{def}} (\nabla\theta(z) + \nabla g(z)^{\mathrm{T}} y, \ -g(z)), \quad \forall (z, y) \in D \times \mathbb{R}^m.$$
See, for example, Robinson (1983, §1 and Theorem 3.2). Variable-multiplier pairs $(z, y)$ satisfying the first-order conditions can be rewritten as solutions of (NE) for these $F$ and $C$ (Park 1989, Chapter 3, §4). We will confine ourselves to a more restrictive class of nonlinear programs which contains our computational examples:
$$(\mathrm{NLP}) \qquad \min \theta(z) \quad \text{subject to} \quad z \ge 0, \ g(z) \le 0.$$
Again under a constraint qualification such as the Mangasarian-Fromovitz condition (Mangasarian 1969, 11.3.5; McCormick 1983, 10.2.16), the first-order conditions necessary for optimality of (NLP) can be written as either (NLCP), or the corresponding normal equation, where $N =_{\mathrm{def}} n + m$ and
$$(15) \qquad F(z, y) =_{\mathrm{def}} (\nabla\theta(z) + \nabla g(z)^{\mathrm{T}} y, \ -g(z)), \qquad \forall (z, y) \in \mathbb{R}^n_+ \times \mathbb{R}^m_+.$$
Kojima (1980) introduced the normal equation formulation for programs with (nonlinear) inequality and equality constraints.

Wilson's method. In Wilson's method (Wilson 1963, Fletcher 1987), also known as sequential quadratic programming (SQP), given the $k$th variable-multiplier pair $(a^k, b^k) \in \mathbb{R}^{n+m}_+$ the next iterate is defined as the optimal variable-multiplier pair $(a^{k+1}, b^{k+1})$ for the quadratic program
$$\min_{a \in \mathbb{R}^n} \ \theta(a^k) + \langle \nabla\theta(a^k), a - a^k \rangle + \tfrac{1}{2}\langle a - a^k, \nabla^2[\theta + (b^k)^{\mathrm{T}} g](a^k)(a - a^k) \rangle$$
$$\text{subject to} \quad a \ge 0, \qquad g(a^k) + \nabla g(a^k)(a - a^k) \le 0.$$
Suppose $x^k = (z^k, y^k) \in \mathbb{R}^{n+m}$ satisfies $(z^k_+, y^k_+) = (a^k, b^k)$, $F$ is given by (15), and $\mathcal{A}(x^k)$ is given by (14). The SQP iterate $(a^{k+1}, b^{k+1})$ satisfies the first-order condition for the quadratic program, which is equivalent, by previous discussion, to saying that the point
$$\hat x^{k+1} = (\hat z^{k+1}, \hat y^{k+1}) =_{\mathrm{def}} (a^{k+1}, b^{k+1}) - F(a^k, b^k) - \nabla F(a^k, b^k)(a^{k+1} - a^k, b^{k+1} - b^k)$$
is a zero of $\mathcal{A}(x^k)$. Also $(\hat z^{k+1}_+, \hat y^{k+1}_+) = (a^{k+1}, b^{k+1})$. A path search applied to Robinson-Newton's method for (NE) determines the next iterate $x^{k+1} = (z^{k+1}, y^{k+1})$ as a point on the Newton path $p^k$ from $x^k$ to $\hat x^{k+1}$. The nonnegative part $(z^{k+1}_+, y^{k+1}_+)$, a point on the path from $(a^k, b^k)$ to $(a^{k+1}, b^{k+1})$, is a damped iterate for SQP.

There are standard conditions that jointly ensure local uniqueness of an optimal variable-multiplier pair $(\bar z, \bar y) \ge 0$ of (NLP), hence of the solution (also $(\bar z, \bar y)$) of the associated system (NLCP), and of the solution $(z^*, y^*) = (\bar z, \bar y) - F(\bar z, \bar y)$ of the associated equation (NE). At nonsolution points $(z, y)$ of (NE), analogous conditions will guarantee local invertibility of $F_+ = F_{\mathbb{R}^{n+m}_+}$ and of its first-order approximation (14) at $(z, y)$. To specify these conditions at $(z, y) \in \mathbb{R}^{n+m}$, let
$$(16) \qquad \mathcal{Z}_> =_{\mathrm{def}} \{i : z_i > 0\}, \quad \mathcal{Z}_\ge =_{\mathrm{def}} \{i : z_i \ge 0\}, \quad \mathcal{Y}_> =_{\mathrm{def}} \{l : y_l > 0\}, \quad \mathcal{Y}_\ge =_{\mathrm{def}} \{l : y_l \ge 0\},$$
and let $|\mathcal{Z}_>|$, $|\mathcal{Z}_\ge|$, etc., denote the corresponding cardinalities. We present the conditions of Linear Independence of binding constraint gradients at $(z, y)$:
$$(\mathrm{LI}) \qquad \nabla g(z_+)_{\mathcal{Y}_\ge, \mathcal{Z}_>} \ \text{has linearly independent rows},$$
and Strong Second-Order Sufficiency at $(z, y)$:
$$(\mathrm{SSOS}) \qquad \text{if } \nabla g(z_+)_{\mathcal{Y}_>, \mathcal{Z}_\ge}\, d = 0, \ 0 \ne d \in \mathbb{R}^{|\mathcal{Z}_\ge|}, \text{ then } d^{\mathrm{T}}\big[\nabla^2\theta(z_+) + \textstyle\sum_l (y_+)_l \nabla^2 g_l(z_+)\big]_{\mathcal{Z}_\ge,\mathcal{Z}_\ge}\, d > 0.$$
We note that if $(z, y)$ is a zero of $F_+$ then (LI) and (SSOS) correspond to more familiar conditions (e.g., Robinson 1980, §4) defined with respect to $(z_+, y_+)$ rather than $(z, y)$. This connection is explored further in the proof of our next result.
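Both conditions are directly checkable in floating point. The following sketch (Python with NumPy and SciPy; the helper name, its callable arguments and the sample data are all hypothetical, and the index sets follow the reading of (16) given above) tests (LI) and (SSOS) at a pair $(z, y)$:

```python
# (LI): rows of grad g(z_+) indexed by {l : y_l >= 0}, restricted to columns
# {i : z_i > 0}, are linearly independent.  (SSOS): the reduced Lagrangian
# Hessian is positive definite on the null space of the strongly active
# constraint gradients.
import numpy as np
from scipy.linalg import null_space

def check_LI_SSOS(z, y, grad_g, hess_L):
    zp = np.maximum(z, 0.0)
    Zpos = np.where(z > 0)[0]            # Z_>
    Zgeq = np.where(z >= 0)[0]           # Z_>=
    Ypos = np.where(y > 0)[0]            # Y_>
    Ygeq = np.where(y >= 0)[0]           # Y_>=
    G, L = grad_g(zp), hess_L(zp, np.maximum(y, 0.0))
    A_li = G[np.ix_(Ygeq, Zpos)]
    LI = (A_li.size == 0) or (np.linalg.matrix_rank(A_li) == A_li.shape[0])
    Z = null_space(G[np.ix_(Ypos, Zgeq)]) if Ypos.size else np.eye(Zgeq.size)
    H = Z.T @ L[np.ix_(Zgeq, Zgeq)] @ Z
    SSOS = (H.size == 0) or (min(np.linalg.eigvalsh((H + H.T) / 2)) > 0)
    return LI, SSOS

# e.g. theta(z) = ||z||^2, g(z) = z_1 + z_2 - 1:
gg = lambda zp: np.array([[1.0, 1.0]])
hl = lambda zp, yp: 2.0 * np.eye(2)
print(check_LI_SSOS(np.array([0.5, 0.5]), np.array([1.0]), gg, hl))  # (True, True)
```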
PROPOSITION 13. Let $\theta: \mathbb{R}^n_+ \to \mathbb{R}$ and $g: \mathbb{R}^n_+ \to \mathbb{R}^m$ be twice continuously differentiable functions, and $F$ be given by (15). Let $\sigma, \tau \in (0,1)$ and $M \in \mathbb{N}$ be the parameters governing the path search (NmPS). Suppose $a_0 > 0$,
$$X_0 =_{\mathrm{def}} \{(z, y) \in \mathbb{R}^{n+m} : \|F_+(z, y)\| \le a_0\}$$
is bounded, and the above (LI) and (SSOS) conditions hold at each $(z, y) \in X_0$. If path search damped Newton's method for solving $F_+(z, y) = 0$ is defined using the first-order approximation (14), where $x =_{\mathrm{def}} (z, y)$, then for any $(z^0, y^0) \in X_0$ the damped Newton iterates $(z^k, y^k)$ converge to a zero $(z^*, y^*)$ of $F_+$ such that $z^*_+$ is a local minimizer of (NLP). The residuals $F_+(z^k, y^k)$ and the iterates $(z^k, y^k)$ converge at Q-superlinear rates to zero and $(z^*, y^*)$, respectively; indeed these rates are Q-quadratic if $\nabla^2\theta$ and $\nabla^2 g$ are Lipschitz near $z^*_+$.

PROOF. This is essentially a corollary of Proposition 12. Most of the proof is devoted to showing that the (LI) and (SSOS) conditions at a given point $(z^0, y^0) \in \mathbb{R}^n \times \mathbb{R}^m$ are sufficient for $\nabla F(z^0_+, y^0_+)_+$ to be Lipschitzianly invertible near that point. Below, $I_n$ denotes the $n \times n$ identity matrix; and the fact that
$$0 \in u + N_{\mathbb{R}^n_+}(z) \iff 0 \le u, \ 0 \le z, \ \langle u, z \rangle = 0 \iff 0 \in z + N_{\mathbb{R}^n_+}(u)$$
will be used without reference. Also Robinson (1980) is needed, so generalized equations rather than normal equations are stressed.

Let $x^0 =_{\mathrm{def}} (z^0, y^0)$ and $u^0 =_{\mathrm{def}} z^0_+ - z^0$ (so that $\langle z^0_+, u^0 \rangle = 0$). Define $(\delta, \gamma) =_{\mathrm{def}} F_+(x^0)$. Consider the following nonlinear program, a perturbed version of (NLP):
$$(\widetilde{\mathrm{NLP}}) \qquad \min \tilde\theta(z) \quad \text{subject to} \quad z \ge 0, \ \tilde g(z) \le 0,$$
where $\tilde\theta(z) =_{\mathrm{def}} \theta(z) - \langle \delta, z \rangle$ and $\tilde g(z) =_{\mathrm{def}} g(z) + \gamma$. The first-order optimality condition (Robinson 1980) for this problem is the perturbed generalized equation
$$(\widetilde{\mathrm{GE}}) \qquad 0 \in \tilde F(z, y, u) + N_{\mathbb{R}^n \times \mathbb{R}^{m+n}_+}(z, y, u),$$
where
$$\tilde F(z, y, u) =_{\mathrm{def}} (\nabla\theta(z) + \nabla g(z)^{\mathrm{T}} y - u, \ -g(z), \ z) - (\delta, \gamma, 0).$$
Writing out $F_+(x^0)$ gives $\delta = \nabla\theta(z^0_+) + \nabla g(z^0_+)^{\mathrm{T}} y^0_+ - u^0$ and $\gamma = -g(z^0_+) + y^0 - y^0_+$; using these one readily checks that $(z, y, u) = (z^0_+, y^0_+, u^0)$ solves $(\widetilde{\mathrm{GE}})$. This generalized equation is said to be strongly regular at $(z^0_+, y^0_+, u^0)$ if the linearized set mapping (Robinson 1980)
$$T(z, y, u) =_{\mathrm{def}} \tilde F(z^0_+, y^0_+, u^0) + \nabla\tilde F(z^0_+, y^0_+, u^0)(z - z^0_+, y - y^0_+, u - u^0) + N_{\mathbb{R}^n \times \mathbb{R}^{m+n}_+}(z, y, u)$$
is such that, for some neighborhoods $\tilde U$ of $(z^0_+, y^0_+, u^0)$ and $\tilde V$ of $0 \in \mathbb{R}^{2n+m}$, $T^{-1} \cap \tilde U$ is a Lipschitz mapping when restricted to $\tilde V$.

Given a subset $\mathcal{K}$ of the row indices of a matrix $M$, let $M_{\mathcal{K}}$ denote the submatrix consisting of the rows of $M$ with indices in $\mathcal{K}$. According to Robinson (1980, Theorem 4.1), it is sufficient for strong regularity of $(\widetilde{\mathrm{GE}})$ at $(z^0_+, y^0_+, u^0)$ that two conditions hold, where, for $(\bar y, \bar u) =_{\mathrm{def}} (y^0_+, u^0)$,
$$\mathcal{K}_\ge =_{\mathrm{def}} \{l : \tilde g_l(z^0_+) = 0\}, \qquad \mathcal{K}_> =_{\mathrm{def}} \{l : \tilde g_l(z^0_+) = 0, \ \bar y_l > 0\}.$$
The first is linear independence of the binding constraints: the matrix with rows $\nabla \tilde g_l(z^0_+)$, $l \in \mathcal{K}_\ge$, and $-e_i^{\mathrm{T}}$, $i \notin \mathcal{Z}_>$, has full row rank. The second is strong second-order sufficiency: if $d \ne 0$, $\nabla\tilde g(z^0_+)_{\mathcal{K}_>} d = 0$, and $d_i = 0$ whenever $\bar u_i > 0$, then $d^{\mathrm{T}} \mathcal{L} d > 0$, where
$$\mathcal{L} =_{\mathrm{def}} \nabla^2(\tilde\theta + \bar y^{\mathrm{T}} \tilde g)(z^0_+) = \nabla^2(\theta + (y^0_+)^{\mathrm{T}} g)(z^0_+).$$
In terms of $g$ and the index sets (16) taken at $(z^0, y^0)$: since $\tilde g(z^0_+) = y^0 - y^0_+$, we have $\mathcal{K}_\ge = \mathcal{Y}_\ge$ and $\mathcal{K}_> = \mathcal{Y}_>$, so the linear independence condition becomes that $\nabla g(z^0_+)_{\mathcal{Y}_\ge, \mathcal{Z}_>}$ has linearly independent rows; hence it is equivalent to (LI) at $(z^0, y^0)$. Likewise, the strong second-order sufficiency condition may be rewritten: if $0 \ne d \in \mathbb{R}^{|\mathcal{Z}_\ge|}$ and $\nabla g(z^0_+)_{\mathcal{Y}_>, \mathcal{Z}_\ge} d = 0$, then $d^{\mathrm{T}} \mathcal{L}_{\mathcal{Z}_\ge, \mathcal{Z}_\ge} d > 0$; which clearly is equivalent to (SSOS) at $(z^0, y^0)$. Therefore the conditions (LI) and (SSOS) at $(z^0, y^0)$ guarantee strong regularity of $(\widetilde{\mathrm{GE}})$ at $(z^0_+, y^0_+, u^0)$.

Furthermore, assuming $(z^*, y^*)$ is a zero of $F_+$: if $(z^0, y^0) = (z^*, y^*)$ then $(\delta, \gamma) = (0, 0)$ and $(\widetilde{\mathrm{NLP}})$ is just the original problem (NLP). So, as shown above, (LI) and (SSOS) at $(z^*, y^*)$ are respectively equivalent to the usual conditions of linear independence and strong second-order sufficiency for (NLP) with respect to the variable $z^*_+$, the multipliers $y^*_+$ corresponding to the nonlinear constraints $g(z) \le 0$, and the multipliers $z^*_+ - z^*$ corresponding to the constraints $z \ge 0$. Applying McCormick (1983, 10.4.1) we find that $z^*_+$ is a strict local minimizer of (NLP).

We now show that strong regularity of $(\widetilde{\mathrm{GE}})$ implies $\nabla F(z^0_+, y^0_+)_+$ is Lipschitzianly invertible near $(z^0, y^0)$. Let $M =_{\mathrm{def}} \nabla F(z^0_+, y^0_+)$ and $\tilde M =_{\mathrm{def}} \nabla\tilde F(z^0_+, y^0_+, u^0)$. Observe that
$$M = \begin{pmatrix} \mathcal{L} & G^{\mathrm{T}} \\ -G & 0 \end{pmatrix}, \qquad \tilde M = \begin{pmatrix} \mathcal{L} & G^{\mathrm{T}} & -I_n \\ -G & 0 & 0 \\ I_n & 0 & 0 \end{pmatrix},$$
where $\mathcal{L} =_{\mathrm{def}} \nabla^2(\theta + (y^0_+)^{\mathrm{T}} g)(z^0_+)$ as above and $G =_{\mathrm{def}} \nabla g(z^0_+)$.
Define $(\zeta^0, \eta^0) =_{\mathrm{def}} M_+(z^0, y^0) = M(z^0_+, y^0_+) + (z^0, y^0) - (z^0_+, y^0_+)$; a direct computation gives
$$(\zeta^0, \eta^0, 0) = \tilde M(z^0_+, y^0_+, u^0) - \tilde F(z^0_+, y^0_+, u^0).$$
Note also that
$$T(\cdot) = \big[\tilde F(z^0_+, y^0_+, u^0) - \tilde M(z^0_+, y^0_+, u^0)\big] + (\tilde M + N_{\mathbb{R}^n \times \mathbb{R}^{m+n}_+})(\cdot),$$
so $T$ and $\tilde M + N_{\mathbb{R}^n \times \mathbb{R}^{m+n}_+}$ differ by the constant shift $-(\zeta^0, \eta^0, 0)$. For any $(z, y)$ and $(\zeta, \eta)$ in $\mathbb{R}^n \times \mathbb{R}^m$,
$$(\zeta, \eta) \in (M + N_{\mathbb{R}^{n+m}_+})(z, y)$$
$$\iff \exists u \in \mathbb{R}^n: \quad \zeta = \mathcal{L}z + G^{\mathrm{T}}y - u, \quad \eta \in -Gz + N_{\mathbb{R}^m_+}(y), \quad 0 \in z + N_{\mathbb{R}^n_+}(u)$$
$$\iff \exists u \in \mathbb{R}^n: \quad (\zeta, \eta, 0) \in (\tilde M + N_{\mathbb{R}^n \times \mathbb{R}^{m+n}_+})(z, y, u).$$
Hence if $\tilde U$, $\tilde V$ are respective neighborhoods of $(z^0_+, y^0_+, u^0)$ and $(\zeta^0, \eta^0, 0)$ such that $(\tilde M + N_{\mathbb{R}^n \times \mathbb{R}^{m+n}_+})^{-1} \cap \tilde U$ is a Lipschitz mapping when restricted to $\tilde V$, then $(M + N_{\mathbb{R}^{n+m}_+})^{-1} \cap U$ is a Lipschitz mapping when restricted to $V$, where
$$U =_{\mathrm{def}} \{(z, y) \in \mathbb{R}^{n+m} : (z, y, u) \in \tilde U \text{ for some } u \in \mathbb{R}^n\}, \qquad V =_{\mathrm{def}} \{(\zeta, \eta) \in \mathbb{R}^{n+m} : (\zeta, \eta, 0) \in \tilde V\}.$$
In this case $U$ and $V$ are neighborhoods of $(z^0_+, y^0_+)$ and $(\zeta^0, \eta^0)$ respectively, so Lemma 6 (and equation (2)) assures us that $M_+$ is also Lipschitzianly invertible near $(z^0, y^0) = (\zeta^0, \eta^0) + (I - M)(z^0_+, y^0_+)$. To summarize:

$(z^0_+, y^0_+, u^0)$ solves $(\widetilde{\mathrm{GE}})$, and (LI) and (SSOS) hold at $(z^0, y^0)$

$\Longrightarrow$ $T^{-1} \cap \tilde U$ is Lipschitz when restricted to $\tilde V$, for some neighborhoods $\tilde U$, $\tilde V$ of $(z^0_+, y^0_+, u^0)$ and $0 \in \mathbb{R}^{2n+m}$, respectively

$\Longrightarrow$ $(\tilde M + N_{\mathbb{R}^n \times \mathbb{R}^{m+n}_+})^{-1} \cap \tilde U$ is Lipschitz when restricted to $\tilde V$, for some neighborhoods $\tilde U$, $\tilde V$ of $(z^0_+, y^0_+, u^0)$ and $(\zeta^0, \eta^0, 0)$, respectively

$\Longrightarrow$ $(M + N_{\mathbb{R}^{n+m}_+})^{-1} \cap U$ is Lipschitz when restricted to $V$, for some neighborhoods $U$, $V$ of $(z^0_+, y^0_+)$ and $(\zeta^0, \eta^0)$, respectively

$\Longrightarrow$ $M_+$ is Lipschitzianly invertible near $(z^0, y^0)$.

The proof is complete. $\square$

We show that Proposition 13 applies to convex programs. Let $\theta$, $g$, $F$, $\sigma$, $\tau$ and $M$ be as in Proposition 13.

LEMMA 14. Let $\theta$ and each component function $g_l$ ($l = 1, \dots, m$) of $g$ be convex. Suppose there is $\hat z \ge 0$ with $g(\hat z) < 0$ (Slater constraint qualification) and (NLP) has a (global) minimum at $\bar z \in \mathbb{R}^n_+$. Then there exists $\bar y \in \mathbb{R}^m_+$ such that $(z^*, y^*) =_{\mathrm{def}} (\bar z, \bar y) - F(\bar z, \bar y)$ is a zero of $F_+$. If (LI) and (SSOS) hold at $(z^*, y^*)$ then there is $a_0 > 0$ such that
$$X_0 =_{\mathrm{def}} \{(z, y) \in \mathbb{R}^{n+m} : \|F_+(z, y)\| \le a_0\}$$
is bounded, and $\nabla F(z_+, y_+)_+$ is Lipschitzianly invertible near each $(z, y) \in X_0$.

PROOF. It is a classical result in optimization (Mangasarian 1969, 7.3.7; McCormick 1983, 10.2.16) that there exists a Lagrange multiplier $(\bar y, \bar u) \in \mathbb{R}^m_+ \times \mathbb{R}^n_+$ such that
$$0 = \nabla\theta(\bar z) + \nabla g(\bar z)^{\mathrm{T}} \bar y - \bar u, \qquad 0 \le \bar u, \ \langle \bar u, \bar z \rangle = 0, \qquad 0 \le \bar y, \ \langle \bar y, g(\bar z) \rangle = 0.$$
Since $\bar z$ is also feasible ($\bar z \ge 0$ and $g(\bar z) \le 0$), $(z, y, u) = (\bar z, \bar y, \bar u)$ is a solution of
$$(\mathrm{GE}^*) \qquad 0 \in (F^* + N_{\mathbb{R}^n \times \mathbb{R}^{m+n}_+})(z, y, u),$$
where $F^*(z, y, u) =_{\mathrm{def}} (F(z, y) - (u, 0), \ z)$. $(\mathrm{GE}^*)$ is just the generalized equation $(\widetilde{\mathrm{GE}})$ from the proof of Proposition 13 when $(\delta, \gamma)$ is $0 \in \mathbb{R}^{n+m}$. It is easy to see that $(\bar z, \bar y)$ solves (GE), $0 \in (F + N_{\mathbb{R}^{n+m}_+})(\bar z, \bar y)$; hence $(z^*, y^*) =_{\mathrm{def}} (\bar z, \bar y) - F(\bar z, \bar y)$ is a zero of $F_+$, and $(z^*_+, y^*_+, z^*_+ - z^*)$ equals $(\bar z, \bar y, \bar u)$, a solution of $(\mathrm{GE}^*)$. Since (LI) and (SSOS) hold at $x^* =_{\mathrm{def}} (z^*, y^*)$, the proof of Proposition 13 demonstrates that $\nabla F(x^*_+)_+$ is Lipschitzianly invertible near $x^*$. Lemma 7, with $g =_{\mathrm{def}} \nabla F(x^*_+)_+$ and $g' =_{\mathrm{def}} F_+$, can be used to show that $F_+$ is Lipschitzianly invertible near $x^*$. Let $U^*$ be a neighborhood of $x^*$ and $a_* > 0$ be such that $F_+|_{U^*}: U^* \to a_* B$ is Lipschitzianly invertible. So $U^*$ is bounded. For any element $x^0$ of $U^*$, another application of Lemma 7, this time with $g =_{\mathrm{def}} F_+$ and $g' =_{\mathrm{def}} \nabla F(x^0_+)_+$, shows that $\nabla F(x^0_+)_+$ is Lipschitzianly invertible near $x^0$. It only remains to be seen that, for some $a_0 > 0$,
$$X_0 = F_+^{-1}(a_0 B) \subset U^*,$$
because then $X_0$ is bounded and, for each $x^0 \in X_0$, $\nabla F(x^0_+)_+$ is Lipschitzianly invertible near $x^0$.
It is a basic exercise in convex analysis to show that $F$ is monotone on $\mathbb{R}^{n+m}_+$:
$$\langle F(x) - F(x'), x - x' \rangle \ge 0, \qquad \forall x, x' \in \mathbb{R}^{n+m}_+.$$
By Brézis (1973, Proposition 2.10), $F + N_{\mathbb{R}^{n+m}_+}$ is a maximal monotone operator, meaning there is no monotone operator whose graph strictly contains the graph of $F + N_{\mathbb{R}^{n+m}_+}$. Hence $(F + N_{\mathbb{R}^{n+m}_+})^{-1}$ is also maximal monotone and, as a result (see Brézis 1973, Example 2.1.2), $(F + N_{\mathbb{R}^{n+m}_+})^{-1}(\xi)$ is a convex set for each $\xi \in \mathbb{R}^{n+m}$.

We have established that $F_+$ is Lipschitzianly invertible near $x^*$, so Lemma 6 yields neighborhoods $U^1$ of $x^*_+$ and $V^1$ of $0 \in \mathbb{R}^{n+m}$ such that $(F + N_{\mathbb{R}^{n+m}_+})^{-1} \cap U^1$ is a Lipschitz function when restricted to $V^1$. We may assume $V^1 \subset a_* B$ and $U^1$ is open. Let $\xi \in V^1$, $x = (F + N_{\mathbb{R}^{n+m}_+})^{-1}(\xi) \cap U^1$ and $x' \in (F + N_{\mathbb{R}^{n+m}_+})^{-1}(\xi)$; $x'$ need not lie in $U^1$. Then $(1 - t)x + tx'$ lies in $U^1$ for all sufficiently small $t \in [0, 1]$, because $(F + N_{\mathbb{R}^{n+m}_+})^{-1}(\xi)$ is convex and $U^1$ is open. This contradicts uniqueness of $x$ in $U^1$, unless $x' = x$. Therefore $(F + N_{\mathbb{R}^{n+m}_+})^{-1}|_{V^1}$ is a function. It follows, by using (2) from the proof of Lemma 6, that $F_+^{-1}|_{V^1}$ is also a function. Recall $F_+^{-1} \cap U^*$ is a function when restricted to $a_* B$, and $a_* B$ contains $V^1$; therefore $F_+^{-1}|_{V^1} \cap U^*$ and $F_+^{-1}|_{V^1}$ are identical, in particular $F_+^{-1}(V^1) \subset U^*$. To conclude, choose $a_0 > 0$ such that $a_0 B \subset V^1$, and observe
$$X_0 = F_+^{-1}(a_0 B) \subset F_+^{-1}(V^1) \subset U^*. \qquad \square$$

5.4. Implementation of the path search. We will outline how the Newton path $p^k$ is obtained computationally at iteration $k + 1$, for solving (NE) when $C =_{\mathrm{def}} \mathbb{R}^N_+$. We are actually thinking of solving (NLP), that is, letting $N =_{\mathrm{def}} n + m$ and $F$ be given by (15).

First Lemke's algorithm (Cottle and Dantzig 1974), a standard pivotal method for solving linear complementarity problems, is reviewed. The Linear Complementarity Problem is the special case of the nonlinear complementarity problem in which the function $F$ defining (NLCP) is affine: find $v, w \in \mathbb{R}^N$ such that
$$(\mathrm{LCP}) \qquad w = Mv + q, \qquad 0 \le v, w, \qquad 0 = \langle v, w \rangle,$$
where $M \in \mathbb{R}^{N \times N}$ and $q \in \mathbb{R}^N$. In Lemke's algorithm an artificial variable $v_0 \in \mathbb{R}$ is introduced. Let $e =_{\mathrm{def}} (1, \dots, 1)^{\mathrm{T}} \in \mathbb{R}^N$. At each iteration of the algorithm we have a basic feasible solution or BFS (Chvátal 1983), $(v, w, v_0)$, of the system
$$(17) \qquad w = Mv + q + e v_0, \qquad 0 \le (v, w, v_0),$$
which is almost complementary, i.e., for each basic variable $v_i$ (respectively $w_i$), $1 \le i \le N$, $w_i$ (respectively $v_i$) is nonbasic. The variable $w_i$ is called the complement of $v_i$, and vice versa. In particular $\langle v, w \rangle = 0$, hence if $v_0 = 0$ then $(v, w)$ solves (LCP). Given $v_0$, $(v, w)$ solves the parametric linear complementarity problem
$$w = Mv + q + e v_0, \qquad 0 \le v, w, \qquad 0 = \langle v, w \rangle.$$
The initial almost complementary BFS is given by taking $v_0$ large and positive, and $(v, w) = (0, q + e v_0)$. So Lemke's algorithm may be viewed as a method of constructing a path of solutions $(v(v_0), w(v_0))$ to the parametric problem as $v_0$ moves down toward zero. (Actually, $v_0$ need not strictly decrease as a result of a pivot in Lemke's algorithm, in which case the corresponding solution $(v, w)$ of the parametric problem is not a function of $v_0$. In any case $(v, w)$ traces out a path.)

We use this idea, but with a different parametric problem, to construct the path $p^k$ we need for iteration $k + 1$ of damped Newton's method. Let $N =_{\mathrm{def}} n + m$, $F$ be given by (15) and $x^k =_{\mathrm{def}} (z^k, y^k)$. Let the first-order approximation $\mathcal{A}$ of $F_+$ on $\mathbb{R}^N$ be given by (14), and denote $\mathcal{A}(x^k)$ by $A^k$. Now $p^k(0) =_{\mathrm{def}} x^k$ and, for each $t \in \mathrm{dom}(p^k)$, $p^k(t)$ is the solution $x$ to
$$(1 - t)F_+(x^k) = A^k(x) = F(x^k_+) + \nabla F(x^k_+)(x_+ - x^k_+) + x - x_+ = \nabla F(x^k_+)_+(x) + \big[F(x^k_+) - \nabla F(x^k_+)x^k_+\big].$$
Equivalently, $(v, w) = (p^k(t)_+, \ p^k(t)_+ - p^k(t))$ solves the parametric linear complementarity problem
$$(\mathrm{LCP})(t) \qquad w = M^k v + q^k - (1 - t) r^k, \qquad 0 \le v, w, \qquad 0 = \langle v, w \rangle,$$
where
$$M^k =_{\mathrm{def}} \begin{pmatrix} \mathcal{L}^k & \nabla g(z^k_+)^{\mathrm{T}} \\ -\nabla g(z^k_+) & 0 \end{pmatrix}, \qquad \mathcal{L}^k =_{\mathrm{def}} \nabla^2(\theta + (y^k_+)^{\mathrm{T}} g)(z^k_+),$$
$$q^k =_{\mathrm{def}} F(x^k_+) - \nabla F(x^k_+) x^k_+ = \begin{pmatrix} \nabla\theta(z^k_+) - \mathcal{L}^k z^k_+ \\ -g(z^k_+) + \nabla g(z^k_+) z^k_+ \end{pmatrix},$$
$$r^k =_{\mathrm{def}} F_+(x^k) = M^k v^k + q^k - w^k, \qquad (v^k, w^k) =_{\mathrm{def}} (x^k_+, \ x^k_+ - x^k).$$
Clearly $(v^k, w^k)$ solves $(\mathrm{LCP})(0)$, and a solution $(v, w)$ of $(\mathrm{LCP})(1)$ yields a zero $v - w$ of $A^k$, i.e., the Newton iterate.
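The change of variables $x \leftrightarrow (v, w) = (x_+, x_+ - x)$ underlying this equivalence can be verified numerically; a sketch (hypothetical $F$ and data):

```python
# For any x, put (v, w) = (x_+, x_+ - x).  Then v, w >= 0 are complementary,
# x = v - w, and w - (M^k v + q^k) = -A^k(x); hence A^k(x) = (1 - t) r^k
# exactly when (v, w) satisfies w = M^k v + q^k - (1 - t) r^k, i.e. (LCP)(t).
import numpy as np

def F(c):  return np.array([c[0]**2 + c[1] - 1, c[0] + 3*c[1] + 2])
def dF(c): return np.array([[2*c[0], 1.0], [1.0, 3.0]])

xk = np.array([0.4, -0.6])
vk = np.maximum(xk, 0.0)                    # v^k = x^k_+
wk = vk - xk                                # w^k = x^k_+ - x^k
Mk, qk = dF(vk), F(vk) - dF(vk) @ vk
rk = F(vk) + xk - vk                        # r^k = F_+(x^k)
print(np.allclose(wk, Mk @ vk + qk - rk))   # (v^k, w^k) solves (LCP)(0): True

x = np.array([1.3, -2.0])                   # an arbitrary point
v, w = np.maximum(x, 0.0), np.maximum(x, 0.0) - x
Ak_x = F(vk) + Mk @ (v - vk) + x - v        # A^k(x)
print((v >= 0).all() and (w >= 0).all() and np.allclose(v * w, 0.0))  # True
print(np.allclose(w - (Mk @ v + qk), -Ak_x))                          # True
```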
To construct $p^k$ we use a modification of Lemke's algorithm in which the role of the auxiliary variable $v_0$ is played by $t$. Assume $(v, w, t)$ is an almost complementary BFS of
$$(18) \qquad w = M^k v + q^k + (t - 1) r^k, \qquad 0 \le v, w, \qquad 0 \le t \le 1,$$
such that $t$ is basic. This defines the current point on $p^k$: $p^k(t) =_{\mathrm{def}} v - w$. As there are exactly $N$ basic variables, there is some $i$ for which both $v_i$ and $w_i$ are nonbasic. One of these nonbasics is identified as the entering variable. The modified algorithm iterates like Lemke's algorithm, in the following way. Increase the entering variable, altering the basic variables as needed to maintain the equation $w = M^k v + q^k + (t - 1) r^k$, until (at least) one of the basic variables is forced to a bound. We choose one of these basic variables to be the leaving variable, that is, the variable to be replaced in the basis by the entering variable. This transformation of the entering and basic variables is called a pivot operation, and corresponds to moving along an affine piece of the path $p^k$ from the value of the parameter $t$ before the pivot, $t_{\mathrm{old}}$, to the current parameter value. After a pivot, $(v, w, t)$ is still an almost complementary BFS for (18); the leaving variable is now nonbasic and, assuming it is not $t$, its complement is nonbasic too. The next entering variable is defined as the complement of the leaving variable. This completes an iteration of the modified algorithm, and we are ready to begin the next iteration. The modified algorithm is initialized at $(v, w, t) = (v^k, w^k, 0)$ with $t$ as the nonbasic entering variable (so $p^k(0) = v^k - w^k$ as needed).

Now the algorithm cannot continue if either $t$ becomes the leaving variable (because no entering variable is defined) or no leaving variable can be found (because increasing the entering variable causes no decrease in any basic variable). In the latter case we have detected a ray. Other stopping criteria are needed to reflect the aim of the exercise, namely to path search. We stop iterating if $t = 1$ (the entire path has been traversed); or $t$, with $p^k(t) =_{\mathrm{def}} v - w$, does not satisfy the descent condition (NmD); or $t$ strictly decreases as a result of the last pivot. (Satisfaction of (NmD) after each pivot is the basic requirement of a forward path search.) In practice, an upper bound on the number of pivots is also enforced.

It is a subtle point that, when solving (NLP), the first-order approximation at $x^k = (z^k, y^k) \in \mathbb{R}^{n+m}$ used to define the path $p^k$ is exact for points $x = (z, y)$ such that $z_+ = z^k_+$:
$$A^k(x) = F(x_+) + x - x_+.$$
(This is because $F$ of (15) is affine in the multipliers $y$.) Using this idea, we may save a little work by only checking (NmD) if the positive part of the variables $z$ differs from $z^k_+$. This corresponds, in the modified Lemke's algorithm, to only checking (NmD) if the subvector of the first $n$ components of $v$ differs from the corresponding subvector of $v^k$.

Suppose the method halts because a ray is detected or $t$ decreases. In both cases we know $A^k$ is not invertible at $p^k(t_{\mathrm{old}})$ (otherwise it would have been possible to perform a pivot yielding $t > t_{\mathrm{old}}$), so we take $x^{k+1} =_{\mathrm{def}} p^k(t_{\mathrm{old}})$.
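A sketch of this exactness observation follows (toy problem data; the functions are hypothetical). With $F$ from (15), which is affine in the multipliers, $A^k$ agrees with $F_+$ at any $x = (z, y)$ whose positive part in $z$ equals $z^k_+$:

```python
# Exactness of A^k on points sharing z_+ with x^k, for the NLP normal map.
import numpy as np

n, m = 2, 1
def theta_grad(z): return np.array([4*z[0]**3, 2*z[1]])
def g(z):          return np.array([z[0] + z[1] - 1])
def g_jac(z):      return np.array([[1.0, 1.0]])
def theta_hess(z): return np.diag([12*z[0]**2, 2.0])

def F(x):
    z, y = x[:n], x[n:]
    return np.concatenate([theta_grad(z) + g_jac(z).T @ y, -g(z)])

def dF(x):
    z = x[:n]
    L = theta_hess(z)        # + sum_l y_l * hess g_l(z); g is affine here
    G = g_jac(z)
    return np.block([[L, G.T], [-G, np.zeros((m, m))]])

def F_plus(x):
    xp = np.maximum(x, 0.0)
    return F(xp) + x - xp

xk = np.array([0.5, -0.2, 0.3])
def A_k(x):
    xkp, xp = np.maximum(xk, 0.0), np.maximum(x, 0.0)
    return F(xkp) + dF(xkp) @ (xp - xkp) + x - xp

x = np.array([0.5, -0.7, -1.4])     # same z_+ as x^k; y part arbitrary
print(np.allclose(A_k(x), F_plus(x)))   # True
```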
Another possibility is that the method halts because (NmD) fails at $t$, in which case we use the Armijo line search on $[p^k(t_{\mathrm{old}}), p^k(t)]$ to determine a path length $t_k \in [t_{\mathrm{old}}, t]$ satisfying (NmPS), and take $x^{k+1} =_{\mathrm{def}} p^k(t_k)$.

We have described our implementation of a forward path search. A backward path search was also implemented, as follows. We start by recording the first tuple $(v^k, w^k, 0)$, and then perform pivots as described previously, recording the tuple $(v, w, t)$ after each successful pivot. As before, pivoting stops if a ray is detected, $t$ leaves the basis, or $t = 1$; we do not, however, use the condition (NmD) or the sign of the change in $t$ to decide whether to continue pivoting. Suppose $\bar J$ is the number of successful pivots, so that the list contains $\bar J + 1$ tuples. If the $(\bar J + 1)$th tuple $(v, w, t)$ is accurate, meaning (NmD) is satisfied at $t$ with $p^k(t) =_{\mathrm{def}} v - w$, then take $x^{k+1} =_{\mathrm{def}} v - w$. If not, take $J = 1$, $\bar J = \bar J + 1$ and, while $J + 1 \ne \bar J$, do the following bisection search on the list: if the tuple at position $J' =_{\mathrm{def}} \lfloor (J + \bar J)/2 \rfloor$ in the list is accurate, then let $J = J'$ and $\bar J$ be unchanged; otherwise let $J$ be unchanged and $\bar J = J'$. After the bisection search, the tuple at position $J$ is accurate and the tuple at position $J + 1$ is not accurate. An Armijo line search is now carried out on the interval between the points $v - w$ defined by these two tuples, and $x^{k+1}$ defined accordingly.

One difficulty we have not discussed is the possibility that $(v^k, w^k)$ is not a basic solution of
$$w = M^k v + q^k - r^k, \qquad 0 \le v, w,$$
i.e., $(v^k, w^k, 0)$ with $t$ nonbasic cannot be used as a starting point of the modified Lemke's algorithm. To overcome this, we performed a modified Cholesky factorization (Gill, Murray and Wright 1981) of the Hessian of the Lagrangian, $\mathcal{L}^k$ above, changing it if necessary into a positive definite matrix. This operation, somewhat crude because it attacks the entire Hessian of the Lagrangian rather than just the "basis" columns, may not always be sufficient to correct the problem, since columns of $M^k$ not corresponding to columns of $\mathcal{L}^k$ are generally present in the initial basis. However it was sufficient in the problems we tried, below. It has the added advantage of being a heuristic for preventing convergence to a Karush-Kuhn-Tucker point of the nonlinear program which corresponds to a local maximum or saddle point instead of a local minimum of the program.
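Our implementation used the modified Cholesky factorization of Gill, Murray and Wright (1981). As a much simpler stand-in with the same intent, one can add multiples of the identity until a plain Cholesky factorization succeeds. A sketch (Python; the helper name and the shift schedule are assumptions, not the Gill-Murray-Wright algorithm):

```python
# Make an indefinite Hessian of a Lagrangian positive definite by a
# geometric search over diagonal shifts tau in {0, beta, 2*beta, 4*beta, ...}.
import numpy as np

def make_positive_definite(L, beta=1e-3):
    """Return L + tau*I for the first tau in the schedule for which a
    Cholesky factorization succeeds."""
    tau = 0.0
    while True:
        try:
            np.linalg.cholesky(L + tau * np.eye(L.shape[0]))
            return L + tau * np.eye(L.shape[0])
        except np.linalg.LinAlgError:
            tau = beta if tau == 0.0 else 2.0 * tau

L = np.array([[1.0, 2.0], [2.0, 1.0]])     # indefinite: eigenvalues -1 and 3
print(np.linalg.eigvalsh(make_positive_definite(L)))   # all eigenvalues > 0
```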
5.5. Computational examples. Our computational examples are of global, or path search damped, Newton's method applied to nonlinear programs of the form (NLP): $\min \theta(z)$ subject to $z \ge 0$, $g(z) \le 0$. The computer programs were written in C and the computation carried out on a Sun 4 (Sparcstation). Most problems are taken from Ferris (1990), with starting variables $z^0$ close to solutions (Fortran subroutines and initial values of variables were supplied by Michael C. Ferris, Computer Sciences Department, University of Wisconsin-Madison, Madison, WI 53706) and initial multipliers $y^0$ set to zero. We note that these problems all have small dimension: no more than 15 variables and 10 nonlinear constraints. Results include performance of global Newton's method using both the forward and backward path search procedures. Other problems, showing convergence of the damped method in spite of the cycling (nonconvergence) that occurs without damping, are also tested.

For comparison we have also tested local (Robinson- or Josephy-) Newton's method on the same problems, using Lemke's algorithm to find the Newton iterate at each iteration. This method was found to be somewhat fragile with respect to starting points: quite often the method failed because a ray was detected by Lemke's algorithm, i.e., the Newton iterate $\hat x^{k+1}$ could not be determined. To make the method more robust we altered it, allowing it to continue after Lemke's algorithm detects a ray by setting $x^{k+1} = v - w$, where $(v, w, v_0)$ is the last almost complementary BFS produced by Lemke's algorithm. The altered method was usually successful in solving test problems.

Our limited computational results, below, bear out what intuition suggests. First, the global method is more robust than the local method, especially in avoiding cycling, discussed below. Second, the global method requires more stringent conditions on the current approximation $A^k$ than does the local method, and enforcing these heuristically can degrade convergence. Third, the computational cost of carrying out a path search is seen in the increase in the number of evaluations of $F$. For these problems, the backward path search was more efficient than the forward procedure because, at each iteration, the Newton iterate $p^k(1) \in (A^k)^{-1}(0)$ usually existed and satisfied the descent criterion (NmD), so that $x^{k+1} =_{\mathrm{def}} p^k(1)$. In higher dimensions we expect the value of path searching to be even greater, a view partly supported by the following observations.

One peculiarity of Newton's method is the possibility of cycling, depending on the starting point, as in the smooth case. This is seen for the altered Newton's algorithm when testing the Colville 2 problem with a feasible starting point (Table 1). It is easy to find a real function of one variable which, when minimized over $[0, \infty)$ by Newton's method, demonstrates cycling of (unaltered) Newton's method (Example 15 below). It turns out that for any problem (NLP), the new problem formed by adding such a function of an $(n+1)$th variable to the objective function $\theta$ cannot be solved by Newton's method for some starting points (Example 16): at best it will converge in the first $n$ variables and cycle in the $(n+1)$th variable. So it can be argued that the likelihood of cycling in Newton's method increases as the dimension of the problem increases. Global Newton's method converges for otherwise well behaved problems.

Table 1 summarizes our results on the problems from Ferris (1990). The following parameters were set in the path search: $\sigma =_{\mathrm{def}} 0.1$, $\tau =_{\mathrm{def}} 0.5$, $M =_{\mathrm{def}} 4$. The problem was considered solved when the Euclidean norm of the residual satisfied
$$\|F(x^k_+) + x^k - x^k_+\| < 10^{-5}.$$

TABLE 1
I. Local Newton. II. Global Newton with a Forward Path Search. III. Global Newton with a Backward Path Search.

                                        Pivots          Evaluations     Iterations
Problem                  Size m x n     I    II   III   I    II   III   I    II   III
Rosenbrock               4 x 2          20   19   21    7    17   15    6    9    9
Himmelblau               3 x 4          25   7    7     6    6    6     5    5    5
Wright                   3 x 5          99   31   31    8    29   29    7    27   27
Colville 1               10 x 5         31   41   41    4    5    4     3    3    3
Colville 2 (feasible)    5 x 15         *    23   24    *    21   13    *    8    8
Colville 2 (infeasible)  5 x 15         113  40   40    8    23   8     7    7    7

Starting points $x^0 =_{\mathrm{def}} (z^0, y^0) \in \mathbb{R}^{n+m}$ were taken with the variables $z^0$ used in Ferris (1990) and the multipliers $y^0 =_{\mathrm{def}} 0$. The first column of Table 1 gives the name of the problem, and the second column gives the number of constraints $m$ in the vector inequality $g(z) \le 0$, and the number of variables $n$, i.e., the respective dimensions of the multipliers $y$ and the variables $z$.
Each of the remaining columns contains three subcolumns corresponding to three implementations of Newton's method: I denotes the (altered) local Newton's method; II, the global method with a forward path search; and III, the global method with a backward path search. The 'Pivots' columns list the total number of pivots required by the three implementations to solve a given problem. The 'Evaluations' columns list the total number of evaluations of $F$ used by the three implementations. The final columns, 'Iterations', show the number of iterations needed by the implementations to solve each problem. An asterisk * indicates failure to solve the problem.

REMARKS ON TABLE 1. Iterations. The most important feature of the results is the number of iterations needed, i.e., the number of evaluations of the Hessian of the Lagrangian, $\mathcal{L}^k$. Three problems where the numbers of iterations differ widely are discussed further. The objective function of the Rosenbrock problem is highly nonlinear, causing any damping of Newton's method (even in the unconstrained case) to take very short steps, hence many iterations, unless the current iterate is very close to a minimizer. The nonmonotone line search alleviates this problem: with memory length $M = 4$ only 9 iterations were needed, while more than 500 iterations were needed in another test using the monotone path search ($M = 1$). In other tests using the monotone path search, the remaining problems required the same or slightly fewer iterations than listed in Table 1.

In the Wright problem, the local method does much better (7 iterations) than both implementations of the global method (27 iterations) because it uses the actual Hessian of the Lagrangian, whereas the global implementations introduce error by using a modified Cholesky version of it. In another test, 27 iterations were also required by the local method using the modified Cholesky version of the Hessian of the Lagrangian. This highlights a difficulty of path search damping: care is needed to ensure that the path is well defined near the current iterate. On the other hand, this has advantages (other than damping), as we see in the remarks on pivots below.

For the Colville 2 problem with a feasible starting point, the local Newton's method cycled while the global Newton implementations solved the problem easily. In the local implementation, after the first iteration the pair $(v^k, w^k)$ alternated between two different vector pairs for which the associated residual $\|F(v^k) - w^k\|$ took the values 98 and 5 respectively, the former corresponding to an unbounded ray.

Evaluations. The number of evaluations of $F$, i.e., of $\nabla\theta$, $g$ and $\nabla g$, is higher for the global implementations than for the local implementation, because the former compare $\|F(v) - w\|$ to $(1 - t)\|F(v^k) - w^k\|$ after perhaps several pivots during iteration $k + 1$, whereas the latter only checks the norm of the residual at the start of the iteration. For global Newton's method, the backward path search (III) generally uses fewer function and derivative evaluations than the forward version (II) because, near the solution, the Newton iterate usually exists and is acceptable as the next iterate. The difference between the number in the 'Pivots' column and the number in the 'Evaluations' column for method II (global Newton with a forward path search) is the saving in function/derivative evaluations obtained by only checking (NmD) when the first $n$ components of $v$ differ from the first $n$ components of $v^k$.
Pivots. The global implementations usually require fewer pivots per iteration than does the local implementation. This is not surprising: the damped method requires the current iterate $x^k$ to correspond to a basis of $(\mathrm{LCP})(0)$. We initiate the modified Lemke's algorithm at this basis by explicitly factorizing the corresponding matrix. By starting with this 'warm' basis, both the forward and backward implementations of the damped method generally take many fewer pivots than the local implementation to find the Newton iterate, the solution of $(\mathrm{LCP})(1)$.

EXAMPLE 15. Let $\phi: \mathbb{R} \to \mathbb{R}$ be a differentiable function whose gradient is
$$\nabla\phi(z) =_{\mathrm{def}} \arctan(z - 10),$$
the shifted inverse trigonometric tangent function. Integrating and setting the constant of integration to zero, we obtain
$$\phi(z) = (z - 10)\arctan(z - 10) - \tfrac{1}{2}\log(1 + (z - 10)^2).$$
Now consider a simple problem of the form (NLP):
$$\min \phi(z) \quad \text{subject to} \quad z \ge 0.$$
The unique solution is $z = 10$. It can easily be shown that for any starting point with $|z^0 - 10| > 2$, Newton's method cycles infinitely. This has been observed in computation for several such starting points. Global Newton's method using the forward path search, with the same parameters as used to obtain the results of Table 1, converged in 4-33 iterations over a variety of starting points $2 < |z^0 - 10| < 100$.

EXAMPLE 16. Derive a problem from (NLP) by including an extra variable $z_{n+1}$. The new objective function is
$$\tilde\theta(z_1, \dots, z_{n+1}) =_{\mathrm{def}} \theta(z_1, \dots, z_n) + \phi(z_{n+1}),$$
where $\phi$ is defined in the last example. The constraints are $g(z_1, \dots, z_n) \le 0$ as before, and $0 \le z =_{\mathrm{def}} (z_1, \dots, z_{n+1})$. It is not hard to see that for any starting point $(z^0, y^0) \in \mathbb{R}^{n+1} \times \mathbb{R}^m$ in which $|z^0_{n+1} - 10| > 2$, Newton's method cannot converge in $z_{n+1}$. We have tested this for the Himmelblau (NLP), where $n =_{\mathrm{def}} 4$, $(z^0_1, \dots, z^0_4)$ are the initial variables used in Ferris (1990), $z^0_5 = 0$, and all multipliers are initially zero. Newton's method cycles infinitely; damped Newton's method using the forward path search, with the same parameters as used for Table 1, converges in 33 iterations.

In other tests using the monotone path search, damped Newton's method using the forward path search required only 4-7 iterations to solve the problem in Example 15 from various starting points, and 9 iterations for the augmented problem of Example 16 using the starting point specified there. In both cases this is an improvement on nonmonotone damping. It is interesting that nonmonotone damping, which aids convergence for highly nonlinear problems by accepting iterates that need not decrease the size of the residual, slows convergence here because it perpetuates almost cyclic behavior of the iterates.
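The cycling in Example 15, and its removal by damping, can be reproduced in a few lines. In the sketch below (Python; helper names are hypothetical), the Newton point is the zero of the piecewise linear approximation $\mathcal{A}(x)(\cdot)$ of the one-dimensional normal map, and the damped variant applies a monotone Armijo search along the segment toward the Newton point, a simple stand-in for the path search (here $\mathcal{A}(x)$ is monotone piecewise linear, so the Newton path traverses that segment):

```python
# Undamped normal-map Newton for min phi(z) s.t. z >= 0 cycles from z0 = 14;
# the damped variant converges to the solution z = 10.
import numpy as np

dphi  = lambda z: np.arctan(z - 10.0)          # phi'(z)
d2phi = lambda z: 1.0 / (1.0 + (z - 10.0)**2)  # phi''(z)

def F_plus(x):                     # normal map: phi'(x_+) + x - x_+
    xp = max(x, 0.0)
    return dphi(xp) + x - xp

def newton_point(x):               # zero of the piecewise linear A(x)(.)
    xp = max(x, 0.0)
    cand = xp - dphi(xp) / d2phi(xp)           # branch x' >= 0
    if cand >= 0.0:
        return cand
    return d2phi(xp) * xp - dphi(xp)           # branch x' < 0 (consistent)

def iterate(x, damped, iters=40, sigma=0.1, tau=0.5):
    for _ in range(iters):
        xhat, t = newton_point(x), 1.0
        if damped:                 # monotone Armijo toward the Newton point
            while abs(F_plus(x + t*(xhat - x))) > (1 - sigma*t)*abs(F_plus(x)):
                t *= tau
        x = x + t * (xhat - x)
    return x

print(iterate(14.0, damped=False))  # cycles, alternating near -1.56 and 148.6
print(iterate(14.0, damped=True))   # ~10.0, the solution
```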
Acknowledgement. This research was initiated under funding from the Air Force Office of Scientific Research, Air Force Systems Command, USAF, through Grants No. AFOSR-88-0090 and No. AFOSR-89-0058; and the National Science Foundation through Grant No. CCR-8801489. Further support was given by the National Science Foundation for the author's participation in the 1990 Young Scientists' Summer Program at the International Institute of Applied Systems Analysis, Laxenburg, Austria. Finally, this research was partially supported by the U.S. Army Research Office through the Mathematical Sciences Institute, Cornell University, and by the Computational Mathematics Program of the National Science Foundation under grant DMS-8706133. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon.

I would like to thank Stephen M. Robinson for his support of this project, through both initial funding and discussions, and his continued interest. I also thank Laura M. Morley for her assistance in implementing path search damped Newton's method; and Oleg P. Burdakov, Michael C. Ferris, Liqun Qi, and two anonymous referees for their helpful comments. This paper was completed while the author was at the Advanced Computing Research Institute, Cornell University, Ithaca, New York 14853.

References

Bonnans, J. F. (1990). Rates of Convergence of Newton Type Methods for Variational Inequalities and Nonlinear Programming. Manuscript, Institut National de Recherche en Informatique et en Automatique, Rocquencourt, France.
Brézis, H. (1973). Opérateurs Maximaux Monotones et Semi-groupes de Contractions dans les Espaces de Hilbert. North-Holland Mathematics Studies No. 5, North-Holland, Amsterdam.
Burdakov, O. P. (1980). Some Globally Convergent Modifications of Newton's Method for Solving Systems of Nonlinear Equations. Soviet Math. Dokl. 22 376-379.
Chvátal, V. (1983). Linear Programming. W. H. Freeman and Co., NY.
Clarke, F. H. (1983). Optimization and Nonsmooth Analysis. John Wiley and Sons, NY.
Cottle, R. W. and Dantzig, G. B. (1974). Complementary Pivot Theory of Mathematical Programming. In Studies in Optimization 10 (G. B. Dantzig and B. C. Eaves, Eds.), MAA Studies in Mathematics, pp. 27-51.
Ferris, M. C. (1990). Iterative Linear Programming Solution of Convex Programs. J. Optim. Theory Appl. 65 53-65.
Ferris, M. C. and Lucidi, S. (1994). Nonmonotone Stabilization Methods for Nonlinear Equations. J. Optim. Theory Appl. 81 (to appear).
Fletcher, R. (1987). Practical Methods of Optimization. 2nd ed., John Wiley and Sons, NY.
Fukushima, M. (1992). Equivalent Differentiable Optimization Problems and Descent Methods for Asymmetric Variational Inequality Problems. Math. Programming 53 99-110.
Gill, P. E., Murray, W. and Wright, M. H. (1981). Practical Optimization. Academic Press, NY.
Grippo, L., Lampariello, F. and Lucidi, S. (1986). A Nonmonotone Line Search Technique for Newton's Method. SIAM J. Numer. Anal. 23 707-716.
Han, S.-P., Pang, J.-S. and Rangaraj, N. (1992). Globally Convergent Newton Methods for Nonsmooth Equations. Math. Oper. Res. 17 586-607.
Harker, P. T. and Pang, J.-S. (1990). Finite-dimensional Variational Inequality and Nonlinear Complementarity Problems: A Survey of Theory, Algorithms and Applications. Math. Programming 48 161-220.
Harker, P. T. and Xiao, B. (1990). Newton's Method for the Nonlinear Complementarity Problem: A B-Differentiable Equation Approach. Math. Programming 48 339-357.
Josephy, N. H. (1979). Newton's Method for Generalized Equations. Technical Summary Report #1965, Mathematics Research Center, University of Wisconsin-Madison.
Kojima, M. (1980). Strongly Stable Stationary Solutions in Nonlinear Programs. In Analysis and Computation of Fixed Points (S. M. Robinson, Ed.), Academic Press, NY, pp. 93-138.
Kojima, M. and Shindo, S. (1986). Extensions of Newton and Quasi-Newton Methods to Systems of PC¹ Equations. J. Oper. Res. Soc. Japan 29 352-374.
Mangasarian, O. L. (1969). Nonlinear Programming. McGraw-Hill, NY.
Marcotte, P. and Dussault, J.-P. (1987). A Note on a Globally Convergent Newton Method for Solving Monotone Variational Inequalities. Oper. Res. Lett. 6 35-42.
McCormick, G. P. (1983). Nonlinear Programming: Theory, Algorithms and Applications. John Wiley and Sons, NY.
Ortega, J. M. and Rheinboldt, W. C. (1970). Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, San Diego.
Pang, J.-S. (1990). Newton's Method for B-differentiable Equations. Math. Oper. Res. 15 311-341.
Pang, J.-S. (1991). A B-differentiable Equation Based, Globally and Locally Quadratically Convergent Algorithm for Nonlinear Programs, Complementarity and Variational Inequality Problems. Math. Programming 51 101-131.
Pang, J.-S. (1993). Convergence of Splitting and Newton Methods for Complementarity Problems: An Application of Some Sensitivity Results. Math. Programming 58 149-160.
Pang, J.-S. and Gabriel, S. A. (1993). NE/SQP: A Robust Algorithm for the Nonlinear Complementarity Problem. Math. Programming 60 295-337.
Pang, J.-S., Han, S.-P. and Rangaraj, N. (1991). Minimization of Locally Lipschitzian Functions. SIAM J. Optim. 1 57-82.
Park, K. (1989). Continuation Methods for Nonlinear Programming. Dissertation, Department of Industrial Engineering, University of Wisconsin-Madison.
Qi, L. (1993). Convergence Analysis of Some Algorithms for Solving Nonsmooth Equations. Math. Oper. Res. 18 227-244.
Ralph, D. (1993). A New Proof of Robinson's Homeomorphism Theorem for PL-Normal Maps. Linear Algebra Appl. 178 249-260.
Rheinboldt, W. C. (1969). Local Mapping Relations and Global Implicit Function Theorems. Trans. Amer. Math. Soc. 138 183-198.
Robinson, S. M. (1979). Generalized Equations and Their Solutions. Part I: Basic Theory. Math. Programming Stud. 10 128-141.
Robinson, S. M. (1980). Strongly Regular Generalized Equations. Math. Oper. Res. 5 43-62.
Robinson, S. M. (1983). Local Structure of Feasible Sets in Nonlinear Programming. Part I: Regularity. In Numerical Methods: Proceedings, Caracas 1982 (V. Pereyra and A. Reinoza, Eds.), Springer-Verlag, Berlin, pp. 240-251.
Robinson, S. M. (1987). Local Structure of Feasible Sets in Nonlinear Programming. Part III: Stability and Sensitivity. Math. Programming Stud. 30 45-66.
Robinson, S. M. (1988). Newton's Method for a Class of Nonsmooth Functions. Manuscript, Department of Industrial Engineering, University of Wisconsin-Madison.
Robinson, S. M. (1992). Normal Maps Induced by Linear Transformations. Math. Oper. Res. 17 691-714.
Taji, K., Fukushima, M. and Ibaraki, T. (1993). A Globally Convergent Newton Method for Solving Strongly Monotone Variational Inequalities. Math. Programming 58 369-383.
Wilson, R. B. (1963). A Simplicial Method for Concave Programming. Dissertation, Graduate School of Business, Harvard University.
Wu, J. H., Florian, M. and Marcotte, P. (1990). A General Descent Framework for the Monotone Variational Inequality Problem. Publication CRT-723, Centre de Recherche sur les Transports, Université de Montréal, Canada.

D. Ralph: Mathematics Department, University of Melbourne, Parkville, Victoria 3052, Australia; e-mail: [email protected]