NONLINEAR OPTIMIZATION BY A CURVILINEAR PATH STRATEGY

L. GRANDINETTI
Dipartimento di Sistemi
Università della Calabria
87036 Arcavacata (Cosenza), Italy

1. INTRODUCTION

Finding a local solution of the unconstrained minimization problem

$$\min f(x), \quad x \in R^n$$

is not a straightforward calculation in many difficult situations. These situations, associated for instance with the solution of nonlinear programs via penalty methods, may be pictorially described as those in which the objective function behaves like a long curved valley with very steep walls.

Particularly in these situations it may be suitable to find a curved trajectory in $R^n$ which passes through the minimizer and then to follow this trajectory by curvilinear searches, with the aim of reaching the solution in a few long steps. This strategy is intended to overcome the typical behaviour of classical descent methods which, in these situations, necessarily perform many short steps along linear one-dimensional manifolds. In fact, most of them are based on the iterative model

$$x_{k+1} = x_k + \alpha_k d_k \qquad (1)$$

where $k$ denotes the iteration index, $x_k$ is the current iterate, $d_k \in R^n$ defines the direction of a linear trajectory, and $\alpha_k \in R_+$ is a steplength along $d_k$ suitably chosen in such a way that $f(x_{k+1}) < f(x_k)$.

In this paper an original way to construct a trajectory is pointed out; it is based on the concept of parallel tangent hyperplanes and possesses some interesting theoretical properties. In principle this trajectory may be derived as the solution of a system of differential equations; however, it is shown how a suitable approximation of it can be obtained without any explicit solution of the differential equations.
Lastly, a prototype implementation of this curvilinear path strategy is devised; the relevant numerical experiments, although limited, seem to indicate that the method may be capable of very high efficiency, at least for certain classes of objective functions.

2. GENERALLY DESIRABLE TRAJECTORY FEATURES

A trajectory $x(\alpha)$, i.e. a nonlinear one-dimensional manifold parametrized by the scalar $\alpha \in R_+$, may be associated in many ways with $f(x)$ for the purpose of its minimization; therefore it is sensible to devise a framework of suitable general properties to be possessed by trajectories of practical interest. Here the following properties are taken into consideration, having assumed $f: R^n \to R$ globally differentiable.

(i) Regularity

A basic natural requirement for the trajectory is that $x(\alpha): R_+ \to R^n$ be a one-to-one continuous mapping. This excludes the possibility of discontinuities, bifurcation points and loops; the benefits of avoiding such situations can be remarkable whenever a numerical treatment of the trajectory has to be done, and this is in fact the case for trajectories of practical interest. The additional, stronger requirement that $x(\alpha)$ be globally differentiable can also be considered generally desirable; this means, in other words, that for each value of $\alpha$ the tangent linear variety to the trajectory at $x(\alpha)$ is defined.
(ii) Suitability

Straightforward requirements for a trajectory $x(\alpha)$, $\alpha \in R_+$, to be suitable for the purpose of local minimization of $f(x)$ can be considered the following:

(a) the convergence to the local minimizer $x^*$ from an initial estimate $x(0) = x_0$;
(b) the possibility to determine $\alpha^*$ (a finite or a limit value) such that $x(\alpha^*)$ be the local optimizer.

Let us consider, for example, the following objective function:

$$f(x) = \tfrac{1}{2} x^T A x + b^T x, \qquad A \in R^{n \times n},\ A > 0,\ b \in R^n \qquad (2)$$

and associate to it the trajectory implicitly defined by the system of ordinary differential equations (ODE):

$$x'(\alpha) = -A^{-1} g_0 \quad \text{with} \quad x(0) = x_0 \ \text{and} \ g_0 \equiv \nabla f(x_0).$$

It is immediate to see that the trajectory is regular and suitable; i.e. it passes through the minimizer and, in addition, for $\alpha = 1$ we get:

$$x(1) = x_0 - A^{-1} g_0 = x^*.$$

It is worthwhile to observe that $f(x)$ is always decreasing on $x(\alpha)$, for $\alpha \in [0, \alpha^*]$.

Remark 2.1. The attribute (b) is an ideal one; generally $x(\alpha)$ is not a known function of $\alpha$, nor is it possible to determine $x(\alpha^*)$ analytically as in the previous simple example. Therefore the weaker requirement (more realistic than that stated in (b)) that $f(x)$ decreases as $\alpha$ increases to $\alpha^*$ can be more conveniently assumed. This descent property of the trajectory may prove particularly useful in its practical numerical processing.

(iii) Characterizability

It is crucial that $x(\alpha)$ be characterized in terms of properties of $f(x)$. A useful way of defining the trajectory may be that of exploiting differential properties of $f(x)$ (e.g. $x(\alpha)$ expressed as the solution of a system of ODE). A particular characterization of this type is discussed in the sequel of this paper.
(iv) Linear invariance

It is sensible that the trajectory described in the domain of the variable $x$ be invariant under a linear transformation of variables defined by:

$$x = J\tilde{x}, \quad \tilde{x} \in R^n; \qquad J \in R^{n \times n},\ \det J \neq 0.$$

This precisely means that trajectories constructed in terms of $x$ and $\tilde{x}$ with the same procedure are related in such a way that:

$$x(\alpha) = J\,\tilde{x}(\alpha), \quad \forall \alpha.$$

3. A TRAJECTORY DERIVED VIA PARALLEL TANGENT HYPERPLANES

A way to construct a trajectory based on the concept of parallel tangent hyperplanes is pointed out. Several desirable properties and features are possessed by this trajectory for classes of functions of practical interest.

The basic idea on which the method is founded can be usefully introduced by means of a geometric sketch in $R^2$, illustrated in Fig. 1. Let us consider, in correspondence to a starting point $x_0$ where $g_0 \neq 0$, the tangent plane to the level curve defined by $f(x) = f(x_0)$. Furthermore consider the parallel planes

$$g_0^T (x - x_0) = -\alpha$$

parametrized by the scalar $\alpha \geq 0$. Finally consider those points where the planes defined before are tangent to the level curves. The locus of such points (parametrized by the nonnegative scalar $\alpha$) defines the trajectory $x(\alpha)$.

If, for example, we consider again the strictly convex function given by (2), then it is easy to recognize that the trajectory associated to it via parallel tangent planes is:

(i) regular (precisely, it is linear);
(ii) characterizable;
(iii) suitable: $\alpha^*$ such that $x(\alpha^*) = x^*$ is easily determined;
(iv) linearly invariant.

As an immediate consequence, the trajectory derived on the basis of the previous concept possesses excellent behaviour on convex quadratic functions (precisely the same as the Newton method) and thus, typically, near the solution of a general smooth function.
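As a numerical illustration of the quadratic case, the following Python sketch builds the parallel tangent trajectory for a small strictly convex quadratic of the form (2). The matrix `A`, the vector `b`, the starting point `x0` and all function names are illustrative assumptions, not data from the paper. For (2), tangency of the plane to a level curve requires the gradient $Ax + b$ to be a multiple $\lambda g_0$ of the initial gradient; combined with $g_0^T(x - x_0) = -\alpha$ this gives the straight line $x(\alpha) = x_0 - \alpha A^{-1} g_0 / (g_0^T A^{-1} g_0)$, which reaches the minimizer at $\alpha^* = g_0^T A^{-1} g_0$.

```python
# Parallel tangent trajectory for f(x) = 0.5 x^T A x + b^T x in two variables.
# A, b and x0 are illustrative values chosen for this sketch only.

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def matvec(M, v):
    return [dot(row, v) for row in M]

def solve2(M, r):
    """Solve the 2x2 linear system M z = r by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(M[1][1] * r[0] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]

A = [[3.0, 1.0], [1.0, 2.0]]   # symmetric positive definite
b = [-1.0, 1.0]
x0 = [2.0, -1.0]

def grad(x):
    """Gradient A x + b of the quadratic."""
    return [gi + bi for gi, bi in zip(matvec(A, x), b)]

g0 = grad(x0)
u = solve2(A, g0)                       # u = A^{-1} g0
alpha_star = dot(g0, u)                 # plane offset at which the minimizer is met
x_star = solve2(A, [-bi for bi in b])   # minimizer, solves A x = -b

def trajectory(alpha):
    """Point of tangency on the plane g0^T (x - x0) = -alpha."""
    return [x0[i] - alpha * u[i] / alpha_star for i in range(2)]

def multiplier(alpha):
    """Scalar lambda(alpha) with grad(x(alpha)) = lambda(alpha) * g0."""
    return 1.0 - alpha / alpha_star

for frac in (0.0, 0.25, 0.5, 1.0):
    alpha = frac * alpha_star
    print(alpha, trajectory(alpha), multiplier(alpha))
```

The loop prints points along the line together with the multiplier, which falls linearly from 1 at $x_0$ to 0 at the minimizer; the direction of the line is the Newton direction $-A^{-1} g_0$, in agreement with the remark on the Newton method.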
However, it is worthwhile to observe that, in addition, a useful behaviour is possessed for nonconvex quadratic functions (e.g. concave, saddle), differently from the simple Newton method. In fact, in these cases a linearly invariant descent trajectory is still provided.

Since all desirable properties are generally guaranteed at least near the solution of a smooth function, it seems natural to extend this procedure to general cases.

[Fig. 1. Parallel tangent planes and related trajectory. Labels: level curves of $f(x)$; trajectory $x(\alpha)$; parallel tangent planes $g_0^T(x - x_0) = -\alpha$.]

The generalization of the method to the n-dimensional case and its proper formulation lead to formalizing the definition of the trajectory as the locus of points satisfying the following canonical conditions:

$$g_0^T (x - x_0) = -\alpha \qquad (3a)$$
$$g(x(\alpha)) = \lambda(\alpha)\, g_0 \qquad (3b)$$

where $\alpha \geq 0$; $x(0) = x_0$; $g_0 \equiv g(x_0) \neq 0$; and $\lambda(\alpha)$ is a scalar function such that $\lambda(0) = 1$ and $\lambda(\alpha) = 0$ whenever $g(x(\alpha)) = 0$ (i.e. at any stationary point on the trajectory). It is worth noting that $\lambda(\alpha)$ can be interpreted as the Lagrange multiplier associated with the subproblem of minimizing $f(x)$ constrained by the generic tangent hyperplane.

In addition to (3), the continuity of $x(\alpha)$ is assumed as a canonical side-condition; in fact it guarantees (at least for broad classes of problems) local uniqueness of the trajectory.

Finally, algebraic manipulations lead to the following characterization:

$$x'(\alpha) = -\frac{u(\alpha)}{g_0^T u(\alpha)} \qquad (4a)$$
$$x(0) = x_0 \qquad (4b)$$

Here the n-dimensional vector $u(\alpha)$ satisfies:

$$G u = g_0$$

where $G \equiv \nabla^2 f(x(\alpha))$. In principle the characterization is not always possible (i.e. not for every value of $\alpha$) and may seem quite restrictive.
However, in practice, when the trajectory is treated by some suitable numerical technique, there are no severe consequences, at least for broad classes of functions (provided that the trajectory exists and is well defined).

The properties of the trajectory formally defined here generally agree, at least for some large classes of functions, with the desirable features described in Section 2. In fact $x(\alpha)$ is regular, if $f(x)$ is sufficiently well-behaved. In addition, it is linearly invariant, since the following result holds.

Fact 3.1. Let us assume $x = J\tilde{x}$, $\tilde{x} \in R^n$, $J \in R^{n \times n}$, $\det J \neq 0$. Then, on the basis of the defining conditions (3), the trajectories in the domains of $x$ and $\tilde{x}$ are related by:

$$x(\alpha) = J\,\tilde{x}(\alpha),$$

showing linear invariance. □

Moreover a descent property relies on the following result.

Fact 3.2. If $x(\alpha)$ is globally differentiable, then it is a straightforward consequence of (3) that

$$g(x(\alpha))^T x'(\alpha) = -\lambda(\alpha).$$ □

It is worth noting that, since $\lambda(0) = 1$, initially the trajectory's slope is negative; then, whenever $\lambda(\alpha) > 0$, the trajectory is a descent one. This happens, at least, on the arc between $\alpha = 0$ and $\alpha^*$ (for which $\lambda(\alpha^*) = 0$), where a stationary point is encountered.

Lastly, the trajectory is generally characterizable. Except in "pathological" situations, this happens almost everywhere and, if necessary, the trajectory can be described by continuation; as a typical example, the following objective function can be considered:

$$f(x) = 1 - \exp(-x^T x).$$

It is worth noting that most of the above mentioned properties can be precisely established for some important classes of functions.
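The characterization (4) lends itself to a direct numerical check. The sketch below integrates (4a) with a classical fourth-order Runge-Kutta scheme and then inspects the canonical conditions (3a) and (3b) along the computed path. The test function $f(x) = x_1^4 + x_1^2 + x_1 x_2 + x_2^2$ is an illustrative assumption (it is strictly convex with minimizer at the origin, and is not one of the paper's test problems), as are the starting point, the number of steps and all function names. Since the minimizer is the origin, condition (3a) gives $\alpha^* = g_0^T x_0$ for this example.

```python
# Numerical integration of x'(alpha) = -u / (g0^T u), G u = g0, by classical RK4.
# The objective f(x) = x1^4 + x1^2 + x1*x2 + x2^2 is an assumed example.

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def solve2(M, r):
    """Solve the 2x2 linear system M z = r by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(M[1][1] * r[0] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]

def grad(x):
    x1, x2 = x
    return [4.0 * x1 ** 3 + 2.0 * x1 + x2, x1 + 2.0 * x2]

def hess(x):
    x1, _ = x
    return [[12.0 * x1 * x1 + 2.0, 1.0], [1.0, 2.0]]

def velocity(x, g0):
    """Right-hand side of (4a) at the point x."""
    u = solve2(hess(x), g0)
    scale = dot(g0, u)
    return [-ui / scale for ui in u]

def integrate(x0, alpha_end, steps):
    """Return a list of (alpha, x) pairs from alpha = 0 to alpha_end."""
    g0 = grad(x0)
    h = alpha_end / steps
    x = list(x0)
    history = [(0.0, list(x))]
    for k in range(steps):
        k1 = velocity(x, g0)
        k2 = velocity([x[i] + 0.5 * h * k1[i] for i in range(2)], g0)
        k3 = velocity([x[i] + 0.5 * h * k2[i] for i in range(2)], g0)
        k4 = velocity([x[i] + h * k3[i] for i in range(2)], g0)
        x = [x[i] + h * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]) / 6.0
             for i in range(2)]
        history.append(((k + 1) * h, list(x)))
    return history

x0 = [1.5, 1.0]
g0 = grad(x0)
alpha_star = dot(g0, x0)        # from (3a) with the minimizer at the origin
history = integrate(x0, alpha_star, 3000)

def multiplier(x):
    """lambda such that grad(x) = lambda * g0, by projection onto g0."""
    return dot(grad(x), g0) / dot(g0, g0)

for alpha, x in history[::600]:
    plane_residual = dot(g0, [x[0] - x0[0], x[1] - x0[1]]) + alpha
    print(alpha, x, multiplier(x), plane_residual)
```

Along the printed samples the plane residual of (3a) stays at rounding level, and the multiplier decreases from 1 towards 0 as the path approaches the minimizer. A fixed-step integrator is used only for simplicity; any ODE solver could take its place.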
For strictly convex and globally differentiable functions, it can be proved, in particular, that $\lambda(\alpha)$ is monotonically decreasing; in fact, the following holds.

Lemma 3.1. Given $f(x)$ strictly convex and globally differentiable, then for any $(\alpha_1, \alpha_2)$ with $\alpha_1 < \alpha_2$ we have $\lambda(\alpha_2) < \lambda(\alpha_1)$. □

With essentially the same assumptions on $f(x)$, the stronger results that the trajectory is regular, suitable, linearly invariant and characterizable can be proved.

For the class of pseudoconvex functions in the sense of Mangasarian [1], which are interesting for their applications in several contexts, the trajectory still possesses the desirable properties, although in a weaker sense ($\lambda(\alpha)$ non-monotonically decreasing; characterizability almost everywhere).

On the basis of the many sound results, it seems possible to draw the conclusion that the method generally produces well-behaved trajectories, is a descent method and essentially "becomes" the Newton method near the solution of smooth functions, thus showing good potential for effective numerical implementations.

4. QUADRATIC MODEL OF THE TRAJECTORY

It is not realistic, in general situations, to derive the trajectory $x(\alpha)$ as the solution of a system of nonlinear ODE of the type (4). To overcome the problem of explicitly solving differential equations, the following quadratic model of $x(\alpha)$ is taken into consideration here:

$$x(\alpha) = x(0) + \alpha s + \tfrac{1}{2} \alpha^2 t \qquad (5)$$

where

$$s \equiv x'(0); \qquad t \equiv x''(0).$$

If $f(x)$ is such that the trajectory is locally characterizable at $x_0$, then $s$ can be explicitly obtained; indeed, in this case, the following is an obvious consequence of (4).

Fact 4.1.
At the initial point $x_0$ the trajectory's first derivative is given by:

$$s = -\frac{G_0^{-1} g_0}{g_0^T G_0^{-1} g_0} \qquad (6)$$ □

If, in addition, $f(x)$ possesses continuous third derivatives, then the following important result holds.

Proposition 4.1. If $f(x) \in C^3$ and, in addition, the trajectory is locally characterizable at $x_0$, then the vector $t$ can be derived by solving:

$$G_0 t = -G_0^s s \qquad (7)$$

where $G_0^s$ is a tensor defined by:

$$G_0^s = [G_1 s, G_2 s, \ldots, G_n s]^T$$

with $G_i \equiv \nabla^2 g_i(x_0)$; $g_i \equiv \partial f / \partial x_i$, $i = 1, \ldots, n$.

Proof. Let us assume the following second order approximation of the gradient $g$ along the trajectory:

$$g(x(\alpha)) = g(x(0)) + G_0 [x(\alpha) - x(0)] + \tfrac{1}{2} G_0^x [x(\alpha) - x(0)]$$

where

$$x \equiv [x(\alpha) - x(0)] = \alpha s + \tfrac{1}{2} \alpha^2 t, \qquad G_0^x \equiv [G_1 x, \ldots, G_n x]^T.$$

It follows that

$$g(x(\alpha)) = g(x(0)) + \alpha G_0 s + \tfrac{1}{2} \alpha^2 G_0 t + \tfrac{1}{2} \left( \alpha G_0^s + \tfrac{1}{2} \alpha^2 G_0^t \right) \left( \alpha s + \tfrac{1}{2} \alpha^2 t \right)$$

and, by neglecting terms of degree higher than two in $\alpha$, the following gradient approximation is obtained:

$$g(x(\alpha)) = g(x(0)) + \alpha G_0 s + \tfrac{1}{2} \alpha^2 G_0 t + \tfrac{1}{2} \alpha^2 G_0^s s. \qquad (8)$$

But, on the other hand, from conditions (3) the following approximation for $\lambda(\alpha)$ can be easily derived:

$$\lambda(\alpha) = 1 - \frac{\alpha}{g_0^T G_0^{-1} g_0}$$

which, utilized in the canonical condition

$$g(x(\alpha)) = \lambda(\alpha)\, g_0,$$

permits to obtain:

$$g(x(\alpha)) = g_0 - \frac{\alpha}{g_0^T G_0^{-1} g_0}\, g_0. \qquad (9)$$

By equating the expressions given by (8) and (9), and noting that the terms of first degree in $\alpha$ already agree because of (6), we finally get:

$$G_0 t + G_0^s s = 0.$$ □

It derives from the above results that, under relatively mild assumptions on $f(x)$, the quadratic model of the trajectory is completely specified.

5. COMPUTATIONAL ASPECTS

The calculation of third derivatives, needed in general situations to obtain the quadratic model of the trajectory, seems to be a serious computational complication. In principle these calculations could be avoided by approximating third derivative matrices in a suitable sense (e.g. by quasi-Newton techniques).
However, explicit computation of third derivatives is a challenge worth taking up, at least for problems of moderate size, especially if the expected benefits are noticeable. In this respect, modern advanced computational techniques, like parallel processing and symbolic manipulation, may prove very helpful.

Apparently a second source of difficulties may be the occurrence of singularities of the Hessian matrix of $f(x)$; in practice this is not a serious one, at least for sufficiently regular objective functions, provided that an appropriate initialization of the trajectory is done and a suitable curvilinear search along the quadratic model is performed.

When the approximating quadratic trajectory is used in an iterative scheme, it becomes crucial to devise effective steplength algorithms for the computation of a suitable value of $\alpha$ at any iteration. The present quadratic model can benefit in many situations from the curvilinear path steplength algorithms of Goldfarb [2]; these generalize the Armijo and Goldstein conditions to the case

$$x(\alpha) = x(0) + \alpha s + \alpha^2 t$$

providing, under mild assumptions, convergence to a stationary point.

Additional aids to a practical numerical implementation of the curvilinear search are:

(i) a third order approximation of $f(x)$, restricted to the quadratic trajectory, can be computed without any extra computational effort;
(ii) a good initial guess of the steplength can be provided, at least near the solution, by the value of $\alpha^*$ such that $\lambda(\alpha^*) = 0$.

6. PROTOTYPE IMPLEMENTATION

An implementation has been devised, following the previous ideas, and tested on functions which seem to possess features suitable for a meaningful validation of the method. A very effective behaviour has been achieved, by using an accurate curvilinear search, for the following test functions:

Test 1. Rosenbrock standard

$$f(x) = 100 (x_1^2 - x_2)^2 + (1 - x_1)^2,$$

starting point $x_0 = (-1.2, 1.0)^T$.

Test 2. Powell singular

$$f(x) = (x_1 + 10 x_2)^2 + 5 (x_3 - x_4)^2 + (x_2 - 2 x_3)^4 + 10 (x_1 - x_4)^4,$$

starting point $x_0 = (3, -1, 0, 1)^T$.

The numerical results for the Rosenbrock function, which can be considered a good model of a long curved valley with steep walls, are reported in Table 1.

Table 1. Results for the test function Rosenbrock standard

ITERATION            x1       x2       f(x)
Initial conditions   -1.20    1.00     24.2
1st                  0.568    0.322    1.8x10^-1
3rd                  0.996    0.991    9.5x10^-5
5th                  1.000    1.000    1.4x10^-11

These can be fully appreciated if compared with analogous results of very well established classical algorithms. A good quasi-Newton algorithm takes 28 iterations for decreasing $f(x)$ to the value of $10^{-5}$; similar results are provided by a good implementation of the newest collinear scaling algorithm [3]. Moreover, a trust region algorithm of the Hebden-Moré type [4] takes 21 iterations to minimize $f(x)$, and a good implementation of the modified Newton method (of the Gill and Murray type [4]) takes 24 iterations.

Computational results relative to the Powell function, whose Hessian is singular at the minimizer, are reported in Table 2. The numerical performance of the curvilinear path strategy can still be considered very good; in fact a good quasi-Newton algorithm takes 15 iterations for decreasing $f(x)$ to the value of $10^{-5}$ and 40 iterations to minimize it.

Table 2. Results for the test function Powell singular

ITERATION            x1          x2           x3          x4          f(x)
Initial conditions   3.0         -1.0         0.0         1.0         2.1x10^2
3rd                  1.6x10^-1   -7.5x10^-3   3.2x10^-2   1.9x10^-2   1.3x10^-2
5th                  3.9x10^-2   -4.3x10^-3   6.1x10^-3   6.6x10^-3   2.6x10^-5
9th                  1.1x10^-3   -1.1x10^-4   1.8x10^-4   1.8x10^-4   3.1x10^-11

7. CONCLUDING REMARKS

The first computational experience with the new curvilinear path strategy presented here seems encouraging; the testing, although very limited, has been carried out on functions which can be considered representative of those for which this method has been designed and may prove particularly appropriate (e.g., very ill-conditioned, like the Powell function near the solution).

At present, the effort in computing third derivatives can be a limiting factor, at least for large size problems. However, it is worth noting that, among various possibilities to avoid these calculations, the following tensor approximation has been taken into consideration:

$$\alpha G_0^s = G(x_0 + \alpha s) - G_0.$$

Computations carried out with this approximation (which, therefore, produces a second derivative method) provided numerical results very close to those reported in Tables 1 and 2.

Although the results presented here are essentially based on an abstract algorithm and further research is needed on it, nevertheless it seems possible to draw the conclusion that the underlying theory is sound, indicating the potential of the method for achieving high performance.

REFERENCES

[1] MANGASARIAN, O.L., Nonlinear Programming, McGraw-Hill, New York, 1969.
[2] GOLDFARB, D., Curvilinear path steplength algorithms for minimization which use directions of negative curvature, Mathematical Programming, Vol. 18, pp. 31-40, 1980.
[3] AL-DHAHIR, J., A descent method for unconstrained optimization based on a conic model, University of Dundee, MS Thesis, 1982.
[4] FLETCHER, R., Practical Methods of Optimization, Vol. 1, Wiley, Chichester, 1980.