Nonlinear optimization by a curvilinear path strategy

L. GRANDINETTI
Dipartimento di Sistemi
Università della Calabria
87036 Arcavacata (Cosenza), Italy
1. INTRODUCTION
Finding a local solution of the unconstrained minimization problem
$$\min f(x), \quad x \in \mathbb{R}^n$$
is not a straightforward calculation in many difficult situations. These situations, associated for instance with the solution of nonlinear programs via penalty methods, may be pictorially described as those in which the objective function behaves like a long curved valley with very steep walls.
Particularly in these situations it may be suitable to find a curved trajectory in $\mathbb{R}^n$ that passes through the minimizer and then to follow this trajectory by curvilinear searches, with the aim of reaching the solution in a few long steps. This strategy is intended to overcome the typical behaviour of classical descent methods which, in these situations, necessarily perform many short steps along linear one-dimensional manifolds. In fact, most of them are based on the iterative model
$$x_{k+1} = x_k + \alpha_k d_k \qquad (1)$$
where $k$ denotes the iteration index, $x_k$ is the current iterate, $d_k \in \mathbb{R}^n$ defines the direction of a linear trajectory, and $\alpha_k \in \mathbb{R}_+$ is a steplength along $d_k$, suitably chosen in such a way that $f(x_{k+1}) < f(x_k)$.
In this paper an original way to construct a trajectory is pointed out; it is based on the concept of parallel tangent hyperplanes and possesses some interesting theoretical properties. In principle this trajectory may be derived as the solution of a system of differential equations; however, it is shown how a suitable approximation of it can be obtained without any explicit solution of the differential equations.
Lastly, a prototype implementation of this curvilinear path strategy is devised; the relevant numerical experiments, although limited, seem to indicate that the method may be capable of very high efficiency, at least for certain classes of objective functions.
2. GENERALLY DESIRABLE TRAJECTORY FEATURES
A trajectory $x(\alpha)$, i.e. a nonlinear one-dimensional manifold parametrized by the scalar $\alpha \in \mathbb{R}_+$, may be associated with $f(x)$ in many ways for the purpose of its minimization; it is therefore sensible to devise a framework of general properties that trajectories of practical interest should possess.
Here the following properties are taken into consideration, assuming $f: \mathbb{R}^n \to \mathbb{R}$ globally differentiable.
(i) Regularity
A basic natural requirement for the trajectory is that
$$x(\alpha): \mathbb{R}_+ \to \mathbb{R}^n$$
be a one-to-one continuous mapping.
This excludes the possibility of discontinuities, bifurcation points and loops; avoiding such situations can be remarkably beneficial whenever the trajectory has to be treated numerically, as is the case for trajectories of practical interest.
The additional, stronger requirement that $x(\alpha)$ be globally differentiable can also be considered generally desirable; in other words, the tangent linear variety to the trajectory at $x(\alpha)$ is then defined for each value of $\alpha$.
(ii) Suitability
Straightforward requirements for a trajectory $x(\alpha)$, $\alpha \in \mathbb{R}_+$, to be suitable for the local minimization of $f(x)$ can be considered the following:
(a) convergence to the local minimizer $x^*$ from an initial estimate $x(0) = x_0$;
(b) the possibility of determining $\alpha^*$ (a finite or a limit value) such that $x(\alpha^*)$ is the local optimizer.
Let us consider, for example, the following objective function:
$$f(x) = \tfrac{1}{2} x^T A x + b^T x, \qquad A \in \mathbb{R}^{n \times n},\ A > 0, \quad b \in \mathbb{R}^n \qquad (2)$$
and associate with it the trajectory implicitly defined by the system of ordinary differential equations (ODE):
$$x'(\alpha) = -A^{-1} g_0, \qquad x(0) = x_0, \qquad g_0 \equiv \nabla f(x_0).$$
It is immediate to see that the trajectory is regular and suitable, i.e. it passes through the minimizer; in addition, for $\alpha = 1$ we get:
$$x(1) = x_0 - A^{-1} g_0 = x^*.$$
It is worth observing that $f(x)$ is always decreasing on $x(\alpha)$ for $\alpha \in [0, \alpha^*]$, with $\alpha^* = 1$.
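To make the example concrete, the following Python sketch (NumPy assumed; the values of $A$, $b$ and $x_0$ are hypothetical) evaluates the closed-form solution $x(\alpha) = x_0 - \alpha A^{-1} g_0$ of the ODE above, checks that $x(1)$ coincides with the minimizer $x^* = -A^{-1} b$, and checks that $f$ decreases along a grid on $[0, 1]$.

```python
import numpy as np

# Hypothetical data for the strictly convex quadratic (2).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([-1.0, 2.0])
x0 = np.array([2.0, -1.0])

def f(x):
    return 0.5 * x @ A @ x + b @ x

g0 = A @ x0 + b                     # g0 = grad f(x0)
newton_dir = np.linalg.solve(A, g0) # A^{-1} g0

def traj(alpha):
    # Closed-form solution of x'(alpha) = -A^{-1} g0 with x(0) = x0.
    return x0 - alpha * newton_dir

x_star = np.linalg.solve(A, -b)     # minimizer: A x* + b = 0
alphas = np.linspace(0.0, 1.0, 11)
values = [f(traj(a)) for a in alphas]

print(np.allclose(traj(1.0), x_star))                       # x(1) = x*
print(all(v1 > v2 for v1, v2 in zip(values, values[1:])))   # f decreases on [0, 1]
```

Along this ray $f(x(\alpha)) = f(x_0) - (\alpha - \alpha^2/2)\, g_0^T A^{-1} g_0$, which is strictly decreasing on $[0, 1]$ whenever $g_0 \neq 0$.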
Remark 2.1
Attribute (b) is an ideal one; in general, $x(\alpha)$ is not a known function of $\alpha$, nor is it possible to determine $x(\alpha^*)$ analytically as in the previous simple example.
Therefore it is more convenient to assume the weaker (and more realistic) requirement that $f(x)$ decreases as $\alpha$ increases to $\alpha^*$.
This descent property of the trajectory may prove particularly useful in its practical numerical processing.
(iii) Characterizability
It is crucial that $x(\alpha)$ be characterized in terms of properties of $f(x)$.
A useful way of defining the trajectory may be to exploit differential properties of $f(x)$ (e.g. $x(\alpha)$ expressed as the solution of a system of ODE). A particular characterization of this type is discussed in the rest of this paper.
(iv) Linear invariance
It is sensible that the trajectory described in the domain of the variable $x$ be invariant under a linear transformation of variables defined by:
$$x = J\tilde{x}, \qquad \tilde{x} \in \mathbb{R}^n; \quad J \in \mathbb{R}^{n \times n}, \quad \det J \neq 0.$$
Precisely, this means that trajectories constructed with the same procedure in terms of $x$ and $\tilde{x}$ are related in such a way that:
$$x(\alpha) = J\tilde{x}(\alpha), \qquad \forall \alpha.$$
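As an illustration of property (iv), the Python sketch below (NumPy assumed; $A$, $b$, $x_0$ and $J$ are hypothetical values) builds the trajectory of the quadratic example, $x(\alpha) = x_0 - \alpha A^{-1} g_0$, once in $x$ and once in $\tilde{x}$ coordinates, where $\tilde f(\tilde x) = f(J\tilde x)$ has Hessian $J^T A J$ and linear term $J^T b$. For contrast it repeats the check for the steepest-descent ray $x(\alpha) = x_0 - \alpha g_0$, which is not linearly invariant.

```python
import numpy as np

# Hypothetical quadratic data for (2) and a nonsingular change of variables x = J x~.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -3.0])
x0 = np.array([0.5, 1.5])
J = np.array([[1.0, 2.0], [0.0, 3.0]])

# Same function in x~ coordinates: f~(x~) = f(J x~) = 0.5 x~^T (J^T A J) x~ + (J^T b)^T x~.
A_t, b_t = J.T @ A @ J, J.T @ b
x0_t = np.linalg.solve(J, x0)

def newton_traj(A, b, x0, alpha):
    # Trajectory of the example: x'(alpha) = -A^{-1} g0, x(0) = x0.
    g0 = A @ x0 + b
    return x0 - alpha * np.linalg.solve(A, g0)

def gradient_traj(A, b, x0, alpha):
    # For contrast: x'(alpha) = -g0 (steepest-descent ray).
    return x0 - alpha * (A @ x0 + b)

alphas = (0.0, 0.4, 1.0, 2.0)
newton_ok = [np.allclose(newton_traj(A, b, x0, a), J @ newton_traj(A_t, b_t, x0_t, a))
             for a in alphas]
gradient_ok = [np.allclose(gradient_traj(A, b, x0, a), J @ gradient_traj(A_t, b_t, x0_t, a))
               for a in alphas]
print(newton_ok)     # [True, True, True, True]
print(gradient_ok)   # [True, False, False, False]
```

The first check succeeds because $(J^T A J)^{-1} J^T g_0 = J^{-1} A^{-1} g_0$; the second fails for $\alpha > 0$ unless $J J^T g_0 = g_0$.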
3. A TRAJECTORY DERIVED VIA PARALLEL TANGENT HYPERPLANES
A way to construct a trajectory based on the concept of parallel tangent hyperplanes is pointed out. This trajectory possesses several desirable properties and features for classes of functions of practical interest.
The basic idea on which the method is founded can be usefully introduced by means of a geometric sketch in $\mathbb{R}^2$, illustrated in Fig. 1.
Let us consider, at a starting point $x_0$ where $g_0 \neq 0$, the tangent plane to the level curve defined by $f(x) = f(x_0)$.
Furthermore, consider the parallel planes
$$g_0^T (x - x_0) = -\alpha$$
parametrized by the scalar $\alpha \geq 0$.
Finally, consider those points where the planes defined above are tangent to the level curves.
The locus of such points (parametrized by the nonnegative scalar $\alpha$) defines the trajectory $x(\alpha)$.
If, for example, we consider again the strictly convex function given by (2), then it is easy to recognize that the trajectory associated with it via parallel tangent planes is:
(i) regular (precisely, it is linear);
(ii) characterizable;
(iii) suitable: $\alpha^*$ such that $x(\alpha^*) = x^*$ is easily determined;
(iv) linearly invariant.
As an immediate consequence, the trajectory derived from this concept behaves excellently on convex quadratic functions (precisely like the Newton method) and thus, typically, near the solution of a general smooth function.
However, it is worth observing that, in addition, it behaves usefully on nonconvex quadratic functions (e.g. concave, saddle), unlike the simple Newton method. In fact, in these cases a linearly invariant descent trajectory is still provided.
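These statements can be checked on a small example. The Python sketch below (NumPy assumed; matrices and starting point are hypothetical) computes, for $f(x) = \tfrac{1}{2}x^TAx + b^Tx$, the point on each plane $g_0^T(x - x_0) = -\alpha$ at which the plane is tangent to a level set, i.e. where $g(x) = \lambda g_0$ for some scalar $\lambda$. This amounts to the bordered linear system $A(x - x_0) - \lambda g_0 = -g_0$, $g_0^T(x - x_0) = -\alpha$, whose solution $\lambda$ is the Lagrange multiplier of the plane-constrained subproblem; the sketch assumes this system is nonsingular, i.e. $g_0^T A^{-1} g_0 \neq 0$. For a convex $A$ it checks linearity and that $x^*$ is reached at $\alpha^* = g_0^T A^{-1} g_0$; for the concave case $-A$ it checks that $f$ still decreases along the trajectory while the plain Newton point $x_0 - A^{-1} g_0$ increases $f$.

```python
import numpy as np

def tangency_point(A, b, x0, alpha):
    """Point on the plane g0^T (x - x0) = -alpha where g(x) is parallel to g0,
    for f(x) = 0.5 x^T A x + b^T x. Returns (x, lam) with g(x) = lam * g0.

    Unknowns d = x - x0 and lam satisfy the bordered linear system
        A d - lam * g0 = -g0,   g0^T d = -alpha,
    assumed nonsingular (g0^T A^{-1} g0 != 0).
    """
    n = len(x0)
    g0 = A @ x0 + b
    K = np.zeros((n + 1, n + 1))
    K[:n, :n] = A
    K[:n, n] = -g0
    K[n, :n] = g0
    sol = np.linalg.solve(K, np.concatenate([-g0, [-alpha]]))
    return x0 + sol[:n], sol[n]

def f(A, b, x):
    return 0.5 * x @ A @ x + b @ x

x0 = np.array([1.0, 2.0])
b = np.array([0.5, -1.0])

# Convex case (hypothetical A > 0).
A_cvx = np.array([[3.0, 1.0], [1.0, 2.0]])
g0 = A_cvx @ x0 + b
alpha_star = g0 @ np.linalg.solve(A_cvx, g0)        # g0^T A^{-1} g0
x1, _ = tangency_point(A_cvx, b, x0, 1.0)
x2, _ = tangency_point(A_cvx, b, x0, 2.0)
x_end, lam_end = tangency_point(A_cvx, b, x0, alpha_star)
is_linear = np.allclose(x2 - x1, x1 - x0)
hits_minimizer = np.allclose(x_end, np.linalg.solve(A_cvx, -b)) and abs(lam_end) < 1e-10

# Concave case (hypothetical A < 0): descent along the trajectory, ascent for Newton.
A_ccv = -A_cvx
g0c = A_ccv @ x0 + b
vals = [f(A_ccv, b, tangency_point(A_ccv, b, x0, a)[0]) for a in (0.0, 0.5, 1.0, 2.0)]
is_descent = all(v1 > v2 for v1, v2 in zip(vals, vals[1:]))
newton_uphill = f(A_ccv, b, x0 - np.linalg.solve(A_ccv, g0c)) > f(A_ccv, b, x0)
print(is_linear, hits_minimizer, is_descent, newton_uphill)   # True True True True
```

In both cases the solution is $x(\alpha) = x_0 - \alpha A^{-1} g_0 / (g_0^T A^{-1} g_0)$ with $\lambda(\alpha) = 1 - \alpha/(g_0^T A^{-1} g_0)$; when $A < 0$ the denominator is negative, so $\lambda(\alpha) \geq 1$ for all $\alpha \geq 0$ and the trajectory never stops descending.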
Since all desirable properties are generally guaranteed at least near the solution of a smooth function, it seems natural to extend this procedure to general cases.
Fig. 1. Parallel tangent planes and related trajectory (sketch in $\mathbb{R}^2$ showing the level curves of $f(x)$, the trajectory $x(\alpha)$ and the parallel tangent planes $g_0^T(x - x_0) = -\alpha$).
The generalization of the method to the n-dimensional case and its proper formulation lead to the definition of the trajectory as the locus of points satisfying the following canonical conditions:
$$g_0^T (x - x_0) = -\alpha \qquad (3a)$$
$$g(x(\alpha)) = \lambda(\alpha)\, g_0 \qquad (3b)$$
where $\alpha \geq 0$, $x(0) = x_0$, $g_0 \equiv g(x_0) \neq 0$, and $\lambda(\alpha)$ is a scalar function such that $\lambda(0) = 1$ and $\lambda(\alpha) = 0$ whenever $g(x(\alpha)) = 0$ (i.e. at any stationary point on the trajectory).
It is worth noting that $\lambda(\alpha)$ can be interpreted as the Lagrange multiplier associated with the subproblem of minimizing $f(x)$ subject to the generic tangent hyperplane.
In addition to (3), continuity of $x(\alpha)$ is assumed as a canonical side-condition; in fact, it guarantees (at least for broad classes of problems) local uniqueness of the trajectory.
Finally, algebraic manipulations lead to the following characterization:
$$x'(\alpha) = -\frac{u(\alpha)}{g_0^T u(\alpha)} \qquad (4a)$$
$$x(0) = x_0 \qquad (4b)$$
Here the n-dimensional vector $u(\alpha)$ satisfies
$$G u = g_0,$$
where $G \equiv \nabla^2 f(x(\alpha))$.
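One route for these manipulations, sketched here under the assumptions that $G$ is nonsingular and $g_0^T u(\alpha) \neq 0$ along the arc considered, is to differentiate (3b) and (3a) with respect to $\alpha$:

```latex
\begin{aligned}
\text{(3b)} &\;\Rightarrow\; G\,x'(\alpha) = \lambda'(\alpha)\,g_0
  \;\Rightarrow\; x'(\alpha) = \lambda'(\alpha)\,u(\alpha), \qquad G\,u(\alpha) = g_0,\\
\text{(3a)} &\;\Rightarrow\; g_0^T x'(\alpha) = -1
  \;\Rightarrow\; \lambda'(\alpha)\,g_0^T u(\alpha) = -1
  \;\Rightarrow\; \lambda'(\alpha) = -\frac{1}{g_0^T u(\alpha)},\\
&\;\Rightarrow\; x'(\alpha) = -\frac{u(\alpha)}{g_0^T u(\alpha)}.
\end{aligned}
```

The same computation gives $\lambda'(\alpha) = -1/(g_0^T G^{-1} g_0)$, which is negative wherever $G$ is positive definite.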
In principle this characterization is not always possible (i.e. for every value of $\alpha$) and may seem quite restrictive. However, in practice, when the trajectory is treated by some suitable numerical technique, there are no severe consequences, at least for broad classes of functions (provided that the trajectory exists and is well defined).
The properties of the trajectory formally defined here generally agree, at least for some large classes of functions, with the desirable features described in section 2.
In fact, $x(\alpha)$ is regular if $f(x)$ is sufficiently well-behaved. In addition, it is linearly invariant, since the following result holds.
Fact 3.1.
Let us assume
$$x = J\tilde{x}, \qquad \tilde{x} \in \mathbb{R}^n, \quad J \in \mathbb{R}^{n \times n}, \quad \det J \neq 0.$$
Then, on the basis of the defining conditions (3), the trajectories in the domains of $x$ and $\tilde{x}$ are related by
$$x(\alpha) = J\tilde{x}(\alpha),$$
showing linear invariance. □
Moreover, a descent property rests on the following result.
Fact 3.2.
If $x(\alpha)$ is globally differentiable, then it is a straightforward consequence of (3) that
$$g^T x'(\alpha) = -\lambda(\alpha),$$
since (3b) gives $g = \lambda(\alpha)\, g_0$ and differentiating (3a) gives $g_0^T x'(\alpha) = -1$. □
It is worth noting that, since $\lambda(0) = 1$, the slope $g^T x'(\alpha)$ of $f$ along the trajectory is initially negative; then, whenever $\lambda(\alpha) > 0$, the trajectory is a descent one. This happens at least on the arc between $\alpha = 0$ and $\bar{\alpha}$ (for which $\lambda(\bar{\alpha}) = 0$), where a stationary point is encountered.
Lastly, the trajectory is generally characterizable. Except in "pathological" situations, this happens almost everywhere and, if necessary, the trajectory can be described by continuation; as a typical example, the following objective function can be considered:
$$f(x) = 1 - \exp(-x^T x).$$
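One plausible reading of this example can be checked numerically. Because $g(x) = 2x\,e^{-x^Tx}$ is parallel to $x$, conditions (3) force $x(\alpha) = c(\alpha)\,x_0$ with $c(\alpha) = 1 - \alpha/(g_0^T x_0)$, so the trajectory is the segment from $x_0$ to the minimizer $x^* = 0$, reached at $\alpha^* = g_0^T x_0 = 2\|x_0\|^2 e^{-\|x_0\|^2}$. The Hessian $2e^{-x^Tx}(I - 2xx^T)$, however, is singular on the sphere $x^Tx = 1/2$, where $u$ in (4) is undefined. The Python sketch below (NumPy assumed; hypothetical $x_0$ with $\|x_0\|^2 > 1/2$) checks that the segment crosses that sphere, that $f$ still decreases along it, and that $\lambda(\alpha)$, read off (3b), rises above 1 before dropping to 0, consistent with $f$ being nonconvex near $x_0$.

```python
import numpy as np

def f(x):
    return 1.0 - np.exp(-x @ x)

def grad(x):
    return 2.0 * x * np.exp(-x @ x)

def hess(x):
    return 2.0 * np.exp(-x @ x) * (np.eye(len(x)) - 2.0 * np.outer(x, x))

x0 = np.array([1.0, 0.5])       # hypothetical start, |x0|^2 = 1.25 > 1/2
g0 = grad(x0)
alpha_star = g0 @ x0            # the segment reaches x* = 0 at this alpha

def traj(alpha):
    # (3b) forces x = c * x0 because g(x) is parallel to x; (3a) then fixes c.
    return (1.0 - alpha / (g0 @ x0)) * x0

alphas = np.linspace(0.0, alpha_star, 201)
pts = [traj(a) for a in alphas]
lams = [grad(p) @ g0 / (g0 @ g0) for p in pts]          # lambda(alpha) from (3b)
fvals = [f(p) for p in pts]
min_eig = [np.linalg.eigvalsh(hess(p)).min() for p in pts]

print(min_eig[0] < 0 < min_eig[-1])                     # Hessian singular somewhere on the path
print(max(lams) > 1.0)                                  # lambda is not monotonic here
print(all(a > b for a, b in zip(fvals, fvals[1:])))     # f still decreases along x(alpha)
```

At the singular point $\lambda'(\alpha) = -1/(g_0^T G^{-1} g_0)$ passes through zero, which is where (4) cannot be evaluated even though the trajectory defined by (3) continues smoothly.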
It is worth noting that most of the above-mentioned properties can be precisely established for some important classes of functions.
For strictly convex and globally differentiable functions it can be proved, in particular, that $\lambda(\alpha)$ is monotonically decreasing; in fact, the following holds.
Lemma 3.1.
Given $f(x)$ strictly convex and globally differentiable, for any $(\alpha_1, \alpha_2)$ with $\alpha_1 < \alpha_2$ we have $\lambda(\alpha_2) < \lambda(\alpha_1)$. □
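Lemma 3.1 and Fact 3.2 can be observed by integrating (4) directly. The Python sketch below (NumPy assumed) uses a hypothetical strictly convex function $f(x) = \tfrac{1}{2}x^TQx + \sum_i e^{x_i}$ and a classical fourth-order Runge-Kutta scheme with fixed step; the integrator is an illustrative choice, not a method proposed in the text. The interval ends at $\alpha^* = g_0^T(x_0 - x^*)$, the value for which the plane (3a) passes through the minimizer $x^*$ (located beforehand by Newton's method), and $\lambda(\alpha)$ is read off (3b) as $g^T g_0 / g_0^T g_0$.

```python
import numpy as np

# Hypothetical strictly convex test function f(x) = 0.5 x^T Q x + sum_i exp(x_i).
Q = np.array([[2.0, 0.5], [0.5, 1.0]])

def f(x):
    return 0.5 * x @ Q @ x + np.exp(x).sum()

def grad(x):
    return Q @ x + np.exp(x)

def hess(x):
    return Q + np.diag(np.exp(x))

x0 = np.array([1.5, 1.0])
g0 = grad(x0)

def rhs(x):
    # Right-hand side of (4a): x' = -u / (g0^T u) with G(x) u = g0.
    u = np.linalg.solve(hess(x), g0)
    return -u / (g0 @ u)

# Minimizer by Newton's method, used only to fix the end of the alpha-interval.
x_star = np.zeros(2)
for _ in range(50):
    x_star = x_star - np.linalg.solve(hess(x_star), grad(x_star))

alpha_end = g0 @ (x0 - x_star)      # plane (3a) through x*
n_steps = 400
h = alpha_end / n_steps

x = x0.copy()
path = [x.copy()]
for _ in range(n_steps):            # classical fourth-order Runge-Kutta steps
    k1 = rhs(x)
    k2 = rhs(x + 0.5 * h * k1)
    k3 = rhs(x + 0.5 * h * k2)
    k4 = rhs(x + h * k3)
    x = x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    path.append(x.copy())

lams = [grad(p) @ g0 / (g0 @ g0) for p in path]   # lambda(alpha) from (3b)
fvals = [f(p) for p in path]
print(np.linalg.norm(path[-1] - x_star))          # distance of the endpoint from x*
```

Along the computed path $\lambda$ decreases strictly from 1 to (numerically) 0 and $f$ decreases strictly, in line with Lemma 3.1 and Fact 3.2 for this strictly convex function.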
With essentially the same assumptions on $f(x)$, the stronger result that the trajectory is regular, suitable, linearly invariant and characterizable can be proved.
For the class of pseudoconvex functions in the sense of Mangasarian [1], which are interesting for their applications in several contexts, the trajectory still possesses the desirable properties, although in a weaker sense ($\lambda(\alpha)$ not monotonically decreasing; characterizability almost everywhere).
On the basis of the results above, it seems possible to conclude that the method generally produces well-behaved descent trajectories and essentially "becomes" the Newton method near the solution of smooth functions, thus showing good potential for effective numerical implementations.
4. QUADRATIC MODEL OF THE TRAJECTORY
In general situations it is not realistic to derive the trajectory $x(\alpha)$ as the solution of a system of nonlinear ODE of type (4).
To avoid explicitly solving the differential equations, the following quadratic model of $x(\alpha)$ is considered here:
$$x(\alpha) = x(0) + \alpha s + \tfrac{1}{2}\alpha^2 t \qquad (5)$$
where $s \equiv x'(0)$, $t \equiv x''(0)$.
If $f(x)$ is such that the trajectory is locally characterizable at $x_0$, then $s$ can be obtained explicitly; indeed, in this case, the following is an obvious consequence of (4).
Fact 4.1.
At the initial point $x_0$ the trajectory's first derivative is given by:
$$s = -\frac{G_0^{-1} g_0}{g_0^T G_0^{-1} g_0} \qquad (6)$$
where $G_0 \equiv \nabla^2 f(x_0)$. □
If, in addition, $f(x)$ possesses continuous third derivatives, then the following important result holds.
Proposition 4.1.
If $f(x) \in C^3$ and, in addition, the trajectory is locally characterizable at $x_0$, then the vector $t$ can be derived by solving:
$$G_0 t = -G_0^s s \qquad (7)$$
where $G_0^s$ is a tensor defined by:
$$G_0^s = [G_1 s, G_2 s, \ldots, G_n s]^T$$
with
$$G_i \equiv \nabla^2 g_i \big|_{x_0}, \qquad g_i \equiv \frac{\partial f}{\partial x_i}, \qquad i = 1, \ldots, n.$$
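Proof.
Let us assume the following second-order approximation of the gradient $g$ along the trajectory:
$$g(x(\alpha)) = g(x(0)) + G_0\,\delta(\alpha) + \tfrac{1}{2} G_0^{\delta}\,\delta(\alpha)$$
where
$$\delta(\alpha) \equiv x(\alpha) - x(0) = \alpha s + \tfrac{1}{2}\alpha^2 t, \qquad G_0^{\delta} \equiv [G_1 \delta, \ldots, G_n \delta]^T.$$
It follows that
$$g(x(\alpha)) = g(x(0)) + \alpha G_0 s + \tfrac{1}{2}\alpha^2 G_0 t + \tfrac{1}{2}\left(\alpha G_0^s + \tfrac{1}{2}\alpha^2 G_0^t\right)\left(\alpha s + \tfrac{1}{2}\alpha^2 t\right)$$
and, by neglecting terms of degree higher than two in $\alpha$, the following gradient approximation is obtained:
$$g(x(\alpha)) = g(x(0)) + \alpha G_0 s + \tfrac{1}{2}\alpha^2 G_0 t + \tfrac{1}{2}\alpha^2 G_0^s s. \qquad (8)$$
On the other hand, the following approximation for $\lambda(\alpha)$ can easily be derived from conditions (3):
$$\lambda(\alpha) = 1 - \frac{\alpha}{g_0^T G_0^{-1} g_0},$$
i.e. its first-order expansion, since $\lambda(0) = 1$ and, differentiating (3b) and using (3a), $\lambda'(0) = -1/(g_0^T G_0^{-1} g_0)$. Used in the canonical condition
$$g(x(\alpha)) = \lambda(\alpha)\, g_0,$$
it yields:
$$g(x(\alpha)) = \left(1 - \frac{\alpha}{g_0^T G_0^{-1} g_0}\right) g_0. \qquad (9)$$
By equating the expressions given by (8) and (9) (the first-order terms agree by (6)), we finally get:
$$G_0 t + G_0^s s = 0.$$
□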
It follows from the above results that, under relatively mild assumptions on $f(x)$, the quadratic model of the trajectory is completely specified.
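As a concrete instance of (6) and (7), the Python sketch below (NumPy assumed) builds $s$, $G_0^s$ and $t$ for the Rosenbrock function (Test 1 of section 6) at its standard starting point, using hand-coded first, second and third derivatives; the function names are illustrative. The array `T[i, j, k]` stores $\partial^3 f / \partial x_i \partial x_j \partial x_k$, so that $G_i$ is `T[i, :, :]` and row $i$ of $G_0^s$ is $(G_i s)^T$.

```python
import numpy as np

# Rosenbrock function f(x) = 100 (x2 - x1^2)^2 + (1 - x1)^2 with hand-coded derivatives.
def grad(x):
    x1, x2 = x
    return np.array([-400.0 * x1 * (x2 - x1**2) - 2.0 * (1.0 - x1),
                     200.0 * (x2 - x1**2)])

def hess(x):
    x1, x2 = x
    return np.array([[1200.0 * x1**2 - 400.0 * x2 + 2.0, -400.0 * x1],
                     [-400.0 * x1, 200.0]])

def third(x):
    # T[i, j, k] = d^3 f / dx_i dx_j dx_k; only these entries are nonzero.
    T = np.zeros((2, 2, 2))
    T[0, 0, 0] = 2400.0 * x[0]
    T[0, 0, 1] = T[0, 1, 0] = T[1, 0, 0] = -400.0
    return T

x0 = np.array([-1.2, 1.0])
g0, G0, T0 = grad(x0), hess(x0), third(x0)

u = np.linalg.solve(G0, g0)
s = -u / (g0 @ u)                        # (6)
G0s = np.einsum('ijk,k->ij', T0, s)      # row i is (G_i s)^T
t = np.linalg.solve(G0, -G0s @ s)        # (7): G0 t = -G0^s s

def model(alpha):
    # Quadratic model (5) of the trajectory.
    return x0 + alpha * s + 0.5 * alpha**2 * t

print(s, t)
```

By construction $g_0^T s = -1$, which is the derivative of (3a) at $\alpha = 0$.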
5. COMPUTATIONAL ASPECTS
The calculation of third derivatives, needed in general situations to obtain the quadratic model of the trajectory, seems to be a serious computational complication.
In principle these calculations could be avoided by approximating third-derivative matrices in a suitable sense (e.g. by quasi-Newton techniques). However, the explicit computation of third derivatives is a challenge worth taking up, at least for problems of moderate size, especially if the expected benefits are noticeable. In this respect, modern advanced computational techniques, like parallel processing and symbolic manipulation, may prove very helpful.
A second apparent source of difficulties may be the occurrence of singularities of the Hessian matrix of $f(x)$; in practice this is not a serious one, at least for sufficiently regular objective functions, provided that the trajectory is appropriately initialized and a suitable curvilinear search along the quadratic model is performed.
When the approximating quadratic trajectory is used in an iterative scheme, it becomes crucial to devise effective steplength algorithms for computing a suitable value of $\alpha$ at each iteration. In many situations the present quadratic model can benefit from the curvilinear path steplength algorithms of Goldfarb [2]; these generalize the Armijo and Goldstein conditions to the case
$$x(\alpha) = x(0) + \alpha s + \tfrac{1}{2}\alpha^2 t,$$
providing, under mild assumptions, convergence to a stationary point.
Additional aids to a practical numerical implementation of the curvilinear search are:
(i) a third-order approximation of $f(x)$, restricted to the quadratic trajectory, can be computed without any extra computational effort;
(ii) a good initial guess of the steplength can be provided, at least near the solution, by the value $\bar{\alpha}$ such that $\lambda(\bar{\alpha}) = 0$.
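The two aids can be sketched for the Rosenbrock function at its standard starting point (Python, NumPy assumed, hand-coded derivatives). Item (i) uses only quantities already available once $s$ and $t$ are known: expanding $f$ to third order along (5) gives $\varphi(\alpha) \approx f_0 + \alpha\, g_0^T s + \tfrac{1}{2}\alpha^2 (g_0^T t + s^T G_0 s) + \alpha^3 \big(\tfrac{1}{2} s^T G_0 t + \tfrac{1}{6} s^T G_0^s s\big)$. For item (ii) the sketch assumes the linear approximation $\lambda(\alpha) \approx 1 - \alpha/(g_0^T G_0^{-1} g_0)$ from the proof of Proposition 4.1, whose zero is $\bar\alpha \approx g_0^T G_0^{-1} g_0$. The final loop is a plain Armijo-type sufficient-decrease test along the path, a simplification for illustration rather than Goldfarb's algorithm [2].

```python
import numpy as np

def f(x):
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def grad(x):
    x1, x2 = x
    return np.array([-400.0 * x1 * (x2 - x1**2) - 2.0 * (1.0 - x1),
                     200.0 * (x2 - x1**2)])

def hess(x):
    x1, x2 = x
    return np.array([[1200.0 * x1**2 - 400.0 * x2 + 2.0, -400.0 * x1],
                     [-400.0 * x1, 200.0]])

def third(x):
    T = np.zeros((2, 2, 2))
    T[0, 0, 0] = 2400.0 * x[0]
    T[0, 0, 1] = T[0, 1, 0] = T[1, 0, 0] = -400.0
    return T

x0 = np.array([-1.2, 1.0])
g0, G0 = grad(x0), hess(x0)
u = np.linalg.solve(G0, g0)
s = -u / (g0 @ u)                                # (6)
G0s = np.einsum('ijk,k->ij', third(x0), s)       # rows (G_i s)^T
t = np.linalg.solve(G0, -G0s @ s)                # (7)

def path(alpha):
    return x0 + alpha * s + 0.5 * alpha**2 * t   # quadratic model (5)

# (i) Cubic approximation of phi(alpha) = f(path(alpha)) from g0, G0, G0s, s, t only.
c0 = f(x0)
c1 = g0 @ s                                      # equals -1 by (6)
c2 = 0.5 * (g0 @ t + s @ G0 @ s)
c3 = 0.5 * (s @ G0 @ t) + (s @ G0s @ s) / 6.0

def phi_cubic(alpha):
    return c0 + c1 * alpha + c2 * alpha**2 + c3 * alpha**3

# (ii) Initial steplength: zero of lambda(alpha) ~ 1 - alpha / (g0^T G0^{-1} g0).
alpha_bar = g0 @ u

# Armijo-type backtracking along the path (illustrative; not Goldfarb's method).
mu, alpha = 1e-4, alpha_bar
while f(path(alpha)) > c0 + mu * alpha * c1:
    alpha *= 0.5
print(alpha, f(path(alpha)))
```

The loop terminates because $\varphi'(0) = g_0^T s = -1 < 0$, so the sufficient-decrease test holds for all small enough $\alpha$.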
6. PROTOTYPE IMPLEMENTATION
An implementation has been devised, following the previous ideas, and tested on functions which seem to possess features suitable for a meaningful validation of the method.
A very effective behaviour has been achieved, using an accurate curvilinear search, for the following test functions:
Test 1. Rosenbrock standard
$$f(x) = 100\,(x_2 - x_1^2)^2 + (1 - x_1)^2,$$
starting point $x_0 = (-1.2,\ 1.0)^T$.
Test 2. Powell singular
$$f(x) = (x_1 + 10x_2)^2 + 5(x_3 - x_4)^2 + (x_2 - 2x_3)^4 + 10(x_1 - x_4)^4,$$
starting point $x_0 = (3,\ -1,\ 0,\ 1)^T$.
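For reference, the two test functions can be coded as below (Python, NumPy assumed; function and constant names are illustrative). The sketch checks the initial values $f(x_0) = 24.2$ and $f(x_0) = 215 \approx 2.1 \times 10^2$ listed in Tables 1 and 2, and the singularity of the Powell Hessian at the minimizer $x^* = 0$. Writing the Powell function as $a^2 + 5b^2 + c^4 + 10d^4$ with the linear forms $a = x_1 + 10x_2$, $b = x_3 - x_4$, $c = x_2 - 2x_3$, $d = x_1 - x_4$, its Hessian is $2\nabla a\nabla a^T + 10\nabla b\nabla b^T + 12c^2\nabla c\nabla c^T + 120d^2\nabla d\nabla d^T$, so only the two quadratic terms survive at $x^* = 0$.

```python
import numpy as np

def rosenbrock(x):
    # Test 1: f(x) = 100 (x2 - x1^2)^2 + (1 - x1)^2
    return 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

# Gradients of the linear forms a, b, c, d in the Powell singular function.
DA = np.array([1.0, 10.0, 0.0, 0.0])
DB = np.array([0.0, 0.0, 1.0, -1.0])
DC = np.array([0.0, 1.0, -2.0, 0.0])
DD = np.array([1.0, 0.0, 0.0, -1.0])

def powell_singular(x):
    # Test 2: (x1 + 10 x2)^2 + 5 (x3 - x4)^2 + (x2 - 2 x3)^4 + 10 (x1 - x4)^4
    a, b, c, d = DA @ x, DB @ x, DC @ x, DD @ x
    return a**2 + 5.0 * b**2 + c**4 + 10.0 * d**4

def powell_hessian(x):
    c, d = DC @ x, DD @ x
    return (2.0 * np.outer(DA, DA) + 10.0 * np.outer(DB, DB)
            + 12.0 * c**2 * np.outer(DC, DC) + 120.0 * d**2 * np.outer(DD, DD))

f1 = rosenbrock(np.array([-1.2, 1.0]))
f2 = powell_singular(np.array([3.0, -1.0, 0.0, 1.0]))
rank_at_min = np.linalg.matrix_rank(powell_hessian(np.zeros(4)))
print(round(f1, 6), f2, rank_at_min)   # 24.2 215.0 2
```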
The numerical results for the Rosenbrock function, which can be considered a good model of a long curved valley with steep walls, are reported in Table 1.
Table 1. Results for the test function Rosenbrock standard
Iteration            x₁        x₂        f(x)
Initial conditions   -1.20     1.00      24.2
1st                  0.568     0.322     1.8×10⁻¹
3rd                  0.996     0.991     9.5×10⁻⁵
5th                  1.000     1.000     1.4×10⁻¹¹
These results can be fully appreciated when compared with analogous results of well-established classical algorithms. A good quasi-Newton algorithm takes 28 iterations to decrease $f(x)$ to $10^{-5}$; similar results are provided by a good implementation of the newest collinear scaling algorithm [3]. Moreover, a trust-region algorithm of the Hebden-Moré type [4] takes 21 iterations to minimize $f(x)$, and a good implementation of the modified Newton method (of the Gill and Murray type [4]) takes 24 iterations.
Computational results for the Powell function, whose Hessian is singular at the minimizer, are reported in Table 2. The numerical performance of the curvilinear path strategy can still be considered very good; in fact, a good quasi-Newton algorithm takes 15 iterations to decrease $f(x)$ to $10^{-5}$ and 40 iterations to minimize it.
Table 2. Results for the test function Powell singular
Iteration            x₁         x₂          x₃         x₄         f(x)
Initial conditions   3.0        -1.0        0.0        1.0        2.1×10²
3rd                  1.6×10⁻¹   -7.5×10⁻³   3.2×10⁻²   1.9×10⁻²   1.3×10⁻²
5th                  3.9×10⁻²   -4.3×10⁻³   6.1×10⁻³   6.6×10⁻³   2.6×10⁻⁵
9th                  1.1×10⁻³   -1.1×10⁻⁴   1.8×10⁻⁴   1.8×10⁻⁴   3.1×10⁻¹¹
7. CONCLUDING REMARKS
The first computational experience with the new curvilinear path strategy presented here seems encouraging; the testing, although very limited, has been carried out on functions which can be considered representative of those for which this method has been designed and may prove particularly appropriate (e.g., very ill-conditioned functions, like the Powell function near the solution).
At present, the effort of computing third derivatives can be a limiting factor, at least for large problems. However, it is worth noting that, among various possibilities for avoiding these calculations, the following tensor approximation has been taken into consideration:
$$G_0^s \approx G(x_0 + s) - G_0.$$
Computations carried out with this approximation (which therefore produces a second-derivative method) provided numerical results very close to those reported in Tables 1 and 2.
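The approximation can be examined on the Rosenbrock function (Python sketch, NumPy assumed, hand-coded derivatives). Since the Rosenbrock Hessian is a quadratic polynomial in $x$, $G(x_0 + s) - G_0$ differs from the exact $G_0^s$ only by a term of second order in $s$, namely $1200 s_1^2$ in the (1,1) entry; the sketch checks this and compares the vectors $t$ obtained from (7) with each version. It illustrates the error structure on one function only and does not reproduce the computations mentioned above.

```python
import numpy as np

def grad(x):
    x1, x2 = x
    return np.array([-400.0 * x1 * (x2 - x1**2) - 2.0 * (1.0 - x1),
                     200.0 * (x2 - x1**2)])

def hess(x):
    x1, x2 = x
    return np.array([[1200.0 * x1**2 - 400.0 * x2 + 2.0, -400.0 * x1],
                     [-400.0 * x1, 200.0]])

def third(x):
    T = np.zeros((2, 2, 2))
    T[0, 0, 0] = 2400.0 * x[0]
    T[0, 0, 1] = T[0, 1, 0] = T[1, 0, 0] = -400.0
    return T

x0 = np.array([-1.2, 1.0])
g0, G0 = grad(x0), hess(x0)
u = np.linalg.solve(G0, g0)
s = -u / (g0 @ u)                                   # (6)

G0s_exact = np.einsum('ijk,k->ij', third(x0), s)    # needs third derivatives
G0s_approx = hess(x0 + s) - G0                      # second derivatives only

t_exact = np.linalg.solve(G0, -G0s_exact @ s)       # (7) with the exact tensor
t_approx = np.linalg.solve(G0, -G0s_approx @ s)     # (7) with the approximation
rel_err = np.linalg.norm(t_approx - t_exact) / np.linalg.norm(t_exact)
print(G0s_approx - G0s_exact)   # (0, 0) entry is 1200 * s[0]**2; others vanish up to rounding
print(rel_err)
```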
Although the results presented here are essentially based on an abstract algorithm and further research on it is needed, it seems possible to conclude that the underlying theory is sound, indicating the potential of the method for achieving high performance.
REFERENCES
[1] MANGASARIAN, O.L., Nonlinear Programming, McGraw-Hill, New York, 1969.
[2] GOLDFARB, D., Curvilinear path steplength algorithms for minimization which use directions of negative curvature, Mathematical Programming, Vol. 18, pp. 31-40, 1980.
[3] AL-DHAHIR, J., A descent method for unconstrained optimization based on a conic model, MS Thesis, University of Dundee, 1982.
[4] FLETCHER, R., Practical Methods of Optimization, Vol. 1, Wiley, Chichester, 1980.