Conic Approximations and Collinear Scalings for Optimizers

Repository Citation
Davidon, William C. "Conic approximations and collinear scalings for optimizers." SIAM Journal on Numerical Analysis 17.2 (1980): 268-281.
Source: SIAM Journal on Numerical Analysis, Vol. 17, No. 2 (Apr., 1980), pp. 268-281
Published by: Society for Industrial and Applied Mathematics
Stable URL: http://www.jstor.org/stable/2156612
This content downloaded from 165.82.168.47 on Fri, 12 Apr 2013 13:02:37 PM
All use subject to JSTOR Terms and Conditions
SIAM J. NUMER. ANAL.
Vol. 17, No. 2, April 1980
© 1980 Society for Industrial and Applied Mathematics
0036-1429/80/1702-0007$01.00/0
CONIC APPROXIMATIONS AND COLLINEAR SCALINGS FOR OPTIMIZERS*

WILLIAM C. DAVIDON†
Abstract. Many optimization algorithms update quadratic approximations to their objective functions. This paper suggests a generalization from quadratic to conic approximations, defined as ratios of quadratics whose denominators are squares, (α + a^T x)^2. These can better match the values and gradients of typical objective functions, and hence give better estimates for their minimizers. Equivalently, affine scalings, S(w) = x_0 + Jw, of the domain of objective functions f are generalized to collinear scalings, S(w) = x_0 + Jw/(1 + h^T w), to make the Hessian of the composition fS more nearly constant as well as better conditioned. Certain general features of optimization algorithms using conic approximations and collinear scalings are presented. These are not only invariant under affine scalings, along with Newton-Raphson and variable metric algorithms, but they are also invariant under the larger group of invertible collinear scalings.
1. Introduction. Many optimization algorithms update quadratic approximations to their objective function f, and use the minimizers of successive approximations to estimate minimizers for f. In steepest descent algorithms, each quadratic approximation has a unit Hessian and matches the gradient of f at one point. In Newton-Raphson algorithms [9], each approximation matches the Hessian as well as the gradient of f at one point. In variable metric algorithms [4], each approximation matches the gradient of f at two points. But nonquadratic approximations are needed to match function values f_±, as well as gradients g_±, at points x_±, whenever f_+ − f_− does not equal (1/2)(g_+ + g_−)^T (x_+ − x_−). This paper generalizes from quadratic to conic approximating functions, defined in § 2 as those ratios of quadratics whose denominators are squares.
Some reasons for suggesting this particular generalization are:
1) Under appropriate conditions, a conic interpolation can be determined by successive function and gradient evaluations at the n+1 vertices of an n dimensional simplex, using O(n^2) numerical operations after each, much as a quadratic interpolation can be determined, to within an additive constant, by just its gradients at these vertices.
2) Optimization algorithms using conic approximations can be made invariant under the group of collinear transformations characteristic of projective geometry. Newton-Raphson and variable metric algorithms are invariant only under the proper subgroup of affine transformations, while steepest descent and conjugate gradient algorithms are invariant only under the still smaller subgroup of isometries of Euclidean space. While affine transformations can improve the conditioning of the Hessian of the objective function at any point, collinear transformations can also make the transformed Hessian more nearly constant, since the Jacobian of collinear transformations, unlike that of affine ones, need not be constant.
3) Conic functions, like most of the objective functions they are to approximate, need not be symmetric about their minimizers. They can also better fit exponential, penalty, or other functions which share with conics the property of increasing rapidly near some n−1 dimensional hyperplane in R^n.
4) The minimizer of a conic function, like that of a typical objective function, need not be in the direction of a Newton step. In contrast, each minimizer of the nonquadratic approximations considered by Fried [5], Jacobson and Oxman [6], and others [2], [3], [7] is always in the direction of a Newton step since their approximating functions satisfy
* Received by the editors April 29, 1979, and in revised form September 11, 1979.
† Haverford College, Haverford, Pennsylvania 19041.
f(x) = f* + φ(x − x*) for some homogeneous function φ of degree ν > 0; i.e., φ(λs) = λ^ν φ(s) for all λ > 0 and s ∈ R^n.
5) Theoretical concepts and computational methods of linear algebra are applicable to algorithms using conic functions when points in the n dimensional domain of these functions are specified by the n ratios among n+1 homogeneous coordinates, essentially because the group of invertible collinear transformations is isomorphic to the group of invertible (n+1) × (n+1) matrices modulo multiples of the unit matrix.
Section 2 introduces some basic terms and concepts, and § 3 applies these to the study of conic functions with given values and gradients at the vertices of a simplex. Section 4 shows how these conic interpolations can be obtained using O(n^2) numerical operations after each function and gradient evaluation. Section 5 gives an algorithm schemata from which specific optimization algorithms can be derived; it can be read first by those primarily interested in computation since it makes few references to the rest of this paper.

The abbreviation "iff" is used for "if and only if". Lower case Greek letters denote real numbers, which need not be integers; lower case Latin letters denote integers, functions, or column vectors; and upper case letters denote more general maps, matrices, or spaces. The transpose of any matrix A is A^T, and A^{-T} = (A^T)^{-1} = (A^{-1})^T.
2. Definitions and explications.
R^n is the n dimensional space of real n × 1 column vectors;
R^{m×n} is the mn dimensional space of real m × n matrices;
R^{n∨n} is the (1/2)n(n+1) dimensional space of real symmetric n × n matrices, ordered by A ≥ 0 iff v^T A v ≥ 0 for all v ∈ R^n;
I_n is the n × n unit matrix; and
X is an open convex subset in R^n.

A smooth function f: X → R is:
affine iff its gradient is constant;
quadratic iff its Hessian is constant;
collinear iff it is a ratio of affine functions;
conic iff it is a ratio of a quadratic to the square of an affine function;
positive iff f(x) > 0 for all x ∈ X; and
cupped iff it has a minimizer, all its level sets are convex, and it has no smooth extension to a larger open convex domain.

A map S: W → X between convex sets W and X is affine, quadratic, collinear, or conic iff each affine f: X → R makes the composition fS: W → R affine, quadratic, collinear, or conic respectively.

A gauge for a function f: X → R is a smooth positive function γ: X → R which makes the product γ^2 f: X → R quadratic.

A scaling for a function f: X → R is a smooth map S: W → X from an open set W in a Euclidean space to X which makes the composition fS: W → R quadratic with unit Hessian.

Some basic consequences of these definitions are:
1) Hierarchy of conic functions. Each constant function is affine; a function is affine iff it is both quadratic and collinear; and each quadratic or collinear function is conic.
2) Restrictions to lines. A function is affine, quadratic, collinear, or conic iff its restriction to each line in its domain is affine, quadratic, collinear, or conic respectively.
3) Maximal extensions. A function has a (collinear) conic extension to all R^n iff it is (affine) quadratic. The largest convex domain for any other conic function is an open half space in R^n.
4) Critical points. The critical points of each conic function f: X → R form an affine subspace in X, possibly null; i.e., if x ≠ y are critical points of f, then each point of X on the line through x and y is also a critical point of f. While Hessians at different critical points need not be the same, they share a common null space, consisting of multiples of the displacements x − y between critical points. A collinear function has a critical point iff it is constant. If a conic function has a local minimizer x*, then x* is a global minimizer, each level set of f is convex, and f has just one cupped conic extension.

5) Level sets. The level sets of each conic function are conic sections, but these are similar and concentric iff the function is quadratic. To within an affine transformation of its domain, each conic function with a unique minimizer is equivalent to one with unit Hessian at its minimizer 0 ∈ R^n. For each such normalized conic f: X → R, there is just one a ∈ R^n with

    a^T x < 1  and  f(x) = f* + (1/2) x^T x/(1 − a^T x)^2

for all x ∈ X. The level sets of f are convex conic sections which are invariant under rotations about a and have one focus at 0 ∈ R^n. This function is quadratic and its level sets are concentric spheres iff a = 0. For a ≠ 0, there is just one level set for each eccentricity e > 0. It is

    {x ∈ X: a^T a x^T x = e^2 (1 − a^T x)^2},

within which f(x) ≤ f* + (1/2) e^2/a^T a. This is an ellipsoid for e < 1, a paraboloid for e = 1, and a lobe of a hyperboloid for e > 1. The function f: X → R is convex only in the segment of a paraboloid

    {x ∈ X: a^T a x^T x ≤ (1 + a^T x)^2}.
6) A representation of collinear maps. A map S: W → X is collinear and has the value x_0 ∈ X and Jacobian J ∈ R^{n×m} at 0 ∈ W ⊂ R^m iff there is an h ∈ R^m with

(2.1)    1 + h^T w > 0  and  S(w) = x_0 + Jw/(1 + h^T w)

for all w ∈ W. The Jacobian of this map at any w ∈ W is

    (1/(1 + h^T w)) J (I_m + w h^T)^{-1} = (1/(1 + h^T w)) J (I_m − w h^T/(1 + h^T w)).

This map is invertible iff J is invertible and SW = X, in which case

    S^{-1}(x) = J^{-1}(x − x_0)/(1 − h^T J^{-1}(x − x_0)).
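As a concrete check of the representation (2.1), the following sketch composes the map with its stated inverse and verifies that the round trip is the identity. The numbers x_0, J, h and the sample w are arbitrary illustrative choices of ours, not from the paper.

```python
def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def inv2(M):
    # inverse of a 2 x 2 matrix by the cofactor formula
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

x0 = [1.0, -2.0]                       # illustrative values (ours)
J = [[2.0, 1.0], [0.0, 3.0]]
h = [0.3, -0.1]

def S(w):
    # S(w) = x0 + J w / (1 + h^T w), defined where 1 + h^T w > 0
    d = 1.0 + dot(h, w)
    Jw = mat_vec(J, w)
    return [x0[i] + Jw[i] / d for i in range(2)]

def S_inv(x):
    # S^{-1}(x) = u / (1 - h^T u) with u = J^{-1}(x - x0)
    u = mat_vec(inv2(J), [x[i] - x0[i] for i in range(2)])
    d = 1.0 - dot(h, u)
    return [u[i] / d for i in range(2)]

w_back = S_inv(S([0.4, -0.2]))
```

The algebra behind the inverse is the computation in the text: if u = J^{-1}(x − x_0) = w/(1 + h^T w), then 1 − h^T u = 1/(1 + h^T w), so w = u/(1 − h^T u).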
7) A representation of conic functions. A function f: X → R is conic and has the value f_0 ∈ R and gradient g_0 ∈ R^n at x_0 ∈ X iff there is an a ∈ R^n and A ∈ R^{n∨n} with

(2.2)    a^T s < 1  and  f(x_0 + s) = f_0 + g_0^T s/(1 − a^T s) + (1/2) s^T A s/(1 − a^T s)^2

for all s ∈ R^n with x_0 + s ∈ X. This function is collinear iff A = 0. Its gradient at any point x = x_0 + s of X is

    g = (1/γ^2)(I_n − a s^T)^{-1}(γ g_0 + A s) = (1/γ^3)(γ I_n + a s^T)(γ g_0 + A s),

where γ = 1 − a^T s is the value at x = x_0 + s of a gauge for f. This gradient vanishes at x* = x_0 + s* ∈ X iff

    (A − g_0 a^T) s* = −g_0.

The value of f is the same at all critical points x* and equals

    f* = f_0 + (1/2) g_0^T s*/γ* = f_0 − (1/2) s*^T A s*/γ*^2,

where γ* = 1 − a^T s*. The Hessian of f at x* is

    (1/γ*^2)(A − a g_0^T − g_0 a^T + 2(f_0 − f*) a a^T) = (1/γ*^2)(I_n − a s*^T)^{-1} A (I_n − s* a^T)^{-1} = (1/γ*^4)(γ* I_n + a s*^T) A (γ* I_n + s* a^T).

A critical point is a minimizer iff A ≥ 0; it is unique iff A is invertible, in which case γ* = 1/(1 − a^T A^{-1} g_0), s* = −γ* A^{-1} g_0, f* = f_0 − (1/2) g_0^T A^{-1} g_0, and the Hessian of f at x* is

    (1/γ*^2)(A − a g_0^T) A^{-1} (A − g_0 a^T).
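These closed forms are easy to exercise numerically. The sketch below evaluates a small conic of the form (2.2) with illustrative parameters of our choosing (A = diag(2, 1), together with a, g_0, f_0 below), and checks that the stated minimizer formulas give a point where f attains f_0 − (1/2) g_0^T A^{-1} g_0 and the finite-difference gradient vanishes.

```python
f0 = 3.0                      # illustrative parameters (ours)
g0 = [1.0, -0.5]
a = [0.1, -0.2]
# A = diag(2, 1), so A^{-1} g0 = [0.5, -0.5]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def f(s):
    # conic representation (2.2)
    gam = 1.0 - dot(a, s)
    sAs = 2.0 * s[0] ** 2 + 1.0 * s[1] ** 2
    return f0 + dot(g0, s) / gam + 0.5 * sAs / gam ** 2

Ainv_g0 = [0.5, -0.5]
gam_star = 1.0 / (1.0 - dot(a, Ainv_g0))                 # = 1/0.85
s_star = [-gam_star * Ainv_g0[0], -gam_star * Ainv_g0[1]]
f_star = f0 - 0.5 * dot(g0, Ainv_g0)                     # = 2.625

# the gauge value 1 - a^T s* should reproduce gam_star
gauge = 1.0 - dot(a, s_star)

# central-difference gradient at s_star
eps = 1e-6
grad = []
for i in range(2):
    sp = list(s_star); sp[i] += eps
    sm = list(s_star); sm[i] -= eps
    grad.append((f(sp) - f(sm)) / (2 * eps))
```

The agreement of the gauge value with γ* reflects the identity γ* = 1 − a^T s* for s* = −γ* A^{-1} g_0.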
8) Scalings. The collinear map of (2.1) scales the conic function of (2.2) iff h = J^T a and J^T A J = I_m. A scaling S: W → X of any function f: X → R pairs the level sets of f with concentric spheres in W. If a function f: X → R with a critical point x* has an invertible scaling S: W → X, then x* is a unique minimizer; the Hessian of f at x* is positive definite, and this Hessian equals J^{-T} J^{-1}, where J^{-1} is the Jacobian of S^{-1}: X → W at x*. Morse's lemma [8] implies the converse: if a smooth function f: R^n → R has a positive definite Hessian at a critical point x* ∈ R^n, then there is an invertible scaling of the restriction of f to some neighborhood X of x*.

9) Homogeneous coordinates. A function f: X → R is conic iff for each x_0 ∈ X there is a c ∈ R^{n+1} and Â ∈ R^{(n+1)∨(n+1)} with

(2.3)    c^T w > 0  and  f(x_0 + s) = (1/2) w^T Â w/(c^T w)^2

for all s ∈ R^n with x_0 + s ∈ X and all positive multiples w ∈ R^{n+1} of (s; 1), where (s; 1) denotes the column vector with s above 1. The gradient g ∈ R^n of f at x = x_0 + s is uniquely determined by

(2.4)    Â (s; 1) = γ (γ (g; −s^T g) + 2 f(x) c),

where γ = c^T (s; 1) is the value at x = x_0 + s of a gauge for f. Equations (2.2) and (2.3) specify the same conic function iff

(2.5)    c = (−a; 1)  and  Â = [ A − g_0 a^T − a g_0^T + 2 f_0 a a^T , g_0 − 2 f_0 a ; (g_0 − 2 f_0 a)^T , 2 f_0 ],

with the blocks of Â listed row by row.
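The scaling condition in 8) can be checked directly: with h = J^T a and J^T A J = I_m, the composition fS is exactly quadratic with unit Hessian, since fS(w) = f_0 + (J^T g_0)^T w + (1/2) w^T w. A minimal sketch with illustrative parameters of our choosing (A = diag(2, 1), so the diagonal J = diag(1/√2, 1) satisfies J^T A J = I_2):

```python
import math

f0 = 3.0                      # illustrative parameters (ours)
g0 = [1.0, -0.5]
a = [0.1, -0.2]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def f(s):
    # conic (2.2) with A = diag(2, 1)
    gam = 1.0 - dot(a, s)
    sAs = 2.0 * s[0] ** 2 + 1.0 * s[1] ** 2
    return f0 + dot(g0, s) / gam + 0.5 * sAs / gam ** 2

J = [1.0 / math.sqrt(2.0), 1.0]        # diagonal entries of J
h = [J[0] * a[0], J[1] * a[1]]         # h = J^T a
JTg0 = [J[0] * g0[0], J[1] * g0[1]]    # J^T g0

def fS(w):
    # f(S(w)) with S(w) = x0 + J w / (1 + h^T w)
    d = 1.0 + dot(h, w)
    s = [J[0] * w[0] / d, J[1] * w[1] / d]
    return f(s)

# fS should equal the unit-Hessian quadratic f0 + (J^T g0)^T w + ||w||^2 / 2
checks = []
for w in ([0.3, -0.4], [-0.2, 0.5], [1.0, 1.0]):
    quad = f0 + dot(JTg0, w) + 0.5 * dot(w, w)
    checks.append(abs(fS(w) - quad))
```

The key algebraic step is that a^T J = h^T makes the gauge of the composition constant: 1 − a^T s = 1/(1 + h^T w) along S.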
10) Miscellaneous. An affine function over all R^n is positive iff it is a positive constant, and these are the only gauges for nonconstant quadratic functions. There are nonconstant positive affine functions over any proper open convex subset X in R^n, and while all these gauge each constant function over X, the gauges for each nonconstant conic function are positive multiples of each other.
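The change of parameters (2.5) can be verified numerically: building c and Â from (a, A, g_0, f_0) and evaluating (2.3) must reproduce (2.2). A sketch, with the same kind of illustrative parameters used above (all chosen by us):

```python
f0 = 3.0                      # illustrative parameters (ours)
g0 = [1.0, -0.5]
a = [0.1, -0.2]
A = [[2.0, 0.0], [0.0, 1.0]]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def f22(s):
    # representation (2.2)
    gam = 1.0 - dot(a, s)
    As = [dot(A[i], s) for i in range(2)]
    return f0 + dot(g0, s) / gam + 0.5 * dot(s, As) / gam ** 2

# (2.5): c = (-a; 1) and the blocks of Ahat
c = [-a[0], -a[1], 1.0]
b = [g0[i] - 2.0 * f0 * a[i] for i in range(2)]      # g0 - 2 f0 a
Ahat = [[A[i][j] - g0[i] * a[j] - a[i] * g0[j] + 2.0 * f0 * a[i] * a[j]
         for j in range(2)] + [b[i]] for i in range(2)]
Ahat.append(b + [2.0 * f0])

def f23(s):
    # representation (2.3) with w = (s; 1)
    w = [s[0], s[1], 1.0]
    wAw = sum(w[i] * dot(Ahat[i], w) for i in range(3))
    return 0.5 * wAw / dot(c, w) ** 2

diffs = [abs(f22(s) - f23(s)) for s in ([0.3, 0.7], [-1.2, 0.4], [0.0, 0.0])]
```

At s = 0 the homogeneous form reduces to (1/2) Â_{n+1,n+1} = f_0, which is a quick sanity check on the bottom-right block.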
The sum of two (collinear) conic maps is (collinear) conic iff they share a gauge. The set of those collinear maps S: W → X, from an m dimensional W to an n dimensional X, which share a gauge is an (m+1)n dimensional vector subspace in the (1/2)(m+2)(m+1)n dimensional vector space of all conic maps S: W → X which share this same gauge.

Collinear maps preserve collinearity, convexity, and cross ratios; i.e., if S: W → X is collinear, then
(i) S pairs collinear points u, v, and w of W with collinear points S(u), S(v), and S(w) of X;
(ii) S pairs each convex subset U in W with a convex subset SU in X; and
(iii) if t, u, v, and w are collinear points of W, then for any norms on the spaces W and X,

    (‖t − u‖ ‖v − w‖)/(‖t − v‖ ‖u − w‖) = (‖St − Su‖ ‖Sv − Sw‖)/(‖St − Sv‖ ‖Su − Sw‖)

whenever the denominators are positive. Note that the choice of norms does not affect these ratios since ‖λs‖ = |λ| ‖s‖ for any norm.
3. Conic interpolations. While each iteration of the algorithms to be considered uses the values and gradients of an objective function at just two points to update a conic interpolation, it is instructive first to derive a necessary and sufficient condition for there to be a conic interpolation to given function and gradient values at the vertices of any simplex.
THEOREM 1. There is a conic function f: X → R with values f_k ∈ R and gradients g_k ∈ R^n at the m+1 vertices x_k of an m dimensional simplex in X iff there are positive numbers γ_k with

(3.1)    f_j − f_i = (1/2)((γ_j/γ_i) g_j + (γ_i/γ_j) g_i)^T (x_j − x_i)

for all i and j.

Proof. First assume that f is a conic function with values f_k and gradients g_k at x_k, and let γ_k be the value at x_k of a gauge for f. Since a gauge is positive, γ_k > 0, and since it is affine, its value at any point x(r) = x_i + (x_j − x_i) r on the line through x_i and x_j is γ_i + (γ_j − γ_i) r. Use the definition of a gauge to show that the function q: R → R defined by

    q(r) = (γ_i + (γ_j − γ_i) r)^2 f(x(r))

is quadratic. Evaluate q and its derivative q′ at 0 and 1, and then use q(1) − q(0) = (1/2)(q′(0) + q′(1)) to get (3.1).
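Condition (3.1) can be checked numerically on an explicit conic. The sketch below uses the normalized conic f(x) = f* + (1/2) x^T x/(1 − a^T x)^2 of § 2, whose gauge values are γ(x) = 1 − a^T x, and tests (3.1) for every pair of three sample points; a, f* and the points are illustrative choices of ours.

```python
a = [0.2, 0.1]            # illustrative normalized conic (ours)
f_min = 1.0

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def f(x):
    gam = 1.0 - dot(a, x)
    return f_min + 0.5 * dot(x, x) / gam ** 2

def grad(x):
    # gradient of f: x / gam^2 + (x^T x) a / gam^3
    gam = 1.0 - dot(a, x)
    return [x[i] / gam ** 2 + dot(x, x) * a[i] / gam ** 3 for i in range(2)]

pts = [[0.5, 0.0], [0.0, 0.8], [-0.4, 0.3]]
worst = 0.0
for i in range(3):
    for j in range(3):
        if i == j:
            continue
        xi, xj = pts[i], pts[j]
        gi, gj = grad(xi), grad(xj)
        yi, yj = 1.0 - dot(a, xi), 1.0 - dot(a, xj)   # gauge values
        d = [xj[0] - xi[0], xj[1] - xi[1]]
        rhs = 0.5 * ((yj / yi) * dot(gj, d) + (yi / yj) * dot(gi, d))
        worst = max(worst, abs(f(xj) - f(xi) - rhs))
```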
Now assume that some f_k ∈ R, g_k ∈ R^n, and positive numbers γ_k satisfy (3.1) for the m+1 vertices x_k of a simplex. Since only the ratios among the γ_k enter in (3.1), choose γ_0 = 1. Define s_k = x_k − x_0 and use the linear independence of the m vectors s_k for k ≠ 0 to show there is an a ∈ R^n with

(3.2)    a^T s_k = 1 − γ_k

for all k. For each k, define the vector

    r_k = γ_k^2 (g_k − a s_k^T g_k) − γ_k g_0,

and for each i and j, define the number

(3.3)    ρ_ij = (1/2)((γ_j/γ_i) g_j − (γ_i/γ_j) g_i)^T (x_j − x_i).
Use (3.1) and (3.2), together with (f_j − f_i) + (f_i − f_0) + (f_0 − f_j) = 0, to get

(3.4)    r_i^T s_j = γ_i γ_j (ρ_0i + ρ_0j − ρ_ij),

and hence r_i^T s_j = r_j^T s_i. Use this and the linear independence of the m vectors s_k for k ≠ 0 to show there is an A ∈ R^{n∨n} with

(3.5)    A s_k = r_k

for all k. In (2.2), use any a ∈ R^n satisfying (3.2) and A ∈ R^{n∨n} satisfying (3.5) to obtain a conic function with the given values and gradients. ∎
COROLLARY 1. There is an affine function f: X → R with values f_k ∈ R and gradients g_k ∈ R^n at the m+1 vertices x_k of a simplex in X iff, for all i and j,

    f_j − f_i = g_i^T (x_j − x_i) = g_j^T (x_j − x_i),

a quadratic interpolation iff

    f_j − f_i = (1/2)(g_j + g_i)^T (x_j − x_i),

a collinear interpolation iff there are positive numbers γ_k with

    f_j − f_i = (γ_j/γ_i) g_j^T (x_j − x_i) = (γ_i/γ_j) g_i^T (x_j − x_i),

and a conic interpolation iff there are positive numbers γ_k with

    f_j − f_i = (1/2)((γ_j/γ_i) g_j + (γ_i/γ_j) g_i)^T (x_j − x_i).
Proof. This last condition simply repeats the theorem and is included here only to facilitate comparisons. Obtain the other conditions from this one by showing that a conic interpolation is collinear iff the ρ_ij defined by (3.3) are all zero, that it is quadratic iff it has a constant gauge, with γ_i = γ_j for all i and j, and that it is affine iff it is both quadratic and collinear. ∎
COROLLARY 2. If f: X → R is a conic function with values f_k ∈ R and gradients g_k ∈ R^n at points x_k ∈ X, then for each i and j there is just one ρ_ij = ρ_ji ∈ R for which the values γ_k at x_k of each gauge for f satisfy (3.3). This ρ_ij also satisfies

(3.6)    ρ_ij^2 = (f_j − f_i)^2 − g_i^T (x_j − x_i) g_j^T (x_j − x_i),

(3.7)    ρ_ij = f_i − f_j + (γ_j/γ_i) g_j^T (x_j − x_i) = f_j − f_i − (γ_i/γ_j) g_i^T (x_j − x_i),

and

(3.8)    ρ_ij = (1/2)(s_j/γ_j − s_i/γ_i)^T A (s_j/γ_j − s_i/γ_i)

for the A ∈ R^{n∨n} of (2.2), where s_k = x_k − x_0. If f is cupped, then ρ_ij ≥ 0. For each point x of the affine subspace in X spanned by the x_k, there are ω_k ∈ R with Σ ω_k > 0 and x = Σ ω_k x_k / Σ ω_k. Each gauge for f with values γ_k at x_k has the value Σ ω_k γ_k / Σ ω_k > 0 at x, and the value of f at x is

(3.9)    f(x) = (Σ_k ω_k γ_k f_k)/(Σ_k ω_k γ_k) − (1/2)(Σ_{i≠j} ω_i ω_j γ_i γ_j ρ_ij)/(Σ_k ω_k γ_k)^2.
Proof. Define ρ_ij by (3.3), use (3.1) to get (3.6) and (3.7), and use (3.4) and (3.5) to get (3.8). If f is cupped, get ρ_ij ≥ 0 from A ≥ 0 and (3.8). Since each gauge is affine, its value at Σ ω_k x_k / Σ ω_k is Σ ω_k γ_k / Σ ω_k, and since each gauge is positive, Σ ω_k γ_k > 0. Multiply the function f specified in (3.9) by the square of this gauge to show that f is conic, and evaluate f and its directional derivatives at x_k to show that it is the required interpolation. ∎
COROLLARY 3. There are at most 2^n conic functions with values f_k ∈ R and gradients g_k ∈ R^n at the n+1 vertices x_k of an n dimensional simplex in R^n. Of these, at most one is cupped. If there is a cupped conic interpolation f and if f_0 > f_k for each k > 0, then there is just one gauge for f whose value at x_0 is 1; its value at each x_k is

    γ_k = −g_0^T s_k/(f_0 − f_k + ρ_0k),

where s_k = x_k − x_0 and ρ_0k = ((f_0 − f_k)^2 − g_0^T s_k g_k^T s_k)^{1/2}.

Proof. Use Corollary 2 to show that there is at most one interpolating conic for each choice of signs for the n numbers ρ_0k with k > 0, and that these signs are all positive for a cupped conic. ∎
COROLLARY 4. There is a conic function over the interval [x_−, x_+] ⊂ R with values f_± ∈ R and slopes g_± ∈ R at points x_+ > x_− of R iff there are roots γ > 0 to the quadratic equation

(3.10)    g_+ γ^2 − 2((f_+ − f_−)/(x_+ − x_−)) γ + g_− = 0.

For each root γ, there is just one conic interpolation gauged by an affine function whose values γ_± at x_± have the ratio γ_+/γ_− = γ. The value and slope of this interpolation at x(r) = x_− + (x_+ − x_−) r ∈ X are

    f(x(r)) = ((1 − r) f_− + γ r f_+)/(1 − r + γ r) − γ r (1 − r) ρ/(1 − r + γ r)^2

and

    g(x(r)) = ((1 − r) g_− + γ^3 r g_+)/(1 − r + γ r)^3,

where

    ρ = (1/2)(γ g_+ − γ^{-1} g_−)(x_+ − x_−)
      = f_− − f_+ + γ g_+ (x_+ − x_−)
      = f_+ − f_− − γ^{-1} g_− (x_+ − x_−), and
    ρ^2 = (f_+ − f_−)^2 − g_+ g_− (x_+ − x_−)^2.

There is a cupped conic extension of f iff either f is constant, or else γ^2 g_+ > g_− and γ^3 g_+ > g_−, when the minimizer x* and minimum f* of f are

    x* = (γ^3 g_+ x_− − g_− x_+)/(γ^3 g_+ − g_−)  and  f* = f_− − (f_+ − f_− − ρ)^2/(4ρ).
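Corollary 4 can be exercised on an explicit one dimensional conic. For f(x) = (1/2) x^2/(1 − x/2)^2, which is cupped with minimizer x* = 0 and minimum f* = 0, the two-point data at x_− = −1 and x_+ = 1 should reproduce both exactly. The test function is our illustrative choice.

```python
import math

def f(x):
    # (1/2) x^2 / (1 - x/2)^2, a cupped conic on x < 2
    return 0.5 * x * x / (1.0 - 0.5 * x) ** 2

def g(x):
    # its slope: x / (1 - x/2)^3
    return x / (1.0 - 0.5 * x) ** 3

xm, xp = -1.0, 1.0
fm, fp, gm, gp = f(xm), f(xp), g(xm), g(xp)

rho = math.sqrt((fp - fm) ** 2 - gp * gm * (xp - xm) ** 2)
gamma = -gm * (xp - xm) / (fm - fp + rho)      # the positive root of (3.10)
x_star = (gamma ** 3 * gp * xm - gm * xp) / (gamma ** 3 * gp - gm)
f_star = fm - (fp - fm - rho) ** 2 / (4.0 * rho)
```

For this function the gauge ratio is γ = γ_+/γ_− = 0.5/1.5 = 1/3, which the computed root reproduces.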
Proof. Use (3.1) with γ = γ_+/γ_− to get (3.10). Use Theorem 1, Corollary 2, and algebra to verify the other conclusions. ∎

This corollary provides a basis for one dimensional optimization algorithms such as those developed by Bjørstad and Nocedal [1]. The following simple example assumes that for each k > 0, there is a cupped conic interpolation to the values and slopes of the objective function at x_{k−1} and x_k with a unique minimizer at x_{k+1}, and that f_{k−1} > f_k.
ALGORITHM 1.
Input: A point x_0 in the domain X ⊂ R of an objective function f: X → R, an initial nonzero step s_1 ∈ R to be taken from x_0, and a subalgorithm for calculating the value f_k ∈ R and slope g_k ∈ R of f at any x_k ∈ X.
Calculate f_0 and g_0 at x_0. For each integer k > 0 until some convergence criterion is met, calculate f_k and g_k at x_k = x_{k−1} + s_k, and set

    ρ_k = ((f_{k−1} − f_k)^2 − g_{k−1} g_k s_k^2)^{1/2},
    γ_k = −g_{k−1} s_k/(f_{k−1} − f_k + ρ_k),

and

    s_{k+1} = −γ_k^3 g_k s_k/(γ_k^3 g_k − g_{k−1}).
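A minimal sketch of this loop in code, applied to the non-conic objective f(x) = e^x − 2x with minimizer ln 2. The test function, starting data, and the crude stopping and clamping safeguards are ours; as the paper notes later, safeguards must be added to make such a line search general purpose.

```python
import math

def fg(x):
    # illustrative objective (ours): f(x) = exp(x) - 2x, minimized at ln 2
    return math.exp(x) - 2.0 * x, math.exp(x) - 2.0

x_prev, s = 0.0, 0.5                      # x0 and initial step s1
f_prev, g_prev = fg(x_prev)
x = x_prev + s
for _ in range(50):
    fk, gk = fg(x)
    if abs(gk) < 1e-10:                   # convergence safeguard (ours)
        break
    rho2 = (f_prev - fk) ** 2 - g_prev * gk * s * s
    rho = math.sqrt(max(rho2, 0.0))       # clamp rounding noise (ours)
    denom = f_prev - fk + rho
    if denom == 0.0:
        break
    gamma = -g_prev * s / denom
    step_denom = gamma ** 3 * gk - g_prev
    if step_denom == 0.0:
        break
    s_next = -gamma ** 3 * gk * s / step_denom
    x_prev, f_prev, g_prev = x, fk, gk
    s = s_next
    x = x_prev + s
```

On this example the iterates converge superlinearly; with γ_k ≡ 1 the update reduces to the familiar secant step.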
4. Recursive interpolations and scalings. A minimizer of a cupped conic function f can be calculated from the values and gradients of f at n+1 points, using O(n^3) operations. This section shows how O(n^2) operations can be used after each function and gradient evaluation to update a conic interpolation and locate its minimizer, or equivalently, to update a collinear scaling of its domain.

The main idea is to use (2.3) to replace the conic function f: X → R by the quadratic function w ↦ (1/2) w^T Â w over the n dimensional hyperplane in R^{n+1} with c^T w = 1, and to replace each point x_k = x_0 + s_k ∈ X by a corresponding point

    w_k = (1/γ_k)(s_k; 1)

on this hyperplane. The next algorithm calculates a V ∈ R^{(n+1)∨(n+1)} for a cupped conic function f: X → R and n dimensional simplex in X, which is then used by the following theorem to specify a minimizer and minimum of f.
ALGORITHM 2.
Input: For each integer k from 0 through n, the value f_k ∈ R and gradient g_k ∈ R^n of a cupped conic function f: X → R at the kth vertex x_k of an n dimensional simplex in X, with f_0 > f_k for k > 0.
Set V_0 = 0 ∈ R^{(n+1)∨(n+1)}. For each k from 1 through n, set

    s_k = x_k − x_0,
    ρ_k = ((f_0 − f_k)^2 − g_0^T s_k g_k^T s_k)^{1/2},
    γ_k = −g_0^T s_k/(f_0 − f_k + ρ_k),
    z_k = (1/γ_k)(s_k; 1) − (0; 1),
    y_k = (γ_k g_k − g_0; −γ_k g_k^T s_k),

and

    v_k = z_k − V_{k−1} y_k.

If y_k^T v_k > 0, set V_k = V_{k−1} + v_k v_k^T/(y_k^T v_k); else, set V_k = V_{k−1}.
Set V = V_n.
THEOREM 2. For each cupped conic function f: X → R whose values f_k ∈ R at the n+1 vertices x_k of an n dimensional simplex in X satisfy f_0 > f_k for all k > 0, define V ∈ R^{(n+1)∨(n+1)} by Algorithm 2. If γ* > 0 and s* ∈ R^n are defined by

    (1/γ*)(−s*; γ* − 1) = V (g_0; 0),

then f has its minimum value

    f* = f_0 + (1/2γ*) g_0^T s* = f_0 − (1/2)[g_0^T, 0] V (g_0; 0)

at x_0 + s*, where g_0 ∈ R^n is the gradient of f at x_0. Furthermore,

(4.1)    V ≥ 0,  Vc = 0,  V Â V = V  and  Â V Â = Â − 2 f* c c^T,

where c ∈ R^{n+1} and Â ∈ R^{(n+1)∨(n+1)} are defined by (2.3).
Proof. Use Corollary 3 of Theorem 1 to show that the γ_k defined in Algorithm 2, together with γ_0 = 1, are the values at x_k of a gauge for f. Use (2.4) to show that for each k from 1 through n, the y_k and z_k defined in Algorithm 2 satisfy

(4.2)    c^T z_k = 0  and  y_k = Â z_k + 2(f_0 − f_k) c.

Since f is cupped, there is a minimum value f* ∈ R of f and a w* ∈ R^{n+1} with c^T w* = 1 and Â w* = 2 f* c. For each k from 0 through n, define Z_k ∈ R^{(n+1)×(k+1)} and U_k ∈ R^{(n+1)∨(n+1)} by

    Z_k = [w*, z_1, z_2, · · ·, z_k]  and  U_k = Â − Â V_k Â − 2 f* c c^T,

and define the induction hypothesis

(P_k)    V_k ≥ 0,  V_k c = 0,  V_k Â V_k = V_k,  U_k ≥ 0  and  U_k Z_k = 0.

To prove P_0, note that V_0 = 0, that f ≥ f* in (2.3) implies U_0 ≥ 0, and that U_0 Z_0 = U_0 w* = Â w* − 2 f* c = 0. The following proof of P_k from P_{k−1} is similar to one for variable metric algorithms.

Assume P_{k−1} for some k > 0 and show that y_k^T v_k = z_k^T U_{k−1} z_k ≥ 0, with equality iff U_{k−1} z_k = 0. If y_k^T v_k = 0, the algorithm sets V_k = V_{k−1}, so that U_k = U_{k−1} and U_k Z_k = [U_{k−1} Z_{k−1}, U_{k−1} z_k] = 0. Now assume y_k^T v_k > 0 and get V_k ≥ V_{k−1} ≥ 0. Use V_{k−1} c = 0 and c^T z_k = 0 to get V_k c = 0. Use V_k Â z_k = V_k y_k = z_k and V_{k−1} Â V_{k−1} = V_{k−1} to get V_k Â V_k = V_k. Use the definitions of U_k and V_k to get

    U_k = U_{k−1} − (U_{k−1} z_k z_k^T U_{k−1})/(z_k^T U_{k−1} z_k).

Then use U_{k−1} ≥ 0 to get U_k ≥ 0, and use U_{k−1} Z_{k−1} = 0 and U_k z_k = 0 to get U_k Z_k = 0. This completes the proof of P_k from P_{k−1}, and hence of P_n.

Use the linear independence of the vectors s_k for k ≠ 0, together with c^T z_k = 0 and c^T w* = 1, to show that rank(Z_k) = k + 1, which with U_k Z_k = 0 gives rank(U_k) ≤ n − k; in particular, U_n = 0. Use this, P_n, and (2.3) and (2.4) to obtain the conclusions of the theorem. ∎
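To make Algorithm 2 and Theorem 2 concrete, the sketch below runs the rank-one update on a small cupped conic in R^2 built from (2.2), then reads the minimizer off V (g_0; 0) as in the theorem. The conic parameters and the simplex are illustrative choices of ours, picked so that f_0 > f_1, f_2; the answers are checked against the closed forms s* = −γ* A^{-1} g_0 and f* = f_0 − (1/2) g_0^T A^{-1} g_0 from § 2.

```python
import math

# Illustrative cupped conic (2.2) with A = diag(2, 1); all numbers are ours.
a, g0, f0 = [0.1, -0.2], [1.0, -0.5], 3.0
A = [[2.0, 0.0], [0.0, 1.0]]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def f_and_g(s):
    gam = 1.0 - dot(a, s)
    As = [dot(A[i], s) for i in range(2)]
    fval = f0 + dot(g0, s) / gam + 0.5 * dot(s, As) / gam ** 2
    # gradient: (1/gam^3)(gam I + a s^T)(gam g0 + A s)
    u = [gam * g0[i] + As[i] for i in range(2)]
    g = [(gam * u[i] + a[i] * dot(s, u)) / gam ** 3 for i in range(2)]
    return fval, g

# Algorithm 2 on the simplex x0, x0 + s1, x0 + s2 (n = 2, f0 > f1, f2).
V = [[0.0] * 3 for _ in range(3)]
for s in ([-0.5, 0.2], [-0.2, 0.5]):
    fk, gk = f_and_g(s)
    rho = math.sqrt((f0 - fk) ** 2 - dot(g0, s) * dot(gk, s))
    gam = -dot(g0, s) / (f0 - fk + rho)
    z = [s[0] / gam, s[1] / gam, 1.0 / gam - 1.0]
    y = [gam * gk[0] - g0[0], gam * gk[1] - g0[1], -gam * dot(gk, s)]
    v = [z[i] - dot(V[i], y) for i in range(3)]
    yv = dot(y, v)
    if yv > 0.0:
        for i in range(3):
            for j in range(3):
                V[i][j] += v[i] * v[j] / yv

# Theorem 2: recover the minimizer and minimum from V (g0; 0).
u = [dot(V[i], [g0[0], g0[1], 0.0]) for i in range(3)]
gam_star = 1.0 / (1.0 - u[2])
s_star = [-gam_star * u[0], -gam_star * u[1]]
f_star = f0 - 0.5 * (g0[0] * u[0] + g0[1] * u[1])
```

After the n = 2 accepted updates, V (g_0; 0) equals (A^{-1} g_0; a^T A^{-1} g_0), from which γ*, s* and f* follow exactly (up to rounding).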
While each iteration in Algorithm 2 refers back to the starting point x_0, a linear mapping in each iteration, using O(n^2) operations, can update this reference point so that the kth iteration uses only the step x_k − x_{k−1} instead of x_k − x_0. The next algorithm differs from Algorithm 2 just by this mapping.

ALGORITHM 3.
Input: For each integer k from 0 through n, the value f_k ∈ R and gradient g_k ∈ R^n of a cupped conic function f: X → R at the kth vertex x_k of an n dimensional simplex in X, with f_{k−1} > f_k for each k > 0.
Set V_0 = 0 ∈ R^{(n+1)∨(n+1)}. For each k from 1 through n, set

    s_k = x_k − x_{k−1},
    ρ_k = ((f_{k−1} − f_k)^2 − g_{k−1}^T s_k g_k^T s_k)^{1/2},
    γ_k = −g_{k−1}^T s_k/(f_{k−1} − f_k + ρ_k),
    z_k = (1/γ_k)(s_k; 1) − (0; 1),
    y_k = (γ_k g_k − g_{k−1}; −γ_k g_k^T s_k),
    v_k = z_k − V_{k−1} y_k,

and

    D_k = γ_k [ I_n , −s_k ; 0^T , 1 ].

If y_k^T v_k > 0, set V_k = D_k (V_{k−1} + v_k v_k^T/(y_k^T v_k)) D_k^T; else, set V_k = D_k V_{k−1} D_k^T.
Set V = V_n.
COROLLARY 1. For each cupped conic function f: X → R whose values f_k ∈ R at the n+1 vertices x_k of an n dimensional simplex in X satisfy f_{k−1} > f_k for all k > 0, define V ∈ R^{(n+1)∨(n+1)} by Algorithm 3. If γ* > 0 and s* ∈ R^n are defined by

    (1/γ*)(−s*; γ* − 1) = V (g_n; 0),

then f has its minimum value

    f* = f_n + (1/2γ*) g_n^T s* = f_n − (1/2)[g_n^T, 0] V (g_n; 0)

at x_n + s*, where g_n ∈ R^n is the gradient of f at x_n. Furthermore,

    V ≥ 0,  V c_n = 0,  V Â_n V = V  and  Â_n V Â_n = Â_n − 2 f* c_n c_n^T,

where c_n ∈ R^{n+1} and Â_n ∈ R^{(n+1)∨(n+1)} are defined by replacing x_0 by x_n in (2.3).

Proof. For each k from 0 through n, define c_k ∈ R^{n+1} and Â_k ∈ R^{(n+1)∨(n+1)} by replacing x_0 by x_k in (2.3). Show that for each k from 1 through n, the y_k and z_k defined in Algorithm 3 satisfy

    c_{k−1}^T z_k = 0  and  y_k = Â_{k−1} z_k + 2(f_{k−1} − f_k) c_{k−1}

instead of (4.2). As in the proof of Theorem 2, define f* ∈ R and w* ∈ R^{n+1} by c_0^T w* = 1 and Â_0 w* = 2 f* c_0. Define Z_0 = w* and for each k from 1 through n, define Z_k ∈ R^{(n+1)×(k+1)} by Z_k = D_k [Z_{k−1}, z_k], where D_k ∈ R^{(n+1)×(n+1)} is defined in Algorithm 3. Replace c and Â by c_k and Â_k in the definition of U_k and the induction hypothesis P_k of the proof of Theorem 2, and replace c and Â by c_{k−1} and Â_{k−1} in the proof of P_k from P_{k−1}. Use c_k = D_k^{-T} c_{k−1} and Â_k = D_k^{-T} Â_{k−1} D_k^{-1} to complete the proof of this corollary. ∎
The symmetric matrices V_k ≥ 0 can be factored as V_k = L_k L_k^T for an L_k ∈ R^{(n+1)×m_k} of rank m_k = rank(V_k) ≤ n. These L_k are determined only to within a right multiplication by an orthogonal m_k × m_k matrix. Each L_k determines an h_k ∈ R^{m_k} and J_k ∈ R^{n×m_k} by

(4.3)    L_k = ( J_k ; h_k^T ),

with J_k stacked above the row h_k^T, and these, with x_k ∈ X, determine a collinear scaling S_k: W_k → X of f: X → R by (2.1). The composite function f S_k: W_k → R is quadratic with unit Hessian since P_k implies L_k^T Â L_k = I_{m_k} and L_k^T c = 0, or equivalently, that h_k and J_k satisfy h_k = J_k^T a and J_k^T A J_k = I_{m_k}.

The only changes needed in Algorithm 3 to update L_k, and hence the collinear map S_k: W_k → X, instead of V_k = L_k L_k^T, are to define an integer m_0 = 0, and then for each k > 0, to replace the update for V_k by the following update for m_k and L_k ∈ R^{(n+1)×m_k}:

If y_k^T v_k > 0, set m_k = m_{k−1} + 1 and L_k = D_k [L_{k−1}, v_k (y_k^T v_k)^{-1/2}];
else, set m_k = m_{k−1} and L_k = D_k L_{k−1}.
5. An algorithm schemata. Instead of discarding all previous interpolations in each iteration, as in Algorithm 1 for one dimensional problems, or saving all previous interpolations, as in Algorithms 2 and 3 for cupped conic objective functions, more general algorithms need to selectively replace some old information about the objective function with new. The choice of what is to be replaced needs to take into account at least two factors. For one, information gathered in regions far from a minimizer is usually better discarded than that from nearby regions. For another, information from steps which are nearly in the same direction as the current step is usually better discarded than that from steps in quite different directions. Conflict between these desiderata arises when steps near the minimizer lie in a subspace of few dimensions, and various strategies for balancing them need to be considered. However, since these considerations are not unique to algorithms using conic approximations or collinear scalings, the following schemata leaves open this choice of strategy, and only suggests how any one could be implemented.
Input: m and n, positive integers specifying the dimension m of the region in R^n consistent with any linear equality constraints; if there are no constraints, m = n;
x_0 ∈ R^n, a point consistent with any constraints where the first value and gradient of the objective function will be computed;
J_0 ∈ R^{n×m}, a matrix whose m columns span all steps consistent with any constraints. If the initial conic approximation is quadratic, then the columns of J_0 are conjugate steps which if taken from the minimizer would each increase the quadratic approximation by 1/2;
h_0 ∈ R^m, a column vector equal to zero if the initial conic approximation is quadratic;
ε ∈ R, a positive number, used in a test which stops the calculation when the minimum of the current conic approximation is within ε of the current function value; and
a subalgorithm for calculating the value f_k ∈ R and gradient g_k ∈ R^n of the objective function at x_k.
Step 0. Calculate f_0 and g_0 at x_0.
Comment. The initial conic approximation to the objective function satisfies

(5.1)    f(x_0 + J_0 w/(1 + h_0^T w)) = f_0 + g_0^T J_0 w + (1/2) w^T w

for all w ∈ R^m with 1 + h_0^T w > 0. This approximation is singular at those x ∈ R^n equal to x_0 + J_0 w/(h_0^T w) for some w ∈ R^m with h_0^T w > 0. If g_0^T J_0 h_0 < 1, it has the minimum value f_0 − (1/2)‖J_0^T g_0‖^2 at

    x_0 − J_0 J_0^T g_0/(1 − h_0^T J_0^T g_0).
For each k > 0:
Step 1_k. Set w_k = −J_{k−1}^T g_{k−1}. If (1/2) w_k^T w_k < ε, then stop. Else, go to Step 2_k.
Comment. This simple convergence test is invariant under collinear mappings; it may be supplemented with others.
Step 2_k. Find a step s_k = λ_k J_{k−1} w_k, typically with λ_k = 1/(1 + h_{k−1}^T w_k), for which the function value f_k and gradient g_k at x_k = x_{k−1} + s_k satisfy

    f_{k−1} > f_k,  0 > g_{k−1}^T s_k  and  (f_{k−1} − f_k)^2 > g_{k−1}^T s_k g_k^T s_k.

Comment. Some step s_k will satisfy the stated conditions provided the objective function has a lower bound.
Step 3_k. Set

    ρ_k = ((f_{k−1} − f_k)^2 − g_{k−1}^T s_k g_k^T s_k)^{1/2},
    γ_k = −g_{k−1}^T s_k/(f_{k−1} − f_k + ρ_k),

and

    r_k = J_{k−1}^T (γ_k g_k − g_{k−1}) − h_{k−1} γ_k g_k^T s_k.

Comment. This γ_k ∈ R and r_k ∈ R^m are used in the update for h_k and J_k. The γ_k is the ratio of the value of a gauge at x_k to that at x_{k−1}, and r_k is the change in the gradient of the composite function f S_{k−1}: W → R resulting from the step from x_{k−1} to x_k, where S_{k−1} satisfies (2.1). If differences in function values are used to estimate gradients, then the components of r_k can be estimated directly from function differences, rather than first estimating g_k and then using it to calculate r_k.
Step 4_k. Choose a vector v_k ∈ R^m with v_k^T v_k = 2ρ_k ≥ v_k^T r_k. Set

    u_k = (v_k − r_k)/((v_k − r_k)^T v_k),
    h_k = γ_k h_{k−1} + (1 − γ_k − γ_k h_{k−1}^T v_k) u_k,

and

    J_k = γ_k J_{k−1} + (s_k − γ_k J_{k−1} v_k) u_k^T − s_k h_k^T.

Comment. The choice of the vector v_k ∈ R^m determines what old information about the objective function is to be replaced by new information. If u_k^T v_j = 0 for some j < k, then all information from the jth step is kept. If s_k is nearly in the same direction as some s_j for which information is to be kept, then v_k should be nearly in the same direction as v_j.
THEOREM 3. If an algorithm of this type is used with a cupped conic objective function f: X → R, and if for each k ≤ m, the v_k chosen in Step 4_k makes u_k^T v_j = 0 for all j < k, then the m+1 points x_k for k ≤ m span the m dimensional affine subspace in X of those x for which x − x_0 is in the column space of J_0. The restriction of f to this affine subspace has a minimum iff 1 + h_m^T w_{m+1} > 0, and if this is the case, this restriction has its minimum value

    f* = f_m − (1/2)‖w_{m+1}‖^2

at

    x* = x_m + J_m w_{m+1}/(1 + h_m^T w_{m+1}).

Furthermore, L_m ∈ R^{(n+1)×m} defined by (4.3) satisfies

    L_m^T Â_m L_m = I_m  and  L_m^T c_m = 0,

where c_m ∈ R^{n+1} and Â_m ∈ R^{(n+1)∨(n+1)} are defined by replacing x_0 by x_m in (2.3).
thisargument,
Proof.To simplify
keepxoas a reference
point,as inAlgorithm
2,
insteadofshifting
thereference
pointfromXk1 to Xkin thekthiteration,
as in this
algorithm
andAlgorithm
3; i.e.,makethesechangesinthekthiteration:
Sk = Xk - XO,
rk =
Pk =((fo -fk)
-gOSkg
g~~~~T
Jk-1 (Ykgk - go) - hk-lYkg kSk,
Sk)Sk2
Yk =-g
sk/(fO
fk+Pk),
T
hk = hk-1 +(
hik-1V
k)Uk
and
Jk = Jk-1 +
Yk
Sk Jk1Vk)
Uk
Use Corollary 3 of Theorem 1 to show that these $\gamma_k$, together with $\gamma_0 = 1$, are the values at $x_k$ of a gauge for $f$. Define $y_k$ and $z_k$ as in Algorithm 2 and define $L_k$ by (4.3). Show that
$$r_k = L_{k-1}^T y_k, \quad L_k = L_{k-1} + (z_k - L_{k-1} v_k)\,u_k^T, \quad z_k = L_k v_k, \quad\text{and}\quad v_k = L_k^T y_k.$$
Define $c \in \mathbb{R}^{n+1}$ and $A \in \mathbb{R}^{(n+1)\times(n+1)}$ by (2.3) so that $y_k$ and $z_k$ satisfy (4.2). Show that if $y_j$ and $z_j$ also satisfy (4.2), as well as $z_j = L_{k-1} v_j$, $v_j = L_{k-1}^T y_j$, and $u_k^T v_j = 0$, then $L_k$ inherits from $L_{k-1}$ the property that $z_j = L_k v_j$ and $v_j = L_k^T y_j$. Then use induction as in the proof of Theorem 2. Lastly, verify that the update for $h_k$ and $J_k$ given in the algorithm differs from that given here by just the linear mapping $D_k$ defined in Algorithm 3. $\square$
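The closed forms in Theorem 3 can be spot-checked numerically. The sketch below assumes the conic parametrization $f(x) = f_m + g^T t + \frac12 t^T A t$ with $t = s/(1 + b^T s)$, $s = x - x_m$ (an illustrative form consistent with the collinear scalings of § 5, not a quotation of (2.2)); it builds $J$ with $J^T A J = I$, sets $h = J^T b$ and $w = J^T g$, and evaluates the predicted minimizer $x_* = x_m - J w/(1 + h^T w)$ and minimum value $f_* = f_m - \frac12\|w\|^2$. All numerical data are made up for the check.

```python
import numpy as np

# Made-up test data: any positive definite A, small gauge slope b, and g will do.
rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                 # positive definite curvature
J = np.linalg.cholesky(np.linalg.inv(A))    # J J^T = A^{-1}, so J^T A J = I
b = 0.1 * rng.standard_normal(n)            # gauge slope: gamma = 1 + b.s
g = rng.standard_normal(n)
x_m = rng.standard_normal(n)
f_m = 2.0

def f(x):
    """Cupped conic function in collinear-scaling form."""
    s = x - x_m
    t = s / (1.0 + b @ s)
    return f_m + g @ t + 0.5 * t @ A @ t

w = J.T @ g
h = J.T @ b
x_star = x_m - J @ w / (1.0 + h @ w)        # predicted minimizer
f_star = f_m - 0.5 * (w @ w)                # predicted minimum value
```

Evaluating $f$ at `x_star` reproduces `f_star`, and a finite-difference gradient there vanishes to within discretization error, in agreement with the theorem.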
6. Summary and conclusions. Three ways to specify conic functions have been introduced: one given by (2.2) and used in Theorem 1, one given by (2.3) using homogeneous coordinates and used in Theorem 2, and one given by (5.1) in terms of a collinear scaling and used in the algorithm schemata of § 5. Equations (2.2) and (2.3) specify the same conic function iff their parameters satisfy (2.5), and (2.2) and (5.1) specify the same conic function iff $h_0 = J_0^T a$ and $J_0^T A J_0 = I$.
Theorem 1 and its corollaries in § 3 summarize basic properties of conic interpolations with given values and gradients at the vertices of a simplex. Corollary 4 and Algorithm 1 specialize Theorem 1 to conic interpolations over line intervals, and suggest how these can be used for line searches, though safeguards must be added to make this a general purpose algorithm. Theorem 2 gives the properties of Algorithm 2, which uses $O(n^2)$ operations to update a conic interpolation after each evaluation of a
function and its gradient. Algorithm 3 differs from Algorithm 2 in using only the steps $x_k - x_{k-1}$ rather than $x_k - x_0$ for making these updates. Algorithm 4 updates collinear scalings for the objective function rather than conic approximations to it, though these are equivalent, in the absence of rounding, for approximations with unique minimizers. Theorem 3 states the basic properties of the algorithm schemata of § 5 for updating collinear scalings. Specific algorithms can be obtained from this by adding rules for choosing the factor $\lambda_k$ determining the magnitude of $x_k - x_{k-1}$ in Step 2, and for choosing the direction of the vector $v_k$ in Step 4, which determines what information from previous iterations is to be replaced in the current one.
P. Bjørstad and J. Nocedal [1] have developed an algorithm for one-dimensional minimization which improves upon Algorithm 1 of § 3. Theirs includes safeguards to insure stability. They prove it has an $R$-quadratic convergence rate when there is a neighborhood of the minimum in which $f''$ is positive and $f'''$ is Lipschitz continuous, and that it has a faster convergence rate than algorithms making cubic interpolations when $f''f'''' > 2(f''')^2$.
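For reference, the one-dimensional conic interpolation that [1] refines can be sketched as follows, using the quantities $\rho$ and $\gamma$ in the form reconstructed in the proof of Theorem 3 (the function and variable names are illustrative, not Davidon's): given values and slopes at two points, it fits the conic $f(x) = f_0 + g_0 t + \frac12 a t^2$ with $t = (x - x_0)/(1 + b(x - x_0))$ and returns the fitted conic's minimizer.

```python
import math

def conic_step(x0, f0, g0, x1, f1, g1):
    """Predict a minimizer by conic interpolation of (f0, g0) at x0 and
    (f1, g1) at x1.  Assumes the data admit a cupped conic fit."""
    s = x1 - x0
    rho = math.sqrt((f0 - f1) ** 2 - g0 * g1 * s * s)
    gamma = -g0 * s / (f0 - f1 + rho)       # gauge ratio at x1
    t1 = s / gamma                          # x1 in the scaled variable t
    a = (gamma ** 2 * g1 - g0) / t1         # curvature in t
    b = (gamma - 1.0) / s                   # gauge slope
    t_star = -g0 / a                        # minimizer in t
    return x0 + t_star / (1.0 - b * t_star) # map back to x
```

For a conic objective a single such step lands exactly on the minimizer; the safeguards mentioned above are what make the idea robust for general $f$.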
D. Sorensen [10] has shown that an algorithm updating collinear scalings for $n$-dimensional problems has a $Q$-superlinear convergence rate and that it compares favorably with a widely used BFGS variable metric algorithm on a variety of standard test problems. His algorithm updates a matrix $C_k$ corresponding to $J_k^{-1}$ for the $J$ of (2.1) and $A$ of (2.2).
Acknowledgments. It is a pleasure to thank P. Bjørstad, J. Moré, and D. Sorensen for discussions of these ideas and for constructive comments on an earlier draft of this paper. I am also indebted to two anonymous referees who corrected several errors and made valuable suggestions.
REFERENCES
[1] P. BJØRSTAD AND J. NOCEDAL, Analysis of a new algorithm for one-dimensional minimization, Computing, 22 (1979), pp. 93-100.
[2] C. CHARALAMBOUS, Unconstrained optimization based on homogeneous models, Math. Programming, 5 (1973), pp. 189-198.
[3] E. DAVISON AND P. WONG, A robust conjugate gradient algorithm which minimizes L-functions, Control Systems Rep. 7313, U. of Toronto, Toronto, Ontario, 1974.
[4] J. DENNIS AND J. MORÉ, Quasi-Newton methods, motivation and theory, SIAM Rev., 19 (1977), pp. 46-89.
[5] I. FRIED, N-step conjugate gradient minimization scheme for non-quadratic functions, A.I.A.A. J., 9 (1971), pp. 2286-2287.
[6] D. JACOBSON AND W. OXMAN, An algorithm that minimizes homogeneous functions of n variables in N + 2 iterations and rapidly minimizes general functions, J. Math. Anal. Appl., 38 (1972), pp. 533-552.
[7] J. KOWALIK AND K. RAMAKRISHNAN, A numerically stable optimization method based on a homogeneous function, Math. Programming, 11 (1976), pp. 50-66.
[8] M. MORSE, The Calculus of Variations in the Large, American Mathematical Society, Providence, RI, 1934.
[9] J. RAPHSON, Analysis Aequationum Universalis, London, 1690.
[10] D. SORENSEN, The Q-superlinear convergence of a collinear scaling algorithm for unconstrained optimization, this Journal, this volume, pp. 84-114.