Haverford College, Haverford Scholarship. Faculty Publications, Mathematics, 1980.

Conic Approximations and Collinear Scalings for Optimizers
William C. Davidon, Haverford College

Repository citation: Davidon, William C. "Conic approximations and collinear scalings for optimizers." SIAM Journal on Numerical Analysis 17.2 (1980): 268-281. Available at http://scholarship.haverford.edu/mathematics_facpubs. For more information, please contact [email protected].

Source: SIAM Journal on Numerical Analysis, Vol. 17, No. 2 (Apr., 1980), pp. 268-281. Published by: Society for Industrial and Applied Mathematics. Stable URL: http://www.jstor.org/stable/2156612. Accessed: 12/04/2013 13:02.

SIAM J. NUMER. ANAL. Vol. 17, No.
2, April 1980. © 1980 Society for Industrial and Applied Mathematics. 0036-1429/80/1702-0007$01.00/0

CONIC APPROXIMATIONS AND COLLINEAR SCALINGS FOR OPTIMIZERS*

WILLIAM C. DAVIDON†

Abstract. Many optimization algorithms update quadratic approximations to their objective functions. This paper suggests a generalization from quadratic to conic approximations, defined as ratios of quadratics whose denominators are squares of affine functions, (1 + aᵀx)². These can better match the values and gradients of typical objective functions, and hence give better estimates for their minimizers. Equivalently, affine scalings S(w) = x₀ + Jw of the domain of objective functions f are generalized to collinear scalings, S(w) = x₀ + Jw/(1 + hᵀw), to make the Hessian of the composition fS more nearly constant as well as better conditioned. Certain general features of optimization algorithms using conic approximations and collinear scalings are presented. These are not only invariant under affine scalings, along with Newton-Raphson and variable metric algorithms, but they are also invariant under the larger group of invertible collinear scalings.

1. Introduction. Many optimization algorithms update quadratic approximations to their objective function f, and use the minimizers of successive approximations to estimate minimizers for f. In steepest descent algorithms, each quadratic approximation has a unit Hessian and matches the gradient of f at one point. In Newton-Raphson algorithms [9], each approximation matches the Hessian as well as the gradient of f at one point. In variable metric algorithms [4], each approximation matches the gradient of f at two points. But nonquadratic approximations are needed to match function values f±, as well as gradients g±, at points x±, whenever f₊ − f₋ does not equal ½(g₊ + g₋)ᵀ(x₊ − x₋). This paper generalizes from quadratic to conic approximating functions, defined in §
2 as those ratios of quadratics whose denominators are squares. Some reasons for suggesting this particular generalization are:

1) Under appropriate conditions, a conic interpolation can be determined by successive function and gradient evaluations at the n + 1 vertices of an n dimensional simplex, using O(n²) numerical operations after each, much as a quadratic interpolation can be determined, to within an additive constant, by just its gradients at these vertices.

2) Optimization algorithms using conic approximations can be made invariant under the group of collinear transformations characteristic of projective geometry. Newton-Raphson and variable metric algorithms are invariant only under the proper subgroup of affine transformations, while steepest descent and conjugate gradient algorithms are invariant only under the still smaller subgroup of isometries of Euclidean space. While affine transformations can improve the conditioning of the Hessian of the objective function at any point, collinear transformations can also make the transformed Hessian more nearly constant, since the Jacobian of collinear transformations, unlike that of affine ones, need not be constant.

3) Conic functions, like most of the objective functions they are to approximate, need not be symmetric about their minimizers. They can also better fit exponential, penalty, or other functions which share with conics the property of increasing rapidly near some n − 1 dimensional hyperplane in R^n.

4) The minimizer of a conic function, like that of a typical objective function, need not be in the direction of a Newton step. In contrast, each minimizer of the nonquadratic approximations considered by Fried [5], Jacobson and Oxman [6], and others [2], [3], [7] is always in the direction of a Newton step, since their approximating functions satisfy

* Received by the editors April 29, 1979, and in revised form September 11, 1979.
† Haverford College, Haverford, Pennsylvania 19041.
f(x) = f* + φ(x − x*) for some homogeneous function φ of degree ν > 0; i.e., φ(λs) = λ^ν φ(s) for all λ > 0 and s ∈ R^n.

5) Theoretical concepts and computational methods of linear algebra are applicable to algorithms using conic functions when points in the n dimensional domain of these functions are specified by the n ratios among n + 1 homogeneous coordinates, essentially because the group of invertible collinear transformations is isomorphic to the group of invertible (n + 1) × (n + 1) matrices modulo multiples of the unit matrix.

Section 2 introduces some basic terms and concepts, and §3 applies these to the study of conic functions with given values and gradients at the vertices of a simplex. Section 4 shows how these conic interpolations can be obtained using O(n²) numerical operations after each function and gradient evaluation. Section 5 gives an algorithm schemata from which specific optimization algorithms can be derived; it can be read first by those primarily interested in computation, since it makes few references to the rest of this paper.

The abbreviation "iff" is used for "if and only if". Lowercase Greek letters denote real numbers, which need not be integers; lowercase Latin letters denote integers, functions, or column vectors; and uppercase letters denote more general maps, matrices, or spaces. The transpose of any matrix A is Aᵀ, and A⁻ᵀ = (Aᵀ)⁻¹ = (A⁻¹)ᵀ.

2. Definitions and explications. R^n is the n dimensional space of real n × 1 column vectors; R^(m×n) is the mn dimensional space of real m × n matrices; R^(n∨n) is the ½n(n + 1) dimensional space of real symmetric n × n matrices, ordered by A ≥ 0 iff vᵀAv ≥ 0 for all v ∈ R^n; Iₙ is the n × n unit matrix; and X is an open convex subset in R^n.
A smooth function f: X → R is:
affine iff its gradient is constant;
quadratic iff its Hessian is constant;
collinear iff it is a ratio of affine functions;
conic iff it is a ratio of a quadratic to the square of an affine function;
positive iff f(x) > 0 for all x ∈ X; and
cupped iff it has a minimizer, all its level sets are convex, and it has no smooth extension to a larger open convex domain.

A map S: W → X between convex sets W and X is affine, quadratic, collinear, or conic iff each affine f: X → R makes the composition fS: W → R affine, quadratic, collinear, or conic respectively.

A gauge for a function f: X → R is a smooth positive function ρ: X → R which makes the product ρ²f: X → R quadratic.

A scaling for a function f: X → R is a smooth map S: W → X from an open set W in a Euclidean space to X which makes the composition fS: W → R quadratic with unit Hessian.

Some basic consequences of these definitions are:

1) Hierarchy of conic functions. Each constant function is affine; a function is affine iff it is both quadratic and collinear; and each quadratic or collinear function is conic.

2) Restrictions to lines. A function is affine, quadratic, collinear, or conic iff its restriction to each line in its domain is affine, quadratic, collinear, or conic respectively.

3) Maximal extensions. A function has a (collinear) conic extension to all of R^n iff it is (affine) quadratic. The largest convex domain for any other conic function is an open halfspace in R^n.

4) Critical points. The critical points of each conic function f: X → R form an affine subspace in X, possibly null; i.e., if x ≠ y are critical points of f, then each point of X on the line through x and y is also a critical point of f.
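To illustrate the gauge definition above with a small sketch (not part of the paper): for a conic f = q/ρ², with q quadratic and ρ positive affine, the affine function ρ is itself a gauge, since ρ²f = q is quadratic, and on a uniform grid the third finite differences of a quadratic vanish. The particular q and ρ below are arbitrary illustrative choices.

```python
# Sketch (illustrative, not from the paper): check that an affine rho is a
# gauge for the conic f = q / rho**2, i.e. that rho**2 * f is quadratic.
# On a uniform grid, third finite differences of a quadratic are zero.

def q(x):      # an arbitrary quadratic
    return 2.0 * x * x - 3.0 * x + 4.0

def rho(x):    # an affine function, positive on the grid used below
    return 1.0 + 0.5 * x

def f(x):      # a conic function: quadratic over squared affine
    return q(x) / rho(x) ** 2

def third_differences(values):
    d = list(values)
    for _ in range(3):
        d = [b - a for a, b in zip(d, d[1:])]
    return d

xs = [0.1 * i for i in range(20)]          # grid where rho > 0
gauged = [rho(x) ** 2 * f(x) for x in xs]  # should be quadratic in x

assert all(abs(d) < 1e-9 for d in third_differences(gauged))
# f itself is not quadratic: its third differences do not vanish.
assert max(abs(d) for d in third_differences([f(x) for x in xs])) > 1e-3
```

The same vanishing-third-difference test fails for f itself, which separates conic from quadratic behavior numerically.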
While Hessians at different critical points need not be the same, they share a common null space, consisting of multiples of the displacements x − y between critical points. A collinear function has a critical point iff it is constant. If a conic function has a local minimizer x*, then x* is a global minimizer, each level set of f is convex, and f has just one cupped conic extension.

5) Level sets. The level sets of each conic function are conic sections, but these are similar and concentric iff the function is quadratic. To within an affine transformation of its domain, each conic function with a unique minimizer is equivalent to one with unit Hessian at its minimizer 0 ∈ R^n. For each such normalized conic f: X → R, there is just one a ∈ R^n with aᵀx < 1 and

f(x) = f* + ½ xᵀx/(1 − aᵀx)²

for all x ∈ X. The level sets of f are convex conic sections which are invariant under rotations about a and have one focus at 0 ∈ R^n. This function is quadratic and its level sets are concentric spheres iff a = 0. For a ≠ 0, there is just one level set for each eccentricity e > 0. It is {x ∈ X: aᵀa xᵀx = e²(1 − aᵀx)²}, within which f(x) ≤ f* + ½ e²/aᵀa. This is an ellipsoid for e < 1, a paraboloid for e = 1, and a lobe of a hyperboloid for e > 1. The function f: X → R is convex only in the segment of a paraboloid {x ∈ X: aᵀa xᵀx ≤ (1 + aᵀx)²}.

6) A representation of collinear maps. A map S: W → X is collinear and has the value x₀ ∈ X and Jacobian J ∈ R^(n×m) at 0 ∈ W ⊂ R^m iff there is an h ∈ R^m with

(2.1) 1 + hᵀw > 0 and S(w) = x₀ + Jw/(1 + hᵀw)

for all w ∈ W. The Jacobian of this map at any w ∈ W is

J(Iₘ + whᵀ)⁻¹/(1 + hᵀw) = J(Iₘ − whᵀ/(1 + hᵀw))/(1 + hᵀw).

This map is invertible iff J is invertible and SW = X, in which case

S⁻¹(x) = J⁻¹(x − x₀)/(1 − hᵀJ⁻¹(x − x₀)).

7) A representation of conic functions. A function f: X → R is conic and has the value f₀ ∈ R and gradient g₀ ∈ R^n at x₀ ∈ X iff there is an a ∈ R^n and A ∈ R^(n∨n) with

(2.2) aᵀs < 1 and f(x₀ + s) = f₀ + g₀ᵀs/(1 − aᵀs) + ½ sᵀAs/(1 − aᵀs)²

for all s ∈ R^n with x₀ + s ∈ X. This function is collinear iff A = 0. Its gradient at any point
x = x₀ + s of X is

g = (1/γ²)(Iₙ − asᵀ)⁻¹(γg₀ + As) = (1/γ³)(γIₙ + asᵀ)(γg₀ + As),

where γ = 1 − aᵀs is the value at x = x₀ + s of a gauge for f. This gradient vanishes at x* = x₀ + s* ∈ X iff

(A − g₀aᵀ)s* = −g₀.

The value of f is the same at all critical points x* and equals

f* = f₀ + ½ g₀ᵀs*/γ* = f₀ − ½ s*ᵀAs*/γ*²,

where γ* = 1 − aᵀs*. The Hessian of f at x* is

(1/γ*²)(A − ag₀ᵀ − g₀aᵀ + 2(f₀ − f*)aaᵀ) = (1/γ*²)(Iₙ − as*ᵀ)⁻¹A(Iₙ − s*aᵀ)⁻¹ = (1/γ*⁴)(γ*Iₙ + as*ᵀ)A(γ*Iₙ + s*aᵀ).

A critical point is a minimizer iff A ≥ 0; it is unique iff A is invertible, in which case

γ* = 1/(1 − aᵀA⁻¹g₀), s* = −γ*A⁻¹g₀, f* = f₀ − ½ g₀ᵀA⁻¹g₀,

and the Hessian of f at x* is

(1/γ*²)(A − ag₀ᵀ)A⁻¹(A − g₀aᵀ).

8) Scalings. The collinear map of (2.1) scales the conic function of (2.2) iff h = Jᵀa and JᵀAJ = Iₘ. A scaling S: W → X of any function f: X → R pairs the level sets of f with concentric spheres in W. If a function f: X → R with a critical point x* has an invertible scaling S: W → X, then x* is a unique minimizer; the Hessian of f at x* is positive definite, and this Hessian equals J⁻ᵀJ⁻¹, where J⁻¹ is the Jacobian of S⁻¹: X → W at x*. Morse's lemma [8] implies the converse: if a smooth function f: R^n → R has a positive definite Hessian at a critical point x* ∈ R^n, then there is an invertible scaling of the restriction of f to some neighborhood X of x*.

9) Homogeneous coordinates. A function f: X → R is conic iff for each x₀ ∈ X there is a c ∈ R^(n+1) and A ∈ R^((n+1)∨(n+1)) with

(2.3) cᵀw > 0 and f(x₀ + s) = ½ wᵀAw/(cᵀw)²

for all s ∈ R^n with x₀ + s ∈ X and all positive multiples w ∈ R^(n+1) of (1; s), where (1; s) denotes the column vector in R^(n+1) with first entry 1 stacked above s. The gradient g ∈ R^n of f at x = x₀ + s is uniquely determined by

(2.4) A(1; s) = γ²(−sᵀg; g) + 2f(x)γc,

where γ = cᵀ(1; s) is the value at x = x₀ + s of a gauge for f. Equations (2.2) and (2.3) specify the same conic function iff

(2.5) c = (1; −a) and A = [2f₀, (g₀ − 2f₀a)ᵀ; g₀ − 2f₀a, A − g₀aᵀ − ag₀ᵀ + 2f₀aaᵀ],

where rows of the block matrix are separated by semicolons, the A on the left is the matrix of (2.3), and the A inside the brackets is that of (2.2).
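The closed forms in 7) and 9) can be checked numerically. The sketch below is an illustration, not part of the paper; the particular a, A, f₀, g₀, and s are arbitrary choices with aᵀs < 1. It evaluates the conic of (2.2), compares the gradient formula g = γ⁻³(γIₙ + asᵀ)(γg₀ + As) against central differences, and verifies that the homogeneous parameters of (2.5) reproduce the same function values through (2.3).

```python
import numpy as np

# Illustrative parameters for (2.2); any a, symmetric A, f0, g0 will do
# provided a^T s < 1 at the points tested.
a = np.array([0.3, -0.2])
A = np.array([[2.0, 0.5], [0.5, 1.5]])   # symmetric
f0, g0 = 1.0, np.array([0.4, -1.0])

def f(s):                                 # conic form (2.2)
    gamma = 1.0 - a @ s
    return f0 + (g0 @ s) / gamma + 0.5 * (s @ A @ s) / gamma**2

def grad(s):                              # gradient formula of consequence 7)
    n = len(s)
    gamma = 1.0 - a @ s
    return (gamma * np.eye(n) + np.outer(a, s)) @ (gamma * g0 + A @ s) / gamma**3

# Homogeneous parameters of (2.5) for the representation (2.3).
c = np.concatenate(([1.0], -a))
Ah = np.block([[np.array([[2 * f0]]), (g0 - 2 * f0 * a)[None, :]],
               [(g0 - 2 * f0 * a)[:, None],
                A - np.outer(g0, a) - np.outer(a, g0) + 2 * f0 * np.outer(a, a)]])

s = np.array([0.25, 0.4])
w = np.concatenate(([1.0], s))            # homogeneous coordinates of x0 + s

# (2.2) and (2.3) give the same value:
assert abs(f(s) - 0.5 * (w @ Ah @ w) / (c @ w) ** 2) < 1e-9

# The gradient formula agrees with central differences:
eps = 1e-6
fd = np.array([(f(s + eps * e) - f(s - eps * e)) / (2 * eps) for e in np.eye(2)])
assert np.allclose(grad(s), fd, atol=1e-6)
```

At s = 0 the formulas reduce to f = f₀ and g = g₀, which gives a second quick consistency check.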
10) Miscellaneous. An affine function over all R^n is positive iff it is a positive constant, and these are the only gauges for nonconstant quadratic functions. There are nonconstant positive affine functions over any proper open convex subset X in R^n, and while all these gauge each constant function over X, the gauges for each nonconstant conic function are positive multiples of each other.

The sum of two (collinear) conic maps is (collinear) conic iff they share a gauge. The set of those collinear maps S: W → X, from an m dimensional W to an n dimensional X, which share a gauge, is an (m + 1)n dimensional vector subspace in the ½(m + 2)(m + 1)n dimensional vector space of all conic maps S: W → X which share this same gauge.

Collinear maps preserve collinearity, convexity, and cross ratios; i.e., if S: W → X is collinear, then (i) S pairs collinear points u, v, and w of W with collinear points S(u), S(v), and S(w) of X; (ii) S pairs each convex subset U in W with a convex subset SU in X; and (iii) if t, u, v, and w are collinear points of W, then for any norms on the spaces W and X,

(‖t − u‖/‖t − v‖)(‖v − w‖/‖u − w‖) = (‖St − Su‖/‖St − Sv‖)(‖Sv − Sw‖/‖Su − Sw‖)

whenever the denominators are positive. Note that the choice of norms does not affect these ratios since ‖λs‖ = |λ| ‖s‖ for any norm.

3. Conic interpolations. While each iteration of the algorithms to be considered uses the values and gradients of an objective function at just two points to update a conic interpolation, it is instructive first to derive a necessary and sufficient condition for there to be a conic interpolation to given function and gradient values at the vertices of any simplex.

THEOREM 1. There is a conic function f: X → R with values fₖ ∈ R and gradients gₖ ∈ R^n at the m + 1 vertices xₖ of an m dimensional simplex in X iff there are positive numbers γₖ with

(3.1) fⱼ − fᵢ = ½((γⱼ/γᵢ)gⱼ + (γᵢ/γⱼ)gᵢ)ᵀ(xⱼ − xᵢ)

for all i and j.

Proof. First assume that f is a conic function with values fₖ and gradients gₖ at xₖ, and let γₖ be the value at xₖ of a gauge for f.
Since a gauge is positive, γₖ > 0, and since it is affine, its value at any point x(τ) = xᵢ + (xⱼ − xᵢ)τ on the line through xᵢ and xⱼ is γᵢ + (γⱼ − γᵢ)τ. Use the definition of a gauge to show that the function q: R → R defined by

q(τ) = (γᵢ + (γⱼ − γᵢ)τ)² f(x(τ))

is quadratic. Evaluate q and its derivative q′ at 0 and 1, and then use q(1) − q(0) = ½(q′(0) + q′(1)) to get (3.1).

Now assume that some fₖ ∈ R, gₖ ∈ R^n, and positive numbers γₖ satisfy (3.1) for the m + 1 vertices xₖ of a simplex. Since only the ratios among the γₖ enter in (3.1), choose γ₀ = 1. Define sₖ = xₖ − x₀ and use the linear independence of the m vectors sₖ for k ≠ 0 to show there is an a ∈ R^n with

(3.2) aᵀsₖ = 1 − γₖ

for all k. For each k, define the vector rₖ = γₖ²(gₖ − a sₖᵀgₖ) − γₖg₀, and for each i and j, define the number

(3.3) ρᵢⱼ = ½((γⱼ/γᵢ)gⱼ − (γᵢ/γⱼ)gᵢ)ᵀ(xⱼ − xᵢ).
conicinterpolation iscollineariffthePijdefined by(3.3) areallzero,thatitisquadratic iffithasa constant gauge,withyi= yjforall i andj, and thatitis affine iffitis both andcollinear. 0 quadratic 2. If f: X -* R is a conic functionwithvalues fk E R and gradients X, thenforeach i and j thereis just one Pij = Pji E R forwhichthe each of gauge forf satisfy(3.3). ThisPijalso satisfies COROLLARY gk E Rn at pointsXk E values ykat Xk (3.6) (3.7) = (f-fi)2-i gT(x -Xi)g T(Xj xi), Pi; =fu fX+ig)(xxi)=fi-fi-gT(x1-xi) V~~~~~~i Vi and (3.8) 1 2(Sj_ i 2 yi RYi A (S__ Yi Yi fortheA E Rnvn of (2.2), whereSk= Xk - XO. Iff is cupped,thenpij_ O0.For each pointx of the affinesubspace in X spanned by the Xk, thereare 0k e R with Z Ck > 0 and X= E wkXk/Z, Ok.Each gaugeforfwithvalues Ykatxk has thevalue Z ckYk/Z, Ok > 0 atx, and thevalue off at x is (3.9) f f(x)= Z= E iJj7Yi7YjPq kYkfk 1 Z Y.WkYk 2 ( ZMkYk)2 Definepijby(3.3),use (3.1) toget(3.6) and(3.7),anduse (3.4) and(3.5) to Proof. its get(3.8).Iff is cupped,getp.j>0 fromA _0 and(3.8). Sinceeachgaugeis affine, This content downloaded from 165.82.168.47 on Fri, 12 Apr 2013 13:02:37 PM All use subject to JSTOR Terms and Conditions 274 WILLIAM C. DAVIDON value at Z cokXk/ZYWk is Z &0kYk/ZCOk, and sinceeach gaugeis positive,ZCkYk>0. in (3.9) bythesquareofthisgaugeto showthatf is thefunction Multiply f specified at Xk toshowthatitis therequired derivatives conic,andevaluatef anditsdirectional interpolation.O withvaluesfkE R and gradients COROLLARY 3. Thereare at most2nconicfunctions in D8n.Ofthese, at mostoneis gkE Dn at then + 1 vertices Xk ofan n dimensional simplex f and iffo> fkforeach k > 0, thenthereis cupped.If thereis a cuppedconicinterpolation just one gaugeforf whosevalue at xo is 1; itsvalue at each Xkis T -go5/ fk to-fk + POk' whereSk= Xk- xo and Pok = ((fo-fk)k -gOSgkSk) conicforeach 2 toshowthatthereisatmostoneinterpolating Proof. 
Use Corollary 2 to show that there is at most one interpolating conic for each choice of signs for the n numbers ρ₀ₖ, and that these signs are all positive with ρ₀ₖ > 0 for a cupped conic. ∎

COROLLARY 4. There is a conic function over the interval [x₋, x₊] ⊂ R with values f± ∈ R and slopes g± ∈ R at points x₊ > x₋ of R iff there are roots γ > 0 to the quadratic equation

(3.10) g₊γ² − 2((f₊ − f₋)/(x₊ − x₋))γ + g₋ = 0.

For each root γ, there is just one conic interpolation gauged by an affine function whose values γ± at x± have the ratio γ₊/γ₋ = γ. The value and slope of this interpolation at x(τ) = x₋ + (x₊ − x₋)τ ∈ X are

f(x(τ)) = ((1 − τ)f₋ + γτf₊)/(1 − τ + γτ) − γτ(1 − τ)ρ/(1 − τ + γτ)²

and

g(x(τ)) = ((1 − τ)g₋ + γ³τg₊)/(1 − τ + γτ)³,

where

ρ = ½(γg₊ − γ⁻¹g₋)(x₊ − x₋) = f₋ − f₊ + γg₊(x₊ − x₋) = f₊ − f₋ − γ⁻¹g₋(x₊ − x₋)

and

ρ² = (f₊ − f₋)² − g₊g₋(x₊ − x₋)².

There is a cupped conic extension of f iff either f is constant, or else γ²g₊ > g₋ and γ³g₊ > g₋, in which case the minimizer x* and minimum f* of f are

x* = (γ³g₊x₋ − g₋x₊)/(γ³g₊ − g₋) and f* = f₋ − (f₊ − f₋ − ρ)²/(4ρ).

Proof. Use (3.1) with γ = γ₊/γ₋ to get (3.10). Use Theorem 1, Corollary 2, and algebra to verify the other conclusions. ∎

This corollary provides a basis for one dimensional optimization algorithms such as those developed by Bjørstad and Nocedal [1]. The following simple example assumes that for each k > 0, there is a cupped conic interpolation to the values and slopes of the objective function at xₖ₋₁ and xₖ with a unique minimizer at xₖ₊₁, and that fₖ₋₁ > fₖ.

ALGORITHM 1.
Input: A point x₀ in the domain X ⊂ R of an objective function f: X → R, an initial nonzero step s₁ ∈ R to be taken from x₀, and a subalgorithm for calculating the value fₖ ∈ R and slope gₖ ∈ R of f at any xₖ ∈ X.
Calculate f₀ and g₀ at x₀. For each integer k > 0 until some convergence criterion is met, calculate fₖ and gₖ at xₖ = xₖ₋₁ + sₖ, and set

ρₖ = ((fₖ₋₁ − fₖ)² − gₖ₋₁gₖsₖ²)^½,

γₖ = −gₖ₋₁sₖ/(fₖ₋₁ − fₖ + ρₖ),

and

sₖ₊₁ = −γₖ³gₖsₖ/(γₖ³gₖ − gₖ₋₁).
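As a sanity check on these formulas (an illustration, not from the paper), Algorithm 1 can be run on a one dimensional cupped conic, for which the interpolation is exact, so a single iteration from any two admissible points lands on the minimizer. The test function below, with minimizer x* = 2 and minimum f* = 1, is an arbitrary illustrative choice.

```python
from math import sqrt

def conic_step(f_prev, g_prev, f_cur, g_cur, s):
    """One iteration of Algorithm 1: given values and slopes at x_prev and
    x_cur = x_prev + s, return the step from x_cur to the minimizer of the
    cupped conic interpolation."""
    rho = sqrt((f_prev - f_cur) ** 2 - g_prev * g_cur * s * s)
    gamma = -g_prev * s / (f_prev - f_cur + rho)
    return -gamma ** 3 * g_cur * s / (gamma ** 3 * g_cur - g_prev)

# An illustrative cupped conic with minimizer x* = 2 and minimum f* = 1:
# f(x) = 1 + (x - 2)^2 / (2 (1 - 0.1 (x - 2))^2), with slope
# f'(x) = (x - 2) / (1 - 0.1 (x - 2))^3.
def f(x):
    t = x - 2.0
    return 1.0 + 0.5 * t * t / (1.0 - 0.1 * t) ** 2

def fprime(x):
    t = x - 2.0
    return t / (1.0 - 0.1 * t) ** 3

x0, x1 = 0.0, 1.0                    # f(x0) > f(x1), as Algorithm 1 assumes
s2 = conic_step(f(x0), fprime(x0), f(x1), fprime(x1), x1 - x0)
x2 = x1 + s2

# Because f itself is conic, one step recovers the minimizer exactly.
assert abs(x2 - 2.0) < 1e-9
```

A cubic interpolation from the same two points would not land exactly on x*, which is the point of the conic model.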
4. Recursive interpolations and scalings. A minimizer of a cupped conic function f can be calculated from the values and gradients of f at n + 1 points, using O(n³) operations. This section shows how O(n²) operations can be used after each function and gradient evaluation to update a conic interpolation and locate its minimizer, or equivalently, to update a collinear scaling of its domain.

The main idea is to use (2.3) to replace the conic function f: X → R by the quadratic function w → ½wᵀAw over the n dimensional hyperplane in R^(n+1) with cᵀw = 1, and to replace each point xₖ = x₀ + sₖ ∈ X by a corresponding point

wₖ = (1/γₖ)(1; sₖ)

on this hyperplane. The next algorithm calculates a V ∈ R^((n+1)∨(n+1)) for a cupped conic function f: X → R and n dimensional simplex in X, which is then used by the following theorem to specify a minimizer and minimum of f.

ALGORITHM 2.
Input: For each integer k from 0 through n, the value fₖ ∈ R and gradient gₖ ∈ R^n of a cupped conic function f: X → R at the kth vertex xₖ of an n dimensional simplex in X, with f₀ > fₖ for k > 0.
Set V₀ = 0 ∈ R^((n+1)∨(n+1)). For each k from 1 through n, set

sₖ = xₖ − x₀,
ρₖ = ((f₀ − fₖ)² − g₀ᵀsₖ gₖᵀsₖ)^½,
γₖ = −g₀ᵀsₖ/(f₀ − fₖ + ρₖ),
zₖ = (1/γₖ)(1; sₖ) − (1; 0),
yₖ = (−γₖgₖᵀsₖ; γₖgₖ − g₀),
vₖ = zₖ − Vₖ₋₁yₖ.

If yₖᵀvₖ > 0, set Vₖ = Vₖ₋₁ + vₖvₖᵀ/(yₖᵀvₖ); else, set Vₖ = Vₖ₋₁. Set V = Vₙ.

THEOREM 2. For each cupped conic function f: X → R whose values fₖ ∈ R at the n + 1 vertices xₖ of an n dimensional simplex in X satisfy f₀ > fₖ for all k > 0, define V ∈ R^((n+1)∨(n+1)) by Algorithm 2. If γ* > 0 and s* ∈ R^n are defined by

(1/γ*)(1; s*) = (1; 0) − V(0; g₀),

then f has its minimum value

f* = f₀ + ½ g₀ᵀs*/γ* = f₀ − ½ (0; g₀)ᵀV(0; g₀)

at x₀ + s*, where g₀ ∈ R^n is the gradient of f at x₀. Furthermore,

(4.1) V ≥ 0, Vc = 0, VAV = V
and AVA = A − 2f*ccᵀ, where c ∈ R^(n+1) and A ∈ R^((n+1)∨(n+1)) are defined by (2.3).

Proof. Use Corollary 3 of Theorem 1 to show that the γₖ defined in Algorithm 2, together with γ₀ = 1, are the values at xₖ of a gauge for f. Use (2.4) to show that for each k from 1 through n, the yₖ and zₖ defined in Algorithm 2 satisfy

(4.2) cᵀzₖ = 0 and yₖ = Azₖ + 2(f₀ − fₖ)c.

Since f is cupped, there is a minimum value f* ∈ R of f and a w* ∈ R^(n+1) with cᵀw* = 1 and Aw* = 2f*c. For each k from 0 through n, define Zₖ ∈ R^((n+1)×(k+1)) and Uₖ ∈ R^((n+1)∨(n+1)) by

Zₖ = [w*, z₁, z₂, · · · , zₖ] and Uₖ = A − AVₖA − 2f*ccᵀ,

and define the induction hypothesis

(Pₖ) Vₖ ≥ 0, Vₖc = 0, VₖAVₖ = Vₖ, Uₖ ≥ 0 and UₖZₖ = 0.

To prove P₀, note that V₀ = 0, that f ≥ f* in (2.3) implies U₀ ≥ 0, and that U₀Z₀ = U₀w* = Aw* − 2f*c = 0. The following proof of Pₖ from Pₖ₋₁ is similar to one for variable metric algorithms. Assume Pₖ₋₁ for some k > 0 and show that Avₖ = Uₖ₋₁zₖ, and hence yₖᵀvₖ = zₖᵀUₖ₋₁zₖ ≥ 0, with equality iff Uₖ₋₁zₖ = 0. If yₖᵀvₖ = 0, the algorithm sets Vₖ = Vₖ₋₁, so that Uₖ = Uₖ₋₁ and UₖZₖ = Uₖ₋₁Zₖ = [Uₖ₋₁Zₖ₋₁, Uₖ₋₁zₖ] = 0. Now assume yₖᵀvₖ > 0 and get Vₖ ≥ Vₖ₋₁ ≥ 0. Use Vₖ₋₁c = 0 and cᵀzₖ = 0 to get Vₖc = 0. Use VₖAvₖ = vₖ and Vₖ₋₁AVₖ₋₁ = Vₖ₋₁ to get VₖAVₖ = Vₖ. Use the definitions of Uₖ and Vₖ to get

Uₖ = Uₖ₋₁ − Uₖ₋₁zₖzₖᵀUₖ₋₁/(zₖᵀUₖ₋₁zₖ).

Then use Uₖ₋₁ ≥ 0 to get Uₖ ≥ 0, and use Uₖ₋₁Zₖ₋₁ = 0 and Uₖzₖ = 0 to get UₖZₖ = 0. This completes the proof of Pₖ from Pₖ₋₁, and hence of Pₙ. Use the linear independence of the vectors sₖ for k ≠ 0, together with cᵀzₖ = 0 and cᵀw* = 1, to show that rank(Zₖ) = k + 1, which with UₖZₖ = 0 gives rank(Uₖ) ≤ n − k; in particular, Uₙ = 0. Use this, Pₙ, and (2.3) and (2.4) to obtain the conclusions of the theorem. ∎

While each iteration in Algorithm 2 refers back to the starting point x₀, a linear mapping in each iteration, using O(n²) operations, can update this reference point so that the kth iteration uses only the step xₖ − xₖ₋₁ instead of xₖ − x₀. The next algorithm differs from Algorithm 2 just by this mapping.

ALGORITHM 3.
Input: For each integer k from 0 through n, the value fₖ ∈ R and gradient gₖ ∈ R^n of a cupped conic function f: X → R at the kth vertex xₖ of an n dimensional simplex in X, with fₖ₋₁ > fₖ for each k > 0.
Set V₀ = 0 ∈ R^((n+1)∨(n+1)). For each k from 1 through n, set

sₖ = xₖ − xₖ₋₁,
ρₖ = ((fₖ₋₁ − fₖ)² − gₖ₋₁ᵀsₖ gₖᵀsₖ)^½,
γₖ = −gₖ₋₁ᵀsₖ/(fₖ₋₁ − fₖ + ρₖ),
zₖ = (1/γₖ)(1; sₖ) − (1; 0),
yₖ = (−γₖgₖᵀsₖ; γₖgₖ − gₖ₋₁),
vₖ = zₖ − Vₖ₋₁yₖ

and

Dₖ = γₖ [1, 0ᵀ; −sₖ, Iₙ].

If yₖᵀvₖ > 0, set Vₖ = Dₖ(Vₖ₋₁ + vₖvₖᵀ/(yₖᵀvₖ))Dₖᵀ; else, set Vₖ = DₖVₖ₋₁Dₖᵀ. Set V = Vₙ.

COROLLARY 1. For each cupped conic function f: X → R whose values fₖ ∈ R at the n + 1 vertices xₖ of an n dimensional simplex in X satisfy fₖ₋₁ > fₖ for all k > 0, define V ∈ R^((n+1)∨(n+1)) by Algorithm 3. If γ* > 0 and s* ∈ R^n are defined by

(1/γ*)(1; s*) = (1; 0) − V(0; gₙ),

then f has its minimum value

f* = fₙ + ½ gₙᵀs*/γ* = fₙ − ½ (0; gₙ)ᵀV(0; gₙ)

at xₙ + s*, where gₙ ∈ R^n is the gradient of f at xₙ. Furthermore,

V ≥ 0, Vcₙ = 0, VAₙV = V and AₙVAₙ = Aₙ − 2f*cₙcₙᵀ,

where cₙ ∈ R^(n+1) and Aₙ ∈ R^((n+1)∨(n+1)) are defined by replacing x₀ by xₙ in (2.3).

Proof. For each k from 0 through n, define cₖ ∈ R^(n+1) and Aₖ ∈ R^((n+1)∨(n+1)) by replacing x₀ by xₖ in (2.3). Show that for each k from 1 through n, the yₖ and zₖ defined in Algorithm 3 satisfy

cₖ₋₁ᵀzₖ = 0 and yₖ = Aₖ₋₁zₖ + 2(fₖ₋₁ − fₖ)cₖ₋₁

instead of (4.2). As in the proof of Theorem 2, define f* ∈ R and w* ∈ R^(n+1) by c₀ᵀw* = 1 and A₀w* = 2f*c₀. Define Z₀ = w* and for each k from 1 through n, define Zₖ ∈ R^((n+1)×(k+1)) by Zₖ = Dₖ[Zₖ₋₁, zₖ], where Dₖ ∈ R^((n+1)×(n+1)) is defined in Algorithm 3. Replace c and A by cₖ and Aₖ in the definition of Uₖ and the induction hypothesis Pₖ of the proof of Theorem 2, and replace c and A by cₖ₋₁ and Aₖ₋₁ in the proof of Pₖ from Pₖ₋₁. Use cₖ = Dₖ⁻ᵀcₖ₋₁ and Aₖ = Dₖ⁻ᵀAₖ₋₁Dₖ⁻¹ to complete the proof of this corollary. ∎

The symmetric matrices Vₖ ≥ 0 can be factored as Vₖ = LₖLₖᵀ for an Lₖ ∈ R^((n+1)×mₖ) of rank mₖ = rank(Vₖ) ≤ n.
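The update Vₖ = Vₖ₋₁ + vₖvₖᵀ/(yₖᵀvₖ) with vₖ = zₖ − Vₖ₋₁yₖ is a symmetric rank-one correction. Stripped of the homogeneous-coordinate and gauge bookkeeping of Algorithms 2 and 3, its mechanism can be seen in the plain quadratic setting yₖ = Bzₖ for a fixed symmetric invertible B (an illustration under simplified assumptions, not the paper's full setting): starting from V₀ = 0, each update enforces Vₖyₖ = zₖ, earlier pairs are inherited, and after n independent steps Vₙ inverts B.

```python
import numpy as np

def sr1_update(V, z, y, tol=1e-12):
    """Symmetric rank-one update of the kind used in Algorithm 2:
    return V + v v^T / (y^T v) with v = z - V y, skipping the update
    when the denominator is not safely positive."""
    v = z - V @ y
    denom = y @ v
    if denom > tol:
        V = V + np.outer(v, v) / denom
    return V

# Plain quadratic setting (illustrative): y_k = B z_k for a symmetric B.
B = np.array([[2.0, 1.0], [1.0, 2.0]])
V = np.zeros((2, 2))
for z in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    V = sr1_update(V, z, B @ z)

# After n independent steps, V inverts B on all of R^n.
assert np.allclose(V @ B, np.eye(2))
```

In the paper's setting the pairs (zₖ, yₖ) satisfy yₖ = Azₖ + 2(f₀ − fₖ)c rather than yₖ = Bzₖ, and the conditions cᵀzₖ = 0 and Vc = 0 of (4.1) play the role that exact linearity plays here.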
These Lₖ are determined only to within a right multiplication by an orthogonal mₖ × mₖ matrix. Each Lₖ determines an hₖ ∈ R^(mₖ) and Jₖ ∈ R^(n×mₖ) by

(4.3) Lₖ = (hₖᵀ; Jₖ),

i.e., the first row of Lₖ is hₖᵀ and its remaining n rows form Jₖ, and these, with xₖ ∈ X, determine a collinear scaling Sₖ: Wₖ → X of f: X → R by (2.1). The composite function fSₖ: Wₖ → R is quadratic with unit Hessian since Pₖ implies LₖᵀAₖLₖ = I(mₖ) and Lₖᵀcₖ = 0, or equivalently, that hₖ and Jₖ satisfy hₖ = Jₖᵀa and JₖᵀAJₖ = I(mₖ).

The only changes needed in Algorithm 3 to update Lₖ, and hence the collinear map Sₖ: Wₖ → X, instead of Vₖ = LₖLₖᵀ, are to define an integer m₀ = 0, and then for each k > 0, to replace the update for Vₖ by the following update for mₖ and Lₖ ∈ R^((n+1)×mₖ):

If yₖᵀvₖ > 0, set mₖ = mₖ₋₁ + 1 and Lₖ = Dₖ[Lₖ₋₁, vₖ(yₖᵀvₖ)^(−½)];
else, set mₖ = mₖ₋₁ and Lₖ = DₖLₖ₋₁.

5. An algorithm schemata. Instead of discarding all previous interpolations in each iteration, as in Algorithm 1 for one dimensional problems, or saving all previous interpolations, as in Algorithms 2 and 3 for cupped conic objective functions, more general algorithms need to selectively replace some old information about the objective function with new. The choice of what is to be replaced needs to take into account at least two factors. For one, information gathered in regions far from a minimizer is usually better discarded than that from nearby regions. For another, information from steps which are nearly in the same direction as the current step is usually better discarded than that from steps in quite different directions. Conflict between these desiderata arises when steps near the minimizer lie in a subspace of few dimensions, and various strategies for balancing them need to be considered. However, since these considerations are not unique to algorithms using conic approximations or collinear scalings, the following schemata leaves open this choice of strategy, and only suggests how any one could be implemented.
Input:
m and n, positive integers specifying the dimension m of the region in R^n consistent with any linear equality constraints; m = n if there are no constraints;
x₀ ∈ R^n, a point consistent with any constraints where the first value and gradient of the objective function will be computed;
J₀ ∈ R^(n×m), a matrix whose m columns span all steps consistent with any constraints. If the initial conic approximation is quadratic, then the columns of J₀ are conjugate steps which, if taken from the minimizer, would each increase the quadratic approximation by ½;
h₀ ∈ R^m, a column vector equal to zero if the initial conic approximation is quadratic;
ε ∈ R, a positive number, used in a test which stops the calculation when the minimum of the current conic approximation is within ε of the current function value; and
a subalgorithm for calculating the value fₖ ∈ R and gradient gₖ ∈ R^n of the objective function at xₖ.

Step 0. Call f₀ and g₀ at x₀.
Comment. The initial conic approximation to the objective function satisfies

(5.1) f(x₀ + J₀w/(1 + h₀ᵀw)) = f₀ + g₀ᵀJ₀w + ½wᵀw

for all w ∈ R^m with 1 + h₀ᵀw > 0. This approximation is singular at those x ∈ R^n equal to x₀ + J₀w/(h₀ᵀw) for some w ∈ R^m with h₀ᵀw > 0. If g₀ᵀJ₀h₀ < 1, it has the minimum value

f₀ − ½ ‖J₀ᵀg₀‖².

For each k > 0:
Step 1ₖ. Set wₖ = −Jₖ₋₁ᵀgₖ₋₁. If ½wₖᵀwₖ < ε, then stop. Else, go to Step 2ₖ.
Comment. This simple convergence test is invariant under collinear mappings; it may be supplemented with others.

Step 2ₖ. Find a step sₖ = λₖJₖ₋₁wₖ, typically with λₖ = 1/(1 + hₖ₋₁ᵀwₖ), for which the function value fₖ and gradient gₖ at xₖ = xₖ₋₁ + sₖ satisfy

fₖ₋₁ > fₖ, 0 > gₖ₋₁ᵀsₖ and (fₖ₋₁ − fₖ)² > gₖ₋₁ᵀsₖ gₖᵀsₖ.

Comment. Some step sₖ will satisfy the stated conditions provided the objective function has a lower bound.

Step 3ₖ. Set

ρₖ = ((fₖ₋₁ − fₖ)² − gₖ₋₁ᵀsₖ gₖᵀsₖ)^½,
γₖ = −gₖ₋₁ᵀsₖ/(fₖ₋₁ − fₖ + ρₖ),

and

rₖ = Jₖ₋₁ᵀ(γₖgₖ − gₖ₋₁) − hₖ₋₁γₖgₖᵀsₖ.
Comment. This γₖ ∈ R and rₖ ∈ R^m are used in the update for hₖ and Jₖ. The γₖ is the ratio of the value of a gauge at xₖ to that at xₖ₋₁, and rₖ is the change in the gradient of the composite function fSₖ₋₁: W → R resulting from the step from xₖ₋₁ to xₖ, where Sₖ₋₁ satisfies (2.1). If differences in function values are used to estimate gradients, then the components of rₖ can be estimated directly from function differences, rather than first estimating gₖ and then using it to calculate rₖ.

Step 4ₖ. Choose a vector vₖ ∈ R^m with vₖᵀvₖ = 2ρₖ ≥ vₖᵀrₖ. Set

uₖ = (vₖ − rₖ)/((vₖ − rₖ)ᵀvₖ),
hₖ = γₖhₖ₋₁ + (1 − γₖ − γₖhₖ₋₁ᵀvₖ)uₖ,

and

Jₖ = γₖJₖ₋₁ + (sₖ − γₖJₖ₋₁vₖ)uₖᵀ − sₖhₖᵀ.

Comment. The choice of the vector vₖ ∈ R^m determines what old information about the objective function is to be replaced by new information. If uₖᵀvⱼ = 0 for some j < k, then all information from the jth step is kept. If sₖ is nearly in the same direction as some sⱼ for which information is to be kept, then vₖ should be nearly in the same direction as vⱼ.

THEOREM 3. If an algorithm of this type is used with a cupped conic objective function f: X → R, and if for each k ≤ m, the vₖ chosen in Step 4ₖ makes uₖᵀvⱼ = 0 for all j < k, then the m + 1 points xₖ for k ≤ m span the m dimensional affine subspace in X of those x for which x − x₀ is in the column space of J₀. The restriction of f to this affine subspace has a minimum iff 1 + hₘᵀwₘ₊₁ > 0, and if this is the case, this restriction has its minimum value

f* = fₘ − ½ ‖wₘ₊₁‖²

at

x* = xₘ + Jₘwₘ₊₁/(1 + hₘᵀwₘ₊₁).

Furthermore, Lₘ ∈ R^((n+1)×m) defined by (4.3) satisfies

LₘᵀAₘLₘ = Iₘ and Lₘᵀcₘ = 0,

where cₘ ∈ R^(n+1) and Aₘ ∈ R^((n+1)∨(n+1)) are defined by replacing x₀ by xₘ in (2.3).
Proof. To simplify this argument, keep x₀ as a reference point, as in Algorithm 2, instead of shifting the reference point from xₖ₋₁ to xₖ in the kth iteration, as in this algorithm and Algorithm 3; i.e., make these changes in the kth iteration:

sₖ = xₖ − x₀,
ρₖ = ((f₀ − fₖ)² − g₀ᵀsₖ gₖᵀsₖ)^½,
γₖ = −g₀ᵀsₖ/(f₀ − fₖ + ρₖ),
rₖ = Jₖ₋₁ᵀ(γₖgₖ − g₀) − hₖ₋₁γₖgₖᵀsₖ,
hₖ = hₖ₋₁ + (1/γₖ − 1 − hₖ₋₁ᵀvₖ)uₖ

and

Jₖ = Jₖ₋₁ + (sₖ/γₖ − Jₖ₋₁vₖ)uₖᵀ.

Use Corollary 3 of Theorem 1 to show that these γₖ, together with γ₀ = 1, are the values at xₖ of a gauge for f. Define yₖ and zₖ as in Algorithm 2 and define Lₖ by (4.3). Show that rₖ = Lₖ₋₁ᵀyₖ, Lₖ = Lₖ₋₁ + (zₖ − Lₖ₋₁vₖ)uₖᵀ, zₖ = Lₖvₖ, and vₖ = Lₖᵀyₖ. Define c ∈ R^(n+1) and A ∈ R^((n+1)∨(n+1)) by (2.3) so that yₖ and zₖ satisfy (4.2). Show that if yⱼ and zⱼ also satisfy (4.2), as well as zⱼ = Lₖ₋₁vⱼ, vⱼ = Lₖ₋₁ᵀyⱼ, and uₖᵀvⱼ = 0, then Lₖ inherits from Lₖ₋₁ the property that zⱼ = Lₖvⱼ and vⱼ = Lₖᵀyⱼ. Then use induction as in the proof of Theorem 2. Lastly, verify that the update for hₖ and Jₖ given in the algorithm differs from that given here by just the linear mapping Dₖ defined in Algorithm 3. ∎

6. Summary and conclusions. Three ways to specify conic functions have been introduced: one given by (2.2) and used in Theorem 1, one given by (2.3) using homogeneous coordinates and used in Theorem 2, and one given by (5.1) in terms of a collinear scaling and used in the algorithm schemata of §5. Equations (2.2) and (2.3) specify the same conic function iff their parameters satisfy (2.5), and (2.2) and (5.1) specify the same conic function iff h₀ = J₀ᵀa and J₀ᵀAJ₀ = I.

Theorem 1 and its corollaries in §3 summarize basic properties of conic interpolations with given values and gradients at the vertices of a simplex. Corollary 4 specializes Theorem 1 to conic interpolations over line intervals, and Algorithm 1 suggests how these can be used for line searches, though safeguards must be added to make this a general purpose algorithm. Theorem 2 gives the properties of Algorithm
Theorem 2 gives the properties of Algorithm 2, which uses O(n²) operations to update a conic interpolation after each evaluation of a function and its gradient. Algorithm 3 differs from Algorithm 2 in using only the steps x_k − x_{k−1} rather than x_k − x_0 for making these updates. Algorithm 4 updates collinear scalings for the objective function rather than conic approximations to it, though these are equivalent, in the absence of rounding, for approximations with unique minimizers. Theorem 3 states the basic properties of the algorithm schemata of § 5 for updating collinear scalings. Specific algorithms can be obtained from this by adding rules for choosing the factor λ_k determining the magnitude of x_k − x_{k−1} in Step 2, and for choosing the direction of the vector v_k in Step 4, which determines what information from previous iterations is to be replaced in the current one.

P. Bjørstad and J. Nocedal [1] have developed an algorithm for one-dimensional minimization which improves upon Algorithm 1 of § 3. Theirs includes safeguards to insure stability. They prove it has an R-quadratic convergence rate when there is a neighborhood of the minimum in which f″ is positive and f‴ is Lipschitz continuous, and that it has a faster convergence rate than algorithms making cubic interpolations when f″f⁗ > 2(f‴)².

D. Sorensen [10] has shown that an algorithm updating collinear scalings for n-dimensional problems has a Q-superlinear convergence rate and that it compares favorably with a widely used BFGS variable metric algorithm on a variety of standard test problems. His algorithm updates a matrix C_k corresponding to J_k^{−T} A J_k^{−1} for the J of (2.1) and A of (2.2).

Acknowledgments. It is a pleasure to thank P. Bjørstad, J. Moré, and D. Sorensen for discussions of these ideas and for constructive comments on an earlier draft of this paper. I am also indebted to two anonymous referees who corrected several errors and made valuable suggestions.

REFERENCES

[1] P. BJØRSTAD AND J.
NOCEDAL, Analysis of a new algorithm for one-dimensional minimization, Computing, 22 (1979), pp. 93–100.
[2] C. CHARALAMBOUS, Unconstrained optimization based on homogeneous models, Math. Programming, 5 (1973), pp. 189–198.
[3] E. DAVISON AND P. WONG, A robust conjugate gradient algorithm which minimizes L-functions, Control Systems Rep. 7313, U. of Toronto, Toronto, Ontario, 1974.
[4] J. DENNIS AND J. MORÉ, Quasi-Newton methods, motivation and theory, SIAM Rev., 19 (1977), pp. 46–87.
[5] I. FRIED, N-step conjugate gradient minimization scheme for non-quadratic functions, A.I.A.A. J., 9 (1971), pp. 2286–2287.
[6] D. JACOBSON AND W. OXMAN, An algorithm that minimizes homogeneous functions of n variables in N + 2 iterations and rapidly minimizes general functions, J. Math. Anal. Appl., 38 (1972), pp. 533–552.
[7] J. KOWALIK AND K. RAMAKRISHNAN, A numerically stable optimization method based on a homogeneous function, Math. Programming, 11 (1976), pp. 50–66.
[8] M. MORSE, The Calculus of Variations in the Large, American Mathematical Society, Providence, RI, 1934.
[9] J. RAPHSON, Analysis Aequationum Universalis, London, 1690.
[10] D. SORENSEN, The Q-superlinear convergence of a collinear scaling algorithm for unconstrained optimization, this Journal, this volume, pp. 84–114.