Adaptive Nonmonotonic Methods With Averaging of Subgradients

N.D. Chepurnoj

IIASA Working Paper WP-87-062
July 1987

Chepurnoj, N.D. (1987) Adaptive Nonmonotonic Methods With Averaging of Subgradients. IIASA Working Paper WP-87-062. IIASA, Laxenburg, Austria. Copyright © 1987 by the author(s). http://pure.iiasa.ac.at/2990/

Working Papers are interim reports on work of the International Institute for Applied Systems Analysis and have received only limited review. Views or opinions expressed herein do not necessarily represent those of the Institute, its National Member Organizations, or other organizations supporting the work.

All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage. All copies must bear this notice and the full citation on the first page. For other purposes, to republish, to post on servers or to redistribute to lists, permission must be sought by contacting [email protected]

INTERNATIONAL INSTITUTE FOR APPLIED SYSTEMS ANALYSIS
A-2361 Laxenburg, Austria

FOREWORD

Numerical methods of nondifferentiable optimization are used for solving decision analysis problems in economics, engineering, the environment, and agriculture. This paper is devoted to adaptive nonmonotonic methods with averaging of subgradients. A unified approach is suggested for the construction of new deterministic subgradient methods, their stochastic finite-difference analogs, and a posteriori estimates of the accuracy of the solution.

Alexander B. Kurzhanski
Chairman
System and Decision Sciences Program

CONTENTS

1 Overview of Results in Nonmonotonic Subgradient Methods
2 Subgradient Methods with Program-Adaptive Step-Size Regulation
3 Methods with Averaging of Subgradients and Program-Adaptive Successive Step-Size Regulation
4 Stochastic Finite-Difference Analogs to Adaptive Nonmonotonic Methods with Averaging of Subgradients
5 A Posteriori Estimates of Accuracy of Solution to Adaptive Subgradient Methods and Their Stochastic Finite-Difference Analogs
References

ADAPTIVE NONMONOTONIC METHODS WITH AVERAGING OF SUBGRADIENTS

N.D. Chepurnoj

1. OVERVIEW OF RESULTS IN NONMONOTONIC SUBGRADIENT METHODS

Among the existing numerical methods for solving nondifferentiable optimization problems, the nonmonotonic subgradient methods hold an important position. The pioneering work by N.Z. Shor [26] gave impetus to their explosive progress. In 1962 he suggested an iterative process for minimizing a convex piecewise-linear function, afterwards named the generalized gradient descent (GGD):

    x^{s+1} = x^s − r_s g^s,   s = 0, 1, 2, ...,   (1.1)

where g^s ∈ ∂f(x^s) is a subgradient of the function f(x) at the point x^s and r_s ≥ 0 is a step size. For differentiable functions this method agrees very closely with the well-known gradient method. The fundamental difference between them is that the motion direction (−g^s) in (1.1) is, as a rule, not a descent direction.
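To make the iteration (1.1) concrete, here is a minimal Python sketch of GGD. The objective `f`, the oracle `subgrad`, and the divergent-series step rule r_s = r_0/(s+1) are illustrative assumptions of this sketch, not prescriptions of the paper; tracking a record point compensates for the nonmonotonicity.

```python
import numpy as np

def ggd(f, subgrad, x0, steps=1000, r0=1.0):
    """Generalized gradient descent, eq. (1.1): x^{s+1} = x^s - r_s g^s.
    The step rule r_s = r0/(s+1) (r_s -> 0, sum r_s = infinity) is an
    assumption of this sketch."""
    x = np.asarray(x0, dtype=float)
    best = x.copy()
    for s in range(steps):
        g = subgrad(x)               # any subgradient g^s of f at x^s
        x = x - (r0 / (s + 1)) * g   # in general NOT a descent direction
        if f(x) < f(best):           # keep the record point: the method
            best = x.copy()          # is nonmonotonic in f
    return best

def f(x):
    # example objective: f(x) = max_k |x_k|, convex and piecewise linear
    return float(np.max(np.abs(x)))

def subgrad(x):
    k = int(np.argmax(np.abs(x)))    # an extreme subgradient of max_k |x_k|
    e = np.zeros_like(x)
    e[k] = np.sign(x[k])
    return e

x_rec = ggd(f, subgrad, x0=[3.0, -2.0])   # drifts toward the minimizer 0
```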
At the first attempts to substantiate theoretically the convergence of procedures of type (1.1), researchers immediately faced two difficulties. For one thing, the objective function lacked the property of differentiability. For another, method (1.1) was not monotonic. These combined features rendered impractical the use of the known convergence theorems for gradient procedures. New theoretical approaches therefore became a must. One more "misfortune" came on top of the others: numerical computations demonstrated that GGD has a low convergence rate. Initially, great hopes were pinned on the step-size selection strategy as a way of overcoming the crisis. By the early 1970s the difficulties caused by the formal substantiation of convergence of nonmonotonic subgradient procedures had been mastered and different approaches to step-size regulation had been offered [6, 7, 8, 19, 20, 26]. However, the computations continued to prove the poor convergence of GGD in practice.

It can be said that the first stage in GGD evolution was over in 1976. Thereupon the numerical methods of nondifferentiable optimization developed in three directions: methods with space dilation, monotonic methods, and adaptive nonmonotonic methods. Let us dwell on each of these approaches.

In an effort to enhance the efficiency of GGD, N.Z. Shor elaborated methods employing the operation of space dilation in the direction of a subgradient and in the direction of the difference of two successive subgradients. Literally the next few years were prolific in papers [27, 28, 29] investigating the space dilation operation in nondifferentiable function minimization problems. A high rate of convergence of the suggested methods was corroborated theoretically. Computational practice attested convincingly to the advantage of applying the algorithms with space dilation, especially the r-algorithm [29], as an alternative to GGD, provided the dimension of the space does not exceed 200 to 300. However, if the dimension is large, first, a considerable amount of computation is spent on the transformation of the space dilation matrix and, second, some extra computer memory is required.

The monotonic methods became another essential direction. Even though the first papers on monotonic methods appeared back in 1968 (V.F. Dem'janov [30]), their progress reached its peak in the early 70s. Two classes of these algorithms should be distinguished here: the ε-steepest descent [5, 30] and the ε-subgradient algorithms [31-34]. We shall not examine them in detail but note that the monotonic methods offered a higher rate of convergence as against GGD. Just as with the methods using space dilation, vast dimensions of the problems to be solved remained the Achilles' heel of the monotonic algorithms.

Thus, the nonmonotonic subgradient methods have come into particular importance in the solution of large-scale nondifferentiable optimization problems. The nonmonotonic procedures have another important field of application apart from the large-scale problems, namely the problems in which the subgradient cannot be precisely computed at a point.
The latter encompass problems of identification, learning, and pattern recognition [1, 21]. The minimized function is there a mathematical expectation whose distribution law is unknown. Errors in subgradient calculation may also stem from computational errors and from many other real processes. Ju.M. Ermol'ev and Z.V. Nekrylova [9] were the first to investigate such procedures. Stochastic programming problems have increasingly drawn attention to the nonmonotonic subgradient methods.

However, as pointed out earlier, GGD, widely used, resistant to errors in subgradient computations, and saving memory capacity, still had a poor rate of convergence. Of great importance therefore was the construction of nonmonotonic methods that, on the one hand, retain all the advantages of GGD and, on the other, possess a high rate of convergence. It is this requirement that has led to the elaboration of the adaptive nonmonotonic procedures.

An analysis revealed that the Markov nature of GGD is the chief cause of its slow convergence. It is quite obvious that the use of the most intimate knowledge of the progress of the computations is indispensable for selecting the direction and regulating the step-size. Several ideas provided the basis for the development of adaptive nonmonotonic methods. The major concept of all techniques for selecting the direction and regulating the step-size was the use of information about the fulfillment of the necessary conditions for an extremum of the minimized function. Its implementation is the methods with averaging of the subgradients. In the most general case, by the operation of averaging is meant a procedure of "taking" a convex hull of an arbitrary finite number of vectors.

The operation of averaging in numerical methods was first applied by Ja.Z. Tsypkin [22] and Ju.M. Ermol'ev [11]. The paper by A.M. Gupal and L.G. Bazhenov [3], also dealing with the use of the operation of averaging of stochastic estimates of the generalized gradients, appeared in 1972. However, all the above papers considered the program regulation of the step-size, i.e., a sequence {r_s} independent of the computations was selected such that

    r_s ≥ 0,   r_s → 0,   Σ_{s=0}^{∞} r_s = ∞.

The next natural stage in the evolution of this concept was the construction of an adaptive step-size regulation using the operation of averaging of the preceding subgradients. In 1974, E.A. Nurminskij and A.A. Zhelikovskij [18] suggested a successive program-adaptive regulation of the step-size for the quasi-gradient method of minimization of a weakly convex function. The crux of this regulation consists in the following. Let an iterative sequence be constructed according to the rule

    x^{s+1} = x^s − r_0 g^s,

where g^s ∈ ∂f(x^s) is a quasi-gradient of the function f(x) at the point x^s and r_0 is a constant step-size. Assume that there exist a point x̄ ∈ E^n and numerical parameters t > 0, δ > 0 such that ‖x^s − x̄‖ ≤ δ for any s = 0, 1, 2, .... Let us suppose also that a convex combination of subgradients e^{s_0} ∈ conv{g^i : 0 ≤ i ≤ s_0} exists such that ‖e^{s_0}‖ ≤ t. Then the point x̄ is sufficiently close to the set X* = argmin f(x) according to the necessary extremum conditions; a sketch of this test is given below.
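The following minimal sketch illustrates the test. Any convex combination of the accumulated quasi-gradients is admissible; the uniform running average used here is one such choice and an assumption of this sketch, as are the function and parameter names.

```python
import numpy as np

def quasi_gradient_with_test(subgrad, x0, r0, t, max_steps=10_000):
    """Constant-step quasi-gradient process x^{s+1} = x^s - r0 g^s with a
    Nurminskij-Zhelikovskij-type stationarity test: signal once some convex
    combination of the collected subgradients has norm <= t. The uniform
    average below is one admissible combination (an assumption of this
    sketch; [18] allows an arbitrary convex combination)."""
    x = np.asarray(x0, dtype=float)
    e = np.zeros_like(x)               # running mean of g^0, ..., g^s
    for s in range(max_steps):
        g = subgrad(x)
        e += (g - e) / (s + 1)         # e = average of s + 1 subgradients
        if np.linalg.norm(e) <= t:     # necessary-condition signal:
            return x, s                # 0 nearly lies in conv{g^i}
        x = x - r0 * g
    return x, max_steps
```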
When such a combination is found, the step-size has to be reduced and the procedure repeated with the new step-size value r_1, starting at the obtained point x^{s_0}. A numerical realization of the described algorithm requires a specific rule for constructing the vectors e^s. In [10] the vector e^s is constructed by the rule

    e^s = Proj (0 | conv {g^k : j ≤ k ≤ s}),

that is, all quasi-gradients computed since the most recent instant of the step-size change are included into the convex hull. Numerical computations bore out the expediency of such a regulation. However, a grave disadvantage was inherent in it: the great laboriousness of an iteration, since the projection of zero onto the growing convex hull has to be recomputed at every step. Considering that the approach as a whole holds promise, averaging schemes had to be developed for efficient use in selecting the direction and regulating the step-size.

This paper treats such averaging schemes. They serve as a foundation for new nonmonotonic subgradient methods, for the description of their stochastic finite-difference analogs, and for a posteriori estimates of solution accuracy. Prior to discussing the results, let us make some general assumptions. Presume that the minimization problem is being solved on the entire space:

    min { f(x) : x ∈ E^n },   (*)

where E^n is an n-dimensional Euclidean space. The function f(x) will everywhere be thought of as a proper convex function, dom f = E^n, with the sets {x : f(x) ≤ C} bounded for any constant C. The set of solutions of problem (*) will be denoted by X*.

2. SUBGRADIENT METHODS WITH PROGRAM-ADAPTIVE STEP-SIZE REGULATION

The concept of adaptive successive step-size regulation has already been set forth above. In [23] a way of determining the instant of the step-size variation was suggested. Central to it was the simplest scheme of averaging of the preceding subgradients. This method is easy to implement and effects a saving in computer memory. Compared to the program regulation, the adaptive regulation improves the convergence of the subgradient methods.

Description of Algorithm 1

Let x^0 be an arbitrary initial point, b > 0 a constant, {ε_k}, {r_k} number sequences such that ε_k > 0, ε_k → 0, r_k > 0, r_k → 0. Put s = 0, j = 0, k = 0, e^0 = g^0 ∈ ∂f(x^0).

Step 1. Construct x^{s+1} = x^s − r_k g^s, g^s ∈ ∂f(x^s).

Step 2. If f(x^{s+1}) > f(x^0) + b, then select x^{s+1} ∈ {x : f(x) ≤ f(x^0)} and go to Step 5.

Step 3. Define e^{s+1} = e^s + (s − j + 2)^{−1}(g^{s+1} − e^s).

Step 4. If ‖e^{s+1}‖ > ε_k, then s = s + 1 and go to Step 1.

Step 5. Set k = k + 1, j = s + 1, s = s + 1 and go to Step 1.

THEOREM 1.1. Assume that problem (*) is solved by algorithm 1. Then all limit points of the sequence {x^s} belong to X*.

PROOF. Denote the instants of step-size variations by s_m. Let us prove that the step-size r_k varies an infinite number of times. Suppose it is not so, i.e., the step-size does not vary starting from an instant s_m and is equal to r_m. Then the points x^s for s ≥ s_m belong to the set {x : f(x) ≤ f(x^0) + b} and are related by x^{s+1} = x^s − r_m g^s. Considering that the step-size does not vary, ‖e^s‖ > ε_m > 0 for s ≥ s_m. But e^s is the average of the subgradients g^j, ..., g^s, so that x^{s+1} − x^j = −r_m (s − j + 1) e^s and hence ‖x^{s+1} − x^j‖ > r_m (s − j + 1) ε_m. Passing to the limit as s → ∞ in this inequality, we obtain a contradiction with the boundedness of the set {x : f(x) ≤ f(x^0) + b}. The further proof of Theorem 1.1 amounts to checking the general conditions of algorithm convergence derived by E.A. Nurminskij [17], which are stated next.
NURMINSKIJ THEOREM. Let the sequence {x^s} and the set of solutions X* be such that the following conditions are satisfied:

D1. For any sequence {x^{s_k}} such that x^{s_k} → x', the successive displacements vanish: ‖x^{s_k+1} − x^{s_k}‖ → 0 as k → ∞.

D2. There exists a closed bounded set S such that the sequence {x^s} belongs to S from some index on.

D3. For any subsequence {x^{n_k}} such that x^{n_k} → x' ∉ X*, there exists ε_0 > 0 such that for all 0 < ε ≤ ε_0 and any k

    m_k = inf { m : m > n_k, ‖x^m − x^{n_k}‖ > ε } < ∞.

D4. A continuous function W(x) exists such that for an arbitrary subsequence {x^{n_k}} with x^{n_k} → x' ∉ X*, and for the subsequence {x^{m_k}} corresponding to it by condition D3, for an arbitrary 0 < ε ≤ ε_0

    lim sup_{k→∞} W(x^{m_k}) < lim_{k→∞} W(x^{n_k}).

D5. The function W(x) of condition D4 assumes no more than a countable number of values on the set X*.

Then all limit points of the sequence {x^s} belong to X*.

Select the function f(x) as the function W(x). Conditions D1 and D5 are satisfied in view of the algorithm structure and the earlier assumptions. The rest of the conditions will be verified by the following scheme. We will prove that conditions D3 and D4 hold at the inner points of the set S = {x : f(x) ≤ f(x^0) + b}. It is therewith obvious that max_{x ∈ S} W(x) is smaller than the infimum of W over points lying sufficiently far outside S, so the sequence {x^s} falls outside the set S only a finite number of times. Consequently, condition D2 is satisfied, and this automatically entails the validity of D3 and D4.

So, let a subsequence {x^{n_p}} exist such that x^{n_p} → x' ∉ X*. Assume at this stage of the proof that x' ∈ int S. We will prove that there exists ε_0 > 0 such that for all 0 < ε ≤ ε_0 and an arbitrary p

    m_p = inf { m : m > n_p, ‖x^m − x^{n_p}‖ > ε } < ∞.   (2.1)

Suppose condition (2.1) is not satisfied, that is, for some ε > 0 there exists n_p such that ‖x^s − x^{n_p}‖ ≤ ε for all s > n_p. By the supposition 0 ∉ ∂f(x'). By virtue of the closedness, convexity and upper semicontinuity of the set-valued mapping ∂f(x) there exists ε > 0 such that 0 ∉ conv G_{4ε}(x'), where conv{·} denotes the convex hull and

    G_{4ε}(x') = ∪ { ∂f(x) : ‖x − x'‖ ≤ 4ε }.

It is easily seen that ε > 0 can always be selected in such a way that U_{4ε}(x') = {x : ‖x − x'‖ ≤ 4ε} ⊂ int S. Obviously, for s ≥ n_p the subgradients g^s and their averages e^s belong to conv G_{4ε}(x'), so that ‖e^s‖ ≥ ‖ê‖ > 0, where ê = Proj(0 | conv G_{4ε}(x')); set γ = ½‖ê‖². Since ε_k → 0 and r_k → 0, there exists an integer K such that ε_k < ‖ê‖ for all k ≥ K. For p so large that the step index at the instant n_p is at least K, it is readily seen that for s ≥ n_p the step-size r_k can vary no more than once while the points remain within the set U_{4ε}(x').

Examine the sequence {x^s} separately on the intervals n_p ≤ s ≤ s_p*, where s_p* = min {s_m : s_m ≥ n_p}. When n_p ≤ s < s_p*, the points x^s are related by

    x^{s+1} = x^s − r_l g^s,

where the index l is reconstructed with respect to s_p*. Let us consider the scalar products

    d_s = (x^{N_1+1} − x^s, g^s) = r_l (s − N_1 − 1)(ē^{s−1}, g^s),   s ≥ N_1 + 1,

where N_1 ≥ n_p and ē^{s−1} denotes the average of the subgradients g^{N_1+1}, ..., g^{s−1}. Since ē^{s−1}, g^s ∈ conv G_{4ε}(x') for s ≥ n_p, it is possible to prove that an index N_2 exists such that (ē^{N_2}, g^{N_2+1}) ≥ γ and d_{N_2+1} ≥ r_l (N_2 − N_1) γ. In a similar way we can prove the existence of indices N_t (t ≥ 3) with the same property. It is easy to prove that N_{t+1} − N_t ≤ N < ∞, t = 1, 2, .... Let N_{t_0} be the maximal of the indices N_t that does not exceed s_p*.
Since s_p* − N_{t_0} ≤ N, the last term on the right-hand side of the corresponding inequality approaches zero as p → ∞. We finally obtain

    f(x^{s_p*}) − f(x^{n_p}) ≤ −r_l (s_p* − n_p) γ + ε_p',   (2.2)

where ε_p' → 0 as p → ∞. It is not difficult to notice that the reasoning which underlies the derivation of inequality (2.2) may also be repeated without changes for the intervals s ≥ s_p* to get

    f(x^m) − f(x^{s_p*}) ≤ −r_{l+1} (m − s_p*) γ + ε_p''.   (2.3)

Adding (2.2) to (2.3) we obtain

    f(x^m) − f(x^{n_p}) ≤ −r̄ (m − n_p) γ + ε̄_p,   r̄ = min {r_l, r_{l+1}}.   (2.4)

Passing to the limit as m → ∞ in inequality (2.4), we are led to a contradiction with the boundedness of the continuous function f on the closed bounded set U_{4ε}(x'). Consequently, condition (2.1) is proved. Let

    m_p = inf { m : m > n_p, ‖x^m − x^{n_p}‖ > ε }.

By construction x^{m_p} lies outside U_ε(x^{n_p}), but for sufficiently large p all the reasoning involved in the derivation of inequality (2.4) remains valid for the instant m_p, that is, we have

    f(x^{m_p}) − f(x^{n_p}) ≤ −r̄ (m_p − n_p) γ + ε̄_p.

Passing to the limit as p → ∞ we get

    lim sup_{p→∞} W(x^{m_p}) < lim_{p→∞} W(x^{n_p}).

The further proof of this theorem follows from the Nurminskij theorem.

To fix more precisely the instant when the iteration process gets into a neighborhood of the solution, we can employ the following modification of Algorithm 1, provided the computer capacity allows.

Description of the Modified Algorithm 1

Let x^0 be an arbitrary initial point, b > 0 a constant, {ε_k}, {r_k} number sequences such that ε_k > 0, ε_k → 0, r_k > 0, r_k → 0; let k_1, k_2, ..., k_m be positive bounded integer constants. Put s = 0, j = 0, k = 0, e^0 = g^0 ∈ ∂f(x^0).

Step 1. Construct x^{s+1} = x^s − r_k g^s, g^s ∈ ∂f(x^s).

Step 2. If f(x^{s+1}) > f(x^0) + b, then select x^{s+1} ∈ {x : f(x) ≤ f(x^0)} and go to Step 5.

Step 3. Define the vectors

    e_i^{s+1} = P_i(g^{s+1}, g^s, ..., g^{s−k_i+1}),   i = 1, ..., m,

where each of the notations P_i(·, ..., ·) designates an arbitrary convex combination of a finite number of the indicated preceding subgradients; for instance, the running average e^{s+1} = e^s + (s − j + 2)^{−1}(g^{s+1} − e^s) is one such combination. Find

    μ_{s+1} = min_{1 ≤ i ≤ m} ‖e_i^{s+1}‖.

Step 4. If μ_{s+1} > ε_k, then s = s + 1 and go to Step 1.

Step 5. Set k = k + 1, j = s + 1, s = s + 1, e^s = g^s and go to Step 1.

THEOREM 2.1. Suppose that problem (*) is solved by the modified algorithm 1. Then all limit points of the sequence {x^s} belong to X*.

3. METHODS WITH AVERAGING OF SUBGRADIENTS AND PROGRAM-ADAPTIVE SUCCESSIVE STEP-SIZE REGULATION

As noted in a number of works [2, 3, 12, 16], it is expedient to average the subgradients calculated at the previous iterations so that the subgradient methods become more regular. For instance, when "ravine"-type functions are minimized, the averaged direction points the way along the bottom of the "ravine". It will be demonstrated in Section 5 that the operation of averaging enables the improvement of a posteriori estimates of the solution accuracy along with the upgrading of the regularity of the described methods. Methods with averaging of subgradients and successive program-adaptive regulation of the step-size are set forth in this section. The results obtained here stem from [24].

Description of Algorithm 2

Let x^0 be an arbitrary initial approximation; b > 0 a constant; {ε_k}, {r_k} number sequences such that ε_k > 0, ε_k → 0, r_k > 0, r_k → 0. Put s = 0, j = 0, k = 0, e^0 = v^0 = g^0 ∈ ∂f(x^0).

Step 1. Construct x^{s+1} = x^s − r_k v^s.

Step 2. If f(x^{s+1}) > f(x^0) + b, then go to Step 7.

Step 3. Define v^{s+1} according to the averaging schemes a) or b) below.

Step 4. Construct e^{s+1} = e^s + (s − j + 2)^{−1}(v^{s+1} − e^s).
Step 5. If ‖e^{s+1}‖ > ε_k, then s = s + 1 and go to Step 1.

Step 6. Set k = k + 1, j = s + 1, s = s + 1, e^s = v^s and go to Step 1.

Step 7. Set x^{s+1} ∈ {x : f(x) ≤ f(x^0)}, j = s + 1, s = s + 1, k = k + 1 and go to Step 1.

In the construction of the direction v^s the following schemes of subgradient averaging are dealt with; a sketch of both is given after Theorem 3.1.

a) The "moving" average. Let K + 1 be an integer. Then

    v^s = Σ_{i=s−K}^{s} λ_{i,s} g^i,   λ_{i,s} ≥ 0,   Σ_{i=s−K}^{s} λ_{i,s} = 1,

where g^i ∈ ∂f(x^i).

b) The "weighted" average. Let M + 1 be an integer. Then

    v^s = g^s + λ_s (v^{s−1} − g^s),

where 0 ≤ λ_s ≤ 1 for s ≢ 0 (mod M) and 0 ≤ λ_s ≤ λ̄ < 1 for s ≡ 0 (mod M).

THEOREM 3.1. Assume that problem (*) is solved by algorithm 2. Then all limit points of the sequence {x^s} belong to the set X*.
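A minimal sketch of the two averaging schemes follows. The class names, the uniform weights in scheme a), and the constant mixing coefficients in scheme b) are illustrative assumptions of this sketch; the schemes admit arbitrary admissible weights.

```python
from collections import deque
import numpy as np

class MovingAverage:
    """Scheme a): v^s is a convex combination of the last K+1
    subgradients; uniform weights lambda_{i,s} = 1/(K+1) are one
    admissible choice."""
    def __init__(self, K):
        self.buf = deque(maxlen=K + 1)
    def direction(self, g):
        self.buf.append(np.asarray(g, dtype=float))
        return sum(self.buf) / len(self.buf)

class WeightedAverage:
    """Scheme b): v^s = g^s + lambda_s (v^{s-1} - g^s), with
    0 <= lambda_s <= 1, and lambda_s <= lam_bar < 1 when s = 0 (mod M)."""
    def __init__(self, lam=0.9, lam_bar=0.5, M=10):
        self.v, self.s = None, 0
        self.lam, self.lam_bar, self.M = lam, lam_bar, M
    def direction(self, g):
        g = np.asarray(g, dtype=float)
        lam = self.lam_bar if self.s % self.M == 0 else self.lam
        self.v = g if self.v is None else g + lam * (self.v - g)
        self.s += 1
        return self.v
```

Either object produces the direction v^s consumed in Step 1 of Algorithm 2 from the current subgradient g^s.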
4. STOCHASTIC FINITE-DIFFERENCE ANALOGS TO ADAPTIVE NONMONOTONIC METHODS WITH AVERAGING OF SUBGRADIENTS

It should be emphasized that the practical value of the subgradient-type methods depends essentially upon the existence of their finite-difference analogs. The finite-difference methods are of great importance primarily in situations when subgradient computation programs are unavailable, which generally occurs in the solution of large-scale problems. The construction of finite-difference methods in nonsmooth optimization originated two approaches: the deterministic and the stochastic ones. Each of them has its own advantages and disadvantages. The stochastic approach is favored here. One of the advantages of the introduced averaging operation is the fact that the construction of stochastic analogs of the subgradient methods presents no special problems. The offered methods are close to the methods with smoothing [4] which, in their turn, are closest to the schemes of the stochastic quasi-gradient methods [12]. Research into the stochastic quasi-gradient methods with successive step-size regulation is quite a new and underdeveloped field. Ju.M. Ermol'ev first spurred the investigations in this direction. His and Ju.M. Kaniovskij's results [13] are undoubtedly of theoretical interest. However, the implementation of the methods described in [14] creates complications, as there is no rule to regulate variations in the step-size.

Let us first dwell on the smoothed functions f(x, i) of the form

    f(x, i) = (2α_i)^{−n} ∫_{x_1−α_i}^{x_1+α_i} ... ∫_{x_n−α_i}^{x_n+α_i} f(y_1, ..., y_n) dy_1 ... dy_n,

where α_i > 0. Properties of the functions f(x, i) have been studied by A.M. Gupal [4] under the assumption that f(x) satisfies a local Lipschitz condition.

THEOREM 4.1. If f(x) is a proper convex function, dom f = E^n, then f(x, i) is also a proper convex function, dom f(x, i) = E^n, for any α_i > 0.

THEOREM 4.2. The sequence of functions f(x, i) converges uniformly to f(x) as α_i → 0 in any bounded domain X.

Now we shall go on to the description of the stochastic finite-difference analogs of the algorithms with successive program-adaptive regulation of the step-size and with averaging of the direction.

Description of Algorithm 3

Let x^0 be an arbitrary initial approximation, b > 0 a constant, {t_i}, {ε_i}, {α_i}, {p_i} number sequences. Put s = 0, i = 0, j = 0.

Step 1. Compute the stochastic finite-difference estimate ξ^s with the components

    ξ_k^s = (2α_i)^{−1} [ f(η_1^s, ..., η_{k−1}^s, x_k^s + α_i, η_{k+1}^s, ..., η_n^s) − f(η_1^s, ..., η_{k−1}^s, x_k^s − α_i, η_{k+1}^s, ..., η_n^s) ],   k = 1, ..., n,

where η_k^s, k = 1, ..., n, are independent random values distributed uniformly on the intervals [x_k^s − α_i, x_k^s + α_i], α_i > 0 (a sketch of this estimator is given at the end of this section).

Step 2. Construct v^s in compliance with the schemes a) and b), where the subgradients are replaced by their stochastic estimates ξ^s.

Step 3. Find x^{s+1} = x^s − t_i v^s.

Step 4. If f(x^{s+1}) > f(x^0) + b, then go to Step 9.

Step 5. Define e^{s+1} = e^s + (s − j + 1)^{−1}(ξ^s − e^s).

Step 6. If s − j < p_i, then s = s + 1 and go to Step 1.

Step 7. If ‖e^{s+1}‖ > ε_i, then s = s + 1 and go to Step 1.

Step 8. Put i = i + 1, j = s + 1, s = s + 1 and go to Step 1.

Step 9. Set x^{s+1} ∈ {x : f(x) ≤ f(x^0)}, j = s + 1, i = i + 1, s = s + 1 and go to Step 1.

THEOREM 4.3. Let problem (*) be solved by algorithm 3 and let the number sequences {t_i}, {ε_i}, {α_i}, {p_i} satisfy the consistency conditions formulated in [25] (in particular t_i → 0, ε_i → 0, α_i → 0, p_i → ∞). Then for almost all ω the sequence f(x^s(ω)) converges and all limit points of the sequence {x^s(ω)} belong to the set of solutions X*. Theorem 4.3 is proved in detail in [25].
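A minimal sketch of the estimator of Step 1 follows. The function name and the reseedable generator are assumptions of this sketch, and the coordinate layout, with the remaining components sampled uniformly around the current point, follows the reconstruction of Step 1 given above.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_fd_estimate(f, x, alpha):
    """Stochastic central finite-difference estimate xi^s of a subgradient
    of the smoothed function f(x, i) (cf. Step 1 of Algorithm 3): the k-th
    component differences f over +/- alpha in coordinate k, with the other
    coordinates sampled uniformly from [x_j - alpha, x_j + alpha]."""
    x = np.asarray(x, dtype=float)
    n = x.size
    eta = rng.uniform(x - alpha, x + alpha)   # one random smoothing point
    xi = np.empty(n)
    for k in range(n):
        hi, lo = eta.copy(), eta.copy()
        hi[k] = x[k] + alpha                  # perturb only coordinate k
        lo[k] = x[k] - alpha
        xi[k] = (f(hi) - f(lo)) / (2.0 * alpha)
    return xi
```

As alpha is driven to zero along the stages, Theorem 4.2's uniform convergence of the smoothed functions is what ties these estimates back to the original objective f.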
5. A POSTERIORI ESTIMATES OF ACCURACY OF SOLUTION TO ADAPTIVE SUBGRADIENT METHODS AND THEIR STOCHASTIC FINITE-DIFFERENCE ANALOGS

In the numerical solution of extremum problems of nondifferentiable optimization, strong emphasis is placed on checking the accuracy of the obtained solution. Given the solution accuracy estimates, first, a very efficient stopping rule for the algorithm can be formulated; second, the obtained estimates can form the basis for justified conclusions with respect to the strategy of selection of the algorithm parameters. Using a rather simple procedure, a posteriori estimates of solution accuracy for the introduced adaptive algorithms are constructed here. The estimates provide a means for strictly evaluating the efficiency of the use of the averaging operation.

Thus, assume that the convex function minimization problem (*) is being solved and suppose that the set X* contains only one point x*. To solve the problem, consider algorithm 1. The spin-off from the proof of theorem 1.1 is that the sequence {x^s} falls outside the set {x : f(x) ≤ f(x^0) + b} only a finite number of times. Therefore, an index s̄ ≥ 0 exists such that for s ≥ s̄ the step size will vary only if the condition ‖e^{s+1}‖ ≤ ε_k is satisfied, where

    e^{s+1} = (s − j + 2)^{−1} Σ_{i=j}^{s+1} g^i.

Without loss of generality we will assume that the first instant s_0 of the change from the step r_0 to r_1 occurred just because this condition was satisfied. From the convexity of the function f(x) it is inferred that

    f(x^i) − f(x*) ≤ (g^i, x^i − x*),   i = 0, 1, ..., s_0.   (5.1)-(5.3)

Summation of the inequalities (5.1), (5.2), ..., (5.3) yields

    (s_0 + 1)^{−1} Σ_{i=0}^{s_0} [f(x^i) − f(x*)] ≤ (s_0 + 1)^{−1} Σ_{i=0}^{s_0} (g^i, x^i − x^0) + (e^{s_0}, x^0 − x*).

Denote the expression (s_0 + 1)^{−1} Σ_{i=0}^{s_0} (g^i, x^i − x^0) by Δ_0. We have the obvious inequalities

    f(x̄^0) − f(x*) ≤ Δ_0 + ε_0 ‖x^0 − x*‖,   x̄^0 ∈ {x : f(x) ≤ min [f(x^0), ..., f(x^{s_0})]},

where for s_0 ≤ s ≤ s_1 the points x^s are related by x^{s+1} = x^s − r_1 g^s. For these values of s it is possible to derive in the same way that

    f(x̄^1) − f(x*) ≤ Δ_1 + ε_1 ‖x^0 − x*‖,   x̄^1 ∈ {x : f(x) ≤ min [f(x^{s_0+1}), ..., f(x^{s_1})]}.

Thus, for s_k + 1 ≤ s ≤ s_{k+1} we have

    f(x̄^{k+1}) − f(x*) ≤ Δ_{k+1} + ε_{k+1} ‖x^0 − x*‖,

where x̄^{k+1} ∈ {x : f(x) ≤ min [f(x^{s_k+1}), ..., f(x^{s_{k+1}})]}. It is easily proved that Δ_k → 0.

THEOREM 5.1. Assume that problem (*) is solved by algorithm 1. Then the inequalities

    f(x̄^k) − f(x*) ≤ Δ_k + ε_k ‖x^0 − x*‖

hold for those instants s_k at which the step-size varies because the condition ‖e^{s_k}‖ ≤ ε_k is satisfied.

REMARK. It follows from theorem 5.1 that the same estimate holds both for the subsequence of "records" {x̄^k} and for the Cesàro subsequence of averaged points, since by convexity the value of f at the arithmetic mean of a stage's points does not exceed the mean of its values.

Now let problem (*) be solved by algorithm 2, where the operation of averaging of the preceding subgradients is used. Denote the instants of changes in the step-size by s_i, i = 0, 1, 2, .... Suppose the first instant of the change from r_0 to r_1 takes place because the inequality ‖e^{s_0}‖ ≤ ε_0 holds. Examine the scheme of averaging by the "moving" average a), for which v^s = Σ_{i=s−K}^{s} λ_{i,s} g^i. Designate the expression Σ_i λ_{i,s} (g^i, x^i − x^0) by Δ'_s. Then for s ≤ K, and similarly for s > K, repeating the reasoning above, we arrive at an estimate of the form

    f(x̄^0) − f(x*) ≤ Δ'_{s_0} + ε_0 ‖x^0 − x*‖.

From this formula the following recommendation can be offered with respect to the selection of the parameters λ_{i,s}:

    minimize Σ_{i=0}^{s} λ_{i,s} (g^i, x^i − x^0)   subject to   λ_{i,s} ≥ 0,   Σ_{i=0}^{s} λ_{i,s} = 1.

The subgradient averaging thereby allows improving the a posteriori estimates of the solution accuracy. This substantiates formally that it is of advantage to introduce and study the operation of subgradient averaging. For an arbitrary instant of step-size variation s' > K we can easily obtain the analogous estimate (5.9).

THEOREM 5.2. Let problem (*) be solved by algorithm 2 with the use of averaging scheme a). Then for the instants s' for which ‖e^{s'}‖ ≤ ε_i, inequality (5.9) holds.

The scheme of averaging by the "weighted" average b) is treated in a similar way.

The a posteriori estimates of the solution accuracy attained for the adaptive subgradient methods can be extended to their stochastic finite-difference analogs with a minimum of alterations. The way of getting them is illustrated with algorithm 3; we will use the notation introduced in Section 4. When proving theorem 4.3 it is possible to demonstrate that the step-size t_i varies an infinite number of times. As algorithm 3 converges with probability one, for almost all ω it is possible to indicate an index s̄(ω) such that for s ≥ s̄(ω) the step-size t_i varies because the condition ‖e^{s'}‖ ≤ ε_i holds, where s' ≥ p_i + j,

    e^{s'} = e^{s'−1} + (s' − j)^{−1}(ξ^{s'} − e^{s'−1}),

the sequences {t_i} and {p_i} comply with the properties formulated in theorem 4.3, and j is reconstructed from s'. Consider the behavior of the process between s_i and s', where s_i is the instant of the step-size change that precedes s'. There exists a constant 0 < C < ∞ such that, with probability greater than 1 − Cδ_i, the deterministic reasoning above remains valid; then for the instant s_i the corresponding inequality holds with the same probability.

THEOREM 5.3. Assume that problem (*) is solved by algorithm 3. Then for almost all ω it is possible to isolate a subsequence of points {x^{s'}(ω)} for which, with probability greater than 1 − Cδ_i, the inequalities analogous to those of theorem 5.1 hold with f(x*) replaced by f*_{i−1} and x* by x*_{i−1}, where

    f*_{i−1} = min_{x ∈ E^n} f(x, i − 1),   x*_{i−1} ∈ Argmin_{x ∈ E^n} f(x, i − 1).
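To gather the mechanism of these a posteriori estimates in one place, the following display works out the single-stage chain used throughout this section, under the simplifying assumptions that the stage runs over i = j, ..., s, that e^s is the uniform average of the stage's subgradients, and that the stage ends because ‖e^s‖ ≤ ε_k (a sketch; the theorems' exact index bookkeeping differs slightly).

```latex
% Convexity: f(x^i) - f(x^*) \le (g^i, x^i - x^*) for each i.
% Average the stage's inequalities and split x^i - x^* = (x^i - x^0) + (x^0 - x^*):
\[
\begin{aligned}
\min_{j \le i \le s} f(x^i) - f(x^*)
  &\le \frac{1}{s-j+1} \sum_{i=j}^{s} \bigl( f(x^i) - f(x^*) \bigr) \\
  &\le \frac{1}{s-j+1} \sum_{i=j}^{s} (g^i,\, x^i - x^0)
     + \Bigl( \frac{1}{s-j+1} \sum_{i=j}^{s} g^i,\; x^0 - x^* \Bigr) \\
  &= \Delta_k + (e^s,\, x^0 - x^*)
   \;\le\; \Delta_k + \varepsilon_k \, \lVert x^0 - x^* \rVert .
\end{aligned}
\]
```

The last step is the Cauchy-Schwarz inequality together with the step-change test ‖e^s‖ ≤ ε_k; since Δ_k → 0 and ε_k → 0, the right-hand side is a computable accuracy certificate that vanishes along the instants of step-size change.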
REFERENCES

[1] Ajzerman, M.A., E.M. Braverman and L.I. Rozonoer: Potential Functions Method in Machine Learning Theory. M.: Nauka, 1970, 384 pp.
[2] Glushkova, O.V. and A.M. Gupal: About Nonmonotonic Methods of Nonsmooth Function Minimization with Averaging of Subgradients. Kibernetika, 1980, No. 6, pp. 128-129.
[3] Gupal, A.M. and L.G. Bazhenov: Stochastic Analog to the Conjugate Gradient Method. Kibernetika, 1972, No. 1, pp. 125-126.
[4] Gupal, A.M.: Stochastic Methods of Solution of Nonsmooth Extremum Problems. Kiev: Naukova dumka, 1979, 152 pp.
[5] Dem'janov, V.F. and V.N. Malozemov: Introduction to Minimax. M.: Nauka, 1972, 368 pp.
[6] Eremin, I.I.: The Relaxation Method of Solving Systems of Inequalities with Convex Functions on the Left Side. Dokl. AN SSSR, 1965, Vol. 160, No. 5, pp. 994-996.
[7] Ermol'ev, Ju.M.: Methods of Solution of Nonlinear Extremum Problems. Kibernetika, 1966, No. 4, pp. 1-17.
[8] Ermol'ev, Ju.M. and N.Z. Shor: On Minimization of Nondifferentiable Functions. Kibernetika, 1967, No. 1, pp. 101-102.
[9] Ermol'ev, Ju.M. and Z.V. Nekrylova: Some Methods of Stochastic Optimization. Kibernetika, 1966, No. 6, pp. 96-98.
[10] Ermol'ev, Ju.M.: On the Method of Generalized Stochastic Gradients and Stochastic Quasi-Fejer Sequences. Kibernetika, 1969, No. 2, pp. 73-83.
[11] Ermol'ev, Ju.M.: On One General Problem of Stochastic Programming. Kibernetika, 1971, No. 3, pp. 47-50.
[12] Ermol'ev, Ju.M.: Stochastic Programming Methods. M.: Nauka, 1976, 240 pp.
[13] Ermol'ev, Ju.M. and Ju.M. Kaniovskij: Asymptotic Properties of Some Stochastic Programming Methods with Constant Step-Size. Zhurn. vychisl. mat. i mat. fiziki, 1979, Vol. 19, No. 2, pp. 356-366.
[14] Kaniovskij, Ju.M., P.S. Knopov and Z.V. Nekrylova: Limit Theorems for Stochastic Programming. Kiev: Naukova dumka, 1980, 156 pp.
[15] Loève, M.: Probability Theory. M.: Izd-vo inostr. lit., 1967, 720 pp.
[16] Norkin, V.N.: Method of Nondifferentiable Function Minimization with Averaging of Generalized Gradients. Kibernetika, 1980, No. 6, pp. 86-89, 102.
[17] Nurminskij, E.A.: Convergence Conditions for Nonlinear Programming Algorithms. Kibernetika, 1973, No. 1, pp. 122-125.
[18] Nurminskij, E.A. and A.A. Zhelikovskij: Investigation of One Regulation of the Step in a Quasi-Gradient Method for Minimizing Weakly Convex Functions. Kibernetika, 1974, No. 6, pp. 101-105.
[19] Poljak, B.T.: A Generalized Method of Solving Extremum Problems. Dokl. AN SSSR, 1967, Vol. 174, No. 1, pp. 33-36.
[20] Poljak, B.T.: Minimization of Nonsmooth Functionals. Zhurn. vychisl. mat. i mat. fiziki, 1969, Vol. 9, No. 3, pp. 509-521.
[21] Tsypkin, Ja.Z.: Adaptation and Learning in Automatic Systems. M.: Nauka, 1968.
[22] Tsypkin, Ja.Z.: Generalized Learning Algorithms. Avtomatika i telemekhanika, 1970, No. 1, pp. 97-103.
[23] Chepurnoj, N.D.: One Successive Step-Size Regulation for a Quasi-Gradient Method of Weakly Convex Function Minimization. In: Issledovanie operacij i ASU. Kiev: Vyshcha shkola, 1981, No. 19, pp. 13-15.
[24] Chepurnoj, N.D.: Averaged Quasi-Gradient Method with Successive Step-Size Regulation to Minimize Weakly Convex Functions. Kibernetika, 1981, No. 6, pp. 131-132.
[25] Chepurnoj, N.D.: One Successive Step-Size Regulation in a Stochastic Method of Nonsmooth Function Minimization. Kibernetika, 1982, No. 4, pp. 127-129.
[26] Shor, N.Z.: Application of the Gradient Descent Method for Solution of the Network Transportation Problem. In: Materialy nauchnogo seminara po prikladnym voprosam kibernetiki i issledovanija operacij. Nauchnyj sovet po kibernetike IK AN USSR, Kiev, 1962, vypusk 1, pp. 9-17.
[27] Shor, N.Z.: Investigation of the Space Dilation Operation in Convex Function Minimization Problems. Kibernetika, 1970, No. 1, pp. 6-12.
[28] Shor, N.Z. and N.G. Zhurbenko: A Minimization Method Using Space Dilation in the Direction of the Difference of Two Successive Gradients. Kibernetika, 1971, No. 3, pp. 51-59.
[29] Shor, N.Z.: Nondifferentiable Function Minimization Methods and Their Applications. Kiev: Naukova dumka, 1979, 200 pp.
[30] Demjanov, V.F.: Algorithms for Some Minimax Problems. Journal of Computer and Systems Science, 1968, 2, No. 4, pp. 342-380.
[31] Lemarechal, C.: An Algorithm for Minimizing Convex Functions. In: Information Processing '74 (ed. Rosenfeld), North-Holland, Amsterdam, 1974, pp. 552-556.
[32] Lemarechal, C.: Nondifferentiable Optimization: Subgradient and Epsilon Subgradient Methods. In: Lecture Notes in Economics and Mathematical Systems (ed. W. Oettli), Vol. 117, Springer, Berlin, 1975, pp. 191-199.
[33] Bertsekas, D.P. and S.K. Mitter: A Descent Numerical Method for Optimization Problems with Nondifferentiable Cost Functions. SIAM Journal on Control, 1973, 11, No. 4, pp. 637-652.
[34] Wolfe, P.: A Method of Conjugate Subgradients for Minimizing Nondifferentiable Functions. In: Nondifferentiable Optimization (eds. M.L. Balinski, P. Wolfe), Mathematical Programming Study 3, North-Holland, Amsterdam, 1975, pp. 145-173.