NASA CONTRACTOR REPORT                                        NASA CR-2093

COMPARISON OF GENETIC ALGORITHMS WITH CONJUGATE GRADIENT METHODS

by Jack Bosworth, Norman Foo, and Bernard P. Zeigler

Prepared for Langley Research Center by
THE UNIVERSITY OF MICHIGAN, Ann Arbor, Mich. 48104

NATIONAL AERONAUTICS AND SPACE ADMINISTRATION - WASHINGTON, D.C. - AUGUST 1972

1. Report No.: NASA CR-2093
4. Title and Subtitle: COMPARISON OF GENETIC ALGORITHMS WITH CONJUGATE GRADIENT METHODS
5. Report Date: August 1972
7. Author(s): Jack Bosworth, Norman Foo, and Bernard P. Zeigler
8. Performing Organization Report No.: 003120-1-T
9. Performing Organization Name and Address: The University of Michigan, Logic of Computers Group, Department of Computer and Communication Sciences, Ann Arbor, Michigan 48104
11. Contract or Grant No.: NGR-23-005-047
12. Sponsoring Agency Name and Address: National Aeronautics and Space Administration, Washington, D.C. 20546
13. Type of Report and Period Covered: Contractor Report

16. Abstract: Genetic algorithms for mathematical function optimization are modeled on search strategies employed in natural adaptation. Comparisons of genetic algorithms with conjugate gradient methods, which have been made on an IBM 1800 digital computer, show that genetic algorithms display superior performance over gradient methods for functions which are poorly behaved mathematically, for multimodal functions, and for functions obscured by additive random noise. Furthermore, genetic methods offer performance comparable to gradient methods for many of the standard functions.

17. Key Words: Function optimization; Mathematical programming
18. Distribution Statement: Unclassified - Unlimited
19.-20. Security Classif. (of this report / of this page): Unclassified

For sale by the National Technical Information Service, Springfield, Virginia 22151
I. Introduction

A function optimization problem may be defined as follows: given a real-valued function defined on a finite-dimensional space, find points of the space at which the function attains its optimum (minimum or maximum) values. A direct search algorithm for solving such an optimization problem is an iterative step-by-step procedure which samples a number of points in the space until a point is found which is apparently the optimum.

Function optimization problems requiring direct search algorithms arise from the general area of the design of optimal control systems (Athans and Falb (1966)). The optimal control of aerospace vehicles or chemical processing plants, for example, involves the formulation of controller actions to optimize some pre-determined criterion of performance. Often the design of such control systems leads to function optimization problems which cannot be solved analytically and therefore necessitate direct search algorithms for their solution (Kalman, Falb, Arbib (1969), Lavi and Vogl (1965)).

In many control applications, however, not enough is known about the plant (controlled system) beforehand to formulate a realistic optimal control problem. In this case one may design a control system from the adaptive point of view (Bellman (1959), Mishkin and Braun (1961), Feld'baum (1966), Sworder (1966)). An adaptive control system attempts to optimize the performance of the plant "on line", i.e., it continually attempts to improve its control of the plant based upon the record of past plant responses to control inputs and environmental disturbances.
To do this, the controller must possess, as essential subcomponents, direct search algorithms which can direct the search toward optimum points of the criterion function (Wilde (1964), Hall and Ratz (1967)). Thus the successful design of optimal and adaptive control systems rests critically on the existence of useful direct search algorithms for solving function optimization problems. The value of a direct search algorithm in any application depends on its ability, in the first place, to converge, i.e., to actually locate the optimum (many algorithms can be guaranteed to eventually locate the optimum but do so much too slowly for practical application); secondly, to converge rapidly, i.e., to locate the optimum in a finite time; and thirdly, not to be misled by random variations in the criterion function (arising, for example, from digital roundoff error or plant disturbances) into settling on apparent optima far removed from the actual ones.

Genetic algorithms are direct search algorithms which are modelled upon search strategies employed in natural adaptation. Attempts were made by Fogel, Owens and Walsh (1966) and Bremermann (1966) to implement some of the search strategies employed in natural adaptation. The techniques employed by these workers only superficially resembled those known to exist in nature (Mayr, 1965), and the experimental results did not yield information concerning the comparative convergence properties or cost and complexity of the genetic algorithms. More sophisticated studies, employing the mechanisms of crossover, inversion, mutation and reproduction at the genotypic level, have been developed by Rosenberg (1967), Bagley (1967), and Cavicchio (1970). These workers obtained results indicating the superiority of the genetic algorithms to competitive methods in the areas of pattern recognition and biochemical adaptation which they explored.
Holland (1969a,b,c) has undertaken a systematic theoretical analysis of the requirements of these methods. His work concerns the existence of an ideal reproductive plan which is "good" in comparison to any other plan, i.e., it sustains only a finite loss over infinite time when compared to any other plan. This criterion is a formalization of the requirement that a search algorithm be "efficient" and "robust" over a broad range of test problems.

Hollstien (1971) developed a class of genetic algorithms for function optimization. He has shown that these algorithms are capable of achieving convergence on functions which are multipeaked and discontinuous, whereas classical hill climbing methods operate well only on sufficiently smooth single peaked functions.

In this paper we are concerned with the convergence rates of genetic algorithms in comparison with other methods. As a beginning, we investigate the convergence rates relative to those of the conjugate gradient (variable metric) methods (Luenberger (1964), Pearson (1969), Polak (1971)) on typical test problems. This is a severe test of genetic methods since, on the one hand, they do not employ derivative extraction techniques for guidance (which is available from the analytic structure of the usual test functions), while on the other hand the conjugate gradient methods have been honed to the point of extreme efficiency for these functions. Thus from this point of view one may expect relatively inferior performance from the genetic methods. Some positive indications for genetic performance however arise from studies by Rastrigin (1963) and Schumer (1968) which indicate that random step size methods can be more efficient than fixed step size gradient methods.
Since Hollstien claims superior performance for his methods over those of Rastrigin, this opens the possibility that genetic methods can compete favorably with the conjugate gradient methods (which are themselves more powerful than the fixed step size gradient methods).

II. Description of Program

As work progressed on our optimization program it naturally underwent a number of modifications. We shall attempt to portray this evolution by describing four stages of development (I, II, III, IV). After the description, the theoretical and experimental developments which motivated these modifications will be discussed.

We consider maximization of real valued n-ary functions of the form f: R^n -> R. A chromosome (or string) is a list of coordinate values a_1,...,a_n of an n-dimensional vector with an associated inversion pattern. An inversion pattern is a permutation of the sequence 1,...,n, say i_1,...,i_n. If a string a_1,...,a_n has inversion pattern i_1,...,i_n, this means that there is a point in n-space which corresponds to the string, such that its i_j-th coordinate is a_j. For example, let n = 4 and the string be .1, 1.3, -.4, .02 with inversion pattern 1,4,2,3; then the corresponding point is (.1, -.4, .02, 1.3). The function value associated with a string is just the value of the function (currently being optimized) at the corresponding point. Thus the value associated with the above string is f(.1, -.4, .02, 1.3) (not f(.1, 1.3, -.4, .02)).

Version I

The basic flow diagram for Version I is as follows: initialization, followed by repeated cycles of selection, cross-over, inversion, and mutation. (Flow diagram not reproduced.)

Forty strings were maintained in four subpopulations of ten strings each. Only one inversion pattern was associated with each subpopulation, i.e., any two strings in the same subpopulation had the same associated inversion pattern. A vector, called the utility vector, giving the function value of each string was maintained.
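The correspondence between a string with its inversion pattern and the point it represents can be sketched as follows (a minimal illustration under our reading of the definition above; the function name is ours, not the report's):

```python
def decode(string, pattern):
    """Map a chromosome (list of coordinate values) and its inversion
    pattern to the point in n-space it represents: pattern[j] names the
    (1-based) coordinate of the point that receives string[j]."""
    n = len(string)
    point = [0.0] * n
    for j, coord_index in enumerate(pattern):
        point[coord_index - 1] = string[j]   # pattern uses 1-based indices
    return point

# The worked example from the text: n = 4, string (.1, 1.3, -.4, .02)
# with inversion pattern (1, 4, 2, 3) decodes to the point (.1, -.4, .02, 1.3).
print(decode([0.1, 1.3, -0.4, 0.02], [1, 4, 2, 3]))   # -> [0.1, -0.4, 0.02, 1.3]
```

The function value of the string is then simply the objective function evaluated at the decoded point.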
Selection consisted of ordering each subpopulation by function value (the best string is the one with the highest function value) and then replacing the lowest four strings by the best four strings (in each subpopulation).

Cross-over consisted of picking at random two coordinates, called pivot points, for each of the two pairs of strings (7,8) and (9,10) in each subpopulation. Then all coordinate values between and including the pivot points are exchanged between pair members. For example, suppose we have a pair of strings a_1,...,a_5 and b_1,...,b_5 with pivot points 2 and 4, say. The resulting strings are a_1 b_2 b_3 b_4 a_5 and b_1 a_2 a_3 a_4 b_5.

Inversion consisted of ordering the four subpopulations by their best strings (in each subpopulation the string with the highest function value is the best string of the subpopulation, the subpopulation with the highest "best string" is best, etc.), copying the best two subpopulations into the worst two, and changing the inversion patterns of the copies as follows. To change the inversion pattern of a subpopulation, two pivot points were chosen for each copy, and all strings were inverted about these pivot points. I.e., if a_1,...,a_5 is a string and the pivot points are 2 and 4, say, then the new string is a_1 a_4 a_3 a_2 a_5.

Mutation was more complex. A probability vector was included in the initial parameter specifications. The vector had four coordinates, each of which specified the probability of using a corresponding method of mutation on any given string. The methods of mutation were:

1) Fletcher-Reeves (FR) Mutation. A version of the Fletcher-Reeves (1964) method which could be applied a controlled number of times q to a point (without reinitialization).2 When q = 1 this reduces to gradient mutation, i.e., an approximate gradient was taken at the point and a "Golden Section" one-dimensional search was made along the line of the gradient.3
2) Gaussian Approximation. An integer m (the number of coordinates to be mutated) was chosen randomly between 1 and n. Then m integers i_1,...,i_m (the actual coordinates to be mutated) were chosen randomly between 1 and n. Finally, 2m numbers were chosen randomly between -1 and 1, say r_{1,1}, r_{1,2},..., r_{m,1}, r_{m,2}. If L is the initialized "standard deviation" of coordinate i_j, then r_{j,1}*r_{j,2}*L is added to the i_j-th coordinate of the point, for each j = 1,...,m.

3) Uniform random mutation of coordinates. m and i_1,...,i_m were chosen as in 2). Then m numbers (the mutation amounts), say r_1,...,r_m, were chosen randomly between limits which were symmetric about 0, say -L_1,L_1; -L_2,L_2; ...; -L_n,L_n, and r_j was added to the i_j-th coordinate of the point.

4) Zero mutation. The string is left unaltered.

For each of the forty strings, one of these four methods of mutation was chosen according to the probability vector and applied to the string.

The initialization consisted of reading in parameters and initializing the strings. For each of the forty strings, a point was initialized to random coordinate values between two bounds, say -2 and 2. The point was converted to a string with the associated inversion pattern, and the function value of the point was associated with the string. The four inversion patterns were all set to 1,2,3,4,...,n, and the utility vector was updated accordingly. All other parameters were considered to be subject to experimental manipulation.

2 Since every time the routine is called its remembered gradient is set to 0, this is equivalently a reset mode of operation with reset interval q.

3 Our Fletcher-Reeves method uses 2n samples for its gradient estimation and 30 samples for its one-dimensional search per iteration (n is the dimension of the space).
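The Version I cross-over and inversion operations can be sketched as follows (a minimal illustration; the function names are ours):

```python
def cross_over(a, b, p1, p2):
    """Version I cross-over: exchange all coordinate values between and
    including the pivot points p1..p2 (1-based) between pair members."""
    lo, hi = p1 - 1, p2                                  # 0-based half-open slice
    return a[:lo] + b[lo:hi] + a[hi:], b[:lo] + a[lo:hi] + b[hi:]

def invert(s, p1, p2):
    """Version I inversion: reverse the segment between the pivot points,
    so (a1,...,a5) with pivots 2 and 4 becomes (a1,a4,a3,a2,a5)."""
    lo, hi = p1 - 1, p2
    return s[:lo] + s[lo:hi][::-1] + s[hi:]

# The worked examples from the text, with pivot points 2 and 4:
print(cross_over(["a1", "a2", "a3", "a4", "a5"],
                 ["b1", "b2", "b3", "b4", "b5"], 2, 4))
print(invert(["a1", "a2", "a3", "a4", "a5"], 2, 4))     # -> a1, a4, a3, a2, a5
```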
Version II

Version I was modified in the following ways to create Version II.

Selection now duplicated strings in each subpopulation as follows. The strings are rated 1,...,10 by function value, and strings 7, 8, 9 and 10 are replaced. String 7 is replaced by string 1. String 8 is replaced by string i, where i is chosen (uniform) randomly from 2,3,...,10. String 9 is replaced by string 2, unless i = 2, in which case 9 is replaced by 3. String 10 is replaced by a string chosen randomly from those remaining. Thus the best two strings are always duplicated by the selection process. (None of the replacements were made until all strings were chosen.)

Cross-over was done in the same way as before. Note that the selection caused cross-over to occur between the best strings and randomly chosen strings from the same subpopulation, rather than among the best strings themselves.

The four mutation methods of Version I were used, except that 2) was altered as follows:

2') Cubic Gaussian Approximation. This method was like the old 2), but r_1,...,r_m were chosen between -1 and 1 and then r_j^3 * L was added to the i_j-th coordinate of the point.

A fifth method was added:

5) Uniform Random with Variable Limits. Each string was mutated as before, but the limits between which r_1,...,r_m were chosen were different for different coordinates. Before this mutation was done, the maximum and minimum values of each coordinate in the subpopulation were found, say a_i' and a_i for the i-th coordinate, and the limits for the i-th coordinate were based on the spread a_i' - a_i.

In addition, when the best string was mutated (according to the probability vector), the mutant replaced the worst member in the subpopulation (the best string was also saved unmutated).
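As we read it, the cubic variant 2') replaces the product of two uniform variates in 2) by the cube of one: cubing a uniform [-1, 1] number concentrates probability near zero, so small mutation amounts are strongly favored. A sketch (the function name and the passing of the scale parameter L as a per-coordinate list are our assumptions):

```python
import random

def cubic_gaussian_mutation(point, L, rng=random):
    """Version II mutation method 2') as we read it: pick a random number m
    of coordinates (with repetition possible) and add r**3 * L[i] to each,
    r uniform on [-1, 1]; the cube biases perturbations toward zero."""
    n = len(point)
    mutant = list(point)
    for _ in range(rng.randint(1, n)):      # m coordinates, m random in 1..n
        i = rng.randrange(n)                # which coordinate to perturb
        r = rng.uniform(-1.0, 1.0)
        mutant[i] += r ** 3 * L[i]          # |r**3| <= 1, concentrated near 0
    return mutant
```

Since |r^3| <= 1, each perturbation is bounded by the coordinate's "standard deviation" L[i], which is exactly the parameter the second level adaptation routine (described next) manipulates.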
The major addition to the program structure was a second level "adaptation" routine which controlled some of the parameters previously fixed at initialization. These parameters included the "standard deviation" L used in methods 3) and 2'), and the probability vector (determining the disposition toward selecting a particular mutation method). The adaptation was based on a history vector which contained information concerning how often each mutation method was used, the average mutation amounts, and how often applying a particular mutation resulted in an increase in the highest function value present in each subpopulation. The routine determining whether large mutation amounts or small ones proved more fruitful, and modifying the parameter L accordingly, was similar to that of Schumer and Steiglitz (1968). A more complete description is given in Appendix A.

Version III

The flow diagram for Version III begins with initialization, followed by repeated cycles of the genetic operations. (Flow diagram not reproduced.)

The major change introduced was that there was no partitioning of the population into four distinct subpopulations, each sharing a common inversion pattern. The population size was determined dynamically but was limited to at most 40. Because separate subpopulations were not maintained, some convention had to be adopted in order to achieve cross-over between strings with different inversion patterns (for the difficulties involved see Bagley (1967)). One possibility, that of allowing cross-over only between strings having the same inversion pattern, was rejected. Instead, cross-over was allowed between arbitrary strings, with the inversion pattern of the better string of the pair determining the alleles to be crossed over. In essence, the heuristic is that the inversion pattern of the better string is in fact the better inversion pattern. More detail will be given in a moment.
The mutation routine differed from the previous mutation routines in the following ways. A parameter m_1 (determined by initialization) was defined as the number of strings to be mutated. Suppose the program began with m strings (m assumed not less than m_1); then the m_1 strings which had the highest function values among the initial m strings were chosen. These m_1 strings were copied, and each of the m_1 copies was mutated using a method chosen randomly, with the probability vector determining the frequency of selection of any given mutation method. The mutation methods were the same as 1), 3) and 2') of Version II. Method 5) was not implemented in the Version III routine. (As before, the utility vector was updated and the history vector was maintained.)

The adaptation routine, introduced to evaluate mutation method effectiveness, was essentially the same as the adaptation routine of Version II. The major difference in the structure was a weighting scheme for the probabilities, so that a heavily weighted method had to produce a higher percentage difference in the best function value than a method not weighted so heavily in order to have the ratio of these two methods remain the same. These weights were initialized.

The cross-over routine was altered as follows. Let m_2 be an initialized parameter indicating the number of strings which the routine would operate on; Z - m_1 (the number of strings present) was assumed greater than or equal to m_2. The best m_2 strings (those with the higher function values) were chosen, and cross-over was initiated by copying these strings and pairing the copies randomly.
For each pair, the normal cross-over operation is performed, except that the inversion pattern of the worse string is replaced by that of the better string before the exchange of alleles is begun. After the exchange, one of the daughters receives the better string's inversion pattern (the other daughter inheriting the worse string's pattern). For example, if a_1 a_2 a_3 a_4 a_5 with pattern 12345 and b_1 b_2 b_3 b_4 b_5 with pattern 54321 are to be crossed over, first create b_5 b_4 b_3 b_2 b_1 with pattern 12345 and do the cross-over as usual. With pivot points 2 and 4, for example, we obtain a_1 b_4 b_3 b_2 a_5 and b_5 a_2 a_3 a_4 b_1. One of these is given pattern 12345 while the other gets 54321.

The number of successive cross-overs was not held at one (as before), but was determined by an initialized maximum bound i, subject to the constraint that the process was to be stopped if the population size reached 40. (Note that the population doubles at each successive cross-over, and 2^5 = 32, so i < 5.)

The inversion routine worked as follows. Assuming the population size exceeded m_1, the best 'm_1/2' strings (where 'm_1/2' denotes the least integer greater than m_1/2) were chosen. Each such string was copied, and the inversion pattern of the copy was determined by randomly chosen pivot points as before. (Production was halted when m_1 strings had been produced.) Thus inversion always produced m_1 strings.

Version IV

Version IV was exactly the same as Version III except that in the mutation routine some of the original m_1 strings were mutated as well. An initialized parameter m_1' < m_1 determined that m_1' strings, randomly chosen from the original m_1 strings not including the best, were to be mutated in the same manner as the m_1 copies already produced.

III. Test Functions

The following functions were used as test functions to be optimized:

1. Spherical Contours: f_1(x) = SUM_{i=1}^{40} x_i^2
2. Index: f_2(x) = SUM_{i=1}^{40} i * x_i^2

3. Index Squared: f_3(x) = SUM_{i=1}^{40} i^2 * x_i^2

4. Wood: f_4(x) = 100(x_2 - x_1^2)^2 + (1 - x_1)^2 + 90(x_4 - x_3^2)^2 + (1 - x_3)^2 + 10.1((x_2 - 1)^2 + (x_4 - 1)^2) + 19.8(x_2 - 1)(x_4 - 1)

5. Valleys: f_5(x) = SUM_{i=1}^{4} [i^2 (x_{i+1} - x_i)^2 + |x_i|]

6. Repeated Peaks: f_6(x) = (PROD_{i=1}^{4} x_i(2 - x_i)) * 4(x_5 - [x_5])([x_5] + 1 - x_5)([x_5] + 1) for x_i > 0 (i = 1,2,3,4) and x_5 >= 1; f_6(x) = 0 otherwise.4

Functions 1 through 4 are standard in the direct search literature. We invented 5 and 6 to test our hypotheses concerning algorithm behavior.

4 [x] is the integer part of x; e.g., [1.5] = 1.

NOTE: Functions 1-5 are to be minimized, so that in the program f(x) is replaced by -f(x) and the standard maximization format is satisfied.

IV. Comparison of Genetic and Classical Methods

As stated before, one of our primary objectives was to compare the performance of genetic and classical methods in the realm of numerical optimization. We hoped to ascertain in this way whether genetic methods could utilize the local structure of analytic functions sufficiently well to compete favorably with classical methods which employ gradient extraction routines.

Our approach was to compare the best of our genetic methods to date with the Fletcher-Reeves method. Of course the results are, strictly speaking, only relevant to the particular methods compared. However, since the latter were selected in the role of class representatives, we have reason to believe that our conclusions may have general validity.

The experiment consisted of running Version IV against a control, called FR1, constructed from Version II, in which Fletcher-Reeves is the only mutation routine. More specifically, FR1 had the Version II structure except that the mutation routine now had the form: apply the Fletcher-Reeves method with reset interval q = n (where n is the number of variables in the function) to the best string in each subpopulation. Version IV and FR1 were applied to each of the test functions with the same initial set of points.
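For reference, the first four test functions of Section III can be written out as follows (our reading of the garbled formulas; the upper limit 40 for f_1-f_3 follows the source's listing of f_2, and the Wood function is the standard four-variable form):

```python
# Sketches of test functions 1-4; all are minimized at the value 0,
# and Wood's minimum is at (1, 1, 1, 1).
def f1(x):                      # Spherical Contours
    return sum(xi ** 2 for xi in x)

def f2(x):                      # Index
    return sum(i * xi ** 2 for i, xi in enumerate(x, start=1))

def f3(x):                      # Index Squared
    return sum(i ** 2 * xi ** 2 for i, xi in enumerate(x, start=1))

def f4(x):                      # Wood (standard four-variable ridge function)
    x1, x2, x3, x4 = x
    return (100 * (x2 - x1 ** 2) ** 2 + (1 - x1) ** 2
            + 90 * (x4 - x3 ** 2) ** 2 + (1 - x3) ** 2
            + 10.1 * ((x2 - 1) ** 2 + (x4 - 1) ** 2)
            + 19.8 * (x2 - 1) * (x4 - 1))

print(f4([1, 1, 1, 1]))   # -> 0.0
```

In the program each f(x) is replaced by -f(x), so the search maximizes as described in Section II.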
The results are shown in Tables 1 and 2. In Table 1 we record, for each test function, the number of function evaluations taken by FR1 each time the mutation routine is executed. In comparison, the number of function evaluations taken by Version IV to achieve the same change in function value is recorded, along with the change achieved (order of magnitude).

TABLE 1

FR1 function evaluations per generation (actual, and divided by 4), with the number of Version IV function evaluations needed to achieve the same change in function value and the corresponding change (order of magnitude):

Test Function       FR1 evaluations per generation
                    actual      divided by 4
2. Index            17,616         4,404
3. Index Squared    17,616         4,404
4. Wood                656           164
5. Valleys          [illegible in this copy]

[The columns giving the per-generation Version IV evaluation counts and the corresponding changes are too garbled in this copy to reproduce reliably.]

TABLE 2a

Total number of function evaluations taken to reach the indicated function value level:

Test Function          Function value     FR1 evaluations            Version IV
                       attained           actual     divided by 4    evaluations
1. Spherical Contours  -10^-39               110*         --            52,800
2. Index               -1.0 x 10^-15      52,848       13,212           67,725
3. Index Squared       -2.0 x 10^-10     105,696       26,424           40,000
4. Wood                -1.46 x 10^-6      11,152        2,790           68,000
5. Valleys             -3.4 x 10^-9        8,400        2,100           11,310
6. Repeated Peaks      11.999            (hangs up; see text)            5,070

*The figure given is the number of function evaluations required by our optimum gradient method (Fletcher-Reeves with q = 1), which converged in one iteration.

TABLE 2b

Function value attained and number of function evaluations required by Version IV, starting after FR1 hung up:

Test Function       Function value attained    Version IV evaluations
2. Index            1.6 x 10^-19                     86,175
3. Index Squared    1 x 10^-22                       94,140
4. Wood             1 x 10^-14                      200,000
5. Valleys          2.4 x 10^-13                     93,544
In Table 2a we record the total number of function evaluations taken by the methods to reach the indicated level; the function value attributed to a population is that of its best string. In these tables we have given both the actual number of FR1 function evaluations and this number divided by 4. The latter is a lower bound on the number of function evaluations were the classical Fletcher-Reeves method to be applied to the best point in the initial population.

It may have become apparent to the reader that we face the difficulty here of comparing the parallel genetic methods with the sequential conjugate gradient methods. Our genetic algorithms must start operating with a number of initial points; the Fletcher-Reeves method begins at one point. Clearly some kind of aggregate behavior of a method over the search space is required for meaningful comparison. While parallel methods lend themselves more to this form of analysis, little is known analytically for either type of method in the present context. We have observed that the rate of convergence of Fletcher-Reeves may be quite variable, depending on the nature of the current search region (for example, whether it is locally quadratic or near a sharp ridge) and the number of iterations taken since the last re-initialization. Then too, which aggregate is to be used: the average? the maximum rate of convergence? the minimum? What if a method fails to converge from some starting points but converges rapidly from others?

As already indicated, our decision was to embed the Fletcher-Reeves method in a Version II genetic program.
to applying of the 4 subpopulations, by Fletcher-Reeves our decision If we ignore the effects Fletcher-Reeves the number of function value level applied level by four" first to the point iability we would need only l/4 efficiency. evaluations which reaches this points level an "optimistic" This optimism will best is unimportant) be well and inappropriate fact high in which case the "pessimistic" Our results indicate that if first. reach estimate of founded if the varpoint the variability is is in upper bound is justified. except for the behavior Wood and Repeated Peaks function contours, to Thus the "divided of convergence is low (so that knowing which starting ultimately in each required would actually of the total. columns of Tables 1 and 2a represent Fletcher-Reeves to the best point being then four times the number required knowing before hand which of the four initial this of cross- on the spherical there is not a vast difference in convergence rates. The behavior on the spherical IV's lack of gradient (or just extraction optimum gradient) direction directly functions On this facilities. can follow a one-dimension points out Version function, Fletcher-Reeves search in the gradient to the optimum. The Wood function good ridge contours follower. results (Its indicated initial that Version progress IV is not a very is comparable to FRl but it seems to get hung up in mid course though its mutation facilitates enable it to make a recovery). Repeated Peaks is a multiple the abilities in the fact on any local climbing that FRl hangs up on the local 5Actually our observations the FRl context. 20 hill peak function indicate that and thus should be beyond method. This is substantiated peak on which it crossover has little is initiated. effect in Of course for the comparison here to be truly level meaningful should be superposed above the Fletcher-Reeves The conclusion that convergence rates local (1968) and Hollstien show that on functions at the same value. 
The conclusion that convergence rates are comparable on Functions 2 through 5 should be discussed in view of some results of Rastrigin (1963), Schumer and Steiglitz (1968), and Hollstien (1971). The first two references show that on functions of type 1 through 3 a random directional search method can be significantly more efficient than a gradient method with fixed step size. Hollstien claims superior efficiency for his genetic algorithms over the random directional search; it follows that Hollstien's genetic algorithms outperform the fixed step size gradient methods. It is crucial here to note that the gradient methods referred to are of the fixed step size type and not of the conjugate gradient class which we considered. The efficiency of the latter is known to exceed that of the fixed step size methods.7 Thus the question remains open as to how the random directional search methods compare with the conjugate gradient methods (of which Fletcher-Reeves is a good representative6), and hence how Hollstien's genetic methods compare with the conjugate gradient methods. Our present results thus add essentially new information to this comparison.

6 In a comparison of 7 conjugate gradient methods, including the well-known ones, Pearson's (1969) results show that in terms of the number of one-dimensional searches Fletcher-Reeves is superior to all others (except Newton-Raphson) when operated in the reset mode (as it is here) on the Rosenbrock and Wood functions. Thus we chose Fletcher-Reeves, since it is both more simple and efficient on the "well-behaved" functions we considered. (On the "penalty functions" considered by Pearson the situation is drastically reversed, with Pearson's method #3 coming out well on top.)

7 This can be seen in any of the texts referred to in the literature survey and is essentially due to the use of one-dimensional rather than the much more costly n-dimensional searches.

Actually, Schumer compares his method with Newton-Raphson on SUM_{i=1}^{n} x_i^2 (our Function 1) and SUM_{i=1}^{n} x_i^4,
and finds it inferior on the former for n < 78 and superior on the latter for n > 2. The comparison is in terms of the number of function evaluations required, and it appears that the number of function evaluations per iteration for Schumer's method increases only linearly in dimension, while for Newton-Raphson it increases quadratically (essentially because second partial derivatives must be estimated). As we have indicated, Fletcher-Reeves increases only linearly in this regard, and on the SUM x_i^2 and SUM x_i^4 functions it should far surpass Newton-Raphson in this measure. In fact, Table 2a shows that our Fletcher-Reeves requires only 110 samples, compared to the 330 required by Schumer's method and the 1500 required by Newton-Raphson1 (data taken from Schumer's Figure 4). Thus the classical Fletcher-Reeves should be uniformly better than Schumer's method on Spherical Contours for any finite dimension. It is interesting also that, as indicated by its inferior performance on Rosenbrock's function, Schumer's method proved not very effective as a ridge follower.

It should be noted that Version IV was able to reach much lower function value levels than was FR1. This is shown in Table 2b, which gives Version IV's behavior starting from the levels indicated in Table 2a. The latter levels are those at which FR1's progress terminated. (This may be an artifact of our Fletcher-Reeves realization.)

1 Actually the difference is even more striking when it is considered that Fletcher-Reeves reached 10^-53 from the point (2,2,...,2), while Schumer's data are for the level 10^-8 starting from (1,1,...,1). Note that Version IV's performance fell in between the Schumer and Newton-Raphson methods.
V. Evolution from Version I to Version IV

5.1 Mutation and Second Level Adaptation

Initially, only mutation method 1) was used. Routines 3), 2') and 5), which involved random selection of coordinates (loci) and values (alleles), were introduced later (Version II). This improved convergence considerably by helping the system move off false ridges (see Table 3a). Later we discovered that performance improved further when, in the manner of the standard Bayesian approximation methods (Wilde, 1965), we added a second level (adaptation) routine to modify the mutation parameters, biasing the distribution of mutation on the basis of past experience.

Our analysis of the reasons for the improvement obtained by the adaptation routine is as follows. As a run progresses, the best alleles must be changed less, i.e., by smaller amounts, in order for the function value to improve; the standard deviation of a random mutation must therefore decrease in order to improve the probability of a better mutation. To this end we implemented history vectors: if there had been no improvement in function value over the period of history, the standard deviation was halved, on the assumption that it had been too large. When the parameters became too small for the accuracy of the machine, they were reset to maximal values.

It was apparent that the kind of mutation which worked best at one point in a run was sometimes different from that which worked best at a different part of the run. For this reason more history was kept, and the probabilities of the different mutation methods were changed accordingly. This seems to work, but does not usually give marked improvement in the performance of the system.

When no adaptation was applied, non-uniform distributions worked better than uniform ones, since there was a higher probability of small change; uniform distributions, however, work best under certain conditions.
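The halving-and-reset rule just described can be sketched as follows (a minimal reading of the text; the history representation, the maximal reset value, and the machine-accuracy threshold are our assumptions):

```python
def adapt_sigma(sigma, history, sigma_max=2.0, eps=1e-12):
    """Second level adaptation of a mutation "standard deviation": if no
    mutation over the recorded history improved the best function value,
    halve sigma on the assumption it was too large; if sigma falls below
    the machine's useful accuracy, reset it to its maximal value."""
    if not any(history):          # history: booleans, True = improvement seen
        sigma *= 0.5
    if sigma < eps:               # too small for the accuracy of the machine
        sigma = sigma_max
    return sigma

print(adapt_sigma(1.0, [False, False, False]))   # no improvement -> 0.5
print(adapt_sigma(1.0, [False, True, False]))    # improvement seen -> 1.0
```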
With the adaptation, the uniform mutation can progress faster, since the probability of making the right size change is higher. Under different conditions, however, the adaptation is more likely to put the parameter into a "quasi-stable" state, in which progress is quite smooth but too slow to be useful. By a "quasi-stable" state we mean a situation in which the adjusted parameter is maintained for a long period of time at a suboptimal value. This can happen in our present system since the information in the function values of mutated points is not fed back to the adaptive (adaptation) routine. Thus a situation can arise in which the parameter is too small to cause a directed change in function value but is still above the preset limit; resetting of the parameter occurs only when it has passed below the preset limit.

We tested the more complicated variable mutation routine 5) against the simpler quadratic mutation 2) on the two functions shown. Table 3b indicates that 5) was indeed better than 2), but the additional controllable flexibility (i.e., variable adaptation) introduced by 5) was partly redundant in view of the structure of the adaptive routine.

We were also interested in the extent to which gradient information could be incorporated into the genetic algorithm. Employing Fletcher-Reeves as a mutation routine does not give much of an answer to this question, since on the test functions it tends to speed up convergence to such an extent that the essential genetic elements (cross-over and inversion) do not play much of a role. Fletcher-Reeves used in conjunction with the mutation nevertheless worked better than alone (see Table 3c).

Since mutation often changed the best string for the worse, we also tried saving the best string: when we saved the best string by replacing the worst string of the population with it, the performance was increased several fold (Table 3d).

TABLE 3a
Number of function evaluations required by Version I vs. Version II.
(All parameters are set to the same values except that Version II employs a second level adaptation routine.)

1. Spherical contours

Function Value Attained    Version I    Version II
   -2.045E+1*                    90            90
   -5.28                        900          1050
   -3.78                       1350          1175
   -2.97                       2250          1350
   -2.48                       2700          1440
   -1.80                       4400          1440
   -1.36                       5500          1525
   -1.26                       5840          1620
   -1.00                     10,750          1700
   - .753                    13,400          1890
   - .472                    14,400          2150
   - .268                    17,820          2700
   - .218                    28,600          2790
   - .217                   >38,300          2790

*aEb is Fortran for a x 10^b.

TABLE 3b
The number of generations required by Version II using quadratic mutation (2) vs. variable mutation (5).

2. Index

Function Value Attained    (2)     (5)
   -700                     10      20
   -400                     15      50
   -300                     36      70
   -200                     75     110
   -100                    190     170
   - 80                    260     190
   - 60                    310     220
   - 40                    550     260
   - 20                  >4200     470

4. Wood

Function Value Attained    (2)     (5)
   -15.0                     7      10
   -10.0                     9      20
   - 9.0                    10      40
   - 4.0                    12      60
   - 2.0                    15      70
   - 1.0                    46      80
   -  .5                    60     170
   -  .1                  >700     370
   -  .01                 >700

TABLE 3c
The number of generations required by Version II using a pure gradient mutation (1) vs. a mixed strategy using 1) with probability 3/4 and random 3) with probability 1/4.*

3. Index squared

Function Value Attained    (1)    3/4(1)+1/4(3)
   -100                      9          10
   - 10                     34          34
   -  8                     43          35
   -  6                     55          41
   -  5                     64          45

*Note that 1) also uses more function evaluations per generation than does 3/4(1)+1/4(3).

TABLE 3d
The number of generations required by Version I with "best saved" strategy versus "best not saved."

3. Index Squared

Function Value Attained    Best Saved    Best Not Saved
   -.198E5                      10              10
   -.11E5                       20              30
   -.7E4                        40              50
   -.5E4                        50              90
   -.4E4                        70             120
   -.3E4                        90             160
   -.2E4                       110             310
   -.1E4                       190           >4700
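The "best saved" strategy of Table 3d can be sketched as follows. The report's strings are coordinate vectors, the objective is treated as something to be maximized (toward zero), and the helper names are ours:

```python
def best_saved_step(pop, f, mutate):
    """Mutate every string, then overwrite the worst of the mutants with
    the previous best string, so that mutation can never lose it."""
    best = max(pop, key=f)                 # save before mutating
    mutants = [mutate(s) for s in pop]
    worst = min(range(len(mutants)), key=lambda i: f(mutants[i]))
    mutants[worst] = best                  # replace worst with saved best
    return mutants
```

Because the saved best replaces the worst mutant rather than a random one, the population's best function value is non-decreasing from generation to generation.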
5.2 Inversion and Crossover

We examined the effectiveness of the crossover and inversion routines (see the description of Version II). Crossover only occurred between the best strings in each subpopulation; we called this best-with-best cross-over. Since mutation can cause good alleles (coordinate values) to appear in a string and still make the string bad, best-with-best cross-over does not use all the potential of a given population. For this reason, and for other reasons based on our theoretical and intuitive concept of crossover, we tried crossing over the best strings with randomly selected strings, so that only part of the time are the best strings crossed with each other. With random mutation 2), and also when we used gradient mutation 1) with q = 1, we found that this improved performance, as shown in Table 4.

We also tested crossover and inversion where no mutation was present, so that no new alleles can be introduced by the mutation. Here one expects that the ultimate function value level attained is governed by the alleles available in the initial population, so the real question is whether crossover and inversion can operate to select the best of those initially available. That this is possible is indicated in Table 5, where the alleles present in the final population are no worse than the second best of those initially available. This bears on the effectiveness of crossover in bringing together "good" alleles. We examined the effectiveness of crossover in another way as well: Table 6 shows that Version IV using 2 crossovers per generation reaches the ultimate value levels with far fewer function evaluations than without the crossover routine. The effectiveness of inversion was tested similarly (Table 7).

TABLE 4
The number of generations required to reach the indicated level by Version I, using mutation 2) (uniform random), crossing over best with best (BB) vs. best with random (BR).

Function                 Value Attained      BB      BR
1. Spherical contours        -500            10       6
                             -400            19      15
                             -300            48      36
                             -200           190      75
                             -100          >400     190
3. Index squared          -10,000            16      16
                           -8,000            30      22
                           -6,000            50      34
                           -4,000           108      75
                           -2,000         >2700     200
4. Wood function              -15             2       7
                              -10             9       9
                              - 9            21      10
                              - 4            32      12
                              - 2          >600      15

TABLE 5
The effectiveness of Version IV with no mutation and 1 crossover per generation. Function 2. Index.

The two smallest* values of alleles available in the initial population, in the first four co-ordinates:

Co-ordinate:       1         2         3         4
               .3835    -.6048     .1774    -.6048
               .0488     .0488     .1181     .0976

After generation 12 only one string remained.

*Clearly, for the index function, the smallest are the best.

5.3 Comparison of Versions II and III

The motivation for constructing the Version III system was as follows. It seemed probable that the Version II system was carrying a lot of "excess baggage", i.e., that more function evaluations were being used per generation than was necessary. Thus it rapidly appeared that the size of the (total) population must be reduced. But the subpopulations were sized primarily to achieve the effectiveness of cross-over: were the four subpopulations reduced to say five or six strings each, cross-over would have been less effective. On the other hand, were the number of subpopulations reduced to two or three, only two or three inversion patterns would be maintained, and thus on a function some of whose variables are linked, inversion patterns would not be compared in general. How can the smaller population be achieved without subpopulations?

Doing away with subpopulations means that any strings may be crossed over. But this forces the question: what strings will be crossed over, and how? Suppose that only strings having the same inversion pattern may be crossed over, and suppose the function has n variables. Then there are essentially n!/2 different inversion patterns, since any permutation of the variables is an inversion pattern, but turning a pattern end for end preserves its clumpings (i.e., carries redundant information).
TABLE 6
Number of function evaluations required by Version IV with and without crossover (all other parameters fixed).

2. Index

Function Value Attained    Without Crossover    With Crossover*
   -6.03E2                        75                  225
   -3.4E2                        750                  450
   -1.65E2                      1500                 1125
   -6.87E1                      2250                 2475
   -2.05E1                      3000                 4500
   -8.07E0                      4500                 6750
   -3.19E0                      6000                 8100
   -9.99E-1                     7500               10,080
   -5.82E-1                     9000               10,800
   -3.59E-1                   10,500               11,210
   -2.68E-1                   12,000               11,700
   -1.80E-1                   15,000               12,150
   -7.7E-2                    22,500               13,300
   -4.9E-2                    30,000               13,300

*Using 2 consecutive crossovers.

TABLE 7
Number of generations required by Version I with and without inversion (all other parameters fixed).

5. Valleys

Function Value Attained    No Inversion    Inversion
   6.7                          10             10
   4.5                          30             30
   4.0                          40             30
   3.0                          60             50
   2.2                          90             60
   2.0                         150             70
   1.5                         170            100
   1.3                         260            140
   1.0                         380            150
    .9                         440            340
    .7                         520            360
    .6                         570            360

For functions with more than three or four variables, this cross-over would therefore take place very seldom in a set of strings having different inversion patterns. So we tried a more general kind of cross-over, applying to strings with different inversion patterns. As before, two pivot points are picked, but from the string with the better function value; the alleles of the worse string are then simply exchanged with the corresponding alleles of the better string, no matter where those alleles are in the string. This kind of cross-over allows unrestricted crossing over of two strings, with only a slight computing cost involved in finding the "corresponding alleles". It assumes that the string with the better function value usually has the better value alleles. Although this type of cross-over is only slightly worse at finding the right clumps, its consequences are more difficult to predict; the results obtained indicate it to be about half as effective as cross-over restricted to a single inversion pattern. The version III and IV systems are often difficult to analyze because they have more than a dozen parameters.
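The generalized cross-over described above can be sketched as follows, under the assumption (ours, for illustration) that each string stores its alleles in the order given by its own inversion pattern, a permutation of the locus indices:

```python
def generalized_crossover(better, bpat, worse, wpat, p1, p2):
    """Pick pivot points p1 <= p2 on the better string, find the
    'corresponding alleles' of those loci in the worse string via its
    inversion pattern, and give the worse string the better string's
    alleles there, no matter where the loci sit in it."""
    where = {locus: i for i, locus in enumerate(wpat)}  # locus -> position
    child = list(worse)
    for i in range(p1, p2):
        locus = bpat[i]                   # locus carried at position i
        child[where[locus]] = better[i]   # exchange into the worse string
    return child
```

The only extra cost relative to same-pattern cross-over is building the locus-to-position map for the worse string, which is the "slight computing cost" mentioned in the text.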
The purpose of leaving so many parameters open was that we wished to be able to test hypotheses which we had formulated as a result of our experience with the Version II system. These parameters have proved to be quite interdependent. That is, having found the optimal setting of all the parameters (one being arbitrary), we find that for any reasonable value of that one parameter, changing some of the other parameters frequently changes the optimal value for the one parameter by a large amount; but with the others fixed, the setting of the single parameter has a strong effect on the efficiency of the system. An important project for the future will be to chart the interrelations of the parameters involved. However, we were able to show that there were settings for which the performance of Version III was much superior to that of Version II (Table 8), thus justifying the change in system structure.

TABLE 8
The number of function evaluations required to reach the indicated value by Version II vs. Version III.

3. Index Squared

Value Attained    Version II    Version III
   8000                630           180
   4700              1,260           440
   3200              2,520           630
   2200              2,835         1,080
    950              4,400         1,710
    700              5,350         1,800
    500              7,550         2,250
    200             13,200         3,150
     80             18,300         3,780
     40             21,400         4,590
     10             34,300         6,930
      7             35,900         9,000

VI. Conclusions

If the reader, finding the results of our work to date sufficiently interesting, finds himself unable to formulate a clear statement of a class of algorithms amenable to analytical study and classification, let us assure him that we feel ourselves to be in the same position. We have constructed a complex which we feel to be highly interesting, but which at the same time is not readily amenable to theoretical analysis. Thus, without the benefit of theoretical guidance, we are reduced to stab-in-the-dark experimentation. Moreover, since a single optimization run takes hours to complete, our rate of progress in obtaining data is not as fast as our enthusiasm demands. Within these constraints, however, we have obtained a significant amount of data concerning the comparative behavior of genetic and conjugate gradient algorithms, and we have also come to some conclusions concerning the effectiveness of various subcomponents of the genetic algorithms. We have, for example, demonstrated optimization problems where incorporating the crossover and inversion operators individually actually does achieve improvement in the rate of convergence (Tables 5, 6, 7). Clearly much remains to be done in confirming or disconfirming these conclusions.[9]

[9] We look forward to a forthcoming book by J.H. Holland on adaptive systems for possible help in this direction. Also some preliminary analysis will appear in our report (Foo and Bosworth, 1972).

APPENDIX

The following is a more mathematical description of the adaptation used in the Version II system. An exact understanding of the Version I adaptation may be gained from the program listings; the same methods are used, but in the case of subpopulations each subpopulation has its own best string and its own averages, which would require double subscripting. To avoid this we consider just the Version II adaptation.

Let f_i denote the best (highest) function value in the population just before mutation i, where i has values between 1 and ten, and let f' denote the best function value just before the last adaptation. Let

    d_i = (f_i - f_{i-1}) / f_{i-1}    for i = 2, ..., i_0,

and

    d_1 = (f_1 - f') / f'.

Thus d_i is the percentage difference in best function value between generations i-1 and i. Let

    w_i = d_i (i+1)/2.
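The quantities d_i and w_i can be computed as in the following sketch (indexing shifted to zero-based; the function and argument names are ours):

```python
def history_weights(f_hist, f_prime):
    """Given the best values f_1..f_i0 for the generations since the last
    adaptation and f' (the best just before it), return the percentage
    differences d_i and the weights w_i = d_i * (i + 1) / 2, which give
    more weight to recent generations."""
    prev = [f_prime] + f_hist[:-1]
    d = [(fi - fp) / fp for fi, fp in zip(f_hist, prev)]
    w = [di * (i + 2) / 2 for i, di in enumerate(d)]  # report's i = index + 1
    return d, w
```

The factor (i+1)/2 grows with i, so the most recent generation in the history contributes most to the weighted averages used below.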
Let a_i denote the average, over the strings mutated in generation i, of the mutation averages b_k defined below,[10] and let a' denote the value to which the a_i would correspond over an infinite number of trials. The theoretical value of a' is .5, since the r_j and r'_j below have absolute values uniformly distributed between 0 and 1. Let a* and a_* denote respectively the maximum and minimum of the a_i's, including a' but excluding a_{i_0}.

[10] When a string is mutated using one of the methods which use the adaptation parameter l, a random number of coordinates are chosen to be mutated, say i (1 <= i <= n). If the "cubic" mutation is used, an r_j is chosen for the jth coordinate if it is to be mutated, where r_j is in [-1,1]. The absolute values |r_j| of these i numbers are averaged, the average being b_k, where the string mutated was the kth string to be mutated that generation. If the "quadratic" mutation is used, an r_j and an r'_j (2i numbers in all) are chosen for the jth coordinate if it is to be mutated, the average being b_k as before. Let a_i denote the average of all the b_k in the ith generation.

A new value l' of the adaptation parameter is computed from this history; l could be changed by no more than one half per adaptation, and it is kept within initialized limits l_0 <= l' <= l_1: if l' is greater than l_1, l is replaced by l_1, and if the new l' is less than l_0, l is replaced by l_0. The limits on l are arbitrary. The method is a Bayesian approximation, applied each generation, based on the assumption that information stored from before the last reset of l is of negligible value; if l was reset in the stored history, the method is not used (obviously, since the stored a_i do not pertain to the current l) and the next method under consideration is adapted instead.

The following approximation is made when adapting the probability vector of the mutation methods, under the same assumption about usable information. Let p be the probability (found from the probability vector) of the mutation method under consideration, and let k_i be the number of strings in generation i to which this method was applied. If m strings are mutated each generation, then over an infinite number of trials the k_i should average p*m. Thus let

    p' = ( sum_{i=1}^{i_0} w_i k_i + p m ) / ( m ( sum_{i=1}^{i_0} w_i + 1 ) ),

so that if every k_i were equal to p*m, the probability would be left unchanged. As with l, which could be changed by no more than one half, p could be changed by no more than one tenth; the value in the probability vector is then set to p'.

The "probabilities" in the probability vector were not normalized, and they had upper and lower limits as to numerical size: no value in the vector could be greater than 20.0, or less than 0.5 or 0.1 in the Version I and Version II systems respectively. You can easily see that this has no effect on the above computations, since the p used there is a true probability derived from the probability vector.

VII. References

Athans, M. and Falb, P.L. (1966) "Optimal Control: An Introduction to the Theory and Its Applications." McGraw-Hill.

Bagley, J.D. (1967) "The Behavior of Adaptive Systems Which Employ Genetic and Correlation Algorithms." Doctoral Thesis, Department of Computer and Communication Sciences, The University of Michigan.

Bellman, R. (1959) Adaptive Control Processes: A Guided Tour. Princeton University Press.

Bremermann, H.J.; Rogson, M.; Salaff, S. (1966) "Global Properties of Evolution Processes." In Natural Automata and Useful Simulations. Spartan Books.

Brioschi, F. and Locatelli, A.F. (1967) "Extremization of Constrained Multivariable Functions: Structural Programming." IEEE Trans. on Sys. Sci. and Cyb., SSC-3, 2.

Cavicchio, D.J. (1970) "Adaptive Search Using Simulated Evolution." Doctoral Thesis, Department of Computer and Communication Sciences, The University of Michigan.

Cohen, A.I.
(1971) "Rate of Convergence of Several Conjugate Gradient Algorithms." Fifth Annual Princeton Conference on Information Science and Systems.

Cragg, E.E. and Levy, A.V. (1969) "Study on a Supermemory Gradient Method for the Minimization of Functions." Journal of Optimization Theory and Application, 4, 3.

Davidon, W.C. (1966) Variable Metric Method for Minimization. Argonne Nat. Lab. ANL-5990 (Rev. 2).

Feld'baum, A.A. (1966) Optimal Control Systems. Academic Press.

Fletcher, R. and Reeves, C.M. (1964) "Function Minimization by Conjugate Gradients." The Computer J., 7, pp. 149-154.

Fletcher, R. and Powell, M.J.D. (1963) "A Rapidly Convergent Descent Method for Minimization." The Computer J., 6, pp. 163-168.

Flood, M.M. and Leon, A. (1963) "A Direct Search Code for the Estimation of Parameters in Stochastic Learning Models." Mental Health Research Institute, The University of Michigan, Preprint 109.

Fogel, L.J.; Owens, A.J.; Walsh, M.J. (1966) Artificial Intelligence Through Simulated Evolution. John Wiley and Sons, Inc.

Foo, N. and Bosworth, J.L. (1972) "Algebraic, Geometric, and Stochastic Aspects of Genetic Operators." NASA CR-2099.

Hall, C.D. and Ratz, H.C. (1967) "The Automatic Design of Fractional Factorial Experiments for Adaptive Process Optimization." Information and Control, 11, pp. 505-527.

Hartmanis, J. and Stearns, R.E. (1969) "Computational Complexity." Information Sciences.

Hill, J.D. (1969) "A Search Technique for Multimodal Surfaces." IEEE Trans. on Sys. Sci. and Cyb., SSC-3, January.

Holland, J.H. (1969) "A New Kind of Turnpike Theorem." Bulletin of the American Math. Soc., 75, 6.

Hollstien, R.B. (1971) "Artificial Genetic Adaptation in Computer Control Systems." Doctoral Thesis, Department of Computer Information and Control Engineering, The University of Michigan.

Kalman, R.E.; Falb, P.L.; Arbib, M.A. (1969) Topics in Mathematical System Theory. McGraw-Hill Book Co.

Lasdon, L.S. (1971) Optimization Theory for Large Systems. MacMillan.

Leon, A. (1965a) "A Classified Bibliography on Optimization." In Recent Advances in Optimization Techniques, Lavi, A. and Vogl, T.P., eds. John Wiley and Sons.

Leon, A. (1965b) "A Comparison Among Eight Known Optimization Procedures." In Recent Advances in Optimization Techniques, Lavi, A. and Vogl, T.P., eds. John Wiley and Sons.

Luenberger, D.G. (1969) Optimization by Vector Space Methods. John Wiley and Sons.

Mayr, E. (1965) Animal Species and Evolution. Harvard University Press, Cambridge.

Miele, A. and Cantrell, J.W. (1969) "Study on a Memory Gradient Method for the Minimization of Functions." Journal of Optimization Theory and Application, 3, 6.

Mishkin, E. and Braun, L. (1961) Adaptive Control Systems. McGraw-Hill Book Co.

Pearson, J.D. (1968) "Decomposition in Multivariable Systems." IEEE Trans. on Sys. Sci. and Cyb., SSC-4, 1.

Pearson, J.D. (1969) "Variable Metric Methods." The Computer Journal, 12, 2.

Polak, E. (1971) Computational Methods in Optimization - A Unified Approach. Academic Press.

Rastrigin, L.A. (1963) "The Convergence of the Random Search Method in the Extremal Control of a Many Parameter System." Automation and Remote Control, 24, pp. 1337-1342.

Rosenberg, R. (1967) "Simulation of Genetic Populations with Biochemical Properties." Doctoral Thesis, Department of Computer and Communication Sciences, The University of Michigan.

Rosenbrock, H.H. (1960) "An Automatic Method for Finding the Greatest or Least Value of a Function." The Computer Journal, 3, pp. 175-184.

Schumer, M.A. and Steiglitz, K. (1968) "Adaptive Step Size Random Search." IEEE Trans. on Aut. Control, AC-13, 3.

Shekel, J. (1971) "Test Functions for Multimodal Search Techniques." Fifth Annual Princeton Conference on Information Science and Systems.

Spang, H.A. (1962) "A Review of Minimization Techniques for Non-Linear Functions." SIAM Review, 4, pp. 343-365.

Sworder, D. (1966) Optimal Adaptive Control Systems. Academic Press.

Wilde, D.J. (1964) Optimum Seeking Methods. Prentice-Hall.

Wood, C.F. (1964) "Review of Design Optimization Techniques." Westinghouse Research Laboratories, Science Paper 64-SC4-361-P1.

Zeigler, B.P. (1969a) "On the Feedback Complexity of Automata." Doctoral Thesis, Department of Computer and Communication Sciences, The University of Michigan.

Zeigler, B.P. (1969b) "On the Feedback Complexity of Automata." Proceedings of The Third Annual Princeton Conference on Information Science and Systems.

NASA-Langley, 1972    CR-2093