A Cost-benefit-based Adaptation Scheme for Multimeme Algorithms

Wilfried Jakob
Forschungszentrum Karlsruhe GmbH, Institute for Applied Computer Science
P.O. Box 3640, 76021 Karlsruhe, Germany
[email protected]

Abstract. Memetic Algorithms are the most frequently used hybrid of Evolutionary Algorithms (EA) for real-world applications. This paper deals with one of the most important obstacles to their wide usage: compared to pure EA, the number of strategy parameters which have to be adjusted properly is increased. A cost-benefit-based adaptation scheme suited for every EA is introduced, which leaves only one strategy parameter to the user, the population size. Furthermore, it is shown that the range of feasible sizes can be reduced drastically.

1 Motivation

Almost all practical applications of Evolutionary Algorithms use some sort of hybridisation with other algorithms like heuristics or local searchers, frequently in the form of a Memetic Algorithm (MA)¹. MAs integrate local search into the offspring production part of an EA and thus introduce additional strategy parameters controlling, among others, the frequency and intensity of the local search [2, 3]. The benefit of this approach is a speed-up of the resulting hybrid, usually by factors. The drawback is the greater number of strategy parameters which have to be adjusted properly [2, 3, 4]. The necessary tuning of strategy parameters is one of the most important obstacles to the broad application of EAs to real-world problems. This can be summarised by the following statement: despite the wide scope of application of Evolutionary Algorithms, they are not widely applied. To overcome this situation, a reduction of the MA strategy parameters to be adjusted manually is urgently required.

2 Introduction

To enhance the applicability of MAs, they should either adapt their strategy parameters or the parameters should be fixed to the greatest extent possible. Another point is the usage of application-independent local searchers (LS), as they maintain the general usability of the MA. In this paper a cost-benefit-based adaptation scheme suited for every EA is introduced. It is applied to an MA for parameter optimisation, which uses two application-independent local searchers to maintain generality.

The idea of cost-benefit-based adaptation was first published in 2004 [4, 5] and in a more elaborated form in 2006 [6]. In this paper an improved version of the adaptation scheme is presented, together with a more detailed analysis of the experimental results than was possible in [6]. To this end, the section about related work is only summarised here and the interested reader is referred to the detailed discussion in [6]. Other researchers dealt with dynamic adaptation which uses some sort of co-evolution to adjust the strategy parameters [7, 8], or they worked in other application fields, especially combinatorial optimisation [8, 9]. As it is not known a priori which local searcher or meme suits best or is feasible at all, some researchers also tackled the problem of adaptive meme construction [7] or the usage of several memes [7, 8, 10].

¹ See [1] and reports about real-world applications at the PPSN, ICGA, or GECCO conference series.
In [6] it is shown in detail how the work presented here fits into the gap left by previous work and why co-evolution is not considered the method of first choice.

Section 3 introduces the cost-benefit-based adaptation scheme and its extension with respect to the one used in [6]. Section 4 contains a short introduction of the basic algorithms used for the experiments. In Sect. 5 the test cases are highlighted briefly, the old and new strategy parameters are compared, and the experimental results are discussed in detail. From this, a common parameterisation is derived. The paper concludes with a summary and an outlook.

3 Concept of Cost-benefit-based Adaptation

The basic idea is to use the costs (measured in evaluations) caused by an LS run and the benefit (measured in fitness gain) obtained from it to control the selection of an LS out of a given set, the intensity of its search, and the frequency of its usage. Suited LS must therefore have an externally controllable termination parameter like an iteration limit or, even better, a convergence-based termination threshold.

Firstly, the adaptive mechanism is described for the parameter adjustment. For the fitness gain a relative measure is used, because a certain amount of fitness improvement is much easier to achieve at the beginning of a search than at the end. The relative fitness gain rfg is based on a normalised fitness function in the range of 0 to fmax, which turns every task into a maximisation problem. rfg is the ratio between the achieved fitness improvement (fLS - fevo) and the maximum possible one (fmax - fevo), as shown in (1), where fLS is the fitness obtained by the LS and fevo the fitness of the offspring as produced by the evolution:

    rfg = \frac{f_{LS} - f_{evo}}{f_{max} - f_{evo}}    (1)

For each parameter Pn a set of levels is defined, each of which has a probability p and a value v containing an appropriate value of that particular parameter. Three of these levels are always active, i.e. have a probability p greater than zero. For each active level the required evaluations eval and the obtained rfg are recorded per LS usage and summed up. The probabilities of the levels are adjusted if either each level was used at minimum usageL,min times or they all have been used usageL,max times in total since the last adjustment. The new relation among the active levels L1, L2, and L3 is calculated as shown in (2):

    \frac{\sum_i rfg_{P_n,L_1,i}}{\sum_i eval_{P_n,L_1,i}} : \frac{\sum_j rfg_{P_n,L_2,j}}{\sum_j eval_{P_n,L_2,j}} : \frac{\sum_k rfg_{P_n,L_3,k}}{\sum_k eval_{P_n,L_3,k}}    (2)

The sums are reset to zero after the adjustment, so that the adaptation is faster. If the probability of the lowest or highest active level exceeds a threshold value of 0.5, the next lower or higher level, respectively, is added. The level at the opposite end is dropped and its probability is added to its neighbour. The new level is given a probability of 20% of the sum of the probabilities of the other two levels. This causes a move of three consecutive levels along the scale of possible ones, according to their performance as determined by the achieved fitness gain and the required evaluations. To ensure mobility in both directions, none of the three active levels may have a probability below 0.1.

An example of an upward movement of the three active levels is shown in Fig. 1. It is triggered because p of level 5 has become too large (first row). Level 3 is deactivated (p = 0) and level 4 inherits its probability, totalling now p = 0.4 (second row). Finally, the new level 6 receives 20% of the probabilities of the two others, as shown in the third row.

Fig. 1. The three phases of a level movement. Active levels are marked by a grey background. p denotes the probability and v the parameter value of a level.
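To make the mechanism concrete, the following Python sketch illustrates Eqs. (1) and (2) and the level movement of Fig. 1. It is a minimal sketch under the assumption that probabilities are renormalised after each adjustment; all names (relative_fitness_gain, LevelSet, etc.) are illustrative and this is not HyGLEAM's actual implementation. The bookkeeping of the usageL,min/usageL,max trigger is omitted for brevity.

```python
import random

def relative_fitness_gain(f_ls, f_evo, f_max):
    """Eq. (1): fitness gained by the LS relative to the maximum possible gain."""
    return (f_ls - f_evo) / (f_max - f_evo)

class LevelSet:
    """Three active levels moving along the scale of values of one parameter Pn."""

    def __init__(self, values):
        self.values = values          # e.g. the thR values of Table 2
        self.lo = 0                   # index of the lowest active level
        self.probs = [1 / 3] * 3      # probabilities of the three active levels
        self.rfg_sum = [0.0] * 3      # accumulated benefit (rfg) per level
        self.eval_sum = [0] * 3       # accumulated cost (evaluations) per level

    def sample(self):
        """Pick an active level according to its probability, return its value."""
        i = random.choices(range(3), weights=self.probs)[0]
        return i, self.values[self.lo + i]

    def record(self, i, rfg, evals):
        """Book the outcome of one LS run that used active level i."""
        self.rfg_sum[i] += rfg
        self.eval_sum[i] += evals

    def adjust(self):
        """Eq. (2): new probability relation = gain per evaluation, per level."""
        ratios = [g / e if e > 0 else 0.0
                  for g, e in zip(self.rfg_sum, self.eval_sum)]
        total = sum(ratios) or 1.0
        # keep every active level at or above 0.1 to preserve mobility
        self.probs = [max(r / total, 0.1) for r in ratios]
        self._normalise()
        self.rfg_sum, self.eval_sum = [0.0] * 3, [0] * 3  # reset: faster adaptation
        # move the window if an outer level exceeds the 0.5 threshold (Fig. 1)
        if self.probs[2] > 0.5 and self.lo + 3 < len(self.values):
            # drop the lowest level, merge its probability into its neighbour,
            # activate the next higher level with 20% of the others' sum
            self.probs = [self.probs[0] + self.probs[1], self.probs[2], 0.0]
            self.probs[2] = 0.2 * (self.probs[0] + self.probs[1])
            self.lo += 1
        elif self.probs[0] > 0.5 and self.lo > 0:
            # symmetric downward movement
            self.probs = [0.0, self.probs[0], self.probs[1] + self.probs[2]]
            self.probs[0] = 0.2 * (self.probs[1] + self.probs[2])
            self.lo -= 1
        self._normalise()

    def _normalise(self):
        s = sum(self.probs)
        self.probs = [p / s for p in self.probs]
```

In such a design, one LevelSet instance would be kept per adaptively controlled parameter, e.g. one for the thR values and one each for limitR and limitC (cf. Table 2 in Sect. 5.2).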
The same procedure can be used to adjust the probabilities of the involved LSs. Again, the required evaluations eval as well as rfg are recorded and summed up for each LS usage. The new relation of the LS probabilities is computed in the same way as for the active levels, if either each LS was used at minimum usageLS,min times or there have been matingsmax matings in total since the last adjustment. If the probability of one LS drops below Pmin for three consecutive alterations, it is ignored from then on. To avoid premature deactivation, the probability is set to Pmin the first time it falls below Pmin. For the experiments, Pmin was set to 0.1.

This simple LS adaptation scheme was used for the investigations reported in [6]. As erroneous deactivations of an LS were observed, the adaptation speed had to be reduced by comparably high values for the re-adjustment thresholds. As a consequence, the extended LS adaptation procedure incorporates the old distribution by summing up one third of the old probabilities and two thirds of the newly calculated ones, thus yielding the new likelihood of each LS.

For EAs that create more than one offspring per mating, such as the one used here, a choice must be made between locally optimising the best offspring only (called best-improvement) or a fraction of up to all of these offspring (called all-improvement). This is controlled adaptively in the following way: the best offspring always undergoes LS improvement, and for its siblings the chance of being treated by the LS is adaptively adjusted as described before, with the following peculiarity: after having processed all selected offspring, fLS is estimated as the fitness of the best locally improved child and fevo as that of the best offspring from pure evolution.
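The extended LS selection adaptation can be sketched as follows. Function and variable names are illustrative assumptions, and the bookkeeping of the usageLS,min/matingsmax trigger is again omitted; after each adjustment the rfg and eval sums would be reset to zero, as with the level adaptation.

```python
P_MIN = 0.1   # deactivation threshold used in the experiments

def adapt_ls_probabilities(probs, rfg_sums, eval_sums, low_counts):
    """probs: current LS probabilities (0.0 = already deactivated).
    rfg_sums, eval_sums: benefit and cost accumulated per LS since the
    last adjustment. low_counts: consecutive adjustments spent below P_MIN."""
    ratios = [g / e if e > 0 else 0.0 for g, e in zip(rfg_sums, eval_sums)]
    total = sum(ratios) or 1.0
    fresh = [r / total for r in ratios]      # newly calculated distribution, Eq. (2)
    new_probs, new_lows = [], []
    for old, new, low in zip(probs, fresh, low_counts):
        if old == 0.0:                       # a deactivated LS stays ignored
            new_probs.append(0.0)
            new_lows.append(low)
            continue
        p = old / 3.0 + 2.0 * new / 3.0      # extended adaptation: 1/3 old, 2/3 new
        if p < P_MIN:
            low += 1
            p = 0.0 if low >= 3 else P_MIN   # lifted to P_MIN first, dropped after three
        else:
            low = 0
        new_probs.append(p)
        new_lows.append(low)
    s = sum(new_probs) or 1.0                # renormalise over the remaining LSs
    return [p / s for p in new_probs], new_lows
```

The blending of old and new distributions damps single unlucky adjustment periods, which is why the extended scheme tolerates a higher adaptation speed than the simple one.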
4 Basic Algorithms

As EA, GLEAM (General Learning Evolutionary Algorithm and Method) [11] is used. It is an EA of its own, combining aspects of Evolution Strategies and Genetic Algorithms with the concept of abstract data types for the easy formulation of genes and chromosomes. A detailed and updated description can be found in [12]. GLEAM uses ranking-based selection, elitist offspring acceptance, and a structured population based on a neighbourhood model [13] that causes an adaptive balance between exploration and exploitation and avoids premature convergence. This is achieved by maintaining niches within the population for a longer period of time, which sustains diversity. Hence, GLEAM can be regarded as a more powerful EA than simple ones, which makes it harder to reach an improvement by adding local search. On the other hand, if an improvement can be achieved by adding and applying memes adaptively, then at least the same benefit, if not a greater one, can be expected when using a simpler EA.

As local searchers, two well-known procedures from the sixties, the Rosenbrock and the Complex algorithms, are used, since they are known to be powerful local search procedures. As they are derivative-free and able to handle restrictions, they maintain general usability. The implementation is based on Schwefel [14], who gives a detailed description of both algorithms together with experimental results. Hereinafter, the algorithms are abbreviated by R and C. GLEAM and the two LS form HyGLEAM (Hybrid General-purpose Evolutionary Algorithm and Method).

Apart from the basic procedures, HyGLEAM contains a simple MA (SMA), consisting of one meme (local searcher), and an adaptive Multimeme Algorithm (AMMA) using both LS. Earlier research showed that Lamarckian evolution, where the chromosomes are updated according to the LS improvement, performs better than evolution without updates [3, 4, 5]. The danger of premature convergence, which was observed by other researchers, is avoided by the neighbourhood model used. Hence, Lamarckian evolution was used for the experiments with the AMMA.

5 Experimental Results

5.1 Test Cases

Five test functions taken from the GENEsYs collection [15] and two real-world problems [16, 17] were used, see Table 1. Due to the lack of space, they are described very briefly only, and the interested reader is referred to the given literature. Rotated versions of Shekel's Foxholes and the Rastrigin function were employed in order to make them harder, see [4, 5]. The scheduling task is solved largely by assigning start times to production batches, so that the combinatorial aspect is limited to solving conflicts arising from assignments of the same time.

Table 1. Important properties of the test cases used. fi are the function numbers of [15]

Test Case                        Parameter  Modality    Implicit Restrict.  Range               Target Value
Schwefel's Sphere [15, f1]       30 real    unimodal    no                  [-5*10^6, 5*10^6]   0.01
Shekel's Foxholes [15, f5]       2 real     multimodal  no                  [-500, 500]         0.998004
Gen. Rastrigin f. [15, f7]       5 real     multimodal  no                  [-5.12, 5.12]       0.0001
Fletcher & Powell f. [15, f16]   5 real     multimodal  no                  [-3.14, 3.14]       0.00001
Fractal f. [15, f13]             20 real    multimodal  no                  [-5, 5]             -0.05
Design optimisation [16]         3 real     multimodal  no
Scheduling + resource opt. [17]  87 int.    multimodal  yes

5.2 Strategy Parameters

The adaptive Multimeme Algorithm (AMMA) controls meme selection and five strategy parameters which affect the intensity of local search (thR, limitR, and limitC) and the frequency of meme application (all-imprR and all-imprC), as shown in Table 2. thR is a convergence-based termination threshold of the Rosenbrock procedure, while limitR and limitC simply limit the LS iterations. The two all-impr parameters control the probabilities of applying the corresponding meme to the siblings of the best offspring in case of adaptive all-improvement, see also Sect. 3.

Table 2. Adaptively controlled strategy parameters of the Multimeme Algorithm (AMMA)

Strategy Parameter      Values Used for the Experiments
thR                     10^-1, 10^-2, 10^-3, 10^-4, 10^-5, 10^-6, 10^-7, 10^-8, 10^-9
limitR, limitC          100, 200, 350, 500, 750, 1000, 1250, 1500, 1750, 2000
all-imprR, all-imprC    0, 0.2, 0.4, 0.6, 0.8, 1.0

The usage and matings parameters control the adaptation speed as shown in Table 3. The experiments reported in [6] motivated separate adaptation for different fitness ranges. The analysis of these experiments showed that the number of these ranges is of minor influence. Consequently, the effects of unseparated adaptation and of separate adaptation using three fitness ranges (0-40%, 40-70%, and 70-100% of fmax), called common and separated adaptation, respectively, are compared in the experiments reported here. Together with the already introduced strategy parameters, this results in four new strategy parameters, as shown in Table 4, and a crucial question for the experiments is whether they can be set to common values without any relevant loss of performance.
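For illustration, the parameter levels of Table 2 and the adaptation-speed settings of Table 3 (below) could be written down as plain data and fed into a level-adaptation routine like the sketch in Sect. 3. The data structure is a hypothetical rendering, not HyGLEAM's configuration format.

```python
# Parameter levels of Table 2, as they could seed one LevelSet each.
TH_R_LEVELS = [10 ** -e for e in range(1, 10)]        # thR: 10^-1 ... 10^-9
LIMIT_LEVELS = [100, 200, 350, 500, 750,
                1000, 1250, 1500, 1750, 2000]         # limitR and limitC
ALL_IMPR_LEVELS = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]      # all-imprR and all-imprC

# Adaptation-speed settings of Table 3:
# (usageLS_min, matings_max) trigger the meme selection adjustment,
# (usageL_min, usageL_max) trigger the parameter level adjustment.
ADAPTATION_SPEEDS = {
    "fast":   {"usageLS_min": 3, "matings_max": 15, "usageL_min": 3, "usageL_max": 12},
    "medium": {"usageLS_min": 5, "matings_max": 20, "usageL_min": 4, "usageL_max": 15},
    "slow":   {"usageLS_min": 8, "matings_max": 30, "usageL_min": 7, "usageL_max": 25},
}
```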
Table 3. Strategy parameter settings for the adaptation speed

Adaptation   Meme Selection               Parameter Adaptation
Speed        usageLS,min   matingsmax    usageL,min   usageL,max
fast         3             15            3            12
medium       5             20            4            15
slow         8             30            7            25

Table 4. Strategy parameters of the SMA and the AMMA. The new ones are in italics.

                                              Relevance and Treatment
Strategy Parameter                            SMA       AMMA
population size µ                             manual    manual
Lamarckian or Baldwinian evolution            manual    Lamarckian evolution
best- or static all-improvement               manual    -
best- or adaptive all-improvement             -         manual
LS selection                                  manual    adaptive
LS iteration limit                            5000      adaptive
Rosenbrock termination threshold thR          manual    adaptive
probability of the adaptive all-improvement   -         adaptive
adaptation speed                              -         manual
simple or extended LS adaptation              -         manual
separate or common adaptation                 -         manual

5.3 Experimental Results

An algorithm together with a setting of its strategy parameters is called a job, and the comparisons are based on the average number of evaluations from one hundred runs per job. Where necessary, a t-test (95% confidence) is used to distinguish significant differences from stochastic ones. Only those jobs are taken into account whose runs are all successful, i.e. reach the target values of Table 1 or a given solution quality in the case of the real-world problems.
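This comparison criterion can be made concrete with a small sketch. The paper only states that a t-test at the 95% confidence level is applied to the evaluation counts of one hundred runs per job; the use of Welch's variant, the SciPy call, and the function names below are assumptions for illustration.

```python
# Sketch of the job comparison: mean evaluations of 100 runs per job, with a
# t-test at 95% confidence to separate significant from stochastic differences.
from scipy import stats

def compare_jobs(evals_a, evals_b, alpha=0.05):
    """evals_a, evals_b: evaluations needed by each run of the two jobs."""
    mean_a = sum(evals_a) / len(evals_a)
    mean_b = sum(evals_b) / len(evals_b)
    t, p = stats.ttest_ind(evals_a, evals_b, equal_var=False)  # Welch's t-test (assumed)
    return mean_a, mean_b, p < alpha   # True if the difference is significant
```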
Table 5 compares the results for the basic algorithms² and Table 6 shows the results of the two SMAs. An important result is that the wide range of best population sizes, ranging from 20 to 11,200, is narrowed down by the best SMAs to 5 to 70. Further results are that the choice of the best SMA as well as good values for thR are application-dependent and that static all-improvement is better in two cases. Moreover, all-improvement is crucial to the success of the Rastrigin function. In all cases an improvement by factors can be achieved. In this respect the sphere function and the design optimisation are exceptional, as they are solved during the first generation by the SMA-R or SMA-C, respectively. As most real-world applications are of a multimodal nature, the efforts are geared to this case and the sphere function is chosen for checking the obtained results with a challenging unimodal test function. Thus, these results are not bad. But for the design optimisation, this outcome means that this task is of limited value for evaluating the adaptation scheme.

² The results presented here differ from those reported in [3-6] in two cases. To better show the effects of adaptation, the sphere function is parameterised now in such a way that GLEAM can just solve it and the Rosenbrock procedure misses the 100% success rate with common values for thR. Secondly, a different GLEAM job is used for Shekel's foxholes as a reference, because it has a much better confidence interval and is nearly as good as the old job.

Table 5. Best jobs of the basic algorithms together with the population size µ and thR. Abbreviations: CI: confidence interval for 95% confidence, Succ.: success rate

                    GLEAM                        Rosenbrock Proc.         Complex Alg.
Test Case   µ       Evaluations  CI              thR     Succ.  Eval.     Succ.  Eval.
Sphere      120     37,964,753   974,707         10^-8   100    4,706     0      5,000
Fletcher    600     483,566      191,915         10^-8   22     5,000     26     658
Scheduling  1,800   5,376,334    330,108         0.6     0      3,091     0      473
Foxholes    300     108,435      8,056           10^-4   5      133       0      95
Rastrigin   11,200  3,518,702    333,311         -       0      -         0      5,000
Fractal     20      195,129      20,491          -       0      -         0      781
Design      210     5,773        610             10^-6   15     891       12     102

Table 6. Best jobs of both SMAs. Confidence intervals are given for jobs better than GLEAM only. Abbreviations: b/a: best- or static all-improvement, L: rate of Lamarckian evolution

            Rosenbrock-MA (SMA-R)                       Complex-MA (SMA-C)
Test Case   µ    thR    b/a  L    Eval.     CI          µ    b/a  L    Eval.      CI
Sphere      20   10^-6  b    100  5,453     395         -    -    -    -          -
Fletcher    10   10^-4  b    100  13,535    1,586       5    b    100  4,684      812
Scheduling  5    0.6    b    100  69,448    10,343      20   b    0    8,831      2,073
Foxholes    30   10^-2  a    100  10,710    1,751       -    -    -    -          -
Rastrigin   70   10^-2  a    100  315,715   46,365      150  a    100  3,882,513  -
Fractal     5    10^-2  b    100  30,626    3,353       10   b    100  1,065,986  -
Design      10   10^-4  b    5    4,222     724         5    b    100  1,041      141

Next, the best jobs of adaptive adaptation are compared to those obtained with the SMAs. Two opposite effects can be expected from adaptation: on the one hand, improvement because of better-tuned strategy parameters, and on the other hand, impairment, because adaptation means learning and learning causes some additional expense. And, in fact, both effects can be observed, as is shown in Fig. 2. The Rosenbrock procedure turns out to be the absolutely dominating meme in the end phase of the runs for all test cases, with two exceptions, the Foxholes function (0.78%) and the design optimisation (0.49%). The improved test cases all use a thR value of 0.01, while the worsened ones require lower values, as shown in Table 6. As the adaptation starts with values of 0.1 to 0.001, a longer phase of adaptation is required to decrease thR and to increase limitR, which is necessary to benefit from lower thR thresholds. This explanation of the observed impairments is checked for the most drastic case, the sphere function, by starting with medium levels for both parameters. As a result, the impairment factor can be reduced to 3.3.

The effect of the extended LS adaptation as described in Sect. 3 is also shown in Fig. 2. Unfortunately, the only two cases with significant differences show a different behaviour. In all other cases a minimal, but insignificant improvement is observed. A more detailed analysis of the results than can be presented here shows that the danger of erroneous deactivation of a local searcher is considerably lowered when the extended LS adaptation is used. Thus, the adaptation speed can be increased, resulting in more adaptations and a faster and better parameter adaptation. To interpret Fig. 2 correctly, it must be kept in mind that the performance of the SMAs is the result of a time-consuming and cumbersome tuning by hand.

Fig. 2. Comparison of the improvement or impairment factors based on the average efforts between two variants of the AMMA and the best SMA jobs. For a better presentation, impairments are shown as negative factors.

Up to now, the comparisons have been based on all- and best-improvement, although tasks as multimodal as the Rastrigin function require all-improvement. Hence, a generally applicable AMMA must use it. Fig. 3 shows the differences between best-, static all-, and adaptive all-improvement. The latter outperforms the two others with two exceptions: Fletcher's function, where the differences are not significant, and the unimodal sphere function, where all-improvement is not suited at all.
Fig. 3 also shows that adaptive all-improvement is even superior to best-improvement in three cases, namely Shekel's Foxholes, the fractal function, and in particular the Rastrigin function. On the other hand, it can also be oversized, as in the case of Fletcher's function or the scheduling task. Thus, the usage of adaptive all-improvement is not only motivated by its necessity for the Rastrigin function.

Apart from the performance aspects, the stability or robustness of the results must be considered as well. Hereinafter, jobs where only one or a few population sizes yield successful runs are omitted as stability-lacking jobs. A comparison of the remaining strategy parameters reveals the following facts. Using common adaptation (cf. Sect. 5.2), there is at minimum one test case with jobs lacking stability for all adaptation speeds. Using separated adaptation according to three fitness ranges (cf. Sect. 5.2) instead, together with extended LS adaptation (cf. Sect. 3), best results are reached with fast adaptation at a good stability. Together with adaptive all-improvement, this is the recommended common parameterisation. Leaving the exceptional cases of the sphere function and the design task aside, 88.3% of the performance of the best hand-tuned AMMA can be reached on average, and 75.7% when considering all test cases. Fig. 4 and Table 7 summarise and compare these results.

Fig. 3. Comparison of best- and static all-improvement (SMA) and adaptive all-improvement (AMMA). For a better overview, the results of the fractal and the sphere function are scaled by 0.1, those of the Rastrigin function and the scheduling task by 0.01. Furthermore, one bar of the Rastrigin function is not displayed completely.

Fig. 4. Comparison of the best SMA and AMMA jobs. Empty fields indicate an insufficient success rate (below 100%), while flat fields denote the fulfilment of the optimisation task, but with greater effort than the basic EA. The given improvements are the ratio between the means of the evaluations of the basic EA and the compared algorithms.

Table 7. Results for the best and the recommended AMMA. Abbreviations: see Table 5

            Best AMMA                    Recommended AMMA
Test Case   µ     Eval.     CI           µ     Eval.     CI
Sphere      5     54,739    3,688        30    194,653   17,094
Fletcher    10    11,451    1,644        5     12,599    2,486
Scheduling  20    251,917   22,453       20    306,843   51,595
Foxholes    30    5,467     1,086        50    5,673     916
Rastrigin   120   257,911   31,818       70    300,551   73,152
Fractal     10    19,904    2,801        5     23,075    2,562
Design      5     1,383     210          5     1,581     384

6 Conclusions and Outlook

A common cost-benefit-based adaptation procedure for Memetic Algorithms was introduced, which controls both meme selection and the balance between global and local search. The latter is achieved by adapting the intensity of local search and the frequency of its usage. A common good parameterisation was given for the new strategy parameters of the adaptation scheme, leaving only one parameter to be adjusted manually: the population size µ. In the experiments, the useful range of µ could be narrowed down from between 20 and 11,200 to between 5 and 70. A value of 20 can be recommended to start with, provided that the task is assumed to be not very complex, with not too many suboptima. Further investigations will take more test cases into account and aim at the extension of the approach to combinatorial tasks, namely job scheduling and resource allocation in the context of grid computing.

References

1. Davis, L. (ed.): Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York (1991)
2. Hart, W.E., Krasnogor, N., Smith, J.E. (eds.): Recent Advances in Memetic Algorithms. Studies in Fuzziness and Soft Computing, vol. 166, Springer, Berlin (2005)
3. Jakob, W.: HyGLEAM – An Approach to Generally Applicable Hybridization of Evolutionary Algorithms. In: Merelo, J.J., et al. (eds.): Conf. Proc. PPSN VII, LNCS 2439, Springer, Berlin (2002) 527-536
4. Jakob, W.: A New Method for the Increased Performance of Evolutionary Algorithms by the Integration of Local Search Procedures. PhD thesis (in German), Dept. Mech. Eng., University of Karlsruhe, FRG, FZKA 6965, March 2004, see also: http://www.iai.fzk.de/~jakob/HyGLEAM/main-gb.html
5. Jakob, W., Blume, C., Bretthauer, G.: Towards a Generally Applicable Self-Adapting Hybridization of Evolutionary Algorithms. In: Conf. Proc. GECCO 2004, LNCS 3102, Springer, Berlin (2004) 790-791 and vol. Late Breaking Papers
6. Jakob, W.: Towards an Adaptive Multimeme Algorithm for Parameter Optimisation Suiting the Engineers' Needs. In: Runarsson, T.P., et al. (eds.): Conf. Proc. PPSN IX, LNCS 4193, Springer, Berlin (2006) 132-141
7. Krasnogor, N.: Studies on the Theory and Design Space of Memetic Algorithms. PhD thesis, Faculty of Comput., Math. and Eng., Univ. of the West of England, Bristol, U.K. (2002)
8. Smith, J.E.: Co-evolving Memetic Algorithms: A Learning Approach to Robust Scalable Optimisation. In: Conf. Proc. CEC 2003, IEEE Press, Piscataway, N.J. (2003) 498-505
9. Bambha, N.K., Bhattacharyya, S.S., Zitzler, E., Teich, J.: Systematic Integration of Parameterized Local Search Into Evolutionary Algorithms. IEEE Trans. on Evolutionary Computation, vol. 8, no. 2 (2004) 137-155
10. Ong, Y.S., Keane, A.J.: Meta-Lamarckian Learning in Memetic Algorithms. IEEE Trans. on Evolutionary Computation, vol. 8, no. 2 (2004) 99-110, citation: p. 100
11. Blume, C.: GLEAM – A System for Intuitive Learning. In: Schwefel, H.-P., Männer, R. (eds.): Conf. Proc. of PPSN I, LNCS 496, Springer, Berlin (1990) 48-54
12. Blume, C., Jakob, W.: GLEAM – An Evolutionary Algorithm for Planning and Control Based on Evolution Strategy. In: Cantú-Paz, E. (ed.): GECCO 2002, vol. Late Breaking Papers (2002) 31-38
13. Gorges-Schleuter, M.: Genetic Algorithms and Population Structures – A Massively Parallel Algorithm. PhD thesis, Dept. Comp. Science, University of Dortmund (1990)
14. Schwefel, H.-P.: Evolution and Optimum Seeking. John Wiley & Sons, New York (1995)
15. Bäck, T.: GENEsYs 1.0 (1992) ftp://lumpi.informatik.uni-dortmund.de/pub/GA/
16. Sieber, I., Eggert, H., Guth, H., Jakob, W.: Design Simulation and Optimization of Microoptical Components. In: Bell, K.D., et al. (eds.): Proceedings of Novel Optical Systems and Large-Aperture Imaging. SPIE, vol. 3430 (1998) 138-149
17. Blume, C., Gerbe, M.: Deutliche Senkung der Produktionskosten durch Optimierung des Ressourceneinsatzes. atp 36, 5/94, Oldenbourg Verlag, München (1994) 25-29 (in German)