A Hybrid Evolutionary Algorithm for Polynomial Optimisation with Constraints

Angel Kuri-Morales, Instituto Tecnológico Autónomo de México, Río Hondo No. 1, [email protected]
Karina Zapién-Arreola, Instituto Tecnológico Autónomo de México, Río Hondo No. 1, [email protected]

Abstract
The search for the global optimum of a function, particularly a polynomial with constraints, is an old problem. This paper proposes a hybrid algorithm (QGA) that merges two different and well-studied methods for solving it: Genetic Algorithms (VGA) and Sequential Quadratic Programming (SQP). In so doing we expect to benefit from each method's advantages while ameliorating its shortcomings. We tested the hybrid algorithm on a large number of randomly generated polynomials. From these extensive simulations we were able to extract statistically significant quantitative results showing that QGA is better than either of its components.

Key words: Optimisation, Genetic Algorithms, Sequential Quadratic Programming, Hybrid Methods.

1. Introduction
Some of the methods developed over time to solve optimisation problems are based on the observation of the surrounding reality and on the analysis of the properties of the problem to be solved. Likewise, some algorithms developed in artificial intelligence are abstractions of models seen in nature: for example, neural networks, simulated annealing and genetic algorithms. GA theory, in particular, ensures that they will arrive at an optimal solution given enough time. On the other hand, there are several mathematical methods devised for continuous optimisation with constraints. These use first- and second-order information to find a feasible region, or a direction along which they can move to obtain better results. In this paper we discuss a hybrid method in which a rough search (performed by a genetic algorithm) is tuned up (with a second-order gradient descent method). Putting these two techniques together allows us to improve on both, as will be discussed in what follows.

The kind of problem analysed in this work (the general problem) is:

    minimise f(x)
    subject to c_i(x) >= 0, i in I

where x is in R^n with n <= 4; f(x): R^n -> R is a polynomial of degree l <= 5; and the c_i(x): R^n -> R are a set of m linear constraints that together have the form Ax - b >= 0, with A an m x n matrix and b in R^m. The gradients of the constraints active at a point are linearly independent. The constraints were chosen as above so that they can be tackled with a traditional optimisation method; the evolutionary algorithm imposes no such restriction on the form of the constraints.

2. Genetic Algorithms
Genetic Algorithms are inspired by the theory of evolution developed by Charles Darwin, which states that adaptation to the environment is the key factor deciding which individuals will survive into the next generation. Therefore, just as evolution does, a genetic algorithm initially generates a randomly encoded population. In each generation every individual is evaluated, the evaluation reflecting the fitness of the said individual to the environment (which depends, of course, on the problem to be solved). The algorithm then iterates, changing the code of the individuals in the population with operators such as crossover and mutation. Roughly speaking, the three operators (selection, crossover and mutation) correspond to learning about the problem, exploiting the knowledge already gained to approach the target efficiently, and exploring new points in the space of solutions. One of the most powerful genetic algorithms is the Vasconcelos genetic algorithm (VGA); for details refer to [2], [6] and [10], and this result is proved in [11]. In the VGA, selection is performed based on the fitness of each of the individuals, including the ones that were present in the past population. From an original population of size n we generate n new individuals from crossover and mutation (as described in what follows).
Then we select the best n individuals (out of the 2n) and discard the rest. Crossover is annular, with probability pc. The couples for crossover are chosen a priori based on their fitness: the best individual is crossed (with probability pc) with the worst, the second best with the next to the last, and so on. Mutation is performed, simply, by choosing a bit from the population with probability pm and flipping its state (0 <-> 1). We iterate l times, and the solution corresponds to the individual with the best fitness (in this case, the one with the lowest value). When applying the VGA to a constrained optimisation task, those individuals that do not satisfy all of the constraints get a very high constant value added to their fitness. This scheme has been tried successfully before to handle constrained optimisation [26]. GAs are very flexible, in the following sense: if we find a way to represent the solution in a (typically) binary-encoded individual, almost any problem may be tackled. GAs do not require the function to be optimised to be differentiable. This sort of advantage makes it easy to program a GA. And since it looks for the solution in all of the feasible space (via mutation), it can (and will, given sufficient time) converge to a global minimum. Unfortunately, a relatively long time is sometimes required to find the solution, because the GA searches the whole space of solutions and we have no information about how close it is to the optimum once the algorithm has iterated k times. In order to accelerate convergence, we selectively activate a mathematically more efficient optimisation method: SQP (Sequential Quadratic Programming). The GA will then, hopefully, receive the benefit of SQP and home in on the desired global optimum more efficiently.
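To make the operators above concrete, here is a minimal sketch of one VGA generation in Python. The bit-string encoding, fitness function, parameter values and penalty handling are illustrative assumptions, not the implementation used in the paper.

```python
import numpy as np

# Minimal sketch of one VGA generation. Lower fitness is better.
rng = np.random.default_rng(0)
N, BITS, PC, PM = 8, 16, 0.9, 0.01
BIG_PENALTY = 1e6  # added to the fitness of infeasible individuals

def fitness(genome):
    # Decode the bit string to x in [-1, 1] and evaluate f(x) = x^2.
    x = -1.0 + 2.0 * int("".join(map(str, genome)), 2) / (2**BITS - 1)
    feasible = True  # a real problem would check A x - b >= 0 here
    return x * x + (0.0 if feasible else BIG_PENALTY)

def vga_generation(pop):
    pop = sorted(pop, key=fitness)           # best individual first
    offspring = []
    for i in range(N // 2):                  # best crossed with worst, etc.
        a, c = pop[i][:], pop[N - 1 - i][:]
        if rng.random() < PC:                # annular (ring) crossover
            start = int(rng.integers(0, BITS))
            length = int(rng.integers(1, BITS))
            for k in range(length):
                j = (start + k) % BITS
                a[j], c[j] = c[j], a[j]
        offspring += [a, c]
    for ind in offspring:                    # bitwise mutation
        for j in range(BITS):
            if rng.random() < PM:
                ind[j] ^= 1
    return sorted(pop + offspring, key=fitness)[:N]  # best N of the 2N

pop = [[int(bit) for bit in rng.integers(0, 2, BITS)] for _ in range(N)]
for _ in range(50):
    pop = vga_generation(pop)
best = min(fitness(ind) for ind in pop)
```

Because selection keeps the best N of the combined 2N parents and offspring, the best fitness never worsens from one generation to the next.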
3. Sequential Quadratic Programming
Some closed methods for optimisation are based on optimality theorems, which mainly describe the conditions that the first- and second-order derivatives must satisfy in order for the algorithm to work properly (refer to [5], [8], [12], [13], [15] and [16]). Here we discuss SQP. This method starts with a feasible point, which can be obtained, for instance, from the first phase of the Simplex method. The purpose of SQP is to convert the general problem into an easier sub-problem, a quadratic problem, which can be solved to approach the general solution and used as the basis of an iterative procedure. This quadratic problem is obtained by approximating the function with its second-order Taylor series. The quadratic sub-problem is defined as follows:

    minimise (1/2) p'Gp + d'p
    subject to Ap + c_i(x_k) >= 0, i in I

where:

G = H(x) if the Hessian H(x) of f at x is positive definite, and G = H(x) + (0.0001 - e_min) I otherwise; this is an approximation to the Hessian of f(x) evaluated at x.

d = grad f(x) if H(x) is positive definite, and d = (0.0001 - e_min) grad f(x) otherwise; this yields an approximation to the gradient of f(x) evaluated at x. Here e_min is the minimum eigenvalue of H(x).

A = grad c_i(x)' is the gradient of the i-th constraint evaluated at x, which in this case is the same as the matrix that defines the linear constraints.

In several methods a strategy is to convert a constrained problem into one without constraints by means of a merit function, which seeks to reach the minimum while satisfying the constraints. This process, in the limit, converges to the optimum within the feasible set.

Definition 1. The Lagrange function is defined as L(x, v) = f(x) - v'c(x), where v is the vector of the so-called Lagrange multipliers for the constraints. It is very common to use this function as a merit function.
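The modification of G above can be illustrated directly: when the Hessian is not positive definite, its spectrum is shifted so that the smallest eigenvalue becomes 0.0001. This is a minimal sketch under that reading of the (0.0001 - e_min) I term, not code from the paper.

```python
import numpy as np

def make_positive_definite(H, eps=1e-4):
    # Shift the spectrum of a symmetric Hessian H so that its smallest
    # eigenvalue becomes eps whenever H is not positive definite.
    e_min = np.linalg.eigvalsh(H).min()  # smallest eigenvalue of H
    if e_min > 0:
        return H                         # already positive definite
    return H + (eps - e_min) * np.eye(H.shape[0])
```

For example, H = diag(1, -2) has e_min = -2, so the shifted matrix is diag(3.0001, 0.0001), whose smallest eigenvalue is exactly the chosen eps.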
Definition 2. The active set A(x) at any feasible point x is the set of indices of the active inequality constraints; that is, A(x) = { i in I | c_i(x) = 0 }.

Definition 3. Given the point x* and the active set A(x*), we say that the linear independence constraint qualification (LICQ) holds if the set of active constraint gradients { grad c_i(x*), i in A(x*) } is linearly independent.

Theorem 1 (First-Order Necessary Conditions). Suppose that x* is a local solution and that the LICQ holds at x*. Then there is a Lagrange multiplier vector v*, with components v*_i, i in I, such that the following conditions are satisfied at (x*, v*):

    grad_x L(x*, v*) = 0,
    c_i(x*) >= 0 for all i in I,
    v*_i >= 0 for all i in I,
    v*_i c_i(x*) = 0 for all i in I.

These conditions are known as the Karush-Kuhn-Tucker conditions, or KKT conditions for short. The theorem for the second-order necessary conditions states that the Hessian of the Lagrange function must be positive semi-definite on a space defined by the function and the constraints. Thus, the algorithm stops when it satisfies both conditions or when it becomes numerically unstable. If instability is not reached, it arrives rapidly at an (at least local) minimum, since it has information about where to move in the next step in order to decrease the function; it can get as close to the optimum as we decide, according to the first- and second-order optimality conditions. Unfortunately, we cannot know whether the point so reached is a global minimum: the solution given by this algorithm may be a local one. The objective function must be twice differentiable and the constraint functions once differentiable.

4. Hybrid Algorithm
The strategy is to use the VGA as a base and to add another sort of mutation, applied with probability p_sqp when an individual is feasible. This mutation consists of applying SQP and changing the original individual and its fitness accordingly. If the individual is not feasible, it remains untouched.
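The SQP mutation just described can be sketched as follows, using SciPy's SLSQP routine as a stand-in for the paper's SQP implementation; the function names and the value of p_sqp are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Sketch of the SQP "mutation": with probability p_sqp, a feasible
# individual x is refined by a local SQP run and replaced by the result.
# A and b define the linear constraints A x - b >= 0 of the general problem.

def sqp_mutation(x, f, A, b, p_sqp=0.1, rng=None):
    rng = rng or np.random.default_rng()
    if not np.all(A @ x - b >= 0):
        return x                       # infeasible: leave untouched
    if rng.random() >= p_sqp:
        return x                       # mutation not triggered this time
    cons = {"type": "ineq", "fun": lambda z: A @ z - b}
    res = minimize(f, x, method="SLSQP", constraints=[cons])
    # Replace the individual only if SQP found a better feasible point.
    return res.x if res.success and f(res.x) < f(x) else x
```

For example, minimising (x - 2)^2 subject to -1 <= x <= 1 from the feasible start x = 0.5 moves the individual to the constrained optimum x = 1, while an infeasible start such as x = 5 is returned unchanged.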
Because it uses SQP, we denote the hybrid the Quadratic Genetic Algorithm (QGA). The two algorithms described in the previous sections perform their searches quite differently; QGA intends to take advantage of each algorithm's strengths while discarding its weaknesses, to obtain a more robust and efficient method.

5. Results
To test the three algorithms we generated random polynomials in no more than 4 variables, of degree 5 at most. We also generated random constraints that define a hypercube in the solution space. A statistical methodology is used to analyse each algorithm: we estimate the sample mean and the standard deviation and find confidence intervals, and, in order to make a better estimation of the parameters, we use the central limit theorem and Chebyshev's theorem to find a more general confidence interval. The results pertaining to the minimum values found are shown in Table 1.

Table 1. Parameters from the simulations. Test with the original data; total number of runs: 5450.

I. Statistics (original data)

                 SQP        VGA        QGA
    Mean       -0.7130    -1.0823    -1.0875
    S. Dev.     1.2998     1.8330     1.8380
    Minimum   -17.5294   -19.9167   -19.9168
    Maximum     1.8496     0.9912     0.9912

II. Fail and success proportion (Better / Same / Worse)

    SQP vs. AG:       60.81% /  0.86% / 38.33%
    SQP vs. AG_SQP:   21.04% / 21.91% / 57.05%
    VGA vs. SQP:      38.33% /  0.86% / 60.81%
    VGA vs. AG_SQP:    9.02% /  1.70% / 89.28%
    QGA vs. SQP:      57.05% / 21.91% / 21.04%
    QGA vs. AG:       89.28% /  1.70% /  9.02%

Test with sample means of the original data (number of sample means: 109; size of each sample: 50).

III. Chi-square test for normality: SQP 9.6239, VGA 7.7890, QGA 6.8716.
IV. 93.75% confidence intervals (Chebyshev's theorem) for the original data

    SQP:  better case -5.9128, worst case 4.4868;
          proportion with AG: 0.6833 (better) / 1.4462 (worst); with AG_SQP: 0.6852 (better) / 1.4385 (worst)
    VGA:  better case -8.6533, worst case 6.4888;
          proportion with SQP: 1.4635 (better) / 0.6915 (worst); with AG_SQP: 1.0028 (better) / 0.9947 (worst)
    QGA:  better case -8.6291, worst case 6.4542;
          proportion with SQP: 1.4594 (better) / 0.6952 (worst); with AG: 0.9972 (better) / 1.0054 (worst)

(SQP = Sequential Quadratic Programming; VGA = Vasconcelos' Genetic Algorithm; QGA = Genetic Algorithm with SQP.)

In Table 1, part I (Statistics), we compare the mean values for SQP, VGA and QGA (the hybrid algorithm). The columns entitled "AG" refer to VGA; the columns entitled "AG_SQP" refer to QGA. It is clear that VGA is much better than SQP, but QGA improves on VGA alone, as expected. In part II of the table we show the number of times (as a percentage) in which we found better results for SQP vs. VGA and QGA; for VGA vs. SQP and QGA; and for QGA vs. SQP and VGA. In the second part of the table we display similar results, but applying the method developed in [11]. In this instance normality was not assumed; rather, we used a chi-square test with 95% confidence to guarantee that the sample means were distributed normally. At this point we were able to determine the mean and standard deviation of the sample means. From these, it is easy to determine the original population's parameters, since the population mean equals the mean of the sample means and the population standard deviation equals the standard deviation of the sample means multiplied by the square root of the sample size. The previous table shows that the independent results of either algorithm (VGA and SQP) were improved upon when using the hybrid combination of both. However, in some cases SQP found better results than VGA and QGA. This is due to the fact that, because of its mathematically closed design, SQP is more precise when it is started in the vicinity of a possible solution. We stress that the computational effort implied in SQP is an order of magnitude smaller than that of VGA. Therefore, the following strategy suggests itself: run both QGA and SQP and select the best result. With this simple scheme we achieve the best results, and the additional cost is negligible.
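For reference, the 93.75% level in part IV is the Chebyshev bound with k = 4, since P(|X - mu| < k sigma) >= 1 - 1/k^2 and 1 - 1/16 = 0.9375. A minimal sketch of such an interval:

```python
import math

def chebyshev_interval(mean, std, confidence=0.9375):
    # Chebyshev: P(|X - mu| < k*sigma) >= 1 - 1/k^2, so a confidence
    # level of 1 - 1/k^2 gives k = 1/sqrt(1 - confidence); 93.75% -> k = 4.
    k = 1.0 / math.sqrt(1.0 - confidence)
    return mean - k * std, mean + k * std
```

For example, a distribution with mean 0 and standard deviation 1 yields the interval (-4.0, 4.0), valid for any distribution with finite variance.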
References and Bibliography
[1] Aarts, E. and Korst, J.: Simulated Annealing and Boltzmann Machines, John Wiley & Sons, 1990.
[2] Bäck, T.: Evolutionary Algorithms in Theory and Practice, Oxford University Press, 1996.
[3] Bartle, R.: The Elements of Real Analysis, John Wiley & Sons, 2nd edition, 1976.
[4] Devore, J. L.: Probabilidad y Estadística para ingeniería y ciencias, trans. Jorge H. Romo, International Thomson Editores, 4th edition, 1998.
[5] Fletcher, R.: Practical Methods of Optimisation, John Wiley & Sons, 2nd edition, 1990.
[6] Fogel, D.: Evolutionary Computation: Toward a New Philosophy of Machine Intelligence, IEEE Press, 2nd edition, 1999.
[7] Freund, J. and Simon, G.: Statistics: A First Course, Prentice Hall, 5th edition, 1991.
[8] Gottfried, B. and Weisman, J.: Introduction to Optimization Theory, Prentice-Hall, pp. 282-284, 1973.
[9] Grossman, S.: Álgebra lineal, McGraw-Hill, 5th edition, 1996.
[10] Kuri-Morales, A.: A Comprehensive Approach to Genetic Algorithms in Optimisation and Learning: Theory and Applications, Instituto Politécnico Nacional, 1999.
[11] Kuri-Morales, A.: A Methodology for the Statistical Characterization of Genetic Algorithms, MICAI 2002: Advances in Artificial Intelligence, LNAI 2313, Springer, 2002.
[12] Luenberger, D.: Programación lineal y no lineal, Addison-Wesley Iberoamericana, 1989.
[13] The MathWorks, Inc.: Optimization Toolbox User's Guide, 4th printing, 2000.
[14] Megeath, J.: How to Use Statistics, Canfield Press, San Francisco, 1975.
[15] Nemhauser, G., Rinnooy Kan, A. and Todd, M.: Handbooks in Operations Research and Management Science, Volume I: Optimization, Elsevier Science Publishers B.V., North-Holland, 1991.
[16] Nocedal, J. and Wright, S.: Numerical Optimization, Springer, 1999.
[17] Oetiker, T., Partl, H., Hyna, I. and Schlegl, E.: The Not So Short Introduction to LaTeX, 2000.
[18] Rice, J.: Mathematical Statistics and Data Analysis, Duxbury Press, 2nd edition, 1995.
[19] Ross, S.: Introduction to Probability Models, Harcourt Academic Press, 7th edition, 2000.
[20] Ross, S.: Stochastic Processes, John Wiley & Sons, 2nd edition, 1996.
[21] Salvat Editores, S.A.: Enciclopedia Salvat Diccionario, Tomo 5, Salvat Editores, S.A., Barcelona, 1971.
[22] Walpole, R. and Myers, R.: Probabilidad y estadística, trans. Gerardo Maldonado, McGraw-Hill, 4th edition, 1992.
[23] http://www.xrefer.com/... entry.jsp?xred=509861&secid
[24] http://www.xrefer.com/entry/171151
[25] http://www.xrefer.com/entry.jsp/xred=170766
[26] Kuri-Morales, A. and Gutiérrez-García, J.: Penalty Function Methods for Constrained Optimisation with Genetic Algorithms: A Statistical Analysis, MICAI 2002: Advances in Artificial Intelligence, LNAI 2313, Springer-Verlag, pp. 108-117, 2002.