Applying natural evolution for solving computational problems
First lecture: Introduction to Evolutionary Computation
Second lecture: Genetic Programming
Inverted CERN School of Computing 2017
Daniel Lanza - CERN

Agenda
• Genetic Programming
  • Introduction to GP
  • Representation of individuals
  • Phases
  • The problem of bloat
• Implementing GP with ECJ
• Distributed processing

Introduction to Genetic Programming
Genetic programming (GP) is a technique whereby computer programs are encoded as a set of genes that are then evolved using an evolutionary algorithm. The space of solutions consists of computer programs. [1]
• Belongs to the class of evolutionary algorithms
• History
  • J. Holland 1962 (Ann Arbor, MI): Genetic Algorithms
  • J. Koza 1989 (Palo Alto, CA): Genetic Programming
• When to use them
  • When finding an exact solution is computationally too demanding, but a near-optimal solution is sufficient

Representation of individuals
• Main characteristic of GP: individuals represent computer programs
• Individuals are represented as trees
  • Nodes: operations
  • Terminals: values or variables
• Example: (a*b)+(c/6) is a tree with + at the root, whose children are a * node (with terminals a and b) and a / node (with terminals c and 6)

Phases
• Following the evolutionary process, a cycle is repeated:
  • Initialization: the first population is randomly generated
  • Evaluation: calculate fitness for each individual
  • Selection: choose individuals for breeding
  • Breeding: individuals are crossed over and mutations take place
  • The cycle exits when an optimal (or good enough) solution is found

Phases: initialization
• The first population is filled up with individuals
• Trees are randomly generated from the allowed operations and terminals
• The initial individual size is limited to a range of values

Phases: evaluation
• The computer programs represented by the individuals are executed
• Different (or all possible) inputs are tried and the output is checked
• Fitness could be, for example, the percentage of correct outputs

Phases: selection
• A selection strategy is defined to choose the parents
  • Parents are the individuals that will be used for breeding
• Fitness is taken into account (best individuals)
• But some randomness also affects the selection (simulating real life)
  • Individuals with reduced fitness could still carry valuable features
• Elitism (optional): the best individual is copied to the next generation
• Other factors that could be taken into account (multi-objective selection):
  • Tree size
  • Computational cost

Phases: breeding
• Selected individuals are crossed over: the subtree below a crossing point in one parent is swapped with the subtree below a crossing point in the other, producing a new individual
• New individuals fill up the next generation
• The selection and breeding phases are repeated until the next population is filled

Phases: mutation
• Applied with a very low probability
• A random modification is made: the subtree below a randomly chosen mutation point is replaced by a randomly generated tree

Phases: evaluation, selection, breeding, …
• The loop keeps going until an individual provides a proper solution
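As a minimal sketch of the loop just described, the plain-Java program below wires the phases together. The Individual class and the randomIndividual/evaluate/crossover/mutate helpers are hypothetical placeholders invented for illustration; only the control flow mirrors the phases above, and a real run would rely on a framework such as ECJ (shown later).

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Minimal sketch of the GP cycle: initialization, evaluation, selection, breeding, mutation.
public class GpLoopSketch {

    static final Random RNG = new Random(42);   // fixed seed, so results are reproducible

    // An individual is a program tree plus its measured fitness.
    static class Individual {
        String tree;      // stand-in for a real expression tree
        double fitness;   // e.g. fraction of correct outputs
    }

    public static void main(String[] args) {
        int popSize = 100, generations = 50;

        // Initialization: population of randomly generated trees
        List<Individual> pop = new ArrayList<>();
        for (int i = 0; i < popSize; i++) pop.add(randomIndividual());

        for (int g = 0; g < generations; g++) {
            // Evaluation: compute fitness for every individual
            for (Individual ind : pop) ind.fitness = evaluate(ind);

            // Elitism (optional): carry the best individual over unchanged
            Individual best =
                pop.stream().max(Comparator.comparingDouble((Individual i) -> i.fitness)).get();
            List<Individual> next = new ArrayList<>();
            next.add(best);

            // Selection + breeding until the next population is filled
            while (next.size() < popSize) {
                Individual p1 = tournament(pop), p2 = tournament(pop);
                Individual child = crossover(p1, p2);        // swap random subtrees
                if (RNG.nextDouble() < 0.05) mutate(child);  // low-probability mutation
                next.add(child);
            }
            pop = next;
        }
    }

    // Tournament selection: fittest of a few random picks (the randomness keeps diversity)
    static Individual tournament(List<Individual> pop) {
        Individual best = pop.get(RNG.nextInt(pop.size()));
        for (int i = 1; i < 3; i++) {
            Individual c = pop.get(RNG.nextInt(pop.size()));
            if (c.fitness > best.fitness) best = c;
        }
        return best;
    }

    // --- Placeholders: a real implementation builds, runs and recombines program trees ---
    static Individual randomIndividual() { Individual i = new Individual(); i.tree = "(a*b)+(c/6)"; return i; }
    static double evaluate(Individual ind) { return RNG.nextDouble(); }  // stand-in fitness
    static Individual crossover(Individual a, Individual b) { Individual c = new Individual(); c.tree = a.tree; return c; }
    static void mutate(Individual ind) { ind.tree = "(a*b)+(c/9)"; }
}

In ECJ this machinery (selection strategies, breeding pipelines, checkpointing) is already provided and driven by parameter files, as shown later in the lecture.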
The problem of bloat
• We would like solutions that are:
  • Understandable by humans, therefore simple
  • Computationally cheap to execute (CPU and memory)
• Bloat: the continuous increase in the size of the trees
  • Bigger trees tend to provide "better" solutions to the problems
• Control mechanisms need to be applied

The problem of bloat: control mechanisms
• Limited tree size [3]
• Size penalizes fitness [4]
• Multi-objective selection techniques [5]
• Eliminating introns (code that does nothing) [6]
• The computational time to evaluate can be considered as a size measure
  • That would include:
    • The complexity of the operations used
    • The number of operations
  • Normally time is hard to measure, since other processes may interfere
  • Implicit methods [7] can be applied, with no need to measure time

Evolutionary Computation research tool (ECJ)
• Developed at George Mason University [2]
• Removes the need to implement the evolutionary process yourself
• Widely used in the community
• Main features:
  • Multi-platform: Java
  • Flexibility: easy to implement many kinds of problems
  • Configuration files
  • Checkpoints
  • Multi-threading
  • Pseudo-random number generator: reproducible results

Evolutionary Computation research tool (ECJ)
• Multiplexer problem
• Operations
  • And
  • Or
  • Not
  • If
• Terminals
  • Data: D0, D1, D2, D3
  • Address: A0, A1
• (Diagram: data inputs D0-D3 and address inputs A0, A1 feed a multiplexer that produces a single output)

Evolutionary Computation research tool (ECJ)
• Configuring the multiplexer problem

  parent.0 = ../../gp/koza/koza.params

  # Define problem
  eval.problem = ec.app.multiplexer.Multiplexer
  eval.problem.data = ec.app.multiplexer.MultiplexerData
  eval.problem.bits = 2

  # Function set
  gp.fs.size = 1
  gp.fs.0.name = f0
  gp.fs.0.size = 10
  gp.fs.0.func.0 = ec.app.multiplexerslow.func.And
  gp.fs.0.func.0.nc = nc2
  gp.fs.0.func.1 = ec.app.multiplexerslow.func.Or
  gp.fs.0.func.1.nc = nc2
  gp.fs.0.func.2 = ec.app.multiplexerslow.func.Not
  gp.fs.0.func.2.nc = nc1
  gp.fs.0.func.3 = ec.app.multiplexerslow.func.If
  gp.fs.0.func.3.nc = nc3
  gp.fs.0.func.4 = ec.app.multiplexerslow.func.A0
  gp.fs.0.func.4.nc = nc0
  gp.fs.0.func.5 = ec.app.multiplexerslow.func.A1
  gp.fs.0.func.5.nc = nc0
  gp.fs.0.func.6 = ec.app.multiplexerslow.func.D0
  gp.fs.0.func.6.nc = nc0
  gp.fs.0.func.7 = ec.app.multiplexerslow.func.D1
  gp.fs.0.func.7.nc = nc0
  gp.fs.0.func.8 = ec.app.multiplexerslow.func.D2
  gp.fs.0.func.8.nc = nc0
  gp.fs.0.func.9 = ec.app.multiplexerslow.func.D3
  gp.fs.0.func.9.nc = nc0
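Before the implementation slides, here is a rough, framework-free sketch (not the ECJ code shown on the following slides) of how the fitness described earlier — the percentage of correct outputs — could be computed for the 6-input multiplexer. The CandidateProgram interface and the hand-written candidate in main are hypothetical stand-ins for an evolved GP tree.

import java.util.function.Function;

// Sketch of multiplexer fitness: a candidate program maps six input bits
// (A1, A0, D0..D3) to one output bit, and its fitness is the fraction of
// the 64 possible input combinations it answers correctly.
public class MultiplexerFitnessSketch {

    // Hypothetical stand-in for an evolved program: input bits -> output bit.
    interface CandidateProgram extends Function<boolean[], Boolean> {}

    // Ground truth of the 6-way multiplexer: the address bits select one data bit.
    // Convention chosen here: in[0]=A1, in[1]=A0, in[2..5]=D0..D3.
    static boolean expectedOutput(boolean[] in) {
        int address = (in[0] ? 2 : 0) + (in[1] ? 1 : 0);
        return in[2 + address];
    }

    // Fitness = fraction of correct outputs over all 2^6 = 64 cases.
    static double fitness(CandidateProgram program) {
        int correct = 0, cases = 1 << 6;
        for (int bits = 0; bits < cases; bits++) {
            boolean[] in = new boolean[6];
            for (int i = 0; i < 6; i++) in[i] = ((bits >> i) & 1) == 1;
            if (program.apply(in) == expectedOutput(in)) correct++;
        }
        return (double) correct / cases;
    }

    public static void main(String[] args) {
        // A hand-written, perfect candidate (equivalent to nested If operations
        // over A0, A1 and the data lines); GP would evolve such a tree instead.
        CandidateProgram perfect = in -> expectedOutput(in);
        System.out.println("fitness = " + fitness(perfect));   // prints 1.0
    }
}

Within ECJ, the ec.app.multiplexer.Multiplexer problem class referenced in the configuration above plays this role inside the framework, as illustrated on the next slides.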
Evolutionary Computation research tool (ECJ)
• Problem implementation (code walkthrough shown on the slide)

Evolutionary Computation research tool (ECJ)
• Functions implementation (code walkthrough shown on the slide)

Evolutionary Computation research tool (ECJ)
• Terminals implementation (code walkthrough shown on the slide)

Evolutionary Computation research tool (ECJ)
• Execution (demo shown on the slide)

Distributed processing
• Evaluation of individuals can be computationally expensive
• It is easily parallelizable
• Different approaches to distribute the work
  • Island models
  • Master-slave
• Integrating ECJ with Hadoop [8]

Distributed processing: an example [9]
(Figures from a face-recognition case study shown on the slides)

Summarizing
• EA and GP can be applied to a wide range of applications
  • When finding the exact solution is computationally expensive
  • When a near-optimal solution is sufficient
• Similar to the evolution of individuals in nature, computational problems can be solved through evolution
  • Initialization, evaluation, selection, breeding and mutation
• Different challenges remain
  • Bloat control
  • Expensive evaluation, distribution of the workload
• Existing tools, like ECJ, help with research and application

Questions?

References
• [1] https://en.wikipedia.org/wiki/Genetic_programming
• [2] https://cs.gmu.edu/~eclab/projects/ecj/
• [3] S. Luke. Issues in Scaling Genetic Programming: Breeding Strategies, Tree Generation, and Code Bloat. PhD thesis, Department of Computer Science, University of Maryland, College Park, MD, USA, 2000.
• [4] R. Poli. A simple but theoretically-motivated method to control bloat in genetic programming. In Genetic Programming, Proceedings of EuroGP'2003, pages 204-217. Springer, 2003.
• [5] S. Bleuler, M. Brack, L. Thiele. Multiobjective genetic programming: reducing bloat using SPEA2. In Proceedings of the 2001 Congress on Evolutionary Computation (CEC 2001), pages 536-543, Seoul, Korea. IEEE Press, 2001.
• [6] Eva Alfaro-Cid, Juan Julián Merelo Guervós, F. Fernández de Vega, Anna Isabel Esparcia-Alcázar, Ken Sharman. Bloat control operators and diversity in genetic programming: A comparative study. Evolutionary Computation 18(2): 305-332, 2010.
• [7] D. Lanza, F. Chavez, F. Fernandez and G. Olague. Prevención del bloat mediante una interpretación espacio-temporal de la Programación Genética Paralela. CEDI 2016 paper.
• [8] Francisco Chavez, Francisco Fernandez, Cesar Benavides, Daniel Lanza, Juan Villegas, Leonardo Trujillo, Gustavo Olague, Graciela Roman. ECJ+HADOOP: An Easy Way to Deploy Massive Runs of Evolutionary Algorithms. EvoPAR 2015 paper.
• [9] Francisco Chavez, Francisco Fernandez, Cesar Benavides-Alvarez, Daniel Lanza and Juan Villegas. Speeding up Evolutionary Approaches to Face Recognition by Means of Hadoop. EVO 2016 paper.