Applying natural evolution for solving computational problems

First lecture: Introduction to Evolutionary Computation
Second lecture: Genetic Programming
Inverted CERN School of Computing 2017
Daniel Lanza - CERN
Agenda
• Genetic Programming
• Introduction to GP
• Representation of individuals
• Phases
• The problem of bloat
• Implementing GP with ECJ
• Distributed processing
Introduction to Genetic Programming
Genetic programming (GP) is a technique whereby computer programs
are encoded as a set of genes that are then evolved using an
evolutionary algorithm. The space of solutions consists of computer
programs. [1]
• Belongs to the class of evolutionary algorithms
• History
• J. Holland 1962 (Ann Arbor, MI): Genetic Algorithms
• J. Koza 1989 (Palo Alto, CA): Genetic Programming
• When to use them
• When finding an exact solution is computationally too demanding, but a near-optimal solution is sufficient
Representation of individuals
• Main characteristic of GP
• Individuals represent computer programs
• Individuals are represented as trees
• Internal nodes: operations
• Terminals (leaves): values or variables
Example: the expression (a*b)+(c/6) is represented as a tree with + at the root, a * node with terminals a and b as its left subtree, and a / node with terminals c and 6 as its right subtree.
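As an illustration, a minimal Java sketch of such a tree representation is shown below. The class and field names (Node, Op, Var, Const) are purely illustrative and not taken from any particular GP library.

// Minimal sketch of a GP expression tree: internal nodes hold an operation,
// terminals hold a constant or a variable.
import java.util.Map;

abstract class Node {
    abstract double eval(Map<String, Double> vars);
}

class Op extends Node {
    final char op;            // '+', '-', '*' or '/'
    final Node left, right;
    Op(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
    double eval(Map<String, Double> vars) {
        double l = left.eval(vars), r = right.eval(vars);
        switch (op) {
            case '+': return l + r;
            case '-': return l - r;
            case '*': return l * r;
            default:  return r == 0 ? 1 : l / r;   // protected division, common in GP
        }
    }
}

class Var extends Node {
    final String name;
    Var(String name) { this.name = name; }
    double eval(Map<String, Double> vars) { return vars.get(name); }
}

class Const extends Node {
    final double value;
    Const(double value) { this.value = value; }
    double eval(Map<String, Double> vars) { return value; }
}

class TreeDemo {
    public static void main(String[] args) {
        // The tree for (a*b)+(c/6)
        Node tree = new Op('+',
                new Op('*', new Var("a"), new Var("b")),
                new Op('/', new Var("c"), new Const(6)));
        System.out.println(tree.eval(Map.of("a", 2.0, "b", 3.0, "c", 12.0)));   // prints 8.0
    }
}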
Phases
• Following the evolutionary process
Initialization (individuals are randomly generated) → Evaluation (fitness is calculated for each individual) → Selection (individuals are chosen for breeding) → Breeding (individuals are crossed over and mutations take place) → back to Evaluation, repeating until a good enough (optimal) solution is found. A skeleton of this loop is sketched below.
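In code, the overall loop might look roughly like the following schematic sketch. All helper methods and constants are placeholders standing for the phases detailed on the next slides, so this is an outline rather than runnable code.

// Schematic outline of the GP loop described above (helpers are placeholders).
List<Node> population = initRandomPopulation(POP_SIZE);            // Initialization
for (int gen = 0; gen < MAX_GENERATIONS; gen++) {
    double[] fitness = evaluate(population);                        // Evaluation
    if (bestOf(fitness) >= TARGET_FITNESS) break;                   // good-enough solution found
    List<Node> next = new ArrayList<>();
    while (next.size() < POP_SIZE) {
        Node parent1 = select(population, fitness);                 // Selection
        Node parent2 = select(population, fitness);
        Node child = crossover(parent1, parent2);                   // Breeding
        child = maybeMutate(child);                                 // Mutation (low probability)
        next.add(child);
    }
    population = next;                                              // next generation
}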
Phases: initialization
• First population is filled up with individuals
• Randomly generated trees with allowed operations and terminals
• Initial individual size is limited to a range of values
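A minimal sketch of "grow"-style random tree generation, reusing the illustrative Node/Op/Var/Const classes from the representation sketch; the terminal probability, depth limit and sets of operations and terminals are arbitrary choices.

import java.util.Random;

class RandomTrees {
    static final char[] OPS = {'+', '-', '*', '/'};
    static final String[] VARS = {"a", "b", "c"};
    static final Random RNG = new Random();

    // Below maxDepth, randomly choose between an operation and a terminal;
    // at maxDepth, force a terminal so the initial size stays within the allowed range.
    static Node grow(int depth, int maxDepth) {
        if (depth >= maxDepth || RNG.nextDouble() < 0.3) {
            return RNG.nextBoolean()
                    ? new Var(VARS[RNG.nextInt(VARS.length)])      // variable terminal
                    : new Const(RNG.nextInt(10));                  // constant terminal
        }
        char op = OPS[RNG.nextInt(OPS.length)];                    // operation node
        return new Op(op, grow(depth + 1, maxDepth), grow(depth + 1, maxDepth));
    }
}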
Phases: evaluation
• Computer programs represented by individuals are executed
• Several or all possible inputs are tried and the outputs are checked
• Fitness could be the percentage of correct outputs
(e.g. individuals receive fitness values such as 0.4, 0.6, 0.2, 0.5)
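A minimal sketch of such an evaluation, again using the illustrative Node class from before, where fitness is the fraction of test cases the program answers correctly.

import java.util.List;
import java.util.Map;

class Evaluation {
    // A test case: variable bindings plus the expected output.
    record TestCase(Map<String, Double> inputs, double expected) {}

    static double fitness(Node program, List<TestCase> cases) {
        int correct = 0;
        for (TestCase tc : cases) {
            double output = program.eval(tc.inputs());             // run the program
            if (Math.abs(output - tc.expected()) < 1e-9) correct++;
        }
        return (double) correct / cases.size();                    // e.g. 0.4, 0.6, 0.2, 0.5
    }
}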
Phases: selection
• A selection strategy is defined to choose the parents
• Parents are individuals that will be used for breeding
• Fitness is taken into account (best individuals)
• But some randomness also affects the selection (as in nature)
• Individuals with lower fitness may still carry valuable features
• Elitism (optional): best individual is copied to the next generation
• Other factors that could be taken into account (multi-objective selection):
• Tree size
• Computational cost
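Tournament selection is one common way to combine fitness with randomness; a minimal illustrative sketch, with the population and its fitness values passed in side by side.

import java.util.List;
import java.util.Random;

class Selection {
    static final Random RNG = new Random();

    // Pick a few individuals at random and return the fittest of them:
    // good individuals win often, but weaker ones still get a chance.
    static Node tournament(List<Node> population, double[] fitness, int tournamentSize) {
        int best = RNG.nextInt(population.size());
        for (int i = 1; i < tournamentSize; i++) {
            int candidate = RNG.nextInt(population.size());
            if (fitness[candidate] > fitness[best]) best = candidate;
        }
        return population.get(best);
    }
}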
Phases: breeding
• Selected individuals are crossed over
• New individuals fill up the next generation
• The selection and breeding phases are repeated until the next population is filled
Example: a crossing point is chosen in each parent tree, and the subtree below the crossing point of one parent replaces the subtree below the crossing point of the other, producing a new individual.
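A minimal sketch of subtree crossover on the illustrative, immutable Node trees used earlier: the child is a copy of parent 1 in which a randomly chosen subtree is replaced by a randomly chosen subtree of parent 2. Helper names are made up for this sketch.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class Crossover {
    static final Random RNG = new Random();

    static Node crossover(Node parent1, Node parent2) {
        List<Node> donors = collect(parent2, new ArrayList<>());
        Node donor = donors.get(RNG.nextInt(donors.size()));        // subtree taken from parent 2
        int target = RNG.nextInt(collect(parent1, new ArrayList<>()).size());   // crossing point in parent 1
        return replace(parent1, target, donor, new int[]{0});
    }

    // Preorder list of all nodes in a tree.
    static List<Node> collect(Node n, List<Node> out) {
        out.add(n);
        if (n instanceof Op op) { collect(op.left, out); collect(op.right, out); }
        return out;
    }

    // Rebuild the tree, substituting the node whose preorder index equals target.
    static Node replace(Node n, int target, Node donor, int[] counter) {
        if (counter[0]++ == target) return donor;
        if (n instanceof Op op) {
            return new Op(op.op, replace(op.left, target, donor, counter),
                                 replace(op.right, target, donor, counter));
        }
        return n;   // terminals are immutable, so they can be shared safely
    }
}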
Phases: mutation
• With a very small probability, a random modification is applied
Example: a mutation point is chosen in the individual and the subtree below it is replaced by a randomly generated tree, yielding the mutated individual.
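A minimal sketch of subtree mutation, reusing the illustrative RandomTrees and Crossover helpers from the earlier sketches: with a small probability, a randomly chosen subtree is replaced by a freshly generated random tree.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class Mutation {
    static final Random RNG = new Random();

    static Node mutate(Node individual, double mutationRate, int maxDepth) {
        if (RNG.nextDouble() >= mutationRate) return individual;     // usually: no change
        List<Node> nodes = Crossover.collect(individual, new ArrayList<>());
        int target = RNG.nextInt(nodes.size());                      // mutation point
        Node randomSubtree = RandomTrees.grow(0, maxDepth);          // randomly generated tree
        return Crossover.replace(individual, target, randomSubtree, new int[]{0});
    }
}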
Phases: evaluation, selection, breeding, …
• The loop keeps going until an individual provides a satisfactory solution
The problem of bloat
• We would like solutions:
• Understandable by humans, therefore simple
• Computationally cheap to execute (CPU and memory)
• Bloat: the continuous increase in the size of the trees
• Bigger trees tend to provide "better" solutions to the problems
• Control mechanisms need to be applied
The problem of bloat: control mechanisms
• Limited tree size [3]
• Size punishes fitness [4]
• Multi-objective selection techniques [5]
• Eliminate introns (code that does nothing) [6]
• Computational time to evaluate can be considered as a measure of size
• That would include:
• The complexity of the operations used
• The number of operations
• However, time is normally hard to measure reliably
• Other processes may interfere
• Implicit methods [7] can be applied instead, with no need to measure time
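As an illustration of the second mechanism ("size punishes fitness", also called parsimony pressure), a minimal sketch on the illustrative Node trees used earlier; the penalty coefficient is an arbitrary choice.

class Parsimony {
    // Number of nodes in a tree.
    static int size(Node n) {
        return (n instanceof Op op) ? 1 + size(op.left) + size(op.right) : 1;
    }

    // Raw fitness reduced in proportion to tree size, so bloated individuals
    // are at a disadvantage during selection.
    static double adjustedFitness(double rawFitness, Node individual, double sizePenalty) {
        return rawFitness - sizePenalty * size(individual);
    }
}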
Evolutionary Computation research tool (ECJ)
• Developed at George Mason University [2]
• Eliminates the need to implement the evolutionary process from scratch
• Widely used in the community
• Main features:
• Multi-platform: Java
• Flexibility: easy to implement many kinds of problems
• Configuration files
• Checkpointing
• Multi-threading
• Pseudo-random number generator: reproducible results
Evolutionary Computation research tool (ECJ)
• Multiplexer problem
• Operations
• And
• Or
• Not
• If
• Terminals
• Data: D0, D1, D2, D3
• Address: A0, A1
Figure: a 6-input multiplexer; the address lines A0 and A1 select which of the data lines D0, D1, D2, D3 is routed to the output.
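The behaviour the evolved programs must reproduce can be stated in a few lines; a sketch of the target function of the 6-input multiplexer (names are illustrative).

class MultiplexerTarget {
    // The two address bits select which of the four data bits appears at the output.
    static boolean output(boolean a0, boolean a1, boolean[] d) {    // d = {D0, D1, D2, D3}
        int address = (a1 ? 2 : 0) + (a0 ? 1 : 0);
        return d[address];
    }
}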
Evolutionary Computation research tool (ECJ)
• Configuring the multiplexer problem
parent.0 = ../../gp/koza/koza.params
# Function set
gp.fs.size = 1
gp.fs.0.name = f0
gp.fs.0.size = 10
# Define problem
eval.problem = ec.app.multiplexer.Multiplexer
eval.problem.data = ec.app.multiplexer.MultiplexerData
eval.problem.bits = 2
gp.fs.0.func.0 = ec.app.multiplexerslow.func.And
gp.fs.0.func.0.nc = nc2
gp.fs.0.func.1 = ec.app.multiplexerslow.func.Or
gp.fs.0.func.1.nc = nc2
gp.fs.0.func.2 = ec.app.multiplexerslow.func.Not
gp.fs.0.func.2.nc = nc1
gp.fs.0.func.3 = ec.app.multiplexerslow.func.If
gp.fs.0.func.3.nc = nc3
gp.fs.0.func.4 = ec.app.multiplexerslow.func.A0
gp.fs.0.func.4.nc = nc0
gp.fs.0.func.5 = ec.app.multiplexerslow.func.A1
gp.fs.0.func.5.nc = nc0
gp.fs.0.func.6 = ec.app.multiplexerslow.func.D0
gp.fs.0.func.6.nc = nc0
gp.fs.0.func.7 = ec.app.multiplexerslow.func.D1
gp.fs.0.func.7.nc = nc0
gp.fs.0.func.8 = ec.app.multiplexerslow.func.D2
gp.fs.0.func.8.nc = nc0
gp.fs.0.func.9 = ec.app.multiplexerslow.func.D3
gp.fs.0.func.9.nc = nc0
Evolutionary Computation research tool (ECJ)
• Problem implementation
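The original slide shows the Java code of the problem class; as a hedged illustration, a simplified sketch in the style of ECJ's GPProblem/SimpleProblemForm is given below. Class and field names such as MultiplexerSketch, BoolData and bits are made up for this sketch, and exact ECJ signatures may differ between releases; the authoritative code is the multiplexer example bundled with ECJ.

import ec.EvolutionState;
import ec.Individual;
import ec.gp.GPIndividual;
import ec.gp.GPProblem;
import ec.gp.koza.KozaFitness;
import ec.simple.SimpleProblemForm;

public class MultiplexerSketch extends GPProblem implements SimpleProblemForm {
    // Hypothetical field: the six input bits (A0, A1, D0..D3) of the case being evaluated.
    public boolean[] bits = new boolean[6];

    public void evaluate(EvolutionState state, Individual ind, int subpopulation, int threadnum) {
        if (ind.evaluated) return;
        BoolData d = new BoolData();                        // hypothetical GPData subclass (see next sketch)
        int hits = 0;
        for (int c = 0; c < 64; c++) {                      // all 2^6 input combinations
            for (int b = 0; b < 6; b++) bits[b] = ((c >> b) & 1) == 1;
            // Run the individual's tree on this case (no ADFs used, so no ADF stack is passed).
            ((GPIndividual) ind).trees[0].child.eval(state, threadnum, d, null, (GPIndividual) ind, this);
            int address = (bits[1] ? 2 : 0) + (bits[0] ? 1 : 0);
            boolean expected = bits[2 + address];           // data bit selected by A1, A0
            if (d.x == expected) hits++;
        }
        KozaFitness f = (KozaFitness) ind.fitness;
        f.setStandardizedFitness(state, 64 - hits);         // lower is better; 0 means all cases correct
        f.hits = hits;
        ind.evaluated = true;
    }
}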
Evolutionary Computation research tool (ECJ)
• Functions implementation
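The original slide shows the function classes; a hedged sketch of one of them (And) follows, in the style of ECJ's GPNode. The data class BoolData and its field x are hypothetical simplifications introduced for this sketch (ECJ's own multiplexer example uses its own MultiplexerData), and the eval signature follows ECJ's GPNode but may differ slightly between releases.

import ec.EvolutionState;
import ec.Problem;
import ec.gp.ADFStack;
import ec.gp.GPData;
import ec.gp.GPIndividual;
import ec.gp.GPNode;

// Hypothetical GPData subclass used to pass a boolean value between nodes.
class BoolData extends GPData {
    public boolean x;
    public void copyTo(GPData other) { ((BoolData) other).x = x; }
}

public class And extends GPNode {
    public String toString() { return "and"; }     // how the node is printed in evolved trees

    // The arity (two children) comes from the node-constraints entry, e.g. gp.fs.0.func.0.nc = nc2.
    public void eval(EvolutionState state, int thread, GPData input,
                     ADFStack stack, GPIndividual individual, Problem problem) {
        BoolData d = (BoolData) input;
        children[0].eval(state, thread, input, stack, individual, problem);   // evaluate first child
        boolean left = d.x;
        children[1].eval(state, thread, input, stack, individual, problem);   // evaluate second child
        d.x = left && d.x;                                                    // AND of the two results
    }
}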
Evolutionary Computation research tool (ECJ)
• Terminals implementation
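Similarly for a terminal such as A0: it has no children and simply writes the current value of the A0 input into the shared data object. This reuses the hypothetical BoolData and MultiplexerSketch classes from the previous sketches, so the field names are assumptions rather than ECJ's actual multiplexer code.

import ec.EvolutionState;
import ec.Problem;
import ec.gp.ADFStack;
import ec.gp.GPData;
import ec.gp.GPIndividual;
import ec.gp.GPNode;

public class A0 extends GPNode {
    public String toString() { return "a0"; }

    // Terminal: no children, matching the nc0 node constraint in the parameter file.
    public void eval(EvolutionState state, int thread, GPData input,
                     ADFStack stack, GPIndividual individual, Problem problem) {
        BoolData d = (BoolData) input;
        d.x = ((MultiplexerSketch) problem).bits[0];   // bit 0 of the hypothetical test-case array holds A0
    }
}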
Evolutionary Computation research tool (ECJ)
• Execution
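ECJ runs are normally launched from the command line by pointing ec.Evolve at the parameter file, for example: java ec.Evolve -file <path-to-multiplexer-params-file> (the exact file name and path depend on the ECJ distribution). During the run, ECJ prints per-generation progress and typically writes the best individual of each generation to a statistics file such as out.stat, depending on the configuration.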
Distributed processing
• Evaluation of individuals could be computationally expensive
• Easily parallelizable
• Different approaches to distribute the work
• Island models
• Master-slave
• Integrating ECJ with Hadoop [8]
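In a master-slave setup, one process keeps the evolutionary loop and farms the fitness evaluations out to workers; a minimal thread-based sketch is shown below (illustrative only; ECJ's distributed evaluator and the Hadoop integration [8] use their own machinery).

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.ToDoubleFunction;

class ParallelEvaluation {
    // The "master" submits one evaluation task per individual and waits for the results.
    static double[] evaluateAll(List<Node> population, ExecutorService workers,
                                ToDoubleFunction<Node> fitnessFunction)
            throws InterruptedException, ExecutionException {
        List<Future<Double>> futures = new ArrayList<>();
        for (Node individual : population) {
            futures.add(workers.submit(() -> fitnessFunction.applyAsDouble(individual)));
        }
        double[] fitness = new double[population.size()];
        for (int i = 0; i < fitness.length; i++) fitness[i] = futures.get(i).get();
        return fitness;
    }
}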
Distributed processing: an example [9]
The example in [9] speeds up an evolutionary approach to face recognition by distributing the evaluation of individuals with Hadoop.
Summarizing
• EAs and GP can be applied to a wide range of problems
• When finding the exact solution is computationally expensive
• When a near-optimal solution is sufficient
• As with the evolution of individuals in nature, computational problems can be solved through evolution
• Initialization, evaluation, selection, breeding and mutation
• Different challenges still to be faced
• Bloat control
• Expensive evaluation, distribution of the workload
• Existing tools, like ECJ, help with research and applications
Questions?
References
• [1] https://en.wikipedia.org/wiki/Genetic_programming
• [2] https://cs.gmu.edu/~eclab/projects/ecj/
• [3] S. Luke. Issues in scaling genetic programming: Breeding strategies, tree generation, and code bloat. PhD thesis, Department of Computer Science, University of Maryland, College Park, MD 20742, USA, 2000.
• [4] R. Poli. A simple but theoretically-motivated method to control bloat in genetic programming. Genetic Programming, Proceedings of EuroGP'2003, pages 204–217. Springer, 2003.
• [5] S. Bleuler; M. Brack; L. Thiele. Multiobjective genetic programming: reducing bloat using SPEA2. Proceedings of the 2001 Congress on Evolutionary Computation CEC2001, pages 536–543, COEX, World Trade Center, 159 Samseong-dong, Gangnam-gu, Seoul, Korea. IEEE Press, 2001.
• [6] Eva Alfaro-Cid; Juan Julián Merelo Guervós; F. Fernández de Vega; Anna I. Esparcia-Alcázar; Ken Sharman. Bloat control operators and diversity in genetic programming: A comparative study. Evolutionary Computation 18(2): 305-332, 2010.
• [7] D. Lanza, F. Chavez, F. Fernandez and G. Olague. Prevención del bloat mediante una interpretación espacio-temporal de la Programación Genética Paralela. CEDI 2016 paper.
• [8] Francisco Chavez, Francisco Fernandez, Cesar Benavides, Daniel Lanza, Juan Villegas, Leonardo Trujillo, Gustavo
Olague, Graciela Roman. ECJ+HADOOP: An Easy Way to Deploy Massive Runs of Evolutionary Algorithms. EvoPAR
2015 paper.
• [9] Francisco Chavez, Francisco Fernandez, Cesar Benavides-Alvarez, Daniel Lanza and Juan Villegas. Speeding up
Evolutionary Approaches to Face Recognition by Means of Hadoop. EVO 2016 paper.