Solving GA-Hard Problems with EMMRS and GPGPUs

J. Ignacio Hidalgo [email protected]
J. Manuel Colmenar [email protected]
Carlos Sánchez-Lacruz [email protected]
Juan Lanchares [email protected]
Jose L. Risco-Martín [email protected]
Oscar Garnica [email protected]

Adaptive and Bioinspired Systems Group (ABSys)
Facultad de Informática, Universidad Complutense de Madrid (Spain)

ABSTRACT

Different techniques have been proposed to tackle GA-Hard problems.
Some techniques work with different encodings and representations, others use reordering operators, and several, such as the Evolutionary Mapping Method (EMM), apply genotype-phenotype mappings. EMM uses multiple chromosomes in a single cell for mating with another cell within a single population. Although EMM gave good results, it fails on some deceptive problems. In this line, EMMRS (EMM with Replacement and Shift) adds a new operator consisting of a replacement and a shift of some of the bits within the chromosome. Results showed the efficiency of the proposal on deceptive problems. However, EMMRS was not tested on other kinds of hard problems. In this paper we have adapted EMMRS to solve the Traveling Salesman Problem (TSP). The encodings and genetic operators for solving the TSP are quite different from those applied on deceptive problems. In addition, execution times recommended the parallelization of the GA, so we implemented a GPU parallel version. We present some preliminary results showing that the Evolutionary Mapping Method with Replacement and Shift gives good results not only in terms of quality but also in terms of speedup in its GPU parallel version for some instances of the TSP.

Categories and Subject Descriptors

I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, and Search—Heuristic methods; G.1.6 [Numerical Analysis]: Optimization—Global optimization

Keywords

GAs, Hard problems, genotype and phenotype mapping

1. INTRODUCTION AND MOTIVATION

Although GAs are very powerful algorithms capable of successfully solving many complex optimization problems, some problems remain difficult to solve and are named GA-Hard problems. Liepins and Vose [10] named four main reasons that make a GA fail in the search for optimal solutions: (1) the domain is not appropriate; (2) the sampling error is too large; (3) crossover destroys good schemata; (4) the problem is deceptive. The term deceptive was introduced, building on previous works, by Goldberg [5][7] to test the limitations of Genetic Algorithms (GAs). Deceptive problems are a particular kind of problem that, when solved with GAs, exploits the weakness of the coding of the chromosomes [15]. The problem stems from classical encodings: normally, a particular codification scheme of binary numbers is chosen to translate each binary sequence into the corresponding numeric value (or other type of information) of the parameter.

Different techniques have been proposed to tackle deceptive problems. Among them, some papers presented techniques that work with different encodings and representations. There are also theoretical works about genotype-phenotype mappings and variations on Whitley's and Goldberg's works. For example, Shackleton et al. [14] show the potential of redundant genotype-phenotype mappings to improve evolutionary algorithms. In [1] Chow proposes a new encoding technique, the Evolutionary Mapping Method (EMM), to tackle some deceptive problems. Chow uses multiple chromosomes in a single cell for mating with another cell within a single population. As he claimed, the mapping from genotype to phenotype is explicitly evolved and maintained. Although this work improved previously reported results, it fails on some deceptive problems and the method does not assure 100% of optimal solutions for all of the tests solved. Building on Chow's ideas, Risco et al. presented a new method to solve deceptive problems by modifying the Evolutionary Mapping Method [12]. The method, EMMRS (EMM with Replacement and Shift), adds a new operator to the traditional crossover and mutation operators, consisting of a replacement and a shift of some of the bits within the chromosome. Experimental results showed the efficiency of the proposal on deceptive problems, obtaining 100% of optimal solutions in fewer generations for the first set of experiments used in [1] and a significant improvement over previous results for other deceptive problems. However, EMMRS was not tested on other kinds of hard problems such as discrete or combinatorial problems. In general, it is not an easy task to determine whether a problem is deceptive or not [3], so a good method which works reasonably well on different kinds of GA-Hard problems (deceptive or not) would be of interest for the community.

In this paper we have adapted EMMRS for solving the Traveling Salesman Problem (TSP). Given a set of cities and the cost of traveling between each pair of cities, the TSP asks for the cheapest way of visiting all the cities and returning to the starting point, visiting each city once and only once. The adaptation is not easy, since the encodings and genetic operators for solving the TSP are quite different from those applied in [12]. In addition, when trying to solve big TSP instances, the high execution time recommends the parallelization of the GA. We chose a GPGPU for implementing the parallel version and we have solved several issues in its implementation. The main contributions of this paper are:

- We present the EMMRS method for permutational representations, with the replacement and shift operator explained in detail in Section 4.
- We have implemented EMMRS on a GPU. EMMRS is not directly parallelizable, so we explain here some important issues of the parallelization process.

EMMRS solved efficiently a set of deceptive problems, obtaining up to 100% of optimal solutions in fewer generations than EMM.
However, EMMRS was not tested on other kinds of hard problems like the TSP, the minimum spanning tree problem (MST) or other combinatorial problems. Given that techniques like EMMRS obtain good results for typical deceptive problems, the aim of our paper is to apply the same ideas to a well-known hard problem, the Traveling Salesman Problem (TSP). Many methods have been applied to solve the TSP; however, the purpose of this work is not to compete with them. Our aim is to test the performance of the EMMRS algorithm (and its GPU version) when solving not only deceptive problems but also a real combinatorial problem of high complexity, like the TSP. We present some preliminary results showing that the Evolutionary Mapping Method with Replacement and Shift gives good results not only in terms of quality but also in terms of speedup in its GPU parallel version. In addition, we have tested a shuffling algorithm for preserving the diversity of random numbers on the GPU.

The rest of the paper is organized as follows. Section 2 briefly reviews the literature on deceptive problems and evolutionary algorithms. Section 3 introduces the TSP instances solved in this paper. Section 4 describes EMMRS, and Section 5 explains the adaptation of EMMRS to the TSP and the GPU parallelization. We present the experimental results in Section 6. Finally, Section 7 gives some conclusions and ideas for future work.

2. RELATED WORK

As we have mentioned, there are some problems which are GA-Hard, and a lot of effort has been devoted in recent years to improving the quality of the solutions obtained by GA-based methods on those problems. Some papers presented techniques that work with different encodings and representations. One reference paper was presented by Whitley in 1991 [15]. There, the author presents several theorems about deception and argues that the problems which are interesting for GA optimization involve some degree of deception. Whitley proposed the use of tags to specify the phenotypic bit location, in addition to each binary chromosome. Goldberg and Bridges also presented an analytical study of reordering operators in [6]. The main drawback of those approaches is the need for a double search.
There are also theoretical works about genotype-phenotype mappings and variations on Whitley's and Goldberg's works. For example, Shackleton et al. [14] need a double search to obtain optimal gene positions (instead of bit positions) to preserve building blocks. Igel and Toussaint worked on the design of neutral encodings to improve the efficiency of evolutionary algorithms [8]. A similar approach is used by Dasgupta in [2]. He presents another approach for solving deceptive problems, implementing what he called the structured genetic algorithm, which uses a two-level hierarchical encoding in its over-specified chromosomal representation. Finally, Rothlauf designed practical representations for genetic and evolutionary computation in [13]. These works allow a better understanding of representation issues in this kind of algorithm. In [1] Chow proposes EMM (see Section 4) to tackle some deceptive problems with evolved mapping chromosomes. However, EMM failed on some deceptive problems and the method does not assure 100% of optimal solutions. Motivated by this lack of results and as a continuation of Chow's work, Risco et al. proposed EMMRS [12], which incorporates a replacement and shift operator.

3. TSP

In this work we have used several instances of the Traveling Salesman Problem (TSP). We selected the TSP because it is one of the most intensively studied problems in computational mathematics and it has many applications in real-world problems. This work was carried out under a project which considers logistics problems. We should highlight that the purpose of this work was to test that EMMRS works not only on deceptive continuous problems but also on discrete combinatorial optimization problems. Future work will include harder instances of the TSP.

Given a set of cities and known distances between each pair of cities, the TSP is to obtain a tour that visits each city exactly once, returns to the starting point, and minimizes the total distance traveled. If we represent the cities as the vertices of a graph and the connections between cities as edges, we can formally define the problem as: given an undirected graph and a cost for each edge of that graph, find a Hamiltonian circuit of minimum total cost. The TSP instances used in this study belong to the TSPLIB library (http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/) and are listed in Table 1, including the number of cities and the optimal tour length of each instance.

Table 1: TSP instances used in this study.

  Instance   Number of Cities   Optimal Solution
  Gr48       48                 5046
  Pr76       76                 108159
  kroA100    100                21282
  kroB100    100                22141
  kroC100    100                20749
  kroD100    100                21294
  kroE100    100                22068
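As an illustration of the objective function, the following minimal C sketch evaluates a closed tour against a distance matrix. The function name, the row-major matrix layout and the 0-based city indices are illustrative assumptions, not the exact layout used later on the GPU.

/* Total length of a closed tour over n cities.
 * tour[] is a permutation of {0, ..., n-1}; dist[] is an n x n matrix
 * stored row-major, so dist[i*n + j] is the cost of the edge (i, j).
 * The wrap-around edge from the last city back to the first closes
 * the Hamiltonian circuit required by the TSP definition. */
double tour_length(const int *tour, const double *dist, int n) {
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        int from = tour[i];
        int to = tour[(i + 1) % n];   /* wraps around to the start */
        total += dist[from * n + to];
    }
    return total;
}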
4. EVOLUTIONARY MAPPING WITH REPLACEMENT AND SHIFT (EMMRS)

The encoding implemented in this paper is based on the method of double codification presented by Chow. To improve it, Risco et al. proposed EMMRS in [12]: they added a new genetic operator called replacement and shift (R&S) that is applied after the mutation operator in EMM. In this paper we adapt EMMRS to the TSP and propose a parallel implementation to run on a GPU. Thus, in this section we first review EMM, then we explain EMMRS, and finally we detail our EMMRS code implementation.

In [1] Chow proposes a double encoding technique, the Evolutionary Mapping Method (EMM), to tackle deceptive problems. According to this method, each individual is represented by two chromosomes: a data chromosome, which stores the genotype for the optimization function, and a mapping chromosome, which stores the location of each data element as integer values. Each pair of data and mapping chromosomes forms a so-called cell. Figure 1 represents an example of the EMM codification. The major advantage of EMM is that decoupling the genotype from the mapping process permits capturing useful schemes in hard problems [1]. The operators applied to both chromosomes are those habitually used in GAs: a traditional two-point crossover operator and a mutation operator, an integer operator that randomly alters a gene in one of the positions.

[Figure 1: EMM example, showing how a data chromosome and a mapping chromosome combine into the phenotype.]

Hence, the bases of EMM are: (1) the data chromosomes are forced to concentrate on the exploration and survival in the genotypic space, where the genomic material constantly suffers construction and destruction of schemes; (2) the mapping chromosomes concentrate on the effort to obtain an optimal mapping between the genotype and the phenotype. Given that the number of chromosomes is twice the number of individuals, and genetic operators work over both elements, the amount of work to be done is higher in EMM than in a classical GA. Therefore, a parallel implementation should be proposed in order to speed up the execution.
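As an illustration of the double encoding, the sketch below decodes a cell into its phenotype. The array names and the convention that mapping[i] holds the phenotypic position of data gene i are our reading of the description in [1], not code from the original papers.

/* EMM decoding step: the mapping chromosome tells where each data
 * gene is placed in the phenotype.
 *   data[]       genotype values (bits in the original EMM)
 *   mapping[]    mapping[i] = phenotypic position of data[i], 0..n-1
 *   phenotype[]  output buffer of length n */
void emm_decode(const int *data, const int *mapping, int *phenotype, int n) {
    for (int i = 0; i < n; i++) {
        phenotype[mapping[i]] = data[i];   /* scatter data through the mapping */
    }
}

Because the mapping is a permutation of the gene positions, a single change in the mapping chromosome can relocate several defining bits of the phenotype at once, which is the effect discussed below.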
Risco et al. presented in [12] an R&S operator which is applied to the mapping chromosome after crossover and mutation in an EMM approach. They named this method EMMRS, which stands for EMM with Replacement and Shift. The authors proved that the R&S operator reduces the number of evaluations needed to find the global optimum. Figure 2 depicts the genetic operators applied to the mapping chromosome under the EMMRS approach. As in [1], a 2-point integer crossover and an integer mutation operator are applied. After them, the R&S operator is applied, which randomly picks two genes at random positions i and j and makes m[j] = m[i], where m is the mapping array. Next, it moves j to follow i, shifting the rest along to accommodate. Mutation and R&S alter each gene independently with probabilities Pm and PR&S.

[Figure 2: EMMRS. Genetic operators.]

As stated in the paper by Chow, once a subschema is explicitly formed, the change of one genotypic bit may trigger changes of multiple defining bits in the phenotype. Hence, the Hamming distance between two hyperplanes is shortened. The inclusion of the R&S operator obtains a further reduction of the Hamming distance between hyperplanes for two reasons. First, replacement reinforces the movement from one schema to another in the mapping chromosome. Second, shifting takes into account the relative order of the mapping bits, and therefore the relative order of bits in the phenotype, which recalls the partial schemas reached at that point. Thus, as stated in [12], the search in EMMRS is able to get away from a deceptive attractor and move towards a global optimum faster than EMM, especially in those problems where the global optimum is far away (in terms of Hamming distance) from the deceptive attractor.

All the benefits described above led us to select EMMRS as the scheme to tackle the TSP, and we have implemented it as shown in Figure 3. Given that the kernel of the algorithm is a traditional GA, the main function follows its typical structure. Initially, a random population is created. Next, we iterate until the termination condition is satisfied. The loop executes the genetic operators, where selection and crossover use both the data and the mapping chromosomes. The mapping chromosome is needed in selection to obtain the phenotype and the fitness value. The crossover operator is applied to data and mapping chromosomes, generating two offspring (children). After that, the mutation operator is applied over the offspring. Notice that mutation is executed independently twice: once for the data chromosome and once for the mapping chromosome. This requires the generation of two random numbers in order to decide whether mutation is applied on each chromosome. Finally, the R&S operator is applied. Figure 3 shows the implementation of this operator in the replacementAndShift function, which performs the operator as described above.

EMMRS::main() {
    Initialize population pop
    t = 0;
    While ((t < maxGenerations) and (Not Optimum Found)) {
        parent1 = tournamentSelection(pop);
        parent2 = tournamentSelection(pop);
        children = crossover(parent1, parent2);
        children.genotype.mutate(probMutation);
        children.mapping.mutate(probMutation);
        children.mapping.replacementAndShift(probRS);
        t = t + 1;
    }
}

EMMRS::replacementAndShift(prob) {
    i = 0;
    While (i < mapping.length) {
        If (randFloat(0,1) < prob) {
            pos2 = randInteger(0, mapping.length);
            posMin = min(i, pos2);
            posMax = max(i, pos2);
            mapping[posMin+2 .. posMax] = mapping[posMin+1 .. posMax-1];
            mapping[posMin+1] = mapping[posMin];
        }
        i = i + 1;
    }
}

Figure 3: Proposed EMMRS algorithm implementation.
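For readers who prefer plain C to the slice notation of Figure 3, the following is one possible rendering of the R&S operator. The rand_float helper and the guard for the degenerate case i == pos2 are our additions; the right shift by one position and the duplication of mapping[posMin] follow the pseudocode.

#include <stdlib.h>
#include <string.h>

static float rand_float(void) {             /* uniform value in [0, 1) */
    return (float)rand() / ((float)RAND_MAX + 1.0f);
}

void replacement_and_shift(int *mapping, int len, float prob) {
    for (int i = 0; i < len; i++) {
        if (rand_float() < prob) {
            int pos2 = rand() % len;
            int lo = (i < pos2) ? i : pos2;
            int hi = (i < pos2) ? pos2 : i;
            if (hi > lo) {                  /* skip the degenerate i == pos2 case */
                /* mapping[lo+1 .. hi-1] moves one slot to the right;
                 * memmove is used because source and destination overlap */
                memmove(&mapping[lo + 2], &mapping[lo + 1],
                        (size_t)(hi - lo - 1) * sizeof(int));
                mapping[lo + 1] = mapping[lo];   /* replacement: m[j] = m[i] */
            }
        }
    }
}

Note that the operator may duplicate a value inside the mapping, which is precisely the replacement effect described in [12].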
5. IMPLEMENTATION OF EMMRS IN TSP

In this section we explain some decisions made about the concepts presented previously in relation to the design of the evolutionary algorithm. First we focus on the adaptation of EMMRS to the TSP, and then we review some important concepts of the parallel implementation on the GPU.

5.1 Representation

We employed a permutation representation of the individuals, which is the most natural way of representing problems based on scheduling or sequencing. In permutation encodings, if there are X variables in the problem (jobs, cities, etc.), then the representation is a list of X integers, each of which appears exactly once in the chromosome. EMMRS needs two chromosomes, thus each individual is represented by two chromosomes of N integers (where N is the number of cities) with values between 1 and N: a genotype chromosome (G) and a mapping chromosome (M). The advantage is that EMM and EMMRS were designed for dealing with permutations, since the mapping operation is necessarily a permutation of the number of genes of the chromosome. Chromosomes are stored separately, in two vectors of N * Spop dimensions, where Spop is the size of the population of individuals. This way of storing the information meets the requirements for correct memory coalescing when threads access the population stored in the global memory of the GPU. Figure 4 shows an example of an individual for 9 cities, where G is the genotype chromosome, M is the mapping chromosome and T is the resulting tour represented by the pair of chromosomes. It is important to note that the permutational nature of the solutions of the TSP slightly changes the mapping process. Unlike the original EMM and EMMRS implementations, we are allowed to map each tour only once. We will see in the experimental results that EMMRS also works fine with this evolutionary mapping.

[Figure 4: Individual for 9 cities. G is the genotype chromosome, M is the mapping chromosome and T is the tour.]
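The composition of G and M into the tour T is shown graphically in Figure 4 but not as code. One natural reading, under the assumption that T[i] = G[M[i]] with 1-based city values, is the following sketch; the exact composition order is our assumption, since it is not spelled out in the text.

/* Compose the genotype and mapping permutations into a tour.
 * G[] and M[] hold values 1..N (Section 5.1), so M is shifted by one
 * when used as an index. The composition of two permutations is again
 * a permutation, so T is always a valid tour.
 * ASSUMPTION: T[i] = G[M[i]] is one plausible convention for Figure 4. */
void decode_tour(const int *G, const int *M, int *T, int n) {
    for (int i = 0; i < n; i++) {
        T[i] = G[M[i] - 1];
    }
}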
5.2 Initial Population

When generating the initial population it is necessary to keep in mind both the problem and the parallel architecture. We have applied the algorithm shown in Figure 5. The process uses an auxiliary vector l_aux for storing the elements still available for filling the chromosome (G or M) without repetition. This process is repeated twice for each individual, once for the G chromosome and once for the M chromosome.

void Chromosome_init(int* Chromosome, const int N) {  // N = number of cities
    unsigned int r = 0;
    int* l_aux = (int*)malloc(N * sizeof(int));
    for (int i = 0; i < N; i++) {
        l_aux[i] = i + 1;                // cities 1..N still available
    }
    for (int i = 0; i < N - 1; i++) {
        r = rand() % (N - i);            // pick one of the N-i remaining cities
        Chromosome[i] = l_aux[r];
        l_aux[r] = l_aux[N - i - 1];     // move the last remaining city into the gap
    }
    Chromosome[N - 1] = l_aux[0];
    free(l_aux);
}

Figure 5: Algorithm to initialize a chromosome.

The main advantage of the algorithm above is that we do not need to check which components of the l_aux vector have already been used, reducing the computational time of the process. This is very important, especially bearing in mind that we work with up to 4096 individuals per population.

5.3 Operators

The method chosen for the crossover operator in this work is PMX (Partially Mapped Crossover) [4], which has been extensively applied in TSP evolutionary algorithms. The operator is applied separately to the genotype chromosomes (G) and the mapping chromosomes (M) of both parents. Figure 6 shows an example with two individuals using genotype and mapping chromosomes and 7 cities. Parent 1 is represented by MP1 and GP1, and Parent 2 by MP2 and GP2. PMX works by first randomly selecting two points within each pair of chromosomes: am and bm for MP1 and MP2, and ag and bg for GP1 and GP2, obtaining 3 sections: Initial-Medium-Final. Then, PMX interchanges the Medium segments of both parents (genes from ai to bi) to generate two new individuals: Offspring 1, represented by MO1 and GO1, and Offspring 2, by MO2 and GO2. The rest of the chromosomes of the offspring are filled as much as possible with the same parent elements placed in the same position. The only condition is to respect the permutation encoding. After this step, if there are still free genes, those are positioned according to the position of the replaced allele in the other parent. In other words, PMX solves the permutational constraint problem by establishing relationships between the exchanged genes. Mutation and the R&S operator were explained in Section 4, and the selection is a tournament selection of two individuals, which is a good selection operator for parallel implementations on the GPU, since we can perform the tournaments in parallel.

[Figure 6: PMX crossover example with Mapping and Genotype chromosomes.]
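Since the repair phase of PMX is the delicate part (and, as noted in Section 5.6, the part that resists thread-level parallelization), the following sketch shows the textbook operator [4] for one offspring. It is our illustration of the operator described above, with hypothetical helper names, not the paper's GPU kernel.

#include <stdlib.h>

/* PMX for one offspring. p1[] and p2[] are parent permutations of the
 * cities 1..n; a and b (0 <= a <= b < n) delimit the Medium segment.
 * The child takes p1's Medium segment and keeps p2's genes elsewhere,
 * repairing duplicates through the positional relationships created
 * by the exchanged segment. */
void pmx_offspring(const int *p1, const int *p2, int *child,
                   int a, int b, int n) {
    int *pos = (int *)malloc((size_t)(n + 1) * sizeof(int));
    for (int i = 0; i < n; i++) {
        child[i] = p2[i];          /* start as a copy of the other parent */
        pos[p2[i]] = i;            /* pos[city] = index of city in child */
    }
    for (int i = a; i <= b; i++) {
        int wanted = p1[i];        /* gene that must occupy position i */
        int present = child[i];    /* gene currently there */
        if (wanted != present) {
            int j = pos[wanted];   /* where 'wanted' currently sits */
            child[j] = present;    /* swap, so child stays a permutation */
            child[i] = wanted;
            pos[present] = j;
            pos[wanted] = i;
        }
    }
    free(pos);
}

Calling the function twice with the parents swapped yields the two offspring, and in our algorithm it is applied independently to the G and the M chromosomes.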
5.4 Improving random numbers

The quality of random numbers is a very important issue when implementing a non-deterministic optimization algorithm. There are a plethora of applications and recipes for the implementation of random number generators on a CPU. However, on a GPU there are additional factors which make it more difficult to implement a good generator without degrading GPU performance. In this paper we propose a simple method for refreshing random numbers on the GPU, in order to improve the quality of the solutions of parallel evolutionary algorithms without performance reduction. It is well known that the bandwidth between the GPU and the CPU is very limited, so this communication channel should be used with caution. In the early versions of our algorithm, random numbers were recomputed on the CPU on each generation and then copied to the GPU. Obviously this greatly slowed the process. Although we did some preliminary tests with NVIDIA's cuRAND library and the Park & Miller generation method [11], we found little improvement in both execution times and quality of solutions. While trying to fix the bugs with those widely tested methods, we considered several solutions, and finally we chose the option of implementing a small Random Number Shuffler (RNS) within the GPU. This shuffler has the property of being easily parallelizable. The RNS simply shuffles random vectors in global memory; in this way we only have to communicate with the host once every two migrations. Figure 7 explains the shuffler process. The threads of a block work in groups of 32, reading one position of the random numbers vector and writing the values into the next 32 positions, but displaced in groups of four threads according to the sequence +2, -1, +1, -2, as Figure 7 shows. Threads copy their values into registers, and those are then written to global memory, displaced in GPU block and position as explained. Registers belonging to the following block also store their values before the new values are written, so we have no problem of missing values and hence we do not need additional or auxiliary memory structures. In any case, this should not be a problem, since random numbers within a block have no dependencies, and changing one value from the random number list will have little effect on the performance of the algorithm. In order to keep control of the process, we perform all the necessary __syncthreads operations. We tested the correctness of the process by checking that the same seed always produces the same numbers on each generation. We implemented this process to solve some problems with the quality of the obtained solutions; however, we think a deeper study and a more exhaustive comparison with other well-tested methods from the literature [9] is necessary.

[Figure 7: The shuffler process.]
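A much simplified sketch of the RNS idea follows: permute the vector of random numbers inside the GPU so that values are refreshed without a CPU round trip. The displacement sequence {+2, -1, +1, -2} over groups of four threads follows the description of Figure 7, but the kernel name and the block-local simplification are assumptions; the real shuffler also displaces values into the next 32 positions and across block boundaries.

/* Block-local sketch of the Random Number Shuffler (RNS).
 * Assumes gridDim.x * blockDim.x equals the vector length and that
 * blockDim.x is a multiple of 4. Each group of four lanes permutes
 * within itself (0->2, 1->0, 2->3, 3->1), so every block's slice of
 * the vector remains a permutation of itself and no value is lost. */
__global__ void rns_shuffle(float *rnd) {
    const int disp[4] = { +2, -1, +1, -2 };
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float v = rnd[idx];      /* copy into a register first */
    __syncthreads();         /* all reads complete before any write */
    rnd[idx + disp[threadIdx.x & 3]] = v;
}

A launch such as rns_shuffle<<<len / 256, 256>>>(d_rnd) then refreshes the whole vector in place, at the negligible cost reported in Section 6.3.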
5.5 Parallel Implementation

As we have mentioned, GPU computing under the CUDA architecture has been applied for the parallelization. GPU computing consists of using the GPU to perform general-purpose computations. CUDA (Compute Unified Device Architecture) is a Single Instruction Multiple Thread (SIMT) computing platform and programming model developed by NVIDIA. Broadly speaking, it allows the use of Graphics Processing Units (GPUs) for general-purpose computation in a simplified way, without deep knowledge of the underlying architecture. Architectures compatible with the CUDA programming model are typically referred to as General Purpose Graphics Processing Units (GPGPUs). However, the CUDA platform presents several implementation constraints that might impact the final performance of GPGPU applications.

The population-level parallelism can be exploited with the implementation of an island model, in which disjoint sets of individuals evolve independently with the exception of infrequent migrations (exchanges of individuals). The evaluation of individuals of the same population can be performed simultaneously through the implementation of a Master-Worker model. We have implemented several configurations of the parallel algorithm, reported in Table 2, which shows the name of the configuration (CONFIG), the number of individuals per island (IND/ISL), the number of islands (ISLANDS), the frequency of migration in number of generations (F) and the total number of migrations (ERAS). The CONFIG names are used in Section 6 to identify the configurations. For example, EMMRS_E denotes that we run the EMMRS algorithm on 64 islands with 64 individuals per island, with a migration frequency of 30 generations and a total number of 30 x 600 = 18000 generations.

Table 2: Configurations.

  CONFIG   IND/ISL   ISLANDS   F    ERAS
  A        32        128       30   600
  B        32        128       40   450
  C        32        256       30   300
  D        32        256       40   225
  E        64        64        30   600
  F        64        64        40   450
  G        64        128       30   300
  H        64        128       40   225
  I        32        256       40   4000
  J        64        128       40   4000

5.6 EMMRS on the GPU

Host (CPU) and device (GPU) have different memory spaces. This means that the device memory cannot be directly accessed from the host and vice versa; thus, memory transfers are necessary to perform data exchanges between CPU and GPU. The programmer has to be aware of the complex GPU memory hierarchy. Note that accessing on-chip memories is two orders of magnitude faster than off-chip memory accesses, so an appropriate use of both registers and shared memory is crucial to obtain the maximum performance from GPUs. The efficiency of those accesses is determined by the configuration in terms of thread blocks.

There are several important kernels in our GPU implementation. The selection kernel is in charge of finding the best individuals and proceeding with the tournament selection. This kernel launches as many blocks as islands, and each block has as many threads as individuals per island. The fitness values of the individuals of an island are stored in shared memory; then one thread obtains the two best individuals of each block. This is the least parallel part of the kernel. Addresses of the best individuals of each island are stored in global memory. The tournament kernel follows a similar grid organization. The crossover kernel performs three operations: crossover, mutation, and replacement and shift. This kernel has the population, the vector of random numbers and the selection vector as inputs; its output is a vector with the offspring population. The structure of this kernel is the most unusual, as it has as many blocks as crossover operations (population size divided by 2), and each block has as many threads as genes in the chromosomes (i.e., the number of cities). Each block manages one crossover and each thread is responsible for copying only one gene. However, in the repair phase, the PMX operator does not allow thread-level parallelization, so a lot of computing time is lost there. Offspring are sent from shared memory to global memory at the end of the kernel.
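The grid layout of the selection kernel can be sketched as follows: one block per island, one thread per individual, fitness values staged in shared memory, and a single thread scanning for the two best individuals (the least parallel step mentioned above). The fixed island size, the names and the assumption that a larger fitness value is better are ours; for a minimization objective the comparisons are reversed.

#define ISLAND_SIZE 64      /* assumed; blockDim.x must match it */

__global__ void best_two_kernel(const float *fitness,
                                int *best, int *second) {
    __shared__ float f[ISLAND_SIZE];
    int island = blockIdx.x;
    int ind = threadIdx.x;
    f[ind] = fitness[island * ISLAND_SIZE + ind];   /* coalesced load */
    __syncthreads();
    if (ind == 0) {          /* sequential scan: the least parallel part */
        int b = 0, s = 1;
        if (f[1] > f[0]) { b = 1; s = 0; }
        for (int i = 2; i < ISLAND_SIZE; i++) {
            if (f[i] > f[b]) { s = b; b = i; }
            else if (f[i] > f[s]) { s = i; }
        }
        best[island] = island * ISLAND_SIZE + b;    /* global indices, */
        second[island] = island * ISLAND_SIZE + s;  /* kept in global memory */
    }
}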
6. EXPERIMENTAL RESULTS

In this section we present four kinds of results. First we justify the choice of the value of the probability of the Replacement and Shift. Then, in order to check the utility of the Replacement and Shift operator on discrete problems, we have performed a set of experiments on the TSP instances listed in Table 1; we contrast the results with those of [1] and [12]. This was the main motivation of the work. As we have also explained, we have parallelized the algorithm on the GPU, so we also present the results of a profiling of the algorithm, which was used to select the functions to parallelize. Finally, we present some results in terms of speedup, comparing the CPU and the CPU-GPU implementations. All the experiments were carried out on a dedicated Intel Core i5 machine, a 4-core processor running at 2.80 GHz with 8 GB of RAM under a Linux operating system, with an NVIDIA GeForce GTX 570. After several tests, the crossover and mutation rates for all the algorithms were set to 0.7 and 0.2, respectively. The R&S probability was set to 0.4, as determined by the experiments shown in the following section.

6.1 Study of the value of the probability of the Replacement and Shift

Before starting to test EMMRS with the TSP instances, we had to select a value for the probability of the R&S operator. We made several tests with the Gr48 and Pr76 instances, with 100 runs of 10 configurations. This study is not the objective of this paper; however, we think it is important to highlight an extract of it. For instance, one of the best configurations in terms of the number of optimal solutions reached is: 4096 individuals, organized in islands of 32, with a migration frequency of 30 generations and a total of 18000 generations. We varied the R&S probability (PR&S) from 0 to 1 in steps of 0.05. Table 3 shows the averaged approximation (in %) to the optimal solution (Avg) over 100 runs of Configuration 1 for the most significant results, i.e., the probabilities which reached optimal solutions. All the configurations obtained the best results for values of PR&S between 0.3 and 0.6, in most cases around 0.4. We therefore fixed the R&S probability to 0.4. As we have said, a more exhaustive statistical analysis of these experiments would be necessary.

Table 3: Study of the value of the probability of the Replacement and Shift (Configuration 1).

  PR&S   Avg      Std Dev       #Opt
  0.15   98.031   1.399755045   2
  0.2    98.594   0.970789525   6
  0.25   98.971   0.903708026   10
  0.3    99.214   0.723249687   16
  0.35   99.353   0.58194348    19
  0.4    99.47    0.468225062   26
  0.45   99.461   0.491970213   21
  0.5    99.496   0.397107748   23
  0.55   99.358   0.567018556   20
  0.6    99.314   0.554906403   15
  0.65   99.316   0.513764949   13
  0.7    99.056   0.708178284   6

6.2 TSP Results

Experimental results are shown in Table 4, Table 5 and Table 6. These tables present the averaged approximation (in %) to the optimal solution (Avg.), the standard deviation (Std) and the number of optimal solutions (#Opt) obtained in the runs of the best configurations of three algorithms: a GA with EMMRS, a GA with EMM, and a simple GA. Each run stops when the maximum number of evaluations is reached. Although it seems clear that EMMRS is better than GA and EMM for the selected TSP instances, we have used a statistical analysis. We applied the Kolmogorov-Smirnov test to check normality and obtained a p-value much lower than 0.05 for the whole dataset, so a non-parametric test is needed. We could perform a Wilcoxon signed-rank test; however, in this case it is not necessary, since EMMRS is better than EMM and GA for all the instances of Table 1. An interesting result in Table 5 and Table 6 is that the results of the GA are better than those of EMM. We need more research on this fact, but it seems that EMM alone is not positive for discrete problems.

Table 4: Results for the 100-city "kro" instances.

  Instance   Metric    EMMRS_I   EMMRS_J   EMM_I    EMM_J    GA_I     GA_J
  kroA100    Avg.      98.343    98.238    61.840   61.180   62.499   62.808
             #Opt      0         1         0        0        0        0
             StdDev    1.144     1.125     3.968    4.214    3.486    4.063
  kroB100    Avg.      97.516    97.451    63.029   62.459   65.569   62.546
             #Opt      0         0         0        0        0        0
             StdDev    1.116     0.975     3.255    3.065    4.667    4.633
  kroC100    Avg.      98.641    97.736    60.255   59.982   61.470   59.576
             #Opt      2         1         0        0        0        0
             StdDev    1.055     1.408     3.961    3.971    3.958    3.460
  kroD100    Avg.      98.044    97.570    62.834   61.655   63.387   63.094
             #Opt      1         0         0        0        0        0
             StdDev    0.955     1.237     3.762    3.046    4.078    4.625
  kroE100    Avg.      98.036    97.875    62.013   62.038   64.052   61.043
             #Opt      1         1         0        0        0        0
             StdDev    0.888     1.164     3.066    3.956    3.702    4.355

Table 5: Results for the Gr48 instance.

  Algorithm   Avg.     Std     #Opt
  EMMRS_A     99.513   0.457   14
  EMMRS_B     99.346   0.538   11
  EMMRS_C     99.124   0.660   5
  EMMRS_D     98.865   0.646   1
  EMMRS_E     99.240   0.645   6
  EMMRS_F     99.118   0.678   8
  EMMRS_G     99.190   0.639   3
  EMMRS_H     99.076   0.731   5
  EMM_A       81.898   3.914   0
  EMM_B       81.369   3.867   0
  EMM_C       82.250   2.754   0
  EMM_D       80.502   3.376   0
  EMM_E       81.178   4.626   0
  EMM_F       80.278   4.214   0
  EMM_G       83.574   3.554   0
  EMM_H       81.861   2.889   0
  GA_A        85.478   3.550   0
  GA_B        85.531   3.687   0
  GA_C        85.242   3.295   0
  GA_D        83.023   3.190   0
  GA_E        83.491   4.286   0
  GA_F        83.905   3.822   0
  GA_G        86.390   4.215   0
  GA_H        85.302   3.118   0

Table 6: Results for the Pr76 instance.

  Algorithm   Avg.     Std     #Opt
  EMMRS_A     98.460   0.805   3
  EMMRS_B     98.548   0.956   6
  EMMRS_C     98.935   0.712   8
  EMMRS_D     98.939   0.744   9
  EMMRS_E     98.135   1.018   3
  EMMRS_F     98.319   1.133   5
  EMMRS_G     98.444   0.868   3
  EMMRS_H     98.358   0.893   3
  EMM_A       70.942   3.561   0
  EMM_B       70.766   4.366   0
  EMM_C       72.966   3.144   0
  EMM_D       72.765   3.349   0
  EMM_E       69.785   4.482   0
  EMM_F       70.830   3.690   0
  EMM_G       72.562   3.099   0
  EMM_H       72.902   3.283   0
  GA_A        71.318   3.953   0
  GA_B        72.237   3.870   0
  GA_C        76.042   3.546   0
  GA_D        74.763   3.397   0
  GA_E        70.778   3.620   0
  GA_F        71.792   4.663   0
  GA_G        74.148   3.965   0
  GA_H        74.170   3.769   0

6.3 Analysis of the parallel algorithm

In most GPGPU scenarios, a segment of the code is still executed on the CPU in a sequential manner, while the GPU is used to accelerate those parts of the code presenting a higher computational cost. Thus, the GPU is mainly used as a co-processor in charge of executing only some specific tasks. It is important to note that launching a kernel is an order of magnitude slower than calling a CPU function. Therefore, the obtained speedup must be high enough to hide the overhead caused by the kernel calls and the memory transfers between CPU and GPU. Figure 8 shows the percentage of the total time spent on each part of the parallel algorithm. As expected, fitness evaluation and PMX are the most time-consuming parts. This is because the distance matrix, which holds the information of the cities and paths, does not fit in shared memory, so the addition of the distances goes through the global memory of the GPU. The figures for the shuffle operator and for migration are also of interest. As we can see, the shuffle time is negligible (0.814%) compared with the cost of copying random numbers from the CPU (4.287%). The migration percentage is also very small (0.043%).

[Figure 8: Execution time of each part of the algorithm, as a percentage of the total time. Fitness evaluation (about 63.8%) and PMX crossover (about 21.5%) dominate; random number copies take 4.287%, the shuffle 0.814% and migration 0.043%.]

6.4 Speedup Results

Regarding running times, we made two different comparisons. The first one is a comparison of the average execution time among EMMRS, EMM and GA on the GPU. We can confirm what was expected: EMMRS needs more time than EMM (5% to 8% more, depending on the instance) and than the GA (25% to 30% more). The main and only reason is the implementation of the R&S operator, which is also the main contributor to the improvement of the quality of the solutions. The lower execution time of the GA is explained by its use of half the number of chromosomes of EMM and EMMRS. Although important, this increase in total time seems reasonable, because we get solutions 15% better for Gr48, 25% better for Pr76 and more than 35% better for the five 100-city instances of the TSP tested. The second study compares the time needed by EMMRS on the GPU and on the CPU for the same number of generations. We observed a speedup of 24, computing the speedup as

  Speedup = T_CPU(16Kgen) / T_GPU(16Kgen),

where T_CPU(16Kgen) and T_GPU(16Kgen) are the execution times for 16000 generations on the CPU and on the CPU-GPU, respectively. This figure is important also if we think (1) about the difficulty of the parallel implementation (on the GPU) of the crossover operator when dealing with permutational encodings (as is the case here with the TSP), and (2) that when computing the fitness value on the GPU we need a high number of data transfers with the global memory of the GPU, since we did not store the distance information in shared memory. The former should be improved in future work when dealing with bigger TSP instances.
7. CONCLUSIONS

In this paper we have checked the utility of the EMMRS algorithm not only on deceptive functions but also on the TSP. The main motivation of the paper was to show that the Replacement and Shift operator favors the search in several regions of the space. It works in a way similar to a mutation operator; however, it is not so aggressive in terms of translation through the search space. We think that, applied with a medium probability, it allows combining both exploration and exploitation. As a final conclusion, we put together the results from [12] and our new results in Table 7. Although we need not only more experimental study but also a theoretical analysis, we extract from the experimental results that the R&S operator is good for problems as different as those presented in Table 7.

Table 7: Joint results from [12] and this work: percentage of runs reaching the optimum (%) and average number of evaluations (Avg.) for GA, EMM and EMMRS.

  Problem    GA %    GA Avg.   EMM %   EMM Avg.   EMMRS %   EMMRS Avg.
  f4         100%    1690      100%    916        100%      179
  f8         0%      n/a       78%     8291       100%      1040
  fU40       0%      n/a       100%    1593       100%      105
  fU60       0%      n/a       100%    2739       100%      253
  fB40       0%      n/a       100%    4578       100%      380
  fB60       0%      n/a       64%     9353       100%      550
  fS         83%     3756      74%     1874       82%       2137
  fR         0%      n/a       0%      n/a        100%      3105
  Gr48       0%      n/a       0%      n/a        28%       13468
  Pr76       0%      n/a       0%      n/a        18%       57911
  kroA100    0%      n/a       0%      n/a        2%        124703
  kroB100    0%      n/a       0%      n/a        0%        n/a
  kroC100    0%      n/a       0%      n/a        4%        85784
  kroD100    0%      n/a       0%      n/a        2%        64893
  kroE100    0%      n/a       0%      n/a        2%        34296

Acknowledgements

The ABSys group is supported by Spanish Government grants INNPACTO-IPT-2011-1198-430000 and TIN 2008-00508.

8. ADDITIONAL AUTHORS

Josefa Díaz (Universidad de Extremadura, GEA Group, email: [email protected]).

9. REFERENCES
[1] R. Chow. Evolving genotype to phenotype mappings with a multiple-chromosome genetic algorithm. In GECCO 2004, volume 3102 of LNCS, pages 1006–1017. Springer, 2004.
[2] D. Dasgupta. Handling deceptive problems using a different genetic search. In International Conference on Evolutionary Computation, pages 807–811, 1994.
[3] S. Forrest and M. Mitchell. What makes a problem hard for a genetic algorithm? Some anomalous results and their explanation. Machine Learning, 13:285–319, 1993.
[4] D. E. Goldberg and R. Lingle. Alleles, loci, and the traveling salesman problem. In International Conference on Genetic Algorithms, 1985.
[5] D. E. Goldberg. Genetic algorithms and Walsh functions Part II: Deception and its analysis. Complex Systems, 3:129–152, 1989.
[6] D. E. Goldberg and C. L. Bridges. An analysis of a reordering operator on a GA-hard problem. Biological Cybernetics, 62:397–405, 1990.
[7] D. E. Goldberg and J. Richardson. Genetic algorithms with sharing for multimodal function optimization. In Proceedings of the Second International Conference on Genetic Algorithms, pages 41–49, Hillsdale, NJ, 1987. Lawrence Erlbaum Associates.
[8] C. Igel and M. Toussaint. On classes of functions for which No Free Lunch results hold. Information Processing Letters, 86:317–321, 2003.
[9] W. B. Langdon. Graphics processing units and genetic programming: an overview. Soft Computing, 15:1657–1669, 2011.
[10] G. E. Liepins and M. D. Vose. Deceptiveness and genetic algorithm dynamics. In G. J. E. Rawlins, editor, FOGA, pages 36–50. Morgan Kaufmann, 1990.
[11] S. K. Park and K. W. Miller. Random number generators: good ones are hard to find. Communications of the ACM, 31:1192–1201, 1988.
[12] J. L. Risco-Martín, J. I. Hidalgo, J. Lanchares, and O. Garnica. Solving discrete deceptive problems with EMMRS. In Genetic and Evolutionary Computation Conference, pages 1139–1140, 2008.
[13] F. Rothlauf. Representations for genetic and evolutionary algorithms, volume 104 of Studies in Fuzziness and Soft Computing. Springer, 2002.
[14] M. Shackleton, R. Shipman, and M. Ebner. An investigation of redundant genotype-phenotype mappings and their role in evolutionary search. In IEEE Congress on Evolutionary Computation, 2000.
[15] L. D. Whitley. Fundamental principles of deception in genetic search. In Foundations of Genetic Algorithms, pages 221–241, 1990.