Solving GA-Hard Problems with EMMRS and GPGPUs

J. Ignacio Hidalgo [email protected]
J. Manuel Colmenar [email protected]
Carlos Sánchez-Lacruz [email protected]
Juan Lanchares [email protected]
Jose L. Risco-Martín [email protected]
Oscar Garnica [email protected]

Adaptive and Bioinspired Systems Group (ABSys)
Facultad de Informática, Universidad Complutense de Madrid (Spain)

ABSTRACT

Different techniques have been proposed to tackle GA-Hard problems.
Some techniques work with different encodings and representations, others use reordering operators, and several, such as the Evolutionary Mapping Method (EMM), apply genotype-phenotype mappings. EMM uses multiple chromosomes in a single cell for mating with another cell within a single population. Although EMM gave good results, it fails on some deceptive problems. In this line, EMMRS (EMM with Replacement and Shift) adds a new operator consisting of a replacement and a shift of some of the bits within the chromosome. Results showed the efficiency of the proposal on deceptive problems. However, EMMRS was not tested on other kinds of hard problems. In this paper we have adapted EMMRS to solve the Traveling Salesman Problem (TSP). The encodings and genetic operators for solving the TSP are quite different from those applied on deceptive problems. In addition, execution times recommended the parallelization of the GA, so we implemented a GPU parallel version. We present some preliminary results showing that the Evolutionary Mapping Method with Replacement and Shift gives good results not only in terms of quality but also in terms of speedup in its GPU parallel version for some instances of the TSP.

Categories and Subject Descriptors

I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, and Search—Heuristic methods; G.1.6 [Numerical Analysis]: Optimization—Global optimization

Keywords

GAs, Hard problems, genotype and phenotype mapping

1. INTRODUCTION AND MOTIVATION

Although GAs are very powerful algorithms capable of successfully solving many complex optimization problems, some problems remain difficult to solve and are named GA-Hard problems. Liepins and Vose [10] named four main reasons that make a GA fail in the search for optimal solutions: (1) the domain is not appropriate; (2) the sampling error is too large; (3) crossover destroys good schemata; (4) the problem is deceptive. The term deceptive was introduced, building on previous works, by Goldberg [5][7] to test the limitations of Genetic Algorithms (GAs). Deceptive problems are a particular kind of problem that, when solved with GAs, exploits the weakness of the coding of the chromosomes [15]. The problem stems from classical encodings: normally, a particular codification scheme of binary numbers is chosen to translate each binary sequence into the corresponding numeric value (or other type of information) of the parameter.

Different techniques have been proposed to tackle deceptive problems. Among them, some papers presented techniques that work with different encodings and representations. There are also theoretical works about genotype-phenotype mappings and variations on Whitley's and Goldberg's works. For example, Shackleton et al. [14] show the potential of redundant genotype-phenotype mappings to improve evolutionary algorithms. In [1] Chow proposes a new encoding technique, the Evolutionary Mapping Method (EMM), to tackle some deceptive problems. Chow uses multiple chromosomes in a single cell for mating with another cell within a single population. As he claimed, the mapping from genotype to phenotype is explicitly evolved and maintained. Although this work improved previously reported results, it fails on some deceptive problems and the method does not assure 100% of optimal solutions for all of the tests solved. Building on Chow's ideas, Risco et al. presented a new method to solve deceptive problems by modifying the Evolutionary Mapping Method [12]. The method, EMMRS (EMM with Replacement and Shift), adds a new operator to the traditional crossover and mutation operators, consisting of a replacement and a shift of some of the bits within the chromosome. Experimental results showed the efficiency of the proposal on deceptive problems, obtaining 100% of optimal solutions in fewer generations for the first set of experiments used in [1] and a significant improvement over previous results for other deceptive problems. However, EMMRS was not tested on other kinds of hard problems such as discrete or combinatorial problems. In general, it is not an easy task to determine whether a problem is deceptive or not [3], so a good method which works reasonably well on different kinds of GA-Hard problems (deceptive or not) would be of interest for the community.

In this paper we have adapted EMMRS for solving the Traveling Salesman Problem (TSP). Given a set of cities and the cost of traveling between each pair of cities, the TSP asks for the cheapest way of visiting all the cities and returning to the starting point, visiting each city once and only once. The adaptation is not easy, since the encodings and genetic operators for solving the TSP are quite different from those applied in [12]. In addition, when trying to solve big TSP instances, the high execution time recommends the parallelization of the GA. We chose a GPGPU for implementing the parallel version and we have solved several issues in its implementation. The main contributions of this paper are:

- We present the EMMRS method for permutational representations, with the replacement and shift operator explained in detail in Section 4.
- We have implemented EMMRS on a GPU. EMMRS is not directly parallelizable, so we explain here some important issues of the parallelization process.

EMMRS solved efficiently a set of deceptive problems, obtaining up to 100% of optimal solutions in fewer generations than EMM.
However, EMMRS was not tested on other kinds of hard problems like the TSP, the minimum spanning tree problem (MST) or other combinatorial problems. Given that techniques like EMMRS obtain good results for typical deceptive problems, the aim of our paper is to apply the same ideas to a well-known hard problem, the Traveling Salesman Problem (TSP). Many methods have been applied to solve the TSP; however, the purpose of this work is not to compete with them. Our aim is to test the performance of the EMMRS algorithm (and its GPU version) when solving not only deceptive problems but also a real combinatorial problem of high complexity, like the TSP. We present some preliminary results showing that the Evolutionary Mapping Method with Replacement and Shift gives good results not only in terms of quality but also in terms of speedup in its GPU parallel version. In addition, we have tested a shuffling algorithm for preserving the diversity of random numbers on the GPU.

The rest of the paper is organized as follows. Section 2 briefly reviews the literature on deceptive problems and evolutionary algorithms. Section 3 introduces the TSP instances solved in this paper. Section 4 describes EMMRS, and Section 5 explains the adaptation of EMMRS to the TSP and the GPU parallelization. We present the experimental results in Section 6. Finally, Section 7 gives some conclusions and ideas for future work.

2. RELATED WORK

As we have mentioned, there are some problems which are GA-Hard, and a lot of effort has been devoted in recent years to improving the quality of the solutions obtained by GA-based methods on those problems. Some papers presented techniques that work with different encodings and representations. One reference paper was presented by Whitley in 1991 [15]. There, the author presents several theorems about deception and argues that the problems which are interesting for GA optimization involve some degree of deception. Whitley proposed the use of tags to specify the phenotypic bit location, in addition to each binary chromosome. Goldberg and Bridges also presented an analytical study of reordering operators in [6]. The main drawback of those approaches is the need for a double search.
There are also theoretical works about genotype-phenotype mappings and variations on Whitley's and Goldberg's works. For example, Shackleton et al. [14] need a double search to obtain optimal gene positions (instead of bit positions) to preserve building blocks. Igel and Toussaint worked on the design of neutral encodings to improve the efficiency of evolutionary algorithms [8]. A similar approach is used by Dasgupta in [2]. He presents another approach for solving deceptive problems, implementing what he called the structured genetic algorithm, which uses a two-level hierarchical encoding in its over-specified chromosomal representation. Finally, Rothlauf designed practical representations for genetic and evolutionary computation in [13]. These works allow a better understanding of representation issues in this kind of algorithm. In [1] Chow proposes EMM (see Section 4) to tackle some deceptive problems with evolved mapping chromosomes. However, EMM failed on some deceptive problems and the method does not assure 100% of optimal solutions. Motivated by this lack of results and as a continuation of Chow's work, Risco et al. proposed EMMRS [12], which incorporates a replacement and shift operator.

3. TSP

In this work we have used several instances of the Traveling Salesman Problem (TSP). We selected the TSP because it is one of the most intensively studied problems in computational mathematics and it has many applications in real-world problems. This work was carried out under a project which considers logistics problems. We should highlight that the purpose of this work was to test that EMMRS works not only on deceptive continuous problems but also on discrete combinatorial optimization problems. Future work will include harder instances of the TSP.

Given a set of cities and known distances between each pair of cities, the TSP is to obtain a tour that visits each city exactly once, returns to the starting point, and minimizes the total distance traveled. If we represent the cities as the vertices of a graph and the connections between cities as edges, we can formally define the problem as: given an undirected graph and a cost for each edge of that graph, find a Hamiltonian circuit of minimum total cost. The TSP instances used in this study belong to the TSPLIB library (http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/) and are listed in Table 1, including the number of cities and the optimal tour length of each instance.

Table 1: TSP instances used in this study.

  Instance   Number of Cities   Optimal Solution
  Gr48       48                 5046
  Pr76       76                 108159
  kroA100    100                21282
  kroB100    100                22141
  kroC100    100                20749
  kroD100    100                21294
  kroE100    100                22068
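As an illustration of the objective function, the following minimal C sketch evaluates a closed tour against a distance matrix. The function name, the row-major matrix layout and the 0-based city indices are illustrative assumptions, not the exact layout used later on the GPU.

/* Total length of a closed tour over n cities.
 * tour[] is a permutation of {0, ..., n-1}; dist[] is an n x n matrix
 * stored row-major, so dist[i*n + j] is the cost of the edge (i, j).
 * The wrap-around edge from the last city back to the first closes
 * the Hamiltonian circuit required by the TSP definition. */
double tour_length(const int *tour, const double *dist, int n) {
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        int from = tour[i];
        int to = tour[(i + 1) % n];   /* wraps around to the start */
        total += dist[from * n + to];
    }
    return total;
}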
4. EVOLUTIONARY MAPPING WITH REPLACEMENT AND SHIFT (EMMRS)

The encoding implemented in this paper is based on the method of double codification presented by Chow. To improve it, Risco et al. proposed EMMRS in [12]: they added a new genetic operator called replacement and shift (R&S) that is applied after the mutation operator in EMM. In this paper we adapt EMMRS to the TSP and propose a parallel implementation to run on a GPU. Thus, in this section we first review EMM, then we explain EMMRS, and finally we detail our EMMRS code implementation.

In [1] Chow proposes a double encoding technique, the Evolutionary Mapping Method (EMM), to tackle deceptive problems. According to this method, each individual is represented by two chromosomes: a data chromosome, which stores the genotype for the optimization function, and a mapping chromosome, which stores the location of each data element as integer values. Each pair of data and mapping chromosomes forms a so-called cell. Figure 1 represents an example of the EMM codification. The major advantage of EMM is that decoupling the genotype from the mapping process permits capturing useful schemes in hard problems [1]. The operators applied to both chromosomes are those habitually used in GAs: a traditional two-point crossover operator and a mutation operator, an integer operator that randomly alters a gene in one of the positions.

[Figure 1: EMM example, showing how a data chromosome and a mapping chromosome combine into the phenotype.]

Hence, the bases of EMM are: (1) the data chromosomes are forced to concentrate on the exploration and survival in the genotypic space, where the genomic material constantly suffers construction and destruction of schemes; (2) the mapping chromosomes concentrate on the effort to obtain an optimal mapping between the genotype and the phenotype. Given that the number of chromosomes is twice the number of individuals, and genetic operators work over both elements, the amount of work to be done is higher in EMM than in a classical GA. Therefore, a parallel implementation should be proposed in order to speed up the execution.
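As an illustration of the double encoding, the sketch below decodes a cell into its phenotype. The array names and the convention that mapping[i] holds the phenotypic position of data gene i are our reading of the description in [1], not code from the original papers.

/* EMM decoding step: the mapping chromosome tells where each data
 * gene is placed in the phenotype.
 *   data[]       genotype values (bits in the original EMM)
 *   mapping[]    mapping[i] = phenotypic position of data[i], 0..n-1
 *   phenotype[]  output buffer of length n */
void emm_decode(const int *data, const int *mapping, int *phenotype, int n) {
    for (int i = 0; i < n; i++) {
        phenotype[mapping[i]] = data[i];   /* scatter data through the mapping */
    }
}

Because the mapping is a permutation of the gene positions, a single change in the mapping chromosome can relocate several defining bits of the phenotype at once, which is the effect discussed below.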
Risco et al. presented in [12] an R&S operator which is applied to the mapping chromosome after crossover and mutation in an EMM approach. They named this method EMMRS, which stands for EMM with Replacement and Shift. The authors proved that the R&S operator reduces the number of evaluations needed to find the global optimum. Figure 2 depicts the genetic operators applied to the mapping chromosome under the EMMRS approach. As in [1], a 2-point integer crossover and an integer mutation operator are applied. After them, the R&S operator is applied, which randomly picks two genes at random positions i and j and makes m[j] = m[i], where m is the mapping array. Next, it moves j to follow i, shifting the rest along to accommodate. Mutation and R&S alter each gene independently with probabilities Pm and PR&S.

[Figure 2: EMMRS. Genetic operators.]

As stated in the paper by Chow, once a subschema is explicitly formed, the change of one genotypic bit may trigger changes of multiple defining bits in the phenotype. Hence, the Hamming distance between two hyperplanes is shortened. The inclusion of the R&S operator obtains a further reduction of the Hamming distance between hyperplanes for two reasons. First, replacement reinforces the movement from one schema to another in the mapping chromosome. Second, shifting takes into account the relative order of the mapping bits, and therefore the relative order of bits in the phenotype, which recalls the partial schemas reached at that point. Thus, as stated in [12], the search in EMMRS is able to get away from a deceptive attractor and move towards a global optimum faster than EMM, especially in those problems where the global optimum is far away (in terms of Hamming distance) from the deceptive attractor.

All the benefits described above led us to select EMMRS as the scheme to tackle the TSP, and we have implemented it as shown in Figure 3. Given that the kernel of the algorithm is a traditional GA, the main function follows its typical structure. Initially, a random population is created. Next, we iterate until the termination condition is satisfied. The loop executes the genetic operators, where selection and crossover use both the data and the mapping chromosomes. The mapping chromosome is needed in selection to obtain the phenotype and the fitness value. The crossover operator is applied to data and mapping chromosomes, generating two offspring (children). After that, the mutation operator is applied over the offspring. Notice that mutation is executed independently twice: once for the data chromosome and once for the mapping chromosome. This requires the generation of two random numbers in order to decide whether mutation is applied on each chromosome. Finally, the R&S operator is applied. Figure 3 shows the implementation of this operator in the replacementAndShift function, which performs the operator as described above.

EMMRS::main() {
    Initialize population pop
    t = 0;
    While ((t < maxGenerations) and (Not Optimum Found)) {
        parent1 = tournamentSelection(pop);
        parent2 = tournamentSelection(pop);
        children = crossover(parent1, parent2);
        children.genotype.mutate(probMutation);
        children.mapping.mutate(probMutation);
        children.mapping.replacementAndShift(probRS);
        t = t + 1;
    }
}

EMMRS::replacementAndShift(prob) {
    i = 0;
    While (i < mapping.length) {
        If (randFloat(0,1) < prob) {
            pos2 = randInteger(0, mapping.length);
            posMin = min(i, pos2);
            posMax = max(i, pos2);
            mapping[posMin+2 .. posMax] = mapping[posMin+1 .. posMax-1];
            mapping[posMin+1] = mapping[posMin];
        }
        i = i + 1;
    }
}

Figure 3: Proposed EMMRS algorithm implementation.
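For readers who prefer plain C to the slice notation of Figure 3, the following is one possible rendering of the R&S operator. The rand_float helper and the guard for the degenerate case i == pos2 are our additions; the right shift by one position and the duplication of mapping[posMin] follow the pseudocode.

#include <stdlib.h>
#include <string.h>

static float rand_float(void) {             /* uniform value in [0, 1) */
    return (float)rand() / ((float)RAND_MAX + 1.0f);
}

void replacement_and_shift(int *mapping, int len, float prob) {
    for (int i = 0; i < len; i++) {
        if (rand_float() < prob) {
            int pos2 = rand() % len;
            int lo = (i < pos2) ? i : pos2;
            int hi = (i < pos2) ? pos2 : i;
            if (hi > lo) {                  /* skip the degenerate i == pos2 case */
                /* mapping[lo+1 .. hi-1] moves one slot to the right;
                 * memmove is used because source and destination overlap */
                memmove(&mapping[lo + 2], &mapping[lo + 1],
                        (size_t)(hi - lo - 1) * sizeof(int));
                mapping[lo + 1] = mapping[lo];   /* replacement: m[j] = m[i] */
            }
        }
    }
}

Note that the operator may duplicate a value inside the mapping, which is precisely the replacement effect described in [12].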
5. IMPLEMENTATION OF EMMRS IN TSP

In this section we explain some decisions made about the concepts presented previously in relation to the design of the evolutionary algorithm. First we focus on the adaptation of EMMRS to the TSP, and then we review some important concepts of the parallel implementation on the GPU.

5.1 Representation

We employed a permutation representation of the individuals, which is the most natural way of representing problems based on scheduling or sequencing. In permutation encodings, if there are X variables in the problem (jobs, cities, etc.), then the representation is a list of X integers, each of which appears exactly once in the chromosome. EMMRS needs two chromosomes, thus each individual is represented by two chromosomes of N integers (where N is the number of cities) with values between 1 and N: a genotype chromosome (G) and a mapping chromosome (M). The advantage is that EMM and EMMRS were designed for dealing with permutations, since the mapping operation is necessarily a permutation of the number of genes of the chromosome. Chromosomes are stored separately, in two vectors of N * Spop dimensions, where Spop is the size of the population of individuals. This way of storing the information meets the requirements for correct memory coalescing when threads access the population stored in the global memory of the GPU. Figure 4 shows an example of an individual for 9 cities, where G is the genotype chromosome, M is the mapping chromosome and T is the resulting tour represented by the pair of chromosomes. It is important to note that the permutational nature of the solutions of the TSP slightly changes the mapping process. Unlike the original EMM and EMMRS implementations, we are allowed to map each tour only once. We will see in the experimental results that EMMRS also works fine with this evolutionary mapping.

[Figure 4: Individual for 9 cities. G is the genotype chromosome, M is the mapping chromosome and T is the tour.]
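The composition of G and M into the tour T is shown graphically in Figure 4 but not as code. One natural reading, under the assumption that T[i] = G[M[i]] with 1-based city values, is the following sketch; the exact composition order is our assumption, since it is not spelled out in the text.

/* Compose the genotype and mapping permutations into a tour.
 * G[] and M[] hold values 1..N (Section 5.1), so M is shifted by one
 * when used as an index. The composition of two permutations is again
 * a permutation, so T is always a valid tour.
 * ASSUMPTION: T[i] = G[M[i]] is one plausible convention for Figure 4. */
void decode_tour(const int *G, const int *M, int *T, int n) {
    for (int i = 0; i < n; i++) {
        T[i] = G[M[i] - 1];
    }
}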
5.2 Initial Population

When generating the initial population it is necessary to keep in mind both the problem and the parallel architecture. We have applied the algorithm shown in Figure 5. The process uses an auxiliary vector l_aux for storing the elements still available for filling the chromosome (G or M) without repetition. This process is repeated twice for each individual, once for the G chromosome and once for the M chromosome.

void Chromosome_init(int* Chromosome, const int N) {  // N = number of cities
    unsigned int r = 0;
    int* l_aux = (int*)malloc(N * sizeof(int));
    for (int i = 0; i < N; i++) {
        l_aux[i] = i + 1;                // cities 1..N still available
    }
    for (int i = 0; i < N - 1; i++) {
        r = rand() % (N - i);            // pick one of the N-i remaining cities
        Chromosome[i] = l_aux[r];
        l_aux[r] = l_aux[N - i - 1];     // move the last remaining city into the gap
    }
    Chromosome[N - 1] = l_aux[0];
    free(l_aux);
}

Figure 5: Algorithm to initialize a chromosome.

The main advantage of the algorithm above is that we do not need to check which components of the l_aux vector have already been used, reducing the computational time of the process. This is very important, especially bearing in mind that we work with up to 4096 individuals per population.

5.3 Operators

The method chosen for the crossover operator in this work is PMX (Partially Mapped Crossover) [4], which has been extensively applied in TSP evolutionary algorithms. The operator is applied separately to the genotype chromosomes (G) and the mapping chromosomes (M) of both parents. Figure 6 shows an example with two individuals using genotype and mapping chromosomes and 7 cities. Parent 1 is represented by MP1 and GP1, and Parent 2 by MP2 and GP2. PMX works by first randomly selecting two points within each pair of chromosomes: am and bm for MP1 and MP2, and ag and bg for GP1 and GP2, obtaining 3 sections: Initial-Medium-Final. Then, PMX interchanges the Medium segments of both parents (genes from ai to bi) to generate two new individuals: Offspring 1, represented by MO1 and GO1, and Offspring 2, by MO2 and GO2. The rest of the chromosomes of the offspring are filled as much as possible with the same parent elements placed in the same position. The only condition is to respect the permutation encoding. After this step, if there are still free genes, those are positioned according to the position of the replaced allele in the other parent. In other words, PMX solves the permutational constraint problem by establishing relationships between the exchanged genes. Mutation and the R&S operator were explained in Section 4, and the selection is a tournament selection of two individuals, which is a good selection operator for parallel implementations on the GPU, since we can perform the tournaments in parallel.

[Figure 6: PMX crossover example with Mapping and Genotype chromosomes.]
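Since the repair phase of PMX is the delicate part (and, as noted in Section 5.6, the part that resists thread-level parallelization), the following sketch shows the textbook operator [4] for one offspring. It is our illustration of the operator described above, with hypothetical helper names, not the paper's GPU kernel.

#include <stdlib.h>

/* PMX for one offspring. p1[] and p2[] are parent permutations of the
 * cities 1..n; a and b (0 <= a <= b < n) delimit the Medium segment.
 * The child takes p1's Medium segment and keeps p2's genes elsewhere,
 * repairing duplicates through the positional relationships created
 * by the exchanged segment. */
void pmx_offspring(const int *p1, const int *p2, int *child,
                   int a, int b, int n) {
    int *pos = (int *)malloc((size_t)(n + 1) * sizeof(int));
    for (int i = 0; i < n; i++) {
        child[i] = p2[i];          /* start as a copy of the other parent */
        pos[p2[i]] = i;            /* pos[city] = index of city in child */
    }
    for (int i = a; i <= b; i++) {
        int wanted = p1[i];        /* gene that must occupy position i */
        int present = child[i];    /* gene currently there */
        if (wanted != present) {
            int j = pos[wanted];   /* where 'wanted' currently sits */
            child[j] = present;    /* swap, so child stays a permutation */
            child[i] = wanted;
            pos[present] = j;
            pos[wanted] = i;
        }
    }
    free(pos);
}

Calling the function twice with the parents swapped yields the two offspring, and in our algorithm it is applied independently to the G and the M chromosomes.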
5.4 Improving random numbers

The quality of random numbers is a very important issue when implementing a non-deterministic optimization algorithm. There are a plethora of applications and recipes for the implementation of random number generators on a CPU. However, on a GPU there are additional factors which make it more difficult to implement a good generator without degrading GPU performance. In this paper we propose a simple method for refreshing random numbers on the GPU, in order to improve the quality of the solutions of parallel evolutionary algorithms without performance reduction. It is well known that the bandwidth between the GPU and the CPU is very limited, so this communication channel should be used with caution. In the early versions of our algorithm, random numbers were recomputed on the CPU on each generation and then copied to the GPU. Obviously this greatly slowed the process. Although we did some preliminary tests with NVIDIA's cuRAND library and the Park & Miller generation method [11], we found little improvement in both execution times and quality of solutions. While trying to fix the bugs with those widely tested methods, we considered several solutions, and finally we chose the option of implementing a small Random Number Shuffler (RNS) within the GPU. This shuffler has the property of being easily parallelizable. The RNS simply shuffles random vectors in global memory; in this way we only have to communicate with the host once every two migrations. Figure 7 explains the shuffler process. The threads of a block work in groups of 32, reading one position of the random numbers vector and writing the values into the next 32 positions, but displaced in groups of four threads according to the sequence +2, -1, +1, -2, as Figure 7 shows. Threads copy their values into registers, and those are then written to global memory, displaced in GPU block and position as explained. Registers belonging to the following block also store their values before the new values are written, so we have no problem of missing values and hence we do not need additional or auxiliary memory structures. In any case, this should not be a problem, since random numbers within a block have no dependencies, and changing one value from the random number list will have little effect on the performance of the algorithm. In order to keep control of the process, we perform all the necessary __syncthreads operations. We tested the correctness of the process by checking that the same seed always produces the same numbers on each generation. We implemented this process to solve some problems with the quality of the obtained solutions; however, we think a deeper study and a more exhaustive comparison with other well-tested methods from the literature [9] is necessary.

[Figure 7: The shuffler process.]
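A much simplified sketch of the RNS idea follows: permute the vector of random numbers inside the GPU so that values are refreshed without a CPU round trip. The displacement sequence {+2, -1, +1, -2} over groups of four threads follows the description of Figure 7, but the kernel name and the block-local simplification are assumptions; the real shuffler also displaces values into the next 32 positions and across block boundaries.

/* Block-local sketch of the Random Number Shuffler (RNS).
 * Assumes gridDim.x * blockDim.x equals the vector length and that
 * blockDim.x is a multiple of 4. Each group of four lanes permutes
 * within itself (0->2, 1->0, 2->3, 3->1), so every block's slice of
 * the vector remains a permutation of itself and no value is lost. */
__global__ void rns_shuffle(float *rnd) {
    const int disp[4] = { +2, -1, +1, -2 };
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float v = rnd[idx];      /* copy into a register first */
    __syncthreads();         /* all reads complete before any write */
    rnd[idx + disp[threadIdx.x & 3]] = v;
}

A launch such as rns_shuffle<<<len / 256, 256>>>(d_rnd) then refreshes the whole vector in place, at the negligible cost reported in Section 6.3.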
5.5 Parallel Implementation

As we have mentioned, GPU computing under the CUDA architecture has been applied for the parallelization. GPU computing consists of using the GPU to perform general-purpose computations. CUDA (Compute Unified Device Architecture) is a Single Instruction Multiple Thread (SIMT) computing platform and programming model developed by NVIDIA. Broadly speaking, it allows the use of Graphics Processing Units (GPUs) for general-purpose computation in a simplified way, without deep knowledge of the underlying architecture. Architectures compatible with the CUDA programming model are typically referred to as General Purpose Graphics Processing Units (GPGPUs). However, the CUDA platform presents several implementation constraints that might impact the final performance of GPGPU applications.

The population-level parallelism can be exploited with the implementation of an island model, in which disjoint sets of individuals evolve independently with the exception of infrequent migrations (exchanges of individuals). The evaluation of individuals of the same population can be performed simultaneously through the implementation of a Master-Worker model. We have implemented several configurations of the parallel algorithm, reported in Table 2, which shows the name of the configuration (CONFIG), the number of individuals per island (IND/ISL), the number of islands (ISLANDS), the frequency of migration in number of generations (F) and the total number of migrations (ERAS). The CONFIG names are used in Section 6 to identify the configurations. For example, EMMRS_E denotes that we run the EMMRS algorithm on 64 islands with 64 individuals per island, with a migration frequency of 30 generations and a total number of 30 x 600 = 18000 generations.

Table 2: Configurations.

  CONFIG   IND/ISL   ISLANDS   F    ERAS
  A        32        128       30   600
  B        32        128       40   450
  C        32        256       30   300
  D        32        256       40   225
  E        64        64        30   600
  F        64        64        40   450
  G        64        128       30   300
  H        64        128       40   225
  I        32        256       40   4000
  J        64        128       40   4000

5.6 EMMRS on the GPU

Host (CPU) and device (GPU) have different memory spaces. This means that the device memory cannot be directly accessed from the host and vice versa; thus, memory transfers are necessary to perform data exchanges between CPU and GPU. The programmer has to be aware of the complex GPU memory hierarchy. Note that accessing on-chip memories is two orders of magnitude faster than off-chip memory accesses, so an appropriate use of both registers and shared memory is crucial to obtain the maximum performance from GPUs. The efficiency of those accesses is determined by the configuration in terms of thread blocks.

There are several important kernels in our GPU implementation. The selection kernel is in charge of finding the best individuals and proceeding with the tournament selection. This kernel launches as many blocks as islands, and each block has as many threads as individuals per island. The fitness values of the individuals of an island are stored in shared memory; then one thread obtains the two best individuals of each block. This is the least parallel part of the kernel. Addresses of the best individuals of each island are stored in global memory. The tournament kernel follows a similar grid organization. The crossover kernel performs three operations: crossover, mutation, and replacement and shift. This kernel has the population, the vector of random numbers and the selection vector as inputs; its output is a vector with the offspring population. The structure of this kernel is the most unusual, as it has as many blocks as crossover operations (population size divided by 2), and each block has as many threads as genes in the chromosomes (i.e., the number of cities). Each block manages one crossover and each thread is responsible for copying only one gene. However, in the repair phase, the PMX operator does not allow thread-level parallelization, so a lot of computing time is lost there. Offspring are sent from shared memory to global memory at the end of the kernel.
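The grid layout of the selection kernel can be sketched as follows: one block per island, one thread per individual, fitness values staged in shared memory, and a single thread scanning for the two best individuals (the least parallel step mentioned above). The fixed island size, the names and the assumption that a larger fitness value is better are ours; for a minimization objective the comparisons are reversed.

#define ISLAND_SIZE 64      /* assumed; blockDim.x must match it */

__global__ void best_two_kernel(const float *fitness,
                                int *best, int *second) {
    __shared__ float f[ISLAND_SIZE];
    int island = blockIdx.x;
    int ind = threadIdx.x;
    f[ind] = fitness[island * ISLAND_SIZE + ind];   /* coalesced load */
    __syncthreads();
    if (ind == 0) {          /* sequential scan: the least parallel part */
        int b = 0, s = 1;
        if (f[1] > f[0]) { b = 1; s = 0; }
        for (int i = 2; i < ISLAND_SIZE; i++) {
            if (f[i] > f[b]) { s = b; b = i; }
            else if (f[i] > f[s]) { s = i; }
        }
        best[island] = island * ISLAND_SIZE + b;    /* global indices, */
        second[island] = island * ISLAND_SIZE + s;  /* kept in global memory */
    }
}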
6. EXPERIMENTAL RESULTS

In this section we present four kinds of results. First we justify the choice of the value of the probability of the Replacement and Shift. Then, in order to check the utility of the Replacement and Shift operator on discrete problems, we have performed a set of experiments on the TSP instances listed in Table 1; we contrast the results with those of [1] and [12]. This was the main motivation of the work. As we have also explained, we have parallelized the algorithm on the GPU, so we also present the results of a profiling of the algorithm, which was used to select the functions to parallelize. Finally, we present some results in terms of speedup, comparing the CPU and the CPU-GPU implementations. All the experiments were carried out on a dedicated Intel Core i5 machine, a 4-core processor running at 2.80 GHz with 8 GB of RAM under a Linux operating system, with an NVIDIA GeForce GTX 570. After several tests, the crossover and mutation rates for all the algorithms were set to 0.7 and 0.2, respectively. The R&S probability was set to 0.4, as determined by the experiments shown in the following section.

6.1 Study of the value of the probability of the Replacement and Shift

Before starting to test EMMRS with the TSP instances, we had to select a value for the probability of the R&S operator. We made several tests with the Gr48 and Pr76 instances, with 100 runs of 10 configurations. This study is not the objective of this paper; however, we think it is important to highlight an extract of it. For instance, one of the best configurations in terms of the number of optimal solutions reached is: 4096 individuals, organized in islands of 32, with a migration frequency of 30 generations and a total of 18000 generations. We varied the R&S probability (PR&S) from 0 to 1 in steps of 0.05. Table 3 shows the averaged approximation (in %) to the optimal solution (Avg) over 100 runs of Configuration 1 for the most significant results, i.e., the probabilities which reached optimal solutions. All the configurations obtained the best results for values of PR&S between 0.3 and 0.6, in most cases around 0.4. We therefore fixed the R&S probability to 0.4. As we have said, a more exhaustive statistical analysis of these experiments would be necessary.

Table 3: Study of the value of the probability of the Replacement and Shift (Configuration 1).

  PR&S   Avg      Std Dev       #Opt
  0.15   98.031   1.399755045   2
  0.2    98.594   0.970789525   6
  0.25   98.971   0.903708026   10
  0.3    99.214   0.723249687   16
  0.35   99.353   0.58194348    19
  0.4    99.47    0.468225062   26
  0.45   99.461   0.491970213   21
  0.5    99.496   0.397107748   23
  0.55   99.358   0.567018556   20
  0.6    99.314   0.554906403   15
  0.65   99.316   0.513764949   13
  0.7    99.056   0.708178284   6

6.2 TSP Results

Experimental results are shown in Table 4, Table 5 and Table 6. These tables present the averaged approximation (in %) to the optimal solution (Avg.), the standard deviation (Std) and the number of optimal solutions (#Opt) obtained in the runs of the best configurations of three algorithms: a GA with EMMRS, a GA with EMM, and a simple GA. Each run stops when the maximum number of evaluations is reached. Although it seems clear that EMMRS is better than GA and EMM for the selected TSP instances, we have used a statistical analysis. We applied the Kolmogorov-Smirnov test to check normality and obtained a p-value much lower than 0.05 for the whole dataset, so a non-parametric test is needed. We could perform a Wilcoxon signed-rank test; however, in this case it is not necessary, since EMMRS is better than EMM and GA for all the instances of Table 1. An interesting result in Table 5 and Table 6 is that the results of the GA are better than those of EMM. We need more research on this fact, but it seems that EMM alone is not positive for discrete problems.

Table 4: Results for the 100-city "kro" instances.

  Instance   Metric    EMMRS_I   EMMRS_J   EMM_I    EMM_J    GA_I     GA_J
  kroA100    Avg.      98.343    98.238    61.840   61.180   62.499   62.808
             #Opt      0         1         0        0        0        0
             StdDev    1.144     1.125     3.968    4.214    3.486    4.063
  kroB100    Avg.      97.516    97.451    63.029   62.459   65.569   62.546
             #Opt      0         0         0        0        0        0
             StdDev    1.116     0.975     3.255    3.065    4.667    4.633
  kroC100    Avg.      98.641    97.736    60.255   59.982   61.470   59.576
             #Opt      2         1         0        0        0        0
             StdDev    1.055     1.408     3.961    3.971    3.958    3.460
  kroD100    Avg.      98.044    97.570    62.834   61.655   63.387   63.094
             #Opt      1         0         0        0        0        0
             StdDev    0.955     1.237     3.762    3.046    4.078    4.625
  kroE100    Avg.      98.036    97.875    62.013   62.038   64.052   61.043
             #Opt      1         1         0        0        0        0
             StdDev    0.888     1.164     3.066    3.956    3.702    4.355

Table 5: Results for the Gr48 instance.

  Algorithm   Avg.     Std     #Opt
  EMMRS_A     99.513   0.457   14
  EMMRS_B     99.346   0.538   11
  EMMRS_C     99.124   0.660   5
  EMMRS_D     98.865   0.646   1
  EMMRS_E     99.240   0.645   6
  EMMRS_F     99.118   0.678   8
  EMMRS_G     99.190   0.639   3
  EMMRS_H     99.076   0.731   5
  EMM_A       81.898   3.914   0
  EMM_B       81.369   3.867   0
  EMM_C       82.250   2.754   0
  EMM_D       80.502   3.376   0
  EMM_E       81.178   4.626   0
  EMM_F       80.278   4.214   0
  EMM_G       83.574   3.554   0
  EMM_H       81.861   2.889   0
  GA_A        85.478   3.550   0
  GA_B        85.531   3.687   0
  GA_C        85.242   3.295   0
  GA_D        83.023   3.190   0
  GA_E        83.491   4.286   0
  GA_F        83.905   3.822   0
  GA_G        86.390   4.215   0
  GA_H        85.302   3.118   0

Table 6: Results for the Pr76 instance.

  Algorithm   Avg.     Std     #Opt
  EMMRS_A     98.460   0.805   3
  EMMRS_B     98.548   0.956   6
  EMMRS_C     98.935   0.712   8
  EMMRS_D     98.939   0.744   9
  EMMRS_E     98.135   1.018   3
  EMMRS_F     98.319   1.133   5
  EMMRS_G     98.444   0.868   3
  EMMRS_H     98.358   0.893   3
  EMM_A       70.942   3.561   0
  EMM_B       70.766   4.366   0
  EMM_C       72.966   3.144   0
  EMM_D       72.765   3.349   0
  EMM_E       69.785   4.482   0
  EMM_F       70.830   3.690   0
  EMM_G       72.562   3.099   0
  EMM_H       72.902   3.283   0
  GA_A        71.318   3.953   0
  GA_B        72.237   3.870   0
  GA_C        76.042   3.546   0
  GA_D        74.763   3.397   0
  GA_E        70.778   3.620   0
  GA_F        71.792   4.663   0
  GA_G        74.148   3.965   0
  GA_H        74.170   3.769   0

6.3 Analysis of the parallel algorithm

In most GPGPU scenarios, a segment of the code is still executed on the CPU in a sequential manner, while the GPU is used to accelerate those parts of the code presenting a higher computational cost. Thus, the GPU is mainly used as a co-processor in charge of executing only some specific tasks. It is important to note that launching a kernel is an order of magnitude slower than calling a CPU function. Therefore, the obtained speedup must be high enough to hide the overhead caused by the kernel calls and the memory transfers between CPU and GPU. Figure 8 shows the percentage of the total time spent on each part of the parallel algorithm. As expected, fitness evaluation and PMX are the most time-consuming parts. This is because the distance matrix, which holds the information of the cities and paths, does not fit in shared memory, so the addition of the distances goes through the global memory of the GPU. The figures for the shuffle operator and for migration are also of interest. As we can see, the shuffle time is negligible (0.814%) compared with the cost of copying random numbers from the CPU (4.287%). The migration percentage is also very small (0.043%).

[Figure 8: Execution time of each part of the algorithm, as a percentage of the total time. Fitness evaluation (about 63.8%) and PMX crossover (about 21.5%) dominate; random number copies take 4.287%, the shuffle 0.814% and migration 0.043%.]

6.4 Speedup Results

Regarding running times, we made two different comparisons. The first one is a comparison of the average execution time among EMMRS, EMM and GA on the GPU. We can confirm what was expected: EMMRS needs more time than EMM (5% to 8% more, depending on the instance) and than the GA (25% to 30% more). The main and only reason is the implementation of the R&S operator, which is also the main contributor to the improvement of the quality of the solutions. The lower execution time of the GA is explained by its use of half the number of chromosomes of EMM and EMMRS. Although important, this increase in total time seems reasonable, because we get solutions 15% better for Gr48, 25% better for Pr76 and more than 35% better for the five 100-city instances of the TSP tested. The second study compares the time needed by EMMRS on the GPU and on the CPU for the same number of generations. We observed a speedup of 24, computing the speedup as

  Speedup = T_CPU(16Kgen) / T_GPU(16Kgen),

where T_CPU(16Kgen) and T_GPU(16Kgen) are the execution times for 16000 generations on the CPU and on the CPU-GPU, respectively. This figure is important also if we think (1) about the difficulty of the parallel implementation (on the GPU) of the crossover operator when dealing with permutational encodings (as is the case here with the TSP), and (2) that when computing the fitness value on the GPU we need a high number of data transfers with the global memory of the GPU, since we did not store the distance information in shared memory. The former should be improved in future work when dealing with bigger TSP instances.
7. CONCLUSIONS

In this paper we have checked the utility of the EMMRS algorithm not only on deceptive functions but also on the TSP. The main motivation of the paper was to show that the Replacement and Shift operator favors the search in several regions of the space. It works in a way similar to a mutation operator; however, it is not so aggressive in terms of translation through the search space. We think that, applied with a medium probability, it allows combining both exploration and exploitation. As a final conclusion, we put together the results from [12] and our new results in Table 7. Although we need not only more experimental study but also a theoretical analysis, we extract from the experimental results that the R&S operator is good for problems as different as those presented in Table 7.

Table 7: Joint results from [12] and this work: percentage of runs reaching the optimum (%) and average number of evaluations (Avg.) for GA, EMM and EMMRS.

  Problem    GA %    GA Avg.   EMM %   EMM Avg.   EMMRS %   EMMRS Avg.
  f4         100%    1690      100%    916        100%      179
  f8         0%      n/a       78%     8291       100%      1040
  fU40       0%      n/a       100%    1593       100%      105
  fU60       0%      n/a       100%    2739       100%      253
  fB40       0%      n/a       100%    4578       100%      380
  fB60       0%      n/a       64%     9353       100%      550
  fS         83%     3756      74%     1874       82%       2137
  fR         0%      n/a       0%      n/a        100%      3105
  Gr48       0%      n/a       0%      n/a        28%       13468
  Pr76       0%      n/a       0%      n/a        18%       57911
  kroA100    0%      n/a       0%      n/a        2%        124703
  kroB100    0%      n/a       0%      n/a        0%        n/a
  kroC100    0%      n/a       0%      n/a        4%        85784
  kroD100    0%      n/a       0%      n/a        2%        64893
  kroE100    0%      n/a       0%      n/a        2%        34296

Acknowledgements

The ABSys group is supported by Spanish Government grants INNPACTO-IPT-2011-1198-430000 and TIN 2008-00508.

8. ADDITIONAL AUTHORS

Josefa Díaz (Universidad de Extremadura, GEA Group, email: [email protected]).

9. REFERENCES
[1] R. Chow. Evolving genotype to phenotype mappings with a multiple-chromosome genetic algorithm. In GECCO 2004, volume 3102 of LNCS, pages 1006–1017. Springer, 2004.
[2] D. Dasgupta. Handling deceptive problems using a different genetic search. In International Conference on Evolutionary Computation, pages 807–811, 1994.
[3] S. Forrest and M. Mitchell. What makes a problem hard for a genetic algorithm? Some anomalous results and their explanation. Machine Learning, 13:285–319, 1993.
[4] D. E. Goldberg and R. Lingle. Alleles, loci, and the traveling salesman problem. In International Conference on Genetic Algorithms, 1985.
[5] D. E. Goldberg. Genetic algorithms and Walsh functions Part II: Deception and its analysis. Complex Systems, 3:129–152, 1989.
[6] D. E. Goldberg and C. L. Bridges. An analysis of a reordering operator on a GA-hard problem. Biological Cybernetics, 62:397–405, 1990.
[7] D. E. Goldberg and J. Richardson. Genetic algorithms with sharing for multimodal function optimization. In Proceedings of the Second International Conference on Genetic Algorithms, pages 41–49, Hillsdale, NJ, 1987. Lawrence Erlbaum Associates.
[8] C. Igel and M. Toussaint. On classes of functions for which No Free Lunch results hold. Information Processing Letters, 86:317–321, 2003.
[9] W. B. Langdon. Graphics processing units and genetic programming: an overview. Soft Computing, 15:1657–1669, 2011.
[10] G. E. Liepins and M. D. Vose. Deceptiveness and genetic algorithm dynamics. In G. J. E. Rawlins, editor, FOGA, pages 36–50. Morgan Kaufmann, 1990.
[11] S. K. Park and K. W. Miller. Random number generators: good ones are hard to find. Communications of the ACM, 31:1192–1201, 1988.
[12] J. L. Risco-Martín, J. I. Hidalgo, J. Lanchares, and O. Garnica. Solving discrete deceptive problems with EMMRS. In Genetic and Evolutionary Computation Conference, pages 1139–1140, 2008.
[13] F. Rothlauf. Representations for genetic and evolutionary algorithms, volume 104 of Studies in Fuzziness and Soft Computing. Springer, 2002.
[14] M. Shackleton, R. Shipman, and M. Ebner. An investigation of redundant genotype-phenotype mappings and their role in evolutionary search. In IEEE Congress on Evolutionary Computation, 2000.
[15] L. D. Whitley. Fundamental principles of deception in genetic search. In Foundations of Genetic Algorithms, pages 221–241, 1990.