Handbook of Research on Natural Computing for Optimization Problems

Jyotsna Kumar Mandal, University of Kalyani, India
Somnath Mukhopadhyay, Calcutta Business School, India
Tandra Pal, National Institute of Technology Durgapur, India

A volume in the Advances in Computational Intelligence and Robotics (ACIR) Book Series

Published in the United States of America by Information Science Reference (an imprint of IGI Global), 701 E. Chocolate Avenue, Hershey PA, USA 17033. Tel: 717-533-8845; Fax: 717-533-8661; E-mail: [email protected]; Web site: http://www.igi-global.com

Copyright © 2016 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher. Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.

Library of Congress Cataloging-in-Publication Data
Names: Mandal, Jyotsna Kumar, 1960- editor. | Mukhopadhyay, Somnath, 1983- editor. | Pal, Tandra, 1965- editor.
Title: Handbook of research on natural computing for optimization problems / Jyotsna Kumar Mandal, Somnath Mukhopadhyay, and Tandra Pal, editors.
Description: Hershey, PA : Information Science Reference, 2016. | Includes bibliographical references and index.
Identifiers: LCCN 2016002268 | ISBN 9781522500582 (hardcover) | ISBN 9781522500599 (ebook)
Subjects: LCSH: Natural computation--Handbooks, manuals, etc.
Classification: LCC QA76.9.N37 H364 2016 | DDC 006.3/8--dc23
LC record available at http://lccn.loc.gov/2016002268

This book is published in the IGI Global book series Advances in Computational Intelligence and Robotics (ACIR) (ISSN: 2327-0411; eISSN: 2327-042X).

British Cataloguing in Publication Data: A Cataloguing in Publication record for this book is available from the British Library.

All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the authors, but not necessarily of the publisher. For electronic access to this publication, please contact: [email protected].

Chapter 4
A System on Chip Development of Customizable GA Architecture for Real Parameter Optimization Problem

Sumitra Mukhopadhyay, University of Calcutta, India
Soumyadip Das, University of Calcutta, India

ABSTRACT

This chapter presents the design and development of a hardware-based evolutionary-algorithm architecture for solving both unimodal and multimodal fixed-point real parameter optimization problems. A modular architecture is proposed to provide a trade-off between real-time performance and flexibility and to work as a resource-efficient reconfigurable device. The evolutionary algorithm used here is the genetic algorithm. A prototype of the algorithm has been implemented on a system-on-chip field programmable gate array. The notable feature of the architecture is its capability of optimizing a wide class of functions with minimal or no change in the synthesized hardware. The architecture has been tested with ten benchmark problems, and it has been observed that, across the different optimization problems, the synthesized target requires at most 5% of the logic slices, 2% of the available block RAMs, and 2% of the DSP48 blocks of a Xilinx Virtex IV (ML401, XC4VLX25) board.
DOI: 10.4018/978-1-5225-0058-2.ch004

INTRODUCTION

A large number of real-world problems, such as asset allocation, optimal resource utilization, and automated system design and operation, require decision making about the future evolution of a system under uncertainty. In a stochastic environment, the decision-making problem involves multiple sub-problems such as system identification, state estimation, and generation of optimal control. Many of these modules require the formulation of mathematical models of the process to be controlled and rely on several optimization algorithms. Mathematical optimization is also applied to find optimal solutions in various fields of engineering and technology, such as civil, mechanical, and chemical engineering, electronic design automation, VLSI, control, machine learning, and signal processing. An optimization problem is generally formulated by representing the different situations of the real-world problem in mathematical terms. The possible solutions are represented by the decision variables; the limits of the decision variables indicate the range of the solution search space. An objective function, usually a function of the decision variables, is defined, and its value is to be optimized (minimized or maximized) to obtain optimum performance of the system while satisfying the defined constraints. Unconstrained problems are formulated and optimized without any such constraints.
The standard form of an optimization problem can be defined as follows:

    min f0(x)                            (1)
    subject to fi(x) ≤ 0, i = 1, ..., m
               hi(x) = 0, i = 1, ..., p

Here the function f0: R^n → R is called the cost function or the objective function; the functions fi: R^n → R, i = 1, ..., m, are the inequality constraint functions; the functions hi: R^n → R, i = 1, ..., p, are the equality constraint functions; and x = (x1, ..., xn) is a vector called the optimization variable of the problem. A vector z is said to be feasible if it satisfies all the inequality and equality constraints: f1(z) ≤ 0, ..., fm(z) ≤ 0 and h1(z) = 0, ..., hp(z) = 0. A particular vector x* is called an optimal solution if it has the optimal objective value among all feasible vectors, that is, f0(z) ≥ f0(x*) for every feasible z. An optimization problem is said to be unconstrained if m = p = 0.

The task of solving a problem is to find an acceptable solution, if not the best one. The search for a better solution is carried out among all feasible solutions. The fitness values of the feasible solutions constitute the search space, and we search for the minimum or maximum point in this space that represents the best achievable solution to the problem. Various algorithms (Back, 1996; Holland, 1975; Goldberg, 1989) have been developed for solving different classes of optimization problems (Boyd & Vandenberghe, 2004). The difficulty of solving a problem depends on the forms and numbers of its constraints, objective functions, and variables. Optimization algorithms can be deterministic or stochastic. Many problems require huge computational effort to find acceptable solutions, and as the problem size increases, the solution processes often fail to converge. Bio-inspired stochastic optimization algorithms have been developed as effective alternatives to deterministic methods in such situations.
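The standard form above can be made concrete with a small software sketch. The following Python fragment is purely illustrative (the objective and constraints are hypothetical examples, not problems from this chapter); it checks feasibility of candidate vectors and picks the best feasible one:

```python
# Illustrative sketch of the standard optimization form:
# minimize f0(x) subject to fi(x) <= 0 and hi(x) = 0.
# The concrete functions below are hypothetical examples.

def f0(x):                            # objective (cost) function
    return x[0] ** 2 + x[1] ** 2

ineq = [lambda x: x[0] + x[1] - 2]    # f1(x) <= 0
eq = [lambda x: x[0] - x[1]]          # h1(x) = 0

def is_feasible(x, tol=1e-9):
    """A vector is feasible if it satisfies every constraint."""
    return (all(f(x) <= tol for f in ineq)
            and all(abs(h(x)) <= tol for h in eq))

# x* is optimal if f0(z) >= f0(x*) for every feasible z.
candidates = [(0.5, 0.5), (1.0, 1.0), (2.0, 2.0)]
feasible = [x for x in candidates if is_feasible(x)]
best = min(feasible, key=f0)
```

Here (0.5, 0.5) and (1.0, 1.0) satisfy both constraints, while (2.0, 2.0) violates the inequality, so the minimizer among the candidates is (0.5, 0.5).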
Evolutionary algorithms (EAs) optimize a problem by iteratively searching for better solutions using natural phenomena such as growth, development, reproduction, selection, and survival of the fittest. They include the genetic algorithm (GA), differential evolution (DE), and so on. The genetic algorithm, proposed by Holland (1975), is a robust stochastic algorithm for finding optimized solutions. GA is applied across many disciplines to find acceptable solutions to complex problems in a finite amount of time, but its computational complexity and iteration time grow as problems become more complex. Software implementations of GA for computationally intensive problems therefore suffer from long update delays. If dedicated hardware units are built for the GA steps, the parallelism inherent in hardware can be exploited to speed up the process. Overheads of software execution, such as decoding and interpretation of software commands, are also avoided, which further speeds up the algorithm. So, if each GA step is mapped into dedicated hardware, it takes much less time than executing the GA in software. For this reason, considerable research effort has been directed toward hardware implementation of the genetic algorithm, as it gains speed over software implementations for real-time optimization requirements. Graham and Nelson (1996), Koonar (2003, 2005), Scott (1994), Tommiska and Vuori (1996), Shackleford (2001), Aporntewan and Chongstitvatana (2001), Tang and Leslie (2004), Vavouras, Papadimitriou, and Papaefstathiou (2009, July), Chen et al. (2008), Fernando et al. (2010), Kok et al. (2013), Nambiar, Balakrishnan, Khalil-Hani, and Marsono (2013), Ashraf and DeMara (2013), and a number of other researchers have reported hardware implementations of GA. Most of the works in this field have targeted the development of fast GA hardware that outperforms software GAs in speed.
However, many researchers now target faster optimization with minimal hardware resources. Application-specific GA hardware that works well for particular problems has also been implemented (Kok et al., 2013). Existing GA hardware implementations require the hardware to be re-synthesized to accommodate new optimization problems. GA can also be implemented by first writing the code in software and then extracting the hardware configuration code using a CAD tool, but in such implementations design customization, such as parallelism and pipelining, is limited by the tool. Here, GA is implemented using Verilog HDL only, without the assistance of any other code-generator tool, which reduces the overheads involved in implementing the modules on a single chip. This research work tries to achieve the following targets:

• A System-on-Chip (SoC) FSM-based fixed-point genetic algorithm (GA) architecture is proposed and implemented on a single FPGA board (Xilinx Virtex IV, XC4VLX25) for function optimization without any modification of the standard benchmark problems available in the literature. (In the testing phase, most reported works modify the target problems according to implementation requirements, for example by simplifying them or by reducing the search range or its resolution, and thus the complexity of the target problem is reduced.)
• To examine the maximum flexibility that can be achieved in a single architecture in terms of genetic parameters such as the word lengths of the population members and of their fitness values, the population size, and the number of generations.
• The GA architecture was slightly modified and integrated into a Hardware-in-the-Loop (HIL) testing environment to develop a flexible GA-based optimization system that requires no re-synthesis to solve different optimization problems.
Thus a universal system has been designed that helps to find an optimal solution for a large number of optimization problems. It has been successfully tested on 10 standard benchmark problems. The performance metrics used to evaluate the proposed GA-based architecture are the number of function calls (NFCs), the success rate, and the success performance, which are detailed in the results section of the chapter. Table 1 lists the abbreviations used and their full forms. A brief description of the genetic algorithm is presented in the following section.

Table 1. Abbreviations and their full forms

AHDL: Altera Hardware Description Language
ASIC: Application Specific Integrated Circuit
CAD: Computer Aided Design
CCU: Central Control Unit
CM: Crossover Module
DE: Differential Evolution
DSP: Digital Signal Processing
EA: Evolutionary Algorithm
ES: Evolutionary Strategy
FEM: Fitness Evaluation Module
FIL: FPGA-in-Loop
FMem: Fitness Memory
FPCB: Field Programmable Circuit Board
FPGA: Field Programmable Gate Array
FSM: Finite State Machine
GA: Genetic Algorithm
GI: Guard Interval
GP: Genetic Programming
HDL: Hardware Description Language
HIL: Hardware-in-Loop
IP: Intellectual Property
IPGM: Initial Population Generation Module
LCA: Linear Cellular Automata
LCM: Load Constraint Module
LFSR: Linear Feedback Shift Register
MM: Mutation Module
NFC: Number of Function Calls
Np: Population Size
OSFVMem: Offspring Solution Fitness Value Memory
OSMem: Offspring Solution Memory
PCI: Peripheral Component Interconnect
PRNG: Pseudo Random Number Generator
PSFVMem: Parent Solution Fitness Value Memory
PSM: Parent Selection Module
PSMem: Parent Solution Memory
PSSM: Prospective Solution Selection Module
RAM: Random Access Memory
RNG: Random Number Generator
SMem: Solution Memory
SoC: System on Chip
SPGA: Splash-2 Parallel GA
TSMC: Taiwan Semiconductor Manufacturing Company
TSP: Travelling Salesman Problem
UAV: Unmanned Aerial Vehicle
VHDL: Very High Speed Integrated Circuits Hardware Description Language
VLSI: Very Large Scale Integration

GENETIC ALGORITHM

GA is based on the natural principle of "survival of the fittest". It consists of a number of steps: initial population generation, fitness evaluation, parent selection, crossover, and mutation. Figure 1 illustrates the block diagram of the genetic algorithm cycle. In a genetic algorithm, solutions are generated iteratively with the expectation that successive iterations will produce better solutions to the problem.

INITIAL POPULATION GENERATION

The search process starts with the generation of a set of solutions, called chromosomes, chosen arbitrarily within the domain of the problem. This set is called the initial population of solutions, and this step is called initial population generation. Each chromosome represents some characteristics of a solution to the problem. A chromosome may consist of a string of binary digits in which each bit or group of bits encodes some characteristic of the solution. Usually this is done using random number generators that generate solutions encoded as binary strings within the domain of the problem to be optimized. The number of chromosomes generated is determined by the population size (Np), which is the number of candidate solutions present in each generation.

Figure 1.
Block diagram of the genetic algorithm cycle

FITNESS EVALUATION

In fitness evaluation, the characteristics of the solutions encoded in the chromosomes are used to evaluate the fitness values of the chromosomes using an appropriate objective function that represents the problem to be optimized. Since the objective function changes from one problem to another, the fitness evaluation step must be modified accordingly.

PARENT SELECTION

In this step, the chromosomes having promising fitness values are selected to form a set called the mating pool, on which subsequent genetic operations create new solutions. This may be done by processes such as roulette wheel selection or tournament selection. The GA architecture proposed in this chapter uses tournament selection. In tournament selection, two solutions are arbitrarily selected, their fitness values are compared, and the one with the better fitness value is selected.

GENETIC OPERATION

The genetic operation consists of two steps, crossover and mutation. The crossover operation mimics the genetic crossover process that occurs during the formation of a new individual, called an offspring, in living beings. During crossover, two chromosomes are selected from the mating pool. The characters, or genes, of the chromosomes, encoded as bits, are exchanged and recombined between the parents to give rise to new chromosomes, called offspring, representing new solutions to the problem. The exchange of bits may occur at one or more points in the parent chromosomes, and such processes are accordingly named single-point or multiple-point crossover. Figures 2(a) and 2(b) illustrate single-point and two-point crossover. The crossover points are selected randomly. The mutation operation mimics genetic mutation in living beings.
During mutation, one or more characters or genetic traits of a chromosome are abruptly altered due to some external cause, giving rise to an offspring with different traits. This biological process is mimicked in the genetic algorithm by the mutation operation. The operation is used to prevent the algorithm from getting stuck in a local minimum of a function, as shown in Figure 3. The fitness values of the new chromosomes produced by the genetic operations are then evaluated. The best solutions are selected to form a new set of parents, which participate in parent selection to form the mating pool for the next iteration of the genetic operations. Successive generations thus consist of improved solutions. The genetic algorithm described above is the basis of the EA hardware proposed in this chapter.

Figure 2. (a) Single-point crossover and (b) two-point crossover

Figure 3. Mutation operation

Pseudo Code for Genetic Algorithm (GA)

    Generate random solutions in problem domain
    Write into Parent Memory
    Loop
        Parent selection
        Mating Pool formation with selected parents
        Write selected parents into offspring memory
        Crossover
        Mutation
        Evaluate new solutions
        Update parent memory
    End-loop

The following section reviews previously reported work on hardware implementations of GA.

BACKGROUND AND LITERATURE REVIEW OF RELATED WORKS

In recent years, hardware implementation of EAs has been gaining popularity, and GA has been implemented in hardware a number of times. Brief descriptions of those efforts are given here, with special emphasis on works reporting FPGA implementations of general-purpose GA for solving optimization problems. The detailed literature survey reveals, however, that implementations of generalized GA architectures for function optimization are limited. Some researchers have targeted and tested their design approaches with a very limited number of functions.
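For reference, the pseudo code given in the previous section maps directly onto a compact software model. The sketch below is an illustrative Python analogue (the chapter's actual design is a Verilog HDL finite state machine); the one-max-style placeholder objective and all parameter values are assumptions for demonstration only:

```python
import random

def run_ga(fitness, n_bits=16, pop_size=8, generations=30,
           crossover_rate=0.9, mutation_rate=0.05, seed=7):
    """Minimal generational GA following the pseudo code: select
    parents, crossover, mutate, evaluate, update parent memory."""
    rng = random.Random(seed)
    # Generate random solutions in the problem domain (parent memory)
    parents = [[rng.randint(0, 1) for _ in range(n_bits)]
               for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            # Parent selection by binary tournament (better = lower fitness)
            p1 = min(rng.sample(parents, 2), key=fitness)
            p2 = min(rng.sample(parents, 2), key=fitness)
            # Single-point crossover with a fixed probability
            if rng.random() < crossover_rate:
                pt = rng.randrange(1, n_bits)
                c1, c2 = p1[:pt] + p2[pt:], p2[:pt] + p1[pt:]
            else:
                c1, c2 = p1[:], p2[:]
            # Mutation: flip each bit with a small probability
            for c in (c1, c2):
                for i in range(n_bits):
                    if rng.random() < mutation_rate:
                        c[i] ^= 1
                offspring.append(c)
        # Evaluate new solutions and update parent memory (elitist merge)
        parents = sorted(parents + offspring, key=fitness)[:pop_size]
    return min(parents, key=fitness)

# Placeholder objective: minimize the number of ones in the bit string
best = run_ga(fitness=sum)
```

The elitist merge in the last step plays the role of the "update parent memory" line of the pseudo code: the best of the old parents and the new offspring survive into the next generation.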
Moreover, those functions were customized to the design's needs, and the domain of the solution space considered was essentially integer. Standard benchmark examples from the literature, in their full complexity, were not considered in most of the cases surveyed in our analysis.

One of the earliest hardware implementations of a general-purpose GA was by Scott in 1994 (Scott, 1994). At that time, researchers were mainly motivated to investigate the hardware implementation issues of GA in order to exploit hardware's speed advantage over software-based GAs (SGAs) and the inherent parallelism of GA. Multiple FPGAs were used to implement a general-purpose Hardware Genetic Algorithm (HGA) on a reconfigurable BORG prototyping board containing five Xilinx XC4000-series FPGAs. In the experiments, roulette wheel selection and single-point crossover were employed with a population size of 16, a member width of 3 bits, and a fitness value width of 4 bits. The average execution times of the HGA and the SGA were compared in terms of the number of clock cycles required for the same number of generations while solving six different test functions; the number of clock cycles was used as a technology-independent performance metric. The HGA exhibited speedups of 1 to 3 orders of magnitude, and it was suggested that further hardware improvements could be achieved by parallelization, concurrent memory access, etc. The prime target was to study the hardware implementation of the basic GA and its speedup over software, rather than efficient and compact use of hardware resources or extension of the GA with other genetic operators, selection methods, or encodings. Vavouras, Papadimitriou, and Papaefstathiou (2009) also reported a GA-based hardware implementation on FPGA: an HGA implemented on an XUPV2P platform containing an embedded PowerPC.
The authors claimed to improve on the HGA implemented by Scott in 1994, described above. The implementation was tested using linear, quadratic, and cubic test functions and was compared with some other hardware implementations on the basis of the hardware execution time required to solve the test functions. In another work, Tang and Leslie (2004) reported a PCI-based hardware GA implementation using two SRAM-based Altera FPGAs. One FPGA (FLEX 6000) was used for the bus interface, the control unit, and the genetic operators; the other (FLEX 10K) was used as the fitness evaluator by implementing the objective function. Programmable genetic operators were implemented using parallel and pipelined FPGA architectures supporting one-point/four-point/uniform crossover and one-point/four-point/uniform/Gaussian mutation. Roulette wheel selection was used, and a modified version of Witte and Holst's Strait Equation was used as the test problem. In other work, in the late nineties, hardware GA architectures were designed for solving constrained optimization problems. Graham and Nelson (1996) used the Splash-2 Parallel GA (SPGA) for optimizing the travelling salesman problem (TSP); each SPGA processor consisted of four Xilinx 4010 FPGAs and memories. Gradually, interest in hardware EA implementation grew, and process speedup and optimal resource utilization became key parameters. In one such work, Tommiska and Vuori (1996) addressed those issues, motivated by the inherent speed advantage of hardware GA implementations for real-time applications such as optimized telecommunication network routing. The platform comprised a white-noise generator, an A/D converter, and two Altera FLEX 10K FPGAs interconnected by Peripheral Component Interconnect (PCI) cards. This was connected to a host Pentium-based computer via high-speed PCI bus slots, through which the FPGAs were configured.
The architecture was implemented in a four-stage pipelined fashion using Altera Hardware Description Language (AHDL) code. The population was stored in RAM in the embedded array blocks, and the fitness function resided in the logic array blocks of the FPGA. Each population consisted of 32 members, each 32 bits wide. Single-point crossover was performed with a probability of 100% and a mutation probability of 3.1%. Comparison of unsigned 32-bit binary numbers was used as the test problem, and round-robin selection was employed. The hardware exhibited a 212-times speed gain over the same algorithm programmed in C running on a 120 MHz Pentium-based Linux system. A different fitness function can be employed by reprogramming the FPGA using AHDL; thus, this GA architecture is not general in the true sense of the term, as it requires reprogramming and re-synthesis for different fitness functions. As interest in hardware implementation increased, the complexity of the development procedure also grew. One response was the survival-based steady-state GA proposed by Shackleford (2001), which used six FPGAs on an Aptix AXB-MP3 Field Programmable Circuit Board (FPCB); the implementation was not SoC. Another aspect of GA architecture development surfaced with the proposal of the compact GA. Aporntewan and Chongstitvatana (2001) described the implementation of a compact GA using Verilog HDL, evaluated on the one-max problem. Theoretically, however, the compact GA cannot fully replace the simple GA for all classes of problems, as it only simulates the order-one behavior of the simple GA using binary tournament selection and uniform crossover. Convergence is ensured for problems consisting of tightly coded, non-overlapping building blocks, and such problems are rarely found in real-world applications.
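The order-one behaviour mentioned above is what the compact GA models explicitly: a probability vector replaces the population and is nudged after each binary tournament. A textbook-style Python sketch of the idea (not Aporntewan and Chongstitvatana's Verilog implementation; all parameter values are illustrative assumptions):

```python
import random

def compact_ga(fitness, n_bits=16, virtual_pop=50, steps=2000, seed=3):
    """Compact GA: a probability vector replaces the population.
    Each step samples two individuals, holds a binary tournament, and
    nudges the probabilities toward the winner by 1/virtual_pop."""
    rng = random.Random(seed)
    p = [0.5] * n_bits
    sample = lambda: [1 if rng.random() < pi else 0 for pi in p]
    for _ in range(steps):
        a, b = sample(), sample()
        # Maximization: the individual with the larger fitness wins
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        for i in range(n_bits):
            if winner[i] != loser[i]:
                p[i] += (1 / virtual_pop) if winner[i] else -(1 / virtual_pop)
                p[i] = min(1.0, max(0.0, p[i]))
    return [round(pi) for pi in p]

# One-max: maximize the number of ones in the bit string
solution = compact_ga(fitness=sum)
```

Because only the probability vector is stored, the memory footprint is tiny compared with a full population, which is what made the approach attractive for hardware.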
The authors also did not claim that the compact GA hardware performed better than other hardware GAs, as the performance evaluation was different (Aporntewan & Chongstitvatana, 2001).

Circuit partitioning in VLSI is an important application field for optimization algorithms. Several software EA algorithms have been proposed, and customized boards have been designed based on them (Manikas & Cain, 1996; Bui & Moon, 1998; Koonar, 2003; Areibi, Moussa, & Koonar, 2005). One such example is the design by Koonar (2003), who used the Very High Speed Integrated Circuits Hardware Description Language (VHDL) to develop an application-specific GA architecture for circuit partitioning in VLSI; the architecture consists of three memories and six modules. A new avenue of GA hardware implementation opened with the development of intellectual property (IP) cores for the GA. Some examples are described here to establish the concept and significance of the scheme. Chen et al. (2008) described a flexible very-large-scale-integration intellectual property (IP) core for the GA. Using C++ programming, a software application called Smart GA was built that imparts flexibility to the design. Using the Smart GA software, GA parameters such as population size, individual length, fitness function, crossover operation, and mutation rate can be fixed for a particular GA implementation to solve a specific problem, and the corresponding Verilog HDL code is then generated. To perform the fitness calculation for different fitness functions, either a lookup-table or a user-defined approach was used. In the former method, a fixed number (2^16) of 16-bit-wide fitness values of 16-bit-wide individuals can be generated and stored in a fixed-size LUT. In the latter method, the user must know Verilog coding for the different applications.
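The lookup-table method just described can be pictured in software: every possible 16-bit individual indexes a pre-computed fitness memory, so evaluation reduces to a single memory read. A Python sketch with a hypothetical placeholder objective (Chen et al. realize this in hardware):

```python
# Lookup-table fitness evaluation: pre-compute the fitness of every
# possible individual once, then evaluate by indexing.  With 16-bit
# individuals the table has 2**16 entries, as in the text.
N_BITS = 16

def objective(x):
    """Hypothetical placeholder objective on the decoded integer."""
    return (x - 1000) % 65536

fitness_lut = [objective(i) for i in range(2 ** N_BITS)]

def evaluate(individual_bits):
    """Decode the bit string to an index and read the stored fitness."""
    index = int("".join(map(str, individual_bits)), 2)
    return fitness_lut[index]

bits = [0] * 15 + [1]        # the individual encoding the integer 1
value = evaluate(bits)
```

The trade-off is the one the chapter points out: evaluation becomes a constant-time memory access, but the table must be pre-calculated and stored, and it only fits for small word lengths.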
The IP core generates GA hardware with a range of population sizes (8-16384), fitness lengths (8-1024), and individual lengths (8-1024). Randomization was done using the linear cellular automata (LCA) method, and the tournament selection process was employed. The crossover operator can be chosen from uniform, single-point, two-point, and cross-point crossover, and mutation is also user-defined. A chip was developed using Taiwan Semiconductor Manufacturing Company's (TSMC) 0.18-μm cell library. Three test cases were used, namely a 1-D trigonometric function, the 2-D Shubert function, and, in a digital audio broadcasting system, determination of the optimized guard interval (GI) length under a bit error rate (BER) performance specification. The generated architectures found optimal values within 150 µs, 315 µs, and 0.167 ms, respectively. The disadvantage is that this implementation requires the user to rewrite and re-synthesize the hardware for every change in the GA parameters or the fitness function, so once the ASIC implementation is done, it can be used only for a specific problem. Also, to find the best GA parameter settings for a particular fitness function, the user must re-synthesize the GA netlist repeatedly. Fernando et al. (2010) presented a customizable FPGA IP core implementation of a general-purpose GA engine. The GA IP core supports up to eight different fitness functions, and the core has I/O ports to accommodate a new fitness function; a new function has to be implemented on another device connected to those I/O ports for evaluation. A 16-bit cellular-automata-based PRNG is used for initial population generation and other randomization. A proportionate selection scheme is used for parent selection, along with single-point crossover and mutation. GA parameters such as the population size, the number of generations, and the crossover and mutation rates are user-programmable for the IP core. The core supports chromosome lengths up to 16 bits.
For larger lengths, the netlist must be re-synthesized and re-simulated to verify functionality and timing, which shows that the system is flexible only within limits. The FPGA implementation was tested on three test functions using a lookup-based fitness evaluation module. Modified, binary, and scaled versions of the functions were used for easy hardware implementation. The possible fitness values are stored in a lookup table, so the values must be pre-calculated and coded into the memory. As an improvement in application-specific GA hardware, Kok et al. (2013) described a SoC FPGA implementation of a modified GA-based architecture for path planning of unmanned aerial vehicles (UAVs). Here, all the functionalities of the GA-based path planner are implemented on a single FPGA chip; according to the authors, this was the first SoC implementation of a GA-based path planner. However, the design is not balanced, and it aims only at solving the UAV path-planning problem, without a universal approach. Very recently, researchers have also been targeting optimization problems in the cognitive radio environment, and a few simulation studies exist in this regard. In the works of Rieser (2004) and Rondeau, Le, Rieser, and Bostian (2004, November), cognitive radio control systems are modeled using GA. Optimized spectrum sensing and allocation for cognitive radio have been proposed by Zhao, Peng, Zheng, and Shang (2009) and Deka, Chakraborty, and Roy (2012). Table 2 shows a parametric comparison between the implementation aspects of different hardware GAs and those of the present proposal. Most of the implementations discussed above suffer from one or more of the following disadvantages:

• Except for Kok et al. (2013), the algorithms were implemented on platforms containing more than one FPGA chip. These hardware architectures are therefore larger in size and can be used only in applications where compactness of the hardware is not a primary requirement; the inter-chip communication they require also costs them speed.
• Many of these architectures (Scott, 1994) are not portable and cannot be implemented on other platforms.
• Some of the implemented GAs, such as those in Graham and Nelson (1996), Koonar (2003), and Kok et al. (2013), are application specific.
• The general-purpose GAs implemented in Scott (1994), Tommiska and Vuori (1996), Aporntewan and Chongstitvatana (2001), Tang and Leslie (2004), Chen et al. (2008), and Fernando et al. (2010) lack flexibility: each supports only a single or a finite set of fitness functions. Using such hardware for optimization in other applications or for a different fitness function requires altering the basic hardware and re-synthesizing it.
• In all hardware-based systems developed to date, the architecture must be re-synthesized to enable it to optimize new problems.
• Hardware GA implementations such as Fernando et al. (2010) dealt with integers as the design variables and fitness values instead of real numbers. Although complex test functions were optimized, they were modified for hardware implementation; minimization functions were changed to maximization functions, customized problems were used, and the search space of the variables was integer.

This chapter presents a general-purpose function optimization architecture, which is a significant improvement over most existing works that use application-specific architectures. Our proposed GA hardware implementation has the following features, which serve as key factors in solving the problems mentioned above:

• The EA hardware described in this chapter is implemented entirely on a single FPGA chip.
This implementation can be utilized in applications that require compact, small-sized hardware.
• The fixed-point GA hardware is implemented in a finite state machine (FSM) based approach in which each FSM state corresponds to a basic module of the architecture. The execution of each individual module is governed by a central control unit (CCU). This approach imparts flexibility to the implementation.
• The implementation is tested on a Xilinx Virtex IV (ML401, XC4VLX25) FPGA chip. It can also be implemented on other reconfigurable hardware with minor modification of the configuration part.
• The implemented architecture is successfully tested with a number of different types of unimodal and multimodal problems, as suggested in the literature by Dieterich and Hartke (2012), Jamil and Yang (2013), and Suganthan et al. (2005). The tests produce satisfactory results.
• The functions were optimized without any modification, so the design variables and fitness values were real numbers instead of integer values. This is handled in hardware by the fixed-point GA implementation approach.

Table 2. Comparison of different hardware GA implementations: problems considered, variation of parameters, and resources

Scott (1994): Functions: x, x+5, 2x, x^2, 2x^3-45x^2+300x, x^3-15x^2+500; Population: fixed (16); Member width: 3 bits; Generations: fixed; Selection: roulette; Rates: fixed; Crossover: single point; Platform: BORG board.
Graham & Nelson (1996): Functions: travelling salesman problem; Population: 64/128/256; Selection: roulette; Platform: Splash-2 (Xilinx 4010 FPGAs).
Tommiska & Vuori (1996): Functions: comparison of 32-bit unsigned binary numbers; Population: fixed (32); Member width: 32 bits; Generations: fixed; Selection: round robin; Rates: fixed (100% crossover / 3.1% mutation); Crossover: single point; Platform: Altera FLEX 10K FPGAs on PCI cards.
Shackleford (2001): Functions: protein folding; Selection: survival; Platform: Aptix FPCB.
Aporntewan & Chongstitvatana (2001): Functions: one-max problem; Population: fixed (256); Selection: binary tournament (compact GA); Crossover: uniform (simulated); Platform: Xilinx Virtex 1000.
Koonar (2003): Functions: circuit partitioning in VLSI; Population: 20; Generations: 20/60/100; Selection: tournament; Platform: Virtex XCV50E.
Tang & Leslie (2004): Functions: modified Witte and Holst's Strait Equation |x1-a|+|x2-b|+|x3-c|; Population: programmable; Member width: programmable (32-bit); Generations: programmable; Selection: roulette; Rates: programmable; Crossover: single point, 4-point, uniform; Platform: Altera FPGAs on PCI card.
Chen et al. (2008): Functions: 1-D trigonometric function, 2-D Shubert function, DAB system GI optimization; Population: 8-16384; Member width: 8-1024; Generations: dynamic; Selection: tournament; Rates: dynamic; Crossover: uniform, single-point, two-point and cross-point; PRNG: linear cellular automata; Platform: TSMC 0.18-µm ASIC.
Fernando et al. (2010): Functions: BF6_2, mBF7_2(x,y), mShubert2D(x1,x2); Population: programmable; Member width: 16 bits; Generations: programmable; Selection: proportionate (roulette); Rates: programmable; Crossover: single point; PRNG: cellular automata; Platform: Xilinx Virtex-2 Pro FPGA.
Proposed: Functions: 4-D benchmark functions; Population: 8/16/32/64; Member width: 24 bits; Generations: programmable; Selection: tournament; Rates: fixed; Crossover: single point; PRNG: LFSR (seed provided through input); Platform: Xilinx Virtex 4 FPGA.
• The designed hardware was modified and an FPGA-in-the-Loop (FIL) environment was designed. Therefore, the system developed by integrating the hardware with FIL can be used to optimize any problem without re-synthesizing the basic hardware.

The next section highlights the GA hardware implementation issues and its execution flow.

GENETIC ALGORITHM BASED HARDWARE DEVELOPMENT AND ITS OPERATION

In this section, the architecture of the proposed EA based optimization system and its functioning are described. The architecture is inspired by the simple GA and is designed based on the genetic algorithm discussed above. The fixed point GA hardware is modeled in the form of a finite state machine (FSM), with each GA step corresponding to a state of the FSM. With this FSM based approach, the states are mapped into the hardware structure in a modular fashion. Each of the steps, namely initial population generation, fitness evaluation, parent selection, crossover, mutation, etc., is designed as a separate module and integrated into a sequential FSM to form the GA hardware.

BASIC HARDWARE STRUCTURE

A generalized external schematic of the proposed GA architecture is shown in Figure 4(a) and the detailed hardware representation is given in Figure 4(b). The inputs are CLOCK, RESET, ACTIVATE and SEED. The outputs are the BEST SOLUTION and the ERROR or FITNESS VALUE. The hardware consists of a central control unit (CCU), memory units and several modules named after their functions: the Load Constraint Module (LCM), Initial Population Generation Module (IPGM), Fitness Evaluation Module (FEM), Prospective Solution Selection Module (PSSM), Parent Selection Module (PSM), Crossover Module (CM) and Mutation Module (MM). The CCU controls the modules using handshaking signals, namely READY, DO and DONE, indicated in Figure 4(b) by arrows between the CCU and the modules. For the sake of clarity, only the READY, DO and DONE signals between the LCM and the CCU have been indicated in the figure.
Each module has an incoming signal from the CCU named DO; the two signals going from each module back to the CCU are named READY and DONE.

Figure 4. Schematic representation of GA architecture: (a) external view (b) detailed internal structure

Pseudo random number generators (PRNGs) are used to generate the random numbers that impart randomization to various steps of the GA, such as initial population generation and arbitrary crossover and mutation point selection. There are four data memory modules and one profile memory. Among the data memories, two modules are required for the storage of the parent and the offspring solutions, namely the Parent Solution Memory (PSMem) and the Offspring Solution Memory (OSMem). The two other data memories are required for storing the fitness values of the parent and the offspring solutions, and are named the Parent Solution Fitness Value Memory (PSFVMem) and the Offspring Solution Fitness Value Memory (OSFVMem) respectively. For better hardware realization, the PSMem and the OSMem are implemented as two logical parts of a single physical memory block called the Solution Memory (SMem), while the PSFVMem and the OSFVMem are implemented as two logical parts of another physical memory block named the Fitness Memory (FMem). The profile memory stores the configuration details of the architecture. Such a memory implementation has certain advantages, which will be detailed later. Since real numbers are used as design variables and fitness values, the hardware deals with binary fixed point numbers. A brief idea of the binary fixed point implementation is provided here.

FIXED POINT HARDWARE IMPLEMENTATION ISSUES

A real number consisting of an integer part and a fractional part is represented in the same way in binary, only with the base changed to 2. In hardware, there is no representation for the decimal point.
Instead, a number of bits is fixed for representing the fractional and the integer parts of a real number, together with a scaling factor. For example, if (1011)2 is the binary representation of a real number with scaling factor 2, the corresponding decimal number is 11/2 = 5.5, and the binary fraction point lies between the 0th and the 1st bits. Here we implement the architecture using this fixed point concept.

HARDWARE OPERATION

The synchronized operation of, and communication between, the different modules and the overall hardware operation are described here. The working of the architecture is represented in the flowchart of Figure 5.

• Load Constraint Module (LCM): The system initiates operation when the CCU receives the ACTIVATE input and issues the DO signal to the LCM. The LCM loads the profile memory with the value from the SEED input. As the LCM completes its tasks, it issues the DONE signal to the CCU, which then issues the DO signal to the IPGM.
• Initial Population Generation Module (IPGM): The IPGM uses a pseudo random generator to produce arbitrary solutions, which are loaded into the parent memory as the initial population of solutions. As the IPGM finishes, it sends the DONE signal to the CCU.
• Fitness Evaluation Module (FEM): The FEM evaluates the fitness of a chromosome depending on the problem specific fitness function, which is implemented in the FEM. After evaluating each member one by one, the FEM places its fitness value in the corresponding location of the Parent Solution Fitness Value Memory (PSFVMem). Thus the fitness value of the solution present in the first location of the Parent Solution Memory (PSMem) is mapped into the first location of the PSFVMem, and so on. The FEM issues the DONE signal after completion.
• Parent Selection Module (PSM): The CCU now initiates the PSM to create the mating pool for the genetic operations. The PSM randomly chooses two solutions from the PSMem and uses tournament selection to choose between them.
The PSM fetches the fitness values of the two randomly chosen solutions from the PSFVMem, compares them, and writes the better solution into the OSMem. Thus the mating pool is created in the OSMem for performing the genetic operations. After the mating pool is generated, the PSM issues the DONE signal.

Figure 5. Flow chart representing the working of the proposed genetic algorithm architecture to solve constrained optimization problems

• Crossover Module (CM): The CCU then activates the CM to perform the crossover operation on the mating pool formed in the Offspring Solution Memory (OSMem). Here, single point crossover is used. The CM selects the first two candidates of the mating pool and employs a PRNG to select a random crossover point for single point crossover, giving rise to two new offspring. The new solutions are written into the positions of the OSMem from which the mating pool candidates were chosen. This process is repeated with all the pairs of solutions in the mating pool. After the crossover operation, the CM issues the DONE signal.
• Mutation Module (MM): The CCU now initiates the MM. Depending on the mutation probability, the MM decides to mutate some candidates. The MM contains a PRNG that arbitrarily selects the bit of the candidate that is to be mutated. The selected bit of the candidate solution is inverted to mutate it. Now the offspring memory contains the new offspring solutions. The FEM is again initiated to evaluate the new candidates and write their fitness values into the OSFVMem.
• Prospective Solution Selection Module (PSSM): Now the Prospective Solution Selection Module (PSSM) is initiated. It employs a bubble sort (Cormen, Leiserson, Rivest, & Stein, 2003) operation on all the candidates of the parent and the offspring generations, based on their fitness values obtained from the PSFVMem and the OSFVMem.
The PSMem is then updated with the elite solutions arranged in the descending order of their fitness values. Their fitness values are also placed in the corresponding locations of the PSFVMem. Now the topmost locations of the PSMem and the PSFVMem contain the best solution and its fitness value obtained so far. The outputs BEST SOLUTION and ERROR of the GA architecture are read from these locations. This completes a single iteration of the GA. The above steps are repeated until the error is less than a specified threshold or the maximum number of iterations has been reached. To start the next iteration, the CCU again initiates the PSM. For different problems, only the FEM is required to be reprogrammed to implement the fitness function. The proposed architecture provides a general structure of a GA hardware implemented on a single FPGA chip.

FPGA BASED PROTOTYPE DEVELOPMENT OF GA BASED ARCHITECTURE

The proposed design is implemented in a Xilinx development environment on a Virtex-IV (ML401, XC4VLX25) FPGA kit using synthesizable Verilog HDL. The various FPGA platform specific implementation issues and constraints, such as the limited availability of programmable logic resources, are handled using the aforesaid platform.

• Finite State Machine Based Approach: The genetic algorithm is mapped onto the FPGA based hardware platform using a finite state machine (FSM) based approach in which each state corresponds to a particular operation of the GA. The state diagram is shown in Figure 6. The behavior of such an FSM based architecture can easily be modified by changing the order of execution of the modules, or by minor changes in a specific module whose activity needs to be changed, without altering the whole design. The hardware mapping of the different components is described in the following sections.
• Central Control Unit (CCU): The CCU controls the operation of the architecture by issuing control signals to each of the modules.
Whenever a module is required to function, the CCU sends a DO signal to engage it. After the module finishes its task, it sends a DONE signal back to the CCU. The CCU then issues the DO signal to the next module in sequence that is READY to perform.
• Encoding GA Parameters - Chromosome Structure: Here the target is to find the optimal solution in an n dimensional problem space. Each possible solution consists of a set of n values, one for each dimension. Solutions are encoded in the form of binary strings containing n values, each p bits long, with a scaling factor f. That is, each p-bit number has r fractional bits, where r = log2(f), and q integer bits, where q = p - r. If the chromosomes are M bits long, then M = p×n.

Figure 6. State diagram of the proposed GA architecture

In our hardware, M=24, p=6, q=4, r=2 and n=4. Thus the solutions are encoded in the form of binary strings of 6-bit signed fixed point real numbers with a scaling factor of four, i.e., the most significant bit is the sign bit, the next three bits represent the whole number and the last two bits represent the fractional value. To start with, the hardware is designed to solve four dimensional problems. Thus, in the present hardware, each binary chromosome string represents a set of four binary numbers; each chromosome is a 24(=6×4)-bit binary string in which each group of 6 bits represents one dimension of the four dimensional system. The chromosome structure in the GA hardware to solve a 4 dimensional problem is shown in Figure 7. If X1, X2, X3 and X4 together represent a possible solution to the 4-dimensional problem, bits 23 to 18 of the chromosome represent X1, bits 17 to 12 represent X2, bits 11 to 6 represent X3 and bits 5 to 0 represent X4. The structure of each of the four 6-bit binary numbers is enlarged in the figure.
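The 6-bit signed fixed point encoding described above can be illustrated with a minimal software sketch (this is our own illustrative model, not the Verilog implementation; two's-complement fields are assumed, and the function names are hypothetical):

```python
SCALE = 4          # scaling factor f = 4, i.e. r = 2 fractional bits
FIELD_BITS = 6     # p = 6 bits per dimension
FIELD_MASK = 0x3F  # low 6 bits

def encode6(x):
    """Encode a real value as a 6-bit two's-complement fixed point field."""
    return int(round(x * SCALE)) & FIELD_MASK

def decode6(bits):
    """Decode a 6-bit field back to a real value."""
    v = bits - 64 if bits >= 32 else bits   # sign-extend the 6-bit value
    return v / SCALE

def pack(xs):
    """Pack [X1, X2, X3, X4] into a 24-bit chromosome; X1 occupies bits 23-18."""
    c = 0
    for x in xs:
        c = (c << FIELD_BITS) | encode6(x)
    return c

def unpack(c):
    """Recover the four fixed point values from a 24-bit chromosome."""
    return [decode6((c >> s) & FIELD_MASK) for s in (18, 12, 6, 0)]
```

With f = 4 each field resolves steps of 0.25 over the range -8.00 to +7.75, so a value such as `[1.25, -2.5, 3.75, 0.0]` round-trips through `pack` and `unpack` exactly.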
• Parent and Offspring Memories: As discussed in the previous section, the GA hardware proposed in this chapter requires four memory modules, namely the Parent Solution Memory (PSMem), the Offspring Solution Memory (OSMem), the Parent Solution Fitness Value Memory (PSFVMem) and the Offspring Solution Fitness Value Memory (OSFVMem). The PSMem and the OSMem are implemented in a single memory called the Solution Memory (SMem); the PSFVMem and the OSFVMem are implemented in another single memory called the Fitness Memory (FMem). The two physical memories SMem and FMem are implemented using the dual port block RAMs available in the Virtex IV FPGA (XC4VLX25). Combining the parent and offspring memories into a single SMem has certain computational advantages when sorting out the elite chromosomes from both generations.

Figure 7. Chromosome structure in the GA hardware to solve 4-dimensional optimization problems

Sorting the solutions of the SMem in the descending order of their fitness values puts the solutions having the better fitness values in the upper half of the SMem. This upper half is defined as the PSMem. So the parent memory automatically gets updated with the elite chromosomes from both the parent and the offspring generations. The arbitrary set of solutions generated at the beginning of the GA process by the IPGM is stored in the PSMem, and likewise the parent generation chromosomes obtained after every iteration are updated in the PSMem. The mating pool generated by the PSM is stored in the OSMem. The genetic operations are performed on the contents of the OSMem by the CM and the MM to form the offspring generation. The fitness values of the parent and the offspring generation chromosomes are stored in the PSFVMem and the OSFVMem respectively.
A particular location of the PSFVMem or the OSFVMem contains the fitness value of the solution chromosome present in the same location of the PSMem or the OSMem respectively.
• Profile Memory: The profile memory contains the input seed and the lower and upper limits of the feasible region of the problem. The LCM loads these values into the profile memory.
• Pseudo Random Number Generators: Random number generators (RNGs) are very important components of the GA; they generate randomized sequences of numbers or bit strings. The architecture requires four RNGs for different purposes. The IPGM employs a random number generator to generate the initial population of parent generation chromosomes. It is a 24-bit pseudorandom generator module, generating sixteen 24-bit random binary numbers to form the initial parent generation. The PSM requires random numbers to arbitrarily pick candidates for tournament selection when generating the mating pool from the parent generation. A 4-bit PRNG is used by the PSM to generate an arbitrary address that is fed into the address input of the dual port Solution Memory (SMem) to pick a parent chromosome for the tournament selection that builds up the mating pool. The CM and the MM require random numbers to select the random bit of a chromosome at which the crossover or the mutation occurs. The PRNGs used by the CM and the MM generate 5-bit numbers; the arbitrary number determines the crossover point for a pair of chromosomes or the mutation bit in a 24-bit chromosome. True random numbers can be generated using non deterministic sources like clock jitter (Holleman, Bridges, Otis, & Diorio, 2008). Meysenburg and Foster (1997, 1999) showed that the quality of the random numbers has very little effect on the performance of the GA.
On the other hand, Cantú-Paz (2002) reported that GA performance depends on the quality of the random numbers used to generate the initial population of chromosomes, but not on the quality of the RNG used for the crossover and mutation operations. Methods for generating pseudorandom numbers were studied by Wolfram (1984) and Hortensius, McLeod, and Card (1989). There are two common systems for generating pseudo-random numbers:

• The linear feedback shift register (LFSR).
• The linear cellular automata (LCA).

Wolfram (1984) describes Rule 90 and Rule 150 for generating random numbers using an LCA, as given in the equations below:

• Rule 90: Si+ = Si−1 ⊕ Si+1
• Rule 150: Si+ = Si−1 ⊕ Si ⊕ Si+1

Here Si denotes the current state and Si+ the next state of the ith bit of an array. The hardware implementations of the rules are shown in Figure 8(a) and Figure 8(b) respectively. These rules are used to implement the pseudo random generators required in the different modules of the GA architecture. The seed used in a PRNG directly influences the generated sequence of numbers; in the present proposal, the PRNG seed can be given as an input to the hardware.

Figure 8. Hardware implementations of Rule 90 (a) and Rule 150 (b) for generating random numbers using LCA

• Load Constraint Module (LCM): The LCM is the initial state of the FSM based GA hardware; it initiates the operation and initializes all the states along with the memory modules.
• Initial Population Generation Module (IPGM): The IPGM is implemented as the second state of the FSM. The IPGM generates an initial set of arbitrary solutions with the help of the 24-bit PRNG and writes it into the parent memory. Since the memories are implemented using dual port block RAMs, at every clock two randomly generated chromosomes are written into the parent memory by the IPGM at two consecutive addresses.
Thus it requires half as many clock cycles as there are candidates in the parent memory.
• Fitness Evaluation Module (FEM): The FEM is effectively the hardware map of the fitness function of the problem at hand. It is the third state of the FSM. The FEM takes two clock cycles per candidate. In the first clock cycle, the FEM issues the address of the Solution Memory (SMem) location whose fitness is to be evaluated. In the second clock cycle, the content of that location is read, the fitness value is calculated, and the calculated value is written into the data-in register of the Fitness Memory (FMem); the address of the FMem location at which this value is to be written is also issued by the FEM. Finally, after the second clock cycle, the value is written into that location. The FEM issues the same address to both the SMem and the FMem so that the fitness value of the solution present in a location of the SMem gets written into the corresponding location of the FMem. This process continues for all candidates, so the FEM takes twice as many clock cycles as the number of candidates.
• Parent Selection Module (PSM): Some of the chromosomes from the parent memory are selected for the genetic operations; these selected chromosomes constitute the mating pool, and the PSM does this work. The PSM employs the tournament selection process to select the chromosomes. The PSM contains a PRNG for the random selection of two chromosomes from the first half of the SMem, that is, the PSMem, containing the parent generation. The PRNG generates two random numbers in the range 0 to 15. These numbers act as addresses into the first half of the SMem. They are fed into the address inputs of both the SMem and the FMem to fetch a pair of parent chromosomes and their corresponding fitness values. In the second clock cycle, the PSM compares the fitness values and selects the chromosome having the better fitness value.
Then it writes the chromosome into the second half of the SMem, i.e., the OSMem, the offspring memory. The PSM thus takes three clock cycles per candidate to populate the mating pool, until the OSMem is filled up.
• Prospective Solution Selection Module (PSSM): This module maps into hardware the principle of "survival of the fittest". The PSSM sorts the chromosomes in decreasing order of their fitness values and finally updates the parent generation with the fitter set of 16 chromosomes for the next iteration. The evaluation module is responsible for the fitness evaluation of the chromosomes of the offspring generation. After the genetic operations, the resultant offspring chromosomes and the parent chromosomes are re-evaluated and compared to find the best solutions. The fittest set of chromosomes replaces the chromosomes of the parent generation to form an updated parent generation consisting of solutions closer to minimizing the objective function. The finite state machine repeats the operations from the Parent Selection Module onward if the best fitness value obtained does not satisfy the maximum allowable error of the objective function. The PSSM employs the bubble sort algorithm, requiring three clock cycles for each comparison. For a series of m members, bubble sorting requires m(m-1)/2 comparisons, so the number of clock cycles required is 3m(m-1)/2 for m candidates, plus another clock cycle for the output update.
• Crossover Module (CM): The crossover module acts in the next state of the FSM based hardware implementation. This module maps the genetic crossover process into hardware, where genetic traits from both parent chromosomes are incorporated into the chromosome of the offspring.
In the current hardware implementation, in the first clock cycle the PRNG present in the CM generates a random number between 1 and 23 to select the random crossover point in the 24-bit long mating pool candidate, and a pair of chromosomes is fetched from the mating pool in the OSMem. In the second clock cycle, the portion to the right of the selected bit is swapped between the two chromosomes and the new chromosomes are written back into the OSMem. Thus two new chromosomes are obtained every two clock cycles, so in the present implementation the CM takes as many clock cycles as the number of candidates.
• Mutation Module (MM): This module corresponds to the genetic mutation process, in which a characteristic of an offspring changes abruptly from its parent. The MM operates in the seventh state of the FSM. In the first clock cycle of this state, a candidate solution from the offspring generation is fetched and a random number generator generates a random value between 1 and 24 to select a random bit for mutation. The selected bit of the candidate is logically inverted in the next clock cycle. Thus, in the current hardware implementation, the MM takes twice as many clock cycles as the number of candidates.

Evidently, for m candidates, the numbers of clock cycles required by the LCM, IPGM, FEM, PSSM, PSM, CM and MM are respectively 1, m/2, 2m, 3m(m-1)/2+1, 3m, m and 2m. From the flowchart of Figure 5 it can be seen that, in the initial iteration, the modules operate in the sequence LCM, IPGM, FEM, PSM, CM, MM, FEM, PSSM; in all subsequent iterations, the modules operate in the order PSM, CM, MM, FEM, PSSM, starting again from the PSM, and so on.
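The per-module cycle counts above can be tallied with a short sketch (an illustrative software model only; the function names are ours, not part of the hardware):

```python
def module_clocks(m):
    """Clock cycles consumed by each module for m candidates, as derived above."""
    return {
        "LCM": 1,
        "IPGM": m // 2,
        "FEM": 2 * m,
        "PSSM": 3 * m * (m - 1) // 2 + 1,
        "PSM": 3 * m,
        "CM": m,
        "MM": 2 * m,
    }

def clocks_initial(m):
    """Initial iteration: LCM, IPGM, FEM, PSM, CM, MM, FEM, PSSM."""
    c = module_clocks(m)
    return (c["LCM"] + c["IPGM"] + c["FEM"] + c["PSM"]
            + c["CM"] + c["MM"] + c["FEM"] + c["PSSM"])

def clocks_subsequent(m):
    """Each subsequent iteration: PSM, CM, MM, FEM, PSSM."""
    c = module_clocks(m)
    return c["PSM"] + c["CM"] + c["MM"] + c["FEM"] + c["PSSM"]
```

For m = 8 candidates this tally gives `clocks_initial(8)` = 170 and `clocks_subsequent(8)` = 149, i.e., (3m² + 18m + 4)/2 and (3m² + 13m + 2)/2 respectively.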
So, if we denote the number of clocks required for the initial cycle as nclki and the number of clocks required for each subsequent cycle as nclks, then they are given as

nclki = (3m² + 18m + 4)/2 (2)

nclks = (3m² + 13m + 2)/2 (3)

For different functions and problem implementations, the clock cycles consumed are given by the above expressions; however, as the operating frequency of the implemented prototype changes, the execution time also changes. This predictable cycle count is a notable feature of the proposed FSM based architecture. The benchmark problems are detailed in the following section.

BENCHMARK PROBLEMS

The effectiveness of different evolutionary algorithms in function optimization is determined using a large test set consisting of standard problems known as benchmark problems. The "no free lunch" theorem (Wolpert & Macready, 1997) proves that the average performance of any two search algorithms is the same when compared over all possible functions; no algorithm can be regarded as better than another at solving all possible functions. Thus a particular algorithm is usually suitable for solving a set of problems sharing some common characteristics. To form an evaluation test set for an algorithm, the problem set for which the algorithm is suitable needs to be characterized. Benchmark functions can be classified in terms of characteristics like modality, separability, dimensionality and scalability. These features are defined below:

• Dimensionality: The difficulty of a problem grows with its dimensionality. The size of the search space increases exponentially with an increase in the dimensionality, i.e., the number of parameters of the problem. To introduce the same order of difficulty, all the problems in the present chapter are chosen to have dimensionality D=4.
• Modality: The number of optima (minima or maxima) present in the search space defines the modality of a function. A function is multimodal if it has two or more local minima or maxima. For a multimodal function, the algorithm may get stuck in one of the local minima (or maxima) during the search process and thereby fail to find the global minimum (maximum). Thus the search process slows down, and it is difficult to find true optimal solutions for multimodal functions.
• Separability: Separability of a benchmark function is a measure of the difficulty of solving it using evolutionary algorithms; it concerns the interrelation among the function variables. The variables in a separable function are independent of each other, while they are interdependent in non-separable functions, and non-separable functions are more difficult to optimize than separable ones. A function with n variables is said to be separable if it can be expressed as a sum of n functions, each of a single variable (Hadley, 1964; Ortiz-Boyer, Hervás-Martínez, & García-Pedrajas, 2005). That is, a function f of n variables f(x1, x2, x3, …, xn) is separable if it can be expressed in terms of n functions f1, f2, f3, …, fn such that

f(x1, x2, x3, …, xn) = Σi=1..n fi(xi)

• Scalability: A function is said to be scalable if it can be expressed in n-dimensional form, where n is any integer; that is, its dimensionality can be changed. Otherwise, it is said to be non-scalable. Consider the following two functions f and g:

f(x1, x2, x3, …, xn) = Σi=1..n fi(xi)    g(x1, x2) = x1 + x2

where n may be 1, 2, 3, … or any integer. Evidently, function g has a fixed dimensionality of 2, whereas function f can have a dimensionality of 1, 2, 3, …, n depending on the value of n. Thus f can be scaled to different dimensional forms, while g has a fixed dimensionality.
So f is called a scalable function and g a non-scalable function.

The benchmark problems used in this chapter to evaluate the performance of the proposed EA based architecture are detailed below. Here, the dimension is given by D; the domain is denoted by Lb ≤ xi ≤ Ub, where Lb and Ub are the lower and upper bounds of the domain; X* denotes the value of the variables at the global minimum; and F(X*) = F(x1, …, xn) is the optimal solution.

• Sphere Function:

F1(x) = Σi=1..D xi²

where -5.12 ≤ xi ≤ 5.12; global minimum at X* = (0, …, 0); F(X*) = 0.

• Schwefel's Double Sum Function:

F2(x) = Σi=1..D (Σj=1..i xj)²

where -65 ≤ xi ≤ 65; global minimum at X* = (0, …, 0); F(X*) = 0.

• De Jong's Function 4 (No Noise):

F3(x) = Σi=1..D i·xi⁴

where -1.28 ≤ xi ≤ 1.28; global minimum at X* = (0, …, 0); F(X*) = 0.

• Powell Sum Function:

F4(x) = Σi=1..D |xi|^(i+1)

where -1 ≤ xi ≤ 1; global minimum at X* = (0, …, 0); F(X*) = 0.

• Rastrigin Function:

F5(x) = 10D + Σi=1..D (xi² − 10cos(2πxi))

where -5.12 ≤ xi ≤ 5.12; global minimum at X* = (0, …, 0); F(X*) = 0.

• Griewangk's Function:

F6(x) = Σi=1..D xi²/4000 − Πi=1..D cos(xi/√i) + 1

where -600 ≤ xi ≤ 600; global minimum at X* = (0, …, 0); F(X*) = 0.

• Ackley 1 Function (Ackley Path Function):

F7(x) = −20·exp(−0.2·√(D⁻¹ Σi=1..D xi²)) − exp(D⁻¹ Σi=1..D cos(2πxi)) + 20 + e

where -32 ≤ xi ≤ 32; global minimum at X* = (0, …, 0); F(X*) = 0.

• Cosine Mixture Function:

F8(x) = −0.1 Σi=1..D cos(5πxi) − Σi=1..D xi²

where -1 ≤ xi ≤ 1; global minimum at X* = (0, …, 0); F(X*) = 0.2 or 0.4 for D = 2 and 4 respectively.

• Csendes Function:

F9(x) = Σi=1..D xi⁶·(2 + sin(1/xi))

where -1 ≤ xi ≤ 1; global minimum at X* = (0, …, 0); F(X*) = 0.

• Solomon Function:

F10(x) = 1 − cos(2π√(Σi=1..D xi²)) + 0.1√(Σi=1..D xi²)

where -100 ≤ xi ≤ 100; global minimum at X* = (0, …, 0); F(X*) = 0.
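As a cross-check, a few of the above benchmarks are straightforward to express in software (a reference sketch of the floating point versions; the proposed hardware evaluates fixed point forms of these functions):

```python
import math

def sphere(x):
    """F1: sum of squares; unimodal, separable."""
    return sum(v * v for v in x)

def rastrigin(x):
    """F5: 10*D + sum(x^2 - 10*cos(2*pi*x)); highly multimodal."""
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)

def ackley(x):
    """F7: Ackley path function; multimodal, non-separable."""
    d = len(x)
    return (-20 * math.exp(-0.2 * math.sqrt(sum(v * v for v in x) / d))
            - math.exp(sum(math.cos(2 * math.pi * v) for v in x) / d)
            + 20 + math.e)
```

Each of these attains its global minimum of 0 at the origin, e.g. `sphere([0.0] * 4)` returns 0.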
The characteristics of the above benchmark functions are given in Table 3.

Table 3. Characteristics of the benchmark functions F1-F10

Function    Multimodal    Separable    Scalable
F1          No            Yes          Yes
F2          No            No           Yes
F3          No            No           Yes
F4          No            Yes          Yes
F5          Yes           Yes          Yes
F6          Yes           No           Yes
F7          Yes           No           Yes
F8          Yes           Yes          Yes
F9          Yes           Yes          Yes
F10         Yes           No           Yes

EXPERIMENTAL STUDY

This section describes the FPGA hardware implementation results, functional verification and performance analysis of the proposed GA hardware.

Performance Metrics: The metrics used in this chapter to evaluate the performance of the proposed architecture are as follows:

• Number of Function Calls (NFCs): The convergence speed of the architecture is measured in terms of the number of function calls (NFCs), which has been used as a metric in the works of Suganthan et al. (2005), Andre, Siarry, and Dognon (2001), and Hrstka and Kučerová (2004). This gives the number of candidate solutions evaluated by the architecture before finding the best solution. The percentage of the solution space evaluated before finding the best solution gives a measure of the speed-up achieved over an exhaustive search.
• Success Rate: The percentage of runs in which the architecture finds a solution to the problem.
• Success Performance: The mean number of NFCs over successful runs, multiplied by the total number of runs and divided by the number of successful runs.

Parameter Settings: The different GA parameter settings for the conducted experiments are detailed below:

• The 4-dimensional versions of the functions were considered.
• The chromosome length for all the experiments was fixed at 24. The predicted values were considered to be signed; hence, for four variable functions, each variable ranges between -32 and 31.
• Population size can be varied among 8, 16, 32 and 64. Good results were obtained keeping it at the minimum value of 8.
So, throughout all the experiments, the population size was kept fixed at 8; that is, each population contains 8 candidate solutions.
• The maximum number of NFCs was fixed at 10^4.

Since the chromosome length was considered to be 24, the size of the solution space is 2^24 = 16,777,216. The design verification and functional analysis are presented in a few steps. First, the architecture was coded and simulated for functional verification. Then, it was synthesized and implemented on the target FPGA device (XC4VLX25) to analyze the performance of the GA architecture. During both simulation and FPGA implementation, the architecture was used to optimize a benchmark function (detailed below) to test its functional correctness. Finally, an FIL testing environment was set up to test the architecture with standard benchmark functions and study its performance. These experimental steps are detailed below in this section.

SIMULATION AND FUNCTIONAL VERIFICATION

The architecture was coded using the Verilog hardware description language and simulated using the Xilinx ISE 10.1i simulator. The parent and the offspring memories contain the solutions that are modified during each GA step such as selection, crossover, mutation, etc. To check for functional correctness during simulation, the memory contents generated after each step were written to text files. Thus the evolutionary change in the solutions through the generations of the GA can be observed by inspecting the files. The functioning of the designed architecture was also verified. During the simulations, the following fitness function was optimized to test the performance. The function was scaled to its 4-dimensional version and optimized using the GA architecture. The function G1 below is a minimization function:

$G_1(x) = -30 + \sum_{i=1}^{D} i x_i$

where, the minimum value is min(G1) = G1(0,..,0) = 0.
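For illustration only, the configuration described above (population size 8, a 24-bit chromosome decoded as four signed 6-bit variables in [-32, 31]) can be modeled as a small software GA. The operator choices below (truncation selection, one-point crossover, single-bit mutation) and the use of |G1(x)| as the minimized fitness are assumptions made for this sketch, not a description of the actual hardware operators:

```python
import random

def decode(chrom):
    """Split a 24-bit integer into four signed 6-bit variables in [-32, 31]."""
    xs = []
    for i in range(4):
        v = (chrom >> (6 * i)) & 0x3F
        xs.append(v - 64 if v >= 32 else v)   # two's-complement interpretation
    return xs

def fitness(chrom):
    # Assumed fitness: absolute deviation of G1 from zero (minimized).
    x = decode(chrom)
    return abs(-30 + sum((i + 1) * xi for i, xi in enumerate(x)))

def run_ga(seed, generations=50):
    rng = random.Random(seed)
    pop = [rng.getrandbits(24) for _ in range(8)]   # population size 8
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[:4]                       # truncation selection
        children = []
        while len(children) < 4:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, 24)          # one-point crossover
            mask = (1 << cut) - 1
            child = (a & mask) | (b & ~mask & 0xFFFFFF)
            child ^= 1 << rng.randrange(24)     # single-bit mutation
            children.append(child)
        pop = parents + children
    return min(fitness(c) for c in pop)
```

Because the four best chromosomes survive unchanged into the next generation, the best fitness in the population is non-increasing over generations.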
The 4-D version of G1 is

$G_1(x) = -30 + x_1 + 2x_2 + 3x_3 + 4x_4$

SYNTHESIS AND FPGA IMPLEMENTATION

The GA based architecture was synthesized by Xilinx XST using the Xilinx Virtex IV (XC4VLX25) as the target device on the ML401 platform. The synthesis result and device utilization summary are shown in Table 4. The synthesized design was downloaded into the FPGA device and its performance was verified using the FPGA-in-the-Loop (FIL) environment. Figure 9 shows the FIL based experimental setup. Here the FIL block denotes the Simulink block that represents the Xilinx Virtex IV (XC4VLX25) device. 'Seed' and 'Act' denote the SEED and ACTIVATE inputs to the architecture respectively; the best solution of a generation, consisting of a set of four values for a 4-dimensional problem, is obtained through the 'x1', 'x2', 'x3' and 'x4' outputs. A scope is used to plot the output values against time and workspace variables are used to record the outputs. The GA based architecture was implemented and used to optimize the above fitness function G1. The function G1 was implemented in the FEM. The results for six independent runs are shown in Table 5. Here, the best fitness values achieved for different input seeds are presented. The "Convergence Generation Number" column lists the number of generations required to achieve convergence for a given seed. Figure 10 illustrates the convergence graph of G1.

Table 4. Device utilization report for implementation of the proposed GA hardware in the Xilinx Virtex IV FPGA (XC4VLX25)

Component                          Available   Used   Utilization %
Slices                             10752       531    4%
Slice Flip Flops                   21504       445    2%
4-input LUTs                       21504       921    4%
Bonded IOBs                        448         90     20%
Dual-port Block Memory (RAMB16s)   72          2      2%
GCLKs                              32          1      3%
DSP48s                             48          1      2%

Figure 9. FIL simulation based experimental setup for the GA architecture

Figure 10.
Convergence plot for the test function G1

Table 5. The best fitness values achieved for different input seeds for function G1

Run Number   Seed   Fitness   Convergence Generation Number   NFCs   % of Solution Space Evaluated (16777216)
1            2      0         3                               24     0.00014
2            5      0         4                               32     0.00019
3            6      0         9                               72     0.00043
4            15     0         12                              96     0.00057
5            17     0         8                               64     0.00038
6            29     8         5                               40     0.00024

PERFORMANCE ANALYSIS OF THE PROPOSED GA STRUCTURE: OPTIMIZATION OF STANDARD BENCHMARK FUNCTIONS

An FIL environment using MATLAB Simulink and a slightly modified version of the GA architecture was set up to test the hardware performance on the benchmark functions. Here, the objective function is written directly as an input to the setup. The FEM of the hardware was modified: the FEM outputs the predicted solution, which is fed as an input to a Simulink block into which the fitness function is written. The fitness value of the solution is calculated inside this block. The output of the block is connected to the FEM, which stores the fitness value in the FVMem. Thus, the need for recoding and re-synthesis of the basic GA hardware for each function is eliminated. Figure 11 illustrates the Simulink model of the experimental setup. The FIL block has inputs named 'seed' and 'act', and outputs named 'x1', 'x2', 'x3', 'x4' and 'error'. The MATLAB function block is used to write the fitness function. The pred_x1, pred_x2, pred_x3 and pred_x4 outputs from the FIL block give the predicted solution to the MATLAB function block, and the calculated fitness value is fed into the architecture through the 'value_in' input. The test results for optimization of the benchmark functions using the proposed GA hardware are presented in Table 6 and Table 7.

Figure 11. Experimental setup for evaluation of the GA architecture using benchmark functions in the FIL environment
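As a sanity check, the metrics defined earlier and the "% of Solution Space Evaluated" column of Table 5 can be reproduced with a short script; the run data passed to the metric functions below are hypothetical examples:

```python
# 24-bit chromosomes give a solution space of 2**24 points; with a
# population of 8, NFCs = 8 * convergence generation number.
SPACE = 2 ** 24                      # 16,777,216

def percent_evaluated(nfcs):
    """Percentage of the solution space evaluated before convergence."""
    return 100.0 * nfcs / SPACE

def success_rate(runs):
    """Percentage of (nfc, succeeded) runs in which a solution was found."""
    return 100.0 * sum(1 for _, ok in runs if ok) / len(runs)

def success_performance(runs):
    """Mean NFCs of successful runs * total runs / successful runs."""
    wins = [nfc for nfc, ok in runs if ok]
    return (sum(wins) / len(wins)) * len(runs) / len(wins)

# Run 1 of Table 5: 3 generations * 8 = 24 NFCs
print(round(percent_evaluated(24), 5))   # → 0.00014
```

With hypothetical run data such as `[(64, True), (232, True), (384, False)]`, `success_rate` returns about 66.67 and `success_performance` returns 222.0 (the mean of the two successful NFC counts, scaled by 3/2).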
The NFCs are fixed as mentioned above and 40 independent runs were performed for each benchmark function. The best, median, worst, mean and standard deviation of the fitness values generated over these 40 runs are presented in Table 6 for the benchmark functions. In Table 7, the best, median, worst, mean and standard deviation of the number of function calls (NFCs) required to find the best fitness are presented.

Table 6. The best, median, worst, mean and standard deviation of the function error values generated over 40 runs for test functions F1-F10

Function   Best     Median    Worst     Mean       Standard Deviation
F1         0        0.0625    0.0625    0.0404     0.0308
F2         0        0.0625    0.125     0.0592     0.057
F3         0        0         0.1875    0.006696   0.029715
F4         0        0         0.0625    0.003125   0.013975
F5         0        0         1         0.125      0.3536
F6         0        0.0313    0.0625    0.0198     0.0209
F7         0        0         1.09375   0.328125   0.528331
F8         0.0313   0.0625    0.21875   0.051786   0.032987
F9         0        0         0         0          0
F10        0.0938   0.09375   0.1875    0.101351   0.025943

Table 7. The best, median, worst, mean and standard deviation of the number of function calls (NFCs) required to find the best fitness for the test functions F1-F10 over 40 test runs

Function   Best   Median   Worst   Mean       Standard Deviation   Success Rate   Success Performance   Average % of Total Solution Space Evaluated
F1         64     232      384     208.4706   113.7179             100%           208.4706              0.0012
F2         64     280      3096    551        912.8842             100%           551                   0.0033
F3         40     192      440     188.7619   99.52958             100%           188.7619              0.0007
F4         64     200      376     175.6      97.75501             100%           175.6                 0.0010
F5         64     72       120     75         18.6088              100%           75                    0.00044
F6         48     136      832     162.9333   143.9461             100%           162.9333              0.00097
F7         64     80       224     121.6      69.10732             100%           121.6                 0.00072
F8         72     176      384     173.9429   73.56068             100%           173.9429              0.00103
F9         48     100      272     113.8462   82.2502              100%           113.8462              0.00068
F10        80     544      3680    867.8919   833.8135             100%           867.8919              0.00517

The table also presents the success rate, success performance and average percentage of the total solution space evaluated before convergence, as these give a measure of the speedup over exhaustive search
processes. It can be seen that for every function, the GA hardware needs to evaluate only a very small fraction of the total solution space, which indicates a very high speedup over exhaustive search. Figure 12 shows the convergence graphs for functions F1 to F5 and Figure 13 shows the same for functions F6 to F10. The convergence plots for all the benchmark test functions indicate that the proposed GA hardware converges within only 8 to 10 generations.

Figure 12. Convergence plot for the test functions F1, F2, F3, F4 and F5

Figure 13. Convergence plot for the test functions F6, F7, F8, F9 and F10

COMPARISON WITH SOFTWARE IMPLEMENTATION AND PREVIOUS GENERAL PURPOSE HARDWARE GAs

In this section, the speeds of the implemented GA hardware and a GA software implementation are compared. In order to make the comparison fair, GA parameters such as population size, selection and crossover were set the same in both the MATLAB simulation and the hardware implementation. The software was run to solve the same benchmark functions. The time required for the GA architecture to process each GA population can be calculated as the product of the number of clock cycles required and the clock period (Fernando et al., 2010). From Equations 2 and 3, we know that, when the population size is 8, the total numbers of clock cycles required for the initial iteration (nclki) and subsequent iterations (nclks) are 170 and 149 respectively. If the number of iterations required for convergence is known, the hardware execution time for convergence of the test functions can be calculated. The maximum clock frequency of the prototype implementation, as obtained from the actual synthesis report, is 36.757 MHz, giving a clock period of 27.205 ns. The speedup of the hardware GA over the software GA is shown in Table 8.
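The timing and speedup calculations above can be checked numerically. The sketch below uses the constants quoted in the text (nclki = 170, nclks = 149, 36.757 MHz clock) and the execution times listed in Table 8:

```python
# Hardware time per run: time = (nclki + (G - 1) * nclks) * Tclk,
# where G is the number of generations to convergence.
NCLKI, NCLKS = 170, 149
PERIOD_NS = 1e9 / 36.757e6            # ~27.205 ns, as in the text

def hw_time_ms(generations):
    cycles = NCLKI + (generations - 1) * NCLKS
    return cycles * PERIOD_NS * 1e-6  # ns -> ms

# Example: F5 converged after a mean of 75 NFCs = 75/8 ~ 9.4 generations,
# which gives roughly 0.0386 ms, in line with the Table 8 entry.

# Per-function speedups in Table 8 are software time / hardware time;
# the "Proposed" entry of Table 9 is their mean.
sw = [55.347, 58.1338, 54.3252, 57.6582, 58.4622,
      55.1854, 54.0796, 54.3798, 56.4766, 56.0422]
hw = [0.106182406, 0.2797078, 0.096197979, 0.08953016, 0.0385662,
      0.08311321, 0.06217376, 0.088690673, 0.058245685, 0.440245237]
speedups = [s / h for s, h in zip(sw, hw)]
print(round(speedups[0], 2))                    # F1: 521.24
print(round(sum(speedups) / len(speedups), 2))  # ~669.76, i.e. the reported 669.75x up to rounding
```

The mean of the ten per-function speedups reproduces, up to rounding, the overall 669.75x figure reported for the proposed design.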
Figure 14 compares the average hardware and software execution times to converge for the functions F1 to F10. It can be observed that the hardware execution time is negligibly small in comparison to the software execution time. In Table 9, the reported speedups of different hardware GAs and that of the proposed hardware are compared. Here, the best speedups reported in the literature are tabulated. The average speedup of the proposed hardware in solving the functions F1 to F10 is used to compare it with the previous implementations. Figure 15 shows the comparison between the respective speedups. The proposed GA architecture exhibits a very high speedup compared to the speedups of previous implementations.

Table 8. Speed comparison of the proposed GA hardware with the software GA implementation

Function   Avg. Software GA Execution Time (msec)   Avg. Hardware GA Execution Time (msec)   Hardware GA Speed Up
F1         55.347    0.106182406   521.2445461
F2         58.1338   0.2797078     207.8376077
F3         54.3252   0.096197979   564.7228853
F4         57.6582   0.08953016    644.0086782
F5         58.4622   0.0385662     1515.892154
F6         55.1854   0.08311321    663.9786882
F7         54.0796   0.06217376    869.8138893
F8         54.3798   0.088690673   613.1400076
F9         56.4766   0.058245685   969.6271935
F10        56.0422   0.440245237   127.2976863

Figure 14. Comparison between the execution times of the hardware and software implementations of the GA for convergence of the functions F1 to F10

Figure 15. Comparison of the reported speedups of previous hardware GA implementations and the proposed GA hardware

CONCLUSION

This chapter presents a generalized prototype of evolutionary algorithm based hardware for solving real parameter optimization problems. The hardware has been tested on various types of benchmark problems. To

Table 9.
Speed comparison of the proposed GA hardware with previous hardware GA implementations

Work                       Speed Up
Scott (1994)               18.8x
Graham & Nelson (1996)     10.6x
Tommiska & Vuori (1996)    212x
Shackleford                160x
Koonar (2003)              100x
Tang & Leslie (2004)       10.68x
Fernando et al. (2010)     5.16x
Proposed                   669.75x

the best of the authors' knowledge, none of the evolutionary algorithm based hardware proposed to date has been successfully tested on so many standard benchmark problems. The hardware also does not need to be re-synthesized for optimizing different problems and thus provides a flexible platform for optimizing various kinds of problems. A prototype of the hardware has been developed using Verilog HDL. The prototype has been implemented on a single FPGA Xilinx Virtex IV (XC4VLX25) chip. A detailed performance comparison has been given with respect to the software implementation of this work and also other works reported in the literature. Thus we obtain a very fast, compact and flexible entity that can be used to meet optimization requirements in real-time applications such as spectrum sensing in a cognitive radio environment.

ACKNOWLEDGMENT

The work was undertaken as part of the Media Lab Asia project entitled "Mobile Broadband Service Support over Cognitive Radio Networks".

REFERENCES

Back, T. (1996). Evolutionary algorithms in theory and practice. Oxford University Press.

Holland, J. H. (1975). Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence (1st ed.). The University of Michigan Press.

Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning. Addison-Wesley.

Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.

Graham, P., & Nelson, B. (1996, April). Genetic algorithms in software and in hardware: A performance analysis of workstation and custom computing machine implementations. In FPGAs for Custom Computing Machines, 1996. Proceedings.
IEEE Symposium on (pp. 216-225). IEEE. doi:10.1109/FPGA.1996.564847

Koonar, G. K. (2003). A reconfigurable hardware implementation of genetic algorithms for VLSI CAD design. University of Guelph School of Engineering.

Areibi, S., Moussa, M., & Koonar, G. (2005). A genetic algorithm hardware accelerator for VLSI circuit partitioning. International Journal of Computers and Their Applications, 12(3), 163.

Scott, S. D., Samal, A., & Seth, S. (1995, February). HGA: A hardware-based genetic algorithm. In Proceedings of the 1995 ACM Third International Symposium on Field-Programmable Gate Arrays (pp. 53-59). ACM.

Tommiska, M., & Vuori, J. (1996, August). Hardware implementation of GA. In Proceedings of the Second Nordic Workshop on Genetic Algorithms and their Applications (2NWGA).

Shackleford, B., Okushi, E., Yasuda, M., Koizumi, H., Seo, K., Iwamoto, T., & Yasuura, H. (2001). High-performance hardware design and implementation of genetic algorithms. In Hardware implementation of intelligent systems (pp. 53-87). Physica-Verlag HD. doi:10.1007/978-3-7908-1816-1_2

Aporntewan, C., & Chongstitvatana, P. (2001, May). A hardware implementation of the compact genetic algorithm. In IEEE Congress on Evolutionary Computation (pp. 624-629). doi:10.1109/CEC.2001.934449

Unlt, G. P. (2004). Hardware implementation of genetic algorithms using FPGA. Academic Press.

Vavouras, M., Papadimitriou, K., & Papaefstathiou, I. (2009, July). High-speed FPGA-based implementations of a genetic algorithm. In Systems, Architectures, Modeling, and Simulation, 2009. SAMOS '09. International Symposium on (pp. 9-16). IEEE. doi:10.1109/ICSAMOS.2009.5289236

Chen, P. Y., Chen, R. D., Chang, Y. P., & Malki, H. A. (2008). Hardware implementation for a genetic algorithm. IEEE Transactions on Instrumentation and Measurement, 57(4), 699-705.

Fernando, P. R., Katkoori, S., Keymeulen, D., Zebulum, R., & Stoica, A. (2010).
Customizable FPGA IP core implementation of a general-purpose genetic algorithm engine. IEEE Transactions on Evolutionary Computation, 14(1), 133-149.

Kok, J., Gonzalez, L. F., & Kelson, N. (2013). FPGA implementation of an evolutionary algorithm for autonomous unmanned aerial vehicle on-board path planning. IEEE Transactions on Evolutionary Computation, 17(2), 272-281.

Nambiar, V. P., Balakrishnan, S., Khalil-Hani, M., & Marsono, M. N. (2013). HW/SW co-design of reconfigurable hardware-based genetic algorithm in FPGAs applicable to a variety of problems. Computing, 95(9), 863-896. doi:10.1007/s00607-013-0305-5

Ashraf, R., & DeMara, R. F. (2013). Scalable FPGA refurbishment using netlist-driven evolutionary algorithms. IEEE Transactions on Computers, 62(8), 1526-1541.

Manikas, T. W., & Cain, J. T. (1996). Genetic algorithms vs. simulated annealing: A comparison of approaches for solving the circuit partitioning problem. Academic Press.

Bui, T. N., & Moon, B. R. (1998). GRCA: A hybrid genetic algorithm for circuit ratio-cut partitioning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17(3), 193-204.

Rieser, C. J. (2004). Biologically inspired cognitive radio engine model utilizing distributed genetic algorithms for secure and robust wireless communications and networking (Doctoral dissertation). Virginia Polytechnic Institute and State University.

Rondeau, T. W., Le, B., Rieser, C. J., & Bostian, C. W. (2004, November). Cognitive radios with genetic algorithms: Intelligent control of software defined radios. In Software Defined Radio Forum Technical Conference (pp. C3-C8).

Zhao, Z., Peng, Z., Zheng, S., & Shang, J. (2009). Cognitive radio spectrum allocation using evolutionary algorithms. IEEE Transactions on Wireless Communications, 8(9), 4421-4425.

Deka, R., Chakraborty, S., & Roy, S. J. (2012).
Optimization of spectrum sensing in cognitive radio using genetic algorithm. Facta Universitatis, Series: Electronics and Energetics, 25(3), 235-243.

Dieterich, J. M., & Hartke, B. (2012). Empirical review of standard benchmark functions using evolutionary global optimization. arXiv preprint arXiv:1207.4318.

Jamil, M., & Yang, X. S. (2013). A literature survey of benchmark functions for global optimisation problems. International Journal of Mathematical Modelling and Numerical Optimisation, 4(2), 150-194. doi:10.1504/IJMMNO.2013.055204

Suganthan, P. N., Hansen, N., Liang, J. J., Deb, K., Chen, Y. P., Auger, A., & Tiwari, S. (2005). Problem definitions and evaluation criteria for the CEC 2005 special session on real-parameter optimization. KanGAL Report, 2005005.

Holleman, J., Bridges, S., Otis, B. P., & Diorio, C. (2008). A 3 μW CMOS true random number generator with adaptive floating-gate offset cancellation. IEEE Journal of Solid-State Circuits, 43(5), 1324-1336.

Meysenburg, M. M., Foster, J., Saghi, G., Dickinson, J., Jacobsen, R. T., & Shreeve, J. N. M. (1997). The effect of pseudo-random number generator quality on the performance of a simple genetic algorithm. Academic Press.

Meysenburg, M. M., & Foster, J. A. (1999). Randomness and GA performance, revisited. Academic Press.

Cantú-Paz, E. (2002, July). On random numbers and the performance of genetic algorithms. GECCO.

Wolfram, S. (1984). Universality and complexity in cellular automata. Physica D: Nonlinear Phenomena, 10(1), 1-35. doi:10.1016/0167-2789(84)90245-8

Hortensius, P. D., McLeod, R. D., & Card, H. C. (1989). Parallel random number generation for VLSI systems using cellular automata. IEEE Transactions on Computers, 38(10), 1466-1473.

Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67-82.

Hadley, G. (1964). Nonlinear and dynamic programming. Academic Press.
Ortiz-Boyer, D., Hervás-Martínez, C., & García-Pedrajas, N. (2005). CIXL2: A crossover operator for evolutionary algorithms based on population features. Journal of Artificial Intelligence Research, 24, 1-48.

Andre, J., Siarry, P., & Dognon, T. (2001). An improvement of the standard genetic algorithm fighting premature convergence in continuous optimization. Advances in Engineering Software, 32(1), 49-60. doi:10.1016/S0965-9978(00)00070-3

Hrstka, O., & Kučerová, A. (2004). Improvements of real coded genetic algorithms based on differential operators preventing premature convergence. Advances in Engineering Software, 35(3), 237-246. doi:10.1016/S0965-9978(03)00113-3

KEY TERMS AND DEFINITIONS

Benchmark Problems: A set of standard optimization problems, consisting of various types of functions, used for the evaluation, characterization and performance measurement of optimization algorithms. The behavior of an algorithm under different environmental conditions can be predicted using benchmark functions.

Embedded Applications: Software or hardware based computer systems included as part of a larger device and dedicated to performing certain real-time functions under some constraints.

Evolutionary Algorithm: A set of meta-heuristic, population-based optimization techniques that use nature-inspired processes such as selection, reproduction, recombination and mutation.

Hardware Description Language (HDL): A language used to describe the design and functioning of hardware in a software environment, enabling verification of the design constraints and helping to implement the design in actual hardware.

Hardware-in-the-Loop: A platform for testing real-time systems, consisting of the device under test (DUT) in a loop with mathematical representations of the other related dynamic systems.
Objective Function: A real-valued function to be optimized under some constraints; it defines the relationship between the input and output of the system that it represents.

System on Chip (SoC): A low-power computer or electronic system, capable of various analog and/or digital and/or radio-frequency functions, fabricated as a single integrated circuit (IC).