
Handbook of Research on
Natural Computing for
Optimization Problems
Jyotsna Kumar Mandal
University of Kalyani, India
Somnath Mukhopadhyay
Calcutta Business School, India
Tandra Pal
National Institute of Technology Durgapur, India
A volume in the Advances in Computational
Intelligence and Robotics (ACIR) Book Series
Published in the United States of America by
Information Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue
Hershey PA, USA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: [email protected]
Web site: http://www.igi-global.com
Copyright © 2016 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in
any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher.
Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or
companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.
Library of Congress Cataloging-in-Publication Data
Names: Mandal, Jyotsna Kumar, 1960- editor. | Mukhopadhyay, Somnath, 1983- editor. | Pal, Tandra, 1965- editor.
Title: Handbook of research on natural computing for optimization problems /
Jyotsna Kumar Mandal, Somnath Mukhopadhyay, and Tandra Pal, editors.
Description: Hershey, PA : Information Science Reference, 2016. | Includes
bibliographical references and index.
Identifiers: LCCN 2016002268| ISBN 9781522500582 (hardcover) | ISBN
9781522500599 (ebook)
Subjects: LCSH: Natural computation--Handbooks, manuals, etc.
Classification: LCC QA76.9.N37 H364 2016 | DDC 006.3/8--dc23 LC record available at http://lccn.loc.gov/2016002268
This book is published in the IGI Global book series Advances in Computational Intelligence and Robotics (ACIR) (ISSN:
2327-0411; eISSN: 2327-042X)
British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.
All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the
authors, but not necessarily of the publisher.
For electronic access to this publication, please contact: [email protected].
Chapter 4
A System on Chip Development
of Customizable GA
Architecture for Real Parameter
Optimization Problem
Sumitra Mukhopadhyay
University of Calcutta, India
Soumyadip Das
University of Calcutta, India
ABSTRACT
This chapter presents the design and development of a hardware-based Evolutionary Algorithm architecture
for solving both unimodal and multimodal fixed point real parameter optimization problems.
A modular architecture is proposed that provides a trade-off between real time performance
and flexibility and works as a resource efficient reconfigurable device. The evolutionary algorithm
used here is the Genetic Algorithm. A prototype implementation of the algorithm has been performed on a
system-on-chip field programmable gate array. The notable feature of the architecture is its capability
of optimizing a wide class of functions with minimum or no change in the synthesized hardware. The
architecture has been tested with ten benchmark problems, and it has been observed that, across the different
optimization problems, the synthesized target requires at most 5% logic slice utilization, 2% of the
available block RAMs and 2% of the DSP48 slices on a Xilinx Virtex IV (ML401, XC4VLX25) board.
INTRODUCTION
A large number of real world problems, like asset allocation, best possible resource utilization, and automated
system design and operation, require a decision making process for the future evolution of the system under uncertainty. However, in a stochastic environment, the problem of decision making involves multiple
sub-problems like system identification, state estimation and the generation of optimal control.

DOI: 10.4018/978-1-5225-0058-2.ch004
Copyright © 2016, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

Many of these modules require the formulation of mathematical models of the process to be controlled and take
aid of several optimization algorithms. Mathematical optimization is also applied to find optimal solutions
in various fields of engineering and technology, like civil, mechanical and chemical engineering, electronic
design automation, VLSI, control, machine learning and signal processing. An optimization problem is
generally formulated by representing the different situations of the real world problem in mathematical terms.
The possible solutions are represented as the decision variables; the limits of the decision variables
indicate the range of the solution search space. An objective function is defined, usually
a function of the decision variables, whose value is to be optimized (minimized or maximized)
to obtain optimum performance of the system while satisfying the defined constraints. Unconstrained
problems are formulated and optimized without any such constraints.
The standard form of an optimization problem can be defined as follows:

min f0(x) (1)

subject to

fi(x) ≤ 0, i = 1, ..., m
hi(x) = 0, i = 1, ..., p

Here, the function f0: R^n → R is called the cost function or the objective function; the functions fi: R^n →
R, i = 1, ..., m, are the inequality constraint functions; the functions hi: R^n → R, i = 1, ..., p, are the
equality constraint functions; and x = {x1, ..., xn} is a vector called the optimization variable of the problem.
A vector z is said to be feasible if it satisfies all the equality and inequality constraints:

f1(z) ≤ 0, ..., fm(z) ≤ 0

and

h1(z) = 0, ..., hp(z) = 0.

A particular vector x* is called the optimal solution if its objective value is optimal among all the
feasible vectors, i.e., every feasible z satisfies

f0(z) ≥ f0(x*).

An optimization problem is said to be unconstrained if m = p = 0.
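The standard form above can be restated as a short executable sketch. The helper name `is_feasible` and the example problem are our own illustrations, not part of the chapter:

```python
# Illustration of standard form (1): f0 is the objective, fs the
# inequality constraints (fi(x) <= 0), hs the equality constraints
# (hi(x) = 0). Names and tolerance are illustrative choices.

def is_feasible(x, fs, hs, tol=1e-9):
    """A vector is feasible if it satisfies every constraint."""
    return all(fi(x) <= tol for fi in fs) and all(abs(hi(x)) <= tol for hi in hs)

# Example: minimize f0(x) = x1^2 + x2^2 subject to x1 + x2 - 1 <= 0.
f0 = lambda x: x[0] ** 2 + x[1] ** 2
fs = [lambda x: x[0] + x[1] - 1]
hs = []

print(is_feasible([0.2, 0.3], fs, hs))   # True: satisfies x1 + x2 <= 1
print(is_feasible([0.9, 0.9], fs, hs))   # False: violates the constraint
```

Any search procedure, including the GA developed in this chapter, ranks only such feasible vectors by their objective value.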
The task of solving a problem is to find an acceptable solution, if not the best one. The search
for a better solution is carried out amongst all feasible solutions. The fitness values of the feasible
solutions constitute the search space. Thus, we search for the minimum or maximum point in the search
space that represents the best solution to the problem.

Various algorithms (Back, 1996; Holland, 1975; Goldberg, 1989) have been developed for solving
the different classes of optimization problems (Boyd & Vandenberghe, 2004). The difficulty of solving
a problem depends on the forms and number of constraints, objective functions and variables of the
problem. Optimization algorithms can be deterministic or stochastic. Many problems require
huge computational effort to find acceptable solutions, and as the problem size increases, the solution
process often fails to converge. In such situations, bio-inspired stochastic optimization algorithms have
been developed as effective alternatives to deterministic methods for solving optimization problems.
Evolutionary algorithms (EA) optimize a problem by iteratively searching for better solutions utilizing
natural phenomena like growth, development, reproduction, selection, survival of the fittest, etc. They
include genetic algorithm (GA), differential evolution (DE), and so on.
The genetic algorithm, proposed by Holland (1975), is a robust stochastic algorithm for finding optimized
solutions. GA is applied across many disciplines to find acceptable solutions to complex problems in
a finite amount of time, but its computational complexity and iteration time increase for complex
problems. Thus, software implementations of GA for computationally intensive problems suffer from
large update delays. If dedicated hardware units are built for the GA steps, the parallelism inherent in
hardware can be utilized to speed up the process. The overheads of software execution, like
decoding and interpretation of software commands, are also avoided, which further speeds up the
algorithm. So, if each GA step is mapped onto dedicated hardware, it will take much less time
than executing the GA in software. That is why a lot of research effort is directed towards hardware
implementation of the Genetic Algorithm, as this helps in gaining speed over software implementations for
real time optimization requirements.
Graham and Nelson (1996), Koonar (2003, 2005), Scott (1994), Tommiska and Vuori
(1996), Shackleford (2001), Aporntewan and Chongstitvatana (2001), Tang and Leslie (2004), Vavouras,
Papadimitriou and Papaefstathiou (2009, July), Chen et al. (2008), Fernando et al. (2010), Kok et al. (2013),
Nambiar, Balakrishnan, Khalil-Hani, and Marsono (2013), Ashraf and DeMara (2013), and a number of
other researchers have reported hardware implementations of GA. Most of the works in this field have
targeted the development of fast GA hardware that outperforms software GAs in speed. However,
many researchers now aim at obtaining faster optimization with minimum hardware resources.
Application-specific GA hardware has also been implemented that works well for particular problems (Kok
et al., 2013). These GA hardware implementations require the hardware to be re-synthesized to accommodate
newer optimization problems. GA can also be implemented by first writing the code in software
and then extracting the hardware configuration code using a CAD tool, but in such implementations,
design customization like parallelism and pipelining is limited by the tool.

Here GA is implemented using Verilog HDL only, without the assistance of any other code generator
tool. This reduces the overheads required during the implementation of the modules on a single chip.
This research work tries to achieve the following targets:

•	A System on Chip (SoC) FSM based fixed point Genetic Algorithm (GA) architecture has been proposed and implemented on a single FPGA board (Xilinx Virtex IV (XC4VLX25)) for function optimization without any modification of the standard benchmark problems available in the literature. (In the testing phase, most works in the literature modify the target problems according to their implementation requirements, making them simpler, reducing the search range and the resolution of the search range, etc., and thus reducing the complexity of the target problem.)
•	To examine the maximum amount of flexibility that can be achieved in a single architecture in terms of genetic parameters like the word lengths of population members and of their fitness values, the population size and the number of generations.
•	The GA architecture was slightly modified and integrated into a Hardware-in-the-Loop (HIL) testing environment to develop a flexible GA based optimization system that requires no re-synthesis to solve different optimization problems. Thus a universal system has been designed which will help to find an optimal solution for a large number of optimization problems.
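Since the architecture operates on fixed point representations of real parameters, it may help to see how a real valued search range maps onto an n-bit word. The linear scaling below is a minimal sketch under our own assumptions; the chapter's exact number format may differ:

```python
# Hedged sketch of fixed-point encoding for real-parameter GA hardware:
# a real value in [lo, hi] is represented as an unsigned n-bit integer,
# the form a hardware datapath can process directly. The scaling scheme
# is illustrative, not the chapter's exact number format.

def encode(value, lo, hi, bits):
    """Map a real value in [lo, hi] to an n-bit fixed-point code."""
    scale = (2 ** bits - 1) / (hi - lo)
    return round((value - lo) * scale)

def decode(code, lo, hi, bits):
    """Recover the real value represented by an n-bit code."""
    scale = (hi - lo) / (2 ** bits - 1)
    return lo + code * scale

code = encode(0.5, -5.0, 5.0, 16)        # mid-range value in a 16-bit word
print(code, round(decode(code, -5.0, 5.0, 16), 4))
```

The word length `bits` is exactly the "word length of population members" flexibility parameter listed above: widening it refines the resolution of the search range without changing the real-valued problem.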
The architecture has been successfully tested on 10 standard benchmark problems. The performance metrics used to
evaluate the functioning of the proposed GA based architecture are the number of function calls (NFC),
the success rate and the success performance, which are detailed in the results section of the chapter. Table 1
presents a list of abbreviations and their full forms.

A brief description of the genetic algorithm is presented in the following section.
Table 1. Abbreviations and their full forms

AHDL: Altera Hardware Description Language
ASIC: Application Specific Integrated Circuit
CAD: Computer Aided Design
CCU: Central Control Unit
CM: Crossover Module
DE: Differential Evolution
DSP: Digital Signal Processing
EA: Evolutionary Algorithm
ES: Evolutionary Strategy
FEM: Fitness Evaluation Module
FIL: FPGA-in-Loop
FMem: Fitness Memory
FPCB: Field Programmable Circuit Board
FPGA: Field Programmable Gate Array
FSM: Finite State Machine
GA: Genetic Algorithm
GI: Guard Interval
GP: Genetic Programming
HDL: Hardware Description Language
HIL: Hardware-in-Loop
IP: Intellectual Property
IPGM: Initial Population Generation Module
LCA: Linear Cellular Automata
LCM: Load Constraint Module
LFSR: Linear Feedback Shift Register
MM: Mutation Module
NFC: Number of Function Calls
Np: Population Size
OSFVMem: Offspring Solution Fitness Value Memory
OSMem: Offspring Solution Memory
PCI: Peripheral Components Interconnect
PRNG: Pseudo Random Number Generator
PSFVMem: Parent Solution Fitness Value Memory
PSM: Parent Selection Module
PSMem: Parent Solution Memory
PSSM: Prospective Solution Selection Module
RAM: Random Access Memory
RNG: Random Number Generator
SMem: Solution Memory
SoC: System on Chip
SPGA: Splash-2 Parallel GA
TSMC: Taiwan Semiconductor Manufacturing Company
TSP: Travelling Salesman Problem
UAV: Unmanned Aerial Vehicle
VHDL: Very High Speed Integrated Circuits Hardware Description Language
VLSI: Very Large Scale Integration
GENETIC ALGORITHM
GA is based on the natural principle of "survival of the fittest". It consists of a number of steps, like
Initial Population Generation, Fitness Evaluation, Parent Selection, Crossover and Mutation. The block
diagram of the genetic algorithm cycle is illustrated in Figure 1. In the genetic algorithm, solutions are
generated iteratively with the expectation that successive iterations will produce better solutions to the problem.
INITIAL POPULATION GENERATION
The search process starts with the generation of a set of solutions, called chromosomes, predicted arbitrarily
in the domain of the problem. This set is called the initial population of solutions, and this step is called
Initial Population Generation. Each chromosome represents some characteristics of the solution to the
problem. A chromosome may consist of a string of binary numbers in which each bit or group of bits encodes some
characteristic of the solution. Usually, this is done employing random number generators that generate
solutions encoded as binary strings within the domain of the problem to be optimized. The number of
chromosomes generated is determined by the population size (Np), which is the number of candidate
solutions present in each generation.
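The step above can be sketched as follows. In the hardware this role is played by a pseudo random number generator module; here the standard library generator stands in for it, and the parameter values are illustrative:

```python
import random

# Minimal sketch of Initial Population Generation: Np chromosomes,
# each an n-bit binary string drawn at random from the encoded domain.

def initial_population(np_size, n_bits, seed=None):
    rng = random.Random(seed)
    return [rng.getrandbits(n_bits) for _ in range(np_size)]

pop = initial_population(np_size=8, n_bits=16, seed=1)
print(len(pop))                             # population size Np
print(all(0 <= c < 2 ** 16 for c in pop))   # every chromosome fits in 16 bits
```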
Figure 1. Block Diagram of Genetic Algorithm Cycle
FITNESS EVALUATION
In the Fitness Evaluation step, the characteristics of the solutions encoded into the chromosomes are utilized to
evaluate the fitness value of each chromosome using an appropriate objective function that represents the
problem to be optimized. Since the objective function changes from one problem to another, the fitness
evaluation step needs to be modified accordingly.
PARENT SELECTION
In this step, the chromosomes having promising fitness values are selected to form a set called the
mating pool for subsequent genetic operations that create new solutions. This may be done by processes like
roulette wheel selection, tournament selection, etc. The GA architecture proposed in this chapter uses
the tournament selection process for this purpose. In tournament selection, two solutions are arbitrarily
selected; their fitness values are compared, and the one having the better fitness value is selected.
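The binary tournament just described can be sketched in a few lines. The code assumes a minimization problem, so "better fitness" means a smaller value; the function name and test data are illustrative:

```python
import random

# Sketch of binary tournament selection: two candidates are picked at
# random and the one with the better (smaller) fitness value wins.

def tournament_select(population, fitness, rng):
    a, b = rng.sample(range(len(population)), 2)
    return population[a] if fitness[a] <= fitness[b] else population[b]

rng = random.Random(7)
population = [12, 5, 9, 31]
fitness = [144, 25, 81, 961]          # e.g. f(x) = x^2, to be minimized
winner = tournament_select(population, fitness, rng)
print(winner in population)           # the winner is always a member
```

Note that the worst member (here 31, with fitness 961) can never win a tournament, which is what gives the operator its selection pressure.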
GENETIC OPERATION
The genetic operation consists of two steps, called Crossover and Mutation. The crossover operation
mimics the genetic crossover process that occurs during the formation of a new individual, called an offspring,
in living beings. During the crossover operation, two chromosomes are selected from the mating pool. The
characters or genes of the chromosomes, encoded as bits, are exchanged and recombined between the
parents to give rise to new chromosomes, called offspring, representing new solutions to the problem.
The exchange of bits may occur at one or more points in the parent chromosomes, and such processes
are accordingly named single point or multiple point crossovers. Figure 2(a) and Figure 2(b) illustrate
the single point and two point crossovers. The crossover points are selected randomly.
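The single point crossover of Figure 2(a) can be sketched on chromosomes stored as integers, the natural representation for a hardware register. The masking scheme below is an illustrative software rendering, not the chapter's datapath:

```python
import random

# Sketch of single-point crossover on n-bit chromosomes stored as
# integers: bits above the randomly chosen cut point come from one
# parent, bits below it from the other.

def single_point_crossover(p1, p2, n_bits, rng):
    point = rng.randrange(1, n_bits)      # cut somewhere strictly inside
    low_mask = (1 << point) - 1           # selects bits below the cut
    child1 = (p1 & ~low_mask) | (p2 & low_mask)
    child2 = (p2 & ~low_mask) | (p1 & low_mask)
    return child1, child2

rng = random.Random(3)
c1, c2 = single_point_crossover(0b11110000, 0b00001111, 8, rng)
print(format(c1, "08b"), format(c2, "08b"))
```

Every bit position of the two children is filled from one parent or the other, so the pair of offspring contains exactly the same bits as the pair of parents, only recombined.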
The mutation operation mimics genetic mutation in living beings. During mutation, one or more
characters or genetic traits of a chromosome get abruptly altered due to some external cause, giving rise
to an offspring with different traits. This biological process is mimicked in the genetic algorithm through the
mutation operation. The operation is used to prevent the algorithm from getting stuck at a local minimum,
as shown for a function in Figure 3.
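A common realization of this operator, sketched below, flips each bit of the chromosome independently with a small probability; the rate and chromosome width are illustrative choices, not the chapter's settings:

```python
import random

# Sketch of bit-flip mutation: each bit of the chromosome is flipped
# independently with a small probability, letting the search escape a
# local minimum of the kind illustrated in Figure 3.

def mutate(chromosome, n_bits, rate, rng):
    for i in range(n_bits):
        if rng.random() < rate:
            chromosome ^= 1 << i          # flip bit i
    return chromosome

rng = random.Random(5)
original = 0b10101010
mutant = mutate(original, 8, rate=0.1, rng=rng)
print(0 <= mutant < 256)                  # still a valid 8-bit chromosome
```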
Now the fitness values of the new chromosomes developed by the genetic operations are evaluated.
The best solutions are selected to form a new set of parents, which will participate in Parent
Selection to form the mating pool for the next iteration of the genetic operation. Successive generations
consist of improved solutions. The genetic algorithm described above is the basis of the EA hardware
proposed in this chapter.
Figure 2. (a) Single point crossover and (b) 2-point crossover
Figure 3. Mutation operation
Pseudo Code for Genetic Algorithm (GA)

Generate random solutions in problem domain
Write into Parent Memory
Loop
    Parent selection
    Mating Pool formation with selected parents
    Write selected parents into offspring memory
    Crossover
    Mutation
    Evaluate new solutions
    Update parent memory
End-loop
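The pseudo code above can be made runnable for one small problem. The sketch below minimizes f(x) = (x - 7)^2 over 8-bit chromosomes, using the tournament selection, single point crossover and bit-flip mutation described in the preceding sections; the parameter values and the truncation-based parent memory update are our own illustrative choices:

```python
import random

# Runnable sketch of the GA cycle: initialize, then repeatedly select,
# recombine, mutate, evaluate and update the parent memory.

def ga(fitness, n_bits=8, np_size=20, generations=60, mut_rate=0.05, seed=2):
    rng = random.Random(seed)
    parents = [rng.getrandbits(n_bits) for _ in range(np_size)]  # initial population
    for _ in range(generations):
        scores = [fitness(p) for p in parents]
        def select():                                            # binary tournament
            a, b = rng.sample(range(np_size), 2)
            return parents[a] if scores[a] <= scores[b] else parents[b]
        offspring = []
        while len(offspring) < np_size:
            p1, p2 = select(), select()
            point = rng.randrange(1, n_bits)                     # single-point crossover
            mask = (1 << point) - 1
            for child in ((p1 & ~mask) | (p2 & mask), (p2 & ~mask) | (p1 & mask)):
                for i in range(n_bits):                          # bit-flip mutation
                    if rng.random() < mut_rate:
                        child ^= 1 << i
                offspring.append(child)
        # update parent memory: keep the best Np of parents + offspring
        parents = sorted(parents + offspring, key=fitness)[:np_size]
    return parents[0]

best = ga(lambda x: (x - 7) ** 2)
print(best)
```

Each pass through the loop corresponds to one trip around the cycle of Figure 1, and the final parent memory holds the best solutions found.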
The following section gives a review of the previously reported works on hardware implementations of GA.
BACKGROUND AND LITERATURE REVIEW OF RELATED WORKS
In recent years, hardware implementation of EAs has been gaining popularity, and GA has
been implemented in hardware a number of times. Here, brief descriptions of those efforts are given,
with special emphasis on works that report FPGA implementations of general purpose GA
for solving optimization problems. However, the detailed literature survey reveals that implementations
of generalized GA architectures for function optimization are limited. Some researchers
have targeted and tested their design approach with a very limited number of functions. Moreover, those
functions were customized according to the design needs, and the domain of the solution space considered was
essentially integer. Standard benchmark examples from the literature, in their full complexity,
were not considered in most of the surveyed cases.
One of the earliest hardware implementations of a general purpose GA was done by Scott in 1994 (Scott,
1994). At that point in time, researchers were mainly motivated to investigate the hardware implementation
issues of GA in order to utilize the hardware's speed advantage over software based GAs (SGA)
and the parallelism inherent in GA. Multiple FPGAs were used to implement a general purpose Hardware
Genetic Algorithm (HGA) on a reconfigurable BORG prototyping board containing five Xilinx XC4000
FPGAs. In the experiments, roulette wheel selection and single-point crossover were employed with a
population size of 16, a member width of 3 bits and a fitness value width of 4 bits. A comparison was made
between the average execution times of the HGA and the SGA in terms of the number of clock cycles required
for the same number of generations while solving six different test functions. The number of clock cycles
was used as a technology independent performance metric. It was shown that the HGA exhibited speedups
of 1 to 3 orders of magnitude. It was suggested that further hardware improvements could be made by
parallelization, concurrent memory access, etc. The prime target was to study the hardware implementation
of basic GA and its speedup over software, rather than efficient usage and compactness of hardware
resources or extension of GA with other operators, selection methods or encodings. Vavouras, Papadimitriou
and Papaefstathiou (2009) also reported a GA based hardware implementation on FPGA. Their HGA was
implemented on the XUPV2P platform consisting of an embedded PowerPC. The authors
claimed to improve on the HGA implemented by Scott in 1994, described above. The implementation was
tested using linear, quadratic and cubic test functions, and was compared to some other
hardware implementations on the basis of the hardware execution time required to solve them.
In another work, Tang and Leslie (2004) reported a PCI-based hardware GA implementation using
two SRAM-based Altera FPGAs. One FPGA (FLEX 6000) was used for the bus interface, control
unit and genetic operators. The other FPGA (FLEX 10K) was used as the fitness evaluator by
implementing the objective function. Programmable genetic operators were implemented using parallel
and pipelined architectures to support one-point/four-point/uniform crossover and one-point/four-point/
uniform/Gaussian mutation. Roulette wheel selection was used. A modified version of Witte and Holst's
Strait Equation was used as the test problem.
In other work in the late nineties, hardware GA architectures were designed targeting constrained
optimization problems. Graham and Nelson (1996) used the Splash-2 Parallel GA (SPGA) for optimizing
the travelling salesman problem (TSP). Each SPGA processor consisted of four Xilinx 4010 FPGAs
and memories. Slowly, interest in hardware EA implementation started growing, and process
speedup and optimum resource utilization became key parameters. In one such work, Tommiska and
Vuori (1996) tried to address those issues. The motivation was to utilize the inherent speed advantage
of hardware in GA implementations for real time applications like optimized telecommunication
network routing. The platform used comprised a white noise generator, an A/D converter and two
Altera Flex 10K FPGAs interconnected by Peripheral Components Interconnect (PCI) cards. This was
connected to the host Pentium microprocessor based computer via high speed PCI bus slots, through
which the FPGAs were configured. The architecture was implemented in a 4-stage pipelined fashion using
Altera Hardware Description Language (AHDL) code. The population was stored in RAM in the embedded
array blocks, and the fitness function in the logic array blocks of the FPGA. Each population consisted of 32
members, each 32 bits wide. Single point crossover was performed with a probability of 100% and a mutation
probability of 3.1%. Comparison of unsigned 32-bit binary numbers was used as the test problem, with
round-robin selection. The hardware exhibited a 212 times speed gain over the same algorithm programmed
in C running on a 120 MHz Pentium based Linux system. A different fitness function
can be employed by reprogramming the FPGA using AHDL; thus, this GA architecture is not general
in the true sense of the term, as it requires reprogramming and re-synthesis for different fitness functions.
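Hardware GAs of this era drew their randomness from compact on-chip generators such as the linear feedback shift register (LFSR) listed in Table 1. As an illustration (the designs above use their own generators), the following sketch steps a maximal-length 16-bit Fibonacci LFSR with the standard tap set 16, 14, 13, 11:

```python
# Sketch of a maximal-length 16-bit Fibonacci LFSR, a common pseudo
# random number source in GA hardware. Taps 16, 14, 13, 11 are a
# standard maximal-length choice, shown here for illustration only.

def lfsr16_step(state):
    """Advance the 16-bit LFSR one step; state must be non-zero."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

seed = 0xACE1                       # any non-zero seed
period = 0
s = seed
while True:
    s = lfsr16_step(s)
    period += 1
    if s == seed:
        break
print(period)                       # a maximal 16-bit LFSR repeats after 65535 steps
```

In an FPGA this is just a shift register and a few XOR gates per clock, which is why LFSRs are favored over costlier generators when hardware resources are at a premium.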
As interest in hardware implementation increased, the complexity of the development procedure also
grew. One response was the survival based steady state GA proposed by Shackleford (2001). Six FPGAs
on an Aptix AXB-MP3 Field Programmable Circuit Board (FPCB) were used, and the implementation was not SoC.
Another aspect of GA architecture development came to the surface with the proposal of the compact
GA. Aporntewan and Chongstitvatana (2001) described the implementation of a compact GA using
Verilog HDL. The one max problem was used to evaluate the system performance. Theoretically, however,
the compact GA cannot absolutely replace the simple GA for all classes of problems, as it simulates the
order-one behavior of the simple GA using binary tournament selection and uniform crossover. Convergence
is ensured for problems consisting of tightly coded, non-overlapping building blocks, and such problems
are rarely found in real-world applications. The authors also did not claim that the compact GA hardware
performed better than other hardware GAs, as the performance evaluation was different (Aporntewan &
Chongstitvatana, 2001).
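What makes the compact GA attractive in hardware is that it replaces the whole population with one probability per bit. The sketch below, on the same one max problem used to evaluate that hardware, is our own rendering of the textbook compact GA (parameter values are illustrative): two individuals are sampled per step, compete, and the probability vector is nudged toward the winner by 1/n, where n plays the role of the simulated population size:

```python
import random

def onemax(bits):
    return sum(bits)                      # one max: count of 1 bits

def compact_ga(n_bits=16, pop_n=50, steps=4000, seed=4):
    rng = random.Random(seed)
    p = [0.5] * n_bits                    # one probability per bit
    for _ in range(steps):
        a = [1 if rng.random() < pi else 0 for pi in p]
        b = [1 if rng.random() < pi else 0 for pi in p]
        winner, loser = (a, b) if onemax(a) >= onemax(b) else (b, a)
        for i in range(n_bits):           # shift p toward the winner by 1/n
            if winner[i] != loser[i]:
                p[i] += (1 / pop_n) if winner[i] == 1 else -(1 / pop_n)
                p[i] = min(1.0, max(0.0, p[i]))
    return p

p = compact_ga()
print(sum(p))                             # drifts toward n_bits as bits fix at 1
```

Storing n probabilities instead of Np full chromosomes is precisely the memory saving that made the compact GA a natural fit for the FPGA implementation discussed above.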
Circuit partitioning in VLSI is an important field for the application of optimization algorithms. Several
software EA algorithms have been proposed, and customized boards have been designed based on those
algorithms (Manikas & Cain, 1996; Bui & Moon, 1998; Koonar, 2003; Areibi, Moussa, & Koonar, 2005). One
such example is the design by Koonar (2003), who used the Very High Speed Integrated Circuits Hardware
Description Language (VHDL) to develop an application specific GA architecture addressing circuit
partitioning in VLSI. The architecture consists of three added memories and six modules.
A new avenue of GA hardware implementation opened with the development of intellectual property
(IP) cores for the GA. Some examples are described here to establish the concept and the significance
of the scheme. Chen et al. (2008) described a flexible very-large-scale integration intellectual property
(IP) core for the GA. Using C++ programming, a software application called Smart GA was built that
imparts flexibility to the design. Using the Smart GA software, GA parameters like population size,
individual length, fitness function, crossover operation and mutation rate can be fixed for a particular
GA implementation to solve a specific problem, and the corresponding Verilog HDL code is obtained.
To perform the fitness calculation for different fitness functions, either a lookup table or a user defined
approach was used. In the former method, a fixed number (2^16) of 16-bit wide fitness values of 16-bit
wide individuals can be generated and stored in a fixed sized LUT. In the latter method, the user needs
to know Verilog coding for different applications. The IP core generates GA hardware with a range of
population sizes (8-16384), fitness lengths (8-1024) and individual lengths (8-1024). Randomization
was done using the linear cellular automata (LCA) method, and the tournament selection process was employed.
The crossover operator can be chosen from uniform, single-point, two-point and cross-point crossover, and
mutation is also user defined. A chip was developed using Taiwan Semiconductor Manufacturing Company's
(TSMC) 0.18-μm cell library. Three test functions were used, namely a 1-D trigonometric function, the 2-D
Shubert function, and, in a digital audio broadcasting system, the determination of the optimized guard interval (GI)
length under a bit error rate (BER) performance specification. The generated architectures found optimal
values within 150 µs, 315 µs and 0.167 ms respectively. The disadvantage is that this implementation
requires the user to rewrite and re-synthesize the hardware for every change in the GA parameters or
in the fitness function. So, once the ASIC implementation is done, it can be used only for a specific problem.
Also, to find the best GA parameter settings for a particular fitness function, the user is required to
re-synthesize the GA netlist repeatedly.
Fernando et al. (2010) presented a customizable FPGA IP core implementation of a general purpose
GA engine. Here the GA IP core supports up to eight different fitness functions, and the core has I/O ports
to accommodate a new fitness function: a new function has to be implemented on another device, which
is connected to those I/O ports for evaluation. A 16-bit cellular automata based PRNG is used
for initial population generation and other randomization. A proportionate selection scheme is used for
parent selection, along with single point crossover and mutation. GA parameters like population
size, number of generations, and crossover and mutation rates are user programmable for the
IP core. The core supports chromosome lengths up to 16 bits; for larger lengths, the netlist must
be resynthesized and simulated to verify functionality and timing. This shows that the
system is flexible, but with limitations. The FPGA implementation was tested on three test functions using
a lookup-based fitness evaluation module. Modified, binary and scaled versions of the functions were used for
easy hardware implementation. The possible fitness values are stored in a lookup table, so the values
need to be pre-calculated and coded into the memory.
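The lookup-table scheme just described can be sketched in a few lines: every possible chromosome's fitness is precomputed and stored, so evaluation in hardware reduces to a single memory read. The sketch uses 8-bit chromosomes for brevity, rather than the 16-bit tables of the works above, and the fitness function is illustrative:

```python
# Sketch of lookup-based fitness evaluation: precompute the fitness of
# every possible chromosome so the hardware evaluates by memory read.

def build_fitness_lut(fitness, n_bits):
    return [fitness(x) for x in range(2 ** n_bits)]   # one entry per chromosome

lut = build_fitness_lut(lambda x: (x - 100) ** 2, 8)
print(len(lut))              # 256 entries for an 8-bit chromosome
print(lut[100])              # evaluating is just an indexed read: f(100) = 0
```

The cost of this convenience is exponential memory: a 16-bit chromosome already needs 2^16 entries, which is why such cores fix the chromosome width and require resynthesis beyond it.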
As a newer application specific GA hardware, Kok et al. (2013) described a SoC FPGA implementation
of a modified GA based architecture for path planning of unmanned aerial vehicles (UAV). Here, all
the functionalities of the GA-based path planner are implemented on a single FPGA chip; according to
the authors, this was the first SoC implementation of a GA-based path planner. But the design is not
balanced. Also, it aims at solving only the problem of UAV path planning using GA and does not have
a universal approach.
Very recently, researchers have also been targeting optimization problems in the cognitive radio
environment, and a few simulation studies exist in this regard. In the works by Rieser (2004) and Rondeau,
Le, Rieser, & Bostian (2004, November), cognitive radio controlling systems are modeled using GA.
Optimized spectrum sensing and allocation for cognitive radio have been proposed by Zhao, Peng, Zheng,
& Shang (2009) and Deka, Chakraborty, & Roy (2012).

Table 2 shows a parametric comparison between the implementation aspects of different hardware
GAs along with those of the present proposal.
Most of the implementations discussed above suffer from one or more of the following disadvantages:

•	Except for Kok et al. (2013), implementation of the algorithm was done on platforms containing more than one FPGA chip. These hardware architectures are therefore larger in size and can be used only in applications where compactness of the hardware is not a primary requirement. Also, the inter-chip communication required to run these architectures costs speed.
•	Many of these architectures (Scott, 1994) are not portable and cannot be implemented on other platforms.
•	Some of the implemented GAs, such as those in Graham and Nelson (1996), Koonar (2003) and Kok et al. (2013), are application specific.
•	The general purpose GAs implemented in Scott (1994), Tommiska and Vuori (1996), Aporntewan and Chongstitvatana (2001), Tang and Leslie (2004), Chen et al. (2008) and Fernando et al. (2010) lack flexibility. Each of these supports only a single or a finite set of fitness functions; thus, utilizing such hardware for optimization in other applications or for a different fitness function requires altering the basic hardware and re-synthesizing it.
•	In all hardware based systems developed to date, the architecture needs to be re-synthesized to enable it to optimize new problems.
•	Hardware GA implementations such as Fernando et al. (2010) dealt with integers as the design variables and fitness values instead of real numbers. Although complex test functions were optimized, they were modified for hardware implementation, minimization functions were changed to maximization functions, customized problems were used, and the search space of the variables was restricted to integers.
This chapter presents a general purpose function optimization architecture which is a significant
improvement over most existing works using application specific architectures. The proposed GA
hardware implementation has the following features, which serve as key factors in addressing the
above mentioned problems:

•	The EA hardware described in this chapter is implemented entirely on a single FPGA chip. This implementation can be utilized for applications that require compact, small sized hardware.
•	The fixed point GA hardware is implemented in a finite state machine (FSM) based approach where each FSM state corresponds to a basic module of the architecture. The execution of individual
module is governed by a central control unit (CCU). This approach imparts flexibility to the implementation.

A System on Chip Development of Customizable GA Architecture

Table 2. Comparison of the surveyed implementations: problems considered, variation of parameters, and resources

Work | Functions / Problems Considered | Selection | Crossover/Mutation Rates | No. of Generations
Scott (1994) | x; x+5; 2x; x^2; 2x^3-45x^2+300x; x^3-15x^2+500 | Roulette | Fixed | Fixed
Graham & Nelson (1996) | Travelling Salesman Problem | Roulette | --- | ---
Tommiska & Vuori (1996) | Comparison between 32-bit unsigned binary numbers | Round Robin | Fixed (100%) / fixed (3.1%) | Fixed
Shackleford | Protein folding | Survival | Fixed | Fixed
Aporntewan & Chongstitvatana (2001) | One max problem | N/A | N/A | N/A
Koonar (2003) | Circuit partitioning in VLSI | Tournament | Fixed | 20/60/100
Tang & Leslie (2004) | Modified Witte and Holst's Strait Equation: |x1-a|+|x2-b|+|x3-c| | Roulette | Programmable | Programmable
Chen et al. (2008) | 1-D trigonometric function, 2-D Shubert function, DAB system | Tournament | Dynamic | Dynamic
Fernando et al. (2010) | BF6_2; mBF7_2(x,y); mShubert2D(x1,x2) | Roulette | Programmable (4-bit) | Programmable (32-bit)
Proposed | 4-D benchmark functions | Tournament | Fixed | Programmable
• The implementation is tested on a Xilinx Virtex-IV (ML401, XC4VLX25) FPGA chip. It can also be implemented on other reconfigurable hardware with minor modifications to the configuration part.
• The implemented architecture is successfully tested with a number of different types of unimodal and multimodal problems suggested in the literature by Dieterich and Hartke (2012), Jamil and Yang (2013), and Suganthan et al. (2005). The tests produce satisfactory results.
• The functions were optimized without any modification, so the design variables and fitness values were real numbers instead of integer values. This is handled in hardware using a fixed point GA implementation approach.
• The designed hardware was modified and an FPGA-in-Loop (FIL) environment was designed. The system developed by integrating the hardware with FIL can therefore optimize any problem without re-synthesizing the basic hardware.
The next section highlights the GA hardware implementation issues and its execution flow.
GENETIC ALGORITHM BASED HARDWARE
DEVELOPMENT AND ITS OPERATION
In this section, the architecture of the proposed EA-based optimization system and its functioning are described. The architecture is inspired by the simple GA and is designed based on the genetic algorithm discussed above. The fixed point GA hardware is modeled as a finite state machine (FSM), with each GA step corresponding to a state of the FSM. The FSM-based approach allows the states to be mapped onto the hardware structure in a modular fashion: each of the steps, such as initial population generation, fitness evaluation, parent selection, crossover and mutation, is designed as a separate module, and the modules are integrated into a sequential FSM to form the GA hardware.
BASIC HARDWARE STRUCTURE
A generalized external schematic of the proposed GA architecture is shown in Figure 4(a) and the detailed hardware representation is given in Figure 4(b). The inputs are CLOCK, RESET, ACTIVATE and SEED. The outputs are the BEST SOLUTION and the ERROR or FITNESS VALUE. The hardware consists of a central control unit (CCU), memory units and several modules named after their functions: the Load Constraint Module (LCM), Initial Population Generation Module (IPGM), Fitness Evaluation Module (FEM), Prospective Solution Selection Module (PSSM), Parent Selection Module (PSM), Crossover Module (CM) and Mutation Module (MM). The CCU controls the modules using handshaking signals named READY, DO and DONE, indicated in Figure 4(b) by arrows between the CCU and the modules. For the sake of clarity, only the READY, DO and DONE signals between the LCM and the CCU are indicated in the figure. Each module has an incoming DO signal from the CCU, and two outgoing signals into the CCU named READY and DONE.
Figure 4. Schematic representation of GA architecture: (a) external view (b) detailed internal structure
Pseudo random number generators (PRNGs) are used to generate the random numbers that impart randomness to various GA steps, such as initial population generation and arbitrary crossover and mutation point selection.
There are four data memory modules and one profile memory. Among the data memories, two modules
are required for the storage of parent and the offspring solutions namely the Parent Solution Memory
(PSMem) and the Offspring Solution Memory (OSMem). Two other data memories are required for
storing the fitness values of the parent and the offspring solutions which are named as the Parent Solution
Fitness Value Memory (PSFVMem) and the Offspring Solution Fitness Value Memory (OSFVMem)
respectively. For better hardware realization, the PSMem and the OSMem are implemented as two logical parts of a single physical memory block called the Solution Memory (SMem) while the PSFVMem
and the OSFVMem are implemented as two logical parts of another physical memory block named as
the Fitness Memory (FMem). The profile memory stores the configuration details of the architecture.
Such memory implementation has certain advantages which will be detailed later. Since real numbers
are used as design variables and fitness values, the hardware deals with binary fixed point numbers.
Here a brief idea of binary fixed point implementation is provided.
FIXED POINT HARDWARE IMPLEMENTATION ISSUES
A real number consists of an integer part and a fractional part, and is represented in the same way in binary, only with the base changed to 2. In hardware there is no explicit representation for the binary point; instead, a fixed number of bits represent the fractional and the integer parts of a real number, together with a scaling factor. For example, if (1011)2 is the binary representation of a real number with scaling factor 2, the corresponding decimal real number is 11/2 = 5.5, and the binary point lies between the 0th and the 1st bits. Here we implement the architecture using this fixed point concept.
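The conversion between a real number and its fixed-point word can be sketched in software. The snippet below is an illustrative model, not the hardware itself; it reproduces the (1011)2 example with scaling factor 2:

```python
def to_fixed(value, scale):
    """Quantize a real number to an integer word with a power-of-two scaling factor."""
    return int(round(value * scale))

def from_fixed(word, scale):
    """Recover the real number represented by a fixed-point integer word."""
    return word / scale

# (1011)2 = 11 with scaling factor 2 represents 11/2 = 5.5, i.e. the binary
# point lies between bit 0 and bit 1, exactly as in the example above.
print(from_fixed(0b1011, 2))  # 5.5
print(to_fixed(5.5, 2))       # 11
```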
HARDWARE OPERATION
The synchronized operation and communication between the different modules and the hardware operations are described here. The working of the architecture is represented in the flowchart of Figure 5.
• Load Constraint Module (LCM): The system initiates operation when the CCU receives the ACTIVATE input and issues the DO signal to the LCM. The LCM loads the profile memory with the value from the SEED input. When the LCM completes its tasks, it issues the DONE signal to the CCU, which then inputs the DO signal to the IPGM.
• Initial Population Generation Module (IPGM): The IPGM uses a pseudo random number generator to produce arbitrary solutions, which are loaded into the parent memory as the initial population of solutions. When the IPGM finishes, it sends the DONE signal to the CCU.
• Fitness Evaluation Module (FEM): The FEM evaluates the fitness of a chromosome using the problem-specific fitness function, which is implemented inside the FEM. Evaluating each member one by one, the FEM places its fitness value into the corresponding location of the Parent Solution Fitness Value Memory (PSFVMem). Thus the fitness value of the solution present in the first location of the Parent Solution Memory (PSMem) is mapped into the first location of the PSFVMem, and so on. The FEM issues the DONE signal after completion.
• Parent Selection Module (PSM): The CCU now initiates the PSM to create the mating pool for the genetic operations. The PSM randomly chooses two solutions from the PSMem and uses tournament selection to choose between them: it fetches the fitness values of the two randomly chosen solutions from the PSFVMem, compares them, and writes the better solution into the OSMem.
Figure 5. Flow chart representing working of the proposed genetic algorithm architecture to solve constrained optimization problems
Thus the mating pool is created in the OSMem for performing the genetic operations. After the mating pool is generated, the PSM issues the DONE signal.
• Crossover Module (CM): The CCU then activates the CM to perform the crossover operation on the mating pool formed in the Offspring Solution Memory (OSMem). Single point crossover is used here. The CM selects the first two candidates of the mating pool and employs a PRNG to select a random crossover point, giving rise to two new offspring. The new solutions are written into the positions of the OSMem from which the mating pool candidates were taken. This process is repeated for all pairs of solutions in the mating pool. After the crossover operation, the CM issues the DONE signal.
• Mutation Module (MM): The CCU now initiates the MM. Depending on the mutation probability, the MM decides which candidates to mutate. The MM contains a PRNG that arbitrarily selects a bit of the candidate to be mutated; the selected bit of the candidate solution is inverted to mutate it. The offspring memory now contains the new offspring solutions. The FEM is initiated again to evaluate the new candidates and write their fitness values into the OSFVMem.
• Prospective Solution Selection Module (PSSM): Now the Prospective Solution Selection Module (PSSM) is initiated. It employs a bubble sort (Cormen, Leiserson, Rivest, & Stein, 2003) over all the candidates of the parent and offspring generations according to their fitness values obtained from the PSFVMem and OSFVMem. The PSMem is then updated with the elite solutions arranged in descending order of their fitness values, and their fitness values are placed in the corresponding locations of the PSFVMem.
Now the topmost locations of the PSMem and the PSFVMem contain the best solution obtained so far and its fitness value. The outputs BEST SOLUTION and ERROR of the GA architecture are read from these locations. This completes a single iteration of the GA. The above steps are repeated until the error is less than a specified threshold or the maximum number of iterations has been reached. To start the next iteration, the CCU again initiates the PSM.
For a different problem, only the FEM needs to be reprogrammed with the new fitness function. The proposed architecture thus provides a general structure for a GA hardware implemented on a single FPGA chip.
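As a rough software model of this control flow, the sketch below sequences module callbacks the way the CCU issues DO signals and waits for DONE. The module names follow the chapter, but the callback functions themselves are stand-ins, not the hardware modules:

```python
def ccu_run(modules, max_iterations, error_threshold, state):
    """Sequence the modules the way the CCU does with DO/DONE handshakes.

    Calling a module models the CCU raising DO; the call returning models
    the module raising DONE. The order follows the flowchart of Figure 5.
    """
    for name in ("LCM", "IPGM", "FEM"):       # initial iteration only
        modules[name](state)
    for _ in range(max_iterations):           # every subsequent iteration
        for name in ("PSM", "CM", "MM", "FEM", "PSSM"):
            modules[name](state)
        if state["best_error"] <= error_threshold:
            break
    return state["best_solution"], state["best_error"]

# Stub modules that only record the execution order:
trace = []
stubs = {m: (lambda s, m=m: trace.append(m))
         for m in ("LCM", "IPGM", "FEM", "PSM", "CM", "MM", "PSSM")}
state = {"best_solution": None, "best_error": 0.0}
ccu_run(stubs, max_iterations=1, error_threshold=1.0, state=state)
print(trace)  # ['LCM', 'IPGM', 'FEM', 'PSM', 'CM', 'MM', 'FEM', 'PSSM']
```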
FPGA BASED PROTOTYPE DEVELOPMENT OF GA BASED ARCHITECTURE
The proposed design is implemented in the Xilinx development environment on a Virtex-IV (ML401, XC4VLX25) FPGA kit using synthesizable Verilog HDL. Platform-specific implementation issues and constraints, such as the limited programmable logic resources available, are examined on this platform.
• Finite State Machine Based Approach: The genetic algorithm is mapped onto the FPGA-based hardware platform using a finite state machine (FSM) based approach, where each state corresponds to a particular GA operation. The state diagram is shown in Figure 6. Such an FSM-based architecture can easily be modified by changing the order of execution of the modules, or by minor changes to a specific module whose activity needs to change, without altering the whole design. The hardware mapping of the different components is described in the following sections.
• Central Control Unit (CCU): The CCU controls the operation of the architecture by issuing control signals to each of the modules. Whenever a module is required to function, the CCU sends a DO signal to engage it. After the module finishes its task, it sends a DONE signal back to the CCU. The CCU then issues the DO signal to the next module in sequence that is READY to perform.
• Encoding GA Parameters - Chromosome Structure: The target is to find the optimal solution in an n-dimensional problem space. Each possible solution consists of a set of n values, one for each dimension. Solutions are encoded as binary strings containing n values, each p bits long with a scaling factor f. That is, each p-bit number has r fractional bits, where r=log2(f), and q integer bits, where q=p-r. If the chromosomes are M bits long, then M=p×n.
Figure 6. State diagram of the proposed GA architecture
In our hardware, M=24, p=6, q=4, r=2 and n=4. Thus the solutions are encoded as binary strings of 6-bit signed fixed point real numbers with a scaling factor of four: the leftmost bit is the sign bit, the next three bits represent the whole number, and the last two bits represent the fractional value. To start with, the hardware is designed to solve four dimensional problems; in the present hardware, each chromosome is therefore a 24(=6×4)-bit binary string in which each group of 6 bits represents one dimension of the four dimensional system. The chromosome structure used by the GA hardware to solve a 4-dimensional problem is shown in Figure 7. If X1, X2, X3 and X4 together represent a possible solution to the 4-dimensional problem, bits 23 to 18 of the chromosome represent X1, bits 17 to 12 represent X2, bits 11 to 6 represent X3 and bits 5 to 0 represent X4. The structure of each of the four 6-bit binary numbers is enlarged in the figure.
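The bit layout of Figure 7 can be modelled in a few lines. The decoder below assumes a sign-magnitude reading of each 6-bit gene (1 sign bit, 3 integer bits, 2 fraction bits, scaling factor 4); the chapter does not state whether sign-magnitude or two's complement is used, so this is an illustrative assumption:

```python
def decode_chromosome(chrom):
    """Split a 24-bit chromosome into four 6-bit signed fixed-point genes.

    Per the chapter's layout: bits 23-18 -> X1, 17-12 -> X2, 11-6 -> X3,
    5-0 -> X4. Each gene: 1 sign bit, 3 integer bits, 2 fraction bits
    (scaling factor f = 4). Sign-magnitude interpretation assumed here.
    """
    values = []
    for shift in (18, 12, 6, 0):
        gene = (chrom >> shift) & 0x3F          # extract a 6-bit field
        sign = -1 if gene & 0x20 else 1         # bit 5 is the sign bit
        magnitude = gene & 0x1F                 # 5-bit magnitude
        values.append(sign * magnitude / 4.0)   # divide by scaling factor
    return values

# Example: X1 = 0b000110 -> +6/4 = 1.5, X2 = 0b100110 -> -6/4 = -1.5
print(decode_chromosome(0b000110_100110_000000_000001))  # [1.5, -1.5, 0.0, 0.25]
```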
• Parent and Offspring Memories: As discussed in the previous section, the GA hardware proposed in this chapter requires four memory modules: the Parent Solution Memory (PSMem), the Offspring Solution Memory (OSMem), the Parent Solution Fitness Value Memory (PSFVMem) and the Offspring Solution Fitness Value Memory (OSFVMem). The PSMem and the OSMem are implemented in a single memory called the Solution Memory (SMem); the PSFVMem and the OSFVMem are implemented in another single memory called the Fitness Memory (FMem). The two physical memories SMem and FMem are implemented using the dual port block RAMs available in the Virtex IV FPGA (XC4VLX25). Merging the parent and offspring memories into a single SMem has certain computational advantages when sorting out the elite chromosomes from both generations: sorting the solutions of the SMem in descending order of their fitness values puts the solutions with the better fitness values in the upper half of the SMem. This upper half is defined as the PSMem, so the parent memory automatically gets updated with the elite chromosomes from both the parent and the offspring generations.

Figure 7. Chromosome structure in the GA hardware to solve 4-dimensional optimization problems
The arbitrary set of solutions generated at the beginning of the GA process by the IPGM is stored in the PSMem, and the parent generation chromosomes obtained after every iteration are likewise updated in the PSMem. The mating pool generated by the PSM is stored in the OSMem. The genetic operations are performed on the contents of the OSMem by the CM and MM to form the offspring generation. The fitness values of the parent and the offspring generation chromosomes are stored in the PSFVMem and the OSFVMem respectively; a particular location of the PSFVMem or OSFVMem contains the fitness value of the solution chromosome present in the same location of the PSMem or OSMem.
• Profile Memory: The profile memory contains the input seed and the lower and upper limits of the feasible region of the problem. The LCM loads these values into the profile memory.
• Pseudo Random Number Generators: Random number generators (RNGs) are very important components of the GA; they generate randomized sequences of numbers or bit strings. The architecture requires four RNGs for different purposes. The IPGM employs a 24-bit pseudorandom generator module to create the initial population of parent generation chromosomes; it generates sixteen 24-bit random binary numbers to form the initial parent generation. The PSM requires random numbers to arbitrarily pick candidates for tournament selection when generating the mating pool from the parent generation. A 4-bit PRNG is used by the PSM to generate an arbitrary address that is fed into the address input of the dual port Solution Memory (SMem) to pick a parent chromosome for tournament selection.
The CM and the MM require random numbers to select a random bit of a chromosome at which the crossover or the mutation occurs. The PRNGs used by the CM and MM generate 5-bit numbers.
The arbitrary number determines the crossover point for a pair of chromosomes or the mutation bit in
a 24-bit chromosome.
True random numbers are generated using non-deterministic sources such as clock jitter (Holleman, Bridges, Otis, & Diorio, 2008). Meysenburg et al. (1997) and Meysenburg and Foster (1999) showed that good quality random numbers have very little effect on the performance of a GA. On the other hand, Cantú-Paz (2002) reported that GA performance depends on the quality of the random numbers used to generate the initial population of chromosomes, but not on the RNG quality used for the crossover and mutation operations. Methods for generating pseudorandom numbers are studied by Wolfram (1984) and Hortensius, McLeod, and Card (1989). There are two common systems for generating pseudo-random numbers:
• The linear feedback shift register (LFSR).
• The linear cellular automata (LCA).
Wolfram (1984) describes Rules 90 and 150 for generating random numbers using an LCA, as given in the equations below:
• Rule 90: S(i)+ = S(i-1) ⊕ S(i+1)
• Rule 150: S(i)+ = S(i-1) ⊕ S(i) ⊕ S(i+1)
Here S(i) is the current state and S(i)+ is the next state of the ith bit of an array. The hardware implementations of the rules are shown in Figure 8(a) and Figure 8(b) respectively. These rules are used to implement the pseudo random generators required in the different modules of the GA architecture.
The seed used in a PRNG directly influences the generated sequence of numbers. In the present
proposal, the PRNG seed can be given as an input to the hardware.
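A software model of such a hybrid Rule 90/150 cellular automaton PRNG is sketched below. The null boundary (zero neighbours at the ends) and the alternation of rules across cells are illustrative assumptions, since the chapter does not specify them:

```python
def ca_step(state, width, rules):
    """One update of a one-dimensional cellular automaton PRNG.

    rules[i] selects Rule 90 or Rule 150 for cell i, following Wolfram (1984):
    Rule 90:  s_i' = s_(i-1) xor s_(i+1)
    Rule 150: s_i' = s_(i-1) xor s_i xor s_(i+1)
    Boundary cells use 0 neighbours (a common null-boundary choice).
    """
    bits = [(state >> i) & 1 for i in range(width)]
    nxt = 0
    for i in range(width):
        left = bits[i - 1] if i > 0 else 0
        right = bits[i + 1] if i < width - 1 else 0
        new = left ^ right              # Rule 90 part
        if rules[i] == 150:
            new ^= bits[i]              # Rule 150 adds the cell itself
        nxt |= new << i
    return nxt

# A hybrid 90/150 CA, 8 cells wide, seeded with a nonzero value:
rules = [90 if i % 2 == 0 else 150 for i in range(8)]
x = 0b00010000
seq = []
for _ in range(5):
    x = ca_step(x, 8, rules)
    seq.append(x)
print(seq[0])  # 40, i.e. the single set bit has spread to its neighbours
```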
• Load Constraint Module (LCM): The LCM is the initial state of the FSM-based GA hardware; it initiates the operation and initializes all the states along with the memory modules.
• Initial Population Generation Module (IPGM): The IPGM is implemented as the second state of the FSM. It generates an initial set of arbitrary solutions with the help of the 24-bit PRNG and writes them into the parent memory. Since the memories are implemented using dual port block RAMs, at every clock two randomly generated chromosomes are written into the parent memory by the IPGM at two consecutive addresses. Thus it requires half as many clock cycles as there are candidates in the parent memory.

Figure 8. Hardware implementations of Rule 90 (a) and Rule 150 (b) for generating random numbers using LCA
• Fitness Evaluation Module (FEM): The FEM is effectively the hardware map of the fitness function of the problem at hand. It is the third state of the FSM, and takes two clock cycles per candidate. In the first clock, the FEM issues the address of the Solution Memory (SMem) location whose fitness is to be evaluated. At the second clock, the content of that location is read, the fitness value is calculated and the result is written into the data-in register of the Fitness Memory (FMem); the FEM also issues the address of the FMem location at which this value is to be written. After the second clock cycle, the value is written into that location. The FEM issues the same address to both the SMem and the FMem, so that the fitness value of the solution present in a location of the SMem gets written into the corresponding location of the FMem. This process continues for all candidates; the FEM therefore takes twice as many clock cycles as the number of candidates.
• Parent Selection Module (PSM): Some of the chromosomes from the parent memory are selected for the genetic operations; these selected chromosomes constitute the mating pool, which the PSM builds using the tournament selection process. The PSM contains a PRNG for randomly selecting two chromosomes from the first half of the SMem (the PSMem), which holds the parent generation. The PRNG generates two random numbers in the range 0 to 15, which act as addresses into the first half of the SMem. They are fed into the address inputs of both the SMem and the FMem to fetch a pair of parent chromosomes and their corresponding fitness values. In the second clock cycle, the PSM compares the fitness values and selects the chromosome with the better fitness. It then writes that chromosome into the next location of the second half of the SMem, i.e., the OSMem, which is the offspring memory. The PSM state thus takes three clock cycles to add each candidate solution to the mating pool, and repeats until the OSMem is filled up.
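The PSM's tournament step can be modelled in software as follows; treating lower fitness values as better (i.e., as errors) is an assumption made here for illustration:

```python
import random

def build_mating_pool(solutions, fitness, rng):
    """Model of the PSM: binary tournament over the parent half of SMem.

    For each mating-pool slot, two parents are drawn at random (addresses
    0..n-1, mirroring the chapter's 16-member parent memory) and the one
    with the better fitness is written into the offspring half. Lower
    fitness is assumed to mean better (a smaller error) in this sketch.
    """
    pool = []
    n = len(solutions)
    for _ in range(n):                     # fill the OSMem completely
        a, b = rng.randrange(n), rng.randrange(n)
        winner = a if fitness[a] <= fitness[b] else b
        pool.append(solutions[winner])
    return pool

rng = random.Random(1)                     # fixed seed, like the SEED input
parents = [10, 20, 30, 40]
errors = [4.0, 1.0, 3.0, 2.0]              # parent 20 has the lowest error
pool = build_mating_pool(parents, errors, rng)
```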
• Prospective Solution Selection Module (PSSM): This module maps into hardware the principle of "survival of the fittest". The PSSM sorts the chromosomes in decreasing order of fitness and updates the parent generation with the fitter set of 16 chromosomes for the next iteration. The evaluation module is responsible for the fitness evaluation of the offspring generation chromosomes. After the genetic operations, the resultant offspring chromosomes and the parent chromosomes are re-evaluated and compared to find the best solutions. The fittest set of chromosomes replaces the parent generation, forming an updated parent generation of solutions closer to minimizing the objective function. The finite state machine repeats the operations from the Parent Selection Module onwards if the best fitness value obtained does not satisfy the maximum allowable error of the objective function. The PSSM employs the bubble sort algorithm, which requires three clock cycles per comparison. For a series of m members, bubble sorting requires m*(m-1)/2 comparisons, so the number of clock cycles required is 3m*(m-1)/2 for m candidates, plus one more clock cycle for updating the output.
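A software model of the PSSM's sort-and-truncate step, which also counts the m*(m-1)/2 comparisons quoted above (fitness is treated as higher-is-better here, matching the descending sort):

```python
def pssm_update(parents, offspring, fitness_of):
    """Model of the PSSM's survival-of-the-fittest step.

    All parent and offspring candidates are bubble-sorted on fitness in
    descending order, and the best half replaces the parent generation.
    Sorting m members costs m*(m-1)/2 comparisons (3 clock cycles each
    in the hardware).
    """
    pool = parents + offspring
    comparisons = 0
    m = len(pool)
    for i in range(m - 1):                     # classic bubble sort
        for j in range(m - 1 - i):
            comparisons += 1
            if fitness_of(pool[j]) < fitness_of(pool[j + 1]):
                pool[j], pool[j + 1] = pool[j + 1], pool[j]
    assert comparisons == m * (m - 1) // 2     # matches the chapter's count
    return pool[:len(parents)], comparisons

elite, cmp_count = pssm_update([3, 1], [4, 2], fitness_of=lambda x: x)
print(elite)      # [4, 3]  (the fitter half, best first)
print(cmp_count)  # 6       (m = 4 -> 4*3/2 comparisons)
```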
• Crossover Module (CM): The crossover module acts as the next state of the FSM-based hardware implementation. This module maps the genetic crossover process into hardware, where genetic traits from both parent chromosomes are incorporated into the offspring chromosome. In the first clock cycle, the PRNG in the CM generates a random number between 1 and 23 to select the random crossover point in the 24-bit mating pool candidate, and a pair of chromosomes is fetched from the mating pool in the OSMem. In the second clock cycle, the portion to the right of the selected bit is swapped between the two chromosomes and the new chromosomes are written back into the OSMem. Two new chromosomes are thus obtained every two clock cycles, so in the present implementation the CM takes as many clock cycles as the number of candidates.
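The CM's swap of the bits to the right of the crossover point can be modelled with two masks; the sketch below is a software model of the operation, not the Verilog:

```python
def single_point_crossover(a, b, point, width=24):
    """Single point crossover on two width-bit chromosomes.

    Bits below `point` (the low-order side) are swapped between the two
    parents, as in the CM; `point` is assumed to lie in 1..width-1.
    """
    low_mask = (1 << point) - 1                 # bits below the crossover point
    high_mask = ((1 << width) - 1) ^ low_mask   # the remaining high bits
    child1 = (a & high_mask) | (b & low_mask)
    child2 = (b & high_mask) | (a & low_mask)
    return child1, child2

c1, c2 = single_point_crossover(0xFFF000, 0x000FFF, point=12)
print(hex(c1), hex(c2))  # 0xffffff 0x0
```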
• Mutation Module (MM): This module corresponds to the genetic mutation process, in which a characteristic of an offspring changes abruptly from its parent. The MM operates as the seventh state of the FSM. In the first clock cycle of this state, a candidate solution from the offspring generation is fetched and a random number generator produces a value between 1 and 24 to select a random bit for mutation. The selected bit of the candidate is logically inverted in the next clock cycle. Thus, in the current hardware implementation, the MM takes twice as many clock cycles as the number of candidates.
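The MM's bit inversion is a single XOR; as a sketch:

```python
def mutate(chrom, bit, width=24):
    """Flip one bit of a width-bit chromosome, as the MM does.

    `bit` indexes from 0 (the LSB); the chapter's PRNG picks it at random.
    """
    assert 0 <= bit < width
    return chrom ^ (1 << bit)

print(mutate(0b000001, bit=0))  # 0  (the set LSB is inverted)
print(mutate(0, bit=5))         # 32 (bit 5 is set)
```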
Evidently, for m candidates, the numbers of clock cycles required by the LCM, IPGM, FEM, PSSM, PSM, CM and MM are respectively 1, m/2, 2m, 3m*(m-1)/2+1, 3m, m and 2m. From the flowchart of Figure 5 it can be seen that in the initial iteration the modules operate in the sequence LCM, IPGM, FEM, PSM, CM, MM, FEM, PSSM; in all subsequent iterations the modules operate in the order PSM, CM, MM, FEM, PSSM, starting again from the PSM, and so on. So, if we denote the number of clocks required for the initial cycle as nclki and the number of clocks required for the subsequent cycles as nclks, they are given as
nclki = (1/2)(3m^2 + 18m + 4) (2)

nclks = (1/2)(3m^2 + 13m + 2) (3)
The expressions above give the clock cycles consumed for any function or problem implementation; the execution time, however, changes with the operating frequency of the implemented prototype. This predictability of the cycle count follows from the FSM-based structure of the proposed architecture.
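The per-module cycle counts can be checked against Eqs. (2) and (3) in a few lines:

```python
def cycles_initial(m):
    """Clock cycles for the first GA iteration; should equal Eq. (2)."""
    # LCM + IPGM + FEM + PSM + CM + MM + FEM + PSSM
    return 1 + m // 2 + 2 * m + 3 * m + m + 2 * m + 2 * m + (3 * m * (m - 1)) // 2 + 1

def cycles_subsequent(m):
    """Clock cycles for every later iteration; should equal Eq. (3)."""
    # PSM + CM + MM + FEM + PSSM
    return 3 * m + m + 2 * m + 2 * m + (3 * m * (m - 1)) // 2 + 1

m = 16  # e.g. 16 candidates
print(cycles_initial(m), (3 * m * m + 18 * m + 4) // 2)     # 530 530
print(cycles_subsequent(m), (3 * m * m + 13 * m + 2) // 2)  # 489 489
```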
The benchmark problems are detailed in the following section.
BENCHMARK PROBLEMS
The effectiveness of different evolutionary algorithms in function optimization is determined using a large test set of standard problems known as benchmark problems. The "no free lunch" theorem (Wolpert & Macready, 1997) proves that the average performance of any two search algorithms is the same when compared over all possible functions; no algorithm can be regarded as better than another at solving all possible functions. Thus a particular algorithm is usually suited to a set of problems sharing some common characteristics. To form an evaluation test set for an algorithm, the problem set for which the algorithm is suitable needs to be characterized. Benchmark functions can be classified
in terms of characteristics like modality, separability, dimensionality, scalability, etc. These features of benchmark functions are defined below:
• Dimensionality: The difficulty of a problem generally grows with its dimensionality: the size of the search space increases exponentially with the number of parameters in the problem. To keep the order of difficulty the same, all the problems in the present chapter are chosen to have dimensionality D=4.
• Modality: The number of optima (minima or maxima) in the search space defines the modality of a function. A function is multimodal if it has two or more local minima or maxima. For a multimodal function, the algorithm may get stuck in one of the local minima (or maxima) during the search and thereby fail to find the global minimum (maximum). The search process thus slows down, and it is difficult to find true optimal solutions for multimodal functions.
• Separability: Separability measures how difficult a benchmark function is to solve with evolutionary algorithms, and relates to the interrelation among the function variables. The variables of a separable function are independent of each other, while they are interdependent in non-separable functions; non-separable functions are therefore more difficult to optimize. A function with n variables is said to be separable if it can be expressed as a sum of n functions, each with a unique variable (Hadley, 1964; Ortiz-Boyer, Hervás-Martínez, & García-Pedrajas, 2005). Thus, a function f of n variables, f(x1, x2, x3, ..., xn), is said to be separable if it can be expressed in terms of n functions f1, f2, f3, ..., fn such that

f(x1, x2, x3, ..., xn) = ∑_{i=1}^{n} fi(xi)
• Scalability: A function is said to be scalable if it can be expressed in n-dimensional form, where n is any integer, that is, if its dimensionality can be changed; otherwise it is non-scalable. Consider the following two functions f and g:

f(x1, x2, x3, ..., xn) = ∑_{i=1}^{n} fi(xi)

g(x1, x2) = x1 + x2

where n may be 1, 2, 3, ... or any integer. Evidently, function g has a fixed dimensionality of 2, whereas function f can have a dimensionality of 1, 2, 3, ..., n depending on the value of n. Thus f can be scaled to different dimensional forms while g has a fixed dimensionality, so f is called a scalable function and g a non-scalable function.
The benchmark problems used in this chapter to evaluate the performance of the proposed EA-based architecture are detailed below. Here, the dimension is given by D; the domain is denoted by Lb ≤ xi ≤ Ub, where Lb and Ub are the lower and upper bounds of the domain; X* denotes the value of the variables at the global minimum; and F(X*) = F(x1, ..., xn) is the optimal solution.
• Sphere Function:

F1(x) = ∑_{i=1}^{D} xi^2

where -5.12 ≤ xi ≤ 5.12; global minimum at X* = (0, ..., 0); F(X*) = 0.
• Schwefel's Double Sum Function:

F2(x) = ∑_{i=1}^{D} ( ∑_{j=1}^{i} xj )^2

where -65 ≤ xi ≤ 65; global minimum at X* = (0, ..., 0); F(X*) = 0.
• De Jong's Function 4 (No Noise):

F3(x) = ∑_{i=1}^{D} i·xi^4

where -1.28 ≤ xi ≤ 1.28; global minimum at X* = (0, ..., 0); F(X*) = 0.
• Powell Sum Function:

F4(x) = ∑_{i=1}^{D} |xi|^(i+1)

where -1 ≤ xi ≤ 1; global minimum at X* = (0, ..., 0); F(X*) = 0.
• Rastrigin Function:

F5(x) = 10D + ∑_{i=1}^{D} (xi^2 - 10 cos(2πxi))

where -5.12 ≤ xi ≤ 5.12; global minimum at X* = (0, ..., 0); F(X*) = 0.
• Griewangk's Function:

F6(x) = ∑_{i=1}^{D} (xi^2 / 4000) - ∏_{i=1}^{D} cos(xi / √i) + 1
where, -600≤ xi ≤600; global minima at X*= (0,.., 0); F(X*) = 0.
• Ackley 1 Function (Ackley Path Function):

F7(x) = -20 exp(-0.2 √(D^(-1) ∑_{i=1}^{D} xi^2)) - exp(D^(-1) ∑_{i=1}^{D} cos(2πxi)) + 20 + e
where, -32≤ xi ≤ 32; global minima at X*= (0,.., 0); F(X*) = 0.
• Cosine Mixture Function:

F8(x) = -0.1 ∑_{i=1}^{D} cos(5πxi) - ∑_{i=1}^{D} xi^2

where -1 ≤ xi ≤ 1; global minimum at X* = (0, ..., 0); F(X*) = -0.2 or -0.4 for D = 2 and 4 respectively.
• Csendes Function:

F9(x) = ∑_{i=1}^{D} xi^6 (2 + sin(1/xi))
where, -1≤ xi ≤1; global minima at X*= (0,.., 0); F(X*) = 0.
• Solomon Function:

F10(x) = 1 - cos(2π √(∑_{i=1}^{D} xi^2)) + 0.1 √(∑_{i=1}^{D} xi^2)
where, -100≤ xi ≤100; global minima at X*= (0,.., 0); F(X*) = 0.
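For reference, a few of the benchmark definitions above translate directly into software; the sketches below follow the standard forms of F1, F5 and F7:

```python
import math

def sphere(x):
    """F1: sum of squares; global minimum 0 at the origin."""
    return sum(v * v for v in x)

def rastrigin(x):
    """F5: 10*D + sum(x_i^2 - 10*cos(2*pi*x_i)); minimum 0 at the origin."""
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)

def ackley(x):
    """F7 (Ackley path function); minimum 0 at the origin."""
    d = len(x)
    s1 = sum(v * v for v in x) / d
    s2 = sum(math.cos(2 * math.pi * v) for v in x) / d
    return -20 * math.exp(-0.2 * math.sqrt(s1)) - math.exp(s2) + 20 + math.e

origin = [0.0] * 4  # the 4-dimensional versions used in this chapter
print(sphere(origin), rastrigin(origin), round(ackley(origin), 9))
```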
The characteristics of the above benchmark functions are given in Table 3.
EXPERIMENTAL STUDY
This section describes the FPGA hardware implementation results, functional verification and performance analysis of the proposed GA hardware.
Performance Metrics: The metrics used in this chapter to evaluate the performance of the proposed
architecture are as follows:
Table 3. Characteristics of the benchmark functions F1-F10

Function | Multimodal | Separable | Scalable
F1 | No | Yes | Yes
F2 | No | No | Yes
F3 | No | No | Yes
F4 | No | Yes | Yes
F5 | Yes | Yes | Yes
F6 | Yes | No | Yes
F7 | Yes | No | Yes
F8 | Yes | Yes | Yes
F9 | Yes | Yes | Yes
F10 | Yes | No | Yes
•	Number of Function Calls (NFCs): The convergence speed of the architecture is measured in terms of the number of function calls (NFCs), which has been used as a metric by Suganthan et al. (2005), Andre, Siarry, and Dognon (2001), and Hrstka and Kučerová (2004). It gives the number of candidate solutions evaluated by the architecture before finding the best solution. The percentage of the solution space evaluated before finding the best solution gives a measure of the speedup achieved over an exhaustive search.
•	Success Rate: The percentage of runs in which the architecture finds a solution for a problem.
•	Success Performance: The mean number of NFCs over successful runs, multiplied by the total number of runs and divided by the number of successful runs.
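The last two metrics can be sketched directly from their definitions (the helper names and the example run data below are illustrative, not the chapter's results):

```python
def success_rate(successful_runs, total_runs):
    """Percentage of runs in which a solution was found."""
    return 100.0 * successful_runs / total_runs

def success_performance(nfcs_of_successful_runs, total_runs):
    """Mean NFCs over successful runs, scaled by total runs / successful runs."""
    successful = len(nfcs_of_successful_runs)
    mean_nfcs = sum(nfcs_of_successful_runs) / successful
    return mean_nfcs * total_runs / successful
```

Note that when every run succeeds (a 100% success rate, as in the experiments below), the success performance reduces to the mean NFCs.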
Parameter Settings: The different GA parameter settings for the conducted experiments are detailed
below:
•	A 4-dimensional version of each function was considered.
•	The chromosome length for all the experiments was fixed at 24 bits, and the predicted values were considered to be signed. Hence, for four-variable functions, each variable ranges between -32 and 31.
•	The population size can be varied among 8, 16, 32 and 64. Good results were obtained at the minimum value of 8, so throughout all the experiments the population size was kept fixed at 8; that is, each population contains 8 candidate solutions.
•	The maximum NFCs was fixed at 10^4.
Since the chromosome length was considered to be 24, the size of the solution space becomes 2^24 = 16,777,216.
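These settings are mutually consistent: a 24-bit chromosome split across four signed variables gives 6 bits per variable, hence the range -32 to 31 and a solution space of 2^24 points. A quick check (the constant names are illustrative):

```python
CHROMOSOME_BITS = 24
NUM_VARS = 4
BITS_PER_VAR = CHROMOSOME_BITS // NUM_VARS    # 6 bits per variable

# Signed (two's-complement) range of a 6-bit field: -2^5 .. 2^5 - 1
var_min = -(1 << (BITS_PER_VAR - 1))          # -32
var_max = (1 << (BITS_PER_VAR - 1)) - 1       # 31

solution_space = 1 << CHROMOSOME_BITS         # 2^24 = 16,777,216
```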
The design verification and functional analysis are presented in a few steps. First, the architecture was
coded and simulated for functional verification. Then, it was synthesized and implemented on the target
FPGA device (XC4VLX25) to analyze the performance of the GA architecture. During both simulation
and FPGA implementation, the architecture was used to optimize a benchmark function (detailed below) to test its functional correctness. Finally, an FIL testing environment was set up to test the architecture with standard benchmark functions and study its performance. These experimental steps are detailed below in this section.
SIMULATION AND FUNCTIONAL VERIFICATION
The architecture was coded in the Verilog hardware description language and simulated using the Xilinx ISE 10.1i simulator. The parent and offspring memories contain the solutions that are modified during each GA step (selection, crossover, mutation, etc.). To check functional correctness during simulation, the memory contents generated after each step were written to text files, so the evolutionary change in the solutions across the generations of the GA can be observed by inspecting the files. The functioning of the designed architecture was thereby verified. During the simulations, the following fitness function was optimized to test the performance. The function was scaled to its 4-dimensional version and optimized using the GA architecture. The function G1 below is a minimization function:
G1(x) = −30 + ∑_{i=1}^{D} i·x_i
where the minimum value is min(G1) = G1(0, …, 0) = 0.
The 4-D version of G1 is
G1(x) = −30 + x1 + 2x2 + 3x3 + 4x4

SYNTHESIS AND FPGA IMPLEMENTATION
The GA based architecture was synthesized by Xilinx XST using Xilinx Virtex IV (XC4VLX25) as the
target device on the ML401 platform. The synthesis result and device utilization summary are shown in Table
4. The synthesized design was downloaded into the FPGA device and its performance was verified using
the FPGA in the Loop (FIL) environment. Figure 9 shows the FIL based experimental setup. Here the
FIL block denotes the Simulink block that represents the Xilinx Virtex IV (XC4VLX25) device. The
‘Seed’ and ‘Act’ respectively denote the SEED and ACTIVATE inputs to the architecture; the best solution of a generation, consisting of a set of four values for a 4-dimensional problem, is obtained through the ‘x1’, ‘x2’, ‘x3’, ‘x4’ outputs. A scope is used to plot the output values against time, and workspace variables are used to record the outputs.
The GA based architecture was implemented and used to optimize the above fitness function G1. The
function G1 was implemented into the FEM. The results for six independent runs are shown in Table 5.
Here, the best fitness values achieved for different input seeds are presented. The “Convergence Generation Number” column lists the number of generations required to achieve convergence for a given seed.
Figure 10 illustrates the convergence graph of G1.
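For reference, the 4-D test function G1 used in these runs can be evaluated in software as follows (a minimal sketch; the hardware evaluates it inside the FEM):

```python
def g1(x):
    """4-D version of G1: G1(x) = -30 + x1 + 2*x2 + 3*x3 + 4*x4."""
    return -30 + sum((i + 1) * xi for i, xi in enumerate(x))
```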
Table 4. Device utilization report for implementation of the proposed GA hardware in Xilinx Virtex IV FPGA (XC4VLX25)

Component | Available | Used | Utilization %
Slices | 10752 | 531 | 4%
Slice Flip Flops | 21504 | 445 | 2%
4-input LUTs | 21504 | 921 | 4%
Bonded IOBs | 448 | 90 | 20%
Dual-port Block Memory (RAMB16s) | 72 | 2 | 2%
GCLKs | 32 | 1 | 3%
DSP48s | 48 | 1 | 2%
Figure 9. FIL simulation based experimental setup for the GA architecture
Figure 10. Convergence plot for the test function G1
Table 5. The best fitness values achieved for different input seeds for function G1
Run Number | Seed | Fitness | Convergence Generation Number | NFCs | % of Solution Space Evaluated (16,777,216)
1 | 2 | 0 | 3 | 24 | 0.00014
2 | 5 | 0 | 4 | 32 | 0.00019
3 | 6 | 0 | 9 | 72 | 0.00043
4 | 15 | 0 | 12 | 96 | 0.00057
5 | 17 | 0 | 8 | 64 | 0.00038
6 | 29 | 8 | 5 | 40 | 0.00024
PERFORMANCE ANALYSIS OF THE PROPOSED GA STRUCTURE:
OPTIMIZATION OF STANDARD BENCHMARK FUNCTIONS
An FIL environment, using MATLAB Simulink and a slightly modified version of the GA architecture, was set up to test the hardware performance on the benchmark functions. Here, the objective function is written directly as an input to the setup. The FEM of the hardware was modified: it outputs the predicted solution, which is fed as an input to a Simulink block into which the fitness function is written. The fitness value of the solution is calculated inside this block. The output of the block is connected to the FEM, which stores the fitness value into the FVMem. Thus, the need to recode and re-synthesize the basic GA hardware for each function is eliminated. Figure 11 illustrates the Simulink model of the
experimental setup. The FIL block has inputs named ‘seed’, ‘act’, and outputs namely ‘x1’, ‘x2’, ‘x3’,‘x4’
Figure 11. Experimental setup for evaluation of the GA architecture using benchmark function in FIL
environment
94

A System on Chip Development of Customizable GA Architecture
and ‘error’. The MATLAB function block is used to write the fitness function. The pred_x1, pred_x2,
pred_x3, and pred_x4, outputs from the FIL block give the predicted solution into the MATLAB function
block and the calculated fitness value is fed into the architecture through the ‘value_in’ input.
The test results for optimization of the benchmark functions using the proposed GA hardware are
presented in Table 6 and Table 7. The maximum NFCs is fixed as mentioned above, and 40 independent runs
were performed for each benchmark function. The best, median, worst, mean and standard deviation
of the fitness values generated for these 40 runs are presented in Table 6 for the benchmark functions.
In Table 7, the best, median, worst, mean and standard deviation of the number of function calls
(NFCs) required in finding the best fitness are presented. The table also presents the success rate, success
Table 6. The best, median, worst, mean and standard deviation of the function error values generated over 40 runs for test functions F1-F10

Function | Best | Median | Worst | Mean | Standard Deviation
F1 | 0 | 0.0625 | 0.0625 | 0.0404 | 0.0308
F2 | 0 | 0.0625 | 0.125 | 0.0592 | 0.057
F3 | 0 | 0 | 0.1875 | 0.006696 | 0.029715
F4 | 0 | 0 | 0.0625 | 0.003125 | 0.013975
F5 | 0 | 0 | 1 | 0.125 | 0.3536
F6 | 0 | 0.0313 | 0.0625 | 0.0198 | 0.0209
F7 | 0 | 0 | 1.09375 | 0.328125 | 0.528331
F8 | 0.0313 | 0.0625 | 0.21875 | 0.051786 | 0.032987
F9 | 0 | 0 | 0 | 0 | 0
F10 | 0.0938 | 0.09375 | 0.1875 | 0.101351 | 0.025943
Table 7. The best, median, worst, mean and standard deviation of the number of function calls (NFCs) required in finding the best fitness for the test functions F1-F10 over 40 test runs

Function | Best | Median | Worst | Mean | Standard Deviation | Success Rate | Success Performance | Average % of Total Solution Space Evaluated
F1 | 64 | 232 | 384 | 208.4706 | 113.7179 | 100% | 208.4706 | 0.0012
F2 | 64 | 280 | 3096 | 551 | 912.8842 | 100% | 551 | 0.0033
F3 | 40 | 192 | 440 | 188.7619 | 99.52958 | 100% | 188.7619 | 0.0007
F4 | 64 | 200 | 376 | 175.6 | 97.75501 | 100% | 175.6 | 0.0010
F5 | 64 | 72 | 120 | 75 | 18.6088 | 100% | 75 | 0.00044
F6 | 48 | 136 | 832 | 162.9333 | 143.9461 | 100% | 162.9333 | 0.00097
F7 | 64 | 80 | 224 | 121.6 | 69.10732 | 100% | 121.6 | 0.00072
F8 | 72 | 176 | 384 | 173.9429 | 73.56068 | 100% | 173.9429 | 0.00103
F9 | 48 | 100 | 272 | 113.8462 | 82.2502 | 100% | 113.8462 | 0.00068
F10 | 80 | 544 | 3680 | 867.8919 | 833.8135 | 100% | 867.8919 | 0.00517
performance, and average percentage of the total solution space evaluated before convergence, as these give a measure of the speedup over exhaustive search processes. It can be seen that for every function, the GA hardware needs to evaluate only a very small percentage of the total solution space, which indicates a very high speedup over an exhaustive search.
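The "average % of total solution space evaluated" column follows directly from the mean NFCs and the 2^24-point solution space; as a sketch, reproducing the F1 row:

```python
SOLUTION_SPACE = 1 << 24    # 16,777,216 candidate solutions (24-bit chromosome)

def percent_space_evaluated(mean_nfcs):
    """Average percentage of the solution space evaluated before convergence."""
    return 100.0 * mean_nfcs / SOLUTION_SPACE

# Mean NFCs for F1 in Table 7 is 208.4706, giving about 0.0012%.
f1_percent = percent_space_evaluated(208.4706)
```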
Figure 12 shows the convergence graphs for functions F1 to F5, and Figure 13 shows the same for functions F6 to F10. The convergence plots for all the benchmark test functions indicate that the proposed GA hardware converges within only 8 to 10 generations.
Figure 12. Convergence plot for the test functions F1, F2, F3, F4 and F5
Figure 13. Convergence plot for the test functions F6, F7, F8, F9 and F10
COMPARISON WITH SOFTWARE IMPLEMENTATION AND
PREVIOUS GENERAL PURPOSE HARDWARE GAs
In this section, the speeds of the implemented GA hardware and the GA software are compared. To make the comparison fair, the GA parameters such as population size, selection and crossover were set identically in the MATLAB simulation and the hardware implementation. The software was run to solve the same benchmark functions. The time required for the GA architecture to process each GA population can be calculated as the product of the number of clock cycles required and the clock period (Fernando et al., 2010). From Equations 2 and 3, when the population size is 8, the numbers of clock cycles required for the initial iteration (nclki) and for each subsequent iteration (nclks) are 170 and 149, respectively. If the number of iterations required for convergence is known, the hardware execution time for convergence of the test functions can be calculated. The maximum clock frequency of the prototype implementation, as obtained from the synthesis report, is 36.757 MHz, giving a clock period of 27.205 ns. The speedup of the hardware GA over the software GA is shown in Table 8. Figure 14 compares the average hardware and software execution times to convergence for the functions F1 to F10. It can be observed that the hardware execution time is negligibly small in comparison to the software execution time.
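Under the stated cycle counts and clock period, the hardware execution time for a run can be estimated as a back-of-the-envelope check (a sketch, not the authors' exact procedure; the constant names are illustrative):

```python
NCLK_INITIAL = 170                  # clock cycles for the initial iteration (population = 8)
NCLK_SUBSEQ = 149                   # clock cycles for each subsequent iteration
CLOCK_PERIOD_NS = 1e9 / 36.757e6    # ~27.205 ns at 36.757 MHz

def hw_time_ms(generations):
    """Hardware execution time (ms) for a given number of GA generations."""
    cycles = NCLK_INITIAL + (generations - 1) * NCLK_SUBSEQ
    return cycles * CLOCK_PERIOD_NS * 1e-6    # ns -> ms

# Example: F5 converges after a mean of 75 NFCs, i.e. 75/8 generations,
# giving roughly the 0.0386 ms hardware time reported in Table 8, and a
# speedup of roughly 1500x over the 58.4622 ms software time.
t_f5 = hw_time_ms(75 / 8)
speedup_f5 = 58.4622 / t_f5
```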
In Table 9, the reported speedups of different hardware GAs and that of the proposed hardware are compared. Here, the best speedups reported in the literature are tabulated. The average speedup of the proposed hardware in solving the functions F1 to F10 is used for the comparison with the previous implementations. Figure 15 shows the comparison between the respective speedups. The proposed GA architecture exhibits a very high speedup compared to those of the previous implementations.
CONCLUSION
This chapter presents a generalized prototype of evolutionary algorithm based hardware for solving real
parameter optimization problems. The hardware is tested for various types of benchmark problems. To
Table 8. Speed comparison of the proposed GA hardware with software GA implementation

Function | Avg. Software GA Execution Time (msec) | Avg. Hardware GA Execution Time (msec) | Hardware GA Speedup
F1 | 55.347 | 0.106182406 | 521.2445461
F2 | 58.1338 | 0.2797078 | 207.8376077
F3 | 54.3252 | 0.096197979 | 564.7228853
F4 | 57.6582 | 0.08953016 | 644.0086782
F5 | 58.4622 | 0.0385662 | 1515.892154
F6 | 55.1854 | 0.08311321 | 663.9786882
F7 | 54.0796 | 0.06217376 | 869.8138893
F8 | 54.3798 | 0.088690673 | 613.1400076
F9 | 56.4766 | 0.058245685 | 969.6271935
F10 | 56.0422 | 0.440245237 | 127.2976863
Figure 14. Comparison between execution time of hardware and software implementation of the GA for
convergence of the functions F1 to F10
Figure 15. Comparison of the reported speedups of previous hardware GA implementations and the
proposed GA hardware
Table 9. Speed comparison of the proposed GA hardware with previous hardware GA implementations

Work | Speedup
Scott (1994) | 18.8x
Graham & Nelson (1996) | 10.6x
Tommiska & Vuori (1996) | 212x
Shackleford | 160x
Koonar (2003) | 100x
Tang & Leslie (2004) | 10.68x
Fernando et al. (2010) | 5.16x
Proposed | 669.75x
best of the authors’ knowledge, no evolutionary algorithm based hardware proposed to date has been successfully tested on so many standard benchmark problems. The hardware also does not need to be re-synthesized for each new problem and thus provides a flexible platform for optimizing different kinds of problems. A prototype of the hardware has been developed using Verilog HDL and implemented on a single Xilinx Virtex IV (XC4VLX25) FPGA chip. A detailed performance comparison has been given with respect to a software implementation of this work and to other works reported in the literature. The result is a very fast, compact and flexible entity that can be used for optimization in real-time applications, such as spectrum sensing in a cognitive radio environment.
ACKNOWLEDGMENT
The work is undertaken as part of Media Lab Asia project entitled “Mobile Broadband Service Support
over Cognitive Radio Networks”.
REFERENCES
Bäck, T. (1996). Evolutionary algorithms in theory and practice. Oxford University Press.
Holland, J. H. (1975). Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence. University of Michigan Press.
Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning. Addison-Wesley.
Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge University Press.
Graham, P., & Nelson, B. (1996, April). Genetic algorithms in software and in hardware-a performance
analysis of workstation and custom computing machine implementations. In FPGAs for Custom Computing
Machines, 1996. Proceedings. IEEE Symposium on (pp. 216-225). IEEE. doi:10.1109/FPGA.1996.564847
Koonar, G. K. (2003). A reconfigurable hardware implementation of genetic algorithms for VLSI CAD design (Thesis). University of Guelph.
Areibi, S., Moussa, M., & Koonar, G. (2005). A genetic algorithm hardware accelerator for VLSI circuit
partitioning. International Journal of Computers and Their Applications, 12(3), 163.
Scott, S. D., Samal, A., & Seth, S. (1995, February). HGA: A hardware-based genetic algorithm. In
Proceedings of the 1995 ACM third international symposium on Field-programmable gate arrays (pp.
53-59). ACM.
Tommiska, M., & Vuori, J. (1996, August). Hardware implementation of GA. In Proceedings of the
Second Nordic Workshop on Genetic Algorithms and their Applications (2NWGA).
Shackleford, B., Okushi, E., Yasuda, M., Koizumi, H., Seo, K., Iwamoto, T., & Yasuura, H. (2001). High-performance hardware design and implementation of genetic algorithms. In Hardware implementation
of intelligent systems (pp. 53–87). Physica-Verlag HD. doi:10.1007/978-3-7908-1816-1_2
Aporntewan, C., & Chongstitvatana, P. (2001, May). A hardware implementation of the compact genetic
algorithm. In IEEE Congress on Evolutionary Computation (pp. 624-629). doi:10.1109/CEC.2001.934449
Unlt, G. P. (2004). Hardware implementation of genetic algorithms using FPGA. Academic Press.
Vavouras, M., Papadimitriou, K., & Papaefstathiou, I. (2009, July). High-speed FPGA-based implementations of a genetic algorithm. In Systems, Architectures, Modeling, and Simulation, 2009. SAMOS’09.
International Symposium on (pp. 9-16). IEEE doi:10.1109/ICSAMOS.2009.5289236
Chen, P. Y., Chen, R. D., Chang, Y. P., & Malki, H. A. (2008). Hardware implementation for a genetic
algorithm. Instrumentation and Measurement. IEEE Transactions on, 57(4), 699–705.
Fernando, P. R., Katkoori, S., Keymeulen, D., Zebulum, R., & Stoica, A. (2010). Customizable FPGA IP
core implementation of a general-purpose genetic algorithm engine. Evolutionary Computation. IEEE
Transactions on, 14(1), 133–149.
Kok, J., Gonzalez, L. F., & Kelson, N. (2013). FPGA implementation of an evolutionary algorithm for
autonomous unmanned aerial vehicle on-board path planning. Evolutionary Computation. IEEE Transactions on, 17(2), 272–281.
Nambiar, V. P., Balakrishnan, S., Khalil-Hani, M., & Marsono, M. N. (2013). HW/SW co-design of
reconfigurable hardware-based genetic algorithm in FPGAs applicable to a variety of problems. Computing, 95(9), 863–896. doi:10.1007/s00607-013-0305-5
Ashraf, R., & DeMara, R. F. (2013). Scalable FPGA refurbishment using netlist-driven evolutionary
algorithms. Computers. IEEE Transactions on, 62(8), 1526–1541.
Manikas, T. W., & Cain, J. T. (1996). Genetic algorithms vs. simulated annealing: A comparison of
approaches for solving the circuit partitioning problem. Academic Press.
Bui, T. N., & Moon, B. R. (1998). GRCA: A hybrid genetic algorithm for circuit ratio-cut partitioning.
Computer-Aided Design of Integrated Circuits and Systems. IEEE Transactions on, 17(3), 193–204.
Rieser, C. J. (2004). Biologically inspired cognitive radio engine model utilizing distributed genetic
algorithms for secure and robust wireless communications and networking. (Doctoral dissertation).
Virginia Polytechnic Institute and State University.
Rondeau, T. W., Le, B., Rieser, C. J., & Bostian, C. W. (2004, November). Cognitive radios with genetic
algorithms: Intelligent control of software defined radios. In Software defined radio forum technical
conference (pp. C3-C8).
Zhao, Z., Peng, Z., Zheng, S., & Shang, J. (2009). Cognitive radio spectrum allocation using evolutionary algorithms. Wireless Communications. IEEE Transactions on, 8(9), 4421–4425.
Deka, R., Chakraborty, S., & Roy, S. J. (2012). Optimization of spectrum sensing in cognitive radio using
genetic algorithm. Facta universitatis-series. Electronics and Energetics, 25(3), 235–243.
Dieterich, J. M., & Hartke, B. (2012). Empirical review of standard benchmark functions using evolutionary global optimization. arXiv preprint arXiv:1207.4318
Jamil, M., & Yang, X. S. (2013). A literature survey of benchmark functions for global optimisation
problems. International Journal of Mathematical Modelling and Numerical Optimisation, 4(2), 150–194.
doi:10.1504/IJMMNO.2013.055204
Suganthan, P. N., Hansen, N., Liang, J. J., Deb, K., Chen, Y. P., Auger, A., & Tiwari, S. (2005). Problem
definitions and evaluation criteria for the CEC 2005 special session on real-parameter optimization.
KanGAL Report, 2005005.
Holleman, J., Bridges, S., Otis, B. P., & Diorio, C. (2008). A 3 μW CMOS true random number generator
with adaptive floating-gate offset cancellation. Solid-State Circuits. IEEE Journal of, 43(5), 1324–1336.
Meysenburg, M. M., Foster, J., Saghi, G., Dickinson, J., Jacobsen, R. T., & Shreeve, J. N. M. (1997). The
effect of pseudo-random number generator quality on the performance of a simple genetic algorithm.
Academic Press.
Meysenburg, M. M., & Foster, J. A. (1999). Randomness and GA performance, revisited. Academic Press.
Cantú-Paz, E. (2002, July). On Random Numbers And The Performance Of Genetic Algorithms. GECCO.
Wolfram, S. (1984). Universality and complexity in cellular automata. Physica D. Nonlinear Phenomena,
10(1), 1–35. doi:10.1016/0167-2789(84)90245-8
Hortensius, P. D., McLeod, R. D., & Card, H. C. (1989). Parallel random number generation for VLSI
systems using cellular automata. Computers. IEEE Transactions on, 38(10), 1466–1473.
Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. Evolutionary
Computation. IEEE Transactions on, 1(1), 67–82.
Hadley, G. (1964). Nonlinear and dynamic programming. Academic Press.
Ortiz-Boyer, D., Hervás-Martínez, C., & García-Pedrajas, N. (2005). CIXL2: A Crossover Operator
for Evolutionary Algorithms Based on Population Features. Journal of Artificial Intelligence Research,
24, 1–48.
Andre, J., Siarry, P., & Dognon, T. (2001). An improvement of the standard genetic algorithm fighting
premature convergence in continuous optimization. Advances in Engineering Software, 32(1), 49–60.
doi:10.1016/S0965-9978(00)00070-3
Hrstka, O., & Kučerová, A. (2004). Improvements of real coded genetic algorithms based on differential operators preventing premature convergence. Advances in Engineering Software, 35(3), 237–246.
doi:10.1016/S0965-9978(03)00113-3
KEY TERMS AND DEFINITIONS
Benchmark Problems: A set of standard optimization problems consisting of various types of functions, used for the evaluation, characterization and performance measurement of optimization algorithms. The behavior of algorithms under different environmental conditions can be predicted using benchmark functions.
Embedded Applications: A software- or hardware-based computer system included as part of a larger device and dedicated to performing certain real-time functions under constraints.
Evolutionary Algorithm: A set of meta-heuristic, population-based optimization techniques that use nature-inspired processes such as selection, reproduction, recombination and mutation.
Hardware Description Language (HDL): A language used to describe the design and functioning of hardware in a software environment, enabling verification of design constraints and helping to implement the design in actual hardware.
Hardware-in-Loop: A platform for testing real-time systems, consisting of the device under test (DUT) in a loop with mathematical representations of the other related dynamic systems.
Objective Function: A real-valued function to be optimized under some constraints; it defines the relationship between the input and output of the system it represents.
System on Chip (SoC): A low-power computer or electronic system capable of various analog, digital and/or radio-frequency functions, fabricated on a single integrated circuit (IC).