A High-Performance, Pipelined, FPGA-Based Genetic Algorithm Machine (2001)
Barry Shackleford, Greg Snider, Richard Carter, Etsuko Okushi, Mitsuhiro Yasuda, Katsuhiko Seo, Hiroto Yasuura
A Paper Review by Griffin Lacey
OUTLINE
• INTRODUCTION
• ALGORITHM
• ARCHITECTURE
• PERFORMANCE
• RESULTS
• CONTRIBUTIONS
• CRITIQUE
INTRODUCTION
• What is a genetic algorithm (GA)?
▫ Search heuristic
▫ Mimics the process of natural evolution
▫ Starts with a random population of candidate solutions
▫ Terminates when the desired fitness level is achieved
WHAT ARE THE PROBLEMS WITH GA?
• One major drawback
▫ Slow execution speed when implemented on a general-purpose processor (GPP)
• How to overcome this?
▫ Parallel processing
• Implement as a hardware pipeline on an FPGA
▫ Parent selection
▫ Crossover
▫ Mutation
▫ Survival
• How to program the GA?
▫ Design a pipelined fitness function for the problem to be solved
ALGORITHM
ALGORITHM NOTATION
• One-dimensional population array
▫ Each entry contains cdata and cfitness
• FUNCTIONS
▫ Fitness(cdata)
▫ Crossover(cut_prob, p1data, p2data)
▫ Mutation(mutation_prob, cdata)
ALGORITHM EXPLANATION
1. Randomly generated population is assigned fitness values
2. Randomly select parents
• parent2 <- parent1
• parent1 <- random
3. Child created via crossover function
4. Child then exposed to mutation
5. Child evaluated by fitness function
6. If child is fitter than one of parents -> replacement
7. Eventually survival rate diminishes to zero
Pseudo-code for steady-state GA readily implementable in hardware
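The seven steps above can be sketched in software as follows. This is a minimal illustration, not the paper's pseudo-code: the bit-counting fitness function, the per-bit reading of cut_prob and mutation_prob, the rule of replacing the weaker parent, and all parameter defaults are assumptions made for the example.

```python
import random

def fitness(cdata):
    """Placeholder fitness (assumption): number of 1 bits in the chromosome."""
    return bin(cdata).count("1")

def crossover(cut_prob, p1data, p2data, n_bits, rng):
    """Copy bits from one parent, switching parents with probability cut_prob per bit."""
    child, take_p1 = 0, True
    for i in range(n_bits):
        if rng.random() < cut_prob:
            take_p1 = not take_p1
        src = p1data if take_p1 else p2data
        child |= src & (1 << i)
    return child

def mutation(mutation_prob, cdata, n_bits, rng):
    """Flip each bit independently with probability mutation_prob."""
    for i in range(n_bits):
        if rng.random() < mutation_prob:
            cdata ^= 1 << i
    return cdata

def steady_state_ga(n_bits=32, pop_size=64, cut_prob=0.1,
                    mutation_prob=0.01, max_children=5000, seed=0):
    rng = random.Random(seed)
    # Step 1: random population, each entry holds (cdata, cfitness).
    pop = []
    for _ in range(pop_size):
        cdata = rng.getrandbits(n_bits)
        pop.append((cdata, fitness(cdata)))
    p1 = rng.randrange(pop_size)
    for _ in range(max_children):
        # Step 2: parent2 <- parent1, parent1 <- random (one new read per child).
        p2 = p1
        p1 = rng.randrange(pop_size)
        # Steps 3-5: crossover, mutation, fitness evaluation.
        cdata = crossover(cut_prob, pop[p1][0], pop[p2][0], n_bits, rng)
        cdata = mutation(mutation_prob, cdata, n_bits, rng)
        cfitness = fitness(cdata)
        # Step 6 (assumed rule): the child replaces the weaker parent if it is fitter.
        weaker = p1 if pop[p1][1] <= pop[p2][1] else p2
        if cfitness > pop[weaker][1]:
            pop[weaker] = (cdata, cfitness)
    # Step 7 is implicit: fewer children survive as the population improves.
    return max(pop, key=lambda entry: entry[1])

best_cdata, best_fitness = steady_state_ga()
```

Because only one parent index is drawn per iteration, each child costs a single new population read, which is the property the hardware version relies on.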
ALGORITHM RATIONALE
1. Population storage
▫ Steady-state allows population array to be implemented as single memory
2. Parent selection
▫ By replacing old parent with new parent, only one clock cycle needed for parent pair
3. Crossover and mutation
▫ Performed every clock cycle
4. Survival-driven evolution
▫ Evolution promoted through survival
ALGORITHM VALIDITY
Have compromises been made that damage the functional integrity of the GA?
Royal Road Function
• Optimum solution achieved after ≈ 6,000 crossovers
• Speedup of 10x over the GA used in the Royal Road experiment
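For reference, a Royal Road fitness function can be sketched as below. The slide does not say which variant was used, so this assumes the common R1 form: a 64-bit chromosome split into 8 blocks of 8 bits, where each block that is entirely 1s contributes 8 to the score. The function name and parameters are hypothetical.

```python
def royal_road_r1(cdata, n_blocks=8, block_bits=8):
    """Score block_bits for every block of the chromosome that is all 1s."""
    block_mask = (1 << block_bits) - 1
    score = 0
    for b in range(n_blocks):
        block = (cdata >> (b * block_bits)) & block_mask
        if block == block_mask:
            score += block_bits
    return score

optimum = royal_road_r1((1 << 64) - 1)  # all 64 bits set, every block complete
```

A chromosome with 63 of its 64 bits set can still miss a whole block's reward, which is why the function is a useful check that the simplified GA still finds the optimum.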
ARCHITECTURE
• 6-stage pipeline
• Equal processing time for each stage
DATAPATH
• Significant portion of GA circuitry
• In each bit-slice, there are:
▫ 5 logic functions
▫ 7 flip-flops
• Cost is 8 LUTs per bit-slice
▫ Under assumption of 2-output LUT
• Total cost in LUTs is 8nd
• Parent Registers
▫ Hold signal prevents re-entry
• Crossover
▫ Crossover template controls cutpoint variation
• Mutation
▫ Controlled by AND, XOR
• Child Registers
▫ Connected to fitness function and population memory
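The crossover and mutation stages of the datapath can be modelled with word-wide bitwise operations, one bit position per bit-slice. This is a sketch under assumptions: the template is taken to select parent 1 where it is 1 and parent 2 where it is 0, and mutation is taken to AND two random words together (lowering the per-bit flip rate) before XOR-ing the result into the child. The function names are hypothetical.

```python
def single_cut_template(cutpoint):
    """Template with 1s below the cutpoint and 0s above it (assumed encoding)."""
    return (1 << cutpoint) - 1

def crossover_bits(p1data, p2data, template, n_bits):
    """Per bit: take parent 1 where the template is 1, parent 2 where it is 0."""
    word_mask = (1 << n_bits) - 1
    return ((p1data & template) | (p2data & ~template)) & word_mask

def mutate_bits(cdata, rand_a, rand_b):
    """Flip only the bits where both random words are 1 (AND, then XOR)."""
    return cdata ^ (rand_a & rand_b)

child = crossover_bits(0b1010, 0b0101, single_cut_template(2), 4)
child = mutate_bits(child, 0b0110, 0b0011)
```

Every output bit depends only on the same bit position of its inputs, which is what lets the logic be replicated as identical slices across the chromosome width.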
CROSSOVER
MUTATION
PERFORMANCE
• Time for each stage:
▫ fc = clock frequency
• Net throughput:
▫ Nf = number of function units
▫ Ii = initiation interval of fitness function
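One way to relate these symbols is sketched below. It is an assumed model, not the paper's stated expressions: each stage is taken to last one clock period, 1/fc, and Nf replicated fitness units, each accepting a new chromosome every Ii cycles, are taken to be interleaved so that the pipeline never produces more than one child per clock.

```python
def stage_time_s(fc_hz):
    """Assumed time per pipeline stage: one clock period."""
    return 1.0 / fc_hz

def net_throughput(fc_hz, n_f, i_i):
    """Assumed children per second: fc scaled by Nf/Ii, capped at one per clock."""
    return fc_hz * min(1.0, n_f / i_i)

full_rate = net_throughput(66e6, 4, 4)  # enough fitness units to hide the interval
half_rate = net_throughput(66e6, 2, 4)  # too few units, so the pipeline stalls
```

Under this model the one-child-per-clock rate is reached once Nf is at least Ii.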
RESULTS
                         First Prototype               Second Prototype       Theoretical
Problem Type             Set-Covering                  Protein Folding        Protein Folding
FPGA Implementation      6 Aptix AXB-MP3 FPGAs, 1 MHz  Xilinx XCV300, 66 MHz  Xilinx XCV3200E
Software Implementation  Workstation, 100 MHz          Pentium II, 366 MHz    Workstation, 100 MHz
Speedup                  2,200x                        320x                   9,600x
CONTRIBUTIONS
• Bit-slice design which is amenable to FPGA implementation
• A net child chromosome generation rate of one per clock cycle is obtained
CRITIQUE
• The paper doesn’t emphasize what makes this algorithm superior to others
• Theoretical speedup of 9,600x relies on a large FPGA and many complex fitness functions, but genetic algorithms do not scale well with complexity
• Would like to see more diagrams/figures to help explain concepts
QUESTIONS?