Generating, Maintaining, and Exploiting Diversity in a Memetic Algorithm for Protein Structure Prediction
Mario Garza-Fabre, Shaun M. Kandathil, Julia Handl, Joshua Knowles, Simon C. Lovell
Presentation by Michiel Braat, Hugo Heemskerk, Kambiz Sekandar and Matthijs de Wachter
Protein structure prediction
– Applicable in medicine
– We have: the amino acid sequence
– We want: a 3D model of the protein
– Not the same as the dynamic process of protein folding
Folds
By Thomas Splettstoesser (www.scistyle.com) - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=28353539
Problems of protein structure prediction
1) Combinatorial explosion
2) Difficult to explore a diverse set of protein folds
3) The energy function of protein conformations is
3A) Deceptive
3B) Inaccurate
The problems in GA terms
1) Combinatorial explosion ⇒ Rugged fitness landscape
2) Difficult to explore a diverse set of protein folds ⇒ Loss of diversity
3A) Deceptive energy function ⇒ Deceptive fitness function
3B) Inaccurate energy function ⇒ Inaccurate fitness function
Solutions
– Genetic local search (memetic algorithm)
– Specialised genetic operators
– Generalised stochastic ranking
– Conformational diversity measures (to tell apart compact structures with different folds)
Protein structure construction algorithms
– Homologous proteins (global similarity): hard problem
– Fragment assembly (local similarity): more recent, seems more promising
– Turns protein structure prediction into a combinatorial optimisation problem
– Performs worse for larger proteins and for proteins with many self-contacting segments
Fragment-assembly
– Divide the target sequence into short amino acid fragments
– Match these against proteins of known structure and extract the corresponding fragments
– Recombine the fragments with an optimisation scheme
– Generates a low-resolution model
– Key advantage: no similar proteins of known structure are required
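The first fragment-assembly step, dividing the target into fragments, can be sketched as enumerating every overlapping window of a fixed length. This is a minimal illustration only: the function name `fragment_windows` is hypothetical, and matching each window against proteins of known structure (the second step) is not shown. Rosetta-style libraries commonly use short windows such as 3 and 9 residues.

```python
from typing import List, Tuple


def fragment_windows(sequence: str, length: int) -> List[Tuple[int, str]]:
    """Return (start index, fragment) for every overlapping window of `length` residues.

    Hypothetical helper: each window would later be matched against proteins of
    known structure to collect candidate fragments for that position.
    """
    if length <= 0:
        raise ValueError("fragment length must be positive")
    return [(i, sequence[i:i + length]) for i in range(len(sequence) - length + 1)]


# A 9-residue sequence yields 7 overlapping 3-residue windows.
windows = fragment_windows("MKVLAAGIV", 3)
```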
Rosetta heuristic
– The local search algorithm used in the memetic algorithm
– Uses fragment assembly as its base model
– Varies backbone torsion angles (“protein skeleton” rotations)
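A fragment-insertion local search of this kind can be sketched as Metropolis Monte Carlo: repeatedly overwrite the (phi, psi) torsions of one window with a library fragment and accept the move if the energy drops, or with Boltzmann probability if it rises. This is a simplified sketch under stated assumptions, not Rosetta's actual protocol: the conformation is just a list of torsion pairs, the `library` layout (window start mapped to candidate fragments) is assumed, and `energy` is any caller-supplied function.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

# One (phi, psi) backbone torsion pair per residue, in degrees (assumed representation).
Torsions = List[Tuple[float, float]]


def monte_carlo_fragment_search(
    conf: Torsions,
    library: Dict[int, List[Torsions]],
    energy: Callable[[Torsions], float],
    steps: int,
    temperature: float,
    rng: random.Random,
) -> Torsions:
    """Metropolis Monte Carlo over fragment insertions; returns the lowest-energy conformation seen."""
    current = list(conf)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(steps):
        start = rng.choice(sorted(library))    # pick a fragment window
        fragment = rng.choice(library[start])  # pick one candidate fragment for that window
        if start + len(fragment) > len(current):
            raise ValueError("fragment does not fit inside the chain")
        trial = list(current)
        trial[start:start + len(fragment)] = fragment  # overwrite the window's torsions
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Metropolis criterion: always accept downhill moves, uphill ones with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best
```

Returning the best conformation seen, rather than the last accepted one, is a design choice of this sketch; it guarantees the result is never worse than the starting model.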
Rosetta-based memetic algorithm (RMA)
- Rosetta as local search strategy
- Genetic operators use problem-specific knowledge (about secondary structures)
- Ranked selection over parents + offspring
- Energy as the only selection criterion (for now...)
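Ranked selection over parents + offspring can be sketched as pooling both sets, sorting by energy (lower is better) and keeping as many individuals as there were parents. The function name `plus_selection` is hypothetical, and the energy function is whatever the caller supplies.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def plus_selection(parents: List[T], offspring: List[T], energy: Callable[[T], float]) -> List[T]:
    """Keep the len(parents) lowest-energy individuals from the pooled parents and offspring."""
    pool = parents + offspring
    return sorted(pool, key=energy)[: len(parents)]


# Toy usage: individuals are their own energies.
survivors = plus_selection([5.0, 3.0], [4.0, 1.0], lambda x: x)
```

Because parents compete directly with their offspring, the best conformation found so far is never lost; the flip side is high selection pressure.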
RMA - Variation
- Two-point crossover with cut points in loops
- Loop locations based on secondary structure predictions
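A loop-restricted two-point crossover can be sketched by drawing both cut points only from residues predicted to be in a loop, so predicted helices and strands between the cuts are exchanged as whole units. This assumes a per-residue secondary structure string using `H` (helix), `E` (strand) and `C` (coil/loop); the labels and the function name are illustrative assumptions, not taken from the slides.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def loop_two_point_crossover(
    a: Sequence[T], b: Sequence[T], ss: str, rng: random.Random
) -> Tuple[List[T], List[T]]:
    """Swap the segment [i, j) between two parents, with i and j both at loop ('C') residues."""
    if not (len(a) == len(b) == len(ss)):
        raise ValueError("parents and secondary structure string must have equal length")
    loops = [k for k, s in enumerate(ss) if s == "C"]
    if len(loops) < 2:
        return list(a), list(b)  # not enough loop positions: return unchanged copies
    i, j = sorted(rng.sample(loops, 2))
    child1 = list(a[:i]) + list(b[i:j]) + list(a[j:])
    child2 = list(b[:i]) + list(a[i:j]) + list(b[j:])
    return child1, child2
```

In practice each gene would be a residue's torsion pair; single letters are used in the check below only to make the exchanged segment easy to see.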
RMA - Mutation
- Mutation by fragment insertion
- Only applied to amino acid residues that are part of a loop
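One way to restrict fragment-insertion mutation to loops is to pick a library window that overlaps at least one loop residue and copy the fragment's torsions only at loop positions. This is a sketch under assumptions: the `H`/`E`/`C` secondary structure labels, the `library` layout (window start mapped to equal-length fragments) and the function name are hypothetical, and the slides do not say exactly how the restriction is enforced.

```python
import random
from typing import Dict, List, Tuple

# One (phi, psi) backbone torsion pair per residue (assumed representation).
Torsions = List[Tuple[float, float]]


def loop_fragment_mutation(
    conf: Torsions, ss: str, library: Dict[int, List[Torsions]], rng: random.Random
) -> Torsions:
    """Insert a fragment, but only overwrite residues predicted to be in a loop ('C')."""
    loop = {k for k, s in enumerate(ss) if s == "C"}
    # Assumption: all fragments for one window have the same length.
    starts = [
        s for s, frags in library.items()
        if frags and any(s + k in loop for k in range(len(frags[0])))
    ]
    if not starts:
        return list(conf)  # no window touches a loop: nothing to mutate
    start = rng.choice(starts)
    fragment = rng.choice(library[start])
    mutant = list(conf)
    for k, angles in enumerate(fragment):
        pos = start + k
        if pos in loop and pos < len(mutant):
            mutant[pos] = angles  # helix/strand residues keep their torsions
    return mutant
```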
Energy evaluation VS RMSD
- The lowest-energy conformation is not always the one closest to the real structure!
- RMSD measures how close a model is to the real (native) structure of the protein
- Root mean square deviation: the average distance between corresponding atoms (typically Cα) of two superimposed structures
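For reference, the standard definition written out: X and Y hold the coordinates of N corresponding atoms (commonly Cα) in the model and the reference structure, R is a rotation and t a translation. The optimal superposition is usually found with the Kabsch algorithm. These symbols are introduced here for illustration and do not appear elsewhere in the slides.

```latex
\mathrm{RMSD}(X, Y) = \min_{R,\,\mathbf{t}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert \mathbf{x}_i - \left( R\,\mathbf{y}_i + \mathbf{t} \right) \right\rVert^2}
```

Lower is better; computing it requires the native structure, which is only available for proteins whose structure is already known.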
RMA vs Rosetta
[Figure: RMA (red) vs Rosetta (blue); 1000 local searches, 30 different proteins]
Genetic Operators and Exploration
Influence of the specific genetic operators on the exploration of different folds
Experiment with:
1. No genetic operators
2. Standard two-point crossover and standard Rosetta mutations
3. Original RMA
4. Original RMA with incorrect secondary structure information
Genetic Operators and Secondary Structure
[Figure: protein 1ehn with 3 secondary structure elements; axes show distances between the structures; darker red = more exploration]
How to deal with inaccuracies
- No correlation between energy and RMSD
- Diversity serves as a proxy for RMSD (the native structure, and hence RMSD, is unknown during prediction)
How to deal with inaccuracies
- Stochastic ranking to handle two criteria (energy and diversity)
- Based on a bubble-sort-like procedure
- Probabilistic: each pairwise comparison uses one of the two criteria, chosen at random
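The bubble-sort-like ranking can be sketched as follows: sweep over the population, and for each adjacent pair flip a biased coin to decide which criterion to compare on, swapping the pair if it is out of order on that criterion. This is a sketch of classic stochastic ranking adapted to two criteria, under an assumption the slides do not confirm: here `rho` is taken as the probability of comparing on energy (lower is better), otherwise diversity is compared (higher is better).

```python
import random
from typing import List, Sequence


def stochastic_rank(
    energy: Sequence[float], diversity: Sequence[float], rho: float, rng: random.Random
) -> List[int]:
    """Return population indices ordered best-first by stochastic bubble-sort ranking.

    Assumption: with probability rho a pair is compared on energy (lower is better),
    otherwise on diversity (higher is better).
    """
    if len(energy) != len(diversity):
        raise ValueError("energy and diversity must have the same length")
    order = list(range(len(energy)))
    n = len(order)
    for _ in range(n):  # at most n sweeps, as in bubble sort
        swapped = False
        for j in range(n - 1):
            a, b = order[j], order[j + 1]
            if rng.random() < rho:
                out_of_order = energy[a] > energy[b]
            else:
                out_of_order = diversity[a] < diversity[b]
            if out_of_order:
                order[j], order[j + 1] = b, a
                swapped = True
        if not swapped:  # stop early once a full sweep makes no swaps
            break
    return order
```

With `rho = 1` this reduces to sorting by energy alone and with `rho = 0` to sorting by diversity alone; intermediate values mix the two, which lowers the selection pressure on energy.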
Experimental Results
Three values of the stochastic ranking parameter were analysed: ρ ∈ {0.45, 0.5, 0.55}
These were compared with Rosetta, with RMA using energy-based selection, and with each other.
Experimental Results
R = Rosetta
E = RMA with energy-based selection
S = RMA with stochastic-ranking-based selection, ρ ∈ {0.55, 0.5, 0.45}
Stochastic ranking reduced selection pressure
All RMA variants except ρ = 0.45 outperformed Rosetta
Experimental Results
Considering structural diversity increased the likelihood of the RMA reaching and preserving more native-like conformations
ρ = 0.5 appears to give the most competitive performance
Experimental Results
Fragment-assembly methods rely on the existence of native-like conformations in the conformational space defined by the fragment libraries employed
For some targets, no native-like structures were sampled
This may indicate that the fragment libraries used in this study are lacking, which deserves further investigation.
Diversity Generation and Preservation
Next, we examine the effect of the genetic operators and the survival selection strategy on diversity generation and preservation.
Diversity Generation and Preservation
Without genetic operators, the energy-driven RMA (i) produces compact, well-defined solution clusters. The lack of mechanisms that boost exploration, combined with high selection pressure, can lead to premature convergence.
Adding recombination and mutation (ii) and using stochastic selection (iii) each increase diversity. Combining both (iv) gives the best results.
Diversity Generation and Preservation
Using two selection criteria lowers the offspring survival rate.
This slows convergence and results in higher diversity.
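Offspring survival can be tracked per generation as the fraction of newly created offspring that make it into the next population. A minimal sketch, assuming individuals carry unique identifiers; the function name and the ID scheme are hypothetical.

```python
from typing import Sequence, Set


def offspring_survival_rate(survivors: Sequence[str], offspring_ids: Set[str]) -> float:
    """Fraction of this generation's offspring that were selected into the next population."""
    if not offspring_ids:
        raise ValueError("offspring_ids must not be empty")
    surviving = sum(1 for s in survivors if s in offspring_ids)
    return surviving / len(offspring_ids)


# Two of the four offspring ("o2", "o4") survived alongside two parents.
rate = offspring_survival_rate(["p1", "o2", "p3", "o4"], {"o1", "o2", "o3", "o4"})
```

A persistently low rate means the population turns over slowly, which is consistent with the slower convergence described above.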
Discussion
Cons:
Accuracy was only tested on known protein structures
Pros:
Generally, applying GAs to other fields of study leads to new challenges in evolutionary computation research
Specifically in this paper: Inaccurate fitness function
⇒ Solution: Selecting for diversity