Global optimization and Monte Carlo

Introduction to Bioinformatics: Lecture XVI
Global Optimization and Monte Carlo
Jarek Meller
Division of Biomedical Informatics,
Children’s Hospital Research Foundation
& Department of Biomedical Engineering, UC
JM - http://folding.chmcc.org
Outline of the lecture

- Global optimization and the local minima problem
- Physical map assembly, ab initio protein folding and likelihood maximization as examples of global optimization problems
- Biased random search heuristics
- Monte Carlo approach
- Biological motivations and genetic algorithms
Optimization, steepest descent and local minima
Optimization is a procedure in which an extremum of a function is sought. When the relevant extremum is the minimum of a function, the optimization procedure is called minimization.
[Figure: a one-dimensional function f(x) with several local minima and a single global minimum.]
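To make the local-minimum problem concrete, here is a minimal sketch of steepest descent on a toy one-dimensional function with two minima; the function, step size, and starting points are illustrative assumptions, not taken from the slides. The minimum that the method reaches depends entirely on where it starts.

```python
def f(x):
    # Toy objective: a local minimum near x = 1.13, the global minimum near x = -1.30
    return x**4 - 3 * x**2 + x

def df(x):
    # Analytical derivative of f
    return 4 * x**3 - 6 * x + 1

def steepest_descent(x, step=0.01, n_iter=2000):
    # Repeatedly move against the gradient; the walk stops at the nearest local minimum
    for _ in range(n_iter):
        x -= step * df(x)
    return x

for x0 in (-2.0, 2.0):
    x_min = steepest_descent(x0)
    print(f"start {x0:+.1f} -> x = {x_min:+.3f}, f(x) = {f(x_min):+.3f}")
```

Starting at x = 2.0 the method gets trapped in the local minimum, while starting at x = -2.0 it happens to reach the global one; this dependence on the starting point is exactly the problem addressed in the rest of the lecture.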
JM - http://folding.chmcc.org
3
Rugged landscapes and local minima (maxima) problem
Algorithmic complexity of global optimization

- Polynomial vs. exponential complexity, e.g., n^2 vs. 2^n steps to obtain the optimal solution, where n denotes the overall “size of the input” (see the numeric comparison below)
- The term “global optimization” is used to refer to optimization problems for which no polynomial-time algorithm that guarantees an optimal solution is known
- In general, global optimization implies that there may be multiple local minima, and thus one is likely to find a local rather than the global optimum
- Let us revisit some of the global optimization problems that we have stumbled on so far …
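To make the polynomial vs. exponential contrast from the first bullet concrete, a tiny back-of-the-envelope comparison of the two step counts (the specific values of n are arbitrary):

```python
# Compare a polynomial step count (n^2) with an exponential one (2^n)
# for a few input sizes n.
for n in (10, 20, 40, 60):
    print(f"n = {n:3d}:  n^2 = {n**2:>6,d}   2^n = {2**n:>22,d}")
```

Already at n = 60 the exponential count exceeds 10^18 steps.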
The problem of ordering clone libraries with STS markers in the presence of errors

In the presence of experimental errors, the problem becomes a global optimization problem (see Pevzner, Chapter 3).
[Figure: a DNA molecule with STS markers 1-5 and four overlapping clones (clone 1 - clone 4) covering different intervals.]

Hybridization matrix (rows: clones; columns: STS probes; 1 = the probe hybridizes to the clone), shown for two probe orderings:

  STS order:  5 4 1 3 2      1 2 3 4 5
  Clone 1     0 1 0 1 0      0 0 1 1 0
  Clone 2     0 1 1 0 1      1 1 0 1 0
  Clone 3     0 1 0 1 1      0 1 1 1 0
  Clone 4     1 0 0 1 0      0 0 1 0 1
Heuristic solutions may still provide good probe ordering
The number of “gaps” (blocks of zeros within rows) in the hybridization matrix may be used as a cost function, since hybridization errors typically split a block of ones (false negatives) or split a gap into two gaps (false positives).
The problem of finding a permutation that minimizes the number of gaps can be cast as a Traveling Salesman Problem (TSP), in which the cities are the columns of the hybridization matrix (plus an additional column of zeros) and the distance between two cities is the number of positions in which the two columns differ (the Hamming distance).
Thus, an efficient algorithm is unlikely in the general case (unless P=NP), and heuristic solutions are sought that provide good probe orderings, at least in most cases (e.g., Alizadeh et al., 1995).
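A minimal sketch of this TSP formulation on the small matrix above, using a simple nearest-neighbour tour rather than the algorithm of Alizadeh et al.; the matrix H, the dummy all-zero column, and the gap-counting convention are assumptions spelled out in the code for illustration only.

```python
import numpy as np

# Hybridization matrix from the table above: rows = clones 1-4,
# columns = STS probes in the order 1 2 3 4 5.
H = np.array([[0, 0, 1, 1, 0],
              [1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 1, 0, 1]])

def hamming(a, b):
    """Number of positions in which two 0/1 columns differ."""
    return int(np.sum(a != b))

def count_gaps(H, order):
    """Total number of zero-blocks lying between ones, summed over all clone rows."""
    gaps = 0
    for row in H[:, order]:
        ones = np.flatnonzero(row)
        if len(ones) > 1:
            span = row[ones[0]:ones[-1] + 1]
            gaps += int(np.sum((span[:-1] == 0) & (span[1:] == 1)))  # 0 -> 1 transitions
    return gaps

def nearest_neighbour_order(H):
    """Greedy TSP heuristic over the columns of H plus a dummy all-zero column."""
    cols = [H[:, j] for j in range(H.shape[1])]
    cols.append(np.zeros(H.shape[0], dtype=int))        # the additional column of zeros
    n = len(cols)
    D = np.array([[hamming(cols[i], cols[j]) for j in range(n)] for i in range(n)])
    tour, unvisited = [n - 1], set(range(n - 1))        # start the tour at the dummy column
    while unvisited:
        nxt = min(unvisited, key=lambda j: D[tour[-1], j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return [j for j in tour if j != n - 1]              # drop the dummy column

order = nearest_neighbour_order(H)
print("probe order:", [j + 1 for j in order], " gaps:", count_gaps(H, order))
```

A shorter tour means fewer 0/1 boundaries across the rows and hence fewer gaps, which is why tour length is a sensible proxy for the cost function above.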
Profile HMMs and likelihood optimization when states (optimal multiple alignments) are not known
Random biased search: ideas and heuristics

- GA, MC, SA (MC with an annealing schedule)
- Fitness landscapes
- Biological and physical systems solve these “unsolvable” problems: from optimization to biology and back to optimization
Literature watch: 10 years of DNA computing

Adleman LM. Molecular computation of solutions to combinatorial problems. Science 266:1021-1024 (1994).

Braich RS, Chelyapov N, Johnson C, Rothemund PWK, Adleman L. Solution of a 20-variable 3-SAT problem on a DNA computer. Science 296:499-502 (2002).
Monte Carlo random search

A simulation technique for conformational sampling and optimization based on a random search for energetically favourable conformations.

Finding the global (or at least a “good” local) minimum by a biased random walk may take some luck …
Monte Carlo algorithm

- The core of the MC algorithm is a heuristic prescription for a plausible pattern of changes in the configurations assumed by the system. Such an elementary “move” depends on the type of problem.
- In the realm of protein structure it may be, for instance, a rotation around a randomly chosen backbone bond. A long series of random moves is generated, with only some of them considered “good” moves.
- The advantage of the MC method is its generality and a relatively weak dependence on the dimensionality of the system. However, finding a “move” that ensures efficient sampling may be a highly non-trivial problem.
Monte Carlo algorithm (continued)

In the standard Metropolis MC, a move is accepted unconditionally if the new configuration results in a better (lower) potential energy. Otherwise it is accepted with a probability given by the Boltzmann factor:
Pr r
U  U (r )  U (r )
U
 exp( 
)
k BT
denotes the change in the potential energy
associated with a move
r  r
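A minimal sketch of the Metropolis acceptance rule on a toy one-dimensional energy function; U(x), the move size, the temperature kT, and the number of steps are illustrative assumptions, not part of the slides.

```python
import math, random

def U(x):
    # Toy "rugged" energy: a double well with small ripples on top
    return (x**2 - 1.0)**2 + 0.1 * math.cos(8.0 * x)

def metropolis(n_steps=20000, kT=0.3, step=0.2, x0=2.0, seed=0):
    rng = random.Random(seed)
    x, e = x0, U(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)      # elementary random "move"
        dU = U(x_new) - e                         # change in the potential energy
        # accept downhill moves always, uphill moves with the Boltzmann probability
        if dU <= 0 or rng.random() < math.exp(-dU / kT):
            x, e = x_new, e + dU
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

print(metropolis())   # lowest-energy configuration visited during the walk
```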
Climbing mountains more easily: simulated annealing
- Increasing the effective “temperature” means a higher probability of accepting moves that increase the energy
- Thus, the likelihood of escaping from a local minimum may be tuned
- Heating and cooling cycles, in analogy to physical systems (a minimal sketch of a cooling schedule follows below)
- In the limit of infinitely slow cooling, simulated annealing is guaranteed to find the global minimum
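The sketch below wraps the Metropolis rule from the previous slide in a geometric cooling schedule; the energy function, the start and end temperatures, and the cooling factor are again illustrative assumptions.

```python
import math, random

def U(x):
    # same toy double-well energy as in the Metropolis sketch above
    return (x**2 - 1.0)**2 + 0.1 * math.cos(8.0 * x)

def simulated_annealing(kT_start=2.0, kT_end=0.01, cooling=0.995,
                        steps_per_T=50, step=0.3, x0=2.0, seed=0):
    rng = random.Random(seed)
    x, e = x0, U(x0)
    best_x, best_e = x, e
    kT = kT_start
    while kT > kT_end:
        for _ in range(steps_per_T):              # Metropolis sweep at temperature kT
            x_new = x + rng.uniform(-step, step)
            dU = U(x_new) - e
            if dU <= 0 or rng.random() < math.exp(-dU / kT):
                x, e = x_new, e + dU
                if e < best_e:
                    best_x, best_e = x, e
        kT *= cooling                             # geometric cooling: one common schedule
    return best_x, best_e

print(simulated_annealing())
```

At a high kT almost every move is accepted, so the walker hops freely between basins; as kT drops, uphill moves become rare and the walk settles into (ideally) the deepest basin.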
From biology to optimization: genetic algorithms
Genetic algorithm (GA): a class of algorithms inspired by the mechanisms of genetics that has been applied to global optimization (especially combinatorial optimization problems). It requires the specification of three operations (each typically probabilistic) on objects called "strings" (which could also be real-valued vectors):

0. Initialize the population
1. Select parents for reproduction and the “evolutionary” operators (e.g., mutation and crossover)
2. Perform the operations to generate an intermediate population and evaluate their fitness (the value of the objective function to be optimized)
3. Select a subpopulation for the next generation (survival of the fittest)
4. Repeat steps 1-3 until some stopping rule is satisfied (a minimal sketch of this loop follows below)
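A minimal sketch of steps 0-4 on a toy problem; the bit-string length, the one-max fitness function, roulette-wheel parent selection, and the truncation survival step are all assumptions chosen for brevity, not part of the slides.

```python
import random

def fitness(s):
    """Toy objective (an assumption): the number of 1s in the bit string ("one-max")."""
    return s.count('1')

def genetic_algorithm(n_bits=20, pop_size=30, generations=100, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    # 0. initialize the population with random bit strings
    pop = [''.join(rng.choice('01') for _ in range(n_bits)) for _ in range(pop_size)]
    for _ in range(generations):
        # 1. select parents, here by fitness-weighted (roulette-wheel) sampling
        weights = [fitness(s) + 1 for s in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        # 2. apply crossover and mutation to produce an intermediate population
        children = []
        for p1, p2 in zip(parents[::2], parents[1::2]):
            loc = rng.randrange(1, n_bits)                      # one-point crossover
            for c in (p1[:loc] + p2[loc:], p2[:loc] + p1[loc:]):
                c = ''.join(b if rng.random() > p_mut else ('1' if b == '0' else '0')
                            for b in c)                          # per-bit mutation
                children.append(c)
        # 3. survival of the fittest: keep the best pop_size strings
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    # 4. the stopping rule here is simply a fixed number of generations
    return max(pop, key=fitness)

best = genetic_algorithm()
print(best, fitness(best))
```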
Genetic algorithm: operators and adaptation
Reproduction - combining strings in the population to create a new string
(offspring);
Example: Taking 1st character from 1st parent + rest of string from 2nd parent:
[001001] + [111111] ===> [011111]
Mutation - spontaneous alteration of characters in a string;
Example: [001001] ===> [101001]
Crossover - combining strings to exchange values, creating new strings in their
place.
Example: With crossover location at 2:
[001001] & [111111] ===> [001111], [111001]
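The three operators above, written out as tiny string functions so the slide's examples can be reproduced directly; the function names and the 0-based positions are illustrative choices, not a fixed API.

```python
def reproduction(p1, p2):
    # slide's example: 1st character of the 1st parent + the rest of the 2nd parent
    return p1[:1] + p2[1:]

def mutation(s, pos):
    # flip the character at position pos (0-based)
    return s[:pos] + ('1' if s[pos] == '0' else '0') + s[pos + 1:]

def crossover(p1, p2, loc):
    # one-point crossover after position loc
    return p1[:loc] + p2[loc:], p2[:loc] + p1[loc:]

print(reproduction("001001", "111111"))   # 011111
print(mutation("001001", 0))              # 101001
print(crossover("001001", "111111", 2))   # ('001111', '111001')
```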
Genetic algorithms for global optimization
The original GA was proposed by John Holland and used crossover and total population replacement: a population of 2N objects (called chromosomes) forms N parent pairings that produce 2N offspring. The offspring comprise the new generation and become the total population, replacing their parents. More generally, a population of size N produces an intermediate population of size N+M, from which N members are kept to form the new population. One way to choose which N members survive is to keep those with the greatest fitness values: survival of the fittest.