Long-term evolution is surprisingly predictable in lattice proteins

Downloaded from http://rsif.royalsocietypublishing.org/ on June 18, 2017
Long-term evolution is surprisingly
predictable in lattice proteins
Michael E. Palmer, Arnav Moudgil and Marcus W. Feldman
rsif.royalsocietypublishing.org
Research
Cite this article: Palmer ME, Moudgil A,
Feldman MW. 2013 Long-term evolution is
surprisingly predictable in lattice proteins. J R
Soc Interface 10: 20130026.
http://dx.doi.org/10.1098/rsif.2013.0026
Received: 10 January 2013
Accepted: 12 February 2013
Department of Biology, Stanford University, Gilbert Hall, Stanford, CA 94305-5020, USA
It has long been debated whether natural selection acts primarily upon individual organisms, or whether it also commonly acts upon higher-level
entities such as lineages. Two arguments against the effectiveness of longterm selection on lineages have been (i) that long-term evolutionary outcomes
will not be sufficiently predictable to support a meaningful long-term fitness
and (ii) that short-term selection on organisms will almost always overpower
long-term selection. Here, we use a computational model of protein folding
and binding called ‘lattice proteins’. We quantify the long-term evolutionary
success of lineages with two metrics called the k-fitness and k-survivability.
We show that long-term outcomes are surprisingly predictable in this model:
only a small fraction of the possible outcomes are ever realized in multiple
replicates. Furthermore, the long-term fitness of a lineage depends only
partly on its short-term fitness; other factors are also important, including
the ‘evolvability’ of a lineage—its capacity to produce adaptive variation. In
a system with a distinct short-term and long-term fitness, evolution need
not be ‘short-sighted’: lineages may be selected for their long-term properties,
sometimes in opposition to short-term selection. Similar evolutionary basins of
attraction have been observed in vivo, suggesting that natural biological
lineages will also have a predictive long-term fitness.
Subject Areas:
computational biology, biocomplexity
Keywords:
evolvability, survivability, long-term evolution,
lineage evolution, protein evolution,
macroevolution
Author for correspondence:
Michael E. Palmer
e-mail: [email protected]
1. Introduction
The fitness of a lineage is defined as the expected factor of increase in the
number of its member organisms over a single generation. It provides a quantitative measure of how successful a lineage is expected to be, in a particular
environmental context, one generation in the future. This measure of fitness
is useful because it is predictive: in a new experimental replicate in the same
biotic and abiotic environmental context, the lineage can be expected to increase
by the same factor.
The standard fitness depends only upon the current standing variation in a
lineage. There has been significant discussion of systems’ ability to produce variation, sometimes referred to as evolvability, although definitions have varied
widely. Kirschner & Gerhart [1, p. 8420] call evolvability, ‘an organism’s
capacity to generate heritable phenotypic variation’, without reference to the fitness of that variation. The view of Houle [2], which associates evolvability with
current standing variation and the immediate response of the population to
selection, has fallen out of favour. A consensus now supports Wagner &
Altenberg [3, p. 970], who proposed, ‘the genome’s ability to produce adaptive
variants’. In this sense, evolvability is about the adaptive variation that may be
produced, rather than the current standing variation, i.e. the quantitative genetics
‘M-matrix’ as opposed to the ‘G-matrix’ [4]. Altenberg [5, p. 60] provides a rare
quantitative definition, ‘the probability that a population generates individuals
fitter than any existing’. Other quantitative definitions are ad hoc for particular
systems [6 –9].
However, we note that the ‘ability to produce adaptive variants’ alone
cannot predict evolutionary outcomes, in general. Other factors are important
to the long-term success of a lineage: for example, avoidance of production of
deleterious variation. Furthermore, this definition of evolvability explicitly
& 2013 The Author(s) Published by the Royal Society. All rights reserved.
Downloaded from http://rsif.royalsocietypublishing.org/ on June 18, 2017
2.1. Lineages
We define an asexual lineage as a single founding individual, and
zero or more generations of its descendants. A sexual lineage consists of a founding population, and zero or more generations of its
descendants. In either case, a lineage is monophyletic. For sexuals, we assume that a lineage begins at an irreversible speciation
event. The founders of a sexual lineage are all the members of
one of the nascent species. This definition implies that two distinct
lineages cannot fuse together. A lineage can continue to speciate,
however, and all members of sub-lineages thus formed are also
members of the original lineage. The non-fusing requirement is
automatically satisfied for asexuals. Note that new lineages are
founded frequently, especially in asexuals: any asexual individual is itself the founder of a new lineage at every generation.
And, any asexual individual is a member of many lineages: the
lineages founded by each of its ancestors.
All lineages simulated here are asexual, but the metrics
defined in §2.2 apply equally well to sexual lineages, given the
non-fusing constraint.
We define the k-generation fitness and k-generation survivability of
lineages as follows [10]: p[N(t) ¼ n] is defined as the probability
that a lineage has n members at generation t. This can be estimated with a sufficient number of longitudinal experimental
replicates in which lineage membership is periodically counted.
NðtÞ
is the expected number of members of the lineage at
generation t, namely
X
NðtÞ
¼
np½NðtÞ ¼ n:
n
Wk (t) is the ‘k-fitness’ at generation t, namely the ratio of the
expected number of members of the lineage at generation t þ k,
to the expected number at generation t:
Wk ðtÞ ¼
þ kÞ
Nðt
;
NðtÞ
ðt 0; k 1Þ:
Sk (t) is the ‘k-survivability’ at generation t, namely the probability that the lineage will survive to generation t þ k, given
that it has survived to generation t:
Sk ðtÞ ¼ p½Nðt þ kÞ = 0 j NðtÞ = 0:
(To refer explicitly to the k-fitness or k-survivability of a lineage i
among several lineages, we will add a superscript, as in Wki ðtÞ,
Sik ðtÞ, but we will usually omit it for brevity.)
Because selection on lineages is ultimately determined by
extinction, the k-survivability, Sk (t), is the more fundamental
measure of evolutionary success. Nonetheless, Wk (t) is also
useful because, for short time spans, extinctions of lineages
may be rare. In this case, the k-fitness is a practical proxy for
the k-survivability.
We refer to k ¼ 1 as the immediate term. For example, W1(t)
(which is equivalent to the standard one-generation fitness
used in population genetics) is the immediate fitness at generation
t, and S1(t) is the immediate survivability at generation t. We will
refer to k slightly greater than one (k . 1) as the short term, and
k 1 as the long term. For the immediate term (k ¼ 1), the evolutionary outcome depends on the standing variation in the
lineage at generation t, but cannot depend on the spectrum of
variation it may produce. As k increases, the variation produced
by each lineage becomes increasingly important to the outcome.
We will estimate Wk (t) and Sk (t) by conducting multiple
experimental replicates of a given experimental scenario. The specification of a scenario will typically include: the set of founding
protein sequences; the ligand (or ligands) used as binding targets; all details of the lattice protein model; and all details of
the population genetic model. Multiple replicates of one scenario
will be identical in the scenario specification, and differ only
stochastically. Typically, the replicates will differ stochastically
when we draw random numbers to determine which mutations occur, or when we sample to produce a finite number of
offspring. Thus, the metrics Wk (t) and Sk (t) characterize the evolutionary process in the context of that scenario, with stochasticity
entering in a well-prescribed way.
It is possible to define ‘broader’ scenarios in which stochasticity enters at other points. For example, in this paper, we
typically fix the ligand across replicates. However, one could
instead define a procedure for generating a class of ligands, and
use a different, random ligand in each replicate. In that case, the
metrics Wk (t) and Sk (t) would characterize the behaviour of the
lineages when binding to a class of ligands, rather than to a particular
ligand. In another example, we might allow environments (here,
the ligand comprises the only ‘environment’) to vary stochastically
over time, differently across replicates; then, our metrics will apply
to a particular regime of environmental change. In summary, for any
scenario, we must define what is constant across replicates, and
what varies across replicates; our metrics will be specific to this
context (as is the standard fitness).
2
J R Soc Interface 10: 20130026
2. Methods
2.2. Metrics of expected evolutionary success
rsif.royalsocietypublishing.org
excludes standing variation, but clearly a lineage must survive in the immediate term (i.e. one generation) in order to
succeed in the long term.
Palmer & Feldman [10] introduced two quantitative
metrics of the expected evolutionary success of lineages. We
suggest that these metrics properly frame evolutionary
dynamics in both the short and long term, properly incorporate the effects of both standing and generated variation, and
can be predictive like the standard fitness. The first of these
metrics is Wk(t), the ‘k-fitness’ at generation t, defined as
the ratio of the expected number of members of a lineage at
generation t þ k, to the expected number at generation t.
(Note that W1(t) is the standard, one-generation fitness at
generation t.) The second is Sk(t), the ‘k-survivability’ at generation t, defined as the probability that the lineage will
survive to generation t þ k, given that it has survived to generation t. We say that a lineage survives (does not go extinct) if
it has any living descendants.
Palmer & Feldman [10] previously showed that the
k-fitness and k-survivability metrics can be predictive in
simple in silico models. The question driving this paper is
whether these metrics are predictive in more biologically realistic
models. In this study, we use an in silico model of evolved
‘lattice proteins’, a model of protein folding and binding, to
demonstrate that long-term outcomes are indeed predictable
in a biologically realistic model, where the ‘outcome’ of an
experiment with several competing lineages is the set of
lineages that survive to the end of the experiment. We quantify this predictability with an entropy measure. Furthermore,
we find that outcomes are surprisingly predictable; that is,
even when the a priori uncertainty is very large, such as
when many competing lineages are initially present, the
evolutionary process removes much of this uncertainty in
the long term. We quantify this removal of uncertainty
with a negentropy measure. Finally, we show that while
outcomes are predictable, they are not fully determined by
the initial standing variation, nor predicted by the standard
one-generation fitness; thus lattice proteins (evolved as
described here) possess a distinct short- and long-term fitness.
It is in such a system that long-term selection can sometimes
act in opposition to short-term selection.
Downloaded from http://rsif.royalsocietypublishing.org/ on June 18, 2017
Wk (t) and Sk (t) describe the evolutionary process in a given
scenario: they summarize what has happened in the replicates
we have conducted, and they yield a prediction about what the
outcome will be in a new replicate. We next define a measure of
predictability in a given evolutionary scenario.
Call s(i, j, k) the ‘survival state’ of lineage i in replicate
experiment j at generation k. s(i, j, k) ¼ 1 if lineage i has any
living members at generation k of replicate j, and 0 otherwise.
The ‘joint survival state’ M( j, k) is a binary vector composed
of elements s(i, j, k), ordered by i; it indicates the realized survival state of all lineages at generation k of replicate j. If there
are L lineages, there are 2L possible joint survival states. By
averaging over multiple replicates, we can estimate pm(k), the
probability that each joint survival state m will occur at time
step k. We then compute the entropy of the joint survival state
at generation k,
X
H joint ðkÞ ¼ pm ðkÞ log2 ð pm ðkÞÞ;
m
as a measure of the unpredictability of evolutionary outcomes in
a particular evolutionary system. Note that
0 H joint ðkÞ L
H joint ðkÞ max½H i ðkÞ
i
and
H joint ðkÞ X
H i ðkÞ:
i
2.4. Lattice protein model
The lattice protein model [15,16] is a well-studied, simple model
of protein folding, in which the amino acid residues of a protein
sequence are constrained to points on a grid (either twodimensional, as here, or three-dimensional). Although the lattice
protein model is much less physically realistic than modern,
atomic-level protein folding models, it is also vastly cheaper,
computationally. This permits the evolution of populations of lattice proteins, over many generations, in multiple replicates, on a
modern computer cluster. Despite their relative simplicity, lattice
proteins retain important properties of real proteins.
The heritable genotype in the model consists of a sequence of
amino acid residues, of length 12 for the simulations presented
here, chosen from the 20 biological amino acids. Occasional
mutations (rates below) may change the residue at a particular locus.
The fitness of one protein sequence is computed by first ‘folding’ the sequence into its minimum energy conformation, and
then computing its binding affinity to a target ligand, as follows.
2.4.1. Protein folding
A protein sequence of a finite length may ‘fold’ into a finite
number of possible conformations on the two-dimensional lattice. Examples of folded lattice proteins are shown in figure 1;
the folded proteins are indicated by upper-case letters. All
pairs of abutting amino acid residues contribute to the total interaction energy E(Ci) of a conformation Ci according to a set of
pairwise interaction energies taken from Miyazawa & Jernigan
[17, table V]. The stability of a particular folded conformation
Ci is given by
DGf ðCi Þ ¼ EðCi Þ þ T ln½QðTÞ eEðCi Þ=T ;
P
where QðTÞ ¼ Ci eEðCi Þ=T is the partition function. In vivo, real
proteins will spend time in a number of possible conformations;
more time is spent in lower-energy conformations. Here, we
make the simplification of considering only the most stable conformation for a particular sequence. If the lowest DGf for a
sequence is greater than zero, then the sequence has no energetic
‘preference’ for folding, and receives a fitness of zero.
ðin units of bitsÞ:
joint
(k) estimates our average uncertainty in predicting the
H
outcome of a new experimental replicate at generation k. If
H joint (k) ¼ 0, then the same state always occurs at generation k:
the outcome is perfectly predictable. If H joint (k) ¼ L, then the
outcomes are equally likely, and thus maximally unpredictable.
We also compute Hi (k), the entropy of the survival state of
each distinct lineage i. Below, we will show that these lineage
entropies, Hi (k), are useful for interpreting the variation in the
joint entropy, H joint (k), over time. The k-survivability of lineage
i, Sik ð0Þ, is the probability of its survival to generation k (given
that it was present at generation 0), so Hi (k) turns out to be a
simple function of Sik ð0Þ:
H i ðkÞ ¼ Sik ð0Þ log2 ðSik ð0ÞÞ ð1 Sik ð0ÞÞ log2 ð1 Sik ð0ÞÞ:
Because there are only two possible survival states of one
lineage,
0 H i ðkÞ 1
3
ðin units of bitsÞ:
2.4.2. Protein binding to a target ligand
For all sequences that do fold stably (i.e. DGf , 0), we compute the
maximum binding affinity to a target ligand, indicated by lowercase letters in figure 1. The ligand is a fixed-conformation protein
of length six residues. For a given sequence, call Cmin the conformation with the lowest DGf. All possible position and rotation
combinations of the ligand in relation to the protein are tested to
find the minimum binding energy (sum of the interaction energies
of abutting residues of the sequence and ligand), relative to conformation Cmin. Call BEmin(Cmin) the lowest binding energy over all
these combinations. The fitness of each protein sequence is
f ¼ eBEmin ðCmin Þ . See [15,16,18] for additional details.
2.5. Population genetic model
Reproduction is asexual, and generations do not overlap. Each
deme (local population) has associated with it a single target
ligand. In some of the experiments below, the population is subdivided into multiple demes. At each generation, the fitness of
J R Soc Interface 10: 20130026
2.3. An entropy measure of the predictability of
evolutionary outcomes
The joint entropy and the lineage entropies obey the following inequalities:
rsif.royalsocietypublishing.org
Our metrics may also be applied to living systems. Two important requirements are (i) that we can periodically count the
membership of a lineage and (ii) that a sufficient number of replicate experiments can be conducted to estimate p[N(t) ¼ n] for a
particular experimental scenario. For long-lived organisms, there
are obvious experimental difficulties, but the lifespan of microorganisms is short enough such that experiments of many generations can be conducted. The most closely related work of which
we are aware is that of Woods et al. [11], who measured the equivalent of W7 (t) (the seven-generation fitness) of multiple bacterial
lineages at two time points t1 and t2. They did not measure
Wk (t) over the long term. However, the technology for such
measurement does currently exist: random ‘DNA barcode’
sequences may be inserted into individual cells [12–14]; descendants of these initial lineage founders can then be reliably
counted for many generations. Replicate longitudinal experiments
of this type could generate estimates of p[N(t) ¼ n].
We cannot estimate p[N(t) ¼ n] for extinct natural lineages,
because we cannot conduct multiple experimental replicates:
we have only the single ‘replicate’ comprising natural history.
However, the standard one-generation fitness has the same
limitation, and it has been an extremely useful tool in biology.
Downloaded from http://rsif.royalsocietypublishing.org/ on June 18, 2017
4
rsif.royalsocietypublishing.org
J R Soc Interface 10: 20130026
Figure 1. Examples of folded lattice proteins. Upper-case letters indicate the protein of length 12 residues. Lower-case letters indicate the target ligand, a protein of fixed
conformation and length six residues. Proteins are shown in their lowest-energy folding conformation, and the ligand is translated and rotated, relative to the protein, into
the position of strongest binding affinity. Blue residues contribute to folding: they touch other (non-sequential) residues. Red residues contribute to binding: they touch
the ligand. Purple residues contribute to both folding and binding. Grey residues contribute to neither. An initial sequence (top left) evolves over time into a family of
descendant sequences. Descendants may assume different folded configurations, and the binding site may change over time. Note increasing fitnesses.
each protein sequence in a deme is computed as above. Each of
the N sequences is cloned and placed into the offspring generation with probability proportional to its fitness until there are
N offspring. Random mutations are applied, and if there are
multiple demes, random migration occurs among them.
2.6. Experimental design
We screened many random lattice protein sequences of length 12
to generate 1600 sequences that fold stably (DGf , 0), in an
approximately uniform distribution of DGf between 23.5 and
0 kcal mol21. In an experimental scenario using L lineages (e.g.
L ¼ 4), we select L ‘founding’ sequences out of the 1600 prescreened stable sequences. We generate N/L clones of each
founding sequence to construct L initial lineages with a total of
individuals. In a given replicate of a scenario, the membership
of each lineage will vary over time, and lineages may go extinct.
In each replicate, we record the number of living members of
each of the L lineages at each generation, in order to estimate
p[N(t) ¼ n], which allows us to compute Wk (t) and Sk (t), for
each lineage, for a given scenario.
In the first set of experiments below, we generated 32 scenarios,
each of which comprised a fixed set of L ¼ 4 founders (chosen
from the 1600 stable sequences), and a single fixed ligand.
A single deme (D ¼ 1) of fixed population size, N ¼ 16 384, was
used. We ran R ¼ 512 replicates of each scenario (which will
ensure a 95% chance that the probabilities of all outcomes will
be estimated within 5% [19]), for a duration of genmax ¼ 2000 generations each. Stochasticity across replicates of a given scenario
enters only during mutation, and sampling to the population
size N. Mutations were applied at the rate of m ¼ 0.0005 per residue
per generation ( producing about 10 mutations per generation).
In the second set of experiments below, the number of founders was increased to L ¼ 128. Again, we generated 32 such
scenarios, and N ¼ 16 384, D ¼ 1, R ¼ 512 and genmax ¼ 2000.
In the third set of experiments below, we subdivided the
population into D ¼ 32 demes of N ¼ 512 individuals each
(total population, ND ¼ 16 384). For each of 16 scenarios, we
selected L ¼ 128 founders, assigning them to fixed demes
within a scenario. One random ligand was assigned to each
deme, and this was fixed within a scenario. Again, we performed
R ¼ 512 replicates; however, genmax was increased to 10 000 generations, because population subdivision slows the attainment of
equilibrium. Here, additional stochasticity enters in the selection
of random migrants between the demes.
3. Results
3.1. Evolution is predictable in lattice proteins
In figure 2, each of the three rows presents the results of a distinct experimental ‘scenario’ of our first set of experiments,
which involve a single deme and L ¼ 4 founding lineages.
(We conducted 32 such scenarios, selecting three of them
for inclusion in figure 2.) Each of the pre-screened stable
founding sequences is assigned a unique number from 1 to
1600; each lineage is uniquely identified by the sequence
number of its founder. The legends in each row identify the
four lineages included in the scenario: for example, in the
H(k) (bits)
H(k) (bits)
0140
0665
1018
0788
0736
1051
0742
0024
joint
9
8
7
6
5
4
3
2
1
0
1116
1184
0044
0088
joint
H(k) (bits)
4.0
3.5
3.0
2.5
2.0
1.5
1.0
0.5
0
9
8
7
6
5
4
3
2
1
0
9
8
7
6
5
4
3
2
1
0
0140
0665
1018
0788
joint
0736
1051
0742
0024
0.8
Sk(0)
4.0
3.5
3.0
2.5
2.0
1.5
1.0
0.5
0
1116
1184
0044
0088
1.0
0.6
0.4
0.2
0
1116
1184
0044
0088
1.0
Sk(0)
0.8
0.6
0.4
0.2
0
1.0
0140
0665
1018
0788
0.8
Sk(0)
Wk(0)
Wk(0)
0736
1051
0742
0024
0.6
0.4
0.2
1
10
100
generations
1000
0
1
10
100
generations
1000
1
10
100
generations
1000
Figure 2. Four competing lineages (L ¼ 4, N ¼ 16 384, D ¼ 1, R ¼ 32). Each row plots data for one ‘scenario’, corresponding to four founding sequences and a
target ligand. The three columns plot the k-fitness, Wk(0); the k-survivability, Sk(0); and the entropy, H(k), of the survival state, including the entropy of the joint
survival state of all lineages (black). For R ¼ 512 replicates, the maximum measureable entropy is log2(512) ¼ 9 bits, as indicated by the dotted line in each of the
right panels. However, for these scenarios, the actual entropy remains well below this maximum measurable level. (The folded proteins shown in figure 1 belong to
the green lineage (0665) in the third row of figure 2. See also figure 8.)
first row of figure 2, sequences 0736, 1051, 0742 and 0024
were cloned to found the four lineages. In the legends, the
lineage numbers are ordered by the total ‘weight’ [20] of
their lineage, i.e. the sum of the counts of their members
over all generations. (This ‘weight’ is a simple way to rank
the lineages in a given scenario.) The three columns show
the k-fitness, k-survivability, and the entropy of the survival
state, respectively, for the three scenarios. The error bars in
the panels in the left column indicate the standard error of
the mean of Wk(0).
In the top left panel of figure 2, where the k-fitness is
plotted for the first scenario, lineage 0736 (red) is shown to
be the best adapted to the target ligand in the short term.
Its W1(0) is approximately 2.0, which can be read along the
y-axis (note that generations are plotted on a log scale, and
the y-axis corresponds to generation 1). That is, in the first
generation, the membership of the red lineage increases by
a factor of approximately 2.0 on average (over R ¼ 512 replicates). It quickly comes to dominate the population, with a
medium- and long-term Wk(0) of approximately 4; this is
the maximum possible k-fitness, because each lineage starts
with N/L members (with L ¼ 4) and can increase to at
most N. In the centre panel of the first row, where Sk(0) is
plotted, it can be seen that lineages 0742 (blue) and 0024
( purple) have a 0 per cent estimated probability of surviving
past generation 16, or 20, respectively. Lineage 1051 (green)
has an almost 100 per cent chance of extinction shortly thereafter, leaving lineage 0736 with an almost 100 per cent chance
of surviving (again, as estimated over 512 replicates) in the
medium and long term.
The entropy of the survival state of each lineage Hi (k) is
shown in the right panel of each row (coloured lines), along
with the entropy of the joint survival state H joint (k) (black
line). For R ¼ 512 replicates, the maximum observable joint
entropy is log2(512) ¼ 9 bits; this maximum reading would
be achieved if a different outcome ( joint survival state)
occurred in every replicate. The horizontal dotted line across
the top of each panel in the right column of figure 2 indicates
this maximum possible measurement.
In the top right panel of figure 2, after about generation
30, the joint entropy (black line) for the first scenario is
almost zero: in almost all replicates, the same survival state
occurs, corresponding to the survival of lineage 0736 and
the extinction of all other lineages. There is almost no uncertainty in the joint survival state after this point; thus,
evolution is very predictable after k ¼ 30. If we performed a
new replicate of this scenario, we could be very confident
that lineage 0736 (red) would again dominate in the long
term. (There is a small chance that lineage 1051 (green)
would survive.) Notice that before generation 4 there is also
5
J R Soc Interface 10: 20130026
4.0
3.5
3.0
2.5
2.0
1.5
1.0
0.5
0
rsif.royalsocietypublishing.org
Wk(0)
Downloaded from http://rsif.royalsocietypublishing.org/ on June 18, 2017
Downloaded from http://rsif.royalsocietypublishing.org/ on June 18, 2017
6
9
average joint
average H joint(k) (bits)
8
rsif.royalsocietypublishing.org
7
6
5
4
3
2
1
0
1
10
100
generations
1000
Figure 3. Average of the joint entropy, H joint (k), over 32 scenarios, for L ¼ 4.
The error bars show the standard error of the mean. Evolutionary outcomes
are very predictable in the long term.
flip of a fair coin (one bit). Therefore, we can say that, for L ¼ 4,
evolution is very predictable in lattice proteins.
3.2. Is evolution still predictable for larger numbers of
competing lineages?
For our second set of experiments, we increase L to 128. In
figure 4, we plot several scenarios (again, one scenario per
row) with a larger number of founding lineages: L ¼ 128.
As before, N ¼ 16 384, D ¼ 1, and R ¼ 512 replicates. For
clarity, only the five lineages with the heaviest ‘weight’
(sum of member count across generations) are explicitly
named in the panel legends for each of three scenarios
(rows); however, all 128 lineages are plotted until extinction.
The top row of figure 4 shows a scenario for which a single
lineage, 0957 (red), is initially fittest, and also dominates in the
long term. It attains a long-term fitness of about 60; the maximum possible fitness is 128 in this case (because each lineage
is initialized with N/L members and can increase to at most
N, where L ¼ 128). As seen in the top centre panel, the red
lineage has an approximately 45 per cent chance of long-term
survival, with several other lineages having non-zero chances
of surviving in the long term.
In the top right panel of figure 4, the maximum observable joint entropy is indicated with a horizontal dotted line:
because there are R ¼ 512 replicates, the maximum observable joint entropy is log2(512) ¼ 9 bits. Note that because
there are L ¼ 128 lineages, there are 2128 possible joint survival states, or a maximum actual joint entropy of log2(2128) ¼
128 bits. Thus, between generations 3 and 6, the measured
entropy is ‘clipped’ by the dotted horizontal line: the actual
entropy is higher than we can observe with only 512 replicates. This is because many of the 128 lineages go extinct in
the first six generations, but their precise order of extinction
is uncertain. However, in all three of the scenarios shown,
the actual entropy decreases to well below the maximum
observable nine bits in the long term.
The second row shows a scenario in which the initially
best-adapted lineage does not dominate in the long term:
the green lineage (0898) is initially more fit in the short
term, but later fails to adapt as well as the red lineage
(0001), which dominates in the long term. The long-term outcome is quite certain: only around 0.1 bits of joint entropy.
J R Soc Interface 10: 20130026
zero uncertainty: all lineages reliably survive until at least
generation 3. The k of highest joint uncertainty (nearly 1.5
bits) is k ¼ 7. In an additional replicate, it would be uncertain
whether the blue and purple lineages would yet be extinct, as
can be seen by the plots of Hi (k) for lineages 0742 and 0024
(blue and purple lines in top right panel). A secondary
peak in joint uncertainty occurs because the green lineage
tends to go extinct at an uncertain moment between generations
10 and 30.
In the second row of figure 2, lineage 1184 (green) is
initially better adapted to the target ligand, and increases
rapidly in membership; however, lineage 1116 (red) adapts
more rapidly, and supersedes lineage 1184. Thus, although
the green lineage has a superior W1(0) of about 2.0 (as can
be read along the y-axis of the left panel in the second row
of figure 2), the red lineage has a superior W2000(0) of 3.75.
Ultimately, the red lineage has approximately 95 per cent
chance of surviving in the long term, and the green lineage
has approximately 5 per cent chance of surviving in the
long term (see middle panel of second row). (The blue lineage
also has a small chance of surviving in the long term.) Thus,
while the green lineage is initially better adapted to the
ligand, the red lineage has higher long-term survivability,
to the extent that red drives green extinct approximately
95 per cent of the time.
Note that, at generation 4, the k-fitness (left panel, second
row) of lineage 1116 reaches a minimum of approximately
0.3. Because each lineage was initialized with 4096 members,
this means that the red lineage had approximately 1229
members at generation 4, on average, although the number
may fluctuate across replicates. If the total population size
(N ¼ 16 384) were smaller (not shown), the red lineage
might have gone extinct early in some replicates, rather
than discovering the adaptation(s) that led it to a high
k-fitness in the long term. (See Palmer & Feldman [10] for a
discussion of the relationship between population size and
the conflict between long-term and short-term fitness. This
conflict is also discussed by Clune et al. [21].) In the rightmost
panel of the second row, we see a joint survival entropy
(black line) of only 0.4 bits in the long term. The plots of
the lineage entropies (coloured lines in rightmost panel) indicate that the purple and blue lineages contribute to higher
joint entropy mostly in the initial generations (k less than
about 35). Later, the joint entropy is due to uncertainty in
the survival state of the red and green lineages.
In the third row of figure 2, again the lineage that does
best initially ( purple, 0788) is not the one that dominates in
the long term (red, 0140); although the purple lineage has a
W1(0) of approximately 1.6 (as seen in the bottom left
panel), it has a 100 per cent chance of extinction by approximately generation 30 (as seen in the bottom centre panel).
A maximum uncertainty in the joint survival state (approx.
1.8 bits) arises around generation 13, but the long-term joint
entropy is less than one bit.
For this first set of experiments (L ¼ 4, in a single deme),
we performed 32 different scenarios, with R ¼ 512 replicates
each. (Just three of them were included in figure 2.) Figure 3
plots the entropy of the joint survival state, averaged over all
32 scenarios. This indicates how predictable the k-generation
evolutionary outcome will be in general for such scenarios
(i.e. for L ¼ 4, N ¼ 16 384, with sequences and ligands selected
in the same way). In the long term (k . 500), there is low uncertainty in the outcome: only about 0.9 bits, more certain than the
Downloaded from http://rsif.royalsocietypublishing.org/ on June 18, 2017
Sk(0)
0.6
1439
0.4
40
0001
9
8
7
6
5
4
3
2
1
0
0150
0.2
0
0
0001
0898
100
0876
80
0982
0001
1.0
0898
0.8
0876
0693
60
Sk(0)
0693
40
0.6
0.4
20
0.2
0
0
0150
120
0437
100
0315
80
0709
0982
0150
1.0
0437
0.8
0315
1327
40
Sk(0)
60
1327
0.6
0709
0.4
0.2
20
0
0
1
10
100
generations
1000
1
10
100
generations
1000
1131
1323
1439
joint
0898
0876
0693
0982
joint
0437
0315
1327
0709
joint
1
10
100
generations
1000
Figure 4. One hundred and twenty-eight competing lineages (L ¼ 128, N ¼ 16 384, D ¼ 1, R ¼ 32). Each row plots data for one ‘scenario’. In the right-hand column, the
actual joint entropy is ‘clipped’ by the maximum observable entropy (dotted line) early in each scenario; more replicates would be needed to measure higher levels of entropy.
3.3. A negentropy measure of uncertainty removed by
an evolutionary process
In thermodynamics and information theory, a quantity called
the negentropy [22,23] is used to describe the difference between
the maximum possible uncertainty in a system and the current
uncertainty. If Hmax is the maximum entropy of a system and
Hcurr is the current entropy, then the ‘negentropy’ is defined as
N ¼ Hmax Hcurr :
This quantity is also useful for describing evolutionary
processes. For an evolving system of L lineages, there are 2L
9
average joint
8
average Hjoint(k) (bits)
In the third row, a variety of lineages may survive to the long
term in different replicates; this produces (as seen in the
bottom right panel) a higher long-term joint entropy of
about 3.7 bits: the long-term outcome in this scenario is
much more uncertain.
In figure 5, we plot the average joint entropy over 32
different L ¼ 128 scenarios. Surprisingly, the average joint
entropy in the long term is still quite low: only about two
bits! This is comparable to only four (or so) of the 128 lineages
being in contention to dominate in the long term.
While it is interesting that the L ¼ 4 scenarios yielded an
average joint uncertainty of 0.9 bits in the long term (figure 3),
it seems even more remarkable that the L ¼ 128 scenarios
should yield an average joint uncertainty of only two bits
(figure 5), given that there are so many more possible outcomes when L ¼ 128. How should we compare these two
situations quantitatively?
7
6
5
4
3
2
1
0
1
7
0815
10
100
generations
1000
Figure 5. Average of the joint entropy, H joint (k), over 32 scenarios, for L ¼
128. Even with many more possible outcomes, the long-term outcomes are
very predictable.
possible values of the ‘joint survival state’ vector. The most
uncertain distribution is when all states have an equal probjoint
ability of 1/2L, producing a maximum entropy of Hmax ¼ L
bits. Thus, if H joint (k) is the joint entropy of a set of L lineages
at generation k, then the joint negentropy at generation k is
N joint ðkÞ ¼ L H joint ðkÞ:
This tells us how much uncertainty is removed by observing
many replicates of a particular evolutionary process (as specified
by the scenario).
J R Soc Interface 10: 20130026
120
Wk(0)
9
8
7
6
5
4
3
2
1
0
1131
1323
20
Wk(0)
H(k) (bits)
0.8
1439
60
0957
0815
1323
80
9
8
7
6
5
4
3
2
1
0
H(k) (bits)
1131
Wk(0)
100
0957
1.0
rsif.royalsocietypublishing.org
0815
H(k) (bits)
0957
120
Downloaded from http://rsif.royalsocietypublishing.org/ on June 18, 2017
80
H(k) (bits)
0.8
0.6
Sk(0)
Wk(0)
100
0097
0876
0262
1339
0716
1.0
60
0.4
40
0.2
20
0
0
0097
1529
0128
0338
0615
0.8
Sk(0)
Wk(0)
80
60
0.6
0.4
40
0.2
20
0
0
1490
1119
1563
1208
0978
100
0.8
Sk(0)
Wk(0)
80
60
1490
1119
1563
1208
0978
1.0
H(k) (bits)
120
0.6
0.4
40
0.2
20
0
1
10
1000
100
generations
10 000
0
1
10
100
1000
generations
10 000
0097
0876
0262
1339
0716
joint
9
8
7
6
5
4
3
2
1
0
0097
1529
0128
0338
0615
joint
9
8
7
6
5
4
3
2
1
0
1490
1119
1563
1208
0978
joint
1
10
100
1000
generations
10 000
Figure 6. Metapopulations of multiple demes (L ¼ 128, N ¼ 512, D ¼ 32, total population ND ¼ 16 384, R ¼ 512). In a metapopulation, multiple lineages may coexist in
the long term.
N joint (k) is equivalent to the mutual information [24,25]
I(X;Y ) between X, the outcome of a single new replicate,
and Y, the set of outcomes of the R replicates:
N joint ðkÞ ¼ IðX; YÞ ¼ HðXÞ HðXjYÞ:
This can be read as ‘the uncertainty in X (without knowing
Y ) minus the uncertainty in X given Y0.
We will use the following qualitative terminology. A scenario is predictable, but not surprisingly so when H joint (k) is low,
but N joint (k) is also low. A scenario is surprisingly predictable
when H joint (k) is low, and N joint (k) is high. A scenario is
simply not predictable when H joint (k) is high. In our first set
of experiments (L ¼ 4), in the long term, H joint (k) ¼ 0.9
(according to figure 3), yielding N joint (k) ¼ 3.1. In our
second set of experiments (L ¼ 128), H joint (k) ¼ 2.0 (according
to figure 5), yielding N joint (k) ¼ 126. In both cases, evolution
is quite predictable; but in the latter case, the predictability is
much more surprising.
3.4. A structured metapopulation still yields
surprising predictability
Note that, in the Wright–Fisher model with fixed population
size and fixed fitnesses, one lineage will eventually fix in a
single deme. We excluded this knowledge of the evolutionary
process when computing the maximum possible entropy.
However, assuming this knowledge a priori would eliminate
much uncertainty: the number of possible long-term outcomes
would go from 2L to L. By contrast, in a structured population
of multiple demes, it may take much longer for a single lineage
to fix [26]: multiple lineages may persist for a long time in a
subset of the demes, especially if selection is different in
each deme. This raises the question of whether evolution
will still be so surprisingly predictable in multiple demes.
Therefore, we conducted a third set of experiments, in
which the population was divided into D ¼ 32 demes, with
migration among them at a rate of mig ¼ 0.01 per individual
per generation. L ¼ 128, and the assignment of founders to
demes was fixed per scenario. The population size in each
deme was N ¼ 512, for a total population of ND ¼ 16 384,
and R ¼ 512. Because evolution can take longer to reach
a stationary distribution in a structured population, we
increased genmax to 10 000 generations. A different target
ligand was used in each of the 32 demes (fixed per scenario),
so that for a lineage to dominate the entire metapopulation, it
must adapt to bind a different ligand in every deme. Nevertheless, it was still possible to find scenarios in which a single
lineage typically dominates in the long term.
The first row of figure 6 shows such a scenario: the red
lineage (0097) dominates from the beginning, and has an estimated 100 per cent survival probability in the long term. The
long-term entropy in the joint survival state of about 3.5 bits
is due to the possibility that a variety of lineages may coexist
(i.e. in different demes) with the ever-present red lineage in
the long term.
In the second row of figure 6, the red lineage (0097) has a
higher long-term fitness, W10 000(0), than the green lineage
(1529, left panel of second row); however, both of these
lineages have a high long-term survivability, S10 000(0): 1.0
and approximately 0.85, respectively: the red and green
lineages usually both survive in the long term. This is
because the green lineage ‘specializes’ in a subset of
the demes (not shown). Only in the very long term
8
J R Soc Interface 10: 20130026
100
0097
1529
0128
0338
0615
1.0
H(k) (bits)
120
9
8
7
6
5
4
3
2
1
0
rsif.royalsocietypublishing.org
0097
0876
0262
1339
0716
120
Downloaded from http://rsif.royalsocietypublishing.org/ on June 18, 2017
9
average Hjoint(k) (bits)
7
6
5
4
3
2
1
10
100
generations
1000
Figure 7. Average of the joint entropy, H joint (k), over 16 scenarios, for L ¼
128, N ¼ 512, D ¼ 32. Even with many possible outcomes, and a structured
population with a different target in each deme, evolutionary outcomes are
still very predictable.
(k approaching 10 000) does it begin to go extinct in some
replicates (as the red immigrants eventually invade; not
shown). Note that the long-term joint entropy (right panel,
second row) is increasing as we pass generation 10 000,
because of the uncertain timing in the extinction of the
green lineage, in some replicates, around this time.
In the third scenario of figure 6, multiple lineages each fix
in a subset of the demes (not directly shown); thus, many
lineages have a high chance of long-term survival. In the
centre panel of the third row, at generation k ¼ 10 000, the
total survival probability of all lineages sums to well above
1.0: three or four lineages often survive this long. In a structured population, multiple lineages may coexist for a long
time, so that the sum of Sk(0) over all lineages can be greater
than 1.0, even at high k. (In contrast, in figures 2 and 4, this
sum was no more than 1.0 at high k, because one lineage
will fix.) The long-term joint entropy is much higher here
(approx. 5) because of the uncertainty in the extinctions of
the various lineages, which may fix in a few demes, or go
extinct, in any given replicate.
Figure 7 shows the average entropy of the joint survival
state taken over 16 scenarios of the multi-deme case (i.e.
L ¼ 128, D ¼ 32, N ¼ 512). In the long term, H joint (k) is only
about 4.5 bits, yielding N joint (k) of 123.5 bits at high k; this
is ‘surprising’. Despite the separation of the population into
multiple demes, evolution is quite predictable (H joint (k) is
low), and surprisingly so (N joint (k) is high).
In summary, for interesting, non-degenerate experimental
set-ups, we have shown that evolution can be surprisingly
predictable in lattice proteins.
3.5. The likelihood of long-term success is a property of
the lineage
In figure 8, we plot the evolutionary trajectories of individual
sequences from the third scenario (row) of figure 2. Each of the
four panels plots the trajectories of the members of four
lineages: 0140 (red), 0665 (green), 1018 (blue) and 0788
( purple). In figure 2, NðtÞ,
the expected number of members
at generation t, estimated across all replicates (R ¼ 512), is
shown on a log scale. The black line in each panel shows
the total NðtÞ
for the specified lineage as a whole. The
4. Discussion
In a particular model, there are two ways that Wk(t) and Sk(t)
could fail to have the predictive utility of the standard onegeneration fitness: (i) evolutionary outcomes could simply
not be very predictable over the long term or (ii) outcomes
could be so strongly predicted by the short-term fitness that
the long-term metrics are uninteresting. In the lattice protein
model, we have shown that long-term evolutionary outcomes
in a single deme and in multiple demes are predictable, and
surprisingly so: the entropy of the joint survival state is low
in the long term, whereas its negentropy is high. We have
shown that the immediate-term adaptedness of a lineage contributes to, but does not entirely determine, its chance of
success in the long term. The long-term success of a lineage
depends, additionally, on its tendency to generate adaptive
variation, its tendency to avoid deleterious variation, its ability to diversify in phenotype, and its ability to physically
disperse [10].
That long-term outcomes are predictable, but not fully
determined by the short-term fitness, implies that lineages
may be selected for their long-term properties, sometimes
in opposition to short-term selective forces; evolution need
not be short-sighted [21]. Selection at some time scales and
in some situations may best be summarized as selection on
organisms. However, in some cases, often in the longer
term, lineage-level selection may provide the best model by
describing evolution both simply and with sufficient accuracy. We might slightly increase our descriptive accuracy by
including the voluminous details of organismal-level selection; but this would hide higher-level consistencies that
could be concisely captured as lineage-level selection.
The ‘viewpoint’ of the lineage-as-individual is quite alien
to that of the organism-as-individual. To the lineage, organisms are expendable, until their numbers become small.
The successful lineage is one that continues to have some
surviving descendants, not necessarily one that has many
descendants. Lineages are potentially immortal, and may
change genetically without bound over time.
Is there a simple property of the founding sequence of a
lineage that will strongly predict its long-term fitness?
J R Soc Interface 10: 20130026
0
1
9
rsif.royalsocietypublishing.org
average joint
8
coloured lines in each panel show NðtÞ
for each unique
sequence belonging to the lineage; all sequences that attain
an NðtÞ
of 10 or more at some generation t are plotted, and
the 10 sequences attaining the highest peak values of NðtÞ
are labelled with their sequence names. In all panels, the
founding sequence of each lineage (i.e. FCTF KIIN CEWV
for lineage 0140 (red), MVNL TLFS VTLM for lineage 0665
(green), FLEL TCLN NPCF for lineage 1018 (blue) and
IWPK AHML SHNY for lineage 0788 ( purple)) goes extinct
within 10 generations. We include these plots to illustrate
how natural it is to consider the lineage as an entity that possesses a long-term fitness and survivability: no single sequence
completely determines the long-term expected success of the lineage; rather, success is determined partly by the immediate
fitness of the founding sequence, and partly by the fitness
of descendant sequences.
Significant neutral divergence is apparent at high k in the
red and green panels. (The 10 highest weight sequences in
the upper right panel (green, 0665) are the same sequences
shown in folded form in figure 1.)
Downloaded from http://rsif.royalsocietypublishing.org/ on June 18, 2017
lineage 0140
10
MVNL TLFS VTLM
MVNL VLFS VTLM
FCTF FFIF CEWV
FCTF FIIM CEWV
FCTF FIIF CEWV
FCTF WIIN CEWV
FCTF VIIN CEWV
FCTF KIIV CEWV FCTF FFIM CEWV
FCTF KIIW CEWV
FCTF KIIF CEWV
FCTF KIIN CEWV
_
N(t)
1000
MVNL VFFS VTLM
MFNL TLFS VFLM
MVNL TLFV VTLM MFNL TLFS VFFM
MVNL TLFS VFLM
MFNL TLFS VFFI
MINL TLFS VFLM
MVNL TLFS VLLM
100
J R Soc Interface 10: 20130026
10
lineage 1018
lineage 0788
IWPK AHML SHNY
10 000
FLEL TCLN NPCF
_
N(t)
1000
IWPK ACML SHNY
FFML SWLN VPFF
FFIL TCLN NPFF
FFIL CCLN VPFF
FLIL TCLN NPFF
100
FMIL TCLN NPFF
FLIL TCLN NPIF FFML QWLE YAFF
FLIL TCLN NPCF
FFIL TCLN IPFF
10
1
10
100
1000
generations
1
10
100
1000
generations
for each unique sequence belonging to the four lineages shown in the third scenario of figure 2 (L ¼ 4). All sequences
Figure 8. Plots of expected counts, NðtÞ,
are labelled. Long-term success is best considered as a property of
attaining NðtÞ of at least 10 at some t are plotted; the 10 sequences with the highest peak NðtÞ
a lineage. (The lineages shown are taken from the scenario in the third row of figure 2. The 10 highest ‘weight’ sequences of lineage 0665 (green) are shown in
folded form in figure 1.)
Bloom et al. [27] claimed that the stability of a folded protein
promoted ‘evolvability’ (defined differently). By contrast, we
found a weak negative correlation between the stability of the
founding sequence and its long-term fitness (not shown).
We have come to suspect that there may be no simple property of a sequence that strongly predicts its long-term fitness.
Instead, the mutational landscape, the fitness landscape
(binding process) and the genotype –phenotype map (folding
process), together with the other scenario details, define a
multi-dimensional set of evolutionary ‘pathways’ that the
descendants of a given founder are likely to follow. These
pathways cannot be simply predicted from the high-level
properties of a single sequence, e.g. from its initial stability.
Although these high-dimensional pathways are difficult for
us to visualize, they are persistent across replicates of a
given scenario. They describe basins of attraction that pull
the lineages through the evolutionary process, to repeatable
outcomes. The long-term fitness and survivability of a lineage
characterize these basins of attraction in a given scenario.
Moreover, analogous to how the survival state of all lineages
i can be combined into a joint survival state, the entire evolutionary system can be considered to be moving along a
much higher-dimensional set of pathways, from its initial
state (all lineages with equal numbers of members) to a
restricted set of possible outcomes (one or a few particular
lineages surviving).
We have previously discussed the metrics Wk(t) and Sk(t)
in the context of simpler models [10]. In this study, our intent
rsif.royalsocietypublishing.org
10 000
lineage 0665
has been to demonstrate the metrics in the more biologically
realistic model of lattice proteins. The lattice protein model
resembles real protein folding, but is simpler in several
ways: (i) here, it is in two dimensions, rather than three;
(ii) we consider only the lowest-energy conformation of the
protein, and the lowest-energy binding site; and (iii) the proteins are short. Nonetheless, these model proteins do retain
important features of real proteins, producing a complex
evolutionary landscape that has some compatibility to that
of real proteins. If long-term predictability had been absent
in lattice protein evolution, or if the short-term fitness had
completely predicted the long-term fitness, then we would
have held out little hope for the utility of Wk(t) and Sk(t) in
biological models.
Do real proteins, and real lineages of organisms, also have
a predictive long-term fitness that is distinct from their shortterm fitness? There is evidence in real proteins [28 –30] and in
real bacteria [11,31] that the number of selectively permissible
evolutionary pathways from a maladapted to an adapted
state may be quite low, which would enhance predictability
of outcomes. We think it likely that such a long-term fitness, and surprising predictability, will soon be measured in
experimental evolution with micro-organisms.
This work was supported in part by NIH grant no. GM28016. Many
thanks to Jesse Bloom for graciously sharing his lattice protein folding code [15,18,27,32]. Thanks to our anonymous reviewers for
important insights and suggestions.
Downloaded from http://rsif.royalsocietypublishing.org/ on June 18, 2017
References
1.
13.
14.
16.
17.
18.
19.
20.
21.
22. Brillouin L. 1953 The negentropy principle of
information. J. Appl. Phys. 24, 1152– 1163. (doi:10.
1063/1.1721463)
23. Gibbs JW. 1873 A method of geometrical
representation of the thermodynamic properties of
substances by means of surfaces. Trans. Connecticut
Acad. Arts Sci. 2, 382 –404.
24. Adami C. 2004 Information theory in molecular
biology. Phys. Life Rev. 1, 3–22. (doi:10.1016/j.
plrev.2004.01.002)
25. Cover TM, Thomas JA. 2006 Elements of information
theory. London, UK: Wiley-Interscience.
26. Slatkin M. 1981 Fixation probabilities and fixation
times in a subdivided population. Evolution 35,
477–488. (doi:10.2307/2408196)
27. Bloom JD, Labthavikul ST, Otey CR, Arnold FH. 2006
Protein stability promotes evolvability. Proc. Natl
Acad. Sci. USA 103, 5869–5874. (doi:10.1073/pnas.
0510098103)
28. Costanzo MS, Brown KM, Hartl DL. 2011 Fitness
trade-offs in the evolution of dihydrofolate
reductase and drug resistance in Plasmodium
falciparum. PLoS ONE 6, e19636. (doi:10.1371/
journal.pone.0019636)
29. Lozovsky ER et al. 2009 Stepwise acquisition of
pyrimethamine resistance in the malaria parasite.
Proc. Natl Acad. Sci. USA 106, 12 025–12 030.
(doi:10.1073/pnas.0905922106)
30. Weinreich DM, Delaney NF, DePristo MA, Hartl DL.
2006 Darwinian evolution can follow only very few
mutational paths to fitter proteins. Science 312,
111–114. (doi:10.1126/science.1123539)
31. Blount ZD, Borland CZ, Lenski RE. 2008 Historical
contingency and the evolution of a key innovation
in an experimental population of Escherichia coli.
Proc. Natl Acad. Sci. USA 105, 7899–7906. (doi:10.
1073/pnas.0803151105)
32. Bloom JD, Raval A, Wilke CO. 2006 Thermodynamics
of neutral protein evolution. Genetics 175,
255–266. (doi:10.1534/genetics.106.061754)
J R Soc Interface 10: 20130026
15.
with viral genetic barcoding. Nat. Biotechnol. 29,
928 –933. (doi:10.1038/nbt.1977)
Lauring AS, Andino R. 2011 Exploring the fitness
landscape of an RNA virus by using a universal
barcode microarray. J. Virol. 85, 3780–3791.
(doi:10.1128/JVI.02217-10)
Gerrits A, Dykstra B, Kalmykowa OJ, Klauke K,
Verovskaya E, Broekhuis MJC, de Haan G, Bystrykh
LV. 2010 Cellular barcoding tool for clonal analysis
in the hematopoietic system. Blood 115,
2610 –2618. (doi:10.1182/blood-2009-06-229757)
Bloom JD, Wilke CO, Arnold FH, Adami C. 2004
Stability and the evolvability of function in a model
protein. Biophys. J. 86, 2758– 2764. (doi:10.1016/
S0006-3495(04)74329-5)
Dill KA, Bromberg S, Yue K, Fiebig KM, Yee DP,
Thomas PD. 1995 Principles of protein folding: a
perspective from simple exact models. Protein Sci. 4,
561 –602. (doi:10.1002/pro.5560040401)
Miyazawa S, Jernigan RL. 1985 Estimation of
effective interresidue contact energies from protein
crystal structures: quasi-chemical approximation.
Macromolecules 18, 534– 552. (doi:10.1021/
ma00145a039)
Bloom JD, Silberg JJ, Wilke CO, Drummond DA,
Adami C, Arnold FH. 2005 Thermodynamic
prediction of protein neutrality. Proc. Natl Acad. Sci.
USA 102, 606–611. (doi:10.1073/pnas.
0406744102)
Thompson SK. 1987 Sample size for estimating
multinomial proportions. Am. Stat. 41, 42 –46.
Desai MM, Fisher DS. 2007 Beneficial mutation
selection balance and the effect of linkage on
positive selection. Genetics 176, 1759– 1798.
(doi:10.1534/genetics.106.067678)
Clune J, Misevic D, Ofria C, Lenski RE, Elena SF,
Sanjuán R. 2008 Natural selection fails to optimize
mutation rates for long-term adaptation on rugged
fitness landscapes. PLoS Comput. Biol. 4, e1000187.
(doi:10.1371/journal.pcbi.1000187)
rsif.royalsocietypublishing.org
Kirschner M, Gerhart J. 1998 Evolvability. Proc. Natl
Acad. Sci. USA 95, 8420– 8427. (doi:10.1073/pnas.
95.15.8420)
2. Houle D. 1992 Comparing evolvability and variability
of quantitative traits. Genetics 130, 195–204.
3. Wagner GP, Altenberg L. 1996 Perspective: complex
adaptations and the evolution of evolvability.
Evolution 50, 967–976. (doi:10.2307/2410639)
4. Jones AG, Arnold SJ, Bürger R. 2007 The mutation
matrix and the evolution of evolvability. Evolution
61, 727–745. (doi:10.1111/j.1558-5646.2007.
00071.x)
5. Altenberg L. 1994 The evolution of evolvability in
genetic programming. In Advances in genetic
programming (ed K Kinnear), pp. 47 –74.
Cambridge, MA: MIT Press.
6. Draghi J, Wagner GP. 2009 The evolutionary
dynamics of evolvability in a gene network model.
J. Evol. Biol. 22, 599–611. (doi:10.1111/j.14209101.2008.01663.x)
7. Draghi J, Wagner GP. 2008 Evolution of evolvability
in a developmental model. Evolution 62, 301–315.
(doi:10.1111/j.1558-5646.2007.00303.x)
8. Griswold CK. 2006 Pleiotropic mutation, modularity
and evolvability. Evol. Dev. 8, 81 –93. (doi:10.1111/
j.1525-142X.2006.05077.x)
9. Palmer ME, Feldman MW. 2011 Spatial
environmental variation can select for evolvability.
Evolution 65, 2345 –2356. (doi:10.1111/j.15585646.2011.01283.x)
10. Palmer ME, Feldman MW. 2012 Survivability is
more fundamental than evolvability. PLoS ONE 7,
e38025. (doi:10.1371/journal.pone.0038025)
11. Woods RJ, Barrick JE, Cooper TF, Shrestha U, Kauth MR,
Lenski RE. 2011 Second-order selection for evolvability
in a large Escherichia coli population. Science 331,
1433–1436. (doi:10.1126/science.1198914)
12. Lu R, Neff NF, Quake SR, Weissman IL. 2011
Tracking single hematopoietic stem cells in vivo
using high-throughput sequencing in conjunction
11