The dynamics of adaptation on correlated fitness landscapes
Sergey Kryazhimskiy (a,1), Gašper Tkačik (a,b,1), and Joshua B. Plotkin (a,2)
(a) Department of Biology and (b) Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, PA 19104
Edited by Simon A. Levin, Princeton University, Princeton, NJ, and approved September 4, 2009 (received for review May 18, 2009)
Evolutionary theory predicts that a population in a new environment will accumulate adaptive substitutions, but precisely how
they accumulate is poorly understood. The dynamics of adaptation
depend on the underlying fitness landscape. Virtually nothing is
known about fitness landscapes in nature, and few methods allow
us to infer the landscape from empirical data. With a view toward
this inference problem, we have developed a theory that, in the
weak-mutation limit, predicts how a population’s mean fitness and
the number of accumulated substitutions are expected to increase
over time, depending on the underlying fitness landscape. We find
that fitness and substitution trajectories depend not on the full distribution of fitness effects of available mutations but rather on the
expected fixation probability and the expected fitness increment
of mutations. We introduce a scheme that classifies landscapes
in terms of the qualitative evolutionary dynamics they produce.
We show that linear substitution trajectories, long considered the
hallmark of neutral evolution, can arise even when mutations are
strongly selected. Our results provide a basis for understanding the
dynamics of adaptation and for inferring properties of an organism’s fitness landscape from temporal data. Applying these methods to data from a long-term experiment, we infer the sign and
strength of epistasis among beneficial mutations in the Escherichia
coli genome.
epistasis | fitness trajectory | substitution trajectory | weak mutation | evolution
Evolutionary theory predicts that mean fitness will increase over
time when a population encounters a new environment. This
behavior is observed in natural and laboratory populations. Yet
evolutionary theory offers few quantitative predictions for the
dynamics of adaptation (1). The primary difficulty is that adaptation depends on the shape of the underlying fitness landscape.
Unfortunately, mapping out an organism’s fitness landscape is virtually impossible because of its vast dimensionality and the coarse
resolution of fitness measurements. Moreover, because of the
scarcity of such measurements, most theoretical work has been
pursued in isolation from data.
Much of the theory of adaptation is concerned with understanding the dynamics on uncorrelated, or “rugged”, fitness landscapes.
This approach, pioneered by Kingman (2) and Kauffman and
Levin (3), has generated many important results (e.g. refs. (4–7)).
But many of these results do not extend to landscapes that are correlated. One striking example is the expected length of an adaptive
walk: It is extremely short on rugged landscapes (3, 8), but it can be
very long on correlated landscapes (9). Although data are scarce, a
long-term evolution experiment in Escherichia coli has found that
adaptation continues to proceed even after 20,000 generations in a
constant environment (10). This observation suggests that fitness
landscapes in nature are correlated.
A second body of work examines relatively realistic, complex
genotype-to-fitness maps—e.g. an RNA folding algorithm—and
studies adaptation on the resulting correlated landscapes by computer simulation (e.g. refs. (3, 11–15)). This approach provides
important insights into the process of adaptation, and it produces quantitative predictions about the specific systems being
simulated. But such results are difficult to generalize.
18638–18643
PNAS
November 3, 2009
vol. 106
no. 44
A third approach, orthogonal to the first two, was introduced by
Gillespie (16, 17) and revived more recently by Orr (8, 18, 19). It
utilizes extreme-value theory to identify features of the adaptation
process that are independent of the underlying fitness landscape.
Although helpful for understanding some fundamental properties
of evolution, this approach suffers from a few serious drawbacks.
Most importantly, by focusing on features of adaptation that are
independent of the fitness landscape, the Orr–Gillespie theory
does not elucidate how the structure of the landscape influences
adaptation, nor does it allow us to infer the landscape from empirical data. Yet this is a question of central interest in evolutionary
biology. In addition, most of the predictions of this theory concern
a single adaptive step (8, 18, 19), and those predictions that extend
to multiple steps hold again only for uncorrelated landscapes (20).
In order to address these shortcomings, we present here an elementary theory of adaptation on a correlated fitness landscape.
Our theory makes an explicit connection between the shape of
the fitness landscape and observable features of adaptation, and
it therefore allows us to infer important properties of the fitness
landscapes from data. Experimental studies of microbial evolution typically report the mean fitness of the population (21, 22)
and the mean number of accumulated substitutions (23, 24) over
time; therefore we develop a theory that predicts these dynamic
quantities, which we call the fitness and substitution trajectories,
in terms of the underlying fitness landscape.
To develop this theory, we need a sufficiently general but
tractable description of a correlated fitness landscape. As in Gillespie’s model (17), we will describe the fitness landscape by specifying the distribution of fitnesses of single-mutant neighbors for each
genotype, which we call the “neighbor fitness distribution” (NFD).
On an uncorrelated landscape, all genotypes share the same NFD.
We introduce correlations by assuming that the same NFD is
shared among genotypes that have the same fitness, but genotypes
of different fitnesses may have different NFDs. We say that such
landscapes are fitness-parametrized because the possible consequences of a mutation are determined only by the fitness of the
parental genotype (52). This framework accommodates arbitrary
correlations introduced by nonneutral mutations. But neutral networks (14, 25, 26) or mutations with equal effect but different evolutionary potential fall outside of the scope of fitness-parametrized
landscapes. Nevertheless, the space of fitness-parametrized landscapes is very large and contains most of the landscapes studied
in previous literature.
To understand this space better, we will first explore three classical fitness landscapes: the uncorrelated landscape (2, 5, 6, 20, 27),
the (additive) nonepistatic landscape (28, 29), and the landscape
Author contributions: S.K., G.T., and J.B.P. designed research; S.K. and G.T. performed
research; S.K. and G.T. analyzed data; and S.K., G.T., and J.B.P. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Freely available online through the PNAS open access option.
(1) S.K. and G.T. contributed equally to the paper.
(2) To whom correspondence should be addressed. E-mail: [email protected].
This article contains supporting information online at www.pnas.org/cgi/content/full/0905497106/DCSupplemental.
www.pnas.org / cgi / doi / 10.1073 / pnas.0905497106
Three Classical Fitness Landscapes. We describe a fitness landscape
by a family of probability distributions, $\Phi_x$: $\Phi_x(y)\,dy$ denotes the
probability that a mutation arising in an individual of fitness x will
have a fitness in [y, y + dy]. The space of fitness-parametrized landscapes includes, among others, such well-known (2, 5, 6, 20, 27, 29–31)
landscapes as (i) the “house of cards” (HOC) or uncorrelated landscapes, for which all genotypes have the same NFD,
$\Phi_x(y) = \Psi(y)$; (ii) the nonepistatic (NEPI) landscapes, for which
the distribution of fitness effects of mutations is the same for all
genotypes, so that the NFD is given by $\Phi_x(y) = \Psi(y - x)$; and (iii)
the “stairway to heaven” (STH) landscapes, for which the distribution of selection coefficients is the same for all genotypes, so that
the NFD is given by $\Phi_x(y) = x^{-1}\Psi\left(x^{-1}(y - x)\right)$.
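As a concrete illustration of the three definitions, the Python sketch below draws mutant fitnesses y from each NFD, assuming (as in Table 1) that Ψ is exponential with mean a. The function names (sample_hoc, sample_nepi, sample_sth, fraction_beneficial) are illustrative and not taken from the paper.

```python
import random

# Assumption: Psi is exponential with mean a, as in Table 1.

def sample_hoc(x, a, rng):
    """HOC: Phi_x(y) = Psi(y); the mutant fitness ignores the parental fitness x."""
    return rng.expovariate(1.0 / a)

def sample_nepi(x, a, rng):
    """NEPI: Phi_x(y) = Psi(y - x); the fitness increment y - x has mean a."""
    return x + rng.expovariate(1.0 / a)

def sample_sth(x, a, rng):
    """STH: Phi_x(y) = x^-1 Psi(x^-1 (y - x)); the selection coefficient
    s = y/x - 1 has mean a, so fitness increments scale with x."""
    return x * (1.0 + rng.expovariate(1.0 / a))

def fraction_beneficial(sampler, x, a, n, rng):
    """Monte Carlo estimate of the fraction of mutants fitter than the parent."""
    return sum(sampler(x, a, rng) > x for _ in range(n)) / n

rng = random.Random(0)
for name, sampler in [("HOC", sample_hoc), ("NEPI", sample_nepi), ("STH", sample_sth)]:
    print(name, fraction_beneficial(sampler, x=4.0, a=1.0, n=100_000, rng=rng))
```

For x = 4 and a = 1 the HOC estimate should be close to e^(-4) ≈ 0.018, whereas on the NEPI and STH landscapes essentially every mutant is fitter than its parent, in line with the note that these two landscapes contain no deleterious or neutral mutations.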
The definitions of these three well-known landscapes are summarized in Table 1, where we have assumed that the NFD follows
an exponential form. We will derive expressions for the expected
fitness and substitution trajectories on each of these landscapes.
Our results also hold qualitatively if we replace the exponential
distribution by any other distribution from the Gumbel domain
of attraction as predicted by the Orr–Gillespie theory (18). Note
that there are no deleterious or neutral mutations in the NEPI and
STH landscapes (Table 1), but our conclusions would not change
if we added such mutations (see SI Appendix).
Before we derive analytic expressions for the dynamics of adaptation on the three classical landscapes, we first develop some
intuitive expectations. On all landscapes, we expect substitutions
to accrue and the mean fitness to increase over time. For the
HOC landscapes, we expect that the rate of fitness increase
should slow down as the population becomes more adapted. To
see this slowdown, imagine a population initially at fitness $x_0$,
where $\int_{x_0}^{\infty} \Psi(y)\,dy = 0.5$, i.e. 50% of mutations are beneficial. If
Fitness and Substitution Trajectories. In order to analyze the
dynamics of adaptation, we consider an asexual population of fixed
size N that evolves according to the infinite-sites Wright–Fisher
(WF) model (see Materials and Methods for details). We assume
that the mutation rate is sufficiently small that, at most, one mutant
segregates in the population at any time (8, 17). Thus, the population is essentially always monomorphic, and it can be characterized
at each time by its fitness x. When a mutation with fitness y arises,
it either fixes instantaneously with Kimura’s fixation probability
$\pi_x(y) = \left(1 - e^{-2s_x(y)}\right)/\left(1 - e^{-2Ns_x(y)}\right)$ or is instantaneously lost with
probability $1 - \pi_x(y)$, where $s_x(y)$ is the selection coefficient (see
Materials and Methods). In this limit, the adaptive walk of the
population is described by a continuous-time, continuous-space
Markov chain. We emphasize that, in contrast to the “greedy”
adaptive walks typically studied in the literature on rugged fitness landscapes (3, 4), the adaptive walks studied here never stop.
Even if a population reaches a local fitness maximum, a deleterious
mutation will eventually fix, and the walk will continue.
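The weak-mutation dynamics just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation code: it counts time in mutations (one per loop iteration) rather than continuous time, uses $s_x(y) = y/x - 1$ as in Materials and Methods, and rewrites Kimura's formula for $s < 0$ in an algebraically equivalent, overflow-safe form. The names fixation_probability, adaptive_walk, and nepi_mutant are hypothetical.

```python
import math
import random

def fixation_probability(x, y, N):
    """Kimura's pi_x(y) = (1 - e^{-2s}) / (1 - e^{-2Ns}) with s = y/x - 1."""
    s = y / x - 1.0
    if s == 0.0:
        return 1.0 / N  # neutral limit of the formula
    if s > 0.0:
        return math.expm1(-2.0 * s) / math.expm1(-2.0 * N * s)
    # s < 0: multiply numerator and denominator by e^{2Ns} so nothing overflows.
    return (math.expm1(2.0 * s * (N - 1)) - math.expm1(2.0 * N * s)) / -math.expm1(2.0 * N * s)

def adaptive_walk(x0, n_mutations, N, sample_mutant, rng):
    """Weak-mutation walk: each new mutant instantly fixes with probability
    pi_x(y) or is lost; returns the final fitness and the number of substitutions."""
    x, substitutions = x0, 0
    for _ in range(n_mutations):
        y = sample_mutant(x, rng)
        if rng.random() < fixation_probability(x, y, N):
            x, substitutions = y, substitutions + 1
    return x, substitutions

def nepi_mutant(x, rng, a=1.0):
    """Hypothetical example landscape: NEPI with exponential increments of mean a."""
    return x + rng.expovariate(1.0 / a)

print(adaptive_walk(x0=2.0, n_mutations=1000, N=1000, sample_mutant=nepi_mutant, rng=random.Random(42)))
```

Because the NEPI landscape has no deleterious mutations, fitness in this example can only increase. Passing a HOC-style sampler instead would let deleterious mutants fix occasionally (more often when N is small), so the walk never stops, as noted above.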
We have developed a method for efficiently computing the full
ensemble distribution of fitnesses and substitutions of the population at time t, given that its initial fitness was x0 at time zero
(see SI Appendix). Here we focus on two important statistics of
these distributions: the expected fitness of the population F(t) at
time t, and the expected number of substitutions S(t) accumulated
in the population by time t. We call these quantities the fitness
trajectory and the substitution trajectory, respectively. If we measure time in the expected number of mutations, these functions
approximately satisfy the following equations (see Materials and
Methods):
$$\dot{F} = r(F), \qquad F(0) = x_0, \qquad [1]$$
$$\dot{S} = q(F), \qquad S(0) = 0, \qquad [2]$$
APPLIED
Results
a beneficial mutation arises and fixes, providing fitness $x_1 > x_0$,
then this event can only reduce the pool of remaining beneficial
mutations, i.e. $\int_{x_1}^{\infty} \Psi(y)\,dy < 0.5$. Thus, the rate of fitness increase
should be reduced as adaptation proceeds on the HOC landscape.
By contrast, on an STH landscape, we expect that the rate of fitness
increase will increase as the population adapts. Indeed, the fraction
of mutations that are adaptive does not change as fitness increases,
but the fitness increment of such mutations grows linearly with
the fitness of the parent (because the selection coefficient stays
the same). These simple considerations indicate that HOC landscapes are antagonistically epistatic, whereas STH landscapes are
synergistically epistatic. We call the landscape Φx (y) = Ψ(y − x)
nonepistatic because on this landscape the distribution of fitness
increments of mutations does not depend upon the fitness of
the parental genotype. If fitness effects were viewed multiplicatively, however, then the STH landscape would be considered
nonepistatic, although we do not adopt this convention here (see
ref. 28 for an extensive discussion on this topic). Moreover, as we
show below, the STH landscape produces unrealistic evolutionary
dynamics.
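Table 1. Classical fitness landscapes with the exponential form and the corresponding fitness and substitution trajectories obtained from Eqs. 1 and 2
Columns: NFD $\Phi_x(y)$; expected fitness increment* $r(x)$; expected fixation probability* $q(x)$; fitness trajectory $F(t)$; substitution trajectory $S(t)$.
HOC: $\Phi_x(y) = \frac{1}{a}e^{-y/a},\ y \ge 0$; $r(x) = 4a^2 e^{-x/a}$; $q(x) = 2a\,e^{-x/a}$; $F(t) = a\ln\left(e^{x_0/a} + 4at\right)$; $S(t) = \frac{1}{2}\ln\left(e^{x_0/a} + 4at\right) - \frac{x_0}{2a}$
NEPI: $\Phi_x(y) = \frac{1}{a}e^{-(y-x)/a},\ y \ge x$; $r(x) = \frac{4a^2}{x}$; $q(x) = \frac{2a}{x}$; $F(t) = \sqrt{x_0^2 + 8a^2 t}$; $S(t) = \frac{1}{2a}\left(\sqrt{x_0^2 + 8a^2 t} - x_0\right)$
STH: $\Phi_x(y) = \frac{1}{ax}e^{-(y-x)/(ax)},\ y \ge x$; $r(x) = \frac{4a^2(1+a)}{(1+2a)^2}\,x$; $q(x) = \frac{2a}{1+2a}$; $F(t) = x_0\exp\left(\frac{4a^2(1+a)}{(1+2a)^2}\,t\right)$; $S(t) = \frac{2a}{1+2a}\,t$
*Expressions for the r- and q-functions are derived in the limit $x \gg 1$ (HOC, NEPI) and under the approximation $N \gg 1$ (HOC, NEPI, STH). These approximations are highly accurate, especially for large x (see Fig. 1). See SI Appendix for details.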
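Because the r- and q-columns of Table 1 are linked to the trajectory columns by Eqs. 1 and 2, the entries can be cross-checked numerically. The sketch below (function names hypothetical) encodes Table 1 for a given a and checks, with central finite differences, that $\dot F = r(F)$ and $\dot S = q(F)$ hold along each closed-form trajectory.

```python
import math

def table1(a):
    """Closed forms from Table 1 for scale parameter a: name -> (r, q, F, S)."""
    c = 4 * a**2 * (1 + a) / (1 + 2 * a) ** 2  # STH: r(x) = c * x
    return {
        "HOC": (
            lambda x: 4 * a**2 * math.exp(-x / a),
            lambda x: 2 * a * math.exp(-x / a),
            lambda t, x0: a * math.log(math.exp(x0 / a) + 4 * a * t),
            lambda t, x0: 0.5 * math.log(math.exp(x0 / a) + 4 * a * t) - x0 / (2 * a),
        ),
        "NEPI": (
            lambda x: 4 * a**2 / x,
            lambda x: 2 * a / x,
            lambda t, x0: math.sqrt(x0**2 + 8 * a**2 * t),
            lambda t, x0: (math.sqrt(x0**2 + 8 * a**2 * t) - x0) / (2 * a),
        ),
        "STH": (
            lambda x: c * x,
            lambda x: 2 * a / (1 + 2 * a),
            lambda t, x0: x0 * math.exp(c * t),
            lambda t, x0: 2 * a * t / (1 + 2 * a),
        ),
    }

def max_relative_residual(entry, x0, times, h=1e-5):
    """Largest relative mismatch between the finite-difference derivatives
    dF/dt, dS/dt and the right-hand sides r(F), q(F) of Eqs. 1 and 2."""
    r, q, F, S = entry
    worst = 0.0
    for t in times:
        f = F(t, x0)
        dF = (F(t + h, x0) - F(t - h, x0)) / (2 * h)
        dS = (S(t + h, x0) - S(t - h, x0)) / (2 * h)
        worst = max(worst, abs(dF - r(f)) / r(f), abs(dS - q(f)) / q(f))
    return worst

for name, entry in table1(1.0).items():
    print(name, max_relative_residual(entry, x0=2.0, times=[0.5, 1.0, 5.0, 20.0]))
```

Residuals at the level of finite-difference round-off indicate that each row is internally consistent; the check says nothing about the x ≫ 1 and N ≫ 1 approximations themselves.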
Kryazhimskiy et al.
EVOLUTION
MATHEMATICS
with a constant distribution of selection coefficients (30, 31).
We will demonstrate how the choice of landscape influences the
dynamics of adaptation. Having gained some insight from these
examples, we will classify fitness-parametrized landscapes in terms
of the qualitative evolutionary dynamics they produce. Remarkably, the qualitative dynamics fall into 14 possible classes, which
include, among others, the well-known classical examples. By
comparing these classes against observations from microbial evolution experiments (21), we will infer the space of landscapes that,
given our simplifying assumptions, are compatible with existing
data.
We will study the dynamics of adaptation in the limit of weak
mutation (8, 16, 17, 32), which allows us to ignore the effects of
multiple, competing beneficial mutations (30, 31, 33, 34). This
approach is mathematically convenient, and, more importantly,
it allows us to study the dynamics induced by the fitness landscape itself in isolation from those that result from clonal interference (30, 31, 35, 36). Our analysis will therefore provide a
null expectation against which to compare more complex models
or data.
Fig. 1. Dynamics of adaptation on three classical fitness landscapes. Rows correspond to fitness landscapes. The first column graphs the NFD, Φx (y), for two
representative values of the parental fitness, x0 = 1 and x0 = 4. The second and third columns show the fitness and substitution trajectories for a population
starting with fitness x0 = 2. Black lines correspond to the theoretical predictions of Eqs. 1 and 2; gray lines show the results of stochastic simulations; dashed
lines show a linear function, for reference. Note that axes are logarithmic. The fourth column shows the empirical distribution of selection coefficients of fixed
mutations; dashed lines show the best-fit regression on the semi-log scale, with slope k (only selection coefficients > 0.5 were used for fitting). Parameter
values: N = 1000; $\mu = 10^{-5}$; L = 1000; number of replicate simulations = $10^4$; a = 1 for the HOC and the NEPI landscapes, and a = 0.42 for the STH landscape.
where the dot denotes a derivative with respect to time;
$$q(x) = \int_0^{\infty} \pi_x(y)\,\Phi_x(y)\,dy \qquad [3]$$
is the expected fixation probability of a mutation arising in a
population with fitness x; and
$$r(x) = \int_0^{\infty} (y - x)\,\pi_x(y)\,\Phi_x(y)\,dy \qquad [4]$$
is the expected fitness increment of such a mutation, weighted
by its fixation probability. Eqs. 1 and 2 were derived under the
infinite-sites assumption, i.e. each genotype was assumed to have
an infinite number of neighbors, so that even very fit genotypes
have a nonzero chance of discovering a beneficial mutation. Consistent with previous work (37), the infinite-sites approximation
is highly accurate, as we demonstrate by comparing (Fig. 1) the
solutions of these equations (Table 1) to simulations of a finite-site
model (see Materials and Methods).
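Eqs. 3 and 4 can also be evaluated numerically for any NFD, which is a useful check on closed forms such as those in Table 1. The sketch below uses a plain composite Simpson rule (our choice; the paper does not specify a quadrature scheme), truncates the integrals where the NFD has negligible mass, and compares against the NEPI and STH rows. The helper names are hypothetical.

```python
import math

def fixation_probability(x, y, N):
    """Kimura's pi_x(y) with s = y/x - 1 (overflow-safe form)."""
    s = y / x - 1.0
    if s == 0.0:
        return 1.0 / N
    if s > 0.0:
        return math.expm1(-2.0 * s) / math.expm1(-2.0 * N * s)
    return (math.expm1(2.0 * s * (N - 1)) - math.expm1(2.0 * N * s)) / -math.expm1(2.0 * N * s)

def simpson(f, lo, hi, n=4000):
    """Composite Simpson rule on [lo, hi] with an even number n of panels."""
    n += n % 2
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def q_and_r(x, nfd, lo, hi, N):
    """Eqs. 3 and 4, with the integrals truncated to [lo, hi], where the NFD
    nfd(x, y) is assumed to carry essentially all of its mass."""
    q = simpson(lambda y: fixation_probability(x, y, N) * nfd(x, y), lo, hi)
    r = simpson(lambda y: (y - x) * fixation_probability(x, y, N) * nfd(x, y), lo, hi)
    return q, r

def nepi_nfd(a):
    return lambda x, y: math.exp(-(y - x) / a) / a if y >= x else 0.0

def sth_nfd(a):
    return lambda x, y: math.exp(-(y - x) / (a * x)) / (a * x) if y >= x else 0.0

x = 200.0
q, r = q_and_r(x, nepi_nfd(1.0), x, x + 40.0, N=10_000)
print(f"NEPI: q = {q:.5f} (Table 1: {2 / x:.5f}), r = {r:.5f} (Table 1: {4 / x:.5f})")
```

For NEPI the agreement with Table 1 improves as x grows, consistent with the x ≫ 1 limit in the table footnote; for STH the large-N values $2a/(1+2a)$ and $4a^2(1+a)x/(1+2a)^2$ should be recovered at any x.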
Fig. 1 shows the dynamics of adaptation on the three classical
fitness landscapes. On the HOC landscape, both the expected fitness of the population and the expected number of substitutions
grow logarithmically with time, consistent with previous work (4).
As we expected, the rate of adaptation on such landscapes rapidly
declines as the fitness of the population grows. As the population adapts, there are two forces on the HOC landscape that act
against further adaptation. First, the fraction of mutations that
are beneficial decreases. Second, the probability of fixation of an
adaptive mutation decreases as well. This decrease occurs because
the fixation probability of a mutation increases monotonically with its selection
coefficient, and the selection coefficients of available adaptive
mutations decline as the fitness of the parent increases. In addition, adaptation slows down further because the time to fixation
of beneficial mutations grows with declining selection coefficients.
However, this effect turns out to be negligible (see the comparison with the full WF model below). The rate of adaptation on the
NEPI landscape also slows down as the fitness increases, but it
does so less dramatically than on the HOC landscape. This behavior is expected because the fraction of beneficial mutations and
their additive fitness effects do not change as the fitness of the parental genotype increases. However, the selection coefficients of beneficial
mutations decrease, thereby reducing the rate of fitness growth.
Finally, on the STH landscape, the rate of mean-fitness increase
grows without bound over time, as expected. In contrast to HOC
and NEPI landscapes, there are no forces on such landscapes
that impede further adaptation as the population becomes more
adapted (hence the name “stairway to heaven”).
In order to investigate the robustness of the results in Fig. 1 with
respect to the assumption of weak mutation, we have simulated
the full stochastic WF model over a wide range of mutation rates.
These simulations incorporate the effects of competing mutations,
and they also account for the (nonzero) time to fixation. Our theoretical prediction matches the dynamics of the full WF model very
well when θ ≲ 0.1. Moreover, even when θ > 1, the concavities of
fitness and substitution trajectories are correctly predicted by our
theory (see SI Appendix).
Distribution of Selection Coefficients of Fixed Mutations. In
addition to fitness and substitution trajectories, we have investigated the distribution of selection coefficients for mutations that
fix during adaptation (Fig. 1, fourth column). By using computer
simulations, Orr previously showed that this distribution is approximately exponential (excluding small selection coefficients) for
uncorrelated landscapes whose NFD belongs to the Gumbel type
(8). Fig. 1 shows that Orr’s observation holds more generally—
i.e. even for correlated landscapes, such as the NEPI and STH
landscapes. In fact, the distribution of fixed selection coefficients
is so robust to changes in the landscape structure that virtually no
inference can be made on its basis. To demonstrate this problem,
we have chosen the parameter a (see Table 1) so that the resulting
distributions of fixed selection coefficients are virtually the same
for all three classical fitness landscapes, even though their qualitative trajectories are completely different (Fig. 1). In other words,
the selection coefficients associated with mutations that are fixed
during evolution tell us very little about the long-term behavior
of an adapting population or the fitness landscape on which it is
evolving.
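The insensitivity of this distribution can be illustrated with a short sketch for the STH landscape, assuming exponential Ψ with mean a and the large-N fixation probability $1 - e^{-2s}$. Instead of the semi-log regression used for Fig. 1, it swaps in a simpler maximum-likelihood exponential fit to the tail s > 0.5; the function names are hypothetical.

```python
import math
import random

def fixed_selection_coefficients(a, n_fixed, rng):
    """Selection coefficients of mutations that fix on an STH landscape with
    exponential Psi (mean a), using the large-N fixation probability 1 - e^{-2s}.
    On STH the law of s does not depend on the parental fitness, so every step of
    the walk has the same law and rejection sampling reproduces it directly."""
    fixed = []
    while len(fixed) < n_fixed:
        s = rng.expovariate(1.0 / a)
        if rng.random() < -math.expm1(-2.0 * s):  # accept with probability 1 - e^{-2s}
            fixed.append(s)
    return fixed

def tail_slope(samples, threshold=0.5):
    """Maximum-likelihood exponential fit to s > threshold; returns the slope k of
    log-density versus s (k = -1 / mean excess over the threshold)."""
    excess = [s - threshold for s in samples if s > threshold]
    return -1.0 / (sum(excess) / len(excess))

a = 0.42  # the STH value of a used in Fig. 1
k = tail_slope(fixed_selection_coefficients(a, 20_000, random.Random(7)))
print(f"fitted slope k = {k:.2f}; slope of Psi itself = {-1 / a:.2f}")
```

The fitted tail is close to, but not exactly at, the slope $-1/a$ of Ψ, because the fixation factor $1 - e^{-2s}$ still tilts the density just above the threshold.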
Toward a Classification of Landscapes. The space of all possible fitness landscapes is vast. We therefore wish to classify landscapes in
terms of the qualitative evolutionary dynamics they produce—i.e.
in terms of their fitness and substitution trajectories, which can
be directly observed in an experiment. Our analytic approximation in Eqs. 1 and 2 captures the behavior of the trajectories quite
well, especially as the population reaches high fitnesses (Fig. 1).
Remarkably, these equations depend on only two simple functions
of the landscape: the expected fixation probability of a mutation arising in a population of fitness x, q(x), and the expected
fitness increment of such a mutation weighted by its fixation probability, r(x). By varying just these two quantities, we can explore
all possible qualitative behaviors of the fitness and substitution
trajectories.
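Because the classification rests only on r and q, a generic ODE solver for Eqs. 1 and 2 suffices to explore any combination of them. A minimal sketch using the classical fourth-order Runge–Kutta method (a standard integrator chosen here for illustration, not taken from the paper); integrate_trajectories is a hypothetical name.

```python
import math

def integrate_trajectories(r, q, x0, t_end, n_steps=10_000):
    """Classical fourth-order Runge-Kutta for Eqs. 1 and 2:
    dF/dt = r(F), dS/dt = q(F), with F(0) = x0 and S(0) = 0."""
    h = t_end / n_steps
    F, S = x0, 0.0
    for _ in range(n_steps):
        # S does not feed back into F, so every RK stage is driven by F alone.
        k1 = r(F)
        k2 = r(F + 0.5 * h * k1)
        k3 = r(F + 0.5 * h * k2)
        k4 = r(F + h * k3)
        S += h * (q(F) + 2 * q(F + 0.5 * h * k1) + 2 * q(F + 0.5 * h * k2) + q(F + h * k3)) / 6
        F += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return F, S

# NEPI row of Table 1 with a = 1: r(x) = 4/x, q(x) = 2/x, starting from x0 = 2.
F, S = integrate_trajectories(lambda x: 4.0 / x, lambda x: 2.0 / x, x0=2.0, t_end=10.0)
print(F, math.sqrt(2.0**2 + 8.0 * 10.0))               # numerical vs closed-form F(10)
print(S, (math.sqrt(2.0**2 + 8.0 * 10.0) - 2.0) / 2.0)  # numerical vs closed-form S(10)
```

Any monotone, smooth r and q (for example, hypothetical shapes with asymptotes) can be passed in the same way to visualize the corresponding fitness and substitution trajectory types.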
Fig. 2. Classification of fitness landscapes. Column 1 shows five possible shapes for the r-function, and three possible shapes for the q-function. In some
cases, these functions have asymptotes, shown as dashed horizontal lines. Columns 2–6 show the fitness (Upper) and substitution (Lower) trajectories for the
15 landscapes that arise through combinations of r- and q-functions. Substitution trajectories for landscapes with q-function of type A, B, and C are shown
in green, dark orange, and purple, respectively. In some cases, the fitness or substitution trajectories possess asymptotic slopes, shown as dashed lines in
the corresponding color. In these cases, the asymptotic slope equals the asymptotic value of the corresponding r- or q-function (except for the substitution
trajectories in case V). Landscapes V-B and V-C both have asymptotically linear substitution trajectories, and therefore fall into the same class.
Inferring Landscape Structure From Data. Which fitness landscapes
are compatible with empirical data, and which are not? To address
this question, we have compared predicted evolutionary dynamics with data from long-term evolution experiments. Empirical
fitness trajectories in a fixed environment typically have negative
curvature: Fitness increases quickly at the early stages of adaptation, and more slowly at later stages (10, 21, 22, 39–42). This
negative curvature implies that the r-functions for landscapes in
nature belong to type III, IV, or V. In other words, a large class
of strongly synergistic landscapes (those with an increasing r-function) is incompatible with basic empirical observations. The
space of unrealistic fitness landscapes includes the widely used
STH landscapes (30, 31, 33–35, 43–45), for which r(x) ∼ x.
Landscapes with either antagonistic epistasis ($r(x) < Cx^{-1}$) or
weak synergistic epistasis ($Cx^{-1} < r(x) \le C$) produce fitness trajectories that are concave, and so they are qualitatively consistent
with data from microbial evolution experiments. We can use such
data to estimate the sign and strength of epistasis. In order to do so,
we assume that the r-function has the form $r(x) = Bx^{\beta}$ with $B > 0$
and $\beta \le 0$. This form is convenient because it includes nonepistatic landscapes when $\beta = -1$, weakly synergistic landscapes when
$-1 < \beta \le 0$, and antagonistic landscapes when $\beta < -1$. Eq. 1 can
then be solved analytically, and the fitness trajectory is given by
$$F(t) = \left(x_0^{1-\beta} + B(1-\beta)\,t\right)^{\frac{1}{1-\beta}}. \qquad [5]$$
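For completeness, Eq. 5 follows from Eq. 1 with $r(x) = Bx^{\beta}$ by separation of variables (valid because $\beta \le 0$ implies $1 - \beta \ge 1$):

$$\frac{dF}{dt} = BF^{\beta} \;\Longrightarrow\; \int_{x_0}^{F(t)} z^{-\beta}\,dz = Bt \;\Longrightarrow\; \frac{F(t)^{1-\beta} - x_0^{1-\beta}}{1-\beta} = Bt \;\Longrightarrow\; F(t) = \left(x_0^{1-\beta} + B(1-\beta)\,t\right)^{\frac{1}{1-\beta}}.$$

In particular, once $B(1-\beta)t \gg x_0^{1-\beta}$,

$$\ln F(t) \approx \frac{1}{1-\beta}\,\ln t + \frac{\ln\left(B(1-\beta)\right)}{1-\beta},$$

so on log–log axes the fitness trajectory approaches a straight line of slope $(1-\beta)^{-1}$.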
Discussion
The framework developed here addresses two key problems in the
theory of adaptation: how to characterize evolution on a correlated
fitness landscape and how to infer properties of a fitness landscape
from empirical data. Our analysis has relied on two assumptions:
weak mutation and the fitness parametrization of the landscape.
The assumption of weak mutation, although restrictive, has been
used in previous literature and provides a reasonable starting
point for future research. Relaxing this assumption presents substantial mathematical complications and introduces entirely new
phenomena, such as clonal interference (30, 35) and “piggybacking” (31, 36). Therefore, we must first have a solid understanding
of adaptation dynamics under weak mutation before proceeding
to incorporate these additional effects. Without a theory of weak
mutation, we would be unable to disentangle the effects of the
fitness landscape itself from the effects of clonal interference. In
the future, experiments whose primary goal is to probe the fitness
landscape should be designed to minimize the effects of clonal
interference, e.g. by choosing small population sizes.
The fitness parametrization is a less-restrictive assumption,
especially when weak mutation is already assumed. Indeed, neutral networks are important for adaptation only when a population
can use them to quickly access previously inaccessible beneficial
mutations. This regime only occurs when the population is polymorphic, i.e. when θ > 1. In contrast, a monomorphic population
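It follows from this expression that, at long times (when $B(1-\beta)t \gg x_0^{1-\beta}$), $\ln F \approx (1-\beta)^{-1}\ln t + \text{const}$, so the slope of the line fitted on
the log–log scale to the fitness trajectory observed in a long-term
evolution experiment provides an estimate of $(1-\beta)^{-1}$. We applied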
this procedure to data from the evolutionary experiment by Lenski
et al. (21) and found that β̂ = −9.58 with the 95% confidence interval [−13.36, −7.38], suggesting that the fitness landscape of E. coli
is, on average, strongly antagonistically epistatic. This qualitative
conclusion is robust with respect to the violation of the weak mutation assumption (see SI Appendix), although the precise estimate
of β may change with the development of more refined models of
E. coli evolution.
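The estimation step reduces to an ordinary least-squares slope on log–log axes. The sketch below applies it to a synthetic, noise-free trajectory generated from Eq. 5 with hypothetical parameters (this is not the Lenski et al. data, and the paper's confidence-interval procedure is not reproduced); fitness_trajectory and estimate_beta are illustrative names.

```python
import math

def fitness_trajectory(t, x0, B, beta):
    """Eq. 5: F(t) solving dF/dt = B * F**beta with F(0) = x0 (beta <= 0)."""
    return (x0 ** (1.0 - beta) + B * (1.0 - beta) * t) ** (1.0 / (1.0 - beta))

def estimate_beta(times, fitnesses):
    """Least-squares slope of ln F against ln t, inverted via slope = 1 / (1 - beta).
    Only meaningful once B(1 - beta)t >> x0^(1 - beta) over the fitted range."""
    lx = [math.log(t) for t in times]
    ly = [math.log(f) for f in fitnesses]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    slope = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    return 1.0 - 1.0 / slope

# Synthetic, noise-free trajectory with hypothetical parameters (not real data).
times = [1000.0 * k for k in range(1, 21)]
fitnesses = [fitness_trajectory(t, x0=1.0, B=1.0, beta=-9.58) for t in times]
print(round(estimate_beta(times, fitnesses), 2))
```

If the fitted range includes early times where $x_0^{1-\beta}$ is not negligible, the log–log curve is still bending and the inverted slope is biased, which is why the long-time regime matters.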
For the purpose of classification, we consider only landscapes
that are defined on the whole positive real axis, and whose r- and q-functions are monotonic and smooth. The five different shapes of
the r-function and three different shapes of the q-function determine, respectively, five qualitatively different fitness trajectories
and three qualitatively different substitution trajectories (Fig. 2).
Landscapes with an increasing or decreasing r-function produce
convex (types I and II) or concave (types III, IV, and V) fitness trajectories, respectively. More specifically, fitness trajectories grow
superlinearly with time (type I), are asymptotically linear (types II
and III), grow sublinearly (type IV), or asymptote to a constant
(type V). Similarly, landscapes with an increasing or decreasing q-function produce convex (type A) or concave (types B and
C) substitution trajectories, respectively. Substitution trajectories
grow asymptotically linearly (types A and B) or sublinearly (type
C). Considering all possible combinations of the r- and q-functions
produces a total of 14 classes of qualitatively different evolutionary
dynamics (Fig. 2).
This classification scheme accommodates the three classical
landscapes considered above. The STH landscapes belong to class
I-A or I-B, because q(x) is constant and r(x) grows without bound.
The NEPI landscapes belong to class IV-C, because both r(x)
and q(x) decay as $x^{-1}$. The HOC landscapes belong to class V-C because r(x) is negative for large x and q(x) decays to zero.
Recall that the STH landscapes are synergistically epistatic and
the HOC landscapes are antagonistically epistatic. This observation suggests the following natural definition: landscapes for which
the r-function either grows or decays slower than $x^{-1}$ are synergistically epistatic (types I, II, III, and IV), whereas landscapes for
which the r-function decays faster than $x^{-1}$ are antagonistically
epistatic (types IV and V).
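The definition above compares r with $x^{-1}$. A pointwise reading of it, which is our assumption rather than the paper's definition (the latter concerns how r behaves as x grows), is the local log–log slope of r at a given fitness: above −1 suggests synergistic and below −1 antagonistic epistasis. A minimal sketch, with a hypothetical function name:

```python
import math

def epistasis_sign(r, x, rel_step=1e-4, tol=1e-6):
    """Pointwise reading of the definition: compare the local log-log slope
    d ln r / d ln x at fitness x with -1 (r proportional to 1/x is nonepistatic).
    Assumes r > 0 on [x(1 - rel_step), x(1 + rel_step)]."""
    up, down = x * (1.0 + rel_step), x * (1.0 - rel_step)
    slope = (math.log(r(up)) - math.log(r(down))) / (math.log(up) - math.log(down))
    if slope > -1.0 + tol:
        return "synergistic"
    if slope < -1.0 - tol:
        return "antagonistic"
    return "nonepistatic"

# r-functions from Table 1 with a = 1 (STH prefactor 4a^2(1+a)/(1+2a)^2 = 8/9).
print(epistasis_sign(lambda x: 4.0 * math.exp(-x), 5.0))  # HOC -> antagonistic
print(epistasis_sign(lambda x: 4.0 / x, 5.0))             # NEPI -> nonepistatic
print(epistasis_sign(lambda x: 8.0 / 9.0 * x, 5.0))       # STH -> synergistic
```

For the power-law form $r(x) = Bx^{\beta}$ used below, the local slope is exactly β everywhere, so this reading coincides with the classification by β.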
Remarkably, the substitution trajectories for landscapes of type
IV or V are almost linear—a pattern long considered the hallmark
of neutral or nearly neutral evolution (38). As these correlated
landscapes demonstrate, this pattern can also arise when substitutions confer significant fitness gains. In fact, the linear accrual
of adaptive mutations has recently been observed in experimental
populations (53).
can explore the neutral network only very slowly, by substituting neutral mutations (26). Such a population is far more likely
to substitute a beneficial mutation and jump to a new neutral
network.
We have studied several quantities that characterize evolutionary dynamics. We found that the distribution of selection coefficients of fixed mutations is insensitive to the underlying NFD,
consistent with previous findings (8, 46, 47). In contrast, the fitness and substitution trajectories are very informative about the
underlying fitness landscape. In particular, the substitution trajectory is convex or concave on landscapes for which the fixation
probability of a mutation increases or decreases with increasing
fitness, respectively. Similarly, the fitness trajectory is convex or
concave on landscapes for which the expected fitness increment
of a mutation increases or decreases with increasing fitness. Moreover, the curvature of the fitness trajectory is informative about
the sign and strength of epistasis in the fitness landscape.
These results lay the groundwork for inferring fitness landscapes from dynamic data. In particular, we have shown that
data from bacterial evolution experiments are incompatible
with landscapes that feature a constant distribution of selection
coefficients—even though such landscapes are often used in the
theoretical literature. We have also proposed a simple method
for inferring the sign and strength of epistasis from such data. In
contrast to most other estimates of epistasis that are based on measurements of interactions among deleterious mutations (see e.g.
ref. 48 and references therein), we provide an estimate of epistasis based on the interaction among beneficial mutations—which is
more informative for the long-term dynamics of adaptation. Our
estimates suggest that the E. coli fitness landscape is characterized by strong antagonistic epistasis, at least in a fixed laboratory
environment, which is consistent with one previous study (49).
However, the precise type of landscape (e.g. type IV versus type V)
for E. coli or other microorganisms may be difficult to determine
on the basis of fitness and substitution trajectories alone. The
ensemble variance in trajectories across experimental replicates
may provide additional power (see SI Appendix).
Here we have focused on static fitness landscapes, which probably arise only in laboratory environments. Fitness landscapes in the
field are likely dynamic because of fluctuations in the environment
or frequency-dependent selection. We can hope to understand the
evolutionary dynamics on such landscapes only after we acquire
a firm understanding of static landscapes. Our elementary theory provides an explicit link between the form of static fitness
landscapes and their resulting evolutionary dynamics, in terms of
simple observable quantities. Hopefully, this link will help bring
together theoretical and experimental studies of adaptation.
including neutral ones, is much shorter than the waiting time until the arrival
of the next mutation. Therefore, the population is monomorphic at virtually
all times, and occasionally it transitions almost instantaneously to a new type
(17). Individuals and the population as a whole are characterized by their
fitness, x. $\Phi_x(y)\,dy$ denotes the fitness-parametrized landscape, i.e. the probability that a mutation arising in an individual with fitness x has fitness
in [y, y + dy]. We assume that the genome is sufficiently long that each mutation
occurs at a new site. A mutation fixes in the population with Kimura’s fixation
probability $\pi_x(y) = \left(1 - e^{-2s_x(y)}\right)/\left(1 - e^{-2Ns_x(y)}\right)$, where $s_x(y) = y/x - 1$ is the
selection coefficient (50). If a mutation arises and fixes, then the population
instantaneously transitions from fitness x to fitness y; we ignore the time it
takes for a mutation to fix. We can thus describe the sequence of such transitions by a stationary continuous-time Markov chain whose state space is the
semi-axis [0, +∞). The population waits $\theta^{-1}$ generations for the next mutation on average. If we measure time by the expected number of mutations,
the probability that the population has fitness in [y, y + dy] at time t + δt,
given it had fitness x at time t, is Φx (y)πx (y)dyδt.
and substitution trajectories as F(t, x) =
∞We define the fitness ∞
i=0 iPi (t|x), respectively, where P(y, t|x) is the
0 yP(y, t|x)dy, and S(t, x) =
probability that the population has fitness in [y, y + δy] at time t, given initial
fitness x, and Pi (t|x) is the probability that the population has accumulated
i substitutions by time t, given initial fitness x [for simplicity we also write
F(t) and S(t)]. It follows from the classical Markov chain theory that F and S
satisfy the equations (see SI Appendix)
∂F
(t, x) = (K̂b F(t, ·))(x), F(0, x) = x,
∂t
∂S
(t, x) = (K̂b S(t, ·))(x) + q(x), S(0, x) = 0,
∂t
[6]
[7]
where K̂b is defined by
∞
(K̂b f (·))(x) =
Φx (ξ)πx (ξ)(f (ξ) − f (x))dξ,
[8]
0
which is the backward Kolmogorov operator. In the SI Appendix, we present
an efficient numerical method for finding the whole distributions P(y, t|x)
and Pi (t|x).
On landscapes for which mutations of large effect become increasingly
unlikely as the fitness of the population increases, most of the contribution to the integral in Eq. 8 comes from values ξ ≈ x, and we can write
f (ξ) − f (x) ≈ f (x)(ξ − x). Consequently, (K̂b f (·))(x) ≈ r(x)f (x), where
r(x) is given by Eq. 4. Therefore, Eqs. 6 and 7 can be approximated by socalled advection equations that turn out to be equivalent to Eqs. 1 and 2
(see SI Appendix for details). Eqs. 1 and 2 are closely related to those
derived by Tachida (51) and Welch and Waxman (37) for the uncorrelated
landscape.
In stochastic simulations, we implement a finite-site version of the model
described above. In these simulations, after a substitution has occurred, a
sample of size L = 1, 000 is drawn from the distribution Φx , which represents
the (finite) mutational neighborhood of the current genotype. Each of these
L-neighboring genotypes has the same probability to be drawn at a subsequent mutation event. Our results do not depend on the value of L on the
time scales examined as long as L is large (e.g. L ≥ 103 ). Code written in the
Objective Caml language is available upon request.
We consider an asexual population of fixed size N that evolves according to the infinite-sites WF model (50) with a small mutation rate, so that
θ (4 log N)−1 , where θ = Nμ and μ is the per-locus, per-generation mutation rate. This condition ensures that the absorption time of all mutations,
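The finite-site simulation described above can be sketched in Python as follows. This is not the authors' Objective Caml code; it is a minimal illustration that assumes time is measured in expected mutations (so mutations arrive as a rate-1 Poisson process), uses Kimura's formula for fixation, and takes a hypothetical callback `sample_mutant(x, rng)` that draws one mutant fitness from $\Phi_x$. The helper `sth_mutant` is likewise a hypothetical sampler for an exponential stairway to heaven landscape.

```python
import math
import random


def fixation_probability(x, y, n):
    """Kimura's pi_x(y): fixation probability of a mutant of fitness y among residents of fitness x."""
    s = y / x - 1.0                           # selection coefficient s_x(y)
    if s == 0.0:
        return 1.0 / n                        # neutral limit of Kimura's formula
    if -2.0 * n * s > 700.0:
        return 0.0                            # strongly deleterious: below double-precision range
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)


def simulate_walk(sample_mutant, x0, n, t_max, neighborhood=1000, rng=None):
    """One realization of the finite-site weak-mutation walk up to time t_max.

    sample_mutant(x, rng) is a hypothetical callback returning one draw from Phi_x.
    Time is measured in expected number of mutations, so mutations arrive as a
    Poisson process of rate 1. Returns (fitness, number of substitutions) at t_max.
    """
    if rng is None:
        rng = random.Random()
    x, subs, t = x0, 0, 0.0
    neighbors = [sample_mutant(x, rng) for _ in range(neighborhood)]
    while True:
        t += rng.expovariate(1.0)             # waiting time until the next mutation
        if t > t_max:
            return x, subs
        y = rng.choice(neighbors)             # every neighbor is equally likely
        if rng.random() < fixation_probability(x, y, n):
            x, subs = y, subs + 1             # the sweep is treated as instantaneous
            neighbors = [sample_mutant(x, rng) for _ in range(neighborhood)]


def sth_mutant(a):
    """Hypothetical exponential stairway-to-heaven sampler: y = x (1 + s), s ~ Exp(mean a)."""
    return lambda x, rng: x * (1.0 + rng.expovariate(1.0 / a))


# Usage: 200 replicate walks on the exponential stairway to heaven with a = 0.1.
rng = random.Random(7)
a, t_max = 0.1, 30.0
runs = [simulate_walk(sth_mutant(a), 1.0, 1000, t_max, neighborhood=200, rng=rng)
        for _ in range(200)]
mean_subs = sum(subs for _, subs in runs) / len(runs)
```

Averaging `subs` over replicates estimates $S(t)$. For this sampler and large $N$, $q(x) = \langle \pi(s) \rangle = 2a/(1 + 2a)$ does not depend on $x$, so the replicate mean should be close to $2at/(1 + 2a)$, up to sampling noise and finite-neighborhood effects.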
ACKNOWLEDGMENTS. The authors thank Richard Lenski, Michael Desai,
Todd Parsons, and Jeremy Draghi for many fruitful discussions. J.B.P. acknowledges support from the Burroughs Wellcome Fund, the David and Lucile
Packard Foundation, the James S. McDonnell Foundation, the Alfred P. Sloan
Foundation, and Defense Advanced Research Projects Agency Grant HR001105-1-0057. G.T. acknowledges support from National Science Foundation
Grants IBN-0344678 and DMR04-25780.
1. Aita T, et al. (2007) Extracting characteristic properties of fitness landscape from in vitro molecular evolution: A case study on infectivity of fd phage to E. coli. J Theor Biol 246:538–550.
2. Kingman JFC (1978) A simple model for the balance between selection and mutation. J Appl Prob 15:1–12.
3. Kauffman S, Levin S (1987) Towards a general theory of adaptive walks on rugged landscapes. J Theor Biol 128:11–45.
4. Flyvbjerg H, Lautrup B (1992) Evolution in a rugged fitness landscape. Phys Rev A 46:6714–6723.
5. Park SC, Krug J (2008) Evolution in random fitness landscapes: The infinite sites model. J Stat Mech P04014.
6. Macken CA, Perelson AS (1989) Protein evolution on rugged landscapes. Proc Natl Acad Sci USA 86:6191–6195.
7. Kauffman S, Weinberger ED (1989) The NK model of rugged fitness landscape and its application to maturation of the immune response. J Theor Biol 141:211–245.
8. Orr HA (2002) The population genetics of adaptation: The adaptation of DNA sequences. Evolution 7:1317–1330.
9. Orr HA (2006) The population genetics of adaptation on correlated fitness landscapes: The block model. Evolution 60:1113–1124.
10. Cooper VS, Lenski RE (2000) The population genetics of ecological specialization in evolving Escherichia coli populations. Nature 407:736–739.
11. Perelson AS, Macken CA (1995) Protein evolution on partially correlated landscapes. Proc Natl Acad Sci USA 92:9657–9661.
12. Newman MEJ, Engelhardt R (1998) Effects of selective neutrality on the evolution of molecular species. Proc R Soc London Ser B 265:1333–1338.
13. Adami C (2006) Digital genetics: Unravelling the genetic basis of evolution. Nat Rev Genet 7:109–118.
14. Cowperthwaite MC, Meyers LA (2007) How mutational networks shape evolution: Lessons from RNA models. Annu Rev Ecol Evol Syst 38:203–230.
15. Ndifon W, Plotkin JB, Dushoff J (2009) On the accessibility of adaptive phenotypes of a bacterial metabolic network. PLoS Comput Biol 5:e1000472.
16. Gillespie JH (1983) A simple stochastic gene substitution model. Theor Pop Biol 23:202–215.
17. Gillespie JH (1994) The Causes of Molecular Evolution (Oxford Univ Press, Oxford).
18. Orr HA (2003) The distribution of fitness effects among beneficial mutations. Genetics 163:1519–1526.
19. Joyce P, Rokyta DR, Beisel CJ, Orr HA (2008) A general extreme value theory model for the adaptation of DNA sequences under strong selection and weak mutation. Genetics 180:1627–1643.
20. Rokyta DR, Beisel CJ, Joyce P (2006) Properties of adaptive walks on uncorrelated landscapes under strong selection and weak mutation. J Theor Biol 243:114–120.
21. Lenski RE, Travisano M (1994) Dynamics of adaptation and diversification: A 10,000-generation experiment with bacterial populations. Proc Natl Acad Sci USA 91:6808–6814.
22. Silander OK, Tenaillon O, Chao L (2007) Understanding the evolutionary fate of finite populations: The dynamics of mutational effects. PLoS Biol 5:e94.
23. Paquin C, Adams J (1983) Frequency of fixation of adaptive mutations is higher in evolving diploid than haploid yeast populations. Nature 302:495–500.
24. Wichman HA, Millstein J, Bull JJ (2005) Adaptive molecular evolution for 13,000 phage generations: A possible arms race. Genetics 170:19–31.
25. Fontana W, Schuster P (1998) Continuity in evolution: On the nature of transitions. Science 280:1451–1455.
26. van Nimwegen E, Crutchfield JP, Huynen M (1999) Neutral evolution of mutational robustness. Proc Natl Acad Sci USA 96:9716–9720.
27. Orr HA (2003) A minimum on the mean number of steps taken in adaptive walks. J Theor Biol 220:241–247.
28. Mani R, St. Onge RP, Hartman JL IV, Giaever G, Roth FP (2008) Defining genetic interaction. Proc Natl Acad Sci USA 105:3461–3466.
29. Eshel I (1971) On evolution in a population with an infinite number of types. Theor Pop Biol 2:209–236.
30. Gerrish PJ, Lenski RE (1998) The fate of competing beneficial mutations in an asexual population. Genetica 102–103:127–144.
31. Desai MM, Fisher DS (2007) Beneficial mutation-selection balance and the effect of linkage on positive selection. Genetics 176:1759–1798.
32. Dieckmann U, Law R (1996) The dynamical theory of coevolution: A derivation from stochastic ecological processes. J Math Biol 34:579–612.
33. Rouzine IM, Wakeley J, Coffin JM (2003) The solitary wave of asexual evolution. Proc Natl Acad Sci USA 100:587–592.
34. Park SC, Krug J (2007) Clonal interference in large populations. Proc Natl Acad Sci USA 104:18135–18140.
35. Wilke CO (2004) The speed of adaptation in large asexual populations. Genetics 167:2045–2053.
36. Zeyl C (2007) Evolutionary genetics: A piggyback ride to adaptation and diversity. Curr Biol 17:R333.
37. Welch JJ, Waxman D (2005) The NK model and population genetics. J Theor Biol 234:329–340.
38. Kimura M, Ohta T (1968) Protein polymorphism as a phase of molecular evolution. Nature 229:467–469.
39. Bull JJ, et al. (1997) Exceptional convergent evolution in a virus. Genetics 147:1497–1507.
40. Elena SF, Davila M, Novella IS, Holland JJ, Esteban (1998) Evolutionary dynamics of fitness recovery from the debilitating effects of Muller’s ratchet. Evolution 52:309–314.
41. de Visser JAGM, Lenski RE (2002) Long-term experimental evolution in Escherichia coli. XI. Rejection of non-transitive interactions as cause of declining rate of adaptation. BMC Evol Biol 2:19.
42. Hayashi Y, et al. (2006) Experimental rugged fitness landscape in protein sequence space. PLoS ONE 1:e96.
43. Orr HA (2000) The rate of adaptation in asexuals. Genetics 155:961–968.
44. Johnson T, Barton NH (2002) The effect of deleterious alleles on adaptation in asexual populations. Genetics 162:395–411.
45. Bachtrog D, Gordo I (2004) Adaptive evolution of asexual populations under Muller’s ratchet. Evolution 58:1403–1413.
46. Rozen DE, de Visser JAG, Gerrish PJ (2002) Fitness effects of fixed beneficial mutations in microbial populations. Curr Biol 12:1040–1045.
47. Hegreness M, Shoresh N, Hartl D, Kishony R (2006) An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311:1615–1617.
48. Kouyos RD, Silander OK, Bonhoeffer S (2007) Epistasis between deleterious mutations and the evolution of recombination. Trends Ecol Evol 22:308–315.
49. Sanjuán R, Moya A, Elena SF (2004) The contribution of epistasis to the architecture of fitness in an RNA virus. Proc Natl Acad Sci USA 101:15376–15379.
50. Crow JF, Kimura M (1972) An Introduction to Population Genetics Theory (Harper & Row, New York).
51. Tachida H (1991) A study on a nearly neutral mutation model in finite populations. Genetics 128:183–192.
52. Brandt H (2001) Correlation Analysis of Fitness Landscapes (International Institute for Applied Systems Analysis, Laxenburg, Austria), Interim Report IR-01-058.
53. Barrick JE, et al. (2009) Genome evolution and adaptation in a long-term experiment with E. coli. Nature, 10.1038/nature08480.

Kryazhimskiy et al. PNAS, November 3, 2009, vol. 106, no. 44 (Evolution; Applied Mathematics). www.pnas.org/cgi/doi/10.1073/pnas.0905497106
The Dynamics of Adaptation on Correlated Fitness Landscapes
Sergey Kryazhimskiy, Gašper Tkačik, Joshua B. Plotkin¹
Supporting Information
¹To whom correspondence should be addressed. E-mail: [email protected]

1 Markov chain formalism and solutions
As described in the main text, we consider an asexual population of fixed size N that
evolves according to the Wright-Fisher model in the limit of low mutation rates [1, 2]. The
type of an individual is determined solely by its fitness. Since under the weak-mutation
limit the population is monomorphic except for negligibly brief periods when a mutation
sweeps to fixation, the state of the population as a whole is completely described by the
current fitness of its individuals. $\Phi_x(y)\,dy$ denotes the fitness-parametrized landscape, i.e., the probability that the mutation arising in an individual with fitness $x$ has a fitness in $[y, y + dy]$. $s_x(y) = y/x - 1$ is the selection coefficient of such a mutation in a population with fitness $x$. The probability of fixation of the mutant is then given by [3]

$$\pi_x(y) = \pi\big(s_x(y)\big) = \frac{1 - e^{-2s_x(y)}}{1 - e^{-2N s_x(y)}}, \tag{S1}$$

which, in the infinite population size limit, becomes $\pi_x(y) = 1 - e^{-2s_x(y)}$ for $y > x$ and zero otherwise. If the mutation is fixed, the population transitions instantaneously from fitness $x$ to the new fitness $y$. The adaptive walk is then described by a stationary continuous-time Markov chain with state space $[0, +\infty)$. On average, the population waits $\theta^{-1}$ generations for the next mutation, where $\theta = \mu N$ is the per-locus, per-generation mutation rate scaled by population size. If time $t$ is measured in the expected number of arrived mutations, the instantaneous transition rate from state $x$ to state $y$ is

$$Q(y|x) = \Phi_x(y)\,\pi_x(y). \tag{S2}$$
We are interested in the probability P (y, t|x) of finding the population at fitness value
y after time t given initial fitness x at time zero and in the probability Pi (t|x) for the
population to accumulate i substitutions by time t given initial fitness x. Here we present
the general operator-based formulation which is well-suited for the mathematical analysis
and for analytic calculations with simple fitness landscapes. We also derive recursion relations that are convenient for the numerical computation of the distributions $P(y, t|x)$ and $P_i(t|x)$ as well as their moments.
1.1 Formal solutions

Define the forward and backward operators by

$$\big(\hat K_f f(\cdot)\big)(y) = \int_0^\infty \big[f(\xi)\,Q(y|\xi) - f(y)\,Q(\xi|y)\big]\,d\xi, \tag{S3}$$

$$\big(\hat K_b f(\cdot)\big)(x) = \int_0^\infty Q(\xi|x)\,\big[f(\xi) - f(x)\big]\,d\xi, \tag{S4}$$

respectively. It follows from standard Markov chain theory that $P(y, t|x)$ satisfies the forward and backward Kolmogorov equations

$$\frac{\partial P}{\partial t}(y, t|x) = \big(\hat K_f P(\cdot, t|x)\big)(y), \tag{S5}$$

$$\frac{\partial P}{\partial t}(y, t|x) = \big(\hat K_b P(y, t|\cdot)\big)(x), \tag{S6}$$

with the initial condition

$$P(y, 0|x) = \delta(y - x), \tag{S7}$$

where $\delta(z)$ is the Dirac delta function. The formal solutions to equations (S5)–(S7) can be written as

$$P(y, t|x) = \big(\exp\{t\hat K_f\}\,P(\cdot, 0|x)\big)(y), \tag{S8}$$

$$P(y, t|x) = \big(\exp\{t\hat K_b\}\,P(y, 0|\cdot)\big)(x), \tag{S9}$$

where the operator exponential is defined as $\exp\{\hat F\} = \sum_{i=0}^\infty \hat F^i / i!$.
The equations for $P_i(t|x)$ are more cumbersome. In the next section we show that the $P_i(t|x)$ satisfy the recursive equations

$$\frac{\partial P_0}{\partial t}(t|x) = -q(x)\,P_0(t|x), \tag{S10}$$

$$\frac{\partial P_i}{\partial t}(t|x) = -q(x)\big[P_i(t|x) - P_{i-1}(t|x)\big] + \big(\hat K_b P_{i-1}(t|\cdot)\big)(x), \quad i = 1, 2, \ldots \tag{S11}$$

with the initial condition

$$P_i(0|x) = \delta_{i0}, \tag{S12}$$

where $\delta_{ij}$ is the Kronecker delta and

$$q(x) = \int_0^\infty Q(y|x)\,dy \tag{S13}$$

is the expected fixation probability of a mutant that occurs in the background $x$ (see also equation (3) in the main text). The solution to equations (S10)–(S12) is given by

$$P_0(t|x) = e^{-q(x)t}, \tag{S14}$$

$$P_i(t|x) = \int_0^t d\tau \int_0^\infty e^{-q(x)(t-\tau)}\,Q(\xi|x)\,P_{i-1}(\tau|\xi)\,d\xi, \quad i = 1, 2, \ldots \tag{S15}$$
1.2 Derivation of the distribution $P_i(t|x)$

To derive equations (S10) and (S11), note that the probability $P(y, t|x)$ of observing the population at fitness $y$ at time $t$, given that it had fitness $x$ at time zero, can be expressed as a sum of the probabilities of reaching fitness $y$ from fitness $x$ in time $t$ with any possible number of substitutions,

$$P(y, t|x) = \sum_{i=0}^\infty P_i(y, t|x).$$

Here $P_i(y, t|x)$ is the probability for the population to reach fitness $y$ by time $t$ in exactly $i$ substitution events, given the initial fitness $x$. Then the probability of having accumulated exactly $i$ substitutions by time $t$ is $P_i(t|x) = \int_0^\infty P_i(y, t|x)\,dy$. It is easy to derive the recursion relations for $P_i(y, t|x)$ from the following considerations. First, note that after zero substitutions the population must have the initial fitness $x$ and, since the first substitution occurs with rate $q(x)$, we have

$$P_0(y, t|x) = \delta(y - x)\,e^{-q(x)t}. \tag{S16}$$

Integrating this expression over $y$, we obtain (S14). Now, in order for the population to be at fitness $y$ at time $t$ after exactly $i$ substitutions, the first substitution must have occurred at some time $\tau < t$ and moved the population to some fitness $\xi$, after which another $i - 1$ substitutions brought it to fitness $y$ in the period between $\tau$ and $t$. So, conditioned on $\xi$ and $\tau$, the probability of finding the system in state $y$ at time $t$ is the product of three factors: (a) the probability density of the first substitution occurring at time $\tau$, $q(x)e^{-q(x)\tau}$; (b) the probability that the first substitution moves the population to fitness $\xi$, $Q(\xi|x)/q(x)$; and (c) the probability that $i - 1$ substitutions move the population from fitness $\xi$ to fitness $y$ in the time period $t - \tau$, $P_{i-1}(y, t - \tau|\xi)$. Therefore, integrating over all $\tau$ and $\xi$,

$$P_i(y, t|x) = \int_0^\infty d\xi \int_0^t e^{-q(x)\tau}\,Q(\xi|x)\,P_{i-1}(y, t - \tau|\xi)\,d\tau, \quad i = 1, 2, \ldots \tag{S17}$$

It is easy to show that $\sum_{i=0}^\infty P_i(y, t|x)$, with $P_i(y, t|x)$ defined by equations (S16) and (S17), satisfies the backward Kolmogorov equation (S6). To compute the number of substitutions at time $t$, we rewrite the recursion (S17) as

$$P_i(y, t|x) = \int_0^\infty Q(\xi|x)\,d\xi \int_0^t e^{-q(x)(t-\tau)}\,P_{i-1}(y, \tau|\xi)\,d\tau,$$

from which (S15) follows after integration with respect to $y$. Equations (S10) and (S11) follow from (S14) and (S15) by differentiating with respect to $t$.
1.3 Fitness and substitution trajectories

We call the expected value of the distribution $P(y, t|x)$ the fitness trajectory $F(t, x)$, and we call the expected value of the distribution $P_i(t|x)$ the substitution trajectory $S(t, x)$.

1.3.1 General equations for the fitness and substitution trajectories
Multiplying the backward equation (S6) by $y$ and integrating with respect to $y$, we obtain

$$\frac{\partial F}{\partial t}(t, x) = \big(\hat K_b F(t, \cdot)\big)(x) \tag{S18}$$

with the initial condition

$$F(0, x) = x, \tag{S19}$$

whose formal solution is given by

$$F(t, x) = \big(\exp\{t\hat K_b\}\,I(\cdot)\big)(x), \tag{S20}$$

where $I(x) = x$ is the identity function. Analogously, it follows from equations (S10) and (S11) that the substitution trajectory satisfies the equation

$$\frac{\partial S}{\partial t}(t, x) = q(x) + \big(\hat K_b S(t, \cdot)\big)(x) \tag{S21}$$

with the initial condition

$$S(0, x) = 0, \tag{S22}$$

whose solution is given by

$$S(t, x) = \sum_{i=0}^\infty \frac{t^{i+1}}{(i+1)!}\,\big(\hat K_b^{\,i}\,q(\cdot)\big)(x). \tag{S23}$$

An obvious result follows immediately from equation (S21): if the rate of substitutions is the same for all fitnesses, i.e., if $q(x) = q_0 = \text{const}$, then substitutions accumulate linearly with time. Indeed, since $(\hat K_b q_0)(x) \equiv 0$, equation (S21) becomes an ordinary differential equation whose solution is $S(t, x) = q_0 t$.
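Equation (S23) and the constant-$q$ result can be checked on a finite state space, where $\hat K_b$ becomes a matrix. The sketch below is our own discretized stand-in, not part of the original method: it assumes a hypothetical rate table `rates[i][j]` playing the role of $Q(x_j|x_i)$ and truncates the series after a fixed number of terms, which is adequate only when $t$ times the largest total rate is moderate.

```python
import math


def substitution_trajectory_series(rates, t, terms=80):
    """Truncated series (S23) for S(t, x) on a finite state space.

    rates[i][j] is a hypothetical discretized stand-in for Q(x_j | x_i), the rate at
    which a population in state i substitutes to state j. Returns S(t, x_i) for all i.
    """
    k = len(rates)
    q = [sum(row) for row in rates]                        # eq. (S13), discretized

    def apply_kb(f):                                       # backward operator (S4), discretized
        return [sum(rates[i][j] * (f[j] - f[i]) for j in range(k)) for i in range(k)]

    term = [t * qi for qi in q]                            # i = 0 term of (S23): t q(x)
    total = list(term)
    for i in range(1, terms):
        term = [v * t / (i + 1) for v in apply_kb(term)]   # t^(i+1)/(i+1)! K_b^i q
        total = [a + b for a, b in zip(total, term)]
    return total


# Usage: every row sums to 0.3, so S(t, x) = 0.3 t from every state.
constant_q = substitution_trajectory_series(
    [[0.0, 0.2, 0.1], [0.1, 0.0, 0.2], [0.15, 0.15, 0.0]], 2.0)
# One transition into an absorbing state at rate 0.7: S(t) = 1 - exp(-0.7 t) from state 0.
absorbing = substitution_trajectory_series([[0.0, 0.7], [0.0, 0.0]], 2.0)
print(constant_q, absorbing[0], 1.0 - math.exp(-1.4))
```

The first case illustrates the linear accumulation derived above; the second shows the series reproducing a non-linear trajectory when $q$ varies across states.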
1.3.2 Approximate equations for the fitness and substitution trajectories

In this section we derive equations (1)–(2) in the main text. We assume that the advection approximation holds and that the functions $r(x)$ and $q(x)$ are sufficiently smooth. First, we notice that on landscapes for which mutations of large effect become increasingly unlikely as the fitness of the parent increases, most of the contribution to the integral (S4) comes from values $\xi \approx x$, and we can write $f(\xi) - f(x) \approx f'(x)(\xi - x)$. Consequently, $(\hat K_b f(\cdot))(x) \approx r(x) f'(x)$, where $r(x)$ is given by equation (3) in the main text. Under this so-called advection approximation, equations (S18) and (S21) become

$$\frac{\partial F}{\partial t}(t, x) = r(x)\,\frac{\partial F}{\partial x}, \qquad F(0, x) = x, \tag{S24}$$

$$\frac{\partial S}{\partial t}(t, x) = r(x)\,\frac{\partial S}{\partial x} + q(x), \qquad S(0, x) = 0, \tag{S25}$$

where $q(x)$ is defined by equation (4) in the main text (or equation (S13) above).
In fact, equations (S24) and (S25) are equivalent to equations (1)–(2) in the main text. To see this, first let

$$\chi(x_0, x, t) = \int_{x_0}^{x} \frac{d\xi}{r(\xi)} + t.$$

This function is monotonic in $x$ and in $x_0$ as long as $r(\xi)$ does not change sign. Since we are interested in adaptation, we always have $r(\xi) > 0$, so we can solve the equation $\chi(x_0, x, t) = 0$ with respect to $x_0$. Denote the solution by $x_0 = u(x, t)$. Analogously, denote the solution of the same equation with respect to $x$ by $x = v(x_0, t)$.

Both equations (S24) and (S25) have the same characteristic, which is given by

$$\frac{dx}{dt} = -r(x), \qquad x(0) = x_0.$$

The solution of equation (S24) does not change along this characteristic, and therefore it is given by $F(t, x) = u(x, t)$. Using the rules of implicit differentiation, it is easy to see that $F(t, x)$ satisfies equation (1) in the main text.

The solution of equation (S25) changes along this characteristic according to

$$\frac{dS}{dt} = q\big(v(x_0, t)\big), \qquad S(x_0, 0) = 0,$$

and therefore it is given by

$$S(t, x) = \int_0^t q\big(v(u(x, t), \tau)\big)\,d\tau = \int_x^{F(t, x)} \frac{q(\zeta)}{r(\zeta)}\,d\zeta.$$

Here we used the facts that $v(x_0, 0) = x_0$ and $v(u(x, t), t) \equiv x$. Now it is easy to see that $S(t, x)$ satisfies equation (2) in the main text.
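The characteristic solution can also be evaluated numerically without solving a PDE: $F(t, x) = u(x, t)$ is the root of $\int_x^{F} d\xi/r(\xi) = t$, and $S(t, x) = \int_x^{F} q(\zeta)/r(\zeta)\,d\zeta$. The Python sketch below assumes user-supplied functions `r` and `q` with $r > 0$ and a bracketing value `upper`; composite Simpson quadrature and bisection are our choice of numerical methods, not the authors'. The usage case $r(x) = cx$, $q(x) = q_0$ mimics the stairway to heaven structure of Section 3.4, for which $F = x e^{ct}$ and $S = q_0 t$.

```python
import math


def simpson(func, lo, hi, n=400):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * func(lo + k * h)
    return total * h / 3.0


def advection_trajectories(r, q, x, t, upper, iterations=60):
    """F(t, x) and S(t, x) from the characteristic solution of (S24)-(S25).

    F(t, x) = u(x, t) is the root of int_x^F dxi / r(xi) = t (requires r > 0), and
    S(t, x) = int_x^F q(zeta) / r(zeta) dzeta. `upper` must satisfy
    int_x^upper dxi / r(xi) >= t, so that the root is bracketed by [x, upper].
    """
    lo, hi = x, upper
    for _ in range(iterations):                     # travel time increases with F when r > 0
        mid = 0.5 * (lo + hi)
        if simpson(lambda z: 1.0 / r(z), x, mid) < t:
            lo = mid
        else:
            hi = mid
    f_t = 0.5 * (lo + hi)
    return f_t, simpson(lambda z: q(z) / r(z), x, f_t)


# Usage: r(x) = 0.05 x and q(x) = 0.2, so F = exp(0.05 t) x and S = 0.2 t.
f_10, s_10 = advection_trajectories(lambda z: 0.05 * z, lambda z: 0.2, 1.0, 10.0, upper=10.0)
print(f_10, math.exp(0.5), s_10)
```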
1.4 Numerical algorithm

Only in some special cases can formulas (S8), (S9), (S14), and (S15) be used effectively for evaluating the distributions $P(y, t|x)$ and $P_i(t|x)$. We propose the following recursion equations for an efficient numerical implementation.

1.4.1 Computing the distribution $P(y, t|x)$

The basic idea behind the recursion is to write the probability $P(y, t|x)$ as the sum over all possible paths connecting the initial fitness $x$ with fitness $y$ at time $t$, each with a particular number $m = 0, 1, \ldots$ of mutations:

$$P(y, t|x) = \sum_{m=0}^\infty U_m(t)\,V_m(y|x). \tag{S26}$$

Here, $U_m(t)$ is the probability of observing $m$ mutations during the time interval $[0, t]$, and $V_m(y|x)$ is the probability of a change in fitness from the initial value $x$ to the final value $y$ that takes exactly $m$ mutational attempts. Because mutations arise independently, $U_m(t)$ is the Poisson distribution with parameter $t$,

$$U_m(t) = \frac{t^m}{m!}\,e^{-t}. \tag{S27}$$
Note that the sum in equation (S26) runs over all possible numbers of mutations, some
of which will fix and some of which will not; if we conditioned on the mutations having
been fixed, the distribution U would no longer be Poisson.
The sequence $V_m(y|x)$ can be written as follows:

$$V_0(y|x) = \delta(y - x), \tag{S28}$$

$$V_m(y|x) = \int_0^\infty Q(y|\xi)\,V_{m-1}(\xi|x)\,d\xi + V_{m-1}(y|x)\int_0^\infty \big(1 - \pi_y(\xi)\big)\,\Phi_y(\xi)\,d\xi, \quad m = 1, 2, \ldots \tag{S29}$$

The relations (S29) have a simple intuitive interpretation. For each $m \ge 1$, the distribution of fitnesses after exactly $m$ mutations is a sum of two terms. The first term accounts for the situation in which the $m - 1$ mutations preceding the current one have brought the population into state $\xi$; it equals the probability that a mutation with fitness $y$ arises and is successfully fixed in the population with the intermediate fitness $\xi$. The second term accounts for the situation in which fitness $y$ has already been reached with the preceding $m - 1$ mutations. For the final fitness to still be $y$, the $m$-th mutation, whose fitness is $\xi$, must fail to fix.
In practice, one computes the distribution of the number of mutations from equation
(S27) to find the range of m over which U is non-negligible, evaluates by recursion the
terms in equation (S29) in the relevant range, and finally sums them up according to
equation (S26). A Matlab implementation of this algorithm is available upon request.
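A minimal Python version of this procedure on a discretized fitness grid is sketched below; it is not the authors' Matlab code. It assumes a hypothetical grid and a row-stochastic matrix `phi[i][j]` approximating the probability that a mutant arising at fitness $x_i$ has fitness $x_j$, builds the one-mutation matrix $M_{ij} = Q(x_j|x_i) + \delta_{ij}(1 - q(x_i))$ implied by (S29), and sums $U_m(t)V_m$ over a range of $m$ that covers essentially all of the Poisson mass. The Poisson weights are computed in log space so that $e^{-t}$ does not underflow.

```python
import math


def fixation_probability(x, y, n):
    """Kimura's pi_x(y), eq. (S1), with the neutral limit 1/N."""
    s = y / x - 1.0
    if s == 0.0:
        return 1.0 / n
    if -2.0 * n * s > 700.0:
        return 0.0                                   # strongly deleterious: underflows
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)


def one_mutation_matrix(grid, phi, n):
    """M[i][j] = Q(x_j|x_i) + delta_ij (1 - q(x_i)), the one-attempt kernel behind (S29).

    phi[i][j] approximates the probability that a mutant arising at grid[i] has
    fitness grid[j]; every row of phi must sum to 1.
    """
    k = len(grid)
    m = [[phi[i][j] * fixation_probability(grid[i], grid[j], n) for j in range(k)]
         for i in range(k)]
    for i in range(k):
        m[i][i] += 1.0 - sum(m[i])                   # the mutation arises but is lost
    return m


def fitness_distribution(grid, phi, n, start, t):
    """P(y, t|x) on the grid from (S26)-(S29), for x = grid[start] and t > 0."""
    m_mat = one_mutation_matrix(grid, phi, n)
    k = len(grid)
    v = [0.0] * k
    v[start] = 1.0                                   # V_0 = delta(y - x), eq. (S28)
    p = [0.0] * k
    m_max = int(t + 12.0 * math.sqrt(t) + 20)        # Poisson(t) mass beyond m_max is negligible
    for m in range(m_max + 1):
        u_m = math.exp(m * math.log(t) - t - math.lgamma(m + 1))          # U_m(t), eq. (S27)
        p = [pj + u_m * vj for pj, vj in zip(p, v)]
        v = [sum(v[i] * m_mat[i][j] for i in range(k)) for j in range(k)]  # V_{m+1}
    return p


def nepi_grid(k=25, h=0.2, a=0.5, x_min=1.0):
    """Hypothetical discretized exponential non-epistatic landscape (no deleterious mutants)."""
    grid = [x_min + h * i for i in range(k)]
    phi = []
    for i in range(k):
        w = [math.exp(-(grid[j] - grid[i]) / a) if j >= i else 0.0 for j in range(k)]
        total = sum(w)
        phi.append([wj / total for wj in w])
    return grid, phi


# Usage: distribution of fitness at t = 5 expected mutations, starting from grid[0].
grid, phi = nepi_grid()
p = fitness_distribution(grid, phi, 1000, 0, 5.0)
m00 = one_mutation_matrix(grid, phi, 1000)[0][0]
```

On this example landscape fitness never decreases, so the population is still at its initial grid point at time $t$ only if no mutation has moved it, which gives the exact check $P(x_0, t|x_0) = e^{-t(1 - M_{00})}$.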
Now we show that the solution to the recursion relations (S26)–(S29) in fact coincides with the solution (S8) of the forward equation (S5). First, note that, since $\int_0^\infty \Phi_y(\xi)\,d\xi = 1$, the second integral in (S29) equals $1 - q(y)$, so equations (S29) can be written in the form

$$V_m(y|x) = V_{m-1}(y|x) + \big(\hat K_f V_{m-1}(\cdot|x)\big)(y), \quad m = 1, 2, \ldots$$

Now it is easy to see that, in fact,

$$V_m(y|x) = \Big(\sum_{i=0}^{m} \binom{m}{i}\,\hat K_f^{\,i}\,V_0(\cdot|x)\Big)(y). \tag{S30}$$

Substituting (S27) and (S30) into (S26) and changing the order of summation, we obtain

$$P(y, t|x) = \Big(\sum_{i=0}^\infty \frac{(t\hat K_f)^i}{i!}\,V_0(\cdot|x)\Big)(y),$$

which coincides with (S8).
1.4.2 Computing the distribution $P_i(t|x)$

To compute the distribution $P_i(t|x)$, let us first write it as

$$P_i(t|x) = \sum_{m=i}^\infty U_m(t)\,W_m(i|x), \tag{S31}$$

where $W_m(i|x)$ is the probability that, out of $m$ mutations, exactly $i$ have fixed, given the initial fitness $x$; clearly $W_m(i|x) \equiv 0$ if $i > m$. First, note that the probability $w_j$ that the $j$-th mutation has fixed is given by the first term of equation (S29) integrated over all final fitnesses $y$,

$$w_j(x) = \int_0^\infty dy \int_0^\infty Q(y|\xi)\,V_{j-1}(\xi|x)\,d\xi, \quad j = 1, 2, \ldots$$

Let us describe the fate of $m$ mutations by a vector $\sigma^m = (\sigma_1, \sigma_2, \ldots, \sigma_m)$, where $\sigma_j = 1$ if the $j$-th mutation has fixed and $\sigma_j = 0$ if it was lost. The event that, out of $m$ mutations, exactly $i$ have fixed encompasses all events described by vectors $\sigma^m$ such that $\sum_{j=1}^m \sigma_j = i$. Denote the set of all such elementary events by $\Sigma_{m,i} = \{\sigma^m : \sum_{j=1}^m \sigma_j = i\}$. For example, the event that, out of 2 mutations, exactly one has fixed can be realized by $\sigma^2 = (1, 0)$, where the first mutation has fixed and the second has not, and by $\sigma^2 = (0, 1)$, where the second mutation has fixed and the first has not; therefore $\Sigma_{2,1} = \{(1, 0), (0, 1)\}$. Then, since all members of the set $\Sigma_{m,i}$ are mutually exclusive,

$$W_m(i|x) = \sum_{\sigma^m \in \Sigma_{m,i}} \prod_{j=1}^m w_j(x)^{\sigma_j}\,\big(1 - w_j(x)\big)^{1 - \sigma_j}. \tag{S32}$$
Of course, if the $w_j(x)$ were equal for all $j$, say $w_j(x) = w(x)$, this expression would reduce to the binomial probability with parameters $m$ and $w(x)$. In general, the $w_j(x)$ are not equal, and equation (S32) is difficult to evaluate. We conjecture, however, that the sum in (S32) is usually dominated by a small number of terms, which one could try to find knowing each $w_j(x)$ from the recursion relations.
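Under the product form of (S32), $W_m(\cdot|x)$ is a Poisson-binomial distribution, so the sum over $\Sigma_{m,i}$ need not be enumerated term by term: adding one mutation at a time gives an $O(m^2)$ recursion. This is a standard technique that we substitute here for direct enumeration; it is not described in the original text, and the list `w` of values $w_j(x)$ is assumed to have been computed beforehand from the recursion relations.

```python
def fixation_count_distribution(w):
    """Return [W_m(0|x), ..., W_m(m|x)] for the product form (S32), with m = len(w).

    w[j - 1] is w_j(x). Mutations are added one at a time: each either fixes (w_j)
    or is lost (1 - w_j), which is the standard Poisson-binomial recursion.
    """
    dist = [1.0]                                  # no mutations yet: zero fixations
    for w_j in w:
        new = [0.0] * (len(dist) + 1)
        for i, p in enumerate(dist):
            new[i] += p * (1.0 - w_j)             # j-th mutation lost
            new[i + 1] += p * w_j                 # j-th mutation fixed
        dist = new
    return dist


# Usage: the Sigma_{2,1} example from the text, with w_1 = 0.3 and w_2 = 0.6.
example = fixation_count_distribution([0.3, 0.6])
```

Here `example[1]` equals $w_1(1 - w_2) + (1 - w_1)w_2$, the sum over the two elements of $\Sigma_{2,1}$, and equal $w_j$ recover the binomial case.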
Fortunately, calculating lower-order statistics, such as the mean or the variance of the distribution $P_i(t|x)$, is much easier. The expected number of fixations of the $j$-th mutation is $1 \cdot w_j(x) + 0 \cdot (1 - w_j(x)) = w_j(x)$. Thus, the expected number of substitutions that occurred after $m$ mutations took place is

$$\sum_{i=0}^\infty i\,W_m(i|x) = \sum_{j=1}^m w_j(x). \tag{S33}$$

The substitution trajectory can then finally be written as

$$S(t, x) = \sum_{m=0}^\infty U_m(t) \sum_{i=0}^\infty i\,W_m(i|x) = \sum_{m=1}^\infty U_m(t) \sum_{j=1}^m w_j(x). \tag{S34}$$

The variance in the number of substitutions can be calculated similarly, taking into account that the variance of the number of fixations contributed by the $j$-th mutation equals $w_j(x)(1 - w_j(x))$ and that the total variance after $m$ mutations is the sum over the individual mutational steps.
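Equation (S34), together with the variance statement above, can be evaluated directly from the sequence $w_j(x)$. The sketch below assumes the $w_j$ are already known for $j = 1, \ldots,$ `len(w)`, that `len(w)` is large enough for the Poisson($t$) tail beyond it to be negligible, and that, as in the text, per-step variances add for fixed $m$; averaging over the Poisson number of mutations with the law of total variance is our addition.

```python
import math


def substitution_moments(w, t):
    """Mean (S34) and variance of the number of substitutions by time t > 0.

    w[j - 1] is w_j(x). Conditional on m mutations, the mean is sum_{j<=m} w_j and,
    following the text, the variance is sum_{j<=m} w_j (1 - w_j); the Poisson number
    of mutations is then averaged out with the law of total variance.
    """
    mean = second_moment = 0.0
    cond_mean = cond_var = 0.0
    for m in range(len(w) + 1):
        if m > 0:
            cond_mean += w[m - 1]
            cond_var += w[m - 1] * (1.0 - w[m - 1])
        u_m = math.exp(m * math.log(t) - t - math.lgamma(m + 1))   # U_m(t), eq. (S27)
        mean += u_m * cond_mean
        second_moment += u_m * (cond_var + cond_mean ** 2)
    return mean, second_moment - mean ** 2


# Usage: constant w_j = 0.2 over t = 3 expected mutations.
mean_3, var_3 = substitution_moments([0.2] * 60, 3.0)
```

With constant $w_j = w$, both moments reduce to $wt$, as expected for a Poisson process of mutations thinned with probability $w$.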
2 The role of neutral and deleterious mutations in adaptation

In this section we investigate how the distributions $P(y, t|x)$ and $P_i(t|x)$ change if a constant fraction of neutral or deleterious mutations is added to the NFD.

2.1 Neutral mutations

Suppose that on the fitness landscape $\Phi_x$ the distribution of fitnesses at time $t$ is $P(y, t|x)$ and the distribution of the number of accumulated substitutions is $P_i(t|x)$. Let

$$\tilde\Phi_x = \nu\,\delta_x + (1 - \nu)\,\Phi_x, \tag{S35}$$

where $\delta_x$ is a point mass centered at $x$, be a new fitness landscape with a fraction $\nu$ of neutral mutations. Let the distribution of fitnesses at time $t$ on this landscape be $\tilde P(y, t|x)$, and let the distribution of substitutions be $\tilde P_i(t|x)$. We claim that

$$\tilde P(y, t|x) = P\big(y, (1 - \nu)t\,\big|\,x\big), \tag{S36}$$

$$\tilde P_i(t|x) = \sum_{j=0}^{i} U_j(N^{-1}\nu t)\,P_{i-j}\big((1 - \nu)t\,\big|\,x\big), \tag{S37}$$

where $U_j(N^{-1}\nu t)$ is, as before, the Poisson distribution with parameter $N^{-1}\nu t$. Expression (S36) shows that the distribution $P(y, t|x)$ evolves on the landscape $\tilde\Phi_x$ with mutation rate $\theta$ exactly as on the landscape $\Phi_x$ with the smaller mutation rate $(1 - \nu)\theta$. Expression (S37) shows that, if the random variables $Q_t$ and $\tilde Q_t$ describe the number of substitutions that occurred by time $t$ on the fitness landscapes $\Phi_x$ and $\tilde\Phi_x$, respectively, then $\tilde Q_t = Q_{(1-\nu)t} + R_{\nu t}$, where $R_t$ is a Poisson process with rate $N^{-1}$. In other words, neutral mutations simply add an independent Poisson counting process to the original substitution process.
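To show that (S36) and (S37) hold, we substitute (S35) into (S4) and (S13) and obtain

$$\big(\tilde K_b f(\cdot)\big)(x) \overset{\text{def}}{=} \int_0^\infty \tilde\Phi_x(\xi)\,\pi_x(\xi)\,\big(f(\xi) - f(x)\big)\,d\xi = (1 - \nu)\,\big(\hat K_b f(\cdot)\big)(x), \tag{S38}$$

$$\tilde q(x) \overset{\text{def}}{=} \int_0^\infty \tilde\Phi_x(\xi)\,\pi_x(\xi)\,d\xi = \nu N^{-1} + (1 - \nu)\,q(x). \tag{S39}$$

It follows from (S38) that the backward equation for $\tilde P(y, t|x)$ differs from the backward equation for $P(y, t|x)$ only by the scaling factor $1 - \nu$, which gives (S36). Now,

$$\frac{\partial \tilde P_0}{\partial t}(t|x) = -N^{-1}\nu\,U_0(N^{-1}\nu t)\,P_0\big((1-\nu)t\,\big|\,x\big) - (1 - \nu)\,q(x)\,U_0(N^{-1}\nu t)\,P_0\big((1-\nu)t\,\big|\,x\big) = -\tilde q(x)\,\tilde P_0(t|x),$$

and

$$\begin{aligned}
\frac{\partial \tilde P_i}{\partial t}(t|x) ={}& \sum_{j=0}^{i}\Big[-N^{-1}\nu\,U_j(N^{-1}\nu t)\,P_{i-j}\big((1-\nu)t\,\big|\,x\big) - (1-\nu)\,U_j(N^{-1}\nu t)\,q(x)\,P_{i-j}\big((1-\nu)t\,\big|\,x\big)\Big] \\
&+ N^{-1}\nu \sum_{j=1}^{i} U_{j-1}(N^{-1}\nu t)\,P_{i-j}\big((1-\nu)t\,\big|\,x\big) + (1-\nu)\,q(x) \sum_{j=0}^{i-1} U_j(N^{-1}\nu t)\,P_{i-j-1}\big((1-\nu)t\,\big|\,x\big) \\
&+ (1-\nu) \sum_{j=0}^{i-1} U_j(N^{-1}\nu t)\,\big(\hat K_b P_{i-j-1}((1-\nu)t\,|\,\cdot)\big)(x) \\
={}& -\tilde q(x)\big[\tilde P_i(t|x) - \tilde P_{i-1}(t|x)\big] + \big(\tilde K_b \tilde P_{i-1}(t|\cdot)\big)(x),
\end{aligned}$$

which implies that the $\tilde P_i$ given by equation (S37) satisfy equations (S10) and (S11) for the landscape $\tilde\Phi_x$. As a consequence, the substitution trajectory $\tilde S(t, x)$ on the landscape $\tilde\Phi_x$ is given by

$$\tilde S(t, x) = S\big((1 - \nu)t, x\big) + \nu N^{-1} t,$$

which can also be obtained directly by substituting expressions (S38) and (S39) into solution (S23).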
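The neutral-mutation result is straightforward to apply to any known $q$ and $S$. In the sketch below, `q` and `substitution_trajectory` are assumed user-supplied functions for the original landscape; the code builds $\tilde q$ from (S39) and $\tilde S$ from the expression above. Neutral mutants contribute $\nu N^{-1}$ because Kimura's formula (S1) tends to $1/N$ as $s \to 0$, which the helper `kimura_pi` also illustrates.

```python
import math


def kimura_pi(s, n):
    """Kimura's fixation probability (S1) as a function of s, with its neutral limit 1/N."""
    if s == 0.0:
        return 1.0 / n
    if -2.0 * n * s > 700.0:
        return 0.0                                # strongly deleterious: underflows
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)


def with_neutral_fraction(q, substitution_trajectory, nu, n):
    """Return (q~, S~) for the landscape (S35) from q(x) and S(t, x) on the original one."""
    def q_tilde(x):
        return nu * kimura_pi(0.0, n) + (1.0 - nu) * q(x)        # eq. (S39)

    def s_tilde(t, x):
        return substitution_trajectory((1.0 - nu) * t, x) + nu * t / n

    return q_tilde, s_tilde


# Usage: a landscape with constant q = 1/6 (so S = t/6), 30% neutral mutations, N = 100.
q_t, s_t = with_neutral_fraction(lambda x: 1.0 / 6.0, lambda t, x: t / 6.0, 0.3, 100)
```

For constant $q$, the rescaled trajectory is again linear, $\tilde S(t) = \tilde q\,t$, consistent with the constant-$q$ result of Section 1.3.1.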
2.2 Deleterious mutations

In general, it is hard to predict how deleterious mutations would influence the dynamics of adaptation. However, their effect becomes negligible as the population size goes to infinity, at least in the weak-mutation limit. Indeed, the fixation probability (S1) of deleterious mutations quickly tends to zero as the population size increases. For example, the probability of fixation of a moderately deleterious mutation with a selective disadvantage of 0.1% is less than $10^{-3}$ in a population of size $10^3$ and less than $10^{-11}$ in a population of size $10^4$. Thus, even in moderately large populations, the vast majority of deleterious mutations will not go to fixation. Therefore, on long time scales, all deleterious mutations are effectively lethal. Intuitively, this means that if we add a fraction $d$ of deleterious mutations to the NFD of all genotypes, this fraction of mutations will simply be wasted, and only the remaining fraction $1 - d$ will be potentially utilized in the process of adaptation. To illustrate this, we add a fraction $d$ of deleterious mutations to the non-epistatic and stairway to heaven landscapes. We call the resulting landscapes NEPI+d and STH+d, respectively. The fitness and substitution trajectories for these landscapes are shown in Figure S1. As expected, the analytical approximations calculated under the assumption that deleterious mutations are wasted give an excellent fit to simulations.

An important consequence of this observation is that the weak-mutation theory holds when $\theta_b \ll (4 \log N)^{-1}$, instead of the more stringent $\theta \ll (4 \log N)^{-1}$, where $\theta_b \equiv (1 - d)\theta$. If the genomic rate of beneficial mutations $\mu_b$ is $10^{-5}$ [4], then this condition is satisfied for population sizes smaller than 1000.
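The two fixation probabilities quoted above, and the beneficial-mutation condition, can be checked numerically. The sketch assumes that "log" in $(4 \log N)^{-1}$ denotes the natural logarithm and reports the ratio $\theta_b / (4 \log N)^{-1} = 4 N \mu_b \log N$ with $\theta_b = N\mu_b$, which must be well below 1 for the weak-mutation theory to apply.

```python
import math


def kimura_pi(s, n):
    """Kimura's fixation probability (S1) for selection coefficient s in a population of size n."""
    if s == 0.0:
        return 1.0 / n
    if -2.0 * n * s > 700.0:
        return 0.0                                # strongly deleterious: underflows
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)


def weak_mutation_ratio(n, mu_b):
    """theta_b / (4 ln N)^-1 with theta_b = N mu_b (natural logarithm assumed)."""
    return 4.0 * n * mu_b * math.log(n)


for n in (10**3, 10**4):
    print(f"N = {n}: pi(-0.001) = {kimura_pi(-0.001, n):.3g}, "
          f"ratio = {weak_mutation_ratio(n, 1e-5):.3g}")
```

This gives fixation probabilities of roughly $3 \times 10^{-4}$ and $4 \times 10^{-12}$, and ratios of about 0.28 at $N = 10^3$ versus about 3.7 at $N = 10^4$, in line with the statement that the condition holds only below roughly $N = 1000$.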
[Figure S1 image not reproduced. Panels: MTF, NEPI+d, STH+d. Axes: probability density of $\Phi_1(y)$ and $\Phi_4(y)$ vs. fitness of a mutant, $y$; expected fitness, $F(t)$, vs. time, $t$; expected number of substitutions, $S(t)$, vs. time, $t$.]

Figure S1: Dynamics of adaptation on the continuous additive Mount Fuji landscape (MTF), the non-epistatic landscape with deleterious mutations (NEPI+d), and the stairway to heaven landscape with deleterious mutations (STH+d). Notations are as in Figure 1 in the main text. Parameter values used: $N = 1000$, $\mu = 10^{-5}$ ($\theta = 0.01$), $L = 1000$, number of replicate simulations $= 10^3$. MTF landscape: $x_{\max} = 5$, $a = 1$; NEPI+d landscape: $d = 0.5$ and $a = 1$; STH+d landscape: $d = 0.5$ and $a = 0.42$. The same analytical approximations were used here for the NEPI+d and STH+d landscapes as in the main text, but time was rescaled by $\theta(1 - d)$ instead of $\theta$.
3 Classical landscapes

Recall that we employ the following definitions (see main text).

1. The house of cards or uncorrelated landscapes are the landscapes on which the NFD is the same for all genotypes (and fitnesses),
$$\Phi_x(y)\,dy = \Psi(y)\,dy.$$
2. The non-epistatic landscapes are the landscapes on which the distribution of fitness effects $\Psi(v)$ remains the same for all genotypes, so that the NFD is given by
$$\Phi_x(y)\,dy = \Psi(y - x)\,dy.$$
3. The stairway to heaven landscapes are the landscapes on which the distribution of selection coefficients of mutations, $\Psi(s)$, is the same for all genotypes, so that the NFD is given by
$$\Phi_x(y)\,dy = \frac{1}{x}\,\Psi\!\left(\frac{y - x}{x}\right)dy.$$

In the main text we considered special cases of these landscapes in which the distribution $\Psi$ is of exponential form (see Table 1 in the main text):

House of cards:
$$\Phi_x(y) = \frac{1}{a}\exp\Big\{-\frac{y}{a}\Big\}, \quad y \ge 0, \tag{S40}$$

Non-epistatic:
$$\Phi_x(y) = \frac{1}{a}\exp\Big\{-\frac{y - x}{a}\Big\}, \quad y \ge x, \tag{S41}$$

Stairway to heaven:
$$\Phi_x(y) = \frac{1}{ax}\exp\Big\{-\frac{y - x}{ax}\Big\}, \quad y \ge x. \tag{S42}$$
Correlation structure
The house of cards, non-epistatic, and stairway to heaven landscapes differ by the
correlation structure between parent and offspring fitnesses. By definition, there is no such
correlation on the house of cards landscape. By contrast, the offspring fitness is positively
correlated with the parent fitness on both non-epistatic and stairway to heaven landscapes.
Let X be the fitness of the parent that is drawn randomly from some distribution, and Y
be the fitness of the offspring.
On non-epistatic landscapes, Y = X + V , where V is the fitness increment which is
drawn from distribution Ψ(v) independently of X. Then
Cov(X, Y ) = E (X − X̄)(X − X̄ + V − V̄ )
= Var(X) > 0,
12
where, X̄ and Var(X) are the mean and the variance of the distribution from which the
parent is drawn, and V̄ > 0 is the mean of the distribution of fitness increments.
On stairway to heaven landscapes, $Y = X(1 + S)$, where the selection coefficient $S$ is drawn from the distribution $\Psi(s)$ independently of $X$. Then
$$\mathrm{Cov}(X, Y) = \mathrm{E}\left[(X - \bar{X})(X - \bar{X} + XS - \bar{X}\bar{S})\right] = \mathrm{Var}(X)\,(1 + \bar{S}) > 0,$$
where $\bar{S} > -1$ is the mean of the distribution of selection coefficients.
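These covariances can be checked by sampling. The Python sketch below draws offspring from the exponential NFDs (S40)–(S42); the parent distribution (uniform on [1, 3], so Var(X) = 1/3) and the mean a = 0.5 are hypothetical choices for illustration. The estimates should come out near 0 (house of cards), 1/3 (non-epistatic) and (1/3)(1 + 0.5) = 0.5 (stairway to heaven).

```python
import random


def draw_offspring(landscape: str, x: float, a: float, rng: random.Random) -> float:
    # One offspring fitness under the exponential NFDs (S40)-(S42)
    if landscape == "HOC":
        return rng.expovariate(1.0 / a)              # Y ~ Ψ, independent of X
    if landscape == "NEPI":
        return x + rng.expovariate(1.0 / a)          # Y = X + V
    if landscape == "STH":
        return x * (1.0 + rng.expovariate(1.0 / a))  # Y = X (1 + S)
    raise ValueError(f"unknown landscape {landscape!r}")


def sample_cov(landscape: str, a: float = 0.5, n: int = 100_000, seed: int = 1) -> float:
    rng = random.Random(seed)
    # Hypothetical parent distribution: uniform on [1, 3], so Var(X) = 1/3
    xs = [rng.uniform(1.0, 3.0) for _ in range(n)]
    ys = [draw_offspring(landscape, x, a, rng) for x in xs]
    mx, my = sum(xs) / n, sum(ys) / n
    # Unbiased sample covariance
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


if __name__ == "__main__":
    for name in ("HOC", "NEPI", "STH"):
        print(name, round(sample_cov(name), 3))
```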
3.2 Approximate solution for the exponential house of cards landscape
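On the house of cards landscape (S40) we have, for large population sizes and for large $x$,
$$q(x) = \frac{1}{a}\, e^{-x/a} \int_x^{\infty} e^{-\frac{y - x}{a}} \left(1 - e^{-2\frac{y - x}{x}}\right) dy = \frac{2a}{x + 2a}\, e^{-x/a} \approx \frac{2a}{x}\, e^{-x/a},$$
$$r(x) = \frac{1}{a}\, e^{-x/a} \int_x^{\infty} (y - x)\, e^{-\frac{y - x}{a}} \left(1 - e^{-2\frac{y - x}{x}}\right) dy = \frac{4a^2 (x + a)}{(x + 2a)^2}\, e^{-x/a} \approx \frac{4a^2}{x}\, e^{-x/a}.$$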
The last approximation for $q(x)$ and $r(x)$ is not very accurate, since it neglects corrections of relative order $a/x$, but it captures the fact that the exponential decay in both $q(x)$ and $r(x)$ dominates the power-law decay as $x$ grows. After substituting these functions into equations (1), (2) in the main text, we solve them using the method of characteristics to obtain the expressions for the fitness and substitution trajectories presented in Figure 1 (main text).
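The closed forms for $q(x)$ and $r(x)$ can be checked numerically. The minimal Python sketch below assumes the large-population fixation probability $\pi_x(y) = 1 - e^{-2(y-x)/x}$ for $y > x$ that appears in the integrands above, and compares composite Simpson quadrature (truncated at $y = x + 60a$, where the tail is negligible) with the exact expressions. All function names are hypothetical.

```python
import math


def fix_prob(x: float, y: float) -> float:
    # Large-N fixation probability of a mutant of fitness y in a population of fitness x
    return 1.0 - math.exp(-2.0 * (y - x) / x) if y > x else 0.0


def simpson(f, lo: float, hi: float, n: int = 20_000) -> float:
    # Composite Simpson's rule; n must be even
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def q_r_numeric(x: float, a: float, cutoff: float = 60.0):
    # House of cards (S40): Φ_x(y) = exp(-y/a)/a; only beneficial y > x contribute
    phi = lambda y: math.exp(-y / a) / a
    hi = x + cutoff * a
    q = simpson(lambda y: phi(y) * fix_prob(x, y), x, hi)
    r = simpson(lambda y: (y - x) * phi(y) * fix_prob(x, y), x, hi)
    return q, r


def q_r_closed(x: float, a: float):
    # Exact expressions derived above
    decay = math.exp(-x / a)
    return (2 * a / (x + 2 * a) * decay,
            4 * a**2 * (x + a) / (x + 2 * a) ** 2 * decay)


if __name__ == "__main__":
    a = 1.0
    for x in (2.0, 5.0, 20.0):
        qn, rn = q_r_numeric(x, a)
        qc, rc = q_r_closed(x, a)
        q_approx = 2 * a / x * math.exp(-x / a)
        print(f"x={x}: q num/exact={qn / qc:.8f}, r num/exact={rn / rc:.8f}, "
              f"q approx/exact={q_approx / qc:.3f}")
```

The last column equals $(x + 2a)/x$, e.g. 1.4 at $x = 5a$, which illustrates why the large-$x$ approximation is only qualitatively useful at moderate fitness.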
3.3 Approximate solution for the exponential non-epistatic landscape
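On the non-epistatic landscape (S41) we have, for large population sizes and for large $x$,
$$q(x) = \frac{1}{a} \int_x^{\infty} e^{-\frac{y - x}{a}} \left(1 - e^{-2\frac{y - x}{x}}\right) dy = \frac{2a}{x + 2a} \approx \frac{2a}{x},$$
$$r(x) = \frac{1}{a} \int_x^{\infty} (y - x)\, e^{-\frac{y - x}{a}} \left(1 - e^{-2\frac{y - x}{x}}\right) dy = \frac{4a^2 (x + a)}{(x + 2a)^2} \approx \frac{4a^2}{x}.$$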
After substituting these functions into equations (1), (2) in the main text, we solve them
using the method of characteristics to obtain expressions for the fitness and substitution
trajectories presented in Figure 1 (main text).
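Equations (1), (2) of the main text are not reproduced in this section. As a rough illustration only, the Python sketch below assumes that, in the advection approximation, the method of characteristics reduces them to following a characteristic along which fitness obeys $dX/dt = r(X)$ and substitutions accumulate at rate $dK/dt = q(X)$; this reading of (1), (2) is our assumption, not a restatement of them. With the large-$x$ forms $r \approx 4a^2/x$ and $q \approx 2a/x$, the system has the closed-form solution $X(t) = \sqrt{x_0^2 + 8a^2 t}$ and $K(t) = (X(t) - x_0)/(2a)$, which the RK4 integration should reproduce. Function names are hypothetical.

```python
import math
from typing import Callable, Tuple

Rate = Callable[[float, float], float]  # rate(x, a)


def r_nepi(x: float, a: float) -> float:
    # r(x) on the exponential non-epistatic landscape (S41)
    return 4 * a**2 * (x + a) / (x + 2 * a) ** 2


def q_nepi(x: float, a: float) -> float:
    # q(x) on the exponential non-epistatic landscape (S41)
    return 2 * a / (x + 2 * a)


def characteristic(x0: float, a: float, t_end: float, r: Rate, q: Rate,
                   steps: int = 10_000) -> Tuple[float, float]:
    # Classical RK4 for dX/dt = r(X), dK/dt = q(X), starting from X = x0, K = 0
    h = t_end / steps
    x, k = x0, 0.0
    for _ in range(steps):
        k1x, k1k = r(x, a), q(x, a)
        k2x, k2k = r(x + 0.5 * h * k1x, a), q(x + 0.5 * h * k1x, a)
        k3x, k3k = r(x + 0.5 * h * k2x, a), q(x + 0.5 * h * k2x, a)
        k4x, k4k = r(x + h * k3x, a), q(x + h * k3x, a)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        k += h * (k1k + 2 * k2k + 2 * k3k + k4k) / 6
    return x, k


if __name__ == "__main__":
    x0, a, t = 2.0, 0.5, 100.0
    approx = characteristic(x0, a, t, lambda x, a: 4 * a * a / x, lambda x, a: 2 * a / x)
    x_closed = math.sqrt(x0**2 + 8 * a**2 * t)
    print("large-x rates:", approx, "closed form:", (x_closed, (x_closed - x0) / (2 * a)))
    print("exact rates:  ", characteristic(x0, a, t, r_nepi, q_nepi))
```

Because the exact $r(x)$ is smaller than $4a^2/x$ for every $x > 0$, the trajectory computed with the exact rates stays below the closed-form curve; both grow roughly like $\sqrt{t}$, i.e., they decelerate.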
3.4 Exact solution for an arbitrary stairway to heaven landscape
It is possible to solve equations (S18)–(S22) for an arbitrary stairway to heaven landscape, $\Phi_x(y) = x^{-1}\,\Psi(y/x - 1)$. First, note that the expected fixation probability of a mutation,
$$q(x) = \int_{-1}^{\infty} \Psi(s)\,\pi(s)\,ds = \langle \pi(s) \rangle,$$
is independent of the fitness $x$ of the parental genotype. In addition,
$$r(x) = x \int_{-1}^{\infty} s\,\Psi(s)\,\pi(s)\,ds = x\,\langle \pi(s)\,s \rangle,$$
which, after exploring the advection approximation, suggests the ansatz $F(t, x) = f(t)\,x$ and $S(t, x) = g(t)$ for equations (S18)–(S22). With this ansatz we obtain
$$\left(\hat{K}_b F(t, \cdot)\right)(x) = \int_0^{\infty} \frac{1}{x}\,\Psi\!\left(\frac{\xi - x}{x}\right) \pi_x(\xi)\, f(t)\,(\xi - x)\,d\xi = \langle \pi(s)\,s \rangle\, F(t, x),$$
$$\left(\hat{K}_b S(t, \cdot)\right)(x) = 0,$$
where $\langle \pi(s)\,s \rangle$ is the expected selection coefficient of a random mutation to any genotype, weighted by its fixation probability. Equations (S18)–(S22) thus become simple ODEs whose solutions are
$$F(t, x) = x\,e^{\langle \pi(s)\,s \rangle t}, \tag{S43}$$
$$S(t, x) = \langle \pi(s) \rangle\, t. \tag{S44}$$
Expressions for the fitness and substitution trajectories presented in Figure 1 (main text) follow from expressions (S43), (S44) by noting that, for large population sizes,
$$\langle \pi(s) \rangle = \frac{1}{a} \int_0^{\infty} e^{-s/a} \left(1 - e^{-2s}\right) ds = \frac{2a}{1 + 2a},$$
$$\langle \pi(s)\,s \rangle = \frac{1}{a} \int_0^{\infty} s\, e^{-s/a} \left(1 - e^{-2s}\right) ds = \frac{4a^2 (1 + a)}{(1 + 2a)^2}.$$
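Expressions (S43), (S44) can be compared with a direct simulation of a weak-mutation jump process. The Python sketch below is not the simulation used in the paper; it assumes that, in rescaled time, mutations arise at unit rate, that a mutation's selection coefficient $s$ is drawn from the exponential $\Psi$ of (S42), and that it fixes with probability $\pi(s) = 1 - e^{-2s}$ (the large-population form used above), in which case the population jumps instantly from fitness $x$ to $x(1+s)$. With the default parameters, the theoretical values are $F \approx 1.357$ and $S \approx 1.667$.

```python
import math
import random
from typing import Tuple


def pi_fix(s: float) -> float:
    # Large-population fixation probability used in this section (s >= 0)
    return 1.0 - math.exp(-2.0 * s)


def one_run(x0: float, a: float, t_end: float, rng: random.Random) -> Tuple[float, int]:
    # Assumption: unit mutation rate in rescaled time; a mutation fixes with
    # probability pi_fix(s) and moves the population from fitness x to x(1 + s)
    x, k, t = x0, 0, 0.0
    while True:
        t += rng.expovariate(1.0)             # waiting time to the next mutation
        if t > t_end:
            return x, k
        s = rng.expovariate(1.0 / a)          # STH: s ~ Ψ regardless of x
        if rng.random() < pi_fix(s):
            x *= 1.0 + s
            k += 1


def compare(x0: float = 1.0, a: float = 0.1, t_end: float = 10.0,
            reps: int = 20_000, seed: int = 7):
    rng = random.Random(seed)
    runs = [one_run(x0, a, t_end, rng) for _ in range(reps)]
    mean_x = sum(x for x, _ in runs) / reps
    mean_k = sum(k for _, k in runs) / reps
    pi_mean = 2 * a / (1 + 2 * a)                      # <π(s)>
    pi_s_mean = 4 * a**2 * (1 + a) / (1 + 2 * a) ** 2  # <π(s) s>
    # (simulated F, predicted F from (S43), simulated S, predicted S from (S44))
    return mean_x, x0 * math.exp(pi_s_mean * t_end), mean_k, pi_mean * t_end


if __name__ == "__main__":
    print(compare())
```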
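It can be shown analogously that the $n$-th moment of the distribution of fitnesses, $M_n(t, x)$, evolves according to
$$M_n(t, x) = x^n e^{\kappa_n t}, \qquad \kappa_n = \sum_{j=1}^{n} \binom{n}{j} \langle \pi(s)\,s^j \rangle.$$
In particular, since $M_1 = F$ and $\kappa_2 - 2\kappa_1 = \langle \pi(s)\,s^2 \rangle$, the relative width of the distribution of fitnesses increases with time: $\left(M_2(t, x) - F^2(t, x)\right)/F^2(t, x) = e^{\langle \pi(s)\,s^2 \rangle t} - 1$.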
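For the exponential $\Psi$ of (S42) with $\pi(s) = 1 - e^{-2s}$, every $\langle \pi(s)\,s^j \rangle$ has the closed form $j!\left(a^j - a^{-1}(a^{-1} + 2)^{-(j+1)}\right)$, which follows from $\int_0^\infty s^j e^{-cs}\,ds = j!/c^{j+1}$. The Python sketch below uses it to evaluate $\kappa_n$, the moments $M_n$, and the relative width of the fitness distribution; the function names are hypothetical.

```python
import math


def pi_s_moment(j: int, a: float) -> float:
    # <π(s) s^j> = (1/a) ∫_0^∞ s^j e^{-s/a} (1 - e^{-2s}) ds
    #            = j! (a^j - (1/a) (1/a + 2)^{-(j+1)})
    return math.factorial(j) * (a**j - (1.0 / a) * (1.0 / a + 2.0) ** (-(j + 1)))


def kappa(n: int, a: float) -> float:
    # κ_n = Σ_{j=1}^{n} C(n, j) <π(s) s^j>
    return sum(math.comb(n, j) * pi_s_moment(j, a) for j in range(1, n + 1))


def moment(n: int, t: float, x: float, a: float) -> float:
    # M_n(t, x) = x^n exp(κ_n t)
    return x**n * math.exp(kappa(n, a) * t)


def squared_cv(t: float, x: float, a: float) -> float:
    # Relative width (M_2 - F^2) / F^2, with F = M_1
    f = moment(1, t, x, a)
    return (moment(2, t, x, a) - f * f) / (f * f)


if __name__ == "__main__":
    a = 0.1
    for t in (0.0, 10.0, 100.0):
        print(t, squared_cv(t, 1.0, a), math.exp(pi_s_moment(2, a) * t) - 1.0)
```

The two printed columns agree, and neither depends on the initial fitness $x$, as expected on a landscape where only relative effects matter.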
3.5 Mount Fuji landscape
In addition to the classical fitness landscapes considered above, the class of fitness-parametrized landscapes encompasses many other landscapes. To demonstrate this, we present here a version of the "Mount Fuji" landscape. On this landscape, fitness decreases monotonically with the Hamming distance from the single optimal genotype, so that the fitness of a genotype that differs from the optimal one by $h$ mutations is $(1 - s)^h$, where $0 < s < 1$. Formulated in terms of neighbor fitness distributions, such a multiplicative Mount Fuji landscape is defined on the discrete set of fitnesses $x \in \{1, 1 - s, \ldots, (1 - s)^L\}$, where $L$ is the genome size, by
$$\Phi_x(y) = b_h\,\delta_{x(1-s)^{-1}}(y) + (1 - b_h)\,\delta_{x(1-s)}(y).$$
Here, $\delta_z$ is, as before, a point measure centered at $z$, and $b_h$ is the probability that a mutation to a genotype with $h$ mutations is beneficial. These probabilities can be calculated from the genome length $L$ and the alphabet size $|A|$. For instance, $b_0 = 0$, because no mutation can improve on the optimal genotype; for a binary alphabet in which every mutation changes the allele at one uniformly chosen site, $b_h = h/L$, so that $b_1 = L^{-1}$ and $b_L = 1$.
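Under a hypothetical setup (binary alphabet with $b_h = h/L$, mutations arising at unit rate in rescaled time, deleterious mutations never fixing, and a beneficial mutation, whose selection coefficient is $(1-s)^{-1} - 1 = s/(1-s)$, fixing with probability $\pi_b = 1 - e^{-2s/(1-s)}$), weak-mutation adaptation on this landscape is a pure-death process on $h$. The Python sketch below integrates its master equation with forward Euler steps and reports the expected fitness. Under the same assumptions, each of the $h_0$ initially mismatched sites is repaired independently at rate $\lambda = \pi_b/L$, so the expected fitness also equals $(1 - s\,e^{-\lambda t})^{h_0}$, which serves as a check. None of this is the paper's own computation.

```python
import math


def fuji_expected_fitness(L: int, s: float, h0: int, t_end: float,
                          steps: int = 20_000) -> float:
    pi_b = 1.0 - math.exp(-2.0 * s / (1.0 - s))  # fixation prob. of a beneficial step
    rate = [h / L * pi_b for h in range(L + 1)]  # substitution rate out of class h
    p = [0.0] * (L + 1)                          # p[h] = P(h steps from the optimum)
    p[h0] = 1.0
    dt = t_end / steps
    for _ in range(steps):
        flow = [rate[h] * p[h] for h in range(L + 1)]  # probability flux h -> h - 1
        for h in range(1, L + 1):
            p[h] -= dt * flow[h]
            p[h - 1] += dt * flow[h]
    return sum(p[h] * (1.0 - s) ** h for h in range(L + 1))


def fuji_closed_form(L: int, s: float, h0: int, t: float) -> float:
    # Independent repair of each mismatched site at rate λ = π_b / L
    lam = (1.0 - math.exp(-2.0 * s / (1.0 - s))) / L
    return (1.0 - s * math.exp(-lam * t)) ** h0


if __name__ == "__main__":
    for t in (0.0, 50.0, 200.0):
        print(t, fuji_expected_fitness(10, 0.1, 10, t), fuji_closed_form(10, 0.1, 10, t))
```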
A continuous version of the additive Mount Fuji landscape can also be defined, for example, as follows:
$$\Phi_x(y) = \begin{cases} \dfrac{1}{a}, & \text{if } y \in \left[x\left(1 - \dfrac{a}{x_{\max}}\right),\; x\left(1 - \dfrac{a}{x_{\max}}\right) + a\right], \\[2ex] 0, & \text{otherwise.} \end{cases}$$
On this landscape, the fraction of beneficial mutations decreases linearly from 1 to 0 as the fitness increases from 0 to the maximum value $x_{\max}$. The parameter $a$ sets the width of the NFD. The dynamics of adaptation on this landscape are shown in Figure S1.
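The linear decline of the beneficial fraction follows because the interval above starts at $x - ax/x_{\max}$, so the portion lying above the parent fitness has length $a(1 - x/x_{\max})$. The Python sketch below (hypothetical function names) evaluates this NFD and checks the beneficial fraction against $1 - x/x_{\max}$, both analytically and with a midpoint-rule integral.

```python
def fuji_density(x: float, y: float, a: float, x_max: float) -> float:
    # Uniform NFD of width a starting at x (1 - a / x_max)
    lo = x * (1.0 - a / x_max)
    return 1.0 / a if lo <= y <= lo + a else 0.0


def beneficial_fraction(x: float, a: float, x_max: float) -> float:
    # Probability mass of the NFD above the parent fitness x
    lo = x * (1.0 - a / x_max)
    return max(0.0, lo + a - max(lo, x)) / a


def beneficial_fraction_numeric(x: float, a: float, x_max: float, n: int = 50_000) -> float:
    # Midpoint-rule integral of the density over y > x, as a cross-check
    lo = x * (1.0 - a / x_max)
    h = a / n
    mids = (lo + (i + 0.5) * h for i in range(n))
    return sum(fuji_density(x, y, a, x_max) * h for y in mids if y > x)


if __name__ == "__main__":
    for x in (0.0, 2.5, 7.0, 10.0):
        print(x, beneficial_fraction(x, 0.3, 10.0), beneficial_fraction_numeric(x, 0.3, 10.0))
```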
4 Relaxation of the weak-mutation limit
In this section we use simulations to investigate the validity of our theory outside the weak-mutation limit. We perform full stochastic simulations of the infinite-alleles Wright-Fisher model with $N = 1000$ individuals. We vary the mutation rate from $\mu = 10^{-5}$ to $\mu = 10^{-2}$ per individual per generation, which corresponds to $\theta$ ranging from $\theta = 0.01$, where our theory should describe the Wright-Fisher model well, up to $\theta = 10$, where clonal interference and piggybacking effects cannot be ignored.
In the simulations, each individual is characterized by its allelic type $z$ (a real number between 0 and 1); $x_z$ is the fitness of allele $z$, and $k_z$ is the number of mutations that have occurred on the line of descent of an individual of type $z$. A mutant offspring of an individual of type $z$ has type $z'$, drawn at random from $[0, 1]$; its fitness $x_{z'}$ is then drawn from the distribution $\Phi_{x_z}$, and $k_{z'} = k_z + 1$.
At each time point $t$ the population is characterized by a collection of $K(t)$ types $z_1, z_2, \ldots, z_{K(t)}$ and their frequencies $f_1, f_2, \ldots, f_{K(t)}$. We use the shorthand notation $x_i \equiv x_{z_i}$ and $k_i \equiv k_{z_i}$. In the simulations we track four summary statistics:
1. The mean fitness of the population, $\sum_{i=1}^{K(t)} x_i f_i$.
2. The mean number of mutations since the initial time point, $\sum_{i=1}^{K(t)} k_i f_i$.
3. The population heterozygosity, $1 - \sum_{i=1}^{K(t)} f_i^2$.
4. The number of alleles present in the population, $K(t)$.
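The simulation scheme described above can be sketched in Python as follows. This is not the authors' code. It uses integer allele labels instead of random numbers in [0, 1] (equivalent for an infinite-alleles model, since every mutation creates a new allele), and it assumes that in each generation parents are sampled with probability proportional to fitness and each offspring then mutates with probability µ. The example landscape is the exponential stairway to heaven NFD with a hypothetical mean a = 0.1, starting from a monomorphic population of fitness 2. The function records the four summary statistics listed above.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

Stats = Tuple[float, float, float, int]


def wf_generation(pop: List[int], fit: Dict[int, float], nmut: Dict[int, int], mu: float,
                  draw_fitness: Callable[[float, random.Random], float],
                  rng: random.Random, next_id: int) -> Tuple[List[int], int]:
    # Selection and drift: N parents sampled with probability proportional to fitness
    parents = rng.choices(pop, weights=[fit[z] for z in pop], k=len(pop))
    offspring = []
    for z in parents:
        if rng.random() < mu:
            # Mutation creates a brand-new allele (infinite-alleles model)
            fit[next_id] = draw_fitness(fit[z], rng)  # x_{z'} ~ Φ_{x_z}
            nmut[next_id] = nmut[z] + 1               # k_{z'} = k_z + 1
            offspring.append(next_id)
            next_id += 1
        else:
            offspring.append(z)
    # Forget alleles that went extinct
    alive = set(offspring)
    for table in (fit, nmut):
        for z in [z for z in table if z not in alive]:
            del table[z]
    return offspring, next_id


def summary(pop: List[int], fit: Dict[int, float], nmut: Dict[int, int]) -> Stats:
    # Mean fitness, mean number of mutations, heterozygosity, number of alleles K(t)
    n = len(pop)
    freqs = [(z, c / n) for z, c in Counter(pop).items()]
    mean_fitness = sum(fit[z] * f for z, f in freqs)
    mean_mutations = sum(nmut[z] * f for z, f in freqs)
    heterozygosity = 1.0 - sum(f * f for _, f in freqs)
    return mean_fitness, mean_mutations, heterozygosity, len(freqs)


def run(N: int = 1000, mu: float = 1e-3, a: float = 0.1, generations: int = 200,
        seed: int = 0) -> List[Stats]:
    rng = random.Random(seed)
    pop, fit, nmut, next_id = [0] * N, {0: 2.0}, {0: 0}, 1  # monomorphic, fitness 2
    # Example landscape: stairway to heaven with exponential Ψ of mean a
    draw = lambda x, r: x * (1.0 + r.expovariate(1.0 / a))
    history = []
    for _ in range(generations):
        pop, next_id = wf_generation(pop, fit, nmut, mu, draw, rng, next_id)
        history.append(summary(pop, fit, nmut))
    return history


if __name__ == "__main__":
    for g, stats in enumerate(run(), start=1):
        if g % 50 == 0:
            print(g, stats)
```

Averaging the returned statistics over many independent seeds gives replicate means of the kind described for Figures S2–S5.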
[Figure S2 plot: panels for the HOC, NEPI and STH landscapes showing mean fitness, mean number of substitutions, heterozygosity and number of alleles versus time, t (in generations)]
Figure S2: Dynamics of adaptation in the full Wright-Fisher model with $\mu = 10^{-5}$, $N = 10^3$ on three classical landscapes. The first and second columns show the fitness and substitution trajectories (see text for details). Black lines correspond to the predictions of our theory; gray lines show the results of the Wright-Fisher simulations; dashed lines show a linear function, for reference. The third column shows the evolution of heterozygosity, and the fourth column shows how the number of alleles in the population changes over time (see text for details). Note that time is measured in generations. Parameter values are the same as in Figure 1 in the main text, except that the number of replicate simulations is $10^3$; at time zero the population is monomorphic for a type with fitness 2. Simulations are terminated prematurely if the fitness of an individual exceeds $10^{100}$.
[Figure S3 plot: panels for the HOC, NEPI and STH landscapes showing mean fitness, mean number of substitutions, heterozygosity and number of alleles versus time, t (in generations)]
Figure S3: Dynamics of adaptation in the full Wright-Fisher model with $\mu = 10^{-4}$, $N = 10^3$. Notation as in Figure S2.
[Figure S4 plot: panels for the HOC, NEPI and STH landscapes showing mean fitness, mean number of substitutions, heterozygosity and number of alleles versus time, t (in generations)]
Figure S4: Dynamics of adaptation in the full Wright-Fisher model with $\mu = 10^{-3}$, $N = 10^3$. Notation as in Figure S2.
[Figure S5 plot: panels for the HOC, NEPI and STH landscapes showing mean fitness, mean number of substitutions, heterozygosity and number of alleles versus time, t (in generations)]
Figure S5: Dynamics of adaptation in the full Wright-Fisher model with $\mu = 10^{-2}$, $N = 10^3$. Notation as in Figure S2.
[Figure S6 plot: panels for the HOC, NEPI and STH landscapes showing Var(fitness) and Var(# of subst) versus time, t]
Figure S6: Variance of the ensemble distribution of fitnesses (top row) and substitutions
(bottom row) for classical landscapes. Notations and parameter values are as in Figure 1
in the main text.
Figures S2–S5 show the averages of these statistics across 1000 independent replicates. For θ = 0.01 (Figure S2) and even for θ = 0.1 (Figure S3), our theory accurately describes the dynamics of adaptation, as expected. For θ = 1 (Figure S4) and θ = 10 (Figure S5), the quantitative predictions of our theory are poor. Indeed, when θ ≳ 1, the population is polymorphic most of the time, as the graphs of population heterozygosity and of the number of coexisting alleles show. Thus, in simulations with θ ≳ 1, clonal interference and piggybacking certainly occur. Surprisingly, even in this regime the qualitative predictions of our theory still hold. In particular, although the curvature of the fitness and substitution trajectories depends on the mutation rate, its sign does not. In other words, landscapes that give rise to concave (convex) fitness (substitution) trajectories in the weak-mutation limit continue to give rise to concave (convex) fitness (substitution) trajectories in the presence of clonal interference and piggybacking. This implies that we can use the weak-mutation theory to draw qualitative conclusions about the fitness landscape, even if the observed trajectories were generated under high mutation rates.
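A crude way to read off the sign of curvature from an observed, replicate-averaged trajectory is to compare it with the chord joining its endpoints: a concave curve lies above its chord and a convex one below it. The helper below is a hypothetical illustration, not a method from the paper; it assumes equally spaced sampling times, and with noisy data the threshold `tol` would have to reflect the measurement error.

```python
from typing import Sequence


def curvature_label(ys: Sequence[float], tol: float = 1e-9) -> str:
    # ys: trajectory sampled at equally spaced times (e.g. replicate means)
    n = len(ys)
    if n < 3:
        raise ValueError("need at least three points")
    y0, y1 = ys[0], ys[-1]
    # Signed gap between each point and the straight line through the endpoints
    gaps = [ys[i] - (y0 + (y1 - y0) * i / (n - 1)) for i in range(n)]
    mean_gap = sum(gaps) / n
    if mean_gap > tol:
        return "concave"
    if mean_gap < -tol:
        return "convex"
    return "approximately linear"


if __name__ == "__main__":
    print(curvature_label([i ** 0.5 for i in range(11)]))  # decelerating, like sqrt(t)
    print(curvature_label([i * i for i in range(11)]))      # accelerating
```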