The dynamics of adaptation on correlated fitness landscapes

Sergey Kryazhimskiy (a,1), Gašper Tkačik (a,b,1), and Joshua B. Plotkin (a,2)

(a) Department of Biology and (b) Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, PA 19104

Edited by Simon A. Levin, Princeton University, Princeton, NJ, and approved September 4, 2009 (received for review May 18, 2009)

Evolutionary theory predicts that a population in a new environment will accumulate adaptive substitutions, but precisely how they accumulate is poorly understood. The dynamics of adaptation depend on the underlying fitness landscape. Virtually nothing is known about fitness landscapes in nature, and few methods allow us to infer the landscape from empirical data. With a view toward this inference problem, we have developed a theory that, in the weak-mutation limit, predicts how a population's mean fitness and the number of accumulated substitutions are expected to increase over time, depending on the underlying fitness landscape. We find that fitness and substitution trajectories depend not on the full distribution of fitness effects of available mutations but rather on the expected fixation probability and the expected fitness increment of mutations. We introduce a scheme that classifies landscapes in terms of the qualitative evolutionary dynamics they produce. We show that linear substitution trajectories, long considered the hallmark of neutral evolution, can arise even when mutations are strongly selected. Our results provide a basis for understanding the dynamics of adaptation and for inferring properties of an organism's fitness landscape from temporal data. Applying these methods to data from a long-term experiment, we infer the sign and strength of epistasis among beneficial mutations in the Escherichia coli genome.
epistasis | fitness trajectory | substitution trajectory | weak mutation | evolution

Evolutionary theory predicts that mean fitness will increase over time when a population encounters a new environment. This behavior is observed in natural and laboratory populations. Yet evolutionary theory offers few quantitative predictions for the dynamics of adaptation (1). The primary difficulty is that adaptation depends on the shape of the underlying fitness landscape. Unfortunately, mapping out an organism's fitness landscape is virtually impossible because of its vast dimensionality and the coarse resolution of fitness measurements. Moreover, because of the scarcity of such measurements, most theoretical work has been pursued in isolation from data.

Much of the theory of adaptation is concerned with understanding the dynamics on uncorrelated, or "rugged", fitness landscapes. This approach, pioneered by Kingman (2) and Kauffman and Levin (3), has generated many important results (e.g. refs. 4–7). But many of these results do not extend to landscapes that are correlated. One striking example is the expected length of an adaptive walk: It is extremely short on rugged landscapes (3, 8), but it can be very long on correlated landscapes (9). Although data are scarce, a long-term evolution experiment in Escherichia coli has found that adaptation continues to proceed even after 20,000 generations in a constant environment (10). This observation suggests that fitness landscapes in nature are correlated.

A second body of work examines relatively realistic, complex genotype-to-fitness maps (e.g. an RNA folding algorithm) and studies adaptation on the resulting correlated landscapes by computer simulation (e.g. refs. 3, 11–15). This approach provides important insights into the process of adaptation, and it produces quantitative predictions about the specific systems being simulated. But such results are difficult to generalize.

18638–18643 PNAS November 3, 2009 vol. 106 no. 
44 A third approach, orthogonal to the first two, was introduced by Gillespie (16, 17) and revived more recently by Orr (8, 18, 19). It utilizes extreme-value theory to identify features of the adaptation process that are independent of the underlying fitness landscape. Although helpful for understanding some fundamental properties of evolution, this approach suffers from a few serious drawbacks. Most importantly, by focusing on features of adaptation that are independent of the fitness landscape, the Orr–Gillespie theory does not elucidate how the structure of the landscape influences adaptation, nor does it allow us to infer the landscape from empirical data. Yet this is a question of central interest in evolutionary biology. In addition, most of the predictions of this theory concern a single adaptive step (8, 18, 19), and those predictions that extend to multiple steps hold again only for uncorrelated landscapes (20). In order to address these shortcomings, we present here an elementary theory of adaptation on a correlated fitness landscape. Our theory makes an explicit connection between the shape of the fitness landscape and observable features of adaptation, and it therefore allows us to infer important properties of the fitness landscapes from data. Experimental studies of microbial evolution typically report the mean fitness of the population (21, 22) and the mean number of accumulated substitutions (23, 24) over time; therefore we develop a theory that predicts these dynamic quantities, which we call the fitness and substitution trajectories, in terms of the underlying fitness landscape. To develop this theory, we need a sufficiently general but tractable description of a correlated fitness landscape. As in Gillespie’s model (17), we will describe the fitness landscape by specifying the distribution of fitnesses of single-mutant neighbors for each genotype, which we call the “neighbor fitness distribution” (NFD). 
On an uncorrelated landscape, all genotypes share the same NFD. We introduce correlations by assuming that the same NFD is shared among genotypes that have the same fitness, but genotypes of different fitnesses may have different NFDs. We say that such landscapes are fitness-parametrized because the possible consequences of a mutation are determined only by the fitness of the parental genotype (52). This framework accommodates arbitrary correlations introduced by nonneutral mutations. But neutral networks (14, 25, 26) or mutations with equal effect but different evolutionary potential fall outside the scope of fitness-parametrized landscapes. Nevertheless, the space of fitness-parametrized landscapes is very large and contains most of the landscapes studied in previous literature. To understand this space better, we will first explore three classical fitness landscapes: the uncorrelated landscape (2, 5, 6, 20, 27), the (additive) nonepistatic landscape (28, 29), and the landscape

Author contributions: S.K., G.T., and J.B.P. designed research; S.K. and G.T. performed research; S.K. and G.T. analyzed data; and S.K., G.T., and J.B.P. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Freely available online through the PNAS open access option.
1 S.K. and G.T. contributed equally to the paper.
2 To whom correspondence should be addressed. E-mail: [email protected].
This article contains supporting information online at www.pnas.org/cgi/content/full/0905497106/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.0905497106

Three Classical Fitness Landscapes. We describe a fitness landscape by a family of probability distributions, $\Phi_x$. $\Phi_x(y)\,dy$ denotes the probability that a mutation arising in an individual of fitness $x$ will have a fitness in $[y, y+dy]$.
The space of fitness-parametrized landscapes includes, among others, such well-known (2, 5, 6, 20, 27, 29–31) landscapes as (i) the "house of cards" (HOC) or uncorrelated landscapes, for which all genotypes have the same NFD, $\Phi_x(y) = \Psi(y)$; (ii) the non-epistatic (NEPI) landscapes, for which the distribution of fitness effects of mutations is the same for all genotypes, so that the NFD is given by $\Phi_x(y) = \Psi(y - x)$; and (iii) the "stairway to heaven" (STH) landscapes, for which the distribution of selection coefficients is the same for all genotypes, so that the NFD is given by $\Phi_x(y) = x^{-1}\,\Psi\!\left(x^{-1}(y - x)\right)$. The definitions of these three well-known landscapes are summarized in Table 1, where we have assumed that the NFD follows an exponential form. We will derive expressions for the expected fitness and substitution trajectories on each of these landscapes. Our results also hold qualitatively if we replace the exponential distribution by any other distribution from the Gumbel domain of attraction as predicted by the Orr–Gillespie theory (18). Note that there are no deleterious or neutral mutations in the NEPI and STH landscapes (Table 1), but our conclusions would not change if we added such mutations (see SI Appendix).

Before we derive analytic expressions for the dynamics of adaptation on the three classical landscapes, we first develop some intuitive expectations. On all landscapes, we expect substitutions to accrue and the mean fitness to increase over time. For the HOC landscapes, we expect that the rate of fitness increase should slow down as the population becomes more adapted. To see this slowdown, imagine a population initially at fitness $x_0$, where $\int_{x_0}^{\infty} \Psi(y)\,dy = 0.5$, i.e. 50% of mutations are beneficial. If

Fitness and Substitution Trajectories. In order to analyze the dynamics of adaptation, we consider an asexual population of fixed size N that evolves according to the infinite-sites Wright–Fisher (WF) model (see Materials and Methods for details).
We assume that the mutation rate is sufficiently small that, at most, one mutant segregates in the population at any time (8, 17). Thus, the population is essentially always monomorphic, and it can be characterized at each time by its fitness $x$. When a mutation with fitness $y$ arises, it either fixes instantaneously with Kimura's fixation probability $\pi_x(y) = \left(1 - e^{-2 s_x(y)}\right) / \left(1 - e^{-2 N s_x(y)}\right)$ or is instantaneously lost with probability $1 - \pi_x(y)$, where $s_x(y)$ is the selection coefficient (see Materials and Methods). In this limit, the adaptive walk of the population is described by a continuous-time, continuous-space Markov chain. We emphasize that, in contrast to the "greedy" adaptive walks typically studied in the literature on rugged fitness landscapes (3, 4), the adaptive walks studied here never stop. Even if a population reaches a local fitness maximum, a deleterious mutation will eventually fix, and the walk will continue.

We have developed a method for efficiently computing the full ensemble distribution of fitnesses and substitutions of the population at time $t$, given that its initial fitness was $x_0$ at time zero (see SI Appendix). Here we focus on two important statistics of these distributions: the expected fitness of the population $F(t)$ at time $t$, and the expected number of substitutions $S(t)$ accumulated in the population by time $t$. We call these quantities the fitness trajectory and the substitution trajectory, respectively. If we measure time in the expected number of mutations, these functions approximately satisfy the following equations (see Materials and Methods):

$\dot{F} = r(F), \quad F(0) = x_0$ [1]

$\dot{S} = q(F), \quad S(0) = 0,$ [2]

Results

a beneficial mutation arises and fixes, providing fitness $x_1 > x_0$, then this event can only reduce the pool of remaining beneficial mutations, i.e. $\int_{x_1}^{\infty} \Psi(y)\,dy < 0.5$. Thus, the rate of fitness increase should be reduced as adaptation proceeds on the HOC landscape.
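As a rough numerical check of Eqs. 1 and 2, the sketch below integrates them with a classical fourth-order Runge–Kutta step. It assumes the large-$x$ NEPI expressions $r(x) = 4a^2/x$ and $q(x) = 2a/x$ listed in Table 1 and compares the result with the closed-form NEPI trajectories given there. The function name `integrate_trajectories`, the step size, and the parameter values are illustrative choices and are not taken from the original analysis.

```python
import math

def integrate_trajectories(r, q, x0, t_end, dt=0.01):
    """Integrate dF/dt = r(F), dS/dt = q(F) (Eqs. 1 and 2) with classical RK4.

    Time is measured in expected number of mutations, as in the text.
    """
    F, S = x0, 0.0
    for _ in range(int(round(t_end / dt))):
        # RK4 stage values of F; S has no feedback on F, so it reuses them.
        f1 = F
        f2 = F + 0.5 * dt * r(f1)
        f3 = F + 0.5 * dt * r(f2)
        f4 = F + dt * r(f3)
        F += dt * (r(f1) + 2.0 * r(f2) + 2.0 * r(f3) + r(f4)) / 6.0
        S += dt * (q(f1) + 2.0 * q(f2) + 2.0 * q(f3) + q(f4)) / 6.0
    return F, S

# Illustrative (assumed) parameter values for the NEPI landscape.
A, X0, T_END = 1.0, 2.0, 100.0

def nepi_r(x):
    return 4.0 * A * A / x      # large-x form from Table 1

def nepi_q(x):
    return 2.0 * A / x          # large-x form from Table 1

def nepi_closed_form(t):
    """Closed-form NEPI trajectories from Table 1."""
    F = math.sqrt(X0 * X0 + 8.0 * A * A * t)
    return F, (F - X0) / (2.0 * A)

if __name__ == "__main__":
    print(integrate_trajectories(nepi_r, nepi_q, X0, T_END))
    print(nepi_closed_form(T_END))
```

Passing other `r` and `q` callables (for example the HOC or STH forms) reuses the same integrator; only the NEPI case is checked here.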
By contrast, on a STH landscape, we expect that the rate of fitness increase will increase as the population adapts. Indeed, the fraction of mutations that are adaptive does not change as fitness increases, but the fitness increment of such mutations grows linearly with the fitness of the parent (because the selection coefficient stays the same).

These simple considerations indicate that HOC landscapes are antagonistically epistatic, whereas STH landscapes are synergistically epistatic. We call the landscape $\Phi_x(y) = \Psi(y - x)$ nonepistatic because on this landscape the distribution of fitness increments of mutations does not depend upon the fitness of the parental genotype. If fitness effects were viewed multiplicatively, however, then the STH landscape would be considered nonepistatic, although we do not adopt this convention here (see ref. 28 for an extensive discussion on this topic). Moreover, as we show below, the STH landscape produces unrealistic evolutionary dynamics.

Table 1. Classical fitness landscapes with the exponential form and the corresponding fitness and substitution trajectories obtained from Eqs. 1 and 2

Landscape | NFD $\Phi_x(y)$ | Expected fitness increment* $r(x)$ | Fitness trajectory $F(t)$ | Expected fixation probability* $q(x)$ | Substitution trajectory $S(t)$
HOC | $\frac{1}{a} e^{-y/a},\ y \ge 0$ | $4a^2 e^{-x/a}$ | $a \ln\!\left(e^{x_0/a} + 4at\right)$ | $2a\, e^{-x/a}$ | $\frac{1}{2}\ln\!\left(e^{x_0/a} + 4at\right) - \frac{x_0}{2a}$
NEPI | $\frac{1}{a} e^{-(y-x)/a},\ y \ge x$ | $\frac{4a^2}{x}$ | $\sqrt{x_0^2 + 8a^2 t}$ | $\frac{2a}{x}$ | $\frac{1}{2a}\left(\sqrt{x_0^2 + 8a^2 t} - x_0\right)$
STH | $\frac{1}{ax} e^{-(y-x)/(ax)},\ y \ge x$ | $\frac{4a^2(1+a)}{(1+2a)^2}\, x$ | $x_0 \exp\!\left(\frac{4a^2(a+1)}{(2a+1)^2}\, t\right)$ | $\frac{2a}{2a+1}$ | $\frac{2a}{1+2a}\, t$

* Expressions for the r- and q-functions are derived in the limit $x \gg 1$ (HOC, NEPI) and under the approximation $N \gg 1$ (HOC, NEPI, STH). These approximations are highly accurate, especially for large $x$ (see Fig. 1). See SI Appendix for details.

with a constant distribution of selection coefficients (30, 31).
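To see how close the large-$x$ NEPI entries of Table 1 are to the integrals that define $q(x)$ and $r(x)$, one can evaluate those integrals numerically. The sketch below does this with a simple midpoint rule, using Kimura's fixation probability with $s_x(y) = y/x - 1$ as stated in Materials and Methods. The helper names, the integration cutoff, and the parameter values are assumptions made for illustration only.

```python
import math

def fixation_probability(s, N):
    """Kimura's fixation probability (1 - e^{-2s}) / (1 - e^{-2Ns})."""
    if s == 0.0:
        return 1.0 / N                 # neutral limit of the formula
    if -2.0 * N * s > 700.0:
        return 0.0                     # strongly deleterious; avoids overflow
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * N * s)

def nepi_q_and_r(x, a, N, n=100_000, cutoff_in_units_of_a=40.0):
    """Midpoint-rule estimates of q(x) and r(x) on the exponential NEPI landscape.

    Integration variable u = y - x >= 0; the exponential tail beyond the
    cutoff is negligible for the cutoff assumed here.
    """
    du = cutoff_in_units_of_a * a / n
    q = 0.0
    r = 0.0
    for i in range(n):
        u = (i + 0.5) * du
        weight = math.exp(-u / a) / a * du       # Phi_x(y) dy for NEPI
        p = fixation_probability(u / x, N)       # s = y/x - 1 = u/x
        q += p * weight
        r += u * p * weight
    return q, r

if __name__ == "__main__":
    x, a, N = 50.0, 1.0, 1000                    # assumed example values
    q_num, r_num = nepi_q_and_r(x, a, N)
    print("q:", q_num, "vs 2a/x =", 2.0 * a / x)
    print("r:", r_num, "vs 4a^2/x =", 4.0 * a * a / x)
```

For these example values the numerical integrals should fall within a few percent of $2a/x$ and $4a^2/x$, and the gap is expected to shrink as $x$ grows.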
We will demonstrate how the choice of landscape influences the dynamics of adaptation. Having gained some insight from these examples, we will classify fitness-parametrized landscapes in terms of the qualitative evolutionary dynamics they produce. Remarkably, the qualitative dynamics fall into 14 possible classes, which include, among others, the well-known classical examples. By comparing these classes against observations from microbial evolution experiments (21), we will infer the space of landscapes that, given our simplifying assumptions, are compatible with existing data. We will study the dynamics of adaptation in the limit of weak mutation (8, 16, 17, 32), which allows us to ignore the effects of multiple, competing beneficial mutations (30, 31, 33, 34). This approach is mathematically convenient, and, more importantly, it allows us to study the dynamics induced by the fitness landscape itself in isolation from those that result from clonal interference (30, 31, 35, 36). Our analysis will therefore provide a null expectation against which to compare more complex models or data. Fig. 1. Dynamics of adaptation on three classical fitness landscapes. Rows correspond to fitness landscapes. The first column graphs the NFD, Φx (y), for two representative values of the parental fitness, x0 = 1 and x0 = 4. The second and third columns show the fitness and substitution trajectories for a population starting with fitness x0 = 2. Black lines correspond to the theoretical predictions of Eqs. 1 and 2; gray lines show the results of stochastic simulations; dashed lines show a linear function, for reference. Note that axes are logarithmic. The fourth column shows the empirical distribution of selection coefficients of fixed mutations; dashed lines show the best-fit regression on the semi-log scale, with slope k (only selection coefficients > 0.5 were used for fitting). 
Parameter values: $N = 1000$; $\mu = 10^{-5}$; $L = 1000$; number of replicate simulations $= 10^4$; $a = 1$ for the HOC and the NEPI landscapes, and $a = 0.42$ for the STH landscape.

where the dot denotes a derivative with respect to time;

$q(x) = \int_0^{\infty} \pi_x(y)\,\Phi_x(y)\,dy$ [3]

is the expected fixation probability of a mutation arising in a population with fitness $x$; and

$r(x) = \int_0^{\infty} (y - x)\,\pi_x(y)\,\Phi_x(y)\,dy$ [4]

is the expected fitness increment of such a mutation, weighted by its fixation probability. Eqs. 1 and 2 were derived under the infinite-sites assumption, i.e. each genotype was assumed to have an infinite number of neighbors, so that even very fit genotypes have a nonzero chance of discovering a beneficial mutation. Consistent with previous work (37), the infinite-sites approximation is highly accurate, as we demonstrate by comparing (Fig. 1) the solutions of these equations (Table 1) to simulations of a finite-site model (see Materials and Methods).

Fig. 1 shows the dynamics of adaptation on the three classical fitness landscapes. On the HOC landscape, both the expected fitness of the population and the expected number of substitutions grow logarithmically with time, consistent with previous work (4). As we expected, the rate of adaptation on such landscapes rapidly declines as the fitness of the population grows. As the population adapts, there are two forces on the HOC landscape that act against further adaptation. First, the fraction of mutations that are beneficial decreases. Second, the probability of fixation of an adaptive mutation decreases as well. This decrease occurs because the fixation probability of a mutation depends monotonically on its selection coefficient, and the selection coefficients of available adaptive mutations decline as the fitness of the parent increases. In addition, adaptation slows down further because the time to fixation of beneficial mutations grows with declining selection coefficients.
However, this effect turns out to be negligible (see the comparison with the full WF model below).

The rate of adaptation on the NEPI landscape also slows down as the fitness increases, but it does so less dramatically than on the HOC landscape. This behavior is expected because the fraction of beneficial mutations and their effects do not change as the fitness of the parental genotypes increases. However, the selection coefficients of beneficial mutations decrease, thereby reducing the rate of fitness growth.

Finally, on the STH landscape, the rate of mean-fitness increase grows without bound over time, as expected. In contrast to HOC and NEPI landscapes, there are no forces on such landscapes that impede further adaptation as the population becomes more adapted (hence the name "stairway to heaven").

In order to investigate the robustness of the results in Fig. 1 with respect to the assumption of weak mutation, we have simulated the full stochastic WF model over a wide range of mutation rates. These simulations incorporate the effects of competing mutations, and they also account for the (nonzero) time to fixation. Our theoretical prediction matches the dynamics of the full WF model very well when θ is roughly 0.1 or smaller. Moreover, even when θ > 1, the concavities of fitness and substitution trajectories are correctly predicted by our theory (see SI Appendix).

Distribution of Selection Coefficients of Fixed Mutations. In addition to fitness and substitution trajectories, we have investigated the distribution of selection coefficients for mutations that fix during adaptation (Fig. 1, fourth column). By using computer simulations, Orr previously showed that this distribution is approximately exponential (excluding small selection coefficients) for uncorrelated landscapes whose NFD belongs to the Gumbel type (8). Fig. 1 shows that Orr's observation holds more generally, i.e.
even for correlated landscapes, such as the NEPI and STH landscapes. In fact, the distribution of fixed selection coefficients is so robust to changes in the landscape structure that virtually no inference can be made on its basis. To demonstrate this problem, we have chosen the parameter $a$ (see Table 1) so that the resulting distributions of fixed selection coefficients are virtually the same for all three classical fitness landscapes, even though their qualitative trajectories are completely different (Fig. 1). In other words, the selection coefficients associated with mutations that are fixed during evolution tell us very little about the long-term behavior of an adapting population or the fitness landscape on which it is evolving.

Toward a Classification of Landscapes. The space of all possible fitness landscapes is vast. We therefore wish to classify landscapes in terms of the qualitative evolutionary dynamics they produce, i.e. in terms of their fitness and substitution trajectories, which can be directly observed in an experiment. Our analytic approximation in Eqs. 1 and 2 captures the behavior of the trajectories quite well, especially as the population reaches high fitnesses (Fig. 1). Remarkably, these equations depend on only two simple functions of the landscape: the expected fixation probability of a mutation arising in a population of fitness $x$, $q(x)$, and the expected fitness increment of such a mutation weighted by its fixation probability, $r(x)$. By varying just these two quantities, we can explore all possible qualitative behaviors of the fitness and substitution trajectories.

Fig. 2. Classification of fitness landscapes. Column 1 shows five possible shapes for the r-function, and three possible shapes for the q-function. In some cases, these functions have asymptotes, shown as dashed horizontal lines.
Columns 2–6 show the fitness (Upper) and substitution (Lower) trajectories for the 15 landscapes that arise through combinations of r- and q-functions. Substitution trajectories for landscapes with q-function of type A, B, and C are shown in green, dark orange, and purple, respectively. In some cases, the fitness or substitution trajectories possess asymptotic slopes, shown as dashed lines in the corresponding color. In these cases, the asymptotic slope equals the asymptotic value of the corresponding r- or q-function (except for the substitution trajectories in case V). Landscapes V-B and V-C both have asymptotically linear substitution trajectories, and therefore fall into the same class.

Inferring Landscape Structure From Data. Which fitness landscapes are compatible with empirical data, and which are not? To address this question, we have compared predicted evolutionary dynamics with data from long-term evolution experiments. Empirical fitness trajectories in a fixed environment typically have negative curvature: Fitness increases quickly at the early stages of adaptation, and more slowly at later stages (10, 21, 22, 39–42). This negative curvature implies that the r-functions for landscapes in nature belong to type III, IV or V. In other words, a large class of strongly synergistic landscapes (those with an increasing r-function) are incompatible with basic, empirical observations. The space of unrealistic fitness landscapes includes the widely used STH landscapes (30, 31, 33–35, 43–45), for which $r(x) \sim x$.

Landscapes with either antagonistic epistasis ($r(x) < C x^{-1}$) or weak synergistic epistasis ($C x^{-1} < r(x) \le C$) produce fitness trajectories that are concave, and so they are qualitatively consistent with data from microbial evolution experiments. We can use such data to estimate the sign and strength of epistasis. In order to do so, we assume that the r-function has the form $r(x) = B x^{\beta}$ with $B > 0$ and $\beta \le 0$.
This form is convenient because it includes nonepistatic landscapes when $\beta = -1$, weakly synergistic landscapes when $-1 < \beta \le 0$, and antagonistic landscapes when $\beta < -1$. Eq. 1 can then be solved analytically, and the fitness trajectory is given by

$F(t) = \left( x_0^{1-\beta} + B(1-\beta)\,t \right)^{\frac{1}{1-\beta}}.$ [5]

Discussion

The framework developed here addresses two key problems in the theory of adaptation: how to characterize evolution on a correlated fitness landscape and how to infer properties of a fitness landscape from empirical data. Our analysis has relied on two assumptions: weak mutation and the fitness parametrization of the landscape.

The assumption of weak mutation, although restrictive, has been used in previous literature and provides a reasonable starting point for future research. Relaxing this assumption presents substantial mathematical complications and introduces entirely new phenomena, such as clonal interference (30, 35) and "piggybacking" (31, 36). Therefore, we must first have a solid understanding of adaptation dynamics under weak mutation before proceeding to incorporate these additional effects. Without a theory of weak mutation, we would be unable to disentangle the effects of the fitness landscape itself from the effects of clonal interference. In the future, experiments whose primary goal is to probe the fitness landscape should be designed to minimize the effects of clonal interference, e.g. by choosing small population sizes.

The fitness parametrization is a less-restrictive assumption, especially when weak mutation is already assumed. Indeed, neutral networks are important for adaptation only when a population can use them to quickly access previously inaccessible beneficial mutations. This regime only occurs when the population is polymorphic, i.e. when θ > 1.
In contrast, a monomorphic population

It follows from this expression that the slope of the line fitted on the log–log scale to the fitness trajectory observed in a long-term evolution experiment provides an estimate of $(1-\beta)^{-1}$. We applied this procedure to data from the evolutionary experiment by Lenski et al. (21) and found that $\hat{\beta} = -9.58$ with the 95% confidence interval $[-13.36, -7.38]$, suggesting that the fitness landscape of E. coli is, on average, strongly antagonistically epistatic. This qualitative conclusion is robust with respect to the violation of the weak mutation assumption (see SI Appendix), although the precise estimate of β may change with the development of more refined models of E. coli evolution.

For the purpose of classification, we consider only landscapes that are defined on the whole positive real axis, and whose r- and q-functions are monotonic and smooth. The five different shapes of the r-function and three different shapes of the q-function determine, respectively, five qualitatively different fitness trajectories and three qualitatively different substitution trajectories (Fig. 2). Landscapes with an increasing or decreasing r-function produce convex (type I and II) or concave (types III, IV, and V) fitness trajectories, respectively. More specifically, fitness trajectories grow superlinearly with time (type I), are asymptotically linear (type II and III), grow sublinearly (type IV), or asymptote to a constant (type V). Similarly, landscapes with an increasing or decreasing q-function produce convex (type A) or concave (types B and C) substitution trajectories, respectively. Substitution trajectories grow asymptotically linearly (type A and B), or sublinearly (type C). Considering all possible combinations of the r- and q-functions produces a total of 14 classes of qualitatively different evolutionary dynamics (Fig. 2).

This classification scheme accommodates the three classical landscapes considered above.
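The log–log fitting idea described above for Eq. 5 can be illustrated on synthetic data. The sketch below generates a late-time trajectory from Eq. 5, fits an ordinary least-squares line to $\log F$ against $\log t$, and converts the slope $k$ into $\hat{\beta} = 1 - 1/k$. It is a minimal illustration under stated assumptions: the data are synthetic rather than the Lenski et al. measurements, the fit assumes times late enough that $B(1-\beta)t$ dominates $x_0^{1-\beta}$, and the function names and parameter values are hypothetical.

```python
import math

def power_law_trajectory(t, x0, B, beta):
    """Fitness trajectory of Eq. 5 for r(x) = B * x**beta."""
    return (x0 ** (1.0 - beta) + B * (1.0 - beta) * t) ** (1.0 / (1.0 - beta))

def estimate_beta(times, fitnesses):
    """Least-squares slope k of log F versus log t, returned as beta = 1 - 1/k."""
    xs = [math.log(t) for t in times]
    ys = [math.log(f) for f in fitnesses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return 1.0 - 1.0 / slope

def synthetic_late_time_data(x0=1.0, B=1.0, beta=-2.0, n_points=50):
    """Noise-free trajectory sampled at log-spaced late times (assumed range)."""
    times = [10.0 ** (4.0 + 2.0 * i / (n_points - 1)) for i in range(n_points)]
    return times, [power_law_trajectory(t, x0, B, beta) for t in times]

if __name__ == "__main__":
    times, fitnesses = synthetic_late_time_data()
    print("estimated beta:", estimate_beta(times, fitnesses))
```

With real, noisy measurements one would also want a confidence interval, for instance by bootstrapping over replicate populations; that step is not shown here.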
The STH landscapes belong to class I-A or I-B, because $q(x)$ is constant and $r(x)$ grows without bound. The NEPI landscapes belong to class IV-C, because both $r(x)$ and $q(x)$ decay as $x^{-1}$. The HOC landscapes belong to class V-C because $r(x)$ is negative for large $x$ and $q(x)$ decays to zero.

Recall that the STH landscapes are synergistically epistatic and the HOC landscapes are antagonistically epistatic. This observation suggests the following natural definition: landscapes for which the r-function either grows or decays slower than $x^{-1}$ are synergistically epistatic (types I, II, III, and IV), whereas landscapes for which the r-function decays faster than $x^{-1}$ are antagonistically epistatic (types IV and V).

Remarkably, the substitution trajectories for landscapes of type IV or V are almost linear, a pattern long considered the hallmark of neutral or nearly neutral evolution (38). As these correlated landscapes demonstrate, this pattern can also arise when substitutions confer significant fitness gains. In fact, the linear accrual of adaptive mutations has recently been observed in experimental populations (53).

can explore the neutral network only very slowly, by substituting neutral mutations (26). Such a population is far more likely to substitute a beneficial mutation and jump to a new neutral network.

We have studied several quantities that characterize evolutionary dynamics. We found that the distribution of selection coefficients of fixed mutations is insensitive to the underlying NFD, consistent with previous findings (8, 46, 47). In contrast, the fitness and substitution trajectories are very informative about the underlying fitness landscape. In particular, the substitution trajectory is convex or concave on landscapes for which the fixation probability of a mutation increases or decreases with increasing fitness, respectively.
Similarly, the fitness trajectory is convex or concave on landscapes for which the expected fitness increment of a mutation increases or decreases with increasing fitness. Moreover, the curvature of the fitness trajectory is informative about the sign and strength of epistasis in the fitness landscape. These results provide a groundwork for inferring fitness landscapes from dynamic data. In particular, we have shown that data from bacterial evolution experiments are incompatible with landscapes that feature a constant distribution of selection coefficients—even though such landscapes are often used in the theoretical literature. We have also proposed a simple method for inferring the sign and strength of epistasis from such data. In contrast to most other estimates of epistasis that are based on measurements of interactions among deleterious mutations (see e.g. ref. 48 and references therein), we provide an estimate of epistasis based on the interaction among beneficial mutations—which is more informative for the long-term dynamics of adaptation. Our estimates suggest that the E. coli fitness landscape is characterized by strong antagonistic epistasis, at least in a fixed laboratory environment, which is consistent with one previous study (49). However, the precise type of landscape (e.g. type IV versus type V) for E. coli or other microorganisms may be difficult to determine on the basis of fitness and substitution trajectories alone. The ensemble variance in trajectories across experimental replicates may provide additional power (see SI Appendix). Here we have focused on static fitness landscapes, which probably arise only in laboratory environments. Fitness landscapes in the field are likely dynamic because of fluctuations in the environment or frequency-dependent selection. We can hope to understand the evolutionary dynamics on such landscapes only after we acquire a firm understanding of static landscapes. 
Our elementary theory provides an explicit link between the form of static fitness landscapes and their resulting evolutionary dynamics, in terms of simple observable quantities. Hopefully, this link will help bring together theoretical and experimental studies of adaptation.

including neutral ones, is much shorter than the waiting time until the arrival of the next mutation. Therefore, the population is monomorphic at virtually all times, and occasionally it transitions almost instantaneously to a new type (17).

Individuals and the population as a whole are characterized by their fitness, $x$. $\Phi_x(y)\,dy$ denotes the fitness-parametrized landscape, i.e. the probability that a mutation arising in an individual with fitness $x$ has fitness in $[y, y+dy]$. We assume that genome length is sufficiently large so that each mutation occurs at a new site. A mutation fixes in the population with Kimura's fixation probability $\pi_x(y) = \left(1 - e^{-2 s_x(y)}\right) / \left(1 - e^{-2 N s_x(y)}\right)$, where $s_x(y) = y/x - 1$ is the selection coefficient (50). If a mutation arises and fixes, then the population instantaneously transitions from fitness $x$ to fitness $y$; we ignore the time it takes for a mutation to fix. We can thus describe the sequence of such transitions by a stationary continuous-time Markov chain, whose state space is the semi-axis $[0, +\infty)$. The population waits $\theta^{-1}$ generations for the next mutation on average. If we measure time by the expected number of mutations, the probability that the population has fitness in $[y, y + dy]$ at time $t + \delta t$, given it had fitness $x$ at time $t$, is $\Phi_x(y)\,\pi_x(y)\,dy\,\delta t$.

We define the fitness and substitution trajectories as $F(t, x) = \int_0^{\infty} y\,P(y, t|x)\,dy$ and $S(t, x) = \sum_{i=0}^{\infty} i\,P_i(t|x)$, respectively, where $P(y, t|x)\,dy$ is the probability that the population has fitness in $[y, y + dy]$ at time $t$, given initial fitness $x$, and $P_i(t|x)$ is the probability that the population has accumulated $i$ substitutions by time $t$, given initial fitness $x$ [for simplicity we also write $F(t)$ and $S(t)$].
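The jump process just described can be simulated directly. The sketch below is a minimal weak-mutation adaptive walk on the exponential NEPI landscape: one time unit corresponds to one mutation event, each mutant fitness is drawn from $\Phi_x$, and the mutant replaces the resident with Kimura's fixation probability. It is an infinite-sites illustration under assumed parameter values, written in Python for convenience; it is not the authors' Objective Caml code and does not implement their finite-site sampling of $L$ neighbors.

```python
import math
import random

def fixation_probability(s, N):
    """Kimura's fixation probability (1 - e^{-2s}) / (1 - e^{-2Ns})."""
    if s == 0.0:
        return 1.0 / N                 # neutral limit of the formula
    if -2.0 * N * s > 700.0:
        return 0.0                     # strongly deleterious; avoids overflow
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * N * s)

def simulate_nepi_walk(x0, a, N, n_mutations, rng):
    """One adaptive walk; returns (final fitness, number of substitutions)."""
    x, substitutions = x0, 0
    for _ in range(n_mutations):       # one time unit per mutation event
        y = x + rng.expovariate(1.0 / a)          # NEPI: y - x ~ Exp(mean a)
        if rng.random() < fixation_probability(y / x - 1.0, N):
            x, substitutions = y, substitutions + 1
    return x, substitutions

def mean_trajectory_endpoint(x0, a, N, n_mutations, replicates, seed=0):
    """Average final fitness and substitution count over independent walks."""
    rng = random.Random(seed)
    walks = [simulate_nepi_walk(x0, a, N, n_mutations, rng)
             for _ in range(replicates)]
    mean_f = sum(w[0] for w in walks) / replicates
    mean_s = sum(w[1] for w in walks) / replicates
    return mean_f, mean_s, walks

if __name__ == "__main__":
    mean_f, mean_s, _ = mean_trajectory_endpoint(2.0, 1.0, 1000, 200, 200)
    print("mean fitness:", mean_f, "mean substitutions:", mean_s)
    print("Table 1 NEPI prediction for F:", math.sqrt(2.0 ** 2 + 8.0 * 200))
```

Because the Table 1 expressions are large-$x$ approximations, the simulated mean starting from a small $x_0$ should be expected to agree with the prediction only roughly.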
It follows from the classical Markov chain theory that F and S satisfy the equations (see SI Appendix) ∂F (t, x) = (K̂b F(t, ·))(x), F(0, x) = x, ∂t ∂S (t, x) = (K̂b S(t, ·))(x) + q(x), S(0, x) = 0, ∂t [6] [7] where K̂b is defined by ∞ (K̂b f (·))(x) = Φx (ξ)πx (ξ)(f (ξ) − f (x))dξ, [8] 0 which is the backward Kolmogorov operator. In the SI Appendix, we present an efficient numerical method for finding the whole distributions P(y, t|x) and Pi (t|x). On landscapes for which mutations of large effect become increasingly unlikely as the fitness of the population increases, most of the contribution to the integral in Eq. 8 comes from values ξ ≈ x, and we can write f (ξ) − f (x) ≈ f (x)(ξ − x). Consequently, (K̂b f (·))(x) ≈ r(x)f (x), where r(x) is given by Eq. 4. Therefore, Eqs. 6 and 7 can be approximated by socalled advection equations that turn out to be equivalent to Eqs. 1 and 2 (see SI Appendix for details). Eqs. 1 and 2 are closely related to those derived by Tachida (51) and Welch and Waxman (37) for the uncorrelated landscape. In stochastic simulations, we implement a finite-site version of the model described above. In these simulations, after a substitution has occurred, a sample of size L = 1, 000 is drawn from the distribution Φx , which represents the (finite) mutational neighborhood of the current genotype. Each of these L-neighboring genotypes has the same probability to be drawn at a subsequent mutation event. Our results do not depend on the value of L on the time scales examined as long as L is large (e.g. L ≥ 103 ). Code written in the Objective Caml language is available upon request. We consider an asexual population of fixed size N that evolves according to the infinite-sites WF model (50) with a small mutation rate, so that θ (4 log N)−1 , where θ = Nμ and μ is the per-locus, per-generation mutation rate. This condition ensures that the absorption time of all mutations, ACKNOWLEDGMENTS. 
The authors thank Richard Lenski, Michael Desai, Todd Parsons, and Jeremy Draghi for many fruitful discussions. J.B.P. acknowledges support from the Burroughs Wellcome Fund, the David and Lucile Packard Foundation, the James S. McDonnell Foundation, the Alfred P. Sloan Foundation, and Defense Advanced Research Projects Agency Grant HR001105-1-0057. G.T. acknowledges support from National Science Foundation Grants IBN-0344678 and DMR04-25780.

1. Aita T, et al. (2007) Extracting characteristic properties of fitness landscape from in vitro molecular evolution: A case study on infectivity of fd phage to E. coli. J Theor Biol 246:538–550.
2. Kingman JFC (1978) A simple model for the balance between selection and mutation. J Appl Prob 15:1–12.
3. Kauffman S, Levin S (1987) Towards a general theory of adaptive walks on rugged landscapes. J Theor Biol 128:11–45.
4. Flyvbjerg H, Lautrup B (1992) Evolution in a rugged fitness landscape. Phys Rev A 46:6714–6723.
5. Park SC, Krug J (2008) Evolution in random fitness landscapes: The infinite sites model. J Stat Mech P04014.
6. Macken CA, Perelson AS (1989) Protein evolution on rugged landscapes. Proc Natl Acad Sci USA 86:6191–6195.
7. Kauffman S, Weinberger ED (1989) The NK model of rugged fitness landscape and its application to maturation of the immune response. J Theor Biol 141:211–245.
8. Orr HA (2002) The population genetics of adaptation: The adaptation of DNA sequences. Evolution 7:1317–1330.
9. Orr HA (2006) The population genetics of adaptation on correlated fitness landscapes: The block model. Evolution 60:1113–1124.
10. Cooper VS, Lenski RE (2000) The population genetics of ecological specialization in evolving Escherichia coli populations. Nature 407:736–739.
11. Perelson AS, Macken CA (1995) Protein evolution on partially correlated landscapes. Proc Natl Acad Sci USA 92:9657–9661.
12. Newman MEJ, Engelhardt R (1998) Effects of selective neutrality on the evolution of molecular species. Proc R Soc London Ser B 265:1333–1338.
13. Adami C (2006) Digital genetics: Unravelling the genetic basis of evolution. Nat Rev Genet 7:109–118.
14. Cowperthwaite MC, Meyers LA (2007) How mutational networks shape evolution: Lessons from RNA models. Annu Rev Ecol Evol Syst 38:203–230.
15. Ndifon W, Plotkin JB, Dushoff J (2009) On the accessibility of adaptive phenotypes of a bacterial metabolic network. PLoS Comput Biol 5:e1000472.
16. Gillespie JH (1983) A simple stochastic gene substitution model. Theor Pop Biol 23:202–215.
17. Gillespie JH (1994) The Causes of Molecular Evolution (Oxford Univ Press, Oxford).
18. Orr HA (2003) The distribution of fitness effects among beneficial mutations. Genetics 163:1519–1526.
19. Joyce P, Rokyta DR, Beisel CJ, Orr HA (2008) A general extreme value theory model for the adaptation of DNA sequences under strong selection and weak mutation. Genetics 180:1627–1643.
20. Rokyta DR, Beisel CJ, Joyce P (2006) Properties of adaptive walks on uncorrelated landscapes under strong selection and weak mutation. J Theor Biol 243:114–120.
21. Lenski RE, Travisano M (1994) Dynamics of adaptation and diversification: A 10,000-generation experiment with bacterial populations. Proc Natl Acad Sci USA 91:6808–6814.
22. Silander OK, Tenaillon O, Chao L (2007) Understanding the evolutionary fate of finite populations: The dynamics of mutational effects. PLoS Bio 5:e94.
23. Paquin C, Adams J (1983) Frequency of fixation of adaptive mutations is higher in evolving diploid than haploid yeast populations. Nature 302:495–500.
24. Wichman HA, Millstein J, Bull JJ (2005) Adaptive molecular evolution for 13,000 phage generations: A possible arms race. Genetics 170:19–31.
25. Fontana W, Schuster P (1998) Continuity in evolution: On the nature of transitions. Science 280:1451–1455.
26. van Nimwegen E, Crutchfield JP, Huynen M (1999) Neutral evolution of mutational robustness. Proc Natl Acad Sci USA 96:9716–9720.
27. Orr HA (2003) A minimum on the mean number of steps taken in adaptive walks. J Theor Biol 220:241–247.
28. Mani R, St. Onge RP, Hartman IV JL, Giaever G, Roth FP (2008) Defining genetic interaction. Proc Natl Acad Sci USA 105:3461–3466.
29. Eshel I (1971) On evolution in a population with an infinite number of types. Theor Pop Biol 2:209–236.
30. Gerrish PJ, Lenski RE (1998) The fate of competing beneficial mutations in an asexual population. Genetica 102–103:127–144.
31. Desai MM, Fisher DS (2007) Beneficial mutation-selection balance and the effect of linkage on positive selection. Genetics 176:1759–1798.
32. Dieckmann U, Law R (1996) The dynamical theory of coevolution: A derivation from stochastic ecological processes. J Math Biol 34:579–612.
33. Rouzine IM, Wakeley J, Coffin JM (2003) The solitary wave of asexual evolution. Proc Natl Acad Sci USA 100:587–592.
34. Park SC, Krug J (2007) Clonal interference in large populations. Proc Natl Acad Sci USA 104:18135–18140.
35. Wilke CO (2004) The speed of adaptation in large asexual populations. Genetics 167:2045–2053.
36. Zeyl C (2007) Evolutionary genetics: A piggyback ride to adaptation and diversity. Curr Biol 17:R333.
37. Welch JJ, Waxman D (2005) The nk model and population genetics. J Theor Biol 234:329–340.
38. Kimura M, Ohta T (1968) Protein polymorphism as a phase of molecular evolution. Nature 229:467–469.
39. Bull JJ, et al. (1997) Exceptional convergent evolution in a virus. Genetics 147:1497–1507.
40. Elena SF, Davila M, Novella IS, Holland JJ, Esteban (1998) Evolutionary dynamics of fitness recovery from the debilitating effects of Muller's ratchet. Evolution 52:309–314.
41. de Visser JAGM, Lenski RE (2002) Long-term experimental evolution in Escherichia coli. XI. Rejection of non-transitive interactions as cause of declining rate of adaptation. BMC Evol Biol 2:19.
42. Hayashi Y, et al. (2006) Experimental rugged fitness landscape in protein sequence space. PLoS ONE 1:e96.
43. Orr HA (2000) The rate of adaptation in asexuals. Genetics 155:961–968.
44. Johnson T, Barton NH (2002) The effect of deleterious alleles on adaptation in asexual populations. Genetics 162:395–411.
45. Bachtrog D, Gordo I (2004) Adaptive evolution of asexual populations under Muller's ratchet. Evolution 58:1403–1413.
46. Rozen DE, de Visser JAG, Gerrish PJ (2002) Fitness effects of fixed beneficial mutations in microbial populations. Curr Biol 12:1040–1045.
47. Hegreness M, Shoresh N, Hartl D, Kishony R (2006) An equivalence principle for the incorporation of favorable mutations in asexual populations. Science 311:1615–1617.
48. Kouyos RD, Silander OK, Bonhoeffer S (2007) Epistasis between deleterious mutations and the evolution of recombination. Trends Ecol Evol 22:308–315.
49. Sanjuán R, Moya A, Elena SF (2004) The contribution of epistasis to the architecture of fitness in an RNA virus. Proc Natl Acad Sci USA 101:15376–15379.
50. Crow JF, Kimura M (1972) An Introduction to Population Genetics Theory (Harper & Row, New York).
51. Tachida H (1991) A study on a nearly neutral mutation model in finite populations. Genetics 128:183–192.
52. Brandt H (2001) Correlation Analysis of Fitness Landscapes (International Institute for Applied Systems Analysis, Laxenburg, Austria), Interim Report IR-01-058.
53. Barrick JE, et al. (2009) Genome evolution and adaptation in a long-term experiment with E. coli. Nature, 10.1038/nature08480.

www.pnas.org/cgi/doi/10.1073/pnas.0905497106

The Dynamics of Adaptation on Correlated Fitness Landscapes
Sergey Kryazhimskiy, Gašper Tkačik, Joshua B. Plotkin¹
Supporting information

¹ To whom correspondence should be addressed.
E-mail: [email protected]

1 Markov chain formalism and solutions

As described in the main text, we consider an asexual population of fixed size $N$ that evolves according to the Wright-Fisher model in the limit of low mutation rates [1, 2]. The type of an individual is determined solely by its fitness. Since under the weak-mutation limit the population is monomorphic except for negligibly brief periods when a mutation sweeps to fixation, the state of the population as a whole is completely described by the current fitness of its individuals.

$\Phi_x(y)\,dy$ denotes the fitness-parametrized landscape, i.e. the probability that the mutation arising in an individual with fitness $x$ has a fitness in $[y, y + dy]$. $s_x(y) = y/x - 1$ is the selection coefficient of such a mutation in a population with fitness $x$. The probability of fixation of the mutant is then given by [3]

$$\pi_x(y) = \pi(s_x(y)) = \frac{1 - e^{-2 s_x(y)}}{1 - e^{-2 N s_x(y)}}, \tag{S1}$$

which, in the infinite population size limit, becomes $\pi_x(y) = 1 - e^{-2 s_x(y)}$ for $y > x$ and zero otherwise. If the mutation is fixed, the population transitions instantaneously from fitness $x$ to new fitness $y$. The adaptive walk is then described by a stationary continuous-time Markov chain with state space $[0, +\infty)$. The population waits for the next mutation on average $\theta^{-1}$ generations, where $\theta = \mu N$ is the per-locus, per-generation mutation rate scaled by population size. If time $t$ is measured in the expected number of arrived mutations, the instantaneous transition rate from state $x$ to state $y$ is

$$Q(y|x) = \Phi_x(y)\pi_x(y). \tag{S2}$$

We are interested in the probability $P(y, t|x)$ of finding the population at fitness value $y$ after time $t$ given initial fitness $x$ at time zero and in the probability $P_i(t|x)$ for the population to accumulate $i$ substitutions by time $t$ given initial fitness $x$. Here we present the general operator-based formulation which is well-suited for the mathematical analysis and for analytic calculations with simple fitness landscapes.
We also derive recursion relations that are convenient for the numerical computation of the distributions $P(y, t|x)$ and $P_i(t|x)$ as well as their moments.

1.1 Formal solutions

Define the forward and backward operators by

$$(\hat{K}_f f(\cdot))(y) = \int_0^\infty \left[ f(\xi) Q(y|\xi) - f(y) Q(\xi|y) \right] d\xi, \tag{S3}$$

$$(\hat{K}_b f(\cdot))(x) = \int_0^\infty Q(\xi|x) \left[ f(\xi) - f(x) \right] d\xi, \tag{S4}$$

respectively. It follows from the standard Markov chain theory that $P(y, t|x)$ satisfies the forward and backward Kolmogorov equations

$$\frac{\partial P}{\partial t}(y, t|x) = (\hat{K}_f P(\cdot, t|x))(y), \tag{S5}$$

$$\frac{\partial P}{\partial t}(y, t|x) = (\hat{K}_b P(y, t|\cdot))(x), \tag{S6}$$

with the initial condition

$$P(y, 0|x) = \delta(y - x), \tag{S7}$$

where $\delta(z)$ is the Dirac delta-function. The formal solutions to the equations (S5)–(S7) can be written as

$$P(y, t|x) = (\exp\{t \hat{K}_f\} P(\cdot, 0|x))(y), \tag{S8}$$

$$P(y, t|x) = (\exp\{t \hat{K}_b\} P(y, 0|\cdot))(x), \tag{S9}$$

where the operator exponentiation is defined as $\exp\{\hat{F}\} = \sum_{i=0}^{\infty} \hat{F}^i / i!$.

The equations for $P_i(t|x)$ are more cumbersome. In the next section we show that $P_i(t|x)$ satisfy the recursive equations

$$\frac{\partial P_0}{\partial t}(t|x) = -q(x) P_0(t|x), \tag{S10}$$

$$\frac{\partial P_i}{\partial t}(t|x) = -q(x) \left[ P_i(t|x) - P_{i-1}(t|x) \right] + (\hat{K}_b P_{i-1}(t|\cdot))(x), \qquad i = 1, 2, \dots \tag{S11}$$

with the initial condition

$$P_i(0|x) = \delta_{i0}, \tag{S12}$$

where $\delta_{ij}$ is the Kronecker delta and

$$q(x) = \int_0^\infty Q(y|x)\,dy \tag{S13}$$

is the expected fixation probability of a mutant that occurs in the background $x$ (see also equation (3) in the main text). The solution to equations (S10)–(S12) is given by

$$P_0(t|x) = e^{-q(x) t}, \tag{S14}$$

$$P_i(t|x) = \int_0^t d\tau \int_0^\infty e^{-q(x)(t - \tau)} Q(\xi|x) P_{i-1}(\tau|\xi)\,d\xi, \qquad i = 1, 2, \dots \tag{S15}$$

1.2 Derivation of the distribution $P_i(t|x)$

In order to derive the equations (S10), (S11), note that the probability $P(y, t|x)$ of observing the population at fitness $y$ at time $t$ given that it had fitness $x$ at time zero can be expressed as a sum of the probabilities of reaching fitness $y$ from fitness $x$ in time $t$ with any possible number of substitutions,

$$P(y, t|x) = \sum_{i=0}^{\infty} P_i(y, t|x).$$

$P_i(y, t|x)$ is the probability for the population to reach fitness $y$ by time $t$ in exactly $i$ substitution events, given the initial fitness $x$. Then the probability of having accumulated exactly $i$ substitutions by time $t$ is $P_i(t|x) = \int_0^\infty P_i(y, t|x)\,dy$.

It is easy to derive the recursion relations for $P_i(y, t|x)$ from the following considerations. First note that after zero substitutions the population must have the initial fitness $x$ and, since the first substitution will occur with rate $q(x)$, we have

$$P_0(y, t|x) = \delta(y - x)\, e^{-q(x) t}. \tag{S16}$$

After integrating this expression over $y$ we obtain (S14). Now, in order for the population to be at fitness $y$ at time $t$ after exactly $i$ substitutions, the first substitution must have occurred at some time $\tau < t$ and moved the population to some fitness $\xi$, after which another $i - 1$ substitutions brought it to fitness $y$ in the period of time between $\tau$ and $t$. So, conditioned on $\xi$ and $\tau$, the probability of finding the system in state $y$ at time $t$ is the product of three probabilities: (a) the probability of the first substitution occurring at time $\tau$, $q(x) e^{-q(x)\tau}$, (b) the probability that the first substitution moves the population to fitness $\xi$, $Q(\xi|x)/q(x)$, and (c) the probability that $i - 1$ substitutions move the population from fitness $\xi$ to fitness $y$ in the time period $t - \tau$, $P_{i-1}(y, t - \tau|\xi)$. Therefore, integrating over all $\tau$ and $\xi$,

$$P_i(y, t|x) = \int_0^\infty d\xi \int_0^t e^{-q(x)\tau} Q(\xi|x) P_{i-1}(y, t - \tau|\xi)\,d\tau, \qquad i = 1, 2, \dots \tag{S17}$$

It is easy to show that $\sum_{i=0}^{\infty} P_i(y, t|x)$ with $P_i(y, t|x)$ defined by equations (S16), (S17) satisfies the backward Kolmogorov equation (S6).

To compute the number of substitutions at time $t$, we rewrite the recursion equation (S17) as

$$P_i(y, t|x) = \int_0^\infty Q(\xi|x)\,d\xi \int_0^t e^{-q(x)(t - \tau)} P_{i-1}(y, \tau|\xi)\,d\tau,$$

from which (S15) follows after integration with respect to $y$. Equations (S10), (S11) follow from (S14), (S15) by differentiating with respect to $t$.
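The result (S14) lends itself to a quick numerical check. The following Python sketch simulates the jump process of Section 1 directly and compares the fraction of replicates with no substitution by time $t$ against $e^{-q(x)t}$, with $q(x)$ evaluated from (S13) by the trapezoidal rule. It is only an illustration under stated assumptions: the neighbor fitness distribution is taken to be an exponential distribution of fitness increments with mean `a` (the form that appears later as (S41)), and all function names and parameter values are hypothetical choices made here, not the authors' Objective Caml or Matlab code.

```python
import math
import random


def fixation_probability(s, n):
    """Kimura's fixation probability (S1) for selection coefficient s and population size n."""
    if abs(s) < 1e-12:
        return 1.0 / n                      # neutral limit of (S1)
    if -2.0 * n * s > 700.0:
        return 0.0                          # strongly deleterious; avoids overflow in expm1
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)


def expected_fixation_probability(x, a, n, steps=20000):
    """q(x) from (S13) by the trapezoidal rule, assuming increments y - x ~ Exp(mean a)."""
    upper = 40.0 * a                        # the exponential tail beyond this is negligible
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        u = k * h                           # fitness increment y - x
        f = math.exp(-u / a) / a * fixation_probability(u / x, n)
        total += f if 0 < k < steps else 0.5 * f
    return total * h


def count_substitutions(x0, a, n, t_end, rng):
    """One realization of the jump process; returns the number of substitutions by t_end."""
    x, t, subs = x0, 0.0, 0
    while True:
        t += rng.expovariate(1.0)           # time is measured in expected mutations
        if t > t_end:
            return subs
        y = x + rng.expovariate(1.0 / a)    # mutant fitness drawn from the assumed Phi_x
        if rng.random() < fixation_probability(y / x - 1.0, n):
            x, subs = y, subs + 1           # the mutant fixes: fitness jumps from x to y


def fraction_without_substitutions(x0, a, n, t_end, replicates, seed=1):
    """Monte Carlo estimate of P_0(t_end | x0)."""
    rng = random.Random(seed)
    zero = sum(count_substitutions(x0, a, n, t_end, rng) == 0 for _ in range(replicates))
    return zero / replicates


if __name__ == "__main__":
    x0, a, n, t_end = 1.0, 0.1, 1000, 4.0   # illustrative values only
    q = expected_fixation_probability(x0, a, n)
    print("exp(-q t):", math.exp(-q * t_end))
    print("simulated:", fraction_without_substitutions(x0, a, n, t_end, 20000))
```

The two printed numbers should agree up to Monte Carlo error, which shrinks as the number of replicates grows. The same simulation loop could in principle be used to estimate $P_i(t|x)$ for $i \geq 1$ by tallying replicates with exactly $i$ substitutions.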
1.3 Fitness and substitution trajectories

We call the expected value of the distribution $P(y, t|x)$ the fitness trajectory $F(t, x)$ and we call the expected value of the distribution $P_i(t|x)$ the substitution trajectory $S(t, x)$.

1.3.1 General equations for the fitness and substitution trajectories

Multiplying the backward equation (S6) by $y$ and integrating it with respect to $y$, we obtain

$$\frac{\partial F}{\partial t}(t, x) = (\hat{K}_b F(t, \cdot))(x) \tag{S18}$$

with the initial condition

$$F(0, x) = x, \tag{S19}$$

whose formal solution is given by

$$F(t, x) = (\exp\{t \hat{K}_b\} I(\cdot))(x), \tag{S20}$$

where $I(x) = x$ is the identity function. Analogously, it follows from equations (S10), (S11) that the substitution trajectory satisfies the equation

$$\frac{\partial S}{\partial t}(t, x) = q(x) + (\hat{K}_b S(t, \cdot))(x) \tag{S21}$$

with the initial condition

$$S(0, x) = 0, \tag{S22}$$

whose solution is given by

$$S(t, x) = \sum_{i=0}^{\infty} \frac{t^{i+1}}{(i + 1)!} (\hat{K}_b^i q(\cdot))(x). \tag{S23}$$

An obvious result follows immediately from equation (S21): if the rate of substitutions is the same for all fitnesses, i.e., if $q(x) = q_0 = \text{const}$, then the substitutions accumulate linearly with time. Indeed, since $(\hat{K}_b q_0)(x) \equiv 0$, equation (S21) becomes an ordinary differential equation whose solution is $S(t, x) = q_0 t$.

1.3.2 Approximate equations for the fitness and substitution trajectories

In this section we derive equations (1)–(2) in the main text. We assume that the advection approximation holds and that the functions $r(x)$ and $q(x)$ are sufficiently smooth.

First, we notice that on landscapes for which mutations of large effect become increasingly unlikely as the fitness of the parent increases, most of the contribution to the integral (S4) comes from values $\xi \approx x$ and we can write $f(\xi) - f(x) \approx f'(x)(\xi - x)$. Consequently, $(\hat{K}_b f(\cdot))(x) \approx r(x) f'(x)$, where $r(x)$ is given by equation (4) in the main text.
Under this so-called advection approximation, equations (S18), (S21) become

$$\frac{\partial F}{\partial t}(t, x) = r(x) \frac{\partial F}{\partial x}, \qquad F(0, x) = x, \tag{S24}$$

$$\frac{\partial S}{\partial t}(t, x) = r(x) \frac{\partial S}{\partial x} + q(x), \qquad S(0, x) = 0, \tag{S25}$$

where $q(x)$ is defined by equation (3) in the main text (or equation (S13) above).

In fact, equations (S24) and (S25) are equivalent to equations (1)–(2) in the main text. To see this, first, let

$$\chi(x_0, x, t) = \int_{x_0}^{x} \frac{d\xi}{r(\xi)} + t.$$

This function is monotonic in $x$ and in $x_0$ as long as $r(\xi)$ does not change sign. Since we are interested in adaptation, we always have $r(\xi) > 0$, so that we can solve the equation $\chi(x_0, x, t) = 0$ with respect to $x_0$. Denote the solution as $x_0 = u(x, t)$. Analogously, we obtain the solution of the same equation with respect to $x$, $x = v(x_0, t)$. Both equations (S24) and (S25) have the same characteristic, which is given by the equation

$$\frac{dx}{dt} = -r(x), \qquad x(0) = x_0.$$

The solution of equation (S24) does not change along this characteristic, and therefore it is given by $F(t, x) = u(x, t)$. Using the implicit function differentiation rules, it is easy to see that $F(t, x)$ satisfies equation (1) in the main text. The solution of equation (S25) changes along this characteristic according to the equation

$$\frac{dS}{dt} = q(v(x_0, t)), \qquad S(x_0, 0) = 0,$$

and therefore it is given by

$$S(t, x) = \int_0^t q(v(u(x, t), \tau))\,d\tau = \int_x^{F(t, x)} \frac{q(\zeta)}{r(\zeta)}\,d\zeta.$$

Here we used the fact that $v(x_0, 0) = x_0$ and $v(u(x, t), t) \equiv x$. Now it is easy to see that $S(t, x)$ satisfies equation (2) in the main text.

1.4 Numerical algorithm

Only in some special cases can the formulas (S8), (S9), (S14), (S15) be effectively used for evaluating the distributions $P(y, t|x)$ and $P_i(t|x)$. We propose the following recursion equations for the efficient numerical implementation.

1.4.1 Computing distribution $P(y, t|x)$

The basic idea behind the recursion is to write the probability $P(y, t|x)$ as the sum over all possible paths connecting the initial fitness $x$ with fitness $y$ at time $t$, each with a particular number $m = 0, 1, \dots$
of mutations,

$$P(y, t|x) = \sum_{m=0}^{\infty} U_m(t)\, V_m(y|x). \tag{S26}$$

Here, $U_m(t)$ is the probability of observing $m$ mutations during time interval $[0, t]$, and $V_m(y|x)$ is the probability for a change in fitness from initial value $x$ to final value $y$ that takes exactly $m$ mutational attempts. Because the mutations arise independently, $U_m(t)$ is the Poisson distribution with parameter $t$,

$$U_m(t) = \frac{t^m}{m!} e^{-t}. \tag{S27}$$

Note that the sum in equation (S26) runs over all possible numbers of mutations, some of which will fix and some of which will not; if we conditioned on the mutations having been fixed, the distribution $U$ would no longer be Poisson. The sequence $V_m(y|x)$ can be written as follows:

$$V_0(y|x) = \delta(y - x), \tag{S28}$$

$$V_m(y|x) = \int_0^\infty Q(y|\xi) V_{m-1}(\xi|x)\,d\xi + V_{m-1}(y|x) \int_0^\infty (1 - \pi_y(\xi)) \Phi_y(\xi)\,d\xi, \qquad m = 1, 2, \dots \tag{S29}$$

The relations (S29) have a simple intuitive interpretation. For each $m$ but $m = 0$, the distribution of fitnesses after exactly $m$ mutations is a sum of two terms. The first term accounts for the situation when $m - 1$ mutations preceding the current one have brought the population into state $\xi$. This term equals the probability that a mutation with fitness $y$ arises and is successfully fixed in the population with the intermediate fitness $\xi$. The second term accounts for the situation when fitness $y$ has already been reached with the preceding $m - 1$ mutations. In order for the final fitness to still be $y$, the $m$-th mutation whose fitness is $\xi$ must fail to fix.

In practice, one computes the distribution of the number of mutations from equation (S27) to find the range of $m$ over which $U$ is non-negligible, evaluates by recursion the terms in equation (S29) in the relevant range, and finally sums them up according to equation (S26). A Matlab implementation of this algorithm is available upon request.

Now we show that the solution to the recursion relations (S26)–(S29) in fact coincides with the solution (S8) of the forward equation (S5).
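As an aside before the proof, the three-step recipe described above (Poisson weights from (S27), the recursion (S29), and the summation (S26)) can be sketched in Python on a discretized fitness grid. This is a minimal illustration and not the Matlab implementation mentioned in the text. The assumptions made here are: fitness is binned on a uniform grid of spacing `h` starting at `x_min`; the neighbor fitness distribution is an exponential distribution of fitness increments with mean `a` (the form of (S41)); an increment falling in $((d-1)h, dh]$ is assigned to the bin $d$ steps above the parent; and proposals beyond the top of the grid are treated as lost. All names and parameter values are hypothetical.

```python
import math


def fixation_probability(s, n):
    """Kimura's fixation probability (S1) for selection coefficient s and population size n."""
    if abs(s) < 1e-12:
        return 1.0 / n                      # neutral limit of (S1)
    if -2.0 * n * s > 700.0:
        return 0.0                          # strongly deleterious; avoids overflow in expm1
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)


def transition_matrix(grid_size, x_min, h, a, n):
    """T[j][k]: probability that one mutational attempt moves fitness from bin j to bin k."""
    T = [[0.0] * grid_size for _ in range(grid_size)]
    for j in range(grid_size):
        x = x_min + j * h
        moved = 0.0
        for k in range(j + 1, grid_size):
            d = k - j
            # mass of an Exp(mean a) increment falling in ((d-1)h, d h]
            proposal = math.exp(-(d - 1) * h / a) - math.exp(-d * h / a)
            y = x_min + k * h
            T[j][k] = proposal * fixation_probability(y / x - 1.0, n)
            moved += T[j][k]
        T[j][j] = 1.0 - moved               # mutation lost, or proposed beyond the grid
    return T


def fitness_distribution(t, start, T, tol=1e-12, max_terms=10000):
    """P(., t | x_start) via (S26): Poisson-weighted sum of the V_m built by recursion (S29)."""
    size = len(T)
    v = [0.0] * size
    v[start] = 1.0                          # V_0 is a point mass at the initial fitness (S28)
    p = [0.0] * size
    weight = math.exp(-t)                   # U_0(t) from (S27)
    m, accumulated = 0, 0.0
    while accumulated < 1.0 - tol and m < max_terms:
        for k in range(size):
            p[k] += weight * v[k]           # add the term U_m(t) V_m
        accumulated += weight
        # one more mutational attempt; T is upper triangular because increments are positive
        v = [sum(v[j] * T[j][k] for j in range(k + 1)) for k in range(size)]
        m += 1
        weight *= t / m                     # U_m(t) = U_{m-1}(t) * t / m
    return p


if __name__ == "__main__":
    x_min, h = 1.0, 0.05                    # illustrative grid
    T = transition_matrix(120, x_min, h, a=0.1, n=1000)
    p = fitness_distribution(3.0, 0, T)
    mean_fitness = sum((x_min + k * h) * p[k] for k in range(len(p)))
    print("total probability:", sum(p))
    print("mean fitness at t = 3:", mean_fitness)
```

One simple consistency check is that the probability mass remaining at the initial bin equals $e^{-q t}$, with $q$ the total probability of leaving that bin in one attempt, in line with (S14). The grid spacing controls the discretization error, so results should be compared across a few values of `h` before being trusted.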
First, note that equations (S29) can be written in the form

$$V_m(y|x) = V_{m-1}(y|x) + (\hat{K}_f V_{m-1}(\cdot|x))(y), \qquad m = 1, 2, \dots$$

Now it is easy to see that, in fact,

$$V_m(y|x) = \left( \sum_{i=0}^{m} \binom{m}{i} \hat{K}_f^i V_0(\cdot|x) \right)(y). \tag{S30}$$

Substituting (S27) and (S30) into (S26) and changing the order of summation, we obtain

$$P(y, t|x) = \left( \sum_{i=0}^{\infty} \frac{(t \hat{K}_f)^i}{i!} V_0(\cdot|x) \right)(y),$$

which coincides with (S8).

1.4.2 Computing distribution $P_i(t|x)$

To compute the distribution $P_i(t|x)$, let us first write it as

$$P_i(t|x) = \sum_{m=i}^{\infty} U_m(t) W_m(i|x), \tag{S31}$$

where $W_m(i|x)$ is the probability that out of $m$ mutations exactly $i$ have fixed, given the initial fitness $x$; clearly $W_m(i|x) \equiv 0$ if $i > m$. First, note that the probability $w_j$ that the $j$-th mutation has fixed is given by the first term of equation (S29) integrated over all final fitnesses $y$,

$$w_j(x) = \int_0^\infty dy \int_0^\infty Q(y|\xi) V_{j-1}(\xi|x)\,d\xi, \qquad j = 1, 2, \dots$$

Let us describe the fate of $m$ mutations by a vector $\sigma_m = (\sigma_1, \sigma_2, \dots, \sigma_m)$ where $\sigma_j = 1$ if the $j$-th mutation has fixed, and $\sigma_j = 0$ if it was lost. The event that out of $m$ mutations exactly $i$ have fixed encompasses all events that are described by vectors $\sigma_m$ such that $\sum_{j=1}^{m} \sigma_j = i$. Denote the set of all such elementary events by $\Sigma_{m,i} = \{\sigma_m : \sum_{j=1}^{m} \sigma_j = i\}$. For example, the event that, out of 2 mutations, exactly one has fixed can be realized by $\sigma_2 = (1, 0)$, where the first mutation has fixed and the second has not, and by $\sigma_2 = (0, 1)$, where the second mutation has fixed and the first has not; therefore $\Sigma_{2,1} = \{(1, 0), (0, 1)\}$. Then, since all members of the set $\Sigma_{m,i}$ are mutually exclusive,

$$W_m(i|x) = \sum_{\sigma_m \in \Sigma_{m,i}} \prod_{j=1}^{m} w_j^{\sigma_j}(x) \left(1 - w_j(x)\right)^{1 - \sigma_j}. \tag{S32}$$

Of course, if $w_j(x)$ were equal for all $j$, this expression would reduce to the binomial probability with parameters $m$ and $w_j(x)$. In general, $w_j(x)$ are not equal, and equation (S32) is difficult to evaluate.
We conjecture, however, that the sum in (S32) is usually dominated by a small number of terms, which one could try to find knowing each $w_j(x)$ from the recursion relations.

Fortunately, calculating some lower order statistics, like the mean of the distribution $P_i(t|x)$, or its variance, is much easier. The expected number of fixations of the $j$-th mutation is $1 \cdot w_j(x) + 0 \cdot (1 - w_j(x)) = w_j(x)$. Thus, the expected number of substitutions that occurred after $m$ mutations took place is

$$\sum_{i=0}^{\infty} i W_m(i|x) = \sum_{j=1}^{m} w_j(x). \tag{S33}$$

The substitution trajectory can then be finally written as

$$S(t, x) = \sum_{m=0}^{\infty} U_m(t) \sum_{i=0}^{\infty} i W_m(i|x) = \sum_{m=1}^{\infty} U_m(t) \sum_{j=1}^{m} w_j(x). \tag{S34}$$

The variance in the number of substitutions can be similarly calculated, taking into account that the variance in the number of fixations of the $j$-th mutation equals $w_j(x)(1 - w_j(x))$, and the total variance is the sum over individual mutational steps.

2 The role of neutral and deleterious mutations in adaptation

In this section we investigate how the distributions $P(y, t|x)$ and $P_i(t|x)$ change if a constant fraction of neutral or deleterious mutations is added to the NFD.

2.1 Neutral mutations

Suppose that on the fitness landscape $\Phi_x$ the distribution of fitnesses at time $t$ is $P(y, t|x)$ and the distribution of the number of accumulated substitutions is $P_i(t|x)$. Let

$$\tilde{\Phi}_x = \nu \delta_x + (1 - \nu) \Phi_x, \tag{S35}$$

where $\delta_x$ is a point mass centered at $x$, be a new fitness landscape with a fraction $\nu$ of neutral mutations. Let the distribution of fitnesses at time $t$ on this landscape be $\tilde{P}(y, t|x)$ and let the distribution of substitutions be $\tilde{P}_i(t|x)$. We claim that

$$\tilde{P}(y, t|x) = P(y, (1 - \nu) t|x), \tag{S36}$$

$$\tilde{P}_i(t|x) = \sum_{j=0}^{i} U_j(N^{-1} \nu t)\, P_{i-j}((1 - \nu) t|x), \tag{S37}$$

where $U_j(N^{-1} \nu t)$ is, as before, the Poisson distribution with parameter $N^{-1} \nu t$.
Expression (S36) shows that the evolution of the distribution $P(y, t|x)$ proceeds on the landscape $\tilde{\Phi}_x$ with mutation rate $\theta$ exactly as on the landscape $\Phi_x$ with a smaller mutation rate $(1 - \nu)\theta$. Expression (S37) shows that, if the random variables $Q_t$ and $\tilde{Q}_t$ describe the number of substitutions that occurred by time $t$ on the fitness landscapes $\Phi_x$ and $\tilde{\Phi}_x$, respectively, then $\tilde{Q}_t = Q_{(1-\nu)t} + R_{\nu t}$, where $R_t$ is a Poisson process with rate $N^{-1}$. In other words, neutral mutations simply add an independent Poisson counting process to the original substitution process.

To show that (S36) and (S37) hold, we substitute (S35) into (S4) and (S13) and obtain

$$(\tilde{K}_b f(\cdot))(x) \overset{\text{def}}{=} \int_0^\infty \tilde{\Phi}_x(\xi) \pi_x(\xi) \left[ f(\xi) - f(x) \right] d\xi = (1 - \nu)\, (\hat{K}_b f(\cdot))(x), \tag{S38}$$

$$\tilde{q}(x) \overset{\text{def}}{=} \int_0^\infty \tilde{\Phi}_x(\xi) \pi_x(\xi)\,d\xi = \nu N^{-1} + (1 - \nu) q(x). \tag{S39}$$

From (S38) it follows that the backward equation for $\tilde{P}(y, t|x)$ differs from the backward equation for $P(y, t|x)$ only by the scaling factor $1 - \nu$. Now,

$$\frac{\partial \tilde{P}_0}{\partial t}(t|x) = -N^{-1} \nu\, U_0(N^{-1} \nu t) P_0((1 - \nu) t|x) - (1 - \nu) q(x)\, U_0(N^{-1} \nu t) P_0((1 - \nu) t|x) = -\tilde{q}(x) \tilde{P}_0(t|x),$$

and

$$\begin{aligned}
\frac{\partial \tilde{P}_i}{\partial t}(t|x) ={}& -\sum_{j=0}^{i} \left[ N^{-1} \nu\, U_j(N^{-1} \nu t) P_{i-j}((1 - \nu) t|x) + (1 - \nu)\, U_j(N^{-1} \nu t)\, q(x) P_{i-j}((1 - \nu) t|x) \right] \\
& + N^{-1} \nu \sum_{j=1}^{i} U_{j-1}(N^{-1} \nu t) P_{i-j}((1 - \nu) t|x) + (1 - \nu) q(x) \sum_{j=0}^{i-1} U_j(N^{-1} \nu t) P_{i-j-1}((1 - \nu) t|x) \\
& + (1 - \nu) \sum_{j=0}^{i-1} U_j(N^{-1} \nu t)\, (\hat{K}_b P_{i-j-1}((1 - \nu) t|\cdot))(x) \\
={}& -\tilde{q}(x) \left[ \tilde{P}_i(t|x) - \tilde{P}_{i-1}(t|x) \right] + (\tilde{K}_b \tilde{P}_{i-1}(t|\cdot))(x),
\end{aligned}$$

which implies that $\tilde{P}_i$ given by equation (S37) satisfy equations (S10), (S11) for the landscape $\tilde{\Phi}_x$. As a consequence, the substitution trajectory $\tilde{S}(t, x)$ on the landscape $\tilde{\Phi}_x$ is given by

$$\tilde{S}(t, x) = S((1 - \nu) t, x) + \nu N^{-1} t,$$

which can also be obtained directly by substituting expression (S39) into solution (S23).

2.2 Deleterious mutations

In general, it is hard to predict how deleterious mutations would influence the dynamics of adaptation. However, their effect becomes negligible as the population size goes to infinity, at least in the weak-mutation limit.
Indeed, the fixation probability (S1) of deleterious mutations quickly tends to zero as the population size increases. For example, the probability of fixation of a moderately deleterious mutation with a selective disadvantage of 0.1% is less than $10^{-3}$ for a population of size $10^3$ and is less than $10^{-11}$ for a population of size $10^4$. Thus, even in moderately large populations, the vast majority of deleterious mutations will not go to fixation. Therefore, on the long time scale, all deleterious mutations are equivalent to being lethal. Intuitively, this means that if we add a fraction $d$ of deleterious mutations to the NFD of all genotypes, this fraction of mutations will simply be wasted and only the remaining fraction $1 - d$ will be potentially utilized in the process of adaptation.

To illustrate that this is indeed happening, we add a fraction $d$ of deleterious mutations to the non-epistatic and stairway to heaven landscapes. We call the resulting landscapes NEPI+d and STH+d, respectively. The fitness and substitution trajectories for these landscapes are shown in Figure S1. As expected, the analytical approximations calculated under the assumption that deleterious mutations are wasted give an excellent fit to simulations.

An important consequence of this observation is that the weak mutation theory holds when $\theta_b \ll (4 \log N)^{-1}$ instead of the more stringent $\theta \ll (4 \log N)^{-1}$, where $\theta_b \equiv (1 - d)\theta$. If the genomic rate of beneficial mutations $\mu_b$ is $10^{-5}$ [4], then this condition is satisfied for population sizes smaller than 1000.
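The two fixation probabilities quoted above can be reproduced directly from (S1). The short Python sketch below evaluates the formula for a selection coefficient of $-0.001$ at the two population sizes, and also reports the ratio $\theta_b \cdot 4 \log N$, which the weak-mutation condition requires to be small. The function names are illustrative choices made here, the logarithm is assumed to be natural, and $\theta_b$ is taken to be $N\mu_b$.

```python
import math


def fixation_probability(s, n):
    """Kimura's fixation probability (S1) for selection coefficient s and population size n."""
    if abs(s) < 1e-12:
        return 1.0 / n                      # neutral limit of (S1)
    if -2.0 * n * s > 700.0:
        return 0.0                          # strongly deleterious; avoids overflow in expm1
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)


def weak_mutation_ratio(n, mu_b):
    """theta_b * 4 log N with theta_b = N * mu_b; the theory asks for this to be small."""
    return n * mu_b * 4.0 * math.log(n)


if __name__ == "__main__":
    s = -0.001                              # selective disadvantage of 0.1%
    for n in (10**3, 10**4):
        print(n, fixation_probability(s, n))
    print("theta_b * 4 log N at N = 1000:", weak_mutation_ratio(1000, 1e-5))
```

With these inputs the fixation probability comes out at roughly $3 \times 10^{-4}$ for $N = 10^3$ and roughly $4 \times 10^{-12}$ for $N = 10^4$, consistent with the bounds stated in the text.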
[Figure S1. Panel columns: MTF, NEPI+d, STH+d. Top row: probability density ($\Phi_1(y)$ and $\Phi_4(y)$) versus fitness of a mutant, $y$. Middle row: expected fitness, $F(t)$, versus time, $t$. Bottom row: expected number of substitutions, $S(t)$, versus time, $t$.]

Figure S1: Dynamics of adaptation on the continuous additive Mount Fuji landscape (MTF), the non-epistatic landscape with deleterious mutations (NEPI+d) and the stairway to heaven landscape with deleterious mutations (STH+d). Notations are as in Figure 1 in the main text. Parameter values used: $N = 1000$, $\mu = 10^{-5}$ ($\theta = 0.01$), $L = 1000$, number of replicate simulations $= 10^3$. MTF landscape: $x_{\max} = 5$, $a = 1$; NEPI+d landscape: $d = 0.5$ and $a = 1$; STH+d landscape: $d = 0.5$ and $a = 0.42$. The same analytical approximations were used here for the NEPI+d and STH+d landscapes as in the main text, but time was rescaled by $\theta(1 - d)$ instead of $\theta$.

3 Classical landscapes

Recall that we employ the following definitions (see main text).

1. The house of cards or the uncorrelated landscapes are the landscapes on which the NFD is the same for all genotypes (and fitnesses), $\Phi_x(y)\,dy = \Psi(y)\,dy$.

2. The non-epistatic landscapes are landscapes on which the distribution of fitness effects $\Psi(v)$ remains the same for all genotypes, so that the NFD is given by $\Phi_x(y)\,dy = \Psi(y - x)\,dy$.

3. The stairway to heaven landscapes are the landscapes on which the distribution of selection coefficients of mutations, $\Psi(s)$, is the same for all genotypes, so that the NFD is given by
$$\Phi_x(y)\,dy = \frac{1}{x} \Psi\!\left( \frac{y - x}{x} \right) dy.$$

In the main text we considered special cases of these landscapes when the distribution $\Psi$ was of exponential form (see Table 1 in the main text),

House of cards:
$$\Phi_x(y) = \frac{1}{a} \exp\left\{ -\frac{y}{a} \right\}, \qquad y \geq 0 \tag{S40}$$

Non-epistatic:
$$\Phi_x(y) = \frac{1}{a} \exp\left\{ -\frac{y - x}{a} \right\}, \qquad y \geq x \tag{S41}$$

Stairway to heaven:
$$\Phi_x(y) = \frac{1}{a x} \exp\left\{ -\frac{y - x}{a x} \right\}, \qquad y \geq x \tag{S42}$$

3.1 Correlation structure

The house of cards, non-epistatic, and stairway to heaven landscapes differ by the correlation structure between parent and offspring fitnesses. By definition, there is no such correlation on the house of cards landscape. By contrast, the offspring fitness is positively correlated with the parent fitness on both non-epistatic and stairway to heaven landscapes.

Let $X$ be the fitness of the parent that is drawn randomly from some distribution, and $Y$ be the fitness of the offspring. On non-epistatic landscapes, $Y = X + V$, where $V$ is the fitness increment which is drawn from distribution $\Psi(v)$ independently of $X$. Then

$$\mathrm{Cov}(X, Y) = E\left[ (X - \bar{X})(X - \bar{X} + V - \bar{V}) \right] = \mathrm{Var}(X) > 0,$$

where $\bar{X}$ and $\mathrm{Var}(X)$ are the mean and the variance of the distribution from which the parent is drawn, and $\bar{V} > 0$ is the mean of the distribution of fitness increments. On the stairway to heaven landscapes, $Y = X(1 + S)$, where $S$ is the selection coefficient which is drawn from distribution $\Psi(s)$ independently of $X$. Then

$$\mathrm{Cov}(X, Y) = E\left[ (X - \bar{X})(X - \bar{X} + XS - \bar{X}\bar{S}) \right] = \mathrm{Var}(X)\,(1 + \bar{S}) > 0,$$

where $\bar{S} > -1$ is the mean of the distribution of selection coefficients.

3.2 Approximate solution for the exponential house of cards landscape

On the house of cards landscape (S40) we have, for large population sizes and for large $x$,

$$q(x) = \frac{1}{a} e^{-\frac{x}{a}} \int_x^\infty e^{-\frac{y - x}{a}} \left( 1 - e^{-2 \frac{y - x}{x}} \right) dy = \frac{2a}{x + 2a}\, e^{-\frac{x}{a}} \approx 2a\, e^{-\frac{x}{a}},$$

$$r(x) = \frac{1}{a} e^{-\frac{x}{a}} \int_x^\infty (y - x)\, e^{-\frac{y - x}{a}} \left( 1 - e^{-2 \frac{y - x}{x}} \right) dy = \frac{4a^2 (x + a)}{(x + 2a)^2}\, e^{-\frac{x}{a}} \approx 4a^2 e^{-\frac{x}{a}}.$$

The last approximate equality for $q(x)$ and $r(x)$ is not very accurate since it neglects the term of order $x^{-1}$, but it captures the fact that the exponential decay in both $q(x)$ and $r(x)$ will dominate the power-law decay as $x$ gets large. After substituting these functions into equations (1), (2) in the main text, we solve them using the method of characteristics to obtain expressions for the fitness and substitution trajectories presented in Figure 1 (main text).

3.3 Approximate solution for the exponential non-epistatic landscape

On the non-epistatic landscape (S41) we have, for large population sizes and for large $x$,

$$q(x) = \frac{1}{a} \int_x^\infty e^{-\frac{y - x}{a}} \left( 1 - e^{-2 \frac{y - x}{x}} \right) dy = \frac{2a}{x + 2a} \approx \frac{2a}{x},$$

$$r(x) = \frac{1}{a} \int_x^\infty (y - x)\, e^{-\frac{y - x}{a}} \left( 1 - e^{-2 \frac{y - x}{x}} \right) dy = \frac{4a^2 (x + a)}{(x + 2a)^2} \approx \frac{4a^2}{x}.$$

After substituting these functions into equations (1), (2) in the main text, we solve them using the method of characteristics to obtain expressions for the fitness and substitution trajectories presented in Figure 1 (main text).

3.4 Exact solution for an arbitrary stairway to heaven landscape

It is possible to solve equations (S18)–(S22) for an arbitrary stairway to heaven landscape, $\Phi_x(y) = x^{-1} \Psi(y/x - 1)$. First note that the expected fixation probability of a mutation, $q(x) = \int_{-1}^{\infty} \Psi(s) \pi(s)\,ds = \langle \pi(s) \rangle$, is independent of the fitness $x$ of the parental genotype. In addition, $r(x) = x \int_{-1}^{\infty} s \Psi(s) \pi(s)\,ds = x \langle \pi(s) s \rangle$, which suggests, after exploring the advection approximation, the ansatz $F(t, x) = f(t) x$ and $S(t, x) = g(t)$ for the equations (S18)–(S22). With this ansatz we obtain

$$(\hat{K}_b F(t, \cdot))(x) = \int_0^\infty \frac{1}{x} \Psi\!\left( \frac{\xi - x}{x} \right) \pi_x(\xi)\, f(t)\, (\xi - x)\,d\xi = \langle \pi(s) s \rangle\, F(t, x),$$

$$(\hat{K}_b S(t, \cdot))(x) = 0,$$

where $\langle \pi(s) s \rangle$ is the expected selection coefficient of a random mutation to any genotype, weighted by its fixation probability.
Equations (S18)–(S22) become simple ODEs whose solutions are given by
\[ F(t, x) = x\, e^{\langle \pi(s) s \rangle t}, \tag{S43} \]
\[ S(t, x) = \langle \pi(s) \rangle\, t. \tag{S44} \]
The expressions for the fitness and substitution trajectories presented in Figure 1 (main text) follow from (S43), (S44) by noting that, for large population sizes and the exponential distribution (S42),
\[ \langle \pi(s) \rangle = \frac{1}{a} \int_0^\infty e^{-s/a} \left(1 - e^{-2s}\right) ds = \frac{2a}{1 + 2a}, \]
\[ \langle \pi(s) s \rangle = \frac{1}{a} \int_0^\infty s\, e^{-s/a} \left(1 - e^{-2s}\right) ds = \frac{4a^2 (1 + a)}{(1 + 2a)^2}. \]
It can be shown analogously that the $k$-th moment of the distribution of fitnesses, $M_k(t, x)$, evolves according to
\[ M_k(t, x) = x^k e^{\kappa_k t}, \quad \text{where} \quad \kappa_k = \sum_{j=1}^{k} \binom{k}{j} \langle \pi(s) s^j \rangle. \]
In particular, the relative width of the distribution of fitnesses increases with time: $M_2(t, x)/F^2(t, x) - 1 = e^{\langle \pi(s) s^2 \rangle t} - 1$.

3.5 Mount Fuji landscape

In addition to the classical fitness landscapes considered above, the class of fitness-parametrized landscapes encompasses many other landscapes. To demonstrate this, we present here a version of the "Mount Fuji" landscape. On this landscape, fitness decreases monotonically with the Hamming distance from the single optimal genotype, so that the fitness of a genotype that differs by $h$ mutations from the optimal one is $(1 - s)^h$, where $0 < s < 1$. If formulated in terms of neighbor fitness distributions, such a multiplicative Mount Fuji landscape would be defined for a discrete set of fitnesses $x \in \{1, 1 - s, \dots, (1 - s)^L\}$, where $L$ is the genome size:
\[ \Phi_x(y) = b_h\, \delta_{x(1-s)^{-1}}(y) + (1 - b_h)\, \delta_{x(1-s)}(y). \]
Here, $\delta_z$ is, as before, a point measure centered at $z$, and $b_h$ is the probability of a beneficial mutation to a genotype with $h$ mutations. These probabilities can be easily calculated knowing the genome length $L$ and the alphabet size $|A|$. For instance, $b_0 = 0$, $b_1 = (|A| L)^{-1}$ and $b_L = 1$. A continuous version of the additive Mount Fuji landscape can also be defined, for example, as follows.
\[ \Phi_x(y) = \begin{cases} \dfrac{1}{a}, & \text{if } y \in \left[ x\left(1 - \dfrac{a}{x_{\max}}\right),\; x\left(1 - \dfrac{a}{x_{\max}}\right) + a \right], \\ 0, & \text{otherwise.} \end{cases} \]
On this landscape, the fraction of beneficial mutations decreases linearly from 1 to 0 as the fitness changes from 0 to the maximum value $x_{\max}$. The parameter $a$ defines the width of the NFD. The dynamics of adaptation on this landscape are shown in Figure S1.

4 Relaxation of the weak-mutation limit

In this section we investigate, by means of simulations, the validity of our theory outside of the weak-mutation limit. We perform full stochastic simulations of the infinite-alleles Wright-Fisher model with $N = 1000$ individuals. We vary the mutation rate from $\mu = 10^{-5}$ to $\mu = 10^{-2}$ per individual per generation, which corresponds to $\theta$ ranging from $\theta = 0.01$, where our theory should describe the Wright-Fisher model well, up to $\theta = 10$, where clonal interference and piggybacking effects cannot be ignored.

In the simulations, each individual is characterized by its allelic type $z$ (a floating-point number between 0 and 1); $x_z$ is the fitness of allele $z$, and $k_z$ is the number of mutations that have occurred on the line of descent of an individual of type $z$. A mutant offspring of an individual of type $z$ has type $z'$, which is drawn randomly from $[0, 1]$; its fitness $x_{z'}$ is then drawn from the distribution $\Phi_{x_z}$, and $k_{z'} = k_z + 1$. At each time point $t$ the population is characterized by a collection of $K(t)$ types $z_1, z_2, \dots, z_{K(t)}$ and their frequencies $f_1, f_2, \dots, f_{K(t)}$. We use the shorthand notations $x_i \equiv x_{z_i}$ and $k_i \equiv k_{z_i}$. In the simulations we track four summary statistics:

1. The mean fitness of the population, $\sum_{i=1}^{K(t)} x_i f_i$
2. The mean number of mutations since the initial time point, $\sum_{i=1}^{K(t)} k_i f_i$
3. The population heterozygosity, $1 - \sum_{i=1}^{K(t)} f_i^2$
4. The number of alleles present in the population, $K(t)$

[Figure S2 panels: rows HOC, NEPI, STH; columns mean fitness, mean number of substitutions, heterozygosity, number of alleles; horizontal axis: time, t (in generations).]

Figure S2: Dynamics of adaptation in the full Wright-Fisher model with $\mu = 10^{-5}$, $N = 10^3$ on three classical landscapes. The first and second columns show the fitness and substitution trajectories (see text for details). Black lines correspond to the predictions of our theory; gray lines show the results of the Wright-Fisher simulations; dashed lines show a linear function, for reference. The third column shows the evolution of heterozygosity, and the fourth column shows how the number of alleles in the population changes over time (see text for details). Note that time is measured in generations. Parameter values are the same as in Figure 1 in the main text, except that the number of replicate simulations is $10^3$; at time zero the population is monomorphic for a type with fitness 2. Simulations are terminated prematurely if the fitness of an individual exceeds $10^{100}$.

[Figure S3 panels: rows HOC, NEPI, STH; columns mean fitness, mean number of substitutions, heterozygosity, number of alleles; horizontal axis: time, t (in generations).]

Figure S3: Dynamics of adaptation in the full Wright-Fisher model with $\mu = 10^{-4}$, $N = 10^3$. Notations as in Figure S2.
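The simulation procedure described at the start of this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to produce Figures S2–S5. Alleles are labeled by integer identifiers rather than by a random number in [0, 1], parents are sampled with probability proportional to fitness times frequency, and the example NFD `nfd_non_epistatic` uses the exponential non-epistatic form (S41) with a hypothetical parameter a = 0.5. All function and variable names are illustrative.

```python
import random


def nfd_non_epistatic(x, rng, a=0.5):
    """Example NFD (assumption): offspring fitness = parent fitness + Exp(mean a)."""
    return x + rng.expovariate(1.0 / a)


def summary(alleles, N):
    """Return (mean fitness, mean number of mutations, heterozygosity, K)."""
    rows = [(x, k, count / N) for x, k, count in alleles.values()]
    mean_fitness = sum(x * f for x, _, f in rows)
    mean_mutations = sum(k * f for _, k, f in rows)
    heterozygosity = 1.0 - sum(f * f for _, _, f in rows)
    return mean_fitness, mean_mutations, heterozygosity, len(rows)


def wright_fisher(N, mu, generations, draw_fitness, x0=2.0, seed=0):
    """Infinite-alleles Wright-Fisher sketch; returns one summary per generation.

    alleles maps an integer allele id to [fitness x, mutation count k, copies].
    The population starts monomorphic with fitness x0.
    """
    rng = random.Random(seed)
    alleles = {0: [x0, 0, N]}
    next_id = 1
    history = []
    for _ in range(generations):
        ids = list(alleles)
        # Selection: a parent is chosen with weight fitness * number of copies.
        weights = [alleles[i][0] * alleles[i][2] for i in ids]
        parents = rng.choices(ids, weights=weights, k=N)
        offspring = {}
        for p in parents:
            x, k, _ = alleles[p]
            if rng.random() < mu:
                # Mutation: a brand-new allele with fitness drawn from the NFD.
                offspring[next_id] = [draw_fitness(x, rng), k + 1, 1]
                next_id += 1
            elif p in offspring:
                offspring[p][2] += 1
            else:
                offspring[p] = [x, k, 1]
        alleles = offspring
        history.append(summary(alleles, N))
    return history


if __name__ == "__main__":
    # Small illustrative run; the parameter values are arbitrary.
    run = wright_fisher(N=200, mu=1e-3, generations=200,
                        draw_fitness=nfd_non_epistatic)
    print(run[-1])
```

Averaging the four summary statistics over many independent seeds gives ensemble trajectories of the kind plotted in the figures, and changing `mu` at fixed `N` moves the sketch between the weak-mutation regime and the regime with clonal interference.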
[Figure S4 panels: rows HOC, NEPI, STH; columns mean fitness, mean number of substitutions, heterozygosity, number of alleles; horizontal axis: time, t (in generations).]

Figure S4: Dynamics of adaptation in the full Wright-Fisher model with $\mu = 10^{-3}$, $N = 10^3$. Notations as in Figure S2.

[Figure S5 panels: rows HOC, NEPI, STH; columns mean fitness, mean number of substitutions, heterozygosity, number of alleles; horizontal axis: time, t (in generations).]

Figure S5: Dynamics of adaptation in the full Wright-Fisher model with $\mu = 10^{-2}$, $N = 10^3$. Notations as in Figure S2.

[Figure S6 panels: rows Var(fitness), Var(# of subst); columns HOC, NEPI, STH; horizontal axis: time, t.]

Figure S6: Variance of the ensemble distribution of fitnesses (top row) and substitutions (bottom row) for classical landscapes. Notations and parameter values are as in Figure 1 in the main text.

Figures S2–S5 show the average values of these statistics across 1000 independent replicates. For $\theta = 0.01$ (Figure S2) and even for $\theta = 0.1$ (Figure S3), our theory accurately describes the dynamics of adaptation, as expected. For $\theta = 1$ (Figure S4) and $\theta = 10$ (Figure S5), the quantitative predictions of our theory are poor. Indeed, when $\theta > 1$, the population is polymorphic most of the time, as can be seen in the graphs showing the population heterozygosity and the number of coexisting alleles. Thus, in simulations with $\theta > 1$, clonal interference and piggybacking certainly occur.
Surprisingly, even in this regime the qualitative predictions of our theory still hold. In particular, we observe that, even though the curvature of the fitness and substitution trajectories depends on the mutation rate, its sign does not. In other words, landscapes that give rise to concave (convex) fitness (substitution) trajectories in the weak-mutation limit continue to give rise to concave (convex) fitness (substitution) trajectories even in the presence of clonal interference and piggybacking. This implies that we can use the weak-mutation theory to draw qualitative conclusions about the fitness landscape, even if the observed trajectories were generated under high mutation rates.

References

[S1] Gillespie, JH (1994) The causes of molecular evolution (Oxford University Press).
[S2] Orr, HA (2002) The population genetics of adaptation: The adaptation of DNA sequences. Evolution 56:1317–1330.
[S3] Crow, JF, Kimura, M (1972) An introduction to population genetics theory (Harper & Row Ltd).
[S4] Perfeito, L, Fernandes, L, Mota, C, Gordo, I (2007) Adaptive mutations in bacteria: High rate and small effects. Science 317:813–815.