Indirect Mutational Pathways to Population Fitness

Amanda Zajac
Thesis submitted in partial fulfillment for
Honors in Applied Mathematics – Biology, Sc.B.
Brown University
April 15th, 2017
Thesis Advisor: Daniel Weinreich, Ph.D.
Second Reader: Anastasios Matzavinos, Ph.D.
Table of Contents
Abstract
Introduction
Methods
Results
    Accessibility of Pathways
    Efficiency of Tradeoff
    Determining Correlation of Tradeoff Efficiency and Path Length
Discussion
Acknowledgments
References
Abstract
Selectively accessible mutational paths to a better-adapted genotype are characterized by
monotonically increasing organismal fitness. Here, we identify direct mutational
trajectories and evaluate their accessibility, and we examine the evolutionary implications
of indirect trajectories, which contain mutational reversions made possible by sign
epistasis. Analyzing empirical fitness values from 15 published datasets, we found that
indirect mutational pathways increase the number of accessible pathways leading to a
given fitness peak. We define tradeoff efficiency as the ratio of the number of accessible
indirect pathways to the number of direct pathways made inaccessible by sign epistasis.
Evaluating this metric, we found a positive correlation between the number of mutational
sites and the accessibility of indirect pathways. In this study, we use analytic and
algorithmic methods to further explore the mechanisms of drug resistance and the effect of
differing mutational pathways on the overall fitness of an organism.
Introduction
Current analysis of mutational paths to higher fitness is predominantly based on
the existence and identification of direct mutational trajectories. A mutational trajectory
is a pathway that leads from a wild-type genotype to a better-adapted genotype. A
trajectory is direct when it consists of the smallest number of steps necessary to reach the
better-adapted genotype, so the length of a direct trajectory equals the number of
mutational differences between the starting and ending genotypes. To determine their
evolutionary relevance, mutational pathways are evaluated on their accessibility to an
evolving population. A pathway is considered accessible if fitness increases
monotonically along it, that is, if every step in the trajectory increases fitness.
Weinreich et al. (2005) proposed that the genetic constraint on direct pathways
depends on the existence of sign epistasis. Sign epistasis occurs when the sign of a
mutation's effect on a given phenotype differs among genetic backgrounds, which affects
the accessibility of a given mutational trajectory (Weinreich et al., 2013). For the
purposes of this study, the relevant case is a mutation that is beneficial on some
backgrounds and deleterious on others. As a result, a population may require an indirect
mutational trajectory, composed of more steps than a direct pathway, in order to reach
higher fitness. Each of these additional backward steps is referred to as a reversion. The
potential mutational trajectories between a wild-type genotype and a higher-fitness
genotype can be represented as a hypercube, which contains all possible pathways to the
higher-fitness genotype (Fig. 1). Direct and indirect mutational trajectories are then
evaluated for accessibility on the basis of monotonically increasing fitness.
Fig. 1. [Three-bit hypercube of genotypes 000 to 111; image not reproduced.]
Figure 1: Direct and Indirect Pathways of an organism with L=3 mutational sites. Examples of potential
trajectories leading to a genotype of highest fitness in a three-bit model. The gray arrows represent all
potential pathways. Two accessible pathways exist, with green representing an accessible direct pathway
in which all mutations are forward, and black representing an accessible indirect pathway in which
mutational reversions exist.
Researchers have investigated the phenotypic impact of multiple mutation sites in
a population, specifically on drug resistance and higher fitness. Here, we focus on four
elements of this impact: mutational trajectories, subsequent increased fitness, indirect
trajectories, and tradeoff efficiency.
To study indirect mutational trajectories, we developed an algorithm that identifies
both direct and indirect mutational pathways and accounts for the effect of sign epistasis
on pathway fitness. By evaluating the accessibility of the direct and indirect pathways the
algorithm identifies, we determined the fitness effects of mutational pathways that
incorporate additional steps in the form of reversions.
In fitness landscapes that exhibit sign epistasis, we found that a fraction of direct
pathways do not satisfy the requirement of accessibility, and we therefore hypothesized a
tradeoff: the loss of accessible direct pathways is accompanied by the emergence of
indirect pathways with monotonically increasing fitness. This tradeoff is evident in two
ways: a decrease in accessible direct mutational trajectories and an increase in accessible
indirect mutational trajectories. Investigating the tradeoff efficiency, the ratio of the
number of accessible indirect mutational pathways to the number of inaccessible direct
pathways, we found a positive correlation between indirect pathway accessibility and the
number of mutational sites. We further found a negative correlation between the length of
indirect mutational trajectories and the number of accessible trajectories, indicating that
the tradeoff between direct and indirect pathways due to sign epistasis is not one-for-one.
These findings improve our understanding of the trajectories leading to genotypes of
higher fitness and of the effect of differing mutational pathways on the overall fitness of a
population.
Methods
Binary Representation of Mutational Pathways
The algorithm that outputs mutational pathways operates on binary genotypes
converted to decimal values. A wild-type organism is represented by the decimal value 0,
indicating that no mutations have occurred. Each mutation at a specific site (allele) is
represented by the flip of the corresponding bit from 0 to 1. A genotype is the full
combination of flipped and unflipped bits, which contributes to the corresponding
phenotype of the organism. In this study, the genotype in which all bits have been flipped
is assumed to have the highest fitness and to be the final genotype a population will reach.
The bit representation can be extended to any number of bits, that is, to any number of
loci that contribute to genotypic change. For algorithm development, all binary numbers
were converted to their decimal equivalents, which simplifies computation in code. Each
decimal number corresponds to a unique genotype of the organism with a corresponding
fitness value.
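As an illustration of this encoding, the following Python sketch converts between binary genotype strings and decimal indices and performs a single mutation or reversion as a bit flip. It assumes that bit i, counted from the least significant bit, stands for mutational site i; the helper names (to_decimal, to_binary, flip) are hypothetical and are not taken from the thesis code, which was written in MATLAB and Julia.

```python
def to_decimal(genotype: str) -> int:
    """Convert a binary genotype string such as '011' to its decimal index."""
    return int(genotype, 2)


def to_binary(index: int, L: int) -> str:
    """Convert a decimal index back to an L-bit genotype string."""
    return format(index, f"0{L}b")


def flip(index: int, site: int) -> int:
    """Mutate (0 -> 1) or revert (1 -> 0) one site by toggling its bit."""
    return index ^ (1 << site)


L = 3
wild_type = 0              # 000: no mutations have occurred
final_genotype = 2**L - 1  # 111: every site mutated

print(to_binary(wild_type, L), to_binary(final_genotype, L))  # 000 111
print(to_decimal("101"))                                      # 5
print(to_binary(flip(5, 1), L))                               # 111
```

Because a flip is its own inverse, applying flip twice at the same site undoes a mutation, which is exactly a reversion.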
Modeling Evolution and Mutation
To model the evolution of a population, the binary system follows two rules. First,
each step of a pathway changes exactly one bit: a single flip from 0 to 1 in a direct
pathway, or a single flip from 0 to 1 or from 1 to 0 in an indirect pathway. Second, no
genotype, or combination of alleles, may be visited more than once in a given pathway.
In the direct mutational pathway system, bits may only be switched from 0 to 1,
so the number of steps in a pathway equals the number of bits, or mutational sites,
representing the organism. For example, in an organism with three sites, each represented
by a bit, a pathway from the wild type 000 to the final genotype 111 consists of three
steps. The genotypes and the single-bit steps between them form a hypercube, as shown in
Figure 2a, and a direct pathway follows only forward arrows in the hypercube. Because
each of the L sites is mutated exactly once, each direct pathway corresponds to one
ordering of the sites, so the number of direct pathways is L!, where L is the number of bits
representing the genotype of the organism.
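The counting argument can be checked directly. This Python sketch, an illustration rather than the thesis algorithm, builds one direct pathway for each ordering of the L sites using itertools.permutations and confirms that there are L! of them for small L; bit i is again assumed to represent site i, and the function name is hypothetical.

```python
from itertools import permutations
from math import factorial
from typing import List


def direct_pathways(L: int) -> List[List[int]]:
    """List every direct pathway from 0 to 2**L - 1 as a sequence of genotypes.

    Each ordering of the L sites gives one pathway: starting from the wild
    type, the sites are mutated (bit set to 1) in that order.
    """
    pathways = []
    for order in permutations(range(L)):
        genotype = 0
        path = [genotype]
        for site in order:
            genotype |= 1 << site  # forward mutation only
            path.append(genotype)
        pathways.append(path)
    return pathways


for L in range(1, 6):
    assert len(direct_pathways(L)) == factorial(L)

print(direct_pathways(2))  # [[0, 1, 3], [0, 2, 3]]
```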
In the indirect mutational pathway system, bits can be switched from either 0 to 1
or 1 to 0, the latter referred to as a reversion. The same rules as in the direct pathway
system apply: only one mutation, or bit flip, can occur in a step, and no genotype can be
visited twice. In the indirect pathway hypercube, either the forward or the backward arrow
along an edge can be followed, provided these rules are met, as shown in Figure 2b. For
both systems, the hypercube has a number of edges equal to

$$ L \cdot 2^{L-1}. $$

The length of an indirect pathway depends on the number of reversions it contains.
Because the final genotype has every bit set to 1, each reversion must later be undone by a
second forward mutation at the same site, so each reversion adds two steps to the
pathway. Pathway lengths for an L-bit system include L, L+2, ..., L+2(L-1), where length
L corresponds to the direct pathways; the total number of direct and indirect pathways
depends on the number of mutational sites, L, of the organism.
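The edge count follows from a short counting argument: each of the $2^L$ genotypes has L single-mutation neighbors, and each edge joins two genotypes, giving $L \cdot 2^{L-1}$ edges. A brute-force Python check for small L, using a hypothetical helper hypercube_edges:

```python
from typing import Set, Tuple


def hypercube_edges(L: int) -> Set[Tuple[int, int]]:
    """Return every undirected edge (u, v), u < v, between genotypes one bit apart."""
    edges = set()
    for u in range(2**L):
        for site in range(L):
            v = u ^ (1 << site)  # neighbor differing only at this site
            edges.add((min(u, v), max(u, v)))
    return edges


for L in range(1, 9):
    assert len(hypercube_edges(L)) == L * 2**(L - 1)

print(len(hypercube_edges(3)))  # 12
```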
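Figure 2a. [Three-bit hypercube of genotypes 000 to 111 with direct pathways highlighted; image not reproduced.]
Figure 2b. [Three-bit hypercube of genotypes 000 to 111 with indirect pathways highlighted; image not reproduced.]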
Figure 2: Direct and Indirect Pathways of an organism with L=3 mutational sites. Examples of potential
mutational trajectories leading to a genotype of highest fitness in a three-bit model. The gray arrows
represent all potential direct mutational trajectories. (a) Examples of direct mutational pathways leading
to higher fitness. Two accessible direct pathways exist, represented in green and purple, in which all
mutations are forward and the number of steps is equal to L. (b) Examples of accessible indirect mutational
trajectories leading to higher fitness. Two accessible indirect pathways are shown, represented by the red
and blue arrows, in which mutational reversions exist. The pathways consist of 5 steps.
Algorithm Development: Direct Pathways
The algorithm was implemented in a combination of MATLAB and Julia. The first
algorithm outputs all direct mutational trajectories and is based on a tree of binary
genotypes converted to decimal values. Each layer of branches in the tree corresponds to
the possible next steps, or bit flips, of the binary representation of the organism's
genotype. The function takes two inputs: bits, the number of bits in the system or the
number of loci of the organism, and path, the partial pathway the algorithm has
determined so far. The initial value of path is the vector [0], representing the wild-type
genotype as the starting point.
The algorithm is recursive: each candidate step is checked against the path passed
into the current call. Each new step, labeled nextstep, is generated from the most recent
genotype in path, labeled current. In the direct pathway, one of $2^0, 2^1, \ldots, 2^{L-1}$
is added to current to produce a possible nextstep; adding $2^i$ flips bit i from 0 to 1 only
if that bit is currently 0, so only those additions are valid forward mutations. Each
nextstep is checked against the values already in path to determine whether that genotype
has already been visited. If so, the branch is discarded and the algorithm backtracks to the
most recently visited step. If nextstep is not present in path, it is appended to path, and
the extended vector becomes the input of the next recursive call. The process stops when
nextstep equals $2^L - 1$, the decimal value of the final genotype, in which all bits are
flipped. The final output of the algorithm is all pathways that satisfy three criteria: bits
may only be flipped from 0 to 1, no genotype may be visited twice, and no value may
exceed the final genotype, $2^L - 1$.
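The recursion described above can be sketched in Python as follows. This is a reconstruction from the prose, not the original MATLAB/Julia code; the names bits, path, current, and nextstep follow the text, while the explicit check that bit i is 0 before adding $2^i$ is an assumption that encodes the rule that bits only flip from 0 to 1.

```python
from typing import List, Optional


def direct_paths(bits: int, path: Optional[List[int]] = None) -> List[List[int]]:
    """Recursively enumerate direct pathways from the wild type to 2**bits - 1."""
    if path is None:
        path = [0]                    # start at the wild-type genotype
    current = path[-1]
    final = 2**bits - 1
    if current == final:              # all bits flipped: pathway complete
        return [path]
    found = []
    for i in range(bits):
        if current & (1 << i):        # bit i already 1; adding 2**i would carry
            continue
        nextstep = current + 2**i     # flip bit i from 0 to 1
        # These two checks mirror the stated criteria; with forward-only
        # steps they never trigger, but they keep the logic explicit.
        if nextstep in path or nextstep > final:
            continue
        found.extend(direct_paths(bits, path + [nextstep]))
    return found


print(direct_paths(2))       # [[0, 1, 3], [0, 2, 3]]
print(len(direct_paths(4)))  # 24
```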
Algorithm Development: Indirect Pathways
The indirect pathway algorithm required additional rules for path determination. The
inputs bits and path of the original algorithm are kept. The tree structure, however, is
extended to include the subtraction of $2^0, 2^1, \ldots, 2^{L-1}$, so that each branch
reflects the addition or subtraction of one of these values from the previous genotype;
subtracting $2^i$ is a reversion of bit i from 1 to 0 and is valid only when that bit is 1. The
stipulation that no value may be revisited remains. The resulting algorithm is otherwise
the same as the direct pathway algorithm. Its output is not limited to paths of L steps and
includes paths of L, L+2, ..., L+2(L-1) steps. Checking nextstep against path, and the
subsequent recursion, proceed as described above. The final output of the algorithm is all
pathways that satisfy the criteria: bits may be flipped from 0 to 1 or from 1 to 0
(addition or subtraction), no genotype may be visited twice, and no value may exceed the
final genotype, $2^L - 1$ (Fig. 3).
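Extending the previous idea, the Python function below allows either the addition or the subtraction of $2^i$, depending on whether bit i is currently 0 or 1, and then tallies pathways by how many steps they exceed the direct length L. As before, this is a hypothetical reconstruction from the description rather than the thesis code, and it enumerates every pathway without applying the fitness criterion.

```python
from collections import Counter
from typing import List, Optional


def all_paths(bits: int, path: Optional[List[int]] = None) -> List[List[int]]:
    """Enumerate direct and indirect pathways from 0 to 2**bits - 1.

    Each step adds 2**i (bit i: 0 -> 1, forward mutation) or subtracts 2**i
    (bit i: 1 -> 0, reversion); no genotype may be revisited.
    """
    if path is None:
        path = [0]
    current = path[-1]
    final = 2**bits - 1
    if current == final:
        return [path]
    found = []
    for i in range(bits):
        if current & (1 << i):
            nextstep = current - 2**i  # reversion
        else:
            nextstep = current + 2**i  # forward mutation
        if nextstep in path:           # genotype already visited: discard branch
            continue
        found.extend(all_paths(bits, path + [nextstep]))
    return found


L = 3
extra_steps = Counter(len(p) - 1 - L for p in all_paths(L))
print(sorted(extra_steps.items()))  # [(0, 6), (2, 6), (4, 6)]
```

For L = 3 this gives six pathways each of length 3 (direct), 5, and 7, and every extra length is even, as expected from each reversion adding two steps.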
Fig. 3.
Figure 3: Pseudocode of algorithm outputting all accessible direct and indirect pathways for a given
dataset.
Data Sources
The fitness datasets used to determine pathway accessibility were gathered from 15
published studies of higher-order epistasis (Weinreich et al., 2013). The datasets range in
size from L = 3 bits to L = 9 bits.
Computing Selective Accessibility
To determine the accessibility of a given pathway, the imported fitness values were
associated with their corresponding genotypes, which ranks the genotypes by fitness.
Using DataFrames, the algorithm imports the fitness values into a vector called
fitness_data. At each step of a pathway, the algorithm uses the decimal value of a
genotype as an index into fitness_data to look up its fitness, and it compares the fitness
of current with the fitness of nextstep. If the fitness of nextstep is greater than the fitness
of current, nextstep is added to path and the recursion continues; otherwise, the path is
discarded. A mutational pathway is printed only if its fitness increases monotonically
along the entire path.
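A Python sketch of this accessibility test, folded into the recursion so that a branch is pruned as soon as fitness fails to increase, is given below. It rests on several assumptions: fitness_data is a plain list indexed from 0 by the decimal genotype (Julia arrays, by contrast, are indexed from 1), the direct flag is a hypothetical switch between the two pathway systems, and the three-bit fitness values are invented to exhibit sign epistasis (mutating site 0 is beneficial on 000 but deleterious on 010).

```python
from typing import List, Optional, Sequence


def accessible_paths(fitness_data: Sequence[float], bits: int, direct: bool = False,
                     path: Optional[List[int]] = None) -> List[List[int]]:
    """Enumerate pathways from 0 to 2**bits - 1 whose fitness rises at every step."""
    if path is None:
        path = [0]
    current = path[-1]
    if current == 2**bits - 1:
        return [path]
    found = []
    for i in range(bits):
        if current & (1 << i):
            if direct:
                continue               # direct pathways allow no reversions
            nextstep = current - 2**i  # reversion
        else:
            nextstep = current + 2**i  # forward mutation
        # Keep the branch only if fitness strictly increases. A strictly
        # increasing path can never return to a genotype it already visited,
        # so the no-revisit rule is satisfied automatically.
        if fitness_data[nextstep] <= fitness_data[current]:
            continue
        found.extend(accessible_paths(fitness_data, bits, direct, path + [nextstep]))
    return found


# Hypothetical fitness values for genotypes 000, 001, ..., 111 (indices 0-7).
fitness_data = [0.0, 1.0, 3.0, 2.0, -1.0, -1.0, 4.0, 5.0]
print(accessible_paths(fitness_data, 3, direct=True))  # [[0, 1, 3, 7], [0, 2, 6, 7]]
print(accessible_paths(fitness_data, 3))
# [[0, 1, 3, 2, 6, 7], [0, 1, 3, 7], [0, 2, 6, 7]]
```

With these values the indirect pathway 000, 001, 011, 010, 110, 111 becomes accessible through the beneficial reversion at site 0.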
In several datasets included in this study, the wild-type genotype has the highest
fitness and subsequent mutations are deleterious. In these instances, we considered the
dataset in reverse in order to determine pathway accessibility, defined as monotonically
increasing fitness. When importing the data using DataFrames, the vector of fitness
values was reversed in order to establish the wild-type genotype as the lowest-fitness
genotype, as the algorithm requires. Forward mutations were therefore considered
beneficial. This made usable datasets that would otherwise have been unusable for this
study.
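Reversing the vector has a simple reading in the binary encoding: position g of the reversed vector holds the fitness of genotype $2^L - 1 - g$, and subtracting g from the all-ones value $2^L - 1$ flips every bit of g. Reversal therefore relabels every site, so the former wild type becomes the all-mutant genotype and vice versa. A brief Python check with invented fitness values (the study itself reversed the vector when importing the data with DataFrames):

```python
L = 3
mask = 2**L - 1  # all-ones genotype 111

# Hypothetical fitness values where the wild type (index 0) is the fittest genotype.
fitness_data = [9.0, 7.5, 6.0, 4.0, 8.0, 5.0, 3.0, 1.0]
reversed_fitness = fitness_data[::-1]

for g in range(2**L):
    # Entry g of the reversed vector is the fitness of g with every bit flipped.
    assert reversed_fitness[g] == fitness_data[g ^ mask]

print(reversed_fitness[0], reversed_fitness[mask])  # 1.0 9.0
```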
Determining Correlation of Tradeoff Efficiency and Mutational Sites
The algorithm was first used to determine the number of accessible direct
mutational trajectories, whose length equals L, the number of mutational differences
between the starting and ending genotypes. The fraction of direct pathways that remain
accessible was then calculated as the number of accessible direct mutational trajectories
divided by L!, the total number of direct mutational trajectories in the absence of sign
epistasis. These fractions were plotted against the number of mutational sites in each
dataset, L, and regression analysis was used to determine the associated correlation
coefficient. Tradeoff efficiency was then analyzed for indirect mutational trajectories,
which were categorized by their number of additional steps relative to the direct
mutational pathway, in increments of 2. For indirect mutational pathways, tradeoff
efficiency is the ratio of the number of accessible indirect mutational pathways of a given
length to the number of inaccessible direct pathways, that is, L! minus the number of
accessible direct pathways. These ratios were log transformed and plotted against the
number of mutational sites, L, so the x-axis takes the values L = 3, 4, 5, 6, and 9.
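Written out, with $A_d$ the number of accessible direct pathways and $A_{+k}$ the number of accessible indirect pathways that are k steps longer than L, the direct fraction is $A_d / L!$ and the tradeoff efficiency is $A_{+k} / (L! - A_d)$. The Python sketch below computes both. The function names are hypothetical; the natural logarithm is an assumption because the base of the log transform is not stated; a zero ratio is mapped to -∞; and a dataset with no lost direct pathways is given an undefined (NaN) ratio, which is a choice of this sketch rather than of the study. The direct-pathway count comes from Table 1; the indirect count is invented for illustration.

```python
import math


def direct_fraction(accessible_direct: int, L: int) -> float:
    """Fraction of the L! direct pathways that remain accessible."""
    return accessible_direct / math.factorial(L)


def tradeoff_efficiency(accessible_indirect: int, accessible_direct: int, L: int) -> float:
    """Accessible indirect pathways of one length per inaccessible direct pathway."""
    lost_direct = math.factorial(L) - accessible_direct
    if lost_direct == 0:
        return float("nan")  # no direct pathways were lost, so the ratio is undefined
    return accessible_indirect / lost_direct


def log_ratio(ratio: float) -> float:
    """Natural-log transform (base assumed); zero maps to -inf, NaN stays NaN."""
    if ratio > 0:
        return math.log(ratio)
    if ratio == 0:
        return float("-inf")
    return float("nan")


# Whitlock Walsh 2000 (Table 1): L = 5 with 27 accessible direct pathways.
print(direct_fraction(27, 5))  # 0.225
# A hypothetical count of 10 accessible indirect pathways of one length;
# the ratio is 10 / 93, about 0.108.
print(tradeoff_efficiency(10, 27, 5))
print(log_ratio(tradeoff_efficiency(10, 27, 5)))
```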
Determining Correlation of Tradeoff Efficiency and Path Length
The algorithm was then used to count accessible mutational trajectories, with
indirect trajectories categorized by their number of additional steps, in increments of 2,
and separated by the number of mutational sites, L. Tradeoff efficiency was again
computed as the ratio of the number of accessible indirect pathways of a given length to
the number of inaccessible direct pathways, L! minus the number of accessible direct
pathways; in other words, it is the number of accessible indirect trajectories per direct
trajectory made inaccessible by sign epistasis. These ratios were log transformed and
plotted against the increase in trajectory length relative to the direct pathway: Indirect +2,
Indirect +4, and Indirect +6, giving x-axis values of 2, 4, and 6.
To establish the correlation between pathway length and the tradeoff efficiency
ratio, the data were normalized by the number of bits in each dataset. This produced a
single plot for analysis, displaying results for L = 3, 4, 5, 6, and 9.
Results
Accessibility of Pathways
The existence of accessible indirect pathways, which increase monotonically in
fitness despite containing reversions, indicates the presence of sign epistasis in the fitness
landscape. For L mutational sites, there exist L! direct mutational pathways leading to a
given fitness peak. Sign epistasis, while restricting the accessibility of these direct
pathways, can increase the accessibility of indirect mutational pathways.
A pathway was defined as accessible if fitness increases monotonically along it. All
potential pathways were enumerated by an algorithm that outputs both direct and indirect
trajectories from the wild type to the genotype of highest fitness. Direct trajectories are
those composed of the smallest number of steps from a low-fitness to the highest-fitness
genotype, that is, the number of mutational differences between the starting and ending
genotypes. An indirect pathway arises when the fixation of one mutation causes a
previously beneficial mutation to become deleterious. Because of this reversal of fitness
effect, a consequence of sign epistasis, reverting the formerly beneficial mutation now
increases fitness. As a result, a population may require more steps in a mutational
trajectory to reach higher fitness than a direct pathway would take. If indirect pathways are
not considered, the count of accessible pathways does not reflect the pathways actually
accessible in the fitness landscape the observed organism occupies. Indirect pathways can
contribute to the overall adaptive capabilities of a population (Palmer, 2015). In both
direct and indirect pathways, no genotype may be visited more than once, and the
genotype in which all sites are mutated is taken to be the fitness peak of the landscape.
Weinreich et al. (2005) characterize sign epistasis as a constraint on natural
selection because it reduces the number of accessible mutational trajectories. We applied
this accessibility criterion to the 15 datasets, whose fitness landscapes are shaped by sign
epistasis. Pooled across the 15 datasets, accessible direct mutational trajectories, in which
every step increases fitness, accounted for 1.87% of the number of direct trajectories
expected in a landscape without sign epistasis. By construction, every step of each
accessible pathway identified by the algorithm is beneficial to the organism. Direct
mutational trajectories accounted for approximately 53.4% of all accessible mutational
pathways.
For each study, Table 1 gives the number of accessible direct pathways divided by
L!, the total number of direct pathways. This ratio is the fraction of direct pathways that
remain accessible in the presence of sign epistasis; its complement is the fraction that lose
accessibility because of sign epistasis. The fraction of direct mutational trajectories that
are selectively accessible is plotted against the number of mutational sites in a given
dataset, L (Fig. 4).
Table 1.
Dataset               Size (L)   Accessible direct paths   Fraction of direct trajectories selectively accessible
Brown 2010            3          6                         1.000000
Malcom 1990           3          2                         0.333333
Constanzo 2011        3          3                         0.500000
Chou 2011             4          24                        1.000000
Lozovsky 2009         4          2                         0.083333
Khan 2011             5          86                        0.716667
Tan 2011              5          5                         0.041667
Weinreich 2006        5          9                         0.075000
Whitlock Walsh 2000   5          27                        0.225000
daSilva 2010          5          17                        0.141667
deVisser 2009         5          25                        0.208333
O'Maille 2008         6          27                        0.037500
Hall 2010 Diploid     6          40                        0.055556
Hall 2010 Haploid     6          7                         0.009722
Lunzer 2005           9          6564                      0.018089
Table 1: The number of accessible direct pathways and their ratio to the total number of direct pathways, L!,
for each dataset. The dataset name, number of mutational sites, L, number of accessible direct pathways, and
direct pathway ratio are given for each dataset.
Fig. 4. [Scatter plot not reproduced. x-axis: Number of Mutational Sites, L; y-axis: Direct Pathways/L!; fitted line y = -0.1299x + 0.9373, R² = 0.3288.]
Figure 4: Fraction of direct mutational trajectories that are selectively accessible versus number of
mutational sites: The ratio of accessible direct pathways to the total number of direct pathways, L!, plotted
against the total number of mutational sites in a given dataset, L. The plot is fitted with a regression line,
with R² = 0.32883 and r = -0.57343.
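The regression in Figure 4 can be recomputed from Table 1. The Python sketch below fits an ordinary least-squares line to the fifteen (L, fraction) pairs, using a small hand-written linear_fit helper (hypothetical; any standard regression routine would do) so that it has no dependencies. With the rounded values of Table 1 it recovers the reported slope, intercept, R², and r to the displayed precision.

```python
from typing import List, Tuple

# (L, fraction of direct trajectories that are selectively accessible) from Table 1
table1: List[Tuple[int, float]] = [
    (3, 1.000000), (3, 0.333333), (3, 0.500000),
    (4, 1.000000), (4, 0.083333),
    (5, 0.716667), (5, 0.041667), (5, 0.075000),
    (5, 0.225000), (5, 0.141667), (5, 0.208333),
    (6, 0.037500), (6, 0.055556), (6, 0.009722),
    (9, 0.018089),
]


def linear_fit(points: List[Tuple[int, float]]) -> Tuple[float, float, float, float]:
    """Ordinary least squares y = slope * x + intercept; also returns r and R^2."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5  # Pearson correlation coefficient
    return slope, intercept, r, r * r


slope, intercept, r, r_squared = linear_fit(table1)
print(f"y = {slope:.4f}x + {intercept:.4f}, R^2 = {r_squared:.4f}, r = {r:.4f}")
# y = -0.1299x + 0.9373, R^2 = 0.3288, r = -0.5734
```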
Efficiency of Tradeoff
In the presence of sign epistasis, some direct pathways become functionally
inaccessible. We found that the number of accessible indirect pathways equals 1.66% of
the number of direct pathways made inaccessible by sign epistasis. Nevertheless, indirect
mutational trajectories account for 46.6% of all accessible pathways, so including them
nearly doubles the number of accessible pathways leading to a given fitness peak.
We refer to the increased accessibility of indirect trajectories due to sign epistasis as
the evolutionary tradeoff. To measure this tradeoff, we determined the number of
accessible indirect pathways of each length: the direct pathway has L steps (the number of
sites), and indirect pathways have lengths L+2, ..., L+2(L-1), as illustrated in Figure 5 for
a hypercube with L = 4 mutational sites. These counts were then divided by the number of
lost direct pathways, L! minus the number of accessible direct pathways. We refer to this
value as the 'tradeoff efficiency': the ratio of the number of accessible indirect mutational
pathways to the number of direct pathways made inaccessible by sign epistasis.
Fig. 5.
Figure 5: Direct and Indirect Pathways of an organism with L=4 mutational sites. Examples of potential
trajectories leading to a genotype of highest fitness in a four-bit model. The gray arrows represent all
potential pathways. Two accessible pathways exist, with green representing an accessible direct pathway
in which all mutations are forward, and black representing an accessible indirect pathway in which
mutational reversions exist. Figure from Zagorski et al. 2016.
For the purposes of our study, the most relevant tradeoff efficiencies are those based
on the total number of accessible indirect mutational trajectories. We separated tradeoff
efficiency ratios by the length of the indirect trajectory relative to the direct trajectory and
plotted these values against the total number of mutational sites, L. Tradeoff efficiency
was negatively correlated with indirect pathway length, with the largest values at
Indirect +2.
Regression analysis further distinguished this trend among datasets with differing
numbers of mutational sites. Independent of path length, organisms with L = 3 and L = 4
mutational sites have lower tradeoff efficiency ratios than organisms with L = 5 and L = 6,
as shown in Figure 6. This suggests that as the number of mutational sites in an organism
increases, indirect pathways contribute more accessibility.
Table 2.
Dataset               Size (L)   Indirect +2   Indirect +4   Indirect +6   Indirect +8
Brown 2010            3          0             0             -             -
Malcom 1990           3          0             0             -             -
Constanzo 2011        3          0.333         0             -             -
Chou 2011             4          0             0             0             0
Lozovsky 2009         4          0             0             0             0
Khan 2011             5          1.853         0.176         0             0
Tan 2011              5          0.017         0             0             0
Weinreich 2006        5          0.063         0.018         0             0
Whitlock Walsh 2000   5          0.376         0.183         0.022         0
daSilva 2010          5          0.078         0             0             0
deVisser 2009         5          0             0             0             0
O'Maille 2008         6          0.017         0             0             0
Hall 2010 Diploid     6          0.041         0.037         0.018         0
Hall 2010 Haploid     6          0.003         0             0             0
Lunzer 2005           9          0.016         0             0             0
Table 2: Tradeoff efficiency ratio for each dataset: the number of accessible indirect pathways of a given length
(direct pathway length plus 2, 4, 6, or 8 steps) divided by the number of inaccessible direct pathways, that is,
L! minus the number of accessible direct pathways. The dataset name and number of mutational sites, L, are given
for each dataset; a dash marks lengths for which no value is reported.
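Figure 6. [Plot not reproduced. The y-axis includes an axis break leading to a tick labeled -∞, where ratios of zero fall after log transformation.]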
Figure 6: Log-transformed tradeoff efficiency ratio of indirect trajectories versus number of mutational sites:
The log transform of the tradeoff efficiency ratio, the number of accessible indirect pathways of a given length
divided by the number of inaccessible direct pathways (L! minus the number of accessible direct pathways),
plotted against the total number of mutational sites in a given dataset, L. The indirect pathways of interest are
those whose length equals the direct pathway length plus 2, plus 4, and plus 6.
Determining Correlation of Tradeoff Efficiency and Path Length
We also examined overall path length, treated as a variable independent of the
number of mutational sites, L, as a factor affecting tradeoff efficiency. Using tradeoff
efficiency as an indicator of the viability of indirect mutational pathways, we determined
the correlation between the increase in path length over the direct pathway and the
resulting tradeoff efficiency ratio. To standardize the data, we tabulated the added lengths
of the indirect paths: lengths were expressed as the increase over the direct pathway
length, so that indirect paths have added lengths 2, 4, ..., 2(Lmax-1). Indirect paths more
than 8 steps longer than the direct trajectory contributed less than 1% of all selectively
accessible trajectories. Tradeoff efficiency was again computed as the number of
accessible indirect pathways divided by L! minus the number of accessible direct
pathways, where L! is the total number of direct pathways without the influence of sign
epistasis. Tradeoff efficiency was negatively correlated with the increase in pathway
length (Fig. 7), indicating that the number of accessible indirect pathways decreases as
mutational reversions lengthen the trajectory.
Fig. 7.
Figure 7: Log-transformed tradeoff efficiency ratio versus increase in indirect pathway length: The log
transformation of the ratio of the number of accessible indirect pathways of a given length to the number of
inaccessible direct pathways (L! minus the number of accessible direct pathways), referred to as the tradeoff
efficiency ratio, plotted against the increase in length over the direct pathway. The indirect pathways of
interest are those 2, 4, and 6 steps longer than the direct pathway.
Discussion
From the analysis in this study, we quantified the extent to which sign epistasis
restricts direct mutational trajectories and opens indirect mutational trajectories leading to
increased fitness, or drug resistance. While previous studies focus on the evolutionary
value of direct trajectories, our results indicate that indirect pathways should also be
considered when evaluating the mutational trajectories leading to higher fitness.
In the presence of sign epistasis, the efficiency with which indirect mutational
pathways provide increased accessibility rises as the number of mutational sites increases.
This relationship depends on the length of the indirect mutational pathway, however, as
the tradeoff efficiency of indirect pathways is negatively correlated with increases in
mutational trajectory length. Moreover, accessible indirect pathways number only 1.66%
of the direct pathways made inaccessible by sign epistasis, so the increased accessibility of
indirect trajectories is not sufficient to replace the lost accessible direct mutational
pathways.
The analysis conducted in this study assumes a binary model of selectively accessible
trajectories toward higher fitness. This model is simplified in two respects. First, each
mutational site is assumed to be binary: the mutation is either present or absent. A more
complex model could allow more than two states per site, for example 4, representing the
nucleotides A, C, G, and T, or 20, representing the amino acids, as done in other studies
(Zagorski et al., 2016). Second, in evaluating the tradeoff efficiency ratio of indirect
pathways, the denominator is based on the number of inaccessible direct pathways. To
improve our understanding of tradeoff efficiency, the algorithm could be extended to
account for the number of potentially accessible indirect mutational trajectories of each
length beyond that of the direct pathways: L+2, L+4, …, L+2*(Lmax-1). However,
datasets with more than five mutational sites (L > 5) required substantial computational
power, and because the mutational trajectory algorithm is recursive, these cases could not
be computed in a manageable timeframe for the purposes of this study.
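Because fitness must strictly increase along an accessible path, whether a partial path can be extended depends only on the genotype it currently ends at, not on how that genotype was reached. One possible way to avoid re-exploring shared subpaths, offered here as an assumption rather than as the algorithm used in this study, is a forward dynamic program that tracks, for each path length, how many accessible partial paths end at each genotype. Its cost grows with the number of genotypes, 2^L, and the path lengths considered, rather than with the number of individual paths. The function name `counts_by_length` and the toy fitness values are hypothetical.

```python
from collections import defaultdict


def counts_by_length(fitness, L):
    """Return {length: count} of fitness-increasing paths from the all-zero
    to the all-one genotype for length = L, L+2, L+4, ...

    frontier[g] holds the number of strictly increasing partial paths of the
    current length that start at genotype 0 and end at genotype g. Extending
    a partial path only requires a fitter neighbor of g, so paths that share
    an endpoint can be extended together instead of one at a time."""
    target = (1 << L) - 1
    frontier = {0: 1}
    counts = {}
    length = 0
    while frontier:  # empties once no partial path can climb any further
        # Only lengths with the same parity as L can end at the target.
        if length >= L and (length - L) % 2 == 0:
            counts[length] = frontier.get(target, 0)
        extended = defaultdict(int)
        for g, ways in frontier.items():
            for i in range(L):
                n = g ^ (1 << i)  # flip site i (forward mutation or reversion)
                if fitness[n] > fitness[g]:
                    extended[n] += ways
        frontier = extended
        length += 1
    return counts


# Toy L = 3 landscape (hypothetical values), indexed by genotype 0..7.
toy = [1.0, 1.1, 1.3, 1.2, 0.9, 0.95, 1.4, 1.5]
print(counts_by_length(toy, 3))  # {3: 2, 5: 1}
```

Lengths longer than the longest strictly increasing chain in the landscape do not appear in the output, because no accessible path of that length exists.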
The occurrence of sign epistasis could be examined further by computing the sign
epistatic density of each fitness landscape (hypercube) as a function of the number of
mutational sites, L. Sign epistatic density is defined as the fraction of cases in which a
mutation that is normally beneficial is instead deleterious on a particular genetic
background (Weinreich et al., 2017). In the context of this study, sign epistatic density
would be calculated for each dataset individually in order to characterize the epistatic
environment of that dataset (Weinreich et al., 2006). This density, together with the
fraction of indirect pathways of a given length that are accessible, could then be used to
test for a correlation between the incidence of sign epistasis and the accessibility of direct
and indirect trajectories.
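As a sketch of how sign epistatic density might be computed for a single binary dataset, the following Python function counts, for every site whose mutation is beneficial on the ancestral (all-zero) background, the genetic backgrounds on which that same mutation is deleterious. Reading "normally beneficial" as "beneficial on the ancestral background" is an assumption made for illustration; the function name and toy fitness values are hypothetical.

```python
def sign_epistatic_density(fitness, L):
    """Fraction of (site, background) cases in which a normally beneficial
    mutation is deleterious.

    Assumption for illustration: a site's mutation counts as 'normally
    beneficial' if it raises fitness on the ancestral (all-zero) background.
    Every background lacking that mutation is then one case."""
    cases = 0
    deleterious = 0
    for i in range(L):
        bit = 1 << i
        if fitness[bit] <= fitness[0]:
            continue  # not beneficial on the ancestral background; skip site
        for g in range(1 << L):
            if g & bit:
                continue  # background must not already carry mutation i
            cases += 1
            if fitness[g | bit] < fitness[g]:
                deleterious += 1  # sign of the effect is reversed here
    return deleterious / cases if cases else 0.0


# Toy L = 3 landscape (hypothetical values), indexed by genotype 0..7.
toy = [1.0, 1.1, 1.3, 1.2, 0.9, 0.95, 1.4, 1.5]
print(sign_epistatic_density(toy, 3))  # 0.125
```

Here the first site's mutation is beneficial on three of its four backgrounds but deleterious on one, the second site's mutation is beneficial on all four, and the third site is excluded because its mutation is deleterious on the ancestral background, giving a density of 1/8.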
Additionally, our analysis assumes that all beneficial mutations on a given genetic
background are equally likely, which does not necessarily hold in the presence of sign
epistasis. In general, natural selection affects the likelihood that a beneficial mutation
fixes, favoring some mutations over others. The probability of a given beneficial mutation
occurring is further affected by mutational biases, such as the higher rate of transitions
than transversions. To understand the evolutionary implications of the increased
accessibility of indirect pathways, further analysis could determine the predictability of
evolution, that is, the probability of each step in a pathway occurring. Such an analysis
would more effectively elucidate the probability of organismal evolution toward drug
resistance and higher fitness, and clarify how genetic interactions expand the set of
mutational pathways through which such genotypes are ultimately reached.
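The equal-likelihood assumption can be made explicit by computing the probability that an adaptive walk follows a particular path. In the Python sketch below, the walk chooses at each step among the currently available beneficial mutations, either uniformly (the assumption described above) or, as a hypothetical alternative, with probability proportional to each mutation's fitness gain. Mutational biases such as a transition/transversion bias are not modeled. The function name and toy fitness values are hypothetical.

```python
def path_probability(fitness, L, path, weighted=False):
    """Probability that an adaptive walk follows `path`, a list of genotype
    integers, when each step is drawn from the fitter single mutants of the
    current genotype.

    weighted=False: all beneficial mutations are equally likely.
    weighted=True:  hypothetical alternative in which a step's probability is
                    proportional to its fitness gain."""
    prob = 1.0
    for g, nxt in zip(path, path[1:]):
        # Fitness gain of every beneficial single mutation from genotype g.
        gains = {}
        for i in range(L):
            n = g ^ (1 << i)
            if fitness[n] > fitness[g]:
                gains[n] = fitness[n] - fitness[g]
        if nxt not in gains:
            return 0.0  # step is not a beneficial single mutation
        if weighted:
            prob *= gains[nxt] / sum(gains.values())
        else:
            prob *= 1 / len(gains)
    return prob


# Toy L = 3 landscape (hypothetical values), indexed by genotype 0..7.
toy = [1.0, 1.1, 1.3, 1.2, 0.9, 0.95, 1.4, 1.5]
for p in ([0, 1, 3, 7], [0, 2, 6, 7], [0, 1, 3, 2, 6, 7]):
    print(p, path_probability(toy, 3, p),
          round(path_probability(toy, 3, p, weighted=True), 4))
```

Under the uniform model in this toy landscape, the three accessible paths to the fittest genotype have probabilities 0.25, 0.25, and 0.5, so the indirect path carries a quarter of the probability of reaching the peak; weighting steps by fitness gain shifts most of that probability (about 0.75) onto the direct path through genotypes 0, 2, 6, 7.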
Ultimately, our analyses demonstrate the evolutionary significance of indirect
mutational trajectories in the presence of sign epistasis. First, we found a positive
correlation between the number of mutational sites and the accessibility of indirect
mutational pathways. Second, we showed that the tradeoff efficiency of indirect
mutational trajectories decreases as mutational pathway length increases. These findings
improve our understanding of the mechanisms of drug resistance and of how differing
mutational pathways affect the overall fitness of an organism. Further experimental work
will help explain the observed gap between the accessibility gained through indirect
mutational trajectories and the accessibility lost when direct trajectories are blocked by
sign epistasis.
Acknowledgements
I would like to thank Professor Dan Weinreich for providing guidance, mentorship,
and support throughout my thesis project and undergraduate research experience. I
would also like to thank Professor Anastasios Matzavinos for serving as second reader
for my thesis and for guidance in my Applied Mathematics pursuits. Lastly, I would like
to extend my gratitude to the Weinreich Lab for their support throughout this process.
References
[1] Palmer, A.C., Toprak, E., Baym, M., Kim, S., Veres, A., Bershtein, S. and Kishony,
R., 2015. Delayed commitment to evolutionary fate in antibiotic resistance fitness
landscapes. Nature Communications, 6.
[2] Zagorski, M., Burda, Z. and Waclaw, B., 2016. Beyond the Hypercube: Evolutionary
Accessibility of Fitness Landscapes with Realistic Mutational Networks. PLOS
Computational Biology, 12(12), p.e1005218.
[3] Weinreich, D.M., Delaney, N.F., DePristo, M.A. and Hartl, D.L., 2006. Darwinian
evolution can follow only very few mutational paths to fitter proteins. Science,
312(5770), pp.111-114.
[4] Knies, J., Cai, F. and Weinreich, D.M., 2017. Enzyme efficiency but not
thermostability drives cefotaxime resistance evolution in TEM-1 β-lactamase. Molecular
Biology and Evolution.
[5] de Visser, J.A.G., Park, S.C. and Krug, J., 2009. Exploring the effect of sex on
empirical fitness landscapes. The American Naturalist, 174(S1), pp.S15-S30.
[6] Hall, D.W., Agan, M. and Pope, S.C., 2010. Fitness epistasis among 6 biosynthetic
loci in the budding yeast Saccharomyces cerevisiae. Journal of Heredity, 101(suppl 1),
pp.S75-S84.
[7] Khan, A.I., Dinh, D.M., Schneider, D., Lenski, R.E. and Cooper, T.F., 2011.
Negative epistasis between beneficial mutations in an evolving bacterial population.
Science, 332(6034), pp.1193-1196.
[8] Lozovsky, E.R., Chookajorn, T., Brown, K.M., Imwong, M., Shaw, P.J.,
Kamchonwongpaisan, S., Neafsey, D.E., Weinreich, D.M. and Hartl, D.L., 2009.
Stepwise acquisition of pyrimethamine resistance in the malaria parasite. Proceedings of
the National Academy of Sciences, 106(29), pp.12025-12030.
[9] Lunzer, M., Miller, S.P., Felsheim, R. and Dean, A.M., 2005. The biochemical
architecture of an ancient adaptive landscape. Science, 310(5747), pp.499-501.
[10] O'Maille, P.E., Malone, A., Dellas, N., Hess, B.A., Smentek, L., Sheehan, I.,
Greenhagen, B.T., Chappell, J., Manning, G. and Noel, J.P., 2008. Quantitative
exploration of the catalytic landscape separating divergent plant sesquiterpene synthases.
Nature Chemical Biology, 4(10), pp.617-623.
[11] Aita, T. and Husimi, Y., 1996. Fitness spectrum among random mutants on Mt.
Fuji-type fitness landscape. Journal of Theoretical Biology, 182(4), pp.469-485.
[12] Tan, L., Serene, S., Chao, H.X. and Gore, J., 2011. Hidden randomness between
fitness landscapes limits reverse evolution. Physical Review Letters, 106(19), p.198102.
[13] Brown, K.M., Costanzo, M.S., Xu, W., Roy, S., Lozovsky, E.R. and Hartl, D.L.,
2010. Compensatory mutations restore fitness during the evolution of dihydrofolate
reductase. Molecular Biology and Evolution, 27(12), pp.2682-2690.
[14] Whitlock, M.C. and Bourguet, D., 2000. Factors affecting the genetic load in
Drosophila: synergistic epistasis and correlations among fitness components. Evolution,
54(5), pp.1654-1660.
[15] Bridgham, J.T., Carroll, S.M. and Thornton, J.W., 2006. Evolution of hormone-receptor complexity by molecular exploitation. Science, 312(5770), pp.97-101.
[16] Chou, H.H., Chiu, H.C., Delaney, N.F., Segrè, D. and Marx, C.J., 2011.
Diminishing returns epistasis among beneficial mutations decelerates
adaptation. Science, 332(6034), pp.1190-1192.
[17] da Silva, J., Coetzer, M., Nedellec, R., Pastore, C. and Mosier, D.E., 2010. Fitness
epistasis and constraints on adaptation in a human immunodeficiency virus type 1 protein
region. Genetics, 185(1), pp.293-303.
[18] Costanzo, M.S. and Hartl, D.L., 2011. The evolutionary landscape of antifolate
resistance in Plasmodium falciparum. Journal of Genetics, 90(2), pp.187-190.