Indirect Mutational Pathways to Population Fitness

Amanda Zajac

Thesis submitted in partial fulfillment for Honors in Applied Mathematics – Biology, Sc.B.
Brown University
April 15th, 2017

Thesis Advisor: Daniel Weinreich, Ph.D.
Second Reader: Anastasios Matzavinos, Ph.D.

Table of Contents

Abstract
Introduction
Methods
Results
    Accessibility of Pathways
    Efficiency of Tradeoff
    Determining Correlation of Fitness and Path Length
Discussion
Acknowledgments
References

Abstract

Mutational paths to a better-adapted genotype are characterized as monotonically increasing in organismal fitness. Here, we identify and evaluate the accessibility of direct mutational trajectories, as well as the evolutionary implications of indirect trajectories caused by mutational reversions, as a result of sign epistasis. Through the analysis of empirical fitness values of organisms, we found that indirect mutational pathways increase the number of pathways leading to a determined fitness peak.
Tradeoff efficiency is defined as the ratio of the number of accessible indirect pathways to the total number of direct pathways rendered inaccessible by sign epistasis. In evaluating this metric, we determined that a positive correlation exists between the number of mutation sites and the accessibility of indirect pathways. In this study, we use a number of analytic and algorithmic approaches to further explore the mechanisms of drug resistance and the effect of differing mutational pathways on the overall fitness of an organism.

Introduction

Current analysis of mutational paths to higher fitness is predominantly based on the existence and identification of direct mutational trajectories. A mutational trajectory is a pathway that leads from a wild type genotype to a better-adapted genotype. The trajectory is considered direct when it consists of the fewest steps necessary to reach the better-adapted genotype. In this sense, the length of the direct mutational trajectory is equal to the number of mutational differences between the starting and ending genotypes. To determine evolutionary relevance, mutational pathways are evaluated on their accessibility as a population evolves. Accessibility of mutational pathways is determined based on the concept of monotonically increasing fitness, in which every step in a given trajectory results in an increase in fitness. Weinreich et al. (2005) postulate that the genetic constraint exhibited on direct pathways is contingent on the existence of sign epistasis. Sign epistasis is the failure of a mutation to consistently influence a given phenotype on all genetic backgrounds, which affects the accessibility of a given mutational trajectory (Weinreich et al., 2013). For the purposes of this study, the relevant effects of sign epistasis include mutations that are beneficial on some backgrounds and deleterious on others.
As a result of this situation, an indirect mutational trajectory composed of more steps than a direct pathway may be necessary in order to achieve higher fitness in a population. These additional steps, backwards in nature, are referred to as reversions. The potential mutational trajectories between a wild type and higher fitness genotype are represented in the form of a hypercube, which includes all possible pathways to the higher fitness genotype (Fig. 1). Direct and indirect mutational trajectories are then evaluated on accessibility, determined on the basis of monotonically increasing fitness.

[Figure 1: hypercube with vertices 000, 001, 010, 011, 100, 101, 110, 111]

Figure 1: Direct and Indirect Pathways of an organism with L=3 mutational sites. Examples of potential trajectories leading to a genotype of highest fitness in a three-bit model. The gray arrows represent all potential pathways. Two accessible pathways exist, with green representing an accessible direct pathway in which all mutations are forward, and black representing an accessible indirect pathway in which mutational reversions exist.

Researchers have investigated the phenotypic impact of multiple mutation sites in a population, specifically on drug resistance and higher fitness. Here, we focus on four elements of this impact: mutational trajectories, subsequent increased fitness, indirect trajectories, and tradeoff efficiency. We developed an improved understanding of indirect mutational trajectories through algorithmic development. The algorithm identifies direct and indirect mutational pathways, accounting for the effect of sign epistasis on mutational pathway fitness. By analyzing the algorithm-identified direct and indirect mutational pathways for accessibility, we determined the fitness effects of mutational pathways that incorporate additional steps in the form of reversions.
With the inclusion of sign epistasis in analyzed fitness landscapes, we found that a percentage of direct pathways do not satisfy the requirements of accessibility, and thus speculated that a trade-off exists in the emergence of indirect pathways with monotonically increasing fitness. This trade-off is evident in two ways: a decrease in accessible direct mutational trajectories and an increase in accessible indirect mutational trajectories. In further investigating the tradeoff efficiency, or the ratio of the number of accessible indirect mutational pathways to the total number of inaccessible direct pathways, we found that there exists a positive correlation between indirect pathway accessibility and number of mutational sites. We further determined that a negative correlation exists between the length of indirect mutational trajectories and the number of accessible trajectories, indicating that the trade-off between direct and indirect pathways due to sign epistasis is not one-for-one. Findings from this project provide an improved understanding of the trajectories leading to genotypes of higher fitness and the effect of differing mutational pathways on the overall fitness of a population.

Methods

Binary Representation of Mutational Pathways

The algorithm that outputs mutational pathways operates on binary values converted to decimal values. A wild type organism is represented by the decimal value 0, indicating no mutations have occurred. Each mutation occurring in the organism at a specific allele is indicated by the flip of a bit, or the change of that bit from 0 to 1. A genotype is the total combination of the flipped and unflipped bits, contributing to a corresponding phenotype of the organism. In the context of this study, the genotype in which all bits have been flipped is assumed to be of highest fitness, or the final genotype a population will reach.
The representation of the bit system can be expanded to include any number of bits, or relevant loci that contribute to genotypic change. For the purposes of algorithm development, all binary numbers were converted to their decimal equivalent, allowing for easier computation when applied in code. Each decimal number corresponds to a unique genotype of an organism with a corresponding fitness value.

Modeling Evolution and Mutation

For the purposes of modeling the evolution of a population, the binary system follows a set of rules. The first is that in each step of a produced pathway, only one bit may change at a time: one flip from 0 to 1 in a direct pathway, or one flip from 0 to 1 or from 1 to 0 in an indirect pathway. The second is that no genotype, or combination of alleles, may be visited more than once in a given pathway.

In the direct mutational pathway system, bits may only be switched from 0 to 1, so the number of steps in the pathway equals the number of bits, or mutational sites, representing the organism. For example, in an organism with three alleles, each represented by a bit, a pathway from wild type 000 to the final genotype 111 consists of three steps. This results in the creation of a hypercube, as shown in Figure 2a. The direct pathway requires that only forward arrows be followed in the hypercube. The resultant number of available direct pathways is therefore equivalent to L!, where L is the number of bits representing the genotype of the organism.

In the indirect mutational pathway system, bits can be switched from either 0 to 1 or 1 to 0, the latter referred to as a reversion. The same rules of the direct pathway system apply, in which only one mutation, or bit flip, can occur in a step and no genotype can be visited twice.
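The encoding and step rules described above can be illustrated with a short Python sketch. It is not the thesis code (which was written in MATLAB and Julia); the function names are hypothetical and chosen only for illustration. A genotype is stored as the decimal value of its bit string, a mutation is a single bit flip, and counting forward-only paths recovers L! for small L.

```python
def genotype_to_decimal(bits: str) -> int:
    """Convert a bit string such as '101' to its decimal genotype label."""
    return int(bits, 2)


def decimal_to_genotype(value: int, L: int) -> str:
    """Convert a decimal genotype label back to an L-bit string."""
    return format(value, f"0{L}b")


def forward_neighbors(genotype: int, L: int) -> list[int]:
    """Genotypes reachable by flipping exactly one bit from 0 to 1."""
    return [genotype | (1 << i) for i in range(L) if not genotype & (1 << i)]


def all_neighbors(genotype: int, L: int) -> list[int]:
    """Genotypes reachable by flipping exactly one bit in either direction."""
    return [genotype ^ (1 << i) for i in range(L)]


def count_direct_paths(L: int, genotype: int = 0) -> int:
    """Count forward-only paths from `genotype` to the all-ones genotype."""
    final = (1 << L) - 1
    if genotype == final:
        return 1
    return sum(count_direct_paths(L, nxt) for nxt in forward_neighbors(genotype, L))
```

For instance, `forward_neighbors(0, 3)` lists the three single mutants of the wild type, and `count_direct_paths(3)` counts the direct paths of the three-bit model.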
In the context of the indirect pathway hypercube, either the forwards or backwards arrow can be followed, provided the previously stated stipulations are met, as shown in Figure 2b. For both systems, the hypercube consists of a number of edges equaling:

L * 2^(L-1)

which gives 12 edges for L = 3. The length of the indirect pathways varies based on the number of reversions included in the pathway. Pathway lengths for an L-bit system are L, L+2, …, L+2*(L-1), where the total number of both direct and indirect pathways depends on the number of mutation sites, L, of the organism.

[Figures 2a and 2b: hypercubes with vertices 000, 001, 010, 011, 100, 101, 110, 111]

Figure 2: Direct and Indirect Pathways of an organism with L=3 mutational sites. Examples of potential mutational trajectories leading to a genotype of highest fitness in a three-bit model. The gray arrows represent all potential direct mutational trajectories. (a) Examples of direct mutational pathways leading to higher fitness. Two accessible direct pathways exist, represented in green and purple, in which all mutations are forward and the number of steps is equal to L. (b) Examples of accessible indirect mutational trajectories leading to higher fitness. Two accessible indirect pathways are shown, represented by the red and blue arrows, in which mutational reversions exist. The pathways consist of 5 steps.

Algorithm Development: Direct Pathways

The algorithm was coded using a combination of the MATLAB and Julia languages. The initial algorithm outputs all direct mutational trajectories, and is based on a tree structure consisting of binary values converted into decimals. Each branch layer of the tree corresponds to the possible next steps, or corresponding bit flips, of the binary representation of the organism genotype.
There are two inputs for the function: bits, the number of bits in the system or number of loci of the organism, and path, the path that the algorithm has determined thus far. The initial input of the algorithm is the vector [0], modeling the starting point of the algorithm as the wild type genotype. The algorithm is recursive in nature, with accessible mutational trajectories determined by checking the newly outputted step of the pathway against the most recent path input into the algorithm. Each step in the pathway, labeled nextstep, is determined based on the most recently input step in path, labeled current. In the direct pathway, one of the values 2^0, 2^1, …, 2^(L-1) is added to current to output a possible nextstep, which is then checked against the values in the current path to determine whether the decimal representation has already been visited. If so, the pathway is discarded and the algorithm resumes from the most recently visited step. If nextstep is not present in path, the value is added to path and the concatenated vector functions as the new input value. The process stops when nextstep is equal to 2^L - 1, meaning it has reached the decimal equivalent of the final step in the path, in which all bits are flipped. The final output of the algorithm includes all viable pathways that fulfill the criteria: bits may only be flipped from 0 to 1, no nextstep can be visited twice, and no value can surpass the final genotype, represented by 2^L - 1.

Algorithm Development: Indirect Pathways

Indirect pathway algorithm development required additional restrictions in path determination. The inputs of bits and path of the original algorithm are maintained. The tree structure is updated, however, to additionally include the subtraction of the values 2^0, 2^1, …, 2^(L-1), with each branch reflecting the addition or subtraction of one of these values to the previous branch. The stipulation remains that no value can be revisited.
The resulting algorithm remains similar to the algorithm for direct pathways, except that it allows for either the addition or the subtraction of the values 2^0, 2^1, …, 2^(L-1). The output is not limited to path lengths of L steps, and allows for paths of L, L+2, …, L+2*(L-1) steps. The process of checking nextstep against path, and the subsequent recursion, remains the same as delineated above. The final output of the algorithm includes all viable pathways that fulfill the criteria: bits may be flipped from 0 to 1 or 1 to 0, indicating the presence of both addition and subtraction, no nextstep can be visited twice, and no value can surpass the final genotype, represented by 2^L - 1 (Fig. 3).

Figure 3: Pseudocode of algorithm outputting all accessible direct and indirect pathways for a given dataset.

Data Sources

Data sets imported to determine fitness accessibility of pathways were gathered from 15 relevant studies in higher order epistasis (Weinreich et al., 2013). The data sets consist of systems ranging from L = 3 bits to L = 9 bits.

Computing Selective Accessibility

Imported data sets of fitness values, associated with their corresponding genotypes, were used to determine the accessibility of a given pathway. This system establishes a rank amongst genotypes with regards to evolutionary fitness. Using DataFrames, the algorithm imports the fitness values into a vector. At each additional step in the pathway, the algorithm indexes the created fitness value vector, called fitness_data, using the decimal value of the current genotype as an index to determine the associated fitness value. It then tests the current genotype fitness value against the nextstep genotypic fitness value. If the fitness of nextstep is greater than the fitness of current, then the nextstep genotype is added to path and the recursion continues. Otherwise, the path is discarded.
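The recursion and the accessibility check described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation (Figure 3 and the original MATLAB/Julia code are not reproduced here). The names `current`, `nextstep`, `path` and `fitness_data` follow the text; the function name, the `allow_reversions` flag, and the toy three-bit fitness values are hypothetical.

```python
def accessible_paths(fitness_data, L, allow_reversions=True):
    """Enumerate paths from genotype 0 to 2**L - 1 with strictly increasing fitness.

    fitness_data[g] is the fitness of the genotype whose decimal label is g.
    With allow_reversions=False only 0 -> 1 flips (direct paths) are followed.
    """
    final = (1 << L) - 1
    found = []

    def extend(path):
        current = path[-1]
        if current == final:              # all bits flipped: record the path
            found.append(list(path))
            return
        for i in range(L):
            step = 1 << i                 # 2^0, 2^1, ..., 2^(L-1)
            is_reversion = bool(current & step)
            if is_reversion and not allow_reversions:
                continue
            nextstep = current ^ step     # adds step if the bit is 0, subtracts it if 1
            if nextstep in path:          # no genotype may be visited twice
                continue
            if fitness_data[nextstep] <= fitness_data[current]:
                continue                  # step is not beneficial: discard this branch
            path.append(nextstep)
            extend(path)
            path.pop()

    extend([0])
    return found


# Hypothetical three-bit landscape (illustrative values only, not thesis data).
toy_fitness = [1.0, 1.5, 4.0, 5.0, 2.0, 0.5, 3.0, 6.0]
toy_all = accessible_paths(toy_fitness, 3)
toy_direct = accessible_paths(toy_fitness, 3, allow_reversions=False)
toy_indirect = [p for p in toy_all if len(p) - 1 > 3]
```

Because fitness must strictly increase at every step, a genotype can never actually recur on an accessible path; the explicit check against `path` is kept only to mirror the description in the text. On the toy landscape, the one indirect path found takes five steps and includes a single reversion.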
A mutational pathway is only printed if the entire path is monotonically increasing in fitness. In a number of data sets included in this study, the wild type genotype is considered of the highest fitness, with subsequent mutations deleterious in nature. In these instances, we considered the data set in reverse in order to determine the fitness accessibility of the pathway, defined as monotonically increasing. In importing the data using DataFrames, the vector of fitness values was reversed in order to establish the wild type genotype as lowest fitness, adhering to the requirements of the algorithm. Forward mutations were therefore considered beneficial. This made data sets usable that would otherwise be unusable for the purposes of this study.

Determining Correlation of Tradeoff Efficiency and Mutational Sites

The algorithm was first implemented in order to determine the number of accessible mutational trajectories, with direct pathway length equaling the number of mutational differences between the starting and ending genotype, L. The tradeoff efficiency ratio for direct mutational pathways was then calculated as the ratio of accessible direct mutational trajectories over the total number of direct mutational trajectories without the presence of sign epistasis, L!. These ratios were plotted against the number of mutational sites of the datasets, L. Regression analysis was then used to determine the associated correlation coefficient.

Analysis of tradeoff efficiency was then conducted for indirect mutational trajectories. The indirect mutational trajectories were categorized based on the additional number of steps as compared to the direct mutational pathway, in increments of 2. Given this information, the next step was determining the efficiency of the tradeoff.
In the context of the data, tradeoff efficiency of indirect mutational pathways is the ratio of the number of accessible indirect mutational pathways of a given length to the number of inaccessible direct pathways, that is, the total number of direct pathways, L!, minus the number of accessible direct mutational pathways. These ratios were then log transformed and plotted based on the associated number of mutational sites, L. This is represented in the values of the x-axis, where L = 3, 4, 5, 6, 9.

Determining Correlation of Tradeoff Efficiency and Path Length

The algorithm was implemented in order to determine the number of accessible mutational trajectories, with the indirect mutational trajectories categorized based on the additional number of steps in increments of 2, and separated based on number of mutational sites, L. Given this information, the next step was determining the efficiency of the tradeoff. In the context of the data, tradeoff efficiency is the ratio of the number of accessible indirect pathways of a given length to the total number of direct pathways, L!, minus the number of accessible direct pathways. More colloquially, this is the ratio of the number of accessible indirect mutational trajectories to the total number of direct mutational pathways made inaccessible by sign epistasis. These ratios were log transformed and plotted based on the increase of mutational trajectory length in relation to the direct mutational pathway: Indirect +2, Indirect +4, and Indirect +6. This is represented in the values of the x-axis: 2, 4, and 6. In order to establish the correlation between pathway length and the tradeoff efficiency ratio, data was normalized based on the number of bits in the dataset. This produced a single plot for analysis, displaying results for L=3, 4, 5, 6, 9.

Results

Accessibility of Pathways

The existence of both direct and indirect pathways that monotonically increase in fitness indicates the presence of sign epistasis in the fitness landscape.
Based on the available number of alleles, L, there exist L! direct mutational pathways that lead to a given fitness peak. Sign epistasis, while restricting the accessibility of these direct pathways, can contribute to the increased accessibility of indirect mutational pathways. Accessible pathways were defined as those mutational trajectories monotonically increasing in fitness. All potential pathways were enumerated through the use of an algorithm, which output both direct and indirect trajectories from wild type to genotypes of highest fitness. Direct trajectories are defined as trajectories composed of the fewest steps necessary from a low to highest fitness genotype, that is, the number of mutational differences between the starting and ending genotypes. An indirect pathway is one in which the fixation of a mutation can cause a previously beneficial mutation to become deleterious. In this case, the reversal of the fitness effect of a mutation, as a result of sign epistasis, makes the reversion of a formerly beneficial mutation beneficial. As a result of this situation, a population may require more steps in a mutational trajectory in order to achieve higher fitness, as compared to the number of steps necessary in a direct pathway. Without the consideration of these indirect pathways, the number of determined accessible pathways is not reflective of the accessible pathways in the fitness landscape in which the observed organism exists. Indeed, indirect pathways can contribute to the overall adaptive capabilities of a population (Palmer et al., 2015). In mutational trajectories, no genotype of an organism in a direct or indirect pathway may be visited more than once. The final genotype, in which mutations on all alleles have occurred in the organism, is recognized as the fitness peak of the landscape. Weinreich et al.
(2005) define sign epistasis as a constraint on natural selection, as it reduces the number of accessible mutational trajectories. We applied these restrictions to the 15 datasets on a genetic background influenced by sign epistasis. As a result, we found that accessible direct mutational trajectories, in which every step resulted in an increase in fitness, accounted for 1.87% of the expected number of direct trajectories in a landscape without sign epistasis. In all accessible mutational pathways determined by the algorithm, each mutation, or step in the mutational trajectory, is beneficial for the organism. Direct mutational trajectories accounted for approximately 53.4% of the total number of accessible mutational pathways as a result of sign epistasis. For each study used, the proportion of accessible direct pathways over L!, the total number of direct pathways, is indicated in Table 1. The ratio indicates the percentage of direct pathways that remain accessible in the presence of sign epistasis; its complement is the percentage of direct pathways that lose accessibility as a result of sign epistasis. The fraction of direct mutational trajectories that are selectively accessible is plotted against the number of mutational sites in a given dataset, L (Fig. 4).

Table 1.

Dataset               Size (L)   Accessible Direct Paths   Fraction of direct mutational trajectories that are selectively accessible
Brown 2010            3          6                         1.000000
Malcom 1990           3          2                         0.333333
Constanzo 2011        3          3                         0.500000
Chou 2011             4          24                        1.000000
Lozovsky 2009         4          2                         0.083333
Khan 2011             5          86                        0.716667
Tan 2011              5          5                         0.041667
Weinreich 2006        5          9                         0.075000
Whitlock Walsh 2000   5          27                        0.225000
daSilva 2010          5          17                        0.141667
deVisser 2009         5          25                        0.208333
O'Maille 2008         6          27                        0.037500
Hall 2010 Diploid     6          40                        0.055556
Hall 2010 Haploid     6          7                         0.009722
Lunzer 2005           9          6564                      0.018089

Table 1: The ratio of accessible direct pathways over total direct pathways, L!, for each dataset.
The dataset name, number of mutational sites, L, and direct pathway ratio are given for each dataset.

[Figure 4: scatter plot; x-axis: Number of Mutational Sites, L; y-axis: #Direct Pathways/L!; fitted line y = -0.1299x + 0.9373, R² = 0.3288]

Figure 4: Fraction of direct mutational trajectories that are selectively accessible versus number of mutational sites: The ratio of accessible direct pathways to the total number of direct pathways, L!, plotted against the total number of mutational sites in a given dataset, L. The plot is fitted with a regression line, with R² = 0.32883 and r = -0.57343.

Efficiency of Tradeoff

In the presence of sign epistasis, there exist a number of direct pathways that become functionally inaccessible. We found that the number of accessible indirect pathways amounts to 1.66% of the number of direct pathways made functionally inaccessible by sign epistasis. Further, indirect mutational trajectories account for 46.6% of accessible pathways. We can therefore conclude that the inclusion of indirect pathways greatly increases the number of pathways leading to a determined fitness peak.

We refer to the increased accessibility of indirect trajectories due to sign epistasis as the evolutionary tradeoff. In order to measure this evolutionary tradeoff, we determined the total number of accessible indirect pathways, distinguishing based on the length of pathway, where the number of steps in the direct pathway equals L (the number of alleles), and the indirect pathways have lengths L+2, …, L+2*(L-1), as shown in Figure 5, a hypercube representing L = 4 mutational sites. These values were then divided by the total number of lost direct pathways, determined as L!, the number of total direct pathways, minus the number of accessible direct pathways.
This value is referred to as the ‘tradeoff efficiency’, meaning the ratio of the number of accessible indirect mutational pathways to the total number of direct pathways made inaccessible by sign epistasis.

Figure 5: Direct and Indirect Pathways of an organism with L=4 mutational sites. Examples of potential trajectories leading to a genotype of highest fitness in a four-bit model. The gray arrows represent all potential pathways. Two accessible pathways exist, with green representing an accessible direct pathway in which all mutations are forward, and black representing an accessible indirect pathway in which mutational reversions exist. Figure from Zagorski et al. 2016.

For the purposes of our study, the most relevant tradeoff efficiencies are related to the total number of accessible indirect mutational trajectories. We distinguished tradeoff efficiency ratios based on the length of the indirect trajectory as compared to the direct trajectory length and plotted these values against the total number of mutational sites, L. We found that the tradeoff efficiency of an organism, as related to the accessibility of indirect pathways, is negatively correlated with indirect pathway length, with the maximum tradeoff efficiency values at Indirect +2. Regression analysis further distinguished this trend amongst datasets with differing numbers of mutation sites. Organisms with L=3 and L=4 mutation sites, independent of path length, have a lower tradeoff efficiency ratio than organisms with L=5 and L=6 mutation sites, as shown in Figure 6. This suggests that as the number of mutation sites in an organism increases, there is an increase in accessibility due to indirect pathways.

Table 2.
Fraction of indirect mutational trajectories that are selectively accessible, by added length:

Dataset               Size, L   +2      +4      +6      +8
Brown 2010            3         0       0       -       -
Malcom 1990           3         0       0       -       -
Constanzo 2011        3         0.333   0       -       -
Chou 2011             4         0       0       0       0
Lozovsky 2009         4         0       0       0       0
Khan 2011             5         1.853   0.176   0       0
Tan 2011              5         0.017   0       0       0
Weinreich 2006        5         0.063   0.018   0       0
Whitlock Walsh 2000   5         0.376   0.183   0.022   0
daSilva 2010          5         0.078   0       0       0
deVisser 2009         5         0       0       0       0
O'Maille 2008         6         0.017   0       0       0
Hall 2010 Diploid     6         0.041   0.037   0.018   0
Hall 2010 Haploid     6         0.003   0       0       0
Lunzer 2005           9         0.016   0       0       0

Table 2: The ratio of accessible indirect pathways, as distinguished by length, over the number of inaccessible direct pathways (the number of total direct pathways, L!, minus the number of accessible direct pathways), for each dataset. The dataset name, number of mutational sites, L, and indirect pathway ratio, referred to as tradeoff efficiency ratio, are given for each dataset.

Figure 6: Log transform of indirect trajectory tradeoff efficiency ratio versus number of mutational sites: The log transform of the ratio of accessible indirect pathways, as distinguished by length, over the number of inaccessible direct pathways (L! minus the number of accessible direct pathways), plotted against the total number of mutational sites in a given dataset, L. The indirect pathways of interest are of the length of the direct pathway plus 2, plus 4, and plus 6.

Determining Correlation of Tradeoff Efficiency and Path Length

Factors of interest relating to increased tradeoff efficiency and organismal fitness included the overall path length, as a variable independent of the number of mutational sites, L, in an organism.
We examined this relationship through the use of tradeoff efficiency as an indicator of indirect mutational pathway viability, and determined the correlation between the increase in path length from the direct mutational pathway and the resulting tradeoff efficiency ratio for a given trajectory. To standardize the data, we tabulated the added lengths of the indirect paths. The lengths of mutational pathways were given in relation to the length of a direct mutational pathway, with subsequent indirect paths given lengths of 0+2*1, …, 0+2*(Lmax-1). Indirect paths beyond the length of the direct trajectory +8 were found to contribute less than 1% to the total number of selectively accessible trajectories. Tradeoff efficiency was again determined as the number of accessible indirect pathways divided by the number of inaccessible direct pathways, that is, L!, the total number of direct pathways without the influence of sign epistasis, minus the number of accessible direct pathways. The relationship between tradeoff efficiency and the increase in length of the pathway showed a negative correlation (Fig. 7). This indicates that the number of accessible indirect pathways decreases as trajectory length increases through mutational reversions.

Figure 7: Log Transform of Length of Indirect Pathway versus the Tradeoff Efficiency Ratio: The log transformation of the ratio of accessible indirect pathways, as distinguished by length, over the number of inaccessible direct pathways (L! minus the number of accessible direct pathways), referred to as the Tradeoff Efficiency Ratio, plotted against the size of the step increase over the direct pathway length. The indirect pathways of interest are of the length of the direct pathway plus 2, 4, and 6.
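The summary statistics in this section can be recomputed from the counts in Table 1. The Python sketch below is a hedged illustration rather than the analysis code used in the thesis: the function names are hypothetical, the least-squares fit is written out by hand, and returning 0.0 when no direct path is lost is an assumption made to match how such datasets appear to be tabulated in Table 2. The example count of one accessible indirect path is likewise illustrative.

```python
from math import factorial

# (dataset, L, accessible direct paths), transcribed from Table 1.
TABLE_1 = [
    ("Brown 2010", 3, 6), ("Malcom 1990", 3, 2), ("Constanzo 2011", 3, 3),
    ("Chou 2011", 4, 24), ("Lozovsky 2009", 4, 2), ("Khan 2011", 5, 86),
    ("Tan 2011", 5, 5), ("Weinreich 2006", 5, 9), ("Whitlock Walsh 2000", 5, 27),
    ("daSilva 2010", 5, 17), ("deVisser 2009", 5, 25), ("O'Maille 2008", 6, 27),
    ("Hall 2010 Diploid", 6, 40), ("Hall 2010 Haploid", 6, 7), ("Lunzer 2005", 9, 6564),
]


def fraction_accessible(n_direct, L):
    """Accessible direct paths as a fraction of all L! direct paths."""
    return n_direct / factorial(L)


def tradeoff_efficiency(n_indirect, n_direct, L):
    """Accessible indirect paths per direct path lost to sign epistasis."""
    lost = factorial(L) - n_direct
    if lost == 0:
        return 0.0  # assumption: ratio reported as 0 when no direct path is lost
    return n_indirect / lost


def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


sites = [L for _, L, _ in TABLE_1]
fractions = [fraction_accessible(n, L) for _, L, n in TABLE_1]
slope, intercept, r_squared = linear_fit(sites, fractions)

# Pooled fraction of direct paths that remain accessible across all datasets.
pooled = sum(n for _, _, n in TABLE_1) / sum(factorial(L) for _, L, _ in TABLE_1)

# Illustrative only: one accessible indirect path in a 3-site dataset
# with 3 accessible direct paths (so 3 of the 3! = 6 direct paths are lost).
example_ratio = tradeoff_efficiency(1, 3, 3)
```

Under these assumptions, the fitted line and R² should agree with the values reported in Figure 4, and the pooled fraction with the 1.87% figure quoted in the text.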
Discussion

From the analysis in this study, we quantified the extent to which sign epistasis results in the presence of both direct and indirect mutational trajectories leading to increased fitness, or drug resistance. Thus, while previous studies focus on the evolutionary value of direct trajectories, the inclusion of indirect pathways is valuable in considering the mutational trajectories leading to higher fitness. In evaluating epistatic tradeoff in the presence of sign epistasis, the efficiency of indirect mutational pathways in providing increased accessibility increases as the number of mutational sites increases. This relationship is contingent on the length of the indirect mutational pathway, however, as the tradeoff efficiency of indirect pathways is negatively correlated with increases in mutational trajectory length. In evaluating indirect mutational pathway contributions to tradeoff efficiency, it is evident that the increased accessibility of indirect trajectories is not sufficient to replace the lost accessible direct mutational pathways.

The analysis conducted in this study assumes a binary model in characterizing selectively accessible trajectories towards higher fitness. This model is simplified in two areas. First, it is assumed that mutations hold a binary form, and can either be switched on or off. In a more complex model, mutational analysis can be expanded to investigate the potential of additional states per site, specifically 4, representing the nucleotides A, C, T, and G, or 20, representing all possible amino acids, as conducted in additional studies (Zagorski et al., 2016). Second, in evaluating the tradeoff efficiency ratio of indirect pathways, the denominator is based on the number of inaccessible direct pathways.
To improve our understanding of the tradeoff efficiency, the algorithm can be extended to account for the number of potentially accessible indirect mutational trajectories as determined by length beyond expected direct pathways: L+2, L+4, …, L+2*(Lmax-1). The datasets included in this study consisted of mutation site values, L, that required a significant amount of computational power in order to run for L > 5. Due to the recursive nature of the mutational trajectory algorithm, these values were not computable in a manageable timeframe for the purposes of this study.

Additional analysis regarding occurrence of sign epistasis can be conducted through the analysis of the value of sign epistatic density in a given mutational trajectory hypercube as determined by number of mutational sites, L. Sign epistatic density is defined as the fraction of cases in which a mutation that is normally beneficial is deleterious (Weinreich et al., 2017). For the purposes of this study, the evaluation of sign epistatic density would be calculated based on individual datasets, in order to determine the environment of the data (Weinreich et al., 2006). This value, in conjunction with the number of accessible indirect pathways divided by the total number of indirect pathways of a given length, can be used to determine a correlation between sign epistatic incidence and direct and indirect trajectory accessibility.

Additionally, it is assumed that all beneficial mutations on a genetic background are equally likely, which does not necessarily hold in the presence of sign epistasis. In general, natural selection affects the likelihood of beneficial mutations occurring, as the process may favor some mutations over others. The probability of a beneficial mutation occurring is further affected by mutational biases, such as a greater chance of a transition occurring than a transversion.
In understanding the evolutionary implications of the increased accessibility of indirect pathways, further analysis can be done to determine the predictability of evolution, that is, the probability of each step in a pathway occurring. This could more effectively elucidate the probability of organismal evolution towards drug resistance and higher fitness, and the role of genetic interactions in expanding the set of mutational pathways by which this genotype is ultimately reached.

Ultimately, our analyses demonstrate the evolutionary significance of indirect mutational trajectories as they function in the presence of sign epistasis. First, we determined that there is a positive correlation between the number of mutational sites and the accessibility of indirect mutational pathways. Second, we showed that the tradeoff efficiency of indirect mutational trajectories decreases as mutational pathway length increases. The findings from this project provide an improved understanding of the mechanisms of drug resistance and the effect of differing mutational pathways on the overall fitness of an organism. Further experimental analysis will help to examine the observed gaps in the accessibility of indirect mutational trajectories as they relate to the inaccessibility of direct mutational trajectories due to sign epistasis.

Acknowledgements

I would like to thank Professor Dan Weinreich for providing guidance, mentorship, and support throughout my thesis project and undergraduate research experience. I would additionally like to thank Professor Anastasios Matzavinos for serving as the second reader of my thesis and for his guidance in my Applied Mathematics pursuits. Lastly, I would like to extend my gratitude to the Weinreich Lab for their support throughout this process.

References

[1] Palmer, A.C., Toprak, E., Baym, M., Kim, S., Veres, A., Bershtein, S. and Kishony, R., 2015. Delayed commitment to evolutionary fate in antibiotic resistance fitness landscapes.
Nature Communications, 6.
[2] Zagorski, M., Burda, Z. and Waclaw, B., 2016. Beyond the Hypercube: Evolutionary Accessibility of Fitness Landscapes with Realistic Mutational Networks. PLOS Computational Biology, 12(12), p.e1005218.
[3] Weinreich, D.M., Delaney, N.F., DePristo, M.A. and Hartl, D.L., 2006. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science, 312(5770), pp.111-114.
[4] Knies, J., Cai, F. and Weinreich, D.M., 2017. Enzyme efficiency but not thermostability drives cefotaxime resistance evolution in TEM-1 β-lactamase. Molecular Biology and Evolution.
[5] de Visser, J.A.G., Park, S.C. and Krug, J., 2009. Exploring the effect of sex on empirical fitness landscapes. The American Naturalist, 174(S1), pp.S15-S30.
[6] Hall, D.W., Agan, M. and Pope, S.C., 2010. Fitness epistasis among 6 biosynthetic loci in the budding yeast Saccharomyces cerevisiae. Journal of Heredity, 101(suppl 1), pp.S75-S84.
[7] Khan, A.I., Dinh, D.M., Schneider, D., Lenski, R.E. and Cooper, T.F., 2011. Negative epistasis between beneficial mutations in an evolving bacterial population. Science, 332(6034), pp.1193-1196.
[8] Lozovsky, E.R., Chookajorn, T., Brown, K.M., Imwong, M., Shaw, P.J., Kamchonwongpaisan, S., Neafsey, D.E., Weinreich, D.M. and Hartl, D.L., 2009. Stepwise acquisition of pyrimethamine resistance in the malaria parasite. Proceedings of the National Academy of Sciences, 106(29), pp.12025-12030.
[9] Lunzer, M., Miller, S.P., Felsheim, R. and Dean, A.M., 2005. The biochemical architecture of an ancient adaptive landscape. Science, 310(5747), pp.499-501.
[10] O'Maille, P.E., Malone, A., Dellas, N., Hess, B.A., Smentek, L., Sheehan, I., Greenhagen, B.T., Chappell, J., Manning, G. and Noel, J.P., 2008. Quantitative exploration of the catalytic landscape separating divergent plant sesquiterpene synthases. Nature Chemical Biology, 4(10), pp.617-623.
[11] Aita, T. and Husimi, Y., 1996. Fitness spectrum among random mutants on Mt.
Fuji-type fitness landscape. Journal of Theoretical Biology, 182(4), pp.469-485.
[12] Tan, L., Serene, S., Chao, H.X. and Gore, J., 2011. Hidden randomness between fitness landscapes limits reverse evolution. Physical Review Letters, 106(19), p.198102.
[13] Brown, K.M., Costanzo, M.S., Xu, W., Roy, S., Lozovsky, E.R. and Hartl, D.L., 2010. Compensatory mutations restore fitness during the evolution of dihydrofolate reductase. Molecular Biology and Evolution, 27(12), pp.2682-2690.
[14] Whitlock, M.C. and Bourguet, D., 2000. Factors affecting the genetic load in Drosophila: synergistic epistasis and correlations among fitness components. Evolution, 54(5), pp.1654-1660.
[15] Bridgham, J.T., Carroll, S.M. and Thornton, J.W., 2006. Evolution of hormone-receptor complexity by molecular exploitation. Science, 312(5770), pp.97-101.
[16] Chou, H.H., Chiu, H.C., Delaney, N.F., Segrè, D. and Marx, C.J., 2011. Diminishing returns epistasis among beneficial mutations decelerates adaptation. Science, 332(6034), pp.1190-1192.
[17] da Silva, J., Coetzer, M., Nedellec, R., Pastore, C. and Mosier, D.E., 2010. Fitness epistasis and constraints on adaptation in a human immunodeficiency virus type 1 protein region. Genetics, 185(1), pp.293-303.
[18] Costanzo, M.S. and Hartl, D.L., 2011. The evolutionary landscape of antifolate resistance in Plasmodium falciparum. Journal of Genetics, 90(2), pp.187-190.