Natural Selection on Synonymous and Nonsynonymous Mutations Shapes Patterns of Polymorphism in Populus tremula

Pär K. Ingvarsson*
Umeå Plant Science Centre, Department of Ecology and Environmental Science, Umeå University, Umeå, Sweden
*Corresponding author.
Associate editor: Carlos Bustamante

Research article

Abstract

One important goal of population genetics is to understand the relative importance of different evolutionary processes for shaping variation in natural populations. Here, I use multilocus data to show that natural selection on both synonymous and nonsynonymous mutations plays an important role in shaping levels of synonymous polymorphism in European aspen (Populus tremula). Previous studies have documented a preferential fixation of synonymous mutations encoding preferred codons in P. tremula. The results presented here show that this has resulted in an increase in codon bias in P. tremula, consistent with stronger selection acting on synonymous codon usage. In addition, positive selection on nonsynonymous mutations appears to be common in P. tremula, with approximately 30% of all nonsynonymous substitutions having been fixed by positive selection. The recurrent fixation of beneficial mutations also reduces standing levels of polymorphism, as evidenced by a significantly negative relationship between the rate of protein evolution and both synonymous site diversity and silent site diversity. Finally, I use approximate Bayesian methods to estimate the strength of selection acting on beneficial substitutions. These calculations show that recurrent hitchhiking reduces polymorphism by, on average, 30%. The product of the strength of selection acting on beneficial mutations and the rate at which these occur across the genome (2Ne λs) equals 1.54 × 10⁻⁷, which is in line with estimates from Drosophila, where recurrent hitchhiking has also been shown to have significant effects on standing levels of polymorphism.
Key words: adaptive evolution, codon bias, natural selection, selective sweeps.

Introduction

Understanding how various population genetic processes affect genome-wide patterns of polymorphism is one of the main goals of population genetics. However, the relative importance of various population genetic processes, such as genetic drift and natural selection, has long been contentious (Kimura 1983; Gillespie 1991). The neutral or nearly neutral theory proposes that the overall pattern of DNA sequence evolution is explained by a balance between mutation, genetic drift, and purifying selection (Kimura 1983; Ohta 1992). Therefore, although mutations that are influenced by positive selection exist and may have great “biological” significance, the neutral or nearly neutral theory suggests that these mutations are too rare to have any “statistically” significant effect on the rate of protein evolution in most organisms. Alternatively, if positive selection plays a larger role, rates of protein evolution would be determined by the beneficial mutation rate and how quickly these mutations can spread and ultimately fix in a species (Gillespie 1991). It has been problematic to distinguish between these two models based on available population genetic data, largely because the two models predict similar patterns of variation over a large part of their parameter ranges. For instance, demographic processes, such as bottlenecks, growth, and population subdivision, are known to result in patterns of nucleotide diversity that are similar to those produced by positive and negative selection. However, if positive selection is more common than postulated by the neutral or nearly neutral theories, many (or most) sites in the genome would have experienced the effects of selection, either directly or indirectly through selection acting on linked sites, a process known as hitchhiking (Smith and Haigh 1974; Gillespie 2000; Sella et al. 2009).
Under this scenario, few sites would be truly neutral and the predictions from the neutral or nearly neutral theory would not be applicable to most sites in the genome (Sella et al. 2009). The action of positive selection is expected to leave characteristic signatures, such as reduced levels of polymorphism and enhanced linkage disequilibrium in genome regions surrounding the site where a selected substitution has occurred. Indirect effects of selection are thus expected to be most pronounced in genome regions where recombination is infrequent, and many studies have demonstrated a correlation between levels of nucleotide polymorphism and local recombination rates (e.g., Begun and Aquadro 1992; Baudry et al. 2001; Cutter and Payseur 2003). However, recent studies have shown that regions experiencing high recombination rates also show signs of ongoing selection at linked sites. For instance, recent studies in Drosophila have documented reduced levels of synonymous polymorphism in rapidly evolving genes (Andolfatto 2007; Macpherson et al. 2007; Bachtrog 2008), a pattern that is expected under a model of recurrent selective sweeps (Kaplan et al. 1989; Wiehe and Stephan 1993). Selection associated with recurrent selective sweeps is usually thought to be strong (Ne s ≫ 1), but even very weak natural selection (Ne s ∼ 1) can leave distinct signatures that can be detected using data on polymorphism and divergence between species (McVean and Charlesworth 1999).

© The Author 2009. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. Mol. Biol. Evol. 27(3):650–660. 2010. doi:10.1093/molbev/msp255. Advance Access publication October 16, 2009.
For instance, selection on synonymous codons shapes the frequency spectrum of preferred and unpreferred mutations of alternative synonymous codons (Akashi and Schaeffer 1997; Cutter and Charlesworth 2006) and results in the preferential fixation of preferred codons in highly expressed genes (Maside et al. 2004). Besides affecting polymorphism and divergence at putatively neutral sites, concurrent selection at multiple linked sites reduces the efficacy of selection at all sites through a process known as Hill–Robertson interference (Hill and Robertson 1996; McVean and Charlesworth 2000). For instance, strong selection at linked sites reduces the efficiency of selection at sites under weak selection, such as selection on synonymous codons (Comeron and Guthrie 2005; McVean and Charlesworth 2000). Studies in Drosophila have demonstrated patterns consistent with Hill–Robertson interference, with genes experiencing high rates of protein evolution also showing signs of reduced codon bias (Betancourt and Presgraves 2002; Andolfatto 2007; Bachtrog 2008). In this paper, I show that there is ongoing selection at both synonymous and nonsynonymous sites in the long-lived tree European aspen (Populus tremula). Despite the long generation time of this species, selection at both synonymous and nonsynonymous sites appears to be common and strong enough to leave clear signatures in patterns of polymorphism at synonymous sites.

Materials and Methods

Plant Material, Selection of Loci, and DNA Sequencing

The 77 loci used in this study are described in more detail in Ingvarsson (2008b). Briefly, samples of 12 or 19 diploid individuals, representing 24 or 38 haploid genomes of P. tremula, were taken from populations in Sweden, Austria, and France (Ingvarsson 2005).
Gene fragments, ranging in size from 223 to 1,224 bp (mean length of the sequenced fragments was 554 bp), were amplified from diploid genomic DNA and directly sequenced on Beckman CEQ8000 capillary sequencers at Umeå Plant Science Centre. All fragments were sequenced in both directions. Sequences were base called and assembled with PHRED and PHRAP (Ewing et al. 1998), and heterozygous bases were called with the Polyphred program (Nickerson et al. 1997) and confirmed by visual inspection of the corresponding trace files using the CONSED trace file viewer (Gordon et al. 1998). Regions with missing or low-quality data were trimmed from all sequences. Multiple sequence alignments were made using ClustalW (Thompson et al. 1994) and adjusted manually using BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). For all gene fragments, the homologous region from Populus trichocarpa was extracted from the publicly available genome sequence (Tuskan et al. 2006) and was used as the outgroup sequence and to annotate the gene fragments. Sequences described in this paper are available from GenBank/EMBL databases (accession numbers EU752500–EU754117).

Population Genetic Analyses

All population genetic analyses were performed using computer programs based on the publicly available C++ class library libsequence (Thornton 2003). Nucleotide diversity was calculated from either the average pairwise differences between sequences (π; Tajima 1983) or the number of segregating sites (θW; Watterson 1975). Diversity statistics were also calculated separately for noncoding, silent, and replacement sites. The frequency spectrum of mutations was summarized using either Tajima’s D (Tajima 1989) or the standardized version of Fay and Wu’s H (Fay and Wu 2000; Zeng et al. 2006). The latter statistic requires the use of an outgroup sequence so that mutations can be polarized into ancestral or derived states (Fay and Wu 2000).
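As an illustration of the diversity statistics named above, the following is a minimal sketch in plain Python of π, θW, and Tajima's D for a small aligned sample. It is not the libsequence code used in the study; the function names and the toy alignment are hypothetical, and the sketch assumes gap-free sequences of equal length and returns per-locus (not per-site) values.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Number of alignment columns carrying more than one distinct base."""
    return sum(1 for col in zip(*seqs) if len(set(col)) > 1)


def pairwise_diversity(seqs):
    """Average number of pairwise differences between sequences (pi)."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs)


def watterson_theta(seqs):
    """Watterson's estimator: S / a1, with a1 = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a1


def tajimas_d(seqs):
    """Tajima's D: (pi - theta_W) scaled by its estimated standard deviation."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return 0.0  # convention used here: D is undefined without variation
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pairwise_diversity(seqs) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy alignment of four haploid sequences (two singletons).
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
```

Dividing π and θW by the number of sites in the fragment gives per-site values comparable to those reported for each site class. In this toy sample both variable sites are singletons, so π falls below θW and D is negative.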
For all loci, the homologous region was extracted from the publicly available genome sequence from P. trichocarpa (Tuskan et al. 2006). Synonymous and nonsynonymous nucleotide substitution rates per site were calculated for all genes using a randomly selected sequence from P. tremula and the outgroup P. trichocarpa, as described in Ingvarsson (2007). Briefly, substitution rates were calculated using the maximum likelihood (ML) method of Goldman and Yang (1994) implemented in the Codeml program from the PAML package (version 3.15; Yang 1997). The estimation was performed assuming transition/transversion bias and with codon frequencies calculated from average nucleotide frequencies (F1x4). It has been shown that misidentification of the ancestral state of mutations can bias statistics that depend on accurate polarization of mutations into ancestral and derived states (e.g., Fay and Wu’s H; Baudry and Depaulis 2003). However, the rate of misidentification is so low in P. tremula (<1%) that it is likely to have only minor effects on statistics that depend on an accurate identification of ancestral states, such as Fay and Wu’s H (Ingvarsson 2008b).

Codon Bias and Selection on Synonymous Codons

Optimal codons for Populus have been identified previously (Ingvarsson 2007, 2008a), and these were used to calculate the frequency of optimal codons (Fop; Ikemura 1985) using the program codonw (version 1.4.2; http://www.codonw.sourceforge.net/) both for P. tremula and for the homologous gene in P. trichocarpa. I also used codonw to calculate the GC content for the complete coding region and for third codon positions (GC3s) for all genes, again both for P. tremula and for P. trichocarpa. The GC content for noncoding regions was obtained from a previous study (Ingvarsson 2007). Codon bias is generally believed to be determined by a balance between mutation, genetic drift, and natural selection.
Assuming independence among sites and no dominance, the steady-state proportion of optimal codons used in a coding sequence is given by (Bulmer 1991; Akashi 1996)

$$F_{op}(\gamma, \kappa) = \frac{1}{1 + \kappa e^{-2\gamma}}, \tag{1}$$

where κ = ν/µ is the ratio of mutation rates from and to optimal codons, respectively, and γ = 2Ne s is the scaled strength of selection acting on synonymous codons. To analyze changes in codon bias between P. tremula and P. trichocarpa, I used equation (1) to derive the relationship between codon bias (Fop) and the expected change in codon bias (∆Fop) following changes to either γ or κ:

$$\Delta F_{op} = \frac{F_{op}(a\gamma, b\kappa) - F_{op}(\gamma, \kappa)}{F_{op}(\gamma, \kappa)}, \tag{2}$$

where Fop is given by equation (1). Here, a and b denote the relative change in selection on synonymous codons (γ) and mutational bias (κ) between two species, respectively. This model was fit to the observed data using nonlinear regression in the statistical package R (R Development Core Team 2007) using the nls function. Initially, a full model, including both a and b, was fitted to the data. Subsequently, reduced models were fitted where either a or b was constrained to unity, corresponding to models with either no change in mutational bias or no change in selection, respectively. Nested models were compared using likelihood ratio tests based on twice the difference in log likelihood between the models. The log-likelihood difference between two models was compared with a χ² distribution with degrees of freedom (df) equal to the difference in the number of parameters in the models (df = 1 for both model comparisons).

Estimating the Effects of Hitchhiking

I used simulations to quantify the effects of linked selection on synonymous site diversity and to estimate parameters for a recurrent hitchhiking model (Jensen et al. 2008). To do this, I relied on approximate Bayesian computation (ABC).
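Returning briefly to the codon bias model of the previous subsection, equations (1) and (2) can be written out as two small functions. This is only a sketch in Python (the study itself used nls in R for the fitting, which is not reproduced here); the function names are hypothetical, and the value κ = 1.653 is the mutational bias reported later in the Results.

```python
from math import exp


def fop(gamma, kappa):
    """Equation (1): equilibrium frequency of optimal codons.

    gamma is the scaled selection coefficient (2*Ne*s) and kappa the ratio
    of mutation rates from and to optimal codons.
    """
    return 1.0 / (1.0 + kappa * exp(-2.0 * gamma))


def delta_fop(gamma, kappa, a=1.0, b=1.0):
    """Equation (2): relative change in Fop when gamma is rescaled by a
    and kappa is rescaled by b."""
    base = fop(gamma, kappa)
    return (fop(a * gamma, b * kappa) - base) / base


# With no selection (gamma = 0), Fop reduces to 1 / (1 + kappa).
neutral_fop = fop(0.0, 1.653)
```

Under these definitions, rescaling γ alone (a ≠ 1, b = 1) leaves genes at the neutral equilibrium unchanged and shifts genes with positive and negative γ in opposite directions, whereas rescaling κ alone (b ≠ 1) changes Fop even when γ = 0. This is the contrast the nested model comparison is designed to detect.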
Briefly, I used coalescent simulations with the possibility for recurrent selection where the stochastic trajectories of the positively selected mutations are taken into account (Coop and Griffiths 2004; Jensen et al. 2008). Each run generated a simulated sample consisting of 77 loci (the number present in the original P. tremula data set), where sample size and number of sites for the different loci were taken from the original data. These data were used to calculate a number of summary statistics as suggested by Jensen et al. (2008). The summary statistics used were the mean and standard deviation across loci of Watterson’s θ (Watterson 1975), nucleotide diversity, π (Tajima 1983), Tajima’s D (Tajima 1989), Fay and Wu’s H (Fay and Wu 2000, standardized according to Zeng et al. 2006), and Kelly’s Zns (Kelly 1997). These summary statistics capture somewhat different aspects of the frequency spectrum of the segregating mutations and of associations between different mutations. The model is characterized by a number of parameters that are treated as random variables, and for each simulation, values for these parameters are drawn from prior distributions. The parameters of interest are the scaled mutation rate θ = 4Ne µ, the scaled recombination rate ρ = 4Ne r, the rate of selective sweeps across the genome, λ, and the strength of selection acting on positively selected mutations, s. The method also relies on an estimate of the effective population size, Ne. There is currently little information available on the size of Ne in P. tremula. Ingvarsson (2008b) suggests that the effective population size of P. tremula is at least 1.18 × 10⁵ based on levels of synonymous polymorphism and a mutation rate for Populus from Tuskan et al. (2006). Because this estimate is likely a lower bound of Ne in P. tremula, I used two different values in the simulations, Ne = 10⁵ and Ne = 5 × 10⁵.
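The general ABC logic described here (draw parameters from a prior, simulate data, reduce the data to summary statistics, and keep draws whose summaries are close to the observed ones) can be illustrated with a deliberately simple toy. The sketch below is plain rejection ABC in Python for a single parameter of a Gaussian model. It is not the recurrent hitchhiking simulator, and it omits any regression adjustment of accepted draws; the simulator, prior bounds, tolerance, and all names are assumptions chosen for illustration.

```python
import random


def simulate_summary(theta, n_obs, rng):
    """Toy simulator: the summary statistic is the mean of n_obs Gaussian
    draws centred on theta (standing in for a coalescent simulation)."""
    draws = [rng.gauss(theta, 1.0) for _ in range(n_obs)]
    return sum(draws) / n_obs


def abc_rejection(s_obs, n_sims, delta, rng, n_obs=20, prior=(0.0, 10.0)):
    """Keep prior draws whose simulated summary lies within delta of s_obs."""
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(*prior)                # draw from a uniform prior
        s_sim = simulate_summary(theta, n_obs, rng)
        if abs(s_sim - s_obs) <= delta:            # distance within tolerance
            accepted.append(theta)
    return accepted


rng = random.Random(1)  # fixed seed so the sketch is reproducible
posterior_sample = abc_rejection(s_obs=4.0, n_sims=20000, delta=0.1, rng=rng)
posterior_mean = sum(posterior_sample) / len(posterior_sample)
```

The accepted draws approximate the posterior distribution of the parameter. With several summary statistics, the absolute difference would be replaced by a distance over the whole vector of (suitably scaled) statistics.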
I generated 10⁶ simulations with unique parameter values drawn from prior distributions; the scaled mutation and recombination rates were drawn from uniform prior distributions, θ ∼ U(0.001, 0.04) and ρ ∼ U(0.001, 0.1), respectively. There is little independent information on levels of recombination in Populus, but data on linkage disequilibrium (LD) suggest that the per-site recombination rate is about an order of magnitude higher than the mutation rate (Ingvarsson 2008b). The strength of positive selection was drawn from log10(s) ∼ U(−7, 0), and the genome-wide rate of sweeps was drawn from log10(2Ne λ) ∼ U(−8, −1). I used the ABC method of Beaumont et al. (2002) to sample from the posterior distribution of the parameters. Briefly, the simulated data were summarized using the summary statistics described above, and simulated samples were accepted if they were deemed to be sufficiently close to the observed data (‖Ssim − Sobs‖ ≤ δ), where Ssim and Sobs are the sets of summary statistics calculated from the simulated and the original data, respectively, and δ is a prechosen tolerance (Beaumont et al. 2002). For all analyses in this paper, I used a fixed value of δ = 0.001, and all parameter estimates were weighted and adjusted using local linear regression as outlined by Beaumont et al. (2002). All coalescent simulations were performed using a slightly modified version of the program described in Jensen et al. (2008) available at http://www.molpopgen.org/. The program was modified to read input data from a file to allow for a different sample size and number of sites for each locus. Each simulation replicate consisted of 77 loci, with sample sizes and numbers of sites matching those of the original data. The ABC analyses were performed using R scripts generously provided by M. 
Beaumont (available at http://www.rubic.rdg.ac.uk/ mab/stuff/). Posterior densities, including modes and highest posterior density intervals, for the estimated parameters were computed using the local likelihood method of Loader (1999) as implemented in the R library locfit.

Table 1. Levels of Nucleotide Polymorphism in Populus tremula.

| | Sites | Segregating sites | Watterson’s θ | Nucleotide diversity (π) | Tajima’s D^a | Fay and Wu’s H^a |
|---|---|---|---|---|---|---|
| All | 39,381 | 725 | 0.0049 | 0.0046 | −0.427 | −0.574 |
| Synonymous | 8,266 | 464 | 0.0129 | 0.0120 | −0.173 | −0.459 |
| Replacement | 30,115 | 261 | 0.0022 | 0.0017 | −0.648 | −0.271 |

^a Average across loci.

To assess the fit of the model estimated from the ABC analyses, I performed posterior predictive simulations (Gelman et al. 2004). The posterior predictive simulations were performed by generating a large number of data sets (10⁵) from the posterior distributions of the parameters of the model. If the model and the estimated parameters show a good fit to the data, simulated data should approximate the observed data. The simulated data sets were summarized using the same summary statistics that were used for the original data set, and the distributions of these summary statistics were compared with the corresponding values calculated from the original data. Posterior predictive simulations constitute an important “self-consistency check” of the model (Gelman et al. 2004). I also evaluated the fit of the recurrent hitchhiking model and the bottleneck model from Ingvarsson (2008b) by assessing the fit between the observed frequency spectrum and that obtained from the posterior predictive simulations. The fit of the model was assessed using the χ² discrepancy quantity defined as (Gelman et al.
2004)

$$\chi^2 = \sum_i \frac{\left(y_i - E(y_i \mid \Lambda)\right)^2}{\mathrm{Var}(y_i \mid \Lambda)},$$

where yi are the observed values of the different classes of the frequency spectrum and where E(yi | Λ) and Var(yi | Λ) are the expected values and variances of the frequency spectrum classes obtained from posterior predictive simulations of the corresponding model (Λ).

Results and Discussion

Selection on Synonymous Codon Usage

Levels of polymorphism at synonymous and nonsynonymous sites are summarized in table 1 together with Tajima’s D (Tajima 1989) and Fay and Wu’s H (Fay and Wu 2000), which capture different aspects of the frequency spectrum. A total of 811 segregating sites were identified, of which 263 were singletons. About 20% of all sites screened were synonymous positions, but the majority of the segregating sites identified were still synonymous (table 1). In P. tremula, the average GC content at coding sites and at third codon positions is 0.461 and 0.432, respectively. This is significantly higher than the average GC content of surrounding noncoding regions, which equals 0.377 (Wilcoxon signed rank test, P < 0.001 for both comparisons; see also Ingvarsson 2007). The lower GC content in noncoding regions suggests an overall bias of GC to AT mutations in P. tremula. Assuming that the GC content in noncoding regions reflects only the effects of (biased) mutation yields an estimate of the mutational bias from GC to AT (κ) of 1.653. The correlations between GC content in coding regions or third codon positions and the GC content at noncoding sites are weak and slightly negative (Spearman rank correlation, ρ = −0.188 and ρ = −0.142, respectively), demonstrating that differences in GC content between genes are not caused by regional differences in the mutation rates between AT and GC. The GC content in coding regions is also higher in P. tremula than in the outgroup species P. trichocarpa, both across all sites in the coding region and specifically at third codon positions.
Although the difference in GC content between P. tremula and P. trichocarpa is slight (0.2% for all coding sites and 0.05% for third codon positions), GC content is significantly greater in P. tremula across genes (Wilcoxon signed rank test, P = 0.017 and P = 0.021, for total GC content and GC content of third codon positions, respectively). There is, however, no difference between P. tremula and P. trichocarpa in the GC content at noncoding sites (Wilcoxon signed rank test, P = 0.308). In Populus, most preferred, or optimal, codons end in G or C (Ingvarsson 2007, 2008a), and the higher GC content observed in P. tremula translates into stronger codon bias, with P. tremula having a higher degree of codon bias than P. trichocarpa (Wilcoxon signed rank test, P < 0.001 and ∆F̄op = 0.021). This confirms previous results (Ingvarsson 2008a), which have documented an excess of fixations of unpreferred to preferred (U → P) mutations in the P. tremula lineage, compared with other lineages in Populus. The accumulation of U → P over P → U mutations in P. tremula is consistent with stronger selection on synonymous codon usage in P. tremula (Ingvarsson 2008a). Why selection on synonymous codon usage is stronger in P. tremula is at present an open question, but one plausible explanation is an increase in the effective population size of P. tremula compared with the ancestral population size of Populus and compared with many other extant species in the genus (Ingvarsson 2008a). From equation (1), it is clear that differences in codon bias between two closely related species can be driven by either changes in the strength of selection acting on codon usage (γ = 2Ne s in eq. (1)) or changes in the mutational bias between the preferred and the unpreferred sites (κ in eq. (1)). These alternative explanations have quite different effects on both the direction and the magnitude of the expected change in Fop for genes with different degrees of codon bias (see fig.
1 in Akashi 1996). When changes in codon bias are driven by mutational biases alone, both low and medium bias genes are expected to show similar changes in codon bias (Akashi 1996) and only genes that experience strong selection for preferred codons, and where levels of codon bias are already high, will show little change in codon bias. On the other hand, if there is a systematic change in the strength of selection acting on synonymous codons, for instance, if the effective population size of a species is increased or reduced, genes with intermediate levels of codon bias will not show much change in codon bias. Genes with high or low levels of codon bias will, however, show progressively greater change but in opposite directions (Akashi 1996). Again, genes with very high levels of codon bias are not expected to show much change simply because these genes are already showing a near complete use of optimal codons. Figure 1 plots differences in codon bias between P. tremula and P. trichocarpa (∆Fop) versus the observed level of codon bias in P. trichocarpa (Fop) for the 77 loci used in this study. There is a strong positive correlation between Fop and ∆Fop (ρ = 0.576), as expected when differences in codon bias between the two species are driven by a change in the strength of selection rather than by a change in the mutational bias between preferred and unpreferred codons (Akashi 1996). To formally test these two alternative hypotheses, I fit a model including shifts in both γ and κ in equation (1) to the codon bias data from P. tremula and P. trichocarpa (dotted line in fig. 1). A model that includes a shift in the mutational bias only (dashed line in fig. 1) provides a significantly worse fit to the data (likelihood ratio test: 2∆L = 41.64, df = 1, P < 0.001). However, a model including only a shift in the strength of selection (solid line in fig. 1) is statistically indistinguishable from the full model (likelihood ratio test: 2∆L = 0.031, df = 1, P = 0.99), suggesting that changes in codon bias between P. tremula and P. trichocarpa are consistent with an increased strength of selection acting on synonymous codon usage in P. tremula. The ML estimate of the increase in γ is 1.574, and, assuming that selection on synonymous codons is stronger because of an increase in Ne in P. tremula, this suggests that the effective population size in P. tremula has increased by around 60% compared with P. trichocarpa. Current levels of nucleotide polymorphism in P. tremula are roughly two to three times higher than in P. trichocarpa, however (Gilchrist et al. 2006; Ingvarsson 2008b), suggesting a much greater difference in Ne between the two species. The approach to mutation–selection–drift equilibrium for codon bias is quite slow, however, on the order of the reciprocal of the mutation rate, so the discrepancy in relative Ne between P. tremula and P. trichocarpa based on changes in codon bias and on differences in standing levels of nucleotide polymorphism can likely be explained by the fact that codon bias has not yet reached equilibrium in P. tremula.

FIG. 1. Changes in codon usage between Populus tremula and Populus trichocarpa for genes with different degrees of codon bias. The solid line gives the best fitting nonlinear regression of Fop on ∆Fop under the assumption that changes in codon bias between the two species are driven by a change in Ne (from eq. (1)). The dotted line that is almost superimposed on the solid line is the best fitting model when changes in codon bias are caused by changes in both Ne and mutational bias. The dashed line gives the best fitting regression under the assumption that changes in codon bias are entirely caused by a change in the mutational bias from GC to AT. See text for further detail on the two models. The horizontal dotted line indicates the zero line, at which there is no net change in Fop between the two species. The vertical dotted line is the equilibrium value of Fop in the absence of selection on codon usage, that is, Fop calculated from the average GC content in noncoding regions in P. tremula. Both the correlation between Fop and ∆Fop (ρ = 0.557, P < 0.001) and the nonlinear regression (t = 20.1, df = 1, P < 0.001) are significant.

Selection on Nonsynonymous Mutations

Assuming that nonsynonymous mutations are either neutral or strongly deleterious, the ratio of the numbers of polymorphic nonsynonymous and synonymous sites (PA/PS) should equal the ratio of the numbers of fixed differences at nonsynonymous and synonymous sites (DA/DS). However, if a proportion of nonsynonymous mutations are advantageous, they will rapidly be fixed by natural selection and result in an excess of nonsynonymous fixations, giving DA/DS > PA/PS. Several studies have used this framework to estimate the proportion of nonsynonymous substitutions that have been fixed by positive natural selection (denoted α; see Eyre-Walker 2006 for a review). I used the ML method of Bierne and Eyre-Walker (2004), as implemented in the DoFE program (available at http://www.lifesci.sussex.ac.uk/home/ Adam Eyre-Walker/Software files/DoFE-all 1.zip) to calculate the proportion of amino acid substitutions driven to fixation by positive selection in P. tremula, yielding an estimate of α of 0.159 with a 95% confidence interval of −0.079 to 0.340. However, a substantial proportion of all nonsynonymous mutations that are segregating in natural populations are likely slightly deleterious, are therefore unlikely to drift to fixation, and will thus not contribute to the divergence between species (Charlesworth 1996; Eyre-Walker 2006).
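The comparison of DA/DS with PA/PS described above can be turned into a simple point estimate of α. The sketch below, in Python, uses the common McDonald–Kreitman-type estimator α = 1 − (DS·PA)/(DA·PS) computed from pooled counts. This is a stand-in for illustration and not the ML method implemented in DoFE; the function names and all counts are hypothetical, and the 15% cutoff mirrors the frequency threshold discussed in the text that follows.

```python
def alpha_mk(d_a, d_s, p_a, p_s):
    """Point estimate of the proportion of adaptive nonsynonymous
    substitutions: alpha = 1 - (D_S * P_A) / (D_A * P_S).

    d_a, d_s: fixed nonsynonymous / synonymous differences between species.
    p_a, p_s: nonsynonymous / synonymous polymorphisms within the species.
    """
    if d_a == 0 or p_s == 0:
        raise ValueError("alpha is undefined when D_A or P_S is zero")
    return 1.0 - (d_s * p_a) / (d_a * p_s)


def drop_low_frequency(freqs, cutoff=0.15):
    """Keep only polymorphisms whose frequency is at least `cutoff`, a
    simple way to exclude rare, possibly slightly deleterious variants."""
    return [f for f in freqs if f >= cutoff]


# Hypothetical counts: a twofold excess of nonsynonymous divergence
# relative to the neutral expectation gives alpha = 0.5.
example_alpha = alpha_mk(d_a=100, d_s=200, p_a=50, p_s=200)
```

When DA/DS equals PA/PS the estimate is zero, and an excess of nonsynonymous polymorphism (for instance from slightly deleterious variants) pushes it below zero, which is why estimates of α can have confidence intervals that include negative values.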
These slightly deleterious mutations will to some extent mask the signature of adaptive evolution by introducing a negative bias into estimates of α. To deal with this bias, Charlesworth and Eyre-Walker (2008) suggested that many or even most of these unconditionally deleterious mutations would be excluded if mutations with a frequency below 15% are ignored when estimating α. If low-frequency variants (<15%) are excluded, the estimate of α in P. tremula is increased from 0.159 to 0.301, with a 95% confidence interval of 0.076–0.476. The data thus suggest that roughly 30% of all nonsynonymous substitutions that have been fixed since the divergence of P. tremula and P. trichocarpa have been driven to fixation by positive selection. The methods used to estimate α all assume that synonymous mutations accumulate through genetic drift only. However, the recent evolutionary history of P. tremula is characterized by an increased rate of fixation of preferred mutations at synonymous sites (Ingvarsson 2008a). This will reduce the DA/DS ratio compared with the situation when synonymous mutations are strictly neutral, resulting in an underestimation of α. If the increase in selection on synonymous mutations is driven by an increase in Ne, which appears to be likely in P. tremula (Ingvarsson 2008a; fig. 1), it is conceivable that weakly selected nonsynonymous mutations have also experienced an increased rate of fixation due to stronger selection. As the approach to mutation–selection–drift equilibrium for weakly selected mutations is rather slow, it is not clear how these nonequilibrium processes affect the estimation of the proportion of substitutions that are fixed by positive selection. Because many species have likely experienced changes in Ne during their genealogical history, this is clearly something that warrants further study. The proportion of adaptive substitutions in P.
tremula is slightly lower than estimates from Drosophila, where different studies have suggested that between 40% and 95% of all mutations are fixed by positive selection (Wright and Andolfatto 2008). The P. tremula estimate of α is, however, substantially greater than estimates from Arabidopsis, where there is instead a general excess of nonsynonymous polymorphism (Bustamante et al. 2002; Foxe et al. 2008). The lack of adaptively driven substitutions in Arabidopsis thaliana was initially credited to the highly selfing mating system of A. thaliana (Bustamante et al. 2002). However, recent studies have shown that the obligately outcrossing relative Arabidopsis lyrata also shows an excess of nonsynonymous polymorphisms, suggesting that the Arabidopsis data are not explained by mating system effects alone (Foxe et al. 2008). A potential factor influencing the fixation of mutations by positive selection in plants, like Arabidopsis, is strong population structure. If natural selection varies across local populations, few mutations will spread throughout the species as they are not unconditionally favorable. Although population structure is pronounced in both A. lyrata and A. thaliana, there is still relatively little evidence to suggest that selection on nonsynonymous mutations within local populations differs markedly from species-wide patterns (Foxe et al. 2008). Nevertheless, it is interesting to note that although population structure is not completely absent in P. tremula, it is generally quite low (Hall et al. 2007). Population structure could thus be an important factor explaining the low estimates of α in A. thaliana and A. lyrata (Wright and Andolfatto 2008).

Effects of Selection on Levels of Synonymous Polymorphism

The interplay between weak selection for synonymous codon usage and intraspecific polymorphism is quite complex.
In the absence of a mutation bias, intraspecific nucleotide diversity declines with increasing selection on synonymous codons (McVean and Charlesworth 1999). However, with a mutation bias from preferred to unpreferred codons, nucleotide diversity is actually highest at genes experiencing intermediate strengths of selection on synonymous codon usage, and the expected correlation between codon bias and nucleotide diversity for genes with low to medium levels of codon bias is in fact positive (fig. 2 in McVean and Charlesworth 1999). The low GC content in noncoding regions in P. tremula suggests a mutation bias favoring GC to AT mutations, and because preferred codons in P. tremula end in G or C (Ingvarsson 2008a), this means that the mutation bias favors unpreferred codons. As predicted, nucleotide diversity at synonymous sites (θsyn) and codon bias (Fop) are positively correlated in P. tremula (ρ = 0.304 and P = 0.007). Depending on the degree of mutation bias, maximum nucleotide diversity will occur at a level of codon bias that is already quite high (McVean and Charlesworth 1999). Using estimates of κ = 1.653, Ne = 1.18 × 10^5, and µ = 3.75 × 10^−8 for P. tremula (see Ingvarsson 2008b), maximum nucleotide diversity is expected to occur at loci where the frequency of optimal codon usage is around 47%, which is very close to what is actually observed in P. tremula (fig. 2).

FIG. 2. Synonymous diversity (θsyn) corrected for synonymous divergence plotted against (A) nonsynonymous divergence (dN) and (B) codon bias (Fop).

Strong selection, on the other hand, should leave very distinct signatures in the amount and patterns of nucleotide polymorphism in the genome region surrounding the site of an adaptive substitution (Kaplan et al. 1989; Przeworski 2002; Kim and Nielsen 2004).
Regions linked to a site that has experienced a selective sweep are expected to show reduced levels of nucleotide polymorphism, an excess of high-frequency derived sites, and enhanced levels of linkage disequilibrium (Fay and Wu 2000; Kim and Stephan 2002). However, some of these effects are rapidly eradicated after a selective sweep has gone to completion. For instance, high-frequency derived sites will quickly drift to fixation and will no longer segregate in the population, while linkage disequilibrium will be broken down by recombination (Przeworski 2002). Loci selected at random from throughout the genome are therefore not expected to show the clear signatures that are characteristic of recent selective sweeps, even if these loci have experienced multiple selective sweeps during their genealogical history (Przeworski 2002).

What has become clear, however, is that recurrent hitchhiking does leave signatures that can be detected using genome-wide data on polymorphism (Przeworski 2002; Andolfatto 2007; Macpherson et al. 2007; Bachtrog 2008; Jensen et al. 2008). In regions experiencing a high rate of adaptive substitutions, recurrent hitchhiking events will reduce standing levels of polymorphism. If adaptive substitutions occur frequently enough, levels of polymorphism will not have time to recover between successive hitchhiking events, and levels of polymorphism in these regions can thus be substantially lower than what is expected under neutrality. This has led to the suggestion that one conspicuous pattern of recurrent hitchhiking is a negative correlation between the rate of adaptive substitutions (measured as the rate of protein evolution, dN) and standing levels of polymorphism at synonymous sites (θsyn), and such patterns have recently been observed in studies of Drosophila (Andolfatto 2007; Macpherson et al. 2007).
I also find a significant negative correlation between synonymous site diversity (θsyn) and the substitution rate at nonsynonymous sites (dN) in the data from P. tremula (ρ = −0.243 and P = 0.033). Although the negative correlation between dN and synonymous site diversity is predicted from models of recurrent hitchhiking, an alternative explanation is that the mutation rate varies among loci across the genome. There is evidence for mutation rate variation across the genome of P. tremula, as dN and dS are highly correlated across loci (ρ = 0.469 and P < 0.001). However, when accounting for mutation rate variation between loci, using divergence at synonymous sites (dS) as a covariate, the association between synonymous site diversity and the nonsynonymous substitution rate in fact becomes stronger, not weaker (partial correlation, ρC = −0.324, P = 0.004; fig. 2A). Recurrent hitchhiking thus appears to exert a pronounced influence on standing levels of synonymous nucleotide polymorphism in P. tremula.

Table 2. Multiple Regression of the Effects of Selection on Synonymous and Nonsynonymous Sites on Synonymous Site Diversity in Populus tremula.

Effect    β         t        P
dS        0.055     1.61     0.112
dN        −0.347    −2.50    0.015
Fop       0.025     2.19     0.032

The results presented here show that two distinct selective processes are affecting levels of synonymous polymorphism in P. tremula, as both selection on synonymous codon usage and protein evolution have significant effects on synonymous polymorphism. In an effort to disentangle the direct and indirect effects of selection on synonymous codon usage and protein evolution on levels of polymorphism at synonymous sites in P. tremula, I included both variables in a multiple regression. The results show that codon bias and rates of protein evolution both have significant effects on levels of synonymous polymorphism (table 2) and that these effects work in opposite directions.
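The partial rank correlation described above, between θsyn and dN with dS held constant, can be sketched as follows. This is a minimal illustration and not the analysis code used in the paper: the per-locus values are hypothetical, the function names are invented for the example, and the first-order partial correlation formula applied to Spearman coefficients is an assumption about how such a statistic might be computed.

```python
from math import sqrt

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def partial_spearman(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical per-locus values, for illustration only (not data from the paper)
theta_syn = [0.012, 0.020, 0.008, 0.010, 0.015, 0.018]
d_n = [0.010, 0.004, 0.015, 0.007, 0.012, 0.005]
d_s = [0.050, 0.045, 0.060, 0.040, 0.055, 0.048]

raw = spearman(theta_syn, d_n)
adjusted = partial_spearman(theta_syn, d_n, d_s)
print(f"raw rho = {raw:.3f}, partial rho = {adjusted:.3f}")
```

In this toy data set the partial correlation is more negative than the raw one, which mirrors the direction of the change reported above; the magnitudes carry no biological meaning.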
High rates of protein evolution are associated with reduced levels of polymorphism at synonymous sites, whereas selection on synonymous codon usage increases polymorphism at synonymous sites (fig. 2).

At first glance, it may seem rather surprising that recurrent hitchhiking has such large effects on synonymous diversity, given that linkage disequilibrium does not extend more than a few hundred base pairs in P. tremula (Ingvarsson 2005; Ingvarsson 2008b). Similar results have been observed, however, in Drosophila (Andolfatto 2007; Macpherson et al. 2007), where LD decays over similar genomic scales. Recently, a study in Scots pine (Pinus sylvestris), another long-lived tree species, has also documented patterns of polymorphism that are suggestive of the action of recurrent selective sweeps (Palmé et al. 2008). In P. sylvestris, genes with high rates of protein evolution, measured as dN/dS, have both lower levels of polymorphism and a greater excess of low-frequency mutations. This study, however, is based on only nine loci, making it difficult to conclude whether this is truly a genome-wide phenomenon in P. sylvestris. Nevertheless, several studies have now shown that recurrent hitchhiking appears to be common and/or strong enough in many species that most sites, even in high recombination regions of the genome, are affected by linked selection.

One caveat is that a negative correlation between θsyn and dN is not conclusive evidence for recurrent hitchhiking reducing standing levels of nucleotide polymorphism. An alternative explanation is background selection, where the removal of weakly deleterious mutations through purifying selection reduces the local Ne of genome regions (Charlesworth et al. 1993).
A reduction in local Ne will result in both reduced levels of nucleotide diversity and potentially also in an increased rate of accumulation of slightly deleterious mutations, thereby creating an apparent correlation between synonymous diversity and nonsynonymous substitution rates. Variation in the strength of background selection is, at least partly, tied to variation in local recombination rates, and the effects of background selection are expected to be most pronounced in regions with little or no crossing over (Innan and Stephan 2003). There is no apparent correlation between dN and sequence-based estimates of the recombination rate in P. tremula estimated using the method of Hudson (2001; ρ = −0.007 and P = 0.95), suggesting that local variations in Ne are not a likely explanation for the negative correlation between dN and θsyn. Background selection should also affect synonymous substitution rates, but when variation in dS is taken into account, the negative relationship between θsyn and dN is actually stronger (fig. 2). Furthermore, in Drosophila, regions with low but nonzero recombination rates have rates of protein evolution similar to those of regions with normal levels of recombination (Betancourt and Presgraves 2002), suggesting that background selection has only minor effects on rates of protein evolution in regions with nonzero recombination rates.

Adaptive Substitutions and Interference with Selection on Synonymous Codon Usage

Studies in Drosophila have shown that recurrent hitchhiking interferes with weak selection for synonymous codon usage (Betancourt and Presgraves 2002; Andolfatto 2007), resulting in a negative association between rates of protein evolution and codon bias. I find a similar negative relationship between dN and Fop in P. tremula, although this association is not significant (ρ = −0.120 and P = 0.30).
Nevertheless, the negative relationship between dN and Fop hints at a reduction in the efficacy of selection at synonymous sites in rapidly evolving genes in P. tremula. However, in contrast to Drosophila melanogaster, where selection on synonymous codons appears to be rather weak or absent (Akashi 1996; Nielsen et al. 2007), selection on synonymous codons is pervasive in P. tremula (Singh et al. 2007; Ingvarsson 2008a). It is possible that ongoing selection on synonymous codon usage partly mitigates the negative effects of recurrent hitchhiking and helps explain why the relationship between dN and Fop is so weak in P. tremula.

Moreover, codon bias and rates of protein evolution are both linked to patterns of gene expression in P. tremula (Ingvarsson 2007), and this is also true in many other organisms (Akashi 2001; Wright et al. 2004; Lemos et al. 2005). Codon bias and protein evolution are influenced by different aspects of gene expression in P. tremula, with the level of gene expression being the main force shaping selection on codon usage, whereas protein evolution is largely determined by expression breadth, that is, how many tissue types a gene is expressed in (rather than maximum expression level per se; Ingvarsson 2007). Expression level and expression breadth are correlated in P. tremula (ρ = 0.471, P < 0.001; Ingvarsson 2007), and it is plausible that the negative correlation observed between dN and Fop could arise because these variables are both correlated with a common underlying variable, gene expression. When factoring out the common correlation with gene expression (measured either by expression level or by expression breadth), the partial correlation between dN and Fop is drastically reduced both in the 77-locus data set used in this paper (ρpartial = −0.061 and P = 0.605) and in the much larger data set I have examined earlier (Ingvarsson 2007; ρpartial = −0.055 and P = 0.198).
This suggests that Hill–Robertson interference has little effect on the efficiency of selection on synonymous codon usage in P. tremula.

FIG. 3. Posterior densities from the ABC analyses. The gray dot gives the joint maximum a posteriori (MAP) estimates of the parameters, whereas dotted lines denote the marginal MAPs. The contour lines denote the quartiles of the bivariate posterior distribution.

Estimating Hitchhiking Parameters

Under a model of recurrent selective sweeps, the effect on polymorphism at linked sites is a function of the genome-wide rate of fixations of beneficial mutations, 2Neλ, and the average strength of selection acting on these mutations, s (Wiehe and Stephan 1993). I used coalescent simulations with recurrent selective sweeps to estimate the product of the genome-wide rate of selective sweeps and the average strength of selection on positively selected mutations (2Neλs), as well as the scaled mutation rate (θ = 4Neµ), using an ABC approach (Beaumont et al. 2002). The results from the analyses are summarized in figure 3. The posterior mode of θ = 4Neµ for the recurrent hitchhiking model is 0.0182, and a comparison of this estimate with the observed synonymous site diversity (θsyn = 0.0129) shows that recurrent hitchhiking reduces levels of polymorphism by, on average, 29%. A reduction of genetic diversity of 29% is slightly greater than what Andolfatto (2007) found in a study of 137 genes from D. melanogaster (20%). It is, however, substantially lower than the estimate of 50% provided by Jensen et al. (2008), who applied the same method that I have used in this paper to the data set from Andolfatto (2007). The posterior mode for 2Neλs equals 1.54 × 10^−7, which is in line with estimates from Drosophila (reviewed in Jensen et al. 2008).
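The 29% figure follows from comparing the posterior mode of θ with the observed synonymous diversity. The short sketch below reproduces that arithmetic using the two values quoted in the text; the function name is introduced here purely for illustration.

```python
def diversity_reduction(theta_expected, theta_observed):
    """Proportional reduction of observed diversity relative to the expected value."""
    if theta_expected <= 0:
        raise ValueError("theta_expected must be positive")
    return 1.0 - theta_observed / theta_expected

theta_posterior_mode = 0.0182  # posterior mode of theta = 4*Ne*mu (value from the text)
theta_syn_observed = 0.0129    # observed synonymous site diversity (value from the text)

reduction = diversity_reduction(theta_posterior_mode, theta_syn_observed)
print(f"{reduction:.1%}")  # prints 29.1%
```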
The effects of the rate of sweeps (2Neλ) and the strength of selection (s) are usually confounded because similar results can be obtained by assuming either strong and rare selection or weak and common selection. However, even though strong and rare or weak and common selection have similar effects on average levels of polymorphism, the variance in polymorphism across different genome regions is greatly increased when selection is strong but rare, and this can in principle be exploited to provide separate estimates of 2Neλ and s (Jensen et al. 2008). The data in this paper, with many but rather short genome regions, are not well suited for separate estimation of these parameters, and there is thus still a great deal of uncertainty as to whether selection is generally rare and strong or common but weak in P. tremula. It is worth pointing out that when the same data are analyzed using the method of Andolfatto (2007), the parameter estimates are in the same range as those presented here, suggesting that the results are rather insensitive to the actual method used. As more data on nucleotide polymorphism accumulate in P. tremula, it should be possible to shed more light on this question.

FIG. 4. Histograms showing the distribution of summary statistics from the posterior predictive simulations. Observed values from the data set are shown by dotted vertical lines.

It is clear from the posterior predictive simulations that the selection parameters estimated from the ABC analyses cannot completely reproduce the patterns of sequence variation seen in P. tremula (fig. 4). There is a large excess of high-frequency derived sites in P. tremula that apparently cannot be explained by the recurrent hitchhiking model (fig. 4).
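The logic of a posterior predictive check like the one described above can be illustrated with a deliberately simplified toy. Everything in the sketch is an assumption made for illustration: the posterior draws are hypothetical, the simulator is a neutral stand-in that draws the number of segregating sites from a Poisson distribution with mean θ·L·a_n (ignoring linkage and selective sweeps), and the sample size, locus length, and observed value are invented. It is not the coalescent simulation with recurrent sweeps used in the paper.

```python
import random
from math import exp

def poisson(mean, rng):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small means)."""
    limit, k, p = exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def harmonic(n):
    """Watterson's constant a_n = sum of 1/i for i = 1..n-1."""
    return sum(1.0 / i for i in range(1, n))

def posterior_predictive_tail(observed, posterior_draws, simulate, rng):
    """Fraction of simulated statistics that are at least as large as the observed one."""
    sims = [simulate(theta, rng) for theta in posterior_draws]
    return sum(s >= observed for s in sims) / len(sims)

rng = random.Random(1)
n_seq, n_sites = 20, 500  # hypothetical sample size and locus length

# Hypothetical posterior draws of per-site theta (stand-in for ABC output)
posterior = [rng.uniform(0.010, 0.016) for _ in range(2000)]

def simulate_segregating_sites(theta, rng):
    # Toy neutral simulator: S ~ Poisson(theta * L * a_n); no linkage, no sweeps.
    return poisson(theta * n_sites * harmonic(n_seq), rng)

observed_s = 30  # hypothetical observed number of segregating sites
tail = posterior_predictive_tail(observed_s, posterior, simulate_segregating_sites, rng)
print(f"posterior predictive tail probability: {tail:.3f}")
```

A tail probability close to 0 or 1 would indicate that the fitted model rarely reproduces the observed statistic, which is the kind of mismatch that figure 4 displays for high-frequency derived sites.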
Even though selective sweeps are known to result in a distorted frequency spectrum with an excess of high-frequency derived sites, this signature of selection is quickly eradicated, usually within 0.1Ne generations following the sweep (Przeworski 2002). Randomly selected loci are therefore not expected to show a pronounced excess of high-frequency derived sites unless the genome-wide rate of selective sweeps is high enough that most loci are expected to have experienced a selective sweep recently (Przeworski 2002).

The recurrent hitchhiking model used in the ABC analyses assumes that the demography of the species conforms to the standard neutral model. Earlier studies, using some of the same data used in this paper, have suggested that P. tremula has gone through a modest bottleneck about 3–500 kya (Ingvarsson 2008b). Both nonequilibrium demography and recurrent selective sweeps are expected to generate many similar patterns of genetic variation, and it is not immediately clear how to separate the effects of selection and demography. To evaluate these two models, I compared the fit of the frequency spectrum from posterior predictive simulations with the frequency spectrum from the original data. Both the recurrent hitchhiking model and the bottleneck model from Ingvarsson (2008b) show reasonably good fits to the observed data (χ² = 0.627 and χ² = 2.058 for the hitchhiking and bottleneck models, respectively), with the recurrent hitchhiking model providing a marginally better fit. Although relatively simple demographic scenarios appear to explain patterns of intraspecific polymorphism in P. tremula (Ingvarsson 2008b), it is harder to envision a demographic explanation for the strong negative relationship observed between synonymous site diversity and rates of protein evolution.
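A comparison of an observed site frequency spectrum with a model expectation, of the kind summarized by the χ² values above, might be sketched as follows. The exact binning and statistic used in the paper are not specified here, so this is only one plausible form. The observed counts are hypothetical, and the standard neutral expectation (proportional to 1/i for i derived copies) stands in for the model-based spectra that would normally come from posterior predictive simulations.

```python
def neutral_sfs_proportions(n):
    """Expected share of segregating sites with i derived copies (i = 1..n-1)
    under the standard neutral model, where the expectation is proportional to 1/i."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]

def chi_square(observed_counts, expected_props):
    """Pearson chi-square statistic between observed counts and expected proportions."""
    if len(observed_counts) != len(expected_props):
        raise ValueError("spectra must have the same number of classes")
    total = sum(observed_counts)
    return sum(
        (obs - total * prop) ** 2 / (total * prop)
        for obs, prop in zip(observed_counts, expected_props)
    )

# Hypothetical unfolded spectrum for a sample of n = 6 sequences, with an
# excess of high-frequency derived sites in the last class (illustration only)
observed = [40, 22, 12, 10, 16]
expected = neutral_sfs_proportions(6)

stat = chi_square(observed, expected)
print(f"chi-square = {stat:.3f}")
```

To compare two candidate models, the same function could be applied with each model's simulated spectrum as `expected_props`; the model with the smaller statistic fits the observed spectrum more closely.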
These observations suggest that both demographic and selective processes could have important consequences for patterns of nucleotide polymorphism in P. tremula.

Conclusions

Polymorphism data from a number of long-lived tree species have been shown to be consistent with a bottleneck sometime in their genealogical history (Heuertz et al. 2006; Pyhäjarvi et al. 2007; Ingvarsson 2008b). This study and a recent study in P. sylvestris show that recurrent hitchhiking can also have substantial effects on synonymous diversity. The presence of natural selection will bias inference of demographic parameters from polymorphism data, just as demography will bias estimates of natural selection. It is possible, and straightforward, to extend methods like the ABC analyses used in this paper to also include nonequilibrium demography, such as population growth and/or bottlenecks. However, the number of parameters in the model rapidly increases with increasing model complexity, and progressively more data are needed to reliably estimate the parameters in the model. Jensen et al. (2008) suggested using longer contiguous sequence regions as a way to obtain separate estimates of s and 2Neλ. Utilizing long contiguous genome regions increases the variance in summary statistics across loci, and this is the crucial aspect of the data that allows for the separate estimation of s and 2Neλ (Macpherson et al. 2007; Jensen et al. 2008). It is conceivable that using this approach would also increase the likelihood of differentiating between selective and demographic effects, but more work is clearly needed to verify this. With an increased reliance on next-generation sequencing technologies, it is not too far-fetched to imagine that complete or near-complete genome-wide data from a relatively large number of individuals will be available in the near future.
This will allow for truly genome-wide population genetic analyses of the combined effects of natural selection and demography (see Macpherson et al. 2007, for such an analysis). One thing that would be extremely useful to incorporate into future analyses is the negative relationship between synonymous polymorphism and the rate of protein evolution that this and other recent studies have documented. Such a pattern is hard to explain by neutral processes alone, and by incorporating it in ABC simulations it might be possible to extract important information on both the strength of selection and the rate at which selective sweeps occur across the genome (Jensen et al. 2008; Sella et al. 2009). Developing methods to allow for the joint estimation of selection and demography from nucleotide data should therefore have high priority in the future (see Li and Stephan 2006, for such an example). It is clear that the relative importance of demography and natural selection for shaping patterns of nucleotide polymorphism in P. tremula, and in other species, is something that will be worth revisiting in the future as more genome-wide data accumulate.

Acknowledgments

This study has been funded by grants from the Swedish Research Council to V.R.

References

Akashi H. 1996. Molecular evolution between Drosophila melanogaster and D. simulans: reduced codon bias, faster rates of amino acid substitution, and larger proteins in D. melanogaster. Genetics 144:1297–1307. Akashi H. 2001. Gene expression and molecular evolution. Curr Opin Genet Dev. 11:660–666. Akashi H, Schaeffer SW. 1997. Natural selection and the frequency distributions of “silent” DNA polymorphism in Drosophila. Genetics 146:295–307. Andolfatto P. 2007. Hitchhiking effects of recurrent beneficial amino acid substitutions in the Drosophila melanogaster genome. Genome Res. 17:1755–1762. Bachtrog D. 2008. Similar rates of protein adaptation in Drosophila miranda and D.
melanogaster, two species with different current effective population sizes. BMC Evol Biol. 8:334. Baudry E, Depaulis F. 2003. Effect of misoriented sites on neutrality tests with outgroup. Genetics 165:1619–1622. Baudry E, Kerdelhue C, Innan H, Stephan W. 2001. Species and recombination effects on DNA variability in the tomato genus. Genetics 158:1725–1735. Beaumont MA, Zhang W, Balding DJ. 2002. Approximate Bayesian computation in population genetics. Genetics 162:2025–2035. Begun DJ, Aquadro CF. 1992. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature 356:519–520. Betancourt AJ, Presgraves DC. 2002. Linkage limits the power of natural selection in Drosophila. Proc Natl Acad Sci USA. 99: 13616–13620. Bierne N, Eyre-Walker A. 2004. The genomic rate of adaptive amino acid substitution in Drosophila. Mol Biol Evol. 21:1350–1360. Bulmer M. 1991. The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897–907. Bustamante CD, Nielsen R, Sawyer SA, Olsen KM, Purugganan MD, Hartl DL. 2002. The cost of inbreeding in Arabidopsis. Nature 416:531–534. Charlesworth B. 1996. Background selection and patterns of genetic diversity in Drosophila melanogaster. Genet Res. 68:131–149. Charlesworth B, Morgan MT, Charlesworth D. 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289–1303. Charlesworth J, Eyre-Walker A. 2008. The Mcdonald-Kreitman test and slightly deleterious mutations. Mol Biol Evol. 25:1007–1015. Comeron JM, Guthrie TB. 2005. Intragenic Hill-Robertson interference influences selection intensity on synonymous mutations in Drosophila. Mol Biol Evol. 22:2519–2530. Coop G, Griffiths RC. 2004. Ancestral inference on gene trees under selection. Theor Popul Biol. 66:219–232. Cutter AD, Charlesworth B. 2006. Selection intensity on preferred codons correlates with overall codon usage bias in Caenorhabditis remanei. Curr Biol. 16:2053–2057. Cutter AD, Payseur BA. 
2003. Selection at linked sites in the partial selfer Caenorhabditis elegans. Mol Biol Evol. 20:665–673. Ewing B, Hillier L, Wendl M, Green P. 1998. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8:175–185. Eyre-Walker A. 2006. The genomic rate of adaptive evolution. Trends Ecol Evol. 21:569–575. Fay J, Wu CI. 2000. Hitchhiking under positive Darwinian selection. Genetics 155:1405–1413. Foxe JP, Dar V, Zheng H, Nordborg M, Gaut BS, Wright SI. 2008. Selection on amino acid substitutions in Arabidopsis. Mol Biol Evol. 25:1375–1383. Gelman A, Carlin JB, Stern HS, Rubin DB. 2004. Bayesian data analysis. 2nd ed. Boca Raton (FL): Chapman and Hall/CRC Press. Gilchrist EJ, Haughn GW, Ying CC, et al. (14 co-authors). 2006. Use of Ecotilling as an efficient SNP discovery tool to survey genetic variation in wild populations of Populus trichocarpa. Mol Ecol. 15:1367–1378. Gillespie JH. 1991. The causes of molecular evolution. Oxford (UK): Oxford University Press. Gillespie JH. 2000. Genetic drift in an infinite population: the pseudohitchhiking model. Genetics 155:909–919. Goldman N, Yang ZH. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 11:725–736. Gordon D, Abajian C, Green P. 1998. Consed: a graphical tool for sequence finishing. Genome Res. 8:195–202. Hall D, Luquez V, Garcia MV, St. Onge KR, Jansson S, Ingvarsson PK. 2007. Adaptive population differentiation in phenology across a latitudinal gradient in European aspen (Populus tremula, L.): a comparison of neutral markers, candidate genes and phenotypic traits. Evolution 61:2849–2860. Heuertz M, De Paoli E, Kallman T, Larsson H, Jurman I, Morgante M, Lascoux M, Gyllenstrand N. 2006. Multilocus patterns of nucleotide diversity, linkage disequilibrium and demographic history of Norway spruce [Picea abies (L.) Karst]. Genetics 174:2095–2105. Hill WG, Robertson A. 1966.
The effect of linkage on limits to artificial selection. Genet Res. 8:269–294. Hudson RR. 2001. Two-locus sampling distributions and their application. Genetics 159:1805–1817. Ikemura T. 1985. Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol. 2:13–34. Ingvarsson PK. 2005. Nucleotide polymorphism and linkage disequilbrium within and among natural populations of European aspen (Populus tremula L., Salicaceae). Genetics 169:945–953. Ingvarsson PK. 2007. Gene expression and protein length influence codon usage and rates of sequence evolution in Populus tremula. Mol Biol Evol. 24:836–844. Ingvarsson PK. 2008a. Molecular evolution of synonymous codon usage in Populus. BMC Evol Biol. 8:307. doi:10.1186/1471-2148-8-307. Ingvarsson PK. 2008b. Multilocus patterns of nucleotide polymorphism and the demographic history of Populus tremula. Genetics 180:329–340. Innan H, Stephan W. 2003. Distinguishing the hitchhiking and background selection models. Genetics 165:2307–2312. Jensen JD, Thornton KR, Andolfatto P. 2008. An approximate Bayesian estimator suggests strong, recurrent selective sweeps in Drosophila. PLoS Genet. 4:e1000198. Kaplan NL, Hudson RR. 1989. The ’hitchhiking’ effect revisited. Genetics 123:887–899. Kelly JK. 1997. A test of neutrality based on interlocus associations. Genetics 146:1197–1206. Kim Y, Nielsen R. 2004. Linkage disequilibrium as a signature of selective sweeps. Genetics 167:1513–1524. Kim Y, Stephan W. 2002. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160:765–777. Kimura M. 1983. The neutral theory of molecular evolution. Cambridge (UK): Cambridge University Press. Lemos B, Bettencourt BR, Meiklejohn CD, Hartl DL. 2005. Evolution of proteins and gene expression levels are coupled in Drosophila and are independently associated with mRNA abundance, protein length and number of protein-protein interactions. Mol Biol Evol. 22:1345–1354. Li H, Stephan W. 2006. 
Inferring the demographic history and rate of adaptive substitution in Drosophila. PLoS Genet. 2:e166. Loader C. 1999. Local regression and likelihood. New York: Springer Verlag. Macpherson JM, Sella G, Davis JC, Petrov DA. 2007. Genomewide spatial correspondence between nonsynonymous divergence and neutral polymorphism reveals extensive adaptation in Drosophila. Genetics 177:2083–2099. Maside XL, Lee AWS, Charlesworth B. 2004. Selection on codon usage in Drosophila americana. Curr Biol. 14:150–154. Smith JM, Haigh J. 1974. The hitch-hiking effect of a favorable gene. Genet Res. 23:23–35. McVean GAT, Charlesworth B. 1999. A population genetic model for the evolution of synonymous codon usage: patterns and predictions. Genet Res. 74:145–158. McVean GAT, Charlesworth B. 2000. The effects of Hill-Robertson interference between weakly selected mutations on patterns of molecular evolution and variation. Genetics 155:929–944. Nickerson DA, Tobe VO, Taylor SL. 1997. Polyphred: automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing. Nucleic Acids Res. 25:2745–2751. Nielsen R, DuMont VLB, Hubisz MJ, Aquadro CF. 2007. Maximum likelihood estimation of ancestral codon usage bias parameters in Drosophila. Mol Biol Evol. 24:228–235. Ohta T. 1992. The nearly neutral theory of molecular evolution. Annu Rev Ecol Syst. 23:263–286. Palmé AE, Wright M, Savolainen O. 2008. Patterns of divergence among conifer ESTs and polymorphism in Pinus sylvestris identify putative selective sweeps. Mol Biol Evol. 25:2567–2577. Przeworski M. 2002. The signature of positive selection at randomly chosen loci. Genetics 160:1179–1189. Pyhäjarvi T, Garcia-Gil MR, Knurr T, Mikkonen M, Wachowiak W, Savolainen O. 2007. Demographic history has influenced nucleotide diversity in European Pinus sylvestris populations. Genetics 177:1713–1724. R Development Core Team. 2007. R: a language and environment for statistical computing.
Vienna (Austria): R Foundation for Statistical Computing. ISBN 3-900051-07-0. Sella G, Petrov DA, Przeworski M, Andolfatto P. 2009. Pervasive natural selection in the Drosophila genome? PLoS Genet. 5(6): e1000495. Singh ND, Bauer DuMont VL, Hubisz MJ, Nielsen R, Aquadro CF. 2007. Patterns of mutation and selection at synonymous sites in Drosophila. Mol Biol Evol. 24:2687–2697. Tajima F. 1983. Evolutionary relationship of DNA sequences in finite populations. Genetics 105:437–460. Tajima F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595. Thompson JD, Higgins DG, Gibson TJ. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680. Thornton K. 2003. libsequence: a C++ class library for evolutionary genetic analysis. Bioinformatics 19:2325–2327. Tuskan G, DiFazio S, Jansson S, Bohlman J, et al. (111 co-authors). 2006. The genome of black cottonwood Populus trichocarpa (Torr. & Gray). Science 313:1596–1604. Watterson GA. 1975. On the number of segregating sites in genetical models without recombination. Theor Popul Biol. 7:256–276. Wiehe TH, Stephan W. 1993. Analysis of a genetic hitchhiking model, and its application to DNA polymorphism data from Drosophila melanogaster. Mol Biol Evol. 10:842–854. Wright SI, Andolfatto P. 2008. The impact of natural selection on the genome: emerging patterns in Drosophila and Arabidopsis. Annu Rev Ecol Evol Syst. 29:193–213. Wright SI, Yau CBK, Looseley M, Meyers BC. 2004. Effects of gene expression on molecular evolution in Arabidopsis thaliana and Arabidopsis lyrata. Mol Biol Evol. 21:1719–1726. Yang Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 13:555–556. Zeng K, Fu YX, Shi S, Wu CI. 2006. 
Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics 174: 1431–1439.