Natural Selection on Synonymous and Nonsynonymous
Mutations Shapes Patterns of Polymorphism in Populus
tremula
Pär K. Ingvarsson*
Umeå Plant Science Centre, Department of Ecology and Environmental Science, Umeå University, Umeå, Sweden
*Corresponding author: E-mail: [email protected].
Associate editor: Carlos Bustamante
Abstract
Research article
One important goal of population genetics is to understand the relative importance of different evolutionary processes for
shaping variation in natural populations. Here, I use multilocus data to show that natural selection on both synonymous and
nonsynonymous mutations plays an important role in shaping levels of synonymous polymorphism in European aspen (Populus tremula). Previous studies have documented a preferential fixation of synonymous mutations encoding preferred codons
in P. tremula. The results presented here show that this has resulted in an increase in codon bias in P. tremula, consistent with
stronger selection acting on synonymous codon usage. In addition, positive selection on nonsynonymous mutations appears
to be common in P. tremula, with approximately 30% of all nonsynonymous substitutions having been fixed by positive selection. The
recurrent fixation of beneficial mutations also reduces standing levels of polymorphism, as evidenced by a significantly
negative relationship between the rate of protein evolution and synonymous site diversity. Finally, I use
approximate Bayesian methods to estimate the strength of selection acting on beneficial substitutions. These calculations
show that recurrent hitchhiking reduces polymorphism by, on average, 30%. The product of strength of selection acting on
beneficial mutations and the rate by which these occur across the genome (2Ne λs) equals 1.54 × 10^-7, which is in line with
estimates from Drosophila where recurrent hitchhiking has also been shown to have significant effects on standing levels of
polymorphism.
Key words: adaptive evolution, codon bias, natural selection, selective sweeps.
Introduction
Understanding how various population genetic processes
affect genome-wide patterns of polymorphism is one of the
main goals of population genetics. However, the relative
importance of various population genetics processes, such
as genetic drift and natural selection, has long been contentious (Kimura 1983; Gillespie 1991). The neutral or nearly
neutral theories propose that the overall pattern of DNA
sequence evolution is explained by a balance between
mutation, genetic drift, and purifying selection (Kimura
1983; Ohta 1992). Therefore, although mutations that
are influenced by positive selection exist and may have
great “biological” significance, the neutral or nearly neutral
theory suggests that these mutations are too rare to have
any “statistically” significant effect on the rate of protein
evolution in most organisms. Alternatively, if positive
selection plays a larger role, rates of protein evolution
would be determined by the beneficial mutation rate and
how quickly these mutations can spread and ultimately fix
in a species (Gillespie 1991).
It has been problematic to distinguish between these two
models based on available population genetics data, largely
because similar patterns of variation are predicted over a
large part of the parameter ranges of the two models. For instance, demographic processes, such as bottlenecks, growth,
and population subdivision, are known to result in patterns
of nucleotide diversity that are similar to those produced
by positive and negative selection. However, if positive
selection is more common than postulated by the neutral or
nearly neutral theories, many (or most) sites in the genome
would have experienced the effects of selection, either directly or indirectly through selection acting on linked sites,
through a process known as hitchhiking (Smith and Haigh
1974; Gillespie 2000; Sella et al. 2009). Under this scenario,
few sites would be truly neutral and the predictions from
the neutral or nearly neutral theory would not be applicable to most sites in the genome (Sella et al. 2009).
The action of positive selection is expected to leave
characteristic signatures, such as reduced levels of polymorphism and enhanced linkage disequilibrium in genome
regions surrounding the site where a selected substitution has occurred. Indirect effects of selection are thus
expected to be most pronounced in genome regions
where recombination is infrequent, and many studies have
demonstrated a correlation between levels of nucleotide
polymorphism and local recombination rates (e.g., Begun
and Aquadro 1992; Baudry et al. 2001; Cutter and Payseur
2003). However, recent studies have shown that
regions experiencing high recombination rates also show signs
of ongoing selection at linked sites. For instance, recent
studies in Drosophila have documented reduced levels
of synonymous polymorphism in rapidly evolving genes
(Andolfatto 2007; Macpherson et al. 2007; Bachtrog 2008),
a pattern that is expected under a model of recurrent
selective sweeps (Kaplan et al. 1989; Wiehe and Stephan
1993). Selection associated with recurrent selective sweeps
© The Author 2009. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please
e-mail: [email protected]
Mol. Biol. Evol. 27(3):650–660. 2010 doi:10.1093/molbev/msp255 Advance Access publication October 16, 2009
is usually thought to be strong (Ne s ≫ 1), but even
very weak natural selection (Ne s ∼ 1) can leave distinct
signatures that can be detected using data on polymorphism and divergence between species (McVean and
Charlesworth 1999). For instance, selection on synonymous
codons shapes the frequency spectrum of preferred and
unpreferred mutations of alternative synonymous codons
(Akashi and Schaeffer 1997; Cutter and Charlesworth 2006)
and results in the preferential fixation of preferred codons
in highly expressed genes (Maside et al. 2004).
Besides affecting polymorphism and divergence at putatively neutral sites, concurrent selection at multiple
linked sites reduces the efficacy of selection at all sites
through a process known as Hill–Robertson interference
(Hill and Robertson 1966; McVean and Charlesworth
2000). For instance, strong selection at linked sites reduces
the efficiency of selection at sites under weak selection,
such as selection on synonymous codons (Comeron and
Guthrie 2005; McVean and Charlesworth 2000). Studies
in Drosophila have demonstrated patterns consistent with
Hill–Robertson interference, with genes experiencing high
rates of protein evolution also showing signs of reduced
codon bias (Betancourt and Presgraves 2002; Andolfatto
2007; Bachtrog 2008).
In this paper, I show that there is ongoing selection at
both synonymous and nonsynonymous sites in the long-lived tree European aspen (Populus tremula). Despite the
long generation time of this species, selection at both synonymous and nonsynonymous sites appears to be common
and strong enough that it leaves clear signatures in patterns
of polymorphism at synonymous sites.
Materials and Methods
Plant Material, Selection of Loci, and DNA
Sequencing
The 77 loci used in this study are described in more
detail in Ingvarsson (2008b). Briefly, samples of 12 or 19
diploid individuals, representing 24 or 38 haploid genomes
of P. tremula , were taken from populations in Sweden,
Austria, and France (Ingvarsson 2005). Gene fragments,
ranging in size from 223 to 1,224 bp (mean length of
the sequenced fragments was 554 bp), were amplified
from diploid genomic DNA and directly sequenced on
Beckman CEQ8000 capillary sequencers at Umeå Plant
Science Centre. All fragments were sequenced in both
directions. Sequences were base called and assembled with
PHRED and PHRAP (Ewing et al. 1998), and heterozygous
bases were called with the Polyphred program (Nickerson
et al. 1997) and confirmed by visual inspection of the
corresponding trace files using the CONSED trace file
viewer (Gordon et al. 1998). Regions with missing or low-quality data were trimmed from all sequences. Multiple
sequence alignments were made using ClustalW
(Thompson et al. 1994) and adjusted manually using
BioEdit (http://www.mbio.ncsu.edu/BioEdit/bioedit.html).
For all gene fragments, the homologous region from
Populus trichocarpa was extracted from the publicly
available genome sequence (Tuskan et al. 2006) and
was used as the outgroup sequence and to annotate gene
fragments. Sequences described in this paper are available from GenBank/EMBL databases (accession numbers
EU752500–EU754117).
Population Genetic Analyses
All population genetic analyses were performed using computer programs based on the publicly available C++ class
library libsequence (Thornton 2003). Nucleotide diversity
was calculated from either the average pairwise differences
between sequences (π ; Tajima 1983) or the number of segregating sites (θW ; Watterson 1975). Diversity statistics were
also calculated separately for noncoding, silent, and replacement sites. The frequency spectrum of mutations was summarized using either Tajima’s D (Tajima 1989) or the
standardized version of Fay and Wu’s H (Fay and Wu 2000;
Zeng et al. 2006). The latter statistic requires the use of
an outgroup sequence so that mutations can be polarized
into ancestral or derived states (Fay and Wu 2000). For all
loci, the homologous region was extracted from the publicly available genome sequence from P. trichocarpa (Tuskan
et al. 2006).
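The two diversity estimators named above can be illustrated with a minimal Python sketch. This is not the libsequence-based code used in the study; the function names and the toy alignment are hypothetical, and the sketch assumes a gap-free alignment of equal-length haploid sequences.

```python
from itertools import combinations

def segregating_sites(seqs):
    """Count alignment columns that contain more than one distinct base."""
    return sum(1 for col in zip(*seqs) if len(set(col)) > 1)

def watterson_theta(seqs):
    """Watterson's estimator per site: S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i."""
    n, length = len(seqs), len(seqs[0])
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / (a_n * length)

def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)

# Hypothetical toy alignment: 4 haploid sequences, 10 sites.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
theta_w = watterson_theta(toy)
pi = nucleotide_diversity(toy)
```

Under neutrality and constant population size both quantities estimate the same parameter, so a systematic difference between them carries information about the frequency spectrum, which is what Tajima’s D summarizes.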
Synonymous and nonsynonymous nucleotide substitution rates per site were calculated for all genes using a randomly selected sequence from P. tremula and the outgroup
P. trichocarpa , as described in Ingvarsson (2007). Briefly,
substitution rates were calculated using the maximum likelihood (ML) method of Goldman and Yang (1994) implemented in the Codeml program from the PAML package
(version 3.15; Yang 1997). The estimation was performed
assuming transition/transversion bias and with codon frequencies calculated from average nucleotide frequencies
(F1x4).
It has been shown that misidentification of the ancestral
state of mutations can bias statistics that depend on
accurate polarization of mutations into ancestral and
derived states (e.g., Fay and Wu’s H ; Baudry and Depaulis
2003). However, the rate of misidentification is so low in
P. tremula (<1%) that it is likely to have only minor effects
on statistics that depend on an accurate identification of
ancestral states, such as Fay and Wu’s H (Ingvarsson 2008b).
Codon Bias and Selection on Synonymous Codons
Optimal codons for Populus have been identified previously (Ingvarsson 2007, 2008a), and these were used
to calculate the frequency of optimal codons (Fop ;
Ikemura 1985) using the program codonw (version 1.4.2;
http://www.codonw.sourceforge.net/) both for P. tremula
and for the homologous gene in P. trichocarpa . I also used
codonw to calculate the GC content for the complete
coding region and for third codon positions (GC3s) for all
genes, again both for P. tremula and for P. trichocarpa . The
GC content for noncoding regions was obtained from a
previous study (Ingvarsson 2007).
Codon bias is generally believed to be determined by
a balance between mutation, genetic drift, and natural
selection. Assuming independence among sites and no
dominance, the steady-state proportion of optimal codons
used in a coding sequence is given by Bulmer (1991) and
Akashi (1996)
\[ F_{op}(\gamma, \kappa) = \frac{1}{1 + \kappa e^{-2\gamma}}, \tag{1} \]
where κ = ν/µ is the ratio of mutation rates from and
to optimal codons, respectively, and γ = 2Ns is the scaled
strength of selection acting on synonymous codons. To
analyze changes in codon bias between P. tremula and P. trichocarpa , I used equation (1) to derive the relationship between codon bias (Fop ) and expected change in codon bias
(∆Fop ) following changes to either γ or κ:
\[ \Delta F_{op} = \frac{F_{op}(a\gamma, b\kappa) - F_{op}(\gamma, \kappa)}{F_{op}(\gamma, \kappa)}, \tag{2} \]
where Fop is given by equation (1). Here, a and b denote the
relative change in selection on synonymous codons (γ) and
mutational bias (κ) between two species, respectively. This
model was fit to the observed data using nonlinear regression in the statistical package R (R Development Core Team
2007) using the nls function. Initially, a full model, including
both a and b , was fitted to the data. Subsequently, reduced
models were fitted where either a or b was constrained
to unity, corresponding to models with either no change
in mutational bias or no change in selection, respectively.
Nested models were compared using likelihood ratio tests:
twice the difference in log likelihood between two nested
models was compared with a χ^2 distribution with degrees of freedom (df) equal to the difference in the number of parameters in the models (df = 1 for both model comparisons).
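As an illustration of equations (1) and (2), the following Python sketch evaluates the expected relative change in codon bias for a gene with a given starting Fop. The study itself fitted the model with nls in R; the Python function names here are hypothetical, and recovering γ for a gene by inverting equation (1) at a known κ is an assumption made for this sketch rather than a documented step of the original analysis.

```python
import math

def fop(gamma, kappa):
    """Equation (1): equilibrium frequency of optimal codons."""
    return 1.0 / (1.0 + kappa * math.exp(-2.0 * gamma))

def gamma_from_fop(f, kappa):
    """Invert equation (1): the scaled selection that gives codon bias f."""
    return -0.5 * math.log((1.0 / f - 1.0) / kappa)

def delta_fop(f, kappa, a=1.0, b=1.0):
    """Equation (2): relative change in Fop when gamma -> a*gamma and kappa -> b*kappa."""
    gamma = gamma_from_fop(f, kappa)
    return (fop(a * gamma, b * kappa) - f) / f

kappa = 1.653  # mutational bias estimate reported for P. tremula
neutral_fop = fop(0.0, kappa)            # Fop expected without selection
low = delta_fop(0.30, kappa, a=1.574)    # gene below the neutral expectation
high = delta_fop(0.60, kappa, a=1.574)   # gene above the neutral expectation
```

With a > 1 and b = 1, genes whose Fop lies below the neutral expectation show a negative change and genes above it show a positive change, whereas a gene sitting exactly at the neutral expectation (γ = 0) is unaffected.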
Estimating the Effects of Hitchhiking
I used simulations to quantify the effects of linked selection
on synonymous site diversity and to estimate parameters
for a recurrent hitchhiking model (Jensen et al. 2008). To do
this, I relied on approximate Bayesian computation (ABC).
Briefly, I used coalescent simulations with the possibility for
recurrent selection where the stochastic trajectories of the
positively selected mutations are taken into account (Coop
and Griffiths 2004; Jensen et al. 2008). Each run generated a
simulated sample consisting of 77 loci (the number present
in the original P. tremula data set), where sample size and
number of sites for the different loci were taken from the
original data.
These data were used to calculate a number of summary
statistics as suggested by Jensen et al. (2008). The summary
statistics used were the mean and standard deviation across
loci of Watterson’s θ (Watterson 1975), nucleotide diversity, π (Tajima 1983), Tajima’s D (Tajima 1989), Fay and
Wu’s H (Fay and Wu 2000, standardized according to Zeng
et al. 2006), and Kelly’s Zns (Kelly 1997). These summary statistics capture somewhat different aspects of the
frequency spectrum of the segregating mutations and of associations between different mutations.
The model is characterized by a number of parameters
that are treated as random variables, and for each simulation, values for these parameters are drawn from some
prior distributions. The parameters of interest are the scaled
mutation rate θ = 4Ne µ, the scaled recombination rate
ρ = 4Ne r, the rate of selective sweeps across the genome,
λ, and the strength of selection acting on positively selected
mutations, s. The method also relies on an estimate of the
effective population size, Ne . There is currently little information available on the size of Ne in P. tremula . Ingvarsson (2008b) suggests that the effective population size of
P. tremula is at least 1.18 × 10^5 based on levels of synonymous polymorphism and a mutation rate for Populus from
Tuskan et al. (2006). Because this estimate is likely a lower
bound of Ne in P. tremula , I used two different values in
the simulations, Ne = 10^5 and Ne = 5 × 10^5. I generated 10^6 simulations with unique parameter values drawn
from prior distributions; the scaled mutation and recombination rates were drawn from uniform prior distributions,
θ ∼ U (0.001, 0.04) and ρ ∼ U (0.001, 0.1), respectively.
There is little independent information on levels of recombination in Populus, but data on linkage disequilibrium (LD)
suggest that per-site recombination rate is about an order
of magnitude higher than the mutation rate (Ingvarsson
2008b). The strength of positive selection was drawn from
log10 (s ) ∼ U (−7, 0), and the genome-wide rate of sweeps
was drawn from log10 (2Ne λ) ∼ U (−8, −1).
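The prior distributions listed above can be sampled as in the following minimal Python sketch. It only illustrates the draws themselves; the function and dictionary key names are hypothetical, and the actual simulations used a modified version of the program of Jensen et al. (2008).

```python
import random

def draw_parameters(rng):
    """Draw one parameter set from the priors described in the text."""
    return {
        "theta": rng.uniform(0.001, 0.04),               # scaled mutation rate
        "rho": rng.uniform(0.001, 0.1),                  # scaled recombination rate
        "s": 10 ** rng.uniform(-7, 0),                   # log10(s) ~ U(-7, 0)
        "two_ne_lambda": 10 ** rng.uniform(-8, -1),      # log10(2*Ne*lambda) ~ U(-8, -1)
    }

rng = random.Random(42)  # fixed seed so the draws are reproducible
draws = [draw_parameters(rng) for _ in range(1000)]
```

Drawing s and 2Ne λ on a log10 scale places equal prior weight on each order of magnitude, which is a common choice when a parameter is uncertain across several orders of magnitude.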
I used the ABC method of Beaumont et al. (2002) to
sample from the posterior distribution of the parameters.
Briefly, the simulated data were summarized using the summary statistics described above, and simulated samples were
accepted if they were deemed to be sufficiently close to the
observed data (‖Ssim − Sobs‖ ≤ δ), where Sobs is the set
of summary statistics calculated from the original data and
δ is a prechosen tolerance (Beaumont et al. 2002). For all
analyses in this paper, I used a fixed value of δ = 0.001, and
all parameter estimates were weighted and adjusted using
local linear regression as outlined by Beaumont et al. (2002).
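The acceptance step can be sketched in Python on a toy problem. This is a plain rejection sampler under stated assumptions, not the R scripts used in the study: the tolerance is treated here as the fraction of simulations retained, each summary statistic is scaled by its standard deviation across simulations, and the local linear regression adjustment of Beaumont et al. (2002) is omitted. The toy model (estimating the mean of a normal distribution) and all function names are hypothetical.

```python
import math
import random

def abc_rejection(observed, simulate, draw_prior, n_sims, accept_frac, rng):
    """Return the prior draws whose simulated summaries lie closest to the observed ones."""
    draws = []
    for _ in range(n_sims):
        params = draw_prior(rng)
        draws.append((params, simulate(params, rng)))
    # Scale each summary statistic by its standard deviation across simulations.
    k = len(observed)
    scales = []
    for j in range(k):
        vals = [stats[j] for _, stats in draws]
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        scales.append(sd if sd > 0 else 1.0)

    def distance(stats):
        # Euclidean distance between scaled simulated and observed summaries.
        return math.sqrt(sum(((stats[j] - observed[j]) / scales[j]) ** 2 for j in range(k)))

    ranked = sorted(draws, key=lambda d: distance(d[1]))
    n_keep = max(1, round(accept_frac * n_sims))
    return [params for params, _ in ranked[:n_keep]]

def toy_prior(rng):
    """Toy prior: unknown mean drawn from U(0, 10)."""
    return rng.uniform(0.0, 10.0)

def toy_simulate(mu, rng):
    """Toy model: 20 normal draws summarized by their mean and standard deviation."""
    sample = [rng.gauss(mu, 1.0) for _ in range(20)]
    mean = sum(sample) / len(sample)
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (len(sample) - 1))
    return (mean, sd)

rng = random.Random(1)
posterior = abc_rejection((3.0, 1.0), toy_simulate, toy_prior, 5000, 0.01, rng)
posterior_mean = sum(posterior) / len(posterior)
```

The retained draws approximate the posterior distribution of the parameter; in the regression-adjusted version they would additionally be weighted by distance and corrected toward the observed summaries.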
All coalescent simulations were performed using a
slightly modified version of the program described in Jensen
et al. (2008) available at http://www.molpopgen.org/. The
program was modified to read input data from a file to
allow for a different sample size and number of sites for
each locus. Each simulation replicate consisted of 77 loci,
with sample sizes and numbers of sites matching those of
the original data. The ABC analyses were performed using
R scripts generously provided by M. Beaumont (available at
http://www.rubic.rdg.ac.uk/~mab/stuff/). Posterior densities,
including modes and highest posterior density intervals, for
the estimated parameters were computed using the local
likelihood method of Loader (1999) as implemented in the
R library locfit.
To assess the fit of the model estimated from the
ABC analyses, I performed posterior predictive simulations
(Gelman et al. 2004). The posterior predictive simulations
were performed by generating a large number of data sets
(10^5) from the posterior distributions of the parameters of
the model. If the model and the estimated parameters show
a good fit to the data, simulated data should approximate
the observed data. The simulated data sets were summarized using the same summary statistics that were used for
the original data set, and the distributions of these summary
statistics were compared with the corresponding values
calculated from the original data. Posterior predictive simulations constitute an important “self-consistency check” of
the model (Gelman et al. 2004).
Table 1. Levels of Nucleotide Polymorphism in Populus tremula.
              Sites    Segregating sites   Watterson’s θ   Nucleotide diversity (π)   Tajima’s D (a)   Fay and Wu’s H (a)
All           39,381   725                 0.0049          0.0046                     −0.427           −0.574
Synonymous     8,266   464                 0.0129          0.0120                     −0.173           −0.459
Replacement   30,115   261                 0.0022          0.0017                     −0.648           −0.271
(a) Average across loci.
I also evaluated the fit of the recurrent hitchhiking model
and the bottleneck model from Ingvarsson (2008b) by
assessing the fit between the observed frequency spectrum
and that obtained from the posterior predictive simulations.
The fit of the model was assessed using the χ2 discrepancy
quantity defined as (Gelman et al. 2004)
\[ \chi^2 = \sum_i \frac{(y_i - E(y_i \mid \Lambda))^2}{\mathrm{Var}(y_i \mid \Lambda)}, \]
where yi are the observed values of the different classes of
the frequency spectrum and where E(yi | Λ) and Var(yi | Λ)
are the expected values and variances of the frequency spectrum classes obtained from posterior predictive simulations
of the corresponding model (Λ).
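The discrepancy quantity can be computed from a set of posterior predictive replicates as in this minimal Python sketch. The function name and toy numbers are hypothetical, the expectation and variance of each class are estimated from the replicates, and skipping classes with zero simulated variance is an assumption made here to avoid division by zero.

```python
def chi2_discrepancy(observed, simulated):
    """Sum over frequency classes of (y_i - E[y_i])^2 / Var[y_i].

    observed  -- list with one count per frequency-spectrum class
    simulated -- list of replicate lists, each with the same classes
    """
    n_rep = len(simulated)
    total = 0.0
    for i, y in enumerate(observed):
        vals = [rep[i] for rep in simulated]
        mean = sum(vals) / n_rep
        var = sum((v - mean) ** 2 for v in vals) / (n_rep - 1)
        if var > 0:  # assumption: classes without simulated variance are skipped
            total += (y - mean) ** 2 / var
    return total

# Hypothetical toy spectrum with two classes and three replicates.
replicates = [[8, 4], [12, 6], [10, 5]]
perfect_fit = chi2_discrepancy([10, 5], replicates)
poorer_fit = chi2_discrepancy([12, 5], replicates)
```

Smaller values indicate that the observed spectrum sits closer to what the fitted model predicts, so the quantity can be compared between the recurrent hitchhiking and bottleneck models.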
Results and Discussion
Selection on Synonymous Codon Usage
Levels of polymorphism at synonymous and nonsynonymous sites are summarized in table 1 together with Tajima’s
D (Tajima 1989) and Fay and Wu’s H (Fay and Wu 2000)
that capture different aspects of the frequency spectrum. A
total of 811 segregating sites were identified, of which 263
were singletons. About 20% of all sites screened were synonymous positions, but the majority of the segregating sites
identified were still synonymous (table 1).
In P. tremula , the average GC content at coding sites and
at third codon positions is 0.461 and 0.432, respectively. This
is significantly higher than the average GC content of surrounding noncoding regions which equals 0.377 (Wilcoxon
signed rank test, P < 0.001 for both comparisons; see also
Ingvarsson 2007). The lower GC content in noncoding regions suggests an overall bias of GC to AT mutations in P.
tremula. Assuming that the GC content in noncoding regions reflects only the effects of (biased) mutation yields
an estimate of the mutational bias from GC to AT (κ)
of 1.653. The correlations between GC content in coding
regions or third codon positions with the GC content at
noncoding sites are weak and slightly negative (Spearman
rank correlation, ρ = −0.188 and ρ = −0.142, respectively), demonstrating that differences in GC content between genes are not caused by regional differences in the
mutation rates between AT and GC. The GC content in coding regions is also higher in P. tremula than in the outgroup
species P. trichocarpa , both across all sites in the coding
region and specifically at third codon positions. Although
the difference in GC content between P. tremula and P. trichocarpa is slight (0.2% for all coding sites and
0.05% for third codon positions), GC content is significantly
greater in P. tremula across genes (Wilcoxon signed rank
test, P = 0.017 and P = 0.021, for total GC content and
GC content of third codon positions, respectively). There
is, however, no difference between P. tremula and P. trichocarpa in the GC content at noncoding sites (Wilcoxon
signed rank test, P = 0.308).
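The estimate of κ from noncoding GC content can be reproduced with a short calculation. The sketch below assumes, as in the text, that noncoding GC content reflects mutation alone, so that the equilibrium GC fraction equals 1/(1 + κ), which is the form equation (1) takes when γ = 0. The function name is hypothetical.

```python
def mutation_bias_from_gc(gc_noncoding):
    """Solve GC = 1 / (1 + kappa) for kappa, the GC-to-AT mutational bias."""
    return (1.0 - gc_noncoding) / gc_noncoding

kappa = mutation_bias_from_gc(0.377)  # noncoding GC content reported for P. tremula
```

With a noncoding GC content of 0.377, this gives κ ≈ 1.653, matching the value reported above.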
In Populus , most preferred, or optimal, codons end in G
or C (Ingvarsson 2007, 2008a), and the higher GC content
observed in P. tremula translates into stronger codon bias,
with P. tremula having a higher degree of codon bias than
P. trichocarpa (Wilcoxon signed rank test, P < 0.001 and
∆F̄op = 0.021). This confirms previous results (Ingvarsson
2008a), which have documented an excess of fixations from
unpreferred to preferred (U → P) mutations in the P. tremula lineage, compared with other lineages in Populus. The
accumulation of U → P over P → U mutations in P. tremula is consistent with stronger selection on synonymous
codon usage in P. tremula (Ingvarsson 2008a). Why selection on synonymous codon usage is stronger in P. tremula
is at present an open question, but one plausible explanation is an increase in the effective population size of P. tremula compared with the ancestral population size of Populus
and compared with many other extant species in the genus
(Ingvarsson 2008a).
From equation (1), it is clear that differences in codon
bias between two closely related species can be driven by
either changes in the strength of selection acting on codon
usage (γ = 2Ne s in eq. (1)) or changes in the mutational
biases between the preferred and the unpreferred sites ( κ
in eq. (1)). These alternative explanations have quite different effects on both the direction and the magnitude of
the expected change in Fop for genes with different degrees
of codon bias (see fig. 1 in Akashi 1996). When changes
in codon bias are driven by mutational biases alone, both
low and medium bias genes are expected to show similar
changes in codon bias (Akashi 1996) and only genes that experience strong selection for preferred codons, and where
levels of codon bias are already high, will show little change
in codon bias. On the other hand, if there is a systematic
change in the strength of selection acting on synonymous
codons, for instance, if the effective population size of a
species is increased or reduced, genes with intermediate levels of codon bias will not show much change in codon bias.
Genes with high or low levels of codon bias will, however,
show progressively greater change but in opposite directions (Akashi 1996). Again, genes with very high levels of
codon bias are not expected to show much change simply
because these genes are already showing a near complete
use of optimal codons.
Figure 1 plots differences in codon bias between P. tremula and P. trichocarpa (∆Fop ) versus the observed level of
codon bias in P. trichocarpa (Fop ) for the 77 loci used in
this study. There is a strong positive correlation between
Fop and ∆Fop (ρ = 0.576), as expected when differences in
codon bias between the two species are driven by a change
in the strength of selection rather than by a change in the
mutational bias between preferred and unpreferred codons
(Akashi 1996). To formally test these two alternative hypotheses, I fit a model including shifts in both γ and κ in
equation (1) to the codon bias data from P. tremula and P.
trichocarpa (dotted line in fig. 1). A model that includes a
shift in the mutational bias only (dashed line in fig. 1) provides a significantly worse fit to the data (likelihood ratio
test: 2∆L = 41.64, df = 1, P < 0.001). However, a model
including only a shift in the strength of selection (solid line
in fig. 1) is statistically indistinguishable from the full model
(likelihood ratio test: 2∆L = 0.031, df = 1, P = 0.99), suggesting that changes in codon bias between P. tremula and
P. trichocarpa are consistent with an increased strength of
selection acting on synonymous codon usage in P. tremula.
The ML estimate of the increase in γ is 1.574, and, assuming that selection on synonymous codons is stronger because of an increase in Ne in P. tremula, this suggests that
the effective population size in P. tremula has increased by
around 60% compared with P. trichocarpa. Current levels of
nucleotide polymorphism in P. tremula are roughly two to
three times higher than in P. trichocarpa, however (Gilchrist
et al. 2006; Ingvarsson 2008b), suggesting a much greater
difference in Ne between the two species. The approach
to mutation–selection–drift equilibrium for codon bias is
quite slow, however, on the order of the reciprocal of the
mutation rate, so the discrepancy in relative Ne between P.
tremula and P. trichocarpa based on changes in codon bias
and on differences in standing levels of nucleotide polymorphism can likely be explained by the fact that codon bias has
not yet reached equilibrium in P. tremula.
FIG. 1. Changes in codon usage between Populus tremula and Populus trichocarpa for genes with different degrees of codon bias. The solid
line gives the best fitting nonlinear regression of ∆Fop on Fop under the
assumption that changes in codon bias between the two species are
driven by a change in Ne (from eq. (1)). The dotted line that is almost
superimposed on the solid line is the best fitting model when changes
in codon bias are caused by changes in both Ne and mutational bias.
The dashed line gives the best fitting regression under the assumption that changes in codon bias are entirely caused by a change in the
mutational bias from GC to AT. See text for further details on the
models. The horizontal dotted line indicates the zero line, at which
there is no net change in Fop between the two species. The vertical
dotted line is the equilibrium value of Fop in the absence of selection
on codon usage, that is, Fop calculated from the average GC content in
noncoding regions in P. tremula. Both the correlation between Fop and
∆Fop (ρ = 0.557, P < 0.001) and the nonlinear regression (t = 20.1,
df = 1, P < 0.001) are significant.
Selection on Nonsynonymous Mutations
Assuming that nonsynonymous mutations are either neutral or strongly deleterious, the ratio of polymorphic sites
at nonsynonymous and synonymous sites (PA /PS ) should
equal the number of fixed differences at nonsynonymous
and synonymous sites (DA /DS ). However, if a proportion
of nonsynonymous mutations are advantageous, they will
rapidly be fixed by natural selection and result in an excess of nonsynonymous fixations, giving DA /DS > PA /PS .
Several studies have used this framework to estimate the
proportion of nonsynonymous substitutions that have been
fixed by positive natural selection (denoted α, see Eyre-Walker 2006, for a review). I used the ML method of Bierne
and Eyre-Walker (2004), as implemented in the DoFE program (available at http://www.lifesci.sussex.ac.uk/home/
Adam Eyre-Walker/Software files/DoFE-all 1.zip) to calculate the proportion of amino acid substitutions driven to
fixation by positive selection in P. tremula , yielding an estimate of α of 0.159 with a 95% confidence interval of −0.079
to 0.340.
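The logic behind α can be illustrated with the simple point estimator α = 1 − (DS PA)/(DA PS), which follows directly from the neutral expectation DA/DS = PA/PS described above. This Python sketch is not the ML method of Bierne and Eyre-Walker (2004) implemented in DoFE; the function name and the counts are hypothetical and are not data from this study.

```python
def alpha_estimate(d_a, d_s, p_a, p_s):
    """Proportion of nonsynonymous substitutions attributed to positive selection.

    d_a, d_s -- fixed differences at nonsynonymous and synonymous sites
    p_a, p_s -- polymorphic sites at nonsynonymous and synonymous sites
    """
    return 1.0 - (d_s * p_a) / (d_a * p_s)

# Hypothetical counts chosen only to show the arithmetic.
alpha = alpha_estimate(d_a=100, d_s=200, p_a=40, p_s=100)
```

With these counts, PA/PS = 0.4 and DA/DS = 0.5, so the excess of nonsynonymous divergence corresponds to α = 0.2. When PA/PS exceeds DA/DS the estimator becomes negative, which is why segregating slightly deleterious mutations bias α downward.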
However, a substantial proportion of all nonsynonymous
mutations that are segregating in natural populations are
likely slightly deleterious, are therefore unlikely to drift to
fixation, and will thus not contribute to the divergence
between species (Charlesworth 1996; Eyre-Walker 2006).
These slightly deleterious mutations will to some extent
mask the signature of adaptive evolution by introducing a
negative bias into estimates of α. To deal with this bias,
Charlesworth and Eyre-Walker (2008) suggested that many
or even most of these unconditionally deleterious mutations would be excluded if mutations with a frequency below 15% are ignored when estimating α. If low-frequency
variants (<15%) are excluded, the estimate of α in P. tremula is increased from 0.159 to 0.301, with a 95% confidence interval of 0.076–0.476. The data thus suggest that
roughly 30% of all nonsynonymous substitutions that have
been fixed since the divergence of P. tremula and P. trichocarpa have been driven to fixation by positive selection.
The methods used to estimate α all assume that synonymous mutations accumulate through genetic drift only.
However, the recent evolutionary history of P. tremula is
characterized by an increased rate of fixation of preferred
mutations at synonymous sites (Ingvarsson 2008a). This will
reduce the DA /DS ratio compared with the situation when
synonymous mutations are strictly neutral, resulting in an
underestimation of α. If the increase in selection on synonymous mutations is driven by an increase in Ne , which appears to be likely in P. tremula (Ingvarsson 2008a; fig. 1), it is
conceivable that weakly selected nonsynonymous mutations have also experienced an increased rate of fixation due to
stronger selection. As the approach to mutation–selection–
drift equilibrium for weakly selected mutations is rather
slow, it is not clear how these nonequilibrium processes
affect the estimation of the proportion of substitutions that
are fixed by positive selection. Because many species have
likely experienced changes in Ne during their genealogical history, this is clearly something that warrants further
studies.
The proportion of adaptive substitutions in P. tremula
is slightly lower than estimates from Drosophila, where different studies have suggested that between 40% and 95%
of all mutations are fixed by positive selection (Wright and
Andolfatto 2008). The P. tremula estimate of α is, however, substantially greater than estimates from Arabidopsis
where there is instead a general excess of nonsynonymous
polymorphism (Bustamante et al. 2002; Foxe et al. 2008).
The lack of adaptively driven substitutions in Arabidopsis
thaliana was initially credited to the highly selfing mating
system of A. thaliana (Bustamante et al. 2002). However, recent studies have shown that the obligately outcrossing
relative Arabidopsis lyrata also shows an excess of nonsynonymous polymorphisms, suggesting that the Arabidopsis data
are not explained by mating system effects alone (Foxe et al.
2008). A potential factor influencing the fixation of mutations by positive selection in plants, like Arabidopsis, is strong
population structure. If natural selection varies across local populations, few mutations will spread throughout the
species as they are not unconditionally favorable. Although
population structure is pronounced in both A. lyrata and A.
thaliana , there is still relatively little evidence to suggest that
selection on nonsynonymous mutations within local populations differs markedly from species-wide patterns (Foxe
et al. 2008). Nevertheless, it is interesting to note that although population structure is not completely absent in P.
tremula , it is generally quite low (Hall et al. 2007). Population structure could thus be an important factor explaining
the low estimates of α in A. thaliana and A. lyrata (Wright
and Andolfatto 2008).
Effects of Selection on Levels of Synonymous
Polymorphism
The interplay between weak selection for synonymous
codon usage and intraspecific polymorphism is quite
complex. In the absence of a mutation bias, intraspecific
nucleotide diversity declines with increasing selection on
synonymous codons (McVean and Charlesworth 1999).
However, with a mutation bias from preferred to unpreferred codons, nucleotide diversity is actually highest at
genes experiencing intermediate strengths of selection on
synonymous codon usage and the expected correlation between codon bias and nucleotide diversity for genes with
low to medium levels of codon bias is in fact positive (fig. 2
in McVean and Charlesworth 1999). The low GC content
in noncoding regions in P. tremula suggests a mutation
bias favoring GC to AT mutations, and because preferred
codons in P. tremula end in G or C (Ingvarsson 2008a),
MBE
FIG. 2. Synonymous diversity (θsyn ) corrected for synonymous divergence plotted against (A ) nonsynonymous divergence (dN ) and (B )
codon bias (Fop ).
this means a mutation bias favors unpreferred codons. As
predicted, nucleotide diversity at synonymous sites ( θsyn )
and codon bias (Fop ) are positively correlated in P. tremula
(ρ = 0.304 and P = 0.007). Depending on the degree of
mutation bias, maximum nucleotide diversity will occur at a
frequency of codon bias that is already quite high (McVean
and Charlesworth 1999). Using an estimate of κ = 1.653,
Ne = 1.18 × 10^5 , and µ = 3.75 × 10^−8 for P. tremula (see Ingvarsson 2008b), maximum nucleotide diversity is expected
to occur at loci where the frequency of optimal codon usage is around 47%, which is very close to what is actually
observed in P. tremula (fig. 2).
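The link between selection, mutation bias, and codon bias can be illustrated with the simpler Li–Bulmer mutation–selection–drift equilibrium. This is a minimal sketch and not the diversity calculation of McVean and Charlesworth (1999) that yields the 47% figure above. It assumes that κ is the ratio of the mutation rate away from preferred codons to the rate toward them, and that gamma is the scaled selection coefficient favoring preferred codons (its exact scaling by Ne depends on ploidy and dominance conventions). The function names are hypothetical.

```python
from math import exp, log


def equilibrium_fop(gamma, kappa):
    """Equilibrium frequency of preferred codons under the Li-Bulmer model.

    gamma: scaled selection coefficient for preferred codons (assumed scaling).
    kappa: mutation bias, rate away from preferred / rate toward preferred.
    """
    return 1.0 / (1.0 + kappa * exp(-gamma))


def gamma_for_fop(fop, kappa):
    """Scaled selection needed to hold preferred codons at frequency fop."""
    return log(kappa * fop / (1.0 - fop))


kappa = 1.653  # estimate quoted in the text
# With no selection, mutation bias alone keeps preferred codons below 50%.
print(round(equilibrium_fop(0.0, kappa), 3))
# Scaled selection implied by Fop = 0.47 under this simplified model.
print(round(gamma_for_fop(0.47, kappa), 3))
```

Under these assumptions, an Fop near 47% corresponds to weak scaled selection (gamma well below 1), which is the regime in which the positive relationship between codon bias and diversity is expected.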
Strong selection, on the other hand, should leave very
distinct signatures in the amount and patterns of nucleotide
polymorphism in the genome region surrounding the site
of an adaptive substitution (Kaplan et al. 1989; Przeworski
2002; Kim and Nielsen 2004). Regions linked to a site that
Ingvarsson · doi:10.1093/molbev/msp255
has experienced a selective sweep are expected to show
reduced levels of nucleotide polymorphism, an excess of
high-frequency derived sites, and enhanced levels of linkage
disequilibrium (Fay and Wu 2000; Kim and Stephan 2002).
However, some of these effects are rapidly eradicated
after a selective sweep has gone to completion. For instance,
high-frequency derived sites will quickly drift to fixation
and will no longer segregate in the population while linkage
disequilibrium will be broken down by recombination
(Przeworski 2002). Loci selected at random from throughout the genome are therefore not expected to show the
clear signatures that are characteristic of recent selective
sweeps, even if these loci have experienced multiple selective sweeps during their genealogical history (Przeworski
2002).
What has become clear, however, is that recurrent hitchhiking does leave signatures that can be detected using
genome-wide data on polymorphism (Przeworski 2002;
Andolfatto 2007; Macpherson et al. 2007; Bachtrog 2008;
Jensen et al. 2008). In regions experiencing a high rate
of adaptive substitutions, recurrent hitchhiking events will
reduce standing levels of polymorphism. If adaptive substitutions occur frequently enough, levels of polymorphism
will not have time to recover between successive hitchhiking events and levels of polymorphism in these regions
can thus be substantially lower than what is expected
under neutrality. This has led to the suggestion that one
conspicuous pattern of recurrent hitchhiking is a negative correlation between the rate of adaptive substitutions
(measured as the rate of protein evolution, dN ) and standing levels of polymorphism at synonymous sites ( θsyn ), and
such patterns have recently been observed in studies of
Drosophila (Andolfatto 2007; Macpherson et al. 2007).
I also find a significant negative correlation between the
synonymous site diversity (θsyn ) and the substitution rate
at nonsynonymous sites (dN ) in the data from P. tremula
(ρ = −0.243 and P = 0.033). Although the negative correlation between dN and synonymous site diversity is predicted from models of recurrent hitchhiking, an alternative
explanation is that the mutation rate between loci varies
across the genome. There is evidence for mutation rate variation across the genome of P. tremula as dN and dS are highly
correlated across loci (ρ = 0.469 and P < 0.001). However,
when accounting for mutation rate variation between loci,
using divergence at synonymous sites (dS ) as a covariate, the
association between the synonymous site diversity and the
nonsynonymous substitution rate in fact becomes stronger,
not weaker (partial correlation, ρC = −0.324, P = 0.004;
fig. 2A ). Recurrent hitchhiking thus appears to exert a pronounced influence on standing levels of synonymous nucleotide polymorphism in P. tremula .
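The logic of controlling for dS can be sketched as a first-order partial rank correlation. This is a minimal illustration, assuming that the partial correlation is obtained by applying the standard partial correlation formula to Spearman rank correlations; the function names and the toy values are hypothetical and are not the data analyzed here.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def partial_spearman(x, y, z):
    """Rank correlation between x and y with the covariate z held constant."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical per-locus values, for illustration only.
theta_syn = [0.012, 0.020, 0.008, 0.015, 0.010, 0.018]
d_n = [0.011, 0.004, 0.015, 0.006, 0.012, 0.009]
d_s = [0.050, 0.040, 0.060, 0.045, 0.030, 0.070]
print(spearman(theta_syn, d_n), partial_spearman(theta_syn, d_n, d_s))
```

When dN and dS are positively correlated through shared mutation rate variation, removing the dS component can strengthen a negative θsyn–dN association, which is the pattern reported above.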
The results presented here show that two distinct selective processes are affecting levels of synonymous polymorphism in P. tremula as both selection on synonymous codon
usage and protein evolution have significant effects on synonymous polymorphism. In an effort to disentangle the direct and indirect effects of selection on synonymous codon
usage and protein evolution on levels of polymorphism at
Table 2. Multiple Regression of the Effects of Selection on Synonymous and Nonsynonymous Sites on Synonymous Site Diversity in Populus tremula.
Effect    β         t        P
dS        0.055     1.61     0.112
dN        −0.347    −2.50    0.015
Fop       0.025     2.19     0.032
synonymous sites in P. tremula, I included both variables
in a multiple regression. The results show that codon bias
and rates of protein evolution both have significant effects
on levels of synonymous polymorphism (table 2) and these
effects are working in opposite directions. High rates of
protein evolution are associated with reduced levels of
polymorphism at synonymous sites, whereas selection
on synonymous codon usage increases polymorphism at
synonymous sites (fig. 2).
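A regression of this form, with θsyn as the response and dS, dN, and Fop as predictors, can be sketched with ordinary least squares. The example below is a minimal illustration that solves the normal equations directly; it does not compute the t statistics or P values of table 2, and the helper names and the toy data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so row operations act on both sides.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def ols(predictors, response):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(p) for p in predictors]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical loci: each tuple is (dS, dN, Fop); the response is theta_syn.
loci = [(0.05, 0.010, 0.40), (0.04, 0.004, 0.52), (0.06, 0.015, 0.35),
        (0.03, 0.006, 0.47), (0.07, 0.012, 0.44), (0.05, 0.008, 0.50)]
theta_syn = [0.011, 0.019, 0.007, 0.014, 0.012, 0.016]
intercept, b_ds, b_dn, b_fop = ols(loci, theta_syn)
print(b_ds, b_dn, b_fop)
```

Fitting both predictors jointly is what allows effects of opposite sign, such as those of dN and Fop, to be separated rather than partially cancelling in pairwise correlations.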
At first glance, it may seem rather surprising that recurrent
hitchhiking has such large effects on synonymous diversity, given that linkage disequilibrium does not extend more
than a few hundred base pairs in P. tremula (Ingvarsson
2005; Ingvarsson 2008b). Similar results have been observed,
however, in Drosophila (Andolfatto 2007; Macpherson et al.
2007), where LD decays over similar genomic scales. Recently, a study in Scots pine (Pinus sylvestris ), another
long-lived tree species, has also documented patterns of
polymorphism that are suggestive of the action of recurrent
selective sweeps (Palmé et al. 2008). In P. sylvestris genes
with high rates of protein evolution, measured as dN /dS ,
have both lower levels of polymorphism and a greater excess
of low-frequency mutations. This study, however, is based
on only nine loci, making it difficult to conclude whether
this is truly a genome-wide phenomenon in P. sylvestris .
Nevertheless, several studies have now shown that recurrent hitchhiking appears to be common and/or strong
enough in many species that most sites, even in high recombination regions of the genome, are affected by linked
selection.
One caveat is that a negative correlation between θsyn
and dN is not conclusive evidence for recurrent hitchhiking
reducing standing levels of nucleotide polymorphism. An
alternative explanation is background selection where the
removal of weakly deleterious mutations through purifying
selection reduces local Ne of genome regions (Charlesworth
et al. 1993). A reduction in local Ne will result in both reduced levels of nucleotide diversity and potentially also in
an increased rate of accumulation of slightly deleterious
mutations, thereby creating an apparent correlation between synonymous diversity and nonsynonymous substitution rates. Variation in the strength of background selection is, at least partly, tied to variation in local recombination rates, and the effects of background selection are expected to be most pronounced in regions with little or no
crossing over (Innan and Stephan 2003). There is no apparent correlation between dN and sequence-based estimates
of the recombination rate in P. tremula estimated using
the method of Hudson (2001; ρ = −0.007 and P = 0.95),
suggesting that local variations in Ne are not a likely explanation for the negative correlation between dN and θsyn . Background selection should also affect synonymous substitution rates, but when variation in dS is taken into account,
the negative relationship between θsyn and dN is actually
stronger (fig. 2). Furthermore, in Drosophila, regions with
low but nonzero recombination rates have similar rates of
protein evolution as regions with normal levels of recombination do (Betancourt and Presgraves 2002), suggesting
that background selection has only minor effects on rates
of protein evolution in regions with nonzero recombination
rates.
Adaptive Substitutions and Interference with
Selection on Synonymous Codon Usage
Studies in Drosophila have shown that recurrent hitchhiking interferes with weak selection for synonymous codon
usage (Betancourt and Presgraves 2002; Andolfatto 2007),
resulting in a negative association between rates of protein
evolution and codon bias. I find a similar negative relationship between dN and Fop in P. tremula , although this association is not significant (ρ = −0.120 and P = 0.30).
Nevertheless, the negative relationship between dN and Fop
hints at a reduction in the efficacy of selection at synonymous sites in rapidly evolving genes in P. tremula . However, contrary to Drosophila melanogaster where selection
on synonymous codons appears to be rather weak or absent (Akashi 1996; Nielsen et al. 2007), selection on synonymous codons is pervasive in P. tremula (Singh et al. 2007;
Ingvarsson 2008a). It is possible that ongoing selection on
synonymous codon usage partly mitigates the negative effects of recurrent hitchhiking and helps explain why the
relationship between dN and Fop is so weak in P. tremula .
Moreover, codon bias and rates of protein evolution are
both linked to patterns of gene expression in P. tremula
(Ingvarsson 2007) and this is true also in many other organisms (Akashi 2001; Wright et al. 2004; Lemos et al. 2005).
Codon bias and protein evolution are influenced by different aspects of gene expression in P. tremula , with the level
of gene expression being the main force shaping selection
on codon usage, whereas protein evolution is largely determined by expression breadth, that is, how many tissue
types a gene is expressed in (rather than maximum expression level per se; Ingvarsson 2007). Expression level and expression breadth are correlated in P. tremula (ρ = 0.471,
P < 0.001; Ingvarsson 2007), and it is plausible that the negative correlation observed between dN and Fop could arise
because these variables are both correlated to a common
underlying variable, gene expression. When factoring out the
common correlation to gene expression (measured either
by expression level or by expression breadth), the partial correlation between dN and Fop is drastically reduced both in
the 77 locus data set used in this paper (ρpartial = −0.061
and P = 0.605) and in the much greater data set I have
examined earlier (Ingvarsson 2007; ρpartial = −0.055 and
P = 0.198). This suggests that Hill–Robertson interference
has little effect on the efficiency of selection on synonymous
codon usage in P. tremula .
FIG. 3. Posterior densities from the ABC analyses. The gray dot gives
the joint maximum a posteriori (MAP) estimates of the parameters,
whereas dotted lines denote the marginal MAPs. The contour lines
denote the quartiles of the bivariate posterior distribution.
Estimating Hitchhiking Parameters
Under a model of recurrent selective sweeps, the effect on
polymorphism at linked sites is a function of the genome-wide rate of fixations of beneficial mutations, 2Ne λ, and
the average strength of selection acting on these mutations, s (Wiehe and Stephan 1993). I used coalescent simulations with recurrent selective sweeps to estimate the
product of the genome-wide rate of selective sweeps and the average strength of selection on positively selected mutations
(2Ne λs) and the scaled mutation rate ( θ = 4Ne µ) using
an ABC approach (Beaumont et al. 2002). The results from
the analyses are summarized in figure 3. The posterior mode
of θ = 4Ne µ for the recurrent hitchhiking model is 0.0182,
and comparing this estimate with the observed value of synonymous site diversity ( θsyn = 0.0129)
shows that recurrent hitchhiking reduces levels of polymorphism by, on average, 29%. A reduction of genetic diversity
of 29% is slightly greater than what Andolfatto (2007) found
in a study of 137 genes from D. melanogaster (20%). It is,
however, substantially lower than the estimate of 50% provided
by Jensen et al. (2008) who applied the same method that
I have used in this paper to the data set from Andolfatto
(2007).
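The 29% figure follows directly from the two estimates quoted above, as the short calculation below shows. The function name is hypothetical; the two input values are those given in the text.

```python
def diversity_reduction(theta_neutral, theta_observed):
    """Fractional reduction in diversity relative to the neutral expectation."""
    return 1.0 - theta_observed / theta_neutral


# Posterior mode of theta under recurrent hitchhiking vs. observed theta_syn.
reduction = diversity_reduction(0.0182, 0.0129)
print(f"{reduction:.1%}")  # prints 29.1%
```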
The posterior mode for 2Ne λs equals 1.54 × 10−7 , which
is in line with estimates from Drosophila (reviewed in Jensen
et al. 2008). The effects of the rate of sweeps (2Ne λ) and
the strength of selection (s) are usually confounded because
similar results can be obtained by either assuming strong
and rare selection or common and weak selection. However, even
though strong and rare or weak and common selection
have similar effects on average levels of polymorphism, the
variance in polymorphism across different genome regions
is greatly increased when selection is strong but rare, and
this can in turn be exploited to provide separate estimates of
2Ne λ and s (Jensen et al. 2008). The data in this paper, with
FIG. 4. Histograms showing the distribution of summary statistics from the posterior predictive simulations. Observed values from the data set
are shown by dotted vertical lines.
many, but rather short genome regions, are not well suited
for separate estimation of these parameters, and there is
thus still a great deal of uncertainty as to whether selection
is generally rare and strong or common but weak in P. tremula. It is worth pointing out that when the same data are
analyzed using the method of Andolfatto (2007), the parameter estimates are in the same range as those presented
here, suggesting that the results are rather insensitive to the
actual method used. As more data on nucleotide polymorphism accumulate in P. tremula , it should be possible to
shed more light on this question.
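The confounding of the sweep rate and the strength of selection can be illustrated with the approximation usually attributed to Wiehe and Stephan (1993), in which expected diversity relative to neutrality is r / (r + k · 2Ne λ · s). This is a sketch under stated assumptions: the form of the expression and the constant k ≈ 0.075 are given as they are commonly quoted, the per-site recombination rate r and the two parameter combinations are hypothetical values chosen only so that the product 2Ne λs equals 1.54 × 10−7, and the function name is invented for the example.

```python
K_WS = 0.075  # constant of the approximation as commonly quoted (assumption)


def expected_diversity_ratio(r, two_ne_lambda, s, k=K_WS):
    """Expected pi / theta under recurrent hitchhiking.

    r: recombination rate per site per generation (hypothetical value below).
    two_ne_lambda: scaled rate of beneficial fixations, 2 * Ne * lambda.
    s: average selection coefficient of beneficial mutations.
    """
    return r / (r + k * two_ne_lambda * s)


r = 3e-8  # hypothetical recombination rate, for illustration only
strong_rare = expected_diversity_ratio(r, 1.54e-5, 1e-2)
weak_common = expected_diversity_ratio(r, 1.54e-3, 1e-4)
# Both scenarios share the product 2Ne*lambda*s = 1.54e-7, so the expected
# mean diversity is the same; only the variance across regions would differ.
print(round(strong_rare, 3), round(weak_common, 3))
```

Because only the product enters the expectation, mean diversity alone cannot separate the two parameters, which is why the variance across regions is needed.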
It is clear from the posterior predictive simulations that
the selection parameters estimated from the ABC analyses cannot completely reproduce the patterns of sequence
variation seen in P. tremula (fig. 4). There is a large excess of high-frequency derived sites in P. tremula that apparently cannot be explained by the recurrent hitchhiking
model (fig. 4). Even though selective sweeps are known to
result in a distorted frequency spectrum with an excess of
high-frequency derived sites, this signature of selection is
quickly eradicated, usually within 0.1Ne generations following the sweep (Przeworski 2002). Randomly selected loci
are therefore not expected to show a pronounced excess
of high-frequency derived sites unless the genome-wide
rate of selective sweeps is high enough that most loci are
expected to have experienced a selective sweep recently
(Przeworski 2002). The recurrent hitchhiking model used
in the ABC analyses assumes that the demography of the
species conforms to the standard neutral model. Earlier
studies, using some of the same data used in this paper, have
suggested that P. tremula has gone through a modest bottleneck about 3–500 kya (Ingvarsson 2008b). Both nonequilibrium demography and recurrent selective sweeps are
expected to generate many patterns of genetic variation
that are similar and it is not immediately clear how to
separate the effects of selection and demography. To evaluate these two models, I compared the fit of the frequency spectrum from posterior predictive simulations
with the frequency spectrum from the original data. Both
the recurrent hitchhiking model and the bottleneck model
from Ingvarsson (2008b) show reasonably good fits to the
observed data (χ2 = 0.627 and χ2 = 2.058 for the hitchhiking and bottleneck models, respectively), with the recurrent hitchhiking model providing a marginally better fit.
Although relatively simple demographic scenarios appear to
explain patterns of intraspecific polymorphism in P. tremula
(Ingvarsson 2008b), it is harder to envision a demographic
explanation for the strong negative relationship observed
between synonymous site diversity and rates of protein evolution. These observations suggest that both demographic
and selective processes could have important consequences
for patterns of nucleotide polymorphism in P. tremula .
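A comparison of this kind can be sketched as a Pearson-type goodness-of-fit statistic between the observed site frequency spectrum and the spectrum expected under a model. How the χ2 values above were computed is not spelled out here, so the sketch below is one plausible reading rather than the procedure actually used; the function names and the counts are hypothetical.

```python
def scale_to_total(proportions, total):
    """Convert model-predicted class proportions to expected counts."""
    norm = sum(proportions)
    return [total * p / norm for p in proportions]


def chi_square(observed, expected):
    """Pearson goodness-of-fit statistic over frequency classes."""
    if len(observed) != len(expected):
        raise ValueError("spectra must have the same number of classes")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts of derived alleles in four frequency classes.
observed_sfs = [120, 45, 20, 15]
# Hypothetical mean proportions from posterior predictive simulations.
model_sfs = scale_to_total([0.58, 0.24, 0.11, 0.07], sum(observed_sfs))
print(chi_square(observed_sfs, model_sfs))
```

A smaller statistic indicates a closer match between the simulated and observed spectra, so two models can be ranked by computing the statistic for each against the same observed counts.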
Conclusions
Polymorphism data from a number of long-lived tree
species have been shown to be consistent with a bottleneck
sometime in their genealogical history (Heuertz et al. 2006;
Pyhäjarvi et al. 2007; Ingvarsson 2008b). This study and a
recent study in P. sylvestris show that recurrent hitchhiking also can have substantial effects on synonymous diversity. The presence of natural selection will bias inference
of demographic parameters from polymorphism data just
as demography will bias estimates of natural selection. It is
possible, and straightforward, to extend methods like the
ABC analyses used in this paper to also include nonequilibrium demography, such as population growth and/or bottlenecks. However, the number of parameters in the model
rapidly increases with increasing model complexity and
progressively more data are needed to reliably estimate the
parameters in the model. Jensen et al. (2008) suggested using longer contiguous sequence regions as a way to obtain
separate estimates of s and 2Ne λ. Utilizing long contiguous
genome regions increases the variance in summary statistics
across loci, and this is the crucial aspect of the data that allows for the separate estimation of s and 2Ne λ (Macpherson
et al. 2007; Jensen et al. 2008). It is conceivable that using this
approach would also increase the likelihood of differentiating between selective and demographic effects, but more
work is clearly needed to verify this.
With an increased reliance on next generation sequencing technologies, it is not too far-fetched to imagine that
complete or near-complete genome-wide data from a relatively large number of individuals will be available in the
near future. This will allow for truly genome-wide population genetic analyses of the combined effects of natural selection and demography (see Macpherson et al. 2007, for
such an analysis). One thing that would be extremely useful to incorporate into future analyses is the negative relationship between the synonymous polymorphism and the
rate of protein evolution that this and other recent studies
have documented. Such a pattern is hard to explain by neutral processes alone and by incorporating such a pattern in
ABC simulations it might be possible to extract important
information on both the strength of selection and the rate
by which selective sweeps occur across the genome (Jensen
et al. 2008; Sella et al. 2009). Developing methods to allow
for the joint estimation of selection and demography from
nucleotide data should therefore have high priority in the
future (see Li and Stephan 2006, for such an example). It is
clear that the relative importance of demography and natural selection for shaping patterns of nucleotide polymorphism in P. tremula , and in other species, is something that
will be worth revisiting in the future as more genome-wide
data accumulate.
Acknowledgments
This study has been funded by grants from the Swedish
Research Council to V.R.
References
Akashi H. 1996. Molecular evolution between Drosophila
melanogaster and D. simulans : reduced codon bias, faster
rates of amino acid substitution, and larger proteins in
D. melanogaster. Genetics 144:1297–1307.
Akashi H. 2001. Gene expression and molecular evolution. Curr Opin
Genet Dev. 11:660–666.
Akashi H, Schaeffer SW. 1997. Natural selection and the frequency distributions of “silent” DNA polymorphism in Drosophila. Genetics
146:295–307.
Andolfatto P. 2007. Hitchhiking effects of recurrent beneficial amino
acid substitutions in the Drosophila melanogaster genome.
Genome Res. 17:1755–1762.
Bachtrog D. 2008. Similar rates of protein adaptation in Drosophila
miranda and D. melanogaster, two species with different current
effective population sizes. BMC Evol Biol. 8:334.
Baudry E, Depaulis F. 2003. Effect of misoriented sites on neutrality
tests with outgroup. Genetics 165:1619–1622.
Baudry E, Kerdelhue C, Innan H, Stephan W. 2001. Species and recombination effects on DNA variability in the tomato genus. Genetics
158:1725–1735.
Beaumont MA, Zhang W, Balding DJ. 2002. Approximate Bayesian
computation in population genetics. Genetics 162:2025–2035.
Begun DJ, Aquadro CF. 1992. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster.
Nature 356:519–520.
Betancourt AJ, Presgraves DC. 2002. Linkage limits the power of
natural selection in Drosophila. Proc Natl Acad Sci USA. 99:
13616–13620.
Bierne N, Eyre-Walker A. 2004. The genomic rate of adaptive amino
acid substitution in Drosophila. Mol Biol Evol. 21:1350–1360.
Bulmer M. 1991. The selection-mutation-drift theory of synonymous
codon usage. Genetics 129:897–907.
Bustamante CD, Nielsen R, Sawyer SA, Olsen KM, Purugganan MD,
Hartl DL. 2002. The cost of inbreeding in Arabidopsis. Nature
416:531–534.
Charlesworth B. 1996. Background selection and patterns of genetic
diversity in Drosophila melanogaster. Genet Res. 68:131–149.
Charlesworth B, Morgan MT, Charlesworth D. 1993. The effect of
deleterious mutations on neutral molecular variation. Genetics
134:1289–1303.
Charlesworth J, Eyre-Walker A. 2008. The McDonald-Kreitman test
and slightly deleterious mutations. Mol Biol Evol. 25:1007–1015.
Comeron JM, Guthrie TB. 2005. Intragenic Hill-Robertson interference influences selection intensity on synonymous mutations in
Drosophila. Mol Biol Evol. 22:2519–2530.
Coop G, Griffiths RC. 2004. Ancestral inference on gene trees under
selection. Theor Popul Biol. 66:219–232.
Cutter AD, Charlesworth B. 2006. Selection intensity on preferred
codons correlates with overall codon usage bias in Caenorhabditis remanei. Curr Biol. 16:2053–2057.
Cutter AD, Payseur BA. 2003. Selection at linked sites in the partial
selfer Caenorhabditis elegans. Mol Biol Evol. 20:665–673.
Ewing B, Hillier L, Wendl M, Green P. 1998. Base-calling of automated
sequencer traces using phred. I. Accuracy assessment. Genome Res.
8:175–185.
Eyre-Walker A. 2006. The genomic rate of adaptive evolution. Trends
Ecol Evol. 21:569–575.
Fay J, Wu CI. 2000. Hitchhiking under positive Darwinian selection.
Genetics 155:1405–1413.
Foxe JP, Dar V, Zheng H, Nordborg M, Gaut BS, Wright SI. 2008. Selection on amino acid substitutions in Arabidopsis. Mol Biol Evol.
25:1375–1383.
Gelman A, Carlin JB, Stern HS, Rubin DB. 2004. Bayesian data analysis.
2nd ed. Boca Raton (FL): Chapman and Hall/CRC Press.
Gilchrist EJ, Haughn GW, Ying CC, et al. (14 co-authors). 2006. Use
of Ecotilling as an efficient SNP discovery tool to survey genetic
variation in wild populations of Populus trichocarpa. Mol Ecol. 15:
1367–1378.
Gillespie JH. 1991. The causes of molecular evolution. Oxford (UK):
Oxford University Press.
Gillespie JH. 2000. Genetic drift in an infinite population: the pseudohitchhiking model. Genetics 155:909–919.
Goldman N, Yang ZH. 1994. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 11:725–
736.
Gordon D, Abajian C, Green P. 1998. Consed: a graphical tool for sequence finishing. Genome Res. 8:195–202.
Hall D, Luquez V, Garcia MV, St. Onge KR, Jansson S, Ingvarsson PK.
2007. Adaptive population differentiation in phenology across a
latitudinal gradient in European aspen (Populus tremula, L.): a
comparison of neutral markers, candidate genes and phenotypic
traits. Evolution 61:2849–2860.
Heuertz M, De Paoli E, Kallman T, Larsson H, Jurman I, Morgante
M, Lascoux M, Gyllenstrand N. 2006. Multilocus patterns of nucleotide diversity, linkage disequilibrium and demographic history
of Norway spruce [Picea abies (L.) Karst]. Genetics 174:2095–2105.
Hill WG, Robertson A. 1966. The effect of linkage on limits to artificial
selection. Genet Res. 8:269–294.
Hudson RR. 2001. Two-locus sampling distributions and their application. Genetics 159:1805–1817.
Ikemura T. 1985. Codon usage and tRNA content in unicellular and
multicellular organisms. Mol Biol Evol. 2:13–34.
Ingvarsson PK. 2005. Nucleotide polymorphism and linkage disequilibrium within and among natural populations of European aspen
(Populus tremula L., Salicaceae). Genetics 169:945–953.
Ingvarsson PK. 2007. Gene expression and protein length influence
codon usage and rates of sequence evolution in Populus tremula.
Mol Biol Evol. 24:836–844.
Ingvarsson PK. 2008a. Molecular evolution of synonymous codon usage in Populus. BMC Evol Biol. 8:307. doi:10.1186/1471-2148-8-307.
Ingvarsson PK. 2008b. Multilocus patterns of nucleotide polymorphism and the demographic history of Populus tremula. Genetics
180:329–340.
Innan H, Stephan W. 2003. Distinguishing the hitchhiking and background selection models. Genetics 165:2307–2312.
Jensen JD, Thornton KR, Andolfatto P. 2008. An approximate
Bayesian estimator suggests strong, recurrent selective sweeps in
Drosophila. PLoS Genet. 4:e1000198.
Kaplan NL, Hudson RR, Langley CH. 1989. The 'hitchhiking effect' revisited. Genetics 123:887–899.
Kelly JK. 1997. A test of neutrality based on interlocus associations.
Genetics 146:1197–1206.
Kim Y, Nielsen R. 2004. Linkage disequilibrium as a signature of selective sweeps. Genetics 167:1513–1524.
Kim Y, Stephan W. 2002. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160:765–777.
Kimura M. 1983. The neutral theory of molecular evolution.
Cambridge (UK): Cambridge University Press.
Lemos B, Bettencourt BR, Meiklejohn CD, Hartl DL. 2005. Evolution
of proteins and gene expression levels are coupled in Drosophila
and are independently associated with mRNA abundance, protein
length and number of protein-protein interactions. Mol Biol Evol.
22:1345–1354.
Li H, Stephan W. 2006. Inferring the demographic history and rate of
adaptive substitution in Drosophila. PLoS Genet. 2:e166.
Loader C. 1999. Local regression and likelihood. New York: Springer
Verlag.
Macpherson JM, Sella G, Davis JC, Petrov DA. 2007. Genomewide
spatial correspondence between nonsynonymous divergence and
neutral polymorphism reveals extensive adaptation in Drosophila.
Genetics 177:2083–2099.
Maside XL, Lee AWS, Charlesworth B. 2004. Selection on codon usage
in Drosophila americana. Curr Biol. 14:150–154.
Maynard Smith J, Haigh J. 1974. The hitch-hiking effect of a favorable gene.
Genet Res. 23:23–35.
McVean GAT, Charlesworth B. 1999. A population genetic model for
the evolution of synonymous codon usage: patterns and predictions. Genet Res. 74:145–158.
McVean GAT, Charlesworth B. 2000. The effects of Hill-Robertson
interference between weakly selected mutations on patterns of
molecular evolution and variation. Genetics 155:929–944.
Nickerson DA, Tobe VO, Taylor SL. 1997. Polyphred: automating
the detection and genotyping of single nucleotide substitutions
using fluorescence-based resequencing. Nucleic Acids Res. 25:
2745–2751.
Nielsen R, DuMont VLB, Hubisz MJ, Aquadro CF. 2007. Maximum
likelihood estimation of ancestral codon usage bias parameters in
Drosophila. Mol Biol Evol. 24:228–235.
Ohta T. 1992. The nearly neutral theory of molecular evolution. Annu
Rev Ecol Syst. 23:263–286.
Palmé AE, Wright M, Savolainen O. 2008. Patterns of divergence
among conifer ESTs and polymorphism in Pinus sylvestris identify
putative selective sweeps. Mol Biol Evol. 25:2567–2577.
Przeworski M. 2002. The signature of positive selection at randomly
chosen loci. Genetics 160:1179–1189.
Pyhäjarvi T, Garcia-Gil MR, Knurr T, Mikkonen M, Wachowiak W,
Savolainen O. 2007. Demographic history has influenced nucleotide diversity in European Pinus sylvestris populations. Genetics 177:1713–1724.
R Development Core Team. 2007. R: a language and environment for
statistical computing. Vienna (Austria): R Foundation for Statistical Computing. ISBN 3-900051-07-0.
Sella G, Petrov DA, Przeworski M, Andolfatto P. 2009. Pervasive
natural selection in the Drosophila genome? PLoS Genet. 5(6):
e1000495.
Singh ND, Bauer DuMont VL, Hubisz MJ, Nielsen R, Aquadro CF.
2007. Patterns of mutation and selection at synonymous sites in
Drosophila. Mol Biol Evol. 24:2687–2697.
Tajima F. 1983. Evolutionary relationship of DNA sequences in finite
populations. Genetics 105:437–460.
Tajima F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.
Thompson JD, Higgins DG, Gibson TJ. 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment
through sequence weighting, position specific gap penalties and
weight matrix choice. Nucleic Acids Res. 22:4673–4680.
Thornton K. 2003. libsequence: a C++ class library for evolutionary genetic analysis. Bioinformatics 19:2325–2327.
Tuskan G, DiFazio S, Jansson S, Bohlman J, et al. (111 co-authors). 2006.
The genome of black cottonwood Populus trichocarpa (Torr. &
Gray). Science 313:1596–1604.
Watterson GA. 1975. On the number of segregating sites in genetical
models without recombination. Theor Popul Biol. 7:256–276.
Wiehe TH, Stephan W. 1993. Analysis of a genetic hitchhiking model,
and its application to DNA polymorphism data from Drosophila
melanogaster. Mol Biol Evol. 10:842–854.
Wright SI, Andolfatto P. 2008. The impact of natural selection on the
genome: emerging patterns in Drosophila and Arabidopsis. Annu
Rev Ecol Evol Syst. 29:193–213.
Wright SI, Yau CBK, Looseley M, Meyers BC. 2004. Effects of gene expression on molecular evolution in Arabidopsis thaliana and Arabidopsis lyrata. Mol Biol Evol. 21:1719–1726.
Yang Z. 1997. PAML: a program package for phylogenetic analysis by
maximum likelihood. Comput Appl Biosci. 13:555–556.
Zeng K, Fu YX, Shi S, Wu CI. 2006. Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics 174:
1431–1439.