Expression Divergence Is Correlated with Sequence Evolution
but Not Positive Selection in Conifers
Kathryn A. Hodgins,*,†,1 Sam Yeaman,†,2,3,4 Kristin A. Nurkowski,1 Loren H. Rieseberg,2 and Sally N. Aitken3
1 School of Biological Sciences, Monash University, Melbourne, VIC, Australia
2 Department of Botany, University of British Columbia, Vancouver, BC, Canada
3 Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, Canada
4 Department of Biological Sciences, University of Calgary, Calgary, AB, Canada
† These authors contributed equally to this work.
*Corresponding author: E-mail: [email protected].
Associate editor: Stephen Wright
Abstract
The evolutionary and genomic determinants of sequence evolution in conifers are poorly understood, and previous
studies have found only limited evidence for positive selection. Using RNAseq data, we compared gene expression profiles
to patterns of divergence and polymorphism in 44 seedlings of lodgepole pine (Pinus contorta) and 39 seedlings of
interior spruce (Picea glauca × engelmannii) to elucidate the evolutionary forces that shape their genomes and their
plastic responses to abiotic stress. We found that rapidly diverging genes tend to have greater expression divergence,
lower expression levels, reduced levels of synonymous site diversity, and longer proteins than slowly diverging genes.
Similar patterns were identified for the untranslated regions, but with some exceptions. We found evidence that genes
with low expression levels had a larger fraction of nearly neutral sites, suggesting a primary role for negative selection in
determining the association between evolutionary rate and expression level. There was limited evidence for differences in
the rate of positive selection among genes with divergent versus conserved expression profiles and some evidence
supporting relaxed selection in genes diverging in expression between the species. Finally, we identified a small number
of genes that showed evidence of site-specific positive selection using divergence data alone. However, estimates of the
proportion of sites fixed by positive selection (α) were in the range of other plant species with large effective population
sizes suggesting relatively high rates of adaptive divergence among conifers.
Key words: lodgepole pine, white spruce, Engelmann spruce, gene expression, RNAseq, positive selection, climate.
Introduction
Understanding why genes evolve at different rates is a major
goal of the field of molecular evolution, and the evolutionary
cause(s) of rate variation has been the subject of much debate. Evolutionary rate could largely be determined by the
balance between the level of selective constraint on a protein
(i.e., negative or purifying selection) and genetic drift (Kimura
1983; Ohta 2002). Alternatively, protein sequence evolution
may depend on the rate at which new beneficial mutations
arise and fix (Gillespie 1991). Evolutionary rates of coding
sequences can be quantified by comparing the rate of substitutions at synonymous sites (dS), which are presumed neutral, to the rate of substitutions at nonsynonymous sites (dN,
amino acid replacement), which may experience selection.
Comparisons between these measures can provide compelling evidence of selection. Purifying selection will reduce the
fixation rate at deleterious replacement sites, lowering dN/dS
ratios, while positive selection should increase the fixation rate
of beneficial sites, increasing dN/dS (Bielawski and Yang 2005).
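As a minimal illustration of how these ratios are read, the sketch below computes dN/dS for a single gene and attaches a rough label. The function names, the example rates, and the tolerance around 1 are assumptions made for illustration only; they are not taken from this study, and real analyses rely on likelihood-based tests rather than a fixed cutoff.

```python
def dn_ds_ratio(dn: float, ds: float) -> float:
    """Return dN/dS; dS must be positive for the ratio to be defined."""
    if ds <= 0:
        raise ValueError("dS must be positive to form a ratio")
    return dn / ds


def interpret(omega: float, tol: float = 0.05) -> str:
    """Attach a rough label to dN/dS (tol is an arbitrary, hypothetical band)."""
    if omega < 1 - tol:
        return "purifying selection dominates"
    if omega > 1 + tol:
        return "consistent with positive selection"
    return "close to the neutral expectation"


# Hypothetical gene: 0.02 nonsynonymous and 0.10 synonymous substitutions per site.
omega = dn_ds_ratio(0.02, 0.10)
label = interpret(omega)
```

A gene-wide ratio well below 1 does not rule out positive selection at a few sites, which is why site-specific models are used later in the article.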
Several genomic parameters, such as the expression level,
breadth, and divergence; the number of protein–protein interactions; gene network position; synonymous
polymorphism; and gene essentiality can correlate with evolutionary rate (Duret and Mouchiroud 2000; Ingvarsson 2007;
Ramsay et al. 2009; Slotte et al. 2011; Renaut et al. 2012), but
the evolutionary mechanisms that produce many of these
patterns remain obscure. For example, many studies have
found that genes with high levels of expression evolve more
slowly (for review see Rocha 2006), and while neutral theory
predicts that this pattern is likely due to differences in the
strength of negative selection, it is possible that adaptive divergence is also constrained in high expression genes.
Previous studies examining the relationship between sequence and expression divergence have found contrasting
results, with some identifying no correlation (e.g., yeast,
Tirosh and Barkai 2008; sunflowers, Renaut et al. 2012;
Moyers and Rieseberg 2013) prompting suggestions that sequence and expression evolution are fundamentally
decoupled. However, others have identified positive correlations (e.g., Drosophila, Nuzhdin et al. 2004; mammals, Jordan
et al. 2005; Warnefors and Kaessmann 2013). In some cases,
such as Drosophila, positive selection has been implicated as
an important contributor to the correlation between dN and
expression divergence (Nuzhdin et al. 2004), while in others
© The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
All rights reserved. For permissions, please e-mail: [email protected]
Mol. Biol. Evol. 33(6):1502–1516 doi:10.1093/molbev/msw032 Advance Access publication February 12, 2016
relaxed purifying selection and genetic drift have been proposed to be the primary driving force of weaker associations
between sequence and expression divergence (e.g., Liao and
Zhang 2006). Why these discrepancies among taxa exist is
uncertain and could be related to the efficacy of selection or
time since divergence, as tempo of gene expression and sequence evolution may differ through time, or the genomic
targets of selection may differ among taxa.
Investigation of the molecular evolution of conifers on a
genome wide basis has received relatively little attention, perhaps because it has been hampered by the large genome sizes
of these species (De La Torre et al. 2014). Conifers have genomic characteristics that distinguish them from many other
plant groups, including large genomes, long generation times,
slow evolutionary rates, long introns, and limited evidence of
paleopolyploidy (Nystedt 2013; but see Li et al. 2015). Previous
comparisons of evolutionary rates between spruce-pine and
poplar-Arabidopsis lineages have identified higher dN/dS ratios in the conifers (Buschiazzo et al. 2012). This is despite
much lower substitution rates in conifers, perhaps related to
longer generation times (or cell lineage division time), lower
mutation rates or the impact of large effective population
sizes on weakly deleterious mutations (Buschiazzo et al. 2012).
Nearly neutral theory predicts that genome wide elevation in
dN/dS should largely be a function of drift rather than greater
rates of positive selection (Kimura 1983; Ohta 2002), and
molecular evidence for adaptive divergence in conifers is
somewhat limited (Eckert et al. 2013a; but see Eckert et al.
2013b). However, widespread conifers have life history traits
that confer large effective population sizes and weak population structure (Neale and Kremer 2011), suggesting that
adaptive evolution could contribute substantially to divergence among species. There is also considerable evidence
for local adaptation at the phenotypic level in many forest
trees, and a growing number of studies have focused on identifying the genomic basis of these adaptations (Howe et al.
2003; Savolainen et al. 2007; Aitken et al. 2008; Eckert and
Shahi 2012; Parchman et al. 2012; De La Torre et al. 2013).
Here, we describe the results of a study of evolutionary rate
in lodgepole pine (Pinus contorta) and interior spruce (natural
hybrid Picea engelmannii × Picea glauca). In the context of
this study we refer to interior spruce as a single “species” as
the seeds used in this study were produced by parents originating in a population within this ancient hybrid zone (De La
Torre et al. 2014; Yeaman et al. 2014). Lodgepole pine and
interior spruce are widespread conifers in Western North
America. Both have substantial ecological and economic importance, with over 200 million trees planted each year in
Western Canada.
Our aim is to compare patterns of divergence and polymorphism in coding and untranslated regions of lodgepole
pine and interior spruce to help elucidate the evolutionary
forces that shape the genomes of conifers. To do this we used
our reference transcriptomes for both species, as well as publicly available transcriptomes for four other species, to identify orthologs and substitution rates. We then aligned
RNAseq reads for lodgepole pine and interior spruce to their
respective reference transcriptomes to determine levels of
polymorphism. Using these data, we were able to estimate
the proportion of amino acid substitutions fixed by positive
selection and also identify specific genes that showed evidence of positive selection. Results from these analyses allowed us to address the following specific questions:
(1) What are the genomic determinants of evolutionary
rates in coding and noncoding gene regions in conifers?
We tested whether expression level, expression divergence, protein length, specificity of expression across
environments, and neutral polymorphism were associated with nonsynonymous and synonymous divergence between lodgepole pine and interior spruce.
(2) What are the relative contributions of positive and
negative selection in driving differences in evolutionary
rate for genes with different expression profiles? We
examined whether expression classes based on divergence versus conservation of gene expression between
species, as well as average expression levels, influenced
the distribution of fitness effects (DFE) of new mutations and the proportion of substitutions fixed by positive selection in lodgepole pine (α), using either
interior spruce or loblolly pine (Pinus taeda) as an outgroup. This provided insight into the evolutionary
forces that drive patterns of molecular evolution in
conifers.
(3) What is the role of adaptation in driving divergence in
conifers, and more specifically, what genes show evidence of high evolutionary rates and positive selection
in this group? We used a maximum likelihood approach to identify orthologous genes showing evidence of site-specific positive selection in conifers. To
do this we utilized the transcriptome references of the
focal species (lodgepole pine and interior spruce), as
well as loblolly pine, Sitka spruce (Picea sitchensis),
Norway spruce (Picea abies) and Douglas-fir
(Pseudotsuga menziesii).
Results
We found 13,809 one-to-one orthologs between lodgepole
pine and interior spruce. On average the dN/dS ratio was 0.278 ± 0.002 (N = 5,195; dN = 0.0436 ± 0.0003; dS = 0.166 ± 0.0009). The substitution rate for the 5′-UTR was 0.130 ± 0.0015 (N = 4,566) and for the 3′-UTR was 0.090 ± 0.0019 (N = 4,719).
Gene Expression Patterns versus Evolutionary Rate in
Lodgepole Pine and Interior Spruce
We predicted that evolutionary rate would be reduced in
genes that were conserved in their pattern of gene expression
between species, reflecting conserved gene function across
evolutionary time compared to those that had diverged in
expression between the species. For orthologs identified between pine and spruce we considered four mutually exclusive
classes of genes based on their expression pattern between
species and among seven climate treatments as identified in
Yeaman et al. (2014).
(1) Conserved expression genes (CEG) are those with significant differences in expression among treatments (expression plasticity), but the pattern of expression among the seven treatments was retained between the species. These genes were identified by a significant treatment effect, but no species by treatment interaction effect (in some cases these genes had a significant species effect, with differences in the average expression level overall).
(2) Diverged expression genes (DEG) are those with divergent patterns of differential expression between species. These genes were identified by a significant interaction between species and treatment.
(3) Species expression genes (SEG) had a significant difference in overall expression level between species, but no interaction or treatment effect.
(4) Nonsignificant genes (NON) were those for which none of the factors in the model explained a significant proportion of the variation in gene expression.

We found strong evidence that evolutionary rate varied depending on the pattern of gene expression among treatments and species (fig. 1 and table 1; supplementary tables S4 and S5, Supplementary Material online). These expression pattern classes of genes for the lodgepole pine-interior spruce comparison were significantly related to gene-specific estimates of evolutionary rate for pairwise comparisons of the two species for both untranslated regions and coding sequences. The fastest evolutionary rates were found for genes that diverged in overall expression (SEG) or diverged in their pattern of expression plasticity (DEG), relative to lower rates observed in genes with expression variation that was not associated with treatment or species (NON), or conserved patterns of expression plasticity (CEG). SEG had significantly higher rates of evolution than both NON and CEG in the 5′-UTR and dN/dS, while DEG had significantly higher rates of evolution than NON and CEG in the 3′-UTR and dN/dS. Differences between NON and CEG versus DEG or SEG were nonsignificant for the 5′-UTR (DEG) and 3′-UTR (SEG). There was a marginally significant difference in dS rates among expression classes (no pairwise differences after correcting for multiple tests). However, there was a stronger effect of expression class on dN, indicating that differences in the rates of dN/dS are being driven by the accumulation of amino acid changing mutations. On average, the more rapidly evolving classes (DEG and SEG) had a 6.4% higher rate of sequence evolution (as measured by dN/dS) than the more slowly evolving classes (CEG and NON).

FIG. 1. The impact of gene expression class on evolutionary rate between interior spruce and lodgepole pine for different gene regions. CEG, conserved expression genes (conserved plasticity in gene expression between species); DEG, diverged expression genes (diverged plasticity in gene expression between species); SEG, species expression genes (no plasticity in expression but divergence in average expression level between the species); NON, no differences in expression among treatments or species. (a) dN/dS (nonsynonymous substitution rate/synonymous substitution rate) (CEG: N = 1,608; DEG: N = 870; SEG: N = 1,220; NON: N = 1,470); (b) substitution rate for the 3′-UTR (CEG: N = 1,465; DEG: N = 775; SEG: N = 1,370; NON: N = 1,106); (c) substitution rate for the 5′-UTR (CEG: N = 1,423; DEG: N = 764; SEG: N = 1,316; NON: N = 1,063). Means and standard errors are shown. Different letters indicate significant differences.

Table 1. Results from ANOVA Tests of Differences in Evolutionary Rates between Genes of Different Expression Class (CEG, DEG, SEG, NON).

Region or Site Type    F Statistic    df         P
3′-UTR                 3.31           3, 4715    <0.05
5′-UTR                 7.83           3, 4562    <0.001
dN                     4.43           3, 5191    <0.01
dS                     2.57           3, 5191    0.052
dN/dS                  5.81           3, 5191    <0.001

NOTE.—See figure 1 for a depiction of the patterns.

Multiple factors are known to be associated with sequence evolution in other species (Duret and Mouchiroud 2000; Ingvarsson 2007; Ramsay et al. 2009; Slotte et al. 2011; Renaut et al. 2012). We predicted that, in addition to diverged patterns of expression, rapidly evolving genes should have low levels of average expression. If rapidly evolving genes experienced recent sweeps, reduced levels of synonymous polymorphism are also predicted (Andolfatto 2007; Lohmueller et al. 2011). We first examined the relationship among these factors using pairwise correlations, followed by partial correlations to control for correlations among variables (fig. 3a; supplementary table S7, Supplementary Material online). dN/dS ratios were positively associated with protein length (ρ = 0.05, P < 0.01) but negatively correlated with average expression level (ρ = −0.05, P < 0.01), synonymous nucleotide diversity, πs (ρ = −0.14, P < 0.001), and expression divergence (ρ = −0.04, P < 0.05). However, the negative correlation between expression divergence and dN/dS was driven by a strong positive relationship between mean expression and expression divergence. Consequently, the correlation between dN/dS and expression divergence became positive once mean expression level was accounted for in the analysis (partial ρ = 0.09, P < 0.001). Therefore, we conclude that rapidly diverging genes tend to have lower expression levels, reduced levels of synonymous site diversity, greater expression divergence, and longer proteins. We note that Snorm (diversity scaled by divergence; Lohmueller et al. 2011) and πs were highly correlated (ρ = 0.71, P < 0.001) and provided a similar negative correlation with dN (ρ = −0.11, P < 0.001) so we chose to conduct all of the analysis using πs rather than Snorm.
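The sign change described above, where a pairwise correlation is negative but the partial correlation is positive once a third variable is controlled, can be reproduced on simulated data. The sketch below is illustrative only: it uses the standard first-order partial correlation formula applied to Spearman coefficients with a single covariate, whereas the analysis in this article controls for several variables at once. All names and the simulated values are hypothetical.

```python
import math
import random


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Correlation of x and y after controlling for one covariate z."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Simulated genes: expression raises divergence but lowers evolutionary rate,
# while a shared factor pushes divergence and rate in the same direction.
rng = random.Random(1)
n = 500
expression = [rng.gauss(0, 1) for _ in range(n)]
shared = [rng.gauss(0, 1) for _ in range(n)]
divergence = [e + s for e, s in zip(expression, shared)]
rate = [-e + 0.5 * s + rng.gauss(0, 0.5) for e, s in zip(expression, shared)]

pairwise = spearman(divergence, rate)                      # negative
partial = partial_spearman(divergence, rate, expression)   # positive
```

In this toy setting the covariate masks a positive association, which is the same qualitative behavior reported for dN/dS, expression divergence, and mean expression.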
Using pairwise correlations, πs was positively correlated with average expression level (ρ = 0.08, P < 0.001) and expression divergence (ρ = 0.11, P < 0.001), and negatively correlated with protein length (ρ = −0.19, P < 0.001). Partial correlations followed a similar pattern. Genes with lower levels of synonymous nucleotide diversity had weaker expression, longer proteins, and reduced divergence in expression. Shorter proteins had weaker expression divergence (ρ = 0.14, P < 0.001), and tended to have lower expression levels when controlling for correlations among variables (ρ = 0.13, P < 0.001).
Partial correlation estimates for dN and dS separately identified similar patterns for dN and dN/dS (supplementary fig. S1, Supplementary Material online). Partial correlations were positive between dS and expression divergence, as well as πs. We also examined the relationship between evolutionary rate and treatment specificity of expression (supplementary fig. S2, Supplementary Material online) in a broader set of genes, as polymorphism data were not required for this analysis. We predicted that genes with context-dependent expression (i.e., greater treatment specificity) would experience higher evolutionary rates because the effect of mutations is limited to a specific environmental context (Snell-Rood et al. 2010). We found a positive correlation between dN/dS and treatment specificity (ρ = 0.07, P < 0.001), meaning genes with high dN/dS tended to be expressed in fewer treatments, but this correlation was not significant after correlations among the other variables were accounted for (P = 0.25).
As changes in the UTRs can have direct impacts on gene expression, we tested if divergence in expression was associated with evolutionary rate in these regions. Patterns in untranslated regions showed associations that paralleled those for dN/dS, with the exception of protein length and πs, which exhibited the opposite pattern (fig. 3b and c). Partial correlations were positive between 3′-UTR rate and dN (ρ = 0.15, P < 0.001), dS (ρ = 0.07, P < 0.001), expression divergence (ρ = 0.06, P < 0.001), and πs (ρ = 0.07, P < 0.01), but were negative for mean expression (ρ = −0.05, P < 0.01) and protein length (ρ = −0.08, P < 0.001). The 5′-UTR patterns of variation were similar to 3′-UTR, with positive correlations between 5′-UTR substitution rate and dN (ρ = 0.08, P < 0.001), as well as πs (ρ = 0.08, P < 0.001), but negative associations for mean expression (ρ = −0.06, P < 0.01) and protein length (ρ = −0.12, P < 0.001). However, dS and expression divergence were not correlated with the 5′-UTR substitution rate. Contrasting patterns were identified with treatment specificity as this variable was negatively correlated with 5′-UTR rate (ρ = −0.06, P < 0.001), but weakly positively correlated with 3′-UTR (ρ = 0.04, P < 0.01; supplementary fig. S2, Supplementary Material online).
We tested the hypothesis that the association between expression and sequence divergence is driven by relaxed selection. We predicted a negative correlation between the difference in branch-specific estimates of dN/dS and average expression difference (pine-spruce) if relaxed selection was driving this pattern. We identified 3,286 alignments with lodgepole pine, interior spruce and Douglas-fir. There was no association between the difference in lineage-specific estimates of dN/dS and expression level differences between pine and spruce (ρ = 0.007, P = 0.65).
Variation in Polymorphism and Divergence among
Gene Regions and between Species
Interior spruce had higher levels of polymorphism at synonymous sites than lodgepole pine (πs: P < 0.001; pine mean ± SE = 0.0058 ± 0.00016; spruce mean ± SE = 0.0073 ± 0.00018) and a more skewed site frequency spectrum (SFS) at synonymous sites (Tajima’s D: P < 0.001; pine mean ± SE = 0.51 ± 0.034; spruce mean ± SE = 0.84 ± 0.028; supplementary table S3, Supplementary Material online). We found no differences in the SFS and amount of within-species polymorphism at synonymous sites for the different expression classes (P > 0.1) in either species. There was a positive correlation between average expression and πs (pine: ρ = 0.08, P < 0.001; spruce: ρ = 0.06, P < 0.001). We repeated this analysis for pine using down-sampled reads and found the same correlation (ρ = 0.09, P < 0.001), demonstrating that this pattern is likely not the result of biases in single nucleotide polymorphism (SNP) calling associated with expression differences. In addition, πs was highly correlated between down-sampled and non-down-sampled approaches (ρ = 0.92, P < 0.001). Moreover, the same positive relationship was detected between dS and expression levels; similar patterns between polymorphism and divergence are expected if synonymous sites are largely evolving neutrally, but not if biases in SNP calling are driving the association between expression and πs. The highest levels of nucleotide polymorphism and divergence were at synonymous sites, followed by 5′-UTR, and 3′-UTR sites, with the lowest levels of diversity and divergence in replacement sites for both lodgepole pine and interior spruce (fig. 2; supplementary table S2, Supplementary Material online).

FIG. 2. Polymorphism and divergence estimates of 57 genes containing information for both untranslated regions and coding sequences for lodgepole pine. Interior spruce was used as the outgroup. Panels: (a) nucleotide polymorphism and (b) divergence, each by region (synonymous, replacement, 3′-UTR, 5′-UTR). Different letters indicate significant differences.
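The per-site diversity values in this section can be related to the standard estimator of nucleotide diversity, the average number of pairwise differences per site. The sketch below is an illustration under stated assumptions (biallelic SNPs, no missing genotypes, hypothetical function and argument names); it is not the pipeline used in this study. Restricting the input to synonymous SNPs and synonymous sites would give a quantity analogous to πs.

```python
def nucleotide_diversity(alt_counts, n_chromosomes, n_sites):
    """Average pairwise differences per site for biallelic SNPs.

    alt_counts: alternate-allele count at each SNP (hypothetical input)
    n_chromosomes: number of sampled chromosomes
    n_sites: number of sites surveyed, including invariant ones
    """
    if n_chromosomes < 2 or n_sites <= 0:
        raise ValueError("need at least two chromosomes and one site")
    pairs = n_chromosomes * (n_chromosomes - 1) / 2
    total = 0.0
    for k in alt_counts:
        # k * (n - k) of the possible pairs differ at this SNP
        total += k * (n_chromosomes - k) / pairs
    return total / n_sites


# Hypothetical example: two SNPs among 100 sites in four chromosomes.
pi = nucleotide_diversity([1, 2], n_chromosomes=4, n_sites=100)
```

Sites with no variation contribute only to the denominator, so short alignments with few SNPs give noisy estimates.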
DFEs and Adaptive Divergence
We used the method of Eyre-Walker and Keightley (2009) under two different demographic scenarios to quantify the DFEs, the proportion of differences fixed by positive selection (α) and the rate of adaptive fixation (ωa) in lodgepole pine. The DFE showed similar patterns across the expression pattern categories with one exception: there was a slightly smaller proportion of sites in the effectively neutral category (NeS < 1) for the NON genes compared with the SEG genes (fig. 4a) in the two-epoch model (proportion NON = 0.17, SEG = 0.21, P < 0.01). The same pattern was identified in the down-sampled data set, but the difference between NON and SEG was no longer significant (supplementary fig. S4, Supplementary Material online). There was no difference in α or ωa among the categories (fig. 4b and c), nor any differences among the expression categories for the one-epoch model for α, ωa, or DFE (supplementary fig. S3, Supplementary Material online). A similar pattern was found using the approximation method although α was even higher across all categories (method II, Eyre-Walker and Keightley 2009; supplementary table S5, Supplementary Material online). Over all genes our estimate of α was 0.16 (95% CI = 0.12–0.20) and our estimate of ωa was 0.034 (95% CI = 0.026–0.043) when applying the two-epoch model.
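To make the two quantities concrete, the sketch below uses a simple McDonald-Kreitman-style estimator of α together with the relationship ωa = α × dN/dS. This is a deliberately simplified stand-in, not the DFE-based method of Eyre-Walker and Keightley (2009) applied in this study, which additionally accounts for slightly deleterious mutations and demography. The function names and the counts in the example are hypothetical.

```python
def mk_alpha(dn: float, ds: float, pn: float, ps: float) -> float:
    """Simple MK-style estimate: alpha = 1 - (Ds * Pn) / (Dn * Ps).

    dn, ds: nonsynonymous and synonymous divergence (counts or rates)
    pn, ps: nonsynonymous and synonymous polymorphism (counts or rates)
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("Dn and Ps must be positive")
    return 1 - (ds * pn) / (dn * ps)


def omega_a(alpha: float, dn_per_site: float, ds_per_site: float) -> float:
    """Rate of adaptive substitution relative to the synonymous rate."""
    if ds_per_site <= 0:
        raise ValueError("dS must be positive")
    return alpha * dn_per_site / ds_per_site


# Hypothetical counts: 50 and 100 fixed differences, 20 and 50 polymorphisms.
alpha = mk_alpha(dn=50, ds=100, pn=20, ps=50)
rate = omega_a(alpha, dn_per_site=0.05, ds_per_site=0.25)
```

Because segregating slightly deleterious mutations inflate Pn, this simple estimator tends to underestimate α, which is one motivation for DFE-based approaches.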
We compared the DFEs, α and ωa using the two-epoch model among expression levels. We found a greater proportion of nearly neutral sites in the low expression category compared with the high category (proportion NeS < 1, high = 0.16, low = 0.23, P < 0.001) and a larger fraction of sites under strong purifying selection (proportion NeS > 100, high = 0.72, low = 0.62, P < 0.001) in high expression genes compared with low expression genes. We found no evidence of differences in the proportion of sites fixed by positive selection between expression level categories (fig. 5a), nor the rate of positive selection between expression level categories (fig. 5b and c). Using the one-epoch model with loblolly pine as the outgroup, there was a greater proportion of highly deleterious sites in the high expression category compared with the low expression categories (proportion NeS > 100, high = 0.57, low = 0.17, P < 0.001), and the reverse pattern for the intermediate categories (proportion NeS 10–100, high = 0.21, low = 0.50, P < 0.001; NeS 1–10, high = 0.11, low = 0.22; supplementary fig. S5, Supplementary Material online). We also found that α and ωa were greater for the low expression category compared with the high expression category. The mid expression levels were intermediate between the low and high categories in all cases. Similar patterns were found when interior spruce was used as an outgroup for this analysis (supplementary figs. S6 and S8, Supplementary Material online), although the absolute values of α and ωa were higher when loblolly was the outgroup, perhaps because a more conserved set of orthologs was identified in comparisons between pine and spruce. Similarly, a higher dN/dS was identified on average for lodgepole pine when loblolly was used as the outgroup compared to interior spruce. The same patterns and similar absolute values of α were identified when down-sampled reads were used to calculate polymorphism (supplementary fig. S7, Supplementary Material online), demonstrating that biases in SNP calling were not driving these patterns. Again, a similar pattern was found using the approximation method, although α was even higher across all
Table 2. Genes with Evidence for Site-Specific Positive Selection Using PAML Models (M1a vs. M2a and M7 vs. M8) and the Corresponding Best BLASTX Hit to Arabidopsis.
Orthogroup
LRT M7:M8
LRT M1a:M2a
ortho_group8482
ortho_group3686
ortho_group10360
ortho_group13328
ortho_group8319
ortho_group3142
ortho_group10677
ortho_group18841
ortho_group4644
ortho_group7797
ortho_group14244
ortho_group6195
ortho_group6148
ortho_group7794
ortho_group13072
ortho_group7027
ortho_group7322
ortho_group5194
ortho_group12985
ortho_group5684
ortho_group2017
ortho_group12314
ortho_group7511
ortho_group8450
ortho_group8790
43.63***
28.08**
25.23**
40.49***
28.00**
24.98**
25.61***
24.35**
24.29**
22.85**
22.81**
21.43*
20.42*
19.71*
19.64*
19.39*
19.15*
18.51*
24.64**
24.55**
22.86*
22.82*
21.48*
22.21*
20.42*
19.60*
19.39*
19.58*
18.56*
21.39*
20.78*
20.32*
18.45*
18.38*
18.29*
18.18*
18.09*
17.67*
17.57*
Arabidopsis Top Hit
AT3G17730.1
AT3G22600.1
AT5G01310.1
AT2G42510.1
AT1G28090.1
AT4G32300.1
AT1G52320.2
AT4G14385.2
AT1G69170.1
AT5G04000.1
AT3G20560.1
AT5G01310.1
AT1G70630.1
AT4G29260.1
AT1G60730.2
AT3G14120.1
AT4G17330.1
AT1G26180.1
AT2G15440.1
Description
No hit
NAC domain transcription factor
Seed storage 2S albumin superfamily
APRATAXIN-like
No hit
Involved in spliceosome assembly
Polynucleotide adenylyltransferase family protein
S-domain-2 5
Involved in N-terminal protein myristoylation
Unknown function
Squamosa promoter-binding protein-like transcription factor
No hit
Unknown function
Thioredoxin (TRX) superfamily
APRATAXIN-like
Nucleotide-diphospho-sugar transferase
HAD superfamily, subfamily IIIB acid phosphatase
No hit
NAD(P)-linked oxidoreductase superfamily protein
Unknown function
G2484-1 protein of unknown function
Unknown function
No hit
Unknown function
No hit
NOTE.—Two times the difference in the lnL from the models (likelihood ratio test [LRT], df = 2 in both cases) is shown along with the significance.
*P < 0.05;
**P < 0.01;
***P < 0.001.
categories (method II, Eyre-Walker and Keightley 2009; supplementary table S5, Supplementary Material online).
Genes with Signatures of Positive Selection
To identify specific genes that showed evidence of positive
selection in conifers, we examined 7,185 alignments that
passed all of the filtering requirements. Only 17 orthogroups
had dN/dS ratios that were greater than 1, suggesting substantial constraints on evolutionary rates. After correcting for
multiple comparisons, PAML’s M1a:M2a models identified 15
significant orthogroups and the M7:M8 models identified 23.
All of the orthogroups in M1a:M2a except one were significant in the M7:M8 comparison. Separate runs of the M8
model with different starting values returned similar results.
Top BLAST hits from Arabidopsis thaliana for these 24 genes
(Yeaman et al. 2014) are shown in table 2.
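The likelihood ratio tests behind these comparisons can be sketched as follows. The statistic is twice the difference in log-likelihood between nested models, and with df = 2 the chi-square tail probability has the closed form exp(−x/2), so no statistics library is needed. The function names and log-likelihood values are hypothetical, and the sketch gives uncorrected P values only; it does not reproduce the multiple-comparison correction applied in this study.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Twice the log-likelihood gain of the alternative over the null model."""
    return 2.0 * (lnl_alt - lnl_null)


def lrt_pvalue_df2(statistic: float) -> float:
    """Upper-tail chi-square probability for df = 2, i.e. exp(-x / 2)."""
    if statistic <= 0:
        return 1.0
    return math.exp(-statistic / 2.0)


# Hypothetical fit: a null model (e.g., M7) versus an alternative (e.g., M8).
stat = lrt_statistic(lnl_null=-1010.0, lnl_alt=-1000.0)
p_uncorrected = lrt_pvalue_df2(stat)
```

With thousands of alignments tested, an uncorrected P value of this kind must still be adjusted before a gene is declared significant.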
Discussion
Divergence and Conservation of Expression and
Evolutionary Rate
To our knowledge, this is the first conifer study to find that
evolutionary rate is associated with changes in expression
plasticity between species (fig. 3). Genes that had conserved
expression patterns across treatments between lodgepole
pine and interior spruce, particularly those that varied significantly across treatments, had the lowest average level of
coding sequence evolution (table 1 and fig. 1). In contrast,
those genes showing divergence in expression plasticity, as
well as those that had constitutive differences in expression
between lodgepole pine and interior spruce, had significantly
higher dN/dS.
Our results show a distinct association between coding
sequence evolution and divergence in expression patterns in
comparisons between lodgepole pine and interior spruce
with a 6.4% increase in dN/dS for the two more highly divergent expression categories (DEG, SEG) (fig. 1). Two
nonmutually exclusive factors could be driving this pattern.
The first is that an increase in the rate of fixation due to
positive selection in genes diverging in expression could
produce this relationship. For instance, a change in gene
expression might lead to selection for corresponding
changes in coding sequence to improve the function of
the gene in its altered role. However, we found no significant
differences in the proportion of substitutions resulting from
positive selection or the rate of positive selection (α or ωa;
fig. 4b and c; supplementary fig. S3b and c, Supplementary
Material online) among our expression categories. Indeed,
the trend was in the opposite direction with CEG and NON
genes having slightly higher estimates of α. As a second explanation, genes that are diverging in expression may experience reduced purifying selection in one or both of the
species. For example, higher expression appears to be correlated with greater purifying selection, so a reduction in
gene expression in one species relative to another may be
accompanied by a relaxation of selection in that species.
FIG. 3. Pairwise (below diagonal) and partial (above diagonal) Spearman’s rank correlations between evolutionary rates and genomic variables. (a) dN/dS (N = 3,199), (b) 5′-UTR (N = 2,924), and (c) 3′-UTR (N = 3,028). Note that some contrasts that are reproduced among panels may differ slightly in their results, due to differences in the number of genes included, as a result of incomplete information from UTRs.
[Panel variables: dN/dS (panel a) or UTR substitution rate, dN, and dS (panels b and c), with mean expression, protein length, expression divergence, and πs; color scale: Spearman’s rho, 0.00 to 1.00; cells labeled with P values.]
A slight but significant increase in the proportion of nearly
neutral substitutions in genes with differences in expression
among species (SEG) compared to those with constitutive
expression across treatments and species (NON) provides
some support for this hypothesis (fig. 4a). This pattern
suggests that genes with conserved expression patterns experience stronger purifying selection than those that have
shifted in some way between the species. However, we
found no significant differences between CEG and DEG
genes in α, ωa, or the DFE, and differences in branch-specific
estimates of dN/dS were not correlated with expression differences between pine and spruce. Although differences in
the DFE among expression classes provide some support for
the hypothesis that genes that change expression over evolutionary time experience relaxed selection, we lacked an
outgroup for the gene expression analysis with which to
infer the branch(es) on which the change in expression took place and the direction of the shift. Such information is needed to assess whether positive or negative
selection is associated with an increase or decrease in gene
expression in each lineage, as both could be contributing to
the pattern. Future studies that examine both gene expression and sequence evolution in a phylogenetic context may
allow a greater understanding of the relative roles of positive
and negative selection in driving coding sequence and expression divergence.
The UTRs of eukaryotic mRNAs play an important role in
the posttranscriptional regulation of gene expression (Pesole
et al. 2001; Liu et al. 2012). UTRs can influence gene expression
through their impact on mRNA stability, transcription or
translation efficiency, and mRNA localization (Narsai et al.
2007; Liu et al. 2012). The 3′-UTR showed patterns of polymorphism and divergence that were more similar to the replacement sites while the 5′-UTR showed patterns more
similar to synonymous sites (fig. 2). This suggests greater evolutionary constraints on the 3′-UTR, which tends to be longer
and likely harbors more functional sites (Narsai et al. 2007; Liu
et al. 2012). Because changes in these regions may have direct
impacts on gene expression, we tested if divergence in expression was associated with evolutionary rate in these regions. Genes conserved in their plasticity of expression (CEG)
showed the slowest evolutionary rates for both the 5′- and 3′-UTRs, perhaps because of the role of these regions in maintaining proper regulation in response to environmental
change (fig. 1). Interestingly, the 3′-UTR displayed a significant
positive association between evolutionary rate and expression
divergence (fig. 3), but there was no significant association of
the 5′-UTR rate with expression divergence. This makes sense
as a primary function of the 3′-UTR is to regulate expression
of mRNA, whereas the 5′-UTR has a major role in regulating
translation (Mignone et al. 2002).
Gene Expression Level and Evolutionary Rate
We identified negative correlations between both dN/dS
and dN and average expression level. Several studies have
found that genes with high levels of expression are constrained in their evolutionary rate (for review see Rocha
2006). High expression genes may function in a wide array
of biochemical environments and may be able to tolerate
fewer mutations because they interact with a wide array of
partners (Duret and Mouchiroud 2000; Subramanian and
Kumar 2004). Alternatively, there may be selection to reduce
the number of misfolded proteins. This would constrain dN
when expression is high through selection for protein sequences that fold properly despite mistranslation
(Drummond et al. 2006). To determine the relative importance of positive and negative selection in driving the correlation between expression level and dN/dS, we compared
patterns of divergence and polymorphism at synonymous
and nonsynonymous sites. We found significantly more
nearly neutral (NeS < 1) and fewer highly deleterious mutations (NeS > 100) in the low expression class compared with
the high expression class when using the two-epoch model
(fig. 5). However, neither the proportion of substitutions
fixed by positive selection (α) nor the rate of positive selection (ωa) differed among expression level classes. This suggests
that the differences in dN/dS are driven primarily by a relaxation of purifying selection in low expression genes compared to the high expression genes. Similarly, other studies
that have taken this approach have found strong evidence
for greater purifying selection in high expression genes
(Paape et al. 2013; Williamson et al. 2014). In order to identify polymorphisms using RNAseq we required a relatively
high level of expression across many individuals, suggesting
the pattern would potentially be even stronger if SNPs could
be identified in genes with even lower levels of expression.
However, this approach meant that we eliminated very
weakly expressed genes from the analysis and ensured that
pseudogenes did not create an apparent relationship between expression level and evolutionary rate. Pseudogenes
are common in conifers (Nystedt 2013) and could have a
basal level of expression (Thibaud-nissen et al. 2009).

FIG. 4. The DFEs (a), the proportion of sites fixed by positive selection (b), and the rate of fixation by positive selection (c) for four gene expression
categories determined by the patterns of expression among climate treatments and between lodgepole pine and interior spruce. CEG: conserved
plasticity in expression; DEG: diverged plasticity in expression; SEG: species diverged genes with no plasticity in expression; and NON: genes with no
treatment or species differences in expression. Interior spruce was used as the outgroup and a two-epoch model was applied. 95% confidence
intervals are shown from bootstrapping the data.
Evolutionary Rates and Genomic Context
Context-specific gene expression can reduce pleiotropic
constraints by limiting the effects of mutations to a specific
context (e.g., tissues or environments). This can potentially
facilitate sequence divergence (Pal et al. 2006; Snell-Rood
et al. 2010). Genes with low tissue specificity have also
been found to be more slowly evolving (e.g., Subramanian
and Kumar 2004; Renaut et al. 2012; Paape et al. 2013), likely
because these genes are involved in multiple biochemical
pathways and experience multiple selective environments.
Our present study was unable to examine tissue specificity
as only one organ type (needles) was examined.
Environment-specific gene expression may have similar impacts on evolutionary rate as it could restrict gene expression to a subset of individuals. The restricted number of
individuals that experience this environment and thereby
express these genes will mean that the effects of selection
will be weakened (Snell-Rood et al. 2010). To address this
hypothesis, we examined the association of dN/dS with
treatment specificity. However, in coding regions we found
limited evidence for this association once correlations
among other variables were taken into account (supplementary fig. S2, Supplementary Material online). This could be
because climate treatments reflect environments commonly
experienced by most individuals during development due to
temporal variation, or because genes that appear to have
high treatment specificity in these seedlings are actually expressed in a number of environments or involved in multiple
traits over the long lifetimes of these trees. Alternatively, our
cutoff for determining if a gene was expressed or not in a
treatment may not reflect the functionality of these weakly
expressed genes. However, the untranslated regions were
associated with treatment specificity but in opposing
ways, with the positive association between substitution
rate and treatment specificity predicted for the coding sequence found in the 3′-UTR, and with a stronger negative
association in the 5′-UTR such that genes with faster evolving 5′-UTRs were expressed in a wider array of climate
treatments.

FIG. 5. The DFEs (a), the proportion of sites fixed by positive selection (b), and the rate of fixation by positive selection (c) for four gene expression
categories, equal in size, determined by average expression level in lodgepole pine. Loblolly pine was used as the outgroup and a two-epoch model
was applied. 95% confidence intervals are shown from bootstrapping the data. Different letters indicate significant differences.
We found a strong positive correlation between protein
length and evolutionary rate. Protein length has been found
to be correlated with divergence in previous studies (but see
Drummond et al. 2006; Alvarez-Ponce 2012), although the
direction of the relationship is not consistent among species.
Similar to our findings, some studies find strong positive
correlations with dN/dS (Lemos et al. 2005; Ingvarsson
2007), but others report the reverse pattern (Liao and
Zhang 2006; Larracuente et al. 2008; Yang and Gaut 2011;
Sun and Choi 2015). Positive correlations between protein
length and coding sequence divergence could be due to greater
selective interference among sites (i.e., the Hill–Robertson
effect), which is expected in longer proteins, thereby reducing
the efficiency of natural selection and potentially leading to
greater divergence (Ingvarsson 2007). Long proteins also had
reduced synonymous polymorphisms, which would be expected under this scenario. However, such an explanation
predicts that linked untranslated regions should follow a
similar pattern. This is inconsistent with our results, perhaps
implying a greater role of the untranslated regions in regulating transcription or translation of longer proteins in particular. Protein length could be correlated with other factors
not accounted for in our analysis that may influence divergence in the coding sequence or UTRs, such as gene function,
intron number or length, and codon bias (Bush et al. 2015; De
La Torre et al. 2015). We also found associations between
evolutionary rate and some gene ontology (GO) categories
for both the UTR and coding regions, suggesting gene function can be an important driver of evolutionary change (supplementary fig. S9, Supplementary Material online).
Estimates of Adaptive Divergence in Conifers
We identified only a handful of genes that showed evidence of
site-specific positive selection using divergence data with
PAML. Although this method has limited power to detect
selection when few taxa are included in the alignment
(Bielawski and Yang 2005), our findings provide an initial
glimpse at which genes are evolving rapidly during conifer
evolution. Several putative transcription factors were identified (ortho_group3686, ortho_group13328, ortho_group13072),
including one with a homolog involved in defense
in A. thaliana (ortho_group14244). Ongoing work identifying
loci important for local adaptation in these species will allow
us to determine how often genes diverging within a species in
response to local selective pressures are the same as those
fixing rapidly among species in response to positive selection.
The low number of sites identified as experiencing positive
selection using divergence data alone contrasts with the relatively high proportion of sites identified as adaptively diverging in lodgepole pine using polymorphism and divergence
data (figs. 4 and 5). The results of the two-epoch model suggest that the proportion of sites evolving due to positive
selection in conifers ranges from 0.13 to 0.52, depending on
the expression class examined and the outgroup used. Our
estimates are higher than previous estimates of α obtained
for several conifer species (Eckert et al. 2013a). Using a similar
approach, Eckert et al. (2013a) found that α estimates were not
significantly different from zero in all 11 lineages examined.
They attribute these low values to strong biases in the genes
chosen for the analysis. In a sample of a larger number of
genes in loblolly pine, estimates of α similar to ours have been
found (Eckert et al. 2013b). Many widespread conifers are
thought to have relatively large effective population sizes
(Neale and Savolainen 2004) and several lines of evidence
including decades of common garden experiments demonstrate the adaptive capacity of conifers (Alberto et al. 2013).
The proportion of sites fixed by positive selection is consistent
with estimates in other plant species with large effective population sizes and low population structure (see Hough et al.
2013 for review) such as sunflowers (Renaut et al. 2012),
Capsella grandiflora (Williamson et al. 2014), and Populus
tremula (Ingvarsson 2010). However, estimates of α require
many assumptions to be made and violation of these assumptions can dramatically impact the outcome and hinder
comparisons among species (Gossmann et al. 2010; Hough
et al. 2013). For example, current methods to estimate α
assume that positively selected sites are swept to fixation
and do not contribute to polymorphism. Although we only
used sites from a single population in this study, local adaptation or strong population structure will maintain variation
within a species that is not accounted for by this approach.
Estimates of α can be strongly biased downwards if a significant number of polymorphic sites are slightly deleterious,
as such sites would rarely contribute to divergence. Therefore,
we implemented DFE-α, which attempts to estimate the fraction of these deleterious alleles using the allele frequency
spectrum (AFS; Eyre-Walker and Keightley 2009). This approach estimates demographic changes, which can also impact the frequency spectrum of polymorphisms. However,
the demographic model used (one or two epochs) had a
substantial impact on the outcome. The effects of genetic
draft can distort the AFS and impact estimates of demographic parameters and strength of purifying selection
(Messer and Petrov 2013). Differences in selection on linked
functional sites could explain why slightly different demographic estimates were obtained for different classes of genes
in our analysis. A single epoch model resulted in significantly
greater rates of positive selection overall, shifts in the DFE and
significantly higher α and ωa estimates in low expression
genes compared with the two-epoch model (supplementary
figs. S3 and S5, Supplementary Material online). Simulations
have shown that a two-epoch model can produce more accurate estimates of α even when population sizes are held
constant due to the impact of genetic draft. The two-epoch
model essentially accounts for the skew of the site frequency
spectrum at synonymous sites introduced by background
selection. Messer and Petrov (2013) found that the one-epoch model generally overestimated α, which is consistent
with our findings. However, variation in demographic estimates for different gene categories and the nonindependence
of functional and nonfunctional sites may have obscured
differences in DFE or rates of positive selection.
We found a weak but significant negative correlation between dN/dS and synonymous site polymorphism. This pattern is driven by an underlying positive correlation with dS
and a stronger negative correlation with dN. One explanation
for this pattern is recurrent selection, whereby repeated
sweeps at functional sites result in reduced variation at
linked neutral sites (Maynard Smith and Haigh 1974).
Similarly, interactions between positive and negative selection
could contribute to this pattern, such that positive selection
may result in divergence at weakly deleterious sites via local
reductions in effective population size (Andolfatto 2007).
Such sweeps would also be expected to reduce neutral diversity. Simulations have demonstrated that a negative correlation between nonsynonymous divergence and neutral
polymorphism is unlikely to be generated by negative selection alone (Lohmueller et al. 2011). Along with intermediate
estimates of α, the negative partial correlation between dN
and synonymous nucleotide diversity suggests that adaptive
divergence among conifer genomes may be more frequent
than some previous estimates have indicated (Eckert et al.
2013a). Our capacity to detect rapidly evolving genes or sites
is likely hampered somewhat by difficulties in properly aligning diverged regions and our focus on one-to-one orthologs,
suggesting that the adaptive divergence identified here could
be even more prevalent. Conifer genomes have notoriously
low evolutionary rates relative to angiosperms and have several other unique genomic features (e.g., reduced whole-genome duplications, but see Li et al. 2015), perhaps suggesting
conifer-specific mechanisms of genome evolution
(Buschiazzo et al. 2012). Determining whether this level of
adaptive divergence is due to the demography and life history
of our particular study species or a general feature of conifers
will require further comparative studies.
Materials and Methods
Transcriptome Data
We previously developed reference transcriptomes for lodgepole pine and interior spruce (Yeaman et al. 2014). The reference transcriptomes contained a single (longest) isoform
per transcript cluster (identified by the Trinity assembler;
Grabherr et al., 2011) and had weakly expressed transcripts
removed. We also obtained reference transcriptomes for loblolly pine, Sitka spruce, Norway spruce and Douglas-fir from
the TreeGenes database (http://dendrome.ucdavis.edu/treegenes/, last accessed February 20, 2016). For each of these
transcriptomes, we removed redundant transcripts by
clustering using Cd-Hit-Est (94% identity, word size = 8 and
both strands were compared) (Li and Godzik 2006; Fu et al.
2012).
Ortholog Identification and Interspecific Alignments
for Six Conifer Species
Using the six conifer transcriptome assemblies, we conducted
an all-against-all TBLASTX. Using these results, we identified
orthologs with OrthoMCL version 2.0.8 applying default parameters (Li et al. 2003). This program uses a heuristic BLAST-based approach to identify putative orthologous clusters of
genes termed orthogroups. For comparisons between species
we only used one-to-one orthologs. We identified the most
likely open reading frames (ORFs) for all orthologs using
Transdecoder (option --search_pfam; Haas et al., 2013). After
ORFs were translated, we conducted a BLASTP search against the TAIR 10
database. Only those ORFs with a Pfam hit or a hit to A. thaliana
were retained for further analysis to ensure the correct identification of ORFs, particularly in fragmented transcripts. A
relatively small fraction of putative ORFs was rejected at this
stage (<18%). Following this we extracted the predicted coding sequences and the 5′-UTR and 3′-UTR for each
orthogroup and aligned the sequences using Prank (+F option; Löytynoja and Goldman 2008). This program, which
takes evolutionary relationships into account when doing
the alignments, has been shown to outperform other alignment programs (Löytynoja and Goldman 2008; Fletcher and
Yang 2010; Markova-Raina and Petrov 2011). For the coding
regions we used the codon model for the alignments.
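
The restriction to one-to-one orthologs can be illustrated with a short Python sketch. It is not the OrthoMCL workflow itself: the function name, the input layout (a mapping from orthogroup to species and transcript pairs), and the toy identifiers are all hypothetical, and we assume that "one-to-one" means exactly one transcript from each species being compared.

```python
from collections import Counter


def one_to_one_orthogroups(orthogroups, species):
    """Keep orthogroups with exactly one transcript from each requested species.

    orthogroups: dict mapping group id -> list of (species, transcript_id) pairs.
    species: the species that must each contribute exactly one member.
    """
    kept = {}
    for group_id, members in orthogroups.items():
        counts = Counter(sp for sp, _ in members)
        if all(counts[sp] == 1 for sp in species):
            kept[group_id] = {sp: tid for sp, tid in members if sp in species}
    return kept


# Hypothetical toy input; identifiers are made up for illustration.
groups = {
    "og1": [("pine", "p_1"), ("spruce", "s_1")],
    "og2": [("pine", "p_2"), ("pine", "p_3"), ("spruce", "s_2")],  # two pine copies
    "og3": [("pine", "p_4")],  # no spruce member
}
pairs = one_to_one_orthogroups(groups, ["pine", "spruce"])
```

In this toy input only "og1" is retained, because "og2" contains two pine transcripts and "og3" lacks a spruce member.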
RNAseq Data and Expression Analysis
We employed approximately 350 Gb RNAseq data that we
previously gathered to examine patterns of gene expression
among seven climate treatments that differed in their light,
temperature, and moisture regime (for methodological details see supplementary table S1, Supplementary Material online; Yeaman et al. 2014). Data were derived from needle
samples of 44 lodgepole pine and 39 interior spruce seedlings
obtained from the British Columbia Ministry of Forests,
Lands, and Natural Resource Operations. Lodgepole pine
seed was from seedlot 63,019, seed orchard 313 (Nelson
Seed Planning Unit) containing 46 parental genotypes.
Interior spruce seedlings were grown from seedlot 63,060
from seed orchard 305 (Nelson Seed Planning Unit) containing 70 parental genotypes. We used RSEM to estimate gene
expression levels in each individual (Li and Dewey 2011),
aligning each library back to our Trinity assemblies. For orthologous genes between lodgepole pine and interior spruce, we
analyzed patterns of gene expression among treatments and
species using the EdgeR software package (Robinson et al.
2010).
Evolutionary Rates for Pairwise Comparisons between
Lodgepole Pine and Interior Spruce
For all orthogroups for which lodgepole pine and interior
spruce had one-to-one orthologs, we conducted pairwise
comparisons to determine the divergence at nonsynonymous
and synonymous sites in the coding sequence, as well as the
divergence in the untranslated regions using PAML version
4.5 (Yang 1997; Yang 2007). We ran BASEML for the UTRs
(general time reversible model), and CODEML for the protein
coding regions (runmode -2, F3X4 codon frequency). We removed columns with missing data or gaps and only retained
sequences for which there were at least 60 bases for the
untranslated regions and 60 codons for the coding regions.
As low divergence leads to uncertain estimates, orthogroups
where dS was below 0.01 were excluded. We removed alignments where the untranslated regions had substitution rates
below this level and discarded orthogroups showing substitution rates greater than 2, indicating saturation of substitutions and potential alignment errors. We used a number of
filters to avoid alignment errors. This may bias alignments
toward more slowly diverging genes, but we felt that this
was preferable to introducing errors that can provide incorrect estimates of evolutionary rate. Prior to the analysis, we
eliminated all alignments with average percent identity below
50%, and for the analysis of the coding regions, we removed
genes without complete ORFs in both species (i.e., a start and
a stop codon detected), and removed genes with large discrepancies in protein length (greater than a 10% difference).
These features may indicate misalignments, frameshifts, premature stop codons, pseudogenes, or the erroneous identification of orthologs based on homologous conserved domains
in otherwise separate genes.
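
The coding-region filters described above can be summarized as a single predicate. The Python sketch below is illustrative only: the `PairwiseAlignment` record and its field names are hypothetical, the saturation cutoff is assumed to apply to both dN and dS, and the length discrepancy is assumed to be measured relative to the longer protein. The thresholds themselves are those stated in the text.

```python
from dataclasses import dataclass


@dataclass
class PairwiseAlignment:
    """Hypothetical summary of one pine-spruce coding alignment."""
    n_codons: int            # codons left after removing gaps and missing data
    dn: float                # nonsynonymous divergence
    ds: float                # synonymous divergence
    percent_identity: float  # average percent identity, 0 to 100
    complete_orf_both: bool  # start and stop codon detected in both species
    protein_len_a: int       # protein length in species A
    protein_len_b: int       # protein length in species B


def passes_coding_filters(aln, min_codons=60, min_ds=0.01, max_rate=2.0,
                          min_identity=50.0, max_len_diff=0.10):
    """Return True if the alignment survives every filter."""
    if aln.n_codons < min_codons:
        return False
    if aln.ds < min_ds:                           # too little divergence
        return False
    if aln.dn > max_rate or aln.ds > max_rate:    # saturation (assumed: either rate)
        return False
    if aln.percent_identity < min_identity:
        return False
    if not aln.complete_orf_both:
        return False
    longer = max(aln.protein_len_a, aln.protein_len_b)
    len_diff = abs(aln.protein_len_a - aln.protein_len_b) / longer
    return len_diff <= max_len_diff


good = PairwiseAlignment(300, 0.05, 0.40, 85.0, True, 300, 310)
low_ds = PairwiseAlignment(300, 0.001, 0.005, 99.0, True, 300, 300)
length_mismatch = PairwiseAlignment(300, 0.05, 0.40, 85.0, True, 300, 400)
```

Applying the filters in order of cost (cheap length checks first) is a design convenience here, not something specified in the text.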
SNP Identification and Polymorphism Estimates
We aligned RNAseq reads for lodgepole pine and interior
spruce to the de novo assemblies for each species (Yeaman
et al. 2014) with BWA-mem (Li and Durbin 2009; Li 2013) and
GATK IndelRealigner (DePristo et al. 2011), and called SNPs
and indels using Mpileup (Samtools/Bcftools v 0.1.19; Li et al.
2009; Li 2011). Details are in the supplementary methods,
Supplementary Material online. Following filtering (supplementary methods, Supplementary Material online), we used
a modified version of Polymorphorama perl script (Bachtrog
and Andolfatto 2006; Andolfatto 2007; Haddrill et al. 2008) to
generate nonsynonymous and synonymous AFS, as well as
summary statistics for each gene. This script estimated the
number of synonymous sites, nonsynonymous sites, average
pairwise diversity at synonymous (πs) and nonsynonymous
sites (πn), average pairwise divergence (Dxy), as well as counts
of the number of polymorphisms (S), and the summary of the
frequency distribution of polymorphism, Tajima’s D (Tajima
1989). Average pairwise diversity and divergence estimates
were either corrected for multiple hits using a Jukes–Cantor
correction (Jukes and Cantor 1969) or, in the case of synonymous divergence, the Kimura (1980) two-parameter model.
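
For reference, the two multiple-hit corrections named above have standard closed forms: the Jukes–Cantor distance is d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites, and the Kimura two-parameter distance is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the observed proportions of transitions and transversions. The Python sketch below implements these formulas; the function names are ours and are not taken from the Polymorphorama script.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from the proportion of differing sites p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p(transitions, transversions):
    """Kimura two-parameter distance from transition (P) and transversion (Q) proportions."""
    a = 1.0 - 2.0 * transitions - transversions
    b = 1.0 - 2.0 * transversions
    if a <= 0.0 or b <= 0.0:
        raise ValueError("sequences are too divergent for the correction to be defined")
    return -0.5 * math.log(a * math.sqrt(b))


raw = 0.10
corrected = jukes_cantor(raw)  # slightly larger than the raw proportion
```

When transitions make up one third of all differences (P = p/3, Q = 2p/3), the two corrections coincide, which provides a quick consistency check.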
Statistical Analysis
We determined if the rates of protein and UTR evolution
differed depending on the pattern of gene expression among
treatments or between lodgepole pine and interior spruce
using a linear model in R (function lm). When gene expression
class (CEG, DEG, SEG, and NON) explained a significant
amount of the variation in evolutionary rate, we carried out
multiple comparisons using Tukey’s test. The data were either
log or square root transformed to improve normality and
homogeneity of variance.
We determined if dN/dS ratio and substitution rate in the
untranslated regions were correlated with average expression
level, expression divergence between species (lodgepole pine
vs. interior spruce), treatment specificity, or protein length
using partial correlations (Spearman’s) in R (pcor R). Average
expression level was based on the transcript fraction measure
from RSEM and is preferred over RPKM and FPKM measures
because it is independent of the mean expressed transcript
length, and comparable across samples and species (Li and
Dewey 2011). Values were averaged within each treatment
and species, and then averaged over all treatments.
Expression divergence between species was assessed by determining the Euclidean distance between the average expression values for each treatment between species. Treatment
specificity was determined by identifying treatments with
average expression below 1.0e-6. The proportion of treatments with values below this threshold was then determined
so that specificity would be 0 if expression occurred in all
treatments and 1 if it was expressed in none of the treatments
examined. We repeated the above analysis but replaced treatment specificity with πs as we were unable to estimate this in
genes where treatment specificity was low.
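
The two expression summaries defined above can be written compactly. The following Python sketch is a minimal illustration under stated assumptions: inputs are per-treatment mean transcript fractions for a single gene, the function names are hypothetical, and the default threshold of 1e-6 is an assumed value that should be replaced by whatever cutoff is actually used.

```python
import math


def expression_divergence(pine_means, spruce_means):
    """Euclidean distance between per-treatment mean expression in two species."""
    if len(pine_means) != len(spruce_means):
        raise ValueError("both species need one mean per treatment")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pine_means, spruce_means)))


def treatment_specificity(treatment_means, threshold=1.0e-6):
    """Proportion of treatments whose mean expression falls below the threshold.

    0 means the gene is expressed in every treatment; 1 means in none.
    The default threshold is an assumption made for this sketch.
    """
    below = sum(1 for value in treatment_means if value < threshold)
    return below / len(treatment_means)


# Made-up values for one gene across four treatments.
divergence = expression_divergence([0.0, 3.0], [4.0, 0.0])
specificity = treatment_specificity([2e-6, 5e-7, 0.0, 3e-6])
```

Because the Euclidean distance is not scaled, highly expressed genes can show larger distances for the same proportional change, which is one reason to include mean expression as a covariate in the partial correlations.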
We tested the hypothesis that the correlation between
expression and sequence divergence is driven by relaxed selection in the species with lower expression levels. We identified alignments with lodgepole pine, interior spruce and
Douglas-fir and used PAML to determine lineage-specific
evolutionary rates. Because we did not have an outgroup
for expression to determine changes in expression level along
each branch, we examined correlations between the difference in dN/dS and the difference in average expression level
between lodgepole pine and interior spruce.
For those orthologs in interior spruce and lodgepole pine
with sufficient data, we compared levels of nucleotide diversity and Tajima’s D between the species using a Wilcoxon
Rank test in R. Patterns of polymorphism and divergence
were examined among gene expression classes and gene regions (untranslated regions vs. replacement and synonymous
sites) using the nonparametric Kruskal–Wallis test. Because
only a handful of orthogroups contained all three gene regions in interior spruce with sufficient polymorphism data for
both UTRs and coding region, we only present this comparison for lodgepole pine, but the same pattern was observed in
both species when comparisons of the coding region and
each UTR were conducted (supplementary results,
Supplementary Material online).
Estimates of the DFEs, α, and ωa
The McDonald–Kreitman test compares polymorphism and
divergence between selectively and neutrally evolving sites to
estimate the proportion of fixations driven by positive selection (α). However, the effects of slightly deleterious mutations
can downwardly bias estimates of positive selection.
Therefore, we implemented the approach of Eyre-Walker
and Keightley (2009) to estimate α and the rate of positive
selection (ωa) while taking into account segregating
deleterious polymorphism by using the site frequency spectrum. The divergence values and the AFS were calculated for
each gene using Polymorphorama for lodgepole pine with
interior spruce as an outgroup. We chose this outgroup as
we wanted to compare changes in gene expression between
these species to the adaptive substitutions arising during divergence between them. Using synonymous sites as a neutral
reference we estimated α, ωa, and the DFEs using DFE-α
(Keightley and Eyre-Walker 2007; Eyre-Walker and Keightley
2009) for each expression category. This was also repeated
using method II from Eyre-Walker and Keightley (2009) implemented in DoFe 3.0. Divergence and polymorphism data
were summed across all genes in the specific category. We
repeated this analysis by down-sampling reads to ensure that
patterns were not impacted by potential biases in SNP calling
associated with expression level. We implemented one- and
two-epoch models (i.e., a single population size vs. a stepwise
change in population size) and ran each model under a range
of starting parameters (t2: 1,000, 100, 10; s: 0.1, 0.01, 0.001;
beta: 2, 1, 0.5, 0.1). Changing the starting parameters had little
impact on the final estimates (<1%) in all cases. However, the
number of epochs in the model had a large impact on the
absolute estimates of α and in some cases the relative differences among expression classes. We estimated 95% CI using
1,000 bootstraps by sampling genes with replacement from
each expression category. We determined significance in the
same manner as Williamson et al. (2014).
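
To make the pooling and bootstrapping steps concrete, the Python sketch below computes the simple McDonald–Kreitman estimator, α = 1 - (Ds Pn)/(Dn Ps), from counts summed across the genes in a category and obtains a percentile interval by resampling genes with replacement. This is deliberately not the DFE-α method: it ignores the site frequency spectrum and is therefore subject to the downward bias from slightly deleterious polymorphisms noted above. The data layout, function names, and counts are hypothetical.

```python
import random


def mk_alpha(genes):
    """Simple MK estimate of alpha from counts summed across genes.

    Each gene is a dict with Dn, Ds (divergence) and Pn, Ps (polymorphism) counts.
    """
    dn = sum(g["Dn"] for g in genes)
    ds = sum(g["Ds"] for g in genes)
    pn = sum(g["Pn"] for g in genes)
    ps = sum(g["Ps"] for g in genes)
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


def bootstrap_ci(genes, n_boot=1000, level=0.95, seed=0):
    """Percentile confidence interval from resampling genes with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(genes) for _ in genes]
        try:
            estimates.append(mk_alpha(sample))
        except ValueError:
            continue  # skip replicates where alpha is undefined
    if not estimates:
        raise ValueError("no bootstrap replicate produced a defined estimate")
    estimates.sort()
    lower = estimates[int((1.0 - level) / 2.0 * len(estimates))]
    upper = estimates[int((1.0 + level) / 2.0 * len(estimates)) - 1]
    return lower, upper


# Made-up counts for a small expression category.
category = [
    {"Dn": 10, "Ds": 20, "Pn": 5, "Ps": 20},
    {"Dn": 4, "Ds": 15, "Pn": 3, "Ps": 12},
    {"Dn": 8, "Ds": 18, "Pn": 6, "Ps": 25},
]
alpha_hat = mk_alpha(category)
ci_low, ci_high = bootstrap_ci(category)
```

Resampling whole genes rather than individual sites preserves the linkage among sites within a gene, which matches the gene-level bootstrap described in the text.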
We also examined the impact of expression level on α, ωa,
and the DFE to determine the relative importance of positive
and negative selection in shaping the relationship between
divergence and expression level. We examined expression
level in lodgepole pine using loblolly pine as an outgroup.
We repeated this analysis using interior spruce as the outgroup and by down-sampling reads to ensure that patterns
were not impacted by the selected outgroup or potential
biases in SNP calling associated with expression level.
Polymorphism data from interior spruce were not used in
this analysis, as the hybrid zone effects would not be accounted for in the demographic model available in DFE-α.
Genes were divided into four equal categories based on average expression levels and DFE-α was run in the same manner
as above.
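
Dividing genes into four equally sized expression categories amounts to ranking genes by mean expression and cutting the ranking into quarters. The Python sketch below shows one way to do this; the function name and gene identifiers are hypothetical, the category labels follow those used in figure 5, and the handling of ties (by input order) is an assumption.

```python
def expression_quartiles(mean_expression):
    """Split genes into four equally sized classes by mean expression.

    mean_expression: dict mapping gene id -> average expression level.
    Returns a dict mapping class label -> list of gene ids (lowest first).
    """
    labels = ["low", "mid low", "mid high", "high"]
    ranked = sorted(mean_expression, key=mean_expression.get)
    n = len(ranked)
    classes = {label: [] for label in labels}
    for rank, gene in enumerate(ranked):
        classes[labels[min(rank * 4 // n, 3)]].append(gene)
    return classes


# Made-up expression values for eight genes.
expression = {"g1": 0.1, "g2": 0.2, "g3": 0.5, "g4": 0.9,
              "g5": 1.5, "g6": 2.0, "g7": 7.0, "g8": 9.0}
classes = expression_quartiles(expression)
```

Each class can then be analyzed separately, with polymorphism and divergence counts summed across its genes.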
Analysis of Site-Specific Positive Selection Using
Divergence among Six Conifers
Only a small fraction of sites are likely targeted by positive
selection over a brief window of evolutionary time (Golding
and Dean 1998; Bielawski and Yang 2005). Positive selection is
difficult to detect using pairwise comparisons, as this approach averages selective pressure over the entire evolutionary history separating the two lineages and over all codon
sites in the sequences. Power is improved if selective pressure
is allowed to vary over sites or branches. However, the greater
complexity of the model means that multiple sequences are
needed (Yang 1998; Yang et al. 2000; Bielawski and Yang
2005). To accommodate this, we included all six conifer species for which transcriptomes are publicly available. As visual
inspection of the alignments occasionally indicated potential
paralogs, we used a tree-based approach to flag these problematic alignments (see supplementary methods,
Supplementary Material online), which were ignored for all
downstream analyses.
We evaluated site-specific positive selection using PAML
4.5. We filtered all alignments according to the same parameters as above. Only orthogroups with at least three species in
the trimmed and filtered alignments were used. For the analysis we constructed an unrooted tree using the known topology among the six species (Wang et al. 2000; Lockwood
et al. 2013; supplementary fig. S10, Supplementary Material
online). Specifically, we used the sites model in CODEML to
estimate dN and dS at each codon averaged across all
branches in the tree. We tested for sites evolving by positive
selection (i.e., dN/dS, ω > 1) by comparing M1a (nearly neutral) against M2a (positive selection) and M7 (beta) against M8 (beta
and ω) (Yang 1997; Yang 2014). Equilibrium codon frequencies for each alignment were estimated from the average
nucleotide frequencies at the three codon positions (F3X4),
and transition/transversion ratios were estimated from the data by iteration. Twice the difference in log-likelihood values for the
M1a:M2a (2 df) and M7:M8 (2 df) comparisons was assessed
for statistical significance using the χ² distribution. Because
the M8 model in particular is known to be influenced by
starting parameters, we ran the program using multiple starting values of the parameter ω. We compared P values to
critical values calculated based on α = 0.05 so that fewer
than 5% of genes identified as significant were false positives
(q value R; Storey 2002).
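The likelihood ratio tests described above can be sketched as follows. For a chi-square distribution with 2 df the upper-tail probability has the closed form exp(-x/2), so no statistics library is needed. The multiple-testing step shown here is the Benjamini-Hochberg adjustment, which corresponds to the conservative special case of Storey's q value with the proportion of true nulls fixed at 1; it is a stand-in for illustration, not the estimator used in the study. All function names and log-likelihood values are hypothetical.

```python
import math


def lrt_pvalue_2df(lnl_null, lnl_alt):
    """P value for 2 * (lnL_alt - lnL_null) against chi-square with 2 df."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # guard against tiny negatives
    return math.exp(-stat / 2.0)  # exact survival function for 2 df


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values (q value with pi0 assumed = 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical (lnL under M7, lnL under M8) pairs for four orthogroups.
example_lnl = [(-1000.0, -995.0), (-2310.4, -2310.1),
               (-870.2, -861.9), (-1502.7, -1500.3)]
pvals = [lrt_pvalue_2df(null, alt) for null, alt in example_lnl]
qvals = bh_adjust(pvals)
significant = [q < 0.05 for q in qvals]
```

Because the M8 likelihood can depend on the starting value of ω, one reasonable convention (an assumption here) is to keep the highest log-likelihood across starting values for each model before computing the test statistic.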
Supplementary Material
Supplementary methods and results, figures S1–S10 and
tables S1–S7 are available at Molecular Biology and
Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
The authors would like to thank the Aitken and Rieseberg lab
groups for helpful suggestions as well as the editor and three
anonymous reviewers for their insightful comments. The authors thank G. O’Neill of the BC Ministry of Forests, Lands and
Natural Resources Operations for providing seedlings, and P.
Smets, R. Belvas and C. Fitzpatrick for cultivating seedlings
and maintaining treatments. This work is part of the
AdapTree Project funded by the Genome Canada Large
Scale Applied Research Project program, with co-funding
from Genome BC, the BC Ministry of Forests, Lands and
Natural Resources Operations, and the Forest Genetics
Council of BC (co-Project leaders S.N. Aitken and A.
Hamann).
References
Aitken SN, Yeaman S, Holliday JA, Wang T, Curtis-McLane S. 2008.
Adaptation, migration or extirpation: climate change outcomes
for tree populations. Evol Appl. 1:95–111.
Alberto FJ, Aitken SN, Alía R, González-Martínez SC, Hänninen H,
Kremer A, Lefèvre F, Lenormand T, Yeaman S, Whetten R, et al.
2013. Potential for evolutionary responses to climate change —
evidence from tree populations. Glob Chang Biol. 19:1645–1661.
Alvarez-Ponce D. 2012. The relationship between the hierarchical position of proteins in the human signal transduction network and their
rate of evolution. BMC Evol Biol. 12:192.
Andolfatto P. 2007. Hitchhiking effects of recurrent beneficial amino acid
substitutions in the Drosophila melanogaster genome. Genome Res.
17:1755–1762.
Bachtrog D, Andolfatto P. 2006. Selection, recombination and demographic history in Drosophila miranda. Genetics 174:2045–2059.
Bielawski JP, Yang Z. 2005. Maximum likelihood methods for detecting
adaptive protein evolution. In: Nielsen R, editor. Statistical methods
in molecular evolution. New York: Springer Verlag. p. 103–124.
Buschiazzo E, Ritland C, Bohlmann J, Ritland K. 2012. Slow but not low:
genomic comparisons reveal slower evolutionary rate and higher
dN/dS in conifers compared to angiosperms. BMC Evol Biol. 12:1–15.
Bush SJ, Kover PX, Urrutia AO. 2015. Lineage-specific sequence evolution
and exon edge conservation partially explain the relationship between evolutionary rate and expression level in A. thaliana. Mol Ecol.
24:3093–3106.
DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C,
Philippakis AA, del Angel G, Rivas MA, Hanna M, et al. 2011. A
framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 43:491–498.
Drummond DA, Raval A, Wilke CO. 2006. A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol. 23:327–337.
Duret L, Mouchiroud D. 2000. Determinants of substitution rates in
mammalian genes: expression pattern affects selection intensity
but not mutation rate. Mol Biol Evol. 17:68–74.
Eckert A, Shahi H. 2012. Spatially variable natural selection and the
divergence between parapatric subspecies of lodgepole pine (Pinus
contorta, Pinaceae). Am J Bot. 99:12–11.
Eckert AJ, Bower AD, Jermstad KD, Wegrzyn JL, Knaus BJ, Syring JV, Neale
DB. 2013a. Multilocus analyses reveal little evidence for lineage-wide
adaptive evolution within major clades of soft pines (Pinus subgenus
Strobus). Mol Ecol. 22:5635–5650.
Eckert AJ, Wegrzyn JL, Liechty JD, Lee JM, Cumbie WP, Davis JM, Goldfarb
B, Loopstra CA, Palle SR, Quesada T, et al. 2013b. The evolutionary
genetics of the genes underlying phenotypic associations for loblolly
pine (Pinus taeda, Pinaceae). Genetics 195:1353–1372.
Eyre-Walker A, Keightley PD. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations
and population size change. Mol Biol Evol. 26:2097–2108.
Fletcher W, Yang Z. 2010. The effect of insertions, deletions, and alignment errors on the branch-site test of positive selection. Mol Biol
Evol. 27:2257–2267.
Fu L, Niu B, Zhu Z, Wu S, Li W. 2012. CD-HIT: accelerated for clustering
the next-generation sequencing data. Bioinformatics 28:3150–3152.
Gillespie J. 1991. The causes of molecular evolution. Oxford: Oxford
University Press.
Golding GB, Dean AM. 1998. The structural basis of molecular adaptation. Mol Biol Evol. 15:355–369.
Gossmann TI, Song B-H, Windsor AJ, Mitchell-Olds T, Dixon CJ, Kapralov
MV, Filatov DA, Eyre-Walker A. 2010. Genome wide analyses reveal
little evidence for adaptive evolution in many plant species. Mol Biol
Evol. 27:1822–1832.
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I,
Adiconis X, Fan L, Raychowdhury R, Zeng Q, et al. 2011. Full-length
transcriptome assembly from RNA-Seq data without a reference
genome. Nat Biotechnol. 29:644–652.
Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J,
Couger MB, Eccles D, Li B, Lieber M, et al. 2013. De novo transcript
sequence reconstruction from RNA-seq using the trinity platform
for reference generation and analysis. Nat Protoc. 8:1494–1512.
Haddrill PR, Bachtrog D, Andolfatto P. 2008. Positive and negative selection on noncoding DNA in Drosophila simulans. Mol Biol Evol.
25:1825–1834.
Hough J, Williamson RJ, Wright SI. 2013. Patterns of selection in plant
genomes. Annu Rev Ecol Evol Syst. 44:31–49.
Howe GT, Aitken SN, Neale DB, Jermstad KD, Wheeler NC, Chen TH.
2003. From genotype to phenotype: unraveling the complexities of
cold adaptation in forest trees. Can J Bot. 81:1247–1266.
Ingvarsson PK. 2007. Gene expression and protein length influence codon usage and rates of sequence evolution in Populus tremula. Mol
Biol Evol. 24:836–844.
Ingvarsson PK. 2010. Natural selection on synonymous and nonsynonymous mutations shapes patterns of polymorphism in Populus
tremula. Mol Biol Evol. 27:650–660.
Jordan IK, Mariño-Ramírez L, Koonin EV. 2005. Evolutionary significance
of gene expression divergence. Gene 345:119–126.
Jukes TH, Cantor CR. 1969. Evolution of protein molecules. In: Munro HN,
editor. Mammalian protein metabolism. New York: Academic Press.
p. 21–123.
Keightley PD, Eyre-Walker A. 2007. Joint inference of the distribution of
fitness effects of deleterious mutations and population demography
based on nucleotide polymorphism frequencies. Genetics
177:2251–2261.
Kimura M. 1980. A simple method for estimating evolutionary rates of
base substitutions through comparative studies of nucleotide sequences. J Mol Evol. 16:111–120.
Kimura M. 1983. The neutral theory of molecular evolution. Cambridge:
Cambridge University Press.
De La Torre AR, Birol I, Bousquet J, Ingvarsson PK, Jansson S, Jones SJM,
Keeling CI, MacKay J, Nilsson O, Ritland K, et al. 2014. Insights into
conifer giga-genomes. Plant Physiol. 166:1724–1732.
De La Torre AR, Lin Y-C, Van de Peer Y, Ingvarsson PK. 2015. Genomewide analysis reveals diverged patterns of codon bias, gene expression, and rates of sequence evolution in Picea gene families. Genome
Biol Evol. 7:1002–1015.
De La Torre AR, Wang T, Jaquish B, Aitken SN. 2013. Adaptation and
exogenous selection in a Picea glauca × Picea engelmannii hybrid
zone: implications for forest management under climate change.
New Phytol. 687–699.
Larracuente AM, Sackton TB, Greenberg AJ, Wong A, Singh ND, Sturgill
D, Zhang Y, Oliver B, Clark AG. 2008. Evolution of protein-coding
genes in Drosophila. Trends Genet. 24:114–123.
Lemos B, Bettencourt BR, Meiklejohn CD, Hartl DL. 2005. Evolution of
proteins and gene expression levels are coupled in Drosophila and
are independently associated with mRNA abundance, protein
length, and number of protein-protein interactions. Mol Biol Evol.
22:1345–1354.
Li B, Dewey CN. 2011. RSEM: accurate transcript quantification from
RNA-Seq data with or without a reference genome. BMC
Bioinformatics 12:323.
Li H. 2011. A statistical framework for SNP calling, mutation discovery,
association mapping and population genetical parameter estimation
from sequencing data. Bioinformatics 27:2987–2993.
Li H. 2013. Aligning sequence reads, clone sequences and assembly
contigs with BWA-MEM. arXiv:1303.3997v2 [q-bio.GN].
Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G,
Abecasis G, Durbin R. 2009. The Sequence Alignment/Map format
and SAMtools. Bioinformatics 25:2078–2079.
Li L, Stoeckert CJ, Roos DS. 2003. OrthoMCL: identification of ortholog
groups for eukaryotic genomes. Genome Res. 13:2178–2189.
Li W, Godzik A. 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics
22:1658–1659.
Liao BY, Zhang J. 2006. Low rates of expression profile divergence in
highly expressed genes and tissue-specific genes during mammalian
evolution. Mol Biol Evol. 23:1119–1128.
Liu H, Yin J, Xiao M, Gao C, Mason AS, Zhao Z, Liu Y, Li J, Fu D. 2012.
Characterization and evolution of 5′ and 3′ untranslated regions in
eukaryotes. Gene 507:106–111.
Lockwood JD, Aleksic JM, Zou J, Wang J, Liu J, Renner SS. 2013. A new
phylogeny for the genus Picea from plastid, mitochondrial, and nuclear sequences. Mol Phylogenet Evol. 69:717–727.
Lohmueller KE, Albrechtsen A, Li Y, Kim SY, Korneliussen T,
Vinckenbosch N, Tian G, Huerta-Sanchez E, Feder AF, Grarup N,
et al. 2011. Natural selection affects multiple aspects of genetic variation at putatively neutral sites across the human genome. PLoS
Genet. 7:e1002326.
Löytynoja A, Goldman N. 2008. Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science
320:1632–1635.
Markova-Raina P, Petrov D. 2011. High sensitivity to aligner and high rate
of false positives in the estimates of positive selection in the 12
Drosophila genomes. Genome Res. 21:863–874.
Maynard Smith J, Haigh J. 1974. The hitch-hiking effect of a favourable
gene. Genet Res. 23:23–35.
Messer PW, Petrov DA. 2013. Frequent adaptation and the McDonald-Kreitman test. Proc Natl Acad Sci U S A 110:8615–8620.
Mignone F, Gissi C, Liuni S, Pesole G. 2002. Untranslated regions of
mRNAs. Genome Biol. 3:REVIEWS0004.
Moyers BT, Rieseberg LH. 2013. Divergence in gene expression is
uncoupled from divergence in coding sequence in a secondarily
woody sunflower. Int J Plant Sci. 174:1079–1089.
Narsai R, Howell KA, Millar AH, O’Toole N, Small I, Whelan J. 2007.
Genome-wide analysis of mRNA decay rates and their determinants
in Arabidopsis thaliana. Plant Cell 19:3418–3436.
Neale DB, Kremer A. 2011. Forest tree genomics: growing resources and
applications. Nat Rev Genet. 12:111–122.
Neale DB, Savolainen O. 2004. Association genetics of complex traits in
conifers. Trends Plant Sci. 9:1360–1385.
Nuzhdin SV, Wayne ML, Harmon KL, McIntyre LM. 2004. Common
pattern of evolution of gene expression level and protein sequence
in Drosophila. Mol Biol Evol. 21:1308–1317.
Nystedt B. 2013. The Norway spruce genome sequence and conifer
genome evolution. Nature 497:579–584.
Ohta T. 2002. Near-neutrality in evolution of genes and gene regulation.
Proc Natl Acad Sci U S A. 99:16134–16137.
Paape T, Bataillon T, Zhou P, Kono TJY, Briskine R, Young ND, Tiffin P.
2013. Selection, genome-wide fitness effects and evolutionary rates
in the model legume Medicago truncatula. Mol Ecol. 22:3525–3538.
Pal C, Papp B, Lercher MJ. 2006. An integrated view of protein evolution.
Nat Rev Genet. 7:337–348.
Parchman TL, Gompert Z, Mudge J, Schilkey FD, Benkman CW, Buerkle
CA. 2012. Genome-wide association genetics of an adaptive trait in
lodgepole pine. Mol Ecol. 21:2991–3005.
Pesole G, Mignone F, Gissi C, Grillo G, Licciulli F, Liuni S. 2001. Structural
and functional features of eukaryotic mRNA untranslated regions.
Gene 276:73–81.
Ramsay H, Rieseberg LH, Ritland K. 2009. The correlation of evolutionary
rate with pathway position in plant terpenoid biosynthesis. Mol Biol
Evol. 26:1045–1053.
Renaut S, Grassa C, Moyers B, Kane N, Rieseberg L. 2012. The population
genomics of sunflowers and genomic determinants of protein evolution revealed by RNAseq. Biology 1:575–596.
Rocha EPC. 2006. The quest for the universals of protein evolution.
Trends Genet. 22:412–416.
Savolainen O, Pyhäjärvi T, Knürr T. 2007. Gene flow and local adaptation
in trees. Annu Rev Ecol Evol Syst. 38:595–619.
Slotte T, Bataillon T, Hansen TT, St Onge K, Wright SI, Schierup MH.
2011. Genomic determinants of protein evolution and polymorphism in Arabidopsis. Genome Biol Evol. 3:1210–1219.
Snell-Rood EC, Dyken JDV, Cruickshank T, Wade MJ, Moczek AP. 2010.
Toward a population genetic framework of developmental evolution: the costs, limits, and consequences of phenotypic plasticity.
BioEssays 32:71–81.
Storey J. 2002. A direct approach to false discovery rates. J R Stat Soc.
64:479–498.
Subramanian S, Kumar S. 2004. Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome.
Genetics 168:373–381.
Sun SS, Choi S. 2015. Lengths of coding and noncoding regions of a gene
correlate with gene essentiality and rates of evolution. Genes
Genomics 37:365–374.
Tajima F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.
Thibaud-Nissen F, Ouyang S, Buell CR. 2009. Identification and characterization of pseudogenes in the rice gene complement. BMC
Genomics. 13:1–13.
Tirosh I, Barkai N. 2008. Evolution of gene sequence and gene expression
are not correlated in yeast. Trends Genet. 24:109–113.
Wang XQ, Tank DC, Sang T. 2000. Phylogeny and divergence times in
Pinaceae: evidence from three genomes. Mol Biol Evol. 17:773–781.
Warnefors M, Kaessmann H. 2013. Evolution of the correlation between
expression divergence and protein divergence in mammals. Genome
Biol Evol. 5:1324–1335.
Williamson RJ, Josephs EB, Platts AE, Hazzouri KM, Haudry A, Blanchette
M, Wright SI. 2014. Evidence for widespread positive and negative
selection in coding and conserved noncoding regions of Capsella
grandiflora. PLoS Genet. 10:e1004622.
Yang L, Gaut BS. 2011. Factors that contribute to variation in
evolutionary rate among Arabidopsis genes. Mol Biol Evol.
28:2359–2369.
Yang Z. 1997. PAML: a program package for phylogenetic analysis by
maximum likelihood. Comput Appl Biosci. 13:555–556.
Yang Z. 1998. Likelihood ratio tests for detecting positive selection and
application to primate lysozyme evolution. Mol Biol Evol.
15:568–573.
Yang Z. 2007. PAML 4: phylogenetic analysis by maximum likelihood.
Mol Biol Evol. 24:1586–1591.
Yang Z. 2014. User guide PAML: phylogenetic analysis by maximum
likelihood. Version 4.8a (August 2014).
Yang Z, Nielsen R, Goldman N, Pedersen AM. 2000. Codon-substitution
models for heterogeneous selection pressure at amino acid sites.
Genetics 155:431–449.
Yeaman S, Hodgins K, Suren H, Nurkowski K, Rieseberg L, Holliday J,
Aitken S. 2014. Conservation and divergence of gene expression
plasticity following 140 million years of evolution in lodgepole
pine (Pinus contorta) and interior spruce (Picea glauca, Picea engelmannii and their hybrids). New Phytol. 203:578–591.