Genomic Profiles of Diversification and Genotype–Phenotype
Association in Island Nematode Lineages
Angela McGaughran,*1,2,3 Christian Rödelsperger,3 Dominik G. Grimm,4,5,6 Jan M. Meyer,3
Eduardo Moreno,3 Katy Morgan,3 Mark Leaver,7 Vahan Serobyan,3 Barbara Rakitsch,4
Karsten M. Borgwardt,4,5,6 and Ralf J. Sommer3
1 CSIRO Land & Water, Black Mountain Laboratories, Canberra, ACT, Australia
2 School of BioSciences, University of Melbourne, Melbourne, VIC, Australia
3 Department for Evolutionary Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
4 Machine Learning and Computational Biology Research Group, Max Planck Institute for Intelligent Systems, Tübingen, Germany
5 Zentrum für Bioinformatik, Eberhard Karls Universität, Tübingen, Germany
6 Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
7 Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
*Corresponding author: E-mail: [email protected].
Associate editor: Beth Shapiro
Abstract
Understanding how new species form requires investigation of evolutionary forces that cause phenotypic and genotypic
changes among populations. However, the mechanisms underlying speciation vary and little is known about whether
genomes diversify in the same ways in parallel at the incipient scale. We address this using the nematode, Pristionchus
pacificus, which resides at an interesting point on the speciation continuum (distinct evolutionary lineages without
reproductive isolation), and inhabits heterogeneous environments subject to divergent environmental pressures. Using
whole genome re-sequencing of 264 strains, we estimate FST to identify outlier regions of extraordinary differentiation
(1.725 Mb of the 172.5 Mb genome). We find evidence for shared divergent genomic regions occurring at a higher
frequency than expected by chance among populations of the same evolutionary lineage. We use allele frequency spectra
to find that, among lineages, 53% of divergent regions are consistent with adaptive selection, whereas 24% and 23% of
such regions suggest background selection and restricted gene flow, respectively. In contrast, among populations from
the same lineage, similar proportions (34–48%) of divergent regions correspond to adaptive selection and restricted gene
flow, whereas 13–22% suggest background selection. Because speciation often involves phenotypic and genomic divergence, we also evaluate phenotypic variation, focusing on pH tolerance, which we find is diverging in a manner corresponding to environmental differences among populations. Taking a genome-wide association approach, we functionally
validate a significant genotype–phenotype association for this trait. Our results are consistent with P. pacificus undergoing heterogeneous genotypic and phenotypic diversification related to both evolutionary and environmental processes.
Key words: differentiation, diversification, evolution, FST, genome-wide association study, incipient speciation.
© The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please
e-mail: [email protected]
Mol. Biol. Evol. 33(9):2257–2272 doi:10.1093/molbev/msw093 Advance Access publication May 9, 2016
Introduction
Understanding how organisms diversify is a fundamental
topic in biology that broadly relates to deciphering the
changes in phenotype and genotype that occur among populations (Darwin 1859; Mayr 1963; Losos et al. 2013).
Speciation stems from processes which drive phenotypic diversification, population divergence, and ultimately, the evolution of reproductive isolation (Endler 1986; Schluter and
Conte 2009; Nosil 2012). However, the mechanisms underlying speciation can vary in time and space, as well as have
different phenotypic and genetic signatures (Nosil 2012). For
example, speciation can result from nondeterministic processes, such as genetic drift among isolated populations, or
can be a product of natural selection operating divergently
across environments (Schluter 2001; Via 2001; Rundle and
Nosil 2004). Somewhat less recognized processes, such as immigrant inviability, whereby immigrants show reduced success upon reaching foreign environments that are ecologically
divergent from their native habitat (sensu Nosil et al. 2005),
may also contribute to reproductive isolation.
Teasing apart the evolutionary processes that eventually
promote reproductive isolation among populations can be
challenging (e.g., Cruickshank and Hahn 2014). However,
studying incipient species, which constitute intermediate
stages of speciation, may provide insights into local adaptation and ecological speciation, especially in species with well
characterized biogeographic histories (Nosil and Feder 2012;
Seehausen et al. 2014). Such insights often stem from focusing
on patterns of divergence along the genome, and a pattern
typical of ecological speciation is a relatively low background
of genomic differentiation interspersed with regions of extraordinary differentiation (Hanikenne et al. 2013; Renaut
et al. 2013; Sadier et al. 2014; Soria-Carrasco et al. 2014).
However, so-called “islands of divergence” can result from
multiple processes, including ecological adaptation, “divergence hitch-hiking” around selected variants, reduction of
gene flow in “speciation islands”, and variation in mutation
and recombination rates (e.g., Burri et al. 2015; Feulner et al.
2015). Researchers are beginning to tease apart these various
processes through comparative studies of genome-wide patterns of differentiation between populations at varying stages
along the speciation continuum (i.e., from partially isolated
races to fully isolated taxa; Feder et al. 2012). Such studies
have demonstrated the power of analyzing various genomic
parameters – e.g., diversity (π), Tajima’s D (TD), relative and
absolute divergence (FST and Dxy), and linkage disequilibrium
(LD) – including their partitioning across the genome and the
relationships between them, to better elucidate the processes
underlying genomic diversification and ecological speciation.
Pristionchus pacificus is an androdioecious (i.e., hermaphrodite and male-producing) nematode with a cosmopolitan
distribution encompassing Africa, Asia, Europe, America, and
the Mascarene Islands of the Indian Ocean (Herrmann et al.
2007, 2010). The species was originally used as a system for
comparative studies with Caenorhabditis elegans focused on
developmental biology, ecology, and population genetics (for
review see Hong and Sommer 2006; Sommer and
McGaughran 2013; Sommer 2015). By now, P. pacificus has
a well-understood biogeographic history across the Indian
Ocean islands (Morgan et al. 2012, 2014; McGaughran et al.
2013a, 2014), where it frequently lives in an inactive state on
scarab beetles, feeding on the microorganisms that grow on the decomposing carcass after
the beetles’ death (Herrmann et al. 2007).
Mitochondrial studies have revealed four lineages (A, B, C
and D), all of which were found on La Réunion (Herrmann
et al. 2010), and three of which (A, C, and D) were found on
the neighboring Mauritius Island (Morgan et al. 2014). In
contrast, both microsatellite sequencing of Réunion and
Mauritius strains (Morgan et al. 2012), and whole genome
re-sequencing of 104 globally sampled strains, indicated the
presence of a more complex sub-structure within lineages.
This included the division of the mitochondrial A lineage into
A1, A2 and A3 sub-lineages (Rödelsperger et al. 2014). Of all
the genomic lineages, A2, B, C, and D are present across an
array of heterogeneous environments on La Réunion Island,
and A2, C, and D are also present on nearby Mauritius Island
(Morgan et al. 2012; McGaughran et al. 2013a, 2014). As such,
P. pacificus represents an excellent species for examining the
role of both evolutionary and environmental gradients in
contributing to the early stages of speciation processes.
Previous work has demonstrated that the distinct lineages,
even those in close geographic proximity, have different genomic profiles. Most (> 90%) within-lineage diversity is due
to private (local) variation rather than to diversity shared in
the common ancestral pool (Rödelsperger et al. 2014). This
fits well with a model for Réunion and Mauritius, whereby
lineages have independently colonized the islands from multiple source populations and thereafter continued to diverge
in relative isolation (McGaughran et al. 2013a). Appreciable
levels of genetic variation in P. pacificus are accompanied by
high levels of phenotypic diversification among La Réunion
Island populations (e.g., natural variation in dauer
formation, Mayer and Sommer 2011; cold tolerance,
McGaughran and Sommer 2014; chemosensation,
McGaughran et al. 2013b; and oxygen-induced social behavior, Moreno et al. 2016). The high degree of both genetic and
phenotypic diversity among populations and lineages of P.
pacificus, coupled with the fact that the different lineages can
be crossed in the laboratory to produce fertile offspring
(Sommer Lab, unpublished data), suggests that P. pacificus
can be considered to be at an early stage in speciation, corresponding to continuous variation without complete reproductive isolation (sensu Hendry et al. 2009). In conjunction
with this, the island distribution of P. pacificus corresponds to
a range of ecological gradients and ecosystem heterogeneities
among habitats (Strasberg et al. 2005). As a result, there lies an
opportunity in this species to try to understand the ecological
relevance of genomic differentiation across a range of evolutionary divergence.
In this study, we characterize the genomic profile of diversification among P. pacificus lineages from the same island,
among lineages and populations from neighboring islands (La
Réunion and Mauritius), and among populations of the same
lineage within islands (La Réunion). We use a variety of population genomic parameter estimates (p, TD, FST, Dxy, and LD)
to examine whether diversification is occurring at the same
genomic sites within and across lineages and populations, and
to explore the processes (e.g., divergent selection, linkage/recombination, gene flow) that best explain patterns of genomic divergence. In conjunction, we analyze phenotypic
variation among populations from La Réunion Island to
test whether an ecologically relevant phenotype, pH tolerance, is also diverging in environmental comparisons in a
manner that may promote immigrant inviability. pH tolerance has not been analyzed in nematodes before, but pH
varies considerably between different soil types on La
Réunion, and this variation has the potential to exert strong
selective pressure on nematodes, which spend a portion of
their life cycle as soil-dwelling. Finally, we link genotype with
phenotype in a genome-wide association study (GWAS)
framework, taking an identified genetic candidate through
to functional validation for pH tolerance. Ultimately, by combining functional and biological levels of analysis in a system
with older lineage pairs and younger population pairs that can still
interbreed under laboratory conditions, we attempt to understand how genomes begin to diversify on their path to
becoming separate species.
Results
Whole Genome Re-Sequencing
From our total P. pacificus collection, we performed whole
genome re-sequencing on 264 strains, each created from a
single field-collected hermaphrodite (see Materials and
Methods) from La Réunion Island and nearby Mauritius
Island. Our strain selection covered the complete known
mitochondrial and microsatellite diversity of P. pacificus on La
Réunion (Morgan et al. 2012; McGaughran et al. 2014), as well
as a variety of ecological and environmental habitat conditions (fig. 1A and table 1). Pooled individuals of each isogenic
line were sequenced to a mean read depth of 17.4 (range:
5–58; SD = 8.3). After quality filtering (see Materials and
Methods), 6.8 million SNP positions (from a total genome
size of 172.5 Mb and six chromosomes; Rödelsperger et al.
2014) were retained for population genomic analysis.
FIG. 1. Broad-scale genomic structure in Pristionchus pacificus. (A)
Map showing the location of Mauritius and La Réunion Islands in
the Indian Ocean; approximate position of La Réunion Island strain
collection locations are indicated in black. Original Map data 2014
Google; (B) neighbor-joining tree (based on p-distances) showing the
genetic relationships among lineages (colored according to: A2 – dark
purple; B – light purple; C – light green; D – dark green), as well as fine-scale population structure among geographic locations for lineages B,
C, and D; numbers on figure represent pairwise genome-wide mean
divergence (FST) within or between lineages.
Population Genetics
Structure and Diversity of Lineages and Populations
Strain selection resulted in a collective 30 populations (27
from Réunion, 3 from Mauritius; supplementary tables S1
and S2, Supplementary Material online), corresponding to
groups of strains which were collected at the same geographic
location and are of the same genomic ancestry. We used all
264 strains to estimate baseline population genetic parameters for the most diverse data set possible. For all other
genome-wide analyses, we accounted for sample size
differences by randomly reducing each data set to create
equal sample sizes among lineages (n = 17) and populations
(n = 9 or 8) (see Materials and Methods and table 1). These
reduced data sets included 4 lineages (A2, B, C and D); 11
Réunion populations (3, 6, and 2, from lineages B, C, and D,
respectively), and 2 Mauritius populations (1 from each of
lineages C and D) (fig. 1A and table 1).
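The down-sampling step described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `subsample_equal`, the fixed random seed, the placeholder strain IDs, and the choice to drop groups smaller than the target size are all assumptions made for illustration. Group sizes follow the La Réunion lineage counts in table 1.

```python
import random

def subsample_equal(groups, n, seed=0):
    """Randomly reduce each group of strain IDs to exactly n members.

    groups: dict mapping a lineage/population label to a list of strain IDs.
    Groups with fewer than n strains are dropped (an assumption of this sketch).
    """
    rng = random.Random(seed)  # fixed seed so the reduced data set is reproducible
    reduced = {}
    for label, strains in groups.items():
        if len(strains) >= n:
            reduced[label] = sorted(rng.sample(strains, n))
    return reduced

# Placeholder strain IDs (hypothetical); only the group sizes come from table 1.
lineages = {
    "A2": [f"A2_{i}" for i in range(18)],
    "B": [f"B_{i}" for i in range(31)],
    "C": [f"C_{i}" for i in range(125)],
    "D": [f"D_{i}" for i in range(68)],
}
reduced = subsample_equal(lineages, n=17)
```

Each lineage then contributes 17 strains, giving the 68 strains used for sub-sampled lineage comparisons.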
The distribution of lineages across Réunion and Mauritius
Island is presented in figure 1B, where several geographic locations can be seen to share genomes that differ in ancestry.
For example, the location GE harbors strains of lineage A2, C,
and D ancestry, whereas the locations SS1, PL, TK, and BV, all
have strains corresponding to both lineage C and D (fig. 1B
and table 1). This confirms the presence of co-habiting lineages on the island, and indeed, our sampling records show
rare instances in which different lineages can even co-occur
on the same collected beetle. However, the retained lineage
structure at these locations suggests that isolation mechanisms may prevent hybridization between lineages within
geographic locations. Using phylogenetic analysis, we identified fine-scale geographic structure among strains within lineage B, C, and D, providing support for allopatric divergence
at the within-lineage level for several populations (fig. 1B; see
also Morgan et al. 2012; McGaughran et al. 2013a, 2014).
Mean genome-wide diversity (π) in 10-kb windows did not
differ greatly among lineages and populations or between full
and sub-sampled data sets (total range: 0.0011–0.0024; table
1). Mean Tajima’s D, also measured in 10-kb windows, was
consistently negative only for lineage A2 (−0.4) at a
genome-wide scale, where negative values are taken to indicate an abundance of rare alleles that could result from selection and/or population demographic expansion (table 1).
Mean LD, measured as R² in 10-kb windows, was highest
(generally > 0.6) for Mauritius lineages and Réunion populations, and lowest for Réunion lineages (< 0.45). Although this
likely reflects sample size differences across the various groups
(table 1), the correlation between LD for full and down-sized
data sets for two populations was calculated as 0.786 and
0.782 (P < 0.001 for both cases). LD decay has been examined
previously in P. pacificus, and shown to occur over 20- to 300-kb distances, depending on the geographic sampling of lineages (Rödelsperger et al. 2014; Morgan et al., unpublished
data). In other selfing species, such as Arabidopsis thaliana,
LD is also high and decays over long genomic distances
(Nordborg et al. 2002). In P. pacificus, such patterns are likely
to result predominantly from its androdioecious life cycle,
although other factors, such as spatial population structure
and selective sweeps, may also be important drivers of these
patterns (Schmid et al. 2006; Barrière and Félix 2007;
Andersen et al. 2012).
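To make the Tajima's D statistic used above concrete, the following Python sketch computes it for a single window from a small 0/1 haplotype matrix using the standard Tajima (1989) constants. It is an illustration under stated assumptions, not the software used in this study: the function name `tajimas_d`, the string-based input format, and the toy haplotypes are hypothetical.

```python
import math

def tajimas_d(haplotypes):
    """Tajima's D for one window.

    haplotypes: list of equal-length strings of '0'/'1', one per sampled genome.
    Returns None when the window has no segregating sites.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 sequences (the variance term is zero below that)")
    seg_sites = 0
    pi = 0.0  # mean number of pairwise differences across the window
    for site in range(len(haplotypes[0])):
        k = sum(h[site] == "1" for h in haplotypes)  # count of the '1' allele
        if 0 < k < n:
            seg_sites += 1
            pi += 2.0 * k * (n - k) / (n * (n - 1))
    if seg_sites == 0:
        return None
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = seg_sites
    # Difference between the two diversity estimators, scaled by its standard deviation
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Toy windows: an excess of singletons versus intermediate-frequency alleles.
rare = ["10000", "01000", "00100", "00010", "00001"] + ["00000"] * 5
common = ["11111"] * 5 + ["00000"] * 5
d_rare, d_common = tajimas_d(rare), tajimas_d(common)
```

In the toy data, the window dominated by rare alleles gives a negative D and the window with intermediate-frequency alleles gives a positive D, matching the interpretation given in the text.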
Genomic Profiles of Differentiation among Sub-Sampled Lineages and Populations
Genomic Differentiation over a Continuum of Evolutionary
Divergence
For all of the lineages and populations, pairwise genome-wide
mean divergence (FST) ranged from an average of 0.007–0.068.
This highlights an increasing degree of differentiation as we
move from populations located on the same island (e.g., La
Réunion lineage B: 0.008, La Réunion lineage C: 0.007) to
populations across La Réunion and Mauritius Islands (e.g.,
lineage C Mauritius vs. La Réunion: 0.019; lineage D
Mauritius vs. La Réunion: 0.017), to lineages (mean: 0.068)
(fig. 1A and supplementary table S3, Supplementary
Material online).
Table 1. Mean Population Genetic Estimators for Lineages and Populations.
Lineage^a | Population | n^b | π^c | Tajima’s D^c | LD^c
Scale of comparison: Lineages; Island: La Réunion and Mauritius (n = 264)
A2 | – | 22 | 0.0019 (0.0045) | 0.4988 (1.6907) | 0.2364 (0.3693)
B | – | 31 | 0.0013 (0.0028) | 0.2115 (1.7990) | 0.2488 (0.3771)
C | – | 135 | 0.0013 (0.0027) | 0.2624 (1.8929) | 0.1745 (0.3049)
D | – | 76 | 0.0012 (0.0026) | 0.2223 (1.9022) | 0.2643 (0.3784)
Scale of comparison: Lineages; Island: La Réunion (n = 242)
A2 | – | 18 | 0.0019 (0.0046) | 0.4645 (1.6780) | 0.2935 (0.4032)
B | – | 31 | 0.0013 (0.0028) | 0.2115 (1.7990) | 0.2488 (0.3771)
C | – | 125 | 0.0013 (0.0027) | 0.2592 (1.8765) | 0.1761 (0.3055)
D | – | 68 | 0.0012 (0.0027) | 0.2069 (1.8863) | 0.2682 (0.3820)
Scale of comparison: Lineages; Island: Mauritius (n = 22)
A2 | – | 4 | 0.0024 (0.0057) | 0.1205 (1.1820) | 0.5815 (0.4127)
C | – | 10 | 0.0016 (0.0040) | 0.7667 (1.3115) | 0.6107 (0.4258)
D | – | 8 | 0.0012 (0.0030) | 0.6965 (1.2020) | 0.6627 (0.4057)
Scale of comparison: Sub-sampled lineages; Island: La Réunion (n = 68)
A2 | – | 17 | 0.0019 (0.0046) | 0.4553 (1.6658) | 0.3040 (0.4074)
B | – | 17 | 0.0013 (0.0029) | 0.1114 (1.6664) | 0.3750 (0.4234)
C | – | 17 | 0.0016 (0.0028) | 0.2373 (1.5595) | 0.4239 (0.4249)
D | – | 17 | 0.0012 (0.0024) | 0.2983 (1.5777) | 0.4485 (0.4316)
Scale of comparison: Populations (sub-sampled); Island: La Réunion (n = 108)
A2 | SB | 9 | 0.0020 (0.0043) | 0.3757 (1.4833) | 0.4069 (0.4315)
B | CC | 9 | 0.0014 (0.0034) | 0.2532 (1.4330) | 0.4917 (0.4359)
B | CK | 9 | 0.0014 (0.0034) | 0.5430 (1.3428) | 0.5623 (0.4298)
B | NB | 9 | 0.0014 (0.0028) | 0.0273 (1.4710) | 0.3917 (0.4244)
C | CO | 9 | 0.0014 (0.0034) | 0.8454 (1.2818) | 0.6401 (0.4101)
C | GE | 9 | 0.0015 (0.0030) | 0.4934 (1.4348) | 0.6204 (0.4271)
C | PA1 | 9 | 0.0016 (0.0039) | 0.8562 (1.2559) | 0.4860 (0.4268)
C | PC | 9 | 0.0015 (0.0029) | 0.3430 (1.3881) | 0.6828 (0.4152)
C | SS1 | 9 | 0.0014 (0.0033) | 0.8578 (1.2771) | 0.6381 (0.4149)
C | TB | 9 | 0.0013 (0.0030) | 0.4845 (1.4143) | 0.5789 (0.4374)
D | GE | 9 | 0.0011 (0.0024) | 0.4422 (1.3611) | 0.6308 (0.4133)
D | GE^d | 8 | 0.0016 (0.0034) | 0.8499 (1.2734) | 0.6113 (0.4198)
D | PL | 9 | 0.0011 (0.0024) | 0.4597 (1.3138) | 0.6672 (0.4043)
D | PL^d | 8 | 0.0016 (0.0034) | 0.8378 (1.2347) | 0.6366 (0.4133)
Scale of comparison: Populations (sub-sampled); Island: Mauritius (n = 17)
C | – | 9 | 0.0017 (0.0041) | 0.7627 (1.2760) | 0.6372 (0.4197)
D | – | 8 | 0.0012 (0.0030) | 0.6965 (1.2020) | 0.6627 (0.4057)
NOTE.
^a For each location, the total number of samples representing each genetic lineage is shown.
^b Sample size.
^c Calculated in 10-kb windows, with SD given in parentheses. Results were not quantitatively different for window sizes of 1 and 100 kb (see supplementary table S2, Supplementary Material online).
^d Sample size was reduced for these two populations in order to match sample sizes for lineage D from Mauritius in analyses using pairwise comparisons.
Spatial heterogeneity along the genome was analyzed between lineages and populations using a genome scan approach, averaging FST in 1-, 10-, and 100-kb nonoverlapping
windows (fig. 2 and supplementary fig. S1 and table S4,
Supplementary Material online), with 10-kb windows used
in all subsequent analyses (see Materials and Methods). The
shape of the distribution of FST values across the genome can
be seen in supplementary figure S2 (Supplementary Material
online). The marked right tail of these distributions aided the
identification of outlier windows (up to 172.5 10-kb regions in
the top 1% of the FST distribution, making up 1.725 Mb of
the genome), which are significantly higher than the genome-wide average (outlier windows were detected as the top 1% of
the empirical distribution in addition to being significantly
differentiated compared with a random permutation approach using a false discovery rate of 0.01) (table 2).
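The window-based genome scan described above can be sketched in Python as follows. Several things here are assumptions rather than details from the text: the per-SNP estimator (a simple Nei-style FST from two allele frequencies with equal weighting), the function names, and the input format. The permutation and false-discovery-rate step is not implemented. As a consistency check on the figures in the text, 172.5 Mb divided into 10-kb windows gives 17,250 windows, so the top 1% corresponds to about 172.5 windows, or 1.725 Mb.

```python
from collections import defaultdict

def snp_fst(p1, p2):
    """Nei-style FST for one biallelic SNP from two population allele frequencies."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled populations
    if h_t == 0:
        return None  # site is monomorphic across both populations
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    return (h_t - h_s) / h_t

def window_means(snps, window=10_000):
    """Average per-SNP FST in nonoverlapping windows.

    snps: iterable of (chromosome, 1-based position, p1, p2).
    Returns {(chromosome, window_index): mean FST}.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for chrom, pos, p1, p2 in snps:
        fst = snp_fst(p1, p2)
        if fst is None:
            continue
        key = (chrom, (pos - 1) // window)
        sums[key][0] += fst
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

def top_fraction(window_fst, fraction=0.01):
    """Keys of the windows in the top `fraction` of the empirical FST distribution."""
    ranked = sorted(window_fst, key=window_fst.get, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return set(ranked[:k])

# Toy input (hypothetical allele frequencies on chromosome "I").
toy = [("I", 5, 1.0, 0.0), ("I", 9000, 0.5, 0.5), ("I", 10001, 1.0, 0.0)]
toy_windows = window_means(toy)
toy_outliers = top_fraction(toy_windows)
```

The same function can be called with `window=1_000` or `window=100_000` to mirror the 1- and 100-kb scans.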
Spearman’s correlation analysis of FST along the genome
was performed between lineage and population pairs. FST was
weakly, but significantly, correlated across lineages overall
(mean rho = 0.348 ± 0.122 SD; P < 0.001) and across lineages
on different islands (mean rho = 0.432 ± 0.038 SD; P < 0.001,
for Mauritius vs. Réunion lineage C). These correlations were
higher than those found for within-island populations (mean
rho = 0.316 ± 0.020 SD; P < 0.001, and mean
rho = 0.242 ± 0.089 SD; P < 0.001, for Réunion lineage B
and C populations, respectively) (supplementary table S5,
Supplementary Material online). Because genome-wide FST
values are generally low, a significant portion of this
correlation is likely due to noise; however, the overall mean
rho value is quite high at > 0.3. These results may therefore
indicate that similar population genetic processes (e.g., recombination landscapes and background selection) are
driving genome-wide divergence in different lineages and
populations (see Rödelsperger et al. 2014). However, more
generally (i.e., in 60–70% of cases), windows of low or high FST in one population/lineage pair are not the same windows of low or high FST in
other pairs across a large proportion of the
genome.
Table 2. Differentiated Genomic Regions across Lineages and Populations.
FST Comparison | No.^a | Total No. 10-kb Windows | Total No. 10-kb Outlier Windows | No. Common Outlier Windows^b | Shared Windows (%) | No. Shared by All Groups | Significance^c
Réunion, Lineages
A2 vs. B,C,D | 3 | 19,238 | 181 | 6 | 3.31 | | S
B vs. A2,C,D | 3 | 19,741 | 185 | 1 | 0.54 | | NS
C vs. A2,B,D | 3 | 20,620 | 194 | 0 | 0.00 | | NS
D vs. A2,B,C | 3 | 20,659 | 198 | 1 | 0.51 | | NS
Total^d | 6 | 40,129 | 398 | 10 | 2.51 | 0 | NS
Réunion, Populations, Lineage B
CC vs. CK,NB | 2 | 14,661 | 145 | 11 | 7.59 | | S
CK vs. CC,NB | 2 | 14,541 | 143 | 11 | 7.69 | | S
NB vs. CC,CK | 2 | 14,698 | 144 | 11 | 7.64 | | S
Total^d | 3 | 21,950 | 218 | 11 | 5.05 | 11 | S
Réunion, Populations, Lineage C
CO vs. GE,PA1,PC,SS1,TB | 10 | 37,083 | 363 | 18 | 4.96 | | NS
GE vs. CO,PA1,PC,SS1,TB | 10 | 37,650 | 369 | 18 | 4.88 | | NS
PA1 vs. CO,GE,PC,SS1,TB | 10 | 37,045 | 357 | 12 | 3.36 | | NS
PC vs. CO,GE,PA1,SS1,TB | 10 | 37,125 | 362 | 13 | 3.59 | | NS
SS1 vs. CO,GE,PA1,PC,TB | 10 | 37,595 | 362 | 14 | 3.87 | | NS
TB vs. CO,GE,PA1,PC,SS1 | 10 | 36,304 | 354 | 14 | 3.95 | | NS
Total^d | 15 | 111,401 | 1,105 | 115 | 10.41 | 0 | S
Réunion, Populations, Lineage D^e
GE vs. PL | 1 | 7,634 | 75 | | | | –
Mauritius^e, Lineage C
MAU vs. CO,GE,PA1,PC,SS1,TB | 15 | 42,395 | 406 | 23 | 5.67 | 0 | NS
Mauritius^e, Lineage D
MAU vs. GE,PL | 2 | 14,315 | 136 | 14 | 10.29 | 14 | –
NOTE.
^a Number of pairwise comparisons.
^b In at least two comparisons.
^c As determined with 10,000 random permutations (see Materials and Methods).
^d All values correspond to the total number of pairwise comparisons unique to the whole group.
^e Significance for Réunion Lineage D (GE vs. PL) and Mauritius Lineage D (MAU vs. GE, PL) is not considered as the minimum number of comparisons is too low.
To determine whether this was also true for outlier FST
windows, we investigated the degree of overlap among outlier
FST windows for lineage and population comparisons, using a
permutation approach (table 2). Comparisons across lineages
and across islands were not significant, indicating that divergent genomic regions are generally not shared across lineages
at a rate more than expected by chance. However, for population comparisons on La Réunion Island, we detected a
total of 218 10-kb outlier windows for lineage B populations,
of which 11 were shared across all 3 population pairs. This
proportion is more than expected by chance (10,000 permutations of random sampling gave on average five overlaps;
one-tailed P = 0.025). The same was true for La Réunion lineage C populations, where we detected a total of 1,105 10-kb
outlier windows across all 15 possible pairwise comparisons,
115 of which were shared in at least 2 of the 15 possible
population pairs (84 expected overlaps in 10,000 permutations of random sampling, one-tailed P = 0.025). Thus, we
found a pattern in which genomic regions of extraordinary
differentiation were more often shared among recently diverged (allopatric) populations than among historically diverged lineages (table 2).
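The logic of the overlap permutation test can be sketched as follows. This is one plausible reading of the procedure, offered as an assumption: for each pairwise comparison, the same number of windows is drawn at random from all windows, the number of windows shared by at least `min_shared` comparisons is counted, and the one-tailed P value is the fraction of permutations with at least as many shared windows as observed. The function names, the add-one P value correction, and the toy data are hypothetical.

```python
import random
from collections import Counter

def shared_count(outlier_sets, min_shared):
    """Number of windows that are outliers in at least `min_shared` comparisons."""
    counts = Counter(w for s in outlier_sets for w in s)
    return sum(1 for c in counts.values() if c >= min_shared)

def overlap_permutation_test(outlier_sets, n_windows, min_shared,
                             n_perm=10_000, seed=1):
    """Return (observed overlaps, mean overlaps under random sampling, one-tailed P)."""
    rng = random.Random(seed)
    observed = shared_count(outlier_sets, min_shared)
    sizes = [len(s) for s in outlier_sets]
    universe = range(n_windows)
    exceed = 0
    null_total = 0
    for _ in range(n_perm):
        # Draw random "outlier" sets of the same sizes as the observed ones
        fake = [set(rng.sample(universe, k)) for k in sizes]
        n_shared = shared_count(fake, min_shared)
        null_total += n_shared
        exceed += n_shared >= observed
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, null_total / n_perm, p_value

# Toy example: three comparisons that share five outlier windows out of 1,000.
core = {0, 1, 2, 3, 4}
toy_sets = [core | set(range(100, 115)),
            core | set(range(200, 215)),
            core | set(range(300, 315))]
obs, null_mean, p = overlap_permutation_test(toy_sets, 1000, min_shared=3, n_perm=2000)
```

Setting `min_shared` to the number of comparisons corresponds to the "shared across all population pairs" case for lineage B, and `min_shared=2` to the "at least 2 of the 15" case for lineage C.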
Relative versus Absolute Genomic Differentiation
To complement FST analysis, measurements of absolute divergence, such as Dxy, are now recommended (e.g., Nachman
and Payseur 2012; Cruickshank and Hahn 2014). Thus, to
examine the degree of absolute genomic differentiation, we
compared Dxy in outlier versus nonoutlier windows for each
lineage and population. We found that median Dxy was consistently lower in outlier windows compared with nonoutlier windows (after Bonferroni correction, the Wilcoxon test P > 0.003 in 12/17 tests; supplementary fig. S3,
test P > 0.003 in 12/17 tests; supplementary fig. S3,
Supplementary Material online). Because absolute divergence
measurements are considered to be unreliable for young populations and/or populations that are not at equilibrium due
to ongoing differentiation processes (Nachman and Payseur
2012; Cruickshank and Hahn 2014), we further examined
genomic regions of extraordinary differentiation by assessing
selective sweep signatures using TD (see next section).
FIG. 2. FST profiles and highly differentiated SNPs. Sliding window pairwise FST plotted for comparisons involving lineage B, from top to bottom: CC
versus CK; CC versus NB; CK versus NB. In each panel, the x-axis corresponds to the chromosomal location, whereas the y-axis represents FST, and
the top 1% of divergent regions is indicated in green. See supplementary figure S1 (Supplementary Material online) for additional lineage/
population FST comparisons.
FIG. 3. Divergence patterns consistent with selection or drift. Results of analysis exploring whether Tajima’s D (TD) patterns in outlier windows
relative to the genome-wide TD baseline are facilitated by selection or drift for all lineages (“Lineages”), for lineages across islands (“Mauritius_C” for
lineage C La Réunion vs. Mauritius, “Mauritius_D” for lineage D La Réunion vs. Mauritius), and for populations within lineages on La Réunion Island
(“Réunion_B”, “Réunion_C” and “Réunion_D” for Réunion lineage B, C, and D populations, respectively).
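For reference, absolute divergence in a window can be sketched from allele frequencies as below, using a standard per-site form for biallelic SNPs, p1(1 − p2) + p2(1 − p1), averaged over all sites in the window. The function name and input format are assumptions of this sketch, not the authors' implementation. The final line shows that the 0.003 threshold quoted above is consistent with a Bonferroni correction over 17 tests, assuming a nominal α of 0.05 (the nominal level is not stated in this section).

```python
def dxy_window(freq_pairs, n_sites):
    """Mean absolute divergence (Dxy) per site for one window.

    freq_pairs: (p1, p2) allele frequencies at the variable sites of the window.
    n_sites: total number of callable sites in the window, including invariant
    sites, which contribute zero divergence.
    """
    total = sum(p1 * (1 - p2) + p2 * (1 - p1) for p1, p2 in freq_pairs)
    return total / n_sites

# A fixed difference contributes 1; a shared polymorphism at 0.5 contributes 0.5.
example_dxy = dxy_window([(1.0, 0.0), (0.5, 0.5)], n_sites=10_000)

# Bonferroni-corrected threshold for 17 tests at an assumed nominal alpha of 0.05.
bonferroni_alpha = 0.05 / 17
```

Unlike FST, this quantity does not increase merely because within-population diversity is low, which is why the two measures are compared in outlier windows.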
Selection and LD in Outlier Windows
In order to quantify the relative contribution of different
mechanisms (e.g., recombination, background selection,
adaptive selection, restricted gene flow) shaping the genomics
of speciation, we first investigated fine-scale linkage patterns
and their effects on genomic heterogeneity. For each lineage
and population, we estimated LD (R²) in 10-kb windows
along the genome and then tested whether the median
R² differed in outlier windows compared with nonoutlier
windows. A pattern of positive correlation between FST and
R² is potentially indicative of both a local reduction in gene
flow mediated by divergent selection, and selection with
hitch-hiking of linked neutral sites (Keinan and Reich 2010;
Nachman and Payseur 2012; Feulner et al. 2015). We found a
slight, but consistent, increase of median R² in outlier versus
nonoutlier windows, potentially indicating a local reduction
in gene flow mediated by either divergent selection or selection with hitch-hiking as outlined (supplementary fig. S4,
Supplementary Material online). However, individual comparisons were, for the most part, not significantly different
(after Bonferroni correction, the Wilcoxon test P > 0.003 in
14/17 tests).
Next, we explored whether observed divergence patterns
are facilitated by selection or drift, based on analysis of TD in
outlier windows relative to the genome-wide TD average. TD is
a statistic that compares the average number of pairwise
differences in a sample to the number of segregating sites.
We expect positive selection to give a negative TD in the
absence of demographic effects, and a positive TD is expected
in the case of balancing selection. We use TD here to classify
the evolutionary process resulting in a given pattern of differentiation based on the premise that regions that are differentiated as a result of a local restriction of gene flow should
show a local signature of neutral evolution (i.e., no skew in the
allele frequency spectrum; Nachman and Payseur 2012).
Meanwhile, divergent regions resulting from selection with
hitch-hiking at linked sites should show a characteristic skew
of the spectrum, resulting in a negative TD (indicative of an
excess of rare alleles in a population following a selective
sweep, or of background selection in the case where both
populations in a pair show such a pattern; Smith and Haigh
1974; Charlesworth et al. 1993). Thus, following Feulner et al.
(2015), we used shifts in the allele frequency spectrum, calculated as TD across the genome, to partition outlier windows
into three mutually exclusive categories among lineage and
population comparisons, based on contrasting outlier TD values to the genome-wide mean TD: (1) TD reduced relative to the
genome-wide mean in both populations of a pair, consistent with
background selection; (2) TD reduced relative to the genome-wide mean in one of the two populations of a pair, consistent
with adaptive/positive selection; and (3) neutral TD patterns,
where TD in both populations of a pair was not significantly different from the genome-wide mean, consistent with restricted gene flow from neutral (e.g., genetic drift)
processes. Note that outlier windows make up a small percentage of the total genome (e.g., 190 10-kb windows for
lineages and 75–360 10-kb windows for populations; table 2),
thus these TD analyses refer to only very small genomic regions.
FIG. 4. Significant genotype–phenotype associations. Manhattan plot showing 175 significant SNPs (in purple) detected in easyGWAS for pH
tolerance. The SNP for the Ppa-nhx contig20-snap.250 gene is highlighted with an arrow, whereas the four genomic regions the 175 SNPs lie in are
outlined in black boxes. Mean pairwise LD (R²) between all significant SNPs in these regions, from left to right in the figure, is: 0.79, 0.57, 0.55, and
0.01.
We found that 53% of windows for lineage comparisons
corresponded to adaptive selection, 24% to background selection, and 23% to restricted gene flow. For population comparisons across islands, we found that 33–35%, 7–8%, and
57–60% of outlier windows corresponded to adaptive selection, background selection, and restricted gene flow, respectively. For Réunion populations, these values corresponded to
34–48%, 13–22%, and 40–50% (fig. 3 and supplementary table S6,
Supplementary Material online). Thus, lineages appear
to be most highly subject to adaptive selection, populations
across islands were most subject to restricted gene flow, and
populations within Réunion were similarly subject to adaptive
selection and restricted gene flow.
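The three-way partitioning of outlier windows by Tajima's D can be sketched as a simple decision rule. The criterion for "reduced relative to the genome-wide mean" is not specified in this section, so the `threshold` parameter below is a hypothetical stand-in for whatever significance test was applied; the function names and the toy values are likewise illustrative only.

```python
from collections import Counter

def classify_window(td_a, td_b, mean_a, mean_b, threshold):
    """Assign an outlier window to one of three mutually exclusive categories.

    td_a, td_b: Tajima's D in the window for the two groups being compared.
    mean_a, mean_b: genome-wide mean Tajima's D for each group.
    threshold: how far below the genome-wide mean counts as "reduced"
    (hypothetical; stands in for the significance criterion).
    """
    reduced_a = td_a < mean_a - threshold
    reduced_b = td_b < mean_b - threshold
    if reduced_a and reduced_b:
        return "background selection"
    if reduced_a or reduced_b:
        return "adaptive selection"
    return "restricted gene flow"

def category_proportions(windows, mean_a, mean_b, threshold):
    """Proportion of outlier windows in each category; windows is a list of (td_a, td_b)."""
    counts = Counter(classify_window(a, b, mean_a, mean_b, threshold) for a, b in windows)
    return {category: n / len(windows) for category, n in counts.items()}

# Toy outlier windows (hypothetical Tajima's D values for the two groups).
toy_td = [(-2.0, -1.9), (-2.1, 0.1), (0.0, 0.2), (0.3, -1.8)]
props = category_proportions(toy_td, mean_a=0.0, mean_b=0.0, threshold=1.0)
```

Applied to each comparison in turn, proportions of this kind are what the percentages above summarize.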
Genes in Outlier Windows
Names for all genes (based on homology searches using
BLAST against C. elegans) found within common outlier windows for lineage and population comparisons can be found in
supplementary table S7 (Supplementary Material online). In
lineage comparisons, genes in overlapping outlier windows
were overrepresented with functions involved in positive
regulation of vulval development (GO:0040026; P = 0.001),
muscle myosin thick filament assembly (GO:0030241; P = 0.008),
apical protein localization (GO:0045176; P = 0.009), and
mRNA processing (GO:0006397; P = 0.009), whereas those
common to populations were overrepresented with functions involved in mRNA stabilization (GO:0048255; P = 0.001
and P = 0.003 for lineage B and C, respectively), sensory
perception of taste (GO:0050909; P = 0.001; lineage B populations), regulation of cell proliferation (GO:0042127; P = 0.002,
lineage C populations), and detection of temperature
stimulus (GO:0016048; P = 0.003, lineage D populations).
Phenotypic Profiles of Differentiation among Lineages
and Populations
Natural Variation in Phenotype
We screened a subset of 130 strains (lineages A2, B, C, and D;
supplementary table S2, Supplementary Material online) for
variation in their tolerance to a pH of 5 (see Materials and
Methods). We first examined the environmental distribution
of this trait, and found no evidence for spatial autocorrelation
in environmental (soil) pH (Moran’s I P value = 0.870).
However, both soil pH and pH tolerance among nematodes
varied significantly with geographic location (Kruskal–Wallis
rank sum tests: χ² = 130, df = 10, P < 0.001; χ² = 27.07,
df = 10, P = 0.003; for soil pH and pH tolerance among nematodes, respectively) (supplementary fig. S5, Supplementary
Material online). Variation in pH tolerance among the tested
strains was significantly correlated with local soil pH
(Spearman’s rank correlation rho = 0.254; P = 0.003), and
this correlation was only slightly reduced if the individual with
the highest mortality in our pH assays (i.e., a potential outlier)
was removed from the analysis (Spearman’s rank correlation
rho = 0.236; P = 0.007).
Associations between Genotype and Phenotype
To test whether the identified phenotypic variance in pH
tolerance could be linked to its genotypic variance, we used
easyGWAS (Grimm et al. 2012), an integrated interspecies
platform for GWAS. We used the EMMAX algorithm (Kang
et al. 2010) in easyGWAS to perform genome-wide association mappings while accounting for confounding by population structure. We identified a total of 175 significant
GWAS hits, which fell in four genomic regions (fig. 4). We
calculated all pairwise LD values for these four genomic regions (see fig. 4), and all hits were extracted and used to
McGaughran et al. . doi:10.1093/molbev/msw093
MBE
FIG. 5. Functional follow-up of the GWAS candidate Pristionchus pacificus nhx gene. (A) Mortality (%) results of pH assays performed on RS2333 (wild-type phenotype), RSC021 (mortality phenotype), three independent transgenic RSC021 lines for which a copy of the Ppa-nhx contig20-snap.250
gene was injected from RS2333 into RSC021 (Lines A, B, and C), one independent line for which a copy of the Ppa-nhx contig20-snap.250 gene was
injected from RSC021 into RSC021 (Line D), and one independent line for which a construct without the Ppa-nhx contig20-snap.250 gene was
injected into RSC021 (Line E). Rescue of the phenotype is seen in the transgenic lines, which have significantly (“***” indicates P < 0.0001;
“**” indicates P < 0.001) reduced mortality compared with the wild-type RSC021 strain. In the case of Line A, P = 0.0132, which was not significant after
Bonferroni correction. Inset, top-right: dumpy-like phenotype seen in some RSC021 individuals after 24-h incubation in a pH 5 solution. (B) Gene
structure of the Ppa-nhx contig20-snap.250 gene, determined via laboratory RACE experiments. This construct (running from Sal1 cutting points
on the left and right of the gene) was used for micro-injection experiments; the splice leader position (“SL1”) is shown at the left of the gene
structure, whereas the original SNP identified in easyGWAS is shown to the right; 3′- and 5′-UTR, exons and introns, and the genomic region (from
4,000 to 13,000 bp) of Ppa-nhx contig20-snap.250 are indicated by the key at the top of the panel. (C) Regional association and linkage disequilibrium plot for the significantly associated focal easyGWAS SNP at ChrI:21,179,317. This SNP (in magenta) is located in a homologue of the
nhx-9 gene in Caenorhabditis elegans. The linkage disequilibrium structure is highlighted in different colors, where red colors illustrate strong LD
and blue colors indicate weak or no LD. Below the zoomed-in regional Manhattan plot, the minor allele frequency (MAF) for each SNP is shown, as
well as gene annotations for this region. Note that all available SNPs with a MAF >10% were used to generate this LD plot.
identify potential candidate loci underlying pH tolerance in P.
pacificus (below).
Functional Analysis of a GWAS-Derived pH Gene Candidate
Among the candidate loci identified in easyGWAS was a homologue of the C. elegans nhx family (supplementary fig. S6
and table S8, Supplementary Material online). One member
of this family, nhx-9, is known to encode a sodium/proton
exchanger that is expressed intracellularly in C. elegans (Nehrke and
Melvin 2002). Involved in the regulation of pH, NHX proteins
are thought to prevent intracellular acidification by catalyzing
the exchange of vesicular sodium for an intracellular proton
(Nehrke and Melvin 2002). A comparison of genomic
variation among the 130 strains in our data set (and the
laboratory reference strain, RS2333) found 98 polymorphisms
in the 10-kb genomic region encompassing the predicted P.
pacificus nhx gene (contig20-snap.250). In survival assays, we
found that most strains had very low mortality when exposed
to pH 5, whereas one strain, RSC021, had mortality that was
significantly higher (T129 = 16.11, P < 0.001; 95% CI: 14.25–18.77).
In each case, RSC021 individuals that died during the
assay had a dumpy-like phenotype (fig. 5A) that was not
present in any of the other tested strains.
After determining the exact gene structure of the Ppa-nhx
contig20-snap.250 (fig. 5B), we generated transgenic RSC021
animals, carrying extra copies of Ppa-nhx contig20-snap.250
from the laboratory reference strain, RS2333, which showed a
0% mortality phenotype in our assays. These transgenic animals were successfully rescued, showing significantly reduced,
or no, mortality in our pH assay (P = 0.0132, 0.0013, and
0.0013, for transgenic RSC021 lines A, B, and C, respectively;
light blue bars in fig. 5A). In contrast, control transgenic lines
carrying only the Ppa-egl-20::rfp reporter were not rescued
(grey bar in fig. 5A). To determine whether over-expression
of a Ppa-nhx contig20-snap.250 gene from the RSC021 strain
itself was also able to cause rescue of the phenotype, we
created a similar construct of Ppa-nhx contig20-snap.250
from RSC021. In these transgenic animals, we observed partial rescue (green bar; fig. 5A), suggesting that genomic variation within the locus (i.e., 98 polymorphisms, above), as
well as Ppa-nhx contig20-snap.250 expression differences between RS2333 and RSC021, are responsible for the pH
phenotype.
Finally, we examined local linkage in the region of
contig20-snap.250, plotting R2 between the GWAS focal
SNP and all other SNPs in the genomic region of contig20-snap.250 (fig. 5C). We examined genomic FST, π, and TD patterns around the Ppa-nhx contig20-snap.250 to see if we
could find evidence for a selective sweep (i.e., high FST, low
π and low, negative TD; Olson-Manning et al. 2012). For this
analysis, we compared populations from CO and SS1. Soil pH
was low at CO and high at SS1, but tolerance to pH 5 was high
at CO and low at SS1 (i.e., nematodes collected from a low soil
pH environment – CO – showed high tolerance to pH 5 in
assays, whereas those collected from a higher pH environment – SS1 – showed lower tolerance to pH 5 in assays).
We found that genomic FST between the two populations
was low, whereas π was reduced in both populations; TD was
low and negative for CO, but neutral (close to 0) for SS1,
which may be indicative of adaptive selection operating in
this region for the CO population (supplementary fig. S7,
Supplementary Material online). We also examined FST
among Réunion populations at the Ppa-nhx contig20-snap.250
locus to see if we could find a pattern in which FST was
higher for sites characterized by lower-pH soils compared
with higher-pH soils (supplementary fig. S5, Supplementary
Material online). Instead, we found that FST was lowest
among populations from the same lineage, and this overrode any potential pH-trait-driven effects on FST at the Ppa-nhx contig20-snap.250 locus (supplementary table S9,
Supplementary Material online).
Discussion
Heterogeneous Genomic Differentiation among
Lineages and Populations
Genetic divergence can be influenced by several factors, including gene flow, drift, mutation, recombination, and natural
selection. Over evolutionary time scales, the action of these
factors at the micro-scale can eventually result in changes at
the macro-scale, to promote speciation. Here, we focused on
delineating patterns of genomic divergence across several
points of the speciation continuum in the nematode, P.
pacificus.
FST analysis revealed that highly differentiated genomic
regions were rarely shared across pairwise comparisons involving evolutionary lineages, suggesting that these regions
of the P. pacificus genome are evolving independently among
isolated lineages. This is in line with related studies in other
organisms, in which patterns of genomic divergence have
been shown to be widely dispersed across the genome (e.g.,
threespine stickleback; Hohenlohe et al. 2010; Deagle et al.
2012; Jones et al. 2012; and Anopheles gambiae; Lawniczak
et al. 2010). Such patterns may not be particularly surprising,
given that the different P. pacificus lineages are, for the most
part, associated with different beetle species living in different
ecological/geographic environments. However, in the case of
more recently diverged, allopatric populations (that do often
share beetle host species), we found that genomic regions of
extraordinary differentiation were more often shared than
would be expected by chance. These genomic regions may
therefore contain loci important in incipient speciation processes. Indeed, our functional enrichment analysis found an
over-representation of genes involved in environmental
sensation (e.g., sensory perception of taste, detection of temperature stimulus) in outlier windows in population
comparisons.
Heterogeneity of genomic divergence may be due to several factors, and patterns of FST divergence, in particular, may
vary in a manner that is independent of local adaptation or
speciation (Cruickshank and Hahn 2014). For example, shared
recombination and mutation profiles among populations,
background selection, hitch-hiking, migration, and recent
population splitting events can all result in shared within-population polymorphisms that reduce local diversity and
lead to between-population differentiation, and FST divergence may simply result from the stochasticity of neutral,
but convergent, genetic drift (Kaplan et al. 1989; Nordborg
et al. 1996; Slatkin and Wiehe 1998; Nosil and Feder 2012;
Cruickshank and Hahn 2014; Seehausen et al. 2014). Recent
studies reporting heterogeneous landscapes of differentiation
have provided important insights into the genomic profiles
underlying adaptive divergence. By using various genomic
parameters, and the relationships between them, these studies have helped elucidate the heterogeneous processes underlying genomic diversification and ecological speciation
(e.g., Nosil et al. 2009; Lawniczak et al. 2010; Roesti et al.
2012; Feulner et al. 2015).
Here, we used predictions about the behavior of genomic
diversity, Tajima’s D, relative and absolute divergence, and
linkage/recombination, to provide a basis for understanding
the processes underlying the FST patterns we detected (sensu
Feulner et al. 2015). We found a slight, but consistent increase
in median R2 in outlier versus nonoutlier genomic windows.
Such a pattern may be suggestive of either a local reduction in
gene flow mediated by divergent selection, or of selection
with hitch-hiking of linked neutral sites (Keinan and Reich
2010; Nachman and Payseur 2012; Feulner et al. 2015). The
former of these is a particularly intriguing possibility that
Cruickshank and Hahn (2014) note is often a neglected explanation for genomic islands of divergence. We also found
that median Dxy was consistently lower in outlier windows
compared with nonoutlier windows. This is consistent with
genomic comparisons of closely related species in other taxa,
where regions elevated for measures of relative divergence like
FST generally have not also shown high Dxy values relative to
the genome-wide average (Cruickshank and Hahn 2014). The
discrepancy between relative and absolute measures of divergence has led to debate about the validity of interpreting
patterns of relative genomic divergence as evidence for speciation with gene flow, because islands of high relative but not
absolute divergence can also be driven by the effects of background selection in isolated species or populations
(Cruickshank and Hahn 2014). In P. pacificus (as for C. elegans;
Cutter and Choi 2010; Rockman et al. 2010; Andersen et al.
2012), the balance between recombination and mutation is
highly influenced by a predominantly self-fertilizing reproductive mode, and previous work has shown that background
selection has been an important factor shaping genomic diversity (Rödelsperger et al. 2014).
We sought further clarification of these issues by examining allele frequency spectra using Tajima’s D, and found that,
as for the degree of shared divergent genomic regions (above),
TD patterns were heterogeneous between lineages and populations. Specifically, 53% of divergent regions for lineage
comparisons corresponded to adaptive selection, 24% to
background selection, and 23% to restricted gene flow.
Meanwhile, for Réunion populations, these values corresponded to 34–48%, 13–22%, and 40–50%. Thus, lineages
appeared to be most highly subject to adaptive selection,
whereas populations across islands were most subject to restricted gene flow, and populations within Réunion Island
were similarly subject to adaptive selection and restricted
gene flow. For comparison, in sticklebacks, 22–55% of the
top 1% of divergent regions were shown to be consistent
with a local reduction in gene flow, whereas 25–75% of
such regions were shaped by hitch-hiking effects around selected variants (Feulner et al. 2015). Conversely, in flycatchers,
heterogeneity of genomic differentiation was shown to be
largely due to background selection and selective sweeps in
genomic regions of low recombination (Burri et al. 2015).
Genomic Architecture of Phenotypic Differentiation
as a Precursor to Ecological Speciation?
According to classic ecological theory, populations diverge for
specific phenotypes and genotypes that influence survival
and reproduction when exposed to different environments
(Mayr 1963; Schluter 2000). This process is facilitated when
emigrants show reduced success as they immigrate to a foreign environment that is ecologically divergent from their native
habitat (Nosil et al. 2005). Through acting on phenotypes to
eventually change genotypes over time, environmental mechanisms thus play an important role in the evolution of reproductive isolation among populations, resulting in the
formation of new species.
In trait analyses, we found evidence for pH tolerance evolving in association with local environmental (soil) pH. This
complements previous work showing that La Réunion populations are undergoing phenotypic divergence in an array of
ecologically relevant traits (Mayer and Sommer 2011;
McGaughran et al. 2013b; McGaughran and Sommer 2014;
Moreno et al. 2016). Such phenotypic differentiation may be
driven by divergent selective effects among geographic regions (e.g., Kawecki and Ebert 2004), and we have shown
elsewhere that local environmental variables explain a significant proportion of divergence in P. pacificus (McGaughran
et al. 2014).
Here, we examined the genomic profile in the region surrounding a pH gene candidate, as identified with GWAS analysis, and found putative support for adaptive selection
operating at this locus (i.e., Tajima’s D was low and negative
in the population where soil pH was at its lowest, but pH
tolerance was at its highest). However, FST at the pH candidate locus was not higher for populations at
locations characterized by low-pH soils compared with high-pH soils; instead, FST was lowest among populations from the
same lineage, and this overrode any potential pH-trait-driven
effects. This suggests that both ecological genomics and forward genetics studies focused on a priori candidate loci may
benefit from a holistic approach that incorporates genomic
analysis of the regions upstream and downstream of the candidate gene.
Conclusions
Speciation is a process that varies continuously, through
quantitative variation in the degree of phenotypic divergence
and the completeness of reproductive isolation, and in the
profile of highly differentiating genomic regions. Despite years
of research, we still understand very little about the consequences of genomic divergence for speciation. Yet, the identification of differentiated genomic regions and the genes
involved in local adaptation and ecological diversification represents a crucial first step that is enhanced by systems which
provide access to multiple comparisons across the scale of
evolutionary divergence. Indeed, P. pacificus, residing at an
interesting point on the speciation continuum, inhabiting
heterogeneous environments with diverse environmental
pressures, and having a genomic toolkit available that allows
functional follow-up of interesting candidate genes, is a useful system for studying incipient speciation. Further integration of ecological and functional genomic studies will
enable the establishment of direct links between patterns of
genomic divergence and speciation in both this, and
other, key species.
Materials and Methods
Sampling, Data Processing, and Validation
Sampling
A total of 264 P. pacificus strains (i.e., genetically identical
individuals of a thawed isogenic line that was created via
the inbreeding of a single hermaphroditic individual over 10
generations), including 39 from Rödelsperger et al. (2014),
were selected from our collections based on their availability,
geographic origin of collection, and evolutionary lineage
based on our previous molecular analysis of mitochondrial
and microsatellite data (Herrmann et al. 2010; Morgan et al.
2014). This resulted in a collective 30 “populations” (27 from
Réunion, 3 from Mauritius; supplementary table S1,
Supplementary Material online), corresponding to groups
of strains which were collected at the same geographic location and are of the same genomic ancestry. While efforts were
made to select a well-balanced data set in terms of sample
size across lineage and geographic location, we recognize that
some sample sizes were small and uneven (supplementary
table S1, Supplementary Material online). This largely results
from the fact that, though our collections span >10 years of
efforts from a large team, they are nevertheless reliant upon
successful beetle catches, and beetle distribution and collecting success are generally unpredictable from year to year. In
addition, nematode infestation rates vary among beetle species. Thus, whereas we used the full data set to initially estimate baseline population genetic parameters for the most
diverse data set possible, for all other genome-wide analyses,
we accounted for the natural differences in sample size
among lineages and localities by randomly reducing each
data set to create equal sample sizes among lineages
(n = 17) and populations (n = 8 or 9). We performed several
quality control checks for the population genetic parameters
calculated from our data sets to ensure accuracy after random sub-sampling (see below). Overall, these reduced data
sets included 4 lineages (A2, B, C, and D); 11 Réunion populations (3, 6, and 2, from lineages B, C, and D, respectively);
and 2 Mauritius populations (1 from each of lineages C and
D), and it is from these data sets that final population pairs
were analyzed (see below) (table 1).
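The random reduction to equal sample sizes described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function name, the fixed seed, the strain labels, and the choice to drop groups that have fewer than n strains are all hypothetical.

```python
import random

def downsample(groups, n, seed=0):
    """Randomly reduce each group (lineage or population) to exactly n strains.

    groups: dict mapping a group name to a list of strain identifiers.
    Groups with fewer than n strains are dropped (an assumption of this sketch).
    A seeded generator keeps the sub-sample reproducible.
    """
    rng = random.Random(seed)
    return {
        name: sorted(rng.sample(strains, n))
        for name, strains in groups.items()
        if len(strains) >= n
    }

# Toy input with made-up strain labels.
toy_groups = {
    "B": [f"B{i:02d}" for i in range(25)],
    "C": [f"C{i:02d}" for i in range(17)],
    "X": [f"X{i:02d}" for i in range(5)],   # too small for n = 17, so dropped
}
toy_equal = downsample(toy_groups, n=17)
```

Fixing the seed makes it possible to rerun downstream estimates on the same sub-sample, which is useful when checking that results are stable across different random draws.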
Genomic DNA Preparation
Genomic DNA was prepared for all strains following the protocol outlined in Rödelsperger et al. (2014). In brief, DNA was
extracted from pooled individuals of each isogenic line using
the MasterPure DNA purification kit from Epicentre (Biozym
Scientific GmbH, Hessisch Oldendorf, Germany) and genomic
libraries were generated using the TruSeq DNA Sample
Preparation Kit ver. 2 from Illumina (Illumina Inc., CA).
DNA was sheared using the Covaris S2 System (Covaris
Ltd., Woodingdean Brighton, United Kingdom) and end repair, adenylation, and adaptor ligation were performed following the kit protocol. Strains were run on a 2% agarose gel
with slices ranging from 400- to 500-bp extracted to give a
final insert size of 300–400 bp. After PCR amplification,
libraries were validated on an Agilent Bioanalyzer DNA
1000 chip (Agilent Technologies GmbH, Waldbronn,
Germany) and diluted before sequencing on an Illumina
Genome Analyzer II platform.
Alignment, Variant Calling, and Validation
For all Illumina sequence read data, bases in the first 36 bp of
raw reads with a quality <20 (error probability = 1%) were
masked and reads were trimmed at the first occurrence of a
low (<20) quality base in the rest of the read. Quality-filtered
paired-end reads were aligned to the P. pacificus genome
assembly (ver. Hybrid1), which spans a total length of
172.5 Mb and six chromosomes (Rödelsperger et al.
2014), using stampy ver. 1.0.20 (Lunter and Goodson 2011).
Duplicate reads were removed and remaining reads were locally realigned using GATK ver. 2.1.13 (McKenna et al. 2010).
SNPs were called using samtools ver. 0.1.18 (Li and Durbin
2009). The accuracy of variant calls under the same pipelines
applied here was analyzed previously with Sanger sequencing
data and resulted in an estimated variant call accuracy of
98%; see Rödelsperger et al. (2014) for further details.
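The read-level quality filter described at the start of this section (masking within the first 36 bp, trimming thereafter) can be illustrated with a short Python sketch. It is not the pipeline code: the function names are hypothetical, masking is assumed to mean replacing a base with "N", and qualities are assumed to be Phred scores already decoded to integers.

```python
def phred_error(q):
    """Error probability implied by a Phred quality score (Q20 gives 1%)."""
    return 10 ** (-q / 10)

def mask_and_trim(seq, quals, min_q=20, head=36):
    """Mask low-quality bases in the first `head` positions and trim the
    rest of the read at its first low-quality base.

    seq:   read sequence as a string
    quals: list of integer Phred scores, one per base
    """
    # Head of the read: keep its length, replace low-quality bases with N.
    masked = "".join(
        base if q >= min_q else "N"
        for base, q in zip(seq[:head], quals[:head])
    )
    # Remainder of the read: stop at the first base below the threshold.
    kept = []
    for base, q in zip(seq[head:], quals[head:]):
        if q < min_q:
            break
        kept.append(base)
    return masked + "".join(kept)

# Toy read with a shortened head (4 bp) so the behaviour is easy to follow.
toy_read = mask_and_trim("ACGTACGT", [30, 10, 30, 30, 30, 30, 5, 30], head=4)
```

In the toy read, the second base is masked because it lies in the head region, and the read is cut before the seventh base, the first low-quality base after the head.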
From our total sequencing data set of 264 strains, 6,867,575
SNP positions were retrieved with respect to the RS2333
California reference (Hybrid1 assembly) after applying a minor allele frequency (MAF) filter of 0.01 during SNP calling.
This set of 6.8 million SNPs was used to estimate population
genomic parameters (nucleotide diversity [π] and Tajima’s D
[TD]; see below) among all samples from given lineages and
populations. A further subset of SNPs, derived using a
MAF filter of 0.25, was then used for all
pairwise population differentiation comparisons (i.e., FST).
The full data set was used to estimate baseline population
genetic parameters at the lineage scale, whereas all “population” calculations used reduced, equal-sized data sets. To ensure accuracy among the down-sampled data sets, we
checked for consistency among full (n > 40) and down-sampled data (n = 8 or 9) for two populations, using measures of π and TD, via correlation analysis (supplementary
table S10, Supplementary Material online).
For the GWAS approaches (below), we generated a data
set of 130 strains, for which we had phenotype data, and 2.1
million SNPs, including genotypes that were imputed using
fastPHASE ver. 1.2 (Scheet and Stephens 2006) at positions
that could be genotyped based on sequencing data in at least
95% of strains. Accuracy of imputed genotypes was evaluated
based on resequencing data for 14 strains and estimated to be
>99% (Rödelsperger et al. 2014).
Population Genetic Parameters
Population Structure
To illustrate the relationship among all sampled populations,
we utilized a set of 208,841 genome-wide SNPs to build a
neighbor-joining tree based on p-distances in MEGA ver. 6
(Tamura et al. 2013).
Population Genetic Estimators
Population genetic estimators, including nucleotide diversity
(π), and Tajima’s D (TD) were calculated with VCFtools ver.
0.1.11 (Danecek et al. 2011) for several different lineage and
population categorizations (table 1). For lineages, these parameters were calculated for all samples for all lineages, for
Réunion lineages, and for Mauritius lineages, as well as for
sub-sampled Réunion lineages where n = 17. For populations,
Réunion and Mauritius data sets were all sub-sampled to give
even sample sizes across populations so that n = 8 or 9 before
estimators were calculated (table 1). Both π and TD were
averaged across the genome in nonoverlapping windows to
ensure statistical independence of windows. Window sizes of
1-, 10-, and 100-kb were used to confirm that results were
quantitatively the same, regardless of window size. Diversity
(π) estimates were corrected for the number of sites for which
genotypes were available (28,945,735 sites for all strains),
whereas additional estimates of TD were performed after filtering at a minor allele frequency cut-off of 25% for each data
set, to check that estimates did not reflect genotyping error
manifesting as rare variants (supplementary table S10,
Supplementary Material online).
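For readers unfamiliar with the statistic, Tajima's D for a single window can be written out directly from its standard definition (Tajima 1989). The sketch below is illustrative only; the estimates in this study were produced with VCFtools, and the function name and toy values here are hypothetical. Note that pi in this function is the mean number of pairwise differences summed over the window, not a per-site value.

```python
from math import sqrt

def tajimas_d(n, S, pi):
    """Tajima's D for one window.

    n:  number of sampled sequences
    S:  number of segregating sites in the window
    pi: mean number of pairwise differences in the window (not per site)
    Returns NaN when there are no segregating sites.
    """
    if S == 0:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimators of theta, scaled by its standard deviation.
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))

# Toy values: 10 sequences, 10 segregating sites.
a1_toy = sum(1.0 / i for i in range(1, 10))
d_neutral = tajimas_d(10, 10, 10 / a1_toy)   # pi equals S/a1, so D is 0
d_excess_rare = tajimas_d(10, 10, 2.0)       # pi below S/a1, so D is negative
```

A negative value, as in the second toy case, reflects an excess of rare variants relative to neutral expectation, which is the pattern used in this study as a signal of selection.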
Genomic Profiles of Differentiation among Lineages
and Populations
Genomic Differentiation Analyses
Relative divergence (Weir and Cockerham’s FST; Weir and
Cockerham 1984) and absolute divergence (Dxy) were calculated with VCFtools for all of the possible lineage pairwise
comparisons, and for all population pairs, using data sets of
equal size (Waples 1998), and a minor allele frequency cut-off
of 25% (Feulner et al. 2015) across each pairwise comparison.
Outlier Windows
Outlier FST windows were determined empirically by selecting
windows above the top 1% of the empirical distribution as
putative outliers. In addition, a permutation approach was
taken in R ver. 3.1.1 (R Core Team 2014), in which loci across
the genome were permuted 1,000,000 times. Window estimates of FST were then tested against permutations holding
the same amount of variable sites using an R script modified
from Feulner et al. (2015). Putative outliers from the permutation approach were identified using a false discovery rate of
0.01 and final outlier windows were those which were significant in both the empirical and permutation approaches.
Outlier windows were compared among all pairwise lineage and population comparisons to examine the degree of
overlap across comparisons of increasing population divergence. To evaluate how many overlapping outlier windows
would be expected by chance, windows were permuted 1,000
times using a custom-made R script.
For all comparisons, outlier windows were finally analyzed
for their gene content based on homology searches using
BLAST against C. elegans, and enrichment of functional classes
of these genes among regions for populations and lineages
was determined using BLAST2GO analysis (BLAST2GO ver.
3.1.3; Conesa et al. 2005).
Selection Signals in Divergent Regions
To assess the molecular signature of selection in outlier windows, shifts in the allele frequency spectra were evaluated
with TD, by comparing outlier-window TD with the nonoutlier-window TD for each population comparison. Following
Feulner et al. (2015), we classified divergent regions into three
categories: background selection if TD dropped significantly
below the genome-wide average in both populations; adaptation in one or the other population if TD dropped significantly below the genome-wide average only in the respective
population; and reduced gene flow if TD appeared neutral (i.e.,
not significantly below the genome-wide average) in both
populations.
LD Patterns in Divergent Regions
Direct measures of fine-scale population linkage disequilibrium (R2) were obtained using plink ver. 1.07 (Purcell et al.
2007) for each lineage and population separately. LD estimates were averaged over nonoutlier 10-kb windows
throughout the genome to obtain genome-wide baseline estimates, and then over each outlier region.
Phenotypic Profiles of Differentiation among Lineages
and Populations
Natural Variation in pH Tolerance
About 130 strains, selected to encompass collections from a
variety of environmental gradients on La Réunion Island (e.g.,
altitude, temperature, precipitation), were screened for variation in pH tolerance (supplementary table S2,
Supplementary Material online). All strains were maintained
at 20 °C on Escherichia coli OP50 (Brenner 1974) for at least
3 weeks before assaying, and up to six biological replicates
were performed for each assay.
In assays, the pH solution (with concentration determined
in an initial range-finding experiment) was prepared by
autoclaving standard K-medium (2.36 g KCl and 3.0 g NaCl
per L distilled H2O) and adjusting the pH to 5 by adding 1 M
HCl and/or 1 M NaOH. Following the protocol of Khanna
et al. (1997), a 50-µl aliquot of concentrated worm suspension
was exposed to 250 µl of the pH solution in a 24-well tissue
culture plate (Greiner Bio-One GmbH, Frickenhausen,
Germany). After 24-h incubation at pH 5, strains were transferred to an OP50-seeded NGM plate and mortality of adults
scored.
Environmental Distribution of pH Tolerance
To examine the environmental distribution of pH tolerance
among strains on La Réunion Island, we performed a variety
of analyses in R. First, we checked for spatial autocorrelation in
the soil pH data using Moran’s I. Next, we evaluated whether
both local soil pH and pH tolerance varied significantly
among geographic locations, using the nonparametric
Kruskal–Wallis one-way analysis of variance (ANOVA) by
ranks. Finally, to evaluate whether pH tolerance was correlated with local soil pH, we used the nonparametric
Spearman’s (rho) correlation analysis.
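Spearman's rho is simply the Pearson correlation of the ranked values, with tied observations given their average rank. The analyses in this study were run in R; the sketch below re-expresses the statistic in plain Python for illustration only, with hypothetical function names and made-up toy values.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rank correlation between two equal-length sequences."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy values (not data from the study): soil pH and a tolerance score per site.
toy_soil_ph = [4.8, 5.5, 6.1, 6.9]
toy_tolerance = [10.0, 20.0, 30.0, 40.0]
toy_rho = spearman_rho(toy_soil_ph, toy_tolerance)
```

Because only ranks enter the calculation, a single extreme observation has limited influence on rho, which is why a rank-based measure suits mortality data with a potential outlier.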
Associations between Genotype and Phenotype
Genome-wide association analysis was used to identify SNPs
that were significantly associated with pH tolerance in the
cloud service, easyGWAS (Grimm et al. 2012). Standard linear
regression models tend to detect many spurious associations
since they do not account for confounding factors, such as
population structure and cryptic relatedness (Newman et al.
2001). It has been shown in the past that linear mixed models
are well suited to correct for hidden confounding of this type
(Kang et al. 2010; Lippert et al. 2011) by modeling the genetic
relationship between samples as a random effect. Thus, for
our analysis, we used easyGWAS with the EMMAX algorithm
(Kang et al. 2010). We estimated the genetic similarity between individuals by using the realized relationship kinship
matrix (Hayes et al. 2009). To assess the degree of inflated test
statistics, we computed genomic control (GC), measuring the
deviation of the observed median test statistic from the expected one (Devlin and Roeder 1999). A GC value larger than
one indicates inflated P values, whereas a GC value smaller
than one is an indicator of deflated P values. In our final
analysis, the GC value was 1.02.
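The genomic control factor described above is the ratio of the observed median test statistic to the median expected under the null hypothesis. A minimal sketch follows, assuming 1-df chi-square association statistics; the function name and toy statistics are hypothetical, and this is not the easyGWAS implementation.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_control(chi2_stats):
    """Genomic control factor: observed median statistic over expected median.

    chi2_stats: iterable of 1-df chi-square association statistics, one per SNP.
    A value near 1 suggests little residual inflation of the test statistics.
    """
    return median(chi2_stats) / CHI2_1DF_MEDIAN

# Toy statistics (made up): one well-calibrated set and one inflated set.
gc_calibrated = genomic_control([0.1, 0.4549364, 2.0])
gc_inflated = genomic_control([0.5, 0.9, 3.0])
```

Using the median rather than the mean keeps the estimate robust to the small number of truly associated SNPs, which have very large statistics.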
We are convinced that our analysis adequately corrected
for strong population and lineage structure for several reasons: (1) our GC value is close to 1, indicating that we correctly accounted for hidden types of confounding (including
population structure) in our model; (2) we later used functional validation of our easyGWAS results to confirm that a
significant SNP identified in our analysis plays a causative role
in our pH phenotype (see Results); and (3) it has
been shown in other species, such as A. thaliana and outbred
rats, that linear mixed models with a kinship matrix are able
to account for strong degrees of population structure, both
between samples and between distinct sub-populations (e.g.,
Long et al. 1998; Atwell et al. 2010; Horton et al. 2012; Rat
Genome Sequencing and Mapping Consortium et al. 2013).
For our analysis, which included only homozygous sites, we
encoded the major allele with “0” and the minor allele with
“2” and excluded SNPs with a minor allele frequency (MAF) of
<10% in all experiments. After MAF filtering of the 130-strain
data set, a set of 870,876 SNPs was left, a large proportion of
which were observed to be one-to-one copies of single SNPs.
Therefore, we reduced the data set to include only single
copies of each SNP, creating a final subset of 50,833 SNPs.
Excluding duplicated SNPs from the analysis in this manner
was crucial because SNPs would otherwise be counted multiple times,
which skews QQ-plots and GC values. However, to draw
Manhattan plots, we used all SNPs after MAF filtering since
we did not want to destroy any linkage disequilibrium
structure.
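The filtering steps above can be sketched as follows. This is an illustrative sketch only: it assumes that "one-to-one copies" means SNPs with identical genotype patterns across all strains, and the function names and the dictionary layout (SNP identifier mapped to a list of 0/2 genotype codes) are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF for homozygous calls coded 0 (major allele) or 2 (minor allele)."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1.0 - freq)

def filter_and_deduplicate(snps, maf_threshold=0.10):
    """Apply the MAF filter, then keep one SNP per distinct genotype pattern.

    snps maps a SNP id to its genotype list (same strain order for all SNPs).
    Returns (maf_filtered, unique).
    """
    # Exclude SNPs whose minor allele frequency is below the threshold.
    maf_filtered = {
        snp_id: g for snp_id, g in snps.items()
        if minor_allele_frequency(g) >= maf_threshold
    }
    unique = {}
    seen = set()
    for snp_id, g in maf_filtered.items():
        pattern = tuple(g)
        # The first SNP carrying a given pattern represents all of its copies.
        if pattern not in seen:
            seen.add(pattern)
            unique[snp_id] = g
    return maf_filtered, unique
```

In this layout, `unique` would feed the association test and the QQ-plot, whereas `maf_filtered` would be retained for Manhattan plots.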
To account for environmental and location-specific factors
in the analysis, we used the covariate option in easyGWAS for
the following covariates: altitude, beetle host, ecozone, location, all. We did not transform the quantitative covariate
(altitude), but encoded the categorical covariates (location,
ecozone, beetle host, all) as dummy variables: if a covariate X has k
different categories, we created k − 1 binary variables, each
indicating one category. pH tolerance was also tested in
easyGWAS in raw data format and after square-root normalization. For the final analysis, we used square-root normalized
data and included the covariate, ecozone. Bonferroni correction was applied for multiple hypothesis testing.
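A minimal sketch of the covariate encoding and phenotype transformation described above is given below. It assumes that the first category in sorted order serves as the reference level and that square-root normalization is a plain square root of non-negative values; neither detail is stated in the text, and the function names are hypothetical.

```python
import math

def dummy_encode(values):
    """Encode a categorical covariate with k levels as k - 1 binary columns.

    The first level in sorted order is the reference and is coded as all zeros.
    Returns (kept_levels, rows), with one row per sample.
    """
    levels = sorted(set(values))
    kept = levels[1:]  # drop the reference level
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows

def sqrt_normalize(phenotype):
    """Square-root transform of a non-negative quantitative phenotype."""
    return [math.sqrt(x) for x in phenotype]
```

For a covariate with three categories, `dummy_encode` returns two columns, so the reference category is absorbed by the model intercept.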
In addition, we computed which parts of the phenotypic
variance could be attributed to the random (genetic contribution) and to the fixed effect (environmental contribution).
For this purpose, we conducted a 10-fold cross-validation for
pH tolerance, y. For each fold, we trained a linear mixed
model using only the kinship matrix and the covariates,
and predicted the phenotype, ŷ, using the remaining evaluation set. Predictions were obtained by summing up the contributions of the fixed and random effects:
\[
\hat{y} = C_{\text{test}}\,\beta + K_{\text{test}}\,(K_{\text{train}} + \delta I)^{-1}\,(y_{\text{train}} - C_{\text{train}}\,\beta)
\]
where C are the included covariates (if no covariates are included, C is a vector of ones), K is the kinship matrix, and β
and δ are the learned parameters in the training step.
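The prediction step can be sketched in plain Python as follows. This is an illustrative sketch, not the easyGWAS implementation: it takes the learned parameters as given inputs (`beta` for the fixed-effect coefficients, `delta` for the scalar added to the diagonal of the training kinship matrix), assumes that `K_test` holds the kinship between evaluation and training samples, and uses a small hypothetical Gaussian-elimination helper instead of a linear algebra library.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def lmm_predict(C_test, K_test, C_train, K_train, y_train, beta, delta):
    """Sum the fixed-effect and random-effect contributions for the evaluation set."""
    # Training residuals after removing the fixed effects.
    resid = [y - f for y, f in zip(y_train, matvec(C_train, beta))]
    n = len(K_train)
    # K_train + delta * I
    V = [[K_train[i][j] + (delta if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    # Solving the linear system avoids forming the inverse explicitly.
    alpha = solve(V, resid)
    fixed = matvec(C_test, beta)
    random = matvec(K_test, alpha)
    return [f + r for f, r in zip(fixed, random)]
```

Solving the system rather than inverting the matrix gives the same prediction and is usually the more stable choice numerically.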
Finally, we computed the variance explained, v(y_test, ŷ), as
follows:
\[
v(y_{\text{test}}, \hat{y}) = 1 - \frac{\operatorname{Var}(y_{\text{test}} - \hat{y})}{\operatorname{Var}(y_{\text{test}})}
\]
This resulted in a final explained variance (i.e., summed
contributions of random [genetic contribution] and fixed
[environmental contribution] effects) of 0.06.
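The variance-explained measure and the fold assignment can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not say how samples were assigned to folds, so contiguous, unshuffled folds are used purely for illustration, and both function names are hypothetical.

```python
from statistics import pvariance

def variance_explained(y_test, y_pred):
    """1 - Var(y_test - y_pred) / Var(y_test)."""
    resid = [y - p for y, p in zip(y_test, y_pred)]
    return 1.0 - pvariance(resid) / pvariance(y_test)

def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds = []
    start = 0
    for i in range(k):
        # The first n % k folds receive one extra sample.
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds
```

With this definition, a predictor that always returns the evaluation-set mean scores 0 and a perfect predictor scores 1, which places the reported value of 0.06 near the lower end of that range.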
Functional Analysis of a GWAS-Derived pH Gene Candidate
The candidate gene list resulting from easyGWAS analysis
included a P. pacificus nhx gene, nhx-9, which encodes a sodium/proton exchanger, expressed intra-cellularly in C. elegans (Nehrke and Melvin 2002). Among all tested isolates,
we identified one strain, RSC021, that had significantly higher
mortality when exposed to solutions of pH 5. Meanwhile the
reference strain of P. pacificus, RS2333, showed 0% mortality
in the same assay. Thus, we created an extra-chromosomal
array containing 2 ng µl⁻¹ of a genetic construct of the RS2333
nhx gene (Ppa-nhx contig20-snap.250), a Ppa-egl-20::rfp (red
fluorescent protein) reporter (10 ng µl⁻¹), and genomic carrier DNA (60 ng µl⁻¹) from the recipient line. During this
process, we used the SalI restriction enzyme to cut DNA for
insertion into the array, and we modified the amplified fragment (i.e., the construct) with primers to produce these restriction sites (supplementary table S5, Supplementary
Material online). The array was used to perform transgenic
micro-injection experiments in RSC021, injecting into the
germlines of adult hermaphrodites. Transgenic lines were
scored for their mortality after pH 5 exposure over multiple
generations to determine whether the Ppa-nhx contig20-snap.250 gene construct could cause rescue of the mortality
phenotype in RSC021. As a control, we also created an
RSC021 line injected with only the Ppa-egl-20::rfp reporter
and tested it for rescue in our assay. In addition, we injected
a second Ppa-nhx contig20-snap.250 gene construct amplified from wild-type (wt) RSC021, into wt-RSC021 to test if
over-expression of the Ppa-nhx contig20-snap.250 gene, regardless of the donor origin, was causing rescue of the phenotype. All constructs consisted of a 12-kb fragment
encompassing the predicted Ppa-nhx contig20-snap.250 coding region plus 5 kb of upstream and downstream sequence. We used RACE (Frohman et al. 1988) to examine
the potential differences in the Ppa-nhx contig20-snap.250
gene structure between our high- (RSC021) and low- (RS2333) mortality strains.
LD and Other Population Genetic Estimators for the
easyGWAS SNP Region
We examined local linkage in the region of contig20-snap.250
by plotting R² between the GWAS focal SNP and all other
SNPs in the local genomic region. Next, we plotted genomic
FST, π, and TD around contig20-snap.250 by comparing two
populations (CO and SS1) with divergent pH tolerance and
soil pH profiles. Finally, we calculated FST among Réunion
populations at the Ppa-nhx contig20-snap.250 locus
(996 bp) using Arlequin ver. 3.5 (Excoffier and Lischer 2010).
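As an illustration of the local linkage calculation, the sketch below computes the squared Pearson correlation between a focal SNP and each neighboring SNP from 0/2 genotype codes. It is an assumption that linkage was measured in exactly this way; the function names and input layout are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return cov * cov / (var_x * var_y)

def local_ld(focal, others):
    """Map each neighboring SNP id to its r^2 with the focal SNP."""
    return {snp_id: r_squared(focal, g) for snp_id, g in others.items()}
```

Plotting these values against genomic position would give a local linkage profile around the focal SNP.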
Supplementary Material
Supplementary figs. S1–S7 and tables S1–S10 are available at
Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
This project represents the fruition of several years of planning and implementation; the authors wish to thank everyone who played a role, including members of the Sommer
Laboratory and many colleagues and conference/workshop
participants for valuable discussions. In addition, we thank
Metta Riebesell for performing micro-injection experiments,
our La Réunion colleagues for logistic support during fieldwork (particularly Dr Jacques Rochat, La Réunion Insectarium,
and staff of La Réunion Parc National), and three anonymous
reviewers for their constructive comments on an earlier version of the manuscript. All genomic data have been submitted
to the European Nucleotide Archive. Phenotypic data, summary
statistics, and dynamic Manhattan plots corresponding to
our easyGWAS analyses are available at: https://easygwas.ethz.ch.
References
Andersen E, Gerke J, Shapiro JA, Crissman JR, Ghosh R, Bloom JS, Félix
M-A, Kruglyak L. 2012. Chromosome-scale selective sweeps shape
Caenorhabditis elegans genomic diversity. Nat Genet. 44:285–290.
Atwell S, Huang YS, Vilhjalmsson BJ, Willems G, Horton M, Li Y, Meng D,
Platt A, Tarone AM, Hu TT, et al. 2010. Genome-wide association
study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature
465:627–631.
Barrière A, Félix M-A. 2007. Temporal dynamics and linkage disequilibrium in natural Caenorhabditis elegans populations.
Genetics 176:999–1011.
Rat Genome Sequencing and Mapping Consortium, Baud A, Hermsen R,
Guryev V, Stridh P, Graham D, McBride MW, Foroud T, Calderari S,
Diez M, et al. 2013. Combined sequence-based and genetic mapping
analysis of complex traits in outbred rats. Nat Genet. 45:767–775.
Brenner SJ. 1974. The genetics of Caenorhabditis elegans. Genetics
77:71–94.
Burri R, Nater A, Kawakami T, Mugal CF, Olason PI, Smeds L, Suh A,
Dutoit L, Bures S, Garamszegi LZ, et al. 2015. Linked selection and
recombination rate variation drive the evolution of the genomic
landscape of differentiation across the speciation continuum of
Ficedula flycatchers. Genet Res. 25:10–11.
Charlesworth B, Morgan MT, Charlesworth D. 1993. The effect of deleterious mutations on neutral molecular variation. Genetics
134:1289–1303.
Conesa A, Götz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. 2005.
Blast2GO: a universal tool for annotation, visualization and analysis
in functional genomics research. Bioinformatics 21:3674–3676.
Cruickshank TE, Hahn MW. 2014. Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene
flow. Mol Ecol. 23:3133–3157.
Cutter AD, Choi JY. 2010. Natural selection shapes nucleotide polymorphism across the genome of the nematode Caenorhabditis briggsae.
Genome Res. 20:1103–1111.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA. 2011.
The variant call format and VCFtools. Bioinformatics 27:2156–2158.
Darwin C. 1859. On the origin of species. 6th ed. London: Murray.
Deagle BE, Jones FC, Chan YF, Absher DM, Kingsley DM, Reimchen TE.
2012. Population genomics of parallel phenotypic evolution in stickleback across stream-lake ecological transitions. Proc R Soc B.
279:1277–1286.
Devlin B, Roeder K. 1999. Genomic control for association studies.
Biometrics 4:997–1004.
Endler JA. 1986. Natural selection in the wild. Monographs in population
biology. Princeton: Princeton University Press.
Excoffier L, Lischer HEL. 2010. Arlequin suite ver 3.5: a new series of
programs to perform population genetic analyses under Linux and
Windows. Mol Ecol Res. 10:564–567.
Feder JL, Egan SP, Nosil P. 2012. The genomics of speciation-with-geneflow. Trends Genet. 28:342–350.
Feulner PGD, Chai FJJ, Panchal M, Huang Y, Eizaguirre C, Kalbe M, Lenz
TL, Samonte IE, Stoll M, Bornberg-Bauer E, et al. 2015. Genomics of
divergence along a continuum of parapatric population differentiation. PLoS Genet. 11:e1004966.
Frohman MA, Dush MK, Martin GR. 1988. Rapid production of full
length cDNAs from rare transcripts: amplification using a single
gene-specific oligonucleotide primer. Proc Natl Acad Sci U S A.
85:8998–9002.
Grimm D, Greshake B, Kleeberger S, Lippert C, Stegle O, Schölkopf B,
Weigel D, Borgwardt K. 2012. easyGWAS: an integrated interspecies
platform for performing genome-wide association studies.
ArXiv:1212.4788 [q-bio.GN].
Hanikenne M, Kroymann J, Trampczynska A, Bernal M, Motte P,
Clemens S, Krämer U. 2013. Hard selective sweep and ectopic
gene conversion in a gene cluster affording environmental adaptation. PLoS Genet. 9:e1003707.
Hayes BJ, Visscher PM, Goddard ME. 2009. Increased accuracy of artificial
selection by using the realized relationship matrix. Genetics Res.
91:47–60.
Hendry P, Bolnick DI, Berner D, Peichel CL. 2009. Along the speciation
continuum in sticklebacks. J Fish Biol. 75:2000–2036.
Herrmann M, Kienle S, Rochat J, Mayer WE, Sommer RJ. 2010. Haplotype
diversity of the nematode Pristionchus pacificus on Réunion in the
Indian Ocean suggests multiple independent invasions. Bio J Linn
Soc. 100:170–179.
Herrmann M, Mayer W, Hong R, Kienle S, Minasaki R, Sommer RJ. 2007.
The nematode Pristionchus pacificus (Nematoda: Diplogastridae) is
associated with the Oriental beetle Exomala orientalis (Coleoptera:
Scarabaeidae) in Japan. Zool Sci. 24:883–889.
Hohenlohe PA, Bassham S, Etter PD, Stiffler N, Johnson EA, Cresko WA.
2010. Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. PLoS Genet. 6:e100086.
Hong RL, Sommer RJ. 2006. Pristionchus pacificus: a well-rounded nematode. BioEssays 28:651–659.
Horton MW, Hancock AM, Huang YS, Toomajian C, Atwell S, Auton A,
Wayan Muliyati N, Platt A, Gianluca Sperone F, Vilhjalmsson BJ, et al.
2012. Genome-wide patterns of genetic variation in worldwide
Arabidopsis thaliana accessions from the RegMap panel. Nat
Genet. 44:212–216.
Jones FC, Grabherr MG, Chan YF, Russell P, Mauceli E, Johnson J,
Swofford R, Pirun M, Zody MC, White S, et al. 2012. The genomic
basis of adaptive evolution in threespine sticklebacks. Nature
484:55–61.
Kang HM, Sul JH, Service SK, Zaitlen NA, Kong S, Freimer NB, Sabatti C,
Eskin E. 2010. Variance component model to account for sample
structure in genome-wide association studies. Nat Genet.
42:348–354.
Kaplan NL, Hudson RR, Langley CH. 1989. The ‘hitchhiking effect’ revisited. Genetics 123:887–899.
Kawecki TJ, Ebert D. 2004. Conceptual issues in local adaptation. Ecol
Lett. 7:1225–1241.
Keinan A, Reich D. 2010. Human population differentiation is strongly
correlated with local recombination rate. PLoS Genet. 6:e1000886.
Khanna N, Cressman CP, Tatara CP, Williams PL. 1997. Tolerance of the
nematode Caenorhabditis elegans to pH, salinity, and hardness in
aquatic media. Arch Environ Contam Toxicol. 32:110–114.
Lawniczak MKN, Emrich SJ, Holloway AK, Regier AP, Olson M, White B,
Redmond S, Fulton L, Appelbaum E, Godfrey J, et al. 2010.
Widespread divergence between incipient Anopheles gambiae species revealed by whole genome sequences. Science 330:512–514.
Li H, Durbin R. 2009. Fast and accurate short read alignment with
Burrows-Wheeler transform. Bioinformatics 25:1754–1760.
Lippert C, Listgarten J, Liu Y, Kadie CM, Davidson RI, Heckerman D. 2011.
FaST linear mixed models for genome-wide association studies. Nat
Methods. 8:833–835.
Long AD, Lyman RF, Langley CH, Mackay TF. 1998. Two sites in the Delta
gene region contribute to naturally occurring variation in bristle
number in Drosophila melanogaster. Genetics 149:999–1017.
Losos JB, Arnold SJ, Bejerano G, Brodie EDIII, Hibbett D, Hoekstra HE,
Mindell DP, Monteiro A, Moritz C, Allen Orr H, et al. 2013.
Evolutionary biology for the 21st century. PLoS Biol. 11:e1001466.
Lunter G, Goodson M. 2011. Stampy: a statistical algorithm for sensitive
and fast mapping of Illumina sequence reads. Genome Res.
21:936–939.
Mayer MG, Sommer RJ. 2011. Natural variation in Pristionchus pacificus
dauer formation reveals cross-preference rather than self-preference
of nematode dauer pheromones. Proc Biol Sci. 278:2784–2790.
Mayr E. 1963. Animal species and evolution. Oxford: Oxford University
Press.
McGaughran A, Morgan K, Sommer RJ. 2013a. Unravelling the
evolutionary history of the nematode Pristionchus pacificus:
from lineage diversification to island colonization. Evol Ecol.
3:667–675.
McGaughran A, Morgan K, Sommer RJ. 2013b. Natural variation in
chemosensation: lessons from an island nematode. Ecol Evol.
3:5209–5224.
McGaughran A, Morgan K, Sommer RJ. 2014. Environmental variables
explain genetic structure in a beetle-associated nematode. PLoS One
9:e87317.
McGaughran A, Sommer RJ. 2014. Natural variation in cold tolerance in
the nematode Pristionchus pacificus: the role of genotype and environment. Open Biol. 3:832–838.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K,
Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al.
2010. The genome analysis toolkit: a MapReduce framework for
analyzing next-generation DNA sequencing data. Genome Res.
20:1297–1303.
Moreno E, McGaughran A, Rödelsperger C, Zimmer M, Sommer RJ.
2016. Oxygen-induced social behaviours in Pristionchus pacificus
have a distinct evolutionary history and genetic regulation from
Caenorhabditis elegans. Proc R Soc B. 283:20152263.
Morgan K, McGaughran A, Ganeshan S, Herrmann M, Sommer RJ. 2014.
Landscape and oceanic barriers shape dispersal and population
structure in the island nematode Pristionchus pacificus. Bio J Linn
Soc. 112:1–15.
Morgan K, McGaughran A, Villate L, Herrmann M, Witte H, Bartelmes G,
Rochat J, Sommer RJ. 2012. Multi-locus analysis of Pristionchus pacificus on La Réunion Island reveals an evolutionary history shaped by
multiple introductions, constrained dispersal events and rare outcrossing. Mol Ecol. 21:250–266.
Nachman MW, Payseur BA. 2012. Recombination rate variation and
speciation: theoretical predictions and empirical results from rabbits
and mice. Philos Trans R Soc Lond B Biol Sci. 367:409–421.
Nehrke K, Melvin JE. 2002. The NHX family of Na+-H+ exchangers in
Caenorhabditis elegans. J Biol Chem. 277:20936–29044.
Newman DL, Abney M, McPeek MS, Ober C, Cox NJ. 2001. The importance of genealogy in determining genetic associations with complex
traits. Am J Hum Genet. 69:1146–1148.
Nordborg M, Borevitz JO, Bergelson J, Berry CC, Chory J, Hagenblad J,
Kreitman M, Maloof JN, Noyes T, Oefner PJ, et al. 2002. The extent of
linkage disequilibrium in Arabidopsis thaliana. Nat Genet.
30:190–193.
Nordborg M, Charlesworth B, Charlesworth D. 1996. The effect of recombination on background selection. Genet Res. 67:159–174.
Nosil P. 2012. Ecological speciation. Oxford: Oxford University Press.
Nosil P, Feder JL. 2012. Genomic divergence during speciation: causes
and consequences. Philos Trans R Soc Lond B Biol Sci. 367:332–342.
Nosil P, Harmon LJ, Seehausen O. 2009. Ecological explanations for (incomplete) speciation. Trends Ecol Evol. 24:145–156.
Nosil P, Vines TH, Funk DJ. 2005. Perspective: reproductive isolation
caused by natural selection against immigrants from divergent habitats. Evolution 59:705–719.
Olson-Manning CF, Wagner MR, Mitchell-Olds T. 2012. Adaptive evolution: evaluating empirical support for theoretical predictions. Nat
Rev Genet. 13:867–877.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D,
Maller J, Sklar P, de Bakker PI, Daly MJ, et al. 2007. PLINK: a toolset for
whole-genome association and population-based linkage analysis.
Am J Hum Genet. 81:559–575.
R Core Team. 2014. R: a language and environment for statistical computing. [cited 2014 Mar 23]. Available from: http://www.R-project.
org
Renaut S, Grassa CJ, Yeaman S, Moyers BT, Lai Z, Kane NC, Bowers JE,
Burke JM, Rieseberg LH, et al. 2013. Genomic islands of divergence
are not affected by geography of speciation in sunflowers. Nat
Commun. 4:1827.
Rockman MV, Skrovanek SS, Kruglyak L. 2010. Selection at linked sites
shapes heritable phenotypic variation in C. elegans. Science
330:372–376.
Rödelsperger C, Neher RA, Weller AM, Eberhardt G, Witte H, Mayer WE,
Dieterich C, Sommer RJ. 2014. Characterization of genetic diversity in
the nematode Pristionchus pacificus from population-scale resequencing data. Genetics 196:1153–1165.
Roesti M, Hendry AP, Salzburger W, Berner D. 2012. Genome divergence
during evolutionary diversification as revealed in replicate lakestream stickleback population pairs. Mol Ecol. 21:2852–2862.
Rundle HD, Nosil P. 2004. Ecological speciation. Ecol Lett. 8:336–352.
Sadier A, Viriot L, Pantalacci S, Laudet V. 2014. The ectodysplasin pathway: from diseases to adaptations. Trends Genet. 30:24–31.
Scheet P, Stephens M. 2006. A fast and flexible statistical model for largescale population genotype data: applications to infer missing genotypes and haplotypic phase. Am J Hum Genet. 78:629–644.
Schluter D. 2000. The ecology of adaptive radiation. Oxford: Oxford
University Press.
Schluter D. 2001. Ecology and the origin of species. Trends Ecol Evol.
16:372–380.
Schluter D, Conte GL. 2009. Genetics and ecological speciation. Proc Natl
Acad Sci U S A. 106:9955–9962.
Schmid KJ, Törjék O, Meyer R, Schmuths H, Hoffmann MH, Altmann T.
2006. Evidence for a large-scale population structure in Arabidopsis
thaliana from genome-wide single nucleotide polymorphism
markers. Theor Appl Genet. 112:1104–1114.
Seehausen O, Butlin RK, Keller I, Wagner CE, Boughman JW, Hohenlohe PA,
Peichel CL, Saetre G-P, Bank C, Brännström Å, et al. 2014. Genomics and
the origin of species. Nat Rev Genet. 15:176–192.
Slatkin M, Wiehe T. 1998. Genetic hitch-hiking in a subdivided population. Genet Res. 71:155–160.
Smith JM, Haigh J. 1974. The hitch-hiking effect of a favourable gene.
Genet Res. 23:23–35.
Sommer RJ. 2015. Pristionchus pacificus. A nematode model for comparative and evolutionary biology. The Netherlands: BRILL.
Sommer RJ, McGaughran A. 2013. The nematode Pristionchus pacificus
as a model system for integrative studies in evolutionary biology. Mol
Ecol. 22:2380–2393.
Soria-Carrasco V, Gompert Z, Comeault AA, Farkas TE, Parchman TL,
Johnston JS, Alex Buerkle C, Feder JL, Bast J, Schwander T, et al. 2014.
Stick insect genomes reveal natural selection’s role in parallel speciation. Science 344:738–742.
Strasberg D, Rouget M, Richardson D, Baret S, Dupont J, Cowling RM. 2005.
An assessment of habitat diversity and transformation on La Réunion
Island (Mascarene Islands, Indian Ocean) as a basis for identifying
broad-scale conservation priorities. Biodiver Conser. 14:3015–3032.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. 2013. MEGA6:
Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol.
30:2725–2729.
Via S. 2001. Sympatric speciation in animals: the ugly duckling grows up.
Trends Ecol Evol. 16:381–390.
Waples R. 1998. Separating the wheat from the chaff: patterns of genetic
differentiation in high gene flow species. J Hered. 89:438–450.
Weir BS, Cockerham CC. 1984. Estimating F-statistics for the analysis of
population structure. Evolution 38:1358–1370.