Genomic Profiles of Diversification and Genotype–Phenotype Association in Island Nematode Lineages

Angela McGaughran,*1,2,3 Christian Rödelsperger,3 Dominik G. Grimm,4,5,6 Jan M. Meyer,3 Eduardo Moreno,3 Katy Morgan,3 Mark Leaver,7 Vahan Serobyan,3 Barbara Rakitsch,4 Karsten M. Borgwardt,4,5,6 and Ralf J. Sommer3

1 CSIRO Land & Water, Black Mountain Laboratories, Canberra, ACT, Australia
2 School of BioSciences, University of Melbourne, Melbourne, VIC, Australia
3 Department for Evolutionary Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
4 Machine Learning and Computational Biology Research Group, Max Planck Institute for Intelligent Systems, Tübingen, Germany
5 Zentrum für Bioinformatik, Eberhard Karls Universität, Tübingen, Germany
6 Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
7 Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
*Corresponding author: E-mail: [email protected].
Associate editor: Beth Shapiro

© The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]
Mol. Biol. Evol. 33(9):2257–2272 doi:10.1093/molbev/msw093 Advance Access publication May 9, 2016
Article

Abstract
Understanding how new species form requires investigation of evolutionary forces that cause phenotypic and genotypic changes among populations. However, the mechanisms underlying speciation vary, and little is known about whether genomes diversify in the same ways in parallel at the incipient scale. We address this using the nematode Pristionchus pacificus, which resides at an interesting point on the speciation continuum (distinct evolutionary lineages without reproductive isolation) and inhabits heterogeneous environments subject to divergent environmental pressures. Using whole genome re-sequencing of 264 strains, we estimate FST to identify outlier regions of extraordinary differentiation (1.725 Mb of the 172.5-Mb genome). We find evidence for shared divergent genomic regions occurring at a higher frequency than expected by chance among populations of the same evolutionary lineage. We use allele frequency spectra to find that, among lineages, 53% of divergent regions are consistent with adaptive selection, whereas 24% and 23% of such regions suggest background selection and restricted gene flow, respectively. In contrast, among populations from the same lineage, similar proportions (34–48%) of divergent regions correspond to adaptive selection and restricted gene flow, whereas 13–22% suggest background selection. Because speciation often involves phenotypic and genomic divergence, we also evaluate phenotypic variation, focusing on pH tolerance, which we find is diverging in a manner corresponding to environmental differences among populations. Taking a genome-wide association approach, we functionally validate a significant genotype–phenotype association for this trait. Our results are consistent with P. pacificus undergoing heterogeneous genotypic and phenotypic diversification related to both evolutionary and environmental processes.

Key words: differentiation, diversification, evolution, FST, genome-wide association study, incipient speciation.

Introduction
Understanding how organisms diversify is a fundamental topic in biology and broadly relates to deciphering the changes in phenotype and genotype that occur among populations (Darwin 1859; Mayr 1963; Losos et al. 2013). Speciation stems from processes that drive phenotypic diversification, population divergence, and ultimately, the evolution of reproductive isolation (Endler 1986; Schluter and Conte 2009; Nosil 2012). However, the mechanisms underlying speciation can vary in time and space and can have different phenotypic and genetic signatures (Nosil 2012). For example, speciation can result from nondeterministic processes, such as genetic drift among isolated populations, or can be a product of natural selection operating divergently across environments (Schluter 2001; Via 2001; Rundle and Nosil 2004). Somewhat less recognized processes, such as immigrant inviability, whereby immigrants show reduced success upon reaching foreign environments that are ecologically divergent from their native habitat (sensu Nosil et al. 2005), may also contribute to reproductive isolation. Teasing apart the evolutionary processes that eventually promote reproductive isolation among populations can be challenging (e.g., Cruickshank and Hahn 2014). However, studying incipient species, which constitute intermediate stages of speciation, may provide insights into local adaptation and ecological speciation, especially in species with well-characterized biogeographic histories (Nosil and Feder 2012; Seehausen et al. 2014). Such insights often stem from focusing on patterns of divergence along the genome, and a pattern typical of ecological speciation is a relatively low background of genomic differentiation interspersed with regions of extraordinary differentiation (Hanikenne et al. 2013; Renaut et al. 2013; Sadier et al. 2014; Soria-Carrasco et al. 2014). However, so-called "islands of divergence" can result from multiple processes, including ecological adaptation, "divergence hitch-hiking" around selected variants, reduction of gene flow in "speciation islands", and variation in mutation and recombination rates (e.g., Burri et al. 2015; Feulner et al. 2015). Researchers are beginning to tease apart these various processes through comparative studies of genome-wide patterns of differentiation between populations at varying stages along the speciation continuum (i.e., from partially isolated races to fully isolated taxa; Feder et al. 2012).
Such studies have demonstrated the power of analyzing various genomic parameters, such as nucleotide diversity (π), Tajima's D (TD), relative and absolute divergence (FST and Dxy), and linkage disequilibrium (LD), including their partitioning across the genome and the relationships between them, to better elucidate the processes underlying genomic diversification and ecological speciation.

Pristionchus pacificus is an androdioecious (i.e., hermaphrodite and male-producing) nematode with a cosmopolitan distribution encompassing Africa, Asia, Europe, America, and the Mascarene Islands of the Indian Ocean (Herrmann et al. 2007, 2010). The species was originally used as a system for comparative studies with Caenorhabditis elegans focused on developmental biology, ecology, and population genetics (for review see Hong and Sommer 2006; Sommer and McGaughran 2013; Sommer 2015). P. pacificus now has a well-understood biogeographic history across the Indian Ocean islands (Morgan et al. 2012, 2014; McGaughran et al. 2013a, 2014), where it frequently lives in an inactive state on scarab beetles and, after the beetle's death, feeds on the microorganisms that decompose the carcass (Herrmann et al. 2007). Mitochondrial studies have revealed four lineages (A, B, C, and D), all of which were found on La Réunion (Herrmann et al. 2010), and three of which (A, C, and D) were found on neighboring Mauritius (Morgan et al. 2014). In contrast, both microsatellite analysis of Réunion and Mauritius strains (Morgan et al. 2012) and whole genome re-sequencing of 104 globally sampled strains indicated a more complex substructure within lineages, including the division of the mitochondrial A lineage into A1, A2, and A3 sub-lineages (Rödelsperger et al. 2014). Of all the genomic lineages, A2, B, C, and D are present across an array of heterogeneous environments on La Réunion Island, and A2, C, and D are also present on nearby Mauritius Island (Morgan et al. 2012; McGaughran et al.
2013a, 2014). As such, P. pacificus represents an excellent species for examining the role of both evolutionary and environmental gradients in contributing to the early stages of speciation. Previous work has demonstrated that the distinct lineages, even those in close geographic proximity, have different genomic profiles. Most (> 90%) within-lineage diversity is due to private (local) variation rather than to diversity shared in the common ancestral pool (Rödelsperger et al. 2014). This fits well with a model for Réunion and Mauritius whereby lineages independently colonized the islands from multiple source populations and thereafter continued to diverge in relative isolation (McGaughran et al. 2013a). Appreciable levels of genetic variation in P. pacificus are accompanied by high levels of phenotypic diversification among La Réunion Island populations (e.g., natural variation in dauer formation [Mayer and Sommer 2011], cold tolerance [McGaughran and Sommer 2014], chemosensation [McGaughran et al. 2013b], and oxygen-induced social behavior [Moreno et al. 2016]). The high degree of both genetic and phenotypic diversity among populations and lineages of P. pacificus, coupled with the fact that the different lineages can be crossed in the laboratory to produce fertile offspring (Sommer Lab, unpublished data), suggests that P. pacificus is at an early stage of speciation, corresponding to continuous variation without complete reproductive isolation (sensu Hendry et al. 2009). In addition, the island distribution of P. pacificus spans a range of ecological gradients and habitat heterogeneity (Strasberg et al. 2005). This species therefore offers an opportunity to understand the ecological relevance of genomic differentiation across a range of evolutionary divergence.

In this study, we characterize the genomic profile of diversification among P.
pacificus lineages from the same island, among lineages and populations from neighboring islands (La Réunion and Mauritius), and among populations of the same lineage within an island (La Réunion). We use a variety of population genomic parameter estimates (π, TD, FST, Dxy, and LD) to examine whether diversification is occurring at the same genomic sites within and across lineages and populations, and to explore the processes (e.g., divergent selection, linkage/recombination, gene flow) that best explain patterns of genomic divergence. In conjunction, we analyze phenotypic variation among populations from La Réunion Island to test whether an ecologically relevant phenotype, pH tolerance, is also diverging in environmental comparisons in a manner that may promote immigrant inviability. pH tolerance has not been analyzed in nematodes before, but pH varies considerably between different soil types on La Réunion, and this variation has the potential to exert strong selective pressure on nematodes, which spend a portion of their life cycle in soil. Finally, we link genotype with phenotype in a genome-wide association study (GWAS) framework, taking an identified genetic candidate through to functional validation for pH tolerance. Ultimately, by combining functional and biological levels of analysis in a system with older lineage pairs and younger population pairs that can still interbreed under laboratory conditions, we attempt to understand how genomes begin to diversify on their path to becoming separate species.

Results

Whole Genome Re-Sequencing
From our total P. pacificus collection, we performed whole genome re-sequencing on 264 strains from La Réunion Island and nearby Mauritius Island, each created from a single field-collected hermaphrodite (see Materials and Methods). Our strain selection covered the complete known mitochondrial and microsatellite diversity of P. pacificus on La Réunion (Morgan et al. 2012; McGaughran et al. 2014), as well as a variety of ecological and environmental habitat conditions (fig. 1A and table 1). Pooled individuals of each isogenic line were sequenced to a mean read depth of 17.4× (range: 5–58×; SD = 8.3). After quality filtering (see Materials and Methods), 6.8 million SNP positions (from a total genome size of 172.5 Mb and six chromosomes; Rödelsperger et al. 2014) were retained for population genomic analysis.

FIG. 1. Broad-scale genomic structure in Pristionchus pacificus. (A) Map showing the location of Mauritius and La Réunion Islands in the Indian Ocean; approximate positions of La Réunion Island strain collection locations are indicated in black. Original map data 2014 Google. (B) Neighbor-joining tree (based on p-distances) showing the genetic relationships among lineages (colored according to: A2, dark purple; B, light purple; C, light green; D, dark green), as well as fine-scale population structure among geographic locations for lineages B, C, and D; numbers on the figure represent pairwise genome-wide mean divergence (FST) within or between lineages.

Population Genetic Structure and Diversity of Lineages and Populations
Strain selection resulted in a collective 30 populations (27 from Réunion, 3 from Mauritius; supplementary tables S1 and S2, Supplementary Material online), corresponding to groups of strains that were collected at the same geographic location and are of the same genomic ancestry. We used all 264 strains to estimate baseline population genetic parameters for the most diverse data set possible. For all other genome-wide analyses, we accounted for sample size differences by randomly reducing each data set to create equal sample sizes among lineages (n = 17) and populations (n = 9 or 8) (see Materials and Methods and table 1).
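The random down-sampling to equal group sizes can be illustrated with a minimal Python sketch. It assumes each lineage or population is held as a list of strain identifiers and draws a fixed-size subset without replacement from a seeded generator; the function name `subsample_groups`, the seed, and the strain IDs are hypothetical, and the exact procedure used in the study is the one given in Materials and Methods, not this code.

```python
import random


def subsample_groups(groups, n, seed=0):
    """Randomly reduce every group of strain IDs to exactly n members (hypothetical helper).

    groups: dict mapping a group label (lineage or population) to a list of strain IDs.
    Groups smaller than n are rejected rather than padded.
    """
    rng = random.Random(seed)  # fixed seed so the reduced data set is reproducible
    reduced = {}
    for label, strains in groups.items():
        if len(strains) < n:
            raise ValueError(f"group {label!r} has only {len(strains)} strains (< {n})")
        reduced[label] = sorted(rng.sample(strains, n))  # sampling without replacement
    return reduced


# Toy example using the full-data lineage sizes for A2 and B (22 and 31 strains).
lineages = {"A2": [f"A2_{i}" for i in range(22)], "B": [f"B_{i}" for i in range(31)]}
print({label: len(ids) for label, ids in subsample_groups(lineages, 17).items()})
```

Sampling without replacement keeps every retained strain distinct, and fixing the seed makes the reduced data set reproducible across the downstream genome-wide analyses.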
These reduced data sets included 4 lineages (A2, B, C, and D), 11 Réunion populations (3, 6, and 2 from lineages B, C, and D, respectively), and 2 Mauritius populations (1 each from lineages C and D) (fig. 1A and table 1). The distribution of lineages across Réunion and Mauritius is presented in figure 1B, where several geographic locations can be seen to harbor genomes of differing ancestry. For example, the location GE harbors strains of lineage A2, C, and D ancestry, whereas the locations SS1, PL, TK, and BV all have strains corresponding to both lineages C and D (fig. 1B and table 1). This confirms the presence of co-habiting lineages on the island, and indeed, our sampling records show rare instances in which different lineages co-occur on the same collected beetle. However, the retained lineage structure at these locations suggests that isolation mechanisms may prevent hybridization between lineages within geographic locations. Using phylogenetic analysis, we identified fine-scale geographic structure among strains within lineages B, C, and D, providing support for allopatric divergence at the within-lineage level for several populations (fig. 1B; see also Morgan et al. 2012; McGaughran et al. 2013a, 2014). Mean genome-wide diversity (π) in 10-kb windows did not differ greatly among lineages and populations or between full and sub-sampled data sets (total range: 0.0011–0.0024; table 1). Mean Tajima's D, also measured in 10-kb windows, was consistently negative at a genome-wide scale only for lineage A2 (−0.4), where negative values are taken to indicate an abundance of rare alleles that could result from selection and/or population demographic expansion (table 1). Mean LD, measured as R2 in 10-kb windows, was highest (generally > 0.6) for Mauritius lineages and Réunion populations, and lowest for Réunion lineages (< 0.45).
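The windowed summaries reported in table 1 can be sketched for isogenic (effectively haploid) strains coded as 0/1 alleles. This is a minimal NumPy illustration under stated assumptions, not the pipeline used in the study: π is taken as the sample-size-corrected mean pairwise difference per SNP, summed over SNPs and divided by the full 10-kb window length (assuming every base is callable), and LD is taken as the mean pairwise r² (squared Pearson correlation of allele codes) among polymorphic SNPs in a window. The function names `window_pi` and `window_r2` are hypothetical.

```python
import numpy as np


def window_pi(genotypes, positions, window=10_000):
    """Per-site nucleotide diversity (pi) in nonoverlapping windows.

    genotypes: (strains x SNPs) 0/1 allele matrix; isogenic strains are treated as haploid.
    positions: SNP coordinates in the same order as the columns.
    Assumes every base in a window is callable, so the divisor is the full window length.
    Windows without SNPs are simply absent from the result.
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    p = g.mean(axis=0)
    site_pi = 2.0 * p * (1.0 - p) * n / (n - 1)  # unbiased pairwise difference per SNP
    starts = (np.asarray(positions) // window) * window
    return {int(s): float(site_pi[starts == s].sum() / window) for s in np.unique(starts)}


def window_r2(genotypes, positions, window=10_000):
    """Mean pairwise r^2 between polymorphic SNPs within each window."""
    g = np.asarray(genotypes, dtype=float)
    starts = (np.asarray(positions) // window) * window
    out = {}
    for s in np.unique(starts):
        cols = g[:, starts == s]
        cols = cols[:, cols.std(axis=0) > 0]  # r is undefined for monomorphic SNPs
        if cols.shape[1] < 2:
            continue  # need at least one SNP pair
        r = np.corrcoef(cols, rowvar=False)
        upper = np.triu_indices_from(r, k=1)
        out[int(s)] = float(np.mean(r[upper] ** 2))
    return out
```

Dividing by the window length rather than by the number of SNPs is what makes π comparable across windows with different SNP densities; with real data, the divisor would be the number of callable bases after filtering.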
Although these differences in LD likely reflect sample size differences across the various groups (table 1), LD estimates from the full and down-sized data sets were strongly correlated for the two populations tested (0.786 and 0.782; P < 0.001 in both cases). LD decay has been examined previously in P. pacificus and shown to occur over 20- to 300-kb distances, depending on the geographic sampling of lineages (Rödelsperger et al. 2014; Morgan et al., unpublished data). In other selfing species, such as Arabidopsis thaliana, LD is also high and decays over long genomic distances (Nordborg et al. 2002). In P. pacificus, such patterns are likely to be predominantly the result of its androdioecious life cycle, although other factors, such as spatial population structure and selective sweeps, may also be important drivers of these patterns (Schmid et al. 2006; Barrière and Félix 2007; Andersen et al. 2012).

Genomic Profiles of Differentiation among Sub-Sampled Lineages and Populations

Genomic Differentiation over a Continuum of Evolutionary Divergence
For all of the lineages and populations, pairwise genome-wide mean divergence (FST) ranged from 0.007 to 0.068.

Table 1. Mean Population Genetic Estimators for Lineages and Populations.
Lineage(a)  Population  n(b)   π(c)              Tajima's D(c)     LD(c)

Lineages: La Réunion and Mauritius (n = 264)
A2          –           22     0.0019 (0.0045)   0.4988 (1.6907)   0.2364 (0.3693)
B           –           31     0.0013 (0.0028)   0.2115 (1.7990)   0.2488 (0.3771)
C           –           135    0.0013 (0.0027)   0.2624 (1.8929)   0.1745 (0.3049)
D           –           76     0.0012 (0.0026)   0.2223 (1.9022)   0.2643 (0.3784)

Lineages: La Réunion (n = 242)
A2          –           18     0.0019 (0.0046)   0.4645 (1.6780)   0.2935 (0.4032)
B           –           31     0.0013 (0.0028)   0.2115 (1.7990)   0.2488 (0.3771)
C           –           125    0.0013 (0.0027)   0.2592 (1.8765)   0.1761 (0.3055)
D           –           68     0.0012 (0.0027)   0.2069 (1.8863)   0.2682 (0.3820)

Lineages: Mauritius (n = 22)
A2          –           4      0.0024 (0.0057)   0.1205 (1.1820)   0.5815 (0.4127)
C           –           10     0.0016 (0.0040)   0.7667 (1.3115)   0.6107 (0.4258)
D           –           8      0.0012 (0.0030)   0.6965 (1.2020)   0.6627 (0.4057)

Sub-sampled lineages: La Réunion (n = 68)
A2          –           17     0.0019 (0.0046)   0.4553 (1.6658)   0.3040 (0.4074)
B           –           17     0.0013 (0.0029)   0.1114 (1.6664)   0.3750 (0.4234)
C           –           17     0.0016 (0.0028)   0.2373 (1.5595)   0.4239 (0.4249)
D           –           17     0.0012 (0.0024)   0.2983 (1.5777)   0.4485 (0.4316)

Sub-sampled populations: La Réunion (n = 108)
A2          SB          9      0.0020 (0.0043)   0.3757 (1.4833)   0.4069 (0.4315)
B           CC          9      0.0014 (0.0034)   0.2532 (1.4330)   0.4917 (0.4359)
B           CK          9      0.0014 (0.0034)   0.5430 (1.3428)   0.5623 (0.4298)
B           NB          9      0.0014 (0.0028)   0.0273 (1.4710)   0.3917 (0.4244)
C           CO          9      0.0014 (0.0034)   0.8454 (1.2818)   0.6401 (0.4101)
C           GE          9      0.0015 (0.0030)   0.4934 (1.4348)   0.6204 (0.4271)
C           PA1         9      0.0016 (0.0039)   0.8562 (1.2559)   0.4860 (0.4268)
C           PC          9      0.0015 (0.0029)   0.3430 (1.3881)   0.6828 (0.4152)
C           SS1         9      0.0014 (0.0033)   0.8578 (1.2771)   0.6381 (0.4149)
C           TB          9      0.0013 (0.0030)   0.4845 (1.4143)   0.5789 (0.4374)
D           GE          9      0.0011 (0.0024)   0.4422 (1.3611)   0.6308 (0.4133)
D           GE(d)       8      0.0016 (0.0034)   0.8499 (1.2734)   0.6113 (0.4198)
D           PL          9      0.0011 (0.0024)   0.4597 (1.3138)   0.6672 (0.4043)
D           PL(d)       8      0.0016 (0.0034)   0.8378 (1.2347)   0.6366 (0.4133)

Sub-sampled populations: Mauritius (n = 17)
C           –           9      0.0017 (0.0041)   0.7627 (1.2760)   0.6372 (0.4197)
D           –           8      0.0012 (0.0030)   0.6965 (1.2020)   0.6627 (0.4057)
NOTE: (a) For each location, the total number of samples representing each genetic lineage is shown. (b) Sample size. (c) Calculated in 10-kb windows, with SD given in parentheses. Results were not quantitatively different for window sizes of 1 and 100 kb (see supplementary table S2, Supplementary Material online). (d) Sample size was reduced for these two populations in order to match the sample size of lineage D from Mauritius in analyses using pairwise comparisons.

These values highlight an increasing degree of differentiation as we move from populations located on the same island (e.g., La Réunion lineage B: 0.008; La Réunion lineage C: 0.007), to populations across La Réunion and Mauritius Islands (e.g., lineage C Mauritius vs. La Réunion: 0.019; lineage D Mauritius vs. La Réunion: 0.017), to lineages (mean: 0.068) (fig. 1B and supplementary table S3, Supplementary Material online). Spatial heterogeneity along the genome was analyzed between lineages and populations using a genome scan approach, averaging FST in 1-, 10-, and 100-kb nonoverlapping windows (fig. 2 and supplementary fig. S1 and table S4, Supplementary Material online), with 10-kb windows used in all subsequent analyses (see Materials and Methods). The shape of the distribution of FST values across the genome can be seen in supplementary figure S2 (Supplementary Material online). The marked right tail of these distributions aided the identification of outlier windows, that is, windows with FST significantly higher than the genome-wide average (up to 172.5 10-kb regions in the top 1% of the FST distribution, making up 1.725 Mb of the genome). Outlier windows were required both to fall in the top 1% of the empirical distribution and to be significantly differentiated relative to a random permutation approach using a false discovery rate of 0.01 (table 2).

Spearman's correlation analysis of FST along the genome was performed between lineage and population pairs.
FST was weakly, but significantly, correlated across lineages overall (mean rho = 0.348 ± 0.122 SD; P < 0.001) and across lineages on different islands (mean rho = 0.432 ± 0.038 SD; P < 0.001, for Mauritius vs. Réunion lineage C). These correlations were higher than those found for within-island populations (mean rho = 0.316 ± 0.020 SD, P < 0.001, and mean rho = 0.242 ± 0.089 SD, P < 0.001, for Réunion lineage B and C populations, respectively) (supplementary table S5, Supplementary Material online). Because genome-wide FST values are generally low, a significant portion of this correlation is likely due to noise; however, the overall mean rho value is quite high at > 0.3. These results may therefore indicate that similar population genetic processes (e.g., recombination landscapes and background selection) drive genome-wide divergence in different lineages and populations (see Rödelsperger et al. 2014). More generally, however (i.e., in 60–70% of cases), windows of low or high FST in one population/lineage pair do not coincide with windows of low or high FST in other pairs across a large proportion of the genome.

To determine whether this was also true for outlier FST windows, we investigated the degree of overlap among outlier FST windows for lineage and population comparisons using a permutation approach (table 2). Comparisons across lineages and across islands were not significant, indicating that divergent genomic regions are generally not shared across lineages at a rate higher than expected by chance. However, for population comparisons on La Réunion Island, we detected a total of 218 10-kb outlier windows for lineage B populations, of which 11 were shared across all 3 population pairs. This proportion is higher than expected by chance (10,000 permutations of random sampling gave on average five overlaps; one-tailed P = 0.025). The same was true for La Réunion lineage C populations, where we detected a total of 1,105 10-kb outlier windows across all 15 possible pairwise comparisons, 115 of which were shared in at least 2 of the 15 population pairs (84 expected overlaps in 10,000 permutations of random sampling; one-tailed P = 0.025).

Table 2. Differentiated Genomic Regions across Lineages and Populations.

Columns: No. = number of pairwise comparisons(a); Windows = total no. of 10-kb windows; Outliers = total no. of 10-kb outlier windows; Common = no. of common outlier windows(b); Shared = shared windows (%); All = no. of windows shared by all groups; Sig. = significance(c).

FST comparison                       No.   Windows  Outliers  Common  Shared  All  Sig.
Réunion lineages
  A2 vs. B, C, D                     3     19,238   181       6       3.31         S
  B vs. A2, C, D                     3     19,741   185       1       0.54         NS
  C vs. A2, B, D                     3     20,620   194       0       0.00         NS
  D vs. A2, B, C                     3     20,659   198       1       0.51         NS
  Total(d)                           6     40,129   398       10      2.51    0    NS
Réunion populations, lineage B
  CC vs. CK, NB                      2     14,661   145       11      7.59         S
  CK vs. CC, NB                      2     14,541   143       11      7.69         S
  NB vs. CC, CK                      2     14,698   144       11      7.64         S
  Total(d)                           3     21,950   218       11      5.05    11   S
Réunion populations, lineage C
  CO vs. GE, PA1, PC, SS1, TB        10    37,083   363       18      4.96         NS
  GE vs. CO, PA1, PC, SS1, TB        10    37,650   369       18      4.88         NS
  PA1 vs. CO, GE, PC, SS1, TB        10    37,045   357       12      3.36         NS
  PC vs. CO, GE, PA1, SS1, TB        10    37,125   362       13      3.59         NS
  SS1 vs. CO, GE, PA1, PC, TB        10    37,595   362       14      3.87         NS
  TB vs. CO, GE, PA1, PC, SS1        10    36,304   354       14      3.95         NS
  Total(d)                           15    111,401  1,105     115     10.41   0    S
Réunion populations, lineage D(e)
  GE vs. PL                          1     7,634    75                             –
Mauritius(e), lineage C
  MAU vs. CO, GE, PA1, PC, SS1, TB   15    42,395   406       23      5.67    0    NS
Mauritius(e), lineage D
  MAU vs. GE, PL                     2     14,315   136       14      10.29   14   –

NOTE: S, significant; NS, not significant. (a) Number of pairwise comparisons. (b) In at least two comparisons. (c) As determined with 10,000 random permutations (see Materials and Methods). (d) All values correspond to the total number of pairwise comparisons unique to the whole group. (e) Significance for Réunion lineage D (GE vs. PL) and Mauritius lineage D (MAU vs. GE, PL) is not considered, as the number of comparisons is too low.
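The genome scan and the overlap test described above can be outlined in one NumPy sketch. Because this section does not name the FST estimator, the sketch assumes Hudson's estimator in its ratio-of-averages form (per-SNP numerators and denominators summed within each 10-kb window), applied to 0/1 allele codes from isogenic strains. It then takes the empirical top 1% of windows (the permutation-based false-discovery-rate filter is not reproduced) and asks whether outlier windows are shared more often than among randomly drawn window sets of the same sizes. Windows are assumed to be indexed 0 to n_windows − 1, the one-tailed P value uses a +1 correction that may differ from the original analysis, and all function names are hypothetical.

```python
import numpy as np


def hudson_fst_windows(g1, g2, positions, window=10_000):
    """Windowed Hudson FST (ratio of averages) between two groups of isogenic strains.

    g1, g2: 0/1 allele matrices (strains x SNPs) with the same SNP columns.
    Returns {window_index: FST}, where window_index = position // window.
    """
    g1, g2 = np.asarray(g1, dtype=float), np.asarray(g2, dtype=float)
    n1, n2 = g1.shape[0], g2.shape[0]
    p1, p2 = g1.mean(axis=0), g2.mean(axis=0)
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    idx = np.asarray(positions) // window
    fst = {}
    for w in np.unique(idx):
        sel = idx == w
        if den[sel].sum() > 0:  # skip windows monomorphic across both groups
            fst[int(w)] = float(num[sel].sum() / den[sel].sum())
    return fst


def top_fraction_outliers(fst_by_window, fraction=0.01):
    """Window indices whose FST lies in the top `fraction` of the empirical distribution."""
    values = np.fromiter(fst_by_window.values(), dtype=float)
    cutoff = np.quantile(values, 1.0 - fraction)
    return {w for w, v in fst_by_window.items() if v >= cutoff}


def count_shared(window_sets, n_windows, min_shared):
    """Number of windows present in at least `min_shared` of the outlier sets."""
    all_hits = np.concatenate([np.asarray(list(s), dtype=np.int64) for s in window_sets])
    hits_per_window = np.bincount(all_hits, minlength=n_windows)  # each set lists a window once
    return int(np.sum(hits_per_window >= min_shared))


def overlap_permutation_test(window_sets, n_windows, min_shared, n_perm=10_000, seed=0):
    """Compare the observed overlap of outlier sets with overlap among random sets of equal size."""
    rng = np.random.default_rng(seed)
    observed = count_shared(window_sets, n_windows, min_shared)
    sizes = [len(s) for s in window_sets]
    null = np.empty(n_perm)
    for i in range(n_perm):
        random_sets = [rng.choice(n_windows, size=k, replace=False) for k in sizes]
        null[i] = count_shared(random_sets, n_windows, min_shared)
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)  # one-tailed, +1 corrected
    return observed, float(null.mean()), float(p_value)
```

For the lineage B case, one would pass the three pairwise outlier sets with `min_shared=3` (shared by all pairs); for lineage C, the 15 pairwise sets with `min_shared=2` (shared in at least two pairs).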
Overall, we found a pattern in which genomic regions of extraordinary differentiation were more often shared among recently diverged (allopatric) populations than among historically diverged lineages (table 2).

Relative versus Absolute Genomic Differentiation
To complement FST analysis, measurements of absolute divergence, such as Dxy, are now recommended (e.g., Nachman and Payseur 2012; Cruickshank and Hahn 2014). Thus, to examine the degree of absolute genomic differentiation, we compared Dxy in outlier versus nonoutlier windows for each lineage and population. We found that median Dxy was consistently lower in outlier windows than in nonoutlier windows (after Bonferroni correction, Wilcoxon test P > 0.003 in 12/17 tests; supplementary fig. S3, Supplementary Material online). Because absolute divergence measurements are considered to be unreliable for young populations and/or populations that are not at equilibrium due to ongoing differentiation processes (Nachman and Payseur 2012; Cruickshank and Hahn 2014), we further examined genomic regions of extraordinary differentiation by assessing selective sweep signatures using TD (see next section).

FIG. 2. FST profiles and highly differentiated SNPs. Sliding-window pairwise FST plotted for comparisons involving lineage B, from top to bottom: CC versus CK; CC versus NB; CK versus NB. In each panel, the x-axis corresponds to the chromosomal location, whereas the y-axis represents FST, and the top 1% of divergent regions is indicated in green. See supplementary figure S1 (Supplementary Material online) for additional lineage/population FST comparisons.

FIG. 3. Divergence patterns consistent with selection or drift. Results of analysis exploring whether Tajima's D (TD) patterns in outlier windows relative to the genome-wide TD baseline are driven by selection or drift for all lineages ("Lineages"), for lineages across islands ("Mauritius_C" for lineage C La Réunion vs. Mauritius, "Mauritius_D" for lineage D La Réunion vs. Mauritius), and for populations within lineages on La Réunion Island ("Réunion_B", "Réunion_C", and "Réunion_D" for Réunion lineage B, C, and D populations, respectively).

Selection and LD in Outlier Windows
In order to quantify the relative contribution of different mechanisms (e.g., recombination, background selection, adaptive selection, restricted gene flow) shaping the genomics of speciation, we first investigated fine-scale linkage patterns and their effects on genomic heterogeneity. For each lineage and population, we estimated LD (R2) in 10-kb windows along the genome and then tested whether median R2 differed between outlier and nonoutlier windows. A positive correlation between FST and R2 potentially indicates either a local reduction in gene flow mediated by divergent selection or selection with hitch-hiking of linked neutral sites (Keinan and Reich 2010; Nachman and Payseur 2012; Feulner et al. 2015). We found a slight, but consistent, increase of median R2 in outlier versus nonoutlier windows, potentially indicating a local reduction in gene flow mediated by either divergent selection or selection with hitch-hiking (supplementary fig. S4, Supplementary Material online). However, individual comparisons were, for the most part, not significantly different (after Bonferroni correction, Wilcoxon test P > 0.003 in 14/17 tests). Next, we explored whether observed divergence patterns are driven by selection or drift, based on analysis of TD in outlier windows relative to the genome-wide TD average. TD is a statistic that compares the average number of pairwise differences in a sample with the number of segregating sites. We expect positive selection to give a negative TD in the absence of demographic effects, and a positive TD is expected in the case of balancing selection.
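Tajima's D for a single window can be computed from allele counts with Tajima's standard normalizing constants, treating isogenic strains as haploid samples coded 0/1. The sketch below is a minimal illustration, not the study's code. It also includes a hypothetical helper mirroring the three-way partition of outlier windows (background selection, adaptive selection, restricted gene flow) described in the following paragraphs; because the significance criterion for "reduced" TD is not given here, it substitutes an assumed fixed tolerance (`margin`) around each population's genome-wide mean, and windows with TD elevated above the mean are left unclassified.

```python
import math

import numpy as np


def tajimas_d(genotypes):
    """Tajima's D for one window from a (strains x sites) 0/1 allele matrix.

    Isogenic strains are treated as haploid samples, so n is the number of strains.
    Returns NaN when the window has no segregating sites.
    """
    g = np.asarray(genotypes, dtype=int)
    n = g.shape[0]
    k = g.sum(axis=0)                # count of allele "1" at each site
    seg = (k > 0) & (k < n)          # segregating sites only
    S = int(seg.sum())
    if S == 0:
        return float("nan")
    # pi: mean number of pairwise differences, summed over segregating sites
    pi = float(np.sum(2.0 * k[seg] * (n - k[seg]) / (n * (n - 1))))
    i = np.arange(1, n)
    a1 = float(np.sum(1.0 / i))
    a2 = float(np.sum(1.0 / i**2))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    # Difference between pi and Watterson's S / a1, scaled by its expected SD
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


def classify_outlier_window(td1, td2, genome_mean1, genome_mean2, margin):
    """Hypothetical three-way partition of an outlier window for a population pair.

    'Reduced' TD is approximated as falling more than `margin` below a population's
    genome-wide mean; the original significance criterion is not reproduced here.
    """
    reduced = [td1 < genome_mean1 - margin, td2 < genome_mean2 - margin]
    near_mean = [abs(td1 - genome_mean1) <= margin, abs(td2 - genome_mean2) <= margin]
    if all(reduced):
        return "background selection"
    if any(reduced):
        return "adaptive selection"
    if all(near_mean):
        return "restricted gene flow"
    return "unclassified"  # e.g., TD elevated above the genome-wide mean
```

A window dominated by singletons (the signature expected after a sweep) gives a negative D, whereas one dominated by intermediate-frequency variants gives a positive D, matching the expectations stated above for positive and balancing selection.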
We use TD here to classify the evolutionary process resulting in a given pattern of differentiation, based on the premise that regions that are differentiated as a result of a local restriction of gene flow should show a local signature of neutral evolution (i.e., no skew in the allele frequency spectrum; Nachman and Payseur 2012). Meanwhile, divergent regions resulting from selection with hitch-hiking at linked sites should show a characteristic skew of the spectrum, resulting in a negative TD (indicative of an excess of rare alleles in a population following a selective sweep, or of background selection in the case where both populations in a pair show such a pattern; Smith and Haigh 1974; Charlesworth et al. 1993). Thus, following Feulner et al. (2015), we used shifts in the allele frequency spectrum, calculated as TD across the genome, to partition outlier windows from lineage and population comparisons into three mutually exclusive categories by contrasting outlier TD values with the genome-wide mean TD: (1) TD reduced relative to the genome-wide mean in both populations of a pair, consistent with background selection; (2) TD reduced relative to the genome-wide mean in one of the two populations of a pair, consistent with adaptive/positive selection; and (3) neutral TD patterns, where TD in both populations of a pair was not significantly different from the genome-wide mean, consistent with restricted gene flow resulting from neutral (e.g., genetic drift) processes. Note that outlier windows make up a small percentage of the total genome (e.g., 190 10-kb windows for lineages and 75–360 10-kb windows for populations; table 2); these TD analyses therefore refer to only very small genomic regions. We found that 53% of windows for lineage comparisons corresponded to adaptive selection, 24% to background selection, and 23% to restricted gene flow. For population comparisons across islands, 33–35%, 7–8%, and 57–60% of outlier windows corresponded to adaptive selection, background selection, and restricted gene flow, respectively. For Réunion populations, these values were 34–48%, 13–22%, and 40–50% (fig. 3 and supplementary table S6, Supplementary Material online). Thus, lineages appear to be most strongly subject to adaptive selection, populations across islands most subject to restricted gene flow, and populations within Réunion similarly subject to adaptive selection and restricted gene flow.

FIG. 4. Significant genotype–phenotype associations. Manhattan plot showing 175 significant SNPs (in purple) detected in easyGWAS for pH tolerance. The SNP for the Ppa-nhx contig20-snap.250 gene is highlighted with an arrow, whereas the four genomic regions in which the 175 SNPs lie are outlined in black boxes. Mean pairwise LD (R2) between all significant SNPs in these regions, from left to right in the figure, is 0.79, 0.57, 0.55, and 0.01.

Genes in Outlier Windows
Names for all genes found within common outlier windows for lineage and population comparisons (based on homology searches using BLAST against C. elegans) can be found in supplementary table S7 (Supplementary Material online).
In lineage comparisons, genes in overlapping outlier windows were overrepresented for functions involved in positive regulation of vulval development (GO:0040026; P = 0.001), muscle myosin thick filament assembly (GO:0030241; P = 0.008), apical protein localization (GO:0045176; P = 0.009), and mRNA processing (GO:0006397; P = 0.009), whereas those common to populations were overrepresented for functions involved in mRNA stabilization (GO:0048255; P = 0.001 and P = 0.003 for lineages B and C, respectively), sensory perception of taste (GO:0050909; P = 0.001; lineage B populations), regulation of cell proliferation (GO:0042127; P = 0.002; lineage C populations), and detection of temperature stimulus (GO:0016048; P = 0.003; lineage D populations).

Phenotypic Profiles of Differentiation among Lineages and Populations

Natural Variation in Phenotype
We screened a subset of 130 strains (lineages A2, B, C, and D; supplementary table S2, Supplementary Material online) for variation in their tolerance to a pH of 5 (see Materials and Methods). We first examined the environmental distribution of this trait and found no evidence for spatial autocorrelation in environmental (soil) pH (Moran's I, P = 0.870). However, both soil pH and pH tolerance among nematodes varied significantly with geographic location (Kruskal–Wallis rank sum tests: χ2 = 130, df = 10, P < 0.001 for soil pH; χ2 = 27.07, df = 10, P = 0.003 for pH tolerance) (supplementary fig. S5, Supplementary Material online). Variation in pH tolerance among the tested strains was significantly correlated with local soil pH (Spearman's rank correlation rho = 0.254; P = 0.003), and this correlation was only slightly reduced when the individual with the highest mortality in our pH assays (i.e., a potential outlier) was removed from the analysis (Spearman's rank correlation rho = 0.236; P = 0.007).
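The two nonparametric tests reported above can be reproduced in outline with SciPy's `kruskal` and `spearmanr`. The sketch assumes one record per strain holding its collection location, local soil pH, and mortality (%) in the pH 5 assay; the record layout, the function name `ph_tolerance_tests`, and the toy values are hypothetical. It also repeats the correlation after dropping the single strain with the highest mortality, mirroring the robustness check described above.

```python
from collections import defaultdict

from scipy.stats import kruskal, spearmanr


def ph_tolerance_tests(records):
    """records: iterable of (location, soil_ph, mortality_percent), one per strain."""
    records = list(records)
    by_location = defaultdict(list)
    for location, _, mortality in records:
        by_location[location].append(mortality)
    # Kruskal-Wallis: does mortality differ among collection locations?
    h_stat, p_location = kruskal(*by_location.values())
    soil = [r[1] for r in records]
    mortality = [r[2] for r in records]
    rho, p_rho = spearmanr(soil, mortality)
    # Robustness check: drop the strain with the highest mortality and recompute.
    worst = max(range(len(records)), key=lambda i: records[i][2])
    trimmed = records[:worst] + records[worst + 1:]
    rho_trim, p_trim = spearmanr([r[1] for r in trimmed], [r[2] for r in trimmed])
    return {
        "kruskal_H": float(h_stat),
        "kruskal_df": len(by_location) - 1,
        "kruskal_p": float(p_location),
        "spearman_rho": float(rho),
        "spearman_p": float(p_rho),
        "spearman_rho_trimmed": float(rho_trim),
        "spearman_p_trimmed": float(p_trim),
    }
```

The Kruskal–Wallis degrees of freedom equal the number of locations minus one, so the df = 10 reported above corresponds to 11 collection locations; the same call applied to soil pH values instead of mortality would give the soil pH test.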
Associations between Genotype and Phenotype To test whether the observed phenotypic variance in pH tolerance could be linked to genotypic variance, we used easyGWAS (Grimm et al. 2012), an integrated interspecies platform for GWAS. We used the EMMAX algorithm (Kang et al. 2010) in easyGWAS to perform genome-wide association mapping while accounting for confounding by population structure. We identified a total of 175 significant GWAS hits, which fell in four genomic regions (fig. 4). We calculated all pairwise LD values for these four genomic regions (see fig. 4), and all hits were extracted and used to identify potential candidate loci underlying pH tolerance in P. pacificus (below).

FIG. 5. Functional follow-up of GWAS candidate Pristionchus pacificus nhx gene. (A) Mortality (%) results of pH assays performed on RS2333 (wild-type phenotype), RSC021 (mortality phenotype), three independent transgenic RSC021 lines injected with a copy of the Ppa-nhx contig20-snap.250 gene from RS2333 (Lines A, B, and C), one independent RSC021 line injected with a copy of the Ppa-nhx contig20-snap.250 gene from RSC021 (Line D), and one independent RSC021 line injected with a construct lacking the Ppa-nhx contig20-snap.250 gene (Line E). Rescue of the phenotype is seen in the transgenic lines, which have significantly reduced mortality compared with the wild-type RSC021 strain ("***" indicates P < 0.0001; "**" indicates P < 0.001). In the case of Line A, P = 0.0132, which was not significant after Bonferroni correction. Inset, top-right: dumpy-like phenotype seen in some RSC021 individuals after 24-h incubation in a pH 5 solution. (B) Gene structure of the Ppa-nhx contig20-snap.250 gene, determined via laboratory RACE experiments. This construct (running from SalI cutting points on the left and right of the gene) was used for micro-injection experiments; the splice leader position ("SL1") is shown at the left of the gene structure, whereas the original SNP identified in easyGWAS is shown to the right; 3′- and 5′-UTR, exons and introns, and the genomic region (from 4,000 to 13,000 bp) of Ppa-nhx contig20-snap.250 are indicated by the key at the top of the panel. (C) Regional association and linkage disequilibrium plot for the significantly associated focal easyGWAS SNP at ChrI:21,179,317. This SNP (in magenta) is located in a homologue of the Caenorhabditis elegans nhx-9 gene. The linkage disequilibrium structure is highlighted in different colors, where red indicates strong LD and blue indicates weak or no LD. Below the zoomed-in regional Manhattan plot, the minor allele frequency (MAF) for each SNP is shown, as well as gene annotations for this region. Note that all available SNPs with a MAF >10% were used to generate this LD plot.

Functional Analysis of a GWAS-Derived pH Gene Candidate Among the candidate loci identified in easyGWAS was a homologue of the C. elegans nhx family (supplementary fig. S6 and table S8, Supplementary Material online). One member of this family, nhx-9, is known to encode a sodium/proton exchanger expressed intracellularly in C. elegans (Nehrke and Melvin 2002). Involved in the regulation of pH, NHX proteins are thought to prevent intracellular acidification by catalyzing the exchange of vesicular sodium for an intracellular proton (Nehrke and Melvin 2002). A comparison of genomic variation among the 130 strains in our data set (and the laboratory reference strain, RS2333) found 98 polymorphisms in the 10-kb genomic region encompassing the predicted P.
pacificus nhx gene (contig20-snap.250). In survival assays, we found that most strains had very low mortality when exposed to pH 5, whereas one strain, RSC021, had significantly higher mortality (T129 = 16.11, P < 0.001; 95% CI: 14.25–18.77). In each case, RSC021 individuals that died during the assay had a dumpy-like phenotype (fig. 5A) that was not present in any of the other tested strains. After determining the exact gene structure of Ppa-nhx contig20-snap.250 (fig. 5B), we generated transgenic RSC021 animals carrying extra copies of Ppa-nhx contig20-snap.250 from the laboratory reference strain, RS2333, which showed 0% mortality in our assays. These transgenic animals were successfully rescued, showing significantly reduced, or no, mortality in our pH assay (P = 0.0132, 0.0013, and 0.0013 for transgenic RSC021 lines A, B, and C, respectively; light blue bars in fig. 5A). In contrast, control transgenic lines carrying only the Ppa-egl-20::rfp reporter were not rescued (grey bar in fig. 5A). To determine whether over-expression of the Ppa-nhx contig20-snap.250 gene from the RSC021 strain itself could also rescue the phenotype, we created a similar construct of Ppa-nhx contig20-snap.250 from RSC021. In these transgenic animals, we observed partial rescue (green bar; fig. 5A), suggesting that both genomic variation within the locus (i.e., the 98 polymorphisms noted above) and Ppa-nhx contig20-snap.250 expression differences between RS2333 and RSC021 underlie the pH phenotype. Finally, we examined local linkage in the region of contig20-snap.250, plotting R² between the GWAS focal SNP and all other SNPs in the genomic region of contig20-snap.250 (fig. 5C). We examined genomic FST, π, and TD patterns around Ppa-nhx contig20-snap.250 to see whether we could find evidence for a selective sweep (i.e., high FST, low π, and low, negative TD; Olson-Manning et al. 2012). For this analysis, we compared populations from CO and SS1.
Soil pH was low at CO and high at SS1, whereas tolerance to pH 5 was high at CO and low at SS1 (i.e., nematodes collected from a low-pH soil environment, CO, showed high tolerance to pH 5 in assays, whereas those collected from a higher-pH environment, SS1, showed lower tolerance). We found that genomic FST between the two populations was low, whereas π was reduced in both populations; TD was low and negative for CO but neutral (close to 0) for SS1, which may indicate adaptive selection operating in this region in the CO population (supplementary fig. S7, Supplementary Material online). We also examined FST among Réunion populations at the Ppa-nhx contig20-snap.250 locus to see whether FST was higher for sites characterized by lower-pH soils than for sites with higher-pH soils (supplementary fig. S5, Supplementary Material online). Instead, we found that FST was lowest among populations from the same lineage, and this overrode any potential pH-trait-driven effects on FST at the Ppa-nhx contig20-snap.250 locus (supplementary table S9, Supplementary Material online). Discussion Heterogeneous Genomic Differentiation among Lineages and Populations Genetic divergence can be influenced by several factors, including gene flow, drift, mutation, recombination, and natural selection. Over evolutionary time scales, the action of these factors at the micro-scale can eventually result in changes at the macro-scale that promote speciation. Here, we focused on delineating patterns of genomic divergence across several points of the speciation continuum in the nematode P. pacificus. FST analysis revealed that highly differentiated genomic regions were rarely shared across pairwise comparisons involving evolutionary lineages, suggesting that these regions of the P. pacificus genome are evolving independently among isolated lineages.
This is in line with related studies in other organisms, in which patterns of genomic divergence have been shown to be widely dispersed across the genome (e.g., threespine stickleback: Hohenlohe et al. 2010; Deagle et al. 2012; Jones et al. 2012; Anopheles gambiae: Lawniczak et al. 2010). Such patterns may not be particularly surprising, given that the different P. pacificus lineages are, for the most part, associated with different beetle species living in different ecological/geographic environments. However, for more recently diverged, allopatric populations (which often share beetle host species), we found that genomic regions of extraordinary differentiation were shared more often than expected by chance. These genomic regions may therefore contain loci important in incipient speciation processes. Indeed, our functional enrichment analysis found an over-representation of genes involved in environmental sensation (e.g., sensory perception of taste, detection of temperature stimulus) in outlier windows in population comparisons. Heterogeneity of genomic divergence may be due to several factors, and patterns of FST divergence in particular may vary in a manner that is independent of local adaptation or speciation (Cruickshank and Hahn 2014). For example, shared recombination and mutation profiles among populations, background selection, hitch-hiking, migration, and recent population splitting events can all result in shared within-population polymorphisms that reduce local diversity and lead to between-population differentiation, and FST divergence may simply result from the stochasticity of neutral, but convergent, genetic drift (Kaplan et al. 1989; Nordborg et al. 1996; Slatkin and Wiehe 1998; Nosil and Feder 2012; Cruickshank and Hahn 2014; Seehausen et al. 2014). Recent studies reporting heterogeneous landscapes of differentiation have provided important insights into the genomic profiles underlying adaptive divergence.
By using various genomic parameters, and the relationships between them, these studies have helped elucidate the heterogeneous processes underlying genomic diversification and ecological speciation (e.g., Nosil et al. 2009; Lawniczak et al. 2010; Roesti et al. 2012; Feulner et al. 2015). Here, we used predictions about the behavior of genomic diversity, Tajima's D, relative and absolute divergence, and linkage/recombination to provide a basis for understanding the processes underlying the FST patterns we detected (sensu Feulner et al. 2015). We found a slight but consistent increase in median R² in outlier versus nonoutlier genomic windows. Such a pattern may suggest either a local reduction in gene flow mediated by divergent selection, or selection with hitch-hiking of linked neutral sites (Keinan and Reich 2010; Nachman and Payseur 2012; Feulner et al. 2015). The former is a particularly intriguing possibility that, as Cruickshank and Hahn (2014) note, is often neglected as an explanation for genomic islands of divergence. We also found that median Dxy was consistently lower in outlier windows than in nonoutlier windows. This is consistent with genomic comparisons of closely related species in other taxa, where regions elevated for measures of relative divergence such as FST generally have not also shown high Dxy values relative to the genome-wide average (Cruickshank and Hahn 2014). The discrepancy between relative and absolute measures of divergence has led to debate about the validity of interpreting patterns of relative genomic divergence as evidence for speciation with gene flow, because islands of high relative but not absolute divergence can also be driven by background selection in isolated species or populations, which reduces within-population diversity and thereby inflates FST without increasing Dxy (Cruickshank and Hahn 2014). In P. pacificus (as for C. elegans; Cutter and Choi 2010; Rockman et al. 2010; Andersen et al.
2012), the balance between recombination and mutation is highly influenced by a predominantly self-fertilizing reproductive mode, and previous work has shown that background selection has been an important factor shaping genomic diversity (Rödelsperger et al. 2014). We sought further clarification of these issues by examining allele frequency spectra using Tajima's D, and found that, as for the degree of shared divergent genomic regions (above), TD patterns were heterogeneous between lineages and populations. Specifically, 53% of divergent regions for lineage comparisons corresponded to adaptive selection, 24% to background selection, and 23% to restricted gene flow. For populations across islands, these values were 33–35%, 7–8%, and 57–60%, respectively, and for Réunion populations they were 34–48%, 13–22%, and 40–50%. Thus, lineages appeared to be most highly subject to adaptive selection, whereas populations across islands were most subject to restricted gene flow, and populations within Réunion Island were similarly subject to adaptive selection and restricted gene flow. For comparison, in sticklebacks, 22–55% of the top 1% of divergent regions were shown to be consistent with a local reduction in gene flow, whereas 25–75% of such regions were shaped by hitch-hiking effects around selected variants (Feulner et al. 2015). Conversely, in flycatchers, heterogeneity of genomic differentiation was shown to be largely due to background selection and selective sweeps in genomic regions of low recombination (Burri et al. 2015). Genomic Architecture of Phenotypic Differentiation as a Precursor to Ecological Speciation? According to classic ecological theory, populations diverge for specific phenotypes and genotypes that influence survival and reproduction when exposed to different environments (Mayr 1963; Schluter 2000). This process is facilitated when immigrants show reduced success upon reaching a foreign environment that is ecologically divergent from their native habitat (Nosil et al. 2005).
Through acting on phenotypes to eventually change genotypes over time, environmental mechanisms thus play an important role in the evolution of reproductive isolation among populations and, ultimately, in the formation of new species. In trait analyses, we found evidence for pH tolerance evolving in association with local environmental (soil) pH. This complements previous work showing that La Réunion populations are undergoing phenotypic divergence in an array of ecologically relevant traits (Mayer and Sommer 2011; McGaughran et al. 2013b; McGaughran and Sommer 2014; Moreno et al. 2016). Such phenotypic differentiation may be driven by divergent selective effects among geographic regions (e.g., Kawecki and Ebert 2004), and we have shown elsewhere that local environmental variables explain a significant proportion of divergence in P. pacificus (McGaughran et al. 2014). Here, we examined the genomic profile in the region surrounding a pH gene candidate identified by GWAS, and found putative support for adaptive selection operating at this locus (i.e., Tajima's D was low and negative in the population where soil pH was lowest but pH tolerance was highest). However, FST at the pH candidate locus was not higher for populations at locations characterized by low-pH soils than for those with high-pH soils; instead, FST was lowest among populations from the same lineage, and this overrode any potential pH-trait-driven effects. This suggests that both ecological genomics and forward genetics studies focused on a priori candidate loci may benefit from a holistic approach that incorporates genomic analysis of the regions upstream and downstream of the candidate gene. Conclusions Speciation is a process that varies continuously, through quantitative variation in the degree of phenotypic divergence and the completeness of reproductive isolation, and in the profile of highly differentiating genomic regions.
Despite years of research, we still understand very little about the consequences of genomic divergence for speciation. Yet the identification of differentiated genomic regions and of the genes involved in local adaptation and ecological diversification represents a crucial first step, one that is enhanced by systems providing access to multiple comparisons across the scale of evolutionary divergence. Indeed, P. pacificus, which resides at an interesting point on the speciation continuum, inhabits heterogeneous environments with diverse environmental pressures, and has a genomic toolkit that allows functional follow-up of interesting candidate genes, is a useful system for studying incipient speciation. Further integration of ecological and functional genomic studies will enable the establishment of direct links between patterns of genomic divergence and speciation in this and other key species. Materials and Methods Sampling, Data Processing, and Validation Sampling A total of 264 P. pacificus strains (i.e., genetically identical individuals of a thawed isogenic line that was created via the inbreeding of a single hermaphroditic individual over 10 generations), including 39 from Rödelsperger et al. (2014), were selected from our collections based on their availability, geographic origin of collection, and evolutionary lineage based on our previous molecular analysis of mitochondrial and microsatellite data (Herrmann et al. 2010; Morgan et al. 2014). This resulted in a total of 30 "populations" (27 from Réunion, 3 from Mauritius; supplementary table S1, Supplementary Material online), corresponding to groups of strains that were collected at the same geographic location and are of the same genomic ancestry.
While efforts were made to select a well-balanced data set in terms of sample size across lineages and geographic locations, we recognize that some sample sizes were small and uneven (supplementary table S1, Supplementary Material online). This largely results from the fact that, although our collections span more than 10 years of effort by a large team, they are nevertheless reliant on successful beetle catches, and beetle distribution and collecting success are generally unpredictable from year to year. In addition, nematode infestation rates vary among beetle species. Thus, whereas we used the full data set to initially estimate baseline population genetic parameters for the most diverse data set possible, for all other genome-wide analyses we accounted for natural differences in sample size among lineages and localities by randomly reducing each data set to create equal sample sizes among lineages (n = 17) and populations (n = 8 or 9). We performed several quality control checks on the population genetic parameters calculated from our data sets to ensure accuracy after random sub-sampling (see below). Overall, these reduced data sets included 4 lineages (A2, B, C, and D); 11 Réunion populations (3, 6, and 2 from lineages B, C, and D, respectively); and 2 Mauritius populations (1 each from lineages C and D), and it is from these data sets that final population pairs were analyzed (see below) (table 1). Genomic DNA Preparation Genomic DNA was prepared for all strains following the protocol outlined in Rödelsperger et al. (2014). In brief, DNA was extracted from pooled individuals of each isogenic line using the MasterPure DNA purification kit from Epicentre (Biozym Scientific GmbH, Hessisch Oldendorf, Germany), and genomic libraries were generated using the TruSeq DNA Sample Preparation Kit ver. 2 from Illumina (Illumina Inc., CA).
DNA was sheared using the Covaris S2 System (Covaris Ltd., Woodingdean Brighton, United Kingdom), and end repair, adenylation, and adaptor ligation were performed following the kit protocol. Libraries were run on a 2% agarose gel, and gel slices spanning 400–500 bp were extracted to give a final insert size of 300–400 bp. After PCR amplification, libraries were validated on an Agilent Bioanalyzer DNA 1000 chip (Agilent Technologies GmbH, Waldbronn, Germany) and diluted before sequencing on an Illumina Genome Analyzer II platform. Alignment, Variant Calling, and Validation For all Illumina sequence read data, bases in the first 36 bp of raw reads with a quality <20 (error probability = 1%) were masked, and reads were trimmed at the first occurrence of a low-quality (<20) base in the rest of the read. Quality-filtered paired-end reads were aligned to the P. pacificus genome assembly (ver. Hybrid1), which spans a total length of 172.5 Mb and six chromosomes (Rödelsperger et al. 2014), using stampy ver. 1.0.20 (Lunter and Goodson 2011). Duplicate reads were removed, and the remaining reads were locally realigned using GATK ver. 2.1.13 (McKenna et al. 2010). SNPs were called using samtools ver. 0.1.18 (Li and Durbin 2009). The accuracy of variant calls under the same pipelines applied here was analyzed previously with Sanger sequencing data, giving an estimated variant call accuracy of 98%; see Rödelsperger et al. (2014) for further details. From our total sequencing data set of 264 strains, 6,867,575 SNP positions were retrieved with respect to the RS2333 California reference (Hybrid1 assembly) after applying a minor allele frequency (MAF) filter of 0.01 during SNP calling. This set of 6.8 million SNPs was used to estimate population genomic parameters (nucleotide diversity [π] and Tajima's D [TD]; see below) among all samples from given lineages and populations.
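The read pre-processing rule stated at the start of the Alignment subsection can be sketched as follows. This is a minimal illustration under stated assumptions rather than the pipeline actually used: Phred scores are assumed to be encoded with an ASCII offset of 33 (older Illumina Genome Analyzer output may use 64, so the offset is a parameter), masked bases are assumed to be replaced with "N", and the function name is hypothetical. A Phred threshold of 20 corresponds to an error probability of 10^(−20/10) = 1%.

```python
def mask_and_trim(seq, qual, offset=33, min_q=20, mask_len=36, mask_char="N"):
    """Mask low-quality bases in the first `mask_len` positions, then truncate
    the read at the first low-quality base after that prefix.

    Assumptions (not stated in the text): Phred+33 encoding by default and
    masking with 'N'.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    kept = []
    for i, (base, q_char) in enumerate(zip(seq, qual)):
        q = ord(q_char) - offset  # decode the Phred quality score
        if i < mask_len:
            kept.append(mask_char if q < min_q else base)
        elif q < min_q:
            break  # trim: drop this base and everything after it
        else:
            kept.append(base)
    return "".join(kept), qual[:len(kept)]
```

Masking inside the prefix preserves read length for the aligner's seed region, while trimming afterwards simply discards the low-quality 3′ tail.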
A further subset of SNPs, derived using a MAF filter of 0.25, was then used for all pairwise population differentiation comparisons (i.e., FST). The full data set was used to estimate baseline population genetic parameters at the lineage scale, whereas all "population" calculations used reduced, equal-sized data sets. To ensure accuracy among the down-sampled data sets, we checked for consistency between full (n > 40) and down-sampled (n = 8 or 9) data for two populations, using measures of π and TD, via correlation analysis (supplementary table S10, Supplementary Material online). For the GWAS approaches (below), we generated a data set of 130 strains, for which we had phenotype data, and 2.1 million SNPs, including genotypes imputed using fastPHASE ver. 1.2 (Scheet and Stephens 2006) at positions that could be genotyped from sequencing data in at least 95% of strains. Accuracy of imputed genotypes was evaluated based on resequencing data for 14 strains and estimated to be >99% (Rödelsperger et al. 2014). Population Genetic Parameters Population Structure To illustrate the relationships among all sampled populations, we used a set of 208,841 genome-wide SNPs to build a neighbor-joining tree based on p-distances in MEGA ver. 6 (Tamura et al. 2013). Population Genetic Estimators Population genetic estimators, including nucleotide diversity (π) and Tajima's D (TD), were calculated with VCFtools ver. 0.1.11 (Danecek et al. 2011) for several different lineage and population categorizations (table 1). For lineages, these parameters were calculated for all samples for all lineages, for Réunion lineages, and for Mauritius lineages, as well as for sub-sampled Réunion lineages where n = 17. For populations, Réunion and Mauritius data sets were all sub-sampled to give even sample sizes across populations (n = 8 or 9) before estimators were calculated (table 1).
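For reference, the sketch below computes π and Tajima's D for a single window from per-site allele counts using the standard Tajima's D formula, and then bins an outlier window into the three categories of Feulner et al. (2015) used in the Results. It is a hedged illustration, not the VCFtools implementation: it assumes one haplotype per isogenic (largely homozygous) strain, and because the text does not specify how "significantly below the genome-wide average" was tested, the empirical lower-tail test against nonoutlier windows and the α = 0.05 threshold are assumptions, as are all function names.

```python
import math


def tajimas_d(n, allele_counts):
    """Tajima's D for one window.

    n: number of sampled haplotypes (assumed: one per isogenic strain).
    allele_counts: count of one allele at each segregating site (0 < k < n);
        pi is symmetric in k and n - k, so either allele may be counted.
    Returns NaN when D is undefined (no segregating sites or n < 4).
    """
    S = len(allele_counts)
    if S == 0 or n < 4:
        return float("nan")
    # pi: average number of pairwise differences, summed over sites
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in allele_counts)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # D compares pi with Watterson's estimator S / a1, scaled by its SD
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


def lower_tail_p(value, background):
    """Empirical one-sided P that a nonoutlier window has TD <= value (assumed test)."""
    return (1 + sum(b <= value for b in background)) / (1 + len(background))


def classify_window(td_a, td_b, background_a, background_b, alpha=0.05):
    """Feulner-style category for one outlier window in populations a and b."""
    below_a = lower_tail_p(td_a, background_a) < alpha
    below_b = lower_tail_p(td_b, background_b) < alpha
    if below_a and below_b:
        return "background selection"
    if below_a or below_b:
        return "adaptation in one population"
    return "reduced gene flow"
```

An excess of rare variants (e.g., many singletons) pushes D below zero, whereas intermediate-frequency variants push it above zero; dividing π by the number of genotyped sites in the window gives the per-site diversity reported in the text.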
Both π and TD were averaged across the genome in nonoverlapping windows to ensure statistical independence of windows. Window sizes of 1, 10, and 100 kb were used to confirm that results were quantitatively the same regardless of window size. Diversity (π) estimates were corrected for the number of sites for which genotypes were available (28,945,735 sites for all strains), whereas additional estimates of TD were performed after filtering at a minor allele frequency cut-off of 25% for each data set, to check that estimates did not reflect genotyping error manifesting as rare variants (supplementary table S10, Supplementary Material online).

Genomic Profiles of Differentiation among Lineages and Populations Genomic Differentiation Analyses Relative divergence (Weir and Cockerham's FST; Weir and Cockerham 1984) and absolute divergence (Dxy) were calculated with VCFtools for all possible lineage pairwise comparisons, and for all population pairs, using data sets of equal size (Waples 1998) and a minor allele frequency cut-off of 25% (Feulner et al. 2015) across each pairwise comparison.

Outlier Windows Outlier FST windows were determined empirically by selecting windows in the top 1% of the empirical distribution as putative outliers. In addition, a permutation approach was taken in R ver. 3.1.1 (R Core Team 2014), in which loci across the genome were permuted 1,000,000 times. Window estimates of FST were then tested against permutations holding the same number of variable sites, using an R script modified from Feulner et al. (2015). Putative outliers from the permutation approach were identified using a false discovery rate of 0.01, and final outlier windows were those that were significant in both the empirical and permutation approaches. Outlier windows were compared among all pairwise lineage and population comparisons to examine the degree of overlap across comparisons of increasing population divergence. To evaluate how many overlapping outlier windows would be expected by chance, windows were permuted 1,000 times using a custom R script. For all comparisons, outlier windows were finally analyzed for their gene content based on homology searches using BLAST against C. elegans, and enrichment of functional classes of these genes among regions for populations and lineages was determined using Blast2GO (ver. 3.1.3; Conesa et al. 2005).

Selection Signals in Divergent Regions To assess the molecular signature of selection in outlier windows, shifts in the allele frequency spectra were evaluated with TD, by comparing outlier-window TD with nonoutlier-window TD for each population comparison. Following Feulner et al. (2015), we classified divergent regions into three categories: background selection, if TD dropped significantly below the genome-wide average in both populations; adaptation in one or the other population, if TD dropped significantly below the genome-wide average only in the respective population; and reduced gene flow, if TD appeared neutral (i.e., not significantly below the genome-wide average) in both populations.

LD Patterns in Divergent Regions Direct measures of fine-scale population linkage disequilibrium (R²) were obtained using plink ver. 1.07 (Purcell et al. 2007) for each lineage and population separately. LD estimates were averaged over nonoutlier 10-kb windows throughout the genome to obtain genome-wide baseline estimates, and then over each outlier region.

Phenotypic Profiles of Differentiation among Lineages and Populations Natural Variation in pH Tolerance A total of 130 strains, selected to encompass collections from a variety of environmental gradients on La Réunion Island (e.g., altitude, temperature, precipitation), were screened for variation in pH tolerance (supplementary table S2, Supplementary Material online). All strains were maintained at 20 °C on Escherichia coli OP50 (Brenner 1974) for at least 3 weeks before assaying, and up to six biological replicates were performed for each assay. In assays, the pH solution (with concentration determined in an initial range-finding experiment) was prepared by autoclaving standard K-medium (2.36 g KCl and 3.0 g NaCl per L distilled H2O) and adjusting the pH to 5 by adding 1 M HCl and/or 1 M NaOH. Following the protocol of Khanna et al. (1997), a 50-µl aliquot of concentrated worm suspension was exposed to 250 µl of the pH solution in a 24-well tissue culture plate (Greiner Bio-One GmbH, Frickenhausen, Germany). After 24-h incubation at pH 5, strains were transferred to an OP50-seeded NGM plate and mortality of adults was scored.

Environmental Distribution of pH Tolerance To examine the environmental distribution of pH tolerance among strains on La Réunion Island, we performed a variety of analyses in R. First, we checked for spatial autocorrelation in the soil pH data using Moran's I. Next, we evaluated whether local soil pH and pH tolerance varied significantly among geographic locations, using the nonparametric Kruskal–Wallis one-way analysis of variance (ANOVA) by ranks. Finally, to evaluate whether pH tolerance was correlated with local soil pH, we used nonparametric Spearman's (rho) correlation analysis.

Associations between Genotype and Phenotype Genome-wide association analysis, performed in the cloud service easyGWAS (Grimm et al. 2012), was used to identify SNPs that were significantly associated with pH tolerance.
Standard linear regression models tend to detect many spurious associations because they do not account for confounding factors such as population structure and cryptic relatedness (Newman et al. 2001). Linear mixed models have been shown to be well suited to correcting for hidden confounding of this type (Kang et al. 2010; Lippert et al. 2011) by modeling the genetic relationship between samples as a random effect. Thus, for our analysis, we used easyGWAS with the EMMAX algorithm (Kang et al. 2010). We estimated the genetic similarity between individuals using the realized relationship kinship matrix (Hayes et al. 2009). To assess the degree of test-statistic inflation, we computed the genomic control (GC) factor, which measures the deviation of the observed median test statistic from the expected one (Devlin and Roeder 1999). A GC value larger than one indicates inflated P values, whereas a GC value smaller than one indicates deflated P values. In our final analysis, the GC value was 1.02. We are confident that our analysis adequately corrected for strong population and lineage structure for several reasons: (1) our GC value is close to 1, indicating that we correctly accounted for hidden types of confounding (including population structure) in our model; (2) we later used functional validation of our easyGWAS results to confirm that a significant SNP identified in our analysis plays a causative role in our pH phenotype (see Results); and (3) it has been shown in other species, such as A. thaliana and outbred rats, that linear mixed models with a kinship matrix are able to account for strong population structure, both between samples and between distinct sub-populations (e.g., Long et al. 1998; Atwell et al. 2010; Horton et al. 2012; Rat Genome Sequencing and Mapping Consortium et al. 2013).
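The genomic control factor described above can be illustrated with a short sketch. It assumes 1-df association tests whose two-sided P values are converted back to χ² statistics; the factor is then the median observed statistic divided by the median of a χ² distribution with 1 df (about 0.455). easyGWAS may compute it differently (e.g., directly from its own test statistics), so this is illustrative only, and the function name is hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df (~0.4549): the square of the
# standard normal's 75th percentile, since P(Z**2 <= m) = 0.5 <=> |Z| <= sqrt(m).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(p_values):
    """Genomic control factor from two-sided 1-df P values (each in (0, 1])."""
    z = NormalDist()
    # convert each P value back to its chi-square (1 df) test statistic
    chi2 = [z.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

Under the null hypothesis P values are uniform and the factor is close to 1; a value near 1, such as the 1.02 reported above, indicates little residual inflation.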
For our analysis, which included only homozygous sites, we encoded the major allele as "0" and the minor allele as "2", and excluded SNPs with a minor allele frequency (MAF) of <10% in all experiments. After MAF filtering of the 130-strain data set, 870,876 SNPs remained, a large proportion of which were exact copies of other SNPs (i.e., they had identical genotype patterns across strains). We therefore reduced the data set to include only a single copy of each such SNP, creating a final subset of 50,833 SNPs. Excluding duplicated SNPs in this manner was crucial because otherwise SNPs would be counted multiple times, leading to skewed QQ-plots and GC values. However, to draw Manhattan plots, we used all SNPs after MAF filtering, so as not to destroy any linkage disequilibrium structure. To account for environmental and location-specific factors in the analysis, we used the covariate option in easyGWAS for the following covariates: altitude, beetle host, ecozone, location, and all. We did not transform the quantitative covariate (altitude), but encoded each categorical covariate X (location, ecozone, beetle host, all) as dummy variables: if X can take k different categories, we created k − 1 binary variables, each indicating one category. pH tolerance was also tested in easyGWAS in raw data format and after square-root normalization. For the final analysis, we used square-root normalized data and included the covariate ecozone. Bonferroni correction was applied for multiple hypothesis testing. In addition, we computed which parts of the phenotypic variance could be attributed to the random effect (genetic contribution) and to the fixed effects (environmental contribution). For this purpose, we conducted a 10-fold cross-validation for pH tolerance, y. For each fold, we trained a linear mixed model using only the kinship matrix and the covariates, and predicted the phenotype, ŷ, for the remaining evaluation set.
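The cross-validated prediction step described here (and formalized in the equations that follow) can be sketched in plain Python. This is a minimal illustration, not the easyGWAS code: it assumes β and δ have already been estimated on the training fold (the fitting step is omitted), uses a small hand-written linear solver to stay dependency-free, and all names are hypothetical.

```python
from statistics import pvariance


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def predict(c_test, k_test, k_train, c_train, y_train, beta, delta):
    """y_hat = C_test beta + K_test (K_train + delta I)^-1 (y_train - C_train beta).

    c_*: covariate rows (a single column of ones if there are no covariates);
    k_test: kinship between test and training samples; beta and delta are
    assumed to have been fitted already on the training fold.
    """
    n = len(k_train)
    resid = [y - sum(c * b for c, b in zip(row, beta)) for y, row in zip(y_train, c_train)]
    shifted = [[k_train[i][j] + (delta if i == j else 0.0) for j in range(n)] for i in range(n)]
    weights = solve(shifted, resid)
    return [
        sum(c * b for c, b in zip(c_row, beta))  # fixed (covariate) effect
        + sum(k * w for k, w in zip(k_row, weights))  # random (kinship) effect
        for c_row, k_row in zip(c_test, k_test)
    ]


def variance_explained(y_test, y_hat):
    """v = 1 - Var(y_test - y_hat) / Var(y_test)."""
    residuals = [y - p for y, p in zip(y_test, y_hat)]
    return 1.0 - pvariance(residuals) / pvariance(y_test)
```

Population and sample variances give the same ratio here because both terms are computed over the same evaluation set; how per-fold values are combined into a single summary (e.g., by averaging) is left open.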
Predictions were obtained by summing the contributions of the fixed and random effects:

$$\hat{y} = C_{\mathrm{test}}\beta + K_{\mathrm{test}}\left(K_{\mathrm{train}} + \delta I\right)^{-1}\left(y_{\mathrm{train}} - C_{\mathrm{train}}\beta\right)$$

where C denotes the included covariates (if no covariates are included, C is a vector of ones), K is the kinship matrix (K_test relating the evaluation strains to the training strains), and β and δ are the parameters learned in the training step. Finally, we computed the variance explained as

$$v(y_{\mathrm{test}}, \hat{y}) = 1 - \frac{\mathrm{Var}(y_{\mathrm{test}} - \hat{y})}{\mathrm{Var}(y_{\mathrm{test}})}$$

This resulted in a final explained variance (i.e., the summed contributions of the random [genetic] and fixed [environmental] effects) of 0.06.

Functional Analysis of a GWAS-Derived pH Gene Candidate

The candidate gene list resulting from the easyGWAS analysis included a P. pacificus nhx gene, nhx-9, which encodes a sodium/proton exchanger that is expressed intracellularly in C. elegans (Nehrke and Melvin 2002). Among all tested isolates, we identified one strain, RSC021, that had significantly higher mortality when exposed to solutions of pH 5, whereas the reference strain of P. pacificus, RS2333, showed 0% mortality in the same assay. We therefore created an extrachromosomal array containing 2 ng µl⁻¹ of a genetic construct of the RS2333 nhx gene (Ppa-nhx contig20-snap.250), 10 ng µl⁻¹ of a Ppa-egl-20::rfp (red fluorescent protein) reporter, and 60 ng µl⁻¹ of genomic carrier DNA from the recipient line. During this process, we used the SalI restriction enzyme to cut DNA for insertion into the array, and we introduced these restriction sites into the amplified fragment (i.e., the construct) via the primers (supplementary table S5, Supplementary Material online). The array was micro-injected into the germlines of adult RSC021 hermaphrodites to generate transgenic lines. Transgenic lines were scored for mortality after pH 5 exposure over multiple generations to determine whether the Ppa-nhx contig20-snap.250 construct could rescue the mortality phenotype in RSC021.
As a control, we also created an RSC021 line injected with only the Ppa-egl-20::rfp reporter and tested it for rescue in our assay. In addition, we injected a second Ppa-nhx contig20-snap.250 construct, amplified from wild-type (wt) RSC021, into wt-RSC021 to test whether over-expression of the Ppa-nhx contig20-snap.250 gene, regardless of donor origin, caused rescue of the phenotype. All constructs consisted of a 12-kb fragment encompassing the predicted Ppa-nhx contig20-snap.250 coding region plus 5 kb of upstream and downstream sequence. We used RACE (Frohman et al. 1988) to examine potential differences in Ppa-nhx contig20-snap.250 gene structure between our high- (RSC021) and low- (RS2333) mortality strains.

LD and Other Population Genetic Estimators for the easyGWAS SNP Region

We examined local linkage in the region of contig20-snap.250 by plotting R² between the GWAS focal SNP and all other SNPs in the local genomic region. Next, we plotted genomic FST, π, and Tajima's D (TD) around contig20-snap.250 by comparing two populations (CO and SS1) with divergent pH tolerance and soil pH profiles. Finally, we calculated FST among Réunion populations at the Ppa-nhx contig20-snap.250 locus (996 bp) using Arlequin ver. 3.5 (Excoffier and Lischer 2010).

Supplementary Material

Supplementary figs. S1–S7 and tables S1–S10 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

This project represents the fruition of several years of planning and implementation; the authors wish to thank everyone who played a role, including members of the Sommer Laboratory and many colleagues and conference/workshop participants for valuable discussions.
In addition, we thank Metta Riebesell for performing micro-injection experiments, our La Réunion colleagues for logistic support during fieldwork (particularly Dr Jacques Rochat, La Réunion Insectarium, and staff of La Réunion Parc National), and three anonymous reviewers for their constructive comments on an earlier version of the manuscript. All genomic data have been submitted to the European Nucleotide Archive. Phenotypic data, summary statistics, and dynamic Manhattan plots corresponding to our easyGWAS analyses are available at: https://easygwas.ethz.ch.

References

Andersen E, Gerke J, Shapiro JA, Crissman JR, Ghosh R, Bloom JS, Félix M-A, Kruglyak L. 2012. Chromosome-scale selective sweeps shape Caenorhabditis elegans genomic diversity. Nat Genet. 44:285–290. Atwell S, Huang YS, Vilhjalmsson BJ, Willems G, Horton M, Li Y, Meng D, Platt A, Tarone AM, Hu TT, et al. 2010. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465:627–631. Barrière A, Félix M-A. 2007. Temporal dynamics and linkage disequilibrium in natural Caenorhabditis elegans populations. Genetics 176:999–1011. Rat Genome Sequencing and Mapping Consortium, Baud A, Hermsen R, Guryev V, Stridh P, Graham D, McBride MW, Foroud T, Calderari S, Diez M, et al. 2013. Combined sequence-based and genetic mapping analysis of complex traits in outbred rats. Nat Genet. 45:767–775. Brenner SJ. 1974. The genetics of Caenorhabditis elegans. Genetics 77:71–94. Burri R, Nater A, Kawakami T, Mugal CF, Olason PI, Smeds L, Suh A, Dutoit L, Bures S, Garamszegi LZ, et al. 2015. Linked selection and recombination rate variation drive the evolution of the genomic landscape of differentiation across the speciation continuum of Ficedula flycatchers. Genet Res. 25:10–11. Charlesworth B, Morgan MT, Charlesworth D. 1993. The effect of deleterious mutations on neutral molecular variation. Genetics 134:1289–1303.
Conesa A, Götz S, Garcia-Gomez JM, Terol J, Talon M, Robles M. 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21:3674–3676. Cruickshank TE, Hahn MW. 2014. Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Mol Ecol. 23:3133–3157. Cutter AD, Choi JY. 2010. Natural selection shapes nucleotide polymorphism across the genome of the nematode Caenorhabditis briggsae. Genome Res. 20:1103–1111. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA. 2011. The variant call format and VCFtools. Bioinformatics 27:2156–2158. Darwin C. 1859. On the origin of species. 6th ed. London: Murray. Deagle BE, Jones FC, Chan YF, Absher DM, Kingsley DM, Reimchen TE. 2012. Population genomics of parallel phenotypic evolution in stickleback across stream-lake ecological transitions. Proc R Soc B. 279:1277–1286. Devlin B, Roeder K. 1999. Genomic control for association studies. Biometrics 4:997–1004. Endler JA. 1986. Natural selection in the wild. Monographs in population biology. Princeton: Princeton University Press. Excoffier L, Lischer HEL. 2010. Arlequin suite ver 3.5: a new series of programs to perform population genetic analyses under Linux and Windows. Mol Ecol Res. 10:564–567. Feder JL, Egan SP, Nosil P. 2012. The genomics of speciation-with-geneflow. Trends Genet. 28:342–350. Feulner PGD, Chai FJJ, Panchal M, Huang Y, Eizaguirre C, Kalbe M, Lenz TL, Samonte IE, Stoll M, Bornberg-Bauer E, et al. 2015. Genomics of divergence along a continuum of parapatric population differentiation. PLoS Genet. 11:e1004966. Frohman MA, Dush MK, Martin GR. 1988. Rapid production of full length cDNAs from rare transcripts: amplification using a single gene-specific oligonucleotide primer. Proc Natl Acad Sci U S A. 85:8998–9002. Grimm D, Greshake B, Kleeberger S, Lippert C, Stegle O, Schölkopf B, Weigel D, Borgwardt K. 2012. 
easyGWAS: an integrated interspecies platform for performing genome-wide association studies. ArXiv:1212.4788 [q-bio.GN]. Hanikenne M, Kroymann J, Trampczynska A, Bernal M, Motte P, Clemens S, Krämer U. 2013. Hard selective sweep and ectopic gene conversion in a gene cluster affording environmental adaptation. PLoS Genet. 9:e1003707. Hayes BJ, Visscher PM, Goddard ME. 2009. Increased accuracy of artificial selection by using the realized relationship matrix. Genet Res. 91:47–60. Hendry AP, Bolnick DI, Berner D, Peichel CL. 2009. Along the speciation continuum in sticklebacks. J Fish Biol. 75:2000–2036. Herrmann M, Kienle S, Rochat J, Mayer WE, Sommer RJ. 2010. Haplotype diversity of the nematode Pristionchus pacificus on Réunion in the Indian Ocean suggests multiple independent invasions. Biol J Linn Soc. 100:170–179. Herrmann M, Mayer W, Hong R, Kienle S, Minasaki R, Sommer RJ. 2007. The nematode Pristionchus pacificus (Nematoda: Diplogastridae) is associated with the Oriental beetle Exomala orientalis (Coleoptera: Scarabaeidae) in Japan. Zool Sci. 24:883–889. Hohenlohe PA, Bassham S, Etter PD, Stiffler N, Johnson EA, Cresko WA. 2010. Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. PLoS Genet. 6:e100086. Hong RL, Sommer RJ. 2006. Pristionchus pacificus: a well-rounded nematode. BioEssays 28:651–659. Horton MW, Hancock AM, Huang YS, Toomajian C, Atwell S, Auton A, Wayan Muliyati N, Platt A, Gianluca Sperone F, Vilhjalmsson BJ, et al. 2012. Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel. Nat Genet. 44:212–216. Jones FC, Grabherr MG, Chan YF, Russell P, Mauceli E, Johnson J, Swofford R, Pirun M, Zody MC, White S, et al. 2012. The genomic basis of adaptive evolution in threespine sticklebacks. Nature 484:55–61.
Kang HM, Sul JH, Service SK, Zaitlen NA, Kong S, Freimer NB, Sabatti C, Eskin E. 2010. Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 42:348–354. Kaplan NL, Hudson RR, Langley CH. 1989. The ‘hitchhiking effect’ revisited. Genetics 123:887–899. Kawecki TJ, Ebert D. 2004. Conceptual issues in local adaptation. Ecol Lett. 7:1225–1241. Keinan A, Reich D. 2010. Human population differentiation is strongly correlated with local recombination rate. PLoS Genet. 6:e1000886. Khanna N, Cressman CP, Tatara CP, Williams PL. 1997. Tolerance of the nematode Caenorhabditis elegans to pH, salinity, and hardness in aquatic media. Arch Environ Contam Toxicol. 32:110–114. Lawniczak MKN, Emrich SJ, Holloway AK, Regier AP, Olson M, White B, Redmond S, Fulton L, Appelbaum E, Godfrey J, et al. 2010. Widespread divergence between incipient Anopheles gambiae species revealed by whole genome sequences. Science 330:512–514. Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760. Lippert C, Listgarten J, Liu Y, Kadie CM, Davidson RI, Heckerman D. 2011. FaST linear mixed models for genome-wide association studies. Nat Methods. 8:833–835. Long AD, Lyman RF, Langley CH, Mackay TF. 1998. Two sites in the Delta gene region contribute to naturally occurring variation in bristle number in Drosophila melanogaster. Genetics 149:999–1017. Losos JB, Arnold SJ, Bejerano G, Brodie ED III, Hibbett D, Hoekstra HE, Mindell DP, Monteiro A, Moritz C, Allen Orr H, et al. 2013. Evolutionary biology for the 21st century. PLoS Biol. 11:e1001466. Lunter G, Goodson M. 2011. Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res. 21:936–939. Mayer MG, Sommer RJ. 2011. Natural variation in Pristionchus pacificus dauer formation reveals cross-preference rather than self-preference of nematode dauer pheromones. Proc Biol Sci. 278:2784–2790. Mayr E. 1963.
Animal species and evolution. Oxford: Oxford University Press. McGaughran A, Morgan K, Sommer RJ. 2013a. Unravelling the evolutionary history of the nematode Pristionchus pacificus: from lineage diversification to island colonization. Evol Ecol. 3:667–675. McGaughran A, Morgan K, Sommer RJ. 2013b. Natural variation in chemosensation: lessons from an island nematode. Ecol Evol. 3:5209–5224. McGaughran A, Morgan K, Sommer RJ. 2014. Environmental variables explain genetic structure in a beetle-associated nematode. PLoS One 9:e87317. McGaughran A, Sommer RJ. 2014. Natural variation in cold tolerance in the nematode Pristionchus pacificus: the role of genotype and environment. Open Biol. 3:832–838. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. 2010. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297–1303. Moreno E, McGaughran A, Rödelsperger C, Zimmer M, Sommer RJ. 2016. Oxygen-induced social behaviours in Pristionchus pacificus have a distinct evolutionary history and genetic regulation from Caenorhabditis elegans. Proc R Soc B. 283:20152263. Morgan K, McGaughran A, Ganeshan S, Herrmann M, Sommer RJ. 2014. Landscape and oceanic barriers shape dispersal and population structure in the island nematode Pristionchus pacificus. Biol J Linn Soc. 112:1–15. Morgan K, McGaughran A, Villate L, Herrmann M, Witte H, Bartelmes G, Rochat J, Sommer RJ. 2012. Multi-locus analysis of Pristionchus pacificus on La Réunion Island reveals an evolutionary history shaped by multiple introductions, constrained dispersal events and rare outcrossing. Mol Ecol. 21:250–266. Nachman MW, Payseur BA. 2012. Recombination rate variation and speciation: theoretical predictions and empirical results from rabbits and mice. Philos Trans R Soc Lond B Biol Sci. 367:409–421. Nehrke K, Melvin JE. 2002. The NHX family of Na+-H+ exchangers in Caenorhabditis elegans.
J Biol Chem. 277:20936–29044. Newman DL, Abney M, McPeek MS, Ober C, Cox NJ. 2001. The importance of genealogy in determining genetic associations with complex traits. Am J Hum Genet. 69:1146–1148. Nordborg M, Borevitz JO, Bergelson J, Berry CC, Chory J, Hagenblad J, Kreitman M, Maloof JN, Noyes T, Oefner PJ, et al. 2002. The extent of linkage disequilibrium in Arabidopsis thaliana. Nat Genet. 30:190–193. Nordborg M, Charlesworth B, Charlesworth D. 1996. The effect of recombination on background selection. Genet Res. 67:159–174. Nosil P. 2012. Ecological speciation. Oxford: Oxford University Press. Nosil P, Feder JL. 2012. Genomic divergence during speciation: causes and consequences. Philos Trans R Soc Lond B Biol Sci. 367:332–342. Nosil P, Harmon LJ, Seehausen O. 2009. Ecological explanations for (incomplete) speciation. Trends Ecol Evol. 24:145–156. Nosil P, Vines TH, Funk DJ. 2005. Perspective: reproductive isolation caused by natural selection against immigrants from divergent habitats. Evolution 59:705–719. Olson-Manning CF, Wagner MR, Mitchell-Olds T. 2012. Adaptive evolution: evaluating empirical support for theoretical predictions. Nat Rev Genet. 13:867–877. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, et al. 2007. PLINK: a toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet. 81:559–575. R Core Team. 2014. R: a language and environment for statistical computing. [cited 2014 Mar 23]. Available from: http://www.R-project.org Renaut S, Grassa CJ, Yeaman S, Moyers BT, Lai Z, Kane NC, Bowers JE, Burke JM, Rieseberg LH, et al. 2013. Genomic islands of divergence are not affected by geography of speciation in sunflowers. Nat Commun. 4:1827. Rockman MV, Skrovanek SS, Kruglyak L. 2010. Selection at linked sites shapes heritable phenotypic variation in C. elegans. Science 330:372–376.
Rödelsperger C, Neher RA, Weller AM, Eberhardt G, Witte H, Mayer WE, Dieterich C, Sommer RJ. 2014. Characterization of genetic diversity in the nematode Pristionchus pacificus from population-scale resequencing data. Genetics 196:1153–1165. Roesti M, Hendry AP, Salzburger W, Berner D. 2012. Genome divergence during evolutionary diversification as revealed in replicate lake-stream stickleback population pairs. Mol Ecol. 21:2852–2862. Rundle HD, Nosil P. 2004. Ecological speciation. Ecol Lett. 8:336–352. Sadier A, Viriot L, Pantalacci S, Laudet V. 2014. The ectodysplasin pathway: from diseases to adaptations. Trends Genet. 30:24–31. Scheet P, Stephens M. 2006. A fast and flexible statistical model for large-scale population genotype data: applications to infer missing genotypes and haplotypic phase. Am J Hum Genet. 78:629–644. Schluter D. 2000. The ecology of adaptive radiation. Oxford: Oxford University Press. Schluter D. 2001. Ecology and the origin of species. Trends Ecol Evol. 16:372–380. Schluter D, Conte GL. 2009. Genetics and ecological speciation. Proc Natl Acad Sci U S A. 106:9955–9962. Schmid KJ, Törjék O, Meyer R, Schmuths H, Hoffmann MH, Altmann T. 2006. Evidence for a large-scale population structure in Arabidopsis thaliana from genome-wide single nucleotide polymorphism markers. Theor Appl Genet. 112:1104–1114. Seehausen O, Butlin RK, Keller I, Wagner CE, Boughman JW, Hohenlohe PA, Peichel CL, Saetre G-P, Bank C, Brännström Å, et al. 2014. Genomics and the origin of species. Nat Rev Genet. 15:176–192. Slatkin M, Wiehe T. 1998. Genetic hitch-hiking in a subdivided population. Genet Res. 71:155–160. Smith JM, Haigh J. 1974. The hitch-hiking effect of a favourable gene. Genet Res. 23:23–35. Sommer RJ. 2015. Pristionchus pacificus. A nematode model for comparative and evolutionary biology. The Netherlands: BRILL. Sommer RJ, McGaughran A. 2013.
The nematode Pristionchus pacificus as a model system for integrative studies in evolutionary biology. Mol Ecol. 22:2380–2393. Soria-Carrasco V, Gompert Z, Comeault AA, Farkas TE, Parchman TL, Johnston JS, Alex Buerkle C, Feder JL, Bast J, Schwander T, et al. 2014. Stick insect genomes reveal natural selection’s role in parallel speciation. Science 344:738–742. Strasberg D, Rouget M, Richardson D, Baret S, Dupont J, Cowling RM. 2005. An assessment of habitat diversity and transformation on La Réunion Island (Mascarene Islands, Indian Ocean) as a basis for identifying broad-scale conservation priorities. Biodivers Conserv. 14:3015–3032. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. 2013. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol. 30:2725–2729. Via S. 2001. Sympatric speciation in animals: the ugly duckling grows up. Trends Ecol Evol. 16:381–390. Waples R. 1998. Separating the wheat from the chaff: patterns of genetic differentiation in high gene flow species. J Hered. 89:438–450. Weir BS, Cockerham CC. 1984. Estimating F-statistics for the analysis of population structure. Evolution 38:1358–1370.