Genetic Architecture of Wild-Farm Differentiation: Detecting hybrid individuals - challenges and solutions
Photograph by Paul Nicklen

Outline of the talk
• Introduction
• My perspective on hybrid identification
• Challenges in hybrid detection
• Potential solutions
• Bayesian model-based methods
• Structure and NewHybrids
• General efficiency of the methods
• Applicability of the methods in a real-life scenario
• Wild and farmed Atlantic salmon
• Population structure
• Identification of hybrids in a sub-structured system
• Applicability of the methods in a real-life scenario
• Farmed salmon in the Teno river system

Sub-arctic perspective
[Map: the Teno river, with the Barents Sea, Atlantic Ocean, Bothnian Bay and Lake Saimaa labelled; scale bar 500 km]

The River Teno (Tana in Norwegian)
• Catchment area 16 386 km2
• Mean annual discharge c. 200 m3/s
• Over 1200 km of river available for salmon
• Mean annual riverine catch 139 tonnes (S.D. 47)

Teno river and farmed fish
• >500 000 salmon escape from net-pens each year in the coastal waters of Norway
• In March 2003, over 100 000 adult salmon escaped
• In August 2003, 25 000 salmon escaped
• Farm strains: a 'cocktail' of 42 populations (Gjedrem et al. (1991) Aquaculture 98, 41-50)
[Map: the Teno river shown at two scales; scale bars 50 km and 500 km]

Teno river system: number of farm escapees
• Proportions probably inflated, because of the greater 'news value' of reporting a farm escapee.

Challenges in hybrid identification
• Interspecific hybridization: 5 bi-allelic diagnostic markers are enough to separate backcrosses from F1-hybrids
• Intraspecific hybridization: no diagnostic, species-specific markers
• Main challenge: populations differ only in their allele frequency distributions
• How much do they differ, and is it enough to detect hybrids?
• How best to utilize this information?
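The claim that about five diagnostic markers suffice in the interspecific case can be illustrated with a short calculation. This is a minimal sketch under assumptions that are not stated on the slide: the markers are fully diagnostic (fixed differences), unlinked, and the backcross is a first-generation one. An F1-hybrid is then heterozygous at every diagnostic locus, whereas a backcross is heterozygous at each locus with probability 0.5. The function name is hypothetical.

```python
def prob_backcross_scored_as_f1(n_loci):
    """Probability that a first-generation backcross is heterozygous at all
    n_loci diagnostic markers and so looks like an F1-hybrid.

    Assumes fully diagnostic, unlinked, bi-allelic markers (an assumption
    made for this illustration, not a statement from the talk).
    """
    return 0.5 ** n_loci


for n in (1, 3, 5, 10):
    print(f"{n:2d} diagnostic loci: P(backcross looks like F1) = "
          f"{prob_backcross_scored_as_f1(n):.4f}")
```

With five loci the probability is 0.5^5, roughly 3%. No comparable shortcut exists in the intraspecific case, where there are no fixed differences between populations.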
Potential solution to detecting hybridization
• Model-based Bayesian methods
• Utilize polymorphic microsatellites
• Case-tailored models
• Assign a particular individual to a population or to a specific hybrid class
• Admixture analysis using the program Structure (Pritchard et al. 2000; Falush et al. 2003): estimates the admixture coefficient (q)
• NewHybrids (Anderson and Thompson 2002): infers the hybrid category
• Widely used since development
• Assessment of the general level of efficiency

Admixture analysis using the program Structure
• Estimates the admixture coefficient (q)
• Infers population structure: individuals are probabilistically assigned to a population or, in the case of admixed ancestry, jointly to several populations
• A priori assumption that K = 2, i.e. a two-population model in which two populations contribute to the gene pool of the sample
• Admixed ancestry is modeled by assuming that individual i has inherited some fraction (q) of its genome from ancestors in population k
• Hybrids have an intermediate q-value, i.e. a first-generation hybrid should have a q-value of 0.5
[Figure: illustration of analysis with Structure; q-values (axis 0 to 1, marks at 0.1, 0.5 and 0.9) for wild, F1-hybrid and farm individuals]

Methods: NewHybrids
• Infers the hybrid category
• Q is a discrete variable with up to six genotype frequency classes
• The individual's genotype frequency class (i.e. hybrid category) is inferred
• Provides a posterior probability reflecting the level of certainty that an individual belongs to a certain hybrid group (F1, backcross, purebred etc.)
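The link between the two methods can be made concrete with the six genotype frequency classes usually considered for two generations of mixing. The sketch below is illustrative only: the class proportions are the standard Mendelian expectations, while the dictionary layout, the labels and the function name are assumptions made here and are not part of either program.

```python
# Expected proportion of loci at which an individual carries
# (two wild-origin alleles, one allele of each origin, two farm-origin alleles).
# Labels and layout are hypothetical; the proportions are Mendelian expectations.
GENOTYPE_CLASSES = {
    "pure wild":         (1.00, 0.00, 0.00),
    "pure farm":         (0.00, 0.00, 1.00),
    "F1":                (0.00, 1.00, 0.00),
    "F2":                (0.25, 0.50, 0.25),
    "backcross to wild": (0.50, 0.50, 0.00),
    "backcross to farm": (0.00, 0.50, 0.50),
}


def expected_q(category):
    """Expected admixture coefficient q (share of wild ancestry) for a class."""
    both_wild, one_of_each, _both_farm = GENOTYPE_CLASSES[category]
    return both_wild + 0.5 * one_of_each


for name in GENOTYPE_CLASSES:
    print(f"{name:18s} expected q = {expected_q(name):.2f}")
```

F1 and F2 individuals share the same expected q of 0.5, which is one reason an admixture coefficient alone cannot separate hybrid categories and a class-based model is useful alongside it.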
Illustration of analysis with NewHybrids
[Figure: posterior probability (0-100 %) of being wild, farm or F1-hybrid for simulated wild and F1-hybrid individuals]

Power analysis
What are the efficiency and accuracy of the Bayesian methods in identifying hybrids?
• Simulated data
• Level of differentiation: FST 0.03, 0.06, 0.12, 0.21
• Number of loci: 6, 12, 24, 48
• Average number of alleles: 7.5 (4-13)
• Individuals of parental populations and F1-hybrids (and backcross hybrids, not presented here)

Structure admixture analysis
[Figure: q-values (axis 0 to 1) of simulated wild and F1-hybrid individuals; the values overlap]

Performance measures (Structure and NewHybrids, threshold Q > 0.90)
• Efficiency (Eff): proportion of true parents identified; proportion of true hybrids identified
• Accuracy (Acc): number of true parents in a group / individuals in a group; number of true hybrids in a group / individuals in a group
• Overall performance = Eff × Acc
• Apparent hybrid proportion

Some remarks and conclusions
• Even with relatively low levels of genetic divergence (FST = 0.03–0.06), F1-hybrids could be distinguished with high efficiency with both methods, but a relatively large number of loci is required.
• Knowledge of reference population allele frequencies is not necessary when using Bayesian model-based methods.
• Nevertheless, an implication for empirical studies is that accurate detection of hybrid individuals may prove difficult in scenarios with FST ≤ 0.06, as the required number of unlinked marker loci for efficient hybrid identification (24 loci or more) may be unavailable.
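The performance measures used in the power analysis (efficiency, accuracy and their product) can be sketched as follows. This is an illustrative implementation of those definitions only; the function name, the label strings and the toy data are hypothetical and are not taken from the talk.

```python
def performance(true_labels, assigned_labels, group):
    """Return (efficiency, accuracy, overall performance) for one group.

    efficiency: correctly identified members of the group / all true members
    accuracy:   true members among those assigned to the group / all assigned
    overall:    efficiency * accuracy
    """
    n_true = sum(1 for t in true_labels if t == group)
    n_assigned = sum(1 for a in assigned_labels if a == group)
    n_correct = sum(
        1 for t, a in zip(true_labels, assigned_labels) if t == group and a == group
    )
    efficiency = n_correct / n_true if n_true else 0.0
    accuracy = n_correct / n_assigned if n_assigned else 0.0
    return efficiency, accuracy, efficiency * accuracy


# Hypothetical toy example: 4 true hybrids and 6 true wild individuals.
true_labels = ["hybrid"] * 4 + ["wild"] * 6
assigned_labels = ["hybrid"] * 3 + ["wild"] + ["hybrid"] + ["wild"] * 5

eff, acc, overall = performance(true_labels, assigned_labels, "hybrid")
print(f"hybrids: efficiency={eff:.2f}, accuracy={acc:.2f}, overall={overall:.4f}")
```

Keeping the two measures separate matters because a lenient threshold can raise efficiency (more hybrids found) while lowering accuracy (more purebreds wrongly called hybrids); the product penalizes either failure.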
Real-life scenario in Teno
• 44 historical (1972-1974) wild salmon individuals
• 30 farm fish from a net-pen in Altafjord
• 17 microsatellite loci

Genetic diversity and differentiation

Locus       Allele #       Allelic richness   HE             Pairwise FST
            Teno   Alta    Teno   Alta        Teno   Alta
Ssa 171     13     6       12.0   5.9         0.87   0.74    0.09***
SSOSL 311   13     8       11.2   8.0         0.84   0.80    0.06***
Ssa 197     16     7       15.2   7.0         0.91   0.81    0.03***
Ssa 85      13     6       11.3   5.9         0.78   0.73    0.17***
Ssa 202     11     9       10.8   8.8         0.86   0.78    0.05***
SSOSL 85    11     8       10.7   7.9         0.86   0.84    0.04***
SSOSL 438   7      5       6.3    5.0         0.65   0.72    0.11***
Ssa 412     4      2       3.9    2.0         0.52   0.39    0.00NS
SLEEI 84    15     7       13.6   7.0         0.87   0.74    0.04***
SLEEI 53    4      2       3.9    2.0         0.58   0.22    0.26***
SSD 30      4      3       4.0    3.0         0.34   0.20    0.01NS
Ssa 422     8      6       7.7    6.0         0.76   0.75    0.11***
Ssa 14      2      2       2.0    2.0         0.47   0.47    0.00NS
SSF 43      3      3       3.0    3.0         0.30   0.43    0.01NS
Sleen 82    7      4       6.8    3.9         0.78   0.30    0.26***
SSOSL 25    8      6       7.0    6.0         0.74   0.68    0.18***
Ssa 289     3      5       3.0    5.0         0.55   0.64    0.22***
Total       8.4    5.2     7.8    5.2         0.69   0.60    0.10***

Two datasets were constructed to explore different levels of hybridization
• High hybridization scenario (hybrid proportion = 22%)
• Low hybridization scenario (hybrid proportion = 1.6%)

Real-life scenario in Teno
[Figure: simulated data, FST 0.12 and 24 loci]
Results based on empirical, real-life microsatellite data are in concordance with the simulations
[Figure: empirical data, FST 0.10 and 17 loci; 5.5% of hybrids mis-identified as purebred wild]

Comparison of individual classification, simulated data
[Figure: F1-hybrid vs. wild]

Results: combining the information of the two programs
[Figure: F1-hybrid vs. wild]
• All hybrids were successfully identified, with only 4% of wild individuals misclassified as hybrids

In conclusion
• Detecting hybridization between individuals from populations with only moderate genetic divergence is challenging
• Accuracy is improved by using the two methods in tandem
• For any hybrid identification study
utilizing programs such as Structure and NewHybrids, it is advisable to simulate hybrid individuals in order to gain insight into the level of efficiency and accuracy with which hybrids could potentially be distinguished from purebred individuals.

Thus far we have considered the case of two populations, but

Atlantic salmon: evidence of within-river genetic structure
• Run timing (Stewart et al. 2002), age at smolting (Englund et al. 1999) and sea-age at maturity (Niemelä 2004) provide circumstantial evidence of within-river genetic structuring
• Heterogeneity in nearly neutral and non-neutral genetic marker allele frequencies (Verspoor et al. 2005; Landry and Bernatchez 2001)

1st aim of the study: To assess the level of population structuring of Atlantic salmon within the Teno river system using neutral microsatellite markers

Large sub-arctic river system: Teno river
• 14 main tributaries to the mainstem
• No stocking history
• n = 792 adult salmon
• 16 distinct sampling sites
• Mainstem 15-30th August

Inferring populations
• An objective population genetic study requires an approach that does not rely on pre-defined populations
• Application of model-based methods is a promising approach for objective and accurate identification of populations, driven solely by information in the genetic data
• Structure (Falush et al. 2003) and BAPS 3.2 (Corander et al. 2003)
• Extensive genotyping effort: 29 neutral, unlinked microsatellite loci
• Computationally very intensive

Results: spatial genetic structuring
• Generally, defining populations by main tributaries was observed to be a reasonable approach in this large river system
• 14 inferred genetic clusters corresponding to the geographical topology of the river
Vähä et al. (In press) Mol. Ecol.

Results: Genetic diversity and divergence
Mainstem and headwater populations vs.
tributary populations

Hybrid detection: Under spatially structured population
• Each main tributary fosters a highly diverged, unique population, while mainstem and headwater populations were genetically more diverse and less diverged
• How does the spatial population structure affect our ability to detect hybrids of wild and farmed salmon?

Teno river system: number of farm escapees
• Teno mainstem lower (TmsL)
• Teno mainstem upper (TmsU)
• Pulmanki tributary population (Pul)

Genetic diversity of farm escapees caught in the Teno river (14 microsatellite loci)

                           Esc     TmsL    TmsU    Pul
Allelic richness           9.86    9.63    8.65    6.93
Private allelic richness   15.51   8.71    7.47    10.33
HE                         0.79    0.74    0.74    0.67

Pairwise FST
        Esc     TmsL    TmsU
TmsL    0.045
TmsU    0.048   0.023
Pul     0.108   0.098   0.097

Alta net-pen salmon, for comparison (totals, Teno vs. Alta): allele number 8.4 vs. 5.2; allelic richness 7.8 vs. 5.2; HE 0.69 vs. 0.60; pairwise FST 0.10***
Single or several strains?

Simulated 50 hybrids (TmsL x Esc)

Structure program: hierarchical approach
• Individuals of inferred clusters are omitted and Structure is subsequently run on the partitioned data
• Partitioning selected applying the approach of Evanno et al.
2005

Hybrid detection: Under spatially structured population
Samples: escapees (Esc) n = 71, TmsL n = 64, simulated F1-hybrids (H) n = 50, TmsU n = 76, Pulmanki (Pul) n = 69

Number of inferred clusters, K=2
• Pulmanki: q > 0.8 for 68/69 individuals
• These 68 individuals were omitted from the data and Structure was subsequently run on the partitioned data

Number of inferred clusters, K=3
• TmsU: q > 0.8 for 67/76 individuals
• F1-hybrids: q > 0.8 for 1/50 individuals
• One Pulmanki individual remains in the data
• Assigned individuals were omitted from the data and Structure was subsequently run on the partitioned data

Hybrid detection: Results
Number of inferred clusters, K=2
• Escapees: q < 0.2 for 63/71 individuals
• TmsL: q > 0.8 for 57/64 individuals
• 8 escapees fell into the hybrid class; however, we know they are farm escapees
• The data still included 9 TmsU individuals and one Pulmanki individual

Number of individuals assigned to each group (rows: sample of origin; columns: assigned group)

Sample   n     Esc    TmsL   H      TmsU   Pul
Esc      71    (63)   -      (8)    -      -
TmsL     64    1      57     6      -      -
H        50    10     13     26     1      -
TmsU     76    -      7      2      67     -
Pul      69    -      1      -      -      68

Proportion of individuals assigned to each group (rows: sample of origin; columns: assigned group)

Sample   Esc    TmsL   H      TmsU   Pul
TmsL     2%     89%    9%     -      -
H        20%    26%    52%    2%     -
TmsU     -      9%     3%     88%    -
Pul      -      1%     -      -      99%

28% of hybrids undetected

Hybrid detection: conclusions and future aspects
• Population structure complicates hybrid detection
• I considered only a subset of the salmon populations in the Teno river system
• Smaller rivers, better success?
• Increasing the number of loci would increase the hybrid detection success
• Selecting more diagnostic loci: interlocus variation
• Identification of 'domestication genes' may enable the use of more diagnostic markers in the future

Thanks
• Craig Primmer: supervising
• Anti Vasemägi and Irma Saloniemi: contemplating discussions
• Jaakko Erkinaro and Eero Niemelä: providing samples
• Maj and Tor Nessling Foundation: providing funding

Results I: Efficiency of distinguishing between purebred and F1-hybrids
Slide outline:
• Hybridization and consequences
• Detecting hybrid individuals - challenges and solutions
• Aim of this work
• Methods
• Results on the efficiency of methods to detect hybrids
• Prospects for detecting hybrids between farmed and wild salmon in Teno
• Conclusions
• Acknowledgements
[Figure: purebred vs. F1-hybrid, Structure; x-axis: number of loci]

Results II: Distinguishing between purebred, F1-hybrids and backcrosses
[Figure: purebred, F1-hybrid and backcross]

Prospects for detecting hybrids between farmed and wild salmon in Teno
• 44 historical (1972-1974) wild salmon individuals
• 30 farm fish from a net-pen in Altafjord
• 17 microsatellite loci
• FST = 0.10

Prospects for detecting hybrids between farmed and wild salmon in
Teno
• Using the current genotyping system, 92.6% of individuals were correctly classified by both programs
• Results based on empirical data are in concordance with the simulated data results
• The potential for detecting hybridization between individuals from populations with only moderate genetic divergence is promising

Prospects for detecting hybrids between farmed and wild salmon in Teno
• Increasing the number of loci
• More detailed analysis of borderline individuals
• Selection of loci with the highest assignment efficiency
• Using linked loci

Inferring populations: methods used
• The underlying population structure within the river system was deciphered using clustering methods implemented in Structure (Falush et al. 2003) and BAPS 3.2 (Corander et al. 2003)
• The model pursues clustering solutions that minimize Hardy-Weinberg and linkage disequilibrium
• Hierarchical approach: individuals of inferred clusters are omitted and Structure is subsequently run on the partitioned data
• Partitioning selected applying the approach of Evanno et al. 2005
• 29 neutral, unlinked microsatellite loci

Results: spatial genetic structuring
• Generally, defining populations by main tributaries was observed to be a reasonable approach in this large river system
• In the mainstem, the number of inferred populations was less than the number of distinct sample sites
• Tributary populations: 14 inferred genetic clusters corresponding to the geographical topology of the river
• The average increase in pairwise FST measures was 0.016 (from FST = 0.086 to FST = 0.102)

Results: Genetic diversity and divergence
Mainstem and headwater populations vs.
tributary populations

2nd aim of the study: To identify life-history and ecological variables best predicting the genetic diversity of populations.
• Dependence of allelic richness and sum of private allelic richness on one life-history variable and five landscape variables
• Allelic richness is genetic diversity standardized to the smallest N in a comparison

Methods: Simulated populations
• EASYPOP 1.8 (Balloux 2001)
• TPM (0.30), mutation rate 0.0001
• Ne 1000
• 20 populations
• Panmictic population for 28 000 generations
• Migration scheme changed
• Equilibrium

Results: Sensitivity to proportion of hybrids in the sample
[Figure: proportion of correctly classified parental and hybrid individuals, Structure and NewHybrids; >95% correctly classified]

Analysis with NewHybrids
[Figure: posterior probability (80-100 %) of being wild, farm or F1-hybrid for simulated wild and F1-hybrid individuals; λ = Jeffreys prior, α = Jeffreys prior; the values overlap]

Consequently, it is advisable for any hybrid identification study utilizing programs such as Structure and NewHybrids to perform a sensitivity analysis in order to gain insight into the level of accuracy and power with which hybrids can be distinguished from purebred individuals, similar to that proposed by Blouin et al. (1996) for estimating individual relatedness.

Prospects for assessing farm escapee breeding success in Teno
• Using the two methods in tandem in the case of low hybridization, all hybrids were identified with only 4% of wild individuals misclassified as hybrids
• Further improvements: a larger number of loci, selection of loci with the highest assignment efficiency, using linked loci

Life-history and landscape variables
Sampling locale: Kevojoki, Tsarsjoki, Utsjoki, Maskejoki, Valjoki, Kuopp.joki, Pulmanki, Vetsijoki, Tana Bru, lower Teno, mid Teno, Utsjoki r.m.,
Outakoski, Inarijoki, Iesjoki, Karasjoki
Columns: inferred population; proportion of MSW females; mainstem or tributary; accessibility of the site; distance from the sea; altitude (masl); catchment area (km2)
Inferred populations: Kev, Tsa, Uts, Mas, Val, Kuo, Pul, Vet, TMSL, TMSU, Ies, Kar
1 1 1 4 2 2 2 3 4 4 4 4 1 1 1 1 1 1 1 1 2 2 2 2 5 6 4 2 4 5 2 4 1 3 3 3 136 139 142 58 188 131 85 109 38 80 106 109 100 230 105 65 180 225 20 200 10 45 62 67 4 3 5 5 2 2 2 2 4 4 4 4 180 261 269 298 120 170 240 220
Catchment area (km2): 474 232 946 595 531 102 743 691 7677 NA 1653 3147 2813 2206

Results of linear model: allelic richness
A significant proportion of the variance in allelic richness of populations was explained by three of the variables:
• 'MSW' (R2 = 0.80, F1,10 = 38.9, p < 0.0001)
• 'catchment area' (R2 = 0.64, F1,10 = 17.5, p = 0.002)
• 'mainstem/tributary' (R2 = 0.64, F1,10 = 17.4, p = 0.002)
The best model, explaining 86% of the variation in genetic diversity, included both 'the proportion of MSW female salmon' in the population and the 'catchment area' of the river (R2 = 0.86, F2,9 = 27.9, p < 0.0001)

Results of linear model: private allelic richness
A significant proportion of the variance in private allelic richness of populations was explained by three of the variables:
• 'MSW' (R2 = 0.61, F1,10 = 15.9, p = 0.0026)
• 'accessibility of the site' (R2 = 0.52, F1,10 = 10.8, p = 0.0082)
• 'catchment area' (R2 = 0.39, F1,10 = 6.4, p = 0.030)
The best model, explaining 80% of the variation, included both 'accessibility of the site' and 'the proportion of MSW females' (R2 = 0.80, F2,9 = 18.1, p < 0.0007)

Summary and conclusions
• Each main tributary fosters a highly diverged, unique population, while mainstem and headwater populations were genetically more diverse and less diverged
• Population structure and variation in genetic diversity of populations were poorly explained by geographical distance
• In contrast, age structure was found to be the most predictive variable in explaining the variation in the genetic diversity of the populations.
• This highlights the importance of multi-sea-winter fish for the effective population size and ultimately for the genetic diversity of the total population, possibly by leveling off the annual fluctuations in population size.

Thanks
• Anti Vasemägi and Irma Saloniemi: contemplating discussions
• Craig Primmer: supervising
• Jaakko Erkinaro and Eero Niemelä: providing samples
• Maj and Tor Nessling Foundation: providing funding