Structure

Genetic Architecture of Wild-Farm Differentiation:
Detecting hybrid individuals
- challenges and solutions
Photograph by Paul Nicklen
Outline of the talk
Introduction
My perspective on hybrid identification
Challenges in hybrid detection
Potential solutions
Bayesian model-based methods
Structure and NewHybrids
General efficiency of the methods
Applicability of the methods in real-life scenario
Wild and farmed Atlantic salmon
Population structure
Identification of hybrids in sub-structured system
Applicability of the methods in real-life scenario
Farmed salmon in Teno river system
Sub-arctic perspective
[Map: location of the Teno river; labels: Barents Sea, Lake Saimaa, Bothnian Bay, Atlantic Ocean; scale bar 500 km]
The River Teno (Tana in Norwegian)
Catchment area 16 386 km²
Mean annual discharge c. 200 m³/s
Over 1200 km of river available for salmon
Mean annual riverine catch 139 tonnes (S.D. 47)
Teno river and farmed fish
>500 000 salmon escape from net-pens each year in the coastal waters of Norway
In March 2003, over 100 000 adult salmon escaped
In August 2003, 25 000 salmon escaped
Farm strains
’Cocktail’ of 42
populations
[Map: Teno river and surrounding waters; labels: Barents Sea, Lake Saimaa, Bothnian Bay, Atlantic Ocean; scale bars 50 km and 500 km]
Gjedrem et al. (1991) Aquaculture 98, 41-50.
Teno river system:
number of farm escapees
Proportions probably inflated,
because of greater ’news value’ of
reporting a farm escapee.
Challenges in hybrid identification
•Interspecific hybridization
5 bi-allelic diagnostic markers enough to
separate backcrosses from F1-hybrids
•Intraspecific hybridization
No diagnostic, species specific markers
Main challenge:
populations differ in allele frequency distributions
How much and is it enough to detect hybrids?
How to best utilize this information?
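The interspecific claim above (5 bi-allelic diagnostic markers suffice) can be illustrated with a small calculation. This is a minimal sketch, assuming unlinked, fully diagnostic markers: an F1 is heterozygous at every such locus, whereas a first-generation backcross is heterozygous at each locus with probability 0.5. The function name is hypothetical.

```python
def prob_backcross_mistaken_for_f1(n_loci: int) -> float:
    """Probability that a first-generation backcross is heterozygous at all
    n_loci unlinked, fully diagnostic bi-allelic markers, and therefore
    indistinguishable from an F1 hybrid at those markers."""
    if n_loci < 0:
        raise ValueError("n_loci must be non-negative")
    return 0.5 ** n_loci


if __name__ == "__main__":
    for n in (1, 3, 5, 10):
        print(n, prob_backcross_mistaken_for_f1(n))
```

With 5 markers the probability is about 3%. No comparable shortcut exists within a species, where markers are not diagnostic.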
Potential solution to detecting
hybridization
Model-based Bayesian methods
Utilize polymorphic microsatellites
Case tailored models
Assigning a particular individual to population or to a specific
hybrid class
Admixture analysis using the program Structure
(Pritchard et al. 2000; Falush et al. 2003)
- Estimates the admixture coefficient (q)
NewHybrids
(Anderson and Thompson 2002)
Infers the hybrid category
Widely used since development
Assessment of the general level of efficiency
Admixture analysis using the program
Structure
• Estimates the admixture coefficient (q)
•Infer population structure; Individuals are probabilistically
assigned to a population, or in the case of admixed ancestry,
jointly to several populations
• A priori assumption that K=2 i.e. two-population model
• two populations contribute to the gene pool of the sample.
• Admixed ancestry is modeled by assuming that
individual i has inherited some fraction (q) of its
genome from ancestors in population K
• Hybrids have intermediate q-value i.e. a first generation
hybrid should have a q-value of 0.5.
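The use of q-values described above can be sketched as a simple threshold rule. This is an illustration only, assuming q is the estimated ancestry fraction from the wild cluster and using 0.1 and 0.9 as cut-offs; the function name and default thresholds are assumptions, not part of Structure itself.

```python
def classify_by_q(q: float, lower: float = 0.1, upper: float = 0.9) -> str:
    """Classify an individual from its Structure admixture coefficient q.

    q is taken here as the fraction of ancestry from the wild cluster
    (an assumption for this sketch). Individuals with intermediate q
    are treated as putative hybrids.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie between 0 and 1")
    if q > upper:
        return "wild"
    if q < lower:
        return "farm"
    return "hybrid"


if __name__ == "__main__":
    for q in (0.97, 0.52, 0.04):
        print(q, classify_by_q(q))
```

Widening the interval between the two thresholds catches more true hybrids at the cost of flagging more purebred individuals.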
Illustration of analysis with Structure
[Figure: q-value (axis from 0 to 1, marks at 0.1, 0.5 and 0.9) for wild, F1-hybrid and farm individuals]
Methods: NewHybrids
-Infers the hybrid category
Q is a discrete variable with up to six genotype
frequency classes.
Individual’s genotype frequency class (i.e. hybrid
category) is inferred
Provides a posterior probability to reflect the
level of certainty that an individual belongs to a
certain hybrid group (F1, back-cross, purebred etc.)
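The idea of genotype frequency classes can be sketched as follows. This is a simplified illustration, not the NewHybrids algorithm: it assumes the allele frequencies of the two source populations are known and uses a uniform prior over classes, whereas NewHybrids estimates allele frequencies and class proportions jointly by MCMC. Each class is described by the expected proportions of loci with two gene copies from population A, one from each population, or two from population B. All function and variable names are hypothetical.

```python
from math import prod

# Expected proportions of loci with (both copies from A, one from each, both from B)
CLASSES = {
    "pure_A": (1.0, 0.0, 0.0),
    "pure_B": (0.0, 0.0, 1.0),
    "F1": (0.0, 1.0, 0.0),
    "F2": (0.25, 0.5, 0.25),
    "backcross_A": (0.5, 0.5, 0.0),
    "backcross_B": (0.0, 0.5, 0.5),
}


def locus_likelihood(genotype, freq_a, freq_b, cls):
    """Likelihood of one unordered diploid genotype under one class."""
    a1, a2 = genotype
    g_aa, g_ab, g_bb = cls
    pa1, pa2 = freq_a.get(a1, 0.0), freq_a.get(a2, 0.0)
    pb1, pb2 = freq_b.get(a1, 0.0), freq_b.get(a2, 0.0)
    het = 2.0 if a1 != a2 else 1.0
    both_a = het * pa1 * pa2
    both_b = het * pb1 * pb2
    one_each = pa1 * pb2 + pa2 * pb1 if a1 != a2 else pa1 * pb1
    return g_aa * both_a + g_ab * one_each + g_bb * both_b


def class_posteriors(genotypes, freqs_a, freqs_b):
    """Posterior probability of each class for one individual (uniform prior)."""
    lik = {
        name: prod(
            locus_likelihood(g, fa, fb, cls)
            for g, fa, fb in zip(genotypes, freqs_a, freqs_b)
        )
        for name, cls in CLASSES.items()
    }
    total = sum(lik.values())
    if total == 0.0:
        raise ValueError("genotype has zero likelihood under every class")
    return {name: value / total for name, value in lik.items()}


if __name__ == "__main__":
    # Five hypothetical diagnostic loci: allele "1" fixed in A, allele "2" fixed in B
    fa = [{"1": 1.0, "2": 0.0}] * 5
    fb = [{"1": 0.0, "2": 1.0}] * 5
    print(class_posteriors([("1", "2")] * 5, fa, fb))
```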
Illustration of analysis with NewHybrids
[Figure: posterior probability (%) of being wild, farm or F1-hybrid for simulated wild and F1-hybrid individuals; axis marks at 100 %, 90 %, 80 % and 50 %]
Power analysis
What is the efficiency and accuracy of the
Bayesian methods in identifying hybrids?
• Simulated data
• Level of differentiation: FST 0.03, 0.06, 0.12, 0.21
• Number of loci: 6, 12, 24, 48
• Average number of alleles 7.5 (4-13)
• Individuals of parental populations, F1-hybrids
(and backcross-hybrids, not presented here)
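One way to generate F1-hybrid genotypes for such a power analysis is sketched below. It is a minimal illustration, assuming each F1 receives one allele drawn from the wild allele frequencies and one from the farm allele frequencies at every unlinked locus; the slides do not state how the hybrids were actually generated, and the function names and example frequencies are hypothetical.

```python
import random


def draw_allele(freqs, rng):
    """Draw one allele according to a dict of allele -> frequency."""
    alleles = list(freqs)
    weights = [freqs[a] for a in alleles]
    return rng.choices(alleles, weights=weights, k=1)[0]


def simulate_f1(freqs_wild, freqs_farm, rng):
    """Simulate one F1 genotype: at each locus, one allele from each parental pool."""
    return [
        (draw_allele(fw, rng), draw_allele(ff, rng))
        for fw, ff in zip(freqs_wild, freqs_farm)
    ]


if __name__ == "__main__":
    rng = random.Random(2024)
    # Two hypothetical loci with different allele frequencies in each population
    wild = [{"A": 0.7, "B": 0.3}, {"C": 0.5, "D": 0.5}]
    farm = [{"A": 0.2, "B": 0.8}, {"C": 0.9, "D": 0.1}]
    hybrids = [simulate_f1(wild, farm, rng) for _ in range(5)]
    for genotype in hybrids:
        print(genotype)
```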
Structure admixture analysis
[Figure: q-values of simulated wild and F1-hybrid individuals; the values overlap; axis marks at 1, 0.9, 0.8, 0.5 (F1-hybrids) and 0 (farm)]
[Figure: Structure, threshold q > 0.90; panels: proportion of parents identified, proportion of hybrids identified, apparent hybrid proportion]
Efficiency (Eff): proportion of parents identified; proportion of hybrids identified
Accuracy (Acc): number of true parents in a group / individuals in a group;
number of true hybrids in a group / individuals in a group
Eff*Acc = overall performance
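NewHybrids
[Figure: NewHybrids, threshold Q > 0.90; panels: proportion of true parents identified, proportion of true hybrids identified, apparent hybrid proportion]
Efficiency (Eff): proportion of true parents identified; proportion of true hybrids identified
Accuracy (Acc), %: true parents in a group / individuals in a group;
true hybrids in a group / individuals in a group
Eff*Acc = overall performance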
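The three performance measures can be computed from true and assigned labels as sketched here. Efficiency is read as the proportion of a true group that is identified, and accuracy as the proportion of an assigned group that truly belongs to it; this mapping of "Eff" and "Acc" onto the slide labels is an interpretation, and the function name and example data are hypothetical.

```python
def performance(true_labels, assigned_labels, group):
    """Return (efficiency, accuracy, overall performance) for one group.

    efficiency = correctly identified members / true members of the group
    accuracy   = correctly identified members / individuals assigned to the group
    overall    = efficiency * accuracy
    """
    pairs = list(zip(true_labels, assigned_labels))
    n_true = sum(t == group for t, _ in pairs)
    n_assigned = sum(a == group for _, a in pairs)
    n_correct = sum(t == group and a == group for t, a in pairs)
    efficiency = n_correct / n_true if n_true else 0.0
    accuracy = n_correct / n_assigned if n_assigned else 0.0
    return efficiency, accuracy, efficiency * accuracy


if __name__ == "__main__":
    truth = ["hybrid"] * 4 + ["wild"] * 6
    assigned = ["hybrid", "hybrid", "hybrid", "wild"] + ["wild"] * 5 + ["hybrid"]
    print(performance(truth, assigned, "hybrid"))
```

In this toy example three of four hybrids are found and one wild fish is wrongly flagged, so efficiency and accuracy are both 0.75.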
Some remarks and conclusions
Even with relatively low levels of genetic divergence
(FST = 0.03–0.06), F1-hybrids could be distinguished
with high efficiency with both methods, but a relatively
large number of loci is required
Knowledge of reference population allele frequencies
is not necessary when using Bayesian-based
methodologies.
Nevertheless, an implication for empirical studies is
that accurate detection of hybrid individuals may
prove difficult in scenarios with FST ≤ 0.06,
as the required number of unlinked marker loci for
efficient hybrid identification (24 loci or more) may be
unavailable.
Real-life scenario in Teno
• 44 historical (1972-1974) wild salmon
individuals
• 30 farm fish from a net-pen in
Altafjord
• 17 microsatellite loci
Genetic diversity and differentiation
              Allele #       Allelic richness   HE             FST
Locus         Teno   Alta    Teno    Alta       Teno   Alta    pairwise
Ssa 171       13     6       12.0    5.9        0.87   0.74    0.09***
SSOSL 311     13     8       11.2    8.0        0.84   0.80    0.06***
Ssa 197       16     7       15.2    7.0        0.91   0.81    0.03***
Ssa 85        13     6       11.3    5.9        0.78   0.73    0.17***
Ssa 202       11     9       10.8    8.8        0.86   0.78    0.05***
SSOSL 85      11     8       10.7    7.9        0.86   0.84    0.04***
SSOSL 438     7      5       6.3     5.0        0.65   0.72    0.11***
Ssa 412       4      2       3.9     2.0        0.52   0.39    0.00NS
SLEEI 84      15     7       13.6    7.0        0.87   0.74    0.04***
SLEEI 53      4      2       3.9     2.0        0.58   0.22    0.26***
SSD 30        4      3       4.0     3.0        0.34   0.20    0.01NS
Ssa 422       8      6       7.7     6.0        0.76   0.75    0.11***
Ssa 14        2      2       2.0     2.0        0.47   0.47    0.00NS
SSF 43        3      3       3.0     3.0        0.30   0.43    0.01NS
Sleen 82      7      4       6.8     3.9        0.78   0.30    0.26***
SSOSL 25      8      6       7.0     6.0        0.74   0.68    0.18***
Ssa 289       3      5       3.0     5.0        0.55   0.64    0.22***
Total         8.4    5.2     7.8     5.2        0.69   0.60    0.10***
Two datasets to explore different levels of
hybridization were constructed
•In the high hybridization scenario (hybrid
proportion = 22%)
•In the low hybridization scenario (hybrid
proportion = 1.6%)
Real-life scenario in Teno
[Figure: simulated data, FST 0.12 and 24 loci; annotated percentages: 1.75%, 3.2%, 3.1%, 0.75%, 2.3%, 0.2%]
Results based on the empirical, real-life
microsatellite data are in
concordance with the
simulations
[Figure: empirical data, FST 0.10 and 17 loci; annotated percentages: 1.8%, 5.5%]
5.5% of hybrids
misidentified as
purebred wild
Comparison of individual classification,
simulated data
[Figure: classification of F1-hybrid and wild individuals]
Results: combining the information of the
two programs
[Figure: classification of F1-hybrid and wild individuals]
All hybrids were successfully identified with only 4% of wild individuals
misclassified as hybrids
In conclusion
Detecting hybridization between
individuals from populations with only moderate
genetic divergence is challenging
Accuracy is improved by using the two methods in tandem
For any hybrid identification study utilizing programs
such as Structure and NewHybrids, it is advisable to
simulate hybrid individuals
in order to gain insight into the level of efficiency and
accuracy with which hybrids could potentially be
distinguished from purebred individuals.
Thus far we have considered the case of two populations, but
Atlantic salmon:
evidence of within-river genetic structure
run timing
(Stewart et al. 2002)
age at smolting
(Englund et al. 1999)
sea-age at maturity
(Niemelä 2004)
These provide circumstantial evidence of within-river genetic structuring
Heterogeneity in nearly neutral and non-neutral genetic
marker allele frequencies (Verspoor et al. 2005; Landry and
Bernatchez, 2001)
1st aim of the study:
To assess the level of population
structuring of Atlantic salmon within the
Teno river system using neutral
microsatellite markers
Large sub-arctic river system:
Teno river
14 main tributaries to the mainstem
No stocking history
n=792 adult salmon
16 distinct sampling sites
Mainstem 15-30th August
Inferring populations
An objective population genetic study requires an
approach that does not rely on pre-defined
populations
Application of model-based methods is a promising
approach for objective and accurate identification of
populations driven solely by information in the genetic data.
Structure (Falush et al. 2003) and BAPS 3.2 (Corander et al. 2003)
Extensive genotyping effort
29 neutral, unlinked microsatellite loci
Computationally very intensive
Results: spatial genetic structuring
Generally, defining
populations by main tributaries
was observed to be a reasonable
approach in this large river
system
14 inferred
genetic clusters
corresponding to the
geographical topology
of the river
Vähä et al. (In press) Mol. Ecol.
Results: Genetic diversity and divergence
Mainstem and headwater populations vs. tributary
populations
Hybrid detection:
Under spatially structured population
Each main tributary fosters a highly diverged, unique
population, while mainstem and headwater
populations were genetically more diverse and less
diverged
How does the spatial population structure affect our
ability to detect hybrids of wild and farmed salmon?
Teno river system:
number of farm escapees
Teno mainstem lower (TmsL)
Teno mainstem upper (TmsU)
Pulmanki tributary population (Pul)
Genetic diversity of farm
escapees caught in the Teno river
14 microsatellite loci
                          Esc     TMSL    TMSU    Pul
Allele richness           9.86    9.63    8.65    6.93
Private allele richness   15.51   8.71    7.47    10.33
He                        0.79    0.74    0.74    0.67

Fst     Esc     TMSL    TMSU
TMSL    0.045
TMSU    0.048   0.023
Pul     0.108   0.098   0.097
Alta net-pen salmon
              Allele #       Allelic richness   HE             FST
              Teno   Alta    Teno    Alta       Teno   Alta    pairwise
Total         8.4    5.2     7.8     5.2        0.69   0.60    0.10***
Single or several strains?
Simulated 50 hybrids (TmsL x Esc)
Structure program
Hierarchical approach
Individuals of inferred clusters are omitted and Structure is
subsequently run on partitioned data
Partitioning was selected using the approach of Evanno et al. (2005)
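The Evanno et al. (2005) criterion picks the K with the largest ΔK, the absolute second-order change in log-likelihood between successive K values scaled by the standard deviation across replicate runs. The sketch below computes it from mean log-likelihoods per K, which is one common way of implementing the statistic; the function name and the log-likelihood values are hypothetical, not results from this study.

```python
from statistics import mean, stdev


def delta_k(log_liks):
    """Compute Evanno's delta K from replicate Structure runs.

    log_liks maps K -> list of Ln P(D) values from replicate runs.
    delta K(K) = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)),
    defined only where both neighbouring K values were run.
    """
    means = {k: mean(values) for k, values in log_liks.items()}
    result = {}
    for k in sorted(log_liks):
        if k - 1 in means and k + 1 in means:
            sd = stdev(log_liks[k])
            if sd > 0:
                second_diff = means[k + 1] - 2 * means[k] + means[k - 1]
                result[k] = abs(second_diff) / sd
    return result


if __name__ == "__main__":
    # Hypothetical replicate log-likelihoods for K = 1..4
    runs = {
        1: [-1000.0, -1002.0],
        2: [-900.0, -902.0],
        3: [-890.0, -894.0],
        4: [-888.0, -893.0],
    }
    dk = delta_k(runs)
    print(dk, "best K:", max(dk, key=dk.get))
```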
Hybrid detection:
Under spatially structured population
Number of inferred clusters, K=2
[Figure: Structure assignment for Escapees (n=71), TmsL (n=64), F1-hybrids (H, n=50), TmsU (n=76) and Pulmanki (n=69)]
Pulmanki: q>0.8; 68/69
Omitted from data and
Structure run subsequently
on partitioned data
Hybrid detection:
Under spatially structured population
Number of inferred clusters, K=3
[Figure: Structure assignment for Escapees (n=71), TmsL (n=64), F1-hybrids (H, n=50) and TmsU (n=76), plus one Pulmanki individual]
TmsU: q>0.8; 67/76
F1-hybrids: q>0.8; 1/50
Omitted from data and
Structure run subsequently
on partitioned data
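Hybrid detection:
Results
Number of inferred clusters, K=2
Escapees: q<0.2; 63/71
TmsL: q>0.8; 57/64
Rows: sample group (n); columns: group assigned by Structure (number of individuals)
pop (n)      Esc     TmsL    H       TmsU    Pul
Esc (71)     (63)            (8)
TmsL (64)    1       57      6
H (50)       10      13      26      1
TmsU (76)            7       2       67
Pul (69)                             1       68
Escapees assigned to the hybrid group (8): however, we know they are farm escapees
9 TmsU individuals and one Pulmanki individual assigned outside their own group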
Hybrid detection:
Results
Number of inferred clusters, K=2
Rows: sample group (n); columns: group assigned by Structure
pop (n)      Esc     TmsL    H       TmsU    Pul
Esc (71)
TmsL (64)    2 %     89 %    9 %
H (50)       20 %    26 %    52 %    2 %
TmsU (76)            9 %     3 %     88 %
Pul (69)                             1 %     99 %
9 TmsU individuals and one Pulmanki individual assigned outside their own group
28% of hybrids undetected
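The percentages on this slide follow from the assignment counts on the preceding Results slide. The sketch below recomputes them, taking rows as sample groups and columns as the group assigned by Structure; the counts are as read from the slide tables, and the variable and function names are hypothetical.

```python
GROUPS = ["Esc", "TmsL", "H", "TmsU", "Pul"]

# Rows: sample group; columns: assigned group, in the order of GROUPS
COUNTS = {
    "Esc": [63, 0, 8, 0, 0],
    "TmsL": [1, 57, 6, 0, 0],
    "H": [10, 13, 26, 1, 0],
    "TmsU": [0, 7, 2, 67, 0],
    "Pul": [0, 0, 0, 1, 68],
}


def row_percentages(counts):
    """Convert each row of assignment counts into percentages of the row total."""
    return {
        group: [100 * c / sum(row) for c in row]
        for group, row in counts.items()
    }


def hybrids_assigned_to_wild(counts):
    """Percentage of simulated hybrids assigned to a wild group (TmsL, TmsU or Pul)."""
    row = counts["H"]
    wild_columns = [GROUPS.index(g) for g in ("TmsL", "TmsU", "Pul")]
    return 100 * sum(row[i] for i in wild_columns) / sum(row)


if __name__ == "__main__":
    for group, pct in row_percentages(COUNTS).items():
        print(group, [round(p) for p in pct])
    print("hybrids assigned to wild groups (%):", hybrids_assigned_to_wild(COUNTS))
```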
Hybrid detection:
conclusions and future aspects
Population structure complicates hybrid detection
I considered only a subset of the salmon populations in Teno river
system
Smaller rivers, better success?
Increasing the number of loci would increase the hybrid
detection success
Selecting more diagnostic loci
Interlocus variation
Identification of ’domestication genes’ may enable the use of more
diagnostic markers in future
Thanks
Craig Primmer
supervising
Anti Vasemägi and Irma Saloniemi
Contemplating discussions
Jaakko Erkinaro and Eero Niemelä
providing samples
Maj and Tor Nessling Foundation
Providing funding
Results I
Efficiency of distinguishing between purebred and F1-hybrids
[Figure: Structure; purebred and F1-hybrid individuals; x-axis: number of loci]
• Hybridization and consequences
• Detecting hybrid individuals - challenges and solutions
• Aim of this work
• Methods
• Results on the efficiency of methods to detect hybrids
• Prospects for detecting hybrids between farmed and wild salmon in Teno
• Conclusions
• Acknowledgements
Results II
Distinguishing between purebred, F1-hybrids and backcrosses
[Figure legend: purebred, F1-hybrid, backcross]
Prospects for detecting hybrids
between farmed and wild salmon in Teno
• 44 historical (1972-1974) wild
salmon individuals
• 30 farm fish from a net-pen in
Altafjord
• 17 microsatellite loci
FST= 0.10
Prospects for detecting hybrids
between farmed and wild salmon in Teno
• Using the current genotyping system
92.6% of individuals were correctly
classified by both programs
• Results based on empirical data in
concordance with simulated data results
• The potential for detecting hybridization
between individuals from populations
with only moderate genetic divergence
is promising
Prospects for detecting hybrids
between farmed and wild salmon in Teno
• Increasing the number of loci
• More detailed analysis of borderline
individuals
• Selection of loci with highest assignment
efficiency
• Using linked loci
Inferring populations:
methods used
The underlying population structure within
the river system was deciphered using
clustering methods implemented in Structure
(Falush et al. 2003) and BAPS 3.2 (Corander et al. 2003)
The model pursues clustering solutions that minimize
Hardy-Weinberg and linkage disequilibrium
Hierarchical approach
Individuals of inferred clusters are omitted and Structure is
subsequently run on the partitioned data
Partitioning was selected using the approach of Evanno et al. (2005)
29 neutral, unlinked microsatellite loci
Results: spatial genetic structuring
Generally, defining
populations by main tributaries
was observed to be a reasonable
approach in this large river
system
In the mainstem the number
of inferred populations was less
than the number of distinct
sample sites
tributary populations:
14 inferred
genetic clusters
corresponding to the
geographical topology
of the river
The average increase in pairwise
FST measures was 0.016 (from
FST= 0.086 to FST= 0.102).
Results: Genetic diversity and divergence
Mainstem and headwater populations vs. tributary
populations
2nd aim of the study:
To identify life-history and ecological
variables best predicting the genetic diversity
of populations.
Dependence of allelic richness and sum of
private allelic richness on one life-history
variable and five landscape variables
Allelic richness is standardized genetic diversity to the smallest N
in a comparison
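Standardizing allelic richness to the smallest sample size is commonly done by rarefaction, and the sketch below shows that calculation for a single locus. Rarefaction is assumed here as the standardization method, since the slide does not name it; the function name and allele counts are hypothetical.

```python
from math import comb


def allelic_richness(allele_counts, n):
    """Expected number of distinct alleles in a random subsample of n gene copies.

    allele_counts: observed count of each allele at one locus in one population.
    n: number of gene copies to rarefy to (e.g. twice the smallest number of
       individuals in the comparison).
    Uses the rarefaction formula: sum over alleles of
    1 - C(N - N_i, n) / C(N, n), where N is the total number of gene copies.
    """
    total = sum(allele_counts)
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and the number of sampled gene copies")
    return sum(1 - comb(total - c, n) / comb(total, n) for c in allele_counts)


if __name__ == "__main__":
    counts = [30, 20, 8, 2]  # hypothetical allele counts, 60 gene copies
    print(allelic_richness(counts, 60))  # no rarefaction: all observed alleles
    print(allelic_richness(counts, 20))  # rarefied to 20 gene copies
```

Rare alleles contribute less than one to the rarefied value, which is what makes populations with different sample sizes comparable.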
Methods
Simulated populations
EASYPOP 1.8 (Balloux 2001)
TPM (0.30)
mutation rate 0.0001
Ne 1000
20 populations
panmictic population for 28 000 generations
migration scheme changed
equilibrium
Results: Sensitivity to proportion of hybrids
in the sample
[Figure: proportion of correctly classified parental and hybrid individuals, Structure and NewHybrids]
>95% correctly classified
Analysis with NewHybrids
λ = Jeffreys prior
α = Jeffreys prior
[Figure: posterior probability (%) of being wild, farm or F1-hybrid for simulated wild and F1-hybrid individuals; axis marks at 100 %, 90 % and 80 %; the values overlap]
Consequently, it is
advisable for any hybrid identification
study utilizing programs such as
Structure and NewHybrids to perform a
sensitivity analysis
in order
to gain insight into the level of accuracy
and power with which hybrids can be
distinguished from purebred individuals,
similar to that proposed by Blouin et al.
(1996) for estimating individual
relatedness
Prospects for assessing farm
escapee breeding success in
Teno
Using the two methods in tandem
in case of low hybridization
all hybrids were identified with only 4%
of wild individuals misclassified as
hybrids
Further improvements
larger number of loci
selection of loci with highest assignment
efficiency
using linked loci
Life-history and landscape variables
sampling
locale
Kevojoki
Tsarsjoki
Utsjoki
Maskejoki
Valjoki
Kuopp.joki
Pulmanki
Vetsijoki
Tana Bru
lower Teno
mid Teno
Utsjoki r.m.
Outakoski
Inarijoki
Iesjoki
Karasjoki
Inferred population
Proportion of MSW females
Mainstem or tributary
Accessibility of the site
Distance from the sea
Altitude (masl)
Kev
Tsa
Uts
Mas
Val
Kuo
Pul
Vet
1
1
1
4
2
2
2
3
4
4
4
4
1
1
1
1
1
1
1
1
2
2
2
2
5
6
4
2
4
5
2
4
1
3
3
3
136
139
142
58
188
131
85
109
38
80
106
109
100
230
105
65
180
225
20
200
10
45
62
67
4
3
5
5
2
2
2
2
4
4
4
4
180
261
269
298
120
170
240
220
TMSL
TMSU
Ies
Kar
Catchment area (km2)
474
232
946
595
531
102
743
691
7677
NA
1653
3147
2813
2206
Results of linear model:
Allelic richness
A significant proportion of variance in allelic richness
of populations was explained by three of the
variables:
‘MSW’ (R2=0.80, F1,10=38.9, p<0.0001)
‘catchment area’ (R2=0.64, F1,10=17.5, p=0.002)
‘mainstem/tributary’ (R2=0.64, F1,10=17.4, p=0.002)
The best model, explaining 86% of the
variation in genetic diversity, included both
‘the proportion of MSW female salmon’ in the
population and the ‘catchment area’ of the
river (R2=0.86, F2,9=27.9, p<0.0001)
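The reported R2 and F values are linked by the usual relation for a linear model, F = (R2 / df_model) / ((1 - R2) / df_residual). The sketch below applies it to the rounded R2 values on the slide; small differences from the reported F statistics are expected because R2 is given to two decimals only. The function name is hypothetical.

```python
def f_from_r2(r2: float, df_model: int, df_resid: int) -> float:
    """F statistic implied by R-squared for a linear model with an intercept."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must be in [0, 1)")
    return (r2 / df_model) / ((1.0 - r2) / df_resid)


if __name__ == "__main__":
    # (label, R2, model df, residual df, F reported on the slide)
    reported = [
        ("MSW", 0.80, 1, 10, 38.9),
        ("catchment area", 0.64, 1, 10, 17.5),
        ("MSW + catchment area", 0.86, 2, 9, 27.9),
    ]
    for label, r2, df1, df2, f_slide in reported:
        print(label, round(f_from_r2(r2, df1, df2), 1), "reported:", f_slide)
```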
Results of linear model:
private allelic richness
A significant proportion of variance in private allelic
richness of populations was explained by three of the
variables:
‘MSW’ (R2=0.61, F1,10=15.9, p=0.0026)
‘accessibility of the site’ (R2=0.52, F1,10=10.8, p=0.0082)
‘catchment area’ (R2=0.39, F1,10=6.4, p=0.030)
The best model, explaining 80% of the
variation, included both ‘accessibility of the site’
and ‘the proportion of MSW females’
(R2=0.80, F2,9=18.1, p<0.0007)
Summary and conclusions
Each main tributary fosters a highly diverged, unique
population, while mainstem and headwater populations
were genetically more diverse and less diverged
Population structure and variation in genetic diversity of
populations were poorly explained by geographical distance
In contrast, age structure was found to be the most predictive
variable in explaining the variation in the genetic diversity of
the populations.
This highlights the importance of multi-sea-winter fish for the
effective population size and ultimately for the genetic
diversity of the total population, possibly by leveling off the
annual fluctuations in population size.