Genetic Diversity in Tasmanian Atlantic Salmon and
Prospects for GWAS and Genomic Prediction
James Kijas 1, Peter Kube 1, Brad Evans 2, Natasha Botwright 1,
Harry King 1, Craig Primmer 3, Klara Verbyla 1
1 AGRICULTURE
OUTLINE
1. Genomic Resources
- reference assembly, SNP chips
- opportunities
2. Tasmanian Atlantic Salmon
- population history
- selective breeding program
- objectives
3. Results
- genetic diversity, LD, imputation
4. Conclusions and Next Steps
1. Genomic Resources
Genomic Resources Include:
Reference Assembly (Lien et al Nature 2016)
• Haploid female
• 200 x ILMN, 4 x Sanger
• Genome duplication
Transcriptomes (Lien et al Nature 2016)
• 37,206 PCGs
SNP Genotyping Arrays
• 16K iSelect (2009)
• 130K, 200K Affy arrays (2014)
• Discovery panels European
Opportunities:
Genomic Prediction
• Incorporates genotypes in BV estimation
• Speed genetic gain, success in livestock
• Highly relevant to our Salmon industries
GWAS (Barson et al Nature 2015)
• Major genes
2. Tasmanian Atlantic Salmon
Population History
Transfer of animals in 1964 / 1965
Breeding Program from 2001
• Earliest consequences of domestication, GxE
• Founder effect
• Genome and SNP platforms from European stocks
The Tasmanian salmon breeding cycle
• DNA taken from every fish
• One-year-old smolt are tagged and fin-clipped
• Fin-clips into 2 ml tubes, DNA for pedigree plus..
• Tags allow automated data capture and fail safe systems
• Any idiot can be called upon to help at spawning...
Selective Breeding Program (SBP)
Challenges
• Early maturation
• Sex determination
• All female commercial animals
Focus Traits
• Growth rate
• Amoebic Gill Disease
• Flesh quality, other minor traits
NEXT STEPS
- Genomic Prediction
- GWAS for key traits
- Impact of domestication and selection
Objectives
1: Evaluate Levels of Genetic Diversity
- population comparison
- across year classes since inception of SBP
2: Measure extent of linkage disequilibrium → GWAS
3: Imputation → ongoing GP program
3. RESULTS
Materials and Methods:
782 fish from the SBP
Genotyped using custom 220K Affymetrix array (AquaGen, CiGene)
Data QC and filtering (call and sample rate)
777 fish with high quality SNP data
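The call-rate and sample-rate filtering step can be illustrated with a minimal Python sketch. The thresholds, the order of the two filters, and the function names `call_rate` and `qc_filter` are assumptions made for illustration; the slides do not state the values or software actually used.

```python
def call_rate(calls):
    """Fraction of non-missing genotype calls; None marks a missing call."""
    return sum(c is not None for c in calls) / len(calls)


def qc_filter(genotypes, min_snp_rate=0.95, min_sample_rate=0.95):
    """Return (kept_fish_indices, kept_snp_indices).

    genotypes: one list per fish, each holding 0/1/2 allele counts or None.
    Both thresholds are hypothetical defaults, not the values used in the study.
    """
    n_snp = len(genotypes[0])
    # Step 1 (assumed order): drop SNP that fail in too many fish.
    kept_snp = [j for j in range(n_snp)
                if call_rate([fish[j] for fish in genotypes]) >= min_snp_rate]
    if not kept_snp:
        return [], []
    # Step 2: drop fish with too many missing calls at the surviving SNP.
    kept_fish = [i for i, fish in enumerate(genotypes)
                 if call_rate([fish[j] for j in kept_snp]) >= min_sample_rate]
    return kept_fish, kept_snp


toy = [
    [0, 1, 2, None],
    [1, 1, None, None],
    [2, 0, 1, None],
    [None, None, None, None],
]
print(qc_filter(toy, min_snp_rate=0.5, min_sample_rate=0.6))
# prints ([0, 1, 2], [0, 1, 2])
```

In the toy data the last SNP and the last fish are removed, in the same way that a small number of fish can fail QC on a real array.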
Genetic Diversity: Polymorphism
PN: Proportion of polymorphic loci
HE1: Exp. Heterozygosity, all SNP
HE2: Exp. Heterozygosity, poly. SNP
DST: Average pairwise distance

Population  Location  Status  N    SNP     PN     HE1    HE2    DST    Source
TAS         Tasmania  Farmed  782  218132  0.537  0.119  0.222  0.200  This study
FIN_55      Finland   Wild    137  208704  0.999  0.381  0.381  0.313  Barson et al. 2015
FIN_56      Finland   Wild    326  208704  0.999  0.380  0.380  0.310  Barson et al. 2015

Very high rate of monomorphism in the TAS population
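The three summary statistics PN, HE1 and HE2 are linked: averaging heterozygosity over all SNP equals PN times the average over polymorphic SNP (for TAS, 0.537 × 0.222 ≈ 0.119). The sketch below shows one way to compute them from allele frequencies. It assumes the simple per-SNP estimator 2p(1-p); the slides do not say which estimator was used, and `diversity_summary` is a hypothetical name.

```python
def diversity_summary(allele_freqs):
    """Return (PN, HE over all SNP, HE over polymorphic SNP).

    allele_freqs: frequency p of one allele at each SNP.
    Assumes expected heterozygosity 2p(1-p) per SNP (no small-sample correction).
    """
    n = len(allele_freqs)
    het = [2 * p * (1 - p) for p in allele_freqs]
    # A SNP is polymorphic when both alleles are present (0 < p < 1).
    poly_het = [h for p, h in zip(allele_freqs, het) if 0 < p < 1]
    pn = len(poly_het) / n
    he_all = sum(het) / n
    he_poly = sum(poly_het) / len(poly_het) if poly_het else 0.0
    return pn, he_all, he_poly


pn, he_all, he_poly = diversity_summary([0.0, 0.5, 1.0, 0.25])
print(pn, he_all, he_poly)  # prints 0.5 0.21875 0.4375
```

Monomorphic SNP contribute zero heterozygosity, which is why a high rate of monomorphism pulls HE1 well below HE2.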
Genetic Diversity: MAF
Allele Frequency Distribution: 106 K SNP
[Figure: two histograms, TAS and FIN, of Proportion of SNP per Minor Allele Frequency Bin, in bins of 0.05 from 0.0 0.05 to 0.45 0.50, plus ≥ 0.5]
55% of polymorphic SNP had MAF 15%
Likely ascertainment bias in SNP collection
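A minor allele frequency histogram with these bins could be produced along the following lines. This is a sketch under assumptions: bins are treated as half-open intervals, results are reported as percentages, and the names `maf` and `maf_histogram` are hypothetical. Exact fractions are used so that values sitting on a bin edge (for example 0.15) are not misplaced by floating-point rounding.

```python
from fractions import Fraction

BIN_WIDTH = Fraction(1, 20)  # 0.05-wide bins, as on the figure's x-axis


def maf(allele_count, total_alleles):
    """Minor allele frequency as an exact fraction."""
    p = Fraction(allele_count, total_alleles)
    return min(p, 1 - p)


def maf_histogram(allele_counts):
    """Percentage of SNP per bin: [0, 0.05), ..., [0.45, 0.50), then >= 0.5.

    allele_counts: (count of one allele, total alleles scored) per SNP.
    """
    bins = [0] * 11
    for count, total in allele_counts:
        k = min(int(maf(count, total) / BIN_WIDTH), 10)
        bins[k] += 1
    n = len(allele_counts)
    return [100 * b / n for b in bins]


toy = [(1, 100), (15, 100), (50, 100), (90, 100)]
print(maf_histogram(toy))
# MAF values 0.01, 0.15, 0.50 and 0.10 fall in bins 0, 3, 10 and 2
```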
Genetic Diversity: Inbreeding F
Reduction in heterozygosity compared with HWE
Higher inbreeding, higher F values (PLINK v1.p)
Expect SBP to affect F
[Figure: Distribution of Individual Inbreeding Coefficient (F). Proportion of Animals (%) per Inbreeding Coefficient (F) Bin, comparing Founders 2001 - 2003 (n=131) with the 2010 Year Class (n=100)]
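The idea of F as a reduction in heterozygosity relative to Hardy-Weinberg expectation can be written as F = (observed homozygotes - expected homozygotes) / (SNP scored - expected homozygotes). The sketch below implements that method-of-moments form for a single fish. It is an illustration only, not a reproduction of PLINK's exact calculation, and `inbreeding_f` is a hypothetical name.

```python
def inbreeding_f(genotypes, allele_freqs):
    """Method-of-moments inbreeding coefficient for one fish.

    genotypes: 0/1/2 allele counts per SNP (None = missing call).
    allele_freqs: population allele frequency p at each SNP.
    Monomorphic SNP and missing calls are skipped.
    """
    observed_hom = 0
    expected_hom = 0.0
    n = 0
    for g, p in zip(genotypes, allele_freqs):
        if g is None or p <= 0 or p >= 1:
            continue
        n += 1
        observed_hom += (g != 1)            # genotype 0 or 2 is homozygous
        expected_hom += 1 - 2 * p * (1 - p)  # HWE expectation at this SNP
    if n == 0:
        return None
    return (observed_hom - expected_hom) / (n - expected_hom)


freqs = [0.5, 0.5, 0.5, 0.5]
print(inbreeding_f([0, 2, 0, 2], freqs))  # prints 1.0 (fully homozygous)
print(inbreeding_f([0, 1, 2, 1], freqs))  # prints 0.0 (matches HWE)
print(inbreeding_f([1, 1, 1, 1], freqs))  # prints -1.0 (excess heterozygosity)
```

Higher F means more homozygosity than expected, so a shift of the year-class distribution to the right would indicate accumulating inbreeding.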
Linkage Disequilibrium
• LD as r2 between SNP pairs, e.g. SNP1 (A/T) and SNP2 (G/C); range 0 - 1
Adjacent SNP pair spacing on the chip
• Min Gap: 1 bp
• Max Gap: 450,156 bp
• Average Gap: 22 Kb +/- 35 Kb
[Figure: Ssa01, histogram of Number of SNP Pairs against Distance Between Adjacent SNP (Kb)]
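For unphased array genotypes, r2 between two SNP is often approximated by the squared correlation of their allele counts. The sketch below uses that approximation, which is an assumption here since the slides do not say how r2 was computed; `r_squared` is a hypothetical name.

```python
def r_squared(snp_a, snp_b):
    """Squared Pearson correlation of allele counts (0/1/2) at two SNP.

    Fish with a missing call (None) at either SNP are skipped.
    Returns None when r2 is undefined (no data, or a monomorphic SNP).
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return None
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return None
    return cov * cov / (var_a * var_b)


print(r_squared([0, 1, 2, 1], [0, 1, 2, 1]))  # prints 1.0 (complete LD)
print(r_squared([0, 2, 0, 2], [0, 0, 2, 2]))  # prints 0.0 (no association)
```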
Linkage Disequilibrium
LD Decay Comparing Two SNP sets

SNP Set Properties              All SNP   High MAF SNP
SNP Number                      106,492   21,372
MAF average                     0.167     0.337
Average SNP Pair Distance       88,906    200,082
Total SNP Pairs                 932,726   126,664
Average Pairs Per Distance Bin  1,865     254

[Figure: LD decay, Linkage Disequilibrium (r2) against Marker Distance (Kb) from 0 to 500, one curve each for All SNP and High MAF SNP]
Very high LD at short distances; ascertainment bias not the cause.
Linkage Disequilibrium: Average r2 Across Distance Bins

                Tasmanian Population                    Finnish Population
Marker          106K SNP  21K SNP   106K SNP  106K SNP  106K SNP  21K SNP
Distance (kb)   All Fish  All Fish  Males     Females   All Fish  All Fish
0 to 10         0.540     0.441     0.541     0.541     0.037     0.032
10 to 20        0.412     0.361     0.414     0.414     0.027     0.025
20 to 30        0.363     0.327     0.366     0.366     0.024     0.022
30 to 40        0.334     0.303     0.335     0.335     0.022     0.021
40 to 50        0.312     0.285     0.314     0.314     0.021     0.020
50 to 100       0.270     0.248     0.272     0.272     0.019     0.017
100 to 200      0.211     0.215     0.212     0.213     0.017     0.016
200 to 300      0.171     0.177     0.173     0.173     0.016     0.014
300 to 500      0.131     0.153     0.133     0.133     0.015     0.014
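Averages of this kind can be obtained by grouping pairwise r2 values by the physical distance between the two SNP. The sketch below uses the nine distance bins from the table; treating each bin as a half-open interval and the name `mean_r2_by_bin` are assumptions for illustration.

```python
BIN_EDGES_KB = [0, 10, 20, 30, 40, 50, 100, 200, 300, 500]


def mean_r2_by_bin(pairs, edges=BIN_EDGES_KB):
    """Average r2 per distance bin.

    pairs: (distance in kb, r2) for each SNP pair.
    Each bin is [lower, upper); pairs beyond the last edge are ignored.
    Bins with no pairs map to None.
    """
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for distance, r2 in pairs:
        for k in range(n_bins):
            if edges[k] <= distance < edges[k + 1]:
                sums[k] += r2
                counts[k] += 1
                break
    return {
        f"{edges[k]} to {edges[k + 1]}": (sums[k] / counts[k] if counts[k] else None)
        for k in range(n_bins)
    }


toy_pairs = [(5, 0.6), (8, 0.4), (15, 0.3), (250, 0.1), (600, 0.9)]
for label, value in mean_r2_by_bin(toy_pairs).items():
    print(label, value)
```

Running the same binning separately for each SNP set or sex gives one column of the table per run.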
LD dramatically higher in TAS (farmed) versus FIN (wild) population.
Should translate into high power for GWAS to tag haplotypes.
Imputation: How many SNP do we need to impute with accuracy?

Year Class  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011
N           31    42    58    12    81    107   120   123   97    100   6

Reference Panel: 78 K SNP, 574 fish (equal to the combined 2001 to 2008 year classes)
Test Panel: 203 fish (equal to the combined 2009 to 2011 year classes); set SNP to missing to create 4 panels

IMPUTATION ACCURACY
Test panel  Imputed to  Accuracy
0.5 K SNP   78 K SNP    89%
1 K SNP     78 K SNP    92%
3 K SNP     78 K SNP    96%
5 K SNP     78 K SNP    97%

Means we can deploy GP with low density SNP genotyping and imputation.
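The masking design can be sketched as follows: keep an evenly spaced subset of SNP in the test fish, impute the rest with whatever software is in use, then score the imputed calls against the true genotypes. The slides name neither the imputation software nor the accuracy metric, so genotype concordance, even SNP spacing, and the names `low_density_panel` and `imputation_concordance` are all assumptions.

```python
def low_density_panel(n_snp, n_keep):
    """Indices of roughly n_keep SNP spaced evenly along n_snp ordered SNP."""
    step = n_snp / n_keep
    return sorted({int(k * step) for k in range(n_keep)})


def imputation_concordance(true_genotypes, imputed_genotypes, kept):
    """Fraction of masked genotypes that were imputed correctly.

    true_genotypes / imputed_genotypes: one list of 0/1/2 calls per fish.
    kept: SNP indices left unmasked in the test panel; these are not scored.
    True calls that are missing (None) are skipped.
    """
    kept = set(kept)
    total = 0
    correct = 0
    for true_row, imputed_row in zip(true_genotypes, imputed_genotypes):
        for j, (t, i) in enumerate(zip(true_row, imputed_row)):
            if j in kept or t is None:
                continue
            total += 1
            correct += (t == i)
    return correct / total if total else None


print(low_density_panel(10, 5))  # prints [0, 2, 4, 6, 8]
truth = [[0, 1, 2, 1, 0, 2]]
imputed = [[0, 1, 2, 0, 0, 2]]
print(imputation_concordance(truth, imputed, kept=[0, 2, 4]))  # 2 of 3 masked calls match
```

Repeating the scoring for each low-density panel gives one accuracy figure per panel, which is how a density-versus-accuracy comparison can be assembled.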
4. CONCLUSIONS
Genetic Diversity
• Diversity levels consistent with population history
• Genome sequencing to provide an unbiased comparative estimate
Linkage Disequilibrium
• High LD at short to medium physical distances
• Good for detecting gene effects with medium density SNP chips
• Critical intervals likely to be large
Imputation
• Accuracies high due to population history
• Good for delivery of genomic prediction using low density SNP panels ($)
Next Steps:
• Characterisation of the sex determination locus
• Biological understanding of unwanted early maturation
• Consequences of domestication on genome variation
Completed 30 x genome sequencing of 20 SBP fish
Acknowledgements
Brad Evans
Peter Kube
Natasha Botwright
Harry King
Klara Verbyla
Craig Primmer