SNP chips and whole genome sequence data: Do they tell us the same story?
SNP chip
BovineHD SNP with MAF > 0.05 in 176
sequenced GPE bulls
Most genomic variation is shared among all
cattle
• 82% of BovineHD SNP detected in
sequence from all CycleVII breeds
• 0.1% detected in only one CycleVII
breed
Genome sequence
coding sequence variants with MAF > 0.05
in sequence of 176 GPE bulls
Much coding sequence variation (SV) is not
shared by all breeds
• 33% of SV detected in sequence from
all CycleVII breeds
• 5% detected only in one CycleVII
breed
SNP effects estimated in different breeds and crosses may be due to different QTL.
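The sharing percentages above can be tallied from a table of which breeds each variant was detected in. The sketch below is illustrative only: the function name `sharing_summary`, the breed labels, and the toy variants are hypothetical and are not taken from the GPE data.

```python
def sharing_summary(breeds_by_variant, all_breeds):
    """Return (fraction detected in every breed, fraction detected in exactly one breed).

    breeds_by_variant: dict mapping a variant id to the breeds in which it was detected.
    all_breeds: the breeds being compared (hypothetical labels in the example below).
    """
    all_breeds = set(all_breeds)
    n = len(breeds_by_variant)
    in_all = sum(1 for b in breeds_by_variant.values() if all_breeds <= set(b))
    in_one = sum(1 for b in breeds_by_variant.values() if len(set(b) & all_breeds) == 1)
    return in_all / n, in_one / n


# Toy example with three hypothetical breeds and four hypothetical variants.
toy = {
    "v1": {"A", "B", "C"},  # shared by all breeds
    "v2": {"A", "B", "C"},  # shared by all breeds
    "v3": {"A"},            # private to one breed
    "v4": {"A", "C"},       # shared by some breeds
}
frac_all, frac_one = sharing_summary(toy, ["A", "B", "C"])
print(frac_all, frac_one)
```

Running the same tally separately for chip SNP and for coding sequence variants gives the two contrasting pairs of percentages reported in the panels.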
Much genomic variation occurs at moderate to
high frequency
• mean MAF = 0.30
• half of BovineHD SNP have MAF > 0.3
Most SV occur at low frequency
• mean MAF = 0.21
• half of SV have MAF < 0.17
Similar MAF needed for SNP to be highly correlated (strong Linkage Disequilibrium;
LD). Opportunity for SNP in strong LD with QTL may be limited, if QTL and SV
distributions are similar.
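Why similar MAF is needed can be made concrete with the standard upper bound on r2 for two biallelic loci: with minor allele frequencies p <= q <= 0.5, r2 cannot exceed p(1 - q) / (q(1 - p)). The sketch below evaluates that bound; the function name `max_r2` is an assumption, and pairing the mean HD MAF with the median SV MAF is for illustration only.

```python
def max_r2(maf_a, maf_b):
    """Upper bound on r2 between two biallelic loci with the given minor allele frequencies."""
    lo, hi = sorted((maf_a, maf_b))
    if not 0 < lo <= hi <= 0.5:
        raise ValueError("minor allele frequencies must be in (0, 0.5]")
    # Maximum D is lo * (1 - hi); r2 = D^2 / (lo * (1 - lo) * hi * (1 - hi)).
    return lo * (1 - hi) / (hi * (1 - lo))


print(round(max_r2(0.30, 0.30), 3))  # equal MAF: the bound is 1.0
print(round(max_r2(0.30, 0.17), 3))  # chip SNP at 0.30 vs variant at 0.17: about 0.478
print(round(max_r2(0.30, 0.05), 3))  # chip SNP at 0.30 vs variant at 0.05: about 0.123
```

Under this bound, a SNP at MAF 0.30 cannot reach r2 > 0.60 with a variant at MAF 0.17, however close the two are on the chromosome.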
Average correlations (LD) decrease with
increased distance between SNP and QTL
Separation between HD-HD and HD-SV pairs
in strong LD is greater than that suggested by
average LD
• most close HD SNP in strong LD
  o some 0 LD
• most SV have 0 LD with close HD
  o some strong LD
On average, LD does decrease with increased separation. Average values do not
reflect distributions of LD between close or distant SNP. QTL may not have strong LD
with close SNP, but could be in LD with distant SNP.
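One way to see how an average can hide the distribution is to report, for each distance bin, the fraction of pairs in strong LD alongside the mean r2. The sketch below assumes pairwise r2 values are already available as (distance in bp, r2) tuples; the function name `summarize_ld`, the bin edges, the 0.6 cutoff for "strong" and the toy values are all assumptions, not values from the poster's data.

```python
from statistics import mean


def summarize_ld(pairs, bin_edges, strong=0.6):
    """Summarize LD by distance bin.

    pairs: list of (distance_bp, r2) tuples.
    bin_edges: increasing distances in bp; bin i covers [edge_i, edge_i+1).
    Returns a list of (low, high, n_pairs, mean_r2, fraction_strong) per bin.
    """
    rows = []
    for low, high in zip(bin_edges[:-1], bin_edges[1:]):
        vals = [r2 for dist, r2 in pairs if low <= dist < high]
        if not vals:
            rows.append((low, high, 0, None, None))
            continue
        frac_strong = sum(v >= strong for v in vals) / len(vals)
        rows.append((low, high, len(vals), mean(vals), frac_strong))
    return rows


# Made-up pairs: close pairs are mostly strong with one at zero,
# distant pairs are mostly near zero with one strong.
toy_pairs = [(1000, 0.9), (2000, 0.0), (2500, 0.95),
             (60000, 0.05), (80000, 0.7), (90000, 0.0)]
for row in summarize_ld(toy_pairs, [0, 50_000, 1_000_000]):
    print(row)
```

In the toy data the mean falls with distance, yet one close pair has zero LD and one distant pair is in strong LD, which is the pattern the mean alone would not show.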
Density matters
• BovineSNP50 designed for every
(unknown) QTL to be within 50 Kb of a
SNP, with SNP-QTL r2 > .30
• BovineHD should have every QTL
within 3 Kb of SNP, r2 > .60
More than density matters
• 80% of SV do not have r2 > 0.30 with
50K SNP closer than 50 Kb
• 78% of SV do not have r2 > 0.60 with
HD SNP closer than 3.5 Kb
• 80% of SV not in strong LD with any
HD SNP
• 79% of SV have moderate LD with HD
SNP, 74% with multiple HD SNP
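Percentages of this kind can be obtained by asking, for each SV, whether any chip SNP inside a distance window exceeds an r2 threshold. The sketch below is a minimal illustration, not the analysis behind the figures above: it assumes genotypes coded as 0/1/2 allele counts, uses the squared Pearson correlation of those counts as r2, and the function names, positions and genotypes are hypothetical.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic in this sample: no correlation defined
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def fraction_untagged(svs, snps, window_bp, min_r2):
    """Fraction of SV with no SNP within window_bp having r2 > min_r2.

    svs, snps: lists of (position_bp, genotypes) on the same chromosome.
    """
    untagged = 0
    for sv_pos, sv_geno in svs:
        tagged = any(
            abs(pos - sv_pos) <= window_bp and genotype_r2(sv_geno, geno) > min_r2
            for pos, geno in snps
        )
        untagged += not tagged
    return untagged / len(svs)


# Toy data: the second SV is uncorrelated with the nearby SNP
# but identical to a SNP about 900 kb away.
toy_snps = [(1_000, [0, 1, 0, 1, 0, 1]), (900_000, [0, 0, 1, 1, 0, 0])]
toy_svs = [(1_500, [0, 1, 0, 1, 0, 1]), (3_000, [0, 0, 1, 1, 0, 0])]
print(fraction_untagged(toy_svs, toy_snps, 50_000, 0.30))     # 0.5
print(fraction_untagged(toy_svs, toy_snps, 1_000_000, 0.30))  # 0.0
```

Widening the window from 50 kb to 1 Mb picks up the distant SNP in the toy data, which mirrors the case of a variant with no strong LD nearby but LD with a SNP far away.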
Expected QTL-SNP correlations, based on average LD between close HD SNP, are not met by
LD observed between HD SNP and variants detected in sequence. This shortfall has little
impact on within-breed genomic predictions, because aggregate SNP effects capture
contributions of many QTL correlated with SNP genotypes. It does affect the interpretation
of individual SNP effect estimates. The expectation that QTL are near SNP associated with
phenotype has limited the search for candidate genes and variants affecting phenotype to a
few thousand bases around significant SNP. Observed HD-SV LD suggests the search might be
expanded to include genes a million bases away, and to allow for several candidates
correlated with one significant SNP. Differences between within-breed and across-breed
HD-SV LD also show why LD-dependent genomic predictions are not effective across breeds and
crosses.
Dependence on LD for genomic prediction could be reduced by replacing SNP chip genotypes
with genotypes for sequence variants likely to affect phenotype. Genomic sequence available
from the 1000 Bull Genomes and other projects might be used to impute SV, but reported
imputation accuracies are low, especially for low MAF variants. Other genotyping options
include a functional variant assay developed by the University of Missouri, Neogen and Illumina,
and custom chip or sequencing panels targeting specific variants. Genomic prediction with
these genotypes will still be affected by LD among these variants and QTL, but it may be
possible to identify influential variants having consistent effects across several breeds and
crosses.