SNP chips and whole genome sequence data: Do they tell us the same story?

SNP chip
BovineHD SNP with MAF > 0.05 in 176 sequenced GPE bulls

Most genomic variation is shared among all cattle
• 82% of BovineHD SNP detected in sequence from all CycleVII breeds
• 0.1% detected in only one CycleVII breed

Genome sequence
Coding sequence variants with MAF > 0.05 in sequence of 176 GPE bulls

Much coding sequence variation (SV) is not shared by all breeds
• 33% of SV detected in sequence from all CycleVII breeds
• 5% detected only in one CycleVII breed

SNP effects estimated in different breeds and crosses may be due to different QTL.

Much genomic variation occurs at moderate to high frequency
• mean MAF = 0.30
• half of BovineHD SNP have MAF > 0.3

Most SV occur at low frequency
• mean MAF = 0.21
• half of SV have MAF < 0.17

Similar MAF needed for SNP to be highly correlated (strong Linkage Disequilibrium; LD). Opportunity for SNP in strong LD with QTL may be limited, if QTL and SV distributions are similar.

Average correlations (LD) decrease with increased distance between SNP and QTL

Separation between HD-HD and HD-SV pairs in strong LD is greater than that suggested by average LD
• most close HD SNP in strong LD
  o some 0 LD
• most SV have 0 LD with close HD
  o some strong LD

On average, LD does decrease with increased separation. Average values do not reflect distributions of LD between close or distant SNP. QTL may not have strong LD with close SNP, but could be in LD with distant SNP.
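The statement that similar MAF is needed for strong LD can be illustrated with the standard upper bound on r2 between two biallelic loci. The sketch below is illustrative only and is not part of the poster's analysis: the function name max_r2 is hypothetical, and it assumes r2 is the squared haplotype correlation, with both frequencies given as minor allele frequencies in (0, 0.5].

```python
def max_r2(maf_a: float, maf_b: float) -> float:
    """Upper bound on r2 between two biallelic loci, given only their MAFs.

    Assumption (not from the poster): r2 is the squared correlation between
    alleles on haplotypes. With p <= q <= 0.5, the bound is reached when every
    haplotype carrying the rarer minor allele also carries the other one:
        r2_max = p * (1 - q) / ((1 - p) * q)
    """
    for maf in (maf_a, maf_b):
        if not 0.0 < maf <= 0.5:
            raise ValueError("MAF must be in the interval (0, 0.5]")
    p, q = sorted((maf_a, maf_b))  # p is the smaller of the two MAFs
    return (p * (1.0 - q)) / ((1.0 - p) * q)


if __name__ == "__main__":
    # Hypothetical comparison: variants of different MAF against a SNP at the
    # mean BovineHD MAF of 0.30 reported above.
    for variant_maf in (0.05, 0.17, 0.21, 0.30):
        bound = max_r2(variant_maf, 0.30)
        print(f"variant MAF {variant_maf:.2f} vs SNP MAF 0.30: max r2 = {bound:.2f}")
```

Under these assumptions the bound equals 1 only when the two MAFs match and shrinks as they diverge, so a low-frequency variant cannot reach strong LD with a moderate-frequency SNP regardless of how close the two are.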
Density matters
• BovineSNP50 designed for every (unknown) QTL to be within 50 Kb of a SNP, with SNP-QTL r2 > 0.30
• BovineHD should have every QTL within 3 Kb of SNP, r2 > 0.60

More than density matters
• 80% of SV do not have r2 > 0.30 with 50K SNP closer than 50 Kb
• 78% of SV do not have r2 > 0.60 with HD SNP closer than 3.5 Kb
• 80% of SV not in strong LD with any HD SNP
• 79% of SV have moderate LD with HD SNP, 74% with multiple HD SNP

Expected QTL-SNP correlations, based on average LD between close HD SNP, are not met by LD observed between HD SNP and variants detected in sequence.

Not meeting expectations has little impact on within-breed genomic predictions, as aggregate SNP effects capture contributions of many QTL correlated with SNP genotypes. Interpreting individual SNP effect estimates is impacted. The expectation that QTL are near SNP associated with phenotype has limited the search for candidate genes and variants affecting phenotype to a few thousand bases around significant SNP. Observed HD-SV LD suggests the search might be expanded to include genes a million bases away, and allow for several candidates correlated with one significant SNP. Also, differences between within-breed and across-breed HD-SV LD show why LD-dependent genomic predictions are not effective across breeds and crosses.

Dependence on LD for genomic prediction could be reduced by replacing SNP chip genotypes with genotypes for sequence variants likely to affect phenotype. Genomic sequence available from the 1000 bull genomes and other projects might be used to impute SV, but reported imputation accuracies are low, especially for low MAF variants. Other genotyping options include a functional variant assay developed by the University of Missouri, Neogen and Illumina, and custom chip or sequencing panels targeting specific variants.
Genomic prediction with these genotypes will still be affected by LD among these variants and QTL, but it may be possible to identify influential variants having consistent effects across several breeds and crosses.
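
Summaries such as "80% of SV do not have r2 > 0.30 with 50K SNP closer than 50 Kb" can be computed from genotypes roughly as sketched below. This is a minimal illustration, not the analysis behind the figures above: the function names, the toy positions and the toy genotypes are all hypothetical, and r2 is assumed here to be the squared Pearson correlation of 0/1/2 genotype dosages.

```python
from typing import List, Sequence, Tuple

# A variant is (position in bp, genotype dosages 0/1/2 across the same animals).
Variant = Tuple[int, Sequence[int]]


def genotype_r2(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic in this sample: r2 undefined, use 0
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def fraction_untagged(sequence_variants: List[Variant], chip_snps: List[Variant],
                      window_bp: int, r2_min: float) -> float:
    """Share of sequence variants with no chip SNP within window_bp at r2 > r2_min."""
    untagged = 0
    for sv_pos, sv_geno in sequence_variants:
        tagged = any(
            abs(snp_pos - sv_pos) <= window_bp
            and genotype_r2(sv_geno, snp_geno) > r2_min
            for snp_pos, snp_geno in chip_snps
        )
        if not tagged:
            untagged += 1
    return untagged / len(sequence_variants)


if __name__ == "__main__":
    # Toy data for six animals (hypothetical, for illustration only).
    chip = [(1_000, [0, 1, 2, 1, 0, 2]), (60_000, [2, 2, 1, 0, 0, 1])]
    variants = [
        (1_500, [0, 1, 2, 1, 0, 2]),   # identical to the first chip SNP
        (2_000, [0, 0, 0, 1, 0, 0]),   # rare variant, uncorrelated with it
        (59_000, [2, 2, 1, 0, 0, 1]),  # identical to the second chip SNP
    ]
    print(fraction_untagged(variants, chip, window_bp=50_000, r2_min=0.30))
```

Widening window_bp in such a calculation is one way to look for the more distant SNP-variant correlations described above, since a variant untagged by nearby SNP may still be correlated with SNP farther away.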