Genomic Selection Theory 11 July 2012 Jean-Luc Jannink 1 Acknowledgments • Martha T. Hamblin, Nicolas Heslot, Jeff Endelman, Jessica Rutkoski, Delphine Ly, Julie Dawson, Mark Sorrells • Vanessa Weber, Gary Atlin, Albrecht Melchinger • Kevin P. Smith, Vikas Vikram, Ahmad H. Sallam, Aaron J. Lorenz • Moshood A. Bakare, Melaku Gedil, Ismail Rabbi, Peter Kulakow Outline • What is genomic selection? – Minimal model – Prediction accuracy • Number of markers • Size of the training population – Should you replicate lines in the TP? – Population structure • Relationship between the training population and the selection candidates Genomic selection: Prediction using many markers Breeding Material Genotyping Calculate GEBV Make Selections Meuwissen et al. 2001 Genetics 157:1819-1829 Genomic selection principles • Meuwissen et al. 2001 Genetics 157:1819-1829 • No distinction between “significant” and “nonsignificant”: all markers contribute to prediction • (–) More markers than there are phenotypes • (+) Estimated effects are unbiased • (+) Capture small effects Statistical modeling: The two cultures X Observed inputs Nature Can we understand Y? X Regression Observed responses Y Identify causal inputs Y Can we predict Y? X ? Regression Decision trees Whatever works Breiman 2001 Stat. Sci. 16:199-231 Y Baseline model This is an allele dosage! 7 Marker effects –> additive relationship 8 Prediction accuracy = Correlation(predicted, true) R = irAσA rA = corr(selection criterion, breeding value) • On simulated data corr(Â, A) is easy • On real data: Prediction accuracy = Correlation(predicted, true) R = irAσA rA = corr(selection criterion, breeding value) • On simulated data corr(Â, A) is easy • On real data: Effective population size • Idealized populations randomly mate (real populations don’t) • Increasing N weakens drift and strengthens selection force to shift allele frequencies • Those forces have a certain strength in a real population • Ne summarizes that with reference to an idealized population Why Ne matters • As Ne increases “historical recombination” increases – Segments stay in the population longer before being eliminated or fixed – Recombination generates segments that segregate independently – Expectation: 2Ne effective segments per Morgan E(r2) = 1/(1 + 4Nec) E(r2) = (5 + 2Nec)/(11 + 26Nec + 8[Nec]2) Marker density requirements Solberg et al. 2008 Markers per Morgan 0.9 Accuracy 0.8 0.7 0.6 0.5 SNP: 1 Ne 2 Ne 4 Ne 8 Ne SSR: ¼ Ne ½ Ne 1 Ne 2 Ne Calus & Veerkamp 2007 • Average adjacent marker LD should be r2 = 0.20 • Implies a density of 4Ne markers per Morgan Training population size requirements • Daetwyler, H.D. et al. 2008. Accuracy of Predicting the Genetic Risk of Disease Using a Genome-Wide Approach. PLoS ONE 3:e3395 • Assume all loci affecting the trait are known and are independent • Assume marker effects are fixed λ 20 10 5 2 1 0.5 0.1 0.02 Replicating hurts: 2000 with 1 plot is better than 1000 with 2 plots To replicate or not to replicate 500 Lines replicated once Ridge Regression 168 Lines replicated three times BayesB More lines –> higher pressure Mean of selected poplua on Mean of selected population 1.40 1.30 h2 = 0.7 h2 = 0.7 1.20 h2 = 0.2 1.10 1.00 0.90 h2 = 0.2 0.80 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Fraction replicated Frac on replicated 0.8 0.9 1 The importance of structure • Correlation between prediction and phenotype for Grain Yield, Anthesis Date, and Anthesis–Silking Interval in maize • CIMMYT diversity panel • 50K SNPs; TP size: 200; VP size 50 GY: 0.44 AD: 0.45 ASI: 0.36 • Structure: 8 Subpopulations – Use 200 to estimate subpopulation mean then prediction is mean for subpopulation of origin GY: 0.50 AD: 0.44 ASI: 0.46 • Fraction of the variation due to subpopulation GY: 0.26 AD: 0.16 ASI: 0.27 Accuracy VP The importance of being in relationship TP Clark et al. GSE 2012 44:4 Mean Max Relationship As it might play out in alfalfa Clonally propagate & Intercross Clonally propagate ? Intercross 50 ½-Sib families 800 Plants 50 Plants Phenotype 800 Families Genotype & Select Update Model 1 Cycle / Year (?) 3 Years to Phenotype (?) Take home messages • This approach is totally feasible now – Statistical and marker technologies are easily powerful enough – Logistics and informatics are a challenge • Planning the training population is most critical • Expect some outcomes to be non-intuitive • Don’t trust a theoretician: go out and do it Questions? Empirical vignettes • Genomic selection can work for a difficult crop: Cassava • We have a field-validated success in barley Is cassava a little like a forage? • Cassava is a “young” crop (still a bit wild) • Cassava is outcrossed and clonally propagated Phenotypic data • • • • DNA from 623 Clones important at IITA ~ 50,000 plots of historic data Very high differences in replication 17 traits (disease, morphology, yield) GBS markers • 4984 SNPs • Average of 25.5% missing data per marker • Calling heterozygotes is a challenge Validation method • Cross validation: Predict subsets of the data that did not contribute to building the model Genomic Prediction Model Accuracy genomic predic on Comparison to phenotypic accuracy 0.8 0.6 CMD 0.4 0.2 0.0 0 0.2 0.4 0.6 0.8 Accuracy single phenotypic plot Higher accuracy would be good but • GS will reduce the breeding cycle time from 5 to 2 years [2.5 × Faster] • Any accuracy above 0.4 cannot be beat by phenotypic selection • Biases make phenotypic look better and genomic look worse than reality Barley Fusarium head blight FHB resistance is polygenic 8 10 13-14 2-4 3-4 1(7H) 3-4 5-8 12-14 3 5(1H) Frederickson 13 15-16 6-8 4-6 Chevron 11-12 3(3H) 2(2H) 13 4(4H) 6 Harbin 1 6(6H) CI4196 Zhedar 2 9-10 13 7(5H) Gobernadora Russian 6 Genomic selection set up • • • • • • • Three breeding programs 685 barley (6-row) lines Three years, 2 locations 1500 SNPs Choose 384 optimized for PIC & distribution 1440 progeny from 60 crosses “Project” parental SNP onto progeny Head-to-head comparison Crossing AxB Yr 1 F1 Inbreeding F2 F3 Yr 2 F4 Winter Nursery F4:5 Plant Rows Phenotypic Selection Crossing AxB F1 Inbreeding F2 F3 F4 Winter Nursery Yr 3 Evaluation Yr 1 Evaluation Genomic Selection Yr 2 40% 50% 60% 70% 80% 90% 100% 110% 120% 130% 140% 150% 160% 170% 180% 190% 200% 210% 220% 10 8 10 40% 50% 60% 70% 80% 90% 100% 110% 120% 130% 140% 150% 160% 170% 180% 190% 200% 210% 220% Phenotypic vs. Genomic Selection FHB Yield 16 30 14 12 25 99.7 86.2 Phenotypic 6 4 2 0 20 15 Genomic 5 0 20 15 15 10 5 0 90.6 10 5 0 30 25 20 100.7
© Copyright 2026 Paperzz