From Phenotype to Genotype and Back Again: Animal Genomics Enabling Prediction

Alan L. Archibald
The Roslin Institute and Royal (Dick) School of Veterinary Studies
University of Edinburgh

Genotype - phenotype
• Aim
  – To predict outcomes
    • Efficacy of drug
    • Susceptibility to cancer
    • Performance of daughters of elite dairy bull
    • Susceptibility to nematode infections
• Discovery
  – From phenotype to genotype (gene)
• Prediction
  – From phenotype to genotype (breeding value)
  – From genotype to phenotype
  – From sequence to consequence

Timeline
• 1920s and 30s: Fisher, Lush and others; population genetics
• 1953: Watson and Crick
• 1970s onwards: Advances in quantitative analysis
• 1977: DNA sequenced (ΦX174, 5,386 nt)
• 1990: Human Genome Project launched
• 1990s onwards: Quantitative trait locus (QTL) mapping
• 1991: PiGMaP project starts
• 2001: Draft human genome sequence
• 2001: Genomic selection proposed
• Undated on the slide: Animal model, infinitesimal model; ‘Halothane’ gene test; Marker Assisted Selection (MAS)
• Additional year markers where the timeline is shown again: 1962, 2002

PiGMaP – 25 years old
• €1.2 million
• Linkage / recombination map
• Physical / cytogenetic map
• Comparative map
• cDNA (ESTs)
• Microsatellites

Prediction success
• Selective animal breeding
• Animal model
• Phenotypic selection
• Prediction of breeding value (genotype) from phenotype
• Successful EU companies, e.g.
AquaGen, Aviagen, Cherry Valley, Cogent, CRV, Genus, JSR Genetics, Hendrix-Genetics, Landcatch Natural Selection, Topigs Norsvin

• 50% more pigs: 14 pigs/yr → 21 pigs/yr
• 33% less feed: 410 kg feed / pig → 273 kg feed / pig
• 33% more lean: 34 kg lean / pig → 45 kg lean / pig

Modern intensive agriculture is efficient
“Why Industrial Farms Are Good for the Environment”, Jayson Lusk, New York Times, 23 Sept 2016

Selection works
• Age-matched
• Seven rounds of selection per annum
• Black box, but…

Successes – from association to causation
• DGAT1 – dairy cattle, milk yield
• Callipyge – sheep, muscling
• MSTN – sheep, muscling
• IGF2 – pigs, muscling
• Noteworthy
  – Regulatory sequences, epigenetics
• One gene at a time: slow, inefficient
Knowledge of causation enabled more sophisticated selective breeding

Timeline
• 2001: Genomic selection proposed
• 2002: Mouse draft genome sequence
• 2003: Human genome sequence “finished” ($3 billion)
• 2003: ENCODE (1%) launched
• 2004: Chicken genome sequenced
• 2005: Dog genome sequenced
• 2007: Cat genome sequenced
• 2007: ENCODE genome-wide
• 2008: Human 1000 Genomes Project launched
• 2008: Bovine 50K SNP chip
• 2009: Cattle genome sequenced; horse genome sequenced; mouse genome “finished”
• 2009 to 2010: Pig 60K SNP chip; 750K bovine SNP chips; sheep 60K SNP chip
• 2010: Turkey genome sequenced

From Marker-Assisted to Genomic Selection
“…This type of approach combined with cheap and high density markers, could allow a move from selection based on a combination of “infinitesimal” effects plus individual loci to effective total genomic selection…..”

Genomics already delivering socioeconomic impact in agriculture
• Genomic selection (GS)
  – GS theory developed in 2001 before technology available
  – First 50K SNP chip (cattle) 2008; 650K in 2010
  – GS implemented in all major livestock sectors in developed world
  – GS is underpinning faster, more accurate and sustainable genetic improvement

Accuracy – what has been achieved?
USDA dairy cattle genomic evaluation
Courtesy of George Wiggans (USDA, Beltsville)
• Milk yield, accuracy of evaluation
  – Pedigree: 0.51
  – Genomic: 0.86

Evolution of Genomic selection
• GS0.0 – The original model
  – Linkage disequilibrium based
• GS1.0 – What has happened in practice
  – Linkage based
• GS2.0 – The future
  – LD and QTN based
  – Requires lots of data
Goddard & Hayes Nat. Rev. Genet. 2009

GS accuracy
• Accurate really only for close relatives
[Figure: accuracy (y-axis, 0 to 0.9) against mean of the top ten relationships (x-axis, 0 to 0.6); R² = 0.962. Clark et al. (2012)]

From SNPs to sequence
• In next five years sequence data will supplant SNP genotypes
• Two approaches
  – Sequencing individuals (e.g. 1000 Bull genomes project)
    • Expensive even at $1000 per genome
    • Alternatively genotyping-by-sequencing on new platforms (e.g. Illumina HiSeqX), then impute
  – Sequencing populations
    • Aiming for $10 per genome

Multiple (aligned) animal genomes
Pigs
• Groenen (Wageningen) ~300 individual pigs
• Korean ~60 individual pigs
• China ?? pigs
• 96 pig exomes (Roslin)
Sheep
• 453 genomes in SheepGenomesDB http://sheepgenomesdb.org/home
Chickens
• Tens of individuals (e.g.
10 individual J line brown egg layers)
Cattle
• ~3,000 genomes (Taylor estimate)
• 1000 Bull Genomes Project
  – Collaborative
  – Cloud data repository
  – 1500+ bulls, average coverage ~11x
  – Data analysis cycles for genomic prediction
• NextGen – >400 sheep, goat, cattle genomes

Sequencing populations
• Aim: sequence data for 100K to 1M individuals at $10 per individual
• Exploits:
  – pedigree structures in managed population
  – imputation from low sequence coverage
• Assemble shared haplotypes from partial low coverage sequence of 100s of related individuals

LCSeq for whole genome sequencing
[Figure, panels A to C. Recoverable labels: True haplotypes; 2x sequencing (10 individuals), showing Sire, Progeny 1, Progeny 2, … Progeny 10; Derive consensus haplotypes; Impute sequence for all individuals; 10x sequencing (5 individuals)]
• Sequencing few individuals not that useful
• Sequence everybody at low-x & impute
• Make the population the target not the individual
  – ~250K pigs, Genus
  – ~250K chickens, Aviagen

Genomic selection
• GS theory developed in 2001 before technology available
• First 50K SNP chip (cattle) 2008; 650K in 2010
• GS implemented in all major livestock sectors in developed world
• GS is underpinning faster, more accurate and sustainable genetic improvement
• From SNPs to sequence (via imputation)
• Adding knowledge of SNP effects
  – Coding/non-coding; known/predicted

Timeline
• 2012: ENCODE
• 2012: Pig genome sequenced
• 2012: Chicken 600K SNP chip
• 2013: Goat genome sequenced
• 2013: Duck genome sequenced
• 2013 onwards: Genotype-by-sequence
• 2014: Sheep genome sequenced
• 2014: Salmon SNP chip
• 2015: Functional Annotation of Animal Genomes (FAANG) launched
• 2015: Pig 650K SNP chip
• 2015 onwards: LCseq for genomic selection; SNPs impute to sequence
• 2016: Improved reference genomes – goat, pig, sheep, cattle, chicken
• 2016: FAANG-Europe COST Action
• Fish: Tilapia, Cod, Salmon,……

From sequence to consequence
• Phenome
  – Growth
  – Feed efficiency
  – Body composition
  – Disease resistance
Adapted from Ritchie et al. 2015 Nature Reviews Genetics 16: 85

Reference genome improvement
• PacBio long read technology, de novo assembly
  – Goat, pig, sheep, cattle
• Sscrofa10.2: 73,500 contigs; contig N50 ~80 kbp
• Sscrofa11: <200 contigs; contig N50 ~35 Mbp
• Disruptive technology, multiple genome(s) assemblies
  – Annotation - “Best in genome”
  – Graph visualization, alignment tools under development

Discovering functional sequences
• Evolutionary
  – Sequence comparison, conservation
  – 1000G, G10K,…
  – Genome sequence sufficient
  – Conserved, but what is it?
  – Highly variable ≠ nonfunctional
• Functional, biochemical
  – Assay-by-sequence
  – ENCODE, iHEC, Epigenome roadmap, FAANG
  – Expensive
  – Exploring 4-dimensional space (location + time)
  – Noise or biologically meaningful?

• 80.4% of the human genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type
• Promoter functionality can explain most of the variation in RNA expression
• SNPs associated with disease by GWAS are enriched within noncoding functional elements
• >$250 million

Richly annotated reference genomes
• A key shared (open access) resource
  – for 21st century biological research
• For effective exploitation
  – for genomics enabled prediction
    • e.g. selective animal breeding
• Expensive shared resources
  – International collaborative consortia
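
Note on contig N50
The contig N50 values quoted under "Reference genome improvement" (~80 kbp for Sscrofa10.2, ~35 Mbp for Sscrofa11) summarise how contiguous an assembly is. N50 is the length of the contig at which, counting from the longest contig downwards, the running total first reaches half of the assembly length. The sketch below shows that calculation under stated assumptions: the function name `contig_n50` and the toy contig lengths are hypothetical and are not data from either pig assembly.

```python
from typing import Iterable


def contig_n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths (0 for an empty set)."""
    ordered = sorted(lengths, reverse=True)  # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First contig at which the running sum covers half the assembly.
        if running >= half_total:
            return length
    return 0


# Toy data (hypothetical): two assemblies with the same total length of 100.
fragmented = [10] * 10      # ten short contigs
contiguous = [60, 30, 10]   # three contigs, one of them long

print(contig_n50(fragmented))
print(contig_n50(contiguous))
```

With the same total length, the assembly made of fewer, longer contigs has the larger N50, which is the direction of change reported between Sscrofa10.2 and Sscrofa11.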