Next Generation Sequencing: Current Status and Prospects
Avian Genomics in the 21st Century
The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh

Sequencing Technologies
Generation | NGS technology | Sequencing principle | Read length (bases) | Raw read accuracy (%) | Reads per run | Gbases
1st generation | Sanger | Dideoxy sequencing | ~1,000 | ≥99.999 | 96 | 0.0003
2nd generation | Roche/454 | Pyrosequencing | 350-450 | ≥99 | 8.00E+05 | 0.4
2nd generation | Illumina/Solexa | Reversible terminator chemistry | 36–100 | ≥98–99 | 6.00E+09 | 600
2nd generation | ABI/SOLiD | Sequencing by ligation | 35-60 | ≥99.99 | 1.00E+08 | 50–120
3rd generation | PacBio | Single-molecule sequencing | 1000-4500 | ≥80 | 4.80E+04 | 0.05
3rd generation | Helicos | Single-molecule sequencing | 25–55 | ≥97 | 6.00E+08 | 21–35

First Generation Sequencing
Frederick Sanger
• In 1958, awarded the Nobel Prize in Chemistry "for his work on the structure of proteins".
• In 1980, Gilbert and Sanger shared the chemistry prize "for their contributions concerning the determination of base sequences in nucleic acids".

Genetic Maps
• Genetic markers (e.g. Microsatellites)
• Mapping populations (e.g. East Lansing)
• Comparative maps (e.g. Chicken/Human)
• Resource populations (e.g.
B X L cross)
• QTL mapping
• Marker-Assisted-Selection

QTL Mapping
• QTL not mapped precisely
• Confidence intervals for QTL large
• Markers account for limited genetic variation (~4%)

Genomic Tools
• Expressed sequence tags (ESTs)
• Chicken genome sequence
• Gene expression chips
  – Affymetrix/Chicken Genome Consortium
• 3M SNPs between RJF, Broiler, Layer and Silkie lines
• 3, 20, 42, 60K SNP panels
• ARK-Genomics facility (www.ark-genomics.org)
• Genome Browsers
  – Ensembl and NCBI

Further BBSRC support 2011-2014 (Gallus 4, RNAseq, SNPs...)

NGS: Illumina/Solexa

Clonal Single Molecule Arrays
• Attach single molecules to surface
• Amplify to form clusters
• Random array of clusters

Sequencing By Synthesis
• Cycle 1: Add sequencing reagents; first base incorporated; remove unincorporated bases; detect signal; deblock and defluor
• Cycle 2-n: Add sequencing reagents and repeat
• All four labeled nucleotides in one reaction
• High accuracy
• Base-by-base sequencing

Base Calling from Raw Data
• Identity of each base of a cluster is read off from sequential images

Applications
Minou N.
2010 Eukaryotic Cell 9:1300-1310

Avian Genomes
Ning Li, Yao Feng Zhao, China Agricultural University, Beijing, China
Wubin Qian, Ju Wang, Beijing Genome Institute, Shenzhen, China
David W Burt, Jacqueline Smith, Yinhua Huang, University of Edinburgh, UK

Avian Genomes
• Flight
• Small genome
• Unique karyotype
• Immune system
• Learning
• Migration
• Lifespan
• …

Phylogenomics
• Clade and species-specific biology
• Gene diversification
  – Gene innovation, duplication and expansion
  – Gene deletion, contraction and extinction
  – Selection constraints on protein coding sequences (negative, neutral, positive)

Computational Pipeline for Ensembl/Compara
Process:
• WUBlastp + SmithWaterman
• hcluster_sg1
• multiple aligners consensified by M-Coffee
• TreeBeST
Javier Herrero, Leo Gordon, Steve Searle, European Bioinformatics Institute, UK

Gene Family Expansion and Contraction of Adaptive Significance?
• CAFE, “Computational Analysis of gene Family Evolution” (Hahn et al., 2007), was used to predict gene family expansions and contractions of putative adaptive value
• CAFE models gene expansion/contraction as a “birth/death” process with a specific probability
• This value may be the same for all lineages or may vary in two or more lineages
• The likelihood can be calculated and used to compare different models

Changes in gene family size along each branch
Average expansion = (total genes gained – total genes lost)/n
[Figure: phylogeny from the MRCA with the average expansion shown on each branch; x-axis: million years before present]

Accelerated Evolution of Genes
Selection constraints on protein coding sequences (negative, neutral, positive)
Heebal Kim, Taehun Kim, Seoul National University, Korea, and Rasmus Nielsen, University of California Berkeley, USA
ω = dN/dS
Birds vs.
Mammals
Adaptive evolution or relaxed selective constraint, during last ~100 million years?

Compare Rates of Evolution
Selection constraints on protein coding sequences (negative, neutral, positive)
ω = dN/dS
Birds vs. Mammals
4,224 orthologs between eight species; 766 showed accelerated evolution in birds and 762 in mammals

Rates Birds > Mammals
• proliferation of B cells
• activation and migration of leukocytes and T-cells
• cardiovascular system (metabolic demands of flight, running, swimming and diving)
• beak shape and size
• nervous system and behavior (birds have around three times the visual acuity of humans)
• hepatic function (migratory birds)

Rates Mammals > Birds
• movement of B cells
• lymphoid tissue structure and development (no lymph nodes in birds)
• reproduction
• endocrine system development
• embryonic development
• visual system

Species | Latin name | Sequence depth | Number of genes
Adelie penguin | Pygoscelis adeliae | 60X | 15,300
American Crow | Corvus brachyrhynchos | 90X | 16,742
Angola turaco | Tauraco erythrolophus | 30X | 14,667
Anna's hummingbird | Calypte anna | 110X | 16,750
Barn owl | Tyto alba | 27X | 14,048
Bar-tailed trogon | Apaloderma vittatum | 28X | 14,917
Brown mesite | Mesitornis unicolor | 29X | 15,275
Budgerigar | Melopsittacus undulatus | 30X | 16,368
Caribbean flamingo | Phoenicopterus ruber | 33X | 13,811
Chicken | Gallus gallus | 7X Sanger | 16,516
Chimney swift | Chaetura pelagica | 106X | 15,608
Common Cuckoo | Cuculus canorus | 100X | 15,681
Crested Ibis | Nipponia nippon | 105X | 16,434
Crowned crane | Balearica regulorum gibbericeps | 33X | 14,821
Cuckoo roller | Leptosomus discolor | 32X | 14,719
Dalmatian pelican | Pelecanus crispus | 34X | 14,353
Domestic pigeon | Columba livia | 64X | 17,300
Downy Woodpecker | Picoides pubescens | 105X | 16,396
Emperor penguin | Aptenodytes forsteri | 60X | 16,470
Golden-collared Manakin | Manacus vitellinus | 110X | 16,103
Great black cormorant | Phalacrocorax carbo | 24X | 13,909
Great tinamou | Tinamus major | 100X | 15,504
Great-crested grebe | Podiceps cristatus | 30X | 13,957
Hoatzin | Opisthocomus hoazin | 100X | 14,937
Houbara Bustard | Chlamydotis undulata | 27X | 14,090
Javan rhinoceros hornbill | Buceros rhinoceros silvestris | 35X | 13,835
Kea | Nestor notabilis | 32X | 14,736
Killdeer | Charadrius vociferus | 100X | 16,146
Little egret | Egretta garzetta | 74X | 15,814
Medium ground finch | Geospiza fortis | 115X | 16,780
Nightjar | Caprimulgus carolinensis | 30X | 14,502
Northern Carmine bee-eater | Merops nubicus | 37X | 14,019
Northern Fulmar | Fulmarus glacialis | 33X | 14,186
Ostrich | Struthio camelus | 85X | 15,417
Peking duck | Anas platyrhynchos domestica | 50X | 19,144
Peregrine falcon | Falco peregrinus | 105X | 16,262
Red throated loon | Gavia stellata | 33X | 13,933
Red-legged seriema | Cariama cristata | 24X | 15,329
Rifleman | Acanthisitta chloris | 29X | 16,034
Speckled mousebird | Colius striatus | 27X | 14,807
Sunbittern | Eurypyga helias | 33X | 13,582
Turkey | Meleagris gallopavo | 30X | 14,108
Turkey vulture | Cathartes aura | 25X | 13,600
White-tail eagle | Haliaeetus albicilla | 26X | 13,793
White-tailed tropicbird | Phaethon lepturus | 39X | 14,667
Yellow-throated Sandgrouse | Pterocles gutturalis | 25X | 14,897
Zebra finch | Taeniopygia guttata | 6X Sanger | 17,471

Avian Phylogenomics Group: BGI, Duke, Univ Copenhagen, Am Museum Nat His, Bowdoin, Cal Academy Sci, Cardiff, CNPq Brazil, Copenhagen Zoo, Florida, Griffith, Harvard, Heidelberg Institute Theoretical Physics, Mississippi State Univ, Montpellier Univ, Murdoch Univ, New Mex State Univ, NIEHS, NIH, OHSU, San Diego Zoo, Smithsonian, U Texas Austin, UCSC, Univ Delaware, Univ Maryland, Univ Minnesota, Univ Sydney, Utah, Wash Univ, Roslin/Univ Edinburgh

Applications

Genome Wide Selection
• Genotype 1000’s of markers to predict breeding values
• High density SNP panel for whole genome (e.g.
600K)
• QTL close to one or more markers
• Allows SNP with smaller effects to be used effectively
• GWS will account for all QTL and all genetic variation

SNP Discovery for Array
• Illumina ultra high-throughput sequencing
  – 243 chickens from 24 lines, samples in pools of 10-15 individuals; av. coverage 7-17X per line
• Sequence alignment to reference genome
  – Used new GGA genome assembly (still unpublished)
• SNP detection: 78M SNPs (segregating in one or more lines)
  – Criteria: Samtools Phred score ≥ 20, MAF ≥ 5, coverage ≥ 5, variant present in at least 5% of the reads
• SNP selection (stage 1): 24M
  – Criteria: Phred quality score ≥ 60
• SNP selection (stage 2): 10M
  – Criteria: No other SNPs within 10bp at least on one side, uniformly spaced out according to genetic distance
• SNP selection (stage 3): 2M
  – Criteria: Predicted reproducibility in array, 50:50 broiler and layer SNPs
• SNP selection (stage 4): 650K
  – Criteria: True reproducibility, Mendelian inheritance, HWE, LD

Distribution of SNPs
[Figure: three bar charts by chromosome (1–28 and Z) for the 78M, 24M, 10M and 2M SNP sets: % of SNPs, number of SNPs/Kb, and number of SNPs/cM]

Distribution of Minor Allele Frequency
[Figure: bar chart of % of SNPs per MAF bin (0-0.05 to 0.5-0.55) for the 78M, 24M, 10M and 2M SNP sets]

Annotation of SNPs
• Non-synonymous 51%
• Synonymous 48%
• Stop gain/loss 1%
[Figure: bar chart of % of SNPs by location: Intergenic, Intronic, Exonic, Upstream, Downstream]

SNP Genotyping Panels
• 3K
• 6K
• 20K
• 42K
• 60K
• 600K
192 samples/run
125 million genotypes/run

Final Panel Selection
• 600K panel to be selected based on
  – Call rate of markers
  – Mendelian inheritance (MI)
  – Minor allele frequency (MAF)
  – Linkage
disequilibrium (LD)
  – Prediction of SNP effects on coding sequence

Criteria for Passing SNPs
• Polymorphic, with at least 3 examples of the minor allele
• Robust assay:
  – Genotype call rate (≥98%)
  – Cluster separation
  – Reproducibility

Applications of SNP Panel
• Genomic selection: broilers and layers
• Genome wide association studies
• High resolution genetic mapping
• Selection signature analysis
• SNP annotations, phenotypic effects and functional studies

Structural Variants

Acknowledgements
Funding: BBSRC/Defra LINK; Aviagen Ltd, Affymetrix Ltd; German Federal Ministry of Education and Research
Roslin Institute: David Burt, John A. Woolliams, Chris Haley, Almas Gheyas, Clarissa Boschiero, Andy Law, Le Yu, Peter Kaiser, Paul Hocking
Aviagen: Kellie A. Watson, Andreas Kranis
Hyline: Janet E. Fulton
ARK genomics: Richard Talbot, Frances Turner, Sarah Smith, Alison Downing, Mark Fell
Affymetrix: Fiona Brew, Lucy Raynold, Ali Pirani
Synbreed: Henner Simianer, Ruedi Fries, Rudolf Preisinger, Steffen Weigend, Klaus Meyer, George Haberer, Saber Qanbari

Applications

RNA-Seq

Gene Models: CRY1

Gene Models: TEF

Infectious Bursal Disease
• Also known as Gumboro disease
• Caused by a Birnavirus (ds RNA)
• Usually diagnosed at 3-6 weeks old
• Spread through contaminated feed and water
• Infects B-cells
• Mortality can be up to 90% (usually around 20%)
• Symptoms: anorexia, depression, diarrhoea, ruffled feathers, bursal lesions, immuno-suppression
• Vaccination program (but different serotypes)

Experimental Design
• 3 spleen samples from control birds (line BrL)
• 3 spleen samples from IBDV-infected birds (4dpi) (line BrL)
• Compared Affymetrix whole genome expression arrays with RNA-Seq

RNA-Seq Bioinformatics
• Fastqc
• fastx
• Soap2
• Our own database
• Counts of RNA-Seq tags for each gene
• edgeR

Differential Gene Expression
Gene Symbol | Gene Description (all [Gallus gallus]) | adjPVal | FC
ART1 | ADP-ribosyltransferase 1 | 2.13E-235 | 166
IL28B | Interferon lambda, Interleukin 28 | 5.60E-09 | 159
PTX3 | pentraxin-related gene, rapidly induced by IL-1 beta | 1.63E-17 | 89
IFNB1 | Interferon type B Precursor | 4.69E-07 | 71
VEPH1 | ventricular zone expressed PH domain homolog 1 (zebrafish) | 2.70E-17 | 57
MX2 | myxovirus (influenza virus) resistance 2 (mouse) | 1.21E-66 | 52
RSAD2 | radical S-adenosyl methionine domain containing 2 | 5.22E-61 | 48
IFIT5 | interferon-induced protein with tetratricopeptide repeats 5 | 4.62E-15 | 42
TMPRSS2 | transmembrane protease, serine 2 | 9.01E-26 | 42
LYG1 | lysozyme G-like 1 | 2.58E-48 | 41
LOC768689 | hypothetical protein LOC768689 | 1.72E-10 | -15
TNFRSF13C | tumor necrosis factor receptor superfamily, member 13C | 1.18E-10 | -15
DAAM1L | dishevelled-associated activator of morphogenesis 1-like | 2.89E-13 | -16
LOC424146 | hypothetical LOC424146 | 6.77E-20 | -16
FAM5B | family with sequence similarity 5, member B | 2.32E-45 | -17
PTPN5 | protein tyrosine phosphatase, non-receptor type 5 (striatum-enriched) | 3.45E-38 | -17
DCLK1 | doublecortin-like kinase 1 | 1.05E-15 | -23
PROKR2 | Prokineticin receptor 2 | 2.55E-18 | -37
AMY1C | amylase, alpha 1C (salivary) | 9.82E-09 | -40
CLRN3 | clarin 3 | 3.60E-106 | -49

Annotated Genes
• Microarrays
  – 693 / 828 (84%) annotated Affymetrix probes
• RNA-Seq
  – 1509 / 1867 (81%) annotated RNA tags
  – 1082 (72%) unique to RNA-Seq

Enrichment Analysis: Microarray
• Genes up: 330
• Genes down: 223
• GO enrichment:
  – Up-regulated genes: Immune response; cytokine activity; chemokine activity; regulation of IL-6 etc.
  – Down-regulated genes: Protein binding
• Enriched locations: None
• TFBS enrichment:
  – Up-regulated genes: ISRE, IRF7, Ovo
  – Down-regulated genes: None

Enrichment Analysis: RNA-Seq
• Genes up: 733
• Genes down: 822
• GO enrichment:
  – Up-regulated genes: As for array data
  – Down-regulated genes: Carbohydrate binding; structure of ribosome; biological adhesion; multicellular organismal development
• Enriched locations:
  – Up-regulated genes: chr1, chr20
  – Down-regulated genes: chrZ, chr4
• TFBS enrichment:
  – Up-regulated genes: ISRE, IRF7, ZNF42
  – Down-regulated genes: CdxA, Nkx6_2, RSRFC4, Prrx2, FOXP1

Advantages of RNA-Seq

Alternate Transcripts

Acknowledgements
Funding: Biotechnology and Biological Sciences Research Council
Roslin Institute: Dave Burt, Bob Paton
Ark-Genomics: Le Yu
Institute for Animal Health: Pete Kaiser, Jean-Remy Sadeyen
Centre for Genomic Regulation (Barcelona): Darek Kedra, Cedric Notredame

Avian RNA-Seq Consortium
• 37+ labs world-wide, agreed to pool RNA-Seq data
• Multiple tissues, treatments, embryo and adults
• Build gene models within Ensembl
• Return for data analysis of gene expression

Applications

DNA Methylation

MeDIP-Seq: NPAS4

Data Integration

PacBio Real-Time Sequencing
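
As a rough cross-check of the Sequencing Technologies table, per-run yield can be approximated as reads per run times read length. The sketch below is illustrative only: the figures are copied from the table, the Sanger read length is rounded to 1,000 bases, and the names `PLATFORMS`, `estimated_gbases` and `yield_range` are hypothetical helpers, not part of any tool mentioned in the talk. The tabulated Gbases values need not match this simple product, since they may reflect run configurations not shown on the slide.

```python
# Rough per-run yield estimate: reads per run x read length, in gigabases.
# Figures are copied from the Sequencing Technologies table. Treating yield as
# reads x read length is a simplifying assumption, not a vendor specification.

PLATFORMS = {
    # name: (min read length, max read length, reads per run, tabulated Gbases)
    "Sanger": (1000, 1000, 96, "0.0003"),  # "~1,000" rounded to 1000 (assumption)
    "Roche/454": (350, 450, 8.00e5, "0.4"),
    "Illumina/Solexa": (36, 100, 6.00e9, "600"),
    "ABI/SOLiD": (35, 60, 1.00e8, "50-120"),
    "PacBio": (1000, 4500, 4.80e4, "0.05"),
    "Helicos": (25, 55, 6.00e8, "21-35"),
}


def estimated_gbases(read_length, reads_per_run):
    """Return the yield in gigabases for a given read length and read count."""
    return read_length * reads_per_run / 1e9


def yield_range(name):
    """Return (low, high) estimated Gbases for a platform listed in PLATFORMS."""
    min_len, max_len, reads, _ = PLATFORMS[name]
    return estimated_gbases(min_len, reads), estimated_gbases(max_len, reads)


if __name__ == "__main__":
    for name, (_, _, _, tabulated) in PLATFORMS.items():
        low, high = yield_range(name)
        print(f"{name:16s} estimated {low:.4g}-{high:.4g} Gb  (table: {tabulated})")
```

Reporting a low and a high estimate, rather than a single number, keeps the read-length ranges from the table visible instead of silently picking one end.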