2/7/2013

NGS Approaches to Sugarcane Genomics
Unravelling polyploidy
Paul Berkman | OCE Postdoctoral Fellow – Sugarcane Genome Bioinformatics
Sunday 13 January 2013
CSIRO PLANT INDUSTRY

Overview
• Low coverage analysis of 6 Saccharum genotypes
• Our BAC sequencing work to date
• Future directions?

Unravelling Polyploidy | Paul Berkman | Page 2

Low-coverage WGS
• Six Saccharum varieties
  • S. officinarum: IJ76-514, LA-Purple
  • S. spontaneum: Mandalay
  • S. officinarum × S. spontaneum: Q165, R570, SP80-3280
• ~0.3-0.8x individual coverage (~0.1-0.4x after filtering)
• ~1-3x combined monoploid coverage
• Non-comprehensive datasets
• Useful for limited analysis
  • Chloroplast phylogeny
  • Repeats: two approaches
  • Monoploid genome size estimation

Low-coverage - Chloroplast phylogeny
• Each dataset mapped to published chloroplast sequence
• Consensus sequence determined for each genotype
• Phylogeny determined using a neighbour-joining tree
• Sorghum as an out-group
[Figure: chloroplast phylogeny tree; tip labels S. bicolor, Mandalay, SP80-3280, NCo-310, Q165, R570, IJ76-514; scale bar 0.0050]

Low-coverage - Repeats analysis
Homology
• Filtered readsets against TIGR Gramineae repeats database
  • BLASTN (2.2.24+)
  • TIGR_Gramineae_Repeats.v3.3
  • E-value 1e-5
  • Categorisation
http://plantrepeats.plantbiology.msu.edu/about.html
[Figure: bar chart by genotype (IJ76, LA Purple, Mandalay, Q165, R570, SP80); y-axis 0 to 180,000]

Low-coverage - Repeats analysis
Homology

Genotype     # repeats hit   % repeats hit   # repeat hits   # repeat reads   % repeat reads
IJ76         2,535           48.18%          5,687,659       331,307          3.81%
LA-Purple    2,908           55.27%          4,998,194       392,951          4.52%
Mandalay     2,520           47.90%          3,854,094       211,199          2.43%
Q165         2,515           47.80%          4,209,774       244,160          2.81%
R570         2,551           48.49%          3,692,524       311,808          3.58%
SP80-3280    2,622           49.84%          5,091,104       540,213          6.21%
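The counts reported for the homology-based repeats analysis (repeats hit, repeat hits, repeat reads and their percentages) could be tallied from BLASTN output along the following lines. This is only a minimal sketch, not the pipeline used in the talk: it assumes the BLASTN results were written in the standard 12-column tabular format (`-outfmt 6`, with the E-value in column 11), that hits with an E-value at or below 1e-5 are kept, and the function name, read IDs and repeat IDs are hypothetical.

```python
from typing import Iterable


def summarise_repeat_hits(blast_lines: Iterable[str], total_reads: int,
                          total_repeats: int, max_evalue: float = 1e-5) -> dict:
    """Tally repeat hits from BLASTN tabular lines (assumed -outfmt 6)."""
    reads_with_hit = set()   # distinct reads with at least one accepted hit
    repeats_hit = set()      # distinct database repeats hit at least once
    n_hits = 0               # all accepted read-to-repeat hits
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        read_id, repeat_id = fields[0], fields[1]
        evalue = float(fields[10])  # column 11 of the 12 standard columns
        if evalue > max_evalue:
            continue  # assumed cut-off: keep hits with E-value <= 1e-5
        n_hits += 1
        reads_with_hit.add(read_id)
        repeats_hit.add(repeat_id)
    return {
        "repeats_hit": len(repeats_hit),
        "pct_repeats_hit": 100.0 * len(repeats_hit) / total_repeats,
        "repeat_hits": n_hits,
        "repeat_reads": len(reads_with_hit),
        "pct_repeat_reads": 100.0 * len(reads_with_hit) / total_reads,
    }


# Made-up example: 10 reads in the readset, 4 repeats in the database.
example = [
    "read1\trepA\t98.0\t100\t2\t0\t1\t100\t5\t104\t1e-40\t180",
    "read1\trepB\t95.0\t100\t5\t0\t1\t100\t9\t108\t1e-30\t150",
    "read2\trepA\t80.0\t30\t6\t0\t1\t30\t1\t30\t0.5\t20",   # fails the cut-off
    "read3\trepA\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-45\t190",
]
summary = summarise_repeat_hits(example, total_reads=10, total_repeats=4)
print(summary)
```

A read that hits several repeats adds to "repeat hits" once per hit but to "repeat reads" only once, which is why the two counts can differ widely.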
Low-coverage - Repeats analysis
Homology repeat profiles
[Figure: pie charts of repeat-class profiles for IJ76-514, LA-purple, Mandalay, Q165, R570 and SP80-3280; classes shown: 45S rDNA, 5S rDNA, CACTA, centromere-associated repeats, MITE, Ty1-copia, Ty3-gypsy, other retrotransposons, other transposons, telomere-associated repeats, unclassified repeat]

Low-coverage - Repeats analysis
Kmer analysis
• Determined optimal repeat kmer size to be 20
• Calculated proportion of unique vs. non-unique kmers

                                                IJ76     LA-Purple   Mandalay   Q165     R570     SP80-3280   ALL
Unique kmers                                    34.74%   24.72%      37.90%     36.26%   28.59%   21.63%      11.22%
Non-unique kmers                                65.26%   75.28%      62.10%     63.74%   71.41%   78.37%      88.78%
Average number of non-unique kmer occurrences   4.69     4.64        4.81       6.21     4.14     5.07        7.24

Low-coverage - Monoploid genome size
• Individual assembly of NGS datasets
  • Each of the 6 genotypes
  • CLC Genomics Workbench
  • Output average coverage of contigs
• Statistical analysis in R
  • Normalised contig coverage for each genotype
  • Estimated monoploid coverage of each genotype
• TOTAL DATA VOLUME ÷ MONOPLOID COVERAGE = monoploid genome size

Low-coverage - Monoploid genome size
[Figure: one panel per genotype: IJ76-514, LA Purple, Mandalay, Q165, R570, SP80-3280]

Low-coverage - Monoploid genome size
• Genotype-specific monoploid genome size estimates

Genotype    Filtered data   ** Monoploid    Full           *** Revised monoploid   *** Revised monoploid
            (Mbp)           coverage (x)    coverage (x)   coverage (x)            genome size (Mbp)
IJ76-514    1173.41         3.33            0.25           2.17                    795
LA-purple   1937.01         5.52            0.41           3.81                    851
Mandalay    1221.95         3.75            0.34           2.65                    773
Q165        1198.55         4.40            0.33           2.74                    710
R570        1518.03         3.83            0.29           1.96                    1,051
SP80-3280   2761.75         6.78            0.51           3.20                    1,327

BAC sequencing - Approach
• 72 BACs
  • 37 targeted
  • 35 randomly extracted
• Illumina HiSeq2000, 100 bp paired-end reads
• 87.99 Gbp total sequence data generated
• Assembly process
  • Remove bacterial sequence contamination
  • Hard-trim last 20 bp or 40 bp from reads
  • Assembled using local assembler

BAC sequencing - Results
• High quality raw assemblies

Total number of reads                439,938,797
Total volume sequence data (Mbp)     87,987.76
Bacterial sequence contamination     13.06%
Total number assembled contigs       434
Total assembly size (bp)             7,883,217
Total assembly N50 (bp)              39,008
Overall longest contig (bp)          128,410
Number contigs > 10 kbp              203
Number contigs > 100 kbp             7

• Comparable or better results than published 454 BAC assemblies

BAC sequencing - Directions
• WGS mate-pair libraries for improved scaffolding
• High throughput sequencing
  • 96 BACs/lane barcoded on HiSeq
  • Pooled BAC approach appears promising (4-6 BACs/barcode)
• Sugarcane genome
  • 384 BACs/lane?
  • Full monoploid genome on a single HiSeq Run?
  • Under $500k USD?

Questions to be resolved
• Alleles/genes & homoeologs/homologs
  • How do we think about these?
  • How will these be presented?
  • Intra- vs. Inter-varietal vs. Inter-species polymorphism?
• Large-scale systems approach to sugarcane
  • How to integrate multi-omics data?
  • How will we approach the effects of polyploidy on the system?
  • Are current tools/approaches sufficient?

Acknowledgements
1c10 Collaborators: Peter Bundock, Robert Henry
CSIRO Colleagues: Karen Aitken, Rosanne Casu, Jiri Stiller, Anne Rae, Jingchuan Li, Jai Perroux
BAC Collaborators: Paul Visendi, Hélène Bergès, Angélique D'Hont, Olivier Garsmeur, Carine Charron

Thank you
CSIRO Plant Industry
Paul Berkman
OCE Postdoctoral Fellow
t +61 7 3214 2361
e [email protected]
w www.csiro.au/pi
CSIRO PLANT INDUSTRY
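As a note on the assembly N50 quoted in the BAC sequencing results: N50 is commonly defined as the contig length L such that contigs of length L or longer together contain at least half of the total assembly. The sketch below follows that common definition; it is not taken from the assembler used in the talk, and the contig lengths are made up for illustration rather than drawn from the BAC assemblies.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contig first
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that takes the running total to half of the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one contig length")


# Hypothetical contig lengths in bp (total 300, so half is 150).
example_lengths = [100, 80, 60, 40, 20]
print(n50(example_lengths))  # 100 + 80 = 180 >= 150, so N50 is 80
```

Because N50 weights contigs by length, a few long contigs raise it more than many short ones, which is why it is usually reported alongside the contig count and the longest contig.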