Adapterama I: An Integrated Approach for Double-Quadruple Indexed Illumina Libraries for Amplicons, RADseq, and Whole Genomes
Travis Glenn, Brant Faircloth, et al.

Brant Faircloth
LSU
Roger Nilsen, Troy Kieran, Todd Pierson, John Finger, Kerin Bentley
University of Georgia
Sandra Hoffberg [email protected]

Thanks to folks @:

Our Study Species
http://blogs.wayne.edu/

Talk Outline
• Adapterama I
  - Combinatorial Tagged Libraries
  - Genomic, Amplicon & RADseq Libs
• Adapterama II
  - Sequence Capture
  - Ultraconserved elements
  - Applications of UCEs

For Next-Gen Sequencing
DNA samples need a unique adapter on each end
[Diagram: double-stranded DNA sample; double-stranded linkers/adapters A and B; double-stranded DNA library, A - DNA - B]

Adapters have 2 functions
• Clonal amplification
• Sequencing
Illumina has separate sequences for each function
[Diagram: A - DNA - B, with Sequencing and Amplification regions labeled]

Why we focus on Illumina
• Illumina currently has >80% of the sequencing market due to cost, read length, & application advantages.
• Their sequencing technology is particularly well suited to large numbers of ID tags.

Illumina HiSeq Data
If printed on paper (double-sided), it would:
• Be taller than the Empire State Building (figure labels: 1314 feet, 1250 feet)
• Literally fill a huge dump truck
One lane! One Flow Cell

Downsizing NGS Projects
What if I don’t need Mb or Gb per sample?
How do I use NGS to reduce the costs for small-scale projects?
How do I use NGS to do huge numbers of samples?

Next-Gen is cheap per pound, but you have to buy it by the truckload
Capillary sequencing = $1400 / Mb; minimum buy-in = $5
Illumina MiSeq = $0.14 / Mb; minimum buy-in = $1000
How do I get $5 worth of MiSeq data?

Key Question
How can I divide a truckload of data more efficiently than using commercial ID tags?
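The buy-in arithmetic on the slide above can be sketched in a few lines. This is only an illustration: the prices are the ones quoted on the slide, and the function and variable names are made up here. It also assumes the $1000 minimum buy-in and the $0.14 / Mb price describe the same MiSeq run.

```python
# Prices as quoted on the slide; all names below are illustrative.
CAPILLARY_USD_PER_MB = 1400.0
MISEQ_USD_PER_MB = 0.14
MISEQ_MIN_BUY_IN_USD = 1000.0


def mb_for_budget(budget_usd, usd_per_mb):
    """Megabases of sequence that a budget buys at a given price per Mb."""
    return budget_usd / usd_per_mb


# Data in one minimum MiSeq purchase (assumes both slide figures refer to one run).
miseq_run_mb = mb_for_budget(MISEQ_MIN_BUY_IN_USD, MISEQ_USD_PER_MB)

# What $5 buys on each platform.
five_dollar_miseq_mb = mb_for_budget(5.0, MISEQ_USD_PER_MB)
five_dollar_capillary_mb = mb_for_budget(5.0, CAPILLARY_USD_PER_MB)

# Number of $5 shares in one minimum buy-in, i.e. how many ways the run must be split.
shares_per_run = int(MISEQ_MIN_BUY_IN_USD // 5.0)

print(f"One minimum MiSeq buy-in: about {miseq_run_mb:.0f} Mb")
print(f"$5 of MiSeq data: about {five_dollar_miseq_mb:.1f} Mb")
print(f"$5 of capillary data: about {five_dollar_capillary_mb * 1000:.2f} kb")
print(f"$5 shares per run: {shares_per_run}")
```

Under these assumptions, getting $5 worth of MiSeq data means splitting one run about 200 ways, which is why the number of available ID tags becomes the limiting factor.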
[Figure labels: MiSeq, HiSeq]

Making small amounts of cheap NGS data requires:
• Low-cost library preparations
  – Multiplex within standard libraries
• Lots of ID-tags (indexes)
  – >Thousands (not hundreds)
• All libraries must be poolable
  – Use standard sequencing primers
  – Pool amplicons, genomic, RADseq, sequence capture, etc.

Not Enough Tags
[Figure: Tags Available vs. Tags Needed]

Approach 1: More (& better) tags
Faircloth and Glenn (2012) Not all sequence tags are created equal... PLoS ONE 7(8):e42543

The Cost of Many Tags
96 DNA samples x 96 plates: 96 x 96 = 9216 samples
[96-well plate grid: columns 1-12, rows A-H]

ID-tagged Primers   Cost (US $)
9216                $232,250

Need tags 11 nt long
This is Expensive!

Combinatorial Approach
Use multiple tags on each sample in different combinations
Tags: Tag 1, Tag 2, Tag A, Tag B
Tag 1 - DNA sample 1 - Tag A
Tag 1 - DNA sample 2 - Tag B
Tag 2 - DNA sample 3 - Tag A
Tag 2 - DNA sample 4 - Tag B

The Power of Multiple Tags
96 DNA samples x 96 plates: 96 x 96 = 9216 samples
[96-well plate grid: columns 1-12, rows A-H]

Tag Positions   ID-tagged Primers   Cost (US $)
1               9216                $232,250
2               96+96               $4,864

So, combinatorial tagging requires a _lot_ fewer primers & reduces costs tremendously!

Adapter wish list
Adapters should accommodate:
• Intact or sheared DNA
• Small amounts of DNA
• Amplicons
• Sequence Capture
• RADseq & GBS
• Whole Genome Sequencing

Original Illumina Indexing Method
[Diagram: Shredded DNA sample; Ligate Y-yoke adapters (A primer stub; B primer stub rev. comp.); T/A overhangs not shown]
[Diagram, continued: Limited cycle PCR; labels: A primer stub, B primer, A primer, index; Full-length & Indexed: A - DNA - B]

Adapterama Overview
[Diagram: Variety of input DNA; Application-specific Adapters; Consistent Sets of PCR Primers: iTru_5 primer (indexed A primer, i5 index) and iTru_7 primer (indexed B primer, i7 index); Consistent Double Indexed Libraries: A - DNA - B]

Adapterama Overview
• Illumina TruSeq (or Nextera) compatible
• 2-stage library preparations
• Same 2nd stage primers for all methods
• Dual-indexed (8 nt) primers
• >384 total designed (most Illumina compatible)
• Protocols & tools are freely available
• Oligo aliquots & services available at cost

iTru Double Index Libraries
Similar to Illumina TruSeq HT – just a homebrew with many more indexed primers available
[Diagram: Shredded DNA sample + A; iTru Y-yoke Adapter + T (A primer stub; B primer stub rev. comp.); T/A overhangs not shown; Limited Cycle PCR with iTru_5 primer (indexed A primer, i5 index) and iTru_7 primer (indexed B primer, i7 index); Full-length & Double Indexed: A - DNA - B]

Adapterama I: iTru Libraries
Preparation of genomic DNA
DNA shearing

Adapterama I: iTru Libraries
Preparation of genomic DNA
Blunt end repair
A-tailing
[Diagram: 5’ to 3’ strand and 3’ to 5’ strand, each with an added A]

Adapterama I: iTru Libraries
Ligation of stubby Y-yoke adapter
Sequence complementary to i7 adapter
Sequence complementary to i5 adapter
Y-yoke structure
GCTCTTCCGATCT
CGAGAAGGCTAGA
Y-yoke structure
AGATCGGAAGAGC
TCTAGCCTTCTCG

Adapterama I: iTru Libraries
Step 2: Limited cycle PCR
Labels: iTru7 primer, i7 index, P7 sequence
5’-ACACTCTTTCCCTACACGACGCTCTTCCGATCT
3’-TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC-3’
TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTCCAGTGATAGAGCATACGGCAGAAGACGAAC-5’

Adapterama I: iTru Libraries
Limited cycle PCR
5’-ACACTCTTTCCCTACACGACGCTCTTCCGATCT
3’-TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC-3’
TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTCCAGTGATAGAGCATACGGCAGAAGACGAAC-5’

Adapterama I: iTru Libraries
Limited cycle PCR
Labels: P5 sequence, i5 index, iTru5 primer
5’-AATGATACGGCGACCACCGAGATCTACACACCGACAAACACTCTTTCCCTACACGACGCTCTTCCGATCT
3’-TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACAGGTCACTATCTCGTATGCCGTCTTCTGCTTG-3’
TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTCCAGTGATAGAGCATACGGCAGAAGACGAAC-5’

Adapterama I: iTru Libraries
Limited cycle PCR
Label: iTru7 primer
5’-AATGATACGGCGACCACCGAGATCTACACACCGACAAACACTCTTTCCCTACACGACGCTCTTCCGATCT
3’-TTACTATGCCGCTGGTGGCTCTAGATGTGTGGCTGTTTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACAGGTCACTATCTCGTATGCCGTCTTCTGCTTG-3’
TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTCCAGTGATAGAGCATACGGCAGAAGACGAAC-5’

Adapterama I: iTru Libraries
Double indexed, complete DNA molecule
5’-AATGATACGGCGACCACCGAGATCTACACACCGACAAACACTCTTTCCCTACACGACGCTCTTCCGATCT
3’-TTACTATGCCGCTGGTGGCTCTAGATGTGTGGCTGTTTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACAGGTCACTATCTCGTATGCCGTCTTCTGCTTG-3’
TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTCCAGTGATAGAGCATACGGCAGAAGACGAAC-5’

The Power of Multiple Tags
96 DNA samples x 96 plates: 96 x 96 = 9216 samples
[96-well plate grid: columns 1-12, rows A-H]

Tag Positions   ID-tagged Primers   Cost (US $)
1               9216                $232,250
2               96+96               $4,864

We have:
192 + 384 primers designed
104 + 250 primers in hand
96 + 144 aliquots available (at cost)

The Power of Multiple Tags
96 DNA samples x 96 plates: 96 x 96 = 9216 samples

Tag Positions   ID-tagged Primers   Cost (US $)
1               9216                $232,250
2               96+96               $4,864
aliquots        96+144              $725

If you are making <40,000 libraries you can save $ by
getting aliquots from us

The Power of Multiple Tags
96 DNA samples x 96 plates: 96 x 96 = 9216 samples

Tag Positions   ID-tagged Primers   Cost (US $)
1               9216                $232,250
2               96+96               $4,864
aliquots        96+144              $725

If you want to get your own from IDT or whoever, we are happy to share the primer sequences.

Combinatorial tagging
[96-well plate grid: iTru7 primers index columns 1-12; iTru5 primers index rows A-H]

What if I have more than 96?
iTru5 primers – 4 x 8 = 32
iTru7 primers – 4 x 12 = 48
Sets: Set 101, Set 102, Set 103, Set 104; Set 01, Set 02, Set 03, Set 04
Paired sets: 01/101, 02/102, 03/103, 04/104
Samples with Unique Tags: 4 x 96 = 384

What if I have more than 96?
iTru5 primers – 4 x 8 = 32
iTru7 primers – 4 x 12 = 48

         Set 101   Set 102   Set 103   Set 104
Set 01   01/101    01/102    01/103    01/104
Set 02   02/101    02/102    02/103    02/104
Set 03   03/101    03/102    03/103    03/104
Set 04   04/101    04/102    04/103    04/104

Samples with Unique Tags: 16 x 96 = 1536

Combinatorial Tagging
• Universal Stubs: 2 oligos; no tags are used here
• Reusable: Pool & PCR with i5 & i7 (share these)
  – 8 x 12 = 96 Illumina i5 & i7’s
  – 192 x 384 = 73,728 iTru5’s & iTru7’s
• Pool, Sequence, Deconvolute
  – Up to 96 samples
  – Up to 73,728 samples

Amplicon Tagging
How can I quickly & easily make amplicon libraries with multiple tags?
How do I make them so I can spike in a tiny fraction or use an entire MiSeq run?

iTru Fusion PCR Primers
Illumina TruSeq Compatible
Flanking DNA - Region of Interest - Flanking DNA
5’-CGTCGGATGTAAGACACACACACACACACTCCGAATCGGCGT-3’
3’-GCTGCCTACATTCTGTGTGTGTGTGTGTGAGGCTTAGCCGCA-5’
Forward Primer (Read1): 5’-CGTCGGATGTA-3’
Reverse Primer (Read2): 3’-GCTTAGCCGCA-5’
Add Illumina TruSeq (or Nextera) Read 1 & Read 2 Sequences onto the 5’ end of your primers

iTru Double Index 2-step Libraries
[Diagram: Fusion Tagged Amplicons: TruSeq A stub + Forward primer, DNA, Reverse primer + TruSeq B stub; Limited Cycle PCR with iTru_5 primer (indexed A primer, i5 index) and iTru_7 primer (indexed B primer, i7 index); Full-length & Double Indexed: A - DNA - B]

Fusion Amplicons
• Requires 2 PCRs per sample
• Limited by i5 x i7 tag combinations
• Requires 5-20% PhiX (or other genomic samples) to create diversity
What can I do to solve all 3 of these constraints simultaneously?
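Building the fusion primers described above is just string concatenation, and a rough sketch makes the strand orientation explicit. The assumptions here: the Read 1 and Read 2 tails are taken to be the TruSeq-compatible sequences printed on the iTru library slides, the locus primers are the toy ones from the fusion primer slide, and every function and variable name is invented for illustration.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


# Read 1 side, as printed 5'->3' on the iTru library slides.
READ1_TAIL = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
# Read 2 side as printed on the top strand; a primer tail needs its reverse complement.
READ2_TOP_STRAND = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
READ2_TAIL = reverse_complement(READ2_TOP_STRAND)

# Toy locus from the fusion primer slide (top strand, 5'->3').
TEMPLATE_TOP = "CGTCGGATGTAAGACACACACACACACACTCCGAATCGGCGT"
FORWARD_PRIMER = "CGTCGGATGTA"
# The slide writes the reverse primer 3'->5'; flip it to the usual 5'->3' order.
REVERSE_PRIMER = "GCTTAGCCGCA"[::-1]


def fusion_primer(tail, locus_primer):
    """Put the sequencing-primer tail on the 5' end of a locus-specific primer."""
    return tail + locus_primer


forward_fusion = fusion_primer(READ1_TAIL, FORWARD_PRIMER)
reverse_fusion = fusion_primer(READ2_TAIL, REVERSE_PRIMER)

print("Forward fusion primer 5'->3':", forward_fusion)
print("Reverse fusion primer 5'->3':", reverse_fusion)
```

Anything you would actually order should be checked against the shared primer spreadsheets rather than against this sketch.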
iTru Fusion Primers
Illumina TruSeq Compatible
Flanking DNA - Region of Interest - Flanking DNA
5’-CGTCGGATGTAAGACACACACACACACACTCCGAATCGGCGT-3’
3’-GCTGCCTACATTCTGTGTGTGTGTGTGTGAGGCTTAGCCGCA-5’
Labels: iTru_Read1, Tag_A; iTru_Read2, Tag_1

                         Tagged Reverse Primers
Tagged Forward Primers   Tag 1   Tag 2   Tag 3   Tag 4   Tag 5   Tag 6
Tag A                    A-1     A-2     A-3     A-4     A-5     A-6
Tag B                    B-1     B-2     B-3     B-4     B-5     B-6
Tag C                    C-1     C-2     C-3     C-4     C-5     C-6
Tag D                    D-1     D-2     D-3     D-4     D-5     D-6

Internal Tags Vary in Length

iTru Quadruple Index 2-step Libraries
[Diagram: Fusion Tagged Amplicons: TruSeq A stub + Internal index 1 + Forward primer, DNA, Reverse primer + Internal index 2 + TruSeq B stub; Limited Cycle PCR with iTru_5 primer (indexed A primer, i5 index) and iTru_7 primer (indexed B primer, i7 index); Full-length & Quadruple Indexed: A - DNA - B]

TaggiMatrix Project Workflow
8 + 12 = 20 primers
Forward_Primer_A, Forward_Primer_B, … Forward_Primer_H
Reverse_Primers_1 - 12
[96-well plate grid: columns 1-12, rows A-H]
• 96 DNA samples
• 96 unique ID_tagged Amplicons
• Pooled amplicons
• Add i5 + i7 to make library
• Pooled library
• Sequence library
• Demultiplex Sequences & Analyze

The Power of Multiple Tags
96 x 96 = 9216 samples
• 96 plates
• 96 unique ID_tagged Amplicons per plate
• Each pool of 96 into one well of a new master plate
• Add unique combinations of i5 + i7 to make libraries
• Pooled library
• Sequence library
• Demultiplex Sequences & Analyze

The Power of Multiple Tags
96 DNA samples x 96 plates: 96 x 96 = 9216 samples
[96-well plate grid: columns 1-12, rows A-H]

Tag Positions   ID-tagged Primers   Cost (US $)
1               9216                $232,250
2               96+96               $4,864
3               8+12+96             $1,491
4               8+12+8+12           $486

Hierarchical Tagging
• Project Specific: 8 x 12 = 96 tag combinations; 20 oligos
• Reusable: Pool & PCR with i5 & i7 (share these)
  – 8 x 12 = 96 Illumina i5 & i7’s
  – 192 x 384 = 73,728 BCF i5 & i7’s
• Pool, Sequence, Deconvolute
  – Up to 9,216 samples
  – Up to 7,077,888 samples

That’s really cool, but…
This still seems like a lot of stuff to keep track of, order, think about, etc.
So, how can I do it?
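The primer counts and sample capacities quoted on the last few slides follow from one rule: with several tag positions you buy the sum of the tag-set sizes but can label the product. A minimal sketch, using only the set sizes from the slides (the names are illustrative, and the dollar figures are not recomputed here):

```python
from math import prod

# Tag-set sizes per number of tag positions, as listed in the cost table.
SCHEMES = {
    1: [9216],
    2: [96, 96],
    3: [8, 12, 96],
    4: [8, 12, 8, 12],
}


def primers_needed(tag_sets):
    """Oligos to order: one per tag in every set."""
    return sum(tag_sets)


def samples_covered(tag_sets):
    """Unique tag combinations: the product of the set sizes."""
    return prod(tag_sets)


for positions, tag_sets in SCHEMES.items():
    print(f"{positions} tag position(s): "
          f"{primers_needed(tag_sets)} primers, "
          f"{samples_covered(tag_sets)} samples")

# Hierarchical tagging: 8 x 12 internal tag combinations inside each i5/i7 pair.
internal_combinations = 8 * 12
illumina_capacity = internal_combinations * (8 * 12)
itru_capacity = internal_combinations * (192 * 384)
print(f"With Illumina i5 & i7: up to {illumina_capacity:,} samples")
print(f"With 192 x 384 i5 & i7: up to {itru_capacity:,} samples")
```

Every scheme covers the same 9216 samples; only the number of oligos to buy changes, from 9216 down to 40.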
TaggiMatrix

So…
TaggiMatrix turns a problem of resource limitation into logistical & social problems (i.e., how do you coordinate all these samples).

TaggiMatrix Highlights
• Standard set of 20 tags
  – 8 used on Forward Primers
  – 12 used on Reverse Primers
• Tags Vary in Length
  – This generates sequence diversity downstream
• Use Nextera or TruSeqHT Primers
  – Can use Illumina primers (8 + 12)
  – Can use iNext or iTru primers (160 + 240)
• iNext & iTru versions are validated

Adapterama I Summary
• Manuscripts are nearing submission
• We are happy to share spreadsheets for:
  – iTru & iNext primers we have designed
  – TaggiMatrix primer designs
  – RADseq adapter oligo designs
• You can order your own primers from IDT
• We do have some primers in plates pre-aliquoted at IDT for distribution

Ultraconserved elements (UCEs) as universal sequence capture probes and loci
UCEs = stretches of DNA that are remarkably conserved across highly divergent species (Bejerano et al. 2003)
Brant Faircloth is a bioinformatics guru who has been key in developing UCEs and tagging analysis.

Primer Aliquots
Aliquoted, dried & heat sealed at IDT
iTru5 primers
[96-well plate grid: columns 1-12, rows A-H]
Even numbered columns are empty

Primer Aliquots
Aliquoted, dried & heat sealed at IDT
iTru7 primer plates
[96-well plate grid: columns 1-12, rows A-H]

UCE probes available from Mycroarray
Probe sets for 5000 or 2500 loci can be obtained for $600 from Mycroarray
No, I (we) don’t get any of that $$$
http://www.mycroarray.com/mybaits/mybaits-UCEs.html

Probes (baits) for regions of interest
Biotinylated Oligo Capture
• Library Construction: Genomic DNA is fragmented and adapters added to each end
• Probe Construction: Make biotinylated probes that are complementary to regions of interest
• Use Probes to Enrich Library for fragments that hybridize to probes
http://www.opengenomics.com/sureselect-insolution.aspx
http://ultraconserved.org/

How does iNext differ from iTru?
• Nextera Read1 & Read2 perform the same function as in iTru, but are completely different sequences
• Nextera & iNext stubs have G overhangs
• Insert DNA must have a C overhang
  – Commercial kits must be modified
• Must use iNext5 & iNext7 primers

Utility of TaggiMatrix
• Microbial Community Analysis
  – EPA – Kelvin Wong & Marirosa Molina
  – USDA – Brian Oakley, Mike Rothrock
  – UGA – Erin Lipp et al.
• STDs in Chimpanzees (Julie Rushmore)
• Tropical Tree Roots (Brant Faircloth)
• Tropical Tree Wood-rotting Fungi (Brant Faircloth, Greg Gilbert, Steve Hubbell)

3 Major Ways to Multi-Tag
• Hierarchical – multiple tags in series (e.g., inner & outer)
• Ecumenical – tags are platform independent
• Combinatorial – tags are independent

Oligo Library Capture
Agilent SureSelect:
- 55,000 probes
- smallest kit = ~$7k for 10 samples (price assumes discount)
Mycroarray MYselect:
- 20,000 probes
- smallest kit = ~$2.5k for 12 samples
http://www.opengenomics.com/sureselect-insolution.aspx

That’s keen, but…
I don’t have 20,000 loci for my critters, and $2,550 / 12 = $212
Can I do this for less than $250/sample?

Reducing Costs of Sequence Capture
• ID_tag multiple samples, pool them, and do captures on the pools
• Use fewer independent probes:
  – 20k probes = 1x concentration
  – 10k probes = 2x concentration
  – 2k probes = 10x concentration
http://bad-dna.org/tags/

Sequencing Introduces Errors
From B. Faircloth IOB Seminar 2013

Sequencing Process Errors
From B. Faircloth IOB Seminar 2013

Errors in Tag Sequences
From B. Faircloth IOB Seminar 2013

ID Nomenclature
Ion Torrent or 454 ID-Tagging: [Diagram: A - DNA - B]
Illumina ID Tagging: [Diagram: A, ID sequence - DNA - B, ID sequence]
ID-sequences are called:

Name       Source              # blessed    Length
MIDs       454                 ~130         10
Indexes    Illumina            ≤96          6
Barcodes   Various             Varies       Varies
Tags       Faircloth & Glenn   Up to 9000   6-10

Sequencing Projects
[Figure: Number of Individuals (1 to 1000s) vs. Number of Loci (1, thousands, millions, whole genome); methods shown: PCR, Sequence Capture (UCEs, exons), RADseq, WGS]

Genetic Projects
[Figure: same axes and methods, with the Adapterama parts overlaid: I – iTru & iNext; II – TaggiMatrix; III – M-RAD, 3-RAD & B-RAD; IV – Splitaake]

Core UCE Research Team
Kevin Winker (UAF)
Robb Brumfield (LSU)
John McCormack (Occidental College)
Nick Crawford (Cal Acad)
Mike Harvey (LSU)
Mike Alfaro (UCLA)
Brian Smith (LSU)
Mike Braun (SI)
Ed Braun (UFL)
Noor White (UMD)