Adapterama I: An Integrated Approach
for Double-Quadruple Indexed Illumina
Libraries for Amplicons, RADseq, and
Whole Genomes
Travis Glenn, Brant Faircloth, et al.
Brant Faircloth
LSU
Roger Nilsen
Troy Kieran
Todd Pierson
John Finger
Kerin Bentley
University of Georgia
Sandra Hoffberg
[email protected] Thanks to folks @:
Our Study Species
http://blogs.wayne.edu/
Talk Outline
• Adapterama I
- Combinatorial Tagged Libraries
- Genomic, Amplicon & RADseq Libs
• Adapterama II
- Sequence Capture
- Ultraconserved elements
- Applications of UCEs
For Next-Gen Sequencing
DNA samples need a unique
adapter on each end
[Figure: double-stranded DNA sample (5’ to 3’) + double-stranded linkers/adapters A and B = double-stranded DNA library (A - DNA - B)]
Adapters have 2 functions
• Clonal amplification
• Sequencing
Illumina has separate sequences for each function
[Figure: library molecule A - DNA - B, with the sequencing and amplification regions labeled]
Why we focus on Illumina
• Illumina currently has >80% of the
sequencing market due to cost, read
length, & application advantages.
• Their sequencing technology
is particularly well suited to
large numbers of ID tags.
Illumina HiSeq Data
If printed on paper (double-sided), the data would:
• Be taller than the Empire State Building (1314 feet vs. 1250 feet)
• Literally fill a huge dump truck
One
lane!
One Flow Cell
Downsizing NGS Projects
What if I don’t need
Mb or Gb per sample?
How do I use NGS to reduce the
costs for small-scale projects?
How do I use NGS to do huge
numbers of samples?
Next-Gen is cheap per pound,
but you have to buy it by the
truckload
Capillary sequencing = $1400 / Mb
Minimum buy in = $5
Illumina MiSeq = $0.14 / Mb
Minimum buy in = $1000
How do I get $5
worth of MiSeq data?
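The price comparison above is simple arithmetic. Here is a minimal sketch that uses only the per-Mb prices and minimum buy-ins quoted on the slide; the function and dictionary names are illustrative, not from the talk.

```python
# Prices and minimum buy-ins as quoted on the slide (US $).
PLATFORMS = {
    "capillary": {"usd_per_mb": 1400.0, "min_buy_in_usd": 5.0},
    "miseq": {"usd_per_mb": 0.14, "min_buy_in_usd": 1000.0},
}


def mb_for_budget(platform: str, budget_usd: float) -> float:
    """Megabases that a budget buys at the quoted per-Mb price."""
    return budget_usd / PLATFORMS[platform]["usd_per_mb"]


def share_of_minimum_run(platform: str, budget_usd: float) -> float:
    """Fraction of the minimum buy-in that a budget represents."""
    return budget_usd / PLATFORMS[platform]["min_buy_in_usd"]


if __name__ == "__main__":
    for name in PLATFORMS:
        print(name, round(mb_for_budget(name, 5.0), 4), "Mb for $5")
    print("share of a MiSeq run:", share_of_minimum_run("miseq", 5.0))
```

At these prices $5 is 0.5% of the MiSeq minimum, so about 200 projects of that size would have to share one run.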
Key Question
How can I divide a truckload of data
more efficiently than using
commercial ID tags?
MiSeq
HiSeq
Making small amounts of
cheap NGS data requires:
• Low-cost library preparations
– Multiplex within standard libraries
• Lots of ID-tags (indexes)
– >Thousands (not hundreds)
• All libraries must be poolable
– Use standard sequencing primers
– Pool amplicons, genomic, RADseq,
sequence capture, etc.
Not Enough Tags
Tags Available
Tags Needed
Approach 1: More (& better) tags
Faircloth and Glenn (2012) Not all sequence tags are created equal...
PLoS ONE 7(8):e42543
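One way to judge whether a tag set is "better" is the smallest pairwise distance between its tags. The sketch below uses Hamming distance on equal-length tags as a simple stand-in; it does not reproduce the method of the cited paper, and the example tags are made up for illustration.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length tags")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags) -> int:
    """Smallest Hamming distance between any two tags in the set."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


def correctable_substitutions(min_distance: int) -> int:
    """Substitution errors that can be corrected unambiguously."""
    return (min_distance - 1) // 2


# Hypothetical 6-nt tags, for illustration only.
EXAMPLE_TAGS = ["AACCGG", "CCGGTT", "GGTTAA", "TTAACC"]

if __name__ == "__main__":
    d = min_pairwise_distance(EXAMPLE_TAGS)
    print("minimum distance:", d)
    print("correctable substitutions:", correctable_substitutions(d))
```

Hamming distance only counts substitutions; insertions and deletions would need an edit-distance measure instead.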
The Cost of Many Tags
96 DNA samples per plate x 96 plates = 9216 samples
[Figure: 96-well plate, columns 1-12, rows A-H]
ID-tagged Primers    Cost (US $)
9216                 $232,250
Need tags 11 nt long
This is Expensive!
Combinatorial Approach
Use multiple tags on each sample in different combinations
Tag 1
Tag A
Tag 2
Tag B
Tag 1
DNA sample 1
Tag A
Tag 1
DNA sample 2
Tag B
Tag 2
DNA sample 3
Tag A
Tag 2
DNA sample 4
Tag B
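The assignment shown above is the Cartesian product of the two tag sets. A minimal sketch follows; the function name is illustrative, and the tag and sample labels are the ones on the slide.

```python
from itertools import product


def assign_tag_pairs(first_tags, second_tags, samples):
    """Give each sample a unique (first tag, second tag) combination."""
    pairs = list(product(first_tags, second_tags))
    if len(samples) > len(pairs):
        raise ValueError("not enough tag combinations for the samples")
    return dict(zip(samples, pairs))


if __name__ == "__main__":
    samples = [f"DNA sample {i}" for i in range(1, 5)]
    assignment = assign_tag_pairs(["Tag 1", "Tag 2"], ["Tag A", "Tag B"], samples)
    for sample, pair in assignment.items():
        print(sample, pair)
```

Four tags (two at each position) are enough to label four samples uniquely; the number of combinations grows as the product of the set sizes.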
The Power of Multiple Tags
96 DNA samples per plate x 96 plates = 9216 samples
[Figure: 96-well plate, columns 1-12, rows A-H]
Tag Positions    ID-tagged Primers    Cost (US $)
1                9216                 $232,250
2                96+96                $4,864
So, combinatorial tagging
requires a _lot_ fewer primers
& reduces costs tremendously!
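The primer counts in the table follow from a short calculation. This sketch assumes equal-sized tag sets at every position; the function name is illustrative, and the dollar figures are copied from the slide rather than computed.

```python
def primers_needed(n_samples: int, tag_positions: int) -> int:
    """Fewest tagged primers giving n_samples unique combinations,
    assuming the same number of tags at every position."""
    per_position = 1
    while per_position ** tag_positions < n_samples:
        per_position += 1
    return per_position * tag_positions


# Costs as quoted on the slide (US $), keyed by number of tag positions.
SLIDE_COST_USD = {1: 232_250, 2: 4_864}

if __name__ == "__main__":
    for positions in (1, 2):
        print(positions, "position(s):", primers_needed(9216, positions), "primers")
    print("cost ratio:", round(SLIDE_COST_USD[1] / SLIDE_COST_USD[2], 2))
```

With two tag positions, 96 + 96 = 192 primers cover the same 9216 samples that need 9216 primers with a single position.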
Adapter wish list
Adapters should accommodate:
• Intact or sheared DNA
• Small amounts of DNA
• Amplicons
• Sequence Capture
• RADseq & GBS
• Whole Genome Sequencing
Original Illumina Indexing Method
• Start: shredded DNA sample
• Ligate Y-yoke adapters (A primer stub; B primer stub, rev. comp.; T/A overhangs not shown)
• Limited cycle PCR with the A primer and the indexed B primer
• Result: full-length & indexed library (A - DNA - B)
Adapterama Overview
• Variety of input DNA
• Application-specific adapters
• Consistent sets of PCR primers: iTru_5 primer (indexed A primer, i5 index) and iTru_7 primer (indexed B primer, i7 index)
• Consistent double-indexed libraries (A - DNA - B)
Adapterama Overview
• Illumina TruSeq (or Nextera) compatible
• 2-stage library preparations
• Same 2nd stage primers for all methods
• Dual-indexed (8 nt) primers
• >384 total designed (most Illumina compatible)
• Protocols & tools are freely available
• Oligo aliquots & services available at cost
iTru Double Index Libraries
Similar to Illumina TruSeq HT – just a homebrew with many more indexed primers available
• Start: shredded DNA sample + A (T/A overhangs not shown)
• Ligate iTru Y-yoke adapter + T (A primer stub; B primer stub, rev. comp.)
• Limited cycle PCR with the iTru_5 primer (indexed A primer, i5 index) and iTru_7 primer (indexed B primer, i7 index)
• Result: full-length & double-indexed library (A - DNA - B)
Adapterama I: iTru Libraries
Preparation of genomic DNA
DNA shearing
Blunt end repair
A-tailing: a 3’ A overhang is added to each strand
Adapterama I: iTru Libraries
Ligation of stubby Y-yoke adapter
Sequence complementary
to i7 adapter
Sequence complementary
to i5 adapter
Y-yoke
structure
GCTCTTCCGATCT
CGAGAAGGCTAGA
Y-yoke
structure
AGATCGGAAGAGC
TCTAGCCTTCTCG
Adapterama I: iTru Libraries
Step 2: Limited cycle PCR
iTru7 primer
5’-ACACTCTTTCCCTACACGACGCTCTTCCGATCT
3’-TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC-3’
TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTCCAGTGATAGAGCATACGGCAGAAGACGAAC-5’
i7 index
P7 sequence
Adapterama I: iTru Libraries
Limited cycle PCR
P5 sequence
i5 index
5’-AATGATACGGCGACCACCGAGATCTACACACCGACAAACACTCTTTCCCTACACGACGCTCTTCCGATCT
3’-TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA
iTru5 primer
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACAGGTCACTATCTCGTATGCCGTCTTCTGCTTG-3’
TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTCCAGTGATAGAGCATACGGCAGAAGACGAAC-5’
Adapterama I: iTru Libraries
Limited cycle PCR
5’-AATGATACGGCGACCACCGAGATCTACACACCGACAAACACTCTTTCCCTACACGACGCTCTTCCGATCT
3’-TTACTATGCCGCTGGTGGCTCTAGATGTGTGGCTGTTTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACAGGTCACTATCTCGTATGCCGTCTTCTGCTTG-3’
TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTCCAGTGATAGAGCATACGGCAGAAGACGAAC-5’
iTru7 primer
Adapterama I: iTru Libraries
Double indexed, complete DNA molecule
5’-AATGATACGGCGACCACCGAGATCTACACACCGACAAACACTCTTTCCCTACACGACGCTCTTCCGATCT
3’-TTACTATGCCGCTGGTGGCTCTAGATGTGTGGCTGTTTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACAGGTCACTATCTCGTATGCCGTCTTCTGCTTG-3’
TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTCCAGTGATAGAGCATACGGCAGAAGACGAAC-5’
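The complete molecule above can be read as a concatenation of named segments. The sketch below splits the top-strand sequences shown on the slides into those segments and reassembles them around an insert. The segment boundaries (29-nt P5, 8-nt index, 33-nt Read 1 stub, and so on) are my reading of the slide labels, and the variable names are illustrative.

```python
# Segments of the top strand, 5' to 3', as read from the slides.
P5 = "AATGATACGGCGACCACCGAGATCTACAC"
I5_INDEX = "ACCGACAA"            # example i5 index shown on the slide
READ1_STUB = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
READ2_STUB = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
I7_INDEX = "AGGTCACT"            # example i7 index, as written on the top strand
P7 = "ATCTCGTATGCCGTCTTCTGCTTG"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_top_strand(insert: str) -> str:
    """Full-length, double-indexed top strand around a given insert."""
    return P5 + I5_INDEX + READ1_STUB + insert + READ2_STUB + I7_INDEX + P7


if __name__ == "__main__":
    molecule = build_top_strand("ACGT")  # placeholder insert
    print(len(molecule), "nt")
    print("bottom strand, 5' to 3':", reverse_complement(molecule))
```

Swapping `I5_INDEX` and `I7_INDEX` for other 8-nt indexes gives the other members of the primer sets without changing any other segment.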
The Power of Multiple Tags
We have: 192 + 384 primers designed
104 + 250 primers in hand
96 + 144 aliquots available
(at cost)
The Power of Multiple Tags
96 DNA samples per plate x 96 plates = 9216 samples
[Figure: 96-well plate, columns 1-12, rows A-H]
Tag Positions    ID-tagged Primers    Cost (US $)
1                9216                 $232,250
2                96+96                $4,864
aliquots         96+144               $725
If you are making <40,000 libraries
you can save $ by getting aliquots from us
If you want to get your own from IDT or whoever,
we are happy to share the primer sequences.
Combinatorial tagging
[Figure: 96-well plate with iTru7 primers across columns 1-12 and iTru5 primers down rows A-H]
What if I have more than 96?
iTru5 primers – 4 x 8 = 32
iTru7 primers – 4 x 12 = 48
Set 101 Set 102 Set 103 Set 104
Set 01
Set 02
Set 03
Set 04
01/101
02/102
03/103
04/104
Samples with Unique Tags: 4 x 96 = 384
What if I have more than 96?
iTru5 primers – 4 x 8 = 32
iTru7 primers – 4 x 12 = 48
          Set 101   Set 102   Set 103   Set 104
Set 01    01/101    01/102    01/103    01/104
Set 02    02/101    02/102    02/103    02/104
Set 03    03/101    03/102    03/103    03/104
Set 04    04/101    04/102    04/103    04/104
Samples with Unique Tags: 16 x 96 = 1536
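The two totals (384 and 1536) come from pairing the primer sets either one-to-one or in every combination. Here is a minimal sketch using the set names from the slide; the function names are illustrative.

```python
from itertools import product

ROW_SETS = ["01", "02", "03", "04"]
COLUMN_SETS = ["101", "102", "103", "104"]
SAMPLES_PER_PAIRING = 8 * 12  # one 96-well plate per pairing of sets


def matched_pairings():
    """Pair each set with one partner only: 01/101, 02/102, ..."""
    return [f"{r}/{c}" for r, c in zip(ROW_SETS, COLUMN_SETS)]


def all_pairings():
    """Pair every set with every other set: 01/101, 01/102, ..."""
    return [f"{r}/{c}" for r, c in product(ROW_SETS, COLUMN_SETS)]


if __name__ == "__main__":
    print(len(matched_pairings()) * SAMPLES_PER_PAIRING, "samples, matched sets")
    print(len(all_pairings()) * SAMPLES_PER_PAIRING, "samples, all combinations")
```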
Combinatorial Tagging
Universal
Stubs
No tags are used here
Reusable
Pool & PCR with i5 & i7
8 x 12 = 96
Illumina i5 & i7’s
2 oligos
Share these
192 x 384 = 73,728
iTru5’s & iTru7’s
Pool, Sequence, Deconvolute
Up to 96
samples
Up to 73,728
samples
Amplicon Tagging
How can I quickly & easily make
amplicon libraries with multiple
tags?
How do I make them
so I can spike in a tiny
fraction or use an
entire MiSeq run?
iTru Fusion PCR Primers
Illumina TruSeq Compatible
Flanking DNA Region of Interest
Flanking DNA
5’-CGTCGGATGTAAGACACACACACACACACTCCGAATCGGCGT-3’
3’-GCTGCCTACATTCTGTGTGTGTGTGTGTGAGGCTTAGCCGCA-5’
5’-CGTCGGATGTA-3’
Forward Primer
Read1
3’-GCTTAGCCGCA-5’
Reverse Primer
Read2
Add Illumina TruSeq (or Nextera)
Read 1 & Read 2 Sequences onto
the 5’ end of your primers
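Building a fusion primer is a string concatenation: the read sequence goes on the 5’ end of the locus primer. The sketch below uses the locus primers from this slide and the TruSeq-style stub sequences shown on the limited cycle PCR slides. Whether the real iTru fusion primers use exactly these tails is an assumption, and the names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Read 1 tail as shown on the limited cycle PCR slides (5' to 3').
READ1_TAIL = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
# Read 2 tail: reverse complement of the top-strand stub on the same slides.
READ2_TAIL = reverse_complement("AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC")

FORWARD_LOCUS_PRIMER = "CGTCGGATGTA"
# The slide writes the reverse primer 3' to 5'; flip it to read 5' to 3'.
REVERSE_LOCUS_PRIMER = "GCTTAGCCGCA"[::-1]


def fusion_primer(read_tail: str, locus_primer: str) -> str:
    """Add the read tail onto the 5' end of a locus-specific primer."""
    return read_tail + locus_primer


if __name__ == "__main__":
    print("forward:", fusion_primer(READ1_TAIL, FORWARD_LOCUS_PRIMER))
    print("reverse:", fusion_primer(READ2_TAIL, REVERSE_LOCUS_PRIMER))
```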
iTru Double Index 2-step Libraries
• Fusion PCR: forward and reverse primers carry the TruSeq A stub and TruSeq B stub, giving fusion-tagged amplicons
• Limited cycle PCR with the iTru_5 primer (indexed A primer, i5 index) and iTru_7 primer (indexed B primer, i7 index)
• Result: full-length & double-indexed library (A - DNA - B)
Fusion Amplicons
• Requires 2 PCRs per sample
• Limited by i5xi7 tag combinations
• Requires 5-20% PhiX (or other
genomic samples) to create diversity
What can I do to solve all 3 of
these constraints simultaneously?
iTru Fusion Primers
Illumina TruSeq Compatible
Flanking DNA Region of Interest Flanking DNA
5’-CGTCGGATGTAAGACACACACACACACACTCCGAATCGGCGT-3’
3’-GCTGCCTACATTCTGTGTGTGTGTGTGTGAGGCTTAGCCGCA-5’
Tag_A
Tag_1
iTru_Read2
iTru_Read1
Tagged Forward Primers (rows) x Tagged Reverse Primers (columns)
         Tag 1   Tag 2   Tag 3   Tag 4   Tag 5   Tag 6
Tag A    A-1     A-2     A-3     A-4     A-5     A-6
Tag B    B-1     B-2     B-3     B-4     B-5     B-6
Tag C    C-1     C-2     C-3     C-4     C-5     C-6
Tag D    D-1     D-2     D-3     D-4     D-5     D-6
Internal Tags Vary in Length
iTru Quadruple Index 2-step Libraries
• Fusion PCR: forward and reverse primers carry the TruSeq A stub and TruSeq B stub plus internal index 1 and internal index 2, giving fusion-tagged amplicons
• Limited cycle PCR with the iTru_5 primer (indexed A primer, i5 index) and iTru_7 primer (indexed B primer, i7 index)
• Result: full-length & quadruple-indexed library (A - DNA - B)
TaggiMatrix Project Workflow
8 + 12 = 20 primers
Forward_Primer_A
Forward_Primer_B
…
Forward_Primer_H
Reverse_Primers_1 - 12
1 2 3 4 5 6 7 8 9 10 11 12
A
B
C
D
E
F
G
H
96 unique
ID_tagged
Amplicons
Pooled
amplicons
96 DNA samples
Sequence library
Demultiplex
Sequences &
Analyze
Add i5 + i7
to make
library
Pooled
library
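The demultiplexing step in this workflow can be sketched as a lookup of the inner tag at the start of each read. The tag sequences below are hypothetical, not the real TaggiMatrix tags, and the matching is exact-prefix only with no error correction.

```python
from collections import Counter

# Hypothetical inner tags (sequence -> plate row or column), illustration only.
FORWARD_TAGS = {"GGTAC": "A", "CAACAC": "B"}
REVERSE_TAGS = {"AGGAA": "1", "GAGTGG": "2"}


def match_tag(read: str, tags: dict):
    """Return the name of the tag the read starts with, or None."""
    for tag, name in tags.items():
        if read.startswith(tag):
            return name
    return None


def demultiplex(read_pairs) -> Counter:
    """Count read pairs per well, keyed by row + column (e.g. 'A1')."""
    counts = Counter()
    for read1, read2 in read_pairs:
        row = match_tag(read1, FORWARD_TAGS)
        column = match_tag(read2, REVERSE_TAGS)
        well = row + column if row and column else "unassigned"
        counts[well] += 1
    return counts


if __name__ == "__main__":
    pairs = [
        ("GGTACCGTCGG", "AGGAAACGCC"),
        ("CAACACCGTCGG", "GAGTGGACGCC"),
        ("TTTTT", "AGGAAACGCC"),
    ]
    print(demultiplex(pairs))
```

Prefix matching with variable-length tags only works if no tag is a prefix of another, which holds for the made-up set here.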
The Power of Multiple Tags
96 x 96 = 9216 samples
96 unique ID_tagged
Amplicons per plate
Each pool of 96 into one well
of a new master plate
96 plates
Sequence library
Demultiplex
Sequences &
Analyze
Add unique combinations
of i5 + i7 to make libraries
Pooled
library
The Power of Multiple Tags
96 DNA samples per plate x 96 plates = 9216 samples
[Figure: 96-well plate, columns 1-12, rows A-H]
Tag Positions    ID-tagged Primers    Cost (US $)
1                9216                 $232,250
2                96+96                $4,864
3                8+12+96              $1,491
4                8+12+8+12            $486
Hierarchical Tagging
Project
Specific
8 x 12 = 96 tag
combinations
20 oligos
Reusable
Pool & PCR with i5 & i7
Share these
8 x 12 = 96
Illumina i5 & i7’s
192 x 384 = 73,728
BCF i5 & i7’s
Pool, Sequence, Deconvolute
Up to 9,216
samples
Up to 7,077,888
samples
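The sample limits on this slide are products of the tag counts at each level. A minimal sketch, with an illustrative function name:

```python
def unique_samples(inner_forward: int, inner_reverse: int,
                   outer_i5: int, outer_i7: int) -> int:
    """Samples that can be told apart with inner and outer tag pairs."""
    return inner_forward * inner_reverse * outer_i5 * outer_i7


if __name__ == "__main__":
    # 8 x 12 inner tags combined with 8 x 12 Illumina i5 & i7 indexes
    print(unique_samples(8, 12, 8, 12))
    # 8 x 12 inner tags combined with 192 x 384 outer indexes
    print(unique_samples(8, 12, 192, 384))
```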
That’s really cool, but…
This still seems like a lot
of stuff to keep track of,
order, think about, etc.
So, how can I do it?
TaggiMatrix
So…
TaggiMatrix turns a problem of
resource limitation into logistical
& social problems (i.e., how do
you coordinate all these
samples).
TaggiMatrix Highlights
• Standard set of 20 tags
– 8 used on Forward Primers
– 12 used on Reverse Primers
• Tags Vary in Length
– This generates sequence diversity downstream
• Use Nextera or TruSeqHT Primers
– Can use Illumina primers (8 + 12)
– Can use iNext or iTru primers (160 + 240)
• iNext & iTru versions are validated
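Why variable-length tags add sequence diversity can be shown by counting the distinct bases seen at each sequencing cycle. In this sketch the tags are hypothetical, and the locus primer is the forward primer from the fusion primer slide.

```python
def distinct_bases_per_cycle(reads, n_cycles: int):
    """Number of different bases observed at each of the first n cycles."""
    return [
        len({read[i] for read in reads if i < len(read)})
        for i in range(n_cycles)
    ]


LOCUS_PRIMER = "CGTCGGATGTA"
# Hypothetical tags of 5, 6, 7 and 8 nt, for illustration only.
TAGS = ["GGTAC", "CAACAC", "ATCGGTT", "TCGGTCAA"]

untagged_reads = [LOCUS_PRIMER] * 4
tagged_reads = [tag + LOCUS_PRIMER for tag in TAGS]

if __name__ == "__main__":
    print("untagged:", distinct_bases_per_cycle(untagged_reads, 11))
    print("tagged:  ", distinct_bases_per_cycle(tagged_reads, 11))
```

Without tags every read shows the same base at every cycle. With staggered tags the shared primer is shifted by a different offset in each read, so the cycles stay mixed past the end of the tags.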
Adapterama I Summary
• Manuscripts are nearing submission
• We are happy to share spreadsheets for:
• iTru & iNext primers we have designed
• TaggiMatrix primer designs
• RADseq adapter oligo designs
• You can order your own primers from IDT
• We do have some primers in plates pre-aliquoted at IDT for distribution
Ultraconserved elements (UCEs) as
universal sequence capture probes
and loci
UCEs = stretches of DNA that
are remarkably conserved across
highly divergent species
(Bejerano et al. 2003)
Brant Faircloth is a bioinformatics guru
who has been key in developing UCEs
and tagging analysis.
Primer Aliquots
Aliquoted, dried & heat sealed at IDT
[Figure: iTru5 primer plate, columns 1-12, rows A-H]
Even numbered columns are empty
Primer Aliquots
Aliquoted, dried & heat sealed at IDT
iTru7 primer plates
[Figure: iTru7 primer plate, columns 1-12, rows A-H]
UCE probes available from Mycroarray
Probe sets for 5000 or 2500 loci can be obtained for $600 from Mycroarray
No, I (we) don’t get any of that $$$
http://www.mycroarray.com/mybaits/mybaits-UCEs.html
Probes (baits) for
regions of interest
Biotinylated
Oligo Capture
• Library Construction: Genomic DNA is fragmented and adapters added to each end
• Probe Construction: Make biotinylated probes that are complementary to regions of interest
• Use Probes to Enrich Library for fragments that hybridize to probes
http://www.opengenomics.com/sureselect-insolution.aspx
http://ultraconserved.org/
How does iNext differ
from iTru?
• Nextera Read1 & Read2 perform the same
function as in iTru, but are completely
different sequences
• Nextera & iNext stubs have G overhangs
• Insert DNA must have a C overhang
– Commercial kits must be modified
• Must use iNext5 & iNext7 primers
Utility of TaggiMatrix
• Microbial Community Analysis
– EPA – Kelvin Wong & Marirosa Molina
– USDA – Brian Oakley, Mike Rothrock
– UGA - Erin Lipp et al.
• STDs in Chimpanzees (Julie Rushmore)
• Tropical Tree Roots (Brant Faircloth)
• Tropical Tree Wood-rotting Fungi
(Brant Faircloth, Greg Gilbert, Steve Hubbell)
3 Major Ways to Multi-Tag
• Hierarchical – multiple tags in series
(e.g., inner & outer)
• Ecumenical – tags are platform
independent
• Combinatorial – tags are independent
Oligo Library Capture
Agilent SureSelect:
- 55,000 probes
- smallest kit =
~$7k for 10 samples
(price assumes discount)
Mycroarray MYselect:
- 20,000 probes
- smallest kit =
~$2.5k for 12 samples
http://www.opengenomics.com/sureselect-insolution.aspx
That’s keen, but…
I don’t have 20,000 loci
for my critters and
$2,550 / 12 = $212
Can I do this for less
than $250/sample?
Reducing Costs of
Sequence Capture
• ID_tag multiple samples, pool them,
and do captures on the pools
• Use fewer independent probes:
– 20k probes = 1x concentration
– 10k probes = 2x concentration
– 2k probes = 10x concentration
http://bad-dna.org/tags/
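Both cost-saving ideas above are simple ratios. This sketch assumes that per-probe concentration scales inversely with the number of distinct probes, as the 20k/10k/2k list implies, and that the kit price covers one capture reaction per "sample". The pool size of 8 is a hypothetical example, and the function names are illustrative.

```python
def relative_probe_concentration(n_probes: int, reference_probes: int = 20_000) -> float:
    """Per-probe concentration relative to a 20k-probe set."""
    return reference_probes / n_probes


def cost_per_sample(kit_cost_usd: float, captures: int, samples_per_capture: int = 1) -> float:
    """Capture cost per sample when tagged samples are pooled per capture."""
    return kit_cost_usd / (captures * samples_per_capture)


if __name__ == "__main__":
    for n in (20_000, 10_000, 2_000):
        print(n, "probes:", relative_probe_concentration(n), "x")
    print("unpooled:", cost_per_sample(2550, 12))
    print("pools of 8 (hypothetical):", cost_per_sample(2550, 12, 8))
```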
Sequencing Introduces Errors
From B. Faircloth IOB Seminar 2013
Sequencing Process Errors
From B. Faircloth IOB Seminar 2013
Errors in Tag Sequences
From B. Faircloth IOB Seminar 2013
ID Nomenclature
Ion Torrent or 454 ID-Tagging
DNA
A
B
Illumina ID Tagging
DNA
A
ID sequence
B
ID sequence
ID sequences are called:
Name        Source               # blessed     Length
MIDs        454                  ~130          10
Indexes     Illumina             ≤96           6
Barcodes    Various              Varies        Varies
Tags        Faircloth & Glenn    Up to 9000    6-10
Sequencing Projects
[Figure: project types plotted by number of individuals (1 to 1000s) against number of loci (1, thousands, millions, whole genome): PCR, sequence capture (UCEs, exons), RADseq, WGS]
Genetic Projects
[Figure: number of individuals (1 to 1000s) against number of loci (1, thousands, millions, whole genome) for PCR, sequence capture (UCEs, exons), RADseq, and WGS, labeled with the Adapterama components: I – iTru & iNext; II – TaggiMatrix; III – M-RAD, 3-RAD & B-RAD; IV – Splitaake]
Core UCE Research Team
Kevin Winker (UAF)
Robb Brumfield (LSU)
John McCormack (Occidental College)
Nick Crawford (Cal Acad)
Mike Harvey (LSU)
Mike Alfaro (UCLA)
Brian Smith (LSU)
Mike Braun (SI)
Ed Braun (UFL)
Noor White (UMD)