Titration-free 454 sequencing using Y adapters

protocol
Zongli Zheng1,2, Abdolreza Advani3, Öjar Melefors2,3, Steve Glavas3, Henrik Nordström2,3, Weimin Ye1,
Lars Engstrand2,3 & Anders F Andersson4
1Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden. 2Department of Microbiology, Tumor and Cell Biology,
Karolinska Institutet, Stockholm, Sweden. 3Swedish Institute for Infectious Disease Control, Solna, Sweden. 4Science for Life Laboratory, KTH Royal Institute of Technology,
Stockholm, Sweden. Correspondence should be addressed to Z.Z. ([email protected]).
© 2011 Nature America, Inc. All rights reserved.
Published online 18 August 2011; doi:10.1038/nprot.2011.369
We describe a protocol for construction and quantification of libraries for emulsion PCR (emPCR)-based sequencing platforms
such as Roche 454 or Ion Torrent PGM. The protocol involves library construction using customized Y adapters, quantification
using TaqMan-MGB (minor groove binder) probe–based quantitative PCR (qPCR) and calculation of an optimal template-to-bead ratio based on Poisson statistics, thereby avoiding the need for a laborious titration assay. Unlike other qPCR methods,
the TaqMan-MGB probe specifically quantifies effective libraries in molar concentration and does not require specialized
equipment. A single quality control step prior to emulsion PCR ensures that libraries contain no adapter dimers and have an
optimal length distribution. The presented protocol takes ~7 h to turn genomic DNA into eight barcoded libraries
that are ready to use for full-scale emPCR. It will be useful, for example, to allow analyses of precious clinical samples and
amplification-free metatranscriptomics.
INTRODUCTION
Modern DNA sequencing technology has improved markedly in
recent years. However, in many current technologies, sample library
preparation before sequencing has surfaced as a key limiting factor.
For instance, the current Roche 454 sequencing protocol for preparation of a shotgun library1 requires 500 ng of DNA as starting
material and includes a laborious titration assay. A faster library
preparation protocol that can handle lower starting amounts would
be desirable and particularly useful for, e.g., sequence analyses of
precious clinical samples, cDNA sequencing of environmental
samples for metatranscriptomics2 without the need for amplification that may introduce biases3, or microsatellite sequencing in
population genetics4. The protocol presented here is based on our
previous study that described a novel method for sequencing low-starting-amount materials5.
Library preparation in most high-throughput sequencing technologies involves the ligation of universal adapter(s) to the ends
of DNA sample fragments to enable PCR amplification6–8. Unlike
linear adapters such as adapters A and B (used for 454 sequencing),
a single Y adapter, proposed previously9 and used for the Illumina
sequencing, has several advantages. Given a 100% ligation efficiency, four double-stranded DNA (dsDNA) molecules would, on
average, generate two properly appended dsDNA libraries using
adapters A and B, whereas eight single-stranded DNA (ssDNA)
libraries would be generated using a single Y adapter (Fig. 1). In
addition, the Y adapters can only be ligated at the double-stranded
stem end, which enables a simultaneous incubation with all enzymes
involved, thus eliminating the need for laborious and yield-reducing
cleanup steps.
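The yield comparison above can be reproduced by enumerating the ligation outcomes. The following is a minimal sketch, assuming 100% ligation efficiency (as in the text) and that each fragment end independently receives adapter A or B with equal probability; the function names are illustrative and not part of the protocol.

```python
from fractions import Fraction
from itertools import product


def expected_ab_libraries(n_fragments: int) -> Fraction:
    """Expected number of amplifiable dsDNA libraries with linear adapters A and B.

    Assumption: each fragment end independently receives A or B with
    probability 1/2. Only mixed fragments (A-B or B-A) are amplifiable;
    A-A and B-B fragments are not.
    """
    outcomes = list(product("AB", repeat=2))        # AA, AB, BA, BB
    usable = [o for o in outcomes if o[0] != o[1]]  # AB, BA
    return n_fragments * Fraction(len(usable), len(outcomes))


def expected_y_libraries(n_fragments: int) -> int:
    """Number of effective ssDNA libraries with a single Y adapter.

    Both strands of every ligated fragment carry the two different branch
    sequences, so each dsDNA fragment yields two effective ssDNA templates.
    """
    return 2 * n_fragments


print(expected_ab_libraries(4))  # 2
print(expected_y_libraries(4))   # 8
```

With four input fragments this gives two A-B libraries and eight Y libraries, matching the numbers quoted for Figure 1.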
One of the main differences in current high-throughput DNA
sequencing technology as compared with traditional Sanger
sequencing is that sample template concentration is kept very
low to avoid tedious microbial subcloning. Emulsion PCR–based
sequencing uses many millions of water-in-oil droplets, each of
which serves as a separated amplification compartment6. Sample
library concentration is kept so low that the majority of the droplets
contain no library, a small proportion contains single-molecule
libraries and an even smaller proportion contains mixed-molecule
libraries in a stochastic manner that follows Poisson distribution,
as shown by our sequencing data5. In addition, the enrichment
step will select only those beads that have a library, but a too-low
DNA-to-bead ratio will lead to an insufficient number of beads for
sequencing, whereas a too-high ratio will lead to frequent occurrence of mixed library beads. Thus, one of the key factors for a
successful experiment is to use an optimal amount of library for
sequencing. It is important to keep in mind that the amount of
library added is not linearly associated with the number of high-quality beads5. We recommend an input DNA-to-bead ratio of 0.08,
which will result in 96% of the enriched beads having a single-copy
template according to the Poisson distribution, and will be sufficient
for sequencing5. A higher proportion of ‘nonpure’ (mixed-copy)
beads associated with higher DNA-to-bead ratio might, in addition, affect the consumption of nucleotide flows during sequencing and bioinformatics processes, such as image background and
signal intensity normalizations. Apart from the predicted increase
of mixed library beads, a higher ratio of input DNA to beads is also
less tolerant of subtle pipetting errors5.
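The 96% figure follows directly from the Poisson distribution. The sketch below is illustrative only; it assumes that the number of library molecules per bead is Poisson distributed with a mean equal to the input DNA-to-bead ratio, and that enrichment recovers every bead carrying at least one template. The function names are not from the protocol.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a bead receives exactly k library molecules."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def bead_fractions(copies_per_bead: float) -> dict:
    """Summarize bead loading for a given input DNA-to-bead ratio."""
    empty = poisson_pmf(0, copies_per_bead)
    single = poisson_pmf(1, copies_per_bead)
    loaded = 1.0 - empty  # beads with at least one template
    return {
        "loaded": loaded,
        "single_among_loaded": single / loaded,       # pure beads after enrichment
        "mixed_among_loaded": 1.0 - single / loaded,  # mixed-copy beads
    }


for ratio in (0.08, 0.5, 1.0, 2.0):
    f = bead_fractions(ratio)
    print(f"ratio {ratio:4.2f}: loaded {f['loaded']:.3f}, "
          f"single-copy among loaded {f['single_among_loaded']:.3f}")
```

At a ratio of 0.08, about 7.7% of beads carry a template and about 96% of those carry a single copy. Raising the ratio increases the loaded fraction less than proportionally while the mixed fraction grows, which is the nonlinearity described above.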
Two quantitative PCR (qPCR) assays have previously been proposed to quantify libraries derived from trace amounts of starting
material10,11. Besides requiring less library, as compared with UV
spectrophotometry and fluorometry methods, qPCR assays also
have the advantage of measuring the amount of effective library—
as the total library typically contains a mixture of molecules that
are amplifiable, amplifiable but inefficient, or nonamplifiable for
various reasons5. The previous two methods are based on SYBR
Green dye10 qPCR and universal template TaqMan probe digital
PCR11, respectively. With the SYBR Green–based qPCR assay, there
is no need to design and use the relatively expensive TaqMan probe.
However, it measures the total mass of the library and requires transformation into copy numbers on the basis of amplicon size estimation by gel electrophoresis or Agilent Bioanalyzer. Furthermore,
the precision (coefficient of variation (CV) of the estimates) of this
assay has not been assessed. In contrast, TaqMan-based assays have
nature protocols | VOL.6 NO.9 | 2011 | 1367
Figure 1 | A schematic illustration of the constructions of two types of
libraries, A-B and Y. The A-B library construction method generates on
average two effective double-stranded molecules (each appended with
adapters A (blue) and B (green)) from four dsDNA molecules, given
100% ligation efficiency. The molecules appended with A-A and B-B are
nonamplifiable because of the amplification-inhibiting hairpins formed
between the complementary adapted sequences after denaturing and
annealing steps. The Roche A-B library uses a biotinylated B adapter and
two additional steps to generate two effective ssDNA molecules (see ref. 5).
In contrast, the Y library construction method generates eight effective
single-stranded molecules. The MGB-probe is in red and barcode in yellow.
[Figure 1 panel labels. A-B library: end polishing and 5′ phosphorylation, followed by ligation of adapters A and B. Y library: end polishing, 5′ phosphorylation, adenylation and ligation of Y adapter, without cleanup in between.]
the advantage of measuring the number of amplifiable molecules
directly. The universal template TaqMan probe digital PCR is based
on an 8-bp dual-labeled locked nucleic acid probe, complementary
to the 5′-end tail of the customized amplification primer11. First,
the digital PCR assay requires special equipment that is not widely
accessible, such as Fluidigm’s BioMark microfluidic device. Second,
two rounds of quantifications are used: an initial crude quantification by qPCR to guide the dilution of the libraries for a more
precise quantification by digital PCR, which renders a precision
(CV 11.8%) higher than the initial qPCR alone (CV 21.2%)11. Our
MGB probe-based assay5 does not require special equipment and,
therefore, is more accessible to ordinary laboratories. This MGB
probe assay is at least as precise as digital PCR (CV 9.5% versus
11.8%). The MGB probe is a 20-bp-long probe complementary
to the library molecule and is located next to the 3′ end of one
of the amplification primers (Fig. 2). Having a probe targeting
site between the amplification primers has the advantage that the
amount of fluorescence signal is proportional to the number of
library amplicons and not to the potential amplification primer
dimers, which might be a source of less precise quantification when a probe targets part of an amplification primer.
Further, a longer probe, as compared with a shorter one (8 bp), is
more specific.
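To illustrate why a mass-based assay (such as the SYBR Green method discussed above) depends on the amplicon size estimate, the conversion from mass to copy number can be sketched as follows. This is not part of the protocol; it assumes the commonly used approximation of about 660 g/mol per base pair of dsDNA, and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23     # molecules per mole
G_PER_MOL_PER_BP = 660.0     # assumed average molar mass of one dsDNA base pair


def dsdna_copies(mass_ng: float, mean_length_bp: float) -> float:
    """Convert a dsDNA mass (ng) into a molecule count for a given mean length."""
    moles = mass_ng * 1e-9 / (mean_length_bp * G_PER_MOL_PER_BP)
    return moles * AVOGADRO


# The same 1 ng corresponds to different copy numbers depending on the
# fragment length read off a gel or Bioanalyzer trace.
for length in (500, 600, 700):
    print(f"{length} bp: {dsdna_copies(1.0, length):.3e} copies per ng")
```

Because the copy number is inversely proportional to the assumed length, an error in the size estimate propagates directly into the molar concentration, whereas a probe-based assay counts amplifiable molecules without this step.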
Limitations of the method should be acknowledged. The library
quantification described here is based on a qPCR assay. Essentially,
for all qPCR-based assays, amplification efficiency drops as the
amplicon length increases. The best quantification method would
be the one that best mimics the efficiency of emPCR. Either poorer
or better efficiency than obtained by emPCR (depending on the
emPCR system) could result in inaccurate estimation of the enrichment percentage after emPCR. This issue possibly pertains to all
qPCR assays, including ours. It seems that our qPCR quantification
method yields lower efficiency than the Roche Titanium emPCR
method (data not shown) and results in ~50% more enriched beads
than expected. However, as a typical bead recovery percentage is
around 65–85%, the additional (~50%) beads more or less compensate for the bead loss during recovery and result in nearly the
amount needed for loading onto the sequencing plate. Because
qPCR amplification efficiency drops rapidly for long amplicons,
and with the anticipated increase in read length in upgraded or
[Figure 2a,b panel labels: amplification primer emPCR A; sequencing primer; library key (GACT); 5′ MID sequence; sample; 3′ MID sequence; MGB probe priming site; amplification primer emPCR B complement. Note in panel b: at and after ∼500 bp, the sequence quality typically drops below acceptable levels and is trimmed away during computational analysis.]
5′ -C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTACAC T*A*C*T*C*G*T-sample-pC*G*A*G*T*A*GTGTGACACGCAACAGGGGATAG ACAAGGCACACAGGG*G*A*T*A*G*G -3′
3′-G*G*A*T*A*G*GGGACACACGGAAC AGATAGGGGACAACGCACAG TGTG*A*T*G*A*G*Cp-sample-T*G*C*T*C*A*T CACATCAGCAGCCTCTGTGCGTCCCTAC*T*C*T*A*C*C-5′
Figure 2 | Design of Y MID adapter. (a,b) Schematic illustrations of the Y MID adapter (a) and the sequencing process on a library molecule (b).
(c) An example of two library molecules generated using one ‘Y3’ adapter. The emPCR primer A is underlined; emPCR primer B complement is shown by
dashed underline. The sequencing primer is highlighted in yellow, the library key sequence is shown with dots underneath, the 5′ MID sequence is shown in
red, the 3′ MID sequence is shown in green and the MGB probe is highlighted in purple.
new platforms in mind, we recommend empirically estimating
the difference in enrichment percentage predicted by the qPCR
method and one observed by titration assay once before applying
the titration-free method routinely. Here we use a qPCR thermocycling program of ~1.5 h, which favors the complete extension
of long amplicons. For applications of sequencing shorter amplicons (e.g., <150 bp), the thermocycling program can be shortened
to <30 min, as in a typical Fast qPCR assay.
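The ~1.5 h figure can be related to the cycling conditions given in Step 25 of the PROCEDURE (95 °C for 2 min, then 35 cycles of 15 s, 60 s and 60 s). The sketch below only sums the programmed hold times; attributing the remaining time to temperature ramping and instrument overhead is an assumption, not a statement from the protocol.

```python
def program_hold_time_s(initial_s: int, cycles: int, steps_s: tuple) -> int:
    """Sum of programmed hold times, excluding temperature ramping."""
    return initial_s + cycles * sum(steps_s)


hold_s = program_hold_time_s(initial_s=120, cycles=35, steps_s=(15, 60, 60))
print(f"{hold_s / 60:.2f} min of programmed holds")  # 80.75 min of programmed holds
```

Most of this time comes from the two 60-s steps per cycle, which is why a program for short amplicons can be much shorter.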
Experimental design
Multiplexing adapters (MID adapters), also referred to as barcodes,
are essential as more and more projects require the pooling of samples. We present here a set of eight Y-barcoded adapters. The Y MID
adapter has a stem of 10 bp, which serves as the barcode. It is important to have a short Y-adapter stem (just long enough to anneal and
form dsDNA adapters at 25 °C for ligation to the sample DNA) to
reduce the inhibitory effect during PCR amplification. The longer
the stem, the stronger the inhibitory effect will be, assuming comparable G/C to A/T ratios. This inhibition is due to the potential
formation of a hairpin from the complementary barcodes at both
ends of an ssDNA molecule (Fig. 2). A 17-bp-long stem greatly
inhibited PCR amplification (annealing temperature 60 °C) and
showed no visible amplicons on agarose gel (data not shown). Each
Y MID adapter has a unique stem and two universal branches and
is created by annealing two oligonucleotides that share ten complementary nucleotides for the stem sequence (Fig. 2).
The barcodes were designed primarily using the barcodes selected
from the 12 Roche 454 Rapid Library (RL) MID sequences. We
designed two new barcode sequences, Ya1 and Ya2, containing the
same number (n = 17) of nucleotide flows during sequencing as
the RL MIDs. Selection and validation of the barcodes were performed in two steps. First, we used OligoAnalyzer (http://www.
idtdna.com) to exclude (n = 4) by initial sequence analysis the
RL MIDs that might form potential secondary structures (because
of the introduction of the TaqMan-MGB probe complementary
site) or that might have substantially different Gibbs free energy
(∆G) in the formation of the stem of Y adapters from the rest of
the MID ∆Gs. Second, experimental validation excluded two other
RL MID barcodes (Y1 and Y12) that gave different proportions
of sequencing yield than expected from the qPCR quantification.
Because the qPCR quantification was applied to individual libraries, it cannot reflect possible interactions among the adapters when
they are pooled in one reaction at a later stage. The remaining six
RL MID barcodes and the two newly designed barcodes (Ya1 and
Ya2) performed well in a panel.
Because sequencing starts from the 3′ end of a library molecule and sequencing errors accumulate as the polymerases extend
toward its 5′ end, we designed the qPCR probe complementary
site at the 5′ end of a library molecule so that it does not waste
the sequencing capacity (Fig. 2b). Currently, the 454 Titanium
sequencer generates read lengths of ~500 bp, whereas the library
molecules are generally longer than 500 bp, indicating that most
of the 5′ end of the library (3′ end of the read) is not sequenced at
acceptable quality and, therefore, trimmed away. This can also be
evidenced by the small difference in the total number of yielded
bases between pre- and post-trimming of our customized adapter
(711 versus 699 Mbp, see ANTICIPATED RESULTS).
For every experiment, it is useful to include a blank (no sample)
library in parallel, starting from the first step until qPCR quantification and agarose gel electrophoresis. Because of background
noise (most likely due to a trace amount of adapter dimers
remaining in the library), the qPCR quantification may yield a
value of hundreds to thousands of molecules in total for the blank
library12. Agarose gel electrophoresis of the blank library qPCR
amplicons would reveal a band corresponding to the size of an
adapter dimer. This background can be ignored as it typically
comprises <1% of a sample library. However, longer amplicons
would indicate that contamination had been introduced in the
upstream steps.
Size selection of DNA fragments is crucial. The Roche 454
Titanium platform is able to sequence, on average, 500 bp, and
the selected DNA fragments should be longer than 500 bp to take
advantage of the long read length. Conventional qPCR methods
recommend using amplicons no longer than 150 bp to achieve good
amplification efficiency. Using regular thermocycling conditions,
the amplification efficiency drops for amplicons longer than 500 bp
and drops profoundly for amplicons longer than 1,000 bp (data
not shown). Fragments longer than 1,000 bp will consequently not
be amplified well when mixed in one microdroplet with a 500-bp
fragment, although their presence might not be a problem provided that the noise light signal from the 1,000-bp amplicons during sequencing is negligible. When a 1,000-bp fragment is amplified
alone in a microdroplet, the number of amplicons on each bead
will be much lower than normal (e.g., 100,000 amplicons of size
1,000 bp on one bead versus 1 million amplicons of size 500 bp on
another bead), despite using the same thermocycling conditions.
Conceivably, postsequencing light normalization is better for a
library with a narrower size range, and we recommend removing DNA fragments longer than 900 bp for the current Roche 454
Titanium emPCR setting13.
For an experiment in which a single sample is to be sequenced,
the nonbarcoded Y adapter5 can be used to save 10 bp for each read
compared with the barcoded adapters. This Y adapter has a different key sequence (TCAG) than the barcoded ones (GACT). The
Roche 454 pipeline, however, supports a simultaneous sequencing
of libraries with different keys on one plate in physically separated
regions; this can be done, for example, by using the nonbarcode
Y adapter for one large-volume (LV) region (one sequencing plate
consists of two LV regions) and eight different barcoded adapters
for the other LV region.
MATERIALS
REAGENTS
CRITICAL For all the reagents and buffers used, we have not noticed any
adverse effect after storage in a freezer or refrigerator (per manufacturer’s
recommendations) for up to 1 year, except that any solutions containing
ethanol need to be freshly prepared.
• DNA sample. Conventional methods (e.g., a variety of Qiagen DNA kits)
from different types of sample can be used for DNA extraction.
• Oligonucleotides for Y adapters. We used 16 HPLC-purified oligo­
nucleotides (Integrated DNA Technologies) to form eight Y adapters.
See Table 1.
Table 1 | Oligonucleotides used to form Y adapters.
Number    Sequence (5′–3′)
Y3
    5′-C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTACACT*A*C*T*C*G*T-3′
    5′-pC*G*A*G*T*AGTGTGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G-3′
Y5
    5′-C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTACGAG*T*A*G*A*C*T-3′
    5′-pG*T*C*T*A*CTCGTGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G-3′
Y8
    5′-C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTACGTA*C*T*G*T*G*T-3′
    5′-pC*A*C*A*G*TACGTGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G-3′
Y9
    5′-C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTACGTA*G*A*T*C*G*T-3′
    5′-pC*G*A*T*C*TACGTGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G-3′
Y10
    5′-C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTACTAC*G*T*C*T*C*T-3′
    5′-pG*A*G*A*C*GTAGTGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G-3′
Y11
    5′-C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTACTAT*A*C*G*A*G*T-3′
    5′-pC*T*C*G*T*ATAGTGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G-3′
Ya1
    5′-C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTCTACT*C*G*T*A*G*T-3′
    5′-pC*T*A*C*G*AGTAGGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G-3′
Ya2
    5′-C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTGTACA*G*T*A*C*G*T-3′
    5′-pC*G*T*A*C*TGTACGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G-3′
Complementary nucleotides that anneal to form the Y-adapter stem and barcode are shown in underlined
text. Asterisks (*) indicate a phosphorothioate-modified bond, p indicates a phosphorylation.
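The stem design can be checked directly from the sequences in Table 1. The sketch below is illustrative and not part of the protocol: it assumes that the stem of the first-listed oligonucleotide is the ten nucleotides immediately before its 3′-terminal T (treating that T as a single-base overhang is an inference from the dA-extension step), and that the stem of the second oligonucleotide is its first ten nucleotides. Only two of the eight adapters are included to keep the example short.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(oligo: str) -> str:
    """Keep only bases; drops phosphorothioate marks (*) and the 5' phosphate (p)."""
    return "".join(ch for ch in oligo if ch in "ACGT")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def stem_is_complementary(first: str, second: str, stem_len: int = 10) -> bool:
    """Check that the two oligonucleotides can anneal over the stem/barcode."""
    first_seq, second_seq = clean(first), clean(second)
    first_stem = first_seq[-(stem_len + 1):-1]  # skip the assumed 3'-terminal T
    return reverse_complement(first_stem) == second_seq[:stem_len]


# Sequences copied from Table 1.
ADAPTERS = {
    "Y3": (
        "C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTACACT*A*C*T*C*G*T",
        "pC*G*A*G*T*AGTGTGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G",
    ),
    "Ya2": (
        "C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTGTACA*G*T*A*C*G*T",
        "pC*G*T*A*C*TGTACGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G",
    ),
}

for name, (first, second) in ADAPTERS.items():
    total = len(clean(first)) + len(clean(second))
    print(name, stem_is_complementary(first, second), total)
```

Both adapters pass the check, and the combined length of the two oligonucleotides (41 + 50 nt) equals the 91 bp quoted for barcoded adapter dimers in the quality control step.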
• Oligos for qPCR. emPCR A 5′-CCATCTCATCCCTGCGTGTC-3′ (various
vendors, salt purification); emPCR B 5′-CCTATCCCCTGTGTGCCTTG-3′
(various vendors, salt purification); MGB probe 6FAM-CTATCCCCTGTTGCGTGTC-MGBNFQ
(Applied Biosystems, HPLC purification)
• qPCR standards. The standards were prepared by cloning of an available
library, PCR amplification of a single colony suspension, purification and
dilutions, as described earlier5. Alternatively, a simple dilution method can
be used12 (see INTRODUCTION).
• UltraPure glycerol (Invitrogen, cat. no. 15514011, http://www.invitrogen.com)
• MinElute PCR purification kit (Qiagen, cat. no. 28004, http://www.qiagen.com)
• T4 DNA ligase (Enzymatics, cat. no. L603-LC-L, http://www.enzymatics.
com)
• Klenow (3′→5′ exo-) (Enzymatics, cat. no. 01-LC-L, http://www.enzymatics.
com)
• Taq DNA polymerase, recombinant (Invitrogen, cat. no. 10342-020, http://
www.invitrogen.com)
• dNTP mix (Enzymatics, cat. no. N205L)
• AMPure XP beads (Agencourt, product no. A63880, http://www.
beckmancoulter.com)
• TaqMan Fast Universal PCR Master Mix (Applied Biosystems,
part no. 4352042, http://www.appliedbiosystems.com)
• End-repair mix (low concentration; Enzymatics, cat. no. Y914-LC-L,
http://www.enzymatics.com)
• Tris-EDTA (TE) buffer (10×, BioUltra Molecular Biology Grade, pH 8.0;
Sigma, cat. no. 93283, http://www.sigmaaldrich.com)
• Buffer PB (Qiagen, cat. no. 19066, http://www.qiagen.com)
• Water, Molecular Biology (Sigma, cat. no. W4502-1L, http://www.
sigmaaldrich.com)
• Ethanol (BioUltra, for molecular biology; Sigma, cat. no. 51976, http://www.
sigmaaldrich.com)
• GelPilot DNA loading dye (5×, Qiagen, cat. no. 239901, http://www.qiagen.
com)
• GelRed nucleic acid gel stain (Biotium, cat. no. 41002, http://www.biotium.
com/)
EQUIPMENT
• 7900HT Fast Real-Time PCR System or equivalent (Applied Biosystems,
part no. 4329001, http://www.appliedbiosystems.com)
• Thermocycler
• Magnetic particle collector (MPC, DynaMag-2 magnet, cat. no. 123-21D,
Invitrogen, http://products.invitrogen.com/ivgn/product/12321D)
• Nitrogen cylinder
• Polyallomer tube (Beckman Coulter, part no. 357448,
http://www.beckmancoulter.com) or other low-binding tube
alternatives
• Nebulizers (Invitrogen, cat. no. K7025-05, http://www.invitrogen.com)
• PCR tubes
• Microcentrifuge tubes
REAGENT SETUP
Adapter annealing To a 200-µl PCR tube, add the following:
TE buffer (1×)    80 µl
Y adapter, top (100 µM)    10 µl
Y adapter, bottom (100 µM)    10 µl
Incubate at 95 °C for 1 min, then cool from 60 °C to 14 °C at −0.1 °C per second and hold at 14 °C. Dilute the annealed adapters tenfold with 1× TE to the working
concentration (1 µM). This can be stored at −20 °C for at least 1 year.
Nebulizing buffer Nebulizing buffer is 10% (vol/vol) glycerol in 1× TE
buffer. This buffer can be stored at 4 °C for at least 1 year.
PROCEDURE
DNA nebulization ● TIMING 1 h for eight samples
1| Add 590 µl of nebulizing buffer to a nebulizer. Add 10 µl of DNA sample. Connect the nebulizer to a nitrogen cylinder
connected to a regulator and apply 30 psi for 1 min.
CRITICAL STEP If larger sample volumes are used, the total volume should be adjusted to 600 µl.
2| Add 2.5 ml of PB buffer and mix by pipetting.
3| Transfer 650 µl of the liquid into a MinElute spin column, centrifuge at 10,000g for 15 s and discard the flow-through.
4| Repeat Step 3 until all of the nebulized sample has been transferred to the spin column.
5| Add 700 µl of 70% (vol/vol) ethanol to the column, centrifuge at 10,000g for 1 min and discard the flow-through.
6| Elute the sample in 25 µl of 1× TE buffer according to the manufacturer’s instructions.
CRITICAL STEP If you start with small amounts of DNA (less than 10 ng), low-binding tubes (e.g., polyallomer tubes)
should be used throughout the library preparation, including for the storage of the library in the freezer.
CRITICAL STEP The DNA nebulization steps (Steps 1–6) can be skipped if the starting sample is of low molecular weight
(such as degraded archived formalin-fixed and paraffin-embedded tissue samples). The volume of AMPure XP beads used in
the fragment size selection (Steps 7–11) should also be adjusted to avoid losing the sample fragments.
Fragment size selection ● TIMING 30 min for one sample, 5 more min for each additional sample
7| To the nebulized and purified DNA fragments, add an appropriate amount of AMPure XP beads (e.g., 11.5 µl of beads
into 25 µl of sample to bind fragments longer than 900 bp; calibration is needed for each batch), and mix by pipetting.
CRITICAL STEP Calibration of the AMPure XP beads should be done according to the manufacturer’s instructions before library
construction. The example given above was based on a calibration result showing that 11.5 µl of beads added to 25 µl of sample
captured fragments longer than 900 bp, and that 14.5 µl of beads in 25 µl of sample captured fragments longer than 500 bp.
8| Transfer to a 1.5-ml tube and incubate at ambient temperature for 5 min.
9| Place the tube on the MPC. After the beads are pelleted (about 1 min), pipette the supernatant, which contains
fragments shorter than 900 bp, into a new tube.
10| Add an appropriate amount of AMPure XP beads (e.g., 3.0 µl) to the tube, mix by pipetting and incubate at ambient
temperature for 5 min.
11| Place the new tube on the MPC. After the beads are pelleted, pipette and discard the supernatant, which contains
fragments shorter than 500 bp. The fragments that remain on the beads are in the size range of 500–900 bp.
CRITICAL STEP Every time before you pipette the AMPure XP beads, you should vortex the bead tube thoroughly to obtain
a homogeneous solution. The size selection is based on the amount of solution containing the AMPure XP beads, not the
amount of beads per se, in relation to the sample volume. Fragments longer than 500 bp will remain on the beads in Step 11
when applying 58% (= 14.5/25) of beads, where 14.5 is the total volume of AMPure XP bead solution (11.5 µl from Step 7
plus 3.0 µl from Step 10) and 25 is the volume of sample from Step 6.
12| Add 500 µl of 70% (vol/vol) ethanol, incubate for 30 s and then pipette and discard the ethanol.
13| Repeat Step 12 once and remove any residual liquid drops at the bottom or on the walls of the tube.
14| Leave the tube open (on the MPC) to dry at ambient temperature for 2 min.
15| Remove the tube from the MPC and add 25 µl of 1× TE buffer (or 14 µl if starting with a small amount of sample).
Pipette to mix the bead pellet.
16| Place the tube back onto the MPC. After the beads are pelleted (about 1 min), collect the aqueous phase,
which contains 500- to 900-bp-long fragments, into a new tube.
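The two bead additions in Steps 7 and 10 follow from the calibrated bead-to-sample ratios. The sketch below is an illustration using the example calibration quoted above (0.46 for fragments longer than 900 bp and 0.58 for fragments longer than 500 bp, both relative to the original sample volume); these ratios are batch specific, and the function name is hypothetical.

```python
def bead_additions(sample_ul: float, upper_ratio: float, lower_ratio: float) -> tuple:
    """Return (first, second) bead volumes in µl for a two-step size selection.

    upper_ratio: bead/sample ratio that captures fragments above the upper cutoff.
    lower_ratio: cumulative bead/sample ratio that captures fragments above the
                 lower cutoff; both ratios refer to the original sample volume.
    """
    if not 0 < upper_ratio < lower_ratio:
        raise ValueError("expected 0 < upper_ratio < lower_ratio")
    first = upper_ratio * sample_ul
    second = lower_ratio * sample_ul - first
    return round(first, 2), round(second, 2)


print(bead_additions(25.0, 11.5 / 25, 14.5 / 25))  # (11.5, 3.0)
```

The same ratios can be applied to another sample volume, but a new bead batch requires its own calibration before the numbers are reused.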
End-polishing, phosphorylation and dA extension ● TIMING 1 h
17| To a 200-µl PCR tube, add the following:
Size-selected DNA sample from Step 16 (add 1× TE up to 14 µl)    14.0 µl
dNTP mix (25 mM)    1.0 µl
SLOW ligation buffer, 10× (component of T4 DNA ligase kit)    2.5 µl
Buffer for Taq polymerase, 10× (Mg2+ free)    2.0 µl
End-repair mix (low concentration)    2.0 µl
Klenow exo-    0.5 µl
Taq polymerase    0.5 µl
Total    22.5 µl
CRITICAL STEP Klenow exo- is optional and can be omitted if input DNA concentration is high (e.g., >10 ng). Note that
Invitrogen ‘Platinum’ Taq should not be used—Taq polymerase becomes active when the temperature rises and is fully active
at 72 °C, when it adds dA to the 3′ end of dsDNA. The Platinum Taq polymerase, however, requires a heat activation step,
which would disassociate the dsDNA and should therefore be avoided.
18| Place on a thermocycler and apply the following program (with a heated lid): 12 °C for 15 min, 37 °C for 15 min and
72 °C for 15 min; finally, hold at 4 °C.
Adapter ligation and purification ● TIMING 1 h or 8 h, depending on starting amount
19| To the reaction tube from Step 18, add the following:
Annealed Y adapter (with or without barcodes, 1 µM)    1.0 µl
Ligase (Enzymatics, low concentration)    2.0 µl
Total    25.0 µl
20| Adapter ligation should be performed using option A for large amounts of starting DNA or option B for small amounts of
starting material:
(A) Large starting amount of DNA
(i) If a large amount is used, such as 100 ng, incubate at 22 °C for 20 min.
(B) Small starting amount of DNA
(i) If less than 10 ng is used, incubate at 12 °C for 8 h or overnight.
CRITICAL STEP The yield of the ligation product increases with ligation time. When starting amounts of samples
are between 10 and 100 ng, ligation time can be adjusted on the basis of the amount of sequence yield needed.
We previously used 1 ng of nebulized fragments and generated sufficient library for 10 Titanium runs5.
21| Purification. To remove free adapters (or potential adapter dimers and residual fragments shorter than 500 bp), add the
same amount of AMPure XP beads as the total of those used in Steps 7 and 10 (in our example, 14.5 µl of beads) to the
25-µl ligation reaction from Step 19. Follow the manufacturer’s instructions and elute the library with 50 µl of 1× TE buffer
into a low-binding tube. However, if DNA samples are highly degraded (such as from old biopsies) with fragments shorter
than 100 bp, the library fragments will be shorter than 200 bp (after being appended by adapters on both sides). In this
case, it might be a better choice to use a sequencing platform other than 454 with a higher throughput.
Library dilution before qPCR (optional) ● TIMING 5 min for each sample
22| If the amount of starting DNA is high (such as 500 ng), it is advisable to dilute the library into 1:10, 1:100 and 1:1,000
dilutions and use them for the qPCR quantification.
CRITICAL STEP The diluted libraries should be stored in low-binding tubes.
PAUSE POINT Before proceeding to the emPCR, the library can be maintained on ice for several hours while waiting for the
qPCR results, or stored at −20 °C for a longer time.
Library quantification by qPCR ● TIMING 2 h
23| Set up the qPCRs, with each sample in triplicate. Total number of reactions = 3 × (5 standards + 1 nontemplate control + number of libraries):
H2O    4.0 µl
Fast Master Mix, 2×    10.0 µl
Primer emPCR A (10 µM)    1.8 µl
Primer emPCR B (10 µM)    1.8 µl
TaqMan-MGB probe (10 µM)    0.4 µl
Total    18.0 µl
CRITICAL STEP The Fast Master Mix already contains the polymerase and dNTPs needed.
Figure 3 | Quality control. qPCR-amplified products analyzed by 1% (wt/vol) agarose gel electrophoresis and visualization under UV light,
as described in Step 26. Size markers at 600 bp and 100 bp are indicated on both gels. (a) Lanes 1–4 represent individual samples that
show no detectable adapter dimer in the libraries, and thus would be
suitable for subsequent amplification and sequencing. ‘p’ and ‘n’ denote
positive (a previously used library) and negative (water) controls for
the qPCR reaction. The expected size of adapter dimers is 79 bp for
the nonbarcoded Y adapters and 91 bp for those with barcodes. The size
range of expected qPCR products is 300–700 bp. (b) Example of poor-quality results: lanes 1, 3, 6 and 7 show the presence of adapter dimers and lane 2
shows a suboptimal library fragment length distribution; thus, these samples should not be used for sequencing.
24| Dispense 18 µl of the mix into each well, and then add 2 µl of sample library (from Step 21 or 22) or standards
(10^3, 10^4, 10^5, 10^6 and 10^7 copies per µl) per well.
25| Run qPCR using cycling conditions as follows: 95 °C for 2 min, followed by 35 cycles of 95 °C for 15 s, 60 °C for 60 s and 68 °C for 60 s.
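Steps 23–25 can be summarized computationally as counting the reactions and reading library concentrations off a standard curve. The sketch below is an illustration under stated assumptions, not the analysis performed by the instrument software: it fits Ct against log10(copies per µl) by ordinary least squares, and the Ct values are hypothetical placeholders rather than measured data. Because standards and libraries are both added as 2 µl per well, the interpolated value reads directly in copies per µl.

```python
import math


def reaction_count(n_libraries: int, n_standards: int = 5, n_ntc: int = 1,
                   replicates: int = 3) -> int:
    """Number of qPCR wells: replicates x (standards + controls + libraries)."""
    return replicates * (n_standards + n_ntc + n_libraries)


def fit_standard_curve(copies_per_ul, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    x = [math.log10(c) for c in copies_per_ul]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(ct_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Interpolate a library concentration (copies per µl) from its mean Ct."""
    return 10 ** ((ct - intercept) / slope)


standards = [1e3, 1e4, 1e5, 1e6, 1e7]   # copies per µl, as in Step 24
mean_ct = [30.2, 26.8, 23.4, 20.0, 16.7]  # hypothetical triplicate means
slope, intercept = fit_standard_curve(standards, mean_ct)
efficiency = 10 ** (-1.0 / slope) - 1.0   # standard slope-to-efficiency relation

print(f"wells needed for 8 libraries: {reaction_count(8)}")
print(f"slope {slope:.2f}, efficiency {efficiency:.0%}")
print(f"library at Ct 24.0: {copies_from_ct(24.0, slope, intercept):.3g} copies per µl")
```

A library whose Ct falls outside the range spanned by the standards is better requantified from one of the dilutions prepared in Step 22 than extrapolated.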
Quality control ● TIMING 1 h
26| Analyze the qPCR-amplified products by 1% (wt/vol) agarose gel electrophoresis under standard conditions. We use 4 µl
of qPCR products from one of the triplicates and mix it with 1 µl of 5× DNA loading dye and load onto a 1% (wt/vol) agarose
gel prestained with GelRed. This gel can be prepared while the qPCR is running. We apply 130 V for 45 min on a gel tray
with a 15-cm distance between electrodes. Ensure that there are no apparent bands at the sizes of adapter dimers (79 bp for
nonbarcoded Y adapters, see ref. 5, and 91 bp for the barcoded adapters). For example, lanes 1–4 in Figure 3a show
no detectable adapter dimer in the libraries, whereas lanes 1, 3, 6 and 7 in Figure 3b show adapter dimers
(these libraries should not be used for the emPCR). Although lane 2 in Figure 3b shows no apparent adapter dimers, it indicates a
suboptimal library fragment length distribution, as seen when starting from high-molecular-weight DNA samples.
? TROUBLESHOOTING
Calculation of the amount of library needed for sequencing ● TIMING 1 min
27| Calculate the amount of library needed for an emPCR that corresponds to a library molecule-to-bead ratio (copies per bead, cpb) of
0.08, either using a preformulated Excel file in Supplementary Data 1 or as follows: if the library concentration according
to the qPCR standard curve analysis is 100,000 molecules per µl, and the amount of emPCR beads to be used for sequencing
this library is for one LV region (i.e., 35 million beads in the current Titanium setting for a half plate; see the ‘emPCR Method
Manual’, ref. 13), then the amount of library needed is 35,000,000 × 0.08 / (100,000 × 2) = 14 µl.
The factor of 2 accounts for the dsDNA-to-ssDNA conversion after denaturation. At a small cpb ratio (such as <0.2),
Poisson statistics approach linearity (ref. 5). Thus, a cpb of 0.08 can be used to approximate an enrichment percentage of 8%.
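The arithmetic in Step 27 can be written as a short Python sketch. The function names are hypothetical and this is not the formula sheet from Supplementary Data 1; the Poisson expression is the standard probability that a bead receives at least one template when templates are distributed at random with mean cpb.

```python
import math

def library_volume_ul(beads, conc_dsdna_per_ul, cpb=0.08, strands_per_molecule=2):
    """Volume of library (µl) that gives the target copies-per-bead ratio.

    beads: number of emPCR beads (35 million for one LV region in the text).
    conc_dsdna_per_ul: qPCR-derived concentration in dsDNA molecules per µl.
    strands_per_molecule: 2, for the dsDNA-to-ssDNA conversion on denaturation.
    """
    return beads * cpb / (conc_dsdna_per_ul * strands_per_molecule)

def expected_enrichment(cpb):
    """Poisson fraction of beads that carry at least one template."""
    return 1.0 - math.exp(-cpb)

# Worked example from the text: one LV region, 100,000 molecules per µl.
volume = library_volume_ul(35_000_000, 100_000)   # 14 µl, as in Step 27
enrichment = expected_enrichment(0.08)            # about 0.077, close to the 8% approximation
```

At cpb = 0.08 the exact Poisson value differs from the linear approximation by only about 0.3 percentage points, which is why the ratio itself can stand in for the expected enrichment.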
Library pooling (optional) ● TIMING 1 min for one additional library
28| Pooling of libraries prepared with barcoded Y adapters can be carried out at this stage, after calculating the relative
contribution of each library. An example of this calculation, which allocates one LV region (a half plate, 50%) to eight libraries
with equal quotas (6.25% each) and another LV region to four libraries with different quotas (2.5, 2.5, 22.5 and 22.5%), is available
in the Excel file in Supplementary Data 1. Note that the total volume of the pooled libraries should ideally be 1–10 µl
for each SV (small volume) region and 10–100 µl for each LV region (one sequencing plate consists of two LV regions).
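One plausible way to carry out the pooling calculation is sketched below in Python. It assumes that each library's quota translates linearly into its share of the template molecules loaded into the region; the Excel file in Supplementary Data 1 may organize the calculation differently. The function name, library names and concentrations are hypothetical.

```python
def pooling_volumes_ul(libraries, beads_per_region=35_000_000, cpb=0.08,
                       strands_per_molecule=2):
    """Volume of each library (µl) to pool for one region.

    libraries maps a library name to (dsDNA molecules per µl, quota).
    Quotas are normalized within the region, so they may be given as
    percentages of the whole plate (e.g. 6.25) or of the region.
    """
    total_quota = sum(quota for _, quota in libraries.values())
    if total_quota <= 0:
        raise ValueError("quotas must sum to a positive number")
    templates_needed = beads_per_region * cpb  # ssDNA templates for the region
    volumes = {}
    for name, (conc_per_ul, quota) in libraries.items():
        share = quota / total_quota
        volumes[name] = templates_needed * share / (conc_per_ul * strands_per_molecule)
    return volumes

# Eight libraries with equal quota; the concentrations are illustrative only.
equal_pool = pooling_volumes_ul({f"lib{i}": (100_000, 6.25) for i in range(1, 9)})

# Four libraries with the unequal quotas mentioned in the text.
uneven_pool = pooling_volumes_ul({
    "libA": (100_000, 2.5),
    "libB": (100_000, 2.5),
    "libC": (100_000, 22.5),
    "libD": (100_000, 22.5),
})
total_equal_ul = sum(equal_pool.values())
```

Summing the returned volumes gives the pooled volume, which can then be checked against the 10–100 µl range recommended for an LV region.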
emPCR amplification of the library, bead recovery and enrichment ● TIMING ~13 h
29| Perform emPCR of the library as described in ref. 13.
CRITICAL STEP After heat denaturation of the dsDNA library into ssDNA, keep the library at 4 °C if it is not used immediately
(avoid prolonged exposure to room temperature (~22 °C)). We do not recommend stopping the heat denaturation step
by promptly cooling on ice.
CRITICAL STEP After adding the library to the washed DNA capture beads, immediately vortex vigorously for 5 s.
CRITICAL STEP Different levels of bead loss during the breaking of emulsion after emPCR can affect the
accuracy of the calculated enrichment percentage; to avoid this, calculate enrichment according to ref. 13. Before enrichment,
measure the amount of beads, denoted as b. After the collection of enriched beads (see ref. 13), measure the amount of
beads, denoted as a. The percent enrichment = (a / b) × 100.
CRITICAL STEP See ref. 13 for details on optimal spin speeds, wash steps and appropriate conditions for removing as
much supernatant (which contains free enrichment or sequencing primers) as possible.
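The enrichment calculation above, and how it relates back to the target ratio of Step 27, can be sketched in Python as follows. The function names and bead counts are hypothetical. The back-calculation of cpb is an idealization: it assumes Poisson loading and that every template-carrying bead, and no empty bead, is recovered by enrichment.

```python
import math

def percent_enrichment(beads_before, beads_after):
    """Percent enrichment = (a / b) × 100, with b measured before and a after enrichment."""
    if beads_before <= 0:
        raise ValueError("beads_before must be positive")
    return beads_after / beads_before * 100.0

def implied_cpb(percent):
    """Copies-per-bead ratio that would give this enrichment under Poisson loading."""
    return -math.log(1.0 - percent / 100.0)

# Illustrative bead counts only, not measured values.
observed = percent_enrichment(beads_before=1_000_000, beads_after=80_000)
cpb_estimate = implied_cpb(observed)
```

Comparing `cpb_estimate` with the intended 0.08 gives a quick check on whether the qPCR-based quantification and the bead counts agree.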
Library sequencing using the GS FLX Titanium sequencing method ● TIMING 5 h plus 8 h machine time
30| Perform sequencing of the library as described in ref. 14. Choose run processing type ‘Full Processing for Shotgun or Paired End’
at the appropriate point.
Post-sequencing data analysis ● TIMING 2 h (on an 8-core computer with 16 GB of RAM)
31| Y-adapter trimming. Starting from version v2.3, the GS software no longer recognizes the MGB-probe sequence
(which was part of the earlier B adapter) and therefore a customized pipeline is needed. An example of the template file
TrimYadapter.xml is available in Supplementary Data 2 or can be created using the gsRunProcessor command to generate a
filter template, e.g.:
gsRunProcessor --template=filteronly > TrimYadapter.xml
Within the ‘qualityFilter’ section of the template file, add:
<primer>GTGACACGCAACAGGGGATAGACAAGGCACACAGGGGATAGG</primer>
Run the runAnalysisFilter command using the new filter template and the ‘signalProcessing’ folder, e.g.:
runAnalysisFilter --pipe=TrimYadapter.xml D_2010...signalProcessing
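Adding the primer line to the template can also be scripted instead of edited by hand. The Python sketch below assumes only that the template is well-formed XML containing an element named qualityFilter; the surrounding element names in the example string are a hypothetical stand-in, not the real layout of a GS filter template, and the function name is illustrative.

```python
import xml.etree.ElementTree as ET

# MGB-probe/Y-adapter sequence given in Step 31.
Y_PRIMER = "GTGACACGCAACAGGGGATAGACAAGGCACACAGGGGATAGG"

def add_primer(template_xml, primer=Y_PRIMER):
    """Return the template text with a <primer> element appended to <qualityFilter>."""
    root = ET.fromstring(template_xml)
    section = next(root.iter("qualityFilter"), None)
    if section is None:
        raise ValueError("no <qualityFilter> element found in template")
    node = ET.SubElement(section, "primer")
    node.text = primer
    return ET.tostring(root, encoding="unicode")

# Hypothetical minimal template used only to demonstrate the function.
example_template = (
    "<pipeline><qualityFilter><someSetting>1</someSetting>"
    "</qualityFilter></pipeline>"
)
patched = add_primer(example_template)
```

ElementTree discards comments and the XML declaration when it re-serializes, so the output should be compared with the original template before it is passed to runAnalysisFilter.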
32| Split mixture of barcoded samples. To split the sff file that contains pooled sample sequences into individual ones, run
the sfffile command, e.g.:
sfffile -s Y -mcf Yscheme.txt -o region1 NameOfYourSFFfile1.sff > MIDyieldR1.txt
sfffile -s Y -mcf Yscheme.txt -o region2 NameOfYourSFFfile2.sff > MIDyieldR2.txt
The Yscheme.txt can be created using any available text editor; alternatively, it is available in Supplementary Data 3.
CRITICAL STEP To identify individual samples, the proprietary Roche ‘sfffile’ command most likely uses the 5′ MID
sequences, because they have higher sequencing quality, whereas the 3′ MID sequences lie at the far end of library molecules and are often not readable (and, if they are read, they will be trimmed away).
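The idea of splitting pooled reads by their 5′ barcode can be illustrated with a few lines of Python. This is a conceptual sketch only: it is not how the proprietary sfffile command works internally, it operates on plain sequence strings rather than sff files, it requires an exact match, and the barcode sequences and sample names are made up for the example.

```python
def assign_sample(read, barcodes):
    """Return (sample, trimmed read) if a barcode matches the 5' end, else (None, read)."""
    for sample, barcode in barcodes.items():
        if read.startswith(barcode):
            return sample, read[len(barcode):]
    return None, read

def split_reads(reads, barcodes):
    """Group reads by sample; reads without a matching 5' barcode go to 'unassigned'."""
    bins = {sample: [] for sample in barcodes}
    bins["unassigned"] = []
    for read in reads:
        sample, trimmed = assign_sample(read, barcodes)
        bins[sample if sample is not None else "unassigned"].append(trimmed)
    return bins

# Hypothetical barcodes and reads, not the Y-adapter MID sequences.
example_barcodes = {"sampleA": "ACGTACGTAC", "sampleB": "TGCATGCATG"}
example_reads = ["ACGTACGTACGGGTTT", "TGCATGCATGAAAC", "CCCCCCCC"]
example_bins = split_reads(example_reads, example_barcodes)
```

Real demultiplexing tools usually tolerate a small number of sequencing errors in the barcode; that refinement is omitted here to keep the sketch short.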
? TROUBLESHOOTING
Troubleshooting advice can be found in Table 2.
Table 2 | Troubleshooting table.
Step | Problem | Possible reason | Solution
26 | A visible band of ~79 or 91 bp | There remains a substantial proportion of adapter dimer, possibly due to an incorrect amount of AMPure XP beads used for size selection | Purify the prepared library again using an appropriate amount (% vol/vol) of AMPure XP beads on the basis of the calibration results
26 | The majority of the amplicons are 100–300 bp | An incorrect amount of AMPure XP beads was used for size selection | Purify the prepared library again using an appropriate amount (% vol/vol) of AMPure XP beads on the basis of the calibration results
● TIMING
For preparing eight libraries:
Steps 1–6, DNA nebulization: 1 h
Steps 7–16, Fragment size selection: 1 h
Steps 17 and 18, End polishing, phosphorylation and dA extension: 1 h
Steps 19–21, Adapter ligation and purification: 1 h or overnight
Steps 22–25, Library quantification by qPCR: 2.5 h
Step 26, Library quality control by gel electrophoresis: 1 h
Steps 27 and 28, Library pooling: 10 min
Step 29, Emulsion PCR amplification of the library, bead recovery and enrichment: 8 h for amplification plus 5 h for bead
recovery and enrichment
Step 30, Library sequencing: 5 h hands-on time plus 8 h machine time
Step 31, Y adapter trimming: 2 h on an 8-core computer with 16 GB of RAM
Step 32, Split mixture of barcoded samples: 1 min
Figure 4 | Anticipated results. The expected and observed sequencing yields using multiplex identifier adapters (MID adapters) from our initial two 454 Titanium runs. (a) Run 1 included six ‘Hp’ samples, which were sequenced in Region 1 (‘region’ is also called ‘lane’ in other systems), and two formalin-fixed old biopsies with two metagenomic samples, which were sequenced in Region 2. Run 1 also served as an experimental validation of the barcodes. (b) Run 2 included more Hp samples, and tested two newly designed adapters and one Rapid Library barcode (Y8) that was not previously used. Expected percentage is the quota allocated for each sample according to the needs of different projects (see Supplementary Data 1) and observed percentage is the actual proportion of reads sequenced from each sample. Because the Y1 and Y12 adapters generated substantially less yield than expected (30% and 50%, respectively), they were not recommended for further use.
[Panel a: Run 1 total yield, 379 million bases. Region 1: median read length = 395 bp, total 613,714 reads. Region 2: median read length = 299 bp, total 526,044 reads. The pipeline was adjusted for short reads for the degraded clinical samples. Panel b: Run 2 total yield, 711 million bases. Region 1: median read length = 469 bp, total 851,679 reads. Region 2: median read length = 445 bp, total 856,957 reads. In both panels the y axis is the proportion of yield within region (%), plotted as expected and observed values, and the x axis lists MID and sample.]
ANTICIPATED RESULTS
Figure 4 shows the expected and observed sequencing yield using multiplex identifier adapters (MID adapters) from our initial two Titanium runs. For 454 sequencing, every run can be divided into 2, 4, 8 or 16 physically separated regions (the counterpart term of ‘Region’ in Illumina sequencing is ‘Lane’). Run 1 included degraded clinical samples in Region 2 (shorter reads) and gave a total yield of 379 Mb, at the lower end of Roche’s specification (360–560 Mb). Run 2 contained chromosomal DNA samples, and yielded 711 Mb (699 Mb after trimming of our customized adapter), with an even distribution of each sample, demonstrating the strength of this method.
Note that the adapter Y1 did not perform as well as others in Run 1 and the prepared library using this adapter was intentionally (for the purpose of our study project) used in Run 2 to balance the yield of the sample ‘Hp1’. Adapters Y1 and Y12 did not pass the experimental validation and are therefore not included in the set of eight recommended adapters.
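The comparison of expected and observed yield per adapter can be expressed as a simple ratio, as in the Python sketch below. The percentages, adapter names and the cut-off value are hypothetical and chosen only to illustrate the calculation; the source does not state a numerical threshold for rejecting an adapter.

```python
def yield_ratios(expected_pct, observed_pct):
    """Observed share of reads divided by the allocated quota, per adapter."""
    return {mid: observed_pct[mid] / expected_pct[mid] for mid in expected_pct}

def underperforming(expected_pct, observed_pct, min_ratio=0.6):
    """Adapters whose observed/expected ratio falls below min_ratio (an assumed cut-off)."""
    ratios = yield_ratios(expected_pct, observed_pct)
    return sorted(mid for mid, ratio in ratios.items() if ratio < min_ratio)

# Made-up percentages within one region, not values read from Figure 4.
example_expected = {"adapterA": 25.0, "adapterB": 25.0, "adapterC": 50.0}
example_observed = {"adapterA": 7.5, "adapterB": 30.0, "adapterC": 62.5}

example_ratios = yield_ratios(example_expected, example_observed)
example_flagged = underperforming(example_expected, example_observed)
```

A ratio near 1 means the library received about its allocated share of reads; a ratio such as 0.3 corresponds to an adapter delivering 30% of its expected yield.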
Acknowledgments This study was supported by the Sixth Research Framework
Programme of the European Union, a project on infections and cancers (INCA,
LSHC-CT-2005-018704); the Karolinska Institutet Faculty funds for partial
financing of new doctoral students (KID scholarship) to Z.Z.; and the Swedish
Research Council Formas grant (FORMAS) to A.F.A.
AUTHOR CONTRIBUTIONS Z.Z. designed the Y adapters and qPCR assay,
proposed Poisson statistics, conducted the experiments and drafted the
manuscript; A.A., Ö.M., S.G. and H.N. assisted with the experiments and
provided critical reviews; W.Y., L.E. and A.A. obtained funding and provided
critical reviews; A.F.A. supervised the study.
COMPETING FINANCIAL INTERESTS The authors declare no competing financial
interests.
Note: Supplementary information is available via the HTML version of this article.
Published online at http://www.natureprotocols.com/.
Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.
1. Roche Diagnostics. Rapid Library Preparation Method Manual: GS FLX
Titanium Series (Roche Applied Science, October 2009 (Rev. January 2010)).
2. Frias-Lopez, J. et al. Microbial community gene expression in ocean
surface waters. Proc. Natl. Acad. Sci. USA 105, 3805–3810 (2008).
3. Aird, D. et al. Analyzing and minimizing PCR amplification bias in
Illumina sequencing libraries. Genome Biol. 12, R18 (2011).
4. Davey, J.L. & Blaxter, M.W. RADSeq: next-generation population genetics.
Brief Funct. Genomics 9, 416–423 (2010).
5. Zheng, Z. et al. Titration-free massively parallel pyrosequencing
using trace amounts of starting material. Nucleic Acids Res. 38, e137
(2010).
6. Margulies, M. et al. Genome sequencing in microfabricated high-density
picolitre reactors. Nature 437, 376–380 (2005).
7. Bentley, D.R. et al. Accurate whole human genome sequencing using
reversible terminator chemistry. Nature 456, 53–59 (2008).
8. Shendure, J. et al. Accurate multiplex polony sequencing of an evolved
bacterial genome. Science 309, 1728–1732 (2005).
9. Baird, N.A. et al. Rapid SNP discovery and genetic mapping using
sequenced RAD markers. PLoS ONE 3, e3376 (2008).
10. Carson, S. et al. DNA sequencing by capillary electrophoresis: use of a
two-laser-two-window intensified diode array detection system.
Anal. Chem. 65, 3219–3226 (1993).
11. White, R.A. III, Blainey, P.C., Fan, H.C. & Quake, S.R. Digital PCR provides
sensitive and absolute calibration for high throughput sequencing.
BMC Genomics 10, 116 (2009).
12. Huang, J., Zheng, Z., Andersson, A.F., Engstrand, L. & Ye, W. Rapid
screening of complex DNA samples by single-molecule amplification and
sequencing. PLoS ONE 6, e19723 (2011).
13. Roche Diagnostics. emPCR Method Manual: Lib-L LV GS FLX
Titanium Series (Roche Applied Science, October 2009 (Rev. January
2010)).
14. Roche Diagnostics. Sequencing Method Manual: GS FLX Titanium Series
(Roche Applied Science, October 2009 (Rev. November 2010)).