PROTOCOL

Titration-free 454 sequencing using Y adapters

Zongli Zheng1,2, Abdolreza Advani3, Öjar Melefors2,3, Steve Glavas3, Henrik Nordström2,3, Weimin Ye1, Lars Engstrand2,3 & Anders F Andersson4

1Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden. 2Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden. 3Swedish Institute for Infectious Disease Control, Solna, Sweden. 4Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden. Correspondence should be addressed to Z.Z.

© 2011 Nature America, Inc. All rights reserved.

Published online 18 August 2011; doi:10.1038/nprot.2011.369

We describe a protocol for construction and quantification of libraries for emulsion PCR (emPCR)-based sequencing platforms such as Roche 454 or Ion Torrent PGM. The protocol involves library construction using customized Y adapters, quantification using TaqMan-MGB (minor groove binder) probe–based quantitative PCR (qPCR) and calculation of an optimal template-to-bead ratio based on Poisson statistics, thereby avoiding the need for a laborious titration assay. Unlike other qPCR methods, the TaqMan-MGB probe specifically quantifies effective libraries in molar concentration and does not require specialized equipment. A single quality control step prior to emulsion PCR ensures that libraries contain no adapter dimers and have an optimal length distribution. The presented protocol takes ~7 h to convert genomic DNA into eight barcoded libraries that are ready to use for full-scale emPCR. It will be useful, for example, to allow analyses of precious clinical samples and amplification-free metatranscriptomics.

INTRODUCTION
Modern DNA sequencing technology has improved markedly in recent years. However, in many current technologies, sample library preparation before sequencing has surfaced as a key limiting factor.
For instance, the current Roche 454 sequencing protocol for preparation of a shotgun library1 requires 500 ng of DNA as starting material and includes a laborious titration assay. A faster library preparation protocol that can handle lower starting amounts would be desirable and particularly useful for, e.g., sequence analyses of precious clinical samples, cDNA sequencing of environmental samples for metatranscriptomics2 without the need for amplification that may introduce biases3, or microsatellite sequencing in population genetics4. The protocol presented here is based on our previous study that described a novel method for sequencing low-starting-amount materials5.

Library preparation in most high-throughput sequencing technologies involves the ligation of universal adapter(s) to the ends of DNA sample fragments to enable PCR amplification6–8. Unlike linear adapters such as adapters A and B (used for 454 sequencing), a single Y adapter, proposed previously9 and used for Illumina sequencing, has several advantages. Given a 100% ligation efficiency, four double-stranded DNA (dsDNA) molecules would, on average, generate two properly appended dsDNA libraries using adapters A and B, whereas eight single-stranded DNA (ssDNA) libraries would be generated using a single Y adapter (Fig. 1). In addition, the Y adapters can only be ligated at the double-stranded stem end, which enables a simultaneous incubation with all enzymes involved and thus eliminates the need for laborious and yield-reducing cleanup steps.

One of the main differences in current high-throughput DNA sequencing technology as compared with traditional Sanger sequencing is that sample template concentration is kept very low to avoid tedious microbial subcloning. Emulsion PCR–based sequencing uses many millions of water-in-oil droplets, each of which serves as a separated amplification compartment6.
Sample library concentration is kept so low that the majority of the droplets contain no library, a small proportion contain single-molecule libraries and an even smaller proportion contain mixed-molecule libraries, in a stochastic manner that follows a Poisson distribution, as shown by our sequencing data5. In addition, the enrichment step will select only those beads that have a library, but a too-low DNA-to-bead ratio will lead to an insufficient number of beads for sequencing, whereas a too-high ratio will lead to frequent occurrence of mixed-library beads. Thus, one of the key factors for a successful experiment is to use an optimal amount of library for sequencing. It is important to keep in mind that the amount of library added is not linearly associated with the number of high-quality beads5. We recommend an input DNA-to-bead ratio of 0.08, which will result in 96% of the enriched beads having a single-copy template according to the Poisson distribution, and will be sufficient for sequencing5. A higher proportion of ‘nonpure’ (mixed-copy) beads associated with a higher DNA-to-bead ratio might, in addition, affect the consumption of nucleotide flows during sequencing and bioinformatics processes, such as image background and signal intensity normalizations. Apart from the predicted increase of mixed-library beads, a higher ratio of input DNA to beads is also less tolerant of subtle pipetting errors5.

Two quantitative PCR (qPCR) assays have previously been proposed to quantify libraries derived from trace amounts of starting material10,11. Besides requiring less library, as compared with UV spectrophotometry and fluorometry methods, qPCR assays also have the advantage of measuring the amount of effective library, as the total library typically contains a mixture of molecules that are amplifiable, amplifiable but inefficient, or nonamplifiable for various reasons5.
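The Poisson reasoning behind the recommended DNA-to-bead ratio of 0.08 can be checked numerically. The following is a minimal Python sketch, not part of the published protocol. It assumes that the number of library molecules per bead is Poisson distributed with a mean equal to the input DNA-to-bead ratio, and that every bead carrying at least one molecule is recovered by enrichment. The function names are hypothetical.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a bead receives exactly k library molecules."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def bead_fractions(dna_to_bead_ratio: float) -> dict:
    """Split beads into empty, single-copy and mixed-copy fractions.

    Assumption (illustration only): molecules per bead follow a Poisson
    distribution whose mean is the input DNA-to-bead ratio, and every
    bead with at least one molecule is enriched.
    """
    empty = poisson_pmf(0, dna_to_bead_ratio)
    single = poisson_pmf(1, dna_to_bead_ratio)
    enriched = 1.0 - empty        # beads with one or more molecules
    mixed = enriched - single     # beads with two or more molecules
    return {
        "empty": empty,
        "single": single,
        "mixed": mixed,
        "enriched": enriched,
        "single_among_enriched": single / enriched,
    }


if __name__ == "__main__":
    for ratio in (0.08, 0.2, 0.5, 1.0):
        f = bead_fractions(ratio)
        print(f"ratio {ratio:4.2f}: enriched {f['enriched']:.1%}, "
              f"single-copy among enriched {f['single_among_enriched']:.1%}")
```

Under these assumptions, a ratio of 0.08 gives roughly 7.7% enriched beads, of which about 96% carry a single template, in line with the figure quoted above. Raising the ratio increases the number of enriched beads but lowers the single-copy share among them.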
The previous two methods are based on SYBR Green dye10 qPCR and universal template TaqMan probe digital PCR11, respectively. With the SYBR Green–based qPCR assay, there is no need to design and use the relatively expensive TaqMan probe. However, it measures the total mass of the library and requires transformation into copy numbers on the basis of amplicon size estimation by gel electrophoresis or Agilent Bioanalyzer. Furthermore, the precision (coefficient of variation (CV) of the estimates) of this assay has not been assessed. In contrast, TaqMan-based assays have the advantage of measuring the number of amplifiable molecules directly. The universal template TaqMan probe digital PCR is based on an 8-bp dual-labeled locked nucleic acid probe, complementary to the 5′-end tail of the customized amplification primer11.

nature protocols | VOL.6 NO.9 | 2011 | 1367

Figure 1 | A schematic illustration of the construction of two types of libraries, A-B and Y. The A-B library construction method generates on average two effective double-stranded molecules (each appended with adapters A (blue) and B (green)) from four dsDNA molecules, given 100% ligation efficiency. The molecules appended with A-A and B-B are nonamplifiable because of the amplification-inhibiting hairpins formed between the complementary adapter sequences after denaturing and annealing steps. The Roche A-B library uses a biotinylated B adapter and two additional steps to generate two effective ssDNA molecules (see ref. 5). In contrast, the Y library construction method generates eight effective single-stranded molecules. The MGB probe is in red and the barcode in yellow.
[Figure 1 panel labels: A-B library; Y library; end polishing, 5′ phosphorylation; end polishing, 5′ phosphorylation, adenylation and ligation of Y adapter, without cleanup in between; ligation of adapters A and B.]
First, the digital PCR assay requires special equipment that is not widely accessible, such as Fluidigm’s BioMark microfluidic device. Second, two rounds of quantification are used: an initial crude quantification by qPCR to guide the dilution of the libraries for a more precise quantification by digital PCR, which renders a precision (CV 11.8%) higher than the initial qPCR alone (CV 21.2%)11. Our MGB probe–based assay5 does not require special equipment and, therefore, is more accessible to ordinary laboratories. This MGB probe assay is at least as precise as digital PCR (CV 9.5% versus 11.8%). The MGB probe is a 20-bp-long probe complementary to the library molecule and is located next to the 3′ end of one of the amplification primers (Fig. 2). Having a probe-targeting site between the amplification primers has the advantage that the amount of fluorescence signal is proportional to the number of library amplicons and not to the potential amplification primer dimers, which might be a source of less precise quantification with a probe that targets part of an amplification primer. Further, a longer probe, as compared with a shorter one (8 bp), is more specific.

[Figure 1 panel labels: nick repair; two effective molecules; eight effective molecules; one of the potential forms, if no cleanups in between. Figure 2a panel labels: amplification primer emPCR A; sequencing primer complement.]

Limitations of the method should be acknowledged. The library quantification described here is based on a qPCR assay. Essentially, for all qPCR-based assays, amplification efficiency drops as the amplicon length increases. The best quantification method would be the one that best mimics the efficiency of emPCR. Either poorer or better efficiency than obtained by emPCR (depending on the emPCR system) could result in inaccurate estimation of the enrichment percentage after emPCR. This issue possibly pertains to all qPCR assays, including ours.
It seems that our qPCR quantification method yields lower efficiency than the Roche Titanium emPCR method (data not shown) and results in ~50% more enriched beads than expected. However, as a typical bead recovery percentage is around 65–85%, the additional (~50%) beads more or less compensate for the bead loss during recovery and result in nearly the amount needed for loading onto the sequencing plate. Because qPCR amplification efficiency drops rapidly for long amplicons, and with the anticipated increase in read length in upgraded or new platforms in mind, we recommend empirically estimating, once, the difference between the enrichment percentage predicted by the qPCR method and that observed by a titration assay before applying the titration-free method routinely. Here we use a qPCR thermocycling program of ~1.5 h, which favors the complete extension of long amplicons. For applications of sequencing shorter amplicons (e.g., <150 bp), the thermocycling program can be shortened to <30 min, as in a typical Fast qPCR assay.

Figure 2 | Design of Y MID adapter. (a,b) Schematic illustrations of the Y MID adapter (a) and the sequencing process on a library molecule (b). (c) An example of two library molecules generated using one ‘Y3’ adapter. The emPCR primer A is underlined; emPCR primer B complement is shown by dashed underline. The sequencing primer is highlighted in yellow, the library key sequence is shown with dots underneath, the 5′ MID sequence is shown in red, the 3′ MID sequence is shown in green and the MGB probe is highlighted in purple.
[Figure 2b panel labels: sequencing starts; library key GACT; sequencing primer; MID sequence; T; MGB probe priming site; MGB probe; sample; 5′ MID; 3′ MID; amplification primer emPCR B complement. Panel note: at and after ~500 bp, the sequence quality typically drops below acceptable levels and is trimmed away during computational analysis.]
[Figure 2c sequences:]
5′-C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTACACT*A*C*T*C*G*T-sample-pC*G*A*G*T*A*GTGTGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G-3′
3′-G*G*A*T*A*G*GGGACACACGGAACAGATAGGGGACAACGCACAGTGTG*A*T*G*A*G*Cp-sample-T*G*C*T*C*A*TCACATCAGCAGCCTCTGTGCGTCCCTAC*T*C*T*A*C*C-5′

Experimental design
Multiplexing adapters (MID adapters), also referred to as barcodes, are essential as more and more projects require the pooling of samples. We present here a set of eight Y-barcoded adapters. The Y MID adapter has a stem of 10 bp, which serves as the barcode. It is important to have a short Y-adapter stem (just long enough to anneal and form dsDNA adapters at 25 °C for ligation to the sample DNA) to reduce the inhibitory effect during PCR amplification. The longer the stem, the stronger the inhibitory effect will be, assuming comparable G/C to A/T ratios. This inhibition is due to the potential formation of a hairpin from the complementary barcodes at both ends of an ssDNA molecule (Fig. 2). A 17-bp-long stem greatly inhibited PCR amplification (annealing temperature 60 °C) and showed no visible amplicons on agarose gel (data not shown).

Each Y MID adapter has a unique stem and two universal branches and is created by annealing two oligonucleotides that share ten complementary nucleotides for the stem sequence (Fig. 2). The barcodes were selected primarily from the 12 Roche 454 Rapid Library (RL) MID sequences. We also designed two new barcode sequences, Ya1 and Ya2, containing the same number (n = 17) of nucleotide flows during sequencing as the RL MIDs. Selection and validation of the barcodes were performed in two steps. First, we used OligoAnalyzer (http://www.idtdna.com) to exclude, by initial sequence analysis, four RL MIDs that might form potential secondary structures (because of the introduction of the TaqMan-MGB probe complementary site) or whose Gibbs free energy (∆G) for formation of the Y-adapter stem might differ substantially from that of the other MIDs. Second, experimental validation excluded two other RL MID barcodes (Y1 and Y12) that gave different proportions of sequencing yield than expected from the qPCR quantification. Because the qPCR quantification was applied to individual libraries, it cannot reflect possible interactions among the adapters when they are pooled in one reaction at a later stage. The remaining six RL MID barcodes and the two newly designed barcodes (Ya1 and Ya2) performed well in a panel.

Because sequencing starts from the 3′ end of a library molecule and sequencing errors accumulate as the polymerases extend toward its 5′ end, we designed the qPCR probe complementary site at the 5′ end of a library molecule so that it does not waste the sequencing capacity (Fig. 2b). Currently, the 454 Titanium sequencer generates read lengths of ~500 bp, whereas the library molecules are generally longer than 500 bp, indicating that most of the 5′ end of the library (3′ end of the read) is not sequenced at acceptable quality and is, therefore, trimmed away. This is also evidenced by the small difference in the total number of yielded bases between pre- and post-trimming of our customized adapter (711 versus 699 Mbp, see ANTICIPATED RESULTS).

For every experiment, it is useful to include a blank (no sample) library in parallel, starting from the first step until qPCR quantification and agarose gel electrophoresis. Because of background noise (most likely due to a trace amount of adapter dimers remaining in the library), the qPCR quantification may yield a value of hundreds to thousands of molecules in total for the blank library12.
Agarose gel electrophoresis of the blank library qPCR amplicons would reveal a band corresponding to the size of an adapter dimer. This background can be ignored as it typically comprises <1% of a sample library. However, longer amplicons would indicate that contamination had been introduced in the upstream steps.

Size selection of DNA fragments is crucial. The Roche 454 Titanium platform is able to sequence, on average, 500 bp, and the selected DNA fragments should be longer than 500 bp to take advantage of the long read length. Conventional qPCR methods recommend using amplicons no longer than 150 bp to achieve good amplification efficiency. Using regular thermocycling conditions, the amplification efficiency drops for amplicons longer than 500 bp and drops profoundly for amplicons longer than 1,000 bp (data not shown). Fragments longer than 1,000 bp will consequently not be amplified well when mixed in one microdroplet with a 500-bp fragment, although their presence might not be a problem provided that the light-signal noise from the 1,000-bp amplicons during sequencing is negligible. When a 1,000-bp fragment is amplified alone in a microdroplet, the number of amplicons on each bead will be much lower than normal (e.g., 100,000 amplicons of size 1,000 bp on one bead versus 1 million amplicons of size 500 bp on another bead), despite using the same thermocycling conditions. Conceivably, postsequencing light normalization is better for a library with a narrower size range, and we recommend removing DNA fragments longer than 900 bp for the current Roche 454 Titanium emPCR setting13.

For an experiment in which a single sample is to be sequenced, the nonbarcoded Y adapter5 can be used to save 10 bp for each read compared with the barcoded adapters. This Y adapter has a different key sequence (TCAG) than the barcoded ones (GACT).
The Roche 454 pipeline, however, supports simultaneous sequencing of libraries with different keys on one plate in physically separated regions; this can be done, for example, by using the nonbarcoded Y adapter for one large-volume (LV) region (one sequencing plate consists of two LV regions) and eight different barcoded adapters for the other LV region.

MATERIALS
REAGENTS
CRITICAL For all the reagents and buffers used, we have not noticed any adverse effect after storage in a freezer or refrigerator (per manufacturer’s recommendations) for up to 1 year, except that any solutions containing ethanol need to be freshly prepared.
• DNA sample. Conventional methods (e.g., a variety of Qiagen DNA kits) from different types of sample can be used for DNA extraction.
• Oligonucleotides for Y adapters. We used 16 HPLC-purified oligonucleotides (Integrated DNA Technologies) to form eight Y adapters. See Table 1.
• Oligos for qPCR. emPCR A 5′-CCATCTCATCCCTGCGTGTC-3′ (various vendors, salt purification); emPCR B 5′-CCTATCCCCTGTGTGCCTTG-3′ (various vendors, salt purification); MGB probe 6FAM-CTATCCCCTGTTGCGTGTC-MGBNFQ (Applied Biosystems, HPLC purification)
• qPCR standards. The standards were prepared by cloning of an available library, PCR amplification of a single colony suspension, purification and dilutions, as described earlier5. Alternatively, a simple dilution method can be used12 (see INTRODUCTION).
• UltraPure glycerol (Invitrogen, cat. no. 15514011, http://www.invitrogen.com)
• MinElute PCR purification kit (Qiagen, cat. no. 28004, http://www.qiagen.com)
• T4 DNA ligase (Enzymatics, cat. no. L603-LC-L, http://www.enzymatics.com)
• Klenow (3′→5′ exo-) (Enzymatics, cat. no. 01-LC-L, http://www.enzymatics.com)
• Taq DNA polymerase, recombinant (Invitrogen, cat. no. 10342-020, http://www.invitrogen.com)
• dNTP mix (Enzymatics, cat. no. N205L)
• AMPure XP beads (Agencourt, product no. A63880, http://www.beckmancoulter.com)
• TaqMan Fast Universal PCR Master Mix (Applied Biosystems, part no. 4352042, http://www.appliedbiosystems.com)
• End-repair mix (low concentration; Enzymatics, cat. no. Y914-LC-L, http://www.enzymatics.com)
• Tris-EDTA (TE) buffer (10×, BioUltra Molecular Biology Grade, pH 8.0; Sigma, cat. no. 93283, http://www.sigmaaldrich.com)
• Buffer PB (Qiagen, cat. no. 19066, http://www.qiagen.com)
• Water, Molecular Biology (Sigma, cat. no. W4502-1L, http://www.sigmaaldrich.com)
• Ethanol (BioUltra, for molecular biology; Sigma, cat. no. 51976, http://www.sigmaaldrich.com)
• GelPilot DNA loading dye (5×, Qiagen, cat. no. 239901, http://www.qiagen.com)
• GelRed nucleic acid gel stain (Biotium, cat. no. 41002, http://www.biotium.com/)

Table 1 | Oligonucleotides used to form Y adapters.
Number    Sequence (5′–3′)
Y3        5′-C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTACACT*A*C*T*C*G*T-3′
          5′-pC*G*A*G*T*AGTGTGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G-3′
Y5        5′-C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTACGAG*T*A*G*A*C*T-3′
          5′-pG*T*C*T*A*CTCGTGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G-3′
Y8        5′-C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTACGTA*C*T*G*T*G*T-3′
          5′-pC*A*C*A*G*TACGTGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G-3′
Y9        5′-C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTACGTA*G*A*T*C*G*T-3′
          5′-pC*G*A*T*C*TACGTGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G-3′
Y10       5′-C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTACTAC*G*T*C*T*C*T-3′
          5′-pG*A*G*A*C*GTAGTGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G-3′
Y11       5′-C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTACTAT*A*C*G*A*G*T-3′
          5′-pC*T*C*G*T*ATAGTGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G-3′
Ya1       5′-C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTCTACT*C*G*T*A*G*T-3′
          5′-pC*T*A*C*G*AGTAGGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G-3′
Ya2       5′-C*C*A*T*C*T*CATCCCTGCGTGTCTCCGACGACTGTACA*G*T*A*C*G*T-3′
          5′-pC*G*T*A*C*TGTACGACACGCAACAGGGGATAGACAAGGCACACAGGG*G*A*T*A*G*G-3′
Complementary nucleotides that anneal to form the Y-adapter stem and barcode are shown in underlined text. Asterisks (*) indicate a phosphorothioate-modified bond, p indicates a phosphorylation.

EQUIPMENT
• 7900HT Fast Real-Time PCR System or equivalent (Applied Biosystems, part no. 4329001, http://www.appliedbiosystems.com)
• Thermocycler
• Magnetic particle collector (MPC, DynaMag-2 magnet, cat. no. 123-21D, Invitrogen, http://products.invitrogen.com/ivgn/product/12321D)
• Nitrogen cylinder
• Polyallomer tube (Beckman Coulter, part no. 357448, http://www.beckmancoulter.com) or other low-binding tube alternatives
• Nebulizers (Invitrogen, cat. no. K7025-05, http://www.invitrogen.com)
• PCR tubes
• Microcentrifuge tubes

REAGENT SETUP
Adapter annealing To a 200-µl PCR tube, add the following:
TE buffer (1×)                 80 µl
Y adapter, top (100 µM)        10 µl
Y adapter, bottom (100 µM)     10 µl
Incubate at 95 °C for 1 min, then cool from 60 °C to 14 °C at −0.1 °C per second and hold at 14 °C. Dilute the annealed adapters tenfold with 1× TE to the working concentration (1 µM). This can be stored at −20 °C for at least 1 year.
Nebulizing buffer Nebulizing buffer is 10% (vol/vol) glycerol in 1× TE buffer. This buffer can be stored at 4 °C for at least 1 year.

PROCEDURE
DNA nebulization ● TIMING 1 h for eight samples
1| Add 590 µl of nebulizing buffer to a nebulizer. Add 10 µl of DNA sample.
Connect the nebulizer to a nitrogen cylinder connected to a regulator and apply 30 psi for 1 min.
CRITICAL STEP If larger sample volumes are used, the volume of nebulizing buffer should be adjusted so that the total volume is 600 µl.
2| Add 2.5 ml of PB buffer and mix by pipetting.
3| Transfer 650 µl of the liquid into a MinElute spin column, centrifuge at 10,000g for 15 s and discard the flow-through.
4| Repeat Step 3 until all of the nebulized sample has been transferred to the spin column.
5| Add 700 µl of 70% (vol/vol) ethanol to the column, centrifuge at 10,000g for 1 min and discard the flow-through.
6| Elute the sample in 25 µl of 1× TE buffer according to the manufacturer’s instructions.
CRITICAL STEP If you start with small amounts of DNA (less than 10 ng), low-binding tubes (e.g., polyallomer tubes) should be used throughout the library preparation, including for the storage of the library in the freezer.
CRITICAL STEP The DNA nebulization steps (Steps 1–6) can be skipped if the starting sample is of low molecular weight (such as degraded archived formalin-fixed and paraffin-embedded tissue samples). The volume of AMPure XP beads used in the fragment size selection (Steps 7–11) should also be adjusted to avoid losing the sample fragments.

Fragment size selection ● TIMING 30 min for one sample, 5 more min for each additional sample
7| To the nebulized and purified DNA fragments, add an appropriate amount of AMPure XP beads (e.g., 11.5 µl of beads into 25 µl of sample to bind fragments longer than 900 bp; calibration is needed for each batch), and mix by pipetting.
CRITICAL STEP Calibration of the AMPure XP beads should be done according to the manufacturer’s instructions before library construction.
The example given above was based on a calibration result showing that 11.5 µl of beads added to 25 µl of sample captured fragments longer than 900 bp, and that 14.5 µl of beads in 25 µl of sample captured fragments longer than 500 bp.
8| Transfer to a 1.5-ml tube and incubate at ambient temperature for 5 min.
9| Place the tube on the MPC. After the beads are pelleted (about 1 min), pipette the supernatant, which contains fragments shorter than 900 bp, into a new tube.
10| Add an appropriate amount of AMPure XP beads (e.g., 3.0 µl) to the tube, mix by pipetting and incubate at ambient temperature for 5 min.
11| Place the new tube on the MPC. After the beads are pelleted, pipette and discard the supernatant, which contains fragments shorter than 500 bp. The fragments that remain on the beads are in the size range of 500–900 bp.
CRITICAL STEP Every time before you pipette the AMPure XP beads, you should vortex the bead tube thoroughly to obtain a homogeneous solution. The size selection is based on the volume of solution containing the AMPure XP beads, not the amount of beads per se, in relation to the sample volume. Fragments longer than 500 bp will remain on the beads in Step 11 when applying 58% (= 14.5/25) of beads, where 14.5 is the total volume of AMPure XP bead solution (11.5 µl from Step 7 plus 3.0 µl from Step 10) and 25 is the volume of sample from Step 6.
12| Add 500 µl of 70% (vol/vol) ethanol, incubate for 30 s and then pipette and discard the ethanol.
13| Repeat Step 12 once and remove any residual liquid drops at the bottom or on the walls of the tube.
14| Leave the tube open (on the MPC) to dry at ambient temperature for 2 min.
15| Remove the tube from the MPC and add 25 µl of 1× TE buffer (or 14 µl if starting with a small amount of sample). Pipette to mix the bead pellet.
16| Place the tube back onto the MPC. After the beads are pelleted (about 1 min), collect the aqueous phase, which contains 500- to 900-bp-long fragments, into a new tube.
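The two-step bead additions in Steps 7 and 10 follow from the calibrated bead-to-sample volume ratios. The sketch below is an illustration only, with a hypothetical function name. It assumes, as the text above states, that both ratios are taken relative to the original sample volume, and it further assumes that the calibrated ratios can be scaled linearly to other sample volumes, which should be confirmed for each bead batch.

```python
def ampure_volumes(sample_ul: float,
                   upper_cut_ratio: float,
                   lower_cut_ratio: float) -> tuple:
    """Return (first_addition_ul, second_addition_ul) of AMPure XP bead solution.

    upper_cut_ratio: bead-to-sample volume ratio that binds fragments longer
        than the upper size limit (these beads are discarded, Steps 7-9).
    lower_cut_ratio: total bead-to-sample volume ratio that binds fragments
        longer than the lower size limit (these beads are kept, Steps 10-11).
    Both ratios are batch-specific calibration results.
    """
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    if not 0 < upper_cut_ratio < lower_cut_ratio:
        raise ValueError("expected 0 < upper_cut_ratio < lower_cut_ratio")
    first = upper_cut_ratio * sample_ul
    total = lower_cut_ratio * sample_ul
    return first, total - first


if __name__ == "__main__":
    # Calibration example from the text: 11.5 ul and 14.5 ul per 25 ul sample.
    upper = 11.5 / 25   # binds fragments longer than 900 bp
    lower = 14.5 / 25   # binds fragments longer than 500 bp
    first, second = ampure_volumes(25.0, upper, lower)
    print(f"Step 7: {first:.1f} ul, Step 10: {second:.1f} ul")
```

With the calibration values quoted above, the sketch reproduces the 11.5 µl and 3.0 µl additions for a 25-µl sample.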
End-polishing, phosphorylation and dA extension ● TIMING 1 h
17| To a 200-µl PCR tube, add the following:
Size-selected DNA sample from Step 16 (add 1× TE up to 14 µl)     14.0 µl
dNTP mix (25 mM)                                                  1.0 µl
SLOW ligation buffer, 10× (component of T4 DNA ligase kit)        2.5 µl
Buffer for Taq polymerase, 10× (Mg2+ free)                        2.0 µl
End-repair mix (low concentration)                                2.0 µl
Klenow exo-                                                       0.5 µl
Taq polymerase                                                    0.5 µl
Total                                                             22.5 µl
CRITICAL STEP Klenow exo- is optional and can be omitted if the amount of input DNA is high (e.g., >10 ng). Note that Invitrogen ‘Platinum’ Taq should not be used. Regular Taq polymerase becomes active when the temperature rises and is fully active at 72 °C, when it adds dA to the 3′ end of dsDNA. The Platinum Taq polymerase, however, requires a heat activation step, which would denature the dsDNA and should therefore be avoided.
18| Place on a thermocycler and apply the following program (with a heated lid): 12 °C for 15 min, 37 °C for 15 min and 72 °C for 15 min; finally, hold at 4 °C.

Adapter ligation and purification ● TIMING 1 h or 8 h, depending on starting amount
19| To the reaction tube from Step 18, add the following:
Annealed Y adapter (with or without barcodes, 1 µM)     1.0 µl
Ligase (Enzymatics, low concentration)                  2.0 µl
Total                                                   25.0 µl
20| Adapter ligation should be performed using option A for large amounts of starting DNA or option B for small amounts of starting material:
(A) Large starting amount of DNA
(i) If a large amount is used, such as 100 ng, incubate at 22 °C for 20 min.
(B) Small starting amount of DNA
(i) If less than 10 ng is used, incubate at 12 °C for 8 h or overnight.
CRITICAL STEP The yield of the ligation product increases with ligation time. When starting amounts of samples are between 10 and 100 ng, ligation time can be adjusted on the basis of the amount of sequence yield needed.
We previously used 1 ng of nebulized fragments and generated sufficient library for 10 Titanium runs5.
21| Purification. To remove free adapters (or potential adapter dimers and residual fragments shorter than 500 bp), add the same amount of AMPure XP beads as the total of those used in Steps 7 and 10 (in our example, 14.5 µl of beads) to the 25-µl ligation reaction from Step 19. Follow the manufacturer’s instructions and elute the library with 50 µl of 1× TE buffer into a low-binding tube. However, if DNA samples are highly degraded (such as from old biopsies) with fragments shorter than 100 bp, the library fragments will be shorter than 200 bp (after being appended by adapters on both sides). In this case, it might be a better choice to use a sequencing platform other than 454 with a higher throughput.

Library dilution before qPCR (optional) ● TIMING 5 min for each sample
22| If the amount of starting DNA is high (such as 500 ng), it is advisable to dilute the library into 1:10, 1:100 and 1:1,000 dilutions and use them for the qPCR quantification.
CRITICAL STEP The diluted libraries should be stored in low-binding tubes.
PAUSE POINT Before proceeding to the emPCR, the library can be maintained on ice for several hours while waiting for the qPCR results, or stored at −20 °C for a longer time.

Library quantification by qPCR ● TIMING 2 h
23| Set up the qPCRs, with each sample in triplicate. Total number of reactions = 3 × (5 standards + 1 nontemplate control + number of libraries). For each reaction, mix:
H2O                            4.0 µl
Fast Master Mix, 2×            10.0 µl
Primer emPCR A (10 µM)         1.8 µl
Primer emPCR B (10 µM)         1.8 µl
TaqMan-MGB probe (10 µM)       0.4 µl
Total                          18.0 µl
CRITICAL STEP The Fast Master Mix already contains the polymerase and dNTPs needed.
24| Dispense 18 µl of the mix into each well, and then add 2 µl of sample library (from Step 21 or 22) or standards (10^3, 10^4, 10^5, 10^6 and 10^7 copies per µl) per well.
25| Run qPCR using cycling conditions as follows: 95 °C for 2 min, followed by 35 cycles of 95 °C for 15 s, 60 °C for 60 s and 68 °C for 60 s.

Figure 3 | Quality control. qPCR-amplified products analyzed by 1% (wt/vol) agarose gel electrophoresis and visualization under UV light, as described in Step 26. (a) Lanes 1–4 represent individual samples that show no detectable adapter dimer in the libraries, and thus would be suitable for subsequent amplification and sequencing. ‘p’ and ‘n’ denote positive (a previously used library) and negative (water) controls for the qPCR reaction. The expected size of adapter dimers is 79 bp for the nonbarcoded Y adapters and 91 bp for those with barcodes. The size range of expected qPCR products is 300–700 bp. (b) Example of poor-quality results: lanes 1, 3, 6 and 7 show the presence of adapter dimers and lane 2 shows a suboptimal library fragment length distribution; thus, these samples should not be used for sequencing.
[Figure 3 gel image labels: lanes p, n, 1–4 and n, 1–7; size markers at 600 bp and 100 bp.]

Quality control ● TIMING 1 h
26| Analyze the qPCR-amplified products by 1% (wt/vol) agarose gel electrophoresis under standard conditions. We use 4 µl of qPCR products from one of the triplicates, mix it with 1 µl of 5× DNA loading dye and load it onto a 1% (wt/vol) agarose gel prestained with GelRed. This gel can be prepared while the qPCR is running. We apply 130 V for 45 min on a gel tray with a 15-cm distance between electrodes. Ensure that there are no apparent bands of the sizes of adapter dimers (79 bp for nonbarcoded Y adapters (see ref. 5) and 91 bp for the barcoded adapters). As an example, lanes 1–4 in Figure 3a indicated no detectable adapter dimer in the libraries, whereas lanes 1, 3, 6 and 7 in Figure 3b showed the presence of adapter dimers (and should not be used for the emPCR).
Although lane 2 in Figure 3b did not show apparent adapter dimers, it indicated a suboptimal library fragment length distribution when started with high-molecular-weight DNA samples.
? TROUBLESHOOTING

Calculation of the amount of library needed for sequencing ● TIMING 1 min
27| Calculate the amount of library needed for an emPCR that corresponds to a ratio of library molecules to beads (cpb) of 0.08, either using a preformulated Excel file in Supplementary Data 1 or as follows: If the library concentration according to the qPCR standard curve analysis is 100,000 molecules per µl, and the amount of emPCR beads to be used for sequencing this library is for one LV region (i.e., 35 million beads in the current Titanium setting for a half plate; see the ‘emPCR Method Manual’13), then the amount of library needed is 35,000,000 × 0.08 / (100,000 × 2) = 14 µl. The factor 2 accounts for the dsDNA-to-ssDNA conversion after denaturation. At a small cpb ratio (such as <0.2), Poisson statistics approach linearity5. Thus, 0.08 can be used to approximate an enrichment percentage of 8%.

Library pooling (optional) ● TIMING 1 min for one additional library
28| Pooling of libraries prepared with barcoded Y adapters can be carried out at this stage, after calculating the relative contribution of each library. An example of this calculation to allocate one LV region (a half plate, 50%) to eight libraries with equal quota (6.25%) and another LV region to four libraries with different quotas (2.5, 2.5, 22.5 and 22.5%) is available from an Excel file in Supplementary Data 1. Note that the total volume of the pooled libraries ideally should be 1–10 µl for each SV (small volume) region and 10–100 µl for each LV region (one sequencing plate consists of two LV regions).

emPCR amplification of the library, bead recovery and enrichment ● TIMING ~13 h
29| Perform emPCR of the library as described in ref. 13.
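The volume calculation in Step 27 can be sketched in Python as follows. This is an illustration only and is not the Excel file in Supplementary Data 1. The function names and the example concentrations in the pooling part are hypothetical, and the pooling helper assumes that each library simply receives a fixed share of the beads in one region, which may differ from the calculation in the supplementary file.

```python
def library_volume_ul(beads: float,
                      cpb: float,
                      dsdna_molecules_per_ul: float,
                      strands_per_molecule: int = 2) -> float:
    """Volume of library needed for a given number of beads (Step 27).

    The factor strands_per_molecule = 2 reflects that each dsDNA library
    molecule quantified by qPCR yields two ssDNA templates on denaturation.
    """
    if dsdna_molecules_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return beads * cpb / (dsdna_molecules_per_ul * strands_per_molecule)


def pooled_volumes_ul(region_beads: float,
                      cpb: float,
                      concentrations: dict,
                      shares: dict) -> dict:
    """Per-library volumes when a region is shared (hypothetical helper).

    shares maps library name to its fraction of the region's beads and
    must sum to 1; concentrations are in dsDNA molecules per microliter.
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return {
        name: library_volume_ul(region_beads * share, cpb, concentrations[name])
        for name, share in shares.items()
    }


if __name__ == "__main__":
    # Worked example from Step 27: one LV region, 35 million beads.
    print(f"{library_volume_ul(35_000_000, 0.08, 100_000):.1f} ul")

    # Hypothetical pooling of two barcoded libraries in equal shares.
    conc = {"Y3": 100_000, "Y5": 200_000}   # made-up qPCR results
    print(pooled_volumes_ul(35_000_000, 0.08, conc, {"Y3": 0.5, "Y5": 0.5}))
```

For the worked example in Step 27, the first call gives approximately 14 µl. In the pooling example, the more concentrated library contributes proportionally less volume for the same share of beads.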
CRITICAL STEP After heat denaturation of the dsDNA library into ssDNA, keep the library at 4 °C if it is not used immediately (avoid prolonged exposure to room temperature (~22 °C)). We do not recommend stopping the heat denaturation step by promptly cooling on ice.
CRITICAL STEP After adding the library to the washed DNA capture beads, immediately vortex vigorously for 5 s.
CRITICAL STEP Bead losses during the breaking of the emulsion after emPCR can vary and can affect the accuracy of the calculated enrichment percentage; therefore, calculate enrichment according to ref. 13. Before enrichment, measure the amount of beads, denoted as b. After the collection of enriched beads (see ref. 13), measure the amount of beads, denoted as a. Percent enrichment = (a / b) × 100.
CRITICAL STEP See ref. 13 for details on optimal spin speeds, wash steps and appropriate conditions for removing as much supernatant (which contains free enrichment or sequencing primers) as possible.

Library sequencing using the GS FLX Titanium sequencing method ● TIMING 5 h plus 8 h machine time

30| Perform sequencing of the library as described in ref. 14. Choose the run processing type ‘Full Processing for Shotgun or Paired End’ at the appropriate point.

Post-sequencing data analysis ● TIMING 2 h (on an 8-core computer with 16 GB of RAM)

31| Y-adapter trimming. Starting from v2.3, the GS software no longer recognizes the MGB-probe sequence (which was part of the earlier B adapter), and therefore a customized pipeline is needed.
An example of the template file TrimYadapter.xml is available in Supplementary Data 2; alternatively, it can be created using the gsRunProcessor command to generate a filter template, e.g.:
    gsRunProcessor --template=filteronly > TrimYadapter.xml
Within the ‘qualityFilter’ section of the template file, add:
    <primer>GTGACACGCAACAGGGGATAGACAAGGCACACAGGGGATAGG</primer>
Run the runAnalysisFilter command using the new filter template and the ‘signalProcessing’ folder, e.g.:
    runAnalysisFilter --pipe=TrimYadapter.xml D_2010...signalProcessing

32| Split mixture of barcoded samples. To split the sff file that contains pooled sample sequences into individual files, run the sfffile command, e.g.:
    sfffile -s Y -mcf Yscheme.txt -o region1 NameOfYourSFFfile1.sff > MIDyieldR1.txt
    sfffile -s Y -mcf Yscheme.txt -o region2 NameOfYourSFFfile2.sff > MIDyieldR2.txt
The Yscheme.txt file can be created using any available text editor; alternatively, it is available in Supplementary Data 3.
CRITICAL STEP To identify individual samples, the proprietary Roche ‘sfffile’ command is likely to use the 5′ MID sequences, given their higher sequencing quality and the fact that the 3′ MID sequences are at the far end of library molecules and are often not readable (and, if they are read, they will be trimmed away).

? TROUBLESHOOTING
Troubleshooting advice can be found in Table 2.

Table 2 | Troubleshooting table.
Step 26
  Problem: A visible band of ~79 or 91 bp
  Possible reason: There remains a substantial proportion of adapter dimer, possibly due to an incorrect amount of AMPure XP beads used for size selection
  Solution: Purify the prepared library again using an appropriate amount (% vol/vol) of AMPure XP beads on the basis of the calibration results

  Problem: The majority of the amplicons are 100–300 bp
  Possible reason: An incorrect amount of AMPure XP beads was used for size selection
  Solution: Purify the prepared library again using an appropriate amount (% vol/vol) of AMPure XP beads on the basis of the calibration results

● TIMING
For preparing eight libraries:
Steps 1–6, DNA nebulization: 1 h
Steps 7–16, Fragment size selection: 1 h
Steps 17 and 18, End polishing, phosphorylation and dA extension: 1 h
Steps 19–21, Adapter ligation and purification: 1 h or overnight
Steps 22–25, Library quantification by qPCR: 2.5 h
Step 26, Library quality control by gel electrophoresis: 1 h
Steps 27 and 28, Library pooling: 10 min
Step 29, Emulsion PCR amplification of the library, bead recovery and enrichment: 8 h for amplification plus 5 h for bead recovery and enrichment
Step 30, Library sequencing: 5 h hands-on time plus 8 h machine time
Step 31, Y adapter trimming: 2 h on an 8-core computer with 16 GB of RAM
Step 32, Split mixture of barcoded samples: 1 min

[Figure 4a panel. Run 1 total yield: 379 million bases. Region 1: median read length = 395 bp, total 613,714 reads. Region 2: median read length = 299 bp, total 526,044 reads (pipeline was adjusted for short reads for the degraded clinical samples). Bars: expected (%) and observed (%). y axis: proportion of yield within region (%).]

Figure 4 | Anticipated results. The expected and observed sequencing yields using multiplex identifier adapters (MID adapters) from our initial two 454 Titanium runs.
(a) Run 1 included six ‘Hp’ samples, which were sequenced in Region 1 (‘region’ is also called ‘lane’ in other systems), and two old formalin-fixed biopsies with two metagenomic samples, which were sequenced in Region 2. Run 1 also served as an experimental validation of the barcodes. (b) Run 2 included more Hp samples, and tested two newly designed adapters and one Rapid Library barcode (Y8) that had not been used previously. Expected percentage is the quota allocated for each sample according to the needs of different projects (see Supplementary Data 1) and observed percentage is the actual proportion of reads sequenced from each sample. Because the Y1 and Y12 adapters generated substantially less yield than expected (30% and 50%, respectively), they are not recommended for further use.

[Figure 4b panel. Region 1: median read length = 469 bp, total 851,679 reads. Region 2: median read length = 445 bp, total 856,957 reads. Bars: expected (%) and observed (%). x axis: MID (Y1, Y3, Y5, Y8, Y9, Y10, Y11, Ya1 and Ya2 in each region) and sample.]

ANTICIPATED RESULTS
Figure 4 shows the expected and observed sequencing yield using multiplex identifier adapters (MID adapters) from our initial two Titanium runs. For 454 sequencing, every run can be divided into 2, 4, 8 or 16 physically separated regions (the counterpart term of ‘Region’ in Illumina sequencing is ‘Lane’). Run 1 included degraded clinical samples in Region 2 (shorter reads) and gave a total yield of 379 Mb, at the lower end of Roche’s specification (360–560 Mb). Run 2 contained chromosomal DNA samples and yielded 711 Mb (699 Mb after trimming of our customized adapter), with an even distribution across samples, demonstrating the strength of this method.
Note that adapter Y1 did not perform as well as the others in Run 1, and the library prepared using this adapter was intentionally (for the purpose of our study project) used in Run 2 to balance the yield of the sample ‘Hp1’. Adapters Y1 and Y12 did not pass the experimental validation and are therefore not included in the set of eight recommended adapters.

[Figure 4b panel title. Run 2 total yield: 711 million bases. y axis: proportion of yield within region (%).]

Note: Supplementary information is available via the HTML version of this article.

Acknowledgments This study was supported by the Sixth Research Framework Programme of the European Union, a project on infections and cancers (INCA, LSHC-CT-2005-018704); the Karolinska Institutet Faculty funds for partial financing of new doctoral students (KID scholarship) to Z.Z.; and the Swedish Research Council Formas grant (FORMAS) to A.F.A.

AUTHOR CONTRIBUTIONS Z.Z. designed the Y adapters and qPCR assay, proposed Poisson statistics, conducted the experiments and drafted the manuscript; A.A., Ö.M., S.G. and H.N. assisted with the experiments and provided critical reviews; W.Y., L.E. and A.A. obtained funding and provided critical reviews; A.F.A. supervised the study.

COMPETING FINANCIAL INTERESTS The authors declare no competing financial interests.

Published online at http://www.natureprotocols.com/. Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Roche Diagnostics. Rapid Library Preparation Method Manual: GS FLX Titanium Series (Roche Applied Science, October 2009 (Rev. January 2010)).
2. Frias-Lopez, J. et al. Microbial community gene expression in ocean surface waters. Proc. Natl. Acad. Sci. USA 105, 3805–3810 (2008).
3. Aird, D. et al. Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biol. 12, R18 (2011).
4. Davey, J.L. & Blaxter, M.W. RADSeq: next-generation population genetics. Brief Funct. Genomics 9, 416–423 (2010).
5. Zheng, Z. et al. Titration-free massively parallel pyrosequencing using trace amounts of starting material. Nucleic Acids Res. 38, e137 (2010).
6. Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380 (2005).
7. Bentley, D.R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).
8. Shendure, J. et al. Accurate multiplex polony sequencing of an evolved bacterial genome. Science 309, 1728–1732 (2005).
9. Baird, N.A. et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS ONE 3, e3376 (2008).
10. Carson, S. et al. DNA sequencing by capillary electrophoresis: use of a two-laser-two-window intensified diode array detection system. Anal. Chem. 65, 3219–3226 (1993).
11. White, R.A. III, Blainey, P.C., Fan, H.C. & Quake, S.R. Digital PCR provides sensitive and absolute calibration for high throughput sequencing. BMC Genomics 10, 116 (2009).
12. Huang, J., Zheng, Z., Andersson, A.F., Engstrand, L. & Ye, W. Rapid screening of complex DNA samples by single-molecule amplification and sequencing. PLoS ONE 6, e19723 (2011).
13. Roche Diagnostics. emPCR Method Manual: Lib-L LV GS FLX Titanium Series (Roche Applied Science, October 2009 (Rev. January 2010)).
14. Roche Diagnostics. Sequencing Method Manual: GS FLX Titanium Series (Roche Applied Science, October 2009 (Rev. November 2010)).