Construction of small RNA cDNA libraries for deep sequencing

Methods 43 (2007) 110–117
www.elsevier.com/locate/ymeth
Cheng Lu a, Blake C. Meyers a, Pamela J. Green a,b,*
a Department of Plant and Soil Sciences, Delaware Biotechnology Institute, University of Delaware, Newark, DE 19711, USA
b College of Marine and Earth Sciences, University of Delaware, Newark, DE 19711, USA
Accepted 1 May 2007
Abstract
Small RNAs (21–24 nucleotides) including microRNAs (miRNAs) and small interfering RNAs (siRNAs) are potent regulators of
gene expression in both plants and animals. Several hundred genes encoding miRNAs and thousands of siRNAs have been experimentally identified by cloning approaches. New sequencing technologies facilitate the identification of these molecules and provide global
quantitative expression data in a given biological sample. Here, we describe the methods used in our laboratory to construct small
RNA cDNA libraries for high-throughput sequencing using technologies such as MPSS, 454 or SBS.
© 2007 Elsevier Inc. All rights reserved.
Keywords: Small RNAs; miRNAs; siRNAs; High-throughput sequencing; 454
1. Introduction
Nearly all eukaryotes produce small RNAs (21–24
nucleotides) that function to silence genes by multiple
mechanisms. miRNAs (generally 21–22 nt) are the most
abundant type of small RNAs in most organisms. miRNAs
originate from “hairpin” primary transcripts from one
strand of distinct genomic loci by two rounds of endoribonuclease cleavage by RNase III-like enzymes. Another type
of small RNAs, known as siRNAs (generally 22–24 nt), is
similar in structure and function to miRNAs. siRNAs are
processed from longer double-stranded RNA molecules
and represent both strands of the RNA. In many organisms, such as plants, siRNAs are believed to originate from
longer transcripts derived from transposons, repetitive
sequences and transgenes [1–3].
* Corresponding author. Address: Department of Plant and Soil Sciences, Delaware Biotechnology Institute, University of Delaware, Newark, DE 19711, USA. Fax: +1 302 831 3231.
E-mail address: [email protected] (P.J. Green).
1046-2023/$ - see front matter © 2007 Elsevier Inc. All rights reserved.
doi:10.1016/j.ymeth.2007.05.002
The first and still most common approach to the discovery of small RNAs has been to clone and sequence individual small RNAs using traditional molecular methods. The
majority of currently known miRNAs were identified by
this approach. It was first used to identify miRNAs and
siRNAs in mammals, Caenorhabditis elegans, Drosophila
and Arabidopsis [4–8]. Small RNAs that are generated by
RNase III have 5′ phosphate and 3′ hydroxyl termini in
contrast to most RNA turnover products that have a 5′
hydroxyl terminus [9]. Different cloning protocols have
been developed independently. Most of them require the
presence of a 5′ phosphate and a free 3′ hydroxyl group on
the small RNAs for adapter ligation. After reverse transcription, the cDNA is PCR-amplified using primers corresponding to the adapter sequences. The PCR products are
cloned and sequenced. Based on published data, about 30–
50% of the clones represent RNA turnover products of the
abundant rRNAs, tRNAs and snRNAs [7,10]. The cloning frequency of an individual small RNA generally reflects its
relative abundance in the sample, providing a quantitative
expression measurement.
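The link between cloning frequency and relative abundance can be illustrated with a small counting sketch. The read sequences below are invented placeholders, and the per-million scaling is a common convention assumed here for illustration; it is not prescribed by the text.

```python
from collections import Counter


def relative_abundance(reads, scale=1_000_000):
    """Map each distinct sequence to (raw count, count scaled per `scale` reads)."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {seq: (n, n * scale / total) for seq, n in counts.items()}


# Hypothetical reads (placeholders, not real data).
reads = ["AAAACCCCGGGGUUUUAAAAC"] * 3 + ["GGGGUUUUAAAACCCCGGGGUUUU"]
table = relative_abundance(reads)
for seq, (count, per_million) in table.items():
    print(seq, count, per_million)
```

Scaling to a fixed total makes libraries of different sequencing depth comparable, under the assumption that cloning bias is similar across libraries.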
Despite the early success of this approach, it is unlikely
that these efforts are saturating for rare or tissue-specific
small RNAs. The identification and quantification of small
RNAs using high-throughput sequencing methods was first
accomplished in Arabidopsis by our lab [11]. More than
2 million small RNAs were sequenced by Massively
Parallel Signature Sequencing (MPSS) [12] from Arabidopsis flowers and seedlings, yielding more than 70,000
genome-matching distinct sequences. This represented a
significant advance over more traditional methods for
small RNA identification. One limitation of MPSS is that
it is only capable of sequencing the 17 nucleotides at the 5′ end of
small RNAs. We also pioneered use of an alternative
approach for small RNA sequencing based on the “454”
method of sequencing [13], a technology which produces
longer sequence reads [12]. Recently, we reported the use
of both MPSS and 454 to sequence small RNAs from different Arabidopsis mutant backgrounds [14,15]. Combined
with genetic approaches, deep sequencing provides a powerful tool for the dissection and characterization of diverse
small RNA populations and identification of low abundance miRNAs.
This article describes the method used in our laboratory
to make size-fractionated cDNA libraries that are used for
high-throughput sequencing with parallel approaches. This
method was originally developed for use with plants and
MPSS. Substantial progress has been reported for other
next-generation sequencing technologies. Solexa, Inc. has
developed a four-color DNA sequencing-by-synthesis
(SBS) approach as a replacement for MPSS based on a
novel, reversible, dye-termination chemistry (http://www.solexa.com).
This approach can potentially generate
>10 million 25–30 nt sequence tags with high accuracy. A
different sequencing approach named Supported Oligo
Ligation Detection (SOLiD) is being developed by Agencourt Personal Genomics (now a part of Applied Biosystems, Inc.). This method uses an array of microbeads
each coated with a single DNA or cDNA fragment; a pool
of fluorescent oligos is used to “read” the sequences by
complementary binding using a repeated process of ligation, detection, and cleavage. This determines up to 50
nucleotides of sequence per bead, for >10 million beads.
These novel, highly parallel methods have the potential
to dramatically reduce the cost of sequencing and offer a
much richer source of sequence information. The method
described here should be applicable to all of these forthcoming technologies.
2. Method
An overview of small RNA cloning and sequencing
methods is schematically depicted in Fig. 1. First, low
molecular weight (LMW) RNA is isolated from the tissue of interest. Next, small RNAs (20–30 nt) are purified
from the LMW RNA fraction by polyacrylamide gel-based size fractionation and are ligated to a 5′ RNA
adapter. To prevent self-ligation of small RNAs and
self-ligation of the adapter, the 5′ terminus of the adapter has a hydroxyl group and an excess of adapter over
small RNAs is used. Then, a 3′ adapter is ligated to the
gel-purified product of the 5′ adapter ligation. The 3′
RNA adapter is modified to prevent circularization and
self-ligation; typically, the 3′ hydroxyl is blocked by
chemical synthesis of an oligonucleotide containing a 3′
non-nucleotidic group. After reverse transcription, a
low number of PCR cycles is used to obtain a sufficient
amount of template for sequencing. The PCR product
can be cloned and sequenced with regular PCR cloning
vectors. The quality of the small RNA cDNA libraries
is usually assessed in a quality control (“QC”) step by
sequencing about 100 individual clones. Below, these
steps are described in detail.
Fig. 1. An outline of how to make a small RNA cDNA library from total
RNA samples. See text for details. Steps shown in the flowchart, in order:
total RNA isolation with Trizol; separation of LMW and HMW RNA;
purification of small RNAs (20–30 nt); 5′ adapter ligation; purification of
1st ligation product; 3′ adapter ligation; purification of 2nd ligation product;
reverse transcription; 18 cycles of PCR amplification; purification of 75 bp
PCR product; cloning into pCRII-TOPO for QC and analysis of ~100 colonies
using traditional sequencing; 454, MPSS, or SBS sequencing.
3. Material and reagents
1. RNA isolation: Trizol reagent (Invitrogen 15596),
chloroform, isopropanol, 75% ethanol, DEPC-treated water.
2. LMW and high molecular weight (HMW) RNA separation: 5 M NaCl, 50% PEG8000, 5 mg/ml glycogen
(Ambion 9510).
3. RNA purification: 10× TBE, 2× formamide loading
buffer (90% formamide, 1× TBE, xylene cyanol, and
bromophenol blue), 10 bp DNA ladder (1 µg/µl)
(Invitrogen 10821-015), 10% ammonium persulfate,
TEMED, 40% acrylamide stock (Ambion 9022),
0.3 M NaCl, ethanol (EtOH), ethidium bromide,
Spin-X filter (Corning 8162).
4. Adapter ligation: T4 RNA ligase 5 U/µl and 10×
RNA ligase buffer (Ambion 2140), RNaseOUT
40 U/µl (Invitrogen 10777).
5. RT-PCR: SuperScript II RT 200 U/µl (Invitrogen
18064), Taq DNA polymerase (Invitrogen 103420),
dNTP mix (Invitrogen R725-01).
6. PCR product purification: 10 bp DNA ladder (1 µg/µl)
(Invitrogen 10821-015), buffer-saturated phenol
(pH 7.9) (Ambion 9710), chloroform.
7. PCR cloning: TOPO TA Cloning (Invitrogen K4500-01), LB Broth Base (Invitrogen 12780), TOP10 One
Shot Cells (Invitrogen K4500-40), X-gal (Invitrogen
15520-034), IPTG (Invitrogen 15529-019).
8. RNA oligos for RNA ligation (Dharmacon): 5′ RNA
adaptor (5′ OH-GGU CUU AGU CGC AUC CUG
UAG AUG GAUC-OH 3′), 3′ RNA adaptor (5′
pUAU GCA CAC UGA UGC UGA CAC CUG
C-idT 3′). (Note: p, phosphate; idT, inverted deoxythymidine. The exact sequences of the adapters can
be changed based on specific needs. Both adaptors
were purified by PAGE by Dharmacon.)
9. DNA oligo for reverse transcription: RT-primer (5′
CAA GCA GAA GAC CGC ATA CGA 3′).
10. DNA oligos for PCR amplification: 5′ PCR primer
(5′ CAA GCA GAA GAC CGC ATA CGA 3′), 3′
PCR primer (5′ AAT GAT ACG GCG ACC ACC
GA 3′).
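The adapter sequences in item 8 determine the sizes of the bands recovered at later steps. The sketch below adds up the expected lengths. It assumes that the inverted dT counts as one residue and that the PCR primers lie within the adapter sequences and add no extra bases; both are assumptions made for illustration, not statements from the protocol.

```python
# Adapter sequences copied from item 8 of the materials list (spaces removed).
FIVE_PRIME_ADAPTER = "GGUCUUAGUCGCAUCCUGUAGAUGGAUC"
THREE_PRIME_ADAPTER = "UAUGCACACUGAUGCUGACACCUGC"  # idT cap not included here
IDT_CAP_NT = 1  # assumption: count the inverted dT as one extra residue


def expected_sizes(insert_nt):
    """Return (5' ligation product, two-adapter product) lengths in nucleotides."""
    five_prime_product = len(FIVE_PRIME_ADAPTER) + insert_nt
    full_product = five_prime_product + len(THREE_PRIME_ADAPTER) + IDT_CAP_NT
    return five_prime_product, full_product


# 0 = adapters ligated without an insert; 21 and 24 = typical plant small RNAs.
for insert_nt in (0, 21, 24):
    print(insert_nt, expected_sizes(insert_nt))
```

Under these assumptions a 21 nt insert gives a 75 nt two-adapter product, in line with the 75 bp PCR product named in Fig. 1, and adapters joined without an insert give about 54 nt.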
4. Protocol
4.1. Low molecular weight (LMW) RNA isolation
Harvest samples and immediately freeze in liquid
nitrogen.
(a) Grind to a fine powder. As an example, 3 g
of seedling tissue ground using a mortar and pestle
under liquid nitrogen will yield about 500 µg of total
RNA.
(b) Isolate total RNA using Trizol reagent as indicated in
the manufacturer’s protocol. For the example tissue
in (a), we would use 40 ml Trizol. For some recalcitrant tissues, we add an extra chloroform extraction.
(c) Dissolve total RNA in DEPC-treated water to a concentration of about 1 µg/µl. (Note: A yield of 200–
400 µg of total RNA is typically what we use as starting material for the subsequent steps.)
(d) mRNA and rRNA (high molecular weight (HMW)
RNAs) are precipitated by adding both 50% PEG
(MW = 8000) to a final concentration of 5% and
5 M NaCl to a final concentration of 0.5 M.
(e) Mix well and put the tube on ice for 30 min.
(f) Spin down at max speed in a microcentrifuge (12 K)
for 10 min at 4 °C to pellet HMW RNAs. (Note: The
pellet from this step can be dissolved in DEPC-treated water and used for regular Northern blots.)
(g) Transfer the supernatant to a new microcentrifuge tube
(this fraction contains the LMW RNAs), add
2.5 volumes of 100% EtOH, mix well, and place at
−20 °C for at least 2 h.
(h) Spin down at maximum speed for at least 30 min at
4 °C to pellet LMW RNAs.
(i) Carefully remove the supernatant and wash the pellet with
80% EtOH without dislodging it. Allow the pellet to air
dry, and dissolve the pellet in DEPC-treated water.
(Note: 10 µl DEPC-treated water is typically used to
resuspend LMW RNAs from 100 µg of total RNA.)
4.1.1. Comment
Most RNA isolation methods are based on either chemical extraction (e.g. Trizol) or immobilization on a silica-based
membrane (e.g. QIAGEN RNeasy Kit). The second method
has been shown to work well for large RNAs (>200 nt).
However, small RNAs cannot be recovered efficiently by
silica-based purification. Based on our experience, total
RNA isolated using Trizol reagent is usually free of protein
and DNA contamination yet contains most of the small RNA.
The small RNA enrichment step (HMW and LMW RNA
separation) is recommended for library construction because
a high level of rRNA and mRNA might increase background
noise in libraries. These HMW RNAs can be precipitated
using 5–10% PEG (MW = 8000). In the LMW fraction,
RNA species ≤200 nt (including tRNAs) are highly enriched
(Fig. 2a). Our preferred way to quickly analyze the abundance and integrity of the LMW RNA is to run an aliquot
of the preparation on a 1.5% agarose gel (Fig. 2b).
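The final concentrations in the PEG/NaCl precipitation step (5% PEG and 0.5 M NaCl from 50% and 5 M stocks) fix how much of each stock to add for a given RNA volume, because both additions dilute each other. The following is a minimal sketch of that arithmetic; the function name and the 400 µl example volume are illustrative assumptions.

```python
def hmw_precipitation_volumes(sample_ul, peg_stock_pct=50.0, peg_final_pct=5.0,
                              nacl_stock_m=5.0, nacl_final_m=0.5):
    """Volumes of PEG and NaCl stock to add so both reach their final concentration."""
    peg_fraction = peg_final_pct / peg_stock_pct   # share of final volume that is PEG stock
    nacl_fraction = nacl_final_m / nacl_stock_m    # share of final volume that is NaCl stock
    total_ul = sample_ul / (1.0 - peg_fraction - nacl_fraction)
    return {
        "peg_ul": total_ul * peg_fraction,
        "nacl_ul": total_ul * nacl_fraction,
        "total_ul": total_ul,
    }


# Example: 400 ul of total RNA at about 1 ug/ul (hypothetical volume).
volumes = hmw_precipitation_volumes(400.0)
print({name: round(value, 1) for name, value in volumes.items()})
```

With the stocks listed in the materials, each stock makes up one tenth of the final volume, so 400 µl of RNA needs about 50 µl of each.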
4.2. 20–30 Nucleotide small RNA purification from LMW
RNA
(a) Prepare the glass and spacers (1.5 mm) for pouring the
gel. (Mini-PROTEAN 3 System from Bio-Rad or
other similar vertical electrophoresis systems with
approximately 10 × 8 cm gel size can be used.)
(b) Prepare a 15% polyacrylamide/urea gel. Mix the components (9.6 g urea, 7.5 ml 40% acrylamide stock,
2 ml 10× TBE, DEPC-treated water to 20 ml) and
warm the solution to 37 °C to dissolve the urea. Filter
the solution through a nitrocellulose filter and cool to
room temperature.
(c) Add 120 µl of a freshly prepared solution of 10%
ammonium persulfate to the acrylamide solution.
Mix well.
(d) Add 9.2 µl of TEMED. Mix the solution by swirling.
Fill the space almost to the top. Lay the glass plates
against a test-tube rack at an angle of 10°.
(e) Immediately insert the appropriate comb. Allow the
acrylamide to polymerize for 30 min at room
temperature.
(f) Remove the comb from the gel and rinse out the wells
thoroughly with 1× TBE.
(g) Pre-run the gel for 15–30 min at 200 V, and wash the
wells using 1× TBE. The gel is now ready for loading.
(h) Load as much as 10 µl LMW RNAs into each well as
follows: add an equal volume of 2× loading dye to the
RNA solution. Mix well by vortexing, heat the samples at 65 °C for at least 5 min to disrupt secondary
structure and put on ice immediately.
(i) Load 3 µl of 10-bp ladder (+2× loading dye) into an
unused lane. (Note: Denature the ladder the same
way as the LMW RNAs, so the ladder will run as single-stranded DNA.)
(j) Run the gel at 200 V for about 1 h, and stain the gel
with 1× TBE/ethidium bromide (1 µg/ml) for 5 min.
(k) Cut out a plug of the gel corresponding to the
band size of 20–30 nucleotides with a clean razor
blade, put it into a 2 ml tube and crush the gel thoroughly with a small pestle. (Note: As shown in
Fig. 2c, two small RNA bands (21 and 24 nt) can
be seen after ethidium bromide staining from most
plant samples.)
(l) Add two volumes (250 µl per lane) of 0.3 M NaCl to
the tube, and elute the RNA by rotating the tube gently
at room temperature for at least 4 h (to overnight).
(m) Transfer the eluate and the gel debris onto the top of
a Spin-X filter, and spin at full speed for 1 min
in a microcentrifuge at RT.
(n) Add 2.5 volumes 100% EtOH and 3 µl glycogen to
the filtrate (the eluted sample), mix well, and incubate
at −80 °C for at least 2 h.
(o) Spin down at maximum speed at 4 °C for 30 min in a
microcentrifuge.
(p) Carefully remove the supernatant and wash the pellet
with 80% EtOH without dislodging it. Allow the RNA
pellet to air dry, then dissolve the RNA in 10 µl of
DEPC-treated water.
Fig. 2. Gels illustrating different steps during small RNA cDNA library construction. (a) Total RNAs isolated with Trizol were run on a 1.2% agarose gel.
(b) After separation by PEG precipitation, HMW (lane 1) and LMW (lane 2) RNAs were run on a 1.5% agarose gel. (c) The LMW fraction from 200 µg of
total RNAs (lanes 1 and 2) was resolved on a 15% denaturing polyacrylamide gel and stained with ethidium bromide. Two bands can be detected in the
range between 20 and 30 nt. The portion of gel within the rectangle was recovered. Ten microliters of 5′ (d) or 3′ (e) adapter ligation product (lane 1) was
resolved on a 10% or 7.5% denaturing polyacrylamide gel, respectively, and stained with ethidium bromide. The portion of gel within the rectangle was
recovered. (f) After PCR, 5 µl of PCR product (lane 1) was resolved on a 7.5% denaturing polyacrylamide gel and stained with ethidium bromide. A
75 bp band should be easily detected.
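The gel recipes in this protocol differ only in the volume of 40% acrylamide stock, which follows from a simple dilution. As a quick check, the sketch below reproduces that volume for the 15% gel of step (b) and for the 10% and 7.5% gels used for the ligation products; the function name is illustrative.

```python
def acrylamide_stock_ml(gel_pct, final_ml=20.0, stock_pct=40.0):
    """Volume of acrylamide stock needed for a gel of `gel_pct` percent."""
    return gel_pct * final_ml / stock_pct


# The three gel percentages used in the protocol, each in a 20 ml gel mix.
for pct in (15.0, 10.0, 7.5):
    print(f"{pct}% gel: {acrylamide_stock_ml(pct)} ml of 40% stock")
```

The urea (9.6 g) and 10× TBE (2 ml) amounts stay the same across the three recipes, so only this one volume and the water change.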
4.3. 5′ Adaptor ligation and purification
(a) The 5′ adaptor ligation reaction is carried out in
a 10 µl reaction containing 5 µl purified small
RNAs, 2 µl 5′ RNA adaptor (20 µM stock concentration), 1 µl 10× RNA ligase buffer, 2 µl T4
RNA ligase. Incubate the reaction at room temperature for 6 h.
(b) Stop the reaction with 10 µl 2× loading dye. Heat the
sample in loading buffer at 65 °C for 15 min prior to
loading.
(c) Prepare a 10% denaturing polyacrylamide gel as
described in Section 4.2, steps (b)–(g), except using the following
gel recipe: 9.6 g urea, 5 ml 40% acrylamide stock,
2 ml 10× TBE, DEPC-treated water to 20 ml.
(d) Load the entire 5′ adapter ligation reaction into one well.
(e) Load 2 µl of 10-bp ladder (+2× loading dye) into an
unused lane.
(f) Run the gel at 200 V for about 1 h, and stain the gel
with 1× TBE/ethidium bromide (1 µg/ml) for 5 min.
(g) Cut out a plug of the gel corresponding to a band size
of 50–65 nucleotides with a clean razor blade, put it
into a 2 ml tube and crush, elute, and precipitate as
described in Section 4.2, steps (k)–(p), and dissolve the RNA in
5 µl of DEPC-treated water.
4.3.1. Comment
RNA adapters are usually supplied in a dehydrated
form. We recommend dissolving them at 200 µM in
DEPC-treated water. Then prepare a 20 µM working solution for ligation reactions. All the solutions should be aliquoted and stored at −80 °C.
T4 RNA ligase catalyzes the ATP-dependent intra-
and intermolecular formation of phosphodiester bonds
between 5′-phosphate and 3′-hydroxyl termini of oligonucleotides, single-stranded RNA and DNA. Several T4
RNA ligases from different companies have been tested
in our lab. We found that Ambion’s T4 RNA ligase gave
the most efficient and reliable results under our reaction
conditions. The cloning frequency of individual small
RNAs usually reflects their expression level. However,
one possible source of bias in the ligation reaction is differential ligation efficiency toward the ends of various
small RNAs.
As shown in Fig. 2d, the 5′ ligation product is generally invisible with ethidium bromide staining. Based on
the size of the 5′ adapter, recover the region of the gel
corresponding to the expected size. Because a large molar
excess of 5′ RNA adapter is used in the reaction, the
unligated adapters are readily detectable in the gel
(Fig. 2d).
4.4. 3′ Adaptor ligation and purification
(a) The 3′ adaptor ligation reaction is carried out in a
10 µl reaction containing 5 µl purified 5′ adapter ligation product, 2 µl 3′ RNA adaptor (20 µM), 1 µl 10×
RNA ligase buffer, 2 µl T4 RNA ligase. The reaction
is incubated at room temperature for 6 h.
(b) Stop the reaction with 10 µl 2× loading dye. Heat the
sample in loading buffer at 65 °C for 15 min prior to
loading.
(c) Prepare a 7.5% denaturing polyacrylamide gel as
described in Section 4.2, steps (b)–(g), except using the following gel recipe: 9.6 g urea, 3.75 ml 40% acrylamide stock, 2 ml
10× TBE, DEPC-treated water to 20 ml.
(d) Load the entire 3′ adapter ligation reaction into one
well.
(e) Load 2 µl of 10 bp ladder (+2× loading dye) into an
unused lane.
(f) Run the gel at 200 V for about 1 h, and stain the
gel with 1× TBE/ethidium bromide (1 µg/ml) for
5 min.
(g) Cut out a plug of the gel corresponding to the band
size of 70–90 nucleotides with a clean razor blade,
put it into a 2 ml tube and crush, elute, and precipitate
as described in Section 4.2, steps (k)–(p), and dissolve the RNA in
5 µl of DEPC-treated water.
4.5. RT-PCR of small RNAs ligated with adapters
(a) Five microliters of purified ligation product is incubated with 3 µl of 100 µM RT-primer and 3 µl
DEPC-treated water at 65 °C for 10 min. Spin down
to cool.
(b) Add the following components on ice, in order: 6 µl
of 5× first strand buffer, 5.5 µl of 2 mM dNTP mix,
3 µl of 100 mM DTT, 1.5 µl RNaseOUT, and 3 µl
SuperScript II RT (200 U/µl).
(c) Incubate at 45 °C for 1 h, followed by a final 5 min
incubation at 90 °C to inactivate the enzyme.
(d) PCR is carried out in twelve 50 µl reaction tubes,
each containing 5 µl 10× PCR buffer, 1.5 µl of
50 mM MgCl2, 1 µl of 10 mM dNTPs, 0.5 µl of
100 µM 5′ PCR primer, 0.5 µl of 100 µM 3′ PCR primer, 1 µl Taq polymerase (5 U/µl), 1 µl of RT reaction mixture.
(e) The reactions are incubated at 94 °C for 1 min, and
then cycled 18 times at 94 °C for 45 s, 55 °C for
45 s, and 72 °C for 45 s. This is followed by a 3 min
incubation at 72 °C.
(f) Check the reaction on a 7.5% denaturing polyacrylamide gel as follows: remove 5 µl from the PCR reaction, add 2× loading dye, and heat the sample well
before loading alongside the 10 bp ladder. A good
smear in the 75 nt size range can be seen with ethidium bromide staining (Fig. 2f).
4.6. PCR product purification
(a) Add an equal volume of Tris buffer (pH 7.9)-saturated phenol:chloroform (1:1) to the PCR
reaction.
(b) Mix well by vortexing for 30 s, and spin in a microcentrifuge for 3 min at max speed.
(c) Carefully remove the aqueous layer to a new tube.
(d) To remove traces of phenol, add an equal volume of
chloroform to the aqueous layer.
(e) Mix well by vortexing for 30 s, and spin in a microcentrifuge for 3 min at maximum speed.
(f) Transfer the aqueous layer to a new tube.
(g) Measure the volume of the DNA sample. Adjust the
salt concentration by adding 1/10 volume of 3 M
sodium acetate, pH 5.2. Mix well. Add 2.5 volumes
of cold 100% ethanol (calculated after salt addition).
Mix well.
(h) Place at −20 °C for at least 2 h.
(i) Spin at maximum speed in a microcentrifuge for 20 min.
(j) Carefully remove the supernatant and wash the pellet
with 75% EtOH without dislodging it. Allow the DNA
pellet to air dry.
(k) Resuspend the pellet in 120 µl H2O. Add 24 µl of 6×
loading dye. Mix well.
(l) Load the entire sample into eight wells of a 10% TBE-polyacrylamide gel, along with 10 bp DNA ladder as
marker, and run the gel at 150 V for 60 min.
(m) Cut out the product band (75 bp) with a clean razor
blade, put it into a 2 ml tube and crush, elute, and precipitate it as described in Section 4.2, steps (k)–(p), except add 2 volumes of cold 100% EtOH, and incubate at −20 °C
for at least 2 h.
(n) Spin at maximum speed at 4 °C for 30 min. Wash the
pellet with 0.5 ml room temperature 70% EtOH. Vacuum or air dry the pellet, and dissolve it in 12 µl of
H2O. (Note: For 454 sequencing, 1 µg of purified
PCR product at 100 ng/µl is required.)
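Two volume calculations in the PCR product purification are easy to get wrong: the ethanol volume, which is calculated after the sodium acetate has been added, and the volume loaded per well of the preparative gel. The sketch below works through both. The 600 µl starting volume (twelve pooled 50 µl reactions, ignoring phase losses and the aliquot removed for the check gel) is an assumption for illustration.

```python
def ethanol_precipitation(sample_ul, salt_fraction=0.1, ethanol_volumes=2.5):
    """Return (sodium acetate volume, ethanol volume) for a DNA sample in ul."""
    salt_ul = sample_ul * salt_fraction
    # Ethanol is calculated on the volume after salt addition.
    ethanol_ul = (sample_ul + salt_ul) * ethanol_volumes
    return salt_ul, ethanol_ul


def per_well_volume(sample_ul, dye_ul, wells):
    """Volume loaded into each well when a sample is split evenly."""
    return (sample_ul + dye_ul) / wells


salt_ul, ethanol_ul = ethanol_precipitation(600.0)  # hypothetical pooled volume
print(round(salt_ul, 1), round(ethanol_ul, 1))
print(per_well_volume(120.0, 24.0, 8))
```

Splitting the 144 µl of resuspended product and dye over eight wells gives 18 µl per well, which stays under the 20 µl per well recommended in the troubleshooting notes on gel purification.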
4.6.1. Comments
To get enough cDNA for sequencing and to maintain
quantitative information as well, 15–20 PCR cycles are
usually used for amplification. For plant samples (200 µg
total RNA), 18 PCR cycles can generate 1.5 µg of purified 75 bp product. Very often, a 50 bp band is visible in
the gel (Fig. 2f). This band is generated from the adapter
ligation product without small RNA inserts. Because most
PCR purification kits have a poor recovery efficiency for
small sized double-stranded DNA, gel purification should
be carried out.
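Whether a purified library meets the stated 454 input (1 µg of product at 100 ng/µl) depends on both the recovered mass and the resuspension volume. The sketch below is one way to check this, using the 1.5 µg yield and 12 µl resuspension volume quoted in the protocol; the function name and return format are illustrative assumptions.

```python
def meets_454_input(mass_ng, volume_ul, min_mass_ng=1000.0, min_conc_ng_per_ul=100.0):
    """Return (concentration in ng/ul, True if mass and concentration suffice)."""
    concentration = mass_ng / volume_ul
    ok = mass_ng >= min_mass_ng and concentration >= min_conc_ng_per_ul
    return concentration, ok


# 1.5 ug of purified 75 bp product dissolved in 12 ul of water.
concentration, ok = meets_454_input(1500.0, 12.0)
print(concentration, ok)
```

If recovery is lower than expected, reducing the resuspension volume raises the concentration but not the mass, so both conditions are checked separately.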
4.7. Cloning into pCRII-TOPO for quality control (QC)
(a) Gel-purified PCR products (0.2 µl) are incubated
with 4 µl sterile water, 1 µl of 1.2 M NaCl and 1 µl
pCRII-TOPO vector at room temperature for
10 min.
(b) Transfer 2 µl of each reaction into separate vials of
One Shot cells.
(c) Spread 10–50 µl of each transformation mix onto LB
plates containing 50 µg/ml kanamycin and X-gal/
IPTG.
(d) Incubate overnight at 37 °C.
(e) An efficient TOPO Cloning reaction should produce
several hundred colonies. Transfer white or light blue
colonies to a 96-well plate and culture them overnight
in LB medium containing 50 µg/ml kanamycin.
(f) This plate is ready for regular ABI QC sequencing.
4.7.1. Comments
This step is extremely important for assessing the
quality of the small RNA cDNA libraries. Highly
expressed miRNAs should be easily identified from the
sequencing data. Furthermore, contamination from adapter self-ligation should be lower than 5%. Therefore, any
library that has failed (i.e., contains a low level of known
miRNAs and a high level of the adapter contamination)
in QC analysis should not be used for high-throughput
sequencing. Fig. 3 shows the QC analysis for a small
RNA cDNA library of Arabidopsis flowers. Our libraries
average about 5% adapter contamination. 83 out of 85
clones are in the size range between 18–27 nt, with 20–
24 nt representing the most common size. Approximately
36% of clones are known miRNAs. In addition, many
small RNAs, which represent endogenous siRNAs,
match transposons, rRNA and other repetitive
sequences. The intergenic region (IGR)-derived small RNAs
could arise from novel miRNA genes or unannotated transposons or retroelements.
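The QC criteria above can be tallied from the annotated clone sequences. The following is a minimal sketch: the clone records are invented, the class labels are hypothetical, and the minimum miRNA fraction is a placeholder because the text gives a numerical threshold only for adapter contamination (below 5%).

```python
def qc_summary(clones, max_adapter_fraction=0.05, min_mirna_fraction=0.2):
    """Summarize QC clones; each clone is a dict with 'length' and 'cls' keys."""
    n = len(clones)
    adapter = sum(1 for c in clones if c["cls"] == "adapter") / n
    mirna = sum(1 for c in clones if c["cls"] == "miRNA") / n
    in_range = sum(1 for c in clones if 18 <= c["length"] <= 27) / n
    return {
        "adapter_fraction": adapter,
        "mirna_fraction": mirna,
        "in_size_range": in_range,
        # min_mirna_fraction is an assumed cutoff, not a value from the text.
        "passed": adapter < max_adapter_fraction and mirna >= min_mirna_fraction,
    }


# Hypothetical QC run of 100 clones (placeholder data).
clones = (
    [{"length": 21, "cls": "miRNA"}] * 36
    + [{"length": 24, "cls": "siRNA"}] * 60
    + [{"length": 0, "cls": "adapter"}] * 2
    + [{"length": 35, "cls": "rRNA"}] * 2
)
summary = qc_summary(clones)
print(summary)
```

A library that fails either check would be rebuilt rather than submitted for high-throughput sequencing.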
Fig. 3. Quality control analysis for a small RNA cDNA library generated from Arabidopsis flowers. (a) Size distribution of the cloned small RNAs. (b)
Distribution of small RNAs in various sequence classes.
For some studies that do not need very deep coverage, it
is more cost-effective to mix several libraries in one run
than to sequence them separately. Indexing nucleotides can
be added to the individual cDNA libraries so that the origin of the sequences can be traced. Several strategies are
possible to achieve this depending on specific needs. For
example, different 5′ RNA adapters can be used for distinct
libraries. After making libraries independently, the purified
PCR products can be combined before high-throughput
sequencing. For this method, the indexing nucleotides
can be placed adjacent to the cloned sequence rather than
at the end of a primer that also must be sequenced. Because
of the high cost of RNA oligos, this method is very expensive. If the high-throughput reads are long enough to cover
entire adapter sequences (as with 454), an alternative strategy
can be applied. The second strategy has the advantage that
the same RNA adapters can be used in all the libraries. The
indexing nucleotides are introduced into PCR primers (for
example, two indexing nucleotides can be added at the 5′
ends of the PCR primers), which would produce a distinct
“tag” for each library. We have successfully sequenced
mixed libraries by 454 with this second indexing strategy.
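For the second indexing strategy, reads from a mixed run can be sorted by the tag at the start of each read. The sketch below assumes a two-nucleotide tag at the very 5′ end of every read; the tag-to-library table and the example reads are hypothetical placeholders.

```python
def demultiplex(reads, index_to_library, index_len=2):
    """Sort reads into libraries by their leading index nucleotides."""
    bins = {library: [] for library in index_to_library.values()}
    unassigned = []
    for read in reads:
        library = index_to_library.get(read[:index_len])
        if library is None:
            unassigned.append(read)  # unknown tag, e.g. a sequencing error
        else:
            bins[library].append(read[index_len:])  # strip the tag
    return bins, unassigned


# Hypothetical tags and reads (placeholders, not real data).
index_to_library = {"AC": "flower", "GT": "seedling"}
reads = ["ACTTTTGGGGCCCC", "GTAAAACCCCGGGG", "ACGGGGAAAATTTT", "NNCCCCAAAA"]
bins, unassigned = demultiplex(reads, index_to_library)
print(bins)
print(unassigned)
```

Two nucleotides give at most 16 distinct tags, and a single base error can turn one tag into another, so longer tags may be preferable when many libraries are pooled.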
5. Troubleshooting
The integrity of RNA and DNA oligos has significant
impact on the outcome of the experiment. HPLC or
PAGE-purified RNA oligos should be used. Regardless
of the source of oligos, if there is any question about the
cleanliness of the oligos, the oligos should be further
PAGE-purified. The oligos can be assessed for intactness
by running an aliquot on a polyacrylamide gel. The following discussion assumes that only very pure, high quality
RNA or DNA oligos were used in the protocol.
Positive control. An optional control experiment consists
of using miRNA-certified total RNAs (Ambion) through
adaptor ligation, purification, reverse transcription and
PCR. Obtaining a good small RNA library using the control RNA demonstrates that the reagents are working
properly.
RNA quality and stability. The key factor for success is
quality of the starting RNA. RNA degradation during
RNA isolation or purification steps is the most likely reason for failure to obtain a good library. We strongly recommend that you analyze an aliquot of the total RNA
on an agarose gel before starting any purification and ligation steps. Look for a 28S ribosomal RNA band that is
twice the intensity of the 18S band. In addition, both bands
should be sharp, with no smearing. Not all RNA purification methods efficiently recover small RNAs. Therefore, it
is important to confirm that the method is effective for
recovery of small RNAs.
A few specific problems are discussed below.
Low or no product after the final PCR amplification. One
probable cause is a low amount of starting RNA. At least
5 µg of LMW RNA (usually precipitated from over 100 µg
of total RNA) should be used. Although we have seen successful small RNA libraries for some samples with lower
amounts of starting material, 100 µg of high quality total
RNA provides more consistent results.
Adaptor sequence contamination. RNA degradation or
low resolution of gel-purification may cause RNA adaptor
sequence contamination in small RNA libraries. In theory,
in the absence of RNA degradation, no adapter–adapter
ligation should result since the RNA adapter either has
no 5′-phosphate or it has a blocked structure that cannot
undergo ligation with T4 RNA ligase. To minimize degradation and protect RNA integrity, RNase inhibitor can be
used during various enzymatic reactions.
Gel-purification. Poor quality of small RNA libraries can
alternatively be caused by unsatisfactory gel-purification.
For best results, it is recommended that freshly-made denaturing (7.5 M urea) polyacrylamide gels be used. Less than
20 µl of sample should be applied to a well of 1.5 mm thickness and of 5 mm width for good separation.
6. Conclusions
A major limitation of traditional sequencing for the discovery of small RNAs by cloning is that it is extremely
challenging to identify small RNAs that are expressed at
a low level, in restricted cell-types, or at very specific stages.
In principle, this is no longer a limiting factor due to our
ability to deeply sequence small RNA libraries from a
broad range of samples. Using the method described here,
we first analyzed the small RNA component of the transcriptome of Arabidopsis tissues [11]. This work identified
many small RNA sequences that were not previously documented, some of which were associated with genomic regions
previously considered devoid of activity. Our data indicated that high-throughput sequencing methods are necessary to sample the full complexity of small RNAs in plants
and likely other organisms as well. Application of this
method to several key mutants affecting small RNA biogenesis pathways can quickly lead to the identification of
candidate miRNAs, trans-acting siRNAs and other interesting classes of small RNAs [14,15]. In addition to analyzing small RNAs in plants, we have recently extended this
approach to animal, fungal and viral systems ([16] and
unpublished data). This method should prove to be a powerful approach that allows rapid identification and quantification of thousands or potentially millions of small RNA
molecules in a single run.
Acknowledgments
We thank S. Luo and C.D. Haudenschild for technical
advice and assistance; M. German and M. Accerbi for
comments on the manuscript. This work was supported
primarily by NSF Grants 0439186 and 0548569 (P.J.G.
and B.C.M.), with additional support provided by DOE
DE-FG02-04ER15541 (P.J.G.).
References
[1] X. Chen, FEBS Lett. 579 (2005) 5923–5931.
[2] B.C. Meyers, F.F. Souret, C. Lu, P.J. Green, Curr. Opin. Biotechnol.
17 (2006) 139–146.
[3] H. Vaucheret, Genes Dev. 20 (2006) 759–771.
[4] M. Lagos-Quintana, R. Rauhut, W. Lendeckel, T. Tuschl, Science
294 (2001) 853–858.
[5] N.C. Lau, L.P. Lim, E.G. Weinstein, D.P. Bartel, Science 294 (2001)
858–862.
[6] R.C. Lee, V. Ambros, Science 294 (2001) 862–864.
[7] C. Llave, K.D. Kasschau, M.A. Rector, J.C. Carrington, Plant Cell
14 (2002) 1605–1619.
[8] W. Park, J. Li, R. Song, J. Messing, X. Chen, Curr. Biol. 12 (2002)
1484–1495.
[9] P.D. Zamore, T. Tuschl, P.A. Sharp, D.P. Bartel, Cell 101 (2000) 25–33.
[10] R. Sunkar, T. Girke, P.K. Jain, J.K. Zhu, Plant Cell 17 (2005) 1397–
1411.
[11] C. Lu, S.S. Tej, S. Luo, C.D. Haudenschild, B.C. Meyers, P.J. Green,
Science 309 (2005) 1567–1569.
[12] S. Brenner, M. Johnson, J. Bridgham, G. Golda, D.H. Lloyd, D.
Johnson, S. Luo, S. McCurdy, M. Foy, M. Ewan, R. Roth, D.
George, S. Eletr, G. Albrecht, E. Vermaas, S.R. Williams, K. Moon,
T. Burcham, M. Pallas, R.B. DuBridge, J. Kirchner, K. Fearon, J.
Mao, K. Corcoran, Nat. Biotechnol. 18 (2000) 630–634.
[13] M. Margulies, M. Egholm, W.E. Altman, S. Attiya, J.S. Bader, L.A.
Bemben, J. Berka, M.S. Braverman, Y.J. Chen, Z. Chen, S.B. Dewell,
L. Du, J.M. Fierro, X.V. Gomes, B.C. Godwin, W. He, S. Helgesen,
C.H. Ho, G.P. Irzyk, S.C. Jando, M.L. Alenquer, T.P. Jarvie, K.B.
Jirage, J.B. Kim, J.R. Knight, J.R. Lanza, J.H. Leamon, S.M.
Lefkowitz, M. Lei, J. Li, K.L. Lohman, H. Lu, V.B. Makhijani, K.E.
McDade, M.P. McKenna, E.W. Myers, E. Nickerson, J.R. Nobile, R.
Plant, B.P. Puc, M.T. Ronan, G.T. Roth, G.J. Sarkis, J.F. Simons, J.W.
Simpson, M. Srinivasan, K.R. Tartaro, A. Tomasz, K.A. Vogt, G.A.
Volkmer, S.H. Wang, Y. Wang, M.P. Weiner, P. Yu, R.F. Begley, J.M.
Rothberg, Nature 437 (2005) 376–380.
[14] I.R. Henderson, X. Zhang, C. Lu, L. Johnson, B.C. Meyers, P.J.
Green, S.E. Jacobsen, Nat. Genet. 38 (2006) 721–725.
[15] C. Lu, K. Kulkarni, F.F. Souret, R. MuthuValliappan, S.S. Tej, R.S.
Poethig, I.R. Henderson, S.E. Jacobsen, W. Wang, P.J. Green, B.C.
Meyers, Genome Res. 16 (2006) 1276–1288.
[16] J. Burnside, E. Bernberg, A. Anderson, C. Lu, B.C. Meyers, P.J.
Green, N. Jain, G. Isaacs, R.W. Morgan, J. Virol. 80 (2006) 8778–
8786.