A Complete Solution for Generating Stranded RNA

TECH NOTE
A Complete Solution for Generating
Stranded RNA-Seq Libraries from HighInput Total RNA
SMARTer Stranded Total RNA Sample Prep Kit - HI Mammalian
Fast, efficient rRNA depletion and stranded library preparation >>
Reproducibility and consistency across replicates >>
High-quality, stranded data >>
Accurate results across RNA quality >>
Overview
Streamlined methods provide the opportunities necessary to push each
experiment to its fullest potential. By combining time-saving techniques with
high-performance reagents, studies in transcriptomics can move forward
efficiently and accurately. Expression analysis of the entire transcriptome by
RNA-sequencing (RNA-seq) can reap great benefits from high sensitivity,
wide range of sample input amounts, and easy-to-use protocols. Traditionally,
generation of RNA-seq libraries from total RNA has been challenged by the
high amounts of ribosomal RNA (rRNA) in the starting material, and lengthy
protocols required to incorporate platform-specific adaptors via ligation.
The SMARTer Stranded Total RNA Sample Prep Kit - HI Mammalian is a
unique solution for generating indexed cDNA libraries suitable for nextgeneration sequencing (NGS) on any Illumina® platform, starting with 100
ng–1 µg of total mammalian RNA of any quality.
Fast, Accurate Technology for rRNA Removal and Library
Generation
Our SMARTer RNA-seq kits are based on the core SMART (Switching
Mechanism at 5' End ofRNA Template) technology (1), a streamlined
process that maintains strand information, and also eliminates tedious library
preparation by incorporating adaptors in reverse transcription and PCR
steps. The strand-specific reverse transcription reaction maintains close to
99% accurate strand of origin information, allowing for the identification of
overlapping transcripts and antisense transcripts. Sequencer-ready libraries
are generated during PCR amplification of the cDNA, using primers
containing Illumina cluster-generating sequences and indexes.
Total RNA can consist of ≥90% rRNA, making it important to remove rRNA
from samples before generating RNA-seq libraries. The RiboGone technology
incorporated in the protocol uses hybridization technology and RNase H
digestion to bind and specifically deplete nuclear rRNA sequences (5S, 5.8S,
18S, and 28S), as well as mitochondrial rRNA sequences (12S) from human,
mouse, or rat total RNA (2). By depleting the rRNA in samples prior to library
generation, sequencing costs are lowered and mapping statistics are
improved.
With the combined power of SMART and RiboGone technologies, the
SMARTer Stranded Total RNA Sample Prep Kit - HI Mammalian enables you
to go from total RNA to Illumina-compatible RNA-seq libraries in around five
hours.
Flowchart of SMARTer Stranded Total RNA Sample Prep Kit - HI Mammalian
library generation.Section A. Depletion of rRNA from total RNA samples with
RiboGone technology. Section B. First-strand cDNA synthesis with SMART
technology, incorporating Illumina Read Primers 1 and 2. Section C.Template
switching and generation of sequencing libraries with Illumina cluster-generating
sequences and indexes by PCR amplification.
Reproducible Sequencing Data
The SMARTer Stranded Total RNA Sample Prep Kit - HI Mammalian
produces extremely reliable RNA-seq data. Two 100-ng samples of Human
Universal Reference RNA (HURR; Agilent) were treated with this kit, and the
data from the two resulting libraries were compared. The high correlation
between them (R = 0.99) displays an impressive level of reproducibility and
consistency across replicates.
Reproducibility across replicates. RNA-seq libraries were generated from two
samples of 100 ng of HURR. The scatterplot illustrates correlations between the
FPKMs (Fragments Per Kilobase Of Exon Per Million Fragments Mapped) from the
two libraries. See Methods >>
High-Quality Sequencing Data
With the SMARTer Stranded Total RNA Sample Prep Kit - HI Mammalian
system, researchers can generate RNA-seq libraries from a variety of
samples, including HURR and HBRR (Human Brain Reference RNA;
Ambion), starting from 100 ng–1 µg of total RNA. Using this kit, rRNA content
was depleted from samples prior to cDNA synthesis and library generation.
When sequenced, both the HURR and HBRR libraries yielded a high number
of quality reads, with 88–94% mapped, 84–91% uniquely mapped, and
approximately 17,600 genes identified. Additionally, based on the ERCC
Spike-In RNA, strand information was maintained at about 99% for both
samples. The benefits of rRNA depletion are clear, with less than 0.5% of
reads from the HURR library and less than 6% of reads from the HBRR
library mapped to rRNA.
Sequence Alignment Metrics
RNA source
Input amount
Number of reads (millions)
Percentage of reads:
Human
universal
Human
brain
400 ng
8.5 (paired end reads)
rRNA
0.3%
5.3%
Mapped to genome
94%
88%
Mapped uniquely to genome
91%
84%
Exonic
43%
50%
Intronic
43%
33%
Intergenic
14%
12%
Number of genes identified
17,570
17,600
Percentage of ERCC transcripts with correct
strand
99.3%
98.8%
Sequence Alignment Metrics. 400 ng of HURR and HBRR with ERCC Spike-In
RNA were treated with this kit. Alignment data is displayed for both libraries, with
the percentage of reads that mapped to rRNA, exonic regions, intronic regions,
intergenic regions, and the correct strand, as defined by Picard analysis.
See Methods >>
These same RNA-seq libraries, generated from HURR and HBRR samples,
produced data that had a strong correlation (R = 0.927) with qPCR data for
the same RNAs obtained through the MicroArray Quality Control (MAQC)
analysis. This suggested that the RiboGone method of rRNA depletion and
SMARTer cDNA synthesis and library preparation did not negatively affect
the RNA-seq data and maintained exceptional accuracy.
MAQC Analysis. RNA-seq libraries were generated with 400 ng of HURR and
HUBR. The scatter plot shows the Log2 ratio of FPKMs of HURR/HBRR graphed
against the Log2 of the ratio of HURR/HBRR derived from qPCR Taqman
probes. See Methods >>
RNA-seq libraries produced with this kit provide an accurate representation of
your sample. A 400-ng HBRR sample with ERCC (External RNA Controls
Consortium) Spike-In RNA Mix (Life Technologies) was treated with this kit,
and the libraries were sequenced, generating 8.5 million paired end reads.
The FPKMs (Fragments Per Kilobase Of Exon Per Million Fragments
Mapped) showed a strong correlation (R2 = 0.9199) and linearity (slope =
0.9988) to the input concentrations of the individual ERCC transcripts,
indicating excellent accuracy and dynamic range.
Dynamic range and linearity of RNA-seq data. Libraries were generated from
Human Brain Reference RNA with ERCC Spike-In RNA Mix2. The above graph
shows strong correlation between the Log2 of input concentrations of individual
ERCC transcripts vs. the Log2 of FPKMs for those transcripts. See Methods >>
Reliable, Accurate Results across Varying Quality of Input RNA
Libraries can be quickly and precisely generated from input RNA of a wide
range of quality. Mouse Liver RNA (Clontech) was chemically sheared to a
RIN (RNA Integrity Number) of either 3 or 7 (3). Samples of each quality were
used at both 100-ng and 1-µg levels and treated with this kit. All of the
libraries generated from these samples had high mapping statistics with 81–
88% mapped reads, 72–77% uniquely mapped reads, with over 12,000 genes
identified. Stranded information of the biological RNA was maintained at high
levels (95–98%), regardless of RIN value.
Sequence Alignment Metrics from RNA of Varying Quality
RNA source
RNA quality (RIN)
Mouse liver
RIN 3
RIN 7
Input amount
100 ng
Number of reads (millions)
1 µg
100 ng
1 µg
1.7 (paired end reads)
Percentage of reads:
rRNA
2%
2%
1%
1%
Mapped to genome
82%
86%
81%
88%
Mapped uniquely to genome
73%
75%
72%
77%
Exonic
55%
53%
54%
54%
Intronic
32%
31%
33%
32%
Intergenic
12%
14%
12%
13%
Number of genes identified
12,079
12,172
12,099
12,212
Percent biological strandedness
95.5%
97.2%
95.6%
98.1%
High-quality libraries across varying levels of RNA quality. Libraries were
generated from Mouse Liver RNA by chemically shearing until it had a RIN of 3 or
7. Sequencing data showed the percentage of reads that mapped to rRNA, exonic
regions, intronic regions, intergenic regions, and the correct strand, as defined by
Picard analysis. See Methods >>
The above data shows that the high reproducibility standards of this kit are
not affected by the quality of input RNA. A comparison of data from the 1-µg
libraries described above shows an extremely high correlation (R = 0.99),
indicating the strong ability of the SMARTer Stranded Total RNA Sample
Prep Kit - HI Mammalian kit to generate reliable, reproducible data across
varying levels of RNA quality.
Reproducibility across RNA quality. A scatterplot illustrates the correlations
between the FPKMs from two libraries generated from 1 µg Mouse Liver RNA
that was chemically sheared until it had a RIN of 3 or 7. See Methods: >>
Summary
The SMARTer Stranded Total RNA Sample Prep Kit - HI Mammalian is a
complete solution for preparing indexed Illumina sequencing libraries from
100 ng–1 µg of mammalian total RNA. This kit incorporates key RiboGone
and SMART technologies, seamlessly blending abundant transcript (rRNA)
removal and strand-specific library generation. SMART technology allows the
addition of Illumina adaptors in a ligation-free manner, significantly reducing
hands-on time while also increasing efficiency. The sequencing data obtained
with this kit maintains high quality and reproducibility across sample
replicates and RNA quality.
Methods and References
Reproducibility across replicates:
Reproducibility across replicates was illustrated with two samples of 100 ng
of Human Universal Reference RNA (Agilent), treated with SMARTer
Stranded Total RNA Sample Prep Kit - HI Mammalian. The two replicates
underwent the same protocol, except Replica #1 used 13 PCR cycles and
Replica #2 used 14 PCR cycles. The libraries were sequenced at 1.3 million
single end reads (1 x 50 bp) on an Illumina MiSeq® instrument, and aligned
with STAR against hg19 with Ensembl annotation. See data >>
MAQC and ERCC analysis:
The quality of sequencing data was demonstrated via MAQC analysis,
dynamic range analysis, and sequence alignment metrics. For this purpose,
RNA-seq libraries were generated from 400 ng of Human Universal Reference
RNA (HURR; Agilent) and Human Brain Reference RNA (HBRR; Ambion) with
ERCC Spike-In RNA, with Mix1 used for HURR and Mix2 used for HBRR.
The libraries were sequenced at 8.5 million paired end reads (2 x 75 bp) on an
Illumina MiSeq instrument, and aligned with STAR against hg19 with
Ensembl annotation. The percentage of reads that mapped to rRNA, exonic
regions, intronic regions, intergenic regions, and the correct strand were
defined by Picard analysis. For MAQC analysis, the Log2 ratio of FPKMs
from HURR/HBRR was graphed against the Log2 of the ratio of HURR/HBRR
derived from qPCR Taqman probes. For the dynamic range study, the
Log2 of input concentrations of individual ERCC transcripts was graphed
against the Log2 of FPKMs for those transcripts in the HBRR sample. See
data >>
Library generation across varying quality of input RNA:
To compare data across RNA quality, Mouse Liver RNA (Clontech) was
chemically sheared until it had a RIN (RNA Integrity Number) of 3 or 7. Either
100 ng or 1 µg of each RIN was used with this kit to generate RNA-seq
libraries. These libraries were sequenced at 1.7 million paired end reads (2 x
25 bp) on an Illumina MiSeq instrument, and aligned with STAR against
mm10 with Ensembl annotation. The percentage of reads that mapped to
rRNA, exonic regions, intronic regions, intergenic regions, and the correct
strand were defined by Picard analysis. Correlations between the FPKMs of
libraries generated from 1 µg of RNA with both RINs were illustrated in a
scatterplot. See data >>
References:
1. Chenchik, A., et al. (1998) RT-PCR Methods for Gene Cloning and
Analysis. (BioTechniques Books, MA), pp. 305–319.
2. Morlan, J.D., et al. (2012) PLOS One 7(8):e42882.
3. Mortazavi, A., et al., (2008) Nature Methods 5(7): 621–628.
http://www.clontech.com/US/Products/cDNA_Synthesis_and_Library_Construction/NGS_Learning_Resources/
Technical_Notes/High_Input_Total_RNA_Seq_Illumina