Methods and data analysis tools for sequencing
ribosome protected mRNA fragments
Scott Kuersten1, Lindsay Freeberg, Agnes Radek, Victor Routti1, Sajani Swamy2
1Epicentre, 5602 Research Park Blvd., Madison, WI, 53719; 2Illumina, San Diego
Abstract
A translating ribosome protects a discrete footprint of ~30 nt on mRNA templates from nuclease digestion [J.A. Steitz, Nature 224, 957 (1969)]. The technique referred to as ribosome profiling determines the positions of active ribosomes on cellular mRNA by deep sequencing these ribosome-protected mRNA fragments [Ingolia, NT et al. (2009) Science 324, 218]. Quantitative measurement of abundance in this complex library is obtained by counting the number of times any given sequence is read, and a rate of protein synthesis can be measured by examining the ribosome density present on each mRNA.
Current methods to isolate polysomes and monosomes rely on several hours of ultracentrifugation using either a sucrose gradient or a cushion. We have developed a method that uses size exclusion chromatography (SEC) in disposable spin columns as an alternative to ultracentrifugation to isolate polysomes and monosomes for ribosome profiling. The size-exclusion method is simpler and faster and does not require any special equipment. In addition, we have explored methods for depleting the samples of unwanted contaminants such as tRNA and rRNA sequences, optimized the nuclease digestion step, and vastly improved many aspects of library construction. Using these methods, we have streamlined the steps to convert ribosome-protected RNA fragments into libraries compatible with sequencing on Illumina’s instruments. This effort has resulted in the development of the ARTseq™ Ribosome Profiling Kit.
We present ribosome profiling data from mammalian samples generated using these methods, together with bioinformatics tools for analyzing the deep sequencing data.
Figure 1. ARTseq workflow.
General overview of the ARTseq workflow for isolating ribosome-protected fragments and preparing them for sequencing.
[Workflow diagram. Main path: lysate → nuclease digestion → polysome isolation by one of two routes → extract RNA → Ribo-Zero → PAGE purify footprints → library prep → Illumina sequencing (1 x 51 cycles) → generate FASTQ files → data analysis.
Size exclusion chromatography (SEC) route: equilibrate spin column (gravity, 1 min); load sample (1 min); extract and purify footprinted RNA; ~1.5 hr in total.
Sucrose gradient or cushion route: prepare gradients or cushions (~30 min); load sample; ultracentrifugation; collect fractions (~1-2 hr) or recover pelleted ribosomes; extract and purify footprinted RNA; ~6-8 hr in total.
Other diagram labels: 10% and 50% sucrose; ~4 hr.]
Figure 4. ARTseq data analysis workflow.
General overview of the ARTseq workflow for analyzing FASTQ files. A more detailed guide can be found on Epicentre’s ARTseq webpage.
Figure 5. Read length distribution.
After adaptor trimming and alignment, the output file can be used to plot the number of reads at each read length. In this experiment, samples treated with various amounts of ARTseq Nuclease and purified by either SEC (left) or sucrose gradient (right) are plotted. Note that total RNA and lysate not treated with ARTseq Nuclease (0 units) show very different length profiles from nuclease-treated samples. Also, increasing amounts of ARTseq Nuclease result in shorter length distributions. For both polysome preps, 60 units of ARTseq Nuclease gives the most reads centered around the typical footprint length of 28-31 bases.
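The read length distribution described for Figure 5 can be tallied directly from trimmed reads. The following is a minimal Python sketch, not part of the ARTseq tools: the function names and the tiny in-memory FASTQ example are hypothetical, and the 28-31 base window is taken from the footprint length quoted in the text.

```python
import io
from collections import Counter

def fastq_sequences(handle):
    """Yield the sequence line (second line) of each 4-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()

def length_distribution(reads):
    """Return a Counter mapping read length to number of reads."""
    return Counter(len(seq) for seq in reads)

def footprint_fraction(dist, lo=28, hi=31):
    """Fraction of reads whose length lies in the inclusive window [lo, hi]."""
    total = sum(dist.values())
    if total == 0:
        return 0.0
    in_window = sum(n for length, n in dist.items() if lo <= length <= hi)
    return in_window / total

# Hypothetical trimmed reads, for illustration only (not data from this poster).
example_lengths = [29, 30, 30, 22, 35]
example_fastq = io.StringIO(
    "".join(f"@read{i}\n{'A' * n}\n+\n{'I' * n}\n" for i, n in enumerate(example_lengths))
)

dist = length_distribution(fastq_sequences(example_fastq))
frac = footprint_fraction(dist)
print(sorted(dist.items()), frac)
```

Plotting `dist` for each nuclease amount gives the kind of per-sample length profile compared in Figure 5, and `frac` gives a single number for how tightly reads cluster around the footprint length.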
Table 1. Sequencing and analysis metrics.
Library                Reads PF     rRNA    tRNA    Abundant (other)   Genome & splice junctions   Unaligned
Column_0U_Nuclease     66,616,168   1.3%    0.5%    9.9%               64.6%                       21.1%
Column_3U_Nuclease     61,033,474   18.5%   45.5%   1.9%               26.1%                       7.6%
Column_10U_Nuclease    64,928,300   15.7%   29.6%   3.5%               42.1%                       8.7%
Column_30U_Nuclease    64,664,480   24.5%   17.5%   5.7%               43.9%                       8.0%
Column_60U_Nuclease    65,928,633   30.0%   7.6%    8.9%               44.2%                       9.0%
Sucrose_0U_Nuclease    67,722,317   2.6%    0.1%    2.2%               78.1%                       12.7%
Sucrose_3U_Nuclease    62,685,631   29.9%   15.4%   1.8%               42.0%                       10.5%
Sucrose_10U_Nuclease   57,603,583   30.5%   12.1%   2.4%               44.6%                       9.7%
Sucrose_30U_Nuclease   52,819,914   38.0%   5.1%    5.2%               41.4%                       9.7%
Sucrose_60U_Nuclease   50,303,823   44.2%   2.7%    8.4%               36.6%                       7.7%
Total RNA              44,595,824   2.5%    3.4%    11.3%              45.0%                       33.1%
[Figure 4 workflow step: Adaptor trimming]
[Figure 4 workflow step: Filter out rRNA, tRNA, and other abundant sequences]
Figure 6. Coding vs UTR coverage.
[Charts for the Total, Sucrose, and SEC samples showing the percentage of reads aligned to Coding, UTR, Intron, and Intergenic regions. Percentage labels as extracted, with panel and category assignments not recoverable: 1.5%, 1.3%, 10.7%, 7.4%, 21.4%, 41.1%, 22.1%, 29.8%, 24.8%, 65.7%, 66.4%, 7.7%.]
One of the output files contains data about the alignment of reads to various functional classes in the genome, such as coding, intronic, UTR, and intergenic regions. These data can be used to calculate the percentage of reads that align to each class. In this example, note that lysates treated with ARTseq Nuclease show a strong enrichment for coding sequences, an expected result for ribosome-protected mRNA.
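The percentage calculation described for Figure 6 is a simple normalization of per-class counts. A minimal Python sketch follows; the class names, the dictionary layout, and the counts are hypothetical and do not reproduce the format of any particular output file.

```python
def class_percentages(counts):
    """Convert {functional_class: aligned_count} into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no aligned reads or bases to summarize")
    return {name: 100.0 * n / total for name, n in counts.items()}

# Hypothetical counts for illustration only (not data from this poster).
example_counts = {"coding": 660, "utr": 215, "intron": 105, "intergenic": 20}
pct = class_percentages(example_counts)

for name, value in pct.items():
    print(f"{name:<11}{value:5.1f}%")
```

Comparing the `coding` percentage between a nuclease-treated lysate and total RNA is one way to quantify the enrichment for coding sequence that the figure illustrates.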
Figure 7. 5′-3′ Read distribution.
[Figure 4 workflow step: Bowtie alignment to rRNA and tRNA contaminants]
The output file from Bowtie contains the number of reads that map to the designated rRNA and tRNA sequences (Table 1). In this experiment, note that on average the SEC samples contain more tRNA sequences than the sucrose gradient samples. However, when 60 units of ARTseq Nuclease are used, the percentages of tRNA sequences are similar. Also note that the percentage of tRNA sequences decreases as more ARTseq Nuclease is added in both the SEC and sucrose methods. Therefore, one troubleshooting option for excessive tRNA contamination is to increase the amount of ARTseq Nuclease used during polysome digestion.
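The tRNA trend can be checked against the tRNA column of Table 1. The Python sketch below is illustrative only: the dictionary layout and helper name are assumptions, while the percentages are copied from Table 1. The untreated (0 unit) samples are left out because they are not footprinted samples.

```python
# tRNA percentage of reads PF, keyed by ARTseq Nuclease units (from Table 1).
trna_pct = {
    "SEC": {3: 45.5, 10: 29.6, 30: 17.5, 60: 7.6},
    "Sucrose": {3: 15.4, 10: 12.1, 30: 5.1, 60: 2.7},
}

def is_strictly_decreasing(values):
    """True if each value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))

trend = {}
for method, by_units in trna_pct.items():
    series = [by_units[u] for u in sorted(by_units)]
    trend[method] = is_strictly_decreasing(series)

# Difference in tRNA percentage (SEC minus sucrose) at each nuclease amount.
gap = {
    u: round(trna_pct["SEC"][u] - trna_pct["Sucrose"][u], 1)
    for u in sorted(trna_pct["SEC"])
}
print(trend, gap)
```

The gap between the two preps narrows from about 30 percentage points at 3 units to about 5 at 60 units, which is the basis for the troubleshooting suggestion.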
Figure 2. Visualizing read density across the genome.
[Figure 4 workflow step: TopHat alignment to genome and splice junctions]
The 5′-3′ read distribution plots (Figure 7) depict coverage across the 1000 most highly expressed transcripts as a function of distance from the 5′ end of the transcript. In this experiment, the Total sample shows a fairly even distribution of reads from 5′ to 3′. In contrast, samples treated with 60 units of ARTseq Nuclease show a distinct 5′ bias and very little coverage near transcript 3′ ends, which is characteristic of ribosome footprints and is partly due to the addition of cycloheximide to the polysome buffer to stop translational elongation. Libraries from both SEC and sucrose gradient samples show very similar profiles.
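A 5′-to-3′ coverage profile of the kind shown in Figure 7 can be approximated by binning read positions along each transcript and averaging across transcripts. The Python sketch below is a simplified illustration under stated assumptions: it bins by relative position rather than absolute distance from the 5′ end, and the input layout and example values are hypothetical.

```python
def relative_coverage(transcripts, n_bins=10):
    """Average, over transcripts, of the fraction of reads in each 5'-to-3' bin.

    transcripts: list of (transcript_length, [0-based read start positions]).
    Transcripts with no reads are skipped so they do not dilute the profile.
    """
    profile = [0.0] * n_bins
    used = 0
    for length, starts in transcripts:
        if length <= 0 or not starts:
            continue
        bins = [0] * n_bins
        for pos in starts:
            # Map the position to a bin; clamp the last base into the final bin.
            idx = min(pos * n_bins // length, n_bins - 1)
            bins[idx] += 1
        for i in range(n_bins):
            profile[i] += bins[i] / len(starts)
        used += 1
    return [p / used for p in profile] if used else profile

# Hypothetical read positions showing a 5' bias (illustration only).
example = [(100, [0, 5, 10, 55]), (200, [10, 20, 150, 30])]
halves = relative_coverage(example, n_bins=2)
print(halves)
```

Normalizing within each transcript before averaging keeps a few very highly expressed transcripts from dominating the profile.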
[Figure 4 workflow step: Transcript detection and counting using Cufflinks, Cuffdiff and Cuffmerge]
Various output files can be loaded into a genome browser to visualize results (Figure 2). The IGV screenshot shows the GAPDH region for footprinted (SEC) and total RNA samples. Notice the difference in read density at the 3′ and 5′ UTRs (the circled regions) between the total and footprinted samples. This difference is typical, as we do not expect footprint reads in regions that are not translated. In addition, a higher number of reads is expected near the start and stop codon regions (see Figure 8).
Figure 8. Read density near start and stop codons.
The output files from Picard can be used to specifically examine the regions surrounding the start and stop codons of transcripts. In this experiment, the mean normalized read densities near start (top panels) or stop (bottom panels) codons were plotted. In both footprinted samples, there is a peak approximately 15 bases after the start codon and a peak approximately 12 bases before the stop codon. Also, near stop codons there is a dramatic drop-off in read density after the peak. This represents the ribosome pausing at the stop codon and terminating translation.
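The mean normalized read density near start codons (Figure 8) can be sketched as a small metagene calculation: each transcript's counts in a window around the start codon are divided by that window's mean, then averaged by offset. The Python code below is a hypothetical illustration, not the Picard-based procedure used for the poster; the input layout, window sizes, and counts are assumptions.

```python
def start_codon_profile(transcripts, upstream=3, downstream=6):
    """Mean normalized read density by offset from the start codon.

    transcripts: list of (start_codon_index, per_base_read_counts).
    Transcripts whose window runs off either end, or has no reads, are skipped.
    """
    offsets = list(range(-upstream, downstream + 1))
    sums = [0.0] * len(offsets)
    used = 0
    for start, counts in transcripts:
        lo, hi = start - upstream, start + downstream
        if lo < 0 or hi >= len(counts):
            continue
        window = counts[lo:hi + 1]
        mean = sum(window) / len(window)
        if mean == 0:
            continue
        for i, c in enumerate(window):
            sums[i] += c / mean
        used += 1
    if used == 0:
        return {}
    return {o: s / used for o, s in zip(offsets, sums)}

# Hypothetical per-base counts with a peak 2 bases after the start codon.
example = [
    (3, [0, 0, 0, 1, 1, 4, 1, 1, 1, 1]),
    (3, [0, 0, 0, 2, 2, 8, 2, 2, 2, 2]),
]
profile = start_codon_profile(example)
peak_offset = max(profile, key=profile.get)
print(peak_offset, profile[peak_offset])
```

The same function applies to stop codons by passing the stop codon index instead; with real footprint data and a wider window, the peak offsets would be compared with the approximately 15 and 12 base positions described in the text.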
Figure 3. Gene Expression.
[Figure 4 workflow step: Picard metrics]
Conclusions
Library Workflow
• We have developed a simplified method to isolate ribosome-protected mRNA fragments.
• The size exclusion chromatography (SEC) method is faster than the sucrose gradient method and does not require special equipment.
• The SEC polysome preps contain higher amounts of tRNA at low concentrations of ARTseq Nuclease, but when the nuclease is increased to 60 units, the amount of tRNA is about the same for the two polysome preps.
• In all other sequencing and data metrics, the results were very similar between the SEC and sucrose gradient methods.
The output files from Cufflinks can be used for basic gene counting and correlation analysis (Figure 3). In this experiment, the left plot shows a strong correlation (a high Pearson R² value) between the SEC and sucrose gradient samples. This suggests that the two methods produce nearly identical results. The middle and right plots show a low correlation between the samples, indicating that the abundance of ribosome-associated mRNA is not strongly correlated with that of total cellular RNA. This suggests that the fraction of translating mRNA differs from the entire pool of available transcripts in the cell and highlights the importance of using techniques such as ribosome profiling for more accurate gene expression analysis.
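The correlation analysis described for Figure 3 can be sketched as a Pearson correlation on log-transformed per-gene expression values. The Python code below is a minimal illustration: the expression values are hypothetical, and the log2 transform with a pseudocount of 1 is an assumption rather than the poster's stated procedure.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def log_expr(values, pseudocount=1.0):
    """log2-transform expression values; the pseudocount avoids log(0)."""
    return [math.log2(v + pseudocount) for v in values]

# Hypothetical per-gene expression values for two preps (illustration only).
sec = [10.0, 200.0, 35.0, 0.0, 80.0]
sucrose = [12.0, 180.0, 40.0, 1.0, 75.0]

r = pearson_r(log_expr(sec), log_expr(sucrose))
r_squared = r * r
print(round(r, 3), round(r_squared, 3))
```

Running the same calculation for a footprinted sample against total RNA would be expected to give a lower value, which is the contrast drawn between the left plot and the middle and right plots.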
Contact
Scott Kuersten
[email protected]
Data Analysis Workflow
• We have developed bioinformatic tools to analyze the ribosome profiling data.
• The workflow is customizable depending on the interests of the researcher.
• The output analysis files can be easily visualized for faster and easier interpretation of the data.
• A more detailed manual for sequencing analysis can be found on Epicentre’s ARTseq website.