Methods and data analysis tools for sequencing ribosome protected mRNA fragments

Scott Kuersten1, Lindsay Freeberg, Agnes Radek, Victor Routti1, Sajani Swamy2
1 Epicentre, 5602 Research Park Blvd., Madison, WI, 53719; 2 Illumina, San Diego

Abstract

A translating ribosome protects a discrete footprint of ~30 nt on mRNA templates from nuclease digestion [J.A. Steitz, Nature 224, 957 (1969)]. The technique referred to as Ribosome Profiling determines the positions of active ribosomes on cellular mRNA by deep sequencing these ribosome-protected mRNA fragments [Ingolia, NT et al. (2009) Science 324, 218]. Quantitative measurement of abundance in this complex library is obtained by counting the number of times any given sequence is read, and the rate of protein synthesis can be measured by examining the ribosome density present on each mRNA.

Current methods to isolate polysomes and monosomes rely on several hours of ultracentrifugation using either a sucrose gradient or a cushion. We have developed a method that uses size exclusion chromatography (SEC) in disposable spin columns as an alternative to ultracentrifugation to isolate polysomes and monosomes for ribosome profiling. The size-exclusion method is simpler and faster and does not require any special equipment. In addition, we have explored methods for depleting the samples of unwanted contaminants such as tRNA and rRNA sequences, optimized the nuclease digestion step and improved many aspects of library construction. Using these methods we have streamlined the steps to convert ribosome protected RNA fragments into libraries compatible with sequencing on Illumina's instruments. This effort has resulted in the development of the ARTseq™ Ribosome Profiling Kit.

We present ribosome profiling data generated from mammalian samples using these methods, together with bioinformatics tools for analyzing the deep sequencing data.

Figure 1. ARTseq workflow.
Figure 4. ARTseq data analysis workflow.
Figure 1 diagram labels (as extracted):
Common steps: Lysate; Nuclease digestion; Extract RNA; Ribo-Zero; PAGE purify footprints; Library Prep; Illumina sequencing (1 x 51 cycles); Data Analysis.
Size Exclusion Chromatography (SEC) route: Equilibrate spin column (gravity); Load sample; Extract and purify footprinted RNA.
Ultracentrifugation route: Prepare gradients or cushions (~30 min); Load sample; Sucrose gradient (labels: 10%, 50%, 50%); Ultracentrifugation; Collect fractions (~1-2 hr); Pelleted ribosomes; Extract and purify footprinted RNA.
Other time labels in the figure: 1 min, 1 min, ~1.5 hr, ~4 hr, ~6-8 hr.
Figure 1 caption: General overview of the ARTseq workflow for isolating ribosome protected fragments and preparing them for sequencing.

Figure 4 caption: General overview of the ARTseq workflow for analyzing FASTQ files. A more detailed guide can be found on Epicentre's ARTseq webpage.
Figure 4 workflow step: Generate FASTQ files.

Figure 5. Read length distribution.
After adaptor trimming and alignment, the output file can be used to plot the number of reads at each read length. In this experiment, samples prepared with various amounts of ARTseq Nuclease and either the SEC (left) or sucrose gradient (right) polysome purification are plotted. Note that the total RNA and the lysate not treated with ARTseq Nuclease (0 Units) show very different length profiles from the Nuclease-treated samples. Also, increasing amounts of ARTseq Nuclease result in shorter length distributions. For both polysome preps, 60 units of ARTseq Nuclease gives the most reads centered around the typical footprint length of 28-31 bases.

Table 1. Sequencing and analysis metrics.
Figure 6. Coding vs UTR coverage.
Library                Reads PF     rRNA    tRNA    Abundant (other)   Genome & splice junctions   Unaligned
Column_0U_Nuclease     66,616,168   1.3%    0.5%    9.9%               64.6%                       21.1%
Column_3U_Nuclease     61,033,474   18.5%   45.5%   1.9%               26.1%                       7.6%
Column_10U_Nuclease    64,928,300   15.7%   29.6%   3.5%               42.1%                       8.7%
Column_30U_Nuclease    64,664,480   24.5%   17.5%   5.7%               43.9%                       8.0%
Column_60U_Nuclease    65,928,633   30.0%   7.6%    8.9%               44.2%                       9.0%
Sucrose_0U_Nuclease    67,722,317   2.6%    0.1%    2.2%               78.1%                       12.7%
Sucrose_3U_Nuclease    62,685,631   29.9%   15.4%   1.8%               42.0%                       10.5%
Sucrose_10U_Nuclease   57,603,583   30.5%   12.1%   2.4%               44.6%                       9.7%
Sucrose_30U_Nuclease   52,819,914   38.0%   5.1%    5.2%               41.4%                       9.7%
Sucrose_60U_Nuclease   50,303,823   44.2%   2.7%    8.4%               36.6%                       7.7%
Total RNA              44,595,824   2.5%    3.4%    11.3%              45.0%                       33.1%

The output file from Bowtie contains the number of reads that map to the designated rRNA and tRNA sequences. In this experiment, note that on average the SEC samples have more tRNA sequences than the sucrose gradient samples. However, when 60 units of ARTseq Nuclease are used, the percentages of tRNA sequences are similar. Also note that the percentage of tRNA sequences decreases as more ARTseq Nuclease is added, for both the SEC and sucrose methods. Therefore, one troubleshooting solution for excessive tRNA contamination could be to increase the amount of ARTseq Nuclease during polysome digestion.

Figure 4 workflow steps: Adaptor Trimming; Filter out rRNA, tRNA, other Abundant; Bowtie align to rRNA and tRNA contaminants.

Figure 6 chart labels (as extracted). Samples: Total, Sucrose, SEC. Categories: Coding, UTR, Intron, Intergenic. Percentages: 1.5%, 1.3%, 10.7%, 7.4%, 21.4%, 41.1%, 22.1%, 29.8%, 24.8%, 65.7%, 66.4%, 7.7%.
One of the output files contains data about the alignment of RNA to various functional classes in the genome, such as coding, intronic, UTR and intergenic. These data can be used to calculate the percentage of reads that align to each of these regions. In this example, note that lysate treated with ARTseq Nuclease shows a strong enrichment for coding sequences, an expected result for ribosome protected mRNA.

Figure 7. 5′-3′ Read distribution.
Figure 2.
Visualizing read density across the genome.
Various files can be uploaded into a genome browser for visualizing results. The IGV screenshot shows the GAPDH region for footprinted (SEC) and total RNA. Notice the difference in read density at the 3′ and 5′ UTRs (circled regions) between the total and footprinted samples. This difference is typical, as we do not expect footprint reads in regions that are not translated. In addition, a higher number of reads is expected near the start and stop codon regions (see Figure 8).

Figure 4 workflow steps: TopHat align to genome and splice junctions; Transcript detection and counting using Cufflinks, Cuffdiff and Cuffmerge.

The 5′-3′ read distribution plots (Figure 7) depict the 5′ to 3′ coverage of the 1,000 most highly expressed transcripts as a function of distance from the 5′ end of the transcripts. In this experiment, the Total sample shows a fairly even distribution of reads from 5′ to 3′. In contrast, samples treated with 60 units of ARTseq Nuclease show a distinct 5′ bias and very little coverage near transcript 3′ ends, which is characteristic of ribosome footprints and partly due to the addition of cycloheximide to the polysome buffer to stop translational elongation. Libraries from the SEC and sucrose gradient samples show very similar profiles.

Figure 8. Read density near start and stop codons.
The output files from Picard can be used to specifically examine the areas surrounding the start and stop codons of transcripts. In this experiment, the mean normalized read densities near start (top panels) or stop (bottom panels) codons in the samples were plotted. In both footprinted samples, there is a peak approximately 15 bases after the start codon and another approximately 12 bases before the stop codon. Also, near stop codons there is a dramatic drop in read density after the peak. This represents the ribosome pausing at the stop codon and terminating translation.

Figure 3. Gene Expression.
The output files from Cufflinks can be used for basic gene counting and correlation analysis. In this experiment, the left plot shows a strong correlation (a high Pearson, or R², value) between the SEC and sucrose gradient samples. This suggests that the two methods produce nearly identical results. The middle and right plots show a low correlation between the samples, indicating that mRNA associated with ribosomes is not strongly correlated with Total cellular RNA. This suggests that the fraction of translating mRNA differs from the entire pool of available transcripts in the cell, and highlights the importance of using techniques such as Ribosome Profiling for more accurate gene expression analysis.

Figure 4 workflow step: Picard Metrics.

Conclusions

Library Workflow
We have developed a simplified method to isolate ribosome protected mRNA fragments. The size exclusion chromatography (SEC) method is faster than the sucrose gradient method and does not require special equipment. The SEC polysome preps do contain higher amounts of tRNA at low concentrations of ARTseq Nuclease, but when the nuclease is increased to 60 Units, the amount of tRNA is about the same for the two polysome preps. In all other sequencing and data metrics, the results were very similar between the SEC and sucrose gradient methods.

Data Analysis Workflow
We have developed bioinformatic tools to analyze the ribosome profiling data. The workflow is customizable depending on the interests of the researcher. The output analysis files can be easily visualized for faster and easier interpretation of the data. A more detailed manual for sequencing analysis can be found on Epicentre's ARTseq website.

Contact: Scott Kuersten [email protected]
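
The gene-level correlation analysis described for the Gene Expression figure can be illustrated with a short Python sketch. This is not the ARTseq analysis code. It assumes that per-gene expression values have already been collected into dictionaries keyed by gene name, and that values are compared after a log2 transform with a pseudocount of 1, which is a common convention but an assumption here. The function names, gene names and numbers are all hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def log_expression_r2(sample_a, sample_b, pseudocount=1.0):
    """Return (r, r squared) for genes present in both samples.

    Values are log2-transformed after adding a pseudocount (an assumption
    of this sketch, not something stated on the poster).
    """
    shared = sorted(set(sample_a) & set(sample_b))
    a = [math.log2(sample_a[g] + pseudocount) for g in shared]
    b = [math.log2(sample_b[g] + pseudocount) for g in shared]
    r = pearson_r(a, b)
    return r, r * r


if __name__ == "__main__":
    # Hypothetical expression values, for illustration only.
    sec = {"GENE_A": 120.0, "GENE_B": 15.0, "GENE_C": 980.0, "GENE_D": 3.0}
    sucrose = {"GENE_A": 110.0, "GENE_B": 18.0, "GENE_C": 1010.0, "GENE_D": 2.5}
    r, r2 = log_expression_r2(sec, sucrose)
    print(f"Pearson r = {r:.3f}, R^2 = {r2:.3f}")
```

Restricting the comparison to genes present in both samples keeps the two value lists aligned. Comparing a footprinted sample with a total RNA sample in the same way would give the lower correlation described for the middle and right plots.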