Post Run QC Analysis - Pacific Biosciences

Post Run QC Analysis 100‐339‐200‐01
1. Post Run QC Analysis
1.1 Post Run QC Analysis Notes: Welcome to Pacific Biosciences' Post Run QC Analysis Overview.
This training module will describe the workflow to assess initial key metrics related to the quality of data from a PacBio® RS II analytical run.
1.2 Contents Notes: In this training module, we will present the following:
A review of the basic definitions of Polymerase Reads and Reads of Insert
Post-run RS Dashboard metrics and how they can indicate the quality of the acquired data
We will also refer to additional resources available.
1.3 How to Use This Training Module Notes: This training module is intended to demonstrate how to interpret the primary analysis metrics displayed through the RS Dashboard following a sequencing run. Customers must refer to PacBio® protocols and guidelines for assessing run quality. Note: A run that deviates from the suggested acceptance criteria might still be deemed successful.
1.4 Polymerase Reads & Reads of Insert Notes: Polymerase Reads & Reads of Insert
1.5 Polymerase Reads and Reads of Insert Notes: A SMRTbell™ template is a double-stranded DNA template capped by
hairpin adapter loops at both ends. These hairpin adapters allow the
SMRTbell template to effectively be considered topologically circular from
the perspective of the polymerase. If the polymerase runs to the end of the
insert, it will loop back to the opposite strand of the template molecule and
continue sequencing until the movie acquisition period ends or the
polymerase becomes deactivated.
The term “Polymerase Read” refers to the contiguous sequence of
nucleotides incorporated by the DNA polymerase during a sequencing
reaction, for example while reading around a circular SMRTbell template.
Polymerase Reads are most useful for quality-control monitoring of the
instrument run. Polymerase Read metrics primarily reflect movie length and
other run parameters rather than the insert size distribution. Polymerase Reads
are trimmed to include only the high-quality region. They include adapter
sequences and can also include sequence from multiple passes around a
circular template.
Each Polymerase Read is partitioned to form one or more subreads, which
contain sequence from a single pass of a polymerase on a single strand of an
insert within a SMRTbell template and no adapter sequences. The subreads
contain the full set of quality values and kinetic measurements. Subreads are
useful for applications like de novo assembly, resequencing, and base
modification analysis.
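The partitioning of a Polymerase Read into subreads can be pictured with a small Python sketch. It assumes the adapter positions in an HQ-trimmed Polymerase Read are already known (as half-open intervals) and uses a placeholder string in place of the real adapter sequence; `split_subreads` is a hypothetical helper, not part of any PacBio software.

```python
from typing import List, Tuple


def split_subreads(hq_read: str, adapters: List[Tuple[int, int]]) -> List[str]:
    """Cut an HQ-trimmed Polymerase Read into subreads.

    `adapters` holds half-open [start, end) intervals of adapter sequence,
    assumed to be known already. Everything between adapters is kept.
    """
    subreads = []
    pos = 0
    for start, end in sorted(adapters):
        if start > pos:  # sequence between the previous adapter and this one
            subreads.append(hq_read[pos:start])
        pos = max(pos, end)  # skip over the adapter itself
    if pos < len(hq_read):  # trailing partial pass after the last adapter
        subreads.append(hq_read[pos:])
    return subreads


# Toy read: insert, adapter, insert (other strand), adapter, partial pass.
# "AAAA" is a placeholder, not the real SMRTbell adapter sequence, and
# "ACGTACGT" is its own reverse complement, so both strands read the same.
read = "ACGTACGT" + "AAAA" + "ACGTACGT" + "AAAA" + "ACG"
print(split_subreads(read, [(8, 12), (20, 24)]))
```

Here the two adapter intervals yield two full-pass subreads and one trailing partial pass.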
The ‘Read of Insert’ represents the highest quality single sequence for an
insert, regardless of the number of passes.
A Circular Consensus Sequencing Read (CCS Read) is an example of a special
case where at least two full-pass subreads are collected for an insert. Reads
of Insert give the most accurate estimate of the length of the insert
sequence loaded onto a SMRT® Cell. For long templates, Reads of Insert may
be the same as Polymerase Reads.
If only one and a half subreads were collected from a SMRTbell template,
that information is still combined into a Read of Insert.
Finally, a Read of Insert is also produced when the polymerase makes at
most one pass around the SMRTbell template, including cases where even that
single pass is incomplete.
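The three cases above can be summarized as a decision rule. This is a hypothetical Python sketch, not the logic of PacBio's software: it assumes each subread has already been marked as a full pass (adapter on both sides) or a partial pass, and the returned labels are made up for illustration.

```python
from typing import List


def read_of_insert_kind(adapter_flanked: List[bool]) -> str:
    """Classify the Read of Insert for one ZMW from its subreads.

    adapter_flanked[i] is True when subread i has an adapter on both sides,
    i.e. it is a full pass over the insert. Labels are illustrative only.
    """
    if not adapter_flanked:
        raise ValueError("a Read of Insert needs at least one subread")
    full_passes = sum(adapter_flanked)
    if full_passes >= 2:
        return "CCS read"  # at least two full-pass subreads
    if len(adapter_flanked) >= 2:
        return "combined subreads"  # e.g. one and a half subreads
    return "single subread"  # one pass or less around the template


print(read_of_insert_kind([False, True, True, False]))  # CCS read
print(read_of_insert_kind([True, False]))               # combined subreads
print(read_of_insert_kind([False]))                     # single subread
```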
1.6 RS Dashboard Metrics Assessment Notes: RS Dashboard Metrics Assessment
1.7 Post Run QC Key Metrics from RS Dashboard Notes: RS Dashboard is a web-based tool for monitoring the performance of a single run and performance
trends across multiple runs.
Accessing the RS Dashboard is very simple.
Using the Mozilla® Firefox® or Google® Chrome™ web browser, go to
https://pap01-<instrument name>/Metrics/RSDashboard, where <instrument name> is the name of your instrument.
This will take you to the RS Dashboard page where you can select a specific
run or multiple runs.
Individual run reports include a summary of the run parameters for each
sample, along with statistics for each.
The RS Dashboard also includes per-SMRT® Cell statistics to ensure that the
resulting throughput and quality are as expected.
At the bottom of the report are charts for each SMRT Cell showing
distributions of read lengths and accuracy, as well as graphs helpful for
troubleshooting.
1.8 Post Run QC Key Metrics from RS Dashboard Notes: The key metrics to review after an analytical run are the following:
Productivity & Loading
Polymerase Reads: Length & Quality
Reads of Insert: Length & Quality
1.9 RS Dashboard Metrics
1.10 Productivity & Loading Notes: Productivity & Loading
1.11 Productivity Notes: 'Productivity': a measure of the number (yield) of reads generated from each ZMW (zero-mode waveguide).
P=1 means that the ZMW produced a Polymerase Read.
P=0 means that the ZMW did not produce a read and is presumed to be
lacking a polymerase.
P=2 means "other": the signal collected from the ZMW was not
conducive to efficient base calling, possibly because multiple template-polymerase complexes were bound in the ZMW and/or the background signal was high.
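The per-SMRT Cell breakdown of these three categories is a simple tally. A minimal Python sketch, assuming a hypothetical list of per-ZMW productivity calls (0, 1 or 2); the function name and input format are illustrative and not the RS Dashboard's implementation.

```python
from collections import Counter
from typing import Dict, List


def productivity_summary(calls: List[int]) -> Dict[str, float]:
    """Percentage of ZMWs in each productivity class P0, P1 and P2."""
    if not calls:
        raise ValueError("no ZMWs supplied")
    counts = Counter(calls)
    unknown = set(counts) - {0, 1, 2}
    if unknown:
        raise ValueError(f"unexpected productivity values: {sorted(unknown)}")
    total = len(calls)
    return {f"P{p}": 100.0 * counts.get(p, 0) / total for p in (0, 1, 2)}


# Hypothetical calls for ten ZMWs.
calls = [1, 0, 0, 1, 2, 1, 0, 1, 0, 0]
print(productivity_summary(calls))
```

For these ten made-up calls the split is 50% P0, 40% P1 and 10% P2.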
1.12 Loading Complexes into ZMWs Follows a Poisson Distribution Notes: This figure shows the relationship between overall ZMW loading and the
number of singly loaded ZMWs.
A critical step in the single-molecule sequencing workflow is complex
immobilization, in which template-polymerase complexes are affixed to the
bottom of ZMWs in preparation for sequencing.
Immobilization is either passive, relying on diffusion, or MagBead-based.
Because each ZMW captures complexes independently and at random, complex
loading follows a Poisson distribution, which predicts that at most
approximately 37% (or about 55,000) of the ZMWs can be loaded with a single
polymerase.
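The 37% figure follows directly from the Poisson model. A short derivation, assuming complexes land in ZMWs independently with mean occupancy λ per ZMW and, for the ZMW count, an assumed 150,000 ZMWs per SMRT Cell (consistent with the 37% ≈ 55,000 figure above):

```latex
\begin{aligned}
P(k) &= \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad P(1) = \lambda e^{-\lambda} \\
\frac{d}{d\lambda}\left(\lambda e^{-\lambda}\right) &= (1-\lambda)\, e^{-\lambda} = 0 \;\Rightarrow\; \lambda = 1 \\
P(1)\big|_{\lambda=1} &= e^{-1} \approx 0.368, \qquad 0.368 \times 150{,}000 \approx 55{,}000 \text{ singly loaded ZMWs} \\
P(k \ge 2)\big|_{\lambda=1} &= 1 - e^{-1} - e^{-1} \approx 0.264
\end{aligned}
```

Pushing λ above 1 lowers P(1) and raises P(k ≥ 2), which is the overloading effect.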
Overloading will result in a reduction of the number of ZMWs loaded with
single polymerases, and an increase in multiply loaded ZMWs.
1.13 Optimizing Loading Notes: Optimizing Loading

Overloading may increase the output (Mb) per SMRT® Cell, but it also increases
the number of multiply loaded ZMWs.

High Quality (HQ) region filtering can "rescue" some multiply loaded
ZMWs, increasing the total number of reads per SMRT Cell.

Reads that have undergone HQ filtering have:
  Shortened read lengths
  Lower accuracy compared to reads from singly loaded ZMWs
  Evidence of increased chimeras

These reads are less useful for de novo assembly.

Loading can be optimized through titration.
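To see why titration matters, here is a minimal Python sketch of the same Poisson model evaluated over a titration series. The function names, the example λ values and the 150,000-ZMW count are assumptions for illustration. `estimate_lambda_from_empty` is an idealization that treats every empty ZMW as truly unloaded; real P0 counts will not satisfy that exactly.

```python
import math

N_ZMWS = 150_000  # assumption: ZMWs per SMRT Cell (consistent with 37% ~ 55,000)


def loading_fractions(lam: float) -> tuple:
    """Poisson fractions of empty, singly and multiply loaded ZMWs at mean occupancy lam."""
    if lam < 0:
        raise ValueError("mean occupancy must be non-negative")
    p0 = math.exp(-lam)
    p1 = lam * math.exp(-lam)
    return p0, p1, 1.0 - p0 - p1


def estimate_lambda_from_empty(empty_fraction: float) -> float:
    """Invert P(0) = exp(-lam). Idealized: treats every empty ZMW as truly unloaded."""
    if not 0.0 < empty_fraction <= 1.0:
        raise ValueError("empty_fraction must be in (0, 1]")
    return -math.log(empty_fraction)


# A titration series: raising the mean occupancy past 1 trades single loads for multiple loads.
for lam in (0.5, 1.0, 1.5, 2.0):
    p0, p1, p_multi = loading_fractions(lam)
    print(f"lambda={lam:.1f}  single={p1:.3f} (~{p1 * N_ZMWS:,.0f} ZMWs)  multiple={p_multi:.3f}")
```

Under these assumptions the single-load fraction peaks at λ = 1 (about 36.8%, roughly 55,000 ZMWs), while the multiply loaded fraction keeps rising as λ increases.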
1.14 Large‐Insert Example Notes: Let's look at a typical run for a large-insert library using the following conditions:
20 kb size-selected library
P5-C3 chemistry
180-minute movies
MagBead loading
Stage Start
1.15 Productivity & Loading Notes: Use the loading evaluation plots to evaluate loading performance.
The left plot gives a sense of how much of the ZMW activity is low-quality sequencing.
It plots unfiltered reads against Polymerase Reads.
Unfiltered reads consist of all raw called bases, including both low- and high-quality bases.
Polymerase Reads are trimmed reads: low-quality bases are filtered out, leaving only the bases in the high-quality region.
A complete overlay of unfiltered reads and Polymerase Reads means most of the bases are high quality and little trimming was required.
A large discrepancy between unfiltered reads and Polymerase Reads indicates noisy reads, caused by multiply loaded zero-mode waveguides, that required significant trimming or filtering.
The plot on the right is also a good indicator of overloading.
It shows the fraction of each read's called bases that fall inside the high-quality region; a fraction of 1 means all called bases are in the high-quality region.
As the counts at low fractions increase (as in the case of overloading), more and more reads have only a small proportion of their called bases inside high-quality regions.
In this example, there is very little discrepancy between the unfiltered reads and Polymerase Reads, which means loading is optimal.
Additionally, most of the called bases are located in the HQ region.
Together, these observations suggest that overloading is not an issue.
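Both loading plots reduce to comparing unfiltered and HQ-region lengths per ZMW. A minimal Python sketch, assuming hypothetical (unfiltered length, HQ-region length) pairs per ZMW; the 0.5 cutoff in `share_below` is an arbitrary illustration rather than a PacBio acceptance criterion, and `bases_retained` is only a rough single-number proxy for how closely the two curves in the left plot overlay.

```python
from typing import List, Tuple


def hq_fractions(reads: List[Tuple[int, int]]) -> List[float]:
    """Per-read fraction of called bases that lie in the HQ region.

    Each tuple is (unfiltered_length, hq_region_length) for one ZMW.
    """
    fractions = []
    for raw, hq in reads:
        if raw <= 0 or not 0 <= hq <= raw:
            raise ValueError(f"invalid lengths: {(raw, hq)}")
        fractions.append(hq / raw)
    return fractions


def share_below(fractions: List[float], threshold: float = 0.5) -> float:
    """Share of reads whose HQ fraction falls below `threshold` (an arbitrary cutoff)."""
    return sum(f < threshold for f in fractions) / len(fractions)


def bases_retained(reads: List[Tuple[int, int]]) -> float:
    """Total HQ bases over total unfiltered bases."""
    return sum(hq for _, hq in reads) / sum(raw for raw, _ in reads)


# Hypothetical ZMWs: three mostly clean reads and one noisy, heavily trimmed read.
reads = [(10000, 9800), (8000, 8000), (12000, 3000), (5000, 4900)]
fractions = hq_fractions(reads)
print(fractions, share_below(fractions), round(bases_retained(reads), 3))
```

A growing `share_below` value corresponds to the rising low-fraction counts described for overloaded SMRT Cells.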
1.16 Polymerase Reads and Reads of Insert Notes: Polymerase Reads and Reads of Insert: Length and Quality
1.17 Polymerase Reads & Reads of Insert: Read Length Notes: In the example shown here, the average Polymerase Read length is
approximately 9 kb. This is the average of the trimmed Polymerase Reads
(HQ region only), which include adapter sequences.
The average read length for the Reads of Insert is approximately 8.3 kb.
This is the contiguous, high-quality sequence with the adapter sequences
removed.
For large-insert SMRTbell™ templates, as in the case of 20-kb size-selected
libraries, the read lengths of the Polymerase Reads and Reads of Insert
should be nearly equal if the SMRT Cell is loaded optimally (little to no
trimming applied).
Overloading and short-insert contamination will result in shorter Polymerase
Reads and Reads of Insert.
For more information on this subject, please see the PacBio® RS II brochure:
http://www.pacificbiosciences.com/products/
The brochure shows representative data from a 20-kb size-selected E. coli
library sequenced with P5-C3 chemistry and 180-minute movies, with average
Polymerase Read lengths of approximately 8.5 kb.
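The length comparison above amounts to two averages and their ratio. A minimal Python sketch, assuming hypothetical per-read lengths chosen to average about 9 kb and 8.3 kb as in the example; the source gives no numeric cutoff for "nearly equal", so the sketch reports the ratio without applying one.

```python
from statistics import mean
from typing import Dict, Sequence


def read_length_summary(polymerase_lengths: Sequence[int],
                        roi_lengths: Sequence[int]) -> Dict[str, float]:
    """Mean Polymerase Read and Read of Insert lengths (kb) and their ratio."""
    pr = mean(polymerase_lengths)
    roi = mean(roi_lengths)
    return {
        "mean_polymerase_kb": pr / 1000,
        "mean_roi_kb": roi / 1000,
        "roi_to_polymerase": roi / pr,  # close to 1 for an optimally loaded large-insert library
    }


# Hypothetical per-read lengths (bp).
summary = read_length_summary([9500, 8000, 9500], [8800, 7400, 8700])
print(summary)
```

For a short-insert library the same ratio would be far below 1, because each Read of Insert is limited to roughly the insert size.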
1.18 Polymerase Reads and Reads of Insert: Quality Notes: Polymerase Reads and Reads of Insert: Quality
Polymerase read quality is a trained prediction of a read's mapped accuracy
based on its pulse and base file characteristics.
The read quality depends on the chemistry used. In this example, the
read quality of the Polymerase Reads and Reads of Insert is the same.
1.19 Short‐Insert Example Notes: In the next example, a 1.6-kb amplicon library was run with P5-C3
chemistry, 180-minute movies, and MagBead loading.
1.20 Short‐Insert Example: Length Notes: As demonstrated in this example, the read lengths of the Reads of Insert
closely match the expected insert size of the sample (1.6 kb).
For short-insert libraries, the read lengths and read quality of Polymerase
Reads and Reads of Insert differ.
The Polymerase Reads are longer than the Reads of Insert, by an amount that
depends on the movie length.
This is because a Read of Insert is the consensus sequence of all subreads
from the Polymerase Read.
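A rough way to relate these lengths is to count how many times the polymerase could have traversed the insert. A minimal Python sketch; `approx_passes` is a hypothetical helper, the adapter length is an assumed parameter that defaults to ignoring adapter sequence, and the 8-kb Polymerase Read is a made-up value, not a number from this run.

```python
def approx_passes(polymerase_read_length: float, insert_length: float,
                  adapter_length: float = 0.0) -> float:
    """Rough number of passes a polymerase makes around a SMRTbell template.

    adapter_length is an assumption; the default ignores adapter sequence.
    """
    if insert_length <= 0 or adapter_length < 0:
        raise ValueError("insert_length must be positive and adapter_length non-negative")
    return polymerase_read_length / (insert_length + adapter_length)


# Hypothetical 8-kb Polymerase Read over the 1.6-kb amplicon from this example.
print(approx_passes(8000, 1600))   # about 5 passes
# The same read over a 20-kb insert does not complete even one pass.
print(approx_passes(8000, 20000))
```

Many passes per insert is what lets the consensus Read of Insert be both much shorter and much more accurate than the Polymerase Read.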
1.21 Short‐Insert Example: Quality Notes: For short-insert libraries, the read lengths of the Polymerase Reads and
Reads of Insert differ.
As demonstrated in this example, the read lengths of the Reads of Insert
closely match the expected insert size of the sample (1.6 kb).
The read quality will also differ, with the Reads of Insert having a
much higher read quality than the Polymerase Reads.
This is achieved through the circular nature of the SMRTbell™ DNA template,
which allows the polymerase to sequence the same base of the same DNA
molecule multiple times.
The number of passes, and therefore the gain in quality, depends on the insert size.
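Why repeated passes raise Read of Insert quality can be illustrated with a deliberately simplified Python sketch: a per-position majority vote over passes that are assumed to be already aligned, equal in length and in the same orientation, with substitution errors only. This is a swapped-in toy technique, not PacBio's consensus algorithm; real reads also contain insertions and deletions, which this toy ignores. The sequences are hypothetical.

```python
from collections import Counter
from typing import List


def majority_consensus(passes: List[str]) -> str:
    """Toy per-position majority vote over passes that are already aligned and oriented.

    Ties go to the base seen first in that column (Counter.most_common ordering).
    """
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("toy model requires equal-length, pre-aligned passes")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


def identity(read: str, reference: str) -> float:
    """Fraction of positions where read matches reference (equal lengths assumed)."""
    return sum(a == b for a, b in zip(read, reference)) / len(reference)


truth = "ACGTTGCAAC"
# Five hypothetical passes, each with at most one substitution at a different position.
passes = ["ACGTTGCAAC", "ACCTTGCAAC", "ACGTTGCATC", "ACGTAGCAAC", "ACGTTGCAAC"]
consensus = majority_consensus(passes)
print([identity(p, truth) for p in passes], identity(consensus, truth))
```

Because the errors fall at different positions in different passes, the vote recovers the original sequence even though three of the five passes are only 90% accurate.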
1.22 Additional Resources Notes: Additional Resources
1.23 Additional Resources Notes: Additional resources are available from the PacBio website.
1.24 Summary Notes: In summary, we have covered the post-run QC analysis process.
In particular, we have described the basic definitions of Polymerase Reads
and Reads of Insert.
We demonstrated the assessment of the RS Dashboard metrics, including:
Productivity & Loading
Polymerase Reads: Length & Quality, and
Reads of Insert: Length & Quality.
And finally, we have listed additional technical resources available from the
PacBio portal.
1.25 Thank You Notes: Thank you for your participation. For more information, please contact your local PacBio Field Applications Scientist or PacBio account representative. www.pacificbiosciences.com