454 Life Sciences Technical Manual

Sequencing Solutions Technical Note
August 2013
WHAT IS…
Flow Pattern B
1. OVERVIEW
The concept of a ‘flow pattern’ was introduced in software version 2.8 of the 454
Sequencing System.
Prior to August 2012, the flow list used during a sequencing run was a cyclic
pattern of ‘TACG’, repeated for a defined number of cycles.
Beginning with software version 2.8, a new acyclic flow
list is supported for use with shotgun sequencing using
the GS FLX Titanium Sequencing Kit XL+. With the
release of version 2.9, this was extended to allow
sequencing of long amplicons from 550 to 800 bp, and
up to 1,100 bp when used with custom modified
pipeline settings.
Applications: Shotgun Sequencing; Long Amplicon Sequencing
Products: GS FLX+ System; GS FLX Titanium Sequencing Kit XL+

When the XL+ sequencing kit is used, the GS Sequencer Instrument Procedure Wizard
includes a new screen titled ‘Choose the flow pattern’.

Flow pattern A – a cyclic ‘TACG’ flow pattern with 1,600 nucleotide flows that
generates results similar to those obtained with a 400 cycle run in version 2.6
software.

Flow pattern B – an acyclic flow pattern with 1,779 nucleotide flows that is
expected to increase read length after all signal processing filters have been applied.

Details about the design and function of flow pattern B
are described below.
For life science research only. Not for use in diagnostic procedures.
2. EXPECTED RESULTS
Choice of Sequencing Kit and Flow Pattern
The XL+ sequencing kit with flow pattern B is recommended as a starting point for sequencing both shotgun and
long amplicon reads. For most long read samples tested to date, flow pattern B has yielded better results than flow
pattern A. Other sample types with shorter library read distributions (such as cDNA or amplicons shorter than 550
bases) can also use this combination of kit and flow pattern, but may not show benefits as large as those observed for
long read length samples (Table 1).
Sample Type                  Recommended      Recommended                Expected Read Length
                             Sequencing Kit   Flow Pattern
Genomic shotgun              XL+              Flow pattern B (Acyclic)   Longer than both XLR70 and
                                                                         XL+ with flow pattern A
Long Amplicon (>550 bp)      XL+              Flow pattern B (Acyclic)   Longer than XLR70
Transcriptomic (cDNA)        XL+              Flow pattern A (Cyclic)    XLR70 read lengths
                                              Flow pattern B (Acyclic)
Standard Amplicon (<550 bp)  XLR70            (Cyclic)                   XLR70 read lengths
Paired End                   XLR70            (Cyclic)                   XLR70 read lengths
Table 1: Choice of sequencing kit and flow pattern. The XL+ kit with flow pattern B can be used with most sample types, but is
not supported for standard amplicon or paired end sequencing. The XLR70 kit automatically uses the equivalent of the cyclic
flow pattern A, but with fewer cycles (200 vs. 400).
Figure 1 compares read length of passed filter reads across genomic shotgun runs on multiple instruments for four
reference genomes with diverse sequence compositions. The benefit of flow pattern B varies across genomes.
Figure 1: Passed filter read length for reference genomes. Shotgun libraries were prepared from four reference genomes, and
sequenced repeatedly using the XLR70 kit (200 cycles) or the XL+ kit (400 cycles/flow pattern A or the acyclic flow
pattern B). The average (panel A) and modal (panel B) read lengths after signal processing are compared. The fifth set of bars
is an average of the four values obtained for the individual genomes. Error bars represent the standard error of the mean
(SEM). These results are for illustrative purposes only, and should not be interpreted as a guarantee of performance.
There are two major sources of variability in read length metrics: variation from genome to genome and variation
from run to run with the same genome. Read lengths vary depending on the genome composition (e.g. variations in
G/C content and homopolymer length). However, all four reference genomes showed increased average read length
with the acyclic flow pattern B, relative to the cyclic flow pattern A. Similarly, amplicon sequencing accuracy and
read length can vary based on sequence content, but flow pattern B has consistently outperformed flow pattern A for
both shotgun and amplicon sequencing in the samples tested.
3. STRUCTURE OF FLOW PATTERN B
The flow list defined by flow pattern B is considerably more complex than the one defined by the cyclic flow pattern.
The first 12 nucleotide flows remain unchanged from the cyclic flow pattern based on ‘TACG’. These 12 flows are
sufficient to sequence the library or control key sequences at the beginning of each read.
The remaining nucleotide flows are divided into blocks of 33 flows that fall into four categories (designated by four
colors in Figure 2). Each block has 8 flows for three of the nucleotides, with 9 flows for the fourth nucleotide. In
other words, each block has one ‘extra’ flow for one of the four nucleotides (T, A, C, and G for green, magenta, blue,
and red, respectively). Each block type is used either 13 or 14 times, with the last green block truncated to 18
nucleotide flows. Overall, each nucleotide is represented (nearly) equally, with 444 T flows, 445 A flows, 446 C flows,
and 444 G flows, for a total of 1,779 nucleotide flows.
TACGTACGTACG (Key flows)
ATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGC
AGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGC
ATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGC
ATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGC
ATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGC
ATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGC
ATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGC
ATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGC
ATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGC
AGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGC
AGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGC
ATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGC
ATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGC
AGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGC
ATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGC
ATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGC
ATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGC
ATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGC
ATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGC
ATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGC
ATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGC
AGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGC
AGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGC
ATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGC
AGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGC
ATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGC
AGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGC
Figure 2: Flow pattern B. After 12 flows of the standard cyclic flow order ‘TACG’ (enough to sequence the four nucleotides of
any of the existing sequencing keys), flow pattern B is broken into blocks of 33 flows. Each block contains one ‘extra’
nucleotide flow relative to the other nucleotides: Red- extra G, Green- extra T, Blue- extra C, Magenta- extra A.
Although larger ‘super-blocks’ of up to 231 nucleotide flows occur within the flow pattern B flow list, the flow list
never starts over to repeat itself (in other words, it is acyclic).
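The structure described above can be checked with a short script. The following is a minimal sketch in Python: it rebuilds the flow list from the four 33-flow blocks and the block order as read from Figure 2, then counts the flows per nucleotide. The single-letter block codes (r, g, b, m for red, green, blue, magenta) and all variable names are illustrative assumptions, not part of any 454 software.

```python
from collections import Counter

# First 12 flows: three cycles of the standard 'TACG' order (key flows).
KEY_FLOWS = "TACG" * 3

# The four 33-flow block types, transcribed from Figure 2.
# The one-letter codes are a hypothetical shorthand for the figure colors.
BLOCKS = {
    "r": "ATGTAGTCGAGCATCATCTGACGCAGTACGTGC",  # red: extra G
    "g": "ATGATCTCAGTCAGCAGCTATGTCAGTGCATGC",  # green: extra T
    "b": "AGTGACTGATCGTCATCAGCTAGCATCGACTGC",  # blue: extra C
    "m": "ATAGATCGCATGACGATCGCATATCGTCAGTGC",  # magenta: extra A
}

# Order of the 54 blocks as they appear in Figure 2 (two blocks per line).
BLOCK_ORDER = (
    "rgbmrgmb" "rmgbrbmg" "rmbgbmgr" "mgbrgmrb"
    "mrgbgbmr" "grbmbrmg" "brgmbg"
)

# The final (green) block is truncated to 18 flows.
LAST_BLOCK_FLOWS = 18


def build_flow_pattern_b():
    """Assemble the full flow list: key flows followed by the 54 blocks."""
    blocks = [BLOCKS[code] for code in BLOCK_ORDER]
    blocks[-1] = blocks[-1][:LAST_BLOCK_FLOWS]
    return KEY_FLOWS + "".join(blocks)


flow_list = build_flow_pattern_b()
flow_counts = Counter(flow_list)
block_counts = Counter(BLOCK_ORDER)

print(len(flow_list))      # 1779
print(dict(flow_counts))   # T: 444, A: 445, C: 446, G: 444
print(dict(block_counts))  # r: 13, g: 14, b: 14, m: 13
```

The totals agree with the figures quoted in the text: 12 key flows, 53 full blocks of 33 flows, and one block of 18 flows give 12 + 1,749 + 18 = 1,779 flows.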
The flow pattern B flow list is defined in the run script (.icl) file named ‘ACYCLIC_70x75_XLPLUSKIT.icl’, which is
located by default in /usr/local/rig/runScripts on the GS FLX+ Instrument. The list of 1,779 nucleotide flows is
included in the header of each SFF file produced by a run that used flow pattern B (it can be viewed using sffinfo -c).
The flow list used to sequence specific reads is also visible on the Wells or Flowgrams tabs of GS Run Browser.
4. HOW FLOW PATTERN B WORKS
Flow Pattern B and CAFIE Errors
Ideally, each well on a PicoTiterPlate device contains exactly one DNA Capture Bead that has millions of copies of a
single amplified DNA fragment attached to it. During a sequencing run, the total signal from each well is the sum of
millions of signals, as each strand incorporates nucleotides during elongation. Over the course of the sequencing run,
the signals from individual DNA strands may fall ‘out of sync’, with some strands incorporating ahead or behind the
majority of other strands. This is called CAFIE error (see Glossary).
One of the characteristics of the cyclic flow pattern A is that once a particular nucleotide has flowed, each of the
other three will flow before the first one flows again. A consequence of this behavior is that if a subset of DNA
strands on a bead fails to fully incorporate (incomplete extension), there is a 100% probability that the following base
in the sequence will flow before the incompletely extended strands have a chance to ‘catch up’. This subset of strands
will continue to incorporate out-of-phase for the remainder of the sequencing run, leading to degradation in the
signal-to-noise ratio and reduced basecalling accuracy.
With the acyclic flow pattern B, any given nucleotide may flow repeatedly before all of the other nucleotides have
had a chance to flow. If a second flow of the nucleotide occurs before a flow for the following base in the DNA
sequence, the incomplete strands will complete their incorporation and be back in-phase (‘catch up’) with the
remainder of the strands. This ability to catch up from an incomplete extension event depends on both the following
flow(s) in the flow list and the following base in the DNA sequence; but the probability of synchronizing will always
be as good as or better than with the cyclic flow pattern. Similarly, the other CAFIE error (carry forward error) can
also be prevented or reversed by the non-uniform distribution of flows in an acyclic flow pattern.
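The catch-up behavior can be illustrated with a toy model. The sketch below, in Python, tracks how many template bases a strand has incorporated after each flow, once for a strand that incorporates normally and once for a strand that fails to incorporate at a single flow. It is a deliberately simplified assumption (a real bead carries millions of strands, and only a fraction lag); the function names are hypothetical and do not correspond to any 454 tool.

```python
def extend(flows, template, skip=None):
    """Return the number of template bases incorporated after each flow.

    A flow incorporates the whole homopolymer run that matches it.
    If `skip` is given, the strand incorporates nothing at that flow
    index (a stand-in for an incomplete extension event).
    """
    pos = 0
    trace = []
    for i, nuc in enumerate(flows):
        if i != skip:
            while pos < len(template) and template[pos] == nuc:
                pos += 1
        trace.append(pos)
    return trace


def resync_flow(flows, template, skip):
    """Return the first flow index at which the lagging strand is back
    in phase with the normal strand, or None if it never catches up
    within the given flows."""
    main = extend(flows, template)
    lag = extend(flows, template, skip)
    if main[skip] == lag[skip]:
        return skip  # the skipped flow was negative, so no lag occurred
    for i in range(skip + 1, len(flows)):
        if main[i] == lag[i]:
            return i
    return None


# Cyclic 'TACG' flows: the lagging strand stays one cycle behind.
cyclic = resync_flow("TACG" * 4, "TACG" * 6, skip=0)

# First eight acyclic flows of the first block in Figure 2: T flows
# again (index 3) before C, the next base in the template, flows.
acyclic = resync_flow("ATGTAGTC", "ATCA", skip=1)

print(cyclic)   # None
print(acyclic)  # 3
```

In the acyclic case the strand that missed the T flow at index 1 incorporates T at index 3, while the normal strand is still waiting for C, so the two are back in phase. Whether this happens in practice depends on the template, as noted above.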
CAFIE errors accumulate over the course of a sequencing run, and represent one of the major limits on maximum
trimmed (passed filter) read length. Depending on how well the specific characteristics of the flow pattern match
with the specific DNA sequences encountered, acyclic flow patterns such as flow pattern B may reduce this source of
signal degradation, resulting in longer overall passed filter read lengths.
Flow Pattern B and Raw (Untrimmed) Read Length
The positive effect of the reduction of CAFIE errors is counter-balanced by the possibility that a given read will get
‘stuck’ waiting for the next flow that matches the following base in the DNA sequence. With the four nucleotide
cyclic flow pattern, a given nucleotide flow always occurs exactly four flows after the previous occurrence, so a read
will never have fewer than one positive flow per four flows. With flow pattern B, the number of flows required to
include at least one of each of the four nucleotides (defined as a flow set; see Glossary) varies over the course of a
run, but averages about 18% more than the exactly four flows required for the cyclic flow pattern.
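One way to measure flow set length is sketched below in Python: for each starting position in a flow list, count how many flows are needed before all four nucleotides have been seen. This is one reading of the Glossary definition of a flow set, so the averaging used to obtain the figure quoted above may differ; the function name is hypothetical.

```python
def flow_set_lengths(flow_list):
    """For each start position, return the number of flows needed to
    include at least one T, A, C, and G. Start positions near the end
    that never complete a set are omitted."""
    lengths = []
    for start in range(len(flow_list)):
        seen = set()
        for end in range(start, len(flow_list)):
            seen.add(flow_list[end])
            if len(seen) == 4:
                lengths.append(end - start + 1)
                break
    return lengths


# With a cyclic pattern every flow set is exactly four flows long.
cyclic_lengths = flow_set_lengths("TACG" * 3)

# The first eight acyclic flows from Figure 2 need more flows per set.
acyclic_lengths = flow_set_lengths("ATGTAGTC")

print(cyclic_lengths)   # [4, 4, 4, 4, 4, 4, 4, 4, 4]
print(acyclic_lengths)  # [8, 7, 6, 5, 4]
```

Applied to the full flow pattern B flow list, the mean of these lengths gives a rough comparison with the fixed value of four for the cyclic pattern.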
Flow pattern B tends to generate a smaller proportion of positive flows (depending on the read sequence), and thus
may require a greater number of flows to sequence through a given read. For a genome with a balanced GC
composition and typical homopolymer content, flow pattern B requires ~1.8 flows per base, relative to ~1.5 flows per
base for flow pattern A. For high- or low-GC genomes, these values are each about 10% lower.
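The relationship between flow order and flows per base can be illustrated with an idealized flowgram. The Python sketch below assumes perfect, noise-free incorporation and simply counts how many flows are consumed to sequence a short read; the function name and the example read are illustrative assumptions.

```python
def ideal_flowgram(flow_list, read):
    """Return (signals, bases_sequenced) for a noise-free read.

    signals[i] is the homopolymer length incorporated at flow i.
    Flows stop being recorded once the read is fully sequenced.
    """
    signals = []
    pos = 0
    for nuc in flow_list:
        if pos == len(read):
            break
        run = 0
        while pos < len(read) and read[pos] == nuc:
            pos += 1
            run += 1
        signals.append(run)
    return signals, pos


read = "TTAG"
cyclic_signals, cyclic_bases = ideal_flowgram("TACG" * 2, read)
acyclic_signals, acyclic_bases = ideal_flowgram("ATGTAGTC", read)

print(cyclic_signals)   # [2, 1, 0, 1]        -> 4 flows for 4 bases
print(acyclic_signals)  # [0, 2, 0, 0, 1, 1]  -> 6 flows for 4 bases
print(len(cyclic_signals) / cyclic_bases, len(acyclic_signals) / acyclic_bases)
```

For this particular read the acyclic flows include more negative (zero-signal) flows, which is the ‘stuck’ effect described above. The ratio for any real read depends on its sequence.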
Although the smaller proportion of positive flows is partially balanced by the larger number of flows, the average
untrimmed read length in raw wells is expected to be 5% to 10% shorter for flow pattern B relative to flow pattern A.
However, for most genomes the trimmed passed filter read length will actually end up being longer, because reduced
CAFIE error results in less trimming of reads sequenced using flow pattern B.
Flow Pattern B and Read Quality Filters
The predicted difference in flows per base between flow pattern A and flow pattern B is also expected to lead to
differences in the number of reads discarded by various filters and reported as numDotFailed, numMixedFailed, and
numTrimmedTooShortQuality. Two types of adjustments were made in software v2.9 to partially compensate for
this: [a] altered cutoff values for the Dot and Mixed Filters, and [b] the addition of a qualityMinLength parameter
that controls the minimum length of acyclic flow pattern reads that have been trimmed by the Trimback Valley
Filter or the Basecall Quality Score Filter. Even with these adjustments, rejected read counts are still expected to be
somewhat different with flow pattern B.
5. TIPS AND CAVEATS

Currently (v2.9), flow pattern B is available only for use with the GS FLX Titanium Sequencing Kit XL+ on the
GS FLX+ System.

In the future, flow pattern B or its equivalent will be available on the GS Junior System.

The advantage of flow pattern B is most easily demonstrated with long reads, because shorter reads are
already very accurate when sequenced with the cyclic flow pattern.

Tools that rely on the specific ‘TACG’ cyclic order of nucleotide flows or associated error profile will need to
be modified to work with flow pattern B.

All 454 Life Sciences tools and programs support the new flow pattern B.

The current versions of some third-party tools (e.g. mothur v1.30.0 and later) have already added
support for flow pattern B.

Other third-party tools used for downstream analysis may still need to be modified.
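As an illustration of the kind of change a downstream tool may need, the Python sketch below looks up the nucleotide for each flow from the flow list itself (for example, the flow characters reported in the SFF header) instead of assuming a fixed ‘TACG’ cycle. The function names are hypothetical, and the naive rounding of signals is an assumption made for brevity, not the basecalling method used by 454 software.

```python
def is_cyclic(flow_chars, order="TACG"):
    """True if every flow follows the repeated cyclic order."""
    return all(
        nuc == order[i % len(order)] for i, nuc in enumerate(flow_chars)
    )


def flowgram_to_bases(flow_chars, signals):
    """Translate flow signals to bases using the actual flow list.

    Each signal is rounded to the nearest integer homopolymer length.
    A tool that hard-codes 'TACG'[i % 4] would mislabel the flows of
    an acyclic pattern; indexing flow_chars avoids that assumption.
    """
    return "".join(nuc * round(sig) for nuc, sig in zip(flow_chars, signals))


key_flows = "TACG" * 3
mixed_flows = key_flows + "ATGT"  # key flows followed by acyclic flows

print(is_cyclic(key_flows))    # True
print(is_cyclic(mixed_flows))  # False
print(flowgram_to_bases("ATGTAG", [0.1, 2.0, 0.0, 0.1, 1.1, 0.9]))  # TTAG
```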
6. REFERENCES
Flow pattern B and related terminology are described in the following sections of the software manual.

454 Sequencing System Software Manual, Part A: GS Sequencer and Other On-Instrument Applications (for
the GS FLX and GS FLX+ Systems), Section 2.4.2 Choice of Sequencing Kit and Flow Pattern

454 Sequencing System Software Manual, Part B: GS Run Processor, GS Reporter, GS Run Browser, GS
Support Tool, Section 1.3.4 Minimum Retained Read Length

454 Sequencing System Software Manual, Part B: GS Run Processor, GS Reporter, GS Run Browser, GS
Support Tool, Section 1.3.7 Signal Processing with Acyclic Flow Pattern

454 Sequencing System Software Manual, Part B: GS Run Processor, GS Reporter, GS Run Browser, GS
Support Tool, Glossary
For additional questions, please use traditional Roche support channels.
7. GLOSSARY
CAFIE (CArry Forward & Incomplete Extension) – out-of-phase sequencing errors that occur when a subset of
DNA strands on a bead incorporate nucleotides out-of-phase with respect to the rest of the strands, which increases
the noise, degrades the signal-to-noise ratio, and reduces the accuracy of basecalling.

Carry Forward occurs when a trace amount of nucleotide remains in a well after the apyrase wash,
perpetuating premature nucleotide incorporation.

Incomplete Extension occurs when some DNA strands on a bead fail to incorporate during the appropriate
nucleotide flow, and must wait for the next flow of that nucleotide to continue extending.
Dot – a block of negative nucleotide flows (denoted as ‘N’ in a DNA sequence) that is ended by a positive flow of one
of the nucleotides in the block, or started and ended by positive flows of the same nucleotide.
Flow list – the series of nucleotide flows during a sequencing run, as specified by the run script.
Flow order – the repeated sequence of nucleotides flowed during each flow set of a cyclic flow pattern sequencing
run, generally ‘TACG’.
Flow pattern – the pattern of nucleotide flows in a flow list, as determined by the choice of run script.

Cyclic flow pattern – a pattern of nucleotide flows characterized by a repeated cycle of four nucleotide flows,
with each cycle (flow set) defined by a specific flow order.

Acyclic flow pattern – a pattern of nucleotide flows that does not repeat a fixed cycle of four nucleotide flows.
Flow set – the smallest group of nucleotide flows at any point in a flow list that includes at least one flow of each of
the four nucleotides, with the simplest case being a four nucleotide flow cycle in a cyclic flow pattern.
Nucleotide Flow – during a sequencing run, nucleotides are flowed sequentially across the PicoTiterPlate device,
one at a time, as controlled by the run script.
Run script – an instrument control (.icl) script file that specifies the type and duration of each flow during a
sequencing run, located by default in /usr/local/rig/runScripts on the GS FLX+ Instrument. The sequencing run
script is automatically selected based on choices made during run setup.
Published by:
Roche Diagnostics GmbH
Sandhofer Straße 116
68305 Mannheim
Germany
© 2013 Roche Diagnostics
All rights reserved.
Notice to Purchaser
For patent license limitations for individual products please refer to: www.technical-support.roche.com.
For life science research only. Not for use in diagnostic procedures.
Trademarks
454, 454 LIFE SCIENCES, 454 SEQUENCING, GS FLX, GS FLX TITANIUM, GS JUNIOR, and PICOTITERPLATE
are trademarks of Roche.
All other product names and trademarks are the property of their respective owners.
07002491001 (2) 0813