Sequencing Solutions Technical Note
August 2013

WHAT IS… Flow Pattern B

Applications: Shotgun Sequencing, Long Amplicon Sequencing
Products: GS FLX+ System, GS FLX Titanium Sequencing Kit XL+

1. OVERVIEW

A new concept called ‘flow pattern’ was defined, beginning with software version 2.8 of the 454 Sequencing System. Prior to August 2012, the flow list used during a sequencing run was a cyclic pattern of ‘TACG’, repeated for a defined number of cycles. Beginning with software version 2.8, a new acyclic flow list is supported for use with shotgun sequencing using the GS FLX Titanium Sequencing Kit XL+. With the release of version 2.9, this was extended to allow sequencing of long amplicons from 550 to 800 bp, and up to 1,100 bp when used with custom modified pipeline settings.

When the XL+ sequencing kit is used, the GS Sequencer Instrument Procedure Wizard includes a new screen titled ‘Choose the flow pattern’. The flow patterns are:

Flow pattern A – a cyclic ‘TACG’ flow pattern with 1,600 nucleotide flows that generates results similar to those obtained with a 400 cycle run in version 2.6 software.

Flow pattern B – an acyclic flow pattern with 1,779 nucleotide flows that is expected to increase read length after all signal processing filters have been applied.

Details about the design and function of flow pattern B are described below.

For life science research only. Not for use in diagnostic procedures.

2. EXPECTED RESULTS

Choice of Sequencing Kit and Flow Pattern

The XL+ sequencing kit with flow pattern B is recommended as a starting point for sequencing both shotgun and long amplicon reads. For most long read samples tested to date, flow pattern B has yielded better results than flow pattern A. Other sample types with shorter library read distributions (such as cDNA or amplicons shorter than 550 bases) can also use this combination of kit and flow pattern, but may not show benefits as large as those observed for long read length samples (Table 1).
Sample Type                  Recommended Sequencing Kit  Recommended Flow Pattern                           Expected Read Length
Genomic shotgun              XL+                         Flow pattern B (Acyclic)                           Longer than both XLR70 and XL+ with flow pattern A
Long Amplicon (>550 bp)      XL+                         Flow pattern B (Acyclic)                           Longer than XLR70
Transcriptomic (cDNA)        XL+                         Flow pattern A (Cyclic), Flow pattern B (Acyclic)  XLR70 read lengths
Standard Amplicon (<550 bp)  XLR70                       (Cyclic)                                           XLR70 read lengths
Paired End                   XLR70                       (Cyclic)                                           XLR70 read lengths

Table 1: Choice of sequencing kit and flow pattern. The XL+ kit with flow pattern B can be used with most sample types, but is not supported for standard amplicon or paired end sequencing. The XLR70 kit automatically uses the equivalent of the cyclic flow pattern A, but with fewer cycles (200 vs. 400).

Figure 1 compares read length of passed filter reads across genomic shotgun runs on multiple instruments for four reference genomes with diverse sequence compositions. The benefit of flow pattern B varies across genomes.

Figure 1: Passed filter read length for reference genomes. Shotgun libraries were prepared from four reference genomes, and sequenced repeatedly using the XLR70 kit (200 cycles) or the XL+ kit (400 cycles/flow pattern A or the acyclic flow pattern B). The average (panel A) and modal (panel B) read lengths after signal processing are compared. The fifth set of bars is an average of the four values obtained for the individual genomes. Error bars represent the standard error of the mean (SEM). These results are for illustrative purposes only, and should not be interpreted as a guarantee of performance.

There are two major sources of variability in read length metrics: variation from genome to genome and variation from run to run with the same genome. Read lengths vary, depending on the genome composition (e.g. variations in G/C content and homopolymer length).
However, all four reference genomes showed increased average read length with the acyclic flow pattern B, relative to the cyclic flow pattern A. Similarly, amplicon sequencing accuracy and read length can vary based on sequence content, but flow pattern B consistently outperforms flow pattern A for both shotgun and amplicon sequencing.

3. STRUCTURE OF FLOW PATTERN B

The flow list defined by flow pattern B is considerably more complex than the one defined by the cyclic flow pattern. The first 12 nucleotide flows remain unchanged from the cyclic flow pattern based on ‘TACG’. These 12 flows are sufficient to sequence the library or control key sequences at the beginning of each read.

The remaining nucleotide flows are divided into blocks of 33 flows that fall into four categories (designated by four colors in Figure 2). Each block has 8 flows for three of the nucleotides, with 9 flows for the fourth nucleotide. In other words, each block has one ‘extra’ flow for one of the four nucleotides (T, A, C, and G for green, magenta, blue, and red, respectively). Each block type is used either 13 or 14 times, with the last green block truncated to 18 nucleotide flows. Overall, each nucleotide is represented (nearly) equally, with 444 T flows, 445 A flows, 446 C flows, and 444 G flows, for a total of 1,779 nucleotide flows.
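The block arithmetic described above can be checked with a short script. The sketch below is illustrative only: the four 33-flow block strings are transcribed from the flow list in Figure 2 and matched to colors by their ‘extra’ nucleotide, and all function and variable names are hypothetical (they are not part of any 454 software). It assumes that the truncated final block is the first 18 flows of the green block. It confirms the 8/8/8/9 composition of each block, then derives how many full blocks of each type are needed to reach the stated per-nucleotide totals.

```python
from collections import Counter

# 12 key flows: three cycles of the standard 'TACG' flow order.
KEY_FLOWS = "TACG" * 3

# Block strings transcribed from Figure 2 (assumed to be copied correctly).
BLOCKS = {
    "red": "ATGTAGTCGAGCATCATCTGACGCAGTACGTGC",      # extra G
    "green": "ATGATCTCAGTCAGCAGCTATGTCAGTGCATGC",    # extra T
    "blue": "AGTGACTGATCGTCATCAGCTAGCATCGACTGC",     # extra C
    "magenta": "ATAGATCGCATGACGATCGCATATCGTCAGTGC",  # extra A
}
EXTRA = {"red": "G", "green": "T", "blue": "C", "magenta": "A"}

# Totals stated in the text.
STATED_TOTALS = {"T": 444, "A": 445, "C": 446, "G": 444}
TOTAL_FLOWS = 1779
BLOCK_LEN = 33
TRUNCATED_LEN = 18


def check_blocks():
    """Verify that each block has 9 flows of its 'extra' nucleotide and 8 of the others."""
    for colour, block in BLOCKS.items():
        counts = Counter(block)
        assert len(block) == BLOCK_LEN
        for nt in "TACG":
            expected = 9 if nt == EXTRA[colour] else 8
            assert counts[nt] == expected, (colour, nt, counts[nt])


def full_block_usage():
    """Derive the number of full blocks of each colour implied by the stated totals."""
    body = TOTAL_FLOWS - len(KEY_FLOWS)
    n_full, remainder = divmod(body, BLOCK_LEN)
    assert remainder == TRUNCATED_LEN
    tail = Counter(BLOCKS["green"][:TRUNCATED_LEN])  # truncated last green block
    key = Counter(KEY_FLOWS)
    usage = {}
    for colour, nt in EXTRA.items():
        # Every full block contributes 8 flows of nt; only blocks of this
        # colour contribute a ninth, so the surplus equals the block count.
        usage[colour] = STATED_TOTALS[nt] - key[nt] - 8 * n_full - tail[nt]
    assert sum(usage.values()) == n_full
    return n_full, usage


if __name__ == "__main__":
    check_blocks()
    print(full_block_usage())
```

Under these assumptions the script reports 53 full blocks (13 red, 13 green, 14 blue, 13 magenta) plus the truncated green block, which is consistent with each block type being used either 13 or 14 times and with 12 + 53 × 33 + 18 = 1,779 flows.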
TACGTACGTACG (Key flows) ATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGC AGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGC ATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGC ATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGC ATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGC ATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGC ATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGC ATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGC ATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGC AGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGC AGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGC ATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGC ATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGC AGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGC ATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGC ATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGC ATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGC ATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGC ATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGC ATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGC ATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGC AGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGC AGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGC ATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGC AGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGC ATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGC AGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGC Figure 2: Flow pattern B. 
After 12 flows of the standard cyclic flow order ‘TACG’ (enough to sequence the four nucleotides of any of the existing sequencing keys), flow pattern B is broken into blocks of 33 flows. Each block contains one ‘extra’ nucleotide flow relative to the other nucleotides: Red – extra G, Green – extra T, Blue – extra C, Magenta – extra A. Although larger ‘super-blocks’ of up to 231 nucleotide flows occur within the flow pattern B flow list, the flow list never starts over to repeat itself (in other words, it is acyclic).

The flow pattern B flow list is defined in the run script (.icl) file named ‘ACYCLIC_70x75_XLPLUSKIT.icl’, which is located by default in /usr/local/rig/runScripts on the GS FLX+ Instrument. A list of the 1,779 nucleotide flows is included in the header of each SFF file produced from a run that used flow pattern B (it can be viewed using sffinfo -c). The flow list used to sequence specific reads is also visible on the Wells or Flowgrams tabs of GS Run Browser.

4. HOW FLOW PATTERN B WORKS

Flow Pattern B and CAFIE Errors

Ideally, each well on a PicoTiterPlate device contains exactly one DNA Capture Bead that has millions of copies of a single amplified DNA fragment attached to it. During a sequencing run, the total signal from each well is the sum of millions of signals, as each strand incorporates nucleotides during elongation. Over the course of the sequencing run, the signals from individual DNA strands may fall ‘out of sync’, with some strands incorporating ahead of or behind the majority of other strands. This is called CAFIE error (see Glossary). One of the characteristics of the cyclic flow pattern A is that once a particular nucleotide has flowed, each of the other three will flow before the first one flows again.
A consequence of this behavior is that if a subset of DNA strands on a bead fails to fully incorporate (incomplete extension), there is a 100% probability that the following base in the sequence will flow before the incompletely extended strands have a chance to ‘catch up’. This subset of strands will continue to incorporate out-of-phase for the remainder of the sequencing run, leading to a degraded signal-to-noise ratio and reduced basecalling accuracy.

With the acyclic flow pattern B, any given nucleotide may flow repeatedly before all of the other nucleotides have had a chance to flow. If a second flow of the nucleotide occurs before a flow for the following base in the DNA sequence, the incomplete strands will complete their incorporation and be back in-phase (‘catch up’) with the remainder of the strands. This ability to catch up from an incomplete extension event depends on both the following flow(s) in the flow list and the following base in the DNA sequence, but the probability of synchronizing will always be as good as or better than with the cyclic flow pattern. Similarly, the other CAFIE error (carry forward error) can also be prevented or reversed by the non-uniform distribution of flows in an acyclic flow pattern.

CAFIE errors accumulate over the course of a sequencing run, and represent one of the major limits on maximum trimmed (passed filter) read length. Depending on how well the specific characteristics of the flow pattern match the specific DNA sequences encountered, acyclic flow patterns such as flow pattern B may reduce this source of signal degradation, resulting in longer overall passed filter read lengths.

Flow Pattern B and Raw (Untrimmed) Read Length

The positive effect of the reduction of CAFIE errors is counter-balanced by the possibility that a given read will get ‘stuck’ waiting for the next flow that matches the following base in the DNA sequence.
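To make the idea of a read getting ‘stuck’ concrete, the following sketch counts how many flows are consumed before a read is fully extended under a given flow list. It is a simplified, error-free model: every matching flow extends through the whole homopolymer, and CAFIE and signal noise are ignored. The function name flows_to_sequence and the toy read are illustrative assumptions, not part of the 454 software, and the acyclic example uses only the first 33-flow block shown in Figure 2.

```python
def flows_to_sequence(read, flow_list):
    """Return how many flows are used before `read` is fully extended.

    Idealized model: a flow is positive when its nucleotide matches the next
    base, and a positive flow extends through the entire homopolymer run.
    Returns None if the flow list ends before the read is complete.
    """
    if not read:
        return 0
    pos = 0
    for used, nucleotide in enumerate(flow_list, start=1):
        # Extend through every consecutive base that matches this flow.
        while pos < len(read) and read[pos] == nucleotide:
            pos += 1
        if pos == len(read):
            return used
    return None


CYCLIC = "TACG" * 3                              # three cycles of flow pattern A
RED_BLOCK = "ATGTAGTCGAGCATCATCTGACGCAGTACGTGC"  # first acyclic block in Figure 2

toy_read = "CCGT"  # hypothetical read, chosen only to show the waiting effect
print(flows_to_sequence(toy_read, CYCLIC))     # 5
print(flows_to_sequence(toy_read, RED_BLOCK))  # 14
```

The toy read deliberately exaggerates the effect: it has to wait through seven negative flows before the first C flow of the block. For real reads the difference is much smaller on average, as the flows-per-base figures in this section indicate.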
With the four-nucleotide cyclic flow pattern, a given nucleotide flow always occurs exactly four flows after the previous occurrence, so a read will never have fewer than one positive flow per four flows. With flow pattern B, the number of flows required to include at least one of each of the four nucleotides (defined as a flow set; see Glossary) varies over the course of a run, but averages about 18% more than the exactly four flows required for the cyclic flow pattern. Flow pattern B tends to generate a smaller proportion of positive flows (depending on the read sequence), and thus may require a greater number of flows to sequence through a given read. For a genome with a balanced GC composition and typical homopolymer content, flow pattern B requires ~1.8 flows per base, relative to ~1.5 flows per base for flow pattern A. For high- or low-GC genomes, these values are each about 10% lower.

Although the smaller proportion of positive flows is partially balanced by the larger number of flows, the average untrimmed read length in raw wells is expected to be 5% to 10% shorter for flow pattern B relative to flow pattern A. However, for most genomes the trimmed passed filter read length will actually end up being longer, because reduced CAFIE error results in less trimming of reads sequenced using flow pattern B.

Flow Pattern B and Read Quality Filters

The predicted difference in flows per base between flow pattern A and flow pattern B is also expected to lead to differences in the number of reads discarded by various filters and reported as numDotFailed, numMixedFailed, and numTrimmedTooShortQuality.
Two types of adjustments were made in software v2.9 to partially compensate for this: (a) altered cutoff values for the Dot and Mixed Filters, and (b) the addition of a qualityMinLength parameter that controls the minimum length of acyclic flow pattern reads that have been trimmed by the Trimback Valley Filter or the Basecall Quality Score Filter. Even with these adjustments, rejected read counts are still expected to be somewhat different with flow pattern B.

5. TIPS AND CAVEATS

Currently (v2.9), flow pattern B is available only for use with the GS FLX Titanium Sequencing Kit XL+ on the GS FLX+ System. In the future, flow pattern B or its equivalent will be available on the GS Junior System.

The advantage of flow pattern B is most easily demonstrated with long reads, because shorter reads are already very accurate when sequenced with the cyclic flow pattern.

Tools that rely on the specific ‘TACG’ cyclic order of nucleotide flows or the associated error profile will need to be modified to work with flow pattern B. All 454 Life Sciences tools and programs support the new flow pattern B. The current versions of some third-party tools (e.g. mothur v1.30.0 and later) have already added support for flow pattern B. Other third-party tools used for downstream analysis may still need to be modified.

6. REFERENCES

Flow pattern B and related terminology are described in the following sections of the software manual.
454 Sequencing System Software Manual, Part A: GS Sequencer and Other On-Instrument Applications (for the GS FLX and GS FLX+ Systems), Section 2.4.2 Choice of Sequencing Kit and Flow Pattern

454 Sequencing System Software Manual, Part B: GS Run Processor, GS Reporter, GS Run Browser, GS Support Tool, Section 1.3.4 Minimum Retained Read Length

454 Sequencing System Software Manual, Part B: GS Run Processor, GS Reporter, GS Run Browser, GS Support Tool, Section 1.3.7 Signal Processing with Acyclic Flow Pattern

454 Sequencing System Software Manual, Part B: GS Run Processor, GS Reporter, GS Run Browser, GS Support Tool, Glossary

For additional questions, please use traditional Roche support channels.

7. GLOSSARY

CAFIE (CArry Forward & Incomplete Extension) – sequencing errors that occur when a subset of DNA strands on a bead incorporates nucleotides out-of-phase with respect to the rest of the strands, which increases the noise, degrades the signal-to-noise ratio, and reduces the accuracy of basecalling. Carry Forward occurs when a trace amount of nucleotide remains in a well after the apyrase wash, perpetuating premature nucleotide incorporation. Incomplete Extension occurs when some DNA strands on a bead fail to incorporate during the appropriate nucleotide flow, and must wait for the next flow of that nucleotide to continue extending.

Dot – a block of negative nucleotide flows (denoted as ‘N’ in a DNA sequence) that is ended by a positive flow of one of the nucleotides in the block, or started and ended by positive flows of the same nucleotide.

Flow list – the series of nucleotide flows during a sequencing run, as specified by the run script.

Flow order – the repeated sequence of nucleotides flowed during each flow set of a cyclic flow pattern sequencing run, generally ‘TACG’.

Flow pattern – the pattern of nucleotide flows in a flow list, as determined by the choice of run script.
Cyclic flow pattern – a pattern of nucleotide flows characterized by a repeated cycle of four nucleotide flows, with each cycle (flow set) defined by a specific flow order.

Acyclic flow pattern – a pattern of nucleotide flows characterized by a pattern that is not cyclic.

Flow set – the smallest group of nucleotide flows at any point in a flow list that includes at least one flow of each of the four nucleotides, with the simplest case being a four nucleotide flow cycle in a cyclic flow pattern.

Nucleotide Flow – during a sequencing run, nucleotides are flowed sequentially across the PicoTiterPlate device, one at a time, as controlled by the run script.

Run script – an instrument control (.icl) script file that specifies the type and duration of each flow during a sequencing run, located by default in /usr/local/rig/runScripts on the GS FLX+ Instrument. The sequencing run script is automatically selected based on choices made during run setup.

Published by:
Roche Diagnostics GmbH
Sandhofer Straße 116
68305 Mannheim
Germany

© 2013 Roche Diagnostics. All rights reserved.

Notice to Purchaser
For patent license limitations for individual products please refer to: www.technical-support.roche.com.

For life science research only. Not for use in diagnostic procedures.

Trademarks
454, 454 LIFE SCIENCES, 454 SEQUENCING, GS FLX, GS FLX TITANIUM, GS JUNIOR, and PICOTITERPLATE are trademarks of Roche. All other product names and trademarks are the property of their respective owners.

07002491001 (2) 0813