‘KeyGene Nanopore Seminar’, April 2016
Clive G. Brown, CTO, Oxford Nanopore

ONT - Company Overview
• Formed in 2005: single-molecule sensing system for DNA/RNA/protein
• Scalable electronic products: MinION™ (USB), PromethION™, VolTRAX™
• First commercial nanopore products (2014-2015)
• Total investment to date > £180M
• Experienced management and Board, 240+ employees
• Broad intellectual property portfolio: in-house and through collaborations including Harvard, Oxford, UCSC
www.oxfordnanopore.com

Ubiquitous analyses to enable …an Internet Of Living Things
© Copyright 2016 Oxford

For the analysis of any living thing, by anybody, anywhere and at any time
• E.g. species identification
• E.g. field testing in epidemics
• E.g. environmental monitoring
• E.g. infectious disease

For monitoring pathogen outbreaks in remote locations

NANOPORE SEQUENCING IS BEING USED IN TRADITIONAL LABS IN 50 COUNTRIES… (~1500 INSTALLATIONS)
… and many emerging applications in the field – only truly portable device with potential for clinic, rural/outdoors, home

INTRODUCTION TO NANOPORE SEQUENCING - 1
ORIGINAL CONCEPT
Deamer & Branton et al.
~20 years ago
Hundreds of papers and patents
Over 40 MinION papers … since ‘Strand’
Deamer’s Notebook

Strand Sequencing
Previous publications by Oxford Nanopore collaborators: 1996, 2009, 2010, 2012

Key ONT History 2009-2012
Using mostly Board update slides
Confidential

ONT History: Strand Sequencing – (Nov 2010)
Summary of Strand Sequencing Progress
Control of Strand Motion:
• Demonstrated control of strand motion through Hemolysin using Phi29
• Developed two modes of operation – Polymerase and Exo/Unzipping
• Shown that Phi29 can operate as an exonuclease and a polymerase under an applied potential
• Moved a complex strand (70mer PhiX fragment) through the nanopore in unzipping mode
• Produced a consensus plot of 55 strands from a PhiX fragment showing consistent behaviour of strand motion
GGATTTCGATGTAGCAGTTGCAATATAAAACTATCAAACTGCCAATTACGACCATTACCACCAAAAGAAGTTTTAAACAATCGG
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCACCCCCCCCCCCCCCCCCCCCTAAAGCTACATCGTCAACGTTATATTTTGATAGTTTGACGGTTAATGCTGGTAATGGTGGTTTTCTTCAAAATTTGTTAGCC

ONT History: Strand Sequencing – Current / Sequence: March 2011
Prediction of Strand States from Triplet Data
• Standard UZ07 strand used as the basis for a prediction from triplets
• Consensus plot generated from multiple strands
• Consensus plot can be manually overlaid with predicted model
[Figure: consensus current plot annotated with per-state base labels]
• Data shows strong alignment for bases in middle section
• As the enzyme reaches the back end of the strand, the states fail to align

ONT History: Strand Sequencing – Read Length.
March 2011
Consensus plot for the 400 bp fragment
[Figure label: Abasic Section from Primer]
• Indication that 400 bp fragment fully unzips
• Large current deflection from abasic region in primer indicates correct start juncture
• Consistently last about 2 minutes, as expected
• Behaviour at the end of the strands is consistent between runs
• About half of the states are present in most of the strands

ONT History: Strand Sequencing – Current / Sequence Base Calling. March 2011
Prediction of Strand States from Triplet Data
• Consensus plot generated from multiple strands on standard UZ07 strand
• A reference signal is produced by combining individual triplet signals
• Initial base caller, derived from triplet set, gives very good sequence agreement
• Expected issues coming into the sequence from the polyC leader
• End of the strand shows different behaviour to the predicted sequence – again, this is expected as the enzyme loses its grip on the strand
[Figure: De Novo base call of sequence]

Strand Sequencing – May 2011
De Bruijn Training – SNPs
Use N to generate a mixture of static strands
Examine the current spread as N is moved from triplet
• Expect to see single current level past first set
DDDDDDDDDDDDDDDDDDDNDDDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDDDDNDDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDDDDDNDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDDDDDDNDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDDDDDDDDNDDDDDDDDDDDDDDD
Can also examine the exit of the barrel
DDDDDDDDDDDDDDDDDNDDDDDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDDNDDDDDDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDNDDDDDDDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDNDDDDDDDDDDDDDDDDDDDDDDDD
DDDDDDDDDDDDDNDDDDDDDDDDDDDDDDDDDDDDDDD
Key:
DDD – Triplet from De Bruijn
NNN – Random base (G, T, A or C)
DDD – De Bruijn Background

Strand Sequencing – Sense and Antisense – May 2011
Reading Around a Hairpin – sequencing the sense and antisense strand
Recent work has shown that Phi29 is predominantly a single-stranded DNA polymerase
This should mean that if a DNA hairpin is used, the Phi29
will process both sides of the hairpin
Reading the sense and antisense strand from a single molecule would greatly increase sequencing accuracy
The G and T rich region can clearly be seen moving through the pore, followed by the TTTT hairpin turn
The antisense strand, predominantly A and C, is seen to follow the TTTT turn

Strand Sequencing – Movement – June 2011
Effect of Poor Movement on Base-calling
• Missing states from problems with movement make base calling from single molecule data difficult
• Consensus plots can be generated from multiple single molecule reads
• Consensus mapped to static: base calls can be made from a consensus of single molecule reads by mapping to known model
• Gaps in the sequence are evident even for known model
• Need to make improvements to movement to drive progress

~JUNE 2011 ONT DITCHED POLYMERASES

Strand Sequencing – Movement: June/July 2011
Base Calling Data from Helicase Data
• Improvements to the movement allow base calling to be performed on a single molecule
• Training set and moving strand chosen to be the same sequence (assumes triplet model fully describes data)
• Base calling algorithms used the triplet data from the De Bruijn k3 training set
[Figure labels: States from static training data; Single molecule using a helicase]
Reference: CCCGCGGGC CTGCTCGTGGTC TTGTTTCC AGCATCAC GAGGATGACTAGTATTACAAGAATAAACCCCTTTTT
||||| |X| ||XX|||||||| ||| |||| || | ||| X|||||X||||X|||| |||||||||||||||||||
Called: CCCGC GTCATCTTATCGTGGTCTTTG TTCCAAG A CACAAAGGATTACTATTATT CAAGAATAAACCCCTTTTT
• Single molecule data performs very well – some missed states and miscalls
• Good starting point – system needs optimisation to reduce errors

Strand Sequencing – Movement, July 2011
Comparison Between Phi29 and Helicase Data
Problems with using Phi29 as a motor include:
• missed states, fast and slow movement regimes, backwards movement at low potential, pausing
Initial work on helicases shows that the distribution of movement is a lot more
controlled
[Figure panels: Helicase; Polymerase]

Training Initial Guess – Sept 2011
[Figure label: Read head position map]
Iterative methods often require an initial guess at the model
– Static training
  • Mapping the read head position with a 3mer in a homopolymer background
  • Run all 64 3mers in the mapped read head position
  • Generate a 3mer model
– Mapping existing model to a new experimental condition
  • Run a control strand through the pore
  • Apply linear transformations (shift and scale)
  • Minimise distance between model and data to determine transformation
– Upgrade to higher k-mer model
  • Replicate coefficients to upgrade
  • Linear transformations to minimise “distance”
  • 3mer to 4mer upgrade
[Figure labels: Measured 3mer coefficients; Shift; Scale; 3mers: c20290; Linear transform: c19489]

Abstraction of training process
[Flowchart boxes: Initial guess at model; Data; Model; Fit data to model; Estimate model from fit; Converged?; Model Out]

Training process
Currently using a custom adaptation of the Expectation-Maximisation (EM) algorithm
– We call this a “Forced Path” model
– Sequence of DNA is known
– Properties of the system are modelled
– The devil is in the detail
Step by step process
– Sequence defines the underlying state path
– State (k-mer) model defines the currents observed at each point in the state path
– Properties of the movement system, coded in a transition matrix, define movement through the state path
– Calculated data path is combined with other observations of the same sequence
– Observations from different sequences are combined and used to re-estimate the state (k-mer) model
Observations from fragments can be combined to produce a complete PhiX path

Calculating a path (1)
State emission probabilities [Figure axes: Current (pA); State (Kmer); Position in sequence]
Transition matrix [Figure panels, each with a Probability axis: Step back (2->1); Stay (2->2); Advance (2->3); Skip (2->4)]
GAGTTTTATCGCTTCCATGACGCAG
AAGTTAACACTTTCGGATATTTCTGA
TGAGTCGAAAAATTATCTTGATAAAG
CAGGAATTACTACTGCTTGTTTACG
AATTAAATCGAAGTGGACTGCTGGC
GGAAAATGAGAAAATTCGACCTATC
CTTGCGCAGCTCGAGAAGCTCTTAC
TTTGCGACCTTTCGCCATCAACTAA
CGATTCTGTCAAAAACTGACGCGTT
GGATGAGGAGAAGTGGCTTAATATG
CTTGGCACGTTCGTCAAGGACTGGT
TTAGATATGAGTCACATTTTGTTCAT
GGTAGAGATTCTCTTGTTGACATTTT
AAAAGAGCGTGGATTACTATC
Allowed to slip back to any previous state (low probability)

Calling The PhiX Path (Sense)
100 bases from the trained PhiX path
– This is a “consensus”
– We’ve derived the model from the same dataset
– Where the model tracks the data well we can determine the correct state sequence
– States are translated back to base calls
Where the data and model are close we get the states correct
[Figure axes: Distance (pA); Path Position]
Reference: GCTACTGCAAAGGATATTTCTAATGTCGTCACTGATGCTGCTTCTGGTGTGGTTGATATTTTTCATGGTATTGATAAAGCTGTTGCCGATACTTGGAACA
*####* *#################* * *##################* *######* *#####* * * ########### *#* *##
Calls: CATACTGCCCGAGATATTTCTAATGTCGTCAATTATGCTGCTTCTGGTGTGGTTCCTATTTTTCCTGGTATTCCTCAGGCTGTTGCCGAATATTGAGACA

Sense-Antisense Calling
Sense
Reference: GCTACTGCAAAGGATATTTCTAATGTCGTCACTGATGCTGCTTCTGGTGTGGTTGATATTTTTCATGGTATTGATAAAGCTGTTGCCGATACTTGGAACA
*####* *#################* * *##################* *######* *#####* * * ########### *#* *##
Calls: CATACTGCCCGAGATATTTCTAATGTCGTCAATTATGCTGCTTCTGGTGTGGTTCCTATTTTTCCTGGTATTCCTCAGGCTGTTGCCGAATATTGAGACA
AntiSense (reversed)
Reference: CGATGACGTTTCCTATAAAGATTACAGCAGTGACTACGACGAAGACCACACCAACTATAAAAAGTACCATAACTATTTCGACAACGGCTATGAACCTTGT
* *###* * **#* * * * * * * *##* *# * ###* * *#* ** ##*#* *#* ** ** *##*
Calls: ACAAAGCGTTTAGTCAAAAGGGTCCGGAATGATTCAGAAGAAAGAAAACCCAAACTCCAGGGGCAACCCAAATAATTTCCGGCACGCATAAAAAATTTGT

MORE THAN THE SUM OF THE PARTS
Sense-AntiSense
Reference: GCTACTGCAAAGGATATTTCTAATGTCGTCACTGATGCTGCTTCTGGTGTGGTTGATATTTTTCATGGTATTGATAAAGCTGTTGCCGATACTTGGAACA
##############################* *##################################################################*
Calls: GCTACTGCAAAGGATATTTCTAATGTCGTCAGTGATGCTGCTTCTGGTGTGGTTGATATTTTTCATGGTATTGATAAAGCTGTTGCCGATACTTGGAACA
We are calling the
path with the model used to generate that path
This is only a 3mer model
Sense path: 3760-3860
AntiSense path: 1524-1624

AGBT2012: Online f1000
• Strand begun in 2009
• 2011 routine translocation
• 2011 recognition of k-mers
• 2011 use of De Bruijn training strands
• 2011 HMM base calling – de novo
• 2011 move away from polymerase
• 2011 <=50kb reads
• 2011/2012 PhiX genome sequencing
Mostly commercial secrets pre AGBT2012, otherwise in patents:
08709449.6 EP2126588 18-Feb-08 PCT/08/000563 WO 2008/102121 18-Feb-08 12/527687 2010-0196203 18-Feb-08 08709448.8 EP2122344 18-Feb-08 PCT/08/000562 WO 2008/102120 18-Feb-08 12/527679 2011-0121840 18-Feb-08 PCT/GB2008/004127 WO 2009/077734 15-Dec-08 12/339,956 US2009/0167288 19-Dec-08 61/080,492 14-Jul-08 08863072 2232261 15-Dec-08 EP08006456.1 EP2107040 31-Mar-08 12/409007 US2009274870 23-Mar-09 09784654 EP2310534 06-Jul-09 PCT/GB2009/001690 WO2010/004273 06-Jul-09 13/002717 2011-0177498 06-Jul-09 61/078687 07-Jul-08 09784644 EP2307540 06-Jul-09 13187149.3 268260 02-Oct-13 PCT/GB2009/001679 WO2010/004265 06-Jul-09 13/968,778 US-2014-0051069 16-Sep-13 14/455,394 US2015 0031020 08/08/2014 61/078695 07-Jul-08 08806515.6 WO2010/055307 13-Nov-09 PCT/GB09/002666 WO2010/055307 13-Nov-09 WO2010/086603 29-Jan-10 14/334,285 PCT/GB10/000133 61/148726 30-Jan-09 PCT/GB10/000160 WO2010/086622 29-Jan-10 13/147159 2012-0058468 29-Jan-10 EP10703325 WO2010/086622 29-Jan-10 61/148737 30-Jan-09 EP 10705403 WO2010/086602 29-Jan-10 PCT/GB10/000132 WO2010/086602 29-Jan-10 13/147176 2012-0064599 29-Jan-10 10722740 EP2411538 25-Mar-10 13/260,178 WO2010/109197 25-Mar-10 PCT/GB10/000567 WO2010/109197 25-Mar-10 PCT/GB10/000789 WO2010/122293 19-Apr-10 10716404.8 2422198 19-Apr-10 13/265,448 2012-0133354 19-Apr-10 61/170,729 20-Apr-09 10793017.4 WO2011/067559 01-Dec-10 13/512,937 2012-0322679 01-Dec-10 61/265488 01-Dec-09 PCT/GB2010/002206 WO2011/067559 01-Dec-10 11823878.1 2614156 07-Sep-11 PCT/US2011/001552 WO2012/033524 07-Sep-11 13/821,156
US-2014-0051068 17-09-2013 61/402,903 07-Sep-10 61/574,236 30-Jul-11 61/574,240 30-Jul-11 61/574,237 30-Jul-11 61/574,239 30-Jul-11 61/574,238 30-Jul-11 61/574,235 30-Jul-11 61/574,233 30-Jul-11 PCT/GB2011/01432 30-Sep-11 PCT/GB11/001432 WO2012/042226 31-May-11 11770853.7 2622343 31-May-11 PCT/GB11/001432 WO2012/042226 31-May-11 12703872.7 2673638 10-Feb-12

INTRODUCTION TO NANOPORE SEQUENCING - 2
MANY CHEMISTRIES POSSIBLE
Many components to a nanopore sequencing system
All subject to continuous upgrades…
• Motor (E5, E6, E7)
• Nanopore reader (R7, R8, R9, R10 etc.)
• Membrane (M9, M10 etc.)
• Run Conditions: salt, fuel, script, temperature...

PORES FOR THOUGHT?
DIFFERENT SHAPES AND SIZES HAVE BEEN ENGINEERED FOR SEQUENCING
Some public crystal structures available, others obtained by ONT and collaborators

RELEASED PORE – R9
R9 is… CsgG PORE FROM E. coli
• Nonameric lipoprotein (nine subunits) with a 36-stranded beta-barrel
• Shape, dimensions, and the position of the constriction of R9 make it a better pore for DNA sequencing
• CsgG wildtype has been engineered heavily to enhance its properties: > 700 mutants
• Considerable ‘head room’ for further improvement

INTRODUCTION TO NANOPORE SEQUENCING - 3
BASIC PLATFORM DYNAMICS
[Figure labels: Pore; Membrane; array; ASIC; Channels]
• A single nanopore per well
• 100s to 100,000s of channels
• Many analytes per pore, per channel, per run
• Channels/pores asynchronous – no ‘cycles’
• MinION currently offers 100s of channels; products will scale up and down

INTRODUCTION TO NANOPORE SEQUENCING - 4
MinION SEQUENCER – USB DEVICE AND FLOWCELL
• Consumable flowcell contains sensing chemistry, nanopore, and electronics
• Sample added to flowcell here
• Sensor chip with multiple nanopores
• Sensor chip works with custom ASIC for control and data processing
• USB powers device
• MinION docks with flowcell, data streamed to USB

ASIC is the Core
ONT’s ASIC is the core component of the MinION
• System chemistry → electronic signal
ASIC influences:
• Number of parallel
recording channels
• Signal to noise of the nanopore recording
R&D on new generation ASICs progressing well

INTRODUCTION TO NANOPORE SEQUENCING - 5
1D AND 2D EXPLAINED
1D is a rapid library preparation allowing sequencing of the template strand.
2D is a slightly longer preparation, but gives more accurate calls using template and complement reads
1D - Linear: Template… …template… (exit)
2D - Hairpin: Template… …template…hairpin…complement… …complement… (exit)

RAPID 1D LIBRARY PREPARATION
GREAT DEMAND FOR A RAPID, LONG READ, LIBRARY PREP
• MuA fragments gDNA & adds adapters at same time – 10 min prep
• Good performance from MuA libraries – modal fragment size is lower than for g-tube shearing, but with very long tail

LONG READS ON ONT PLATFORM – TOWARDS 1MB
READ LENGTH IS LIMITED BY INPUT DNA, NOT PLATFORM
• Improvements in sample-prep protocol yield long-read libraries
• Size selection (BluePippin) after library preparation allows exclusively long fragments
Image of complete 2D read of 250 kb
[Figure labels: 250 kbase dsDNA; 250 kbase E. coli prep (extracted using Qiagen 500 tip); Start - Template; Hairpin “turn”; Complement - Exit]

MinION at 40 G per run
[Figure axes: Number of Pores; Throughput per Day (Gb); Translocation Speed (b/s/pore). Labels: R9.2; 20G; MkI; 120G; MkII; 6,400G]

NEW KITS - ALL FAST MODE
NANOPORE SEQUENCING KIT, SQK-NSK007 (R9 Version)
• Ligation based, generating 1D and 2D reads
• New enzyme
• Higher speed (250 bps)
• Premixed fuel with running buffer
RAPID SEQUENCING KIT, SQK-RAD001 (R9 Version)
• Transposome based, generating 1D reads
• 2 tubes -> 10 minutes -> Done
• Runs at 250 bps
• Premixed fuel with running buffer

https://www.nanoporetech.com/products-services/voltrax
VolTRAX – large format alpha produced
Smaller format being for direct use with the MinION platform

FAST MODE
ENZYME “MOTOR” CONTROLS SPEED OF DNA MOVING THROUGH NANOPORE
Running slow → enzyme kinetics dominated by single process → exponential kinetics
– Exponential distribution of event
length → event detector misses short events
[Figure labels: Exponential; Missed data; Minimum event length; Non-Exponential; fewer deletions; R9]
Running fast → enzyme kinetics across multiple processes → non-exponential kinetics
– Fewer missed events lead to high accuracy with increased throughput

SYSTEM SLOWDOWN AND THROUGHPUT
LOTS OF SCOPE FOR IMPROVING FLOWCELL YIELD
• Typically hitting ~30% of theoretical maximum throughput over 24 hours
• Blocking is a key factor – looks like sample prep contamination is a major contributor
[Figure: Control vs Nicked; x-axis: Read length (kb), 0.0 to 5.0]

‘ACCURACY’ IMPROVEMENTS
UPDATED CHEMISTRY
[Figure panels: 1D; 2D. Series: R7 @ 70 b/s HMM; R9 @ 250 b/s HMM. Axis: % Accuracy]

‘ACCURACY’ IMPROVEMENTS – MARCH 2016 @ 180 mV
UPDATED ALGORITHMS
[Figure panels: 1D; 2D. Series: R7 @ 70 b/s HMM; R9 @ 250 b/s RNN. Axis: % Accuracy. Annotations: 1D ~= Pb; 2D ~= Capillary]

ACCURACY IMPROVEMENTS – APRIL 2016 @ 200 mV
UPDATED ALGORITHMS
[Figure panels: 1D; 2D. Series: R7 @ 70 b/s HMM; R9 @ 250 b/s RNN. Axis: % Accuracy. Annotations: ~92% 1D; ~98% 2D; 250 mV -> 350 mV? > 99% 2D]

INTRODUCTION TO BASECALLING - 1
EVENT DETECTION
• Event detection is used to convert raw data to “events” by looking at current transitions
• Performed using a non-linear filter based on local t-statistics
• Similar starting point for many of the algorithms, but assumptions less appropriate at faster speeds

INTRODUCTION TO BASECALLING - 4
LSTM-RNN NEURAL NETWORK BASECALLING
• Squiggles contain a memory beyond that which is modelled by HMM
• Network based on bidirectional long short-term memory (BLSTM) recurrent cells
• Input features derived from a window of events
• Output is a posterior matrix, as in HMM Forwards-Backwards; decode in a similar way
• Achieves better results than HMM, can tune for performance/speed
• Scaling: z-scaling (zero mean, unit standard deviation)

ANALYSIS OF SEQUENCING DATA
MinKNOW: MinION experimental control, QC, data acquisition
Epi2Me (Metrichor Ltd): cloud-based analyses designed to let users without bioinformatics expertise resolve biological questions.
Currently, basecalling is also provided to customers through the cloud

PROMETHION INSTRUMENT
SEQUENCING MODULE
• 48 individually addressable flow cells
• Can also be run together (by sample)
• New 3K ASIC - 144,000 channels total
• R9.x chemistry to be shipped
COMPUTE MODULE
• Local cluster of high performance compute
• Real time data analysis in the box
• Web based administration… can be run by a simple tablet

Detachable Sensing Chip
Sensor Array Chemistry
• Sensor chip architecture redesigned for more controlled membrane formation
• Micro-patterning of sensor surface controls the polymer fluid used to form membranes
• 1000s of individually addressable nanopore membranes formed

PROMETHION COMPUTE
COMPUTE MODULE – INITIAL SPECIFICATIONS
• 12/24x quad-core i7 slave nodes plus one i7 management node
• 2x 1 GBit ethernet management ports
• 12/24x 2 TB SSD data buffer storage
• “Slurm” job scheduling cluster
• Runs Ubuntu 14.04 LTS internally [possibly 16.04 LTS later this year]
• Web-based administration and operation interface
• Will mount CIFS or NFS external storage [possibly others]
• 2x 10 GBit fibre uplinks
• [1x 1 GBit ethernet developer port (debugging)]

PROMETHION BUILDING CAPACITY
THE PEAP QUEUE
• First box shipment is imminent
• First come, first served
• Team prepared to build initially 4-6 per month
• Later this year 10 per month will be achievable with current set up
• 2017 scale up is being planned
• Humbling to see the number of deposits coming in
• Targeting delivery to everyone before the end of the year
• Talk to us if having the deposit committed is an issue and you need to leave the queue

Sensing from Biology
• Membrane chemistry robust to sensing from biological samples
• Polymer membrane replaces traditional “lipid bilayer”
• Sequencing from blood is possible with the MinION system
• Minimal library preparation needed to get the sample ready for sequencing

SUMMARY
Nanopore sequencing is here and maturing rapidly.
Accuracy, read lengths and throughput.
Usable for Human and other large genomes.
Devices to sample directly from biology (e.g. blood).
Direct detection of base modifications.
Cloud based analyses.
Direct RNA sequencing.

Where might it all go?
201x ?

Where could devices go?
• Remote environmental
• At home
• Rural monitoring, e.g. crops
• Food chain/quality assurance: Farm, Food processing, Retail, Home

London Calling Conference 2016
26-27 May. London. Open to all
Plenary sessions │ Breakouts │ Posters │ Conference dinner
Register at londoncallingconf.co.uk
@nanoporeconf
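
Worked sketch for the “Training Initial Guess” slide: the slide describes mapping an existing 3mer model to a new experimental condition by running a control strand, applying a linear transformation (shift and scale), and minimising the distance between model and data, then replicating coefficients to seed a higher k-mer model. The Python below is one minimal way this could look and is not ONT’s implementation. Ordinary least squares as the “distance”, the function names, the example current levels, and the choice that a new 4mer inherits the level of its 3-base prefix are all assumptions made for illustration.

```python
from statistics import fmean


def fit_shift_scale(model_levels, observed_levels):
    """Least-squares fit of observed ~ scale * model + shift.

    Assumption: ordinary least squares stands in for the "distance"
    that the slide says is minimised. Returns (scale, shift).
    """
    if len(model_levels) != len(observed_levels) or len(model_levels) < 2:
        raise ValueError("need at least two paired current levels")
    mean_model = fmean(model_levels)
    mean_obs = fmean(observed_levels)
    var_model = sum((m - mean_model) ** 2 for m in model_levels)
    if var_model == 0:
        raise ValueError("model levels must not all be identical")
    covar = sum((m - mean_model) * (o - mean_obs)
                for m, o in zip(model_levels, observed_levels))
    scale = covar / var_model
    shift = mean_obs - scale * mean_model
    return scale, shift


def transform_model(kmer_model, scale, shift):
    """Apply the fitted shift and scale to every k-mer current level."""
    return {kmer: scale * level + shift for kmer, level in kmer_model.items()}


def upgrade_model(kmer_model, alphabet="ACGT"):
    """Seed a (k+1)-mer model by replicating each k-mer coefficient.

    Assumption: the extra base is appended on the right, so each new
    (k+1)-mer starts from the level of its k-base prefix.
    """
    return {kmer + base: level
            for kmer, level in kmer_model.items()
            for base in alphabet}


if __name__ == "__main__":
    # Hypothetical 3mer levels in pA, for illustration only.
    model = {"AAA": 50.0, "AAC": 60.0, "ACG": 75.0, "CGT": 42.0}
    control_kmers = ["AAA", "AAC", "ACG", "CGT"]
    # Simulated control-strand levels under a new run condition.
    observed = [1.1 * model[k] + 3.0 for k in control_kmers]
    scale, shift = fit_shift_scale([model[k] for k in control_kmers], observed)
    four_mer_seed = upgrade_model(transform_model(model, scale, shift))
    print(round(scale, 3), round(shift, 3), len(four_mer_seed))
```

In an iterative scheme such as the EM-style training described on the “Training process” slide, a seed like `four_mer_seed` would only be the starting point; the replicated levels are expected to diverge once the model is re-estimated from data.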