SIMDEQ™: a novel approach to combined genetic & epigenetic analysis
Jimmy Ouellet1, Fatima Hamouri1, Laurène Giraut2, Gordon Hamilton1,2, Charles André2, Jean-François Allemand1,
David Bensimon1, Vincent Croquette1
1 Laboratoire de Physique Statistique, ENS, 24 rue Lhomond, 75005 Paris;
2 PicoSeq SAS, 74 rue Lecourbe, 75015 Paris
ABSTRACT
PicoSeq is developing a novel platform for the analysis of DNA and RNA in collaboration with the lab of Vincent Croquette at the École
Normale Supérieure in Paris. This platform, known as SIMDEQ™ (short for Single-Molecule Magnetic DEtection & Quantification),
interrogates individual DNA/RNA molecules tethered to micron-sized paramagnetic beads. The molecules are attached to the floor of a
flow cell and manipulated in a magnetic field. This approach, which is simple, accurate, and has the potential to be run at very high
throughput, can be used to map and ultimately to fully sequence DNA or RNA molecules. Because no sample amplification is needed,
SIMDEQ can also directly detect base-modifications such as 5-methylcytosine. We have chosen the human FMR1 gene as a
model system to demonstrate the capabilities of the SIMDEQ system. The FMR1 locus (associated with Fragile X Syndrome and several
other diseases) is difficult to characterize at the molecular level because of its long GC-rich repeats, complex methylation patterns and
mosaicism. We have developed specific analytical protocols to address the challenges presented by FMR1 and other similar loci.
INTRODUCTION
Here we present data from our SIMDEQ platform that demonstrate robust and accurate
genotyping and epityping on single molecules of DNA. Specifically, we share results
from our analysis of the 5’ UTR of the FMR1 gene (a challenging, highly GC-rich genomic
region) and from the detection of 3-methylcytosine, a base-modification that can currently only be
detected using low-resolution immunoprecipitation. In contrast to most current NGS
methods, our approach is not based on detecting the
incorporation of fluorescent nucleotides but on tracking the length of DNA hairpins with
single-base precision (see Figures 1A to 1C below). These hairpins can contain fragments
of interest varying in length from a few nucleotides to 20kb+.
The SIMDEQ™ bench-top prototype
EXPERIMENTAL PRINCIPLE
[Figure 1 panels: bead extension (nm) versus time, with the open-hairpin and closed-hairpin states marked; 40 nm marker]
Figure 1A. The DNA hairpin is the central structure underpinning the SIMDEQ approach, as it can be repeatedly opened and closed. DNA
hairpins containing a region of interest are attached to the floor of a flow-cell by one arm and to a paramagnetic bead by the other. Using a
magnet that is moveable along the z-axis, a variable force can be applied to the beads. With the magnet in the upper position, a nominal force is
exerted on each bead (left panel). Lowering the magnet increases the force pulling on the beads. When the applied force exceeds 10-15 pN,
the DNA hairpins “unzip” (middle panel). When the magnet moves back to the upper position, the hairpins reform. This open-close
process can be repeated many thousands of times. Importantly, the entire process can be monitored by tracking the z-position of the
beads in real time. This is shown in the right panel: the bead begins in the closed position (extension = 0 nm), is subsequently
opened (extension ~80 nm) and is then allowed to close again. The extension of the open hairpin allows the total length of the DNA molecule to be
determined (in this case about 80 nucleotides).
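The length estimate described in Figure 1A can be sketched in a few lines of Python. This is only an illustration, not the SIMDEQ analysis software: the function name, the threshold, the example trace and the conversion factor of 1 nm per nucleotide (taken from the "80 nm, about 80 nucleotides" example in the caption) are all assumptions.

```python
from statistics import median

# Assumed conversion factor, read off the caption's example
# (an opening step of ~80 nm for ~80 nucleotides); hypothetical.
NM_PER_NUCLEOTIDE = 1.0


def estimate_length(extension_nm, open_threshold_nm):
    """Estimate hairpin length from one open/close cycle of a bead trace.

    Samples at or above the threshold are treated as the open state,
    the rest as the closed state. The step between the two median
    levels is converted to nucleotides.
    """
    open_z = [z for z in extension_nm if z >= open_threshold_nm]
    closed_z = [z for z in extension_nm if z < open_threshold_nm]
    if not open_z or not closed_z:
        raise ValueError("trace must contain both open and closed samples")
    step_nm = median(open_z) - median(closed_z)
    return step_nm / NM_PER_NUCLEOTIDE


# Invented trace: closed, opened by lowering the magnet, closed again.
trace = [0.4, -0.3, 0.1, 79.6, 80.2, 80.5, 79.9, 0.2, -0.1]
length = estimate_length(trace, open_threshold_nm=40.0)
print(f"estimated length: {length:.1f} nucleotides")
```

Medians are used rather than means so that a few samples caught mid-transition do not shift the estimated levels.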
Figure 1B. Sequence information can be generated by blocking hairpin closure with hybridizing oligonucleotides. While
hairpins are in their unzipped state, oligonucleotides are able to hybridize to their complementary sequences (as shown in orange
in the left panel). When the force on the beads is reduced, the hairpins start to reform but bound oligonucleotides temporarily
block the rezipping of the hairpin (center panel). These hybridization events are detectable as pauses in the movement of the bead
as the hairpin goes from open to closed (right panel). The duration of these pauses is largely dependent on the length of the
oligonucleotides. Typically we use oligonucleotides that bind for an average of 0.5 to 2 s. Our method for tracking bead
z-position is very precise, allowing us to detect the binding position of each oligonucleotide with single-base precision. Thus for
each pause we can determine both the underlying sequence and its position.
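The pauses described in Figure 1B could be picked out of a closing trace with a simple plateau detector. The sketch below is one possible approach under stated assumptions, not the method used on the platform: the function name, the tolerance, the minimum duration and the sample trace are hypothetical.

```python
def find_pauses(times_s, extension_nm, open_nm, closed_nm=0.0,
                tolerance_nm=1.0, min_duration_s=0.2):
    """Return (start_time_s, duration_s, mean_extension_nm) for each pause.

    A pause is a run of consecutive samples that stay within
    tolerance_nm of the first sample of the run, lasts at least
    min_duration_s, and is neither the fully open nor the fully
    closed level. Duration is measured from the first to the last
    sample of the run.
    """
    pauses = []
    start = 0
    n = len(extension_nm)
    for i in range(1, n + 1):
        # Close the current run at the end of the trace or when the
        # bead moves away from the level at which the run started.
        if i == n or abs(extension_nm[i] - extension_nm[start]) > tolerance_nm:
            duration = times_s[i - 1] - times_s[start]
            segment = extension_nm[start:i]
            level = sum(segment) / len(segment)
            is_end_state = (abs(level - open_nm) <= tolerance_nm
                            or abs(level - closed_nm) <= tolerance_nm)
            if duration >= min_duration_s and not is_end_state:
                pauses.append((times_s[start], duration, level))
            start = i
    return pauses


# Invented closing trace sampled every 0.1 s: open at ~80 nm,
# blocked at ~45 nm by a bound oligonucleotide, then closed.
times_s = [round(0.1 * i, 1) for i in range(15)]
extension_nm = [80.0, 80.1, 60.0, 45.2, 44.9, 45.1, 45.0, 44.8,
                45.1, 45.0, 20.0, 0.1, 0.0, -0.1, 0.0]
pauses = find_pauses(times_s, extension_nm, open_nm=80.0)
print(pauses)
```

Each reported level is an extension in nm; converting it to a position in bases requires a separate calibration.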
Characterization of FMR1 repeats: a challenging task is greatly simplified with SIMDEQ
Figure 1C. A wide range of base-modifications can be detected by blocking hairpin closure with
antibodies. As with oligonucleotides, antibodies specific to base-modifications are able to bind to
their antigens when hairpins are in their open state, and block hairpin re-zipping. A wide range of
antibodies (both monoclonal and polyclonal) against base-modifications are commercially available.
We have tested a range of these antibodies and have shown that we can accurately determine the
presence and location of many different base-modifications. Recording the binding kinetics of
individual antibody binding events allows us to accurately discriminate real from false-positive
binding events.
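One way to use binding kinetics to separate real from false-positive events is to keep only blocking positions that recur across cycles and have a sufficiently long mean dwell time. The sketch below illustrates that idea only. The bin width, both thresholds and the event list are invented, and the criteria actually applied on the platform are not stated in this poster.

```python
from collections import defaultdict


def filter_binding_sites(events, n_cycles, bin_bp=5,
                         min_fraction=0.2, min_mean_dwell_s=0.3):
    """Keep binding positions that are both frequent and long-lived.

    events is a list of (position_bp, dwell_s) pairs pooled over
    n_cycles open/close cycles. Events are grouped into bins of
    bin_bp base pairs. A bin is kept if it was hit in at least
    min_fraction of cycles (assuming at most one event per bin per
    cycle) and its mean dwell time is at least min_mean_dwell_s.
    Returns sorted (mean_position_bp, fraction, mean_dwell_s) tuples.
    """
    bins = defaultdict(list)
    for position_bp, dwell_s in events:
        bins[int(position_bp // bin_bp)].append((position_bp, dwell_s))

    sites = []
    for members in bins.values():
        fraction = len(members) / n_cycles
        mean_dwell = sum(d for _, d in members) / len(members)
        if fraction >= min_fraction and mean_dwell >= min_mean_dwell_s:
            mean_position = sum(p for p, _ in members) / len(members)
            sites.append((mean_position, fraction, mean_dwell))
    return sorted(sites)


# Invented events from 10 cycles: a recurring, long-lived site near
# 612 bp, a frequent but very short-lived site at 450 bp, and a
# single stray event at 300 bp.
events = [(611, 1.0), (612, 0.8), (612, 1.2), (613, 0.9), (612, 1.1),
          (611, 1.0), (450, 0.05), (450, 0.05), (450, 0.05), (300, 0.05)]
sites = filter_binding_sites(events, n_cycles=10)
print(sites)
```

Fixed-width bins can split a site that straddles a bin edge, so a real analysis would more likely cluster the positions instead.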
A wide range of base modifications can be detected with SIMDEQ
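[Figure 2A schematic labels: FMR1 gene, CpG island, CGG repeat region with interspersed AGG repeats, conserved regions, variable region, Oligos 1-4]
[Figure 3 labels: (A) 400 bp and 600 bp fragments, synthetic oligos, Biotin (B), Dig and Me marks; (B) 5-mC, 5-hmC, 5-caC, 6-mA, 3-mC]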
Figure 3. (A) A series of synthetic hairpins were generated, each containing a specific epigenetic modification. They were all constructed by ligating two fragments
isolated from plasmid DNA (of 400 and 600 base-pairs in length, each end having a unique overhang), together with two synthetic oligonucleotides, one containing
the desired modification. (B) Chemical structures of base modifications which can now be robustly detected using SIMDEQ. Below we provide data from the analysis
of 3mC modifications.
Sequencing and base modification detection can be performed on the same single molecules of DNA: an example of 3mC
Figure 2. Measurement of the number of CGG repeats in the 5’ UTR of FMR1, and determination of the presence of
interspersed AGG repeats. (A) Schematic representation of the human FMR1 gene. Specific FMR1 gene-mapping oligos
(numbered 1-4) are spaced along the length of the gene in conserved regions flanking the repeats. The distance between oligos
in the conserved region is used as a reference to determine the number of bases located between the two oligos directly
flanking the repeats. (B) Left panel: mapping data from two hairpins with 23 CGG repeats (top) and 29 CGG repeats (bottom).
The region surrounding the repeats was amplified from gDNA obtained from a mix of normal individuals and cloned into
bacteria. Two clones were selected and their repeats analyzed by Sanger sequencing (data not shown). Right panel: a repeat was
sized as in the left panel with 4 mapping oligos (top) and subsequently probed with an oligo specific for interspersed AGG repeats.
This mapping data was confirmed by Sanger sequencing (data not shown). (C) Analysis of a number of hairpins derived from
bacterial colonies carrying 23 (blue) and 29 (red) repeats, sized as described in panel A. The distribution of actual repeat
sizes is centered on the expected size of the repeat, but shows considerable variation (+/-3 repeats). This is consistent with
other reports of instability of these repeats when cloned into E. coli (e.g. Ref. 2).
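The repeat-sizing arithmetic described in panel A can be sketched as follows. The function name and every number in the example are hypothetical, chosen only to make the calculation easy to follow. In particular, the number of non-repeat bases between the flanking oligos and the repeats is an assumed input that would come from the reference sequence.

```python
def count_repeats(ref_distance_nm, ref_distance_bp,
                  flank_distance_nm, flank_non_repeat_bp,
                  repeat_unit_bp=3):
    """Estimate the number of trinucleotide repeats between two oligos.

    ref_distance_nm / ref_distance_bp: measured and known distance
        between two oligos in the conserved region (the reference).
    flank_distance_nm: measured distance between the two oligos that
        directly flank the repeats.
    flank_non_repeat_bp: known non-repeat bases between those oligos.
    """
    # Scale factor from the conserved-region reference pair.
    nm_per_bp = ref_distance_nm / ref_distance_bp
    # Convert the flanking distance to bases, then remove the
    # conserved sequence that lies between the oligos and the repeats.
    flank_distance_bp = flank_distance_nm / nm_per_bp
    repeat_bp = flank_distance_bp - flank_non_repeat_bp
    return round(repeat_bp / repeat_unit_bp)


# Invented example: 100 bp of conserved sequence measures 95 nm,
# the flanking oligos are 117.8 nm apart, and 37 of the bases
# between them are not part of the repeat tract.
n_repeats = count_repeats(ref_distance_nm=95.0, ref_distance_bp=100,
                          flank_distance_nm=117.8, flank_non_repeat_bp=37)
print(n_repeats)
```

Rounding to the nearest whole repeat assumes the measurement error is well below one repeat unit (3 bp), which is in line with the single-base precision described for the method.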
DISCUSSION
In our analysis of the FMR1 locus with SIMDEQ, we demonstrate that complex and GC-rich
repeats can be easily analyzed with a single, rapid hybridization experiment. Although the
results showed some heterogeneity, we are confident that this is due to the
instability of these repeats when cloned into E. coli and not to our analytical approach.
Future experiments will examine FMR1 molecules isolated directly from human DNA.
As well as performing high-resolution genotyping, the SIMDEQ platform
allows users to map a very wide range of base-modifications on the same unamplified DNA
molecules, without the need for any conversion chemistries. Indeed, virtually any modified
DNA structure can be investigated by simply generating a suitable binding molecule. Future
work will focus on generating new binders for interesting targets such as additional
methylated bases and various forms of DNA damage.
CONCLUSIONS
PicoSeq’s SIMDEQ™ platform enables:
• Interrogation of DNA fragments ranging from a few base-pairs to 20kb+
• Rapid and accurate analysis of repetitive, GC-rich regions, such as the FMR1 locus. This approach can be easily expanded to other repetitive loci
• The detection of a wide range of base-modifications on unamplified genomic DNA
[Figure 4 panels: (A) extension (nm) versus time, with blocking positions O1, O2, O3 and the 3mC antibody (3mC Ab) marked; (B) cumulative binding versus extension (base pairs), with Oligo 1, Oligo 2, Oligo 3 and the calculated position of 3mC = 612 bp]
Figure 4. Detection of 3-methylcytosine (3mC) modifications using a commercial antibody. (A) A 1kb hairpin containing a single 3-methylcytosine modification
(produced as described in Figure 3A) was analyzed with a polyclonal antibody for 3-methylcytosine (Diagenode, Ref. 3) and 3 reference oligos. Opening/closing cycles
were performed (as described in Figure 1C) and the bead position graphs for all cycles were then superimposed. Blocking positions for oligos are indicated by blue
arrows, and the position of the antibody binding site is indicated by the orange arrow. Note that there are a few nonspecific binding events (infrequent and/or of short
duration), which are probably due to antibodies in the polyclonal mix with poor specificity or affinity. These events can be easily filtered out of the analysis. (B) A
histogram of the binding positions was produced from the opening/closing cycles of Figure 4A. Because the sequences of the oligonucleotides and their complementary
sequences on the hairpin were known, the extension value in nm could be converted into base pairs using the oligos as reference points. The expected positions of the
oligonucleotides are represented by the rectangular bars. Once aligned, it was possible to determine the position, in base pairs, of the blockage due to the antibody,
which in this case is 612bp (expected position was 614bp).
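The conversion from nm to base pairs described in panel B can be illustrated with a straight-line fit through the reference oligos. This is a minimal sketch that assumes a linear relationship between extension and position. The oligo positions and extensions below are invented for the arithmetic and are not the measured values behind Figure 4.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical reference points: measured blocking extension (nm) of
# three oligos and their known positions on the hairpin (bp).
oligo_extension_nm = [190.0, 475.0, 807.5]
oligo_position_bp = [200, 500, 850]

bp_per_nm, offset_bp = fit_line(oligo_extension_nm, oligo_position_bp)

# Convert the (hypothetical) antibody blocking extension to base pairs.
antibody_extension_nm = 581.4
antibody_bp = bp_per_nm * antibody_extension_nm + offset_bp
print(f"antibody blockage at ~{antibody_bp:.0f} bp")
```

With three or more reference oligos, the residuals of the fit also give a rough check on how well the linear assumption holds across the hairpin.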
REFERENCES
1. Single-molecule mechanical identification and sequencing. (2012) Ding F, Manosas M, Spiering MM, Benkovic SJ, Bensimon D, Allemand JF, Croquette V. Nat Methods. Mar 11;9(4):367-72
2. Sequencing the un-sequenceable: expanded CGG-repeat alleles of the Fragile X gene. (2013) Loomis EW, Eid JS, Peluso P, Yin J, Hickey L, Rank D, McCalmon S, Hagerman RJ, Tassone F, Hagerman PJ. Genome Res. Jan;23(1):121-8
3. http://www.diagenode.com/media/catalog/file/Datasheet_3-mC_C15410209.pdf