DNA Computing: A Research Snapshot
Lila Kari

A research snapshot
• Adleman’s 20-variable 3-SAT experiment
• DNA Benenson automata
• DNA memory
• Towards a programmable DNA computer
• DNA nanoscale shapes
• DNA nanomachines
• Impact on theoretical computer science

(1) Adleman’s 20-variable 3-SAT [Braich et al., Science, 2002]
• The first experiment to demonstrate that DNA computing devices can exceed the computational power of an unaided human
• The answer to the problem was found after an exhaustive search of more than 1 million possible solution candidates

Input to 3-SAT and solution

Algorithm for 3-SAT
• Input: A Boolean formula in 3CNF
• Step 1: Generate the set of all possible truth value assignments
• Step 2: Remove all truth value assignments that make the first clause false
• Step 3: Repeat Step 2 for every other clause of the input formula
• Output: The remaining (if any) truth value assignments

Encoding the input
• Every variable xk, k = 1,..., 20, was associated with two distinct 15-mer DNA single strands called ‘value sequences’, one representing true and one representing false

Library of candidates
• Each of the 2^20 possible truth assignments was represented by a 300-mer ‘library strand’ consisting of the ordered catenation of one 15-mer value sequence for each variable, i.e., W1 W2 .....
W20, where Wi is Xi^T or Xi^F
• To obtain these library strands, the 40 individual 15-mer sequences were assembled using a mix-and-match combinatorial technique

3-SAT wetware
• A glass ‘library module’ filled with a gel containing the library
• One glass ‘clause module’ for each of the 24 clauses of the formula
• Each clause module was filled with gel containing probes, i.e., 15-mer strands Watson-Crick complementary to the value sequences of the truth assignments that make that particular clause true

Bioalgorithm for 3-SAT
• The strands are moved between modules by gel electrophoresis
• The library passes through the first clause module, wherein library strands containing any of the 3 truth assignments satisfying the first clause are immobilized, while library strands that do not satisfy it pass into a buffer reservoir
• The captured strands are released by raising the temperature, and used as input to the 2nd clause module, etc.
• At the end, only the strands representing truth assignments that satisfy all 24 clauses remain

Output of 3-SAT
• The output was PCR-amplified with primer pairs corresponding to all 4 possible true-false combinations of assignments for the first and last variable, x1 and x20
• None except the primer pair (X1^F, WK(X20^F)) showed any bands, indicating two truth values of the satisfying assignment, x1 = F and x20 = F
• The process was repeated for all variable pairs (x1, xk), k = 2,..., 19

(2) DNA Benenson Automata [Benenson et al., Nature 2001]
• Construct a simple two-state automaton over a two-letter alphabet, using double-stranded DNA molecules and restriction enzymes

Main engine of Benenson automata
• FokI enzyme: an unusual restriction enzyme that recognizes a sequence and cuts nonspecifically a short distance away
• Recognition site
5’-GGATG-3’
3’-CCTAC-5’
• Cleaves 9 bp away on the top strand and 13 bp away on the bottom strand

Encoding the input
• Encoding of the symbol a
• Encoding of the symbol b
• Encoding of the terminator t

Example of encoding the input
• The
input strand ab is encoded as a DNA strand that contains the site for FokI, followed by the catenation of the encodings for abt

Encoding state/symbol pairs
• The pair S0a is encoded as 5’-GGCT-3’ (the 4-mer suffix of a)
• The pair S1a is encoded as 5’-CTGG-3’ (the 4-mer prefix of a)
• Meaning: If the 4-mer suffix of the encoded symbol is detected, the symbol is interpreted as being read in state S0; if the 4-mer prefix is detected, the symbol is interpreted as being read in state S1

Output detection molecules
• S0-D is a 161-mer DNA double strand with an overhang 3’-AGCG-5’, which ‘detects’ the last state of the computation as being S0
• S1-D is a 251-mer DNA double strand with an overhang 3’-ACAG-5’, which detects the last state of the computation as being S1

8 possible transition molecules
• Each transition molecule has a 4-mer overhang, for example T1 has 3’-CCGA-5’, that can selectively bind to the DNA encoding the current state/symbol pair, in this case S0a

Example computation on input ab

Computation on input ab
• FokI enzyme cuts the input encoding abt, exposing the sticky end 5’-GGCT-3’, i.e., S0a
• The transition molecule T1: S0a -> S0 detects this state/symbol pair by binding to it, and ligase joins the two into a single double-stranded molecule
• Note: The transition molecule T1, now incorporated in the current molecule, contains a FokI restriction site. Moreover, the 3 bp spacer after the site ensures that the next cleavage will expose a suffix of the next symbol, which will be correctly interpreted as S0b

Computation on input ab, contd.
• The overhang is now 5’-CAGC-3’, i.e., S0b
• The sticky end fits the transition rule T4: S0b -> S1
• The combination of the current strand with T4 and ligase leads to another double strand
• A last use of FokI exposes the overhang 5’-TGTC-3’, which is the 4-mer prefix of the terminator, interpreted as S1t

Outcome of computation on input ab
• The overhang is complementary to the sticky end 3’-ACAG-5’ of the detector molecule S1-D, corresponding to the last state of the computation being S1
• The state S1 is not final, and thus the outcome of the computation is that the input ab is not accepted by the automaton
• Note that any two-state two-symbol automaton can be built using this method

Application of Benenson automata
• Medical diagnosis and treatment: smart drugs [Benenson et al., Nature 2004]
• Automaton to identify and analyze the mRNA of disease-related genes associated with lung and prostate cancer, and produce a single-stranded DNA molecule modelled after an anti-cancer drug

(3) DNA Memory
• Information-encoding density
• [Reif et al., DNA7, 2002]: DNA has the potential of storing information on the order of 10^12 times more compactly than conventional storage technologies
• [Baum, Science, 1995]: content-addressable DNA memory vastly larger than the brain

Nested Primer Molecular Memory [Yamamoto et al., 2008]
• NPMM = pool of strands wherein each strand codes both data information and address information
[CLi, BLj, ALk, DATA, ARq, BRr, CRs]
• Here i, j, k, q, r, s are between 0 and 15, and each component, e.g., CL0, represents a 20-mer DNA sequence

How to retrieve data
• Use nested PCR consisting of 3 steps
• First PCR uses primer pair (CLi, WK(CRs)), where WK(x) denotes the Watson-Crick complement of x; this results in amplification of all molecules starting with CLi and ending in CRs
• Second PCR uses primer pair (BLj, WK(BRr))
• Third PCR uses primer pair (ALk, WK(ARq))
• Sequencing will result in retrieval of the DATA

Advantages of NPMM memory
• Enormous address space: 16 choices for each of the 6 address blocks, i.e., 16^6 or about 16.8 million addresses
• High
specificity
• Proper selection of DNA sequences avoids mutation during PCR

Organic DNA memory
• [Wong, Wong, Foote, Comm. ACM, 2003]
• [Yachie et al., Biotechnology Progress, 2007]
• Memory technology using living organisms
• First paper proposes a candidate for a living host for DNA memory sequences that tolerates the addition of artificial gene sequences and survives extreme environmental conditions

Organic memory
• Use Escherichia coli and Deinococcus radiodurans (the latter can survive extreme conditions including cold, dehydration, vacuum, acid and radiation)
• Information encoding stage: an encoding scheme was chosen that assigned 3-mer sequences to various symbols. For example: AAA = “0”, AAC = “1”, AGG = “A”

Information encoding
• Each of the encoding 3-mers contained only 3 of the 4 DNA nucleotides
• Using this encoding, any English text could be codified as a DNA sequence
• The text chosen for this experiment was “And the oceans are wide”
• Several additional sequences were chosen to act as sentinels and tag the beginning and end of messages

Choosing sentinel sequences
• Identify a set of twenty-five 20-mer sequences that do not exist in either genome, yet satisfy all the genomic constraints and restrictions
• All sequences contained multiple stop codons TAA, TGA, TAG as subsequences, to prevent the cell from misinterpreting the memory strands and translating them into artificial proteins that could kill the bacteria

Inserting the message
• A 46 bp DNA sequence was created, consisting of two different 20 bp sentinels connected by a 6 bp recognition site of an enzyme
• The embedded DNA was then inserted into cloning vectors and transferred into E. coli, allowing the vector to multiply
• The vector and encoded DNA were then incorporated into the genome of Deinococcus for permanent storage and retrieval

Advantages of organic memory
• Message can be retrieved using prior knowledge of sequences at both borders, by PCR, read-out and decoding
• 1 ml of liquid can contain up to
10^9 bacteria
• A potential disadvantage is random mutation, but this is unlikely given the natural cellular mechanisms for detecting and correcting errors

(4) Towards a programmable DNA computer
• [Sakamoto et al., Science, 2000]
• [Hagiya et al., DNA3, 1997]
• A self-acting DNA molecule containing, on the same strand, the input, the program, and the working memory
• Whiplash PCR

Whiplash PCR
• The 5’ end of the DNA single strand contains state transitions A -> B, encoded as DNA rule blocks: WK(B) - WK(A) - stopper sequence
• The 3’ end of the strand contains the encoding of the “current state”, say A

Whiplash PCR transition A -> B
• Step (i): Cooling the solution leads the 3’ end of the DNA strand, A, to attach to its corresponding rule block, namely to WK(A)
• Step (ii): PCR is used to extend the now-attached end A by the encoding of the new state B, and the extension is halted by the stopper sequence
• Step (iii): By raising the temperature, the new current state B is detached, and a new transition cycle can begin

(5) DNA nanoscale shapes [Rothemund, Nature, 2006]
• ‘Scaffolded DNA origami’ for fabrication of any 2D shape of 100 nm diameter
• Technique: DNA strands form complex structures by design, which makes it possible for some single DNA strands to participate in two double helices: they wind along one helix, then switch to another

DNA origami design process
• (1) Build an approximate geometric model of the desired shape; the shape is approximated by cylinders that are models of DNA double helices
• (2) Fill the shape by folding a single long ‘scaffold strand’ back and forth in a raster pattern, such that at each point the scaffold strand represents either the main strand or the complement strand of a double helix
• (3) Use a computer program to generate a set of ‘staple strands’ that provide Watson-Crick complements of the scaffold
• The staple strands are designed to bind to portions of the scaffold strand,
holding it together in the desired shape
• The staple strands are fine-tuned to minimize strain and optimize binding specificity and binding energy

Testing DNA origami
• Scaffold = circular genomic DNA, 7,249 nt long, from the virus M13mp18
• Use 250 short staple strands and mix with the scaffold, in 10-fold excess to it
• The strands annealed in less than two hours, and AFM (Atomic Force Microscopy) imaging showed that the desired shape was realized
• Results: Assembly of squares, triangles, five-pointed stars, smiley faces

(6) DNA nanomachines
• Dynamic DNA structures with potential use in nanofabrication, engineering and computation
• DNA-based nanodevices can convert static DNA structures into machines that can move or change conformation
• Examples: tweezers, walkers that can be moved along a track, autonomous molecular motors

Molecular tweezers [Yurke et al., Nature, 2000]
• Two partially double-stranded DNA arms connected by a short single DNA strand acting as a flexible hinge
• The resulting structure is in the shape of a pair of open tweezers
• A ‘set strand’ is designed in such a way as to be complementary to both single-stranded ‘tails’ at the ends of the arms
• Adding the ‘set strand’ results in its annealing to both tails of the arms, thus bringing the arms of the tweezers together in a ‘closed’ configuration
• A short region of the set strand remains single-stranded and is used as a toehold that allows a new ‘reset strand’ to strip the set strand from the arms by itself hybridizing with the set strand; the tweezers are thereby returned to the ‘open’ configuration

Molecular walker [Shin, Pierce, JACS, 2004]
• DNA device with two distinguishable feet that walks directionally on a linear DNA track with single strands periodically protruding from it and acting as anchors
• The walker is double-stranded and has two single-stranded extensions acting as ‘legs’
• Specific attachments bind the legs to the
single-stranded anchors placed periodically along the double-stranded track

Molecular walker step
• A step requires the sequential addition of two strands: the first lifts the back foot from the track by strand displacement (a process by which an invading DNA single strand can displace one of the constituent strands of a double strand by replacing it with itself, provided the new structure is more stable)
• The second strand places the released foot ahead of the stationary foot

Other molecular walkers
• [Sherman, Seeman, Nano Letters, 2004]: walking devices modelled on inchworms, in which the front foot steps forward and the back foot catches up
• [Sekiguchi et al., DNA13, 2008]: Autonomous three-legged walker (no need for fuel strands) that can walk autonomously in 2D or 3D on a designed route. It uses an enzyme as a source of power and a track of DNA equipped with many DNA anchors arranged in a specific pattern

(7) DNA Computing: Impact on Theoretical Computer Science
• The genetic code
• Splicing systems
• Optimal encodings for DNA Computing
• Sticker systems
• Watson-Crick automata
• Combinatorics on DNA words
• Cellular computing
• DNA computation by self-assembly

1953: Watson and Crick discover DNA structure

The RNA Tie Club
• 1954: “Solve the riddle of the RNA structure and to understand how it builds proteins” (clockwise from upper left: Francis Crick, L. Orgel, James Watson, Al.
Rich)
• There are 20 amino acids that build up proteins

The Diamond Code
• G. Gamow: double-stranded DNA acts as a template for protein synthesis; various combinations of bases could form distinctively shaped cavities into which the side chains of amino acids might fit

Comma-Free Codes (the prettiest wrong idea in all of 20th-century science)
• The RNA piglet model
• Suckling-pig model of protein synthesis
• Construct a code in which, when two sense codons (triplets) are catenated, the subword codons that straddle the boundary are nonsense codons
• If CGU and AAG are sense codons, then GUA and UAA must be nonsense because they appear in CGUAAG

Comma-free codes (Crick 1957)
• How many words can a comma-free code include?
• For n = 4 and k = 3 the size of a maximal comma-free code is the magic number 20
• For an alphabet of n letters grouped into k-letter words, if k is prime, a comma-free code contains at most (n^k - n)/k words; for n = 4 and k = 3 this gives (4^3 - 4)/3 = 20
• For n = 4 and k = 3 the number of distinct maximal comma-free codes is 408

Reality Intrudes
• News from the lab bench: [Nirenberg, Matthaei ’61] synthesize RNA, namely poly-U, coding for phenylalanine
• By 1965 the genetic code was solved
• The code resembled none of the theoretical notions
• The “extra” codons are merely redundant

The Genetic Code

Splicing Systems (Head 1987)
5’-CCCCCTCGACCCCC-3’
3’-GGGGGAGCTGGGGG-5’
+
5’-AAAAAGCGCAAAAA-3’
3’-TTTTTCGCGTTTTT-5’
+
Enzyme 1: 5’-TCGA-3’ / 3’-AGCT-5’
Enzyme 2: 5’-GCGC-3’ / 3’-CGCG-5’

Splicing Systems
5’-CCCCCT   CGACCCCC-3’
3’-GGGGGAGC   TGGGGG-5’
+
5’-AAAAAG   CGCAAAAA-3’
3’-TTTTTCGC   GTTTTT-5’
• DNA strands with compatible sticky ends recombine to produce two new strands

Splicing operation

Splicing system sample results
Theorem (Paun ’95; Freund, Kari, Paun ’99) Every type-0 language can be generated by a splicing system with finitely many axioms and finitely many rules.
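The splicing operation pictured on the Splicing Systems slides can be sketched on strings. This is a minimal illustration, not the formal definition from the cited papers: it assumes each double strand is represented by its top strand only, and that a rule is a hypothetical 4-tuple (u1, u2, u3, u4) meaning "cut the first word between u1 and u2, cut the second between u3 and u4, then swap the tails". The function name `splice` and the tuple layout are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical rule format: (u1, u2, u3, u4).
# x = x1 u1 u2 x2 and y = y1 u3 u4 y2 recombine into
# x1 u1 u4 y2 and y1 u3 u2 x2.
Rule = Tuple[str, str, str, str]


def splice(x: str, y: str, rule: Rule) -> List[Tuple[str, str]]:
    """Return every pair of recombinants of x and y under the given rule."""
    u1, u2, u3, u4 = rule
    site_x, site_y = u1 + u2, u3 + u4
    results = []
    i = x.find(site_x)
    while i != -1:
        j = y.find(site_y)
        while j != -1:
            # Cut x right after u1 and y right after u3.
            x_head, x_tail = x[: i + len(u1)], x[i + len(u1):]
            y_head, y_tail = y[: j + len(u3)], y[j + len(u3):]
            results.append((x_head + y_tail, y_head + x_tail))
            j = y.find(site_y, j + 1)
        i = x.find(site_x, i + 1)
    return results


# Top strands from the slide; enzyme 1 cuts T|CGA, enzyme 2 cuts G|CGC,
# so both leave the same CG sticky end.
x = "CCCCCTCGACCCCC"
y = "AAAAAGCGCAAAAA"
rule = ("T", "CGA", "G", "CGC")
print(splice(x, y, rule))
```

Under these assumptions the two recombinants are CCCCCT joined to CGCAAAAA and AAAAAG joined to CGACCCCC, matching the cut positions drawn on the slide.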
Theorem (Freund, Kari, Paun ’99) For every given alphabet T there exists a splicing system, with finitely many axioms and finitely many rules, that is universal for the class of systems with terminal alphabet T.

Encoding Information for DNA Computing
• DNA strands should form desired bonds
• DNA strands should be free of undesirable intra-molecular bonds
• DNA strands should be free of undesirable inter-molecular bonds

Intramolecular Bonds

Intra- and inter-molecular bonds

DNA-complementarity model (Kari, Kitto, Thierrin ’02)

Bond-free languages

Bonds between DNA strands

Sample Results (Hussini, Kari, Konstantinidis, Losseva, Sosik ’03)

Sticker Systems (Freund, Paun, Rozenberg, Salomaa ’98; Kari, Paun, Rozenberg, Salomaa, Yu ’98; Hoogeboom, van Vugt ’00; Kuske, Weigel ’04; Paun, Rozenberg ’98)
• Given a complementarity relation, define an alphabet of double-stranded columns

Sticking operation

Complex Sticker Systems
• Sakakibara, Kobayashi ’01: Sticker systems based on hairpins
• Alhazov, Cavaliere ’05: Observable sticker systems

Watson-Crick Automata (Freund, Paun, Rozenberg, Salomaa ’99; Paun, Rozenberg ’98; Martin-Vide, Paun, Rozenberg, Salomaa ’98; Czeizler, Czeizler ’06; Paun, Paun ’99; Czeizler, Czeizler, Kari, Salomaa ’08)

Combinatorics on DNA Words
• IDEA: Consider the word w and its WK-complement, WK(w), as equivalent
• The word ACTG CAGT CAGT can be considered repetitive (periodic) because it can be written as ACTG WK(ACTG)^2
• Generalize classical notions
such as power of a word, border, primitive word, palindrome, conjugacy, commutativity
• Identity => Antimorphic involution f
• Pseudo-palindrome (de Luca, De Luca ’06; Kari, Mahalingam ’09): u = f(u)
• Pseudo-commutativity (Kari, Mahalingam ’08): u v = f(v) u
• Pseudo-bordered word (Kari, Mahalingam ’07): w = v x = y f(v)
• Pseudoknot-bordered word (Kari, Seki ’09): w = u v x = y f(u) f(v)
• Pseudo-conjugacy of u, v (Kari, Mahalingam ’08): u x = f(x) v

Fine and Wilf Theorem

Extended Fine and Wilf Theorem

Lyndon-Schützenberger Equation

Extended Lyndon-Schützenberger

Our Challenge
• Discover a new, broader notion of computation
• Understand the world around us in terms of information processing
• “Biology and Computer Science – life and computation – are related. I am confident that at their interface great discoveries await those who seek them.” (Adleman ’98)
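As a concrete companion to the Combinatorics on DNA Words slides, the following minimal Python sketch takes the antimorphic involution f to be the Watson-Crick map WK (complement each base, then reverse). The function names `wk`, `is_pseudo_palindrome` and `is_pseudo_power_of` are illustrative choices for this sketch, not notation from the cited papers.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def wk(word: str) -> str:
    """Watson-Crick involution: complement every base and reverse the word."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


def is_pseudo_palindrome(u: str) -> bool:
    """u is a pseudo-palindrome when u = f(u), here with f = wk."""
    return u == wk(u)


def is_pseudo_power_of(word: str, u: str) -> bool:
    """True if word is a catenation of blocks, each equal to u or wk(u)."""
    n = len(u)
    if n == 0 or len(word) % n != 0:
        return False
    blocks = (word[i:i + n] for i in range(0, len(word), n))
    return all(block in (u, wk(u)) for block in blocks)


# wk is antimorphic, wk(uv) = wk(v) wk(u), and an involution, wk(wk(u)) = u.
print(wk("ACTG"))
print(is_pseudo_palindrome("ACGT"))
print(is_pseudo_power_of("ACTGCAGTCAGT", "ACTG"))
```

With these definitions WK(ACTG) is CAGT, so the slide's example word ACTG CAGT CAGT is a pseudo-power of ACTG, while ACGT equals its own WK-complement and is therefore a pseudo-palindrome.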