Combinatorics on words in DNA computing

Combinatorics on words in DNA computing
’. Starosta
17. & 20. 5.2011
1 / 29
Outline
Outline
1
DNA computing
Introduction
DNA molecule
The rst DNA computation
Operations in DNA computing
DNA computing revisited
What to avoid in a test tube
2
Combinatorics on Words
Setting in CoW
Results
More results
If there is some time left...
2 / 29
DNA computing
Combinatorics on Words
Outline
1
DNA computing
Introduction
DNA molecule
The rst DNA computation
Operations in DNA computing
DNA computing revisited
What to avoid in a test tube
2
Combinatorics on Words
Setting in CoW
Results
More results
If there is some time left...
3 / 29
DNA computing
Combinatorics on Words
Introduction
barriers for traditional computers
1
HUP
2
von Neumann bottleneck
DNA/biomolecular computers: 1994 Leonard Adleman
4 / 29
DNA computing
Combinatorics on Words
Introduction
barriers for traditional computers
1
HUP
2
von Neumann bottleneck
DNA/biomolecular computers: 1994 Leonard Adleman
4 / 29
DNA computing
Combinatorics on Words
DNA - basics
deoxyribonucleic acid
1950s: DNA carries genetic information (double helix model)
structure: polymer chains - strands consisting of bases attached to
sugar phosphate backbone
4 bases: A, G, T, C
Watson-Crick complementarity: A likes T, C likes G
bases are at a distance of 3, 4 ångströms, resulting in 18 Mbits per
inch
5 / 29
DNA computing
Combinatorics on Words
DNA - basics
deoxyribonucleic acid
1950s: DNA carries genetic information (double helix model)
structure: polymer chains - strands consisting of bases attached to
sugar phosphate backbone
4 bases: A, G, T, C
Watson-Crick complementarity: A likes T, C likes G
bases are at a distance of 3, 4 ångströms, resulting in 18 Mbits per
inch
5 / 29
DNA computing
Combinatorics on Words
DNA - basics
deoxyribonucleic acid
1950s: DNA carries genetic information (double helix model)
structure: polymer chains - strands consisting of bases attached to
sugar phosphate backbone
4 bases: A, G, T, C
Watson-Crick complementarity: A likes T, C likes G
bases are at a distance of 3, 4 ångströms, resulting in 18 Mbits per
inch
5 / 29
DNA computing
Combinatorics on Words
DNA - basics
deoxyribonucleic acid
1950s: DNA carries genetic information (double helix model)
structure: polymer chains - strands consisting of bases attached to
sugar phosphate backbone
4 bases: A, G, T, C
Watson-Crick complementarity: A likes T, C likes G
bases are at a distance of 3, 4 ångströms, resulting in 18 Mbits per
inch
5 / 29
DNA computing
Combinatorics on Words
DNA - basics
deoxyribonucleic acid
1950s: DNA carries genetic information (double helix model)
structure: polymer chains - strands consisting of bases attached to
sugar phosphate backbone
4 bases: A, G, T, C
Watson-Crick complementarity: A likes T, C likes G
bases are at a distance of 3, 4 ångströms, resulting in 18 Mbits per
inch
5 / 29
DNA computing
Combinatorics on Words
DNA - basics
deoxyribonucleic acid
1950s: DNA carries genetic information (double helix model)
structure: polymer chains - strands consisting of bases attached to
sugar phosphate backbone
4 bases: A, G, T, C
Watson-Crick complementarity: A likes T, C likes G
bases are at a distance of 3, 4 ångströms, resulting in 18 Mbits per
inch
5 / 29
DNA computing
Combinatorics on Words
Structure of DNA
6 / 29
DNA computing
Combinatorics on Words
Watson-Crick complementarity and WK-palindromes
7 / 29
DNA computing
Combinatorics on Words
Hamiltonian Path Problem
Leonard Adleman in 1994: Hamiltonian Path Problem
(NP-complete) on a graph with 7 vertices
8 / 29
DNA computing
Combinatorics on Words
Adleman's DNA computation I
9 / 29
DNA computing
Combinatorics on Words
Adleman's DNA computation I
9 / 29
DNA computing
Combinatorics on Words
Adleman's DNA computation II
24 Earth masses of DNA
200-node instance would require 10
10 / 29
DNA computing
Combinatorics on Words
Adleman's DNA computation II
24 Earth masses of DNA
200-node instance would require 10
10 / 29
DNA computing
Combinatorics on Words
The likely frame
11 / 29
DNA computing
Combinatorics on Words
Manipulation with DNA
synthesis
denaturing, annealing and ligation
separation - anity purication
detect
gel electrophoresis
PCR - polymerase chain reaction
cutting (using restriction enzymes)
12 / 29
DNA computing
Combinatorics on Words
Very general model of a DNA calculation
1. select the language - code
2. do the computation
3. decode
13 / 29
DNA computing
Combinatorics on Words
Advantages and drawbacks
Advantages:
Size: the information density could go up to 1 bit per cube nm
9 calculations per ml of DNA per second
High parallelism: 10
Energy eciency: 1019 operations per Joule
Drawbacks:
required mass of DNA
reagents
need of manipulation by a
human
...
14 / 29
DNA computing
Combinatorics on Words
Advantages and drawbacks
Advantages:
Size: the information density could go up to 1 bit per cube nm
9 calculations per ml of DNA per second
High parallelism: 10
Energy eciency: 1019 operations per Joule
Drawbacks:
required mass of DNA
reagents
need of manipulation by a
human
...
14 / 29
DNA computing
Combinatorics on Words
What problems can be solved by a DNA computer
Problems solved that can be found in literature:
TSP - travelling salesman problem
addition
SAT - satisability problem
DES cracking
maximal clique problem
...
15 / 29
DNA computing
Combinatorics on Words
Intramolecular hybridization - hairpins
16 / 29
DNA computing
Combinatorics on Words
Intermolecular hybridization
17 / 29
DNA computing
Combinatorics on Words
Outline
1
DNA computing
Introduction
DNA molecule
The rst DNA computation
Operations in DNA computing
DNA computing revisited
What to avoid in a test tube
2
Combinatorics on Words
Setting in CoW
Results
More results
If there is some time left...
18 / 29
DNA computing
Combinatorics on Words
Representing DNA strands
0
DNA strands are considered in their 5
words over
→ 30
orientation as nite
∆ = { A, G , C , T }
involutive antimorphism
WK: A ↔ T , C ↔ G
an encoding (i.e. initial set of words) for a computation is a
language L
⊂ ∆∗
19 / 29
DNA computing
Combinatorics on Words
Representing DNA strands
0
DNA strands are considered in their 5
words over
→ 30
orientation as nite
∆ = { A, G , C , T }
involutive antimorphism
WK: A ↔ T , C ↔ G
an encoding (i.e. initial set of words) for a computation is a
language L
⊂ ∆∗
19 / 29
DNA computing
Combinatorics on Words
Representing DNA strands
0
DNA strands are considered in their 5
words over
→ 30
orientation as nite
∆ = { A, G , C , T }
involutive antimorphism
WK: A ↔ T , C ↔ G
an encoding (i.e. initial set of words) for a computation is a
language L
⊂ ∆∗
19 / 29
DNA computing
Combinatorics on Words
Representing DNA strands
0
DNA strands are considered in their 5
words over
→ 30
orientation as nite
∆ = { A, G , C , T }
involutive antimorphism
WK: A ↔ T , C ↔ G
an encoding (i.e. initial set of words) for a computation is a
language L
⊂ ∆∗
19 / 29
DNA computing
Combinatorics on Words
Questions
How to choose the initial coding set L?
1
simulation
2
algorithm
3
theory
Bio-operations vs. L: after each bio-operation the language
changes - do the properties hold?
20 / 29
DNA computing
Combinatorics on Words
Questions
How to choose the initial coding set L?
1
simulation
2
algorithm
3
theory
Bio-operations vs. L: after each bio-operation the language
changes - do the properties hold?
20 / 29
DNA computing
Combinatorics on Words
Constraints on
L
to avoid possible hybridizations, we impose several constraints on L
L is
Θ-k -m-subword code if
∀u ∈ A , 1 ≤ i ≤ m, A∗ u A Θ(u )A∗ ∩ L = ∅
k
L is
Θ-k -code
i
if
L is bond-free if
∀u , v ∈ A ∩ L, u 6= Θ(v )
k
∀u , v ∈ A ∩ L, H (u , Θ(v )) > d ,
k
where H is
the Hamming distance
frequency of C and G in each word should be
1
2
...
21 / 29
DNA computing
Combinatorics on Words
Constraints on
L
to avoid possible hybridizations, we impose several constraints on L
L is
Θ-k -m-subword code if
∀u ∈ A , 1 ≤ i ≤ m, A∗ u A Θ(u )A∗ ∩ L = ∅
k
L is
Θ-k -code
i
if
L is bond-free if
∀u , v ∈ A ∩ L, u 6= Θ(v )
k
∀u , v ∈ A ∩ L, H (u , Θ(v )) > d ,
k
where H is
the Hamming distance
frequency of C and G in each word should be
1
2
...
21 / 29
DNA computing
Combinatorics on Words
Constraints on
L
to avoid possible hybridizations, we impose several constraints on L
L is
Θ-k -m-subword code if
∀u ∈ A , 1 ≤ i ≤ m, A∗ u A Θ(u )A∗ ∩ L = ∅
k
L is
Θ-k -code
i
if
L is bond-free if
∀u , v ∈ A ∩ L, u 6= Θ(v )
k
∀u , v ∈ A ∩ L, H (u , Θ(v )) > d ,
k
where H is
the Hamming distance
frequency of C and G in each word should be
1
2
...
21 / 29
DNA computing
Combinatorics on Words
Constraints on
L
to avoid possible hybridizations, we impose several constraints on L
L is
Θ-k -m-subword code if
∀u ∈ A , 1 ≤ i ≤ m, A∗ u A Θ(u )A∗ ∩ L = ∅
k
L is
Θ-k -code
i
if
L is bond-free if
∀u , v ∈ A ∩ L, u 6= Θ(v )
k
∀u , v ∈ A ∩ L, H (u , Θ(v )) > d ,
k
where H is
the Hamming distance
frequency of C and G in each word should be
1
2
...
21 / 29
DNA computing
Combinatorics on Words
Constraints on
L
to avoid possible hybridizations, we impose several constraints on L
L is
Θ-k -m-subword code if
∀u ∈ A , 1 ≤ i ≤ m, A∗ u A Θ(u )A∗ ∩ L = ∅
k
L is
Θ-k -code
i
if
L is bond-free if
∀u , v ∈ A ∩ L, u 6= Θ(v )
k
∀u , v ∈ A ∩ L, H (u , Θ(v )) > d ,
k
where H is
the Hamming distance
frequency of C and G in each word should be
1
2
...
21 / 29
DNA computing
Combinatorics on Words
Constraints on
L
to avoid possible hybridizations, we impose several constraints on L
L is
Θ-k -m-subword code if
∀u ∈ A , 1 ≤ i ≤ m, A∗ u A Θ(u )A∗ ∩ L = ∅
k
L is
Θ-k -code
i
if
L is bond-free if
∀u , v ∈ A ∩ L, u 6= Θ(v )
k
∀u , v ∈ A ∩ L, H (u , Θ(v )) > d ,
k
where H is
the Hamming distance
frequency of C and G in each word should be
1
2
...
21 / 29
DNA computing
Constraints on
Combinatorics on Words
L
...
22 / 29
DNA computing
Combinatorics on Words
Conjugates and
Θ-conjugates
Proposition (Lyndon and Schützenberger, 1962)
∈ A∗ such that uv = vw . Then there exist p , q ∈ A+
u = pq, w = qp and v = p (qp ) for i ≥ 0.
Let u , v , w
such that
i
Proposition (Kari and Mahalingam, 2007)
Let u , v , w
∈ A+
such that uv
morphism or antimorphism
1
where
Θ
If Θ is an antimorphism, then there exist x , y
either
u
u
2
= Θ(v )w ,
over A.
If
u
is an involutive
∈ A∗
such that
= xy , w = y Θ(x ) and v = Θ(x ); or
= x = Θ(w ), y = Θ(y ) and v = y .
Θ is a morphism, then there exist x , y ∈ A∗
= xy and one of the following hold:
w
= y Θ(x )
and v
w
= Θ(y )x
and v
such that
i
= Θ(xy )xy Θ(x ) for some i ≥ 0;
= Θ(xy )xy Θ(xy )x for some i ≥ 0.
i
23 / 29
DNA computing
Combinatorics on Words
Conjugates and
Θ-conjugates
Proposition (Lyndon and Schützenberger, 1962)
∈ A∗ such that uv = vw . Then there exist p , q ∈ A+
u = pq, w = qp and v = p (qp ) for i ≥ 0.
Let u , v , w
such that
i
Proposition (Kari and Mahalingam, 2007)
Let u , v , w
∈ A+
such that uv
morphism or antimorphism
1
where
Θ
If Θ is an antimorphism, then there exist x , y
either
u
u
2
= Θ(v )w ,
over A.
If
u
is an involutive
∈ A∗
such that
= xy , w = y Θ(x ) and v = Θ(x ); or
= x = Θ(w ), y = Θ(y ) and v = y .
Θ is a morphism, then there exist x , y ∈ A∗
= xy and one of the following hold:
w
= y Θ(x )
and v
w
= Θ(y )x
and v
such that
i
= Θ(xy )xy Θ(x ) for some i ≥ 0;
= Θ(xy )xy Θ(xy )x for some i ≥ 0.
i
23 / 29
DNA computing
Combinatorics on Words
Commutativity and
Θ-commutativity
Proposition (Lyndon and Schützenberger, 1962)
∈ A∗ such that
= p and w = p
Let u , v
that u
i
j
= vu. Then there exist p ∈ A+ such
for i , j > 0 (i.e. it has cyclic solution).
uv
Proposition (Kari and Mahalingam, 2007)
Let u , v
∈ A+
such that uv
= Θ(v )u,
A.
where
Θ
is an involutive
morphism or antimorphism over
1
Θ is an antimorphism, then there exist x , y ∈ A∗ such that
u = x (yx ) , v = (yx ) , x = Θ(x ), y = Θ(y ), m ≥ 1 and
n ≥ 0.
If Θ is a morphism, then there exists x ∈ A+ such that:
If
n
2
x
u
m
i
j
= Θ(x ), u = x and v = x for
= x (Θ(x )x ) and v = (x Θ(x ))
i
j
, ≥ 1;
≥0
some i j
for some i
and j
≥ 1.
24 / 29
DNA computing
Combinatorics on Words
Commutativity and
Θ-commutativity
Proposition (Lyndon and Schützenberger, 1962)
∈ A∗ such that
= p and w = p
Let u , v
that u
i
j
= vu. Then there exist p ∈ A+ such
for i , j > 0 (i.e. it has cyclic solution).
uv
Proposition (Kari and Mahalingam, 2007)
Let u , v
∈ A+
such that uv
= Θ(v )u,
A.
where
Θ
is an involutive
morphism or antimorphism over
1
Θ is an antimorphism, then there exist x , y ∈ A∗ such that
u = x (yx ) , v = (yx ) , x = Θ(x ), y = Θ(y ), m ≥ 1 and
n ≥ 0.
If Θ is a morphism, then there exists x ∈ A+ such that:
If
n
2
x
u
m
i
j
= Θ(x ), u = x and v = x for
= x (Θ(x )x ) and v = (x Θ(x ))
i
j
, ≥ 1;
≥0
some i j
for some i
and j
≥ 1.
24 / 29
DNA computing
Set of
Combinatorics on Words
Θ-palindromes
Proposition
The set of all
Θ-palindromes (Θ
being an involutive antimorphism)
is not regular.
Lemma (Pumping lemma for regular languages)
Let L be a regular language. Then there exists an integer p
depending only on L such that every w
be written as w
1
|y | ≥ 1,
2
|xy | ≥ p,
3
for all i
= xyz
≥ 0,
xy i z
∈L
≥1
of length at least p can
satisfying the following conditions:
∈ L.
Proposition
The set of all
Θ-palindromes
is context-free.
25 / 29
DNA computing
Set of
Combinatorics on Words
Θ-palindromes
Proposition
The set of all
Θ-palindromes (Θ
being an involutive antimorphism)
is not regular.
Lemma (Pumping lemma for regular languages)
Let L be a regular language. Then there exists an integer p
depending only on L such that every w
be written as w
1
|y | ≥ 1,
2
|xy | ≥ p,
3
for all i
= xyz
≥ 0,
xy i z
∈L
≥1
of length at least p can
satisfying the following conditions:
∈ L.
Proposition
The set of all
Θ-palindromes
is context-free.
25 / 29
DNA computing
Set of
Combinatorics on Words
Θ-palindromes
Proposition
The set of all
Θ-palindromes (Θ
being an involutive antimorphism)
is not regular.
Lemma (Pumping lemma for regular languages)
Let L be a regular language. Then there exists an integer p
depending only on L such that every w
be written as w
1
|y | ≥ 1,
2
|xy | ≥ p,
3
for all i
= xyz
≥ 0,
xy i z
∈L
≥1
of length at least p can
satisfying the following conditions:
∈ L.
Proposition
The set of all
Θ-palindromes
is context-free.
25 / 29
DNA computing
Combinatorics on Words
Untitled frame
Proposition
Let
be either an involutive morphism or antimorphism and let
Θ
∈ A+ such that wu = Θ(u )v and w Θ(u ) = uv . Then
w = (xy ) , y = (yx ) and u = (xy ) x for some x , y ∈ A∗ ,
x = Θ(x ), y = Θ(y ), m ≥ 1, n ≥ 0.
w, v, u
m
m
n
Proposition
Let u
∈A
be a non-empty word such that u
6= Θ(u ).
Then the
following statements are equivalent
1
2
√
u is the product of two non-empty
There exists a non-empty
Θ-commutes
3
Θ-palindromes.
Θ-palindrome
v such that
with u.
u is a product of two non-empty
Θ-palindromes.
26 / 29
DNA computing
Combinatorics on Words
Untitled frame
Proposition
Let
be either an involutive morphism or antimorphism and let
Θ
∈ A+ such that wu = Θ(u )v and w Θ(u ) = uv . Then
w = (xy ) , y = (yx ) and u = (xy ) x for some x , y ∈ A∗ ,
x = Θ(x ), y = Θ(y ), m ≥ 1, n ≥ 0.
w, v, u
m
m
n
Proposition
Let u
∈A
be a non-empty word such that u
6= Θ(u ).
Then the
following statements are equivalent
1
2
√
u is the product of two non-empty
There exists a non-empty
Θ-commutes
3
Θ-palindromes.
Θ-palindrome
v such that
with u.
u is a product of two non-empty
Θ-palindromes.
26 / 29
DNA computing
Combinatorics on Words
Second untitled frame
Proposition
Let
Θ
be either a morphic or an antimorphic involution and let
be such that for all a
∈ Σ,
a
Σ
6= Θ(a).
(...)
27 / 29
DNA computing
Combinatorics on Words
References
Lila Kari
28 / 29
Thank you.
29 / 29