How To Use Traveling Salesman Problem to Solve Physical Mapping Problem
Problem Definition

Definition of physical mapping
[Figure: a DNA molecule with markers A to Q along its length, and a fragment of DNA]
Why We Need Physical Mapping

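The map can be used to sequence the DNA completely
We can learn how genes actually act on humans
Artificial proteins and similar tools can be used to improve hereditary traits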
Human chromosome (about 10^8 bp)
Physical map (about 10^6 bp)
DNA sequencing (about 10^3 bp): AGACTAGTCGTAACGATCGCTAATTTAAGGCTACT.....
Why We Need Physical Mapping (continued)

The approximate positions of genes (or markers) can be determined
This gives more information about certain hereditary diseases
It can help detect whether a hereditary disease is present
[Figure: a DNA molecule with markers A to Q and a fragment of DNA labeled α; an enzyme is added to the target DNA]
Partial Digest Problem
• by a single enzyme A
• restriction sites: $a_1 < a_2 < a_3 < \cdots < a_p$
• multiset of fragment lengths $\{a_j - a_i : i < j\}$
[Figure: target DNA]
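The multiset of fragment lengths can be computed directly from a list of restriction sites. The sketch below is only an illustration: the site positions are hypothetical and the function name `partial_digest_multiset` is not from the slides.

```python
from itertools import combinations

def partial_digest_multiset(sites):
    """Return the multiset {a_j - a_i : i < j} as a sorted list."""
    ordered = sorted(sites)
    # combinations() yields each pair (a_i, a_j) with i < j exactly once.
    return sorted(b - a for a, b in combinations(ordered, 2))

# Hypothetical cut positions (not taken from the slides).
sites = [0, 2, 4, 7, 10]
lengths = partial_digest_multiset(sites)
print(lengths)
```

The partial digest problem runs in the opposite direction: given only `lengths`, recover `sites`.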
Double Digest Problem (DDP)


Clones are completely digested first by enzyme A, then by B, and finally by A and B together
restriction sites:
by A: $a_1 < a_2 < a_3 < \cdots < a_p$
by B: $b_1 < b_2 < b_3 < \cdots < b_q$
by A+B: $c_1 < c_2 < c_3 < \cdots < c_{p+q}$
Reconstruct the restriction sites from these multisets of fragment lengths
Example: DDP
Fragment lengths
Enzyme A:   3, 6, 8, 10
Enzyme B:   4, 5, 7, 11
Enzyme A+B: 1, 2, 3, 3, 5, 6, 7
Solution
[Figure: an ordering of the fragments on the target DNA]
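For an instance this small, a solution can be found by exhaustive search. The sketch below assumes the fragment lengths of the example are A = 3, 6, 8, 10; B = 4, 5, 7, 11; A+B = 1, 2, 3, 3, 5, 6, 7 (each set sums to 27). The brute-force method and all function names are illustrative assumptions, not the method of the slides.

```python
from itertools import permutations

def cut_sites(fragments):
    """Cut positions implied by laying the fragments end to end."""
    sites, pos = [], 0
    for length in fragments[:-1]:  # the far end of the molecule is not a cut
        pos += length
        sites.append(pos)
    return sites

def double_digest(order_a, order_b):
    """Fragment lengths obtained when both sets of cuts are applied."""
    total = sum(order_a)
    cuts = sorted(set(cut_sites(order_a)) | set(cut_sites(order_b)))
    bounds = [0] + cuts + [total]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]

def solve_ddp(frags_a, frags_b, frags_ab):
    """Return one (order_a, order_b) consistent with frags_ab, or None."""
    target = sorted(frags_ab)
    for order_a in permutations(frags_a):
        for order_b in permutations(frags_b):
            if sorted(double_digest(order_a, order_b)) == target:
                return order_a, order_b
    return None

A = [3, 6, 8, 10]
B = [4, 5, 7, 11]
AB = [1, 2, 3, 3, 5, 6, 7]
solution = solve_ddp(A, B, AB)
print(solution)
```

The search tries every pair of orderings, so it is only practical for a handful of fragments. Solutions are generally not unique: reversing both orderings always gives another one.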
By Probe Approach
target DNA: ................. ATGCGCTAACTGGACTTCAAGCCTAAACTGCATCAGACTT ........
Complementary probe: TACGCGATTGACCTGAAGT
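The pairing between the probe and the target can be checked base by base. This is a minimal sketch using the two sequences shown above. It ignores strand orientation and compares the sequences in the left-to-right order in which they are drawn. The function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(seq):
    """Base-by-base complement, in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq)

def hybridization_sites(target, probe):
    """Start indices where the probe pairs with the target at every base."""
    wanted = complement(probe)
    return [i for i in range(len(target) - len(wanted) + 1)
            if target.startswith(wanted, i)]

target = "ATGCGCTAACTGGACTTCAAGCCTAAACTGCATCAGACTT"
probe = "TACGCGATTGACCTGAAGT"
sites = hybridization_sites(target, probe)
print(sites)
```

A probe that matches at exactly one position of the target behaves as a unique marker.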
The Spirit of Hybridization
[Figure: target DNA covered by clones A to J, with probes 1 to 5 hybridizing to them; the resulting clone-probe hybridization matrix is shown for several clone orders, including J D F I E G A C H B and its reverse B H C A G E I F D J]
False Negative
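[Figure: six clones and the probes that hybridize to them. Error-free sets: A, B, C; C, D, E; E, F; F, G; G, H, I; I, J, K. Observed sets: A, C; C, D, E; E, F; A, F, G; G, H, I; E, F, I, J, K.]
A false negative is a missed hybridization: the clone containing A, B, C is observed as A, C, so probe B is lost.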
False Positive
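[Figure: six clones and the probes that hybridize to them. Error-free sets: A, B, C; C, D, E; E, F; F, G; G, H, I; I, J, K. Observed sets: A, C; C, D, E; E, F; A, F, G; G, H, I; E, F, I, J, K.]
A false positive is a hybridization reported in error: the clone containing F, G is observed as A, F, G, so probe A is spurious.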
Chimeric Clones
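[Figure: six clones and the probes that hybridize to them. Error-free sets: A, B, C; C, D, E; E, F; F, G; G, H, I; I, J, K. Observed sets: A, C; C, D, E; E, F; A, F, G; G, H, I; E, F, I, J, K.]
A chimeric clone joins two separate regions of the DNA: the observed set E, F, I, J, K combines the probes of E, F and I, J, K.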
Clones and their probes: 1 = I, J, K; 2 = G, H, I; 3 = E, F; 4 = F, G; 5 = A, B, C; 6 = C, D, E

Hybridization matrix (rows: probes, columns: clones)
    1 2 3 4 5 6
A   0 0 0 0 1 0
B   0 0 0 0 1 0
C   0 0 0 0 1 1
D   0 0 0 0 0 1
E   0 0 1 0 0 1
F   0 0 1 1 0 0
G   0 1 0 1 0 0
H   0 1 0 0 0 0
I   1 1 0 0 0 0
J   1 0 0 0 0 0
K   1 0 0 0 0 0
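The matrix can be rebuilt from the clone sets, and a proposed clone order can be checked by testing whether every probe row has its 1s in one consecutive block. This sketch assumes the clone numbering 1 = I, J, K; 2 = G, H, I; 3 = E, F; 4 = F, G; 5 = A, B, C; 6 = C, D, E, as read from the matrix columns. The order 5, 6, 3, 4, 2, 1 was found by inspection, and the function names are illustrative.

```python
clones = {1: "IJK", 2: "GHI", 3: "EF", 4: "FG", 5: "ABC", 6: "CDE"}
probes = "ABCDEFGHIJK"

def hybridization_matrix(clones, probes, clone_order):
    """Rows are probes, columns are clones in the given order."""
    return [[1 if p in clones[c] else 0 for c in clone_order] for p in probes]

def blocks_of_ones(row):
    """Number of maximal runs of 1s in a row."""
    return sum(1 for i, v in enumerate(row)
               if v == 1 and (i == 0 or row[i - 1] == 0))

def has_consecutive_ones(matrix):
    """True if no row has its 1s split into more than one block."""
    return all(blocks_of_ones(row) <= 1 for row in matrix)

natural = hybridization_matrix(clones, probes, [1, 2, 3, 4, 5, 6])
reordered = hybridization_matrix(clones, probes, [5, 6, 3, 4, 2, 1])
print(has_consecutive_ones(natural), has_consecutive_ones(reordered))
```

With error-free data, a column order giving consecutive 1s in every row is a candidate physical order of the clones.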
Clones and their probes: 1 = I, J, K; 2 = G, H, I; 3 = E, F, K; 4 = I, J, K, F, G; 5 = A, B, C; 6 = C, D, E

Hybridization matrix (rows: probes, columns: clones)
    1 2 3 4 5 6
A   0 0 0 0 1 0
B   0 0 0 0 0 0
C   0 0 0 0 1 1
D   0 0 0 0 0 1
E   0 0 1 0 0 1
F   0 0 1 1 0 0
G   0 1 0 1 0 0
H   0 1 0 0 0 0
I   1 1 0 1 0 0
J   1 0 0 1 0 0
K   1 0 1 1 0 0
How To Use Traveling Salesman Problem to Solve Physical Mapping Problem
How to Convert to TSP?

Hamming distance
[Figure: the clone-probe hybridization matrix for clones A to J and probes 1 to 5, and the triangular matrix of pairwise Hamming distances between A to J]
Cycle weight = number of gap transitions + 2n
So minimizing the cycle weight minimizes the number of gaps
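The relation between cycle weight and gaps can be checked numerically on a small matrix. The sketch below makes several assumptions that the slides do not state: n is taken to be the number of probe rows, each gap is counted as two transitions, and a dummy all-zero column (labeled 0) closes the cycle. The data are the six error-free clones 1 = I, J, K; 2 = G, H, I; 3 = E, F; 4 = F, G; 5 = A, B, C; 6 = C, D, E, and the exhaustive search stands in for a real TSP solver.

```python
from itertools import permutations

clones = {1: "IJK", 2: "GHI", 3: "EF", 4: "FG", 5: "ABC", 6: "CDE"}
probes = "ABCDEFGHIJK"

# One 0/1 column per clone, plus a dummy all-zero column with label 0.
columns = {c: tuple(1 if p in members else 0 for p in probes)
           for c, members in clones.items()}
columns[0] = tuple(0 for _ in probes)

def hamming(u, v):
    """Number of positions where two columns differ."""
    return sum(a != b for a, b in zip(u, v))

def cycle_weight(order):
    """Weight of the tour dummy -> order[0] -> ... -> order[-1] -> dummy."""
    tour = [0] + list(order)
    return sum(hamming(columns[tour[i]], columns[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def gaps(order):
    """Total number of gaps: (blocks of 1s - 1) summed over the rows."""
    total = 0
    for r in range(len(probes)):
        row = [columns[c][r] for c in order]
        blocks = sum(1 for i, v in enumerate(row)
                     if v and (i == 0 or not row[i - 1]))
        total += max(blocks - 1, 0)
    return total

n = len(probes)
natural = (1, 2, 3, 4, 5, 6)
best = min(permutations(sorted(clones)), key=cycle_weight)
print(cycle_weight(natural), gaps(natural))
print(best, cycle_weight(best), gaps(best))
```

Under these assumptions, every row containing at least one 1 contributes two transitions plus two per gap, so the tour weight equals 2n + 2 × (number of gaps), and a minimum-weight tour is a gap-minimizing clone order.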
Our approach

We also convert it to an optimization problem:
F(A) = X*C(A) + Y*P(A) + Z*N(A) + T*M(A) + P*L(A)
[Equations: the weights X, Y, Z and T are each defined as the natural logarithm of a ratio of probabilities; the symbols inside the ratios did not survive extraction]
Using a more complicated model
Using a genetic algorithm to solve it
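A genetic algorithm over clone orders can be sketched as follows. This is a generic illustration, not the GA of the slides. The cost function is a plain gap count standing in for F(A), whose terms are not defined here. The operators (order crossover, swap mutation, keeping the better half), the parameter values, and the six-clone data set are all assumptions.

```python
import random

clones = {1: "IJK", 2: "GHI", 3: "EF", 4: "FG", 5: "ABC", 6: "CDE"}
probes = "ABCDEFGHIJK"

def gap_count(order):
    """Stand-in cost: gaps summed over probe rows (not the slides' F(A))."""
    total = 0
    for p in probes:
        row = [p in clones[c] for c in order]
        blocks = sum(1 for i, v in enumerate(row)
                     if v and (i == 0 or not row[i - 1]))
        total += max(blocks - 1, 0)
    return total

def order_crossover(p1, p2, rng):
    """Keep a slice of p1 in place, fill the rest in p2's order."""
    i, j = sorted(rng.sample(range(len(p1)), 2))
    middle = p1[i:j + 1]
    rest = [g for g in p2 if g not in middle]
    return rest[:i] + middle + rest[i:]

def mutate(order, rng):
    """Swap two positions."""
    i, j = rng.sample(range(len(order)), 2)
    order = list(order)
    order[i], order[j] = order[j], order[i]
    return order

def genetic_search(items, cost, generations=200, pop_size=30, seed=0):
    rng = random.Random(seed)
    population = [list(items)] + [rng.sample(items, len(items))
                                  for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=cost)
        survivors = population[:pop_size // 2]  # the best half always survives
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            child = order_crossover(p1, p2, rng)
            if rng.random() < 0.3:
                child = mutate(child, rng)
            children.append(child)
        population = survivors + children
    return min(population, key=cost)

best = genetic_search([1, 2, 3, 4, 5, 6], gap_count)
print(best, gap_count(best))
```

Because the best half of each generation is carried over unchanged, the best cost never gets worse from one generation to the next. A weighted objective could be swapped in by passing a different `cost` function.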
The results of our approach tested on simulated data
(a) The false negative rate is set to 0.1 and the false positive rate to 0.05.
(b) The false negative rate is set to 0.1 and the false positive rate to 0.01.
Experimental results of our GA tested on real data from chromosome 1
(a) Results of our GA on a contig with about 95 clones and about 120 probes.
(b) Results of our GA on a contig with about 172 clones and about 136 probes.