BIOINFORMATICS
Vol. 20 no. 14 2004, pages 2181–2188
doi:10.1093/bioinformatics/bth202
Probabilistic nucleotide assembling method for
sequencing by hybridization
Takaho A. Endo
Division of Molecular Life Sciences, Department of Genetic Information, School of
Medicine, Tokai University, Bouseidai, Isehara, Kanagawa, 259-1193, Japan
Received on July 14, 2003; revised on October 10, 2003; accepted on February 4, 2004
Advance Access publication April 8, 2004
ABSTRACT
Motivation: To develop a new method of assembling small
sequences obtained from sequencing by hybridization when the spectrum contains many
positive and negative faults. First, the assembly is interpreted as a
generic traveling salesman problem (i.e. finding the shortest route for visiting many cities), which is solved using genetic
algorithms. Second, positive errors are excluded before
assembly by a sanitization process.
Results: The present method outperforms those described in
previous studies, in terms of both time and accuracy.
Availability: http://kamit.med.u-tokai.ac.jp/~takaho/sbh/index.html
Contact: [email protected]
INTRODUCTION
Sequencing by hybridization (SBH) was proposed as a promising approach to reading DNA sequences in a short amount
of time (Bains and Smith, 1988; Lysov et al., 1988; Drmanac
et al., 1989).
However, due to an intrinsic problem (i.e. two types of errors
associated with nucleotide hybridization), SBH has been less
widely applicable for unknown sequences than its designers
had expected.
With the first type of error, a smaller spectrum (a set of SBH
probes) is observed than would be expected from the length of
the target sequences. The second type of error leads to a larger
spectrum than would be expected. The former type of error
is referred to as negative fault, and the latter as positive fault.
These errors are most likely due to particular experimental
conditions, such as the annealing temperature, or the structure
of the target sequences. Due to these two types of errors, it is
often difficult to assemble the fragments in the correct order.
Because hybridization can lead to errors, it is necessary
to assemble the fragments according to a probabilistic process. Each fragment can expand other sequences with a
certain probability. The probability that one sequence succeeds another increases with the amount of their overlap.
When we acknowledge that the problem of assembling SBH
fragments is one of maximizing likelihood, we see that
the problem becomes a variation of the traveling salesman problem (TSP), a classic combinatorial optimization problem
(Lawler et al., 1985).
Bioinformatics 20(14) © Oxford University Press 2004; all rights reserved.
Although Błażewicz et al. (1997) reported a TSP model for
SBH assembly, they simplified the distance as 0 or 1 from one
city (a node of the TSP) to other cities. That distance corresponds to the probability of extension in SBH. Their method
was restricted to problems with positive faults; when negative
faults were included, no succeeding probes could be proposed.
In the present study, their method is expanded into a generic
model in which the distance between cities corresponds to the
probability of extension of one sequence by another.
TSP is known to be a non-deterministic polynomial-time hard
(NP-hard) problem, and is difficult to solve exactly within a realistic
time period. While there are many heuristic ways to solve TSPs within
a realistic period, such as genetic algorithms (GAs) or simulated annealing, GAs were chosen for the present program
based on the ease with which such programs can be written.
GAs allow the calculations to be performed using a personal
computer, and they help in the construction of better solutions
for time-consuming problems (Goldberg, 1989).
Moreover, it was found that enumerating the patterns of subfragments (i.e. the fractions of observed fragments) reveals
which fragments were positive faults. Removal of positive-fault candidates before assembly made the SBH spectrum
virtually free of positive faults.
According to the algorithm given below, the computer program produced the correct assembly using simulated SBH
fragment sequences containing both positive and negative
faults. Almost all random target sequences, even those containing 30% positive and 25% negative faults, were reconstructed correctly. Sequences obtained from the GenBank
database containing 20% positive and 20% negative faults
were reconstructed much more accurately than has been
reported in previous works (Błażewicz et al., 1999, 2002).
ALGORITHM
Traveling salesman problem
When fragments i and j are aligned with fragment j succeeding fragment i (Fig. 1), $M^{s}_{i \to j}$ and $M^{d}_{i \to j}$ represent the number of identical and different bases, respectively. The position of the alignment is scanned and selected such that $M^{s}_{i \to j} - M^{d}_{i \to j}$ is maximized. The probability is then written as Equation (1). In the equation, q is defined as an arbitrary positive coefficient ($0 \le q \le 1$) corresponding to the similarity of overlapping regions, which can be cancelled out, given that it does not affect comparison among alignments.

$$p_{i \to j} \propto q^{M^{d}_{i \to j} - M^{s}_{i \to j}}. \tag{1}$$

[Figure 1: (A) alignment of Fragment 0 (TGGCTGAAGT), Fragment 1 (GGCTGTAGTA) and Fragment 2 (CTGTAGTACG), labeled $p_{0 \to 1} = C(9 - 1)$ and $p_{1 \to 2} = C(8 - 0)$; (B) diagram of the TSP analogy.]
Fig. 1. Assembly of SBH as a TSP application. (A) Successor probability was used in this study. Matching and unmatching bases are counted at all positions for two fragments, and the maximum value is set as a probability. C is a constant value. Colored bases are different (unmatching) from fragment 0. (B) TSP analogy of SBH assembly. Since the successor probability differs between a to b and b to a, each fragment has an asymmetric distance to/from the other one. The increment of the probability corresponds to the decrement of the distance between cities. Assembly among SBH fragments is interpreted as finding the shortest route among the cities.

The order of fragments is denoted as a vector n:

$$\mathbf{n} = (n_1, n_2, \ldots, n_N) \quad (n_i \neq n_j \text{ for } i \neq j).$$

The likelihood of the vector is then calculated as follows:

$$L(\mathbf{n}) = \prod_{i=1}^{N-1} p_{n_i \to n_{i+1}}. \tag{2}$$

Thus, SBH using the TSP is interpreted as a problem of finding a vector that maximizes the likelihood L(n) of assembly.
It is well known that the TSP is an NP-hard problem and that it is difficult to solve within a realistic period of time. Although the GAs applied here could reduce the necessary calculation time, it has been reported that GAs cannot provide optimum results when the number of cities exceeds one hundred. Therefore, it is necessary to reduce the number of fragments assembled by the GA–TSP.
Considering the whole tour of the TSP, the problem would become easier if some cities were clustered, and if the order of the cities in the cluster were already determined (subtours).
The conditions for judging whether or not two fragments, i and j, can be fused are described as follows:
(1) $p_{i \to j}$ exceeds the threshold probability $p_t$.
(2) The most probable predecessor of fragment i is the fragment j, and the most probable successor of fragment j is the fragment i.
In simulations of this report, $p_t$ was set as the probability when one site more than half the probe length matches: $p_t$ was the probability of a 6-base match for the spectrum of 10-base probes, and that of a 5-base match for an 8-base probe.
If condition 1 is met and condition 2 is not, then the head or the tail of the fragment is sealed and fusion cannot occur.
The fitness of each assembly was calculated by a logarithm of likelihood, thus avoiding underflow values in the computer program and resulting in a reduction in calculation time. Using a logarithm, we do not have to consider the coefficient q in Equation (1) to compare the fitness of GA individuals. The fitness function can be written as Equation (3), because the transition probability $p_{i \to j}$ [Equation (1)] is proportional to the difference between matched and unmatched bases:

$$F(\mathbf{n}) = \sum_{i=1}^{N-1} \log\left(p_{n_i \to n_{i+1}}\right) = \sum_{i=1}^{N-1} \min\left(M^{d}_{n_i \to n_{i+1}} - M^{s}_{n_i \to n_{i+1}}\right). \tag{3}$$

The difference between $M^{d}_{i \to j}$ and $M^{s}_{i \to j}$ was minimized over the overlapping position to maximize the succeeding probability for each pair.
After a given number of generations, the best individual is
proposed as the most likely assembly of the SBH fragments.
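As an illustration of Equations (1) to (3), the following Python sketch scores a pair of fragments by scanning every overlap position and keeping the largest difference between identical and different bases, and then sums the log-probabilities along an ordering. It is a minimal sketch, not the C++ program described in this report: the names overlap_score and log_fitness, the value q = 0.5 and the omission of the normalizing constant of Equation (1) are assumptions made for this example. The fragments are those shown in Fig. 1.

```python
import math


def overlap_score(a, b):
    """Return (score, shift) for fragment b succeeding fragment a.

    score is the largest value of (identical bases - different bases)
    over all overlap positions; shift is the offset of b relative to a.
    """
    best_score, best_shift = -len(a), None
    for shift in range(1, len(a)):
        n = min(len(a) - shift, len(b))  # length of the overlapping region
        same = sum(x == y for x, y in zip(a[shift:shift + n], b[:n]))
        diff = n - same
        if same - diff > best_score:
            best_score, best_shift = same - diff, shift
    return best_score, best_shift


def log_fitness(order, q=0.5):
    """Sum of log p over consecutive fragments, with p = q ** (diff - same).

    The normalizing constant is dropped (assumption), so only differences
    between orderings are meaningful. q must lie strictly between 0 and 1.
    """
    return sum(-overlap_score(a, b)[0] * math.log(q)
               for a, b in zip(order, order[1:]))


fragments = ["TGGCTGAAGT", "GGCTGTAGTA", "CTGTAGTACG"]  # from Fig. 1
forward = log_fitness(fragments)
backward = log_fitness(fragments[::-1])
print(overlap_score(fragments[0], fragments[1]))
print(forward > backward)
```

With these conventions a larger log-fitness corresponds to a more likely ordering, so the ordering of Fig. 1 scores higher than its reverse.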
[Figure 2: schematic of a sample sequence (....GCTGGATTACCCAAA.......AGATTACCTTTCA....) with its fragments marked as negative faults or positive faults, and the resulting misassembled sequence.]
Fig. 2. A typical case of misassembly with both positive and negative
faults in fragment extension. If the fragments of a region having a similar pattern in the target sequence (fragments B and C) are lost, and a
fragment from another region containing errors (C') exists, the preparation process fails to assemble the fragments in the correct order.
In this case, the correct successor, fragment D, is not assembled after
fragment A, but fragment C' is assembled. Consequently, fragment
E would succeed fragments A and C', and the resulting sequence
renders the whole assembly false.
Sanitization of the fragment spectrum
However robust the probabilistic method of SBH assembly
is against negative and positive faults, pretreatment of the
spectrum can produce bound fragments that are not included
in correct sequences.
Figure 2 shows how such cases can occur. When both negative and positive faults exist at proximal positions, and when
there is a similar pattern in the sequence, the sequences can
be misassembled in the pretreatment process. If the spectrum
is ideal, fragment A shares 9 bases with succeeding fragment B and 8 bases with fragment C. Suppose that fragments B
and C are missed (negative faults) and that the spectrum contains fragment C', which is similar to the correct successor
C and shares 8 bases with fragment A but contains positive
faults in that region. In such a case, pretreatment leads to
error.
It is necessary to remove such fragments containing errors.
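The situation of Fig. 2 can be reproduced with a short Python sketch. It uses the fragment sequences listed in the figure and a hypothetical helper, best_successor, that picks the fragment with the largest identical-minus-different overlap count. The function names and the scoring details are assumptions for illustration and are not taken from the program described in this report.

```python
def overlap_score(a, b):
    """Largest (identical - different) base count when b succeeds a."""
    best = -len(a)
    for shift in range(1, len(a)):
        n = min(len(a) - shift, len(b))
        same = sum(x == y for x, y in zip(a[shift:shift + n], b[:n]))
        best = max(best, 2 * same - n)  # same - (n - same)
    return best


def best_successor(fragment, spectrum):
    """Most probable successor of `fragment` among the other fragments."""
    others = [f for f in spectrum if f != fragment]
    return max(others, key=lambda f: overlap_score(fragment, f))


# Fragment sequences as listed in Fig. 2.
A, B, C, D = "CTGGATTACC", "TGGATTACCC", "GGATTACCCA", "GATTACCCAA"
C_PRIME, E = "GGATTACCTT", "GATTACCTTT"

ideal = [A, B, C, D, E]       # no faults around fragment A
faulty = [A, D, C_PRIME, E]   # B and C lost, C' present

print(best_successor(A, ideal))
print(best_successor(A, faulty))
print(best_successor(C_PRIME, faulty))
```

In the faulty spectrum, fragment A is extended by C' rather than by D, and C' is in turn extended by E, which is the misassembly described above.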
The next issue is how to detect error fragments before
assembling without first establishing the correct sequences.
Assume that approximately 500-base-long target sequences
are analyzed by 10-base probes. Six nucleotides can provide
many more patterns than the target sequences have. Therefore,
most of the 6-base patterns in the observed fragments take
unique positions in the targets.
A 6-base subfragment can take 5 positions in 10-base fragments. This means that we can find 5 fragments in an ideal
spectrum for each 6-base pattern, although the subfragments
in the head and tail fragments appear not five times, but one
to four times. Figure 3 shows the frequency with which each
subfragment is counted. The filled bars indicate the case of
the ideal spectrum. Frequencies of 5 and 10 have a large number of subfragments. Subfragments with frequencies of 10, 15
or 20 are derived from 6-base patterns that occur more than once in the target
sequence.
[Figure 2 panel, fragment sequences: A, CTGGATTACC; B, TGGATTACCC; C, GGATTACCCA; D, GATTACCCAA; C', GGATTACCTT; E, GATTACCTTT. Assembled sequence shown in the panel: ....GCTGGATTACCTTTCA....]
[Figure 3: bar chart. Horizontal axis: frequency of subfragments; vertical axis: the number of subfragments. Series: without errors (filled bars) and with 10% positive errors (pale bars).]
Fig. 3. Distribution of subfragment frequency to detect positive
faults. A random 560-base sequence was divided into 551 fragments of 10 bases and the number of 6-base subfragments was
counted (filled bars). Simulated positive faults were inserted into
the spectrum, comprising as much as 10% of the original spectrum;
the distribution was then recounted (pale bars). Fragments made by
positive faults had frequencies 1, 2, 6, 7 and 11.
When we consider positive faults, subfragments containing errors are not shared with neighbor fragments. Their
frequencies are limited to one or two. If they are identical
to other regions by chance, then this frequency can become
six or seven (Fig. 3). Therefore, we can assume that fragments containing subfragments with a frequency of one or
two are fragments containing errors or edge (head or tail)
fragments.
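The counting argument above can be sketched in Python as follows. The helper names (subfragment_counts and suspect_fragments), the 20-base toy target and the frequency threshold of two are assumptions chosen for illustration. The toy target was picked so that every 6-base pattern in it is unique, and the positive fault is a single-base substitution of one genuine fragment.

```python
from collections import Counter


def subfragment_counts(spectrum, k=6):
    """Count how often each k-base subfragment occurs across the spectrum."""
    counts = Counter()
    for frag in spectrum:
        for i in range(len(frag) - k + 1):
            counts[frag[i:i + k]] += 1
    return counts


def suspect_fragments(spectrum, k=6, max_freq=2):
    """Fragments holding at least one subfragment seen max_freq times or fewer."""
    counts = subfragment_counts(spectrum, k)
    return [frag for frag in spectrum
            if any(counts[frag[i:i + k]] <= max_freq
                   for i in range(len(frag) - k + 1))]


target = "ACGTTGCATCAGGATCCTAA"                  # toy target (assumption)
ideal = [target[i:i + 10] for i in range(len(target) - 9)]
fault = "GCATTAGGAT"   # target[5:15] with its fifth base changed from C to T
suspects = suspect_fragments(ideal + [fault])
print(suspects)
```

In this toy case the two head fragments, the two tail fragments and the simulated fault are flagged, while the seven interior fragments are kept. This matches the remark that low-frequency subfragments indicate either errors or edge fragments.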
Removing these fragments with errors before
assembly by GA–TSP renders the subsequent process much
easier, as well as more accurate.
In order to enhance the effectiveness of detecting and
removing positive faults, which will be referred to as sanitization, it is desirable to find faults in subfragments with a
frequency of six, seven and possibly five, because fewer than
five fragments might be sharing a subfragment due to negative
errors. We can then classify fragments sharing the same subfragments into groups in which some fragments are correct
and others contain errors.
Alignment of fragments having the same subfragments was
performed by an exhaustive alignment algorithm. When the
number of fragments with a particular pattern is six, it is
assumed that there is a set of correct fragments, as well as
one or two fragments containing an error. The number of
possible cases for classifying these fragments into two groups
(i.e. aligned and misaligned) is $2^6$. All cases were tested
and the alignment without errors was considered for each
group sharing a subfragment. When the frequency was 10 or
more but less than 15 and the number of groups was 3, the
number of cases undergoing the test would be
up to $3^{15}$.
After this sanitization process, all given fragments were
classified as fragments with errors or as subsequences from
the target sequences.
SIMULATION RESULTS
Computer environment and simulation conditions
The program for the assembly of SBH fragments was written
in standard ANSI C++. All code is available to academic
colleagues at my website.
The following simulations were performed on a computer equipped with an AMD Athlon 1800+ CPU, a Debian
GNU/Linux operating system and 512-MB memory. This
environment is equivalent to that of basic personal computers,
and my program completed assemblies within a short duration
of ∼24 s, with sanitization.
The parameters set for the GA were as follows: population size = 1000; mutation rate = 0.1 for rotation and
0.1 for replacement; and maximum generations = 2000.
GA iterations continued until a better route did not appear
for 300 generations, or when the generation reached a given
maximum value.
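The stopping rule described above can be sketched in Python as follows. This is only an illustrative skeleton under stated assumptions: the selection scheme (keeping the fitter half and one elite individual) and the two mutation operators (a cyclic rotation of a random slice and a swap of two positions) are stand-ins, because the rotation and replacement operators of the actual program are not specified in this section. The name run_ga and the toy fitness function are likewise hypothetical.

```python
import random


def run_ga(fitness, n, pop_size=1000, max_gen=2000, patience=300,
           p_rotate=0.1, p_swap=0.1, seed=0):
    """Evolve orderings of n items; stop after `patience` stale generations
    or after `max_gen` generations, whichever comes first."""
    rng = random.Random(seed)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    best = max(population, key=fitness)
    best_fit = fitness(best)
    stale = generation = 0
    while generation < max_gen and stale < patience:
        generation += 1
        # Truncation selection (assumption): keep the fitter half as parents.
        population.sort(key=fitness, reverse=True)
        parents = population[:max(1, pop_size // 2)]
        children = []
        for _ in range(pop_size - 1):
            child = list(rng.choice(parents))
            if n >= 2 and rng.random() < p_rotate:
                i, j = sorted(rng.sample(range(n), 2))
                child[i:j + 1] = child[i + 1:j + 1] + child[i:i + 1]
            if n >= 2 and rng.random() < p_swap:
                i, j = rng.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        population = children + [list(best)]  # elitism keeps the best route
        champion = max(population, key=fitness)
        champion_fit = fitness(champion)
        if champion_fit > best_fit:
            best, best_fit, stale = champion, champion_fit, 0
        else:
            stale += 1
    return best, best_fit, generation


# Toy fitness (assumption): number of positions agreeing with a hidden order.
hidden = [3, 1, 4, 0, 2]
toy_fitness = lambda order: sum(a == b for a, b in zip(order, hidden))
best, best_fit, generations = run_ga(toy_fitness, 5, pop_size=30,
                                     max_gen=200, patience=20)
print(best, best_fit, generations)
```

In an assembly setting, the fitness argument would be a log-likelihood over fragment orderings, and the default arguments mirror the parameter values listed above.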
The next section recounts simulations I conducted with randomly generated sequences as target sequences in order to
obtain sequences in appropriate conditions. In the subsequent
section, biological sequences are introduced from a database
to compare the present method with other methods.
In the present experiments using random sequences,
sequences (input) having a given length (L) were created
by a computer program. The length of the SBH fragments
was set as l bases, and, subsequently, L − l + 1 fragments
were generated. Negative faults were simulated by removing
some of the fragments from the sets. When the ratio was 20%,
0.2(L − l + 1) fragments were removed. In order to introduce
positive faults, fragments were created that differed by only
one base from a fragment in the original set, but which were
not included in the spectrum.
The positions of substitution were randomly selected. These
fragments were then included in the spectrum. These positive
faults corresponded to hybridizing sequences that were similar
but not identical to the complementary sequences of the probes
on a sequencing chip.
Given spectra were randomly mixed before simulation, such
that the order of fragments for the respective inputs would not
have an influence on the results.
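The construction of a simulated spectrum described above can be sketched in Python as follows. The function name simulate_spectrum, its parameters and the use of rounding to obtain the number of faults are assumptions for illustration. Positive faults are counted here relative to the size of the ideal spectrum, which is one possible reading of the text.

```python
import random


def simulate_spectrum(target, l=10, neg_ratio=0.2, pos_ratio=0.2, seed=0):
    """Return (spectrum, ideal_set, faults) for a target over A, C, G, T."""
    rng = random.Random(seed)
    # Ideal spectrum: every l-base window of the target, duplicates merged.
    ideal = list(dict.fromkeys(target[i:i + l]
                               for i in range(len(target) - l + 1)))
    ideal_set = set(ideal)
    n_neg = round(neg_ratio * len(ideal))
    n_pos = round(pos_ratio * len(ideal))
    # Negative faults: drop randomly chosen fragments.
    kept = rng.sample(ideal, len(ideal) - n_neg)
    # Positive faults: one-base substitutions absent from the ideal spectrum.
    faults = set()
    while len(faults) < n_pos:
        frag = rng.choice(ideal)
        pos = rng.randrange(l)
        base = rng.choice([b for b in "ACGT" if b != frag[pos]])
        mutant = frag[:pos] + base + frag[pos + 1:]
        if mutant not in ideal_set:
            faults.add(mutant)
    spectrum = kept + sorted(faults)
    rng.shuffle(spectrum)  # input order must not influence the result
    return spectrum, ideal_set, faults


gen = random.Random(1)
target = "".join(gen.choice("ACGT") for _ in range(100))
spectrum, ideal_set, faults = simulate_spectrum(target, seed=2)
print(len(ideal_set), len(spectrum), len(faults))
```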
The correctness of the results from the computer program
(outputs) was measured by comparing the input and the output
sequences. The scoring criterion suggested by Błażewicz was
as follows. An output was aligned with the original sequence
by the Smith–Waterman method. The matching positions
were counted as +1 and the positions of different bases and
gaps were counted as −1. Finally, the scores at all positions
were calculated and summed. The elapsed time was also measured in order to determine the required time according to the
error ratios, the length of probes and the length of the target
sequences.
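The scoring criterion can be sketched as a Smith–Waterman local alignment score with +1 for a match and −1 for a mismatch or a gap. The function name smith_waterman_score and the linear gap penalty are assumptions for illustration. Only the score is computed, not the alignment itself.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)   # previous row of the dynamic programming table
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(smith_waterman_score("ACGTACGT", "ACGTACGT"))
print(smith_waterman_score("ACGTTACG", "ACGTACG"))
```

Under this scheme a perfectly reconstructed sequence scores its own length, which agrees with the maximum score being equal to the length of the target sequence.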
Accuracy of reconstruction under various
conditions
The theoretical limit of SBH with a probe length of l has
been previously reported as $O(2^l)$ (Pevzner et al., 1991; Dyer
et al., 1994).
Figure 4 presents the results of the reconstruction of random
sequences. Figure 4A and C respectively represent the correctness and calculation time for the reconstruction of sequences
from the spectrum without errors. Figure 4B and D show the
same features, but with errors. The error ratios were 10% positive and 10% negative. The SD of each accuracy value was
apparently large, because a few random sequences containing
repetitive regions decreased the average score, even though
the rest of the reconstructed sequences remained almost the
same as those of the original inputs.
These results demonstrated that the present algorithm is capable of reconstructing sequences containing both positive and
negative faults. The degree of accuracy depended on the length
of the target sequences, as well as on the length of probes.
The present algorithm correctly reconstructed sequences
having both positive and negative errors without significant
increases in calculation time.
It is of note that the present algorithm appeared to reconstruct sequences that were as long as had been theoretically
predicted. The longest reconstructible sequence length was predicted as $O(2^l)$ bases, e.g. 1024 for a 10-base probe, 512 for
a 9-base probe and 256 for an 8-base probe.
Simulation using various error ratios
In the previous section, I applied a 10% positive and
10% negative error ratio. However, when an assembly method
is applied in the context of SBH observations, ratios of
positive and negative faults vary according to the experimental conditions, e.g. the length of the observed sequences,
annealing temperature, repetitiveness of sequences, quality
of DNA chips, etc. Therefore, a simulation was conducted
in which both the positive and the negative fault ratios were varied.
The tolerance of the algorithm against such faults was then
measured.
The correctness of the output from my program was measured using various error ratios. Each cell shown in Figure 5A
with certain ratios of positive/negative faults is the average of 100 trials with different sequences, and each cell is
indicated by a particular pseudocolor representing the correctness or the calculation time. As shown in Figure 5, my
program was able to reconstruct sequences that were almost
identical to the originals, provided the positive faults amounted to <35% and the negative faults amounted to <25% of
the total spectrum. It is also shown here that the present
method tolerated positive faults better than it did negative
faults.
[Figure 4: four panels. (A) and (B) plot correctness (score / length) against sequence length (bp); (C) and (D) plot time (sec) against sequence length (bp). (A) and (C): positive error 0%, negative error 0%. (B) and (D): positive error 10%, negative error 10%. Curves are shown for probe lengths l = 7, 8, 9, 10 and 11.]
Fig. 4. Simulation of SBH assembly using random sequences. Random nucleotide sequences were created at given lengths; the score (the
calculation method of which is given in the text) and elapsed time in a 100-trial average were observed for each condition, probe length (open
circle, l = 7; red open rectangle, l = 8; blue cross, l = 9, green diamond, l = 10, yellow closed circle, l = 11), target sequence length and
error ratio. Each error bar indicates the standard deviation. (A) and (C) show the simulation results from the errorless spectrum, and (B) and
(D) show 10%/10% positive and negative error ratios, respectively. (A) and (B) show how correct the output of this method is when errors
are introduced; (C) and (D) show the average time until the program produced the assemblies.
Figure 5B shows the average calculation time. Negative faults influenced scores more than did positive faults.
Meanwhile, positive faults influenced calculation time more
than did negative faults. In particular, the number of prebound fragments before the GA–TSP, along with sequences
with repetitive regions, increased the average calculation time
required. However, the maximum amount of time needed to
produce an output was <10 s, which is not an unrealistic
amount of time for sequencing.
Applying sequences from a biological database
In 2000, Błażewicz et al. proposed a tabu search method
to assemble SBH fragments, and in 2002 proposed a heuristic algorithm. They obtained sequences from the GenBank
for their tests, and the accession numbers were included
in their reports (Błażewicz et al., 2000, 2002). However,
sequences obtained from the database were longer than those
that were used in their papers; the sequences from these
papers ranged from 109 to 509 bases. Therefore, the same
sequences used in the studies of Błażewicz et al. were
obtained from the authors, and were used for the present
study.
[Figure 5: two pseudocolor maps with positive error (%) on the horizontal axis and negative error (%) on the vertical axis, each ranging from 0 to 30. (A) Score, color scale from 0 to 500. (B) Calculation time (sec), color scale from 0.0 to 8.0.]
Fig. 5. Simulation of SBH assembly using various error ratios.
Around 500-base random sequences were created and each target sequence produced 491 10-base fragments. The spectrum was
modified by positive and negative errors as described in the text.
My program reconstructed the sequences from the simulated spectrums, and the assemblies and the target sequences were compared.
(A) shows the correctness with which the program reconstructed
target sequences. (B) shows the average time until the program reconstructed the target sequences. Each cell is filled with pseudocolor.
Table 1 compares the correctness of their algorithms and
that of the present algorithms, which were judged according
to their evaluation criterion. The lengths of the sequences were
109, 209, 309, 409 and 509, and the maximum score for each
trial was the same as the length of each sequence. The number
of sequences of each length was 40, and the average scores
are shown in the table.
It is clear that as a result of the sanitization of positive faults,
the present method is superior to the former methods. These
results indicate that the sanitization process is advantageous
for obtaining correct sequences, even if this process tends
to lose marginal fragments as a result of the loss of shared
subfragments.
It must be emphasized in this context that the method of
Błażewicz et al. required that the first nucleotide be determined by some type of chemical method before assembly. In
contrast, the present method does not require any additional
information regarding the sequences; i.e. only the observed
fragments are needed.
Furthermore, my program requires much less in terms
of computer resources. The simulation of Błażewicz et al. was
executed on a supercomputer and required only a few seconds.
Zhang et al. (2003) proposed an algorithm that could take
more than several hundred seconds to finish a calculation using
different target sequences. My program, however, requires
<4 s to obtain the accurate sequences. It is also of note that the
present method enables the use of relatively small computers
equipped with a sequencer to proceed with the assembly soon
after hybridization is completed.
DISCUSSION
Comparison with a gapped probe method
Frieze et al. (1999) proposed a different method of sequencing chips with gapped probes. In their method, the probes
contain universal or ‘don’t care’ nucleotides N, and the number of probes required to read a sequence is reduced (Frieze
et al., 1999; Preparata and Upfal, 2000). Doi and Imai (2000)
and Halperin et al. (2002) reported that they improved upon
Preparata and Upfal’s work and were able to reconstruct
sequences longer than 1000 bases with much smaller probes.
Using their methods, probes were proposed such as
$X^s(N^sX)^r$ [GP(s, r)], in which X is one of four nucleotides
(A, C, G and T) and N is a universal nucleotide that can bind
to any nucleotide.
Although the small number of probes is an advantage of their
method, it is very vulnerable to errors, and previous reports
with gapped probes could deal with fewer than 1% positive
and negative errors (0.1%/0.1% in Preparata and Upfal, 2000;
0.5%/0.5% in Doi and Imai, 2000).
An even more important problem regarding the gapped
algorithm becomes apparent when we determine the prefix
from the spectrum. The algorithm requires an s(r + 1)-prefix
in order to start the extension of target sequences. This means
that we have to read a certain number of prefix nucleotides
of target sequences before applying SBH. The reconstruction
of the prefix sequences using only gapped probes is very difficult. These results show that even if errors are very few in
number, SBH chips with gapped probes need solid probes for
the reconstruction of prefixes.

Table 1. Comparison of correctness score with the results of other reports
Method                     Length (base)
                           109            209            309             409             509
Błażewicz et al. (1999)    105.1          184.5          244.6           315.1           312.3
Błażewicz et al. (2002)    107.8          188.8          282.3           350.0           408.0
Without sanitization       106.7 ± 4.88   203.5 ± 19.1   289.6 ± 59.64   366.8 ± 107.4   393.7 ± 163.2
With sanitization          107.8 ± 1.88   206.2 ± 7.56   296.3 ± 46.7    401.1 ± 38.8    498.0 ± 43.2

Table 2. Elapsed time of assembling with or without sanitization
Method                     Time (s) by sequence length (base)
                           109                  209            309           409           509
Without sanitization       0.15 ± 0.032         0.70 ± 0.050   3.90 ± 7.45   28.3 ± 97.3   29.9 ± 23.8
With sanitization          0.061 ± 9.4 × 10^-3  0.26 ± 0.15    0.88 ± 1.98   1.49 ± 1.37   2.07 ± 1.94
Effects of sanitization
The results of assembly both with sanitization and without
sanitization are shown in Table 1. These results demonstrate
that the sanitization process has an apparent advantage for
use with SBH. Sanitization renders SBH assembling more
accurate and more rapid.
Figure 5A shows that the algorithm in this report provided
correct assembly at the same level even as the
frequency of errors increased. Within a 30% positive error
ratio, correctness was influenced by negative faults more than
by positive faults, most likely because sanitization purged the
errors.
Table 2 shows that sanitization decreased the calculation
time significantly. This reduction was the result of the reduced
number of pre-bound fragments before the GA–TSP. Positive faults could have the same succession probability as
the correct fragments, in which case pre-binding would tend
to suspend assemblies as the result of positive errors. Consequently, removing positive errors simplified the assembly
process, decreased the calculation time and enabled more
correct results.
CONCLUSION
The method described in this report solved the intrinsic problem of SBH, i.e. hybridization errors. Provided that the target
sequences are not highly repetitive, the present method is able
to reconstruct the majority of them.
The present method can thus be used to assemble SBH
fragments, and it can efficiently sequence on a massive scale.
ACKNOWLEDGEMENTS
I would like to thank Dr Jacek Błażewicz and Dr Marta
Kasprzak for providing the data used to illustrate the
method described in their papers. All their reports regarding SBH were very insightful. This study was supported by the Japan Society for the Promotion of Science
(JSPS).
REFERENCES
Bains,W. and Smith,G.C. (1988) A novel method for nucleic acid
sequence determination. J. Theor. Biol., 135, 303–307.
Błażewicz,J., Kaczmarek,J., Kasprzak,M., Markiewicz,W.T. and
Wȩglarz,J. (1997) Sequential and parallel algorithm for DNA
sequencing. CABIOS, 13, 151–158.
Błażewicz,J., Formanowicz,P., Kasprzak,M., Markiewicz,W.T. and
Wȩglarz,J. (1999) DNA sequencing with positive and negative
errors. J. Comput. Biol., 6, 113–123.
Błażewicz,J., Formanowicz,P., Kasprzak,M., Markiewicz,W.T. and
Wȩglarz,J. (2000) Tabu search for DNA sequencing with false
negatives and false positives. European J. Oper. Res., 125,
257–265.
Błażewicz,J., Formanowicz,P., Guinand,F. and Kasprzak,M. (2002)
A heuristic managing errors for DNA sequencing. Bioinformatics,
18, 652–660.
Doi,K. and Imai,H. (2000) Sequencing by hybridization in the presence of hybridization errors. In Proceedings of the Workshop on
Genome Informatics, 11, pp. 53–62.
Drmanac,R., Labat,I., Bruckner,I. and Crkvenjakov,R. (1989)
Sequencing of megabase plus DNA by hybridization: theory of
the method. Genomics, 4, 114–128.
Dyer,M.E., Frieze,A.M. and Suen,S. (1994) The probability of
unique solutions of sequencing by hybridization. J. Comput. Biol.,
1, 105–110.
Frieze,A.M., Preparata,F.P. and Upfal,E. (1999) Optimal reconstruction of a sequence from its probes. J. Comput. Biol., 6,
361–368.
Goldberg,D.E. (1989) Genetic Algorithms in Search, Optimization,
and Machine Learning. Addison-Wesley.
Halperin,E., Halperin,S., Hartman,T. and Shamir,R. (2002) Handling long targets and errors in sequencing by hybridization.
RECOMB’02, 176–185.
Junger,M. and Naddef,D. (2001) TSP cuts which do not conform to
the template paradigm. Comput. Combinat. Optim., 261–304.
Lawler,E.L., Lenstra,J.K., Rinnooy Kan,A.H.G. and Shmoys,D.B.
(1985) The Traveling Salesman Problem: A Guided Tour of
Combinatorial Optimization. John Wiley & Sons, New York.
Pevzner,P.A., Lysov,Y.P., Khrapko,K.R., Belyavsky,S.V.,
Florentiev,V.L. and Mirzabekov,A.D. (1991) Improved chips
for sequencing by hybridization. J. Biomol. Struct. Dyn., 9,
399–410.
Preparata,F.P. and Upfal,E. (2000) Sequencing-by-hybridization at
the information-theory bound: an optimal algorithm. J. Comput.
Biol., 7, 621–630.
Strezoska,Z., Paunesku,T., Radosavljevic,D., Labat,I., Drmanac,R.
and Crkvenjakov,R. (1991) DNA sequencing by hybridization:
100 bases read by a non-gel-based method. Proc. Natl Acad. Sci.,
USA, 88, 10089–10093.
Zhang,J., Wu,L. and Zhang,X. (2003) Reconstruction of DNA
sequencing by hybridization. Bioinformatics, 19, 14–21.