BIOINFORMATICS Vol. 20 no. 14 2004, pages 2181–2188
doi:10.1093/bioinformatics/bth202

Probabilistic nucleotide assembling method for sequencing by hybridization

Takaho A. Endo
Division of Molecular Life Sciences, Department of Genetic Information, School of Medicine, Tokai University, Bouseidai, Isehara, Kanagawa, 259-1193, Japan

Received on July 14, 2003; revised on October 10, 2003; accepted on February 4, 2004
Advance Access publication April 8, 2004

ABSTRACT
Motivation: To develop a new method of assembling short sequences obtained from sequencing by hybridization in the presence of many positive and negative faults. First, the assembly is interpreted as a generic traveling salesman problem (i.e. finding the shortest route for visiting many cities) and solved using genetic algorithms. Second, positive errors are excluded before assembly by a sanitization process.
Results: The present method outperforms those described in previous studies, in terms of both time and accuracy.
Availability: http://kamit.med.u-tokai.ac.jp/~takaho/sbh/index.html
Contact: [email protected]

INTRODUCTION
Sequencing by hybridization (SBH) has been proposed as a promising approach to reading DNA sequences in a short amount of time (Bains and Smith, 1988; Lysov et al., 1988; Drmanac et al., 1989). However, due to an intrinsic problem (i.e. two types of errors associated with nucleotide hybridization), SBH has been less widely applicable to unknown sequences than its designers had expected. With the first type of error, a smaller spectrum (a set of SBH probes) is observed than would be expected from the length of the target sequences. The second type of error leads to a larger spectrum than would be expected. The former type of error is referred to as a negative fault, and the latter as a positive fault. These errors are most likely due to particular experimental conditions, such as the annealing temperature, or to the structure of the target sequences.
Due to these two types of errors, it is often difficult to assemble the fragments in the correct order. Because hybridization can lead to errors, it is necessary to assemble the fragments according to a probabilistic process. Each fragment can extend other sequences with a certain probability. The probability that one sequence succeeds another increases with the amount of their overlap. When we acknowledge that the problem of assembling SBH fragments is one of maximizing likelihood, we see that the problem becomes a variation of the traveling salesman problem (TSP), a classic combinatorial optimization problem (Lawler et al., 1985). Although Błażewicz et al. (1997) reported a TSP model for SBH assembly, they simplified the distance from one city (a node of the TSP) to another as 0 or 1. That distance corresponds to the probability of extension in SBH. Their method was restricted to problems with positive faults; when negative faults were included, no succeeding probes could be proposed. In the present study, their method is expanded into a generic model in which the distance between cities corresponds to the probability of extension of one sequence by another.

Bioinformatics 20(14) © Oxford University Press 2004; all rights reserved.

The TSP is known to be a non-deterministic polynomial-time hard (NP-hard) problem, and is difficult to solve within a realistic time period. While there are many ways to obtain approximate solutions to TSPs within a realistic period, such as genetic algorithms (GAs) or simulated annealing, GAs were chosen for the present program based on the ease with which such programs can be written. GAs allow the calculations to be performed using a personal computer, and they help in the construction of better solutions for time-consuming problems (Goldberg, 1989). Moreover, it was found that enumerating the patterns of subfragments (i.e. short substrings of the observed fragments) reveals which fragments were positive faults.
Removal of positive-fault candidates before assembly made the SBH spectrum virtually free of positive faults. According to the algorithm given below, the computer program produced the correct assembly using simulated SBH fragment sequences containing both positive and negative faults. Almost all random target sequences, even those containing 30% positive and 25% negative faults, were reconstructed correctly. Sequences obtained from the GenBank database containing 20% positive and 20% negative faults were reconstructed much more accurately than has been reported in previous works (Błażewicz et al., 1999, 2002).

ALGORITHM
Traveling salesman problem
When fragments i and j are aligned with fragment j succeeding fragment i (Fig. 1), $M^{s}_{i \to j}$ and $M^{d}_{i \to j}$ represent the numbers of identical and different bases, respectively. The position of the alignment is scanned and selected such that $M^{s}_{i \to j} - M^{d}_{i \to j}$ is maximized.

[Figure 1: (A) Fragment 0 TGGCTGAAGT, Fragment 1 GGCTGTAGTA and Fragment 2 CTGTAGTACG, with $p_{0 \to 1} = C(9 - 1)$ and $p_{1 \to 2} = C(8 - 0)$. (B) Fragments drawn as cities of a TSP.]

Fig. 1. Assembly of SBH as a TSP application. (A) Successor probability as used in this study. Matching and unmatching bases are counted at all positions for two fragments, and the maximum value is set as a probability. C is a constant value. Colored bases are different (unmatching) from fragment 0. (B) TSP analogy of SBH assembly. Since the successor probability differs between a to b and b to a, each fragment has an asymmetric distance to/from the other one. An increment in the probability corresponds to a decrement in the distance between cities. Assembly among SBH fragments is interpreted as finding the shortest route among the cities.

The probability is then written as Equation (1). In the equation, q is defined as an arbitrary positive coefficient ($0 < q \le 1$) corresponding to the similarity of overlapping regions, which can be cancelled out, given that it does not affect comparison among alignments.
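The scan over alignment positions described above can be sketched as follows. This is a minimal illustration and not the author's C++ implementation: the function name `best_overlap`, the restriction to equal-length fragments and the tie-breaking rule (the larger overlap wins) are assumptions made for the example.

```python
from typing import Tuple


def best_overlap(a: str, b: str) -> Tuple[int, int, int]:
    """Scan every offset at which fragment b can follow fragment a.

    Returns (shift, same, diff) for the offset that maximizes same - diff,
    where `same` and `diff` count identical and different bases in the
    overlapping region. Assumption: both fragments have the same length.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("fragments must have equal length of at least 2")
    best = None
    for shift in range(1, len(a)):          # b starts `shift` bases after a
        overlap = len(a) - shift
        # zip stops at the shorter sequence, i.e. at the end of the overlap
        same = sum(x == y for x, y in zip(a[shift:], b))
        diff = overlap - same
        # strict '>' keeps the smallest shift (largest overlap) on ties
        if best is None or same - diff > best[1] - best[2]:
            best = (shift, same, diff)
    return best


# Fragments 1 and 2 of Fig. 1: eight matching bases and no mismatch.
print(best_overlap("GGCTGTAGTA", "CTGTAGTACG"))  # prints (2, 8, 0)
```

The returned pair of counts is all that the successor probability needs; the coefficient q only rescales the comparison and is therefore left out of the sketch.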
$$p_{i \to j} \propto q^{\,M^{d}_{i \to j} - M^{s}_{i \to j}} \tag{1}$$

The order of fragments is denoted as a vector n: $\mathbf{n} = (n_1, n_2, \ldots, n_N)$ with $n_i \ne n_j$ for $i \ne j$. The likelihood of the vector is then calculated as follows:

$$L(\mathbf{n}) = \prod_{i=1}^{N-1} p_{n_i \to n_{i+1}} \tag{2}$$

Thus, SBH using the TSP is interpreted as a problem of finding a vector that maximizes the likelihood L(n) of assembly.

It is well known that the TSP is an NP-hard problem and that it is difficult to solve within a realistic period of time. Although the GAs applied here could reduce the necessary calculation time, it has been reported that GAs cannot provide optimum results when the number of cities exceeds one hundred. Therefore, it is necessary to reduce the number of fragments assembled by the GA–TSP. Considering the whole tour of the TSP, the problem would become easier if some cities were clustered, and if the order of the cities in the cluster were already determined (subtours). The conditions for judging whether or not two fragments, i and j, can be fused are described as follows:

(1) $p_{i \to j}$ exceeds the threshold probability $p_t$.
(2) The most probable successor of fragment i is fragment j, and the most probable predecessor of fragment j is fragment i.

In the simulations of this report, $p_t$ was set as the probability obtained when one site more than half the probe length matches: the probability of a 6-base match for the spectrum of 10-base probes, and that of a 5-base match for 8-base probes. If condition 1 is met and condition 2 is not, then the head or the tail of the fragment is sealed and fusion cannot occur.

The fitness of each assembly was calculated as the logarithm of the likelihood, thus avoiding underflow values in the computer program and reducing the calculation time. Using a logarithm, we do not have to consider the coefficient q in Equation (1) to compare the fitness of GA individuals. The fitness function can be written as Equation (3), because the logarithm of the transition probability $p_{i \to j}$ [Equation (1)] is proportional to the difference between the numbers of unmatched and matched bases:

$$F(\mathbf{n}) = \sum_{i=1}^{N-1} \log p_{n_i \to n_{i+1}} = \log q \sum_{i=1}^{N-1} \min\left(M^{d}_{n_i \to n_{i+1}} - M^{s}_{n_i \to n_{i+1}}\right) + \mathrm{const} \tag{3}$$

The minimum is taken over the overlapping positions of each pair: the difference between $M^{d}$ (different bases) and $M^{s}$ (identical bases) was minimized to maximize the succeeding probability. Since $\log q$ is a non-positive constant shared by all individuals, it does not change their ranking. After a given number of generations, the best individual is proposed as the most likely assembly of the SBH fragments.

[Figure 2: sample sequence ....GCTGGATTACCCAAA.......AGATTACCTTTCA.... with fragments A CTGGATTACC, B TGGATTACCC, C GGATTACCCA, D GATTACCCAA, C′ GGATTACCTT and E GATTACCTTT, marked with negative and positive faults; misassembled sequence ....GCTGGATTACCTTTCA....]

Fig. 2. A typical case of misassembly with both positive and negative faults in fragment extension. If the fragments of a region having a similar pattern in the target sequence (fragments B and C) are lost, and a fragment from another region containing errors (C′) exists, the preparation process fails to assemble the fragments in the correct order. In this case, the correct successor, fragment D, is not assembled after fragment A; fragment C′ is assembled instead. Consequently, fragment E would succeed fragments A and C′, and the resulting sequence renders the whole assembly false.

Sanitization of the fragment spectrum
However robust the probabilistic method of SBH assembly is against negative and positive faults, pretreatment of the spectrum can produce bound fragments that are not included in correct sequences. Figure 2 shows how such cases can occur. When both negative and positive faults exist at proximal positions, and when there is a similar pattern in the sequence, the sequences can be misassembled in the pretreatment process. If the spectrum is ideal, fragment A shares 9 bases with succeeding fragment B and 8 bases with fragment C. When fragments B and C are missed (negative faults) and the spectrum contains fragment C′, which is similar to the correct successor C and shares 8 bases with fragment A but contains positive faults in that region, pretreatment leads to error. It is necessary to remove such fragments containing errors.

The next issue is how to detect error fragments before assembly without first establishing the correct sequences. Assume that approximately 500-base-long target sequences are analyzed by 10-base probes. Six nucleotides can provide many more patterns ($4^{6} = 4096$) than the target sequences have. Therefore, most of the 6-base patterns in the observed fragments take unique positions in the targets. A 6-base subfragment can take 5 positions in a 10-base fragment. This means that we can find 5 fragments in an ideal spectrum for each 6-base pattern, although the subfragments in the head and tail fragments appear not five times, but one to four times.

[Figure 3: histogram of the number of subfragments (vertical axis) against the frequency of subfragments (horizontal axis), without errors and with 10% positive errors.]

Fig. 3. Distribution of subfragment frequency to detect positive faults. A random 560-base sequence was divided into 551 fragments of 10 bases and the number of 6-base subfragments was counted (filled bars). Simulated positive faults were inserted into the spectrum, comprising as much as 10% of the original spectrum; the distribution was then recounted (pale bars). Fragments made by positive faults had frequencies 1, 2, 6, 7 and 11.

Figure 3 shows the frequency with which each subfragment is counted. The filled bars indicate the case of the ideal spectrum. Frequencies of 5 and 10 have a large number of subfragments. Subfragments with frequencies of 10, 15 or 20 are derived from 6-base patterns that occur more than once in the target sequence. When we consider positive faults, subfragments containing errors are not shared with neighbor fragments. Their frequencies are limited to one or two.
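The counting argument above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `subfragment_counts`, `suspect_fragments` and `max_rare` are hypothetical, and the 20-base toy target is invented so that all of its 6-base patterns are unique. The sketch flags every fragment that contains a subfragment seen at most twice, which by construction also catches head and tail fragments.

```python
from collections import Counter


def subfragment_counts(spectrum, k=6):
    """Count how often each k-base subfragment occurs across the spectrum."""
    counts = Counter()
    for frag in spectrum:
        for pos in range(len(frag) - k + 1):
            counts[frag[pos:pos + k]] += 1
    return counts


def suspect_fragments(spectrum, k=6, max_rare=2):
    """Return fragments holding at least one subfragment seen <= max_rare times."""
    counts = subfragment_counts(spectrum, k)
    return [
        frag for frag in spectrum
        if any(counts[frag[p:p + k]] <= max_rare
               for p in range(len(frag) - k + 1))
    ]


# Toy target (assumption for illustration) cut into 10-base fragments.
target = "ACGTTGCAAGCTTAGGCATC"
ideal = [target[i:i + 10] for i in range(len(target) - 10 + 1)]

counts = subfragment_counts(ideal)
print(counts["GCAAGC"], counts["ACGTTG"])  # prints 5 1 (interior vs. head pattern)

# One simulated positive fault: a single substitution in the middle of a fragment.
frag = target[5:15]
fault = frag[:5] + "A" + frag[6:]
print(len(suspect_fragments(ideal + [fault])))  # prints 5 (4 edge fragments + the fault)
```

In the toy example the interior pattern is shared by five fragments, while every 6-base pattern of the faulty fragment is seen only once, so the fault is flagged together with the two head and two tail fragments.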
If an erroneous subfragment happens to be identical to another region of the target, its frequency can become six or seven (Fig. 3). Therefore, we can assume that fragments containing subfragments with a frequency of one or two are either fragments containing errors or edge (head or tail) fragments. Removing these fragments before assembly by GA–TSP renders the following process much easier, as well as more accurate.

In order to enhance the effectiveness of detecting and removing positive faults, which will be referred to as sanitization, it is desirable also to find faults among subfragments with a frequency of six, seven and possibly five, because fewer than five fragments might share a subfragment due to negative errors. We can then classify fragments sharing the same subfragment into groups in which some fragments are correct and others contain errors. Alignment of fragments having the same subfragments was performed by an exhaustive alignment algorithm. When the number of fragments with a particular pattern is six, it is assumed that there is a set of correct fragments, as well as one or two fragments containing an error. The number of ways of classifying these fragments into two groups (i.e. aligned and misaligned) is $2^{6}$. All cases were tested, and the alignment without errors was considered for each group sharing a subfragment. When the frequency was 10 or more but less than 15 and the number of groups was 3, the number of cases to be tested would be up to $3^{15}$.

After this sanitization process, all given fragments were classified either as fragments with errors or as subsequences from the target sequences.

SIMULATION RESULTS
Computer environment and simulation conditions
The program for the assembly of SBH fragments was written in standard ANSI C++. All code is available to academic colleagues at my website.
The following simulations were performed on a computer equipped with an AMD Athlon 1800+ CPU, a Debian GNU/Linux operating system and 512 MB of memory. This environment is equivalent to that of basic personal computers, and my program completed assemblies within a short duration of ∼24 s, with sanitization. The parameters set for the GA were as follows: population size = 1000; mutation rate = 0.1 for rotation and 0.1 for replacement; and maximum generations = 2000. GA iterations continued until no better route had appeared for 300 generations, or until the number of generations reached the given maximum.

The next section recounts simulations I conducted with randomly generated sequences as target sequences in order to obtain sequences in appropriate conditions. In the subsequent section, biological sequences are introduced from a database to compare the present method with other methods.

In the present experiments using random sequences, sequences (input) having a given length (L) were created by a computer program. The length of the SBH fragments was set as l bases, and, subsequently, L − l + 1 fragments were generated. Negative faults were simulated by removing some of the fragments from the sets. When the ratio was 20%, 0.2(L − l + 1) fragments were removed. In order to introduce positive faults, fragments were created that differed by only one base from a fragment in the original set, but which were not included in the spectrum. The positions of substitution were randomly selected. These fragments were then included in the spectrum. These positive faults corresponded to hybridizing sequences that were similar but not identical to the complementary sequences of the probes on a sequencing chip. The resulting spectra were randomly shuffled before simulation, such that the order of fragments in the respective inputs would not influence the results.
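The construction of a simulated spectrum described above might look like the following sketch. It is not the author's program: the function name `simulate_spectrum`, the rounding of the fault counts and the use of a caller-supplied `random.Random` instance are assumptions made for the example. Both ratios are taken relative to the L − l + 1 fragments of the ideal spectrum.

```python
import random


def simulate_spectrum(target, l, neg_ratio, pos_ratio, rng):
    """Build a shuffled SBH spectrum with simulated negative and positive faults."""
    ideal = [target[i:i + l] for i in range(len(target) - l + 1)]
    ideal_set = set(ideal)
    n_neg = round(neg_ratio * len(ideal))   # fragments to drop (assumed rounding)
    n_pos = round(pos_ratio * len(ideal))   # erroneous fragments to add

    # Positive faults: one-base substitutions that are absent from the ideal set.
    faults = set()
    while len(faults) < n_pos:
        frag = rng.choice(ideal)
        pos = rng.randrange(l)
        base = rng.choice([b for b in "ACGT" if b != frag[pos]])
        mutant = frag[:pos] + base + frag[pos + 1:]
        if mutant not in ideal_set:
            faults.add(mutant)

    # Negative faults: keep a random subset of the ideal fragments.
    kept = rng.sample(ideal, len(ideal) - n_neg)

    spectrum = kept + sorted(faults)        # sorted for reproducibility
    rng.shuffle(spectrum)                   # input order must carry no information
    return spectrum


rng = random.Random(0)
target = "".join(rng.choice("ACGT") for _ in range(500))
spectrum = simulate_spectrum(target, 10, 0.2, 0.2, rng)
print(len(spectrum))  # 491 ideal fragments - 98 removed + 98 added = 491
```

Passing the random generator explicitly keeps a trial reproducible, which is convenient when the same target has to be assembled under several error ratios.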
The correctness of the results from the computer program (outputs) was measured by comparing the input and the output sequences. The scoring criterion suggested by Błażewicz et al. was as follows. An output was aligned with the original sequence by the Smith–Waterman method. The matching positions were counted as +1 and the positions of different bases and gaps were counted as −1. Finally, the scores at all positions were summed. The elapsed time was also measured in order to determine the required time according to the error ratios, the length of probes and the length of the target sequences.

Accuracy of reconstruction under various conditions
The theoretical limit of SBH with a probe length of l has been previously reported as $O(2^{l})$ (Pevzner et al., 1991; Dyer et al., 1994). Figure 4 presents the results of the reconstruction of random sequences. Figure 4A and C respectively represent the correctness and calculation time for the reconstruction of sequences from the spectrum without errors. Figure 4B and D show the same features, but with errors. The error ratios were 10% positive and 10% negative. The SD of each accuracy value was apparently large, because a few random sequences containing repetitive regions decreased the average score, even though the rest of the reconstructed sequences remained almost the same as the original inputs. These results demonstrated that the present algorithm is capable of reconstructing sequences from spectra containing both positive and negative faults. The degree of accuracy depended on the length of the target sequences, as well as on the length of probes. The present algorithm correctly reconstructed sequences having both positive and negative errors without significant increases in calculation time. It is of note that the present algorithm appeared to reconstruct sequences that were as long as had been theoretically predicted. The longest reconstructible sequence for a probe length of l was predicted as $O(2^{l})$ bases, e.g.
1024 for a 10-base probe, 512 for a 9-base probe and 256 for an 8-base probe.

Simulation using various error ratios
In the previous section, I applied a 10% positive and 10% negative error ratio. However, when an assembly method is applied in the context of SBH observations, ratios of positive and negative faults vary according to the experimental conditions, e.g. the length of the observed sequences, annealing temperature, repetitiveness of sequences, quality of DNA chips, etc. Therefore, a simulation was conducted in which both the positive and the negative fault ratios were varied. The tolerance of the algorithm against such faults was then measured.

The correctness of the output from my program was measured using various error ratios. Each cell shown in Figure 5A, corresponding to a certain ratio of positive/negative faults, is the average of 100 trials with different sequences, and each cell is indicated by a particular pseudocolor representing the correctness or the calculation time. As shown in Figure 5, my program was able to reconstruct sequences that were almost identical to the originals, provided the positive faults amounted to <35% and the negative faults amounted to <25% of the total spectrum. It is also shown here that the present method tolerated positive faults better than it did negative faults.

[Figure 4: four panels plotted against sequence length (bp, 100 to 1000) for probe lengths l = 7 to 11. (A) Correctness (score/length), positive error 0%, negative error 0%. (B) Correctness (score/length), positive error 10%, negative error 10%. (C) Time (sec), positive error 0%, negative error 0%. (D) Time (sec), positive error 10%, negative error 10%.]

Fig. 4. Simulation of SBH assembly using random sequences.
Random nucleotide sequences were created at given lengths; the score (the calculation method of which is given in the text) and elapsed time, averaged over 100 trials, were observed for each condition: probe length (open circle, l = 7; red open rectangle, l = 8; blue cross, l = 9; green diamond, l = 10; yellow closed circle, l = 11), target sequence length and error ratio. Each error bar indicates the standard deviation. (A) and (C) show the simulation results from the errorless spectrum, and (B) and (D) show those with 10%/10% positive and negative error ratios. (A) and (B) show how correct the output of this method is; (C) and (D) show the average time until the program produced the assemblies.

Figure 5B shows the average calculation time. Negative faults influenced scores more than did positive faults. Meanwhile, positive faults influenced calculation time more than did negative faults. In particular, the number of prebound fragments before the GA–TSP, along with sequences with repetitive regions, increased the average calculation time required. However, the maximum amount of time needed to produce an output was <10 s, which is not an unrealistic amount of time for sequencing.

[Figure 5: two heat maps with positive error (%) from 0 to 30 on the horizontal axis and negative error (%) from 0 to 30 on the vertical axis. (A) Score, color scale 0 to 500. (B) Calculation time (sec), color scale 0.0 to 8.0.]

Fig. 5. Simulation of SBH assembly using various error ratios. Random sequences of around 500 bases were created and each target sequence produced 491 10-base fragments. The spectrum was modified by positive and negative errors as described in the text. My program reconstructed the sequences from the simulated spectra, and the assemblies and the target sequences were compared. (A) shows the correctness with which the program reconstructed target sequences. (B) shows the average time until the program reconstructed the target sequences. Each cell is filled with pseudocolor.

Applying sequences from a biological database
In 2000, Błażewicz et al. proposed a tabu search method to assemble SBH fragments, and in 2002 proposed a heuristic algorithm. They obtained sequences from GenBank for their tests, and the accession numbers were included in their reports (Błażewicz et al., 2000, 2002). However, sequences obtained from the database were longer than those that were used in their papers; the sequences from these papers ranged from 109 to 509 bases. Therefore, the same sequences used in the studies of Błażewicz et al. were obtained from the authors, and were used for the present study.

Table 1 compares the correctness of their algorithms and that of the present algorithms, which were judged according to their evaluation criterion. The lengths of the sequences were 109, 209, 309, 409 and 509, and the maximum score for each trial was the same as the length of each sequence. The number of sequences of each length was 40, and the average scores are shown in the table. It is clear that as a result of the sanitization of positive faults, the present method is superior to the former methods. These results indicate that the sanitization process is advantageous for obtaining correct sequences, even if this process tends to lose marginal fragments as a result of the loss of shared subfragments.

It must be emphasized in this context that the method of Błażewicz et al. required that the first nucleotide be determined by some type of chemical method before assembly. In contrast, the present method does not require any additional information regarding the sequences; i.e. only the observed fragments are needed. Furthermore, my program requires much less in terms of computer resources. Błażewicz et al.'s simulation was executed on a supercomputer and required only a few seconds. Zhang et al.
(2003) proposed an algorithm that could take more than several hundred seconds to finish a calculation using different target sequences. My program, however, requires <4 s to obtain accurate sequences. It is also of note that the present method enables the use of relatively small computers equipped with a sequencer to proceed with the assembly soon after hybridization is completed.

DISCUSSION
Comparison with a gapped probe method
Frieze et al. (1999) proposed a different method of sequencing chips with gapped probes. In their method, the probes contain universal or 'don't care' nucleotides N, and the number of probes required to read a sequence is reduced (Frieze et al., 1999; Preparata and Upfal, 2000). Doi and Imai (2000) and Halperin et al. (2002) reported that they improved upon Preparata and Upfal's work and were able to reconstruct sequences longer than 1000 bases with much smaller probes. Using their methods, probes were proposed such as '$X^{s}(N^{s}X)^{r}$' [GP(s, r)], in which X is one of four nucleotides (A, C, G and T) and N is a universal nucleotide that can bind to any nucleotide. Although the small number of probes is an advantage of their method, it is very vulnerable to errors, and previous reports with gapped probes could deal with fewer than 1% positive and negative errors (0.1%/0.1% in Preparata and Upfal, 2000; 0.5%/0.5% in Doi and Imai, 2000).

An even more important problem regarding the gapped algorithm becomes apparent when we determine the prefix from the spectrum. The algorithm requires an s(r + 1)-prefix in order to start the extension of target sequences. This means that we have to read a certain number of prefix nucleotides of target sequences before applying SBH. The reconstruction of the prefix sequences using only gapped probes is very difficult. These results show that even if errors are very few in number, SBH chips with gapped probes need solid probes for the reconstruction of prefixes.

Table 1. Comparison of correctness score with the results of other reports

Method                    Length (base)
                          109            209            309             409             509
Błażewicz et al. (1999)   105.1          184.5          244.6           315.1           312.3
Błażewicz et al. (2002)   107.8          188.8          282.3           350.0           408.0
Without sanitization      106.7 ± 4.88   203.5 ± 19.1   289.6 ± 59.64   366.8 ± 107.4   393.7 ± 163.2
With sanitization         107.8 ± 1.88   206.2 ± 7.56   296.3 ± 46.7    401.1 ± 38.8    498.0 ± 43.2

Table 2. Elapsed time of assembling with or without sanitization

Method                    Time (s), by sequence length
                          109              209            309           409           509
Without sanitization      0.15 ± 0.032     0.70 ± 0.050   3.90 ± 7.45   28.3 ± 97.3   29.9 ± 23.8
With sanitization         0.061 ± 0.0094   0.26 ± 0.15    0.88 ± 1.98   1.49 ± 1.37   2.07 ± 1.94

Effects of sanitization
The results of assembly both with sanitization and without sanitization are shown in Table 1. These results demonstrate that the sanitization process has an apparent advantage for use with SBH. Sanitization renders SBH assembly more accurate and more rapid. Figure 5A shows that the algorithm in this report maintained the same level of correctness even as the frequency of positive errors increased. Within a 30% positive error ratio, correctness was influenced by negative faults more than by positive faults, most likely because sanitization purged the positive errors.

Table 2 shows that sanitization decreased the calculation time significantly. This reduction was the result of the reduced number of pre-bound fragments before the GA–TSP. Positive faults could have the same succession probability as the correct fragments, in which case pre-binding would tend to suspend assemblies as the result of positive errors. Consequently, removing positive errors simplified the assembly process, decreased the calculation time and enabled more correct results.

CONCLUSION
The method described in this report solved the intrinsic problem of SBH, i.e. hybridization errors.
Provided that the target sequences are not highly repetitive, the present method is able to reconstruct the majority of them. The present method can thus be used to assemble SBH fragments, and it can efficiently sequence on a massive scale.

ACKNOWLEDGEMENTS
I would like to thank Dr Jacek Błażewicz and Dr Marta Kasprzak for providing the data used to illustrate the method described in their papers. All their reports regarding SBH were very insightful. This study was supported by the Japanese Society for the Promotion of Science (JSPS).

REFERENCES
Bains,W. and Smith,G.C. (1988) A novel method for nucleic acid sequence determination. J. Theor. Biol., 135, 303–307.
Błażewicz,J., Kaczmarek,J., Kasprzak,M., Markiewicz,W.T. and Wȩglarz,J. (1997) Sequential and parallel algorithm for DNA sequencing. CABIOS, 13, 151–158.
Błażewicz,J., Formanowicz,P., Kasprzak,M., Markiewicz,W.T. and Wȩglarz,J. (1999) DNA sequencing with positive and negative errors. J. Comput. Biol., 6, 113–123.
Błażewicz,J., Formanowicz,P., Kasprzak,M., Markiewicz,W.T. and Wȩglarz,J. (2000) Tabu search for DNA sequencing with false negatives and false positives. European J. Oper. Res., 125, 257–265.
Błażewicz,J., Formanowicz,P., Guinand,F. and Kasprzak,M. (2002) A heuristic managing errors for DNA sequencing. Bioinformatics, 18, 652–660.
Doi,K. and Imai,H. (2000) Sequencing by hybridization in the presence of hybridization errors. In Proceedings of the Workshop on Genome Informatics, 11, pp. 53–62.
Drmanac,R., Labat,I., Bruckner,I. and Crkvenjakov,R. (1989) Sequencing of megabase plus DNA by hybridization: theory of the method. Genomics, 4, 114–128.
Dyer,M.E., Frieze,A.M. and Suen,S. (1994) The probability of unique solutions of sequencing by hybridization. J. Comput. Biol., 1, 105–110.
Frieze,A.M., Preparata,F.P. and Upfal,E. (1999) Optimal reconstruction of a sequence from its probes. J. Comput. Biol., 6, 361–368.
Goldberg,D.E. (1989) Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley.
Halperin,E., Halperin,S., Hartman,T. and Shamir,R. (2002) Handling long targets and errors in sequencing by hybridization. RECOMB'02, 176–185.
Junger,M. and Naddef,D. (2001) TSP cuts which do not conform to the template paradigm. Comput. Combinat. Optim., 261–304.
Lawler,E.L., Lenstra,J.K., Rinnooy Kan,A.H.G. and Shmoys,D.B. (1985) The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. John Wiley & Sons, New York.
Pevzner,P.A., Lysov,Y.P., Khrapko,K.R., Belyavsky,S.V., Florentiev,V.L. and Mirzabekov,A.D. (1991) Improved chips for sequencing by hybridization. J. Biomol. Struct. Dyn., 9, 399–410.
Preparata,F.P. and Upfal,E. (2000) Sequencing-by-hybridization at the information-theory bound: an optimal algorithm. J. Comput. Biol., 7, 621–630.
Strezoska,Z., Paunesku,T., Radosavljevic,D., Labat,I., Drmanac,R. and Crkvenjakov,R. (1991) DNA sequencing by hybridization: 100 bases read by a non-gel-based method. Proc. Natl Acad. Sci. USA, 88, 10089–10093.
Zhang,J., Wu,L. and Zhang,X. (2003) Reconstruction of DNA sequencing by hybridization. Bioinformatics, 19, 14–21.