The Bounded Search Tree Algorithm for the Closest String Problem has Quadratic Smoothed Complexity

Christina Boucher
Department of Computer Science and Engineering
University of California, San Diego

Abstract. Given a set $S$ of $n$ strings, each of length $\ell$, and a nonnegative value $d$, we define a center string as a string of length $\ell$ that has Hamming distance at most $d$ from each string in $S$. The Closest String problem aims to determine whether there exists a center string for a given set of strings $S$ and input parameters $n$, $\ell$, and $d$. When $n$ is relatively large with respect to $\ell$, the basic majority algorithm solves the Closest String problem efficiently, and the problem can also be solved efficiently when either $n$, $\ell$ or $d$ is reasonably small [12]. Hence, the only case for which there is no known efficient algorithm is when $n$ is between $\log \ell / \log\log \ell$ and $\log \ell$. Using smoothed analysis, we prove that such Closest String instances can be solved efficiently by the $O(n\ell + nd \cdot d^{d})$-time algorithm by Gramm et al. [13]. In particular, we show that for any given Closest String instance $I$, the expected running time of this algorithm on a small perturbation of $I$ is $O(n\ell + nd \cdot d^{2+o(1)})$.

1 Introduction

Finding similar regions in multiple DNA, RNA, or protein sequences plays an important role in many applications, including universal PCR primer design [7, 16, 19, 26], genetic probe design [16], antisense drug design [6, 16], finding transcription factor binding sites in genomic data [28], determining an unbiased consensus of a protein family [4], and motif recognition [16, 24, 25]. The Closest String problem formalizes this task of finding a common pattern in an input set of strings and can be defined as follows:

Input: A set of $n$ length-$\ell$ strings $S = \{s_1, \ldots, s_n\}$ over a finite alphabet $\Sigma$ and a nonnegative integer $d$.
Question: Find a string $s$ of length $\ell$ such that the Hamming distance from $s$ to every string $s_i \in S$ is at most $d$.
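As a concrete reading of the problem statement, the following Python sketch checks whether a candidate string is a center string for a given set. The function names `hamming` and `is_center` are illustrative choices and do not come from the paper.

```python
from typing import Sequence


def hamming(x: str, y: str) -> int:
    """Hamming distance d(x, y) between two strings of equal length."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(x, y))


def is_center(s: str, strings: Sequence[str], d: int) -> bool:
    """True if s is within Hamming distance d of every string in the set."""
    return all(hamming(s, t) <= d for t in strings)


S = ["1100", "0011", "0000"]
print(is_center("0000", S, 2))  # distances from "0000" are 2, 2 and 0
print(is_center("0000", S, 1))  # two of the strings are at distance 2
```

Deciding whether any center string exists is the hard part; verifying a given candidate, as above, takes only $O(n\ell)$ time.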
We refer to $s$ as the center string and let $d(x, y)$ be the Hamming distance between strings $x$ and $y$. The optimization version of this problem tries to minimize the parameter $d$. The Closest String problem was first introduced and studied in the context of bioinformatics by Lanctot et al. [16]. Frances and Litman [11] showed the problem to be NP-complete even for the special case when the input contains only binary strings, implying that there is no polynomial-time algorithm for solving this problem unless P = NP. Since its introduction, efficient approximation algorithms and exact heuristics for the Closest String problem have been thoroughly considered [9, 10, 13, 16, 17, 21]. Most recently, Hufsky et al. [14] introduced a data reduction technique that allows instances that do not have a solution to be filtered out, and incorporated this preprocessing step into the algorithm of Gramm et al. [13].

One approach to investigating the computational intractability of the Closest String problem is to consider its parameterized complexity, which aims to classify computationally hard problems according to their inherent difficulty with respect to a subset of the input parameters. If a problem is solvable by an algorithm whose running time is polynomial in the input size and exponential only in parameters that typically remain small, then it can still be considered tractable in some practical sense. A problem $\varphi$ is said to be fixed-parameter tractable with respect to parameter $k$ if there exists an algorithm that solves $\varphi$ in $f(k) \cdot n^{O(1)}$ time, where $f$ is a function of $k$ that is independent of $n$ [8]. Gramm et al. [13] proved that Closest String is fixed-parameter tractable with respect to the parameter $d$ by giving an $O(n\ell + nd \cdot d^{d})$-time algorithm that is based on the bounded search tree paradigm.
It has been previously shown that when the number of strings is sufficiently large with respect to $\ell$ (namely, whenever $2^n > \ell$), the basic majority algorithm, which returns a string that contains the majority symbol at each position with ties broken arbitrarily, works well in practice. Also, there exist efficient solutions for the Closest String problem when either $n$, $\ell$ or $d$ is reasonably small [12]. The only case for which there is no known efficient algorithm is when $n$ is between $\log \ell / \log\log \ell$ and $\log \ell$. Ma and Sun [21] state: “The instances with d in this range seem to be the hardest instances of the closest string problem. However, because the fixed-parameter algorithm has polynomial (although with high degree) running time on these instances, a proof for the hardness of these instances seem to be difficult too.”

We initiate the study of the smoothed complexity of a slightly modified version of the algorithm by Gramm et al. [13], and demonstrate that a more careful analysis of this algorithm reveals that it is efficient for the “hardest” Closest String instances, where $n$ is between $\log \ell / \log\log \ell$ and $\log \ell$. Our analysis gives an analytical reason as to why this algorithm performs well in practice. We introduce a perturbation model for the Closest String problem, and prove that the expected size of the search tree of the algorithm of Gramm et al. [13] on these smoothed instances is at most $d^{2+o(1)}$, hence resolving an open problem that was suggested by Gramm et al. [13], and Ma and Sun [21].

1.1 Related Work

Gramm et al. [13] proved that the Closest String problem is fixed-parameter tractable when parameterized by $n$, and when parameterized by $d$. More recently, Ma and Sun gave an $O(n|\Sigma|^{O(d)})$-time algorithm, which is a fixed-parameter algorithm in parameters $d$ and $\Sigma$ [21]. Chen et al. [5], Wang and Zhu [29], and Zhao and Zhang [30] improved upon the fixed-parameter tractable result of Ma and Sun [21]. Lokshtanov et al.
[18] gave a lower bound for the time complexity of the Closest String problem with respect to $d$. Another approach to investigating the tractability of this NP-complete problem is to consider how well the Closest String problem can be approximated in polynomial time. Lanctot et al. [16] gave a polynomial-time algorithm that achieves a $\frac{4}{3} + o(1)$ approximation guarantee. Li et al. [17], Andoni et al. [1] and Ma and Sun [21] each proved PTAS results for this problem.

Smoothed analysis was introduced as an intermediate measure between worst-case and average-case analysis and is used to explain the phenomenon that many algorithms with poor worst-case behaviour efficiently find good solutions in practice. It works by showing that the worst-case instances are fragile to small changes; slightly perturbing a worst-case instance destroys the property of it being worst-case [27]. The smoothed complexity of other string and sequence problems has been considered by Andoni and Krauthgamer [2], Banderier et al. [3], Manthey and Reischuk [23], and Ma [20]. Andoni and Krauthgamer [2] studied the smoothed complexity of sequence alignment by the use of a novel model of edit distance; their results demonstrate the efficiency of several tools used for sequence alignment, most notably PatternHunter [22]. Manthey and Reischuk gave several results concerning the smoothed analysis of binary search trees [23]. Ma demonstrated that a simple greedy algorithm runs efficiently in practice for Shortest Common Superstring [20], a problem that has applications to string compression and DNA sequence assembly.

1.2 Preliminaries

Let $s$ be a string over the alphabet $\Sigma$. We denote the length of $s$ as $|s|$, and the $j$th letter of $s$ as $s[j]$. Hence, $s = s[1]s[2] \ldots s[|s|]$. It will be convenient to consider a set of strings $S = \{s_1, \ldots, s_n\}$, each of which has length $\ell$, as an $n \times \ell$ matrix. Then we refer to the $i$th column as the vector $c_i = [s_1(i), \ldots, s_n(i)]^T$ in the matrix representation of $S$.
We refer to a majority string for $S$ as the length-$\ell$ string containing the letter that occurs most often at each position; this string is not necessarily unique. The following fact, which is easily proved, is used in Section 3.

Fact 1 Let $I = (S, d)$ be a Closest String instance and let $s_{maj}$ be any majority string for $S$. Then $d(s^*, s_{maj}) \le 2d$ for any center string $s^*$ for $S$.

Given functions $f$ and $g$ of a natural number variable $x$, the notation $f \sim g$ $(x \to \infty)$ is used to express that
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = 1 \]
and $f$ is an asymptotic estimation of $g$ (for relatively large values of $x$). The following asymptotic estimation is used in our analysis.

Fact 2 For fixed $j > 0$ the following asymptotic estimation exists:
\[ \sum_{i=0}^{2d} \binom{2i+j}{i} \left(\frac{\ell-1}{\ell}\right)^{i} \left(\frac{1}{\ell}\right)^{i+j} \sim \left(\frac{1}{\ell-1}\right)^{2d}. \]

Given a Closest String instance $I = (S, d)$ that has at least one center string we can assume, without loss of generality, that $0^{\ell}$ is a center string; any instance that has a center string can be transformed to an equivalent instance where $0^{\ell}$ is a center string [12]. Hence, for the remainder of this paper we assume that any instance that has a center string has $0^{\ell}$ as a center string.

2 Bounded Search Tree Algorithm

The following algorithm, due to Gramm et al. [13], applies a well-known bounded search tree paradigm to prove that the Closest String problem can be solved in linear time when parameterized by $d$.

Bounded Search Tree Algorithm
Input: A Closest String instance $I = (S, d)$, a candidate string $s$, and a parameter $\Delta d$.
Output: A center string $s$ if it exists, and “not found” otherwise.
  If $\Delta d < 0$, then return “not found”
  If $d(s, s_i) \le d$ for all $i \in \{1, \ldots, n\}$, then return $s$
  Choose $i \in \{1, \ldots, n\}$ such that $d(s, s_i) > d$.
  $P = \{p \mid s[p] \ne s_i[p]\}$;
  Choose any $P' \subseteq P$ with $|P'| = d + 1$.
  For each position $p \in P'$
    Let $s[p] = s_i[p]$
    $s_{ret}$ = Bounded Search Tree Algorithm $(s, \Delta d - 1)$
    If $s_{ret} \ne$ “not found”, then return $s_{ret}$
  Return “not found”

The parameter $\Delta d$ is initialized to be equal to $d$.
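The recursion above can be sketched in Python as follows. This is a minimal illustration rather than the implementation of Gramm et al.: the names `bounded_search` and `closest_string` are hypothetical, the candidate is initialised to the first input string, and the budget argument plays the role of $\Delta d$.

```python
from typing import List, Optional, Sequence


def hamming(x: Sequence[str], y: Sequence[str]) -> int:
    """Number of positions at which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def bounded_search(s: List[str], strings: Sequence[str], d: int,
                   budget: int) -> Optional[str]:
    """Recursive search; `budget` plays the role of the parameter Delta d."""
    if budget < 0:
        return None
    # Look for an input string that is too far from the candidate.
    far = next((t for t in strings if hamming(s, t) > d), None)
    if far is None:
        return "".join(s)  # the candidate is a center string
    # Any d + 1 positions at which the candidate and `far` disagree.
    positions = [p for p in range(len(s)) if s[p] != far[p]][: d + 1]
    for p in positions:
        branch = list(s)    # copy so that sibling branches stay independent
        branch[p] = far[p]  # move the candidate one step towards `far`
        found = bounded_search(branch, strings, d, budget - 1)
        if found is not None:
            return found
    return None


def closest_string(strings: Sequence[str], d: int) -> Optional[str]:
    """Assumed initialisation: candidate = first input string, budget = d."""
    return bounded_search(list(strings[0]), strings, d, d)


print(closest_string(["1100", "0011", "0000"], 2))
print(closest_string(["000", "111"], 1))
```

Restricting the branching to $d + 1$ disagreeing positions is what bounds the search tree: a center string differs from the chosen $s_i$ in at most $d$ positions, so at least one of the $d + 1$ branches moves the candidate closer to it.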
Since every recursive call decreases $\Delta d$ by one and the algorithm halts when $\Delta d < 0$, the search tree has height at most $d$. At each recursive step, if the candidate string $s$ is not a center string then it is changed at one position as follows: a string $s_i$ is chosen uniformly at random from the set of strings that have distance greater than $d$ from $s$, and $s$ is changed so that it is equal to $s_i$ at one of the positions where $s$ and $s_i$ disagree. This yields an upper bound of $(d + 1)^{d}$ on the search tree size.

Gramm et al. [13] initialize the candidate string $s$ to be a string from $S$ chosen uniformly at random. We consider a slight modification where the candidate string is initialized to be a majority string. As stated in Fact 1, any majority string has distance at most $2d$ from the center string. The analysis of Gramm et al. [13] concerning the running time and correctness of the algorithm holds for this modification, and yields a worst-case running time of $O(n\ell + nd \cdot d^{2d})$. Hence, the following theorem is a trivial extension of the worst-case analysis by Gramm et al. [13] and will be used in our smoothed analysis of Bounded Search Tree Algorithm.

Theorem 1. “Bounded Search Tree Algorithm” solves the Closest String problem in $O(n\ell + nd \cdot d^{2d})$ time.

3 Smoothed Analysis

3.1 Perturbation of Closest String Instances

Our model applies to problems defined on strings. It is parameterized by a probability $p$, where $0 \le p \le 1$, and is defined as follows. Given a length-$\ell$ string $s[1]s[2] \ldots s[\ell]$, each element is selected (independently) with probability $p$. Let $x$ be the number of selected elements (on average $x = p\ell$). Replace each of the $x$ selected positions with a symbol chosen (uniformly at random) from the $|\Sigma|$ symbols of $\Sigma$. Given a Closest String instance $I = (S, d)$, we obtain a perturbed instance of $I$ by perturbing each string in $S$ with probability $p > 0$ as previously described. We denote a perturbed instance as $I' = (S', d')$, where $d' = d$ and $S'$ contains the perturbed strings of $S$.
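The perturbation model can be sketched in a few lines of Python. The names `perturb_string` and `perturb_instance` are illustrative, and passing an explicit `random.Random` instance is a design choice made here for reproducibility, not part of the model.

```python
import random
from typing import List, Sequence


def perturb_string(s: str, p: float, alphabet: Sequence[str],
                   rng: random.Random) -> str:
    """Select each position independently with probability p and replace it
    with a symbol drawn uniformly at random from the alphabet."""
    out = []
    for symbol in s:
        if rng.random() < p:
            # The replacement may coincide with the original symbol.
            out.append(rng.choice(alphabet))
        else:
            out.append(symbol)
    return "".join(out)


def perturb_instance(strings: Sequence[str], p: float,
                     alphabet: Sequence[str],
                     rng: random.Random) -> List[str]:
    """Perturb every string of the instance; the distance bound d is unchanged."""
    return [perturb_string(s, p, alphabet, rng) for s in strings]


rng = random.Random(2024)
print(perturb_instance(["ACGTACGT", "AAAACCCC"], 0.25, "ACGT", rng))
```

Because the replacement symbol is drawn from all of $\Sigma$, a selected position keeps its original symbol with probability $1/|\Sigma|$, so the expected number of positions that actually change is slightly below $p\ell$.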
This perturbation model has the effect of naturally adding noise to the input. Note that the perturbation model may have the effect of converting an instance that has a center string to one that does not; however, this remains a valid model for smoothed analysis. For example, the model used by Spielman and Teng allows perturbation from a feasible linear program to a non-feasible linear program [27].

3.2 Good Columns and Simple Instances

In this subsection we classify the instances for which the Bounded Search Tree Algorithm performs efficiently, and bound the probability that an instance has this classification. This classification is used in our smoothed analysis.

Definition 1. Let $I = (S, d)$ be a Closest String instance, and $S_{maj}$ be the set of majority strings for $S$. We define $S$ as simple if any string in $S_{maj}$ has Hamming distance at most $d$ from all strings in $S$.

Closest String instances that are simple have the property that Bounded Search Tree Algorithm halts immediately with a center string. In the remainder of this subsection we aim to bound the probability that an instance is simple. The next definitions are used to simplify the discussion of the analysis that bounds this probability. Recall our assumption that any instance that contains a center string has $0^{\ell}$ as a center string. Given an instance $I = (S, d)$ with a center string, we refer to a column of $S$ as good if it contains more zeros than nonzeros and thus guarantees that the majority symbol is equal to the center string at that position; all other columns are bad.

Lemma 1. Let $I' = (S', d')$ be the perturbed instance of $I = (S, d)$ with probability $p$. Then the probability that $I'$ is not simple is at least
\[ 1 - \left(1 - (q(1-q))^{n/2-1}\right)^{\ell} \]
and at most
\[ 1 - \left(1 - \binom{n}{n/2+1} (q(1-q))^{n/2+1}\right)^{\ell}, \]
where $0 \le q \le \frac{d}{\ell}(1 - 2p) + p$.

Proof. Let $s'_i \in S'$, let $s^*$ be a closest string for $S$, and let $q$ denote the probability that $s'_i[j] = 0$ for some $0 \le j \le \ell$.
It follows that
\[ q = \frac{d(s_i, s^*)}{\ell}(1 - p) + \left(1 - \frac{d(s_i, s^*)}{\ell}\right) p = \frac{d(s_i, s^*)}{\ell}(1 - 2p) + p, \]
which is at most $\frac{d}{\ell}(1 - 2p) + p$.

We first calculate the probability that a column is good when $n$ is odd. Let $X_{i,j}$ be a binary random variable that is equal to 1 if $s_i$ is equal to the value of the center string (i.e. equal to 0) at the $j$th position. For a given column $j$ we let the number of zeros be $X_j = \sum_i X_{i,j}$. Then
\[ \Pr[X_j \ge \lfloor n/2 \rfloor + 1] = 1 - \Pr[X_j \le \lfloor n/2 \rfloor] = 1 - (1 - q)^{n} \sum_{i=0}^{\lfloor n/2 \rfloor} \binom{n}{i} \left(\frac{q}{1-q}\right)^{i}. \]
We focus on bounding $(1 - q)^{n} \sum_{i=0}^{\lfloor n/2 \rfloor} \binom{n}{i} \left(\frac{q}{1-q}\right)^{i}$. Note that $\binom{n}{i}$ is unimodal, peaking at $i = \lfloor n/2 \rfloor$ on the range $0 \le i \le \lfloor n/2 \rfloor$. We have that
\[ S = \sum_{i=0}^{\lfloor n/2 \rfloor} \binom{n}{i} \left(\frac{q}{1-q}\right)^{i} \ge \binom{n}{\lfloor n/2 \rfloor - 1} \left(\frac{q}{1-q}\right)^{\lfloor n/2 \rfloor - 1} \]
and similarly,
\[
\begin{aligned}
S = \sum_{i=0}^{\lfloor n/2 \rfloor} \binom{n}{i} \left(\frac{q}{1-q}\right)^{i}
&\le \binom{n}{\lfloor n/2 \rfloor} \sum_{i=0}^{\lfloor n/2 \rfloor} \left(\frac{q}{1-q}\right)^{i} && \text{[unimodality]} \\
&\le \binom{n}{\lfloor n/2 \rfloor} \left(\frac{q}{1-q}\right)^{\lfloor n/2 \rfloor} && \text{[geometric series]}
\end{aligned}
\]
The sum $S$ is equal to the first term up to a small multiplicative error and therefore we obtain the following:
\[ \Pr[X_j > \lfloor n/2 \rfloor] \approx 1 - \binom{n}{\lfloor n/2 \rfloor} q^{\lfloor n/2 \rfloor} (1 - q)^{\lfloor n/2 \rfloor}. \]
Therefore, we obtain the following bounds:
\[ 1 - \binom{n}{n/2 - 1} (q(1-q))^{n/2-1} \le \Pr[X_j > \lfloor n/2 \rfloor] \le 1 - (q(1-q))^{n/2+1}. \]
Using the previous inequality we can bound the probability that $I'$ is not simple by determining the probability that $I'$ contains at least one bad column. Thus, we get
\[ 1 - \left(1 - (q(1-q))^{n/2+1}\right)^{\ell} \le \Pr[I' \text{ is not simple}] \]
and
\[ \Pr[I' \text{ is not simple}] \le 1 - \left(1 - \binom{n}{n/2 - 1} (q(1-q))^{n/2+1}\right)^{\ell}. \]
Similarly, these bounds exist for the case when $n$ is even.

3.3 Smoothed Height of the Bounded Search Tree

As previously discussed, there are $O(d^{d})$ possible paths in a bounded search tree, denoted as $T$, corresponding to the solutions traversed by the Bounded Search Tree Algorithm. We now bound the size of $T$ for perturbed instances. Let $P_i$ be the indicator variable describing whether the $i$th path in $T$ results in a center string, i.e. $P_i = 1$ if the $i$th path leads to a center string and $P_i = 0$ otherwise. The algorithm halts when $P_i = 1$.
Let $\mathcal{P}$ be the number of paths considered until $P_i = 1$ and the algorithm halts.

Lemma 2. Let $I' = (S', d')$ be a perturbed instance with probability $0 < p \le \frac{1}{2}$. If $I' = (S', d')$ is not a simple instance, then for sufficiently large $\ell$, constant $c > 0$, and when $n$ is between $\log \ell / \log\log \ell$ and $\log \ell$, we have:
\[ \Pr[\mathcal{P} \ge d^{dpc}] \le \frac{1}{d^{dpc}}. \]

Proof. If $I' = (S', d')$ does not have a center string then Bounded Search Tree Algorithm will always return “not found”; otherwise, $P_i = 1$ with some probability. It follows that the expected number of paths that need to be considered is $1 / \Pr[P_i = 1]$. We now calculate $\Pr[P_i = 1]$. If the candidate string $s$ is not equal to $0^{\ell}$ (i.e. a center string) then there exists at least one position of $s$ that can be changed so that $d(s, 0^{\ell})$ decreases by one; the probability of this occurring is at least $1/\ell$.

Let $Y_k \in \{0, 1, \ldots, d'\}$ be the random variable that corresponds to the Hamming distance between $s$ and $0^{\ell}$, where $k$ is the number of recursive iterations of the algorithm. The process $Y_0, Y_1, Y_2, \ldots$ is a Markov chain with a barrier at state $d'$ and with time- and state-dependent transition probabilities. This process is overly complicated and we instead analyze the following Markov chain: $Z_0, Z_1, \ldots$, where $Z_k$ is the random variable that is equal to the state number after $k$ recursive steps and there exist infinitely many states. Initially, this Markov chain is started like the stochastic process above, i.e. $Z_0 = Y_0$. We let $Z_{k+1} = Y_k - 1$ if the process decreases the Hamming distance between $s$ and $0^{\ell}$ by one; otherwise $Z_{k+1} = Y_k + 1$. After the algorithm halts, we continue with the same transition probabilities. We can show by induction on $k$ that $Y_k \le Z_k$ for all $k$ and therefore $\Pr[P_i = 1]$ is at least $\Pr[\exists t \le d : Z_t = 0]$.
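To get a feel for the quantity $\Pr[\exists t \le d : Z_t = 0]$, the sketch below computes it exactly for a walk that steps towards 0 with probability $1/\ell$ and away from 0 with probability $(\ell - 1)/\ell$. Treating the lower bound $1/\ell$ as the exact transition probability is an assumption made for illustration, and the function name `hit_probability` is hypothetical.

```python
from fractions import Fraction


def hit_probability(k: int, steps: int, ell: int) -> Fraction:
    """Pr[exists t <= steps with Z_t = 0 | Z_0 = k] for the simplified walk."""
    right = Fraction(1, ell)  # probability of a step towards state 0
    wrong = 1 - right         # probability of a step away from state 0
    if k == 0:
        return Fraction(1)
    dist = {k: Fraction(1)}   # probability mass on states not yet absorbed
    absorbed = Fraction(0)    # mass that has already reached state 0
    for _ in range(steps):
        nxt = {}
        for state, mass in dist.items():
            if state == 1:
                absorbed += mass * right
            else:
                nxt[state - 1] = nxt.get(state - 1, Fraction(0)) + mass * right
            nxt[state + 1] = nxt.get(state + 1, Fraction(0)) + mass * wrong
        dist = nxt
    return absorbed


print(hit_probability(1, 3, 4))  # 1/4 directly, plus 3/64 via one detour
```

Exact rational arithmetic is used so that small probabilities such as $(1/\ell)^{k}$ are not lost to floating-point rounding.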
We made the assumption that $S'$ contains only one center string; however, this assumption is not needed, since the random walk may find another center string while not in the terminating state, and this possibility only increases the probability that the algorithm terminates.

Given that the Markov chain starts in state $k$, it can reach a halting state in as few as $k$ steps by making transitions through the states $k - 1, k - 2, \ldots, 1, 0$. The probability of this happening is $(1/\ell)^{k}$. We now incorporate into the analysis the possibility that several steps in the “wrong” direction are made; “wrong” steps refer to when the candidate string is altered so that the distance between the candidate string and the center string increases. Suppose $w$ steps in the Markov chain are taken in the wrong direction; then $k + w$ steps are needed in the “correct” direction, and therefore the halting state can be reached in $2w + k$ steps. Let $q(w, k)$ be the probability that $Z_{2w+k} = 0$ such that the halting state is not reached in any earlier step, under the condition that the Markov chain started in state $k$. More formally,
\[ q(w, k) = \Pr[Z_{2w+k} = 0 \text{ and } Z_{\alpha} > 0 \;\; \forall \alpha < 2w + k \mid Z_0 = k]. \]
Clearly $q(0, k) = (1/\ell)^{k}$, and in the general case $q(w, k)$ is $((\ell - 1)/\ell)^{w} (1/\ell)^{w+k}$ times the number of ways of arranging $w$ wrong steps and $w + k$ correct steps such that the sequence starts in state $k$, ends in the halting state and does not reach this state before the last step. By applying the ballot theorem [15] we can deduce that there are $\frac{k}{2w+k} \binom{2w+k}{w}$ possible arrangements of these $w$ wrong steps and $w + k$ correct steps, and the above probability is at least
\[ \frac{k}{2w+k} \binom{2w+k}{w} \left(\frac{\ell - 1}{\ell}\right)^{w} \frac{1}{\ell^{w+k}}. \]
This expression is not defined in the case $w = k = 0$; however, it is equal to 1 in this case.

The probability that $P_i = 1$ at the $i$th path is dependent on the starting position of the Markov chain, which is equal to the number of bad columns. Let $X_{bad}$ be the number of bad columns in $S'$, which is at most $2d$ (by Fact 1).
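The counting step can be checked by brute force. The sketch below enumerates every ordering of $w$ wrong and $w + k$ correct steps, keeps those that first reach state 0 at the final step, and compares the count with the ballot-theorem value $\frac{k}{2w+k}\binom{2w+k}{w}$. The function names are illustrative, and the transition probabilities $1/\ell$ and $(\ell-1)/\ell$ are those of the simplified chain.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def count_first_passage(w: int, k: int) -> int:
    """Brute-force count of step orders that first hit state 0 at the last step."""
    total = 2 * w + k
    count = 0
    for wrong_at in combinations(range(total), w):
        wrong = set(wrong_at)
        state, ok = k, True
        for t in range(total):
            state += 1 if t in wrong else -1
            if state == 0 and t < total - 1:
                ok = False  # reached the halting state too early
                break
        if ok and state == 0:
            count += 1
    return count


def ballot_count(w: int, k: int) -> int:
    """Ballot-theorem count k/(2w+k) * C(2w+k, w), for k >= 1."""
    return k * comb(2 * w + k, w) // (2 * w + k)


def q(w: int, k: int, ell: int) -> Fraction:
    """First-passage probability q(w, k) of the simplified chain."""
    return (ballot_count(w, k)
            * Fraction(ell - 1, ell) ** w
            * Fraction(1, ell) ** (w + k))


for w in range(4):
    for k in range(1, 4):
        assert count_first_passage(w, k) == ballot_count(w, k)
print(q(0, 3, 5), q(1, 2, 5))
```

For $w = 0$ the count is 1, which recovers $q(0, k) = (1/\ell)^{k}$.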
Hence, we get the following:
\[
\begin{aligned}
\Pr[P_i = 1] &\ge \sum_{k=1}^{2d} \Pr[X_{bad} = k] \sum_{w=0}^{\frac{1}{2}(i-k)} q(w, k) \\
&\ge \left(\Pr[X_{bad} \le 2d] - \Pr[X_{bad} = 0]\right) \sum_{k=1}^{2d} \sum_{w=0}^{\frac{1}{2}(i-k)} q(w, k) \\
&\ge \Pr[I' \text{ is not simple}] \sum_{k=1}^{2d} \sum_{w=0}^{\frac{1}{2}(i-k)} q(w, k) \\
&\ge \left(1 - \left(1 - (q(1-q))^{n/2+1}\right)^{\ell}\right) \sum_{k=1}^{2d} \sum_{w=0}^{\frac{1}{2}(i-k)} q(w, k) && \text{[Lemma 1]}
\end{aligned}
\]
We now aim to find a bound on $q(w, k)$.
\[
\begin{aligned}
\Pr[P_i = 1] &\ge \left(1 - \left(1 - (q(1-q))^{n/2+1}\right)^{\ell}\right) \sum_{w=0}^{\frac{1}{2}(i-1)} \sum_{k=1}^{2d} q(w, k) \\
&\ge \left(1 - \left(1 - (q(1-q))^{n/2+1}\right)^{\ell}\right) \sum_{w=0}^{\frac{1}{2}(i-1)} \left(\frac{1}{\ell - 1}\right)^{2d+1} \\
&\ge \left(1 - \left(1 - (q(1-q))^{n/2+1}\right)^{\ell}\right) \frac{1 - \left(\frac{1}{\ell - 1}\right)^{id+1}}{1 - \left(\frac{1}{\ell - 1}\right)^{2d+1}} && \text{[Fact 2]}
\end{aligned}
\]
Hence, for sufficiently large $\ell$ we have $\Pr[P_i = 1] = 1 - \left(1 - (q(1-q))^{n/2+1}\right)^{\ell}$ and it follows that:
\[ E[\mathcal{P}] = \frac{1}{1 - \left(1 - (q(1-q))^{n/2+1}\right)^{\ell}} \]
and, by Markov's inequality, for any $c > 0$
\[ \Pr[\mathcal{P} \ge d^{dcp}] \le \frac{1}{d^{dcp} \left(1 - \left(1 - (q(1-q))^{n/2+1}\right)^{\ell}\right)} \le \frac{1}{d^{dcp} \left(1 - \exp\left(-\ell / (q(1-q))^{n/2+1}\right)\right)}. \]
Hence, $\Pr[\mathcal{P} \ge d^{dcp}]$ is at most $\frac{1}{d^{dcp}}$ for sufficiently large $\ell$ and when $n$ is between $\log \ell / \log\log \ell$ and $\log \ell$.

The following is our main theorem, which provides an upper bound on the expected number of paths that need to be considered before a center string is found. An important aspect of this result is the small perturbation probability required in comparison to the instance size; the expected number of positions to change in each string is $O(\log \ell)$.

Theorem 2. For some small $\epsilon > 0$ and perturbation probability $0 \le p \le \log \ell / \ell$, the expected running time of “Bounded Search Tree Algorithm” on the perturbed instances is $O(n\ell + nd \cdot d^{2+\epsilon})$ when $n$ is between $\log \ell / \log\log \ell$ and $\log \ell$, and $\ell$ is sufficiently large.

Proof. Let $\mathcal{P}'$ be the number of paths considered until a center string is found for a perturbed instance. There are $O(d^{2d})$ possible paths in the search tree corresponding to Bounded Search Tree Algorithm. The size of the bounded search tree is equal to zero for simple instances and therefore we are only required to consider instances that are not simple.
For instances that are not simple but satisfy the conditions of Lemma 2, we use this lemma with $p \le \frac{2+\epsilon}{dc}$, where $c > 0$, to bound the size of the search tree. Lemma 1, which describes the probability that an instance is not simple, is also used in the following analysis.
\[
\begin{aligned}
E[\mathcal{P}] &\le d^{dcp} \cdot \Pr[I' \text{ is not simple}] + \sum_{i=dcp}^{2d} d^{i} \Pr[\mathcal{P} \ge d^{i}] \\
&\le d^{2+\epsilon} \left(1 - \left(1 - \binom{n}{n/2 - 1} (q(1-q))^{n/2-1}\right)^{\ell}\right) + \sum_{i=2+\epsilon}^{2d} d^{i} \Pr[\mathcal{P} \ge d^{i}]
\end{aligned}
\]
For sufficiently large $\ell$ and when $n$ is between $\log \ell / \log\log \ell$ and $\log \ell$, we get:
\[ E[\mathcal{P}] \le d^{2+\epsilon} + \sum_{i=2+\epsilon}^{2d} d^{i} \Pr[\mathcal{P} \ge d^{i}]. \]
Therefore, we have $E[\mathcal{P}] \le d^{2+\epsilon} + d - 2 - \epsilon$. The expected size of the search tree is at most $o(1) + d^{2+\epsilon}$. It follows from the analysis of Gramm et al. [13], which demonstrated that each recursive step takes time $O(nd)$ and the preprocessing takes time $O(n\ell)$, that Bounded Search Tree Algorithm has expected running time of $O(n\ell + nd \cdot d^{2+\epsilon})$.

We note that we require $\ell$ to be sufficiently large; however, this restriction is not significant since we only require $\ell \ge 10$. As previously mentioned, the problem can be solved efficiently when $\ell$ is relatively small (i.e. $\ell \le 10$); even the trivial algorithm that tries all $|\Sigma|^{\ell}$ strings can be used for these instances.

Acknowledgements

The author would like to thank Professor Bin Ma for his discussions and insights concerning the results presented in this paper and Professor Ming Li for suggesting this area of study. The author is supported by NSERC Postdoctoral Fellowship, NSERC Grant OGP0046506, NSERC Grant OGP0048487, Canada Research Chair program, MITACS, and Premier’s Discovery Award.

References

1. A. Andoni, P. Indyk, and M. Patrascu. On the optimality of the dimensionality reduction method. In Proc. of FOCS, pages 449–456, 2006.
2. A. Andoni and R. Krauthgamer. The smoothed complexity of edit distance. In Proc. of ICALP, pages 357–369, 2008.
3. C. Banderier, R. Beier, and K. Mehlhorn. Smoothed analysis of three combinatorial problems. In Proc. of MFCS, pages 198–207, 2003.
4. A. Ben-Dor, G. Lancia, J. Perone, and R. Ravi. Banishing bias from consensus strings. In Proc. of 8th CPM, pages 247–261, 1997.
5. Z.-Z. Chen, B. Ma, and L. Wang. A three-string approach to the closest string problem. In Proc. of 16th COCOON, pages 449–458, 2010.
6. X. Deng, G. Li, Z. Li, B. Ma, and L. Wang. Genetic design of drugs without side-effects. SIAM Journal on Computing, 32(4):1073–1090, 2003.
7. J. Dopazo, A. Rodríguez, J.C. Sáiz, and F. Sobrino. Design of primers for PCR amplification of highly variable genomes. Computer Applications in the Biosciences, 9:123–125, 1993.
8. R.G. Downey and M.R. Fellows. Parameterized Complexity. Springer, 1999.
9. M.R. Fellows, J. Gramm, and R. Niedermeier. On the parameterized intractability of closest substring and related problems. In Proc. of 19th STACS, pages 262–273, 2002.
10. M.R. Fellows, J. Gramm, and R. Niedermeier. On the parameterized intractability of motif search problems. Combinatorica, 26:141–167, 2006.
11. M. Frances and A. Litman. On covering problems of codes. Theory of Computing Systems, 30(2):113–119, 1997.
12. J. Gramm, R. Niedermeier, and P. Rossmanith. Exact solutions for closest string and related problems. In Proc. of the 12th ISAAC, pages 441–453, 2001.
13. J. Gramm, R. Niedermeier, and P. Rossmanith. Fixed-parameter algorithms for closest string and related problems. Algorithmica, 37(1):25–42, 2003.
14. F. Hufsky, L. Kuchenbecker, K. Jahn, J. Stoye, and S. Böcker. Swiftly computing center strings. BMC Bioinformatics, 12(106), 2011.
15. T. Konstantopoulos. Ballot theorems revisited. Statistics and Probability Letters, 24:331–338, 1995.
16. J.K. Lanctot, M. Li, B. Ma, S. Wang, and L. Zhang. Distinguishing string selection problems. Information and Computation, 185(1):41–55, 2003.
17. M. Li, B. Ma, and L. Wang. Finding similar regions in many strings. Journal of Computer and System Sciences, 65(1):73–96, 2002.
18. D. Lokshtanov, D. Marx, and S. Saurabh. Slightly superexponential parameterized problems. In Proc. of the 22nd SODA, pages 760–776, 2011.
19. K. Lucas, M. Busch, S. Össinger, and J.A. Thompson. An improved microcomputer program for finding gene- and gene family-specific oligonucleotides suitable as primers for polymerase chain reactions or as probes. Computer Applications in the Biosciences, 7:525–529, 1991.
20. B. Ma. Why greedy works for shortest common superstring problem. In Proc. of CPM, pages 244–254, 2008.
21. B. Ma and X. Sun. More efficient algorithms for closest string and substring problems. SIAM Journal on Computing, 39:1432–1443, 2009.
22. B. Ma, J. Tromp, and M. Li. PatternHunter: faster and more sensitive homology search. Bioinformatics, 18(3):440–445, 2002.
23. B. Manthey and R. Reischuk. Smoothed analysis of binary search trees. Theoretical Computer Science, 3(378):292–315, 2007.
24. G. Pavesi, G. Mauri, and G. Pesole. An algorithm for finding signals of unknown length in DNA sequences. Bioinformatics, 17:S207–S214, 2001.
25. P. Pevzner and S. Sze. Combinatorial approaches to finding subtle signals in DNA strings. In Proc. of 8th ISMB, pages 269–278, 2000.
26. V. Proutski and E.C. Holme. Primer master: A new program for the design and analysis of PCR primers. Computer Applications in the Biosciences, 12:253–255, 1996.
27. D.A. Spielman and S.-H. Teng. Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time. Journal of the ACM, 51:296–305, 2004.
28. M. Tompa et al. Assessing computational tools for the discovery of transcription factor binding sites. Nature, 23(1):137–144, 2005.
29. L. Wang and B. Zhu. Efficient algorithms for the closest string and distinguishing string selection problems. In Proc. of 3rd FAW, pages 261–270, 2009.
30. R. Zhao and N. Zhang. A more efficient closest string algorithm. In Proc. of 2nd BICoB, pages 210–215, 2010.