The Bounded Search Tree Algorithm for the
Closest String Problem has Quadratic
Smoothed Complexity
Christina Boucher
Department of Computer Science and Engineering
University of California, San Diego
[email protected]
Abstract. Given a set S of n strings, each of length $\ell$, and a nonnegative value d, we define a center string as a string of length $\ell$ that
has Hamming distance at most d from each string in S. The Closest
String problem aims to determine whether there exists a center string
for a given set of strings S and input parameters n, $\ell$, and d. When n is
relatively large with respect to $\ell$, the basic majority algorithm solves
the Closest String problem efficiently, and the problem can also be
solved efficiently when either n, $\ell$ or d is reasonably small [12]. Hence,
the only case for which there is no known efficient algorithm is when n is
between $\log \ell / \log\log \ell$ and $\log \ell$. Using smoothed analysis, we prove that
such Closest String instances can be solved efficiently by the
$O(n\ell + nd \cdot d^d)$-time algorithm by Gramm et al. [13]. In particular, we show that
for any given Closest String instance I, the expected running time of
this algorithm on a small perturbation of I is $O(n\ell + nd \cdot d^{2+o(1)})$.
1 Introduction
Finding similar regions in multiple DNA, RNA, or protein sequences plays an important role in many applications, including universal PCR primer design [7, 16,
19, 26], genetic probe design [16], antisense drug design [6, 16], finding transcription factor binding sites in genomic data [28], determining an unbiased consensus
of a protein family [4], and motif recognition [16, 24, 25]. The Closest String
problem formalizes this task of finding a common pattern in an input set of
strings and can be defined as follows:
Input: A set of n length-$\ell$ strings $S = \{s_1, \ldots, s_n\}$ over a finite alphabet Σ and
a nonnegative integer d.
Question: Find a string s of length $\ell$ such that the Hamming distance from s to
every string $s_i$ in S is at most d.
We refer to s as the center string and let d(x, y) be the Hamming distance between strings x and y. The optimization version of this problem tries to minimize
the parameter d.
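As a concrete reading of the definition above, the following Python sketch computes the Hamming distance and tests whether a candidate is a center string. The function names `hamming`, `is_center` and `radius` are illustrative choices and do not appear in the original text.

```python
from typing import Sequence


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(x, y))


def is_center(s: str, strings: Sequence[str], d: int) -> bool:
    """True if s is within Hamming distance d of every input string."""
    return all(hamming(s, t) <= d for t in strings)


def radius(s: str, strings: Sequence[str]) -> int:
    """Smallest d for which s is a center string (the optimization view)."""
    return max(hamming(s, t) for t in strings)
```

For example, `is_center("AAAA", ["AAAT", "ATAA", "AAAA"], 1)` holds because each input string differs from `"AAAA"` in at most one position.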
The Closest String problem was first introduced and studied in the context of bioinformatics by Lanctot et al. [16]. Frances and Litman [11] showed the
problem to be NP-complete even for the special case when the input contains
only binary strings, implying there is unlikely to be a polynomial-time algorithm
for solving this problem unless P = NP. Since its introduction, efficient approximation algorithms and exact heuristics for the Closest String problem have
been thoroughly considered [9, 10, 13, 16, 17, 21]. Most recently, Hufsky et al. [14]
introduced a data reduction technique that allows instances that do not have a
solution to be filtered out, and incorporated this preprocessing step into the
algorithm of Gramm et al. [13].
One approach to investigating the computational intractability of the Closest String problem is to consider its parameterized complexity, which aims
to classify computationally hard problems according to their inherent difficulty
with respect to a subset of the input parameters. If it is solvable by an algorithm whose running time is polynomial in the input size and exponential in
parameters that typically remain small then it can still be considered tractable
in some practical sense. A problem ϕ is said to be fixed-parameter tractable with
respect to parameter k if there exists an algorithm that solves ϕ in $f(k) \cdot n^{O(1)}$
time, where f is a function of k that is independent of n [8]. Gramm et al. [13]
proved that Closest String is fixed-parameter tractable with respect to the
parameter d by giving an $O(n\ell + nd \cdot d^d)$-time algorithm that is based on the
bounded search tree paradigm.
It has been previously shown that when the number of strings is significantly
large with respect to $\ell$ (namely, whenever $2^n > \ell$) then the basic majority algorithm, which returns a string that contains the majority symbol at each position
with ties broken arbitrarily, works well in practice. Also, there exist efficient solutions for the Closest String problem when either n, $\ell$ or d is reasonably
small [12]. The only case for which there is no known efficient algorithm is when
n is between $\log \ell / \log\log \ell$ and $\log \ell$. Ma and Sun [21] state: “The instances with
d in this range seem to be the hardest instances of the closest string problem.
However, because the fixed-parameter algorithm has polynomial (although with
high degree) running time on these instances, a proof for the hardness of these
instances seem to be difficult too.”
We initiate the study of the smoothed complexity of a slightly modified version of the algorithm by Gramm et al. [13], and demonstrate that a more careful
analysis of this algorithm reveals that it is efficient for
the “hardest” Closest String instances where n is between $\log \ell / \log\log \ell$ and
$\log \ell$. Our analysis gives an analytical reason as to why this algorithm performs
well in practice. We introduce a perturbation model for the Closest String
problem, and prove that the expected size of the search tree of the algorithm of
Gramm et al. [13] on these smoothed instances is at most $d^{2+o(1)}$, hence resolving an open problem that was suggested by Gramm et al. [13], and Ma and Sun
[21].
1.1 Related Work
Gramm et al. [13] proved that the Closest String problem is fixed-parameter
tractable when parameterized by n, and when parameterized by d. More recently,
Ma and Sun gave an $O(n|\Sigma|^{O(d)})$-time algorithm, which is a fixed-parameter algorithm in parameters d and Σ [21]. Chen et al. [5], Wang and Zhu [29], and
Zhao and Zhang [30] improved upon the fixed-parameter tractable result of Ma
and Sun [21]. Lokshtanov et al. [18] gave a lower bound for the time complexity
for the Closest String problem with respect to d. Another approach to investigate the tractability of this NP-complete problem is to consider how well the
Closest String problem can be approximated in polynomial-time. Lanctot et
al. [16] gave a polynomial-time algorithm that achieves a $\frac{4}{3} + o(1)$ approximation
guarantee. Li et al. [17], Andoni et al. [1] and Ma and Sun [21] each proved
PTAS results for this problem.
Smoothed analysis was introduced as an intermediate measure between worst-case and average-case analysis and is used to explain the phenomenon that many
algorithms with poor worst-case behavior efficiently find good solutions
in practice. It works by showing that the worst-case instances are fragile to
small change; slightly perturbing a worst-case instance destroys the property of
it being worst-case [27]. The smoothed complexity of other string and sequence
problems has been considered by Andoni and Krauthgamer [2], Banderier et
al. [3], Manthey and Reischuk [23], and Ma [20]. Andoni and Krauthgamer [2]
studied the smoothed complexity of sequence alignment by the use of a novel
model of edit distance; their results demonstrate the efficiency of several tools
used for sequence alignment, most notably PatternHunter [22]. Manthey and
Reischuk gave several results considering the smoothed analysis of binary search
trees [23]. Ma demonstrated that a simple greedy algorithm runs efficiently in
practice for Shortest Common Superstring [20], a problem that has applications to string compression and DNA sequence assembly.
1.2 Preliminaries
Let s be a string over the alphabet Σ. We denote the length of s as |s|, and
the jth letter of s as s[j]. Hence, s = s[1]s[2] . . . s[|s|]. It will be convenient to
consider a set of strings $S = \{s_1, \ldots, s_n\}$, each of which has length $\ell$, as an $n \times \ell$
matrix. Then we refer to the ith column as the vector $c_i = [s_1[i], \ldots, s_n[i]]^T$ in
the matrix representation of S.
We refer to a majority string for S as the length-` string containing the letter
that occurs most often at each position; this string is not necessarily unique. The
following fact, which is easily proved, is used in Section 3.
Fact 1 Let I = (S, d) be a Closest String instance and $s_{maj}$ be any majority
string for S; then $d(s^*, s_{maj}) \le 2d$ for any center string $s^*$ for S.
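Fact 1 can be checked exhaustively on tiny instances. The sketch below is illustrative only: the helper names are assumptions, ties in a column are broken by first occurrence (so only one of the possible majority strings is tested), and the enumeration of all $|\Sigma|^\ell$ candidates is feasible only for very short strings.

```python
from collections import Counter
from itertools import product
from typing import Iterator, Sequence


def hamming(x: str, y: str) -> int:
    return sum(a != b for a, b in zip(x, y))


def majority_string(strings: Sequence[str]) -> str:
    """Most frequent symbol per column; ties go to the first symbol seen."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*strings))


def center_strings(strings: Sequence[str], d: int, alphabet: str) -> Iterator[str]:
    """Brute-force enumeration of every center string (tiny inputs only)."""
    for letters in product(alphabet, repeat=len(strings[0])):
        s = "".join(letters)
        if all(hamming(s, t) <= d for t in strings):
            yield s


def fact1_holds(strings: Sequence[str], d: int, alphabet: str) -> bool:
    """Check d(s*, s_maj) <= 2d for every center string s* of the instance."""
    s_maj = majority_string(strings)
    return all(hamming(c, s_maj) <= 2 * d for c in center_strings(strings, d, alphabet))
```

The bound follows from a counting argument: at every position where $s_{maj}$ and $s^*$ differ, at least half of the input strings disagree with $s^*$, while the total number of disagreements with $s^*$ over all n strings is at most nd.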
Given functions f and g of a natural number variable x, the notation $f \sim g$
($x \to \infty$) is used to express that
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = 1 \]
and f is an asymptotic estimation of g (for relatively large values of x). The
following asymptotic estimation is used in our analysis.
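Fact 2 For fixed j > 0 the following asymptotic estimation exists:
\[ \sum_{i=0}^{2d} \binom{2i+j}{i} \left(\frac{\ell-1}{\ell}\right)^{i} \left(\frac{1}{\ell}\right)^{i+j} \sim \left(\frac{1}{\ell-1}\right)^{2d}. \]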
Given a Closest String instance I = (S, d) that has at least one center
string we can assume, without loss of generality, that $0^\ell$ is a center string; any
instance that has a center string can be transformed to an equivalent instance
where $0^\ell$ is a center string [12]. Hence, for the remainder of this paper we assume
that any instance that has a center string has $0^\ell$ as a center string.
2 Bounded Search Tree Algorithm
The following algorithm, due to Gramm et al. [13], applies a well-known bounded
search tree paradigm to prove that the Closest String problem can be solved
in linear time when parameterized by d.
Bounded Search Tree Algorithm
Input: A Closest String instance I = (S, d), a candidate string s, and a parameter ∆d.
Output: A center string s if it exists, and “not found” otherwise.
If ∆d < 0, then return “not found”
If d(s, s_i) ≤ d for all i ∈ {1, . . . , n}, then return s
Choose i ∈ {1, . . . , n} such that d(s, s_i) > d.
P = {p | s[p] ≠ s_i[p]};
Choose any P′ ⊆ P with |P′| = d + 1.
For each position p ∈ P′
    Let s′ = s and set s′[p] = s_i[p]
    s_ret = Bounded Search Tree Algorithm (s′, ∆d − 1)
    If s_ret ≠ “not found”, then return s_ret
Return “not found”
The parameter ∆d is initialized to be equal to d. Since every recursive call
decreases ∆d by one and the algorithm halts when ∆d < 0, the search tree
has height at most d. At each recursive step, if the candidate string s is not a
center string then it is augmented at one position as follows: a string $s_i$ is chosen
uniformly at random from the set of strings that have distance greater than d
from s, and s is changed so that it is equal to $s_i$ at one of the positions where s
and $s_i$ disagree. This yields an upper bound of $(d + 1)^d$ on the search tree size.
Gramm et al. [13] initialize the candidate string s to be a string from S chosen
uniformly at random. We consider a slight modification where the candidate
string is initialized to be a majority string. As stated in Fact 1, any majority
string has distance at most 2d from the center string. The analysis of Gramm et
al. [13] concerning the running time and correctness of the algorithm holds for
this modification, and yields a worst-case running time of $O(n\ell + nd \cdot d^{2d})$. Hence, the following
theorem is a trivial extension to the worst-case analysis by Gramm et al. [13]
and will be used in our smoothed analysis of Bounded Search Tree Algorithm.
Theorem 1. “Bounded Search Tree Algorithm” solves the Closest String
problem in $O(n\ell + nd \cdot d^{2d})$ time.
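The recursion described in this section can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation, and it rests on three assumptions: the first too-distant string is used instead of a uniformly random one, the first d + 1 mismatch positions are taken as P′, and the budget is set to 2d when starting from a majority string (motivated by Fact 1) and to d when starting from an input string.

```python
from collections import Counter
from typing import Optional, Sequence


def hamming(x: str, y: str) -> int:
    return sum(a != b for a, b in zip(x, y))


def majority_string(strings: Sequence[str]) -> str:
    # Ties are broken by first occurrence in the column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*strings))


def bounded_search(strings: Sequence[str], d: int, s: str, delta: int) -> Optional[str]:
    """Return a center string reachable from s with at most delta changes, else None."""
    if delta < 0:
        return None
    # Assumption: take the first string that is too far instead of a random one.
    far = next((t for t in strings if hamming(s, t) > d), None)
    if far is None:
        return s  # s is within distance d of every string
    # Any d + 1 mismatch positions suffice: a center string differs from `far`
    # in at most d positions, so it agrees with `far` on at least one of them.
    positions = [p for p in range(len(s)) if s[p] != far[p]][: d + 1]
    for p in positions:
        candidate = s[:p] + far[p] + s[p + 1:]
        found = bounded_search(strings, d, candidate, delta - 1)
        if found is not None:
            return found
    return None


def closest_string(strings: Sequence[str], d: int, use_majority: bool = True) -> Optional[str]:
    if use_majority:
        # Assumed budget 2d, since a majority string may be 2d away from a center.
        return bounded_search(strings, d, majority_string(strings), 2 * d)
    return bounded_search(strings, d, strings[0], d)
```

Calling `closest_string(["0011", "1100", "0000"], 2, use_majority=False)` explores a few branches before returning a string within distance 2 of all three inputs, while an instance such as `["0000", "1111"]` with d = 1 yields `None`.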
3 Smoothed Analysis
3.1 Perturbation of Closest String Instances
Our model applies to problems defined on strings. It is parameterized by a
probability p, where 0 ≤ p ≤ 1, and is defined as follows. Given a length-$\ell$
string $s[1]s[2] \ldots s[\ell]$, each element is selected (independently) with probability
p. Let x be the number of selected elements (on average $x = p\ell$). Replace each
of the x selected positions with a symbol chosen (uniformly at random) from the
|Σ| symbols of Σ. Given a Closest String instance I = (S, d), we obtain
a perturbed instance of I by perturbing each string in S with probability p > 0
as previously described. We denote a perturbed instance as I′ = (S′, d′), where
d′ = d and S′ contains the perturbed strings of S.
This perturbation model has the effect of naturally adding noise to the input.
Note that the perturbation model may have the effect of converting an instance
that has a center string to one that does not; however, this remains a valid
model for smoothed analysis. For example, the model used by Spielman and
Teng allows perturbation from a feasible linear program to an infeasible linear
program [27].
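The perturbation model of this subsection is straightforward to simulate. The following Python sketch is one possible reading of it; the function names and the explicit `rng` argument are assumptions made for reproducibility. The replacement symbol is drawn from all of Σ, so a selected position may keep its original symbol.

```python
import random
from typing import List, Optional, Sequence, Tuple


def perturb_string(s: str, p: float, alphabet: str, rng: random.Random) -> str:
    """Select each position independently with probability p and resample it."""
    return "".join(rng.choice(alphabet) if rng.random() < p else ch for ch in s)


def perturb_instance(
    strings: Sequence[str],
    d: int,
    p: float,
    alphabet: str,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], int]:
    """Return the perturbed instance (S', d') with d' = d."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    rng = rng or random.Random()
    return [perturb_string(s, p, alphabet, rng) for s in strings], d
```

With p = 0 the instance is returned unchanged, and with p = 1 every position is resampled uniformly from the alphabet.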
3.2 Good Columns and Simple Instances
In this subsection we classify the instances for which the Bounded Search Tree
Algorithm performs efficiently, and bound the probability that an instance has
this classification. This classification is used in our smoothed analysis.
Definition 1. Let I = (S, d) be a Closest String instance, and $S_{maj}$ be the
set of majority strings for S. We define S as simple if any string in $S_{maj}$ has
Hamming distance at most d from all strings in S.
Closest String instances that are simple have the property that Bounded
Search Tree Algorithm halts immediately with a center string. In the remainder
of this subsection we aim to bound the probability that an instance is simple. The
next definitions are used to simplify the discussion of the analysis that bounds
this probability.
Recall our assumption that any instance that contains a center string has $0^\ell$
as a center string. Given an instance I = (S, d) with a center string, we refer to a
column of S as good if it contains more zeros than nonzeros and thus guarantees
that the majority symbol is equal to the center string at that position; all other
columns are bad.
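Both notions can be tested directly on a small instance. The sketch below is illustrative: the helper names are assumptions, Definition 1 is read as requiring every majority string to be a center string, and the symbol `"0"` stands for the zero symbol of the assumed center string $0^\ell$.

```python
from collections import Counter
from typing import Sequence, Set


def top_symbols(column: Sequence[str]) -> Set[str]:
    """All symbols attaining the maximum count in a column (ties included)."""
    counts = Counter(column)
    best = max(counts.values())
    return {sym for sym, c in counts.items() if c == best}


def is_simple(strings: Sequence[str], d: int) -> bool:
    """True if every majority string is within distance d of every input string."""
    tops = [top_symbols(col) for col in zip(*strings)]
    for s in strings:
        # Worst case over tie-breaking: a column counts as a mismatch unless
        # the symbol of s is the unique most frequent symbol there.
        worst = sum(1 for ch, t in zip(s, tops) if t != {ch})
        if worst > d:
            return False
    return True


def count_bad_columns(strings: Sequence[str], zero: str = "0") -> int:
    """Columns that do not contain strictly more zeros than nonzeros."""
    n = len(strings)
    return sum(1 for col in zip(*strings) if 2 * col.count(zero) <= n)
```

For instance, `["0000", "0001", "0010"]` with d = 1 is simple and has no bad column, whereas `["00", "11"]` with d = 1 is not simple because both columns are tied.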
Lemma 1. Let I′ = (S′, d′) be the perturbed instance of I = (S, d) with probability p. Then the probability that I′ is not simple is at least
\[ 1 - \left(1 - (q(1-q))^{n/2-1}\right)^{\ell} \]
and at most
\[ 1 - \left(1 - \binom{n}{n/2+1} (q(1-q))^{n/2+1}\right)^{\ell}, \]
where $0 \le q \le \frac{d}{\ell}(1 - 2p) + p$.
Proof. Let $s'_i \in S'$, $s^*$ be a closest string for S, and q denote the probability that
$s'_i[j] = 0$ for some $0 \le j \le \ell$. It follows that
\[ q = \frac{d(s_i, s^*)}{\ell}(1 - p) + \left(1 - \frac{d(s_i, s^*)}{\ell}\right)p = \frac{d(s_i, s^*)}{\ell}(1 - 2p) + p, \]
which is at most $\frac{d}{\ell}(1 - 2p) + p$. We first calculate the probability that a column
is good when n is odd. Let $X_{i,j}$ be a binary random variable that is equal to 1 if
$s_i$ is equal to the value of the center string (i.e. equal to 0) at the jth position.
For a given column j we let the number of zeros be $X_j = \sum_i X_{i,j}$.
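\[ \Pr[X_j \ge \lfloor n/2 \rfloor + 1] = 1 - \Pr[X_j \le \lfloor n/2 \rfloor] = 1 - (1 - q)^n \sum_{i=0}^{\lfloor n/2 \rfloor} \binom{n}{i} \left(\frac{q}{1-q}\right)^{i} \]
We focus on bounding $(1-q)^n \sum_{i=0}^{\lfloor n/2 \rfloor} \binom{n}{i} \left(\frac{q}{1-q}\right)^{i}$. Note that $\binom{n}{i}$ is unimodal,
peaking at $i = \lfloor n/2 \rfloor$ on the range $0 \le i \le \lfloor n/2 \rfloor$. We have that: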
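\[ S = \sum_{i=0}^{\lfloor n/2 \rfloor} \binom{n}{i} \left(\frac{q}{1-q}\right)^{i} \ge \binom{n}{\lfloor n/2 \rfloor - 1} \left(\frac{q}{1-q}\right)^{\lfloor n/2 \rfloor - 1} \]
and similarly,
\begin{align*}
S = \sum_{i=0}^{\lfloor n/2 \rfloor} \binom{n}{i} \left(\frac{q}{1-q}\right)^{i}
  &\le \binom{n}{\lfloor n/2 \rfloor} \sum_{i=0}^{\lfloor n/2 \rfloor} \left(\frac{q}{1-q}\right)^{i} && \text{[unimodality]} \\
  &\le \binom{n}{\lfloor n/2 \rfloor} \left(\frac{q}{1-q}\right)^{\lfloor n/2 \rfloor} && \text{[geometric series]}
\end{align*}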
The sum S is equal to the first term up to a small multiplicative error and
therefore, we obtain the following:
\[ \Pr[X_j > \lfloor n/2 \rfloor] \approx 1 - \binom{n}{\lfloor n/2 \rfloor} q^{\lfloor n/2 \rfloor} (1 - q)^{\lfloor n/2 \rfloor} \]
Therefore, we obtain the following bounds:
\[ 1 - \binom{n}{n/2 - 1} (q(1-q))^{n/2-1} \le \Pr[X_j > \lfloor n/2 \rfloor] \le 1 - (q(1-q))^{n/2+1} \]
Using the previous inequality we can bound the probability that I′ is not
simple by determining the probability that I′ contains at least one bad column.
Thus, we get
\[ 1 - \left(1 - (q(1-q))^{n/2+1}\right)^{\ell} \le \Pr[I' \text{ is not simple}] \]
and
\[ \Pr[I' \text{ is not simple}] \le 1 - \left(1 - \binom{n}{n/2 - 1} (q(1-q))^{n/2+1}\right)^{\ell}. \]
Similarly, these bounds exist for the case when n is even.
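The quantities in this proof can be evaluated numerically before any bounding is applied. The Python sketch below computes the exact binomial expression for $\Pr[X_j \ge \lfloor n/2 \rfloor + 1]$ as displayed in the proof and the resulting probability of at least one bad column. The function names are assumptions, q is used as the per-entry probability exactly as in the displayed formula, and the ℓ columns are assumed to be independent and identically distributed.

```python
from math import comb


def q_upper_bound(d: int, ell: int, p: float) -> float:
    """The bound (d / ell) * (1 - 2p) + p on q stated in Lemma 1."""
    return d / ell * (1 - 2 * p) + p


def prob_good_column(n: int, q: float) -> float:
    """1 - sum_{i <= floor(n/2)} C(n, i) q^i (1 - q)^(n - i)."""
    tail = sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(n // 2 + 1))
    return 1 - tail


def prob_some_bad_column(n: int, q: float, ell: int) -> float:
    """Probability that at least one of ell independent columns is not good."""
    return 1 - prob_good_column(n, q) ** ell
```

For n = 3 and q = 0.5 the column probability is exactly 0.5, so with ℓ = 2 columns the probability of at least one bad column is 0.75.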
3.3 Smoothed Height of the Bounded Search Tree
As previously discussed, there are $O(d^d)$ possible paths in a bounded search tree,
denoted as T, corresponding to the solutions traversed by the Bounded Search
Tree Algorithm. We now bound the size of T for perturbed instances.
Let $P_i$ be the indicator variable describing whether the ith path in T results
in a center string, i.e. $P_i = 1$ if the ith path leads to a center string and $P_i = 0$
otherwise. The algorithm halts when $P_i = 1$. Let $\mathcal{P}$ be the number of paths
considered until $P_i = 1$ and the algorithm halts.
Lemma 2. Let I′ = (S′, d′) be a perturbed instance with probability $0 < p \le \frac{1}{2}$.
If I′ = (S′, d′) is not a simple instance, then for sufficiently large $\ell$, constant
c > 0, and when n is between $\log \ell / \log\log \ell$ and $\log \ell$, we have:
\[ \Pr[\mathcal{P} \ge d^{dpc}] \le \frac{1}{d^{dpc}}. \]
Proof. If I′ = (S′, d′) does not have a center string then Bounded Search Tree
Algorithm will always return “not found”; otherwise, $P_i = 1$ with some probability.
It follows that the expected number of paths that need to be considered is
$1/\Pr[P_i = 1]$.
We now calculate $\Pr[P_i = 1]$. If the candidate string s is not equal to $0^\ell$ (i.e.
a center string) then there exists at least one position of s that can be augmented
so that $d(s, 0^\ell)$ decreases by one; the probability of this occurring is at least $1/\ell$.
Let $Y_k \in \{0, 1, \ldots, d'\}$ be the random variable that corresponds to the Hamming
distance between s and $0^\ell$, where k is the number of recursive iterations of
the algorithm. The process $Y_0, Y_1, Y_2, \ldots$ is a Markov chain with a barrier at
state d′ and contains varying time and state dependent transfer probabilities.
This process is overly complicated and we instead analyze the following Markov
chain: $Z_0, Z_1, \ldots$, where $Z_k$ is the random variable that is equal to the state
number after k recursive steps and there exist infinitely many states. Initially,
this Markov chain is started like the stochastic process above, i.e. $Z_0 = Y_0$. We
let $Z_{k+1} = Y_k - 1$ if the process decreases the Hamming distance between s and
$0^\ell$ by one; otherwise $Z_{k+1} = Y_{k+1}$. After the algorithm halts, we continue with
the same transfer probabilities. We can show by induction on k that $Y_k \le Z_k$
for all k and therefore, $\Pr[P_i = 1]$ is at least $\Pr[\exists t \le d : Z_t = 0]$.
We made the assumption that S′ contains only one center string; however,
this assumption is not needed. The random walk may find another center string
while not in the terminating state, but this possibility only increases the probability that the algorithm terminates.
Given that the Markov chain starts in state k it can reach a halting state in at
least k steps by making transitions through the states k − 1, k − 2, . . ., 1, 0. The
probability of this happening is $(1/\ell)^k$. We now incorporate the possibility that
several steps in the “wrong” direction are made in the analysis; “wrong” steps
refer to when the candidate string is altered so that the distance between the
candidate string and the center string increases. Suppose w steps in the Markov
chain are taken in the wrong direction; then k + w steps are needed in the “correct”
direction, and therefore, the halting state can be reached in 2w + k steps. Let
q(w, k) be the probability that $Z_{2w+k} = 0$ such that the halting state is not
reached in any earlier step, under the condition that the Markov chain started in
state k. More formally,
\[ q(w, k) = \Pr[Z_{2w+k} = 0 \text{ and } Z_\alpha > 0 \ \forall\, \alpha < 2w + k \mid Z_0 = k]. \]
Clearly $q(0, k) = (1/\ell)^k$, and in the general case q(w, k) is $((\ell - 1)/\ell)^w (1/\ell)^{w+k}$
times the number of ways of arranging w wrong steps and w + k correct steps
such that the sequence starts in state k, ends in the halting state and does not
reach this state before the last step.
By applying the ballot theorem [15] we can deduce that there are $\binom{2w+k}{w} \frac{k}{2w+k}$
possible arrangements of these w wrong steps and w + k correct steps, and the
above probability is at least
\[ \binom{2w+k}{w} \frac{k}{2w+k} \left(\frac{\ell-1}{\ell}\right)^{w} \frac{1}{\ell^{w+k}}. \]
This expression is not defined in the case w = k = 0; however, it is equal to 1 in
this case.
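The counting step can be checked by brute force for small w and k. The sketch below enumerates all sequences of w wrong (+1) and w + k correct (−1) steps that start in state k and reach state 0 for the first time at the last step, and compares the count with the standard ballot-theorem value $\frac{k}{2w+k}\binom{2w+k}{w}$. The function names, and the use of $(\ell-1)/\ell$ and $1/\ell$ as the per-step probabilities in `q_lower_bound`, are assumptions taken from the surrounding argument.

```python
from itertools import product
from math import comb


def count_first_passages(w: int, k: int) -> int:
    """Count step sequences with w up-steps that first hit 0 at step 2w + k."""
    n = 2 * w + k
    total = 0
    for steps in product((1, -1), repeat=n):
        if steps.count(1) != w:
            continue
        state, valid = k, True
        for idx, step in enumerate(steps):
            state += step
            if state == 0 and idx < n - 1:
                valid = False  # reached the halting state too early
                break
        if valid and state == 0:
            total += 1
    return total


def ballot_count(w: int, k: int) -> int:
    """Closed form k / (2w + k) * C(2w + k, w), valid for k >= 1."""
    return k * comb(2 * w + k, w) // (2 * w + k)


def q_lower_bound(w: int, k: int, ell: int) -> float:
    """Number of arrangements times ((ell-1)/ell)^w * (1/ell)^(w+k)."""
    return ballot_count(w, k) * ((ell - 1) / ell) ** w * (1 / ell) ** (w + k)
```

For w = 0 the count is 1 for every k ≥ 1, which matches $q(0, k) = (1/\ell)^k$.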
The probability that $P_i = 1$ at the ith path is dependent on the starting
position of the Markov chain, which is equal to the number of bad columns. Let
$X_{bad}$ be the number of bad columns in S′, which is at most 2d (by Fact 1).
Hence, we get the following:
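\begin{align*}
\Pr[P_i = 1] &\ge \sum_{k=1}^{2d} \Pr[X_{bad} = k] \sum_{w=0}^{\frac{1}{2}(i-k)} q(w, k) \\
&\ge \left(\Pr[X_{bad} \le 2d] - \Pr[X_{bad} = 0]\right) \sum_{k=1}^{2d} \sum_{w=0}^{\frac{1}{2}(i-k)} q(w, k) \\
&\ge \Pr[I' \text{ is not simple}] \sum_{k=1}^{2d} \sum_{w=0}^{\frac{1}{2}(i-k)} q(w, k) \\
&\ge \left(1 - \left(1 - (q(1-q))^{n/2+1}\right)^{\ell}\right) \sum_{k=1}^{2d} \sum_{w=0}^{\frac{1}{2}(i-k)} q(w, k) && \text{[Lemma 1]}
\end{align*}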
We now aim to find a bound on q(w, k).
\[ \Pr[P_i = 1] \ge \left(1 - \left(1 - (q(1-q))^{n/2+1}\right)^{\ell}\right) \sum_{w=0}^{\frac{1}{2}(i-1)} \sum_{k=1}^{2d} q(w, k) \]
1
2d+1
(i−1) ` 2 X
1
1 − 1 − (q(1 − q))
1−`
w=0
id+1
1
` 1 − `−1
n/2+1
≥ 1 − 1 − (q(1 − q))
2d+1
1
1 − `−1
n/2+1
[Fact 2]
Hence, for sufficiently large $\ell$ we have $\Pr[P_i = 1] = 1 - \left(1 - (q(1-q))^{n/2+1}\right)^{\ell}$
and it follows that:
\[ E[\mathcal{P}] = \frac{1}{1 - \left(1 - (q(1-q))^{n/2+1}\right)^{\ell}} \]
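and by Markov's inequality, for any c > 0,
\[ \Pr[\mathcal{P} \ge d^{dcp}] \le \frac{1}{d^{dcp} \left(1 - \left(1 - (q(1-q))^{n/2+1}\right)^{\ell}\right)} \le \frac{1}{d^{dcp} \left(1 - \exp\left(-\ell/(q(1-q))^{n/2+1}\right)\right)}. \]
Hence, $\Pr[\mathcal{P} \ge d^{dcp}]$ is bounded by $\frac{1}{d^{dcp}}$ for sufficiently large $\ell$ and when n is
between $\log \ell / \log\log \ell$ and $\log \ell$.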
The following is our main theorem, which provides an upper bound on the expected number of paths that need to be considered before a center string is
found. An important aspect of this result is the small perturbation probability required in comparison to the instance size; the expected number of positions
to change in each string is $O(\log \ell)$.
Theorem 2. For some small $\epsilon > 0$ and perturbation probability $0 \le p \le \log \ell / \ell$,
the expected running time of “Bounded Search Tree Algorithm” on the perturbed
instances is $O(n\ell + nd \cdot d^{2+\epsilon})$ when n is between $\log \ell / \log\log \ell$ and $\log \ell$, and $\ell$
is sufficiently large.
Proof. Let $\mathcal{P}'$ be the number of paths considered until a center string is found
for a perturbed instance. There are $O(d^{2d})$ possible paths in the search tree
corresponding to Bounded Search Tree Algorithm. The size of the bounded search
tree is equal to zero for simple instances and therefore, we are only required to
consider instances that are not simple. For instances that are not simple but
satisfy the conditions of Lemma 2, we use this lemma with $p \le \frac{2+\epsilon}{dc}$, where c > 0,
to bound the size of the search tree. Lemma 1, which describes the probability
that an instance is not simple, is also used in the following analysis.
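\begin{align*}
E[\mathcal{P}] &\le d^{dcp} \cdot \Pr[I' \text{ is not simple}] + \sum_{i=dcp}^{2d} d^{i} \Pr[\mathcal{P} \ge d^{i}] \\
&\le d^{2+\epsilon} \left(1 - \left(1 - \binom{n}{n/2 - 1} (q(1-q))^{n/2-1}\right)^{\ell}\right) + \sum_{i=2+\epsilon}^{2d} d^{i} \Pr[\mathcal{P} \ge d^{i}]
\end{align*}
For sufficiently large $\ell$ and when n is between $\log \ell / \log\log \ell$ and $\log \ell$, we get:
\[ E[\mathcal{P}] \le d^{2+\epsilon} + \sum_{i=2+\epsilon}^{2d} d^{i} \Pr[\mathcal{P} \ge d^{i}] \]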
Therefore, we have $E[\mathcal{P}] \le d^{2+\epsilon} + d - 2 - \epsilon$. The expected size of the search
tree is at most $o(1) + d^{2+\epsilon}$. It follows from the analysis of Gramm et al. [13], which
demonstrated that each recursive step takes O(nd) time and the preprocessing
takes $O(n\ell)$ time, that Bounded Search Tree Algorithm has expected running time
$O(n\ell + nd \cdot d^{2+\epsilon})$.
We note that we require $\ell$ to be sufficiently large; however, this restriction
is not significant since we require only $\ell \ge 10$. As previously mentioned, the problem
can be solved efficiently when $\ell$ is relatively small (i.e. $\ell \le 10$); even the trivial
algorithm that tries all $|\Sigma|^{\ell}$ strings can be used for these instances.
Acknowledgements
The author would like to thank Professor Bin Ma for his discussions and insights concerning the results presented in this paper and Professor Ming Li for
suggesting this area of study. The author is supported by NSERC Postdoctoral
Fellowship, NSERC Grant OGP0046506, NSERC Grant OGP0048487, Canada
Research Chair program, MITACS, and Premier’s Discovery Award.
References
1. A. Andoni, P. Indyk, and M. Patrascu. On the optimality of the dimensionality
reduction method. In Proc. of FOCS, pages 449–456, 2006.
2. A. Andoni and R. Krauthgamer. The smoothed complexity of edit distance. In
Proc. of ICALP, pages 357–369, 2008.
3. C. Banderier, R. Beier, and K. Mehlhorn. Smoothed analysis of three combinatorial
problems. In Proc. of MFCS, pages 198–207, 2003.
4. A. Ben-Dor, G. Lancia, J. Perone, and R. Ravi. Banishing bias from consensus
strings. In Proc. of 8th CPM, pages 247–261, 1997.
5. Z.-Z. Chen, B. Ma, and L. Wang. A three-string approach to the closest string
problem. In Proc. of 16th COCOON, pages 449–458, 2010.
6. X. Deng, G. Li, Z. Li, B. Ma, and L. Wang. Genetic design of drugs without
side-effects. SIAM Journal on Computing, 32(4):1073–1090, 2003.
7. J. Dopazo, A. Rodríguez, J.C. Sáiz, and F. Sobrino. Design of primers for PCR amplification of highly variable genomes. Computer Applications in the Biosciences,
9:123–125, 1993.
8. R.G. Downey and M.R. Fellows. Parameterized Complexity. Springer, 1999.
9. M.R. Fellows, J. Gramm, and R. Niedermeier. On the parameterized intractability
of closest substring and related problems. In Proc. of 19th STACS, pages 262–
273, 2002.
10. M.R. Fellows, J. Gramm, and R. Niedermeier. On the parameterized intractability
of motif search problems. Combinatorica, 26:141–167, 2006.
11. M. Frances and A. Litman. On covering problems of codes. Theory of Computing
Systems, 30(2):113–119, 1997.
12. J. Gramm, R. Niedermeier, and P. Rossmanith. Exact solutions for closest
string and related problems. In Proc. of the 12th ISAAC, pages 441–453, 2001.
13. J. Gramm, R. Niedermeier, and P. Rossmanith. Fixed-parameter algorithms for
closest string and related problems. Algorithmica, 37(1):25–42, 2003.
14. F. Hufsky, L. Kuchenbecker, K. Jahn, J. Stoye, and S. Böcker. Swiftly computing
center strings. BMC Bioinformatics, 12(106), 2011.
15. T. Konstantopoulos. Ballot theorems revisited. Statistics and Probability Letters,
24:331–338, 1995.
16. J.K. Lanctot, M. Li, B. Ma, S. Wang, and L. Zhang. Distinguishing string selection
problems. Information and Computation, 185(1):41–55, 2003.
17. M. Li, B. Ma, and L. Wang. Finding similar regions in many strings. Journal of
Computer and System Sciences, 65(1):73–96, 2002.
18. D. Lokshtanov, D. Marx, and S. Saurabh. Slightly superexponential parameterized
problems. In Proc. of the 22nd SODA, pages 760–776, 2011.
19. K. Lucas, M. Busch, S. Össinger, and J.A. Thompson. An improved microcomputer program for finding gene- and gene family-specific oligonucleotides suitable
as primers for polymerase chain reactions or as probes. Computer Applications in
the Biosciences, 7:525–529, 1991.
20. B. Ma. Why greedy works for shortest common superstring problem. In Proc. of
CPM, pages 244–254, 2008.
21. B. Ma and X. Sun. More efficient algorithms for closest string and substring
problems. SIAM Journal on Computing, 39:1432–1443, 2009.
22. B. Ma, J. Tromp, and M. Li. PatternHunter: faster and more sensitive homology
search. Bioinformatics, 18(3):440–445, 2002.
23. B. Manthey and R. Reischuk. Smoothed analysis of binary search trees. Theoretical
Computer Science, 3(378):292–315, 2007.
24. G. Pavesi, G. Mauri, and G. Pesole. An algorithm for finding signals of unknown
length in DNA sequences. Bioinformatics, 17:S207–S214, 2001.
25. P. Pevzner and S. Sze. Combinatorial approaches to finding subtle signals in DNA
strings. In Proc. of 8th ISMB, pages 269–278, 2000.
26. V. Proutski and E.C. Holme. Primer master: A new program for the design and
analysis of PCR primers. Computer Applications in the Biosciences, 12:253–255,
1996.
27. D.A. Spielman and S.-H. Teng. Smoothed analysis of algorithms: why the simplex
algorithm usually takes polynomial time. Journal of the ACM, 51:296–305, 2004.
28. M. Tompa et al. Assessing computational tools for the discovery of transcription factor binding sites. Nature Biotechnology, 23(1):137–144, 2005.
29. L. Wang and B. Zhu. Efficient algorithms for the closest string and distinguishing
string selection problems. In Proc. of 3rd FAW, pages 261–270, 2009.
30. R. Zhao and N. Zhang. A more efficient closest string algorithm. In Proc. of 2nd
BICoB, pages 210–215, 2010.