
Title: Fast Algorithms for the Density Finding Problem
D. T. Lee, Tien-Ching Lin, and Hsueh-I Lu
Department of Computer Science and Information Engineering
National Taiwan University
Taipei, Taiwan
Email: {dtlee,kero,hil}@csie.ntu.edu.tw
$^1$ Also with the Institute of Information Science, Academia Sinica, Nankang, Taipei, Taiwan.
Abstract:
We study the problem of finding a subsequence of a specified density in a sequence, a problem arising from the
analysis of biomolecular sequences. Given a sequence $A = (a_1, w_1), (a_2, w_2), \ldots, (a_n, w_n)$ of $n$ ordered
pairs $(a_i, w_i)$ of real numbers $a_i$ and widths $w_i > 0$ for each $1 \le i \le n$, two nonnegative real
numbers $\ell$ and $u$ with $\ell \leq u$, and a real number $\delta$, the {\sc Density Finding Problem} is to find,
among all $O(n^2)$ consecutive subsequences $A(i,j)$ satisfying the width constraint
$\ell \leq w(i,j) = \sum_{r=i}^{j} w_r \leq u$, a consecutive subsequence $A(i^*,j^*)$ whose density
$d(i^*,j^*) = \sum_{r=i^*}^{j^*} a_r / w(i^*,j^*)$ is closest to $\delta$. The extensively studied
{\sc Maximum-Density Segment Problem} is the special case of the {\sc Density Finding Problem} with $\delta = \infty$.
We show that the {\sc Density Finding Problem} has a lower bound of $\Omega(n \log n)$ in the algebraic decision
tree model of computation. For the case in which there is no upper bound on the width of the subsequence, i.e.,
$u = w(1,n)$, we give an algorithm for the {\sc Density Finding Problem} that runs in optimal $O(n \log n)$ time
and uses $O(n \log n)$ space. For the general case, we give an algorithm that runs in $O(n \log^{2} m)$ time and
$O(n + m \log m)$ space, where $m = \min\{\frac{u-\ell}{w_{\min}}, n\}$ and $w_{\min} = \min_{1 \le r \le n} w_r$.
As a byproduct, we give another $O(n)$-time and $O(n)$-space algorithm for the {\sc Maximum-Density Segment Problem}.
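To make the problem statement concrete, the sketch below is a naive $O(n^2)$ baseline that simply enumerates every feasible consecutive subsequence; it is not the $O(n \log n)$ or $O(n \log^2 m)$ algorithm described above. It assumes the input is a Python list of `(a_r, w_r)` pairs, and the function name `density_finding_bruteforce` is hypothetical. With prefix sums $P_k = \sum_{r \le k} a_r$ and $W_k = \sum_{r \le k} w_r$, each density $d(i,j) = (P_j - P_{i-1}) / (W_j - W_{i-1})$ costs $O(1)$ and is exactly the slope of the line through the points $(W_{i-1}, P_{i-1})$ and $(W_j, P_j)$; this geometric reading is presumably what the keywords on slope selection and convex hulls refer to. Passing `delta = math.inf` is treated here as a request for the maximum-density segment.

```python
import math


def density_finding_bruteforce(pairs, ell, u, delta):
    """Naive O(n^2) baseline for the Density Finding Problem (illustrative only).

    pairs: list of (a_r, w_r) with every w_r > 0.
    ell, u: width bounds, ell <= w(i, j) <= u.
    delta: target density; +inf asks for maximum density, -inf for minimum.
    Returns (i, j, d) with 1-based indices i <= j, or None if no
    consecutive subsequence satisfies the width constraint.
    """
    n = len(pairs)
    # Prefix sums: P[k] = a_1 + ... + a_k and W[k] = w_1 + ... + w_k.
    P = [0.0] * (n + 1)
    W = [0.0] * (n + 1)
    for k, (a, w) in enumerate(pairs, start=1):
        P[k] = P[k - 1] + a
        W[k] = W[k - 1] + w

    best, best_key = None, None
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            width = W[j] - W[i - 1]  # w(i, j)
            if not (ell <= width <= u):
                continue
            # d(i, j) is the slope of the line through (W[i-1], P[i-1]) and (W[j], P[j]).
            d = (P[j] - P[i - 1]) / width
            if math.isinf(delta):
                key = -d if delta > 0 else d
            else:
                key = abs(d - delta)
            # Strict "<" keeps the lexicographically smallest (i, j) among ties.
            if best_key is None or key < best_key:
                best, best_key = (i, j, d), key
    return best


if __name__ == "__main__":
    A = [(1, 1), (3, 1), (0, 2), (5, 1)]
    print(density_finding_bruteforce(A, 1, 4, 1.6))       # (3, 4, 1.6666666666666667)
    print(density_finding_bruteforce(A, 1, 4, math.inf))  # (4, 4, 5.0)
```

Such a baseline is mainly useful as a correctness oracle when testing a faster implementation on small random inputs, since it examines all $O(n^2)$ candidates directly.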
Keywords: maximum-density segment problem, density finding problem, slope selection problem, convex hull,
computational geometry, GC content, DNA sequence, bioinformatics.