Title: Fast Algorithms for the Density Finding Problem

D. T. Lee, Tien-Ching Lin and Hsueh-I Lu
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Email: {dtlee,kero,hil}@csie.ntu.edu.tw
1 Also with Institute of Information Science, Academia Sinica, Nankang, Taipei, Taiwan

Abstract: We study the problem of finding a specific density subsequence of a sequence arising from the analysis of biomolecular sequences. Given a sequence $A = (a_1, w_1), (a_2, w_2),\ldots, (a_n, w_n)$ of $n$ ordered pairs $(a_i,w_i)$ of real numbers $a_i$ and width $w_i > 0$ for each $1 \le i \le n$, two nonnegative real numbers $\ell$, $u$ with $\ell \leq u$ and a real number $\delta$, the {\sc Density Finding Problem} is to find the consecutive subsequence $A(i^*,j^*)$ over all $O(n^2)$ consecutive subsequences $A(i,j)$ with width constraint satisfying $\ell \leq w(i,j) = \sum_{r=i}^{j} w_r \leq u$ such that its density $d(i^*,j^*) = \sum_{r=i^*}^{j^*} a_r / w(i^*,j^*)$ is closest to $\delta$. The extensively studied {\sc Maximum-Density Segment Problem} is a special case of the {\sc Density Finding Problem} with $\delta = \infty$. We show that the {\sc Density Finding Problem} has a lower bound $\Omega(n \log n)$ in the algebraic decision tree model of computation. We give an algorithm for the {\sc Density Finding Problem} that runs in optimal $O(n \log n)$ time and $O(n \log n)$ space for the case when there is no upper bound on the width of the sequence, i.e., $u = w(1,n)$. For the general case, we give an algorithm that runs in $O(n \log^{2} m)$ time and $O(n + m \log m)$ space, where $m = \min\{\frac{u-\ell}{w_{\min}}, n\}$ and $w_{\min} = \min_{r=1}^{n} w_r$. As a byproduct, we give another $O(n)$ time and space algorithm for the {\sc Maximum-Density Segment Problem}.

Keywords: maximum-density segment problem, density finding problem, slope selection problem, convex hull, computational geometry, GC content, DNA sequence, bioinformatics.
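To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm of the paper: it simply enumerates all $O(n^2)$ consecutive subsequences using prefix sums, so it runs in quadratic time and serves only as a reference for the definition. The function name `density_finding_bruteforce` and its parameter names are hypothetical, indices are 1-based to match the notation above, and the sketch assumes a finite $\delta$ (the $\delta = \infty$ case of the Maximum-Density Segment Problem would need a separate comparison).

```python
from itertools import accumulate


def density_finding_bruteforce(a, w, lo, hi, delta):
    """Illustrative O(n^2) reference, not the paper's algorithm.

    a, w  : sequences of values a_r and positive widths w_r
    lo, hi: width bounds (ell and u in the problem statement)
    delta : finite target density
    Returns (i, j, density) with 1-based inclusive indices,
    or None if no subsequence satisfies lo <= w(i, j) <= hi.
    """
    n = len(a)
    # Prefix sums: pa[k] = a_1 + ... + a_k, pw[k] = w_1 + ... + w_k.
    pa = [0.0, *accumulate(a)]
    pw = [0.0, *accumulate(w)]
    best = None
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            width = pw[j] - pw[i - 1]
            if width > hi:
                # Widths are positive, so extending j only widens the segment.
                break
            if width < lo:
                continue
            d = (pa[j] - pa[i - 1]) / width
            if best is None or abs(d - delta) < abs(best[2] - delta):
                best = (i, j, d)
    return best
```

For example, with `a = [1, 4, 2, 8]`, unit widths, `lo = 2`, `hi = 3` and `delta = 3.0`, the feasible segments have densities 2.5, 7/3, 3.0, 14/3 and 5.0, and the call returns `(2, 3, 3.0)`.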