Approximate NN Queries on Streams with Guaranteed Error/Performance Bounds

Nick Koudas @ AT&T Labs-Research
Beng Chin Ooi, Kian-Lee Tan, Rui Zhang @ National University of Singapore

Problem
• Problem: kNN search.
• Environment: data stream (one scan; memory constraint).
• Approximate solution: e-approximate kNN (ekNN).
• Motivation: applications in which absolute error is preferable or more straightforward.
  IP: 137.132.48.120 137.132.48.121 …
• Two optimization problems:
  – Memory optimization for a given error bound: given an error bound e, use as little memory as possible to answer ekNN queries.
  – Error minimization for a given memory size: given a fixed amount of memory, achieve the best accuracy for ekNN queries.
• Requirements:
  – One-scan algorithm.
  – Satisfies the constraints.
  – Efficient updates and query processing.

A Framework
• Divide space into equal square-shaped cells.
• Maintain at most K points in each cell.
• For any k ≤ K, the absolute error of the kNN distance is bounded by $d_M$, the maximum distance within a cell.
• For Euclidean distance: $d_M = \sqrt{d}/u$, where d is the dimensionality and u is the number of cells each dimension is divided into.

Maintenance of the Points: aDaptive Indexing on Streams by space-filling Curves (DISC)
• Cells are not explicitly maintained, only points.
• Cells are linearized according to the Z-curve.
• The Z-value of the cell is the key of a point.
• Points are maintained in a B*-tree.
• An efficient merge-cell algorithm is possible.

Algorithm: Build Index
• m: the order of the Z-curve; $2^m$ cells in each dimension.
• If e is given: $\sqrt{d}/2^{m_e} \le e$, so $m_e \ge \log_2(\sqrt{d}/e)$. $m_e$ is an integer, so $m_e = \lceil \log_2(\sqrt{d}/e) \rceil$.
• If a memory constraint is given, set a large enough m.
• Build index
  – Initialize m.
  – Read a record P, calculate its Z-value, search the B*-tree, and find Nc: the number of existing points in the cell P belongs to.
  – If Nc < K
    • Insert P into the B*-tree.
  – Else
    • Discard one and insert P.
  – If memory runs out // this only happens for the error minimization problem
    • Merge cells and let m = m - 1.
  – Go back to Step 2 (read next record).

Algorithm: Merge Cells
• General Merge-Cell
  – Applies to any structure.
  – For each new cell, find all the points of the old cells in it, and merge them.
• Bulk Merge-Cell
  – Applies only to DISC.
  – Scan all the leaf pages once.

Algorithm: kNN Search
• W: a window query centered at the center of the cell Q is in, with gradually increasing side length s.
• Find the kNN to Q within W.
  – If the kNN distance is no larger than the distance from Q to the nearest side of W, the search terminates;
  – Else increase s by 1/u.

Experiments

Questions?
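
Sketch: Build Index and Merge Cells
• The index-building and merge-cell steps from the algorithm slides can be illustrated roughly as below. This is a minimal sketch under stated assumptions, not the authors' implementation.
• Assumptions: the data space is the unit hypercube, so a cell at order m has Euclidean diagonal sqrt(d) / 2^m; a plain dict keyed by Z-value stands in for the B*-tree; "discard one" is taken to mean dropping the oldest point in the cell, which the slides do not specify.
• The names order_for_error, z_value and CellIndex are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def order_for_error(e: float, d: int) -> int:
    """Smallest integer m with sqrt(d) / 2**m <= e (unit data space assumed)."""
    return max(0, math.ceil(math.log2(math.sqrt(d) / e)))


def z_value(point, m: int) -> int:
    """Z-value of the order-m cell containing `point` (coordinates in [0, 1])."""
    side = 1 << m
    cells = [min(int(x * side), side - 1) for x in point]
    z = 0
    for bit in range(m - 1, -1, -1):  # most significant bit first
        for c in cells:
            z = (z << 1) | ((c >> bit) & 1)
    return z


class CellIndex:
    """Points grouped by cell Z-value; a dict stands in for the B*-tree."""

    def __init__(self, m: int, d: int, K: int):
        self.m, self.d, self.K = m, d, K
        self.cells = defaultdict(list)

    def insert(self, point) -> None:
        bucket = self.cells[z_value(point, self.m)]
        if len(bucket) >= self.K:
            bucket.pop(0)  # assumption: discard the oldest point in the cell
        bucket.append(point)

    def merge_cells(self) -> None:
        """Lower the order by one; the 2**d sibling cells share the key z >> d."""
        if self.m == 0:
            return  # already a single cell, nothing left to merge
        merged = defaultdict(list)
        for z, pts in self.cells.items():
            merged[z >> self.d].extend(pts)
        self.m -= 1
        # Keep at most K points per merged cell.
        self.cells = defaultdict(list, {z: p[-self.K:] for z, p in merged.items()})
```

• Usage: m = order_for_error(0.1, 2) picks the order for error bound e = 0.1 in two dimensions; CellIndex(m, 2, K).insert(p) is then called once per stream record, and merge_cells() is called when memory runs out.
• Design note: because the bits are interleaved most significant first, the Z-value of the parent cell is the child Z-value shifted right by d bits, which is why merging needs only one pass over the keys.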