THE FLORIDA STATE UNIVERSITY
COLLEGE OF ARTS AND SCIENCES
ALGORITHMS FOR SOLVING NEAR POINT PROBLEMS
By
MICHAEL CONNOR
A Dissertation submitted to the
Department of Computer Science
in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy
Degree Awarded:
Spring Semester, 2011
The members of the committee approve the dissertation of Michael Connor defended on
April 4, 2011.
Piyush Kumar
Professor Directing Thesis
Washington Mio
University Representative
Feifei Li
Committee Member
Xiuwen Liu
Committee Member
Approved:
David Whalley, Chair, Department of Computer Science
The Graduate School has verified and approved the above-named committee members.
TABLE OF CONTENTS

List of Tables
List of Figures
Abstract

1 Introduction
   1.1 Problem Definitions
      1.1.1 k-Nearest Neighbor Graphs
      1.1.2 Nearest Neighbor Searching
         Nearest Neighbor Searching in the Plane
         High Dimensional Nearest Neighbor Searching
      1.1.3 Geometric Minimum Spanning Trees
   1.2 Organization

2 Background
   2.1 Quadtrees
   2.2 Morton Ordering
   2.3 KD-Trees
      2.3.1 Advanced Splitting Rules
   2.4 Kruskal's MST Algorithm and Union Find
      2.4.1 Well Separated Pair Decomposition
   2.5 Bi-chromatic Closest Pair Computation
   2.6 Voronoi Cell Diagrams
   2.7 Delaunay Graphs
      2.7.1 Principal Component Analysis

3 k-Nearest Neighbor Graphs
      3.0.2 Notation
   3.1 The knng Algorithm
      3.1.1 Preprocessing and Morton Ordering On Floating Points
      3.1.2 Sliding Window Phase
      3.1.3 Search Phase
      3.1.4 Parallel Construction
      3.1.5 Handling large data sets
   3.2 Analysis of the knng Algorithm
   3.3 Experimental Analysis of the knng Algorithm
      3.3.1 ANN
      3.3.2 Experimental Data
      3.3.3 Intel Architecture
         Construction Time Results
      3.3.4 AMD Architecture
      3.3.5 Sun Architecture

4 Planar Nearest Neighbor Search
      Notation
   4.1 The Algorithm
   4.2 Analysis of DelaunayNN
   4.3 Experimental Analysis
      4.3.1 FDH Algorithm
      4.3.2 Data Distributions
      4.3.3 Experimental Results
   4.4 3-Dimensional Experiments

5 Geometric Minimum Spanning Trees
   5.1 Notation
   5.2 The GeoFilterKruskal algorithm
   5.3 Correctness
   5.4 Analysis of the Running Time
   5.5 Parallelization
      5.5.1 Comparing Quadtrees and Fair Split Trees
   5.6 Geometric Minimum Spanning Tree Experimental Setup
   5.7 Geometric Minimum Spanning Tree Experimental Results

6 Nearest Neighbor Search of High Dimensional SIFT Data
      6.0.1 Notation
   6.1 The PCANN Algorithm
   6.2 High Dimensional Nearest Neighbor Search Experimental Setup
      6.2.1 FLANN
      6.2.2 LSB-Tree
      6.2.3 Experiment Data
   6.3 High Dimensional Nearest Neighbor Search Experimental Results

7 Conclusions

A Full Proof of the GMST Algorithm Analysis
      High Probability Bound Analysis

Bibliography
Biographical Sketch
LIST OF TABLES

3.1 Construction times for k = 1-nearest neighbor graphs constructed on non-random 3-dimensional data sets. Each graph was constructed using 8 threads on Intel architecture. All timings are in seconds.

5.1 Algorithm Descriptions

5.2 Point Distribution Info
LIST OF FIGURES

1.1 A 2-nearest neighbor graph built on a two dimensional point set.

1.2 A simple example of nearest neighbor searching. Here, the query is in green. The red point marks the answer to the query, with the rest of the data points in blue.

1.3 A geometric minimum spanning tree built over a two dimensional point set.

2.1 The Morton order curve preceding the upper left corner, and following the lower right corner of a quadtree hyper-cube, will never re-enter the region.

2.2 The smallest quadtree hypercube containing two points will also contain all points lying between the two in Morton order.

2.3 A KD-tree constructed over a two dimensional point set.

2.4 A Voronoi Cell Diagram built over a two dimensional point set.

2.5 A Delaunay graph constructed over a two dimensional point set.

2.6 Compass routing on a Delaunay graph to locate a nearest neighbor.

3.1 Pictorial representation of Algorithm 4, Line 5. Since all the points inside the approximate nearest neighbor ball of pi have been scanned, the nearest neighbor has been found. This happens because pi^⌈rad(Ai)⌉ is the largest point with respect to the Morton ordering compared to any point inside the box. Any point greater than pi+k in Morton ordering cannot intersect the box shown. A similar argument holds for pi^−⌈rad(Ai)⌉ and pi−k.

3.2 (a) B lands cleanly on the quadtree box BQ twice its size. (b) B lands cleanly on a quadtree box 2^2 times its size. In both figures, if the upper left corner of B lies in the shaded area, the box B does not intersect the boundary of BQ. Obviously, (a) happens with probability 1/4 (probability (1/2)^d in general dimension) and (b) happens with probability ((2^2−1)/2^2)^2 = 9/16 (probability ((2^2−1)/2^2)^d in general dimension).

3.3 All boxes referred in Lemma 3.2.3.

3.4 Graph of 1-NN graph construction time vs. number of data points on the Intel architecture. Each algorithm was run using 8 threads in parallel.

3.5 Graph of k-NN graph construction time for varying k on the Intel architecture. Each algorithm was run using 8 threads in parallel. Data sets contained one million points.

3.6 Graph of 1-NN graph construction time for varying number of threads on Intel architecture. Data sets contained ten million points.

3.7 Graph of memory usage per point vs. data size on Intel architecture. Memory usage was determined using valgrind.

3.8 Graph of cache misses vs. data set size on Intel architecture. All data sets were uniformly random 3-dimensional data sets. Cache misses were determined using valgrind which simulated a 2 MB L1 cache.

3.9 Graph of 1-NN graph construction time vs. number of data points on AMD architecture. Each algorithm was run using 16 threads in parallel.

3.10 Graph of k-NN graph construction time for varying k on AMD architecture. Each algorithm was run using 16 threads in parallel. Data sets contained ten million points.

3.11 Graph of 1-NN graph construction time for varying number of threads on AMD architecture. Data sets contained ten million points.

3.12 Graph of 1-NN graph construction time vs. number of data points on Sun architecture. Each algorithm was run using 128 threads in parallel.

3.13 Graph of k-NN graph construction time for varying k on Sun architecture. Each algorithm was run using 128 threads in parallel. Data sets contained ten million points.

3.14 Graph of 1-NN graph construction time for varying number of threads on Sun architecture. Data sets contained ten million points.

4.1 The three layers of the query algorithm.

4.2 Here we see the center vertex p, and the sectors defined by rays passing through the Voronoi vertices. To find a nearer neighbor, we locate which sector the query lies in (via a binary search), then check the distance to the adjacent point that lies in that sector.

4.3 Two degenerate cases for linear degree vertices.

4.4 Proof of Lemma 4.2.1.

4.5 Showing average time per query versus data set size for points taken from uniform distribution.

4.6 Showing average time per query versus data set size for points taken from a unit circle. Query points were taken from the smallest square enclosing the circle.

4.7 Showing average query time versus data set size for points taken from the unit circle, plus some points chosen uniformly at random from the square containing the circle. Query points taken from the circle.

4.8 Showing average time per query versus data set size for points taken from a parabola. Query points were taken from the smallest rectangle enclosing the parabola.

4.9 Showing average time per query versus data set size for points taken from a distribution with linear degree vertices. Note that ANN is faster than the Delaunay algorithm run without the Voronoi cell search capability.

4.10 Showing average time per point to pre-process data sets for queries. Data was taken uniformly at random from the unit square.

4.11 Showing average time per query versus data set size for points taken from uniform distribution in three dimensions.

4.12 Showing average time per query versus data set size for points taken from the circular distribution in three dimensions.

4.13 Showing average time per query versus data set size for points taken from the “fuzzy” circular distribution in three dimensions.

4.14 Showing average time per query versus data set size for points taken from the parabolic distribution in three dimensions.

4.15 Showing average time per query versus data set size for points taken from a distribution with linear degree vertices. In this case, ANN wins due to the necessity of processing the linear degree vertices by sequential scan.

5.1 This figure demonstrates the run time gains of the algorithm as more threads are used. Scaling for two architectures. The AMD line was computed on the machine described in Section 5.6. The Intel line used a machine with four 2.66GHz Intel(R) Xeon(R) CPU X5355, 4096 KB of L1 cache, and 8GB total memory. For additional comparison, we include KNNG construction time using a parallel 8-nearest neighbor graph algorithm. All cases were run on 20 data sets from uniform random distribution of size 10^6 points; the final total run time is an average of the results.

5.2 Comparison of the number of WSPs versus the number of points for quadtrees and fair split trees.

5.3 Comparison of the number of points versus GMST construction time for quadtrees and fair split trees.

5.4 Separation factor of WSPs versus the number of WSPs produced.

5.5 Separation of WSPs versus the error in the length of the GMST.

5.6 Total running time for each algorithm over varying sized data sets. Data was uniformly random and two dimensional.

5.7 Total running time for each algorithm over varying sized data sets. Data was uniformly random and three dimensional.

5.8 Total running time for each algorithm over varying sized data sets. Data was uniformly random and four dimensional.

5.9 Showing average error and standard deviation when comparing GeoMST to GeoFK1 on varying distributions. Data size was 10^6 points in two dimensions.

5.10 Showing average error and standard deviation when comparing GeoMST2 to GeoFK1 on varying distributions. Data size was 10^6 points in two dimensions.

5.11 Showing average error and standard deviation when comparing Triangle to GeoFK1 on varying distributions. Data size was 10^6 points in two dimensions.

6.1 Timing results for PCANN versus FLANN on SIFT data.

6.2 Timing results for PCANN versus LSB-Tree on SIFT data.

6.3 Average error for PCANN, FLANN and LSB-Tree on SIFT data.

6.4 Timing results for PCANN versus FLANN on uniform random data.

6.5 Timing results for PCANN versus LSB-Tree on uniform random data.

6.6 Average error for PCA projection for SIFT and uniform random data. Average is from 1000 queries on 100000 data points.

6.7 Timing results for sequential PCANN versus parallel on SIFT data.
ABSTRACT
Near point problems are widely used in computational geometry as well as a variety of
other scientific fields. This work examines four common near point problems and presents
original algorithms that solve them.
Planar nearest neighbor searching is highly motivated by geographic information system
and sensor network problems. Efficient data structures to solve near neighbor queries in the
plane can exploit the extremely low dimension for fast results. To this end, DelaunayNN is
an algorithm using Delaunay graphs and Voronoi cells to answer queries in O(log n) time,
faster in practice than other common state-of-the-art algorithms.
k-Nearest neighbor graph construction arises in computer graphics in areas of normal
estimation and surface simplification. This work presents knng, an efficient algorithm using
Morton ordering to solve the problem. The knng algorithm exploits cache coherence and
low storage space, and is readily parallelizable on multi-core processors.
The GeoFilterKruskal algorithm solves the problem of computing geometric minimum
spanning trees. A common tool in tackling clustering problems, GMSTs are an extension
of the minimum spanning tree graph problem, applied to the complete graph of a point
set. By using well separated pair decomposition, bi-chromatic closest pair computation,
and partitioning and filtering techniques, GeoFilterKruskal greatly reduces the total computation required. It is also one of the only algorithms to compute GMSTs in a manner
that lends itself to parallel computation; a major advantage over its competitors.
High dimensional nearest neighbor searching is an expensive operation, due to the exponential dependence on dimension inherent in many lower dimensional solutions. Modern techniques to solve this problem often revolve around projecting data points into a large number
of lower dimensional subspaces. PCANN explores the idea of picking one particularly relevant subspace for projection. When used on SIFT data, principal component analysis
allows for greatly reduced dimension with no need for multiple projections. Additionally,
this algorithm is also well suited to take advantage of parallel computing power.
CHAPTER 1
INTRODUCTION
Many problems in computational geometry, particularly those dealing with point clouds,
use near point queries or data structures as building blocks to an eventual solution. As such,
the practical efficiency of these building blocks becomes an important factor in considering
the design of the solution. This work will examine several computational geometry problems
in an effort to expand the toolbox of near neighbor problem solvers.
One particular aspect motivating this work is a lack of parallel friendly algorithms.
Computer architecture is increasingly providing multiple processing units, and algorithms
that are designed to take advantage of this processing power will have a growing
advantage.
In this introductory chapter, four distinct problems in computational geometry will be
defined. In addition, common uses for these problems will be identified, and (hopefully) the
reasons to seek more practical algorithms for solving them will be made clear.
For each of these problems, the goal is three-fold. First, design an algorithm that
can be competitive with the current state of the art. Second, complete a theoretical analysis
of the algorithm’s runtime. The obvious goal is to improve the running time when compared
to current methods, or to match the running time with fewer theoretical restrictions on the
data (such as assuming a uniform distribution). Finally, algorithms should be implemented
in the most efficient manner, including parallel implementation where possible. This will
allow for a practical comparative analysis with other implementations via experimentation,
as well as allowing other interested parties to replicate experimental results.
1.1 Problem Definitions
1.1.1 k-Nearest Neighbor Graphs
A k-nearest neighbor graph is defined for a set of points by creating an edge from every
point to the k nearest points in the data set. More formally, let P = {p1 , p2 , . . . , pn } be a
point cloud in Rd . For each pi ∈ P , let Nik be the k points in P , closest to pi . The k-nearest
neighbor graph is a graph with vertex set {p1 , p2 , . . . , pn } and edge set E = {(pi , pj ) : pi ∈
Njk or pj ∈ Nik }.
In the problem of surface reconstruction, one is given a set of points that are known
to be the vertices of a triangulation of the surface. The goal of the surface reconstruction
problem is to reconstruct that triangulation from the set of points. One method of solving
Figure 1.1: A 2-nearest neighbor graph built on a two dimensional point set.
the problem is to consider local sections of the surface, as defined by the k-nearest neighbors
of all the points. Then, one can use an algorithm to find fittings for that neighborhood [63].
By identifying how well certain geometric shapes fit the neighborhood, they can be used to
reconstruct difficult regions of the point cloud, such as corners, edges, and other features.
In this instance, a fast algorithm to find k-nearest neighbors for every point in the set
allows for fast identification of the neighborhoods, and more time to be spent on fitting the
features.
Additionally, the problem of k-nearest neighbor graph computation arises
in many other applications and areas including computer graphics, visualization, pattern
recognition, computational geometry and geographic information systems. In graphics and
visualization, computation of k-nearest neighbor graph forms a basic building block in solving many important problems including normal estimation [66] , surface simplification [79] ,
finite element modeling [27], shape modeling [80], watermarking [32], virtual walkthroughs
[26] and surface reconstruction [4, 45].
1.1.2 Nearest Neighbor Searching
Define the fundamental nearest neighbor problem as follows: given a set P of n points
in Rd, and a query point q (also in Rd), output the point p ∈ P such that the distance dℓ(q, p) is
minimized over all points of P.
Nearest neighbor search is a fundamental geometric problem important in a variety of
applications including data mining, machine learning, pattern recognition, computer vision,
graphics, statistics, bio-informatics, and data compression [29, 6].
Nearest Neighbor Searching in the Plane. In this instance of the problem, the
focus is on efficient methods for solving the problem for d = 2. This is an important distinction, as nearest neighbor searching in low dimensions is much less difficult than in higher
dimensions.
Planar nearest neighbor search is one of the iconic nearest neighbor problems, originally
presented as the “post office problem” by Knuth in the 1970s. To wit, when looking at a
map, how do you decide which is the closest post office to your location? With the advent
Figure 1.2: A simple example of nearest neighbor searching. Here, the query is in green. The red point marks the answer to the query, with the rest of the data points in blue.
of interactive maps available on any number of devices, this problem has re-emerged as a
fundamental tool in dynamic route planning.
Applications of the nearest neighbor problem in the plane are also highly motivated by
problems in geographic information systems [85], particle physics [89] and chemistry [12].
High Dimensional Nearest Neighbor Searching. High dimensional nearest neighbor search is not fundamentally different from lower dimensions. However, nearest neighbor
searching suffers from what is known as the “curse of dimensionality.” As the dimension of
the data points grows linearly, the running time of the algorithm can grow exponentially.
Because of this exponential dependence on dimension inherent to most nearest neighbor
algorithms, different approaches are required.
Scale-invariant feature transform (SIFT) is an algorithm in computer vision designed to
identify objects. “Interesting” features are identified in a training image, and stored in such
a way as to create a description of the object. In order to recognize an unknown object
as being similar to one previously identified, these feature descriptions must be compared
via nearest neighbor searching. There are two aspects to consider when optimizing nearest
neighbor searches for this type of use. The faster nearest neighbor features can be identified,
the faster a vision system based on this sort of data can process images, up to or
surpassing real-time identification. In addition, more efficient methods can allow for higher
dimensional points to be processed, allowing for more intricate features to be stored and
used.
High dimensional nearest neighbor searching is a common problem in other areas of
computer vision, where it is used for feature matching [75], object recognition [87, 74], and
image retrieval [60].
1.1.3 Geometric Minimum Spanning Trees
The minimum spanning tree of a graph seeks to minimize the total weight of the edges
connecting all vertices. In the geometric version of the problem, the goal is to find the minimum
Figure 1.3: A geometric minimum spanning tree built over a two dimensional point set.
weight spanning tree of the complete undirected graph over a set of points, where the weight
of each edge is given by the distance between its endpoints. Formally, given a set of n points, P, in Rd,
output the graph G = (P, E) such that Σ_{e∈E} |e| is minimized.
Computing the geometric minimum spanning tree (GMST) is a classic computational
geometry problem, but it frequently arises in other fields. Actin is a protein found in the
muscle cells of animals. Understanding how Actin interacts with cell structures is a key
factor in understanding cell locomotion and morphology. Digital imaging interpretation
of two dimensional cell images greatly speeds up the process of identifying these proteins.
GMSTs can be used in this sort of interpretation to rapidly identify these proteins, alleviating the need for tedious visual scans by a human, by creating meaningful connections in
these images that can be matched to known protein structures.
GMST computation arises in many other applications as well, including clustering and
pattern classification [100] , surface reconstruction [62] , cosmology [11, 17] , TSP approximations [5] and computer graphics [58].
1.2 Organization
In chapter 2, background information on many tools used to create algorithms to solve
these problems will be discussed. Subsequent chapters will deal specifically with each problem in turn, providing background on the current state of the art, a new algorithm design,
theoretical analysis and extensive experimentation.
CHAPTER 2
BACKGROUND
In the previous chapter, four near point problems were identified, and practical motivations for designing more efficient algorithms discussed. Nearest neighbor search, k-nearest
neighbor graphs, and geometric minimum spanning trees are all well established problems
in computational geometry, and just as they are tools to solve more complex problems,
there exists a toolbox of useful structures to solve them. In this chapter, a number of techniques will be discussed that are key components in the algorithms proposed in subsequent
chapters.
These tools are presented as black boxes, with an eye towards understanding their input
and output as opposed to their inner workings. For a more thorough analysis, references to
classic works on these problems have been provided.
2.1 Quadtrees
One of the first types of spatial subdivision trees, the quad-tree, and an algorithm
to construct it, was introduced in 1974 by Finkel and Bentley [44]. A quad-tree is defined by
partitioning space into 2^d equal, axis-aligned hyper-cubes. Each subspace is then further
divided, until all points in the data set are contained within their own hyper-cube. Naively,
it can be constructed in O(n log n) time by first sorting all input points according to their
Morton order (see 2.2), then computing the smallest quad-tree boundary lying between any
two adjacent points. This constructs a tree containing O(n) elements. However, it has
since been shown that an equivalent tree can be constructed in O(n) time, by reducing the
problem to a Cartesian tree [25]. Nearest neighbor algorithms based on quad-trees have
been theorized, with expected running times similar to those based on other trees, such
as kd-trees(see 2.3). In general, they are thought to be less efficient in practice [13] [23].
In chapter 3 an algorithm for constructing k-Nearest Neighbor graphs using quadtrees is
presented.
2.2 Morton Ordering
Morton order is a space filling curve. It reduces points in Rd to a one dimensional space,
while attempting to preserve locality. The Morton order curve can be conceptually achieved
by building a quadtree, then recursively ordering the hyper-cubes. Morton order curves are
sometimes called Z curves because these hyper-cubes are ordered so as to form a Z.
In practice, Morton order can be determined by computing the Z-value for all the data
points, then ordering them. The Z-value is a bitwise operation on the coordinates of a
point. For a particular point, the bits of the coordinate values are interleaved. The resulting
number is the Z-value. By constructing the curve in this way, we can implicitly define a
quadtree.
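As an illustration of this bit-interleaving definition, the following is a minimal C++ sketch for a two dimensional point with unsigned integer coordinates; the function name z_value is illustrative and not taken from the knng implementation.

#include <cstdint>

// Interleave the bits of x and y: bit i of x goes to bit 2i of the result,
// bit i of y goes to bit 2i+1. The result is the Z-value of the point (x, y).
std::uint64_t z_value(std::uint32_t x, std::uint32_t y) {
    std::uint64_t z = 0;
    for (int i = 0; i < 32; ++i) {
        z |= (std::uint64_t)((x >> i) & 1u) << (2 * i);
        z |= (std::uint64_t)((y >> i) & 1u) << (2 * i + 1);
    }
    return z;
}

Sorting points by this value traverses the implicit quadtree in Z order.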
Chan [24] showed that the relative Morton order of two integer points can be easily
calculated, by determining which pair of coordinates have the first differing bit in binary
notation in the largest place. He further showed that this can be accomplished using a
few binary operations. Using this method, Morton order can be reduced to sorting. In
chapter 3.1.1 a method for computing relative Morton order on floating points is presented.
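A sketch of this comparison for unsigned integer coordinates is shown below; less_msb is the standard XOR trick for comparing most significant bit positions, and the names are illustrative rather than taken from Chan's code.

#include <array>
#include <cstdint>

// true if the most significant set bit of x is lower than that of y.
static bool less_msb(std::uint32_t x, std::uint32_t y) {
    return x < y && x < (x ^ y);
}

// Compare two d-dimensional integer points in Morton order without
// explicitly interleaving any bits.
template <std::size_t D>
bool morton_less(const std::array<std::uint32_t, D>& p,
                 const std::array<std::uint32_t, D>& q) {
    std::size_t dim = 0;
    std::uint32_t best = 0;
    for (std::size_t j = 0; j < D; ++j) {
        std::uint32_t diff = p[j] ^ q[j];
        if (less_msb(best, diff)) { best = diff; dim = j; }  // higher differing bit wins
    }
    return p[dim] < q[dim];
}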
Morton order curves have two simple properties that are useful in nearest neighbor
searching. The first, shown in Figure 2.1, is that the curve does not “double back” on
itself. Once the Morton order leaves a hyper-cube of the quadtree, it will not intersect that
hypercube again.
Figure 2.1: The Morton order curve preceding the upper left corner, and following
the lower right corner of a quadtree hyper-cube, will never re-enter the region.
The second property, shown in Figure 2.2, states that a quadtree hyper-cube containing
two points on the Morton order curve will also contain all points which lie on the curve
between the original two. In chapter 3, an algorithm for construction of k-nearest neighbor
graphs using Morton order curves is presented.
2.3 KD-Trees
As an improvement over the quad-tree, Bentley introduced the kd-tree [13]. This is
another spatial decomposition tree, which relaxes the requirement of the quad-tree that all
regions be divided equally. Instead, areas are divided into axis-aligned hyper-rectangles.
The method for choosing the “splitting” point varies, but the goal is always to divide the
points as evenly as possible, while simultaneously attempting to keep the ratio of the side
Figure 2.2: The smallest quadtree hypercube containing two points will also contain all points lying between the two in Morton order.
Figure 2.3: A KD-tree constructed over a two dimensional point set.
length as low as possible. A common method for choosing the split is to rotate through the
dimensions in the point set. That is, the first level of the tree is formed by sorting points
according to their x coordinate, then cutting it with an orthogonal plane at the median
point. The next level would be split similarly on the y axis, and so on.
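A minimal sketch of this cycling median-split construction follows; the point representation and names are illustrative, not the ANN or dissertation implementation.

#include <algorithm>
#include <memory>
#include <vector>

struct KDNode {
    std::vector<double> point;            // the median point stored at this node
    std::unique_ptr<KDNode> left, right;
};

std::unique_ptr<KDNode> build_kdtree(std::vector<std::vector<double>> pts,
                                     std::size_t depth = 0) {
    if (pts.empty()) return nullptr;
    std::size_t axis = depth % pts[0].size();   // rotate through the dimensions
    std::size_t mid = pts.size() / 2;
    // Partial sort so that the median along `axis` lands in the middle slot.
    std::nth_element(pts.begin(), pts.begin() + mid, pts.end(),
                     [axis](const std::vector<double>& a, const std::vector<double>& b) {
                         return a[axis] < b[axis];
                     });
    auto node = std::make_unique<KDNode>();
    node->point = pts[mid];
    std::vector<std::vector<double>> lo(pts.begin(), pts.begin() + mid);
    std::vector<std::vector<double>> hi(pts.begin() + mid + 1, pts.end());
    node->left = build_kdtree(std::move(lo), depth + 1);
    node->right = build_kdtree(std::move(hi), depth + 1);
    return node;
}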
It is worth mentioning that the run-time of nearest neighbor algorithms based on these
trees is related to both the height of the tree, and the aspect ratio of the hyper-rectangles.
If the rectangles are too skinny, a nearest neighbor ball can intersect more than a constant
number of them, thus inflating the running time.
One improved method for construction of kd-trees modifies the way the splitting plane is
chosen. Instead of merely rotating through the dimensions, the splitting dimension at each
internal node is determined by the “spread” of points in that dimension. This, in practice,
helps to bound the aspect ratio of the hyper-rectangles. This is accepted as the standard
kd-tree splitting rule, and guarantees a tree of height O(log n) [46]. It does not, however,
ensure that the hyper-rectangles are well-formed (having a constant factor aspect ratio).
Other splitting rules have been introduced to address this problem, and are described below.
2.3.1 Advanced Splitting Rules
Many improvements have been made to the implementation of KD-trees, based on the
way the hyper-rectangles are split. Some of these include the Midpoint Rule [46], the Sliding
Midpoint Rule [61], the Balance Split Rule [8] and the Hybrid Split Rule [8].
The Midpoint Rule dictates that rather than split the hyper-rectangles along the dimension of greatest spread, they should be split by a plane orthogonal to the midpoint of
the longest side. This begins to address the need to keep the aspect ratio a constant, but
can lead to a tree of arbitrary height, as well as empty internal nodes.
The Sliding Midpoint Rule is a practical modification of the Midpoint Rule, introduced
by Arya and Mount in their ANN nearest neighbor library [70]. In the case of an empty
hyper-rectangle, the split is shifted to lie on the nearest point contained in the hyper-rectangle. This eliminates the risk of empty nodes. It can be shown that while the height
of this tree could still be as bad as O(n), nearest neighbor searches done on it work very
well in practice, and have an expected running time of O(log n) [8].
The Balance Split Rule is a further relaxation of the Sliding Midpoint Rule. Again, the
longest side of the hyper-rectangle is chosen to be split. This time, however, an orthogonal
splitting plane is chosen so that the number of points on either side are roughly balanced.
This yields an expected query time of O(log n), and has a worst case query time of O(log^d n).
The Hybrid Split Rule alternates between the Sliding Midpoint Rule and the Balance
Split Rule at different levels of the tree in an attempt to limit both the aspect ratio, and
the height of the tree. This rule yields a tree with a height of O(log n), while still having
rectangles of bounded aspect ratio, which gives an expected running time of O(log^d n).
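As a concrete illustration of the sliding midpoint rule described earlier in this section, here is a rough sketch of a single split; the Box type, the function name, and the assumption that pts is non-empty are all part of the example, not the ANN library's actual interface.

#include <algorithm>
#include <utility>
#include <vector>

struct Box { std::vector<double> lo, hi; };   // axis-aligned cell bounds

// Return the splitting dimension and cut coordinate for a cell and its points.
std::pair<int, double> sliding_midpoint(const Box& cell,
                                        const std::vector<std::vector<double>>& pts) {
    // Cut the longest side of the cell at its midpoint.
    int dim = 0;
    for (std::size_t j = 1; j < cell.lo.size(); ++j)
        if (cell.hi[j] - cell.lo[j] > cell.hi[dim] - cell.lo[dim]) dim = (int)j;
    double cut = 0.5 * (cell.lo[dim] + cell.hi[dim]);
    // If every point falls on one side, slide the cut onto the nearest point
    // so that neither child is empty.
    double minc = pts[0][dim], maxc = pts[0][dim];
    for (const auto& p : pts) {
        minc = std::min(minc, p[dim]);
        maxc = std::max(maxc, p[dim]);
    }
    if (cut < minc) cut = minc;
    if (cut > maxc) cut = maxc;
    return {dim, cut};
}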
2.4 Kruskal’s MST Algorithm and Union Find
Published in 1956, Kruskal’s algorithm [56] finds the minimum weight spanning tree
of a graph. It begins by placing the edges in the graph into a priority queue, ordered by
weight. Then it builds a structure (commonly called Union Find) which will identify edges
in terms of connected components. Edges are removed from the head of the priority queue
and inserted into the mst if they will not create a cycle. This proceeds until n − 1 edges
have been added (where n is the number of vertices in the graph).
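A compact sketch of this procedure is given below, with a simple path-compressing union-find; the names are illustrative and the structure is deliberately bare (no union by rank) to keep the example short.

#include <algorithm>
#include <numeric>
#include <vector>

struct Edge { int u, v; double w; };

struct UnionFind {
    std::vector<int> parent;
    explicit UnionFind(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    bool unite(int a, int b) {            // false if a and b are already connected
        a = find(a); b = find(b);
        if (a == b) return false;
        parent[a] = b;
        return true;
    }
};

std::vector<Edge> kruskal(int n, std::vector<Edge> edges) {
    // Order edges by weight (a sorted array plays the role of the priority queue).
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.w < b.w; });
    UnionFind uf(n);
    std::vector<Edge> mst;
    for (const Edge& e : edges) {
        if (uf.unite(e.u, e.v)) mst.push_back(e);   // skip edges that would form a cycle
        if ((int)mst.size() == n - 1) break;        // tree is complete
    }
    return mst;
}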
While this algorithm was designed to work for arbitrary graphs, it can be used to find
the mst built on a set of points. In this case, the original graph is considered to be the
complete graph of the point set, and the weight of an edge to be the Euclidean distance between its endpoints.
However, since there are O(n^2) edges in the complete graph, this is an expensive approach.
Sections 2.4.1 and 2.5 describe tools used to improve the viability of this method, and
chapter 5 details a new algorithm for efficiently using Kruskal’s method to find geometric
minimum spanning trees.
2.4.1 Well Separated Pair Decomposition
Proposed by Callahan, the well-separated pair decomposition uses a spatial decomposition tree to create a simplified representation of the complete graph of a point set. In
essence, the n^2 edges of the complete graph are represented by O(n) components. Formally,
given a spatial decomposition tree Q, built on a set of points P in Rd , the well separated
pair decomposition (WSPD) of the tree consists of all pairs of nodes (a1, b1), ..., (am, bm)
such that for every point p ∈ a and every point q ∈ b, dℓ(a, b) < γ dℓ(p, q). γ is known as
the separation factor. In order for pairs to be truly “well separated”, γ must be ≥ 2.
In addition to being separated, every pair of points (p, q) should appear in exactly one well
separated pair (WSP).
Construction of the WSPD can be executed on a quadtree or fair split tree in O(n)
time, and Callahan proved that construction will yield O(n) WSPs. Note that while the
process to construct the WSPD is identical regardless of the tree used, the number of WSPs
can vary by a substantial amount (although within a constant factor). Section 2.5 describes
Bi-chromatic closest pair computation, which can be used along with WSPD to compute
geometric minimum spanning trees. A new algorithm using these methods is presented in
chapter 5.
2.5 Bi-chromatic Closest Pair Computation
Given two sets of points, A (colored red) and B (colored green), the bi-chromatic closest
pair (BCCP) of (A, B) is defined as the minimum weight edge with endpoints p ∈ A and
q ∈ B. Callahan showed that given a WSPD, the geometric minimum spanning tree is a
subset of its BCCP edges. In fact, GMST computation can be reduced to the computation
of the BCCPs for a WSPD. While faster BCCP algorithms exist [1], in practice a simple
quadratic algorithm for finding BCCPs is typically used in GMST construction. This is
due to the fact that competitive algorithms for GMST seek to minimize the number of
BCCP computations necessary, and, in most cases, rarely have to compute BCCPs on
WSPs of greater than a constant size. A simple, recursive BCCP algorithm is presented as
Algorithm 1.
Given the tree Q and nodes a and b forming a WSP, we compute the distance between
the nodes, and recurse if that distance is less than our current minimum. This is repeated until
we have a minimum distance between one pair of points, p ∈ a and q ∈ b.
2.6 Voronoi Cell Diagrams
A classical computational geometry structure, the Voronoi diagram is composed of cells
which are constructed around the input points [10]. Each cell bounds the region of space
that is closer to its data point than to any other. By
using well known divide and conquer algorithms, two dimensional Voronoi diagrams can be
constructed in O(n log n) time [90].
In order to answer nearest neighbor queries using the Voronoi diagram one has to be
able to identify in which cell the query point lies. The Dobkin-Kirkpatrick hierarchy is a
data structure which answers this question in O(log n) time. This structure is based on a
hierarchy of increasingly complex triangulations. At the bottom level of the hierarchy, the
vertices of the Voronoi diagram are triangulated. A maximal independent set is calculated
and removed, and then the holes are re-triangulated. Links are maintained between the
Algorithm 1 BCCP Algorithm [73]: Compute {p′, q′, δ′} = BCCP(a, b[, {p, q, δ} = η])
Require: a, b ∈ Q, δ ∈ R+
Require: If {p, q, δ} is not specified, {p, q, δ} = η, an upper bound on BCCP(a, b).
1: procedure BCCP(a, b[, {p, q, δ} = η])
2:   if |a| = 1 and |b| = 1 then
3:     Let p′ ∈ a, q′ ∈ b
4:     if dℓ(p′, q′) < δ then return {p′, q′, dℓ(p′, q′)}
5:   else
6:     γ = dℓ(Left(a), b)
7:     ζ = dℓ(Right(a), b)
8:     if γ < δ then {p, q, δ} = BCCP(Left(a), b, {p, q, δ})
9:     if ζ < δ then {p, q, δ} = BCCP(Right(a), b, {p, q, δ})
10:  end if
11:  return {p, q, δ}
12: end procedure
Figure 2.4: A Voronoi Cell Diagram built over a two dimensional point set.
levels of the tree, and each triangle in one level intersects only a constant number of triangles
in the level below. Kirkpatrick showed that the minimum size of a maximal independent
set is at least n/32, meaning that at each level some constant fraction of n is removed. This
yields a tree of height O(log n). Queries are computed by locating a point in the topmost
(constant size) triangulation, then proceeding down the hierarchy, doing a constant amount
of work at each level.
The n/32 factor is, in practice, a fairly high constant. More recent work has shown that
the factor is provably higher (Iacono gives a factor of n/25) [49]. Work has been done to
give improved results in this kind of planar point location, including work based on layered
directed acyclic graphs [40] and randomized incremental algorithms [72] [88]. By allowing
for expected O(log n) queries, data structures based on orientation tests have given some
of the most efficient data structures to date for planar point location [9].
2.7 Delaunay Graphs
Another common computational geometry tool, the Delaunay graph is the dual of the
Voronoi diagram. It is formed in such a way that every edge of the complete graph whose
two endpoints lie on some empty circumcircle is included in the graph [35]. Through
various methods, the Delaunay graph can be constructed in O(n log n) time [94].
Figure 2.5: A Delaunay graph constructed over a two dimensional point set.
It has been shown that by using a simple compass routing technique, one can find the
nearest neighbor to a point using the Delaunay graph. To begin the query, an arbitrary
starting place is chosen on the graph. The adjacent points are compared to the query point,
and if any are closer to the query than the current position, the position is updated. By
repeating the search for a local improvement, the global solution can be found [37].
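The following short sketch shows this local-improvement walk on a Delaunay graph stored as an adjacency list; the types and names are illustrative.

#include <vector>

struct Pt { double x, y; };

static double dist2(const Pt& a, const Pt& b) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// adj[v] lists the Delaunay neighbors of vertex v; start is any vertex index.
int compass_nn(const std::vector<Pt>& pts,
               const std::vector<std::vector<int>>& adj,
               const Pt& query, int start) {
    int cur = start;
    bool improved = true;
    while (improved) {
        improved = false;
        for (int nb : adj[cur]) {
            if (dist2(pts[nb], query) < dist2(pts[cur], query)) {
                cur = nb;          // move to a neighbor closer to the query
                improved = true;
            }
        }
    }
    return cur;                    // no neighbor is closer: cur is the nearest neighbor
}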
The query time of such an algorithm is bounded by two factors. The first is the length
of the path one must traverse to find the solution, the second is the degree of the vertices on
the path. Birn et. al. showed that by constructing a hierarchy of Delaunay graphs, the first
factor can be bounded, in expectation, by O(log n). They presented a practical algorithm
using this method for answering the nearest neighbor query exactly in the plane [18], which
is described in chapter 4.
Figure 2.6: Compass routing on a Delaunay graph to locate a nearest neighbor.
Like the Voronoi diagram method, this approach has only been shown to be practical
in the plane. In higher dimensions, the Delaunay graph suffers from greater than linear
complexity, making it unlikely to be useful in general dimensions. In chapter 4 a new
algorithm using both Delaunay graphs and Voronoi cells to find planar nearest neighbors
is presented. In addition, chapter 4 presents results applying these techniques to two and
three dimensional point sets.
2.7.1 Principal Component Analysis
Introduced in 1901 [81], principal component analysis (PCA) is a tool which can be
used to simplify high dimensional data by identifying structure within it and eliminating
noise. In essence, it seeks to reduce d-dimensional data to d′ -dimensions (with d′ < d) while
maximally preserving relational information. One method for accomplishing this is through
eigenvector decomposition of the covariance matrix of the data. In chapter 6, the Fast
PCA [91] algorithm will be used to reduce SIFT data for more efficient nearest neighbor
searching.
Algorithm 2 Fast PCA Algorithm [91]
Require: Set of points P ∈ Rd
1: procedure FastPCA(points P, integer h)
2:   C ← covariance matrix of P
3:   for p = 1 to h
4:     Ep ← random d × 1 vector
5:     do
6:       Ep ← C Ep
7:       Ep ← Ep − Σ_{j=1}^{p−1} (Ep^T Ej) Ej
8:       Ep ← Ep / ||Ep||
9:     while Ep has not converged.
10:  return E1 ... Eh
11: end procedure
CHAPTER 3
K-NEAREST NEIGHBOR GRAPHS
K-nearest neighbor graph construction involves creating a graph in which every point is
connected to its k-nearest neighbors. This chapter will discuss current methods for solving
this problem and present a new algorithm, called knng, that is cache efficient, capable of
utilizing parallel processors, and able to work on data sets too large to fit into conventional
memory. An extensive theoretical analysis proves the running time is competitive with other
state-of-the-art algorithms. Finally, extensive experiments will be presented, demonstrating
the practical advantages of this new approach.
The naive approach to solve the k-nearest neighbor graph construction problem uses
O(n^2) time and O(nk) space. Theoretically, the k-nearest neighbor graph can be computed
in O(n log n + nk) time [21]. That method is not only theoretically optimal and elegant but also
parallelizable. Unfortunately, these methods also have large constant factors that can make
them impractical [98, 21, 28, 38].
In practice, variants of kd-tree implementations [66, 78, 27] are generally chosen. In
low dimensions, one of the best kd-tree implementations is by Arya et al. [7]. They present
an ǫ-approximation based kd-tree algorithm that can answer nearest neighbor queries in
O((1/ǫ^d) log n) time.
The Morton order, or Z-order, of points, described in chapter 2, has been used previously
for many related problems. Tropf and Herzog [96] present a precursor to many nearest
neighbor algorithms. Their method uses one, unshifted Morton order point set to conduct
range queries. The main drawbacks of their method are: (1) it does not allow the use of
non-integer keys; (2) it does not offer a rigorous proof of worst case or expected run time,
in fact leaving these as an open problem; and (3) it offers no scheme to easily parallelize the
algorithm. Orenstein and Merrett [76] described another data structure for range searching
using Morton order on integer keys [67].
Bern [16] described an algorithm using 2d shifted copies of a point set in Morton order
to compute an approximate solution to the k-nearest neighbor problem. This paper avoids
a case where two points lying close together in space are far away in the one-dimensional
order, in expectation. In all these algorithms, the Morton order was determined using
explicit interleaving of the coordinate bits.
Liao et al. [59] used Hilbert curves to compute an approximate nearest neighbor solution
using only d + 1 shifted curves. Chan [23] refined this approach, and later presented an
algorithm that used only one copy of the data, while still guaranteeing a correct approximation result for the nearest neighbor problem [24]. It is worth noting that in practice, these
methods for k-nearest neighbor computation are less efficient than state of the art kd-tree
based algorithms [24].
3.0.2 Notation
In this chapter, lower case Roman letters denote scalar values. p and q are specifically
reserved to refer to points. P is reserved to refer to a point set. n is reserved to refer to the
number of points in P .
p < q denotes that the Z-value of p is less than that of q (> is used similarly). p^s denotes the shifted
point p + (s, s, . . . , s), and P^s = {p^s | p ∈ P}. dℓ(p, q) denotes the Euclidean distance between p
and q.
pi is the i-th point in the sorted Morton ordering of the point set. p(j) is used to denote
the j-th coordinate of the point p. The Morton ordering also defines a quadtree on the
point cloud. boxQ (pi , pj ) refers to the smallest quadtree box that contains the points pi and
pj . box(c, r) denotes a box with center c and radius r. The radius of a box is the radius of
the inscribed sphere of the box.
k is reserved to refer to the number of nearest neighbors to be found for every point. d
is reserved to refer to the dimension.
In general, upper case Roman letters (B) refer to a bounding box. Bounding boxes with
a subscript Q (BQ) refer to a quadtree box, and dℓ(p, B) is defined as the minimum distance
from point p to box (or quadtree box) B. E[ ] is reserved to refer to an expected value. E
refers to an event. P (E) is reserved to refer to the probability of an event E. Ai is reserved
to refer to the current k nearest neighbor solution for point pi , which may still need to be
refined to find the exact nearest neighbors. nnk (p, {}) defines a function that returns the k
nearest neighbors to p from a set. The bounding box of Ai refers to the minimum enclosing
box for the ball defined by Ai . Finally, rad(p, {}) returns the distance from point p to the
farthest point in a set. rad(Ai ) = rad(pi , Ai ).
3.1 The knng Algorithm
The knng algorithm (Algorithm 4) mainly consists of the following three high level components:
• Preprocessing Phase: In this step, the input points, P, are sorted using the Morton
ordering.
• Sliding Window Phase: For each point p ∈ P, compute its approximate k-nearest
neighbors by scanning O(k) points to the left and right of p. Another way to think
of this step is to slide a window of length O(k) over the sorted array and find the
k-nearest neighbors restricted to this window.
• Search Phase: Refine the answers of the previous phase by zooming inside the
constant factor approximate k-nearest neighbor balls using properties of the Morton
order.
Recall from chapter 2.2 that the common approach to Morton ordering does not account
for point values with non-integer coordinates. Next, an algorithm for computing the relative
Morton order of points with floating point coordinates will be discussed, along with the
details of the preprocessing phase.
3.1.1 Preprocessing and Morton Ordering On Floating Points
As described in chapter 2.2, Chan’s method for relative Morton order only applies to
integer types. This is due to using the XOR operation and bit shifting operators that, in
general, are not applicable on floating point types. However, it can be extended to floating
point types as shown in Algorithm 3. The algorithm takes two points with floating point
coordinates. The relative order of the two points is determined by the pair of coordinates
that have the first differing bit with the highest exponent. The XORmsb function computes
this by first comparing the exponents of the coordinates, then comparing the bits in the
mantissa if the exponents are equal. Note that the msdb function on line 14 returns the
most significant differing bit of two integer arguments. This is calculated by first XORing
the two values, then shifting until the most significant bit is reached. The exponent and
mantissa functions return those parts of the floating point number in integer format. This
algorithm allows the relative Morton order of points with floating point coordinates to be
found in O(d) time and space complexity.
Algorithm 3 Floating Point Morton Order Algorithm
Require: d-dimensional points p and q
Ensure: true if p < q in Morton order
1: procedure Compare(point p, point q)
2:   x ← 0; dim ← 0
3:   for all j = 0 to d do
4:     y ← XORmsb(p(j), q(j))
5:     if x < y then
6:       x ← y; dim ← j
7:     end if
8:   end for
9:   return p(dim) < q(dim)
10: end procedure
11: procedure XORmsb(double a, double b)
12:   x ← exponent(a); y ← exponent(b)
13:   if x = y then
14:     z ← msdb(mantissa(a), mantissa(b))
15:     x ← x − z
16:     return x
17:   end if
18:   if y < x then return x
19:   else return y
20: end procedure
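For concreteness, the following C++ sketch mirrors Algorithm 3 using std::frexp to obtain exponents and mantissas. It assumes non-negative coordinates, and the helper names are illustrative; it is a sketch of the idea, not the dissertation's actual implementation.

#include <climits>
#include <cmath>
#include <cstdint>
#include <vector>

// Position of the most significant differing bit of two integers (-1 if equal).
static int msdb(std::uint64_t a, std::uint64_t b) {
    std::uint64_t x = a ^ b;
    int bit = -1;
    while (x) { ++bit; x >>= 1; }
    return bit;
}

// Exponent of the highest differing bit of a and b (cf. XORmsb above).
static int xor_msb(double a, double b) {
    int ea, eb;
    double ma = std::frexp(a, &ea);           // a = ma * 2^ea, 0.5 <= |ma| < 1
    double mb = std::frexp(b, &eb);
    if (ea == eb) {
        // Same exponent: compare the 53 mantissa bits as integers.
        auto ia = (std::uint64_t)std::ldexp(std::fabs(ma), 53);
        auto ib = (std::uint64_t)std::ldexp(std::fabs(mb), 53);
        return ea - (53 - msdb(ia, ib));
    }
    return (ea > eb) ? ea : eb;
}

// true if p precedes q in Morton order (non-negative coordinates assumed).
bool morton_less_double(const std::vector<double>& p, const std::vector<double>& q) {
    int best = INT_MIN;
    std::size_t dim = 0;
    for (std::size_t j = 0; j < p.size(); ++j) {
        if (p[j] == q[j]) continue;           // identical coordinates cannot decide
        int e = xor_msb(p[j], q[j]);
        if (e > best) { best = e; dim = j; }
    }
    return p[dim] < q[dim];
}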
Figure 3.1: Pictorial representation of Algorithm 4, Line 5. Since all the points inside the approximate nearest neighbor ball of pi have been scanned, the nearest neighbor has been found. This happens because pi^⌈rad(Ai)⌉ is the largest point with respect to the Morton ordering compared to any point inside the box. Any point greater than pi+k in Morton ordering cannot intersect the box shown. A similar argument holds for pi^−⌈rad(Ai)⌉ and pi−k.
Once it is possible to determine the relative Morton order of points, the preprocessing
phase of the algorithm is trivial. Using the Morton order comparison, all points p ∈ P are
sorted according to their Z-value.
3.1.2 Sliding Window Phase
Once P has been sorted in the previous phase, a partial solution is found for each point
pi by finding the k nearest neighbors from the set of points {pi−ck ...pi+ck } for some constant
c ≥ 1/2. This is done via a linear scan of the range of points. The actual best value of c
is platform dependent, and in general c should not be too large. Once this partial nearest
neighbor solution is found, its correctness can be checked using the property of Morton
ordering shown in chapter 2.2. If the corners of the bounding box for the current solution lie
within the range that has already been searched, then the partial solution is in fact the true
solution (see Figure 3.1).
If the current approximate nearest neighbor ball is not bounded by the lower and upper
points already checked, a binary search must be used to find the location of the lower and
upper corners of the bounding box of the current approximate nearest neighbor ball in the
Morton ordered point set. This defines the range that needs to be searched in the final
phase of the algorithm.
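A minimal sketch of this window scan is shown below, keeping the k best candidates for each point in a max-heap keyed by squared distance; the names and the handling of the constant c are illustrative.

#include <algorithm>
#include <queue>
#include <utility>
#include <vector>

using Point = std::vector<double>;

static double dist2(const Point& a, const Point& b) {
    double s = 0;
    for (std::size_t j = 0; j < a.size(); ++j) s += (a[j] - b[j]) * (a[j] - b[j]);
    return s;
}

// For every point p_i of the Morton-sorted array P, return the indices of the
// k nearest points among {p_{i-ck}, ..., p_{i+ck}}.
std::vector<std::vector<int>> sliding_window_knn(const std::vector<Point>& P,
                                                 int k, int c = 1) {
    const int n = (int)P.size(), w = c * k;
    std::vector<std::vector<int>> A(n);
    for (int i = 0; i < n; ++i) {
        std::priority_queue<std::pair<double, int>> heap;   // worst candidate on top
        for (int j = std::max(0, i - w); j <= std::min(n - 1, i + w); ++j) {
            if (j == i) continue;
            double d = dist2(P[i], P[j]);
            if ((int)heap.size() < k) heap.push({d, j});
            else if (d < heap.top().first) { heap.pop(); heap.push({d, j}); }
        }
        while (!heap.empty()) { A[i].push_back(heap.top().second); heap.pop(); }
    }
    return A;
}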
3.1.3 Search Phase
For each point pi for which the solution was not found in the previous step, the partial
solution must be refined to find the actual solution. This is done using a recursive algorithm.
Given a range of points {pa ...pb }, first check if the distance r from pi to boxQ (pa , pb ) is greater
than the radius of Ai (line 24). If it is, then the current solution does not intersect this range.
Otherwise, update Ai with p(a+b)/2. Repeat the procedure for the ranges {pa ... p(a+b)/2−1} and
{p(a+b)/2+1 ... pb}. One important observation is that the property used as a check in the scan
portion of the algorithm still holds, and one of these two new ranges of points may be
eliminated by comparing the bounding box of Ai with p(a+b)/2. If the length of a range is less
than ν, a fixed constant, a linear scan of the range is more efficient than recursing further.
A good value for ν is platform dependent, and should be experimentally determined.
3.1.4 Parallel Construction
Parallel implementation of this algorithm happens in three phases. For the first phase,
a parallel Quick Sort [97] is used in place of a standard sorting routine. Second, the sorted
array is split into p chunks (assuming p threads to be used), with each thread computing
the initial approximate nearest neighbor ball for one chunk independently. Finally, each
thread performs the recursive step of the algorithm on each point in its chunk. This allows,
as near as is possible, the workload to be evenly divided across all threads. In practice,
memory overhead and other factors keep this from being a perfectly linear speedup.
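A rough sketch of the chunking scheme using std::thread is shown below; the refine callback stands in for the per-point work of the sliding window and search phases and is an assumption of the example.

#include <algorithm>
#include <functional>
#include <thread>
#include <vector>

// Split the indices [0, n) into one contiguous chunk per thread and apply
// `refine` to every index in the chunk.
void parallel_phase(int n, int num_threads, const std::function<void(int)>& refine) {
    std::vector<std::thread> pool;
    int chunk = (n + num_threads - 1) / num_threads;
    for (int t = 0; t < num_threads; ++t) {
        int lo = t * chunk, hi = std::min(n, lo + chunk);
        if (lo >= hi) break;
        pool.emplace_back([lo, hi, &refine] {
            for (int i = lo; i < hi; ++i) refine(i);   // one chunk per thread
        });
    }
    for (auto& th : pool) th.join();
}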
3.1.5 Handling large data sets
Many applications of k-nearest neighbor graph construction require large point clouds
to be handled that do not fit in memory. One way to handle this problem is to use disk-based data structures [86]. An alternative solution is simply increasing the swap space of
the operating system and running the same implementation as in internal memory. Many
operating systems allow on the fly creation and deletion of temporary swap files (Windows,
Linux), which can be used to run the knng algorithm on very large data sets (100 million
or more points). Experimentally, this algorithm was able to calculate k-nearest neighbor
graphs for very large data sets (up to 285 million points as seen in Table 3.1).
In Linux, new user space memory allocations (using new or malloc) of large sizes are
handled automatically using mmap which is indeed a fast way to do IO from disks. Once the
data is memory mapped to disk, both sorting and scanning preserve locality of access in the
algorithm and hence are not only cache friendly but also disk friendly. The last phase of
the knng algorithm is designed to be disk friendly as well. Once an answer is computed for
point pi by a single thread, the next point in the Morton order uses the locality of access
from the previous point and therefore causes very few page faults in practice.
Algorithm 4 KNN Graph Construction Algorithm
Require: Randomly shifted point set P of size n. Morton order compare operators <, > (COMPARE = <).
Ensure: A_i contains the k nearest neighbors of p_i in P.
 1: procedure Construct(P, int k)
 2:   P ← ParallelQSort(P, <)
 3:   parallel for all p_i in P
 4:     A_i ← nn_k(p_i, {p_{i−k}, ..., p_{i+k}})
 5:     if p_i^{⌈rad(A_i)⌉} < p_{i+k} then u ← i
 6:     else
 7:       I ← 1; while p_i^{⌈rad(A_i)⌉} < p_{i+2^I} do: ++I
 8:       u ← min(i + 2^I, n)
 9:     end if
10:     if p_i^{−⌈rad(A_i)⌉} > p_{i−k} then l ← i
11:     else
12:       I ← 1; while p_i^{−⌈rad(A_i)⌉} > p_{i−2^I} do: ++I
13:       l ← max(i − 2^I, 1)
14:     end if
15:     if l ≠ u then CSearch(p_i, l, u)
16: end procedure
17: procedure CSearch(point p_i, int l, int h)
18:   if (h − l) < ν then
19:     A_i ← nn_k(p_i, A_i ∪ {p_l, ..., p_h})
20:     return
21:   end if
22:   m ← (h + l)/2
23:   A_i ← nn_k(p_i, A_i ∪ {p_m})
24:   if d_ℓ(p_i, box(p_l, p_h)) ≥ rad(A_i) then return
25:   if p_i < p_m then
26:     CSearch(p_i, l, m − 1)
27:     if p_m < p_i^{⌈rad(A_i)⌉} then CSearch(p_i, m + 1, h)
28:   else
29:     CSearch(p_i, m + 1, h)
30:     if p_i^{−⌈rad(A_i)⌉} < p_m then CSearch(p_i, l, m − 1)
31:   end if
32: end procedure
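The Morton order comparisons required by Algorithm 4 never need the interleaved keys to be built explicitly. For integer coordinates they can be evaluated with the well known most-significant-differing-bit test sketched below; handling floating point inputs, as used by the knng(float) variant, is more involved and is not shown here.

#include <array>
#include <cstdint>

constexpr int DIM = 3;
using IntPoint = std::array<std::uint64_t, DIM>;

// True iff the most significant set bit of a is lower than that of b.
inline bool lessMsb(std::uint64_t a, std::uint64_t b)
{
    return a < b && a < (a ^ b);
}

// Compare two points in Morton (Z-) order: the dimension whose coordinates
// differ in the highest bit decides the order.
bool mortonLess(const IntPoint& p, const IntPoint& q)
{
    int dim = 0;
    std::uint64_t best = 0;
    for (int d = 0; d < DIM; ++d) {
        std::uint64_t diff = p[d] ^ q[d];
        if (lessMsb(best, diff)) { best = diff; dim = d; }
    }
    return p[dim] < q[dim];
}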
3.2
Analysis of the knng Algorithm
In this section, the knng algorithm is shown to have a running time of O(⌈n/p⌉ k log k) in
expectation, plus the time for one sort. In addition to the space required for storing the
input and output, the algorithm requires O(pk) extra space.
The running time above is only valid under certain assumptions about the input data.
Let P be a finite set of points in Rd such that |P | = n ≫ k ≥ 1. Let µ be a counting
measure on P . Let the measure of a box, µ(B) be defined as the number of points in B ∩ P .
P is said to have expansion constant γ if ∀p_i ∈ P and for all k ∈ (1, n):

\[ \mu\bigl(\mathrm{box}(p_i,\, 2 \times \mathrm{rad}(p_i, N_i^k))\bigr) \le \gamma k \]
In plain terms, this restriction states that doubling the size of a ball placed over points
from the input distribution should not increase the number of points within that ball by
greater than some constant factor. This is a similar restriction to the doubling metric
restriction on metric spaces [29, 51] and has been used before [51]. Throughout the analysis,
we will assume that P has an expansion constant γ =O(1).
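For intuition, the expansion constant of a small data set can be estimated directly by brute force. The sketch below uses balls rather than the quadtree boxes of the definition above, which is the analogous doubling condition, and it is only a sanity check; none of it is part of the knng algorithm itself.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt3 { double x[3]; };

double dist(const Pt3& a, const Pt3& b)
{
    double s = 0;
    for (int d = 0; d < 3; ++d) s += (a.x[d] - b.x[d]) * (a.x[d] - b.x[d]);
    return std::sqrt(s);
}

// O(n^2) estimate: for each point take its k-nearest-neighbor radius r,
// count the points within 2r, and keep the worst ratio to k.
double estimateGamma(const std::vector<Pt3>& pts, std::size_t k)
{
    double gamma = 0;
    std::vector<double> d(pts.size());
    for (std::size_t i = 0; i < pts.size(); ++i) {
        for (std::size_t j = 0; j < pts.size(); ++j) d[j] = dist(pts[i], pts[j]);
        std::vector<double> s = d;
        std::nth_element(s.begin(), s.begin() + k, s.end());
        double r = s[k];                         // k-NN radius (index 0 is the point itself)
        std::size_t inside = 0;
        for (double v : d) if (v <= 2 * r) ++inside;
        gamma = std::max(gamma, static_cast<double>(inside) / static_cast<double>(k));
    }
    return gamma;
}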
The first phase of the algorithm is sorting, which has well established runtime bounds.
The second phase is a linear scan. The dominating factor in the running time will be
from the third phase: the recursive search function. The running time of this phase will be
bounded by showing that the smallest quadtree box containing the actual nearest neighbor
ball for a point is, in expectation, only a constant factor smaller than the quadtree box
containing the approximate solution found in phase two. Given the distribution stated
above, this implies there are only O(k) additional points that need to be compared to refine
the approximation to the actual solution. The actual running time of the CSearch function
is upper bounded by the time it would take to simply scan the O(k) points.
To prove the running time of the algorithm, consider the solution to the following game:
In a room tiled or paved with equal square tiles (created using equidistant parallel lines
in the plane), a coin is thrown upwards. If the coin rests cleanly within a tile, the length
of the square tile is noted down and the game is over. Otherwise, the side length of the
square tiles in the room are doubled in size and the same coin is tossed again. This process
is repeated till the coin rests cleanly inside a square tile.
Note that in this problem, the square tiles come from quadtrees defined by Morton order,
and the coin is defined by the optimal k-nearest neighbor ball of pi ∈ P . The goal is to bound
the number of points inside the smallest quadtree box that contains box(p_i, rad(p_i, N_i^k)).
This leads to the following lemma:
Lemma 3.2.1. Let B be the smallest box, centered at p_i ∈ P, containing N_i^k and with side
length 2^h (where h is assumed w.l.o.g. to be an integer > 0), which is randomly placed in a
quadtree Q. If the event E_j is defined as B being contained in a quadtree box B_Q with side
length 2^{h+j}, and B_Q is the smallest such quadtree box, then

\[ P(E_j) \le \left(1-\frac{1}{2^{j}}\right)^{d} \frac{d^{\,j-1}}{2^{(j^2-j)/2}} \]
Figure 3.2: (a) B lands cleanly on the quadtree box B_Q twice its size. (b) B lands cleanly on a quadtree box 2^2 times its size. In both figures, if the upper left corner of B lies in the shaded area, the box B does not intersect the boundary of B_Q. Obviously, (a) happens with probability 1/4 (probability (1/2)^d in general dimension) and (b) happens with probability ((2^2 − 1)/2^2)^2 = 9/16 (probability ((2^2 − 1)/2^2)^d in general dimension).

Proof. From Figure 3.2, it can be inferred that in order for B to be contained in B_Q, the
total number of candidate boxes in which the upper left corner of B can lie is (2^j − 1) along
each dimension. The total number of candidate gray boxes is therefore (2^j − 1)^d. The
probability that the upper left corner lies in a particular gray box is 2^{hd}/2^{(h+j)d}. Thus the
probability that B is contained in B_Q is ((2^j − 1)/2^j)^d. If B_Q is the smallest quadtree box
housing B, then none of the quadtree boxes with side lengths 2^{h+1}, 2^{h+2}, ..., 2^{h+j−1} may
contain B. This probability is given by:
\[ \prod_{l=1}^{j-1}\left(1-\left(\frac{2^{l}-1}{2^{l}}\right)^{d}\right) = \prod_{l=1}^{j-1}\frac{(2^{l})^{d}-(2^{l}-1)^{d}}{(2^{l})^{d}} \]

The probability of B_Q being the smallest quadtree box containing B is therefore

\[ P(E_j) = \frac{(2^{j}-1)^{d}}{(2^{j})^{d}} \prod_{l=1}^{j-1}\frac{(2^{l})^{d}-(2^{l}-1)^{d}}{(2^{l})^{d}} \]
Now consider the following inequality: given v such that 0 < v < 1,

\[ (1-v)^{d} \le 1 - dv + d(d-1)\frac{v^2}{2!}, \]

which can be easily proved using induction or the alternating series estimation theorem, since the Taylor series gives

\[ (1-v)^{d} = 1 - dv + d(d-1)\frac{v^2}{2!} - d(d-1)(d-2)\frac{v^3}{3!} + \dots \]

Simplifying,

\[ (1-v)^{d} \le 1 - dv + \left(d^2-d\right)\frac{v^2}{2} = 1 - dv\left(1+\frac{v}{2}\right) + \frac{d^2 v^2}{2}. \]

Putting v = 1/2^l yields

\[ \left(1-\frac{1}{2^{l}}\right)^{d} \le 1 - \frac{d}{2^{l}}\left(1+\frac{1}{2^{l+1}}\right) + \frac{d^2}{2^{2l+1}}. \]
Then, by substituting and simplifying,

\[
\begin{aligned}
P(E_j) &\le \left(1-\frac{1}{2^{j}}\right)^{d} \prod_{l=1}^{j-1}\left[\frac{d}{2^{l}}\left(1+\frac{1}{2^{l+1}}\right) - \frac{d^2}{2^{2l+1}}\right]\\
&= \left(1-\frac{1}{2^{j}}\right)^{d} \prod_{l=1}^{j-1}\frac{d}{2^{l}}\left(1+\frac{1}{2^{l+1}}-\frac{d}{2^{l+1}}\right)\\
&\le \left(1-\frac{1}{2^{j}}\right)^{d} \prod_{l=1}^{j-1}\frac{d}{2^{l}}\\
&\le \left(1-\frac{1}{2^{j}}\right)^{d} \frac{d^{\,j-1}}{2^{(j^2-j)/2}}
\end{aligned}
\]
Lemma 3.2.2. The linear scan phase of the algorithm produces an approximate k-nearest
neighbor box B ′ centered at pi with radius at most the side length of BQ . Here BQ is the
smallest quadtree box containing B, the k-nearest neighbor box of pi .
Proof. The algorithm scans at least p_{i−k}, ..., p_{i+k}, and picks the top k nearest neighbors to p_i
among these candidates. Let a be the number of points between p_i and the largest Morton
order point in B. Similarly, let b be the number of points between p_i and the smallest
Morton order point in B. Clearly, a + b ≥ k. Note that B is contained inside B_Q, hence
µ(B_Q) ≥ k. Now, p_i, ..., p_{i+k} must contain at least a points inside B_Q. Similarly, p_{i−k}, ..., p_i must
contain at least b points from B_Q. Since at least k points from B_Q have been collected, the
radius of B′ is upper bounded by the side length of B_Q.
Lemma 3.2.3. The smallest quadtree box containing B′, denoted B′_Q, has only a constant number
of points more than k in expectation.

Figure 3.3: All boxes referred to in Lemma 3.2.3.

Proof. Let there be k′ points in B. Clearly k′ = O(k) given γ = O(1). The expected number
of points in B′_Q is at most E[γ^x k′], where x is such that if the side length of B is 2^h then
the side length of B′_Q is 2^{h+x}. Let the event E′_x be defined as this occurring for some fixed
value of x. Recall from Lemma 3.2.1 that j is such that the side length of B_Q is 2^{h+j}. The
probability of the event E_j is
\[ P(E_j) = \frac{(2^{j}-1)^{d}}{(2^{j})^{d}} \prod_{l=1}^{j-1}\frac{(2^{l})^{d}-(2^{l}-1)^{d}}{(2^{l})^{d}} \le \left(1-\frac{1}{2^{j}}\right)^{d} \frac{d^{\,j-1}}{2^{(j^2-j)/2}} \]
From Lemma 3.2.2, B′ has a side length of at most 2^{h+j+1} = 2^{h′}. Let E″_{j′} be the event
that, for some fixed h′, B′ is contained in B′_Q with side length 2^{h′+j′}. Note that E″_{j′} has the
same probability mass function as E_j, and that the value of j′ is independent of j. Given
this, \( P(E'_x) = \sum_{j=1}^{x-1} P(E_j)\,P(E''_{x-j}) \). From this, E[γ^x k′] follows:

\[
\begin{aligned}
E[\gamma^x k] &\le \sum_{x=2}^{\infty} \gamma^x k\, P(E'_x)\\
&= \sum_{x=2}^{\infty} \gamma^x k \sum_{j=1}^{x-1} P(E_j)\,P(E''_{x-j})\\
&\le \sum_{x=2}^{\infty} \gamma^x k \sum_{j=1}^{x-1} \left(1-\frac{1}{2^{j}}\right)^{d}\frac{d^{\,j-1}}{2^{(j^2-j)/2}}\left(1-\frac{1}{2^{x-j}}\right)^{d}\frac{d^{\,x-j-1}}{2^{((x-j)^2-(x-j))/2}}\\
&\le \sum_{x=2}^{\infty} \gamma^x k \sum_{j=1}^{x-1} \left(1-\frac{1}{2^{j}}\right)^{d}\left(1-\frac{1}{2^{x-j}}\right)^{d}\frac{d^{\,x-2}}{2^{(x^2-(1+2j)x+2j^2)/2}}
\end{aligned}
\]
Observe that, for all j ∈ {1, 2, ..., x − 1},

\[ \left(1-2^{-j}\right)^{d}\left(1-2^{j-x}\right)^{d} \le \left(1-2^{-x/2}\right)^{2d}, \]

which can be proved by showing

\[ \left(1-2^{-j}\right)\left(1-2^{j-x}\right) \le \left(1-2^{-x/2}\right)^{2}, \qquad\text{i.e.}\qquad 2^{-j}+2^{j-x} \ge 2^{-x/2+1}, \]

which is true because \( \frac{a+b}{2} \ge \sqrt{ab} \). Putting this simplified upper bound back in the expectation calculation yields:
\[
\begin{aligned}
E[\gamma^x k] &\le \sum_{x=2}^{\infty} \gamma^x k \left(1-\frac{1}{2^{x/2}}\right)^{2d} \sum_{j=1}^{x-1} \frac{d^{\,x-2}}{2^{(x^2-(1+2j)x+2j^2)/2}}\\
&\le k \sum_{x=2}^{\infty} \gamma^x \left(1-\frac{1}{2^{x/2}}\right)^{2d} d^{\,x-2} \sum_{j=1}^{x-1} 2^{-(x^2-(1+2j)x+2j^2)/2}\\
&\le k \sum_{x=2}^{\infty} \gamma^x \left(1-\frac{1}{2^{x/2}}\right)^{2d} d^{\,x-2}\, 2^{(x-x^2)/2} \sum_{j=1}^{x-1} 2^{\,j(x-j)}
\end{aligned}
\]
It is easy to show that j(x − j) ≤ x²/4 for all j ∈ {1, ..., x − 1}: let a′ = x/2 and b′ = j − x/2.
Then (a′ + b′)(a′ − b′) = a′² − b′², where the left-hand side is j(x − j) and the right-hand side is x²/4 − b′² ≤ x²/4. Hence
j(x − j) ≤ x²/4. Using the fact that 2^{j(x−j)} ≤ 2^{x²/4} in the expectation calculation yields:
\[
\begin{aligned}
&\le k \sum_{x=2}^{\infty} x\,\gamma^x \left(1-\frac{1}{2^{x/2}}\right)^{2d} d^{\,x-2}\, 2^{(x-x^2)/2}\, 2^{x^2/4}\\
&\le k \sum_{x=2}^{\infty} x\,\bigl(d\gamma\sqrt{2}\bigr)^{x} \left(1-\frac{1}{2^{x/2}}\right)^{2d} 2^{-x^2/4}
\end{aligned}
\]
Putting y = x/2 and c = 2(dγ)² yields:

\[ \le 2k \sum_{y=1}^{\infty} y\, c^{y}\, 2^{-y^2} \left(1-\frac{1}{2^{y}}\right)^{2d} \]
Using the Taylor approximation

\[ \left(1-\frac{1}{2^{y}}\right)^{2d} \le 1 - d\,2^{-y}\left(2+2^{-y}\right) + d^2\,2^{-2y+1}, \]

which, when substituted, gives

\[ \le 2k \int_{y=0}^{\infty} y\, c^{y}\, 2^{-y^2} \left(1 - d\,2^{-y}\left(2+2^{-y}\right) + d^2\,2^{-2y+1}\right) dy. \]
Integrating and simplifying, using the facts that the error function erf(x) encountered in
integrating the normal distribution satisfies erf(x) ≤ 1 and that c = 2(dγ)² = O(1), yields:

\[ \le k\left(\frac{\sqrt{\pi \ln 2}\; e^{(\ln c)^2/(4\ln 2)}}{\ln 2} + 1\right) = O(k) \]
Theorem 3.2.4. For a given point set P of size n, with a bounded expansion constant and
a fixed dimension, in a constrained CREW PRAM model, assuming p threads, the k-nearest
neighbor graph can be found in one comparison based sort plus O(⌈n/p⌉ k log k) expected time.

Proof. Once B′ is established in the first part of the algorithm, B′_Q can be found in O(log k)
time by using a binary search outward from p_i (this corresponds to lines 5 to 14 in Algorithm 4).
Once B′_Q is found, it takes at most another O(k) steps to report the solution.
There is an additional O(log k) cost for each point update to maintain a priority queue of
the k nearest neighbors of p_i. Since the results for each point are independent, the neighbors
for each point can be computed by an independent thread. Note that the algorithm reduces
the problem of computing the k-nearest neighbor graph to a sorting problem (which can
be solved optimally) when k = O(1), which is the case for many graphics and visualization
applications. Also, the expectation in the running time is independent of the input
distribution and is valid for arbitrary point clouds.
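The O(log k) per-candidate cost in the proof comes from maintaining the k current best neighbors in a bounded heap. A minimal sketch of such a structure is shown below; it illustrates the data structure being charged for and is not the dissertation's exact code.

#include <cstddef>
#include <limits>
#include <queue>
#include <utility>

// Bounded max-heap over squared distances: the root is the current k-th
// neighbor, i.e. the radius of the ball A_i. Each offer costs O(log k).
class KnnHeap {
public:
    explicit KnnHeap(std::size_t k) : k_(k) {}

    void offer(double distSq, std::size_t idx) {
        if (heap_.size() < k_) {
            heap_.push({distSq, idx});
        } else if (distSq < heap_.top().first) {
            heap_.pop();                     // evict the farthest of the current k
            heap_.push({distSq, idx});
        }
    }

    double radiusSq() const {
        return heap_.size() < k_ ? std::numeric_limits<double>::infinity()
                                 : heap_.top().first;
    }

private:
    std::size_t k_;
    std::priority_queue<std::pair<double, std::size_t>> heap_;
};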
That concludes the theoretical analysis of the knng algorithm. In the next section, the
algorithm’s running time will be tested experimentally.
3.3
Experimental Analysis of the knng Algorithm
The knng algorithm was tested on three different architecture setups, each detailed in
its own section below. The primary competition for k-nearest neighbor graph construction
is the ANN library for low dimensional nearest neighbor searching.
3.3.1
ANN
Mount and Arya introduced the ANN approximate nearest neighbor library, an implementation based on pruning kd-trees to find nearest neighbors. One further improvement they
made to the classic tree based query algorithm was the introduction of a priority queue
to the search. As the nearest neighbor ball descends in the tree, new hyper-rectangles are
inserted into a priority queue. Rectangles are then considered in order of their distance
from the current nearest neighbor ball. In this way, one hopes to consider the most relevant
rectangles first, and eliminate the need to consider others entirely.
The ANN library is coded in C++, and is highly optimized for both cache efficiency and
running time. It has long been used as the standard in low dimensional nearest neighbor
searching.
ANN [69] had to be modified to allow a fair comparison. Nearest neighbor graph construction using ANN is done in two phases. The preprocessing stage is the creation of a
kd-tree using the input data set. Then, a nearest neighbor query is made for each point
in the input data set. For these experiments, the source code was modified to allow multiple
threads to query the same data structure simultaneously. The kd-tree construction was not
modified to use a parallel algorithm. However, it is worth noting that even if a parallel kd-tree construction algorithm were implemented, it would almost certainly still be slower than
parallel sorting (the preprocessing step in the knng algorithm). In the interests of a fair
comparison, the empirical results section includes several examples of k-nearest neighbor
graph construction where only one thread was used (Figures 3.6, 3.11, 3.14).
Table 3.1: Construction times for k = 1-nearest neighbor graphs constructed on
non-random 3-dimensional data sets. Each graph was constructed using 8 threads
on Intel architecture. All timings are in seconds.
Dataset        Size (points)    ANN (s)    knng(long) (s)    knng(float) (s)
Screw                 27152        .06               .04                .06
Dinosaur              56194        .11               .07                .11
Ball                 137602        .31               .14                .21
Isis                 187644        .46               .18                .27
Blade                861240        2.9               .86                1.3
Awakening           2057930        8.6               2.1                3.2
David               3614098       16.7               3.7                5.6
Night              11050083       62.2              12.4               18.6
Atlas             182786009          -              1564               2275
Tallahassee       285000000          -              2789               4235
3.3.2
Experimental Data
Except where noted, random data sets were generated from 3-dimensional points uniformly distributed between (0, 1], stored as 32 bit floats. Graphs using random data sets
were generated using multiple sets of data and averaging the results. Results from random
data sets with different distributions (such as Gaussian, clustered Gaussian, and spherical)
were not significantly different from the uniform distribution. Also included were several
non-random data sets, consisting of surface scans of objects. In all graphs, the knng algorithm will be labeled ‘knng(float)’. The label ‘knng(long)’ means that the data was scaled
to a 64 bit integer grid and stored as a 64 bit integer. This improves the running time of
the algorithm dramatically, and can be done without loss of precision in most applications.
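A minimal sketch of such a scaling is shown below; the 2^31 grid resolution used here is an illustrative choice rather than the exact parameter used in the experiments.

#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Map coordinates in (0, 1] onto a uniform integer grid so that the Morton
// comparisons can work on integers directly.
std::vector<std::array<std::uint64_t, 3>>
toIntegerGrid(const std::vector<std::array<float, 3>>& pts)
{
    constexpr double kScale = static_cast<double>(1ull << 31);
    std::vector<std::array<std::uint64_t, 3>> out(pts.size());
    for (std::size_t i = 0; i < pts.size(); ++i)
        for (int d = 0; d < 3; ++d)
            out[i][d] = static_cast<std::uint64_t>(pts[i][d] * kScale);
    return out;
}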
3.3.3
Intel Architecture
This experiment was conducted on a machine equipped with dual Quad-core 2.66GHz
Intel Xeon CPUs, and a total of 4 GB of DDR memory. Each core has 2 MB of total cache.
SUSE Linux with kernel 2.6.22.17-0.1-default was running on the system. The compiler
used was gcc version 4.3.2 for compilation of all code (with -O3).
Construction Time Results. As shown in Figure 3.4 and Table 3.1, the knng algorithm performs very favorably against k-nearest neighbor graph construction using ANN.
Table 3.1 shows timings of k-nearest neighbor graph construction on very large point sets,
where ANN was unable to complete a graph due to memory issues. Construction times
improve dramatically when floating points are scaled to an integer grid. Other random
distributions had similar construction times to these cases, and so more graphs were not
included. Figure 3.5 shows that as k increases, the advantage runs increasingly toward
the knng implementation. Finally, Figure 3.6 shows the speedup gained by increasing the
number of threads. In these and all other graphs shown, standard deviation was very small;
less than 2% of the mean value.
Figure 3.4: Graph of 1-NN graph construction time vs. number of data points on
the Intel architecture. Each algorithm was run using 8 threads in parallel.
Figure 3.5: Graph of k-NN graph construction time for varying k on the Intel architecture. Each algorithm was run using 8 threads in parallel. Data sets contained
one million points.
Figure 3.6: Graph of 1-NN graph construction time for varying number of threads
on Intel architecture. Data sets contained ten million points.
Figure 3.7: Graph of memory usage per point vs. data size on Intel architecture.
Memory usage was determined using valgrind.
Figure 3.8: Graph of cache misses vs. data set size on Intel architecture. All
data sets were uniformly random 3-dimensional data sets. Cache misses were
determined using valgrind which simulated a 2 MB L1 cache.
3.3.4
AMD Architecture
This machine was equipped with 8 dual-core 2.6GHz AMD Opteron 885 processors,
for a total of 16 cores. Each processor had 128 KB L1 Cache, 2048 KB L2 cache and shared
a total of 64 GB of memory. Compiler gcc version 4.3.2 was used for compilation of all
code (with -O3).
As can be seen in Figures 3.9, 3.10 and 3.11, the knng algorithm performs well despite
the change in architecture. ANN fared particularly poorly on this architecture.
Figure 3.9: Graph of 1-NN graph construction time vs. number of data points on
AMD architecture. Each algorithm was run using 16 threads in parallel.
Figure 3.10: Graph of k-NN graph construction time for varying k on AMD architecture. Each algorithm was run using 16 threads in parallel. Data sets contained
ten million points.
Figure 3.11: Graph of 1-NN graph construction time for varying number of threads
on AMD architecture. Data sets contained ten million points.
3.3.5
Sun Architecture
This machine is a Sun T5120 server with an eight-core T2 OpenSparc processor and 32
GB of memory. Overall, this was a slower machine compared to the others that were used;
however, it was capable of running 64 threads simultaneously. The compiler gcc version
4.3.2 was used for compilation of all code (with -O3).
As can be seen in Figure 3.12, results for construction were similar to the previous
experiments. One unexpected result was ANN’s performance as k increased, as seen in
Figure 3.13. Since ANN was developed on a Sun platform, the improvements seen as k
increases may be due to platform specific tuning. In Figure 3.14, it can be observed how
both algorithms behave with a large number of threads. Both ANN and the two versions
of knng level out as the number of threads increases (processing power is limited to eight
cores).
Figure 3.12: Graph of 1-NN graph construction time vs. number of data points
on Sun architecture. Each algorithm was run using 128 threads in parallel.
Figure 3.13: Graph of k-NN graph construction time for varying k on Sun architecture. Each algorithm was run using 128 threads in parallel. Data sets contained
ten million points.
Figure 3.14: Graph of 1-NN graph construction time for varying number of threads
on Sun architecture. Data sets contained ten million points.
CHAPTER 4
PLANAR NEAREST NEIGHBOR SEARCH
Planar nearest neighbor search seeks to answer near neighbor queries on data restricted
to a single two dimensional subspace. This chapter will explore this problem by discussing
current solutions and presenting a new algorithm, called DelaunayNN, which can outperform
them. This will be followed by a theoretical analysis that makes stronger guarantees than
some competing methods. Finally, extensive experimental results will be presented.
Linear space, O(n log n) pre-processing and O(log n) query time algorithms for low dimensional nearest neighbor searching are known, such as the Dobkin-Kirkpatrick hierarchy,
but as discussed in chapter 2, these tend to have large constants [52, 34] that make them
impractical.
One of the most practical algorithms available for this problem with provable guarantees
is due to Devillers [36, 19]. This approach uses the idea of building a hierarchy out of a subset
of the incremental stages of Delaunay graph construction, then using local information to
quickly locate nearest neighbors.
Recently, Birn et al. [18] have announced a very practical algorithm for this problem
which beats the state of the art [69, 36]. The full Delaunay hierarchy nearest neighbor search
(FDHNN) includes edges from every incremental stage of Delaunay graph construction.
This allows for fast answers to nearest neighbor queries, but gives up some theoretical
guarantees.
One interesting feature that is used by both Delaunay hierarchy algorithms is the fact
that compass routing is supported by Delaunay triangulations [54, 18]. It was shown by
Kranakis et al. [54] that if one wants to travel between two vertices s and t of a Delaunay
triangulation, one can only use local information on the current node and the coordinates of
t to reach t, starting from s. The routing algorithm is simple: the next vertex visited is the
one whose distance to t is minimum amongst the vertices connected to the current vertex.
This type of local greedy routing has also been studied as the ‘small-world phenomenon’ or
the ‘six degrees of separation’ [65].
As mentioned in chapter 3, the ANN algorithm is very effective for low dimensional
nearest neighbor search. In theory, however, its runtime is based on approximate nearest
neighbors. For exact nearest neighbor searching it cannot offer runtime bounds better than
O(n).
Notation. In this chapter, points are denoted by lower-case Roman letters. i, j, k are
reserved for indexing purposes. dℓ (p, q) denotes the distance between the points p and q
in L2 metric. P is reserved to refer to a point set. n is reserved to refer to the number of
points in P .
p < q iff p precedes q in Morton order (> is used similarly). ps is used to denote the
shifted point p + (s, s, . . . , s). P s = {ps |p ∈ P }. pi is the i-th point in the sorted Morton
ordering of the point set.
Upper-case Roman letters are reserved for sets. Scalars are represented by lower-case
Greek letters. Ball(p, q) represents the diametral ball of p and q. Ball(p, ρ) would identify a
ball centered at p with diameter ρ.
For point q, let Nqk be the k points in P , closest to q. |.| denotes the cardinality of a
set or the number of points of P inside the geometric entity enclosed within | |. Let nn(p, ·)
return the nearest neighbor of p in a given set. Finally, rad(p, ·) returns the distance from
point p to the farthest point in the set.
4.1
The Algorithm
Algorithm 5 describes nearest neighbor pre-processing. The pre-processing algorithm
essentially splits the input point set, P , into three layers. First the Delaunay triangulation,
G, of P is computed and a maximal independent set M is computed for this graph. P ′ is
the set P \ M . Layer 1 is constructed from points P ′ by sorting them in Morton order.
Layer 2 is comprised of the Delaunay triangulation of points P ′ . Finally, Layer 3 contains
M, along with edges from the original Delaunay graph of P, connecting the points in Layer
2 to the maximal independent set of points (see Figure 4.1).
Figure 4.1: The three layers of the query algorithm. (a) The first layer consists of the non-maximal-independent-set vertices sorted in Morton order; queries are processed using a binary search to find an approximate nearest neighbor ball. (b) The second layer consists of the points in the first layer and the edges of their Delaunay triangulation; queries are processed by compass routing, starting at the nearest point found in the previous layer and ending at the nearest neighbor in this layer. (c) The final layer consists of the edges that connect the points in the second layer to the points in the maximal independent set; here the nearest neighbor found in the previous step is refined by scanning the points adjacent to it in this graph, which gives the final answer.
Pre-processing is complete unless there is a point p in Layer 2 or 3 with large degree (Ω(1)). In these cases, the vertices of the Voronoi cell of the point p are computed,
along with the rays emanating from p and going through these vertices, partitioning the
space around p into sectors. These sectors are sorted in clockwise order to facilitate fast
searching. These Voronoi rays and the points located in the cell are stored in Layer 2 or
3 (see Figure 4.2).
Figure 4.2: Here we see the center vertex p, and the sectors defined by rays passing
through the Voronoi vertices. To find a nearer neighbor, we locate which sector
the query lies in (via a binary search), then check the distance to the adjacent
point that lies in that sector.
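A minimal sketch of this sector lookup is given below. Storing the rays as sorted angles around p is one convenient representation; the structure and names here are illustrative assumptions rather than the actual implementation.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct P2 { double x, y; };

// Pre-processed high-degree vertex: the rays through its Voronoi vertices,
// stored as sorted angles, and the adjacent point lying in each sector.
struct SectorIndex {
    P2 center;                        // the high-degree vertex p
    std::vector<double> rayAngle;     // sorted, in [0, 2*pi)
    std::vector<P2> sectorNeighbor;   // sectorNeighbor[i] lies between rays i and i+1
};

// Locate the sector containing q with a binary search over the ray angles and
// return the adjacent point in that sector (a candidate nearer neighbor).
P2 sectorPoint(const SectorIndex& s, const P2& q)
{
    const double twoPi = 2.0 * std::acos(-1.0);
    double a = std::atan2(q.y - s.center.y, q.x - s.center.x);
    if (a < 0) a += twoPi;
    auto it = std::upper_bound(s.rayAngle.begin(), s.rayAngle.end(), a);
    std::size_t i = (it == s.rayAngle.begin())
                        ? s.rayAngle.size() - 1                       // wrap-around sector
                        : static_cast<std::size_t>(it - s.rayAngle.begin()) - 1;
    return s.sectorNeighbor[i % s.sectorNeighbor.size()];
}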
Algorithm 5 DelaunayNN Preprocessing
Require: Randomly shifted point set P of size n. Morton order compare operator <.
 1: procedure PreProcess(P)
 2:   G = (P, E) ← Delaunay triangulation of P
 3:   M ← maximal independent set of G
 4:   P′ = P \ M
 5:   P′ ← Sort(P′, <)
 6:   G′ = (P′, E′) ← Delaunay triangulation of P′
 7:   for all p ∈ P′ do:
 8:     H(p) ← {q | e = (p, q) ∈ G where q ∈ P \ P′}
 9:   for all F in {G′, H}
10:     for all v ∈ F with degree(v) = Ω(1) do:
11:       Pre-process VoronoiCell(v, F) for fast lookups and jumps.
12: end procedure
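The maximal independent set in line 3 of Algorithm 5 can be computed with a simple greedy pass over the Delaunay graph, as sketched below under an assumed adjacency-list representation; any maximal (not maximum) independent set suffices.

#include <cstddef>
#include <vector>

// Greedy maximal independent set: take a vertex whenever none of its already
// chosen neighbors excludes it, then exclude its neighbors. Runs in O(n + m).
std::vector<std::size_t>
maximalIndependentSet(const std::vector<std::vector<std::size_t>>& adj)
{
    std::vector<char> excluded(adj.size(), 0);
    std::vector<std::size_t> mis;
    for (std::size_t v = 0; v < adj.size(); ++v) {
        if (excluded[v]) continue;
        mis.push_back(v);
        for (std::size_t u : adj[v]) excluded[u] = 1;   // neighbors may not join
    }
    return mis;
}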
The query algorithm (Algorithm 6) starts by locating the query point q in the Morton
ordering of points in Layer 1 (P ′ ) via a binary search. It then scans η =O(1) points around
the location and finds the closest point p′i to q among those points. CompassRouting is
used to find the nearest neighbor of q in Layer 2 starting from p′i . Let this point in Layer 2
be called p′j ∈ P ′ . The nearest neighbor of q in Layer 3 is found by traversing the connection
edge, and again checking the local neighborhood. The answer is then returned.
Algorithm 6 DelaunayNN Query
 1: procedure CompassRouting(point v, point q, Graph G = (P, E))
 2:   Require: v ∈ G.
 3:   repeat
 4:     if degree(v) = Ω(1) then
 5:       if q ∈ VoronoiCell(v, G) then return v
 6:       else: update v using preprocessed VoronoiCell(v, G).
 7:     else:
 8:       for all v′ ∈ G incident on v do:
 9:         if Dist(v′, q) < Dist(v, q) then:
10:           v ← v′; break
11:   until no improvement found
12: end procedure
13: procedure Query(point q)
14:   i′ ← BinarySearch(P′, q, <)
15:   p′_i ← nn(q, {p_{i′−η}, ..., p_{i′+η}}) where η = O(1)
16:   p′_j ← CompassRouting(p′_i, q, G′)
17:   return nn(p′_j, H(p′_j))   // uses preprocessed VoronoiCell(p′_j, H) if |H(p′_j)| = Ω(1)
18: end procedure
The CompassRouting algorithm is simple. Assuming there are no high degree vertices
in Layer 2, it starts with a point v and searches for one closer point to q that is incident
on v. If there are no such vertices, it proceeds to layer 3, otherwise it jumps to the found
point and repeats the process. In case it hits a point p that has large degree, it has access
to the Voronoi rays emanating from p. It first locates q in the ray system of p in O(log n)
time using a binary search and then tests whether q lies in the Voronoi cell of p. If this is
the case, p is returned as the answer for layer 2, and the search will continue in layer 3.
Otherwise the point opposite to p in the sector containing q is nearer to q compared to p.
In this case, the current position is updated to the opposite point, and routing continues.
Once in layer 3 the algorithm again looks at all vertices incident on the current answer,
p′j . In this case, the edges are from the original Dealaunay graph of the undivided point
set, which connect the vertices in layer 2 to the maximal independent set vertices in layer
3. In case of a vertex in Layer 3 with large degree, the preprocessed Voronoi cell of p′j is
used in the same fashion as in Layer 2. In this case, if q does not lie in the Voronoi cell of
p′j , then the point opposite to p′j in the sector containing q is the nearest neighbor of q, as
opposed to just being a nearer neighbor.
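For the common low-degree case, the routing loop is only a few lines; a minimal sketch over an adjacency-list graph is shown below (the high-degree branch that uses the preprocessed Voronoi sectors is omitted).

#include <cstddef>
#include <vector>

struct Q2 { double x, y; };

inline double dist2(const Q2& a, const Q2& b)
{
    return (a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y);
}

// Compass routing: repeatedly jump to any neighbor strictly closer to the
// query until no neighbor improves the distance; the vertex reached is the
// nearest vertex to q in this graph layer.
std::size_t compassRoute(const std::vector<Q2>& pts,
                         const std::vector<std::vector<std::size_t>>& adj,
                         std::size_t start, const Q2& q)
{
    std::size_t v = start;
    bool improved = true;
    while (improved) {
        improved = false;
        for (std::size_t u : adj[v]) {
            if (dist2(pts[u], q) < dist2(pts[v], q)) {
                v = u;
                improved = true;
                break;
            }
        }
    }
    return v;
}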
It should be noted that there are two degenerate cases when considering the Voronoi
cell of p, both of which can be resolved in constant time. The first case occurs when a
sector contains an open face of the Voronoi cell (Figure 4.3(a)). This sector will contain two
adjacent vertices, not one, and if q lies in such a sector, we simply compute the distance to
both.
The second degenerate case occurs when an edge of the Voronoi cell of p has length 0
(Figure 4.3(b)), implying that some or all of the points bordering p are co-circular with p.
In this case, in pre-processing, the points immediately clockwise and counter-clockwise to p
on the circle are identified. If q is determined to lie in a sector bordering an edge with 0
length, one of those two found points must be nearer to the actual nearest neighbor than p
(assuming p is not the answer).
Figure 4.3: Two degenerate cases for linear degree vertices. (a) In the first degenerate case, the center vertex has an open Voronoi cell. If the query point lies in this sector, we check the distance to the two points in the open sector (v1, v2). (b) In the second degenerate case, the center vertex is co-circular with several adjacent vertices, and there is only one Voronoi vertex. In this case, we always check the nearest co-circular points in the clockwise and counter-clockwise directions (v1, v2) for a nearer neighbor, and ignore the other co-circular points.
4.2
Analysis of DelaunayNN
In terms of analysis, the first goal will be to prove the correctness of the DelaunayNN algorithm. Begin by defining Compass Routing formally (this definition is slightly different
from that in [54]): Given a geometric graph G = (P, E), an initial vertex s ∈ G and a destination q
(may not be in the graph), let vi be the closest vertex in G to q. The goal is to travel from
s to vi , when the only information available at any point in time is the coordinates of q,
the current position, and the edges incident on the current vertex. Starting at s, traverse
the edge (s, s′ ) ∈ E incident on s that leads closest to q. We assign s = s′ and repeat this
procedure till no movement will decrease the distance to q.
Lemma 4.2.1, proves a simple property of Compass Routing on Delaunay Graphs in
d-dimensions.
Lemma 4.2.1. Let P ⊂ Rd and G = (P, E) be the graph output from its Delaunay triangulation. Let q be a query point for which to compute the nearest neighbor in P . Compass
routing on G yields the nearest neighbor of q in P.
Figure 4.4: Proof of Lemma 4.2.1.
Proof. Let the compass routing begin with a vertex v0 ∈ P . Let vi be the vertex on which
compass routing stops and can not improve the distance to q. Let Nbr(vi ) be the set of all
vertices having an edge with vi in G.
This implies that Ball(q, Dist(q, vi )) is empty of vertices in Nbr(vi ). For a contradiction,
let v∗ ≠ vi in P be the nearest neighbor of q. Then v∗ ∈ Ball(q, Dist(q, vi )) and there is no
edge between v ∗ and vi in G.
Now draw a ball with vi and v ∗ on its boundary such that it lies inside Ball(q, Dist(q, vi )).
If this ball is empty, then v ∗ ∈ Nbr(vi ) which is a contradiction of the Delaunay property of
the graph. Otherwise, shrink the ball, keeping it hinged on vi and inside Ball(q, Dist(q, vi )),
till it contains only one point vj ∈ P . This again is a contradiction since vj is closer to q than
vi and (vi , vj ) ∈ G (Compass routing should not have terminated at vi ). See Figure 4.4.
Next, it is necessary to show the Query function in Algorithm 6 returns the correct
answer.
Lemma 4.2.2. The Query function in Algorithm 6 returns the nearest neighbor of q in
P.
Proof. Due to the correctness of compass routing, the need here is to show that the correctness of
the query algorithm is not affected by separating P into three layers. Lemma 4.2.1 ensures
that the nearest neighbor in Layer 2 is found. Let this neighbor be p′j . Ball(q, Dist(q, p′j )) is
empty of points in Layer 2. If Ball(q, Dist(q, p′j )) is empty, then p′j will be returned as the
nearest neighbor of q correctly by the Query function. Otherwise, |Ball(q, Dist(q, p′j ))| = 1;
if there were more than one point in this ball, there would exist a Delaunay edge between
two of these points, contradicting the fact that they are a maximal independent set in the
Delaunay triangulation of P . Let v ∗ in Layer 3 be inside Ball(q, Dist(q, p′j )) in this case. Then
one can draw an empty ball passing through p′j and v ∗ , keeping it inside Ball(q, Dist(q, p′j )).
This means there must be a Delaunay edge connecting p′j and v ∗ implying that v ∗ ∈ H(p′j )
and hence the Query function must return v ∗ correctly.
In order to prove the running time of the algorithm is O(log n), some simple assumptions
must be made about the input data. This is the same restriction described in chapter 3.2.
Let P be a finite set of points in Rd such that |P | = n. Let µ be a counting measure on P .
Let the measure of a ball, µ(Ball(c, r)) be defined as the number of points in Ball(c, r) ∩ P .
A point q is said to have expansion constant γ if for all k ∈ (1, n):
µ(Ball(q, 2 × rad(q, Nqk ))) ≤ γk
Throughout the analysis, assume that the query point q has an expansion constant
γ =O(1). Note that for finding exact nearest neighbors in O(log n) time, the queries with
high γ are precisely the queries which drive provable (1 + ǫ)-approximate nearest neighbor
data structures to spend more time in computing the solution when ǫ is close to zero [69].
The following observation bounds the running time of our query:
Lemma 4.2.3. In O(log n) time, Ball(q, r) can be computed such that, in expectation,
|Ball(q, r)| =O(1).
Proof. This follows directly from Lemma 3.2.3, which shows the nearest neighbor to q
chosen from O(1) points in P adjacent to q in Morton order is contained in a box that has,
in expectation, only a constant factor more points than the box containing nn(q, P ).
Finally, we can use the above observation to find the actual bound:
Theorem 4.2.4. Given q, with expansion constant γ =O(1), nn(q,P) can be found in
O(log n) time in expectation.
Proof. Given that P is sorted in Morton order, a binary search for q obviously takes only
O(log n) time. This yields a ball to be refined with only expected O(1) vertices of the
Delaunay triangulation of P . Compass routing can therefore find a path containing only
O(1) vertices. Given that any vertex can be processed in O(log n) time to find a nearer
neighbor by using the Voronoi cell, nn(q, P ) can be found in O(log n) time in expectation.
Note that splitting P into layers does not increase the running time because the number
of points visited is still expected O(1) (by Lemma 4.2.3).
The construction time of the algorithm is O(n log n), bounded by the sorting of the
input set in Morton order, as well as constructing the Delaunay graph and Voronoi graph,
all of which have O(n log n) running times. The maximal independent set is found in O(n)
time.
4.3
Experimental Analysis
In this section the DelaunayNN algorithm will be tested, in practice, against two other
nearest neighbor algorithms. The first was ANN, the kd-tree nearest neighbor implementation from David Mount [69]. The second was an implementation of the full Delaunay
hierarchy (FDH) algorithm presented by Birn et al. [18].
Experiments were conducted on a machine with dual 2.66 GHz Quad-core Intel Xeon
CPUs, using a total of 4 GB DDR memory. Each core had 2 MB of total cache. The
operating system used was SUSE Linux version 11.2, kernel 2.6.31.8-0.1. All source code
was compiled using g++ version 4.4.1, with -O3 enabled.
DelaunayNN was written using C++. It used the Triangle library by Jonathan Shewchuk [92]
to construct the Delaunay triangulation in pre-processing. It should also be noted that the
maximum degree of any vertex for the majority of tested data sets was 64, which was small
enough that the Voronoi pre-processing of points was not needed for most of the experiments;
the lower bound Ω(1) was replaced by 64. The final distribution was specifically
designed to check the case where large degree vertices become a factor, and did use the
Voronoi cell pre-processing.
Next, the FDH algorithm will be described in detail. For detail on the ANN algorithm,
please refer to chapter 3.3.1.
4.3.1
FDH Algorithm
FDH answers nearest neighbor queries using compass routing on a full Delaunay hierarchy. Construction is done using an incremental Delaunay graph construction algorithm.
Points are inserted, and their index is stored along with all edges incident on them in the
graph. As new points are added, edge lists are updated for every previous point. This
process yields a graph with a superset of Delaunay edges. To answer a query, the algorithm
begins at the 0 index point, and uses compass routing to walk the graph. Due to the layout,
the algorithm only considers adjacent points with a higher index than the current location.
FDH was implemented using C++. It used the CGAL library [19] to construct the Delaunay
hierarchy in pre-processing.
In both DelaunayNN and FDH, exact predicates were used to construct the Delaunay
graphs. To keep a fair comparison with ANN, however, both used inexact floating point
arithmetic when computing distances for queries. In all experimental cases this had no
impact on the correctness of the solution. For both DelaunayNN and FDH, points were
stored along with edges of the graph in order to take advantage of spatial locality in the
cache, at the cost of some storage efficiency. In all experiments, the nearest neighbor to the
query point was found exactly (ANN used ǫ = 0).
4.3.2
Data Distributions
For comparison purposes, point distributions for the experiments were chosen to be the
same as those used by Birn et al. [18, 36]. To recap, the following distributions were used:
1. Data points chosen uniformly at random from the unit square. Query points chosen
uniformly at random from a square 5% larger than the unit square.
2. Data points chosen uniformly at random from the border of the unit circle. Query
points chosen uniformly at random from the smallest containing square around the
unit circle.
3. Data points chosen with 95% from the unit circle, 5% from the smallest square containing the unit circle. Query points chosen at random from the unit circle.
4. Data points chosen with x = [−1000, 1000] and y = x2 . Query points chosen uniformly
at random from the rectangle containing the parabola.
5. Data points chosen from the border of several non-overlapping circles. The centers
of these circles are included as well. Query points are chosen from the bounding box
containing all circles.
For each experiment, point sets were created ranging in size from one million to 128
million points. 100,000 queries were used in each experiment. To account for randomness
in the algorithms and the system, each experiment was run multiple times (with unique
data and query sets for each), and the results were averaged. The next section describes
the results and shows graphs of the run time. Note that all graphs use a base 2 logarithmic
scale.
4.3.3
Experimental Results
As shown in Figures 4.5 - 4.8, the DelaunayNN algorithm behaves very well in practice
on point sets from various distributions. For data sets of sufficient size, the DelaunayNN
implementation proves faster than FDH in all cases, and faster than ANN in almost all
cases.
Figure 4.5 shows the results for uniform distribution, which most closely follows the
bounded expansion constant considered in the analysis. All implementations performed
very well, with low average query times even for very large data sets. Overall, the increase
in average query time was significantly less than log n for all three implementations.
In Figure 4.6 we see that for data sets where points are distributed on a circle, the
DelaunayNN algorithm displays timing results that are very similar to its performance on uniformly
distributed data, whereas both ANN and FDH perform substantially worse than on uniform
data. This trend continues in Figure 4.8, with the exception of ANN’s performance, which
is closer to its performance on uniform data.
One anomalous case is documented in Figure 4.7. In this case, ANN had a marked edge
in performance for larger point sets over DelaunayNN and FDH. It is also worth noting
that for this type of distribution, all implementations had significantly worse scaling than
on other distributions.
The final test case was designed to see how linear degree vertices would impact the
running time of the algorithm. In this instance, data points were chosen from the borders
of a small number (50) of non-overlapping circles. In addition, the centers of these circles
were included as data points. In the Delaunay graph, these centers become vertices
of linear degree, as they will form a cell with every point on the edge of the circle. Finally,
a percentage of data points were included from outside each circle. The results of this test
case (Figure 4.9) show that using the Voronoi cell search makes the difference between a
faster query time than ANN and a slower one. It should also be noted that this type of
distribution severely impacted the performance of the full Delaunay hierarchy algorithm.
Figure 4.10 shows the difference in pre-processing time for the various implementations
on uniform data. While ANN maintains a distinct advantage, DelaunayNN scales much
better as the data set size increases. It is also clear that using a divide and conquer approach
allows for Delaunay triangulation with much more reasonable construction times, whereas
FDH is forced to use the practically less efficient, incremental construction.
Figure 4.5: Showing average time per query versus data set size for points taken
from uniform distribution.
4.4
3-Dimensional Experiments
There are several factors that, in theory, limit the DelaunayNN algorithm to nearest
neighbor searches in two dimensions. First, it is well known that Delaunay graphs in three
dimensions can contain a quadratic number of edges. Second, the approach to dealing with
high degree vertices by storing the Voronoi cells can’t work in three dimensions; the regions
can no longer be stored in a manner that allows us to perform a binary search to locate
the next closer vertex. These limitations mean that the expected running time is linear, as
opposed to logarithmic.
There are some indications, however, that this approach may be serviceable in practice
for data sets in three dimensions. It has been shown that while in theory Delaunay graphs
are quadratic in three dimensions, in practice the configurations where this occurs are very
specific and contrived [42]. In addition, data sets that follow a random distribution do
not, in general, contain vertices of linear degree. In this next set of experiments, the timing
results from using the DelaunayNN algorithm on three dimensional data are compared to those of
ANN. Data was generated from the same distributions as in the previous section (in three
dimensions instead of two).
Results from the FDH algorithm are not included in the three dimensional experiment.
The hierarchy built by this algorithm becomes very large in three dimensions, even for these
well behaved distributions, and so the query times do not remain competitive. It can be
seen in Figure 4.11 that the DelaunayNN algorithm performs quite well in practice when
compared to ANN, for uniform data sets in three dimensions. Similar results to the two
dimensional case are observed for both of the circular distributions (Figures 4.12 - 4.13) as
well. Results for the parabolic distribution are slightly better than in the two dimensional
case. ANN still surpasses the Delaunay based approach for large data sets, although in this
case the size of the data set is larger at the crossing point than previously. Two final cases
show the pitfalls of this approach in three dimensional cases. One test case had several
spheres, with centers included as data points. Similar to the two dimensional case, this
creates a Delaunay graph with several vertices of linear degree. In two dimensions, the
Voronoi cell approach kept these additional points from having too large an impact on the
average query time. In the three dimensional case, however, having to do a linear scan of
the adjacent vertices allowed ANN to surpass the Delaunay algorithm (Figure 4.15). The
last test case contained data points sampled from two orthogonal lines. In this case, the
Delaunay graph is truly quadratic. Results from this case are not shown, as the running
time for the DelaunayNN algorithm was too large to sufficiently test. Pre-processing time
for the three dimensional case was an issue. Delaunay graphs for three dimensional data
were much slower to construct than the corresponding kd-trees. This was partly due to
the fact that divide and conquer construction for Delaunay graphs is not available for three
dimensions. In light of this, using a Delaunay graph to do nearest neighbor searches in
three dimensions may only be advisable when construction time is not a factor, or if the
number of queries is large enough to overcome the difference in timing.
Figure 4.6: Showing average time per query versus data set size for points taken
from a unit circle. Query points were taken from the smallest square enclosing the
circle.
Figure 4.7: Showing average query time versus data set size for points taken from
the unit circle, plus some points chosen uniformly at random from the square
containing the circle. Query points taken from the circle.
Figure 4.8: Showing average time per query versus data set size for points taken
from a parabola. Query points were taken from the smallest rectangle enclosing
the parabola.
Figure 4.9: Showing average time per query versus data set size for points taken
from a distribution with linear degree vertices. Note that ANN is faster than the
Delaunay algorithm run without the Voronoi cell search capability.
Figure 4.10: Showing average time per point to pre-process data sets for queries.
Data was taken uniformly at random from the unit square.
Figure 4.11: Showing average time per query versus data set size for points taken
from uniform distribution in three dimensions.
Figure 4.12: Showing average time per query versus data set size for points taken
from the circular distribution in three dimensions.
Figure 4.13: Showing average time per query versus data set size for points taken
from the “fuzzy“ circular distribution in three dimensions.
Figure 4.14: Showing average time per query versus data set size for points taken
from the parabolic distribution in three dimensions.
Figure 4.15: Showing average time per query versus data set size for points taken
from a distribution with linear degree vertices. In this case, ANN wins due to the
necessity of processing the linear degree vertices by sequential scan.
CHAPTER 5
GEOMETRIC MINIMUM SPANNING TREES
In this chapter, a practical deterministic algorithm to solve the problem of constructing
geometric minimum spanning trees is presented. Called GeoFilterKruskal, the algorithm
efficiently generates the graphs in a manner that easily lends itself to parallelization. Prior
to discussing the actual algorithm, a brief history of other approaches to solving this problem
will be presented.
It is well established that the GMST is a subset of edges in the Delaunay triangulation
of a point set [82]. It is well known that this method is inefficient for any dimension
d > 2. It was shown by Agarwal et al. [2] that the GMST problem is related to solving
bichromatic closest pairs for some subsets of the input set (for an explanation of the BCCP
problem, see chapter 2.5).
Callahan [22] used well separated pair decomposition and bichromatic closest pairs to
solve the same problem in O(Td (n, n) log n), where Td (n, n) is the time required to solve the
bichromatic closest pairs problem for n red and n green points. It is known that bichromatic
closest pair is probably harder than computing the GMST [41].
Clarkson [30] gave an algorithm that is particularly efficient for points that are independently and uniformly distributed in a unit d-cube. His algorithm has an expected running
time of O(nα(cn, n)), where c is a constant depending on the dimension and α is an extremely slow growing inverse Ackermann function [31]. Bentley [14] also gave an expected
nearly linear time algorithm for computing GMSTs in Rd . Dwyer [39] proved that if a set of
points is generated uniformly at random from the unit ball in Rd , its Delaunay triangulation
has linear expected complexity and can be computed in expected linear time. Since GMSTs
are subsets of Euclidean Delaunay triangulations, one can combine this result with linear
time MST algorithms [53] to get an expected O(n) time algorithm for GMSTs of uniformly
distributed points in a unit ball. Rajasekaran [84] proposed a simple expected linear time
algorithm to compute GMSTs for uniform distributions in Rd . All these approaches use
bucketing techniques to execute a spiral search procedure for finding a supergraph of the
GMST with O(n) edges.
Narasimhan et al. [73] gave a practical algorithm that solves the GMST problem. They
prove that for uniformly distributed points, in fixed dimensions, an expected O(n log n) steps
suffice to compute the GMST using well separated pair decomposition. Their algorithm,
GeoMST2, mimics Kruskal’s algorithm [55] on well separated pairs and eliminates the need
to compute bichromatic closest pairs for many well separated pairs.
Brennan [20] presented a modification to Kruskal’s classic minimum spanning tree
(MST) algorithm [55] that operated in a manner similar to quicksort; splitting an edge
set into “light” and “heavy” subsets. Recently, Osipov et al. [77] further expanded this
idea by adding a multi-core friendly filtering step designed to eliminate edges that were
obviously not in the MST (Filter-Kruskal).
5.1
Notation
Points are denoted by lower-case Roman letters. dℓ (p, q) denotes the distance between
the points p and q in L2 metric. Upper-case Roman letters are reserved for sets. Scalars
except for c, d, m and n are represented by lower-case Greek letters. i, j, k are reserved for
indexing purposes. Vol( ) denotes the volume of an object. For a given quadtree, Box(p,q)
denotes the smallest quadtree box containing points p and q; Fraktur letters (a) denote a
quadtree/fair split tree node.
The Cartesian product of two sets X and Y , is denoted by X × Y = {(x, y) | x ∈
X and y ∈ Y }.
Let P be a point set in Rd , where d is the dimension. The bounding rectangle of P ,
denoted by R(P ) is defined to be the smallest rectangle that encloses all points in P , where
the word “rectangle” denotes the Cartesian product of R = [x1 , x′1 ] × [x2 , x′2 ] × ... × [xd , x′d ]
in Rd . Denote the length of R in the ith dimension by li (R) = x′i −xi . Denote the maximum
and minimum lengths by lmax (R) and lmin (R). When all li (R) are equal, R is a d-cube,
and denote its length by l(R) = lmax (R) = lmin (R). li (P ), lmin (P ), lmax (P ) is shorthand
for li (R(P )), lmin (R(P )) and lmax (R(P )), respectively.
MinDist(a, b) denotes the minimum distance between the quadtree boxes of two nodes
in case of a quadtree and the same between the bounding boxes of two nodes in case of a
fair split tree. Bccp(a, b) computes the bichromatic closest pair of two nodes, and returns
{u, v, δ}, where (u, v) is the edge defining the Bccp and δ is the edge length. Left(a) and
Right(a) denotes the left and right child of a node. |.| denotes the cardinality of a set or the
number of points in a quadtree/fair split tree node. α(n) is used to denote inverse of the
Ackermann function [31].
5.2
The GeoFilterKruskal algorithm
The GeoFilterKruskal algorithm computes a GMST for P ⊆ Rd . Kruskal’s [55]
algorithm shows that given a set of edges, the MST can be constructed by considering
edges in increasing order of weight. It is known that the GMST can be computed by
running Kruskal’s algorithm on the Bccp edges of the WSPD of P [22]. When Kruskal’s
algorithm adds a Bccp edge (u, v) to the GMST, where u, v ∈ P , it uses the UnionFind data
structure to check whether u and v belong to the same connected component. If they do,
that edge is discarded. Otherwise, it is added to the GMST. Hence, before testing an
edge (u, v) for inclusion into the GMST, the algorithm should first attempt to add all Bccp edges
(u′, v′) such that Dist(u′, v′) < Dist(u, v).
Algorithm 7 describes the GeoFilterKruskal algorithm for computing the geometric minimum spanning tree of a set of points. The input to the algorithm is a WSPD of the point
set P ⊆ Rd. The set of WSPs S is partitioned into a set El that contains WSPs with cardinality at most β (initially 2), and a set Eu = S \ El. Then the Bccp of every element of
El is computed, and ρ is set to the minimum MinDist(a, b) over all (a, b) ∈ Eu. El
is further partitioned into El1 , containing all elements with a BCCP distance less than ρ,
and El2 = El \ El1 . El1 is passed to the Kruskal procedure, and El2 ∪ Eu is passed to the
F ilter procedure.
The Kruskal procedure is the classic Kruskal’s algorithm. It first sorts the edges according to their length, then adds them to the GMST. The Union Find data structure is
maintained to keep track of the connected components. Any edge passed to this procedure
is either added to the GMST or discarded.
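The Union Find structure referred to here is the standard disjoint-set data structure; a minimal sketch with path compression and union by size is shown below for completeness (an illustration, not the dissertation's exact code).

#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Disjoint-set (Union-Find): find reports the component of a point, unite
// merges two components and reports whether they were previously distinct.
// Both run in near-constant amortized time.
class UnionFind {
public:
    explicit UnionFind(std::size_t n) : parent_(n), size_(n, 1) {
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }

    std::size_t find(std::size_t x) {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];   // path halving
            x = parent_[x];
        }
        return x;
    }

    bool unite(std::size_t a, std::size_t b) {
        a = find(a); b = find(b);
        if (a == b) return false;               // edge would close a cycle
        if (size_[a] < size_[b]) std::swap(a, b);
        parent_[b] = a;
        size_[a] += size_[b];
        return true;
    }

private:
    std::vector<std::size_t> parent_, size_;
};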
Filter examines all the remaining WSPs, and uses Union Find to check if they have
been connected by a previous call to Kruskal. By partitioning the WSPs into batches, the
GeoFilterKruskal algorithm can apply the same technique to geometric minimum spanning
tree construction.
The GeoFilterKruskal procedure is recursively called, increasing the threshold value (β)
by one each time, on the WSPs that survive the Filter procedure, until the complete minimum spanning tree has been found.
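The repeated partitioning of the pair set is a single linear pass and maps directly onto std::partition (or a parallel counterpart); a sketch under assumed WSP and node types is shown below.

#include <algorithm>
#include <cstddef>
#include <vector>

struct TreeNode { /* quadtree or fair split tree node */ };

// Illustrative well-separated pair record carrying the two node sizes.
struct WSP {
    const TreeNode* a;
    const TreeNode* b;
    std::size_t sizeA;
    std::size_t sizeB;
};

// Move the pairs with |a| + |b| <= beta to the front; they receive an exact
// Bccp computation in this round, the remainder is kept for later rounds.
std::vector<WSP>::iterator
splitByCardinality(std::vector<WSP>& wsps, std::size_t beta)
{
    return std::partition(wsps.begin(), wsps.end(),
                          [beta](const WSP& w) { return w.sizeA + w.sizeB <= beta; });
}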
5.3
Correctness
Given previous work by Kruskal [31] as well as Callahan [22], it is sufficient to show
two facts to ensure the correctness of the GeoFilterKruskal algorithm. First, the WSPs
must be added to the GMST in the order of their Bccp distance. This is obviously true,
considering WSPs are only passed to the Kruskal procedure if their Bccp distance is less
than the lower bound on the Bccp distance of the remaining WSPs. Second, the Filter
procedure should not remove WSPs that should be added to the GMST. Once again, it
is clear that any edge removed by the Filter procedure would have been removed by
the Kruskal procedure eventually, as they both use the UnionFind structure to determine
connectivity. By these two facts the GeoFilterKruskal algorithm produces a correct
GMST.
5.4
Analysis of the Running Time
The real bottleneck of this algorithm, as well as the one proposed by Narasimhan [73], is the computation of the Bccp.1 If |A| = |B| = O(n), the Bccp algorithm stated in chapter 2.5 has a worst case time complexity of O(n^2). Since O(n) edges must be processed, a naive computation of the GMST takes O(n^3) time in the worst case.
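For concreteness, the quadratic cost comes from the naive pairwise scan: the following is a minimal brute-force Bccp over two point sets (illustrative code, not the implementation referenced in chapter 2.5).

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Pt { double x, y; };   // two dimensions for brevity; the cost argument is dimension-independent

// Naive bichromatic closest pair: O(|A| * |B|) distance evaluations, which is
// O(n^2) when |A| = |B| = O(n).
double BccpBruteForce(const std::vector<Pt>& A, const std::vector<Pt>& B,
                      std::size_t& bestA, std::size_t& bestB) {
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < B.size(); ++j) {
            double dx = A[i].x - B[j].x, dy = A[i].y - B[j].y;
            double d  = std::sqrt(dx * dx + dy * dy);
            if (d < best) { best = d; bestA = i; bestB = j; }
        }
    return best;
}
```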
In order to bound the running time of the GeoFilterKruskal algorithm, the size of the well separated pairs passed to the Bccp algorithm must be a constant. In that case, even processing O(n) edges will not cause the running time to exceed the O(n log n) time of the initial sort.
1 According to the algebraic decision tree model, the lower bound for the set intersection problem is Ω(n log n) [43]. The set intersection problem can be solved using Bccp: if the Bccp distance between two sets is zero, the sets intersect, otherwise they do not. Since the set intersection problem is lower bounded by Ω(n log n), the Bccp computation is also lower bounded by Ω(n log n).
Algorithm 7 GeoFilterKruskal Algorithm
Require: S = {(a1, b1), ..., (am, bm)} is a WSPD, constructed from P ⊆ Rd; T = {}.
Ensure: Bccp threshold β ≥ 2.
 1: procedure GeoFilterKruskal(Sequence of WSPs: S, Sequence of Edges: T, UnionFind: G, Integer: β)
 2:     El = Eu = El1 = El2 = ∅
 3:     for all (ai, bi) ∈ S do
 4:         if (|ai| + |bi|) ≤ β then El = El ∪ {(ai, bi)} else Eu = Eu ∪ {(ai, bi)}
 5:     end for
 6:     ρ = min{MinDist(ai, bi) : (ai, bi) ∈ Eu}
 7:     for all (ai, bi) ∈ El do
 8:         {u, v, δ} = Bccp(ai, bi)
 9:         if (δ ≤ ρ) then El1 = El1 ∪ {(u, v)} else El2 = El2 ∪ {(ai, bi)}
10:     end for
11:     Kruskal(El1, T, G)
12:     Enew = El2 ∪ Eu
13:     Filter(Enew, G)
14:     if (|T| < (n − 1)) then GeoFilterKruskal(Enew, T, G, β + 1)
15: end procedure
16: procedure Kruskal(Sequence of Edges: E, Sequence of Edges: T, UnionFind: G)
17:     Sort(E) by increasing Bccp distance
18:     for all (u, v) ∈ E do
19:         if G.Find(u) ≠ G.Find(v) then T = T ∪ {(u, v)}; G.Union(u, v)
20:     end for
21: end procedure
22: procedure Filter(Sequence of WSPs: E, UnionFind: G)
23:     for all (ai, bi) ∈ E do
24:         if (G.Find(u) = G.Find(v) : u ∈ ai, v ∈ bi) then E = E \ {(ai, bi)}
25:     end for
26: end procedure
[Figure 5.1 plot: Multi-core scaling of GeoFK vs. KNNG; total run time (seconds) versus number of threads, for GeoFK and knng on the AMD and Intel machines.]
Figure 5.1: Run time gains of the algorithm as more threads are used, showing scaling on two architectures. The AMD line was computed on the machine described in Section 5.6. The Intel line used a machine with four 2.66GHz Intel(R) Xeon(R) X5355 CPUs, 4096 KB of L1 cache, and 8GB total memory. For additional comparison, we include KNNG construction time using a parallel 8-nearest neighbor graph algorithm. All cases were run on 20 data sets drawn from a uniform random distribution of size 10^6 points; the final total run time is an average of the results.
Appendix A gives a formal high probability bound proof showing that this is the case. The running time of GeoFilterKruskal can then be shown to be O(n) plus the time of one sort.
5.5 Parallelization
Although the whole of Algorithm 7 is not parallelizable, most portions of the algorithm can be parallelized. The parallel partition algorithm [83] is used to divide the set S into the subsets El and Eu (see Algorithm 7). ρ can be computed using a parallel prefix computation. The further subdivision of El, as well as the Filter procedure, are further instances of parallel partition, and the sorting step used in the Kruskal procedure can also be parallelized [83]. Efforts to parallelize the linear time tree construction showed that adding more processors on a multi-core machine does not speed up this construction, because it is memory bound. Figure 5.1 shows empirical results for parallel execution of the GeoFilterKruskal algorithm.
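As an example of one of the data-parallel steps mentioned above, computing ρ is a simple minimum reduction over the MinDist values of Eu. The following hedged OpenMP sketch (using the WSP type and MinDist helper assumed in the earlier sketch, not the dissertation's code) is one way to express it.

```cpp
#include <algorithm>
#include <limits>
#include <vector>
#include <omp.h>

// rho = min over (a, b) in Eu of MinDist(a, b), computed as a parallel
// reduction (the min reduction clause requires OpenMP >= 3.1).
double ComputeRho(const std::vector<WSP>& Eu) {
    double rho = std::numeric_limits<double>::infinity();
#pragma omp parallel for reduction(min : rho)
    for (long i = 0; i < static_cast<long>(Eu.size()); ++i)
        rho = std::min(rho, MinDist(Eu[i]));
    return rho;
}
```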
5.5.1 Comparing Quadtrees and Fair Split Trees
Theoretically, the choice of partition tree used to construct the WSPD for this algorithm
does not matter. This section compares results of running the GeoFilterKruskal algorithm using the quadtree versus a KD-tree with a modified splitting rule (this tree will be referred to as a fair split tree, or FST).
[Figure 5.2, panel (a): number of WSPs (×10^6) versus number of points (×10^6), for the fair split tree and the quadtree.]
Figure 5.2: Comparison of the number of WSPs versus the number of points for quadtrees and fair split trees.
The main difference between the two is that the quadtree can be constructed faster, whereas the FST exhibits better clustering.
The data sets used were uniformly random, two dimensional points. Experiments run
on data sets from other distributions and higher dimensions (up to 5) were not significantly
different. Figure 5.2 shows the difference in the size of the WSPD set computed from the
two trees. As expected, better clustering in the fair split tree yields fewer well separated
pairs. Figure 5.3 shows the timing comparisons from using both of the trees. Even though
the quadtree can be constructed more quickly, the algorithm runs significantly faster given
the fewer WSPs produced by the fair split tree.
Figure 5.4 shows that reducing the separation factor of the well separated pair decomposition below the value required for correctness produced fewer WSPs from the trees. In this case, the MSTs produced had some small margin of error. Interestingly, it seems that if one accepts a small error in the MST, reducing the separation factor can produce approximate MSTs using many fewer WSPs. Figure 5.5 shows empirical results for this case, varying the separation factor from √2 (the lowest factor that guarantees a correct result) down to 0.1·√2.
These results show that, in practice, it appears to be quite small.
[Figure 5.3, panel (b): GMST construction time (seconds) versus number of points (×10^6), for the fair split tree and the quadtree.]
Figure 5.3: Comparison of the number of points versus GMST construction time for quadtrees and fair split trees.
[Figure 5.4, panel (c): number of WSPs (×10^6) versus separation factor, for the fair split tree.]
Figure 5.4: Separation factor of WSPs versus the number of WSPs produced.
[Figure 5.5, panel (d): percent error in GMST length versus separation factor, for the fair split tree.]
Figure 5.5: Separation factor of WSPs versus the error in the length of the GMST.
5.6 Geometric Minimum Spanning Tree Experimental Setup
The GeoFilterKruskal algorithm was tested in practice against several other implementations of geometric minimum spanning tree algorithms. A subset of the algorithms
compared in [73] were chosen, excluding some based on the availability of source code and
the clear advantage shown by some algorithms in the aforementioned work. Based on the
results in the previous section, we elected to implement the GeoFilterKruskal algorithm using fair split trees, instead of quadtrees. Table 5.1 lists the algorithms that will be referred to in the experimental section.
GeoFilterKruskal was written in C++ and compiled with gcc with -O3 optimization. Parallel code was written using OpenMP [33] and the parallel mode extension to the STL [83]. C++ source code for GeoMST, GeoMST2, and Triangle was provided by Giri Narasimhan. In addition, Triangle used Jonathan Shewchuk's triangle library for Delaunay
triangulation [92].
The machine used has eight Quad-Core AMD Opteron(tm) 8378 processors with hyperthreading enabled. Each core has an L1 cache of 512 KB, an L2 of 2 MB, and an L3 of 6 MB, and the machine has 128 GB of total memory. The operating system was CentOS 5.3. All data was generated
and stored as 64 bit doubles.
In the next section there are two distinct series of graphs. The first set displays graphs
of total running time versus the number of input points, for two to five dimensional points,
with uniform random distribution in a unit hypercube. The L2 metric was used for distances
in all cases, and all algorithms were run on the same random data set. Each algorithm was
run on five data sets, and the results were averaged. As noted above, Triangle was not used
in dimensions greater than two.
Table 5.1: Algorithm Descriptions

GeoFK#: Implementation of Algorithm 7. There are two important differences between the implementation and the theoretical version. First, the Bccp threshold β is incremented in steps of size O(1) instead of size 1. This change does not affect the analysis but helps in practice. Second, for small well separated pairs (less than 32 total points) the Bccp is computed by a brute force algorithm. In the experimental results, GeoFK1 refers to the algorithm running with 1 thread, and GeoFK8 refers to the algorithm using 8 threads. This implementation used a fair split tree, as opposed to the quadtree.

GeoMST: Described by Callahan and Kosaraju [22]. This algorithm computes a WSPD of the input data followed by the Bccp of every pair. It then runs Kruskal's algorithm to find the MST.

GeoMST2: Described in [73]. This algorithm improves on GeoMST by using marginal distances and a priority queue to avoid many Bccp computations.

Triangle: This algorithm first computes the Delaunay triangulation of the input data, then applies Kruskal's algorithm. Triangle only works with two dimensional data.
The second set of graphs shows the mean total running times for two dimensional data
of various distributions, as well as the standard deviation. The distributions were taken
from [73] (given n d-dimensional points with coordinates c1 ...cd ), shown in Table 5.2.
5.7 Geometric Minimum Spanning Tree Experimental Results
As shown in Figures 5.6 - 5.8, GeoFK1 performs favorably in practice for almost all cases
compared to other algorithms (see Table 5.1). In two dimensions, only Triangle outperforms
GeoFK1. In higher dimensions, GeoFK1 is the clear winner when only one thread is used.
Figures 5.9 - 5.11 show that in most cases GeoFK1 performs better regardless of the distribution of the input point set. Apart from the fact that Triangle maintains its superiority in two dimensions, GeoFK1 performs better on all the distributions that were considered, except when the points are drawn from the arith distribution. In the data set from arith, the ordering of the WSPs based on the minimum distance is the same as the ordering based on the Bccp distance, so the second partitioning step in GeoFK1 is pure overhead. The results in Figures 5.9 - 5.11 are for two dimensional data. The same experiments for data sets of other dimensions did not give significantly different results, and so were not included.
Table 5.2: Point Distribution Info

unif: c1 to cd chosen from the unit hypercube with uniform distribution (U^d)
annul: c1 to c2 chosen from the unit circle with uniform distribution, c3 to cd chosen from U^d
arith: c1 = 0, 1, 4, 9, 16, ...; c2 to cd are 0
ball: c1 to cd chosen from the unit hypersphere with uniform distribution
clus: c1 to cd chosen from 10 clusters of normal distribution centered at 10 points chosen from U^d
edge: c1 chosen from U^d, c2 to cd equal to c1
diam: c1 chosen from U^d, c2 to cd are 0
corn: c1 to cd chosen from 2^d unit hypercubes, each one centered at one of the 2^d corners of a (0,2) hypercube
grid: n points chosen uniformly at random from a grid with 1.3n points; the grid is housed in a unit hypercube
norm: c1 to cd chosen from (−1, 1) with normal distribution
spok: for each dimension d′ in d, n/d points chosen with cd′ chosen from U^1 and all others equal to 1/2
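To make the table concrete, here is a small illustrative generator for two of the distributions (unif and norm), using the C++11 <random> facilities. It is not the code used for the experiments, and the spread of the normal distribution is an assumption, since the table does not specify it.

```cpp
#include <cstddef>
#include <random>
#include <vector>

using Point = std::vector<double>;

// unif: every coordinate drawn uniformly from the unit hypercube U^d.
std::vector<Point> GenUnif(std::size_t n, std::size_t d, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    std::vector<Point> pts(n, Point(d));
    for (auto& p : pts)
        for (auto& c : p) c = u(rng);
    return pts;
}

// norm: every coordinate drawn from a normal distribution restricted to (-1, 1);
// the standard deviation 0.3 is illustrative only.
std::vector<Point> GenNorm(std::size_t n, std::size_t d, std::mt19937& rng) {
    std::normal_distribution<double> g(0.0, 0.3);
    std::vector<Point> pts(n, Point(d));
    for (auto& p : pts)
        for (auto& c : p) {
            do { c = g(rng); } while (c <= -1.0 || c >= 1.0);
        }
    return pts;
}
```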
[Figure 5.6 plot (2-d): total run time versus number of points (×10^6), for GeoMST, GeoMST2, GeoFK1, GeoFK8, and Triangle.]
Figure 5.6: Total running time for each algorithm over varying sized data sets. Data was uniformly random and two dimensional.
[Figure 5.7 plot (3-d): total run time versus number of points (×10^5), for GeoMST, GeoMST2, GeoFK1, and GeoFK8.]
Figure 5.7: Total running time for each algorithm over varying sized data sets. Data was uniformly random and three dimensional.
[Figure 5.8 plot (4-d): total run time versus number of points (×10^5), for GeoMST, GeoMST2, GeoFK1, and GeoFK8.]
Figure 5.8: Total running time for each algorithm over varying sized data sets. Data was uniformly random and four dimensional.
[Figure 5.9 plot: mean run time, with standard deviation, for GeoMST and GeoFK1 across the distributions of Table 5.2.]
Figure 5.9: Average run time and standard deviation when comparing GeoMST to GeoFK1 on varying distributions. Data size was 10^6 points in two dimensions.
[Figure 5.10 plot: mean run time, with standard deviation, for GeoMST2 and GeoFK1 across the distributions of Table 5.2.]
Figure 5.10: Average run time and standard deviation when comparing GeoMST2 to GeoFK1 on varying distributions. Data size was 10^6 points in two dimensions.
[Figure 5.11 plot: mean run time, with standard deviation, for Triangle and GeoFK1 across the distributions of Table 5.2.]
Figure 5.11: Average run time and standard deviation when comparing Triangle to GeoFK1 on varying distributions. Data size was 10^6 points in two dimensions.
CHAPTER 6
NEAREST NEIGHBOR SEARCH OF HIGH DIMENSIONAL SIFT DATA
High dimensional nearest neighbor search is not fundamentally different from nearest neighbor search in lower dimensions. The challenge comes in dealing with the “curse of dimensionality,” the exponential dependence on dimension that cripples standard techniques when
they are applied in high dimensions. In this chapter, a new method, PCANN, for conducting higher dimensional nearest neighbor searches will be presented, and its performance will
be compared with other state-of-the-art implementations.
Methods for searching nearest neighbors in high dimensions can be grouped into exact and approximate methods. The most obvious exact method is a sequential scan of the data, and this is used if the number of queries to be made is small. Weber et al. [99] improve on the sequential scan through the use of data compression. The idea of compressing the data was later combined with R-trees to create the IQ-tree [15]. The iDistance technique maps high dimensional points to a Morton order curve, then indexes the curve with a B-tree to search for nearest neighbors. This is considered to be the most efficient algorithm for exact
nearest neighbor search in high dimensions, but approximation techniques beat it by a wide
margin [95]. All of these solutions have, at best, a provably linear query time.
For approximate nearest neighbor searching in high dimensions, Indyk and Motwani
gave the first algorithm with a less than linear query time as well as a somewhat practical
implementation. LSH uses multiple hash functions picked, at random, from a family of hash
functions in order to bucket the high dimensional points in a low dimensional subspace [50].
It then seeks to answer ball queries by hashing query points and returning any or all points
that lie within a radius. Two LSH methods have been used in practice. Rigorous-LSH [50]
uses a number of radius balls to guarantee a constant factor nearest neighbor solution,
although with substantial cost in both time and storage space. Adhoc-LSH [48] forgoes the
theoretical guarantees and instead focuses on fast queries. By using a heuristic approach
to selecting the query radius, as well as multiple shifted versions of the buckets, adhoc-LSH
answers queries much more rapidly than the rigorous-LSH method. The drawback, however,
is that the quality of the nearest neighbor solution suffers.
Proposed by Tao et al. [95], LSB-trees were designed specifically to address the shortcomings of the LSH algorithm for nearest neighbor searching. This method uses hash functions
similar to LSH, as well as a B-tree to index points for quick searching. They showed that,
in practice, they can easily outperform LSH and iDistance. Additionally, they were able to
give constant factor bounds on the nearest neighbor answer returned [95].
Other approximation algorithms are based on randomized kd-trees. Silpa-Anan and
Hartley demonstrated this method could work in practice [93]. Fukunaga and Narendra [47] proposed recursively partitioning high dimensional point sets into disjoint groups using k-means clustering. This was later put into practice for nearest neighbor queries on large databases.
Mikolajczyk and Matas [64] evaluated the practicality of many of these methods, and later
Muja and Lowe developed a self tuning algorithm, called FLANN, that actually selects
between several algorithms depending on the data set [71]. Using this, they were able to
outperform other methods, including LSH and ANN.
6.0.1 Notation
Points are denoted by lower-case Roman letters. d is reserved for the dimension of the
point set. n is reserved for the cardinality of the point set. dℓ (p, q) denotes the distance
between the points p and q in L2 metric. Upper-case Roman letters are reserved for matrices.
Additionally, point sets are considered to be stored in a d by n matrix. Bold, upper case
Roman letters are reserved for sets. Scalars are represented by lower-case Greek letters.
i, j, k are reserved for indexing purposes.
6.1 The PCANN Algorithm
The FLANN implementation demonstrates two useful ideas in terms of high dimensional
nearest neighbor searching. The first is that, depending on the data set, one method for
nearest neighbor searching may be superior to others. A practical algorithm, therefore,
might be able to gain advantage by analyzing the data set and picking an optimal method.
The second is the common thought that if the data is projected into enough low dimensional
subspaces, some of them will contain the correct adjacency data. PCANN (Algorithm 8) seeks to combine these notions. Instead of random projections into low dimensional subspaces, this algorithm analyzes the data and finds the subspaces which best capture the adjacency information.
Algorithm 8 details the construction of a data structure for high dimensional nearest
neighbor searching. The Build procedure is the top-level entry point. Let P be a set of points in Rd. Let T be a set of points in Rd for which the nearest neighbor in P is known. ǫ is the error parameter, ρ′ is the number of partitions, and ν is a constant defining the number of random projections used in partitioning the points.
First, principal component analysis (see chapter 2.7.1) is performed on the d dimensional data set, identifying d eigenvectors and ordering them by dominance (Line 12). The data, along with a training set of queries, are projected onto the hyperplane identified by the first eigenvector (Line 13). Nearest neighbors are found for the projected points in this initial subspace, and the error is computed. If that error is greater than the target error ǫ, then the projecting dimension is incremented and the procedure is repeated (Lines 28-32). For ease of explanation, this is described using a linear increase in projection dimension. In practice, this is best accomplished using a binary search style approach.
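A hedged sketch of that binary-search-style selection of the projection dimension follows. Here trainingError(dPrime) is a hypothetical helper, not part of the dissertation's code, that projects P and the training set T onto the first dPrime eigenvectors and returns the average nearest neighbor error, mirroring the Project procedure of Algorithm 8.

```cpp
#include <cstddef>
#include <functional>

// Returns the smallest projection dimension whose average training error falls
// below eps. The search assumes trainingError is non-increasing in the
// dimension (adding eigenvectors never increases the error); at dPrime = d the
// projection is lossless, so the error is 0 and an answer always exists.
std::size_t ChooseProjectionDim(std::size_t d, double eps,
                                const std::function<double(std::size_t)>& trainingError) {
    std::size_t lo = 1, hi = d;
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (trainingError(mid) < eps) hi = mid;
        else                          lo = mid + 1;
    }
    return lo;
}
```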
Once the lower dimension has been identified, the points in P are projected down into the new subspace, defined by the first d′ eigenvectors from the PCA (Line 13). It would be possible to construct a kd-tree over this projection, and search for nearest neighbors. Instead, additional processing allows the further reduction of dimension (with a corresponding reduction in query time). In order to partition the point set, ν random 3 dimensional projections are constructed. These projections are partitioned by fitting two slabs to the point sets (Lines 15-23). In this case, a slab is defined as two parallel planes. This 2-line
center problem is solved using an approximate incremental algorithm introduced by Kumar
and Kumar [57]. In it, empty slabs are initially created, and the furthest point from them
in the data set is found. This point is added to each slab, and recursion is carried out
to find the best fit for the remaining points, which minimizes the width of the slabs and
also places a minimum threshold of points into each set. Out of ν projections, the one
with slabs of minimum width is chosen. The points are then divided into a red and a blue set, depending on which slab contained them. The entire procedure is repeated recursively
until the original data set has been divided into ρ′ partitions. The goal in dividing the
points in this manner is to find projections using PCA that allow maximum reduction in
dimension while minimizing the error in nearest neighbor searching.
Once all the projections have been identified, the Project procedure is repeated one final
time for each piece. This again identifies a subspace, based on PCA, for which the training
data has an acceptable error. Now, a kd-tree can be built over each piece, making up the
set K.
The number of partitions, ρ′ , chosen is influenced by several factors. In a standard,
sequential case, there is a tipping point (which can be determined empirically) where the
overhead from having additional partitions overcomes the advantage of partitioning the
point set. For data sets too large to fit into conventional memory, this technique can be
adapted to work effectively by choosing to partition until each piece can be processed in
conventional memory, eliminating the need for some costly calls to disk. This algorithm
also lends itself well to parallel queries. In this case, the number of partitions would be best
dictated by the number of available threads (some constant number of partitions going to
each thread).
Once the structure has been created, the query algorithm is simple. A query point is
projected into the subspace defined by each of the constructed partitions. Each partition
is queried using the kd-tree, and the top k results are reported. In the sequential version,
partitions are processed in order of subspace distance from the query point. Subspaces
can be eliminated if they are farther away than any current neighbor. For parallel queries,
the greatest timing improvements can be seen when queries are processed in batches. In
this case, each partition runs in a separate thread, and processes all queries in a batch.
Afterward, the results from each partition are combined, with the top k for each query
being reported.
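A minimal sketch of the batched parallel query loop described above is given here. Point, Neighbor, and PartitionIndex are hypothetical stand-ins for the per-partition structures built by Algorithm 8; each thread handles one partition before the per-query candidate lists are merged.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>
#include <omp.h>

struct Point { std::vector<double> c; };                 // hypothetical query type
struct Neighbor { std::size_t id; double dist; };
struct PartitionIndex {                                  // hypothetical per-partition structure:
    std::vector<Neighbor> knn(const Point& q,            // projects q into the partition's
                              std::size_t k) const;      // subspace and searches its kd-tree
};

// Batched parallel query: one partition per thread, then merge the per-query
// candidate lists and keep the best k overall.
std::vector<std::vector<Neighbor>>
BatchQuery(const std::vector<PartitionIndex>& parts,
           const std::vector<Point>& queries, std::size_t k) {
    std::vector<std::vector<std::vector<Neighbor>>> perPart(
        parts.size(), std::vector<std::vector<Neighbor>>(queries.size()));

#pragma omp parallel for
    for (long i = 0; i < static_cast<long>(parts.size()); ++i)
        for (std::size_t q = 0; q < queries.size(); ++q)
            perPart[i][q] = parts[i].knn(queries[q], k);

    std::vector<std::vector<Neighbor>> out(queries.size());
    for (std::size_t q = 0; q < queries.size(); ++q) {
        for (const auto& cand : perPart)
            out[q].insert(out[q].end(), cand[q].begin(), cand[q].end());
        std::sort(out[q].begin(), out[q].end(),
                  [](const Neighbor& a, const Neighbor& b) { return a.dist < b.dist; });
        if (out[q].size() > k) out[q].resize(k);
    }
    return out;
}
```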
The computational complexity of the construction is dominated by the two-plane fit
computation. The PCA computation is linear in complexity, and the kd-tree construction is
easily done in O(n log n) time. Kumar and Kumar showed that their k-clustering algorithm has a running time of O(k^n) [57], which would be O(2^n) in this case. Note that this is theoretically sub-optimal. Agarwal and Procopiuc give an approximation algorithm with a
running time of O(n log n) [3]. However, the incremental algorithm has been shown, in
practice, to terminate quickly in most cases [57].
Algorithm 8 Preprocessing for PCANN
Require: Point sets P, T ∈ Rd of size n.
Ensure: M[1...ρ′] ← all independent partitions of P.
 1: procedure Build(P, T, ǫ, ρ′, ν)
 2:     M′[] ← Partition(P, T, ǫ, 0, ρ′, ν, M′[])
 3:     for i = 1 to ρ′
 4:         E[i] ← eigenvectors of M′[i] from PCA
 5:         M[i] ← Project(M′[i], T, ǫ, E[i])
 6:         K[i] ← kd-tree(M[i])
 7: end procedure
 8: procedure Partition(P, T, ǫ, ρ, ρ′, ν, M[])
 9:     if ρ ≥ ρ′ then
10:         partition P added to M
11:         return
12:     E ← d eigenvectors of P through PCA
13:     P′ ← Project(P, T, ǫ, E)
14:     φ ← ∞
15:     for j = 1 to ν
16:         R ← i random vectors of length 3
17:         P′′ ← P′ × R
18:         φ′ ← TwoSlabFit(P′′, R, B)    // the width of the slabs is returned from the
19:                                       // fitting procedure, which also partitions the
20:         if φ′ < φ then                // points into slabs R and B
21:             φ = φ′
22:             P1 ← R
23:             P2 ← B
24:     Partition(P1, T, ǫ, ρ + 2, ρ′, ν, M[])
25:     Partition(P2, T, ǫ, ρ + 2, ρ′, ν, M[])
26: end procedure
Require: Point sets P, T ∈ Rd of size n.
27: procedure Project(P, T, ǫ, E)
28:     for i = 1 to d
29:         P′ ← P × E[1...i]
30:         T′ ← T × E[1...i]
31:         e = Σ_{t∈T} [ dℓ(t, nn(t, P′)) − dℓ(t, nn(t, P)) ]
32:         if e/|T| < ǫ then return P′
33: end procedure
6.2 High Dimensional Nearest Neighbor Search Experimental Setup
The PCANN algorithm was tested experimentally against two competitors. The goal
was to show that given an appropriate data set, the PCA projection approach can lead to
better practical results.
The PCANN algorithm was coded in C++ and compiled with gcc with -O3 optimization. Parallel code was written using OpenMP [33] and the parallel mode extension to the STL [83]. The ANN library was used to construct the kd-trees and query them.
The Linux machine used has eight Quad-Core AMD Opteron(tm) 8378 processors with hyperthreading enabled. Each core has an L1 cache of 512 KB, an L2 of 2 MB, and an L3 of 6 MB, and the machine has 128 GB of total memory. The operating system was CentOS 5.3. All data was
generated and stored as 64 bit doubles.
The Windows machine used has an Intel Core 2 Duo CPU E7500. It has a 3MB L2
cache and 4 GB total memory. The operating system was Windows 7.
The FLANN and LSB-Tree algorithms were chosen as competitors, due to their performance when compared to the rest of the field. Both algorithms have been tested experimentally against other leading implementations, and came out ahead. Results from testing LSH were not included in these experiments, due to it being dominated by FLANN and LSB-Tree in previous research. FLANN was tested on the Linux machine; LSB-Tree was tested on the Windows machine.
6.2.1 FLANN
FLANN uses an optimization problem to choose between two nearest neighbor algorithms based on the structure of the data set and a training set of data. The first algorithm it considers is a randomized kd-tree algorithm. It projects data points into a random, low-dimensional plane, and builds a kd-tree on it. It repeats this procedure until it has enough trees so that queries can be answered within the accuracy of a user-defined error parameter. The projection dimension is a fixed constant.
The second algorithm is a hierarchical k-means tree. It is constructed by partitioning
the data sets into K distinct regions using k-means clustering, and recursively constructing
nodes until the number of points in a node is smaller than K. The query algorithm is
similar to the method used by ANN. First, there is a traversal of the tree in order to
identify necessary paths of traversal. These are placed in a priority queue and processed in
order.
6.2.2 LSB-Tree
The LSB-tree is designed to overcome the weaknesses of LSH in performing nearest
neighbor search. Its data structure is created by first choosing a projecting dimension, based
on the original dimension, the number of points, and the page size of the architecture. Then
random hash functions are chosen, in a manner similar to LSH, one for each dimension in
the projection. Each coordinate in the projected point comes from a hash of the point in
the original space. Once this is done, the hashed points are projected onto a Morton order
curve, and indexed with a B-tree. Queries are processed by finding their location in the Morton order, and considering the hashed points in that node. Further nodes are processed
in order of increasing z-value, until a maximum number of searches have been reached, or
the nearest unsearched node is farther than a pre-determined value.
A single LSB-tree is not accurate enough to return good nearest neighbor results. In
order to bound the quality of the answers, a set of trees is constructed (again based on
the dimension, size and page size). This LSB-forest gives a 4-factor approximation with a
constant probability. By increasing the number of forests, this factor can be improved to
(2 + ǫ).
6.2.3 Experiment Data
Data for these experiments was taken from BIGANN, consisting of one billion 128-dimensional SIFT descriptors. These features were extracted from one million images. SIFT data is commonly used along with nearest neighbor searching to match features, making this a realistic test of the effectiveness of these methods. Finally, this is similar data to that used by the FLANN authors to test their algorithm. From the one billion vectors, subsets were selected to form data sets of varying size for the tests.
In addition to the SIFT data, the algorithms were also tested on an artificial distribution.
Uniformly random data distributed in the unit hypercube has much less structure than
SIFT data. This will demonstrate how the PCANN algorithm behaves when the principal
component analysis cannot identify very dominant eigenvectors for projection.
6.3 High Dimensional Nearest Neighbor Search Experimental Results
The first set of experimental results shows a comparison between PCANN and FLANN (Figure 6.1) and between PCANN and LSB-Tree (Figure 6.2). PCANN clearly has an advantage in timing on the SIFT data. Figure 6.3 shows the average error for all three algorithms. In order to tune PCANN and FLANN, the LSB-Tree algorithm was run first, and its error was used as the target for the other two. The error factor is defined as the distance to the found near neighbor divided by the distance to the correct nearest neighbor. All three algorithms had similar error.
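Stated as a formula, the error factor reported for a query q is

\[
  \text{error factor}(q) \;=\; \frac{d_{\ell}\bigl(q,\; p_{\text{found}}\bigr)}{d_{\ell}\bigl(q,\; p^{*}\bigr)} \;\ge\; 1,
\]

where $p^{*}$ is the true nearest neighbor of $q$ and $p_{\text{found}}$ is the neighbor returned by the algorithm.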
The second set of experiments shows the same trials on uniform random data. In this case, PCANN did not perform as well. Figure 6.4 shows that it was still able to outperform FLANN, but LSB-Tree had better query times (Figure 6.5). Again, error was
tuned to LSB-Tree’s error, and results did not differ significantly from the SIFT data in
terms of relative error.
Figure 6.6 shows the error factor for varying projection dimensions. In this experiment,
PCA was done on the data set, and the data points projected into a particular dimension.
Queries were run, and the error factor calculated. The error is slightly higher than it
would be if partitioning had been done, but this simpler graph demonstrates the principle
without much clutter. The graph clearly shows that in order to achieve a low error rate, the
SIFT data requires a much lower projection dimension than uniform data, which translates
directly into a faster query time.
The final graph (Figure 6.7) shows PCANN's performance in parallel. The sequential version was tested alongside a parallel version running on 8 threads. In this instance, the parallel version answers queries around 3.5 times faster than the sequential version. The need to combine the results from the various threads prevents the algorithm from being perfectly parallel.
[Figure 6.1 plot: average time per query (microseconds) versus data size (thousands of points), for FLANN and PCANN.]
Figure 6.1: Timing results for PCANN versus FLANN on SIFT data.
[Figure 6.2 plot: average time per query (microseconds) versus data size (thousands of points), for LSB-Tree and PCANN.]
Figure 6.2: Timing results for PCANN versus LSB-Tree on SIFT data.
[Figure 6.3 plot: average error factor versus data size (thousands of points), for LSB-Tree, PCANN, and FLANN.]
Figure 6.3: Average error for PCANN, FLANN and LSB-Tree on SIFT data.
[Figure 6.4 plot: average query time (microseconds) versus data size (thousands of points), for FLANN and PCANN.]
Figure 6.4: Timing results for PCANN versus FLANN on uniform random data.
[Figure 6.5 plot: average query time (microseconds) versus data size (thousands of points), for LSB-Tree and PCANN.]
Figure 6.5: Timing results for PCANN versus LSB-Tree on uniform random data.
[Figure 6.6 plot: error factor versus projection dimension, for SIFT and uniform data.]
Figure 6.6: Average error for PCA projection for SIFT and uniform random data. Average is from 1000 queries on 100000 data points.
[Figure 6.7 plot: average query time (microseconds) versus data size (thousands of points), for the sequential and parallel versions of PCANN.]
Figure 6.7: Timing results for sequential PCANN versus parallel on SIFT data.
CHAPTER 7
CONCLUSIONS
This work has attempted to expand the tool set for solving common computational geometry problems. Through the use of common building blocks, several algorithms have been
presented that offer advantages over traditional competitors.
The knng algorithm was presented as an effective solution to the k-nearest neighbor graph construction problem, one which takes advantage of multiple threads. While
the algorithm performs best on point sets that use integer coordinates, it is clear from
experimentation that the algorithm is still viable using floating point coordinates. Further,
the algorithm scales well in practice as k increases, as well as for data sets that are too large
to reside in internal memory. Finally, the cache efficiency of the algorithm should allow it
to scale well as more processing power becomes available.
The DelaunayNN algorithm for finding the nearest neighbor for a query in two dimensions has both an expected run time bound of O(log n) and strong experimental performance
when compared to existing, state of the art implementations. This work has also explored
using the algorithm on data sets in three dimensions, and seen that in some cases it may
be superior. It remains to be seen if this approach can be applied in a reasonable manner
to dimensions higher than three, and if it can be extended to allow for efficient solutions to
the k-nearest neighbor problem.
The GeoFilterKruskal algorithm is a provably efficient and practically effective method
for constructing geometric minimum spanning trees. It uses well separated pair computation
in combination with partitioning and filtering to solve the problem in a parallelizable way. As demonstrated on a wide variety of data sets, it is superior to many state-of-the-art
implementations on most distributions.
The PCANN algorithm demonstrates how appropriate subspace selection, through the
use of principal component analysis and projective clustering, can improve the query time for
high dimensional nearest neighbor searching on some data sets. While it was less efficient on random data sets, it clearly offers an effective and parallelizable solution for data sets which exhibit an appropriate, exploitable structure.
The field of computational geometry offers a wide array of problems which are applicable
to a multitude of other fields of science. By improving on solutions to these problems,
hopefully both the field of computational geometry, as well as the fields that use it, can be
advanced. The goal of this work was to be just such an improvement, and it is the hope of
the author that other researchers in these fields will find the results presented here useful.
APPENDIX A
FULL PROOF OF THE GMST ALGORITHM ANALYSIS
This high probability proof was written in conjunction with Samidh Chatterjee, and is reproduced here with his permission.
High Probability Bound Analysis. In this section we show that the GeoFilterKruskal algorithm takes one sort plus O(n) additional steps, with high probability (WHP) [68], to compute the GMST. Let Pr(E) denote the probability of occurrence of an event E, where E is a function of n. An event E is said to occur WHP if, given β > 1, Pr(E) > 1 − 1/n^β [68]. Let P be a set of n points chosen uniformly from a unit hypercube H in Rd. Given this, we state the following lemma from [73].
Lemma A.0.1. Let C1 and C2 be convex regions in H such that α ≤ volume(C1 )/volume(C2 ) ≤
1/α for some constant 0 < α < 1. If |C1 ∩ P | is bounded by a constant, then with high
probability |C2 ∩ P | is also bounded by a constant.
We will use the above lemma to prove the following claims.
Lemma A.0.2. Given a constant γ > 1, WHP, GeoFilterKruskal filters out WSPs that have more than γ points. We prove this lemma for quadtrees only; for the FST, the proof does not change from what is shown in [73].
Proof. The proof of this lemma is similar to the one for GeoMST2. Consider a WSP (a, b). If both |a| and |b| are less than or equal to γ, then the time to compute their Bccp distance is O(1). Let us now assume, w.l.o.g., that |a| > γ. We will show that, in this case, we do not need to compute the Bccp distance of (a, b) WHP. Let pq be a line segment joining a and b such that the length of pq (let us denote this by |pq|) is MinDist(a, b). Let C1 be a hypersphere centered at the midpoint of pq with radius |pq|/4. Let C2 be another hypersphere with the same center but radius 3|pq|/2. Since a and b are well separated, C2 will contain both a and b. Now, volume(C1)/volume(C2) = 6^−d. Since C1 is a convex region, if C1 ∩ P is empty, then by Lemma A.0.1, |C2 ∩ P| is bounded by a constant WHP. But C2 contains a, which has more than γ points. Hence C1 cannot be empty WHP. Let x ∈ a, y ∈ b and c ∈ C1 ∩ P. Also, let the pairs (x, c) and (y, c) belong to WSPs (u1, v1) and (u2, v2) respectively. Note that Bccp(a, b) must be greater than Bccp(u1, v1) and Bccp(u2, v2). Since our algorithm adds the Bccp edges by order of their increasing distance, c and the points in a will be connected before the Bccp edge between a and b is examined. The same is true for c and the points in b. This causes a and b to belong to the same connected component WHP, and thus, our filtering step will get rid of the well separated pair (a, b) before we need to compute its Bccp edge.
Lemma A.0.3. WHP, the total running time of the UnionFind operation is O(α(n)n).
Proof. Lemma A.0.2 shows that, WHP, we only need to compute Bccp distances of WSPs of
constant size. Since we compute Bccp distances incrementally, WHP, the number of calls to
the GeoFilterKruskal procedure is also bounded above by O(1). In each of such calls,
the Filter function is called once, which in turn calls the Find(u) function of the UnionFind
data structure O(n) times. Hence, there are in total O(n) Find(u) operations done WHP.
Thus the overall running time of the Union() and Find() operations is O(α(n)n) WHP.
Theorem A.0.4. GeoFilterKruskal takes one sort plus O(n) additional steps, WHP,
to compute the GMST.
Proof. We partition the list of well separated pairs twice in the GeoFilterKruskal method.
The first time we do it based on the need to compute the Bccp of the well separated pair.
We have the sets El and Eu in the process. This takes O(n) time except for the Bccp
computation. In O(n) time we can find the pivot element of Eu for the next partition.
This partitioning also takes linear time. From Lemma A.0.2, we can infer that the recursive
call on GeoFilterKruskal is performed O(1) times WHP. Thus the total time spent in
partitioning is O(n) WHP. Since the total number of Bccp edges required to compute the
GMST is O(n), by Lemma A.0.2, the time spent in computing all such edges is O(n) WHP.
Total time spent in sorting the edges in the base case is O(n log n). Including the time to
compute the Morton order sort for the WSPD, the total running time of the algorithm is
one sort plus O(n) additional steps WHP.
BIBLIOGRAPHY
[1] Pankaj K. Agarwal, Herbert Edelsbrunner, Otfried Schwarzkopf, and Emo Welzl.
Euclidean minimum spanning trees and bichromatic closest pairs. In Proceedings of
the sixth annual symposium on Computational geometry, SCG ’90, pages 203–210,
New York, NY, USA, 1990. ACM.
[2] Pankaj K. Agarwal, Herbert Edelsbrunner, Otfried Schwarzkopf, and Emo Welzl.
Euclidean minimum spanning trees and bichromatic closest pairs. Discrete Comput.
Geom., 6(5):407–422, 1991.
[3] Pankaj K. Agarwal, Cecilia Magdalena Procopiuc, and Kasturi R. Varadarajan. Approximation algorithms for k-line center. In Proceedings of the 10th Annual European
Symposium on Algorithms, ESA ’02, pages 54–63, London, UK, 2002. Springer-Verlag.
[4] M. Alexa, J. Behr, D. Cohen-Or, S. Fleishman, D. Levin, and C. T. Silva. Point set
surfaces. In VIS ’01: Proceedings of the conference on Visualization ’01, pages 21–28,
Washington, DC, USA, 2001. IEEE Computer Society.
[5] Sanjeev Arora. Polynomial time approximation schemes for euclidean traveling salesman and other geometric problems. J. ACM, 45(5):753–782, 1998.
[6] S. Arya and D. Mount. Computational geometry: Proximity and location. In Dinesh Mehta and Sartaj Sahni, editors, Handbook of Data Structures and Applications,
chapter 63, pages 63–1, 63–22. CRC Press, 2005.
[7] S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Wu. An optimal
algorithm for approximate nearest neighbor searching in fixed dimensions. J. ACM,
45:891–923, 1998.
[8] Sunil Arya and Ho-Yam Addy Fu. Expected-case complexity of approximate nearest
neighbor searching. SIAM J. Comput., 32:793–815, March 2003.
[9] Sunil Arya, Theocharis Malamatos, David M. Mount, and Ka Chun Wong. Optimal
expected-case planar point location. SIAM J. Comput., 37(2):584–610, 2007.
[10] Franz Aurenhammer. Voronoi diagrams—a survey of a fundamental geometric data
structure. ACM Comput. Surv., 23(3):345–405, 1991.
[11] J.D. Barrow, S.P. Bhavsar, and D.H. Sonoda. Minimal spanning trees, filaments and
galaxy clustering. MNRAS, 216:17–35, Sept 1985.
[12] R. Baxter. Planar lattice gases with nearest-neighbor exclusion. Annals of Combinatorics, 3:191–203, 1999. 10.1007/BF01608783.
[13] Jon Louis Bentley. Multidimensional binary search trees used for associative searching.
Commun. ACM, 18(9):509–517, 1975.
[14] Jon Louis Bentley, Bruce W. Weide, and Andrew C. Yao. Optimal expected-time
algorithms for closest point problems. ACM Trans. Math. Softw., 6(4):563–580, 1980.
[15] Stefan Berchtold, Christian Böhm, H. V. Jagadish, Hans-Peter Kriegel, and Jörg
Sander. Independent quantization: An index compression technique for high-dimensional data spaces. In ICDE, pages 577–588, 1999.
[16] M. Bern. Approximate closest-point queries in high dimensions. Inf. Process. Lett.,
45(2):95–99, 1993.
[17] S. P. Bhavsar and R. J. Splinter. The superiority of the minimal spanning tree in
percolation analyses of cosmological data sets. MNRAS, 282:1461–1466, Oct 1996.
[18] Marcel Birn, Manuel Holtgrewe, Peter Sanders, and J. Singler. Simple and Fast
Nearest Neighbor Search. In 2010 Proceedings of the Twelfth Workshop on Algorithm
Engineering and Experiments, pages 43–54, 16 January 2010.
[19] Jean-Daniel Boissonnat, Olivier Devillers, Monique Teillaud, and Mariette Yvinec.
Triangulations in cgal (extended abstract). In SCG ’00: Proceedings of the sixteenth
annual symposium on Computational geometry, pages 11–18, New York, NY, USA,
2000. ACM.
[20] J.J. Brennan. Minimal spanning trees and partial sorting. Operations Research Letters, 1(3):138–141, 1982.
[21] P. B. Callahan and S. R. Kosaraju. A decomposition of multidimensional point
sets with applications to k-nearest-neighbors and n-body potential fields. J. ACM,
42(1):67–90, 1995.
[22] Paul B. Callahan. Dealing with higher dimensions: the well-separated pair decomposition and its applications. PhD thesis, Johns Hopkins University, Baltimore, MD,
USA, 1995.
[23] T. M. Chan. Approximate nearest neighbor queries revisited. In SCG ’97: Proceedings
of the thirteenth annual symposium on Computational geometry, pages 352–358, New
York, NY, USA, 1997. ACM.
[24] T. M. Chan. Manuscript: A minimalist’s implementation of an approximate nearest
neighbor algorithm in fixed dimensions, 2006.
[25] Timothy M. Chan. Well-separated pair decomposition in linear time? Inf. Process.
Lett., 107(5):138–141, 2008.
[26] J. Chhugani, B. Purnomo, S. Krishnan, J. Cohen, S. Venkatasubramanian, and D. S.
Johnson. vlod: High-fidelity walkthrough of large virtual environments. IEEE Transactions on Visualization and Computer Graphics, 11(1):35–47, 2005.
[27] U. Clarenz, M. Rumpf, and A. Telea. Finite elements on point based surfaces. In
Proc. EG Symposium of Point Based Graphics (SPBG 2004), pages 201–211, 2004.
[28] K. L. Clarkson. Fast algorithms for the all nearest neighbors problem. In FOCS ’83:
Proceedings of the Twenty-fourth Symposium on Foundations of Computer Science,
Tucson, AZ, November 1983. Included in PhD Thesis.
[29] K. L. Clarkson. Nearest-neighbor searching and metric space dimensions. In
G. Shakhnarovich, T. Darrell, and P. Indyk, editors, Nearest-Neighbor Methods for
Learning and Vision: Theory and Practice, pages 15–59. MIT Press, 2006.
[30] Kenneth L. Clarkson. An algorithm for geometric minimum spanning trees requiring
nearly linear expected time. Algorithmica, 4:461–469, 1989. Included in PhD Thesis.
[31] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press and McGraw-Hill, 2001.
[32] D. Cotting, T. Weyrich, M. Pauly, and M. Gross. Robust watermarking of point-sampled geometry. In SMI ’04: Proceedings of the Shape Modeling International
2004, pages 233–242, Washington, DC, USA, 2004. IEEE Computer Society.
[33] L. Dagum and R. Menon. Openmp: an industry standard api for shared-memory
programming. IEEE Computational Science and Engineering, 5(1):46–55, 1998.
[34] Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, second edition,
2000.
[35] B. Delaunay. Sur la sphère vide. Otdelenie Matematicheskikh i Estestvennykh Nauk,
7:793–800, 1934.
[36] Olivier Devillers. The Delaunay Hierarchy. International Journal of Foundations of
Computer Science, 13:163–180, 2002.
[37] M. T. Dickerson and R. S. Drysdale. Fixed-radius near neighbors search algorithms
for points and segments. Inf. Process. Lett., 35(5):269–273, 1990.
[38] M. T. Dickerson and D. Eppstein. Algorithms for proximity problems in higher dimensions. Computational Geometry Theory & Applications, 5(5):277–291, January
1996.
[39] R. A. Dwyer. Higher-dimensional voronoi diagrams in linear expected time. In SCG
’89: Proceedings of the fifth annual symposium on Computational geometry, pages
326–333, New York, NY, USA, 1989. ACM.
[40] Herbert Edelsbrunner, Leonidas J. Guibas, and Jorge Stolfi. Optimal point location
in a monotone subdivision. SIAM J. Comput., 15(2):317–340, 1986.
[41] Jeff Erickson. On the relative complexities of some geometric problems. In In Proc.
7th Canad. Conf. Comput. Geom, pages 85–90, 1995.
[42] Jeff Erickson. Nice point sets can have nasty delaunay triangulations. In Proceedings
of the seventeenth annual symposium on Computational geometry, SCG ’01, pages
96–105, New York, NY, USA, 2001. ACM.
[43] Jeffrey Gordon Erickson. Lower bounds for fundamental geometric problems. PhD
thesis, University of California, Berkeley, 1996. Chair-Seidel, Raimund.
[44] R. A. Finkel and J. L. Bentley. Quad trees a data structure for retrieval on composite
keys. Acta Informatica, 4(1):1–9, March 1974.
[45] S. Fleishman, D. Cohen-Or, and C. T. Silva. Robust moving least-squares fitting with
sharp features. ACM Trans. Graph., 24(3):544–552, 2005.
[46] Jerome H. Friedman, Jon Louis Bentley, and Raphael Ari Finkel. An algorithm for
finding best matches in logarithmic expected time. ACM Trans. Math. Softw., 3:209–
226, September 1977.
[47] K. Fukunaga and P. M. Narendra. A branch and bound algorithm for computing
k-nearest neighbors. IEEE Trans. Comput., 24:750–753, July 1975.
[48] Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Similarity search in high dimensions via hashing. In Proceedings of the 25th International Conference on Very
Large Data Bases, VLDB ’99, pages 518–529, San Francisco, CA, USA, 1999. Morgan
Kaufmann Publishers Inc.
[49] John Iacono. Optimal planar point location. In SODA ’01: Proceedings of the twelfth
annual ACM-SIAM symposium on Discrete algorithms, pages 340–341, Philadelphia,
PA, USA, 2001. Society for Industrial and Applied Mathematics.
[50] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing
the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on
Theory of computing, STOC ’98, pages 604–613, New York, NY, USA, 1998. ACM.
[51] D. R. Karger and M. Ruhl. Finding nearest neighbors in growth-restricted metrics.
In STOC ’02: Proceedings of the thirty-fourth annual ACM symposium on Theory of
computing, pages 741–750, New York, NY, USA, 2002. ACM.
[52] David G. Kirkpatrick. Optimal search in planar subdivisions. SIAM Journal on
Computing, 12(1):28–35, 1983.
[53] Philip N. Klein and Robert E. Tarjan. A randomized linear-time algorithm for finding
minimum spanning trees. In STOC ’94: Proceedings of the twenty-sixth annual ACM
symposium on Theory of computing, pages 9–15, New York, NY, USA, 1994. ACM.
[54] E. Kranakis, H. Singh, and J. Urrutia. Compass routing on geometric networks. In
Proc. of 11th Canadian Conference on Computational Geometry, pages 51–54, 1999.
[55] J.B. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman
problem. Proc. American Math. Society, 7(1):48–50, 1956.
[56] Joseph B. Kruskal. On the Shortest Spanning Subtree of a Graph and the Traveling
Salesman Problem. Proceedings of the American Mathematical Society, 7(1):48–50,
February 1956.
[57] Pankaj Kumar and Piyush Kumar. Almost optimal solutions to k-clustering problems.
Int. J. Comput. Geometry Appl., 20(4):431–447, 2010.
[58] Elmar Langetepe and Gabriel Zachmann. Geometric Data Structures for Computer
Graphics. A. K. Peters, Ltd., Natick, MA, USA, 2006.
[59] Swanwa Liao, Mario A. Lopez, and Scott T. Leutenegger. High dimensional similarity
search with space filling curves. In Proceedings of the 17th International Conference
on Data Engineering, pages 615–622, Washington, DC, USA, 2001. IEEE Computer
Society.
[60] David G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J.
Comput. Vision, 60:91–110, November 2004.
[61] Songrit Maneewongvatana and David M. Mount. Analysis of approximate nearest
neighbor searching with clustered point sets, 1999.
[62] Robert Mencl. A graph based approach to surface reconstruction. Computer Graphics
Forum, 14:445 – 456, 2008.
[63] A. Mhatre and P. Kumar. Projective clustering and its application to surface reconstruction: extended abstract. In SCG ’06: Proceedings of the twenty-second annual
symposium on Computational geometry, pages 477–478, New York, NY, USA, 2006.
ACM.
[64] Krystian Mikolajczyk and Jiri Matas. Improving Descriptors for Fast Tree Matching
by Optimal Linear Projection. In Proceedings of IEEE International Conference on
Computer Vision, pages 1–8, 2007.
[65] Stanley Milgram. The small world problem. Psychology Today, 1(1):60–67, 1967.
[66] N. J. Mitra and A. Nguyen. Estimating surface normals in noisy point cloud data. In
SCG ’03: Proceedings of the nineteenth annual symposium on Computational geometry, pages 322–328, New York, NY, USA, 2003. ACM.
[67] G. M. Morton. A computer oriented geodetic data base and a new technique in file
sequencing. In Technical Report,IBM Ltd, 1966.
[68] Rajeev Motwani and Prabhakar Raghavan. Randomized algorithms. ACM Comput.
Surv., 28(1):33–37, 1996.
[69] D. Mount. ANN: Library for Approximate Nearest Neighbor Searching, 1998. http://www.cs.umd.edu/~mount/ANN/.
[70] David Mount and Sunil Arya. Ann: A library for approximate nearest neighbor
searching. 1997.
[71] Marius Muja and David G. Lowe. Fast approximate nearest neighbors with automatic
algorithm configuration. In In VISAPP International Conference on Computer Vision
Theory and Applications, pages 331–340, 2009.
[72] K. Mulmuley. A fast planar partition algorithm, ii. In SCG ’89: Proceedings of the
fifth annual symposium on Computational geometry, pages 33–43, New York, NY,
USA, 1989. ACM.
[73] Giri Narasimhan and Martin Zachariasen. Geometric minimum spanning trees via
well-separated pair decompositions. J. Exp. Algorithmics, 6:6, 2001.
[74] David Nistér and Henrik Stewénius. Scalable recognition with a vocabulary tree. In
IN CVPR, pages 2161–2168, 2006.
[75] D. Omercevic, O. Drbohlav, and A. Leonardis. High-dimensional feature matching:
Employing the concept of meaningful nearest neighbors. In Computer Vision, 2007.
ICCV 2007. IEEE 11th International Conference on, pages 1 –8, 2007.
[76] J. A. Orenstein and T. H. Merrett. A class of data structures for associative searching. In PODS ’84: Proceedings of the 3rd ACM SIGACT-SIGMOD symposium on
Principles of database systems, pages 181–190, New York, NY, USA, 1984. ACM.
[77] Vitaly Osipov, Peter Sanders, and Johannes Singler. The filter-kruskal minimum
spanning tree algorithm. In Irene Finocchi and John Hershberger, editors, ALENEX,
pages 52–61. SIAM, 2009.
[78] R. Pajarola. Stream-processing points. In Proceedings IEEE Visualization, 2005,
Online., pages 239–246. Computer Society Press, 2005.
[79] M. Pauly, M. Gross, and L. P. Kobbelt. Efficient simplification of point-sampled
surfaces. In VIS ’02: Proceedings of the conference on Visualization ’02, pages 163–
170, Washington, DC, USA, 2002. IEEE Computer Society.
[80] M. Pauly, R. Keiser, L. P. Kobbelt, and M. Gross. Shape modeling with point-sampled
geometry. ACM Trans. Graph., 22(3):641–650, 2003.
[81] K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(6):559–572, 1901.
[82] Franco P. Preparata and Michael I. Shamos. Computational geometry: an introduction. Springer-Verlag New York, Inc., New York, NY, USA, 1985.
[83] Felix Putze, Peter Sanders, and Johannes Singler. Mcstl: the multi-core standard
template library. In PPoPP ’07: Proceedings of the 12th ACM SIGPLAN symposium
on Principles and practice of parallel programming, pages 144–145, New York, NY,
USA, 2007. ACM.
[84] Sanguthevar Rajasekaran. On the euclidean minimum spanning tree problem. Computing Letters, 1(1), 2004.
[85] Hanan Samet. Applications of spatial data structures: Computer graphics, image
processing, and GIS. Addison-Wesley Longman Publishing Co., Inc., Boston, MA,
USA, 1990.
[86] J. Sankaranarayanan, H. Samet, and A. Varshney. A fast all nearest neighbor algorithm for applications involving large point-clouds. Comput. Graph., 31(2):157–174,
2007.
[87] Cordelia Schmid and Roger Mohr. Local grayvalue invariants for image retrieval.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 19:530–535, 1997.
[88] Raimund Seidel. A simple and fast incremental randomized algorithm for computing
trapezoidal decompositions and for triangulating polygons. Comput. Geom. Theory
Appl., 1(1):51–64, 1991.
[89] J. S. Semura and D. L. Huber. Low-temperature behavior of the planar heisenberg
ferromagnet. Phys. Rev. B, 7(5):2154–2162, Mar 1973.
[90] Michael Ian Shamos and Dan Hoey. Closest-point problems. In SFCS ’75: Proceedings
of the 16th Annual Symposium on Foundations of Computer Science, pages 151–162,
Washington, DC, USA, 1975. IEEE Computer Society.
[91] Alok Sharma and Kuldip K. Paliwal. Fast principal component analysis using fixed-point algorithm. Pattern Recogn. Lett., 28:1151–1155, July 2007.
[92] Jonathan Richard Shewchuk. Triangle: Engineering a 2D Quality Mesh Generator
and Delaunay Triangulator. In Ming C. Lin and Dinesh Manocha, editors, Applied
Computational Geometry: Towards Geometric Engineering, volume 1148 of Lecture
Notes in Computer Science, pages 203–222. Springer-Verlag, May 1996. From the
First ACM Workshop on Applied Computational Geometry.
[93] C. Silpa-Anan and R. Hartley. Optimised KD-trees for fast image descriptor matching.
In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference
on, pages 1–8, June 2008.
[94] Peter Su and Robert L. Scot Drysdale. A comparison of sequential delaunay triangulation algorithms. In SCG ’95: Proceedings of the eleventh annual symposium on
Computational geometry, pages 61–70, New York, NY, USA, 1995. ACM.
[95] Yufei Tao, Ke Yi, Cheng Sheng, and Panos Kalnis. Efficient and accurate nearest
neighbor and closest pair search in high-dimensional space. ACM Trans. Database
Syst., 35:20:1–20:46, July 2010.
[96] H. Tropf and H. Herzog. Multidimensional range search in dynamically balanced
trees. Angewandte Informatik, 2:71–77, 1981.
[97] P. Tsigas and Y. Zhang. A simple, fast parallel implementation of quicksort and its
performance evaluation on SUN Enterprise 10000. Eleventh Euromicro Conference
on Parallel, Distributed and Network-Based Processing, 00:372, 2003.
[98] P. M. Vaidya. An O(n log n) algorithm for the all-nearest-neighbors problem. Discrete
Comput. Geom., 4(2):101–115, 1989.
[99] Roger Weber, Hans-Jörg Schek, and Stephen Blott. A quantitative analysis and
performance study for similarity-search methods in high-dimensional spaces. In Proceedings of the 24th International Conference on Very Large Data Bases, VLDB ’98,
pages 194–205, San Francisco, CA, USA, 1998. Morgan Kaufmann Publishers Inc.
[100] C.T. Zahn. Graph-theoretical methods for detecting and describing gestalt clusters.
Transactions on Computers, C-20(1):68–86, 1971.
BIOGRAPHICAL SKETCH
Michael Connor began study at Florida State University in 2002, in the Department of
Computer Science. He was awarded the degree of Bachelor of Science from the computer
science program in 2004. He was admitted to the graduate program, and continued his
study in the department. Under the advisement of Dr. Piyush Kumar, he was awarded the
degree of Master of Science in 2007.