Florida State University Libraries
Electronic Theses, Treatises and Dissertations
The Graduate School
2011

Algorithms for Solving Near Point Problems
Michael Connor

Follow this and additional works at the FSU Digital Library. For more information, please contact [email protected]

THE FLORIDA STATE UNIVERSITY
COLLEGE OF ARTS AND SCIENCES

ALGORITHMS FOR SOLVING NEAR POINT PROBLEMS

By
MICHAEL CONNOR

A Dissertation submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Degree Awarded: Spring Semester, 2011

The members of the committee approve the dissertation of Michael Connor defended on April 4, 2011.

Piyush Kumar, Professor Directing Thesis
Washington Mio, University Representative
Feifei Li, Committee Member
Xiuwen Liu, Committee Member

Approved: David Whalley, Chair, Department of Computer Science

The Graduate School has verified and approved the above-named committee members.

TABLE OF CONTENTS

List of Tables
List of Figures
Abstract

1 Introduction
  1.1 Problem Definitions
    1.1.1 k-Nearest Neighbor Graphs
    1.1.2 Nearest Neighbor Searching
      Nearest Neighbor Searching in the Plane
      High Dimensional Nearest Neighbor Searching
    1.1.3 Geometric Minimum Spanning Trees
  1.2 Organization

2 Background
  2.1 Quadtrees
  2.2 Morton Ordering
  2.3 KD-Trees
    2.3.1 Advanced Splitting Rules
  2.4 Kruskal's MST Algorithm and Union Find
    2.4.1 Well Separated Pair Decomposition
  2.5 Bi-chromatic Closest Pair Computation
  2.6 Voronoi Cell Diagrams
  2.7 Delaunay Graphs
    2.7.1 Principal Component Analysis

3 k-Nearest Neighbor Graphs
    3.0.2 Notation
  3.1 The knng Algorithm
    3.1.1 Preprocessing and Morton Ordering On Floating Points
    3.1.2 Sliding Window Phase
    3.1.3 Search Phase
    3.1.4 Parallel Construction
    3.1.5 Handling large data sets
  3.2 Analysis of the knng Algorithm
  3.3 Experimental Analysis of the knng Algorithm
    3.3.1 ANN
    3.3.2 Experimental Data
    3.3.3 Intel Architecture
      Construction Time Results
    3.3.4 AMD Architecture
    3.3.5 Sun Architecture

4 Planar Nearest Neighbor Search
    Notation
  4.1 The Algorithm
  4.2 Analysis of DelaunayNN
  4.3 Experimental Analysis
    4.3.1 FDH Algorithm
    4.3.2 Data Distributions
    4.3.3 Experimental Results
  4.4 3-Dimensional Experiments

5 Geometric Minimum Spanning Trees
  5.1 Notation
  5.2 The GeoFilterKruskal algorithm
  5.3 Correctness
  5.4 Analysis of the Running Time
  5.5 Parallelization
    5.5.1 Comparing Quadtrees and Fair Split Trees
  5.6 Geometric Minimum Spanning Tree Experimental Setup
  5.7 Geometric Minimum Spanning Tree Experimental Results

6 Nearest Neighbor Search of High Dimensional SIFT Data
    6.0.1 Notation
  6.1 The PCANN Algorithm
  6.2 High Dimensional Nearest Neighbor Search Experimental Setup
    6.2.1 FLANN
    6.2.2 LSB-Tree
    6.2.3 Experiment Data
  6.3 High Dimensional Nearest Neighbor Search Experimental Results

7 Conclusions

A Full Proof of the GMST Algorithm Analysis
    High Probability Bound Analysis

Bibliography
Biographical Sketch

LIST OF TABLES

3.1 Construction times for k = 1-nearest neighbor graphs constructed on non-random 3-dimensional data sets. Each graph was constructed using 8 threads on Intel architecture. All timings are in seconds.
5.1 Algorithm Descriptions
5.2 Point Distribution Info

LIST OF FIGURES

1.1 A 2-nearest neighbor graph built on a two dimensional point set.
1.2 A simple example of nearest neighbor searching. Here, the query is in green. The red point marks the answer to the query, with the rest of the data points in blue.
1.3 A geometric minimum spanning tree built over a two dimensional point set.
2.1 The Morton order curve preceding the upper left corner, and following the lower right corner of a quadtree hyper-cube, will never re-enter the region.
2.2 The smallest quadtree hypercube containing two points will also contain all points lying between the two in Morton order.
2.3 A KD-tree constructed over a two dimensional point set.
2.4 A Voronoi Cell Diagram built over a two dimensional point set.
2.5 A Delaunay graph constructed over a two dimensional point set.
2.6 Compass routing on a Delaunay graph to locate a nearest neighbor.
3.1 Pictorial representation of Algorithm 4, Line 5. Since all the points inside the approximate nearest neighbor ball of p_i have been scanned, the nearest neighbor has been found. This happens because p_i^⌈rad(A_i)⌉ is the largest point with respect to the Morton ordering compared to any point inside the box. Any point greater than p_{i+k} in Morton ordering cannot intersect the box shown. A similar argument holds for p_i^{-⌈rad(A_i)⌉} and p_{i-k}.
3.2 (a) B lands cleanly on the quadtree box B_Q twice its size. (b) B lands cleanly on a quadtree box 2^2 times its size. In both figures, if the upper left corner of B lies in the shaded area, the box B does not intersect the boundary of B_Q. Obviously, (a) happens with probability 1/4 (probability (1/2)^d in general dimension) and (b) happens with probability ((2^2 - 1)/2^2)^2 = 9/16 (probability ((2^2 - 1)/2^2)^d in general dimension).
3.3 All boxes referred to in Lemma 3.2.3.
3.4 Graph of 1-NN graph construction time vs. number of data points on the Intel architecture. Each algorithm was run using 8 threads in parallel.
3.5 Graph of k-NN graph construction time for varying k on the Intel architecture. Each algorithm was run using 8 threads in parallel. Data sets contained one million points.
3.6 Graph of 1-NN graph construction time for varying number of threads on Intel architecture. Data sets contained ten million points.
3.7 Graph of memory usage per point vs. data size on Intel architecture. Memory usage was determined using valgrind.
3.8 Graph of cache misses vs. data set size on Intel architecture. All data sets were uniformly random 3-dimensional data sets. Cache misses were determined using valgrind which simulated a 2 MB L1 cache.
3.9 Graph of 1-NN graph construction time vs. number of data points on AMD architecture. Each algorithm was run using 16 threads in parallel.
3.10 Graph of k-NN graph construction time for varying k on AMD architecture. Each algorithm was run using 16 threads in parallel. Data sets contained ten million points.
3.11 Graph of 1-NN graph construction time for varying number of threads on AMD architecture. Data sets contained ten million points.
3.12 Graph of 1-NN graph construction time vs. number of data points on Sun architecture. Each algorithm was run using 128 threads in parallel.
3.13 Graph of k-NN graph construction time for varying k on Sun architecture. Each algorithm was run using 128 threads in parallel. Data sets contained ten million points.
3.14 Graph of 1-NN graph construction time for varying number of threads on Sun architecture. Data sets contained ten million points.
4.1 The three layers of the query algorithm.
4.2 Here we see the center vertex p, and the sectors defined by rays passing through the Voronoi vertices. To find a nearer neighbor, we locate which sector the query lies in (via a binary search), then check the distance to the adjacent point that lies in that sector.
4.3 Two degenerate cases for linear degree vertices.
4.4 Proof of Lemma 4.2.1.
4.5 Showing average time per query versus data set size for points taken from uniform distribution.
4.6 Showing average time per query versus data set size for points taken from a unit circle. Query points were taken from the smallest square enclosing the circle.
4.7 Showing average query time versus data set size for points taken from the unit circle, plus some points chosen uniformly at random from the square containing the circle. Query points taken from the circle.
4.8 Showing average time per query versus data set size for points taken from a parabola. Query points were taken from the smallest rectangle enclosing the parabola.
4.9 Showing average time per query versus data set size for points taken from a distribution with linear degree vertices. Note that ANN is faster than the Delaunay algorithm run without the Voronoi cell search capability.
4.10 Showing average time per point to pre-process data sets for queries. Data was taken uniformly at random from the unit square.
4.11 Showing average time per query versus data set size for points taken from uniform distribution in three dimensions.
4.12 Showing average time per query versus data set size for points taken from the circular distribution in three dimensions.
4.13 Showing average time per query versus data set size for points taken from the "fuzzy" circular distribution in three dimensions.
4.14 Showing average time per query versus data set size for points taken from the parabolic distribution in three dimensions.
4.15 Showing average time per query versus data set size for points taken from a distribution with linear degree vertices. In this case, ANN wins due to the necessity of processing the linear degree vertices by sequential scan.
5.1 This figure demonstrates the run time gains of the algorithm as more threads are used. Scaling for two architectures. The AMD line was computed on the machine described in Section 5.6. The Intel line used a machine with four 2.66GHz Intel(R) Xeon(R) CPU X5355, 4096 KB of L1 cache, and 8GB total memory. For additional comparison, we include KNNG construction time using a parallel 8-nearest neighbor graph algorithm. All cases were run on 20 data sets from uniform random distribution of size 10^6 points; the final total run time is an average of the results.
5.2 Comparison of the number of WSPs versus the number of points for quadtrees and fair split trees.
5.3 Comparison of the number of points versus GMST construction time for quadtrees and fair split trees.
5.4 Separation factor of WSPs versus the number of WSPs produced.
5.5 Separation of WSPs versus the error in the length of the GMST.
5.6 Total running time for each algorithm over varying sized data sets. Data was uniformly random and two dimensional.
5.7 Total running time for each algorithm over varying sized data sets. Data was uniformly random and three dimensional.
5.8 Total running time for each algorithm over varying sized data sets. Data was uniformly random and four dimensional.
5.9 Showing average error and standard deviation when comparing GeoMST to GeoFK1 on varying distributions. Data size was 10^6 points in two dimensions.
5.10 Showing average error and standard deviation when comparing GeoMST2 to GeoFK1 on varying distributions. Data size was 10^6 points in two dimensions.
5.11 Showing average error and standard deviation when comparing Triangle to GeoFK1 on varying distributions. Data size was 10^6 points in two dimensions.
6.1 Timing results for PCANN versus FLANN on SIFT data.
6.2 Timing results for PCANN versus LSB-Tree on SIFT data.
6.3 Average error for PCANN, FLANN and LSB-Tree on SIFT data.
6.4 Timing results for PCANN versus FLANN on uniform random data.
6.5 Timing results for PCANN versus LSB-Tree on uniform random data.
6.6 Average error for PCA projection for SIFT and uniform random data. Average is from 1000 queries on 100000 data points.
6.7 Timing results for sequential PCANN versus parallel on SIFT data.

ABSTRACT

Near point problems are widely used in computational geometry as well as a variety of other scientific fields. This work examines four common near point problems and presents original algorithms that solve them.

Planar nearest neighbor searching is highly motivated by geographic information system and sensor network problems. Efficient data structures for near neighbor queries in the plane can exploit the extremely low dimension for fast results. To this end, DelaunayNN is an algorithm using Delaunay graphs and Voronoi cells to answer queries in O(log n) time, faster in practice than other common state-of-the-art algorithms.
k-Nearest neighbor graph construction arises in computer graphics in areas of normal estimation and surface simplification. This work presents knng, an efficient algorithm using Morton ordering to solve the problem. The knng algorithm exploits cache coherence and low storage space, and is highly amenable to optimization for parallel processors.

The GeoFilterKruskal algorithm solves the problem of computing geometric minimum spanning trees. A common tool in tackling clustering problems, GMSTs are an extension of the minimum spanning tree graph problem, applied to the complete graph of a point set. By using well separated pair decomposition, bi-chromatic closest pair computation, and partitioning and filtering techniques, GeoFilterKruskal greatly reduces the total computation required. It is also one of the few algorithms to compute GMSTs in a manner that lends itself to parallel computation, a major advantage over its competitors.

High dimensional nearest neighbor searching is an expensive operation, due to the exponential dependence on dimension of many lower dimensional solutions. Modern techniques to solve this problem often revolve around projecting data points into a large number of lower dimensional subspaces. PCANN explores the idea of picking one particularly relevant subspace for projection. When used on SIFT data, principal component analysis allows for greatly reduced dimension with no need for multiple projections. Additionally, this algorithm is also well suited to making use of parallel computing power.

CHAPTER 1

INTRODUCTION

Many problems in computational geometry, particularly those dealing with point clouds, use near point queries or data structures as building blocks to an eventual solution. As such, the practical efficiency of these building blocks becomes an important factor in the design of the solution. This work will examine several computational geometry problems in an effort to expand the toolbox of near neighbor problem solvers. One particular aspect motivating this work is a lack of parallel friendly algorithms. Computer architectures increasingly provide multiple processing units, and algorithms designed to take advantage of this processing power will have a growing advantage.

In this introductory chapter, four distinct problems in computational geometry will be defined. In addition, common uses for these problems will be identified, and (hopefully) the reasons to seek more practical algorithms for solving them will be made clear. For each of these problems, the goal will be three-fold. First, design an algorithm that can be competitive with the current state of the art. Second, complete a theoretical analysis of the algorithm's runtime. The obvious goal is to improve the running time when compared to current methods, or to match the running time with fewer theoretical restrictions on the data (such as assuming a uniform distribution). Finally, algorithms should be implemented in the most efficient manner, including parallel implementation where possible. This will allow for a practical comparative analysis with other implementations via experimentation, as well as allowing other interested parties to replicate experimental results.

1.1 Problem Definitions

1.1.1 k-Nearest Neighbor Graphs

A k-nearest neighbor graph is defined for a set of points by creating an edge from every point to the k nearest points in the data set. More formally, let P = {p_1, p_2, ..., p_n} be a point cloud in R^d. For each p_i ∈ P, let N_i^k be the k points in P closest to p_i. The k-nearest neighbor graph is a graph with vertex set {p_1, p_2, ..., p_n} and edge set E = {(p_i, p_j) : p_i ∈ N_j^k or p_j ∈ N_i^k}.
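To make the definition concrete, the sketch below computes the neighbor sets N_i^k by brute force. It is only an illustrative baseline (the naive quadratic approach discussed again in Chapter 3), written in C++ with hypothetical names such as Point and knn_graph; it is not the knng implementation developed later in this work.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical point type: d coordinates stored in a vector.
using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double sq_dist(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) {
        double diff = a[j] - b[j];
        s += diff * diff;
    }
    return s;
}

// Brute-force k-nearest neighbor lists: for every point, the indices of its
// k closest neighbors.  Roughly O(n^2 log k) time; the baseline that faster
// constructions aim to beat.
std::vector<std::vector<std::size_t>> knn_graph(const std::vector<Point>& P,
                                                std::size_t k) {
    std::vector<std::vector<std::size_t>> nbrs(P.size());
    for (std::size_t i = 0; i < P.size(); ++i) {
        std::vector<std::size_t> idx;
        for (std::size_t j = 0; j < P.size(); ++j)
            if (j != i) idx.push_back(j);
        std::size_t m = std::min(k, idx.size());
        std::partial_sort(idx.begin(), idx.begin() + m, idx.end(),
            [&](std::size_t a, std::size_t b) {
                return sq_dist(P[i], P[a]) < sq_dist(P[i], P[b]);
            });
        idx.resize(m);
        nbrs[i] = idx;
    }
    return nbrs;
}
```

Taking the union of the edges (p_i, p_j) with p_j ∈ N_i^k or p_i ∈ N_j^k over all i yields exactly the edge set E defined above.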
In the problem of surface reconstruction, one is given a set of points that are known to be the vertices of a triangulation of the surface. The goal of the surface reconstruction problem is to reconstruct that triangulation from the set of points. One method of solving the problem is to consider local sections of the surface, as defined by the k-nearest neighbors of all the points. Then, one can use an algorithm to find fittings for that neighborhood [63]. By identifying how well certain geometric shapes fit the neighborhood, they can be used to reconstruct difficult regions of the point cloud, such as corners, edges, and other features. In this instance, a fast algorithm to find k-nearest neighbors for every point in the set allows for fast identification of the neighborhoods, and more time to be spent on fitting the features.

Figure 1.1: A 2-nearest neighbor graph built on a two dimensional point set.

Additionally, the problem of k-nearest neighbor graph construction arises in many other applications and areas including computer graphics, visualization, pattern recognition, computational geometry and geographic information systems. In graphics and visualization, computation of the k-nearest neighbor graph forms a basic building block in solving many important problems including normal estimation [66], surface simplification [79], finite element modeling [27], shape modeling [80], watermarking [32], virtual walkthroughs [26] and surface reconstruction [4, 45].

1.1.2 Nearest Neighbor Searching

Define the fundamental nearest neighbor problem as follows: given a set P of n points in R^d, and a query point q (also in R^d), output p ∈ P such that the distance d_ℓ(q, p) is minimized over all p ∈ P. Nearest neighbor search is a fundamental geometric problem important in a variety of applications including data mining, machine learning, pattern recognition, computer vision, graphics, statistics, bio-informatics, and data compression [29, 6].

Nearest Neighbor Searching in the Plane. In this instance of the problem, the focus is on efficient methods for solving the problem for d = 2. This is an important distinction, as nearest neighbor searching in low dimensions is much less difficult than in higher dimensions. Planar nearest neighbor search is one of the iconic nearest neighbor problems, originally presented as the "post office problem" by Knuth in the 1970s. To wit, when looking at a map, how do you decide which is the closest post office to your location?

Figure 1.2: A simple example of nearest neighbor searching. Here, the query is in green. The red point marks the answer to the query, with the rest of the data points in blue.

With the advent of interactive maps available on any number of devices, this problem has re-emerged as a fundamental tool in dynamic route planning. Applications of the nearest neighbor problem in the plane are also highly motivated by problems in geographic information systems [85], particle physics [89] and chemistry [12].

High Dimensional Nearest Neighbor Searching. High dimensional nearest neighbor search is not fundamentally different from lower dimensions. However, nearest neighbor searching suffers from what is known as the "curse of dimensionality": as the dimension of the data points grows linearly, the running time of the algorithm can grow exponentially.
Because of this exponential dependence on dimension inherent to most nearest neighbor algorithms, different approaches are required.

The scale-invariant feature transform (SIFT) is an algorithm in computer vision designed to identify objects. "Interesting" features are identified in a training image, and stored in such a way as to create a description of the object. In order to recognize an unknown object as being similar to one previously identified, these feature descriptions must be compared via nearest neighbor searching. There are two aspects to consider when optimizing nearest neighbor searches for this type of use. The faster nearest neighbor features can be identified, the faster a vision system based on this sort of data can process images, up to or surpassing real time identification. In addition, more efficient methods can allow for higher dimensional points to be processed, allowing for more intricate features to be stored and used. High dimensional nearest neighbor searching is a common problem in other aspects of computer vision as well, where it is used for feature matching [75], object recognition [87, 74], and image retrieval [60].

1.1.3 Geometric Minimum Spanning Trees

The minimum spanning tree of a graph seeks to minimize the weight of the graph edges connecting all vertices. In the geometric problem, the task is to find the minimum weight spanning tree of the complete undirected graph over a set of points, with the weight of an edge given by the distance between its endpoints. Formally, given a set of n points, P, in R^d, output a graph G = (P, E) such that

    Σ_{e ∈ E} |e|

is minimized.

Figure 1.3: A geometric minimum spanning tree built over a two dimensional point set.

Computing the geometric minimum spanning tree (GMST) is a classic computational geometry problem, but it frequently arises in other fields. Actin is a protein found in the muscle cells of animals. Understanding how Actin interacts with cell structures is a key factor in understanding cell locomotion and morphology. Digital imaging interpretation of two dimensional cell images greatly speeds up the process of identifying these proteins. GMSTs can be used in this sort of interpretation to rapidly identify these proteins, alleviating the need for tedious visual scans by a human, by creating meaningful connections in these images that can be matched to known protein structures. GMST computation arises in many other applications as well, including clustering and pattern classification [100], surface reconstruction [62], cosmology [11, 17], TSP approximations [5] and computer graphics [58].

1.2 Organization

In chapter 2, background information on many tools used to create algorithms to solve these problems will be discussed. Subsequent chapters will deal specifically with each problem in turn, providing background on the current state-of-the-art, a new algorithm design, theoretical analysis and extensive experimentation.

CHAPTER 2

BACKGROUND

In the previous chapter, four near point problems were identified, and practical motivations for designing more efficient algorithms were discussed. Nearest neighbor search, k-nearest neighbor graphs, and geometric minimum spanning trees are all well established problems in computational geometry, and just as they are tools to solve more complex problems, there exists a toolbox of useful structures to solve them.
In this chapter, a number of techniques will be discussed that are key components in the algorithms proposed in subsequent chapters. These tools are presented as black boxes, with an eye towards understanding their input and output as opposed to their inner workings. For a more thorough analysis, references to classic works on these problems have been provided.

2.1 Quadtrees

One of the first types of spatial subdivision trees, the quad-tree, and an algorithm to construct it, was introduced in 1975 by J. Bentley [44]. A quad-tree is defined by partitioning space into 2^d equal, axis-aligned hyper-cubes. Each subspace is then further divided, until all points in the data set are contained within their own hyper-cube. Naively, it can be constructed in O(n log n) time by first sorting all input points according to their Morton order (see 2.2), then computing the smallest quad-tree boundary lying between any two adjacent points. This constructs a tree containing O(n) elements. However, it has since been shown that an equivalent tree can be constructed in O(n) time, by reducing the problem to a Cartesian tree [25].

Nearest neighbor algorithms based on quad-trees have been theorized, with expected running times similar to those based on other trees, such as kd-trees (see 2.3). In general, they are thought to be less efficient in practice [13] [23]. In chapter 3 an algorithm for constructing k-nearest neighbor graphs using quadtrees is presented.

2.2 Morton Ordering

Morton order is a space filling curve. It reduces points in R^d to a one dimensional space, while attempting to preserve locality. The Morton order curve can be conceptually achieved by building a quadtree, then recursively ordering the hyper-cubes. Morton order curves are sometimes called Z curves because these hyper-cubes are ordered so as to form a Z. In practice, Morton order can be determined by computing the Z-value for all the data points, then ordering them. The Z-value is a bitwise operation on the coordinates of a point: the bits of the coordinate values are interleaved, and the resulting number is the Z-value. By constructing the curve in this way, we can implicitly define a quadtree.

Chan [24] showed that the relative Morton order of two integer points can be easily calculated, by determining which pair of coordinates has its first differing bit in the largest binary place. He further showed that this can be accomplished using a few binary operations. Using this method, Morton order can be reduced to sorting. In chapter 3.1.1 a method for computing relative Morton order on floating points is presented.

Morton order curves have two simple properties that are useful in nearest neighbor searching. The first, shown in Figure 2.1, is that the curve does not "double back" on itself. Once the Morton order leaves a hyper-cube of the quadtree, it will not intersect that hyper-cube again.

Figure 2.1: The Morton order curve preceding the upper left corner, and following the lower right corner of a quadtree hyper-cube, will never re-enter the region.

The second property, shown in Figure 2.2, states that a quadtree hyper-cube containing two points on the Morton order curve will also contain all points which lie on the curve between the original two. In chapter 3, an algorithm for construction of k-nearest neighbor graphs using Morton order curves is presented.

2.3 KD-Trees

As an improvement over the quad-tree, Bentley introduced the kd-tree [13]. This is another spatial decomposition tree, which relaxes the requirement of the quad-tree that all regions be divided equally. Instead, areas are divided into axis-aligned hyper-rectangles.
The method for choosing the "splitting" point varies, but the goal is always to divide the points as evenly as possible, while simultaneously attempting to keep the ratio of the side lengths as low as possible.

Figure 2.2: The smallest quadtree hypercube containing two points will also contain all points lying between the two in Morton order.

Figure 2.3: A KD-tree constructed over a two dimensional point set.

A common method for choosing the split is to rotate through the dimensions of the point set. That is, the first level of the tree is formed by sorting points according to their x coordinate, then cutting it with an orthogonal plane at the median point. The next level would be split similarly on the y axis, and so on. It is worth mentioning that the run-time of nearest neighbor algorithms based on these trees is related to both the height of the tree and the aspect ratio of the hyper-rectangles. If the rectangles are too skinny, nearest neighbor balls will intersect more than a constant number of them, thus inflating the running time.

One improved method for construction of kd-trees modifies the way the splitting plane is chosen. Instead of merely rotating through the dimensions, the splitting dimension at each internal node is determined by the "spread" of points in that dimension. This, in practice, helps to bound the aspect ratio of the hyper-rectangles. This is accepted as the standard kd-tree splitting rule, and guarantees a tree of height O(log n) [46]. It does not, however, ensure that the hyper-rectangles are well-formed (having a constant factor aspect ratio). Other splitting rules have been introduced to address this problem, and are described below.

2.3.1 Advanced Splitting Rules

Many improvements have been made to the implementation of KD-trees, based on the way the hyper-rectangles are split. Some of these include the Midpoint Rule [46], the Sliding Midpoint Rule [61], the Balance Split Rule [8] and the Hybrid Split Rule [8].

The Midpoint Rule dictates that rather than split the hyper-rectangles along the dimension of greatest spread, they should be split by a plane orthogonal to the midpoint of the longest side. This begins to address the need to keep the aspect ratio a constant, but can lead to a tree of arbitrary height, as well as empty internal nodes.

The Sliding Midpoint Rule is a practical modification of the Midpoint Rule, introduced by Arya and Mount in their ANN nearest neighbor library [70]. In the case of an empty hyper-rectangle, the split is shifted to lie on the nearest point contained in the hyper-rectangle. This eliminates the risk of empty nodes. It can be shown that while the height of this tree could still be as bad as O(n), nearest neighbor searches done on it work very well in practice, and have an expected running time of O(log n) [8].

The Balance Split Rule is a further relaxation of the Sliding Midpoint Rule. Again, the longest side of the hyper-rectangle is chosen to be split. This time, however, an orthogonal splitting plane is chosen so that the number of points on either side is roughly balanced. This yields an expected query time of O(log n), and has a worst case query time of O(log^d n).
The Hybrid Split Rule alternates between the Sliding Midpoint Rule and the Balance Split Rule at different levels of the tree in an attempt to limit both the aspect ratio and the height of the tree. This rule yields a tree with a height of O(log n), while still having rectangles of bounded aspect ratio, which gives an expected running time of O(log^d n).

2.4 Kruskal's MST Algorithm and Union Find

Published in 1956, Kruskal's algorithm [56] finds the minimum weight spanning tree of a graph. It begins by placing the edges of the graph into a priority queue, ordered by weight. Then it builds a structure (commonly called Union Find) which will identify edges in terms of connected components. Edges are removed from the head of the priority queue and inserted into the MST if they will not create a cycle. This proceeds until n - 1 edges have been added (where n is the number of vertices in the graph).

While this algorithm was designed to work for arbitrary graphs, it can be used to find the MST built on a set of points. In this case, the original graph is considered to be the complete graph of the point set, and the weight of an edge to be the Euclidean distance between its endpoints. However, since there are O(n^2) edges in the complete graph, this is an expensive approach. Sections 2.4.1 and 2.5 describe tools used to improve the viability of this method, and chapter 5 details a new algorithm for efficiently using Kruskal's method to find geometric minimum spanning trees.

2.4.1 Well Separated Pair Decomposition

Proposed by Callahan, the well-separated pair decomposition uses a spatial decomposition tree to create a simplified representation of the complete graph of a point set. In essence, the n^2 edges of the complete graph are represented by O(n) components. Formally, given a spatial decomposition tree Q, built on a set of points P in R^d, the well separated pair decomposition (WSPD) of the tree consists of all pairs of nodes (a_1, b_1), ..., (a_m, b_m) such that for every point p ∈ a and every point q ∈ b, d_ℓ(a, b) < γ d_ℓ(p, q). γ is known as the separation factor. In order for pairs to be truly "well separated", γ must be ≥ √2. In addition to being separated, every pair of points (p, q) should appear in exactly one well separated pair (WSP). Construction of the WSPD can be executed on a quadtree or fair split tree in O(n) time, and Callahan proved that construction will yield O(n) WSPs. Note that while the process to construct the WSPD is identical regardless of the tree used, the number of WSPs can vary by a substantial amount (although within a constant factor). Section 2.5 describes bi-chromatic closest pair computation, which can be used along with the WSPD to compute geometric minimum spanning trees. A new algorithm using these methods is presented in chapter 5.

2.5 Bi-chromatic Closest Pair Computation

Given two sets of points, A (colored red) and B (colored green), the bi-chromatic closest pair (BCCP) of (A, B) is defined as the minimum weight edge with endpoints p ∈ A and q ∈ B. Callahan showed that given a WSPD, the geometric minimum spanning tree is a subset of its BCCP edges. In fact, GMST computation can be reduced to the computation of the BCCPs for a WSPD. While faster BCCP algorithms exist [1], in practice a simple quadratic algorithm for finding BCCPs is typically used in GMST construction. This is due to the fact that competitive algorithms for GMST seek to minimize the number of BCCP computations necessary, and, in most cases, rarely have to compute BCCPs on WSPs of greater than a constant size. A simple, recursive BCCP algorithm is presented as Algorithm 1.
Given a WSP and nodes from the tree, a and b, we compute the distance between the nodes, and recurse if that distance is less than our current minimum. This is repeated until we have a minimum distance between one pair of points, p ∈ a and q ∈ b.

Algorithm 1 BCCP Algorithm [73]: Compute {p′, q′, δ′} = BCCP(a, b [, {p, q, δ} = η])
Require: a, b ∈ Q, δ ∈ R+
Require: If {p, q, δ} is not specified, {p, q, δ} = η, an upper bound on BCCP(a, b).
1: procedure BCCP(a, b [, {p, q, δ} = η])
2:   if |a| = 1 and |b| = 1 then
3:     Let p′ ∈ a, q′ ∈ b
4:     if d_ℓ(p′, q′) < δ then
5:       return {p′, q′, d_ℓ(p′, q′)}
6:   else
7:     γ = d_ℓ(Left(a), b)
8:     ζ = d_ℓ(Right(a), b)
9:     if γ < δ then
10:      {p, q, δ} = BCCP(Left(a), b, {p, q, δ})
11:     if ζ < δ then
12:      {p, q, δ} = BCCP(Right(a), b, {p, q, δ})
13:   return {p, q, δ}
14: end procedure

2.6 Voronoi Cell Diagrams

A classical computational geometry structure, the Voronoi diagram is composed of cells which are constructed around the input points [10]. These cells are designed with the property that they each border a region that is closer to one data point than any other. By using well known divide and conquer algorithms, two dimensional Voronoi diagrams can be constructed in O(n log n) time [90].

Figure 2.4: A Voronoi Cell Diagram built over a two dimensional point set.

In order to answer nearest neighbor queries using the Voronoi diagram one has to be able to identify in which cell the query point lies. The Dobkin-Kirkpatrick hierarchy is a data structure which answers this question in O(log n) time. This structure is based on a hierarchy of increasingly complex triangulations. At the bottom level of the hierarchy, the vertices of the Voronoi diagram are triangulated. A maximal independent set is calculated and removed, and then the holes are re-triangulated. Links are maintained between the levels of the tree, and each triangle in one level intersects only a constant number of triangles in the level below. Kirkpatrick showed that the minimum size of a maximal independent set is at least n/32, meaning that at each level some constant fraction of n is removed. This yields a tree of height O(log n). Queries are computed by locating a point in the topmost (constant size) triangulation, then proceeding down the hierarchy, doing a constant amount of work at each level.

The n/32 factor is, in practice, a fairly high constant. More recent work has shown that the factor is provably higher (Iacono gives a factor of n/25 [49]). Work has been done to give improved results in this kind of planar point location, including work based on layered directed acyclic graphs [40] and randomized incremental algorithms [72] [88]. By allowing for expected O(log n) queries, data structures based on orientation tests have given some of the most efficient data structures to date for planar point location [9].

2.7 Delaunay Graphs

Another common computational geometry tool, the Delaunay graph is the dual of the Voronoi diagram. It is formed in such a way that every edge of the complete graph around which an empty circumcircle can be drawn is included in the graph [35]. Through various methods, the Delaunay graph can be constructed in O(n log n) time [94].

Figure 2.5: A Delaunay graph constructed over a two dimensional point set.
It has been shown that by using a simple compass routing technique, one can find the nearest neighbor to a point using the Delaunay graph. To begin the query, an arbitrary starting place is chosen on the graph. The adjacent points are compared to the query point, and if any are closer to the query than the current position, the position is updated. By repeating the search for a local improvement, the global solution can be found [37]. The query time of such an algorithm is bounded by two factors. The first is the length of the path one must traverse to find the solution; the second is the degree of the vertices on the path. Birn et al. showed that by constructing a hierarchy of Delaunay graphs, the first factor can be bounded, in expectation, by O(log n). They presented a practical algorithm using this method for answering the nearest neighbor query exactly in the plane [18], which is described in chapter 4.

Figure 2.6: Compass routing on a Delaunay graph to locate a nearest neighbor.

Like the Voronoi diagram method, this approach has only been shown to be practical in the plane. In higher dimensions, the Delaunay graph suffers from greater than linear complexity, making it unlikely to be useful in general dimensions. In chapter 4 a new algorithm using both Delaunay graphs and Voronoi cells to find planar nearest neighbors is presented. In addition, chapter 4 presents results applying these techniques to two and three dimensional point sets.

2.7.1 Principal Component Analysis

Introduced in 1901 [81], principal component analysis (PCA) is a tool which can be used to simplify high dimensional data by identifying structure within it and eliminating noise. In essence, it seeks to reduce d-dimensional data to d′ dimensions (with d′ < d) while maximally preserving relational information. One method for accomplishing this is through eigenvector decomposition of the covariance matrix of the data. In chapter 6, the Fast PCA [91] algorithm will be used to reduce SIFT data for more efficient nearest neighbor searching.

Algorithm 2 Fast PCA Algorithm [91]
Require: Set of points P ∈ R^d
1: procedure FastPCA(points P, integer h)
2:   C ← covariance matrix of P
3:   for p = 1 to h
4:     E_p ← random d × 1 vector
5:     do
6:       E_p ← C E_p
7:       E_p ← E_p − Σ_{j=1}^{p−1} (E_p^T E_j) E_j
8:       E_p ← E_p / ||E_p||
9:     while E_p has not converged
10:    return E_1 ... E_h
11: end procedure

CHAPTER 3

K-NEAREST NEIGHBOR GRAPHS

K-nearest neighbor graph construction involves creating a graph in which every point is connected to its k nearest neighbors. This chapter will discuss current methods for solving this problem and present a new algorithm, called knng, that is cache efficient, capable of utilizing parallel processors, and able to work on data sets too large to fit into conventional memory. An extensive theoretical analysis proves the running time is competitive with other state-of-the-art algorithms. Finally, extensive experiments will be presented, demonstrating the practical advantages of this new approach.

The naive approach to solving the k-nearest neighbor graph construction problem uses O(n^2) time and O(nk) space. Theoretically, the k-nearest neighbor graph can be computed in O(n log n + nk) time [21]. The method is not only theoretically optimal and elegant but also parallelizable. Unfortunately these methods also have high constant values that can make them impractical [98, 21, 28, 38].
In practice, variants of kd-tree implementations [66, 78, 27] are generally chosen. In low dimensions, one of the best kd-tree implementations is by Arya et al. [7]. They present an ǫ-approximation based kd-tree algorithm that can answer nearest neighbor queries in O(log n log(1/ǫ^d)) time.

Morton order or Z-order of points, described in chapter 2, has been used previously for many related problems. Tropf and Herzog [96] present a precursor to many nearest neighbor algorithms. Their method uses one, unshifted Morton order point set to conduct range queries. The main drawbacks of their method were: (1) It does not allow use of non-integer keys. (2) It does not offer a rigorous proof of worst case or expected run time; in fact it leaves these as an open problem. (3) It offers no scheme to easily parallelize the algorithm. Orenstein and Merrett [76] described another data structure for range searching using Morton order on integer keys [67]. Bern [16] described an algorithm using 2d shifted copies of a point set in Morton order to compute an approximate solution to the k-nearest neighbor problem. In expectation, this avoids the case where two points lying close together in space end up far apart in the one-dimensional order. In all these algorithms, the Morton order was determined using explicit interleaving of the coordinate bits. Liao et al. [59] used Hilbert curves to compute an approximate nearest neighbor solution using only d + 1 shifted curves. Chan [23] refined this approach, and later presented an algorithm that used only one copy of the data, while still guaranteeing a correct approximation result for the nearest neighbor problem [24]. It is worth noting that in practice, these methods for k-nearest neighbor computation are less efficient than state of the art kd-tree based algorithms [24].

3.0.2 Notation

In this chapter, lower case Roman letters denote scalar values. p and q are specifically reserved to refer to points. P is reserved to refer to a point set. n is reserved to refer to the number of points in P. p < q denotes that the Z-value of p is less than that of q (> is used similarly). p^s denotes the shifted point p + (s, s, ..., s), and P^s = {p^s | p ∈ P}. d_ℓ(p, q) denotes the Euclidean distance between p and q. p_i is the i-th point in the sorted Morton ordering of the point set. p^(j) is used to denote the j-th coordinate of the point p. The Morton ordering also defines a quadtree on the point cloud. box_Q(p_i, p_j) refers to the smallest quadtree box that contains the points p_i and p_j. box(c, r) denotes a box with center c and radius r. The radius of a box is the radius of the inscribed sphere of the box. k is reserved to refer to the number of nearest neighbors to be found for every point. d is reserved to refer to the dimension. In general, upper case Roman letters (B) refer to a bounding box. Bounding boxes with a subscript Q (B_Q) refer to a quadtree box, and d_ℓ(p, B) is defined as the minimum distance from point p to box (or quadtree box) B. E[·] is reserved to refer to an expected value. E refers to an event, and P(E) is reserved to refer to the probability of an event E. A_i is reserved to refer to the current k nearest neighbor solution for point p_i, which may still need to be refined to find the exact nearest neighbors. nn_k(p, ·) defines a function that returns the k nearest neighbors to p from a set. The bounding box of A_i refers to the minimum enclosing box for the ball defined by A_i. Finally, rad(p, ·) returns the distance from point p to the farthest point in a set, and rad(A_i) = rad(p_i, A_i).
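The helpers nn_k(p, ·) and rad(p, ·) defined above are used repeatedly in Algorithm 4. A minimal C++ sketch of how such helpers might look is given below; the types and names are illustrative assumptions, not taken from the thesis code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;  // illustrative point type

// Euclidean distance between two points of equal dimension.
double dist(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) s += (a[j] - b[j]) * (a[j] - b[j]);
    return std::sqrt(s);
}

// nn_k(p, S): the k points of the candidate set S closest to p.
std::vector<Point> nn_k(const Point& p, std::vector<Point> S, std::size_t k) {
    k = std::min(k, S.size());
    std::partial_sort(S.begin(), S.begin() + k, S.end(),
        [&](const Point& a, const Point& b) { return dist(p, a) < dist(p, b); });
    S.resize(k);
    return S;
}

// rad(p, S): distance from p to the farthest point of S, so rad(A_i) = rad(p_i, A_i).
double rad(const Point& p, const std::vector<Point>& S) {
    double r = 0.0;
    for (const Point& q : S) r = std::max(r, dist(p, q));
    return r;
}
```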
3.1 The knng Algorithm

The knng algorithm (Algorithm 4) mainly consists of the following three high level components (a schematic sketch of the overall pipeline follows the list):

• Preprocessing Phase: In this step, the input points, P, are sorted using the Morton ordering.

• Sliding Window Phase: For each point p ∈ P, compute its approximate k-nearest neighbors by scanning O(k) points to the left and right of p. Another way to think of this step is to slide a window of length O(k) over the sorted array and find the k-nearest neighbors restricted to this window.

• Search Phase: Refine the answers of the previous phase by zooming inside the constant factor approximate k-nearest neighbor balls using properties of the Morton order.
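The skeleton below shows how the three phases fit together. It is only a schematic: morton_less, window_knn and refine are placeholder callables standing in for the routines detailed in the following subsections and in Algorithm 4.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Schematic driver for the three phases of the knng pipeline.
template <class Point, class MortonLess, class WindowKnn, class Refine>
void knng_outline(std::vector<Point>& P, std::size_t k,
                  MortonLess morton_less, WindowKnn window_knn, Refine refine) {
    std::sort(P.begin(), P.end(), morton_less);   // preprocessing phase
    for (std::size_t i = 0; i < P.size(); ++i) {
        auto A = window_knn(P, i, k);              // sliding window phase
        refine(P, i, k, A);                        // search phase
    }
}
```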
3.1.2 Sliding Window Phase Once P has been sorted in the previous phase, a partial solution is found for each point pi by finding the k nearest neighbors from the set of points {pi−ck ...pi+ck } for some constant c ≥ 12 . This is done via a linear scan of the range of points. The actual best value of c is platform dependent, and in general c should not be too large. Once this partial nearest neighbor solution is found, its correctness can be checked using the property of Morton ordering shown chapter 2.2. If the corners of the bounding box for the current solution lie within the range that has already been searched, then the partial solution is in fact the true solution (see Figure 3.1). If the current approximate nearest neighbor ball is not bounded by the lower and upper points already checked, a binary search must be used to find the location of the lower and upper corners of the bounding box of the current approximate nearest neighbor ball in the Morton ordered point set. This defines the range that needs to be searched in the final 16 phase of the algorithm. 3.1.3 Search Phase For each point pi for which the solution was not found in the previous step, the partial solution must be refined to find the actual solution. This is done using a recursive algorithm. Given a range of points {pa ...pb }, first check if the distance r from pi to boxQ (pa , pb ) is greater than the radius of Ai (line 24). If it is, then the current solution does not intersect this range. Otherwise, update Ai with pa+b/2 . Repeat the procedure for the ranges {pa ...pa+b/2−1 } and {pa+b/2+1 ...pb }. One important observation is the property used as a check in the scan portion of the algorithm still holds, and one of these two new ranges of points may be eliminated by comparing the bounding box of Ai with pa+b/2 . If the length of a range is less than ν, a fixed constant, a linear scan of the range is more efficient than recursing further. A good value for ν is platform dependent, and should be experimentally determined. 3.1.4 Parallel Construction Parallel implementation of this algorithm happens in three phases. For the first phase, a parallel Quick Sort [97] is used in place of a standard sorting routine. Second, the sorted array is split into p chunks (assuming p threads to be used), with each thread computing the initial approximate nearest neighbor ball for one chunk independently. Finally, each thread performs the recursive step of the algorithm on each point in its chunk. This allows, as near as is possible, the workload to be evenly divided across all threads. In practice, memory overhead and other factors keep this from being a perfectly linear speedup. 3.1.5 Handling large data sets Many applications of k-nearest neighbor graph construction require large point clouds to be handled that do not fit in memory. One way to handle this problem is to make diskbased data structures [86]. An alternative solution is simply increasing the swap space of the operating system and running the same implementation as in internal memory. Many operating systems allow on the fly creation and deletion of temporary swap files (Windows, Linux), which can be used to run the knng algorithm on very large data sets (100 million or more points). Experimentally, this algorithm was able to calculate k-nearest neighbor graphs for very large data sets (up to 285 million points as seen in Table 3.1). 
In Linux, new user space memory allocations (using new or malloc) of large sizes are handled automatically using mmap which is indeed a fast way to do IO from disks. Once the data is memory mapped to disk, both sorting and scanning preserve locality of access in the algorithm and hence are not only cache friendly but also disk friendly. The last phase of the knng algorithm is designed to be disk friendly as well. Once an answer is computed for point pi by a single thread, the next point in the Morton order uses the locality of access from the previous point and therefore causes very few page faults in practice. 17 Algorithm 4 KNN Graph Construction Algorithm Require: Randomly shifted point set P of size n. <,>. (COMPARE= <). Ensure: Ai contains k nearest neighbors of pi in P . 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: Morton order compare operators: procedure Construct(P , int k) P ← ParallelQSort(P, <) parallel for all pi in P Ai ← nnk (pi , {pi−k , . . . , pi+k }) ⌈rad(Ai )⌉ if pi < pi+k then u ← i else ⌈rad(Ai )⌉ I ← 1; while pi < pi+2I do: ++I u ← min(i + 2I , n) end if −⌈rad(Ai )⌉ if pi > pi−k then l ← i else −⌈rad(Ai )⌉ I ← 1; while pi > pi−2I do: ++I l ← max(i − 2I , 1) end if if l 6= u then CSearch(pi , l, u) end procedure procedure CSearch(point pi , int l , int h) if (h − l) < ν then Ai ← nnk (pi , Ai ∪ {pl . . . ph }) return end if m ← (h + l)/2 Ai ← nnk (pi , Ai ∪ pm ) if dℓ (pi , box(pl , ph )) ≥ rad(Ai ) then return if pi < pm then CSearch(pi , l, m − 1) ⌈r(A )⌉ if pm < pi i then CSearch(pi , m + 1, h) else CSearch(pi , m + 1, h) −⌈r(Ai )⌉ if pi < pm then CSearch(pi , l,m − 1) end if end procedure 18 3.2 Analysis of the knng Algorithm In this section, the knng algorithm is shown to have a running time of O(⌈ pn ⌉k log k) in expectation, plus the time for one sort. In addition to the space required for storing the input and output, the algorithm requires O(pk) extra space. The running time above is only valid under certain assumptions about the input data. Let P be a finite set of points in Rd such that |P | = n ≫ k ≥ 1. Let µ be a counting measure on P . Let the measure of a box, µ(B) be defined as the number of points in B ∩ P . P is said to have expansion constant γ if ∀pi ∈ P and for all k ∈ (1, n): µ(box(pi , 2 × rad(pi , Nik ))) ≤ γk In plain terms, this restriction states that doubling the size of a ball placed over points from the input distribution should not increase the number of points within that ball by greater than some constant factor. This is a similar restriction to the doubling metric restriction on metric spaces [29, 51] and has been used before [51]. Throughout the analysis, we will assume that P has an expansion constant γ =O(1). The first phase of the algorithm is sorting, which has well established runtime bounds. The second phase is a linear scan. The dominating factor in the running time will be from the third phase; the recursive search function. The running time of this phase will be bounded by showing that the smallest quadtree box containing the actual nearest neighbor ball for a point is, in expectation, only a constant factor smaller than the quadtree box containing the approximate solution found in phase two. Given the distribution stated above, this implies there are only O(k) additional points that need to be compared to refine the approximation to the actual solution. 
The actual running time of the CSearch function is upper bounded by the time it would take to simply scan the O(k) points. To prove the running time of the algorithm, consider the solution to the following game: In a room tiled with equal square tiles (created using equidistant parallel lines in the plane), a coin is thrown upwards. If the coin rests cleanly within a tile, the side length of the square tile is noted down and the game is over. Otherwise, the side length of the square tiles in the room is doubled and the same coin is tossed again. This process is repeated till the coin rests cleanly inside a square tile. Note that in this problem, the square tiles come from quadtrees defined by Morton order, and the coin is defined by the optimal k-nearest neighbor ball of p_i ∈ P. The goal is to bound the number of points inside the smallest quadtree box that contains box(p_i, rad(p_i, N_i^k)). This leads to the following lemma:

Lemma 3.2.1. Let B be the smallest box, centered at p_i ∈ P, containing N_i^k and with side length 2^h (where h is assumed w.l.o.g. to be an integer > 0), which is randomly placed in a quadtree Q. If the event E_j is defined as B being contained in a quadtree box B_Q with side length 2^(h+j), and B_Q is the smallest such quadtree box, then

  P(E_j) ≤ (1 − 1/2^j)^d · d^(j−1) / 2^((j²−j)/2)

Figure 3.2: (a) B lands cleanly on the quadtree box B_Q twice its size. (b) B lands cleanly on a quadtree box 2² times its size. In both figures, if the upper left corner of B lies in the shaded area, the box B does not intersect the boundary of B_Q. Obviously, (a) happens with probability 1/4 (probability (1/2)^d in general dimension) and (b) happens with probability ((2² − 1)/2²)² = 9/16 (probability ((2² − 1)/2²)^d in general dimension).

Proof. From Figure 3.2, it can be inferred that in order for B to be contained in B_Q, the total number of candidate boxes where the upper left corner of B can lie is (2^j − 1) along each dimension. The total number of candidate gray boxes is therefore (2^j − 1)^d. The probability that the upper left corner lies in a particular gray box is 2^(hd)/2^((h+j)d). Thus the probability that B is contained in B_Q is ((2^j − 1)/2^j)^d. If B_Q is the smallest quadtree box housing B, then no quadtree box with side length 2^(h+1), 2^(h+2), ..., 2^(h+j−1) can contain B. This probability is given by

  ∏_{l=1}^{j−1} (1 − ((2^l − 1)/2^l)^d) = ∏_{l=1}^{j−1} ((2^l)^d − (2^l − 1)^d) / (2^l)^d

The probability of B_Q containing B, and being the smallest such box, is therefore

  P(E_j) = ((2^j − 1)^d / (2^j)^d) · ∏_{l=1}^{j−1} ((2^l)^d − (2^l − 1)^d) / (2^l)^d

Now consider the following inequality: given v such that 0 < v < 1,

  (1 − v)^d ≤ 1 − dv + d(d − 1)·v²/2!,

which can be easily proved using induction or the alternating series estimation theorem. Expanding the Taylor series and simplifying,

  (1 − v)^d = 1 − dv + d(d − 1)·v²/2! − d(d − 1)(d − 2)·v³/3! + . . .
            ≤ 1 − dv + d²v²/2 − dv²/2
            = 1 − dv(1 + v/2) + d²v²/2,

so, putting v = 1/2^l,

  (1 − 1/2^l)^d ≤ 1 − (d/2^l)(1 + 1/2^(l+1)) + d²/2^(2l+1)

Then, by substituting and simplifying,

  P(E_j) ≤ (1 − 1/2^j)^d ∏_{l=1}^{j−1} ((d/2^l)(1 + 1/2^(l+1)) − d²/2^(2l+1))
         ≤ (1 − 1/2^j)^d ∏_{l=1}^{j−1} d/2^l
         ≤ (1 − 1/2^j)^d · d^(j−1) / 2^((j²−j)/2)

Lemma 3.2.2. The linear scan phase of the algorithm produces an approximate k-nearest neighbor box B′ centered at p_i with radius at most the side length of B_Q. Here B_Q is the smallest quadtree box containing B, the k-nearest neighbor box of p_i.

Proof.
The algorithm scans at least p_{i−k}, ..., p_{i+k}, and picks the top k nearest neighbors to p_i among these candidates. Let a be the number of points between p_i and the largest Morton order point in B. Similarly, let b be the number of points between p_i and the smallest Morton order point in B. Clearly, a + b ≥ k. Note that B is contained inside B_Q, hence µ(B_Q) ≥ k. Now, p_i ... p_{i+k} must contain at least a points inside B_Q. Similarly, p_{i−k} ... p_i must contain at least b points from B_Q. Since at least k points have been collected from B_Q, the radius of B′ is upper bounded by the side length of B_Q.

Lemma 3.2.3. The smallest quadtree box B′_Q containing B′ has, in expectation, only a constant factor more points than k.

Figure 3.3: All boxes referred to in Lemma 3.2.3.

Proof. Let there be k′ points in B. Clearly k′ = O(k) given γ = O(1). The expected number of points in B′_Q is at most E[γ^x k′], where x is such that if the side length of B is 2^h then the side length of B′_Q is 2^(h+x). Let the event E′_x be defined as this occurring for some fixed value of x. Recall from Lemma 3.2.1 that j is such that the side length of B_Q is 2^(h+j). The probability of the event E_j is

  P(E_j) = ((2^j − 1)^d / (2^j)^d) · ∏_{l=1}^{j−1} ((2^l)^d − (2^l − 1)^d) / (2^l)^d
         ≤ (1 − 1/2^j)^d · d^(j−1) / 2^((j²−j)/2)

From Lemma 3.2.2, B′ has a side length of at most 2^(h+j+1) = 2^(h′). Let E″_{j′} be the event that, for some fixed h′, B′ is contained in B′_Q with side length 2^(h′+j′). Note that E″_{j′} has the same probability mass function as E_j, and that the value of j′ is independent of j. Given this, P(E′_x) = Σ_{j=1}^{x−1} P(E_j) P(E″_{x−j}). From this, E[γ^x k] can be bounded:

  E[γ^x k] ≤ Σ_{x=2}^{∞} γ^x k P(E′_x)
           = Σ_{x=2}^{∞} γ^x k Σ_{j=1}^{x−1} P(E_j) P(E″_{x−j})
           ≤ Σ_{x=2}^{∞} γ^x k Σ_{j=1}^{x−1} (1 − 1/2^j)^d (d^(j−1)/2^((j²−j)/2)) · (1 − 1/2^(x−j))^d (d^(x−j−1)/2^(((x−j)²−(x−j))/2))
           ≤ Σ_{x=2}^{∞} γ^x k Σ_{j=1}^{x−1} (1 − 1/2^j)^d (1 − 1/2^(x−j))^d · d^(x−2) / 2^((x²−(1+2j)x+2j²)/2)

Observe that, ∀ j ∈ {1, 2, ..., x − 1},

  (1 − 2^(−j))^d (1 − 2^(j−x))^d ≤ (1 − 2^(−x/2))^(2d),

which can be proved by showing (1 − 2^(−j))(1 − 2^(j−x)) ≤ (1 − 2^(−x/2))², i.e., 2^(−j) + 2^(j−x) ≥ 2^(−x/2+1), which is true because (a + b)/2 ≥ √(ab). Putting this simplified upper bound back into the expectation calculation yields

  E[γ^x k] ≤ Σ_{x=2}^{∞} γ^x k (1 − 1/2^(x/2))^(2d) Σ_{j=1}^{x−1} d^(x−2) / 2^((x²−(1+2j)x+2j²)/2)
           ≤ k Σ_{x=2}^{∞} γ^x (1 − 1/2^(x/2))^(2d) d^(x−2) Σ_{j=1}^{x−1} 2^(−(x²−(1+2j)x+2j²)/2)
           ≤ k Σ_{x=2}^{∞} γ^x (1 − 1/2^(x/2))^(2d) d^(x−2) 2^((x−x²)/2) Σ_{j=1}^{x−1} 2^(j(x−j))

It is easy to show that j(x − j) ≤ x²/4 for all j ∈ {1, ..., x − 1}: let a′ = x/2 and b′ = j − x/2; then (a′ + b′)(a′ − b′) = a′² − b′², where the left hand side is j(x − j) and the right hand side is x²/4 − b′², hence j(x − j) ≤ x²/4. Using the fact that 2^(j(x−j)) ≤ 2^(x²/4) in the expectation calculation yields

  E[γ^x k] ≤ k Σ_{x=2}^{∞} γ^x (1 − 1/2^(x/2))^(2d) d^(x−2) 2^((x−x²)/2) · x · 2^(x²/4)
           ≤ k Σ_{x=2}^{∞} x (dγ√2)^x (1 − 1/2^(x/2))^(2d) 2^(−x²/4)

Putting y = x/2 and c = 2(dγ)² yields

  E[γ^x k] ≤ 2k Σ_{y=1}^{∞} y c^y 2^(−y²) (1 − 1/2^y)^(2d)

Using the Taylor approximation

  (1 − 1/2^y)^(2d) ≤ 1 − d·2^(−y)(2 + 2^(−y)) + d²·2^(−2y+1),

which, when substituted, gives

  E[γ^x k] ≤ 2k ∫_{y=0}^{∞} y c^y (1 − d·2^(−y)(2 + 2^(−y)) + d²·2^(−2y+1)) 2^(−y²) dy

Integrating and simplifying, using the facts that the error function erf(x) encountered in integrating the normal distribution satisfies erf(x) ≤ 1, and that c = 2(dγ)² = O(1), yields

  E[γ^x k] ≤ k ( 1/ln 2 + √(π/ln 2) · e^((ln c)²/(4 ln 2)) ) = O(k)

Theorem 3.2.4. For a given point set P of size n, with a bounded expansion constant and a fixed dimension, in a constrained CREW PRAM model, assuming p threads, the k-nearest neighbor graph can be found in one comparison based sort plus O(⌈n/p⌉ k log k) expected time.

Proof.
Once B′ is established in the first part of the algorithm, B′_Q can be found in O(log k) time by using a binary search outward from p_i (this corresponds to lines 5 to 14 in Algorithm 4). Once B′_Q is found, it takes at most another O(k) steps to report the solution. There is an additional O(log k) cost for each point update to maintain a priority queue of the k nearest neighbors of p_i. Since the results for each point are independent, the neighbors for each point can be computed by an independent thread.

Note that the algorithm reduces the problem of computing the k-nearest neighbor graph to a sorting problem (which can be solved optimally) when k = O(1), which is the case for many graphics and visualization applications. Also, the expectation in the running time is independent of the input distribution and is valid for arbitrary point clouds.

That concludes the theoretical analysis of the knng algorithm. In the next section, the algorithm's running time will be tested experimentally.

3.3 Experimental Analysis of the knng Algorithm

The knng algorithm was tested on three different architecture setups, each detailed in its own section below. The primary competition for k-nearest neighbor graph construction is the ANN library for low dimensional nearest neighbor searching.

3.3.1 ANN

Mount and Arya introduced the ANN approximate nearest neighbor library, an implementation based on pruning kd-trees to find nearest neighbors. One further improvement they made to the classic tree based query algorithm was the introduction of a priority queue to the search. As the nearest neighbor ball descends in the tree, new hyper-rectangles are inserted into a priority queue. Rectangles are then considered in order of their distance from the current nearest neighbor ball. In this way, one hopes to consider the most relevant rectangles first, and eliminate the need to consider others entirely. The ANN library is coded in C++, and is highly optimized for both cache efficiency and running time. It has long been used as the standard in low dimensional nearest neighbor searching.

ANN [69] had to be modified to allow a fair comparison. Nearest neighbor graph construction using ANN is done in two phases. The preprocessing stage is the creation of a kd-tree using the input data set. Then, a nearest neighbor query is made for each point in the input data set. For these experiments, the source code was modified to allow multiple threads to query the same data structure simultaneously. The kd-tree construction was not modified to use a parallel algorithm. However, it is worth noting that even if a parallel kd-tree construction algorithm were implemented, it would almost certainly still be slower than parallel sorting (the preprocessing step in the knng algorithm). In the interests of a fair comparison, the empirical results section includes several examples of k-nearest neighbor graph construction where only one thread was used (Figures 3.6, 3.11, 3.14).

Table 3.1: Construction times for k = 1-nearest neighbor graphs constructed on non-random 3-dimensional data sets. Each graph was constructed using 8 threads on Intel architecture. All timings are in seconds.
Dataset        Size (points)   ANN (s)   knng(long) (s)   knng(float) (s)
Screw          27152           .06       .04              .06
Dinosaur       56194           .11       .07              .11
Ball           137602          .31       .14              .21
Isis           187644          .46       .18              .27
Blade          861240          2.9       .86              1.3
Awakening      2057930         8.6       2.1              3.2
David          3614098         16.7      3.7              5.6
Night          11050083        62.2      12.4             18.6
Atlas          182786009       -         1564             2275
Tallahassee    285000000       -         2789             4235

3.3.2 Experimental Data

Except where noted, random data sets were generated from 3-dimensional points uniformly distributed between (0, 1], stored as 32 bit floats. Graphs using random data sets were generated using multiple sets of data and averaging the results. Results from random data sets with different distributions (such as Gaussian, clustered Gaussian, and spherical) were not significantly different from the uniform distribution. Also included were several non-random data sets, consisting of surface scans of objects. In all graphs, the knng algorithm is labeled 'knng(float)'. The label 'knng(long)' means that the data was scaled to a 64 bit integer grid and stored as 64 bit integers. This improves the running time of the algorithm dramatically, and can be done without loss of precision in most applications.

3.3.3 Intel Architecture

This experiment was conducted on a machine equipped with dual Quad-core 2.66GHz Intel Xeon CPUs, and a total of 4 GB of DDR memory. Each core has 2 MB of total cache. SUSE Linux with kernel 2.6.22.17-0.1-default was running on the system. The compiler used was gcc version 4.3.2 for compilation of all code (with -O3).

Construction Time Results. As shown in Figure 3.4 and Table 3.1, the knng algorithm performs very favorably against k-nearest neighbor graph construction using ANN. Table 3.1 shows timings of k-nearest neighbor graph construction on very large point sets, where ANN was unable to complete a graph due to memory issues. Construction times improve dramatically when floating points are scaled to an integer grid. Other random distributions had similar construction times to these cases, and so more graphs were not included. Figure 3.5 shows that as k increases, the advantage runs increasingly toward the knng implementation. Finally, Figure 3.6 shows the speedup gained by increasing the number of threads. In these and all other graphs shown, the standard deviation was very small: less than 2% of the mean value.

Figure 3.4: Graph of 1-NN graph construction time vs. number of data points on the Intel architecture. Each algorithm was run using 8 threads in parallel.

Figure 3.5: Graph of k-NN graph construction time for varying k on the Intel architecture. Each algorithm was run using 8 threads in parallel. Data sets contained one million points.

Figure 3.6: Graph of 1-NN graph construction time for varying number of threads on Intel architecture. Data sets contained ten million points.

Figure 3.7: Graph of memory usage per point vs. data size on Intel architecture. Memory usage was determined using valgrind.
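As an illustration of the 'knng(long)' preparation described in Section 3.3.2, the following sketch scales floating point coordinates in (0, 1] onto a 64-bit integer grid and then compares two grid points in Morton order using the standard exclusive-or technique for integer coordinates (cf. chapter 2.2). The resolution constant and all names are illustrative assumptions, not the thesis code.

// Illustrative sketch: scale to a 64-bit integer grid, then compare in
// Morton (Z) order without building the interleaved key -- the dimension
// holding the highest-order differing bit decides the order.
#include <array>
#include <cstdint>

constexpr int DIM = 3;
using GridPoint = std::array<std::uint64_t, DIM>;

GridPoint scaleToGrid(const std::array<double, DIM>& p) {
    // resolution chosen to leave headroom for the random shift; illustrative
    const double resolution = static_cast<double>(1ULL << 62);
    GridPoint g{};
    for (int d = 0; d < DIM; ++d)
        g[d] = static_cast<std::uint64_t>(p[d] * resolution);
    return g;
}

// True if the most significant set bit of a is below that of b.
static bool lessMsb(std::uint64_t a, std::uint64_t b) {
    return a < b && a < (a ^ b);
}

bool mortonLess(const GridPoint& p, const GridPoint& q) {
    int best = 0;
    std::uint64_t bestXor = 0;
    for (int d = 0; d < DIM; ++d) {
        std::uint64_t x = p[d] ^ q[d];
        if (lessMsb(bestXor, x)) { best = d; bestXor = x; }
    }
    return p[best] < q[best];
}

A comparator of this form can be handed directly to a standard (or parallel) sort routine, which is what makes the knng(long) variant both simple and fast in practice.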
Figure 3.8: Graph of cache misses vs. data set size on Intel architecture. All data sets were uniformly random 3-dimensional data sets. Cache misses were determined using valgrind, which simulated a 2 MB L1 cache.

3.3.4 AMD Architecture

This machine was equipped with 8 Dual Core 2.6GHz AMD Opteron Processor 885 CPUs, for a total of 16 cores. Each processor had 128 KB of L1 cache and 2048 KB of L2 cache, and shared a total of 64 GB of memory. Compiler gcc version 4.3.2 was used for compilation of all code (with -O3). As can be seen in Figures 3.9, 3.10 and 3.11, the knng algorithm performs well despite the change in architecture. ANN fared particularly poorly on this architecture.

Figure 3.9: Graph of 1-NN graph construction time vs. number of data points on AMD architecture. Each algorithm was run using 16 threads in parallel.

Figure 3.10: Graph of k-NN graph construction time for varying k on AMD architecture. Each algorithm was run using 16 threads in parallel. Data sets contained ten million points.

Figure 3.11: Graph of 1-NN graph construction time for varying number of threads on AMD architecture. Data sets contained ten million points.

3.3.5 Sun Architecture

This machine is a Sun T5120 server with an eight-core T2 OpenSparc processor and 32 GB of memory. Overall, this was a slower machine compared to the others that were used; however, it was capable of running 64 threads simultaneously. The compiler gcc version 4.3.2 was used for compilation of all code (with -O3). As can be seen in Figure 3.12, results for construction were similar to the previous experiments. One unexpected result was ANN's performance as k increased, as seen in Figure 3.13. Since ANN was developed on a Sun platform, the improvements seen as k increases may be due to platform specific tuning. In Figure 3.14, it can be observed how both algorithms behave with a large number of threads. Both ANN and the two versions of knng level out as the number of threads increases (processing power is limited to eight cores).

Figure 3.12: Graph of 1-NN graph construction time vs. number of data points on Sun architecture. Each algorithm was run using 128 threads in parallel.

Figure 3.13: Graph of k-NN graph construction time for varying k on Sun architecture. Each algorithm was run using 128 threads in parallel. Data sets contained ten million points.

Figure 3.14: Graph of 1-NN graph construction time for varying number of threads on Sun architecture. Data sets contained ten million points.

CHAPTER 4
PLANAR NEAREST NEIGHBOR SEARCH

Planar nearest neighbor search seeks to answer near neighbor queries on data restricted to a single two dimensional subspace.
This chapter explores this problem by discussing current solutions and presenting a new algorithm, called DelaunayNN, which can outperform them. This is followed by a theoretical analysis that makes stronger guarantees than some competing methods. Finally, extensive experimental results are presented.

Linear space, O(n log n) pre-processing and O(log n) query time algorithms for low dimensional nearest neighbor searching are known, such as the Dobkin-Kirkpatrick hierarchy, but as discussed in chapter 2, these tend to have large constants [52, 34] that make them impractical. One of the most practical algorithms available for this problem with provable guarantees is due to Devillers [36, 19]. This approach builds a hierarchy out of a subset of the incremental stages of Delaunay graph construction, then uses local information to quickly locate nearest neighbors. Recently, Birn et al. [18] have announced a very practical algorithm for this problem which beats the state of the art [69, 36]. Their full Delaunay hierarchy nearest neighbor search (FDHNN) includes edges from every incremental stage of Delaunay graph construction. This allows for fast answers to nearest neighbor queries, but gives up some theoretical guarantees.

One interesting feature used by both Delaunay hierarchy algorithms is the fact that compass routing is supported by Delaunay triangulations [54, 18]. It was shown by Kranakis et al. [54] that if one wants to travel between two vertices s and t of a Delaunay triangulation, one needs only local information about the current node and the coordinates of t to reach t, starting from s. The routing algorithm is simple: the next vertex visited is the one whose distance to t is minimum amongst the vertices connected to the current vertex. This type of local greedy routing has also been studied as the 'small-world phenomenon' or the 'six degrees of separation' [65].

As mentioned in chapter 3, the ANN algorithm is very effective for low dimensional nearest neighbor search. In theory, however, its runtime guarantees are based on approximate nearest neighbors. For exact nearest neighbor searching it cannot offer runtime bounds better than O(n).

Notation. In this chapter, points are denoted by lower-case Roman letters. i, j, k are reserved for indexing purposes. d_ℓ(p, q) denotes the distance between the points p and q in the L2 metric. P is reserved to refer to a point set. n is reserved to refer to the number of points in P. p < q iff p precedes q in Morton order (> is used similarly). p^s is used to denote the shifted point p + (s, s, ..., s), and P^s = {p^s | p ∈ P}. p_i is the i-th point in the sorted Morton ordering of the point set. Upper-case Roman letters are reserved for sets. Scalars are represented by lower-case Greek letters. Ball(p, q) represents the diametral ball of p and q. Ball(p, ρ) identifies a ball centered at p with diameter ρ. For a point q, let N_q^k be the k points in P closest to q. |.| denotes the cardinality of a set or the number of points of P inside the enclosed geometric entity. Let nn(p, {}) return the nearest neighbor of p in a given set. Finally, rad(p, {}) returns the distance from point p to the farthest point in a set.

4.1 The Algorithm

Algorithm 5 describes nearest neighbor pre-processing. The pre-processing algorithm essentially splits the input point set, P, into three layers. First the Delaunay triangulation, G, of P is computed and a maximal independent set M is computed for this graph.
P′ is the set P \ M. Layer 1 is constructed from the points P′ by sorting them in Morton order. Layer 2 is comprised of the Delaunay triangulation of the points P′. Finally, Layer 3 contains M, along with edges from the original Delaunay graph of P connecting the points in Layer 2 to the maximal independent set of points (see Figure 4.1).

Figure 4.1: The three layers of the query algorithm. (a) The first layer, consisting of the non-maximal independent set vertices sorted in Morton order. Queries are processed using a binary search to find an approximate nearest neighbor ball. (b) The second layer, consisting of the points in the first layer and the edges of their Delaunay triangulation. Queries are processed by using compass routing, starting at the nearest point found in the previous layer and ending at the nearest neighbor in this layer. (c) The final layer, consisting of edges that connect the points in the second layer to the points in the maximal independent set. In this layer, the nearest neighbor found in the previous step is refined by scanning the points adjacent to it in this graph. This results in the final answer.

Pre-processing is complete unless there is a point p in Layer 2 or 3 with large degree (Ω(1)). In these cases, the vertices of the Voronoi cell of the point p are computed, along with the rays emanating from p and going through these vertices, partitioning the space around p into sectors. These sectors are sorted in clockwise order to facilitate fast searching. These Voronoi rays and the points located in the cell are stored in Layer 2 or 3 (see Figure 4.2).

Figure 4.2: Here we see the center vertex p, and the sectors defined by rays passing through the Voronoi vertices. To find a nearer neighbor, we locate which sector the query lies in (via a binary search), then check the distance to the adjacent point that lies in that sector.

Algorithm 5 DelaunayNN Preprocessing
Require: Randomly shifted point set P of size n. Morton order compare operator <.
 1: procedure PreProcess(P)
 2:   G = (P, E) ← Delaunay Triangulation of P
 3:   M ← Maximal Independent Set of G
 4:   P′ ← P \ M
 5:   P′ ← Sort(P′, <)
 6:   G′ = (P′, E′) ← Delaunay Triangulation of P′
 7:   for all p ∈ P′ do:
 8:     H(p) ← {q | e = (p, q) ∈ G where q ∈ P \ P′}
 9:   for all F in {G′, H}
10:     for all v ∈ F with degree(v) = Ω(1) do:
11:       Pre-process VoronoiCell(v, F) for fast lookups and jumps.
12: end procedure

The query algorithm (Algorithm 6) starts by locating the query point q in the Morton ordering of points in Layer 1 (P′) via a binary search. It then scans η = O(1) points around that location and finds the closest point p′_i to q among those points. CompassRouting is used to find the nearest neighbor of q in Layer 2, starting from p′_i. Let this point in Layer 2 be called p′_j ∈ P′. The nearest neighbor of q in Layer 3 is found by traversing the connection edges and again checking the local neighborhood. The answer is then returned.

Algorithm 6 DelaunayNN Query
 1: procedure CompassRouting(point v, point q, Graph G = (P, E))
 2:   Require: v ∈ G.
 3:   repeat
 4:     if degree(v) = Ω(1) then
 5:       if q ∈ VoronoiCell(v, G) then return v
 6:       else: update v using preprocessed VoronoiCell(v, G).
 7:     else:
 8:       for all v′ ∈ G incident on v do:
 9:         if Dist(v′, q) < Dist(v, q) then:
10:           v ← v′; break
11:   until no improvement found
12: end procedure
13: procedure Query(point q)
14:   i′ ← BinarySearch(P′, q, <)
15:   p′_i ← nn(q, {p_{i′−η}, ..., p_{i′+η}}) where η = O(1)
16:   p′_j ← CompassRouting(p′_i, q, G′)
17:   return nn(p′_j, H(p′_j))   // uses preprocessed VoronoiCell(p′_j, H) if |H(p′_j)| = Ω(1)
18: end procedure

The CompassRouting algorithm is simple (a minimal sketch of the routing loop follows Figure 4.3 below). Assuming there are no high degree vertices in Layer 2, it starts with a point v and searches for a point closer to q that is incident on v. If there is no such vertex, it proceeds to Layer 3; otherwise it jumps to the found point and repeats the process. In case it hits a point p that has large degree, it has access to the Voronoi rays emanating from p. It first locates q in the ray system of p in O(log n) time using a binary search and then tests whether q lies in the Voronoi cell of p. If this is the case, p is returned as the answer for Layer 2, and the search continues in Layer 3. Otherwise the point opposite to p in the sector containing q is nearer to q than p. In this case, the current position is updated to the opposite point, and routing continues.

Once in Layer 3, the algorithm again looks at all vertices incident on the current answer, p′_j. In this case, the edges are from the original Delaunay graph of the undivided point set, which connect the vertices in Layer 2 to the maximal independent set vertices in Layer 3. In case of a vertex in Layer 3 with large degree, the preprocessed Voronoi cell of p′_j is used in the same fashion as in Layer 2. In this case, if q does not lie in the Voronoi cell of p′_j, then the point opposite to p′_j in the sector containing q is the nearest neighbor of q, as opposed to just being a nearer neighbor.

It should be noted that there are two degenerate cases when considering the Voronoi cell of p, both of which can be resolved in constant time. The first case occurs when a sector contains an open face of the Voronoi cell (Figure 4.3(a)). This sector will contain two adjacent vertices, not one, and if q lies in such a sector, we simply compute the distance to both. The second degenerate case occurs when an edge of the Voronoi cell of p has length 0 (Figure 4.3(b)), implying that some or all of the points bordering p are co-circular with p. In this case, the points immediately clockwise and counter-clockwise to p on the circle are identified in pre-processing. If q is determined to lie in a sector bordering an edge with 0 length, one of those two points must be nearer to the actual nearest neighbor than p (assuming p is not the answer).

Figure 4.3: Two degenerate cases for linear degree vertices. (a) In the first degenerate case, the center vertex has an open Voronoi cell. If the query point lies in this sector, we check the distance to the two points in the open sector (v1, v2). (b) In the second degenerate case, the center vertex is co-circular with several adjacent vertices, and there is only one Voronoi vertex. In this case, we always check the nearest co-circular points in the clockwise and counter-clockwise directions for a nearer neighbor (v1, v2), and ignore the other co-circular points.
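The core routing loop of the query can be sketched as follows; this is a minimal version without the Voronoi cell shortcut for high degree vertices, and the point and graph types are illustrative assumptions rather than the thesis implementation.

// Minimal sketch of compass routing on a planar graph stored as an
// adjacency list. Starting from vertex v, repeatedly move to the neighbor
// closest to the query q; stop when no neighbor improves the distance.
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

static double dist2(const Pt& a, const Pt& b) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

std::size_t compassRoute(const std::vector<Pt>& pts,
                         const std::vector<std::vector<std::size_t>>& adj,
                         std::size_t v, const Pt& q) {
    while (true) {
        std::size_t next = v;
        double best = dist2(pts[v], q);
        for (std::size_t u : adj[v]) {                 // edges incident on v
            if (dist2(pts[u], q) < best) { best = dist2(pts[u], q); next = u; }
        }
        if (next == v) return v;                       // no improvement: done
        v = next;                                      // jump and repeat
    }
}

On a Delaunay triangulation, the vertex at which this loop terminates is the nearest neighbor of q in the graph (Lemma 4.2.1 below); on an arbitrary graph it would only be a local minimum.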
4.2 Analysis of DelaunayNN

In terms of analysis, the first goal is to prove the correctness of the DelaunayNN algorithm. Begin by defining compass routing formally (this definition is slightly different from [54]): Given a geometric graph G = (P, E), an initial vertex s ∈ G and a destination q (which may not be in the graph), let v_i be the closest vertex in G to q. The goal is to travel from s to v_i, when the only information available at any point in time is the coordinates of q, the current position, and the edges incident on the current vertex. Starting at s, traverse the edge (s, s′) ∈ E incident on s that leads closest to q. We assign s = s′ and repeat this procedure till no movement will decrease the distance to q. Lemma 4.2.1 proves a simple property of compass routing on Delaunay graphs in d dimensions.

Lemma 4.2.1. Let P ⊂ R^d and G = (P, E) be the graph output from its Delaunay triangulation. Let q be a query point for which to compute the nearest neighbor in P. Compass routing on G yields the nearest neighbor of q in P.

Figure 4.4: Proof of Lemma 4.2.1.

Proof. Let the compass routing begin with a vertex v_0 ∈ P. Let v_i be the vertex at which compass routing stops and cannot improve the distance to q. Let Nbr(v_i) be the set of all vertices having an edge with v_i in G. This implies that Ball(q, Dist(q, v_i)) is empty of vertices in Nbr(v_i). For a contradiction, let v* ≠ v_i in P be the nearest neighbor of q. Then v* ∈ Ball(q, Dist(q, v_i)) and there is no edge between v* and v_i in G. Now draw a ball with v_i and v* on its boundary such that it lies inside Ball(q, Dist(q, v_i)). If this ball is empty, then v* ∈ Nbr(v_i) by the Delaunay property of the graph, which is a contradiction. Otherwise, shrink the ball, keeping it hinged on v_i and inside Ball(q, Dist(q, v_i)), till it contains only one point v_j ∈ P. This again is a contradiction, since v_j is closer to q than v_i and (v_i, v_j) ∈ G (compass routing should not have terminated at v_i). See Figure 4.4.

Next, it is necessary to show that the Query function in Algorithm 6 returns the correct answer.

Lemma 4.2.2. The Query function in Algorithm 6 returns the nearest neighbor of q in P.

Proof. Due to the correctness of compass routing, what remains to show is that the correctness of the query algorithm is not affected by separating P into three layers. Lemma 4.2.1 ensures that the nearest neighbor in Layer 2 is found. Let this neighbor be p′_j. Ball(q, Dist(q, p′_j)) is empty of points in Layer 2. If Ball(q, Dist(q, p′_j)) is empty, then p′_j will be returned as the nearest neighbor of q correctly by the Query function. Otherwise, |Ball(q, Dist(q, p′_j))| = 1; if there were more than one point in this ball, there would exist a Delaunay edge between two of these points, contradicting the fact that they belong to a maximal independent set in the Delaunay triangulation of P. Let v* in Layer 3 be inside Ball(q, Dist(q, p′_j)) in this case. Then one can draw an empty ball passing through p′_j and v*, keeping it inside Ball(q, Dist(q, p′_j)). This means there must be a Delaunay edge connecting p′_j and v*, implying that v* ∈ H(p′_j), and hence the Query function must return v* correctly.

In order to prove that the running time of the algorithm is O(log n), some simple assumptions must be made about the input data. This is the same restriction described in chapter 3.2. Let P be a finite set of points in R^d such that |P| = n. Let µ be a counting measure on P. Let the measure of a ball, µ(Ball(c, r)), be defined as the number of points in Ball(c, r) ∩ P.
A point q is said to have expansion constant γ if for all k ∈ (1, n):

  µ(Ball(q, 2 × rad(q, N_q^k))) ≤ γk

Throughout the analysis, assume that the query point q has an expansion constant γ = O(1). Note that for finding exact nearest neighbors in O(log n) time, the queries with high γ are precisely the queries which drive provable (1 + ε)-approximate nearest neighbor data structures to spend more time computing the solution when ε is close to zero [69]. The following observation bounds the running time of the query:

Lemma 4.2.3. In O(log n) time, Ball(q, r) can be computed such that, in expectation, |Ball(q, r)| = O(1).

Proof. This follows directly from Lemma 3.2.3, which shows that the nearest neighbor to q chosen from O(1) points in P adjacent to q in Morton order is contained in a box that has, in expectation, only a constant factor more points than the box containing nn(q, P).

Finally, the above observation gives the actual bound:

Theorem 4.2.4. Given q, with expansion constant γ = O(1), nn(q, P) can be found in O(log n) time in expectation.

Proof. Given that P is sorted in Morton order, a binary search for q obviously takes only O(log n) time. This yields a ball to be refined containing only an expected O(1) vertices of the Delaunay triangulation of P. Compass routing can therefore find a path containing only O(1) vertices. Given that any vertex can be processed in O(log n) time to find a nearer neighbor by using the Voronoi cell, nn(q, P) can be found in O(log n) time in expectation. Note that splitting P into layers does not increase the running time, because the number of points visited is still expected O(1) (by Lemma 4.2.3).

The construction time of the algorithm is O(n log n), bounded by the sorting of the input set in Morton order, as well as constructing the Delaunay graph and the Voronoi cells, all of which have O(n log n) running times. The maximal independent set is found in O(n) time.

4.3 Experimental Analysis

In this section the DelaunayNN algorithm is tested, in practice, against two other nearest neighbor algorithms. The first was ANN, the kd-tree nearest neighbor implementation from David Mount [69]. The second was an implementation of the full Delaunay hierarchy (FDH) algorithm presented by Birn et al. [18]. Experiments were conducted on a machine with dual 2.66 GHz Quad-core Intel Xeon CPUs, using a total of 4 GB of DDR memory. Each core had 2 MB of total cache. The operating system used was SUSE Linux version 11.2, kernel 2.6.31.8-0.1. All source code was compiled using g++ version 4.4.1, with -O3 enabled.

DelaunayNN was written in C++. It used the Triangle library by Jonathan Shewchuk [92] to construct the Delaunay triangulation in pre-processing. It should also be noted that the maximum degree of any vertex for the majority of tested data sets was 64, which was small enough that the Voronoi pre-processing of points was not needed for most of the experiments; in the implementation, the threshold Ω(1) was replaced by 64. The final distribution was specifically designed to check the case where large degree vertices become a factor, and did use the Voronoi cell pre-processing. Next, the FDH algorithm will be described in detail. For details on the ANN algorithm, please refer to chapter 3.3.1.

4.3.1 FDH Algorithm

FDH answers nearest neighbor queries using compass routing on a full Delaunay hierarchy. Construction is done using an incremental Delaunay graph construction algorithm. Points are inserted, and their index is stored along with all edges incident on them in the graph.
As new points are added, edge lists are updated for every previous point. This process yields a graph with a superset of the Delaunay edges. To answer a query, the algorithm begins at the point with index 0 and uses compass routing to walk the graph. Due to the layout, the algorithm only considers adjacent points with a higher index than the current location. FDH was implemented in C++. It used the CGAL library [19] to construct the Delaunay hierarchy in pre-processing.

In both DelaunayNN and FDH, exact predicates were used to construct the Delaunay graphs. To keep a fair comparison with ANN, however, both used inexact floating point arithmetic when computing distances for queries. In all experimental cases this had no impact on the correctness of the solution. For both DelaunayNN and FDH, points were stored along with the edges of the graph in order to take advantage of spatial locality in the cache, at the cost of some storage efficiency. In all experiments, the nearest neighbor to the query point was found exactly (ANN used ε = 0).

4.3.2 Data Distributions

For comparison purposes, point distributions for the experiments were chosen to be the same as those used by Birn et al. [18, 36]. To recap, the first four distributions below are taken from that work; a fifth was added to exercise vertices of linear degree:

1. Data points chosen uniformly at random from the unit square. Query points chosen uniformly at random from a square 5% larger than the unit square.
2. Data points chosen uniformly at random from the border of the unit circle. Query points chosen uniformly at random from the smallest containing square around the unit circle.
3. Data points chosen with 95% from the unit circle, 5% from the smallest square containing the unit circle. Query points chosen at random from the unit circle.
4. Data points chosen with x ∈ [−1000, 1000] and y = x². Query points chosen uniformly at random from the rectangle containing the parabola.
5. Data points chosen from the border of several non-overlapping circles. The centers of these circles are included as well. Query points are chosen from the bounding box containing all circles.

For each experiment, point sets were created ranging in size from one million to 128 million points. 100,000 queries were used in each experiment. To account for randomness in the algorithms and the system, each experiment was run multiple times (with unique data and query sets for each), and the results were averaged. The next section describes the results and shows graphs of the run time. Note that all graphs use a base 2 logarithmic scale.

4.3.3 Experimental Results

As shown in Figures 4.5 - 4.8, the DelaunayNN algorithm behaves very well in practice on point sets from various distributions. For data sets of sufficient size, the DelaunayNN implementation proves faster than FDH in all cases, and faster than ANN in almost all cases. Figure 4.5 shows the results for the uniform distribution, which most closely follows the bounded expansion constant considered in the analysis. All implementations performed very well, with low average query times even for very large data sets. Overall, the increase in average query time was significantly less than log n for all three implementations. In Figure 4.6 we see that for data sets where points are distributed on a circle, DelaunayNN displays timing results that are very similar to its performance on uniformly distributed data, while both ANN and FDH perform substantially worse than on uniform data.
This trend continues in Figure 4.8, with the exception of ANN's performance, which is closer to its performance on uniform data. One anomalous case is documented in Figure 4.7. In this case, ANN had a marked edge in performance over DelaunayNN and FDH for larger point sets. It is also worth noting that for this type of distribution, all implementations had significantly worse scaling than on other distributions.

The final test case was designed to see how linear degree vertices would impact the running time of the algorithm. In this instance, data points were chosen from the borders of a small number (50) of non-overlapping circles. In addition, the centers of these circles were included as data points. In the Delaunay graph, these centers become vertices of linear degree, as they form a cell with every point on the edge of the circle. Finally, a percentage of data points were included from outside each circle. The results of this test case (Figure 4.9) show that using the Voronoi cell search makes the difference between a faster query time than ANN and a slower one. It should also be noted that this type of distribution severely impacted the performance of the full Delaunay hierarchy algorithm.

Figure 4.10 shows the difference in pre-processing time for the various implementations on uniform data. While ANN maintains a distinct advantage, DelaunayNN scales much better as the data set size increases. It is also clear that using a divide and conquer approach allows for Delaunay triangulation with much more reasonable construction times, whereas FDH is forced to use the practically less efficient incremental construction.

Figure 4.5: Showing average time per query versus data set size for points taken from the uniform distribution.

4.4 3-Dimensional Experiments

There are several factors that, in theory, limit the DelaunayNN algorithm to nearest neighbor searches in two dimensions. First, it is well known that Delaunay graphs in three dimensions can contain a quadratic number of edges. Second, the approach of dealing with high degree vertices by storing the Voronoi cells cannot work in three dimensions; the regions can no longer be stored in a manner that allows a binary search to locate the next closer vertex. These limitations mean that the expected running time is linear, as opposed to logarithmic. There are some indications, however, that this approach may be serviceable in practice for data sets in three dimensions. It has been shown that while in theory Delaunay graphs are quadratic in three dimensions, in practice the point configurations where this occurs are very specific and contrived [42]. In addition, data sets that follow a random distribution do not, in general, contain vertices of linear degree. This next set of experiments shows the timing results from using the DelaunayNN algorithm on three dimensional data, compared to using ANN. Data was generated from the same distributions as in the previous section (in three dimensions instead of two). Results from the FDH algorithm are not included in the three dimensional experiments. The hierarchy built by this algorithm becomes very large in three dimensions, even for these well behaved distributions, and so the query times do not remain competitive.
It can be seen in Figure 4.11 that the DelaunayNN algorithm performs quite well in practice when compared to ANN for uniform data sets in three dimensions. Similar results to the two dimensional case are observed for both of the circular distributions (Figures 4.12 - 4.13) as well. Results for the parabolic distribution are slightly better than in the two dimensional case. ANN still surpasses the Delaunay based approach for large data sets, although in this case the size of the data set at the crossing point is larger than previously. Two final cases show the pitfalls of this approach in three dimensions. One test case had several spheres, with centers included as data points. Similar to the two dimensional case, this creates a Delaunay graph with several vertices of linear degree. In two dimensions, the Voronoi cell approach kept these additional points from having too large an impact on the average query time. In the three dimensional case, however, having to do a linear scan of the adjacent vertices allowed ANN to surpass the Delaunay algorithm (Figure 4.15). The last test case contained data points sampled from two orthogonal lines. In this case, the Delaunay graph is truly quadratic. Results from this case are not shown, as the running time for the DelaunayNN algorithm was too large to sufficiently test.

Pre-processing time for the three dimensional case was an issue. Delaunay graphs for three dimensional data were much slower to construct than the corresponding kd-trees. This was partly due to the fact that divide and conquer construction for Delaunay graphs is not available in three dimensions. In light of this, using a Delaunay graph to do nearest neighbor searches in three dimensions may only be advisable when construction time is not a factor, or if the number of queries is large enough to overcome the difference in timing.

Figure 4.6: Showing average time per query versus data set size for points taken from a unit circle. Query points were taken from the smallest square enclosing the circle.

Figure 4.7: Showing average query time versus data set size for points taken from the unit circle, plus some points chosen uniformly at random from the square containing the circle. Query points taken from the circle.

Figure 4.8: Showing average time per query versus data set size for points taken from a parabola. Query points were taken from the smallest rectangle enclosing the parabola.

Figure 4.9: Showing average time per query versus data set size for points taken from a distribution with linear degree vertices. Note that ANN is faster than the Delaunay algorithm run without the Voronoi cell search capability.
Figure 4.10: Showing average time per point to pre-process data sets for queries. Data was taken uniformly at random from the unit square.

Figure 4.11: Showing average time per query versus data set size for points taken from the uniform distribution in three dimensions.

Figure 4.12: Showing average time per query versus data set size for points taken from the circular distribution in three dimensions.

Figure 4.13: Showing average time per query versus data set size for points taken from the "fuzzy" circular distribution in three dimensions.

Figure 4.14: Showing average time per query versus data set size for points taken from the parabolic distribution in three dimensions.

Figure 4.15: Showing average time per query versus data set size for points taken from a distribution with linear degree vertices. In this case, ANN wins due to the necessity of processing the linear degree vertices by sequential scan.

CHAPTER 5
GEOMETRIC MINIMUM SPANNING TREES

In this chapter, a practical deterministic algorithm to solve the problem of constructing geometric minimum spanning trees is presented. Called GeoFilterKruskal, the algorithm efficiently computes the GMST in a manner that easily lends itself to parallelization. Prior to discussing the actual algorithm, a brief history of other approaches to solving this problem is presented.

It is well established that the GMST is a subset of the edges in the Delaunay triangulation of a point set [82], but it is also well known that this method is inefficient for any dimension d > 2. It was shown by Agarwal et al. [2] that the GMST problem is related to solving bichromatic closest pairs for some subsets of the input set (for an explanation of the BCCP problem, see chapter 2.5). Callahan [22] used well separated pair decomposition and bichromatic closest pairs to solve the same problem in O(Td(n, n) log n), where Td(n, n) is the time required to solve the bichromatic closest pairs problem for n red and n green points. The bichromatic closest pair problem is believed to be harder than computing the GMST [41]. Clarkson [30] gave an algorithm that is particularly efficient for points that are independently and uniformly distributed in a unit d-cube. His algorithm has an expected running time of O(nα(cn, n)), where c is a constant depending on the dimension and α is the extremely slowly growing inverse Ackermann function [31]. Bentley [14] also gave an expected nearly linear time algorithm for computing GMSTs in R^d.
Dwyer [39] proved that if a set of points is generated uniformly at random from the unit ball in R^d, its Delaunay triangulation has linear expected complexity and can be computed in expected linear time. Since GMSTs are subsets of Euclidean Delaunay triangulations, one can combine this result with linear time MST algorithms [53] to get an expected O(n) time algorithm for GMSTs of uniformly distributed points in a unit ball. Rajasekaran [84] proposed a simple expected linear time algorithm to compute GMSTs for uniform distributions in R^d. All these approaches use bucketing techniques to execute a spiral search procedure for finding a supergraph of the GMST with O(n) edges. Narasimhan et al. [73] gave a practical algorithm that solves the GMST problem. They prove that for uniformly distributed points, in fixed dimensions, an expected O(n log n) steps suffice to compute the GMST using well separated pair decomposition. Their algorithm, GeoMST2, mimics Kruskal's algorithm [55] on well separated pairs and eliminates the need to compute bichromatic closest pairs for many well separated pairs. Brennan [20] presented a modification to Kruskal's classic minimum spanning tree (MST) algorithm [55] that operated in a manner similar to quicksort, splitting an edge set into "light" and "heavy" subsets. Recently, Osipov et al. [77] further expanded this idea by adding a multi-core friendly filtering step designed to eliminate edges that are obviously not in the MST (Filter-Kruskal).

5.1 Notation

Points are denoted by lower-case Roman letters. d_ℓ(p, q) denotes the distance between the points p and q in the L2 metric. Upper-case Roman letters are reserved for sets. Scalars except for c, d, m and n are represented by lower-case Greek letters. i, j, k are reserved for indexing purposes. Vol() denotes the volume of an object. For a given quadtree, Box(p, q) denotes the smallest quadtree box containing points p and q; Fraktur letters (a) denote a quadtree/fair split tree node. The Cartesian product of two sets X and Y is denoted by X × Y = {(x, y) | x ∈ X and y ∈ Y}. Let P be a point set in R^d, where d is the dimension. The bounding rectangle of P, denoted by R(P), is defined to be the smallest rectangle that encloses all points in P, where the word "rectangle" denotes the Cartesian product R = [x_1, x′_1] × [x_2, x′_2] × ... × [x_d, x′_d] in R^d. Denote the length of R in the i-th dimension by l_i(R) = x′_i − x_i. Denote the maximum and minimum lengths by l_max(R) and l_min(R). When all l_i(R) are equal, R is a d-cube, and its length is denoted by l(R) = l_max(R) = l_min(R). l_i(P), l_min(P), l_max(P) are shorthand for l_i(R(P)), l_min(R(P)) and l_max(R(P)), respectively. MinDist(a, b) denotes the minimum distance between the quadtree boxes of two nodes in the case of a quadtree, and between the bounding boxes of two nodes in the case of a fair split tree. Bccp(a, b) computes the bichromatic closest pair of two nodes, and returns {u, v, δ}, where (u, v) is the edge defining the Bccp and δ is the edge length. Left(a) and Right(a) denote the left and right child of a node. |.| denotes the cardinality of a set or the number of points in a quadtree/fair split tree node. α(n) is used to denote the inverse Ackermann function [31].

5.2 The GeoFilterKruskal algorithm

The GeoFilterKruskal algorithm computes a GMST for P ⊆ R^d. Kruskal's algorithm [55] shows that given a set of edges, the MST can be constructed by considering edges in increasing order of weight.
It is known that the GMST can be computed by running Kruskal's algorithm on the Bccp edges of the WSPD of P [22]. When Kruskal's algorithm adds a Bccp edge (u, v) to the GMST, where u, v ∈ P, it uses the UnionFind data structure to check whether u and v belong to the same connected component. If they do, that edge is discarded; otherwise, it is added to the GMST. Hence, before testing an edge (u, v) for inclusion into the GMST, the algorithm should always attempt to add all Bccp edges (u′, v′) such that Dist(u′, v′) < Dist(u, v).

Algorithm 7 describes the GeoFilterKruskal algorithm for computing the geometric minimum spanning tree of a set of points. The input to the algorithm is a WSPD of the point set P ⊆ R^d. The set of WSPs S is partitioned into a set E_l that contains WSPs with cardinality less than β (initially 2), and a set E_u = S \ E_l. Then the Bccp of all elements of E_l is computed, and ρ is computed as the minimum d_ℓ(a, b) over all (a, b) ∈ E_u. E_l is further partitioned into E_l1, containing all elements with a Bccp distance less than ρ, and E_l2 = E_l \ E_l1. E_l1 is passed to the Kruskal procedure, and E_l2 ∪ E_u is passed to the Filter procedure. The Kruskal procedure is the classic Kruskal's algorithm: it first sorts the edges according to their length, then adds them to the GMST, maintaining the UnionFind data structure to keep track of the connected components. Any edge passed to this procedure is either added to the GMST or discarded. Filter examines all the remaining WSPs, and uses UnionFind to check if they have been connected by a previous call to Kruskal. By partitioning the WSPs into batches, the GeoFilterKruskal algorithm applies the same filtering technique to geometric minimum spanning tree construction. The GeoFilterKruskal procedure is called recursively, increasing the threshold value β by one each time, on the WSPs that survive the Filter procedure, until the complete minimum spanning tree has been found.

5.3 Correctness

Given previous work by Kruskal [31] as well as Callahan [22], it is sufficient to show two facts to ensure the correctness of the GeoFilterKruskal algorithm. First, the WSPs must be added to the GMST in the order of their Bccp distance. This holds because WSPs are only passed to the Kruskal procedure if their Bccp distance is less than the lower bound on the Bccp distance of the remaining WSPs. Second, the Filter procedure should not remove WSPs whose edges belong in the GMST. Once again, it is clear that any edge removed by the Filter procedure would eventually have been discarded by the Kruskal procedure, as both use the UnionFind structure to determine connectivity. By these two facts, the GeoFilterKruskal algorithm produces a correct GMST.

5.4 Analysis of the Running Time

The real bottleneck of this algorithm, as well as the one proposed by Narasimhan [73], is the computation of the Bccp.¹ If |A| = |B| = O(n), the Bccp algorithm stated in chapter 2.5 has a worst case time complexity of O(n²). Since O(n) edges must be processed, the naive computation time for the GMST will be O(n³) in the worst case. In order to bound the running time of the GeoFilterKruskal algorithm, the size of the well separated pairs passed to the Bccp algorithm must be a constant. In that case, even processing O(n) edges will not cause the running time to go above the O(n log n) running time of the initial sort. Appendix A gives a formal high probability bound proof showing that this is the case. The running time of GeoFilterKruskal can then be shown to be O(n) plus the time of one sort.

¹ According to the algebraic decision tree model, the lower bound of the set intersection problem can be shown to be Ω(n log n) [43]. Here, the set intersection problem is solved using Bccp: if the Bccp distance between two sets is zero, the sets intersect, otherwise they do not. Since the set intersection problem is lower bounded by Ω(n log n), the Bccp computation is also lower bounded by Ω(n log n).
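Both the Kruskal and Filter procedures in Algorithm 7 below rely on a standard union-find structure. A minimal sketch (union by rank with path halving; not the thesis's exact implementation) is:

// Standard disjoint-set (UnionFind) sketch used for connectivity checks.
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

class UnionFind {
public:
    explicit UnionFind(std::size_t n) : parent(n), rnk(n, 0) {
        std::iota(parent.begin(), parent.end(), 0);   // each element is its own root
    }
    std::size_t Find(std::size_t x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];            // path halving
            x = parent[x];
        }
        return x;
    }
    void Union(std::size_t a, std::size_t b) {
        a = Find(a); b = Find(b);
        if (a == b) return;
        if (rnk[a] < rnk[b]) std::swap(a, b);         // union by rank
        parent[b] = a;
        if (rnk[a] == rnk[b]) ++rnk[a];
    }
private:
    std::vector<std::size_t> parent, rnk;
};

With this structure, Kruskal either unions the endpoints of an edge or discards it, and Filter drops any WSP whose representative points already share a root.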
Algorithm 7 GeoFilterKruskal Algorithm
Require: S = {(a_1, b_1), ..., (a_m, b_m)} is a WSPD constructed from P ⊆ R^d; T = {}.
Ensure: Bccp threshold β ≥ 2.
 1: procedure GeoFilterKruskal(Sequence of WSPs: S, Sequence of Edges: T, UnionFind: G, Integer: β)
 2:   E_l = E_u = E_l1 = E_l2 = ∅
 3:   for all (a_i, b_i) ∈ S do
 4:     if (|a_i| + |b_i|) ≤ β then E_l = E_l ∪ {(a_i, b_i)} else E_u = E_u ∪ {(a_i, b_i)}
 5:   end for
 6:   ρ = min{MinDist(a_i, b_i) : (a_i, b_i) ∈ E_u, i = 1, 2, ..., m}
 7:   for all (a_i, b_i) ∈ E_l do
 8:     {u, v, δ} = Bccp(a_i, b_i)
 9:     if (δ ≤ ρ) then E_l1 = E_l1 ∪ {(u, v)} else E_l2 = E_l2 ∪ {(u, v)}
10:   end for
11:   Kruskal(E_l1, T, G)
12:   E_new = E_l2 ∪ E_u
13:   Filter(E_new, G)
14:   if (|T| < (n − 1)) then GeoFilterKruskal(E_new, T, G, β + 1)
15: end procedure
16: procedure Kruskal(Sequence of WSPs: E, Sequence of Edges: T, UnionFind: G)
17:   Sort(E) by increasing Bccp distance
18:   for all (u, v) ∈ E do
19:     if G.Find(u) ≠ G.Find(v) then T = T ∪ {(u, v)}; G.Union(u, v)
20:   end for
21: end procedure
22: procedure Filter(Sequence of WSPs: E, UnionFind: G)
23:   for all (a_i, b_i) ∈ E do
24:     if (G.Find(u) = G.Find(v) : u ∈ a_i, v ∈ b_i) then E = E \ {(a_i, b_i)}
25:   end for
26: end procedure

Figure 5.1: This figure demonstrates the run time gains of the algorithm as more threads are used, showing scaling for two architectures. The AMD line was computed on the machine described in Section 5.6. The Intel line used a machine with four 2.66GHz Intel(R) Xeon(R) CPU X5355 processors, 4096 KB of L1 cache, and 8 GB total memory. For additional comparison, KNNG construction time is included using a parallel 8-nearest neighbor graph algorithm. All cases were run on 20 data sets of size 10^6 points drawn from a uniform random distribution; the final total run time is an average of the results.

5.5 Parallelization

Although the whole of Algorithm 7 is not parallelizable, most portions of the algorithm can be parallelized. The parallel partition algorithm [83] is used in order to divide the set S into the subsets E_l and E_u (see Algorithm 7). ρ can be computed using a parallel prefix computation. The further subdivision of E_l, as well as the Filter procedure, are just further instances of parallel partition. The sorting step used in the Kruskal procedure can also be parallelized [83]. Efforts to parallelize the linear time tree construction showed that one cannot use more processors on multi-core machines to speed up this construction, because it is memory bound. Figure 5.1 shows empirical results for parallel execution of the GeoFilterKruskal algorithm.

5.5.1 Comparing Quadtrees and Fair Split Trees

Theoretically, the choice of partition tree used to construct the WSPD for this algorithm does not matter. This section compares results of running the GeoFilterKruskal algorithm using the quadtree versus a KD-tree with a modified splitting rule (this tree will be referred to as a fair split tree, or FST).
5.5.1 Comparing Quadtrees and Fair Split Trees

Theoretically, the choice of partition tree used to construct the WSPD for this algorithm does not matter. This section compares results of running the GeoFilterKruskal algorithm using the quadtree versus a KD-tree with a modified splitting rule (this tree will be referred to as a fair split tree, or FST). The main difference between the two is that the quadtree can be constructed faster; however, the FST exhibits better clustering. The data sets used were uniformly random, two dimensional points. Experiments run on data sets from other distributions and higher dimensions (up to 5) were not significantly different.

Figure 5.2 shows the difference in the size of the WSPD set computed from the two trees. As expected, better clustering in the fair split tree yields fewer well separated pairs. Figure 5.3 shows the timing comparisons from using both of the trees. Even though the quadtree can be constructed more quickly, the algorithm runs significantly faster given the fewer WSPs produced by the fair split tree.

Figure 5.4 shows that lowering the separation factor of the well separated pair decomposition in the code causes the trees to produce fewer WSPs than are required for an exact result. In this case, the MSTs produced had some small margin of error. Interestingly, it seems that if one accepts a small error in the MST, reducing the separation factor can produce approximate MSTs using many fewer WSPs. Figure 5.5 shows empirical results for this case, varying the separation factor from √2 (the lowest factor that guarantees a correct result) down to 0.1·√2. At this time there are no theoretical guarantees on the error of the approximation; these results show that, in practice, it appears to be quite small.

(a) Number of WSPs v. Number of Points. Figure 5.2: Comparison of the number of WSPs (/10^6) versus the number of points (/10^6) for quadtrees and fair split trees.
(b) Number of Points v. Time. Figure 5.3: Comparison of the number of points (/10^6) versus GMST construction time (seconds) for quadtrees and fair split trees.
(c) Separation Factor v. Number of Pairs. Figure 5.4: Separation factor of WSPs versus the number of WSPs produced.
(d) Separation Factor v. Percent Error. Figure 5.5: Separation factor of WSPs versus the error in the length of the GMST.
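The separation factor experiments above hinge on a single predicate: when are two point sets considered well separated? The sketch below shows one common formulation of that check on bounding boxes, in two dimensions for brevity. The Box type and the helper names are illustrative assumptions; the exact predicate used by the thesis code may differ in detail.

    #include <algorithm>
    #include <cmath>

    // Axis-aligned bounding box of a point set (2-d for brevity).
    struct Box { double lo[2], hi[2]; };

    // Radius of the smallest enclosing circle of a box.
    static double radius(const Box& b) {
        double dx = (b.hi[0] - b.lo[0]) / 2.0, dy = (b.hi[1] - b.lo[1]) / 2.0;
        return std::sqrt(dx * dx + dy * dy);
    }

    // Distance between the centers of two boxes.
    static double center_dist(const Box& a, const Box& b) {
        double dx = (a.lo[0] + a.hi[0]) / 2.0 - (b.lo[0] + b.hi[0]) / 2.0;
        double dy = (a.lo[1] + a.hi[1]) / 2.0 - (b.lo[1] + b.hi[1]) / 2.0;
        return std::sqrt(dx * dx + dy * dy);
    }

    // Two sets are s-well-separated if they fit in balls of radius r whose
    // gap is at least s * r.  The gap between the balls is center_dist - 2r.
    bool well_separated(const Box& a, const Box& b, double s) {
        double r = std::max(radius(a), radius(b));
        return center_dist(a, b) - 2.0 * r >= s * r;
    }

Lowering the parameter s below √2 admits more pairs as separated, which is exactly what shrinks the WSPD in Figures 5.4 and 5.5 at the cost of a possibly approximate MST.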
5.6 Geometric Minimum Spanning Tree Experimental Setup

The GeoFilterKruskal algorithm was tested in practice against several other implementations of geometric minimum spanning tree algorithms. A subset of the algorithms compared in [73] were chosen, excluding some based on the availability of source code and the clear advantage shown by some algorithms in the aforementioned work. Based on the results in the previous section, we elected to implement the GeoFilterKruskal algorithm using fair split trees instead of quadtrees. Table 5.1 lists the algorithms that will be referred to in the experimental section.

GeoFilterKruskal was written in C++ and compiled with gcc with -O3 optimization. Parallel code was written using OpenMP [33] and the parallel mode extension to the STL [83]. C++ source code for GeoMST, GeoMST2, and Triangle was provided by Giri Narasimhan. In addition, Triangle used Jonathan Shewchuk's triangle library for Delaunay triangulation [92]. The machine used has 8 Quad-Core AMD Opteron(tm) Processor 8378 CPUs with hyperthreading enabled. Each core has an L1 cache size of 512 KB, L2 of 2MB and L3 of 6MB, with 128 GB total memory. The operating system was CentOS 5.3. All data was generated and stored as 64 bit doubles.

In the next section there are two distinct series of graphs. The first set displays graphs of total running time versus the number of input points, for two to five dimensional points with uniform random distribution in a unit hypercube. The L2 metric was used for distances in all cases, and all algorithms were run on the same random data set. Each algorithm was run on five data sets, and the results were averaged. As noted above, Triangle was not used in dimensions greater than two.

Table 5.1: Algorithm Descriptions

GeoFK#: Implementation of Algorithm 7. There are two important differences between the implementation and the theoretical version. First, the Bccp threshold β is incremented in steps of size O(1) instead of size 1. This change does not affect the analysis but helps in practice. Second, for small well separated pairs (less than 32 total points) the Bccp is computed by a brute force algorithm. In the experimental results, GeoFK1 refers to the algorithm running with 1 thread and GeoFK8 refers to the algorithm using 8 threads. This implementation used a fair split tree, as opposed to the quadtree.

GeoMST: Described by Callahan and Kosaraju [22]. This algorithm computes a WSPD of the input data followed by the Bccp of every pair. It then runs Kruskal's algorithm to find the MST.

GeoMST2: Described in [73]. This algorithm improves on GeoMST by using marginal distances and a priority queue to avoid many Bccp computations.

Triangle: This algorithm first computes the Delaunay triangulation of the input data, then applies Kruskal's algorithm. Triangle only works with two dimensional data.

The second set of graphs shows the mean total running times for two dimensional data of various distributions, as well as the standard deviation. The distributions were taken from [73] (given n d-dimensional points with coordinates c1 ... cd) and are shown in Table 5.2.

5.7 Geometric Minimum Spanning Tree Experimental Results

As shown in Figures 5.6 - 5.8, GeoFK1 performs favorably in practice in almost all cases compared to the other algorithms (see Table 5.1). In two dimensions, only Triangle outperforms GeoFK1. In higher dimensions, GeoFK1 is the clear winner when only one thread is used. Figures 5.9 - 5.11 show that in most cases GeoFK1 performs better regardless of the distribution of the input point set. Apart from the fact that Triangle maintains its superiority in two dimensions, GeoFK1 performs better on all the distributions that were considered, except when the points are drawn from the arith distribution. In the data set from arith, the ordering of the WSPs based on the minimum distance is the same as that based on the Bccp distance, hence the second partitioning step in GeoFK1 acts as an overhead. The results from Figures 5.9 - 5.11 are for two dimensional data. The same experiments for data sets of other dimensions did not give significantly different results, and so were not included.

Table 5.2: Point Distribution Info

unif: c1 to cd chosen from the unit hypercube with uniform distribution (U^d)
annul: c1 to c2 chosen from the unit circle with uniform distribution, c3 to cd chosen from U^d
arith: c1 = 0, 1, 4, 9, 16, ...; c2 to cd are 0
ball: c1 to cd chosen from the unit hypersphere with uniform distribution
clus: c1 to cd chosen from 10 clusters of normal distribution centered at 10 points chosen from U^d
edge: c1 chosen from U^d, c2 to cd equal to c1
diam: c1 chosen from U^d, c2 to cd are 0
corn: c1 to cd chosen from 2^d unit hypercubes, each one centered at one of the 2^d corners of a (0,2) hypercube
grid: n points chosen uniformly at random from a grid with 1.3n points, the grid is housed in a unit hypercube
norm: c1 to cd chosen from (−1, 1) with normal distribution
spok: for each dimension d′ in d, n/d points chosen with cd′ chosen from U^1 and all other coordinates equal to 1/2
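To make the distribution descriptions in Table 5.2 concrete, the following sketch generates two of them, unif and clus, in C++. It is an illustration only: the cluster spread sigma and the use of std::mt19937 are assumptions, since the thesis does not specify how the data sets were produced beyond the descriptions above.

    #include <random>
    #include <vector>

    using Point = std::vector<double>;

    // unif: every coordinate drawn uniformly from the unit hypercube U^d.
    std::vector<Point> gen_unif(int n, int d, std::mt19937& rng) {
        std::uniform_real_distribution<double> U(0.0, 1.0);
        std::vector<Point> pts(n, Point(d));
        for (Point& p : pts)
            for (double& c : p) c = U(rng);
        return pts;
    }

    // clus: points drawn from normal distributions centered at 10 cluster centers
    // that are themselves chosen uniformly from the unit hypercube.  The spread
    // sigma is an assumed value; the thesis does not state one.
    std::vector<Point> gen_clus(int n, int d, std::mt19937& rng, double sigma = 0.05) {
        std::normal_distribution<double> N(0.0, sigma);
        std::vector<Point> centers = gen_unif(10, d, rng);
        std::vector<Point> pts(n, Point(d));
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < d; ++j) pts[i][j] = centers[i % 10][j] + N(rng);
        return pts;
    }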
Figure 5.6: Total running time (seconds) for each algorithm (GeoMST, GeoMST2, GeoFK1, GeoFK8, Triangle) over varying sized data sets (number of points /10^6). Data was uniformly random and two dimensional.

Figure 5.7: Total running time for each algorithm over varying sized data sets (number of points /10^5). Data was uniformly random and three dimensional.

Figure 5.8: Total running time for each algorithm over varying sized data sets (number of points /10^5). Data was uniformly random and four dimensional.

Figure 5.9: Mean run time and standard deviation when comparing GeoMST to GeoFK1 on the varying distributions. Data size was 10^6 points in two dimensions.

Figure 5.10: Mean run time and standard deviation when comparing GeoMST2 to GeoFK1 on the varying distributions. Data size was 10^6 points in two dimensions.

Figure 5.11: Mean run time and standard deviation when comparing Triangle to GeoFK1 on the varying distributions. Data size was 10^6 points in two dimensions.

CHAPTER 6

NEAREST NEIGHBOR SEARCH OF HIGH DIMENSIONAL SIFT DATA

High dimensional nearest neighbor search is not fundamentally different than nearest neighbor search in lower dimensions. The challenge comes in dealing with the "curse of dimensionality," the exponential dependence on dimension that cripples standard techniques when they are applied in high dimensions. In this chapter, a new method, PCANN, for conducting higher dimensional nearest neighbor searches will be presented, and its performance will be compared with other state-of-the-art implementations.

Methods for searching nearest neighbors in high dimensions can be grouped into exact methods and approximate ones. The most obvious exact method is a sequential scan of the data, and this is used if the number of queries to be made is small. Weber et al. [99] improve on the sequential scan through the use of data compression. The idea of compressing the data was later combined with R-trees to create the IQ-tree [15]. The iDistance technique maps high dimensional points to a Morton order curve, then indexes the curve with a B-tree to search for nearest neighbors.
This is considered to be the most efficient algorithm for exact nearest neighbor search in high dimensions, but approximation techniques beat it by a wide margin [95]. All of these solutions have, at best, a provably linear query time. For approximate nearest neighbor searching in high dimensions, Indyk and Motwani gave the first algorithm with a less than linear query time as well as a somewhat practical implementation. LSH uses multiple hash functions picked, at random, from a family of hash functions in order to bucket the high dimensional points in a low dimensional subspace [50]. It then seeks to answer ball queries by hashing query points and returning any or all points that lie within a radius. Two LSH methods have been used in practice. Rigorous-LSH [50] uses a number of radius balls to guarantee a constant factor nearest neighbor solution, although with substantial cost in both time and storage space. Adhoc-LSH [48] forgoes the theoretical guarantees and instead focuses on fast queries. By using a heuristic approach to selecting the query radius, as well as multiple shifted versions of the buckets, adhoc-LSH answers queries much more rapidly than the rigorous-LSH method. The drawback, however, is that the quality of the nearest neighbor solution suffer. Proposed by Tao et al, LSB-trees [95] were designed to specifically address the shortcomings of the LSH algorithm for nearest neighbor searching. This method uses hash functions similar to LSH, as well as a B-tree to index points for quick searching. They showed that, in practice, they can easily outperform LSH and iDistance. Additionally, they were able to 62 give constant factor bounds on the nearest neighbor answer returned [95]. Other approximation algorithms are based on randomized kd-trees. Silpa-Anan and Hartley demonstrated this method could work, in practice [93]. Fukunaga and Narendra [47] proposed partitioning high level point sets recursively into disjoint groups using k-means clustering. This was later put into practice for nearest neighbor queries on large databases. Mikolajczyk and Matas [64] evaluated the practicality of many of these methods, and later Muja and Lowe developed a self tuning algorithm, called FLANN, that actually selects between several algorithms depending on the data set [71]. Using this, they were able to outperform other methods, including LSH and ANN. 6.0.1 Notation Points are denoted by lower-case Roman letters. d is reserved for the dimension of the point set. n is reserved for the cardinality of the point set. dℓ (p, q) denotes the distance between the points p and q in L2 metric. Upper-case Roman letters are reserved for matrices. Additionally, point sets are considered to be stored in a d by n matrix. Bold, upper case Roman letters are reserved for sets. Scalars are represented by lower-case Greek letters. i, j, k are reserved for indexing purposes. 6.1 The PCANN Algorithm The FLANN implementation demonstrates two useful ideas in terms of high dimensional nearest neighbor searching. The first is that, depending on the data set, one method for nearest neighbor searching may be superior to others. A practical algorithm, therefore, might be able to gain advantage by analyzing the data set and picking an optimal method. The second is the common thought that if the data is projected into enough low dimensional subspaces, some of them will contain the correct adjacency data. PCANN(Algorithm 8) seeks to combine these notions. 
Instead of random projections into low dimensional subspaces, this algorithm will analyze the data and find subspaces which best captures the adjacency information. Algorithm 8 details the construction of a data structure for high dimensional nearest neighbor searching. The Build procedure is the called function. Let P be a set of points in Rd . Let T be a set of points in Rd for which the nearest neighbor in P is known. ǫ is the error parameter, ρ′ is the number of partitions, ν is a constant to define the number of random projections used in partitioning the points. First, principal component analysis(see chapter 2.7.1) is performed on the d dimensional data set, identifying d eigenvectors and ordering them by dominance(Line 12). The data, along with a training set of queries, are projected into the hyper-plane identified by the first eigenvector(Line 13). Nearest neighbors are found for the projected points in this initial subspace, and error is computed. If that error is greater than the target error ǫ, then the projecting dimension is incremented and the procedure is repeated(Lines28-32). For ease of explanation, this is described using a linear increase in projection dimension. In practice, this is best accomplished using a binary search style approach. Once the lower dimension has been identified, the points in d will be projected down into the new subspace, defined by the first d′ eigenvectors from the PCA(Line 13). It would 63 be possible to construct a kd-tree over this projection, and search for nearest neighbors. Instead, additional processing allows the further reduction of dimension (with a corresponding reduction in query time). In order to partition the point set, ν random 3 dimensional projections are constructed. These projections are partitioned by fitting two slabs to the point sets(Lines 15-23). In this case, a slab is defined as two parallel planes. This 2-line center problem is solved using an approximate incremental algorithm introduced by Kumar and Kumar [57]. In it, empty slabs are initially created, and the furthest point from them in the data set is found. This point is added to each slab, and recursion is carried out to find the best fit for the remaining points, which minimizes the width of the slabs and also places a minimum threshold of points into each set. Out of ν projections, the one with slabs of minimum width is chosen. The points are then divided into a red and blue set,depending on which slab contained them. The entire procedure is repeated recursively until the original data set has been divided into ρ′ partitions. The goal in dividing the points in this manner is to find projections using PCA that allow maximum reduction in dimension while minimizing the error in nearest neighbor searching. Once all the projections have been identified, the Project procedure is repeated one final time for each piece. This again identifies a subspace, based on PCA, for which the training data has an acceptable error. Now, a kd-tree can be built over each piece, making up the set K. The number of partitions, ρ′ , chosen is influenced by several factors. In a standard, sequential case, there is a tipping point (which can be determined empirically) where the overhead from having additional partitions overcomes the advantage of partitioning the point set. 
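The dimension-selection loop at the heart of the Project procedure described above can be sketched as follows. This is a simplified, sequential illustration under stated assumptions: points are plain std::vector<double>s, the PCA eigenvectors E are assumed to be precomputed and ordered by dominance, nearest neighbor distances are found by brute force, and the loop over candidate dimensions is linear rather than the binary-search style probe the text recommends. The names are hypothetical, not those of the PCANN implementation.

    #include <algorithm>
    #include <cmath>
    #include <limits>
    #include <vector>

    using Point = std::vector<double>;

    // Project point p onto the subspace spanned by the first k eigenvectors
    // E[0..k-1] (each eigenvector has the same length d as p).
    static Point project(const Point& p, const std::vector<Point>& E, int k) {
        Point q(k, 0.0);
        for (int j = 0; j < k; ++j)
            for (size_t i = 0; i < p.size(); ++i) q[j] += p[i] * E[j][i];
        return q;
    }

    // Brute-force distance from q to its nearest neighbor in data.
    static double nn_dist(const Point& q, const std::vector<Point>& data) {
        double best = std::numeric_limits<double>::max();
        for (const Point& p : data) {
            double s = 0.0;
            for (size_t i = 0; i < q.size(); ++i) { double d = q[i] - p[i]; s += d * d; }
            best = std::min(best, s);
        }
        return std::sqrt(best);
    }

    // Smallest projection dimension k whose average excess nearest-neighbor
    // distance over the training queries T stays below eps (Lines 28-32 of Algorithm 8).
    int choose_projection_dim(const std::vector<Point>& P, const std::vector<Point>& T,
                              const std::vector<Point>& E, double eps) {
        const int d = static_cast<int>(E.size());
        for (int k = 1; k <= d; ++k) {
            std::vector<Point> Pk, Tk;
            for (const Point& p : P) Pk.push_back(project(p, E, k));
            for (const Point& t : T) Tk.push_back(project(t, E, k));
            double err = 0.0;
            for (size_t i = 0; i < T.size(); ++i)
                err += nn_dist(Tk[i], Pk) - nn_dist(T[i], P);
            if (err / T.size() < eps) return k;
        }
        return d;
    }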
For data sets too large to fit into conventional memory, this technique can be adapted to work effectively by partitioning until each piece can be processed in conventional memory, eliminating the need for some costly calls to disk. The algorithm also lends itself well to parallel queries. In this case, the number of partitions would be best dictated by the number of available threads (some constant number of partitions going to each thread).

Once the structure has been created, the query algorithm is simple. A query point is projected into the subspace defined by each of the constructed partitions. Each partition is queried using its kd-tree, and the top k results are reported. In the sequential version, partitions are processed in order of subspace distance from the query point; subspaces can be eliminated if they are farther away than any current neighbor. For parallel queries, the greatest timing improvements are seen when queries are processed in batches. In this case, each partition runs in a separate thread and processes all queries in a batch. Afterward, the results from each partition are combined, with the top k for each query being reported.

The computational complexity of the construction is dominated by the two-plane fit computation. The PCA computation is linear in complexity, and the kd-tree construction is easily done in O(n log n) time. Kumar and Kumar showed that their k-clustering algorithm has a running time of O(k^n) [57], which would be O(2^n) in this case. Note that this is sub-optimal theoretically; Agarwal and Procopiuc give an approximation algorithm with a running time of O(n log n) [3]. However, the incremental algorithm has been shown, in practice, to terminate quickly in most cases [57].

Algorithm 8 Preprocessing for PCANN
Require: Point sets P, T ∈ R^d of size n.
Ensure: M[1...ρ] ← all independent partitions of P.
 1: procedure Build(P, T, ǫ, ρ′, ν)
 2:   M′[] ← Partition(P, T, ǫ, 0, ρ′, ν, M′[])
 3:   for i = 1 to ρ′
 4:     E[i] ← eigenvectors of M′[i] from PCA
 5:     M[i] ← Project(M′[i], T, ǫ, E[i])
 6:     K[i] ← kd-tree(M[i])
 7: end procedure
 8: procedure Partition(P, T, ǫ, ρ, ρ′, ν, M[])
 9:   if ρ ≥ ρ′ then
10:     partition P added to M
11:     return
12:   E ← d eigenvectors of P through PCA
13:   P′ ← Project(P, T, ǫ, E)
14:   φ ← ∞
15:   for j = 1 to ν
16:     R ← random vectors of length 3
17:     P′′ ← P′ × R
18:     φ′ ← TwoSlabFit(P′′, R, B)   // the width of the slabs is returned
19:       // from the fitting procedure, which also
20:     if φ′ < φ then                // partitions the points into
21:       φ = φ′                      // slabs R and B
22:       P1 ← R
23:       P2 ← B
24:   Partition(P1, T, ǫ, ρ + 2, ρ′, ν, M[])
25:   Partition(P2, T, ǫ, ρ + 2, ρ′, ν, M[])
26: end procedure
Require: Point sets P, T ∈ R^d of size n.
27: procedure Project(P, T, ǫ, E)
28:   for i = 1 to d
29:     P′ ← P × E[1...i]
30:     T′ ← T × E[1...i]
31:     e = Σ_{t∈T} [ dℓ(t, nn(t, P′)) − dℓ(t, nn(t, P)) ]
32:     if e/|T| < ǫ then return P′
33: end procedure

6.2 High Dimensional Nearest Neighbor Search Experimental Setup

The PCANN algorithm was tested experimentally against two competitors. The goal was to show that, given an appropriate data set, the PCA projection approach can lead to better practical results. The PCANN algorithm was coded in C++ and compiled with gcc with -O3 optimization. Parallel code was written using OpenMP [33] and the parallel mode extension to the STL [83]. ANN was used to construct the kd-trees and query them. The Linux machine used has 8 Quad-Core AMD Opteron(tm) Processor 8378 CPUs with hyperthreading enabled.
Each core has a L1 cache size of 512 KB, L2 of 2MB and L3 of 6MB with 128 GB total memory. The operating system was CentOS 5.3. All data was generated and stored as 64 bit doubles. The Windows machine used has an Intel Core 2 Duo CPU E7500. It has an 3MB L2 cache and 4 GB total memory. The operating system was Windows 7. The FLANN and LSB-Tree algorithms were chosen as competitors, due to their performance when compared to the rest of the field. Both algorithms have been tested experimentally against other leading implementations, and came out ahead. Results from testing LSH was not included in these experiments, due to it begin dominated by FLANN and LSB-Tree in previous research. FLANN was tested on the Linux machine, LSB-Tree was tested on the Windows machine. 6.2.1 FLANN FLANN uses an optimization problem to choose between two nearest neighbor algorithms based on the structure of the data set and a training set of data. The first algorithm it considers is a randomized kd-tree algorithm. It projects data points into a random, lowdimensional plane, and builds a kd-tree on it. It repeats this procedure until it has enough trees so that queries can be answered within the accuracy of a user defined error parameter. The projection dimension is a fixed constant. The second algorithm is a hierarchical k-means tree. It is constructed by partitioning the data sets into K distinct regions using k-means clustering, and recursively constructing nodes until the number of points in a node is smaller than K. The query algorithm is similar to the method used by ANN. First, there is a traversal of the tree in order to identify necessary paths of traversal. These are placed in a priority queue and processed in order. 6.2.2 LSB-Tree The LSB-tree is designed to overcome the weaknesses of LSH in performing nearest neighbor search. Its data structure is created by first choosing a projecting dimension, based on the original dimension, the number of points, and the page size of the architecture. Then random hash functions are chosen, in a manner similar to LSH, one for each dimension in the projection. Each coordinate in the projected point comes from a hash of the point in the original space. Once this is done, the hashed points are projected onto a Morton order curve, and indexed with a B tree. Queries are processed by finding their location in the 66 Morton order, and considering the hashed points in that node. Further nodes are processed in order of increasing z-value, until a maximum number of searches have been reached, or the nearest unsearched node is farther than a pre-determined value. A single LSB-tree is not accurate enough to return good nearest neighbor results. In order to bound the quality of the answers, a set of trees is constructed (again based on the dimension, size and page size). This LSB-forest gives 4-factor approximation with a constant probability. By increasing the number of forests, this factor can be improved to (2 + ǫ). 6.2.3 Experiment Data Data for these experiments was taken from BIGANN, consisting of one billion 128dimensional SIFT descriptors. These features were extracted from one million images. SIFT data is commonly used along with nearest neighbor searching to match features, making this a realistic test of the effectiveness of these methods. Finally, this is similar data to that used by the FLANN authors to test their algorithm. From the one billion vectors, a subset were selected to form data sets of ranging size for the tests. 
In addition to the SIFT data, the algorithms were also tested on an artificial distribution. Uniformly random data distributed in the unit hypercube has much less structure than SIFT data. This demonstrates how the PCANN algorithm behaves when the principal component analysis cannot identify very dominant eigenvectors for projection.

6.3 High Dimensional Nearest Neighbor Search Experimental Results

The first set of experimental results shows a comparison between PCANN and FLANN (Figure 6.1) and between PCANN and LSB-Tree (Figure 6.2). PCANN clearly has an advantage in timing on the SIFT data. Figure 6.3 shows the average error for all three algorithms. In order to tune PCANN and FLANN, the LSB-Tree algorithm was run first, and its error was used as the target for the other two. The error factor is determined as the distance to the found near neighbor divided by the distance to the correct nearest neighbor. All three algorithms had similar error.

The second set of experiments shows the same trials on uniform random data. In this case, PCANN did not perform as well. Figure 6.4 shows that it was still able to outperform FLANN, but LSB-Tree had better query times (Figure 6.5). Again, error was tuned to LSB-Tree's error, and the results did not differ significantly from the SIFT data in terms of relative error.

Figure 6.6 shows the error factor for varying projection dimensions. In this experiment, PCA was done on the data set, and the data points were projected into a particular dimension. Queries were run, and the error factor calculated. The error is slightly higher than it would be if partitioning had been done, but this simpler graph demonstrates the principle without much clutter. The graph clearly shows that in order to achieve a low error rate, the SIFT data requires a much lower projection dimension than uniform data, which translates directly into a faster query time.

The final graph (Figure 6.7) shows PCANN's performance in parallel. The sequential version was tested alongside a parallel version running on 8 threads. In this instance, the parallel version answers queries around 3.5 times faster than the sequential version. The need to combine the results from the various threads prevents the algorithm from being perfectly parallel.
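The batched parallel query scheme described in Section 6.1 (one thread per partition, results merged afterwards) can be sketched as follows. This is an illustrative outline rather than the PCANN implementation: the Hit record, the query_partition callback and the merge strategy are assumptions, with the per-partition search (projection plus kd-tree lookup) supplied by the caller.

    #include <algorithm>
    #include <functional>
    #include <vector>

    struct Hit { double dist; int id; };   // candidate neighbor: distance and point id

    // query_partition(p, q, k) returns the k best candidates that partition p holds
    // for query q (in PCANN terms: project query q into partition p's subspace and
    // search that partition's kd-tree).  It is supplied by the caller.
    using PartitionQuery = std::function<std::vector<Hit>(int, int, int)>;

    std::vector<std::vector<Hit>> batched_query(int num_partitions, int num_queries, int k,
                                                const PartitionQuery& query_partition) {
        // Phase 1: each partition processes the whole batch; partitions run in
        // parallel (compile with -fopenmp).
        std::vector<std::vector<std::vector<Hit>>> per_part(num_partitions);
        #pragma omp parallel for
        for (int p = 0; p < num_partitions; ++p) {
            per_part[p].resize(num_queries);
            for (int q = 0; q < num_queries; ++q)
                per_part[p][q] = query_partition(p, q, k);
        }
        // Phase 2: merge the candidate lists and keep the k closest hits per query.
        std::vector<std::vector<Hit>> result(num_queries);
        for (int q = 0; q < num_queries; ++q) {
            std::vector<Hit>& all = result[q];
            for (int p = 0; p < num_partitions; ++p)
                all.insert(all.end(), per_part[p][q].begin(), per_part[p][q].end());
            size_t keep = std::min<size_t>(k, all.size());
            std::partial_sort(all.begin(), all.begin() + keep, all.end(),
                              [](const Hit& a, const Hit& b) { return a.dist < b.dist; });
            all.resize(keep);
        }
        return result;
    }

The serial merge in the second phase is the kind of combining step that, as noted above, keeps the observed speedup below the ideal factor.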
Figure 6.1: Timing results (average time per query in microseconds versus data size in thousands of points) for PCANN versus FLANN on SIFT data.

Figure 6.2: Timing results for PCANN versus LSB-Tree on SIFT data.

Figure 6.3: Average error factor for PCANN, FLANN and LSB-Tree on SIFT data.

Figure 6.4: Timing results for PCANN versus FLANN on uniform random data.

Figure 6.5: Timing results for PCANN versus LSB-Tree on uniform random data.

Figure 6.6: Average error factor for the PCA projection, versus projection dimension, for SIFT and uniform random data. The average is from 1000 queries on 100000 data points.

Figure 6.7: Timing results for sequential PCANN versus parallel PCANN on SIFT data.

CHAPTER 7

CONCLUSIONS

This work has attempted to expand the tool set for solving common computational geometry problems. Through the use of common building blocks, several algorithms have been presented that offer advantages over traditional competitors.

The knng algorithm was presented as an effective solution to the k-nearest neighbor graph construction problem, one that takes advantage of multiple threads. While the algorithm performs best on point sets that use integer coordinates, it is clear from experimentation that the algorithm is still viable using floating point coordinates. Further, the algorithm scales well in practice as k increases, as well as for data sets that are too large to reside in internal memory. Finally, the cache efficiency of the algorithm should allow it to scale well as more processing power becomes available.

The DelaunayNN algorithm for finding the nearest neighbor of a query in two dimensions has both an expected run time bound of O(log n) and strong experimental performance when compared to existing, state-of-the-art implementations. This work has also explored using the algorithm on data sets in three dimensions, and seen that in some cases it may be superior. It remains to be seen if this approach can be applied in a reasonable manner to dimensions higher than three, and if it can be extended to allow for efficient solutions to the k-nearest neighbor problem.

The GeoFilterKruskal algorithm is a provably efficient and practically effective method for constructing geometric minimum spanning trees. It uses well separated pair computation in combination with partitioning and filtering to solve the problem in a parallelizable way. As demonstrated on a wide variety of data sets, it is superior to many state-of-the-art implementations on most distributions.

The PCANN algorithm demonstrates how appropriate subspace selection, through the use of principal component analysis and projective clustering, can improve the query time for high dimensional nearest neighbor searching on some data sets. While it was less efficient on random data sets, it clearly offers an effective and parallelizable solution for data sets which exhibit an appropriate, exploitable structure.

The field of computational geometry offers a wide array of problems which are applicable to a multitude of other fields of science. By improving on solutions to these problems, hopefully both the field of computational geometry, as well as the fields that use it, can be advanced. The goal of this work was to be just such an improvement, and it is the hope of the author that other researchers in these fields will find the results presented here useful.

APPENDIX A

FULL PROOF OF THE GMST ALGORITHM ANALYSIS

This high probability proof was written in conjunction with Samidh Chaterjee, and is reproduced here with his permission.

High Probability Bound Analysis. In this section we show that the GeoFilterKruskal algorithm takes one sort plus O(n) additional steps, with high probability (WHP) [68], to compute the GMST. Let Pr(E) denote the probability of occurrence of an event E, where E is a function of n. An event E is said to occur WHP if, given β > 1, Pr(E) > 1 − 1/n^β [68].
Let P be a set of n points chosen uniformly from a unit hypercube H in Rd . Given this, we state the following lemma from [73]. Lemma A.0.1. Let C1 and C2 be convex regions in H such that α ≤ volume(C1 )/volume(C2 ) ≤ 1/α for some constant 0 < α < 1. If |C1 ∩ P | is bounded by a constant, then with high probability |C2 ∩ P | is also bounded by a constant. We will use the above lemma to prove the following claims. Lemma A.0.2. Given a constant γ > 1, WHP, GeoFilterKruskalfilters out WSPs that have more than γ points. We prove this lemma for quadtrees only, for FST, the proof does not change from what is shown in [73]. Proof. The proof of this lemma is similar to the one for GeoMST2. Consider a WSP (a, b). If both |a| and |b| are less than or equal to γ then the time to compute their Bccp distance is O(1). Let us now assume, w.lo.g., that |a| > γ. We will show that, in this case, we do not need to compute the Bccp distance of (a, b) WHP. Let pq be a line segment joining a and b such that the length of pq (let us denote this by |pq|) is MinDist(a, b). Let C1 be a hypersphere centered at the midpoint of pq and radius |pq|/4. Let C2 be another hypersphere with the same center but radius 3|pq|/2. Since a and b are well separated, C2 will contain both a and b. Now, volume(C1 )/volume(C2 ) = 6−d . Since C1 is a convex region, if |C1 | is empty, then by Lemma A.0.1, |C2 | is bounded by a constant WHP. But C2 contains a which has more than γ points. Hence C1 cannot be empty WHP. Let a ∈ a, b ∈ b and c ∈ C1 . Also, let the pair (a, c) and (b, c) belong to WSPs (u1 , v1 ) and (u2 , v2 ) respectively. Note that Bccp(a, b) must be greater than Bccp(u1 , v1 ) and Bccp(u2 , v2 ). Since our algorithm adds the Bccp edges by order of their increasing distance, c and the points 73 in a will be connected before the Bccp edge between a and b is examined. The same is true for c and the points in b. This causes a and b to belong to the same connected component WHP, and thus, our filtering step will get rid of the well separated pair (a, b) before we need to compute its Bccp edge. Lemma A.0.3. WHP, the total running time of the UnionFind operation is O(α(n)n). Proof. Lemma A.0.2 shows that, WHP, we only need to compute Bccp distances of WSPs of constant size. Since we compute Bccp distances incrementally, WHP, the number of calls to the GeoFilterKruskal procedure is also bounded above by O(1). In each of such calls, the Filter function is called once, which in turn calls the Find(u) function of the UnionFind data structure O(n) times. Hence, there are in total O(n) Find(u) operations done WHP. Thus the overall running time of the Union() and Find() operations is O(α(n)n) WHP. Theorem A.0.4. GeoFilterKruskaltakes one sort plus O(n) additional steps, WHP, to compute the GMST. Proof. We partition the list of well separated pairs twice in the GeoFilterKruskal method. The first time we do it based on the need to compute the Bccp of the well separated pair. We have the sets El and Eu in the process. This takes O(n) time except for the Bccp computation. In O(n) time we can find the pivot element of Eu for the next partition. This partitioning also takes linear time. From Lemma A.0.2, we can infer that the recursive call on GeoFilterKruskal is performed O(1) times WHP. Thus the total time spent in partitioning is O(n) WHP. Since the total number of Bccp edges required to compute the GMST is O(n), by Lemma A.0.2, the time spent in computing all such edges is O(n) WHP. 
Total time spent in sorting the edges in the base case is O(n log n). Including the time to compute the Morton order sort for the WSPD, the total running time of the algorithm is one sort plus O(n) additional steps WHP. 74 BIBLIOGRAPHY [1] Pankaj K. Agarwal, Herbert Edelsbrunner, Otfried Schwarzkopf, and Emo Welzl. Euclidean minimum spanning trees and bichromatic closest pairs. In Proceedings of the sixth annual symposium on Computational geometry, SCG ’90, pages 203–210, New York, NY, USA, 1990. ACM. [2] Pankaj K. Agarwal, Herbert Edelsbrunner, Otfried Schwarzkopf, and Emo Welzl. Euclidean minimum spanning trees and bichromatic closest pairs. Discrete Comput. Geom., 6(5):407–422, 1991. [3] Pankaj K. Agarwal, Cecilia Magdalena Procopiuc, and Kasturi R. Varadarajan. Approximation algorithms for k-line center. In Proceedings of the 10th Annual European Symposium on Algorithms, ESA ’02, pages 54–63, London, UK, UK, 2002. SpringerVerlag. [4] M. Alexa, J. Behr, D. Cohen-Or, S. Fleishman, D. Levin, and C. T. Silva. Point set surfaces. In VIS ’01: Proceedings of the conference on Visualization ’01, pages 21–28, Washington, DC, USA, 2001. IEEE Computer Society. [5] Sanjeev Arora. Polynomial time approximation schemes for euclidean traveling salesman and other geometric problems. J. ACM, 45(5):753–782, 1998. [6] S. Arya and D. Mount. Computational geometry: Proximity and location. In Dinesh Mehta and Sartaj Sahni, editors, Handbook of Data Structures and Applications, chapter 63, pages 63–1, 63–22. CRC Press, 2005. [7] S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Wu. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. J. ACM, 45:891–923, 1998. [8] Sunil Arya and Ho-Yam Addy Fu. Expected-case complexity of approximate nearest neighbor searching. SIAM J. Comput., 32:793–815, March 2003. [9] Sunil Arya, Theocharis Malamatos, David M. Mount, and Ka Chun Wong. Optimal expected-case planar point location. SIAM J. Comput., 37(2):584–610, 2007. [10] Franz Aurenhammer. Voronoi diagrams—a survey of a fundamental geometric data structure. ACM Comput. Surv., 23(3):345–405, 1991. [11] J.D. Barrow, S.P. Bhavsar, and D.H. Sonoda. Min-imal spanning trees, filaments and galaxy clustering. MNRAS, 216:17–35, Sept 1985. 75 [12] R. Baxter. Planar lattice gases with nearest-neighbor exclusion. Annals of Combinatorics, 3:191–203, 1999. 10.1007/BF01608783. [13] Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509–517, 1975. [14] Jon Louis Bentley, Bruce W. Weide, and Andrew C. Yao. Optimal expected-time algorithms for closest point problems. ACM Trans. Math. Softw., 6(4):563–580, 1980. [15] Stefan Berchtold, Christian Böhm, H. V. Jagadish, Hans-Peter Kriegel, and Jörg Sander. Independent quantization: An index compression technique for highdimensional data spaces. In IN ICDE, pages 577–588, 1999. [16] M. Bern. Approximate closest-point queries in high dimensions. Inf. Process. Lett., 45(2):95–99, 1993. [17] S. P. Bhavsar and R. J. Splinter. The superiority of the minimal spanning tree in percolation analyses of cosmological data sets. MNRAS, 282:1461–1466, Oct 1996. [18] Marcel Birn, Manuel Holtgrewe, Peter Sanders, and J. Singler. Simple and Fast Nearest Neighbor Search. In 2010 Proceedings of the Twelfth Workshop on Algorithm Engineering and Experiments, pages 43–54, 16 January 2010. [19] Jean-Daniel Boissonnat, Olivier Devillers, Monique Teillaud, and Mariette Yvinec. Triangulations in cgal (extended abstract). 
In SCG ’00: Proceedings of the sixteenth annual symposium on Computational geometry, pages 11–18, New York, NY, USA, 2000. ACM. [20] J.J. Brennan. Minimal spanning trees and partial sorting. Operations Research Letters, 1(3):138–141, 1982. [21] P. B. Callahan and S. R. Kosaraju. A decomposition of multidimensional point sets with applications to k-nearest-neighbors and n-body potential fields. J. ACM, 42(1):67–90, 1995. [22] Paul B. Callahan. Dealing with higher dimensions: the well-separated pair decomposition and its applications. PhD thesis, Johns Hopkins University, Baltimore, MD, USA, 1995. [23] T. M. Chan. Approximate nearest neighbor queries revisited. In SCG ’97: Proceedings of the thirteenth annual symposium on Computational geometry, pages 352–358, New York, NY, USA, 1997. ACM. [24] T. M. Chan. Manuscript: A minimalist’s implementation of an approximate nearest neighbor algorithm in fixed dimensions, 2006. [25] Timothy M. Chan. Well-separated pair decomposition in linear time? Inf. Process. Lett., 107(5):138–141, 2008. 76 [26] J. Chhugani, B. Purnomo, S. Krishnan, J. Cohen, S. Venkatasubramanian, and D. S. Johnson. vlod: High-fidelity walkthrough of large virtual environments. IEEE Transactions on Visualization and Computer Graphics, 11(1):35–47, 2005. [27] U. Clarenz, M. Rumpf, and A. Telea. Finite elements on point based surfaces. In Proc. EG Symposium of Point Based Graphics (SPBG 2004), pages 201–211, 2004. [28] K. L. Clarkson. Fast algorithms for the all nearest neighbors problem. In FOCS ’83: Proceedings of the Twenty-fourth Symposium on Foundations of Computer Science, Tucson, AZ, November 1983. Included in PhD Thesis. [29] K. L. Clarkson. Nearest-neighbor searching and metric space dimensions. In G. Shakhnarovich, T. Darrell, and P. Indyk, editors, Nearest-Neighbor Methods for Learning and Vision: Theory and Practice, pages 15–59. MIT Press, 2006. [30] Kenneth L. Clarkson. An algorithm for geometric minimum spanning trees requiring nearly linear expected time. Algorithmica, 4:461–469, 1989. Included in PhD Thesis. [31] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press and McGraw-Hill, 2001. [32] D. Cotting, T. Weyrich, M. Pauly, and M. Gross. Robust watermarking of pointsampled geometry. In SMI ’04: Proceedings of the Shape Modeling International 2004, pages 233–242, Washington, DC, USA, 2004. IEEE Computer Society. [33] L. Dagum and R. Menon. Openmp: an industry standard api for shared-memory programming. IEEE Computational Science and Engineering, 5(1):46–55, 1998. [34] Mark de Berg, Marc van Kreveld, Mark Overmars, and Otfried Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, second edition, 2000. [35] B. Delaunay. Sur la sphère vide. Otdelenie Matematicheskikh i Estestvennykh Nauk, 7:793—-800, 1934. [36] Olivier Devillers. The Delaunay Hierarchy. International Journal of Foundations of Computer Science, 13:163–180, 2002. [37] M. T. Dickerson and R. S. Drysdale. Fixed-radius near neighbors search algorithms for points and segments. Inf. Process. Lett., 35(5):269–273, 1990. [38] M. T. Dickerson and D. Eppstein. Algorithms for proximity problems in higher dimensions. Computational Geometry Theory & Applications, 5(5):277–291, January 1996. [39] R. A. Dwyer. Higher-dimensional voronoi diagrams in linear expected time. In SCG ’89: Proceedings of the fifth annual symposium on Computational geometry, pages 326–333, New York, NY, USA, 1989. ACM. 
77 [40] Herbert Edelsbrunner, Lionidas J Guibas, and Jorge Stolfi. Optimal point location in a monotone subdivision. SIAM J. Comput., 15(2):317–340, 1986. [41] Jeff Erickson. On the relative complexities of some geometric problems. In In Proc. 7th Canad. Conf. Comput. Geom, pages 85–90, 1995. [42] Jeff Erickson. Nice point sets can have nasty delaunay triangulations. In Proceedings of the seventeenth annual symposium on Computational geometry, SCG ’01, pages 96–105, New York, NY, USA, 2001. ACM. [43] Jeffrey Gordon Erickson. Lower bounds for fundamental geometric problems. PhD thesis, University of California, Berkeley, 1996. Chair-Seidel, Raimund. [44] R. A. Finkel and J. L. Bentley. Quad trees a data structure for retrieval on composite keys. Acta Informatica, 4(1):1–9, March 1974. [45] S. Fleishman, D. Cohen-Or, and C. T. Silva. Robust moving least-squares fitting with sharp features. ACM Trans. Graph., 24(3):544–552, 2005. [46] Jerome H. Friedman, Jon Louis Bentley, and Raphael Ari Finkel. An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw., 3:209– 226, September 1977. [47] K. Fukunage and P. M. Narendra. A branch and bound algorithm for computing k-nearest neighbors. IEEE Trans. Comput., 24:750–753, July 1975. [48] Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Similarity search in high dimensions via hashing. In Proceedings of the 25th International Conference on Very Large Data Bases, VLDB ’99, pages 518–529, San Francisco, CA, USA, 1999. Morgan Kaufmann Publishers Inc. [49] John Iacono. Optimal planar point location. In SODA ’01: Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, pages 340–341, Philadelphia, PA, USA, 2001. Society for Industrial and Applied Mathematics. [50] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, STOC ’98, pages 604–613, New York, NY, USA, 1998. ACM. [51] D. R. Karger and M. Ruhl. Finding nearest neighbors in growth-restricted metrics. In STOC ’02: Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, pages 741–750, New York, NY, USA, 2002. ACM. [52] David G. Kirkpatrick. Optimal search in planar subdivisions. SIAM Journal on Computing, 12(1):28–35, 1983. [53] Philip N. Klein and Robert E. Tarjan. A randomized linear-time algorithm for finding minimum spanning trees. In STOC ’94: Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, pages 9–15, New York, NY, USA, 1994. ACM. 78 [54] E. Kranakis, H. Singh, and J. Urrutia. Compass routing on geometric networks. In Proc. of 11th Canadian Conference on Computational Geometry, pages 51–54, 1999. [55] J.B. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. In Proc. American Math. Society, pages 7–48, 1956. [56] Joseph B. Kruskal. On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. Proceedings of the American Mathematical Society, 7(1):48–50, February 1956. [57] Pankaj Kumar and Piyush Kumar. Almost optimal solutions to k-clustering problems. Int. J. Comput. Geometry Appl., 20(4):431–447, 2010. [58] Elmar Langetepe and Gabriel Zachmann. Geometric Data Structures for Computer Graphics. A. K. Peters, Ltd., Natick, MA, USA, 2006. [59] Swanwa Liao, Mario A. Lopez, and Scott T. Leutenegger. High dimensional similarity search with space filling curves. 
In Proceedings of the 17th International Conference on Data Engineering, pages 615–622, Washington, DC, USA, 2001. IEEE Computer Society. [60] David G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60:91–110, November 2004. [61] Songrit Maneewongvatana and David M. Mount. Analysis of approximate nearest neighbor searching with clustered point sets, 1999. [62] Robert Mencl. A graph based approach to surface reconstruction. Computer Graphics Forum, 14:445 – 456, 2008. [63] A. Mhatre and P. Kumar. Projective clustering and its application to surface reconstruction: extended abstract. In SCG ’06: Proceedings of the twenty-second annual symposium on Computational geometry, pages 477–478, New York, NY, USA, 2006. ACM. [64] Krystian Mikolajczyk and Jiri Matas. Improving Descriptors for Fast Tree Matching by Optimal Linear Projection. In Proceedings of IEEE International Conference on Computer Vision, pages 1–8, 2007. [65] Stanley Milgram. The small world problem. Psychology Today, 1(1):60–67, 1967. [66] N. J. Mitra and A. Nguyen. Estimating surface normals in noisy point cloud data. In SCG ’03: Proceedings of the nineteenth annual symposium on Computational geometry, pages 322–328, New York, NY, USA, 2003. ACM. [67] G. M. Morton. A computer oriented geodetic data base and a new technique in file sequencing. In Technical Report,IBM Ltd, 1966. [68] Rajeev Motwani and Prabhakar Raghavan. Randomized algorithms. ACM Comput. Surv., 28(1):33–37, 1996. 79 [69] D. Mount. ANN: Library for Approximate Nearest Neighbor Searching, 1998. http: //www.cs.umd.edu/~mount/ANN/. [70] David Mount and Sunil Arya. Ann: A library for approximate nearest neighbor searching. 1997. [71] Marius Muja and David G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. In In VISAPP International Conference on Computer Vision Theory and Applications, pages 331–340, 2009. [72] K. Mulmuley. A fast planar partition algorithm, ii. In SCG ’89: Proceedings of the fifth annual symposium on Computational geometry, pages 33–43, New York, NY, USA, 1989. ACM. [73] Giri Narasimhan and Martin Zachariasen. Geometric minimum spanning trees via well-separated pair decompositions. J. Exp. Algorithmics, 6:6, 2001. [74] David Nistér and Henrik Stewénius. Scalable recognition with a vocabulary tree. In IN CVPR, pages 2161–2168, 2006. [75] D. Omercevic, O. Drbohlav, and A. Leonardis. High-dimensional feature matching: Employing the concept of meaningful nearest neighbors. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1 –8, 2007. [76] J. A. Orenstein and T. H. Merrett. A class of data structures for associative searching. In PODS ’84: Proceedings of the 3rd ACM SIGACT-SIGMOD symposium on Principles of database systems, pages 181–190, New York, NY, USA, 1984. ACM. [77] Vitaly Osipov, Peter Sanders, and Johannes Singler. The filter-kruskal minimum spanning tree algorithm. In Irene Finocchi and John Hershberger, editors, ALENEX, pages 52–61. SIAM, 2009. [78] R. Pajarola. Stream-processing points. In Proceedings IEEE Visualization, 2005, Online., pages 239–246. Computer Society Press, 2005. [79] M. Pauly, M. Gross, and L. P. Kobbelt. Efficient simplification of point-sampled surfaces. In VIS ’02: Proceedings of the conference on Visualization ’02, pages 163– 170, Washington, DC, USA, 2002. IEEE Computer Society. [80] M. Pauly, R. Keiser, L. P. Kobbelt, and M. Gross. Shape modeling with point-sampled geometry. ACM Trans. Graph., 22(3):641–650, 2003. 
[81] K. Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(6):559–572, 1901. [82] Franco P. Preparata and Michael I. Shamos. Computational geometry: an introduction. Springer-Verlag New York, Inc., New York, NY, USA, 1985. 80 [83] Felix Putze, Peter Sanders, and Johannes Singler. Mcstl: the multi-core standard template library. In PPoPP ’07: Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, pages 144–145, New York, NY, USA, 2007. ACM. [84] Sanguthevar Rajasekaran. On the euclidean minimum spanning tree problem. Computing Letters, 1(1), 2004. [85] Hanan Samet. Applications of spatial data structures: Computer graphics, image processing, and GIS. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1990. [86] J. Sankaranarayanan, H. Samet, and A. Varshney. A fast all nearest neighbor algorithm for applications involving large point-clouds. Comput. Graph., 31(2):157–174, 2007. [87] Cordelia Schmid and Roger Mohr. Local grayvalue invariants for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19:530–535, 1997. [88] Raimund Seidel. A simple and fast incremental randomized algorithm for computing trapezoidal decompositions and for triangulating polygons. Comput. Geom. Theory Appl., 1(1):51–64, 1991. [89] J. S. Semura and D. L. Huber. Low-temperature behavior of the planar heisenberg ferromagnet. Phys. Rev. B, 7(5):2154–2162, Mar 1973. [90] Michael Ian Shamos and Dan Hoey. Closest-point problems. In SFCS ’75: Proceedings of the 16th Annual Symposium on Foundations of Computer Science, pages 151–162, Washington, DC, USA, 1975. IEEE Computer Society. [91] Alok Sharma and Kuldip K. Paliwal. Fast principal component analysis using fixedpoint algorithm. Pattern Recogn. Lett., 28:1151–1155, July 2007. [92] Jonathan Richard Shewchuk. Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator. In Ming C. Lin and Dinesh Manocha, editors, Applied Computational Geometry: Towards Geometric Engineering, volume 1148 of Lecture Notes in Computer Science, pages 203–222. Springer-Verlag, May 1996. From the First ACM Workshop on Applied Computational Geometry. [93] C. Silpa-Anan and R. Hartley. Optimised KD-trees for fast image descriptor matching. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8, June 2008. [94] Peter Su and Robert L. Scot Drysdale. A comparison of sequential delaunay triangulation algorithms. In SCG ’95: Proceedings of the eleventh annual symposium on Computational geometry, pages 61–70, New York, NY, USA, 1995. ACM. [95] Yufei Tao, Ke Yi, Cheng Sheng, and Panos Kalnis. Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Trans. Database Syst., 35:20:1–20:46, July 2010. 81 [96] H. Tropf and H. Herzog. Multidimensional range search in dynamically balanced trees. Angewandte Informatik, 2:71–77, 1981. [97] P. Tsigas and Y. Zhang. A simple, fast parallel implementation of quicksort and its performance evaluation on SUN Enterprise 10000. Eleventh Euromicro Conference on Parallel, Distributed and Network-Based Processing, 00:372, 2003. [98] P. M. Vaidya. An O(n log n) algorithm for the all-nearest-neighbors problem. Discrete Comput. Geom., 4(2):101–115, 1989. [99] Roger Weber, Hans-Jörg Schek, and Stephen Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. 
In Proceedings of the 24rd International Conference on Very Large Data Bases, VLDB ’98, pages 194–205, San Francisco, CA, USA, 1998. Morgan Kaufmann Publishers Inc. [100] C.T. Zahn. Graph-theoretical methods for detecting and describing gestalt clusters. Transactions on Computers, C-20(1):68–86, 1971. 82 BIOGRAPHICAL SKETCH Michael Connor began study at Florida State University in 2002, in the Department of Computer Science. He was awarded the degree of Bachelor of Science from the computer science program in 2004. He was admitted to the graduate program, and continued his study in the department. Under the advisement of Dr. Piyush Kumar, he was awarded the degree of Master of Science in 2007. 83