Nearest Neighbor Queries using R-trees
Based on notes by Yufei Tao
CS4482 CityU of HK

Nearest Neighbor Search

Find the object nearest to a query point q. E.g., find the gas station nearest to the red point.

k nearest neighbors: find the k objects nearest to q. E.g., 1NN = {h}, 2NN = {h, a}, 3NN = {h, a, i}.

[Figure: points a to m and the query point plotted on x and y axes running from 0 to 10.]

Nearest Neighbor Processing

The R-tree can accelerate NN search, too.

Concept: mindist(q, E), the minimum distance between a point q and a rectangle E.

[Figure: mindist(q, E6) and mindist(q, E7) drawn from the query to the rectangles E6 and E7, next to the R-tree. The root holds E6 and E7; the child node of E6 holds E1, E2, E3; the child node of E7 holds E4, E5; the leaf nodes hold the points a to m.]

Depth-first NN Algorithm

First load the root and compute the mindist from each entry to the query. Visit the child of the entry with the smallest mindist. In this case: E6.

Do this recursively at the next level. In the child node of E6, compute the mindist from every entry to the query, and visit the child node of the entry having the smallest mindist. In this case, E1 and E2 have the same mindist, so the decision is random; say, E1 first. Among all the points in the child node of E1, find the closest point, a (our current result).

Then backtrack to the child node of E6, where the entry with the next mindist value is E2. Its mindist, √5, is however the same as the distance from q to a. So we know that no point in E2 can possibly be closer to q than a. No result in E3 either, by the same reasoning.
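The pruning decisions above rest entirely on mindist, so a small sketch of how it could be computed may help. This is a minimal illustration, assuming 2D points given as (x, y) and rectangles given as (xmin, ymin, xmax, ymax); the function names mindist and can_prune and all coordinates are hypothetical and are not taken from the slides.

```python
import math

def mindist(q, rect):
    """Minimum Euclidean distance from point q = (x, y) to an axis-aligned
    rectangle rect = (xmin, ymin, xmax, ymax); 0.0 if q lies inside it."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - q[0], 0.0, q[0] - xmax)  # horizontal gap, 0 if x is in range
    dy = max(ymin - q[1], 0.0, q[1] - ymax)  # vertical gap, 0 if y is in range
    return math.hypot(dx, dy)

def can_prune(q, rect, best_dist):
    """True when no point inside rect can be strictly closer to q than best_dist."""
    return mindist(q, rect) >= best_dist

# Hypothetical query and rectangles (not the coordinates from the slides).
q = (5, 3)
print(mindist(q, (1, 1, 4, 7)))                  # q is level with the rectangle
print(mindist(q, (6, 4, 9, 8)))                  # nearest spot is the corner (6, 4)
print(can_prune(q, (1, 6, 4, 7), math.sqrt(5)))  # far rectangle, close current NN
```

The comparison uses >= so that a rectangle whose mindist merely ties the current best distance is skipped, which mirrors the treatment of E2 in the walk-through.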
We now backtrack to the root, where the entry with the next mindist is E7. Its mindist, √2, is smaller than the distance √5 from q to a. Thus, its subtree may contain some point whose distance to q is smaller than the distance between q and a, so we have to visit it. At the child node of E7, compute the mindist of all entries to q. E4 will be descended next.

In the child node of E4, we find a point h that is closer to q than a. So h becomes our new nearest neighbor. We backtrack to the child node of E7, where the entry with the next mindist is E5. E5's mindist, √13, is larger than the distance √2 from q to h, so we prune its subtree. The algorithm backtracks to the root and terminates.

Visited (in this order): the root, and the child nodes of E6, E1, E7, E4.

Another Depth-first Example: 2NN

Difference: entries must be pruned based on their distances to our current 2nd NN.

Root => child node of E6 => child node of E1 => find {a, b} here.
Backtrack to child node of E6 => child node of E2 (its mindist < dist(q, b)) => update our result to {a, f}.
Backtrack to child node of E6 => child node of E3 => backtrack to the root => child node of E7 => child node of E4 => update our result to {a, h}.
Backtrack to child node of E7 => prune E5 => backtrack to the root => end.

Optimal Performance of kNN Search

What's the best performance that can ever be achieved for a kNN search?

Vicinity circle: centered at the query q, with radius equal to the distance from q to its k-th NN.

All nodes that intersect the vicinity circle must be visited.
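To make the vicinity-circle bound concrete, the sketch below runs a depth-first k-NN search of the kind traced earlier and then lists the nodes whose rectangles meet the vicinity circle. It is only an illustration under stated assumptions: the tree is a hypothetical nested structure (a dict of named entries for an internal node, a list of points for a leaf), the coordinates are invented rather than read off the slides, and bounding rectangles are recomputed on the fly instead of being stored in the entries as a real R-tree would do.

```python
import math

def mindist(q, rect):
    """Minimum distance from point q to rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)

def mbr(node):
    """Bounding rectangle of a node (leaf = list of points, internal = dict)."""
    if isinstance(node, list):
        xs, ys = zip(*node)
        return (min(xs), min(ys), max(xs), max(ys))
    boxes = [mbr(child) for child in node.values()]
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def depth_first_knn(node, q, k, best, visited, name="root"):
    """Depth-first k-NN; best keeps up to k (distance, point) pairs, sorted."""
    visited.append(name)
    if isinstance(node, list):               # leaf: try every point
        best.extend((math.dist(q, p), p) for p in node)
        best.sort()
        del best[k:]                         # keep only the k closest
        return
    # Internal node: handle entries in ascending order of mindist.
    ranked = sorted((mindist(q, mbr(child)), entry) for entry, child in node.items())
    for d, entry in ranked:
        if len(best) == k and d >= best[-1][0]:
            break                            # prune this entry and all later ones
        depth_first_knn(node[entry], q, k, best, visited, entry)

def nodes_meeting_circle(node, q, radius, name="root", eps=1e-9):
    """Names of all nodes whose rectangle intersects the vicinity circle."""
    hit = [name] if mindist(q, mbr(node)) <= radius + eps else []
    if isinstance(node, dict):
        for entry, child in node.items():
            hit += nodes_meeting_circle(child, q, radius, entry, eps)
    return hit

# Hypothetical data: each node is named after the entry that points to it.
tree = {
    "E6": {"E1": [(3, 2), (2, 1)], "E2": [(1, 6), (4, 7)]},
    "E7": {"E4": [(6, 4), (7, 5)], "E5": [(8, 8), (9, 7)]},
}
q, k = (5, 3), 1
best, visited = [], []
depth_first_knn(tree, q, k, best, visited)
radius = best[-1][0]                         # distance from q to the k-th NN
required = nodes_meeting_circle(tree, q, radius)
print(visited)    # ['root', 'E6', 'E1', 'E7', 'E4']
print(required)   # ['root', 'E6', 'E7', 'E4']
```

On this toy data the depth-first search reads one node (E1) that lies outside the vicinity circle, while every node the circle touches is among those it reads.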
For example, the child node of E6 must be accessed by any algorithm. Although there's no result in its subtree, this cannot be verified unless we visit it!

Best-first Algorithm (optimal algorithm)

BF maintains all the (leaf and non-leaf) entries seen so far in memory, sorted in ascending order of their mindist. Each step processes the entry in memory with the smallest mindist.

Action: visit root
Memory: E6 (√1), E7 (√2)
Result: {empty}

Insert all the entries in the child node of E6 into the sorted list. E7 is the next one to be processed.

Action: follow E6
Memory: E7 (√2), E1 (√5), E2 (√5), E3 (√9)
Result: {empty}

Insert all the entries in the child node of E7 into the sorted list. The next entry to be processed is E4.

Action: follow E7
Memory: E4 (√2), E1 (√5), E2 (√5), E3 (√9), E5 (√13)
Result: {empty}

Insert all the entries in the child node of E4 into the sorted list. The next entry to be processed is h, which is a leaf entry. This is the first NN of q.

Action: follow E4
Memory: h (√2), E1 (√5), E2 (√5), E3 (√9), i (√10), E5 (√13), g (√13)
Result: {empty}

Best-first Algorithm: 2NN

Assume we want 2 NNs; then, the algorithm continues.
Report h as the 1st NN, and remove it from the heap. The next entry to be processed is E1.

Action: remove h
Memory: E1 (√5), E2 (√5), E3 (√9), i (√10), E5 (√13), g (√13)
Result: {h}

Visit the child node of E1 and insert all its entries into the sorted list. The next entry is a, which is a leaf entry, so a is the 2nd NN and the algorithm terminates. Whenever we process a leaf entry in memory, it is the next NN for sure.

Action: follow E1
Memory: a (√5), E2 (√5), E3 (√9), i (√10), E5 (√13), g (√13), b (√13), c (√18)
Result: {h}

Best-first = Best Performance

To find the 1st NN, we visited the root and the child nodes of E6, E7, E4. To find the 2nd, in addition to the above nodes, we also visited the child node of E1. Both cases are optimal.

It can be proved that BF visits the nodes in the tree in ascending order of their mindist to the query point.

Retrospect: The Rationale Behind

What is the main reasoning behind the depth-first and best-first algorithms?

Use mindist to quantify the quality of the best point in a subtree: no point in the subtree can be closer to q than the node's mindist.

If a node's mindist is already greater than the distance from q to our current result, prune it.
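The best-first procedure can be sketched with a priority queue playing the role of the sorted list in memory. This is a minimal illustration rather than a definitive implementation: the tree is a hypothetical nested structure (a dict of named entries for an internal node, a list of points for a leaf), the coordinates are invented, bounding rectangles are recomputed instead of stored, and the helper names are assumptions. Ties are resolved in favor of points, so a leaf entry is reported before an equally distant node is opened, as with a and E2 above.

```python
import heapq
import itertools
import math

def mindist(q, rect):
    """Minimum distance from point q to rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)

def mbr(node):
    """Bounding rectangle of a node (leaf = list of points, internal = dict)."""
    if isinstance(node, list):
        xs, ys = zip(*node)
        return (min(xs), min(ys), max(xs), max(ys))
    boxes = [mbr(child) for child in node.values()]
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def best_first_knn(tree, q, k):
    """Return (neighbors, visited): the k nearest points in ascending distance
    and the names of the nodes read, in the order they were read."""
    seq = itertools.count()                  # unique tie-breaker for the heap
    # Heap items: (distance, kind, seq, name, payload); kind 0 = point, 1 = node.
    heap = [(0.0, 1, next(seq), "root", tree)]
    neighbors, visited = [], []
    while heap and len(neighbors) < k:
        _, kind, _, name, payload = heapq.heappop(heap)
        if kind == 0:                        # leaf entry: the next NN for sure
            neighbors.append(payload)
            continue
        visited.append(name)                 # node entry: read its child node
        if isinstance(payload, list):        # leaf node: enqueue its points
            for p in payload:
                heapq.heappush(heap, (math.dist(q, p), 0, next(seq), "", p))
        else:                                # internal node: enqueue its entries
            for entry, child in payload.items():
                heapq.heappush(
                    heap, (mindist(q, mbr(child)), 1, next(seq), entry, child))
    return neighbors, visited

# Hypothetical data: each node is named after the entry that points to it.
tree = {
    "E6": {"E1": [(3, 2), (2, 1)], "E2": [(1, 6), (4, 7)]},
    "E7": {"E4": [(6, 4), (7, 5)], "E5": [(8, 8), (9, 7)]},
}
print(best_first_knn(tree, (5, 3), 1))  # ([(6, 4)], ['root', 'E6', 'E7', 'E4'])
print(best_first_knn(tree, (5, 3), 2))
```

With k = 2 the same call additionally reads E1 and returns (3, 2) as the second neighbor. Because the loop stops as soon as k points have been popped, no node with a mindist beyond the k-th neighbor's distance is ever read.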