slides2 - Cs.ucr.edu

Nearest Neighbor Queries using R-trees
Based on notes by Yufei Tao
Nearest Neighbor Search
• Find the object nearest to a query point q.
  • E.g., find the gas station nearest to the query point.
• k nearest neighbors (kNN): find the k objects nearest to q.
  • E.g., 1NN = {h}, 2NN = {h, a}, 3NN = {h, a, i}.
[Figure: data points a to m and the query point; x and y axes from 0 to 10]
CS4482 CityU of HK
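To make the definition concrete, here is a minimal Python sketch of kNN by brute force. It scans every point and keeps the k closest, which is the baseline an index such as the R-tree is meant to beat. The names p1 to p4 and their coordinates are hypothetical, because the figure's coordinates are not recoverable. `knn_brute_force` is an illustrative helper and not part of any library.

```python
import heapq
import math

Point = tuple[float, float]


def knn_brute_force(q: Point, points: dict[str, Point], k: int) -> list[str]:
    """Names of the k points nearest to q, closest first (linear scan)."""
    # heapq.nsmallest keeps the k smallest keys without sorting everything;
    # iterating a dict yields its keys, i.e. the point names.
    return heapq.nsmallest(k, points, key=lambda name: math.dist(q, points[name]))


# Hypothetical data (not the coordinates from the figure).
points = {"p1": (1, 1), "p2": (4, 5), "p3": (6, 2), "p4": (9, 9)}
print(knn_brute_force((5, 3), points, 1))  # ['p3']
print(knn_brute_force((5, 3), points, 2))  # ['p3', 'p2']
```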
Nearest Neighbor Processing
• The R-tree can accelerate NN search, too.
• Key concept: mindist(q, E)
  • The minimum distance between a point q and a rectangle E (0 if q lies inside E).
[Figure: R-tree over points a to m; the root holds E6 and E7, E6 holds E1, E2, E3, and E7 holds E4, E5; mindist(q, E6) and mindist(q, E7) are drawn from the query point]
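Below is a minimal Python sketch of mindist for an axis-aligned rectangle. It assumes each rectangle is stored as an (xmin, ymin, xmax, ymax) tuple, a representation chosen here only for illustration. On each axis, the gap is zero when q's coordinate falls inside the rectangle's extent; otherwise it is the distance to the nearer edge. mindist is the Euclidean length of the two gaps.

```python
import math

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mindist(q: Point, e: Rect) -> float:
    """Minimum distance from point q to rectangle e (0 if q is inside e)."""
    qx, qy = q
    xmin, ymin, xmax, ymax = e
    # Per-axis gap: positive only when q lies outside the rectangle's extent.
    dx = max(xmin - qx, 0.0, qx - xmax)
    dy = max(ymin - qy, 0.0, qy - ymax)
    return math.hypot(dx, dy)


print(mindist((2, 2), (1, 1, 3, 3)))  # 0.0, q is inside
print(mindist((5, 2), (1, 1, 3, 3)))  # 2.0, q is to the right of the rectangle
print(mindist((0, 0), (1, 1, 3, 3)))  # about 1.414 (sqrt(2)), nearest corner (1, 1)
```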
Depth-first NN Algorithm
• First load the root and compute the mindist from each of its entries to the query.
• Visit the child node of the entry with the smallest mindist.
  • In this case: E6.
[Figure: R-tree (root: E6, E7; E6: E1, E2, E3; E7: E4, E5) and its rectangles over points a to m, with the query point]
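Each step of the depth-first search comes down to ordering a node's entries by mindist. The sketch below assumes rectangles are (xmin, ymin, xmax, ymax) tuples. The entry names A, B, C and their coordinates are hypothetical. Python's `sorted()` is stable, so entries with equal mindist keep their input order. That is one arbitrary but deterministic way to settle a tie like the one between E1 and E2 on the next slide.

```python
import math

Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mindist(q: tuple[float, float], r: Rect) -> float:
    dx = max(r[0] - q[0], 0.0, q[0] - r[2])
    dy = max(r[1] - q[1], 0.0, q[1] - r[3])
    return math.hypot(dx, dy)


def visit_order(q: tuple[float, float], entries: dict[str, Rect]) -> list[str]:
    """Entry names in ascending mindist order (the order a depth-first search descends)."""
    # sorted() is stable: entries with equal mindist keep their input order.
    return sorted(entries, key=lambda name: mindist(q, entries[name]))


# Hypothetical entries of one node.
entries = {"A": (6, 6, 9, 9), "B": (0, 4, 3, 7), "C": (4, 0, 8, 3)}
print(visit_order((5, 5), entries))  # ['A', 'B', 'C'] (B and C tie at mindist 2)
```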
Depth-first NN Algorithm (cont.)
• Do this recursively at the next level: in the child node of E6, compute the mindist from every entry to the query.
• Visit the child node of the entry with the smallest mindist.
  • In this case, E1 and E2 have the same mindist.
  • So the choice is arbitrary; say, E1 first.
• Among all the points in the child node of E1, find the one closest to q, which is a (our current result).
[Figure: the R-tree and data points from the previous slide, with the query point]
Depth-first NN Algorithm (cont.)
• Then backtrack to the child node of E6, where the entry with the next smallest mindist is E2.
• However, its mindist √5 is the same as the distance from q to a.
  • So we know that no point in E2 can be closer to q than a.
• E3 cannot contain a closer point either, for the same reason.
[Figure: the R-tree and data points from the previous slide, with the query point]
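The pruning test can be done on squared distances, which avoids square roots because squaring preserves the order of non-negative numbers. The numbers below come from this walkthrough: dist(q, a) = √5, E2's mindist √5 and E7's mindist √2. E3's squared mindist, 9, is taken from the Memory column of the best-first slides. `can_prune` is a hypothetical helper name. Equality counts as prunable, matching the decision for E2.

```python
def can_prune(mindist_sq: float, best_dist_sq: float) -> bool:
    """True if no point under the entry can be strictly closer than the current NN.

    Both arguments are squared distances.
    """
    return mindist_sq >= best_dist_sq


# Squared values from the slides; the current NN is a, with dist(q, a)^2 = 5.
print(can_prune(5, 5))  # True:  E2 (mindist sqrt(5)) is pruned
print(can_prune(9, 5))  # True:  E3 (mindist 3) is pruned
print(can_prune(2, 5))  # False: E7 (mindist sqrt(2)) must be visited
```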
Depth-first NN Algorithm (cont.)
• We now backtrack to the root, where the entry with the next smallest mindist is E7.
• Its mindist √2 is smaller than the distance √5 from q to a.
  • Thus its subtree may contain a point closer to q than a, so we have to visit it.
• At the child node of E7, compute the mindist of every entry to q.
  • E4, with the smallest mindist, is descended next.
[Figure: the R-tree and data points from the previous slide, with the query point]
Depth-first NN Algorithm (cont.)
• In the child node of E4, we find a point h that is closer to q than a.
  • So h becomes our new nearest neighbor.
• We backtrack to the child node of E7, where the entry with the next smallest mindist is E5.
• E5's mindist √13 is larger than the distance √2 from q to h, so we prune its subtree.
• The algorithm backtracks to the root and terminates.
• Nodes visited, in order: the root, then the child nodes of E6, E1, E7, and E4.
[Figure: the R-tree and data points from the previous slide, with the query point; points g, h, i lie in E4]
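The walkthrough can be put together as a minimal recursive sketch of depth-first 1NN search. The `Node` class, the helper names and the tree below (points p1 to p8) are hypothetical. A real R-tree stores each child's rectangle in its parent entry and maintains it on insertion; this sketch simply computes it once when a node is built. Children are visited in ascending mindist order. The loop stops at the first child whose mindist is not smaller than the current best distance, because every later child has an even larger mindist.

```python
from __future__ import annotations

import math

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


class Node:
    """Hypothetical R-tree node: a leaf holds named points, an internal node holds children."""

    def __init__(self, points: dict[str, Point] | None = None,
                 children: list[Node] | None = None) -> None:
        self.points = points or {}
        self.children = children or []
        xs = [x for x, _ in self.points.values()]
        ys = [y for _, y in self.points.values()]
        for c in self.children:
            xs += [c.rect[0], c.rect[2]]
            ys += [c.rect[1], c.rect[3]]
        self.rect = (min(xs), min(ys), max(xs), max(ys))  # MBR of the subtree


def mindist(q: Point, r: Rect) -> float:
    dx = max(r[0] - q[0], 0.0, q[0] - r[2])
    dy = max(r[1] - q[1], 0.0, q[1] - r[3])
    return math.hypot(dx, dy)


def depth_first_nn(root: Node, q: Point) -> tuple[str | None, float]:
    best_name: str | None = None
    best_dist = math.inf

    def visit(node: Node) -> None:
        nonlocal best_name, best_dist
        if node.points:  # leaf: compare each point with the current result
            for name, p in node.points.items():
                d = math.dist(q, p)
                if d < best_dist:
                    best_name, best_dist = name, d
            return
        # Descend into children in ascending mindist order; backtracking
        # returns here, and later children are checked against the updated best.
        for child in sorted(node.children, key=lambda c: mindist(q, c.rect)):
            if mindist(q, child.rect) >= best_dist:
                break  # this child and all later ones cannot hold a closer point
            visit(child)

    visit(root)
    return best_name, best_dist


# Hypothetical data: four leaves under two internal nodes.
root = Node(children=[
    Node(children=[Node(points={"p1": (1, 1), "p2": (2, 3)}),
                   Node(points={"p3": (3, 8), "p4": (2, 9)})]),
    Node(children=[Node(points={"p5": (7, 2), "p6": (8, 1)}),
                   Node(points={"p7": (8, 8), "p8": (9, 7)})]),
])
print(depth_first_nn(root, (6, 4)))  # ('p5', about 2.236, i.e. sqrt(5))
```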
Another Depth-first Example: 2 NN
• Difference: an entry is pruned by comparing its mindist with the distance from q to our current 2nd NN.
• Root => child node of E6 => child node of E1 => find {a, b} here.
• Backtrack to child node of E6 => child node of E2 (its mindist < dist(q, b)) => update our result to {a, f}.
• Backtrack to child node of E6 => child node of E3 => backtrack to the root => child node of E7 => child node of E4 => update our result to {a, h}.
• Backtrack to child node of E7 => prune E5 => backtrack to the root => end.
[Figure: the R-tree and data points from the previous slide, with the query point]
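For kNN, the only change is the pruning bound, which becomes the distance to the current k-th NN. In the minimal sketch below, the `Node` class, the helper names and the data p1 to p8 are hypothetical. The k best candidates are kept in a max-heap, built from Python's `heapq` (a min-heap) by negating distances, so the current k-th NN is always at `best[0]`.

```python
from __future__ import annotations

import heapq
import math

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


class Node:
    """Hypothetical R-tree node: a leaf holds named points, an internal node holds children."""

    def __init__(self, points: dict[str, Point] | None = None,
                 children: list[Node] | None = None) -> None:
        self.points = points or {}
        self.children = children or []
        xs = [x for x, _ in self.points.values()]
        ys = [y for _, y in self.points.values()]
        for c in self.children:
            xs += [c.rect[0], c.rect[2]]
            ys += [c.rect[1], c.rect[3]]
        self.rect = (min(xs), min(ys), max(xs), max(ys))  # MBR of the subtree


def mindist(q: Point, r: Rect) -> float:
    dx = max(r[0] - q[0], 0.0, q[0] - r[2])
    dy = max(r[1] - q[1], 0.0, q[1] - r[3])
    return math.hypot(dx, dy)


def depth_first_knn(root: Node, q: Point, k: int) -> list[str]:
    best: list[tuple[float, str]] = []  # max-heap of (-distance, name), size <= k

    def kth_dist() -> float:
        # Pruning bound: distance to the current k-th NN (infinite until k are found).
        return -best[0][0] if len(best) == k else math.inf

    def visit(node: Node) -> None:
        if node.points:
            for name, p in node.points.items():
                d = math.dist(q, p)
                if len(best) < k:
                    heapq.heappush(best, (-d, name))
                elif d < kth_dist():
                    heapq.heapreplace(best, (-d, name))  # evict the current k-th NN
            return
        for child in sorted(node.children, key=lambda c: mindist(q, c.rect)):
            if mindist(q, child.rect) >= kth_dist():
                break  # compared against the k-th NN, not the 1st
            visit(child)

    visit(root)
    return [name for _, name in sorted(best, reverse=True)]  # closest first


root = Node(children=[
    Node(children=[Node(points={"p1": (1, 1), "p2": (2, 3)}),
                   Node(points={"p3": (3, 8), "p4": (2, 9)})]),
    Node(children=[Node(points={"p5": (7, 2), "p6": (8, 1)}),
                   Node(points={"p7": (8, 8), "p8": (9, 7)})]),
])
print(depth_first_knn(root, (6, 4), 2))  # ['p5', 'p6']
```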
Optimal Performance of kNN Search
• What is the best performance any kNN algorithm can ever achieve?
• Vicinity circle: centered at the query q, with radius equal to the distance from q to its k-th NN.
• All nodes whose rectangles intersect the vicinity circle must be visited.
  • For 1NN, the child node of E6 must be accessed by any algorithm.
  • Although its subtree contains no result, this cannot be verified unless we visit it!
[Figure: the R-tree and data points from the previous slide, with the query point]
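The vicinity-circle argument can be phrased as a single comparison: a node must be read if its mindist is smaller than r_k. In that case a point closer than the k-th NN could be stored under it. The sketch below uses squared values. For 1NN the radius is dist(q, h) = √2. The squared mindist values are taken from the Memory column of the best-first slides. E7 and E4 are left out because their mindist is exactly √2, so they only touch the circle; they are visited anyway because h lies in them. `must_visit` is a hypothetical helper name.

```python
def must_visit(mindist_sq: float, rk_sq: float) -> bool:
    """True if the node reaches inside the vicinity circle (mindist < r_k).

    Both arguments are squared distances; rk_sq is the squared distance
    from q to its k-th NN.
    """
    return mindist_sq < rk_sq


# 1NN on the slides: r_1^2 = dist(q, h)^2 = 2.
entries = {"E6": 1, "E1": 5, "E2": 5, "E3": 9, "E5": 13}
print([name for name, m in entries.items() if must_visit(m, 2)])  # ['E6']
```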
Best-first Algorithm (optimal algorithm)
• BF keeps all the (leaf and non-leaf) entries seen so far in memory, sorted in ascending order of their mindist.
• Each step processes the entry in memory with the smallest mindist.
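Action: visit root
Memory: E6 1, E7 2
Result: {empty}
(Memory lists each entry with its squared mindist to q; e.g., E7 2 means mindist(q, E7) = √2.)
[Figure: R-tree (root: E6, E7; E6: E1, E2, E3; E7: E4, E5) and its rectangles over points a to m, with the query point]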
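Below is a minimal sketch of BF in Python that uses `heapq` as the sorted list of entries. `Node`, `mindist`, `best_first_knn` and the data p1 to p8 are hypothetical illustrations, not an existing library API. Each pop takes the entry with the smallest key. A popped point is reported immediately, because every entry still in the heap has a key at least as large, so nothing not yet seen can be closer.

```python
from __future__ import annotations

import heapq
import itertools
import math

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


class Node:
    """Hypothetical R-tree node: a leaf holds named points, an internal node holds children."""

    def __init__(self, points: dict[str, Point] | None = None,
                 children: list[Node] | None = None) -> None:
        self.points = points or {}
        self.children = children or []
        xs = [x for x, _ in self.points.values()]
        ys = [y for _, y in self.points.values()]
        for c in self.children:
            xs += [c.rect[0], c.rect[2]]
            ys += [c.rect[1], c.rect[3]]
        self.rect = (min(xs), min(ys), max(xs), max(ys))  # MBR of the subtree


def mindist(q: Point, r: Rect) -> float:
    dx = max(r[0] - q[0], 0.0, q[0] - r[2])
    dy = max(r[1] - q[1], 0.0, q[1] - r[3])
    return math.hypot(dx, dy)


def best_first_knn(root: Node, q: Point, k: int) -> list[tuple[str, float]]:
    tie = itertools.count()  # breaks key ties so the heap never compares a Node
    heap: list[tuple[float, int, Node | str]] = [(0.0, next(tie), root)]
    result: list[tuple[str, float]] = []
    while heap and len(result) < k:
        key, _, item = heapq.heappop(heap)  # entry with the smallest mindist
        if isinstance(item, str):
            result.append((item, key))  # a leaf entry popped is the next NN
            continue
        for name, p in item.points.items():  # leaf node: enqueue its points
            heapq.heappush(heap, (math.dist(q, p), next(tie), name))
        for child in item.children:  # internal node: enqueue its child entries
            heapq.heappush(heap, (mindist(q, child.rect), next(tie), child))
    return result


root = Node(children=[
    Node(children=[Node(points={"p1": (1, 1), "p2": (2, 3)}),
                   Node(points={"p3": (3, 8), "p4": (2, 9)})]),
    Node(children=[Node(points={"p5": (7, 2), "p6": (8, 1)}),
                   Node(points={"p7": (8, 8), "p8": (9, 7)})]),
])
print([name for name, _ in best_first_knn(root, (6, 4), 2)])  # ['p5', 'p6']
```

The `itertools.count()` tie-breaker is needed because `heapq` compares tuples element by element. Without it, two entries with equal keys would make Python compare a `Node` with a `str` and raise `TypeError`.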
Best-first Algorithm (cont.)
• Insert all the entries in the child node of E6 into the sorted list.
• E7 is the next one to be processed.
Action: follow E6
Memory: E7 2, E1 5, E2 5, E3 9
Result: {empty}
[Figure: the R-tree and data points from the previous slide, with the query point]
Best-first Algorithm (cont.)
• Insert all the entries in the child node of E7 into the sorted list.
• The next entry to be processed is E4.
Action: follow E7
Memory: E4 2, E1 5, E2 5, E3 9, E5 13
Result: {empty}
[Figure: the R-tree and data points from the previous slide, with the query point]
Best-first Algorithm (cont.)
• Insert all the entries in the child node of E4 into the sorted list.
• The next entry to be processed is h, which is a leaf entry.
  • This is the first NN of q.
Action: follow E4
Memory: h 2, E1 5, E2 5, E3 9, i 10, E5 13, g 13
Result: {empty}
[Figure: the R-tree and data points from the previous slide, with the query point; points g, h, i lie in E4]
Best-first Algorithm: 2NN
• Assume we want 2 NNs; then the algorithm continues.
• Report h as the 1st NN and remove it from the heap.
• The next entry to be processed is E1.
Action: remove h
Memory: E1 5, E2 5, E3 9, i 10, E5 13, g 13
Result: {h}
[Figure: the R-tree and data points from the previous slide, with the query point]
Best-first Algorithm: 2NN (cont.)
• Visit the child node of E1 and insert all its entries into the sorted list.
• The next entry is a, which is a leaf entry.
  • So a is the 2nd NN, and the algorithm terminates.
• Whenever we process a leaf entry in memory, it is guaranteed to be the next NN.
Action: follow E1
Memory: a 5, E2 5, E3 9, i 10, E5 13, g 13, b 13, c 18
Result: {h}
[Figure: the R-tree and data points from the previous slide, with the query point; points a, b, c lie in E1]
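The trace on these slides can be replayed mechanically. The sketch below encodes only what the Memory columns show: the entries of each expanded node, with their squared mindist or distance. E2, E3 and E5 are never expanded, so their contents are omitted, and the replay works only for k ≤ 2. There is one assumption: at equal keys, a point is processed before a node. This reproduces the slides' choice of a over E2 (both at key 5), but the slides do not state a tie-breaking rule. `CHILDREN` and `replay_best_first` are hypothetical names.

```python
import heapq
import itertools

# Entries of each node expanded on the slides, with the squared key from the Memory column.
CHILDREN = {
    "root": [("E6", 1), ("E7", 2)],
    "E6": [("E1", 5), ("E2", 5), ("E3", 9)],
    "E7": [("E4", 2), ("E5", 13)],
    "E4": [("h", 2), ("i", 10), ("g", 13)],
    "E1": [("a", 5), ("b", 13), ("c", 18)],
}


def is_node(name: str) -> bool:
    return name == "root" or name.startswith("E")


def replay_best_first(k: int) -> tuple[list[str], list[str]]:
    """Return (the k NNs, the nodes visited in order); valid for k <= 2 only."""
    order = itertools.count()
    # Heap item: (squared key, 1 for a node / 0 for a point, insertion order, name).
    # The root starts alone in the heap, so its key (0 here) does not matter.
    heap = [(0, 1, next(order), "root")]
    result: list[str] = []
    visited: list[str] = []
    while heap and len(result) < k:
        _, _, _, name = heapq.heappop(heap)
        if not is_node(name):
            result.append(name)  # a leaf entry at the front is the next NN
            continue
        visited.append(name)
        for child, key in CHILDREN[name]:
            heapq.heappush(heap, (key, int(is_node(child)), next(order), child))
    return result, visited


print(replay_best_first(1))  # (['h'], ['root', 'E6', 'E7', 'E4'])
print(replay_best_first(2))  # (['h', 'a'], ['root', 'E6', 'E7', 'E4', 'E1'])
```

The keys popped are 0, 1, 2, 2, 2 for k = 1, followed by 5, 5 for k = 2, which is non-decreasing as BF's ordering guarantees.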
Best-first = Best Performance
• To find the 1st NN, we visited the root and the child nodes of E6, E7, and E4.
• To find the 2nd NN, in addition to the nodes above, we also visited the child node of E1.
• In both cases, the node accesses are optimal.
• It can be proved that BF visits the nodes of the tree in ascending order of their mindist to the query point.
[Figure: the R-tree and data points from the previous slide, with the query point]
Retrospect: The Rationale Behind
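• What is the main reasoning behind the depth-first and best-first algorithms?
• Use mindist to quantify the quality of the best point in a subtree: no point under entry E can be closer to q than mindist(q, E).
• If a node's mindist is already no smaller than the distance from q to our current result (the current k-th NN), prune it.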
[Figure: the R-tree and data points from the previous slide, with the query point]
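Both algorithms rest on one property: mindist(q, E) never exceeds the distance from q to any point inside E. That makes it a safe lower bound, and it is why pruning on it cannot discard a true answer. The randomized check below tests this property on random rectangles, queries and inside points; it is a sanity test, not a proof. The ranges, trial count and `check_lower_bound` name are arbitrary choices made for illustration.

```python
import math
import random


def mindist(q, rect):
    dx = max(rect[0] - q[0], 0.0, q[0] - rect[2])
    dy = max(rect[1] - q[1], 0.0, q[1] - rect[3])
    return math.hypot(dx, dy)


def check_lower_bound(trials: int = 10_000, seed: int = 0) -> bool:
    """Randomized check that mindist(q, E) <= dist(q, p) for points p inside E."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2 = sorted(rng.uniform(0, 10) for _ in range(2))
        y1, y2 = sorted(rng.uniform(0, 10) for _ in range(2))
        q = (rng.uniform(-5, 15), rng.uniform(-5, 15))  # may be inside or outside E
        p = (rng.uniform(x1, x2), rng.uniform(y1, y2))  # a point inside E
        if mindist(q, (x1, y1, x2, y2)) > math.dist(q, p) + 1e-9:
            return False
    return True


print(check_lower_bound())  # True
```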