Nearest Neighbor Queries
using R-trees
Based on notes by Yufei Tao
Nearest Neighbor Search
Find the object nearest to a query point q
E.g., find the gas station nearest to the red point.
k nearest neighbors: Find the k objects nearest to q
E.g., 1NN = {h}, 2NN = {h, a}, 3NN = {h, a, i}
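As a baseline for the definition above, here is a minimal brute-force sketch that scans every point. The coordinates, the query position `Q`, and the function names are hypothetical (the slides give no numbers); they were picked so that the three nearest neighbors come out as h, a, i, as in the example.

```python
import heapq

# Hypothetical coordinates; the slides do not list any.
POINTS = {"a": (4, 2), "b": (3, 1), "c": (2, 1), "d": (1, 5), "e": (3, 9),
          "f": (2.5, 6.5), "g": (7, 7), "h": (6, 5), "i": (8, 5),
          "j": (2, 2), "k": (1, 4), "l": (8, 8), "m": (10, 6)}
Q = (5, 4)  # hypothetical query point


def dist_sq(p, q):
    """Squared Euclidean distance (same ordering as the true distance)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def knn_brute_force(points, q, k):
    """Names of the k points nearest to q, found by scanning all points."""
    return heapq.nsmallest(k, points, key=lambda name: dist_sq(points[name], q))


print(knn_brute_force(POINTS, Q, 3))  # ['h', 'a', 'i']
```

This costs one distance computation per point for every query, which is the work an index such as the R-tree tries to avoid.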
[Figure: points a to m and the query point in the plane; the x and y axes run from 0 to 10.]
CS4482 CityU of HK
Nearest Neighbor Processing
The R-tree can accelerate NN search, too.
Concept: mindist(q, E)
The minimum distance between a point q and a rectangle E
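A minimal sketch of mindist for axis-parallel rectangles in two dimensions follows. The rectangle layout `(x1, y1, x2, y2)` and the sample rectangles are assumptions made for illustration, not values from the slides.

```python
import math


def mindist(q, rect):
    """Minimum Euclidean distance from point q to rectangle rect.

    rect = (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2 (assumed layout).
    Per axis, the gap is zero when q lies within the rectangle's extent.
    """
    x1, y1, x2, y2 = rect
    dx = max(x1 - q[0], 0, q[0] - x2)
    dy = max(y1 - q[1], 0, q[1] - y2)
    return math.hypot(dx, dy)


q = (5, 4)
print(mindist(q, (1, 1, 4, 9)))   # 1.0: q is level with the rectangle, 1 unit to its right
print(mindist(q, (6, 5, 10, 8)))  # about 1.414: the nearest point is the corner (6, 5)
print(mindist(q, (4, 3, 6, 5)))   # 0.0: q lies inside the rectangle
```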
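[Figure: the data space (both axes 0 to 10) with mindist(q, E6) and mindist(q, E7) marked, and the R-tree: the root holds E6 and E7; E6 holds E1, E2, E3; E7 holds E4, E5; the leaves are E1 = {a, b, c}, E2 = {d, e, f}, E3 = {j, k}, E4 = {g, h, i}, E5 = {l, m}.]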
Depth-first NN Algorithm
First load the root and compute the mindist from each entry to the
query.
Visit the child of the entry with the smallest mindist.
In this case: E6
[Figure: the data space and the R-tree, with the root entries E6 and E7.]
Depth-first NN Algorithm (cont.)
Do this recursively at the next level. In the child node of E6, compute
the mindist from every entry to the query.
Visit the child node of the entry having the smallest mindist.
In this case, E1 and E2 have the same mindist.
So the decision is random – say, E1 first.
Among all the points in the child node of E1, find the closest point a
(our current result).
[Figure: the data space and the R-tree, showing E1, E2, E3 inside E6 and the points a, b, c of E1.]
Depth-first NN Algorithm (cont.)
Then backtrack to the child node of E6, where the entry with the
next mindist value is E2.
Its mindist √5 is, however, the same as the distance from q to a.
So, we know that no point in E2 can possibly be closer to q than
a.
No result in E3 either – same reasoning.
[Figure: the data space and the R-tree, showing E1, E2, E3 and the current result a.]
Depth-first NN Algorithm (cont.)
We now backtrack to the root, where the entry with the next mindist is
E7.
Its mindist √2 is smaller than the distance √5 from q to a.
Thus, its subtree may contain some point whose distance to q is
smaller than the distance between q and a; so we have to visit it.
At the child node of E7, compute the mindist of all entries to q.
E4 will be descended next.
[Figure: the data space and the R-tree, showing E4 and E5 inside E7 and the current result a.]
Depth-first NN Algorithm (cont.)
In the child node of E4, we find a point h that is closer to q than a.
So h becomes our new nearest neighbor.
We backtrack to the child node of E7, where the entry with the next
mindist is E5.
E5’s mindist √13 is larger than the distance √2 from q to h. So
we prune its subtree.
The algorithm backtracks to the root and terminates.
Nodes visited, in this order: the root, then the child nodes of E6, E1, E7, and E4.
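The walk-through above can be sketched as a short recursive procedure. This is an illustration under stated assumptions, not the slides' own code: the coordinates, `POINTS`, `TREE`, `Q`, and all function names are hypothetical, chosen so that the mindist values match the ones quoted (1 for E6, √2 for E7 and E4, √5 for E1 and E2, √13 for E5). Distances are kept squared so that ties compare exactly.

```python
# Hypothetical coordinates and tree layout; the slides give no numbers.
POINTS = {"a": (4, 2), "b": (3, 1), "c": (2, 1), "d": (1, 5), "e": (3, 9),
          "f": (2.5, 6.5), "g": (7, 7), "h": (6, 5), "i": (8, 5),
          "j": (2, 2), "k": (1, 4), "l": (8, 8), "m": (10, 6)}
TREE = {"root": ["E6", "E7"], "E6": ["E1", "E2", "E3"], "E7": ["E4", "E5"],
        "E1": ["a", "b", "c"], "E2": ["d", "e", "f"], "E3": ["j", "k"],
        "E4": ["g", "h", "i"], "E5": ["l", "m"]}
Q = (5, 4)


def mbr(name):
    """Bounding box (x1, y1, x2, y2) of a point or of a node's subtree."""
    if name in POINTS:
        x, y = POINTS[name]
        return (x, y, x, y)
    boxes = [mbr(child) for child in TREE[name]]
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def mindist_sq(q, name):
    """Squared mindist from q to the bounding box of `name`."""
    x1, y1, x2, y2 = mbr(name)
    dx = max(x1 - q[0], 0, q[0] - x2)
    dy = max(y1 - q[1], 0, q[1] - y2)
    return dx * dx + dy * dy


def df_nn(q):
    """Depth-first 1NN search; returns (name, squared distance, visited nodes)."""
    best = [float("inf"), None]  # [squared distance, name] of the current result
    visited = []

    def visit(node):
        visited.append(node)
        # Entries in ascending mindist order; sorted() is stable, so ties
        # keep their listing order (E1 before E2).
        for entry in sorted(TREE[node], key=lambda e: mindist_sq(q, e)):
            d = mindist_sq(q, entry)
            if d >= best[0]:
                break  # this entry and all later ones cannot beat the result
            if entry in POINTS:
                best[0], best[1] = d, entry
            else:
                visit(entry)

    visit("root")
    return best[1], best[0], visited


print(df_nn(Q))  # ('h', 2, ['root', 'E6', 'E1', 'E7', 'E4'])
```

A real R-tree stores each entry's rectangle in its parent node; recomputing it with `mbr` here only keeps the sketch short.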
[Figure: the data space and the R-tree, showing E4, E5 inside E7 and the points g, h, i of E4.]
Another Depth-first Example: 2 NN
Difference: entries must be pruned based on the distance from q to our
current 2nd NN.
Root => child node of E6 => child node of E1 => find {a, b} here.
Backtrack to child node of E6 => child node of E2 (its mindist < dist(q,
b)) => update our result to {a, f}.
Backtrack to child node of E6 => child node of E3 => backtrack to the
root => child node of E7 => child node of E4 => update our result to
{a, h}.
Backtrack to child node of E7 => prune E5 => backtrack to the root =>
end.
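The kNN variant of the depth-first search can be sketched by keeping the k best points in a bounded max-heap and pruning against the k-th best distance. As before, the coordinates, `POINTS`, `TREE`, `Q`, and the function names are hypothetical, chosen so that the trace matches the steps listed above; distances are squared.

```python
import heapq

# Hypothetical coordinates and tree layout; the slides give no numbers.
POINTS = {"a": (4, 2), "b": (3, 1), "c": (2, 1), "d": (1, 5), "e": (3, 9),
          "f": (2.5, 6.5), "g": (7, 7), "h": (6, 5), "i": (8, 5),
          "j": (2, 2), "k": (1, 4), "l": (8, 8), "m": (10, 6)}
TREE = {"root": ["E6", "E7"], "E6": ["E1", "E2", "E3"], "E7": ["E4", "E5"],
        "E1": ["a", "b", "c"], "E2": ["d", "e", "f"], "E3": ["j", "k"],
        "E4": ["g", "h", "i"], "E5": ["l", "m"]}
Q = (5, 4)


def mbr(name):
    """Bounding box (x1, y1, x2, y2) of a point or of a node's subtree."""
    if name in POINTS:
        x, y = POINTS[name]
        return (x, y, x, y)
    boxes = [mbr(child) for child in TREE[name]]
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def mindist_sq(q, name):
    """Squared mindist from q to the bounding box of `name`."""
    x1, y1, x2, y2 = mbr(name)
    dx = max(x1 - q[0], 0, q[0] - x2)
    dy = max(y1 - q[1], 0, q[1] - y2)
    return dx * dx + dy * dy


def df_knn(q, k):
    """Depth-first kNN; returns (names nearest first, visited nodes)."""
    result, visited = [], []  # result: max-heap of (-squared distance, name)

    def visit(node):
        visited.append(node)
        for entry in sorted(TREE[node], key=lambda e: mindist_sq(q, e)):
            d = mindist_sq(q, entry)
            # Prune only once k candidates exist; the bound is the k-th best.
            if len(result) == k and d >= -result[0][0]:
                break
            if entry in POINTS:
                heapq.heappush(result, (-d, entry))
                if len(result) > k:
                    heapq.heappop(result)  # drop the farthest candidate
            else:
                visit(entry)

    visit("root")
    ranked = sorted((-neg, name) for neg, name in result)
    return [name for _, name in ranked], visited


print(df_knn(Q, 2))
# (['h', 'a'], ['root', 'E6', 'E1', 'E2', 'E3', 'E7', 'E4'])
```

With these made-up coordinates the candidate set evolves as {a, b}, then {a, f}, then {a, h}, and E5 is pruned, mirroring the trace.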
[Figure: the data space with points a to m and rectangles E1 to E7, and the R-tree.]
Optimal Performance of kNN Search
What is the best performance that any algorithm can achieve for a kNN query?
Vicinity circle: centered at the query q, with radius equal to the distance
from q to its k-th NN.
Every node whose rectangle intersects the vicinity circle must be visited.
Child node of E6 must be accessed by any algorithm.
Although there’s no result in its subtree, this cannot be verified
unless we visit it!
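The lower bound described above can be checked with a small sketch: compute the vicinity-circle radius by brute force, then list every node whose rectangle lies within that radius of q. The coordinates, `POINTS`, `TREE`, `Q`, and the function names are hypothetical; a rectangle that merely touches the circle is counted as intersecting it.

```python
# Hypothetical coordinates and tree layout; the slides give no numbers.
POINTS = {"a": (4, 2), "b": (3, 1), "c": (2, 1), "d": (1, 5), "e": (3, 9),
          "f": (2.5, 6.5), "g": (7, 7), "h": (6, 5), "i": (8, 5),
          "j": (2, 2), "k": (1, 4), "l": (8, 8), "m": (10, 6)}
TREE = {"root": ["E6", "E7"], "E1": ["a", "b", "c"], "E2": ["d", "e", "f"],
        "E3": ["j", "k"], "E4": ["g", "h", "i"], "E5": ["l", "m"],
        "E6": ["E1", "E2", "E3"], "E7": ["E4", "E5"]}
Q = (5, 4)


def mbr(name):
    """Bounding box (x1, y1, x2, y2) of a point or of a node's subtree."""
    if name in POINTS:
        x, y = POINTS[name]
        return (x, y, x, y)
    boxes = [mbr(child) for child in TREE[name]]
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def mindist_sq(q, name):
    """Squared mindist from q to the bounding box of `name`."""
    x1, y1, x2, y2 = mbr(name)
    dx = max(x1 - q[0], 0, q[0] - x2)
    dy = max(y1 - q[1], 0, q[1] - y2)
    return dx * dx + dy * dy


def nodes_in_vicinity_circle(q, k):
    """Nodes whose rectangle intersects the vicinity circle of a kNN query."""
    # Squared radius = squared distance from q to its k-th nearest point.
    radius_sq = sorted(mindist_sq(q, p) for p in POINTS)[k - 1]
    return sorted(n for n in TREE if mindist_sq(q, n) <= radius_sq)


print(nodes_in_vicinity_circle(Q, 1))  # ['E4', 'E6', 'E7', 'root']
```

For k = 1 the child node of E6 is in the list even though its subtree holds no result, which is the point made above.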
[Figure: the data space with points a to m and rectangles E1 to E7, and the R-tree.]
Best-first Algorithm (optimal algorithm)
BF keeps all the (leaf and non-leaf) entries seen so far in memory,
sorted in ascending order of their mindist.
Each step processes the entry in memory with the smallest mindist.
Action: visit root
Memory: E6 1, E7 √2
Result: {empty}
[Figure: the data space and the R-tree, showing the root entries E6 and E7.]
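A minimal sketch of best-first search, using a binary heap as the sorted memory and a generator so that neighbors are reported one at a time. The coordinates, `POINTS`, `TREE`, `Q`, and the function names are hypothetical, chosen to reproduce the mindist values used in these slides. Two details are design choices of this sketch rather than something the slides specify: distances are squared, and on equal distance a point is taken before a node.

```python
import heapq
import itertools

# Hypothetical coordinates and tree layout; the slides give no numbers.
POINTS = {"a": (4, 2), "b": (3, 1), "c": (2, 1), "d": (1, 5), "e": (3, 9),
          "f": (2.5, 6.5), "g": (7, 7), "h": (6, 5), "i": (8, 5),
          "j": (2, 2), "k": (1, 4), "l": (8, 8), "m": (10, 6)}
TREE = {"root": ["E6", "E7"], "E6": ["E1", "E2", "E3"], "E7": ["E4", "E5"],
        "E1": ["a", "b", "c"], "E2": ["d", "e", "f"], "E3": ["j", "k"],
        "E4": ["g", "h", "i"], "E5": ["l", "m"]}
Q = (5, 4)


def mbr(name):
    """Bounding box (x1, y1, x2, y2) of a point or of a node's subtree."""
    if name in POINTS:
        x, y = POINTS[name]
        return (x, y, x, y)
    boxes = [mbr(child) for child in TREE[name]]
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def mindist_sq(q, name):
    """Squared mindist from q to the bounding box of `name`."""
    x1, y1, x2, y2 = mbr(name)
    dx = max(x1 - q[0], 0, q[0] - x2)
    dy = max(y1 - q[1], 0, q[1] - y2)
    return dx * dx + dy * dy


def bf_nn(q, visited):
    """Yield (squared distance, name) of the neighbors of q, nearest first.

    Every node that gets opened is appended to the caller's `visited` list.
    """
    tick = itertools.count()  # unique tie-breaker, so names are never compared
    heap = [(0, True, next(tick), "root")]  # (mindist_sq, is_node, tick, name)
    while heap:
        d, is_node, _, entry = heapq.heappop(heap)
        if not is_node:
            yield d, entry  # a popped point is the next NN
            continue
        visited.append(entry)
        for child in TREE[entry]:
            key = (mindist_sq(q, child), child not in POINTS, next(tick), child)
            heapq.heappush(heap, key)


visited = []
search = bf_nn(Q, visited)
print(next(search), visited)  # (2, 'h') ['root', 'E6', 'E7', 'E4']
print(next(search), visited)  # (5, 'a') ['root', 'E6', 'E7', 'E4', 'E1']
```

Because the generator is paused between neighbors, asking for one more neighbor resumes from the same heap instead of restarting the search.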
Best-first Algorithm (cont.)
Insert all the entries in the child node of E6 into the sorted list.
E7 is the next one to be processed.
Action: follow E6
Memory: E7 √2, E1 √5, E2 √5, E3 √9
Result: {empty}
[Figure: the data space and the R-tree, showing E1, E2, E3 inside E6.]
Best-first Algorithm (cont.)
Insert all the entries in the child node of E7 into the sorted list.
The next entry to be processed is E4.
Action: follow E7
Memory: E4 √2, E1 √5, E2 √5, E3 √9, E5 √13
Result: {empty}
[Figure: the data space and the R-tree, showing E4 and E5 inside E7.]
Best-first Algorithm (cont.)
Insert all the entries in the child node of E4 into the sorted list.
The next entry to be processed is h, which is a leaf entry.
This is the first NN of q.
Action: follow E4
Memory: h √2, E1 √5, E2 √5, E3 √9, i √10, E5 √13, g √13
Result: {empty}
[Figure: the data space and the R-tree, showing the points g, h, i of E4.]
Best-first Algorithm: 2NN
Assume we want 2 NNs; then, the algorithm continues.
Report h as the 1st NN, and remove it from the sorted list.
The next entry to be processed is E1
Action: remove h
Memory: E1 √5, E2 √5, E3 √9, i √10, E5 √13, g √13
Result: {h}
[Figure: the data space and the R-tree, showing the points g, h, i of E4.]
Best-first Algorithm: 2NN (cont.)
Visit the child node of E1; enter all its entries into the sorted list.
The next entry is a, which is a leaf entry, so a is the 2nd NN and the
algorithm terminates.
Whenever the entry we process from memory is a leaf entry, it is
guaranteed to be the next NN.
Action: follow E1
Memory: a √5, E2 √5, E3 √9, i √10, E5 √13, g √13, b √13, c √18
Result: {h}
[Figure: the data space and the R-tree, showing the points a, b, c of E1.]
Best-first = Best Performance
To find the 1st NN, we visited the root, and the child nodes of E6, E7,
E4.
To find the 2nd NN, in addition to the above 4 nodes, we also visited the
child node of E1.
Both cases are optimal.
It can be proved that BF visits the nodes in the tree in ascending
order of their mindist to the query point.
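The ordering claim can be observed on a small example by letting best-first run until memory is empty and recording the mindist of each node as it is opened. This is a check on one hypothetical data set, not a proof; the coordinates, `POINTS`, `TREE`, `Q`, and the function names are assumptions, and distances are squared.

```python
import heapq
import itertools

# Hypothetical coordinates and tree layout; the slides give no numbers.
POINTS = {"a": (4, 2), "b": (3, 1), "c": (2, 1), "d": (1, 5), "e": (3, 9),
          "f": (2.5, 6.5), "g": (7, 7), "h": (6, 5), "i": (8, 5),
          "j": (2, 2), "k": (1, 4), "l": (8, 8), "m": (10, 6)}
TREE = {"root": ["E6", "E7"], "E6": ["E1", "E2", "E3"], "E7": ["E4", "E5"],
        "E1": ["a", "b", "c"], "E2": ["d", "e", "f"], "E3": ["j", "k"],
        "E4": ["g", "h", "i"], "E5": ["l", "m"]}
Q = (5, 4)


def mbr(name):
    """Bounding box (x1, y1, x2, y2) of a point or of a node's subtree."""
    if name in POINTS:
        x, y = POINTS[name]
        return (x, y, x, y)
    boxes = [mbr(child) for child in TREE[name]]
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def mindist_sq(q, name):
    """Squared mindist from q to the bounding box of `name`."""
    x1, y1, x2, y2 = mbr(name)
    dx = max(x1 - q[0], 0, q[0] - x2)
    dy = max(y1 - q[1], 0, q[1] - y2)
    return dx * dx + dy * dy


def bf_visit_order(q):
    """Run best-first to exhaustion; return [(mindist_sq, node)] in visit order."""
    tick = itertools.count()
    heap = [(0, True, next(tick), "root")]  # (mindist_sq, is_node, tick, name)
    order = []
    while heap:
        d, is_node, _, entry = heapq.heappop(heap)
        if not is_node:
            continue  # points are reported, not opened
        order.append((d, entry))
        for child in TREE[entry]:
            key = (mindist_sq(q, child), child not in POINTS, next(tick), child)
            heapq.heappush(heap, key)
    return order


order = bf_visit_order(Q)
print([name for _, name in order])
# ['root', 'E6', 'E7', 'E4', 'E1', 'E2', 'E3', 'E5']
print([d for d, _ in order])  # [0, 1, 2, 2, 5, 5, 9, 13]
```

The order is non-decreasing because a child's rectangle lies inside its parent's, so a child's mindist is never smaller than the mindist of the node that inserted it.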
[Figure: the data space with points a to m and rectangles E1 to E7, and the R-tree.]
Retrospect: The Rationale Behind
What is the main reasoning behind the depth-first and best-first algorithms?
Use mindist as a lower bound on the distance from q to any point in a subtree.
If a node’s mindist is already greater than the distance to our current result, prune it.
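Pruning is safe because a node's mindist never exceeds the distance from q to any point stored below that node. The sketch below checks this property on a hypothetical data set; the coordinates, `POINTS`, `TREE`, and the function names are assumptions, and distances are squared.

```python
# Hypothetical coordinates and tree layout; the slides give no numbers.
POINTS = {"a": (4, 2), "b": (3, 1), "c": (2, 1), "d": (1, 5), "e": (3, 9),
          "f": (2.5, 6.5), "g": (7, 7), "h": (6, 5), "i": (8, 5),
          "j": (2, 2), "k": (1, 4), "l": (8, 8), "m": (10, 6)}
TREE = {"root": ["E6", "E7"], "E6": ["E1", "E2", "E3"], "E7": ["E4", "E5"],
        "E1": ["a", "b", "c"], "E2": ["d", "e", "f"], "E3": ["j", "k"],
        "E4": ["g", "h", "i"], "E5": ["l", "m"]}


def mbr(name):
    """Bounding box (x1, y1, x2, y2) of a point or of a node's subtree."""
    if name in POINTS:
        x, y = POINTS[name]
        return (x, y, x, y)
    boxes = [mbr(child) for child in TREE[name]]
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def mindist_sq(q, name):
    """Squared mindist from q to the bounding box of `name`."""
    x1, y1, x2, y2 = mbr(name)
    dx = max(x1 - q[0], 0, q[0] - x2)
    dy = max(y1 - q[1], 0, q[1] - y2)
    return dx * dx + dy * dy


def points_under(name):
    """All point names stored in the subtree of `name`."""
    if name in POINTS:
        return [name]
    return [p for child in TREE[name] for p in points_under(child)]


def lower_bound_holds(q):
    """True if every node's mindist is <= the distance to each of its points."""
    return all(mindist_sq(q, node) <= mindist_sq(q, p)
               for node in TREE for p in points_under(node))


print(lower_bound_holds((5, 4)))  # True
print(lower_bound_holds((0, 0)))  # True
```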
[Figure: the data space with points a to m and rectangles E1 to E7, and the R-tree.]