
Nearest Neighbor Search (NNS)
Presented by: Mohsen Kamyar
Introduction



Given a set of n points P = {p1, …, pn} in a metric space X with distance function d, preprocess P so that, for any query point q, the nearest neighbor of q in P can be found efficiently.
Most known algorithms work well in low dimensions but behave like brute-force search in high dimensions.
These algorithms are of two types:

Low preprocessing time, but query time linear in n and d.
Query time sublinear in n and polynomial in d, but exponential preprocessing time $n^d$.
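The first type of algorithm is essentially a linear scan. A minimal sketch, assuming points are tuples of floats in Euclidean space (the function names are illustrative, not from the talk):

```python
import math

def euclidean(p, q):
    """Euclidean distance between two d-dimensional points: O(d) work."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def brute_force_nn(points, q, dist=euclidean):
    """No preprocessing; each query compares q with all n points: O(n * d) time."""
    return min(points, key=lambda p: dist(p, q))
```

For example, `brute_force_nn([(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)], (0.9, 1.2))` scans all three points and picks `(1.0, 1.0)`.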
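The problem can be solved efficiently only in its approximate form, $\epsilon$-approximate nearest neighbor: return a point p in P such that $d(p, q) \le (1+\epsilon)\, d(p', q)$ for every p' in P.
We present two algorithms for this version:

Preprocessing time polynomial in n and d, and sublinear query time.
Preprocessing time $O(n) \times O(1/\epsilon)^d$, and query time polynomial in log n and d.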
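To make the approximate version concrete, here is a small checker for the usual definition: a point p is an $\epsilon$-approximate nearest neighbor of q if its distance to q is at most $(1+\epsilon)$ times the true nearest distance. This is only a sketch; the function name is hypothetical and Euclidean distance is assumed.

```python
import math

def is_eps_approximate_nn(points, q, candidate, eps):
    """True if `candidate` is within a (1 + eps) factor of the true nearest distance to q."""
    best = min(math.dist(p, q) for p in points)
    return math.dist(candidate, q) <= (1 + eps) * best
```

With points on a line at 0, 1 and 3 and a query at 0.4, the point at 1 is acceptable for eps = 0.6 but not for eps = 0.25.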

Applications:








Data compression
Database and data mining
Information retrieval
Image and video databases
Machine learning
Pattern recognition
Statistics and data analysis
…
Previous works

Samet: surveys a variety of data structures for this problem, such as k-d trees, R-trees, …. These methods work well only in 2 and 3 dimensions.


Dobkin and Lipton: query time $O(2^d \log n)$ and preprocessing $O(n^{2^{d+1}})$.
Clarkson: preprocessing $O(n^{\lceil d/2 \rceil (1+\delta)})$ and query time $O(2^{O(d \log d)} \log n)$.

All previous algorithms suffer from a query time that is exponential in d.
Meiser: query time $O(d^5 \log n)$ and preprocessing $O(n^{d+\delta})$.

The “vantage point” technique is a popular heuristic, but all such heuristics become exponential in high dimensions.
The problem can be solved efficiently in its approximate version.

Arya and Mount: preprocessing $O(1/\epsilon)^d\, O(n)$ and query time $O(1/\epsilon)^d\, O(\log n)$.
Clarkson and Chan: reduce the dependence on $\epsilon$ to $\epsilon^{-(d-1)/2}$.
Arya, Mount, Netanyahu, Silverman and Wu: optimal preprocessing time $O(n)$ and query time $O(d^d)$.

Kleinberg: preprocessing $O(n \log d)^{2d}$ and query time polynomial in d, $\epsilon$, and log n.

Kleinberg: preprocessing polynomial in d, $\epsilon$ and n, and query time $O(n + d \log^3 n)$.

Dolev, Harari and Parnas: an algorithm for the Hamming space $\{0,1\}^d$ that returns all points within distance r of the query point q, with exponential query time and preprocessing time.

Greene, Parnas and Yao: the same problem for binary data chosen uniformly at random, with preprocessing time $O(d n^{1 + r/d})$ and query time $O(d n^{r/d})$.
Our solution


We reduce $\epsilon$-NNS to $\epsilon$-PLEB (point location in equal balls).
We give two solutions for $\epsilon$-PLEB:

Based on a bucketing algorithm
Based on locality-sensitive hashing
Reduction to PLEB (some definitions)

For any metric space $M = (X, d)$ and any $P \subset X$ we have:

$d(p, P) = \min_{q \in P} d(p, q)$

$\Delta(P) = \max_{p, q \in P} d(p, q)$ (diameter of P)

$B(p, r) = \{q \in X \mid d(p, q) \le r\}$, a ball centered at p with radius r

$R(p, r_1, r_2) = B(p, r_2) - B(p, r_1) = \{q \in X \mid r_1 \le d(p, q) \le r_2\}$, a ring centered at p.
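The definitions above translate directly into code. The following sketch assumes Euclidean distance on tuples of floats; the helper names are illustrative only.

```python
import math

def dist_to_set(p, P):
    """d(p, P): distance from p to the closest point of P."""
    return min(math.dist(p, q) for q in P)

def diameter(P):
    """Delta(P): largest distance between two points of P."""
    return max(math.dist(p, q) for p in P for q in P)

def in_ball(q, p, r):
    """q lies in B(p, r)."""
    return math.dist(p, q) <= r

def in_ring(q, p, r1, r2):
    """q lies in the ring R(p, r1, r2), following the set description r1 <= d(p, q) <= r2."""
    return r1 <= math.dist(p, q) <= r2
```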
Reduction to PLEB




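Problem definition (PLEB): given n balls of radius r centered at points $c_1, \dots, c_n$, devise a data structure that, for a query point q, reports a ball containing q, or reports that there is none.
$\epsilon$-PLEB: if some $c_i$ has $d(q, c_i) \le r$, return a center $c_j$ with $d(q, c_j) \le (1+\epsilon) r$; if $d(q, c_i) > (1+\epsilon) r$ for every $c_i$, report that there is none; otherwise either answer is allowed.
The reduction from PLEB to NNS is easy: find the nearest neighbor of q and compare its distance with r.
We reduce NNS to PLEB using a data structure called the ring-cover tree.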
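The easy direction, answering PLEB with a nearest-neighbor structure, can be sketched as follows. Here `nearest_neighbor` is a brute-force stand-in for whatever NNS structure is available, and both function names are hypothetical.

```python
import math

def nearest_neighbor(centers, q):
    """Stand-in for an NNS data structure (brute force here)."""
    return min(centers, key=lambda c: math.dist(c, q))

def pleb_via_nns(centers, r, q):
    """Return a center whose radius-r ball contains q, or None if no ball does."""
    c = nearest_neighbor(centers, q)
    # q lies in some ball exactly when it lies in the ball around its nearest center.
    return c if math.dist(c, q) <= r else None
```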

There is a simple but inefficient reduction.

Let R be the ratio of the largest to the smallest interpoint distance in P.
For each $l \in \{(1+\epsilon)^0, (1+\epsilon)^1, \dots, R\}$ generate a sequence of balls $B^l = \{B^l_1, \dots, B^l_n\}$ of radius l centered at $p_1, \dots, p_n$. Each sequence is an instance of PLEB.
For a query point q, we find by binary search the minimum l for which some ball contains q. The query overhead is $O(\log \log R)$ and the storage overhead is $O(\log R)$.
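The simple reduction can be sketched as below. Several details are assumptions made for this illustration and are not from the slides: `pleb_query` is a brute-force stand-in for a real PLEB structure, the radii start at half the smallest interpoint distance and run past the diameter so that boundary cases are covered, and the points are assumed distinct with n >= 2.

```python
import math
from itertools import combinations

def pleb_query(centers, radius, q):
    """Brute-force stand-in for one PLEB instance: a center whose ball contains q, or None."""
    for c in centers:
        if math.dist(c, q) <= radius:
            return c
    return None

def build_radii(points, eps):
    """Geometric sequence of radii, one PLEB instance per radius (O(log R) instances)."""
    dists = [math.dist(p, s) for p, s in combinations(points, 2)]
    d_min, d_max = min(dists), max(dists)
    radii = [d_min / 2]                        # assumption: start below the smallest distance
    while radii[-1] < d_max * (1 + 1 / eps):   # assumption: end well above the diameter
        radii.append(radii[-1] * (1 + eps))
    return radii

def approx_nn(points, radii, q):
    """Binary search over the radii: O(log log R) PLEB queries."""
    lo, hi = 0, len(radii) - 1
    if pleb_query(points, radii[hi], q) is None:
        return points[0]  # q is so far away that every point is an eps-approximate NN
    while lo < hi:
        mid = (lo + hi) // 2
        if pleb_query(points, radii[mid], q) is not None:
            hi = mid      # some ball of this radius contains q, try smaller radii
        else:
            lo = mid + 1
    return pleb_query(points, radii[lo], q)
```

The returned point lies within the smallest radius that captures q, while the true nearest neighbor lies outside the previous radius, which is smaller by a factor of $1+\epsilon$; this gives the approximation guarantee.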
Reduction to PLEB (some definitions)

$R(p, r_1, r_2)$ is an $(\alpha_1, \alpha_2, \beta)$-ring separator for P if $|P \cap B(p, r_1)| \ge \alpha_1 |P|$ and $|P \setminus B(p, r_2)| \ge \alpha_2 |P|$, where $r_2 / r_1 = \beta > 1$ and $\alpha_1, \alpha_2 > 0$.
$S \subset P$ is a $(\gamma, \delta)$-cluster for P if for every $p \in S$, $|P \cap B(p, \gamma \Delta(S))| \le \delta |P|$, where $0 < \gamma, \delta < 1$.
A sequence $A_1, \dots, A_l$ is a (b,c,d)-cover for $S \subset P$ if there exists $r \ge d\,\Delta(A)$ for $A = \bigcup_i A_i$ such that $S \subset A$ and, for each i, $|P \cap \bigcup_{p \in A_i} B(p, r)| \le b\,|A_i|$ and $|A_i| \le c\,|P|$, where $b > 1$, $0 < c < 1$, $d > 0$.
Reduction to PLEB

Theorem: for any P, $0 < \alpha < 1$, and $\beta > 1$, one of the following holds:

P has an $(\alpha, \alpha, \beta)$-ring separator, or
P contains a $(\frac{1}{2\beta}, \alpha)$-cluster of size at least $(1 - 2\alpha)|P|$.
Theorem: let S be a $(\gamma, \delta)$-cluster for P. Then for any b there is an algorithm for computing a collection of sets that is a $(b, \delta, \frac{\gamma}{(1+\gamma) \log_b n})$-cover for S.
An algorithm for computing cover
Algorithm Cover : S  P  R (q, r ,  rq );
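$r \leftarrow \gamma \Delta(S) / \log_b n$;  $j \leftarrow 0$;
repeat
    $j \leftarrow j + 1$;  choose some $p_j \in S$;  $B_j^1 \leftarrow \{p_j\}$;
    $i \leftarrow 1$;
    while $|P \cap \bigcup_{q \in B_j^i} B(q, r)| > b\,|B_j^i|$ do
        $B_j^{i+1} \leftarrow P \cap \bigcup_{q \in B_j^i} B(q, r)$;
        $i \leftarrow i + 1$
    endwhile;
    $A_j \leftarrow B_j^i$;  $S \leftarrow S - A_j$;  $P \leftarrow P - A_j$
until $S = \emptyset$;
$k \leftarrow j$.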
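The loop structure of the cover algorithm can be sketched in Python as follows. This is an illustration under assumptions: Euclidean distance, points given as hashable tuples, $n \ge 2$, and a quadratic-time neighborhood computation; the function name `cover` and its signature are hypothetical.

```python
import math

def cover(P, S, gamma, b):
    """Grow sets A_1, ..., A_k by repeated r-neighborhood expansion until growth slows.

    P: all points, S: subset of P to be covered, gamma in (0, 1), b > 1.
    Returns the list of sets A_j and the radius r that was used.
    """
    P, S = set(P), set(S)
    n = len(P)
    diam = max(math.dist(p, q) for p in S for q in S)
    r = gamma * diam / math.log(n, b)
    sets = []
    while S:
        B = {next(iter(S))}                  # choose some p_j in S
        while True:
            # All remaining points within distance r of the current set B
            grown = {p for p in P if any(math.dist(p, q) <= r for q in B)}
            if len(grown) <= b * len(B):     # expansion factor at most b: stop
                break
            B = grown
        sets.append(B)
        S -= B                               # remove A_j from both S and P
        P -= B
    return sets, r
```

Each inner iteration multiplies the size of B by more than b, so the inner loop runs at most $\log_b n$ times for each set.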
Reduction to PLEB

Corollary: for any P, $0 < \alpha < 1$, $\beta > 1$, $b > 1$, one of the following holds:

P has an $(\alpha, \alpha, \beta)$-ring separator $R(p, r, \beta r)$, or
There is a $(b, \alpha, d)$-cover for some $S \subset P$ such that $|S| \ge (1 - 2\alpha) n$ and $d = \frac{1}{(2\beta + 1) \log_b n}$.
Constructing Ring-Cover Trees


We split each node into subsets, recursively, based on the preceding corollary.
There are two cases, depending on which of its two properties holds.
Let $\beta = 2(1 + 1/\epsilon)$, $b = 1 + 1/\log^2 n$, and $\alpha = (1 - 1/\log n)/2$.

Case 1: call P a ring node; its children are $S_1 = P \cap B(p, \beta r)$ and $S_2 = P - B(p, r)$. We store some information about the ring separator R.
Case 2: call P a cover node. Its children are $S_i = P \cap \bigcup_{p \in A_i} B(p, r)$ and $S_0 = P - A$. The information stored at P is as follows.


Let $r_0 = (1 + 1/\epsilon)\,\Delta(A)$ and $r_i = r_0 / (1+\epsilon)^i$ for $i \in \{1, \dots, k\}$, where $k = \log_{1+\epsilon} \frac{(1 + 1/\epsilon) \log_b n}{\gamma} + 1$. For each $r_i$ generate an instance of PLEB with balls $B(p, r_i)$, and store all instances at P.
The ring-cover tree can be constructed in $O(n^2)$ time.
Search procedure

If P is a ring node with an $(\alpha, \alpha, \beta)$-ring separator $R(p, r, \beta r)$ then:

If $q \in B(p, r(1 + 1/\epsilon))$ then return Search(q, S1)
Else compute p' = Search(q, S2); return $\min_q(p, p')$
If P is a cover node with a (b,c,d)-cover A1,…,Al of radius r for $S \subset P$ then:

If $q \notin B(a, r_0)$ for all $a \in A$ then compute p = Search(q, P−A), choose any $a \in A$ and return $\min_q(p, a)$
Else if $q \in B(a, r_0)$ for some $a \in A$ but $q \notin B(a, r_k)$ for all $a \in A$ then, using binary search on the $r_i$'s, find an $\epsilon$-NN p of q in A, compute p' = Search(q, P−A), and return $\min_q(p, p')$
Else if $q \in B(a, r_k)$ for some $a \in A_i$ then return Search(q, Si)

Here $\min_q(x, y)$ denotes whichever of x and y is closer to q.
Analysis of Ring-Cover Trees



Depth of the ring-cover tree is $O(\log_{1/(2\alpha)} n) = O(\log^2 n)$.
Number of PLEB queries is $O(\log^2 n \times \log k)$.
Needed storage is $O(k\, n\, b^{\log_{1/(2\alpha)} n}\, (1 + 2(1 - 2\alpha))^{\log n}) = O(n\ \mathrm{polylog}\ n)$.
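The depth and storage bounds follow from the parameter choices by a short calculation. The derivation below is a sketch that assumes $\alpha = (1 - 1/\log n)/2$ and $b = 1 + 1/\log^2 n$, with logarithms to base 2, and treats $\epsilon$ as a constant so that k is polylogarithmic in n.

```latex
% Assumed parameters: \alpha = (1 - 1/\log n)/2, b = 1 + 1/\log^2 n, \log = \log_2.
\begin{align*}
\frac{1}{2\alpha} &= \frac{1}{1 - 1/\log n}, \\
\ln\frac{1}{2\alpha} &= -\ln\!\left(1 - \frac{1}{\log n}\right) \ge \frac{1}{\log n}
  && \text{since } -\ln(1 - x) \ge x \text{ for } 0 \le x < 1, \\
\log_{1/(2\alpha)} n &= \frac{\ln n}{\ln\frac{1}{2\alpha}} \le \ln n \cdot \log n \le \log^2 n, \\
b^{\log_{1/(2\alpha)} n} &\le \left(1 + \frac{1}{\log^2 n}\right)^{\log^2 n} \le e, \\
\bigl(1 + 2(1 - 2\alpha)\bigr)^{\log n} &= \left(1 + \frac{2}{\log n}\right)^{\log n} \le e^2 .
\end{align*}
```

Both exponential factors in the storage bound are therefore constants, leaving $O(k n)$.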
Point Location

We have two algorithms:

Bucketing method: works for all $l_p$ norms
Locality-sensitive hashing: works for Hamming spaces

We can reduce every $l_p^d$ to $l_1^{O(d)}$ for any $p \in [1, 2]$.

We can reduce every $\epsilon$-NN for $l_1^d$ to $\epsilon$-PLEB in $H^m$ where $m = d \log_b n \times \max(1/\epsilon, \epsilon)$.
Point Location( Bucketing method )






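Assume a grid with spacing $s = \epsilon / \sqrt{d}$ (with the ball radius normalized to r = 1).
$\bar{B}_i$ is the set of grid cells that intersect the ball $B_i$.
Store all elements of $\bigcup_i \bar{B}_i$ in a hash table, with information about the corresponding balls.
For each query point q, compute only the corresponding grid cell and look it up in the table.
Hash function: $h((x_1, \dots, x_d)) = ((a_1 x_1 + \dots + a_d x_d) \bmod P) \bmod M$, where P is a prime and M is the hash table size.
Preprocessing time is $O(n) \times O(1/\epsilon)^d$.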
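A minimal sketch of the bucketing method for the Euclidean norm follows. It is illustrative only: a Python `dict` stands in for the hash table and the hash function above, the spacing is scaled by the radius r rather than assuming r = 1, and the function names are hypothetical.

```python
import math
from itertools import product

def build_grid_table(centers, r, eps):
    """Map every grid cell that intersects some ball B(c, r) to one such center c."""
    d = len(centers[0])
    s = eps * r / math.sqrt(d)      # grid spacing; each cell has diameter eps * r
    table = {}
    for c in centers:
        lo = [math.floor((x - r) / s) for x in c]
        hi = [math.floor((x + r) / s) for x in c]
        for cell in product(*(range(a, b + 1) for a, b in zip(lo, hi))):
            # Squared distance from c to the closest point of the cell [k*s, (k+1)*s]^d
            gap2 = sum(max(k * s - x, 0.0, x - (k + 1) * s) ** 2
                       for k, x in zip(cell, c))
            if gap2 <= r * r:       # the cell intersects the ball
                table.setdefault(cell, c)
    return table, s

def grid_pleb_query(table, s, q):
    """Compute the cell of q and look it up: one hash evaluation per query."""
    cell = tuple(math.floor(x / s) for x in q)
    return table.get(cell)
```

If q lies in some ball, its cell intersects that ball and is therefore stored. Any center returned is within $r + \epsilon r$ of q, because the cell diameter is $\epsilon r$. The number of cells per ball grows as $O(1/\epsilon)^d$, which is where the exponential preprocessing cost comes from.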
Point Location (Locality-Sensitive Hashing)

A family $\mathcal{H} = \{h : S \to U\}$ is called $(r_1, r_2, p_1, p_2)$-sensitive for D if for any $q, p \in S$:

If $p \in B(q, r_1)$ then $\Pr_{\mathcal{H}}[h(q) = h(p)] \ge p_1$,
If $p \notin B(q, r_2)$ then $\Pr_{\mathcal{H}}[h(q) = h(p)] \le p_2$.

Define a family $\mathcal{G} = \{g : S \to U^k\}$ such that $g(p) = (h_1(p), \dots, h_k(p))$ where $h_i \in \mathcal{H}$.
For an integer l, choose l functions $g_1, \dots, g_l$ from $\mathcal{G}$ at random.
Store each point p in the buckets $g_i(p)$, for i = 1, …, l.

For a query point q, search the buckets $g_1(q), \dots, g_l(q)$ and stop after collecting the first 2l points; then for each collected point $p_j$ check whether $p_j \in B(q, r_2)$.

We choose k and l so that the following properties hold, and hence the algorithm works correctly:

If there exists $p^* \in B(q, r_1)$ then $g_j(p^*) = g_j(q)$ for some j = 1, …, l
The number of collisions of q with points of $P - B(q, r_2)$ is less than 2l:
$\sum_{j=1}^{l} |(P - B(q, r_2)) \cap g_j^{-1}(g_j(q))| < 2l$

Set $k = \log_{1/p_2} n$ and $l = n^{\rho}$, where $\rho = \ln(1/p_1) / \ln(1/p_2)$.
Let $S = H^d$ and D(p,q) be the Hamming metric. Then for any $r, \epsilon > 0$ the family $\mathcal{H} = \{h_i : h_i((b_1, \dots, b_d)) = b_i,\ i = 1, \dots, d\}$ is $(r, r(1+\epsilon), 1 - r/d, 1 - r(1+\epsilon)/d)$-sensitive.
This algorithm uses $O(dn + n^{1 + 1/(1+\epsilon)})$ space; a query needs $O(n^{1/(1+\epsilon)})$ hash evaluations, and each evaluation takes $O(d)$ time.
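The scheme for Hamming space can be sketched as follows: each $h_i$ reads one coordinate, each $g_j$ concatenates k randomly chosen coordinates, and a query inspects at most 2l colliding points. The class name, the seeded generator, and the rounding of k and l up to integers are assumptions made for this illustration; it also assumes $0 < r$ and $r(1+\epsilon) < d$ so that $p_1 < 1$ and $p_2 > 0$.

```python
import math
import random

def hamming(p, q):
    """Number of coordinates in which p and q differ."""
    return sum(a != b for a, b in zip(p, q))

class HammingLSH:
    def __init__(self, points, r, eps, seed=0):
        rng = random.Random(seed)
        n, d = len(points), len(points[0])
        p1, p2 = 1 - r / d, 1 - r * (1 + eps) / d
        self.r2 = r * (1 + eps)
        self.k = max(1, math.ceil(math.log(n) / math.log(1 / p2)))  # k = log_{1/p2} n
        rho = math.log(1 / p1) / math.log(1 / p2)
        self.l = max(1, math.ceil(n ** rho))                        # l = n^rho
        # Each g_j is a list of k sampled coordinate indices
        self.projections = [[rng.randrange(d) for _ in range(self.k)]
                            for _ in range(self.l)]
        self.tables = [dict() for _ in range(self.l)]
        for p in points:
            for coords, table in zip(self.projections, self.tables):
                table.setdefault(tuple(p[i] for i in coords), []).append(p)

    def query(self, q):
        """Return a point within distance r(1 + eps) of q, or None."""
        seen = 0
        for coords, table in zip(self.projections, self.tables):
            for p in table.get(tuple(q[i] for i in coords), []):
                if hamming(p, q) <= self.r2:
                    return p
                seen += 1
                if seen >= 2 * self.l:   # give up after 2l far collisions
                    return None
        return None
```

A returned point is always within $r(1+\epsilon)$ of q. A point within distance r of q is found only with constant probability, as in the analysis above, so in practice the structure would be repeated to boost the success probability.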
