Data-dependent Hashing for Nearest Neighbor Search
Alex Andoni (Columbia University)
Based on joint work with: Piotr Indyk, Huy Nguyen, Ilya Razenshteyn

Nearest Neighbor Search (NNS)
- Preprocess: a set P of n points
- Query: given a query point q, report a point p* ∈ P with the smallest distance to q

Motivation
- Generic setup:
  - points model objects (e.g. images)
  - distance models a (dis)similarity measure
- Application areas:
  - machine learning: the k-NN rule
  - speech/image/video/music recognition, vector quantization, bioinformatics, etc.
- Distances: Hamming, Euclidean, edit distance, earthmover distance, etc.
- Core primitives: closest pair, clustering, etc.
- (Figure: two binary strings illustrating Hamming distance between p* and q.)

Curse of Dimensionality
- All exact algorithms degrade rapidly with the dimension d:

  Algorithm                  | Query time        | Space
  Full indexing              | d^{O(1)} · log n  | n^{O(d)} (Voronoi diagram size)
  No indexing (linear scan)  | O(n · d)          | O(n · d)

Approximate NNS
- c-approximate r-near neighbor: given a query point q, report a point p′ ∈ P such that ‖p′ − q‖ ≤ cr, as long as there is some point within distance r of q
- Practice: used even for exact NNS
  - filtering: gives a set of candidates (hopefully small)

NNS algorithms
- Exponential dependence on dimension: [Arya-Mount'93], [Clarkson'94], [Arya-Mount-Netanyahu-Silverman-Wu'98], [Kleinberg'97], [Har-Peled'02], [Arya-Fonseca-Mount'11], …
- Linear/poly dependence on dimension: [Kushilevitz-Ostrovsky-Rabani'98], [Indyk-Motwani'98], [Indyk'98, '01], [Gionis-Indyk-Motwani'99], [Charikar'02], [Datar-Immorlica-Indyk-Mirrokni'04], [Chakrabarti-Regev'04], [Panigrahy'06], [Ailon-Chazelle'06], [A.-Indyk'06], [A.-Indyk-Nguyen-Razenshteyn'14], [A.-Razenshteyn'15], [Pagh'16], [Laarhoven'16], …

Locality-Sensitive Hashing [Indyk-Motwani'98]
- Random hash function h on R^d satisfying:
  - for a close pair, i.e. when ‖q − p‖ ≤ r: P1 = Pr[h(q) = h(p)] is "not-so-small"
  - for a far pair, i.e. when ‖q − p′‖ > cr: P2 = Pr[h(q) = h(p′)] is "small"
- Use several hash tables: n^ρ of them, where ρ = log(1/P1) / log(1/P2)
- (Figure: Pr[g(q) = g(p)] as a function of ‖q − p‖, dropping from P1 at distance r to P2 at distance cr.)
- (A minimal code sketch of this scheme appears after the tables below.)

LSH Algorithms (space n^{1+ρ}, query time n^ρ)

  Metric     | Exponent       | ρ for c = 2 | Reference
  Hamming    | ρ = 1/c        | ρ = 1/2     | [IM'98]
  Hamming    | ρ ≥ 1/c        |             | [MNP'06, OWZ'11]
  Euclidean  | ρ = 1/c        | ρ = 1/2     | [IM'98, DIIM'04]
  Euclidean  | ρ ≈ 1/c²       | ρ = 1/4     | [AI'06]
  Euclidean  | ρ ≥ 1/c²       |             | [MNP'06, OWZ'11]

LSH is tight… what's next?
- Lower bounds (cell probe): [A.-Indyk-Patrascu'06, Panigrahy-Talwar-Wieder'08,'10, Kapralov-Panigrahy'12]
- Datasets with additional structure: [Clarkson'99, Karger-Ruhl'02, Krauthgamer-Lee'04, Beygelzimer-Kakade-Langford'06, Indyk-Naor'07, Dasgupta-Sinha'13, Abdullah-A.-Krauthgamer-Kannan'14, …]
- Space-time trade-offs: [Panigrahy'06, A.-Indyk'06]
- But are we really done with basic NNS algorithms?

Beyond Locality Sensitive Hashing (space n^{1+ρ}, query time n^ρ)

  Metric            | Exponent          | ρ for c = 2   | Reference
  Hamming (LSH)     | ρ ≥ 1/c           | ρ = 1/2       | [IM'98], [MNP'06, OWZ'11]
  Hamming           | complicated       | ρ = 1/2 − ε   | [AINR'14]
  Hamming           | ρ → 1/(2c − 1)    | ρ = 1/3       | [AR'15]
  Euclidean (LSH)   | ρ ≈ 1/c², ρ ≥ 1/c²| ρ = 1/4       | [AI'06], [MNP'06, OWZ'11]
  Euclidean         | complicated       | ρ = 1/4 − ε   | [AINR'14]
  Euclidean         | ρ → 1/(2c² − 1)   | ρ = 1/7       | [AR'15]

- New approach?
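Before moving to data-dependent hashing, here is a minimal sketch of the classical data-independent scheme from the "Locality-Sensitive Hashing" slide above: bit-sampling LSH for Hamming distance in the spirit of [IM'98]. The parameter values (k, L), helper names, and toy data are illustrative assumptions, not part of the talk.

```python
import random
from collections import defaultdict

# Bit-sampling LSH sketch for Hamming distance (illustrative parameters).

def make_hash(dim, k, rng):
    """One LSH function g: project a binary vector onto k random coordinates."""
    coords = rng.sample(range(dim), k)
    return lambda p: tuple(p[i] for i in coords)

def build_index(points, dim, k=8, L=10, seed=0):
    """Build L hash tables; a near pair collides in some table with good probability."""
    rng = random.Random(seed)
    hashes = [make_hash(dim, k, rng) for _ in range(L)]
    tables = [defaultdict(list) for _ in range(L)]
    for idx, p in enumerate(points):
        for g, table in zip(hashes, tables):
            table[g(p)].append(idx)
    return hashes, tables

def query(q, points, hashes, tables):
    """Return candidate indices: points colliding with q in at least one table."""
    candidates = set()
    for g, table in zip(hashes, tables):
        candidates.update(table.get(g(q), []))
    # Filtering step: rank the (hopefully few) candidates by exact Hamming distance.
    return sorted(candidates, key=lambda i: sum(a != b for a, b in zip(points[i], q)))

if __name__ == "__main__":
    rng = random.Random(1)
    dim = 64
    points = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(200)]
    q = list(points[17])
    q[3] ^= 1  # a query at Hamming distance 1 from points[17]
    hashes, tables = build_index(points, dim)
    print(query(q, points, hashes, tables)[:3])
```

With k and L tuned so that L ≈ n^ρ and k ≈ log n / log(1/P2), near points collide in some table with constant probability while the expected number of far-point collisions per table stays small, which is exactly the space/time trade-off summarized in the tables above.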
Data-dependent hashing
- A random hash function, chosen after seeing the given dataset
- Efficiently computable

Construction of hash function
- Two components:
  - a nice geometric structure: admits a better LSH
  - a reduction to such structure: this part is data-dependent
- Like a (weak) "regularity lemma" for a set of points

Nice geometric structure: average-case
- Think: a random dataset on a sphere
  - vectors essentially perpendicular to each other
  - s.t. random points are at distance ≈ cr
- Lemma: ρ = 1/(2c² − 1), via Cap Carving
- (Figure: sphere with the scales cr and cr/√2 marked.)

Reduction to nice structure
- Idea: iteratively decrease the radius of the minimum enclosing ball
- Algorithm (a simplified code sketch appears at the end of this section):
  - find dense clusters
    - with smaller radius
    - containing a large fraction of the points
  - recurse on the dense clusters
  - apply cap carving on the rest
    - recurse on each "cap"
    - e.g., dense clusters might reappear there
- Why is this OK? Once the dense clusters are removed:
  - no dense clusters remain
  - the rest looks like a "random dataset" with radius = 100cr
  - even better: the radius drops, e.g. from 100cr to 99cr
- (Figure: balls of radius 100cr and 99cr; picture not to scale and dimension.)

Hash function
- Described by a tree (like a hash table)

Dense clusters
- Current dataset: radius R
- A dense cluster:
  - contains n^{1−δ} points
  - has smaller radius: (1 − Ω(ε²)) R
  - (ε and δ are trade-off parameters)
- After we remove all clusters:
  - for any point on the surface, there are at most n^{1−δ} points within distance √(2 − ε) R
  - the other points are essentially orthogonal
- When applying Cap Carving with parameters (P1, P2):
  - empirical number of far points colliding with the query: nP2 + n^{1−δ}
  - as long as nP2 ≫ n^{1−δ}, the "impurity" doesn't matter!

Tree recap
- During a query:
  - recurse into all clusters
  - but into just one bucket at each CapCarving node
  - so we will look in more than one leaf! How much branching?
- Claim: at most (n^δ + 1)^{O(1/ε²)}
  - each time we branch, there are at most n^δ clusters (+1)
  - a cluster reduces the radius by Ω(ε²), so the cluster-depth is at most 100/Ω(ε²) = O(1/ε²)
- Progress in two ways:
  - clusters reduce the radius
  - CapCarving nodes reduce the number of far points (the empirical P2)
- A tree succeeds with probability ≥ n^{−1/(2c²−1) − o(1)}

Beyond "Beyond LSH"
- Practice: often optimize the partition to your dataset
  - PCA-tree, spectral hashing, etc. [S91, McN01, VKD09, WTF08, …]
  - no guarantees (performance or correctness)
- Theory: assume special structure in the dataset
  - low intrinsic dimension [KR'02, KL'04, BKL'06, IN'07, DS'13, …]
  - structure + noise [Abdullah-A.-Krauthgamer-Kannan'14]
- Data-dependent hashing helps even when there is no a priori structure!

Data-dependent hashing wrap-up
- Dynamicity? Dynamization techniques [Overmars-van Leeuwen'81]
- Better bounds?
  - for dimension d = O(log n), one can get a better ρ! [Laarhoven'16]
  - for d > log^{1+δ} n: our ρ is optimal even for data-dependent hashing! [A.-Razenshteyn'??]
    - in the right formalization (to rule out the Voronoi diagram): the description complexity of the hash function is n^{1−Ω(1)}
- Practical variant [A.-Indyk-Laarhoven-Razenshteyn-Schmidt'15]
- NNS for ℓ∞
  - [Indyk'98] gets approximation O(log log d) (polynomial space, sublinear query time)
  - cf. ℓ∞ has no non-trivial sketch!
  - some matching lower bounds in the relevant model [ACP'08, KP'12]
  - can be thought of as data-dependent hashing
- NNS for any norm (e.g., matrix norms, EMD)?
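The decomposition described under "Reduction to nice structure" and "Tree recap" can be sketched, in heavily simplified form, as below. The dense-cluster test, the cap threshold, the radius-shrinkage factor, and every numeric parameter here are illustrative assumptions; the actual [AR'15] construction sets them as careful functions of c, ε, δ and the target collision probabilities (P1, P2).

```python
import numpy as np

# Simplified sketch of the data-dependent tree: peel off dense clusters
# (recursing on each with a smaller radius), then partition the remaining
# "pseudo-random" points with spherical caps from random directions.
# All thresholds below are illustrative, not the parameters from [AR'15].

def find_dense_cluster(points, radius, min_frac=0.1):
    """Return indices of a dense cluster: a constant fraction of the points
    inside a ball of noticeably smaller radius, or None if none exists."""
    n = len(points)
    for center in points:
        dists = np.linalg.norm(points - center, axis=1)
        members = np.nonzero(dists <= 0.8 * radius)[0]
        if len(members) >= min_frac * n:
            return members
    return None

def build_tree(points, radius, depth=0, max_depth=4):
    """Recursively decompose the dataset; returns a nested dict describing the tree."""
    if depth >= max_depth or len(points) <= 8:
        return {"leaf": points}
    node = {"clusters": [], "caps": {}}
    # 1) Repeatedly remove dense clusters and recurse on them with reduced radius.
    remaining = points
    while True:
        members = find_dense_cluster(remaining, radius)
        if members is None:
            break
        node["clusters"].append(
            build_tree(remaining[members], 0.8 * radius, depth + 1, max_depth))
        remaining = np.delete(remaining, members, axis=0)
        if len(remaining) == 0:
            return node
    # 2) Cap carving on the rest: each point goes to the first random direction
    #    whose inner product with it exceeds a threshold.
    rng = np.random.default_rng(depth)
    directions = rng.standard_normal((32, points.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    threshold = radius / np.sqrt(points.shape[1])  # illustrative cap size
    for p in remaining:
        cap = next((i for i, g in enumerate(directions) if g @ p > threshold),
                   len(directions))
        node["caps"].setdefault(cap, []).append(p)
    node["caps"] = {k: build_tree(np.array(v), radius, depth + 1, max_depth)
                    for k, v in node["caps"].items()}
    return node
```

In this sketch a query would descend into every dense-cluster child but into only one cap per cap-carving node, which is where the "at most n^δ clusters (+1)" branching factor in the "Tree recap" slide comes from.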