Optimal Data-Dependent Hashing for Nearest Neighbor Search
Alex Andoni (Columbia University)
Joint work with: Ilya Razenshteyn

Nearest Neighbor Search (NNS)
- Preprocess: a set P of points
- Query: given a query point q, report a point p* in P with the smallest distance to q

Motivation
- Generic setup:
  - points model objects (e.g. images)
  - distance models a (dis)similarity measure
- Application areas: machine learning (the k-NN rule), speech/image/video/music recognition, vector quantization, bioinformatics, etc.
- Distances: Hamming, Euclidean, edit distance, earthmover distance, etc.
- Core primitive for: closest pair, clustering, ...
  (illustration: objects encoded as binary strings, nearest neighbor p* of a query q)

Curse of Dimensionality
- All exact algorithms degrade rapidly with the dimension d:

  Algorithm                  | Query time     | Space
  Full indexing              | d^O(1) · log n | n^O(d) (Voronoi diagram size)
  No indexing (linear scan)  | O(n · d)       | O(n · d)

Approximate NNS
- c-approximate r-near neighbor: given a query point q, report a point p' in P with ||p' - q|| ≤ cr
  - as long as there is some point within distance r of q
- Practice: used even for exact NNS
  - filtering: gives a (hopefully small) set of candidates

NNS algorithms
- Exponential dependence on dimension: [Arya-Mount'93], [Clarkson'94], [Arya-Mount-Netanyahu-Silverman-Wu'98], [Kleinberg'97], [Har-Peled'02], [Arya-Fonseca-Mount'11], ...
- Linear dependence on dimension: [Kushilevitz-Ostrovsky-Rabani'98], [Indyk-Motwani'98], [Indyk'98, '01], [Gionis-Indyk-Motwani'99], [Charikar'02], [Datar-Immorlica-Indyk-Mirrokni'04], [Chakrabarti-Regev'04], [Panigrahy'06], [Ailon-Chazelle'06], [A.-Indyk'06], [A.-Indyk-Nguyen-Razenshteyn'14], [A.-Razenshteyn'15], [Pagh'16], ...

Locality-Sensitive Hashing [Indyk-Motwani'98]
- Random hash function h on R^d satisfying:
  - for a close pair (||q - p|| ≤ r): p1 = Pr[h(q) = h(p)] is "not-so-small"
  - for a far pair (||q - p'|| > cr): p2 = Pr[h(q) = h(p')] is "small"
- Use several hash tables: n^ρ of them, where ρ = log(1/p1) / log(1/p2) (see the code sketch at the end of this part)

LSH Algorithms

  Metric    | Space    | Time | Exponent   | c = 2   | Reference
  Hamming   | n^(1+ρ)  | n^ρ  | ρ = 1/c    | ρ = 1/2 | [IM'98]
            |          |      | ρ ≥ 1/c    |         | [MNP'06, OWZ'11]
  Euclidean | n^(1+ρ)  | n^ρ  | ρ = 1/c    | ρ = 1/2 | [IM'98, DIIM'04]
            |          |      | ρ ≈ 1/c²   | ρ = 1/4 | [AI'06]
            |          |      | ρ ≥ 1/c²   |         | [MNP'06, OWZ'11]

- LSH is tight. Are we really done with basic NNS algorithms?

Detour: Closest Pair Problem
- Closest Pair Problem (bichromatic) in {0,1}^d:
  - given A, B, find a in A, b in B at distance ≤ cr
  - assuming there is a pair at distance ≤ r
- Via LSH, for c = 1 + ε:
  - preprocess A, then query each b in B
  - gives n^(2 - Θ(ε)) time
- [Alman-Williams'15]: exact algorithm in n^(2 - δ) time for dimension d = c · log n, with δ = 1/O(c · log² c)
- [Valiant'12, Karppa-Kaski-Kohonen'16]:
  - average case (points random in {0,1}^d, so far pairs are at distance cr = d/2; close pair planted at distance r = d/(2c)): n^(2ω/3 + o(1)) time for d ≫ log n
  - worst case: n^(2 - Θ(√ε)) time
- Much more efficient algorithms are known for d = O(log n) or for the average case.

Beyond Locality-Sensitive Hashing

  Metric    | Space    | Time | Exponent            | c = 2     | Reference
  Hamming   | n^(1+ρ)  | n^ρ  | ρ = 1/c       (LSH) | ρ = 1/2   | [IM'98]
            |          |      | ρ ≥ 1/c       (LSH) |           | [MNP'06, OWZ'11]
            |          |      | complicated         | ρ = 1/2-ε | [AINR'14]
            |          |      | ρ = 1/(2c-1)        | ρ = 1/3   | [AR'15]
  Euclidean | n^(1+ρ)  | n^ρ  | ρ ≈ 1/c²      (LSH) | ρ = 1/4   | [AI'06]
            |          |      | ρ ≥ 1/c²      (LSH) |           | [MNP'06, OWZ'11]
            |          |      | complicated         | ρ = 1/4-ε | [AINR'14]
            |          |      | ρ = 1/(2c²-1)       | ρ = 1/7   | [AR'15]

- New approach?
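A minimal illustration of the [IM'98]-style bit-sampling LSH for Hamming space described above (this is not code from the talk; the concrete values of k, L, and the toy dataset are assumptions made only for the example): each of L tables hashes every point by k randomly chosen coordinates, and a query is compared only against points that collide with it in at least one table.

```python
import random
from collections import defaultdict

# Bit-sampling LSH for Hamming distance: each hash function concatenates
# k randomly chosen coordinates; with roughly L ~ n^rho tables (rho = 1/c),
# a cr-near neighbor collides with the query in some table with good probability.

class HammingLSH:
    def __init__(self, dim, k, L, seed=0):
        rng = random.Random(seed)
        # L hash functions, each a list of k sampled coordinate indices
        self.funcs = [[rng.randrange(dim) for _ in range(k)] for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _key(self, func, point):
        return tuple(point[i] for i in func)

    def insert(self, idx, point):
        for func, table in zip(self.funcs, self.tables):
            table[self._key(func, point)].append(idx)

    def query(self, point):
        candidates = set()
        for func, table in zip(self.funcs, self.tables):
            candidates.update(table[self._key(func, point)])
        return candidates  # candidates are then filtered by exact distance


# Toy usage on a random binary dataset (parameters chosen arbitrarily).
dim, n = 64, 1000
rng = random.Random(1)
data = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n)]
lsh = HammingLSH(dim, k=16, L=20)
for i, p in enumerate(data):
    lsh.insert(i, p)
query = data[0][:]   # a near-duplicate of point 0
query[3] ^= 1
print(len(lsh.query(query)), "candidates to check exactly")
```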
Data-dependent hashing
- A random hash function, chosen after seeing the given dataset
- Efficiently computable

A look at LSH lower bounds
- LSH lower bounds in Hamming space are Fourier-analytic
- [Motwani-Naor-Panigrahy'06]:
  - H is a distribution over hash functions h: {0,1}^d → U
  - far pair: p, q random, at distance d/2; close pair: p, q random at distance d/(2c)
  - get ρ ≥ 0.5/c
- [O'Donnell-Wu-Zhou'11]: same hard distribution (points random in {0,1}^d, so far pairs are at distance cr = d/2; close pair at distance r = d/(2c)); get ρ ≥ 1/c

Why not an NNS lower bound?
- Suppose we try to apply the [OWZ'11] instance to NNS:
  - take the random dataset
  - all the "far points" are concentrated in a small ball of radius cr
  - easy to see at preprocessing time: the actual near neighbor is close to the center of the minimum enclosing ball

Construction of the hash function
- Two components:
  - nicer (average-case) structure: admits better LSH
  - reduction to such structure: data-dependent
- Like a (very weak) "regularity lemma" for a set of points

Average case: nicer structure [A.-Indyk-Nguyen-Razenshteyn'14]
- Think: a random dataset on a sphere of radius cr/√2
  - vectors essentially perpendicular to each other
  - so that two random points are at distance ≈ cr
- Lemma: ρ = 1/(2c² - 1) via Cap Carving (see the toy sketch at the end of this part)

Reduction to the nice structure
- Idea: iteratively decrease the radius of the minimum enclosing ball
- Algorithm:
  - find dense clusters (smaller radius, containing a large fraction of the points); recurse on each dense cluster
  - apply cap carving on the rest; recurse on each "cap" (e.g., dense clusters might reappear there)
- Why OK? once there are no dense clusters, the data behaves like a "random dataset" with radius = 100cr, or even better
  (figure: radius = 100cr, radius = 99cr; *picture not to scale & dimension)

Hash function
- Described by a tree (like a hash table), starting from radius = 100cr

Dense clusters
- Current dataset: radius R
- A dense cluster:
  - contains n^(1-δ) points
  - has smaller radius: (1 - Ω(ε²)) · R          [ε and δ are trade-off parameters]
- After we remove all dense clusters:
  - for any point on the surface, at most n^(1-δ) points lie within distance (√2 - ε) · R
  - the other points are essentially orthogonal!
- When applying Cap Carving with collision probabilities (P1, P2):
  - empirical number of far points colliding with the query: n·P2 + n^(1-δ)
  - as long as n·P2 ≫ n^(1-δ), the "impurity" doesn't matter!

Tree recap
- During a query:
  - recurse into all clusters
  - follow just one bucket in a CapCarving node
  - so the query will look into more than one leaf!
- How much branching?
  - claim: at most (n^δ + 1)^O(1/ε²)
  - each time we branch, there are at most n^δ clusters (+1 for the rest)
  - a cluster reduces the radius by a (1 - Ω(ε²)) factor, so the cluster-depth is at most 100/Ω(ε²) = O(1/ε²)
- Progress is made in two ways:
  - clusters reduce the radius
  - CapCarving nodes reduce the number of far points (the empirical P2)
- A tree succeeds with probability ≥ n^(-1/(2c²-1) - o(1))

Fast preprocessing
- How to find the dense clusters fast?
- Step 1: reduce to O(n²) time
  - it is enough to consider centers that are data points
- Step 2: reduce to near-linear time
  - down-sample!
  - OK because we want clusters of size n^(1-δ): after down-sampling, a cluster is still somewhat heavy
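The cap-carving partition used above can be illustrated with a toy sketch (an illustrative simplification, not the implementation from [AINR'14]/[AR'15]; the threshold eta and the greedy peeling loop are assumptions made for this example): repeatedly draw a random Gaussian direction g and assign all remaining unit vectors p with ⟨g, p⟩ ≥ eta to that cap. The query is hashed by replaying the same sequence of caps and taking the first one that contains it; the data-dependent scheme applies such a partition at the CapCarving nodes of the tree described above.

```python
import numpy as np

def cap_carve(points, eta=1.0, rng=None):
    """Partition unit vectors (shape n x d) into random spherical caps:
    each cap is defined by a Gaussian direction g and the test <g, p> >= eta."""
    rng = np.random.default_rng(rng)
    n, d = points.shape
    labels = -np.ones(n, dtype=int)       # labels[i] = index of the cap of point i
    caps = []                             # the sequence of cap directions
    remaining = np.arange(n)
    while remaining.size > 0:
        g = rng.standard_normal(d)
        caps.append(g)
        inside = points[remaining] @ g >= eta
        labels[remaining[inside]] = len(caps) - 1
        remaining = remaining[~inside]
    return labels, caps

# Toy usage: random points on the sphere are nearly orthogonal when d is large,
# which is exactly the "nice structure" the reduction aims for.
rng = np.random.default_rng(0)
pts = rng.standard_normal((2000, 128))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
labels, caps = cap_carve(pts, eta=1.0, rng=1)
print("number of caps:", len(caps))
```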
Even more details
- Analysis:
  - instead of working with the "probability of collision with a far point" P2, work with an "empirical estimate" (the actual number of colliding far points)
  - a little delicate: interplay with the "probability of collision with the close point" P1
    - the empirical P2 matters only for the bucket the query falls into
    - need to condition on collision with the close point ⇒ a 3-point collision analysis for LSH
  - in dense clusters, points may appear anywhere inside the balls
    - whereas CapCarving works for points on the sphere
    - need to partition the balls into thin shells (introduces more branching)

Data-dependent hashing: beyond LSH
- A reduction from worst-case to average-case
  - NNS with query time n^ρ, where ρ = 1/(2c² - 1) (compared numerically with the LSH exponent in the snippet below)
- Better bounds?
  - for dimension d = O(log n), one can get a better ρ! [Laarhoven'16]
  - for d > log^(1+δ) n: the above ρ is optimal even for data-dependent hashing! [A.-Razenshteyn'??]
    - in the right formalization (to rule out the Voronoi diagram): the description complexity of the hash function is n^(1-Ω(1))
- Better reduction?
  - like is the case for closest pair [Alman-Williams'15]
- NNS for ℓ∞:
  - [Indyk'98] gets approximation O(log log d) (polynomial space, sublinear query time)
  - matching lower bounds in the relevant model [ACP'08, KP'12]
  - can be thought of as data-dependent hashing
- NNS for any norm (e.g., matrix norms, EMD)?
- Lower bounds: an opportunity to discover new algorithmic techniques
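To put numbers on the gap between data-independent and data-dependent hashing in the Euclidean case, the snippet below simply evaluates the two exponents quoted in the slides, ρ = 1/c² [AI'06] and ρ = 1/(2c² - 1) [AR'15]; this is purely illustrative arithmetic, reproducing the c = 2 values 1/4 and 1/7 from the tables above.

```python
# Compare the query-time exponents for Euclidean NNS:
# data-independent LSH (rho = 1/c^2) vs. data-dependent hashing (rho = 1/(2c^2 - 1)).

def rho_lsh(c):
    return 1.0 / c**2

def rho_data_dependent(c):
    return 1.0 / (2 * c**2 - 1)

for c in (1.5, 2.0, 3.0):
    print(f"c = {c}: LSH rho = {rho_lsh(c):.3f}, "
          f"data-dependent rho = {rho_data_dependent(c):.3f}")
```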