Beyond Locality-Sensitive Hashing
Alex Andoni (Columbia University)
Based on joint work with: Piotr Indyk, Thijs Laarhoven, Huy Nguyen, Sasho Nikolov, Ilya Razenshteyn, Ludwig Schmidt, Negev Shekel-Nosatzki, Erik Waingarten

Nearest Neighbor Search (NNS)
- Preprocess: a set P of points
- Query: given a query point q, report a point p* in P with the smallest distance to q

Motivation
- Generic setup:
  - points model objects (e.g., images as bit-strings: 000000, 011100, 010100, ...)
  - distance models a (dis)similarity measure
- Applications:
  - machine learning: the k-NN rule
  - in many other areas...
- Goal: sublinear query time, o(n), with polynomial space, n^{O(1)}
- Distances: Hamming, Euclidean, ℓ∞, edit distance, earthmover distance, etc.
- Core primitive for: closest pair, clustering, etc.

Near Neighbor Search
- (c, r)-near neighbor: given a query point q, report a point p' ∈ P s.t. ||p' − q|| ≤ cr, as long as there is some point within distance r of q

It's all about space decompositions
- Approximate space decompositions: [Arya-Mount'93], [Clarkson'94], [Arya-Mount-Netanyahu-Silverman-Wu'98], [Kleinberg'97], [Har-Peled'02], [Arya-Fonseca-Mount'11], ...
  - but these incur ~2^d factors
- Random decompositions == hashing: [Kushilevitz-Ostrovsky-Rabani'98], [Indyk-Motwani'98], [Indyk'01], [Gionis-Indyk-Motwani'99], [Charikar'02], [Datar-Immorlica-Indyk-Mirrokni'04], [Chakrabarti-Regev'04], [Panigrahy'06], [Ailon-Chazelle'06], [A.-Indyk'06], [Terasawa-Tanaka'07], [Kapralov'15], [Pagh'16], [Becker-Ducas-Gama-Laarhoven'16], [Christiani'16], [Ahle-Aumuller-Pagh'17], ...

Locality-Sensitive Hashing [IM'98]
- Hash: project on random bits!
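The "project on random bits" idea can be sketched in a few lines of Python: each of L hash tables keys a point by its values on k randomly chosen coordinates, so near points collide in some table with good probability. This is a toy sketch, not the talk's algorithm; the function names and the parameter choices k, L are illustrative assumptions.

```python
import random

def hamming(p, q):
    """Hamming distance between two equal-length bit tuples."""
    return sum(a != b for a, b in zip(p, q))

def build_lsh(points, k, L, seed=0):
    """Bit-sampling LSH for Hamming space: each of L tables hashes a
    point to its values on k random coordinates."""
    rng = random.Random(seed)
    d = len(points[0])
    tables = []
    for _ in range(L):
        coords = tuple(rng.randrange(d) for _ in range(k))
        buckets = {}
        for p in points:
            key = tuple(p[c] for c in coords)
            buckets.setdefault(key, []).append(p)
        tables.append((coords, buckets))
    return tables

def query_lsh(tables, q, r):
    """Scan q's bucket in every table; report a point within distance r."""
    for coords, buckets in tables:
        key = tuple(q[c] for c in coords)
        for p in buckets.get(key, []):
            if hamming(p, q) <= r:
                return p
    return None
```

A dataset point always collides with itself, so querying with a point of P and r = 0 returns that point; to get the n^ρ guarantees one tunes k ≈ log n and L ≈ n^ρ.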
LSH until ~'14

Hamming space: n^{1+ρ} space, n^ρ query time
- ρ = 1/c  [IM'98]; for c = 2: ρ = 1/2
- lower bound: ρ ≥ 1/c  [MNP'06, OWZ'11]

Euclidean space: n^{1+ρ} space, n^ρ query time
- ρ ≈ 1/c²  [AI'06]; for c = 2: ρ = 1/4
- lower bound: ρ ≥ 1/c²  [MNP'06, OWZ'11]

Beyond the basic bounds:
- lower bounds (cell probe): [A.-Indyk-Patrascu'06, Panigrahy-Talwar-Wieder'08,'10, Kapralov-Panigrahy'12]
- space-time trade-offs: [Panigrahy'06, A.-Indyk'06, Kapralov'15]
- datasets with additional structure: [Clarkson'99, Karger-Ruhl'02, Krauthgamer-Lee'04, Beygelzimer-Kakade-Langford'06, Indyk-Naor'07, Dasgupta-Sinha'13, Abdullah-A.-Krauthgamer-Kannan'14, ...]

Post-LSH: random hashing is not optimal!

Hamming space (n^{1+ρ} space, n^ρ query time):
- LSH: ρ = 1/c  [IM'98], optimal among data-independent schemes [MNP'06, OWZ'11]; c = 2: ρ = 1/2
- data-dependent: ρ = 1/2 − ε, complicated  [AINR'14]; ρ = 1/(2c − 1)  [AR'15]; c = 2: ρ = 1/3

Euclidean space (n^{1+ρ} space, n^ρ query time):
- LSH: ρ ≈ 1/c²  [AI'06], optimal among data-independent schemes [MNP'06, OWZ'11]; c = 2: ρ = 1/4
- data-dependent: ρ = 1/4 − ε, complicated  [AINR'14]; ρ = 1/(2c² − 1)  [AR'15]; c = 2: ρ = 1/7

Data-dependent hashing
- a random space partition, chosen after seeing the given dataset
- and still efficient

Optimal hashing* algorithm for the time-space trade-off [A-Laarhoven-Razenshteyn-Waingarten'17]
- Algorithm: achieves c-approximate NNS with
  - space: n^{1+ρ_u+o(1)}
  - query time: n^{ρ_q+o(1)}
  - as long as c²√ρ_q + (c² − 1)√ρ_u ≥ √(2c² − 1)
  - balanced point: ρ_u = ρ_q = 1/(2c² − 1)
- Example points on the trade-off curve for c = 2 (improving on the best LSH [A-Indyk'06]):
  - n^{1.77...} space, n^{o(1)} query time
  - n^{1.14...} space, n^{0.14...} query time
  - n^{1+o(1)} space, n^{0.43...} query time
- Proof plan:
  - basic algorithm (= LSH)
  - data-dependent hashing
  - trading off space & time
- Lower bounds:
  - optimal if hashing*
  - unconditional for small query time

Setup for algorithm
- All points (dataset, query) are on the surface of a unit sphere
  - wlog: look at the dataset from far away
- Dimension d = log^{1.1} n
  - wlog: [Johnson-Lindenstrauss'84]

Basic Algorithm (with oracle)
- Preprocessing (how to pick the parts A_i?):
  - pick T random parts A_i: sample a Gaussian vector z_i and set A_i = {x : ⟨x, z_i⟩ ≥ η}
    (T and η: parameters to be chosen later)
  - form subsets P_i = P ∩ A_i
  - store the non-empty P_i's
- Query:
  - retrieve all the sets P_i s.t. q ∈ A_i (i.e., ⟨q, z_i⟩ ≥ η)
  - inside the retrieved P_i's, search for a point within cr of q
- η is chosen s.t. Pr[q ∈ A_i | p ∈ A_i] = 1/n for a far pair (p, q)

Analysis
- Key quantity: Pr[q ∈ A_i | p ∈ A_i], for close vs. far pairs
- Balancing these collision probabilities yields:
  - n^{1+1/c²+o(1)} space
  - n^{1/c²+o(1)} query time
- Matches the best LSH [AI'06]

Related partitions
- Similar to:
  - Locality-Sensitive Filters [Becker, Ducas, Gama, Laarhoven 2016]
  - the cell-probe algorithm of [Panigrahy, Talwar, Wieder 2010]
- Related (or sometimes equivalent) partitioning procedures:
  - approximation algorithms: [Goemans, Williamson 1995], [Karger, Motwani, Sudan 1995], [Charikar, Chekuri, Goel, Guha, Plotkin 1998], [Chlamtac, Makarychev, Makarychev 2006], [Louis, Makarychev 2014]
  - spectral graph partitioning: [Lee, Oveis Gharan, Trevisan 2012], [Louis, Raghavendra, Tetali, Vempala 2012]
  - spherical cubes: [Kindler, O'Donnell, Rao, Wigderson 2008]
  - metric embeddings: [Fakcharoenphol, Rao, Talwar 2003], [Mendel, Naor 2005]
  - communication complexity: [Bogdanov, Mossel 2011], [Canonne, Guruswami, Meka, Sudan 2015]

How to implement the oracle?
- Task: retrieve all the caps such that ⟨q, z_i⟩ ≥ η
- Idea: "gradual" partitioning, like in block coding
  - a "cap A_i" becomes an intersection of K smaller caps
- Algorithm (illustrated with T = 3, K = 2):
  - sample T Gaussians z_1, z_2, ..., z_T
  - form P_i = {p ∈ P | ⟨p, z_i⟩ ≥ η'}
  - recurse on the non-empty P_i's
  - for K levels

Data-dependent hashing [A-Razenshteyn'15]
- Two components:
  - a pseudo-random configuration, on which even data-independent hashing has a better exponent
  - a reduction to such a configuration
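The cap-based basic algorithm described above translates almost line-for-line into code: sample Gaussian directions z_i, form the caps A_i = {x : ⟨x, z_i⟩ ≥ η}, store P_i = P ∩ A_i, and at query time scan only the caps that capture q. A minimal sketch, with T and η left as untuned illustrative parameters (the real algorithm picks them to balance the close/far collision probabilities):

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def build_caps(points, T, eta, seed=0):
    """Sample T Gaussian directions z_i and form the caps
    A_i = {x : <x, z_i> >= eta}; store P_i = P intersect A_i."""
    rng = random.Random(seed)
    d = len(points[0])
    zs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(T)]
    parts = [[p for p in points if dot(p, z) >= eta] for z in zs]
    return zs, parts

def query_caps(zs, parts, q, threshold, eta):
    """Scan every cap that captures q; report a point within `threshold` of q."""
    for z, part in zip(zs, parts):
        if dot(q, z) >= eta:
            for p in part:
                if math.dist(p, q) <= threshold:
                    return p
    return None
```

By construction every stored point satisfies ⟨p, z_i⟩ ≥ η for its cap; the analysis then sets η so that a far pair survives the same cap with probability about 1/n.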
Pseudo-random configuration
- Points on a unit sphere, where:
  - a far pair is (near-)orthogonal, i.e., at distance ≈ √2
    (this would be the distance if the dataset were random on the sphere)
  - a close pair is at distance √2/c
- Lemma (in the LSH regime): cap hashing gives ρ = 1/(2c² − 1)

Reduction to the nice configuration
- Idea: iteratively make the point set more isotropic by narrowing the ball around it
- Algorithm:
  - find dense clusters
    - with smaller radius
    - containing a large fraction of the points
  - recurse on the dense clusters
  - apply cap hashing on the rest
    - recurse on each "cap" (e.g., dense clusters might reappear)
- Why is this OK? With no dense clusters, the data looks like a "random dataset" of radius 100cr, and it only gets better as the radius shrinks (100cr → 99cr → ...)
  (*picture not to scale & dimension)

Practice?
- Two components:
  - nice geometric structure (the spherical case) → practical variant [A-Indyk-Laarhoven-Razenshteyn-Schmidt'15] (Ludwig's talk)
  - reduction to such structure → harder: has a regularity-lemma flavor

Partial progress: LSH Forest++ [A.-Razenshteyn-Shekel-Nosatzki'17]
- Hamming space; recall [Indyk-Motwani'98]: ρ = 1/c with n^{1+ρ} space, n^ρ query time
- LSH Forest [Bawa-Condie-Ganesan'05]:
  - T random trees, of height d
  - each node splits by a random coordinate (e.g., for x = 00110110: coordinates 3, 2, 5, 7)
- LSH Forest++:
  - in a tree, split until O(1) bucket size
  - ++: store the t points closest to the dataset mean; the query is also compared against these t points
- Theorem: obtains ρ = 0.72/c for t, T large enough
  (= the performance of [IM'98] on random data)
- Open: a practical algorithm with optimal bounds?

Time-space trade-offs
- So far, all algorithms achieve n^{1+ρ} space and n^ρ query time, for some constant ρ
- Given a space budget, what is the best query time?
  - studied in [KOR'98, Ind'02, Pan'06, Kap'15]
  - e.g., no linear-space algorithm was known even for c = 1.5

Trade-off: e.g., smaller query time?
- Recall the basic algorithm: η is s.t. Pr[far q ∈ A_i | p ∈ A_i] = 1/n, which fixes the space
- Query: can we increase the capture probability?
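The LSH Forest component described above is easy to sketch: each tree splits the current bucket on a random coordinate until buckets have O(1) size, and a query simply descends along its own bits. A toy sketch of one tree over bit tuples (the depth cap and bucket size are illustrative assumptions, and the ++ step of comparing against the t points nearest the dataset mean is omitted):

```python
import random

def build_tree(points, max_bucket, rng, depth=0, max_depth=64):
    """One LSH Forest tree: split on a random coordinate until the
    bucket has at most max_bucket points (or the depth cap is hit)."""
    if len(points) <= max_bucket or depth >= max_depth:
        return ("leaf", points)
    d = len(points[0])
    c = rng.randrange(d)
    zeros = [p for p in points if p[c] == 0]
    ones = [p for p in points if p[c] == 1]
    return ("node", c,
            build_tree(zeros, max_bucket, rng, depth + 1, max_depth),
            build_tree(ones, max_bucket, rng, depth + 1, max_depth))

def descend(tree, q):
    """Follow q's own coordinate values down to its bucket."""
    while tree[0] == "node":
        _, c, zeros, ones = tree
        tree = zeros if q[c] == 0 else ones
    return tree[1]
```

Since every split routes a point by its own coordinate value, each dataset point lands exactly in the bucket that its own descent reaches; the forest then answers a query from the buckets of T independent trees.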
- Asymmetric thresholds:
  - consider a dataset point p captured by A_i iff ⟨p, z_i⟩ ≥ η_u, where η_u < η: this affects Pr[close q ∈ A_i | p ∈ A_i], and hence the query time
  - need to match q to A_i only when ⟨q, z_i⟩ ≥ η_q, with η_q > η

Trade-off algorithm (with oracle)
- Parameters: T, η_u and η_q
- Preprocessing:
  - sample a Gaussian vector z_i; set A_i = {x : ⟨x, z_i⟩ ≥ η_u} and B_i = {x : ⟨x, z_i⟩ ≥ η_q}
  - pick T random pairs (A_i, B_i)
  - form subsets P_i = P ∩ A_i
  - store the non-empty P_i's
- Query:
  - retrieve all the sets P_i such that q ∈ B_i
  - inside the retrieved P_i's, search for a point within cr of q

Overall Algorithm [ALRW'17]
- Can achieve c-approximate NNS with:
  - space: n^{1+ρ_u+o(1)}
  - query time: n^{ρ_q+o(1)}
- for any ρ_u, ρ_q satisfying c²√ρ_q + (c² − 1)√ρ_u ≥ √(2c² − 1)
- the data-independent (LSH) regime is the balanced point ρ_u = ρ_q = 1/(2c² − 1); the data-dependent curve improves on it throughout the trade-off

Lower Bounds
- The trade-off is tight! [ALRW'17, Christiani'17]
  - since caps are near-optimal [O'Donnell'14]
- Cell-probe lower bounds:
  - 1 cell probe: tight space! (using [Panigrahy-Talwar-Wieder'10])
  - 2 cell probes: also! (using lower bounds for Locally Decodable Codes [Kerenidis, de Wolf'03])
    - the first such lower bound for static data structures
- Lower bounds for data-dependent hashing?
  - need to restrict the computational power of the hashes?
  - [A-Razenshteyn'16]: restrict the description of the hash function
    - in the LSH regime only (ρ_u = ρ_q); likely extends to the entire trade-off

Looking forward
- Data-dependent hashing: reductions in other settings?
- Go "beyond worst case"?
  - e.g., closest pair can be solved much faster in the random case [Valiant'12, Karppa-Kaski-Kohonen'16]
  - instance-optimal partitions?
- Space partitions for harder metrics?
  - the classic hard distance: ℓ∞
    - no non-trivial sketch
    - but [Indyk'98] gets efficient NNS with approximation O(log log d)
    - tight approximation in the relevant models [ACP'08, PTW'10, KP'12]
    - can be thought of as data-dependent hashing!
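The [ALRW'17] trade-off stated above is a closed-form curve, so its endpoints can be checked numerically. A small helper (the function name is mine) computing the smallest admissible query exponent ρ_q for a given space exponent ρ_u:

```python
from math import sqrt

def min_query_exponent(c, rho_u):
    """Smallest rho_q allowed by the [ALRW'17] trade-off
    c^2 * sqrt(rho_q) + (c^2 - 1) * sqrt(rho_u) >= sqrt(2*c^2 - 1)."""
    slack = sqrt(2 * c * c - 1) - (c * c - 1) * sqrt(rho_u)
    return (max(slack, 0.0) / (c * c)) ** 2
```

For c = 2 this reproduces the three points on the trade-off picture: near-linear space (ρ_u = 0) forces ρ_q = 7/16 ≈ 0.43; the balanced LSH-regime point is ρ_u = ρ_q = 1/7 ≈ 0.14; and ρ_q = 0 becomes reachable at ρ_u = 7/9 ≈ 0.77, i.e., n^{1.77...} space.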
- Other distances: EMD, edit distance, matrix norms, etc.?

NNS for any symmetric norm [A-Nguyen-Nikolov-Razenshteyn-Waingarten'17]
- Thm: for any symmetric norm (invariant under permuting the coordinates) with d < n^{o(1)}, can get:
  - (log log n)^{O(1)} approximation
  - n^{1+o(1)} space
  - n^{o(1)} query time
  (Ilya's talk)
- Open: is there an efficient ANN algorithm for general high-dimensional norms with approximation d^{o(1)}?
- General norms? Conjectured lower bound if: there is a family of spectral expanders that embeds with distortion O(1) into some log^{O(1)} N-dimensional norm, where N is the number of nodes [Naor'17]
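As a concrete instance of the symmetric norms above: the top-k norm (the sum of the k largest coordinates in absolute value) is invariant under permuting coordinates and flipping signs, interpolating between ℓ∞ (k = 1) and ℓ₁ (k = d). A small sketch checking that invariance by brute force (helper names are mine):

```python
from itertools import permutations

def top_k_norm(x, k):
    """Top-k norm: sum of the k largest |x_i| -- a canonical symmetric norm
    (k = 1 gives l_infinity, k = len(x) gives l_1)."""
    return sum(sorted((abs(v) for v in x), reverse=True)[:k])

def is_symmetric(norm, x, tol=1e-12):
    """Check invariance under coordinate permutations and a global sign flip
    at the single test vector x (a brute-force spot check, not a proof)."""
    base = norm(list(x))
    perms_ok = all(abs(norm(list(p)) - base) <= tol for p in permutations(x))
    signs_ok = abs(norm([-v for v in x]) - base) <= tol
    return perms_ok and signs_ok
```

The theorem above applies to any norm passing this kind of invariance check, which is what makes the (log log n)^{O(1)}-approximation result so broad.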