Proximity Searching in High Dimensional Spaces with a Proximity Preserving Order

Edgar Chávez, Karina Figueroa, Gonzalo Navarro
UNIVERSIDAD MICHOACANA, MEXICO
UNIVERSIDAD DE CHILE, CHILE

Content
1. About the problem
2. Basic concepts
3. Previous work
4. Our technique
5. Experiments
6. Conclusion and future work

Proximity Searching
• Huge database
• Expensive distance
• Exact searching is not possible

Applications
• Information retrieval
• Classification
• People finder through the web
• Clustering
• Currently used on
  – Classification of spider webs
  – Face recognition on the Chilean Web

Problems (metric spaces)
• Huge databases
• Extraction of characteristics
• High dimension
• Complex objects
• Memory limited
• Index

Terminology
• Queries
  – Range query
  – K nearest neighbor
• Properties
  – Symmetry
  – Strict positiveness
  – Triangle inequality

Previous work
• Pivot based
• Partition based
[Figure: pivot based search; labels: pivot, distance, q]

Previous work
• Pivot based
• Partition based
[Figure: partition based search; labels: q, center]

Our technique
[Figure: an element u among the permutants; labels: Permutation p1, p2, p4, p6, p5, p3; u; Permutant]

Our technique
• Exactly matching elements have the same permutation
• Similar elements should have similar permutations (our conjecture)
• Spearman footrule metric
  – Measures the similarity of the permutations
  – Most promising elements first

Spearman Footrule metric
Example: 3-1, 6-2, 3-2, 4-1, 5-5, 6-4 (difference of positions)

Searching process (part 1)
Preprocessing time
[Figure: permutants p1, p2, p3; each database element is labeled with its permutation: p3,p1,p2; p2,p1,p3; p2,p3,p1; p3,p2,p1]

Searching process (part 2)
Query time
• Sorting elements by Spearman Footrule metric
[Figure: query q among permutants p1, p2, p3; database elements labeled with the permutations p3,p1,p2; p2,p1,p3; p2,p3,p1; p3,p2,p1 and listed in sorted order]

Experiments
[Plot; y-axis: % retrieved]
• 93% retrieved, comparing 10% of database
• Pivot based algorithm, retrieved 48%
• 90% retrieved, comparing 60% of database

Experiments
[Plot; y-axis: % retrieved]
• 100% retrieved, comparing 15% of database
• 100% retrieved, comparing 90% of database

How good is our prediction?
[Plot: dimension 256, using 256 pivots; y-axis: retrieved; x-axis: percentage of the database compared]
• Metric algorithms are using one of them

Similarities between permutations
• Almost the same value

Conclusion
• A new probabilistic algorithm for proximity searching in metric spaces.
• Our technique is based on permutations.
• Close elements will have similar permutations.
• This technique is the fastest known algorithm for high dimensions.
• Permutations are good predictors.

Future Work
• Can non-metric spaces be tackled with this technique?
• An approximate all-k-nearest-neighbors algorithm.
• Improving other metric indexes.

Thank you
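
Backup: a sketch of the searching process

The two parts of the searching process (storing a permutation per element at preprocessing time, then sorting elements by the Spearman footrule at query time) could be sketched roughly as follows. This is only an illustration under assumptions, not the authors' implementation: the function names, the `fraction` parameter (the share of the database compared directly against the query), and the use of Euclidean distance in the usage lines are all hypothetical choices made for the example.

```python
import math
import random

def permutation(obj, permutants, dist):
    """Permutant indices ordered by increasing distance to obj (ties by index)."""
    return sorted(range(len(permutants)),
                  key=lambda i: (dist(obj, permutants[i]), i))

def inverse(perm):
    """pos[i] = position of permutant i inside the permutation."""
    pos = [0] * len(perm)
    for rank, i in enumerate(perm):
        pos[i] = rank
    return pos

def footrule(pos_a, pos_b):
    """Spearman footrule: sum of absolute differences of positions."""
    return sum(abs(a - b) for a, b in zip(pos_a, pos_b))

def build_index(db, permutants, dist):
    """Preprocessing time: store one (inverted) permutation per element."""
    return [inverse(permutation(u, permutants, dist)) for u in db]

def knn_search(q, db, index, permutants, dist, k, fraction):
    """Query time: visit the most promising elements first.

    Elements are sorted by footrule distance between their permutation
    and the query's; only the first `fraction` of the database (an
    assumed budget parameter) is compared against q with the real,
    expensive distance.
    """
    q_pos = inverse(permutation(q, permutants, dist))
    order = sorted(range(len(db)), key=lambda j: footrule(q_pos, index[j]))
    budget = max(k, int(fraction * len(db)))
    candidates = sorted(order[:budget], key=lambda j: dist(q, db[j]))
    return candidates[:k]

# Hypothetical usage: random points in the unit cube, Euclidean distance.
rng = random.Random(1)
db = [tuple(rng.random() for _ in range(16)) for _ in range(500)]
permutants = db[:32]  # assumption: permutants are taken from the database
index = build_index(db, permutants, math.dist)
query = tuple(rng.random() for _ in range(16))
approx = knn_search(query, db, index, permutants, math.dist, k=3, fraction=0.1)
```

With `fraction=1.0` every element is compared and the answer is exact; smaller values trade accuracy for fewer distance computations, which is the probabilistic behavior the experiments measure.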