Approximate Correspondences in High Dimensions
Kristen Grauman*, Trevor Darrell
MIT CSAIL
(*) UT Austin…

MIT CSAIL Vision interfaces

Key challenges: robustness
• Illumination
• Occlusions
• Object pose
• Intra-class appearance
• Clutter
• Viewpoint

Key challenges: efficiency
• Thousands to millions of pixels in an image
• 3,000-30,000 human recognizable object categories
• Billions of images indexed by Google Image Search
• 18 billion+ prints produced from digital camera images in 2004
• 295.5 million camera phones sold in 2005

Local representations
Describe component regions or patches separately.
• Maximally Stable Extremal Regions [Matas et al.]
• SIFT [Lowe]
• Shape context [Belongie et al.]
• Salient regions [Kadir et al.]
• Harris-Affine [Schmid et al.]
• Superpixels [Ren et al.]
• Spin images [Johnson and Hebert]
• Geometric Blur [Berg et al.]

How to handle sets of features?
• Each instance is an unordered set of vectors
• Varying number of vectors per instance

Partial matching
Compare sets by computing a partial matching between their features.

Pyramid match overview
[figure: optimal partial matching]

Computing the partial matching
• Optimal matching
• Greedy matching
• Pyramid match
for sets with features of dimension

Pyramid match overview
Pyramid match measures similarity of a partial matching between two sets:
• Place multi-dimensional, multi-resolution grid over point sets
• Consider points matched at finest resolution where they fall into same grid cell
• Approximate optimal similarity with worst case similarity within pyramid cell
No explicit search for matches!
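The overview above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The slides do not state the bin sizes or the level weights, so the sketch assumes bins of side 2^i at level i and a weight of 1/2^i for pairs first matched at level i, following the cited ICCV 2005 paper. The function names and the `diameter` parameter are hypothetical.

```python
from collections import Counter
from math import ceil, log2


def histogram(points, side):
    """Sparse histogram: number of points in each grid cell of the given side."""
    return Counter(tuple(int(c // side) for c in p) for p in points)


def intersection(h1, h2):
    """Histogram intersection: sum of bin-wise minima (matched point count)."""
    return sum(min(n, h2[cell]) for cell, n in h1.items() if cell in h2)


def pyramid_match(x, y, diameter):
    """Unnormalized pyramid match score between two point sets.

    Assumptions (not stated on the slides): coordinates lie in
    [0, diameter), bins have side 2**i at level i, and pairs first
    matched at level i are weighted by 1 / 2**i.
    """
    levels = ceil(log2(diameter)) + 1
    score, previous = 0.0, 0
    for i in range(levels):
        side = 2 ** i
        matched = intersection(histogram(x, side), histogram(y, side))
        score += (matched - previous) / side  # count newly matched pairs only
        previous = matched
    return score


x = [(0, 0), (5, 5)]
y = [(0, 0), (6, 6), (15, 15)]
score = pyramid_match(x, y, diameter=16)
```

In this toy input, one pair coincides at the finest level and a second pair first shares a cell at side length 4; the unmatched point in `y` contributes nothing, which is the partial-matching behavior the slide describes. No pairwise search over features takes place: only histogram counts are compared.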
Pyramid match
[equation figure; annotated terms:]
• Approximate partial match similarity
• Number of newly matched pairs at level i
• Measure of difficulty of a match at level i
[Grauman and Darrell, ICCV 2005]

Pyramid extraction
Histogram pyramid: level i has bins of size

Counting matches
Histogram intersection

Example pyramid match
[figure: pyramid match vs. optimal match]

Approximating the optimal partial matching
Randomly generated uniformly distributed point sets with m = 5 to 100, d = 2

PM preserves rank…

and is robust to clutter…

Learning with the pyramid match
• Kernel-based methods
  – Embed data into a Euclidean space via a similarity function (kernel), then seek linear relationships among embedded data
  – Efficient and good generalization
  – Include classification, regression, clustering, dimensionality reduction,…
• Pyramid match forms a Mercer kernel

Category recognition results
ETH-80 data set
[table: Kernel, Complexity; rows: Match [Wallraven et al.], Pyramid match]
[plots: Time (s) and Accuracy vs. mean number of features]

Category recognition results
[plot labels: 0.002 s / match, 5 s / match; axis: time of publication (2004, 6/05, 12/05, 3/06, 6/06)]
Pyramid match kernel over spatial features with quantized appearance

Vocabulary-guided pyramid match
But rectangular histogram may scale poorly with input dimension…
Build data-dependent histogram structure…
New Vocabulary-guided PM [NIPS 06]:
• Hierarchical k-means over training set
• Irregular cells; record diameter of each bin
• VG pyramid structure stored O(kL); stored once
• Individual histograms still stored sparsely

Vocabulary-guided pyramid match
[figure: uniform bins vs. vocabulary-guided bins]
• Tune pyramid partitions to the feature distribution
• Accurate for d > 100
• Requires initial corpus of features to determine pyramid structure
• Small cost increase over uniform bins: kL distances against bin centers to insert points

Vocabulary-guided pyramid match
W * # new matches @ level i
w_ij * (# matches in cell j, level i - # matches in children)
• n_ij(X): hist. X, level i, cell j
• c_h(n): child h of node n
• w_ij: weight for hist. X, level i, cell j
  (1) ~= diameter of cell: Mercer kernel
  (2) ~= d_ij(X) + d_ij(Y): upper bound
      (d_ij(H) = max dist of H's pts in cell i,j to center)
[figure label: c2(n11)]

Results: Evaluation criteria
• Quality of match scores
  How similar are the rankings produced by the approximate measure to those produced by the optimal measure?
• Quality of correspondences
  How similar is the approximate correspondence field to the optimal one?
• Object recognition accuracy
  Used as a match kernel over feature sets, what is the recognition output?
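The first criterion, agreement between rankings, can be made concrete with a rank correlation. The slides do not name the statistic used, so the sketch below assumes Spearman rank correlation as one common choice. The helper names are hypothetical and the scores are made-up numbers for illustration only, not results from the talk.

```python
def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for k in range(start, end + 1):
            result[order[k]] = average
        start = end + 1
    return result


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(ra, rb))
    var_a = sum((p - mean_a) ** 2 for p in ra)
    var_b = sum((q - mean_b) ** 2 for q in rb)
    return cov / (var_a * var_b) ** 0.5


# Made-up match costs for five image pairs, purely illustrative.
optimal_costs = [0.9, 0.4, 0.7, 0.1, 0.5]
approx_costs = [1.2, 0.5, 0.6, 0.2, 0.8]
rho = spearman(optimal_costs, approx_costs)
```

A value of `rho` near 1 means the approximate measure orders the pairs almost exactly as the optimal measure does, even if the raw scores differ in scale.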
Match score quality
ETH-80 images, sets of SIFT features
[plots: vocabulary-guided pyramid match (d=8, d=128); uniform bin pyramid match (d=8, d=128)]
Dense SIFT (d=128); k=10, L=5 for VG PM; PCA for low-dim feats

Bin structure and match counts
Data-dependent bins allow more gradual distance ranges
[plot panels: d=3, d=8, d=13, d=68, d=113, d=128]

Approximate correspondences
Use pyramid intersections to compute smaller explicit matchings.

Correspondence examples

Approximate correspondences
ETH-80 images, sets of SIFT descriptors

Impact on recognition accuracy
• VG-PMK as kernel for SVM
• Caltech-4 data set
• SIFT descriptors extracted at Harris and MSER interest points

Sets of features elsewhere
• diseases as sets of gene expressions
• methods as sets of instructions
• documents as bags of words