Minimal Loss Hashing for Compact Binary Codes
Mohammad Norouzi
David Fleet
University of Toronto
Near Neighbor Search
Similarity-Preserving Binary Hashing
Why binary codes?
 Sub-linear search using hash indexing
(even exhaustive linear search is fast)
 Binary codes are storage-efficient
Similarity-Preserving Binary Hashing
Hash function: b(x; W) = thr(Wx)
 x: input vector, W: parameter matrix, thr(·): element-wise quantization, b: binary code
 the kth bit is b_k = thr(w_kᵀx), where w_k is the kth row of W
Random projections (a random W) are used by locality-sensitive hashing (LSH) and related techniques [Indyk & Motwani '98; Charikar '02; Raginsky & Lazebnik '09]; see the sketch below.
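To make the mapping concrete, here is a minimal numpy sketch of a thresholded linear hash function of this form, together with an LSH-style random-projection initialization. The function names and the Gaussian initialization are illustrative assumptions, not code from the talk.

import numpy as np

def hash_codes(X, W):
    """b(x; W) = thr(Wx): one q-bit binary code per row of X.

    X : (n, p) array of input vectors
    W : (q, p) parameter matrix; the kth row of W gives the kth bit
    """
    return (X @ W.T > 0).astype(np.uint8)   # element-wise quantization

def lsh_random_projections(q, p, seed=0):
    """LSH-style initialization: rows of W drawn from a standard Gaussian."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((q, p))

# usage: 32-bit codes for 1000 random 512-D descriptors
W = lsh_random_projections(q=32, p=512)
B = hash_codes(np.random.randn(1000, 512), W)   # B has shape (1000, 32)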
Learning Binary Hash Functions
Reasons to learn hash functions:
 to find more compact binary codes
 to preserve general similarity measures
Previous work
 boosting [Shakhnarovich et al '03]
 neural nets [Salakhutdinov & Hinton '07; Torralba et al '07]
 spectral methods [Weiss et al '08]
 loss-based methods [Kulis & Darrell '09]
…
Formulation
Input data: vectors x ∈ R^p
Similarity labels: s ∈ {0, 1} for pairs of items (1 = similar, 0 = dissimilar)
Binary codes: h ∈ {0, 1}^q
Hash function: b(x; W) = thr(Wx)
Loss Function
Hash code quality is measured by a loss function L(h, g, s):
 h: binary code for item 1
 g: binary code for item 2
 s: similarity label
 the cost measures the consistency of the two codes with the label
Similar items should map to nearby hash codes
Dissimilar items should map to very different codes
Hinge Loss
Similar items (s = 1) should map to codes within a Hamming radius of ρ bits
Dissimilar items (s = 0) should map to codes no closer than ρ bits (a sketch of such a loss follows)
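A minimal sketch of a hinge-like loss with this behaviour, applied to the Hamming distance m between two codes; the exact piecewise-linear form and the weight lam on dissimilar pairs are assumptions for illustration.

def hinge_loss(m, s, rho, lam=0.5):
    """Hinge-like loss on the Hamming distance m between two binary codes.

    m   : Hamming distance between the codes of the two items
    s   : similarity label (1 = similar, 0 = dissimilar)
    rho : Hamming-distance threshold, in bits
    lam : assumed relative weight on dissimilar pairs
    """
    if s == 1:
        return max(m - rho + 1, 0)       # similar pair farther apart than rho bits
    return lam * max(rho - m + 1, 0)     # dissimilar pair closer than rho bits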
Empirical Loss
Given training pairs (x_i, x_j) with similarity labels s_ij, sum the loss over all pairs: Σ_(i,j) L( b(x_i; W), b(x_j; W), s_ij )
Good:
 incorporates quantization and Hamming distance
Not so good:
 discontinuous, non-convex objective function
We minimize an upper bound on the empirical loss, inspired by structural SVM formulations
[Taskar et al '03; Tsochantaridis et al '04; Yu & Joachims '09]
Bound on loss
Since b(x; W) maximizes hᵀWx over binary codes h, adding the two maxima and subtracting the scores of the actual codes leaves the loss unchanged (LHS = RHS):
 L( b(x;W), b(x';W), s ) = L( b(x;W), b(x';W), s ) + b(x;W)ᵀWx + b(x';W)ᵀWx' − max_h hᵀWx − max_h' h'ᵀWx'
Relaxing the first three terms to a joint maximization over all code pairs gives an upper bound:
 L( b(x;W), b(x';W), s ) ≤ max_{g,g'} [ L(g, g', s) + gᵀWx + g'ᵀWx' ] − max_h hᵀWx − max_h' h'ᵀWx'
Remarks:
 piecewise linear in W
 convex-concave in W
 relates to structural SVM with latent variables
[Yu & Joachims ‘09]
Bound on Empirical Loss
Summing the per-pair bound over the training pairs upper-bounds the empirical loss.
Loss-adjusted inference, i.e. solving max_{g,g'} [ L(g, g', s) + gᵀWx + g'ᵀWx' ], is
 Exact
 Efficient: the loss depends on the two codes only through their Hamming distance (see the sketch below)
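One way to carry out this maximization exactly, sketched in numpy under the assumptions above: codes in {0,1}^q, a loss that depends on the two codes only through their Hamming distance, and a loss_fn argument such as the hypothetical hinge_loss from the earlier sketch.

import numpy as np

def loss_adjusted_inference(W, x1, x2, s, loss_fn):
    """Exactly solve  max over (g1, g2) of  L(g1, g2, s) + g1'Wx1 + g2'Wx2.

    Because the loss depends on (g1, g2) only through their Hamming
    distance m, each bit is scored once and the search is over m = 0..q.
    """
    u, v = W @ x1, W @ x2                # per-bit linear scores
    agree = np.maximum(u + v, 0.0)       # best score if bit k agrees (both 1 or both 0)
    differ = np.maximum(u, v)            # best score if bit k differs (exactly one is 1)
    gain = differ - agree                # gain from letting bit k disagree
    order = np.argsort(-gain)            # most profitable bits to flip first
    prefix = np.concatenate(([0.0], np.cumsum(gain[order])))
    q = len(u)

    # choose the number of disagreeing bits m that maximizes the objective
    scores = [loss_fn(m, s) + agree.sum() + prefix[m] for m in range(q + 1)]
    best_m = int(np.argmax(scores))

    # reconstruct the maximizing code pair
    g1, g2 = np.zeros(q, dtype=int), np.zeros(q, dtype=int)
    disagree = set(order[:best_m].tolist())
    for k in range(q):
        if k in disagree:
            if u[k] >= v[k]:
                g1[k] = 1                # give the differing bit to item 1
            else:
                g2[k] = 1                # give the differing bit to item 2
        elif u[k] + v[k] > 0:
            g1[k] = g2[k] = 1            # agreeing bit, both on
    return g1, g2

The sort makes each call O(q log q), so the inference step stays cheap even for long codes.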
Perceptron-like Learning
 Initialize W with LSH (random projections)
 Iterate over training pairs (x, x') with label s:
• Compute h = b(x; W) and h' = b(x'; W), the codes given by the current W
• Solve loss-adjusted inference for the loss-adjusted codes (g, g')
• Update W with a perceptron-like step toward (h, h') and away from (g, g'), as in the sketch below
[McAllester et al., 2010]
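Tying the steps together, a minimal perceptron-style training sketch; it reuses the hypothetical hinge_loss and loss_adjusted_inference helpers above, and the learning rate, epoch count, and Gaussian initialization are illustrative assumptions rather than the settings used in the experiments.

import numpy as np

def train_mlh_sketch(pairs, labels, q, p, rho, eta=0.01, epochs=10, seed=0):
    """Perceptron-like minimization of the upper bound, one pair at a time.

    pairs  : list of (x1, x2) input-vector pairs, each vector of dimension p
    labels : similarity labels s in {0, 1}, one per pair
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((q, p))             # initialize with LSH random projections
    loss_fn = lambda m, s: hinge_loss(m, s, rho)

    for _ in range(epochs):
        for (x1, x2), s in zip(pairs, labels):
            h1 = (W @ x1 > 0).astype(int)       # codes given by the current W
            h2 = (W @ x2 > 0).astype(int)
            g1, g2 = loss_adjusted_inference(W, x1, x2, s, loss_fn)
            # step toward the current codes and away from the loss-adjusted
            # ones; this follows a subgradient of the bound for this pair
            W += eta * (np.outer(h1, x1) + np.outer(h2, x2)
                        - np.outer(g1, x1) - np.outer(g2, x2))
    return W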
Experiment: Euclidean ANN
Similarity based on Euclidean distance
Datasets
 LabelMe (GIST)
 MNIST (pixels)
 PhotoTourism (SIFT)
 Peekaboom (GIST)
 Nursery (8D attributes)
 10D Uniform
Experiment: Euclidean ANN
22K LabelMe
 512D GIST
 20K training
 2K testing
 ~1% of pairs are similar
Evaluation
 Precision: #hits / number of items retrieved
 Recall: #hits / number of similar items
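As a concrete reading of these definitions, a small sketch that retrieves database items within a Hamming radius r of each query code and averages precision and recall over queries; the radius-based retrieval protocol and all names here are assumptions for illustration.

import numpy as np

def precision_recall_at_radius(query_codes, db_codes, similar_sets, r):
    """Precision = hits / retrieved, recall = hits / similar, per query.

    query_codes  : (nq, q) binary codes for the queries
    db_codes     : (nd, q) binary codes for the database
    similar_sets : list of sets of ground-truth similar database indices
    r            : Hamming radius used for retrieval
    """
    precisions, recalls = [], []
    for code, gt in zip(query_codes, similar_sets):
        dists = np.count_nonzero(db_codes != code, axis=1)   # Hamming distances
        retrieved = np.flatnonzero(dists <= r)
        hits = len(set(retrieved.tolist()) & gt)
        if len(retrieved) > 0:
            precisions.append(hits / len(retrieved))
        if len(gt) > 0:
            recalls.append(hits / len(gt))
    return float(np.mean(precisions)), float(np.mean(recalls))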
Techniques of interest
 MLH – minimal loss hashing (this work)
 LSH – locality-sensitive hashing (Charikar '02)
 SH – spectral hashing (Weiss, Torralba & Fergus '09)
 SIKH – shift-invariant kernel hashing (Raginsky & Lazebnik '09)
 BRE – binary reconstructive embedding (Kulis & Darrell '09)
[Results: Euclidean LabelMe at 32, 64, 128, and 256 bits]
Experiment: Semantic ANN
Semantic similarity measure based on annotations (object labels) from the LabelMe database:
 512D GIST, 20K training, 2K testing
Techniques of interest
 MLH – minimal loss hashing
 NN – nearest neighbor in GIST space
 NNCA – multilayer network with RBM pre-training and nonlinear NCA fine-tuning [Torralba et al. '09; Salakhutdinov & Hinton '07]
[Results: Semantic LabelMe]
Summary
A formulation for learning binary hash functions
based on
 structured prediction with latent variables
 hinge-like loss function for similarity search
Experiments show that with minimal loss hashing
 binary codes can be made more compact
 semantic similarity based on human labels can
be preserved
Thank you!
Questions?