
Active hashing and its application to
image and text retrieval
Yi Zhen and Dit-Yan Yeung, published in Data Mining and Knowledge Discovery (DMKD), Feb 2012
Presented by
Arshad Jamal, Rajesh Dhania, Vinkal Vishnoi
Introduction
 Computing similarity plays a fundamental role in many applications
 Hashing-based methods have gained popularity for large-scale similarity search
 Taxonomy from the slide diagram:
  Tree based: suitable only for low dimensions
  Hashing based: either data independent, or data dependent (unsupervised or semi-supervised)
 This paper proposes a novel framework for Active Hashing
Related work
 Locality Sensitive Hashing [Andoni A, Indyk P (2006)]
 Goal is to assign similar binary codes to data points that are
close in feature space [random linear projection + thresholding]
 Code length can become quite large
 Spectral Hashing [Weiss Y, Torralba A, Fergus R (2008)]
 Performs spectral decomposition to learn hash functions
 Assumes data to be uniformly distributed
 Active Learning
 Identify and present the most informative unlabeled data to
human experts for labeling
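The LSH idea above (random linear projections followed by thresholding) can be sketched as follows; `make_lsh_codes`, the bit count, and the toy points are illustrative, not from the paper:

```python
import numpy as np

# Minimal LSH sketch: random hyperplanes + sign thresholding.
def make_lsh_codes(X, n_bits, seed=0):
    """Map each row of X to an n_bits binary code using random hyperplanes."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_bits))  # random projection directions
    return (X @ W >= 0).astype(np.uint8)           # threshold at zero

# Nearby points tend to share more code bits than distant ones.
X = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]])
codes = make_lsh_codes(X, n_bits=16)
close = (codes[0] == codes[1]).sum()  # bits shared by the two similar points
far = (codes[0] == codes[2]).sum()    # bits shared with the dissimilar point
```

With random hyperplanes, the probability that two points agree on a bit grows as the angle between them shrinks, which is why close points share more bits.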
Related Work: Semi-supervised
Hashing [Wang J, Kumar S, Chang S-F (2010a)]
 Given N normalized data points of D dimensions
 Learn K hash functions to generate a K-bit binary code:
   h_k(x) = sgn(w_k^T x)
 Build two sets of point pairs: S (similar) and D (dissimilar)
 Together they characterize the semantic similarity
 Hash functions H = {h_k}_{k=1}^{K} are learned by maximizing the
objective function
   J(H) = sum_{k=1}^{K} [ sum_{(x_i, x_j) in S} h_k(x_i) h_k(x_j)
                        - sum_{(x_i, x_j) in D} h_k(x_i) h_k(x_j) ]
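The SSH hash functions and objective J(H) can be sketched in a few lines; the toy data and the names `hash_codes` and `J` are illustrative, assuming the convention sgn(0) = +1:

```python
import numpy as np

def hash_codes(W, X):
    """h_k(x) = sgn(w_k^T x) for all K columns w_k of W; returns +/-1 codes."""
    return np.where(X @ W >= 0, 1, -1)

def J(W, X, S, D):
    """J(H) = sum_k [ sum_{(i,j) in S} h_k(x_i)h_k(x_j)
                    - sum_{(i,j) in D} h_k(x_i)h_k(x_j) ]."""
    H = hash_codes(W, X)
    same = sum((H[i] * H[j]).sum() for i, j in S)  # agreement on similar pairs
    diff = sum((H[i] * H[j]).sum() for i, j in D)  # agreement on dissimilar pairs
    return same - diff

# Toy example: points 0 and 1 are similar, point 2 is dissimilar.
X = np.array([[1.0, 0.2], [0.9, 0.1], [-0.8, 1.0]])
W = np.eye(2)   # K = 2 hash functions: w_1 = e_1, w_2 = e_2
S = [(0, 1)]    # similar pair
D = [(0, 2)]    # dissimilar pair
```

Maximizing J rewards hash bits that agree on similar pairs and disagree on dissimilar pairs.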
Limitations of SSH
 Point pairs from both S and D are treated as equally important
 For multi-class data, dissimilar points picked from a closer class or
a farther class contribute the same weight
 Very dissimilar points can spoil the learned hash functions
[Figure: three classes C1, C2, C3 at different distances, illustrating that dissimilar pairs are not equally informative]
Active Hashing (Greedy AH)
 Tries to overcome the limitations of SSH by picking most
informative points
 Algorithm: three main steps, given labeled and unlabeled data points
(L, U) and a candidate set C
 1. Select the most informative points A from C
 2. Get A labeled by a human expert; update L, U, and C
 3. Train the hash functions based on L and U
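The three steps can be sketched as a loop; all function names here (`select_informative`, `train_ssh`, `oracle`) are placeholders standing in for the paper's components, not actual code from it:

```python
# Greedy AH loop sketch: L is a dict {point: label}, U and C are sets.
def greedy_active_hashing(L, U, C, oracle, select_informative, train_ssh, rounds):
    model = None
    for _ in range(rounds):
        A = select_informative(model, C)      # step 1: pick informative points
        L.update({x: oracle(x) for x in A})   # step 2: expert labels A
        C.difference_update(A)                # ... and L, U, C are updated
        U.difference_update(A)
        model = train_ssh(L, U)               # step 3: retrain hash functions
    return model
```

Each round shrinks the candidate set and grows the labeled set, so later training rounds see strictly more supervision.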
Greedy AH: Selecting data points
 Based on the SSH model hash function h_k(x) = sgn(w_k^T x)
 Intuitively, the magnitude of w_k^T x indicates the certainty of x
 Data certainty (DC): f(H, x) = ||W^T x||_2^2
 Data points with the smallest f are the most informative points
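A minimal sketch of certainty-based selection, assuming the data-certainty score f(H, x) = ||W^T x||_2^2 as above; function names and the toy data are illustrative:

```python
import numpy as np

def data_certainty(W, X):
    """f(H, x) = ||W^T x||_2^2, the row-wise squared norm of the projections."""
    return np.sum((X @ W) ** 2, axis=1)

def select_most_informative(W, X_candidates, M):
    """Return indices of the M least-certain candidates (smallest f):
    they lie closest to the hashing hyperplanes."""
    f = data_certainty(W, X_candidates)
    return np.argsort(f)[:M]

# Toy candidates: points 0 and 2 lie near the hyperplanes, point 1 far away.
X = np.array([[0.01, 0.0], [2.0, 2.0], [0.1, -0.1]])
idx = select_most_informative(np.eye(2), X, M=2)
```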
Batch Mode Active Hashing
 Selecting points one by one is inefficient and suboptimal
 Instead, a set of M points is selected per round and used to learn the hash functions
 Selection solves
   min_µ  µ^T f~ + λ µ^T K µ
 µ is an indicator vector deciding about the presence of each point
 f~ is the vector of normalized certainty values on C
 K is a positive semi-definite similarity matrix defined on C
 Choose the M examples with the largest µ
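The batch objective can be approximated greedily; this is an assumed heuristic solver for illustration, not the paper's optimization method. Each step adds the candidate with the smallest certainty plus a lam-weighted redundancy penalty toward already-chosen points:

```python
import numpy as np

def bmah_select(f, K, M, lam=1.0):
    """Greedy batch selection: trade low certainty f[i] (informativeness)
    against similarity to already-chosen points (diversity)."""
    chosen = []
    remaining = list(range(len(f)))
    for _ in range(M):
        penalty = [f[i] + lam * sum(K[i, j] for j in chosen) for i in remaining]
        best = remaining[int(np.argmin(penalty))]
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy example: points 0 and 1 are least certain but nearly identical
# (K[0,1] is high), so the diversity term forces the second pick to point 2.
f = np.array([0.1, 0.12, 0.5])
K = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
sel = bmah_select(f, K, M=2, lam=1.0)
```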
BMAH Algorithm
 Algorithm: three main steps, given labeled and unlabeled data points
(L, U) and a candidate set C
 1. Select the M most informative points from C
 2. Get them labeled by a human expert; update L, U, and C
 3. Train the hash functions based on L and U
Experimental evaluation-I
 Image retrieval (MNIST dataset): results reported for
different parameter settings
 Text retrieval (20 Newsgroups (NEWS) dataset)
 Random selection vs. BMAH: BMAH shows performance improvement
Experimental evaluation-II
 Image retrieval (MNIST dataset)
 BMAH vs GAH: BMAH takes less time
References
 Andoni A, Indyk P (2006) Near-optimal hashing algorithms for approximate
nearest neighbor in high dimensions. In: Proceedings of the 47th annual IEEE
symposium on foundations of computer science, FOCS ’06, IEEE Computer
Society, Washington, pp 459–468
 Weiss Y, Torralba A, Fergus R (2008) Spectral hashing. In: Koller D, Schuurmans
D, Bengio Y, Bottou L (eds) Advances in neural information processing systems
21, NIPS 21, The MIT Press, Cambridge, MA, pp 1753–1760
 Wang J, Kumar S, Chang S-F (2010a) Semi-supervised hashing for scalable
image retrieval. In: Proceedings of IEEE conference on computer vision and
pattern recognition, pp 3424–3431
 Salakhutdinov R, Hinton GE (2009) Semantic hashing. Int J Approx Reason
50:969–978
Thanks