Automatic Image Annotation Using Group Sparsity

Shaoting Zhang¹, Junzhou Huang¹,
Yuchi Huang¹, Yang Yu¹, Hongsheng Li²,
Dimitris Metaxas¹
¹CBIM, Rutgers University, NJ
²IDEA Lab, Lehigh University, PA
Introduction
• Goal: automatically assign relevant text keywords to a
given image so that they reflect its content.
• Previous methods:
– Topic models [Barnard et al., J. Mach. Learn. Res.’03;
Putthividhya et al., CVPR’10]
– Mixture models [Carneiro et al., TPAMI’07; Feng
et al., CVPR’04]
– Discriminative models [Grangier et al., TPAMI’08;
Hertz et al., CVPR’04]
– Nearest-neighbor-based methods [Makadia et al.,
ECCV’08; Guillaumin et al., ICCV’09]
• Limitations:
– Features are often preselected, yet the properties of
different features and feature combinations have not been
well investigated for image annotation.
– Feature selection has received little attention in this
application.
• Our method and contributions:
– Use feature selection to solve the annotation problem.
– Use a clustering prior and a sparsity prior to guide the
selection.
Outline
• Regularization based Feature Selection
– Annotation framework
– L2 norm regularization
– L1 norm regularization
– Group sparsity based regularization
• Obtain Image Pairs
• Experiments
Regularization based Feature Selection
• Given a list of similar/dissimilar image pairs (P1, P2):
[Figure: the feature matrices F_P1 and F_P2 of the paired images are combined to form the matrix X]
$$\hat{w} = \arg\min_{w \in \mathbb{R}^p} \|Xw - Y\|_2^2$$
[Figure: Xw approximates the label vector Y, whose entries are 1 or -1]
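As a minimal sketch, assuming X is already available as a NumPy array with one row per image pair (an assumption; the slide does not spell out the row layout) and Y holds the 1/-1 labels from the figure, the unregularized fit can be computed with NumPy's least-squares solver. The toy data are random and purely illustrative.

```python
import numpy as np

def fit_least_squares(X, Y):
    """Unregularized fit: w_hat = argmin_w ||X w - Y||_2^2."""
    # np.linalg.lstsq returns (solution, residuals, rank, singular values)
    w_hat, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
    return w_hat

# Hypothetical toy data: 6 rows (assumed image pairs), 3 feature columns, labels in {1, -1}
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
Y = np.array([1.0, -1.0, 1.0, 1.0, -1.0, -1.0])
w_hat = fit_least_squares(X, Y)
print(w_hat)
```

At the solution the normal equations hold, i.e. Xᵀ(Xŵ − Y) = 0; the regularized variants that follow add a penalty on w to this objective.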
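• Annotation framework
[Figure: the learned weights define a similarity between the testing input and the training data; the training images with high similarity are used to annotate the input]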
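One way to read this framework, sketched under loudly stated assumptions: each image is a feature vector, the learned weights are nonnegative and scale per-dimension absolute differences into a single distance, and keywords are transferred from the k most similar training images by simple voting. The function names, the distance, and the voting rule are hypothetical illustrations, not the method from the slides.

```python
from collections import Counter

import numpy as np

def weighted_distance(f_test, F_train, w):
    # Assumption: distance = sum over dimensions of w[k] * |f_test[k] - F_train[i, k]|
    return np.abs(F_train - f_test) @ w

def annotate(f_test, F_train, train_keywords, w, k=2, n_words=3):
    """Hypothetical label transfer from the k most similar training images."""
    d = weighted_distance(f_test, F_train, w)
    nearest = np.argsort(d)[:k]  # smallest distance = highest similarity
    votes = Counter(kw for i in nearest for kw in train_keywords[i])
    return [kw for kw, _ in votes.most_common(n_words)]

F_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
train_keywords = [["sky", "sea"], ["sky", "sun"], ["car"]]
w = np.array([1.0, 1.0])
result = annotate(np.array([0.2, 0.2]), F_train, train_keywords, w)
print(result)
```

Here the two closest training images both carry "sky", so it receives the most votes, while the distant "car" image is never consulted.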
1

wˆ  arg min  || Xw  Y ||22 + || w ||2 
wR p
n

• L2 regularization
• Robust, with a closed-form solution:
$\hat{w} = (X^\top X + n\lambda I)^{-1} X^\top Y$
• No sparsity: the weights are generally all nonzero
[Figure: histogram of weights (percentage vs. w)]
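A minimal NumPy sketch of the closed-form L2 solution, assuming the 1/n-scaled objective above: setting the gradient to zero gives (XᵀX + nλI)w = XᵀY. The toy data and the value of λ are arbitrary and only illustrate that ridge weights are generally not exactly zero.

```python
import numpy as np

def fit_ridge(X, Y, lam):
    """Minimize (1/n) * ||X w - Y||_2^2 + lam * ||w||_2^2 in closed form."""
    n, p = X.shape
    # Zero gradient: (2/n) X^T (X w - Y) + 2 lam w = 0  =>  (X^T X + n lam I) w = X^T Y
    A = X.T @ X + n * lam * np.eye(p)
    return np.linalg.solve(A, X.T @ Y)  # solve the system instead of forming the inverse

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 8))
Y = np.sign(rng.normal(size=50))  # hypothetical +1/-1 labels
w_ridge = fit_ridge(X, Y, lam=0.1)
print(np.count_nonzero(w_ridge), "of", w_ridge.size, "weights are nonzero")
```

Solving the linear system rather than inverting the matrix is the usual numerically safer choice.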
1

wˆ  arg min  || Xw  Y ||22 + || w ||1 
wR p  n

• L1 regularization
• Convex optimization problem
• Solvers: basis pursuit, grafting, shooting, etc.
• Sparsity prior: many weights become exactly zero
[Figure: histogram of weights (percentage vs. w)]
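The shooting method listed above is coordinate descent with soft-thresholding. A minimal sketch for the 1/n-scaled L1 objective follows: for coordinate j, the exact one-dimensional minimizer is the soft-thresholded correlation ρ_j (threshold nλ/2) divided by X_jᵀX_j. The fixed iteration count and the toy data are assumptions, and no convergence test is included.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink x toward zero by t; returns exactly 0 when |x| <= t."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def lasso_shooting(X, Y, lam, n_iter=200):
    """Coordinate descent ("shooting") for (1/n) * ||X w - Y||_2^2 + lam * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)  # X_j^T X_j for every column j
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back
            r = Y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r
            w[j] = soft_threshold(rho, n * lam / 2.0) / col_sq[j]
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8))
w_true = np.array([3.0, 0, 0, -2.0, 0, 0, 0, 0])
Y = X @ w_true + 0.1 * rng.normal(size=100)
w_lasso = lasso_shooting(X, Y, lam=0.5)
print(np.round(w_lasso, 3))
```

Unlike the ridge solution, coordinates whose correlation with the residual stays below the threshold are set to exactly zero, which produces the sparse weight histogram.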
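$$\hat{w} = \arg\min_{w \in \mathbb{R}^p} \frac{1}{n}\|Xw - Y\|_2^2 + \lambda \sum_{j=1}^{m} \|w_{G_j}\|_2$$
• Group sparsity [1]
• L2 norm within each group, L1 norm across groups
• Benefit: removal of whole feature groups
• Solved with projected gradient [2]
[Figure: feature groups such as RGB and HSV; each group's weight sub-vector is either entirely zero (=0) or nonzero (≠0)]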
[1] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68:49–67, 2006.
[2] E. van den Berg, M. Schmidt, M. Friedlander, and K. Murphy. Group sparsity via linear-time projection. Technical Report TR-2008-09, University of British Columbia, 2008. http://www.cs.ubc.ca/~murphyk/Software/L1CRF/index.html
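The slides solve the group problem with the projected-gradient method of [2]. As a simpler stand-in, the sketch below uses a different but standard technique, proximal gradient descent (ISTA). Its proximal step for the sum of group L2 norms is block soft-thresholding: each group's sub-vector is shrunk as a whole and becomes exactly zero when its norm falls below the threshold. The group index arrays (standing in for, e.g., RGB vs. HSV columns) and the toy data are hypothetical.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Shrink the whole vector v; returns exactly zero if ||v||_2 <= t."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def group_lasso_prox_grad(X, Y, groups, lam, n_iter=500):
    """Proximal gradient (ISTA) for (1/n)||X w - Y||_2^2 + lam * sum_j ||w_Gj||_2."""
    n, p = X.shape
    # Step 1/L, with L = 2 * sigma_max(X)^2 / n the Lipschitz constant of the gradient
    step = n / (2.0 * np.linalg.norm(X, 2) ** 2)
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = (2.0 / n) * X.T @ (X @ w - Y)
        z = w - step * grad  # plain gradient step on the least-squares term
        for g in groups:  # g holds the column indices of one feature group
            w[g] = group_soft_threshold(z[g], step * lam)
    return w

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 9))
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]  # hypothetical groups
w_true = np.array([1.0, -2.0, 0.5, 0, 0, 0, 0, 0, 0])  # only the first group is active
Y = X @ w_true + 0.1 * rng.normal(size=100)
w_group = group_lasso_prox_grad(X, Y, groups, lam=0.5)
print([round(float(np.linalg.norm(w_group[g])), 3) for g in groups])
```

Because the shrinkage acts on the norm of each block, a group is either kept as a whole or removed as a whole, which is the "=0 / ≠0" behavior in the figure.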
Outline
• Regularization based Feature Selection
• Obtain Image Pairs
– Only rely on keyword similarity
– Also rely on feedback information
• Experiments
Obtain Image Pairs
• The previous method [1] relies solely on keyword
similarity, which introduces considerable noise.
[Figure: distance histogram of similar pairs vs. distance histogram of all pairs]
[1] A. Makadia, V. Pavlovic, and S. Kumar. A new baseline for image annotation. In ECCV, pages 316–329, 2008.
• Inspired by relevance feedback and the
expectation-maximization (EM) method.
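[Figure: the k1 nearest images are candidates for similar pairs and the k2 farthest images are candidates for dissimilar pairs; these pairs are used in the group sparsity objective]
$$\hat{w} = \arg\min_{w \in \mathbb{R}^p} \frac{1}{n}\|Xw - Y\|_2^2 + \lambda \sum_{j=1}^{m} \|w_{G_j}\|_2$$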
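A hypothetical sketch of the candidate-selection step: given the current weights, each image's k1 nearest images become candidate similar pairs and its k2 farthest images become candidate dissimilar pairs. In an EM-style loop one would alternate this step with re-solving the group sparsity problem; that loop, the weighted absolute-difference distance, and the function name are assumptions, not details taken from the slides.

```python
import numpy as np

def candidate_pairs(F, w, k1, k2):
    """Hypothetical step: k1 nearest -> similar candidates, k2 farthest -> dissimilar.

    Assumes k1 >= 0, k2 >= 0 and k1 + k2 <= N - 1, where N is the number of images.
    """
    # Assumption: distance(i, j) = sum_k w[k] * |F[i, k] - F[j, k]|
    D = np.abs(F[:, None, :] - F[None, :, :]) @ w  # (N, N) pairwise distances
    similar, dissimilar = [], []
    for i in range(len(F)):
        # Other images sorted by increasing distance from image i
        order = [int(j) for j in np.argsort(D[i]) if j != i]
        similar += [(i, j) for j in order[:k1]]
        dissimilar += [(i, j) for j in order[len(order) - k2:]]
    return similar, dissimilar

F = np.array([[0.0], [0.1], [5.0], [5.2]])
sim, dis = candidate_pairs(F, w=np.array([1.0]), k1=1, k2=1)
print(sim)  # [(0, 1), (1, 0), (2, 3), (3, 2)]
print(dis)  # [(0, 3), (1, 3), (2, 0), (3, 0)]
```

Slicing with `len(order) - k2` rather than `-k2` keeps `k2 = 0` from accidentally selecting every image.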
Outline
• Regularization based Feature Selection
• Obtain Image Pairs
• Experiments
– Experimental settings
– Evaluation of regularization methods
– Evaluation of generality
– Some annotation results
Experimental Settings
• Datasets
– Corel5K (5k images)
– IAPR TC12[1] (20k images)
• Evaluation
– Average precision
– Average recall
– #keywords recalled (N+)
[1] M. Grubinger, P. D. Clough, H. Müller, and T. Deselaers. The IAPR TC-12 benchmark: A new evaluation resource for
visual information systems. 2006.
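The slide lists the metrics without formulas. A minimal sketch, assuming the common per-keyword convention for these benchmarks: precision and recall are computed separately for each keyword and then averaged, a keyword that is never predicted gets precision 0, and N+ counts keywords with nonzero recall. The function name and input format (one keyword set per test image) are hypothetical.

```python
def annotation_metrics(predicted, ground_truth, vocabulary):
    """Return (average precision, average recall, N+) over the keyword vocabulary."""
    precisions, recalls, n_plus = [], [], 0
    for kw in vocabulary:
        # Indices of test images labeled with kw by the system and by the ground truth
        pred = {i for i, words in enumerate(predicted) if kw in words}
        true = {i for i, words in enumerate(ground_truth) if kw in words}
        correct = len(pred & true)
        precisions.append(correct / len(pred) if pred else 0.0)
        recalls.append(correct / len(true) if true else 0.0)
        if correct > 0:  # keyword recalled at least once
            n_plus += 1
    k = len(vocabulary)
    return sum(precisions) / k, sum(recalls) / k, n_plus

predicted = [{"sky", "sea"}, {"sky"}]
ground_truth = [{"sky"}, {"sea"}]
metrics = annotation_metrics(predicted, ground_truth, ["sky", "sea"])
print(metrics)  # (0.25, 0.5, 1)
```

In this toy case "sky" is predicted twice and correct once (precision 0.5, recall 1), while "sea" is never predicted correctly, so only one keyword counts toward N+.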
• Features
– RGB, HSV, LAB
– Opponent
– rg histogram
– Transformed color distribution
– Color from Saliency[1]
– Haar, Gabor[2]
– SIFT[3], HOG[4]
[1] X. Hou and L. Zhang. Saliency detection: A spectral residual approach. In CVPR, 2007.
[2] A. Makadia, V. Pavlovic, and S. Kumar. A new baseline for image annotation. In ECCV, pages 316–329, 2008.
[3] K. van de Sande, T. Gevers, and C. Snoek. Evaluating color descriptors for object and scene recognition. PAMI, 99(1),2010.
[4] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, pages 886–893, 2005.
Evaluation of Regularization Methods
[Figure: precision, recall, and N+ of the regularization methods on Corel5K and IAPR TC12]
Evaluation of Generality
• Weights learned on Corel5K are applied to
IAPR TC12.
[Figure: precision, recall, and N+ as functions of λ]
Some Annotation Results
Conclusions and Future Work
• Conclusions
– Proposed a feature selection framework using both
sparsity and clustering priors to annotate images.
– The sparse solution improves scalability.
– Image pairs obtained from relevance feedback perform much
better than pairs based only on keyword similarity.
• Future work
– Different grouping methods.
– Automatically find groups (dynamic group sparsity).
– More priors (combine with other methods).
– Extend this framework to object recognition.
Thanks for listening