Slides - Microsoft

Towards MSR-Bing Challenge:
Ensemble of Diverse Models for Image Retrieval
Quan Fan, Hanqiu Xu, Ruowei Wang, Shengsheng Qian, Ting Wang,
Jitao Sang, Changsheng Xu
Institute of Automation, Chinese Academy of Sciences
Chinese-Singapore Institute of Digital Media
October 07, 2013
Review of The Task
๏ฎ Task: Develop a score system to assess the queryimage relevance
๏ฎ For each image-query pair, output a floating score indicating
how effective the query is used to describe the image.
Input
๐’™
๐’’ barack obama
Output
๐’”(๐’’, ๐’™)
๏ฎ Evaluation
๏ฎ For one specific query ๐‘ž, image rank list is generated by
sorting the relevance scores ๐‘ (๐‘ž,โ‹…);
๏ฎ Ave. DCG@25 over all test queries is employed as the final
evaluation metric.
Data Set
๏ฎ Training Set:
image ID <tab> query <tab> click count
๏ฎ Development Set:
query <tab> image ID <tab> judgment (Excellent/Good/Bad)
โ€œkatrina darlingโ€ img1504 Excellent
โ€œkatrina darlingโ€ img2817 Bad
Our Solution
Query-based Scoring
Image-based Scoring
Feature
Extraction
Concept Classification
Discriminative Ranking
Model
Ensemble
Model
โ€ฆ
System Illustration
offline
online
<query,
image, click>
Feature Extraction
Model Learning
Feature Extraction
Scoring
panda eat bamboo
relevance score
Feature Extraction
๏ฎ Query Features
โ€“ BoW representation: ๐‘ž = (๐‘ž1 , โ€ฆ , ๐‘ž๐‘‡ ) โˆˆ โ„๐‘‡ , ๐‘‡ = 100,000;
โ€“ Feature value: word occurrence
๏ฎ Image Features (๐‘‘ = 22,312)
โ€“ Local Features
๏ƒ˜ HOG+LLC+SPM
๏ƒ˜ LBP
โ€“ Global Features
๏ƒ˜ Color moment
๏ƒ˜ Edge histogram
๏ƒ˜ Wavelet texture feature
๏ƒ˜ GIST feature
#1 Query-based Scoring
๏ฎ Motivation
โ€“ Transfer to measure image-image visual similarity.
๏ฎ Solution
โ€“ Retrieve the related image set ๐‘‹ by issuing the test query ๐‘ž
into the training set;
โ€“ Calculate query-image relevance by aggregating the visual
similarities between test image ๐‘ฅ and the query-related images.
๐’’
๐’™
panda eat bamboo
<query,
image, click>
Related images
๐’”(๐’™, ๐’’)
#2 Image-based Scoring
๏ฎ Motivation
โ€“ Transfer to measure query-tag textual similarity.
๏ฎ Solution
โ€“ Retrieve the related tag set ๐ป by issuing the test query ๐‘ฅ into
the training set via locality sensitive hashing (LSH);
โ€“ Calculate query-image relevance by aggregating the textual
similarities between test query ๐‘ž and the image-related tags.
๐’’
๐’™
panda eat bamboo
๐’”(๐’™, ๐’’)
<query,
image, click>
LSH
panda river
bamboo China Related tags
green
#3 Concept Classification
๏ฎ Motivation
โ€“ Transfer to an image classification problem;
โ€“ Classification confidence as query-image relevance.
๏ฎ Solution
โ€“ Concept set
๏ƒ˜ Concept refers to a salient term or phrase
๏ƒ˜ Construct 249,527 concept vocabulary from training queries;
๏ƒ˜ Using OpenNLP toolbox.
#3 Concept Classification
๏ฎ Solution
โ€“ Concept classifier training
๏ƒ˜ Large-margin classifier (SVM, boosting, etc.);
๏ƒ˜ Positive v.s. Negative sample collection.
โ€“ Test
๏ƒ˜ Concept detection from test query;
๏ƒ˜ Calculate the classification confidences of the test image to each
detected concept;
๏ƒ˜ Sum. or Ave. fusion of confidences as the final relevance score.
๐’’
๐’™
panda eat bamboo
Concept
detection
panda bamboo detected concepts
Concept classification
๐’”(๐’™, ๐’’)
#4 Discriminative Ranking
๏ฎ Motivation
โ€“ Learn a discriminative model that both reserves ranked
relationship in the training set and boosts ranking performance
on new data.
๏ฎ Solution
โ€“ Based on model from [1].
โ€“ Learn a mapping function ๐‘“๐œƒ from image space to text space:
๐‘  ๐‘ฅ, ๐‘ž = ๐‘ž โ€ข ๐‘“๐œƒ (๐‘ฅ)
โ€“ ๐‘“ is optimized towards minimizing the supervised loss for
image-query ranking in the training set:
โ€“ Generalization capability is guaranteed by SVM-alike formulation.
[1] David Grangier, Samy Bengio: A Discriminative Kernel-Based Approach to Rank Images from Text Queries. PAMI=30(8): 1371-1384 (2008)
Ensemble Model
๏ฎ Ranking SVM-based ensemble
โ€“ Ensemble on score level
โ€“ Supervised learning to obtain optimal fusion weight on
the development set.
๏ฎ Ensemble schemes
โ€“ Two Model Fusion
๏ƒ˜ Concept Classification + Discriminative Ranking
โ€“ All Model Fusion
๏ƒ˜ Image-based Scoring
๏ƒ˜ Query-based Scoring
๏ƒ˜ Concept Classification
๏ƒ˜ Discriminative Ranking
Evaluation Results
Discussion
๏ฎ Whatโ€™s the most difficult part in this challenge?
โ€“ Textual query complexity (noise, multiple words, etc. ).
๏ฎ What did you spend most of your time on?
โ€“ Implement and compare between different models.
๏ฎ How did you handle system scalability?
โ€“ Model-based && preprocessing.
๏ฎ What would you do if you do it again?
โ€“ Explicitly analyze the word relations within test query.
๏ฎ What would you do if the data size increases to 40 M?
โ€“ Most of the examined models are expected to scale well.
๏ฎ What else can we do with this dataset?
โ€“ If extended by the user dimension, tasks of personalized
image retrieval is enabled.
Q&A?
Multimedia Computing group
http://nlpr-web.ia.ac.cn/mmc/
National Lab of Pattern Recognition
Institute of Automation, Chinese Academy of Sciences
#5 Matrix Factorization-based Scoring
๏ฎ Motivation
โ€“ Assumption: similar images are relevant to similar
queries;
โ€“ Transfer to a recommendation problem.
๏ฎ Solution
โ€“ Analogous to collaborative filtering
๏ƒ˜ Image-query relevance as the confidence of recommending the
image to the query
โ€“ Factorization Machine (FM [2]) model
[2] Steffen Rendle: Factorization Machines with libFM. ACM TIST 3(3): 57 (2012)
#6 Multimodal Deep Learning
Results on Development Set