Learning to Rank for Quantity Consensus Queries

Somnath Banerjee, Soumen Chakrabarti, Ganesh Ramakrishnan
SIGIR 2009
Presentation by Gonçalo Simões
Course: Information Retrieval (Recuperação de Informação)
Outline
Basic Concepts
 Quantity consensus queries (QCQ)
 Baseline approaches to QCQ
 Finding Intervals for QCQ answers
 Quantity-imputed labeling
 Conclusions

Outline

Basic Concepts
 Entity search
 Question answering
 Information Extraction
Quantity consensus queries (QCQ)
 Baseline approaches to QCQ
 Finding Intervals for QCQ answers
 Quantity-imputed labeling
 Conclusions

Entity search

Entity Search is an Information Retrieval task
that aims to return relevant entities for a given
query and an entity type
 Example:
○ Query: porto chelsea 25 november champions league
○ Type: time
○ Result: 19:45
Question Answering

Question answering is an Information
Retrieval/Natural Language Processing task
that automatically answers questions posed in
natural language
 Example:
○ Query: What time does Porto-Chelsea start on
November 25th?
○ Result: 19:45
Information Extraction

Information Extraction (IE) proposes
techniques to extract relevant information from
unstructured or semi-structured texts
 Extracted information is transformed so that it can
be represented in a fixed format
Outline
Basic Concepts
 Quantity consensus queries (QCQ)

 Motivation
 Terminology
 QCQ System and Testbed
Baseline approaches to QCQ
 Finding Intervals for QCQ answers
 Quantity-imputed labeling
 Conclusions

Motivation

TREC-QA 2005, 2006 and 2007 have a total of 1,125
factoid queries
(Pie chart: factoid queries in TREC-QA 2005, 2006, 2007)
Motivation

418 of the factoid queries are quantity queries
(Pie chart: quantity queries vs. non-quantity queries)
Motivation

128 of the quantity queries are quantity consensus
queries
(Pie chart: quantity consensus queries vs. spot quantity
queries vs. non-quantity queries)
Motivation

Quantity consensus queries (QCQ) are
queries for which there is uncertainty
about the answer quantity.
 Example:
○ What is the height of a giraffe in meters?
○ Answer: (screenshot of Google search results)
Terminology

Query
 Set of query words
 Specification of the quantity type
 Relative width parameter (optional)

Example
 “+giraffe +height meters”
Terminology

Snippet: window of tokens around a
candidate quantity which matches the
unit specified in the query
 Quantity: xi
 Feature Vector: zi

Examples
 “The giraffe is the tallest animal in the world
and often reaches a height of 5.5 meters”
 “The record height for a giraffe unicycle is
about 30.5 meters”
QCQ System and Testbed

QCQ queries
 162 queries from diverse sources
○ 40 from Wikipedia infoboxes
○ 16 from TREC-QA 2004
○ 61 from TREC-QA 2007
○ 9 provided by Wu and Marian
○ 36 produced by volunteers
QCQ System and Testbed

Data pre-processing
 Web search
○ Words from queries and unit names were
submitted as input to a Web Search API
 Information Extraction
○ JAPE engine from GATE NLP
○ 150 rules to extract quantities related to
mass, mileage, power, speed, density,
volume, area, money, time duration, time
epoch, temperature, length...
Extraction quality: Recall 0.92 | Precision 0.97 | F1-Measure 0.95
QCQ System and Testbed

Data pre-processing
 Feature extraction
○ Standard ranking features
 TF,IDF,TFIDF of a token
- Snippet
- Window of 10 sentences above and below a snippet
- Page of the snippet
- HTML title of the snippet
- URL of the page where the snippet belongs
 Jaccard similarity between query and snippet tokens
 Number of tokens in the snippet
J(A, B) = |A ∩ B| / |A ∪ B|
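As a concrete illustration, the Jaccard similarity between query and snippet token sets can be sketched as follows (the whitespace tokenization is a simplification of what a real system would use):

```python
def jaccard(a, b):
    """Jaccard similarity J(A, B) = |A & B| / |A | B| of two token sets."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

query = "giraffe height meters".split()
snippet = "the giraffe often reaches a height of 5.5 meters".split()
sim = jaccard(query, snippet)  # 3 shared tokens out of 9 distinct -> 1/3
```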
QCQ System and Testbed

Data pre-processing
 Feature extraction
○ Lexical proximity features
 Maximum proximity of the candidate quantity to any query token
 Proximity of the candidate quantity to the query token with largest
idf (rarest)
 Proximity of the candidate quantity to the query token with
smallest idf (most common)
 IDF-weighted average proximity of the candidate quantity to all
query tokens
QCQ System and Testbed

Pre-processing results:
 15,000 snippets produced over the 162 QCQs

Training data:
 100 of the resulting snippets were selected for
manual relevance judgement
 These snippets were used as training data to
estimate a weighting vector w to determine the
relevance of the features used
Outline
Basic Concepts
 Quantity consensus queries (QCQ)
 Baseline approaches to QCQ

 Web search
 RankSVM
 Wu and Marian’s system
 Laplacian smoothing
 Finding Intervals for QCQ answers
 Quantity-imputed labeling
 Conclusions

Web search

Minimal baseline that any QCQ system must
beat
1. Send query words to a search engine
2. Get the snippets from the top ranking pages
3. List the extracted quantities according to the rank
of the page
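The three steps can be sketched as follows; the `results` list and the regular expression are stand-ins for the Web Search API and the extraction rules the paper uses (both are illustrative assumptions):

```python
import re

# Stand-in for the ranked (rank, snippet) pairs a Web Search API would return.
results = [
    (1, "The giraffe is the tallest animal and often reaches 5.5 meters"),
    (2, "A giraffe can grow to about 6 meters in height"),
    (3, "The record height for a giraffe unicycle is about 30.5 meters"),
]

# Toy extractor for quantities in meters (the real system uses 150 JAPE rules).
QUANTITY = re.compile(r"(\d+(?:\.\d+)?)\s*meters")

def web_search_baseline(results):
    """List extracted quantities in the rank order of their source pages."""
    quantities = []
    for rank, snippet in sorted(results):
        for match in QUANTITY.finditer(snippet):
            quantities.append(float(match.group(1)))
    return quantities
```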
Web Search
Poor MAP and NDCG (below 0.15)
 Considering credit for correct quantities
anywhere in the page:
(Chart: MAP and NDCG of the Web search baseline)
RankSVM
Predicts the weighting vector w that indicates
how relevant each feature in the feature vector
zi is
 Optimization function (standard RankSVM):
min over w, ξ ≥ 0 of ½‖w‖² + C Σ ξij
subject to wᵀzi ≥ wᵀzj + 1 − ξij for each pair where
snippet i is relevant and snippet j is not
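A minimal sketch of the pairwise hinge loss RankSVM minimizes over (relevant, irrelevant) snippet pairs; the feature vectors and the C value below are illustrative assumptions:

```python
def dot(w, z):
    return sum(wi * zi for wi, zi in zip(w, z))

def ranksvm_loss(w, relevant, irrelevant, C=1.0):
    """0.5*||w||^2 plus C times the hinge loss of every pair whose scores
    w^T z_i (relevant) and w^T z_j (irrelevant) violate the unit margin."""
    loss = 0.5 * sum(wi * wi for wi in w)
    for zi in relevant:
        for zj in irrelevant:
            loss += C * max(0.0, 1.0 - (dot(w, zi) - dot(w, zj)))
    return loss
```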

RankSVM

Outperforms Web Search even when credit is
given for correct quantities anywhere in the
page:
RankSVM

The scores produced by RankSVM can be used to
analyze the distribution of relevant quantities
Wu and Marian’s System
Incorporates the value of the candidate quantity xi
in the process by using a voting/scoring method
 The score of each candidate quantity decreases with:

 The rank assigned to the source page
 The number of candidate quantities in the source page
 The number of duplicate pages for the same domain
 The shortest distance between the quantity and a query
token
Wu and Marian’s System
W&M incorporates xi by aggregating the score
of equal quantity candidates (voting system)
 The results are worse than those of RankSVM

Laplacian smoothing
Combines xi and wTzi via a graph Laplacian
approach
 Each snippet is a node of the graph G=(V,E)
 Each edge is weighted by a similarity measure
between its two nodes

Laplacian smoothing

Optimization function:
min over w, f of Σi (fi − wᵀzi)² + Σ{i,j}∈E R(i,j)(fi − fj)²
R(i,j) is the similarity function. Four functions were
tested:
 Equality: R(i,j)=1 if xi=xj and 0 otherwise
 Distance: R(i,j) = max{0, 1 − |xi − xj| / (|xi| + |xj|)}
 Decay: R(i,j) = exp(-s(xi-xj)2) where s is a tuned spread
parameter
 Cosine: R(i,j) is the cosine similarity between the
snippets of xi and xj
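Three of the four similarity functions are purely numeric and can be sketched directly (Cosine additionally needs the snippets' token vectors); the default spread parameter s below is an illustrative assumption:

```python
import math

def r_equality(xi, xj):
    """R(i,j) = 1 if the two candidate quantities are equal, else 0."""
    return 1.0 if xi == xj else 0.0

def r_distance(xi, xj):
    """R(i,j) = max{0, 1 - |xi - xj| / (|xi| + |xj|)}."""
    return max(0.0, 1.0 - abs(xi - xj) / (abs(xi) + abs(xj)))

def r_decay(xi, xj, s=1.0):
    """R(i,j) = exp(-s (xi - xj)^2) with tuned spread parameter s."""
    return math.exp(-s * (xi - xj) ** 2)
```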
Laplacian smoothing
Outline
Basic Concepts
 Quantity consensus queries (QCQ)
 Baseline approaches to QCQ
 Finding Intervals for QCQ answers

 Listing and scoring intervals
 Learning to rank intervals
Quantity-imputed labeling
 Conclusions

Listing and Scoring Intervals
Find intervals that cluster several
relevant quantities
 The intervals are represented by
I = [xs, xe] where xe ≤ (1+r)xs
 For a query q with n snippets, the number
of possible intervals is given by:

 n  1 (n  1)! (n  1)n

 

2
 2  2(n  1)!
Listing and Scoring Intervals

Only small enough intervals are considered in
the process: xe ≤ (1+r) xs
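Enumerating the candidate intervals under the width constraint can be sketched as follows (the quantities and the value of r are made up for illustration):

```python
def candidate_intervals(quantities, r):
    """Enumerate intervals I = [xs, xe] over the sorted quantities that
    satisfy the width constraint xe <= (1 + r) * xs."""
    qs = sorted(quantities)
    intervals = []
    for i, xs in enumerate(qs):
        for xe in qs[i:]:
            if xe <= (1 + r) * xs:
                intervals.append((xs, xe))
    return intervals

# With r = 0.25 the outlier 30.5 only forms the singleton interval [30.5, 30.5].
intervals = candidate_intervals([5.0, 5.5, 6.0, 30.5], 0.25)
```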
Listing and Scoring Intervals

Merit functions:
 Sum: Σ{i: xi∈I} wᵀzi
 Diff: Σ{i: xi∈I} Σ{j: xj∉I} (wᵀzi − wᵀzj)
 Hinge: Σ{i: xi∈I} Σ{j: xj∉I} max{0, wᵀzi − wᵀzj}
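Given the RankSVM scores wᵀz of the snippets inside and outside an interval, the three merit functions can be sketched as:

```python
def interval_merits(scores_in, scores_out):
    """Sum, Diff and Hinge merits of an interval from the RankSVM scores
    of the snippets inside (scores_in) and outside (scores_out) it."""
    sum_merit = sum(scores_in)
    diff_merit = sum(si - sj for si in scores_in for sj in scores_out)
    hinge_merit = sum(max(0.0, si - sj)
                      for si in scores_in for sj in scores_out)
    return sum_merit, diff_merit, hinge_merit
```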
Listing and Scoring Intervals

Results:
Learning to rank intervals


Use RankSVM
 Features:
 All snippets of I contain some query word
 All snippets of I contain the minimum IDF query word
 All snippets of I contain the maximum IDF query word
 Number of distinct words in snippets of I
 Number of words in all snippets of I
 One minus the number of distinct quantities in snippets
of I divided by the number of elements of I
 Percentage of snippets of I in the whole set of candidate
quantities
 Merit functions for intervals
Learning to rank intervals

Interval relevance
 Naive measure
relI = nI⁺ / nI
 Discretized measure
relI = ⌈10 · nI⁺ / nI⌉
(nI⁺: relevant snippets in I; nI: total snippets in I)
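Both relevance measures can be sketched as follows (n_rel is the number of relevant snippets in the interval, n_total the interval's total snippet count):

```python
import math

def naive_relevance(n_rel, n_total):
    """rel_I = nI+ / nI: fraction of the interval's snippets that are relevant."""
    return n_rel / n_total

def discretized_relevance(n_rel, n_total):
    """rel_I = ceil(10 * nI+ / nI): the fraction mapped to integer grades 0..10."""
    return math.ceil(10 * n_rel / n_total)
```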
Learning to rank intervals

Optimization function:
Learning to rank intervals

Results:
Learning to rank intervals

Interval oriented evaluation
 n⁺: number of relevant snippets for the query
 ki: number of relevant snippets in interval i
 ni: number of snippets in interval i
recall = ( Σi∈{0,…,j} ki ) / n⁺
precision = ( Σi∈{0,…,j} ki ) / ( Σi∈{0,…,j} ni )
Learning to rank intervals

Results
Outline
Basic Concepts
 Quantity consensus queries (QCQ)
 Baseline approaches to QCQ
 Finding Intervals for QCQ answers
 Quantity-imputed labeling
 Conclusions

Quantity-imputed labeling
Relevance judgement for all the snippets in a
training corpus is tedious work
 Alternative:

 Indicate, for a given training query, the answers
that can be considered correct

Problem:
 False positives and false negatives are produced in
the training data
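The imputation step can be sketched as follows: a snippet is labeled relevant iff its quantity falls inside the gold answer interval given for the training query (the quantities and interval bounds are illustrative):

```python
def impute_labels(quantities, answer_low, answer_high):
    """Label each snippet's quantity relevant iff it lies inside the gold
    answer interval; per-snippet human judgments are never consulted, which
    is what introduces false positives and false negatives."""
    return [(x, answer_low <= x <= answer_high) for x in quantities]

labels = impute_labels([5.5, 30.5, 6.0], 5.0, 6.5)
```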
Quantity-imputed labeling

Results:
 14,562 labeled snippets
 571 false positives
 395 false negatives
Outline
Basic Concepts
 Quantity consensus queries (QCQ)
 Baseline approaches to QCQ
 Finding Intervals for QCQ answers
 Quantity-imputed labeling
 Conclusions

Conclusions

The authors contribute with
 An introduction to QCQs
 A proposal of algorithms for finding consensus intervals
 An evaluation of an approach that uses interval
ranking

Future work
 Replace the search API with a quantity index on
Web-scale corpora
The end!
Questions?