Somnath Banerjee, Soumen Chakrabarti, Ganesh Ramakrishnan
SIGIR 2009
Presentation by Gonçalo Simões
Course: Information Retrieval (Recuperação de Informação)
Outline
Basic Concepts
Quantity consensus queries (QCQ)
Baseline approaches to QCQ
Finding Intervals for QCQ answers
Quantity-imputed labeling
Conclusions
Outline
Basic Concepts
Entity search
Question answering
Information Extraction
Quantity consensus queries (QCQ)
Baseline approaches to QCQ
Finding Intervals for QCQ answers
Quantity-imputed labeling
Conclusions
Entity search
Entity Search is an Information Retrieval task
that aims to return relevant entities for a given
query and an entity type
Example:
○ Query: porto chelsea 25 november champions league
○ Type: time
○ Result: 19:45
Question Answering
Question answering is an Information Retrieval / Natural Language Processing task that automatically answers questions posed in natural language
Example:
○ Query: What time does Porto-Chelsea start on
November 25th?
○ Result: 19:45
Information Extraction
Information Extraction (IE) comprises techniques for extracting relevant information from unstructured or semi-structured text
Extracted information is transformed so that it can
be represented in a fixed format
Outline
Basic Concepts
Quantity consensus queries (QCQ)
Motivation
Terminology
QCQ System and Testbed
Baseline approaches to QCQ
Finding Intervals for QCQ answers
Quantity-imputed labeling
Conclusions
Motivation
TREC-QA 2005, 2006 and 2007 have a total of 1125
factoid queries
Motivation
418 of the factoid queries are quantity queries
Motivation
128 of the quantity queries are quantity consensus
queries
[Chart: quantity consensus queries vs. spot quantity queries vs. non-quantity queries]
Motivation
Quantity consensus queries (QCQ) are
queries for which there is uncertainty
about the answer quantity.
Example:
○ What is the height of a giraffe in meters?
○ Answer: [Google Search results screenshot omitted]
Terminology
Query
Set of query words
Specification of the quantity type
Relative width parameter (optional)
Example
“+giraffe +height meters”
Terminology
Snippet: a window of tokens around a candidate quantity whose unit matches the one specified in the query
Quantity: x_i
Feature vector: z_i
Examples
“The giraffe is the tallest animal in the world
and often reaches a height of 5.5 meters”
“The record height for a giraffe unicycle is about 30.5 meters”
QCQ System and Testbed
QCQ queries
162 queries from diverse sources
○ 40 from Wikipedia infoboxes
○ 16 from TREC-QA 2004
○ 61 from TREC-QA 2007
○ 9 provided by Wu and Marian
○ 36 produced by volunteers
QCQ System and Testbed
Data pre-processing
Web search
○ Words from the queries and unit names were submitted as input to a Web Search API
Information Extraction
○ JAPE engine from GATE NLP
150 rules to extract quantities of mass, mileage, power, speed, density, volume, area, money, time duration, time epoch, temperature, length...
Extraction quality: Recall 0.92, Precision 0.97, F1-Measure 0.95
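As a minimal sketch of the extraction step (the actual system uses 150 hand-written JAPE rules over many quantity types; the single-type pattern and unit list below are assumptions for illustration):

import re

# Toy extractor for one quantity type (length); the unit list and
# pattern are illustrative assumptions, not the paper's JAPE rules.
PATTERN = re.compile(r"(\d+(?:[.,]\d+)?)\s*(meters?|metres?|km|cm|feet|ft)\b",
                     re.IGNORECASE)

def extract_lengths(text):
    # Return (value, unit) pairs, normalizing decimal commas.
    return [(float(v.replace(",", ".")), u.lower())
            for v, u in PATTERN.findall(text)]

print(extract_lengths("often reaches a height of 5.5 meters"))
# -> [(5.5, 'meters')]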
QCQ System and Testbed
Data pre-processing
Feature extraction
○ Standard ranking features
TF, IDF, and TFIDF of a token, computed over:
- the snippet
- a window of 10 sentences above and below the snippet
- the page containing the snippet
- the HTML title of the snippet's page
- the URL of the page the snippet belongs to
Jaccard similarity between query and snippet tokens
Number of tokens in the snippet
J(A, B) = \frac{|A \cap B|}{|A \cup B|}
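A minimal sketch of this feature over the two token sets:

def jaccard(query_tokens, snippet_tokens):
    # |A ∩ B| / |A ∪ B| over the two token sets.
    a, b = set(query_tokens), set(snippet_tokens)
    return len(a & b) / len(a | b) if a | b else 0.0

print(jaccard("giraffe height meters".split(),
              "the giraffe reaches a height of 5.5 meters".split()))
# 3 shared tokens out of 8 distinct -> 0.375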
QCQ System and Testbed
Data pre-processing
Feature extraction
○ Lexical proximity features
Maximum proximity of the candidate quantity to any query token
Proximity of the candidate quantity to the query token with largest
idf (rarest)
Proximity of the candidate quantity to the query token with
smallest idf (most common)
IDF-weighted average proximity of the candidate quantity to all query tokens
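A sketch of these four features over token positions; taking proximity as 1/(1 + distance) is an assumption, since the slides do not give the exact transform:

def proximity_features(tokens, qty_pos, query_idf):
    # query_idf maps each query token to its IDF.
    prox = {}
    for i, tok in enumerate(tokens):
        if tok in query_idf:
            p = 1.0 / (1 + abs(i - qty_pos))   # assumed proximity transform
            prox[tok] = max(p, prox.get(tok, 0.0))
    if not prox:
        return 0.0, 0.0, 0.0, 0.0
    rarest = max(prox, key=lambda t: query_idf[t])
    commonest = min(prox, key=lambda t: query_idf[t])
    total_idf = sum(query_idf[t] for t in prox)
    weighted = sum(query_idf[t] * prox[t] for t in prox) / total_idf
    return max(prox.values()), prox[rarest], prox[commonest], weighted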
QCQ System and Testbed
Pre-processing results:
~15,000 snippets produced over the 162 QCQs
Training data:
100 of the resulting snippets were selected for
manual relevance judgement
These snippets were used as training data to estimate a weighting vector w that captures the relevance contribution of each feature
Outline
Basic Concepts
Quantity consensus queries (QCQ)
Baseline approaches to QCQ
Web search
RankSVM
Wu and Marian’s system
Laplacian smoothing
Finding Intervals for QCQ answers
Quantity-imputed labeling
Conclusions
Web search
Minimal baseline that any QCQ system must
beat
1. Send query words to a search engine
2. Get the snippets from the top ranking pages
3. List the extracted quantities according to the rank
of the page
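A sketch of these three steps; search_api and extract_quantities are hypothetical helpers standing in for the Web Search API and the extraction step:

def web_search_baseline(query_words, unit, search_api, extract_quantities):
    # Each extracted quantity inherits the rank of the page it came from.
    ranked = []
    for rank, page in enumerate(search_api(" ".join(query_words + [unit]))):
        for qty in extract_quantities(page, unit):
            ranked.append((rank, qty))
    return [qty for rank, qty in sorted(ranked, key=lambda t: t[0])]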
Web Search
Poor MAP and NDCG (below 0.15), even when credit is given for correct quantities anywhere in the page [results chart omitted]
RankSVM
Learns the weighting vector w that indicates how much each feature in the feature vector z_i contributes to relevance
Optimization function:
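In its standard form over snippet pairs (the paper's exact variant may differ), the RankSVM objective is:

\min_{w,\,\xi \ge 0}\ \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{(i,j)} \xi_{ij}
\quad \text{s.t.} \quad w^\top z_i \ge w^\top z_j + 1 - \xi_{ij}
\ \text{for every pair in which snippet } i \text{ is more relevant than snippet } j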
RankSVM
Outperforms Web Search, even when the latter receives credit for correct quantities anywhere in the page [results chart omitted]
RankSVM
The RankSVM scores can be used to analyze the distribution of relevant quantities [plot omitted]
Wu and Marian’s System
Incorporates the value of the candidate quantity x_i into the process via a voting/scoring method
The score of each candidate quantity decreases with:
The rank assigned to the source page
The number of candidate quantities in the source page
The number of duplicate pages for the same domain
The shortest distance between the quantity and a query
token
Wu and Marian’s System
W&M incorporates x_i by aggregating the scores of equal quantity candidates (a voting system; see the sketch below)
The results are worse than those of RankSVM
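A hypothetical sketch of this vote-and-score scheme; the decay formula below is an assumption for illustration, not Wu and Marian's published one:

from collections import defaultdict

def wm_score(page_rank, n_candidates_on_page, n_duplicate_pages, min_dist):
    # Assumed decay over the four factors listed on the previous slide.
    return 1.0 / ((1 + page_rank) * (1 + n_candidates_on_page)
                  * (1 + n_duplicate_pages) * (1 + min_dist))

def vote(scored_candidates):
    # Equal quantity values pool their scores; the highest total wins.
    totals = defaultdict(float)
    for qty, score in scored_candidates:
        totals[qty] += score
    return sorted(totals.items(), key=lambda kv: -kv[1])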
Laplacian smoothing
Combines x_i and w^T z_i via a graph Laplacian approach
Each snippet is a node of the graph G = (V, E)
Each edge {i, j} carries a similarity weight R(i, j) between the two snippets
Laplacian smoothing
Optimization function:
\min_{w,f} \ \sum_i \left( f_i - w^\top z_i \right)^2 + \sum_{\{i,j\} \in E} R(i,j) \left( f_i - f_j \right)^2
R(i,j) is the similarity function. Four functions were
tested:
Equality: R(i,j) = 1 if x_i = x_j, and 0 otherwise
Distance: R(i,j) = max{0, 1 − |x_i − x_j| / (|x_i| + |x_j|)}
Decay: R(i,j) = exp(−s (x_i − x_j)²), where s is a tuned spread parameter
Cosine: R(i,j) is the cosine similarity between the snippets of x_i and x_j
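A direct sketch of the four similarity functions (R_cosine is shown over generic vector representations of the snippets):

import math

def R_equality(xi, xj):
    return 1.0 if xi == xj else 0.0

def R_distance(xi, xj):
    denom = abs(xi) + abs(xj)
    return 1.0 if denom == 0 else max(0.0, 1.0 - abs(xi - xj) / denom)

def R_decay(xi, xj, s=1.0):
    # s is the tuned spread parameter
    return math.exp(-s * (xi - xj) ** 2)

def R_cosine(zi, zj):
    # Cosine similarity between vector representations of the snippets.
    dot = sum(a * b for a, b in zip(zi, zj))
    ni = math.sqrt(sum(a * a for a in zi))
    nj = math.sqrt(sum(b * b for b in zj))
    return dot / (ni * nj) if ni and nj else 0.0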
Laplacian smoothing
Results: [chart omitted]
Outline
Basic Concepts
Quantity consensus queries (QCQ)
Baseline approaches to QCQ
Finding Intervals for QCQ answers
Listing and scoring intervals
Learning to rank intervals
Quantity-imputed labeling
Conclusions
Listing and Scoring Intervals
Find intervals (rectangular regions on the quantity axis) that cluster several relevant quantities
Each interval is represented by I = [x_s, x_e], where x_e ≤ (1 + r)·x_s
For a query q with n snippets, the number of candidate intervals is:
\binom{n+1}{2} = \frac{(n+1)!}{2!\,(n-1)!} = \frac{(n+1)\,n}{2}
Listing and Scoring Intervals
Only sufficiently narrow intervals are considered in the process: x_e ≤ (1 + r)·x_s
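A sketch of the enumeration, keeping only the intervals that satisfy the width constraint:

def candidate_intervals(quantities, r):
    # Enumerate [x_s, x_e] over sorted quantities with x_e <= (1+r)*x_s.
    # (Assumes positive quantities.)
    xs = sorted(quantities)
    out = []
    for s in range(len(xs)):
        for e in range(s, len(xs)):
            if xs[e] <= (1 + r) * xs[s]:
                out.append((xs[s], xs[e]))
            else:
                break        # xs is sorted, so wider e cannot qualify
    return out

print(candidate_intervals([4.8, 5.2, 5.5, 30.5], r=0.2))
# -> [(4.8, 4.8), (4.8, 5.2), (4.8, 5.5), (5.2, 5.2), (5.2, 5.5),
#     (5.5, 5.5), (30.5, 30.5)]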
Listing and Scoring Intervals
Merit functions:
Sum: \mathrm{Sum}(I) = \sum_{i:\, x_i \in I} w^\top z_i
Diff: \mathrm{Diff}(I) = \sum_{i:\, x_i \in I} \sum_{j:\, x_j \notin I} \left( w^\top z_i - w^\top z_j \right)
Hinge: \mathrm{Hinge}(I) = \sum_{i:\, x_i \in I} \sum_{j:\, x_j \notin I} \max\{0,\; w^\top z_i - w^\top z_j\}
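A direct sketch of the three merit functions, given the snippet scores w^T z inside and outside the interval:

def merits(scores_in, scores_out):
    sum_merit = sum(scores_in)
    diff_merit = sum(si - sj for si in scores_in for sj in scores_out)
    hinge_merit = sum(max(0.0, si - sj)
                      for si in scores_in for sj in scores_out)
    return sum_merit, diff_merit, hinge_merit

print(merits([0.9, 0.7], [0.2, -0.1]))   # -> (1.6, 3.0, 3.0)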
Listing and Scoring Intervals
Results: [chart omitted]
Learning to rank intervals
Use RankSVM
Features
Whether all snippets of I contain some query word
Whether all snippets of I contain the minimum-IDF query word
Whether all snippets of I contain the maximum-IDF query word
Number of distinct words in the snippets of I
Number of words in all snippets of I
One minus the number of distinct quantities in the snippets of I, divided by the number of elements of I
Fraction of all candidate snippets that fall in I
Merit functions for intervals
Learning to rank intervals
Interval relevance
Naive measure: \mathrm{rel}_I = n_I^+ / n_I, where n_I^+ is the number of relevant snippets in I and n_I the total number of snippets in I
Discretized measure: \mathrm{rel}_I = \lfloor 10\, n_I^+ / n_I \rfloor
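A one-function sketch of both measures:

import math

def interval_relevance(n_rel, n_tot, discretized=True):
    # n_rel = n_I^+, n_tot = n_I
    frac = n_rel / n_tot
    return math.floor(10 * frac) if discretized else frac

print(interval_relevance(3, 4))   # floor(10 * 0.75) -> 7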
Learning to rank intervals
Optimization function:
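Presumably (an assumption based on the surrounding slides) this is the RankSVM objective applied to interval feature vectors z_I, with pairs ordered by the discretized relevance:

\min_{w,\,\xi \ge 0}\ \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{(I,J)} \xi_{IJ}
\quad \text{s.t.} \quad w^\top z_I \ge w^\top z_J + 1 - \xi_{IJ}
\ \text{whenever } \mathrm{rel}_I > \mathrm{rel}_J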
Learning to rank intervals
Results: [chart omitted]
Learning to rank intervals
Interval-oriented evaluation
n^+: number of relevant snippets for the query
k_i: number of relevant snippets in the i-th ranked interval
n_i: number of snippets in the i-th ranked interval
\mathrm{recall}@j = \frac{\sum_{i \in \{0,\dots,j\}} k_i}{n^+}
\qquad
\mathrm{precision}@j = \frac{\sum_{i \in \{0,\dots,j\}} k_i}{\sum_{i \in \{0,\dots,j\}} n_i}
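A direct sketch of these cumulative measures over the top j+1 ranked intervals:

def interval_pr_at_j(k, n, n_plus, j):
    # k[i]: relevant snippets in interval i; n[i]: snippets in interval i.
    hits = sum(k[: j + 1])
    return hits / n_plus, hits / sum(n[: j + 1])   # (recall, precision)

print(interval_pr_at_j(k=[3, 1], n=[4, 3], n_plus=6, j=1))
# -> (0.666..., 0.571...)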
Learning to rank intervals
Results: [chart omitted]
Outline
Basic Concepts
Quantity consensus queries (QCQ)
Baseline approaches to QCQ
Finding Intervals for QCQ answers
Quantity-imputed labeling
Conclusions
Quantity-imputed labeling
Relevance judgement for all the snippets in a training corpus is tedious work
Alternative:
Indicate, for a given training query, the answers that can be considered correct; snippet labels are then imputed from whether each snippet's quantity matches a correct answer (see the sketch below)
Problem:
This produces false positives and false negatives in the training data
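A minimal sketch of the imputation, assuming exact matching of quantities (the slides do not give the matching rule):

def impute_labels(snippets, correct_answers):
    # snippets: list of (quantity, feature_vector); a snippet is labeled
    # relevant iff its quantity equals a declared-correct answer.
    return [any(x == a for a in correct_answers) for x, _z in snippets]

print(impute_labels([(5.5, None), (30.5, None)], correct_answers=[5.5]))
# -> [True, False]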
Quantity-imputed labeling
Results:
14,562 labeled snippets
571 false positives
395 false negatives
Outline
Basic Concepts
Quantity consensus queries (QCQ)
Baseline approaches to QCQ
Finding Intervals for QCQ answers
Quantity-imputed labeling
Conclusions
Conclusions
The authors' contributions:
An introduction to QCQs
Algorithms for finding consensus intervals
An evaluation of an approach based on interval ranking
Future work
Replace the search API with a quantity index on
Web-scale corpora
The end!
Questions?