Exploiting Time-based Synonyms in Searching
Document Archives
Nattiya Kanhabua and Kjetil Nørvåg
Database System Group
NTNU, Norway
TDT4215 Web-intelligence Spring 2011
Kanhabua and Nørvåg
Exploiting Time-based Synonyms in Search
Outline
1. Introduction
   Problem Statement
   Contributions
2. Synonym Detection
   Entity Recognition and Synonym Extraction
   Improving the Accuracy of Time
3. Query Expansion
   Time-based Synonyms
   Ranking Time-independent Synonyms
   Ranking Time-dependent Synonyms
4. Evaluation
   Experiment Setting
   Experimental Results
5. Conclusions
   Conclusions and Future Work
Problem statement
In recent years, document archives have become publicly available
  E.g., the Internet Archive, digital libraries, and news archives
Searching in such resources is not straightforward
  Contents in these resources are strongly time-dependent
  Query “Pope Benedict XVI” restricted to dates “before 2005”
  Unable to retrieve documents about “Joseph Alois Ratzinger”
To improve retrieval effectiveness, query expansion using synonyms with respect to time can be employed
Observation
Named entities (people, organizations, locations, etc.) constitute a major fraction of queries [Sanderson SIGIR’2008]
  Very dynamic in appearance, i.e., relationships between terms change over time
  E.g., changes of roles, name alterations, or semantic shift
Synonyms are different words with similar meanings
  In our context, synonyms are terms used as name variants (other names, titles, or roles) of a named entity
  E.g., “Cardinal Joseph Ratzinger” is a synonym of “Pope Benedict XVI” before 2005
What are time-based synonyms?
Time-independent synonyms are invariant to time
Time-dependent synonyms are relevant to a particular time
period, i.e., entity-synonym relationships change over time
Application
News archive search
Search terms are named entities
Publication dates of documents are temporal criteria
Scenario 1
Query: “Pope Benedict XVI” and written before 2005
Documents about “Joseph Alois Ratzinger” are relevant
Scenario 2
Query: “Hillary R. Clinton” and written from 1997 to 2002
Documents about “New York Senator” and “First Lady
of the United States” are relevant
Challenge
Semantic gaps in searching archives, or a lack of knowledge about a query and its synonyms at a particular time
Contributions
1. Formal models
   Wikipedia viewed as a temporal resource
2. Proposed approaches
   Discover time-based synonyms over time
   Improve the accuracy of time of synonyms
   Expand a query using time-based synonyms
3. Experiments
   Evaluate extracting and improving time of synonyms
   Evaluate query expansion using time-based synonyms
Recognizing named entities
Step 1: Partition Wikipedia according to the time granularity g = month to obtain its snapshots $W = \{W_{t_1}, \ldots, W_{t_z}\}$
Step 2: For each snapshot $W_{t_k} \in W$, identify named entity pages to obtain a set of named entities $E_{t_k} = \{e_1, \ldots, e_j\}$
Step 3: For each named entity $e_i \in E_{t_k}$, find a set of entity-synonym relationships $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$
Figure: A snapshot of Wikipedia and current revisions at time $t_k$
Example [Bunescu and Paşca EACL’2006]
1) Multi-word titles in which all content words are capitalized
   President_of_the_United_States ⇒ named entity
2) Single-word titles with multiple capital letters
   UNICEF and WHO are named entities
3) At least 75% of occurrences of the title in the article text itself are capitalized
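The three title heuristics can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `is_named_entity`, the small `FUNCTION_WORDS` list (used so that "of" and "the" do not block rule 1), and the case-insensitive occurrence count for rule 3 are all assumptions made for this example. Rule 3 here does not correct for sentence-initial capitalization.

```python
import re

# Assumed list of function words that may stay lowercase inside a title.
FUNCTION_WORDS = {"of", "the", "and", "in", "for", "on", "at", "to", "a", "an"}

def is_named_entity(title, article_text="", threshold=0.75):
    """Return True if a Wikipedia title passes any of the three heuristics."""
    words = title.replace("_", " ").split()
    if not words:
        return False
    if len(words) > 1:
        # Rule 1: multi-word title whose content words are all capitalized.
        content = [w for w in words if w.lower() not in FUNCTION_WORDS]
        if content and all(w[0].isupper() for w in content):
            return True
    elif sum(ch.isupper() for ch in words[0]) >= 2:
        # Rule 2: single-word title with at least two capital letters.
        return True
    # Rule 3: share of capitalized occurrences of the title in the article text.
    phrase = " ".join(words)
    occurrences = re.findall(re.escape(phrase), article_text, flags=re.IGNORECASE)
    if not occurrences:
        return False
    capitalized = sum(1 for occ in occurrences if occ[0].isupper())
    return capitalized / len(occurrences) >= threshold
```

For instance, `is_named_entity("President_of_the_United_States")` and `is_named_entity("UNICEF")` both pass without needing the article text, while a common noun that is mostly lowercase in its own article fails rule 3.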
Example
$e_i$: President_of_the_United_States
$t_k$: 11/2001
$s_j$: “George W. Bush”
$\xi_{i,j}$: $(e_i, s_j)$ or (President_of_the_United_States, “George W. Bush”)
Extracting synonyms
Step 1: For each entity $e_i \in E_{t_k}$, find its synonyms by extracting anchor texts from article links
Step 2: Accumulate entity-synonym relationships for all entities at time $t_k$, i.e., a synonym snapshot $S_{t_k} = \{\xi_{1,1}, \ldots, \xi_{n,m}\}$
Example
[[President_of_the_United_States|Barack Obama]]: “Barack Obama” is the anchor text linking to the article President_of_the_United_States
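Anchor-text extraction from piped wiki links can be sketched as below. The regular expression `WIKILINK` and the function `extract_synonyms` are hypothetical names for this illustration; the pattern only handles simple `[[target|anchor]]` links and ignores templates, nested markup, and redirects, which a real pipeline would need to treat.

```python
import re
from collections import defaultdict

# Matches [[Target|anchor text]]; an optional #section part of the target is dropped.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?\|([^\[\]]+)\]\]")

def extract_synonyms(wikitext, entities):
    """Collect anchor texts that link to one of the given named-entity titles."""
    synonyms = defaultdict(set)
    for target, anchor in WIKILINK.findall(wikitext):
        # Normalize the link target to the underscore form used for titles.
        target = target.strip().replace(" ", "_")
        if target in entities:
            synonyms[target].add(anchor.strip())
    return dict(synonyms)
```

Running this over every article in a snapshot and taking the union of the results would give one synonym snapshot for that month.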
Extracting synonyms: output
Entity-synonym relationships and time periods
Named Entity              Synonym                      Time Period
Pope Benedict XVI         Cardinal Joseph Ratzinger    05/2005 - 03/2009*
                          Joseph Ratzinger             05/2005 - 03/2009
                          Pope Benedict XVI            05/2005 - 03/2009
Barack Obama              Barack Hussein Obama II      02/2007 - 03/2009
                          Sen. Barack Obama            07/2007 - 03/2009
                          Senator Barack Obama         05/2006 - 03/2009
Hillary Rodham Clinton    Hillary Clinton              08/2003 - 03/2009
                          Sen. Hillary Clinton         03/2007 - 03/2009
                          Senator Clinton              11/2007 - 03/2009
* The times of synonyms are the timestamps of the Wikipedia articles (8 years) in which they appear, not temporal expressions extracted from the contents
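One way to derive such time periods from monthly synonym snapshots is to record the first and last snapshot in which each entity-synonym pair appears. The sketch below assumes snapshots keyed by `YYYY-MM` strings (so that lexical order equals chronological order); the function name `synonym_periods` and the sample data are hypothetical.

```python
def synonym_periods(snapshots):
    """Map each (entity, synonym) pair to its (first, last) snapshot month.

    snapshots: dict mapping a 'YYYY-MM' string to a set of (entity, synonym) pairs.
    """
    periods = {}
    for month in sorted(snapshots):
        for pair in snapshots[month]:
            # Keep the earliest month seen so far and extend the end to this month.
            start, _ = periods.get(pair, (month, month))
            periods[pair] = (start, month)
    return periods
```

Because the periods come from article timestamps, they cannot start earlier than the first available snapshot, which is why the table shows "Cardinal Joseph Ratzinger" starting in 05/2005.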
Improving the accuracy of time using burst detection
Analyze the New York Times Annotated Corpus (NYT) to discover more accurate time
  1.8M articles from January 1987 to June 2007 (20 years)
Use the burst detection algorithm [Kleinberg KDD’2002]
  Generate bursty periods of $\xi_{i,j}$ by computing a rate of occurrence from document streams
  Output bursty intervals and bursty weights, i.e., periods of occurrence and intensity
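The idea of bursty intervals and weights can be illustrated with a simplified two-state automaton over batched counts, in the spirit of Kleinberg's method. This is a sketch under assumptions, not the implementation used in the experiments: it assumes one batch per time partition with `relevant[t]` documents mentioning the entity-synonym pair out of `totals[t]` documents, a binomial cost per state, and a cost of `gamma * ln(n)` for entering the burst state. The function name `detect_bursts` is hypothetical.

```python
import math

def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Return a list of (start_index, end_index, weight) bursts."""
    n = len(relevant)
    total_docs = sum(totals)
    p0 = sum(relevant) / total_docs if total_docs else 0.0   # base rate
    if not 0.0 < p0 < 1.0:
        return []
    p1 = min(s * p0, 0.9999)            # elevated rate of the burst state
    up_cost = gamma * math.log(n)       # cost of moving from state 0 to state 1

    def cost(p, r, d):
        # Negative log-likelihood of r hits in d documents; the binomial
        # coefficient is omitted because it is the same for both states.
        return -(r * math.log(p) + (d - r) * math.log(1.0 - p))

    # Viterbi search for the cheapest state sequence, starting in state 0.
    prev = [0.0, math.inf]
    back = []
    for r, d in zip(relevant, totals):
        emit = (cost(p0, r, d), cost(p1, r, d))
        cur, ptr = [], []
        for state in (0, 1):
            from_base = prev[0] + (up_cost if state == 1 else 0.0)
            from_burst = prev[1]        # staying in or leaving a burst is free
            ptr.append(0 if from_base <= from_burst else 1)
            cur.append(min(from_base, from_burst) + emit[state])
        back.append(ptr)
        prev = cur

    # Backtrack to recover the state of every batch.
    state = 0 if prev[0] <= prev[1] else 1
    states = [0] * n
    for t in range(n - 1, -1, -1):
        states[t] = state
        state = back[t][state]

    # Group consecutive burst batches; weight = total cost saved by state 1.
    bursts, start, weight = [], None, 0.0
    for t, st in enumerate(states):
        if st == 1:
            if start is None:
                start, weight = t, 0.0
            weight += cost(p0, relevant[t], totals[t]) - cost(p1, relevant[t], totals[t])
        elif start is not None:
            bursts.append((start, t - 1, weight))
            start = None
    if start is not None:
        bursts.append((start, n - 1, weight))
    return bursts
```

With monthly batches, the returned indices map back to months, so each burst gives a candidate time period for the synonym together with an intensity score.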
Output: results from the burst-detection algorithm
Synonym             Entity                    Burst Weight    Start      End
President Reagan    Ronald Reagan             5506.858        01/1987    02/1989
President Ronald    Ronald Reagan             100.401         01/1989    03/1990
President Ronald    Ronald Reagan             67.208          07/1990    02/1993
Senator Clinton     Hillary Rodham Clinton    18.214          01/2001    10/2001
Senator Clinton     Hillary Rodham Clinton    17.732          05/2002    01/2003
Senator Clinton     Hillary Rodham Clinton    172.356         06/2003    11/2004
Classifying synonyms into two types
Definition
Class A: time-independent
  Robust to change over time; good synonym candidates for an ordinary search (no temporal criteria provided)
  E.g., “Barack Hussein Obama II” is a time-independent synonym of “Barack Obama”
Class B: time-dependent
  Related to a particular time in the past; good synonym candidates for a temporal search, where changes in semantics must be considered
  E.g., “Cardinal Joseph Ratzinger” is a time-dependent synonym of “Pope Benedict XVI” before 2005
Ranking time-independent synonyms
Definition
Time-independent synonyms are weighted by a mixture model of a temporal feature and a frequency feature:
$$TIDP(s_j) = \mu \cdot pf(s_j) + (1 - \mu) \cdot tf(s_j)$$
  $pf(s_j)$ is the time partition frequency, i.e., the number of time partitions in which $s_j$ occurs
  $tf(s_j)$ is the averaged term frequency of $s_j$ over all time partitions, $tf(s_j) = \frac{\sum_i tf(s_j, p_i)}{pf(s_j)}$
  $\mu$ balances the importance of the temporal feature against the frequency feature
  $\mu = 0.5$ yields the best performance in the experiments
Intuition
The model measures the popularity of synonyms based on two factors
  Robustness to change over time, i.e., the more partitions a synonym occurs in, the more robust to time it is
  High usage over time, i.e., a high value of averaged frequencies over time
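The TIDP score can be computed directly from per-partition term frequencies, as in the following sketch. It uses raw counts for both features; whether and how the features are normalized is not stated on the slide, so treat the absolute values as illustrative. The function name `tidp` and the sample frequencies are hypothetical.

```python
def tidp(tf_by_partition, mu=0.5):
    """TIDP(s_j) = mu * pf(s_j) + (1 - mu) * tf(s_j) for one synonym.

    tf_by_partition: term frequencies of the synonym, one entry per time partition.
    """
    occupied = [tf for tf in tf_by_partition if tf > 0]
    pf = len(occupied)                 # number of partitions in which s_j occurs
    if pf == 0:
        return 0.0
    avg_tf = sum(occupied) / pf        # sum_i tf(s_j, p_i) / pf(s_j)
    return mu * pf + (1 - mu) * avg_tf

# Hypothetical counts over four time partitions.
scores = {
    "Barack Hussein Obama II": tidp([3, 0, 5, 4]),   # pf = 3, avg tf = 4.0
    "Senator Obama": tidp([0, 0, 6, 0]),             # pf = 1, avg tf = 6.0
}
ranked = sorted(scores, key=scores.get, reverse=True)
```

With $\mu = 0.5$, the first synonym scores 3.5 and the second 3.5 as well, which shows how a synonym present in many partitions with moderate frequency can tie with one that is frequent in a single partition; other values of $\mu$ break the tie in one direction or the other.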
Ranking time-dependent synonyms
Definition
Given time $t_k$, time-dependent synonyms at $t_k$ are weighted by
$$TDP(s_j, t_k) = tf(s_j, t_k)$$
  $tf(s_j, t_k)$ is the term frequency of $s_j$ at $t_k$
Intuition
Only term frequencies are used to measure the importance of synonyms
Time partitions are not considered because only synonyms in a particular time period $t_k$ are of interest
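Ranking by TDP amounts to sorting the synonyms of an entity by their term frequency in the partition of interest. A minimal sketch, assuming frequencies are stored per synonym and per time partition; the function name `rank_time_dependent` and the counts in the usage note are hypothetical.

```python
def rank_time_dependent(tf_at_time, tk, top_k=3):
    """Rank synonyms by TDP(s_j, t_k) = tf(s_j, t_k), highest first.

    tf_at_time: dict mapping synonym -> {time partition -> term frequency}.
    """
    scores = {syn: parts.get(tk, 0) for syn, parts in tf_at_time.items()}
    # Synonyms that do not occur at t_k are dropped; ties are broken alphabetically.
    ranked = sorted((s for s in scores if scores[s] > 0),
                    key=lambda s: (-scores[s], s))
    return ranked[:top_k]
```

For example, if "Cardinal Ratzinger" occurs 12 times and "Joseph Ratzinger" 5 times in the partition `"2004-06"`, the function returns them in that order and omits any synonym that only appears in other partitions.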
Overview of experiments
Our experimental evaluation is divided into three main parts:
1. Extracting and improving the accuracy of time of synonyms
2. Query expansion using time-independent synonyms
3. Query expansion using time-dependent synonyms
Extracting and improving time of synonyms (experiment setting)
Data collection:
  The whole history of English Wikipedia
    All pages and revisions from 03/2001 to 03/2008: 85 snapshots (01/03/2001, 01/04/2001, . . ., 01/03/2008), about 2.8 terabytes
    4 additional snapshots (24/05/2008, 27/07/2008, 08/10/2008, 06/03/2009)
  The New York Times Annotated Corpus contains over 1.8 million articles from January 1987 to June 2007
Tools:
  MWDumper http://www.mediawiki.org/wiki/Mwdumper
  Oracle Berkeley DB version 4.7.25
  Burst detection algorithm implemented by Kleinberg
    Number of states: 2
    Ratio of rate of second state to base state: 2
    Ratio of rate of each subsequent state to previous state: 2
    Gamma parameter of the HMM: 1
Measurement: Accuracy
Query expansion using time-independent synonyms (experiment setting)
Data collection:
  TREC Robust Track (2004)
  250 topics (topics 301-450 and topics 601-700)
Tools:
  Terrier, an open source search engine developed by the University of Glasgow
  BM25 probabilistic model with Generic Divergence From Randomness (DFR) weighting
  Expand the query with the top-k synonyms $\{s_1, \ldots, s_k\}$, using their TIDP scores as boosting weights:
  $$q_{exp} = q_{org}\; s_1{}^{\wedge}w_1\; s_2{}^{\wedge}w_2\; \ldots\; s_k{}^{\wedge}w_k$$
  where $s_i{}^{\wedge}w_i$ denotes synonym $s_i$ boosted by weight $w_i$
Measurement: Mean Average Precision (MAP), R-precision, and Recall
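Building the expanded query string is a matter of appending the top-k synonyms with their weights in `term^weight` form. The sketch below is an illustration only: the function name `expand_query` is hypothetical, and quoting multi-word synonyms as phrases before the `^weight` suffix is an assumption that should be checked against the query syntax of the search engine actually used.

```python
def expand_query(original, weighted_synonyms, k=3):
    """Return q_org followed by the top-k synonyms, each boosted by its weight.

    weighted_synonyms: dict mapping synonym -> boosting weight (e.g. a TIDP score).
    """
    # Highest weight first; ties are broken alphabetically for a stable result.
    top = sorted(weighted_synonyms.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    parts = [original] + [f'"{syn}"^{weight:.2f}' for syn, weight in top]
    return " ".join(parts)
```

Passing an empty synonym dictionary returns the original query unchanged, which corresponds to the unexpanded baseline.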
Query expansion using time-dependent synonyms (experiment setting)
Data collection:
  NewsLibrary.com contains more than 182 million newspaper articles from thousands of credible U.S. publications
  Select 20 strongly time-dependent queries
Measurement: Precision at 10, 20, and 30 retrieved documents
Examples of temporal queries
Temporal Query: Named Entity      Temporal Query: Time Period    Synonym
American Broadcasting Company     1995-2000                      Disney/ABC
Barack Obama                      2005-2007                      Senator Obama
Eminem                            1999-2004                      Slim Shady
George H. W. Bush                 1988-1992                      President George H.W. Bush
George W. Bush                    2000-2007                      President George W. Bush
Hillary Rodham Clinton            2001-2007                      Senator Clinton
Kmart                             1987-1987                      Kresge
Pope Benedict XVI                 1988-2005                      Cardinal Ratzinger
Ronald Reagan                     1987-1989                      Reagan Revolution
Virgin Media                      1999-2002                      Telewest Communications
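Precision at k, the measure used for the temporal queries, is the fraction of the top-k retrieved documents that are relevant. A minimal sketch with hypothetical document identifiers; positions beyond the end of a short result list count as non-relevant.

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of the first k retrieved documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in relevant_ids)
    # Divide by k (not by the number retrieved), so short result lists are penalized.
    return hits / k
```

Averaging this value over all 20 temporal queries at k = 10, 20, and 30 gives the P@10, P@20, and P@30 figures.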
Extracting and improving time of synonyms (experimental results)
Statistics and accuracy of entity-synonym relationships extracted from Wikipedia
NER Method    #NE          #NE-Syn.     Avg. Syn. per NE    Accuracy (%)
BPF-NERW      2,574,319    3,199,115    1.2                 51
BPCF-NERW     473,829      488,383      1.0                 73
BPF-NERW: Bunescu and Paşca’s Named Entity Recognition of Wikipedia titles with Filtering criteria: 1) time interval < 6 months, and 2) average frequency < 2
BPCF-NERW: BPF-NERW with only the Categories of “people”, “organization” or “company”
Note: 500 randomly selected entity-synonym relationships were used to assess the accuracy of time periods
Robust2004 query statistics
Two methods for recognizing named entities in queries:
1. Exactly matched Wikipedia page (MW-NERQ)
2. Exactly matched Wikipedia page and top-k related Wikipedia pages (MRW-NERQ)
   k = 2; a value of k > 2 brings noise into the NERQ process
Number of queries using the two NER methods
Type                MW-NERQ    MRW-NERQ
Named entity        42         149
Not named entity    208        101
Total               250        250
Query expansion using time-independent synonyms (experimental results)
MAP, R-precision, and Recall (* indicates statistically significant at p < 0.05)
            MW-NERQ                            MRW-NERQ
Method      MAP       R-precision    Recall    MAP      R-precision    Recall
PM          .2889     .3309          .6185     .2455    .2904          .5629
PRF         .3469     .3711          .6944     .3002    .3227          .6761
SQE-PRF     .3608*    .3652          .7405*    .2507    .2665          .5932
SWQE-PRF    .3653*    .3861*         .7388     .2885    .3080          .6504
PM: Probabilistic Model without query expansion
PRF: Pseudo Relevance Feedback using the Rocchio algorithm
SQE-PRF: Top-k Synonyms Query Expansion with Pseudo Relevance Feedback
SWQE-PRF: Top-k Synonyms TIDP-Weighted Query Expansion with Pseudo Relevance Feedback
Note: 40 expansion terms, top-10 retrieved documents, DFR term weighting model, i.e., Bose-Einstein 1
Query expansion using time-dependent synonyms (experimental results)
P@10, P@20 and P@30 (* indicates statistically significant at p < 0.05)
Method    P@10      P@20      P@30
TQ        .1000     .0500     .0333
TSQ       .5200*    .3800*    .2800*
TQ: search a Temporal Query, i.e., a keyword $w_q$ and time $t_q$
TSQ: search a Temporal Query and expand with Synonyms with respect to time $t_q$
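The difference between TQ and TSQ can be illustrated by selecting, for a temporal query, the synonyms whose time period overlaps the query period. This sketch assumes entity-synonym relationships stored as tuples with year-level start and end values; the function name `synonyms_for_period` and the sample validity periods are hypothetical, loosely modeled on the example queries.

```python
def synonyms_for_period(relations, entity, query_start, query_end):
    """Return synonyms of `entity` whose period overlaps [query_start, query_end].

    relations: iterable of (entity, synonym, start_year, end_year) tuples.
    """
    return sorted({
        synonym
        for ent, synonym, start, end in relations
        # Two closed intervals overlap when each starts before the other ends.
        if ent == entity and start <= query_end and query_start <= end
    })

# Hypothetical validity periods, for illustration only.
relations = [
    ("Pope Benedict XVI", "Cardinal Ratzinger", 1988, 2005),
    ("Hillary Rodham Clinton", "Senator Clinton", 2001, 2007),
]
expansion = synonyms_for_period(relations, "Pope Benedict XVI", 1990, 2004)
```

A TQ run would search with the keyword and time alone, while a TSQ run would add the returned synonyms to the query before searching the same time period.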
QUEST: Query Expansion using Synonyms over Time
Conclusions and future work
Extract time-based synonyms from Wikipedia
Improve the time of synonyms using NYT
Perform query expansion using the time-based synonyms
Conduct extensive experiments showing a significant increase in retrieval effectiveness
Future work:
  Combine time-dependent synonyms and temporal language models to determine the time of queries
  Exploit temporal information extraction techniques to discover synonyms at particular time points
  Improve temporal text mining/clustering using time-based synonyms
QUEST: Query Expansion using Synonyms over Time
http://research.idi.ntnu.no/wislab/quest/
Thank you!