Intent-Based Categorization of Search Results
Using Questions from Web Q&A Corpus
Soungwoong Yoon, Adam Jatowt, and Katsumi Tanaka
Graduate School of Informatics, Kyoto University
Yoshida Honmachi, Sakyo, Kyoto 606-8501, Japan
{yoon,adam,tanaka}@dl.kuis.kyoto-u.ac.jp
Abstract. User intent is defined as a user’s information need. Detecting intent
in Web search helps users to obtain relevant content, thus improving their
satisfaction. We propose a novel approach to instantiating intent by using
adaptive categorization producing predicted intent probabilities. For this, we
attempt to detect factors by which intent is formed, called intent features, by
using a Web Q&A corpus. Our approach was motivated by the observation that
questions related to queries are effective for finding intent features. We extract
a set of categories and their intent features automatically by analyzing questions
within a Web Q&A corpus, and categorize search results using these features.
The advantages of our intent-based categorization are twofold: (1) presenting
the most probable intent categories to help users clarify and choose starting
points for Web searches, and (2) adapting sets of intent categories for each
query. Experimental results show that distilled intent features can efficiently
describe intent categories, and search results can be efficiently categorized
without any human supervision.
Keywords: User intent detection, Intent-based categorization.
1 Introduction
The continuous growth of the Web has made it increasingly important in our lives, yet
it is still difficult to find appropriate information we intend to seek. Search engines are
the most frequently used ‘gateways’ for entering the territory of the Web, and
conventional search engines return pages that are most relevant to queries issued
under the practical assumption that users’ search needs have been explicitly
represented by the search queries. However, queries are generally insufficient to fully
describe a user’s information needs as they contain very short keyword phrases [7],
which may be ambiguous. Moreover, users frequently modify their search needs
while viewing the search results or browsing Web pages.
A user’s information need is defined as intent [1]. Intent detection in Web
search is a kind of ‘holy grail’ for search engines, as it promises better user
experience and improved satisfaction. There have been several studies on detecting intent
within queries by analyzing query logs [1,8,10]. However, detecting the complete user
intent is not easy and is often impossible, because the user’s search intent is
generally very subjective and ambiguous. The precise intent may even be difficult for
the users themselves to determine.

G. Vossen, D.D.E. Long, and J.X. Yu (Eds.): WISE 2009, LNCS 5802, pp. 145–158, 2009.
© Springer-Verlag Berlin Heidelberg 2009
As an alternative, conventional approaches have focused on categorizing search results
using external knowledge and click-through data [2,9]. However, as shown in Fig.
1(a), these techniques prejudge a fixed number of intent-category pairs rather than
generating an exhaustive set of intent possibilities for the query. Moreover, they
attempt to detect the statistically dominant intent of each search result, which means
they overlook the fact that queries can represent many variations of intent.
Fig. 1. Comparison of search results’ categorization: (a) conventional approach; (b) our approach
In contrast to conventional approaches, we propose a novel method of
representing the most probable intent possibilities to users to improve their Web
experience. Our method relies on extracting the factors through which intent is formed,
called intent features, from a Web Q&A corpus, since questions contain useful and
typical expressions that reveal the questioner’s intent.
In the proposed approach, depicted in Fig. 1(b), the intent features of queries are
extracted using question phrases and their categories1 appearing in a Web Q&A corpus.
Next, using common categories of such questions, we can determine the candidate
intent features of a user query represented by a set of categories, called intent
categories, without any human supervision. Moreover, these intent categories are
adaptively changed according to the query, because there are numerous kinds of
categories in Web Q&A corpus2. Finally, we categorize search results into one or
more intent categories by calculating their matches with intent features of the query.
We present such categorized search results to the users in real time as an alternative to
a conventional ranking of search results returned by current search engines.
The advantages of our intent-based categorization of Web search results are that
users should be able to more easily determine their own search needs and satisfy their
search goals by seeing the categorized search results. More precisely, the advantages
are two-fold: (1) presenting the most probable intent categories to help users clarify
1 For example, for the query ‘Kyoto’ there are many question phrases related to this query in
a Web Q&A corpus, such as ‘Where is the cheapest hotel in Kyoto?’, ‘How to go to Kyoto?’ or
‘When will the United States sign the Kyoto Protocol?’. These question phrases are manually
assigned within the Web Q&A corpus to their corresponding categories, such as ‘Travel’, ‘Japan’
or ‘Health’ for the above examples, respectively.
2 For example, Yahoo! Answers (http://answers.yahoo.com), one widely used Web Q&A corpus,
contains 26 top-level categories and 1,640 sub-categories.
and choose starting points for Web searches, and (2) adapting sets of intent categories
for each query.
Our main contributions are as follows: (a) we propose intent-based categorization of
Web search results, applying a Web Q&A corpus to the problem of intent detection,
(b) we describe a method for such categorization based on linguistic analysis of
questions, and (c) we evaluate it through experiments.
The remainder of this paper is organized as follows. Related work and the
background are presented in Section 2. We explain our methodology in Section 3 and
present the experimental results and a discussion in Section 4. The conclusion and
future research directions are given in Section 5.
2 Related Research
2.1 Intent Discovery
Research on search intent discovery originated from the analysis of click-through data
and query-intent categorization. Following the well-known query classification first
proposed by Broder [1], Jansen et al. [8] stated that user intent can be categorized into
three general intent classes: navigational, transactional and informational intent. The
characteristics of user intent have been conventionally defined by analyzing click-through data [1,8,10]. Using large amounts of data containing evidence of user
search-related activities made it possible to not only understand user needs, but also to
depict user behavior in browsing Web search results [6] or support non-informational
search intent on the Web [11].
However, the usefulness of these categorizations is limited by the data sets used
and the efficiency of post-processing. There is a risk of over-generalization,
reflected in mismatches between automatic and manual categorization. This is
because the above studies were based on their own rigid classification schemes
and were biased by the data sets they used. Furthermore, the previously proposed
methods have often failed to represent the actual user intent, as the scope of
possible intents may simply be too large and too heterogeneous to be accurately
reflected in any fixed taxonomy. Moreover, one should realize that the
information needs of Web users are constantly changing, as is the Web itself.
In this study, we take a different approach. Rather than conjecturing likely user
search intent with fixed categorization choices, we allow users themselves to choose
the search categories that they may be interested in. To achieve this, we use a Web
Q&A corpus that reflects typical needs of users in Web searches. With this kind of
data at hand we categorize search results returned by conventional search engines
allowing pages to be assigned to multiple categories. As a result, users receive
categorized search results that can serve as starting points to continue search
processes. We think that this kind of presentation of search results should help users
to better organize their search intent space and to more effectively reach search results
that truly reflect their search needs.
Our method is somewhat similar to the notion of Navigation-Aided Retrieval
(NAR) proposed by Pandit and Olson [14] as a kind of post-query user navigation.
They presented starting points for Web navigation using the information scent model
based on a hyperlink structure within the neighborhoods of returned search results and
an original query. We do not employ a link structure analysis in our work but rather
focus on query semantics and possible intent categories (distilled from Web Q&A
corpus) that Web users may have in relation to their queries.
2.2 Feature Extraction and Categorization of Web Search Results
Feature extraction is an essential task for categorizing texts and is, in fact, the basis of
its efficiency. Detecting salient features may however be sometimes very difficult
especially for short documents. Therefore, previous approaches to feature extraction
from documents have sometimes also employed additional resources. Such extensions
have involved acquiring lexical meanings from dictionaries or using external
knowledge bases like WordNet3 [13], the Open Directory Project (ODP)4 [4] and
Wikipedia5 [3].
Manual classification of Web pages such as the one done in ODP and Yahoo!
Directory6 is another simple solution that depends on human intervention. Although
manual classification guarantees a high precision, it cannot be scaled to accommodate
the Web. Therefore, automatic classification has been proposed. Clusty.org is an
example of a clustering-based search engine that groups related Web search results
based on their content. Other information can also be utilized to automatically group
pages. For example, Chaker and Ounelli [2] used URLs and logical and hyperlink
structure to capture various genres – the content, form and functionality – of Web
pages, and Hu et al. [7] used a concept graph based on distilled concepts from
Wikipedia and its link structures.
In this work, we use a Web Q&A corpus as a training set, because questions
concerning query terms are effective expressions of the potential user intent that
is hidden behind these terms. A Web Q&A corpus also supports quick matching of
terms and provides ready categories for questions, which enables very efficient
adaptive categorization.
3 Finding Intent Features and Categorizing Search Results
An overview of our approach is shown in Fig. 2. First, a user issues a query to a
conventional search engine, and search results are obtained. The same query is
also sent to the Web Q&A corpus. Our system receives questions relevant to the query
and their categories, and characterizes each category with its associated questions by
head noun extraction and term scoring using the well-known term frequency–inverse
document frequency (TFIDF) weighting scheme. Next, the returned search results are
compared against the categories using intent features. Finally, a user receives categorized search results that s/he can browse and utilize to modify subsequent queries.
3 WordNet, a lexical database for the English language. Princeton University, http://wordnet.princeton.edu
4 http://dmoz.org
5 http://www.wikipedia.org
6 http://dir.yahoo.com
Fig. 2. Overview of our proposed system
3.1 Multiple User-Intent Model
Intent is linguistically defined as a ‘purpose’ or ‘aim.’ These meanings in the Web-search environment are restricted by the constraint ‘user,’ such as ‘the perceived need
for information that leads to someone using an information retrieval system in the
first place’ [1], ‘user goals in Web search’ [10], or ‘the type of resource desired in the
user’s expression to the system’ [8].
With the above definitions we can assume that user intent in a query can be formed
with words and sentences. In other words, a user can form his/her intent using words
and sentences on the Web. Even though there may be many possible implicit
directions, the best explicit form of intent is the query itself, which gives strength to
the general assumption of the query’s importance.
Suppose that there are numerous, but limited kinds of intent in a query. In the
simplest case, the dominant search need of a user is the same as the query itself.
However, we often face the situation when there is implicit intent hidden behind the
query that cannot be directly deduced. For example, we can easily guess that the
dominant intent of the query ‘kyoto travel’ is some kind of ‘general information about
travel to Kyoto city.’ However, the implicit user intent may actually be more complex
such as the need for ‘the cheapest way to travel to Kyoto city,’ or for more specific
information on Kyoto such as ‘hotels in Kyoto.’ We regard these differences in the
starting points of the same query as the Multiple User-Intent Model. Even though the
query is the same, the user’s starting points of Web search can be different (varying
intent for the same query). To use our model, we made two assumptions.
• Assumption 1. A user always has some intent behind an issued query7. S/he can
express her/his search intent by using text. We assume that users have intent in all
queries, which means that their queries contain explicit intent expressions
described directly by query terms, as well as implicit expressions that can be
represented by query synonyms or coordinated terms.
• Assumption 2. The Web is sufficient to discover user intent. We assume that when
the user browses information on the entire Web, s/he can find intended search
results, because the Web is the largest corpus of knowledge.8
7 Exceptional cases such as input mistakes, confusion, or spelling corrections are omitted.
8 This assumption has been used implicitly in all intent research, such as work based on
utilizing user click histories or using words and phrases in language models.
If the user already has a certain amount of knowledge, the query may contain
his/her actual intent and the probability of finding meaningful Web information
increases. However, when the user is a novice or has no idea about the query, s/he
may want to realize her/his actual information need or its clue(s) by browsing search
results.
3.2 Extracting Intent Features
The main problem is how to efficiently estimate the set of possible intent choices for a
given query. Conventional search engines deduce a user’s intent by using additional
information on the query such as the context of search results or its log data. They
then present search results after having conducted prior preprocessing steps such as
query expansion or spelling correction. The returned search results are not, however,
grouped according to their meaning or potential intent-based categories; they are
merely ranked according to the pages’ relevance scores.
We have chosen a Web Q&A corpus as the reference knowledge base in our research.
Aggregated user questions related to a query are strong evidence of the typical search
intentions behind the query and can be effectively used for analyzing possible intent.
By sending a query to the Web Q&A corpus, we can receive useful hints and topics
connected to questions, which are the basis of our intent prediction process.
Fig. 3. Example of sending a query to Yahoo! Answers
In the example in Fig. 3 we show results obtained for the query ‘kyoto travel’ from
Yahoo! Answers. The question ‘What is the cheapest way to go to Kyoto?’ is
included in the category ‘Japan’. It is an example of a particular possible intent that
users interested in travelling to Kyoto may have. We can then use not only the
category ‘Japan’ but also the expression ‘cheapest way,’ which is the core meaning of
the question and is directly connected with category ‘Travel’ for describing this
particular user intent.
Using a query, questions Q = [q1′, q2′, ... qn′] with their matched categories
extracted from the Web Q&A corpus, C′ = [c1′, c2′, ... cn′], are collected. Next, we filter the
set of unique categories appearing in C′, expressed as C = {c1, c2, ... cn}, as well as the
distilled sets of questions by category, S = {s1, s2, ... sn}, with n ≤ n′. Here si denotes the
set of questions included in category ci. From now on we will call the categories C = {c1,
c2, ... cn} intent categories.
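As a minimal sketch, the collection of intent categories C and their question sets S can be implemented as a simple grouping step; the (question, category) pairs below are hypothetical stand-ins for what a Web Q&A corpus would return for the query ‘kyoto travel’:

```python
from collections import defaultdict

# hypothetical (question, category) pairs returned by a Web Q&A corpus
qa_results = [
    ("What is the cheapest way to go to Kyoto?", "Japan"),
    ("Where is the cheapest hotel in Kyoto?", "Travel"),
    ("How long is the train ride to Kyoto?", "Japan"),
]

def intent_categories(qa_pairs):
    """Group questions by category: the keys are the unique intent
    categories C, and each value is the question set s_i of category c_i."""
    by_cat = defaultdict(list)
    for question, category in qa_pairs:
        by_cat[category].append(question)
    return dict(by_cat)

S = intent_categories(qa_results)
print(sorted(S))  # the intent categories C
```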
We propose two methods of extracting the features for each intent category.
• Head-noun set of questions: The main noun phrase of a sentence contains the
focus of the sentence, and the headword can be thought of as the ‘important’ noun
within the phrase [12]. In our study, a question is an ordinary sentence, and the
head noun of a question within a certain intent category is regarded as a key factor
of the intent feature of that category. We extend the methodology of Metzler et al. [12]
to extract the meanings of head nouns as the main foci of questions, called head-noun sets, as follows.
Pseudo code for extracting the head-noun set of each question q in question set sj of category cj:

    For each q in sj:
        POS-tag q into q[1], q[2], ..., q[o]
        For each q[i] where 1 ≤ i ≤ o:
            If q[i] is a Noun:
                If q[i+1] is a Noun:
                    hnq = q[i] + q[i+1]
                Else:
                    hnq = q[i]
                End If
                If q[i-1] is an Adjective:
                    hnq = q[i-1] + hnq
                End If
                Break                  (only the first noun phrase is taken)
            End If
        End For
        Stem hnq into hnq′
        Insert hnq′ into HNcj with its count
    End For
We assume the first noun or noun phrase is the head-noun phrase for each question
and extract the adjective of that phrase if there is any. The collected head-noun
phrases are grouped into head-noun sets after stemming. For example, in the topic
‘Travel’ the head-noun phrases ‘cheap way,’ ‘cheap ways,’ and ‘cheapest way’ are
treated as the same. Finally, the jth intent category has l head-noun sets HNcj = {hncj,1,
hncj,2, ... hncj,l}. These head-noun sets are later used for computing their inclusion
within the Web search results.
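A minimal sketch of this head-noun extraction, assuming questions arrive already POS-tagged (Penn-Treebank-style tags) and using a crude suffix-stripping stemmer as a stand-in for the Porter stemmer used in the paper:

```python
from collections import Counter

def head_noun(tagged):
    """Return the head-noun phrase of a POS-tagged question: the first
    noun (joined with an immediately following noun), prefixed by a
    preceding adjective if one exists; None if the question has no noun."""
    for i, (word, tag) in enumerate(tagged):
        if tag.startswith("NN"):
            hn = word
            if i + 1 < len(tagged) and tagged[i + 1][1].startswith("NN"):
                hn = word + " " + tagged[i + 1][0]
            if i > 0 and tagged[i - 1][1].startswith("JJ"):
                hn = tagged[i - 1][0] + " " + hn
            return hn
    return None

def stem(phrase):
    # crude stand-in for the Porter stemmer: strip plural/superlative suffixes
    def s(w):
        for suf in ("est", "s"):
            if w.endswith(suf) and len(w) > len(suf) + 2:
                return w[: -len(suf)]
        return w
    return " ".join(s(w.lower()) for w in phrase.split())

def head_noun_sets(tagged_questions):
    """HN_cj: the head-noun sets of one category, with counts."""
    counts = Counter()
    for q in tagged_questions:
        hn = head_noun(q)
        if hn:
            counts[stem(hn)] += 1
    return counts

questions = [
    [("What", "WP"), ("is", "VBZ"), ("the", "DT"),
     ("cheapest", "JJ"), ("way", "NN"), ("to", "TO"), ("go", "VB")],
    [("Any", "DT"), ("cheap", "JJ"), ("ways", "NNS"), ("to", "TO"), ("travel", "VB")],
]
print(head_noun_sets(questions))  # both questions map to 'cheap way'
```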
• TFIDF vector of a question set: We distill nouns by POS tagging and calculate
TFIDF vectors of the question set for each intent category. As each question in an
intent category is treated as a document in the traditional TFIDF scheme, the total
number of documents corresponds to the total number of questions in this category.
We call this adaptation of the traditional TFIDF weighting scheme question-TFIDF.
    qtfidfcj,m = ( nj′,m / Σm nj′,m ) ∗ loge ( |{sj′ | sj′ ∈ cj}| / |{sj′ | tj′,m ∈ sj′ and sj′ ∈ cj}| )        (1)

where qtfidfcj,m is the question-TFIDF value for the mth noun of the jth intent category, tj′,m is the
mth noun in the j′th question, nj′,m is the number of occurrences of tj′,m, sj′ is the j′th question
and cj is the jth intent category.
The question-TFIDF vectors of jth intent category qTFIDFcj = [qtfidfcj,1, qtfidfcj,2,...
qtfidfcj,m] are later used to compute their similarity with the Web search results.
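A small sketch of question-TFIDF under the definition in Eq. (1), treating each question as a document; the paper does not state how per-question scores are aggregated per category, so keeping the maximum across questions is our assumption, and the noun lists below are illustrative:

```python
import math
from collections import Counter

def question_tfidf(questions):
    """questions: list of noun lists, one per question in a category.
    Returns {noun: qtfidf} following Eq. (1): term frequency computed
    per question, IDF over the questions of the category (natural log)."""
    n_questions = len(questions)
    df = Counter()                      # in how many questions a noun appears
    for nouns in questions:
        df.update(set(nouns))
    scores = {}
    for nouns in questions:
        tf = Counter(nouns)
        total = sum(tf.values())
        for noun, n in tf.items():
            idf = math.log(n_questions / df[noun])
            # assumption: keep the highest score across questions
            scores[noun] = max(scores.get(noun, 0.0), (n / total) * idf)
    return scores

cat_questions = [
    ["way", "kyoto"],    # nouns of 'What is the cheapest way to go to Kyoto?'
    ["hotel", "kyoto"],
]
vec = question_tfidf(cat_questions)
print(vec)
```

Note that a noun appearing in every question of a category (here ‘kyoto’) receives weight 0, which is the usual IDF behavior.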
3.3 Intent Score
For search results R = [r1, r2,..., rk] of a given query, the set of intent scores of each ith
search result Ii = [Ii(c1), Ii(c2),..., Ii(cn)] is calculated using either head noun inclusion
or cosine similarity between TFIDF weighted vectors. Here the intent score Ii(cj) of a
given search result i and intent category j defines the correspondence of that search
result to the particular intent category. We use two methods for intent score
calculation:
• Counting the exact matches of the head-noun sets of the jth intent category within the ith
search result, using HNcj, and normalizing by the number of head-noun sets in the jth
intent category:

    ScoreHN(ri, cj) = |{termi′,ri | termi′,ri ∈ HNcj}| / |HNcj|        (2)

where termi′,ri is the i′th noun term in the ith search result.
• Measuring the cosine similarity between the term vector of ri and the jth intent category
vector qTFIDFcj:

    ScoreTFIDF(ri, cj) = (ri · qTFIDFcj) / (||ri|| · ||qTFIDFcj||)        (3)
Finally, the intent score is represented by a weighted sum of the HN and qTFIDF scores.
It indicates the inclusion strength of the ith result in the jth intent category. Parameter α
is used to control the influence of the HN and qTFIDF factors.

    Ii(cj) = α · ScoreHN(ri, cj) + (1 − α) · ScoreTFIDF(ri, cj)        (4)
The ith search result can be included in multiple intent categories by using intent score
Ii(cj). Search results are categorized according to their intent scores and head-noun
sets. For example, in Fig. 4(a), suppose we have 4 search results returned for the query
‘kyoto travel’. Using questions from the Web Q&A corpus, we find two intent
categories, ‘Japan’ and ‘Hotels’, and distill the HNJapan / HNHotels sets and qTFIDFJapan /
qTFIDFHotels vectors. In Fig. 4(b), each search result is categorized by its intent score,
and both the ‘Japan’ and ‘Hotels’ categories include the 1st search result.
Fig. 4. Categorization example: (a) search results; (b) search result categorization
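The scoring of Eqs. (2)–(4) can be sketched as follows; the head-noun set, qTFIDF weights, and result nouns below are illustrative placeholders, not values from the paper:

```python
import math
from collections import Counter

def score_hn(result_nouns, hn_set):
    """Eq. (2): fraction of a category's head-noun sets that appear
    among the noun terms of a search result (title + snippet)."""
    if not hn_set:
        return 0.0
    return len(hn_set & set(result_nouns)) / len(hn_set)

def score_tfidf(result_nouns, qtfidf_vec):
    """Eq. (3): cosine similarity between the result's term-count
    vector and the category's question-TFIDF vector."""
    tf = Counter(result_nouns)
    dot = sum(tf[t] * w for t, w in qtfidf_vec.items())
    if dot == 0:
        return 0.0
    norm_r = math.sqrt(sum(v * v for v in tf.values()))
    norm_c = math.sqrt(sum(w * w for w in qtfidf_vec.values()))
    return dot / (norm_r * norm_c)

def intent_score(result_nouns, hn_set, qtfidf_vec, alpha=0.75):
    """Eq. (4): weighted combination of the HN and qTFIDF scores."""
    return (alpha * score_hn(result_nouns, hn_set)
            + (1 - alpha) * score_tfidf(result_nouns, qtfidf_vec))

# illustrative category features and search-result nouns
hn = {"hotel", "cheap way"}
qv = {"kyoto": 0.2, "hotel": 0.5}
nouns = ["hotel", "kyoto", "station"]
print(round(intent_score(nouns, hn, qv), 4))
```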
4 Experiments
To collect questions, we use Yahoo! Answers, which is a large corpus of user
questions and answers started in December 2005. As a popular Internet reference,
Yahoo! Answers has more than 10 million questions, multiple answers to each
question with their validation scores, 24 top-level categories (TLCs) and their
numerous sub-categories9 [5]. This data set is useful for connecting a certain term
with its corresponding questions and answers. The Yahoo! Answers API10 enables instant
access by query and provides XML-formatted responses. All experimental data was
collected and the experiments were done in April 2009.
We use the AOL 500K User Session Collection11 to obtain the queries used for
evaluation. This query log contains 10,154,742 unique queries issued from March 1 to
May 31, 2006. Twenty queries were randomly chosen from the top 50 most frequent AOL
queries for our experiments. Table 1 lists the chosen queries together with the average
numbers of search results returned by Yahoo! and of questions returned by Yahoo!
Answers for these queries.
Table 1. Queries and their characteristics

Query: american idol, bank of america, ebay, google, internet, mapquest, myspace,
weather, yahoo, southwest airlines, walmart, orbitz, home depot, horoscopes,
yellow pages, cingular, craigslist, msn, myspace layouts, sears

Average Yahoo! hits                                     65,635,117
Average Yahoo! Answers hits                             175,931
Average number of intent categories in 200 questions    38
4.1 Choosing Number of Questions
First, we have to choose the number of questions to be taken from Yahoo! Answers
API for each query. The higher is the number of questions extracted, the higher is the
resulting number of categories, however, at the same time, the amount of noise
increases and consequently the system’s performance diminishes. Therefore, it is
important to choose the appropriate number of questions to be analyzed.
For estimating the appropriate number of questions, we take the top 50 most
frequent AOL queries excluding duplicates. We then collect 20, 50, 100, 200, 500 and
1000 questions (six question sets) including their categories obtained by sending the
top 50 AOL queries to Yahoo! Answers API. Then we extract the unique categories in
each question set to create the list of unique categories within each question set. We
next compare the overlap of unique categories in the 1000 question set with the ones
in the remaining question sets treating the former as ground truth data. In this way we
calculate the precision and recall of the question sets. The Fβ measure is used for choosing
the number of questions as follows. As precision is more important than recall for
showing intent possibilities to the user, we set β = 0.5 in our experiment.
9 Yahoo! Answers had 26 TLCs, 326 second-level sub-categories and 1,314 third-level
sub-categories, with duplications, in April 2009. These categories are expected to change
continuously in the future.
10 http://developer.yahoo.com/answers/
11 http://www.gregsadetsky.com/aol-data/
    Fβ = (1 + β²) · precision · recall / (β² · precision + recall)        (5)
As shown in Fig. 5, we found that a 200-question set is optimal. Consequently,
we decided to use 200 questions for each query to detect intent features.
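Eq. (5) is the standard F-beta measure; a minimal sketch, with illustrative precision/recall values for a question set scored against the 1000-question ground truth:

```python
def f_beta(precision, recall, beta=0.5):
    """Eq. (5): F-beta measure; beta < 1 weights precision over recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# illustrative: category overlap of a 200-question set against the
# 1000-question set treated as ground truth
print(round(f_beta(0.7, 0.6), 3))
```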
[Figure: Fβ=0.5 measure (ranging from about 0.5 to 0.7) for question sets of 20, 50, 100, 200, 500 and 1000 questions]
Fig. 5. Fβ=0.5 measure result
4.2 Evaluation
From the 200 questions for each query returned by Yahoo! Answers, we extract head-noun sets HN and calculate the qTFIDF vectors of each intent category using an
English morphological analyzer12 and the Porter stemmer [15].
The terms within the 50 search results returned by Yahoo! for each query are
extracted from the returned titles and snippets. Using these terms the search results
are then categorized based on their intent scores. Table 2 shows an example of
categorization results for query ‘southwest airlines’.
Table 2. Categorization example (query: southwest airlines)

Intent Category         Head-noun set        Search result                                    Intent score
Air Travel              Airlines             50. Southwest Airlines                           0.2389
                                             46. Southwest Airlines – Mahalo                  0.2363
                                             42. Southwest Airlines                           0.2338
                                             45. Southwest Airlines – Flight, Airfare, …      0.2210
                                             49. Southwest Airlines Raises Fares – cbs11tv    0.2191
                        Flight               4. Southwest Vacations – Vacation Packages…      0.0985
                                             1. Southwest Airlines                            0.0939
                                             2. Southwest Airlines Reservations               0.0893
Corporations            Southwest airlines   26. Southwest Airlines News                      0.3164
                                             10. Southwest Airlines – Wikipedia               0.2964
                                             40. Southwest Airlines – USATODAY.com            0.2933
                                             28. The Southwest Airlines Chinese New Year...   0.2903
                                             46. Southwest Airlines – Mahalo                  0.2900
Packing & Preparation   Baggage              42. Southwest Airlines                           0.4138
                                             23. Southwest Airlines Flight 1455 – Wikipedia   0.4043
                                             11. Southwest Airlines Information               0.3986
                                             46. Southwest Airlines – Mahalo                  0.3939
                                             29. AIRLINE BIZ Blog | The Dallas News           0.3892
12 An English part-of-speech tagger with bidirectional interface, Tsujii Laboratory, University
of Tokyo: http://www-tsujii.s.u-tokyo.ac.jp
To check the efficiency of the HN and qTFIDF factors, parameter α is used in three
different ways: (1) qTFIDF only (α = 0): only the qTFIDF score is used; (2) HN only
(α = 1): only the HN score is used; (3) Hybrid (α = 0.75): both the qTFIDF and HN
schemes are used, with head-noun sets weighted more heavily than qTFIDF for
assessing the relevant intent categories of search results.
A search result’s inclusion in each intent category is assessed in two different ways:
(a) with threshold: we use an intent-score threshold of 0.25; here, the number of
intent categories is reduced, as intent categories are not collected when all of
their assigned search results have intent scores below 0.25; (b) without
threshold: we collect all intent categories that have any search results assigned.
We assume that at most 20 intent categories can conveniently be shown for a
query, which is in fact the same as the requirement for convenient presentation
of search results mentioned in [10]. There is a trade-off: showing fewer intent
categories is more convenient for users, but naturally this may increase the
omission ratio of potentially relevant intent categories. To check how close
the number of collected intent categories is to 20, we use a one-sided optimum
deviation – a standard deviation in which the mean number of intent categories is
replaced by 20, excluding the cases with fewer than 20 intent categories. The larger
the one-sided optimum deviation, the more difficult it is for users to recognize
useful intent categories.
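One reading of this metric can be sketched as follows; the paper gives no formula, so the exclusion rule (ignore queries that already return fewer than 20 categories) and the sample counts are our assumptions:

```python
import math

def one_sided_optimum_deviation(category_counts, optimum=20):
    """Standard deviation computed around a fixed optimum (20 categories)
    instead of the sample mean, ignoring queries that return fewer
    than the optimum number of intent categories."""
    over = [c for c in category_counts if c >= optimum]
    if not over:
        return 0.0
    return math.sqrt(sum((c - optimum) ** 2 for c in over) / len(over))

# illustrative per-query category counts
print(round(one_sided_optimum_deviation([12, 20, 25, 30]), 3))
```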
To evaluate category extraction efficiency, we use the precision of intent categories,
expressed as the fraction of correct intent categories among all intent categories
collected for each query. We decided whether the assigned intent categories were
correct after manually checking page content.
Fig. 6(a) and (b) show the number of intent categories and the one-sided optimum
deviation. We can see that the with-threshold cases generally approach 20
categories and deviate less than the without-threshold cases.
[Figure: (a) number of intent categories (roughly 0–40) and (b) one-sided optimum deviation (roughly 0–30) for qTFIDF, HN and Hybrid, each with and without threshold]
Fig. 6. Result of category analysis
As seen in Fig. 7, the precision of extracting intent categories from Yahoo!
Answers generally exceeds 0.7 and reaches a maximum of 0.784 in the case of HN
only with threshold. This means that questions in Yahoo! Answers are useful for
extracting relevant intent categories for queries. Precision decreases when no
threshold is used, in both the qTFIDF and hybrid settings.
[Figure: intent category precision (roughly 0.64–0.80) for qTFIDF, HN and Hybrid, with and without threshold]
Fig. 7. Intent category precision
Next, we use the mean average precision (MAP) to evaluate the intent-based
categorization of search results. We employ a binary decision to check whether search
results are correctly included in given intent categories, and MAP@1 and MAP@5 are
used to compare intent-based categorization efficiency.
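MAP@k over binary relevance judgments can be computed as below; the ranked lists in the example are illustrative, not the paper's data:

```python
def average_precision_at_k(relevant_flags, k):
    """AP@k for one ranked list of binary relevance judgments
    (1 = result correctly assigned to the category)."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevant_flags[:k], start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def map_at_k(ranked_lists, k):
    """MAP@k averaged over the ranked result lists of all intent categories."""
    if not ranked_lists:
        return 0.0
    return sum(average_precision_at_k(r, k) for r in ranked_lists) / len(ranked_lists)

# e.g. two intent categories with their top-5 binary judgments
print(map_at_k([[1, 0, 1, 0, 0], [0, 1, 1, 1, 0]], k=5))
```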
[Figure: MAP@1 (roughly 0.30–0.44) and MAP@5 (roughly 0.15–0.40) for qTFIDF, HN and Hybrid, with and without threshold]
Fig. 8. Results for MAP@1
Fig. 9. Results for MAP@5
As seen in Figs. 8 and 9, the best accuracy is obtained with the hybrid method,
which means that mixing the HN and qTFIDF scores yields search results better
matched to their correct intent categories. Generally, the with-threshold cases
result in better precision, except for qTFIDF at MAP@5.
To sum up, the hybrid setting with threshold produces the most relevant set of
intent categories and reaches the best performance in search results’ categorization.
4.3 Discussion
• Recall of intent category extraction: Frequency-based category distillation has
the defect that it sometimes excludes valuable questions that occur relatively rarely
in the Web Q&A corpus. On the other hand, the categories may sometimes be
too fine-grained for a query. For example, the query ‘mapquest’ has various kinds of
geographical categories such as ‘United States,’ ‘Canada,’ and ‘Boston’ in Yahoo!
Answers. Even though this information is still partially useful for recognizing the
actual query intent categories (e.g. mapquest software covers all geographic
locations), we should apply an efficient aggregation methodology in this case.
• Efficiency of extracted intent categories: Some distilled intent categories may not
correspond to any search results. For example, the query ‘bank of america’ produces
the ‘Law & Ethics’ and ‘Elections’ categories, which concern current events
involving Bank of America due to the world economic crisis and the policies of the
US government. However, there are no search results relevant to these categories
within the top 50 search results, even though the categories themselves are correct.
When a distilled intent category has no assigned search results, the user cannot find
pages within the top 50 search results that directly correspond to his or her search
intent and satisfy the information need. To solve this problem we may need to
incorporate other methods, such as query expansion by adding intent features
derived from the empty intent categories.
Using only titles and snippets may also be insufficient to generate precise results when users have a definite meaning in mind for a query. For example, the query ‘google’ has a dominant intent which is navigational; this primary intent cannot be explained linguistically, but only through head-noun set extraction.
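The query-expansion remedy mentioned above could be sketched as follows. This is a speculative illustration only: the function, the feature lists, and the term limit are all assumptions, not part of the paper's method.

```python
# Hedged sketch of query expansion for empty intent categories: when a distilled
# category received no search results, reissue the query expanded with that
# category's intent features. The feature list here is invented for illustration.

def expand_query(query: str, empty_category_features: list, max_terms: int = 2) -> str:
    """Append up to max_terms intent features of an empty category to the query."""
    return " ".join([query] + empty_category_features[:max_terms])

# Hypothetical features for an empty 'Law & Ethics' category of 'bank of america'.
print(expand_query("bank of america", ["lawsuit", "regulation", "ethics"]))
# prints: bank of america lawsuit regulation
```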
• Precision of intent-based categorization: Certain intent categories may have too many assigned search results; in this case we can hardly see any meaningful distinctions, and sometimes all 50 results are included. In other cases pages are wrongly categorized: for the query ‘Bank of America,’ for example, ‘http://www.bankofamericastore.com/’, a souvenir shop page of Bank of America, is misassigned. Such pages decrease the overall categorization precision. To make intent categories more useful, the intent scores and the threshold must be tuned carefully so that search results are well assigned.
• Linguistic problems: Sometimes there are no matches between search results and the head-noun sets of a certain category. This problem is caused by the characteristics of the Yahoo! Answers corpus, which has mainly been compiled by English speakers. More accurate matching could be obtained by using a localized Q&A corpus, such as Yahoo! Answers Japan with Japanese queries.
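The head-noun matching that these mismatches affect can be sketched roughly as below. The one-rule stemmer is a crude stand-in for a real suffix stripper such as Porter's algorithm [15], and the word lists are assumptions, not the paper's data.

```python
# Simplified sketch of matching a search-result snippet against a category's
# head-noun set after suffix stripping. crude_stem is a toy stand-in for a
# proper stemmer (e.g., Porter [15]); inputs are invented for illustration.

def crude_stem(word: str) -> str:
    """Strip a trailing plural 's' -- a minimal stand-in for a real stemmer."""
    w = word.lower()
    return w[:-1] if w.endswith("s") and len(w) > 3 else w

def head_noun_matches(snippet: str, head_nouns: set) -> int:
    """Count snippet tokens whose stem appears in the category's head-noun set."""
    stems = {crude_stem(h) for h in head_nouns}
    return sum(1 for tok in snippet.split() if crude_stem(tok) in stems)

print(head_noun_matches("maps and directions for drivers", {"map", "direction"}))
# prints 2
```

A localized corpus would change the head-noun sets themselves, while the matching procedure stays the same.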
5 Conclusion and Future Directions
We have presented a novel approach to the categorization of search results called intent-based categorization. This categorization groups the search results returned for a user query according to their correspondence to the most probable search intents that users can have when issuing the query. By arranging returned Web search results into the main intent categories, we believe that users can better find the information that directly matches their actual search needs. In our method, we extract intent features from a large corpus of online questions in order to form key intent categories for a given user query. We then use these features for categorizing search results by three methods: head-noun set extraction, TFIDF-based vector similarity comparison, and a hybrid method. The evaluation of our approach indicates high efficiency in distilling key intent categories (70% precision) and shows the potential for categorizing search results according to possible user intents (MAP@1 = 43% and MAP@5 = 37%).
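The MAP@k figures quoted above can be computed as sketched below; the relevance judgments are toy data, not the paper's evaluation set.

```python
# Sketch of MAP@k: average precision over the top-k categorized results per
# query, then averaged across queries. Judgments below are invented examples.

def average_precision_at_k(relevant: list, k: int) -> float:
    """Precision averaged at each relevant rank within the top k results."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevant[:k], start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def map_at_k(judgments: list, k: int) -> float:
    """Mean of per-query average precision at cutoff k."""
    return sum(average_precision_at_k(j, k) for j in judgments) / len(judgments)

# Two toy queries: a correct top-1 category for the first, a miss at rank 1
# (recovered at rank 2) for the second.
print(map_at_k([[True, False], [False, True]], 1))  # prints 0.5
```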
As this is the first attempt at intent-based categorization, in the future we need to increase the overall precision of the proposed method. For this, we plan to consider more deeply the actual semantics of queries. Furthermore, other aspects of the data in the Web Q&A corpus, such as the answers to questions, question timestamps, and best-answer votes, could be used to extract user intent features more precisely. We also intend
to employ our general intent-based approach for other purposes. For example, one can imagine an intent-based re-ranking application for Web search results, or intent-based browsing and navigation enhancement on the Web. Lastly, we plan to investigate the usefulness of the Yahoo! Answers corpus for large-scale usage on the Web by analyzing the number of questions and categories, as well as their distribution, for the most popular queries in current query logs.
Acknowledgments. This research was supported in part by the National Institute of
Information and Communications Technology, Japan, by Grants-in-Aid for Scientific
Research (No. 18049041) from MEXT of Japan, and by the Kyoto University Global
COE Program: Informatics Education and Research Center for Knowledge-Circulating Society.
References
1. Broder, A.Z.: A taxonomy of web search. SIGIR Forum 36(2), 3–10 (2002)
2. Chaker, J., Ounelli, H.: Genre Categorization of Web Pages. In: Proceedings of the 7th
IEEE International Conference on Data Mining Workshops, pp. 455–464 (2007)
3. Gabrilovich, E., Markovitch, S.: Computing Semantic Relatedness Using Wikipedia-based
Explicit Semantic Analysis. In: Proceedings of the 20th International Joint Conference on
Artificial Intelligence, pp. 1606–1611 (2007)
4. Gabrilovich, E., Markovitch, S.: Feature generation for text categorization using world
knowledge. In: Proceedings of the 18th International Joint Conference on Artificial
Intelligence, pp. 1048–1053 (2005)
5. Gyöngyi, Z., Koutrika, G., Pedersen, J., Garcia-Molina, H.: Questioning Yahoo! Answers.
In: Proceedings of QAWeb 2008 (2008)
6. Guo, Q., Agichtein, E.: Exploring mouse movements for inferring query intent. In:
Proceedings of the 31st International SIGIR Conference on Research and Development in
Information Retrieval, pp. 707–708 (2008)
7. Hu, J., Wang, G., Lochovsky, F., Chen, Z.: Understanding User’s Query Intent with
Wikipedia. In: Proceedings of the 18th International Conference on World Wide Web, pp.
471–480 (2009)
8. Jansen, B.J., Booth, D.L., Spink, A.: Determining the informational, navigational, and
transactional intent of Web queries. Information Processing and Management 44(3), 1251–
1266 (2008)
9. Kules, B., Kustanowitz, J., Shneiderman, B.: Categorizing Web Search Results into
Meaningful and Stable Categories Using Fast-Feature Techniques. In: Proceedings of the
6th ACM/IEEE Joint Conference on Digital Libraries, pp. 210–219 (2006)
10. Lee, U., Liu, Z., Cho, J.: Automatic identification of user goals in Web search. In:
Proceedings of the 14th International Conference on World Wide Web, pp. 391–400 (2005)
11. Li, Y., Krishnamurthy, R., Vaithyanathan, S., Jagadish, H.V.: Getting work done on the
Web: Supporting transactional queries. In: Proceedings of the 29th International SIGIR
Conference on Research and Development in Information Retrieval, pp. 557–564 (2006)
12. Metzler, D., Croft, W.B.: Analysis of Statistical Question Classification for Fact-based
Questions. Information Retrieval 8(3), 481–504 (2004)
13. Nastase, V., Sayyad-Shirabad, J., Sokolova, M., Szpakowicz, S.: Learning Noun-Modifier
Semantic Relations with Corpus-based and WordNet-based Features. In: Proceedings of
American Association for Artificial Intelligence (2006)
14. Pandit, S., Olson, C.: Navigation-Aided Retrieval. In: Proceedings of the 16th
International Conference on World Wide Web, pp. 391–400 (2007)
15. Porter, M.F.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)