A Query Construction Service for large-scale Web Search Engines
Ioannis Papadakis
Department of Archives and Library Sciences
Ionian University, GREECE
[email protected]
Sofia Stamou
Computer Engineering and Informatics Dept.
Patras University, GREECE
[email protected]
Michalis Stefanidakis
Computer Science Department
Ionian University, GREECE
[email protected]
Ioannis Andreou
Digital Systems Department
University of Piraeus, GREECE
[email protected]
Abstract
The most popular way of finding information on the web is
to go to a large-scale search engine and submit a query. Despite their wide usage, large-scale search engines are not
always effective in tracing the best possible information for
the user's needs. There are times when web searchers spend
too much time searching over a large-scale search engine.
When (if) they eventually succeed in getting back the anticipated results, they often realize that their successful
queries are significantly different from their initial one. In
this paper, we introduce a query construction service for
helping web information seekers specify precise and unambiguous queries over large-scale search engines. The
proposed service leverages the collective knowledge encapsulated mainly in the Wikipedia corpus and provides an
intuitive GUI via which web users can determine the semantic orientation of their searches before these are executed by the desired engine.
1. Introduction
Currently, large-scale web search engines are the predominant means of accessing the flourishing data that is
available on the web. One thing that makes search engines
so popular is that they enable users to query the web in a
simple, intuitive manner, i.e. by submitting a few keywords to the engine’s search box. Despite the intended simplicity associated with querying the web via a large-scale
search engine, there are times when web searchers spend
too much time reformulating queries, without being able to
satisfy their information needs. Search engines provide
little help to users with vague knowledge of the terminology employed within relevant documents. Even if searchers
succeed in locating the information sought, they often realize that their successful queries differ significantly from
their initial query.
In this paper, we propose a query construction service
that resides on top of large-scale web search engines and
aims at helping information seekers formulate search queries that are expressive of their search intentions. This way,
we implicitly help search engines better understand user
information needs and serve their queries accordingly.
The motivation for this work is to bridge the semantic gap
between the initial query and the query that would have been addressed to the large-scale web search engine, had the user
known the terminology employed by the documents that
actually pertain to his information needs.
The proposed query construction service extends the
functionality of the traditional search box and acts as an
intermediate layer between searchers and large-scale web
search engines. Specifically, the service’s search box incorporates auto-suggest functionality that offers query suggestions based on the semantic information of an underlying ontology. Upon selection of a suggested query by the
user, the latter is provided with information about the semantics of his selected query. Semantic information is visualized as a conceptual ontology whose nodes represent
concepts and whose labeled links represent the semantic
relation that connects concepts together. By traversing the
ontology, the user can improve his query and submit it for
search. The ontology contains information deriving mainly
from Wikipedia1 and is exposed to the searcher through an
interactive, ontology-browsing GUI.
The rest of the paper is organized as follows. We begin
our discussion with an overview of the different search
modes that web users employ, we present the main difficulties associated with querying the web and we discuss a
number of methods that have been proposed for addressing
such difficulties. In Section 3, we introduce our query construction service. In Section 4, we give several snapshots of
our service’s GUI in order to illustrate the functionality and
intuitiveness of our proposed method. Section 5 discusses
the proposed approach and Section 6 provides an evaluation for assessing the usefulness of the proposed service in
the web search process. In Section 7, we outline the main
differences between our study and related works and we
conclude the paper in Section 8.
1 www.wikipedia.org/
2. Preliminaries and Related Work
In this section, we describe the different search strategies that users employ when querying the web, in order to
illustrate how the different querying behaviors affect both
the engines’ retrieval performance and the users’ search
experience. Then, we outline the current search paradigm
that search engines support in order to address the corresponding difficulties. Finally, we discuss the different
methods that have been proposed for assisting web information seekers specify good queries.
2.1 Web Querying Behaviors
It is common knowledge that searchers do not employ a
standard behavior when querying the web. This is essentially because people have different backgrounds and varying needs and thus they make their query selections based
on different criteria and underlying knowledge. Currently,
there exist a number of studies (e.g. [6], [14]) that try to
elucidate the different search modes that web users employ.
In this direction, [13] identified four intersecting information seeking modes: (i) the known-item, (ii) the exploratory
mode, (iii) the don’t know what I need to know and (iv) the
re-finding mode. Given that the aim of our study is to help
users formulate good queries in various information seeking modes, we rely upon the research suggested in [13].
In particular, the known-item search mode applies
when the user has a specific information need and is capable of picking the right keywords for specifying his query.
Under this mode, any difficulties that search
engines encounter in answering
queries stem from the intrinsic nature of natural languages, as we discuss next.
The exploratory search mode is employed when the
user has a specific information need but is not sure how to
express it in a set of keywords. Under this search mode, the
challenge that search engines encounter is how to assist
users formulate intention-descriptive queries.
The don’t know what I need to know mode refers to
the situation in which a user submits a query without a specific
goal in mind. Such searches might occur in complex or
unknown domains (e.g. legal, medical) as well as when
the user’s need is to get an update on what is on the web
about his query. The paradox of this search mode is that
neither the user nor the engine is able to resolve the intention of the query without the assistance of some external
resource, e.g. pages retrieved for an initial query [10].
Thus, the greatest challenge is how to help users crystallize
their search goals at query time.
Finally, the re-finding mode is encountered when the user
queries the engine in order to find information that he has
already seen in a previous search. Generally speaking, this
fourth information seeking mode is usually addressed by
personalized search, where users have to sign in to the personalized search engine. There is also the option of addressing this mode outside the search engine (e.g. through the web
browser’s bookmarking functionality), but this will not
concern us further in this paper.
Having described the main information seeking modes
that web users employ while interacting with large-scale
web search engines, we now present the way in which
search engines perceive and interpret search queries.
2.2 Search Engines’ Query Handling
Although information seekers experience different
modes when querying the web, large-scale search engines’
main concern is to find the most efficient way to rank
search results. However, without any help from the search
engine at query construction time, searchers who do not
know exactly what they are looking for and/or how to express it as a search query are most likely to issue a misleading query that will eventually result in the retrieval of
perfectly ranked, irrelevant documents. To make things
worse, queries might be ambiguous or polysemous, using
identical terms to represent distinct information needs,
which renders the retrieval of relevant data arduous.
Evidently, as users become more dependent on web data
to find information about a subject of interest, there is an
ever-increasing need to equip search engines with
modules that can help information seekers select queries
that express their varying search intentions in a manner distinguishable by the engine.
2.3 Query Selection for Improved Searches
To address the difficulties that web users encounter
when searching for information about a topic of interest, a
number of techniques have emerged such as search personalization, query refinement, relevance feedback, etc.
Search personalization is the process of incorporating
information about the user needs in the query processing
phase. One approach to personalization is to have users
describe their general search interests, which are stored as
personal profiles [12]. Another approach employs relevance feedback. Relevance feedback dictates that queries
are reformulated based on previously retrieved relevant and
non-relevant information resources [7]. This technique
provides a controlled query alteration process that is designed to emphasize some terms and to deemphasize others, as required in particular search environments. However, this approach cannot be easily applied to large-scale
web search engines, where authentication is difficult to
impose and diversity prevails. Moreover, as noted in [9], it
is overambitious to expect searchers to voluntarily provide
feedback to the overall information seeking process without proper motivation. Even in the case of automatic
(blind) relevance feedback, where terms from the top few
information resources returned are automatically fed back
into the query [21], success is by no means self-evident.
In an effort to incorporate personalization functionality
within their popular, large-scale web search engines,
Google’s search wiki2 and Yahoo’s SearchMonkey3 approaches take advantage of their email services for authenticating their users and consequently log their personal
search tactics (e.g. based on the analysis of past clickthrough data [15]). They both employ auto-suggest functionality within the search box and they both anticipate
explicit feedback from the searchers during their searching
process. Upon addressing a query to the search engine,
Google’s approach presents a list of pairs. Each pair contains a suggested query constructed by the search engine
together with the corresponding first result. This way, users
have the chance to disambiguate their initial query by
choosing the suggestion that best matches their information
needs. In contrast, Yahoo’s approach requires explicit
feedback from the searchers during their searching process.
Whether these approaches give web searchers enough incentive to spend extra time providing such feedback
remains to be seen.
Another approach towards this direction dictates the refinement of user queries with semantically related terms
[4]. Most of the efforts in this direction concentrate on the
disambiguation of the query terms based on either local
(i.e. results sets) or global (usually ontologies expressed as
thesauri [16]) document analysis. However, when it comes
to large-scale web search engines, the utilization of ontologies in query construction methods is difficult for three
reasons [2]: (i) integration is extremely hard, (ii) the web
imposes scalability and performance restrictions and (iii)
there is a cultural divide between the semantic web and
information retrieval disciplines.
Having the above remarks in mind, we introduce an ontology-based query construction service suitable for large-scale web search engines. The ontology models machine-readable, semantic information provided by DBpedia [1].
The details of the proposed service are given next.
3. Query Construction Service
The approach we adopt towards building a query construction service makes use of the collaborative knowledge
accumulated mainly in Wikipedia and made readily available through DBpedia. Serving as an application for the
semantic web, the proposed approach provides an interactive GUI that seamlessly integrates the knowledge provided
by web users with large-scale web search engines. Such
knowledge is modeled in a carefully designed, low-complexity and extensible ontology that supports basic reasoning against a large volume of asserted and/or inferred facts.
Basic reasoning refers to the situation where the possible
inferences that are required from the ontology are known in
advance, since they refer to answers addressed to a finite
set of GUI-driven questions. This way, the underlying ontology is serialized in a way that promotes performance and
scalability. On the other hand, expressivity is reduced.
Nevertheless, it is our belief that this is a fair tradeoff considering that the proposed approach is greatly affected by
performance and scalability.
3.1 Harvesting Data from Wikipedia
There exist several studies that rely upon the DBpedia
datasets for building highly expressive ontologies via the
combination of Wikipedia and WordNet4. The two most
widely known resources that have emerged from such efforts are the Kylin Ontology Generator (KOG) [19] and the
YAGO ontology [17]. In addition, [18] studied how to
transform Wikipedia into a giant knowledge base of semantic triples that can be used for faceted browsing.
Based on the findings of the above studies, we encapsulate the knowledge available in DBpedia into a query construction service that acts as a mediator between searchers
and large-scale web search engines. Our service aims at
helping both users and engines perform successful web
retrieval tasks.
Our service consists of a client and a server. The client
utilizes Ajax technology and serves to augment
user-typed keywords with information from DBpedia. The
server is written in Python and uses the Twisted HTTP server
framework5 to fulfill automated client-side requests, by
retrieving data related to the user needs from the underlying
DBpedia-based ontology. Retrieval queries are eventually
transformed into relational SQL SELECT statements, since the
ontology is stored in a MySQL database.
2 Google’s search wiki, http://www.google.com/psearch
3 Searchmonkey, http://developer.yahoo.com/searchmonkey
4 http://wordnet.princeton.edu/
5 http://twistedmatrix.com
Table 1. Statistics on the Wikipedia harvested data
Collection period: January 2009
Dataset                               Value
Wikipedia articles                    2,866,994
Disambiguation entries                226,978
Categories                            339,112
WordNet classes                       124
Articles linked to WordNet classes    497,797
Infobox records                       19,230,789
So far, our service utilizes the following DBpedia datasets6: (i) the Wikipedia articles, (ii) the list of disambiguations that Wikipedia encodes for connecting generic articles to their specific interpretations, (iii) the categories under which the Wikipedia articles are classified, (iv) the
WordNet classes to which Wikipedia articles correspond
and which mainly pertain to the synset name that represents
the entities’ corresponding properties and features and (v)
the articles’ infobox datasets that contain semantically rich
key-value pairs of properties about the considered articles.
Table 1 summarizes the DBpedia data that we explored in
our work. We organized these datasets
into an ontological scheme as shown in Fig. 1.
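To make the scheme of Fig. 1 more tangible, the following minimal sketch shows one way such data might be laid out relationally and queried with SELECT statements. It is an illustration only: every table and column name is an assumption rather than the schema actually used by the service, Python's built-in sqlite3 module stands in for the database so that the sketch is self-contained, and the sample rows are invented.

```python
import sqlite3

# Hypothetical relational layout; every table and column name is an assumption.
SCHEMA = """
CREATE TABLE articles   (id INTEGER PRIMARY KEY, title TEXT UNIQUE);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE wordnets   (id INTEGER PRIMARY KEY, label TEXT UNIQUE);
CREATE TABLE disambiguated_by (article_id INTEGER, target_id INTEGER);
CREATE TABLE in_category      (article_id INTEGER, category_id INTEGER);
CREATE TABLE of_wordnet       (article_id INTEGER, wordnet_id INTEGER);
CREATE TABLE infobox          (article_id INTEGER, prop_key TEXT, prop_value TEXT);
"""


def build_demo_db():
    """Create an in-memory database with a few invented sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany(
        "INSERT INTO articles VALUES (?, ?)",
        [(1, "Mercury"), (2, "Mercury (planet)"), (3, "Mercury (element)")],
    )
    db.execute("INSERT INTO categories VALUES (1, 'Planets')")
    db.executemany("INSERT INTO disambiguated_by VALUES (?, ?)", [(1, 2), (1, 3)])
    db.execute("INSERT INTO in_category VALUES (2, 1)")
    return db


def disambiguations_of(db, title):
    """Follow the disambiguated_by relation from a generic article title."""
    rows = db.execute(
        "SELECT t.title FROM articles a "
        "JOIN disambiguated_by d ON d.article_id = a.id "
        "JOIN articles t ON t.id = d.target_id "
        "WHERE a.title = ? ORDER BY t.title",
        (title,),
    )
    return [t for (t,) in rows]


def categories_of(db, title):
    """Follow the in_category relation from an article title."""
    rows = db.execute(
        "SELECT c.name FROM articles a "
        "JOIN in_category ic ON ic.article_id = a.id "
        "JOIN categories c ON c.id = ic.category_id "
        "WHERE a.title = ? ORDER BY c.name",
        (title,),
    )
    return [name for (name,) in rows]
```

Keeping each relation in its own two-column table means that every GUI-driven question maps to a fixed join, which is in line with the restricted, known-in-advance inferences that the service relies on.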
[Figure 1: classes Articles, Categories and Wordnets; relations disambiguated_by, in_category and of_wordnet; infobox properties 1…n with infobox values 1…n.]
Figure 1. Ontology Schema.
The class Articles contains all the Wikipedia article references organized as class instances. The class Categories is
employed to host the respective categories of the Wikipedia
articles. WordNet classes store all the possible types of the
article entities. The article disambiguations are expressed
as a reflexive ‘disambiguated_by’ relation, which has the
Articles class as both domain and range. In a similar manner, the ‘in_category’ relation is employed to link articles
to the categories they belong to. According to DBpedia, there
exist 7,721,468 in_category relations that connect articles
with their respective categories. Furthermore, the articles
that are associated with WordNet classes are linked to their
respective entity type encoded in WordNet via the
‘of_wordnet’ relation. Finally, the Wikipedia infoboxes of
name-value pairs of properties are ontologically expressed
as datatype properties of their corresponding article instances (sketched as dotted arrows in Fig. 1).
Based on the above elements, we developed an ontology
that we incorporated into our query construction service in
the hope of helping users decipher the semantic orientations of their candidate queries before these are actually
addressed to the search engine. Nevertheless, our ontology
construction process is not limited to the DBpedia data, but
rather it can be fruitfully employed for any other semantic
resource that is available in machine-readable format.
6 http://wiki.dbpedia.org/Downloads32
4. GUI for Structuring Queries
In this section we turn our discussion to the description
of the GUI, which extends the traditional search box by
providing the opportunity to view the semantics of the
typed keywords and/or alternative formulations of the intended searches. The principle guiding the GUI design
is that it should be interactive, inductive, easy to
use and fast to execute. Having such requirements in mind
and based on the work of [5] on ontology visualization, we
built the query formulation box illustrated in Fig. 2.
This box enables users to type their search terms and receive in response a set of alternative query wordings. In
particular, upon typing a few characters of a search query,
the box suggests a number of strings that can be attached to
the typed tokens in order to complete them. The autocomplete suggestions are drawn from the titles of the
Wikipedia articles that our service encapsulates.
Figure 2. Auto-suggestions for query construction.
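The auto-suggest behaviour of the query box (Fig. 2) amounts to a prefix lookup over article titles. The sketch below illustrates the idea with a sorted list and binary search; the function names, the case-insensitive matching and the `limit` parameter are assumptions made for illustration and do not describe the actual implementation of the service.

```python
from bisect import bisect_left


def build_index(titles):
    """Sort titles case-insensitively so that each prefix maps to one contiguous run."""
    return sorted(titles, key=str.lower)


def suggest(index, typed, limit=10):
    """Return up to `limit` titles that start with the typed characters."""
    prefix = typed.lower()
    if not prefix:
        return []
    # Lower-cased keys are rebuilt on each call for brevity; a real service
    # would keep them precomputed next to the index.
    keys = [title.lower() for title in index]
    start = bisect_left(keys, prefix)  # first title that is >= the prefix
    matches = []
    for title in index[start:]:
        if len(matches) == limit or not title.lower().startswith(prefix):
            break
        matches.append(title)
    return matches
```

For instance, with the invented titles "Jaguar", "Jaguar Cars", "Java" and "Jazz" in the index, typing "jag" would yield the first two.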
In case the user does not wish to employ any of the suggested query alternatives, he can ignore the suggestions and
search with his self-selected keywords. On the other hand,
if the user selects a suggested alternative, an HTTP GET
request is addressed to the server in order to extract semantic information for the selected query suggestion. Semantic information pertains to (i) the query disambiguations (possibly) grouped by the WordNet classes to which
the disambiguations belong, (ii) the Wikipedia categories associated with each of the suggestions and (iii) key-value pairs
harvested from the Wikipedia infoboxes.
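The three kinds of semantic information returned for a selected suggestion could be assembled along the following lines. This is a sketch under stated assumptions: the HTTP layer and the database access are left out, the input rows are assumed to have been fetched already, and the response shape and field names are hypothetical, since the paper does not specify the format exchanged between client and server.

```python
import json
from collections import defaultdict


def semantic_info(disambiguations, categories, infobox):
    """Assemble the semantic information for one selected suggestion.

    disambiguations: iterable of (article_title, wordnet_label or None)
    categories:      iterable of category names
    infobox:         iterable of (key, value) pairs
    """
    grouped = defaultdict(list)  # second-level lists, one per WordNet class
    ungrouped = []               # disambiguations without a WordNet class
    for title, wordnet_label in disambiguations:
        if wordnet_label is None:
            ungrouped.append(title)
        else:
            grouped[wordnet_label].append(title)

    properties = defaultdict(list)  # an infobox key may carry several values
    for key, value in infobox:
        properties[key].append(value)

    return {
        "disambiguated_by": {"by_wordnet": dict(grouped), "other": ungrouped},
        "in_category": sorted(categories),
        "infobox": dict(properties),
    }


def to_json(info):
    """Serialize the response body (hypothetical format) for the Ajax client."""
    return json.dumps(info, sort_keys=True)
```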
Query Disambiguation. Query disambiguation is performed in one or two steps. First, upon selecting the ‘disambiguated by’ box, the user receives a list of all the corresponding disambiguations that match his selected suggestion (Fig. 3a). Such disambiguations may be grouped by a
WordNet class, provided they share a common WordNet
meaning. In such a case, upon selecting the corresponding
WordNet label, a second-level disambiguation list appears.
Figure 3a. Query Disambiguations.
By selecting either one of the first- or second-level disambiguations, a new box containing the disambiguated entity
is sketched at the right (Fig. 3b), which is connected to the
previous box with a line labeled ‘disambiguated by’. Simultaneously, a search query that consists of keywords
deriving from the two box titles (elimination of duplicates
is applied) is addressed to the underlying large-scale search
engine.
Figure 3b. Selected Query Disambiguations.
Categories. In case the selected suggestion (from the
search box) is associated with Wikipedia categories, the
label ‘in category’ appears on the interface and a similar
process is initiated. The searcher is prompted to select the
‘in category’ relation in order to find the category that best
matches his intended query semantics.
Upon category selection, a new box containing the selected
category is sketched at the right of the interface and is connected to the previous box with a line labeled ‘in category’
(Fig. 4).
Figure 4. Selected Category for the Disambiguated Query.
Simultaneously, a search query that consists of keywords
deriving from the two box titles (elimination of duplicates
is applied) is addressed to the underlying search engine.
Infoboxes. Finally, if the selected suggestion (from the
search box) is associated with Wikipedia infoboxes, these
are displayed as labels beneath the query inside the box.
The user is prompted to select a key in order to obtain the
corresponding values (Fig. 5). Upon the selection of an
infobox value, a new box containing the selection is
sketched at the right and connected to the previously selected box using a line labeled with the key name of the
selection. At the same time a search query consisting of the
keywords deriving from the two box titles is addressed to
the underlying search engine.
Figure 5. Infobox for the Disambiguated Query.
Each sketched box corresponds through its title to a
part of the resulting query. The user decides which boxes
will eventually participate in the search query by clicking
on the checkboxes that reside on top of each box (see Fig.
6).
Figure 6. Selecting query terms.
Based on the above query construction steps, we provide the user with information for determining and accordingly expressing the semantic orientation of his queries,
before these are submitted for search.
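The way box titles are merged into a single query, with duplicates eliminated and only checked boxes taking part, can be sketched as follows. The function name, the case-insensitive comparison and the stripping of parentheses and commas from title words are assumptions made for illustration.

```python
def build_query(boxes):
    """Merge the titles of the checked boxes into one keyword query.

    boxes: list of (title, checked) pairs in left-to-right order.
    """
    seen = set()
    keywords = []
    for title, checked in boxes:
        if not checked:
            continue  # unchecked boxes do not participate in the query
        for raw_word in title.split():
            word = raw_word.strip("(),")  # assumption: drop title punctuation
            key = word.lower()
            if word and key not in seen:  # elimination of duplicates
                seen.add(key)
                keywords.append(word)
    return " ".join(keywords)
```

With the invented boxes "Mercury" and "Mercury (planet)" both checked, this yields the query "Mercury planet".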
5. Discussion
As final notes, we should first underline the fact that
the searcher is always in control of the query construction
process. In case he has difficulty in using the service, the
overall search process does not break down; rather, the
user may proceed as usual and address the query to the
search engine.
Moreover, by employing the proposed service, the
searcher is instantly acquainted with query terms that otherwise would take him a lot of time to gather by exhaustively running through the search results of an initially
vague query.
Additionally, the provided functionality is smoothly integrated into the traditional search engine’s GUI, since it
occupies just a small portion of the screen on top of the
search box, thus leaving plenty of room for the display of
search results. Furthermore, the simplicity of the underlying architecture not only renders the proposed service open to future enhancements with more semantically rich
datasets, but also guarantees its rapid execution. The
above features are very important for large-scale web
search engines, where time and space play a crucial role in
their success.
The employment of common web widgets such as the
auto-suggest box and clickable divisions (<div>) as well as
the absence of semantic web terminology from the GUI,
renders the proposed service fast to learn and easy to use.
Finally, we should mention that if there is no information in Wikipedia about the user typed terms, the query is
transparently forwarded to the search engine that our system integrates. Therefore, the worst case scenario in the
search process is that the user does not get any help from
the service, but still his query is automatically submitted to
the underlying engine for search.
Our query construction service has so far been integrated with two major web search engines (Google and
Yahoo) and can be accessed online7. Thus, we believe that
the integration is feasible for any search engine that gives
programmatic access to its search box.
6. Evaluation
In this section, we discuss the details of an evaluation
we carried out in order to validate the impact of our service
in the searching process on the web. For this purpose, we
decided to develop a simple, fast-to-answer online questionnaire and to distribute it (through social networking
sites and mailing lists) to people who spend considerable
time online. Their specific demographic features (e.g. age,
sex, occupation, etc.) were not recorded during the evaluation process. Eventually, we received 72 responses, allowing us to obtain a concrete picture of the impact of the
service on a specific group of users, consisting of web
searchers that employ large-scale web search engines as
part of their everyday habits. The overall impact of the
proposed service on the entire web population can only be
assessed through the adoption of the service by a large-scale web search engine.
6.1 Evaluation process
Each participant was asked to initiate a search session
with a large-scale search engine through the employment
of the proposed service. Afterwards, he was asked to fill in
the provided questionnaire, which consisted of seven
closed-type questions. The questionnaire together with the
statistics of the survey are available online8.
7 http://195.251.111.53/snh/entry/index.html for Google and
http://195.251.111.53/snh/entry/index2.html for Yahoo.
8 http://195.251.111.53/snh/entry/survey_results.html
We asked the participants to answer the questionnaire
for each of their searches, in order to record (i) the type of
searches performed, (ii) the query refinements, (iii) our
service’s contribution in the query refinement process, (iv)
the user satisfaction from the search results and (v) our
service’s usability.
6.2 Evaluation results
Based on the statistics of our data we observe that in
87.3% of the searches, users modified their initial queries
with alternative keyword formulations and that the majority
of the query modifications (i.e. 77.4%) were assisted by
our service.
The first question attempts to identify the information
seeking mode (defined by Spencer [13]) of the searcher
prior to the initiation of the searching process. According
to the results of the survey, 18 of the searches belong to the
‘known-item’ mode, 36 belong to the ‘exploratory’ mode
and 18 belong to the ‘don’t know what I need to know’
mode. In order to assess the impact of the service to each
mode, we grouped together the remaining questions per
mode.
Known-item searches. Of the 18 searches adhering
to this mode, the initial query was altered in half of
them (9). Of these, the proposed service was employed
in 5 searches with balanced satisfaction: on a five-point
scale, 2 participants answered that their information needs
were not satisfied, 1 participant answered that his satisfaction was neutral and 2 participants answered that their information needs were satisfied. Satisfaction was also balanced in the remaining searches where the proposed service was not employed: 3 participants answered that their
satisfaction was neutral and 1 participant answered that his
information needs were satisfied.
From the above results, we can conclude that the employment of the proposed service for the known-item information seeking mode did not significantly influence user
satisfaction.
Exploratory searches. Of the 36 searches adhering
to this mode, the initial query was altered in all of them.
Only one participant did not employ the service to refine
his query. The proposed service was exclusively employed
in 30 searches with increased satisfaction: on a five-point
scale, 3 participants answered that their information needs
were not satisfied, 7 participants answered that their satisfaction was neutral, 15 participants answered that their information needs were satisfied and 5 participants answered
that their information needs were greatly satisfied.
From the above results, we can conclude that the employment of the proposed service greatly influenced user
satisfaction.
Finally, the most popular option to refine the initial
query through the employment of the proposed service was
the ‘in category’ choice (26 participants) followed by the
‘clarified by’ choice (6 participants).
‘Don’t know what I need to know’ searches. Of the
18 searches adhering to this mode, the initial query was
altered in nearly all of them (17). The traditional way of
refining the initial query (i.e. by exhaustively running
through search results) was employed in only 2 searches.
The proposed service was exclusively employed in 13
searches with increased satisfaction: on a five-point scale, 1
participant answered that his information needs were not satisfied at all, 1 participant answered that his satisfaction was neutral, 9 participants answered that their information needs were satisfied and 2 participants answered
that their information needs were greatly satisfied.
From the above results, we can conclude that the employment of the proposed service greatly influenced user
satisfaction.
Finally, the most popular option to refine the initial
query through the employment of the proposed service was
the ‘in category’ choice (8 participants) followed by the
‘clarified by’ choice (5 participants).
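The per-mode satisfaction counts reported above can be put side by side with a little arithmetic. The sketch below copies the counts for searches in which the proposed service was employed and computes, for each mode, the share of answers that were ‘satisfied’ or ‘greatly satisfied’. Treating these two answers together as a positive outcome is an assumption made here for illustration, not a measure defined in the survey.

```python
# Answers on the five-point scale for searches in which the proposed service
# was employed, copied from the counts reported in the text.
REPORTED = {
    "known-item": {"not satisfied": 2, "neutral": 1, "satisfied": 2},
    "exploratory": {
        "not satisfied": 3, "neutral": 7, "satisfied": 15, "greatly satisfied": 5,
    },
    "don't know what I need to know": {
        "not satisfied at all": 1, "neutral": 1, "satisfied": 9, "greatly satisfied": 2,
    },
}


def satisfied_share(counts):
    """Share of answers that were 'satisfied' or 'greatly satisfied' (assumed grouping)."""
    positive = counts.get("satisfied", 0) + counts.get("greatly satisfied", 0)
    return positive / sum(counts.values())


def summarize():
    """Return the positive share per information seeking mode, rounded to 3 decimals."""
    return {mode: round(satisfied_share(c), 3) for mode, c in REPORTED.items()}
```

Under this grouping, the positive share is 2 of 5 answers for known-item searches, 20 of 30 for exploratory searches and 11 of 13 for ‘don’t know what I need to know’ searches, which agrees with the per-mode conclusions drawn above.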
Usability. With respect to usability, the answers (on a
five-point scale) that our participants gave to the question
“Do you think that the provided service was easy to use?”
are quite encouraging: 4.17% said that they completely
disagree, 11.11% said that they were neutral, 51.39% said
that they agree and 18.06% said that they completely agree.
Results demonstrate that our service has a significant
potential in helping web users make informed query selections and thus improve their web search experience. Our
service’s impact is especially pronounced for searches pertaining to vague information needs for which the user is
unable to pick the best keywords for verbalizing his search
intentions.
7. Comparison with Relevant Works
The idea of using external resources for improving retrieval effectiveness is not new. As a matter of fact, many
works suggest the utilization of large corpora [8] or ontologies [3] for making rich connections between user queries and document collections in the hope of improving the
accuracy of search results. Recently, researchers suggested
that Wikipedia may serve as the external resource to support query formulation. In this respect, Wikipedia-based
querying methods have been reported in [10] and [11]. Here, we briefly discuss those approaches in
order to underline their commonalities with and differences
from our work.
Specifically, in [11] a query expansion method based on
Wikipedia articles is introduced, which aims at improving
retrieval effectiveness for weak queries for which pseudo-relevance feedback is not helpful. Despite the commonality
of using Wikipedia for empowering web information retrieval, our work differs from [11] in the following respects.
First and foremost, our aim is not to automatically expand
the user queries with semantic information harvested from
Wikipedia, but rather to inform users about the different
interpretations of their search keywords and thus enable
them to make informed query selections. Hence, under our
model the user maintains control throughout his searches
by picking the query terms himself, rather than letting the
engine take over after issuing his initial query. Moreover,
the work in [11] is intended for use in controlled repositories, not for integration with large-scale web search engines.
In the work of [10], the Koru interface is introduced that
offers WikiSauri, i.e. thesauri extracted from the Wikipedia
articles, upon which users rely to find alternative formulations for their queries in order to satisfy their information
needs. Although Koru shares both a common objective and approach with our query construction
service, it differs from our work in four significant
ways. Firstly, it targets a digital library’s search
engine, not a large-scale web search engine. Moreover, the
enormous size of the information resources within large-scale web search engines, together with their highly dynamic and complex nature, demands that the query interface be completely decoupled from the underlying information
resources, in order to avoid scalability and performance
problems [2]. This is not the case in Koru, where query
terms are ranked according to their relevance to the document collection. Additionally, the GUI of our service employs
commonly used widgets and thus entails a reduced learning
curve compared to Koru. Finally, the contribution of our
service was evaluated against a real web dataset originating
from the Google search engine. Conversely, Koru was
evaluated against a TREC document collection in which
the query relevant documents were already known.
8. Conclusions
In this paper, we introduced a novel query construction
service that has been seamlessly integrated with large-scale
web search engines in an attempt to assist information
seekers in specifying precise and unambiguous queries. The proposed service relies upon the semantic information stored
in DBpedia, which it organizes into an ontology, and enables users to understand the semantic orientation of their
search keywords before these are actually issued for search.
To demonstrate the provided functionality, we have implemented an interactive, non-intrusive and easy-to-use
query construction service via which users obtain information about the semantics of their search terms as well as
alternative wordings for verbalizing their search intentions.
Semantic information is gradually provided to the users
upon request and helps them crystallize their search pursuits progressively.
The evaluation of the proposed service produced highly
encouraging results, considering the fact that participants
were not specifically instructed how to use the service.
Instead, they were provided with a URL pointing to the
service. Both the how-to instructions and the questionnaire
were available directly from the homepage of the proposed
service. Moreover, the fact that only 11.11% of the participants did not make use of the provided suggestions for refining their initial query indicates that, although
Wikipedia contains only about 3,000,000 articles, this coverage is sufficient to address the majority of
everyday queries to large-scale search engines. Of course,
this holds only for the part of the web population represented by the specific group of users that participated in the
survey.
Future work is underway to extend
the underlying ontology with more datasets from DBpedia,
thus providing more options to web searchers wishing
to address semantically rich queries to large-scale web
search engines.
9. References
[1] Auer, S., Bizer, C., Lehmann, J., Kobilarov, G., Cyganiak, R. and Ives, Z. 2007. DBpedia: a nucleus for a
web of open data. In Proc. of the 6th Intl. Semantic Web
Conference.
[2] Baeza-Yates, R., Ciaramita, M., Mika, P. and
Zaragoza, H. 2008. Towards semantic search, natural
language and information systems. In Springer Lecture
Notes in Computer Science, vol.5039, pp. 4-11.
[3] Bhogal, J., Macfarlane, A. and Smith, P. 2007. A review of ontology based query expansion. In Information Processing and Management, vol.43, no.4, pp.
866-886.
[4] Billerbeck, B., Scholer, F., Williams, H.E. and Zobel,
J. 2003. Query expansion using associated queries. In
Proceedings of the ACM CIKM Conference.
[5] Papadakis I., Stefanidakis M. 2008. Visualizing ontologies on the web, New Directions in Intelligent Interactive Multimedia, Studies in Computational Intelligence, vol. 142, Springer Verlag, pp. 303-311.
[6] Broder, A. 2002. A taxonomy of web search. In SIGIR
Forum, 36(2).
[7] Salton G., Buckley C. 1990. Improving Retrieval Performance by Relevance Feedback, Journal of the
American Society for Information Science, 41(4), pp.
288-297.
[8] Diaz, F. and Metzler, D. 2006. Improving the estimation of relevance models using large external corpora.
In Proceedings of the SIGIR Conference, pp. 154-161.
[9] Bradley, P. 2008. Human-powered Search Engines:
An Overview and Roundup, Ariadne, 54, available
online:
http://www.ariadne.ac.uk/issue54/searchengines/.
[10] Milne, D.N., Witten, I.H. and Nichols, D. 2007. A
knowledge-based search engine powered by Wikipedia. In Proc. of the 16th Conf. on Knowledge Management, pp.445-454.
[11] Li, Y., Luk, R.W.P., Ho, E.K.S. and Chung, F.L. 2007.
Improving weak ad-hoc queries using wikipedia as external corpus. In Proceedings of the SIGIR Conference, pp.797-798.
[12] Pazzani, M., Muramatsu, J., Billsus, D. 1996. Syskill
& Webert: identifying interesting web sites. In Proceedings of the 13th National Conference on Artificial
Intelligence, pp.54-61.
[13] Spencer, D. 2006. Four modes of seeking information
and how to design for them. Boxes and Arrows. Available
at:
http://www.boxesandarrows.com/view/four_modes_of_seeking_information_and_how_to_design_for_them
[14] Spink, A., Park, M., Jansen, B. and Pedersen, O. 2006.
Multitasking during web search session. In Information Processing and Management, 42(5): 1366-1378.
[15] Stamou, S. and Ntoulas, A. 2009. Search personalization through query and page topical analysis. In User
Modeling and User Adapted Interaction, vol. 19, no.12, pp. 5-33.
[16] Stamou, S., Kozanidis, L., Tzekou, P. and Zotos, N.
2009. Ontology-driven personalized query refinement.
In the Journal of Web Engineering, vol.8, no.2, pp.
113-153.
[17] Suchanek, F.M., Kasneci, G. and Weikum, G. 2007.
YAGO: a core of semantic knowledge-unifying WordNet and Wikipedia. In Proc. of the WWW Conference,
pp. 697-706.
[18] Weld, D.S., Wu, F., Adar, E., Amershi, S., Fogarty, J.,
Hoffmann, R., Patel, K. and Skinner, M. 2008. Intelligence in Wikipedia. In Proc. of the 23rd AAAI Conference.
[19] Wu, F. and Weld, D.S. 2008. Automatically refining
the Wikipedia infobox ontology. In Proceedings of the
17th Intl. World Wide Web Conference, pp. 635-644.
[20] Sun, J.T. 2005. CubeSVD: A Novel Approach to Personalized Web Search, In Proceedings of WWW ’05,
pp. 382–390.
[21] Billerbeck, B. and Zobel, J. 2004. Questioning query
expansion: an examination of behaviour and parameters. In Proc Australian Database Conf, pp. 69-76.