Dynamic faceted navigation in decision making using Semantic Web

DECSUP-12454; No of Pages 10
Decision Support Systems xxx (2014) xxx–xxx
Contents lists available at ScienceDirect
Decision Support Systems
journal homepage: www.elsevier.com/locate/dss
Dynamic faceted navigation in decision making using Semantic
Web technology
Hak-Jin Kim a, Yongjun Zhu b, Wooju Kim c,⁎, Taimao Sun c
a
b
c
School of Business, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 120-749, Republic of Korea
College of Computing and Informatics, Drexel University, 3141 Chestnut Street, PA 19104, USA
Dept. of Information and Industrial Engineering, Yonsei University, 50 Yonsei-ro, Seodaemun-gu, Seoul 120-749, Republic of Korea
a r t i c l e
i n f o
Article history:
Received 16 July 2013
Received in revised form 31 December 2013
Accepted 17 January 2014
Available online xxxx
Keywords:
Facet navigation
Semantic search
Information gain
Decision making
a b s t r a c t
Categorization in the decision making classifies decision makers' experiences about the world and provides a
guide to reach a goal. This implies that dynamically providing categories reflecting the given decision context
gives a great enhancement in decision quality. This study discusses the dynamic category selection under the
Semantic Web environment, focusing on an implementation of a decision support system, the dynamic facet
navigation system working with an ontology. Predefined fixed categories are provided to refine search results
to evade use of complex queries and tedious review of search results, but they often output insensible information
because of never reflecting the difference in search results. This paper proposes a dynamic category selection
mechanism by using the total gain ratio under a given ontology, and a reordering scheme for resulted categories.
It proves the validity of the proposed approach with a statistical analysis lastly.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
Consumers face decision making in their everyday lives to find their
products of interest such as when shopping in the supermarket and
choosing their magazines from a variety of periodicals. Nowadays
their decision making has gotten more difficult because of the product
specificity and the consumer preference variation from mass customization in modern production. This burden of comparing the vast number
of different products to find the proper demands to decision makers
requires good strategies that ease the difficulty in their decision process.
Since decision making commands gathering data and developing alternatives together with conceptual grouping, categorization is one of the
fundamental and important approaches in decision process—as in
cognitive processes such as language acquisition, learning, prediction,
production, and inductive reasoning [1]. Amos Tversky's decision making
process, elimination by aspects [2], says that decision making compares
all alternatives by aspects; it chooses an aspect; any alternatives without
that aspect are eliminated; it repeats this process with as many aspects
as needed until there remains only one alternative. With the interpretation of Tversky's decision process, decision makers group alternatives as
categories—also called facets—and scope down to the alternatives in the
relevant categories in their decision process. Categories, as the abstraction
of decision maker's experience in the world, hence have two roles: a category integrates alternatives into a general description toward a decisionmaking goal, and provides an instructional manual for inferences to guide
⁎ Corresponding author. Tel.: +82 2 2123 5716; fax: +82 2 364 7807.
E-mail addresses: [email protected] (H.-J. Kim), [email protected] (Y. Zhu),
[email protected] (W. Kim), [email protected] (T. Sun).
for the goal [3]. The conceptual system of decision making therefore
depends on construction of categories and their selection.
This paper focuses on a dynamic categorization—called faceted
navigation in literature. The topic of the paper is, in particular, facet
selection/ordering using the total gain ratio, applied to a movie search
engine working with an ontology. Based on our initial work [4], this
study focuses on implementing a dynamic faceted navigation system,
as a decision support system, which groups results based on search context using the Semantic Web technology. The current state of search
systems requires users to review long lists of items from search results
laboriously. A conventional solution to its alleviation is the provision
of refining fixed categories. Search engines provide categories that classify items in a search result into several groups, and users may refine
their searches by selecting a category value more relevant to what
they want to find. A fundamental problem residing in this approach is
that refining categories are predefined and fixed regardless of the difference in search results. Considering human decision making, faceted
navigation should dynamically evolve and provide relevant information
by contextual grouping, but due to lack of contextual knowledge, fixed
category systems do not fit on the condition. Fixed categories may
provide insensible information. For example, IMDB1, an online database
on films, has fixed categories in the fixed order such as “Refine By Type”,
“Refine By Provider”, “Refine By Top Titles”, “Refine By Top Names”,
“Refine By Genre” and “Refine By Payment Model” (Fig. 1). Querying
“CBS videos” to its search engine produces all videos provided by “CBS”,
but in this case the category “Refine By Providers” becomes superfluous
1
http://www.imdb.com (2012).
0167-9236/$ – see front matter © 2014 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.dss.2014.01.010
Please cite this article as: H.-J. Kim, et al., Dynamic faceted navigation in decision making using Semantic Web technology, Decision Support Systems (2014), http://dx.doi.org/10.1016/j.dss.2014.01.010
2
H.-J. Kim et al. / Decision Support Systems xxx (2014) xxx–xxx
Fig. 1. Fixed categories in IMDB.
and insensible. The combination of free queries and facet selection asks
for dynamic generation of facets.
The dynamic category selection is conceivable because the system is
based on an ontology which makes meanings of instances and their
properties understood by the system. Using the information from the
ontology, the selection is attained by the measurement of the relevancy
of the properties. Items resulted in a query search hence are structured
with their associated properties. For example, if a search result has a
book “The Old Man and the Sea”, it has properties in a book ontology
such as “Author”, “Publication Date” and “Publisher” (Fig. 2).
Fig. 2. Properties of a book.
Please cite this article as: H.-J. Kim, et al., Dynamic faceted navigation in decision making using Semantic Web technology, Decision Support Systems (2014), http://dx.doi.org/10.1016/j.dss.2014.01.010
H.-J. Kim et al. / Decision Support Systems xxx (2014) xxx–xxx
The properties for a given book item have their own values that
describe the characteristics of the item. Importantly, properties may
be used for classifying book items as well as for describing them. In
this research, categories are properties whose values classify items in
search results; items with the same property value go into the same
group. The dynamic selection of categories hence can be rephrased as
selecting properties that classify items depending on search results;
different search results induce different selections. It needs to define a
criterion used in the selection process—that is proposed in this research.
The ordering in categories is another issue because a category beyond
the selection range of order cannot be displayed. The ordering in categories may partially affect on the quality of classification. This research
therefore suggests a dynamic category system by developing a measure
for a criterion in the category selection and an algorithm for the category
ordering.
The presentation is organized after this section as follows. Section 2
provides some background knowledge and information in related
works about the dynamic faceted navigation. Section 3 presents the overview of the dynamic category system, the dynamic category selection
measure, and the selection procedure. Section 4 proposes a reordering
scheme for selected categories and its algorithm. Section 5 demonstrates
an example on the calculation of the proposed measure and the
reordering scheme. Section 6 evaluates the performance of the system
in comparison with the fixed category system. The section of conclusion
comes last.
2. Related works
This research assumes that a search system is based on ontology.
While the term ontology is used in many different disciplines such as
philosophy, metaphysics, and information science, this study uses the
meaning of ontology defined in the Semantic Web technology formed
with the Resource Description Framework (RDF), RDF schema, the
Ontology Web Language (OWL) [5–8]. Ontology is a specification of a
representational vocabulary for a shared domain of discourse (definitions of classes, relations, and functions with other objects) [9]. Its
expressiveness of resources and their relationships facilitate many
information systems adopting it as their structural framework for organizing information and data, mainly due to its availability of information
search.
In literature, categories are also called facets and the search using
categories faceted search. The faceted search is known to be a good alternative in search for complex search queries, because the mean query
length is about 2.4 words and most users do not use advanced search
functionality according to [10]. Literature about faceted search or navigation mainly focuses on facet construction, user interface, and facet
selection. For facet construction, how to map documents to facet hierarchy is a key issue. Many authors used varied ways: algorithms based on
lexical subsumption in [11,12], synsets and hypernym relations in [13],
the RawSugar social tagging system in [14], and personalized PageRank
values in [15]. For user interface, authors are interested in how to enable
intuitive discovery-oriented navigation with easy presentation of a
complex information space. A design recommendation in user interface
is provided in [16], a visualization scheme in [17], and two-dimensional
tables with axes of hierarchical categories and correlating pairs of facets
in [18,19], respectively. Facet selection is important in the drill down
operation. Ref. [20] selects facets in the perspective of finding the
“surprising” aspects of the data, Ref. [17] uses the combination of each
facet's a-priori, query independent usefulness and a dynamic, entropybased measure in selection. Ref. [21] maximizes the utility of the selected
facets to each individual searcher, and Ref. [22] sets up and solves the
facet selection problem with approximation algorithms. Refs. [23,24]
applied the faceted search to product search on the Web.
In generating categories dynamically, a measure needs to be used in
selecting one property against another. A fundamental concept that is
necessary in order to construct such a measure is Shannon's entropy.
3
In information theory, entropy is a measure of the average information
amount of a given content when it is considered as the value of a
random variable [25,26]. Consider the situation of using compression
by variable-length codes, where different words are encoded in bit
strings of different length. Frequent words are encoded with shorter
codes and rare words with longer codes. Entropy is a precise lower
bound of the expected number of bits required to encode an instance,
denoted by a random variable X and sampled with a probability distribution p [27]. It is also considered as a measure of unpredictability;
high entropy states that the outcome is unpredictable and zero entropy
states that the outcome value is constant. Shannon [25] defined the
entropy H of a discrete random variable X with the finite number of
values x as
X
1
H ðX Þ ¼ E log
¼ − pðxÞ log pðxÞ
pðX Þ
x
ð1Þ
where p(x) is the probability mass function of an outcome x. When two
random variables involved, the joint entropy is defined as
H ðX; Y Þ ¼ E log
X
1
¼ − pðx; yÞ log pðx; yÞ:
pðX; Y Þ
x;y
ð2Þ
To define the measure for faceted navigation, another conceptual
measure information gain is needed. In Shannon's theory, a receiver
receives a corrupted message with noise. The situation may be represented with two random variables X and Y, where X denotes the original
source of the message and Y noise [28]. The amount of information of
a random variable X contained in another Y is called information gain
IG(X; Y) or mutual information I(X; Y), which is usually defined with
conditional entropy. Conditional entropy of two random variables X
and Y is the amount of information that is newly obtained by X beyond
the information about X contained in Y; that is, the additional cost of
encoding X given the encoding of Y. It is defined by the conditional probability
X
pðx; yÞ
H ðXjY Þ ¼ H ðX; Y Þ−H ðY Þ ¼ − pðx; yÞ log
pðyÞ
x;y
ð3Þ
where p(x, y) is the probability that X = x and Y = y. Using the conditional entropy, the information gain is defined as
IGðX; Y Þ ¼ H ðX Þ−H ðXjY Þ:
ð4Þ
According to [29], it is an asymmetric measure of the difference
between two probability distributions, and also the change in information entropy from a prior state to a state that takes some additional
information. This means that the entropy of X is obtained by adding
some extra information described with the conditional entropy H(X|Y)
to the information IG(X; Y) in X presented by Y.
The generation of dynamic categories has not been studied well, and
it is hard to find an article on it in the literature. Park et al. [30], one of
the rare studies, proposed an e-mail classification agent using a category
generation method. Their agent generates categories dynamically,
based on the information of titles and contents in e-mail messages.
Since titles and contents in e-mail messages contain unstructured information that is hard to classify, categories cannot be prebuilt before
search and selected from the prebuilt. Thus, they are generated on the
spot, and because of that, users cannot anticipate what categories will
be generated. This implies that it is very hard to avoid meeting odd
and unintuitive categories that are not easily understandable. Users
tend to dislike such categories because classifying e-mail messages
according to such might be very awkward and make later references
not easy. Search results considered in this research, however, are structured because search is based on structured ontology databases; ontologies have predefined properties which can be used as categories that
Please cite this article as: H.-J. Kim, et al., Dynamic faceted navigation in decision making using Semantic Web technology, Decision Support Systems (2014), http://dx.doi.org/10.1016/j.dss.2014.01.010
4
H.-J. Kim et al. / Decision Support Systems xxx (2014) xxx–xxx
classify search results. That structuredness also enables precise analysis
in search results and may generate more intuitive and understandable
categories. Khoshnevisan [31] proposed a method that identifies a set
of search categories based on category preference obtained from search
results. Items in a search result are ranked according to the relevancy to
query and each has category preference information. The preference of a
category is determined by the order of items found from that category.
On the other hand, all items of search results in this paper become correct
answers to the given query because ontology search is used; items in a
search result cannot be ranked because they have no concept of priority
and have the same relevancy to query. All items in an ontology search
result are evaluated together, rather than dealt separately, in generating
categories. Also, the generated categories may provide more appropriate
classification than the method proposed by Khoshnevisan because
this research uses an ontology-based complete search while the
Khoshnevisan's uses the conventional text-based search.
3. The dynamic category selection
3.1. The overview of the dynamic category system
The goal of this research is to propose a dynamic category system
where categories on search results dynamically change. Fig. 3 shows
the overview of the system. All instances are stored in the ontology.
When a user asks a search query to the ontology, it returns its result
as resources. The system calculates values of measures for categories
to be proposed next using the knowledge stored in the ontology. The
given number of categories is selected based on the values according
to a certain criterion. The selected categories are displayed to the user
with the resources. Then the user checks if the displayed resources
have an item she or he wants. If it is found, the system stops. Otherwise
the user selects a category, which makes the system find the corresponding property and identify the property's values from the ontology.
The system expands the values to the user, and the user chooses a value.
Using the chosen category and value, the system updates the collection
of resources to the reduced version of resources that collects all items
whose property is the chosen category and whose property value is
the chosen value. Based on the updated resources, category measures
are recalculated, and the given number of categories is selected and
displayed again. This gives the user another chance to check her or his
item. If one is not found, it goes to the step of the selection. This process
continues until the user finds an item.
3.2. A measure for the category selection
An ontology and resources as a search result under the ontology
become inputs to the process. The process analyzes the resources with
the knowledge of the structure of the ontology, and outputs a classification of categories, also called properties, according to the resources. The
dynamic category generation in ontology search compels selections of
appropriate properties predefined in a given ontology. A natural arising
question is then in what criterion to select properties. A notion is introduced to answer that question. Property A explains Property B well if
the category made by Property A contains items homogeneous in the
value of Property B. Table 1 lists ten movie titles with three relevant
properties: “Director”, “Genre” and “Country”. If the “Genre” of a movie
is known to be “Action”, it is easily deduced from Table 1 that the
movie was released in “Korea” and directed by “Gyeongtaek Kwak”. If a
movie is known to be directed by “Changdong Lee”, its “Genre” is
“Drama” and it is a Korean movie. This deduction is not always available
because the knowledge of “Korea” gives no hints on properties of
“Genre” and “Director”. The main reason for this unavailability is that
movies with the value “Korea” of the property “Country” have many
different values in the other properties.
When movies in Table 1 are classified in the property “Country”,
eight movies belong to one category of value “Korea”, but their similarity
within the category is very small because they have different values in
properties “Director” and “Genre”. This is the same on the items in the
category of value “USA”. They have almost no similarity except that
they are released in the same countries. When the movies are classified
in the property “Genre”, the extent of similarity increases. Movies in
“Action” and “Thriller” categories are all directed by “Gyeongtaek
Kwak” and “Junho Bong”, respectively. Even though the “Drama” category still has different values in the property “Director”, the others
have the same value. The classification with the property “Director”
also increases similarity in comparison with that with the property
“Country”. When several classifiers are available in faceted navigation
of objects, a question arises: which classifier is the best for faceted navigation? It is conventionally stated that a good classifier classifies objects
similar in values of other properties or features into the same category,
and different objects into different categories [32]. This question naturally leads to the requirement of a measure of similarity within each
category, made by a given classifier. If one property explains the others
better than any other does in such a measurement, the property should
become the best classifier. In the example of Table 1, the measures of the
properties, “Director”, “Genre” and “Country”, represent their abilities to
explain “Genre” and “Country”, “Director” and “Country”, and “Director”
and “Genre”, respectively.
The total gain ratio (TGR) is used in this paper for the measurement
of the extent to which a property explains the others. Every property
has a TGR value; if the value is bigger, it explains the other properties
Table 1
Movies with three properties.
Fig. 3. Overview of the dynamic category system.
Movie
Director
Genre
Country
Friend
Typhoon
The Host
Mother
Tokyo!
SILENCED
Secret Sunshine
Poetry
The Beaver
Battleship
Gyeongtaek Kwak
Gyeongtaek Kwak
Junho Bong
Junho Bong
Junho Bong
Donghyeok Hwang
Changdong Lee
Changdong Lee
Jodie Foster
Peter Berg
Action
Action
Thriller
Thriller
Drama
Drama
Drama
Drama
Comedy
War
Korea
Korea
Korea
Korea
Korea
Korea
Korea
Korea
USA
USA
Please cite this article as: H.-J. Kim, et al., Dynamic faceted navigation in decision making using Semantic Web technology, Decision Support Systems (2014), http://dx.doi.org/10.1016/j.dss.2014.01.010
H.-J. Kim et al. / Decision Support Systems xxx (2014) xxx–xxx
better. TGR for a property p in a set of properties P is formally defined by
the sum of values of a subordinate measure, the gain ratio (GR).
TGRðpÞ ¼
X
^; pÞ
GRðp
ð5Þ
^∈P∖fpg
p
Under the notations defined above, all subordinate measures to
evaluate TGR can be constructed. The entropy H(p) for each property
p ∈ P is calculated by:
H ðpÞ ¼ −
X
a∈V R ðpÞ
^ about p
For a property p, gain ratio values of any other properties p
are calculated and summed to obtain its TGR value. After calculation of
TGR values for all properties in P, those properties are ordered and
considered as categories.
The calculation of the total gain ratio is sequentially built on top of
the average information amounts, entropies, of the properties in P .
Before describing the details of the calculation, we define some notations. First let O be the ontology that defines the domain of the search,
and R be the set of resources obtained as a search result. Each element
in R is an instance of the type of R, a subject s of a triple (s, p, o) for some
property p and object o in the triple store T ðOÞ representing O. Let P be
the set of properties whose domains are of the type of R; that is, the set
of properties such that triples containing them have the subjects in R.
For example, if a search result R contains books such as “The Old Man
and the Sea”, “Pride and Prejudice”, and “Oliver Twist”, the type of R
may be “Book”, and its properties “title” and “author” belong to P .
Now let T be the set of triples such that their subjects are search result
items in R and their properties are in P with some objects; that is,
T ¼ fðs; p; oÞ ∈ T ðOÞjs ∈ R; p ∈ Pg:
If “The Old Man and the Sea” has the value of property “Author”,
“Hemingway”, then a triple
ð“The Old Man and the Sea”; “Author”; “Hemingway”Þ ∈ T
exists. Let (⋅, p,⋅) be a set of all triples with the specific property p for
some subject and object,
ð; p; Þ ¼ fðs; p; oÞj∃s; o : ðs; p; oÞ ∈ T g
If either the subject or the object of triples is specified, we have more
restricted notations such as (⋅, p, b); that is,
ð; p; bÞ ¼ fðs; p; bÞj∃s : ðs; p; bÞ ∈ T g
Also given the specification of a property pj and its object b, the triple
set with property pi (pi ≠ pj) and the same subject as the specification is
defined by,
5
Nð; p; aÞ
N ð; p; aÞ
log2
Nð; p; Þ
Nð; p; Þ
ð6Þ
The conditional entropy of pi given a specific value b of a property pj
is next evaluated, for every pair of properties pi ; p j ∈P with pi ≠ pj,
N ; pi ; aj; p j ; b
N ; pi ; aj; p j ; b
log 2 N ; pi ; j; p j ; b
a∈V R ðpi j;p j ;bÞ N ; pi ; j; p j ; b
H pi j; p j ; b ¼ −
X
ð7Þ
Then the conditional entropy of pi given pj is defined as the average
of those conditional entropy values of pi for the values of pj,
H pi jp j ¼
N ; pi ; j; p j ; b
H pi j; p j ; b
X
N ; pi ; j; p j ; c
b∈V R ðp j Þ
c∈V R ðp j Þ
X
ð8Þ
The information gain of a property pi about a property pj with pi ≠ pj
is obtained from the difference of the entropy of pi and the conditional
entropy of pi given pj:
IG pi ; p j ¼ Hðpi Þ−H pi jp j
ð9Þ
As mentioned earlier, the information gain provides the amount of
information on pi explained by the information on pj. In other words,
the value of the information gain gives us how much the property pi is
explained by the property pj. Since the value of the information gain
depends on the number of values of pj and the total number of triples,
it needs to be normalized, and the classification information is used for
that purpose. The classification information of pi and pj with pi ≠ pj is
calculated as follows:
V R p j N ; pi ; j; p j ; c
c∈V R ðp j Þ
CI pi ; p j ¼ X
ð10Þ
n
o
ð; pi ; j; pi ; bÞ ¼ ðs; pi ; oÞj∃s; o : ðs; pi ; oÞ; s; p j ; b ∈ T
The denominator of the classification information is the total number of triples about pi in all categories over pj. The score then represents
how many categories of pj each triple on pi belongs to on average. The
gain ratio of pi about pj is obtained by dividing the information gain
with the classification information,
With more specification about the value of a property, the following
is defined similarly
IG pi ; p j
GR pi ; p j ¼ CI pi ; p j
n
o
; pi ; aj; p j ; b ¼ ðs; pi ; aÞj∃s : ðs; pi ; aÞ; s; p j ; b ∈ T
N(S) is used to denote the size of a set S. For example, N(⋅, p,⋅) denotes
the size of (⋅, p,⋅), the number of triples in T with a property p. N(⋅, p, b),
N(∗, pi, ⋅ |∗, pj, b) and N(∗, pi, a|∗, pj, b) are similarly defined. V R ðpi Þ
denotes the set of all values that are of range of pi in T ; i.e. V R ðpi Þ ¼
foj∃s; o : ðs; pi ; oÞ ∈ T g. In the similar fashion, given the specification
of a property pj and its object b, the following set is defined
n
o
V R pi j; p j ; b ¼ oj∃s; o : s; pi ; o ; s; p j ; b ∈ T
ð11Þ
Thus, the gain ratio is the normalized version of the extent of the
explanation of pi by pj. When pj has a lot of values and the category of
each value has fewer triples, the classification information is big, and
consequently the gain ratio gets smaller, which is not an ideal case.
The total gain ratio of a property p measures the extent of explanation
of all the other properties in P by selecting p in (5).
When the ontology O and the resources R are given, the following
summarizes the procedure of the calculation of the total gain ratio
briefly:
1. Identify the type of R to find the set P of properties whose domains
are of the type of R by referring to O.
Please cite this article as: H.-J. Kim, et al., Dynamic faceted navigation in decision making using Semantic Web technology, Decision Support Systems (2014), http://dx.doi.org/10.1016/j.dss.2014.01.010
6
H.-J. Kim et al. / Decision Support Systems xxx (2014) xxx–xxx
2. Construct a set T of all triples that describe properties in P and their
values for items in R.
3. Calculate an entropy H(p) for each property p in P.
4. For each pair of propertiespi ; p j ∈P with pi ≠ pj, calculate a conditional
entropy H(pi|⋅, pj, b) for each b ∈ V R ðp j Þ, and H(pi|pj) by averaging
H(pi|⋅, pj, b).
5. For each pair of pi and pj with pi ≠ pj, calculate the information gain
IG(pi; pj), the classification information CI(pi; pj) and the gain ratio
GR(pi; pj) of pi about pj.
6. Calculate the total gain ratio TGR(p) for each property p in P.
7. Order the properties in P according to the decreasing order of their
TGR scores. A property with a higher score is a better classifier.
4. Reordering by Similarity
Algorithm 1. Property reordering
3.3. A replacement procedure in the category selection
From Section 3.2, properties whose domain consists of resources are
arranged as categories by the TGR score values. This may be improved
by looking down in the tree spanned out of the type of resources on
the ontology further. Consider the situation when a candidate property
in P has its range type which has also subsequent properties. In Fig. 4,
the class “Film” has many properties that link many classes as their
ranges. The class “Actor” is one of them that is linked by a property,
say pf, of “Film”, and has also other properties (e.g. pg) as their domain
to link other classes such as “Education”, “Country”, “Height” and so
on. If a property pg of “Actor” gives more amount of information than
pf in terms of the TGR score values, then it is better that pg replaces pf
for categories. This replacement process may go down through the
tree on the ontology to find a better category. This procedure is given
as follows:
1. According to the decreasing order of scores of the TGR, order properties in P for resources R.
2. If the range of a property p∈P has no properties that have it as their
domain, p is ranked to its original order.
3. If the range of a property pi ∈P has properties that have it as their
domain—denote the set of the properties by P ðpi Þ, then replace pi
for p j ∈P ðpi Þ and calculate the TGR score for pj. For the replacement,
consider the sequence of triples of (s, pi, oi), (sj, pj, oj) where oi = sj, as
e
e
one
triple
s; p; o j by introducing p ¼ pi p j . Then replace (s, pi, oj) for
s; e
p; o j and calculate the TGR for pj.
4. Choose p j ∈P ðpi Þ that has the highest TGR score and compare it with
that of pi.
5. If the TGR score of pi is greater than that of pj, pi remains in its original
position of the property ordering of P.
6. If the TGR score of pj is greater than that of pi, pioipj replaces pi in the
property ordering of P.
This section discusses a method to reorder categories resulted
from the last section by using a similarity measurement. In the category selection, the TGR score reflects the amount of knowledge about
the other properties and as a property contains the most amount of
knowledge about the others, it is selected first. Hence the selection
chooses properties in the order of the amount of knowledge about
the others. When the similarity of a pair of properties is defined as
the extent to which they share their domain instances in triples,
choosing a property that has the lowest similarity from the already
chosen may give a better classifier because it induces more different
items. Using this idea, the categories may be reordered and may provide
a better choice to users. Algorithm 4 describes this rearrangement
procedure.
L and U are a pair of ordered and unordered sets of properties,
respectively, with L ∪ U ¼ P and L ∩ U = ∅. The algorithm starts
from the empty L with P for U. Initially it selects an arbitrary p for L
from U, and for each step selects a property p∗ from U that has the minimum similarity with L. The similarity of a property to the set L is the
sum of the similarity values to properties in L.
Algorithm 2. Similarity calculation
By considering all the properties branched out of resources, it is
possible to generate the best properties that explain the resources
well.
Film
Country
Genre
Education
Actor
Country
Fig. 4. The structure of a sample ontology.
Producer
Height
Algorithm 2 shows the calculation of the similarity between two
properties, which is used in Algorithm 1. Let p and q be two properties
in P. For each property p, the resource set R are partitioned into several
groups according to its range values; equivalently, T ¼ ∪v∈V R ðpÞ ð; p; vÞ
for p ∈ P. Then each partite (⋅, p, v) is represented with the 0–1 vector
!
x p ðvÞ ¼ ðar Þr∈R such that ar is 1 if ðr; p; vÞ∈T ; otherwise, 0. For a pair of
properties p and q, a matrix M(p, q) is defined by the
of two
product
!
representative vectors for partites of p and q; i.e. !
x p ðvÞ x q ðwÞ
v;w
where v ∈ V R ðpÞ, w ∈ V R ðqÞ. This implies that each entry of the matrix
is the number of common subjects s such that (s, p, v) and (s, q, w)
belongs to T . The similarity value of p and q is calculated on M(p, q)
Please cite this article as: H.-J. Kim, et al., Dynamic faceted navigation in decision making using Semantic Web technology, Decision Support Systems (2014), http://dx.doi.org/10.1016/j.dss.2014.01.010
H.-J. Kim et al. / Decision Support Systems xxx (2014) xxx–xxx
recursively. For v ∈ V R ðpÞ, select w∗ = argminw[M(p, q)]v,w with its
minimum value valv. Then the submatrix is obtained by removing the
row and column corresponding to v and w∗. The final similarity value
is obtained by adding valv and the similarity value from the submatrix.
5. An example
This section uses the example of Table 2 to explain better the process
of the dynamic category selection described in the last sections. In
Table 2, the search result consists of five movies whose type is “Movie”
with three properties: “Genre”, “Country” and “Director”. To arrange
the properties in order of the TGR, their entropy values first should be
calculated. Since “Genre” has eight triples with five different values, its
entropy is 2.16 from (6). In the same fashion, “Country” and “Director”
has 1.46 and 2.58, respectively. The highest entropy value for “Director”
implies that the property value is the most unpredictable. To calculate
the gain ratio, the resources of the result is classified according to the
values of “Genre”. It is shown in Table 3 when targets are set to “Country”
and “Director”. The conditional entropy of “Country” given the values of
“Genre” are 0 (= − 1log21) for “Comedy”, 1.5 (= − 2/4log22/4 −
1/4log21/4 − 1/4log21/4) for “Action”, 1.58 (= − 1/3(log21/3 + log21/
3 + log21/3)) for “Thriller”, 0 (= − 1log21) for “Adventure”, and 0
(= − 1log21) for “Drama”.
The conditional entropy of “Country” given “Genre” can be obtained
from averaging these values. Its value is 1.07 (= 4/10 × 1.5 + 3/10 ×
1.58). The corresponding information gain has the value 0.39 (= 1.46 −
1.07) from the difference between the entropy and the conditional
entropy. The classification information is 0.5 (= 5/10) since the number of range values of “Country” is five and the number of triples
whose range type is “Country” ten. The gain ratio is calculated to be
0.78 (= 0.39/0.5). In the same way, the gain ratio of “Director” about
“Genre” is 2.88 by setting “Director” as the target. Then the total gain
ratio of “Genre” is 3.66 (= 0.78 + 2.88) by adding the gain ratio values
of “Country” and “Director” about “Genre”. The total gain ratio scores of
all properties are given in Table 4. The properties are arranged in the
order of “Country”, “Genre” and “Director” according to the TGR scores.
Alternatively, another arrangement may be obtained by using the
measure of similarity. For binary representative vectors, the vector
indexes correspond to the movies shown in Table 2. Since five resources
in the table, vectors are five dimensional. Consider the calculation of the
similarity between “Country” and “Genre”. Since
V R ðgCountrygÞ ¼ fgKoreag; gUSAg; gAustraliagg
V R ðgGenregÞ ¼ fgComedyg; gActiong; gThrillerg; gAdventureg; gDramagg
the subjects s satisfying ðs; }Country}; }Korea}Þ∈T are “Highway
Star”, “War of the Arrows”, and “SILENCED”, and its corresponding
representative vector is (1, 0, 1, 0, 1). In the same way, the vector
for the range values “USA” and “Australia” are (0, 1, 0, 1, 0) and
(0, 1, 0, 0, 0), respectively. The property “Genre” has representative
vectors (1, 0, 0, 0, 0) for “Comedy”, (0, 1, 1, 1, 0) for “Action”,
(0, 1, 0, 0, 1) for “Thriller”, (0, 0, 1, 0, 0) for “Adventure”, and
Table 2
An example for the dynamic category selection.
Movie
Genre
Country
Director
Highway Star
Comedy
Korea
The Killer Elite
Action
Thriller
Action
Adventure
Action
Drama
Thriller
USA
Australia
Korea
Sangchan Kim,
Hyeonsu Kim
Gary McKendry
War of the Arrows
Live Free or Die Hard
SILENCED
USA
Korea
Hanmin Kim
Len Wiseman
Donghyeok Hwang
7
Table 3
Movies classified by “Genre”.
Target: country
Target: director
Comedy
Highway Star: Korea
Action
The Killer Elite: USA
The Killer Elite: Australia
War of the Arrows: Korea
The Killer Elite: USA
The Killer Elite: Australia
SILENCED: Korea
War of the Arrows: Korea
SILENCED: Korea
Highway Star: Sangchan
Highway Star: Hyeonsu
The Killer Elite: Gary
War of the Arrows: Hanmin
Live Free or Die Hard: Len
The Killer Elite: Gary
SILENCED: Donghyeok
Thriller
Adventure
Drama
War of the Arrows: Hanmin
SILENCED: Donghyeok
(0, 0, 0, 0, 1) for “Drama”. Also, the property “Director” has
(1, 0, 0, 0, 0) for “Sangchan”, (1, 0, 0, 0, 0) for “Hyeonsu”, (0, 1, 0, 0, 0)
for “Gary”, (0, 0, 1, 0, 0) for “Hanmin”, (0, 0, 0, 1, 0) for “Len”, and
(0, 0, 0, 0, 1) for “Donghyuck”. The products of the vectors for “Country”
and “Genre” and for “Country” and “Director” make two matrices in
Table 5.
u1, …, u3 stand for the values of V R("Country"), v1, …, v5 for those of
V R ("Genre"), and w1, …, w6 for those of V R ("Directors"). In the
matrices, selections of entries are denoted by bold-faced numbers.
Each entry is the product of two representative vectors. If the value is
higher, the two corresponding groups share more items and have
higher similarity. For each value of one property, the smallest value of
the other is selected in the calculation of the similarity. In the first
matrix, v1 is selected for u1; after removing those row and column, v4
for u2; after removing u2 and v4, v5 for u3. The similarity value of
“Country” and “Genre” is the sum of the chosen values, 1 + 0 + 0 = 1.
In the same fashion, the similarity of “Country” and “Director” is obtained
with its value 0. Since the similarity value between “Country” and “Director” is smaller than that between “Country” and “Genre”, we can switch
“Genre” and “Director” in their positions. We have a new arrangement:
“Country”, “Director” and “Genre”.
6. Evaluation
This section proves the performance of the proposed scheme by experimentation. Since this study assumes the Semantic Web environment, the usual unstructured Web data whose search method is textbased is not good for the purpose; it requires that data be organized
under an ontology and available to ontology-based search systems. A
film ontology was constructed after that of FreeBase.2 In Fig. 5, Class
Film has fifteen properties; the classes of type Person, such as Director,
Actor, Producer and so forth, have five properties; Class Country has two
properties. All individual data were collected from the Web site of DAUM
Movies 3 by using a parser. The class instance data were excerpted as
many as possible, and property information, about such as directors,
actors and so on, was also collected from the site. The ontology store
thereby contains more than one thousand movie instances and nine
thousand person instances.
This experimentation assumes the situation of browsing—to see
what is available for certain combinations of facets. It compared the
performances for the conventional fixed faceted navigation and four
variant methods of the proposed scheme with TGR and the similarity
measure in Table 6. For each method, a navigation system was implemented to help users search. Twenty four users were asked to use
these systems representing the given methods to search their target
items with help of their hinting category values about the targets.
Table 7 displays all search items that users searched. Each user was
assumed not to know exactly the target item he or she wants to find
but to know three hinting categories and their values for the unknown
2
3
http://www.freebase.com/film (2012).
http://movie.daum.net (2012).
Please cite this article as: H.-J. Kim, et al., Dynamic faceted navigation in decision making using Semantic Web technology, Decision Support Systems (2014), http://dx.doi.org/10.1016/j.dss.2014.01.010
8
H.-J. Kim et al. / Decision Support Systems xxx (2014) xxx–xxx
Table 4
The TGR scores calculated from IG, CI, and GR.
Target property
Genre
Country
Director
TGR
Genre
Table 6
Search methods.
Country
IG
CI
GR
0.39
1.61
0.5
0.56
0.78
2.88
3.66
Director
IG
CI
GR
IG
CI
GR
0.52
0.3
1.73
1.49
1.17
0.07
0.86
2.22
1.36
1.15
0.43
2.67
4.4
3.58
Table 5
Matrices of the products of representative vectors. The significance of bold emphasis are
finally chosen values for similarity value calculation.
u1
u2
u3
v1
v2
v3
v4
v5
w1
w2
w3
w4
w5
w6
1
0
0
1
2
1
1
1
1
1
0
0
1
0
0
1
0
0
1
0
0
0
1
1
1
0
0
0
1
0
1
0
0
target item. A user started with a search result and the categories generated from the initial query by each system. Then he or she chose a category and selected the corresponding value according to the hinting
category-value pairs of the target item; it is said to be a step in a navigation process. A system generates the reduced search result with the
selection of a category and its value as its input and moves to the next
step of the process. Since a selection of category and its value strengthens
the search condition, the result narrows down to the smaller number of
search items.
No.
Methods
1
2
3
4
5
Fixed faceted navigation (F)
TGR faceted navigation without update (T)
TGR faceted navigation with update (T-update)
TGR-similarity faceted navigation without update (T-S)
TGR-similarity faceted navigation with update (T-S-update)
The fixed faceted navigation (F) always provides a fixed set of categories displayed for whatever the search query and the chosen value
are. This is the conventional faceted navigation. The second (T) calculates TGR scores when the system generates the search result from the
initial query statement. All categories are ordered according to the
TGR scores and their respective fixed numbers are displayed to users.
In this search process, TGR scores are never updated. If one category is
selected and removed in the next step, the category not displayed with
the highest TGR score is added to the display to show the same number
of categories in display. The third (T-update) is the same as the method T
but updating TGR scores according to the search result at every step. The
fourth (T-S) uses the similarity measure in reordering categories without
update. As in method T, it orders all categories by TGR scores with
respect to the initial search result and reorders them by the similarity.
It never updates the order in the following steps. The last (T-S-update)
updates TGR and similarity measures at every step.
Each navigation process proceeds step by step according to the
selections of categories and values until the number of items in results
ent
ntin
has co
Country
xsd:date
has lan
guage
Continent
xsd:string
al
re
lea
se
da
te
film
co
un
film
try genre
Genre
Director
re
iti
Actor
m actor
l
fi
film
ArtDirector
tor
irec
d
t
r
a
r Producer
film lm produce
fi
film storywriter
StoryWriter
film w
fil
r
m
iter
ci
Writer
film
ne
e
m
dito
at
r
og
ra
Editor
ph
er
in
Film
r
cto
di
xsd:date
ate
d
th
bir ght xsd:double
hei
tor
r
e
ec
n
y
sig
dir
any
pan
ede
sic
om
omp
mu ostum ctionc ibutionc
c
u
tr
m
fil film prodfilm dis
film
weight xsd:double
has
ha countr
s
y
ed
Country
uc
ati
on
Edcation
Cinematographer
MusicDirector
CostumeDesigner
ProductionCompany
DistributionCompany
Fig. 5. Structure of film ontology.
Please cite this article as: H.-J. Kim, et al., Dynamic faceted navigation in decision making using Semantic Web technology, Decision Support Systems (2014), http://dx.doi.org/10.1016/j.dss.2014.01.010
H.-J. Kim et al. / Decision Support Systems xxx (2014) xxx–xxx
Table 7
Search samples.
Target Item
Hinting categories/values
The Scent of Love
Producer
Taewon Jung
Writer
Hojae Lee
Cinematographer
Ola Magnestam
Writer
Jungwon Shin
Story writer
J. W. von Goethe
Editor
Ensoo Lee
Distribution co.
Synergy
Music director
Trevor Jones
Initial release date
2005-05-09
Story writer
Osamu Tezuka
Initial release date
2009-08-06
Writer
Chanok Park
Music director
Joonsuk Bang
Director
Han Lee
Director
Jinwoo Ahn
Cinematographer
Frank Griebe
Writer
Dongbin Kim
Writer
Valerio Manfredi
Initial release date
2010-03-25
Story Writer
Anthony Horowitz
Distribution co.
Cinema Service
Music director
Youngjin Jung
Art director
Sungjoo Nam
Music director
Youngkyu Jang
The Scam
Skills
To Catch a
Virgin Ghost
The Sorcerer's
Apprentice
Barking Dogs
Never Bite
Source Code
Aegis
Antarctic Journal
Dororo
A Million
Jealousy Is My
Middle Name
The City of
Violence
My Love
Over the
Rainbow
The
International
The Ring
Virus
The Last
Legion
The Girl Is
Bad Ass
Stormbreaker
Soo
HAHAHA
Running
Turtle
Countdown
Art director
Sangho Ha
Music director
Youngjin Mok
Editor
Johannes Runeborg
Cinematographer
Hyunjae Oh
Production Co.
Jerry Bruckheimer
Writer
T. Sohn, J. Bong
Art director
Pierre Perrault
Story writer
Harutoshi Fukui
Music director
Kenji Kawai
Writer
Masa Nakamura
Story writer
Sungkyu Jo
Editor
Chanok Park
Art director
Hwasung Jo
Costume designer
Heejoon Ahn
Music director
Hojoon Park
Producer
Lloyd Phillips
Music director
Il Won
Music director
Patrick Doyle
Producer
Prachya Pinkaew
Writer
Anthony Horowitz
Story writer
Youngwoo Shin
Cinematographer
Hongel Park
Initial release date
2009-06-11
Cinematographer
Taekyung Kim
Story writer
Hain Kim
Distribution co.
Showbox
Costume designer
Karolina Nilsson
Art director
Sangman Oh
Music director
Trevor Rabin
Music director
Sungwoo Jo
Costume designer
Renée April
Editor
William M. Anderson
Writer
J. Bong, P. Yim
Initial release date
2007-10-25
Cinematographer
Jaehoon Lyu
Producer
Jokwangsoo Kim
Writer
Wonjae Lee
Production co.
Ozone Film
Initial release date
2002-05-17
Initial release date
2009-02-26
Art director
Bongoh Kim
Art director
Roberto Caruso
Distribution Co.
DigitalKIN
Editor
Andrew MacRitchie
Music director
Byungwoo Lee
Production co.
Jeonwonsa
Writer
Yeonwoo Lee
Art director
Hongsam Yang
becomes at most fifteen. When the final result contains the target item,
the process is considered as a success. The process should never go on
infinitely because a user never keeps selecting categories in reality.
This experimentation gave five steps as the maximum. The number of
steps, corresponding to the number of clicks to users, is counted until
the search finds its target item successfully. If the process passes the
fifth step with neither the reduction of the result up to at most fifteen
items nor the containment of the target item, the process is considered
to go forever and marked as a failure; this case is scored 10. All methods
display five categories to users.
Table 8 shows the mean number of steps each system took to find
target items with the given samples. Method T is almost the same as
the fixed faceted navigation in performance, which is reasonable
because it is fundamentally the same as Method F. Method T is different
only in the initial display while both keep the fixed set of categories. For
Table 8
Mean values for the number of steps.
F
T
T-S-update
T-S
T-update
5.8333
5.8333
5.6667
4.8750
3.6250
9
the different scores, significance tests were performed by the nonparametric studentized bootstrapping method [33,34] to see whether
the differences are statistically significant. An open source statistical language, Language R, with a package “bootstrap” was used for implementation. The methods were pairwise compared with mean comparison
hypotheses. R = 999 bootstrap samples were replicated out of values
for the items in Table 7. Each bootstrap sample is then used to evaluate
student- t statistics t∗r. The resulted p value is calculated as follows:
p¼
1 þ #ft r ≥t g
Rþ1
where t is the student- t statistic value for the original sample and #{A} is
the number of elements satisfying a statement A. The results of this
significance test are given in Table 9. In the table, the value of each cell
is the p-value for the alternative hypothesis that the mean for the corresponding row method is greater than that for the corresponding column
method. Under the 5% significance level, method T-update improves the
performance when compared with any other methods except method
T-S. Since the method accompanies update of TGR value at every step,
it is a complete dynamic faceted navigation scheme, and is clearly better
than the fixed schemes of methods F and T. While the mean step number for T-update is smaller than that of T-S, their difference is insignificant. From the comparison of methods F and T, it seems that how to
determine the fixed set of categories does not give an impact in performance. Unfortunately, the result of the table shows that the use of the
similarity measure in reordering has unclear results on the performance
improvement. Using the similarity at every step (T-S-update) resulted
in aggravated score in Table 8 in comparison with the mean step value
of the method T-S, even though they are not significantly different. If
TGR and the similarity measure is used only at the initial step without
update (T-S), there is a little improvement but it is not significantly different. Since the reordering with the similarity is based on how much a
category does not share values with the top TGR category, it seems not
to measure its natural information containment but the relative extent
far off from the top category. The similarity does not seem to be a
good measure in faceted navigation. It is supposed to give just a disruptive effect on the faceted navigation using TGR.
In this experimentation, a simple query statement “film” is used to
obtain the initial search result. Using other or complex queries has no
problem in the current experimentation scheme while complex query
statements require the collection of the vast amount of data. Considering
the requirement of the construction of structured data under the
Semantic Web technology, it is very hard to construct a system for
experimentation that is very close to the real situation. The implication
obtained in this section, however, can be easily extended to the cases
of complex queries as well as other simple queries. The reason is that
the search process depends only on the information contained in the
initial search result—the extent to which the result is explained by
other related categories, but not on the form of queries. In particular,
asking a complex query about film may put out resulted items
of narrower scope because of the stronger condition, but they are still
related to the same kinds of properties as in the simple query “film”;
this suggests that the search step for the complex query meets the
same state as in the simple query, and the search process proceeds in
the same fashion. The improvement of search by the dynamic faceted
navigation is therefore useful even when using complex queries.
Table 9
p-Values in significance tests for comparisons of mean values.
F
T
T-S-update
T-S
F
T
T-S-update
T-S
T-update
–
–
–
–
0.481
–
–
–
0.511
0.521
–
–
0.439
0.820
0.796
–
0.025*
0.023*
0.023*
0.120
Please cite this article as: H.-J. Kim, et al., Dynamic faceted navigation in decision making using Semantic Web technology, Decision Support Systems (2014), http://dx.doi.org/10.1016/j.dss.2014.01.010
10
H.-J. Kim et al. / Decision Support Systems xxx (2014) xxx–xxx
7. Conclusion
This paper proposes a dynamic faceted navigation under the Semantic
Web environment through a dynamic category system in the context of
the ontology-based search. Selecting categories dynamically has advantages in that it provides different categories for users according to their
search contexts. If users query with different key words, they might
assume different contexts and naturally categories provided with the
results should be adapted to their situations. The proposed approach
evaluates search results with a measure and shows different categories
according to the results. This is significant because this faceted navigation leads to different paths of search experience that suit for users'
search contexts, and users eventually reach the items with ease that
they want to search for. It can save much time for users and improve
the efficiency in search. While this study focuses on the ontologybased search in the Internet, the proposed dynamic faceted navigation
can be extended to any category systems based on structured data.
Together with the proposed measure, which analyzes information of
search results to select categories, this paper also suggests a replacement procedure to improve the categories obtained from the measure,
and an reordering algorithm based on a measure of similarity. The
experimentation for performance evaluation shows that the outcome
from the dynamic faceted navigation is more efficient than the fixed
counterpart while using the similarity for reordering gives no effect.
One possible future research direction is to re-validate the effect of
the similarity for reordering because this paper considers only search
steps as a unique performance measure. Since using similarity requires
more computational work at each step, a computational study might be
needed to measure the true effect of this approach.
References
[1] Formal approaches in categorization, in: E.M. Pothos, A.J. Wills (Eds.), Cambridge
University Press, 2011.
[2] R. Batley, A. Daly, On the equivalence between elimination-by-aspects and generalized
extreme value models of choice behavior, Journal of Mathematical Psychology 50 (5)
(2006) 456–467.
[3] Handbook of categorization in cognitive science, in: H. Cohen, C. Lefebvre (Eds.),
Elsevier Science, 2005.
[4] Y. Zhu, D. Jeon, W. Kim, J.S. Hong, M. Lee, Z. Wen, Y. Cai, The dynamic generation of
refining categories in ontology-based search, Semantic Technology, Springer, 2013,
pp. 146–158.
[5] D. Brickley, R.V. Guha, RDF vocabulary description language 1.0: RDF Schema, W3C
recommendation, February 2004.
[6] G. Klyne, J.J. Carroll, B. McBride, Resource Description Framework (RDF) concepts
and abstract syntax, W3C recommendation, February 2004.
[7] Semantic Web Workshop (WWW), Jena: implementing the RDF model and syntax
specification.
[8] E. Prud'hommeaux, A. Seaborne, SPARQL query language for RDF, W3C candidate
recommendation, June 2007.
[9] T.R. Gruber, A translation approach to portable ontology specifications, Knowledge
Acquisition 5 (2) (1993) 199–200.
[10] H. Inan, Search analytics: a guide to analyzing and optimizing website search
engines, Hurol Inan, 2006.
[11] W. Dakka, P.G. Ipeirotis, K.R. Wood, Automatic construction of multifaceted browsing
interfaces, Proceedings of the 14th ACM International Conference on Information and
Knowledge Management, ACM, 2005, pp. 768–775.
[12] W. Dakka, R. Dayal, P. Ipeirotis, Automatic discovery of useful facet terms, SIGIR
Faceted Search Workshop, 2006, pp. 18–22.
[13] E. Stoica, M.A. Hearst, M. Richardson, Automating creation of hierarchical faceted
metadata structures, HLT-NAACL, 2007, pp. 244–251.
[14] D. Feinstein, F. Smadja, Hierarchical tags and faceted search. The RawSugar
approach, Proc. SIGIR 2006 Workshop on Faceted Search, 2006, pp. 23–25.
[15] C. Kohlschütter, P.-A. Chirita, W. Nejdl, Using link analysis to identify aspects in
faceted web search, SIGIR'2006 Faceted Search Workshop, 2006, pp. 55–59.
[16] M. Hearst, Design recommendations for hierarchical faceted search interfaces, ACM
SIGIR workshop on faceted search, 2006, pp. 1–5.
[17] W.A. Arentz, A. Øhrn, Multidimensional visualization and navigation in search
results, Knowledge-Based Intelligent Information and Engineering Systems, Springer,
2004, pp. 620–629.
[18] B. Shneiderman, D. Feldman, A. Rose, X.F. Grau, Visualizing digital library search
results with categorical and hierarchical axes, Proceedings of the fifth ACM conference
on Digital libraries, ACM, 2000, pp. 57–66.
[19] D.N. Meredith, J.H. Pieper, Beta: better extraction through aggregation, SIGIR2006
Workshop on Faceted Search, 2006, pp. 8–12.
[20] D. Dash, J. Rao, N. Megiddo, A. Ailamaki, G. Lohman, Dynamic faceted search for
discovery-driven analysis, Proceedings of the 17th ACM Conference on Information
and Knowledge Management, ACM, 2008, pp. 3–12.
[21] J. Koren, Y. Zhang, X. Liu, Personalized interactive faceted search, Proceedings of the
17th International Conference on World Wide Web, ACM, 2008, pp. 477–486.
[22] S. Liberman, R. Lempel, Approximately optimal facet selection, Proceedings of the
27th Annual ACM Symposium on Applied Computing, ACM, 2012, pp. 702–708.
[23] D. Vandic, J.-W. Van Dam, F. Frasincar, Faceted product search powered by the
semantic web, Decision Support Systems 53 (3) (2012) 425–437.
[24] D. Vandic, F. Frasincar, U. Kaymak, Facet selection algorithms for web product
search, Proceedings of the 22nd ACM International Conference on Conference on
Information & Knowledge Management, ACM, 2013, pp. 2327–2332.
[25] C.E. Shannon, A mathematical theory of communication, ACM SIGMOBILE Mobile
Computing and Communications Review 5 (1) (2001) 3–55.
[26] T. Cover, J. Thomas, Elements of information theory, John Wiley & Sons, 1991.
[27] D. Koller, N. Friedman, Probabilistic graphical models: principles and techniques,
The MIT Press, 2009.
[28] R.M. Gray, Entropy and information theory, Springer, 2011.
[29] S. Kullback, R.A. Leibler, On information and sufficiency, Annals of Mathematical
Statistics 22 (1) (1951) 76–86.
[30] S. Park, S.-H. Park, J.-H. Lee, J.-S. Lee, E-mail classification agent using category generation and dynamic category hierarchy, AIS'04 Proceedings of the 13th International Conference on AI: Simulation, and Planning in High Autonomy Systems, 2004, pp. 207–214.
[31] C. Khoshnevisan, Dynamic selection and ordering of search categories based on
relevancy information, Tech. Rep. US 7,698,261 B1, U.S. Patent (April 2010).
[32] R.O. Duda, P.E. Hart, D.G. Stork, Pattern classification, John Weilery & Sons, 2001.
[33] B. Efron, R.J. Tibshirani, An introduction to the bootstrap, Chapman and Hall/CRC, 1994.
[34] A.C. Davison, D.V. Hinkley, Bootstrap methods and their application, Cambridge
University Press, 1997.
Hak-Jin Kim is an associate professor of Operations Research in the School of Business at
Yonsei University, Seoul, Korea. He holds a Ph.D. in Operations Research from Tepper
School of Business in Carnegie Mellon University, an MS in Mathematics from University
of Illinois at Urbana-Champaign, and a BBA from Yonsei University. He is currently interested in the integer programming, constraint programming, logic-based optimization,
Semantic Web and sensor networks. He has published in Decision Support Systems,
Annals of Operations Research, Knowledge Engineering Review, Telematics and Informatics
and other journals.
Yongjun Zhu received his B.S. degree in International Economics and Trade with a double
major in Computer Science from Yanbian University of Science and Technology, China in
2009. He is currently working toward the M.S. degree in Information and Industrial
Engineering as a member of the Intelligent Web Business Lab at Yonsei University, Korea.
His research interests include Ontology, Semantic Web, and Information Retrieval.
Wooju Kim is a professor of Information and Industrial Engineering at Yonsei University,
Seoul, Korea. He received a BBA degree from Yonsei University in 1987, and a Ph.D. in
Management Science from KAIST in 1994. He has published many papers related to the
issues including Semantic Web, Web Services, e-Business, Expert Systems, S/W Engineering
and Managerial Forecasting. His current research areas are decision support systems on
the Semantic Web environment, Semantic Web mining, knowledge management and
intelligent web services.
Taimao Sun received his B.S. Degree in Management Information System from Yanbian
University of Science of Technology (YUST), China in 2010. He is currently working toward
the M.S Degree in Information and Industrial Engineering as a member of the Intelligent
Web Business Lab at Yonsei University, Seoul, Korea. His research interests include Ontology,
Semantic Web, Linked Data, and Semantic Web mining.
Please cite this article as: H.-J. Kim, et al., Dynamic faceted navigation in decision making using Semantic Web technology, Decision Support Systems (2014), http://dx.doi.org/10.1016/j.dss.2014.01.010