
Personalized Semantic Search
using Ontological User Profile
Preparation of
PhD Dissertation
By
Mohammed Nazim uddin
Advisor: Prof. Geun Sik Jo
Intelligent E-Commerce Systems Lab
Dept. of Computer Science & Information Engineering,
INHA University
1
Outline
• Introduction
• Related Works
• Ontology based user modeling
• Personalized Semantic Search
• Experimental Evaluations
• Conclusion
2
Introduction
• Personalized Information Search
– User Modeling (User Profile)
– Search for information based on the user profile
– Rank the search results to produce a newly ordered list
3
Motivation
• Personalized Semantic Search
– Traditional search is keyword based and does not capture semantics; search
results often fail to match the user's interests.
– Different users with diverse intentions who submit the same keyword
receive the same set of results.
– Personalized semantic search returns results by considering various
concepts and their relations to the user's intention.
• Rank the search results
– The large set of results a search engine returns for a query often frustrates
users trying to find the items that interest them.
– Results should be ranked by considering user intention.
4
Research Issue
• Personalized search is not new in information retrieval, but
effective personalization is still an open challenge.
• A number of studies focus on personalized information
search, using different methods to improve how well the
retrieved results match user intention.
• A few methods take a semantic approach to personalized
search and have been applied successfully in the domain of
information retrieval.
5
Our Approach
• Propose a framework for personalized information
search and ranking using semantic web technology.
– Propose a new technique for constructing an ontological user
profile that models user interest.
– A model for searching information in a controlled semantic
web environment.
– Search and rank information based on the user profile in a
particular domain.
6
Related Works
• Learning Ontology-Based User Profiles: A Semantic Approach to
Personalized Web Search [Ahu Sieg et al. (2007), IEEE Intelligent Informatics Bulletin,
Vol. 8, No. 1]
– Presents a method for building ontological user profiles by assigning interest
scores to existing concepts (ODP).
– A spreading activation algorithm maintains the interest scores and
updates the profile based on the user's behavior.
– Re-ranks the search results based on interest scores and semantic evidence in the
ontological user profile.
7
• Personalized information retrieval based on context
and ontological knowledge [P. Mylonas et al. (2008), The
Knowledge Engineering Review, Cambridge University Press]
– Focuses on combining conceptualization and personalization
methods to improve the performance of personalized information
retrieval.
– Contexts are represented by concepts and the relationships between them,
which build an ontology structure described using fuzzy
relational algebra.
– User profiles are modeled with positive (P+) and negative (P-)
preferences based on user actions and usage histories.
8
• Contextual Information Search Based on Ontological
User Profile [Nazim et al., ICCCI 2010].
– Proposes a framework for searching information based on the
user profile
– The user profile is modeled with an ontological approach
– WordNet is used to extend the query context and provide
semantic information about the user's interests
– A log file analysis approach is used to monitor the
user's interest in accessed pages and initially learn the profile
– Results are filtered and ranked based on the profile
9
• Ontological Profiles in Enterprise Search
[Geri Solskinnsbakk et al., EKAW, LNAI 2008]
– Proposes a model for constructing an ontological profile from
relevant documents of a specific domain and expanding users'
queries based on the profile when searching for information
– The model describes a general ontology mapping approach to
classify related documents into a domain ontology
– An ontological profile is defined as learning a predefined domain
ontology with relevant documents
– Documents are collected from a domain-related usage
repository as well as by querying the related topics on the
web
10
• A combination approach to web user profiling
[Jie Tang et al., ACM Transactions, 2010]
– Aims at extracting and fusing semantic-based user profiles
from the web
– Researchers' profiles are constructed by extending the FOAF
ontology with relevant information from the web
– Based on the profile information, an academic expert list
is determined
– Researchers' interests are extracted based on topics and
publication venues
11
Semantic User Modeling
• User details are modeled with an ontological approach,
which we call the Ontological User Profile
• The user profile contains details about the user such as
name, address, phone number, and individual
preferences
• The Ontological User Profile provides the user details in
terms of concepts and relations
12
Ontological User Profile
• An ontology is defined as a formal, explicit
specification of a shared conceptual understanding
of a domain
• A new ontology can be designed to model the users'
details
• A pre-existing domain ontology can be utilized as a
reference ontology to model the user's information.
• An instance of the reference ontology can be defined as the
semantic profile of an individual user.
13
User’s Information Collection
• Approaches
– Explicit
• By asking user directly
• Directly entered information tends to be inconsistent or incomplete
• Users are often unwilling to supply the information
– Implicit
• By monitoring the user's behavior
• Information overload
– We use a combined approach with minimal user intervention
• The user explicitly provides a name and email address or social web id
• Related information is crawled automatically
14
Social Web as Source of User’s
Information
15
Figure: User activities and preferences
16
Ontological User Profile
Figure: User activities and preferences are enriched with thesauri, links and ontology (ODP, WordNet) to form the User Profile Ontology, a hierarchy of concepts and sub-concepts.
17
Representation of User Activities
and Preferences
• Concept Vector Generation
– Vector Space Model (TF-IDF)
– For each document d in a collection of documents
D, a weighted concept vector is constructed as:
$d = (w_1, w_2, \ldots, w_n)$
where $w_i$ is the weight of term i in document d. The weights $w_i$ are calculated as:
$w_i = f_i \cdot \log(N / n_i)$
where $f_i$ is the frequency of term i in document d, N is the number of
documents in collection D, and $n_i$ is the number of documents that contain
term i.
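The TF-IDF weighting above can be sketched in a few lines of Python. This is a minimal illustration, not the system's implementation: whitespace tokenization, the natural logarithm, and the three sample documents are all assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, with w_i = f_i * log(N / n_i)."""
    n_docs = len(docs)  # N: number of documents in the collection
    tokenized = [doc.lower().split() for doc in docs]  # assumed tokenization
    # n_i: number of documents that contain term i
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)  # f_i: frequency of term i in this document
        vectors.append({t: f * math.log(n_docs / doc_freq[t])
                        for t, f in term_freq.items()})
    return vectors

# Hypothetical sample collection
docs = ["semantic web search", "web server data", "machine learning data"]
vectors = tfidf_vectors(docs)
```

A term that occurs in every document gets weight 0 under this formula, which is why only discriminating terms contribute to the concept vector.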
18
Enhance Preferences with ODP Ontology
• We investigated using a domain ontology as the reference ontology
to model the user details.
• We use ODP (Open Directory Project) as the reference ontology
• In ODP, topics are organized hierarchically, along with the
web pages that belong to each topic, and are maintained by
volunteer users.
• Each topic is considered a concept, and its related documents
represent the concept.
19
ODP (Open Directory Project): Human-edited web
directory.
Figure: ODP directory page for AI, showing its child directories and related page links.
20
Concept Feature Vector Generation for ODP
• Every concept is represented by a feature vector.
• A feature vector is defined as weighted terms (index words) learned from
a document or a set of documents belonging to a particular concept.
• For example, the feature vector of concept /computers/internet = {web,
0.356; server, 0.273; data, 0.244; etc.}.
• The feature vector for a leaf concept c is calculated by
$F_i^c = \frac{\sum_{N_j \in c} w_{ij}}{|N_j|}$
• For non-leaf concepts
$F_i^C = \frac{\sum_{N_j \in C} w_{ij}}{|N_j|} + \frac{\sum_{N_k \in C} w_{ik}}{|N_k|} + \frac{\sum_{N_l \in C} w_{il}}{|N_l|}$
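A leaf-concept feature vector can be illustrated as the average weight of each term over the documents assigned to the concept. The sketch below assumes that reading of the leaf formula (sum of term weights divided by the number of documents); the function name and the sample weights are hypothetical.

```python
from collections import defaultdict

def leaf_feature_vector(doc_vectors):
    """Average each term's weight over the documents of one leaf concept."""
    if not doc_vectors:
        raise ValueError("a concept needs at least one document")
    totals = defaultdict(float)
    for vec in doc_vectors:
        for term, weight in vec.items():
            totals[term] += weight  # sum of w_ij over the concept's documents
    n = len(doc_vectors)  # assumed meaning of the denominator: document count
    return {term: total / n for term, total in totals.items()}

# Hypothetical TF-IDF vectors of two documents under one concept
concept_docs = [{"web": 0.4, "server": 0.2}, {"web": 0.2, "data": 0.6}]
feature = leaf_feature_vector(concept_docs)
```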
21
Construction of Ontological User Profile
Classification and Mapping
• The ontological user profile is constructed by classifying the user's preferences into the
reference ontology.
• Classification is based on the similarity between
representative feature vectors:
$sim(a, b) = sim(D^a, C^b) = \frac{D^a \cdot C^b}{|D^a| \, |C^b|} = \frac{\sum_{i=1}^{n} (D_i^a \cdot C_i^b)}{\sqrt{\sum_{i=1}^{n} (D_i^a)^2} \cdot \sqrt{\sum_{i=1}^{n} (C_i^b)^2}}$
where $D^a$ and $C^b$ are the feature vectors of document a and concept b, respectively.
• A user profile is an instance of the reference ontology with the user's individual
preferences.
• Finally, the profile is enhanced by mapping it to LOD data based on personal
preferences.
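The classification step can be sketched as cosine similarity between sparse feature vectors, assigning each document to its most similar concept. This is only an illustration: the `/computers/ai` vector and the document vector are made-up values, and `classify` is a hypothetical helper name.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)

def classify(doc_vec, concepts):
    """Return the name of the concept whose feature vector is most similar."""
    return max(concepts, key=lambda name: cosine(doc_vec, concepts[name]))

concepts = {
    "/computers/internet": {"web": 0.356, "server": 0.273, "data": 0.244},
    "/computers/ai": {"learning": 0.5, "agent": 0.3, "data": 0.2},  # assumed values
}
doc = {"web": 0.7, "server": 0.1}  # hypothetical user-preference document
best = classify(doc, concepts)
```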
22
Figure: Overview of Ontological User Profile Construction. Concept vectors of preferences and activities are classified and mapped onto the reference ontology (ODP; example concepts: Computer, DB, AI, Machine Learning, Internet, WWW, Web2.0) and linked with LOD data to form a portion of the semantic user profile.
23
Personalized Semantic Search
• The goal of personalized semantic search is to utilize user
context through an ontological approach
• Our intention is to perform semantic search on
structured scientific research information based on the user
profile.
• The search mainly focuses on
– Finding academic research information related to user interest
– Finding experts for a query topic
• Finally, the search results are ranked
24
System Structure for Personalized Semantic Search
Figure: System structure. (1) A query enters through the GUI (Publication Search, Expert Search); (2) it is matched against the User Profile Ontology; (3) the Query Generator sends the extended query to the Semantic Search Space (AKB), which is built by extracting and modeling academic information; (4) ranked personalized search results are returned.
25
• Query Expansion
– The key point of a semantic search is to define the
semantics (meaning) of the user query in order to find the desired
information related to that query
– Query expansion is the process of adding new
terms or concepts based on the user profile
– The extended query is sent to the search space to extract
the related information
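Query expansion from a profile might look like the sketch below. It assumes the profile is a mapping from concept names to weighted terms and that a concept is relevant when it shares a term with the query; `expand_query`, `top_k` and the sample profile are hypothetical and not taken from the system.

```python
def expand_query(query_terms, profile, top_k=2):
    """Append the top_k highest-weighted profile terms of each matching concept."""
    expanded = list(query_terms)
    for concept, weights in profile.items():
        if not set(query_terms) & set(weights):
            continue  # this profile concept shares no term with the query
        ranked = sorted(weights, key=weights.get, reverse=True)
        new_terms = [t for t in ranked if t not in expanded][:top_k]
        expanded.extend(new_terms)
    return expanded

# Hypothetical portion of an ontological user profile
profile = {
    "machine learning": {"learning": 0.9, "classification": 0.6, "kernel": 0.3},
    "semantic web": {"ontology": 0.8, "rdf": 0.5},
}
expanded = expand_query(["learning"], profile)
```

Here the query term "learning" matches only the "machine learning" concept, so the extended query gains that concept's next two terms and nothing from "semantic web".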
26
Semantic Search Space
• Documents are organized semantically rather than as simple
links between HTML pages.
• An ontological approach is employed to build a knowledge base
of concepts and their relationships, which we call the
Academic Knowledge Base (AKB)
• An AKB is built for a
particular domain. We select scientific research in computer
science as the domain for the AKB.
27
Building Academic Knowledge Base(AKB)
• Scientific research information related to the computer science
domain is investigated with an ontological approach to build the
AKB.
• Ontology
– In our approach an ontology is defined as (C, R, Cf, Rf), where
C - set of concepts
R - set of relations
Cf - concepts with relevance weights
Rf - relations with relevance weights
• Concepts are named Classes when describing the AKB
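As a rough illustration, the four-part ontology definition could be held in a small Python data structure. The field layout is an assumption (for instance, keying the weights by node pairs or triples), and the `relate` helper is hypothetical; the example triple uses the P1, Machine Learning and 0.7 values that appear in the relation slides.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    concepts: set = field(default_factory=set)            # C: class names
    relations: set = field(default_factory=set)           # R: relation names
    concept_weights: dict = field(default_factory=dict)   # Cf: (parent, child) -> weight (assumed layout)
    relation_weights: dict = field(default_factory=dict)  # Rf: (subject, relation, object) -> weight (assumed layout)

    def relate(self, subject, relation, obj, weight):
        """Record a weighted relation between two nodes."""
        self.relations.add(relation)
        self.relation_weights[(subject, relation, obj)] = weight

akb = Ontology(concepts={"Researcher", "Publication", "Field"})
akb.relate("P1", "belong_to_Field", "Machine Learning", 0.7)
```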
28
AKB Ontology
Figure: AKB ontology. Classes: Researcher, Publication (subclasses via is_A: Journal, Proceeding, Book, Technical Report) and Field (Topic_1, Topic_2, Topic_3); relations: has_Publication, written_By. Legend: class, subclass, relation.
Ontology Description
Classes:
• Researcher
• Publication
• Field
30
Class:: Researcher
• Includes information on all authors who have contributed to
scientific research related to a particular domain.
• Author information includes general details such as
"name", "email address", "home page", "affiliation" and
"position".
Figure: Researcher class with data properties Name, Email, Home page, Affiliation, Position.
31
Class:: Publication
• Publication contains 4 subclasses
– Book
– Journal
– Proceeding
– Technical_report
• Subclasses are related to the parent class with relevance scores
• Relevance scores define the significance of a subclass to its parent class
Figure: Publication hierarchy (Journal, Proceeding, Technical Report, Book, Book Chapter) annotated with the scores 1, 0.8, 0.4, 0.2 and 0.3.
32
• Data Properties of Class Publication
Publication
Title
Author
Abstract
Co-Author
Citation
Keyword
33
Class:: Field
Field contains the concept hierarchy of domain-related topics. We use the
ODP (Open Directory Project) hierarchy and Eventseer to create the
hierarchy of computer science topics.
Figure: Example Field hierarchy with nodes Artificial Intelligence, Ontology, Machine Learning, Game and Data mining, annotated with the scores 1, 0.7, 0.4 and 0.6.
34
Relations in Ontology
• has_Publication
– A researcher has a publication on a specific topic. For example, a
researcher (r) has a publication p (has_Publication) on a particular
topic.
• belong_to_Field
– A publication (p) belongs to (belong_to_Field) a particular topic
such as machine learning (Concept).
Figure: Publication P1 is linked to the Field concept Machine Learning by belong_to_Field with a relevance degree of 0.7.
35
• written_By
– Inverse property of the relation has_Publication
• include
– "Fields include publications": with this relation we can find all the
publications related to a particular topic under the "Field" class.
Figure: The Field concept Machine Learning includes publications (p1, p2, ..., pn).
36
Search Academic Information
• Academic information such as publications can be searched by matching
the query against the semantic search space.
• The semantic search space includes the "Field" hierarchy, to which publications
are assigned according to concepts and relations.
Figure: Field hierarchy with concepts such as Semantic web and Ontology, each holding publication instances (e.g. P19, P91, P101).
37
Matching
• The query is extended with meta data from the ontological user profile.
• Each concept in the "Field" concept hierarchy contains the topic,
the topic's feature vector, and the related publication list
with abstracts or index keywords
• The query concept with its meta data is mapped to the concepts of the
Field class (topics) using cosine similarity
• The best-matched concepts are selected using a similarity threshold
• Related publications are extracted from the matched concepts
38
Matching Algorithm
Input: query concept (Qc), Field concept hierarchy (Fc)
Output: set of matched concept pairs
L = c, the Qc concept with meta data (feature vector)
L' = {c'1, c'2, …, c'n}, the Fc concepts with feature vectors
For concept c of Qc do
  For each concept c' of Fc do
    match = sim(c, c') = sim(S^c, S^{c'})
    If match >= threshold then
      Smatch = Smatch ∪ {(c, c')} with match value
    End
  End
End
Return(Smatch)
where
$sim(S^c, S^{c'}) = \frac{\sum_{(T_i^c, T_i^{c'}) \in k} (S_i^c \cdot S_i^{c'})}{\sqrt{\sum_{i=1}^{n} (S_i^c)^2} \cdot \sqrt{\sum_{i=1}^{n} (S_i^{c'})^2}}$
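The matching algorithm can be sketched in Python as a threshold filter over cosine similarities. This is an illustrative reading under stated assumptions: feature vectors are sparse dictionaries, the threshold value 0.3 is arbitrary, and the two Field concepts and their weights are invented for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def match_concepts(query_vec, field_concepts, threshold=0.3):
    """Return {field concept: similarity} for every concept at or above the threshold."""
    matches = {}
    for name, vec in field_concepts.items():
        score = cosine(query_vec, vec)
        if score >= threshold:
            matches[name] = score
    return matches

# Hypothetical Field concepts and extended query vector
field_concepts = {
    "Machine Learning": {"learning": 0.8, "classification": 0.6},
    "Ontology": {"ontology": 0.9, "rdf": 0.4},
}
query = {"learning": 1.0, "classification": 0.5}
smatch = match_concepts(query, field_concepts)
```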
39
Rank the search Results
• Smatch returns the matching pairs of query concepts and Field concepts
with their similarity scores
• Field concepts contain the list of publications with several annotated
relations
• The weight of each publication is calculated by adding all the relation weights,
which can be denoted as P_w = belong_To + cite_By,
where the belong_To weight measures the degree of relevance
between a publication and a Field concept, and cite_By is the number of other
publications that cite this publication
• Finally, the publications are ranked by the ranking algorithm
40
Ranking Algorithm
Input: query concept with weight and set of publications
Output: ranked list of publications
Qc = C; query concept
P = {p1, p2, …, pn}; set of publications with weights
P_w = rel_score + cite_score
For each pi in P do
  For each c in Qc do
    Compute sim(pi, c)
  x = max over c of sim(pi, c)
  Rank_score(pi) = x + P_w(pi)
Sort P in descending order of Rank_score.
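A compact Python sketch of the ranking step follows. It assumes each publication carries precomputed `rel_score` and `cite_score` values on comparable scales (the slides do not say how citation counts are scaled) and that `sim` is supplied by the caller; the lookup table of similarities is made up for the example.

```python
def rank_publications(publications, query_concepts, sim):
    """Sort publications by max similarity to a query concept plus P_w."""
    def rank_score(pub):
        x = max(sim(pub, c) for c in query_concepts)  # best-matching query concept
        p_w = pub["rel_score"] + pub["cite_score"]    # P_w = rel_score + cite_score
        return x + p_w
    return sorted(publications, key=rank_score, reverse=True)

# Hypothetical similarities and publication weights
sims = {("p1", "ml"): 0.9, ("p2", "ml"): 0.4, ("p3", "ml"): 0.7}
publications = [
    {"id": "p1", "rel_score": 0.2, "cite_score": 0.1},
    {"id": "p2", "rel_score": 0.7, "cite_score": 0.5},
    {"id": "p3", "rel_score": 0.3, "cite_score": 0.0},
]
ranked = rank_publications(publications, ["ml"], lambda p, c: sims[(p["id"], c)])
ranked_ids = [p["id"] for p in ranked]
```

In this example p2 ranks first even though p1 is more similar to the query, because its relation weights dominate the sum.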
41
Experts Search and Ranking
• An expert list for a particular query topic is generated by
constructing an Academic Social Network (ASN).
• All the authors and co-authors of the publications
returned by the matching algorithm are extracted
• The ASN is constructed by analyzing the author and co-author
relationships in the retrieved publications.
42
ASN Construction
• Topic-document relationship model (TRM)
– An initial score is computed for all the authors (including co-authors)
of the publications retrieved for a given query topic from the AKB.
– The initial score of a researcher is calculated by the equation
$P(c \mid t) = \frac{\alpha \sum_{p \in P_c} w(c \mid 1, p) + \beta \sum_{q \in P_c} w(c \mid 2, q)}{\sum_{i=1}^{n} w_i}$
where c is the expert candidate (researcher/author), t is a given topic,
w(c|1, p) is the relevance degree of publication p with c as first author, and w(c|2, q) is the
relevance degree with c as a co-author. $\alpha$ and $\beta$ are two damping factors.
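One possible reading of the initial score is a damped sum of first-author and co-author relevance degrees, normalised by a total weight. The sketch below follows that reading only; the parameter names `alpha` and `beta`, their values 0.7 and 0.3, the choice of normaliser, and the sample weights are all assumptions.

```python
def initial_expert_score(first_author_weights, co_author_weights, all_weights,
                         alpha=0.7, beta=0.3):
    """Damped sum of first-author and co-author relevance, divided by the total weight."""
    total = sum(all_weights)  # assumed normaliser: all relevance weights for the topic
    if total == 0:
        return 0.0
    weighted = alpha * sum(first_author_weights) + beta * sum(co_author_weights)
    return weighted / total

# Hypothetical relevance degrees for one expert candidate
score = initial_expert_score(
    first_author_weights=[0.8, 0.6],   # publications written as first author
    co_author_weights=[0.5],           # publications written as co-author
    all_weights=[0.8, 0.6, 0.5, 0.1],  # every retrieved publication for the topic
)
```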
43
• Author and Co-Author Relationship Model (ARM)
– In this model the initial scores of expert candidates in the ASN are updated based on
Outward and Inward relations.
– The relation between expert candidates is calculated from their Outward and
Inward relations, where r(x, y) is the relation weight from node x (expert candidate) to node y (expert
candidate) and yi is the Inward relation of node y.
– Based on the relation weights, the initial scores measured earlier are
updated to rank the experts, where Ox is the Outward relation of node x and damping factors are applied to the
Outward and Inward relations.
44
Evaluation Metrics
• The most common measures used in information
retrieval evaluation are
– Precision
top-n Precision(n) = (# of relevant documents retrieved within n) / n
– Recall
top-n Recall(n) = (# of relevant documents retrieved within n) / (total # of relevant documents)
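Both measures are straightforward to compute from a ranked result list and a set of relevance judgments. The sketch below uses made-up document ids and judgments purely for illustration.

```python
def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n results that are relevant."""
    hits = sum(1 for doc in ranked[:n] if doc in relevant)
    return hits / n

def recall_at_n(ranked, relevant, n):
    """Fraction of all relevant documents that appear in the top-n results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked[:n] if doc in relevant)
    return hits / len(relevant)

# Hypothetical ranked results and human relevance judgments
ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6", "d7"}
p5 = precision_at_n(ranked, relevant, 5)  # 2 relevant in the top 5
r5 = recall_at_n(ranked, relevant, 5)     # 2 of the 4 relevant documents found
```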
45
Data collections
• ODP
– Crawl 5 levels of the ODP hierarchy from http://www.domz.org/,
comprising a number of
concepts and documents related to the computer
science domain.
– Build a reference ontology for the initial user profile
46
• Scientific Publications
– Real-world data about scientific research is
collected from CiteSeer metadata
(http://citeseer.ist.psu.edu/oai.html)
– Model a semantic search space for academic information
(publications and experts) with an ontological approach
47
Evaluation of query results and rank
• To the best of our knowledge, no standard method exists to
evaluate the relevance of the results returned for a given query or
to judge the ranked list of results
• Ground truth is
– Manually created through pooled relevance judgments
by human assessors; similar approaches have been used by a number
of researchers.
48
• A number of top returned results and the related query are given to users
(experts), who score the results
– For example, the top ten results for a given query are given to users or experts, who
judge how many of the results are relevant to the query
– Based on their opinions, the system is evaluated in terms of precision and
recall
49
Evaluation Results
• To evaluate the experimental results we use pooled
relevance judgments by human assessors
• For a given query, the top 50 results are given to experienced researchers,
including faculty members and doctoral and master's students in the computer
science field, to assess the results returned by our system
• Based on the assessment, Precision and Recall are measured for a number
of documents
50
Conclusion
• We proposed a new method for building an ontological user profile that can
be used for personalized information search
• User details are collected with minimal user intervention
• We presented a framework for semantic information search and ranking
that uses the ontological user profile
• Evaluation results are expected to justify, in terms of recall and
precision, searching information with semantic web technology
51