11. Ontology Alignment.ppt

Ontology Alignment
Semantic Web - Spring 2006
Computer Engineering Department
Sharif University of Technology
The Problem




Like the Web, the Semantic Web will by design be distributed and heterogeneous.
Ontologies are used in it to support interoperability and common understanding between different parties.
Ontologies themselves may have some heterogeneities.
Ontology alignment is needed to find semantic relationships among the entities of ontologies.
[Figure: several ontologies (a, b, c, d) whose relationships to each other are unknown, and a user asking "How should I use them?"]
Terminology

Mapping: a formal expression that states the semantic relation between two entities belonging to different ontologies.

Ontology Alignment: a set of correspondences between two or more (in case of multi-alignment) ontologies. These correspondences are expressed as mappings.

Ontology Coordination: the broadest term, which applies whenever knowledge from two or more ontologies must be used at the same time in a meaningful way (e.g. to achieve a single goal).

Ontology Transformation: a general term for any process that leads to a new ontology o’ from an ontology o by using a transformation function t.
An Example of Alignment
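Car (Ontology A) ? Automobile (Ontology B)
[Figure: two small ontologies, one containing Car and the other containing Automobile. Other entities include Object, Thing, Vehicle (in both), Boat, the properties Has Owner, Has Speed and Has Specification, and the instances Ali’s Peugeot and Peugeot 405 with values such as Ali, Fast and 250 km/h. Links between corresponding entities carry similarity scores (1.0, 0.6, 0.8, 0.6). Legend: Concept, Property, Instance, Type, Similarity.]
Car – Automobile
Label Similarity = 0.0
Super Similarity = 1.0
Instance Similarity = 0.6
Relation Similarity = 0.8
Total Similarity = 0.6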
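The slide does not say how the total is derived from the four component scores. One simple possibility, assumed here, is an unweighted linear aggregation: the mean of 0.0, 1.0, 0.6 and 0.8 is exactly the 0.6 shown. The function name and the dictionary keys below are hypothetical.

```python
def aggregate_similarity(components, weights=None):
    """Weighted linear aggregation of component similarities.

    When no weights are given, every component counts equally (an assumption)."""
    if weights is None:
        weights = {name: 1.0 for name in components}
    total_weight = sum(weights[name] for name in components)
    weighted = sum(weights[name] * components[name] for name in components)
    return weighted / total_weight


# Component scores for Car - Automobile, as listed on the slide.
car_automobile = {"label": 0.0, "super": 1.0, "instance": 0.6, "relation": 0.8}
print(round(aggregate_similarity(car_automobile), 2))  # 0.6
```

Passing a `weights` dictionary lets some evidence count more than other evidence. For example, instance overlap could be trusted more than label equality.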
Terminology cont.

Ontology Translation: an ontology transformation function t
for translating an ontology o written in some language L into
another ontology o’ written in a distinct language L’.

Ontology Merging: the creation of a new ontology from two
(possibly overlapping) source ontologies. This concept is closely
related to that of integration in the database community.

Ontology Reconciliation: a process that harmonizes the
content of two (or more) ontologies, typically requiring changes
on one of the two sides or even on both sides.
An Example of Ontology Merging
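[Figure: two class hierarchies to be merged, one rooted at Object and one rooted at Thing. Together they contain Vehicle, Bus, Car, Automobile, Luxury Car, Sport Car and Family Car (the last two appearing in both), plus Porsche and BMW.]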
An Example of Ontology Merging
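[Figure: the merged hierarchy. Equivalent entities become single nodes ("Object, Thing" and "Car, Automobile"), and Sport Car and Family Car appear only once. The remaining nodes are Vehicle, Bus, Luxury Car, BMW and Porsche.]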
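Below is a minimal sketch of merging. It assumes that each ontology has been reduced to a set of (subclass, superclass) edges and that an alignment has already declared which entities are equivalent. A union-find structure collapses the equivalent entities, and every edge is then re-pointed at the merged nodes. As a result, pairs such as Object/Thing and Car/Automobile become single nodes, as in the merged hierarchy above. The function name and the example edges are hypothetical. Entities with identical names (such as Vehicle) are assumed to denote the same class.

```python
def merge_hierarchies(edges_a, edges_b, equivalences):
    """Merge two sets of (child, parent) edges, collapsing equivalent entities.

    Entities with identical names are treated as one node (an assumption)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x, y in equivalences:  # union the two equivalent entities
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    edges = list(edges_a) + list(edges_b)
    # Collect the original names that end up in each merged node.
    members = {}
    for child, sup in edges:
        for name in (child, sup):
            members.setdefault(find(name), set()).add(name)
    label = {root: ", ".join(sorted(names)) for root, names in members.items()}

    # Re-point every edge at the merged nodes; the set drops duplicate edges.
    return {(label[find(c)], label[find(s)]) for c, s in edges}


ontology_a = [("Vehicle", "Object"), ("Car", "Vehicle"), ("Bus", "Vehicle")]
ontology_b = [("Vehicle", "Thing"), ("Automobile", "Vehicle"), ("Sport Car", "Automobile")]
merged = merge_hierarchies(ontology_a, ontology_b,
                           [("Object", "Thing"), ("Car", "Automobile")])
for child, sup in sorted(merged):
    print(f"{child} -> {sup}")
```

This sketch only unions the two hierarchies. A real merging tool would also detect the conflicts this can create, such as redundant paths or cycles.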
Forms of Heterogeneity in Ontologies

Syntactic: depends on the choice of the representation language (OWL, RDFS, DAML, N3, DATALOG, PROLOG, …).
Terminological: all forms of mismatch that are related to the process of naming the entities (e.g. individuals, classes, properties, relations) that occur in an ontology.
- Typical examples:
  - different words are used to name the same entity (synonymy);
  - the same word is used to name different entities (polysemy);
  - words from different languages (English, French, etc.) are used to name entities;
  - syntactic variations of the same word (different acceptable spellings, abbreviations, use of optional prefixes or suffixes, etc.).
Mismatches at the terminological level are not as deep as those occurring at the conceptual level. However, most real cases have to do with the terminological level (e.g. with the way different people name the same entities), so this level is at least as crucial as the other one.
Heterogeneity in Ontologies, cont.

Conceptual: mismatches that have to do with the content of an ontology.
- Metaphysical differences: differences in how the world is "broken into pieces".
  - Coverage: the ontologies cover different (possibly overlapping) portions of the world.
  - Granularity: one ontology provides a more (or less) detailed description of the same entities.
  - Perspective: an ontology may provide a viewpoint that differs from the viewpoint adopted in another ontology.
Heterogeneity in Ontologies, cont.
[Figure: Metaphysical differences]
Overcoming Heterogeneity

One common approach to the problems of heterogeneity is to define relations across the heterogeneous representations.

These relations can be used to transform expressions of one ontology into a form compatible with that of the other.

This may happen at any level:
- syntactic: through semantics-preserving transducers;
- terminological: through functions mapping lexical information;
- conceptual: through general transformations of the representations (sometimes requiring a complete prover for some languages).
Structure of Mapping

Alignment: a process that starts from two representations o and o’ and produces a set of mappings between pairs of (simple or complex) entities <e, e’> belonging to o and o’ respectively.

In general, a mapping can be described as a quadruple:
<e, e’, n, R>

- e and e’ are the entities between which a relation is asserted by the mapping.
- n is a degree of trust (confidence) in that mapping.
- R is the relation associated with the mapping; it identifies the relation holding between e and e’. R can be:
  - a simple set-theoretic relation;
  - a fuzzy relation;
  - a probabilistic distribution over a complete set of relations;
  - a similarity measure.
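The quadruple can be represented directly as an immutable record, so that an alignment is simply a set of mappings. This is a sketch. The class name, its field names, the relation symbols ("=" for equivalence, "<" for subsumption) and the requirement that n lie in [0, 1] are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical record for a mapping <e, e', n, R>."""
    e: str         # entity of ontology o
    e_prime: str   # entity of ontology o'
    n: float       # degree of trust (confidence) in the mapping
    relation: str  # R, here a set-theoretic symbol such as "=" or "<" (assumed notation)

    def __post_init__(self):
        # Confidence is assumed to lie in [0, 1].
        if not 0.0 <= self.n <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.n}")


# An alignment is a set of mappings (frozen dataclasses are hashable).
alignment = {
    Mapping("Car", "Automobile", 0.6, "="),
    Mapping("Vehicle", "Vehicle", 1.0, "="),
}
```

A fuzzy or probabilistic R would need a richer `relation` field, for example a dictionary from relation symbols to degrees.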
Similarity


There are many ways to assess the similarity between two entities. The most common way is to define a measure of this similarity.
Characteristics that can be required of these measures:
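One standard way of stating such characteristics comes from the general ontology-matching literature, not from this slide. It distinguishes similarities, dissimilarities and distances over a set of entities O:

```latex
\begin{aligned}
&\text{Similarity } \sigma : O \times O \to \mathbb{R}\\
&\quad \forall x, y \in O,\ \sigma(x, y) \ge 0 && \text{(positiveness)}\\
&\quad \forall x, y, z \in O,\ \sigma(x, x) \ge \sigma(y, z) && \text{(maximality)}\\
&\quad \forall x, y \in O,\ \sigma(x, y) = \sigma(y, x) && \text{(symmetry)}\\
&\text{Dissimilarity } \delta : O \times O \to \mathbb{R}\\
&\quad \forall x, y \in O,\ \delta(x, y) \ge 0 && \text{(positiveness)}\\
&\quad \forall x, y, z \in O,\ \delta(x, x) \le \delta(y, z) && \text{(minimality)}\\
&\quad \forall x, y \in O,\ \delta(x, y) = \delta(y, x) && \text{(symmetry)}\\
&\text{Distance: a dissimilarity that also satisfies}\\
&\quad \forall x, y \in O,\ \delta(x, y) = 0 \iff x = y && \text{(definiteness)}\\
&\quad \forall x, y, z \in O,\ \delta(x, y) + \delta(y, z) \ge \delta(x, z) && \text{(triangle inequality)}
\end{aligned}
```

Measures are often also normalized to [0, 1]. A normalized similarity and dissimilarity can then be converted into each other with σ = 1 − δ.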
Overcoming Heterogeneity Using Similarity

Local Methods
- Terminological Methods
  - String Based Methods
  - Token Based Methods
  - Language Based Methods
- Structural Methods
  - Internal Structure
  - External Structure
- Extensional (based on instances) Methods
  - When the classes share the same instances
  - When they do not
Terminological Methods

Terminological methods compare strings.

They can be applied to:
- names,
- labels,
- comments concerning entities,
- URIs.

They take advantage of the structure of the string (as a sequence of letters).

The main idea behind such measures is that similar entities usually have similar names and descriptions in different ontologies.
Terminological M., cont. (Normalization)

There are a number of normalization procedures that help improve the results of subsequent comparisons:

- Case normalization: converts each alphabetic character in the string to its lowercase counterpart;
- Diacritics suppression: replaces characters bearing diacritic signs with their most frequent replacement (e.g. Montréal becomes Montreal);
- Blank normalization: normalizes all blank characters (blank, tabulation, carriage return) into a single blank character;
- Link stripping: normalizes some links between words (e.g. replacing apostrophes and underscores with dashes);
- Stopword elimination: eliminates words that can be found in a list (usually words such as "to", "a", …).
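The normalization steps above can be chained into a single function, as sketched below. Several choices are assumptions: the stopword list is a hypothetical minimal one, stopwords are removed before link stripping so that joined words stay intact, and diacritics are dropped by decomposing characters (NFKD) and discarding the combining marks.

```python
import re
import unicodedata

STOPWORDS = {"a", "an", "the", "to", "of"}  # hypothetical minimal stopword list


def normalize(s: str) -> str:
    s = s.lower()                                                 # case normalization
    s = "".join(ch for ch in unicodedata.normalize("NFKD", s)
                if not unicodedata.combining(ch))                 # diacritics suppression
    s = re.sub(r"\s+", " ", s).strip()                            # blank normalization
    s = " ".join(w for w in s.split(" ") if w not in STOPWORDS)   # stopword elimination
    s = re.sub(r"['’_]", "-", s)                                  # link stripping
    return s


print(normalize("Article   of the Montréal_Press"))  # article montreal-press
```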
Terminological M., cont. (String Based)






- Substring Similarity
- Hamming Distance
- N-Gram Distance
- Edit Distance
- Jaro Similarity
- Token Based Distances
  - Term Frequency Inverse Document Frequency (TF/IDF)
- Path Distance: compares not only the labels of the objects but also the sequence of labels of the entities to which those bearing the labels are related.
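Three of the string-based measures listed above can be sketched in a few lines. The sketch makes these choices:

- Substring similarity is 2|x| / (|s| + |t|), with x the longest common substring (found here with `difflib.SequenceMatcher`).
- Hamming distance counts differing positions plus the length difference, normalized by the longer length.
- N-gram similarity is the Jaccard overlap of the trigram sets. This normalization is an assumption; other normalizations are also used.

```python
from difflib import SequenceMatcher


def substring_similarity(s, t):
    """2|x| / (|s| + |t|), where x is the longest common substring."""
    if not s and not t:
        return 1.0
    m = SequenceMatcher(None, s, t, autojunk=False).find_longest_match(0, len(s), 0, len(t))
    return 2 * m.size / (len(s) + len(t))


def hamming_distance(s, t):
    """Differing positions plus the length difference, over the longer length."""
    if not s and not t:
        return 0.0
    mismatches = sum(a != b for a, b in zip(s, t))
    return (mismatches + abs(len(s) - len(t))) / max(len(s), len(t))


def ngrams(s, n=3):
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(s, t, n=3):
    """Jaccard overlap of the n-gram sets (one normalization among several)."""
    a, b = ngrams(s, n), ngrams(t, n)
    if not a | b:  # both strings shorter than n
        return 1.0 if s == t else 0.0
    return len(a & b) / len(a | b)


print(substring_similarity("article", "articles"))
print(hamming_distance("car", "cat"))
print(ngram_similarity("automobile", "automobiles"))
```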
Terminological M., cont (String Methods)

In string edit distance, the operations usually considered are insertion of a character, replacement of a character by another, and deletion of a character.

Levenshtein distance is an edit distance in which every operation costs 1.
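The Levenshtein distance can be computed with the classic dynamic-programming recurrence. Each cell holds the cheapest way to turn a prefix of s into a prefix of t, and only the previous row is kept in memory. The function name is an illustrative choice.

```python
def levenshtein(s: str, t: str) -> int:
    """Edit distance where insertion, deletion and replacement each cost 1."""
    prev = list(range(len(t) + 1))  # cost of turning "" into each prefix of t
    for i, cs in enumerate(s, start=1):
        cur = [i]  # cost of turning s[:i] into ""
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                 # delete cs
                           cur[j - 1] + 1,              # insert ct
                           prev[j - 1] + (cs != ct)))   # replace (free if equal)
        prev = cur
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

To compare labels of different lengths, the distance is often normalized, for instance as 1 − d / max(|s|, |t|).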
Terminological M., cont. (Language Based)



Language-based methods rely on NLP techniques to find associations between instances of concepts or classes.
- Intrinsic methods: perform terminological matching with the help of morphological and syntactic analysis to normalize terms (e.g. stemming: going → go).
- Extrinsic methods: use external resources such as dictionaries and lexicons (e.g. WordNet).
  - Resnik semantic similarity
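Resnik's measure can be written as shown below. It assumes a taxonomy such as WordNet and concept probabilities p(c) estimated from corpus frequencies, where the count of a concept includes the counts of every concept it subsumes. S(c₁, c₂) is the set of concepts that subsume both c₁ and c₂. The similarity is therefore the information content of their most informative common subsumer.

```latex
\mathrm{sim}_{\mathrm{Resnik}}(c_1, c_2) = \max_{c \in S(c_1, c_2)} \bigl[-\log p(c)\bigr]
```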
Structural Methods

Instead of comparing the names or identifiers of entities, the structure of the entities found in ontologies can be compared.

Internal Structure: uses criteria such as the range of their properties (attributes and relations), their cardinality, and the transitivity and/or symmetry of their properties to calculate the similarity between them.

External Structure: the similarity between two entities from two ontologies can be based on the position of the entities within their hierarchies.
Structural Methods (External)

If two entities from two ontologies are similar, their neighbors might also be somewhat similar.

Criteria for deciding that two entities are similar include:
- Their direct super-entities are already similar.
- Their sibling entities are already similar.
- Their direct sub-entities are already similar.
- All (or most) of their descendant entities (entities in the subtree rooted at the entity in question) are already similar.
- All (or most) of their leaf entities are already similar.
- All (or most) of the entities on the paths from the root to the entities in question are already similar.
Structural Methods (External), cont.

Existing Approaches:
- Structural topological dissimilarity on hierarchies
- Upward Cotopic Distance
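Below is a sketch of the upward cotopic distance. It assumes both entities are placed in a single hierarchy H, given as a child-to-parents dictionary. The upward cotopy UC(c, H) is the set made of c and all its superclasses in H. The distance is the share of the union of two cotopies that the entities do not have in common. The function names and the example hierarchy are hypothetical.

```python
def upward_cotopy(c, parents):
    """UC(c, H): c together with all of its superclasses."""
    seen, stack = {c}, [c]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def upward_cotopic_distance(c1, c2, parents):
    """(|UC1 ∪ UC2| - |UC1 ∩ UC2|) / |UC1 ∪ UC2|"""
    u1, u2 = upward_cotopy(c1, parents), upward_cotopy(c2, parents)
    union = u1 | u2
    return (len(union) - len(u1 & u2)) / len(union)


hierarchy = {"Vehicle": ["Thing"], "Car": ["Vehicle"], "Boat": ["Vehicle"], "Sport Car": ["Car"]}
print(upward_cotopic_distance("Car", "Boat", hierarchy))       # 0.5
print(upward_cotopic_distance("Car", "Sport Car", hierarchy))  # 0.25
```

Entities that share more of their ancestors are closer, which matches the criterion that similar super-entities suggest similar entities.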
Extensional (based on instances) Methods

Extensional methods compare the extension of classes, i.e. their sets of instances, rather than their interpretation.

Conditions in which such techniques can be used:
- when the classes share the same instances;
- when they do not.
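When the classes share the same instances, a simple extensional measure is the Jaccard coefficient of their instance sets: shared instances divided by all instances. The sketch covers only this first case. When the instances differ, they would first have to be matched themselves. The instance identifiers below are hypothetical, and returning 0.0 for two empty classes is an assumption.

```python
def jaccard(instances_a: set, instances_b: set) -> float:
    """|A ∩ B| / |A ∪ B| over the instance sets of two classes."""
    if not instances_a and not instances_b:
        return 0.0
    return len(instances_a & instances_b) / len(instances_a | instances_b)


car = {"peugeot_405", "golf", "bmw_x5"}
automobile = {"peugeot_405", "golf", "clio"}
print(jaccard(car, automobile))  # 0.5
```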
Global Methods

After the local similarities have been calculated, the alignment remains to be computed. This involves some kind of more global treatment, including:
- aggregating the results of these base methods in order to compute the similarity between compound entities;
- developing a strategy for computing these similarities in spite of cycles and non-linearity in the constraints governing similarities;
- organizing the combination of various similarity/alignment algorithms;
- involving the user in the loop;
- finally, extracting the alignments from the resulting (dis)similarity.
Compound similarity
Global similarity computation



The computation of compound similarity is still local, because it only considers the neighborhood of a node.
Similarity may involve the ontologies as a whole, and the final similarity values may ultimately depend on all of the ontologies.
The distances defined by local methods can be circular. For instance, the distance between two classes may depend on the distances between their instances, which in turn depend on the distance between their classes. Cycles in the ontology produce the same effect.

Strategies must be defined in order to compute this global similarity:
- Similarity Flooding
- Similarity equation fixpoint
Global similarity (Similarity Flooding)

The two ontologies are first translated into directed labeled graphs.

The algorithm then creates another graph G whose nodes are pairs of nodes of the initial graphs. G has an edge labeled p between (o1, o’1) and (o2, o’2) whenever there is an edge (o1, p, o2) in the first graph and an edge (o’1, p, o’2) in the second one.

It computes initial similarity values between nodes (based on their labels, for instance). It then iterates steps that recompute the similarity between nodes as a function of the similarity between their adjacent nodes at the previous step.

It stops when no similarity changes by more than a given threshold, or after a predetermined number of steps.

It uses a weighted linear aggregation in which the weight of an edge is the inverse of the number of other edges with the same label reaching the same pair of entities.
Similarity Flooding Algorithm
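The steps described above can be sketched as a simplified fixpoint computation. The sketch rests on these assumptions:

- Graphs are lists of (source, label, target) triples.
- The caller supplies the initial similarity function.
- The update is the basic σ^{i+1} = normalize(σ^i + φ(σ^i)), where φ propagates each pair's similarity to its neighbors in both directions.
- Following the inverse-count rule, a propagated edge's weight is 1 over the number of same-labeled edges leaving (forward) or reaching (backward) the same pair.

The function name and the default values of `eps` and `max_steps` are illustrative.

```python
from collections import defaultdict


def similarity_flooding(edges_a, edges_b, initial, eps=1e-4, max_steps=100):
    """Simplified similarity flooding over two labeled graphs."""
    # Pairwise graph G: ((a1, b1), p, (a2, b2)) for every pair of same-labeled edges.
    pairwise = [((a1, b1), p, (a2, b2))
                for a1, p, a2 in edges_a
                for b1, q, b2 in edges_b if p == q]
    if not pairwise:
        return {}

    # Count same-labeled edges leaving / reaching each pair node.
    leaving, reaching = defaultdict(int), defaultdict(int)
    for src, p, dst in pairwise:
        leaving[src, p] += 1
        reaching[dst, p] += 1

    # Propagation in both directions, weighted by the inverse counts.
    propagation = []
    for src, p, dst in pairwise:
        propagation.append((src, dst, 1.0 / leaving[src, p]))
        propagation.append((dst, src, 1.0 / reaching[dst, p]))

    nodes = {n for src, _, dst in pairwise for n in (src, dst)}
    sigma = {n: initial(*n) for n in nodes}
    for _ in range(max_steps):
        nxt = dict(sigma)                             # sigma^i ...
        for frm, to, w in propagation:
            nxt[to] += w * sigma[frm]                 # ... + phi(sigma^i)
        top = max(nxt.values()) or 1.0
        nxt = {n: v / top for n, v in nxt.items()}    # normalize by the largest value
        change = max(abs(nxt[n] - sigma[n]) for n in nodes)
        sigma = nxt
        if change < eps:                              # no value moved more than eps
            break
    return sigma


a = [("Car", "hasPart", "Wheel"), ("Car", "hasOwner", "Person")]
b = [("Automobile", "hasPart", "Tyre"), ("Automobile", "hasOwner", "Owner")]
result = similarity_flooding(a, b, lambda x, y: 1.0 if x == y else 0.1)
for pair, score in sorted(result.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Every pair starts at 0.1, yet (Car, Automobile) ends with the highest score because both of its neighboring pairs feed similarity back into it.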
Learning Methods


As in many other fields, methods developed in machine learning prove useful in ontology alignment.
Two particular areas:
- Supervised learning, in which the ontology alignment algorithm learns how to work from the presentation of many good alignments (positive examples) and bad alignments (negative examples).
  - It is difficult to know which techniques work well for which ontology features.
  - An ontology alignment algorithm learned with several ontology pairs might not necessarily work well for a new ontology pair.
- Learning from data, in which a population of instances is communicated to the algorithm together with their relations and the classes they belong to.
User Feedback

Supporting effective interaction between the user and the system components is one concern of ontology alignment.

User input can take place in many areas of alignment:
- assessing the initial similarity between some terms;
- invoking and composing alignment methods;
- accepting or refusing the similarities or alignments provided by the various methods.
Alignment Extraction



The ultimate goal of alignment is a satisfactory set of correspondences between ontologies.
Manual Extraction: display the entity pairs with their similarity scores and/or ranks, and leave the choice of the appropriate pairs to the user of the alignment tool.
Automatic Extraction:
- Using thresholds:
  - Hard threshold: retains all correspondences above threshold n;
  - Delta method: uses as a threshold the highest similarity value minus a constant value d;
  - Proportional method: uses as a threshold a percentage of the highest similarity value;
  - Percentage: retains the n% of correspondences with the highest similarity.
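The four threshold strategies can be sketched over a dictionary that maps (e, e’) pairs to similarity scores. Several choices here are assumptions: the hard threshold keeps scores at or above n, the proportion p is a fraction between 0 and 1, and the percentage method rounds the number of kept correspondences up. The function names and scores are hypothetical.

```python
import math


def hard_threshold(scores, n):
    """Keep every correspondence whose similarity is at least n."""
    return {pair: s for pair, s in scores.items() if s >= n}


def delta_threshold(scores, d):
    """Threshold = highest similarity minus a constant d."""
    return hard_threshold(scores, max(scores.values(), default=0.0) - d)


def proportional_threshold(scores, p):
    """Threshold = a fraction p (between 0 and 1) of the highest similarity."""
    return hard_threshold(scores, p * max(scores.values(), default=0.0))


def percentage(scores, n):
    """Keep the top n% of correspondences (count rounded up)."""
    k = math.ceil(len(scores) * n / 100)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])


scores = {("Car", "Automobile"): 0.9, ("Car", "Boat"): 0.3,
          ("Vehicle", "Vehicle"): 1.0, ("Owner", "Speed"): 0.1}
print(hard_threshold(scores, 0.5))
print(delta_threshold(scores, 0.15))
print(proportional_threshold(scores, 0.5))
print(percentage(scores, 25))
```

Only the hard threshold ignores the score distribution. The other three adapt to the highest value, which helps when one pair of ontologies yields uniformly lower scores than another.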
Alignment Extraction, cont.

Automatic Extraction

Using optimization of the result:
- If an injective mapping is required, then some choices need to be made in order to maximize the "quality" of the alignment.
  - Quality is typically measured by the total similarity of the aligned entity pairs.
- A greedy alignment algorithm could construct the correspondences step by step. At each step it selects the most similar pair and deletes its members from the table. The algorithm stops when no remaining pair has a similarity above the threshold. (Not optimal.)
- Optimal solution: stable marriage.
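Both extraction strategies can be sketched over a complete table of (e, e’) scores. The greedy version keeps the best remaining pair whose members are both still free. The stable-marriage version is the Gale-Shapley algorithm. Entities of o propose and entities of o’ accept or reject, and both sides are assumed to rank candidates by the same similarity score. Function names and scores are hypothetical.

```python
def greedy_extraction(scores, threshold):
    """Repeatedly take the most similar remaining pair (one-to-one)."""
    chosen, used_a, used_b = [], set(), set()
    for (a, b), s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if s < threshold:  # no remaining pair reaches the threshold
            break
        if a not in used_a and b not in used_b:
            chosen.append((a, b, s))
            used_a.add(a)
            used_b.add(b)
    return chosen


def stable_marriage(scores):
    """Gale-Shapley; assumes scores covers every (a, b) pair."""
    prefs = {}
    for a, b in scores:
        prefs.setdefault(a, []).append(b)
    for a in prefs:
        prefs[a].sort(key=lambda b, a=a: scores[a, b], reverse=True)
    free = sorted(prefs)                 # entities of o not yet matched
    next_choice = {a: 0 for a in prefs}
    partner = {}                         # entity of o' -> entity of o
    while free:
        a = free.pop()
        if next_choice[a] == len(prefs[a]):
            continue                     # a has proposed to everyone
        b = prefs[a][next_choice[a]]
        next_choice[a] += 1
        current = partner.get(b)
        if current is None:
            partner[b] = a
        elif scores[a, b] > scores[current, b]:
            partner[b] = a               # b prefers a; current becomes free
            free.append(current)
        else:
            free.append(a)               # b rejects a
    return sorted((a, b, scores[a, b]) for b, a in partner.items())


scores = {("a1", "b1"): 0.9, ("a1", "b2"): 0.8, ("a2", "b1"): 0.8, ("a2", "b2"): 0.1}
print(greedy_extraction(scores, 0.0))
print(stable_marriage(scores))
```

On these scores, both procedures pair a1 with b1 and a2 with b2, for a total of 1.0. Pairing a1 with b2 and a2 with b1 would instead give 1.6. A stable marriage guarantees that no two entities would both prefer each other over their assigned partners. Maximizing the total similarity, by contrast, is a maximum-weight bipartite matching problem, solved for example by the Hungarian method.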
An Example: Anchor Prompt Method



Anchor-PROMPT (an extension of PROMPT) is an ontology merging and alignment tool that suggests possible matching terms.
It is implemented in Protégé: http://protege.stanford.edu
It is an incremental algorithm:
- It takes as input two ontologies and a set of anchors: pairs of related terms.
- Anchors are identified with the help of string-based techniques, or defined by a user.
- It then refines them based on the ontology structures and user feedback.
The PROMPT Algorithm
Make initial suggestions
Select the next operation
Perform automatic updates
Find conflicts
Make suggestions
After a User Performs an Operation

For each operation:
- perform the operation
- consider possible conflicts
  - identify conflicts
  - propose solutions
- analyze local context
  - create new suggestions
  - reinforce or downgrade existing suggestions
Conflicts

Conflicts that PROMPT identifies:
- name conflicts
- dangling references
- redundancy in a class hierarchy
- slot-value restrictions that violate class inheritance
Anchor-PROMPT:Using Non-Local Contexts
[Figure: Ontology 1 and Ontology 2, with anchor pairs linking them]

Input:
- a set of anchor pairs

Output:
- a set of related terms with similarity scores

Where do anchors come from?
- lexical matching
- interactive tools
- user-specified
Generating Paths in the Graph
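Below is a simplified sketch of the path-based idea behind Anchor-PROMPT. For every two anchor pairs, it enumerates the paths of bounded length between the anchors in each ontology. For each pair of paths of equal length, it increases the score of the terms that occupy the same position on both paths. The graph encoding (a dictionary of directed edges), the length limit, the +1 increment and the function names are assumptions; the actual tool's scoring differs in detail.

```python
from collections import defaultdict
from itertools import combinations


def simple_paths(graph, start, end, max_len):
    """All simple paths from start to end with at most max_len edges."""
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end and len(path) > 1:
            paths.append(path)
            continue
        if len(path) - 1 == max_len:
            continue
        for nxt in graph.get(path[-1], ()):
            if nxt not in path:
                stack.append(path + [nxt])
    return paths


def anchor_path_scores(g1, g2, anchors, max_len=3):
    """Score term pairs found at the same position on equal-length anchor paths."""
    scores = defaultdict(int)
    for (a1, b1), (a2, b2) in combinations(anchors, 2):
        for p in simple_paths(g1, a1, a2, max_len):
            for q in simple_paths(g2, b1, b2, max_len):
                if len(p) == len(q):
                    for x, y in zip(p[1:-1], q[1:-1]):  # inner terms only
                        scores[x, y] += 1
    return dict(scores)


g1 = {"Vehicle": ["Car"], "Car": ["Engine"]}
g2 = {"Vehicle": ["Automobile"], "Automobile": ["Motor"]}
print(anchor_path_scores(g1, g2, [("Vehicle", "Vehicle"), ("Engine", "Motor")]))
```

Here the anchors Vehicle/Vehicle and Engine/Motor suggest Car/Automobile as a new candidate pair, because those two terms sit at the same position between the anchors.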
Existing Works
[Table: existing ontology alignment tools compared by year, organization, project leader, method (semi-automatic or automatic) and the features they exploit (String, Lexical, Semantic, Structure, Instance, Aggregation). The earlier tools listed are OntoMorph (Chalupsky, S. California), Smart (Fridman Noy), Chimaera (McGuinness, Stanford) and PROMPT (Noy and Musen, Stanford), dating from 1997 to 2001. U.S. Army and DARPA also appear among the organizations.]
- InfoSleuth: 2001, Amsterdam, Ding, semi-automatic
- Anchor-PROMPT: 2002, Stanford, Noy and Musen, semi-automatic
- Glue: 2002, Illinois, Doan, automatic
- IF-Map: 2003, Southampton, Kalfoglou, automatic
- NOM: 2003, Karlsruhe, Ehrig, automatic
- QOM: 2004, Karlsruhe, Ehrig, automatic
- CROSI: 2005, Southampton, Kalfoglou, automatic
The End