Improving the Search for the Most Similar Concept in other Ontology

Abstract. Improvements made to the algorithm COM of Olivares are presented.
This algorithm finds the concept cB in ontology B that most closely resembles a
given concept cA in ontology A. It also finds the similarity between both
concepts. Our improvements make it more complete and increase its accuracy.
They consist of: (1) a better notation for ontologies; (2) representation of
partitions; (3) a concept may belong to more than one super-class (hypernymy);
(4) relations or values that are just words or word phrases are allowed; (5)
verification of synonymy in the analysis of the properties of each concept,
which renders the search more intelligent, dealing more fully with the semantics
of the concepts; (6) application of an algorithm that determines the best
mapping between ontologies A and B. Some examples are given.
1. Introduction and objectives
For an agent, the purpose of communicating with some other agent is to fulfill its
objectives. Thus, a successful communication between agents A and B occurs if both
are closer to their goal as its result. The paper ‘Finding the most similar concepts in
two different ontologies’ [4] deals with communication with little previous
agreement: agents A and B agree only to share a given communication language. In
order to carry out an efficient communication between agents, algorithm COM [4, 9]
seeks to find concept cB in OB most similar to cA in OA. It works as follows: Given
two ontologies OA and OB, when agent A wants to communicate concept cA to agent
B, it sends to OB two words (or word phrases), the first denoting the concept and the
second denoting its predecessor (the ‘father’ of cA). In order to find the concept cB in
OB most similar to cA, COM considers four cases:
a. Both concepts cA and the father pA of cA are in OB;
b. The concept cA is not in OB but its father pA is;
c. The concept cA is in OB but its father pA is not;
d. Neither cA nor its father pA is in OB.
For each of these cases, COM finds a reasonable answer through limited search.
This article preserves these four cases, but improves the accuracy of the result cB by
better exploitation of the semantics represented in ontologies OA and OB.
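As an illustration only, the four cases can be told apart with two membership tests. The sketch below assumes, hypothetically, that OB is reduced to the set of words that denote some concept in it; the name com_case and this flat representation are not part of COM [4, 9], which searches the structure of the ontology.

```python
def com_case(concept_word, father_word, words_in_ob):
    """Return which of COM's cases (a)-(d) applies.

    words_in_ob is a hypothetical stand-in for OB: the set of words that
    denote some concept there.
    """
    concept_found = concept_word in words_in_ob
    father_found = father_word in words_in_ob
    if concept_found and father_found:
        return "a"
    if father_found:
        return "b"
    if concept_found:
        return "c"
    return "d"


# Hypothetical contents of OB, loosely following Example 1 of section 3.
words_in_ob = {"mirror", "accessories for car", "glass", "reflector"}
print(com_case("mirror", "pharmacy", words_in_ob))  # prints "c"
```

The classification only decides where the search starts; the answer cB in each case still depends on the comparison of properties described in the rest of the paper.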
The improvements are:
1. Representation of partitions.
2. More general ontology notation. Partitions, multiple hypernymy (see below), word
descriptions in several languages, and other improvements in the representation of
knowledge, have required a more general ontology notation. See figure 2.
3. Multiple hypernymy. A concept may belong to several hyperclasses (it may have
multiple parents). Example: apple ⊂ food, and apple ⊂ fruit.
4. Simplified relations or values. Relations and values that are just words or word
phrases are allowed. Not every relation is a concept; not every related item is a
concept. In this paper, concepts appear in Courier font.
5. Extended use of synonyms. Match of words by COM is extended to use synonyms
(either given in the ontologies or through an external knowledge source), so that
the names of the concepts, of the properties (relations) and of their values are better
matched. Example: color = red and tonality = reddish are found to
mean the same, to match. This ‘deeper’ use of the semantics in the ontology
enhances our ‘improved comparator’ or ‘improved COM’.
6. Dance algorithm. Given two collections of relation-value pairs, {r1A = v1A, r2A =
v2A, …, rkA = vkA} belonging to concept cA, and {r1B = v1B, r2B = v2B, …, rmB
= vmB} belonging to candidate concept cB, we have constructed a relaxation-type
algorithm, called the “dance algorithm,” that finds the best pair (r*A = v*A, r*B = v*B),
that is, the pair with the best match.
The article describes these improvements, in the context of the four cases (a)-(d).
1.1 Related work
• Ontology representation. Ontologies are represented in several notations. Olivares
[4] requires that every node in the ontology be a concept. OWL [2] is frequently
used with RDF/XML. RDF [11] is strongly linked to XML. DAML+OIL [3]
considers inference mechanisms; it does not identify synonymy. KIF [7] is efficient
in agent to agent communication. None of these representations can represent
partitions, nor multiple parents, nor synonymy. Simpler representations are given
for hierarchies [5], a tree where each node is a concept (a ‘symbolic value’) or a
partition. This work also does not handle multiple hypernymy (multiple parents).
• Simplified relations and values are found in most ontology representation
languages, but always distinguishing them through special names. In our notation,
an ontology is represented as a (R, C) pair of relations and concepts, which could
be ‘simplified’ as per improvement number 4.
• Synonymy is exploited in several matching algorithms; for instance, [1] uses it
for disambiguation. Nevertheless, our approach mixes concept comparison (based
on the position and relations of each concept in its ontology) with synonym
comparison (based on a given list of synonyms or near-identical terms).
• Finding the most likely pair. Let sim(a, b) be the similarity between concepts a ∈
A and b ∈ B. The problem to solve is: for two sets A and B of sizes sA and sB, with
s = min(sA, sB), find s pairs of the form (ai, bi) that maximize Σ_{1≤i≤s} sim(ai, bi).
Our ‘dance algorithm’ is guaranteed to find the s pairs that maximize the summation.
Another algorithm with this property is the relaxation algorithm used for vision and
image processing [6]. A simpler version that finds good but perhaps not optimal
results in short time is Similarity Flooding [8].
• Given cA, find the most similar concept in another ontology. This problem was solved
for the first time in [9]. See also [4], which this paper extends.
Fig. 1. An ontology shown as a tree. Arrows indicate the superset or hypernymy relation. The
words associated with each concept appear in parentheses after it; some are omitted from the figure
for clarity. Relation-value pairs appear separated by ‘=’, thus: (appearance = flat).
2. Improving the search of concept cB ∈OB most similar to cA ∈OA
The problem is: given a concept in an ontology (OA), find the most similar concept
in another ontology (OB). The six improvements (§1) over the algorithm [4] are:
2.1. Representation of partitions. A partition is a particular type of relation. A
partition of a set is a collection of subsets such that any two of them are mutually
exclusive and all of them together are collectively exhaustive: every element of the
set must be in exactly one of the subsets. Example: see age in figure 2.
Advantages: • A partition has more information than just a collection of subsets.
Formats OWL [2], DAML+OIL [3], and RDF [11] are unable to represent partitions.
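The definition in §2.1 can be checked mechanically. The following is a minimal sketch, assuming the set and its subsets are given as Python sets; the function name is_partition and the sample data are illustrative and do not come from figure 2.

```python
def is_partition(whole, subsets):
    """True if the subsets are pairwise disjoint and together cover `whole`."""
    covered = set()
    for subset in subsets:
        if covered & subset:      # shares an element with an earlier subset
            return False
        covered |= subset
    return covered == set(whole)


# Hypothetical data: people partitioned by an `age` relation.
people = {"Ann", "Bob", "Eve", "Tom"}
by_age = [{"Ann"}, {"Bob", "Eve"}, {"Tom"}]
print(is_partition(people, by_age))                     # True
print(is_partition(people, [{"Ann", "Bob"}, {"Bob"}]))  # False
```

A plain collection of subsets passes no such test, which is the extra information a partition carries.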
2.2. A better notation for ontologies. The new notation (Figure 2) consists of a
structure with labels that identify the description of each concept and its relations:
<concept> </concept> : contains the name of the concept.
<idiom> </idiom> : contains the language of the words.
<word> </word> : contains the words or word phrases, separated by commas, that
describe the concept.
<relation></relation> : contains the properties of the concept and its relations
to other concepts. We deal with n-ary relations.
Relations are explicitly declared. Example (Figure 2): relation eats. An exception
to this is relation subset (hyponymy), that is expressed through nested <concept>s.
Example (Figure 2): plant is a subset of physical_object.
Fig. 2. Representation of an ontology in the new notation
Advantages: • The labels clearly define each concept. • Partitions (§2.1), multiple
superclasses (§2.3), extensions, simplified relations and values (§2.4) are allowed.
• Concepts of the ontology can be extended anywhere (§2.3). These are not present in
other formats (§1.1). Example: An ontology defines first that Canada is a country,
and later in the ontology it adds the relation Canada is a market.
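Since figure 2 is not reproduced here, the fragment below is only a hypothetical layout: the tag names <concept>, <idiom> and <word> come from §2.2, but their exact nesting, the sample words, and the helper read_concepts are assumptions made for illustration. The sketch reads nested <concept> tags as the subset relation and lets a concept that reappears later be extended rather than redefined.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: tag names follow the text, the layout is assumed.
FRAGMENT = """
<concept>physical_object
  <concept>plant
    <idiom>English</idiom>
    <word>plant, vegetable</word>
  </concept>
</concept>
"""


def read_concepts(element, parent=None, found=None):
    """Collect {name: {"parents": set, "words": list}} from nested <concept> tags."""
    if found is None:
        found = {}
    name = (element.text or "").strip()
    # setdefault lets a concept seen earlier be extended instead of replaced
    entry = found.setdefault(name, {"parents": set(), "words": []})
    if parent is not None:
        entry["parents"].add(parent)
    words = element.findtext("word", default="")
    entry["words"] += [w.strip() for w in words.split(",") if w.strip()]
    for child in element.findall("concept"):
        read_concepts(child, name, found)
    return found


concepts = read_concepts(ET.fromstring(FRAGMENT.strip()))
print(concepts["plant"]["parents"])  # {'physical_object'}
```

Handling of <relation> is left out of the sketch because its inner syntax cannot be inferred from the text alone.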
2.3. A concept may belong to more than one super-class (hypernymy). A concept
can have several parents, not just one as most languages for ontology definitions
require. Example: concept apple is a type of fruit and is also a type of food.
In our new notation, a concept defined in some part of the ontology can be
extended in another part, by adding more relations to it.
Advantages: • Concepts with multiple parents provide greater detail on the
represented concept. • The semantics of the concepts is increased.
This extension does not appear in older ontology formats (§1.1). Example: Figure
3 shows concept glass in OB with two parents (superclasses, hypernyms).
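With multiple hypernymy the ontology is no longer a tree, so the hypernyms of a concept have to be collected along every parent link. A minimal sketch, assuming the hierarchy is stored as a mapping from each concept to the set of its parents; the mapping, the function ancestors and all entries other than apple are hypothetical.

```python
def ancestors(concept, parents):
    """All hypernyms reachable from `concept` through any of its parents."""
    found = set()
    pending = [concept]
    while pending:
        for parent in parents.get(pending.pop(), ()):
            if parent not in found:
                found.add(parent)
                pending.append(parent)
    return found


# apple has two parents, as in the text; the other entries are hypothetical.
parents = {
    "apple": {"fruit", "food"},
    "fruit": {"plant_part"},
    "food": {"physical_object"},
    "plant_part": {"physical_object"},
}
print(sorted(ancestors("apple", parents)))
# ['food', 'fruit', 'physical_object', 'plant_part']
```

Extending a concept in another part of the ontology then amounts to adding one more element to its set of parents.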
Fig. 3. Concept cB = glassB has two parents (accessories for car, cosmetics).
The dotted arrow indicates the similarities between concepts; these are identified by
rectangles. At times, a subscript, as in glassB, shows the ontology of the concept.
2.4. Relations or values that are just words or word phrases are allowed. In COM
[4], relations and the related items must all be concepts. Our improvements are three:
a. A relation can be just a word or a thematic phrase. For instance, some ontology
may have the relation (John Smith, lives in, New York), where
John Smith and New York are concepts (there is additional knowledge in the
ontology about them) but ‘lives in’ is just a phrase: no additional knowledge about
this relation is captured in this ontology. Of course, for some other ontology,
lives in could be a (full) concept.
b. A concept involved in a relation could be just a word, a thematic phrase, a number,
or a string. That is, a concept could be a simpler item. For example, an ontology
could contain the relation (Ann, owns, flat monitor screen) where Ann and owns
are concepts (with additional information elsewhere in the ontology), but ‘flat
monitor screen’ is just a phrase; no additional information is provided by the
ontology.
c. The words that denote a concept can be simple (a single word, for example
‘Texas’) or composed (a thematic phrase of more than one word, for example ‘The
Lone Star State’). A concept can have more than one word or thematic phrase
denoting it.
Advantages. • Better, more accurate definition of the concept.
2.5. Verification of synonymy in the analysis of the properties of each concept.
This improvement makes the search more intelligent, dealing more fully with the
semantics of the concepts. It identifies similar properties even when the words
that define them are not exactly the same, by recognizing them as synonyms.
An example (Figure 4): What is the concept in OB most similar to
glass in OA?
Advantages. • Identification of different words with similar meaning in a given
context. • It exploits the semantics of the concept. Most languages that process
ontologies (§1.1) do not use this verification.
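The synonym check can be sketched as follows. The synonym sets reuse the paper's own examples (appearance/shape, color/tonality, red/reddish), but the scoring rule, half a point for a matching relation name and half for a matching value, is an assumption made for illustration; the paper does not state how partial matches are scored.

```python
def same_term(a, b, synonym_sets):
    """True if the two words are equal or listed together as synonyms."""
    return a == b or any(a in group and b in group for group in synonym_sets)


def property_similarity(prop_a, prop_b, synonym_sets):
    """Compare two (relation, value) pairs; the 0.5 + 0.5 scoring is assumed."""
    (rel_a, val_a), (rel_b, val_b) = prop_a, prop_b
    rel_match = same_term(rel_a, rel_b, synonym_sets)
    val_match = same_term(val_a, val_b, synonym_sets)
    return (rel_match + val_match) / 2


synonyms = [{"appearance", "shape"}, {"color", "tonality"}, {"red", "reddish"}]
print(property_similarity(("appearance", "flat"), ("shape", "flat"), synonyms))    # 1.0
print(property_similarity(("color", "red"), ("tonality", "reddish"), synonyms))    # 1.0
print(property_similarity(("color", "red"), ("shape", "flat"), synonyms))          # 0.0
```

The synonym list may come from the ontologies themselves or from an external knowledge source, as stated in §1.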
Fig. 4. Trying to ascertain whether glass in OA is similar to mirrors in OB. The properties
appearance (in OA) and shape (in OB) are compared. The values flat (of appearance)
and flat (of shape) are compared, too. The value of the similarity is found to be 1 (total
similarity) because shape and appearance are found to be synonyms. The concept where
such synonymy is found is shown in a dotted circle.
2.6. Dance algorithm. It refers to the application of an algorithm that determines the
best pairing (mapping) from ontology OA to ontology OB. That is, the best match of
(all) concepts from OA to the corresponding concepts from OB is found. Four rules are
possible:
(A) Monoandry and monogamy: to each concept of OA corresponds at most one
concept from OB, and vice versa. If the sizes of OA and OB are different, some
concepts will be left without match (“nobody to dance with”).
(B) Monogamy and polyandry. A concept from OA must match at most one concept
from OB, but a concept from OB may match several from OA.
(C) Polyandry and monogamy. A concept from OA may match several from OB, but
a concept from OB must match at most one concept from OA.
(D) Polyandry and polygamy. A concept from OA may match several from OB, and
vice versa.
The best global match (“the best dance”) is (conceptually) found as follows:
1. A couple is a pair of concepts, one from OA and the other from OB. Couples are
formed according to one of the rules (A), (B), (C) or (D) above. For instance, for
rule (B), couples (Mary, John) and (Mary, Peter) are possible (monogamy and
polyandry). That is, Mary dances at the same time with both John and Peter.
Couples are formed arbitrarily until no more are possible. For instance, under rule
(A), some concepts of the smaller ontology will not form couples. “They are left
without a partner; they do not participate in this dance.”
2. The collection of allowed couples formed in (1) is called a dance. The similarity
(resemblance, affinity) between members of each couple is measured. These
similarities are added for all couples of this dance. This total is the similarity
attached to the dance.
3. Another dance is formed by forming new couples through step 1. The similarity
attached to this new dance is assessed.
4. Once all possible dances are formed and evaluated through many repetitions of (3),
the dance with the best similarity is selected, and that defines the “best global
couples”. We have found “the best dance”, or the best global match.
The algorithm as described is pure brute force. Several heuristics are used, among them:
i. Reject early in the game couples with low similarity [such as (apple, hammer)];
ii. Using COM [4], select for each concept cA in OA the most similar concept cB in
OB, and have all these initial couples (cA, cB) as a good initial dance. Evaluate only
new dances with small departures from this initial dance. A departure may be that
cA dances not with its most similar concept cB in OB, but with its second most
similar concept c’B;
iii. Use min-max search to keep the search tree under control.
These heuristics greatly improve the running time and, although they do not
guarantee that the optimum dance is found, in most cases it is. In fact, in most cases
the optimum dance turns out to be the ‘initial dance’ defined in heuristic ii above [8].
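The brute-force procedure of steps 1 to 4 can be sketched for rule (A) as an exhaustive search over one-to-one pairings. This is a minimal sketch without any of the heuristics i to iii; the function best_dance, the similarity table and its values are hypothetical, and sim stands for whatever similarity measure is in use.

```python
from itertools import permutations


def best_dance(concepts_a, concepts_b, sim):
    """Try every one-to-one pairing (rule A) and keep the highest total."""
    swap = len(concepts_a) > len(concepts_b)
    small, large = (concepts_b, concepts_a) if swap else (concepts_a, concepts_b)
    best_total, best_couples = float("-inf"), []
    for partners in permutations(large, len(small)):
        # each couple is written as (concept from OA, concept from OB)
        couples = [(p, s) if swap else (s, p) for s, p in zip(small, partners)]
        total = sum(sim(a, b) for a, b in couples)
        if total > best_total:
            best_total, best_couples = total, couples
    return best_total, best_couples


# Hypothetical similarity values between concepts of OA and OB.
TABLE = {
    ("apple", "fruit"): 1.0, ("apple", "tool"): 0.0,
    ("hammer", "fruit"): 0.0, ("hammer", "tool"): 0.75,
    ("pear", "fruit"): 0.75, ("pear", "tool"): 0.25,
}
total, couples = best_dance(["apple", "hammer", "pear"], ["fruit", "tool"],
                            lambda a, b: TABLE[(a, b)])
print(total, couples)  # 1.75 [('apple', 'fruit'), ('hammer', 'tool')]
```

Here pear is left without a partner because OB is the smaller side. The number of dances grows factorially with the size of the ontologies, which is why heuristics are needed in practice.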
3. Examples
This section contains examples for the similarity between two concepts from two
ontologies. The new format (figure 2) for ontologies uses labeled structures, which
allow for easier description of concepts, sub-concepts, relations and values.
For simplicity, only one property is shown where needed. Properties appear after
the colon as relation = value pairs. Ontologies A and B are shown in figures 5 and 7.
Fig. 5. Ontology A
Example 1. Looking in OB for a concept most similar to mirrorA with father
pharmacyA, concept mirrorB is found but with father accessories for
carB, so we enter case (c) of COM [4, 9]. • Properties (appearance = flat) of
mirrorA are compared with those (shape = flat) of mirrorB. We check, first in
OA and then in OB, whether appearance and shape define the same concept; if so, they
are identified as synonyms. • Our algorithm seeks to match as many properties as
possible of the children of cA with those of the children of cB. In our example, the
properties of the children of mirrorA [pocket mirrorA with properties shape = flat and
reflect = luminous rays, and glassA with property shape = round] are
compared with the properties of the children of mirrorB [glassB with
properties shape = round and reflect = luminous rays, and reflectorB
with property appearance = flat]. The dance algorithm selects the best couples.
Figure 6 shows the dance participants inside a rectangle; arrows indicate couples and
numbers represent the similarity of each couple. Monogamy and monoandry are
enforced [rule (A) of §2.6].
Fig. 6. Comparing properties in the dance algorithm
Fig. 7. Ontology B
The first combination of properties among concepts is: a. Select the couple
(pocket mirrorA, glassB) which has sv = 0.75 (similarity value). b. Select the
couple (glassA, reflectorB) which has sv = 0.5. c. Remove (glassA, glassB)
with sv = 1. d. Remove (pocket mirrorA, reflectorB) which has sv = 0.5. The
result (average similarity) of this dance is (0.75 + 0.5)/2 = 1.25/2 = 0.625.
The second combination of properties among concepts is: a. Select (pocket
mirrorA, reflectorB) which has sv = 0.5. b. Select (glassA, glassB) which
has sv = 1. c. Remove (glassA, reflectorB) with sv = 0.5. d. Remove (pocket
mirrorA, glassB) with sv = 0.75. This dance’s result is (0.5 + 1)/2 = 1.5 /2 = 0.75.
Thus, the dance algorithm selects the second combination. This indicates that there
is a way in which most of the properties of mirrorA match mirrorB. Figure 8
shows the result.
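The comparison of the two combinations can be re-checked with a few lines of arithmetic. The similarity values below are the four sv values quoted in this example; the dictionary layout and the helper dance_average are only illustrative.

```python
# sv values quoted in Example 1, keyed as (child of mirrorA, child of mirrorB).
SV = {
    ("pocket mirror", "glass"): 0.75,
    ("glass", "reflector"): 0.5,
    ("glass", "glass"): 1.0,
    ("pocket mirror", "reflector"): 0.5,
}


def dance_average(couples):
    """Average similarity of the couples that make up one dance."""
    return sum(SV[couple] for couple in couples) / len(couples)


first = [("pocket mirror", "glass"), ("glass", "reflector")]
second = [("pocket mirror", "reflector"), ("glass", "glass")]
print(dance_average(first))   # 0.625
print(dance_average(second))  # 0.75
```

With only two children on each side, these are the only two dances allowed under rule (A), so the second combination is the optimum.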
Fig. 8. Since the dance algorithm found the best couple among the children of mirrorA and
mirrorB to have good similarity (0.75, second combination above), the conclusion is that the
concept in OB most similar to mirrorA is mirrorB with sv = 1
Example 2. Finding a concept in OB which has two parents. Ontologies OA and OB are
used; see figures 5 and 7. We wish to find in OB the concept closest to glassA [whose
father is mirrorA], but the parents of glassB are mirrorB and cosmeticsB. Case (a) of
COM [4] holds. Therefore, our improved algorithm seeks the words of the parents
that match best. The dance algorithm returns “glassB is the most similar concept,
with sv = 1”. (Output not shown for lack of space.)
3.1 Conclusions
The implemented improvements to algorithm COM produce more accurate
matches, through increased use of the semantics of the concepts and relations between
ontologies. This yields a better understanding between agents.
Acknowledgments
Work herein performed was partially sponsored by projects (blank) and (blank).
One of the authors (blank) has a SNI award; another author (blank) has a graduate
student grant from CONACYT.
References
1. S. Banerjee, T. Pedersen, Extended Gloss overlaps as a measure of semantic
relatedness, Proc. IJCAI 2003, 805-810
2. S. Bechhofer, F. van Harmelen, J. Hendler, I. Horrocks, D. L. McGuinness, P. F.
Patel-Schneider, L. Andrea Stein. OWL Web Ontology Language Reference. W3C
Recommendation 10 February 2004. http://www.w3.org/TR/2004/REC-owl-ref-20040210/
3. D. Connolly, F. van Harmelen, I. Horrocks, D. L. McGuinness, P. F. Patel-Schneider,
L. Andrea Stein. DAML+OIL Reference description. March 2001. W3C
Note 18 December 2001. http://www.w3.org/TR/2001/NOTE-daml+oil-reference-20011218
4. (Reference omitted to avoid author’s identification)
5. (Reference omitted to avoid author’s identification)
6. R. Hummel and S. Zucker. On the foundations of relaxation labeling processes.
IEEE Trans. PAMI 5(3):267-287, May 1983
7. Knowledge Interchange Format. Draft proposed. American National Standard
(dpANS) NCITS.T2/98-004.
8. S. Melnik, H. Garcia-Molina, E. Rahm, Similarity Flooding: a versatile graph
matching algorithm and its application to schema matching. Proc. 18th Intl. Conf.
on Data Engineering (ICDE), San Jose, CA 2002
9. Jesus Olivares (2002) An Interaction Model among Purposeful Agents, Mixed
Ontologies and Unexpected Events. Ph. D. Thesis, CIC-IPN. In Spanish. Available
on line at http://www.jesusolivares.com/interaction/publica
10. Pérez A., and Suárez M. Carmen, (2004) Evaluation of RDF(S) and DAML+OIL
Import/Export Services within Ontology Platforms. LNAI 2972, 109-118.
11. F. Manola, E. Miller. RDF Primer. W3C Recommendation. 10 February 2004.
http://www.w3.org/TR/2004/REC-rdf-primer-20040210/
