Semantic Enrichment of Ontology Mappings:
Advances, Insights and Ideas for Improvement

Patrick Arnold, Universität Leipzig
1. Introduction

Semantic Enrichment of Ontology Mappings
● Annotate the relation type of correspondences
● Important step for:
  – Schema/Ontology Merging
  – Schema/Ontology Evolution
● Also applicable for:
  – Entity resolution
  – Text mining / Information retrieval
  – Linked Open Data
  – etc.

06/20/14
1. Introduction

● Two-Step Approach
1. Introduction

Compound Strategy (Recap)
● Concept AB matches a concept B
● Draw an is-a conclusion (vintage car is-a car)
● Works with endocentric compounds
  – drilling machine, school bus, blackboard
● Does not work with exocentric compounds
  – nightmare, redhead, saw tooth, butterfly
● Some borderline cases:
  – strawberry, airport, city hall, snowman, ...
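The head-matching heuristic recapped above can be sketched in a few lines; the function name and the dictionary-check remark are illustrative, not the actual implementation:

```python
def compound_is_a(concept_a: str, concept_b: str) -> bool:
    """Compound strategy sketch: if concept A is a compound AB whose
    head matches concept B, conclude "A is-a B"."""
    a, b = concept_a.strip().lower(), concept_b.strip().lower()
    if a == b or not b:
        return False
    # Open compound: the head is the last token ("school bus" -> "bus").
    if a.split()[-1] == b:
        return True
    # Closed compound: the head is a suffix ("blackboard" -> "board").
    # Note: this also fires for exocentric compounds ("butterfly" ends
    # in "fly"), which is exactly why a dictionary check is still needed.
    return a.endswith(b) and len(a) > len(b)

assert compound_is_a("vintage car", "car")    # endocentric: correct
assert compound_is_a("blackboard", "board")   # closed compound: correct
assert compound_is_a("butterfly", "fly")      # exocentric: false positive
assert not compound_is_a("car", "car")
```

The last assertion illustrates the exocentric problem from the slide: the purely lexical test cannot tell a butterfly from a school bus.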
1. Introduction

Results in December 2012 (reconstructed)
● Strategies: Compound + Itemization
● Goals: improve recall, keep up precision

Scenario                    Recall   Precision   F-Measure
WebDirectories (German)     44.4 %   70.0 %      57.2 %
Health (Diseases)           58.5 %   92.3 %      75.4 %
Text Mining Taxonomies      12.5 %   97.7 %      55.1 %
Web Directories 2 (German)  35.8 %   75.6 %      55.5 %
Agenda
1. Introduction
2. Advances
3. Evaluation
4. Outlook / Ideas for Improvements
2. Advances
2.1 WordNet

● Using the Java API for WordNet Search (JAWS)
  – Renowned thesaurus
  – Contains all relevant relations: is-a, part-of, related (cohyp.), equal
  – Used by many other tools and approaches
● Our improvement: Gradual Modifier Reduction
  – Basic idea: compounds are among the most productive means of word formation
  – WordNet: 160,000 lexemes
    – English general vocabulary: approx. 1 million lexemes
    – English overall vocabulary: >> 1 million lexemes
  – GTR: handle words that do not occur in WordNet
2. Advances and Insights
2.1 WordNet

● Example
  – Correspondence (US Vice President, Person)
  – Could not be resolved, because “US Vice President” was not in the dictionary
● Gradual Modifier Reduction:
  – If a compound word does not occur in the dictionary, gradually remove its modifiers
  – Start with the leftmost modifier
  – After each removed modifier, check the word again
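The reduction loop can be sketched as follows; the tiny lexicon is a toy stand-in for the WordNet lookup, and its entries are illustrative:

```python
def gradual_modifier_reduction(term, in_dictionary):
    """If a compound term is not in the dictionary, strip the leftmost
    modifier and re-check, until a hit is found or nothing is left.
    in_dictionary is a lookup callback (e.g. a WordNet query)."""
    words = term.strip().lower().split()
    while words:
        candidate = " ".join(words)
        if in_dictionary(candidate):
            return candidate
        words = words[1:]  # remove the leftmost modifier

# Toy lexicon standing in for WordNet (hypothetical entries).
lexicon = {"vice president", "president", "person"}
print(gradual_modifier_reduction("US Vice President", lexicon.__contains__))
# -> vice president
```

“US Vice President” misses the dictionary, but dropping “US” yields “vice president”, which can then be related to “Person”.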
2. Advances and Insights
2.1 WordNet

● Two examples resolved with this strategy
2. Advances and Insights
2.1 WordNet

Enhancements with WordNet

Scenario                Recall (was → now)   Precision (was → now)
Health                  58.5 ↑ 2.4 → 60.9    92.3 ↑ 3.8 → 96.1
Text Mining Taxonomies  12.5 ↑ 31.9 → 44.3   97.7 ↑ 1.3 → 99.0
2. Advances and Insights
2.1 WordNet

Enhancements with WordNet GTR

Scenario    Recall (was → now)   Precision (was → now)
Health      60.9 ↓ 4.9 → 56.0    96.1 ↓ 4.1 → 92.0
Taxonomies  44.3 ↑ 15.8 → 60.1   99.0 ↓ 1.2 → 97.8
2. Advances and Insights
2.2 Structure Strategy

● Motivation: similar to WordNet GTR
● Question: what if neither strategy can draw a conclusion between matching concepts A, B?
  – Check whether a relation between (A, B') or (A', B) can be drawn
  – The prime denotes the parent element
2. Advances and Insights
2.2 Structure Strategy
Example: Online_Shoe_Store.Shoes.Sneakers ↔ Apparel_Store.Footwear
Shoes is-a Footwear → Sneakers is-a Footwear
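The lifting step in this example can be sketched as below. Concept paths are assumed to be given root-to-leaf, and `relation_of` is a hypothetical callback into the base strategies; only the (A', B) is-a case is shown (the (A, B') side would be handled analogously):

```python
def structure_strategy(path_a, path_b, relation_of):
    """If the base strategies find no relation between the leaf
    concepts, try the parent of A against B.  Since a child concept
    is-a its parent (A is-a A'), "A' is-a B" implies "A is-a B"."""
    a, b = path_a[-1], path_b[-1]
    rel = relation_of(a, b)
    if rel is not None:
        return rel
    if len(path_a) > 1 and relation_of(path_a[-2], b) == "is-a":
        return "is-a"

# Example from the slide: the base strategies only know "Shoes is-a Footwear".
def base(a, b):
    return "is-a" if (a, b) == ("Shoes", "Footwear") else None

print(structure_strategy(
    ["Online_Shoe_Store", "Shoes", "Sneakers"],
    ["Apparel_Store", "Footwear"],
    base))
# -> is-a
```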
2. Advances and Insights
2.2 Structure Strategy

Enhancements with Structure Strategy

Scenario           Recall (was → now)   Precision (was → now)
Web Directories 1  44.4 ↑ 3.2 → 47.6    70.0 ↓ 0.3 → 69.7
Web Directories 2  35.8 ↑ 0.9 → 36.7    75.6 ↓ 8.2 → 67.4
2. Advances and Insights
2.2 Structure Strategy

● Precision losses with Structure Strategy
2. Advances and Insights
2.3 Compound-Modifier Match Strategy

● Original Compound Strategy: “head” matches compound
  – e.g., (school, high-school)
● Adaption: “modifier” matches compound
  – e.g., (roof, roof window), (bed, bedroom)
● The Compound-Modifier Strategy is able to detect part-of / has-a relations
  – bed part-of bedroom
  – doorknob part-of door
● Problem: the direction cannot be determined
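The modifier match can be sketched as follows; as the slide notes, it can only flag an undirected part-of/has-a candidate, so the return value deliberately leaves the direction open (function name and examples are illustrative):

```python
def modifier_match(concept_a, concept_b):
    """Concept A matches the *modifier* of compound B, hinting at a
    part-of / has-a relation.  The direction stays undetermined:
    bed part-of bedroom, yet doorknob part-of door."""
    a, b = concept_a.strip().lower(), concept_b.strip().lower()
    if a == b or len(a) >= len(b):
        return None
    # Open compound: A is the first token of B ("roof" / "roof window");
    # closed compound: A is a proper prefix of B ("bed" / "bedroom").
    if b.split()[0] == a or b.startswith(a):
        return "part-of/has-a (direction unknown)"
    return None

print(modifier_match("bed", "bedroom"))
# -> part-of/has-a (direction unknown)
```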
2. Advances
2.3 Compound-Modifier Match Strategy

● Two major cases:
  – AB part-of A (23.3 %)
    – heartbeat, moonlight, earring, policeman, eyeballs, bathtub, ...
  – A part-of AB (30.7 %)
    – bedroom, motorcycle, babysitter, railroad, fireplace, bookstore, ...
● About 50 % contain the part-of resp. has-a relation
● The other 50 % are often “related”
  – Examples: nightmare, fingerprint, toothpaste
2. Advances
2.3 Compound-Modifier Match Strategy

● The Compound-Modifier Match Strategy is currently disabled
  – Loss of precision
  – Benchmarks hardly contain specified part-of relations
3. Evaluation
3.1 General Improvements

Original values:

Scenario  Recall   Precision   F-Meas.
Web 1     44.4 %   70.0 %      57.2 %
Health    58.5 %   92.3 %      75.4 %
Tax.      12.5 %   97.7 %      55.1 %
Web 2     35.8 %   75.6 %      55.5 %

New values:

Scenario  Recall   Precision   F-Meas.
Web 1     51.6 %   69.5 %      60.5 %
Health    60.9 %   92.5 %      76.7 %
Tax.      60.4 %   97.8 %      79.1 %
Web 2     42.3 %   73.3 %      57.8 %
3. Evaluation
3.1 General Improvements

Improvement:

Scenario  Recall    Precision   F-Meas.
Web 1     + 7.2 %   - 0.5 %     + 3.3 %
Health    + 2.4 %   + 0.2 %     + 1.3 %
Tax.      + 47.9 %  + 0.1 %     + 24.0 %
Web 2     + 6.5 %   - 2.3 %     + 2.3 %

Conclusions:
● Original goal: increase recall without damaging precision
● The goal was mostly achieved
3. Evaluation
3.2 Evaluation by Strategy

● Assumption: exactly one strategy is enabled
● Recall:

Strategy          Web 1   Health   Tax.   Web 2   Mean
Compound          7.9     36.5     12.3   3.8     15.1
Itemization       36.5    14.6     0.2    25.3    19.1
WordNet (simple)  -       9.7      33.7   -       21.7
WordNet (GMR)     -       9.7      49.3   -       29.5
OpenThes.         1.5     0.0      3.4    1.2     1.5
Structure         3.1     0.0      0.0    1.2     1.1
3. Evaluation
3.2 Evaluation by Strategy

● Precision:

Strategy          Web 1   Health   Tax.    Web 2   Mean
Compound          62.5    88.2     97.7    42.8    72.8
Itemization       82.1    100.0    100.0   83.3    91.3
WordNet (simple)  -       100.0    99.1    -       99.5
WordNet (GTR)     -       80.0     97.6    -       88.8
OpenThes.         50.0    0.0      96.0    50.0    65.3
Structure         50.0    -        -       20.0    35.0
3. Evaluation
3.2 Evaluation by Strategy

● Some questions...
  – Why does OT detect relation types in an English-language scenario?
    – (Monarch, Person)
    – (Journalist, Person)
    – (Boxer, Person)
    – (Golfer, Athlete)
    – ...
  – Why does WordNet score below 100 %?
    – (Automobile, Vehicle): equal
    – (Road, Street): equal
3. Evaluation
3.3 Time Complexity

Average execution time per correspondence and strategy (ms):

Strategy          Web 1   Health   Tax.   Web 2   Avg
Compound          2.36    2.51     4.31   2.01    2.80
Itemization       4.41    2.36     3.18   2.28    3.05
WordNet (simple)  2.15    4.39     3.75   1.94    3.06
WordNet (GTR)     2.26    4.02     5.61   2.09    3.50
OpenThes.         4.61    5.18     7.57   3.87    5.31
Structure         2.22    2.50     3.35   1.82    2.47
Overall           27.3    11.5     15.2   24.9    19.7

● Total execution time: ca. 5–15 sec. per mapping
  – No time problems (yet)
4. Outlook
4.1 Introduction

● Possibilities:
  – More background knowledge
    – UMLS (medical domain)
  – More linguistic knowledge
    – Exploit cohyponyms
    – Compound-Modifier Match strategy
  – Hybrid strategies (advanced)
    – Wikipedia / Wiktionary
    – Search engines
4. Outlook
4.2 Wikipedia

● Example: Leipzig
  – Leipzig is a city
● Example: Bicycle
  – Bicycle is a vehicle
  – Pushbike, pedal bike, pedal cycle and cycle are synonyms of bike
  – Wheels are part of bike
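One way such definition sentences could be mined is with simple lexical patterns over the first sentence of an article. The regexes below are illustrative assumptions, not the actual extraction rules, and they only capture the first following word:

```python
import re

def relation_from_definition(concept, first_sentence):
    """Rough sketch: scan a definition sentence for "X is a Y" and
    "X are part of Y" patterns and return a relation triple."""
    s = first_sentence.lower()
    m = re.search(r"\bis an? ([a-z]+)", s)
    if m:
        return (concept, "is-a", m.group(1))
    m = re.search(r"\bare part of (?:a |the )?([a-z]+)", s)
    if m:
        return (concept, "part-of", m.group(1))
    return None

print(relation_from_definition("Leipzig", "Leipzig is a city in Saxony."))
# -> ('Leipzig', 'is-a', 'city')
print(relation_from_definition("Wheel", "Wheels are part of a bike."))
# -> ('Wheel', 'part-of', 'bike')
```

A real extractor would need POS tagging to skip adjectives (“is a large city”) and synonym handling for the redirect/synonym lists mentioned above.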
4. Outlook
4.3 Wikipedia
4. Outlook
4.4 Search Engines

● Approach presented in the xxx paper
● Count the number of results for a specific expression like “A is a B”
● Problems:
  – Search engines are very restrictive w.r.t. the number of queries per day
  – “Emergency solution” if all other strategies fail
  – Evaluation?
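The counting approach can be sketched as query construction plus a decision over the hit counts; the actual engine call is omitted (the daily quota makes this an emergency fallback anyway), and both the pattern set and the threshold are illustrative assumptions:

```python
def relation_queries(a, b):
    """Build the phrase queries whose hit counts are compared."""
    return {
        "is-a":    (f'"{a} is a {b}"', f'"{b} is a {a}"'),
        "part-of": (f'"{a} is part of a {b}"', f'"{b} is part of a {a}"'),
        "related": (f'"{a} is related to a {b}"', f'"{b} is related to a {a}"'),
    }

def decide(hits, threshold=100):
    """Pick the relation whose pattern clearly dominates; the
    threshold is an illustrative choice, not a tuned value."""
    best = max(hits, key=hits.get)
    return best if hits[best] >= threshold else None

q = relation_queries("leipzig", "city")
print(q["is-a"][0])
# -> "leipzig is a city"

# With the hit counts from the example on the next slide:
print(decide({"is-a": 62700, "part-of": 3, "related": 0}))
# -> is-a
```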
4. Outlook
4.4 Search Engines

● Two examples: Leipzig | City and President | Politician

Query (Google)                      Hits
“leipzig is a city”                 62,700
“city is a leipzig”                 0
“leipzig is part of (a) city”       0/0
“city is part of (a) leipzig”       3/0
“leipzig is related to (a) city”    0/0
“city is related to (a) leipzig”    0/0

Query (Google)                            Hits
“president is a politician”               28.6 M
“politician is a president”               4
“president is part of (a) politician”     0/0
“politician is part of (a) president”     0/0
“president is related to (a) politician”  0/0
“politician is related to (a) president”  0/0
5. Conclusions

● Achievements since December 2012:
  – New strategies: Structure, Background Knowledge
  – Enhanced methods:
    – Itemization
    – Gradual Term Reduction
    – Cross-equivalence (Structure Strategy)
  – Better recall, scarcely any loss in precision
● Outlook:
  – Many opportunities
  – Wikipedia, Wiktionary, Search Engines
  – Instance data analysis
    – No appropriate benchmarks so far