PPT

Logics for Data and Knowledge
Representation
Semantic Matching
Outline

Introduction:
 Why matching?
 The matching problem
Kinds of matching: Syntactic versus Semantic
 Steps in matching
 Kinds of schemas

S-Match
 MinSMatch

2
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Approaching the heterogeneity problem


Knowledge can be represented using graphs.
These graphs can be very different:






3
Different structure (RDBs, OODB, XML, thesauri, formal
ontologies)
Different conceptualizations: they reflect different visions of the
world
They contain different terminology and polysemous terms
They have different degrees of specificity, scope and coverage
They can be expressed in different languages
Heterogeneity of these graphs demands the exposition of
relations between them, such as semantically equivalent.
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
The Matching Problem

Matching Problem: given two finite graphs, finds all nodes in
the two graphs that syntactically or semantically correspond to
each other.

Given two graph-like structures (e.g., classifications, XML and
database schemas, ontologies), a matching operator produces
a mapping between the nodes of the graphs.

Solution: A possible solution consists in the conversion of the
two graphs in input into lightweight ontologies and then
matching them semantically.
4
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
A Matching Problem
?
?
?
5
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Kinds of Matching: Syntactic versus Semantic (I)

Syntactic matching
Matching of nodes as objects or strings (without meaning)
“Virus” partially matches with “Computer virus”
“Wives” is an exceptional form for “Wife”

Semantic matching
Matching of nodes as concepts
“Car” is equivalent to “Automobile”
“Dog” is more specific than “Animal”
6
Car ≡ Automobile
Dog ⊑ Animal
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Kinds of Matching: Syntactic versus Semantic (II)

Syntactic matching
Relations are computed between labels at nodes.
The similarity is given in a range [0,1].
Most of the tools developed so far are syntactic.

Semantic matching
Relations are computed between concepts at nodes.
They return a semantic similarity, namely R ∈ {, ≡, ⊑, ⊒, ⊓},
sometimes with a confidence value in the range [0,1].
There are a few tools of this kind, but they are recently
increasing. They return different kinds of semantic relations.
7
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Steps in matching

A matching problem can be decomposed in three steps:
1.
2.
3.

8
Extract the graphs from the conceptual models under
consideration;
Convert the graphs into formal ontologies
Match the formal ontologies
In the next slides we show some examples of step 1
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Relational DB Schemas

Let us consider the following relational database (RDB)
model, say “BANK”:

We can represent the RDB model “BANK” as a graph (i.e.
a tree) with root “BANK”.
9
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Relational DB Schemas: Representation #1

The RDB model is first partitioned into relations, then
attributes and data instances.
10
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Relational DB Schemas: Representation #2

The model is partitioned into relations, then into tuples,
attributes and data instances.
11
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Relational DB Schemas: notes

Which of the two representations is more preferable
depends on the concrete task.

It is always possible to transform one representation into
the other.

In contrast to the example of RDB “BANK”, DB schemas
are seldom trees. More often, DB schemas are translated
into Directed Acyclic Graphs (DAG’s) and then
approximated into trees.
12
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
OODB Schemas

Let us consider the RDB “BANK” in terms of an objectoriented DB (OODB) schema:
BRANCH (Street, City, Zip)
PERSON (F_Name, L_Name)
STAFF : PERSON (Position, Salary, Manager)

The resulting graph is:
13
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
OODB Schemas: notes

OODB schemas capture more semantics than the RDBs.

In particular, an OODB schema:
 explicitly expresses subsumption relations between
elements;
 admits special types of arcs for part/whole relationships
in terms of aggregation and composition.
14
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Semi-structured Data

Neither RDBs nor OODBs capture all the features of semistructured or unstructured data (Buneman, 1997):
 semi-structured data do not possess a regular structure
(schemaless);
 the “structure” of semi-structured data could be partial or
even implicit.

Typical examples are: HTML and XML.
15
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
XML Schemas
XML schemas can be represented as DAGs.
 The graph from the RDB “BANK” could also be obtained
from an XML schema.

16
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
XML Schemas: notes

Often XML schemas represent hierarchical data models.
 In this case the only relationships between the elements
are “is-a”.

Attributes in XML are used to represent extra information
about data. There are no strict rules telling us when data
should be represented as elements, or as attributes.
17
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Concept Hierarchies

A concept hierarchy is a semi-formal conceptualization of an
application domain in terms of concepts and relationships.
18
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
S-Match: the matching problem




Semantic Matching
Given two graphs G and H, for any node ni  G and mj  H, find
the strongest semantic relation R holding between them
The strength of the relations is given by a partial order, that is:
disjointness (), equivalence (≡), more/less specific (⊑,⊒),
overlap (⊓).
A mapping element is a 4-tuple <IDij, ni, mj , R>, where:
 IDij is a unique identifier of the given mapping element;
 ni is the i-th node of the first graph;
 mj is the j-th node of the second graph;
 R specifies a semantic relation between the concepts at the
given nodes
A mapping is a set of mapping elements
19
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Example
Images
Europe
⊑
1
1
Europe
≡
2
3
4
Austria
Italy
⊒
Pictures
Wine and Cheese
2
3
4
5
Italy
Austria
< ID22, 2, 2, ≡ >
A mapping
< ID21, 2, 1, ⊑ >
< ID24, 2, 4, ⊒ >
20
A mapping element
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
S-Match: the algorithm
Four Macro Steps: Given two labeled trees T1 and T2, do:
1. For all labels in T1 and T2 compute concepts at labels
2. For all nodes in T1 and T2 compute concepts at nodes
3. For all pairs of labels in T1 and T2 compute relations between
concepts at labels
4. For all pairs of nodes in T1 and T2 compute relations between
concepts at nodes

Steps 1 and 2 constitute the preprocessing phase (off-line), and are
executed once and each time after the schema/ontology is changed
(graphs are converted into lightweight ontologies)

Steps 3 and 4 constitute the matching phase (on-line), and are
executed every time the two schemas/ontologies are to be matched
(the two lightweight ontologies are matched)
21
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Step 1: compute concepts at labels


Natural language expressions are translated into a formal
language
Concepts are computed by disambiguating among all possible
senses of words in a label and their interrelations
Computation:
 Tokenization. Labels are parsed into tokens.
 Lemmatization. Tokens are morphologically analyzed in order
to find their possible basic forms.
 Building atomic concepts. An oracle (WordNet) is used to
extract senses of lemmatized tokens.
 Building complex concepts. Prepositions, conjunctions, etc. are
translated into logical connectives and used to build complex
concepts out of the atomic concepts.
22
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Step 1: compute concepts at labels

Tokenization.
“Images and text”  <Images, and, text>;

Lemmatization.
“Images”  Image;

Building atomic concepts.
“Image” has 8 senses in WordNet, 7 as a noun and 1 as a verb.
Some might be filtered out analyzing the context. The rest are kept:
“Image”  Image#1 ⊔ Image#3 ⊔ Image#7;

Building complex concepts
 (Image#1 ⊔ Image#3 ⊔ Image#7) ⊔ text#2
23
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Step 2: compute concepts at nodes

Concepts at labels are extended by taking into account the
context of the node, i.e. the position of the node in the tree
Computation:
 The concept at a node for some node n is computed as the
conjunction of the concepts at labels along the path from the
root till the node n itself
24
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Step 2: compute concepts at nodes
Europe
1
Pictures
Wine and Cheese
2
3
4
5
Italy
Austria
C4 = CEurope ⊓ CPictures ⊓ CItaly
25
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Step 3: compute relations between concepts at labels

Exploit a priori knowledge, e.g., lexical, domain knowledge
with the help of element level semantic matchers
Computation:
 Concepts at labels of each pair of nodes from the two trees are
compared to determine semantic relations between them
(without caring about their context)

The set of semantic relations computed with this step constitute
the set of axioms that will be used to compute the mapping, i.e.
our TBox!
26
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Step 3: compute relations between concepts at labels
T1
T2
Images
Europe
1
1
Pictures 2
Europe 2
Austria 3
T2
5
Austria
CAustria

≡
CItaly
≡

CEurope
CPictures
4
Italy
CAustria
CWine
CCheese
≡
CImages
27
Italy
Wine and
Cheese
CItaly
T1
CEurope
4
3
≡
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Step 4: compute relations between concepts at nodes

The matching problem is reduced to a set of node matching problems

Each node matching problem is reduced to a validity problem

TBox T: the relations between concepts at labels (from step 3)

Given the pair of nodes (n, m), deciding if a given semantic relation
holds between them corresponds to verifying that a certain relation
holds between corresponding concepts at node:
 Subsumption: T ⊨ Cn ⊑ Cm or T ⊨ Cn ⊒ Cm
 Equivalence: T ⊨ Cn ⊑ Cm and T ⊨ Cn ⊒ Cm
 Disjointness: T ⊨ Cn ⊓ Cm ⊑ 

This is done by eliminating the TBox T and verifying that the negation
of each formula is unsatisfiable.

The strongest relation holding between the two nodes is returned
28
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Step 4: compute relations between concepts at nodes

Suppose we want to check if C12 ≡ C22
(C1Images  C2Pictures)  (C1Europe  C2Europe)  (C12  C22 )
Context
(the relevant part of the TBox)
T2
T1
C21
C22
C23
Goal
C24
C25
C11
C12
29
=
C13

C14
=

INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
The final result
≡
T1
T2
T1
T2
Images
Europe
Images
Europe
1
1
1
1
Pictures2
Europe 2
Austria 3
4
5
Wine
and
Cheese
Austria
Pictures2
Europe 2
Austria 3
4
Italy
3
4
5
Italy
T2
T1
T2
Images
Europe
Images
Europe
1
1
1
1
Pictures2
Europe 2
30
4
Italy

T1
Austria 3
Italy
3
4
Italy
4
Italy
3
5
Wine
and
Cheese
Austria
Pictures2
Europe 2
Austria 3
4
Italy
4
Italy
3
5
Wine
and
Cheese
Austria
Wine
and
Cheese
Austria
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Problems with S-Match
JOURNALS
journals#1
DEVELOPMENT AND
PROGRAMMING LANGUAGES
(development#1 ⊔ programming#2)
⊓ languages#3 ⊓ journals#1
JAVA
(development#1 ⊔ programming#2)
⊓ languages#3 ⊓ journals#1
⊓ Java#3
A
⊑
B
⊑
⊑
⊑
⊒
⊑
⊒
D
⊑
E
⊑
PROGRAMMING AND DEVELOPMENT
programming#2 ⊔ development#1
LANGUAGES
languages#3 ⊓
(programming#2 ⊔ development#1)
⊑
C
≡
F
⊑
JAVA
Java#3 ⊓ languages#3 ⊓
(programming#2 ⊔ development#1)
G
MAGAZINES
Magazines#1 ⊓ Java#3 ⊓ languages#3
⊓ (programming#2 ⊔ development#1)

It is slow. Can we make the algorithm faster?

Too many relations between nodes are returned by S-Match.

Are there some relations which are “more important” than others?
31
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Redundancy patterns
(1)
A
B
C
⊑
⊑
(3)
(2)
D
A
E
B
F
C
⊒
⊒
D
A
E
B
⊥
⊥
F
C
(4)
D
A
E
B
F
C
≡
≡
≡
D
E
⊥
⊥
F

There are some axioms (the dashed arrows) which can be derived
from others (the solid ones)

Note that (4) is a combination of (1) and (2)

Note that (1) and (2) are still true in case of equivalence
32
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
Minimal and redundant mappings

A redundant mapping element is a mapping element which can be
derived from the others using the patterns.

A redundant mapping is a set containing redundant mapping
elements

The minimal mapping is the biggest subset of mapping elements
among those without redundant elements
 The minimal mapping always exists and it is unique
 Advantages in visualization, validation and maintenance

The mapping of maximum size
 is the set containing the maximum number of mapping elements
 it can be obtained from the propagation of the elements in the minimal
set using the intuitions encoded in the patterns.
33
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
The minimal mapping
JOURNALS
journals#1
DEVELOPMENT AND
PROGRAMMING LANGUAGES
(development#1 ⊔ programming#2)
⊓ languages#3 ⊓ journals#1
JAVA
(development#1 ⊔ programming#2)
⊓ languages#3 ⊓ journals#1
⊓ Java#3
34
A
⊑
B
⊑
⊑
⊒
⊑
⊒
⊑
D
⊑
E
⊑
PROGRAMMING AND DEVELOPMENT
programming#2 ⊔ development#1
LANGUAGES
languages#3 ⊓
(programming#2 ⊔ development#1)
⊑
C
≡
F
⊑
JAVA
Java#3 ⊓ languages#3 ⊓
(programming#2 ⊔ development#1)
G
MAGAZINES
Magazines#1 ⊓ Java#3 ⊓ languages#3
⊓ (programming#2 ⊔ development#1)
INTRODUCTION :: KINDS OF MATCHING :: STEPS :: KINDS OF SCHEMAS :: S-MATCH :: MINSMATCH
MinSMatch: the algorithm
Computing the minimal mapping M:
function TreeMatch(tree T1, tree T2) {
TreeDisjoint(root(T1),root(T2));
direction := true;
TreeSubsumedBy(root(T1),root(T2));
direction := false;
TreeSubsumedBy(root(T2),root(T1));
TreeEquiv();
};
Computing the set of maximum size:
function Propagate(M)
35
(3)
(1)
(2)
(4), from (1) and (2)
S-MATCH LAB
S-Match Lab
• Java: JRE or JDK (preferred)
• Java IDE: Eclipse, IDEA, NetBeans, …
• Text Editor
36