1404_OntologyMatching

Ontology
Alignment/Matching
Prafulla Palwe
►
Introduction




►
►
Matching Dimensions
Basic Techniques


►
Traditional
Emergent
Classification

►
Sequential composition
Parallel composition
Application Domains


►
Matching Operation
Motivation
Schema Matching Vs Ontology Matching
Correspondence
Alignment
Matching Process


►
Being serious about the semantic web
Living with heterogeneity
Heterogeneity problem
I have a plan for you 
Matching Problem





Agenda
Element Level
Structure Level
Summary and Challenges
Introduction
► Being




serious about the semantic web -
It is not one guy's ontology
It is not several guys' common ontology
It is many guys and girls' many ontologies
So it is a mess, but a meaningful mess 
Introduction
► Living
with heterogeneity -
 The semantic web will be:
► Huge
► Dynamic
► Heterogeneous
 These are not bugs, they are features.
 We must learn to live with them.
Introduction
► Heterogeneity
problem –
 Resources being expressed in different ways must be reconciled
before being used.
 Mismatch between formalized knowledge can occur when:
► different
languages are used;
► different terminologies are used;
► different modeling is used.
Introduction
►I
have a plan for you – Reconciliation
Matching Problem
► Matching
Operation
 Definition – Matching operation takes as input
ontologies, each consisting of a set of discrete
entities (e.g., tables, XML elements, classes,
properties) and determines as output the
relationships (e.g., equivalence, subsumption)
holding between these entities
Matching Problem
► Motivation
–
 2 XML Schemas
 2 Ontologies
Matching Problem
Matching Problem
Matching Problem
Matching Problem
Matching Problem
Matching Problem
Matching Problem
Matching Problem
Matching Problem
► Schema
mapping Vs ontology mapping
 Differences ►Schemas
their data
often do not provide explicit semantics for
 Relational schemas provide no generalization
►Ontologies
meaning
are logical systems that constrain the
 Ontology definition as set of logical axioms
 Commonalities ►Schemas
and ontologies provide a vocabulary of
terms that describes the domain of interest
►Schemas and ontologies constrain the meaning of
terms used in the vocabulary.
Matching Problem
► Correspondence
 Definition –
 Given 2 ontologies O and O’ , a correspondence
between M between O and O’ is a 5-uple :
<id,e,e’,R,n> such that:
►id
is a unique identifier of the correspondence.
►e and e’ are entities of O and O’ (e.g. XML Elements,
classes)
►R is a relation (e.g. equivalence (=), disjointness
(_|_))
►n is a confidence measure in some mathematical
structure (typically in the [0,1] range)
Matching Problem
► Alignment
 Definition –
 Given 2 ontologies O and O’, an alignment A
between O and O’:
►Is
a set of correspondence on O and O’
►With some cardinality: 1-1, 1-* etc.
►Some additional metadata (method, date, properties
etc)
Matching Process
Matching Process
Matching Process
Matching Process
Matching Process
Matching Process
Matching Process
► General
Basic Matching Process
Matching Process
► Sequential
Composition
Matching Process
► Parallel
composition
Matching Process
► Similarity
Filter, alignment extractor and
alignment filter –
Matching Process
► Aggregation
Operations –
 There are many different ways to aggregate matcher results, usually
depending on confidence/similarity:
► Triangular norms (min, weighted products) useful for selecting only the
best results
► Multidimensional distances (Eudidean distance, weighted sum) useful
for taking into account all dimensions
► Fuzzy aggregation (min, weighted average) useful for aggregating
competing algorithms and averaging their results
► Other specific measures (e.g., ordered weighted average)
Application Domains
► Traditional




-
Ontology evolution
Schema integration
Catalog integration
Data integration
Application Domains
► Ontology
Evolution
Application Domains
► Catalog
Integration
Application Domains
► Emergent
 P2P information sharing
 Agent communication
 Web service composition
 Query answering on the web
Application Domains
► P2P
information sharing
Application Domains
► Web
Service Composition
Application Domains
► Agent
communication
Classifications
► Matching
Dimensions
 Input Dimensions
► Underlying
models (e.g. XML, OWL)
► Schema Level Vs Instance Level
 Process Dimensions
► Approximate
Vs Exact
► Interpretation of the input
 Output Dimensions
► Cardinality
► Equivalence
Vs Diverse relations
► Graded Vs Absolute Confidence
Classifications
► Three
Layers
 Upper Layer
► Granularity
of match
► Interpretation of the input information
 Middle Layer
► Represents
classes of elementary (basic) matching techniques
 Lower Layer
► Based
on the kind of input which is used by elementary matching
techniques
Classifications
► Classification
of schema based techniques
Basic Techniques
► Element
Level Techniques
 String based –
 Prefix ► Takes an input 2 strings and checks whether the first string starts with
the second
► e.g. net = network but also hot = hotel
 Suffix –
► Takes an input 2 strings and checks whether the first string ends with
the second
► e.g. ID = PID but also word = sword
 Edit Distance –
► Takes
as input 2 strings and calculates the number of edit operations
(insertion,deletion,substitution) of characters required to transform
one string into other normalized by length of the max string.
► editDistance(NKN, Nikon) = 0.4
Basic Techniques
 Language based –
 Tokenization –
► Parses
names into tokens by recognizing punctuation, cases
► Hands-Free_Kits  <hands, free, kits>
 Lemmatization –
► Analyses
morphologically tokens in order to find all their possible basic
forms
► Kits  Kit
 Elimination –
► Discards
empty tokens that are articles, prepositions, conjuctions
► a, the, by, type of, their, from
Basic Techniques
► Structure
Level Techniques
 Ontologies are viewed as graph-like structure containing terms and
their inter-relationships.
 Taxonomy based
►Bounded
path matching
 These take 2 paths with links between classes defined by
the hierarchical relations, compare terms and their positions
along these paths and identify similar terms.
►Super(sub)-concept
rules
 If super concepts are the same, the actual concepts are
similar to each other
Basic Techniques
 Tree based
 Children
►2
non leaf schema elements are structurally similar if
their immediate children sets are highly similar
 Leaves
►2
non leaf schema elements are structurally similar if
their leaf sets are highly similar, even if their
immediate children are not.
Basic Techniques
Basic Techniques
Basic Techniques
Summary and Challenges
► Summary
 Ontology Matching and alignment is the process of
developing the common or most common
structure/semantic terms out of 2 or more different
ontologies/structures/schemas.
 Different efficient and complex algorithms using basic
techniques of matching process, can be developed for
matching and alignment generation.
► Challenges
 Developing generic and highly efficient matching and
alignment generation algorithms.
Thank You