Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Goals of the tutorial Ontology matching tutorial Jérôme Euzenat Pavel Shvaiko I Provide an introduction to ontology matching I Discuss practical and methodological issues & Montbonnot Saint-Martin, France [email protected] I Demonstrate and use (advanced) matching technology I Motivate future research Trento, Italy [email protected] October 2014 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Outline 1 Problem 2 Applications 3 Methodology 4 Classification 5 Methods 6 Strategies 7 Systems 8 Using alignments 9 Evaluation 10 Conclusions Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 2 / 113 Use Evaluation Conclusions What is an ontology? An ontology typically provides a vocabulary that describes a domain of interest and a specification of the meaning of terms used in the vocabulary. Depending on the precision of this specification, the notion of ontology encompasses several data and conceptual models, including, sets of terms, classifications, thesauri, database schemas, or fully axiomatized theories. 3 / 113 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 5 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Various forms of ontologies ‘Ordinary’ Data Structured glossaries dictionaries glossaries Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Being serious about the semantic web Principled, informal hierarchies XML Formal Description schemas taxonomies logics EntityAd hoc Database Terms Thesauri XML DTDs relationship hierarchies schemas models Glossaries and Thesauri and Metadata and data dictionaries taxonomies data models I It is not one guy’s ontology. I It is not several guys’ common ontology. I It is many guys and girls’ many ontologies. expressivity Frames Logics I So it is a mess, but a meaningful mess. Formal ontologies adapted from [Uschold and Gruninger, 2004] Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 6 / 113 Use Evaluation Conclusions Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 7 / 113 Use Evaluation Conclusions Living with heterogeneity The heterogeneity problem The semantic web will be: I huge, I dynamic, Often resources expressed in different ways must be reconciled before being used. Mismatch between formalized knowledge can occur when: I different languages are used, I heterogeneous. I different terminologies are used, I different modelling is used. These are not bugs, these are features. We must learn to live with them and master them. Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 8 / 113 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 9 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions On reducing heterogeneity Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions The matching process o0 Reconciliation can be performed in 2 steps o Matcher o parameters A A matching Generator o0 resources Match, thereby determine an alignment Generate a processor (for merging, transforming, etc.) A0 Transformation Matching can be achieved at run time or at design time. Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 10 / 113 Use Evaluation Conclusions Motivation: two ontologies Product price title doi creator topic integer Monograph string isbn author title Essay Litterary critics Person Human Book author Writer Applications Methodology Classification Methods Process Systems Politics Biography Use Evaluation Conclusions author Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Monograph isbn author title Essay ≥ Person ≤ ≥ Litterary critics Human Book ≥ Writer Politics Biography subject CD Autobiography Albert Camus: La chute ≥ Product price title doi creator topic DVD subject CD Bertrand Russell: My life Problem 11 / 113 Motivation: two ontologies uri DVD Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Autobiography Literature Literature 12 / 113 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 12 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Motivation: two XML schemas Electronics Personal Computers Classification Methods Process Systems Use Evaluation Conclusions PC ⊥ PC board Definition (Correspondence) ID Brand Amount Price Accessories Given two ontologies o and o 0 , a correspondence between o and o 0 is a 3-uple: he, e 0 , r i such that: I e and e 0 are entities of o and o 0 , for instance, classes, XML elements; Cameras and Photo ≥ Photo and Cameras PID Name Quantity Price ≥ Classification Methods I r is a relation, for instance, equivalence (=), more general (w), disjointness (⊥). Accessories Digital Cameras ID Brand Amount Price Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Methodology Methodology Correspondence PID Name Quantity Price Applications Applications Electronics Microprocessors Problem Problem Process Systems 13 / 113 Use Evaluation Conclusions Alignment Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 14 / 113 Use Evaluation Conclusions Terminology: a summary Matching is the process of finding relationships or correspondences between entities of different ontologies. Definition (Alignment) Alignment is a set of correspondences between two or more (in case of multiple matching) ontologies. The alignment is the output of the matching process. Given two ontologies o and o 0 , an alignment (A) between o and o 0 : I is a set of correspondences on o and o 0 I with some additional metadata (multiplicity: 1-1, 1-*, method, date, . . .) Correspondence is the relation supposed to hold according to a particular matching algorithm or individual, between entities of different ontologies. Mapping is the oriented version of an alignment. Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 15 / 113 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 16 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Applications: ontology evolution ot Applications Methods Generator Generator Classification Methods Process Systems Use Evaluation Conclusions Process DB Kbt+n Systems 18 / 113 Use Evaluation Conclusions DBPortal Translator Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 19 / 113 Use Evaluation Conclusions Applications: p2p information sharing o Matcher o0 Matcher A Applications: linked data interlinking o Classification A Transformation Methodology Methodology o ot+n Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Applications: catalog integration Matcher Kbt Problem o0 Matcher o0 A A Generator Linker peer1 d L Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko d0 20 / 113 query answer query mediator answer Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko peer2 21 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Applications: web service composition o Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Applications: query answering o0 Matcher A Generator service1 mediator output service2 input Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 22 / 113 Use Evaluation Conclusions √ √ √ √ √ √ √ √ √ √ √ √ √ Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions √ √ √ √ √ Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko operation complete correct automatic √ √ √ √ √ √ Problem 23 / 113 The alignment life cycle run time Application Ontology evolution Schema integration Catalog integration Data integration Linked data P2P information sharing Web service composition Multi agent communication Query answering instances Applications requirements Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko transformation merging data translation query answering data interlinking query answering data mediation data translation query reformulation 24 / 113 enhancement evaluation A0 A00 A exploitation communication creation Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 26 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions The matching methodology workflow Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Matching dimensions Identifying ontologies, characterising need I Input dimensions Retrieving existing alignments found I Underlying models (e.g., XML, OWL) I Schema-level vs. instance-level I Content vs. context not found I Process dimensions Selecting and composing matchers Enhancing I Approximate vs. exact I Interpretation of the input Matching failed I Output dimensions I Cardinality (e.g., 1-1, 1-*) I Equivalence vs. diverse relations (e.g., subsumption) I Graded vs. absolute confidence Evaluating passed Storing and sharing Rendering Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 27 / 113 Use Evaluation Conclusions Three layers Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 29 / 113 Use Evaluation Conclusions Classification of matching techniques Matching techniques Element-level Semantic I The upper layer I Granularity of match I Interpretation of the input information Syntactic Structure-level Syntactic Granularity/Input interpretation Semantic Formal StringresourceLanguageGraphbased based Constraintbased based Informal Instancename upperbased tokenisation, graph resourceTaxonomybased similarity, level type lemmatisation, homomorbased based data description ontologies, similarity, taxonomy directories, similarity, morphology, phism, analysis domainkey annotated elimination, structure path, and global specific properties resources lexicons, children, statistics namesontologies, thesauri leaves pace linked data I The middle layer represents classes of matching techniques I The lower layer I Origin I Kind of input information Semantic Syntactic Terminological Structural Context-based Extensional Concrete techniques Modelbased SAT solvers, DL reasoners Semantic Content-based Origin/Kind of input Matching techniques Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 30 / 113 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 31 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Basic methods: string-based I takes as input two strings and checks whether the first string starts with the second one I net = network; but also hot = hotel I Suffix I takes as input two strings and checks whether the first string ends with the second one I ID = PID; but also word = sword Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Methodology Classification Methodology Classification Methods Process Systems Use Evaluation Conclusions Edit distance I takes as input two strings and calculates the number of edition operations, (e.g., insertions, deletions, substitutions) of characters required to transform one string into another I normalized by length of the maximum string I EditDistance(NKN,Nikon) = NiKoN/5 = 2/5 = 0.4 I EditDistance(editeur,editor) = edit oe ur/7= 3/7 = 0.43 (e.g., S-Match, OLA, Anchor-Prompt) (e.g., COMA, SF, S-Match, OLA) Applications Applications Basic methods: string-based I Prefix Problem Problem Methods Process Systems 33 / 113 Use Evaluation Conclusions Basic methods: string-based Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 34 / 113 Use Evaluation Conclusions Basic methods: language-based I Tokenization I N-gram I takes as input two strings and calculates the number of common n-grams (i.e., sequences of n characters) between them, normalized by max(length(string 1), length(string 2)) I trigram(3) for the string nikon are nik, iko, kon (e.g., COMA, S-Match) I parses names into tokens by recognizing punctuation, cases I Hands-Free Kits → h hands, free, kits i I Lemmatization I analyses morphologically tokens in order to find all their possible basic forms I Kits → Kit (e.g., COMA, Cupid, S-Match, OLA) Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 35 / 113 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 36 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Basic methods: language-based Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Basic methods: linguistic resources I Sense-based: WordNet I A v B if A is a hyponym or meronym of B I Brand v Name I Elimination I A w B if A is a hypernym or holonym of B I discards “empty” tokens that are articles, prepositions, conjunctions, etc. I a, the, by, type of, their, from I Europe w Greece I A = B if they are synonyms (e.g., Cupid, S-Match) I Quantity = Amount I A ⊥ B if they are antonyms or the siblings in the part of hierarchy I Microprocessors ⊥ PC Board (e.g., Artemis, CtxMatch, S-Match) Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 37 / 113 Use Evaluation Conclusions Basic methods: linguistic resources Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 38 / 113 Use Evaluation Conclusions Basic methods: multilingual matching I Sense-based: WordNet hierarchy distance artist person God creator2 creator1 Ontologies can be multilingual if they use several different languages, e.g., EN, IT, FR. Matching can be done by comparing to a pivot language or through cross-translation. We distinguish between: monolingual matching, which matches two ontologies based on their labels in a single language, such as English; maker communicator litterate legal document illustrator author1 writer2 =author2 writer3 writer1 illustrator author creator writer Person multilingual matching, which matches two ontologies based on labels in a variety of languages, e.g., English, French and Spanish. This can be achieved by parallel monolingual matching of terms or crosslingual matching of terms in different languages; Some other measures (e.g., Resnik measure) depend on the frequency of the terms in the corpus made of all the labels of the ontologies. (e.g., S-Match) Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 39 / 113 crosslingual matching, which matches two ontologies based on labels in two identified different languages, e.g., English vs. French. Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 40 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Basic methods: constraint-based Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Basic methods: extensional :C →E I Datatype comparison E can be a set of instances, a set of documents which are indexed by concepts, a set of items, e.g., people, which use these concepts. Two cases: I E is common to both ontologies; I integer < real I date ∈ [1/4/2005 30/6/2005] < date[year = 2005] I {a, c, g , t}[1 − 10] < {a, c, g , u, t}+ I Multiplicity comparison I [1 1] < [0 10] Can be turned into a distance by estimating the ratio of domain coverage of each datatype. (e.g., OLA, COMA) Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process 41 / 113 Systems Use Evaluation Conclusions Extensional techniques I E depends on the ontology. This can be reduced to the former case by identification or record linkage techniques. Techniques: I statistical and machine learning techniques infer and compare the characteristics of populations; I set-theoretic techniques compare the extensions; Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process 42 / 113 Systems Use Evaluation Conclusions Extensional techniques Monograph Product DVD ≥ Book .8 Essay Literary critics CD Politics Biography ≥ Monograph Product DVD ≥ Book .8 Essay Literary critics CD Politics Biography ≥ Autobiography Autobiography Literature Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Literature 43 / 113 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 43 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Global methods: tree-based Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Global methods: tree-based Electronics I Children I Two non-leaf schema elements are structurally similar if their immediate children sets are highly similar Electronics Personal computers PC Cameras and photos Digital cameras Photos and cameras I Leaves PID I Two non-leaf schema elements are structurally similar if their leaf sets are highly similar, even if their immediate children are not ID Name Quantity (e.g., Cupid, COMA) Brand Amount Price Price (e.g., Cupid, COMA) Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 44 / 113 Use Evaluation Conclusions Global methods: graph-based Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 45 / 113 Use Evaluation Conclusions Global methods: probabilistic matching Probabilistic methods, such as Bayesian networks or Markov networks, can be used universally in ontology matching, e.g., to enhance some available matching candidates. I Iterative fix point computation I If the neighbors of two nodes of the two ontologies are similar, they will be more similar. (e.g., SF, OLA) o A probabilistic reasoner probabilistic modeling M extraction A0 o0 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 46 / 113 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 47 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Problem Applications Methodology Classification Methods Probabilistic matching: bayesian networks example Global methods: model-based Bayesian networks are made up of (i) a directed acyclic graph, containing nodes (also called variables) and arcs, and (ii) a set of conditional probability tables. Arcs between nodes stand for conditional dependencies and indicate the direction of influence. Description logics (DL)-based Process Systems Use Evaluation Conclusions = ≥ hasWritten m(hasWritten, creates) creates micro-company = company u ≤5 employee SME = firm u ≤10 associate ≤ Writer m(Writer, Creator) Creator company = firm ; associate v employee micro-company v SME m(author, hasCreated) hasCreated author Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process 48 / 113 Systems Use Evaluation Conclusions Ontology partitioning and search-space pruning Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 49 / 113 Use Evaluation Conclusions Sequential composition Partitioning: split large ontologies into smaller ontologies, and match these smaller ontologies, e.g., Falcon-AO, TaxoMap. Pruning: dynamically ignore parts of large ontologies when matching, e.g., AROMA, LogMap. o o1 o A1 matcher A o10 A partitioner ... parameters 0 parameters A01 aggregation matching resources A0 A0 matching0 A00 resources 0 on o0 o0 An matcher A0n on0 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 51 / 113 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 52 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Parallel composition Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Context-based matching (CBM) parameters o Problem I Using the ontologies on the web as context I Composing the relations obtained through these ontologies A0 matching The seven steps: resources 1. Ontology arrangement 2. Contextualization A000 A 3. Ontology selection 4. Local inference parameters 0 5. Global inference matching0 o0 A00 6. Composition 7. Aggregation resources 0 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 53 / 113 Use Evaluation Conclusions Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 54 / 113 Use Evaluation Conclusions CBM step 1: ontology arrangement CBM step 2: contextualization Ontology arrangement preselects and ranks the ontologies to be explored as intermediate ontologies Contextualisation (or anchoring) finds anchors between the ontologies to be matched and the candidate intermediate ontologies a00 a000 b0 ≤ a0 = a b a Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko a b 55 / 113 b b00 = = a Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko = b 56 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions CBM step 3: selection CBM step 4: local inference Ontology selection restricts the candidate ontologies that will actually be used Local inference obtains relations between entities of a single ontology. It may be reduced to logical entailment a00 a000 b0 b0 b0 ≤ a0 = a00 a00 b00 = = a0 = a = b00 = = Applications Methodology Classification Methods Process Systems 57 / 113 Use Evaluation Conclusions b0 w a0 b00 = = = a b Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem = = a b a0 a00 w c = c0 w b00 = = = a b b Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 58 / 113 Use Evaluation Conclusions CBM step 5: global inference CBM step 6: composition Global inference finds relations between two concepts of the ontologies to be matched by concatenating relations obtained from local inference and correspondences across intermediate ontologies Composition determines the relations holding between the source and target entities by composing the relations in the path (sequence of relations) connecting them a00 w c b0 w a0 = = = a c0 w b00 = b a00 w c b0 w a0 = = = a Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko ≥ c0 w b00 b0 w a0 = = a00 w c = = a b 59 / 113 ≥ c0 w b00 = b a00 w c b0 w a0 = = = a Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko ≥ c0 w b00 ≤ = ≥ b 60 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions CBM step 7: aggregation Problem Applications Methodology b0 w a0 = ≥ = a c0 ≤ = ≥ b a00 w c b0 w a0 w b00 = Methods Process Systems ≥ = a Food ≤ Beef c0 w b00 = = TAP = = = Food ≤ RedMeat ≤ MeatOrPoultry ≤ = Beef= Methodology Classification Beef Methods Process Systems 61 / 113 Use Evaluation Conclusions Matching learning Beef Food = ≤ Food NAL Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 62 / 113 Use Evaluation Conclusions Matching learning: examples These algorithms learn how to sort alignments through the presentation of many correct alignments (positive examples) and incorrect alignments (negative examples) o1 A multistrategy learning approach is useful when several learners are used, each one handling a particular kind of pattern that it learns best, e.g., GLUE, CSR, YAM++. Various well-known machine learning methods, which had been used for text categorisation, were also applied in ontology matching: I Bayes learning, o training R Conclusions = b Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Applications Evaluation = Agrovoc Problem Use CBM example: Scarlet Aggregation combines relations obtained between the same pair of entities a00 w c Classification matching I WHIRL learning, I neural networks, A0 I support vector machines, I decision trees. o0 o2 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko And, in many cases the Weka data mining software was used. 63 / 113 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 64 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Tuning Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Tuning: examples Tuning refers to the process of adjusting a matcher for a better functioning in terms of: I better quality of matching results, measured, e.g., through precision or F-measure, and I better performance of a matcher, measured through resource consumption, e.g., execution time, main memory. tuning (prematch) o0 matching tuning (postmatch) A00 From a methodological point of view, tuning may be applied at various levels of architectural granularity: I for choosing a specific matcher, such as edit distance, from a library of matchers, I for setting parameters of the matcher chosen, e.g., cost of edit distance operations, I for aggregating the results of several matchers, e.g., through weighting, I for enforcing constraints, such as 1:1 alignments, I for selecting the final alignment, e.g., through thresholds. parameters o A Problem A0 Informed decisions, for instance, for choosing a specific threshold of 0.55 vs. 0.57 vs. 0.6, should be made. resources A variety of systems have explored different possibilities, e.g., eTuner, MatchPlaner, ECOMatch, AMS. Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 65 / 113 Use Evaluation Conclusions Similarity filter, alignment extractor and alignment filter Many algorithms are based on similarity or distance computation. A number of operations can be based on similarity/distance matrices. M0 M similarity filter A0 A alignment extractor alignment filter Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 67 / 113 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 66 / 113 Use Evaluation Conclusions Filtering similarities: thresholding I Hard threshold retains all the correspondence above threshold n; I Delta threshold consists of using as a threshold the highest similarity value out of which a particular constant value d is subtracted; I Proportional threshold consists of using as a threshold the percentage of the highest similarity value; I Percentage retains the n% correspondences above the others. Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 68 / 113 Classification Methods Process Systems Use Evaluation Conclusions Methodology Process Systems Use Classification Methods Evaluation Conclusions r r he 0. 0. .05 .90 .84 .12 .12 .60 .84 I Greedy algorithm: 1.86 (stable marriage) I Permutation: 2.1 (better) Process 69 / 113 Systems Use Evaluation Conclusions Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process 69 / 113 Systems Use Evaluation Conclusions These algorithms measure some quality of a produced alignment, reduce the alignment, so that the quality may improve, and possibly iterate by expanding the resulting alignment, e.g., LogMap, ASMOV, ALCOMO. 0. 0. .05 .90 .84 .12 er W rit Pu bl ish er ns la to r Alignment improvement Tr a ok Bo .84 .12 .60 .84 .12 .60 W rit e Product Provider Creator Extracting alignments Product Provider Creator Methods lis k Bo o .12 .60 .84 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Applications Classification r W rit e 0. 0. .05 I Greedy algorithm: 1.86 (stable marriage) Problem Methodology r he lis .84 .12 .60 .90 .84 .12 Pu b Tr a k Bo o Product Provider Creator Applications Extracting alignments ns la to r Extracting alignments Problem Pu b Methodology ns la to r Applications Tr a Problem .12 .60 .84 o A A00 measure A0 o0 I Greedy algorithm: 1.86 (stable marriage) I Permutation: 2.1 (better) selection I Maximal weight match: 2.52 (optimal) Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko matching/ expansion 69 / 113 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko C 70 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Alignment improvement: quality measures Problem Applications Methodology Classification I threshold on confidence or average confidence, I cohesion measures between matched entities, i.e., their neighbours are matched with each other, I ambiguity degree, i.e., proportion of classes matched to several other classes, I agreement or non-disagreement between the aligned ontologies, owl:Thing ⊥ Person Book Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Methodology Classification Methods Process Systems Systems Use Evaluation ≥ ⊥ Conclusions Evaluation Biography = string subject foaf:Person ≤ 71 / 113 Use Essay ≤ topic I violation of some constraints, e.g., acyclicity in the correspondence paths, I satisfaction of syntactic anti-patterns, I consistency and coherence. Applications Process Alignment improvement: alignment debugging Quality measures are the main ingredients for improvement. Contrary to the evaluation measures, such as precision and recall, these must be intrinsic measures of the alignment (they do not depend on any reference): Problem Methods Conclusions User involvement Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 72 / 113 Use Evaluation Conclusions Individual matching There are at least three areas in which users may be involved: I by providing initial alignments to the system (before matching), I by configuring and tuning the system, and I In traditional applications semi-automatic matching/design-time interaction is a promising way to improve quality of the results I Burdenless to the user interaction schemes I by providing feedback to matchers in order for them to adapt their results. I Usability I Scalability of visualization I Exploit the user feedback o I to adjust matcher parameters I to take it as (partial) input alignment to a matcher I ... seed A I In dynamic settings, agents involved in the matching process can negotiate the mismatches in a fully automated way Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko config tuning matching feedback A0 o0 73 / 113 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 74 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Collective matching Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Social and collaborative ontology matching Besides involving a single user at a time, mostly in a synchronous fashion, matching may also be a collective effort in which several users are involved. I The network effect: I Each person has to do a small amount of work I Each person can improve on what has been done by others I Errors remain in minority I A community of people can share alignments and argue about them by using annotations o seed A config tuning A0 matching comment discuss feedback communication I The key issues are to: I I I I A00 o0 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 75 / 113 Use Evaluation Conclusions An overview of the state of the art systems Provide adequate annotation support and description units Handle adequately contradictory and incomplete alignments Incentivise active user participation Handle adequately the malicious users Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 76 / 113 Use Evaluation Conclusions State of the art systems DELTA, Hovy, TransScm, DIKE, 100+ matching systems exist, . . . we consider some of them I Cupid (U. of Washington, Microsoft Corporation and U. of Leipzig) I S-Match (U. of Trento) SKAT &ONION, Artemis, H-Match, SEMINT, IF-Map, Tess, Anchor-Prompt, OntoBuilder, NOM & QOM, oMap, T-tree, CAIMAN, Xu and Embley, Wise-Integrator, FCA-merge, LSD, Cupid, COMA & COMA++, QuickMig, SF, MapOnto, CtxMatch&CtxMatch2, IceQ, OLA, Falcon-AO, RiMOM, GLUE, iMAP, CBM, iMapper, SAMBO, Automatch, SBI&NB, AROMA, ILIADS, SeMap, Kang and Naughton, ASMOV, HAMSTER, SmartMatcher, Dumas, Wang et al., I OLA (INRIA Rhône-Alpes and U. de Montréal) I Falcon-AO (China Southwest U.) S-Match, HCONE, MoA, ASCO, XClust, Stroulia & Wang, MWSDI, SeqDisc, I RiMOM (Tsinghua U.) I ASMOV (INFOTECH Soft, Inc., U. of Miami) I LogMap (U. of Oxford) BayesOWL, OMEN, HSM, GeRoMe, CBW, AOAS, BLOOMS & BLOOMS++, GEM/Optima/Optima+, CSR, sPLMap, FSM, Prior+, YAM & YAM++, MoTo, VSBM & GBM, CODI, LogMap & LogMap2, ProbaMap OMviaUO, DCM, Scarlet, CIDER, Elmeleegy et al., BeMatch, PORSCHE, I eTuner (U. of Illinois and The MITRE Corporation) I ... MatchPlanner, Anchor-Flood, Lily, PARIS AgreementMaker, Homolonto, DSSim, Schema-based systems MapPSO, TaxoMap, iMatch Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Instance-based systems 78 / 113 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 79 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Cupid Problem Applications Methodology Process Systems Use Evaluation A0 Conclusions M O0 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Methodology Classification Methods Process Systems 80 / 113 Use Evaluation Conclusions S-Match Linguistic matching M0 Structure matching M 00 Weighting thesauri Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 81 / 113 Use Evaluation Conclusions S-Match architecture I Schema-based I Computes equivalence (=), more general (w), less general (v), disjointness (⊥) I Transforms each ontology into a propositional theory based on external resources (WordNet definitions of terms) and ontology structure I Sequential system with a composition at the element level Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko M 000 O I Sequential system Applications Methods Cupid architecture I Schema-based I Computes similarity coefficients in the [0 1] range I Performs linguistic and structure matching Problem Classification 82 / 113 o Preprocessing o0 Oracles PTrees Match manager SAT solvers A0 Basic matchers Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 83 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions OLA Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions OLA architecture O I Schema- and Instance-based I Computes dissimilarities + extracts alignments (equivalences in the [0 1] range) I Based on terminological (including linguistic) and structural (internal and relational) distances I Neither sequential nor parallel parameters A M similarity computation M0 A0 resources O0 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 84 / 113 Use Evaluation Conclusions Falcon-AO Problem Applications Methodology Classification Methods Process Systems M0 Structure matching 85 / 113 Use Evaluation Conclusions Falcon-OA architecture I Schema- and instance-based I Sequence of string-based and structural matcher I Does not use the structural matcher if the terminological match is high enough I String-based matcher based on so-called virtual documents I Structural matcher close to OLA’s I Partition the ontologies so that they can be processed faster Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 86 / 113 o M Linguistic matching M 00 A0 o0 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 87 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions RiMOM Problem o I Linguistic matching is based on edit distance, WordNet and vector distance I Structural matcher implements variations of similarity flooding o0 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions RiMOM architecture I Schema- and instance-based I Dynamic strategy selection based on pre-processing I Sequence of linguistic and structural matchers Problem Applications Methodology Classification Methods Process Systems Evaluation Strategy selection PreLinguistic processing matching 88 / 113 Use Similarity factors Conclusions ASMOV Label matchers Vector matcher Structural matching A0 CCP PPP CPP Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 89 / 113 Use Evaluation Conclusions ASMOV architecture I Schema- and instance-based I Iterative similarity computation with sematic verification I Matchers: string-based, language based, WordNet UMLS, iterative fix point computation I Verification through rule-based (anti-patterns) inference o A o0 Preprocessing Iterative similarity computation WordNet Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Combination 90 / 113 A0 Semantic verification A00 UMLS Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 91 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions LogMap Problem Applications Methodology Classification Methods Auxiliary resource (WordNet, UMLS) Use Evaluation Conclusions Indexing (lexical, structural) Lexical indices’ intersection A o0 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Methodology Classification Methods Process Systems 92 / 113 Use Evaluation Conclusions eTuner of0 Compute overlap o I Mapping repair through propositional satisfiability I String-based mapping discovery from anchors through class hierarchies Applications Systems LogMap architecture I Schema- and instance-based I Applies partitioning and pruning of large ontologies I Initial anchors through indices’ intersection (exact strings) Problem Process of Mapping repair Mapping discovery A0 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 93 / 113 Use Evaluation Conclusions eTuner architecture I A metamatching system I Automatically tunes a matching system for a particular task by choosing the most effective matchers and the best parameters to be used I A matching system is modeled as a triple hL, G , K i: I L is a library of matching components, e.g., edit distance; combiners, e.g., through averaging; constraint enforcers, e.g., pre-defined domain constraints; match selectors, e.g., thresholds. I G is a directed graph which encodes the execution flow among the components of the given matching system. I K is a set of knobs to be set. S Transformation rules Tuning procedures Task generator Staged tuner Synthetic training System M: task Sample generator S0 hL, G , K i Augmented schema S Tuned system M I Two phases: training through synthetic workload and (greedy) search. Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 94 / 113 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 95 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Main operations Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Merging O O0 Matcher I Merge(o, o 0 , A) = o 00 I Translate(d, A) = d 0 A I Interlink(d, d 0 , A) = L I TransformQuery (q, A) = q 0 and Translate(a0 , Invert(A)) = a I ... Generator SWRL, OWL, SKOS axioms Merge(O, O 0 , A) Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 97 / 113 Use Evaluation Conclusions Data translation O Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Query mediator Matcher O0 O A A Generator Generator Transformation Query’ mediator D Answer Answer’ WSML, QueryMediator XSLT, C-OWL Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko O0 Matcher Query D 98 / 113 99 / 113 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 100 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Evaluation of matching algorithms Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions 6 tracks, 8 test cases, 23 participants in 2013 http://oaei.ontologymatching.org Goal: improvement of matching algorithms through comparison, measure of the evolution of the field. I Yearly campaigns comparing algorithms on different test cases I Participants submit their alignments in a standard format I We use alignment API for comparing these formats with reference alignments I Various degrees of blindness, expressiveness, realism test formalism relations benchmark anatomy conference large bio multifarm library interactive rdft OWL OWL OWL-DL OWL OWL OWL OWL-DL RDF = = =, <= = = = =, <= = confidence [0 [0 [0 [0 [0 [0 [0 [0 1] 1] 1] 1] 1] 1] 1] 1] modalities language blind+open open blind+open open open open open blind EN EN EN EN CZ, CN, EN, . . . EN, DE EN EN SEALS √ √ √ √ √ √ √ I Tests and results are published on the web site Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 102 / 113 Use Evaluation Conclusions Evaluation process Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 103 / 113 Use Evaluation Conclusions Precision and recall A C × C0 × Q R R o parameters matching o0 evaluator m A0 Definition (Precision, Recall) resources Given a reference alignment R, the precision of some alignment A is given by P(A, R) = Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 104 / 113 |R ∩ A| |R ∩ A| and recall is given by R(A, R) = . |A| |R| Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 105 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions 2013: Precision-recall graph for the conference test case Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Summary YAM++ AML-bk LogMap AML ODGOMS StringsAuto F1 -measure=0.7 ServOMap MapSSS F1 -measure=0.6 Hertuda F1 -measure=0.5 WikiMatch WeSeE-Match I Heterogeneity of ontologies is in the nature of the semantic web; I Ontology matching is part of the solution; I It can be based on many different techniques; I There are already numerous systems around; I A relatively solid research field has emerged (tools, formats, evaluation, etc.) and it keeps making progress; I But there remain serious challenges ahead. IAMA HotMatch CIDER-CL edna rec=1.0 rec=.8 rec=.6 pre=.6 pre=1.0 pre=.8 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 106 / 113 Use Evaluation Conclusions Challenges Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko Problem Applications Methodology Classification Methods Process Systems 108 / 113 Use Evaluation Conclusions Acknowledgments We thank all the participants of the Heterogeneity workpackage of the Knowledge Web network of excellence I Large-scale and efficient matching, I Matching with background knowledge, In particular, we are grateful to Than-Le Bach, Jesus Barrasa, Paolo Bouquet, Jan De Bo, Jos De Bruijn, Rose Dieng-Kuntz, Enrico Franconi, Raúl Garcı́a Castro, Manfred Hauswirth, Pascal Hitzler, Mustafa Jarrar, Markus Krötzsch, Ruben Lara, Malgorzata Mochol, Amedeo Napoli, Luciano Serafini, François Sharffe, Giorgos Stamou, Heiner Stuckenschmidt, York Sure, Vojtěch Svátek, Valentina Tamma, Sergio Tessaris, Paolo Traverso, Raphaël Troncy, Sven van Acker, Frank van Harmelen, and Ilya Zaihrayeu. I Matcher selection, combination and tuning, I User involvement, I Social and collaborative matching, I Uncertainty in matching, I Reasoning with alignments, I Alignment management. And more specifically to Marc Ehrig, Fausto Giunchiglia, Loredana Laera, Diana Maynard, Deborah McGuinness, Petko Valchev, Mikalai Yatskevich, and Antoine Zimmermann for their support and insightful comments and, of course, many others... Part of this work was carried out while Pavel Shvaiko was with the University of Trento. Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 109 / 113 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 110 / 113 Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Problem Applications Methodology Classification Methods Process Systems Use Evaluation Conclusions Ontology matching the book, 2nd edition Jérôme Euzenat, Pavel Shvaiko Ontology matching 1. Applications 2. The matching problem 3. Methodology Thank you 4. Classification 5. Basic similarity measures for your attention and interest! 6. Global matching methods 7. Strategies 8. Systems 9. Evaluation 10. Representation 11. User involvement 12. Processing Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko http://book.ontologymatching.org Questions? [email protected] [email protected] http://www.ontologymatching.org 111 / 113 Ontology matching tutorial (v15): ISWC-2014 (Riva del Garda, Italy) – Euzenat and Shvaiko 112 / 113
© Copyright 2026 Paperzz