Semantic Web: the Story So Far

Ian Horrocks
Oxford University
Computing Laboratory
Semantic Web
• According to W3C
– “an evolving extension of the World Wide Web in which web
content can be … read and used by software agents, thus
permitting them to find, share and integrate information more
easily”
• Data will use uniform syntactic structure (RDF)
• (OWL) ontologies will provide
– Schemas for data
– Vocabulary for annotations
• Ultimate goal is a “more intelligent web”
Web Ontology Language OWL
• Semantic Web led to requirement for a “web ontology language”
• W3C set up Web-Ontology (WebOnt) Working Group
– WebOnt developed OWL language
– OWL based on earlier languages RDF, OIL and DAML+OIL
– OWL now a W3C recommendation (i.e., a standard)
• OWL is a family of 3 languages: OWL Lite, OWL DL
and OWL Full
• OIL, DAML+OIL and OWL (DL & Lite) based on
Description Logics
– Has facilitated development of wide range of high
quality tools & infrastructure
• OWL now language of choice in many
applications
What Are Description Logics?
• A family of logic based Knowledge Representation
formalisms
– Descendants of semantic networks and KL-ONE
– Describe domain in terms of concepts (classes), roles
(properties, relationships) and individuals
– Operators allow for composition of complex concepts
– Names can be given to complex concepts, e.g.:
HappyParent ≡ Parent ⊓ ∀hasChild.(Intelligent ⊔ Athletic)
Why (Description) Logic?
• OWL exploits results of 15+ years of DL research
– Well defined (model theoretic) semantics
– Most DLs are subsets of C2, i.e., decidable fragments of FOL
– Formal properties well understood (complexity, decidability)
I can’t find an efficient algorithm, but neither can all these famous people.
[Garey & Johnson. Computers and Intractability: A Guide
to the Theory of NP-Completeness. Freeman, 1979.]
– Known reasoning algorithms
– Implemented systems (highly optimised), e.g., KAON2, Pellet, CEL
Class/Concept Constructors
• Concept can be thought of as a FOL formula with one free variable
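The reading of a concept as a first-order formula with one free variable can be sketched as a small translation function. This is only an illustration: the class names (Atom, And, Or, Forall, Exists) and the function to_fol are hypothetical helpers, not part of any OWL tool, and only a handful of constructors are covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:      # named concept, e.g. Parent
    name: str


@dataclass(frozen=True)
class And:       # C ⊓ D
    left: object
    right: object


@dataclass(frozen=True)
class Or:        # C ⊔ D
    left: object
    right: object


@dataclass(frozen=True)
class Forall:    # ∀R.C
    role: str
    filler: object


@dataclass(frozen=True)
class Exists:    # ∃R.C
    role: str
    filler: object


def to_fol(concept, var="x", depth=0):
    """Translate a concept into a FOL formula whose only free variable is `var`."""
    if isinstance(concept, Atom):
        return f"{concept.name}({var})"
    if isinstance(concept, And):
        return f"({to_fol(concept.left, var, depth)} ∧ {to_fol(concept.right, var, depth)})"
    if isinstance(concept, Or):
        return f"({to_fol(concept.left, var, depth)} ∨ {to_fol(concept.right, var, depth)})"
    fresh = f"y{depth}"  # bound variable introduced by a quantifier
    if isinstance(concept, Forall):
        body = to_fol(concept.filler, fresh, depth + 1)
        return f"∀{fresh}.({concept.role}({var}, {fresh}) → {body})"
    if isinstance(concept, Exists):
        body = to_fol(concept.filler, fresh, depth + 1)
        return f"∃{fresh}.({concept.role}({var}, {fresh}) ∧ {body})"
    raise TypeError(f"unsupported concept: {concept!r}")


# Right-hand side of the HappyParent definition from the earlier slide.
happy_parent = And(Atom("Parent"),
                   Forall("hasChild", Or(Atom("Intelligent"), Atom("Athletic"))))
formula = to_fol(happy_parent)
print(formula)
```

The only variable left unbound in the result is x, which is the sense in which a concept denotes a set of individuals.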
Knowledge Base / Ontology Axioms
OWL RDF/XML Exchange Syntax
E.g., Parent ⊓ ∀hasChild.(Intelligent ⊔ Athletic):
<owl:Class>
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Class rdf:about="#Parent"/>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasChild"/>
      <owl:allValuesFrom>
        <owl:Class>
          <owl:unionOf rdf:parseType="Collection">
            <owl:Class rdf:about="#Intelligent"/>
            <owl:Class rdf:about="#Athletic"/>
          </owl:unionOf>
        </owl:Class>
      </owl:allValuesFrom>
    </owl:Restriction>
  </owl:intersectionOf>
</owl:Class>
Ontology based Information Systems
• Similar to relational databases
– Ontology ≈ schema; instances ≈ data
• Some important (dis)advantages
+ (Relatively) easy to maintain and update schema
• Both schema and data are “self organising”
+ Query answers reflect both schema and data
+ Able to answer both intensional and extensional queries
– Semantics may be counter-intuitive or even inappropriate
• Open -v- closed world; axioms -v- constraints
– Query answering (logical entailment) much more difficult
• Can lead to scalability problems
Very useful, but no miracles!
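The open versus closed world difference mentioned above can be illustrated with a toy fact store. This is a minimal sketch under stated assumptions: the facts, the individual names and both query functions are hypothetical, and a real reasoner derives entailments from axioms rather than looking up stored triples.

```python
# Hypothetical data: the only stated fact about who has which child.
facts = {("hasChild", "Ian", "Mary")}


def closed_world(query):
    """Database view: a fact that is not stored is taken to be false."""
    return query in facts


def open_world(query, known_false=frozenset()):
    """Ontology view: True if stated, False if explicitly negated, else None (unknown)."""
    if query in facts:
        return True
    if query in known_false:
        return False
    return None


q = ("hasChild", "Ian", "John")
print(closed_world(q))  # the database answers "no"
print(open_world(q))    # the ontology answers "don't know"
```

Under the open world reading, the missing fact only becomes false once something (for example a negative assertion) rules it out.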
Ontologies and Reasoning
Support for Ontology Engineering
• Developing and maintaining quality ontologies is very challenging
• Users need tools and services, e.g., to help check if ontology is:
– Meaningful: all named classes can have instances
– Correct: captures intuitions of domain experts
– Minimally redundant: no unintended synonyms (e.g., “Banana split” and “Banana sundae”)
Support for Query Answering
• In an Ontology based Information System (OIS),
Query answering ≈ computing logical entailment
– Reasoner needed in order to answer queries, e.g.:
• C is a sub-class of D iff O ⊨ ∀x. C(x) → D(x)
• a is an instance of C iff O ⊨ C(a)
OIS with no reasoner ≈ DBMS with no query engine
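The two entailment checks above can be sketched for the simplest possible case, an ontology containing only atomic axioms C ⊑ D and class assertions C(a). The class and individual names are hypothetical, and a real DL reasoner handles far more expressive axioms than this transitive-closure check.

```python
def superclasses(cls, subclass_axioms):
    """All D (including cls itself) with O ⊨ ∀x. cls(x) → D(x), for atomic axioms only."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for sub, sup in subclass_axioms:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def is_subclass(c, d, subclass_axioms):
    """C is a sub-class of D iff D is reachable from C via the stated axioms."""
    return d in superclasses(c, subclass_axioms)


def is_instance(a, c, subclass_axioms, assertions):
    """a is an instance of C iff some asserted class of a is a sub-class of C."""
    return any(ind == a and is_subclass(asserted, c, subclass_axioms)
               for asserted, ind in assertions)


# Hypothetical ontology O: two subclass axioms and one class assertion.
axioms = {("Cat", "Mammal"), ("Mammal", "Animal")}
assertions = {("Cat", "felix")}

print(is_subclass("Cat", "Animal", axioms))               # entailed, never stated
print(is_instance("felix", "Animal", axioms, assertions))  # entailed, never stated
```

Neither answer is stored anywhere in the data, which is why an OIS without a reasoner cannot answer such queries.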
Example Applications
e-Science
• E.g., for “in silico” investigations and “hypothesis testing”
– Comparing data (e.g., on proteins) to (model of) biological knowledge
– Characteristics of proteins captured in an ontology O
• Goal is to identify protein instances based on characteristics
– Equivalent to answering queries of form:
O ⊨ P(i)? for protein P and instance i
– Result may be discovery of new kinds of protein
• And these may be potential drug targets if unique to a pathogen
– Result may also be discovery of errors in model
• Which may reflect gaps/errors in existing knowledge
Healthcare
• UK NHS has a £6.2 billion “Connecting for Health” IT programme
• Key component is Care Records Service (CRS)
– “Live, interactive patient record service accessible 24/7”
– Patient data distributed across local centres in 5 regional clusters,
and a national DB
• Detailed records held by local service providers
• Diverse applications support radiology, pharmacy, etc
• Applications exchange messages containing “semantically rich clinical
information”
• Summaries sent to national database
– SNOMED-CT ontology provides common vocabulary for data
• Clinical data uses terms drawn from ontology
SNOMED
• Over 400,000 concepts
• Schema only (no instances)
• Language used is a (well known) fragment of OWL
• NHS version extended with 1,000s of additional classes
– OWL reasoner (FaCT++) used to classify and check ontology
• Currently takes ≈ 4 hours
– 180 missing subClass relationships were found, e.g.:
• Periocular_dermatitis subClassOf Disease_of_face
• Fibrin_measurement subClassOf Coagulation_factor_assay
SNOMED
• Vocabulary is extensible at point of use: “post coordination”
– Users (e.g. clinicians) may add/define new vocabulary
– Terminology service (reasoner) used to insert in ontology
• Typical new term:
– almond_allergy ≡ “allergy caused_by almond”
– OWL reasoner (FaCT++) used to classify new term
• Takes <10 ms
– Classified as a kind of “nut allergy”
• Clearly of crucial importance to recognise patients with allergy caused
by almond as kinds of patient with nut allergy
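How a post-coordinated term ends up under “nut allergy” can be sketched with a toy classifier. Everything here is an assumption made for illustration: the small hierarchy, the defined terms and the structural subsumption test are hypothetical and are not taken from SNOMED or FaCT++.

```python
# Hypothetical atomic hierarchy (child -> parent); not taken from SNOMED itself.
PARENT = {"almond": "nut", "nut": "food"}


def subsumed_by(a, b):
    """True if atomic concept a is b or a descendant of b."""
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)
    return False


# Existing defined terms of the form  name ≡ base ⊓ ∃role.filler
DEFINED = {
    "nut_allergy": ("allergy", "caused_by", "nut"),
    "food_allergy": ("allergy", "caused_by", "food"),
}


def classify(definition):
    """Return the existing defined terms that subsume the new definition."""
    base, role, filler = definition
    return sorted(
        name
        for name, (b, r, f) in DEFINED.items()
        if subsumed_by(base, b) and role == r and subsumed_by(filler, f)
    )


almond_allergy = ("allergy", "caused_by", "almond")
print(classify(almond_allergy))
```

The new term is placed below nut_allergy because almond is below nut in the hierarchy, without anyone stating that relationship directly.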
Recent Developments
OWL 1.1
• Is an extension of OWL
– Addresses deficiencies identified by users and developers
(at OWLED workshop)
• Is based on more expressive DL: SROIQ
– (OWL is based on SHOIN)
• W3C working group now chartered
– Will develop recommendation based on existing member submission
• Already supported by popular OWL tools
– Protégé, Swoop, TopBraid, FaCT++, Pellet
What’s New in OWL 1.1?
Four kinds of features:
• More expressive logic (SROIQ)
– Qualified cardinality restrictions (≥n R.C) and (≤n R.C), e.g.:
• Person ⊑ Animal ⊓ =2 hasPart.Legs
• Car ⊑ =4 hasComponent.Wheel
• Person ⊑ ≤1 bioParent.Male
(OWL/SHOIN only allows for concepts (≥n R) and (≤n R))
– Expressive role axioms (R), e.g., complex role inclusions:
R1 ∘ … ∘ Rn ⊑ S
R1 ∘ … ∘ Rn ∘ S ⊑ S
S ∘ R1 ∘ … ∘ Rn ⊑ S
(with some restrictions on cycles)
– useful, e.g., for
owns ∘ hasPart ⊑ owns ⇒ ∃owns.Bicycle ⊑ ∃owns.Wheels
partOf ∘ locatedIn ⊑ locatedIn ⇒ Fracture ⊓ ∃locatedIn.FemurShaft ⊑ Fracture ⊓ ∃locatedIn.Femur
hasParent ∘ hasBrother ⊑ hasUncle
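The effect of a complex role inclusion on instance data can be sketched as a fixpoint computation over triples. The function apply_role_chain and the individuals are hypothetical; a reasoner applies such axioms at the schema level as well, which this sketch does not attempt.

```python
def apply_role_chain(facts, chain, result):
    """Add result(x, z) whenever the roles in `chain` link x to z; repeat until nothing new."""
    facts = set(facts)
    while True:
        # Pairs connected by the first role of the chain.
        paths = {(s, o) for (r, s, o) in facts if r == chain[0]}
        # Extend each path by the remaining roles, one at a time.
        for role in chain[1:]:
            paths = {(x, o) for (x, y) in paths
                     for (r, s, o) in facts if r == role and s == y}
        new = {(result, x, z) for (x, z) in paths} - facts
        if not new:
            return facts
        facts |= new


# hasParent ∘ hasBrother ⊑ hasUncle, on hypothetical individuals
family = {("hasParent", "alice", "bob"), ("hasBrother", "bob", "carl")}
family = apply_role_chain(family, ("hasParent", "hasBrother"), "hasUncle")

# owns ∘ hasPart ⊑ owns is recursive, hence the fixpoint loop
things = {("owns", "ian", "bike"), ("hasPart", "bike", "wheel"),
          ("hasPart", "wheel", "spoke")}
things = apply_role_chain(things, ("owns", "hasPart"), "owns")
print(sorted(things))
```

The second call needs two rounds: owning the wheel is derived first, and owning the spoke follows from it.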
– Expressive role axioms (R), e.g., asymmetry, reflexivity, etc.:
• Tra(R) (supported by SHOIN)
• Asy(R) e.g., Asy(properPartOf), Asy(hasParent)
• Sym(R) (supported by SHOIN)
• Refl(R) e.g., Refl(knows)
• Irrefl(R) e.g., Irrefl(properPartOf), Irrefl(hasParent)
• Disj(R S) e.g., Disj(hasParent hasSibling)
• ObjectExistsSelf(likes) [for narcissists]
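What these role characteristics require of an interpretation can be sketched by checking them on one finite relation. This is only an illustration with hypothetical data: in an ontology the axioms constrain every model and drive inference, whereas the functions below merely test a single given set of pairs.

```python
def is_transitive(rel):
    return all((a, d) in rel for (a, b) in rel for (c, d) in rel if b == c)


def is_symmetric(rel):
    return all((b, a) in rel for (a, b) in rel)


def is_asymmetric(rel):
    return all((b, a) not in rel for (a, b) in rel)


def is_irreflexive(rel):
    return all(a != b for (a, b) in rel)


def is_reflexive(rel, domain):
    return all((a, a) in rel for a in domain)


def are_disjoint(rel1, rel2):
    return not (set(rel1) & set(rel2))


# Hypothetical interpretation of two roles
has_parent = {("mary", "ian")}
has_sibling = {("mary", "john"), ("john", "mary")}

print(is_asymmetric(has_parent), is_irreflexive(has_parent))
print(is_symmetric(has_sibling), are_disjoint(has_parent, has_sibling))
```

Note that any asymmetric relation is also irreflexive, since a pair (a, a) would be its own inverse.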
• More expressive datatypes
– OWL 1.1 allows for user-defined datatypes:
• over18 ≡ base(xsd:integer) minInclusive("18"^^xsd:integer)
• Adult ≡ Person ⊓ ∃age.over18
– and n-ary datatype predicates:
• Spendthrift ≡ ∃spends,earns.>
– BUT, still cannot:
• define complex relationships between data properties on different individuals, e.g., Women who earn more than their husbands.
• declare a datatype property as inverse-functional (keys).
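The datatype examples can be read as simple predicates over data values. The sketch below evaluates them on explicitly given values for one hypothetical individual; the dictionary layout and function names are assumptions, and a reasoner would treat missing values as unknown rather than absent.

```python
def over18(value):
    """User-defined datatype: xsd:integer restricted by minInclusive 18."""
    return isinstance(value, int) and value >= 18


def is_adult(individual):
    # Adult ≡ Person ⊓ ∃age.over18
    return ("Person" in individual["types"]
            and any(over18(v) for v in individual.get("age", [])))


def is_spendthrift(individual):
    # Spendthrift ≡ ∃spends,earns.>  (binary predicate over two data properties
    # of the same individual)
    return any(s > e
               for s in individual.get("spends", [])
               for e in individual.get("earns", []))


# Hypothetical individual with explicitly stated data values
alex = {"types": {"Person"}, "age": [21], "spends": [100], "earns": [80]}
print(is_adult(alex), is_spendthrift(alex))
```

Both predicates look only at values attached to a single individual, which is why a comparison across two individuals (the "earn more than their husbands" case) falls outside what they can express.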
• Metamodelling and annotations
– Names can be used as any or all of an individual, a class, or a property
– Allows for a restricted form of metamodelling (“punning”), e.g.:
SubClassOf(SnowLeopard BigCat)
ClassAssertion(SnowLeopard EndangeredSpecies)
– Annotations of axioms as well as entities, e.g.:
ClassAssertion(Comment("source: WWF") SnowLeopard EndangeredSpecies)
• Syntactic sugar (make things easier to say)
– Disjoint unions, e.g.:
DisjointUnion(Element Earth Wind Fire Water)
– Negative assertions, e.g.:
NegativeObjectPropertyAssertion(Ian hasChild Mary)
NegativeDataPropertyAssertion(Ian hasAge 21)
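Why DisjointUnion counts as syntactic sugar can be sketched by expanding it into the axioms it abbreviates: one equivalence with a union, plus pairwise disjointness of the parts. The helper name expand_disjoint_union is hypothetical, and the output strings only imitate functional-style syntax for illustration.

```python
from itertools import combinations


def expand_disjoint_union(defined, parts):
    """Rewrite DisjointUnion(defined parts...) into the axioms it abbreviates."""
    axioms = [f"EquivalentClasses({defined} ObjectUnionOf({' '.join(parts)}))"]
    axioms += [f"DisjointClasses({a} {b})" for a, b in combinations(parts, 2)]
    return axioms


axioms = expand_disjoint_union("Element", ["Earth", "Wind", "Fire", "Water"])
for axiom in axioms:
    print(axiom)
```

For four parts the single DisjointUnion statement replaces seven separate axioms, which is the "easier to say" benefit.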
Tractable Fragments
• OWL defines only one fragment (OWL Lite)
– And it isn’t very tractable!
• OWL 1.1 defines several different fragments with useful computational properties
– E.g., reasoning complexity in range LOGSPACE to PTIME
– Smaller fragments implementable using RDBs
Tools and Methodologies
• OWL 1.1 support already added to several tools:
– Protégé, Swoop, TopBraid Composer, FaCT++, Pellet
• New features available (soon) in OWL tools:
– Diagnosis and semi-automatic repair of errors
– Support for integration and modular design
– Incremental classification (addition and retraction)
– Support for bottom up design
Diagnosis
• Editing tools use reasoner to identify inconsistent classes
• May not be very useful without some explanation facility
Modularity in Ontology Engineering
A modular ontology design simplifies:
• ontology refinement/update
– modifying a module should not lead to modifications in parts of the ontology that are not conceptually related
• understanding
– relationships between different modules in an ontology are controlled and well understood
• integration with other ontologies
– no unexpected consequences
• partial reuse
– reuse only the relevant part/module of an ontology
Tool Support for Modular Design
• Check when integration of modules is “safe”
– Interface between modules via exported vocabulary
– Information flows from imported to importing ontology
– No information flows back the other way
• Formalised using conservative extensions
– What is the effect of merging O2 into O1?
– In general, check that O1 ∪ O2 ⊨ C iff O1 ⊨ C for any concept C constructed using vocabulary occurring in O1
[Cuenca Grau & Kazakov, IJCAI-07 and WWW-07]
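The safety condition can be sketched for a drastically simplified setting in which ontologies contain only atomic axioms C ⊑ D and the only consequences of interest are atomic subsumptions. This is an assumption made to keep the example finite; it is not the technique of the cited papers, and the ontologies are hypothetical.

```python
def entailed_subsumptions(axioms, vocabulary):
    """All pairs (C, D), C != D, over `vocabulary` with C ⊑ D entailed by atomic axioms."""
    def ancestors(c):
        seen, stack = set(), [c]
        while stack:
            cur = stack.pop()
            for sub, sup in axioms:
                if sub == cur and sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
        return seen

    return {(c, d) for c in vocabulary for d in ancestors(c)
            if d in vocabulary and d != c}


def is_safe_import(o1, o2):
    """True if merging o2 into o1 adds no new subsumptions over o1's own vocabulary."""
    vocab = {name for axiom in o1 for name in axiom}
    return entailed_subsumptions(o1 | o2, vocab) == entailed_subsumptions(o1, vocab)


o1 = {("Heart", "Organ"), ("Femur", "Bone")}
o2_safe = {("Aorta", "Artery")}
o2_unsafe = {("Organ", "BodyPart"), ("BodyPart", "Bone")}

print(is_safe_import(o1, o2_safe))
print(is_safe_import(o1, o2_unsafe))  # Heart ⊑ Bone would newly follow
```

In the unsafe case information flows back into O1: terms that O1 already used acquire consequences that O1 alone did not entail.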
Tool Support for Modular Design
• Extract smaller modules from large ontologies
– E.g., starting with FMA, extract module for “Heart”
– Tool should ensure that module
• Is as small as possible, but
• Still contains all relevant knowledge
• More formally:
– Extract a (small) module from O capturing all “relevant”
information about some vocabulary V
– In general, find O′ ⊆ O s.t. O′ ⊨ C iff O ⊨ C for any concept C constructed using terms from V
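For an ontology that contains only atomic axioms C ⊑ D, a module for a vocabulary V can be sketched as the set of axioms reachable upwards from the terms in V. This is a simplifying assumption for illustration: the names are hypothetical (not taken from FMA), the result is not guaranteed to be minimal, and module extraction for full OWL needs considerably more care.

```python
def extract_module(ontology, signature):
    """Keep the axioms C ⊑ D that are reachable upwards from the signature terms."""
    module, frontier, seen = set(), list(signature), set(signature)
    while frontier:
        term = frontier.pop()
        for sub, sup in ontology:
            if sub == term:
                module.add((sub, sup))
                if sup not in seen:
                    seen.add(sup)
                    frontier.append(sup)
    return module


# Hypothetical anatomy fragment
ontology = {
    ("Heart", "Organ"), ("Organ", "BodyPart"),
    ("Femur", "Bone"), ("Bone", "BodyPart"),
}
heart_module = extract_module(ontology, {"Heart"})
print(sorted(heart_module))
```

Every chain of subclass axioms that starts at a term in V is kept, so any subsumption between terms of V that holds in O also holds in the module.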
Incremental Reasoning
• Modules can also be used to support incremental addition and retraction of axioms, e.g.:
– When retracting C ⊑ D, reclassify only concepts whose module includes this axiom
– Typically this is only a very small subset of all concepts
• Prototype now implemented in Swoop editor
Tool Support for Bottom-up Design
• Bottom-up design
– Find a (small and specific) concept describing a set of
individuals
– In general, find most specific C s.t. O ⊨ C(i1) ∧ … ∧ C(in)
• Where C may be “small” and/or in a sub-language (of O)
– Prototype: SONIC system [Turhan et al]
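A much simplified version of the bottom-up task restricts C to named classes in a tree-shaped hierarchy. The sketch below makes exactly that assumption; the hierarchy and individuals are hypothetical, and it does not construct new complex concepts as a system such as SONIC may do.

```python
def most_specific_common_classes(individual_types, parent):
    """Named classes C with C(i) for every individual i, keeping only the most specific."""
    def ancestors_or_self(cls):
        out = set()
        while cls is not None:
            out.add(cls)
            cls = parent.get(cls)
        return out

    common = None
    for types in individual_types.values():
        classes = set().union(*(ancestors_or_self(t) for t in types))
        common = classes if common is None else common & classes
    common = common or set()
    # Drop every class that has a proper descendant also in the common set.
    return {c for c in common
            if not any(d != c and c in ancestors_or_self(d) for d in common)}


# Hypothetical hierarchy (child -> parent) and asserted types
parent = {"Cat": "Mammal", "Dog": "Mammal", "Mammal": "Animal"}
individual_types = {"felix": {"Cat"}, "rex": {"Dog"}}
print(most_specific_common_classes(individual_types, parent))
```

Animal also describes both individuals, but it is discarded because Mammal is more specific.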
Extending Expressive Power
• Database style keys [Lutz et al, JAIR 2004]
– E.g., make + model + chassis-number is a key for Vehicles
• Rule language extensions
– W3C RIF WG (see http://www.w3.org/2005/rules/)
– First order extensions (e.g., SWRL) [Horrocks et al, JWS, 2005]
– Hybrid language extensions, e.g., [Eiter et al, KR-04; Motik et al, ISWC-04; Rosati, JoWS, 2005]
– LP/F-Logic/Common Logic [Chen et al, JLP, 1993; de Bruijn et al, WWW-05]
• Other extensions
– Temporal
– Fuzzy
– Extended annotation framework
– Macro language
– …
Improving Scalability
• Optimisation techniques
– Improve performance of DL reasoners, e.g., [Tsarkov et al, JAR, ]
• New Reasoning Techniques
– Reduction to disjunctive Datalog [Motik et al, KR-04]
• Transform SHOIN ontology to Datalog∨ rules
• Use LP techniques to deal with large numbers of ground facts
– Hybrid DL-DB systems [Horrocks et al, CADE-05]
• Use DB to store “Abox” (individual) axioms
• Cache inferences and use DB queries to answer/scope logical queries
– Hypertableau based algorithms [Motik et al, CADE-07]
• Prototypical implementation in HermiT system
• Polynomial time algorithms for sub-ALC logics
– Graph based techniques for EL+ [Baader et al, IJCAI-05]
– Database techniques for DL-Lite [Calvanese et al, AAAI-05]
Developing Tools and Infrastructure
• Editors/environments
– OilEd, Protégé, Swoop, TopBraid, OntoTrack, …
• Reasoning systems
– Cerebra, FaCT++, KAON2, Pellet, Racer, CEL, …
• Design methodologies
– Foundational ontologies, etc.
[Figure: fragment of a foundational ontology with classes Entity, Endurant, Quality, Substantial, Perdurant, Event, Achievement, Stative, Accomplishment]
Summary
• Semantic Web aims to make web content more
accessible to automated processes
– Adds semantic annotations to web resources
• OWL Ontologies provide vocabulary for annotations
– Terms have well defined meaning
• OWL now being used in a wide range of applications
– e-Science, medicine, geography, geology, …
• Reasoning enabled tools are of crucial importance
– For both design and deployment of ontologies
• Active research area
– Expressive power, scalability, methodologies, tools, …
Thank you for listening
Any questions?