Natural Language and the
Semantic Web: a crucial symbiosis
Philipp Cimiano
Web Information Systems, TU Delft, The Netherlands
7/16/09
Aims and Not-Aims
• Aims
• Overview
• Raise Questions
• Entertain and Encourage
Not-Aims
• Present my own work (only a bit ;-)
• Present solutions or answers
Semantic Web Summer School (SWSS09), Cercedilla
Structure
• The relation between ontologies and natural language
• Applications at the ontology-language interface
• Principled approaches to the language-ontology interface
• The LexInfo model
• Conclusion
Symbiosis
• The term symbiosis commonly describes close and often long-term
interactions between different biological species.
Different types of symbiotic relations
• Mutualism is a biological interaction between two organisms, where
each individual derives a fitness benefit, for example increased
survivorship.
• Commensalism is a class of relationship between two organisms
where one organism benefits but the other is unaffected.
• Parasitism is a type of symbiotic relationship between two different
organisms where one organism, the parasite, benefits at the expense of
the host, sometimes for a prolonged time.
What type of relation exists between ontologies (as building blocks of
the Semantic Web) and natural language?
Fitness Benefit
• What fitness benefit do ontologies derive from natural language?
Symbol Grounding
• Symbol Grounding Problem (Harnad 1990): it is very difficult (if not
impossible) to express the meaning of a symbol in the system itself.
We need anchoring to some external system.
• In the case of ontologies, this external system is language.
• We define local names as part of URIs
http://www.example.org#car
• We specify labels of these URIs
rdfs:label(http://www.example.org#car, 'car')
• We add natural language definitions of the classes and properties we
define (e.g. using rdfs:comment)
“A car is a wheeled motor vehicle used for transporting
passengers, which also carries its own engine or motor.”
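The three grounding devices above (a local name inside the URI, a label, and a natural-language comment) can be sketched as plain triples. This is a minimal illustration using only Python tuples; the helpers `local_name` and `labels_of` are assumptions made for the example, not part of any RDF library.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://www.example.org#"  # example namespace taken from the slide

car = EX + "car"

# Each triple is (subject, predicate, object).
triples = [
    (car, RDFS + "label", "car"),
    (car, RDFS + "comment",
     "A car is a wheeled motor vehicle used for transporting passengers, "
     "which also carries its own engine or motor."),
]


def local_name(uri):
    """Return the part of a URI after '#', i.e. the local name."""
    return uri.rsplit("#", 1)[-1]


def labels_of(subject, graph):
    """Hypothetical helper: collect all rdfs:label values of a subject."""
    return [o for s, p, o in graph if s == subject and p == RDFS + "label"]


print(local_name(car), labels_of(car, triples))
```

All three pieces of information are strings of natural language: without them the symbol `car` would be indistinguishable from any other opaque identifier.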
Further benefits
• Ontologies benefit from language:
• Grounding of meaning for humans
• Population of ontologies from textual data (massively available)
• Language-based interaction with knowledge (e.g. querying by way
of natural language)
• Reading documents describing how humans perceive the world to
support ontology engineering (consulting domain-specific literature
is an important step in most knowledge engineering
methodologies)
A commensal or parasitic relation?
• So is the relation commensal or even parasitic in the sense that
ontologies need language (to ground the meaning of symbols) but
language does not profit from ontologies?
NLP benefits from ontologies
• In formal semantics it is assumed that meaning can be captured by a
logical formalism (typically FOL) which supports reasoning and
drawing of inferences (humans clearly do so).
• The meaning of the sentence: “Vincent is married to Mia” is:
marriedTo(vincent,mia)
• But what do these symbols mean in the logical system in terms of
what conclusions we can draw? (is marriedTo symmetric? timeless?)
• What are the legal symbols that we can use? (a question of ontology)
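To make the question about symmetry concrete, here is a minimal sketch of what an ontology axiom buys us. It assumes, purely for illustration, that the ontology declares `marriedTo` to be symmetric; the set `SYMMETRIC` and the function `infer` are hypothetical names.

```python
# Assumption: the ontology declares marriedTo as a symmetric property.
SYMMETRIC = {"marriedTo"}


def infer(facts):
    """Return the facts plus the inverse of every symmetric fact."""
    inferred = set(facts)
    for pred, subj, obj in facts:
        if pred in SYMMETRIC:
            inferred.add((pred, obj, subj))
    return inferred


facts = {("marriedTo", "vincent", "mia")}
closure = infer(facts)
print(sorted(closure))
```

Without the axiom, `marriedTo(mia,vincent)` is simply not derivable, although every human reader of the sentence would conclude it.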
There are a number of ways in which
meaning can be represented
e.g. “Vincent is married to Mia.”
marriedTo(vincent,mia)
∃x (marriage(x) ∧ partner(x,vincent) ∧ partner(x,mia))
∃x (marriage(x) ∧ partner(x,vincent) ∧ partner(x,mia)
∧ holdsDuring(x,interval) ∧ overlap(interval,now))
Word Sense Disambiguation (WSD)
• Well-known that words have different senses (at least 10 according to
WordNet!)
• There is no limit to the senses that we can consider (very fine-grained)
Named Entity Recognition
• Named entity recognition recognizes entities of a certain type in
textual data:
<painter>Rembrandt Harmenszoon van Rijn </painter> was born on
<date> July 15, 1606 </date> in <city> Leiden </city>, <country>
the Netherlands </country>. He was the ninth child born to <person>
Harmen Gerritszoon van Rijn </person> and <person> Neeltgen
Willemsdochter van Zuytbrouck </person>.
• Arbitrary number of possible types and granularity (tag people as
person or according to their profession etc.)
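The inline annotations above can be read back into typed entities with a few lines of code. The sketch below is only an illustration of the data format, not a named entity recognizer; the `COARSE` mapping is a hypothetical example of choosing a coarser granularity (tagging a painter simply as a person).

```python
import re

# Matches <type> text </type> and captures the type and the enclosed text.
TAG = re.compile(r"<(\w+)>\s*(.*?)\s*</\1>", re.DOTALL)

# Hypothetical granularity choice: collapse professions into "person".
COARSE = {"painter": "person"}


def entities(annotated):
    """Return (type, text) pairs for every inline annotation."""
    return [(t, " ".join(text.split())) for t, text in TAG.findall(annotated)]


def coarsen(pairs, mapping):
    """Replace fine-grained types by coarser ones where a mapping exists."""
    return [(mapping.get(t, t), text) for t, text in pairs]


text = ("<painter>Rembrandt Harmenszoon van Rijn </painter> was born on "
        "<date> July 15, 1606 </date> in <city> Leiden </city>.")
fine = entities(text)
print(fine)
print(coarsen(fine, COARSE))
```

Which of the two outputs is "correct" depends entirely on the types the target ontology defines.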
Semantic Normalization
• The Liffy flows through Dublin.
=> flowsThrough(Liffy,Dublin)
• Dublin lies at the Liffy.
=> lies_at(Dublin,Liffy)
• Dublin is located at the Liffy.
=> located_at(Dublin,Liffy)
• The Liffy passes Dublin.
=> passes(Liffy,Dublin)
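The four surface predicates above describe the same state of affairs. A minimal sketch of normalization follows, under the assumption that the ontology offers a single property `flowsThrough(river, city)`; the table `NORMALIZE` is hypothetical and records, for each surface predicate, whether the arguments must be swapped.

```python
# Assumption: flowsThrough(river, city) is the only property in the ontology.
# Each surface predicate maps to it; the flag says whether to swap arguments.
NORMALIZE = {
    "flowsThrough": ("flowsThrough", False),
    "passes": ("flowsThrough", False),
    "lies_at": ("flowsThrough", True),
    "located_at": ("flowsThrough", True),
}


def normalize(pred, arg1, arg2):
    """Map a surface predicate to the canonical ontology property."""
    canonical, swap = NORMALIZE[pred]
    return (canonical, arg2, arg1) if swap else (canonical, arg1, arg2)


surface = [
    ("flowsThrough", "Liffy", "Dublin"),
    ("lies_at", "Dublin", "Liffy"),
    ("located_at", "Dublin", "Liffy"),
    ("passes", "Liffy", "Dublin"),
]
normalized = {normalize(*fact) for fact in surface}
print(normalized)
```

All four sentences collapse into one fact, which is what makes querying and reasoning over the extracted knowledge feasible.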
Ontologies are crucial for the analysis
of natural language
• Ontologies define and axiomatize a vocabulary.
• Define the meaning of symbols so that we can reason with them (e.g.
marriedTo is symmetric, bound to a certain time interval)
• Define the granularity for WSD and NER and other tasks.
• Normalization
• Help to constrain the task of interpreting language for a specific
purpose, domain, application etc.
Applications at the interface between
language and ontologies
• Information Extraction / Ontology Population
• Ontology-based Question Answering
• Ontology Engineering
• Ontology Verbalization
Scenario
• Assume we have an ontology about artists modeling:
• Name
• Birth and death dates
• Birth and death places
• Marriages, children
• Paintings with their creation date
• Influences by other artists
• Etc.
There are many artists, so it is hard to add all relevant instances manually.
Textual data is massively available, so what about extracting
information from textual data to populate the ontology automatically?
This process has been typically referred to as ontology population
(as opposed to ontology learning which tries to learn the actual
schema)
1. Ontology Population/Information
Extraction
• “Claude Monet was born on 14 November 1840
on the fifth floor of 45 rue Laffitte, in the ninth
arrondissement of Paris.”
-> birthplace(Claude Monet, Paris)
-> birthdate(Claude Monet, 14.11.1840)
• “Monet lived from December 1871 to 1878 at
Argenteuil, a village on the Seine near Paris.”
-> type(stay_Monet_Paris,Stay)
-> artist(stay_Monet_Paris,Claude_Monet)
-> place(stay_Monet_Paris,Argenteuil)
-> during(stay_Monet_Paris,interval_1871_1878)
->
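As a rough illustration of the first extraction above, the following sketch uses a single hand-written pattern. This is an assumption made to keep the example short: the pattern `BORN` and the function `extract_birth` are hypothetical, and a realistic system would need many patterns or a learned model to cover the variation in how births are reported.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hand-written pattern: "<Name> was born on <day month year> ... of <Place>."
BORN = re.compile(
    r"(?P<name>[A-Z]\w+(?: [A-Z]\w+)*) was born on "
    r"(?P<date>\d{1,2} \w+ \d{4}).*?\bof (?P<place>[A-Z]\w+)\."
)


def extract_birth(sentence):
    """Return birthplace and birthdate facts, or an empty list if no match."""
    m = BORN.search(sentence)
    if m is None:
        return []
    name = m.group("name").replace(" ", "_")
    day, month, year = m.group("date").split()
    iso = f"{int(year):04d}-{MONTHS.index(month) + 1:02d}-{int(day):02d}"
    return [("birthplace", name, m.group("place")),
            ("birthdate", name, iso)]


sentence = ("Claude Monet was born on 14 November 1840 on the fifth floor of "
            "45 rue Laffitte, in the ninth arrondissement of Paris.")
print(extract_birth(sentence))
```

Note how the pattern silently ignores the floor and the street, which are constituents that this ontology does not model.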
Challenges for Ontology Population
• Normalization (different variants map to the same ontological
representation)
• Capture different variants (learn from examples using machine
learning techniques, sparseness)
• Ontology-sensitive processing
• Granularity of word senses that need to be distinguished
• Use ontology for disambiguation
• Ignore constituents which are not relevant to the ontology
2. Question Answering
• Ontologies model relevant world knowledge in a certain domain.
• As humans, we are interested in accessing this knowledge, preferably
in an intuitive way (e.g. by means of natural language)
• Many systems have been designed in the past to meet this need:
• Aqualog [Lopez and Motta 2004]
• ORAKEL [Cimiano et al. 2008]
• GINO [Bernstein and Kaufmann 2006]
• And many more…
• E.g. Who is a professor at the Knowledge Media Institute?
• Prof. Enrico Motta
• Prof. John Domingue
• Prof. Stefan Rueger
Ontology-based Question Answering (e.g.
Aqualog [Lopez and Motta 2004])
Who is a Professor at the Knowledge Media Institute?
(Who,is,professor)
(professor,at,Knowledge Media Institute)
RSS (Relation Similarity Service)
<typeOf ?x Professor-in-Academia> & <works-in-unit ?x KMi>
Ontology-based Question Answering
Who is PC member of the ISWC conference?
(Who,is,PC Member)
(PC Member,of,ISWC Conference)
(?x,PCMemberOf,ISWC)
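The triple-based pipeline of the two preceding slides can be sketched as a lookup followed by an intersection. Everything here is a toy assumption: the knowledge base `KB`, the hand-made `LEXICON` (standing in for the similarity-based matching a real system performs), and the individual `some_student` are hypothetical and not taken from AquaLog.

```python
# Hypothetical mini knowledge base, for illustration only.
KB = {
    ("enrico_motta", "typeOf", "Professor-in-Academia"),
    ("enrico_motta", "works-in-unit", "KMi"),
    ("john_domingue", "typeOf", "Professor-in-Academia"),
    ("john_domingue", "works-in-unit", "KMi"),
    ("some_student", "works-in-unit", "KMi"),
}

# Hand-made mapping from (relation, object) of a linguistic triple
# to an ontology (property, value) pattern.
LEXICON = {
    ("is", "professor"): ("typeOf", "Professor-in-Academia"),
    ("at", "knowledge media institute"): ("works-in-unit", "KMi"),
}


def answer(linguistic_triples):
    """Map each linguistic triple to an ontology pattern and intersect matches."""
    candidates = None
    for _, rel, obj in linguistic_triples:
        prop, value = LEXICON[(rel.lower(), obj.lower())]
        matches = {s for s, p, o in KB if p == prop and o == value}
        candidates = matches if candidates is None else candidates & matches
    return sorted(candidates or [])


question = [("Who", "is", "professor"),
            ("professor", "at", "Knowledge Media Institute")]
print(answer(question))
```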
3. Language in Ontology Engineering
Ontology Languages (OWL and RDFS) are hard to grasp, both
semantically and syntactically.
• RDF/XML and OWL/XML syntaxes are hard for humans to read
• OWL Abstract syntax only for experts (logicians)
• Manchester syntax more intuitive, but not for “casual users”
Pizza AND NOT (hasTopping SOME FishTopping) AND
NOT (hasTopping SOME MeatTopping)
The idea has been to allow people to model ontological knowledge using
natural language.
Most approaches along these lines rely on “controlled natural language”.
What is controlled natural language?
• Controlled natural languages (CNLs) are subsets of natural languages,
obtained by restricting the grammar and vocabulary in order to reduce
or eliminate ambiguity and complexity.
• Reducing ambiguity: “Every man loves a woman.”
• Reading 1: Every man loves a woman who is specific to him.
⇒ Use “Every man loves a woman.”
• Reading 2: All men love the same woman.
⇒ Use “There is a woman that every man loves.”
Controlled language is prescriptive in this sense and people have to
learn what expressions to use to express a certain state of affairs.
4. Ontology Verbalization
• Helping in ontology engineering by:
• Verbalizing the ontology
• Allowing users to create axioms in natural language
• Different approaches:
• ACE [Kaljurand et al. 2008]
• Sydney Syntax [Cregan et al. 2007]
Verbalization in ACE and Sydney
Syntax
• Design Choices:
• Bijective Mapping between Controlled Language and Axiomatic
Representation (allows “roundtripping”, see [Davis et al. 2008])
• Functional: one unique way of verbalizing something
• Do not use constructs mirroring the OWL syntax, use “natural
English”
• “car is a subclass of vehicle” => “Every car is a vehicle.”
• “Man and woman are disjoint classes” => “There is no man
that is also a woman.”
• Use variables to express more complex axioms:
• “If X is married to Y then Y is also married to X”.
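The design choices above (one fixed sentence pattern per axiom type, no OWL jargon in the output) can be sketched with a small template function. This is not the actual ACE or Sydney Syntax implementation; the axiom tuples and the functions `article` and `verbalize` are assumptions made for illustration.

```python
def article(noun):
    """Pick the indefinite article by the first letter (rough heuristic)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def verbalize(axiom):
    """Return the single controlled-English sentence for an axiom tuple."""
    kind = axiom[0]
    if kind == "SubClassOf":
        _, sub, sup = axiom
        return f"Every {sub} is {article(sup)} {sup}."
    if kind == "DisjointClasses":
        _, a, b = axiom
        return f"There is no {a} that is also {article(b)} {b}."
    if kind == "SymmetricProperty":
        _, phrase = axiom
        # Put "also" after the copula if there is one, else before the verb.
        if phrase.startswith("is "):
            consequent = phrase.replace("is ", "is also ", 1)
        else:
            consequent = "also " + phrase
        return f"If X {phrase} Y then Y {consequent} X."
    raise ValueError(f"unsupported axiom type: {kind}")


print(verbalize(("SubClassOf", "car", "vehicle")))
print(verbalize(("DisjointClasses", "man", "woman")))
print(verbalize(("SymmetricProperty", "is married to")))
```

Because each axiom type has exactly one template, the mapping is functional, and a parser for the same templates would make it bijective.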
OWL Verbalization
owl:IntersectionOf(
  cat
  owl:ComplementOf(
    owl:SomeValuesFrom(
      like
      owl:IntersectionOf(
        dog
        owl:UnionOf(
          owl:SomeValuesFrom(attack mailman)
          owl:OneOf(Fido))))))
Verbalized as “something that is a cat and that does not like a dog that
attacks a mailman or that is Fido”
Opaqueness
• Make the various SW formalisms opaque to the user (OWL, SWRL,
SPARQL)
• Every employee that does not own a car owns a bike.
• Every man that owns a car likes that car.
• Who owns a car?
employee ⊓ ¬(∃ own.car) ⊑ ∃ own.bike (OWL)
man(?x) ∧ own(?x,?y) ∧ car(?y) → like(?x,?y) (SWRL)
SELECT ?x WHERE { ?x own ?y . ?y rdf:type car } (SPARQL)
Limits of ACE -> OWL
• Mathematical (e.g. transitive properties):
• “If something A is taller than something B and B is taller than
something C then A is taller than C.”
• Quite complex (disjoint union):
• “No male is a female. No female is a male. Every person is a male
or is a female. Everything that is a male or that is a female is a
person.”
• Much easier than Protégé?
5. Text Generation from Ontologies
[Bontcheva 2005]
<rdf:Description rdf:about="http://www.aifb.uni-karlsruhe.de/Personen/viewPersonOWL#instance?id_db=20">
  <rdf:type>
    <owl:Class rdf:about="&swrc;AssistantProfessor"/>
  </rdf:type>
  <swrc:name rdf:datatype="xsd:string">York Sure</swrc:name>
  <swrc:phone rdf:datatype="xsd:string">+49 (0) 721 608 6592</swrc:phone>
  <swrc:fax rdf:datatype="xsd:string">+49 (0) 721 608 6580</swrc:fax>
  <swrc:homepage rdf:datatype="xsd:string">http://www.aifb.uni-karlsruhe.de/WBS/ysu</swrc:homepage>
</rdf:Description>
York Sure has a telephone number +49 (0) 721 608 6592, a fax number +49 (0) 721 608 6580, and a
web page http://www.aifb.uni-karlsruhe.de/WBS/ysu.
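A minimal template-based sketch of how such a sentence could be produced from the property values follows. It is not the generator described in [Bontcheva 2005]; the dictionary `person`, the phrase table `PHRASES`, and the function `describe` are assumptions made for illustration.

```python
# Property values as they appear in the RDF description above.
person = {
    "name": "York Sure",
    "phone": "+49 (0) 721 608 6592",
    "fax": "+49 (0) 721 608 6580",
    "homepage": "http://www.aifb.uni-karlsruhe.de/WBS/ysu",
}

# Hypothetical lexicalization of each property as a noun phrase.
PHRASES = {
    "phone": "a telephone number",
    "fax": "a fax number",
    "homepage": "a web page",
}


def describe(resource):
    """Aggregate all known properties into a single enumerating sentence."""
    parts = [f"{phrase} {resource[prop]}"
             for prop, phrase in PHRASES.items() if prop in resource]
    if len(parts) > 1:
        listing = ", ".join(parts[:-1]) + ", and " + parts[-1]
    else:
        listing = "".join(parts)
    return f"{resource['name']} has {listing}."


print(describe(person))
```

The interesting part is the aggregation: three triples with the same subject are merged into one sentence instead of three.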
Different parts of the puzzle
Ontology-based Question Answering
Ontology Population
Ontology Verbalization
Ontology Generation
Reuse of NLP approaches
• To make the landscape more homogeneous, building common
resources, infrastructures, grammars etc. would be crucial.
• This should be achieved by reusing state-of-the-art and mature
technologies from the NLP community, in particular:
• Dependency parsing
• Compositional semantics
Many state-of-the-art parsers
(dependency parsers)
• On the one hand, there are many state-of-the-art dependency parsers
that we could use:
• Stanford Parser (http://nlp.stanford.edu/software/lex-parser.shtml),
Manning et al.
• RASP (http://www.informatics.susx.ac.uk/research/groups/nlp/rasp/),
Sussex, by T. Briscoe and J. Carroll
• Malt (http://maltparser.org/ ) by Joakim Nivre, Sweden
• In addition, there are large-scale grammars available:
• XLE LFG Grammar Engineering Environment by PARC (used in Powerset and
Bing recently)
• LinGO English Resource Grammar (ERG)
• …
Who is professor at the Knowledge
Media Institute?
Dependency analysis:
attr(is, who)
nsubj(is, professor)
prep_at(professor, Knowledge Media Institute)
det(Knowledge Media Institute, the)
Problems of “shallow” triple-based
approaches
• Triple-based approach too simplistic to account for fine-grained
meaning variations:
Who was a professor at the Knowledge Media Institute?
Who has been PC member of all ISWC conferences?
• The distinctions are often lost when we map into triples and it is
very hard to reconstruct them (there is nothing similar to “was” and
“all” in the ontology), so even if we carry the information on,
similarity-based approaches cannot use it properly!
Problematic examples for triple-based
approaches
Who was PC member of all ISWC conferences?
(Who,was_a,PC Member)
(PC Member,of,ISWC Conference)
(?x,PCMemberOf,ISWC)
?x forall y (y rdf:type Conference; y hasAcronym “ISWC”) ->
x pcMemberOf y
Small but important variations in meaning which escape triple-based approaches
Compositional semantics
• Principle of compositional semantics: The meaning of a question
(sentence) is determined by the meaning of its parts and the way they
are composed together.
Vincent loves Mia. => loves(vincent,mia)
loves: λx λy loves(x,y)
  subj: Vincent => vincent
  obj: Mia => mia
Important: takes into account the contribution of every single word in
terms of the overall meaning of the sentence, guided by the
dependency analysis.
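The composition step can be sketched with curried Python functions standing in for lambda terms. This is a toy under stated assumptions: the `LEXICON`, the dictionary encoding of the dependency tree, and the fixed order "subject first, then object" are all choices made for the example.

```python
# Lexical semantics: each word denotes a constant or a curried function.
LEXICON = {
    "Vincent": "vincent",
    "Mia": "mia",
    "loves": lambda x: lambda y: f"loves({x},{y})",
}


def interpret(head, deps):
    """Apply the head's meaning to its subject, then to its object."""
    meaning = LEXICON[head]
    for role in ("subj", "obj"):
        meaning = meaning(LEXICON[deps[role]])
    return meaning


# Dependency analysis of "Vincent loves Mia."
tree = {"subj": "Vincent", "obj": "Mia"}
print(interpret("loves", tree))
```

Swapping the two dependents in `tree` yields `loves(mia,vincent)`, which shows that the result is driven by the syntactic analysis and not by word order or string matching.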
Who was a PC Member of the ISWC
conference?
[Dependency tree: root “was” with nsubj and attr edges to “who” and “PC member”; prep_of(PC member, conference); det(conference, the); nn(conference, ISWC)]
?x person(x) ∧ ∃y (PCMemberOf(x,y) ∧ Conference(y) ∧ hasAcronym(y,"ISWC"))
Who was a PC Member of all ISWC
conferences?
[Dependency tree: root “was” with nsubj and attr edges to “who” and “PC member”; prep_of(PC member, conferences); det(conferences, all); nn(conferences, ISWC)]
?x person(x) ∧ ∀y ((Conference(y) ∧ hasAcronym(y,"ISWC")) → PCMemberOf(x,y))
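The difference between the existential reading ("the ISWC conference") and the universal reading ("all ISWC conferences") is easy to see when both are evaluated over the same data. The individuals and conference editions below are entirely hypothetical and exist only to make the two queries return different answers.

```python
# Hypothetical data: conference editions with their acronym.
CONFERENCES = {"iswc2007": "ISWC", "iswc2008": "ISWC", "eswc2008": "ESWC"}

# Hypothetical PCMemberOf facts.
PC_MEMBER_OF = {
    ("alice", "iswc2007"), ("alice", "iswc2008"),
    ("bob", "iswc2008"),
    ("carol", "eswc2008"),
}
PERSONS = {"alice", "bob", "carol"}


def iswc_editions():
    """All y with Conference(y) and hasAcronym(y, "ISWC")."""
    return [c for c, acronym in CONFERENCES.items() if acronym == "ISWC"]


def member_of_some():
    """Existential reading: PC member of at least one ISWC edition."""
    return sorted(x for x in PERSONS
                  if any((x, y) in PC_MEMBER_OF for y in iswc_editions()))


def member_of_all():
    """Universal reading: PC member of every ISWC edition."""
    return sorted(x for x in PERSONS
                  if all((x, y) in PC_MEMBER_OF for y in iswc_editions()))


print(member_of_some())
print(member_of_all())
```

A single triple pattern `(?x, PCMemberOf, ISWC)` can only express the first query; the word "all" has to survive the analysis to obtain the second.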
Compositional Semantics Approach
• Elegant and principled approach to compute the meaning of sentences
(w.r.t. the given ontology)
• Together with dependency parsing, it has the potential to provide a
common basis for all those applications mapping language to ontology
and the other way round:
• Ontology Population
• Ontology-based Question Answering
• Generation
• Powerful approach, but not always trivial to implement (research
challenge!)
Language-ontology interfaces need
lexical semantics
• The lexical semantics of nouns, adjectives, verbs etc. has to be
specified with respect to the domain ontology for all applications at the
ontology-language interface.
• The meaning of a question, sentence etc. w.r.t. the ontology can
then be calculated on the basis of the lexical semantics of the single
words according to the principle of compositional semantics.
• Suboptimal solution: every instantiation of an application at the
language-ontology interface (population, verbalization, generation)
builds the mapping to the ontology from scratch.
• Optimal solution: we make the meaning of nouns, adjectives, verbs
etc. explicit and publish them declaratively (as an ontology)
• This is the goal we have pursued when developing the LexInfo model.
Bridging the gap: ontology lexicons
[Figure: Ontology and lexicon]
• “Ontology lexicons” provide information about linguistic realization of concepts,
properties, instances etc. (clearly separating but linking both levels) in a
declarative fashion
• These ontology lexicons are not proprietary to any system and can be reused.
• While the ontology “talks” about concepts, properties, instances and other
axioms, the ontology lexicon “talks” about “lexical elements”, together with
information about part-of-speech, morphological (de-) composition, syntactic
behaviour etc.
Separation between Linguistic and
Ontological Level
• Separation also allows us to develop and maintain the lexicons
independently of the ontology.
• This means that we can perfectly allow different lexica for each
ontology to co-exist (why not?)
• In RDF and SKOS, this is not possible.
• Our solution (sketch):
[Figure: in the ontology lexicon, the lexical entry “river” (POS: noun; lemma: “river”; plural: “rivers”) is linked via refersTo (meta-ontology) to the concept “river” in the ontology]
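The separation between the two levels can be sketched as a small data structure. This is not the LexInfo vocabulary itself: the class `LexicalEntry`, its field names, the concept URI, and the German entry are assumptions chosen to mirror the river example and to show two lexicons pointing at the same concept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    """Linguistic information, kept apart from the ontology itself."""
    lemma: str
    pos: str
    plural: str
    refers_to: str  # URI of the ontology element (the only link between levels)


RIVER = "http://www.example.org#river"  # assumed concept URI

# Two independent lexicons for the same ontology.
english = [LexicalEntry("river", "noun", "rivers", RIVER)]
german = [LexicalEntry("Fluss", "noun", "Flüsse", RIVER)]


def lookup(word, lexicon):
    """Return the URIs of all entries matching the word as lemma or plural."""
    return [e.refers_to for e in lexicon if word in (e.lemma, e.plural)]


print(lookup("rivers", english), lookup("Flüsse", german))
```

Neither lexicon has to be touched when the other changes, and the ontology knows nothing about either.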
LexInfo: LexicalEntry
• Top level distinguishes specifically different parts-of-speech as classes:
X flows through Y (IntransitivePP)
Variants of Expression with LexInfo
flowsThrough(Seine,Paris)
• Paris is located at the Seine.
• The Seine passes Paris.
• The Seine crosses Paris.
• The Seine flows through Paris.
Vision
[Figure: four ontologies, each paired with its own lexicon]
Conclusion
• There is indeed a symbiotic relation between language and ontologies
in which both benefit from each other
• Many applications at the language-ontology interface do not build on a
common approach (allowing reuse of grammars, components etc.)
• Many mature techniques from the computational linguistics/semantics
communities are ready to be used.
• Important step: principled models for representing the lexical
semantics of words in a way that they can be reused
Thanks for your attention!
Acknowledgements
• Multipla project - DFG grant 38457858
References
1. A. Bernstein, E. Kaufmann (2006), “GINO - A Guided Input Natural Language Ontology Editor”, Proceedings of the
5th International Semantic Web Conference (ISWC 2006).
2. K. Bontcheva (2005), “Generating Tailored Textual Summaries from Ontologies”, Proceedings of the European
Semantic Web Conference (ESWC), pp. 531-545.
3. P. Cimiano, P. Haase, J. Heizmann, M. Mantel, R. Studer (2008), “Towards portable natural language
interfaces to knowledge bases: The Case of the ORAKEL system”, Data & Knowledge Engineering (DKE), 65(2), pp.
325-354.
4. A. Cregan, R. Schwitter, T. Meyer (2007), “Sydney OWL Syntax - towards a Controlled Natural Language Syntax for OWL
1.1”, Proceedings of the Fourth OWLED Workshop on OWL: Experiences and Directions.
5. B. Davis, A. Ali Iqbal, A. Funk, V. Tablan, K. Bontcheva, H. Cunningham, S. Handschuh (2008), “RoundTrip Ontology
Authoring”, Proceedings of the International Semantic Web Conference, pp. 50-65.
6. S. Harnad (1990), “The Symbol Grounding Problem”, Physica D 42, pp. 335-346.
7. K. Kaljurand (2008), “ACE View - an ontology and rule editor based on Attempto Controlled English”, Proceedings of the
Fifth OWLED Workshop on OWL: Experiences and Directions, collocated with ISWC 2008.
8. V. Lopez, E. Motta (2004), “Ontology Driven question answering in AquaLog”, Proceedings of the 9th International
Conference on Applications of Natural Language to Information Systems (NLDB 2004), Manchester, UK.