TOWARDS ACQUISITION OF AXIOMS FROM TEXT
Ana Rios-Alvarado
Information Technology Laboratory CINVESTAV-Tamaulipas
Advisor: Dr. Ivan Lopez-Arevalo
Digital Enterprise Research Institute – NUI Galway
Co-Advisor: Dr. Paul Buitelaar
PhD Experiments Day – October 3rd, 2011
OUTLINE
Introduction
Research questions
The proposed approach
Experiments in progress
Conclusions
INTRODUCTION
MOTIVATION
Although a huge number of ontology learning tools and frameworks have been developed, the obtained ontologies lack expressiveness
New application scenarios such as e-business, e-science, bio-informatics or genomics require large-scale reasoning
One of the main challenges in question answering and information retrieval is the potential mismatch between the expressions in questions and the expressions in texts
INTRODUCTION
BACKGROUND
Ontology Learning layer cake*:
* P. Cimiano, Ontology Learning and Population from Text: Algorithms, Evaluation and Applications, Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006, p. 23. URL: http://portal.acm.org/citation.cfm?id=1177318
INTRODUCTION
BACKGROUND
Axiom
Assertions (including rules) in a logical form that
together comprise the overall theory that the
ontology describes in its domain of application
| Kind of axioms | Description | Example |
|---|---|---|
| Class expression axioms | Allow relationships to be established between class expressions | subClassOf(CompactCar, Car) |
| Object property axioms | Characterize and establish relationships between object property expressions | subPropertyOf(hasMother, hasParent) |
| Assertions | Axioms about individuals, often called facts | sameAs(Gary Jefferson, G. Jefferson) |
INTRODUCTION
BACKGROUND
OWL-DL provides a formal syntax to represent axioms:
Axiom: OWL-DL syntax example
subClassOf
  <owl:Class rdf:ID="Opera">
    <rdfs:subClassOf rdf:resource="#MusicalWork" />
  </owl:Class>
equivalentClass
  <owl:Class rdf:about="#US_President">
    <owl:equivalentClass rdf:resource="#PrincipalResidentOfWhiteHouse"/>
  </owl:Class>
disjointClass
  <owl:Class rdf:about="#Man">
    <owl:disjointWith rdf:resource="#Woman"/>
  </owl:Class>
sameIndividualAs
  <rdf:Description rdf:about="#William_Jefferson_Clinton">
    <owl:sameAs rdf:resource="#BillClinton"/>
  </rdf:Description>
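As a rough illustration of how a candidate axiom could be rendered in this syntax, the sketch below fills plain string templates modelled on the examples above. The function name `to_owl_xml` and the `TEMPLATES` dictionary are hypothetical and not part of the presented approach. A real system would more likely rely on an OWL library than on string templates.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical templates mirroring the OWL-DL examples above.
# {s} and {o} are replaced by quoted attribute values (subject and object).
TEMPLATES = {
    "subClassOf": (
        "<owl:Class rdf:about={s}>\n"
        "  <rdfs:subClassOf rdf:resource={o}/>\n"
        "</owl:Class>"
    ),
    "equivalentClass": (
        "<owl:Class rdf:about={s}>\n"
        "  <owl:equivalentClass rdf:resource={o}/>\n"
        "</owl:Class>"
    ),
    "disjointClass": (
        "<owl:Class rdf:about={s}>\n"
        "  <owl:disjointWith rdf:resource={o}/>\n"
        "</owl:Class>"
    ),
    "sameAs": (
        "<rdf:Description rdf:about={s}>\n"
        "  <owl:sameAs rdf:resource={o}/>\n"
        "</rdf:Description>"
    ),
}


def to_owl_xml(predicate, subject, obj):
    """Render one binary axiom as an RDF/XML fragment (illustrative only)."""
    template = TEMPLATES[predicate]  # raises KeyError for unsupported axiom types
    # quoteattr escapes special characters and adds the surrounding quotes
    return template.format(s=quoteattr("#" + subject), o=quoteattr("#" + obj))


print(to_owl_xml("subClassOf", "Opera", "MusicalWork"))
```

The fragments still need to be wrapped in an `rdf:RDF` element that declares the `owl`, `rdf` and `rdfs` namespaces before they form a valid document.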
INTRODUCTION
RELATED WORK
Automatic and semi-automatic tools for axiom learning
| Kind of axioms | Related Work | Techniques / Corpus |
|---|---|---|
| Class expression axioms | LeDA [Völker et al., 2007] | Transformation rules / Textual resources & WWW |
| Class expression axioms | LeXO [Völker et al., 2007] | Transformation rules / Definitory contexts |
| Class expression axioms | ReLExO [Völker and Rudolph, 2008] | Transformation rules & relational exploration / Textual resources |
| Object property axioms | [Del Vasto et al., 2010] | Lexical patterns and statistical analysis / WWW |
| Assertions | Relext [Schutz and Buitelaar, 2005] | Linguistic and statistical processing / Textual resources |
| Assertions | YAGO [Suchanek et al., 2008] | Heuristics / Wikipedia infoboxes |
RESEARCH QUESTIONS
Considering that lexical evidence in the text can be exploited to extract relations as axioms:
What are the relevant elements for discovering axioms from textual resources?
What are the patterns for extracting class expression axioms from text?
How can the textual axioms be transformed into a formal syntax?
THE PROPOSED APPROACH
List of candidate axioms:
sameAs (Sydney Museum, Australian Museum)
sameAs (Hewlett-Packard, HP)
subClassOf(Sport_Institution, Recreational_Institution)
subClassOf(Cultural_Institution, Recreational_Institution)
disjointClass(Recreational_Institution, Company)
equivalentClass(Recreational_Institution, Entertainment_Institution)
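Candidate axioms written in this `predicate(argument, argument)` form can be held as structured records before validation. The following is a minimal sketch under that assumption. The `Axiom` type, the `parse_axiom` helper and the regular expression are illustrative names, not part of the presented approach.

```python
import re
from typing import NamedTuple


class Axiom(NamedTuple):
    predicate: str
    subject: str
    obj: str


# predicate, optional space, then two comma-separated arguments in parentheses
AXIOM_RE = re.compile(r"^\s*(\w+)\s*\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)\s*$")


def parse_axiom(text: str) -> Axiom:
    """Parse a string such as 'subClassOf(A, B)' into an Axiom record."""
    match = AXIOM_RE.match(text)
    if match is None:
        raise ValueError(f"not a binary axiom: {text!r}")
    return Axiom(*match.groups())


# A few of the candidate axioms listed above
candidates = [
    "sameAs (Sydney Museum, Australian Museum)",
    "sameAs (Hewlett-Packard, HP)",
    "subClassOf(Sport_Institution, Recreational_Institution)",
    "disjointClass(Recreational_Institution, Company)",
]
axioms = [parse_axiom(c) for c in candidates]
for axiom in axioms:
    print(axiom)
```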
THE PROPOSED APPROACH
METHOD
Corpus
Extract individuals and their classes (PERSON, LOCATION, ORG)
Extract context for each class
Identify lexical / syntactic patterns and definitory contexts
Example: George Bush most often refers to: George H. W. Bush
Organize and validate the obtained relations
EXPERIMENTS IN PROGRESS
Extract individuals from text
Dataset: Lonely Planet corpus (http://olc.ijs.si/lpReadme.html)
1801 files, 96 classes (taxonomy built manually), 1106 named entities, 103 hierarchical relations
NLP Tool: Named Entity Recognition Tool
Performance measures: Precision and Recall
Web Service + Linked data
| NER tool | Number of NEs (correct) | Number of classes | Precision | Recall |
|---|---|---|---|---|
| OpenNLP | 865 (218) | 7 | 0.2520 | 0.1971 |
| StanfordNER | 9922 (721) | 7 | 0.07266 | 0.6518 |
| PythonNLTK | 12576 (872) | 9 | 0.06933 | 0.7884 |
| AlchemyAPI | 6928 (577) | 27 | 0.08328 | 0.5216 |
| OpenCalais | 6430 (474) | 33 | 0.07371 | 0.4285 |
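The reported values appear consistent with precision = correct / extracted and recall = correct / 1106, where 1106 is the number of named entities in the manually built taxonomy. The figures look truncated rather than rounded. The sketch below recomputes them under that assumption. The function and variable names are illustrative.

```python
GOLD_NAMED_ENTITIES = 1106  # named entities in the Lonely Planet gold standard

# tool -> (entities extracted, entities judged correct), taken from the table
RESULTS = {
    "OpenNLP": (865, 218),
    "StanfordNER": (9922, 721),
    "PythonNLTK": (12576, 872),
    "AlchemyAPI": (6928, 577),
    "OpenCalais": (6430, 474),
}


def precision_recall(extracted, correct, gold=GOLD_NAMED_ENTITIES):
    """Return (precision, recall) for one NER tool."""
    precision = correct / extracted if extracted else 0.0
    recall = correct / gold if gold else 0.0
    return precision, recall


scores = {tool: precision_recall(*counts) for tool, counts in RESULTS.items()}
for tool, (p, r) in scores.items():
    print(f"{tool:12s} precision={p:.4f} recall={r:.4f}")
```

Read this way, the table shows a trade-off: the tools that extract the most entities reach the highest recall but the lowest precision.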
EXPERIMENTS IN PROGRESS
Extract context:
If two terms do not co-occur but have very similar contexts, one of the following cases may hold:
The terms are synonyms
The terms are related by the subClass relation, i.e. one term is broader than the other, e.g. (trade, business)
One of the terms is the name of a category and the other is an instance of that category, e.g. (country, Japan)
The two terms are instances of the same category, e.g. (Japan, Italy), (apple, banana)
The terms are related by an association relationship that just needs to be "named"
Sample of 240 sentences in the same class that contain NEs:
Havana (La Habana) is the largest city in the Caribbean…
Lake Malawi (also called Lake Nyasa) forms part of the border with Malawi…
Patterns that correspond to sameAs:
NE (NE)
<NE> ( <NE> )
NE {also|(which|who)} called NE
NE know(s) as NE
NE name NE
<NE> called <NE>
…
Patterns that correspond to subClassOf:
<NP> such as {NP,} {(or |and)} NP
NP {, NP} {,} or other NP
NP {,} including {NP,} {or|and} NP
NP {,} especially {NP,} {or|and} NP
Next experiments: find patterns to identify equivalentClass and disjointClass, and take into account other types of data sources
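The sketch below gives a rough idea of how two of these patterns might be applied with regular expressions. It is an assumption-laden simplification: the slides rely on an NER tool and a POS tagger or parser, whereas here a named entity is approximated by a run of capitalised words and a noun phrase by a single word. The names `SAME_AS_RE`, `SUCH_AS_RE` and `extract_axioms` are hypothetical, and the "such as" sentence in the usage example is made up for illustration.

```python
import re

# Crude stand-in for a named entity: one or more capitalised words
NE = r"[A-Z][\w.-]*(?: [A-Z][\w.-]*)*"

# Covers "NE (NE)" and "NE (also called NE)"
SAME_AS_RE = re.compile(rf"({NE}) \((?:also called )?({NE})\)")

# Covers "NP such as {NP,} {(or|and)} NP", with NP approximated by one word
SUCH_AS_RE = re.compile(r"(\w+) such as ((?:\w+, )*\w+(?:,? (?:or|and) \w+)?)")


def extract_axioms(sentence):
    """Return candidate (predicate, subject, object) triples for one sentence."""
    axioms = []
    for left, right in SAME_AS_RE.findall(sentence):
        axioms.append(("sameAs", left, right))
    for hypernym, tail in SUCH_AS_RE.findall(sentence):
        # split the enumeration on ", " and on "or" / "and"
        for hyponym in re.split(r",? (?:or|and) |, ", tail):
            axioms.append(("subClassOf", hyponym, hypernym))
    return axioms


sentences = [
    "Havana (La Habana) is the largest city in the Caribbean",
    "Lake Malawi (also called Lake Nyasa) forms part of the border",
    "institutions such as museums, theatres and galleries",  # made-up example
]
for sentence in sentences:
    print(extract_axioms(sentence))
```

Matches of this kind are only candidates. For instance, a parenthesis after a name may hold a date or an abbreviation rather than an alternative name, which is why the approach includes a validation step.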
EXPERIMENTS IN PROGRESS
Extract patterns:
Dataset: Lonely Planet corpus (http://olc.ijs.si/lpReadme.html)
1801 files, 96 classes (taxonomy built manually), 1106 named entities, 103 hierarchical relations
NLP Tool: POS tagger, Syntax Parser
Evaluation:
Manual: domain expert
Semi-automatic: extract facts from Wikipedia/WordNet
Performance measures: Precision, Recall, Discounted cumulative gain (uses a relevance scale)
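Discounted cumulative gain rewards rankings that place highly relevant items near the top. The slides do not state which variant or relevance scale will be used, so the sketch below assumes the common form DCG = sum of rel_i / log2(i + 1) over ranks i starting at 1, and a hypothetical 0 to 3 scale assigned by a domain expert.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), ranks from 1."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


# Hypothetical expert judgements for a ranked list of extracted axioms
# (0 = wrong, 3 = fully correct); not taken from the experiments.
judged = [3, 2, 0, 1]
ideal = sorted(judged, reverse=True)

print(f"DCG of the ranking: {dcg(judged):.4f}")
print(f"DCG of the ideal ordering: {dcg(ideal):.4f}")
```

Comparing a ranking against its ideal ordering shows how much is lost by placing correct axioms low in the list.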
CONCLUSIONS
Learning axioms from text is a little-explored area
The current experiments give an idea of the candidate patterns for extracting axioms from text
Further experiments will consider different data sets for experiments and evaluation
Thanks a lot for your kind attention!
Questions?
Comments…
Ana Rios-Alvarado