TDG6 Language Resource Ontologies

ISO/TC37/SC4/TDG6
Language Resource Ontologies
2008-09-27, Pisa
HASIDA Koiti
[email protected]
CfSR, AIST, Japan
Ontologization
reformulation in terms of ontology
provide a standard way to convert annotations to labeled directed graphs
DCR, LAF, LMF, FS, MAF, SemAF, SynAF, MLIF, etc.
Cf. LMF and MAF have UML-based schemas.
not XML but RDF as the base description and modeling tool
standard semantic interpretation for RDF
highlight semantics rather than syntax
Purposes of Ontologization
interoperability
among ISO/TC37 standards
with ontologies from elsewhere
with any data containing linguistic content
RDF data are easier to integrate than XML data.
e.g., external annotation of texts in SMIL data without including linguistic description in the SMIL specification
fuller formalization of IS specifications
semantic extension of DCR
Semantic Extension of DCR
sorts of DCs
unary predicate → class
binary relation → property
symmetric binary relation, etc.
types of the domain (1st arg.) and the range (2nd arg.) of binary relations (properties)
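A minimal sketch of this mapping, not a definitive implementation. The data category names (`noun`, `partOfSpeech`, `synonymOf`, `word`, `posValue`) and the helper `dc_to_triples` are hypothetical and not taken from the DCR; triples are plain Python tuples rather than objects of an RDF library.

```python
# Hypothetical data categories (DCs); names are illustrative, not from the DCR.
RDF_TYPE = "rdf:type"

def dc_to_triples(name, sort, domain=None, range_=None):
    """Map a DC to RDF triples according to its sort."""
    if sort == "unary":
        # unary predicate -> class
        return [(name, RDF_TYPE, "rdfs:Class")]
    if sort not in ("binary", "symmetric"):
        raise ValueError(f"unknown sort: {sort}")
    # binary relation -> property
    triples = [(name, RDF_TYPE, "rdf:Property")]
    if sort == "symmetric":
        triples.append((name, RDF_TYPE, "owl:SymmetricProperty"))
    # types of the 1st argument (domain) and 2nd argument (range)
    if domain is not None:
        triples.append((name, "rdfs:domain", domain))
    if range_ is not None:
        triples.append((name, "rdfs:range", range_))
    return triples

noun = dc_to_triples("noun", "unary")
pos = dc_to_triples("partOfSpeech", "binary", domain="word", range_="posValue")
synonym = dc_to_triples("synonymOf", "symmetric", domain="word", range_="word")
```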
XML Mess
Semantic interpretation of XML is not standardized but defined ad hoc.
Many inconsistent 'standards' on overlapping issues.
Huge standards containing many different manners of semantic interpretation.
e.g., MPEG-7: more than 2000 pages
RDF
Resource Description Framework
labeled directed graph
W3C recommendation
http://www.w3.org/RDF/
Schemas are provided by RDFS, OWL, etc.
textual representation
XML, N3, etc.
RDF Graph
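(figure: labeled directed graph; each edge given as subject, property, object)
http://www.example.org/people#fred  m:attending  http://meetings.example.com/cal#m1
http://meetings.example.com/cal#m1  m:homePage  http://meetings.example.com/m1/hp
http://www.example.org/people#fred  m:hasEmail  mailto:[email protected]
http://www.example.org/people#fred  m:givenName  "Fred"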
Conversion of XML to RDF
anyURI- and IDREF(S)-type attribute → object property (link)
other attribute → datatype property
embedded element → object/datatype property
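The three rules can be sketched as follows. This is only an illustration under stated assumptions: the XML fragment, its element and attribute names, the `LINK_ATTRS` set and the function `xml_to_triples` are all hypothetical, and since no schema is consulted, the link-typed (anyURI/IDREF) attributes are listed by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation fragment; element and attribute names are illustrative.
XML = '<token id="t1" pos="noun" ref="#t0"><lemma>pomme</lemma></token>'

# Assumption: without a schema, link-typed (anyURI/IDREF) attributes are listed by hand.
LINK_ATTRS = {"ref"}

def xml_to_triples(elem, counter=None):
    """Return (node_id, triples) for an element, following the three rules."""
    counter = counter if counter is not None else [0]
    node = elem.get("id")
    if node is None:
        counter[0] += 1
        node = f"_:b{counter[0]}"  # blank node for elements without an id
    triples = []
    for name, value in elem.attrib.items():
        if name == "id":
            continue
        if name in LINK_ATTRS:
            # link-typed attribute -> object property
            triples.append((node, name, ("node", value.lstrip("#"))))
        else:
            # other attribute -> datatype property
            triples.append((node, name, ("literal", value)))
    for child in elem:
        if len(child) == 0 and not child.attrib:
            # text-only embedded element -> datatype property
            text = (child.text or "").strip()
            triples.append((node, child.tag, ("literal", text)))
        else:
            # structured embedded element -> object property
            child_node, child_triples = xml_to_triples(child, counter)
            triples.append((node, child.tag, ("node", child_node)))
            triples.extend(child_triples)
    return node, triples

root, triples = xml_to_triples(ET.fromstring(XML))
```

Whether an embedded element becomes an object or a datatype property is decided here by a simple heuristic (text-only elements become literals); a real conversion would presumably take this from the schema.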
24610: Feature Structure
typed feature structure as in HPSG, etc.
ISO 24610-1: Feature Structure Representation
ISO 24610-2: Feature System Declaration
labeled directed graph
AVM (attribute-value matrix)
textual encoding by XML
FS Graph = RDF Graph
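(figure: the feature structure for 'la pomme' as a labeled directed graph; node names root, s, h, a are added here for readability)
root --SPECIFIER--> s
root --HEAD--> h
s --POS--> determiner
s --ORTH--> la
s --AGR--> a
h --POS--> noun
h --ORTH--> pomme
h --AGR--> a
a --NUMBER--> singular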
FS in AVM
[ SPECIFIER [ POS   determiner
              ORTH  'la'
              AGR   [1] [ NUMBER singular ] ]
  HEAD      [ POS   noun
              ORTH  'pomme'
              AGR   [1] ] ]
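As a rough illustration of why an FS graph can be read as an RDF graph, the AVM above can be written as a set of triples in which the reentrancy tag [1] becomes a single shared node. The node identifiers `n0` to `n3` and the helper `follow` are hypothetical; the feature names and values are those of the AVM.

```python
# Node identifiers n0..n3 are hypothetical; features and values come from the AVM.
triples = {
    ("n0", "SPECIFIER", "n1"),
    ("n0", "HEAD", "n2"),
    ("n1", "POS", "determiner"),
    ("n1", "ORTH", "la"),
    ("n1", "AGR", "n3"),
    ("n2", "POS", "noun"),
    ("n2", "ORTH", "pomme"),
    ("n2", "AGR", "n3"),  # same node as the specifier's AGR: tag [1]
    ("n3", "NUMBER", "singular"),
}

def follow(node, path):
    """Follow a feature path from a node; return the value, or None if undefined."""
    for feature in path:
        values = [o for (s, p, o) in triples if s == node and p == feature]
        if not values:
            return None
        node = values[0]  # features are functional: at most one value per node
    return node

spec_agr = follow("n0", ["SPECIFIER", "AGR"])
head_agr = follow("n0", ["HEAD", "AGR"])
number = follow("n0", ["HEAD", "AGR", "NUMBER"])
```

Both paths end at the same node, which is what the shared tag [1] expresses in the AVM.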
Ontologies Subsume Feature Systems
Features are partial functions, whereas RDF properties are relations in general (possibly partial functions).
Usual feature systems have no taxonomy of features, whereas usual ontologies have taxonomies of properties (e.g., due to rdfs:subPropertyOf).
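A small sketch of both points, assuming triples are plain tuples. The property names (`orth`, `synonym`, `relatedTo`), the data and both helper functions are hypothetical; the property taxonomy is simplified to at most one superproperty per property.

```python
from collections import defaultdict

def is_functional(triples, prop):
    """True if no subject has two distinct values for prop (a partial function)."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            values[s].add(o)
    return all(len(v) <= 1 for v in values.values())

def infer_superproperties(triples, sub_property_of):
    """rdfs:subPropertyOf entailment: (s p o) and p subPropertyOf q imply (s q o)."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triple is added (handles chains p < q < r)
        changed = False
        for s, p, o in list(inferred):
            q = sub_property_of.get(p)
            if q is not None and (s, q, o) not in inferred:
                inferred.add((s, q, o))
                changed = True
    return inferred

# Hypothetical data: 'orth' behaves like a feature, 'synonym' is a general relation.
data = {
    ("w1", "orth", "pomme"),
    ("w1", "synonym", "w2"),
    ("w1", "synonym", "w3"),
}
hierarchy = {"synonym": "relatedTo"}  # hypothetical property taxonomy
closed = infer_superproperties(data, hierarchy)
```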
Feature-System Declaration
<fsDecl type="word" baseTypes="sign">
  <fsDescr>The fundamental type for individual words</fsDescr>
  <fDecl name="orth">
    <fDescr>The orthographic representation for this word</fDescr>
    <vRange><string/></vRange>
  </fDecl>
</fsDecl>
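(figure: the same declaration as an RDF graph; each edge given as subject, property, object)
word  rdfs:subClassOf  sign
word  rdfs:comment  "The fundamental type for individual words"
orth  rdf:type  owl:FunctionalProperty
orth  rdfs:domain  word
orth  rdfs:range  string
orth  rdfs:comment  "The orthographic representation for this word"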
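The mapping from the declaration on this slide to RDFS/OWL statements could be sketched as below. The function `fsd_to_triples` is hypothetical and covers only the constructs in this example: it treats every `fDecl` as a functional property and takes just the first child of `vRange` as the range.

```python
import xml.etree.ElementTree as ET

FSD = """
<fsDecl type="word" baseTypes="sign">
  <fsDescr>The fundamental type for individual words</fsDescr>
  <fDecl name="orth">
    <fDescr>The orthographic representation for this word</fDescr>
    <vRange><string/></vRange>
  </fDecl>
</fsDecl>
"""

def fsd_to_triples(xml_text):
    """Convert one fsDecl element into a list of (subject, property, object) triples."""
    decl = ET.fromstring(xml_text.strip())
    cls = decl.get("type")
    triples = []
    # baseTypes -> rdfs:subClassOf
    for base in decl.get("baseTypes", "").split():
        triples.append((cls, "rdfs:subClassOf", base))
    # fsDescr -> rdfs:comment on the class
    descr = decl.findtext("fsDescr")
    if descr:
        triples.append((cls, "rdfs:comment", descr.strip()))
    for f in decl.findall("fDecl"):
        name = f.get("name")
        # a feature is a partial function -> owl:FunctionalProperty
        triples.append((name, "rdf:type", "owl:FunctionalProperty"))
        triples.append((name, "rdfs:domain", cls))
        f_descr = f.findtext("fDescr")
        if f_descr:
            triples.append((name, "rdfs:comment", f_descr.strip()))
        v_range = f.find("vRange")
        if v_range is not None and len(v_range):
            # simplification: only the first child of vRange is used
            triples.append((name, "rdfs:range", v_range[0].tag))
    return triples

triples = fsd_to_triples(FSD)
```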
Constraint (Conditional)
<cond>
  <fs>
    <f name="inv">
      <binary value="true"/>
    </f>
  </fs>
  <then/>
  <fs>
    <f name="aux">
      <binary value="true"/>
    </f>
    <f name="vform">
      <symbol value="fin"/>
    </f>
  </fs>
</cond>
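(figure: graph form of the conditional; X denotes the same node on both sides of the cond link)
antecedent: X --inv--> true
consequent: X --aux--> true, X --vform--> fin
SWRL representation:
inv(?X,true) -> aux(?X,true) & vform(?X,fin)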
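To illustrate what the rule licenses, here is a hand-written forward application of this one rule over a set of triples. It is not a SWRL engine; the function `apply_inv_rule` and the nodes `v1`, `v2` are hypothetical.

```python
def apply_inv_rule(triples):
    """Apply inv(?X, true) -> aux(?X, true) & vform(?X, fin) once to all matching nodes."""
    result = set(triples)
    for s, p, o in triples:
        if p == "inv" and o is True:
            result.add((s, "aux", True))
            result.add((s, "vform", "fin"))
    return result

# Hypothetical facts: v1 is inverted, v2 is not.
facts = {("v1", "inv", True), ("v2", "inv", False)}
closed = apply_inv_rule(facts)
```

Only `v1` satisfies the antecedent, so only `v1` receives the `aux` and `vform` values.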
FS Ontologization (Summary)
RDF ⊃ FS
Use ontologies for feature-system declarations.
Use SWRL to encode constraints.
Defaults lie outside the ontology.
24612: Linguistic Annotation Framework
GrAF in RDF
(figure: RDF graph of a GrAF annotation of "The clock"; possibly stand-off annotation)
node types (rdf:type): TOKEN, NP
The: POS DET, BASE THE
clock: POS NN, BASE CLOCK
NUMBER: SING
SemAF-DActs
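(figure: UML class diagram)
classes: Dialogue, Turn, Utterance, DialogueAct, Agent
roles of Agent: sender (1..1), addressee (1..*), overhearer (0..*)
other multiplicities shown: 1..* (three times), 0..*
relation label: func.dep.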
TODOs (projects in TDG6?)
include ontologies in documents: FSD
just check UML (as long as no property hierarchy is necessary): LMF, MAF
finish ontologization (possibly in UML): SynAF
ontologize from scratch, forgetting XML: DCR, SemAF-Time, SemAF-DActs, MLIF, etc.
Issues
Who should ontologize individual WIs?
ontologize future WIs from the beginning
TDG6 should exemplify how.
Whether and how to make ontologization mandatory?
Where to include ontologies of ongoing WIs?
depending on their stages (WD, CD, ...)
How to keep ontologizing DCs?
replace DC metamodel by ontology?
modify ISOCat?