Generation of OWL Ontologies from Concept Maps in
Shallow Domains
Alfredo Simón1, Luigi Ceccaroni2, and Alejandro Rosete1
1 Technical Institute “José Antonio Echevarría”, La Habana (Cuba)
2 Technical University of Catalonia, Software department, Barcelona (Spain)
[email protected], [email protected], [email protected]
Abstract. A proposal is presented for integration between a graphical model,
such as conceptual maps, and ontologies codified in the OWL language.
Conceptual maps are a flexible form of knowledge representation, very useful
in education-related collaborative environments; OWL is a language of
knowledge representation oriented to semantic analysis and processing carried
out by machines. Integration consists of a set of formal transformations applied
to conceptual maps and the semantic analysis of the relations linking concepts.
The proposed method is based on a concept sense-disambiguation procedure,
also defined by the authors, and on the WordNet lexical database. It applies to
conceptual maps of shallow domains with labels in the Spanish language.
1 Introduction
In knowledge representation oriented to semantic analysis and processing by
machines, a context in which a certain degree of formalization is required, the
development and use of ontologies is increasingly common. However, the processes
for the design and creation of ontologies, the tools available, such as Protégé [11], and
the specification languages are still complex for non-experts in this subject. This
complexity represents a difficulty in environments requiring the collaboration of
humans for the development and processing of ontologies.
All the above suggests the use of a form of representation that can be used
naturally by humans and integrated with ontologies in such a way that the latter can
be obtained automatically. Conceptual maps (CMs) are proposed here as this
human-friendly knowledge-representation system. CMs are a tool especially defined for
application in the learning process; they are easy to create, flexible and intuitive
for people. Taking into account these aspects and CMs’ low level of formalization,
integration between CMs and ontologies is studied, specifically in the case of OWL
(Web ontology language) ontologies.
OWL is a formal markup language to share knowledge on the Internet using
ontologies. The integration between CMs and ontologies, and the OWL code
generation are pursued through the incorporation of more formalization in CMs and
through the semantic analysis of the relations among concepts. The proposed method
is partially based on a procedure of concept disambiguation, previously defined by the
authors, and on WordNet [6]. Taking into account that the knowledge in WordNet is
about general terminology, the method is only applicable to shallow domains.
D. Borrajo, L. Castillo, and J.M. Corchado (Eds.): CAEPIA 2007, LNAI 4788, pp. 259–267, 2007.
© Springer-Verlag Berlin Heidelberg 2007
This paper deals with the generation of ontologies, and the corresponding OWL
code, from CMs. The inverse process of the integration (obtaining CMs from OWL
ontologies) has been studied by the authors before [13] and is, in comparison, a
simpler problem.
1.1 Conceptual Maps
Conceptual maps (CMs) are a type of knowledge representation that emerged within
the pedagogical sciences at the end of the 1970s. They were proposed by Novak, who
defines them as a “technique that simultaneously represents a strategy of learning, a
method to grasp the most significant aspect of a topic and a schematic resource
included in one structure of propositions” [10]. A CM is a kind of semantic network
[15] that is more flexible and oriented to be used and interpreted by humans. In a CM,
propositions are the smallest semantic structure with proper sense.
1.2 Ontologies and Their Languages
In artificial intelligence, ontologies were introduced to share and reuse knowledge.
They provide the reference for the communication languages in distributed
environments (such as multi-agent systems or the semantic Web) and a semantically
formal description for automatic knowledge processing. An ontology can be defined
as a formal and explicit specification of a shared conceptualization, which is readable
by a computer [3]. Ontologies are the basis of semantic processing; they include a
network of concepts, relationships and axioms to represent, organize and understand a
domain of knowledge; and they provide a common reference frame for all
applications in a given environment.
Knowledge is modeled in the ontologies with a logic based on frame
representation systems (FRSs) [9] [16] and several languages have been defined to
implement it, e.g. DAML+OIL [5] and OWL [12]. OWL is the latest, standardized
ontology language and is based on XML, the resource description framework (RDF)
and the resource description framework schema (RDFS). It includes three
specifications, with different expressiveness levels: OWL Lite, OWL DL and OWL
Full [12]. The code obtained by the method proposed here is a reduced set of OWL
Lite (not including cardinality constraints) with additional elements from OWL DL
(such as the union between classes).
2 Integration of Conceptual Maps and Ontologies
Important similarities exist between CMs and ontologies, especially ontologies
coded in RDF, given that the RDF language is formalized through triples (subject,
predicate, object) and CMs use the proposition structure (concept, link-word,
concept). Considering that the OWL language is an extension of RDF, the integration
between CMs and OWL ontologies can be put forward. However, knowledge in OWL
is expressed as classes, subclasses, properties, relations, instances and axioms [12]
while in the CMs this formal and explicit specification does not exist and it has to be
inferred.
In Simón et al. (2006) [13] it was concluded that a direct correspondence between
CMs and OWL ontologies could be established. This comes from the analysis that
FRSs are an extension of semantic networks (SNs) [9] and that there exists a
structural correlation between the two representations: between a frame and a node,
and between slots and relations. This also helps to explain the integration between
CMs and OWL ontologies, given that OWL structure is based on frames and that
CMs are a kind of SN.
Two basic criteria have been followed for the semantic interpretation needed for
the OWL coding of CMs’ knowledge:
1. To increase the formalization levels of the link-words (l-w) in the CM, on the
basis of the experience in SNs. Five categories were defined and combined with the
different syntaxes formulated in the propositions. The l-w es_un (is_a, in English)
and instancia_de (instance_of), frequently used in SNs, have been indirectly
included through their inverses. The l-w shown in Table 1 for the Spanish
language are not the only ones that can be used; they are just a selection for the
demonstration of the suggested procedure. These l-w can be enriched according to
the different contexts in which the method is used.
Table 1. Categories of link-words and their correspondence with the semantic relations in
WordNet

Category | Link-words | Relations in WordNet
Subclassification (CSC) | es_un⁻¹, tiene_por_subclase, tiene_parte_a, tiene_dependencia, incluye, agrupa, se_compone_de, comprende_a, puede_ser | Hypernym/hyponym
Instantiation (CI) | tiene_por_instancia, tiene_instancia_a, instancía_como, tiene_ejemplo, instancia_de⁻¹ | Hypernym/hyponym
Property (CP) | tiene, posee, tiene_propiedad, toma_valor, tiene_valor, se_compone_de | Meronym/holonym
Direct-PropertyValue (CPVD) | Nouns, such as: tipo, pared, rueda | ---
Indirect-PropertyValue (CPVI) | Verbal forms, such as the ones derived from: contener (contenido, contiene), ejercer (ejerce), representar (representa) | ---

2. To analyze the CM as a structured text, assuming that each proposition is a
sentence in natural language. The proposition is the smallest semantic unit of the
CM with its own sense [10]. A concept sense-disambiguation algorithm, described
in Simón et al. (2006) [13], is used to identify the correct sense (in terms of
WordNet’s synsets) of each concept. Once the synsets of a pair of related
concepts have been identified, the semantics of the relation between them is
inferred, independently of the l-w used in the CM.
Hypernym relations represent the inclusion among lexical units, from more general
to more specific (subclassification), while hyponym relations are the opposite. Meronym
relations correspond to “part of” or “is member of” (property), while holonym relations
are the opposite. In WordNet, there exist several kinds of relations [6], but only the
hypernym-hyponym and meronym-holonym ones have been considered here.
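As a rough illustration of how the categories of Table 1 could be consulted, the following Python sketch stores a subset of the link-words in a dictionary and returns every category in which a given l-w appears. The names LINK_WORD_CATEGORIES and categories_of are hypothetical and are not part of the system described here; the inverse link-words and most noun and verbal forms are omitted for brevity.

```python
# Hypothetical lookup built from a subset of the link-words in Table 1.
# CPVD and CPVI are open classes (nouns and verbal forms); only the
# examples given in the table are listed here.
LINK_WORD_CATEGORIES = {
    "CSC": {"tiene_por_subclase", "tiene_parte_a", "tiene_dependencia",
            "incluye", "agrupa", "se_compone_de", "comprende_a", "puede_ser"},
    "CI": {"tiene_por_instancia", "tiene_instancia_a", "tiene_ejemplo"},
    "CP": {"tiene", "posee", "tiene_propiedad", "toma_valor",
           "tiene_valor", "se_compone_de"},
    "CPVD": {"tipo", "pared", "rueda"},
    "CPVI": {"contenido", "contiene", "ejerce", "representa"},
}


def categories_of(link_word):
    """Return every category whose link-word set contains link_word."""
    normalized = link_word.strip().lower().replace(" ", "_")
    return [category
            for category, words in LINK_WORD_CATEGORIES.items()
            if normalized in words]


print(categories_of("agrupa"))
print(categories_of("se_compone_de"))
```

Since se_compone_de is listed under both CSC and CP in Table 1, the lookup returns a list rather than a single category; an empty list signals that the semantics of the relation would have to be inferred from WordNet.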
3 Obtaining OWL Ontologies
To explain the process of obtaining OWL ontologies, the two examples of CMs with
labels in the Spanish language, shown in Fig. 1, are used. This procedure is composed
of five phases.
Fig. 1. Examples of concept maps: (a) representation of vasos sanguíneos (blood vessels) from
the anatomy domain, (b) representation of actividades (activities) from the @LIS TechNET
project [4]
Phase 1. Concept sense disambiguation. The identification of synsets for all concepts
of the CM found in WordNet is carried out, using the disambiguation method
described in Simón et al. (2006) [13]. The synsets and WordNet are used for inferring
the semantics of the relation between two concepts, when the l-w does not appear in
any category. The phase finishes with the creation of the LP list, which includes all
propositions in the CM, with each concept associated to its synset.
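One possible in-memory shape for the LP list is sketched below in Python. It is only an assumption about how propositions and synsets might be stored: the classes Concept and Proposition, the function build_lp, and the synset identifiers are hypothetical, and the dictionary passed as synset_of merely stands in for the output of the disambiguation method of [13].

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    label: str
    synset: Optional[str] = None  # None when the concept is not found in WordNet


@dataclass(frozen=True)
class Proposition:
    c1: Concept
    link_word: str
    c2: Concept


def build_lp(raw_propositions, synset_of):
    """Build LP from (c1, l-w, c2) label triples.

    synset_of maps a concept label to a synset identifier and is a
    stand-in for the result of concept sense disambiguation.
    """
    return [
        Proposition(Concept(a, synset_of.get(a)), lw, Concept(b, synset_of.get(b)))
        for a, lw, b in raw_propositions
    ]


lp = build_lp(
    [("Vasos Sanguíneos", "agrupa", "Arteria"),
     ("Arteria", "tiene_ejemplo", "Aorta")],
    {"Vasos Sanguíneos": "syn:vaso_sanguineo", "Arteria": "syn:arteria"},
)
```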
Phase 2. Initial coding of OWL classes. All concepts are encoded as classes
(owl:Class). Using concepts from Fig. 1 (b) as an example, the coding for concepts
activities, address and name is:
<owl:Class rdf:ID="Actividades" />
<owl:Class rdf:ID="Dirección" />
<owl:Class rdf:ID="Nombre" />
…
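A minimal sketch of this phase, assuming the concepts are available as plain labels, can be written with Python's standard xml.etree.ElementTree module. The function encode_classes is hypothetical, and replacing spaces with underscores in identifiers is an assumption made here so that labels yield valid rdf:ID values.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"


def encode_classes(concepts):
    """Encode every distinct concept label as an owl:Class element."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("owl", OWL)
    root = ET.Element(f"{{{RDF}}}RDF")
    # dict.fromkeys removes duplicates while keeping the original order.
    for label in dict.fromkeys(concepts):
        class_id = label.replace(" ", "_")  # assumption: no spaces in IDs
        ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}ID": class_id})
    return root


root = encode_classes(["Actividades", "Dirección", "Nombre", "Nombre"])
print(ET.tostring(root, encoding="unicode"))
```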
Phase 3. Identification of subclass relations. For each proposition p ∈ LP with syntax
(c1, l-w, c2):
1. If l-w ∈ CSC, c1 is encoded as a class and c2 as a subclass in OWL. Applying this
to the concepts vein and blood vessel of the CM of Fig. 1 (a), the result is:
<owl:Class rdf:ID="Vena">
<rdfs:subClassOf rdf:resource="#Vasos_Sanguíneos" />
...
</owl:Class>
2. If l-w ∉ CSC, WordNet is used for deducing the semantics of the relation. Let s1
and s2 be the synsets of c1 and c2, respectively, and a(si, sj) a path between si and sj:
If ∃ a(s2, s1) formed by hypernymy relations or ∃ a(s1, s2) formed by
hyponymy relations, it can be inferred that c2 is a subclass of c1. Analyzing
the proposition (Vasos Sanguíneos, agrupa, Arteria) in Fig. 1 (a), a hypernym
path from Arteria’s synset to Vasos Sanguíneos’s synset is found.
Therefore “Arteria” (artery) is a subclass of “Vasos Sanguíneos”. The OWL
code generated is equivalent to the one above for vein.
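The two rules of Phase 3 can be sketched in Python as follows. This is not the authors' implementation: the dictionary HYPERNYMS is a toy, hand-written stand-in for WordNet with hypothetical synset identifiers, and the function names are illustrative. Only the hypernym direction is searched, on the assumption that a hyponym path from s1 to s2 exists exactly when a hypernym path from s2 to s1 does.

```python
# Toy stand-in for WordNet: each synset maps to its direct hypernyms.
HYPERNYMS = {
    "arteria": ["vaso_sanguineo"],
    "vena": ["vaso_sanguineo"],
    "vaso_sanguineo": ["conducto"],
}


def has_hypernym_path(start, goal, hypernyms):
    """True if goal can be reached from start through hypernym links only."""
    stack = list(hypernyms.get(start, []))
    seen = set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(hypernyms.get(node, []))
    return False


def is_subclass_proposition(link_word, s1, s2, csc, hypernyms):
    """Phase 3 test for (c1, l-w, c2): should c2 be coded as a subclass of c1?"""
    if link_word in csc:  # rule 1: the l-w belongs to CSC
        return True
    return has_hypernym_path(s2, s1, hypernyms)  # rule 2: path a(s2, s1)


print(is_subclass_proposition("agrupa", "vaso_sanguineo", "arteria",
                              {"agrupa"}, HYPERNYMS))
```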
Phase 4. Identification of instance relations. For each proposition p ∈ LP with syntax
(c1, l-w, c2), if l-w ∈ CI and c2 is a leaf node, it is inferred that c2 is an instance of c1.
Applied to the proposition (Arteria, tiene_ejemplo, Aorta) of Fig. 1 (a), the result is:
<Arteria rdf:ID="Aorta" />
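The instance rule can be illustrated with the short Python sketch below. It assumes that a leaf node is a concept that never appears as the origin of a proposition, which is one reasonable reading of the text; the function names and the reduced proposition list are hypothetical.

```python
def leaf_concepts(propositions):
    """Concepts that appear as a destination but never as an origin."""
    origins = {c1 for c1, _, _ in propositions}
    destinations = {c2 for _, _, c2 in propositions}
    return destinations - origins


def instance_pairs(propositions, ci):
    """Return (instance, class) pairs for propositions (c1, l-w, c2)
    whose l-w is in CI and whose c2 is a leaf node."""
    leaves = leaf_concepts(propositions)
    return [(c2, c1) for c1, lw, c2 in propositions
            if lw in ci and c2 in leaves]


propositions = [
    ("Vasos Sanguíneos", "agrupa", "Arteria"),
    ("Arteria", "tiene_ejemplo", "Aorta"),
]
pairs = instance_pairs(propositions, {"tiene_ejemplo"})
print(pairs)
```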
Phase 5. Identification of property relations. This process is the one with the greatest
uncertainty and complexity within the procedure of OWL encoding, due to the
number of diverse situations to analyze. For each p ∈ LP with syntax (c1, l-w, c2):
1. If l-w ∈ CP, the syntax is assumed to be (class, l-w, property) and it is inferred
that c2 is a property of c1. Applied to the proposition (LugaresDeInterés, tiene,
Nombre) in Fig. 1 (b), the result is:
<owl:ObjectProperty rdf:about="#nombre">
<rdf:type rdf:resource="&owl;FunctionalProperty" />
<rdfs:domain rdf:resource="#LugaresDeInterés" />
<rdfs:range rdf:resource="#Nombre" />
</owl:ObjectProperty>
2. If l-w ∈ CPVD, the syntax is assumed to be (class, property, value) and it is
inferred that l-w is the name of a property of c1, and that c2 is the value of this
property. Applied to the proposition (LugaresDeInterés, cronograma,
Cronograma) in Fig. 1(b), the result is:
<owl:ObjectProperty rdf:about="#cronograma">
<rdf:type rdf:resource="&owl;FunctionalProperty" />
<rdfs:domain rdf:resource="#LugaresDeInterés" />
<rdfs:range rdf:resource="#Cronograma" />
</owl:ObjectProperty>
If l-w ∉ CPVD, the FreeLing tool [1] is used for determining whether it is a noun. If it is,
the course of action is the same as above (l-w ∈ CPVD). If the l-w is shared
among more than one proposition, as in the case of pared (wall) in Fig. 1 (a),
the coding includes the tags <owl:unionOf…> and
<owl:unionOf rdf:parseType="Collection">:
<owl:ObjectProperty rdf:ID="Pared">
<rdfs:domain>
<owl:Class>
<owl:unionOf rdf:parseType="Collection">
<owl:Class rdf:about="#Arteria" />
<owl:Class rdf:about="#Vena" />
</owl:unionOf>
</owl:Class>
</rdfs:domain>
<rdfs:range>
<owl:Class>
<owl:unionOf rdf:parseType="Collection">
<owl:Class rdf:about="#Muscular" />
<owl:Class rdf:about="#Fibrosa" />
</owl:unionOf>
</owl:Class>
</rdfs:range>
</owl:ObjectProperty>
If the proposition containing the l-w is not binary, that is, the same origin concept
is related to more than one destination concept, as in kind of activities (tipo de
actividades) of Fig. 1 (b), it is inferred that the property identified by the l-w can
take values from the various ranges corresponding to the destination concepts,
with the following code:
<owl:Class rdf:ID="Actividades">
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="#tipo" />
<owl:someValuesFrom rdf:resource="#Aventura_Aerea" />
<owl:someValuesFrom rdf:resource="#Aventura_Terrestre" />
<owl:someValuesFrom rdf:resource="#Aventura_Acuática" />
</owl:Restriction>
</rdfs:subClassOf>
</owl:Class>
3. If l-w ∈ CPVI, the syntax is assumed to be (class, indirect property, value), and it
is inferred that c2 is the value of the property of c1 obtained from the l-w. Applied
to the proposition (Vasos Sanguíneos, contiene, Sangre) in Fig. 1 (a), the
result is:
<owl:ObjectProperty rdf:ID="contenido">
<rdfs:domain rdf:resource="#Vasos_Sanguíneos" />
<rdfs:range rdf:resource="#Sangre" />
</owl:ObjectProperty>
4. If l-w ∉ CP ∪ CPVD ∪ CPVI and it is not a noun, WordNet is consulted. Let s1 and
s2 be the synsets of c1 and c2, respectively, and a(si, sj) a path between si and sj:
If ∃ a(s2, s1) formed by holonymy relations or ∃ a(s1, s2) formed by
meronymy relations, it can be inferred that c2 is a property of c1 whose name
is l-w. The OWL code generated is the same as above in point 2 (l-w ∈
CPVD).
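The four cases of Phase 5 can be summarized as a decision function, sketched below in Python under several assumptions. The function classify_property_relation and its return labels are hypothetical; is_noun stands in for the part-of-speech check performed with FreeLing, and has_part_path stands in for the search of a holonymy or meronymy path in WordNet. Both are passed in as callables so that the sketch does not depend on either tool.

```python
from typing import Callable, Optional, Set


def classify_property_relation(
    link_word: str,
    cp: Set[str],
    cpvd: Set[str],
    cpvi: Set[str],
    is_noun: Callable[[str], bool],
    has_part_path: Callable[[], bool],
) -> Optional[str]:
    """Decide how a proposition (c1, l-w, c2) is coded in Phase 5."""
    if link_word in cp:
        return "property"                 # case 1: c2 is a property of c1
    if link_word in cpvd:
        return "property-value"           # case 2: l-w names the property
    if link_word in cpvi:
        return "indirect-property-value"  # case 3: property derived from l-w
    if is_noun(link_word):
        return "property-value"           # noun outside CPVD, coded as case 2
    if has_part_path():
        return "property-value"           # case 4: inferred from WordNet
    return None                           # no property relation inferred


cp, cpvd, cpvi = {"tiene"}, {"tipo"}, {"contiene"}
print(classify_property_relation("contiene", cp, cpvd, cpvi,
                                 lambda w: False, lambda: False))
```

Category membership is tested before the noun check, which matches the condition of case 4 (the l-w is in no category and is not a noun); a result of None corresponds to the open cases discussed in the conclusions.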
4 Implementation
In the implementation, the coding process begins with a CM expressed in the XML
language, in a format generated by MACOSOFT, a tool for the creation of CMs [14].
After having obtained LP as a result of Phase 1, the process of creation of the OWL-file
starts, with the expression of each concept of the CM as a class (Phase 2). This file is in
turn modified, incorporating more specifications. For example, the specification of
the super-class to which a class belongs is added to the code of that class (Phase 3), and a
concept that is initially coded as a class can become an instance (Phase 4) or a property
(Phase 5). A Spanish version of WordNet, developed by the Natural Language
Processing Group (NLPG), of the Software Department (LSI) of the Technical
University of Catalonia (UPC) has been used to test the system.
5 Related Work
A transformation mechanism from a CM to the OWL language has been included in
Gómez et al. (2004) [7]. The transformation begins with a CM that is coded in XTM,
an extension of XML and the standard specification of the topic maps [2], and, on top
of this codification, a set of rules for obtaining OWL code are applied. In XTM,
concepts and l-w are expressed with the tag topic and the relationships among the
concepts with the tag association, specifying the origin-concept and the
destination-concept of the proposition. For the coding from XTM to OWL, all the topics
associated to concepts are coded as owl:Class, those associated to l-w are coded as
owl:ObjectProperty and the associations are coded as sub-classification relations
(rdfs:subClassOf) between the classes associated to the concepts that intervene in the
association. Contrary to the proposal presented in this paper, that work does not
consider all the semantic interpretations that the relations among the concepts in a
CM can have. For example, not all the associations in XTM (relations in the CM)
indicate a sub-classification relation in OWL, and not all l-w can be interpreted as
properties in OWL. This happens because a direct syntactic correspondence is made
between XTM and OWL, without considering the whole semantics that can be
associated with the knowledge that is being codified. It is not taken into account that
XTM is a language lacking explicit semantics and that this needs to be inferred from
the context in which the content is represented.
Another related work is the one described in Hayes et al. (2004) [8], where an
environment for collaborative development of ontologies based on CMs is presented.
The paper claims the implementation of the transformation from CMs to OWL and
vice versa, although only the second mechanism is fully described; therefore the
authors of this paper do not have enough elements to make a detailed comparison
between this new proposal and that work. However, the syntactic formalizations that
are proposed in it are of interest and should certainly be taken into account in the
construction of CMs.
6 Conclusions and Future Work
In this paper, the following conclusions have been obtained: (1) it has been shown
that a tight relationship exists between conceptual maps and ontologies; (2) the
interpretation of conceptual maps as structured text allows the semantic inference
needed for their coding in OWL, without losing flexibility; (3) the defined procedures
generate OWL ontologies from conceptual maps in shallow knowledge domains. The
proposed integration creates the bases for generalization to other domains and for the
collaborative development of ontologies.
This paper represents an early stage of research, and work is currently being carried
out to solve the cases in which the link words are not included in any
category or the concepts are not found in WordNet, which happens, in general, in very
specific domains. These are the current limitations of the coding procedure presented and
the main reason why this proposal is fundamentally directed to shallow
knowledge domains. As solutions, work is being done on a machine-learning mechanism
for enriching the repository of link words in all categories, and on the
integration and use of other knowledge bases and ontologies.
References
[1] Atserias, J., Casas, B., Comelles, E., González, M., Padró, L., Padró, M.: FreeLing 1.3:
Syntactic and semantic services in an open-source NLP library. In: 5th International
Conference on Language Resources and Evaluation, ELRA, Genoa, Italy (2006)
[2] Biezunski, M., Newcomb, S., Bryan, M.: Guide to the topic map standards. ISO/IEC
13250 Projects (2002)
[3] Ceccaroni, L.: ONTOWEDSS - An Ontology-Based Environmental Decision-Support
Systems for the Management of Wastewater Treatment Plants. Ph.D. thesis, Technical
University of Catalonia, Barcelona, Spain (2001)
[4] Ceccaroni, L., Willmott, S., Cortés García, U., y Barbera-Medina, W.: @LIS TechNET:
Hacia la enseñanza práctica de las tecnologías de Internet de la próxima generación. In:
5ta Conferencia Internacional de la Educación y la Formación basada en las Tecnologías,
Madrid, Spain, pp. 139–142 (2005)
[5] DARPA.: DAML+OIL ontology Markup Language. Defense Advanced Research
Projects Agency (2001)
[6] Fellbaum, Ch.: WordNet: An Electronic Lexical Database. The MIT Press, University of
Cambridge (1998)
[7] Gómez, H., Díaz, B., González, A.: Two layered approach to knowledge representation
using conceptual maps description logic. In: 1st International Conference on Concept
Mapping, Spain (2004)
[8] Hayes, P., Eskridge, T., Reichherzer, T., Saavedra, R.: A Framework for Constructing
Web Ontologies using concept Maps. In: Proc. DALM Meeting (2004)
[9] Minsky, M.: A Framework for Representing Knowledge. The Psychology of Computer
Vision, pp. 211–277. McGraw-Hill, New York (1975)
[10] Novak, J.D., Gowin, D.B.: Learning how to learn. Cambridge Press, New York (1984)
[11] Noy, N.F., Fergerson, R.W., Musen, M.A.: The knowledge model of protege-2000:
Combining interoperability and flexibility. In: Dieng, R., Corby, O. (eds.) EKAW 2000.
LNCS (LNAI), vol. 1937, Springer, Heidelberg (2000)
[12] Smith, M., Welty, Ch., McGuinness, D.: OWL Web Ontology Language Guide. W3C
(2004)
[13] Simón, A., Ceccaroni, L., Willmott, S., Rosete, A.: Unificación de la representación de
conocimiento en mapas conceptuales y ontologías para dominios poco profundos. XI
Taller Internacional de Software Educativo. Universidad de Chile. Chile, pp. 72–79
(2006)
[14] Simón, A., Estrada, V., Rosete, A., Lara, V.: GECOSOFT: Un Entorno Colaborativo para
la Gestión del Conocimiento con Mapas Conceptuales. In: 2nd International Conference
on Concept Mapping. Costa Rica, vol. 2, pp. 114–118 (2006)
[15] Sowa, J. (ed.): Principles of semantic networks: explorations in the representation of
knowledge. Morgan Kaufmann, San Francisco (1991)
[16] Lassila, O., McGuinness, D.: The Role of Frame-Based Representation on the Semantic
Web (2001)