Generation of OWL Ontologies from Concept Maps in Shallow Domains

Alfredo Simón1, Luigi Ceccaroni2, and Alejandro Rosete1

1 Technical Institute “José Antonio Echevarría”, La Habana (Cuba)
2 Technical University of Catalonia, Software Department, Barcelona (Spain)

Abstract. A proposal is presented for the integration of a graphical model, conceptual maps, with ontologies encoded in the OWL language. Conceptual maps are a flexible form of knowledge representation, very useful in education-related collaborative environments; OWL is a knowledge-representation language oriented to semantic analysis and processing carried out by machines. The integration consists of a set of formal transformations applied to conceptual maps and of the semantic analysis of the relations linking concepts. The proposed method is based on a concept sense-disambiguation procedure, also defined by the authors, and on the WordNet lexical database. It applies to conceptual maps of shallow domains with labels in the Spanish language.

1 Introduction

In knowledge representation oriented to semantic analysis and processing by machines, a context in which a certain degree of formalization is required, the development and use of ontologies is increasingly common. However, the processes for the design and creation of ontologies, the available tools, such as Protégé [11], and the specification languages are still complex for non-experts in the subject. This complexity is an obstacle in environments that require humans to collaborate in the development and processing of ontologies.

All the above suggests the use of a form of representation that can be used naturally by humans and integrated with ontologies in such a way that the latter can be obtained automatically. Conceptual maps (CMs) are proposed here as this human-friendly knowledge-representation system.
CMs are a tool especially defined for application in the learning process; they are easy to create, flexible and intuitive for people. Taking into account these aspects and the low level of formalization of CMs, the integration between CMs and ontologies is studied, specifically in the case of OWL (Web Ontology Language) ontologies. OWL is a formal markup language for sharing knowledge on the Internet using ontologies. The integration between CMs and ontologies, and the generation of OWL code, are pursued by incorporating more formalization into CMs and by semantically analyzing the relations among concepts. The proposed method is partially based on a concept-disambiguation procedure, previously defined by the authors, and on WordNet [6]. Since the knowledge in WordNet covers general terminology, the method is only applicable to shallow domains.

D. Borrajo, L. Castillo, and J.M. Corchado (Eds.): CAEPIA 2007, LNAI 4788, pp. 259–267, 2007. © Springer-Verlag Berlin Heidelberg 2007

This paper deals with the generation of ontologies, and the corresponding OWL code, from CMs. The inverse process (obtaining CMs from OWL ontologies) has been studied by the authors before [13] and is, in comparison, a simpler problem.

1.1 Conceptual Maps

Conceptual maps (CMs) are a type of knowledge representation that emerged within the pedagogical sciences at the end of the 1970s. They were proposed by Novak, who defines them as a “technique that simultaneously represents a strategy of learning, a method to grasp the most significant aspect of a topic and a schematic resource included in one structure of propositions” [10]. A CM is a kind of semantic network [15] that is more flexible and oriented to being used and interpreted by humans. In a CM, propositions are the smallest semantic structure with its own sense.

1.2 Ontologies and Their Languages

In artificial intelligence, ontologies were introduced to share and reuse knowledge. They provide the reference for the communication languages in distributed environments (such as multi-agent systems or the semantic Web) and a semantically formal description for automatic knowledge processing. An ontology can be defined as a formal and explicit specification of a shared conceptualization, which is readable by a computer [3]. Ontologies are the basis of semantic processing; they include a network of concepts, relationships and axioms to represent, organize and understand a domain of knowledge; and they provide a common reference frame for all applications in a given environment.

Knowledge is modeled in ontologies with a logic based on frame representation systems (FRSs) [9] [16], and several languages have been defined to implement it, e.g. DAML+OIL [5] and OWL [12]. OWL is the latest standardized ontology language and is based on XML, the resource description framework (RDF) and the resource description framework schema (RDFS). It includes three specifications with different levels of expressiveness: OWL Lite, OWL DL and OWL Full [12]. The code obtained by the method proposed here is a reduced set of OWL Lite (not including cardinality constraints) with additional elements from OWL DL (such as the union of classes).

2 Integration of Conceptual Maps and Ontologies

Important similarities exist between CMs and ontologies, especially ontologies coded in RDF, given that the RDF language is formalized through triples (subject, predicate, object) and CMs use the proposition structure (concept, link-word, concept). Considering that the OWL language is an extension of RDF, the integration between CMs and OWL ontologies can be put forward.
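
The structural correspondence between a proposition and an RDF triple can be illustrated with a minimal Python sketch. The Proposition class, the BASE namespace and the rule that replaces spaces with underscores are assumptions made only for this illustration; they are not part of the method described in the paper. The sketch captures the shape of the mapping, not the semantics of the relation.

```python
from dataclasses import dataclass

# Hypothetical namespace, chosen only for this illustration.
BASE = "http://example.org/cm#"


@dataclass(frozen=True)
class Proposition:
    """One concept-map proposition: (concept, link-word, concept)."""
    c1: str
    link_word: str
    c2: str

    def to_triple(self) -> tuple:
        """Read the proposition as an RDF-style (subject, predicate, object) triple."""
        def iri(label: str) -> str:
            # Assumed naming rule: trim the label and replace spaces with underscores.
            return BASE + label.strip().replace(" ", "_")

        return (iri(self.c1), iri(self.link_word), iri(self.c2))


# A proposition taken from Fig. 1 (a).
p = Proposition("Vasos Sanguíneos", "contiene", "Sangre")
print(p.to_triple())
```

Such a triple says nothing about whether the link-word denotes a subclass, an instance or a property relation; that interpretation still has to be inferred.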

However, knowledge in OWL is expressed as classes, subclasses, properties, relations, instances and axioms [12], while in CMs this formal and explicit specification does not exist and has to be inferred.

In Simón et al. (2006) [13] it was concluded that a direct correspondence between CMs and OWL ontologies could be established. This follows from the observation that FRSs are an extension of semantic networks (SNs) [9] and that there is a structural correlation between the two representations: between a frame and a node, and between slots and relations. This also helps to explain the integration between CMs and OWL ontologies, given that the structure of OWL is based on frames and that CMs are a kind of SN.

Two basic criteria have been followed for the semantic interpretation needed for the OWL coding of the knowledge in CMs:

1. To increase the level of formalization of the link-words (l-w) in the CM, on the basis of the experience with SNs. Five categories were defined and combined with the different syntaxes formulated in the propositions. The l-w es_un (is_a, in English) and instancia_de (instance_of), frequently used in SNs, have been indirectly included through their inverses. The l-w shown in Table 1 for the Spanish language are not the only ones that can be used; they are just a selection for the demonstration of the suggested procedure. These l-w can be enriched according to the different contexts in which the method is used.

Table 1.
Categories of link-words and their correspondence with the semantic relations in WordNet

Category | Link-words | Relations in WordNet
Subclassification (CSC) | es_un⁻¹, tiene_por_subclase, tiene_parte_a, tiene_dependencia, incluye, agrupa, se_compone_de, comprende_a, puede_ser | Hypernym/hyponym
Instantiation (CI) | tiene_por_instancia, tiene_instancia_a, instancía_como, tiene_ejemplo, instancia_de⁻¹ | Hypernym/hyponym
Property (CP) | tiene, posee, tiene_propiedad, toma_valor, tiene_valor, se_compone_de | Meronym/holonym
Direct-PropertyValue (CPVD) | Nouns, such as: tipo, pared, rueda | (none)
Indirect-PropertyValue (CPVI) | Verbal forms, such as the ones derived from: contener (contenido, contiene), ejercer (ejerce), representar (representa) | (none)

2. To analyze the CM as a structured text, assuming that each proposition is a sentence in natural language. The proposition is the smallest semantic unit of the CM with its own sense [10]. A concept sense-disambiguation algorithm, described in Simón et al. (2006) [13], is used to identify the correct sense (in terms of WordNet synsets) of each concept. Once the synsets of a pair of related concepts have been identified, the semantics of the relation between them is inferred, independently of the l-w used in the CM.

Hypernym relations represent the inclusion among lexical units, from more general to more specific (subclassification), while hyponym relations are the opposite. Meronym relations correspond to “part of” or “is member of” (property), while holonym relations are the opposite. WordNet contains several kinds of relations [6], but only the hypernym-hyponym and meronym-holonym ones have been considered here.

3 Obtaining OWL Ontologies

To explain the process of obtaining OWL ontologies, the two examples of CMs with labels in the Spanish language shown in Fig. 1 are used. The procedure is composed of five phases.

Fig. 1.
Examples of concept maps: (a) representation of vasos sanguíneos (blood vessels) from the anatomy domain, (b) representation of actividades (activities) from the @LIS TechNET project [4]

Phase 1. Concept sense disambiguation. The synsets of all concepts of the CM that are found in WordNet are identified, using the disambiguation method described in Simón et al. (2006) [13]. The synsets and WordNet are used for inferring the semantics of the relation between two concepts when the l-w does not appear in any category. The phase finishes with the creation of the list LP, which includes all propositions in the CM, with each concept associated with its synset.

Phase 2. Initial coding of OWL classes. All concepts are encoded as classes (owl:Class). Using concepts from Fig. 1 (b) as an example, the coding for the concepts Actividades (activities), Dirección (address) and Nombre (name) is:

    <owl:Class rdf:ID="Actividades" />
    <owl:Class rdf:ID="Dirección" />
    <owl:Class rdf:ID="Nombre" />
    …

Phase 3. Identification of subclass relations. For each proposition p ∈ LP with syntax (c1, l-w, c2):

1. If l-w ∈ CSC, c1 is encoded as a class and c2 as a subclass in OWL. Applying this to the concepts Vena (vein) and Vasos Sanguíneos (blood vessels) of the CM of Fig. 1 (a), the result is:

    <owl:Class rdf:ID="Vena">
      <rdfs:subClassOf rdf:resource="#Vasos Sanguíneos" />
      ...
    </owl:Class>

2. If l-w ∉ CSC, WordNet is used for deducing the semantics of the relation. Let s1 and s2 be the synsets of c1 and c2, respectively, and a(si, sj) a path between si and sj. If ∃ a(s2, s1) formed by hypernymy relations or ∃ a(s1, s2) formed by hyponymy relations, it can be inferred that c2 is a subclass of c1. Analyzing the proposition (Vasos Sanguíneos, agrupa, Arteria) in Fig. 1 (a), a hypernym path from the synset of Arteria to the synset of Vasos Sanguíneos is found. Therefore “Arteria” (artery) is a subclass of “Vasos Sanguíneos”. The generated OWL code is equivalent to the one above for Vena.

Phase 4.
Identification of instance relations. For each proposition p ∈ LP with syntax (c1, l-w, c2), if l-w ∈ CI and c2 is a leaf node, it is inferred that c2 is an instance of c1. Applied to the proposition (Arteria, tiene_ejemplo, Aorta) in Fig. 1 (a), the result is:

    <Arteria rdf:ID="Aorta" />

Phase 5. Identification of property relations. This process is the one with the greatest uncertainty and complexity within the OWL encoding procedure, due to the number of different situations to analyze. For each p ∈ LP with syntax (c1, l-w, c2):

1. If l-w ∈ CP, the syntax is assumed to be (class, l-w, property) and it is inferred that c2 is a property of c1. Applied to the proposition (LugaresDeInterés, tiene, Nombre) in Fig. 1 (b), the result is:

    <owl:ObjectProperty rdf:about="#nombre">
      <rdf:type rdf:resource="&owl;FunctionalProperty" />
      <rdfs:domain rdf:resource="#LugaresDeInterés" />
      <rdfs:range rdf:resource="#Nombre" />
    </owl:ObjectProperty>

2. If l-w ∈ CPVD, the syntax is assumed to be (class, property, value) and it is inferred that l-w is the name of a property of c1, and that c2 is the value of this property. Applied to the proposition (LugaresDeInterés, cronograma, Cronograma) in Fig. 1 (b), the result is:

    <owl:ObjectProperty rdf:about="#cronograma">
      <rdf:type rdf:resource="&owl;FunctionalProperty" />
      <rdfs:domain rdf:resource="#LugaresDeInterés" />
      <rdfs:range rdf:resource="#Cronograma" />
    </owl:ObjectProperty>

If l-w ∉ CPVD, the FreeLing tool [1] is used to determine whether it is a noun. If it is, the course of action is the same as above (l-w ∈ CPVD).

If the l-w is shared among more than one proposition, as in the case of pared (wall) in Fig. 1 (a), the domain and the range are each coded as a union of classes, using the tag <owl:unionOf rdf:parseType="Collection">:

    <owl:ObjectProperty rdf:ID="Pared">
      <rdfs:domain>
        <owl:Class>
          <owl:unionOf rdf:parseType="Collection">
            <owl:Class rdf:about="#Arteria" />
            <owl:Class rdf:about="#Vena" />
          </owl:unionOf>
        </owl:Class>
      </rdfs:domain>
      <rdfs:range>
        <owl:Class>
          <owl:unionOf rdf:parseType="Collection">
            <owl:Class rdf:about="#Muscular" />
            <owl:Class rdf:about="#Fibrosa" />
          </owl:unionOf>
        </owl:Class>
      </rdfs:range>
    </owl:ObjectProperty>

If the proposition containing the l-w is not binary, that is, the same origin concept is related to more than one destination concept, as in tipo de actividades (kind of activities) in Fig. 1 (b), it is inferred that the property identified by the l-w can take values from the various ranges corresponding to the destination concepts, with the following code:

    <owl:Class rdf:ID="Actividades">
      <rdfs:subClassOf>
        <owl:Restriction>
          <owl:onProperty rdf:resource="#tipo" />
          <owl:someValuesFrom rdf:resource="#Aventura_Aerea" />
          <owl:someValuesFrom rdf:resource="#Aventura_Terrestre" />
          <owl:someValuesFrom rdf:resource="#Aventura_Acuática" />
        </owl:Restriction>
      </rdfs:subClassOf>
    </owl:Class>

3. If l-w ∈ CPVI, the syntax is assumed to be (class, indirect property, value), and it is inferred that c2 is the value of the property of c1 obtained from the l-w. Applied to the proposition (Vasos Sanguíneos, contiene, Sangre) in Fig. 1 (a), the result is:

    <owl:ObjectProperty rdf:ID="contenido">
      <rdfs:domain rdf:resource="#Vasos Sanguíneos" />
      <rdfs:range rdf:resource="#Sangre" />
    </owl:ObjectProperty>

4. If l-w ∉ CP ∪ CPVD ∪ CPVI and it is not a noun, WordNet is consulted.
Let s1 and s2 be the synsets of c1 and c2, respectively, and a(si, sj) a path between si and sj. If ∃ a(s2, s1) formed by holonymy relations or ∃ a(s1, s2) formed by meronymy relations, it can be inferred that c2 is a property of c1 whose name is l-w. The generated OWL code is the same as above in point 2 (l-w ∈ CPVD).

4 Implementation

In the implementation, the coding process begins with a CM expressed in XML, in a format generated by MACOSOFT, a tool for the creation of CMs [14]. Once LP has been obtained as the result of Phase 1, the creation of the OWL file starts with the expression of each concept of the CM as a class (Phase 2). This file is then modified by incorporating further specifications. For example, the super-class to which a class belongs is added to the code of that class (Phase 3), and a concept that is initially coded as a class can become an instance (Phase 4) or a property (Phase 5). A Spanish version of WordNet, developed by the Natural Language Processing Group (NLPG) of the Software Department (LSI) of the Technical University of Catalonia (UPC), has been used to test the system.

5 Related Work

A transformation mechanism from a CM to the OWL language has been included in Gómez et al. (2004) [7]. The transformation begins with a CM coded in XTM, an extension of XML and the standard specification of topic maps [2], and a set of rules for obtaining OWL code is applied on top of this codification. In XTM, concepts and l-w are expressed with the tag topic, and the relationships among the concepts with the tag association, which specifies the origin concept and the destination concept of the proposition. For the coding from XTM to OWL, all the topics associated with concepts are coded as owl:Class, those associated with l-w are coded as owl:ObjectProperty, and the associations are coded as sub-classification relations (rdfs:subClassOf) between the classes associated with the concepts that intervene in the association.
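
To make the contrast with such a purely syntactic mapping concrete, the category-based interpretation of Section 3 (Phases 3 to 5) can be sketched in Python as follows. This is only an illustrative sketch and not the authors' implementation: the names Reading and interpret, the abridged category sets and the order in which the categories are checked are assumptions, and the fallbacks to WordNet paths and to the FreeLing noun check are left out.

```python
from typing import NamedTuple, Optional

# Link-word categories, abridged from Table 1 (only a few entries each).
CSC = frozenset({"tiene_por_subclase", "incluye", "agrupa", "comprende_a", "puede_ser"})
CI = frozenset({"tiene_por_instancia", "tiene_ejemplo"})
CP = frozenset({"tiene", "posee", "tiene_propiedad"})
CPVD = frozenset({"tipo", "pared", "rueda"})
CPVI = frozenset({"contiene", "ejerce", "representa"})


class Reading(NamedTuple):
    """Hypothetical record of how one proposition would be encoded in OWL."""
    kind: str                   # "subclass", "instance", "property" or "property_value"
    c1: str                     # origin concept
    c2: str                     # destination concept
    prop: Optional[str] = None  # property name, only for "property_value"


def interpret(c1: str, link_word: str, c2: str,
              c2_is_leaf: bool = False) -> Optional[Reading]:
    """Choose an OWL reading from the link-word category, or None if undecided."""
    if link_word in CSC:                    # Phase 3: c2 is a subclass of c1
        return Reading("subclass", c1, c2)
    if link_word in CI and c2_is_leaf:      # Phase 4: c2 is an instance of c1
        return Reading("instance", c1, c2)
    if link_word in CP:                     # Phase 5, point 1: c2 is a property of c1
        return Reading("property", c1, c2)
    if link_word in CPVD or link_word in CPVI:
        # Phase 5, points 2 and 3: the link-word names the property, c2 is its value.
        # Deriving a noun such as "contenido" from "contiene" is not modeled here.
        return Reading("property_value", c1, c2, prop=link_word)
    return None  # no category matched; the paper falls back to WordNet here


print(interpret("Vasos Sanguíneos", "agrupa", "Arteria"))
print(interpret("Arteria", "tiene_ejemplo", "Aorta", c2_is_leaf=True))
print(interpret("Vasos Sanguíneos", "contiene", "Sangre"))
```

Unlike a rule that turns every association into rdfs:subClassOf, the reading here depends on the category of the link-word, so the same proposition structure can yield a subclass, an instance or a property.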

Contrary to the proposal presented in this paper, the approach of Gómez et al. does not consider all the semantic interpretations that the relations among the concepts in a CM can have. For example, not every association in XTM (relation in the CM) indicates a sub-classification relation in OWL, and not every l-w can be interpreted as a property in OWL. This happens because a direct syntactic entailment is made between XTM and OWL, without considering the whole semantics that can be associated with the knowledge being codified. It is not taken into account that XTM is a language lacking explicit semantics and that these semantics need to be inferred from the context in which the content is represented.

Another related work is the one described in Hayes et al. (2004) [8], where an environment for the collaborative development of ontologies based on CMs is presented. That paper claims the implementation of the transformation from CMs to OWL and vice versa, although only the second mechanism is fully described; therefore the authors of this paper do not have enough elements to make a detailed comparison between this new proposal and that work. However, the syntactic formalizations proposed there are of interest and should certainly be taken into account in the construction of CMs.

6 Conclusions and Future Work

In this paper, the following conclusions have been reached: (1) a tight relationship exists between conceptual maps and ontologies; (2) the interpretation of conceptual maps as structured text allows the semantic inference needed for their coding in OWL, without losing flexibility; (3) the defined procedures generate OWL ontologies from conceptual maps in shallow knowledge domains. The proposed integration creates the basis for generalization to other domains and for the collaborative development of ontologies.

This paper represents an early stage of research. Work is currently being carried out to handle the cases in which the link-words are not included in any category or the concepts are not found in WordNet, which happens, in general, in very specific domains. These are the current limitations of the coding procedure presented and the main reason why this proposal is fundamentally directed to shallow knowledge domains. As solutions, work is being done on a machine-learning mechanism for enriching the repository of link-words in all categories, and on the integration and use of other knowledge bases and ontologies.

References

[1] Atserias, J., Casas, B., Comelles, E., González, M., Padró, L., Padró, M.: FreeLing 1.3: Syntactic and semantic services in an open-source NLP library. In: 5th International Conference on Language Resources and Evaluation, ELRA, Genoa, Italy (2006)
[2] Biezunski, M., Newcomb, S., Bryan, M.: Guide to the topic map standards. ISO/IEC 13250 Projects (2002)
[3] Ceccaroni, L.: ONTOWEDSS - An Ontology-Based Environmental Decision-Support System for the Management of Wastewater Treatment Plants. Ph.D. thesis, Technical University of Catalonia, Barcelona, Spain (2001)
[4] Ceccaroni, L., Willmott, S., Cortés García, U., Barbera-Medina, W.: @LIS TechNET: Hacia la enseñanza práctica de las tecnologías de Internet de la próxima generación. In: 5ta Conferencia Internacional de la Educación y la Formación basada en las Tecnologías, Madrid, Spain, pp. 139–142 (2005)
[5] DARPA: DAML+OIL Ontology Markup Language. Defense Advanced Research Projects Agency (2001)
[6] Fellbaum, Ch.: WordNet: An Electronic Lexical Database. The MIT Press, Cambridge, MA (1998)
[7] Gómez, H., Díaz, B., González, A.: Two layered approach to knowledge representation using conceptual maps description logic.
In: 1st International Conference on Concept Mapping, Spain (2004)
[8] Hayes, P., Eskridge, T., Reichherzer, T., Saavedra, R.: A Framework for Constructing Web Ontologies using Concept Maps. In: Proc. DALM Meeting (2004)
[9] Minsky, M.: A Framework for Representing Knowledge. The Psychology of Computer Vision, pp. 211–277. McGraw-Hill, New York (1975)
[10] Novak, J.D., Gowin, D.B.: Learning how to learn. Cambridge University Press, New York (1984)
[11] Noy, N.F., Fergerson, R.W., Musen, M.A.: The knowledge model of Protégé-2000: Combining interoperability and flexibility. In: Dieng, R., Corby, O. (eds.) EKAW 2000. LNCS (LNAI), vol. 1937, Springer, Heidelberg (2000)
[12] Smith, M., Welty, Ch., McGuinness, D.: OWL Web Ontology Language Guide. W3C (2004)
[13] Simón, A., Ceccaroni, L., Willmott, S., Rosete, A.: Unificación de la representación de conocimiento en mapas conceptuales y ontologías para dominios poco profundos. XI Taller Internacional de Software Educativo. Universidad de Chile. Chile, pp. 72–79 (2006)
[14] Simón, A., Estrada, V., Rosete, A., Lara, V.: GECOSOFT: Un Entorno Colaborativo para la Gestión del Conocimiento con Mapas Conceptuales. In: 2nd International Conference on Concept Mapping. Costa Rica, vol. 2, pp. 114–118 (2006)
[15] Sowa, J. (ed.): Principles of semantic networks: explorations in the representation of knowledge. Morgan Kaufmann, San Francisco (1991)
[16] Lassila, O., McGuinness, D.: The Role of Frame-Based Representation on the Semantic Web (2001)