A Novel Ontology Development Environment for the Life Sciences

Susie Stephens* and Mark Musen†

Oracle, 10 Van de Graaff Drive, Burlington, MA 01803
†Stanford Medical Informatics, Stanford School of Medicine, Stanford University, Stanford, CA 94305-5479

ABSTRACT

Motivation: Ontology development environments need to take advantage of scalable, reliable and secure data repositories. This is becoming increasingly important as ontologies grow larger and the number of simultaneous users increases. This paper describes the merits of integrating Protégé with the RDF Data Model in the Oracle Database.

1 INTRODUCTION

Ontologies provide the common vocabulary for integrating the hundreds of different knowledge bases, metadata formats, and database schemas used in the biomedical domain. An ontological framework enables researchers to access a knowledge base, appraise its content, determine whether resources are relevant, and integrate and aggregate the data with in-house resources and data. Semantic Web technologies such as RDF and OWL are increasingly used to provide this ontological framework, because they offer a means of representing data and metadata about resources and of defining relations between the components of those resources.

2 ARCHITECTURE

2.1 Protégé

Protégé is the most widely used freely available, platform-independent, open-source technology for developing and managing large terminologies, ontologies, and knowledge bases. It has been the primary development environment for several projects in the biomedical domain and is supported by a strong community of developers and users. Examples include Cerner’s Clinical Bioinformatics Ontology, the MGED Ontology, the Foundational Model of Anatomy (Rosse & Mejino 2004), and the identification and verification of errors and inconsistencies in the Gene Ontology (Yeh et al. 2003).
Protégé is written in Java, is extensible, and provides a platform for customized knowledge-based applications (Gennari et al. 2003). It supports the building of Semantic Web applications through its knowledge model, which is based on the Open Knowledge Base Connectivity (OKBC) protocol (Chaudhri et al. 1998). This enables ontology editors to be built for different ontology languages, including RDF and OWL.

2.2 Oracle Spatial RDF Data Model

The Oracle Database is the market-leading relational database management system (RDBMS) in the biomedical domain. Oracle Database 10g Release 2 introduces a new object type, SDO_RDF_TRIPLE_S, for storing RDF and OWL data (Alexander et al. 2004). This object type is built on top of the Oracle Spatial Network Data Model (NDM), Oracle's solution for managing graphs within the RDBMS (Stephens et al. 2004).

Storing RDF data as an object type, rather than in flat relational tables, has many advantages. These include easier modeling and maintenance of RDF applications, simpler integration of RDF data with other enterprise data, reuse of RDF objects, and no need for a mapping between client RDF objects and the database columns and tables that contain the triples.

In the Oracle RDF Data Model, triples are parsed and stored in the database as entries in the NDM node$ and link$ tables. Each node in the RDF model is stored only once and is reused whenever it appears in an incoming triple. In user-defined application tables, the SDO_RDF_TRIPLE_S object stores only references to the triples held in the central schema. The RDF Data Model also simplifies reification: an Oracle XML DB DBUri references the reified triple directly in the database, so only one additional triple needs to be stored for each reification, whereas the standard RDF reification vocabulary uses four triples (rdf:type rdf:Statement, rdf:subject, rdf:predicate and rdf:object) to describe a reified statement.
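The storage and reification scheme described above can be illustrated with a small in-memory sketch. It is a toy model under stated assumptions, not Oracle's implementation: the classes CentralTripleStore and ApplicationTable, the dictionaries standing in for the node$ and link$ tables, the ex: example data, and the dburi:link/<id> identifier are all hypothetical. For simplicity every term (including predicates) is interned, and we assume the single extra triple used for reification by reference types the referenced statement as rdf:Statement. The sketch shows three ideas from the text: each term is stored once and reused, application tables hold only references (link ids) rather than copies of triples, and reification by reference adds one triple where the standard vocabulary adds four.

```python
from dataclasses import dataclass, field

RDF_TYPE, RDF_STATEMENT = "rdf:type", "rdf:Statement"


@dataclass
class CentralTripleStore:
    """Toy central schema (hypothetical; NOT Oracle's implementation).

    `values` loosely plays the role of node$ (each term stored once) and
    `links` the role of link$ (each distinct triple stored once).
    """
    values: dict = field(default_factory=dict)  # term -> value id
    links: dict = field(default_factory=dict)   # (s_id, p_id, o_id) -> link id

    def intern(self, term: str) -> int:
        # Reuse the id of a term already seen in an earlier triple.
        return self.values.setdefault(term, len(self.values) + 1)

    def add(self, s: str, p: str, o: str) -> int:
        key = (self.intern(s), self.intern(p), self.intern(o))
        # A repeated triple resolves to the link that already exists.
        return self.links.setdefault(key, len(self.links) + 1)

    def reify_standard(self, s: str, p: str, o: str, stmt: str) -> int:
        """Standard RDF reification vocabulary; returns the number of triples added."""
        before = len(self.links)
        self.add(stmt, RDF_TYPE, RDF_STATEMENT)
        self.add(stmt, "rdf:subject", s)
        self.add(stmt, "rdf:predicate", p)
        self.add(stmt, "rdf:object", o)
        return len(self.links) - before

    def reify_by_reference(self, s: str, p: str, o: str) -> int:
        """Reification by reference; returns triples added beyond the base triple."""
        link_id = self.add(s, p, o)  # the reified triple must itself be stored
        before = len(self.links)
        # Hypothetical URI scheme standing in for an XML DB DBUri to the stored link.
        self.add(f"dburi:link/{link_id}", RDF_TYPE, RDF_STATEMENT)
        return len(self.links) - before


@dataclass
class ApplicationTable:
    """User-defined table whose rows hold link ids, not copies of triples."""
    store: CentralTripleStore
    rows: list = field(default_factory=list)

    def insert(self, s: str, p: str, o: str) -> None:
        self.rows.append(self.store.add(s, p, o))


store = CentralTripleStore()
genes, proteins = ApplicationTable(store), ApplicationTable(store)
genes.insert("ex:TP53", "rdf:type", "ex:Gene")
proteins.insert("ex:p53", "ex:encodedBy", "ex:TP53")
genes.insert("ex:TP53", "rdf:type", "ex:Gene")  # duplicate reuses link 1
print(len(store.values), len(store.links), genes.rows)  # 5 2 [1, 1]
print(store.reify_standard("ex:p53", "ex:encodedBy", "ex:TP53", "ex:stmt1"))  # 4
print(store.reify_by_reference("ex:p53", "ex:encodedBy", "ex:TP53"))  # 1
```

The design choice being illustrated is interning: because terms and triples are deduplicated in the central store, an application table grows by one small reference per row, however long the URIs in the triple are.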
2.3 Integration of Protégé with the Oracle Spatial RDF Data Model

In preliminary performance testing, the Oracle RDF Data Model has shown performance comparable to that of a relational-based storage implementation. The main expected benefits of this novel architecture are therefore easier management of RDF applications and a more efficient approach to data reification.

* To whom correspondence should be addressed.

Fig. 1. BioPAX Ontology in Protégé

ACKNOWLEDGEMENTS

We would like to acknowledge the Oracle Spatial Development group for the implementation of the RDF Data Model.

REFERENCES

Alexander, N., Lopez, X., Ravada, S., Stephens, S. and Wang, J. (2004) RDF Data Model in Oracle. http://lists.w3.org/Archives/Public/public-swls-ws/2004Sep/att0054/W3C-RDF_Data_Model_in_Oracle.doc

Chaudhri, V., Farquhar, A., Fikes, R., Karp, P. and Rice, J. (1998) OKBC: A programmatic foundation for knowledge base interoperability. In: Fifteenth National Conference on Artificial Intelligence (AAAI-98), 600-607. Madison, Wisconsin: AAAI Press/The MIT Press.

Gennari, J., Musen, M. A., Fergerson, R. W., Grosso, W. E., Crubézy, M., Eriksson, H., Noy, N. F. and Tu, S. W. (2003) The evolution of Protégé: An environment for knowledge-based systems development. International Journal of Human-Computer Studies 58(1).

Rosse, C. and Mejino, J. L. V. (2004) A reference ontology for bioinformatics: The foundational model of anatomy. J. Biomed. Inform.

Stephens, S., Rung, J. and Lopez, X. (2004) Graph Data Representation in Oracle Database 10g: Case Studies in Life Sciences. IEEE Data Engineering Bulletin. http://sites.computer.org/debull/A04dec/stephens.ps

Yeh, I., Karp, P., Noy, N. and Altman, R. (2003) Knowledge acquisition, consistency checking and concurrency control for Gene Ontology (GO). Bioinformatics 19:241-248.