doc - Bio-Ontologies 2017

A Novel Ontology Development Environment for the Life Sciences
Susie Stephens* and Mark Musen*
Oracle, 10 Van de Graaff Drive, Burlington, MA 01803, * Stanford Medical Informatics, Stanford School of Medicine, Stanford University, Stanford, CA 94305-5479.
ABSTRACT
Motivation: Ontology development environments need to take advantage of scalable, reliable and secure data repositories. This is becoming increasingly important as ontologies grow in size and the number of simultaneous users increases. This paper describes the merits of integrating Protégé with the RDF Data Model in the Oracle Database.
1 INTRODUCTION
Ontologies provide a common vocabulary for integrating the hundreds of different knowledge bases, metadata formats, and database schemas used in the biomedical domain. An ontological framework enables researchers to access a knowledge base, appraise its content, determine whether its resources are relevant, and integrate and aggregate its data with in-house resources.
Semantic Web technologies such as RDF and OWL are increasingly used to provide this ontological framework, as they offer a means to represent data and metadata about resources and to define relations between the components of those resources.
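To make the three roles concrete, the sketch below models RDF statements as subject-predicate-object triples and shows one triple of each kind: data, metadata, and a relation between components. It is an illustration only: the `ex:` names are hypothetical, prefixed names stand in for full URIs, and `match` is a hypothetical helper, not part of any RDF library; `rdf:type`, `dc:creator` and `rdfs:subClassOf` are standard vocabulary terms.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# The ex: names are hypothetical; rdf:type, dc:creator and rdfs:subClassOf are standard terms.
graph = {
    Triple("ex:TP53", "rdf:type", "ex:Gene"),                     # data about a resource
    Triple("ex:dataset42", "dc:creator", "ex:LabA"),              # metadata about a resource
    Triple("ex:Gene", "rdfs:subClassOf", "ex:BiologicalEntity"),  # relation between components
}


def match(graph, s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    )


print(match(graph, p="rdf:type"))
# [Triple(subject='ex:TP53', predicate='rdf:type', obj='ex:Gene')]
```

Because every kind of statement shares the same three-part shape, a single pattern-matching query works across data, metadata and schema relations alike.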
2 ARCHITECTURE
2.1 Protégé
Protégé is the most widely used freely available, platform-independent, open-source technology for managing and developing large terminologies, ontologies, and knowledge bases. It has served as the primary development environment for several projects in the biomedical domain and is supported by a strong community of developers and users. Examples include Cerner’s Clinical Bioinformatics Ontology, the MGED Ontology, the Foundational Model of Anatomy (Rosse & Mejino 2004), and the verification and identification of errors and inconsistencies in the Gene Ontology (Yeh et al. 2003).
Protégé is written in Java, is extensible, and provides a platform for customized knowledge-based applications (Gennari et al. 2003). Protégé supports building Semantic Web applications through its knowledge model, which is based on the Open Knowledge Base Connectivity (OKBC) protocol (Chaudhri et al. 1998). This enables ontology editors to be built for different ontology languages, including RDF and OWL.
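The point of a language-neutral, frame-based knowledge model is that the same classes and slots can be rendered into more than one ontology language. The sketch below shows one possible rendering of a simplified frame into RDF Schema triples. It is an assumption-laden illustration, not Protégé's own code: `Frame` and `frame_to_rdfs` are hypothetical, the frame is reduced to a name, a superclass and slot names, and the `ex:` names are made up; only `rdf:type`, `rdfs:Class`, `rdfs:subClassOf`, `rdf:Property` and `rdfs:domain` are standard RDF Schema terms.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Frame:
    """Hypothetical, simplified frame-based class: name, superclass, slot names."""
    name: str
    superclass: Optional[str] = None
    slots: List[str] = field(default_factory=list)


def frame_to_rdfs(frame: Frame) -> List[Tuple[str, str, str]]:
    """Render one frame as RDF Schema triples (one possible mapping, not Protégé's own)."""
    triples = [(frame.name, "rdf:type", "rdfs:Class")]
    if frame.superclass:
        triples.append((frame.name, "rdfs:subClassOf", frame.superclass))
    for slot in frame.slots:
        # Each slot becomes an RDF property whose domain is the owning class.
        triples.append((slot, "rdf:type", "rdf:Property"))
        triples.append((slot, "rdfs:domain", frame.name))
    return triples


protein = Frame("ex:Protein", superclass="ex:PhysicalEntity", slots=["ex:sequence"])
for t in frame_to_rdfs(protein):
    print(t)
```

A second renderer targeting OWL could consume the same `Frame` objects, which is the kind of separation a shared knowledge model makes possible.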
2.2 Oracle Spatial RDF Data Model
The Oracle Database is the market-leading relational database management system (RDBMS) in the biomedical domain. Oracle Database 10g Release 2 introduces a new object type, SDO_RDF_TRIPLE_S, for storing RDF and OWL data (Alexander et al. 2004). This object type is built on top of the Oracle Spatial Network Data Model (NDM), which is Oracle's solution for managing graphs within the RDBMS (Stephens et al. 2004).
There are many advantages to storing RDF data as an object type rather than in flat relational tables. These include easier modeling and maintenance of RDF applications, simpler integration of RDF data with other enterprise data, reuse of RDF objects, and no need to map between client RDF objects and the database columns and tables that contain the triples.
With the Oracle RDF Data Model, triples are parsed and stored in the database as entries in the NDM node$ and link$ tables. Each node in the RDF model is stored once and reused whenever it appears in an incoming triple. User-defined application tables store only references: the SDO_RDF_TRIPLE_S object points to the triple held in the central schema. The RDF Data Model also simplifies reification by using an Oracle XML DB DBUri to reference the reified triple in the database directly, so that only one additional triple must be stored for each reification.
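The storage scheme above rests on two ideas: nodes are interned once in a central store, and application tables hold only references to centrally stored triples. The toy model below sketches both under simplifying assumptions. Python dictionaries stand in for the node$ and link$ tables, predicates are kept as plain labels on links, and `ToyTripleStore`, `insert`, `resolve` and `app_table` are hypothetical names; this is not Oracle's schema or API.

```python
from typing import Dict, Tuple


class ToyTripleStore:
    """Toy model of a central node$/link$ layout (a simplification, not Oracle's schema)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, int] = {}                   # node value -> node id (like node$)
        self.links: Dict[Tuple[int, str, int], int] = {}  # (subject id, predicate, object id) -> link id (like link$)

    def _node_id(self, value: str) -> int:
        # Each distinct subject or object value is stored once and reused afterwards.
        if value not in self.nodes:
            self.nodes[value] = len(self.nodes) + 1
        return self.nodes[value]

    def insert(self, s: str, p: str, o: str) -> int:
        """Store a triple centrally (once) and return its link id as the reference."""
        key = (self._node_id(s), p, self._node_id(o))
        if key not in self.links:
            self.links[key] = len(self.links) + 1
        return self.links[key]

    def resolve(self, link_id: int) -> Tuple[str, str, str]:
        """Follow a reference back to the (subject, predicate, object) values."""
        values = {nid: val for val, nid in self.nodes.items()}
        for (s_id, p, o_id), lid in self.links.items():
            if lid == link_id:
                return values[s_id], p, values[o_id]
        raise KeyError(link_id)


store = ToyTripleStore()
# Hypothetical user-defined application table: (row id, reference to the central triple).
app_table = [
    (1, store.insert("ex:TP53", "rdf:type", "ex:Gene")),
    (2, store.insert("ex:BRCA1", "rdf:type", "ex:Gene")),
    (3, store.insert("ex:TP53", "rdf:type", "ex:Gene")),  # repeated triple reuses link 1
]
print(len(store.nodes), len(store.links))  # 3 2
print(app_table)                           # [(1, 1), (2, 2), (3, 1)]
print(store.resolve(app_table[1][1]))      # ('ex:BRCA1', 'rdf:type', 'ex:Gene')
```

The node `ex:Gene` is stored once even though three rows mention it, and the application table carries only small integer references rather than copies of the triples.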
2.3 Integration of Protégé with the Oracle Spatial RDF Data Model
In preliminary performance testing, the Oracle RDF Data Model has shown performance comparable to that of a relational storage implementation. The main benefits of this novel architecture are therefore expected to be easier management of RDF applications and a more efficient approach to data reification.
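The reification benefit can be quantified with a simple count. With the standard RDF reification vocabulary, a statement is first described as a resource (the `rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object` quad) before anything can be said about it; referencing the stored triple directly needs only the annotating triple, which matches the one-additional-triple figure given for the RDF Data Model. In this sketch the function names, the `ex:` names and the `db-ref:link/1` identifier are hypothetical; the last merely stands in for a DBUri, whose real syntax is not shown here.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def standard_reification(stmt: str, s: str, p: str, o: str, prop: str, value: str) -> List[Triple]:
    """RDF reification vocabulary: describe the statement as a resource, then annotate it."""
    return [
        (stmt, "rdf:type", "rdf:Statement"),
        (stmt, "rdf:subject", s),
        (stmt, "rdf:predicate", p),
        (stmt, "rdf:object", o),
        (stmt, prop, value),  # the annotation we actually wanted to record
    ]


def reference_reification(triple_uri: str, prop: str, value: str) -> List[Triple]:
    """Reify by pointing straight at the stored triple (hypothetical identifier form)."""
    return [(triple_uri, prop, value)]


std = standard_reification("ex:stmt1", "ex:TP53", "rdf:type", "ex:Gene", "ex:assertedBy", "ex:LabA")
ref = reference_reification("db-ref:link/1", "ex:assertedBy", "ex:LabA")
n = 10_000  # number of reified statements
print(len(std) * n, len(ref) * n)  # 50000 10000
```

Under these assumptions the direct-reference approach stores one fifth as many triples for reification, which is one plausible source of the expected efficiency gain.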
* To whom correspondence should be addressed.
Fig. 1. BioPAX Ontology in Protégé
ACKNOWLEDGEMENTS
We would like to acknowledge the Oracle Spatial Development group for the implementation of the RDF Data Model.
REFERENCES
Alexander, N., Lopez, X., Ravada, S., Stephens, S. and Wang, J. (2004) RDF Data Model in Oracle. http://lists.w3.org/Archives/Public/public-swls-ws/2004Sep/att0054/W3C-RDF_Data_Model_in_Oracle.doc
Chaudhri, V., Farquhar, A., Fikes, R., Karp, P. and Rice, J. (1998) OKBC: A programmatic foundation for knowledge base interoperability. In: Fifteenth National Conference on Artificial Intelligence (AAAI-98), 600-607. Madison, Wisconsin: AAAI Press/The MIT Press.
Gennari, J., Musen, M. A., Fergerson, R. W., Grosso, W. E., Crubezy, M., Eriksson, H., Noy, N. F., and Tu, S. W. (2003) The evolution of Protégé: An environment for knowledge-based systems development. International Journal of Human-Computer Studies 58(1).
Rosse, C., and Mejino, J. L. V. (2004) A reference ontology for bioinformatics: The foundational model of anatomy. J. Biomed. Informat.
Stephens, S., Rung, J. and Lopez, X. (2004) Graph Data Representation in Oracle Database 10g: Case Studies in Life Sciences. IEEE Data Engineering Bulletin. http://sites.computer.org/debull/A04dec/stephens.ps
Yeh, I., Karp, P., Noy, N. and Altman, R. (2003) Knowledge acquisition, consistency checking and concurrency control for Gene Ontology (GO). Bioinformatics 19:241-248.