Entity retrieval in the biomedical ontologies domain

Daniela Oliveira
Insight Centre for Data Analytics, NUI Galway, Ireland

Background

Research in the biomedical domain is generating massive amounts of data featuring thousands of terminologies (concepts and properties) that are described differently even when they overlap. This overlap can stem from different naming conventions, textual descriptions, synonyms, and levels of granularity in cancer terminologies; precisely identifying the ontological term that best describes a given cancer term remains an open research problem.

Ontology search services such as BioPortal and the Ontology Lookup Service (OLS) contain over 500 ontologies covering domains as distinct as zebrafish anatomy and epidemiology. However, they often return large, vague, or loosely related result sets for a given term. This hinders ontology reuse and leads to situations where the same concept has different definitions or labels. Finding the right ontology class for a specific term can be arduous, since few search engines provide keyword-based search over ontologies. For example, a non-expert user who wants to annotate a database with ontologies, to make it interoperable with other resources, needs to find the right ontology to describe its concepts. However, to the best of my knowledge, no system exists that automatically provides such a user with an unequivocal solution.

Most current approaches do not take full advantage of both the complex semantics of ontologies and the possibility of ranking ontologies by relevance to retrieve the best possible results. Therefore, we argue that biomedical resources could benefit from the development of a set of algorithms designed specifically to search biomedical ontologies using keywords, taking advantage of the semantic and structural complexity of ontologies.
This system could then be utilised in the integration of resources not currently annotated with ontologies, to increase interoperability within the biomedical data domain. The purpose of this work is to investigate whether it is possible to create algorithms that search ontologies using keywords and return the top-k concepts that correspond to that search.

Methodology

The ontology search process can be divided into two steps: (1) lexical matching, where the keywords are lexically matched against the labels and synonyms of a collection of ontologies, and (2) ranking the results according to the relevance of the concept and of the ontologies.

For the first step, I am using the BioPortal [1] and Ontology Lookup Service (OLS) [2] search engines to obtain results for each keyword. Delegating this step to existing services lets me focus first on the second step, where I am at an early stage of adapting the Best Match 25 (BM25) [3] algorithm to ontology search. BM25 ranks documents according to their relevance to a given search query. A variation of the BM25 algorithm, BM25F [4], is used for structured documents. In this variation the term frequency is weighted by a boost factor that depends on the field in which the keyword occurs, e.g. a term found in a title is weighted differently from one found in the abstract.

To test the results, I am using a collection of keywords drawn from a set of cancer-related terminologies. These keywords were extracted from cancer repositories such as COSMIC [5].

Future work

BioOpener is a project which aims to find, access and aggregate multiple biomedical repositories in order to integrate different resources. Currently we have 6399 terms extracted from the cancer repositories used in the BioOpener project. A future application of this work would be to find terms in ontologies that match the ones extracted from the repositories. We will need to create an evaluation method to test the algorithms.
We also intend to further explore tools such as BioPortal's Annotator and Recommender to check whether integration is possible. In the future, we will try different information retrieval algorithms and explore the capabilities of indexing applications such as Solr.

Bibliography

1. Noy, N., Shah, N.H., Whetzel, P.L., Dai, B., Dorf, M., Griffith, N., Jonquet, C., Rubin, D.L., Storey, M.-A., Chute, C.G., Musen, M.A.: BioPortal: Ontologies and integrated data resources at the click of a mouse. CEUR Workshop Proc. 833, 292–293 (2011).
2. Jupp, S., Burdett, T., Malone, J., Leroy, C., Pearce, M., McMurry, J., Parkinson, H.: A new ontology lookup service at EMBL-EBI. In: CEUR Workshop Proceedings. pp. 118–119 (2015).
3. Robertson, S., Zaragoza, H.: The Probabilistic Relevance Framework: BM25 and Beyond. Found. Trends Inf. Retr. 3, 333–389 (2009).
4. Pérez-Agüera, J.R., Arroyo, J., Greenberg, J., Iglesias, J.P., Fresno, V.: Using BM25F for semantic search. North. 1–8 (2010).
5. Forbes, S.A., Beare, D., Gunasekaran, P., Leung, K., Bindal, N., Boutselakis, H., Ding, M., Bamford, S., Cole, C., Ward, S., Kok, C.Y., Jia, M., De, T., Teague, J.W., Stratton, M.R., McDermott, U., Campbell, P.J.: COSMIC: Exploring the world's knowledge of somatic mutations in human cancer. Nucleic Acids Res. (2015).
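
Illustrative sketch: BM25F-style ranking of ontology concepts

As an illustration of the field-weighted ranking described under Methodology, the sketch below scores a handful of toy concepts against a keyword query using one common simplified formulation of BM25F. It is not the adaptation under development in this work. The field names ("label", "synonyms"), the boost values, the parameters K1 and B, the smoothed IDF variant, and the "EX:" concept identifiers are all assumptions chosen for the example.

```python
import math
from collections import Counter

# Hypothetical boosts and parameters: none of these values come from the abstract.
FIELD_BOOSTS = {"label": 3.0, "synonyms": 1.5}
K1 = 1.2   # term-frequency saturation
B = 0.75   # strength of field-length normalisation

# Toy concepts with made-up identifiers, standing in for lexical-matching results.
CONCEPTS = [
    {"id": "EX:1", "label": "lung carcinoma", "synonyms": "lung cancer"},
    {"id": "EX:2", "label": "lung", "synonyms": "pulmonary organ"},
    {"id": "EX:3", "label": "breast carcinoma", "synonyms": "breast cancer"},
]


def tokenize(text):
    """Lowercase whitespace tokenisation (a deliberate simplification)."""
    return text.lower().split()


def bm25f_rank(query, concepts, top_k=3):
    """Return up to top_k (concept id, score) pairs, best first."""
    if not concepts:
        return []
    fields = list(FIELD_BOOSTS)
    docs = [{f: tokenize(c.get(f, "")) for f in fields} for c in concepts]
    n_docs = len(docs)
    # Average length of each field, used for length normalisation.
    avg_len = {f: (sum(len(d[f]) for d in docs) / n_docs) or 1.0 for f in fields}
    # Document frequency: number of concepts containing the term in any field.
    df = Counter()
    for d in docs:
        for term in set().union(*(d[f] for f in fields)):
            df[term] += 1

    scored = []
    for concept, d in zip(concepts, docs):
        score = 0.0
        for term in tokenize(query):
            if df[term] == 0:
                continue  # term occurs nowhere in the collection
            # Boosted, length-normalised term frequency summed over fields.
            weighted_tf = 0.0
            for f in fields:
                norm = 1.0 - B + B * len(d[f]) / avg_len[f]
                weighted_tf += FIELD_BOOSTS[f] * d[f].count(term) / norm
            # Smoothed IDF that stays non-negative for frequent terms.
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * weighted_tf / (K1 + weighted_tf)
        scored.append((concept["id"], score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for concept_id, score in bm25f_rank("lung cancer", CONCEPTS):
        print(concept_id, round(score, 3))
```

With these toy values, the concept that matches both keywords (EX:1) ranks first for the query "lung cancer", and a match in the label contributes more to the score than a match in the synonyms because of the higher boost. In practice the boosts would need to be tuned against an evaluation set.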