Increasing Research Use of Biodiversity Collections through Ontology-Based Data Integration Across Taxonomy

Hank Bart
Tulane University Biodiversity Research Institute

Participants
Hank Bart, Tulane U.; Karen Hughes, U. Tennessee; Gabor Racz, U. Nebraska
Jim Beach, U. Kansas; Gordon Jarrell, U. New Mexico; Cori Richards-Zawacki, Tulane U.
Meredith Blackwell, LSU; Mari Kimura, NSF; Nelson Rios, Tulane U.
Tom Bruns, UC Berkeley; Meredith Lane, USGS; Mike Schauff, USDA
Prosanta Chakrabarty, LSU; Paula Mabee, U. South Dakota; Rebecca Shapley, Google
Joe Cook, U. New Mexico; James MacMahon, Utah State; Brian Sidlauskas, Oregon State
Gladys Cotter, USGS; Austin Mast, Florida State; Judy Skog, NSF
Karen Francl, Radford U.; Amanda Neill, BRIT; Carol Spencer, UC Berkeley
Scott Gardner, U. Nebraska; Chris Norris, Yale U.; Nathan Swenson, Michigan State
David Giblin, U. Washington; Larry Page, U. Florida; Carl Taylor, NSF
Stinger Guala, USGS; Monica Papes, U. Wisconsin; Paul Tinerella, INHS
Rob Guralnick, U. Colorado; Alan Prather, Michigan State; Jim Woolley, Texas A&M
Bryan Heidorn, U. Arizona; Kate Rachwal, U. Florida

‘Stove-piped’ Collections and Research Programs
[Diagram: four separate columns (Biodiversity, Anthropology/Archaeology, Earth Sciences, Biomedicine), each pairing a research community with its own collection.]

Taxonomically ‘Stove-piped’ (Siloed) Research Collections
[Diagram: data integration spanning four columns (Vertebrates, Arthropods, Fungi, Vascular Plants), each pairing a research community with its own collection.]

New Research Opportunities Workshop
• Explored new research opportunities that could emerge when data are integrated across research collections representing different groups of organisms.
• Kinds of data integration explored included associations among organisms (symbioses, host-parasite interactions, plant-herbivore interactions) and environmentally or geographically defined communities (e.g., aquatic organisms, desert biota).
• Examined challenges to retrieving data from different kinds of collections, and to enabling new search criteria (e.g., association, habitat, geography).

CollectionsWeb RCN, Workshop III Questions
1. What research questions might we ask of fully integrated collection data?
   What can studies of true bugs (Hemiptera) and their parasitoids (Hymenoptera) tell us about the patterns of distribution/abundance of their plant hosts? (Tri-trophic Interaction TCN)
2. How does the current status of digitization, databasing, and interoperability of collections of different taxa limit our abilities to pose these questions, and what should be done to change this situation? (ADBC)
3. What additional opportunities can we pursue by integrating physical specimens into research involving collections?

CollectionsWeb RCN, Workshop III Questions cont’d
4. What are the general limitations to accessing and integrating data from different taxonomic collections (taxonomic names issues, specimen identification issues, other data quality concerns)?
5. What are the limitations to searching collection databases and portals geographically or by ecological or environmental search terms? What new search fields and search understanding (ontology) must we establish in order to search collections this way?
6. What new research opportunities are associated with integrating data for anatomical/morphological specimens? What new search capabilities, fields, and ontology must we establish to facilitate morphological/anatomical research across collections?

General Issues of Searching Across Taxonomy
• Taxonomic Names and Classification Issues – cause ambiguity in communicating taxon concepts.
• Specimen Identification Issues – introduce error and uncertainty into research outcomes.
• Other Data Quality Issues – errors in time, geography, and other fields in biodiversity databases that limit the usefulness of data.
• Investigator Knowledge – of the content and structure of biodiversity databases.
• Other Across-Taxonomy Issues – promoting collections of associated organisms (host-parasites, symbionts) as exemplars for data integration.

Geographic, Ecological, Environmental Data Integration
• Fitness-for-Use – distinguishing more useful systematically collected (effort-based) datasets from ad hoc samples.
• Geography-Based, across-taxon searches (e.g., user-defined polygons).
• Environment/Habitat-Based Searches – climate, vegetation types, ecoregions.
• Internet-based collaboratoriums – self-organizing groups established via social media.

Anatomical/Morphological Integration
• Suborganismal-Organismal Integration – connecting studied parts of specimens to whole specimens.
• Developmental Morphology – ontogeny, homology, life history.
• Specimen Images – various forms of digital representation of specimens, scales of size and color.
• Integrating Specimens with Images – linking published illustrations, histology, and annotated anatomical parts to whole specimens (Morphbank).

How can across-taxonomy data integration be achieved?

Structured Databases
• Can impose a uniform relational structure on biodiversity databases encompassing all kinds of biodiversity collections and all kinds of data about the samples.
• What might this look like?

Data/Database Standards
• Can adopt a standardized terminology to facilitate the sharing of information across biodiversity databases.
• This has been attempted too…
• The Semantic Web aims to convert the current web of unstructured documents into a web of highly structured documents that machines can read and understand.
• It does this by adding semantic content (meaning) to web pages.

Ontologies
• In Artificial Intelligence, an ontology is a document or file that formally defines the relations among terms in a database.
• Ontologies for the Web have taxonomies that define classes of objects and relations among them, plus a set of inference rules.
• Built from accumulated knowledge (knowledge is inferred in some instances).
• Resolve the problem of databases that use different identifiers to describe the same concept (e.g., zip code and postal code) by providing equivalence relations (synonymy).

Ontology-Based Data Integration
[Diagram: database or database network]
Hybrid Approach: The semantics of each data source are described by its own local ontology, but the source ontologies are integrated through a global shared vocabulary. This offers the greatest flexibility for future expansion.

143 candidate ontologies: http://www.obofoundry.org/

Biodiversity Relevant Ontologies
• Anatomy – Amphibian, bilateral symmetry, C. elegans, Drosophila, Fungi, Hymenoptera, Mosquito, Mouse, Plant, Porifera, Spider, Teleost, Tick, UBERON (across-taxon integrated anatomy ontologies), Vertebrate, Zebrafish Development
• Phenotype – Ascomycete, C. elegans, Mammalia
• Taxonomy – Amphibian, Drosophila, Teleost, Taxonomic Rank
• Population and Community Ecology
• Environment – Plant Environmental Conditions
• Geographical entity

Other Possible Ontologies
• Organismal associations – host-parasite, symbioses, ecological guilds.
• Aquatic organisms – benthos, pelagic (nekton), neuston, etc.
• Trophic roles – trophic levels (producers, consumers, decomposers), functional feeding groups, food-web levels.
• Countless other possibilities.

Ontology initially assembled from 57 anatomical publications across teleost taxonomy.
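The equivalence relations (synonymy) and the hybrid approach described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the shared vocabulary, the field names, and both sample records are hypothetical and do not come from any real collection database or published ontology.

```python
# Hypothetical shared vocabulary: each global concept lists the local
# identifiers (synonyms) that different databases might use for it.
SHARED_VOCABULARY = {
    "postal_code": {"zip code", "zip", "postal code", "postcode"},
    "collection_date": {"date collected", "coll_date", "event date"},
}


def build_synonym_index(vocabulary):
    """Invert the vocabulary so any local term maps to one global concept."""
    index = {}
    for concept, synonyms in vocabulary.items():
        index[concept] = concept
        for term in synonyms:
            index[term.lower()] = concept
    return index


def harmonize(record, index):
    """Rename a record's fields to global concepts; unknown fields are kept as-is."""
    return {index.get(field.lower(), field): value for field, value in record.items()}


SYNONYM_INDEX = build_synonym_index(SHARED_VOCABULARY)

# Two hypothetical records from different collection databases.
fish_record = {"Zip Code": "70118", "Date Collected": "1998-06-14"}
plant_record = {"postal code": "70118", "event date": "2003-04-02"}

fish = harmonize(fish_record, SYNONYM_INDEX)
plant = harmonize(plant_record, SYNONYM_INDEX)
```

After harmonization both records expose the same field names, so a single query on `postal_code` can span both sources. A real ontology would add class hierarchies and inference rules on top of this flat synonym lookup.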
OWL/RDF Format

    <!-- http://purl.obolibrary.org/obo/TAO_0001892 -->
    <owl:Class rdf:about="http://purl.obolibrary.org/obo/TAO_0001892">
        <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">interhyal element</rdfs:label>
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/VSAO_0000139"/>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/OBO_REL#_part_of"/>
                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/TAO_0001402"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Skeletal element that is bilaterally paired and articulates dorsally with the hyosymplectic cartilage or the articulation of the hyomandibula and symplectic.</obo:IAO_0000115>
        <oboInOwl:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">TAO:0001892</oboInOwl:id>
        <oboInOwl:hasOBONamespace rdf:datatype="http://www.w3.org/2001/XMLSchema#string">teleost_anatomy</oboInOwl:hasOBONamespace>
    </owl:Class>
    <owl:Axiom>
        <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Skeletal element that is bilaterally paired and articulates dorsally with the hyosymplectic cartilage or the articulation of the hyomandibula and symplectic.</owl:annotatedTarget>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#string">TAO:wd</oboInOwl:hasDbXref>
        <owl:annotatedProperty rdf:resource="http://purl.obolibrary.org/obo/IAO_0000115"/>
        <owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/TAO_0001892"/>
    </owl:Axiom>

OBO Format

    [Term]
    id: TAO:0001892
    name: interhyal element
    def: "Skeletal element that is bilaterally paired and articulates dorsally with the hyosymplectic cartilage or the articulation of the hyomandibula and symplectic." [TAO:wd]
    is_a: VSAO:0000139 ! endochondral element
    relationship: OBO_REL:part_of TAO:0001402 ! ventral hyoid arch

• Automated tools for switching back and forth from OBO to OWL.
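To show how little structure an OBO stanza needs before it can be queried, here is a minimal Python sketch that reads the interhyal element term shown above and extracts its relations as subject-predicate-object triples. It is a hand-rolled illustration, not one of the automated OBO-to-OWL tools mentioned above and not a complete OBO parser; the function names `parse_term` and `to_triples` are hypothetical.

```python
# The [Term] stanza from the Teleost Anatomy Ontology example above.
OBO_STANZA = '''[Term]
id: TAO:0001892
name: interhyal element
def: "Skeletal element that is bilaterally paired and articulates dorsally with the hyosymplectic cartilage or the articulation of the hyomandibula and symplectic." [TAO:wd]
is_a: VSAO:0000139 ! endochondral element
relationship: OBO_REL:part_of TAO:0001402 ! ventral hyoid arch
'''


def parse_term(stanza):
    """Parse one [Term] stanza into a dict mapping each tag to a list of values."""
    term = {}
    for line in stanza.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the [Term] header
        tag, _, value = line.partition(": ")
        if tag in ("is_a", "relationship"):
            # Drop the trailing "! label" comment that OBO adds for readers.
            value = value.split(" ! ")[0].strip()
        term.setdefault(tag, []).append(value)
    return term


def to_triples(term):
    """Turn is_a and relationship tags into (subject, predicate, object) triples."""
    subject = term["id"][0]
    triples = [(subject, "is_a", parent) for parent in term.get("is_a", [])]
    for rel in term.get("relationship", []):
        predicate, target = rel.split()
        triples.append((subject, predicate, target))
    return triples


interhyal = parse_term(OBO_STANZA)
interhyal_triples = to_triples(interhyal)
```

The two resulting triples carry the same information as the `rdfs:subClassOf` elements in the OWL/RDF listing: the term is a kind of `VSAO:0000139` and is part of `TAO:0001402`.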
40 images of interhyal elements in Morphbank

Environment Ontology
http://environmentontology.org/

POND (OBO Format)

    [Term]
    id: ENVO:00000035
    name: marsh
    def: "A wetland, featuring grasses, rushes, reeds, typhas, sedges, and other herbaceous plants (possibly with low-growing woody plants) in a context of shallow water." [Wikipedia:Marsh]
    synonym: "Marsh" RELATED [NASA:earthrealm]
    synonym: "marsh" RELATED [Geonames:feature]
    synonym: "quagmire" RELATED [USGS:SDTS]
    synonym: "quagmire" RELATED [ADL:FTT]
    synonym: "wetland" BROAD [ADL:FTT]
    synonym: "wetland" BROAD [USGS:SDTS]
    xref: FTT:1118
    xref: FTT:185
    xref: FTT:945
    xref: Geonames:MRSH
    xref: TGN:21322
    xref: Wikipedia:Marsh
    is_a: ENVO:00000043 ! wetland

Building Ontologies
• Much work remains in building ontologies to cover all of the forms of integration needed or possible for biodiversity collections.
• Ontology assembly can be a community process, engaging information specialists with experts from each biological subdiscipline (taxonomists, anatomists, ecologists, etc.) for all major taxonomic groups.
• Development can be supported by grants and organizations (Phenoscape, EnvO).

Implementation
• Many tools are available for building ontologies in OBO, RDF, and OWL formats.
• A good resource is the Apache Jena Ontology API: http://jena.apache.org/documentation/ontology/#general-concepts
• Completed ontologies can be implemented at the local database, database network, or portal search interface level, or a combination of these.

Acknowledgements
• CollectionsWeb RCN Workshop III participants
• Nelson Rios
• David Schindel
• Andy Bentley
• NSF DBI 0639214
• NSF EF 0431259