bioRxiv preprint first posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. Large-scale integration of model organism and clinical research data can provide a breadth of knowledge not available from individual sources and can provide contextualization of data back to these sources. The Monarch Initiative (www.monarchinitiative.org) is a collaborative, open science effort that aims to semantically integrate genotype-phenotype data from many species and sources in order to support precision medicine, disease modeling, and mechanistic exploration. Our integrated knowledge graph, analytic tools, and web services enable diverse users to explore relationships between phenotypes and genotypes across species. The Monarch Initiative: An integrative data and analytic platform connecting phenotypes to genotypes across species 1 2 3 4 5 2 Christopher J Mungall , Julie A McMurry , Sebastian Köhler , James P. Balhoff , Charles Borromeo , Matthew Brush , 1 2 1 2 2 2 6 Seth Carbon , Tom Conlin , Nathan Dunn , Mark Engelstad , Erin Foster , JP Gourdine , Julius O.B. Jacobsen , Dan 2 2 1 1 2 2 5 Keith , Bryan Laraway , Suzanna E. Lewis , Jeremy Nguyen Xuan , Kent Shefchek , Nicole Vasilevsky , Zhou Yuan , 1 5 7 8 3 2* Nicole Washington , Harry Hochheiser , Tudor Groza , Damian Smedley , Peter N. Robinson , Melissa A Haendel 1. 2. 3. 4. 5. 6. 7. 8. Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, USA. Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany RTI International, Research Triangle Park, NC, USA Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA bioRxiv preprint first posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756 . The copyright holder for this preprint (which was not Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK available under a CC-BY 4.0 International license. peer-reviewed) is Cambridge, the author/funder. It is made Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ, UK *To whom correspondence should be addressed. Tel: 1-503-407-5970; Email: [email protected] Present Address: Melissa Haendel, Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, USA. Abstract The correlation of phenotypic outcomes with genetic variation and environmental factors is a core pursuit in biology and biomedicine. Numerous challenges impede our progress: patient phenotypes may not match known diseases, candidate variants may be in genes that have not been characterized, model organisms may not recapitulate human or veterinary diseases, filling evolutionary gaps is difficult, and many resources must be queried to find potentially significant genotype-phenotype associations. Since non-human organisms have proven instrumental in revealing biological mechanisms, it is necessary to provide advanced informatics tools to identify phenotypically relevant disease models. Large-scale integration of model organism and clinical research data can provide a breadth of knowledge not available from individual sources and can provide contextualization of data back to these sources. The Monarch Initiative (www.monarchinitiative.org) is a collaborative, open science effort that aims to semantically integrate genotype-phenotype data from many species and sources in order to support precision medicine, disease modeling, and mechanistic exploration. Our integrated knowledge graph, analytic tools, and web services enable diverse users to explore relationships between phenotypes and genotypes across species. Introduction A fundamental axiom of biology is that phenotypic manifestations of an organism are due to interaction between genotype and environmental factors over time. In the rapidly advancing era of genomic medicine, a critical challenge is to identify the genetic etiologies of Mendelian disease, cancer, and common and complex diseases, and translate basic science to better treatments. Currently, available human data associates less than approximately 51% of known human coding genes with phenotype data (based on OMIM (1), ClinVar (2), Orphanet (3), CTD (4), and the GWAS catalog (5)). See Table 1 for a list of database abbreviations. This coverage can be extended to approximately 89% if phenotypic information from orthologous genes from five of the most well-studied model organisms is included (Figure 1). Figure 1. The phenotype annotation coverage of human coding genes. Yellow bars show that 51% of those genes have at least one phenotype association reported in humans (HPO annotations of OMIM, ClinVar, Orphanet, CTD, and GWAS). The blue bars show that 58% of human coding genes have orthologs with causal phenotypic associations reported in at least one non-human model (MGI, Wormbase, Flybase, and ZFIN). The green bars show that 40% of human coding genes have annotations both in human and in non-human orthologs. There are phenotypic associations from humans and/or non-human orthologs that cover 89% of human coding genes. Similarly, of the 72% of the 3,230 genes in ExAC with ‘near-complete depletion of predicted proteintruncating variants have no currently established human disease phenotype’ (6), where 88% of these genes without a human phenotype have a phenotype in a non-human organism. However, leveraging these model data for computational use is non-trivial primarily because the relationships between gene and disease (7) and between model system and disease phenotypes (8) are not straightforward. maps a variety of external data sources and databases to RDF (Resource Description Framework) graphs. RDF provides a flexible way of modeling a variety of complex datatypes, and allows entities from different databases to be connected via common instance or class URIs (Uniform Resource Indicators). We use relationship types from the Relation Ontology (RO; https://github.com/oborel/obo-relations) (17) and other vocabularies to connect entities together, along with a number of Open Biological Ontologies (18) bioRxiv preprint posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756 . The copyright holder this preprint For (whichexample, was not In recent years,firstthere has been a growth in the (OBOs) to classify thesefor entities. a peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. number of genotype-phenotype databases available, mouse genotype can be related to a phenotype using covering a diversity of domain areas for human, the has_phenotype relation (RO:0002200), with the model organisms, and veterinary species. While genotype classified using a term from the Genotype providing quality inventories of the relevant species Ontology (GENO) (19), and the phenotype classified and phenotypic data types, most resources are using the Mammalian Phenotype Ontology (MP) (20). limited to a single species or limit cross-species We use the Open Biomedical Annotations (OBAN; comparison to direct assertions (e.g. Organism X is a https://github.com/EBISPOT/OBAN) vocabulary to model of Disease Y) or based upon orthology associate evidence and provenance metadata with relations (e.g. organism Y’ is a model of Disease Y each edge, using the Evidence and Conclusions due to Y and Y’ being orthologs). While great strides Ontology (ECO) for types of evidence (21).The have been made in text-based search engines, graphs produced by Dipper are available as a phenotype data remains difficult to search and use standalone resource in RDF/turtle format at computationally due to its complexity and in the use http://data.monarchinitiative.org/ttl. of different phenotype standards and terminologies. Such barriers have made linking and integration with the precision and richness needed for mechanistic discovery across species a significant challenge (9). A newer method to aid identifying models of disease and to discovery underlying mechanisms is to utilize ontologies to describe the set of phenotypes that present for a given genotype or disease, what we call a ‘phenotypic profile’. A phenotypic profile is the subject of non-exact matching within and across species using ontology integration and semantic similarity algorithms (10)(11) in software applications such as Exomiser (12) and Genomiser (13), and this approach has been shown assist disease diagnosis (14–16). The Monarch Initiative uses an ontologybased strategy to deeply integrate genotypeFigure 2. Monarch Data Architecture. Structured and unstructured data sources are loaded into SciGraph via phenotype data from many species and sources, Dipper. Ontologies are also loaded into SciGraph, resulting thereby enabling computational interrogation of in a combined knowledge and data graph. Data is disease models and complex relationships between disseminated via SciGraph Services, an ontologygenotype and phenotype to be revealed. The enhanced Solr instance called GOlr, and to the OwlSim semantic similarity software. Monarch applications and Monarch Initiative name was chosen because it is a end users access the services for graph querying, community effort to create paths for diverse data to application population, and phenotype matching. be put to use for disease discovery, not unlike the navigation routes that a monarch butterfly would We also import a number of external and in-house take. ontologies, for data description and data integration. As these ontologies are all available from the OBO Data Architecture Library in Web Ontology Language (OWL), no additional transformation is necessary. The combined The overall data architecture for Monarch is shown in corpus of graphs ingested using Dipper and from Figure 2. The bulk of the data integration is carried ontologies is referred to as the Monarch Knowledge out using our Data Ingest Pipeline (Dipper) tool Graph. (https://github.com/monarch-initiative/dipper), which A: Data types covered by Monarch data sources C: Mappings to bridging ontologies 2 P/D genes variants orthology animals function interactions expression pathways models diseases phenotypes G B: Monarch data sources and ontology annotations Figure 3. Data types, sources, and the ontologies Data source used for their integration in Monarch into the Monarch MedGen edGen O O O O ClinVar knowledge graph. Each UMLS S MeSH M SH data source uses or is O O O O CTD DOID mapped to a suite of O O Gene Revs different ontologies or MONDO NDO O O O OMIM db OMIM vocabularies. These are in O O HPOA turn integrated into bridging Orphanet O O O ORDO ontologies for Genetics O O O O GWAS Elements of Morphology M bioRxiv preprint first posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756. The copyright holder (GENO), for this preprint (which wasAnatomy not Coriell is the author/funder. It is made available O O O O peer-reviewed) HP under a CC-BY 4.0 International license. (Uberon/CL), Phenotypes KEGG O O O EFO (UPheno), and Diseases VT O O O O O O OMIA UPheno (MonDO). O O O O O O O O O O O O O O O O p O O O O O O O p O O O O O O O O O p O O O O O O O O O O O O O AnimalQTLDB MPD MMRRC MP WP FBcv WA WmBase FlyBase FlyBase O IMPC O O MGI O O ZFIN Ensembl NCBI UCSC HGNC BioGrid Panther CL MA FBbt EMAPA ZP ZFA SO All sources * { UBERON GENO GENETICS ANATOMY PHENOTYPES DISEASES O O Bridging Ontology Ontology mappings annotations FALDO GO ECO RO We contribute We created & maintain Figure 4. Distribution of phenotypic annotations across species in Monarch, broken down by the top levels of the phenotype ontology. The graph can be interactively explored at https://monarchinitiative.org /phenotype/. Note that annotations are currently dominated by human, mouse, zebrafish and C elegans (top panel); the chart is faceted allowing individual species to be switched on and off to see contributions for less datarich species such as veterinary animals and monkeys (middle panel). Clicking on a given phenotype text allows drilling down to its subtypes (lower panel). The data integrated in Monarch wide range of • The Genotype Ontology (GENO) (19) defines sources, including human clinical knowledge sources genotypic elements and bridges the Sequence as well as genetic and genomic resources covering Ontology (SO) (27) and FALDO (28). GENO allows organismal biology. The list of data sources and the propagation of phenotypes that are annotated ontologies integrated is shown in Figure 3, with a differently in different sources to genotypic species distribution illustrated in Figure 4. The elements. knowledge graph is loaded into an instance of a Entity resolution and unification SciGraph database (https://github.com/SciGraph/SciGraph/), which One of the many challenges faced when integrating embeds and extends a Neo4J database, allowing for bioinformatics resources is the presence of the same complex queries and ontology-aware data processing bioRxiv preprint first posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756 . The copyright holder for thisdesignated preprint (whichby was different not entity in multiple databases, and Named Entitypeer-reviewed) Recognition. We provide two is the author/funder. It is made available under a CC-BY 4.0 International license. identifiers (29). This problem is compounded by the public endpoints for client software to query these different ways the same identifier can be written, services: using different prefixes or no prefix at all. Taking a https://scigraph-ontology.monarchinitiative.org/scigraph/docs Monarch page for a single gene, for example (for ontology access) and ‘fibrinogen gamma chain’, FGG, https://scigraph-data.monarchinitiative.org/scigraph/docs (https://monarchinitiative.org/gene/NCBIGene:2266). (for ontology plus data access). Monarch has integrated data from a variety of human, model organism, and other biomedical These SciGraph instances provide powerful graph sources such as OMIM (1), Orphanet (3), ClinVar (2), querying capabilities over the complete knowledge HPO (30), KEGG (31), CTD (4), MyGene (32), graph. Many of the common query patterns are BioGrid (33), and via orthology in PantherDB (34) we executed in advance and stored in an Apache Solr also incorporate Fgg gene data from ZFIN (35) and index, making use of the Gene Ontology ‘GOlr’ from MGI (36). No two of these sources represents indexing strategy, allowing for fast queries of the identifier for FGG in precisely the same way. As ontology-indexed associations. part of our data ingest process, we normalize all identifiers using a curated set of database prefixes. Finally, we also load a subset of the graph into an These have a defined mapping to an http URI. These OwlSim instance, which provides phenotype curated prefixes have been deposited in the Prefix matching services as well as the ability to perform Commons (https://github.com/prefixcommons), which fuzzy phenotype searches based on a phenotype similarly contains identifier prefixes used within the profile. We also provide phenotype matching services Gene Ontology (37) and Bio2RDF (38). via the Global Alliance for Genomes and Health (GA4GH) Matchmaker Exchange (MME) API MME In post-processing equivalent identifiers, we perform (22), available at https://mme.monarchinitiative.org. clique-merging (https://github.com/SciGraph/SciGraph/wiki/PostMany of the data sources we integrate make use of processors). We take all edges labeled with either their own terminologies and ontologies. We the owl:sameAs or owl:equivalentClasses property aggregate these into a unified ontology and calculate equivalence cliques, based on the (https://github.com/monarch-initiative/monarchsymmetric and transitive nature of these properties. ontology/) and make use of bridging ontologies and We then merge these cliques together, taking a our curated integrative ontologies to connect these designated ‘clique leader’ (for instance, NCBI for together. In particular: genes) and mapping all edges in the monarch graph • The Uber-anatomy ontology (Uberon) bridges such that they point to a clique leader. species-specific and clinical anatomical and tissue ontologies (23) In-house curation • The unified phenotype ontology bridges model organism and human phenotype ontologies and In addition to ingest of external sources and terminologies, using techniques described in ontologies, we perform in-house data and ontology (24)(25) curation. For curation of ontology-based genotype• The Monarch Merged Disease Ontology (MonDO) phenotype associations (including diseaseuses a Bayes ontology merging algorithm (26) to phenotypic profiles), we are transitioning to the integrate multiple human disease resources into a WebPhenote platform single ontology, and additionally includes animal (http://create.monarchinitiative.org), which allows a diseases from OMIA. variety of disease entities to be connected to phenotypic descriptors. We also make use of text mining to create seed disease-phenotype associations using the Bio-Lark toolkit (39), which are then manually curated. Most recently we have performed a large-scale annotation of PubMed to extract common disease-phenotype associations (40). Most of the in-house curation work involves making smaller resources with free text description of phenotypic information computable, for example, the Online Mendelian Inheritance in Animals (OMIA) resource, with whom we have been collaborating to support this curation (41). clinician wishes to search for either known diseases that have a similar presentation, or model organism genes that demonstrate homologous phenotypes when the gene is perturbed • Researcher looking for diseases that have similar phenotypic feature to a newly identified model organism mutant identified in a screen • Researchers or clinicians who need to identify potentially informative phenotyping assays for differential diagnosis or to identify candidate genes bioRxiv preprint first posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. Quality control Features External resources and datasets that are incorporated into Monarch are evaluated before incorporation into the Dipper pipeline – we primarily integrate high-quality curated resources. For all ontologies we bring in, we apply automated reasoning to detect inconsistencies between different ontologies. For each release, we perform high-level checks on each integrated resource to ensure no errors in the extraction process occurred, but we do not perform in-depth curation checks of integrated resources. Each release happens once every one to two months. Integrated information on entities of interest We provide overview pages for entities such as genes, diseases, phenotypes, genotypes, variants, and publications. Each page highlights the provenance of the data from the diverse clinical, model organism, and non-model organism sources. These pages can be found either via search (see below) or through an entity resolver. For example, the URL https://monarchinitiative.org/OMIM:266510 will redirect to a page about the disease ‘Peroxisome biogenesis disorder type 3B’ from the OMIM resource, showing its relationships to other content within the Monarch knowledge graph, such as phenotypes and genes associated with the disease. We make use of MonDO (the Monarch merged disease ontology (26)) to group similar diseases together. Figure 5 shows an example page for Marfan syndrome with related phenotype, gene, model and variant data. In order to measure annotation richness, we have also created an annotation sufficiency meter web service (42) available at https://monarchinitiative.org/page/services; this service determines whether a given phenotype profile for any organism is sufficiently broad and deep to be of diagnostic utility. The sufficiency score can be displayed as a five star scale as in PhenoTips (43) and in the Monarch web portal (see below) to aid curation or data entry, and can also be used to suggest additional phenotypic assays to be performed – whether in a patient or in a model organism. Monarch Web Portal The Monarch portal is designed with a number of different use cases in mind, including: • A researcher interested in a human gene, its phenotypes, and the phenotypes of orthologs in model organisms and other species • Patients or researchers interested in a particular disease or phenotype (or groups of these), together with information on all implicated genes • A clinical scenario in which a patient has an undiagnosed disease showing a spectrum of phenotypes, with no definitive candidate gene demonstrated by sequencing; in this scenario the Basic Search The portal provides different means of searching over integrated content. In cases where a user is interested in a specific disease, gene, phenotype etc., these can usually be found via autocomplete. Site-wide synonym-aware text search can also be used to find pages of interest. Because the knowledgebase combines information from multiple species, entities such as genes often have ambiguous symbols. We provide species information to help disambiguate in a search. Search by phenotype profile One of the most innovative features of Monarch is the ability to query within and across species to look for diseases or organisms that share a set of similar but non-exact set of phenotypes (phenotypic profile). This feature uses a semantic similarity algorithm available from the OWLsim package (http://owlsim.org). Users can launch searches against specific targets: organisms, sets of named gene models, or against all models and diseases available in the Monarch repository. The Monarch Analyze Phenotypes interface (https://monarchinitiative.org/analyze/phenotypes) allows the user to build up a ‘cart’ of phenotypes, and then perform a comparison against phenotypes related to genes and diseases. Results are ranked according to closeness of match, partitioned by species, and are displayed as both a list and in the Phenogrid widget (Figure 6). A. Overview of a page bioRxiv preprint first posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756. The copyright holder for this preprint (which was not Each. entity in the Monarch peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license data corpus has a corresponding page Definition, cross-refs, For disease & phenotype pages, a browsable hierarchy All 375 phenotype annotations, comprising 475 associations with this family of several related diseases 32 human genes, comprising 57 associations with this family of diseases (and one gene associated with the veterinary disease equivalent) Tabs on each of these pages list associations between the page entity and other entities in the corpus 90 animal models and cell lines recapitulating features of Marfan-related disorders 435 variants associated with Marfanrelated disorders Computed phenotypic similarity between Marfan-related disorders and other diseases and genes. See PhenoGrid figure for a description. B. Abstracted view of page content Subject Relationship type between subject correand object varies sponds by tab (for to the variants tab, the tab type relationships (eg pheno-include “is types, pathogenic genes, for” and “is likely etc) pathogenic for” For pages Species in where the which the page entity association is actually has been a group (eg. observed of diseases or group of phenotypes) the specific sub-entities are named Subject Relationship Object Species Trace the Links evidence to the eg. traceable source author literature statement, where asserted, available inferred based on combinatorial evidence, Curated source/ database from which associations were integrated Filter the data based on relevant features eg. species Species Homo sapiens (63) Mus musculus (19) Danio rerio (3) C elegans (3) Bos taurus (2) Evidence Database Type Reference source ClinVar OMIA Export any data Figure 5. Annotated Monarch webpage for Marfan and Marfan Related syndrome. This group of syndromic diseases has a number of different associations spanning multiple entity types – disease phenotypes, implicated human genes, variants and animal models and other model systems. An abstraction of the contents and features of the tabs is shown in the lower panel. Actual contents of the tabs are best viewed in the context of the web app at https://monarchinitiative.org/DOID:14323. Each square is colored according to how closely matched the pair of phenotypes is Names of the entities (diseases, genes ...) that are phenotypically similar to Marfan; below each label is its corresponding aggregate similarity score Marfan Syndrome Phenotype Similarity Comparison Default view shows the top 10results for each species. Click on the species name to see only results for that species (top 100) bioRxiv preprint first posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. All of the phenotypes associated with Marfan Syndrome in humans (filtered to show only those shared in phenotypically similar diseases and models) Hover for phenotype term info Hover over a match to see the pair of phenotypes that were matched between the disease and the matched model as well as the intermediate phenotype that both have in common, if applicable. The mouse gene Tgfb2 has a phenotype profile that is 64% similar to that of Marfan. Click on the gene symbol to explore the specific Tgfb2 variants and Tgfb2 genotypes in which the matching phenotypes were described. Figure 6. Partial screenshot of PhenoGrid showing Marfan syndrome. PhenoGrid shows input phenotypes in rows, models in columns, and cell contents color-coded with greater saturation indicating greater similarity. Disease phenotypes are shown as rows, and phenotypically matching human diseases and model organism genes are shown as columns – the saturation of a cell correlates with strength if phenotypic match. Mouseover tooltips highlight diseases associated with a selected phenotype (or vice-versa), or details (including similarity scores) of any match between a phenotype and a model. User controls support the selection of alternative sort orders, similarity metrics, and displayed organism(s) (mouse, human, zebrafish, or the 10 most similar models for each). Here we see all diseases or genes that exhibit ‘Hypoplasia of the mandible’ with the matching mouse gene Tfgb2. Actual PhenoGrid data is best viewed in the context of the web app at https://monarchinitiative.org/Orphanet:284993#compare. Note matches do not need to be exact – here the mouse phenotype of ‘small mandible’ (Mouse Phenotype Ontology) has a high scoring match to “micrognathia” (Human Phenotype Ontology) based on the fact that both phenotypes are related to “small mandible” (Mouse Phenotype Ontology). Advanced PhenoGrid features (not displayed) include the ability to alter the scoring and sorting methods, as well as zoomed-out map-style navigation. implemented. PhenoGrid is available as an openPhenogrid source widget suitable for integration in third-party Given a set of input phenotypes, as associated with a web sites, such as for model organism databases as patient or a disease, Monarch phenotypic profile done in the International Mouse Phenotyping similarity calculations can generate results involving Consortium (IMPC) or clinical comparison tools. hundreds of diseases and models. The PhenoGrid Download and installation instructions are available visualization widget (Figure 6) provides an overview on the Monarch Initiative web site. of these similarity results, implemented using the D3 javascript library (44). Phenotypes and models are Text Annotation frequently too numerous to fit on the initial display; The Monarch annotation service allows a user to thus scrolling, dragging, and filtering have been enter free text (for example, a paper abstract or a clinical narrative) and perform an automated annotation on this text, with entities in the text marked up with terms from the Monarch knowledge graph, such as genes, diseases, and phenotypes. Once the text is marked up, the user has the option of turning the recognized phenotype terms into a phenotype profile, and performing a profile search, or to link to any of the entity pages identified in the annotation. This tool is also available via services. A comparison between Monarch and existing resources is warranted. InterMine is an open-source data warehouse system used for disseminating data from large, complex biological heterogeneous data sources (51). InterMine provides sophisticated web services to support denormalized query and has been used to improve query and data access to model organism databases (52) and non-model organisms (53). InterMine is a federated approach Inferring causative variants where individual databases each can adopt and preprint(45) first posted Nov.recently, 3, 2016; doi:Genomiser http://dx.doi.org/10.1101/055756 The copyright holder for this preprint data (which was not but The bioRxiv Exomiser andonline more populate .their own object-oriented model, peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. tools (46) make use of the Monarch platform and can also align on certain aspects such as having phenotype matching algorithms to rank putative genomic data models aligned using the SO. causative variants using a combined variant and However, as yet genotype and phenotype modeling phenotype score. These tools have been used to is not aligned, and Intermine does not provide diagnose patients as part of the NIH Undiagnosed disease matching or phenotypic search. We are Diseases Project (14) and are the first examples of currently working with InterMine to achieve using model organism phenotype data to aid rare harmonization in this area. Other resources, such as disease diagnostics. KaBOB (54) and Bio2RDF (38) semantically integrate various resources into large triplestores. Bio2RDF typically retains the source vocabulary of the Discussion integrated resources, whereas KaBOB is more similar to Monarch in that it maps OBO ontologies The Monarch Initiative provides a system to organize (18). Other data integration approaches include the and harmonize the heterogeneous genotypeBioThings API, exemplified by the MyVariant system phenotype data found across clinical and model and (32) which aggregates variant data from multiple non-model organism resources (such as veterinary sources. We are currently working with the BioThings species), creating a unified overview of this rich API developers to integrate these different landscape of data sources. Some of the challenges approaches within the Dipper framework. Monarch is we have had to address are that each resource unique in that it aims to align both genotypic and shares data via different mechanisms and uses a phenotypic modeling across species and sources. different data model. It is particularly important to note that each organism annotates phenotypic data Future directions to different aspects of the genotype – one resource Future directions include bringing in phenotypic data might be to a gene, another an allele, another to a from specialized sources and databases, set of alleles, a full genotype or a SNP. This not only incorporating a wider range of datatypes, and to makes data integration difficult, but it also means that extend and improve analytic methods for making computation over the genotype-phenotype cross-species inferences. Currently the core of associations must be done with care. Similar issues Monarch includes primarily qualitative phenotypes at MGI have been described (47). In addition, since described using terms from existing phenotypic most anatomy, phenotype, and disease ontologies vocabularies – we are starting to bring in more describe the biology of one species, it has quantitative data, from sources such as the MPD (55) traditionally been quite difficult to ‘map’ across and GeneNetwork (56), in addition to expression data species. Some examples are the Human Phenotype annotated to Uberon in BgeeDb (57). We are also Ontology (HPO) (30) and the Mouse Anatomy extending our phenotypic search methods to Ontology (48). Monarch uses four species-neutral incorporate Phenologs, phenotypic groupings ontologies that unify their species-specific inferred on the basis of orthologous genes (58,59). counterparts (as shown in Figure 3): GENO for Early comparisons suggest that addition of genotypes (19), UPheno for phenotypes (49), phenologs to our suite of tools to enable genotypeUBERON for anatomy (23), and MonDO for diseases phenotype inquiry across species will extend our (26). Prior efforts to map or integrate species-specific reach in a synergistic manner (60). We therefore plan anatomical ontologies (24,50), for example, have to implement this type of approach into the Monarch been utilized in the construction of these speciestool suite and website. One of the most important neutral ontologies. The end result is a translational realizations we came across in constructing the platform that allows a unified view of human, model Monarch platform was the need to better represent and non-model organism biology. scientific evidence of genotype-phenotype associations. We are currently developing a Scientific Evidence and Provenance Information Ontology (SEPIO) (61) in collaboration with the Evidence and Conclusion Ontology consortium (21) and ClinGen (62) in order to classify associations as complementary, confirmatory, or contradictory. SEPIO will also integrate biological assays from the Ontology of Biomedical Investigations (63). Monarch has also been collaborating with the US National Cancer Institute’s Thesaurus (NCIT) team to integrate cancer phenotypes. Finally, Monarch has been working in the context of the Global Alliance for Genomics and Health (GA4GH) to develop a formal phenotype exchange format (www.phenopackets.org) that can aid phenotypic data sharing in numerous contexts such as clinical, model organism research, biodiversity, veterinary, and evolutionary biology. bioRxiv of preprint first posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756. The copyright holder for this preprint (which was not Glossary Acronyms peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. Acronym Name URL Ref PMID NAR DB Bgee http://bgee.org/ 57 https://thebiogrid.org/ CL BgeeDb Biological General Interaction Datasets. Cell Ontology 33 PMID:25428363 D470-D478 ClinVar ClinVar http://obofoundry.org/ontology/cl.html 64 PMID:27377652 42/D1/D950 https://www.ncbi.nlm.nih.gov/clinvar/ 2 PMID:26582918 D862-8 CTD Clinical Toxicology Database http://ctdbase.org/ 4 PMID:27651457 D914-20 ECO Evidence and Conclusions Ontology http://obofoundry.org/ontology/eco.html 21 PMID:25052702 ExAC Exome Aggregation Consortium http://exac.broadinstitute.org/ 6 PMID:27535533 FlyBase FlyBase http://flybase.org 65 PMID:26467478 D786-92 GeneNetwork Gene Network http://genenetwork.org 56 GENO Genotype Ontology https://github.com/monarch-initiative/GENO-ontology/ 19 GO Gene Ontology http://geneontology.org 37 PMID:25428369 D1049-56 GWAS GWAS Catalog https://www.ebi.ac.uk/gwas/ 5 PMID:24316577 D1001-6 HP http://human-phenotype-ontology.org/ 30 PMID:24217912 D966-D974 http://www.kegg.jp/ 31 PMID:24214961 D199-205 MGI Human Phenotype Ontology Kyoto Encyclopedia of Genes Genomes Mouse Genome Informatics 36 PMID:26238262 MonDO Monarch Merged Disease Ontology MP Mammalian Phenotype Ontology http://www.informatics.jax.org/ https://github.com/monarch-initiative/monarchdisease-ontology/ http://obofoundry.org/ontology/mp.html 20 PMID:26238262 MPD Mouse phenome database http://phenome.jax.org/ 55 PMID:24243846 MyGene MyGene http://mygene.info 32 PMID:27154141 OMIA Online Mendelian Inheritance in Animals http://omia.angis.org.au/home/ 41 OMIM Online Mendelian Inheritance in Man http://omim.org 1 PMID:25428349 OrphaNet Portal for rare diseases and orphan drugs http://www.orpha.net 3 PMID:22422702 Panther PantherDB http://pantherdb.org 34 PMID:26578592 RO http://obofoundry.org/ontology/ro.html 17 PMID: 15892874 SO Relation Ontology Scientific Evidence Information Ontology Sequence Ontology Uberon BioGrid KEGG SEPIO Repository and for and Provenance 26 D825-34 D789-98 D336-42 https://github.com/monarch-initiative/SEPIO-ontology/ 61 http://www.sequenceontology.org/ 27 PMID:26229585 Uber-anatomy ontology http://uberon.org 23 PMID:25009735 Upheno Unified Phenotype Ontology https://github.com/obophenotype/upheno/ 49 PMID:24358873 WormBase WormBase http://wormbase.org 66 PMID:26578572 ZFIN Zebrafish Information Resource http://zfin.org 35 PMID:26097180 D774-80 Funding 2014 Jan;42(Database issue):D1001-6. Available from: This work was funded by NIH #1R24OD011883 and http://www.ncbi.nlm.nih.gov/pubmed/24316577 6. Lek M, Karczewski KJ, Minikel E V., Samocha Wellcome Trust grant [098051]. We are grateful to KE, Banks E, Fennell T, et al. Analysis of the NIH Undiagnosed Disease Program: protein-coding genetic variation in 60,706 HHSN268201300036C, HHSN268201400093P and humans. Nature [Internet]. 2016 Aug 17 [cited to the Phenotype RCN: NSF-DEB-0956049. We are 2016 Oct 21];536(7616):285–91. Available also grateful to NCI/Leidos #15X143, BD2K from: U54HG007990-S2 (Haussler; GA4GH) & BD2K PAhttp://www.nature.com/doifinder/10.1038/natur e19057 15-144-U01 (Kesselman; FaceBase). JNY, SC, SEL bioRxiv preprint first posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756 . The copyright holder for this preprint (which was not path 7. Strohman R. Maneuvering in the complex and CJM’s contributions are also supported by the peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. from genotype to phenotype. Science (80- ) Director, Office of Science, Office of Basic Energy [Internet]. 2002/04/27. 2002;296(5568):701–3. Sciences, of the U.S. Department of Energy under Available from: Contract No. DE- AC02-05CH11231. http://www.ncbi.nlm.nih.gov/pubmed/11976445 8. Houle D, Govindaraju DR, Omholt S. Acknowledgments Phenomics: the next challenge. Nat Rev Genet We thank members of the Undiagnosed Disease [Internet]. 2010/11/19. 2010;11(12):855–66. Program, the International Mouse Phenotyping Available from: Consortium, the NCI Semantic Infrastructure team, http://www.ncbi.nlm.nih.gov/pubmed/21085204 and NIF/SciCrunch for their contributions. 9. McMurry JA, Köhler S, Washington NL, Balhoff JP, Borromeo C, Brush M, et al. Navigating the References Phenotype Frontier: The Monarch Initiative. 1. Amberger JS, Bocchini CA, Schiettecatte F, Genetics [Internet]. 2016 Aug [cited 2016 Aug Scott AF, Hamosh A. OMIM.org: Online 13];203(4):1491–5. Available from: Mendelian Inheritance in Man (OMIM®), an http://www.ncbi.nlm.nih.gov/pubmed/27516611 online catalog of human genes and genetic 10. Smedley D, Oellrich A, Köhler S, Ruef B, disorders. Nucleic Acids Res [Internet]. 2015 Westerfield M, Robinson P, et al. PhenoDigm: Jan 28 [cited 2015 Apr 1];43(Database analyzing curated annotations to associate issue):D789-98. Available from: animal models with human diseases. http://nar.oxfordjournals.org/content/43/D1/D7 Database (Oxford) [Internet]. 2013 Jan [cited 89.short 2013 Nov 4];2013:bat025. Available from: 2. Landrum MJ, Lee JM, Benson M, Brown G, http://www.pubmedcentral.nih.gov/articlerende Chao C, Chitipiralla S, et al. ClinVar: public r.fcgi?artid=3649640&tool=pmcentrez&rendert archive of interpretations of clinically relevant ype=abstract variants. Nucleic Acids Res [Internet]. 2016 11. Washington NL, Haendel M a, Mungall CJ, Jan 4;44(D1):D862-8. Available from: Ashburner M, Westerfield M, Lewis SE. http://www.ncbi.nlm.nih.gov/pubmed/26582918 Linking human diseases to animal models 3. Rath A, Olry A, Dhombres F, Brandt MM, using ontology-based phenotype annotation. Urbero B, Ayme S. Representation of rare PLoS Biol [Internet]. 2009 Nov [cited 2012 Jul diseases in health information systems: the 30];7(11):e1000247. Available from: Orphanet approach to serve a wide range of http://www.pubmedcentral.nih.gov/articlerende end users. Hum Mutat [Internet]. 2012 May r.fcgi?artid=2774506&tool=pmcentrez&rendert [cited 2015 Feb 23];33(5):803–8. Available ype=abstract from: 12. Robinson PN, Kohler S, Oellrich A, Genetics http://www.ncbi.nlm.nih.gov/pubmed/22422702 SM, Wang K, Mungall CJ, et al. Improved 4. Davis AP, Grondin CJ, Johnson RJ, Sciaky D, exome prioritization of disease genes through King BL, McMorran R, et al. The Comparative cross-species phenotype comparison. Toxicogenomics Database: update 2017. Genome Res [Internet]. 2014 Feb [cited 2015 Nucleic Acids Res [Internet]. 2016 Sep 19 Feb 18];24(2):340–8. Available from: [cited 2016 Oct 24]; Available from: http://www.pubmedcentral.nih.gov/articlerende http://www.ncbi.nlm.nih.gov/pubmed/27651457 r.fcgi?artid=3912424&tool=pmcentrez&rendert 5. Welter D, MacArthur J, Morales J, Burdett T, ype=abstract Hall P, Junkins H, et al. The NHGRI GWAS 13. Smedley D, Schubach M, Jacobsen JOBOB, Catalog, a curated resource of SNP-trait Köhler S, Zemojtel T, Spielmann M, et al. A associations. Nucleic Acids Res [Internet]. 14. 15. 16. 17. 18. 19. 20. Whole-Genome Analysis Framework for r.fcgi?artid=4378007&tool=pmcentrez&rendert Effective Identification of Pathogenic ype=abstract Regulatory Variants in Mendelian Disease. Am 21. Chibucos MC, Mungall CJ, Balakrishnan R, J Hum Genet [Internet]. 2016 Sep 3 [cited Christie KR, Huntley RP, White O, et al. 2016 Aug 27];99(3):595–606. Available from: Standardized description of scientific evidence http://dx.doi.org/10.1016/j.ajhg.2016.07.005 using the Evidence Ontology ( ECO ). Bone WP, Washington NL, Buske OJ, Adams Database [Internet]. 2014 Jul 22 [cited 2014 DR, Davis J, Draper D, et al. Computational Jul 22];2014(0):1–11. Available from: evaluation of exome sequence data using http://database.oxfordjournals.org/content/201 human and model organism phenotypes 4/bau075.full improves diagnostic efficiency. Genet Med 22. Mungall CJ, Washington NL, Nguyen-Xuan J, bioRxiv preprint first posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756 . The copyright holder for preprint (which [Internet]. 2015 Nov 12 [cited 2015 Nov 16]; Condit C, Smedley D,thisKöhler S, etwas al.not Use of peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. Available from: model organism and disease databases to http://www.ncbi.nlm.nih.gov/pubmed/26562225 support matchmaking for human disease gene Zemojtel T, Kohler S, Mackenroth L, Jager M, discovery. Hum Mutat [Internet]. 2015 Oct Hecht J, Krawitz P, et al. Effective diagnosis of [cited 2015 Dec 19];36(10):979–84. Available genetic disease by computational phenotype from: analysis of the disease-associated genome. http://www.ncbi.nlm.nih.gov/pubmed/26269093 Sci Transl Med [Internet]. 2014 Sep 3 [cited 23. Haendel MA, Balhoff JP, Bastian FB, 2014 Sep 4];6(252):252ra123-252ra123. Blackburn DC, Blake JA, Bradford Y, et al. Available from: Unification of multi-species vertebrate anatomy http://www.ncbi.nlm.nih.gov/pubmed/25186178 ontologies for comparative biology in Uberon. Wright CF, Fitzgerald TW, Jones WD, Clayton J Biomed Semantics [Internet]. 2014 Jan [cited S, McRae JF, van Kogelenberg M, et al. 2015 Apr 30];5(1):21. Available from: Genetic diagnosis of developmental disorders http://www.jbiomedsem.com/content/5/1/21 in the DDD study: a scalable analysis of 24. Mungall C, Gkoutos G, Smith C, Haendel M, genome-wide research data. Lancet (London, Lewis S, Ashburner M. Integrating phenotype England) [Internet]. 2015 Apr 4 [cited 2016 Jul ontologies across multiple species. Genome 4];385(9975):1305–14. Available from: Biol [Internet]. 2010;11(1):R2. Available from: http://www.ncbi.nlm.nih.gov/pubmed/25529582 http://genomebiology.com/2010/11/1/R2 Smith B, Ceusters W, Klagges B, Köhler J, 25. Köhler S, Doelken SC, Ruef BJ, Bauer S, Kumar A, Lomax J, et al. Relations in Washington N, Westerfield M, et al. biomedical ontologies. Genome Biol [Internet]. Construction and accessibility of a cross2005 [cited 2016 Oct 27];6(5):R46. Available species phenotype ontology along with gene from: annotations for biomedical research. http://www.ncbi.nlm.nih.gov/pubmed/15892874 F1000Research [Internet]. 2013 Feb 1 [cited Smith B, Ashburner M, Rosse C, Bard J, Bug 2013 Mar 6];2. Available from: W, Ceusters W, et al. The OBO Foundry: http://f1000research.com/articles/2-30/v1 coordinated evolution of ontologies to support 26. Mungall CJ, Koehler S, Robinson P, Holmes I, biomedical data integration. Nat Biotechnol. Haendel M. k-BOOM: A Bayesian approach to 2007 Nov;25(11):1251–5. ontology structure inference, with applications Brush M, Mungall CJ, Washington NL, in disease ontology construction. In: Haendel MA. What’s in a Genotype?: An Phenotype Day, ISMB [Internet]. 2016 [cited Ontological Characterization for Integration of 2016 Oct 27]. Available from: Genetic Variation Data. In: International http://phenoday2016.bio-lark.org/pdf/2.pdf Conference on Biomedical Ontology 2013 27. Mungall CJ, Batchelor C, Eilbeck K. Evolution [Internet]. 2013. Available from: http://ceurof the Sequence Ontology terms and ws.org/Vol-1060/icbo2013_submission_60.pdf relationships. J Biomed Inform [Internet]. 2011 Smith CL, Eppig JT. Expanding the Feb [cited 2015 May 3];44(1):87–93. Available mammalian phenotype ontology to support from: automated exchange of high throughput http://www.pubmedcentral.nih.gov/articlerende mouse phenotyping data generated by larger.fcgi?artid=3052763&tool=pmcentrez&rendert scale mouse knockout screens. J Biomed ype=abstract Semantics [Internet]. 2015 Mar 25 [cited 2015 28. Bolleman JT, Mungall CJ, Strozzi F, Baran J, Mar 26];6(1):11. Available from: Dumontier M, Bonnal RJP, et al. FALDO: a http://www.pubmedcentral.nih.gov/articlerende semantic standard for describing the location 29. 30. 31. 32. 33. 34. 35. of nucleotide and protein feature annotation. J http://www.ncbi.nlm.nih.gov/pubmed/26097180 Biomed Semantics [Internet]. 2016 Dec 13 36. Eppig JT, Richardson JE, Kadin JA, Ringwald [cited 2016 Aug 19];7(1):39. Available from: M, Blake JA, Bult CJ. Mouse Genome http://jbiomedsem.biomedcentral.com/articles/ Informatics (MGI): reflecting on 25 years. 10.1186/s13326-016-0067-z Mamm Genome [Internet]. 2015 Aug [cited McMurry J, Muilu J, Dumontier M, Hermjakob 2016 Oct 23];26(7–8):272–84. Available from: H, Conte N, Gormanns P, et al. 10 Simple http://www.ncbi.nlm.nih.gov/pubmed/26238262 rules for design, provision, and reuse of 37. The Gene Ontology Consortium. Gene identifiers for web-based life science data. Ontology Consortium: going forward. Nucleic 2015 Oct 2 [cited 2016 May 31]; Available Acids Res [Internet]. 2014 Nov 26 [cited 2015 from: http://zenodo.org/record/31765 Jan 5];43(Database issue):D1049-56. bioRxiv preprint posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756 . The copyright holder for this preprint (which was not from: Köhler S, first Doelken SC, Mungall CJ, Bauer S, Available peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. Firth H V, Bailleul-Forestier I, et al. The Human http://www.ncbi.nlm.nih.gov/pubmed/25428369 Phenotype Ontology project: linking molecular 38. Dumontier M, Callahan A, Cruz-Toledo J, biology and disease through phenotype data. Ansell P, Emonet V, Belleau F, et al. Bio2RDF Nucleic Acids Res [Internet]. 2014 Jan [cited release 3: a larger connected network of linked 2015 Mar 11];42(Database issue):D966-74. data for the life sciences. Proceedings of the Available from: 2014 International Conference on Posters & http://www.pubmedcentral.nih.gov/articlerende Demonstrations Track - Volume 1272. CEURr.fcgi?artid=3965098&tool=pmcentrez&rendert WS.org; 2014. p. 401–4. ype=abstract 39. Groza T, Köhler S, Doelken S, Collier N, Kanehisa M, Goto S, Sato Y, Kawashima M, Oellrich A, Smedley D, et al. Automatic Furumichi M, Tanabe M. Data, information, concept recognition using the Human knowledge and principle: back to metabolism Phenotype Ontology reference and test suite in KEGG. Nucleic Acids Res [Internet]. 2014 corpora. Database (Oxford) [Internet]. 2015 Jan [cited 2016 Oct 27];42(Database Jan [cited 2015 Mar 3];2015. Available from: issue):D199-205. Available from: http://www.pubmedcentral.nih.gov/articlerende http://www.ncbi.nlm.nih.gov/pubmed/24214961 r.fcgi?artid=4343077&tool=pmcentrez&rendert Xin J, Mark A, Afrasiabi C, Tsueng G, Juchler ype=abstract M, Gopal N, et al. High-performance web 40. Groza T, Köhler S, Moldenhauer D, Vasilevsky services for querying gene and variant N, Baynam G, Zemojtel T, et al. The Human annotation. Genome Biol [Internet]. 2016 May Phenotype Ontology: Semantic Unification of 6 [cited 2016 May 8];17(1):91. Available from: Common and Rare Disease. Am J Hum Genet http://www.pubmedcentral.nih.gov/articlerende [Internet]. 2015 Jun 24 [cited 2015 Jun r.fcgi?artid=4858870&tool=pmcentrez&rendert 30];97(1):111–24. Available from: ype=abstract http://www.ncbi.nlm.nih.gov/pubmed/26119816 Chatr-Aryamontri A, Breitkreutz B-J, Oughtred 41. Faculty of Veterinary Science U of S R, Boucher L, Heinicke S, Chen D, et al. The http://omia. angis. org. au. Online Mendelian BioGRID interaction database: 2015 update. Inheritance in Animals. Nucleic Acids Res [Internet]. 2015 Jan [cited 42. Washington N, Haendel M, Köhler S. How 2016 Oct 24];43(Database issue):D470-8. good is your phenotyping? Methods for quality Available from: assessment. In: Phenoday2014Bio-LarkOrg http://www.ncbi.nlm.nih.gov/pubmed/25428363 [Internet]. 2013. p. 1–4. Available from: Mi H, Poudel S, Muruganujan A, Casagrande http://phenoday2014.bio-lark.org/pdf/6.pdf JT, Thomas PD. PANTHER version 10: 43. Girdea M, Dumitriu S, Fiume M, Bowdin S, expanded protein families and functions, and Boycott KM, Chénier S, et al. PhenoTips: analysis tools. Nucleic Acids Res [Internet]. patient phenotyping software for clinical and 2016 Jan 4 [cited 2016 Oct 27];44(D1):D336research use. Hum Mutat [Internet]. 2013 Aug 42. Available from: [cited 2015 Mar 30];34(8):1057–65. Available http://www.ncbi.nlm.nih.gov/pubmed/26578592 from: Ruzicka L, Bradford YM, Frazer K, Howe DG, http://www.ncbi.nlm.nih.gov/pubmed/23636887 Paddock H, Ramachandran S, et al. ZFIN, The 44. Bostock M, Ogievetsky V, Heer J. D3: Datazebrafish model organism database: Updates Driven Documents. IEEE Trans Vis Comput and new directions. Genesis [Internet]. 2015 Graph [Internet]. 2011 Dec 1 [cited 2015 Feb Aug [cited 2016 Oct 23];53(8):498–509. 12];17(12):2301–9. Available from: Available from: http://ieeexplore.ieee.org/articleDetails.jsp?arn 45. 46. 47. 48. 49. 50. 51. umber=6064996 Available from: Robinson PN, Kohler S, Oellrich A, Genetics http://www.ncbi.nlm.nih.gov/pubmed/24753429 SM, Wang K, Mungall CJ, et al. Improved 52. Lyne R, Sullivan J, Butano D, Contrino S, exome prioritization of disease genes through Heimbach J, Hu F, et al. Cross-organism cross-species phenotype comparison. analysis using InterMine. Genesis [Internet]. Genome Res [Internet]. 2014 Feb [cited 2014 2015 Aug [cited 2016 Oct 21];53(8):547–60. Jul 12];24(2):340–8. Available from: Available from: http://www.pubmedcentral.nih.gov/articlerende http://www.ncbi.nlm.nih.gov/pubmed/26097192 r.fcgi?artid=3912424&tool=pmcentrez&rendert 53. Elsik CG, Tayal A, Diesh CM, Unni DR, Emery ype=abstract ML, Nguyen HN, et al. Hymenoptera Genome Smedley D, Schubach M, Jacobsen JOB, Database: integrating genome annotations in bioRxiv preprint posted online Nov. 3, 2016; doi:M, http://dx.doi.org/10.1101/055756 . The copyright holder for this preprint (which was not Res Köhler S, first Zemojtel T, Spielmann et al. A HymenopteraMine. Nucleic Acids peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. Whole-Genome Analysis Framework for [Internet]. 2016 Jan 4 [cited 2016 Oct Effective Identification of Pathogenic 21];44(D1):D793-800. Available from: Regulatory Variants in Mendelian Disease. Am http://www.ncbi.nlm.nih.gov/pubmed/26578564 J Hum Genet [Internet]. 2016 Sep [cited 2016 54. Livingston KM, Bada M, Baumgartner WA, Sep 22];99(3):595–606. Available from: Hunter LE, Galperin M, Rigden D, et al. http://linkinghub.elsevier.com/retrieve/pii/S000 KaBOB: ontology-based semantic integration 2929716302786 of biomedical databases. BMC Bioinformatics Bello SM, Smith CL, Eppig JT. Allele, [Internet]. 2015 Dec 23 [cited 2016 Oct phenotype and disease data at Mouse 21];16(1):126. Available from: Genome Informatics: improving access and http://www.biomedcentral.com/1471analysis. Mamm Genome [Internet]. 2015 Aug 2105/16/126 [cited 2016 Aug 19];26(7–8):285–94. Available 55. Grubb SC, Bult CJ, Bogue MA. Mouse from: phenome database. Nucleic Acids Res http://www.ncbi.nlm.nih.gov/pubmed/26162703 [Internet]. 2014 Jan [cited 2015 May Hayamizu TF, Baldock RA, Ringwald M. 3];42(Database issue):D825-34. Available Mouse anatomy ontologies: enhancements from: and tools for exploring and integrating http://www.pubmedcentral.nih.gov/articlerende biomedical data. Mamm Genome [Internet]. r.fcgi?artid=3965087&tool=pmcentrez&rendert 2015 Oct [cited 2016 Aug 19];26(9–10):422– ype=abstract 30. Available from: 56. Mulligan MK, Mozhui K, Prins P, Williams http://www.ncbi.nlm.nih.gov/pubmed/26208972 Summary RW. GeneNetwork – A Toolbox for Köhler S, Doelken SC, Ruef BJ, Bauer S, Systems Genetics. Syst Genet Methods Mol Washington N, Westerfield M, et al. Biol. 2016;9. Construction and accessibility of a cross57. Bastian F, Parmentier G, Roux J, Moretti S, species phenotype ontology along with gene Laudet V, Robinson-Rechavi M. Bgee: annotations for biomedical research. Integrating and Comparing Heterogeneous F1000Research [Internet]. 2013 Jan [cited Transcriptome Data Among Species. In: Data 2014 Aug 19];2:30. Available from: Integration in the Life Sciences. 2008. p. 124– http://www.pubmedcentral.nih.gov/articlerende 31. r.fcgi?artid=3799545&tool=pmcentrez&rendert 58. McGary KL, Park TJ, Woods JO, Cha HJ, ype=abstract Wallingford JB, Marcotte EM. Systematic Hayamizu TF, de Coronado S, Fragoso G, discovery of nonobvious human disease Sioutos N, Kadin JA, Ringwald M. The mousemodels through orthologous phenotypes. Proc human anatomy ontology mapping project. Natl Acad Sci U S A [Internet]. 2010 Apr 6 Database (Oxford) [Internet]. 2012 Jan [cited [cited 2015 May 1];107(14):6544–9. Available 2015 Apr 12];2012:bar066. Available from: from: http://www.pubmedcentral.nih.gov/articlerende http://www.pubmedcentral.nih.gov/articlerende r.fcgi?artid=3308156&tool=pmcentrez&rendert r.fcgi?artid=2851946&tool=pmcentrez&rendert ype=abstract ype=abstract Kalderimis A, Lyne R, Butano D, Contrino S, 59. Woods JO, Singh-Blom UM, Laurent JM, Lyne M, Heimbach J, et al. InterMine: McGary KL, Marcotte EM. Prediction of geneextensive web services for modern biology. phenotype associations in humans, mice, and Nucleic Acids Res [Internet]. 2014 Jul [cited plants using phenologs. BMC Bioinformatics 2016 Oct 21];42(Web Server issue):W468-72. [Internet]. 2013 Jan [cited 2015 May 60. 61. 62. 63. 64. 65. 66. 20];14:203. Available from: http://www.pubmedcentral.nih.gov/articlerende r.fcgi?artid=3704650&tool=pmcentrez&rendert ype=abstract Laraway B. Comparative analysis of semantic similarity and gene orthology tools for identification of gene candidates for human diseases [Internet]. Oregon Health & Science University; 2015. Available from: http://digitalcommons.ohsu.edu/etd/3741 Brush M, Shefchek K, Haendel MA. SEPIO: A bioRxiv preprint first posted online doi: http://dx.doi.org/10.1101/055756 . The copyright holder for this preprint (which was not Semantic Model for Nov. the3, 2016; Integration and peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license. Analysis of Scientific Evidence. In: International Conference on Biomedical Ontology and BioCreative (ICBO BioCreative 2016) [Internet]. Corvallis, Oregon; 2016. Available from: http://icbo.cgrb.oregonstate.edu/ Rehm HL, Berg JS, Brooks LD, Bustamante CD, Evans JP, Landrum MJ, et al. ClinGen-the Clinical Genome Resource. N Engl J Med [Internet]. 2015 Jun 4 [cited 2016 Mar 22];372(23):2235–42. Available from: http://www.pubmedcentral.nih.gov/articlerende r.fcgi?artid=4474187&tool=pmcentrez&rendert ype=abstract Bandrowski A, Brinkman R, Brochhausen M, Brush MH, Bug B, Chibucos MC, et al. The Ontology for Biomedical Investigations. PLoS One [Internet]. 2016;11(4):e0154556. Available from: http://www.ncbi.nlm.nih.gov/pubmed/27128319 Diehl AD, Meehan TF, Bradford YM, Brush MH, Dahdul WM, Dougall DS, et al. The Cell Ontology 2016: enhanced content, modularization, and ontology interoperability. J Biomed Semantics. 2016 Jul 4;7(1):44. Available from: https://www.ncbi.nlm.nih.gov/pubmed/2737765 2 Attrill H, Falls K, Goodman JL, Millburn GH, Antonazzo G, Rey AJ, et al. FlyBase: establishing a Gene Group resource for Drosophila melanogaster. Nucleic Acids Res. 2016 Jan 4;44(D1):D786-92. Available from: https://www.ncbi.nlm.nih.gov/pubmed/?term=P MID%3A26467478 Howe KL, Bolt BJ, Cain S, Chan J, Chen WJ, Davis P, et al. WormBase 2016: expanding to enable helminth genomic research. Nucleic Acids Res. 2016 Jan 4;44(D1):D774-80. Available from: https://www.ncbi.nlm.nih.gov/pubmed/2657857 2
© Copyright 2026 Paperzz