Large-scale integration of model organism and clinical

bioRxiv preprint first posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
Large-scale integration of model organism and clinical research data can provide a breadth of knowledge not
available from individual sources and can provide contextualization of data back to these sources.
The Monarch Initiative (www.monarchinitiative.org) is a collaborative, open science effort that aims to
semantically integrate genotype-phenotype data from many species and sources in order to support precision
medicine, disease modeling, and mechanistic exploration. Our integrated knowledge graph, analytic tools, and web
services enable diverse users to explore relationships between phenotypes and genotypes across species.
The Monarch Initiative:
An integrative data and analytic platform connecting phenotypes to genotypes across species
1
2
3
4
5
2
Christopher J Mungall , Julie A McMurry , Sebastian Köhler , James P. Balhoff , Charles Borromeo , Matthew Brush ,
1
2
1
2
2
2
6
Seth Carbon , Tom Conlin , Nathan Dunn , Mark Engelstad , Erin Foster , JP Gourdine , Julius O.B. Jacobsen , Dan
2
2
1
1
2
2
5
Keith , Bryan Laraway , Suzanna E. Lewis , Jeremy Nguyen Xuan , Kent Shefchek , Nicole Vasilevsky , Zhou Yuan ,
1
5
7
8
3
2*
Nicole Washington , Harry Hochheiser , Tudor Groza , Damian Smedley , Peter N. Robinson , Melissa A Haendel
1.
2.
3.
4.
5.
6.
7.
8.
Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health & Science University, Portland, USA.
Institute for Medical Genetics and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
RTI International, Research Triangle Park, NC, USA
Department
of Biomedical
Informatics,
University
of Pittsburgh,
Pittsburgh, PA
bioRxiv preprint
first posted
online Nov.
3, 2016;
doi: http://dx.doi.org/10.1101/055756
. The copyright holder for this preprint (which was not
Wellcome Trust Sanger Institute,
Hinxton,
CB10 1SA,
UK available under a CC-BY 4.0 International license.
peer-reviewed)
is Cambridge,
the author/funder.
It is made
Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, NSW 2010, Australia
William Harvey Research Institute, Barts & The London School of Medicine & Dentistry, Queen Mary University of London, Charterhouse Square,
London, EC1M 6BQ, UK
*To whom correspondence should be addressed. Tel: 1-503-407-5970; Email: [email protected]
Present Address: Melissa Haendel, Department of Medical Informatics and Clinical Epidemiology and OHSU Library, Oregon Health &
Science University, Portland, USA.
Abstract
The correlation of phenotypic outcomes with genetic
variation and environmental factors is a core pursuit
in biology and biomedicine. Numerous challenges
impede our progress: patient phenotypes may not
match known diseases, candidate variants may be in
genes that have not been characterized, model
organisms may not recapitulate human or veterinary
diseases, filling evolutionary gaps is difficult, and
many resources must be queried to find potentially
significant genotype-phenotype associations. Since
non-human organisms have proven instrumental in
revealing biological mechanisms, it is necessary to
provide advanced informatics tools to identify
phenotypically relevant disease models. Large-scale
integration of model organism and clinical research
data can provide a breadth of knowledge not
available from individual sources and can provide
contextualization of data back to these sources. The
Monarch Initiative (www.monarchinitiative.org) is a
collaborative, open science effort that aims to
semantically integrate genotype-phenotype data from
many species and sources in order to support
precision
medicine,
disease
modeling,
and
mechanistic exploration. Our integrated knowledge
graph, analytic tools, and web services enable
diverse users to explore relationships between
phenotypes and genotypes across species.
Introduction
A fundamental axiom of biology is that phenotypic
manifestations of an organism are due to interaction
between genotype and environmental factors over
time. In the rapidly advancing era of genomic
medicine, a critical challenge is to identify the genetic
etiologies of Mendelian disease, cancer, and
common and complex diseases, and translate basic
science to better treatments. Currently, available
human data associates less than approximately 51%
of known human coding genes with phenotype data
(based on OMIM (1), ClinVar (2), Orphanet (3), CTD
(4), and the GWAS catalog (5)). See Table 1 for a list
of database abbreviations. This coverage can be
extended to approximately 89% if phenotypic
information from orthologous genes from five of the
most well-studied model organisms is included
(Figure 1).
Figure 1. The phenotype annotation coverage of human
coding genes. Yellow bars show that 51% of those genes
have at least one phenotype association reported in
humans (HPO annotations of OMIM, ClinVar, Orphanet,
CTD, and GWAS). The blue bars show that 58% of human
coding genes have orthologs with causal phenotypic
associations reported in at least one non-human model
(MGI, Wormbase, Flybase, and ZFIN). The green bars
show that 40% of human coding genes have annotations
both in human and in non-human orthologs. There are
phenotypic associations from humans and/or non-human
orthologs that cover 89% of human coding genes.
Similarly, of the 72% of the 3,230 genes in ExAC with
‘near-complete depletion of predicted proteintruncating variants have no currently established
human disease phenotype’ (6), where 88% of these
genes without a human phenotype have a phenotype
in a non-human organism. However, leveraging
these model data for computational use is non-trivial
primarily because the relationships between gene
and disease (7) and between model system and
disease phenotypes (8) are not straightforward.
maps a variety of external data sources and
databases
to
RDF
(Resource
Description
Framework) graphs. RDF provides a flexible way of
modeling a variety of complex datatypes, and allows
entities from different databases to be connected via
common instance or class URIs (Uniform Resource
Indicators). We use relationship types from the
Relation
Ontology
(RO;
https://github.com/oborel/obo-relations)
(17)
and
other vocabularies to connect entities together, along
with a number of Open Biological Ontologies (18)
bioRxiv preprint
posted online
Nov.
3, 2016;
doi: http://dx.doi.org/10.1101/055756
. The
copyright holder
this preprint For
(whichexample,
was not
In recent
years,firstthere
has
been
a
growth
in
the
(OBOs) to
classify
thesefor entities.
a
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
number of genotype-phenotype databases available,
mouse genotype can be related to a phenotype using
covering a diversity of domain areas for human,
the has_phenotype relation (RO:0002200), with the
model organisms, and veterinary species. While
genotype classified using a term from the Genotype
providing quality inventories of the relevant species
Ontology (GENO) (19), and the phenotype classified
and phenotypic data types, most resources are
using the Mammalian Phenotype Ontology (MP) (20).
limited to a single species or limit cross-species
We use the Open Biomedical Annotations (OBAN;
comparison to direct assertions (e.g. Organism X is a
https://github.com/EBISPOT/OBAN) vocabulary to
model of Disease Y) or based upon orthology
associate evidence and provenance metadata with
relations (e.g. organism Y’ is a model of Disease Y
each edge, using the Evidence and Conclusions
due to Y and Y’ being orthologs). While great strides
Ontology (ECO) for types of evidence (21).The
have been made in text-based search engines,
graphs produced by Dipper are available as a
phenotype data remains difficult to search and use
standalone resource in RDF/turtle format at
computationally due to its complexity and in the use
http://data.monarchinitiative.org/ttl.
of different phenotype standards and terminologies.
Such barriers have made linking and integration with
the precision and richness needed for mechanistic
discovery across species a significant challenge (9).
A newer method to aid identifying models of disease
and to discovery underlying mechanisms is to utilize
ontologies to describe the set of phenotypes that
present for a given genotype or disease, what we call
a ‘phenotypic profile’. A phenotypic profile is the
subject of non-exact matching within and across
species using ontology integration and semantic
similarity algorithms (10)(11) in software applications
such as Exomiser (12) and Genomiser (13), and this
approach has been shown assist disease diagnosis
(14–16). The Monarch Initiative uses an ontologybased strategy to deeply integrate genotypeFigure 2. Monarch Data Architecture. Structured and
unstructured data sources are loaded into SciGraph via
phenotype data from many species and sources,
Dipper. Ontologies are also loaded into SciGraph, resulting
thereby enabling computational interrogation of
in a combined knowledge and data graph. Data is
disease models and complex relationships between
disseminated via SciGraph Services, an ontologygenotype and phenotype to be revealed. The
enhanced Solr instance called GOlr, and to the OwlSim
semantic similarity software. Monarch applications and
Monarch Initiative name was chosen because it is a
end users access the services for graph querying,
community effort to create paths for diverse data to
application population, and phenotype matching.
be put to use for disease discovery, not unlike the
navigation routes that a monarch butterfly would
We also import a number of external and in-house
take.
ontologies, for data description and data integration.
As these ontologies are all available from the OBO
Data Architecture
Library in Web Ontology Language (OWL), no
additional transformation is necessary. The combined
The overall data architecture for Monarch is shown in
corpus of graphs ingested using Dipper and from
Figure 2. The bulk of the data integration is carried
ontologies is referred to as the Monarch Knowledge
out using our Data Ingest Pipeline (Dipper) tool
Graph.
(https://github.com/monarch-initiative/dipper), which
A: Data types covered by Monarch data sources C: Mappings to bridging ontologies
2
P/D
genes
variants
orthology
animals
function
interactions
expression
pathways
models
diseases
phenotypes
G
B: Monarch data sources and ontology annotations
Figure 3. Data types,
sources, and the ontologies
Data source
used for their integration
in Monarch
into
the
Monarch
MedGen
edGen
O O
O O ClinVar
knowledge graph. Each
UMLS
S
MeSH
M
SH
data source uses or is
O
O
O O CTD
DOID
mapped to a suite of
O
O
Gene Revs
different
ontologies
or
MONDO
NDO
O O
O
OMIM db
OMIM
vocabularies. These are in
O O HPOA
turn integrated into bridging
Orphanet
O O
O
ORDO
ontologies
for
Genetics
O O
O O GWAS
Elements
of
Morphology
M
bioRxiv preprint first posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756. The copyright holder (GENO),
for this preprint (which wasAnatomy
not
Coriell is the author/funder. It is made available
O O
O O peer-reviewed)
HP under a CC-BY 4.0 International license.
(Uberon/CL), Phenotypes
KEGG
O
O
O
EFO
(UPheno), and Diseases
VT
O O
O
O O O OMIA
UPheno
(MonDO).
O
O O
O
O O
O
O
O
O
O O O O O O p
O
O
O O
O O O p
O
O
O O
O
O O
O O
p
O O O O
O
O
O
O
O
O
O
O
O
AnimalQTLDB
MPD
MMRRC
MP
WP
FBcv
WA
WmBase
FlyBase
FlyBase
O
IMPC
O
O
MGI
O
O
ZFIN
Ensembl
NCBI
UCSC
HGNC
BioGrid
Panther
CL
MA
FBbt
EMAPA
ZP
ZFA
SO
All sources *
{
UBERON
GENO
GENETICS ANATOMY PHENOTYPES DISEASES
O O
Bridging
Ontology
Ontology
mappings
annotations
FALDO
GO
ECO
RO
We contribute
We created
& maintain
Figure 4. Distribution of
phenotypic
annotations
across species in Monarch,
broken down by the top
levels of the phenotype
ontology. The graph can be
interactively explored at
https://monarchinitiative.org
/phenotype/.
Note
that
annotations are currently
dominated
by
human,
mouse, zebrafish and C
elegans (top panel); the
chart is faceted allowing
individual species to be
switched on and off to see
contributions for less datarich species such as
veterinary animals and
monkeys (middle panel).
Clicking
on
a
given
phenotype
text
allows
drilling down to its subtypes
(lower panel).
The data integrated in Monarch wide range of
• The Genotype Ontology (GENO) (19) defines
sources, including human clinical knowledge sources
genotypic elements and bridges the Sequence
as well as genetic and genomic resources covering
Ontology (SO) (27) and FALDO (28). GENO allows
organismal biology. The list of data sources and
the propagation of phenotypes that are annotated
ontologies integrated is shown in Figure 3, with a
differently in different sources to genotypic
species distribution illustrated in Figure 4. The
elements.
knowledge graph is loaded into an instance of a
Entity resolution and unification
SciGraph
database
(https://github.com/SciGraph/SciGraph/),
which
One of the many challenges faced when integrating
embeds and extends a Neo4J database, allowing for
bioinformatics resources is the presence of the same
complex queries and ontology-aware data processing
bioRxiv
preprint
first
posted
online
Nov.
3,
2016;
doi:
http://dx.doi.org/10.1101/055756
. The copyright
holder for thisdesignated
preprint (whichby
was different
not
entity in multiple
databases,
and Named Entitypeer-reviewed)
Recognition.
We provide two
is the author/funder. It is made available under a CC-BY 4.0 International license.
identifiers
(29).
This
problem
is
compounded
by the
public endpoints for client software to query these
different ways the same identifier can be written,
services:
using different prefixes or no prefix at all. Taking a
https://scigraph-ontology.monarchinitiative.org/scigraph/docs
Monarch page for a single gene, for example
(for ontology access) and
‘fibrinogen
gamma
chain’,
FGG,
https://scigraph-data.monarchinitiative.org/scigraph/docs
(https://monarchinitiative.org/gene/NCBIGene:2266).
(for ontology plus data access).
Monarch has integrated data from a variety of
human, model organism, and other biomedical
These SciGraph instances provide powerful graph
sources such as OMIM (1), Orphanet (3), ClinVar (2),
querying capabilities over the complete knowledge
HPO (30), KEGG (31), CTD (4), MyGene (32),
graph. Many of the common query patterns are
BioGrid (33), and via orthology in PantherDB (34) we
executed in advance and stored in an Apache Solr
also incorporate Fgg gene data from ZFIN (35) and
index, making use of the Gene Ontology ‘GOlr’
from MGI (36). No two of these sources represents
indexing strategy, allowing for fast queries of
the identifier for FGG in precisely the same way. As
ontology-indexed associations.
part of our data ingest process, we normalize all
identifiers using a curated set of database prefixes.
Finally, we also load a subset of the graph into an
These have a defined mapping to an http URI. These
OwlSim instance, which provides phenotype
curated prefixes have been deposited in the Prefix
matching services as well as the ability to perform
Commons (https://github.com/prefixcommons), which
fuzzy phenotype searches based on a phenotype
similarly contains identifier prefixes used within the
profile. We also provide phenotype matching services
Gene Ontology (37) and Bio2RDF (38).
via the Global Alliance for Genomes and Health
(GA4GH) Matchmaker Exchange (MME) API MME
In post-processing equivalent identifiers, we perform
(22), available at https://mme.monarchinitiative.org.
clique-merging
(https://github.com/SciGraph/SciGraph/wiki/PostMany of the data sources we integrate make use of
processors). We take all edges labeled with either
their own terminologies and ontologies. We
the owl:sameAs or owl:equivalentClasses property
aggregate
these
into
a
unified
ontology
and calculate equivalence cliques, based on the
(https://github.com/monarch-initiative/monarchsymmetric and transitive nature of these properties.
ontology/) and make use of bridging ontologies and
We then merge these cliques together, taking a
our curated integrative ontologies to connect these
designated ‘clique leader’ (for instance, NCBI for
together. In particular:
genes) and mapping all edges in the monarch graph
• The Uber-anatomy ontology (Uberon) bridges
such that they point to a clique leader.
species-specific and clinical anatomical and tissue
ontologies (23)
In-house curation
• The unified phenotype ontology bridges model
organism and human phenotype ontologies and
In addition to ingest of external sources and
terminologies, using techniques described in
ontologies, we perform in-house data and ontology
(24)(25)
curation. For curation of ontology-based genotype• The Monarch Merged Disease Ontology (MonDO)
phenotype
associations
(including
diseaseuses a Bayes ontology merging algorithm (26) to
phenotypic profiles), we are transitioning to the
integrate multiple human disease resources into a
WebPhenote
platform
single ontology, and additionally includes animal
(http://create.monarchinitiative.org), which allows a
diseases from OMIA.
variety of disease entities to be connected to
phenotypic descriptors. We also make use of text
mining
to
create
seed
disease-phenotype
associations using the Bio-Lark toolkit (39), which are
then manually curated. Most recently we have
performed a large-scale annotation of PubMed to
extract common disease-phenotype associations
(40). Most of the in-house curation work involves
making smaller resources with free text description of
phenotypic information computable, for example, the
Online Mendelian Inheritance in Animals (OMIA)
resource, with whom we have been collaborating to
support this curation (41).
clinician wishes to search for either known
diseases that have a similar presentation, or
model organism genes that demonstrate
homologous phenotypes when the gene is
perturbed
• Researcher looking for diseases that have similar
phenotypic feature to a newly identified model
organism mutant identified in a screen
• Researchers or clinicians who need to identify
potentially informative phenotyping assays for
differential diagnosis or to identify candidate genes
bioRxiv preprint first posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
Quality control Features
External resources and datasets that are
incorporated into Monarch are evaluated before
incorporation into the Dipper pipeline – we primarily
integrate high-quality curated resources. For all
ontologies we bring in, we apply automated
reasoning to detect inconsistencies between different
ontologies. For each release, we perform high-level
checks on each integrated resource to ensure no
errors in the extraction process occurred, but we do
not perform in-depth curation checks of integrated
resources. Each release happens once every one to
two months.
Integrated information on entities of interest
We provide overview pages for entities such as
genes, diseases, phenotypes, genotypes, variants,
and publications. Each page highlights the
provenance of the data from the diverse clinical,
model organism, and non-model organism sources.
These pages can be found either via search (see
below) or through an entity resolver. For example,
the URL https://monarchinitiative.org/OMIM:266510
will redirect to a page about the disease ‘Peroxisome
biogenesis disorder type 3B’ from the OMIM
resource, showing its relationships to other content
within the Monarch knowledge graph, such as
phenotypes and genes associated with the disease.
We make use of MonDO (the Monarch merged
disease ontology (26)) to group similar diseases
together. Figure 5 shows an example page for
Marfan syndrome with related phenotype, gene,
model and variant data.
In order to measure annotation richness, we have
also created an annotation sufficiency meter web
service
(42)
available
at
https://monarchinitiative.org/page/services;
this
service determines whether a given phenotype profile
for any organism is sufficiently broad and deep to be
of diagnostic utility. The sufficiency score can be
displayed as a five star scale as in PhenoTips (43)
and in the Monarch web portal (see below) to aid
curation or data entry, and can also be used to
suggest additional phenotypic assays to be
performed – whether in a patient or in a model
organism.
Monarch Web Portal
The Monarch portal is designed with a number of
different use cases in mind, including:
• A researcher interested in a human gene, its
phenotypes, and the phenotypes of orthologs in
model organisms and other species
• Patients or researchers interested in a particular
disease or phenotype (or groups of these),
together with information on all implicated genes
• A clinical scenario in which a patient has an
undiagnosed disease showing a spectrum of
phenotypes, with no definitive candidate gene
demonstrated by sequencing; in this scenario the
Basic Search
The portal provides different means of searching over
integrated content. In cases where a user is
interested in a specific disease, gene, phenotype
etc., these can usually be found via autocomplete.
Site-wide synonym-aware text search can also be
used to find pages of interest. Because the
knowledgebase combines information from multiple
species, entities such as genes often have
ambiguous symbols. We provide species information
to help disambiguate in a search.
Search by phenotype profile
One of the most innovative features of Monarch is
the ability to query within and across species to look
for diseases or organisms that share a set of similar
but non-exact set of phenotypes (phenotypic profile).
This feature uses a semantic similarity algorithm
available
from
the
OWLsim
package
(http://owlsim.org). Users can launch searches
against
specific
targets:
organisms,
sets of named gene models, or against all models
and diseases available in the Monarch repository.
The Monarch Analyze Phenotypes interface
(https://monarchinitiative.org/analyze/phenotypes)
allows the user to build up a ‘cart’ of phenotypes, and
then perform a comparison against phenotypes
related to genes and diseases. Results are ranked
according to closeness of match, partitioned by
species, and are displayed as both a list and in the
Phenogrid widget (Figure 6).
A. Overview of a page
bioRxiv preprint first posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756. The copyright holder for this preprint (which was not
Each. entity in the Monarch
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license
data corpus has a
corresponding page
Definition,
cross-refs,
For disease
& phenotype
pages, a
browsable
hierarchy
All 375 phenotype
annotations,
comprising 475
associations with
this family of
several related
diseases
32 human genes,
comprising 57
associations with
this family of diseases
(and one gene
associated with the
veterinary disease
equivalent)
Tabs on each of these pages
list associations between
the page entity and other
entities in the corpus
90 animal
models
and cell lines
recapitulating
features of
Marfan-related
disorders
435 variants
associated
with Marfanrelated
disorders
Computed phenotypic similarity
between Marfan-related
disorders and other diseases
and genes. See PhenoGrid
figure for a description.
B. Abstracted view of page content
Subject
Relationship
type
between subject
correand object varies
sponds
by tab (for
to the
variants tab, the
tab type relationships
(eg pheno-include “is
types,
pathogenic
genes,
for” and “is likely
etc)
pathogenic for”
For pages Species in
where the which the
page entity association
is actually has been
a group (eg. observed
of diseases
or group of
phenotypes)
the specific
sub-entities
are named
Subject Relationship Object
Species
Trace the
Links
evidence
to the
eg. traceable source
author
literature
statement, where
asserted,
available
inferred
based on
combinatorial
evidence,
Curated
source/
database
from which
associations
were
integrated
Filter the data based
on relevant features
eg. species
Species
Homo sapiens (63)
Mus musculus (19)
Danio rerio (3)
C elegans (3)
Bos taurus (2)
Evidence
Database
Type Reference source
ClinVar
OMIA
Export
any data
Figure 5. Annotated Monarch webpage for Marfan and Marfan Related syndrome. This group of syndromic
diseases has a number of different associations spanning multiple entity types – disease phenotypes,
implicated human genes, variants and animal models and other model systems. An abstraction of the contents
and features of the tabs is shown in the lower panel. Actual contents of the tabs are best viewed in the context
of the web app at https://monarchinitiative.org/DOID:14323.
Each square is colored according to
how closely matched the pair of
phenotypes is
Names of the entities (diseases, genes ...)
that are phenotypically similar to Marfan;
below each label is its corresponding
aggregate similarity score
Marfan Syndrome
Phenotype Similarity Comparison
Default view shows the
top 10results for each
species. Click on the
species name to see
only results for that
species (top 100)
bioRxiv preprint first posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756. The copyright holder for this preprint (which was not
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
All of the phenotypes
associated with Marfan
Syndrome in humans
(filtered to show only
those shared in
phenotypically similar
diseases and models)
Hover for
phenotype
term info
Hover over a match to see
the pair of phenotypes
that were matched between
the disease and the matched
model as well as the
intermediate phenotype
that both have in common,
if applicable.
The mouse gene Tgfb2 has
a phenotype profile
that is 64% similar to that of
Marfan. Click on the gene symbol
to explore the specific Tgfb2 variants
and Tgfb2 genotypes in which the
matching phenotypes were described.
Figure 6. Partial screenshot of PhenoGrid showing Marfan syndrome. PhenoGrid shows input phenotypes in
rows, models in columns, and cell contents color-coded with greater saturation indicating greater similarity.
Disease phenotypes are shown as rows, and phenotypically matching human diseases and model organism
genes are shown as columns – the saturation of a cell correlates with strength if phenotypic match. Mouseover tooltips highlight diseases associated with a selected phenotype (or vice-versa), or details (including
similarity scores) of any match between a phenotype and a model. User controls support the selection of
alternative sort orders, similarity metrics, and displayed organism(s) (mouse, human, zebrafish, or the 10 most
similar models for each). Here we see all diseases or genes that exhibit ‘Hypoplasia of the mandible’ with the
matching mouse gene Tfgb2. Actual PhenoGrid data is best viewed in the context of the web app at
https://monarchinitiative.org/Orphanet:284993#compare.
Note matches do not need to be exact – here the mouse phenotype of ‘small mandible’ (Mouse Phenotype
Ontology) has a high scoring match to “micrognathia” (Human Phenotype Ontology) based on the fact that
both phenotypes are related to “small mandible” (Mouse Phenotype Ontology). Advanced PhenoGrid features
(not displayed) include the ability to alter the scoring and sorting methods, as well as zoomed-out map-style
navigation.
implemented. PhenoGrid is available as an openPhenogrid
source widget suitable for integration in third-party
Given a set of input phenotypes, as associated with a
web sites, such as for model organism databases as
patient or a disease, Monarch phenotypic profile
done in the International Mouse Phenotyping
similarity calculations can generate results involving
Consortium (IMPC) or clinical comparison tools.
hundreds of diseases and models. The PhenoGrid
Download and installation instructions are available
visualization widget (Figure 6) provides an overview
on the Monarch Initiative web site.
of these similarity results, implemented using the D3
javascript library (44). Phenotypes and models are
Text Annotation
frequently too numerous to fit on the initial display;
The Monarch annotation service allows a user to
thus scrolling, dragging, and filtering have been
enter free text (for example, a paper abstract or a
clinical narrative) and perform an automated
annotation on this text, with entities in the text
marked up with terms from the Monarch knowledge
graph, such as genes, diseases, and phenotypes.
Once the text is marked up, the user has the option
of turning the recognized phenotype terms into a
phenotype profile, and performing a profile search, or
to link to any of the entity pages identified in the
annotation. This tool is also available via services.
A comparison between Monarch and existing
resources is warranted. InterMine is an open-source
data warehouse system used for disseminating data
from large, complex biological heterogeneous data
sources (51). InterMine provides sophisticated web
services to support denormalized query and has
been used to improve query and data access to
model organism databases (52) and non-model
organisms (53). InterMine is a federated approach
Inferring causative variants
where individual databases each can adopt and
preprint(45)
first posted
Nov.recently,
3, 2016; doi:Genomiser
http://dx.doi.org/10.1101/055756
The copyright
holder for this preprint data
(which was
not but
The bioRxiv
Exomiser
andonline
more
populate .their
own object-oriented
model,
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
tools (46) make use
of the Monarch platform and
can also align on certain aspects such as having
phenotype matching algorithms to rank putative
genomic data models aligned using the SO.
causative variants using a combined variant and
However, as yet genotype and phenotype modeling
phenotype score. These tools have been used to
is not aligned, and Intermine does not provide
diagnose patients as part of the NIH Undiagnosed
disease matching or phenotypic search. We are
Diseases Project (14) and are the first examples of
currently working with InterMine to achieve
using model organism phenotype data to aid rare
harmonization in this area. Other resources, such as
disease diagnostics.
KaBOB (54) and Bio2RDF (38) semantically integrate
various resources into large triplestores. Bio2RDF
typically retains the source vocabulary of the
Discussion
integrated resources, whereas KaBOB is more
similar to Monarch in that it maps OBO ontologies
The Monarch Initiative provides a system to organize
(18). Other data integration approaches include the
and harmonize the heterogeneous genotypeBioThings API, exemplified by the MyVariant system
phenotype data found across clinical and model and
(32) which aggregates variant data from multiple
non-model organism resources (such as veterinary
sources. We are currently working with the BioThings
species), creating a unified overview of this rich
API developers to integrate these different
landscape of data sources. Some of the challenges
approaches within the Dipper framework. Monarch is
we have had to address are that each resource
unique in that it aims to align both genotypic and
shares data via different mechanisms and uses a
phenotypic modeling across species and sources.
different data model. It is particularly important to
note that each organism annotates phenotypic data
Future directions to different aspects of the genotype – one resource
Future directions include bringing in phenotypic data
might be to a gene, another an allele, another to a
from
specialized
sources
and
databases,
set of alleles, a full genotype or a SNP. This not only
incorporating a wider range of datatypes, and to
makes data integration difficult, but it also means that
extend and improve analytic methods for making
computation
over
the
genotype-phenotype
cross-species inferences. Currently the core of
associations must be done with care. Similar issues
Monarch includes primarily qualitative phenotypes
at MGI have been described (47). In addition, since
described using terms from existing phenotypic
most anatomy, phenotype, and disease ontologies
vocabularies – we are starting to bring in more
describe the biology of one species, it has
quantitative data, from sources such as the MPD (55)
traditionally been quite difficult to ‘map’ across
and GeneNetwork (56), in addition to expression data
species. Some examples are the Human Phenotype
annotated to Uberon in BgeeDb (57). We are also
Ontology (HPO) (30) and the Mouse Anatomy
extending our phenotypic search methods to
Ontology (48). Monarch uses four species-neutral
incorporate
Phenologs,
phenotypic
groupings
ontologies
that
unify
their
species-specific
inferred on the basis of orthologous genes (58,59).
counterparts (as shown in Figure 3): GENO for
Early comparisons suggest that addition of
genotypes (19), UPheno for phenotypes (49),
phenologs to our suite of tools to enable genotypeUBERON for anatomy (23), and MonDO for diseases
phenotype inquiry across species will extend our
(26). Prior efforts to map or integrate species-specific
reach in a synergistic manner (60). We therefore plan
anatomical ontologies (24,50), for example, have
to implement this type of approach into the Monarch
been utilized in the construction of these speciestool suite and website. One of the most important
neutral ontologies. The end result is a translational
realizations we came across in constructing the
platform that allows a unified view of human, model
Monarch platform was the need to better represent
and non-model organism biology.
scientific
evidence
of
genotype-phenotype
associations. We are currently developing a Scientific
Evidence and Provenance Information Ontology
(SEPIO) (61) in collaboration with the Evidence and
Conclusion Ontology consortium (21) and ClinGen
(62) in order to classify associations as
complementary, confirmatory, or contradictory.
SEPIO will also integrate biological assays from the
Ontology of Biomedical Investigations (63). Monarch
has also been collaborating with the US National
Cancer Institute’s Thesaurus (NCIT) team to
integrate cancer phenotypes. Finally, Monarch has
been working in the context of the Global Alliance for
Genomics and Health (GA4GH) to develop a formal
phenotype
exchange
format
(www.phenopackets.org) that can aid phenotypic
data sharing in numerous contexts such as clinical,
model organism research, biodiversity, veterinary,
and evolutionary biology.
bioRxiv of
preprint
first posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756. The copyright holder for this preprint (which was not
Glossary
Acronyms
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
Acronym
Name
URL
Ref
PMID
NAR DB
Bgee
http://bgee.org/
57
https://thebiogrid.org/
CL
BgeeDb
Biological General
Interaction Datasets.
Cell Ontology
33
PMID:25428363
D470-D478
ClinVar
ClinVar
http://obofoundry.org/ontology/cl.html
64
PMID:27377652
42/D1/D950
https://www.ncbi.nlm.nih.gov/clinvar/
2
PMID:26582918
D862-8
CTD
Clinical Toxicology Database
http://ctdbase.org/
4
PMID:27651457
D914-20
ECO
Evidence and Conclusions Ontology
http://obofoundry.org/ontology/eco.html
21
PMID:25052702
ExAC
Exome Aggregation Consortium
http://exac.broadinstitute.org/
6
PMID:27535533
FlyBase
FlyBase
http://flybase.org
65
PMID:26467478
D786-92
GeneNetwork Gene Network
http://genenetwork.org
56
GENO
Genotype Ontology
https://github.com/monarch-initiative/GENO-ontology/ 19
GO
Gene Ontology
http://geneontology.org
37
PMID:25428369
D1049-56
GWAS
GWAS Catalog
https://www.ebi.ac.uk/gwas/
5
PMID:24316577
D1001-6
HP
http://human-phenotype-ontology.org/
30
PMID:24217912
D966-D974
http://www.kegg.jp/
31
PMID:24214961
D199-205
MGI
Human Phenotype Ontology
Kyoto Encyclopedia of Genes
Genomes
Mouse Genome Informatics
36
PMID:26238262
MonDO
Monarch Merged Disease Ontology
MP
Mammalian Phenotype Ontology
http://www.informatics.jax.org/
https://github.com/monarch-initiative/monarchdisease-ontology/
http://obofoundry.org/ontology/mp.html
20
PMID:26238262
MPD
Mouse phenome database
http://phenome.jax.org/
55
PMID:24243846
MyGene
MyGene
http://mygene.info
32
PMID:27154141
OMIA
Online Mendelian Inheritance in Animals
http://omia.angis.org.au/home/
41
OMIM
Online Mendelian Inheritance in Man
http://omim.org
1
PMID:25428349
OrphaNet
Portal for rare diseases and orphan drugs
http://www.orpha.net
3
PMID:22422702
Panther
PantherDB
http://pantherdb.org
34
PMID:26578592
RO
http://obofoundry.org/ontology/ro.html
17
PMID: 15892874
SO
Relation Ontology
Scientific Evidence
Information Ontology
Sequence Ontology
Uberon
BioGrid
KEGG
SEPIO
Repository
and
for
and
Provenance
26
D825-34
D789-98
D336-42
https://github.com/monarch-initiative/SEPIO-ontology/ 61
http://www.sequenceontology.org/
27
PMID:26229585
Uber-anatomy ontology
http://uberon.org
23
PMID:25009735
Upheno
Unified Phenotype Ontology
https://github.com/obophenotype/upheno/
49
PMID:24358873
WormBase
WormBase
http://wormbase.org
66
PMID:26578572
ZFIN
Zebrafish Information Resource
http://zfin.org
35
PMID:26097180
D774-80
Funding
2014
Jan;42(Database
issue):D1001-6.
Available
from:
This work was funded by NIH #1R24OD011883 and
http://www.ncbi.nlm.nih.gov/pubmed/24316577
6.
Lek M, Karczewski KJ, Minikel E V., Samocha
Wellcome Trust grant [098051]. We are grateful to
KE, Banks E, Fennell T, et al. Analysis of
the
NIH
Undiagnosed
Disease
Program:
protein-coding genetic variation in 60,706
HHSN268201300036C, HHSN268201400093P and
humans. Nature [Internet]. 2016 Aug 17 [cited
to the Phenotype RCN: NSF-DEB-0956049. We are
2016 Oct 21];536(7616):285–91. Available
also grateful to NCI/Leidos #15X143, BD2K
from:
U54HG007990-S2 (Haussler; GA4GH) & BD2K PAhttp://www.nature.com/doifinder/10.1038/natur
e19057
15-144-U01 (Kesselman; FaceBase). JNY, SC, SEL
bioRxiv preprint first posted online Nov. 3, 2016; doi: http://dx.doi.org/10.1101/055756
. The copyright
holder for this preprint
(which
was not path
7.
Strohman
R. Maneuvering
in the
complex
and CJM’s contributions
are
also
supported
by
the
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
from genotype to phenotype. Science (80- )
Director, Office of Science, Office of Basic Energy
[Internet]. 2002/04/27. 2002;296(5568):701–3.
Sciences, of the U.S. Department of Energy under
Available
from:
Contract No. DE- AC02-05CH11231.
http://www.ncbi.nlm.nih.gov/pubmed/11976445
8.
Houle D, Govindaraju DR, Omholt S.
Acknowledgments
Phenomics: the next challenge. Nat Rev Genet
We thank members of the Undiagnosed Disease
[Internet]. 2010/11/19. 2010;11(12):855–66.
Program, the International Mouse Phenotyping
Available
from:
Consortium, the NCI Semantic Infrastructure team,
http://www.ncbi.nlm.nih.gov/pubmed/21085204
and NIF/SciCrunch for their contributions.
9.
McMurry JA, Köhler S, Washington NL, Balhoff
JP, Borromeo C, Brush M, et al. Navigating the
References
Phenotype Frontier: The Monarch Initiative.
1.
Amberger JS, Bocchini CA, Schiettecatte F,
Genetics [Internet]. 2016 Aug [cited 2016 Aug
Scott AF, Hamosh A. OMIM.org: Online
13];203(4):1491–5.
Available
from:
Mendelian Inheritance in Man (OMIM®), an
http://www.ncbi.nlm.nih.gov/pubmed/27516611
online catalog of human genes and genetic
10.
Smedley D, Oellrich A, Köhler S, Ruef B,
disorders. Nucleic Acids Res [Internet]. 2015
Westerfield M, Robinson P, et al. PhenoDigm:
Jan 28 [cited 2015 Apr 1];43(Database
analyzing curated annotations to associate
issue):D789-98.
Available
from:
animal models with human diseases.
http://nar.oxfordjournals.org/content/43/D1/D7
Database (Oxford) [Internet]. 2013 Jan [cited
89.short
2013 Nov 4];2013:bat025. Available from:
2.
Landrum MJ, Lee JM, Benson M, Brown G,
http://www.pubmedcentral.nih.gov/articlerende
Chao C, Chitipiralla S, et al. ClinVar: public
r.fcgi?artid=3649640&tool=pmcentrez&rendert
archive of interpretations of clinically relevant
ype=abstract
variants. Nucleic Acids Res [Internet]. 2016
11.
Washington NL, Haendel M a, Mungall CJ,
Jan
4;44(D1):D862-8.
Available
from:
Ashburner M, Westerfield M, Lewis SE.
http://www.ncbi.nlm.nih.gov/pubmed/26582918
Linking human diseases to animal models
3.
Rath A, Olry A, Dhombres F, Brandt MM,
using ontology-based phenotype annotation.
Urbero B, Ayme S. Representation of rare
PLoS Biol [Internet]. 2009 Nov [cited 2012 Jul
diseases in health information systems: the
30];7(11):e1000247.
Available
from:
Orphanet approach to serve a wide range of
http://www.pubmedcentral.nih.gov/articlerende
end users. Hum Mutat [Internet]. 2012 May
r.fcgi?artid=2774506&tool=pmcentrez&rendert
[cited 2015 Feb 23];33(5):803–8. Available
ype=abstract
from:
12.
Robinson PN, Kohler S, Oellrich A, Genetics
http://www.ncbi.nlm.nih.gov/pubmed/22422702
SM, Wang K, Mungall CJ, et al. Improved
4.
Davis AP, Grondin CJ, Johnson RJ, Sciaky D,
exome prioritization of disease genes through
King BL, McMorran R, et al. The Comparative
cross-species
phenotype
comparison.
Toxicogenomics Database: update 2017.
Genome Res [Internet]. 2014 Feb [cited 2015
Nucleic Acids Res [Internet]. 2016 Sep 19
Feb
18];24(2):340–8.
Available
from:
[cited 2016 Oct 24]; Available from:
http://www.pubmedcentral.nih.gov/articlerende
http://www.ncbi.nlm.nih.gov/pubmed/27651457
r.fcgi?artid=3912424&tool=pmcentrez&rendert
5.
Welter D, MacArthur J, Morales J, Burdett T,
ype=abstract
Hall P, Junkins H, et al. The NHGRI GWAS
13.
Smedley D, Schubach M, Jacobsen JOBOB,
Catalog, a curated resource of SNP-trait
Köhler S, Zemojtel T, Spielmann M, et al. A
associations. Nucleic Acids Res [Internet].
14.
15.
16.
17.
18.
19.
20.
Whole-Genome Analysis Framework for
r.fcgi?artid=4378007&tool=pmcentrez&rendert
Effective
Identification
of
Pathogenic
ype=abstract
Regulatory Variants in Mendelian Disease. Am
21.
Chibucos MC, Mungall CJ, Balakrishnan R,
J Hum Genet [Internet]. 2016 Sep 3 [cited
Christie KR, Huntley RP, White O, et al.
2016 Aug 27];99(3):595–606. Available from:
Standardized description of scientific evidence
http://dx.doi.org/10.1016/j.ajhg.2016.07.005
using the Evidence Ontology ( ECO ).
Bone WP, Washington NL, Buske OJ, Adams
Database [Internet]. 2014 Jul 22 [cited 2014
DR, Davis J, Draper D, et al. Computational
Jul
22];2014(0):1–11.
Available
from:
evaluation of exome sequence data using
http://database.oxfordjournals.org/content/201
human and model organism phenotypes
4/bau075.full
improves diagnostic efficiency. Genet Med
22.
Mungall CJ, Washington NL, Nguyen-Xuan J,
bioRxiv
preprint first
posted
online
Nov.
3, 2016;
doi: http://dx.doi.org/10.1101/055756
. The copyright
holder for
preprint (which
[Internet].
2015
Nov
12
[cited
2015
Nov
16];
Condit
C, Smedley
D,thisKöhler
S, etwas
al.not
Use of
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
Available
from:
model organism and disease databases to
http://www.ncbi.nlm.nih.gov/pubmed/26562225
support matchmaking for human disease gene
Zemojtel T, Kohler S, Mackenroth L, Jager M,
discovery. Hum Mutat [Internet]. 2015 Oct
Hecht J, Krawitz P, et al. Effective diagnosis of
[cited 2015 Dec 19];36(10):979–84. Available
genetic disease by computational phenotype
from:
analysis of the disease-associated genome.
http://www.ncbi.nlm.nih.gov/pubmed/26269093
Sci Transl Med [Internet]. 2014 Sep 3 [cited
23.
Haendel MA, Balhoff JP, Bastian FB,
2014
Sep
4];6(252):252ra123-252ra123.
Blackburn DC, Blake JA, Bradford Y, et al.
Available
from:
Unification of multi-species vertebrate anatomy
http://www.ncbi.nlm.nih.gov/pubmed/25186178
ontologies for comparative biology in Uberon.
Wright CF, Fitzgerald TW, Jones WD, Clayton
J Biomed Semantics [Internet]. 2014 Jan [cited
S, McRae JF, van Kogelenberg M, et al.
2015 Apr 30];5(1):21. Available from:
Genetic diagnosis of developmental disorders
http://www.jbiomedsem.com/content/5/1/21
in the DDD study: a scalable analysis of
24.
Mungall C, Gkoutos G, Smith C, Haendel M,
genome-wide research data. Lancet (London,
Lewis S, Ashburner M. Integrating phenotype
England) [Internet]. 2015 Apr 4 [cited 2016 Jul
ontologies across multiple species. Genome
4];385(9975):1305–14.
Available
from:
Biol [Internet]. 2010;11(1):R2. Available from:
http://www.ncbi.nlm.nih.gov/pubmed/25529582
http://genomebiology.com/2010/11/1/R2
Smith B, Ceusters W, Klagges B, Köhler J,
25.
Köhler S, Doelken SC, Ruef BJ, Bauer S,
Kumar A, Lomax J, et al. Relations in
Washington N, Westerfield M, et al.
biomedical ontologies. Genome Biol [Internet].
Construction and accessibility of a cross2005 [cited 2016 Oct 27];6(5):R46. Available
species phenotype ontology along with gene
from:
annotations
for
biomedical
research.
http://www.ncbi.nlm.nih.gov/pubmed/15892874
F1000Research [Internet]. 2013 Feb 1 [cited
Smith B, Ashburner M, Rosse C, Bard J, Bug
2013
Mar
6];2.
Available
from:
W, Ceusters W, et al. The OBO Foundry:
http://f1000research.com/articles/2-30/v1
coordinated evolution of ontologies to support
26.
Mungall CJ, Koehler S, Robinson P, Holmes I,
biomedical data integration. Nat Biotechnol.
Haendel M. k-BOOM: A Bayesian approach to
2007 Nov;25(11):1251–5.
ontology structure inference, with applications
Brush M, Mungall CJ, Washington NL,
in
disease
ontology
construction.
In:
Haendel MA. What’s in a Genotype?: An
Phenotype Day, ISMB [Internet]. 2016 [cited
Ontological Characterization for Integration of
2016
Oct
27].
Available
from:
Genetic Variation Data. In: International
http://phenoday2016.bio-lark.org/pdf/2.pdf
Conference on Biomedical Ontology 2013
27.
Mungall CJ, Batchelor C, Eilbeck K. Evolution
[Internet]. 2013. Available from: http://ceurof the Sequence Ontology terms and
ws.org/Vol-1060/icbo2013_submission_60.pdf
relationships. J Biomed Inform [Internet]. 2011
Smith CL, Eppig JT. Expanding the
Feb [cited 2015 May 3];44(1):87–93. Available
mammalian phenotype ontology to support
from:
automated exchange of high throughput
http://www.pubmedcentral.nih.gov/articlerende
mouse phenotyping data generated by larger.fcgi?artid=3052763&tool=pmcentrez&rendert
scale mouse knockout screens. J Biomed
ype=abstract
Semantics [Internet]. 2015 Mar 25 [cited 2015
28.
Bolleman JT, Mungall CJ, Strozzi F, Baran J,
Mar
26];6(1):11.
Available
from:
Dumontier M, Bonnal RJP, et al. FALDO: a
http://www.pubmedcentral.nih.gov/articlerende
semantic standard for describing the location
29.
30.
31.
32.
33.
34.
35.
of nucleotide and protein feature annotation. J
http://www.ncbi.nlm.nih.gov/pubmed/26097180
Biomed Semantics [Internet]. 2016 Dec 13
36.
Eppig JT, Richardson JE, Kadin JA, Ringwald
[cited 2016 Aug 19];7(1):39. Available from:
M, Blake JA, Bult CJ. Mouse Genome
http://jbiomedsem.biomedcentral.com/articles/
Informatics (MGI): reflecting on 25 years.
10.1186/s13326-016-0067-z
Mamm Genome [Internet]. 2015 Aug [cited
McMurry J, Muilu J, Dumontier M, Hermjakob
2016 Oct 23];26(7–8):272–84. Available from:
H, Conte N, Gormanns P, et al. 10 Simple
http://www.ncbi.nlm.nih.gov/pubmed/26238262
rules for design, provision, and reuse of
37.
The Gene Ontology Consortium. Gene
identifiers for web-based life science data.
Ontology Consortium: going forward. Nucleic
2015 Oct 2 [cited 2016 May 31]; Available
Acids Res [Internet]. 2014 Nov 26 [cited 2015
from: http://zenodo.org/record/31765
Jan
5];43(Database
issue):D1049-56.
bioRxiv
preprint
posted online
Nov.
3, 2016; doi:
http://dx.doi.org/10.1101/055756
. The copyright holder for this preprint (which was not from:
Köhler
S, first
Doelken
SC,
Mungall
CJ,
Bauer
S,
Available
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
Firth H V, Bailleul-Forestier I, et al. The Human
http://www.ncbi.nlm.nih.gov/pubmed/25428369
Phenotype Ontology project: linking molecular
38.
Dumontier M, Callahan A, Cruz-Toledo J,
biology and disease through phenotype data.
Ansell P, Emonet V, Belleau F, et al. Bio2RDF
Nucleic Acids Res [Internet]. 2014 Jan [cited
release 3: a larger connected network of linked
2015 Mar 11];42(Database issue):D966-74.
data for the life sciences. Proceedings of the
Available
from:
2014 International Conference on Posters &
http://www.pubmedcentral.nih.gov/articlerende
Demonstrations Track - Volume 1272. CEURr.fcgi?artid=3965098&tool=pmcentrez&rendert
WS.org; 2014. p. 401–4.
ype=abstract
39.
Groza T, Köhler S, Doelken S, Collier N,
Kanehisa M, Goto S, Sato Y, Kawashima M,
Oellrich A, Smedley D, et al. Automatic
Furumichi M, Tanabe M. Data, information,
concept recognition using the Human
knowledge and principle: back to metabolism
Phenotype Ontology reference and test suite
in KEGG. Nucleic Acids Res [Internet]. 2014
corpora. Database (Oxford) [Internet]. 2015
Jan [cited 2016 Oct 27];42(Database
Jan [cited 2015 Mar 3];2015. Available from:
issue):D199-205.
Available
from:
http://www.pubmedcentral.nih.gov/articlerende
http://www.ncbi.nlm.nih.gov/pubmed/24214961
r.fcgi?artid=4343077&tool=pmcentrez&rendert
Xin J, Mark A, Afrasiabi C, Tsueng G, Juchler
ype=abstract
M, Gopal N, et al. High-performance web
40.
Groza T, Köhler S, Moldenhauer D, Vasilevsky
services for querying gene and variant
N, Baynam G, Zemojtel T, et al. The Human
annotation. Genome Biol [Internet]. 2016 May
Phenotype Ontology: Semantic Unification of
6 [cited 2016 May 8];17(1):91. Available from:
Common and Rare Disease. Am J Hum Genet
http://www.pubmedcentral.nih.gov/articlerende
[Internet]. 2015 Jun 24 [cited 2015 Jun
r.fcgi?artid=4858870&tool=pmcentrez&rendert
30];97(1):111–24.
Available
from:
ype=abstract
http://www.ncbi.nlm.nih.gov/pubmed/26119816
Chatr-Aryamontri A, Breitkreutz B-J, Oughtred
41.
Faculty of Veterinary Science U of S
R, Boucher L, Heinicke S, Chen D, et al. The
http://omia. angis. org. au. Online Mendelian
BioGRID interaction database: 2015 update.
Inheritance in Animals.
Nucleic Acids Res [Internet]. 2015 Jan [cited
42.
Washington N, Haendel M, Köhler S. How
2016 Oct 24];43(Database issue):D470-8.
good is your phenotyping? Methods for quality
Available
from:
assessment. In: Phenoday2014Bio-LarkOrg
http://www.ncbi.nlm.nih.gov/pubmed/25428363
[Internet]. 2013. p. 1–4. Available from:
Mi H, Poudel S, Muruganujan A, Casagrande
http://phenoday2014.bio-lark.org/pdf/6.pdf
JT, Thomas PD. PANTHER version 10:
43.
Girdea M, Dumitriu S, Fiume M, Bowdin S,
expanded protein families and functions, and
Boycott KM, Chénier S, et al. PhenoTips:
analysis tools. Nucleic Acids Res [Internet].
patient phenotyping software for clinical and
2016 Jan 4 [cited 2016 Oct 27];44(D1):D336research use. Hum Mutat [Internet]. 2013 Aug
42.
Available
from:
[cited 2015 Mar 30];34(8):1057–65. Available
http://www.ncbi.nlm.nih.gov/pubmed/26578592
from:
Ruzicka L, Bradford YM, Frazer K, Howe DG,
http://www.ncbi.nlm.nih.gov/pubmed/23636887
Paddock H, Ramachandran S, et al. ZFIN, The
44.
Bostock M, Ogievetsky V, Heer J. D3: Datazebrafish model organism database: Updates
Driven Documents. IEEE Trans Vis Comput
and new directions. Genesis [Internet]. 2015
Graph [Internet]. 2011 Dec 1 [cited 2015 Feb
Aug [cited 2016 Oct 23];53(8):498–509.
12];17(12):2301–9.
Available
from:
Available
from:
http://ieeexplore.ieee.org/articleDetails.jsp?arn
45.
46.
47.
48.
49.
50.
51.
umber=6064996
Available
from:
Robinson PN, Kohler S, Oellrich A, Genetics
http://www.ncbi.nlm.nih.gov/pubmed/24753429
SM, Wang K, Mungall CJ, et al. Improved
52.
Lyne R, Sullivan J, Butano D, Contrino S,
exome prioritization of disease genes through
Heimbach J, Hu F, et al. Cross-organism
cross-species
phenotype
comparison.
analysis using InterMine. Genesis [Internet].
Genome Res [Internet]. 2014 Feb [cited 2014
2015 Aug [cited 2016 Oct 21];53(8):547–60.
Jul
12];24(2):340–8.
Available
from:
Available
from:
http://www.pubmedcentral.nih.gov/articlerende
http://www.ncbi.nlm.nih.gov/pubmed/26097192
r.fcgi?artid=3912424&tool=pmcentrez&rendert
53.
Elsik CG, Tayal A, Diesh CM, Unni DR, Emery
ype=abstract
ML, Nguyen HN, et al. Hymenoptera Genome
Smedley D, Schubach M, Jacobsen JOB,
Database: integrating genome annotations in
bioRxiv
preprint
posted online
Nov.
3, 2016; doi:M,
http://dx.doi.org/10.1101/055756
. The copyright holder for this
preprint (which
was not Res
Köhler
S, first
Zemojtel
T,
Spielmann
et
al.
A
HymenopteraMine.
Nucleic
Acids
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
Whole-Genome Analysis Framework for
[Internet]. 2016 Jan 4 [cited 2016 Oct
Effective
Identification
of
Pathogenic
21];44(D1):D793-800.
Available
from:
Regulatory Variants in Mendelian Disease. Am
http://www.ncbi.nlm.nih.gov/pubmed/26578564
J Hum Genet [Internet]. 2016 Sep [cited 2016
54.
Livingston KM, Bada M, Baumgartner WA,
Sep 22];99(3):595–606. Available from:
Hunter LE, Galperin M, Rigden D, et al.
http://linkinghub.elsevier.com/retrieve/pii/S000
KaBOB: ontology-based semantic integration
2929716302786
of biomedical databases. BMC Bioinformatics
Bello SM, Smith CL, Eppig JT. Allele,
[Internet]. 2015 Dec 23 [cited 2016 Oct
phenotype and disease data at Mouse
21];16(1):126.
Available
from:
Genome Informatics: improving access and
http://www.biomedcentral.com/1471analysis. Mamm Genome [Internet]. 2015 Aug
2105/16/126
[cited 2016 Aug 19];26(7–8):285–94. Available
55.
Grubb SC, Bult CJ, Bogue MA. Mouse
from:
phenome database. Nucleic Acids Res
http://www.ncbi.nlm.nih.gov/pubmed/26162703
[Internet]. 2014 Jan [cited 2015 May
Hayamizu TF, Baldock RA, Ringwald M.
3];42(Database issue):D825-34. Available
Mouse anatomy ontologies: enhancements
from:
and tools for exploring and integrating
http://www.pubmedcentral.nih.gov/articlerende
biomedical data. Mamm Genome [Internet].
r.fcgi?artid=3965087&tool=pmcentrez&rendert
2015 Oct [cited 2016 Aug 19];26(9–10):422–
ype=abstract
30.
Available
from:
56.
Mulligan MK, Mozhui K, Prins P, Williams
http://www.ncbi.nlm.nih.gov/pubmed/26208972
Summary RW. GeneNetwork – A Toolbox for
Köhler S, Doelken SC, Ruef BJ, Bauer S,
Systems Genetics. Syst Genet Methods Mol
Washington N, Westerfield M, et al.
Biol. 2016;9.
Construction and accessibility of a cross57.
Bastian F, Parmentier G, Roux J, Moretti S,
species phenotype ontology along with gene
Laudet V, Robinson-Rechavi M. Bgee:
annotations
for
biomedical
research.
Integrating and Comparing Heterogeneous
F1000Research [Internet]. 2013 Jan [cited
Transcriptome Data Among Species. In: Data
2014
Aug
19];2:30.
Available
from:
Integration in the Life Sciences. 2008. p. 124–
http://www.pubmedcentral.nih.gov/articlerende
31.
r.fcgi?artid=3799545&tool=pmcentrez&rendert
58.
McGary KL, Park TJ, Woods JO, Cha HJ,
ype=abstract
Wallingford JB, Marcotte EM. Systematic
Hayamizu TF, de Coronado S, Fragoso G,
discovery of nonobvious human disease
Sioutos N, Kadin JA, Ringwald M. The mousemodels through orthologous phenotypes. Proc
human anatomy ontology mapping project.
Natl Acad Sci U S A [Internet]. 2010 Apr 6
Database (Oxford) [Internet]. 2012 Jan [cited
[cited 2015 May 1];107(14):6544–9. Available
2015 Apr 12];2012:bar066. Available from:
from:
http://www.pubmedcentral.nih.gov/articlerende
http://www.pubmedcentral.nih.gov/articlerende
r.fcgi?artid=3308156&tool=pmcentrez&rendert
r.fcgi?artid=2851946&tool=pmcentrez&rendert
ype=abstract
ype=abstract
Kalderimis A, Lyne R, Butano D, Contrino S,
59.
Woods JO, Singh-Blom UM, Laurent JM,
Lyne M, Heimbach J, et al. InterMine:
McGary KL, Marcotte EM. Prediction of geneextensive web services for modern biology.
phenotype associations in humans, mice, and
Nucleic Acids Res [Internet]. 2014 Jul [cited
plants using phenologs. BMC Bioinformatics
2016 Oct 21];42(Web Server issue):W468-72.
[Internet]. 2013 Jan [cited 2015 May
60.
61.
62.
63.
64.
65.
66.
20];14:203.
Available
from:
http://www.pubmedcentral.nih.gov/articlerende
r.fcgi?artid=3704650&tool=pmcentrez&rendert
ype=abstract
Laraway B. Comparative analysis of semantic
similarity and gene orthology tools for
identification of gene candidates for human
diseases [Internet]. Oregon Health & Science
University;
2015.
Available
from:
http://digitalcommons.ohsu.edu/etd/3741
Brush M, Shefchek K, Haendel MA. SEPIO: A
bioRxiv
preprint first
posted online
doi: http://dx.doi.org/10.1101/055756
. The copyright holder for this preprint (which was not
Semantic
Model
for Nov.
the3, 2016;
Integration
and
peer-reviewed) is the author/funder. It is made available under a CC-BY 4.0 International license.
Analysis
of
Scientific
Evidence.
In:
International Conference on Biomedical
Ontology and BioCreative (ICBO BioCreative
2016) [Internet]. Corvallis, Oregon; 2016.
Available
from:
http://icbo.cgrb.oregonstate.edu/
Rehm HL, Berg JS, Brooks LD, Bustamante
CD, Evans JP, Landrum MJ, et al. ClinGen-the Clinical Genome Resource. N Engl J Med
[Internet]. 2015 Jun 4 [cited 2016 Mar
22];372(23):2235–42.
Available
from:
http://www.pubmedcentral.nih.gov/articlerende
r.fcgi?artid=4474187&tool=pmcentrez&rendert
ype=abstract
Bandrowski A, Brinkman R, Brochhausen M,
Brush MH, Bug B, Chibucos MC, et al. The
Ontology for Biomedical Investigations. PLoS
One
[Internet].
2016;11(4):e0154556.
Available
from:
http://www.ncbi.nlm.nih.gov/pubmed/27128319
Diehl AD, Meehan TF, Bradford YM, Brush
MH, Dahdul WM, Dougall DS, et al. The Cell
Ontology
2016:
enhanced
content,
modularization, and ontology interoperability. J
Biomed Semantics. 2016 Jul 4;7(1):44.
Available
from:
https://www.ncbi.nlm.nih.gov/pubmed/2737765
2
Attrill H, Falls K, Goodman JL, Millburn GH,
Antonazzo G, Rey AJ, et al. FlyBase:
establishing a Gene Group resource for
Drosophila melanogaster. Nucleic Acids Res.
2016 Jan 4;44(D1):D786-92. Available from:
https://www.ncbi.nlm.nih.gov/pubmed/?term=P
MID%3A26467478
Howe KL, Bolt BJ, Cain S, Chan J, Chen WJ,
Davis P, et al. WormBase 2016: expanding to
enable helminth genomic research. Nucleic
Acids Res. 2016 Jan 4;44(D1):D774-80.
Available
from:
https://www.ncbi.nlm.nih.gov/pubmed/2657857
2