Identifying and Resolving Inconsistencies in Biological Pathway Resources
Lucy L Wang†, John H Gennari, Neil F Abernethy
Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA
Overview
• Biological pathways are useful tools for studying genetic and molecular interactions
• Integrating knowledge from multiple pathway resources allows us to take advantage of the strengths of different resources and gives us access to more complementary data, helping to improve our analysis and understanding of biology
• Inconsistencies exist between popular pathway knowledge bases, making integration difficult
• Although standards in pathway exchange (BioPAX, SBML, etc.) exist, there are still differences in knowledge representation and content
Typology of Mismatches
Annotation: different entity string names and/or identifiers
Existence: missing or extraneous physical entities, reactions, or relationships. Example: is H+ part of the equation?
Reaction semantics: different representation of participants, direction, and stoichiometry
Granularity: entities or processes represented in different degrees of detail
Assertion: a resource explicitly contradicts a statement in another resource
Level of evidence: different external citations are used to support a statement
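To make the first three categories concrete, here is a minimal Python sketch that compares two records of the same reaction and reports which mismatch types apply. It is an illustration, not the method used in this work: the `Participant` record, the `compare` function, and the left/right sides assigned to each entity are hypothetical, and participants are paired by exact name, which is a simplification because annotation mismatches can also affect names. The ChEBI identifiers are the ones shown in the annotation example on this poster. Granularity, assertion, and level of evidence are not covered.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Set


class Mismatch(Enum):
    ANNOTATION = "annotation"
    EXISTENCE = "existence"
    SEMANTICS = "reaction semantics"


@dataclass
class Participant:
    # Hypothetical record for one reaction participant in one resource.
    name: str
    identifier: Optional[str]
    side: str          # "left" or "right", as in the BioPAX properties
    count: int = 1     # stoichiometric coefficient


def compare(a: List[Participant], b: List[Participant]) -> Set[Mismatch]:
    """Return the mismatch types found between two records of one reaction."""
    by_name_a = {p.name: p for p in a}
    by_name_b = {p.name: p for p in b}
    found = set()
    # Existence: a participant is present in one record but not the other.
    if set(by_name_a) != set(by_name_b):
        found.add(Mismatch.EXISTENCE)
    for name in set(by_name_a) & set(by_name_b):
        pa, pb = by_name_a[name], by_name_b[name]
        # Annotation: same entity name, different identifier.
        if pa.identifier != pb.identifier:
            found.add(Mismatch.ANNOTATION)
        # Reaction semantics: different side or stoichiometry.
        if pa.side != pb.side or pa.count != pb.count:
            found.add(Mismatch.SEMANTICS)
    return found


# Toy records; the sides chosen here are assumptions for illustration only.
resource_a = [
    Participant("ATP", "ChEBI:30616", "right"),
    Participant("pyruvate", "ChEBI:15361", "right"),
    Participant("H+", None, "left"),
]
resource_b = [
    Participant("ATP", "ChEBI:15422", "left"),
    Participant("pyruvate", "ChEBI:32816", "left"),
]

print(sorted(m.value for m in compare(resource_a, resource_b)))
# ['annotation', 'existence', 'reaction semantics']
```

Returning a set of categories rather than a single label reflects that one reaction can show several mismatch types at once.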
Glycolysis Pathway in Four Resources
[Figure: the glycolysis pathway in four resources, labeled 1 to 4.]
Entities related to the reaction by the BioPAX left property are red, and entities related by the BioPAX right property are green.
Example Mismatches in One Reaction
Annotation: ATP appears as ChEBI:30616 or ChEBI:15422; pyruvate appears as ChEBI:15361 or ChEBI:32816. Which is the correct identifier?
Semantics: LEFT/RIGHT versus RIGHT/LEFT. Left or right? Reactant or product?
Assertion: 2 ATP + Pyruvate OR 1 ATP + Pyruvate. Which is correct?
Granularity: One reaction or two?
Evidence: Are citations… Contemporary or historical? Few or many?
* author example
References & Acknowledgements
1. P. Romero, J. Wagg, M. Green, D. Kaiser, M. Krummenacker, and P. Karp. Computational prediction of human metabolic pathways from the complete human genome. Genome Biology, 6(R2):1–17, 2004.
2. M. Kanehisa and S. Goto. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res, 28:27–30, 2000.
3. P. Thomas, M. Campbell, A. Kejariwal, et al. PANTHER: a library of protein families and subfamilies indexed by function. Genome Res, 13:2129–2141, 2003.
4. D. Croft, A. Mundo, R. Haw, et al. The Reactome pathway knowledgebase. Nucleic Acids Res, 42(Database issue):D472–477, 2013.
This research was funded by the NLM under training grant T15LM007442. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
† Corresponding author: [email protected]
Future Work
• Identify inconsistencies in exemplar resources. We have begun by identifying annotation mismatches between two resources: HumanCyc and Reactome.
• Perform computational alignment of portions of pathway resources using the typology of mismatches as guidance
• Use known mismatches to identify areas of uncertainty in existing resources and guide integration of content for pathway analysis applications
Matches from HumanCyc proteins to Reactome proteins
HumanCyc               Name match   No name match   Total
Identifier match       1264         759             2023
No identifier match    55           659             714
Total                  1319         1418            2737
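The HumanCyc-to-Reactome match counts above form a two-by-two cross-tabulation (identifier match by name match). The Python sketch below shows one way such a table could be computed. The poster does not state how matches were determined, so the criteria here (exact identifier equality and case-insensitive name equality) are assumptions, as are the `cross_tabulate` function and the toy records with placeholder identifiers. The `reported` counts are copied from the table.

```python
from collections import Counter


def cross_tabulate(source, target):
    """Count source proteins by (identifier match, name match) against target."""
    target_ids = {p["id"] for p in target}
    target_names = {p["name"].lower() for p in target}
    table = Counter()
    for p in source:
        id_match = p["id"] in target_ids                  # assumed: exact equality
        name_match = p["name"].lower() in target_names    # assumed: case-insensitive
        table[(id_match, name_match)] += 1
    return table


# Toy records with placeholder identifiers and names (not real data).
humancyc = [
    {"id": "X1", "name": "alpha"},
    {"id": "X2", "name": "beta"},
    {"id": "X3", "name": "gamma"},
    {"id": "X4", "name": "delta"},
]
reactome = [
    {"id": "X1", "name": "Alpha"},
    {"id": "X2", "name": "b-subunit"},
    {"id": "Y9", "name": "gamma"},
]
toy = cross_tabulate(humancyc, reactome)

# Counts reported in the table, keyed by (identifier match, name match).
reported = Counter({
    (True, True): 1264,
    (True, False): 759,
    (False, True): 55,
    (False, False): 659,
})
print(sum(reported.values()))  # 2737
```

Under these assumptions, the (True, False) cell is where annotation mismatches in names would surface: proteins that share an identifier across the two resources but not a name.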