NETTAB 2007 - A Semantic Web for Bioinformatics
University of Pisa, Italy, June 12, 2007

Tutorial T5
The Unified Medical Language System (UMLS) and the Semantic Web

Olivier Bodenreider
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA

Outline
- Information integration in biomedicine
  - Some issues: naming, normalization, mapping
  - Semantic Web perspective
- Terminology integration in biomedicine
  - Unified Medical Language System
- Some differences between UMLS and SW

Information integration in biomedicine
Some issues: naming, normalization, mapping

1. Naming
- Many biomedical entities have several names (synonymy)
  - Drug names
  - Gene names
  - Disease names
  - …
- A given name may refer to several different entities (polysemy)
  - Nail (body part)
  - Nail (medical device)

Brand names for paracetamol (acetaminophen)
http://en.wikipedia.org/wiki/List_of_paracetamol_brand_names

Names for dystrophin
http://www.ncbi.nlm.nih.gov/sites/entrez

Names for renal cell carcinoma
http://www.clininfo.co.uk/clue5/clue.htm

Entity recognition
- Identifying biomedical entities in text
  - Named entity recognition
  - Tagging "mentions"
  - Semantic annotation
- Supported by terminology
  - Collects the names used in the domain
  - Often incompletely
- Example: BioCreative
  - 1A – Gene name identification
  - 2GM – Gene mention tagging
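Note: a minimal sketch of terminology-supported mention tagging, added for illustration. It assumes a toy name-to-entity dictionary built from the names mentioned above; `TERMINOLOGY` and `tag_mentions` are hypothetical names, and real systems (such as those evaluated in BioCreative) use far richer methods. Synonymy shows up as several names pointing to one entity, polysemy as one name pointing to several.

```python
import re

# Toy terminology (illustrative, not taken from any real vocabulary release):
# each name maps to every entity it may denote.
TERMINOLOGY = {
    "paracetamol": ["acetaminophen (drug)"],
    "acetaminophen": ["acetaminophen (drug)"],
    "renal cell carcinoma": ["renal cell carcinoma (disease)"],
    "nail": ["nail (body part)", "nail (medical device)"],
}


def tag_mentions(text, terminology=TERMINOLOGY):
    """Return (start, end, surface form, candidate entities) for each match.

    Longer names are tried first so that a multi-word name wins over any
    shorter name nested inside it.
    """
    covered = [False] * len(text)
    mentions = []
    for name in sorted(terminology, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            start, end = match.span()
            if any(covered[start:end]):
                continue  # overlaps a longer mention found earlier
            covered[start:end] = [True] * (end - start)
            mentions.append((start, end, match.group(0), terminology[name]))
    return sorted(mentions, key=lambda m: m[0])


if __name__ == "__main__":
    sentence = "Paracetamol was given before the nail was removed."
    for start, end, surface, candidates in tag_mentions(sentence):
        status = "ambiguous" if len(candidates) > 1 else "unique"
        print(f"{surface!r} [{start}:{end}] -> {candidates} ({status})")
```

A mention with more than one candidate (here "nail") is tagged but not yet resolved; choosing among the candidates requires context that a plain dictionary lookup does not have.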
2. Normalization
- Biomedical entities are identified by unique identifiers in various terminology systems
- Resolve names into identifiers (in a given namespace)
- Supported (in part) by terminology resources
- Example: BioCreative
  - 1B and 2GN – Gene Normalization

Identifier for paracetamol (acetaminophen)

  Source                             Identifier   Name
  Master Drug Data Base, Medi-Span   5005         Acetaminophen
  FDA National Drug Code Directory   50612        PARACETAMOL
  FDA Structured Product Labels      362O9ITL9D   ACETAMINOPHEN
  First DataBank NDDF Plus           001605       Acetaminophen
  SNOMED Clinical Terms              90332006     Acetaminophen (product)
  SNOMED Clinical Terms              387517004    Acetaminophen (substance)
  VA National Drug File              4017513      ACETAMINOPHEN

Source: RxNorm database (5/3/2007)

Identifier for dystrophin
http://www.ncbi.nlm.nih.gov/sites/entrez

Identifier for renal cell carcinoma
http://www.clininfo.co.uk/clue5/clue.htm

3. Mapping / Integration
- Identify equivalent entities across systems (across namespaces)
  - Shared identifiers
  - Existing mappings (e.g., SNOMED CT to ICD-9-CM)
  - Ontology alignment techniques (lexical + structural)
- Align equivalent entities
  - Pairwise: mapping
  - More broadly: integration
- Forms the basis for information integration in the Semantic Web (mashups)

Identifier for paracetamol (acetaminophen)
Same identifiers as in the table above, plus:

  Source                             Identifier   Name
  RxNorm                             161          Acetaminophen

Identifier for dystrophin
http://www.ncbi.nlm.nih.gov/sites/entrez

Identifier for renal cell carcinoma
Identifiers shown: 645875019, 379798014, 379801015, 379800019, 379797016, 379803017, 379802010
http://www.clininfo.co.uk/clue5/clue.htm

Information integration in biomedicine
Semantic Web perspective

HCLS mashup
[Figure: resources combined in the demo: PDSPki, Gene Ontology, NeuronDB, Reactome, BAMS, Antibodies, NC Annotations, Entrez Gene, Allen Brain Atlas, MeSH, Mammalian Phenotype, SWAN, AlzGene, BrainPharm, PubChem, Homologene, Publications]
http://esw.w3.org/topic/HCLS/HCLSIG_DemoHomePage_HCLSIG_Demo

Shared identifiers
Example: GO

HCLS mashup
[Figure, repeated on three slides: the same resources annotated with the kinds of entities each one describes, such as genes, proteins, interactions, processes (GO), molecular functions, cell components, neuroanatomy, neurotransmitters, chemicals, drugs, phenotypes, diseases and PubMed IDs]

HCLS mashup
http://esw.w3.org/topic/HCLS/HCLSIG_DemoHomePage_HCLSIG_Demo

From glycosyltransferase to congenital muscular dystrophy
- LARGE (EG:9215) has_molecular_function GO:0008375 acetylglucosaminyltransferase
- GO:0008375 acetylglucosaminyltransferase isa (through GO:0008194 and GO:0016758) GO:0016757 glycosyltransferase
- LARGE (EG:9215) has_associated_phenotype MIM:608840 Muscular dystrophy, congenital, type 1D

Terminology integration in biomedicine
Unified Medical Language System

Motivation
- Started in 1986
- National Library of Medicine
- "Long-term R&D project"

«[…] the UMLS project is an effort to overcome two significant barriers to effective retrieval of machine-readable information.
• The first is the variety of ways the same concepts are expressed in different machine-readable sources and by different people.
• The second is the distribution of useful information among many disparate databases and systems.»

Unified Medical Language System
- SPECIALIST Lexicon (lexical resources)
  - 200,000 lexical items
  - Part of speech and variant information
- Metathesaurus (terminological resources)
  - 5M names from over 100 terminologies
  - 1M concepts
  - 16M relations
- Semantic Network (ontological resources)
  - 135 high-level categories
  - 7000 relations among them

Addison's disease
Example

Addison's disease in medical vocabularies
- Synonyms
  - Addisonian syndrome
  - Bronzed disease
  - Addison melanoderma
  - Asthenia pigmentosa
  - Primary adrenal deficiency
  - Primary adrenal insufficiency
  - Primary adrenocortical insufficiency
  - Chronic adrenocortical insufficiency
- The names derive from the eponym, from symptoms, and from clinical variants

Organize terms
- Synonymous terms clustered into a concept
- Preferred term
- Unique identifier (CUI)

  Source      Code        Name
  MeSH        D000224     Addison Disease
  MedDRA      10036696    Primary hypoadrenalism
  ICD-10      E27.1       Primary adrenocortical insufficiency
  SNOMED CT   363732003   Addison's disease (disorder)

  Concept: C0001403 Addison's disease

Metathesaurus
Concepts (2007AA)

  Level     Identifier   Count      Definition
  Concept   CUI          ~ 1.4 M    Set of synonymous concept names
  Term      LUI          ~ 4.9 M    Set of normalized names
  String    SUI          ~ 5.5 M    Distinct concept name
  Atom      AUI          ~ 6.8 M    Concept name in a given source

  Atom                             String     Term       Concept
  A0000001 headache (source 1)     S0000001   L0000001   C0000001
  A0000002 headache (source 2)     S0000001   L0000001   C0000001
  A0000003 Headache (source 1)     S0000002   L0000001   C0000001
  A0000004 Headache (source 2)     S0000002   L0000001   C0000001
  A0000005 Cephalgia (source 1)    S0000003   L0000002   C0000001
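Note: the identifier levels and the clustering of source codes under a CUI can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the UMLS release format or API: `Atom`, `group_atoms`, `translate` and `CODES_BY_CUI` are hypothetical names, and the data are only the headache example and the Addison's disease codes from the "Organize terms" slide.

```python
from collections import defaultdict
from typing import NamedTuple


class Atom(NamedTuple):
    aui: str     # atom: a concept name in a given source
    name: str
    source: str
    sui: str     # string: distinct concept name
    lui: str     # term: set of normalized names
    cui: str     # concept: set of synonymous names


# The headache example, with the placeholder identifiers used on the slide.
ATOMS = [
    Atom("A0000001", "headache", "source 1", "S0000001", "L0000001", "C0000001"),
    Atom("A0000002", "headache", "source 2", "S0000001", "L0000001", "C0000001"),
    Atom("A0000003", "Headache", "source 1", "S0000002", "L0000001", "C0000001"),
    Atom("A0000004", "Headache", "source 2", "S0000002", "L0000001", "C0000001"),
    Atom("A0000005", "Cephalgia", "source 1", "S0000003", "L0000002", "C0000001"),
]

# Source codes clustered under C0001403 on the "Organize terms" slide.
CODES_BY_CUI = {
    "C0001403": {
        "MeSH": "D000224",
        "MedDRA": "10036696",
        "ICD-10": "E27.1",
        "SNOMED CT": "363732003",
    },
}


def group_atoms(atoms, level):
    """Group atom identifiers by 'sui', 'lui' or 'cui'."""
    groups = defaultdict(list)
    for atom in atoms:
        groups[getattr(atom, level)].append(atom.aui)
    return dict(groups)


def translate(code, source, target, codes_by_cui=CODES_BY_CUI):
    """Map a code from one source to another by going through the shared CUI."""
    for cui, codes in codes_by_cui.items():
        if codes.get(source) == code:
            return cui, codes.get(target)
    return None, None


if __name__ == "__main__":
    for level in ("sui", "lui", "cui"):
        print(level, group_atoms(ATOMS, level))
    print(translate("D000224", "MeSH", "ICD-10"))
```

Grouping the five atoms gives three strings, two terms and one concept, which mirrors the narrowing from AUI to CUI. The `translate` helper shows why a concept identifier is useful across namespaces: two codes that share neither a name nor an identifier are still linked through the CUI.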
Addison's Disease: Concept
- Semantic type: Disease or Syndrome
- CUI: C0001403
- Sources: SNOMED, MeSH, AOD, Read Codes, …
- Names
  - Addison's Disease
  - ADRENAL INSUFFICIENCY (ADDISON'S DISEASE)
  - ADRENOCORTICAL INSUFFICIENCY, PRIMARY FAILURE
  - Addison melanoderma
  - Melasma addisonii
  - Primary adrenal deficiency
  - Asthenia pigmentosa
  - Bronzed disease
  - Insufficiency, adrenal primary
  - Primary adrenocortical insufficiency
  - Addison's, disease
- Names in other languages
  - Maladie d'Addison - French
  - Addison-Krankheit - German
  - Morbo di Addison - Italian
  - Doença de Addison - Portuguese
  - АДДИСОНОВА БОЛЕЗНЬ - Russian
  - アジソン病 - Japanese
- Definition: A disease characterized by hypotension, weight loss, anorexia, weakness, and sometimes a bronze-like melanotic hyperpigmentation of the skin. It is due to tuberculosis- or autoimmune-induced disease (hypofunction) of the adrenal glands that results in deficiency of aldosterone and cortisol. In the absence of replacement therapy, it is usually fatal.

Metathesaurus
Evolution over time
- Concepts never die (in principle)
  - CUIs are permanent identifiers
- What happens when they do die (in reality)?
  - Concepts can merge or split
  - Resulting in new concepts and deletions
[Timeline figure, 1992 to 2007: Addison's disease, NOS (C0271735) and Addison's disease (C0001403)]

Addison's Disease in the source hierarchies
- SNOMED International: Diseases/Diagnoses > Diseases of the endocrine system > Diseases of the Adrenal Glands > Addison's Disease
- MeSH: Diseases > Endocrine Diseases > Adrenal Gland Diseases > Adrenal Gland Hypofunction > Addison's Disease
- AOD: Endocrine disorder > Adrenal disorder > Adrenal cortical disorder > Adrenal cortical hypofunction > Addison's Disease
- Read Codes: Endocrine disorder > Disorder of adrenal gland > Hypoadrenalism > Adrenal Hypofunction > Corticoadrenal insufficiency > Addison's Disease
- ICD-10: Disorders of other endocrine gland > Other disorders of adrenal gland > Primary adrenocortical insufficiency

Organize concepts
- Inter-concept relationships: hierarchies from the source vocabularies
- Redundancy: multiple paths
- One graph instead of multiple trees (multiple inheritance)
[Figure: several source trees over nodes A to H merged into a single graph]

Organize concepts
[Figure: the SNOMED, MeSH, AOD and Read Codes paths merged into one UMLS graph. Nodes shown: Endocrine Diseases, Adrenal Gland Diseases, Adrenal Cortex Diseases, Hypoadrenalism, Adrenal Gland Hypofunction, Adrenal cortical hypofunction, Addison's Disease]
[Figure: wider view of the same graph. Additional nodes shown: Endocrine System, Abdominal organ, Diseases, Endocrine Glands, Adrenal Glands, Adrenal Dysfunction, Disorders of other endocrine gland, Adrenal Cortex Dysfunction, Adrenal Cortex, Other disorders of adrenal gland, Secondary hypocortisolism, Addison's disease due to autoimmunity]

Semantic Types
[Figure: Semantic Network types linked to Metathesaurus concepts. Semantic types shown: Anatomical Structure; Fully Formed Anatomical Structure; Embryonic Structure; Body Part, Organ or Organ Component; Disease or Syndrome; Pharmacologic Substance; Population Group. Concepts shown: Mediastinum, Saccular Viscus, Angina Pectoris, Esophagus, Heart, Left Phrenic Nerve, Heart Valves, Fetal Heart, Cardiotonic Agents, Tissue Donors]

Source Vocabularies (2007AA)
- 139 source vocabularies
  - 17 languages
- Broad coverage of biomedicine
  - 5.5M names
  - 1.4M concepts
  - 16M relations
- Common presentation

Biomedical terminologies
- General vocabularies
  - anatomy (UWDA, Neuronames)
  - drugs (RxNorm, First DataBank, Micromedex, …)
  - medical devices (UMD, SPN)
- Several perspectives
  - clinical terms (SNOMED CT)
  - information sciences (MeSH, CRISP)
  - administrative terminologies (ICD-9-CM, CPT-4)
  - data exchange terminologies (HL7, LOINC)

Biomedical terminologies (cont'd)
- Specialized vocabularies
  - nursing (NIC, NOC, NANDA, Omaha, PCDS)
  - dentistry (CDT)
  - oncology (NCI Thesaurus, PDQ)
  - psychiatry (DSM, APA)
  - adverse reactions (COSTART, WHO ART, MedDRA)
  - primary care (ICPC)
  - genomics (Gene Ontology, HUGO, OMIM)
- Terminology of knowledge bases (AI/Rheum, DXplain, QMR)

Integrating subdomains
[Figure, shown with and without the vocabulary names: UMLS at the center, connecting SNOMED (clinical repositories), OMIM (genetic knowledge bases), MeSH (biomedical literature), NCBI Taxonomy (model organisms), GO (genome annotations), UWDA (anatomy) and … (other subdomains)]
How do they do that?
- Lexical knowledge
- Semantic pre-processing
- UMLS editors

Lexical knowledge
- Names clustered into C0001621
  - Adrenal gland diseases
  - Adrenal disorder
  - Disorder of adrenal gland
  - Diseases of the adrenal glands

Semantic pre-processing
- Metadata in the source vocabularies
- Tentative categorization
- Positive (or negative) evidence for tentative synonymy relations based on lexical features

Additional knowledge: UMLS editors
[Figure: concepts reviewed by editors: Adrenal Gland Diseases, Adrenal Cortex Diseases, Adrenal Cortex Dysfunction, Hypoadrenalism, Other disorders of adrenal gland, Adrenal Gland Hypofunction, Adrenal cortical hypofunction, Addison's Disease]

UMLS vs. Semantic Web
Similarities, differences and unresolved issues

- Identifying biomedical entities
  - Trans-namespace integration
  - No UMLS-based URIs
- Availability
  - Intellectual property restrictions
  - Application Programming Interface
- Formats
  - RRF vs. SW languages
- UMLS as an ontology?
  - Underspecified semantics

1. Identifying biomedical entities
- Syntax vs. semantics
  - URI, LSID, … vs. reference ontologies
- Integrative resources vs. individual namespaces
  - Unified Medical Language System (UMLS) vs. GO, MeSH, SNOMED, …

No UMLS-based URIs
Syntax
- No officially supported UMLS-based URIs for biomedical entities (e.g., a URI of the form http://umls.org/C0001403)
- Possible alternatives
  - Redirection service (e.g., PURL) http://purl.org/
- Resolution issues: what is expected to be returned?
  - Acknowledgment of existence
  - Preferred term
  - Set of names, relations, … in RDF

No UMLS-based URIs
Semantics
- Potential resources for trans-namespace identification of biomedical entities
  - Clinical medicine: UMLS CUIs
  - [Genomics: Entrez Gene]
- Ontology of biomedical relationships
  - No comprehensive integrative resource available
  - OBO relations
  - UMLS Semantic Network relations
  - GALEN relations

Trans-namespace integration
[Figure: the "Integrating subdomains" diagram, with Addison's disease (disorder) (363732003) in SNOMED and Addison Disease (D000224) in MeSH linked through the UMLS concept C0001403]

Trans-namespace integration
Advantages
- Over shared identifiers (increased recall)
- Over lexical mapping (increased recall + precision)
- Example
  - Names do not match: Addison Disease vs. Primary adrenocortical insufficiency
  - Identifiers do not match: MeSH:D000224 vs. ICD10:E27.1
  - Both map to UMLS:C0001403

Ambiguity resolution
- NF2 may refer to
  - Neurofibromatosis 2 [disease] C0027832
  - Neurofibromin 2 [protein] C0254123
  - Neurofibromatosis 2 gene [gene] C0085114

Other integrative resources
[Screenshot: Entrez record with cross-references HGNC:2928 and HPRD:02303]
http://www.ncbi.nlm.nih.gov/sites/entrez

2. Availability
Intellectual property restrictions
- UMLS: free license required
  http://www.nlm.nih.gov/research/umls/license.html
- Some intellectual property restrictions
- 2/3 of the names freely available (in the US)
  http://www.nlm.nih.gov/research/umls/
- Web browser: username/password required

Availability
Application Programming Interfaces
- Remote server at NLM
- Local application connected through Java RMI
  - Java-based applications
  - Developer's Guide: Chapter 3
  - Set of Java classes (part of the UMLSKS API download)
  - Detailed Javadoc documentation online and with API download
- TCP/IP socket
  - XML-based queries
  - Developer's Guide: Chapter 5
  - XML schema
  - Socket server: host umlsks.nlm.nih.gov, port 8042

Availability
Web Services-based API
- Part of the Knowledge Source Server version 3
  - Portlet-based, customizable
  - WS architecture
- Coming soon
  - Alpha release in July 2007
  - Beta release in November 2007

3. Representation formalism
- UMLS
  - Rich Release Format (RRF)
  - [Original Release Format (ORF)]
  - Support for source transparency
- Semantic Web
  - RDF – Resource Description Framework
  - OWL – Web Ontology Language
  - SKOS – Simple Knowledge Organization System
- Other formats
  - OBO – Open Biological Ontologies
    http://obo.sourceforge.net/browse.html
  - LexGrid
    http://informatics.mayo.edu/LexGrid/
- Converters
  - OBO – OWL
    http://www.bioontology.org/tools/oboinowl/obo_converter.html

UMLS vocabularies available in RDF/OWL
- NCI Thesaurus (OWL)
  - http://ncicb.nci.nih.gov/core/EVS
- Gene Ontology
  - http://www.geneontology.org/
- Repository of biomedical ontologies (OBO, OWL)
  - http://www.bioontology.org/ncbo/faces/index.xhtml

Porting vocabularies to OWL
Experiments
- MeSH
  - Soualmia et al., KR-MED 2004
- Foundational Model of Anatomy (FMA)
  - Golbreich et al., JWS 2006 (OWL DL)
  - Noy and Rubin, SMI Tech Report 2007 (OWL Full)
- UMLS Semantic Network
  - Kashyap and Borgida, ISWC 2003
- UMLS Metathesaurus
  - Cornet and Abu-Hanna, AMIA 2002

UMLS as an "ontology"
[Figure in three layers.
 UMLS Semantic Network (semantic types): Neoplastic Process; Gene or Genome; Amino Acid, Peptide, or Protein; Biologically Active Substance.
 UMLS Metathesaurus (concepts and relations): Neurofibromatoses; Benign neoplasms of cranial nerves; Neurofibromatosis 2 (Type II neurofibromatosis, Bilateral acoustic neurofibromatosis) C0027832; Tumor suppressor genes; NF2 (Neurofibromin 2 gene) C0085114; Tumor suppressor proteins; Merlin (Schwannomin, Neurofibromin 2) C0254123; Merlin, Drosophila.
 External resources: OMIM #101000 NEUROFIBROMATOSIS, TYPE II; NF2. GenBank U49724 Drosophila melanogaster merlin (Dmerlin) mRNA, complete cds.]

UMLS as an ontology
Limitations
- Genes not systematically represented
  - Most gene products and diseases are
- Gene/Gene product-Disease relations
  - Not systematically represented
  - Not explicitly represented (e.g., co-occurrence)
- Cross-references not systematically represented
- Naming conventions (genes)

Underspecified semantics
- Relationship "attribute" not always present
- Relations used to create hierarchies vs. hierarchical relations

Summary

Biomedicine and Semantic Web
- Semantic Web technologies have not been widely adopted yet in biomedicine
  - OBO vs. OWL
  - caBIG vs. Taverna
- Use cases
  - Information/Data integration
- Recent efforts
  - W3C Health Care and Life Sciences Interest Group

UMLS and Semantic Web
- Terminology integration
- Based on existing terminologies
- Trans-namespace, permanent identifiers
- APIs available
- "Proprietary" representation (RRF)
- Some intellectual property restrictions
- Underspecified semantics
- No UMLS-based URIs
- Web Services-based API coming soon
- Can support information integration

Medical Ontology Research
Contact: [email protected]
Web: mor.nlm.nih.gov
Olivier Bodenreider
Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA

UMLS References
- UMLS: umlsinfo.nlm.nih.gov
- UMLS browsers (free, but UMLS license required)
  - Knowledge Source Server: umlsks.nlm.nih.gov
  - Semantic Navigator: http://mor.nlm.nih.gov/perl/semnav.pl
  - RRF browser (standalone application distributed with the UMLS)

UMLS References
- Gentle introduction
  - Bodenreider O. (2004). The Unified Medical Language System (UMLS): Integrating biomedical terminology. Nucleic Acids Research; D267-D270.
    http://mor.nlm.nih.gov/pubs/pdf/2004-nar-ob.pdf
- Seminal paper
  - Lindberg DA, Humphreys BL, McCray AT. (1993). The Unified Medical Language System. Methods Inf Med; 32(4):281-91.

Semantic Web for Health Care and Life Sciences
- W3C Health Care and Life Sciences Interest Group
  - http://www.w3.org/2001/sw/hcls/
- Ruttenberg A, Clark T, Bug W, Samwald M, Bodenreider O, Chen H, Doherty D, Forsberg K, Gao Y, Kashyap V, Kinoshita J, Luciano J, Marshall MS, Ogbuji C, Rees J, Stephens S, Wong GT, Wu E, Zaccagnini D, Hongsermeier T, Neumann E, Herman I, Cheung K-H. Advancing translational research with the Semantic Web. BMC Bioinformatics 2007;8(Suppl 3):S2.
  http://mor.nlm.nih.gov/pubs/pdf/2007-bmc_bioinformatics-ar.pdf
- Demo presented at the WWW2007 conference (May 2007)
  http://esw.w3.org/topic/HCLS/HCLSIG_DemoHomePage_HCLSIG_Demo

Biomedical information integration through RDF
- Biomedical perspective
  - Sahoo S, Zeng K, Bodenreider O, Sheth AP. (2007). From "glycosyltransferase" to "congenital muscular dystrophy": Integrating knowledge from NCBI Entrez Gene and the Gene Ontology. Proceedings of Medinfo (in press).
    http://mor.nlm.nih.gov/pubs/pdf/2007-medinfo-ss.pdf
- Semantic Web perspective
  - Sahoo S, Zeng K, Bodenreider O, Sheth AP. (2007). An experiment in integrating large biomedical knowledge resources with RDF: Application to associating genotype and phenotype information. Proceedings of the workshop on Health Care and Life Sciences Data Integration for the Semantic Web at the 16th International World Wide Web Conference (WWW2007) (in press).
    http://mor.nlm.nih.gov/pubs/pdf/2007-www_hcls-ss.pdf