Introduction to RDF and the Semantic Web for the life sciences

Simon Jupp
Sample Phenotypes and Ontologies Team
European Bioinformatics Institute

Day 2 practical session
• Exploring the EBI RDF platform
• Querying EBI resources
• Federated queries from one SPARQL endpoint to another

Exercise 17
• Explore the EBI RDF platform at http://www.ebi.ac.uk/rdf
• A) On the ChEMBL endpoint, get the ChEMBL activities, assays and targets for the drug Clotrimazole (CHEMBL104)
• B) On the Atlas endpoint, find expression for ENSDARG00000042641 (Cyp51)
• B2) Filter the results to those whose property type contains "organism_part"
• C) On the Reactome endpoint, find pathways that reference Cyp51 (http://purl.uniprot.org/uniprot/Q1JPY5)
• D) Query the UniProt endpoint to describe http://purl.uniprot.org/uniprot/Q1JPY5

Exercise 17 solution A

    PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
    PREFIX chembl_molecule: <http://rdf.ebi.ac.uk/resource/chembl/molecule/>
    SELECT ?activity ?assay ?target ?targetcmpt ?uniprot
    WHERE {
      ?activity a cco:Activity ;
                cco:hasMolecule chembl_molecule:CHEMBL104 ;
                cco:hasAssay ?assay .
      ?assay cco:hasTarget ?target .
      ?target cco:hasTargetComponent ?targetcmpt .
      ?targetcmpt cco:targetCmptXref ?uniprot .
      ?uniprot a cco:UniprotRef
    }

Exercise 17 solution B

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
    PREFIX identifiers: <http://identifiers.org/ensembl/>
    SELECT DISTINCT ?diffValue ?expUri ?propertyType ?propertyValue ?pvalue
    WHERE {
      ?expUri atlasterms:hasAnalysis ?analysis .
      ?analysis atlasterms:hasExpressionValue ?value .
      ?value rdfs:label ?diffValue .
      ?value atlasterms:hasFactorValue ?factor .
      ?factor atlasterms:propertyType ?propertyType .
      ?factor atlasterms:propertyValue ?propertyValue .
      ?value atlasterms:pValue ?pvalue .
      ?value atlasterms:isMeasurementOf ?probe .
      ?probe atlasterms:dbXref identifiers:ENSDARG00000042641 .
    }

Exercise 17 solution B2

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
    PREFIX identifiers: <http://identifiers.org/ensembl/>
    SELECT DISTINCT ?diffValue ?expUri ?propertyType ?propertyValue ?pvalue
    WHERE {
      ?expUri atlasterms:hasAnalysis ?analysis .
      ?analysis atlasterms:hasExpressionValue ?value .
      ?value rdfs:label ?diffValue .
      ?value atlasterms:hasFactorValue ?factor .
      ?factor atlasterms:propertyType ?propertyType .
      ?factor atlasterms:propertyValue ?propertyValue .
      ?value atlasterms:pValue ?pvalue .
      ?value atlasterms:isMeasurementOf ?probe .
      ?probe atlasterms:dbXref identifiers:ENSDARG00000042641 .
      FILTER regex(?propertyType, "organism_part")
    }

Exercise 17 solution C

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX biopax3: <http://www.biopax.org/release/biopax-level3.owl#>
    SELECT DISTINCT ?pathway ?pathwayname
    WHERE {
      ?pathway rdf:type biopax3:Pathway .
      ?pathway biopax3:displayName ?pathwayname .
      ?pathway biopax3:pathwayComponent ?reaction .
      ?reaction rdf:type biopax3:BiochemicalReaction .
      {
        { ?reaction ?rel ?protein . }
        UNION
        {
          ?reaction ?rel ?complex .
          ?complex rdf:type biopax3:Complex .
          ?complex ?comp ?protein .
        }
      }
      ?protein rdf:type biopax3:Protein .
      ?protein biopax3:entityReference <http://purl.uniprot.org/uniprot/Q1JPY5>
    }
    LIMIT 100

Exercise 17 solution D

    DESCRIBE <http://purl.uniprot.org/uniprot/Q1JPY5>

Federated querying
• One of the biggest advantages of SPARQL and triple stores is the ability to federate queries across endpoints
• Data can be integrated at query time with SPARQL

[Diagram labels: GO annotation (UniProt), Expression value (GXA), UniProt protein]

Federated SPARQL

    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
    SELECT DISTINCT ?experiment ?description
    WHERE {
      SERVICE <http://www.ebi.ac.uk/rdf/services/atlas/sparql> {
        ?experiment a atlasterms:Experiment .
        ?experiment dcterms:identifier ?id .
        ?experiment dcterms:description ?description .
        FILTER regex(?id, "E-GEOD-2852")
      }
    }

We can execute this query from any other endpoint using the SPARQL SERVICE keyword: http://tinyurl.com/o9kvvzn

Exercise 19
• Execute the following federated query on:
  1. The UniProt SPARQL endpoint
  2. Your Sesame workbench SPARQL endpoint

    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
    SELECT DISTINCT ?experiment ?description
    WHERE {
      SERVICE <http://www.ebi.ac.uk/rdf/services/atlas/sparql> {
        ?experiment a atlasterms:Experiment .
        ?experiment dcterms:identifier ?id .
        ?experiment dcterms:description ?description .
        FILTER regex(?id, "E-GEOD-2852")
      }
    }

Constructing a federated query
• Basic query to get genes out of our dataset
• How can we integrate this with data from the EMBL-EBI Gene Expression Atlas?

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX mydata: <http://www.mydomain.com/mydata#>
    PREFIX efo: <http://www.ebi.ac.uk/efo/>
    SELECT DISTINCT ?geneid ?label
    WHERE {
      ?result mydata:dbXref ?geneid .
      ?geneid rdfs:label ?label .
    }

Querying the Atlas

Our query:

    SELECT DISTINCT ?geneid ?label
    WHERE {
      ?result mydata:dbXref ?geneid .
      ?geneid rdfs:label ?label .
    }

Example query 3 (http://www.ebi.ac.uk/rdf/services/atlas/sparql):

    SELECT DISTINCT ?diffValue ?expUri ?propertyType ?propertyValue ?pvalue
    WHERE {
      ?expUri atlasterms:hasAnalysis ?analysis .
      ?analysis atlasterms:hasExpressionValue ?value .
      ?value rdfs:label ?diffValue .
      ?value atlasterms:hasFactorValue ?factor .
      ?factor atlasterms:propertyType ?propertyType .
      ?factor atlasterms:propertyValue ?propertyValue .
      ?value atlasterms:pValue ?pvalue .
      ?value atlasterms:isMeasurementOf ?probe .
      ?probe atlasterms:dbXref identifiers:ENSMUSG00000034450 .
    }
    ORDER BY ASC(?pvalue)

Integration point: ?probe atlasterms:dbXref, whose object is a gene URI of the same kind as ?geneid in our data.

Querying the Atlas: our query federated with Example query 3 at the integration point

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX mydata: <http://www.mydomain.com/mydata#>
    PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
    SELECT DISTINCT ?geneid ?label ?probe
    WHERE {
      ?result mydata:dbXref ?geneid .
      ?geneid rdfs:label ?label .
      SERVICE <http://www.ebi.ac.uk/rdf/services/atlas/sparql> {
        ?probe atlasterms:dbXref ?geneid
      }
    }

1st gotcha

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX mydata: <http://www.mydomain.com/mydata#>
    PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
    SELECT ?geneid ?label ?probe ?value
    WHERE {
      ?result mydata:dbXref ?geneid .
      ?geneid rdfs:label ?label .
      SERVICE <http://www.ebi.ac.uk/rdf/services/atlas/sparql> {
        ?value atlasterms:isMeasurementOf ?probe .
        ?probe atlasterms:dbXref ?geneid
      }
    }

This should work, but there is an issue with querying the EBI RDF Platform with this version of Sesame (fix coming soon!).

1st gotcha

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX mydata: <http://www.mydomain.com/mydata#>
    PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
    SELECT ?label ?probe ?value
    WHERE {
      ?result mydata:dbXref <http://identifiers.org/ensembl/ENSMUSG00000024673> .
      <http://identifiers.org/ensembl/ENSMUSG00000024673> rdfs:label ?label .
      SERVICE <http://www.ebi.ac.uk/rdf/services/atlas/sparql> {
        ?value atlasterms:isMeasurementOf ?probe .
        ?probe atlasterms:dbXref <http://identifiers.org/ensembl/ENSMUSG00000024673>
      }
    }

Bind on gene <http://identifiers.org/ensembl/ENSMUSG00000024673>

Exercise 20
• A) Extend the previous query so that the Atlas endpoint also returns the experiment ID and the factors (property values) where Ms4ai (ENSMUSG00000024673) is expressed
• B) Filter those results to include only experiments where the factor contains "liver"

Exercise 20 solution A

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX mydata: <http://www.mydomain.com/mydata#>
    PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
    PREFIX identifiers: <http://identifiers.org/ensembl/>
    SELECT ?label ?expUri ?propertyValue
    WHERE {
      ?result mydata:dbXref identifiers:ENSMUSG00000024673 .
      identifiers:ENSMUSG00000024673 rdfs:label ?label .
      SERVICE <http://www.ebi.ac.uk/rdf/services/atlas/sparql> {
        ?expUri atlasterms:hasAnalysis ?analysis .
        ?analysis atlasterms:hasExpressionValue ?value .
        ?value atlasterms:hasFactorValue ?factor .
        ?factor atlasterms:propertyValue ?propertyValue .
        ?value atlasterms:isMeasurementOf ?probe .
        ?probe atlasterms:dbXref identifiers:ENSMUSG00000024673
      }
    }

Exercise 20 solution B

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX mydata: <http://www.mydomain.com/mydata#>
    PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
    PREFIX identifiers: <http://identifiers.org/ensembl/>
    SELECT ?label ?expUri ?propertyValue
    WHERE {
      ?result mydata:dbXref identifiers:ENSMUSG00000024673 .
      identifiers:ENSMUSG00000024673 rdfs:label ?label .
      SERVICE <http://www.ebi.ac.uk/rdf/services/atlas/sparql> {
        ?expUri atlasterms:hasAnalysis ?analysis .
        ?analysis atlasterms:hasExpressionValue ?value .
        ?value atlasterms:hasFactorValue ?factor .
        ?factor atlasterms:propertyValue ?propertyValue .
        ?value atlasterms:isMeasurementOf ?probe .
        ?probe atlasterms:dbXref identifiers:ENSMUSG00000024673
      }
      FILTER regex(?propertyValue, "liver", "i")
    }

Alzheimer’s use case: EBI RDF platform
• EFO term for Alzheimer’s: EFO_0000249
• Get genes differentially expressed in Alzheimer’s
• Get the proteins encoded by those genes
• Get GO annotations from UniProt for those genes
• Get pathways from Reactome in which those proteins are involved
• Get drugs that target proteins within those pathways

Q1. Get Ensembl genes differentially expressed in Alzheimer’s

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX efo: <http://www.ebi.ac.uk/efo/>
    PREFIX atlas: <http://rdf.ebi.ac.uk/resource/atlas/>
    PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
    SELECT DISTINCT ?expressionValue ?dbXref ?pvalue ?propertyValue
    WHERE {
      ?expUri atlasterms:hasAnalysis ?analysis .
      ?analysis atlasterms:hasExpressionValue ?value .
      ?value rdfs:label ?expressionValue .
      ?value atlasterms:pValue ?pvalue .
      ?value atlasterms:hasFactorValue ?factor .
      ?value atlasterms:isMeasurementOf ?probe .
      ?probe atlasterms:dbXref ?dbXref .
      ?dbXref rdf:type atlasterms:EnsemblDatabaseReference .
      ?factor atlasterms:propertyType ?propertyType .
      ?factor atlasterms:propertyValue ?propertyValue .
      ?factor rdf:type efo:EFO_0000249 .
    }

Q2. Get UniProt proteins for those genes

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX efo: <http://www.ebi.ac.uk/efo/>
    PREFIX atlas: <http://rdf.ebi.ac.uk/resource/atlas/>
    PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
    SELECT DISTINCT ?expressionValue ?dbXref ?pvalue ?propertyValue
    WHERE {
      ?expUri atlasterms:hasAnalysis ?analysis .
      ?analysis atlasterms:hasExpressionValue ?value .
      ?value rdfs:label ?expressionValue .
      ?value atlasterms:pValue ?pvalue .
      ?value atlasterms:hasFactorValue ?factor .
      ?value atlasterms:isMeasurementOf ?probe .
      ?probe atlasterms:dbXref ?dbXref .
      ?dbXref rdf:type atlasterms:UniprotDatabaseReference .
      ?factor atlasterms:propertyType ?propertyType .
      ?factor atlasterms:propertyValue ?propertyValue .
      ?factor rdf:type efo:EFO_0000249 .
    }

Q3. Get UniProt GO annotations for those genes

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX efo: <http://www.ebi.ac.uk/efo/>
    PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
    PREFIX upc: <http://purl.uniprot.org/core/>
    PREFIX identifiers: <http://identifiers.org/ensembl/>
    SELECT DISTINCT ?valueLabel ?goid ?golabel
    WHERE {
      ?expUri atlasterms:hasAnalysis ?analysis .
      ?analysis atlasterms:hasExpressionValue ?value .
      ?value rdfs:label ?expressionValue .
      ?value atlasterms:pValue ?pvalue .
      ?value atlasterms:hasFactorValue ?factor .
      ?value atlasterms:isMeasurementOf ?probe .
      ?probe atlasterms:dbXref ?dbXref .
      ?dbXref rdf:type atlasterms:EnsemblDatabaseReference .
      ?factor atlasterms:propertyType ?propertyType .
      ?factor atlasterms:propertyValue ?propertyValue .
      ?factor rdf:type efo:EFO_0000249 .
      ?value rdfs:label ?valueLabel .
      ?probe atlasterms:dbXref ?uniprot .
      SERVICE <http://beta.sparql.uniprot.org/sparql> {
        ?uniprot a upc:Protein .
        ?uniprot upc:classifiedWith ?keyword .
        ?keyword rdfs:seeAlso ?goid .
        ?goid rdfs:label ?golabel .
      }
    }

Q4. Get pathways from Reactome for those proteins

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX efo: <http://www.ebi.ac.uk/efo/>
    PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
    PREFIX biopax3: <http://www.biopax.org/release/biopax-level3.owl#>
    SELECT DISTINCT ?pathway ?dbXref
    WHERE {
      ?expUri atlasterms:hasAnalysis ?analysis .
      ?analysis atlasterms:hasExpressionValue ?value .
      ?value rdfs:label ?expressionValue .
      ?value atlasterms:pValue ?pvalue .
      ?value atlasterms:hasFactorValue ?factor .
      ?value atlasterms:isMeasurementOf ?probe .
      ?probe atlasterms:dbXref ?dbXref .
      ?dbXref rdf:type atlasterms:UniprotDatabaseReference .
      ?factor atlasterms:propertyType ?propertyType .
      ?factor atlasterms:propertyValue ?propertyValue .
      ?factor rdf:type efo:EFO_0000249 .
      SERVICE <http://www.ebi.ac.uk/rdf/services/reactome/sparql> {
        ?pathway rdf:type biopax3:Pathway .
        ?pathway biopax3:displayName ?pathwayname .
        ?pathway biopax3:pathwayComponent ?reaction .
        ?reaction rdf:type biopax3:BiochemicalReaction .
        {
          { ?reaction ?rel ?protein . }
          UNION
          {
            ?reaction ?rel ?complex .
            ?complex rdf:type biopax3:Complex .
            ?complex ?comp ?protein .
          }
        }
        ?protein rdf:type biopax3:Protein .
        ?protein biopax3:entityReference ?dbXref
      }
    }

Q5. Get drugs that target proteins within those pathways

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX efo: <http://www.ebi.ac.uk/efo/>
    PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
    PREFIX biopax3: <http://www.biopax.org/release/biopax-level3.owl#>
    PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
    SELECT DISTINCT ?dbXrefProt ?pathwayname ?moleculeLabel ?expressionValue ?propertyValue
    WHERE {
      # Get differentially expressed genes (and proteins) where the factor is Alzheimer's disease
      ?value atlasterms:pValue ?pvalue .
      ?value atlasterms:hasFactorValue ?factor .
      ?value rdfs:label ?expressionValue .
      ?value atlasterms:isMeasurementOf ?probe .
      ?probe atlasterms:dbXref ?dbXrefProt .
      ?dbXrefProt a atlasterms:UniprotDatabaseReference .
      ?factor atlasterms:propertyType ?propertyType .
      ?factor atlasterms:propertyValue ?propertyValue .
      ?factor rdf:type efo:EFO_0000249 .
      # Compounds that target them
      SERVICE <http://www.ebi.ac.uk/rdf/services/chembl/sparql> {
        ?act a cco:Activity ;
             cco:hasMolecule ?molecule ;
             cco:hasAssay ?assay .
        ?molecule rdfs:label ?moleculeLabel .
        ?assay cco:hasTarget ?target .
        ?target cco:hasTargetComponent ?targetcmpt .
        ?targetcmpt cco:targetCmptXref ?dbXrefProt .
        ?targetcmpt cco:taxonomy <http://identifiers.org/taxonomy/9606> .
        ?dbXrefProt a cco:UniprotRef .
      }
      SERVICE <http://www.ebi.ac.uk/rdf/services/reactome/sparql> {
        ?protein rdf:type biopax3:Protein .
        ?protein biopax3:memberPhysicalEntity [ biopax3:entityReference ?dbXrefProt ] .
        ?pathway biopax3:displayName ?pathwayname .
        ?pathway biopax3:pathwayComponent ?reaction .
        ?reaction ?rel ?protein
      }
    }

Summary
• Why there is a need for new technologies in the life sciences
• Why RDF is a good fit for some of the problems
• The role of ontologies
• Generating RDF triples from data
• Working with an RDF database
• How to write a SPARQL query
• How the EBI is using RDF

Conclusions
• Generating RDF triples is relatively easy
• Extracting the schema from your data can be tricky
• Avoid over-modeling: have good use cases
• Look for appropriate ontologies and reuse terms where possible
• Good tooling is now available
  • RDF APIs for most programming languages
  • Lots of scalable triple stores
• SPARQL is a powerful query language for RDF
  • It is also very unforgiving; debugging queries is hard
  • Treat it as you would SQL: it is not for your average user

Conclusions (cont.)
• Lots of interest in Linked Data and RDF
  • See the LOD cloud and DBpedia
• Big-name companies are using/generating RDF content (Facebook, Google, Oracle)
• Some good examples of applications
  • Pharma industry (OpenPhacts project), semantic publishing (BBC), government data (data.gov.uk)
• Tread cautiously
  • This technology is still maturing
  • It is not a panacea
  • It offers good solutions for some problems

Thinking beyond RDF and SPARQL
• Selling SPARQL endpoints to biologists is hard, i.e. near impossible
• The entry level is too high and the advantages too intangible
• Let programmers code against SPARQL
• Let everyone else use more familiar modes through apps

RDFApps
• Our first RDFApp targets the existing community of R users; an ArrayExpress R package already exists
• The goal is to expose the power of the Atlas RDF+SPARQL behind a conventional R interface
• This enables those working with raw data to also use the power of the Atlas

Codefest
• Got an idea for an RDF App? Join us at Codefest 2014
• http://www.open-bio.org/wiki/Codefest_2014
• 18th/19th September, Cambridge, UK

Interesting RDF resources for biology
• EBI RDF (http://www.ebi.ac.uk/rdf)
• Bio2RDF (http://bio2rdf.org)
• BioPortal (https://bioportal.bioontology.org)
• OpenPhacts (https://www.openphacts.org)
• PubChem RDF (https://pubchem.ncbi.nlm.nih.gov/rdf/)
• Identifiers.org (http://identifiers.org)
• WikiPathways (http://wikipathways.org)
• DisGeNET (http://ibi.imim.es/web/DisGeNET/v01/)
• W3C Healthcare and Life Sciences Working Group (HCLS, http://www.w3.org/blog/hcls/)

Acknowledgments
• Sample Phenotypes and Ontologies Group and Functional Genomics Production Team
  • James Malone, Robert Petryszak, Tony Burdett, Helen Parkinson
• EBI RDF platform
  • Andy Jenkinson, Mark Davies, Marco Brandizi, Sarala Wimalaratne, Leyla Garcia, Jerven Bolleman

Funding
Components of the RDF platform pilot are supported by a number of sources, including:
• EMBL
• European Commission:
  • BioMedBridges [284209]
  • Diachron [601043]
• OpenPhacts (Innovative Medicines Initiative)
• National Institutes of Health

Questions?
Sign up for our mailing list: http://listserver.ebi.ac.uk/mailman/listinfo/rdf-announce
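To make the "let programmers code against SPARQL" point concrete, here is a minimal sketch of driving the Exercise 19 federated query from a program. It assumes Python 3.9+ with only the standard library, and an endpoint that accepts SPARQL 1.1 Protocol GET requests (query in the `query` parameter) and can return the W3C SPARQL JSON results format. The function names `build_query`, `get_url`, `fetch` and `rows` are illustrative only, not part of any EBI API, and the Atlas endpoint URL is the one given on the slides, which may since have moved.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Atlas endpoint as given on the slides (assumption: it may have moved since).
ATLAS_SPARQL = "http://www.ebi.ac.uk/rdf/services/atlas/sparql"

# Exercise 19's federated query with the experiment accession as a parameter.
# Doubled braces become literal SPARQL braces after str.format().
# As on the slide, regex() also matches longer IDs that contain the accession.
QUERY_TEMPLATE = """PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX atlasterms: <http://rdf.ebi.ac.uk/terms/atlas/>
SELECT DISTINCT ?experiment ?description WHERE {{
  SERVICE <{service}> {{
    ?experiment a atlasterms:Experiment .
    ?experiment dcterms:identifier ?id .
    ?experiment dcterms:description ?description .
    FILTER regex(?id, "{accession}")
  }}
}}"""


def build_query(accession: str, service: str = ATLAS_SPARQL) -> str:
    """Fill in the template, rejecting characters that would break the string literal."""
    if any(ch in accession for ch in '"\\\n'):
        raise ValueError("accession must not contain quotes, backslashes or newlines")
    return QUERY_TEMPLATE.format(service=service, accession=accession)


def get_url(endpoint: str, query: str) -> str:
    """Encode a query for an HTTP GET request using the SPARQL 1.1 Protocol 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def fetch(endpoint: str, query: str, timeout: float = 60.0) -> str:
    """Send the query to `endpoint` and return the raw JSON results (needs network access)."""
    request = urllib.request.Request(
        get_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def rows(results_json: str) -> list[dict[str, str]]:
    """Flatten a SPARQL 1.1 JSON results document into one dict per solution.

    Variables left unbound in a solution are omitted from that row.
    """
    doc = json.loads(results_json)
    names = doc["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in names if name in binding}
        for binding in doc["results"]["bindings"]
    ]
```

A call such as `rows(fetch(MY_ENDPOINT, build_query("E-GEOD-2852")))` would run the query on whichever endpoint the hypothetical `MY_ENDPOINT` names (for Exercise 19, the UniProt endpoint or your Sesame repository); that endpoint, not the Atlas, resolves the SERVICE clause, so it must support SPARQL 1.1 federation. Keeping query building, encoding and result parsing separate from the network call means each piece can be checked without contacting a live endpoint.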