FAIR Data Stewardship for Discovery and Innovation
MOLGENIS FAIR Roadmap
David van Enckevort
Utrecht, 3 November 2016

MOLGENIS Overview
Exchange format: Import data and metadata using EMX format (D4.1)
Data request: Find and request (biobank) data sets and items
Data explorer: Filter and download for further analysis (D4.2)
Genome browser: Data sharing and integration, DAS protocol
Model registry: Metadata registry of models for biobanks and molecular data (D4.4)
Annotators: Data integration for diagnostics and personalized medicine
Biobank Connect: Using ontologies to derive harmonization rules for data pooling (D2.2)
R statistics: Use R data API to up/download data and integrate graphics
Compute: Large scale computation on computational clusters, grids and clouds
Impute pipeline: GWAS harmonization and imputation
DNA pipeline: NGS data alignment, SNV/SV calling, QC, NIPT
RNA pipeline: NGS data quantitation, structure, eQTL allele specific expression
MOLGENIS platform: open source, collaborative, MVC, JPA / ~20 active devs / ~25 projects / github.com/molgenis

Internally FAIR
[Figure: Data as increasingly FAIR Digital Objects. Stages shown: Totally UNFAIR; Findable Usable for Humans; FAIR metadata; FAIR data, restricted access; FAIR data, Open Access; FAIR data, Open Access / Functionally Linked. Each object is drawn as layers: PID, Metadata (intrinsic), 'provenance' (user defined), Data (elements).]
Using our own choice of formats and standards, interoperable between MOLGENIS instances, but not necessarily aligned with the FAIR chosen formats and standards

FAIR Hackathon
• Two MOLGENIS developers for two days
• Support from LUMC / DTL FAIR Team
• Goal to build a proof of concept of MOLGENIS that is FAIR
• Using the BBMRI Biobank Catalogue as the use case

BBMRI-NL Biobank Catalogue

RD-Connect Sample Catalogue

FAIR Hackathon Results

[Figure: Data as increasingly FAIR Digital Objects (same diagram as on the "Internally FAIR" slide)]

Roadmap
• Bring the PoC into production
• Implement Linked Data Fragments service
• Offer a suite of tools to make data FAIR

MOLGENIS/connect
'FAIRifier' system for retrospective interoperability of data
BiobankConnect: Make data attributes interoperable
SORTA: Make data values interoperable

SORTA - To code data values
Semantic/lexical matching to shortlist codes for each unique value
• Upload data using Excel
• SORTA shortlists candidate codes (semantic matching, lexical matching)
• Use n-gram matching threshold (e.g. 80%)
• Human expert decides (and so trains SORTA)
• SORTA automatically recodes when high matching score (e.g. 80%)

SORTA - To code data values
When match is < threshold, user decides and works through open issues

(Biobank)Connect to code data attributes
• Software autogenerates mappings using ontologies + lexical matching
• Per attribute mapping assistant
• User can fix the mapping on the fly (using the semantic/lexical tricks)

Learn more or contact [email protected]
Reading
◻ MOLGENIS docs @ http://molgenis.github.com
◻ BiobankConnect paper - http://pubmed.org/25361575
◻ SORTA paper - http://pubmed.org/26385205
◻ MOLGENIS papers - http://pubmed.org?term=MOLGENIS
Movies
◻ Upload - https://www.youtube.com/watch?v=VSZNXdaGIl4
◻ SORTA - https://www.youtube.com/watch?v=Wq81S-jR3l8
◻ BiobankConnect - https://www.youtube.com/watch?v=Gc1VKRCmTWU
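
Sketch: SORTA-style recoding with an n-gram threshold
The SORTA workflow described above (shortlist candidate codes by lexical matching, recode automatically when the score reaches a threshold such as 80%, otherwise leave the value for a human expert) can be illustrated with a minimal Python sketch. This is not SORTA's actual implementation: the Dice coefficient over character bigrams is an assumed scoring function, and the function names and the codes C1 to C3 are hypothetical, made up for illustration only.

```python
from collections import Counter


def ngrams(text, n=2):
    """Return a multiset of character n-grams for a normalized string."""
    # Pad with boundary markers so word starts and ends also count.
    padded = f"^{text.strip().lower()}$"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def similarity(a, b, n=2):
    """Dice coefficient over character n-grams, between 0.0 and 1.0.

    Assumed scoring function; the slides only say "n-gram matching".
    """
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    overlap = sum((grams_a & grams_b).values())
    return 2 * overlap / total


def shortlist(value, codes, top=3):
    """Rank candidate codes (code -> label) by similarity to a raw value."""
    scored = [(similarity(value, label), code) for code, label in codes.items()]
    return sorted(scored, reverse=True)[:top]


def recode(values, codes, threshold=0.8):
    """Auto-assign the best code when its score reaches the threshold;
    otherwise queue the value, with its shortlist, for a human expert."""
    recoded, open_issues = {}, []
    for value in values:
        candidates = shortlist(value, codes)
        if candidates and candidates[0][0] >= threshold:
            recoded[value] = candidates[0][1]
        else:
            open_issues.append((value, candidates))
    return recoded, open_issues


# Hypothetical code list and raw values, for illustration only.
codes = {"C1": "Jogging", "C2": "Swimming", "C3": "Cycling"}
recoded, open_issues = recode(["jogging", "Swiming", "tennis"], codes)
print(recoded)                         # values recoded automatically
print([v for v, _ in open_issues])     # values left for the expert
```

In this sketch the misspelled "Swiming" still clears the 0.8 threshold and is recoded, while "tennis" shares no bigrams with any label and ends up in the open issues list. The step in which expert decisions train the system is not modelled here.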