Joint DFIG – PWWG Meeting
Amy Nurnberger, Larry Lannom, Peter Wittenburg

Agenda
• Breakout 1: Discussion about Guidelines/Recommendations
• Breakout 2: Configuration building and Minimal PID Types
• Breakout 4: DFIG Core Session
• Breakout 5: Joint session with Brokering Group
• Breakout 7: Joint meeting with Publishing Data Workflows

14.00  welcome and goals (Peter)
14.05  DFIG view on scientific data workflows (Peter)
14.20  PWWG view on scientific data and publishing workflows (Amy)
14.35  comparison, overlap and differences in views (Larry)
14.45  discussion (Larry and Amy)
15.30  end

Intentions and Goals
• comparing core documents from DFIG and Publishing Workflow IG shows that
  • there is much overlap despite different starting points
  • there are barriers in culture and terminology
  • there is some tradition of not talking to each other
• RDA is about bridge building
• this session is about building a bridge and getting together
• need to understand how we can integrate the approaches since we address overlapping issues
• how to do this -> discussion

DFIG view on scientific data workflows
Peter

Lab Reality – slowly changing
Are curiosity-driven research and chaos twins? Is DIS different?
• EU survey: 75% of researchers’ time spent on DM/A
• M. Brodie (MIT): 80%
• something is fundamentally wrong!
• far away from data publication considerations
• clear trends for all: data orientation, more and more complex data
• automatic workflows would change this, but
  • many exceptions, parameter choices, human interventions
  • lack of experts to create flexible software solutions
• how can we help and change?
• short term and long term solutions

An illustration
[Figure labels: Feature Sets; Collection X; Pattern Extractor; Collection Y; Smart Machine; Pattern Extractor; Collection Z; Results; Iterations]

An illustration
[Figure: same diagram as on the previous slide, with added labels: Data Fabric Cycle; Observations, Experiments, Simulations, etc.]
This slide indicated the continuous cycle of creating raw data or derived data based on collections of existing data. Identify components that could improve (stepwise).

From abstract fabrics to concrete compositions
Common Components & Services
Specific Components & Services
• t-repositories
• PID system
• MD schemas
• MD editors
• vocabularies
• etc.
Closing urgent gaps
Global Digital Object Cloud

Conclusions
• Collecting use cases and facts from many labs.
• Understand from heterogeneous practices how to come to agreed components.
• Addressing the data cycle in the labs, where publication is often not an issue for quite some time.
• However, the requirements for data management, accessibility and publication are getting tighter.
• So we need to consider these requirements and map them to publication requirements.
• Need to provide easy transitions.
• Thus bridge conceptualisation & terminology.
• Need to overcome social barriers.

RDA/WDS Data Publishing Workflows WG
Amy Nurnberger

DPWWG – Where we’ve been
What is the current data publishing workflow landscape across disciplines and institutions?

Data publishing entities
25 data publishing entities assessed in terms of discipline, function, data formats, and roles
• The assignment of persistent identifiers (PIDs) to datasets, and the PID type used, e.g. DOI, ARK, etc.
• Peer review of data (e.g. by researcher and by editorial review)
• Curatorial review of metadata (e.g. by institutional or subject repository)
• Technical review and checks (e.g. for data integrity at repository/data centre on ingest)
• Discoverability: was there indexing of the data, and if so, where?
• Links to additional data products (data paper; review; other journal articles) or “stand-alone” product
• Links to grant information, where relevant, and usage of author PIDs
• Facilitation of data citation
• Reference to a data life cycle model
• Standards compliance

Key components of data publishing
Austin, C. C., Bloom, T., Dallmeier-Tiessen, S., Khodiyar, V., Murphy, F., Nurnberger, A., … Whyte, A. (2016). Key components of data publishing: using current best practices to develop a reference model for data publishing. http://doi.org/10.1007/s00799-016-0178-2

Workflows
Ibid

Workflows, cont.
Ibid

What’s missing?
This stuff
“…early interactions between researchers and a suitable data repository (or repositories), while data is processed and prepared for sharing.”
Dallmeier-Tiessen, S., Khodiyar, V., Murphy, F., Nurnberger, A., Raymond, L., Whyte, A. (DRAFT). Connecting data publication to the research workflow: a preliminary analysis

What’s missing?
• Deliberate integration of sundry products from the research process, e.g., software, code, models, etc.
• Integration/interoperability between data processing tools and platforms
• Disciplinary differences in data conception, collection, & processing
Dallmeier-Tiessen, S., Khodiyar, V., Murphy, F., Nurnberger, A., Raymond, L., Whyte, A. (DRAFT). Connecting data publication to the research workflow: a preliminary analysis

What’s needed
• Small, modular, shareable components that help ensure platforms offer sufficient flexibility to support variety
• Research workflow solutions that enable straightforward data and metadata generation in accordance with community defined and accepted standards
• Commit to the use of PIDs and include versioning capabilities
• Clear documentation that can offer direct benefits to repository depositors and users
• Curators
Dallmeier-Tiessen, S., Khodiyar, V., Murphy, F., Nurnberger, A., Raymond, L., Whyte, A. (DRAFT). Connecting data publication to the research workflow: a preliminary analysis
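
Illustration (not part of the original slides): the "What’s needed" item on committing to PIDs with versioning capabilities could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class names, the placeholder prefix `10.1234/example`, and the `.vN` suffix convention are hypothetical and do not come from the cited draft or from any particular PID system.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class DatasetVersion:
    """One immutable, citable state of a dataset (hypothetical model)."""
    pid: str                            # versioned identifier, e.g. "10.1234/example.v2"
    version: int
    checksum: str                       # fixity value recorded at ingest
    previous_pid: Optional[str] = None  # identifier of the version this one supersedes


@dataclass
class VersionedDataset:
    """Keeps every published version resolvable under its own PID."""
    base_pid: str
    versions: Dict[int, DatasetVersion] = field(default_factory=dict)

    def publish(self, checksum: str) -> DatasetVersion:
        # Each publication gets a new identifier; earlier ones are never reused.
        number = len(self.versions) + 1
        previous = self.versions.get(number - 1)
        record = DatasetVersion(
            pid=f"{self.base_pid}.v{number}",
            version=number,
            checksum=checksum,
            previous_pid=previous.pid if previous else None,
        )
        self.versions[number] = record
        return record

    def latest(self) -> DatasetVersion:
        if not self.versions:
            raise LookupError("no version has been published yet")
        return self.versions[max(self.versions)]

    def resolve(self, pid: str) -> DatasetVersion:
        # A citation to an older version must keep resolving to that version.
        for record in self.versions.values():
            if record.pid == pid:
                return record
        raise KeyError(pid)
```

In this sketch a citation made against the first version keeps resolving to that exact state after a second version is published, while `latest()` gives depositors and users the current one.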