Breakout 7: Joint meeting with Publishing Data Workflows

Joint DFIG – PWWG Meeting
Amy Nurnberger, Larry Lannom, Peter Wittenburg
Agenda
Breakout 1: Discussion about Guidelines/Recommendations
Breakout 2: configuration building and Minimal PID Types
Breakout 4: DFIG Core Session
Breakout 5: Joint session with Brokering Group
Breakout 7: Joint meeting with Publishing Data Workflows
14.00  welcome and goals (Peter)
14.05  DFIG view on scientific data workflows (Peter)
14.20  PWWG view on scientific data and publishing workflows (Amy)
14.35  comparison, overlap and differences in views (Larry)
14.45  discussion (Larry and Amy)
15.30  end
Intentions and Goals
• comparing core documents from DFIG and Publishing Workflow IG shows that
  • there is much overlap despite different starting points
  • there are barriers in culture and terminology
  • there is some tradition of not talking to each other
• RDA is about bridge building
• this session is about building a bridge and getting together
• need to understand how we can integrate the approaches since we address overlapping issues
• how to do this -> discussion
DFIG view on scientific data workflows
Peter
Lab Reality – slowly changing
Are curiosity-driven research and chaos twins?
Is DIS different?
• EU survey: 75% of researchers' time spent on DM/A
• M. Brodie (MIT): 80%
• something is fundamentally wrong!
• far away from data publication considerations
• clear trends for all: data orientation, more and complex data
• Automatic workflows would change, but
• many exceptions, parameter choices, human interventions
• lack of experts to create flexible software solutions
• how can we help and change?
• short term and long term solutions
An illustration
[Figure: processing chain with the labels Feature Sets, Collection X, Pattern Extractor, Collection Y, Smart Machine, Pattern Extractor, Collection Z, Results, and Iterations]
Data Fabric Cycle
[Figure labels: Observations, Experiments, Simulations, etc.]
This slide indicated the continuous cycle of creating raw data or
derived data based on collections of existing data.
Identify components that could improve (stepwise).
From abstract fabrics to concrete compositions
[Figure labels: Common Components & Services; Specific Components & Services; Closing urgent gaps; Global Digital Object Cloud]
• t-repositories
• PID system
• MD schemas
• MD editors
• vocabularies
• etc.
Conclusions
• Collecting use cases and facts from many labs.
• Understand from heterogeneous practices how to come to agreed components.
• Addressing the data cycle in the labs where publication is often not an issue for quite some time.
• However, the requirements for data management, accessibility and publication are getting tighter.
• So we need to consider these requirements and map them to publication requirements.
• Need to provide easy transitions.
• Thus bridge conceptualisation & terminology.
• Need to overcome social barriers.
RDA/WDS Data Publishing Workflows WG
Amy Nurnberger
DPWWG – Where we’ve been
What is the current data publishing workflow landscape across disciplines and institutions?
Data publishing entities
25 data publishing entities assessed in terms of discipline, function, data formats, and roles
• The assignment of persistent identifiers (PIDs) to datasets, and the PID type used, e.g. DOI, ARK, etc.
• Peer review of data (e.g. by researcher and by editorial review)
• Curatorial review of metadata (e.g. by institutional or subject repository)
• Technical review and checks (e.g. for data integrity at repository/data centre on ingest)
• Discoverability: was there indexing of the data, and if so, where?
• Links to additional data products (data paper; review; other journal articles) or “stand-alone” product
• Links to grant information, where relevant, and usage of author PIDs
• Facilitation of data citation
• Reference to a data life cycle model
• Standards compliance
Key components of data publishing
Austin, C. C., Bloom, T., Dallmeier-Tiessen, S., Khodiyar, V., Murphy,
F., Nurnberger, A., … Whyte, A. (2016). Key components of data
publishing: using current best practices to develop a reference model
for data publishing. http://doi.org/10.1007/s00799-016-0178-2
Workflows
Ibid
Workflows, cont.
Ibid
What’s missing?
This stuff
“…early interactions
between researchers and a
suitable data repository (or
repositories), while data is
processed and prepared
for sharing.”
Dallmeier-Tiessen, S., Khodiyar, V., Murphy, F., Nurnberger, A., Raymond,
L., Whyte, A. (DRAFT). Connecting data publication to the research
workflow: a preliminary analysis
What’s missing?
• Deliberate integration of sundry products from the research process, e.g., software, code, models, etc.
• Integration/interoperability between data processing tools and platforms
• Disciplinary differences in data conception, collection, & processing
Dallmeier-Tiessen, S., Khodiyar, V., Murphy, F., Nurnberger, A., Raymond,
L., Whyte, A. (DRAFT). Connecting data publication to the research
workflow: a preliminary analysis
What’s needed
• Small, modular, shareable components that help ensure platforms offer sufficient flexibility to support variety
• Research workflow solutions that enable straightforward data and metadata generation in accordance with community defined and accepted standards
• Commit to the use of PIDs and include versioning capabilities
• Clear documentation that can offer direct benefits to repository depositors and users
• Curators
Dallmeier-Tiessen, S., Khodiyar, V., Murphy, F., Nurnberger, A., Raymond,
L., Whyte, A. (DRAFT). Connecting data publication to the research
workflow: a preliminary analysis
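The point above about committing to PIDs with versioning capabilities can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not something taken from the slides or the cited draft: the `VersionedDataset` class, the `.vN` suffix convention, the `example-prefix` identifier and the `repository.example.org` locations are all hypothetical, and no real PID service (DOI, Handle, ARK) is contacted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VersionedDataset:
    """Hypothetical record: one base PID plus an ordered list of versions."""
    base_pid: str  # placeholder identifier, not a registered PID
    versions: List[Dict[str, str]] = field(default_factory=list)

    def add_version(self, location: str, note: str) -> str:
        """Register a new version and return its version-specific PID."""
        number = len(self.versions) + 1
        # The ".vN" suffix is an illustrative convention only.
        version_pid = f"{self.base_pid}.v{number}"
        self.versions.append(
            {"pid": version_pid, "location": location, "note": note}
        )
        return version_pid

    def resolve(self, pid: str) -> Optional[str]:
        """Return the location for a PID.

        The base PID resolves to the latest version; a version PID
        resolves to exactly that version; unknown PIDs give None.
        """
        if pid == self.base_pid:
            return self.versions[-1]["location"] if self.versions else None
        for version in self.versions:
            if version["pid"] == pid:
                return version["location"]
        return None


dataset = VersionedDataset(base_pid="example-prefix/dataset-001")
first = dataset.add_version(
    "https://repository.example.org/d1/v1", "initial deposit"
)
second = dataset.add_version(
    "https://repository.example.org/d1/v2", "corrected metadata"
)
```

In this sketch a citation can point either at a fixed version (for reproducibility) or at the base identifier (for the current state of the data); which of the two a repository should encourage is a policy question the slides leave open.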