Notes for Launch - Research Data Alliance

Welcome to the RDA workshop DFT Breakout Sessions
Tuesday 9/17/13 Washington Marriott
Co-Chairs Peter Wittenburg and Gary Berg-Cross
The following is the working agenda for the first DFT Breakout session.
DFT Session 1 (9-11 a.m.)
1. Introductory Remarks and Introductions
2. Overview of the DFT Breakout Session - Goals and Plan
3. Formal State of the WG and Case Statement
4. Brief overview of Draft Docs and their schedule
5. Review Doc1 - Model Overview
6. Review Doc2 - Data Model Analysis
7. Review Doc3 - Analysis of Workflow
8. Review Doc4 - Synthesis
9. General Discussion (including remote participants)
10. Any changes to 2nd Session
Presenters (in the order listed on the slide): Gary/Peter, Gary, Peter, Peter, Peter, Gary, Peter, All, All
Session - Part 2 (11:15-12:15)
Presenters listed for this part: Gary; Peter & Group Discussion
1. Draft Method on Vocabulary Development Process
   1. Comments on Vocabulary Development by Joe Hourcle (NASA)
2. Discussion of Proposed Process
3. Proposal on how to start Doc5 seeded with 3-4 terms
   1. Persistent Identifier
   2. Digital Object - Data Object
   3. Collection - Data Set - Aggregation
   4. Repository (and related Policies?)
4. Discussion of Key Vocabulary Ideas
   1. Soliciting ideas for candidate vocabulary items
5. Liaison with other WGs and their topics, including follow-on work
6. Formalizing action items & getting commitments for work
   1. e.g., reviews of the draft documents and vocabulary; also discuss the WG plan for a follow-up virtual meeting
7. Additional DFT Chairperson
Overview of Data Foundation and Terminology (DFT) Case Statement (V19, July 17, 2013) for 2nd RDA Plenary
Co-Chairs
Peter Wittenburg (CLARIN, EUDAT, RDA TAB)
Gary Berg-Cross (SOCoP)
Page: https://www.rd-alliance.org/working-groups/datafoundation-and-terminology-wg.html
Forum:[email protected]
The Need for Shared Vocabularies
Research data communities would be helped by standardized data vocabularies that give the same definition for the same terms (data asset, data object, metadata …).
• Help the RDA community and its WGs find common building blocks, describe their properties, and define data process protocols related to them.
• Conversations/interactions can be more meaningful; people aren't talking past each other.
• Helps adoption of common data sharing practices and interoperation.
• Helps avoid duplication of effort.
• Leverage various models like OAIS & get community agreement.
[Figure: Gardens of RDA Projects. We need: infrastructure, metadata, services, repositories; foundation concepts for …; existing reference models/vocabularies; partners.]
Scope: We are talking about data management: sensor networks, scientific instruments, MREFCs, streaming data, modeling, computation, simulation, data collections, file systems, databases, government data, long tail, etc.
[Figure: Acquisition, Storage, Preservation, Interoperability, Access, Services, Metadata, Identifiers, Curation, Policy, Discovery, Analysis, Tools; Discovery, Insight, Re-use; Software, Policies, Education; Infrastructure, Facilities, Operations.]
All of this comes with a huge range of concepts and terms.
Tasks for DFT Work Group (WG)
The DFT WG tasks are to:
Describe a basic, abstract (but clear) data organization model
(and its vocabulary) that systematizes the already large body of
definition work on data management terms, especially as
involved in RDA's efforts (work started in Documents 1-4).
The model and its derived reference data should be sound,
practical and agreed to within the community for use:
• across communities and stakeholders to better synchronize data
conceptualization,
• to enable better understanding within and between communities, and
• to stimulate tool building, such as for data services, supportive of the
basic model’s use.
• We need to get the story straight on the model that will govern the use of related tools.
1. Write a suite of 5 reference documents about DFT models.
2. Create an accompanying abstract data organization model that may also be expressed graphically.
3. Register the defined terms in an ISO-like concept registry so that everyone can easily refer to them.
   Proper data organization will be enabled by agreeing upon a number of basic concepts and their relationships, as well as by explicitly defining and registering appropriate terms along with alternative views of them.
4. Engage many communities and stakeholders in the document, terminology and model creation.
5. Establish contact with relevant established communities such as W3C, librarians, etc.
[Figure: Policy example]
WG Tasks - now about a 12-month schedule
Start-Up Work - Done
Startup Phase & Organizational Activities (January 2012 - March 2013)
• Write Case Statement - conceptualization note describing the scope and intentions of the WG
• Identify groups, initiatives, experts to contribute to the discussion
• Get involvement
• Sustain discussion and extend the basic conceptualization
Phase 2 (March 2013 - July 2013)
• Discuss work at 1st Plenary in Gothenburg (March 2013)
Relevant, Engaged, Referenced (15) work includes:
• Cross-disciplinary data infrastructure project EUDAT
• ENVRI project (www.envri.eu): ongoing analysis and modeling
• Generic data organization and sharing model (in UML) covering data acquisition, data curation, data access and data processing
• Analysis of 6 ESFRI infrastructures (acquisition, curation, access & processing)
• Kahn & Wilensky paper describing a framework for distributed digital object services
• Wittenburg, Lautenschlager and Broeder analysis of the data organizations of about 15 communities, arriving at some common abstractions
• Concept models being developed as part of the Data Conservancy (http://dataconservancy.org/) by Allen H. Renear, David Dubin & others at the Center for Informatics Research in Science and Scholarship (CIRSS) at the University of Illinois
• ……
Ongoing Phases 3 and 4
• Consolidate the discussions with broader community involvement and converge on conceptualization and term definitions (supported meeting over the summer)
• Draft a suite of reference documents, which is one of the outcomes of the WG
Phase 3 (August 2013 - December 2013)
• Starter & Core Concept Vocabulary Development
• Define and register concepts in an ISO-like registry
• Draft Abstract Model and Initial Vocabulary Product
• Interact with other RDA WGs about terms used across working groups
Planned Work - still ambitious targets
Phase 4 (January 2014 ~ Spring-Summer 2014)
• Complete vocabulary and model development
• Write a final reference document
• Optimize the registered definitions
• Report at 4th Plenary?
• Reach out and do dissemination and model roll-out
• Enable adoption and facilitate tool development
Adoption Plan - revised based on Review Comments
• The adoption plan is to test adoption based on the DFT work's usefulness for other RDA WGs, e.g. Metadata, PID, Contextual MD, Policy, Repositories, Citation of Dynamic Data.
• In addition, there will be test adoptions by projects and efforts that have agreed to participate in developing and employing a project-based use case:
• Adoption within disciplinary EUDAT initiatives and the DASISH cross-disciplinary
initiative in the area of social sciences and humanities (Europe)
• iCORDI International Collaboration on Research Data Infrastructure (US & Europe)
• RENCI
• ESSD
• ENVRI (Environment Research Infrastructure)
• Deep Carbon Observatory data science initiative (underway), which will build on PID
Centers & disciplines that would be involved in the adoption effort (project name; involved centers; discipline):

ENES (as part of EUDAT)
  Involved center(s): Max-Planck-Institut für Meteorologie; Institut d'Astronomie et de Géophysique Georges Lemaître, Université catholique de Louvain; University of East Anglia, Climate Research Unit, School of Environmental Sciences; DKRZ, etc.
  Discipline: Climate modeling community, astronomy, meteorology, hydrology, etc.

EPOS (as part of EUDAT)
  Involved center(s): INGV Rome, GFZ Potsdam, KNMI Amsterdam, CSIC Barcelona, IPGP Paris, etc. (see http://www.epos-eu.org/community/national-and-regional-consortia.html)
  Discipline: Earth Plate Observation, vulcanology, seismology, Earth surface dynamics, Tsunamis, etc.

Deep Carbon
  Involved center(s): Geophysical Lab of the Carnegie Institution of Washington, Woods Hole Oceanographic Institution, RPI
  Discipline: Physics, Geology, Oceanography, Biology

NCGEN
  Involved center(s): RENCI (Renaissance Computing Institute), Ohio State University, Emory University, University of Pittsburgh
  Discipline: Clinical Genomics

CLARIN - Common LAnguage Resources and technology INfrastructure (as part of EUDAT)
  Involved center(s): Universität Tübingen, MPI for Psycholinguistics, Huygens Instituut, DANS, Institute for German Language, etc. (see all centers in the assessment procedure: http://www.clarin.eu/node/2971)
  Discipline: Languages, Linguistics, Brain Research, Ethnology, Anthropology, etc.

Earth System Science Data (ESSD)
  Involved center(s): Copernicus Publications
  Discipline: Publication, Library Science, Data Science

ADCIRC
  Involved center(s): RENCI, Coastal Hazards Center, DHS, US Army Corps of Engineers
  Discipline: Ocean and Storm modelling
Adoption Strategy and Assumptions
• The assumption is that many will get on board if the reference model is inclusive, done well, and responsive to input and involvement.
• Adoption is aided if the work can support diversity through tools that employ a good representation usable by many, if not everyone.
WG Membership
• About 40 members on the Case Statement including from other WGs
• Some new people here???
• More defined roles???
• Liaison with WGs???
• “It is about the alliance”