Draft Data Foundation and Terminology (DFT)

RDA Data Foundation and Terminology (DFT) WG
[Figure labels: Bit Sequence, Bit seq*, transfer, replication, cksm, PID +prop, MD +prop, PID extension]
A PID record that points to a metadata
record and to instantiations of identical bitstreams that may store additional attributes
Prepared for 3rd Plenary DFT WG Sessions 1 & 2, March 27, 2014
Co-Chairs DFT WG: Gary Berg-Cross, Raphael Ritz, Peter Wittenburg
European Early Career Researchers & Scientists working with Data Scribe:
Reko Hynönen, University of Helsinki, Finnish Meteorological Institute
Outline for Sessions 1 (1100-1230) & 2 (1330-1500)
Session 1 (90m)
0. BRIEF Intros -5?
1. Short Background session -5
2. Where we are now -5
3. Where we think we are going -2
4. Use Cases -10 - 18
   1. Wittenburg & Ritz
   2. Reagan Moore
   3. Hans Pfeiffenberger
5. Discussion of 3 Core Areas -50
   1. Data and Digital Objects & Entities
   2. Persistent Identifier / PID Record / PID Attribute / PID Resolution / Reference Resolution
   3. Aggregation / Collection / Data Set / Corpus / Container….
Session 2 (90m)
1. Follow up to Session 1 -10
2. Discussion of 7 Core Areas -70
   • Bit Stream / Instances of Bit Stream / Data Stream
   • Identity / Integrity / Authenticity
   • Object Property / Object Attribute / Property Record / Internal Property / External Property
   • Data Organization / Data Model
   • Repository / Repository of Origin
   • Data / Realtime Data / Gappy Data / Dynamic Data
   • Data LifeCycle (as time permits)
3. Wrap up and Next Steps -10
Background - DFT Goals
See Case Statement Briefing (https://rd-alliance.org/filedepot/folder/100?fid=255)
Describe a basic, abstract (but clear) data organization model that
systemizes the already large body of definition work on data
management terms, especially as involved in RDA’s efforts.
• The model and its derived reference data should be sound, practical
and agreed to within the community for use:
1. across communities and stakeholders to better synchronize data
conceptualization,
2. to enable better understanding within and between communities,
and
3. to stimulate adopters & tool building, such as for data services,
supportive of the basic model’s use.
• We need to get the story straight on the model in order to govern the use of related tools.
[Diagram labels: Models & Candidate List; Evolves to Refined List; DFT WG Discussion & Plenary 3; Cross WGs; Future Work 2015?]
Five Stage Process
Data Foundation and Terminology (DFT) Vocabulary Development Process
by Gary Berg-Cross
1. Start up/Scoping Requirements analysis and development of candidate list
   1. Tool prototyping
2. Vocabulary Analysis & Revision Process (after 2nd Plenary)
   1. Tool demo and final requirements at 3rd Plenary
   2. Show Core vocabulary in table form for discussion
3. Focused Vocabulary Design Process and Community Agreement (at and after 3rd Plenary)
4. Refinement & Maintenance (ongoing)
5. Draft Vocabulary Publication and Review (4th Plenary)
[Diagram labels: Scope; Terms from Model Papers; Placed In Tool]
Overview of Term Development
Term Definition Tool prototyped and developed at Rechenzentrum Garching (RZG) der Max-Planck-Gesellschaft
Starter areas and items :
Persistent Identifiers (PIDs and types)
Digital Object - Data Object
Collection - Data Set - Aggregation
Repository (Registries and related Policies)
Digital Object
A digital object is composed of a structured sequence of bits/bytes. As
an object it is named. This bit sequence can be identified & accessed by
a unique and persistent identifier or by use of referencing attributes
describing its properties.
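
As a rough illustration of the Digital Object definition above, the following Python sketch models a digital object as a named byte sequence that can be reached either through its persistent identifier or through attributes describing its properties. The class names, field names, registry, and the example identifier are all hypothetical; none of them come from the DFT term list or from any RDA tool.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical model: a named bit sequence with a PID and attributes."""
    pid: str                      # unique, persistent identifier (made-up value below)
    name: str                     # "As an object it is named"
    bits: bytes                   # the structured sequence of bits/bytes
    attributes: dict = field(default_factory=dict)  # properties used for referencing


class Registry:
    """Hypothetical in-memory store supporting both access paths."""

    def __init__(self):
        self._by_pid = {}

    def register(self, obj):
        # A PID must identify exactly one object.
        if obj.pid in self._by_pid:
            raise ValueError(f"PID already in use: {obj.pid}")
        self._by_pid[obj.pid] = obj

    def resolve(self, pid):
        """Access path 1: by unique and persistent identifier."""
        return self._by_pid[pid]

    def find(self, **wanted):
        """Access path 2: by attributes describing the object's properties."""
        return [
            obj for obj in self._by_pid.values()
            if all(obj.attributes.get(k) == v for k, v in wanted.items())
        ]


registry = Registry()
obj = DigitalObject(
    pid="example/0001",           # assumption: illustrative identifier only
    name="temperature-series",
    bits=b"\x00\x01\x02",
    attributes={"format": "binary", "creator": "demo"},
)
registry.register(obj)
```

Resolving by PID returns exactly one object, while an attribute query may return zero, one, or many, which is one reason the two access paths are kept separate in this sketch.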
Getting Defs organized for review
Analysis and Revision Process
Latest Version of Term Definition Tool (TeD-T?)
Term Definition Example
Alternative?
This page was last modified on 9 December
2013, at 14:03.
digital entity: An entity represented as, or converted
to, a machine-independent data structure consisting of
one or more elements in digital form that can be
parsed by different information systems; the structure
helps to enable interoperability among diverse
information systems in the Internet.
From Framework for discovery of identity
management information
Revision Discussion: This definition does not refer to
our practice and is not specific enough. A digital object
can cover different types of digital information such as
data, software, knowledge etc. So we should separate
data and other types of digital information. Also, the
reference to databases is not useful enough, since
there are many types of “containers” that data is in; the
term “database” does not help us since it refers to any
type of container. And in DFT we need to stress the
fact that a DO is something that has an identity one
can refer to, that has a number of properties that can
be accessed etc.
-- Peter
PID Term and Discussion
Discussion on email and Tool (http://smw-rda.esc.rzg.mpg.de/index.php/Talk:Persistent_Identifier_(PID))
• We should emphasize that persistence is not purely technical, which is a point I think John Kunze in particular
would agree to: there are social contracts associated with the idea of persistence. If you don't put those policies
in place, persistence is undefined at best. Which, on second thought, also means that not just the resolution service
is persistent, but also the association between identifier and target object. That is a contract probably put on
the shoulders of the agent requesting the PID in the first place, because the service will be unable to
decide/maintain this. -- Tobias
• Tobias, you have evoked a few things such as PID Service (need to include this as a term). So should we have
defs with the idea of a Contract by Agent as part of the metadata for a PID? Assertions: PID Requesting Agent
(sub-type of Agent) contracts to maintain connection (definition?) between ID & Target Object. TO has contract.
PID service is a service. -- Gary
• The PID Service and the PID System might be the same thing in reality. One difference may be that the PID System
maintains a Resolution Service, while the PID Service is the entity with which the contract is made. Each PID
Service employs a PID System. Each PID System can be employed by several PID Services.
• Example for a PID Service: DataCite
• Example for a PID System: The DOI System
• Example for a Resolution Service: 2a00:1a48:7805:112:2c13:65be:ff08:2e89 - better known as dx.doi.org
• (In reality, there really is a contract between e.g. DKRZ and DataCite; so this seems adequate)
TobiasWeigel (talk) 09:01, 10 December 2013 (UTC)
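
The distinction drawn in this discussion can be sketched as a small data model: a PID System maintains a Resolution Service, each PID Service employs one PID System, and the contract to keep an identifier attached to its target sits with the requesting agent. The Python below is only an illustration under those assumptions. The class and method names, the identifier `10.9999/example`, the target URL, and the second service are invented for the example; only the names DataCite, The DOI System, dx.doi.org, and DKRZ come from the discussion above.

```python
from dataclasses import dataclass, field


@dataclass
class ResolutionService:
    """Maps identifiers to target objects (hypothetical model)."""
    host: str
    table: dict = field(default_factory=dict)

    def resolve(self, pid):
        return self.table[pid]


@dataclass
class PIDSystem:
    """Maintains a Resolution Service."""
    name: str
    resolver: ResolutionService


@dataclass
class PIDService:
    """The entity with which the contract is made; employs one PID System."""
    name: str
    system: PIDSystem
    contracts: dict = field(default_factory=dict)  # pid -> requesting agent

    def mint(self, pid, target, agent):
        # The requesting agent takes on the contract to keep the
        # identifier-to-target association valid.
        self.system.resolver.table[pid] = target
        self.contracts[pid] = agent


doi_system = PIDSystem("The DOI System", ResolutionService("dx.doi.org"))
datacite = PIDService("DataCite", doi_system)
# Assumption: a second, made-up service showing that one PID System
# can be employed by several PID Services.
other_service = PIDService("Another PID Service (hypothetical)", doi_system)

datacite.mint(
    pid="10.9999/example",                          # illustrative identifier
    target="https://repository.example/object/1",   # illustrative target
    agent="DKRZ",
)
```

Keeping `contracts` on the PID Service rather than on the PID System mirrors the point made above that the service, not the resolution machinery, is the party to the contract.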
Conceptual Spaces
[Concept diagram; layout not recoverable from the extracted text.
Panels: Peter’s Original; Refinements.
Concepts: property, attribute, data stream, PID record, metadata record, bit stream, instance of a bit stream, data object, digital object, informational object, service object, aggregation, collection, data set, corpus.
Relations: contains_a, has_a, has_many, is_a, is_equal, is_part_of.]
Status & Plan Going Forward
• We now have a table of Core Terms with some initial Definitions
• Some are also in the Tool as examples - some still being updated.
• The P3 meeting represents an opportunity to take stock and do some editing,
testing of ideas and refining as well as strategize on next steps.
• Can we get some sense of agreement and where the issues are for the WG-Core?
• Work at 3rd Plenary
• Document status
• Tool and Demo
• Discussion of working Core, getting buy-in and next steps
• Schedule Note – “we will never be ‘done’” with the topic, but the WG will
complete its targeted mission.
“Unity, not uniformity, must be our aim. We attain unity only through variety.
Differences must be integrated, not annihilated, nor absorbed.”
Mary Parker Follett, The New State, 1918, p. 39.
Checklist of Issues/Powerful Questions - What is Needed for DFT Term
Progress?
• Ramp up of effort by DFT WG Community
• Review of table, categories and definition refinement
• Confirmation of scope of work
• How do we handle points of contention?
• What is the process by which we converge and move to adoption?
• Training in and exposure of Term Tool (Demo tomorrow)
• Use by other WGs for their needs
• Is our table example useful as a model for them?
• Further test of Use Case Scenarios (as presented at the P3 DFT WG)
• Are they useful?
• Do they need to be adapted or drilled down to more detail?
• Do we need examples of term-concepts involved with real data (such as Reagan’s)?
Today’s Sessions- A focus on the following terms / term
We need to make a quantum jump in defining a few terms.
We need to argue from the different data models/organizations that were presented
and, of course, also look at what others have done. Our Core clusters:
1. Data / Realtime Data / Dynamic Data
2. Digital Object / Data Object / Information Object / Representation Object
3. Bit Stream / Instances of Bit Stream / Data Stream
4. Identity / Identity Management / Integrity / Authenticity
5. Object Property / Object Attribute / Property Record / Internal Property / External Property
6. Persistent Identifier / PID Record / PID Attribute / PID Resolution / Reference Resolution
7. Data Organization / Data Model
8. Repository / Repository of Origin
9. Aggregation / Collection / Data Set / Corpus / Container / Gappy Data /
10. Data LifeCycle & Operations