Overview of Data Repository Certification Schemas

Digital Repository Certification Schema:
A Pathway for Implementing the GEO Data Sharing and Data Management Principles
Robert R. Downs, PhD
Sr. Digital Archivist, CIESIN, Columbia University
7 November 2016, GEO XIII Plenary
Data Providers Side Event
Applicability of Repository Certification to Data Sharing Principles and Data Management Principles
• Data sharing depends upon reliable repositories
– Repository certification verifies potential for reliability
– Renewal of certification affirms continuing reliability
• Data management depends on repository capabilities
– These capabilities are validated by repository certification
• Repository certification in the DMP Implementation Guidelines (DMP IG)
– DMP1: Metadata for Discovery
– DMP7: Data Preservation
– DMP8: Data and Metadata Verification
Why Certify Data Repositories?
• Data producers need to know where to deposit their data
– Trust that their data will be preserved, curated, and disseminated
• Data users need to know where they can find data
– Data that are vetted, described for use, and available in the future
• Funders need to know who to support for data management
– Where services are reviewed routinely for continuous improvement
• Publishers need to know who to recommend for archiving data
– Where referenced data will be persistently accessible and usable
• Data professionals need to know where they can practice
– Apply their data management skills
– Obtain professional development in data stewardship
• Data centers need to know how they are performing
– Identify policies, procedures, and practices that need improvement
Derived From: Downs. 2016. Audit of a Scientific Data Center: Opportunities & Approaches.
RDA SciDataCon 2016. Denver, CO
Data Repository Certification: Costs vs. Benefits
• Potential costs
– Time for repository management and staff training
– Training registration fees and travel costs, if applicable
– Time for repository management and staff to prepare and be audited
– Pre-audit improvements (staff, software, hardware), if applicable
– Audit instrument and certification fees, if applicable
– Audit service fee, including auditor travel costs, if applicable
• Potential benefits
– Improve transparency and quality assurance
– Compare capabilities and services with standard practices
– Review and identify gaps in current and planned services
– Plan for needed enhancements to improve capabilities and services
– Obtain certification that recognizes attainment of the standard, if applicable
Derived From: Downs. 2016. Audit of a Scientific Data Center: Opportunities & Approaches.
RDA SciDataCon 2016. Denver, CO
Selected Research Data Repository Audit Instruments
• Audit and Certification of Trustworthy Digital Repositories (ISO 16363:2012)
– https://public.ccsds.org/Pubs/652x0m1.pdf
• Trustworthy Repositories: Audit & Certification (TRAC)
– https://www.crl.edu/sites/default/files/d6/attachments/pages/trac_0.pdf
• NESTOR (DIN 31644)
– http://www.langzeitarchivierung.de/Subsites/nestor/EN/nestor-Siegel/siegel_node.html
• Data Management Maturity (DMM) Framework (AGU & CMMI)
– http://dataservices.agu.org/dmm/
• DSA–WDS Catalogue of Common Requirements
– https://docs.google.com/document/d/1_DPwSA5P8LpK9Q34BhxJmX8So2GKL7eSLa-Gz5JvVg/edit
• International Council for Science (ICSU) World Data System (WDS)
– https://www.icsu-wds.org/services/certification
• Data Seal of Approval (DSA)
– http://www.datasealofapproval.org/en/
Initiating Repository Certification
• Planning
– Initiated by repository management
– Identify candidate audit instruments
• Preparation
– Conduct a self-assessment
– Complete improvements to address weaknesses
• Scheduling
– Identify availability of auditors
– Identify availability of repository management and key staff
Derived From: Downs. 2016. Audit of a Scientific Data Center: Opportunities & Approaches.
RDA SciDataCon 2016. Denver, CO
Evaluating Candidate Certification Schema
• What does the candidate instrument measure?
• Is it applicable to the data center, its services, and its capabilities?
• Can the metrics measure whether each requirement has been satisfied?
• Can the instrument be used internally?
– By data center staff for pre-assessment or post-assessment reviews
• Validity
– Has the instrument been developed by a reputable organization?
– Has the instrument been reviewed, and has it been reviewed recently?
– Is the instrument a recognized standard for assessing data centers?
– Has it been endorsed by the community and by independent bodies?
Derived From: Downs. 2016. Audit of a Scientific Data Center: Opportunities & Approaches.
RDA SciDataCon 2016. Denver, CO
Data Repository Assessment Considerations
• Assessment options by frequency and authority:
– Once, internal: one-time self-assessment
– Once, external: one-time external assessment
– Periodic, internal: periodic self-assessments
– Periodic, external: periodic external assessments
• Scope of audit:
– Holistic vs targeted to specific capabilities, functions, or collections
• Approach should be based on objectives
– Why is the repository seeking an assessment?
– Which stakeholders are encouraging the assessment?
– Is the primary objective improvement or a credential?
Derived From: Downs. 2016. Audit of a Scientific Data Center: Opportunities & Approaches.
RDA SciDataCon 2016. Denver, CO
Data Repository Certification: Improving Data Provision
• Opportunity to identify, document, justify, and plan for needed enhancements, such as policy improvements, resource acquisitions, professional development, staff recruitment, procedure revisions, system upgrades, and new services
Derived From: Downs. 2016. Audit of a Scientific Data Center: Opportunities & Approaches.
RDA SciDataCon 2016. Denver, CO
Continuously Improving the Scientific Data Archive
[Figure: continuous-improvement cycle. Stages shown: Decide to Be Audited; Review Audit Requirements; Audit Preparation; Request Audit; Complete Audit; Decide to Improve; Plan Needed Changes; Decide on Each Change; Implement Changes]
Source: Downs & Chen. 2012. Improving the Trustworthiness of an Interdisciplinary Scientific Data Archive.
Benefits of Data Repository Certification
• Improves transparency of archival processes
• Improves quality assurance for stakeholders
• Provides independent evaluation
• Improves efficiency and effectiveness of operations
• Provides recommendations to guide planning for enhancements
• Improves management and stewardship of data
• Increases data preservation capabilities of the archive
• Measures compliance with recognized requirements
• Recognizes data center responsibilities and achievements
• Necessary step for certification as a trustworthy repository
Derived from: Downs & Chen. 2012. Improving the Trustworthiness of an Interdisciplinary Scientific Data Archive.
Questions for Discussion
• Provide a quality label at the level of data providers?
– Define criteria and processes for data providers to receive certification, e.g., by leveraging or endorsing DSA–WDS core certification of data repositories?
– Define an authority to decide whether data providers receive certification?
– Add GEO requirements, such as brokering, to core certification?
– Continue certification of data providers' quality over time?
• Provide a quality label at the level of datasets?
– Establish the DMP GEO Labels?
– Define criteria and processes for datasets to receive DMP GEO Labels?
– Define an authority to decide whether datasets receive DMP GEO Labels?
– Align DMP GEO Labels with other international efforts?
• Provide other mechanisms or a combination of the above?
Thank you!