Presentation

Sustainability of Data Repositories in Lower and Middle Income Countries
(Buy-in or Build-it-Yourself)
Martie van Deventer
SciDataCon: 12 September 2016
Roadmap
• Background – why am I considering buying-in?
– Institutional repositories - OpenDOAR
– OCLC Worldshare
– LMIC data repositories on Re3Data
• Sustainability
• Recommendations
– Advantages
– Disadvantages
Open access institutional repositories
Total: 3212
Africa – 141 (5%)
Asia – 635 (20%) (China – 39 / India – 73)
South America – 274
Russian Federation - 26
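As a quick check on the regional shares quoted above, here is a minimal sketch that recomputes each count as a percentage of the total. It uses only the counts from this slide; the names `TOTAL`, `COUNTS` and `share` are illustrative and not part of any OpenDOAR tooling. The percentages on the slide appear to be rounded to whole numbers, so small differences are to be expected.

```python
# Counts as quoted on the slide (OpenDOAR figures reported by the presenter).
TOTAL = 3212
COUNTS = {
    "Africa": 141,
    "Asia": 635,
    "South America": 274,
    "Russian Federation": 26,
}


def share(count, total=TOTAL):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Percentage share per region, keyed by region name.
shares = {region: share(n) for region, n in COUNTS.items()}

for region, pct in shares.items():
    print(f"{region}: {COUNTS[region]} of {TOTAL} = {pct}%")
```

The same calculation can be rerun with fresh counts whenever the registry totals change.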
Just because it is established does not
mean it is sustainable / trusted
… we, at times, do what appears almost impossible to do …
Sustainability of repositories requires:
• Reliable source of funding – to establish and maintain the infrastructure.
• Capable staff in sufficient numbers – to maintain the infrastructure and steward the content.
• Documented procedures – followed diligently so that consistency can be maintained.
But in LMICs these are often trumped by:
• Other priorities – food security, health, inequality
• Lack of basic infrastructure – water & electricity
• Most of all … insufficient funds to get the task /
research completed
Reality …
• A constant game of catch-up.
• Since 2011 I have supervised mini-dissertations from four different institutions based in four different African countries.
• All trying to convince their institutions to establish
sustainable IRs.
• … but then they, at times, cannot submit drafts of
chapters because they do not have access to electricity.
• There must be a better way.
ResearchSpace – our institutional repository
• Established in 2007
• One of the first in South Africa. Well populated, well
used.
• Cannot be regarded as a trusted repository.
• In August 2016 we had to migrate from version 1.8 to version 5 – previously, upgrading was not seen as a priority.
• No programme of curation or long term preservation –
we replace files that get corrupted.
Then along comes data …
More repositories are
required.
Re3Data data
142 are reported to be from LMICs
Representing 35 countries
How long will it take to catch up?
Learning from the migration to OCLC Worldshare
• Collaborative catalogue of library holdings
• Yes that too is working with data …
– In 2001 we decided to share a catalogue with the University of Pretoria.
– We shared costs and data but not skills.
– Used international standards and tools.
– Good idea at the time but we lost all infrastructure and skills.
– In 2015 we were forced to reconsider. The choice: (1) build our own or (2) buy into a hosted system.
– Led to ‘buying into’ OCLC Worldshare – considerably less
expensive than to re-establish our infrastructure.
– Migrated in May 2016 – some disadvantages but many added
advantages including huge reductions in cost.
– Direct access to high quality records – efficiency improvements.
Finding the research data ‘OCLC’
• The plea is for the data community to learn from this exercise and expand the system of subject / discipline data repositories, and to ensure that they are sustainable and trusted.
• Furthermore, it is a plea to the LMICs to identify and support relevant discipline repositories from the start, and to help make them sustainable.
• Collaboration for the sake of better
science.
Subject data repositories
• Several already exist (see: http://oad.simmons.edu/oadwiki/Data_repositories).
• Manage data for a specific community.
• Facilitate access to the domain’s data but also provide
security where it is needed.
• Respond to the unique requirements of a very specific
community.
• Create guidelines that are specific to the discipline /
community.
• Advocate transparency within the discipline / community.
• Collaborate with and mediate among other related
disciplines to ensure interoperability across
scientific communities & to collectively promote
common good practice.
Source: Ember & Hanisch, 2013, p. 2
Recommendations for LMICs
• Not to build our own – to buy into that which is already
established … learn from the library catalogue migration
project.
• That data stewards from LMICs therefore play ‘leap-frog’ rather than ‘catch-up’ … in the interest of open science.
• That relationships with established and trusted subject
repositories be formalised.
• That researchers and stewards from LMICs accept the
responsibility to contribute content that is of value –
following set standards – fast track their own learning.
• That responsibilities linked to trustedness (finance, governance, curation) are considered and negotiated – I firmly believe that fewer repositories will lead to more data of real value.
Outcome
Advantages
• Reduced overheads – same value, fewer resources required.
• Fast track contribution – adding content very quickly.
• Visibility and discoverability of the LMIC contributions.
• Trust in the content – discipline prescribes the standards.
• Use the opportunity to play leap-frog not catch-up.
• Could still harvest for the IR.
Disadvantages
• Trust & relationships are
more difficult to establish
than infrastructure.
• This is not a free ride. (Do
not expect that there will be
NO costs – expect to make
a fair contribution towards
the sustainability of the
subject repository.)
https://www.facebook.com/AfricaThisIsWhyILiveHere/photos/a.120611361341826.17791.120578284678467/1096408660428753/?type=3&theater
References
De Waard, A. 2013. Bridging the Gap Between Small and Large Research Repositories (There is No Dumb Data!).
Presentation for OAI8 workshop, Geneva. Available: http://www.slideshare.net/anitawaard/bridging-the-gapbetween-small-and-large-research-repositories?qid=82eb9e44-f357-44cf-90ac57ed00222365&v=&b=&from_search=2 Accessed: Sep 2016
Ember, C. & Hanisch, R. 2013. Sustaining domain repositories for digital data: A white paper.
OpenDOAR. Online: http://www.opendoar.org/find.php
Re3data. Online: http://www.re3data.org/
Scholze, F. 2013. Data management and governance - the re3data.org experience. Available:
http://www.slideshare.net/kitbib/elab-16-513re3datascholzefinal-24246119?qid=7cf357eb-3553-4fe1-8652cfcadb460621&v=&b=&from_search=3
Pictures: https://za.pinterest.com/martie1042/scidatacon/?etslf=12089&eq=scidatacon