PPT of the Workshop - HKBU Library : Digital Services

RESEARCH DATA WORKSHOP
BEST PRACTICES IN HANDLING RESEARCH DATA
13 APRIL 2016, L2, LIBRARY
Brian Minihan, Scholarly Communications Librarian
Digital & Multimedia Services, Hong Kong Baptist University Library
cc: fher_citox - https://www.flickr.com/photos/17228122@N00
IN 1 HOUR
YOU WILL BE DATA LITERATE
cc: fixlr - https://www.flickr.com/photos/72996797@N00
Data Literacy means demonstrating knowledge of the following as it
applies to research:
metadata
backups
naming conventions
data sharing & citation
file formats & access
best practices
(Carlson et al., 2011)
cc: oh sk - https://www.flickr.com/photos/7752378@N03
TOOLS
AND BEST OF ALL....
cc: tunnelarmr - https://www.flickr.com/photos/27311060@N00
THE TRADITIONAL RESEARCH PROCESS
1
DATA/NOTES
cc: davidsilver - https://www.flickr.com/photos/66267550@N00
THE TRADITIONAL RESEARCH PROCESS
2
MANUSCRIPT DRAFT
cc: catherinecronin - https://www.flickr.com/photos/61340363@N05
THE TRADITIONAL RESEARCH PROCESS
3
YEAH! PUBLISHED!
cc: diylibrarian - https://www.flickr.com/photos/45175920@N00
THE CURRENT RESEARCH PROCESS
1
DATA
cc: Meg Stewart - https://www.flickr.com/photos/54129171@N00
THE CURRENT RESEARCH PROCESS
2
MANUSCRIPT
cc: Fláudio! - https://www.flickr.com/photos/59606083@N00
THE CURRENT RESEARCH PROCESS
3
PUBLISHED!
THE CURRENT RESEARCH PROCESS
but that's not all...
THE DEBATE OVER THE DIRECTION OF
ACADEMIC RESEARCH IS HAPPENING RIGHT
NOW
cc: Rafael Peñaloza - https://www.flickr.com/photos/40752609@N00
DATA
WHAT'S DIFFERENT NOW IN ACADEMIA?
New York University, Health Sciences Library
https://youtu.be/N2zK3sAtr-4
RESEARCH BY
OSMOSIS
cc: Halcyon - https://www.flickr.com/photos/48600112858@N01
RESEARCH BY
OSMOSIS
Speaking from over 25 years of personal experience, this author would assert that
a large number of graduate students, even of doctoral students, continue to
struggle to pick up skills necessary for their thesis and dissertation research, the
keener of them often depending heavily on librarians. To be even more brutally
honest, many of these students have an uncanny ability to optimize highly
inefficient research methods and somehow pull together a decent dissertation by
sheer brilliance alone despite shabby skills.
Badke, William A. Teaching Research Processes: the faculty role in the
development of skilled student researchers. Witney: Chandos, 2012.
cc: Halcyon - https://www.flickr.com/photos/48600112858@N01
WHAT WE NEED TO KNOW
1. What is the story of the data?
2. What form and format are the data in?
3. What is the expected lifespan of the data?
4. How could the data be used, re-used and re-purposed?
5. How large is the dataset? Is it growing? If so, at what rate?
cc: Lawrence OP - https://www.flickr.com/photos/35409814@N00
WHAT WE NEED TO KNOW
1. Who are the potential audiences for the data?
2. Who owns the data?
3. Does the data contain any sensitive information?
4. What publications or discoveries have resulted from the data?
5. How should the data be made accessible?
cc: gaelx - https://www.flickr.com/photos/7574080@N08
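The question about dataset size and growth rate can be answered with a quick measurement. The following is a minimal Python sketch, assuming the dataset lives in a single folder on disk and that its size is recorded on two different days; the function names are illustrative only.

```python
from pathlib import Path


def dataset_size_bytes(folder):
    """Total size in bytes of all regular files under folder (recursive)."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def growth_per_day(earlier_bytes, later_bytes, days_between):
    """Average growth in bytes per day between two size measurements."""
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    return (later_bytes - earlier_bytes) / days_between
```

Recording the output of dataset_size_bytes at regular intervals gives a rough growth rate, which helps when estimating storage and backup needs.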
TYPES OF DATA: Observational
captured in real time, time-specific, usually
outside the lab
usually irreplaceable
survey results, photos or images, signal/sensor readings,
anthropological fieldwork
California Digital Library, Data Management Plan Tool
https://dmptool.org/community_resources#promotion
TYPES OF DATA: Experimental
controlled conditions
usually reproducible, but likely expensive
lab outcomes
California Digital Library, Data Management Plan Tool
https://dmptool.org/community_resources#promotion
TYPES OF DATA: Simulational
machine or artificially produced test models
likely reproducible if pre-conditions are preserved
climate, economic models
California Digital Library, Data Management Plan Tool
https://dmptool.org/community_resources#promotion
TYPES OF DATA: Derived or Compiled
generated from existing datasets
usually reproducible, but time-consuming
text and data mining, archaeology, historical and archival research data
California Digital Library, Data Management Plan Tool
https://dmptool.org/community_resources#promotion
METADATA
a jargony term for description
cc: juhansonin - https://www.flickr.com/photos/38869431@N00
METADATA
DESCRIPTION AIDING ACCESS
cc: dnfisher - https://www.flickr.com/photos/7206577@N08
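One simple way to put "description aiding access" into practice is to store a short description next to each data file. The following is a minimal Python sketch, assuming a JSON "sidecar" file is acceptable; the field names and the .metadata.json suffix are illustrative choices, not a formal metadata standard.

```python
import json
from datetime import date
from pathlib import Path


def write_metadata_sidecar(data_file, title, creator, description, keywords):
    """Write a small JSON description next to a data file and return its path."""
    data_path = Path(data_file)
    record = {
        "title": title,
        "creator": creator,
        "description": description,
        "keywords": sorted(keywords),
        "file_name": data_path.name,
        "file_format": data_path.suffix.lstrip(".").lower(),
        "date_described": date.today().isoformat(),
    }
    # e.g. survey_2016.csv -> survey_2016.csv.metadata.json
    sidecar = data_path.with_name(data_path.name + ".metadata.json")
    sidecar.write_text(
        json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return sidecar
```

A repository will usually ask for its own set of fields, so a sidecar like this is best seen as working notes that make later deposit easier.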
BACKUPS
cc: godog - https://www.flickr.com/photos/27519926@N00
BACKUPS
VERSIONING:
giving each changed version of a file a unique
name
DATA LOSS IS A MATTER OF WHEN, NOT IF
cc: deanmeyersnet - https://www.flickr.com/photos/36363318@N04
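Versioning and backups can be combined in a few lines. The following is a minimal Python sketch, assuming a timestamp in the file name is an acceptable way to make each version unique and that a second drive or folder is available as the backup location; the function names are illustrative only.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_version(source, backup_dir, now=None):
    """Copy source into backup_dir under a unique, timestamped name."""
    source = Path(source)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{source.stem}_{stamp}{source.suffix}"
    if target.exists():
        # Never overwrite an earlier version.
        raise FileExistsError(f"{target} already exists; refusing to overwrite")
    shutil.copy2(source, target)
    # Verify that the copy is byte-for-byte identical to the original.
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"checksum mismatch for {target}")
    return target
```

Calling backup_version("results.csv", "/path/to/backup") after each working session leaves a trail of dated copies; keeping at least one copy on a different device or service guards against hardware failure.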
NAMING CONVENTIONS
A GLIMPSE INTO CHAOS
NAMING CONVENTIONS
if it is valuable make sure you can find it!
Bulk Rename Utility (Windows; free)
Renamer (Mac; free trial)
PSRenamer (Linux, Mac, Windows; free)
AID METADATA AND RETRIEVAL
cc: _ambrown - https://www.flickr.com/photos/79105258@N00
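For those who prefer a script to a renaming utility, a naming convention can also be applied in code. The following is a minimal Python sketch, assuming a convention of project_description_date_version; that pattern and the function names are illustrative, not a prescribed standard.

```python
import re
from pathlib import Path


def conventional_name(project, description, day, version, suffix):
    """Build a name like 'hkbu-survey_interview-notes_20160413_v02.txt'."""
    def slug(text):
        # Lowercase, and collapse anything that is not a letter or digit into '-'.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return (
        f"{slug(project)}_{slug(description)}_"
        f"{day:%Y%m%d}_v{version:02d}{suffix.lower()}"
    )


def plan_renames(folder, project, day, version=1):
    """Return (old_path, new_path) pairs without touching any file."""
    plan = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            new_name = conventional_name(
                project, path.stem, day, version, path.suffix
            )
            plan.append((path, path.with_name(new_name)))
    return plan
```

The plan is only a preview: after checking it for surprises or name collisions, each pair can be applied with old_path.rename(new_path).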
FILE FORMATS AND ACCESS
cc: bfirsh - https://www.flickr.com/photos/37343463@N08
FILE FORMATS
 .jpg
 .csv
 .mp4
 .xml
cc: sntc06 - https://www.flickr.com/photos/18169200@N00
.xlsx
.docx
.pptx
.psd
.mov
ACCESS
cc: BasBoerman - https://www.flickr.com/photos/31149081@N02
DATA SHARING & CITATION
DATA SHARING
NOT HIDDEN IN SOME DRAWER
cc: sindesign - https://www.flickr.com/photos/54774885@N00
CITING DATA?
YES. LIBRARIANS ARE STANDARDISING THIS NOW
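To make the idea of a data citation concrete, here is a minimal Python sketch that assembles one from its parts. It loosely follows the general pattern Creator (Year). Title. Publisher. Identifier; the function name is illustrative, and citation styles required by a journal or repository should take precedence.

```python
def format_data_citation(creators, year, title, publisher, doi, version=None):
    """Return 'Creator (Year). Title. [Version.] Publisher. https://doi.org/...'."""
    if not creators:
        raise ValueError("at least one creator is required")
    parts = [f"{'; '.join(creators)} ({year})", title.rstrip(".")]
    if version:
        parts.append(f"Version {version}")
    parts.append(publisher.rstrip("."))
    # Accept a bare DOI or one already written as a URL.
    bare = doi.replace("https://doi.org/", "").replace("http://dx.doi.org/", "")
    return ". ".join(parts) + f". https://doi.org/{bare}"
```

For example, format_data_citation(["Lee, A.", "Chan, B."], 2016, "Example survey responses", "Example Repository", "10.1234/example") builds a one-line citation; every value in that call, including the DOI, is a made-up placeholder.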
TOOLS FOR WORKING WITH DATA
OPEN REFINE
formerly Google Refine
• freely downloadable client
• much more powerful than Excel for cleaning messy data
• easily converts to “library-ready” formats,
like XML, JSON and RDF triples
• excellent for storing and versioning
http://openrefine.org/
https://github.com/OpenRefine
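To show what converting tabular data to another format involves, here is a minimal Python sketch that turns CSV text into JSON. It is a plain-Python stand-in for illustration only and does not use or reproduce OpenRefine itself; the function name is hypothetical.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Trim stray whitespace, a common clean-up step before sharing data.
    # Missing cells become empty strings; cells beyond the header are dropped.
    cleaned = [
        {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        for row in rows
    ]
    return json.dumps(cleaned, indent=2, ensure_ascii=False)
```

A dedicated tool adds what this sketch lacks, such as interactive clean-up across many rows and an undo history.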
OPEN REFINE
DEMO
SILK
a very powerful online relational database and data
visualization tool
https://www.silk.co/
PLOTLY
a simpler data visualization tool, which can be used
together with other software and file formats
https://www.plot.ly/
REPOSITORIES FOR
STORING
DESCRIBING
AND ALLOWING DATA
TO BE
ACCESSIBLE
FIGSHARE
a free repository that specialises in hosting
supplementary materials to research
• mostly made up of data published in PLoS
One, Nature and Science journals
• free DOI minting (important for accessibility
and citations)
• can set keywords (metadata)
• track use
https://www.figshare.com/
GITHUB
a free repository that houses open-source tools
and code, such as OpenRefine
https://github.com/
DRYAD
a free repository, for searching and sharing life sciences
research data
http://datadryad.org/
ICPSR
a huge social science research data repository, hosted by the
University of Michigan
https://www.icpsr.umich.edu/icpsrweb/landing.jsp
CITED REFERENCES
Badke, W. (2010). Why Information Literacy Is Invisible. Communications in Information Literacy, 4(2),
129–141.
Carlson, J., Fosmire, M., Miller, C. C., & Sapp Nelson, M. (2011). Determining Data Information Literacy
Needs: A Study of Students and Research Faculty. Portal: Libraries and the Academy, 11(2), 629–657.
http://dx.doi.org/10.1353/pla.2011.0022
OTHER RESOURCES
CITING DATA:
Mooney, H., & Newton, M. P. (2012). The Anatomy of a Data Citation: Discovery, Reuse, and Credit.
Journal of Librarianship and Scholarly Communication, 1(1), eP1035.
http://dx.doi.org/10.7710/2162-3309.1035
DataCite
https://www.datacite.org/
FORMATS, DESCRIPTION (metadata), STORAGE and COPYRIGHT
California Digital Library’s Guide to working with Data
https://dmptool.org/community_resources
DISCUSSION?
cc: IITA Image Library - https://www.flickr.com/photos/45796762@N03
cc: jessamyn - https://www.flickr.com/photos/35034353562@N01