ESIP Federation 101 - ESIP Commons

Section: Local Data Management
Data Formats: Choosing and Adopting Community Accepted Standards
Curt Tilmes
NASA
Version 1.0
February 2013
Copyright 2013 Curt Tilmes
Local Data Management - Data Formats: Choosing and Adopting Community Accepted Standards; Version 1.0, February 2013
Overview
• Some guidelines for choosing and adopting community
accepted standards
Background
• Most projects (rightly so) focus on the content of their data
files, but you need to consider the format as well.
• Since you captured or created the data, and stored them in
your own files, you know:
• how the data are organized,
• how to read them,
• how to use them,
• characteristics of the data that could constrain their use.
• The goal of a good data format is to make it easier for
others to read the data too.
• Many hours have gone into developing standards for
formats – try to learn from them.
Why use community standards?
• If you try to develop your data format from scratch, you will
forget something.
• Build on the experience and improvements built into the
community standards over years of use.
• Tools and analysis software natively support reading
community standard data.
• Reduce development effort and support reuse.
• Positive feedback – they are more likely to be adopted by
others.
Why use community standards?
http://xkcd.com/927/
A few guidelines
• Consider your archive:
• Do they have any recommendations?
• Consider your users:
• Who wants these data? Why do they want them?
• What do they want to do with them?
• Will they be using your data in concert with other data?
• Consider heritage:
• What worked well for similar data in the past?
• What could be done better for newly created data?
• Consider tools:
• Try to use data formats supported by the software you intend to use
the data with.
Some examples
• HDF – Hierarchical Data Format
• HDF4 and HDF5 versions are in use today
• A NASA variant called HDF-EOS is used within the Earth Observing
System program.
• The Aura project developed a common approach across their
instruments and released guidelines as a Technical Note.
• NetCDF – Network Common Data Form
• Widely used by agencies including NASA and NOAA
• Climate and Forecast (CF) metadata conventions standardize how
variables, units, and coordinates are described in NetCDF files.
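As a rough illustration of what conventions like CF ask for, the sketch below checks that each variable carries a few descriptive attributes. The attribute names `units`, `standard_name`, and `long_name` are CF attribute names, but everything else is an assumption made for illustration: the metadata is held in plain Python dictionaries rather than read through a NetCDF library, and the function name, the sample variables, and the choice to flag all three attributes are hypothetical. It is not a CF compliance checker.

```python
# Hypothetical stand-in for a file's variable metadata: plain dicts,
# not the API of any NetCDF library.
RECOMMENDED_VARIABLE_ATTRS = ("units", "standard_name", "long_name")


def missing_attributes(variables):
    """Return {variable name: [missing attribute names]} for incomplete variables."""
    report = {}
    for name, attrs in variables.items():
        missing = [key for key in RECOMMENDED_VARIABLE_ATTRS if key not in attrs]
        if missing:
            report[name] = missing
    return report


# Illustrative sample metadata (assumed, not taken from a real file).
sample_variables = {
    "air_temp": {
        "units": "K",
        "standard_name": "air_temperature",
        "long_name": "Air temperature at 2 m",
    },
    "t2": {"long_name": "temperature"},  # no units, no standard name
}

print(missing_attributes(sample_variables))
# prints {'t2': ['units', 'standard_name']}
```

A check like this is cheap to run on sample files before sharing them, so gaps in the metadata surface before users find them.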
Adopting standards
• The standard gives you a starting point, not a complete
solution.
• Communicate early with a broad range of data users:
archivists, software engineers, scientists.
• Consider how you will be writing the data and how you will
be reading the data.
• Get feedback before making final decisions.
• Start sharing sample data in proposed format to nail down
specifics and work out ambiguities.
• Document your use and application of the standard
completely.
Resources
• HDF: http://www.hdfgroup.org
• HDF-EOS: http://hdfeos.org
• HDF-EOS Aura File Format Guidelines:
• http://disc.sci.gsfc.nasa.gov/Aura/additional/documentation/HDFEOS_Aura_File_Format_Guidelines.pdf
• http://www.esdswg.org/spg/spgfolder/events/esdswg-meeting-october25-27-2005/auraasabestpracticerev2.pdf
• NetCDF: http://www.unidata.ucar.edu/software/netcdf
• CF: http://cf-pcmdi.llnl.gov/
Other Relevant Modules
• Local Data Management – Data Formats: Using Self-describing Data Formats
• Learn more about the advantages of using formats for your data that
have important metadata and other information embedded within them
Recommended Citations
Tilmes, C. 2013. “Local Data Management – Data Formats: Choosing
and Adopting Community Accepted Standards.” In Data Management
for Scientists Short Course, edited by Ruth Duerr and Nancy J.
Hoebelheinrich, Federation of Earth Science Information Partners:
ESIP Commons. doi:10.7269/P33N21B6
Copyright 2013 Curt Tilmes.