Key Provenance of Earth Science Observation Data Products

Key Provenance of Earth Science
Observation Data Products
Helen Conover1, Beth Plale2, Mehmet
Aktas2, Rahul Ramachandran1,
Prajakta Purohit2, Scott Jensen2, Sara
Graves1
1University
of Alabama Huntsville
2Indiana University
http://provenance.itsc.uah.edu
[email protected]
[email protected]
AGU Fall Meeting
San Francisco, 2011
Project Overview
•  Collaboration among
Data Center Opera+ons –  AMSR-E SIPS (MSFC Earth Science Office
Instant Karma
and UAHuntsville ITSC)
–  Indiana University, Data to Insight Center
Project
Provenance Research Earth Science –  AMSR-E Sea Ice science team (GSFC)
•  Primary goal is to improve the collection, preservation, utility and
dissemination of provenance and context information within the NASA
Earth Science community
–  Using Karma provenance tool
–  AMSR-E standard product generation, with initial focus on sea ice
AMSR-E SIPS Overview
The Advanced Microwave Scanning Radiometer –
Earth Observing System (AMSR-E) is a passive
microwave radiometer built by the Japan Aerospace
Exploration Agency and launched aboard Aqua on 4
May 2002.
–  AMSR-E is designed to detect water in all its state phases in the
environment and monitor the water processes that exert a strong influence
on climate and weather.
Science Investigator-led Processing Systems (SIPS):
–  Work in close cooperation with the instrument science teams and the
Distributed Active Archive Centers (DAACs)
–  To generate high quality climate data products for use by Earth scientists
and science application end-users
AMSR-E Product Suite
Product Description
Science PI
AMSR-E/Aqua L2A Global Swath Spatially-Resampled Brightness
Temperatures (Tb)
Wentz
AMSR-E/Aqua L2B Global Swath Ocean Products derived from Wentz
Algorithm
L3 Ascending/Descending .25x.25 deg Daily Ocean Grids
L3 Weekly and Monthly Ocean Grids
Wentz
AMSR-E/Aqua L2B Surface Soil Moisture, Ancillary Parms, & QC EASE-Grids
Daily L3 Surface Soil Moisture, Interpretive Parms & QC EASE-Grids
Njoku
AMSR-E/Aqua L2B Global Swath Rain Rate/Type GSFC Profiling Algorithm
L3 Monthly Global Averaged Rainfall Accumulation over Land & Ocean
Kummerow,
Ferraro,
Wilheit
AMSR-E/Aqua Daily L3 6.25 km 89 GHz Brightness Temperature (Tb) Polar
Grids
L3 Daily 12.5 km Tb, Sea Ice Concentration, & Snow Depth Polar Grids
L3 Daily 25 km Tb, Sea Ice Concentration
Sea Ice Drift
Markus,
Cavalieri,
Comiso
AMSR-E/Aqua Daily L3 Global Snow Water Equivalent EASE-Grids
L3 5-Day and Monthly Snow Water Equivalent grids
Kelly
Sample Imagery
Science-Relevant
Provenance Information
•  Data lineage (data inputs, software and hardware) plus additional
contextual knowledge about science algorithms, instrument variations,
etc.
•  Lots of information already available, but scattered across multiple
locations
Processing system configuration
Data collection and file level metadata
Processing history information
Quality assurance information
Software documentation (e.g., algorithm theoretical basis documents, release
notes)
–  Data documentation (e.g., guide documents, README files)
– 
– 
– 
– 
– 
•  Project aims to collate and organize information from multiple sources,
make available through the AMSR-E Provenance Browser
SIPS-GHRC Processing Architecture
•  Algorithm Packages
from the Science
Team and Team
Lead of the Science
Computing Facility
•  Processing
automation
controlled by SIPS
scripts
– Pass processing
is data driven
– L3 product
generation is
scheduled after
nominal
availability of
input products
Ancillary File(s)
Control Script Input Data and
Metadata File(s)
Delivered Algorithm Package Science Data
Metadata
Processing History
Quality Assurance
Browse imagery
Custom subsets
Provenance Information Capture
Product
Generation
Software
Provenance
Logs
Product/
Algorithm
Metadata Form
Provenance
Browser
Granule
metadata
Provenance Repository
Product /
Algorithm
Metadata
Drupal web
application
AMSR-E Provenance Use Cases
ü 
Browse provenance graphs : convey rich information about final data granule
details [Use case 1]
-  Spatial location, time of observation, algorithms employed, input data and ancillary
files
-  Provenance bundle to include pointers to relevant documentation
ü 
Answer “Something isn’t right” question [Use case 1 variant]
• 
Compare two data granules [Use case 2]
ü 
General provenance graph for a given science process, e.g., Sea Ice processing
[Use case 3]
- 
- 
- 
• 
E.g., did not receive data for several days so snow melt mask may be inaccurate.
Query system to get list of provenance differences (e.g., versions of software, number and
versions of input files)
Current algorithms and versions, nominal number and versions of input files, pointers to
relevant documentation
Embed provenance information as annotations in data files
-  ISO “Lineage” model
-  NASA ES conventions?
Project Results
Products for potential reuse resulting
from this research:
•  Provenance collection tools
–  Software library with reference implementation in
Perl
–  Schema for provenance logs
•  Schema for high-level data product and
algorithm context information
•  Drupal profile for provenance storage and
display, includes two new Drupal modules:
–  Provenance browser
–  Data product and algorithm context form
Earth science Library for
Processing History (ELPH)
•  Perl Library reference implementation can be used to
instrument script-driven processing
•  Logs “consumed”, “invoked” and “produced” events
•  Assigns unique URIs to all artifacts (e.g., data files or software processes), of
the form http://host-id/environment/datetime/name
–  host-id indicates computer system on which processing is performed
–  environment indicates processing environment (e.g., dev, test or ops for AMSRE)
–  datetime (yyyymmddhhmmss) indicates production date/time for science data
files, last modification date/time for other files, system date/time for processes or
workflows
–  name indicates name of file or process
Provenance and Context Information
•  Harvesting granule information from ECS metadata
–  Also recording processing location associated with each data granule
•  Worked with AMSR-E Science Computing Facility to identify Context
Summary information for algorithm and data product
–  Algorithm versions and descriptions
–  Parameters and data fields in science products
–  Ancillary files used in processing
–  Flag values and explanations
–  Pointers to full documentation
•  Provenance logs for “consumed”, “invoked”, and “produced” events
•  OPM-compliant, science relevant data provenance schema
–  provenance information exchange, Karma repository to AMSR-E Provenance
Browser
•  ISO Lineage for granule provenance metadata
Context Summary Metadata
Schema
Basic Product Information
Delivered Algorithm Package
File Information
Grid type
Documentation Links
Science Algorithm
Information
Geophysical
Parameters / Variables
and Associated
Information
Data fields
Ancillary files
Flag values
AMSR-E Provenance Browser
Browse by
Product Images
Browse by
Product Type
Search by
Filename
Explore
Provenance
AMSR-E Provenance Browser
Processing
Graph
Algorithm
Information
Granule
Metadata
Parameter
In File
AMSR-E Provenance Browser
Parameter
Selected
Algorithm
Used
Status and Plans
•  Currently able to collect provenance information for all
AMSR-E standard products generated at SIPS-GHRC
–  Level-2B and Level-3 products
–  Implemented in provenance testbed
•  Plan to implement provenance collection in AMSR-E SIPS
before reprocessing to begin in February 2012
•  Working with NSIDC DAAC on delivery of provenance
information with data products
–  Content and format of information to provide
–  Include as annotation within the data granule, separate additional
metadata file, or both