Key Provenance of Earth Science Observation Data Products Helen Conover1, Beth Plale2, Mehmet Aktas2, Rahul Ramachandran1, Prajakta Purohit2, Scott Jensen2, Sara Graves1 1University of Alabama Huntsville 2Indiana University http://provenance.itsc.uah.edu [email protected] [email protected] AGU Fall Meeting San Francisco, 2011 Project Overview • Collaboration among Data Center Opera+ons – AMSR-E SIPS (MSFC Earth Science Office Instant Karma and UAHuntsville ITSC) – Indiana University, Data to Insight Center Project Provenance Research Earth Science – AMSR-E Sea Ice science team (GSFC) • Primary goal is to improve the collection, preservation, utility and dissemination of provenance and context information within the NASA Earth Science community – Using Karma provenance tool – AMSR-E standard product generation, with initial focus on sea ice AMSR-E SIPS Overview The Advanced Microwave Scanning Radiometer – Earth Observing System (AMSR-E) is a passive microwave radiometer built by the Japan Aerospace Exploration Agency and launched aboard Aqua on 4 May 2002. – AMSR-E is designed to detect water in all its state phases in the environment and monitor the water processes that exert a strong influence on climate and weather. Science Investigator-led Processing Systems (SIPS): – Work in close cooperation with the instrument science teams and the Distributed Active Archive Centers (DAACs) – To generate high quality climate data products for use by Earth scientists and science application end-users AMSR-E Product Suite Product Description Science PI AMSR-E/Aqua L2A Global Swath Spatially-Resampled Brightness Temperatures (Tb) Wentz AMSR-E/Aqua L2B Global Swath Ocean Products derived from Wentz Algorithm L3 Ascending/Descending .25x.25 deg Daily Ocean Grids L3 Weekly and Monthly Ocean Grids Wentz AMSR-E/Aqua L2B Surface Soil Moisture, Ancillary Parms, & QC EASE-Grids Daily L3 Surface Soil Moisture, Interpretive Parms & QC EASE-Grids Njoku AMSR-E/Aqua L2B Global Swath Rain Rate/Type GSFC Profiling Algorithm L3 Monthly Global Averaged Rainfall Accumulation over Land & Ocean Kummerow, Ferraro, Wilheit AMSR-E/Aqua Daily L3 6.25 km 89 GHz Brightness Temperature (Tb) Polar Grids L3 Daily 12.5 km Tb, Sea Ice Concentration, & Snow Depth Polar Grids L3 Daily 25 km Tb, Sea Ice Concentration Sea Ice Drift Markus, Cavalieri, Comiso AMSR-E/Aqua Daily L3 Global Snow Water Equivalent EASE-Grids L3 5-Day and Monthly Snow Water Equivalent grids Kelly Sample Imagery Science-Relevant Provenance Information • Data lineage (data inputs, software and hardware) plus additional contextual knowledge about science algorithms, instrument variations, etc. • Lots of information already available, but scattered across multiple locations Processing system configuration Data collection and file level metadata Processing history information Quality assurance information Software documentation (e.g., algorithm theoretical basis documents, release notes) – Data documentation (e.g., guide documents, README files) – – – – – • Project aims to collate and organize information from multiple sources, make available through the AMSR-E Provenance Browser SIPS-GHRC Processing Architecture • Algorithm Packages from the Science Team and Team Lead of the Science Computing Facility • Processing automation controlled by SIPS scripts – Pass processing is data driven – L3 product generation is scheduled after nominal availability of input products Ancillary File(s) Control Script Input Data and Metadata File(s) Delivered Algorithm Package Science Data Metadata Processing History Quality Assurance Browse imagery Custom subsets Provenance Information Capture Product Generation Software Provenance Logs Product/ Algorithm Metadata Form Provenance Browser Granule metadata Provenance Repository Product / Algorithm Metadata Drupal web application AMSR-E Provenance Use Cases ü Browse provenance graphs : convey rich information about final data granule details [Use case 1] - Spatial location, time of observation, algorithms employed, input data and ancillary files - Provenance bundle to include pointers to relevant documentation ü Answer “Something isn’t right” question [Use case 1 variant] • Compare two data granules [Use case 2] ü General provenance graph for a given science process, e.g., Sea Ice processing [Use case 3] - - - • E.g., did not receive data for several days so snow melt mask may be inaccurate. Query system to get list of provenance differences (e.g., versions of software, number and versions of input files) Current algorithms and versions, nominal number and versions of input files, pointers to relevant documentation Embed provenance information as annotations in data files - ISO “Lineage” model - NASA ES conventions? Project Results Products for potential reuse resulting from this research: • Provenance collection tools – Software library with reference implementation in Perl – Schema for provenance logs • Schema for high-level data product and algorithm context information • Drupal profile for provenance storage and display, includes two new Drupal modules: – Provenance browser – Data product and algorithm context form Earth science Library for Processing History (ELPH) • Perl Library reference implementation can be used to instrument script-driven processing • Logs “consumed”, “invoked” and “produced” events • Assigns unique URIs to all artifacts (e.g., data files or software processes), of the form http://host-id/environment/datetime/name – host-id indicates computer system on which processing is performed – environment indicates processing environment (e.g., dev, test or ops for AMSRE) – datetime (yyyymmddhhmmss) indicates production date/time for science data files, last modification date/time for other files, system date/time for processes or workflows – name indicates name of file or process Provenance and Context Information • Harvesting granule information from ECS metadata – Also recording processing location associated with each data granule • Worked with AMSR-E Science Computing Facility to identify Context Summary information for algorithm and data product – Algorithm versions and descriptions – Parameters and data fields in science products – Ancillary files used in processing – Flag values and explanations – Pointers to full documentation • Provenance logs for “consumed”, “invoked”, and “produced” events • OPM-compliant, science relevant data provenance schema – provenance information exchange, Karma repository to AMSR-E Provenance Browser • ISO Lineage for granule provenance metadata Context Summary Metadata Schema Basic Product Information Delivered Algorithm Package File Information Grid type Documentation Links Science Algorithm Information Geophysical Parameters / Variables and Associated Information Data fields Ancillary files Flag values AMSR-E Provenance Browser Browse by Product Images Browse by Product Type Search by Filename Explore Provenance AMSR-E Provenance Browser Processing Graph Algorithm Information Granule Metadata Parameter In File AMSR-E Provenance Browser Parameter Selected Algorithm Used Status and Plans • Currently able to collect provenance information for all AMSR-E standard products generated at SIPS-GHRC – Level-2B and Level-3 products – Implemented in provenance testbed • Plan to implement provenance collection in AMSR-E SIPS before reprocessing to begin in February 2012 • Working with NSIDC DAAC on delivery of provenance information with data products – Content and format of information to provide – Include as annotation within the data granule, separate additional metadata file, or both
© Copyright 2025 Paperzz