RDU - Project

Adoption of RDA-DFT Terminology and Data Model for the Description and
Structuring of Atmospheric Data
Aaron Addison, Rudolf Husar, Cynthia Hudson-Vitale
Background
• DataFed & the Air Quality Community Catalog
Problem Addressed
• Facilitate data interoperability
• Extend data discovery to non-domain researchers
RDA Data Foundation and Terminology (DFT)
Data Foundation & Terminology WG
RDA Data Foundation and Terminology (DFT) Adoption Plan
● Map DFT model to DataFed/AQ ComCat data model
● Assess potential RDA/DFT compliance
● Real-world evaluation of outcome
RDA Data Foundation and Terminology (DFT) Adoption Activities & Timeline:
• Training: ongoing
• Draft DataFed data model and inventory terms; evaluate existence of PIDs: March
• Virtual server: March
• Mirror site of AQComCat: March
• User testing: March
• Compare DFT model to DataFed data model: March/April
• Create/assign PIDs to AQComCat: April
• Reboot AQComCat: May
• Add a new data source to AQComCat; test understandability of terms with data suppliers: June
• Conduct post-DFT-implementation usability testing of AQComCat: July
• Publish paper/report on findings: August
RDA Data Foundation and Terminology (DFT) Adoption Outcomes
● Report on usability of RDA DFT model
● Assess fit of the RDA DFT model to DataFed data model
● Evaluate improved discoverability/reuse
● Engage with Data Foundation and Terminology Working Group
Thank you!
[Figure: RDU Project: Current Catalog → Adopt, Refine RDU products (DTR) → RDU-Compliant Catalog]
Interaction with RDU Groups
DTR: Data Type Registries WG (register types)
PID Information Types WG (get PIDs?)
DFT: Data Foundation and Terminology WG (data model…?)
DF: Data Fabric IG (DataFed use case?)
Data Type Registry for Sharing and Reuse
We will use the RDA Type Registry (L. Lannom).
The registry needs to be federated, e.g., with the GCMD registry.
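One way to picture a federated registry is a resolver that asks several member registries in priority order and returns the first match. The sketch below is a minimal in-memory illustration under stated assumptions: the `FederatedRegistry` class, the registry contents and the PID strings are hypothetical placeholders, not the actual RDA Type Registry or GCMD interfaces.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TypeRecord:
    pid: str          # persistent identifier (hypothetical values below)
    name: str
    description: str


class FederatedRegistry:
    """Resolve a type name by querying member registries in priority order.

    Hypothetical sketch: real registries would be remote services, not dicts.
    """

    def __init__(self) -> None:
        self._members: List[Tuple[str, Dict[str, TypeRecord]]] = []

    def add_member(self, label: str, entries: Dict[str, TypeRecord]) -> None:
        # Index entries by lower-cased name so lookups are case-insensitive.
        self._members.append((label, {k.lower(): v for k, v in entries.items()}))

    def resolve(self, name: str) -> Optional[Tuple[str, TypeRecord]]:
        # The first member that knows the name wins; later members are fallbacks.
        key = name.lower()
        for label, entries in self._members:
            if key in entries:
                return label, entries[key]
        return None


# Hypothetical contents, for illustration only.
dtr_entries = {
    "SpatialResolution": TypeRecord("hdl:EXAMPLE/0001", "SpatialResolution", "Grid or footprint size"),
}
gcmd_entries = {
    "Platform": TypeRecord("gcmd:EXAMPLE-0002", "Platform", "Observing platform"),
}

federated = FederatedRegistry()
federated.add_member("RDA Type Registry", dtr_entries)
federated.add_member("GCMD", gcmd_entries)
print(federated.resolve("platform"))
```

Listing the RDA registry first means its definitions take precedence when both registries define the same name; that ordering is a design choice made for this sketch, not something the project has specified.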
What is Data Typing?
Data ‘typing’ is the characterization of the data structure, contexts, assumptions and other information needed to describe and understand the data.
The ‘types’ need to be:
•Defined and understood by data producers and consumers
•Available at multiple levels of granularity, from single observations to data sets (how?)
•Identified by a PID each
•Permanently associated with the data they describe
•Standardized (OGC, ISO), unique (PID), and discoverable (Type Registry?)
Data typing should aid the discovery, understanding, sharing and reuse of data across domains:
•Automated processing of large data collections is a necessity
•This requires machine-readable types, i.e., a clear data model for typing (clarify?)
•‘Composability’: lower-level/base types can form more complex composite types (how?)
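The requirements above (a PID per type, machine-readable form, composite types built from base types) can be illustrated with a small data structure. This is a minimal sketch, assuming a type is just a PID, a name, a description and a list of component types; the `DataType` class and all PIDs are hypothetical, and it is not the DTR data model.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataType:
    pid: str                  # hypothetical PID; a real one would come from the Type Registry
    name: str
    description: str
    components: List["DataType"] = field(default_factory=list)  # empty for base types

    @property
    def is_composite(self) -> bool:
        return bool(self.components)

    def base_types(self) -> List["DataType"]:
        # Recursively expand a composite type down to its base types.
        if not self.components:
            return [self]
        result: List["DataType"] = []
        for part in self.components:
            result.extend(part.base_types())
        return result

    def to_dict(self) -> dict:
        # Machine-readable form: composites refer to their parts by PID only.
        return {
            "pid": self.pid,
            "name": self.name,
            "description": self.description,
            "components": [part.pid for part in self.components],
        }


# Base (lower-level) types
lat = DataType("pid:example/lat", "Latitude", "Degrees north")
lon = DataType("pid:example/lon", "Longitude", "Degrees east")
obs_time = DataType("pid:example/time", "Time", "Observation time")

# Composite types built from base types
location = DataType("pid:example/location", "PointLocation", "Latitude/longitude pair", [lat, lon])
context = DataType("pid:example/context", "ObservationContext",
                   "Where and when an observation was made", [location, obs_time])

print(json.dumps(context.to_dict(), indent=2))
print([t.name for t in context.base_types()])
```

Referencing components by PID rather than embedding them keeps each serialized type small and lets a registry resolve the parts independently.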
Global Change Master Directory (GCMD)
Extensive collection of keywords and UUIDs; possible use for ‘Types’
Do we combine it with AQ ComCat Types? Any other registries to federate?
GCMD/IDN has released Version 8.1 of the GCMD/IDN Science Keywords; a RESTful service (API) is also available. Keyword List:
Science and Services Keywords: Category, Topic, Term, Variable Level, Detailed Variable, UUID
Other ‘Types’ (some are useful and are to be formally defined and assigned IDs in the RDU Type registry): Data Centers, Projects, Instruments, Platforms, Locations, Horizontal Resolution, Vertical Resolution, Temporal Resolution, URL Content Types
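GCMD Science Keywords are hierarchical (Category > Topic > Term > Variable levels > Detailed Variable), which is what makes them candidates for ‘Types’. A minimal sketch, assuming keywords arrive as ‘>’-delimited path strings; the function name and the example keyword path are illustrative, and the sketch does not call the GCMD REST service.

```python
from typing import Dict, Optional

# Levels of the GCMD Science Keywords hierarchy, from broadest to narrowest.
LEVELS = [
    "Category", "Topic", "Term",
    "Variable_Level_1", "Variable_Level_2", "Variable_Level_3",
    "Detailed_Variable",
]


def parse_keyword_path(path: str, uuid: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Split a '>'-delimited keyword path into named hierarchy levels.

    Levels absent from the path are set to None. The UUID is passed in as-is
    from wherever the keyword was retrieved; none is generated here.
    """
    parts = [p.strip() for p in path.split(">") if p.strip()]
    if len(parts) > len(LEVELS):
        raise ValueError(f"too many levels in keyword path: {path!r}")
    record: Dict[str, Optional[str]] = {level: None for level in LEVELS}
    record.update(zip(LEVELS, parts))
    record["UUID"] = uuid
    return record


kw = parse_keyword_path("EARTH SCIENCE > ATMOSPHERE > AEROSOLS > AEROSOL OPTICAL DEPTH/THICKNESS")
print(kw["Term"], "|", kw["Variable_Level_1"])
```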
Project Outputs and Outcomes, Next Steps
Outputs:
•Develop a data model suitable for describing atmospheric data
•Identify basic and composite types for atmospheric data
•Register these types in DTR
•Attach ‘types’ to data in DataFed
•Type-based search interface to DataFed data.
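A type-based search can be reduced to matching datasets on their attached type (facet) values and counting values per facet to drive a browsing interface. The sketch below assumes each catalog entry carries a flat dictionary of facets; the dataset ids and facet values are hypothetical and not drawn from DataFed.

```python
from typing import Dict, List

# Hypothetical catalog entries, each annotated with type/facet values.
CATALOG: List[Dict] = [
    {"id": "ds-001", "facets": {"Parameter": "PM2.5", "Platform": "Surface", "TimeResolution": "Hour"}},
    {"id": "ds-002", "facets": {"Parameter": "AOD", "Platform": "Satellite", "TimeResolution": "Day"}},
    {"id": "ds-003", "facets": {"Parameter": "O3", "Platform": "Surface", "TimeResolution": "Hour"}},
]


def search(catalog: List[Dict], **criteria: str) -> List[str]:
    """Return ids of datasets whose facets match every criterion exactly."""
    return [
        entry["id"]
        for entry in catalog
        if all(entry["facets"].get(facet) == value for facet, value in criteria.items())
    ]


def facet_counts(catalog: List[Dict], facet: str) -> Dict[str, int]:
    """Count datasets per value of one facet, e.g. to populate a browse menu."""
    counts: Dict[str, int] = {}
    for entry in catalog:
        value = entry["facets"].get(facet)
        if value is not None:
            counts[value] = counts.get(value, 0) + 1
    return counts


print(search(CATALOG, Platform="Surface", TimeResolution="Hour"))
print(facet_counts(CATALOG, "Platform"))
```

Exact matching keeps the sketch simple; with registered types, the facet values would be type PIDs rather than free-text labels.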
Outcomes:
•Real-world testing of typing concepts and the Type Registry
•Understanding of domain-specific issues and approaches; lessons learned
•Interaction with multiple RDU Groups, contributing to the Data Fabric
•Recommendations for the next phase
Next steps outlined?
ToDo’s
Combine AQComCat, GCMD and other ‘keywords’/facets
Formally define ‘RDU Types’ (names, descriptions); get PIDs
Check and reconcile types with the concepts of the DTR, PID and DFT WGs: is it OK?
Register types in the Type Registry
Incorporate type-based metadata into AQComCat
Test catalog usability before and after
[Figure: Observation (Parameter), linked to CF (Climate & Forecast) Conventions and GCMD Keywords, has Attributes (Facets): GCMD Temporal Resolution, GCMD Platform, GCMD Instruments]
Currently, neither the Observations nor the Attributes are uniquely defined.
Some metadata standards and conventions already exist, but they cannot be forced into an RDA ‘standardized’ form.
‘Wrapper’ and ‘adapter’ components are needed to harmonize and integrate metadata.
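A wrapper/adapter can be as simple as a per-source table that renames native metadata keys into one common vocabulary, leaving the original conventions untouched. A minimal sketch under stated assumptions: `standard_name` is a genuine CF attribute, but every other key name, the `ADAPTERS` table and the `harmonize` function are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical adapter tables: source-specific metadata keys -> common model keys.
# Apart from CF's standard_name, the key names are illustrative assumptions.
ADAPTERS: Dict[str, Dict[str, str]] = {
    "cf_style": {
        "standard_name": "Parameter",
        "platform": "Platform",
        "time_resolution": "TimeResolution",
    },
    "aqcomcat_style": {
        "param": "Parameter",
        "platform_type": "Platform",
        "time_res": "TimeResolution",
    },
}


def harmonize(source: str, metadata: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Wrap one source's metadata in the common vocabulary.

    Returns (harmonized, unmapped) so that keys without a mapping are kept
    for review instead of being silently dropped.
    """
    mapping = ADAPTERS[source]
    harmonized: Dict[str, str] = {}
    unmapped: Dict[str, str] = {}
    for key, value in metadata.items():
        if key in mapping:
            harmonized[mapping[key]] = value
        else:
            unmapped[key] = value
    return harmonized, unmapped


a, _ = harmonize("cf_style", {"standard_name": "air_temperature", "units": "K"})
b, _ = harmonize("aqcomcat_style", {"param": "air_temperature", "time_res": "Hour"})
print(a, b)
```

Returning the unmapped keys separately makes gaps in an adapter visible, which is where new types would need to be defined and registered.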
Data Type Model: Atmospheric (Earth?) Observation
Earth Observation
What/How Measured
• Parameter: Name, Desc., STD/Ref. ID
• Instrument: InSitu, RemoteSens
Data Source, Access
• Originator
• Distributor
• Platform
• Provenance
• Domain
• Data type
Where and When
• Spatial Coverage: Point, Grid, Trajectory, Image
• Spatial Extent: LatMinMax, LonMinMax
• Spatial Resolution: e.g., 10 km
• Time Coverage
• Time Extent: TimeMin, TimeMax
• Time Resolution: Year, Month, Day, Hour, Min
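The model above can be expressed as a typed record that rejects values outside its controlled lists. This is a minimal sketch, assuming the enumerated values listed in the model are closed vocabularies and that extents are (min, max) pairs; the class names, field names and the example values are hypothetical placeholders.

```python
from dataclasses import asdict, dataclass, replace
from datetime import datetime
from typing import Optional, Tuple

# Controlled values copied from the model above (assumed to be closed lists).
INSTRUMENT_KINDS = {"InSitu", "RemoteSens"}
COVERAGE_KINDS = {"Point", "Grid", "Trajectory", "Image"}
TIME_RESOLUTIONS = {"Year", "Month", "Day", "Hour", "Min"}


@dataclass
class Parameter:
    name: str
    description: str
    ref_id: Optional[str] = None  # STD/Ref. ID, e.g. a standard name from a convention


@dataclass
class EarthObservation:
    # What/How Measured
    parameter: Parameter
    instrument: str
    # Data Source, Access
    originator: str
    distributor: str
    platform: str
    provenance: str
    domain: str
    data_type: str
    # Where and When
    spatial_coverage: str
    lat_min_max: Tuple[float, float]
    lon_min_max: Tuple[float, float]
    spatial_resolution_km: Optional[float]
    time_min: datetime
    time_max: datetime
    time_resolution: str

    def __post_init__(self) -> None:
        # Reject values outside the controlled lists and inverted extents.
        if self.instrument not in INSTRUMENT_KINDS:
            raise ValueError(f"unknown instrument kind: {self.instrument}")
        if self.spatial_coverage not in COVERAGE_KINDS:
            raise ValueError(f"unknown spatial coverage: {self.spatial_coverage}")
        if self.time_resolution not in TIME_RESOLUTIONS:
            raise ValueError(f"unknown time resolution: {self.time_resolution}")
        for lo, hi in (self.lat_min_max, self.lon_min_max):
            if lo > hi:
                raise ValueError("min must not exceed max in spatial extent")
        if self.time_min > self.time_max:
            raise ValueError("TimeMin must not be later than TimeMax")


# Hypothetical example record; all values are placeholders.
obs = EarthObservation(
    parameter=Parameter("PM2.5", "Fine particulate matter"),
    instrument="InSitu",
    originator="ExampleOriginator",
    distributor="ExampleDistributor",
    platform="Surface station",
    provenance="Raw hourly data",
    domain="Air quality",
    data_type="Observation",
    spatial_coverage="Point",
    lat_min_max=(25.0, 50.0),
    lon_min_max=(-125.0, -65.0),
    spatial_resolution_km=None,
    time_min=datetime(2015, 1, 1),
    time_max=datetime(2015, 12, 31),
    time_resolution="Hour",
)
print(asdict(obs)["parameter"]["name"])

try:
    replace(obs, time_min=datetime(2016, 1, 1))  # TimeMin after TimeMax
except ValueError as err:
    print("rejected:", err)
```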
DataFed: Federated Data System
A System of Systems architecture is suitable for integrating data.
Heterogeneous data can be non-intrusively standardized by mediators.
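The mediator idea can be sketched as small adapters that translate each source's native format into one common record form at query time, leaving the sources themselves untouched. This is a minimal illustration, assuming two hypothetical sources with different time encodings and units; the function names and formats are not DataFed's actual interfaces.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Iterator, List, Tuple

# Common form produced by every mediator: (site id, UTC time, concentration in ug/m3).
Record = Tuple[str, datetime, float]


def mediate_source_a(rows: Iterable[str]) -> Iterator[Record]:
    # Hypothetical source A: text rows "site,YYYY-MM-DDTHH:MM,value" in UTC and ug/m3.
    for row in rows:
        site, stamp, value = row.split(",")
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M").replace(tzinfo=timezone.utc)
        yield site, when, float(value)


def mediate_source_b(items: Iterable[Dict]) -> Iterator[Record]:
    # Hypothetical source B: dicts with Unix epoch seconds and values in mg/m3.
    for item in items:
        when = datetime.fromtimestamp(item["epoch"], tz=timezone.utc)
        yield item["station"], when, item["conc_mg_m3"] * 1000.0  # mg/m3 -> ug/m3


def federated_query(a_rows: Iterable[str], b_items: Iterable[Dict]) -> List[Record]:
    # Sources are read as-is (non-intrusive); only the mediator outputs are merged.
    merged = list(mediate_source_a(a_rows)) + list(mediate_source_b(b_items))
    return sorted(merged, key=lambda r: (r[1], r[0]))


rows_a = ["S1,2015-03-01T12:00,12.5"]
items_b = [{"station": "S2", "epoch": 1425211200, "conc_mg_m3": 0.02}]
for record in federated_query(rows_a, items_b):
    print(record)
```

Adding a new source then means writing one more mediator function; neither the existing sources nor the consumers of the common form have to change.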
[Figure: Air Quality Decision Systems. Users: EOs & Modelers, EO Service Providers, Discipline Scientists, Health & Env. Analysts, Policy Managers. Observations: Monitoring Network, Satellite, Model, Emission, accessed through a Shared Data Pool. Benefits: Informing the Public, Protecting Health, Atmospheric Science, Global Policies.]
In the new GEOSS paradigm, EOs should be accessible from a shared virtual data pool.
[Figure: DataFed Information Infrastructure. Data Sharing Infrastructure: Std. Servers, Adaptors, Data Pool, Std. Tools, User Tools. Observations: Monitoring Network, Satellite, Model, Emission. Benefits: Informing the Public, Health Effects, Climate Impact, Science & Education.]
DataFed is an implementation of the GEOSS data sharing paradigm.
DataFed also includes client applications for data browsing, exploration and analysis.
These flexible tools can be used on any dataset from anywhere on the Web.