Organising Data Access for Diverse Communities: GEOSS and Beyond
Massimo Craglia, Elena Roglia
European Commission Joint Research Centre
http://www.geowow.eu/

A lot of data globally available
• But is it really available?
• Do you know it exists? Can you access the level of data relevant to your need?
• Can you understand it and use it to address the question you have?
• Do you have access to the tools, methods, and above all the community of users with whom to share experiences and add to cumulative learning?

Lessons from GEOSS
• The GEOSS Data Collection of Open Resources for Everyone (GEOSS Data CORE) is a distributed pool of documented datasets with full, open and unrestricted access at no more than the cost of reproduction and distribution.
• Established in 2010; implementation started in 2011 and gathered pace in 2013.
• Survey of the GEOSS community in 2013, as part of the GEOWOW project, to understand awareness of the Data CORE.

GEOSS Data CORE Survey: Awareness, Involvement, and Challenges
• 70 respondents from 31 countries, belonging to different types of organizations involved in GEOSS.
• 24% of respondents were NOT aware of the concept of GEOSS Data CORE.
• 17% were using the GEOSS Data CORE; 24% were contributing to it.
• Key barriers: GEOSS Data CORE resources are difficult to find and discover; some thematic areas are poorly represented.
• Key advantages: the possibility of reducing data costs and of advancing disciplinary and interdisciplinary research.
• Key limitations: the spatial extent and temporal resolution of the data do not fit users’ needs.
• Improve awareness and participation by providing technical support and disseminating success stories.
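To give a sense of scale for the survey percentages above, the short sketch below converts them into approximate respondent counts. It assumes each percentage is a share of all 70 respondents, which the slides do not state explicitly, and because the published percentages are rounded the counts are only approximate. The variable names are illustrative.

```python
# Approximate head counts behind the survey percentages.
# Assumption: every percentage refers to the full set of 70 respondents.
RESPONDENTS = 70

shares_percent = {
    "not aware of the Data CORE": 24,
    "using the Data CORE": 17,
    "contributing to the Data CORE": 24,
}

# Round to the nearest whole respondent; the source percentages are
# themselves rounded, so these counts are estimates only.
approx_counts = {
    label: round(RESPONDENTS * pct / 100)
    for label, pct in shares_percent.items()
}

for label, count in approx_counts.items():
    print(f"{label}: about {count} of {RESPONDENTS} respondents")
```

Under this assumption, roughly 17 respondents were unaware of the Data CORE, about 12 were using it, and about 17 were contributing to it.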
Accessibility Analysis of GCI
• Assessed by searching with the 50 GCOS Climate Variables as keywords.
• 126,000 records returned (60% GEOSS Data CORE).
• 8% do not provide distribution information.
• 3% are accessible via OGC protocols.
• 29% are mostly accessible via HTTP and FTP protocols.
• 60% do not specify protocols (but have working links).
• Information is lost between the metadata of the raw data (where it exists) and the metadata returned by a catalogue search.
• It is unclear to users whether results represent raw data, processed products, or the outcome of analyses based on the data.

An Australian Geoscience Data Cube
Aaron Sedgmen
Geoscience Australia

GA’s Traditional EO product process
EO products have traditionally been produced on demand for areas of interest from tape archives of scene-based raw data.
Process diagram:
• Search catalogue, order scenes
• 1 petabyte hierarchical archive: millions of individual scenes; tape store accessed by robot
• Orthorectification, calibration, cloud masking, atmospheric correction, mosaicing
• Identify footprint of product in space or time
• Feature extraction, algorithm application, spectral unmixing
• Client requests product
• Product packaging and delivery

“Cubing” Landsat images
[Figure: Landsat images along a time axis; tile squares; dice and stack]

A paradigm shift from traditional methods
• The data cube holds multiple Landsat products for the entire archive, which removes the need to generate products at time of request
• Hosting the data cube at NCI co-locates “big data” with high performance computing, which enables in-situ analysis of the whole archive
• Computational analysis is moved from the scientist’s local environment to a central HPC facility
• Removes the need to download and replicate the data
• Provides computing power not otherwise available to many scientists
• Opens up possibilities to integrate the Landsat archive with other “big data” datasets hosted at the HPC facility

Data Complexity
[Figure: Data Complexity; Potential Number of Users; Difficulty to Understand & Use. Levels: “Raw” Sensor Data; Calibrated “Cubed” Data for Analysis; Summary Information for Policy Advice. Labels: Data, Information, Knowledge]

GA Wednesday Seminar 30/10/13 - Datacube

Use Only the Best Ingredients: Data Provenance in the Datacube
• Tiles link to their source dataset (scene) records in the database for provenance. Tiles have no metadata per se.
• Data provenance must be provided by lookups to authoritative metadata.
• Composite data outputs can contain pixel-based provenance.
e.g. Four-month non-interpolated median NDVI for the entire Murray Darling Basin
• Initial Datacube test area
• 2,112,000,000 pixels (i.e. 2.1 billion)
• Each and every pixel can be traced back to its source observation through provenance information layers

Layering information access
• Using the data provenance as the link between raw data, processed data, and analytical products based on the data
• Metadata linking inputs, workflows and models, and outputs
• “Drill down” when needed ensures traceability and reproducibility, plus access to the relevant level of information
• Contribution to a new model of Open Science
• Requires collective efforts from both producers and users of data, models, and products.

Managing Expectations!

Thank you for your attention.
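As an illustration of the pixel-based provenance described in the Datacube slides, the following is a minimal sketch of a median composite that records, for every output pixel, which source scene supplied the value. It is not the Geoscience Australia implementation. It assumes that "non-interpolated median" means the chosen value is always an actual observation (the lower of the two middle values when the count is even), that missing or cloud-masked pixels are marked with `None`, and that ties between equal values are broken by scene ID. The scene IDs, tile values, and the function name `median_composite` are hypothetical.

```python
def median_composite(stack):
    """Build a median composite with a per-pixel provenance layer.

    stack: list of (scene_id, tile) pairs, where tile is a 2-D list of
           values and None marks a pixel with no valid observation.
    Returns (values, provenance): two 2-D lists of the same shape.
    """
    rows = len(stack[0][1])
    cols = len(stack[0][1][0])
    values = [[None] * cols for _ in range(rows)]
    provenance = [[None] * cols for _ in range(rows)]

    for r in range(rows):
        for c in range(cols):
            # Collect every valid observation of this pixel with its source.
            obs = sorted(
                (tile[r][c], scene_id)
                for scene_id, tile in stack
                if tile[r][c] is not None
            )
            if not obs:
                continue  # no valid observation: leave both layers empty
            # Lower median: always an observed value, never an average.
            value, scene_id = obs[(len(obs) - 1) // 2]
            values[r][c] = value
            provenance[r][c] = scene_id
    return values, provenance


# Hypothetical stack of three 2x2 NDVI tiles over the same location.
stack = [
    ("scene_A", [[0.2, 0.5], [None, 0.1]]),
    ("scene_B", [[0.4, None], [None, 0.3]]),
    ("scene_C", [[0.3, 0.7], [None, 0.9]]),
]

values, provenance = median_composite(stack)
print(values)      # [[0.3, 0.5], [None, 0.3]]
print(provenance)  # [['scene_C', 'scene_A'], [None, 'scene_B']]
```

The provenance layer has the same shape as the composite, so any output pixel can be looked up directly and traced to a scene record, which is the "drill down" idea in miniature. Choosing an observed value instead of averaging the two middle values is what makes a single-source lookup possible.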