BlueBRIDGE – 675680
www.bluebridge-vres.eu

Project Acronym: BlueBRIDGE
Project Title: Building Research environments for fostering Innovation, Decision making, Governance and Education to support Blue growth
Project Number: 675680
Deliverable Title: Blue Assessment VRE Specification: Revised Version
Deliverable No.: D5.3
Delivery Date: December 2016
Authors: J. Barde, A. Ellenbroek, C. Formisano, S. Large, Y. Marketakis

BlueBRIDGE receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 675680

DOCUMENT INFORMATION

PROJECT
Project Acronym: BlueBRIDGE
Project Title: Building Research environments for fostering Innovation, Decision making, Governance and Education to support Blue growth
Project Start: 1st September 2015
Project Duration: 30 months
Funding: H2020-EINFRA-2014-2015/H2020-EINFRA-2015-1
Grant Agreement No.: 675680

DOCUMENT
Deliverable No.: D5.3
Deliverable Title: Blue Assessment VRE Specification: Revised Version
Contractual Delivery Date: December 2016
Actual Delivery Date: May 2017
Author(s): J. Barde (IRD), A. Ellenbroek (FAO), S. Large (ICES), P. Fabriani (ENG), C. Formisano (ENG), Y. Marketakis (FORTH), A. Gentile (FAO)
Editor(s): J. Barde (IRD)
Reviewer(s): L. Candela (CNR)
Contributor(s): n.a.
Work Package No.: WP5
Work Package Title: Supporting Blue Assessment: VREs Development
Work Package Leader: FAO
Work Package Participants: ENG, ICES, IRD, FORTH
Distribution: Public
Nature: Other
Version / Revision: V1.0
Draft / Final: Final
Total No. Pages (including cover): 40
Keywords: Stock assessment, Global record of stocks and fisheries, community software, R, ecological models, semantic model, aggregation, fisheries

DISCLAIMER

BlueBRIDGE (675680) is a Research and Innovation Action (RIA) co-funded by the European Commission under the Horizon 2020 research and innovation programme.

The goal of BlueBRIDGE, Building Research environments for fostering Innovation, Decision making, Governance and Education to support Blue growth, is to support capacity building in interdisciplinary research communities actively involved in increasing the scientific knowledge of the marine environment, its living resources, and its economy with the aim of providing a better ground for informed advice to competent authorities and to enlarge the spectrum of growth opportunities as addressed by the Blue Growth societal challenge.

This document contains information on BlueBRIDGE core activities, findings and outcomes and it may also contain contributions from distinguished experts who contribute as BlueBRIDGE Board members. Any reference to content in this document should clearly indicate the authors, source, organisation and publication date.

The document has been produced with the funding of the European Commission. The content of this publication is the sole responsibility of the BlueBRIDGE Consortium and its experts, and it cannot be considered to reflect the views of the European Commission. The authors of this document have taken any available measure in order for its content to be accurate, consistent and lawful. However, neither the project consortium as a whole nor the individual partners that implicitly or explicitly participated in the creation and publication of this document hold any sort of responsibility that might occur as a result of using its content.
The European Union (EU) was established in accordance with the Treaty on the European Union (Maastricht). There are currently 27 member states of the European Union. It is based on the European Communities and the member states’ cooperation in the fields of Common Foreign and Security Policy and Justice and Home Affairs. The five main institutions of the European Union are the European Parliament, the Council of Ministers, the European Commission, the Court of Justice, and the Court of Auditors (http://europa.eu.int/). Copyright © The BlueBRIDGE Consortium 2015. See http://www.bluebridge-vres.eu for details on the copyright holders. For more information on the project, its partners and contributors please see http://www.i-marine.eu/. You are permitted to copy and distribute verbatim copies of this document containing this copyright notice, but modifying this document is not allowed. You are permitted to copy this document in whole or in part into other documents if you attach the following reference to the copied elements: “Copyright © The BlueBRIDGE Consortium 2015.” The information contained in this document represents the views of the BlueBRIDGE Consortium as of the date they are published. The BlueBRIDGE Consortium does not guarantee that any information contained herein is error-free, or up to date. THE BlueBRIDGE CONSORTIUM MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, BY PUBLISHING THIS DOCUMENT. 
GLOSSARY

ABBREVIATION: DEFINITION
CNR: Consiglio Nazionale delle Ricerche (National Research Council of Italy)
CMSY: Catch Maximum Sustainable Yield; a set of models for stock assessment
CMSY-as-a-Service: The CMSY implementation used in the BlueBRIDGE VREs Dataminer
DLM Toolkit: Toolkit of methods used for managing data-limited fisheries and a management strategy evaluation of their relative performance across a range of fisheries
ENG: Engineering – Ingegneria Informatica Spa
FAO: Food and Agriculture Organization of the United Nations
FLR: Fisheries Library in R; a JRC led initiative for quantitative fisheries science, developed in the R language
FORTH: Foundation for Research and Technology Hellas
ICES: International Council for the Exploration of the Sea
IRD: Institut de Recherche pour le Développement
KPI: Key Performance Indicator
Matware: A tool to construct domain-specific warehouses by aggregating semantic data
OpenCPU: A framework to expose an http API for embedded scientific computing with R
RShiny: A web application framework for R
RStudio: A free and open-source integrated development environment (IDE) for R, a programming language for statistical computing and graphics
SDMX: Statistical Data and Metadata eXchange; a global data format
SS3: Stock Synthesis Version 3; an advanced model for stock assessment
VPA: Virtual Population Analysis; an approach to stock assessment
VRE: Virtual Research Environment

TABLE OF CONTENT

DOCUMENT INFORMATION
DISCLAIMER
GLOSSARY
TABLE OF CONTENT
DELIVERABLE SUMMARY
EXECUTIVE SUMMARY
1 Stock Assessment – Generic VRE
  1.1 Use Cases
    1.1.1 Tabular Data management
    1.1.2 TabMan Data standardization
    1.1.3 Data Miner Data analysis
    1.1.4 Species Discovery service
    1.1.5 Data Dissemination
    1.1.6 Data Publication – The publication of VRE outcomes
  1.2 Users
  1.3 VRE Design
  1.4 Resources
    1.4.1 Data resources
  1.5 Implementation plan
2 Stock Assessment - FAO
  2.1 Use Cases
    2.1.1 Regional database for Fisheries Data management: WECAFC case
    2.1.2 Stock Assessment Support: WECAFC case
    2.1.3 Tuna atlas upgrade and added services
  2.2 Users
  2.3 VRE Design
  2.4 Resources
  2.5 Implementation plan
3 Stock Assessment - IRD
  3.1 Use Cases
  3.2 Users
  3.3 VRE Design
    3.3.1 EwE
    3.3.2 Ichthyop
    3.3.3 BFT Assessment
  3.4 Resources
  3.5 Implementation plan
    3.5.1 EwE Workplan
    3.5.2 ICCAT BFT-E Workplan
    3.5.3 Ichthyop Workplan
4 Stock Assessment - ICES
  4.1 Use Cases
  4.2 Users
  4.3 VRE Design
  4.4 Resources
  4.5 Implementation plan
5 GRSF
  5.1 Use Cases
  5.2 Users
  5.3 VRE Design
  5.4 Resources
    5.4.1 FIRMS
    5.4.2 RAM Legacy Stock Assessment Database
    5.4.3 FishSource
    5.4.4 Other Sources
    5.4.5 Requirements
  5.5 Implementation plan
REFERENCES
Appendix 1
  Overview of VREs related to task 5.1; generic VREs and model or framework specific
DELIVERABLE SUMMARY

This document reports on outcomes of the activities in BlueBRIDGE Work Package 5 “Supporting Blue Assessment: VRE Development”. In particular, it documents the requirements and specifications that support the development of stock assessment data services and the Global Record of Stocks and Fisheries, from M16 to M18, and includes implementation plans until M22.

Deliverable 5.3 (revised version) was originally planned as a living wiki page that would grow with the development of Blue Assessment tools and services. This wiki is used to propose new ideas, discuss potential development efforts, and report some results. The obvious overlap with the ticketing system, and the multitude of development activities, some of which stopped or were completed in 2016 while others are not planned until 2018, led to fragmentation of the wiki in some parts, as observed for D5.1.

In November 2016 the PO requested that, for Deliverable 5.1, a report be produced to summarize the information of the wiki in an organized fashion, which resulted in the VRE Specifications of D5.1. Deliverable 5.3 (the revised version of the VRE plan) was scheduled for release only one month after the production of this report. It was decided to delay the production of D5.3 from December 2016 to May 2017 to capture several community meetings and include the resulting requirements and specifications for VREs and data services.

EXECUTIVE SUMMARY

Deliverable D5.3 “Blue Assessment VRE Specification: Revised Version” is an on-line living document which presents the requirements, specifications and design of the solution for the blue assessment VREs.
The Deliverable is available on the project’s wiki at the following location:
https://support.d4science.org/projects/bluebridge/wiki/D_5_3

This document captures the state of plans and their implementation in March 2017, while the underlying online documents, wiki, and tickets are expected to evolve under the principles of the agile methodology that the BlueBRIDGE project is following. The implementation activity of the two main VREs capturing the overall requirements can be traced here:
• Stock Assessment VRE Specification
• Global Record of Stocks and Fisheries VRE Specification

Appendix [0] depicts the development status of the VREs as per the reporting date (May 2017).

The overall descriptions of the VREs are accessible in the wiki:
https://support.d4science.org/projects/bluebridge/wiki/Task_51_Stock_Assessment_VRE
https://support.d4science.org/projects/bluebridge/wiki/GRSF_VRE_plan

This document summarizes the description of each VRE, focussing on the following aspects:
• Use cases, including a brief description of the expected activities
• Involved stakeholders
• Design of the VRE, including the involved resources, such as software, data, metadata, and services provided by the infrastructure, but also policies, external use and users, and collaboration.

These two main VREs are used to serve specialized communities with specific requirements, users, and data. At the time of writing, several VREs were being developed: either as a separate VRE (GRSF), as software (CMSY-as-a-service, OpenCPU, SS3, DLMTool, FLR, etc.) or as integrated components of the main VRE (BFT assessment, FAO Tuna Atlas updates, etc.) where the specialization is manifested through models for stock assessment.

This version of the Deliverable is a revised version of D5.1, spanning Blue Assessment tasks. These tasks require specialized VREs that are included in this Deliverable in addition to the VREs originally described.
It also follows the structure of D5.1, and provides information on the models, analysis and scientific planning expected to be performed. Additional use cases are expected to be identified in the next stages of the project, and the delivered VRE must thus offer flexible and adaptable extension features.

In M15, Task 3 in this WP was activated; it first targets decision makers, to engage them with the new data services of BlueBRIDGE. Activities in this task focus on:
• Ensuring that capacity building is integrated in all delivered products;
• Ensuring that new stocks and fisheries can be included by additional co-funded teams, and that these teams can be equipped with appropriate tools and models adapted to their situation;
• Identifying new indicators for stocks and fisheries, and promoting their integration.

As these activities do not (yet) result in concrete VRE specifications, they are not covered in this deliverable.

The major results from M15-18 are the following:
• The BlueBRIDGE VRE specifications for GRSF were discussed at the Second Technical Working Group meeting of the External Advisory Board (EAB TWG-2). All stakeholders were present, including the data owners (FAO, RAM, SFP), relevant EAB members, and BlueBRIDGE technical partners;
• The specifications for the Regional DataBase (RDB) were presented in late February at a meeting of most stakeholders of the Stock Assessment VRE. The community needs require more features than currently offered in the Tuna Atlas case, especially to marshal the ingestion of data;
• Flexibility was considered of key importance, and the considerable effort to enable communities to integrate their (often) R-based models is now much better supported than earlier.
This enticed the communities to propose challenging new requirements such as SS3, FLR and DLM Toolkit;
• The VREs, from a community perspective, are still perceived as a collector of ‘middleware’ that supports user-defined (typically R-based) models and algorithms, rather than as large software components that require integration (and thus extensive specs). At M15, it remained difficult to foresee the specific task activities of the different stakeholders to provide smaller components such as models, algorithms or analysis of content;
• The specifications will continue to be adjusted as communities evolve. The stakeholders will always come first, and in stock assessment, for example, they may be slow in their uptake processes, as uptake may require the involvement of scientific panels.

The plan until M22 aims at the delivery of community-oriented data services, and dedicates M23 – M30 to exploitation and community building as described in the DoW. Some activities cannot be exploited until their development is complete, and this causes a perceived lack in achieving KPIs. Already more than 50% of the development-related KPIs are met, whereas the ones related to exploitation are at lower levels, as expected at the beginning of the project.
Key activities in this work package for M16 to M22 are summarized below:

Months: M16 (Dec 2016), M17 (Jan 2017), M18 (Feb 2017), M19 (Mar 2017), M20 (Apr 2017), M21 (May 2017), M22 (June 2017)

Activities:
• New version of IRD Tuna Atlas on-line viewer (IRD/FAO)
• SS3 validated off-line (FAO)
• OpenCPU app for IRD Tuna atlas (Sardara) data ready for demo (FAO)
• SDMX registry service in infrastructure (ENG)
• GRSF load with basic data (FORTH, FAO)
• FLR first test in infrastructure
• R SDMX DSD editing in FAO; first tests in infrastructure (ENG)
• R-Shiny DLM Tool validated in RStudio of infrastructure (ICES)
• EAB GRSF Feedback (FAO, SFP, RAM)
• RDB Requirements meeting (FAO)
• Deployment of R models for SS3 to infrastructure
• Decision to compartmentalize existing R (ICES) for EM, DLM
• SS3 Model development (IRD)
• GRSF Geospatial data and TimeSeries first services (FAO, FORTH)
• FLR Service validation (FAO)
• REST interface for TabMan
• OpenCPU reporting module for Sardara
• SDMX Template driven data flows
• WECAFC Data loader development (FAO)
• R-Shiny DLM integrated and extended

1 STOCK ASSESSMENT – GENERIC VRE

The generic stock assessment VRE developed in BlueBRIDGE extends the existing VRE for the management of data, i.e. the ‘FAO Tuna Atlas’ VRE. The Tuna Atlas VRE provides the frame to collect: (1) requirements for generic improvements and (2) specifications for VREs that target a specific community.

Before the community can meaningfully exploit complete VREs (and not only specific data services), improvements are expected through three key services of the stock assessment VRE:
• TabMan for tabular data management;
• DataMiner for computational analytical tools (which includes RStudio and SAI);
• SpeciesDiscovery for loading species presence and absence records and harmonizing them.
1.1 USE CASES

The use cases listed here do not target a specific community; rather, they describe how the generic features for stock assessment are expected to be exploited in an overarching infrastructure. These cases cannot be seen in isolation from each other, and they also define the options for the specialized community VREs that will refer to them.

1.1.1 TABULAR DATA MANAGEMENT

TabMan (the Tabular Data Manager service) [1] has offered facilities to discover, ingest, curate, share, and analyze tabular data since the project start. To improve the services, e.g. to make them easier for the community to use, several generic requirements need to be met:
• Improve the integration with R – IRD has developed a large ‘library’ of R scripts for tuna capture data harmonization. This was further extended (an in-kind contribution) and shared with BlueBRIDGE. These ad-hoc community scripts (which are tested in the integrated RStudio) need to be integrated with TabMan, and the scripts and libraries need to be exposed to users who want to apply them to their own datasets, e.g. through a conversion template.
• Improve integration with DataMiner [1] – This is a task beyond WP5, but it is required for the community to benefit from the e-infrastructure facilities. The community wants to be able to access a pool of generic data-management scripts that e.g. apply data standards, validate data, and apply data formats. This has to be an interactive feature that informs users of the available (and relevant) DataMiner functions, e.g. offering only CSV-related functions when the user works on an unstructured CSV, or only time-dimension-related functions if there are time columns in the dataset.
• Improve metadata management – The gradual conversion of private CSV files to infrastructure assets requires that business data are collected and managed by BlueBRIDGE services throughout this process. TabMan already captures, at dataset level, key metadata elements aligned with SDMX (see next topic) and overall data policies, but the explicit management of these as metadata elements has to improve.

1.1.2 TABMAN DATA STANDARDIZATION

TabMan partially supports the generation and management of SDMX data (Statistical Data and Metadata Exchange, a format used in particular to describe time series). The community needs to offer its targeted constituents (country fisheries offices, science departments, regional fisheries bodies) the tools to generate and understand shared and standardized data, eliminating the current time-consuming and expensive manual data harmonization work; some community workshops on stock assessment require more time to load, harmonize and validate data than is spent on the stock assessment itself.

In November 2016, the activity had two participants: FAO for the community requirements engineering, and ENG as implementer of the task to enrich the existing SDMX features of the infrastructure by implementing those requirements.

FAO will engage with local, regional, and global initiatives to engineer the data standardization requirements by inventorying relevant formats and data flows (2016-2018), Master Data Management initiatives in FAO (2017), data policies (2016), and evaluation of tools (2016). This activity is partly funded through in-kind contributions and collaboration with staff from FAO and other organizations. Specific standardization use cases, i.e. those related to a specific community, will be described below.

ENG will implement services such as SDMX data flows to the registry (2016), integrate SDMX Data Structure Editors (2017), and support the development of SDMX Transformers (2017).

In March 2017, in gCube version 4.3, it is possible to retrieve Data Structures from the tables present in TabMan and export them to Fusion Registry 7.3.5.
In particular, the export procedure includes Data Structure Definitions, related Concepts and Codelists, and generates a specific Data Flow. Importing Codelists from Fusion Registry is still supported. The work is proceeding, and the next steps will be to export Data Structures from TabMan templates and to import complete Data Structures for conversion into templates.

The part of this use case that concerns collecting and specifying community requirements (related to e.g. Master Data Management, collecting local/regional/global reference data, data policies and tools evaluation) will be partly co-funded. This requires, however, that the activity be tuned to the time-frames of external collaborations. The use case will continue to work with the fisheries community for the entire duration of BlueBRIDGE.

The ticket describing the SDMX improvements: https://support.d4science.org/issues/4643

1.1.3 DATA MINER DATA ANALYSIS

The data analytical features existing in September 2015 supported TabMan-specific features (Template engine, Rules engine), and data could be exported to e.g. RStudio. However, the improved features of DataMiner opened the opportunity to connect DataMiner to the tabular data manager. This use case requires considerable effort from WP9, and that work is not reported here. In addition, the community produced several specific scenarios for DataMiner exploitation that are reported in the specific Stock Assessment VREs below.

The use case of the overall VRE concerns the integration of DataMiner and TabMan, where data from TabMan is available in DataMiner, and DataMiner functions are available from TabMan, with services for (Business) Master Data management and validation of data structures. It would promote the dataspace of TabMan and the computing features of DataMiner to a Science-2 environment supporting reproducible and sharable experiments for communities.
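The requirement in 1.1.1 that TabMan should offer only the DataMiner functions relevant to the dataset at hand can be illustrated with a minimal sketch. It is written in Python for brevity, although community scripts are typically R-based. The function catalogue, the feature names and the list of time-column names are all hypothetical and do not reflect the actual DataMiner or TabMan interfaces.

```python
import csv
import io

# Hypothetical catalogue: function name -> dataset features it requires.
FUNCTION_REQUIREMENTS = {
    "validate_csv_structure": {"csv"},
    "apply_code_list": {"csv", "structured"},
    "aggregate_by_year": {"structured", "time_column"},
}

# Hypothetical column names that are treated as a time dimension.
TIME_COLUMN_NAMES = {"year", "date", "time", "time_period"}


def dataset_features(csv_text, structured):
    """Derive simple features from the header row of a CSV dataset."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    features = {"csv"}
    if structured:
        features.add("structured")
    if any(name.strip().lower() in TIME_COLUMN_NAMES for name in header):
        features.add("time_column")
    return features


def relevant_functions(features):
    """Return the functions whose requirements are all met, sorted by name."""
    return sorted(
        name
        for name, required in FUNCTION_REQUIREMENTS.items()
        if required <= features
    )
```

Under these assumptions, an unstructured CSV is offered only the CSV-related function, while a structured table with a time column is additionally offered the code-list and time-dimension functions.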
The integration of both services is ongoing, and will not be completed before 2018. Relevant cross-service features will be released before that in some of the VREs described below.

1.1.4 SPECIES DISCOVERY SERVICE

Stock assessment datasets are sometimes available on-line. Where these are tabular data, a facility in TabMan to import them directly into a table following a specific format for data and metadata would enrich the stock assessment with biological parameters from the hosted Fishbase database (also of interest to the Ecopath use case of IRD), add clear provenance metadata, and reduce the effort to import and harmonize data. The service could also provide the base for organizations that wish to publish their existing occurrence data on-line, a topic of interest to the Fishbase consortium (under negotiation).

In November 2016 the community had no specific requirements to integrate and extend the Species Discovery Service with TabMan; however, a service to access biological parameters was proposed in FAO.

1.1.5 DATA DISSEMINATION

The results, but also the data, the service and the parametrization of the service, must be managed to enable reproducible experiments for end users. This necessitates the availability of a registry. In BlueBRIDGE, the CKAN software provides these functions. The community expects to be able to rely on the CKAN registry for the dissemination of data, the experiments and the configuration of the experiments. For geospatial data the community will continue to use Geoserver and Geonetwork to make data available to end users. Some requirements may exceed the capabilities of CKAN or Geonetwork, and in those cases the workspace can provide dissemination services; it can provide links to datasets and folders, and has download facilities.

This use case is again focused on the high-level components.
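What it means to register the data, the service and its parametrization together can be illustrated with a minimal sketch. It does not use the CKAN API; the record layout and field names are hypothetical, and the service and parameter names in the usage note are placeholders. The idea is only that a stable identifier derived from all three elements lets a repeated experiment be recognised as the same one.

```python
import hashlib
import json


def experiment_record(dataset_bytes, service, parameters):
    """Build a registry entry that pins the data, the service and its parameters.

    All field names are hypothetical; a real registry would define its own schema.
    """
    record = {
        # Fingerprint of the exact input data used in the experiment.
        "dataset_sha256": hashlib.sha256(dataset_bytes).hexdigest(),
        "service": service,
        "parameters": parameters,
    }
    # Canonical JSON (sorted keys) so that parameter order does not matter.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    record["experiment_id"] = digest[:16]
    return record
```

Calling `experiment_record(b"year,catch\n2015,120\n", "cmsy", {"r": 0.5, "k": 100})` twice yields the same `experiment_id`, whereas changing the data or any parameter yields a different one.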
Details on the use of the infrastructure to offer reproducible experiments, the publication of data in accessible registries, and the use of the workspace to support specific communities and their experiments are given in the following chapters.

The generic dissemination use case required improvements to the workspace (2016) and CKAN (2016), and in November the entire case was supported.

1.1.6 DATA PUBLICATION – THE PUBLICATION OF VRE OUTCOMES

An innovative use of the VREs and BlueBRIDGE data services was identified by FAO through consultation with its community of global tuna data management. Once a VRE has produced a global (or any composite) dataset, the contributors can benefit from its availability if the data and computing resources are accessible through their systems, preferably by preparing specific web components that answer ‘competence queries’ by serving specific data through predefined UIs, and by exposing ‘sandbox’ datasets to external analysts (e.g. a stock assessment scientist who wants to interact with a dataset directly using R).

The VRE for Stock assessment would closely match that need if data can be prepared in the infrastructure (i.e. the global tuna capture data of FAO – done 2016), and the entire data workflow and metadata (as links) can be integrated with static and dynamic outputs (reports, websites), e.g. by developing reporting templates that ‘embed’ output with proper header and footer sections (including proper citation references to donor, project, consortium and data contributors).

In November 2016, the skeleton use case was partly implemented using OpenCPU; the service needs to be validated by potential community members for exploitation (#3158).

The deployment has been a success and was presented at the BlueBRIDGE Technical Committee held in Crete from the 14th to the 17th of June 2016.
After the presentation, other organizations showed interest in taking advantage of this new approach. Direct interaction with R through OpenCPU by external users will not be made available until security and session management issues are addressed.

The publication of data results of integrated software components (e.g. RShiny apps for Data Limited Methods in stock assessment) emerged as a need after the summer of 2016 and will be addressed in 2017. Currently these tools are developed for individual users, and their terms and conditions will have to be brought in line with infrastructure policies (#1778).

The infrastructure is also equipped with a WPS-driven Webapp publisher that allows the development of infrastructure-driven apps based on DataMiner. The community has already been provided with several examples in WP7, and WP5 is considering where these can also be used. The development of the underlying data policy is also ongoing and will contribute to D2.5 Sustainability plan.

1.2 USERS

The target users of this VRE are BlueBRIDGE consortium developers, data managers, outreach managers, and community representatives able to assist with requirements engineering. It is not for public exploitation.

1.3 VRE DESIGN

The design of the VRE is largely identical to the existing iMarine Tuna Atlas VRE, but with important improvements to the underlying infrastructure, especially to improve communication between data services by capturing metadata and relying on the Data Catalogue for discovery and publication of datasets. Overall the VRE will use the social tool, the generic services for users, and the workspace for data storage. It will offer the BlueBRIDGE data services TabMan, DataMiner, Search, RStudio, and the Data Catalogue. In addition, the Statistical Algorithm Importer (SAI) will be available to programmer-users to update community algorithms.
The design will have to manage data structures (SDMX DSDs, for instance), which can be based on existing Eurostat software (for the SDMX registry), and manage data access and sharing through data policies based on VRE user rights for accessing folders in the Workspace.

1.4 RESOURCES

1.4.1 DATA RESOURCES

The community identified the following data types that need to be managed in the infrastructure:

• Tabular data
o CSV files that need to be imported; typical sizes are smaller than 1 MB; examples: Global Capture Datasets 12 MB. Regional datasets from 50 KB (landings) to 1 MB (effort). Reference datasets from 3 KB to 2 MB (all 13,000 fisheries-relevant species). The total number of datasets to be imported is expected to be around 1000.
o Databases or database files: Postgres and PostGIS are community standards that are yet supported. Specific size ranges cannot be given; in some of the specialized VREs this may be possible. Generation of on-the-fly XML data (SDMX) from databases is foreseen.
• Text data
o Typical R scripts for data validation and transformation are less than 50 KB. Several hundred R scripts are expected.
o Reports such as manuals require storage in the Workspace (WS); a precise request for storage size cannot be provided.
• Geospatial data
o The Geoserver and Geonetwork based SDIs of FAO and BlueBRIDGE are expected to be used to share data across the SDIs. The metadata harvesting is expected to remain active.
o NetCDF publishing and management in Geonetwork and in CKAN.

1.5 IMPLEMENTATION PLAN

The Work Package VREs and data services will be delivered through contributions by WP partners, through VREs that each have a specific function. This means that most additions to this generic VRE are described in the VRE sections below.
The activity that contributes most to the generic VRE development, and is therefore included here, relates to the implementation of a generic framework for statistical data management based on SDMX.

For the SDMX framework, FAO leads the specification phase, which will result in metadata-driven services for data harmonization. In 2016 this required work on the upgrade of the Tabular Data Service Tabman; in particular, an improved integration with DataMiner is needed. FAO Corporate systems were in the process of upgrading to a newer SDMX registry (V9), and FAO identified and selected a new Master Data Management tool (EBX5) that will direct future developments related to data exchange between FAO and BlueBRIDGE. This delayed some FAO activities related to BlueBRIDGE services.

The current versions of the Tabular Data Management Service and Portlet provide functionality to import Codelists from an SDMX Registry and to export Data Structures retrieved from Tabman tables (Figure 1).

Figure 1: Tabular Data Management Service and SDMX, current status

Tabman datasets have been conceived as general-purpose tables. However, if a dataset contains a time dimension, the dataset is a Timeseries. In gCube 4.3, Timeseries can be exported to a Registry in SDMX format by retrieving their SDMX Data Structures (Data Structure Definitions, Concepts, Data Flows and related Codelists) from their Tabman Metadata. The SDMX Registry, which is Fusion Registry in the D4Science Infrastructure, can expose these data in a standard format.
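The rule that a table with a time dimension is treated as a Timeseries, and is therefore eligible for SDMX export, can be illustrated with a minimal Python sketch. This is not Tabman code: the `Column` class, the role labels and the function names are hypothetical and only illustrate the decision described above; the list of SDMX artefacts is taken from the text.

```python
from dataclasses import dataclass

# SDMX artefacts that, according to the text, are retrieved from Tabman metadata on export.
SDMX_ARTEFACTS = ["Data Structure Definition", "Concepts", "Data Flow", "Codelists"]


@dataclass
class Column:
    name: str
    role: str  # hypothetical labels: "dimension", "time_dimension", "measure", "attribute"


def is_timeseries(columns):
    """A table counts as a Timeseries when at least one column is a time dimension."""
    return any(col.role == "time_dimension" for col in columns)


def export_plan(table_name, columns):
    """List the artefacts to send to the registry; empty for non-Timeseries tables."""
    if not is_timeseries(columns):
        return []
    return [f"{table_name}: {artefact}" for artefact in SDMX_ARTEFACTS]


# Two illustrative tables (made-up column names).
capture = [Column("species", "dimension"),
           Column("year", "time_dimension"),
           Column("catch_t", "measure")]
species_list = [Column("code", "dimension"), Column("label", "attribute")]

print(export_plan("capture", capture))            # four artefacts
print(export_plan("species_list", species_list))  # empty list: no time dimension
```

In this sketch a general-purpose table such as a code list is left out of the export, which mirrors the distinction the text makes between plain Tabman tables and Timeseries.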
Starting from the current situation, the following steps should be accomplished (Figure 1):

• Support for Fusion Registry 8 should be assured, including username/password-based security;
• The Data Structures Exporter should be extended to support Tabman Templates in addition to Tabman Tables;
• The Codelist Importer should be extended to import complete Data Structures from Fusion Registry and to convert them into Tabman Templates;
• An SDMX Data Source should be implemented: it should consist of a web interface able to access Timeseries data stored in the Tabular Data Management Database.

The final result is shown in Figure 2. The Tabular Data Management Portlet will be able to export Timeseries data in SDMX format by sending metadata, including Codelists, to an SDMX Registry, and by allowing the SDMX Data Source to access the raw data of the exported table in its internal database. On import, the SDMX Registry will send metadata, including the URL of a remote Data Source, to the Tabular Data Management Portlet, which will ask the SDMX Data Source to import the data.

Figure 2: Final deployment

To get Timeseries data according to the standard, an SDMX Client will access the SDMX Registry to get the Metadata and a Reference to the Data Source, from which the raw data will be downloaded.

The first set of activities concerning SDMX was completed and released with gCube 4.3 (#4643 and #5870).
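The two-step client access described above (metadata from the Registry first, raw data from the Data Source second) can be sketched as two URL builders that follow the path conventions of the SDMX RESTful API. This is an illustrative sketch only: the base URLs, the agency and the dataflow identifier are hypothetical, and it is an assumption that the D4Science Fusion Registry and the planned SDMX Data Source expose exactly these REST paths.

```python
from urllib.parse import urlencode


def structure_url(registry_base, agency, flow_id, version="latest"):
    """Step 1: ask the registry for the dataflow and everything it references
    (data structure definition, concepts, codelists)."""
    base = registry_base.rstrip("/")
    return f"{base}/dataflow/{agency}/{flow_id}/{version}?references=all"


def data_url(source_base, agency, flow_id, key="all", start=None, end=None):
    """Step 2: ask the data source for the observations of that dataflow,
    optionally restricted to a time window."""
    base = source_base.rstrip("/")
    url = f"{base}/data/{agency},{flow_id}/{key}"
    params = {name: value
              for name, value in (("startPeriod", start), ("endPeriod", end))
              if value is not None}
    return f"{url}?{urlencode(params)}" if params else url


# Hypothetical endpoints and identifiers, for illustration only.
REGISTRY = "https://registry.example.org/ws/rest/"
SOURCE = "https://datasource.example.org/rest"

print(structure_url(REGISTRY, "FAO", "CAPTURE"))
print(data_url(SOURCE, "FAO", "CAPTURE", start=2000, end=2010))
```

Keeping the two requests separate reflects the design in Figure 2, where the Registry holds only metadata and a reference to the Data Source, while the observations stay in the Tabular Data Management Database.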
The next planned steps are (#7358):

• Inclusion of strict version control in the SDMX Exporter in order to avoid data duplication on Fusion Registry (#7535), to be finalized in March;
• Upgrade of Fusion Registry to version 8.4 (#7192), to be completed in March;
• Implementation of the SDMX Exporter based on Tabman Templates (#7358), to be provided in April;
• Implementation of the SDMX Importer, planned to start in April after successful tests of the Exporter (#7358);
• Deployment of the SDMX Data Source (#7360), planned for May/June.

2 STOCK ASSESSMENT - FAO

The FAO of the UN, through BlueBRIDGE VREs and data services, intends to support stock assessment scientists, assessment teams, and regional initiatives. It has identified several stages in the development of these outputs, with increasing complexity in the use cases being served. There is a large overlap between the initiatives identified for implementation, and this will result in a significant reduction of duplicated effort and thus of costs.

For each of the following cases, we expect that the generic services (described above) are available at some stage during the development. The timing of the different activities will be planned according to the availability of services. For the community, it makes little sense (or is even counterproductive) to expose remote communities to services that are not available or robust. In addition, the regional database (RDB) and stock assessment requirements are volatile and activity related, and cannot be planned too far ahead. Flexibility is of utmost importance.

2.1 USE CASES

2.1.1 REGIONAL DATABASE FOR FISHERIES DATA MANAGEMENT: WECAFC CASE

The objective of this use case is to exploit a customized version of the generic stock assessment VRE (cf. Sec. 1) to support the data management needs of a regional FAO project.
This implies the following steps:

• Identification of users and uses
o An initial scan was made of organizations and representatives (2016)
• Identification of opportunities and gaps, and recommendations for improvements (2016)
o The TabMan was validated for data ingestion and curation;
o In March 2017 an RDB meeting report was produced
• Release of targeted components, and validation by a ‘super-user’ (2016-2017)
o Stock assessment tools; improved DataMiner ‘as-a-tool’ (2016);
o Data management tools; improved integration of Tabular Data Manager;
• Formulation of exploitation scenario, and establishment of team
o A meeting is scheduled in 2017 (February in Rome, September in WECAFC region);
o A collaboration has to be formalized with data contributors (2017);
• Organize exploitation support – requires in-kind contributions; TBD
o Training and assistance;

To capture the needs of regional projects, BlueBRIDGE in-kind contributions are required; these are identified in the WECAFC region, where FAO assists in the establishment of a regional management body. Some aspects of the data collation can be provided by BlueBRIDGE, and the overall requirements, based on the capabilities of the Tuna Atlas VRE, are demonstrated to several stakeholders. In 2017, several tests with real data will have to demonstrate that the tool is secure and performant for the regional elaboration of capture data.

The overall activity can be tracked here: https://support.d4science.org/issues/1678

The above steps can be repeated, once the RDB features are implemented, in other scenarios for other fisheries management organizations.
2.1.2 STOCK ASSESSMENT SUPPORT: WECAFC CASE

The objective of this use case is to exploit DataMiner and TabMan integrated in a VRE to support the data analysis and dissemination needs of stock assessment scientists, teams, and regional (or global) fisheries organizations. This implies the following steps:

• Identification of users and uses
o The identified WECAFC community overlaps those of the IRD and ICES VREs, and much of the effort can be shared (2015).
o The FAO community is initially related to the region of the first fisheries data management support VRE, in the context of the WECAFC regional project
o The first priority species (Hogfish, Amberjack) and models (CMSY, SS3, VPA) were identified after several regional meetings (Summer 2016)
• Validation of the e-infra suitability to support the selected software
o The first software to be tested was CMSY-as-a-service; results are promising
o SS3 software required a request for software, which was provided in September. Tests are still ongoing in private off-line environments. In January 2017 a test will start with V3.2.
o Other tools were identified, but these are reviewed by other partners (IRD and ICES) to share the workload and reduce duplication.
• Release of targeted components, and validation by a ‘super-user’ (2016-2017)
o DLM Tool was released in test in September 2016; not yet validated by FAO
o CMSY-as-a-service was validated (Oct 2016)
o SS3 V3.3 (new version) is scheduled for June 2017
o Update KPI; January 2017 to 4 stocks assessed https://support.d4science.org/issues/1464
• Formulation of exploitation scenario
o The VRE is mainly targeting capacity building initiatives and training
o When a complete tool is available, FAO will demonstrate it to regional initiatives (2017)
o The WECAFC region is a first target, and will be engaged in 2017
• Organize exploitation support – after an exploitation scenario is agreed.

2.1.3 TUNA ATLAS UPGRADE AND ADDED SERVICES

The objective of this use case is to improve the management of the production of global tuna capture datasets, and their publication as SDMX timeseries under a well-defined (meta-)data-driven design. Realization of the case requires improved DataMiner and TabMan integration in the VRE to support data harmonization and standardization. It will also include (late 2017) some needs of FAO Master Data Management support through EBX5 interoperability; BlueBRIDGE can bridge between global (FAO) and national and regional master data.

The users of this service are FAO staff responsible for producing the global tuna atlas and for extracting indicators from that atlas.

This use case captures the progressing features to support the ingestion (done), curation (done) and harmonization (in progress) of data from regional fisheries bodies, and their use in the websites of FAO and other organizations (http://www.fao.org/figis/geoserver/tunaatlas/) or in WebApps.

The Tuna Atlas upgrade also relates to the visualization of the tuna atlas data, for which an independent component was requested that can be installed in the websites of e.g. regional fisheries organizations. OpenCPU was selected to act as a bridge to data in the infrastructure and as a viewer component.
The integration of OpenCPU facilities in the infrastructure can be tracked here: Generate graphs and figures

The online version of the first OpenCPU-based facilities can be accessed here: http://vps282167.ovh.net/BlueBridgeWidget/testGeneralExperiment.html

Since May 2017 the OpenCPU viewer is also accessible through a VRE (in test).

The code for the OpenCPU-related work is available here: https://github.com/pink-sh/BlueBridgeWidget

The production and publication of timeseries of capture data as standardized XML data based on SDMX is ongoing. The integration of SDMX facilities can be tracked through the following tickets:

• Improve SDMX capabilities: https://support.d4science.org/issues/4643
• Add SDMX publishing support to Tabular Data Manager: https://support.d4science.org/issues/4881

2.2 USERS

The VRE delivered under this use case will serve two overlapping communities in the WECAFC region (RDB and Stock Assessment support), and a community of users working on global capture datasets with particular emphasis on the integration and visualization of tuna, billfish and shark capture data. For each community a separate VRE may be required, with largely overlapping features but different user groups.

For each community the following roles are foreseen:

• VRE Manager: user and resources management (data and computing, access and security);
• Data Managers: data input, validation and publishing (to the CKAN or SDMX registries) of results;
• Data Analysts: individual scientists who analyze data with their own R scripts or rely on tools integrated in the infrastructure (e.g. DataMiner SS3, or DLM Tools);
• Assessment teams: groups of experts who take a ‘holistic’ view of the available data (in the workspace), run models (through DataMiner) and compare the output (e.g. in predefined reports, in WebApps). The important part is to enable reproducible science; an assessment must be repeatable in the future;
• External ‘browsers’: consumers of the products who will not be VRE members; they can be any person accessing any of the BlueBRIDGE Data Services and the catalogue to discover the public results of the VRE.

2.3 VRE DESIGN

The VRE(s) will be very similar to the existing Tuna Atlas VRE, including the user management, social tool, and other infrastructure facilities. The design will need to cover several tasks in the data lifecycle to support the data flow from ‘fishnet to internet’, i.e. from data collection to publication. These tasks, which are highly interdependent, are roughly identified as follows.

Data management (might require separate VREs for e.g. FAO Tuna Atlas and WECAFC)
1. Collection, access and storage;
2. Curation, harmonization and standardization;
3. Data analysis and review based on shared tools of DataMiner;
4. Data dissemination and publication – based on shared tools such as the catalogue and registries;

Stock assessment (might require separate VREs for WECAFC and others)
1. Assume the data management tool is capable of providing the data in standardized form, or enable other data upload facilities to DataMiner and other analytical tools;
2. Data analysis and dissemination – based on ‘private’ (to the user) tools such as DLM Toolkit;
3. Data analysis and dissemination – publish as shared tools;

Data dissemination (for WebApps and OpenCPU-based tools with no or simple user identification)
1. Data analysis and preparation;
2. Load into optimized dissemination environments;

2.4 RESOURCES

Data resources for stock assessment are provided by regional fisheries organizations and individual scientists as small to medium-sized tabular or XML (SDMX) data that are easily uploaded to DataMiner.
Computing resources are available in a variety of software formats that include R, Java, JS, and ADMB.

2.5 IMPLEMENTATION PLAN

The development of the VRE extends the FAO Generic VRE (starting with the Tuna Atlas VRE) with data resources and services that meet the needs of the specific communities. In Period 1 it focuses on the establishment of the tools and the community.

Implementation will start when: (1) the components are validated and integrated in the generic VRE, (2) the target users have validated the approach and identified the stock analysis scenario, (3) the community has received proper instructions on the exploitation, and (4) the community has subscribed to the data policies.

In Period 1, the following activities were planned:

1. The components identified
a. Data sources;
i. Collect global tuna capture data (done, partly through in-kind IRD contribution)
ii. Collect capture time series for 3 stocks (done)
iii. Capture output in Workspace (done)
b. Computing resources; integrate R (done), and add 3 models:
i. Tuna atlas load and harmonization R scripts and align with IRD-Sardara (done)
ii. CMSY-as-a-service (done); example result: CmsyNotebook.nb.html (1000 KB)
iii. SS3 (in progress) https://support.d4science.org/issues/5810
iv. VPA (selection in progress)
v. DLM Toolbox (in progress) https://support.d4science.org/issues/1778
vi. Data extraction and reporting – establish framework with WPS and OpenCPU (done)
c. Dissemination resources
i. Add capacity to convert private R models to shared models with SAI (done)
ii. Capture output of shared models in infrastructure workspace folders (done)
iii. Add capacity

In Period 2, the following activities are planned for the implementation. The technology activity is captured in #1674; additional community sensitizing, engagement, and development activities are often in-kind contributions, and thus not captured in tickets.

1. Continue to support the development of models and algorithms for FAO and WECAFC #5238
a. Consult FI and NOAA (for WECAFC) on CMSY, SS3, FLR and other models. (Done)
b. Based on the consultation under 1a, add RShiny and DLM Toolkit requirements (Done)
c. Consult with regional representatives on the format and structure of output reports.
d. Use the Statistical Algorithm Importer and DataMiner to implement existing R algorithms in the infrastructure; (Ready)
e. Use the Workspace to review and disseminate model output and prepare reports. (Ready)
2. Continue to support the implementation of a dynamic on-line stock assessment reporting module with ICES, NOAA (USA) and IRD. #3158 Developed around the models of point 1, wrap the data service in attractive, performant, and user-friendly modules based on, among others, DLM Toolkit, WPS and OpenCPU to:
a. Access and provision data for the models of the inventory above;
b. Display on-line fisheries indicators in an embeddable component for use both in the iMarine VRE infrastructure (WebApp) and in web sites of fisheries management bodies. Display should cover tables, graphs and maps;
c. Develop a dynamic reporting module where advanced users can rely on infrastructure data services, a data repository, and a web-authoring tool such as RShiny/RshinyProxy, Markdown/Knitr, or R-Notebook, to develop their own dynamic reports.
d. Ensure that model performance and optimization meet usability standards.
e. Improve access to Biodiversity data (FishBase biological parameters, WoRMS mapping to ASFIS)
f. Recommend results integration of stock assessments with GRSF registry (T5.2)
3. Improve the FAO Tuna Atlas data services, to demonstrate the potential of the iMarine infrastructure dynamic reporting. #5493
g. Extend the use case to cover parallel use cases based on either Sardara (with IRD France) or Statlant formats (with CCAMLR Tasmania, CECAF, ICES Denmark, etc.);
h. Add geospatial features to ingest and produce geospatial data #7451
4. Support capacity development in community exploitation of data services (WP8 – For reference);
i. Assist with exploiting Virtual Research Environments;
j. Document use of data services for Stocks and fisheries of 3 models selected by the community;
k. Provide assistance, and if relevant, training to users and advanced users of the VRE.
5. Support RDB development as a BlueBRIDGE Virtual Research Environment #6098; these requirements were produced during and after a dedicated RDB meeting in Rome, March 1-2.
l. Define requirements for a specific RDB in the WECAFC region (Done), and discuss the replication to other use cases.
m. Define capacity building opportunities in WECAFC region and beyond (WP8 – September 2017)
n. Liaise with WECAFC representatives to define and design training materials and possible workshop (WP8)
o. Act as VRE manager of the WECAFC RDB VRE
p. Organize roll-out of RDB facilities to WECAFC members through meetings and data exchange (WP8 – October 2017)
6. Support integration of WECAFC RDB facilities with other systems
q. Assist in development of MDM facilities with FAO system engineers (July 2017)
r. Align RDB data structures with the FAO Global Record of Stocks and Fisheries (GRSF) (T5.2)
s. Analyze RDB harmonization features across FAO and IRD, and propose algorithm integration; focus on mapping of national classifications to regional / global ones for FAO TCPs in e.g. Oman, Trinidad and Tobago
t. Analyze integration of RDB with DataMiner facilities, and promote the use of stock assessment services to WECAFC
7. Support management reporting and collaboration
u. Assist communication with EAB/TWG preparation and follow-up activities (WP2)
v. Analyze fisheries data formats and exchange with a focus on the WECAFC region for RDA
w. Propose data alignment options with metadata-driven approaches based on e.g. SDMX
x. Provide content for a sustainability plan based on RDB exploitation (November 2017)

3 STOCK ASSESSMENT - IRD

IRD plans to use BlueBRIDGE VREs and data services to support stock assessment scientists and assessment teams with existing data analysis software that will benefit from a managed VRE. A significant share of this work is an in-kind contribution by IRD.

IRD aims at a significant reduction of duplicated effort, and thus of costs, for the stock assessment teams. This can be achieved by managing all stages of the development of an assessment report.

The stock assessment requirements are volatile and activity related, and cannot be planned too far ahead. For instance, the release of software by NOAA cannot be anticipated, but developers must be responsive so that stock assessment scientists can access the latest models. Flexibility is thus of utmost importance.

3.1 USE CASES

IRD plans to incorporate several existing software components, and to benefit from BlueBRIDGE resources to improve the performance of these tools. The full list is maintained in the wiki page List_of_IRD_algorithms.

• Ecopath and Ecosim: ecological / ecosystem modeling activities
• BFT Assessment: tuna stock assessment models (bluefin tuna with ICCAT and Ifremer examples and demonstration, with possible extension to tropical tunas and billfish with IOTC if it is interested in the approach). We are currently working on the standardization of data formats by packaging stock assessment model outputs within netCDF files to embed metadata and provide data access (through a Thredds server).
• Ichthyop: Ichthyop (http://www.ichthyop.org/) is a Lagrangian tool for simulating the dynamics of drifting objects in 3 or 4 dimensions (3D + time). The model itself is written in Java but uses and generates data in netCDF formats. Many researchers focus on the model parametrization without having to deal with Java programming.
However, many researchers use the model outputs through R code and the R packages for netCDF. We can thus use the infrastructure to execute both Ichthyop simulations and the related R code.
• FAOTunaAtlas: IRD participates in the FAO Tuna Atlas by providing a set of R codes to extract, transform and load tuna RFMO datasets into a single data format and a related data warehouse (PostGIS database). IRD also works with FAO to provide a set of R codes to generate indicators. The next part of the work will focus on packaging tuna RFMO datasets within netCDF files to better manage both metadata and data by complying with standards (OGC metadata and standard code lists for species, fishing gears, etc.). This will foster data discovery through the metadata catalogues of the project and the related data servers (Thredds or Geoserver).

All partners stress the need for:

• Interactive features on results; annotations, HTML reports, e.g. with interactive dashboards for visualization (e.g. 'shiny') or collaborative editing of documents (“google doc like”, e.g. Sharelatex)
• Workflow features; parameterization of algorithms
• User levels; VRE Users (data only), Tweakers (parameters), and Developers (change/add code)

3.2 USERS

Ecopath use will have to be re-discussed; it was halted in period one. The original focus on infrastructure-based modeling shifted to data provision through targeted data services (localized biological parameters).

BFT Assessment is developed for a BFT assessment team of ICCAT (led by Ifremer and ICCAT). Similar teams in other organizations will benefit from the same capacity, as data and algorithms are similar. BFT is a very emblematic species, and the related stock assessment is thus a hot topic. Tuna commissions use much the same approach for all tuna species and the related stock assessment working groups.
The BFT use case can thus be presented to other working groups if successful (BlueBRIDGE can service the tool with improved data and processing facilities, which is likely to attract wide support). In particular, the ICCAT BFT use case might be used to set up a similar approach with IOTC.

Ichthyop is used by multiple marine scientists (IRD, IFREMER and more) who need to simulate:
• drifting objects (FADs, drifters...) used by fishermen and physical oceanographers;
• fish larvae / connectivity dispersion in the open ocean;
• pollution / oil spill dispersion and float modelling;
• safety issues, e.g. to model flight crash debris.

The FAO Tuna Atlas will be used by FAO and IRD, but its output can also be shared with any organization that has an interest in (and can make some contribution to) producing global overviews of indicators. We expect that data discovery, data transformation or data visualization services will be used by partners, as they can be integrated in any website (through the WPS or OpenCPU protocols).

3.3 VRE DESIGN

The design of the VRE(s) for IRD is complex, as each use case has many details and specific codes (Fortran, R, Java, Latex, .Net, ...). The summary below condenses the detailed design pages of the wiki.

3.3.1 EWE

The EwE software is an open-source ecological modeling software suite (http://www.ecopath.org/about). The EwE desktop software runs on Windows only; the EwE computational core is system independent and can run on different operating systems via a local runtime environment such as Mono (www.mono-project.com). Globally, EwE is the most widely used food web model, with 7000+ known users and 400+ publications (Colleter et al, 2014). EwE is maintained by a consortium of 18 institutes worldwide (http://ecopath.org/consortium). Traditionally EwE is used for assessing food web dynamics and the impact of fishing, but recent developments have seen an increase in applications to management advice and Environmental Impact Assessments.
Currently EwE has limited abilities to:
• locate and include species parameters from data facilities such as SeaLifeBase, FishBase, OBIS, WoRMS, GBIF, etc.,
• connect to cloud-based data facilities such as THREDDS,
• pre-process GIS data, which is currently only enabled for the Windows OS,
• compare multiple runs, which is a feature not provided by the desktop software,
• run big spatial assessments on Windows computers,
• run outside Windows.

The objective with BlueBRIDGE is to run an Ecopath model (using Mono) in the infrastructure to test feasibility (2016), and to connect to BlueBRIDGE services for e.g. biological parameters (2016/7). However, the activity was suspended in mid-2016 for various reasons.

3.3.2 ICHTHYOP

It has been possible to execute the Ichthyop model (http://www.ichthyop.org/) on the infrastructure since March 2017. The researcher in charge of this model foresees interesting prospects for the community of users with this approach. In summary, the model simulates the trajectories of objects driven by ocean currents (plankton, drifters, etc.) at the sea surface or within the water column. Figure 3 gives an example of 1000 simulated trajectories (in blue) produced by Ichthyop driven by sea surface currents (from satellite data), together with the in situ observation of a drifter at the same period and location.

Figure 3: Ichthyop simulation

The ocean current data needed to drive the model can be derived from Earth observation / satellite images (e.g. OSCAR or GECKO) or from model outputs (e.g. ROMS, Drakar, and others). These input data are usually delivered through netCDF files / OPeNDAP access. The model output is delivered within a netCDF file as well.
Figure 4: Ichthyop workflow

Users have three options to use the underlying codes:

• A: use the WPS Web Service from a client:
o Desktop application: Web browser, GIS,
o Programmatic access: Java, Python, R or any programming language which can deal with WPS
• B: use the codes through services delivered by the collaborative Web Environment (VRE) of BlueBRIDGE, which provides GUIs:
o Web forms to parametrize and run the codes easily
o R server used by multiple applications: RStudio, WPS server, Sharelatex
• C: embed the service (via WPS or OpenCPU) in any Web site (e.g. ichthyop.org): example http://mdst-macroes.ird.fr/tmp/Ichthtyop_one_simulation.html

These options all seamlessly use the same services provided by the infrastructure (backend) and the VRE. Behind the scenes, the workflow has been split into several steps:

1. The WPS server manages the executeProcess request with the input parameters provided by the client user. The Ichthyop simulation is executed accordingly.
2. During Ichthyop execution the model is driven by environmental data harvested remotely with the OPeNDAP protocol, provided by a Thredds server working on top of the OSCAR satellite image package in netCDF (other products, such as GECKO, might be used to drive the model).
3. The result of the model (WPS output parameter) is packaged within a netCDF file.
4. R transforms the native Ichthyop netCDF file into another one which is more compliant with CF conventions.
5. R transforms the native netCDF or netCDF-CF files into shapefiles (points and trajectories), does the same with the trajectories of drifters managed in PostGIS databases, and creates a QGIS map/project.
6. The user can download everything packaged in a zip file and visualize the results with the QGIS map.

Of course, if we can achieve this, the next challenge will consist in storing and analyzing outputs.
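Option A (programmatic access) and step 1 of the workflow can be illustrated by building a WPS 1.0.0 Execute request in key-value-pair form. The sketch below only constructs the URL and does not contact any server. The endpoint and the input parameter names are hypothetical; the process identifier is the one listed for the Ichthyop model in the resources table of this document, and any authentication the infrastructure may require is left out.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def wps_execute_url(endpoint, identifier, inputs):
    """Build a WPS 1.0.0 Execute request (KVP encoding).

    `inputs` maps input names to values; they are serialized as
    name=value pairs separated by semicolons in the DataInputs parameter.
    """
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
        "DataInputs": data_inputs,  # urlencode percent-encodes '=' and ';'
    })
    return f"{endpoint}?{query}"


# Hypothetical endpoint and input names, for illustration only.
url = wps_execute_url(
    "https://wps.example.org/wps/WebProcessingService",
    "ICHTHYOP_MODEL_ONE_BY_ONE",
    {"n_particles": 1000, "duration_days": 30},
)
print(url)

# Decoding the query string shows what the server would receive.
print(parse_qs(urlparse(url).query)["DataInputs"])
```

The same request can be issued from a browser, a GIS client or an R script, which is why the three usage options in the text can share a single backend.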
3.3.3 BFT ASSESSMENT

Figure 5: BFT Workflow

This use case is tightly connected to the ICCAT eastern Bluefin tuna stock assessment working group (which produced the R and Fortran codes reused on the infrastructure). The connection with the community of practice is made through Sylvain Bonhommeau (chairing the BFT-E group and working on tunas with IRD in a common research unit). Moreover, the approach is very similar for tropical tunas involving IRD stock analysis activities; in particular we want to promote a similar approach (stock assessment tools) to access and process tropical tuna data with IOTC, as well as data on other large pelagics (swordfish?).

As illustrated in the figure below, the use case will make use of the following D4S technologies:
• Workspace
• DataMiner
• WPS
• RStudio and ShareLaTeX working on top of the R and LaTeX compilers on the infrastructure,
• If possible, Shiny applications and the ShareLaTeX environment for collaborative editing of documents.

Figure 6: Publication options

3.4 RESOURCES

Data are provided by IRD, and sourced from different information systems as depicted above. Some external data sources are used to feed the different workflows (e.g. tuna RFMOs, datasets, and satellite images).

So far, IRD has produced the following computing resources to set up the different VREs (a summary of the full list). Reformatted codes have been deployed in development or production environments. Other algorithms are in preparation and will be deployed in the coming months.
The list is maintained in the wiki: List_of_IRD_algorithms.

| VRE / use case | WPS Identifier of the algorithm | Environment (Development RProtolab or Production) | Goal / Summary |
|---|---|---|---|
| Ichthyop model | ICHTHYOP_MODEL_ONE_BY_ONE | Production and Development | This R code packages some extraction to get observed trajectories from data sources (FADs or drifters) and the execution of Ichthyop driven by OSCAR data to compare the simulation with these observations. netCDF outputs are transformed into maps to be visualized with QGIS. Ichthyop is a free Java tool designed to study the effects of physical and biological factors on ichthyoplankton dynamics. |
| Ichthyop model | ICHTHYOP_MODEL_MULTIPLE_RUNS | Production and Development (#2344) | Algorithm enabling the execution of a set of simulations in one go; this version is parallelized. |
| Ichthyop model | MAKE_ICHTHYOP_NETCDF_CF_COMPLIANT | Development | Transformation of the native netCDF file into netCDF-CF. |
| Ichthyop model | ICHTHYOP_NETCDF_OUTPUT_TO_SHAPEFILE | Development | This code turns the trajectories of Ichthyop model outputs delivered as netCDF into a shapefile. |
| Ichthyop model | MAKE_ICHTHYOP_NETCDF_CF_COMPLIANT_OUTPUT_TO_SHAPEFILE | TO BE DEPLOYED | TO BE DEPLOYED |
| FAO Tuna Atlas VRE | TUNA_ATLAS_DATA_ACCESS | Development | This R code enables users to adapt a SQL query to get data from Sardara database storing global |
| ICCAT BFT-E VRE | STEP_1___VPA_ICCAT_BFT_E_RETROS | Development & Production | STEP 1: ICCAT (Eastern) BFT Stock Assessment. R and Fortran code provided by ICCAT and Ifremer to execute the whole stock assessment workflow online. Integration has been done with the help (mediation) of CNR and IRD. |
| ICCAT BFT-E VRE | STEP_2__VPA_ICCAT_BFT_E_VISUALISATION | Development & Production | ICCAT (Eastern) Bluefin Tuna Stock Assessment. This set of R and Fortran code has been provided by ICCAT and Ifremer to execute the whole stock assessment workflow online. Integration has been done with the help (mediation) of CNR and IRD. |
| ICCAT BFT-E VRE | STEP_3___VPA_ICCAT_BFT_E_PROJECTION | Development & Production | STEP 3: ICCAT (Eastern) Bluefin Tuna Stock Assessment. This set of R and Fortran code has been provided by ICCAT and Ifremer to execute the whole stock assessment workflow online. Integration has been done with the help (mediation) of CNR and IRD. |
| ICCAT BFT-E VRE | STEP_4_VPA_ICCAT_BFT_E_REPORT | Development & Production | ICCAT (Eastern) Bluefin Tuna Stock Assessment. This set of R and Fortran code has been provided by ICCAT and Ifremer to execute the whole stock assessment workflow online. Integration has been done with the help (mediation) of CNR and IRD. |
| ICCAT BFT-E VRE | PARALLELIZED_STEP1_VPA_ICCAT_BFT_E_RETROS | Development & Production | STEP 1: ICCAT (Eastern) Bluefin Tuna Stock Assessment. This set of R and Fortran code has been provided by ICCAT and Ifremer to execute the whole stock assessment workflow online. Integration has been done with the help (mediation) of CNR and IRD. |
| FAO Tuna Atlas VRE | CATCHES_AGGREGATED_FOLLOWING_A_SELECT_VARIABLE | Development | Catches Aggregated Following A Select Variable. The outputs are the temporal and spatial distributions of the catches aggregated following a selected variable and given the filters applied by the user. |
| FAO Tuna Atlas VRE | CATCHES_BY_FLAGS | Development | Catches By Flags. The output is a plot of the catches by flags given the filters applied by the user. |
| FAO Tuna Atlas VRE | CATCHES_BY_FLAGS_SIMPLIFIED_VERSION | Development | Catches By Flags Simplified Version. The output is a plot of the catches by flags given the filters applied by the user. |
| FAO Tuna Atlas VRE | CATCHES_BY_GEAR_SIMPLIFIED_VERSION | Development | Catches By Gear Simplified Version. The output is a plot of the catches by gear given the filters applied by the user. |
| FAO Tuna Atlas VRE | CATCHES_BY_GEARS | Development | Catches By Gears. The output is a plot of the catches by gears for tuna fisheries given the filters applied by the user. |
| FAO Tuna Atlas VRE | CATCHES_BY_SPECIES | Development | Catches By Species. The output is a plot of the catches by species given the filters applied by the user. |
| FAO Tuna Atlas VRE | CATCHES_BY_SPECIES_SIMPLIFIED_VERSION | Development | Catches By Species Simplified Version. The output is a plot of the catches by species given the filters applied by the user. |
| FAO Tuna Atlas VRE | CATCHES_BY_TYPE_OF_SCHOOL | Development | Catches By Type Of School. The output is a plot of the catches by type of school given the filters applied by the user. |
| | | | Compute Fisheries Indicators From Own Formatted Dataset. Compute some fisheries indicators (plots and maps) from a dataset that you have previously formatted and imported through the algorithm Import Fisheries Form... |
| FAO Tuna Atlas VRE | GLOBAL_CATCHES | Development | Global Catches. The output is a plot of the catches given the filters applied by the user. |

3.5 IMPLEMENTATION PLAN

The different sections of the IRD VRE plan will be developed by IRD, with assistance from FAO and, where it concerns Blue Commons, other consortium members such as CNR. The contents of this section have been updated to include the major changes to the implementation plan since the release of D5.1 as a report in December 2016.

3.5.1 EWE WORKPLAN

EwE activities were scaled back after the first six months, since the lead developers changed organization. The collaboration will continue at a slower pace on a voluntary basis. The workplan for the first six months was to:
• Write specifications (functional and user requirements) (done)
• Run a sample of pre-parametrized EwE models within the infrastructure (done)
• Provide the data access services needed to build and run the model (biological and spatio-temporal data are required to drive the model): FishBase & SeaLifeBase, WoRMS, environmental parameters, EAF Nansen (in progress)
• Store sets of parameters needed to replicate past runs. Define a way to provide input parameters and a way to browse the results (provide GIS data formats to Ecospace? Shapefiles, netCDF, other, to supplement the current ESRI ASCII grid files) (not started)

Since the release of D5.1, no plans have been made to re-activate the EwE activity.

3.5.2 ICCAT BFT-E WORKPLAN

The overall workplan is to continue to tweak and deploy multiple algorithms on the infrastructure:
• samples (either to check or to set up the computation environment)
• algorithms deployed for real use

The workplan will continue to enrich the infrastructure in order to:
• provide R codes executable with WPS to compile LaTeX / knitr / Markdown code and get various outputs: pdf, doc, html. This generic process will be used thereafter to compile ICCAT reports and other documents.
• validate the ability to manage R Shiny outputs by making the results available and browsable through a URL within the infrastructure,
• provide the whole ICCAT Bluefin Tuna stock assessment workflow as a single pre-parametrized experiment to validate that all technical issues are managed,
• split this workflow into different steps (some of them being generic):
o STEP 1 (heavy): Analysis (retros) of the BFT-E ICCAT datasets (catches, efforts) by running the stock assessment model with multiple combinations of input parameters and enabling parallelization of codes,
o STEP 2 (light and generic): Visualization of data analysis (retros: outputs of step 1): generate a set of plots and indicators summarizing the results of the processes which are going to be discussed by the working groups. This step requires the ability to manage interactive visualization packages (like R Shiny) as well as to display related HTML outputs within the infrastructure,
o STEP 3 (heavy): Projections, which is the most demanding step in terms of machine resources,
o STEP 4 (light and generic): Writing the main structure and plots of the executive summary and making it available as pdf or HTML outputs. Users expect a collaborative environment to edit reports.
• test the integration of plotly within a knitr document using an HTML output up to the RStudio client within the VRE,
• in the FAO Tuna Atlas VRE: deploy a SQL database within the infrastructure (PostgreSQL & PostGIS server); the database will store Tuna Atlas VRE datasets (inputs & outputs).

Once processes have been deployed, they are made available by the infrastructure as services (WPS, OpenCPU). These services can be used by the VRE to build user-friendly GUIs as well as by any other external Web site (work of FAO with OpenCPU and the related JavaScript library to manage OpenCPU or WPS services from Web browser clients). The results of process execution usually consist of zip files made available through URLs that people can use in their work.
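The idea behind STEP 1 above (running the model over multiple combinations of input parameters, in parallel) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: run_retrospective is a hypothetical stand-in for one call to the R/Fortran assessment code, and the two parameters (terminal year, recruitment scenario) are invented names, not the actual inputs of the ICCAT workflow. A thread pool is used here for portability; CPU-bound runs would rather be dispatched to separate processes or infrastructure workers.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_retrospective(params):
    """Hypothetical stand-in for one stock assessment run."""
    terminal_year, recruitment_scenario = params
    # A real implementation would launch the model; here the inputs are echoed.
    return {
        "terminal_year": terminal_year,
        "recruitment": recruitment_scenario,
        "status": "done",
    }


def run_all(terminal_years, recruitment_scenarios, max_workers=4):
    """Run every combination of input parameters concurrently."""
    combinations = list(product(terminal_years, recruitment_scenarios))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the order of `combinations`,
        # whatever the order in which the runs complete.
        return list(pool.map(run_retrospective, combinations))


results = run_all([2010, 2011, 2012], ["low", "medium", "high"])
print(len(results), results[0])
```

Because each combination is independent, the wall-clock time of the step is bounded by the slowest run rather than by the sum of all runs, provided enough workers are available.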
Access to the infrastructure might be possible through an R connector (from a local PC, RStudio). WP5 does not deal with GUIs for online editing or annotation of documents or charts (goal of WP8). It remains to be checked whether some existing VREs (like VME-DB) already provide services for collaborative / interactive documents.

3.5.3 ICHTHYOP WORKPLAN

All access options (see section 3.3.2) seamlessly use the same services provided by the infrastructure (backend) and the VRE. Behind the scenes, the development workflow has been split into several steps:
1. The WPS server manages the WPS "executeProcess" request with the input parameters provided by the client user. The Ichthyop simulation is executed accordingly.
2. During Ichthyop execution the model is driven by environmental data harvested remotely through the OPeNDAP protocol, provided by a THREDDS server working on top of the OSCAR satellite image package in netCDF (other products, like GECKO, might be used to drive the model), and physically replicated and stored on the infrastructure.
3. The result of the model (WPS output parameter) is packaged within a netCDF file.
4. R transforms the native data structure of the Ichthyop netCDF file into another one which is more compliant with CF conventions.
5. R transforms the native netCDF or netCDF-CF files into shapefiles (points and trajectories), does the same with the trajectories of drifters managed in PostGIS databases, and creates a QGIS map/project.
6. The user can download everything packaged in a zip file and visualize the results with the QGIS map.

Now that all the steps described above can be achieved, the next challenge will consist in storing and analyzing (thousands of) outputs. A student of the Naval Academy in Brest has been developing a method to explore and flag the quality of the results of Ichthyop simulations.
We now need to deploy post-processing algorithms to flag the quality of simulations executed on the infrastructure (M21).

4 STOCK ASSESSMENT - ICES

The ICES stock assessment VRE is focused on providing stock assessment services through DataMiner. It also includes basic data management features, e.g. storage of input and output data files. The aim is to improve the performance of existing stock assessment models: reduce the time to completion, improve the metadata description, and improve simulation runs.

4.1 USE CASES

ICES has proposed several algorithms and models in support of capacity building for individual scientists and assessment groups. These training and capacity building activities will have to be coordinated with WP8 and the other WP members.

• MSE for western horse mackerel
o Harvest control rules (HCRs) are used by fishery management agencies to achieve harvest policies that meet both conservation and stability objectives. Management Strategy Evaluation (MSE) is a framework that evaluates a number of candidate HCRs according to these objectives. MSEs generally apply a simulation framework that explores sensitivities in the parameterization of model HCRs, resulting in a computationally intensive framework. The MSE detailed below has been developed by José De Oliveira for western horse mackerel and will serve as a test case for using the BlueBRIDGE VRE platform for MSE analyses.
o The model will not be an as-a-service VRE, but a use case in compartmentalized and focused VREs.

• Mixed fisheries assessment model
o The Mixed Fishery assessment uses an Fcube approach (after Fleet and Fishery Forecast), a simple model of mixed fisheries which can be used to assess the consistency between management (TAC and/or effort) advice for species caught together, given the availability and accessibility of data. A simple linear relationship is assumed between effort and fishing mortality, and status quo catchability is assumed in the projections. Forecasts are produced according to scenarios for what limits the effort in each fleet, and the forecasted catch of each species is compared to the single-species advice for TAC. The Fcube model in its current form is appropriate for all regions where the majority of species of commercial interest are assessed quantitatively. Many scenarios can be evaluated, including an optimization process that will identify the set of fishing mortalities, by stock, that maximizes a given objective function.
o This model will not be an as-a-service VRE, but a use case in compartmentalized and focused VREs.

• Ensemble model
o LeMANS is a size-structured multi-species model of a fish community with a realistic distribution of life-history attributes. This approach differs from that reported for most other size-based models in that it maintains both the identity of the species in the system and the size structure of the individual populations. The model has been applied to the North Sea and calculates the biomasses of 21 fish stocks [2] [4]. Recruitment occurs each year and predation is based on size moderated by a diet matrix, but no starvation occurs. Ecosystem components not represented explicitly make up the pool of "other food". An ensemble approach has been implemented by screening potential model set-ups against ICES abundance data to produce a subset of models consistent with the data that can be used to generate probabilistic projections [4].
o The model will not be an as-a-service VRE, but a use case in compartmentalized and focused VREs.

ICES obtained the source code for each model, and negotiated terms and conditions with the owners of the data and software to use the BlueBRIDGE infrastructure.
The models were successfully tested in a private environment. After review with infrastructure developers, it was decided that the code needs to be compartmentalized to better fit with an infrastructure approach. Therefore, a revised approach has been developed. First, log-jams (bottlenecks) in the use-case source code are identified, together with the functions (here referred to as base functions) that can be modified to be more computationally efficient (ticket #5818). Next, the base functions are modified to run in parallel, or to utilize other infrastructure services (tickets #5819 and #5821). The modified functions will be deployed as-a-service, and wrappers will be developed to pass input from the use-case source code to the VRE for computation and to return the output from the VRE to the local environment. This signifies a move from specific models based on purpose-built scripts towards an infrastructure approach.

An additional use case has been developed around existing R Shiny code (Data-limited-tools) developed by Jason Cope.

• DLM Toolkit: Toolbox for Data Limited Methods Stock assessment (DLM Toolkit) #1778

This represents a change from the plan presented in D5.1. It has also attracted the interest of FAO, and a joint use case is being implemented around this extendable Toolkit. The Toolkit can serve several user communities with different needs, and offers clear advantages if used through an infrastructure, for instance by offering a uniform data space, replicability, performance, and simplicity.

4.2 USERS

The users of the tools are from the stock assessment community:
• Stock assessment individual scientists (current community);
• Stock assessment trainers and students (with WP8; courses cover some of the above cases);
• Natural resource management community.

4.3 VRE DESIGN

The VRE will rely on the DataMiner facilities. This will be achieved in a few steps; for each of these, the design can rely on the already available VRE design facilities:
1. The algorithms, where possible, will be tested in the online RStudio environment. This requires access to RStudio, if possible integrated with the Workspace;
2. When successful, they will be integrated through the Statistical Algorithm Importer (SAI) and published to the selected community through the DataMiner;
3. The VRE community can then discover the process in the VRE, and run the model. The output will be stored (initially) on the workspace, but can also (later) be made available in dedicated pages.

4.4 RESOURCES

This VRE will rely on existing tools that will need to be integrated in the infrastructure. The use of resources depends on the assessment activity, and is difficult to predict at this stage. The tools depend at least on R and R Shiny, and their integration in the infrastructure will rely on DataMiner and the Workspace. For completed assessments, dissemination through CKAN is being considered.

4.5 IMPLEMENTATION PLAN

The implementation plan is incorporated in the Redmine ticketing system, where implementation activities can be tracked for the ICES Stock Assessment VRE.

The first theme of the ICES workplan aims at migrating models that are fit for infrastructure deployment:
1. Identify where efficiencies can be made in base functions #5818
2. Modify base functions to run in parallel or to utilize other infrastructure services #5819 and #5821
3. Deploy modified functions

The second theme relates to the implementation of Data Limited Methods (DLM), which is also of interest to FAO. The DLM implementation plan for the coming period covers the exploitation of the existing DLM toolkit, rendering it an infrastructure asset with managed users and data / computing resources.
The first theme covers the work of individual scientists to validate the proposed integration approach, while the second, after successful validation, will raise the use of the tool to a multi-user environment:
1. Simple minimal example of a Shiny App with ShinyProxy #5822, #5823
2. Deploy DLM Toolkit https://github.com/shcaba/Data-limitedtools/tree/master/Shiny_DLMtool with ShinyProxy #1778

Adaptation since D5.1

After D5.1 was released, it was decided that the second theme offers a more promising future perspective, as it requires less effort on specific models and has a potentially larger user base. The work of the coming period will therefore focus on the development of DLM methods. Since FAO has expressed a similar interest, the development work will be shared.

ICES effort for M16-M18 was devoted to developing Shiny applications that were also used for the ICES MSY training course (in WP8), and are available to scientists. ICES created a length-based indicator application that takes raw length frequency data and transforms them into output suitable for determining ICES proxy MSY reference points. The code is ready for deployment in a Docker image and constitutes a test case for ShinyProxy (task #7492).

ICES considers further effort on tasks #1768, #4620, and #5517 not immediately necessary, as the advance analysis of the algorithms and models revealed that they are not really suited for deployment on the e-infrastructure; modifying the identified "log-jam" functions would likely be more efficient and elegant using C++ or TMB rather than simply parallelizing inefficient code. The cost of that approach, however, is high, and the results might find limited broad-scale application in the rest of the community. Further effort should be devoted to linking the e-infrastructure with Docker images / ShinyProxy, so that data can be moved between the resources and the e-infrastructure is more than a simple server for Shiny apps.
ICES is currently discussing with ENG how this might be done. This activity under ShinyProxy #1778 is also relevant to FAO, and a joint approach is being implemented. The approach aims to offer DLM services to any community through a VRE. The plan is to start once ShinyProxy is ready (March 2017); the work will, on the one hand, enrich the number of algorithms in DLM Tools and, on the other hand, better integrate the tool with infrastructure services, an activity that will continue until M24, after which exploitation starts.

ICES expects that further effort should be devoted to improving the integration between the workspace, RStudio, and VREs, which is of interest to all partners in this WP. This is described in task #7493.

5 GRSF

The Global Record of Stocks and Fisheries (GRSF) aims to produce a collaborative environment to maintain a global knowledge base of stocks and fisheries. It satisfies the need for intelligent identification of stocks and fisheries, the assessment of their status over time, and deep linking to the data sources and assessment history. Building on community and infrastructure products (software and datasets) as well as the offerings of other VREs (models and datasets, knowledge base development), and extending those with participating specialists in data mining and mapping, indicator extraction and dataset integration (where required), the resulting VRE provides a global information resource on inter-disciplinary stock and fisheries information.

5.1 USE CASES

The main purpose of the VRE is to provide scientists with an environment and the tools for accessing stocks and fisheries information. To this end, a registry containing such information will be constructed. This registry will be the backbone of the VRE and will integrate data about stocks, fisheries and their corresponding details, coming from different sources.
For this reason, a set of basic data sources (namely FIRMS, RAM and FishSource), as well as a set of models and software components, are required for achieving the data integration, together with the tools and processes for constructing, monitoring and maintaining the registry.

Two main use cases together constitute the GRSF:
1. Access, ingest and manage the GRSF source records to produce a global integrated knowledge base on stocks and fisheries;
2. Publish and facilitate the use of the knowledge base through a human- and machine-readable registry.

5.2 USERS

Two main types of users are foreseen in the infrastructure:
• Data Managers: liaise with data contributors, ingest and integrate information, ensure consistency and completeness of data, establish mappings and relations, and
• Data Publishers: once the data manager has completed the preparatory phase, the resulting records will have to be agreed by the contributing parties (the legal owners), and possibly by other organizations (if part of the data was sourced from them), and the resulting records (which may contain merged or modified information) will have to be validated for their consistency with the source information. Only after that has been completed can they be published as GRSF records.

Two main types of users are foreseen consuming data from the infrastructure:
• The physical consumers of the Global Record of Stocks and Fisheries are all parties interested in the state of stocks, fisheries and marine resources, and range from national governments, through regional fisheries management organizations, to scientists and interested individuals. These users require a user-friendly UI.
• The non-physical consumers of the Global Record of Stocks and Fisheries are external sites that wish to reproduce parts of, or the entire, GRSF. The GRSF will not be open to any machine: the owners of such sites will first have to subscribe to, and abide by, a data policy that is likely to be enforced at the level of the CKAN registry.
5.3 VRE DESIGN

As far as the infrastructure-based management of the GRSF is concerned, the VRE design for data ingestion, management and mapping depends on the corresponding software tools (MatWare, GRSF-services, etc.), which are integrated in the infrastructure. It will have data managers who are dependent on infrastructure user policies. As far as dissemination is concerned, the GRSF will rely on a dedicated catalogue, based on the gCube Data Catalogue facilities, for exposing the content of the Semantic Knowledge base. The following figure depicts the overall design of the GRSF VRE.

Figure 7: The GRSF VRE overall design

5.4 RESOURCES

Data resources:

5.4.1 FIRMS

FIRMS (http://firms.fao.org/firms/en), the Fisheries and Resources Monitoring System, has the main objective of providing access to a wide range of high-quality information on the global monitoring and management of marine fishery resources. FIRMS collects data from 14 intergovernmental organizations and contains information on more than 1000 stocks and 300 fisheries. FIRMS contents are exposed in XML, through a set of dedicated services, following a predefined XML schema. A more detailed discussion of the main concepts found in FIRMS, with their description and the corresponding XPath pattern (for retrieving them from the XML response), can be found at: https://support.d4science.org/projects/bluebridge/wiki/FIRMS_Analysis_and_Modeling.

5.4.2 RAM LEGACY STOCK ASSESSMENT DATABASE

The RAM Legacy Stock Assessment Database (http://ramlegacy.org) is a compilation of stock assessment results for commercially exploited marine populations from around the world. The assessments were assembled from 21 national and international management agencies for a total of 331 stocks.
Metadata for each stock describe, among other things, the taxonomic information of the species, the geographic location of the stock, the management body that conducted the assessment and the particular assessment methodology. The key concepts of the database are: Area, Assessment, AssessMethod, Assessor, Biometrics, Bioparams, Management, Stock, Taxonomy, Timeseries and TSMetrics. An extended discussion of RAM can be found at https://support.d4science.org/projects/bluebridge/wiki/RAM_Analysis_and_Modeling.

5.4.3 FISHSOURCE

FishSource (http://www.fishsource.com) is an online information registry containing various information about stocks and fisheries. It exposes its contents through fisheries profiles. Each profile contains: (a) the identification, with various information about the species, the water area, the management areas, etc., (b) the scores, with statistics and measures, (c) various sustainability information, (d) summary and basic information and (e) references and reviews. The information is currently available to the public as HTML pages; in the future, FishSource plans to expose it through a set of services.

5.4.4 OTHER SOURCES

Apart from the three main sources of information about stocks and fisheries described above, the global registry will also exploit information from other sources. For example, it will exploit information from the MarineTLO-based warehouse (http://wiki.i-marine.eu/index.php/MarineTLObased_warehouse), which was constructed in the context of the iMarine project. The MarineTLO-based warehouse contains various information (e.g. taxonomic, preys-predators, scientific and common names, vessels, water areas, etc.) from the following sources:
• Fisheries Linked Open Data – FLOD (http://www.fao.org/figis/flod/endpoint)
• ECOSCOPE Knowledge Base (http://ecoscopebc.mpl.ird.fr/joseki/ecoscope)
• FishBase (http://www.fishbase.org)
• World Register of Marine Species – WoRMS (http://www.marinespecies.org)
• DBpedia (http://dbpedia.org)

The Semantic Models

5.4.5 REQUIREMENTS

The integration of data coming from heterogeneous sources will facilitate better exploitation of the information that exists in the different sources. More specifically, it will allow connecting data referring to the same piece of information (e.g. stocks that contain the same marine species), and ultimately answering complex queries that could not be answered by any of the underlying sources alone. For this reason, we pay attention to the querying requirements. In order to make concrete statements about the information that should be stored in the registry, we used the notion of competency queries. A competency query is a query useful for the community at hand, e.g. for a human member (e.g. a scientist), or for building applications for that domain. Therefore, a list of such queries can sketch the desired scope and the desired structuring of the information. An indicative list of the competency queries focusing on information about stocks and fisheries (at least for the FIRMS source) can be found at https://support.d4science.org/projects/bluebridge/wiki/Competence_questions.

For achieving the integration of heterogeneous sources, it is important to select an appropriate model (i.e. a top-level ontology) that is abstract enough to cover most of the fundamental categories of the underlying sources, can be easily extended to any level of detail on demand, and is rich in terms of properties so that the particular details of the different sources can be easily mapped to it.
D5.3 Blue Assessment VRE Specification: Revised Version Page 36 of 40 BlueBRIDGE – 675680 www.bluebridge-vres.eu To this end we selected the CIDOC Conceptual Reference Model (ISO 21127:2006) [5] and its extensions. CIDOC CRM provides the definitions for describing the implicit and explicit concepts and relationships used in cultural heritage documentation. The latest version of the ontology (version 6.2) comprises of 82 classes and 287 properties. It has a rich structure of intermediate classes and relations, which apart from being very useful for building query services, it makes its extension to other domains easier and reduces the risk of overgeneralization/specialization. Some of its distinctive extensions are described below. MarineTLO [6] is a top level ontology, generic enough to provide consistent abstractions or specifications of concepts included in all data models or ontologies of marine data sources and provide the necessary properties to make this distributed knowledge base a coherent source of facts relating observational data with the respective spatiotemporal context and categorical (systematic) domain knowledge. It can be used as the core schema for publishing Linked Data, as well as for setting up integration systems for the marine domain. It can be extended to any level of detail on demand, while preserving monotonicity. The latest version of MarineTLO (version 4) contains 127 classes and 81 properties. CRMsci [7] is an ontology which is intended to be used as a global schema for integrating metadata about scientific observation, measurements and processed data in descriptive and empirical sciences such as biodiversity, geology, geography, etc. The CRMsci model has been developed bottom up from specific metadata examples from biodiversity, geology, archaeology, cultural heritage conservation and clinical studies. The latest version of CRMsci (version 1.2.2) contains 31 classes and 52 properties. 
CRMdig [8] is an ontology derived as an extension of CIDOC CRM that can record the provenance of digital objects. More specifically, it captures the steps and methods involved in producing digitization products and synthetic digital representations. It also fully covers the initial physical measurement processes and their parameters. The latest version of CRMdig (version 3.2) contains 16 classes and 69 properties.

The Tools and Processes

Constructing a registry that contains data from heterogeneous sources is a rather laborious task. Tools are therefore needed to automate its construction and to monitor the result. More specifically, the following software components have been used for the construction and maintenance of the GRSF VRE.

MatWare (https://support.d4science.org/projects/bluebridge/wiki/Matware) is a framework that automates the construction of semantic warehouses by fetching and transforming data from different (and in most cases heterogeneous) data sources. MatWare ensures that the contents of the semantic warehouse are properly connected by using connectivity metrics that measure how much of the integrated content is connected. MatWare supports the construction of the knowledge base of the Global Record of Stocks and Fisheries.

GRSF-services-core (https://wiki.gcube-system.org/index.php?title=GRSF-services#grsf-servicescore) is a software library responsible for exposing the contents of the Global Record of Stocks and Fisheries knowledge base and for allowing users to approve or reject, and to annotate, particular GRSF records.

GRSF-services-updater (https://wiki.gcube-system.org/index.php?title=GRSF-services#grsf-servicesupdater). When GRSF records are first published in the GRSF Data Catalogue, they have the status “pending”.
This means that their contents have not yet been checked by an expert, who either approves them as correct or annotates them as erroneous. This check is carried out by the GRSF VRE administrator, who can confirm or reject each GRSF record while browsing the GRSF Data Catalogue. To avoid inconsistencies and keep the contents of the GRSF Data Catalogue aligned with the GRSF KB (recall that the GRSF Data Catalogue is populated from the GRSF KB), both must be updated.

DataCatalogue-publish-API (https://wiki.gcube-system.org/index.php?title=GCube_Data_Catalogue_for_GRSF) is responsible for publishing resources in the gCube Data Catalogue. A specific Data Catalogue has been deployed for the purposes of GRSF. The GRSF Data Catalogue stores, and allows the publication of, products of two types: Stock and Fishery. Apart from the default set of metadata, each type of product also has specific fields, some of which automatically become tags of the product. The same reasoning applies to group associations: a set of groups is already available, and each product is automatically associated with the applicable groups during publication.

The gCube Data Catalogue (https://wiki.gcube-system.org/gcube/GCube_Data_Catalogue) is built on, and extends, the CKAN platform. It allows rich metadata to be published, both the metadata that CKAN supports (such as titles, descriptions, licensing information, responsible persons and their roles) and gCube-specific metadata. The latter are organized into profiles and expressed as XML-based metadata, and they allow custom metadata fields to be included in the Data Catalogue. The catalogue has been integrated in the infrastructure to assist the publishing, storage and discovery of data resources.
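The publication behaviour described above can be sketched as follows. This minimal Python example assembles a CKAN-style dataset dictionary (name, title, tags, groups, extras) for a Stock or Fishery product, promotes some type-specific fields to tags, associates the product only with groups that already exist, and marks the record as “pending”. The field names, the choice of tag-producing fields, the group names, the example values and the storage of the status as an extra are all assumptions made for illustration; the sketch does not reproduce the actual DataCatalogue-publish-API or the GRSF metadata profiles.

```python
# Hypothetical type-specific fields; the real GRSF profiles are not shown here.
TYPE_FIELDS = {
    "Stock": ["species", "assessment_area"],
    "Fishery": ["species", "fishing_area", "gear"],
}
# Hypothetical subset of fields whose values also become tags.
TAG_FIELDS = {"species", "gear"}


def slug(value):
    """Turn a free-text value into a lowercase, hyphenated group name."""
    return value.lower().replace(" ", "-")


def build_package(product_type, name, title, fields, available_groups):
    """Assemble a CKAN-style dataset dictionary for one product."""
    if product_type not in TYPE_FIELDS:
        raise ValueError(f"unknown product type: {product_type}")
    required = TYPE_FIELDS[product_type]
    missing = [f for f in required if f not in fields]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    # Some type-specific fields automatically become tags.
    tag_values = [fields[f] for f in required if f in TAG_FIELDS]
    tags = [{"name": value} for value in tag_values]

    # Associate the product with a group only if that group already exists.
    candidates = [product_type.lower()] + [slug(v) for v in tag_values]
    groups = [{"name": g} for g in candidates if g in available_groups]

    # Type-specific fields are kept as extras; new records start as pending.
    extras = [{"key": f, "value": fields[f]} for f in required]
    extras.append({"key": "status", "value": "pending"})

    return {"name": name, "title": title, "tags": tags,
            "groups": groups, "extras": extras}


package = build_package(
    "Fishery",
    "fishery-001",
    "Example fishery",
    {"species": "Gadus morhua", "fishing_area": "Area 27", "gear": "Trawls"},
    available_groups={"stock", "fishery", "gadus-morhua"},
)
print(package)
```

Validating the product type and the required fields before building the dictionary keeps incomplete records out of the catalogue in this sketch; a real deployment would rely on the metadata profiles for that check.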
More specifically, the gCube Data Catalogue contains resources that are intended for, or result from, the services of the Blue Assessment VREs, serving cases that range from stock assessment to aquaculture atlas generation, strategic investment and scientific training. Datasets include species distribution maps, environmental data, area regulation zones, as well as stocks and fisheries.

5.5 IMPLEMENTATION PLAN

The overall implementation plan is the same as reported in 5.1; the tool will be implemented using the following contributions:
• FORTH will provide the technology and manage the development of the GRSF;
• FAO will liaise with the user community and ensure that the development meets the community needs. FAO will liaise in particular with RAM and SFP, who also provide in-kind effort, software, and data;
• FORTH and CNR will provide overall technical integration and publication of GRSF records through the CKAN registry (through WP9).

Since D5.1, a more detailed implementation plan was prepared during the External Advisory Board Technical Working Group Meeting (EAB-TWG2). It comprises the following VRE development and implementation activities:
• Update the processes for building the GRSF knowledge base (March - May)
  o Re-generate GRSF records (3 times by the end of the project)
  o Harvest new data from sources (2-3 times by the end of the project)
• Partners’ tests (e.g. data validation, UUIDs, etc.) (June - July)
• Update GRSF interfaces (Admin and Public) (March - May)
• Complete the development of the management functions (Mngt. panel) (March - May)
  o Merge function
  o Display proximities
  o Traceability flag
• Field comment for users with categories
• Other requirements, integration of competency queries, etc.
(from September)

The progress of the implementation can be checked through the meeting reports: https://support.d4science.org/projects/stocksandfisherieskb/wiki/GRSF#Meetings
Specific GRSF activities can be followed through the ticketing system: https://support.d4science.org/issues/643

REFERENCES

[1] Assante M, Candela L, Castelli D, Coro G, Lelii L, Pagano P. (2016) Virtual research environments as-a-service by gCube. PeerJ Preprints 4:e2511v1 https://doi.org/10.7287/peerj.preprints.2511v1
[2] Ellenbroek, A. (2016) Blue Assessment VRE Specification. BlueBRIDGE Deliverable D5.1.
[3] Rochet MJ, Collie JS, Jennings S, Hall SJ. Does selective fishing conserve community biodiversity? Predictions from a length-based multispecies model. Canadian Journal of Fisheries and Aquatic Sciences. 2011;68:469–486.
[4] Thorpe RB, Le Quesne WJF, Luxford F, Collie JS, Jennings S. Evaluation and management implications of uncertainty in a multispecies size-structured model of population and community responses to fishing. Methods Ecol Evol. 2015.
[5] M. Doerr. The CIDOC conceptual reference module: an ontological approach to semantic interoperability of metadata. AI magazine, 24(3):75, 2003.
[6] Y. Tzitzikas, C. Allocca, C. Bekiari, Y. Marketakis, P. Fafalios, M. Doerr, N. Minadakis, T. Patkos, and L. Candela. Unifying heterogeneous and distributed information about marine species through the top level ontology MarineTLO. Program: electronic library and information systems, 50(1), 2015.
[7] M. Doerr, C. Bekiari, A. Kritsotaki, G. Hiebel, and M. Theodoridou. Modelling scientific activities: proposal for a global schema for integrating metadata about scientific observation. In Access and understanding–networking in the digital era: The 6th annual conference of CIDOC, the International Committee for Documentation of ICOM, Dresden, Germany, 2014.
[8] M. Theodoridou, Y. Tzitzikas, M. Doerr, Y. Marketakis, and V. Melessanakis. Modeling and querying provenance by extending CIDOC CRM. Distributed and Parallel Databases, 27(2):169–210, 2010.
[9] Y. Tzitzikas, N. Minadakis, Y. Marketakis, P. Fafalios, C. Allocca, M. Mountantonakis, and I. Zidianaki. Matware: Constructing and exploiting domain specific warehouses by aggregating semantic data. In The Semantic Web: Trends and Challenges, pages 721–736. Springer, 2014.
[10] N. Minadakis, Y. Marketakis, H. Kondylakis, G. Flouris, M. Theodoridou, M. Doerr, and G. de Jong. X3ML framework: An effective suite for supporting data mappings. Workshop for Extending, Mapping and Focusing the CRM - co-located with TPDL’2015, September 2015.

APPENDIX 1

Overview of VREs related to task 5.1; generic VREs and model or framework specific

# | Status | Subject | Due date
6098 | New | RDB for Fisheries Data Management (WECAFC) | May 31, 2017
6870 | New | FLR for JRC | May 31, 2017
7449 | New | request for a VRE for IOTC working party (november 2017) : Ichthyop VRE | Oct 02, 2017
5238 | New | WECAFC Stock assessment support |
1678 | In Progress | FAO Stock Assessment VRE | Oct 31, 2016
1679 | In Progress | ICES Stock Assessment VRE | Oct 01, 2016
5493 | Released | Revise FAO Tuna Atlas VRE: Enlarge the pool of DataMiner Algorithms and make SAI available | Oct 17, 2016
1675 | Released | GRSF VRE; Global Record of Stocks and Fisheries |
4846 | Released | VRE Creation for ICCAT BFT-E | 12-Sep-16
5136 | Released | WECAFC-FIRMS: Add TabMan and Dataminer | 4-Nov-16
5886 | Released | GRSF_Admin | 25-Nov-16
6229 | Released | Create a VRE realising an analytics environment: the Analytics Lab | 23-Dec-16
5016 | Released | RPrototypingLab Deployment |
4894 | Released | BlueBridge RStudio VRE |
1677 | Removed | Ecopath VRE | Oct 01, 2016
1779 | Removed | IRD BFT Assessment | Oct 01, 2016