
BlueBRIDGE – 675680
www.bluebridge-vres.eu
Project Acronym
BlueBRIDGE
Project Title
Building Research environments for fostering Innovation,
Decision making, Governance and Education to support
Blue growth
Project Number
675680
Deliverable Title
Blue Assessment VRE Specification: Revised Version
Deliverable No.
D5.3
Delivery Date
December 2016
Authors
J. Barde, A. Ellenbroek, C. Formisano, S. Large, Y.
Marketakis
BlueBRIDGE receives funding from the European Union’s Horizon 2020 research and innovation programme under
grant agreement No. 675680
DOCUMENT INFORMATION
PROJECT
Project Acronym
BlueBRIDGE
Project Title
Building Research environments for fostering Innovation, Decision making,
Governance and Education to support Blue growth
Project Start
1st September 2015
Project Duration
30 months
Funding
H2020-EINFRA-2014-2015/H2020-EINFRA-2015-1
Grant Agreement No.
675680
DOCUMENT
Deliverable No.
D5.3
Deliverable Title
Blue Assessment VRE Specification: Revised Version
Contractual Delivery Date
December 2016
Actual Delivery Date
May 2017
Author(s)
J. Barde (IRD), A. Ellenbroek (FAO), S. Large (ICES), P. Fabriani (ENG), C.
Formisano (ENG), Y. Marketakis (FORTH), A. Gentile (FAO)
Editor(s)
J. Barde (IRD)
Reviewer(s)
L. Candela (CNR)
Contributor(s)
n.a.
Work Package No.
WP5
Work Package Title
Supporting Blue Assessment: VREs Development
Work Package Leader
FAO
Work Package Participants
ENG, ICES, IRD, FORTH
Distribution
Public
Nature
Other
Version / Revision
V1.0
Draft / Final
Final
Total No. Pages (including cover)
40
Keywords
Stock assessment, Global record of stocks and fisheries, community
software, R, ecological models, semantic model, aggregation, fisheries
DISCLAIMER
BlueBRIDGE (675680) is a Research and Innovation Action (RIA) co-funded by the
European Commission under the Horizon 2020 research and innovation programme.
The goal of BlueBRIDGE, Building Research environments for fostering Innovation,
Decision making, Governance and Education to support Blue growth, is to support
capacity building in interdisciplinary research communities actively involved in
increasing the scientific knowledge of the marine environment, its living resources,
and its economy with the aim of providing a better ground for informed advice to
competent authorities and to enlarge the spectrum of growth opportunities as
addressed by the Blue Growth societal challenge.
This document contains information on BlueBRIDGE core activities, findings and
outcomes and it may also contain contributions from distinguished experts who
contribute as BlueBRIDGE Board members. Any reference to content in this
document should clearly indicate the authors, source, organisation and publication
date.
The document has been produced with the funding of the European Commission. The content of this
publication is the sole responsibility of the BlueBRIDGE Consortium and its experts, and it cannot be
considered to reflect the views of the European Commission. The authors of this document have taken any
available measure in order for its content to be accurate, consistent and lawful. However, neither the project
consortium as a whole nor the individual partners that implicitly or explicitly participated in the creation and
publication of this document hold any sort of responsibility that might occur as a result of using its content.
The European Union (EU) was established in accordance with the Treaty on the European Union (Maastricht).
There are currently 27 member states of the European Union. It is based on the European Communities and
the member states’ cooperation in the fields of Common Foreign and Security Policy and Justice and Home
Affairs. The five main institutions of the European Union are the European Parliament, the Council of
Ministers, the European Commission, the Court of Justice, and the Court of Auditors (http://europa.eu.int/).
Copyright © The BlueBRIDGE Consortium 2015. See http://www.bluebridge-vres.eu for details on the copyright holders.
For more information on the project, its partners and contributors please see http://www.i-marine.eu/. You are
permitted to copy and distribute verbatim copies of this document containing this copyright notice, but modifying this
document is not allowed. You are permitted to copy this document in whole or in part into other documents if you
attach the following reference to the copied elements: “Copyright © The BlueBRIDGE Consortium 2015.”
The information contained in this document represents the views of the BlueBRIDGE Consortium as of the date they are
published. The BlueBRIDGE Consortium does not guarantee that any information contained herein is error-free, or up
to date. THE BlueBRIDGE CONSORTIUM MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, BY PUBLISHING
THIS DOCUMENT.
GLOSSARY
ABBREVIATION: DEFINITION
CNR: Consiglio Nazionale delle Ricerche (National Research Council of Italy)
CMSY: Catch Maximum Sustainable Yield; a set of models for stock assessment
CMSY-as-a-Service: The CMSY implementation used in the BlueBRIDGE VREs DataMiner
DLM Toolkit: Toolkit of methods used for managing data-limited fisheries and a management
strategy evaluation of their relative performance across a range of fisheries
ENG: Engineering – Ingegneria Informatica Spa
FAO: Food and Agriculture Organization of the United Nations
FLR: Fisheries Library in R; a JRC-led initiative for quantitative fisheries science,
developed in the R language
FORTH: Foundation for Research and Technology – Hellas
ICES: International Council for the Exploration of the Sea
IRD: Institut de Recherche pour le Développement
KPI: Key Performance Indicator
Matware: A tool to construct domain-specific warehouses by aggregating semantic data
OpenCPU: A framework to expose an HTTP API for embedded scientific computing with R
RShiny: A web application framework for R
RStudio: A free and open-source integrated development environment (IDE) for R, a
programming language for statistical computing and graphics
SDMX: Statistical Data and Metadata eXchange; a global data format
SS3: Stock Synthesis Version 3; an advanced model for stock assessment
VPA: Virtual Population Analysis; an approach to stock assessment
VRE: Virtual Research Environment
TABLE OF CONTENT
DOCUMENT INFORMATION
DISCLAIMER
GLOSSARY
TABLE OF CONTENT
DELIVERABLE SUMMARY
EXECUTIVE SUMMARY
1 Stock Assessment – Generic VRE
  1.1 Use Cases
    1.1.1 Tabular Data management
    1.1.2 TabMan Data standardization
    1.1.3 Data Miner Data analysis
    1.1.4 Species Discovery service
    1.1.5 Data Dissemination
    1.1.6 Data Publication – The publication of VRE outcomes
  1.2 Users
  1.3 VRE Design
  1.4 Resources
    1.4.1 Data resources
  1.5 Implementation plan
2 Stock Assessment - FAO
  2.1 Use Cases
    2.1.1 Regional database for Fisheries Data management: WECAFC case
    2.1.2 Stock Assessment Support: WECAFC case
    2.1.3 Tuna atlas upgrade and added services
  2.2 Users
  2.3 VRE Design
  2.4 Resources
  2.5 Implementation plan
3 Stock Assessment - IRD
  3.1 Use Cases
  3.2 Users
  3.3 VRE Design
    3.3.1 EwE
    3.3.2 Ichthyop
    3.3.3 BFT Assessment
  3.4 Resources
  3.5 Implementation plan
    3.5.1 EwE Workplan
    3.5.2 ICCAT BFT-E Workplan
    3.5.3 Ichtyop Workplan
4 Stock Assessment - ICES
  4.1 Use Cases
  4.2 Users
  4.3 VRE Design
  4.4 Resources
  4.5 Implementation plan
5 GRSF
  5.1 Use Cases
  5.2 Users
  5.3 VRE Design
  5.4 Resources
    5.4.1 FIRMS
    5.4.2 RAM Legacy Stock Assessment Database
    5.4.3 FishSource
    5.4.4 Other Sources
    5.4.5 Requirements
  5.5 Implementation plan
REFERENCES
Appendix 1
  Overview of VREs related to task 5.1; generic VREs and model or framework specific
DELIVERABLE SUMMARY
This document reports on outcomes of the activities in BlueBRIDGE Work Package 5 “Supporting Blue
Assessment: VRE Development”. In particular, it documents the requirements and specifications to support
the development of stock assessment data services and the Global Record of Stocks and Fisheries, from
M16 to M18, and includes implementation plans until M22.
Deliverable 5.3 (revised version) was originally planned as a living wiki page that would grow with the
development of Blue Assessment tools and services. This wiki is used to propose new ideas, discuss potential
development efforts, and report some results. The obvious overlap with the ticketing system, and the
multitude of development activities, some of which stopped or were completed in 2016 while others are not
planned until 2018, led to fragmentation of the wiki in some parts, as observed for D5.1.
In November 2016 the PO requested that, for Deliverable 5.1, a report be produced to summarize the
information of the wiki in an organized fashion, which resulted in the VRE Specifications of D5.1.
Deliverable 5.3 (revised version of the VRE plan) was scheduled for release only one month after the
production of this report. It was decided to delay the production of D5.3 from December 2016 to May 2017
to capture several community meetings and include the resulting requirements and specifications for VREs
and data services.
EXECUTIVE SUMMARY
Deliverable D5.3 “Blue Assessment VRE Specification: Revised Version” is an on-line living document that
presents the requirements, specifications and design of the solution for the blue assessment VREs.
The Deliverable is available on the project’s wiki at the following location:
https://support.d4science.org/projects/bluebridge/wiki/D_5_3
This document captures the state of plans and their implementation in March 2017, while the underlying
online documents, wiki, and tickets are expected to evolve under the principles of the agile methodology that
the BlueBRIDGE project is following.
The implementation activity of the two main VREs capturing the overall requirements can be traced here:
• Stock Assessment VRE Specification
• Global Record of Stocks and Fisheries VRE Specification
Appendix 1 depicts the development status of the VREs as of the reporting date (May 2017).
The overall descriptions of the VREs are accessible in the wiki:
• https://support.d4science.org/projects/bluebridge/wiki/Task_51_Stock_Assessment_VRE
• https://support.d4science.org/projects/bluebridge/wiki/GRSF_VRE_plan
This document summarizes the description of each VRE, focussing on the following aspects:
• Use cases, including a brief description of the expected activities
• Involved stakeholders
• Design of the VRE, including the involved resources, such as software, data, metadata, and services
provided by the infrastructure, but also policies, external use and users, and collaboration.
These two main VREs are used to serve specialized communities with specific requirements, users, and data.
At the time of writing, several VREs were being developed, either as a separate VRE (GRSF), as software
(CMSY-as-a-service, OpenCPU, SS3, DLM Toolkit, FLR, etc.) or as integrated components of the main VRE (BFT
assessment, FAO Tuna Atlas updates, etc.) where the specialization is manifested through models for stock
assessment.
This version of the Deliverable is a revised version of D5.1, spanning Blue Assessment tasks. These tasks
require specialized VREs that are included in this Deliverable in addition to the VREs originally described. It
also follows the structure of D5.1, and provides information on the models, analysis and scientific planning
expected to be performed. Additional use cases are expected to be identified in the next stages of the project,
and the delivered VRE must thus offer flexible and adaptable extension features.
In M15, Task 3 in this WP was activated; it first targets decision makers, to engage them with the new data
services of BlueBRIDGE. Activities in this task focus on:
• Ensuring that capacity building is integrated in all delivered products;
• Ensuring that new stocks and fisheries can be included by additional co-funded teams, and that these
teams can be equipped with appropriate tools and models adapted to their situation;
• Identifying new indicators for stocks and fisheries, and promoting their integration.
As these activities do not (yet) result in concrete VRE specifications, they are not covered in this deliverable.
The major results from M15-18 are the following:
• The BlueBRIDGE VRE specifications for GRSF were discussed at the Second Technical Working Group
meeting of the External Advisory Board (EAB TWG-2). All stakeholders were present, including the
data owners (FAO, RAM, SFP), relevant EAB members, and BlueBRIDGE technical partners;
• The specifications for the Regional DataBase (RDB) were presented in late February to a meeting of most
stakeholders of the Stock Assessment VRE. The community needs require more features than
currently offered in the Tuna Atlas case, especially to marshal the ingestion of data;
• Flexibility was considered of key importance. Following considerable effort, the integration by
communities of their (often R-based) models is now much better supported than before. This enticed the
communities to propose challenging new requirements such as SS3, FLR and DLM Toolkit;
• The VREs, from a community perspective, are still perceived as a collection of ‘middleware’ that
supports user-defined (typically R-based) models and algorithms, rather than large software
components that require integration (and thus extensive specs). At M15, it remained difficult to
foresee the specific task activities of the different stakeholders to provide smaller components such as
models, algorithms or analysis of content;
• The specifications will continue to be adjusted as communities evolve. The stakeholders will always
come first and, e.g. in stock assessment, these may be slow in their uptake processes, as uptake may
require the involvement of scientific panels.
The plan until M22 aims at the delivery of community-oriented data services, and at dedicating M23 – M30
to exploitation and community building as described in the DoW. Some activities cannot be exploited until
their development is complete, and this causes a perceived shortfall in achieving KPIs. Already more than 50% of
the development-related KPIs are met, whereas the ones related to exploitation are at lower levels, as
expected at the beginning of the project. Key activities in this work package for M16 to M22 are
summarized below:
Month
M16, Dec 2016
M17, Jan 2017
M18, Feb 2017
M19, Mar 2017
M20, Apr 2017
M21, May 2017
M22, June 2017
Activities
• New version of IRD Tuna Atlas on-line viewer (IRD/FAO)
• SS3 validated off-line (FAO)
• OpenCPU app for IRD Tuna atlas (Sardara) data ready for demo (FAO)
• SDMX registry service in infrastructure (ENG)
• GRSF loaded with basic data (FORTH, FAO)
• FLR first test in infrastructure R
• SDMX DSD editing in FAO; first tests in infrastructure (ENG)
• R-Shiny DLM Tool validated in RStudio of infrastructure (ICES)
• EAB GRSF Feedback (FAO, SFP, RAM)
• RDB Requirements meeting (FAO)
• Deployment of R models for SS3 to infrastructure
• Decision to compartmentalize existing R (ICES) for EM, DLM
• SS3 Model development (IRD)
• GRSF Geospatial data and TimeSeries first services (FAO, FORTH)
• FLR Service validation (FAO)
• REST interface for TabMan
• OpenCPU reporting module for Sardara
• SDMX Template driven data flows
• WECAFC Data loader development (FAO)
• R-Shiny DLM integrated and extended
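The month labels used in this plan count from the project start (1st September 2015 = M1, as stated in the document information). The arithmetic behind labels such as “M16, Dec 2016” can be sketched as follows; the function names are illustrative only and are not part of any BlueBRIDGE service.

```python
from datetime import date

# Project start taken from the document information: 1st September 2015 is M1.
PROJECT_START = date(2015, 9, 1)


def project_month_to_date(m: int) -> date:
    """Return the first day of project month M<m> (M1 = September 2015)."""
    if m < 1:
        raise ValueError("project months start at M1")
    offset = (PROJECT_START.month - 1) + (m - 1)  # months since January 2015
    return date(PROJECT_START.year + offset // 12, offset % 12 + 1, 1)


def date_to_project_month(d: date) -> int:
    """Return the project month number that contains the given date."""
    return (d.year - PROJECT_START.year) * 12 + (d.month - PROJECT_START.month) + 1


if __name__ == "__main__":
    for m in (16, 18, 22, 30):
        print(f"M{m}: {project_month_to_date(m):%b %Y}")
```

With a 30-month duration, the last project month (M30) falls in February 2018.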
1
STOCK ASSESSMENT – GENERIC VRE
The generic stock assessment VRE developed in BlueBRIDGE extends the existing VRE for the management of
data, i.e. the ‘FAO Tuna Atlas’ VRE. The Tuna Atlas VRE provides the frame to collect: (1) requirements for
generic improvements and (2) specifications for VREs that target a specific community. Before the
community can meaningfully exploit complete VREs (and not only specific data services), improvements are
expected in the three key services of the stock assessment VRE:
• TabMan for tabular data management;
• DataMiner for computational analytical tools (including RStudio and SAI);
• SpeciesDiscovery for loading and harmonizing species presence and absence records.
1.1
USE CASES
The use cases listed here do not target a specific community; rather, they describe how the generic features
for stock assessment are expected to be exploited in an overarching infrastructure. These cases cannot be
seen in isolation from each other, and they also define the options for the specialized community VREs that
will refer to them.
1.1.1
TABULAR DATA MANAGEMENT
The TabMan (Tabular Data Manager service) [1] has offered, since the project start, facilities to discover,
ingest, curate, share, and analyze tabular data. To improve the services, e.g. to make them easier for the
community to use, several generic requirements need to be met:
• Improve the integration with R – IRD has developed a large ‘library’ of R scripts for tuna capture
data harmonization. This was further extended (an in-kind contribution) and shared with
BlueBRIDGE. These ad-hoc community scripts (which are tested in the integrated RStudio) need to be
integrated with TabMan, and the scripts and libraries exposed to users who want to
apply them to their own datasets, e.g. through a conversion template.
• Improve integration with DataMiner [1] – This is a task beyond WP5, but is required for the
community to benefit from the e-infrastructure facilities. The community wants to be able to access
a pool of generic data-management scripts that e.g. apply data standards, validate data, and apply
data formats. This has to be an interactive feature that informs users of available (and relevant) DataMiner
functions (e.g. offer only csv-related functions when the user works on an unstructured csv, or only
offer time-dimension-related functions if there are time columns in the dataset).
• Improve metadata management – The gradual conversion of private csv files to infrastructure assets
requires that business data are collected and managed through BlueBRIDGE services during this
process. TabMan already captures, at dataset level, key metadata elements aligned with SDMX (see
next topic) and overall data policies, but the explicit management of these as metadata elements
has to improve.
1.1.2
TABMAN DATA STANDARDIZATION
TabMan partially supports the generation and management of SDMX data (Statistical Data and Metadata
eXchange, a format used in particular to describe time series).
The community needs to offer its targeted constituents (country fisheries offices, science departments,
regional fisheries bodies) the tools to generate and understand shared and standardized data, eliminating
the current time-consuming and expensive manual data harmonization work; some community workshops
on stock assessment spend more time loading, harmonizing and validating data than on the stock assessment itself.
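To make the manual harmonization work concrete, the following is a minimal sketch of the kind of generic script the community has in mind: it maps free-text species names to a shared code list and separates valid rows from rejected ones. The column names (species, year, catch_t), the small code list and the function name are assumptions made for illustration; they do not describe the actual TabMan or DataMiner implementation.

```python
import csv
import io

# Hypothetical code list: local species names mapped to a shared code.
SPECIES_CODES = {"yellowfin tuna": "YFT", "skipjack tuna": "SKJ", "bigeye tuna": "BET"}


def harmonize(csv_text, codelist):
    """Split CSV rows into harmonized records and rejected rows with a reason."""
    accepted, rejected = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header is line 1 of the file, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        name = (row.get("species") or "").strip().lower()
        code = codelist.get(name)
        try:
            year = int(row.get("year") or "")
            catch = float(row.get("catch_t") or "")
        except ValueError:
            rejected.append((line_no, "year or catch is not numeric"))
            continue
        if code is None:
            rejected.append((line_no, "unknown species"))
        elif catch < 0:
            rejected.append((line_no, "negative catch"))
        else:
            accepted.append({"species": code, "year": year, "catch_t": catch})
    return accepted, rejected


SAMPLE = (
    "species,year,catch_t\n"
    "Yellowfin tuna,2014,1200.5\n"
    "Skipjack Tuna ,2014,300\n"
    "Unknown fish,2014,10\n"
    "Bigeye tuna,20x4,55\n"
)

if __name__ == "__main__":
    ok, bad = harmonize(SAMPLE, SPECIES_CODES)
    print(ok)
    print(bad)
```

Reporting rejected rows together with their line number, rather than silently dropping them, lets a data provider correct the source file before the next load.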
In November 2016, the activity had two participants: FAO for the community requirements
engineering, and ENG as implementer of the task, enriching the existing SDMX features of the infrastructure
by implementing the requirements.
FAO will engage with local, regional, and global initiatives to engineer the data standardization requirements
by inventorying relevant formats and data flows (2016-2018), Master Data Management initiatives in FAO
(2017), data policies (2016), and evaluation of tools (2016). This activity is partly funded through in-kind
contributions and collaboration with staff from FAO and other organizations. Specific standardization use
cases, i.e. related to a specific community, will be described below.
ENG will implement services such as SDMX data flows to the registry (2016), integrate SDMX Data Structure
Editors (2017), and support the development of SDMX Transformers (2017).
As of March 2017, in gCube version 4.3, it is possible to retrieve Data Structures from the tables present in
TabMan and export them to Fusion Registry 7.3.5. In particular, the export procedure includes Data Structure
Definitions and related Concepts and Codelists, and generates a specific Data Flow. Importing
Codelists from Fusion Registry is still supported. The work is proceeding, and the next steps will be to export
Data Structures from TabMan templates and to import complete Data Structures for conversion
into templates.
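As a rough illustration of what retrieving a Data Structure from a table involves, the sketch below derives a simplified structure (dimensions, one measure, and one code list per dimension) from tabular rows. The class names, identifier patterns (DSD_..., CL_...) and column names are hypothetical; this is neither the gCube export procedure nor the Fusion Registry API, and a real SDMX Data Structure Definition also carries concepts and attributes that are omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Codelist:
    id: str
    codes: list


@dataclass
class DataStructure:
    id: str
    dimensions: list
    measure: str
    codelists: dict = field(default_factory=dict)


def derive_structure(table_id, rows, measure):
    """Derive a simplified data structure from a list of row dictionaries.

    Every column except the measure is treated as a dimension, and the
    distinct values observed in that column become its code list.
    """
    if not rows:
        raise ValueError("cannot derive a structure from an empty table")
    if measure not in rows[0]:
        raise ValueError(f"measure column {measure!r} not found")
    dimensions = [column for column in rows[0] if column != measure]
    codelists = {
        dim: Codelist(id=f"CL_{dim.upper()}", codes=sorted({str(r[dim]) for r in rows}))
        for dim in dimensions
    }
    return DataStructure(
        id=f"DSD_{table_id.upper()}",
        dimensions=dimensions,
        measure=measure,
        codelists=codelists,
    )


if __name__ == "__main__":
    table = [
        {"species": "YFT", "year": 2014, "catch_t": 1200.5},
        {"species": "SKJ", "year": 2014, "catch_t": 300.0},
        {"species": "YFT", "year": 2015, "catch_t": 1100.0},
    ]
    dsd = derive_structure("tuna_catch", table, measure="catch_t")
    print(dsd.id, dsd.dimensions, dsd.codelists["species"].codes)
```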
The part of this use case that concerns collecting and specifying community requirements (related to e.g.
Master Data Management, local/regional/global reference data, data policies and tools evaluation)
will be partly co-funded. This requires, however, that the activity be tuned to the time-frames of external
collaborations. The use case will continue to work with the fisheries community for the entire duration of
BlueBRIDGE.
The ticket describing the SDMX improvements: https://support.d4science.org/issues/4643
1.1.3
DATA MINER DATA ANALYSIS
The data analytical features existing in September 2015 supported TabMan-specific features
(Template engine, Rules engine), and data could be exported to e.g. RStudio. However, the improved features
of the DataMiner opened the opportunity to connect DataMiner to the tabular data manager.
This use case requires considerable effort from WP9, and that work is not reported here.
In addition, the community produced several specific scenarios for DataMiner exploitation that are reported
in the specific Stock Assessment VREs below.
The use case of the overall VRE concerns the integration of DataMiner and TabMan, where data from TabMan
is available in DataMiner, and DataMiner functions are available from TabMan, with services for (Business)
Master Data management and validation of data structures. It would promote the dataspace of TabMan and
the computing features of DataMiner to a Science-2 environment supporting reproducible and sharable
experiments for communities.
The integration of both services is ongoing, and will not be completed before 2018. Relevant cross-service
features will be released before then in some of the VREs described below.
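One of the requested cross-service features is that only DataMiner functions relevant to the table at hand are offered to the user (for example, time-related functions only when the dataset has a time column, as noted in 1.1.1). A minimal sketch of such a filter is given below; the catalogue entries, function names and column kinds are invented for illustration and do not correspond to actual DataMiner algorithms.

```python
# Hypothetical catalogue: each function lists the column kinds it needs.
FUNCTION_CATALOGUE = {
    "aggregate_by_year": {"time"},
    "plot_time_series": {"time", "numeric"},
    "map_catch_by_area": {"geo", "numeric"},
    "summary_statistics": {"numeric"},
}


def relevant_functions(column_kinds, catalogue=FUNCTION_CATALOGUE):
    """Return the functions whose required column kinds are all present.

    column_kinds maps each column name of the table to a kind such as
    "time", "numeric", "geo" or "code".
    """
    available = set(column_kinds.values())
    return sorted(name for name, needed in catalogue.items() if needed <= available)


if __name__ == "__main__":
    table_columns = {"year": "time", "catch_t": "numeric", "species": "code"}
    print(relevant_functions(table_columns))
```

Because the table in the example has no geospatial column, the mapping function is not offered.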
1.1.4
SPECIES DISCOVERY SERVICE
Stock assessment datasets are sometimes available on-line. Where these are tabular data, a facility in TabMan
to import them directly into a table following a specific format for data and metadata would enrich the stock
assessment with biological parameters from the hosted FishBase database (also of interest to the Ecopath
use case of IRD), add clear provenance metadata, and reduce the effort to import and harmonize data.
The service could also provide the base for organizations that wish to publish their existing occurrence data
on-line, a topic of interest to the FishBase consortium (under negotiation).
In November 2016 the community had no specific requirements to integrate and extend the Species
Discovery Service with TabMan; however, a service to access biological parameters was proposed in FAO.
1.1.5
DATA DISSEMINATION
The results, but also the data, service and parametrization of the service must be managed to enable
reproducible experiments for end users. This necessitates the availability of a registry. In BlueBRIDGE, the
CKAN software provides these functions.
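What such a registry entry needs to capture for an experiment to be reproducible can be sketched as follows: the service used, its parameters, and fingerprints of the input data and of the result. The field names and helper functions are assumptions for illustration; this is not the CKAN data model or API.

```python
import hashlib
import json


def experiment_record(dataset: bytes, service: str, parameters: dict, result: bytes) -> dict:
    """Describe one experiment run so that it can be registered and repeated."""
    return {
        "service": service,
        "parameters": parameters,
        "dataset_sha256": hashlib.sha256(dataset).hexdigest(),
        "result_sha256": hashlib.sha256(result).hexdigest(),
    }


def record_id(record: dict) -> str:
    """Derive a stable identifier from the canonical JSON form of a record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


if __name__ == "__main__":
    data = b"species,year,catch_t\nYFT,2014,1200.5\n"
    # "example-model" and its parameters are placeholders, not a real service.
    rec = experiment_record(data, "example-model", {"r_prior": 0.4, "start_year": 1990}, b"B/Bmsy=0.8")
    print(record_id(rec))
```

Serializing with sorted keys makes the identifier independent of the order in which parameters were supplied, so two runs with the same data, service and parametrization map to the same entry.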
The community expects to be able to rely on the CKAN registry for the dissemination of data,
the experiments and the configuration of the experiments. For geospatial data the community will continue
to use GeoServer and GeoNetwork to make data available to end users.
Some requirements may exceed the capabilities of CKAN or GeoNetwork, and in those cases the workspace
can provide dissemination services; it can provide links to datasets and folders, and has download facilities.
This use case is again focused on the high-level components. The following chapters contain details on the use
of the infrastructure to offer reproducible experiments, the publication of data in accessible registries, and
the use of the workspace to support specific communities and their experiments.
The generic dissemination use case required improvements to the workspace (2016) and CKAN (2016), and in
November the entire case was supported.
1.1.6 DATA PUBLICATION – THE PUBLICATION OF VRE OUTCOMES
An innovative use of the VREs and BlueBRIDGE data services was identified by FAO through consultation with
its community of global tuna data management. Once a VRE has produced a global (or any composite) dataset,
the contributors can benefit from its availability if the data and computing resources are accessible through
their systems, preferably by preparing specific web components that answer ‘competence queries’ by serving
specific data through predefined UIs, and by exposing ‘sandbox’ datasets to external analysts (e.g. a stock
assessment scientist who wants to interact directly with a dataset using R).
The VRE for Stock assessment would closely match that need if data can be prepared in the infrastructure
(i.e. the global tuna capture data of FAO – done 2016), and the entire data workflow and metadata (as links)
can be integrated with static and dynamic outputs (reports, websites), e.g. by developing reporting templates
that ‘embed’ output with proper header and footer sections (including proper citation references to donor,
project, consortium and data contributors).
In November 2016, the skeleton use case was partly implemented using OpenCPU; the service still needs to be
validated by potential community members for exploitation (#3158).
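The reporting templates described in this section can be illustrated with a small sketch that wraps a piece of model output in fixed header and footer sections carrying the citation information. The template layout, the field names and all values are hypothetical; the actual service relies on OpenCPU and R, which this sketch does not reproduce.

```python
from string import Template

# Hypothetical report layout: header, embedded output, footer with citation details.
REPORT_TEMPLATE = Template(
    "$title\n"
    "Project: $project | Donor: $donor\n"
    "----\n"
    "$body\n"
    "----\n"
    "Data contributors: $contributors\n"
    "Cite as: $citation\n"
)


def embed_output(body: str, **context: str) -> str:
    """Embed output in the template; a missing header or footer field raises KeyError."""
    return REPORT_TEMPLATE.substitute(body=body, **context)


report = embed_output(
    "Total catch 2015: 135 t (placeholder value)",
    title="Example indicator report",
    project="BlueBRIDGE",
    donor="European Union Horizon 2020",
    contributors="Hypothetical contributor A; Hypothetical contributor B",
    citation="Hypothetical citation string",
)
print(report)
```

Using `substitute` rather than `safe_substitute` is deliberate here: a report without its citation fields fails loudly instead of being published incomplete.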
The deployment has been a success and has been presented at the BlueBRIDGE Technical Committee held in
Crete from the 14th to the 17th of June 2016. After the presentation other organizations showed interest in
taking advantage of this new approach.
Direct interaction with R through OpenCPU by external users will not be made available until security and
session management issues are addressed.
The publication of data results of integrated software components (e.g. RShiny apps for Data Limited
Methods in stock assessment) emerged as a need after the summer of 2016, and will be addressed in 2017.
Currently, these tools are developed for individual users, and terms and conditions of these tools will have to
be brought in line with infrastructure policies. (#1778)
The infrastructure is also equipped with a WPS driven Webapp publisher that allows the development of
infrastructure driven apps based on DataMiner. The community has already been provided with several
examples in WP7, and WP5 is considering where these can also be used.
The development of the underlying data policy is also ongoing, and will contribute to D2.5 Sustainability plan.
1.2 USERS
The target users of this VRE are BlueBRIDGE consortium developers, data managers, outreach managers, and
community representatives able to assist with requirements engineering. It is not for public exploitation.
1.3 VRE DESIGN
The design of the VRE is largely identical to the existing iMarine Tuna Atlas VRE, but with important
improvements to the underlying infrastructure, especially to improve communication between data services
by capturing metadata and relying on the Data catalogue for discovery and publication of data sets.
Overall the VRE will use the social tool, and the generic services for users and the workspace for data storage.
It will offer the BlueBRIDGE data services TabMan, DataMiner, Search, RStudio, and the Data Catalog. In
addition, the SAI will be available to programmer-users to update community algorithms.
The design will have to manage data structures (SDMX DSDs for instance) that can be based on existing
Eurostat software (for the SDMX registry), and manage data access and sharing through data policies
based on VRE user rights for accessing folders in the Workspace.
1.4 RESOURCES
1.4.1 DATA RESOURCES
The community identified the following data types that need to be managed in the infrastructure:
• Tabular data
  o CSV files that need to be imported; typical sizes are smaller than 1 MB; examples:
    ▪ Global Capture Datasets 12 MB.
    ▪ Regional datasets from 50 KB (Landings) to 1 MB (effort).
    ▪ Reference dataset; from 3 KB to 2 MB (All 13.000 fisheries relevant species).
    ▪ The total number of datasets to be imported is expected to be around 1000.
  o Databases or database files
    ▪ Postgres and PostGIS are community standards that are yet supported. Specific size
      ranges cannot be given; in some of the specialized VRE’s this may be possible.
    ▪ Generation of on-the-fly XML data (SDMX) from data bases is foreseen
• Text data
  o Typical R scripts for data validation and transformation are less than 50 KB. Several hundreds
    of R scripts are expected.
  o Reports such as manuals require storage in the WS; a precise request for storage size cannot
    be provided.
• Geospatial data
  o The Geoserver and geonetwork based SDI’s of FAO and BlueBRIDGE are expected to be used
    to share data across the SDI’s. The metadata harvesting is expected to remain active.
  o NetCDF publishing and management in Geonetwork and in CKAN.
1.5 IMPLEMENTATION PLAN
The Work Package VREs and data services will be delivered through contributions by WP partners through
VREs that have a specific function. This means that most additions to this generic VRE will be described in the
VREs sections below. The activity that most contributes to the generic VRE development and is included here
relates to the implementation of a generic framework for statistical data management related to SDMX.
For the SDMX framework, FAO leads the specification phase that will result in metadata driven services
for data harmonization. In 2016, this required work on the upgrade of the Tabular Data Service Tabman; in
particular an improved integration with DataMiner is needed. FAO Corporate systems were in the process of
upgrading to a newer SDMX registry (V9) and FAO identified and selected a new Master Data Management
tool (EBX5) that will direct future developments related to data exchange between FAO and BlueBRIDGE. This
caused the delay of some FAO activities related to BlueBRIDGE services.
The current versions of Tabular Data Management Service and Portlet provide functionalities to import
Codelists from a SDMX Registry and export Data Structures retrieved from Tabman tables (Figure 1).
Figure 1 Tabular Data Management Service and SDMX, current status
Tabman datasets have been conceived as general-purpose tables. However, if a dataset contains a time
dimension, the dataset is a Timeseries. In gCube 4.3 Timeseries can be exported to a Registry in SDMX format
by retrieving their SDMX Data Structures (Data Structure Definitions, Concepts, Data Flows and related
Codelists) from their Tabman Metadata. The SDMX Registry, which is the Fusion Registry in the D4Science
infrastructure, can expose this data in a standard format.
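The rule above (a table with a time dimension is a Timeseries, and only Timeseries are exported with their data structure) can be sketched as follows. The `Column` class, the role names and the shape of the derived structure are assumptions for illustration and do not reflect the actual Tabman metadata model or the SDMX-ML encoding.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Column:
    name: str
    role: str  # assumed roles: "dimension", "time_dimension", "measure", "attribute"
    codelist: Optional[str] = None  # identifier of the codelist constraining the column


def is_timeseries(columns: List[Column]) -> bool:
    """A table counts as a Timeseries when at least one column is a time dimension."""
    return any(c.role == "time_dimension" for c in columns)


def data_structure(table_id: str, columns: List[Column]) -> dict:
    """Derive a simplified, DSD-like description; refuse tables that are not Timeseries."""
    if not is_timeseries(columns):
        raise ValueError(f"{table_id} has no time dimension and is not a Timeseries")
    return {
        "id": table_id,
        "dimensions": [c.name for c in columns if c.role in ("dimension", "time_dimension")],
        "measures": [c.name for c in columns if c.role == "measure"],
        "codelists": sorted({c.codelist for c in columns if c.codelist}),
    }


capture = [
    Column("species", "dimension", codelist="CL_SPECIES"),  # hypothetical codelist id
    Column("year", "time_dimension"),
    Column("catch", "measure"),
]
print(data_structure("CAPTURE", capture))
```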
Starting from the current situation, the following steps should be accomplished (Figure 1):
• Support for Fusion Registry 8 should be assured, including username/password based security;
• The Data Structures Exporter should be extended to support Tabman Templates in addition to Tabman
  Tables;
• The Codelist Importer should be extended to import complete Data Structures from the Fusion Registry
  and to convert them into Tabman Templates;
• An SDMX Data Source should be implemented: it should consist of a web interface able to access
  Timeseries data stored in the Tabular Data Management Database.
The final result is shown in Figure 2. The Tabular Data Management Portlet will be able to export Timeseries
data in SDMX format by sending metadata, including Codelists, to an SDMX Registry, and by allowing the SDMX
Data Source to access the raw data of the exported table in its internal database. In case of import, the SDMX
Registry will send metadata, including the URL of a remote Data Source, to the Tabular Data Management
Portlet, which will ask the SDMX Data Source to import the data.
Figure 2: Final deployment
In order to get Timeseries data, according to the standard, an SDMX Client will access the SDMX Registry to get
the Metadata and a Reference to the Data Source from which the raw data will be downloaded.
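The two-step access pattern described above (registry first, data source second) can be sketched with in-memory stand-ins for both services. The dictionaries, the URL, the identifiers and the observation values are invented for the example; a real client would issue SDMX web service requests instead of dictionary lookups.

```python
# In-memory stand-ins for the two services (all identifiers and values are hypothetical).
REGISTRY = {
    "CAPTURE": {
        "structure": "DSD_CAPTURE",
        "data_source": "https://example.org/sdmx-datasource",
    }
}
DATA_SOURCES = {
    "https://example.org/sdmx-datasource": {
        "CAPTURE": [("2014", 120.0), ("2015", 135.0)],
    }
}


def query_registry(dataflow_id: str) -> dict:
    """Step 1: the registry returns metadata and a reference to the data source."""
    return REGISTRY[dataflow_id]


def download(data_source_url: str, dataflow_id: str) -> list:
    """Step 2: the raw observations are fetched from the referenced data source."""
    return DATA_SOURCES[data_source_url][dataflow_id]


def fetch_timeseries(dataflow_id: str) -> tuple:
    metadata = query_registry(dataflow_id)
    observations = download(metadata["data_source"], dataflow_id)
    return metadata["structure"], observations


structure, observations = fetch_timeseries("CAPTURE")
print(structure, observations)
```

The point of the split is that the registry never holds raw data: it only tells the client where the data live and how they are structured.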
The first set of activities concerning SDMX was completed and released with gCube 4.3 (#4643 and #5870).
The next planned steps are (#7358):
• Including strict version control in the SDMX exporter in order to avoid data duplication on Fusion
  Registry (#7535), to be finalized in March;
• Upgrade Fusion Registry to version 8.4 (#7192), to be completed in March;
• Implementation of SDMX Exporter based on Tabman Templates (#7358), to be provided in April;
• Implementation of SDMX Importer, planned to be started in April, after successful tests of the
  Exporter (#7358);
• Deployment of SDMX Data Source (#7360), planned for May/June.
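One possible reading of the strict version control mentioned in the first step is sketched below: an artefact is identified by agency, identifier and version, and re-publishing an identical artefact is a no-op, while publishing different content under an existing version is rejected. The `RegistryStub` class and its behaviour are assumptions for illustration and do not describe the Fusion Registry API or the design tracked in #7535.

```python
class DuplicateArtefact(Exception):
    """Raised when different content is published under an existing version."""


class RegistryStub:
    """Hypothetical in-memory registry keyed by (agency, artefact id, version)."""

    def __init__(self) -> None:
        self._artefacts = {}

    def publish(self, agency: str, artefact_id: str, version: str, content: str) -> str:
        key = (agency, artefact_id, version)
        existing = self._artefacts.get(key)
        if existing is None:
            self._artefacts[key] = content
            return "created"
        if existing == content:
            # Same content under the same version: nothing to duplicate.
            return "unchanged"
        raise DuplicateArtefact(
            f"{agency}:{artefact_id}({version}) already exists; bump the version"
        )


registry = RegistryStub()
print(registry.publish("FAO", "CL_SPECIES", "1.0", "codes-v1"))  # first export
print(registry.publish("FAO", "CL_SPECIES", "1.0", "codes-v1"))  # repeated export
print(registry.publish("FAO", "CL_SPECIES", "1.1", "codes-v2"))  # changed content, new version
```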
2 STOCK ASSESSMENT - FAO
The FAO of the UN, through BlueBRIDGE VRE’s and data services, intends to support stock assessment
scientists, assessment teams, and regional initiatives. It has identified several stages in the development of
these outputs, with an increasing complexity in the use cases being served.
There is a large overlap between the several initiatives that are identified for implementation, and this will
result in a significant reduction of effort duplication and thus costs. For each of the following cases, we expect
that the generic services (described above) are available at some stage during the development.
The tuning of the different activities will be planned according to availability of services. For the community,
it makes little sense (or is even counterproductive) to expose remote communities to services that are not
available or robust. In addition, the RDB and stock assessment requirements are volatile and activity related,
and cannot be planned too far ahead. Flexibility is of utmost importance.
2.1 USE CASES
2.1.1 REGIONAL DATABASE FOR FISHERIES DATA MANAGEMENT: WECAFC CASE
The objective of this use case is to exploit a customized version of the generic stock assessment VRE (cf. Sec.
1) to support the data management needs of a regional FAO project. This implies the following steps:
• Identification of users and uses
  o An initial scan was made of organizations and representatives (2016)
• Identification of opportunities and gaps, and recommendations for improvements (2016)
  o The TabMan was validated for data ingestion and curation;
  o In March 2017 a RDB meeting report was produced
• Release of targeted components, and validation by a ‘super-user’ (2016-2017)
  o Stock assessment tools; Improved DataMiner ‘as-a-tool’ (2016);
  o Data Management Tools; Improved integration of Tabular Data Manager;
• Formulation of exploitation scenario, and establishment of team
  o A meeting is scheduled in 2017 (February in Rome, September in WECAFC region);
  o A collaboration has to be formalized with data contributors (2017);
• Organize exploitation support – requires in-kind contributions; TBD
  o Training and Assistance;
To capture the needs of regional projects, BlueBRIDGE in-kind contributions are required, and these are
identified in the WECAFC region where FAO assists in the establishment of a regional management body.
Some aspects of the data collation can be provided by BlueBRIDGE, and the overall requirements, based on
the capabilities of the Tuna Atlas VRE, are demonstrated to several stakeholders. In 2017, several tests with
real data will have to demonstrate the capabilities of the tool to provide a secure and performant tool for
regional elaboration of capture data.
The overall activity can be tracked here: https://support.d4science.org/issues/1678
The above steps can be repeated, once the RDB features are implemented, in other scenarios for other
fisheries management organizations.
2.1.2 STOCK ASSESSMENT SUPPORT: WECAFC CASE
The objective of this use case is to exploit DataMiner and TabMan integrated in a VRE to support the data
analysis and dissemination needs of stock assessment scientists, teams, and regional (or global) fisheries
organizations. This implies the following steps:
• Identification of users and uses
  o The identified WECAFC community overlaps those of the IRD and ICES VREs, and much of
    the effort can be shared (2015).
  o The FAO community is initially related to the region of the first fisheries data management
    support VRE, in the context of the WECAFC regional project.
  o The first priority species (Hogfish, Amberjack) and models (CMSY, SS3, VPA) were identified
    after several regional meetings (Summer 2016).
• Validation of the e-infra suitability to support the selected software
  o The first software to be tested was CMSY-as-a-service; results are promising.
  o SS3 software required a request for software, which was provided in September. Tests are still
    ongoing in private off-line environments. In January 2017 a test will start with V3.2.
  o Other tools were identified, but these are reviewed by other partners (IRD and ICES) to share
    the workload and reduce duplication.
• Release of targeted components, and validation by a ‘super-user’ (2016-2017)
  o DLM Tool was released in test in September 2016; not validated by FAO yet
  o CMSY-as-a-service was validated (Oct 2016)
  o SS3 V3.3 (new version) is scheduled for June 2017
  o Update KPI; January 2017 to 4 stocks assessed https://support.d4science.org/issues/1464
• Formulation of exploitation scenario
  o The VRE is mainly targeting capacity building initiatives and training
  o When a complete tool is available, FAO will demonstrate to regional initiatives (2017)
  o The WECAFC region is a first target, and will be engaged in 2017
• Organize exploitation support – after an exploitation scenario is agreed.
2.1.3 TUNA ATLAS UPGRADE AND ADDED SERVICES
The objective of this use case is to improve the management of the production of global tuna capture
datasets, and their publication as SDMX timeseries under a well-defined (meta-)data driven design.
Realization of the case requires improved DataMiner and TabMan integration in the VRE to support data
harmonization and standardization. It also will include (late 2017) some needs of FAO Master Data
Management support through EBX5 interoperability; BlueBRIDGE can bridge between Global (FAO) and
national and regional Master data.
The users of this service are FAO Staff responsible for producing the global tuna atlas and the extraction of
indicators from that atlas.
This use case captures the progressing features to support the ingestion (done), curation (done) and
harmonization (in progress) of data from regional fisheries bodies, and their use in the websites of FAO and
other organizations (http://www.fao.org/figis/geoserver/tunaatlas/) or in WebApps.
The Tuna Atlas upgrade relates also to the visualization of the tuna atlas data, for which an independent
component was requested that can be installed in websites of e.g. regional fisheries organizations. OpenCPU
was selected to act as a bridge to data in the infra and a viewer component.
The integration of OpenCPU facilities in the infrastructure can be tracked here: Generate graphs and figures
The online version of the first OpenCPU based facilities can be accessed here:
http://vps282167.ovh.net/BlueBridgeWidget/testGeneralExperiment.html
Since May 2017 the OpenCPU viewer is also accessible through a VRE (in test).
The code for the OpenCPU related work is available here: https://github.com/pink-sh/BlueBridgeWidget
The production and publication of timeseries of capture data as standardized XML data based on SDMX is
ongoing. The integration of SDMX facilities can be tracked through the following tickets:
• Improve SDMX capabilities: https://support.d4science.org/issues/4643
• Add SDMX publishing support to Tabular Data Manager: https://support.d4science.org/issues/4881
2.2 USERS
The VRE delivered under this use-case will serve two overlapping communities in the WECAFC region (RDB
and Stock Assessment support), and a community of users working on global capture datasets with particular
emphasis on tuna, billfishes and shark capture data integration and visualization.
For each community, a separate VRE may be required, with largely overlapping features, but different
communities. For each community the following roles are foreseen:
• VRE Manager: user and resources management (data and computing, access and security);
• Data Managers: data input, validation and publishing (to the CKAN or SDMX registries) of results;
• Data Analysts: individual scientists that analyze data with their own R scripts or relying on tools
  integrated in the infrastructure (e.g. DataMiner SS3, or DLM Tools);
• Assessment teams: groups of experts who take a ‘holistic’ view of the available data (in the
  workspace), run models (through DataMiner) and compare the output (e.g. in predefined reports, in
  WebApps). The important part is to enable reproducible science; an assessment must be repeatable
  in the future;
• External ‘browsers’: consumers of the products will not be VRE members, and can be any person
  accessing any of the BlueBRIDGE Data Services and catalogue to discover the VRE public results.
2.3 VRE DESIGN
The VRE(s) will be very similar to the existing Tuna Atlas VRE, including the user management, social tool, and
other infrastructure facilities. The design will need to cover several tasks in the data lifecycle to support the
data-flow from ‘fishnet to internet’; i.e. from data collection to publication.
These tasks are roughly identified as (they are very dependent on each other).
Data management (might require separate VREs for e.g. FAO Tuna Atlas and WECAFC)
1. Collection, access and storage;
2. Curation, harmonization and standardization;
3. Data analysis and review based on shared tools of DataMiner;
4. Data dissemination and publication – based on shared tools such as the catalogue and registries;
Stock assessment (might require separate VREs for WECAFC and others)
1. Assume the data management tool is capable of providing the data in standardized form, or enable
other data upload facilities to DataMiner and other analytical tools;
2. Data analysis and dissemination – based on ‘private’ (to the user) tools such as DLM Toolkit;
3. Data analysis and dissemination – publish as shared tools;
Data Dissemination (For WebApps and OpenCPU based tool with no or simple user identification)
1. Data analysis and preparation;
2. Load into optimized dissemination environments;
2.4 RESOURCES
Data resources for stock assessment are provided by regional fisheries organizations and individual scientists
as small to medium sized tabular or XML (SDMX) data that are easily uploaded to DataMiner.
Computing resources are available in a variety of software formats that include R, Java, JS, and ADMB.
2.5 IMPLEMENTATION PLAN
The development of the VRE extends the FAO Generic VRE (starting with the Tuna Atlas VRE) with data
resources and services that meet the needs of the specific communities. In Period 1 it focuses on the
establishment of the tools and the community.
Implementation will start when: (1) the components are validated and integrated in the generic VRE, (2) the
target users have validated the approach and identified the stock analysis scenario, (3) the community has
received proper instructions on the exploitation, and (4) the community has subscribed to the data policies.
In Period 1, the following activities were planned:
1. the components identified
a. Data sources;
i. Collect global tuna capture data (done, partly through in-kind IRD contribution)
ii. collect capture time series for 3 stocks (done)
iii. capture output in Workspace (done)
b. Computing resources; integrate R (done), and add 3 models:
i. Tuna atlas load and harmonization R scripts and align with IRD- Sardara (done)
ii. CMSY-as-a-service (done); example result: CmsyNotebook.nb.html (1000 KB)
iii. SS3 (in progress) https://support.d4science.org/issues/5810
iv. VPA (selection in progress)
v. DLM Toolbox (in progress) https://support.d4science.org/issues/1778
vi. Data extraction and reporting – establish framework with WPS and OpenCPU (done)
c. Dissemination resources
i. Add capacity to convert private R models to shared models with SAI (Done)
ii. Capture output of shared models in infrastructure workspace folders (done)
iii. Add capacity
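To give an idea of the kind of model integrated under item 1.b, the sketch below implements the Schaefer surplus production dynamics on which CMSY builds: biomass grows logistically with rate r towards carrying capacity k and is reduced by the catch. This is a simplified illustration only; it is not the CMSY-as-a-service code, it omits the Monte Carlo filtering of r-k pairs that CMSY performs, and the parameter values are invented.

```python
def schaefer_biomass(r: float, k: float, b0: float, catches: list) -> list:
    """Project biomass with B[t+1] = B[t] + r*B[t]*(1 - B[t]/k) - C[t], floored at zero."""
    biomass = [b0]
    for catch in catches:
        current = biomass[-1]
        following = current + r * current * (1.0 - current / k) - catch
        biomass.append(max(following, 0.0))
    return biomass


def msy(r: float, k: float) -> float:
    """Maximum sustainable yield of the Schaefer model, reached at a biomass of k/2."""
    return r * k / 4.0


# Invented parameters: fishing exactly MSY from k/2 keeps the stock in equilibrium.
r, k = 0.4, 1000.0
trajectory = schaefer_biomass(r, k, b0=k / 2, catches=[msy(r, k)] * 3)
print(msy(r, k), trajectory)
```

Catches above MSY drive the projected biomass down from k/2, while catches below it let the stock rebuild, which is the behaviour the service exploits when screening plausible r-k pairs.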
In Period 2, the following activities are planned for the implementation. The technology activity is captured
in #1674; additional community sensitizing, engagement, and development activities are often in-kind
contributions, and thus not captured in tickets.
1. Continue to support the development of models and algorithms for FAO and WECAFC #5238
a. Consult FI and NOAA (for WECAFC) on CMSY, SS3, FLR and other models. (Done)
b. Based on the consultation under 1a. add RShiny and DLM Toolkit requirements (Done)
c. Consult with regional representatives on the format and structure of output reports.
d. Use the Statistical Algorithm Importer and DataMiner to implement existing R algorithms in the
infrastructure; (Ready)
e. Use the Workspace to review and disseminate model output and prepare reports. (Ready)
2. Continue to support the implementation of a dynamic on-line stock assessment reporting module with
ICES, NOAA (USA) and IRD (#3158). Developed around the models of point 1, the module wraps the data service
in attractive, performant, and user-friendly modules based on, among others, DLM Toolkit, WPS and OpenCPU to:
a. Access and provision data for the models of the inventory above;
b. Display on-line fisheries indicators in an embeddable component for use both in the iMarine VRE
infrastructure (WebApp), and in web sites of fisheries management bodies. Display should cover
tables, graphs and maps;
c. Develop a dynamic reporting module where advanced users can rely on infrastructure data
services, a data repository, and a web-authoring tool such as RShiny/RshinyProxy,
Markdown/Knitr, or R-Notebook, to develop their own dynamic reports.
d. Ensure that model performance and optimization meet usability standards.
e. Improve access to Biodiversity data (FishBase biological parameters, WoRMS mapping to ASFIS)
f. Recommend results integration of stock assessments with GRSF registry (T5.2)
3. Improve the FAO Tuna Atlas data services, to demonstrate the potential of the iMarine infrastructure
dynamic reporting. #5493
g. Extend the use-case to cover parallel use-cases based on either Sardara (with IRD France) or
Statlant formats (with CCAMLR Tasmania, CECAF, ICES Denmark, etc.);
h. Add geospatial features to ingest and produce geospatial data #7451
4. Support capacity development in community exploitation of data services (WP8 – For reference);
i. Assist with exploiting Virtual Research Environments;
j. Document use of data services for Stocks and fisheries of 3 models selected by the community;
k. Provide assistance, and if relevant, training to users and advanced users of the VRE.
5. Support RDB development as a BlueBRIDGE Virtual Research Environment #6098; these requirements
were produced during and after a dedicated RDB meeting in Rome, March 1-2.
l. Define requirements for a specific RDB in the WECAFC region (Done), and discuss the replication
to other use cases.
m. Define capacity building opportunities in WECAFC region and beyond (WP8 – September 2017)
n. Liaise with WECAFC representatives to define and design training materials and possible
workshop (WP8)
o. Act as VRE manager for the WECAFC RDB VRE
p. Organize roll-out of RDB facilities to WECAFC members through meetings and data exchange
(WP8 – October 2017)
6. Support integration of WECAFC RDB facilities with other systems
q. Assist in development of MDM facilities with FAO system engineers (July 2017)
r. Align RDB data structures with the FAO Global Record of Stocks and Fisheries (GRSF) (T5.2)
s. Analyze RDB harmonization features across FAO and IRD, and propose algorithm integration;
focus on mapping of national classification to regional / global ones for FAO TCP’s in e.g. Oman,
Trinidad and Tobago
t. Analyze integration of RDB with Dataminer facilities, and promote the use of stock assessment
services to WECAFC
7. Support management reporting and collaboration
u. Assist communication with EAB/TWG preparation and follow-up activities (WP2)
v. Analyze fisheries data formats and exchange with a focus on the WECAFC region for RDA
w. Propose data alignment options with metadata driven approaches based on e.g. SDMX
x. Provide content for a sustainability plan based on RDB exploitation (November 2017)
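The mapping of national classifications to regional or global ones mentioned under item 6 (point s) can be sketched as a lookup with explicit reporting of unmapped codes. The national codes, the quantities and the mapping table below are invented for illustration; only the target codes YFT and SKJ follow the ASFIS 3-alpha convention for yellowfin and skipjack tuna.

```python
from collections import defaultdict


def harmonize(records: list, mapping: dict) -> tuple:
    """Aggregate quantities by global code; return the totals and the unmapped codes."""
    totals = defaultdict(float)
    unmapped = set()
    for national_code, quantity in records:
        global_code = mapping.get(national_code)
        if global_code is None:
            # Keep unmapped codes visible instead of silently dropping the record.
            unmapped.add(national_code)
        else:
            totals[global_code] += quantity
    return dict(totals), sorted(unmapped)


# Hypothetical national codes mapped to ASFIS 3-alpha codes.
NATIONAL_TO_ASFIS = {"NAT-001": "YFT", "NAT-002": "YFT", "NAT-010": "SKJ"}
landings = [("NAT-001", 10.0), ("NAT-002", 5.0), ("NAT-010", 7.5), ("NAT-999", 1.0)]

totals, unmapped = harmonize(landings, NATIONAL_TO_ASFIS)
print(totals, unmapped)
```

Returning the unmapped codes alongside the totals lets a data manager extend the mapping table before the harmonized dataset is published.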
3 STOCK ASSESSMENT - IRD
IRD plans to use BlueBRIDGE VREs and data services to support stock assessment scientists and assessment
teams with existing data analysis software that will benefit from a managed VRE. IRD makes a significant
in-kind contribution. IRD aims at a significant reduction of effort duplication, and thus of costs, for the
stock assessment teams. This can be achieved by managing all stages of the development of an
assessment report.
The stock assessment requirements are volatile and activity related, and cannot be planned too far ahead.
For instance, the release of software by NOAA cannot be anticipated, but developers must be responsive so
that stock assessment scientists can access the latest models. Flexibility is thus of utmost importance.
3.1 USE CASES
IRD plans to incorporate several existing software components, and benefit from BlueBRIDGE resources to
improve performance of these tools. The IRD algorithms are:
• Ecopath and Ecosim: ecological / ecosystem modeling activities.
• BFT Assessment: tuna stock assessment models (bluefin tuna with ICCAT and Ifremer examples and
  demonstration, with possible extension to tropical tunas and billfish with IOTC if they are interested in
  the approach). We are currently working on standardization of data formats by packaging stock
  assessment model outputs within netCDF files to embed metadata and provide data access (through
  Thredds Server).
• Ichthyop: Ichthyop (http://www.ichthyop.org/) is a Lagrangian tool for simulating the dynamics of
  drifting objects in 3 or 4 dimensions (3D + Time). The model itself is written in Java but uses and
  generates data delivered in netCDF data formats. Many researchers focus on the model parametrization
  without having to deal with Java programming. However, many researchers use the model outputs
  through R codes and R packages for netCDF. We can thus use the infrastructure to execute both
  Ichthyop simulations and related R codes.
• FAOTunaAtlas: IRD participates in the FAO Tuna Atlas by providing a set of R codes to extract,
  transform and load tuna RFMOs datasets within a single data format and a related data warehouse
  (PostGIS database). IRD also works with FAO to provide a set of R codes to generate indicators.
  The next part of the work will focus on packaging tuna RFMOs datasets within netCDF files to better
  manage both metadata and data by complying with standards (OGC metadata and standard code
  lists for species, fishing gears, etc.). This will foster data discovery by using the metadata
  catalogs of the project and related data servers (Thredds or Geoserver).
All partners stress the need for:
• Interactive features on results; annotations, html reports, e.g. with Interactive Dashboards for
  visualization (e.g. 'shiny') or collaborative editing of documents (“google doc like”, e.g. Sharelatex)
• Workflow features; parameterization of algorithms
• User levels; VRE Users (data only), Tweakers (parameters), and Developers (change/add code)
3.2 USERS
Ecopath use will have to be re-discussed; it was halted in Period 1. The original focus on infrastructure
based modeling shifted to data provision through targeted data services (localized biological parameters).
BFT Assessment is developed for a BFT assessment team of ICCAT (led by Ifremer and ICCAT). Similar teams
in other organizations will benefit from the same capacity, as data and algorithms are similar. BFT is a very
emblematic species and the related stock assessment is thus a hot topic. Tuna commissions use much the
same approach for all tuna species and related stock assessment working groups. The BFT use case can thus
be presented to other working groups if successful (BlueBRIDGE can service the tool with improved data and
processing facilities, which is likely to attract wide support). In particular, the ICCAT BFT use case might be
used to set up a similar approach with IOTC.
Ichthyop is used by multiple marine scientists (IRD, IFREMER and more) who need to simulate objects such as:
• drifting simulations (FADs, Drifters...) used by fishermen and physical oceanographers;
• fish larvae / connectivity dispersion in the open ocean;
• pollution / oil spill dispersion and float modelling;
• safety issues, e.g. to model flight crash debris.
FAO Tuna atlas will be used by FAO and IRD, but its output can also be shared with any organization that has
an interest in (and can make some contribution to) producing global overviews of indicators. We expect that
data discovery, data transformation or data visualization services will be used by partners, as they can be
integrated in any Website (through WPS or OpenCPU protocols).
3.3 VRE DESIGN
The design of the VRE(s) for IRD is complex, as each use case has many details and specific codes (Fortran,
R, Java, Latex, .Net, ...).
The text below summarizes the detailed design pages of the wiki.
3.3.1 EWE
The EwE software is an open source ecological modeling software suite (http://www.ecopath.org/about).
The EwE desktop software runs on Windows only; the EwE computational core is system independent and
can run on different operating systems via a local runtime environment such as Mono (www.mono-project.com).
Globally, EwE is the most widely used food web model with 7000+ known users and 400+ publications
(Colleter et al., 2014). EwE is maintained by a consortium of 18 institutes world-wide
(http://ecopath.org/consortium). Traditionally EwE is used for assessing food web dynamics and the impact
of fishing, but recent developments have seen an increase in applications to management advice and
Environmental Impact Assessments.
Currently EwE has limited abilities to:
• locate and include species parameters from data facilities such as SeaLifeBase, FishBase, OBIS,
  WoRMS, GBIF, etc.,
• connect to cloud-based data facilities such as THREDDS,
• pre-process GIS data, which is currently only enabled for the Windows OS,
• compare multiple runs, which is a feature not provided by the desktop software,
• run big spatial assessments on Windows computers,
• run outside Windows.
The objective with BlueBRIDGE is to run an Ecopath model (using Mono) in the infrastructure to test
feasibility (2016), and to connect to BlueBRIDGE services for e.g. biological parameters (2016/7). However,
the activity was suspended in mid-2016 for various reasons.
3.3.2 ICHTHYOP
It has been possible to execute the Ichthyop model (http://www.ichthyop.org/) on the infrastructure since
March 2017. The researcher in charge of this model foresees some interesting prospects with this approach
for the community of users.
To summarize, this model simulates the trajectories of objects driven by ocean currents (plankton,
drifters, etc.) at the sea surface or within the water column. Figure 3 gives an example of 1000
simulations of trajectories (in blue) driven by Ichthyop with sea surface currents (from satellite data),
together with the in situ observation of a Drifter at the same period and location.
Figure 3: Ichthyop simulation
Ocean current data needed to drive the model can be derived from Earth observation / satellite images
(e.g. OSCAR or GECKO) or model outputs (e.g. ROMS, Drakar, and others). These input data are usually
delivered through netCDF files / OPeNDAP access. The model output is delivered as a netCDF file as well.
Figure 4: Ichthyop workflow
Users have three options to use the underlying codes:
• A: use the WPS Web Service from a client:
o Desktop application: Web browser, GIS,
o Programmatic access: Java, Python, R or any programming language which can deal with WPS;
• B: use the codes through services delivered by the collaborative Web Environment (VRE) of
BlueBRIDGE, which provides GUIs:
o Web forms to parametrize and run the codes easily,
o R server used by multiple applications: RStudio, WPS server, ShareLaTeX;
• C: embed the service (via WPS or OpenCPU) in any Web site (e.g. ichthyop.org):
example http://mdst-macroes.ird.fr/tmp/Ichthtyop_one_simulation.html
All of these options seamlessly use the same services provided by the infrastructure (backend) and the VRE.
Behind the scenes, the workflow is split into the following steps:
1. The WPS server manages the executeProcess request with the input parameters provided by the
client user. The Ichthyop simulation is executed accordingly.
2. During Ichthyop execution, the model is driven by environmental data harvested remotely with the
OPeNDAP protocol, provided by a THREDDS server working on top of the OSCAR satellite images package
in netCDF (other products, like GECKO, might be used to drive the model).
3. The result of the model (WPS output parameter) is packaged within a netCDF file.
4. R transforms the native Ichthyop netCDF file into another one which is more compliant with CF
conventions.
5. R transforms the native netCDF or netCDF-CF files into shapefiles (points and trajectories), does the
same with the trajectories of Drifters managed in PostGIS databases, and creates a QGIS map/project.
6. The user can download everything packaged in a zip file and visualize the results with the QGIS map.
Of course, once this is achieved, the next challenge will consist in storing and analyzing the outputs.
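As an illustration of step 1 from the client side (option A, programmatic access), the sketch below builds a WPS 1.0.0 key-value-pair Execute request as a URL. It is a minimal sketch under stated assumptions: the endpoint and the input parameter names are hypothetical, the identifier is simply the short name used in this document (the identifier actually published by the server may differ), and any authentication required by the infrastructure is left out.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_wps_execute_url(endpoint, identifier, inputs):
    """Return a WPS 1.0.0 key-value-pair Execute URL for one process."""
    # In the KVP encoding, DataInputs is a semicolon-separated list of name=value pairs.
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
        "DataInputs": data_inputs,
    })
    return f"{endpoint}?{query}"


# Hypothetical endpoint and input names, for illustration only.
WPS_ENDPOINT = "https://example.org/wps/WebProcessingService"
execute_url = build_wps_execute_url(
    WPS_ENDPOINT,
    "ICHTHYOP_MODEL_ONE_BY_ONE",
    {"release_longitude": 55.5, "release_latitude": -4.6, "n_particles": 1000},
)
print(execute_url)
```

The resulting URL can be opened from a browser or fetched by any HTTP client; the response of a real server would point to the outputs described in steps 3 to 6.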
3.3.3 BFT ASSESSMENT
Figure 5: BFT Workflow
This use case is tightly connected to the ICCAT eastern Bluefin tuna stock assessment working group (which
produced the R and Fortran codes reused on the infrastructure). The connection with the community of practice
is made through Sylvain Bonhommeau (chairing the BFT-E group and working on tunas with IRD in a common
research unit). Moreover, the approach is very similar for tropical tunas involving IRD stock analysis activities;
in particular, we want to promote a similar approach (stock assessment tools) to access and process tropical
tuna data with IOTC, as well as data on other large pelagics (swordfish?).
As illustrated in the figure below, the use case will make use of the following D4S technologies:
• Workspace,
• DataMiner WPS,
• RStudio and ShareLaTeX working on top of the R and LaTeX compilers on the infrastructure,
• If possible, Shiny applications and the ShareLaTeX environment for collaborative editing of documents.
Figure 6: Publication options
3.4 RESOURCES
Data are provided by IRD, and sourced from different information systems as depicted above. Some external
data sources are used to feed the different workflows (e.g. tuna RFMOs, datasets, and satellite images).
So far, IRD has produced the following computing resources to set up the different VREs (this is a summary;
the full list is maintained in the wiki: List_of_IRD_algorithms). Reformatted codes have been deployed in
development or production environments. Other algorithms are in preparation and will be deployed in the
coming months.
VRE / use case | WPS identifier of the algorithm | Environment (Development RProtolab or Production) | Goal / Summary
Ichthyop model | ICHTHYOP_MODEL_ONE_BY_ONE | Production and Development | This R code packages some extraction to get observed trajectories from data sources (FADs or Drifters) and the execution of Ichthyop driven by OSCAR data to confront simulation with these observations. netCDF outputs are transformed into maps to be visualized with QGIS. Ichthyop is a free Java tool designed to study the effects of physical and biological factors on ichthyoplankton dynamics.
Ichthyop model | ICHTHYOP_MODEL_MULTIPLE_RUNS | Production, Development (#2344) | Algorithm enabling the execution of a set of simulations in one go; this version is parallelized.
Ichthyop model | MAKE_ICHTHYOP_NETCDF_CF_COMPLIANT | Development | Transformation of the native netCDF file into a netCDF-CF file.
Ichthyop model | ICHTHYOP_NETCDF_OUTPUT_TO_SHAPEFILE | Development | This code turns the trajectories of Ichthyop model outputs delivered as netCDF into a shapefile.
Ichthyop model | MAKE_ICHTHYOP_NETCDF_CF_COMPLIANT_OUTPUT_TO_SHAPEFILE | TO BE DEPLOYED | TO BE DEPLOYED
FAO Tuna Atlas VRE | TUNA_ATLAS_DATA_ACCESS | Development | This R code enables users to adapt a SQL query to get data from Sardara database storing global
ICCAT BFT-E VRE | STEP_1___VPA_ICCAT_BFT_E_RETROS | Development & Production | STEP 1: ICCAT (Eastern) BFT Stock Assessment. R and Fortran code provided by ICCAT and Ifremer to execute the whole stock assessment workflow online. Integration has been done with the help (mediation) of CNR and IRD.
ICCAT BFT-E VRE | STEP_2__VPA_ICCAT_BFT_E_VISUALISATION | Development & Production | ICCAT (Eastern) Bluefin Tuna Stock Assessment. This set of R and Fortran code has been provided by ICCAT and Ifremer to execute the whole stock assessment workflow online. Integration has been done with the help (mediation) of CNR and IRD.
ICCAT BFT-E VRE | STEP_3___VPA_ICCAT_BFT_E_PROJECTION | Development & Production | STEP 3: ICCAT (Eastern) Bluefin Tuna Stock Assessment. This set of R and Fortran code has been provided by ICCAT and Ifremer to execute the whole stock assessment workflow online. Integration has been done with the help (mediation) of CNR and IRD.
ICCAT BFT-E VRE | STEP_4_VPA_ICCAT_BFT_E_REPORT | Development & Production | ICCAT (Eastern) Bluefin Tuna Stock Assessment. This set of R and Fortran code has been provided by ICCAT and Ifremer to execute the whole stock assessment workflow online. Integration has been done with the help (mediation) of CNR and IRD.
ICCAT BFT-E VRE | PARALLELIZED_STEP1_VPA_ICCAT_BFT_E_RETROS | Development & Production | STEP 1: ICCAT (Eastern) Bluefin Tuna Stock Assessment. This set of R and Fortran code has been provided by ICCAT and Ifremer to execute the whole stock assessment workflow online. Integration has been done with the help (mediation) of CNR and IRD.
FAO Tuna Atlas VRE | CATCHES_AGGREGATED_FOLLOWING_A_SELECT_VARIABLE | Development | Catches Aggregated Following A Select Variable. The outputs are the temporal and spatial distribution of the catches aggregated following a selected variable and given the filters applied by the user.
FAO Tuna Atlas VRE | CATCHES_BY_FLAGS | Development | Catches By Flags. The output is a plot of the catches by flags given the filters applied by the user.
FAO Tuna Atlas VRE | CATCHES_BY_FLAGS_SIMPLIFIED_VERSION | Development | Catches By Flags Simplified Version. The output is a plot of the catches by flags given the filters applied by the user.
FAO Tuna Atlas VRE | CATCHES_BY_GEAR_SIMPLIFIED_VERSION | Development | Catches By Gear Simplified Version. The output is a plot of the catches by gear given the filters applied by the user.
FAO Tuna Atlas VRE | CATCHES_BY_GEARS | Development | Catches By Gears. The output is a plot of the catches by gears for tuna fisheries given the filters applied by the user.
FAO Tuna Atlas VRE | CATCHES_BY_SPECIES | Development | Catches By Species. The output is a plot of the catches by species given the filters applied by the user.
FAO Tuna Atlas VRE | CATCHES_BY_SPECIES_SIMPLIFIED_VERSION | Development | Catches By Species Simplified Version. The output is a plot of the catches by species given the filters applied by the user.
FAO Tuna Atlas VRE | CATCHES_BY_TYPE_OF_SCHOOL | Development | Catches By Type Of School. The output is a plot of the catches by type of school given the filters applied by the user. Compute Fisheries Indicators From Own Formatted Dataset. Compute some fisheries indicators (plots and maps) from a dataset that you have previously formatted and imported through the algorithm Import Fisheries Form...
FAO Tuna Atlas VRE | GLOBAL_CATCHES | Development | Global Catches. The output is a plot of the catches given the filters applied by the user.
3.5 IMPLEMENTATION PLAN
The different sections of the IRD VRE plan will be developed by IRD, with assistance from FAO and, where
Blue Commons is concerned, other consortium members such as CNR.
The contents of this section have been updated to include the major changes to the implementation plan since
the release of D5.1 as a report in December 2016.
3.5.1 EWE WORKPLAN
EwE activities were scaled back after the first six months, since the lead developers changed organization.
The collaboration will continue at a slower pace on a voluntary basis.
The workplan for the first six months was to:
• Write specifications (functional and user requirements) (done)
• Run a sample of pre-parametrized EwE models within the infrastructure (done)
• Data access services needed to build and run the model (biological and spatio-temporal data are
required to drive the model): Fishbase & Sealifebase, Worms, environmental parameters, EAF
Nansen (in progress)
• Store sets of parameters needed to replicate past runs. Define a way to provide input parameters
and a way to browse the results (provide GIS data formats to Ecospace? Shape files, netCDF, other,
to supplement the current ESRI ASCII grid files) (not started)
Since the release of D5.1, no plans were made to re-activate the EWE activity.
3.5.2 ICCAT BFT-E WORKPLAN
The overall workplan is to continue to tweak and deploy multiple algorithms on the infrastructure:
• samples (either to check or set up the computation environment),
• algorithms deployed for real use.
The workplan will continue to enrich the infrastructure as follows:
• provide R codes executable with WPS to compile LaTeX / knitr / markdown code to get various
outputs: pdf, doc, html. This generic process will be used thereafter to compile ICCAT reports and
other documents,
• validate the ability to manage Shiny outputs by making the results available and browsable through
a URL within the infrastructure,
• provide the whole ICCAT Bluefin Tuna stock assessment workflow as a single pre-parametrized
experiment to validate that all technical issues are managed,
• split this workflow into different steps (some of them being generic):
o STEP 1 (heavy): Analysis (retros) of the BFT-E ICCAT datasets (catches, efforts) by running the
stock assessment model with multiple combinations of input parameters and enabling
parallelization of codes,
o STEP 2 (light and generic): Visualization of data analysis (retros: outputs of step 1): generate
a set of plots and indicators summarizing the results of the processes which are going to be
discussed by the working groups. This step requires the ability to manage interactive
visualization packages (like Shiny) as well as to display related html outputs within the
infrastructure,
o STEP 3 (heavy): Projections, which is the most demanding step in terms of machine resources,
o STEP 4 (light and generic): Writing the main structure and plots of the executive summary
and making it available as pdf or html outputs. Users expect a collaborative environment to
edit reports,
• test the integration of plotly within a knitr document using an html output,
• set up the RStudio client within the VRE,
• in the FAO Tuna Atlas VRE: deploy a SQL database within the infrastructure (PostgreSQL & PostGIS
server); the database will store the Tuna Atlas VRE datasets (inputs & outputs).
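The idea behind STEP 1 (running the assessment model over multiple combinations of input parameters, in parallel) can be sketched as follows. This is an illustration only, not the deployed code: the parameter names and values are hypothetical, and run_assessment is a placeholder standing in for a call to the R and Fortran codes. A thread pool is used here on the assumption that each worker would merely launch and wait for an external process.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical parameter grid; the real retrospective analysis uses its own inputs.
PARAMETER_GRID = {
    "last_year": [2012, 2013, 2014],
    "catch_scenario": ["reported", "inflated"],
}


def run_assessment(params):
    """Placeholder for one model run (would call the R/Fortran codes in practice)."""
    return {"params": params, "status": "done"}


def run_retros(grid, max_workers=4):
    """Run the placeholder model once per parameter combination, in parallel."""
    names = sorted(grid)
    combinations = [
        dict(zip(names, values))
        for values in product(*(grid[name] for name in names))
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns the results in the same order as the combinations.
        return list(pool.map(run_assessment, combinations))


results = run_retros(PARAMETER_GRID)
print(len(results), "runs completed")
```

Because every combination is independent of the others, the same pattern applies whether the runs are spread over local workers or over several machines of the infrastructure.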
Once processes have been deployed, they are made available by the infrastructure as services (WPS,
OpenCPU). These services can be used by the VRE to build friendly GUIs, as well as by any other external
Website (work of FAO with OpenCPU and a related JavaScript library to manage OpenCPU or WPS services from
Web browser clients). The results of process executions usually consist of zip files made available through
URLs that people can use in their work.
Access to the infrastructure might be possible through an R connector (from a local PC, RStudio). WP5 doesn't
deal with GUIs for online editing or annotation of documents or charts (goal of WP8).
Check if some existing VREs (like VME-DB) already provide some services for collaborative / interactive
documents.
3.5.3 ICHTHYOP WORKPLAN
The user options described in Section 3.3.2 all seamlessly use the same services provided by the infrastructure
(backend) and the VRE.
Behind the scenes, the development workflow has been split into the following steps:
1. The WPS server manages the WPS “executeProcess” request with the input parameters provided by
the client user. The Ichthyop simulation is executed accordingly.
2. During Ichthyop execution, the model is driven by environmental data harvested remotely with the
OPeNDAP protocol, provided by a THREDDS server working on top of the OSCAR satellite images package
in netCDF (other products, like GECKO, might be used to drive the model) and physically replicated and
stored on the infrastructure.
3. The result of the model (WPS output parameter) is packaged within a netCDF file.
4. R transforms the native data structure of the Ichthyop netCDF file into another one which is more
compliant with CF conventions.
5. R transforms the native netCDF or netCDF-CF files into shapefiles (points and trajectories), does the
same with the trajectories of Drifters managed in PostGIS databases, and creates a QGIS map/project.
6. The user can download everything packaged in a zip file and visualize the results with the QGIS map.
Now that all the steps described above can be achieved, the next challenge will consist in storing and
analyzing (thousands of) outputs. A student of the Naval Academy in Brest has been developing a method to
explore and flag the quality of the results of Ichthyop simulations. We now need to deploy post processing
algorithms to flag the quality of simulations executed on the infra. (M21)
4 STOCK ASSESSMENT - ICES
The ICES stock assessment VRE is focused on providing stock assessment services through DataMiner. It also
includes basic data management features, e.g. storage of input and output data files.
The aim is to improve the performance of existing stock assessment models; reduce the time to completion,
improve the metadata description, and improve simulation runs.
4.1 USE CASES
ICES has proposed several algorithms and models in support of capacity building for individual scientists and
assessment groups. These training and capacity building activities will have to be coordinated with WP8 and
the other WP members.
• MSE for western horse mackerel
o Harvest control rules (HCRs) are used by fishery management agencies to achieve harvest
policies that meet both conservation and stability objectives. Management Strategy
Evaluation (MSE) is a framework that evaluates a number of candidate HCRs according to
these objectives. MSEs generally apply a simulation framework that explores sensitivities in
parameterization of model HCRs, resulting in a computationally intensive framework. The
MSE detailed below has been developed by José De Oliveira for western horse mackerel and
will serve as a test case for using the BlueBRIDGE VRE platform for MSE analyses.
o The model will not be an as-a-service VRE, but a use case in compartmentalized and focused
VREs.
• Mixed fisheries assessment model
o The Mixed Fishery assessment uses an Fcube approach (after Fleet and Fishery Forecast), a
simple model of mixed fisheries which can be used to assess the consistency between
management (TAC and/or effort) advice for species caught together, given the availability
and accessibility of data. A simple linear relationship is assumed between effort and fishing
mortality and status quo catchability assumed in the projections. Forecasts are produced
according to scenarios for what limits the effort in each fleet and the forecasted catch of each
species compared to the single species advice for TAC. The Fcube model in its current form
is appropriate for all regions where the majority of species of commercial interest are
assessed quantitatively. Many scenarios can be evaluated, including an optimization process
that will identify the set of fishing mortalities, by stock, that maximizes a given objective
function.
o This model will not be an as-a-service VRE, but a use case in compartmentalized and focused
VREs.
• Ensemble model
o LeMANS is a size-structured multi-species model of a fish community with a realistic
distribution of life-history attributes. This approach differs from that reported from most
other size-based models in that it maintains both the identity of the species in the system
and the size structure of the individual populations. The model has been applied to the North
Sea and calculates biomasses of 21 fish stocks [2] [4]. Recruitment occurs each year and
predation is based on size moderated by a diet matrix, but no starvation occurs. Ecosystem
components not represented explicitly make up the pool of “other food”. An ensemble
approach has been implemented by screening potential model set-ups against ICES
abundance data to produce a subset of models consistent with data that can be used to
generate probabilistic projections [4].
o The model will not be an as-a-service VRE, but a use case in compartmentalized and focused
VREs.
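To make the linear effort/fishing-mortality assumption of the mixed fisheries use case more concrete, the toy sketch below computes, for each fleet, the effort that would take its share of each stock's target fishing mortality, and then applies a "min" or "max" limiting scenario. It is only loosely inspired by the description above and is not the Fcube implementation: all fleets, stocks and numbers are invented, and quota shares are assumed to sum to one per stock.

```python
# All fleets, stocks and values below are invented for illustration.
CATCHABILITY = {  # fishing mortality generated per unit of effort
    "fleet_A": {"cod": 0.02, "haddock": 0.01},
    "fleet_B": {"cod": 0.005, "haddock": 0.03},
}
QUOTA_SHARE = {  # share of each stock's target mortality allotted to a fleet
    "fleet_A": {"cod": 0.7, "haddock": 0.2},
    "fleet_B": {"cod": 0.3, "haddock": 0.8},
}
TARGET_F = {"cod": 0.3, "haddock": 0.4}  # single-species targets


def effort_per_stock(fleet):
    """Effort the fleet needs to take exactly its share of each stock (F = q * E)."""
    return {
        stock: TARGET_F[stock] * QUOTA_SHARE[fleet][stock] / CATCHABILITY[fleet][stock]
        for stock in TARGET_F
    }


def fleet_efforts(scenario):
    """'min': stop at the first exhausted share; 'max': fish until the last one."""
    choose = {"min": min, "max": max}[scenario]
    return {fleet: choose(effort_per_stock(fleet).values()) for fleet in CATCHABILITY}


def realised_f(efforts):
    """Total fishing mortality per stock implied by the fleet efforts."""
    return {
        stock: sum(CATCHABILITY[fleet][stock] * efforts[fleet] for fleet in efforts)
        for stock in TARGET_F
    }


f_min = realised_f(fleet_efforts("min"))
f_max = realised_f(fleet_efforts("max"))
for stock in TARGET_F:
    print(stock, "target:", TARGET_F[stock], "min:", f_min[stock], "max:", f_max[stock])
```

Comparing the realised mortalities with the single-species targets shows where the advice for species caught together is inconsistent: under the "min" scenario some targets are under-used, under the "max" scenario some are exceeded.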
ICES obtained the source code for each model, and negotiated terms and conditions with the owner of the
data and software to use the BlueBRIDGE infrastructure. The models were successfully tested in a private
environment. After review with infrastructure developers, it was decided that the code needs to be
compartmentalized to better fit with an infrastructure approach. Therefore, a revised approach has been
developed. First, log-jams in the use-case source code are identified and functions (here, referred to as base
functions) that can be modified to be more computationally efficient are identified (ticket #5818). Next, the
base functions are modified to run in parallel, or to utilize other infrastructure services (tickets #5819 and
#5821). The modified functions will be deployed as-a-service and wrappers will be developed to link input
from use-case source code to the VRE for computation and return from the VRE into the local environment
as output. This signifies a move from the specific models based on a purpose-built script towards an
infrastructure.
An additional use case has been developed around extant R Shiny code (Data-limited-tools) developed by
Jason Cope.
• DLM Toolkit: Toolbox for Data Limited Methods Stock assessment (DLM Toolkit) #1778
This represents a change from the plan presented in D5.1. It has also attracted the interest of FAO, and a joint
use case is being implemented around this extendable Toolkit. The Toolkit can serve several user communities
with different needs, and offers ample advantages if used through an infrastructure, for instance by offering
a uniform data-space, replicability, performance, and simplicity.
4.2 USERS
The users of the tools are from the stock assessment community:
• Stock assessment individual scientists (current community);
• Stock assessment trainers and students (with WP8; courses cover some of above cases);
• Natural resource management community.
4.3 VRE DESIGN
The VRE will rely on the DataMiner facilities. This will be achieved in a few steps; for each of these, the design
can rely on the already available VRE design facilities:
1. The algorithms, where possible, will be tested in the online RStudio environment. This requires
access to RStudio, if possible integrated with the Workspace;
2. When successful, they will be integrated through the Statistical Algorithm Importer (SAI) and
published to the selected community through the DataMiner.
3. The VRE community can now discover the process in the VRE, and run the model. The output will be
stored (initially) on the workspace, but can also (later) be made available in dedicated pages.
4.4 RESOURCES
This VRE will rely on existing tools that will need to be integrated in the infrastructure. The use of resources
depends on the assessment activity, and is difficult to predict at this stage. The tools depend at least on R and
R-shiny, and for an integration in the infrastructure will rely on DataMiner and the Workspace.
For completed assessments, a dissemination through CKAN is considered.
4.5 IMPLEMENTATION PLAN
The implementation plan is incorporated in the Redmine ticketing system, and implementation activities can
be tracked for the ICES Stock Assessment VRE.
The first theme of the ICES Workplan aims at migrating models that are fit for infrastructure deployment:
1. Identify where efficiencies can be made in base functions #5818
2. Modify base functions to run in parallel or to utilize other infrastructure services #5819 and #5821
3. Deploy modified functions
The second theme relates to implementation of Data Limited Methods (DLM), which is also of interest to
FAO. The DLM implementation plan for the coming period covers the exploitation of the existing DLM toolkit,
and to render it an infrastructure asset with managed users and data / computing resources.
The first theme covers the work of individual scientists to validate the proposed integration approach, while
the second, after successful validation, will raise the use of the tool to a multi-user environment:
1. Simple minimal example of Shiny App with ShinyProxy #5822, #5823
2. Deploy DLM Toolkit https://github.com/shcaba/Data-limited-tools/tree/master/Shiny_DLMtool with ShinyProxy #1778
Adaptation since D5.1
After D5.1 was released, it was decided that the second theme offers a more promising future perspective,
as it requires less effort on specific models and has a potentially larger user base. The work of the coming
period will therefore focus on the development of DLM methods. Since FAO has expressed a similar interest,
the development work will be shared.
ICES effort for M16-M18 was devoted to developing shiny applications that were also used for the ICES MSY
training course (in WP8), and are available to scientists. ICES created a Length-based indicator application
that takes raw length frequency data and transforms them into output suitable for determining ICES proxy
MSY reference points. The code is ready for deployment in a Docker image and constitutes a test case for
ShinyProxy (task 7492).
ICES considers further effort on tasks #1768, #4620, and #5517 not immediately necessary, as the advance
analysis of the algorithms and models revealed that they are not really suited for deployment on the
e-infrastructure; modifying the identified “log-jam” functions would likely be more efficient and elegant using
C++ or TMB rather than simply parallelizing inefficient code. The cost of that approach, however, is high, and
the results might find limited broad-scale application in the rest of the community.
Further effort should be dedicated to linking the e-infrastructure with Docker images / ShinyProxy, so that
data can be moved between the resources and the e-infrastructure is more than a simple server for
Shiny apps. ICES is currently discussing with ENG how this might be done. This activity under ShinyProxy
#1778 is also relevant to FAO, and a joint approach is being implemented. The approach aims to offer DLM
services to any community through a VRE. The plan is to start once ShinyProxy is ready (March 2017), to
enrich the number of algorithms in DLM Tools on the one hand, and to better integrate the tool with
infrastructure services on the other; this activity will continue until M24, after which exploitation starts.
ICES expects that further effort should be dedicated to improving the integration between the workspace,
RStudio, and VREs, which is of interest to all partners in this WP. This is described in task #7493.
5 GRSF
The Global Record of Stocks and Fisheries (GRSF) aims to produce a collaborative environment to maintain a
global knowledge base of stocks and fisheries.
It satisfies the need for intelligent identification of stocks and fisheries, the assessment of their status over
time, and deep linking to the data sources and assessment history. Building on community and infrastructure
products (software and datasets) as well as other VREs offerings (models and datasets, knowledge base
development) and extending those with participant specialists in data mining and mapping, indicators
extraction and dataset integration (where required), the resulting VRE provides a global information resource
on inter-disciplinary stock and fisheries information.
5.1 USE CASES
The main purpose of the VRE is to provide scientists with an environment and the tools for accessing stocks
and fisheries information. To this end a registry containing such information will be constructed. This registry
will be the core backbone of the VRE and will integrate data about stocks, fisheries and their corresponding
details, coming from different sources. For this reason, a set of basic data sources (namely FIRMS, RAM and
Fishsource), as well as a set of models and software components are required, for achieving the data
integration and the tools and processes for constructing, monitoring and maintaining the registry.
There are two main use cases that together constitute the GRSF:
1. Access, ingest and manage the GRSF source records to produce a global integrated knowledge base
on stocks and fisheries;
2. Publish and facilitate access to the knowledge base through a human- and machine-readable registry.
5.2 USERS
Two main types of users are foreseen in the infrastructure:
• Data Managers: liaise with data contributors, ingest and integrate information, ensure consistency
and completeness of data, and establish mappings and relations;
• Data Publishers: once the data manager has completed the preparatory phase, the resulting records
will have to be approved by the contributing parties (the legal owners), and possibly by other
organizations (if part of the data was sourced from them), and the resulting records (which may
contain merged or modified information) will have to be validated for consistency with the
source information. Only after that has been completed can they be published as GRSF records.
Two main types of users are foreseen consuming data from the infrastructure:
• The physical consumers of the Global Record of Stocks and Fisheries are all parties interested in the
state of stocks, fisheries and marine resources, and range from national governments, through
regional fisheries management organizations, to scientists and interested individuals. These users
require a user-friendly UI.
• The non-physical consumers of the Global Record of Stocks and Fisheries are external sites that
wish to reproduce parts of, or the entire, GRSF. The GRSF will not be open to any machine: the
owners of such sites will first have to subscribe to, and abide by, a data policy that is likely to be
enforced at the level of the CKAN registry.
5.3 VRE DESIGN
As far as the infrastructure-based management of the GRSF is concerned, the VRE design covers data ingestion,
management and mapping, which depend on the corresponding software tools (MatWare, GRSF-services, etc.)
integrated in the infrastructure. The VRE will have data managers, who are dependent on infrastructure user
policies.
As far as dissemination is concerned, the GRSF will rely on a dedicated catalogue, based on the gCube Data
Catalogue facilities, for exposing the content of the Semantic Knowledge base. The following figure depicts
the overall design of the GRSF VRE.
Figure 7: The GRSF VRE overall design
5.4 RESOURCES
Data resources:
5.4.1 FIRMS
FIRMS (http://firms.fao.org/firms/en) is an acronym of Fisheries and Resources Monitoring System, and has
the main objective of providing access to a wide range of high-quality information on the global monitoring
and management of fishery marine resources. FIRMS collects data from 14 intergovernmental organizations
and contains information for more than 1000 stocks and 300 fisheries. FIRMS contents are exposed in XML,
using a set of particular services, with respect to a predefined XML schema. A more detailed discussion about
the main concepts found in FIRMS, with their description and the corresponding XPATH pattern (for retrieving
them from the XML response) can be found at:
https://support.d4science.org/projects/bluebridge/wiki/FIRMS_Analysis_and_Modeling.
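The general technique of retrieving concepts from an XML response with XPath patterns can be sketched as follows. This is a hedged illustration: the XML document, its element and attribute names, and the two patterns are all hypothetical and do not reproduce the actual FIRMS schema, which is documented on the wiki page above.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML response; the real FIRMS schema uses different names.
SAMPLE_XML = """<Observations>
  <Resource id="r1" type="stock">
    <Name>Example stock one</Name>
    <Species>Thunnus thynnus</Species>
  </Resource>
  <Resource id="r2" type="fishery">
    <Name>Example fishery</Name>
    <Species>Thunnus albacares</Species>
  </Resource>
  <Resource id="r3" type="stock">
    <Name>Example stock two</Name>
    <Species>Xiphias gladius</Species>
  </Resource>
</Observations>"""

# One XPath pattern per concept to extract (illustrative patterns only).
XPATH_PATTERNS = {
    "stock_names": ".//Resource[@type='stock']/Name",
    "all_species": ".//Resource/Species",
}


def extract(xml_text, pattern):
    """Return the text of every element matched by the XPath pattern."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(pattern)]


stock_names = extract(SAMPLE_XML, XPATH_PATTERNS["stock_names"])
all_species = extract(SAMPLE_XML, XPATH_PATTERNS["all_species"])
print(stock_names)
print(all_species)
```

Keeping the patterns in a single mapping, separate from the parsing code, means that a change in the source schema only requires the patterns to be updated.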
5.4.2 RAM LEGACY STOCK ASSESSMENT DATABASE
RAM Legacy Stock Assessment Database (http://ramlegacy.org) is a compilation of stock assessment results
for commercially exploited marine populations from around the world. The assessments were assembled
from 21 national and international management agencies for a total of 331 stocks. There are metadata for
each stock describing various information, including the taxonomic information of the species, the geographic
location of the stock, the management body that conducted the assessment and the particular assessment
methodology. The key concepts of the database are: Area, Assessment, AssessMethod, Assessor, Biometrics,
Bioparams, Management, Stock, Taxonomy, Timeseries and TSMetrics. An extended discussion about RAM
can be found at https://support.d4science.org/projects/bluebridge/wiki/RAM_Analysis_and_Modeling.
5.4.3 FISHSOURCE
FishSource (http://www.fishsource.com) is an online information registry containing various information
about stocks and fisheries. It exposes its contents through fisheries profiles. Each profile contains: (a) the
identification, with various information about the species, the water area, the management areas, etc., (b)
the scores, with statistics and measures, (c) various sustainability information, (d) summary and basics
information and (e) references and reviews. The information is currently available to the public as HTML
pages; in the future, FishSource plans to expose it through a set of services.
5.4.4 OTHER SOURCES
Apart from the three main sources of information about stocks and fisheries described above, the global
registry will also exploit information from other sources. For example, it will exploit information from the
MarineTLO-based warehouse (http://wiki.i-marine.eu/index.php/MarineTLO-based_warehouse), which was
constructed in the context of the iMarine project. The MarineTLO-based warehouse contains various
information (e.g. taxonomic, preys-predators, scientific and common names, vessels, water areas, etc.) from
the following sources:
• Fisheries Linked Open Data – FLOD (http://www.fao.org/figis/flod/endpoint)
• ECOSCOPE Knowledge Base (http://ecoscopebc.mpl.ird.fr/joseki/ecoscope)
• FishBase (http://www.fishbase.org)
• World Register of Marine Species – WoRMS (http://www.marinespecies.org)
• DBpedia (http://dbpedia.org)
The Semantic Models
5.4.5
REQUIREMENTS
Integrating data from heterogeneous sources makes it easier to exploit information that is spread across
them. More specifically, it allows connecting data that refer to the same piece of information (e.g. stocks
that refer to the same marine species) and, ultimately, answering complex queries that could not be
answered by any of the underlying sources alone.
For this reason, we pay attention to the querying requirements. To make concrete statements about the
information that should be stored in the registry, we used the notion of competency queries. A competency
query is a query that is useful for the community at hand, either for a human member (e.g. a scientist) or
for building applications for that domain. A list of such queries can therefore sketch the desired scope and
structuring of the information. An indicative list of competency queries focusing on information about
stocks and fisheries (at least for the FIRMS source) can be found at
https://support.d4science.org/projects/bluebridge/wiki/Competence_questions.
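As an illustration of a query that no single source can answer alone, the following minimal sketch answers the question "which species have stock records in more than one source?". The records, the field names and the name normalisation rule are hypothetical; the sketch uses plain Python structures and is not the semantic-warehouse machinery used for the registry.

```python
from collections import defaultdict

# Hypothetical, hand-written records: NOT actual FIRMS, RAM or FishSource data.
RECORDS = [
    {"source": "FIRMS", "stock": "Stock A", "species": "Thunnus thynnus"},
    {"source": "RAM", "stock": "Stock B", "species": "thunnus thynnus "},
    {"source": "FishSource", "stock": "Stock C", "species": "Gadus morhua"},
]


def normalise(name):
    """Lower-case the name and collapse whitespace so spelling variants match."""
    return " ".join(name.lower().split())


def stocks_by_species(records):
    """Index (source, stock) pairs by normalised species name."""
    index = defaultdict(list)
    for rec in records:
        index[normalise(rec["species"])].append((rec["source"], rec["stock"]))
    return dict(index)


def species_in_multiple_sources(records):
    """Competency query: species with stock records in more than one source."""
    result = {}
    for species, entries in stocks_by_species(records).items():
        sources = {source for source, _ in entries}
        if len(sources) > 1:
            result[species] = sorted(entries)
    return result


if __name__ == "__main__":
    print(species_in_multiple_sources(RECORDS))
```

In a real integration the matching key would more plausibly be a shared identifier or URI than a normalised name; the name is used here only to keep the example self-contained.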
To integrate heterogeneous sources, it is important to select an appropriate model (i.e. a top level
ontology) that is abstract enough to cover most of the fundamental categories of the underlying sources,
can be easily extended to any level of detail on demand, and is rich enough in properties that the
particular details of the different sources can be easily mapped to it.
To this end we selected the CIDOC Conceptual Reference Model (ISO 21127:2006) [5] and its extensions.
CIDOC CRM provides definitions for describing the implicit and explicit concepts and relationships used
in cultural heritage documentation. The latest version of the ontology (version 6.2) comprises 82 classes
and 287 properties. It has a rich structure of intermediate classes and relations which, apart from being
very useful for building query services, makes its extension to other domains easier and reduces the risk of
over-generalization or over-specialization. Some of its distinctive extensions are described below.
MarineTLO [6] is a top level ontology, generic enough to provide consistent abstractions or specifications of
concepts included in all data models or ontologies of marine data sources and provide the necessary
properties to make this distributed knowledge base a coherent source of facts relating observational data
with the respective spatiotemporal context and categorical (systematic) domain knowledge. It can be used
as the core schema for publishing Linked Data, as well as for setting up integration systems for the marine
domain. It can be extended to any level of detail on demand, while preserving monotonicity. The latest
version of MarineTLO (version 4) contains 127 classes and 81 properties.
CRMsci [7] is an ontology which is intended to be used as a global schema for integrating metadata about
scientific observation, measurements and processed data in descriptive and empirical sciences such as
biodiversity, geology, geography, etc. The CRMsci model has been developed bottom up from specific
metadata examples from biodiversity, geology, archaeology, cultural heritage conservation and clinical
studies. The latest version of CRMsci (version 1.2.2) contains 31 classes and 52 properties.
CRMdig [8] is an ontology derived as an extension of CIDOC CRM that records the provenance of digital
objects. More specifically, it captures the steps and methods involved in producing digitization products
and synthetic digital representations. It also fully covers the initial physical measurement processes and
their parameters. The latest version of CRMdig (version 3.2) contains 16 classes and 69 properties.
The Tools and Processes
Constructing a registry that contains data from heterogeneous sources is a rather laborious task.
Therefore, there is a need for tools that automate the process of constructing it and monitoring the result.
More specifically, the following software components have been exploited for the construction and
maintenance of the GRSF VRE.
• MatWare (https://support.d4science.org/projects/bluebridge/wiki/Matware), a framework
that automates the process of constructing semantic warehouses by fetching and transforming
data from different (and in most cases heterogeneous) data sources. MatWare ensures that the
contents of the semantic warehouse are properly connected by exploiting connectivity metrics that
measure how much of the integrated content is connected. MatWare is exploited for supporting
the construction of the knowledge base of the Global Record of Stocks and Fisheries.
• GRSF-services-core (https://wiki.gcube-system.org/index.php?title=GRSF-services#grsf-servicescore), a
software library responsible for exposing the contents of the Global Record of Stocks and Fisheries
knowledge base and for allowing users to approve, reject and annotate particular GRSF records.
• GRSF-services-updater (https://wiki.gcube-system.org/index.php?title=GRSF-services#grsf-servicesupdater).
After GRSF records are published in the GRSF Data Catalogue, they initially have the
status “pending”. This means that their contents have not yet been checked by an expert, who either
approves them as correct or annotates them as erroneous. This check is carried out
by the GRSF VRE administrator: while browsing the GRSF records in the GRSF Data Catalogue,
the administrator can confirm or reject a GRSF record. In order to avoid
inconsistencies and keep the contents of the GRSF Data Catalogue aligned with the GRSF KB (recall
that the GRSF Data Catalogue is populated using the GRSF KB), it is necessary to update both of
them.
• DataCatalogue-publish-API (https://wiki.gcube-system.org/index.php?title=GCube_Data_Catalogue_for_GRSF),
which is responsible for publishing resources in the gCube Data Catalogue. For the purposes of GRSF a
specific Data Catalogue has been deployed. The GRSF Data Catalogue stores and allows the publication of
products of two types: Stock and Fishery. Apart from the default set of metadata, each type of product
also has specific fields, some of which automatically become tags of the product. The same
reasoning applies to group associations: a set of groups is already available, and each
product is automatically associated with the applicable groups during publication.
• The gCube Data Catalogue (https://wiki.gcube-system.org/gcube/GCube_Data_Catalogue), which is
built by using and extending the CKAN platform. It allows publishing rich metadata, both the fields that
CKAN supports (titles, descriptions, licensing information, responsible persons and their roles) and
gCube-specific ones. The latter are organized into profiles and expressed as XML-based
metadata, and they allow custom metadata fields to be included in the Data Catalogue. It has been
integrated in the infrastructure to assist the publishing, storage and discovery of data resources.
More specifically, it contains resources that are intended for and resulting from the services of the
Blue Assessment VREs, to serve cases ranging from stock assessment to aquaculture atlas
generation, strategic investment and scientific training. Datasets include species distribution maps,
environmental data, area regulation zones, as well as stocks and fisheries.
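The review step described above for GRSF-services-updater (records start as “pending”, the administrator confirms or rejects them, and both the Data Catalogue and the knowledge base are updated) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual GRSF code: the RecordStore class, the review function, and the status names other than “pending” are hypothetical, and both stores are simple in-memory stand-ins.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"    # status named in the text
    APPROVED = "approved"  # hypothetical name for a confirmed record
    REJECTED = "rejected"  # hypothetical name for a rejected record


class RecordStore:
    """In-memory stand-in for either the Data Catalogue or the knowledge base."""

    def __init__(self):
        self._records = {}

    def publish(self, record_id):
        # Newly published records always start as pending.
        self._records[record_id] = {"status": Status.PENDING, "annotation": None}

    def set_status(self, record_id, status, annotation=None):
        record = self._records[record_id]
        record["status"] = status
        record["annotation"] = annotation

    def get(self, record_id):
        return dict(self._records[record_id])


def review(record_id, decision, catalogue, knowledge_base, annotation=None):
    """Apply the administrator's decision to BOTH stores so they stay aligned."""
    if decision not in (Status.APPROVED, Status.REJECTED):
        raise ValueError("decision must be APPROVED or REJECTED")
    for store in (catalogue, knowledge_base):
        store.set_status(record_id, decision, annotation)


if __name__ == "__main__":
    catalogue, kb = RecordStore(), RecordStore()
    for store in (catalogue, kb):
        store.publish("stock-001")
    review("stock-001", Status.APPROVED, catalogue, kb, annotation="checked")
    print(catalogue.get("stock-001"), kb.get("stock-001"))
```

Routing every decision through a single function that writes to both stores is one simple way to avoid the inconsistencies mentioned above; a production service would also need to handle the case where one of the two updates fails.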
5.5
IMPLEMENTATION PLAN
The overall implementation plan is the same as reported in 5.1; the tool will be implemented using the
following contributions:
• FORTH will provide the technology and manage the development of the GRSF;
• FAO will liaise with the user community, and ensure the development meets the community needs.
FAO will liaise in particular with RAM and SFP, who also provide in-kind effort, software, and data;
• FORTH and CNR will provide overall technical integration and publication of GRSF records through
the CKAN registry (through WP9).
Since D5.1, a more detailed implementation plan has been prepared during the External Advisory Board
Technical Working Group Meeting (EAB-TWG2). It comprises the following VRE development and
implementation activities:
• Update the processes for building the GRSF knowledge base (March - May)
  o Re-generate GRSF records (3 times by the end of the project)
  o Harvest new data from sources (2-3 times by the end of the project)
• Partners’ tests (i.e. data validation, UUIDs, etc.) (June - July)
• Update GRSF interfaces (Admin and Public) (March - May)
• Complete the development of the management functions (Mngt. panel) (March - May)
  o Merge function
  o Display proximities
  o Traceability flag
    ▪ Field comment for users with categories
• Other requirements, integration of competency queries, etc. (from September)
The progress of the implementation can be checked through the meeting reports:
https://support.d4science.org/projects/stocksandfisherieskb/wiki/GRSF#Meetings
And for specific GRSF activities through the ticketing system: https://support.d4science.org/issues/643
REFERENCES
[1]
Assante M, Candela L, Castelli D, Coro G, Lelii L, Pagano P. (2016) Virtual research
environments as-a-service by gCube. PeerJ Preprints 4:e2511v1
https://doi.org/10.7287/peerj.preprints.2511v1
[2]
Ellenbroek, A. (2016) Blue Assessment VRE Specification. BlueBRIDGE Deliverable D5.1.
[3]
Rochet MJ, Collie JS, Jennings S, Hall SJ. Does selective fishing conserve community
biodiversity? Predictions from a length-based multispecies model. Canadian Journal of
Fisheries and Aquatic Sciences. 2011;68:469–486.
[4]
Thorpe RB, Le Quesne WJF, Luxford F, Collie JS, Jennings S. Evaluation and management
implications of uncertainty in a multispecies size-structured model of population and
community responses to fishing. Methods Ecol Evol. 2015.
[5]
M. Doerr. The CIDOC conceptual reference module: an ontological approach to semantic
interoperability of metadata. AI magazine, 24(3):75, 2003.
[6]
Y. Tzitzikas, C. Allocca, C. Bekiari, Y. Marketakis, P. Fafalios, M. Doerr, N. Minadakis, T.
Patkos, and L. Candela. Unifying heterogeneous and distributed information about marine
species through the top level ontology MarineTLO. Program: electronic library and
information systems, 50(1), 2015.
[7]
M. Doerr, C. Bekiari, A. Kritsotaki, G. Hiebel, and M. Theodoridou. Modelling scientific
activities: proposal for a global schema for integrating metadata about scientific
observation. In Access and understanding–networking in the digital era: The 6th annual
conference of CIDOC, the International Committee for Documentation of ICOM, Dresden,
Germany, 2014.
[8]
M. Theodoridou, Y. Tzitzikas, M. Doerr, Y. Marketakis, and V. Melessanakis. Modeling and
querying provenance by extending cidoc crm. Distributed and Parallel Databases,
27(2):169–210, 2010.
[9]
Y. Tzitzikas, N. Minadakis, Y. Marketakis, P. Fafalios, C. Allocca, M. Mountantonakis, and
I. Zidianaki. Matware: Constructing and exploiting domain specific warehouses by
aggregating semantic data. In The Semantic Web: Trends and Challenges, pages 721–
736. Springer, 2014.
[10]
N. Minadakis, Y. Marketakis, H. Kondylakis, G. Flouris, M. Theodoridou, M. Doerr, and G.
de Jong. X3ML framework: An effective suite for supporting data mappings. Workshop for
Extending, Mapping and Focusing the CRM - co-located with TPDL’2015, September 2015.
APPENDIX 1
Overview of VREs related to task 5.1; generic VREs and model or framework specific
#    | Status      | Subject | Due date
6098 | New         | RDB for Fisheries Data Management (WECAFC) | May 31, 2017
6870 | New         | FLR for JRC | May 31, 2017
7449 | New         | request for a VRE for IOTC working party (november 2017) : Ichthyop VRE | Oct 02, 2017
5238 | New         | WECAFC Stock assessment support |
1678 | In Progress | FAO Stock Assessment VRE | Oct 31, 2016
1679 | In Progress | ICES Stock Assessment VRE | Oct 01, 2016
5493 | Released    | Revise FAO Tuna Atlas VRE: Enlarge the pool of DataMiner Algorithms and make SAI available | Oct 17, 2016
1675 | Released    | GRSF VRE; Global Record of Stocks and Fisheries |
4846 | Released    | VRE Creation for ICCAT BFT-E | 12-Sep-16
5136 | Released    | WECAFC-FIRMS: Add TabMan and Dataminer | 4-Nov-16
5886 | Released    | GRSF_Admin | 25-Nov-16
6229 | Released    | Create a VRE realising an analytics environment: the Analytics Lab | 23-Dec-16
5016 | Released    | RPrototypingLab Deployment |
4894 | Released    | BlueBridge RStudio VRE |
1677 | Removed     | Ecopath VRE | Oct 01, 2016
1779 | Removed     | IRD BFT Assessment | Oct 01, 2016