Oslo Group Energy Statistics
8-11 May 2017
ESS Data Validation and exchange (SDMX)
Bart De Norre
Unit E.5: Energy
European Commission – DG Eurostat
Eurostat
Overview
• Context
• ESS shared validation
• SDMX
• DSD Energy
• Eurostat ESS Future
Context
• European Energy Statistics:
Energy Statistics Regulations
Regulation 223/2009 on European statistics
Content, Quality, Responsibilities (MS, Eurostat)
• International:
• IEA/EUROSTAT/UNECE data collections
• IRES / SIEC
• Common data needs / methodology (validation) / reporting
• Eurostat and the ESS – the ESS Vision 2020
• ESS global policies / standards / projects (all domains)
• together
• Eurostat, the ESS and world wide – SDMX
Context
• Quality:
• Criteria / Code of practice (accuracy, coherence, timeliness,
cost-effectiveness)
• Content: data and metadata
• Processes
• Actors / stakeholders
• Communication and exchange
Exchanges at many levels: countries – international bodies,
within country between different authorities; countries –
countries
• Reporting countries
• Most difficult part (subsidiarity / proportionality): many aspects
of accuracy can only be checked by the collecting authorities; receiving
aggregated data limits many accuracy checks to plausibility
checks
• Increasing demands / stable or shrinking resources
Context
• Quality, the role of:
• Collaboration and mutual common understanding
• Formalising knowledge (data, metadata and validation logic)
• Who does what (formalising responsibilities)
• Information processing (IT)
• Facilitating exchange at collection, at dissemination
(push/pull)
• Standardisation
Context
• Standardisation:
• content (data, code lists, definitions and other domain
methodology)
• methodology (validation typology and logic)
• business processes (GSBPM)
• generic outcomes (validation reports)
• information system design, IT "technology"
• within a domain / across domains (using/comparing data
across several domains)
• understanding and applying similar "standardised" approaches
is especially useful for NSI and Eurostat working in many
domains ("industrial statistical processing"; metadata driven
generic IT processes)
• Economies of scale
Context
• Eurostat / ESS global objectives for ESS shared validation
and SDMX: increase
• Effectiveness (result)
• Efficiency (process)
• Related projects aim at higher quality at a lower cost:
• fewer errors, more certainty and more timely outcomes
• reduction in resources and elimination of redundant actions
• A few examples of targeted changes:
• validation comprising rules which are formally adopted by all
ESS stakeholders;
• sharing the same understanding via standards and more
formalised knowledge;
• a more automated validation process with clear
responsibilities and avoiding multiple iterations (ping-pong)
ESS shared validation
• The idea: collaborative approach of sharing validation
between countries and Eurostat
in fact, is this principle so new?
• Major objectives: increasing
• transparency and clarity
(shared and easily accessible documentation on validation procedures);
• effectiveness of the validation
(the appropriate rules by the responsible authority close at the source,
awareness of all possible errors and warnings, avoiding validation gaps,
avoiding double work)
• efficiency
(reducing the number of iterations between sending – validating –
requesting new sending)
• efficiency in the information processing and communication
(The lack of common standards for validation solutions leads to a
duplication of IT development and integration costs in the ESS).
ESS shared validation
• ESSC May 2016:
"agreement and documentation at Working Group level of
validation rules and responsibilities" Mandatory!
"Use of shareable and reusable ESS services to validate data"
• What does this imply?
• Common validation policy / methodology / typology /
validation levels
• Validation responsibilities
• Validation rules and severity
• Improving inter-operability
ESS shared validation
Validation typology
ESS shared validation
• How and who
• Basic principles
• Design data structures
• Design validation rules (and severity level)
A full description and agreement on the validation rules
and their associated severity for each data collection
• Validate data: 2 major steps/ actors (reporting
country, Eurostat)
An agreement on the responsibility of each national
reporting organisation to perform the agreed validation
rules before
• Long term process / in steps / Taskforce
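A minimal sketch of how agreed validation rules with an associated severity might be represented and run by a reporting country before transmission. The rule identifiers, the severity labels ERROR and WARNING, and the sample record are illustrative assumptions, not the agreed ESS rule set.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical severity labels; the levels agreed per data collection may differ.
ERROR, WARNING = "ERROR", "WARNING"


@dataclass(frozen=True)
class Rule:
    rule_id: str                    # illustrative identifier
    severity: str                   # ERROR or WARNING
    check: Callable[[dict], bool]   # returns True when the record passes
    message: str


def validate(record: dict, rules: list[Rule]) -> list[tuple[str, str, str]]:
    """Return (rule_id, severity, message) for every rule the record fails."""
    return [(r.rule_id, r.severity, r.message) for r in rules if not r.check(record)]


# Two made-up rules: a completeness check and a plausibility check.
RULES = [
    Rule("R1", ERROR, lambda rec: rec.get("OBS_VALUE") is not None, "OBS_VALUE is missing"),
    Rule("R2", WARNING, lambda rec: (rec.get("OBS_VALUE") or 0) >= 0, "OBS_VALUE is negative"),
]

report = validate({"REF_AREA": "BE", "OBS_VALUE": -5.0}, RULES)
print(report)
```

The same rule list could be run at both steps (reporting country, then Eurostat), which is one way to avoid repeated sending and re-validating.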
ESS shared validation
Scenario: Autonomous / Interoperable validation services
Scenario: Replicated / Shared validation services
Scenario: Shared validation process
SDMX
• Statistical data and metadata exchange
• SDMX – adopted and used more and more in ESS
• Organisational context and logistics
• Developments for energy statistics
(Supply – transformation – consumption; Electricity and gas prices)
SDMX
Common open standards for data and metadata
• accepted worldwide for exchanging and sharing statistical
information
• and as a general basis for statistical infrastructures.
• Started in 2001 by seven international organisations
(BIS, ECB, Eurostat, IMF, OECD, UN and World Bank)
• ISO standard (ISO 17369:2013)
• SDMX Roadmap 2020 (March 2016):
• Strengthening implementation of SDMX
• Making data usage easier via SDMX
• Using SDMX to modernise statistical processes
• Better communication and capacity-building (including interaction with
ESS Vision 2020 and UNECE modernisation of official statistics)
SDMX
• Promoted by the European Statistical System Policy
and big enabler for the ESS VISION 2020
• reduce data errors
• improve timeliness
• improve accessibility
• improve interpretability
• improve coherence
• reduce the reporting burden
• reduce IT development and maintenance costs
(with open source approach, shared toolbox and improved interpretability).
For future IT development: standards independent of domain specific structures
"If each partner system were to use SDMX data structures and common IT building blocks,
international information systems would be able to communicate ‘machine-to-machine’ as in
industrial production processes."
• ESSC: SDMX one of the non-legislative normative documents to
be recognised as ESS standards
SDMX
Use of SDMX in Eurostat – ESS statistical domains
• 40% of European statistical production processes are now
describing their data structures using the SDMX standards and
concepts.
• ESS: 26 SDMX implementation projects, related to about ⅖
(38%) of all the data sets that Eurostat receives through
EDAMIS
• Further increase: 2016: +10; 2017: +4
• GLOBAL DSD: 5
https://webgate.ec.europa.eu/fpfis/mwikis/sdmx/index.php/SDMX_DSD_availability
SDMX
• statistical content and the information processing (IT)
perspectives
• Standards on logical and technical level:
• "Content"
• "Container"
• Development of SDMX artefacts according to SDMX
methodology, guidelines and concepts
For example information model, content-oriented guidelines,
cross-domain concepts
• Use of SDMX objects by generic programs/services
• "Embed logic in data objects"
SDMX
Organisation and logistics
• International agreement <= SDMX secretariat
• Support and management by SDMX IT architecture and
tools
for example a central repository (Euro SDMX registry)
• Maintenance responsibility by a selected international
agency
SDMX
For Energy Statistics:
• Supply-consumption chain => "semi-global" DSD
development with IEA
• Firstly collection/validation/production process
• Afterwards dissemination
• Eurostat DSD on Prices
• Firstly DSD (data structure definition)
• Afterwards MSD (metadata structure definition)
• Still later: use of VTL (Validation and Transformation Language)
DSD Energy statistics
• Supply-Consumption chain:
• Joint annual data collections (JAQ) of IEA-Eurostat-UNECE;
monthly data collections (joint, IEA, Eurostat)
• IRES and SIEC
International / Standards / Classification(s)
• SDMX guidelines:
• MODELLING A STATISTICAL DOMAIN FOR DATA EXCHANGE IN SDMX
https://sdmx.org/wp-content/uploads/Modelling-domain-SDMX-discussion-paperv1-201503.pdf
• THE DESIGN OF DATA STRUCTURE DEFINITIONS
https://sdmx.org/wp-content/uploads/SDMX_Guidelines_for_DSDs_1.0.pdf
• THE CREATION AND MANAGEMENT OF SDMX CODE LISTS
http://sdmx.org/wp-content/uploads/SDMX_Guidelines_for_CDCL.doc
DSD Energy statistics
Joint development Eurostat - IEA
• IEA had done work in 2009-2012
• In May 2015 Eurostat proposed to the IEA to take up DSD
development together
• Series of video conferences / mail exchanges since October
2015
• Explanatory document + Inventory code lists
• Testing DSD mapping with some questionnaires on-going
• Gradual verification/update
(revised monthly coal, new monthly electricity, revised JAQ 2017)
DSD Energy statistics
Concepts in the energy domain
• Definition (concept code name and description)
• Role (dimension, primary measure, attribute)
• Level (attribute relevant at observation, series or dataset
level)
• usage status (mandatory or conditional/optional attributes)
• code list / format
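The properties listed above could be captured in a small record per concept. This is only a sketch: the class and field names, the descriptions, the code list identifier CL_AREA and the optional status given to COMMENT_OBS are assumptions made for illustration, not taken from the DSD.

```python
from dataclasses import dataclass
from typing import Optional

ROLES = {"dimension", "primary measure", "attribute"}
LEVELS = {"observation", "series", "dataset"}


@dataclass(frozen=True)
class Concept:
    code: str                         # concept code name
    description: str
    role: str                         # one of ROLES
    level: Optional[str] = None       # attributes only: one of LEVELS
    mandatory: Optional[bool] = None  # attributes only: usage status
    code_list: Optional[str] = None   # None means a free format

    def __post_init__(self) -> None:
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role}")
        if self.role == "attribute" and self.level not in LEVELS:
            raise ValueError(f"attribute {self.code} needs an attachment level")


# Illustrative entries; descriptions and CL_AREA are assumptions.
concepts = [
    Concept("REF_AREA", "Reporting country or area", "dimension", code_list="CL_AREA"),
    Concept("OBS_VALUE", "Observed value", "primary measure"),
    Concept("COMMENT_OBS", "Short free text", "attribute",
            level="observation", mandatory=False),
]
dimension_codes = [c.code for c in concepts if c.role == "dimension"]
print(dimension_codes)
```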
DSD Energy statistics
Dimensions / structural principles (SDMX guidelines)
1. Parsimony:
no redundant dimensions for identifying
2. Simplicity:
keep identifiers short / keep number of dimensions low
3. Purity:
dimensions relate to one pure concept, not to a combination
4. Density and sparseness
("not available" values in the dimension combinations)
5. Unambiguousness
(avoid one observation being expressed by multiple combinations of dimension
values (keys))
6. Exhaustiveness
(includes every piece of information that is required to unambiguously represent
a data point and to correctly interpret it outside its usual context)
DSD Energy statistics
7. Orthogonality:
independence of the meaning of a value of one dimension from the values of
any other dimensions
8. User friendliness
(While a simple DSD consisting of a few dimensions only may be easier to
understand by a human data consumer, a more complex, but purer DSD is
typically more flexible in terms of further usage in automated processes.)
9. Fitness for use throughout the entire statistical business
process
• Re-use concepts / code lists
(frequency, observation status)
• Extensible for potential future needs
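As an illustration of the parsimony principle, the sketch below flags dimensions that could be dropped without making any two keys collide. It is a data-dependent heuristic, not an SDMX tool: the dimension PRODUCT_GROUP and the flow codes PROD and IMP are hypothetical, and a dimension flagged on a small sample may still be needed for the full data.

```python
from __future__ import annotations


def redundant_dimensions(keys: list[dict[str, str]]) -> list[str]:
    """Return dimensions whose removal leaves every key in `keys` unique."""
    if not keys:
        return []
    dims = list(keys[0])  # assumes all keys share the same dimensions
    redundant = []
    for dim in dims:
        reduced = {tuple(key[d] for d in dims if d != dim) for key in keys}
        if len(reduced) == len(keys):  # no two keys collapsed into one
            redundant.append(dim)
    return redundant


# Hypothetical sample: PRODUCT_GROUP is fully determined by ENERGY_PRODUCT.
sample_keys = [
    {"REF_AREA": "BE", "ENERGY_PRODUCT": "C0110", "PRODUCT_GROUP": "COAL", "MAIN_FLOW": "PROD"},
    {"REF_AREA": "BE", "ENERGY_PRODUCT": "C0121", "PRODUCT_GROUP": "COAL", "MAIN_FLOW": "PROD"},
    {"REF_AREA": "BE", "ENERGY_PRODUCT": "C0110", "PRODUCT_GROUP": "COAL", "MAIN_FLOW": "IMP"},
    {"REF_AREA": "NO", "ENERGY_PRODUCT": "C0110", "PRODUCT_GROUP": "COAL", "MAIN_FLOW": "PROD"},
]
print(redundant_dimensions(sample_keys))
```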
DSD Energy statistics
• Further design principles:
• designed independently of the layout or technical features of
existing EXCEL questionnaires and database structures in place
at IEA and Eurostat
• one DSD which is based on a clear logical model and flexible
enough to cover all data and metadata from all concerned
questionnaires
• "Remarks" sheets
(Attribute "COMMENT_OBS"; explanatory "free text" to future MSD)
DSD Energy statistics
14 DIMENSIONS (identifying concepts)
1. QUEST_SOURCE
2. REF_AREA
3. TIME_PERIOD
4. FREQ
5. ENERGY_PRODUCT
6. MAIN_FLOW
7. FLOW_BREAKDOWN
8. PLANT_TECH
9. PLANT_TYPE
10. STOCKS
11. INFRASTRUCTURE_IND
12. VIS_A_VIS_AREA
13. MEASURE_VALUE_TYPE
14. FACILITY_ID
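A sketch of how one observation could be identified by joining the values of the 14 dimensions into a dot-separated key, in the style used for SDMX keys. Treating all 14 concepts as key positions in the listed order, the placeholder "_Z" for a dimension that does not apply, and every code value in the example are assumptions for illustration.

```python
from __future__ import annotations

# Dimension order as listed on the slide; the order in the real DSD may differ.
DIMENSIONS = [
    "QUEST_SOURCE", "REF_AREA", "TIME_PERIOD", "FREQ", "ENERGY_PRODUCT",
    "MAIN_FLOW", "FLOW_BREAKDOWN", "PLANT_TECH", "PLANT_TYPE", "STOCKS",
    "INFRASTRUCTURE_IND", "VIS_A_VIS_AREA", "MEASURE_VALUE_TYPE", "FACILITY_ID",
]

NOT_APPLICABLE = "_Z"  # assumed placeholder for "does not apply"


def build_key(values: dict[str, str]) -> str:
    """Join dimension values in DSD order; unset dimensions get the placeholder."""
    unknown = set(values) - set(DIMENSIONS)
    if unknown:
        raise KeyError(f"not a dimension of this DSD: {sorted(unknown)}")
    return ".".join(values.get(dim, NOT_APPLICABLE) for dim in DIMENSIONS)


# Hypothetical example: monthly imports of coking coal into BE from NO.
key = build_key({
    "QUEST_SOURCE": "COAL_M", "REF_AREA": "BE", "TIME_PERIOD": "2017-01",
    "FREQ": "M", "ENERGY_PRODUCT": "C0121", "MAIN_FLOW": "IMP",
    "VIS_A_VIS_AREA": "NO",
})
print(key)
```

A fixed position per dimension keeps the key unambiguous even when several dimensions are not relevant for a given flow.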
DSD Energy statistics
REF_AREA
• SDMX promotes one code list across SDMX domains
• Based on the one of National Accounts
ENERGY_PRODUCT
• all primary and secondary energy products or commodities and their
aggregates as used in the energy questionnaires (and energy balances)
• Align to SIEC
• SIEC doesn't contain all our products
• Align to SDMX code list guidelines
• Codes based on SIEC hierarchical numbering
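The numbering pattern visible in the coal part of the code list (section 0 gives C0000, division 01 gives C0100, class 0121 gives C0121) can be sketched as a letter prefix followed by the SIEC code padded with zeros to four digits. The function name is hypothetical, and the prefix "C" is only confirmed here for coal; other product families and the products not covered by SIEC may follow different conventions.

```python
def siec_to_dsd_code(siec: str, prefix: str = "C") -> str:
    """Pad a SIEC section/division/group/class code to 4 digits and add a prefix."""
    if not siec.isdigit() or not 1 <= len(siec) <= 4:
        raise ValueError(f"expected 1 to 4 digits, got {siec!r}")
    return prefix + siec.ljust(4, "0")


for siec in ("0", "01", "012", "0121"):
    print(siec, "->", siec_to_dsd_code(siec))
```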
DSD Energy statistics
MAIN_FLOW and FLOW_BREAKDOWN
• two-level hierarchical approach
• according to IRES
• codes for MAIN_FLOW
1. Production
2. Net production (of electricity or heat)
3. Gross production (of electricity or heat)
4. Imports
5. Exports
6. International marine bunkers
7. International aviation bunkers
8. Stocks
9. Transfers
10. Supply
11. Statistical differences
12. Demand
13. Transformation
14. Consumption
15. Energy use
16. Non-energy use
17. Losses
Around 110 codes for FLOW_BREAKDOWN
DSD Energy statistics
MAIN_FLOW and FLOW_BREAKDOWN
• Many main flows are split into more detailed flows
• There are multiple electricity and heat production flows because of all possible
energy inputs, plant technologies and plant types.
• Stock changes apply to a broad diversity of types of stocks
• Import and export flows are detailed by the country/region of origin and the
country/region of destination, respectively
For some detailed flows additional dimensions are needed:
• PLANT_TECH
• PLANT_TYPE
• STOCKS
• VIS_A_VIS_AREA
DSD Energy statistics
PLANT_TECH:
• technologies used in plants for production of electricity and / or heat
• This code list is not a straight hierarchical classification:
different perspectives are used in classifying/grouping power and heat plants
(product based, single/multi-fired, and technical type of generation)
COMBFUEL - Combustible Fuels; HYDRO - Hydro (all, unspecified)
PLANT_TYPE:
• main classification of electricity and heat plants
MAINELEC - Main Activity Producer Electricity Plants
INFRASTRUCTURE_IND:
• A number of data in the questionnaires describe infrastructure characteristics
GROSSCAP - Gross capacity (of electricity and/or heat);
SOLARSUR - Solar collectors surface
DSD Energy statistics
MEASURE_VALUE_TYPE
several measurement concepts used in reporting of energy data values
ENERGY Measure of Heat or Electricity; NCV Net Calorific Value
VIS_A_VIS_AREA (COUNTERPART_AREA)
FACILITY_ID
an identifier key for the storage locations for gas and refineries (from the JAQ 2017
onwards)
DSD Energy statistics
OBS_VALUE
6 ATTRIBUTES:
UNIT_MEASURE
KT – Kilotonne; TJ_NCV - TeraJoule (NCV)
OBS_STATUS (SDMX Standard)
normal, missing, estimated
CONF_STATUS (SDMX Standard)
SUBMISSION (date of the submission of the questionnaire)
COMMENT_OBS (short free text related to one or more observations, dataset)
FACILITY_TYPE (types of gas storage facilities and refineries)
FACILITY_NAME
SDMX SIEC based code list ENERGY_PRODUCT
DSD code  Section  Division  Group  Class  SIEC label             Aggregation formula
C0000     0                                Coal                   C0000 = C0100 + C0200 + C0300 + C0370 + C0390
C0100              01                      Hard coal              C0100 = C0110 + C0120
C0110                        011    0110   Anthracite
C0120                        012           Bituminous coal        C0120 = C0121 + C0129
C0121                               0121   Coking coal
C0129                               0129   Other bituminous coal
C0200              02                      Brown coal             C0200 = C0210 + C0220
C0210                        021    0210   Sub-bituminous coal
C0220                        022    0220   Lignite
C0300              03                      Coal products          C0300 = C0310 + C0320 + C0330 + C0340 + C0350 + C0360
C0310                        031           Coal coke              C0310 = C0311 + C0312 + C0313 + C0314
C0311                               0311   Coke oven coke
C0312                               0312   Gas coke
(DSD code = SIEC-based code for DSD)
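The aggregation formulas in the code list lend themselves to an automated consistency check: each aggregate should equal the sum of its components. The sketch below uses three of the formulas shown; the function name, the tolerance and the reported quantities are made up for illustration.

```python
from __future__ import annotations

# Three formulas taken from the code list above (aggregate: components).
FORMULAS = {
    "C0100": ["C0110", "C0120"],
    "C0120": ["C0121", "C0129"],
    "C0200": ["C0210", "C0220"],
}


def check_aggregates(values: dict[str, float],
                     formulas: dict[str, list[str]],
                     tol: float = 1e-6) -> list[tuple[str, float, float]]:
    """Return (aggregate, reported, expected) where the reported total is off."""
    failures = []
    for total, parts in formulas.items():
        if total not in values or any(p not in values for p in parts):
            continue  # formula cannot be evaluated with the data at hand
        expected = sum(values[p] for p in parts)
        if abs(values[total] - expected) > tol:
            failures.append((total, values[total], expected))
    return failures


# Hypothetical reported quantities; C0100 should be 20 + 100 = 120.
reported = {"C0110": 20.0, "C0121": 30.0, "C0129": 70.0, "C0120": 100.0, "C0100": 125.0}
failures = check_aggregates(reported, FORMULAS)
print(failures)
```

Formulas whose components are not all reported are skipped rather than flagged, which keeps the check usable on partial questionnaires.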
Example monthly coal
Example monthly electricity
Eurostat ESS future
• A full global ESS Shared Validation / SDMX implementation is a
long term issue
• Awareness and knowledge building is highly needed
• Well informed, active participation to be able to endorse
together a realistic planning
• Before real production some pilot projects with volunteering
countries are needed
• Implementation shall take place in steps over years
• Avoid burden in the transition phases for reporting countries
• Allow options for "fast implementers"
Eurostat ESS future
"ESWG Task Force on the implementation of ESS shared
validation and SDMX "
• the development and review of data structure definitions, code lists and
other SDMX artefacts (including their validation and testing)
• development and assessment of the implementation (including timeline for
the implementation)
• establishment of specific operational guidelines for the energy domain,
including links to the ESS shared validation
• prepare and advise on recommended actions and implementations for the
ESWG meetings