Project Overview

EUROPEAN TECHNICAL ASSISTANCE PROGRAMME FOR VIETNAM
(ETV2)
Ministry of Planning and Investment, Ministry of Finance and Ministry of Science and Technology in partnership with the
European Commission
VNM/AIDCO/2002/0589
SDMX in the Vietnam Ministry
of Planning and Investment
A Data Model to Manage Metadata
and Data
ETV2 Component 5 – Facilitating better decision-making in
the Ministry of Planning and Investment
ETV2 Component 5
• Goals
− Assist Ministry of Planning and Investment (MPI) to
improve monitoring, analysis and decision-making
capability
− Focus on access to and quality of MPI information
• Major data-related Issues
− Diverse data from multiple sources
• from provinces, other ministries, businesses, etc
• inconsistent formats, definitions
• overlapping data in different areas of MPI
− No facilities to share and reuse data
• poor data and metadata management
• no central storage or registration of data
We are evaluating an SDMX-based
solution
• To hold and manage the metadata
− mostly “structural” metadata in SDMX terms
− but “reference” metadata will be added
• To link the data to the metadata
• To provide an environment where the data can
be managed in the context of the metadata
− for capture and storage
− for searching and browsing
− for retrieval and access
Why SDMX?
• Managing internal data does not seem to be SDMX's home
territory
− SDMX was designed for data and metadata exchange
• the data does come from lots of other
organisations
− but this is not the focus of the current work
• MPI has a pool of unmanaged, mission-critical
data
− with very little management of structural
metadata
− and with virtually no reference metadata
• although plenty of need for it
Why SDMX?
• The SDMX Conceptual Model is essentially about linking
metadata to data
− this model provides
• a framework for sharing and re-using structural
metadata
• a context within which data can be managed and used,
via the structural metadata
• a context within which reference metadata (eg quality
information) can be introduced and managed
− SDMX also provides basic tools to support use of this
Conceptual Model
• It is a considerable challenge to build these features
into most other approaches
− you have to devise and support your own SDMX-like
model
The MPI Data
• Regular tables (and other material)
− supposed to come regularly from the various source
organisations
• mostly do not arrive from all sources in a timely fashion
− little control over how they come
• usually email, often paper!
• may come as different versions at different times!
• quality and version status are an issue
− MPI works in Excel
• data that comes electronically generally comes in Excel
• where “metadata” exists it exists as Excel templates
− mostly stored on local machines
• hard to share within MPI departments, impossible to
share across them
− much of the data has security management
requirements
SDMX from the Conceptual Model
Perspective
• SDMX is usually described from a data exchange
perspective
− the terminology is a bit abstruse
− the UML makes developing a detailed
understanding a challenge
• I like to look at SDMX as a Conceptual Model
− and to relate all the SDMX jargon to important
concepts in the conceptual model
• I have a few slides to explain the model
• and to show how we might use it at MPI
− still work in progress
The major SDMX artefacts
• Structural Metadata
− contains information used to structure data and
metadata. This covers Concept and Category Schemes,
Code Lists and Hierarchical Classifications, and
Structure Definitions for Data and Metadata Sets
− the basic structural metadata components are the Item
Scheme and the Structure Definition
− Concept Scheme, Concept
• Concepts are organised into Concept Schemes
− Code List
• codes and their names and descriptions
− Hierarchical Code Set (Classifications)
• organises simple code lists into hierarchies
− Data Structure Definition, Metadata Structure Definition
• structures are built from Concepts and associated
Code Lists that define the valid content
− Category Scheme, Category
• possibly multiple category schemes to categorise
flows and allow searching and indexing of data and
metadata flows
• Flows – the heart of SDMX
− Data Flow, Metadata Flow
• Data and Metadata Flows are linked to a structure
that defines the format of the corresponding Data
Sets and Metadata Sets
• Data Provisioning Metadata
− Data Provider, Provision Agreement
• Provision Agreements are agreements by providers to
deliver data to a schedule according to a flow
structure
• Data
− the observed phenomena at specific points, as
identified by the values of the concepts comprising
the key of the observations
− Data Set
• potentially many data sets from many providers
conforming to the structure of the data flow
• Reference Metadata
− non-structural metadata that gives more information
about an object to make its interpretation more
meaningful
− quality, methodological, and conceptual information
are examples of Reference Metadata
− SDMX allows Reference Metadata to be published,
shared, and reused
− Metadata Set
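To make the relationship between Concepts, Code Lists and a Data Structure Definition more concrete, here is a minimal Python sketch. It is an illustration only, not the SDMX Information Model itself and not MPI's implementation: the class names, the `check_key` helper, and the code lists `CL_PROVINCE` and `CL_SECTOR` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CodeList:
    """Codes and their names, e.g. a list of provinces."""
    id: str
    codes: dict[str, str]  # code -> name


@dataclass
class Dimension:
    """A Concept used in a structure, tied to the Code List of valid values."""
    concept_id: str
    code_list: CodeList


@dataclass
class DataStructureDefinition:
    """Built from Concepts and the Code Lists that define valid content."""
    id: str
    dimensions: list[Dimension]  # together these form the observation key

    def check_key(self, key: dict[str, str]) -> list[str]:
        """Return the problems found in an observation key (empty if valid)."""
        problems = []
        for dim in self.dimensions:
            value = key.get(dim.concept_id)
            if value is None:
                problems.append(f"missing {dim.concept_id}")
            elif value not in dim.code_list.codes:
                problems.append(f"{dim.concept_id}: unknown code {value}")
        return problems


# Hypothetical code lists and structure, for illustration only.
provinces = CodeList("CL_PROVINCE", {"NB": "Ninh Binh", "HN": "Ha Noi"})
sectors = CodeList("CL_SECTOR", {"AGR": "Agriculture", "IND": "Industry"})
dsd = DataStructureDefinition(
    "DSD_INVESTMENT",
    [Dimension("PROVINCE", provinces), Dimension("SECTOR", sectors)],
)

print(dsd.check_key({"PROVINCE": "NB", "SECTOR": "AGR"}))
print(dsd.check_key({"PROVINCE": "XX"}))
```

The point of the sketch is that an observation is identified by the values of the concepts in its key, and the structure definition is what decides whether those values are valid.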
The SDMX top-level model
• Data Flow / Metadata Flow
− a description of a data or metadata “flow”: an
abstracted data or metadata set that will potentially
occur for many periods and from many providers (eg a
regular table received by MPI from various sources)
• Data Set / Metadata Set
− an instance data or metadata set from a particular
provider at a particular time, eg a particular table
from Ninh Binh province, for a particular period
• Provision Agreement
− indicates which Providers will provide what subset,
when, how often, and how
• Data Provider
− identifies the Data Providers, giving indicative and
contact information and linking to Provision Agreements
and actual data and metadata sets
• Data Structure Definition / Metadata Structure Definition
− describes the structure of the data or metadata flow:
all the metadata needed to request and understand an
instance of the flow (an actual data or metadata set)
− links to all other structural metadata
• Category Scheme
− categorises all the defined data and metadata flows,
providing a structuring framework and a basis for
searching
− links to other structural metadata
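The top-level model can be sketched as a handful of linked records. The Python below is a hedged illustration, not a real SDMX registry API: every class, field, identifier and file path is hypothetical, and the period "2008-Q1" is made up. It shows two things the model makes possible: finding flows by category, and spotting providers who have a Provision Agreement but have not yet delivered a Data Set for a period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFlow:
    id: str
    dsd_id: str     # structure that every data set in this flow follows
    category: str   # entry in a category scheme, used for searching


@dataclass(frozen=True)
class ProvisionAgreement:
    provider_id: str
    flow_id: str
    frequency: str  # how often the provider agreed to deliver


@dataclass(frozen=True)
class DataSet:
    provider_id: str
    flow_id: str
    period: str
    location: str   # where the received file is stored


def flows_in_category(flows, category):
    """Ids of all flows indexed under the given category."""
    return [f.id for f in flows if f.category == category]


def missing_deliveries(flow_id, period, agreements, data_sets):
    """Providers with an agreement for the flow but no data set for the period."""
    expected = {a.provider_id for a in agreements if a.flow_id == flow_id}
    received = {
        d.provider_id
        for d in data_sets
        if d.flow_id == flow_id and d.period == period
    }
    return sorted(expected - received)


# Hypothetical registry content, for illustration only.
flows = [DataFlow("DF_INVESTMENT", "DSD_INVESTMENT", "Investment")]
agreements = [
    ProvisionAgreement("NINH_BINH", "DF_INVESTMENT", "quarterly"),
    ProvisionAgreement("HA_NOI", "DF_INVESTMENT", "quarterly"),
]
data_sets = [
    DataSet("NINH_BINH", "DF_INVESTMENT", "2008-Q1", "store/ninh_binh_2008q1.xls"),
]

found = flows_in_category(flows, "Investment")
late = missing_deliveries("DF_INVESTMENT", "2008-Q1", agreements, data_sets)
print(found, late)
```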
SDMX at MPI
• What we envisage is
− an SDMX Registry/Repository
• code sets and classifications
− environment for standardising and harmonising
• Data Structure Definitions for all the regular data sets
received by MPI
− the “Data Flows”
• Categorisation schemes to index the Data Flows
− a data storage environment to hold the Data Sets
• initially probably a simple file store
• possibly a database store
• possibly a star-schema store
− with star schema design generated automatically from
structural metadata
− provides options for different “cuts” through data flows
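As a rough illustration of generating a star-schema design from structural metadata, the sketch below derives one dimension table per concept and one fact table per flow. It is an assumption about how such a generator might look, not the design evaluated at MPI: the function `star_schema_ddl`, the table naming scheme, and the example flow and concept ids are hypothetical. SQLite from the Python standard library is used only to check that the generated statements are valid SQL.

```python
import sqlite3


def star_schema_ddl(flow_id, dimensions, measure):
    """Build CREATE TABLE statements from a flow id, concept ids and a measure."""
    for name in [flow_id, measure, *dimensions]:
        # Names are pasted into SQL text, so accept plain identifiers only.
        if not name.isidentifier():
            raise ValueError(f"not a safe identifier: {name!r}")

    statements = []
    columns = []
    for dim in dimensions:
        col = dim.lower()
        # One dimension table per concept, holding the code list.
        statements.append(
            f"CREATE TABLE dim_{col} (code TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )
        columns.append(f"{col} TEXT NOT NULL REFERENCES dim_{col}(code)")

    columns.append(f"{measure.lower()} REAL")
    key = ", ".join(d.lower() for d in dimensions)
    # One fact table per flow, keyed by the full set of dimensions.
    statements.append(
        f"CREATE TABLE fact_{flow_id.lower()} ("
        + ", ".join(columns)
        + f", PRIMARY KEY ({key}))"
    )
    return statements


# Hypothetical flow and concepts, for illustration only.
ddl = star_schema_ddl("INVESTMENT", ["PROVINCE", "SECTOR", "PERIOD"], "OBS_VALUE")

conn = sqlite3.connect(":memory:")
for statement in ddl:
    conn.execute(statement)

tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

Because the tables are derived from the structure definition, a different "cut" through a flow would amount to choosing a different subset or ordering of dimensions before generating the schema.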
SDMX at MPI
• What we envisage (cont)
− intelligent interfaces to Excel
• using the structural metadata
• to support
− browsing and retrieval of data
− automatic generation of Excel templates
− data capture, registration, and management
− structural metadata browsing and management
− reference metadata definition and management
− reference metadata attachment
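Automatic template generation could work roughly as follows. This sketch is an assumption, not the planned Excel interface: it writes CSV text (which Excel can open) instead of a native workbook so that it needs only the Python standard library, and the function `build_template`, the concept ids, the codes and the period are all hypothetical. It produces one row per valid combination of codes, with the value column left blank for the provider to fill in.

```python
import csv
import io
from itertools import product


def build_template(dimensions, measure="OBS_VALUE"):
    """Return CSV text with a header row and one blank-valued row per key.

    dimensions maps each concept id to the list of codes allowed for it.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([*dimensions, measure])
    for combination in product(*dimensions.values()):
        writer.writerow([*combination, ""])  # value left empty for data entry
    return buffer.getvalue()


# Hypothetical structure for one provider and one period.
template = build_template(
    {
        "PROVINCE": ["NB"],
        "SECTOR": ["AGR", "IND"],
        "PERIOD": ["2008-Q1"],
    }
)
print(template)
```

Generating the template from the same structural metadata that is later used to check incoming data keeps capture and validation consistent with each other.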
More Information
• In the workshop sessions
• [email protected]