OBDM * Ontology Based Data Management System

The Italian Integrated System of
Statistical Registers
On the Design of an Ontology-based
Data Integration Architecture
R. Radini, M. Scannapieco, G. Garofalo
Italian National Institute of Statistics - Istat
Monica Scannapieco – Brussels, NTTS, 14-16 March 2017
Outline
 Introduction to ISSR
 OBDM and examples
 Data architecture
 Correspondence with EARF
 DV vs DW
 Conclusions
ISSR – Italian Integrated System of
Statistical Registers
 Istat has launched a modernization programme
aimed at a significant revision of statistical
production
 One of the main pillars of this revision is the
design of production processes based on an
Integrated System of Statistical Registers
 Single logical environment to support the consistency
of statistical production processes in Istat, in
particular consistency in “identification” and
“estimation” for the whole integrated system of units
and variables
ISSR: Types of Registers
RSB (Base registers)
contain several statistical populations and the minimum set of variables needed to characterize statistical units
RSE (Extended registers)
extend the information of a specific RSB for one of that RSB's populations
RST (Thematic registers)
support multiple statistical processes through a consistent and shared treatment of specific topics
OBDM
Ontology Based Data Management System
 Ontology (or computational ontology): conceptual data
representation expressed through «computational»
languages
 In mathematical logic: an axiomatic first-order theory expressible
in description logic
 OBDM is an integration system where the usual ER
global schema is replaced by the conceptual model of
the application domain formulated as an ontology
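As an illustration of the logical reading, a single ontology axiom can be written as a description logic inclusion and as the equivalent first-order sentence. The concept names are borrowed from the working relationships excerpt; the specific inclusion is an assumption made for illustration, not a statement taken from the Istat ontology.

```latex
\[
\mathit{Employee} \sqsubseteq \mathit{Worker}
\quad\Longleftrightarrow\quad
\forall x\,\bigl(\mathit{Employee}(x) \rightarrow \mathit{Worker}(x)\bigr)
\]
```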
OBDM Architecture
Main features
 Data source transparency property (called data virtualization by IT platform)
 Global view
 Consistency
[Diagram: Ontology, Mapping, Data source 1, Data source 2, Data source 3]
Three-level architecture:
 Ontology, Sources, Mapping
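The three levels can be sketched in a few lines of Python. This is a minimal illustration, not the Istat implementation: the source records, the field names, the `SelfEmployed` identifier and the assumption that employees and the self-employed are subconcepts of Worker are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List

# Level 1: sources (hypothetical in-memory stand-ins for real data sources).
SOURCE_1 = [
    {"tax_id": "A1", "contract": "employee"},
    {"tax_id": "B2", "contract": "self_employed"},
]
SOURCE_2 = [{"person_code": "C3", "job_type": "E"}]

# Level 2: ontology, reduced here to a concept hierarchy (child -> parent).
# The hierarchy itself is an assumption made for illustration.
ONTOLOGY = {"Employee": "Worker", "SelfEmployed": "Worker"}

# Level 3: mappings, each tying an ontology concept to a query over one source.
MAPPINGS: Dict[str, List[Callable[[], Iterable[str]]]] = {
    "Employee": [
        lambda: (r["tax_id"] for r in SOURCE_1 if r["contract"] == "employee"),
        lambda: (r["person_code"] for r in SOURCE_2 if r["job_type"] == "E"),
    ],
    "SelfEmployed": [
        lambda: (r["tax_id"] for r in SOURCE_1 if r["contract"] == "self_employed"),
    ],
}


def subconcepts(concept: str) -> List[str]:
    """Return the concept plus every concept the ontology places below it."""
    found = [concept]
    for child, parent in ONTOLOGY.items():
        if parent == concept:
            found.extend(subconcepts(child))
    return found


def instances(concept: str) -> List[str]:
    """Answer a concept query by unfolding the mappings of all subconcepts."""
    ids = set()
    for name in subconcepts(concept):
        for fetch in MAPPINGS.get(name, []):
            ids.update(fetch())
    return sorted(ids)


print(instances("Worker"))
```

The caller asks only for `Worker`; which sources hold the data, and under which field names, stays hidden behind the mappings. That is the data source transparency property in miniature.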
Excerpt of the Ontology of the Working
Relationships
[Diagram: concepts Working relationship, Employee, Self-employee, Worker]
Excerpt of the Population Ontology
[Diagram: concepts Family registry, Common law family, Family, Individual]
Data Integration: same concept
[Diagram: Individual (Population Ontology) and Individual (Working relationships ontology) are the same concept]
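A shared concept acts as the join point between the two ontologies. The sketch below merges records about the same individual coming from a population source and a working relationships source. The record layouts and the `individual_id` key are hypothetical, chosen only to show the idea.

```python
from collections import defaultdict

# Hypothetical records described by the Population ontology.
population = [
    {"individual_id": 1, "municipality": "Roma"},
    {"individual_id": 2, "municipality": "Milano"},
]

# Hypothetical records described by the Working relationships ontology.
working = [{"individual_id": 1, "relationship": "employee"}]


def integrate(*sources):
    """Merge records that refer to the same Individual into one description."""
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            merged[record["individual_id"]].update(record)
    return dict(merged)


individuals = integrate(population, working)
print(individuals[1])
```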
Querying over the ontology
Query: find the people resident in a given region and
classify them by age, educational degree and
employment condition
We do not need to know how the information is stored
in the sources!
Query → Ontology → Mapping → Query rewritten over the sources
 RS of Individuals: people that have residence in a certain region, classified by age and educational degree
 RS of Labour: by employment condition
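The rewriting step can be sketched with an in-memory SQLite database standing in for the two registers. Everything concrete here is an assumption for illustration: the table and column names, the sample rows, the attribute names in `MAPPING` and the `rewrite` helper. A real OBDM system derives the rewriting from the ontology and the mappings rather than from a hand-written dictionary.

```python
import sqlite3

# Two hypothetical sources: a register of individuals and a register of labour.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rs_individuals (id INTEGER PRIMARY KEY, region TEXT,
                                 age INTEGER, degree TEXT);
    CREATE TABLE rs_labour (person_id INTEGER, emp_condition TEXT);
    INSERT INTO rs_individuals VALUES
        (1, 'Lazio', 34, 'tertiary'),
        (2, 'Lazio', 51, 'secondary'),
        (3, 'Puglia', 29, 'tertiary');
    INSERT INTO rs_labour VALUES
        (1, 'employed'), (2, 'unemployed'), (3, 'employed');
""")

# Mapping: ontology-level attribute -> column in one of the sources.
MAPPING = {
    "region": "i.region",
    "age": "i.age",
    "educationalDegree": "i.degree",
    "employmentCondition": "l.emp_condition",
}
FROM_CLAUSE = "rs_individuals AS i JOIN rs_labour AS l ON l.person_id = i.id"


def rewrite(select, where_attr):
    """Rewrite a query phrased in ontology terms into SQL over the sources."""
    columns = ", ".join(f"{MAPPING[attr]} AS {attr}" for attr in select)
    return (f"SELECT {columns} FROM {FROM_CLAUSE} "
            f"WHERE {MAPPING[where_attr]} = ? ORDER BY i.id")


sql = rewrite(["age", "educationalDegree", "employmentCondition"], "region")
rows = conn.execute(sql, ("Lazio",)).fetchall()
print(rows)
```

The user names only ontology attributes such as `educationalDegree`. The join between the two registers appears only in the rewritten SQL.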
High expressive power
 It is possible to give different definitions of a concept depending on the instance
 It is possible to express different constraints related to each definition
CorporationManager has a different semantics according to the domain
[Diagram labels: CorporationManager, Labour Force, Employee, Self-employee, Corporation Manager, NationalAccount]
Data architecture
 Compliance with EARF (Enterprise Architecture Reference Framework)
[Diagram labels: Metadata Management, Unitary Metadata System, Primary Data Storage, Quality Assessment, Logical centralization of ISSR, Data consistency, OBDM]
Data architecture: IT View
Features                                      DV                      DW
Storage of Historical Data                    NO                      YES
Capture Every Change in Production Data       NO                      YES (requires integration with CDC)
Multi-Dimensional Data Structures             NO                      YES
Data Pre-Aggregation                          NO                      YES
Query performance on large amounts of data    SLOW (relative to DW)   FAST (relative to DV)
Data Integration on Demand                    YES                     NO
Operational Cost                              LOW (relative to DW)    HIGH (relative to DV)
Time-To-Market                                LOW (relative to DW)    HIGH (relative to DV)
Easy to Make Changes                          YES (relative to DW)    NO (relative to DV)
Dependence on IT                              LOW (relative to DW)    HIGH (relative to DV)
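The difference between on-demand integration and stored, pre-aggregated data can be sketched with SQLite. A view stands in for data virtualization (DV) and a materialized table stands in for a data warehouse (DW). The table names and rows are hypothetical, and real DV and DW platforms involve far more than this contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_jobs (person_id INTEGER, region TEXT);
    INSERT INTO source_jobs VALUES (1, 'Lazio'), (2, 'Lazio');

    -- DV style: a view, evaluated against the source at query time.
    CREATE VIEW dv_jobs_by_region AS
        SELECT region, COUNT(*) AS n FROM source_jobs GROUP BY region;

    -- DW style: a pre-aggregated copy, loaded once and stored.
    CREATE TABLE dw_jobs_by_region AS
        SELECT region, COUNT(*) AS n FROM source_jobs GROUP BY region;
""")

# The production source changes after the warehouse load.
conn.execute("INSERT INTO source_jobs VALUES (3, 'Lazio')")

query = "SELECT n FROM {} WHERE region = 'Lazio'"
dv_count = conn.execute(query.format("dv_jobs_by_region")).fetchone()[0]
dw_count = conn.execute(query.format("dw_jobs_by_region")).fetchone()[0]
print(dv_count, dw_count)
```

The view reflects the new row immediately but keeps no history. The stored table keeps the earlier state and answers without recomputing the aggregate, at the price of a load step that must be rerun.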
Conclusions
 EA approach for ISSR design and
implementation
 ISSR Data Architecture: Hybrid solution with DV
and DW
 E.g. DV-based data architecture with DW for historical
data and dissemination
 Next steps:
 Prototypes of RSB Individual, Families and
Cohabitations and RST Working Relationships
 Guidelines for the Management of the Integrated
System of Statistical Registers