The Italian Integrated System of Statistical Registers
On the Design of an Ontology-based Data Integration Architecture
R. Radini, M. Scannapieco, G. Garofalo
Italian National Institute of Statistics - Istat
Monica Scannapieco – Brussels, NTTS, 14-16 March 2017

Outline
- Introduction to ISSR
- OBDM and examples
- Data architecture
- Correspondence with EARF
- DV vs DW
- Conclusions

ISSR – Italian Integrated System of Statistical Registers
- Istat has undertaken a modernization programme aimed at a significant revision of statistical production
- One of the main pillars of this revision is the design of production processes based on an Integrated System of Statistical Registers
- A single logical environment to support the consistency of statistical production processes in Istat, in particular consistency in "identification" and "estimation" for the whole integrated system of units and variables

ISSR: Types of Registers
- RSB (Base registers) contain several statistical populations and the minimum set of variables needed to characterize statistical units
- RSE (Extended registers) extend the information of a specific RSB for one of that RSB's populations
- RST (Thematic registers) support multiple statistical processes through a consistent, shared treatment of specific topics

OBDM: Ontology Based Data Management System
- Ontology (or computational ontology): a conceptual data representation expressed through "computational" languages
- In mathematical logic: an axiomatic first-order theory expressible in description logic
- OBDM is an integration system where the usual ER global schema is replaced by the conceptual model of the application domain, formulated as an ontology

OBDM Architecture
Main features:
- Data source transparency property (called data virtualization by IT platform)
- Global view
- Consistency
Three-level architecture: Ontology, Sources, Mapping
[Diagram: Ontology, Mapping, Data source 1, Data source 2, Data source 3]

Excerpt of the Ontology of the Working Relationships
[Diagram concepts: Working relationship, Employee, Self-employee, Worker]

Excerpt of the Population Ontology
[Diagram concepts: Family registry, Common law family, Family, Individual]

Data Integration: same concept
- Individual (Population Ontology)
- Individual (Working relationships ontology)

Querying over the ontology
- Query: find the people who reside in a certain region and classify them by age, educational degree and employment condition
- We do not have to know how the information is stored in the sources!

Query, Ontology, Mapping
- The query posed over the ontology is rewritten over the sources through the mappings
- RS of Individuals: people that have residence in a certain region, classified by age and educational degree
- RS of Labour: classification by employment condition

High expressive power
- It is possible to give different definitions of a concept depending on the instance
- It is possible to express different constraints related to each definition
- CorporationManager has a different semantics according to the domain
[Diagram concepts: CorporationManagerLabourForce, Employee, Self-employee, CorporationManagerNationalAccount]

Data architecture
Compliance to EARF (Enterprise Architecture Reference Framework)
- Metadata Management
- Unitary Metadata System
- Primary Data Storage
- Quality Assessment
- Logical centralization of ISSR
- Data consistency
- OBDM

Data architecture: IT View

Features                                     DV                     DW
Storage of Historical Data                   NO                     YES
Capture Every Change in Production Data      NO                     YES (requires integration with CDC)
Multi-Dimensional Data Structures            NO                     YES
Data Pre-Aggregation                         NO                     YES
Query performance on large amounts of data   SLOW (relative to DW)  FAST (relative to DV)
Data Integration on Demand                   YES                    NO
Operational Cost                             LOW (relative to DW)   HIGH (relative to DV)
Time-To-Market                               LOW (relative to DW)   HIGH (relative to DV)
Easy to Make Changes                         YES (relative to DW)   NO (relative to DV)
Dependence on IT                             LOW (relative to DW)   HIGH (relative to DV)

Conclusions
- EA approach for ISSR design and implementation
- ISSR Data Architecture: hybrid solution with DV and DW, e.g. a DV-based data architecture with DW for historical data and dissemination
- Next steps:
  - Prototypes of RSB Individual, Families and Cohabitations and RST Working Relationships
  - Guidelines for the Management of the Integrated System of Statistical Registers
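
Illustrative sketch (not from the original slides)
The query-rewriting idea from the "Querying over the ontology" slides can be illustrated with a toy example. What follows is a minimal sketch, not the ISSR implementation: the two in-memory sources, their column names (pid, reg, age_cl, edu, cond), the ontology attribute names, the sample records and the functions rewrite and answer are all hypothetical. The user names only ontology-level attributes; the mapping layer decides which source holds each one.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for two source registers (invented rows).
RS_INDIVIDUALS = [
    {"pid": 1, "reg": "Lazio", "age_cl": "25-34", "edu": "tertiary"},
    {"pid": 2, "reg": "Lazio", "age_cl": "35-44", "edu": "secondary"},
    {"pid": 3, "reg": "Puglia", "age_cl": "25-34", "edu": "secondary"},
]
RS_LABOUR = [
    {"pid": 1, "cond": "employee"},
    {"pid": 2, "cond": "self-employed"},
    {"pid": 3, "cond": "employee"},
]
SOURCES = {"RS_INDIVIDUALS": RS_INDIVIDUALS, "RS_LABOUR": RS_LABOUR}

# Mapping layer (assumed): ontology attribute -> (source name, source column).
MAPPING = {
    "residence_region": ("RS_INDIVIDUALS", "reg"),
    "age_class": ("RS_INDIVIDUALS", "age_cl"),
    "educational_degree": ("RS_INDIVIDUALS", "edu"),
    "employment_condition": ("RS_LABOUR", "cond"),
}


def rewrite(attributes):
    """Rewrite ontology attributes into per-source column selections."""
    plan = {}
    for attr in attributes:
        source, column = MAPPING[attr]
        plan.setdefault(source, {})[attr] = column
    return plan


def answer(region, classify_by):
    """Count residents of `region`, grouped by the ontology attributes given."""
    plan = rewrite(["residence_region", *classify_by])
    merged = {}  # person id -> record expressed in ontology vocabulary
    for source, columns in plan.items():
        for row in SOURCES[source]:
            record = merged.setdefault(row["pid"], {})
            for attr, column in columns.items():
                record[attr] = row[column]
    counts = Counter(
        tuple(record[a] for a in classify_by)
        for record in merged.values()
        if record.get("residence_region") == region
        and all(a in record for a in classify_by)
    )
    return dict(counts)


if __name__ == "__main__":
    print(rewrite(["residence_region", "employment_condition"]))
    print(answer("Lazio", ["age_class", "educational_degree", "employment_condition"]))
```

The caller never mentions RS_INDIVIDUALS or RS_LABOUR, which is the data source transparency property in miniature. A real OBDM system would rewrite into queries executed at the sources rather than scanning Python lists.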