WP Item 6
The Expressions Language of Banca d’Italia (EXL)
5 June 2013
SDMX Technical Working Group
Luxembourg
History
Mid-1990s: Banca d’Italia designed a language for validations and calculations
2009: A new version of the EXL was released as part of the new Infostat software platform, containing the operators needed for validation and basic calculation
Ongoing: progressive upgrade of the EXL to support data compilation, for example:
– Operators for time series manipulation
– Operators for data analysis
– Upgrade of the operators’ syntax
Basic example of validation rule
Two collected datasets:
C1: Loans - Date, Entity, Sector, Amount
C2: Loans - Date, Entity, Geo_Area, Amount
Check rule:
C1 and C2 should be equal, within a small tolerance, when aggregated on their common dimensions
EXPRESSIONS:
C3 = get ( C1, keep (DATE, ENTITY, AMOUNT), sum (AMOUNT))
C4 = get ( C2, keep (DATE, ENTITY, AMOUNT), sum (AMOUNT))
C5 = check ( C3 - C4 <= given_threshold )
EXL operators:
Check (): special operator for checks
- (minus): subtract the multidimensional data
<= (less than or equal): comparison operator
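As an illustration only, the check rule can be mimicked in plain Python, assuming each dataset is a list of records with DATE, ENTITY and AMOUNT fields. The helpers `get_keep_sum` and `check`, the sample figures and the threshold are hypothetical and do not describe the EXL engine. Two deliberate differences from the slide: `keep` lists only the dimensions (the measure is passed separately), and the sketch compares the absolute difference, so that discrepancies in either direction are caught, which matches the stated intent of the rule.

```python
from collections import defaultdict

def get_keep_sum(rows, keep, measure):
    """Keep the listed dimensions and sum the measure over the dropped ones (hypothetical helper)."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[dim] for dim in keep)] += row[measure]
    return dict(totals)

def check(left, right, threshold):
    """One boolean per key: True if the two aggregates agree within `threshold`."""
    keys = set(left) | set(right)
    return {k: abs(left.get(k, 0.0) - right.get(k, 0.0)) <= threshold for k in keys}

# Loans by sector (C1) and by geographic area (C2); the figures are illustrative.
c1 = [
    {"DATE": "2013-03", "ENTITY": "B1", "SECTOR": "HH", "AMOUNT": 60.0},
    {"DATE": "2013-03", "ENTITY": "B1", "SECTOR": "NFC", "AMOUNT": 40.0},
]
c2 = [
    {"DATE": "2013-03", "ENTITY": "B1", "GEO_AREA": "IT", "AMOUNT": 90.0},
    {"DATE": "2013-03", "ENTITY": "B1", "GEO_AREA": "DE", "AMOUNT": 9.5},
]

c3 = get_keep_sum(c1, ("DATE", "ENTITY"), "AMOUNT")
c4 = get_keep_sum(c2, ("DATE", "ENTITY"), "AMOUNT")
c5 = check(c3, c4, threshold=1.0)
print(c5)  # {('2013-03', 'B1'): True}
```

The aggregates differ by 0.5, which is within the illustrative threshold of 1.0.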
Example of sum
Two collected datasets:
C1 (Current Accounts): Date, Entity, Sector, Amount
C2 (Mortgages): Date, Entity, Geo_Area, Amount
The desired result is Loans (= Current Accounts + Mortgages):
C5 (Loans): Date, Entity, Amount
EXPRESSIONS:
C3 = get ( C1, keep (DATE, ENTITY, AMOUNT), sum (AMOUNT))
C4 = get ( C2, keep (DATE, ENTITY, AMOUNT), sum (AMOUNT))
C5 = C3 + C4
EXL operators:
Get (): read the specified data
Keep (): keep the specified dimensions
Sum (): sum the specified measure (if quantitative)
+ (plus): add the multidimensional data
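A similar sketch, with illustrative data and hypothetical helper names, builds Loans as C3 + C4. The slides do not say how `+` treats a key present in only one operand; here such a key is taken as an implicit zero, which is an assumption.

```python
from collections import defaultdict

def get_keep_sum(rows, keep, measure):
    """Keep the listed dimensions and sum the measure over the dropped ones (hypothetical helper)."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in keep)] += row[measure]
    return dict(totals)

def add(left, right):
    """Add two datasets key by key; a key missing on one side counts as an implicit zero."""
    return {k: left.get(k, 0.0) + right.get(k, 0.0) for k in set(left) | set(right)}

current_accounts = [  # C1: DATE, ENTITY, SECTOR, AMOUNT
    {"DATE": "2013-03", "ENTITY": "B1", "SECTOR": "HH", "AMOUNT": 20.0},
    {"DATE": "2013-03", "ENTITY": "B1", "SECTOR": "NFC", "AMOUNT": 30.0},
    {"DATE": "2013-03", "ENTITY": "B2", "SECTOR": "HH", "AMOUNT": 5.0},
]
mortgages = [  # C2: DATE, ENTITY, GEO_AREA, AMOUNT
    {"DATE": "2013-03", "ENTITY": "B1", "GEO_AREA": "IT", "AMOUNT": 70.0},
]

c3 = get_keep_sum(current_accounts, ("DATE", "ENTITY"), "AMOUNT")
c4 = get_keep_sum(mortgages, ("DATE", "ENTITY"), "AMOUNT")
loans = add(c3, c4)  # C5 = C3 + C4
print(sorted(loans.items()))  # [(('2013-03', 'B1'), 120.0), (('2013-03', 'B2'), 5.0)]
```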
Validation
Formal (Structural)
– Assurance that the formal structure of the data observations matches the Data Structure Definition in terms of concepts, their roles and their admissible values; formal validation is not treated as a calculation and is not defined through an expression.
Of the Information Content (Plausibility)
– Assurance that the data content gives correct information about the real world (as far as possible); to this end, it is possible to use a priori information about the real world and the redundancies in the data (e.g. integrity rules, coherence rules, plausibility rules); validation rules of this kind are normally performed through calculations.
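To make the distinction concrete, the sketch below performs a formal check of one observation against a simplified, hypothetical Data Structure Definition: it only looks at concepts, roles and admissible values, and computes nothing from the data. A plausibility rule, such as the loans check earlier in this document, is instead an expression evaluated over the data. The dictionary layout of `DSD` and the function name are assumptions, not the SDMX DSD format.

```python
# Hypothetical, simplified Data Structure Definition: each concept has a role
# and, for coded dimensions, a set of admissible values (None = any value).
DSD = {
    "DATE": {"role": "dimension", "codes": None},
    "ENTITY": {"role": "dimension", "codes": {"B1", "B2"}},
    "SECTOR": {"role": "dimension", "codes": {"HH", "NFC"}},
    "AMOUNT": {"role": "measure", "codes": None},
}

def structural_errors(observation, dsd):
    """List formal (structural) problems of one observation; no calculation involved."""
    errors = [f"missing concept {c}" for c in sorted(set(dsd) - set(observation))]
    errors += [f"unknown concept {c}" for c in sorted(set(observation) - set(dsd))]
    for concept, value in observation.items():
        spec = dsd.get(concept)
        if spec is None:
            continue  # already reported as unknown
        if spec["codes"] is not None and value not in spec["codes"]:
            errors.append(f"{concept}={value!r} is not an admissible value")
        if spec["role"] == "measure" and not isinstance(value, (int, float)):
            errors.append(f"{concept} must be numeric")
    return errors

bad = {"DATE": "2013-03", "ENTITY": "B3", "SECTOR": "HH", "AMOUNT": "n/a"}
print(structural_errors(bad, DSD))
```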
Validations as calculations
• Validations use the same language as calculations
• Validations are possible in any phase of the process
• Validation results are treated like any other data; they:
  • are defined and stored
  • can be queried and disseminated
  • can be further processed
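The following sketch, with illustrative values and hypothetical names, stores the outcome of a check as a dataset keyed by (DATE, ENTITY) and then processes it further like any other data: it lists the failing keys and counts the failures per date.

```python
from collections import Counter

# Outcome of a check, stored like any other dataset: (DATE, ENTITY) -> passed?
check_result = {
    ("2013-03", "B1"): True,
    ("2013-03", "B2"): False,
    ("2013-06", "B1"): False,
    ("2013-06", "B2"): False,
}

# Further processing: select the failing keys, then aggregate them by date.
failures = sorted(key for key, passed in check_result.items() if not passed)
failures_per_date = Counter(date for (date, _entity) in failures)
print(failures)
print(dict(failures_per_date))
```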
SDMX Compliance
• The SDMX 2.0 and 2.1 versions already envisaged the introduction of a standard language for validations and calculations
• SDMX 2.1 package no. 13 (Transformations and Expressions) is a generic model for tracking the validation and calculation of data, derived from the CWM (Common Warehouse Metamodel), an OMG (Object Management Group) standard
• However, this model is not operational in itself, because it requires a language to specify the validation and calculation expressions
• The EXL is designed in accordance with SDMX package no. 13 (Transformations and Expressions)
SDMX IM – Package 13
Transformations: internal view
Einstein equation E = MC², written as E = M*(C**2)
[Figure: expression tree for E = M*(C**2). The operator node "*" takes the operand M and the operator node "**"; "**" takes the operand C and the constant 2; the result is E. Node kinds in the legend: constant node, reference nodes, operator nodes, expression nodes.]
Transformations: user view
Einstein equation E = MC², written as E = M*(C**2)
[Figure: the user sees the same calculation as a single expression, E = M*(C**2), with operands M, C and 2 and result E.]
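The two views can be sketched with a small expression tree. The class names `Constant`, `Reference` and `OperatorNode` are hypothetical stand-ins for the node kinds of the internal view, not the SDMX Information Model classes; `evaluate` walks the tree bottom-up (internal view), and `to_text` renders it back as the single expression the user sees.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Mapping, Union

@dataclass
class Constant:
    value: float  # constant node, e.g. the exponent 2

@dataclass
class Reference:
    name: str  # reference node: points to a named operand such as M or C

@dataclass
class OperatorNode:
    symbol: str  # how the operator is written in the user view
    func: Callable[[float, float], float]
    left: "Node"
    right: "Node"

Node = Union[Constant, Reference, OperatorNode]

def evaluate(node: Node, values: Mapping[str, float]) -> float:
    """Internal view: evaluate the operands first, then apply the operator."""
    if isinstance(node, Constant):
        return node.value
    if isinstance(node, Reference):
        return values[node.name]
    return node.func(evaluate(node.left, values), evaluate(node.right, values))

def to_text(node: Node) -> str:
    """User view: render the whole tree as one expression."""
    if isinstance(node, Constant):
        return str(node.value)
    if isinstance(node, Reference):
        return node.name

    def side(child: Node) -> str:
        text = to_text(child)
        return f"({text})" if isinstance(child, OperatorNode) else text

    return f"{side(node.left)}{node.symbol}{side(node.right)}"

# E = M*(C**2)
energy = OperatorNode("*", operator.mul, Reference("M"),
                      OperatorNode("**", operator.pow, Reference("C"), Constant(2)))
print(to_text(energy))                       # M*(C**2)
print(evaluate(energy, {"M": 2.0, "C": 3.0}))  # 18.0
```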
Notes on Transformations
The operands may be:
• Artefacts of the model (e.g. Statistical Data)
• Constants
• Operator nodes
The property of closure:
• The result is an artefact of the model (e.g. Statistical Data)
• The result may be an operand of other calculations
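Closure can be illustrated by letting operands be either datasets (standing in for Statistical Data) or constants, and by making every operator return a dataset, so that any result can feed the next calculation. Representing a dataset as a dict from key tuples to values, the function `binary`, and matching two datasets on their common keys only are all assumptions of this sketch.

```python
import operator
from typing import Callable, Dict, Tuple, Union

Key = Tuple[str, ...]
Dataset = Dict[Key, float]       # stands in for Statistical Data
Operand = Union[Dataset, float]  # an artefact or a constant

def binary(op: Callable[[float, float], float], left: Operand, right: Operand) -> Dataset:
    """Operator node: combine datasets and/or constants; the result is always a Dataset."""
    if isinstance(left, dict) and isinstance(right, dict):
        # Assumption: two datasets are matched on their common keys only.
        return {k: op(left[k], right[k]) for k in left.keys() & right.keys()}
    if isinstance(left, dict):
        return {k: op(v, right) for k, v in left.items()}
    if isinstance(right, dict):
        return {k: op(left, v) for k, v in right.items()}
    raise TypeError("at least one operand must be a dataset")

current_accounts = {("2013-03", "B1"): 50.0, ("2013-03", "B2"): 5.0}
mortgages = {("2013-03", "B1"): 70.0, ("2013-03", "B2"): 0.0}

# Two artefacts as operands.
loans = binary(operator.add, current_accounts, mortgages)
# Closure: an operator node's result is an operand again, next to a constant
# (e.g. rescaling thousands to units; the factor is illustrative).
loans_scaled = binary(operator.mul, binary(operator.add, current_accounts, mortgages), 1000.0)
```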
Graph of the calculations
[Figure: a network of datasets (C1, C2, C3 and so on) linked by transformations (T1, T2, T3 and so on). The groups shown are External Institutions, Banks & OFI’s reports, C.C.R., Economic research models, Supervision models, Statistical bulletin and Statistical products.]
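Because each calculation depends on the datasets it reads, a graph like this one has to be evaluated in dependency order. A minimal sketch with Python's standard `graphlib` module (Python 3.9 or later) and hypothetical dataset names computes such an order; it is one possible scheduling approach, not a description of the Banca d’Italia engine.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Hypothetical transformations: output dataset -> input datasets it is computed from.
transformations = {
    "C3": ("C1",),
    "C4": ("C2",),
    "C5": ("C3", "C4"),
    "C51": ("C5",),
}

def calculation_order(transformations):
    """Order datasets so that every input comes before the datasets derived from it."""
    graph = {output: set(inputs) for output, inputs in transformations.items()}
    return list(TopologicalSorter(graph).static_order())

order = calculation_order(transformations)
print(order)
```

`static_order()` raises `CycleError` if a calculation depends, directly or indirectly, on its own result, which is a useful consistency check on a graph of calculations.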
Software Tools
• Dictionary: a database containing all the definitions
• Warehouse: the set of data archives containing the data, logically unique but physically heterogeneous and distributed
• Tool for administering the metadata (create, modify, etc.), including the expressions for calculations and controls (built in-house)
• Tool for validating the syntax and consistency of the expressions and for translating them into the language of the calculation tools (based on the open-source parser generator ANTLR, driven by software built in-house)
• Execution of the expressions: the calculation engine of the software platform, based on an in-house software layer that interfaces with and controls the calculation software; the latter can vary, and currently comprises the open-source R, SQL, and some in-house software optimized for specific purposes.
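The translation step can be pictured with a toy translator for a single expression shape, `get(DATASET, keep(...), sum(MEASURE))`, into SQL. This is only a sketch: it uses a regular expression instead of an ANTLR grammar, the names `GET_KEEP_SUM` and `to_sql` are hypothetical, and the generated SQL is an assumption about what such a translation could look like. The result is run on an in-memory SQLite table to show that the query is valid.

```python
import re
import sqlite3

# Hypothetical pattern for one EXL expression shape:
#   get ( DATASET, keep (COL, COL, ...), sum (MEASURE))
GET_KEEP_SUM = re.compile(
    r"get\s*\(\s*(\w+)\s*,\s*keep\s*\(([^)]*)\)\s*,\s*sum\s*\(\s*(\w+)\s*\)\s*\)",
    re.IGNORECASE,
)

def to_sql(expression: str) -> str:
    """Translate a get/keep/sum expression into an SQL aggregation query."""
    match = GET_KEEP_SUM.fullmatch(expression.strip())
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    dataset, kept, measure = match.groups()
    columns = [c.strip() for c in kept.split(",") if c.strip()]
    # The kept columns other than the measure become the grouping dimensions.
    dims = ", ".join(c for c in columns if c != measure)
    # Identifiers are restricted to \w+ by the pattern before being interpolated.
    return f"SELECT {dims}, SUM({measure}) AS {measure} FROM {dataset} GROUP BY {dims}"

sql = to_sql("get ( C1, keep (DATE, ENTITY, AMOUNT), sum (AMOUNT))")

# Run the generated query on a small in-memory table (illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE C1 (DATE TEXT, ENTITY TEXT, SECTOR TEXT, AMOUNT REAL)")
conn.executemany(
    "INSERT INTO C1 VALUES (?, ?, ?, ?)",
    [("2013-03", "B1", "HH", 60.0), ("2013-03", "B1", "NFC", 40.0), ("2013-03", "B2", "HH", 5.0)],
)
rows = conn.execute(sql + " ORDER BY ENTITY").fetchall()
print(sql)
print(rows)
```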
Allowed Data
The EXL is applied to any kind of data of interest in the Bank of Italy statistical environment, such as:
– Dimensional data, including as particular cases
  • time series
  • cross sections
– Questionnaires
– Registers
The Bank of Italy is gradually extending the use of EXL to the whole statistical information system to support its industrial processing.
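As a small illustration of time series and cross sections being particular cases of dimensional data, the sketch below slices a (DATE, ENTITY) dataset by fixing one dimension. The data and function names are hypothetical.

```python
# A dimensional dataset: (DATE, ENTITY) -> AMOUNT, with illustrative values.
cube = {
    ("2013-03", "B1"): 100.0,
    ("2013-06", "B1"): 110.0,
    ("2013-03", "B2"): 40.0,
    ("2013-06", "B2"): 45.0,
}

def time_series(data, entity):
    """Fix the ENTITY dimension: one value per date, in date order."""
    return {date: value for (date, ent), value in sorted(data.items()) if ent == entity}

def cross_section(data, date):
    """Fix the DATE dimension: one value per entity at that date."""
    return {ent: value for (d, ent), value in sorted(data.items()) if d == date}

print(time_series(cube, "B1"))
print(cross_section(cube, "2013-06"))
```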
End-to-end processing
Phases: Design, Build, Collect, Process, Disseminate
Use case: production of the information for the ECB on the balance sheet of the monetary financial institutions (MFI) sector
– Collect: data on securities from MFIs on a security-by-security basis, and securities register data
– Check: structural and integrity checks
– Process:
  1. Data are integrated with information relevant to the collected security codes
  2. Missing observations are estimated
  3. Data are aggregated
– Disseminate: dataflow to the ECB
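The use case can be sketched as a chain of small functions, one per step. The slides do not say how missing observations are estimated or along which dimensions the data are aggregated, so the estimation rule (quantity times register price), the field names and the aggregation by MFI and issuer sector below are all hypothetical.

```python
from collections import defaultdict

# Illustrative inputs. MFI reports: one row per security held; VALUE may be missing.
mfi_reports = [
    {"MFI": "B1", "SECURITY": "S1", "QUANTITY": 10, "VALUE": 1000.0},
    {"MFI": "B1", "SECURITY": "S2", "QUANTITY": 5, "VALUE": None},
    {"MFI": "B2", "SECURITY": "S1", "QUANTITY": 2, "VALUE": 200.0},
]
# Securities register: issuer sector and price per security code (hypothetical fields).
register = {
    "S1": {"ISSUER_SECTOR": "GOV", "PRICE": 100.0},
    "S2": {"ISSUER_SECTOR": "MFI", "PRICE": 98.0},
}
REQUIRED = {"MFI", "SECURITY", "QUANTITY", "VALUE"}

def check(rows, register):
    """Structural and integrity checks: required fields present, security code registered."""
    errors = []
    for i, row in enumerate(rows):
        if not REQUIRED <= set(row):
            errors.append((i, "missing field"))
        elif row["SECURITY"] not in register:
            errors.append((i, "unknown security code"))
    return errors

def integrate(rows, register):
    """Step 1: add the register information for each collected security code."""
    return [{**row, **register[row["SECURITY"]]} for row in rows]

def estimate(rows):
    """Step 2 (hypothetical rule): estimate a missing VALUE as QUANTITY * PRICE."""
    return [
        {**row, "VALUE": row["QUANTITY"] * row["PRICE"]} if row["VALUE"] is None else row
        for row in rows
    ]

def aggregate(rows):
    """Step 3: sum VALUE by reporting MFI and issuer sector."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["MFI"], row["ISSUER_SECTOR"])] += row["VALUE"]
    return dict(totals)

errors = check(mfi_reports, register)
if errors:
    raise ValueError(f"checks failed: {errors}")
dataflow = aggregate(estimate(integrate(mfi_reports, register)))  # dataset to disseminate
print(dataflow)
```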
Some other characteristics
Formal: its syntax is expressed in Backus-Naur form
Deals with historicity:
– Takes into account the time validity of the artefacts
– Allows changes to the algorithm to be defined with reference to time
Can deal with:
– Mono- and multi-measure data
– Data attributes with a definable behaviour
– Operands with different dimensionality
– Subsets of dimensional cubes
– Implicit / explicit zeros
Allows:
– Persistent and non-persistent results
– Expressions as operands of other expressions
– Invocation of external routines
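Historicity can be pictured as a list of versions of one calculation, each valid from a start date, with the version in force looked up for a reference date. The version list, the expression `C5 = C3 + C4 + C6` and the function name are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical versions of one calculation, sorted by the date they take effect.
versions = [
    (date(2009, 1, 1), "C5 = C3 + C4"),
    (date(2012, 1, 1), "C5 = C3 + C4 + C6"),
]

def expression_valid_at(versions, ref_date):
    """Return the expression in force at ref_date."""
    starts = [start for start, _ in versions]
    i = bisect_right(starts, ref_date) - 1  # last version starting on or before ref_date
    if i < 0:
        raise LookupError(f"no version valid at {ref_date}")
    return versions[i][1]

print(expression_valid_at(versions, date(2011, 6, 30)))  # C5 = C3 + C4
print(expression_valid_at(versions, date(2012, 1, 1)))   # C5 = C3 + C4 + C6
```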
Operators used for validation (1)
Data retrieval / storage (Get, Put)
Projection (drop, keep …)
Filter (=, <, <=, >, >=, <>, like, between …)
Aggregation (sum, avg, min, max, first, last …)
Other manipulators of the data structure (rename, calc)
Join (merge)
Algebraic and string manipulation (+, -, *, /)
Comparison (=, <, <=, >, >=, <>)
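A few of these operators can be sketched over records held as Python dicts: filters (`between`, `like`), aggregation with an arbitrary function, and `rename`. All names and data are hypothetical, and `like` is assumed to use SQL-style wildcards; the slides do not specify EXL's pattern syntax.

```python
import re
from collections import defaultdict
from statistics import mean

rows = [
    {"DATE": "2013-03", "ENTITY": "B1", "SECTOR": "HH", "AMOUNT": 60.0},
    {"DATE": "2013-03", "ENTITY": "B1", "SECTOR": "NFC", "AMOUNT": 40.0},
    {"DATE": "2013-03", "ENTITY": "B2", "SECTOR": "HH", "AMOUNT": 10.0},
    {"DATE": "2013-06", "ENTITY": "B1", "SECTOR": "HH", "AMOUNT": 65.0},
]

def between(value, low, high):
    """Filter predicate: low <= value <= high (bounds included)."""
    return low <= value <= high

def like(value, pattern):
    """Filter predicate with SQL-style wildcards: % = any sequence, _ = one character."""
    regex = "".join(".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

def aggregate(rows, by, measure, func):
    """Group rows on the `by` dimensions and apply `func` (sum, mean, min, max, ...)."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[d] for d in by)].append(row[measure])
    return {key: func(values) for key, values in groups.items()}

def rename(rows, mapping):
    """Rename components according to `mapping`, leaving the others unchanged."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]

first_quarter = [
    r for r in rows if between(r["DATE"], "2013-01", "2013-03") and like(r["ENTITY"], "B%")
]
avg_by_entity = aggregate(first_quarter, ("ENTITY",), "AMOUNT", mean)
max_by_entity = aggregate(first_quarter, ("ENTITY",), "AMOUNT", max)
renamed = rename(first_quarter, {"AMOUNT": "LOANS"})
```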
Operators used for validation (2)
Logical (and, or, not)
Tailored for Validation:
– Check of a generic condition
– Existence and referential integrity check
– Completeness check
– Imbalance
– Error severity level
Conditional execution (case)
Currency conversion
Date-time (year, month, day, time shift)
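The validation-specific operators can be sketched as functions that return findings tagged with a severity level. The function names, the two-level `Severity` scale and the imbalance tolerance of 5.0 are hypothetical; the slides only list these operator categories.

```python
from enum import IntEnum

class Severity(IntEnum):
    """Hypothetical severity scale attached to each validation finding."""
    WARNING = 1
    ERROR = 2

def referential_integrity(data, register, position, severity=Severity.ERROR):
    """Existence check: the code at `position` in each key must exist in `register`."""
    return [
        (key, severity, f"unknown code {key[position]!r}")
        for key in sorted(data)
        if key[position] not in register
    ]

def completeness(data, expected_keys, severity=Severity.ERROR):
    """Completeness check: every expected key must have been reported."""
    return [(key, severity, "missing observation") for key in sorted(expected_keys - set(data))]

def imbalance(total, parts):
    """Imbalance: total minus the sum of its parts, key by key (missing parts count as zero)."""
    return {key: value - sum(part.get(key, 0.0) for part in parts) for key, value in total.items()}

data = {("2013-03", "B1"): 100.0, ("2013-03", "B9"): 5.0}
findings = referential_integrity(data, {"B1", "B2"}, position=1)
findings += completeness(data, {("2013-03", "B1"), ("2013-03", "B2")})

gaps = imbalance({("2013-03", "B1"): 100.0}, [{("2013-03", "B1"): 60.0}, {("2013-03", "B1"): 39.0}])
# Conditional execution: a large imbalance is an error, a small one only a warning.
gap_findings = [
    (key, Severity.ERROR if abs(gap) > 5.0 else Severity.WARNING, f"imbalance {gap}")
    for key, gap in gaps.items()
    if gap != 0.0
]
```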
WP Item 6
Expressions and Calculations