CORE Information Model

ESSnet
COmmon Reference Environment
CORE
Partner’s name: Statistics Netherlands
WP number and name: WP2: CORE Information Model
Deliverable number and name: 2.2 Generic statistical information model
CORE Information Model¹
(Final)
Partner in charge: Statistics Netherlands
Version: 1.0
Date: September 23 2011
Version | Changes | Changed by | Date
0.1 | First draft | CBS | 29-04-2011
0.2 | Second draft; comments of partners have been processed | CBS | 27-05-2011
0.3 | | CBS | 10-06-2011
0.4 | Channels made explicit | CBS | 08-07-2011
0.5 | References added | CBS | 16-08-2011
0.6 | Final comments processed | CBS | 23-09-2011
¹ This document is distributed under Creative Commons licence "Attribution-Share Alike - 3.0", available at the Internet site:
http://creativecommons.org/licenses/by-sa/3.0
Date of dissemination
Version
Page
September 23 2011
1.0
1
Summary
This document presents the CORE information model: a protocol of communication between
statistical services.
Keywords: CORA, CORE, information model, interface model, communication interface, statistical
service, GSIM, GSBPM
Contents
1 Introduction
1.1 Structuring the interface: channels
1.2 Using the interface: messages
1.2.1 Service signature message
1.2.2 Service configuration message
1.2.3 Service execution message
1.2.4 Service output message
1.3 Design principles
1.4 Requirements met by the model
1.5 Possible extensions
1.6 References
2 Package structure
2.1 Overview
2.2 Class diagram
3 Package contents
3.1 Data set definitions package
3.1.1 Overview
3.1.2 Class diagram
3.1.3 Detailed description
3.2 Expressions package
3.2.1 Overview
3.2.2 Class diagram
3.2.3 Detailed description
3.3 Parameters package
3.3.1 Overview
3.3.2 Class diagram
3.3.3 Detailed description
3.4 Rules package
3.4.1 Overview
3.4.2 Class diagram
3.4.3 Detailed description
3.5 Channels package
3.5.1 Overview
3.5.2 Class diagram
3.5.3 Detailed description
3.6 Messages package
3.6.1 Overview
3.6.2 Class diagram
3.6.3 Detailed description
3.7 Service info package
3.7.1 Overview
3.7.2 Class diagram
3.7.3 Detailed description
4 Statistical interpretation of the information model
4.1 Micro data sets
4.1.1 Rationale
4.1.2 Description
4.1.3 Detailed description
4.1.4 Example
4.2 Dimensional data sets
4.2.1 Rationale
4.2.2 Description
4.2.3 Detailed description
4.2.4 Example
4.3 Classification structures
4.3.1 Rationale
4.3.2 Description
4.3.3 Detailed description
4.3.4 Example
4.4 Logging data sets
4.4.1 Rationale
4.4.2 Description
4.4.3 Detailed description
4.4.4 Example
4.5 Quality data sets
4.5.1 Rationale
4.5.2 Description
4.5.3 Detailed description
4.5.4 Example
1 Introduction
The purpose of this document is to define a communication protocol for the exchange of
information between a CORA service and its environment. More specifically, the protocol
describes the information elements a service receives in order to be configured, the elements
it subsequently receives as input upon execution, and finally the elements it offers as output
after execution. In this way, the interface between two services, or between a service and a
service execution mechanism (i.e., a run-time execution engine), is established.
The term interface in this document is used as a shortcut for communication interface, and
refers to the means of transferring information to and from statistical services. The term model
is used as a shortcut for information model, and refers to the model used to describe the
interface. Both the interface and the model form the subject of this document.
The information model is an abstract model composed of classes and relations that are meant
to be stable over time. It is meant to support the definition of a more concrete model, the
interface model, which is expected to vary over time.
This two-step approach has been chosen because it reduces the maintenance burden of a model
that will inevitably be subject to change as it is confronted with the concrete reality of
managing statistical processes. By offering an abstract information model composed of
objects that should be maintenance-free, we deliver the means to define the interface model in
terms of these objects, and to maintain it by varying the use of these objects. This allows us to
transfer responsibility for the interface model to the designers of the interface, and allows
them in turn to transfer this responsibility to the community of users. We therefore do not
deliver a model that is frozen at the end of the CORE project, but a flexible model that can be
maintained as and when required.
This is similar to the game of chess: while its rules are simple, immutable and devoid of any
strategy, one needs to learn how to apply them in order to play well. In the same way, the
information model is simple and, we hope, immutable, but we will have to agree on how to use
it to implement and maintain the concrete interface model.
1.1 Structuring the interface: channels
A message can transfer various kinds of information. In order to give structure to the message
and meaning to the information transferred, the interface is subdivided into channels. Each
kind of information goes through its own channel. The available channels determine what
kind of information can be transferred to or from a service.
For efficient use of the interface, it is essential to rely on a limited number of predefined
channels. It is, however, not the object of this document to supply a complete set of
predefined channels. The model offers a means to design channels, together with a few
examples of designed channels. It will be the task of the Work Package implementing the
interface to specify the standard set of channels (and thus the interface model).
There are five types of channels:
• Data set kind
• Rule kind
• Parameter
• Business object
• Free-style argument
These channel types are described in Section 3.5.
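As a minimal sketch, using Python purely for illustration, the five channel types can be captured as an enumeration, with a channel pairing a name with its type. The names `ChannelType`, `Channel`, `channels_of_type` and the example channels are hypothetical; the standard set of channels is left to the Work Package implementing the interface.

```python
from dataclasses import dataclass
from enum import Enum


class ChannelType(Enum):
    """The five channel types of the information model."""
    DATA_SET_KIND = "Data set kind"
    RULE_KIND = "Rule kind"
    PARAMETER = "Parameter"
    BUSINESS_OBJECT = "Business object"
    FREE_STYLE_ARGUMENT = "Free-style argument"


@dataclass(frozen=True)
class Channel:
    """A named channel through which one kind of information flows (hypothetical class)."""
    name: str
    channel_type: ChannelType


# Hypothetical example channels for an editing service.
CHANNELS = [
    Channel("micro data input", ChannelType.DATA_SET_KIND),
    Channel("editing rules", ChannelType.RULE_KIND),
    Channel("max iterations", ChannelType.PARAMETER),
    Channel("tool script", ChannelType.FREE_STYLE_ARGUMENT),
]


def channels_of_type(channels, channel_type):
    """Names of the channels that carry the given type of information."""
    return [c.name for c in channels if c.channel_type is channel_type]
```

For example, `channels_of_type(CHANNELS, ChannelType.RULE_KIND)` returns `["editing rules"]`.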
1.2 Using the interface: messages
We consider four types of messages, two of which communicate information from a service to
its environment (a service signature message and a service output message), while the other
two communicate in the opposite direction (a service configuration message and a service
execution message). Each of the four is described in more detail in the sections below.
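The four message types and their directions can be sketched as follows. The ordering enforced by the hypothetical `follows_protocol` function is an assumption drawn from the Introduction and is not prescribed by the model. Under this ordering, the signature comes first, configuration precedes execution, each execution is answered by an output, and reconfiguration is allowed.

```python
from enum import Enum, auto


class MessageType(Enum):
    SIGNATURE = auto()
    CONFIGURATION = auto()
    EXECUTION = auto()
    OUTPUT = auto()


# Direction of each message: True if it travels from the service to its environment.
FROM_SERVICE = {
    MessageType.SIGNATURE: True,
    MessageType.OUTPUT: True,
    MessageType.CONFIGURATION: False,
    MessageType.EXECUTION: False,
}

# Assumed ordering (not prescribed by the model): (state, message) -> next state.
TRANSITIONS = {
    ("new", MessageType.SIGNATURE): "described",
    ("described", MessageType.CONFIGURATION): "configured",
    ("configured", MessageType.CONFIGURATION): "configured",  # most recent configuration applies
    ("configured", MessageType.EXECUTION): "executing",
    ("executing", MessageType.OUTPUT): "configured",
}


def follows_protocol(messages):
    """True if the sequence of messages respects the assumed ordering."""
    state = "new"
    for message in messages:
        state = TRANSITIONS.get((state, message))
        if state is None:
            return False
    return True
```

A lookup table keeps the assumed ordering in one place, so it can be changed if the interface model settles on different rules.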
1.2.1 Service signature message
With a service signature message, a service communicates to its environment the channels
that it supports. Channels constrain the kinds of information a service expects during
execution or during configuration, as well as the kinds of information it will produce during
execution. Information that is exchanged between a service and its environment is limited to
data sets, rules, parameters, business objects and free-style arguments, supported in
predefined channels. Section 1.1 above describes the meaning and use of channels; Chapter 3
describes in more detail the structures of data sets, rules, parameters and business objects, as
well as the channels that support them.
It is important to note that the information a service expects can be described only in terms of
the channels it supports, i.e. data set kinds, rule kinds, parameter types and/or business object
types. This document does not fix or prescribe the standard data set kinds; it only offers a few
examples, such as a micro data set, a dimensional data set or a logging data set. Thus a service
may state, e.g., that it expects one micro data set and that it will produce one dimensional data
set plus a logging data set, without specifying the details of these data sets. Although these
details (i.e., the number of columns and the value type of each column) may be prescribed by
a service in its service signature message, they may also be postponed to a service
configuration message (see below). In the latter case, the environment (another service, an
execution mechanism, etc.) has the opportunity to fill in the details of a data set the service will
receive or produce during execution.
Other information that can be communicated in a service signature message includes the
expected rule kinds, expected single-valued parameter types (through the Parameter channel
type) and business object types. Actual rules and parameter values are offered to a service
through a service configuration message.
Figure: Service signature message. The environment asks Service A "What is your signature?"; through its channels, Service A answers: "I expect a fixed number of data sets of the following kind: […] I expect an unbounded number of rules of the following kind: […] I expect values for the following other parameters: […]"
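A service signature could be represented roughly as below. This is a sketch under stated assumptions. The classes `SignatureChannel` and `ServiceSignature` and the fields `direction`, `count` and `columns` are hypothetical. A `count` of `None` stands for an unbounded number of items, and `columns=None` marks data set details that are postponed to a configuration message. The data sets mirror the example in the text: one micro data set in, and one dimensional data set plus one logging data set out. The logging columns are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class ChannelType(Enum):
    DATA_SET_KIND = "Data set kind"
    RULE_KIND = "Rule kind"
    PARAMETER = "Parameter"
    BUSINESS_OBJECT = "Business object"
    FREE_STYLE_ARGUMENT = "Free-style argument"


@dataclass
class SignatureChannel:
    name: str
    channel_type: ChannelType
    kind: str                                 # e.g. a data set kind or a rule kind
    direction: str                            # "in" or "out" (hypothetical field)
    count: Optional[int] = 1                  # None stands for an unbounded number
    columns: Optional[Dict[str, str]] = None  # column -> value type; None = postponed


@dataclass
class ServiceSignature:
    service_name: str
    channels: List[SignatureChannel] = field(default_factory=list)

    def postponed(self) -> List[str]:
        """Data set channels whose details are left to a configuration message."""
        return [c.name for c in self.channels
                if c.channel_type is ChannelType.DATA_SET_KIND and c.columns is None]


signature = ServiceSignature("Service A", [
    SignatureChannel("input", ChannelType.DATA_SET_KIND, "micro data set", "in"),
    SignatureChannel("result", ChannelType.DATA_SET_KIND, "dimensional data set", "out"),
    # Hypothetical logging columns, fixed already in the signature.
    SignatureChannel("log", ChannelType.DATA_SET_KIND, "logging data set", "out",
                     columns={"message": "string", "severity": "integer"}),
    SignatureChannel("edit rules", ChannelType.RULE_KIND, "editing rule", "in", count=None),
])
```

Here `signature.postponed()` yields `["input", "result"]`, the data sets whose details the environment still has to supply.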
1.2.2 Service configuration message
Through a service configuration message, the environment of a service may communicate
details about data sets that will be offered to the service during execution. These details
comprise, e.g., the number of columns a data set consists of, as well as the value types for
each of these columns (see Section 3). This information must fit in the channels specified in
the service signature message (see paragraph 1.2.1).
Additionally, actual rules and parameter values are communicated through a service
configuration message. These will affect the way a service behaves during execution. Rules
and parameters must also be consistent with the channels given in the service signature
message. As an example of a parameter value, one may think of a maximum number of times
a detection-correction loop is executed in an editing service. Examples of rules are editing
rules, aggregation formulas, etc.
Figure: Service configuration message. Through Service A's channels, the environment tells it: "The data sets you'll receive during execution are of the following form: […] Please receive the following rules and other parameters: […]"
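One way to express the requirement that a configuration must fit the channels of the signature is a validation function. The following is a minimal sketch, assuming that a signature is a dictionary of hypothetical `Channel` objects and that parameter channels declare a Python type standing in for a value type. Rules and data set details are checked only for the existence of their channel. The "max iterations" parameter echoes the detection-correction example above.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    channel_type: str          # "data set kind", "rule kind", "parameter", ...
    value_type: type = object  # declared type of a single-valued parameter (assumed)


def check_configuration(signature, configuration):
    """Return a list of problems; an empty list means the configuration fits.

    `signature` maps channel names to Channel objects; `configuration` maps channel
    names to the supplied details (column definitions, rules or a parameter value).
    """
    problems = []
    for name, supplied in configuration.items():
        channel = signature.get(name)
        if channel is None:
            problems.append(f"{name}: no such channel in the signature")
        elif channel.channel_type == "parameter" and not isinstance(supplied, channel.value_type):
            problems.append(f"{name}: expected {channel.value_type.__name__}")
    return problems


signature = {
    "input": Channel("input", "data set kind"),
    "edit rules": Channel("edit rules", "rule kind"),
    "max iterations": Channel("max iterations", "parameter", int),
}
```

Supplying `{"max iterations": "ten"}` yields `["max iterations: expected int"]`, which illustrates how typed parameters limit the values a service may receive.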
1.2.3 Service execution message
Through a service execution message, a service is requested to execute and is offered a
number of data sets and business objects as input. This information must comply with the
service's signature message (i.e., it must fit in the channels supported by the service's
interface) and with the data set and business object details the service has been configured
with through the most recent service configuration message it has received. Note that input is
limited to actual data sets and business objects; all other parameters must be received by the
service at configuration time.
Figure: Service execution message. Through Service A's channels, the environment tells it: "Please start execution and receive the following data sets as arguments: […]"
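The compliance check for an execution message can be sketched as follows. This assumes that configured data set definitions map column names to Python types and that data sets arrive as lists of row dictionaries. The function `check_execution` is hypothetical, and business objects are left out of the sketch. Any channel that is not a configured data set channel is rejected, reflecting the rule that parameters must arrive at configuration time.

```python
from typing import Dict, List


def check_execution(configured: Dict[str, Dict[str, type]],
                    inputs: Dict[str, List[dict]]) -> List[str]:
    """Check the data sets of an execution message against the configured definitions.

    configured: data set channel -> {column name: value type}, taken from the most
    recent configuration message. inputs: data set channel -> list of rows.
    """
    problems = []
    for channel, rows in inputs.items():
        columns = configured.get(channel)
        if columns is None:
            # Parameters and rules belong in the configuration message, not here.
            problems.append(f"{channel}: not a configured data set channel")
            continue
        for i, row in enumerate(rows):
            if set(row) != set(columns):
                problems.append(f"{channel} row {i}: unexpected columns {sorted(row)}")
                continue
            for column, value_type in columns.items():
                if not isinstance(row[column], value_type):
                    problems.append(f"{channel} row {i}: {column} is not {value_type.__name__}")
    return problems


configured = {"input": {"id": str, "turnover": float}}
```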
1.2.4 Service output message
A service ends its execution by sending a service output message, in which the result of the
execution is documented by a number of data sets that must conform to the service's signature
message. The output data sets may also include logging information recorded as logging data
sets. Note, however, that this document does not prescribe the kinds of data sets a service is
expected to deliver as output; it only gives examples of and suggestions about these kinds.
Figure: Service output message. The environment asks Service A "What is your output?"; through its channels, Service A answers: "Please receive the following data sets as my output: […]"
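A minimal sketch of the conformance check for an output message follows. It assumes that the output channels of the signature are given as a mapping from channel name to the expected number of data sets, with `None` meaning unbounded. The function name, the channel names and the placeholder data sets are hypothetical.

```python
from typing import Dict, List, Optional


def check_output(output_channels: Dict[str, Optional[int]],
                 output_message: Dict[str, List[object]]) -> List[str]:
    """Compare the data sets in an output message with the signature's output channels."""
    # Data sets sent through channels the signature does not declare.
    problems = [f"{name}: not an output channel of the signature"
                for name in output_message if name not in output_channels]
    # Declared channels with a fixed count must deliver exactly that many data sets.
    for name, expected in output_channels.items():
        produced = len(output_message.get(name, []))
        if expected is not None and produced != expected:
            problems.append(f"{name}: expected {expected} data set(s), got {produced}")
    return problems


# One dimensional result and one logging data set, as in the signature example.
signature_outputs = {"result": 1, "log": 1}
```

A service that forgets its logging data set would get `["log: expected 1 data set(s), got 0"]`.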
1.3 Design principles
The CORE information model has been designed in accordance with the principles below.
• Rectangular data sets: all data sets that are communicated to services, or that are output
from a service's execution, are rectangular. This means that a data set consists of rows and
columns of data, and that each column contains data of one value type only. Data that is not
rectangular in this way must be split across several rectangular data sets and communicated
to a service in that fashion. Though stated as a principle, the restriction to rectangular data sets
may be dropped in future versions of the information model. Possible extensions of the
information model may include, e.g., hierarchical data sets (see Section 1.5).
• Strong typing: each column of data is of one value type only. In this way, a data set is
(strongly) typed. Rules and parameters, including single-valued parameters, are typed as well.
This means that a rule can be applied to (a row of) data only if the types of the rule and the
(row of) data match. In the same way, actual parameter values are limited to values of a given
predefined type. Strong typing thus restricts the data to which a rule can be applied, the data
sets that can be offered to a service during execution, and the parameter values a service may
receive during configuration. In this way, safe execution of the service is promoted.
• Kinds: the information model leaves open the actual form or structure of data sets, but
instead provides a mechanism for specifying such a form or structure through data set kinds.
Data set kinds are a type of channel and constrain the data sets that can be transferred. Thus
the information model takes an open approach to the data that must be communicated with the
four message types described above. Actual interpretations of data set kinds (e.g., the
declaration that a micro data set consists of variables, or that a dimensional data set consists of
dimensions and measures) will have to be fully specified as part of the concrete
interface model, and are given in this document only as examples and as
suggestions for use. They are not part of the abstract information model.
The same holds for rule kinds: specific rule kinds will contribute to the definition
of the interface model; they are not part of the information model.
• Free-style arguments: rules and parameters that cannot be expressed in the format offered by
the information model can instead be expressed as free-style arguments and presented as free
text to a service. Examples of free-style arguments are scripts expressed in a tool-dependent
format. It should be emphasized, however, that free-style arguments are seen as a last resort,
i.e., to be used only if the translation to the format of rules and parameters fails. Service
communication should be approached in a tool-independent way as far as possible. Free-style
is one of the standard parameter types, and thus belongs to the interface model as one of the
standard channels.
• Business objects: modelling business objects is outside the scope of this model. The model
knows of the existence of business objects, but nothing else about them. Such objects should
be modelled with a framework specifically developed for them, such as the GSIM (Generic
Statistical Information Model). It is anticipated that business objects will be stored in XML;
for this reason, the present information model supports XML for transferring such objects to
and from services. A channel supporting the transfer of business objects is fully defined by an
XML schema or a DSD, and the objects themselves are supplied in XML.
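The rectangular-data and strong-typing principles above can be illustrated with a small sketch. It assumes that Python types stand in for value types and that a rule declares the columns, with their types, that it reads. The `DataSet` and `Rule` classes and the rule's `holds` predicate are hypothetical and are not the classes of the information model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class DataSet:
    """A rectangular data set: every row has one value per column, of that column's type."""
    columns: Dict[str, type]   # column name -> value type (Python types as stand-ins)
    rows: List[Tuple]

    def __post_init__(self):
        types = list(self.columns.values())
        for row in self.rows:
            if len(row) != len(types):
                raise ValueError(f"row {row!r} is not rectangular")
            for value, value_type in zip(row, types):
                if not isinstance(value, value_type):
                    raise TypeError(f"{value!r} is not a {value_type.__name__}")


@dataclass
class Rule:
    """A typed rule: it names the columns (and their types) that it reads."""
    input_types: Dict[str, type]
    holds: Callable[[dict], bool]   # hypothetical predicate over one row

    def applicable_to(self, data: DataSet) -> bool:
        # Strong typing: the rule applies only if every column it reads has the same type.
        return all(data.columns.get(name) is t for name, t in self.input_types.items())

    def violations(self, data: DataSet) -> List[Tuple]:
        if not self.applicable_to(data):
            raise TypeError("the types of the rule and the data set do not match")
        names = list(data.columns)
        return [row for row in data.rows if not self.holds(dict(zip(names, row)))]


data = DataSet({"id": str, "turnover": float}, [("a", 10.0), ("b", -1.0)])
non_negative = Rule({"turnover": float}, lambda row: row["turnover"] >= 0)
```

With these definitions, `non_negative.violations(data)` returns `[("b", -1.0)]`. A rule expecting an integer turnover would be refused before it touches any row.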
1.4 Requirements met by the model
The table below lists the features from [1] that the information model sketched in this
document supports:
Id | Feature | Description | Comment
FT01 | Platform independence | The communication interface is agnostic of all things belonging to the domains of hardware and operating systems. | The information model is expressed as UML classes, a notation that does not depend on any platform or operating system.
FT02 | Service orientation | The communication interface is the link that connects services and channels information between them. | Channelling information is done through messages, which the information model describes.
FT03 | Layer concept | The communication interface exposes the CORA layer of the service it belongs to. It is, however, agnostic of the set of layers defined in CORA. | The CORA layer (see [2]) is taken as an attribute of the class Service info.
FT04 | Function concept | The communication interface exposes the GSBPM phase of the service it belongs to. It is, however, agnostic of the set of phases defined in the GSBPM. | The GSBPM phase (see [3]) is taken as an attribute of the class Service info.
FT06 | Identification by name | The communication interface exposes the name of the service it belongs to. | The name of a service is an attribute of the class Service info.
FT07 | Identification by version | The communication interface exposes the version of the service it belongs to. | The version of a service is an attribute of the class Service info.
FT08 | Standard metamodel | The information model comprises one standard metamodel for all data descriptions. | The class Data set kind is introduced to support one standard metamodel.²
FT09 | Metadata transfer | The communication interface supports the transfer of data models that are consistent with the standard metamodel (FT08). | The class Data set definition is introduced to transfer data models. Each Data set definition corresponds to a Data set kind.
FT10 | Input data transfer | The communication interface supports the transfer of input data into a service under control of the corresponding data model (FT09). | The class Data set is introduced to support the transfer of input data. Each Data set refers to a Data set definition.³
FT12 | Output data transfer | The communication interface supports the transfer of output data out of a service under control of the corresponding data model (FT09). | The Data set class is introduced to support the transfer of output data. Each Data set refers to a Data set definition.
FT13 | Quality data transfer | The communication interface supports the transfer of quality data in and out of a service under control of the corresponding data model (FT09). | Quality data can be transferred by defining a quality Data set kind (consisting, e.g., of quality indicators). Each Data set that corresponds to that kind can then be supplied to or by the service (cf. FT10 and FT12).
FT14 | Log data transfer | The communication interface supports the transfer of log data out of a service under control of the corresponding data model (FT15-16). | Log data can be transferred by explicitly defining a logging Data set kind (consisting, e.g., of logging indicators). Each Data set that corresponds to this kind can then be supplied by the service (cf. FT12).
² The Data set definitions package (see Section 3.1) describes the standard metamodel FT08 refers to.
³ The Data set definitions package (see Section 3.1) describes the data model FT10 refers to.
FT17 | Machine-readable metadata | The metamodel and the data models can be supplied in a format accessible to routines translating data to and from the standard format specified by the metamodel (FT08). | The UML format in which the Data set kind and the Data set definition classes are described is suitable for translation into a machine-readable format, such as XML.⁴
FT18 | Business concept recognition | The information model offers a means to represent business concepts. | Through defining suitable Data set kinds and Column kinds, various business concepts can be represented, such as the concepts of variable, dimension, classification structure, etc.
FT19 | Business concept transfer | The communication interface supports transfer of business objects represented as such (FT18). | Business objects such as in FT18 can be transferred through the exchange of Data set definitions (Column definitions) and Data sets.
FT20 | Real-time parameters | The communication interface supports transfer of rules and other building blocks of process logic. | The Rules and Expressions packages are introduced in order to transfer rules.
FT21 | Housekeeping information: management | The information model comprises a list of information elements needed for the execution of services. Each of these elements can be assigned a different priority. This list is open, meaning that the model allows implementers (and users, if allowed by the implementation) to expand it. | By defining suitable attributes (through the Attribute class), various kinds of housekeeping information about Data sets can be captured.
FT22 | Housekeeping information: transfer | The communication interface supports transfer of housekeeping information, such as defined within the scope of FT21. | By defining suitable attributes (through the Attribute class), various kinds of housekeeping information about Data sets can be exchanged.
⁴ The format mentioned by FT17 is not itself specified by the information model.
The table below lists the features from [1] that the information model sketched in this
document does not support yet:
Id | Feature | Description | Comment
FT05 | Concept independence | The interface model specifies a way of identifying objects as belonging to GSIM, but is agnostic of the content of such objects. | Since the GSIM is still in its infancy, this feature will be dealt with in a future version of the information model.
FT11 | Control data transfer | The communication interface supports the transfer of control data into a service under control of the corresponding data model (FT09); control data, however, can also be supplied at design time. | The information model does not support the transfer of business process logic. The behaviour of a service can only be affected through the supply of rules or other parameters.
FT15 | Log information management | The information model offers one standard data model for all log information. | A standard model for all log information is out of the scope of this document, but should be discussed in a separate document involving the representations of various (business) concepts through defining suitable Data set kinds.
FT16 | Variability of log information | The information model offers a limited number of alternative data models for log information. | As for FT15.
Apart from the requirements from [1], the information model draws inspiration from the
implicit requirements in [4]. We believe it is possible to support the processes described in [4]
with the constructs of the information model.
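The chain behind FT08 to FT10 runs from a Data set kind (metamodel) through a Data set definition (data model) to a Data set (actual data). It can be sketched as below, together with housekeeping attributes in the spirit of FT21. The class names follow the text, but their fields and the validation are assumptions, because the packages themselves are detailed only in Chapter 3. The region data and the "reference period" attribute are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class DataSetKind:
    """Standard metamodel (FT08): the column kinds allowed in data sets of this kind."""
    name: str
    column_kinds: FrozenSet[str]


@dataclass
class DataSetDefinition:
    """Data model (FT09): each column gets a column kind and a value type."""
    kind: DataSetKind
    columns: Dict[str, Tuple[str, str]]  # column name -> (column kind, value type)

    def __post_init__(self):
        for name, (column_kind, _value_type) in self.columns.items():
            if column_kind not in self.kind.column_kinds:
                raise ValueError(f"{name}: {column_kind!r} is not allowed in a {self.kind.name}")


@dataclass
class DataSet:
    """Actual data (FT10, FT12) with housekeeping attributes (FT21, FT22)."""
    definition: DataSetDefinition
    rows: List[tuple]
    attributes: Dict[str, str] = field(default_factory=dict)


# Example kinds taken from the text; the concrete set belongs to the interface model.
micro = DataSetKind("micro data set", frozenset({"variable"}))
dimensional = DataSetKind("dimensional data set", frozenset({"dimension", "measure"}))

turnover_by_region = DataSetDefinition(dimensional, {
    "region": ("dimension", "string"),
    "turnover": ("measure", "decimal"),
})
# Hypothetical data and a hypothetical housekeeping attribute.
data = DataSet(turnover_by_region, [("North", 120.5), ("South", 98.0)],
               attributes={"reference period": "2011"})
```

A definition that declares a "dimension" column for a micro data set is rejected, because that column kind is not part of the micro data set kind.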
1.5 Possible extensions
Possible extensions of the model include the following:
• The design principle of allowing rectangular data sets only may prove too limiting. For
instance, data sets of a hierarchical nature do not fall into this category. A future extension of
the information model may drop the assumption of rectangular data sets and admit other data
structures as well.
• The possibilities for defining custom types are limited in this version. For instance, there is
no mechanism for defining subtypes, product types, function types and so on. A possible
extension of the model would be to define a 'Types' package that includes these mechanisms.
Note that this would affect the dependencies sketched in Section 2.
• The possibilities for transferring business process logic are limited to rules and other control
parameters. The model could be extended to the transfer of process-defining constructs, such
as the parallel or sequential composition of services.
• Support for different programming languages, e.g. rules for SAS and rules for SPSS
• ECA model (Event, Condition, Action)
• RIF (Rule Interchange Format), http://www.w3.org/2005/rules/wiki/RIF_Working_Group
• Priority/ordering for the execution of several rules
2 Package structure
2.1 Overview
The information model is organised into seven packages: Data set definitions, Expressions, Parameters, Rules, Channels, Messages, and Service info. Each package consists of classes and their interdependencies. The contents of the packages are outlined in Section 3.
2.2 Class diagram
The dependencies between the packages are shown below. Note that dependencies are transitive: the Rules package also depends on the Data set definitions package, viz. through the Expressions package.⁵
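The transitivity of package dependencies amounts to reachability in a directed graph. The hypothetical Python sketch below encodes only the two dependencies stated in this section (Rules on Expressions, Expressions on Data set definitions), not the full diagram, and computes every package that a given package depends on, directly or indirectly.

```python
# Hypothetical: only the dependencies stated in the text are encoded; the other
# edges of the package diagram are not reproduced here.
DEPENDS_ON: dict[str, set[str]] = {
    "Rules": {"Expressions"},
    "Expressions": {"Data set definitions"},
    "Data set definitions": set(),
}


def transitive_dependencies(package: str, graph: dict[str, set[str]]) -> set[str]:
    """Return every package reachable from `package` by following dependency edges."""
    seen: set[str] = set()
    stack = list(graph.get(package, set()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, set()))
    return seen


print(sorted(transitive_dependencies("Rules", DEPENDS_ON)))
# ['Data set definitions', 'Expressions']
```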
[Figure: package dependency diagram relating the packages Data set definitions, Parameters, Expressions, Rules, Messages, Service info, and Channels.]
⁵ Note also that the package structure will change if it is decided to introduce a separate package of type-refining constructs (such as subtyping): the Value type class (see Section 3) will then become a member of this separate package and, e.g., the Expressions package will lose its dependency on the Data set definitions package.
3 Package contents
3.1 Data set definitions package
3.1.1 Overview
This package describes how a rectangular data set (i.e., one that consists of rows and columns of statistical data) is defined. As data sets may contain a range of different statistical information (e.g., micro data, dimensional data, classification structures), the package contains a mechanism for describing any kind of data set. This mechanism attaches a role (called a column kind) to every column of data in a data set of a particular kind. This makes it possible to state, for instance, that a micro data set consists of variables and that a column of data in a dimensional data set is either a dimension or a measure. In this way the information model can be extended to any kind of statistical information a data set may contain. Section 4 of this document describes some data set kinds in more detail.
Data sets are typed, which means that each column of data is assigned a value type that limits
the type of values the column may contain.
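To illustrate how column kinds and value types constrain a data set, the Python sketch below checks a data set definition against its data set kind and a row against its column definitions. The class and function names, the mapping of value type names to Python types, and the occurrence bounds chosen for the dimensional data set kind are assumptions made for the example, not part of the model.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Illustrative mapping from value type names to Python types
# (note that Python treats bool as a subtype of int).
PY_TYPES = {"int": int, "float": float, "string": str, "bool": bool, "char": str}


@dataclass
class ColumnKind:
    name: str
    allowed_types: set[str]
    min_occurrences: int = 0
    max_occurrences: Optional[int] = None  # None stands for "unbounded"


@dataclass
class DataSetKind:
    name: str
    column_kinds: list[ColumnKind]


@dataclass
class ColumnDefinition:
    name: str
    column_kind: str  # must be one of the data set kind's column kinds
    value_type: str   # must be one of that column kind's allowed types


def check_definition(kind: DataSetKind, columns: list[ColumnDefinition]) -> list[str]:
    """Check column kinds, value types and occurrence bounds of a data set definition."""
    errors = []
    kinds = {ck.name: ck for ck in kind.column_kinds}
    for col in columns:
        ck = kinds.get(col.column_kind)
        if ck is None:
            errors.append(f"{col.name}: column kind {col.column_kind!r} not in {kind.name}")
        elif col.value_type not in ck.allowed_types:
            errors.append(f"{col.name}: type {col.value_type!r} not allowed for {ck.name}")
    for ck in kind.column_kinds:
        n = sum(1 for col in columns if col.column_kind == ck.name)
        too_many = ck.max_occurrences is not None and n > ck.max_occurrences
        if n < ck.min_occurrences or too_many:
            errors.append(f"{ck.name}: {n} occurrences out of bounds")
    return errors


def check_row(columns: list[ColumnDefinition], row: list[Any]) -> list[str]:
    """Check that each cell value matches the value type of the column at its position."""
    if len(row) != len(columns):
        return [f"row has {len(row)} cells, expected {len(columns)}"]
    return [
        f"{col.name}: {value!r} is not of type {col.value_type}"
        for col, value in zip(columns, row)
        if not isinstance(value, PY_TYPES[col.value_type])
    ]


# Usage: a dimensional data set kind with dimension and measure column kinds.
dimensional = DataSetKind("dimensional data set", [
    ColumnKind("dimension", {"string"}, min_occurrences=1),
    ColumnKind("measure", {"int", "float"}, min_occurrences=1),
])
columns = [
    ColumnDefinition("activity", "dimension", "string"),
    ColumnDefinition("average turnover", "measure", "float"),
]
print(check_definition(dimensional, columns))         # []
print(check_row(columns, ["Manufacturing", 1250.0]))  # []
print(check_row(columns, ["Manufacturing", "high"]))  # ["average turnover: 'high' is not of type float"]
```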
3.1.2 Class diagram
3.1.3 Detailed description
Data set kind (class): Defines the overall structure of a group of similar data sets.
- name: The name of the data set kind. Describes the role or type of any data set of the kind. Examples: micro data set; dimensional data set; classification structure; logging data set
- min number of rows: Minimum number of rows expected in a data set of this kind. Examples: 0; 1
- max number of rows: Maximum number of rows expected in a data set of this kind. Examples: 1; unbounded
- column kinds: The set of column kinds associated with this data set kind. A column kind describes the role of a column of data within a data set. Examples:
    micro data set: {variable}, i.e., every column of data in a micro data set is a variable;
    dimensional data set: {dimension, measure}, i.e., a column of data in a dimensional data set is either a measure or a dimension;
    classification structure: {level}, i.e., every column in a data set that describes a classification structure is a level;
    logging data set: {logging indicator}, i.e., every column in a logging data set is a logging indicator

Column kind (class): Defines the role of a column of data in a data set.
- name: The role name of a column of data. Examples: variable; dimension; measure; level; logging indicator
- min number of occurrences: The minimum number of occurrences of a column of this kind in a data set. Examples: 0; 1
- max number of occurrences: The maximum number of occurrences of a column of this kind in a data set. Examples: 1; unbounded
- allowed types: The set of value types that a column of this kind can take. Examples: variable: {int, float, string, bool}; dimension: {string}; measure: {int, float}; level: {string}; logging indicator: {int, float, string, bool}

Data set definition (class): Defines and denotes the columns of data, in a specified order, in any data set that conforms to the data set definition.
- name: The name of the data set definition. Examples: Average turnover by activity (a dimensional data set); NACE Rev. 2 (a classification structure)
- link to documentation: A hyperlink that points to documentation about the data set belonging to the data set definition.
- location: Indicates the location of the data set; can be a file location or a reference to a data base.
- data set kind: The data set kind associated with any data set that conforms to this data set definition.
- column definitions: An (ordered) list of column definitions.
- descriptors: A set of descriptor definitions that help describe the data set. Examples: reference period; time coverage

Column definition (class): A denotation and definition (including role and value type) of a column of data in a data set.
- name: The column header of a column of data that conforms to the column definition. Denotes, e.g., a variable in a micro data set or a dimension in a dimensional data set. Examples: turnover (a variable); average turnover (a measure); NACE Rev. 2, second digit (a level); Execution time (a logging indicator)
- column id: An identifier that uniquely identifies a column within a set of column definitions.
- column kind: The column kind associated with the column definition. Column kinds are restricted to the set of column kinds of the data set kind associated with the data set definition of which the column definition is a part. Examples: variable; dimension; measure; level; logging indicator
- type: The value type of a column of data that conforms to the column definition. Must match one of the value types of the corresponding column kind.

Descriptor (class): Any attribute that can serve as a descriptor for a data set.
- name: The name of the descriptor. Examples: reference period; geographical coverage
- descriptor type: The value type of the descriptor.

Actual descriptor (class): Holds the value of a descriptor.
- value: The value of the descriptor.
- descriptor: Points to the definition of the descriptor.

Value type (class): Denotes a set of values.
- name: Name of the value type. Examples: int; bool; float; string; char

Data set (class): A rectangular organization of data. A data set consists of rows and columns of data. A column of data must be of one value type.
- data set definition: The data set definition corresponding to the data set. This denotes and describes the columns of data in the data set.
- rows: The set of rows of data the data set consists of.
- actual descriptors: A set of descriptor values that help describe the data set. Examples: reference period = 2000Q1; geographical coverage = NL

Row (class): A row of data in a data set.
- row id: An identifier that uniquely identifies a row within a set of rows, i.e., within a data set.
- cells: The (ordered) sequence of cells of data a row consists of.

Cell (class): Holds a value of data in a row of a data set.
- value: The value of data the cell contains.
- column definition: The column definition the cell refers to. Must match the position of the cell in a row of data.
3.2 Expressions package
3.2.1 Overview
This package describes how expressions are built from constants, variables, and operators recursively applied to a list of expressions, which serve as the arguments of the operator. Variables are not to be confused with statistical variables (cf. the previous section): a variable in an expression is meant as a placeholder for the substitution of a value. To apply an expression to a row of data, it must be combined with a map that relates every variable in the expression to a column of data. Such a map is described in the Rules package below.
As with data sets, expressions are typed, which means that each operator is assigned a signature that prescribes (and limits) the types of the arguments the operator expects. Only expressions of the prescribed types may serve as arguments.
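A minimal Python sketch of typed expressions follows, assuming one class per kind of expression. The operator implementations (the hypothetical `fn` field) and the signature given to if-then-else (bool int int → int) are illustrative additions; the model itself only prescribes names, symbols and signatures. `type_of` rejects any application whose argument types differ from the operator's signature, and `evaluate` substitutes values for variables.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass(frozen=True)
class Signature:
    argument_types: tuple[str, ...]
    return_type: str


@dataclass(frozen=True)
class Operator:
    name: str
    symbol: str
    signature: Signature
    fn: Callable[..., Any]  # hypothetical: how the operator computes its value


@dataclass(frozen=True)
class Constant:
    value: Any
    constant_type: str


@dataclass(frozen=True)
class Variable:
    symbol: str
    variable_type: str


@dataclass(frozen=True)
class Application:
    top_operator: Operator
    arguments: tuple["Expression", ...]


Expression = Union[Constant, Variable, Application]


def type_of(e: Expression) -> str:
    """Return the type of an expression, checking each application against its signature."""
    if isinstance(e, Constant):
        return e.constant_type
    if isinstance(e, Variable):
        return e.variable_type
    actual = tuple(type_of(a) for a in e.arguments)
    expected = e.top_operator.signature.argument_types
    if actual != expected:
        raise TypeError(f"{e.top_operator.name} expects {expected}, got {actual}")
    return e.top_operator.signature.return_type


def evaluate(e: Expression, values: dict[str, Any]) -> Any:
    """Evaluate an expression, substituting values[symbol] for each variable."""
    if isinstance(e, Constant):
        return e.value
    if isinstance(e, Variable):
        return values[e.symbol]
    return e.top_operator.fn(*(evaluate(a, values) for a in e.arguments))


less_than = Operator("less-than", "<", Signature(("int", "int"), "bool"), lambda a, b: a < b)
if_then_else = Operator("if-then-else", "if-then-else",
                        Signature(("bool", "int", "int"), "int"),
                        lambda cond, then, other: then if cond else other)
x, five = Variable("x", "int"), Constant(5, "int")

# if-then-else(<(x,5),x,5), i.e. "if x<5 then x else 5"
expr = Application(if_then_else, (Application(less_than, (x, five)), x, five))
print(type_of(expr))             # int
print(evaluate(expr, {"x": 3}))  # 3
print(evaluate(expr, {"x": 8}))  # 5
```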
3.2.2 Class diagram
3.2.3 Detailed description
Expression (class): An expression is either a constant, a variable, or the (recursive) application of an operator to a number of (sub)expressions. Examples: 5; x; if-then-else(<(x,5),x,5)
- text: Human-readable form of an expression. Examples: if x<5 then x else 5; x<5

Constant (class): An actual number or a string of text appearing in an expression.
- value: The value of the constant. Example: 5
- constant type: The value type (corresponding to the value) of the constant. Example: int

Variable (class): An unknown value within an expression. Can be substituted with any value of a given value type. Not to be confused with a statistical variable (i.e., a column of data in a micro data set).
- symbol: The symbol with which a variable is denoted within an expression. Example: x
- variable type: The value type of the variable. Any value that corresponds to this value type can be substituted for (each occurrence of) the variable inside an expression. Example: int

Application (class): Expression consisting of an operator applied to a number of (sub)expressions. The types of the subexpressions must match the signature of the top operator. Example: <(x,5)
- top operator: Outermost operator of the application. Example: <
- arguments: An (ordered) list of expressions which, applied to the top operator, complete the application. The argument types must correspond to the signature of the top operator of the application. Example: x, 5

Operator (class): Describes a function that returns a value based upon a fixed number of arguments of given types.
- name: The name of the operator. Examples: multiplication; subtraction; less-than
- symbol: The symbol that is used to denote the operator inside an expression. Examples: *; -
- signature: Collects the types of the operator's arguments as well as the operator's return type. Examples: int int → int (the signature of multiplication and subtraction); int int → bool (the signature of less-than)

Signature (class): Prescription of the argument types in combination with the return type of an operator. Examples: int int → int; int int → bool
- return type: The value type of the return value of an operator. Examples: int; bool
- argument types: An (ordered) list of the types of the arguments of an operator. Example: int int
3.3 Parameters package
3.3.1 Overview
Parameters are meant to supply a service with any single item of information that cannot naturally be offered as a data set. They thus serve as a means to configure the service at configuration time (which is explained further in the Messages package below).
As with data sets and expressions, parameters are typed, which means that the value of any actual parameter must be of the type prescribed by the corresponding (formal) parameter.
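The typing rule for parameters can be sketched in Python as follows: an actual parameter is rejected on construction if its value does not conform to the type of the parameter it refers to. The `conforms` helper, the reading of char as a one-character string, and the class layout are illustrative assumptions; the two example parameters are taken from the detailed description in Section 3.3.3.

```python
from dataclasses import dataclass
from typing import Any


def conforms(value: Any, value_type: str) -> bool:
    """Illustrative type check; "char" is taken to mean a one-character string."""
    if value_type == "int":
        return isinstance(value, int) and not isinstance(value, bool)
    if value_type == "float":
        return isinstance(value, float)
    if value_type == "bool":
        return isinstance(value, bool)
    if value_type == "string":
        return isinstance(value, str)
    if value_type == "char":
        return isinstance(value, str) and len(value) == 1
    raise ValueError(f"unknown value type {value_type!r}")


@dataclass(frozen=True)
class Parameter:
    name: str
    type: str  # the value type an actual parameter must conform to


@dataclass(frozen=True)
class ActualParameter:
    parameter: Parameter
    value: Any

    def __post_init__(self) -> None:
        # Reject values that do not conform to the type of the referenced parameter.
        if not conforms(self.value, self.parameter.type):
            raise TypeError(f"{self.value!r} is not a valid {self.parameter.type} "
                            f"for {self.parameter.name!r}")


iterations = Parameter("number of iterations", "int")
unknown_symbol = Parameter("unknown value symbol", "char")
config = [ActualParameter(iterations, 5), ActualParameter(unknown_symbol, "-")]  # both accepted
```

An actual parameter such as `ActualParameter(iterations, "five")` would raise a `TypeError` in this sketch.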
3.3.2 Class diagram
3.3.3 Detailed description
Parameter (class): The designation of a single value of data a service expects.
- name: The name of the parameter. Examples: number of iterations (e.g., of a detection-correction service loop); unknown value symbol (e.g., the symbol a service is expected to use as an unknown value)
- type: The type of value an associated actual parameter can take. Examples: int; char

Actual parameter (class): A single value that instantiates a parameter.
- value: The value of the actual parameter. Must conform to the value type of the parameter to which it refers. Examples: 5; "-"
- parameter: The parameter to which an actual parameter refers. Examples: number of iterations; unknown value symbol
3.4 Rules package
3.4.1 Overview
A rule is an expression applied to (data sets of) a certain data set definition. For an expression to be evaluated properly when it is applied to a specific row of data, one needs to know, for each variable that appears in the expression, to which column of data it corresponds. Upon evaluation, the value in that column of the row is then substituted for the variable. The Rules package describes this mapping between columns of data and variables in an expression.
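The mapping between columns and variables can be sketched in Python as follows. The nested-tuple representation of expressions (operator first, strings as variables, numbers as constants) and the small operator table are hypothetical simplifications chosen for brevity; `apply_rule` looks up each free variable's mapped column in the row and substitutes the value found there.

```python
import operator
from typing import Any

# Hypothetical operator table; expressions are nested tuples (operator, arg1, arg2, ...),
# strings are free variables and numbers are constants.
OPERATORS = {
    "<": operator.lt,
    ">": operator.gt,
    "and": lambda a, b: a and b,
    "if-then-else": lambda cond, then, other: then if cond else other,
}


def evaluate(expr: Any, bindings: dict[str, Any]) -> Any:
    if isinstance(expr, tuple):
        op, *args = expr
        return OPERATORS[op](*(evaluate(a, bindings) for a in args))
    if isinstance(expr, str):
        return bindings[expr]  # a free variable
    return expr  # a constant


def apply_rule(expr: Any, maps: dict[str, str], columns: list[str], row: list[Any]) -> Any:
    """Evaluate a rule on one row: each free variable takes the value in its mapped column."""
    position = {name: i for i, name in enumerate(columns)}
    bindings = {variable: row[position[column]] for variable, column in maps.items()}
    return evaluate(expr, bindings)


columns = ["business id", "turnover"]
inspection = ("and", ("<", "x", 100), (">", "x", 10))  # x < 100 and x > 10
editing = ("if-then-else", (">", "x", 5), "x", 5)       # if-then-else(x>5,x,5)
print(apply_rule(inspection, {"x": "turnover"}, columns, [1, 42]))   # True
print(apply_rule(inspection, {"x": "turnover"}, columns, [2, 250]))  # False
print(apply_rule(editing, {"x": "turnover"}, columns, [3, 2]))       # 5
```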
Since a service may expect an unbounded number of rules upon configuration (e.g., any number of editing rules), a rule can be given a rule kind in order to keep control over the form of the expected expressions. A rule kind names and defines the format of a rule. For instance, to express that an editing rule should always look like "if-then-else(x>5,x,5)" or "if-then-else(x>10,x,10)", the editing rule kind is given the expression format "if-then-else(x>y,x,y)", in which a constant value is substituted for the variable y to obtain an actual rule.
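Checking whether a rule conforms to its rule kind amounts to one-way matching of the rule's expression against the expression format. The Python sketch below is a hypothetical illustration using the same nested-tuple representation as before; as a simplifying assumption it allows any free variable of the format to be replaced by any subexpression, provided that repeated occurrences are replaced consistently. This covers the three format examples in the detailed description below, including SUM(x) / COUNT(y) matched by SUM(x) / COUNT(x).

```python
from typing import Any, Optional

# Hypothetical representation: nested tuples (operator, arg, ...), strings are
# variables, numbers are constants.
Expr = Any


def match(fmt: Expr, expr: Expr, subst: Optional[dict[str, Expr]] = None) -> Optional[dict[str, Expr]]:
    """Return a substitution that turns `fmt` into `expr`, or None if none exists."""
    subst = {} if subst is None else subst
    if isinstance(fmt, str):  # a free variable of the format
        if fmt in subst:
            return subst if subst[fmt] == expr else None
        return {**subst, fmt: expr}
    if isinstance(fmt, tuple):
        if not isinstance(expr, tuple) or len(fmt) != len(expr) or fmt[0] != expr[0]:
            return None
        for f_arg, e_arg in zip(fmt[1:], expr[1:]):
            subst = match(f_arg, e_arg, subst)
            if subst is None:
                return None
        return subst
    return subst if fmt == expr else None  # constants must be equal


editing_format = ("if-then-else", (">", "x", "y"), "x", "y")
print(match(editing_format, ("if-then-else", (">", "x", 5), "x", 5)))  # {'x': 'x', 'y': 5}
print(match(editing_format, ("if-then-else", (">", "x", 5), "x", 6)))  # None
```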
3.4.2 Class diagram
3.4.3 Detailed description
Rule kind (class): Describes the expression format of a group of similar rules.
- name: The name of a rule kind. Examples: quality inspection rule; editing rule; aggregation formula
- expression format: An expression against which other expressions are matched through a substitution of some of its free variables. All expressions that match in this way are of the same kind. Examples: x < y and x > z; if-then-else(x>y,x,y); SUM(x) / COUNT(y)

Rule (class): An expression together with a data set definition. Upon execution of a service, the expression is evaluated when it is applied to a row of data from a data set that conforms to the data set definition.
- rule kind: The rule kind associated with the rule. Examples: quality inspection rule; editing rule; aggregation formula
- expression: The expression of the rule. Examples: x < 100 and x > 10; if-then-else(x>5,x,5); SUM(x) / COUNT(x)
- applies to: The data set definition a rule applies to.
- maps: A set of column definition expression maps, one for each free variable of the rule's expression. Example: (x,turnover) (y,revenue)

Column definition expression map (class): A pair consisting of a free variable from a rule's expression and a column definition from the rule's data set definition. This defines the way in which an expression is evaluated when it is applied to a data set. Examples: (x,turnover); (y,revenue)
- column definition: The column definition part of the map. The value type of the column definition must match that of the free variable part of the map. Examples: turnover; revenue
- free variable: The variable part of the map. Examples: x; y
3.5 Channels package
3.5.1 Overview
The purpose of the Channels package is to formally confirm what was announced in the Introduction, more specifically in Section 1.1: a channel is either a data set kind, a rule kind, a GSIM object description, a parameter, or an expected free-style argument. Thus, a channel object belongs to exactly one of these five types.
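In a typed implementation, a channel could be expressed as a union of the five types. The Python sketch below is a hypothetical illustration with minimal stand-in classes (their fields are assumptions, not the model's full attributes); `channel_type` rejects any object outside the union.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical, minimal stand-ins for the five classes a channel can be.
@dataclass(frozen=True)
class DataSetKind:
    name: str


@dataclass(frozen=True)
class RuleKind:
    name: str


@dataclass(frozen=True)
class GSIMObjectDescription:
    name: str


@dataclass(frozen=True)
class Parameter:
    name: str
    type: str


@dataclass(frozen=True)
class ExpectedFreeStyleArgument:
    name: str
    documentation: str


# A channel is exactly one of these five types.
Channel = Union[DataSetKind, RuleKind, GSIMObjectDescription, Parameter, ExpectedFreeStyleArgument]
CHANNEL_TYPES = (DataSetKind, RuleKind, GSIMObjectDescription, Parameter, ExpectedFreeStyleArgument)


def channel_type(channel: Channel) -> str:
    """Name the kind of channel, rejecting anything outside the five types."""
    if not isinstance(channel, CHANNEL_TYPES):
        raise TypeError(f"{channel!r} is not a channel")
    return type(channel).__name__


print(channel_type(DataSetKind("micro data set")))             # DataSetKind
print(channel_type(Parameter("number of iterations", "int")))  # Parameter
```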
3.5.2 Class diagram
[Class diagram: a Channel is one of Data set kind (from Data set definitions), Rule kind (from Rules), GSIM object description (from Messages), Parameter (from Parameters), or Expected free-style argument (from Messages).]
3.5.3 Detailed description
Channel (class): Describes a type of information that is sent to or received from a service. Can be one of the following: a data set kind, a rule kind, a GSIM object description, a parameter, or an expected free-style argument.
3.6 Messages package
3.6.1 Overview
The Messages package describes four kinds of messages (signature, configuration, execution, output) with which a service communicates with its environment (i.e., with other services or with a run-time environment). These messages may be subdivided into design-time messages (signature, configuration) and run-time messages (execution, output). Note, however, that this is just terminology, as the distinction between run time and design time may differ from one specific case to another.
During design time, a service communicates to its environment the arguments it expects, either as configuration arguments (i.e., at design time) or as run-time arguments (i.e., data sets that are either input to or output from the service). The environment can then configure the service by sending a service configuration message, which communicates data set definitions corresponding to expected data set kinds and actual parameters corresponding to expected parameters. If needed, a service can also be configured by supplying it with a number of rules. Finally, if a service expects a tool-specific script, the script can be offered to the service as a free-style argument in a service configuration message. Note that every argument in a message must refer to an expected argument.
During run time, the service communicates with its environment solely by exchanging data sets, either as input data sets or, upon completion of the service execution, as output data sets.
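The rule that every argument in a message must refer to a suitable expected argument can be sketched as a validation step. In this hypothetical Python sketch a message is reduced to its kind ("configuration", "execution" or "output") plus the names of the expected arguments its arguments refer to; the flags mirror isOutputArgument, isRun-timeArgument and multiplicity, and the signature of the editing service is invented for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedArgument:
    name: str
    channel: str               # e.g. "data set kind", "rule kind", "parameter"
    is_output: bool = False    # isOutputArgument
    is_run_time: bool = False  # isRun-timeArgument
    unbounded: bool = False    # multiplicity; only meaningful for a rule kind


def validate(signature: list[ExpectedArgument], kind: str, refers_to: list[str]) -> list[str]:
    """Check a configuration, execution or output message against a service signature.

    `refers_to` holds, for each argument in the message, the name of the expected
    argument it refers to.
    """
    expected = {e.name: e for e in signature}
    errors = []
    for name in refers_to:
        e = expected.get(name)
        if e is None:
            errors.append(f"{name}: not an expected argument")
        elif kind == "configuration" and e.is_run_time:
            errors.append(f"{name}: expected at run time, not at configuration time")
        elif kind == "execution" and (not e.is_run_time or e.is_output):
            errors.append(f"{name}: not an expected run-time input")
        elif kind == "output" and not e.is_output:
            errors.append(f"{name}: not an expected output")
    for name, count in Counter(refers_to).items():
        e = expected.get(name)
        if e is not None and count > 1 and not e.unbounded:
            errors.append(f"{name}: offered {count} times but multiplicity is 1")
    return errors


# Hypothetical signature of an editing service.
signature = [
    ExpectedArgument("input kind", "data set kind"),
    ExpectedArgument("editing rules", "rule kind", unbounded=True),
    ExpectedArgument("number of iterations", "parameter"),
    ExpectedArgument("input data", "data set definition", is_run_time=True),
    ExpectedArgument("edited data", "data set definition", is_run_time=True, is_output=True),
]
print(validate(signature, "configuration",
               ["input kind", "editing rules", "editing rules", "number of iterations"]))  # []
print(validate(signature, "execution", ["input data"]))   # []
print(validate(signature, "execution", ["edited data"]))  # ['edited data: not an expected run-time input']
```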
3.6.2 Class diagram
3.6.3 Detailed description
Service signature message (class): The message a service sends to its environment in order to communicate the arguments the service expects upon configuration and execution.
- expected args: An (ordered) list of arguments that a service expects.

Expected argument (class): An argument that a service expects upon configuration or execution. Can be an expected free-style argument, a data set kind, a rule kind, a parameter, or a data set definition. The corresponding arguments that are offered to the service at run time or at configuration time are then, in that order: a free-style argument, a data set definition, a (number of) rule(s), an actual parameter, or a data set.
- isOutputArgument: Indicates that an expected argument is the result of a service execution. Applies only to a data set kind or a data set definition.
- isRun-timeArgument: Indicates that an argument is expected at run time instead of configuration time.
- multiplicity: Indicates whether or not an unbounded number of arguments is expected. Applies only to a rule kind. Examples: 1; unbounded

Service configuration message (class): The message the environment sends to a service in order to configure the service.
- design time args: An (ordered) list of design-time arguments through which a service is configured.

Design-time argument (class): An argument that is offered to the service through a service configuration message. Can be any one of the following: a data set definition, a rule, an actual parameter, a free-style argument, or a data set. Design-time arguments offered to a service must match the arguments the service expects, cf. its service signature message.

Service execution message (class): The message the environment sends to a service upon its execution.
- run time args: An (ordered) list of run-time arguments a service is offered upon execution.

Run-time argument (class): An argument that is offered to a service through a service execution message. Is always a data set. Must refer to a run-time argument the service expects, cf. its service signature message.

Service output message (class): The message a service sends to its environment upon completion of the service execution.
- output args: An (ordered) list of output arguments a service offers upon completion of its execution.

Output argument (class): An argument a service offers to its environment upon completion of its execution. Is always a data set. Must refer to an output argument the service expects, cf. its service signature message.

Argument (class): Generalization of an output argument, a run-time argument, a design-time argument or an expected argument.
- refers to: The expected argument a design-time, run-time or output argument refers to, or the design-time argument a run-time argument or output argument refers to.

Expected free-style argument (class): Any tool-specific script, rule or other parameter a service expects in order to be executed.
- name: The name of the script, rule or parameter.
- documentation: Human-readable documentation of the purpose of the script.

Free-style argument (class): The actual tool-specific script, rule or parameter a service is offered at design time.
- name: The name of the script, rule or parameter.
- text: The actual script, rule or parameter text.

GSIM object description (class): Description, possibly in the form of an XML schema or DSD, of any object the GSIM prescribes.

GSIM object (class): Object, possibly communicated through an XML instance, described by the GSIM.
3.7 Service info package
3.7.1 Overview
The Service class in the Service info package collects any information regarding a service.
3.7.2 Class diagram
3.7.3 Detailed description
Service (class): Reusable generic solution for a step in a statistical process.
- name: The name of the service. Examples: Aggregation service; Data correction service
- CORA vertical axis: The layer on the vertical axis of the CORA grid into which the service is catalogued. Examples: Figures layer; Time series layer; Statistic layer; Population layer; Unit layer; Variable layer; Value layer
- CORA horizontal axis: The GSBPM process phase on the horizontal axis of the CORA grid into which the service is catalogued. Examples: Specify Needs; Design; Build; Collect; Process; Analyse; Disseminate; Archive; Evaluate
- version: A version number assigned to the service.
- documentation: Human-readable documentation of the service's function and purpose.
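Because both CORA axes take values from fixed lists, they map naturally onto enumerations. The Python sketch below restricts the two axes to the values listed above; the class names and the placement of the example aggregation service (layer, phase, version) are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class CoraLayer(Enum):
    FIGURES = "Figures layer"
    TIME_SERIES = "Time series layer"
    STATISTIC = "Statistic layer"
    POPULATION = "Population layer"
    UNIT = "Unit layer"
    VARIABLE = "Variable layer"
    VALUE = "Value layer"


class GsbpmPhase(Enum):
    SPECIFY_NEEDS = "Specify Needs"
    DESIGN = "Design"
    BUILD = "Build"
    COLLECT = "Collect"
    PROCESS = "Process"
    ANALYSE = "Analyse"
    DISSEMINATE = "Disseminate"
    ARCHIVE = "Archive"
    EVALUATE = "Evaluate"


@dataclass(frozen=True)
class Service:
    name: str
    cora_vertical_axis: CoraLayer
    cora_horizontal_axis: GsbpmPhase
    version: str
    documentation: str


# Hypothetical catalogue entry; layer, phase and version are chosen for illustration.
aggregation = Service(
    name="Aggregation service",
    cora_vertical_axis=CoraLayer.STATISTIC,
    cora_horizontal_axis=GsbpmPhase.PROCESS,
    version="1.0",
    documentation="Aggregates micro data into a dimensional data set.",
)
print(aggregation.cora_horizontal_axis.value)  # Process
print(CoraLayer("Unit layer"))                 # CoraLayer.UNIT
```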
4 Statistical interpretation of the information model
This section is a first step towards the definition of the interface model. It should be read as a recommendation for interpreting the information model in such a way that it is suitable for communicating typical statistical information objects through channels. Specifically, this section is not normative (unlike the previous section) but is meant to show how specific interpretations of data set kinds can serve statistical purposes by defining standard channels.
Its scope is therefore modest. Firstly, only a small number of data set kinds are elaborated; rule kinds, value types and other channel types are not the subject of this section. Secondly, data set definitions are given only as examples. This means that the recommendation extends only to (a) identifying a specific channel (such as a dimensional data set kind) and (b) identifying the column kinds in it (such as dimension and measure in the case of a dimensional data set).
Still, we feel that identifying data set kinds and column kinds in this way serves at least three purposes:
• To avoid activities that merely convert a piece of information from one form into another. For instance, if it is agreed that classification structures will be communicated only through the classification structure data set kind, conversions to and from other data structures (such as hierarchical ones) can be avoided. This encourages processing activities to take place within statistical services rather than in between them.
• To offer services more semantics about (statistical) data, so that a service can take specific action based on its knowledge of the structure of a specific data set kind. For instance, knowing that a dimensional data set consists of dimensions and measures allows a service to treat a dimension differently from a measure: unlike deleting a measure, deleting a dimension from a dimensional data set may yield an ill-defined result, since it may corrupt the dimensional structure.
• To give examples of how the interface can be structured in terms of channels.
The following subsections describe the data set kinds of micro data sets, dimensional data
sets, classification structures, logging data sets, and quality data sets respectively.
4.1 Micro data sets
4.1.1 Rationale
The micro data set is proposed as a possible interpretation of the data set kind channel in order
to communicate micro data to services.
4.1.2 Description
A column of data in a rectangular presentation of a micro data set can be classified according to at least the following column kinds: variable, weight, or quality indicator. This means that technical implementations of the information model can identify variables, weights and quality indicators when interpreting a column of data in a micro data set. This may prove beneficial, for instance, when an aggregation service needs to know which column of data serves as a weight in calculating weighted averages. Future implementations of the model may reveal that these three column kinds are not discriminating enough; they may, e.g., show a need to distinguish numerical from categorical variables among the columns of a micro data set.
4.1.3 Detailed description
Variable: The result of measuring a property of a single entity, for a group of such entities (i.e., a population). Allowed types: int, float, bool, string, char
Weight: Indicates the relative significance of a variable's measured value for each of the entities in a sample. Allowed types: float
Quality indicator: Gives information about the quality of a variable (e.g., its reliability) or stores information about the result of a processing step. Allowed types: int, float, bool, string, char
4.1.4 Example
The micro data set below reports the activity, the size class (e.g., according to number of employees) and the turnover measured on a sample of businesses. It also reports a weight for the turnover variable and indicates whether or not a record of data has been edited (in a previous processing step). Activity, Size class and Turnover are variables, Weight is the weight, and Edited (y/n) is the quality indicator.
Business id | Activity      | Size class | Turnover | Weight | Edited (y/n)
1           | Manufacturing | 5          | 3.000    | 0.04   | y
2           | Construction  | 7          | 12.000   | 0.01   | n
…           | …             | …          | …        | …      | …
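The benefit of a weight column kind can be shown with a weighted average that locates the weight column by its kind rather than by its name. This Python sketch is illustrative only: it uses just the two records shown above (reading "3.000" and "12.000" as 3000 and 12000), and the function name and the requirement of exactly one weight column are assumptions.

```python
# Hypothetical: column kinds attached to the column headers of the example above.
COLUMN_KINDS = {
    "Activity": "variable",
    "Size class": "variable",
    "Turnover": "variable",
    "Weight": "weight",
    "Edited (y/n)": "quality indicator",
}

# The two records shown above; "3.000" and "12.000" are read as 3000 and 12000.
ROWS = [
    {"Activity": "Manufacturing", "Size class": 5, "Turnover": 3000.0,
     "Weight": 0.04, "Edited (y/n)": "y"},
    {"Activity": "Construction", "Size class": 7, "Turnover": 12000.0,
     "Weight": 0.01, "Edited (y/n)": "n"},
]


def weighted_average(column_kinds: dict[str, str], rows: list[dict], variable: str) -> float:
    """Average `variable` over the rows, weighting by the single column of kind "weight"."""
    weights = [name for name, kind in column_kinds.items() if kind == "weight"]
    if len(weights) != 1:
        raise ValueError(f"expected exactly one weight column, found {len(weights)}")
    w = weights[0]
    total_weight = sum(row[w] for row in rows)
    return sum(row[variable] * row[w] for row in rows) / total_weight


print(weighted_average(COLUMN_KINDS, ROWS, "Turnover"))  # approximately 4800.0
```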
4.2 Dimensional data sets
4.2.1 Rationale
The dimensional data set is proposed as a possible interpretation of the data set kind channel
in order to communicate dimensional data to services and to distinguish them from micro
data.
4.2.2 Description
Dimensional data is distinguished from micro data mainly by its dimensional structure. The difference lies in the observation that all dimensions that appear in a dimensional data set are realized at design time, while the corresponding measures are realized by statistical activities (usually involving aggregation). In other words, determining the crossings of categories for which a dimensional data set will report certain measures is a design-time activity, whereas micro data is always the result of a statistical activity (usually involving measurement). A dimensional data set consists of a number of dimensions together with a number of measures.
4.2.3 Detailed description
Dimension: A range of categories that, possibly in combination with categories from a different dimension, determine the subclasses of entities in a population about which statistics are reported. Allowed types: string
Measure: A column of data holding statistics about the subclasses of entities in a population that are determined by the dimensional structure of the dimensional data set. Allowed types: int, float
4.2.4 Example
Activity      | Size class | Total turnover
Manufacturing | 5          | 3.000.000
Manufacturing | 7          | 22.000.000
…             | …          | …
The example above reports the total turnover of the class of businesses determined by each combination of activity and size class. Activity and Size class are dimensions; Total turnover is the single measure.
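The division of labour described in Section 4.2.2 (crossings fixed at design time, measures produced by aggregation) can be sketched as follows. The micro records, the list of crossings and the function signature are hypothetical; size classes are strings because dimensions are of type string, and crossings without micro records keep a total of 0.0 in this sketch.

```python
from typing import Any


def aggregate(micro_rows: list[dict[str, Any]], dimensions: list[str],
              crossings: list[tuple[str, ...]], variable: str, measure: str) -> list[dict[str, Any]]:
    """Fill design-time crossings of dimension categories with the sum of a micro variable."""
    totals = {crossing: 0.0 for crossing in crossings}  # fixed before any data is processed
    for row in micro_rows:
        key = tuple(row[d] for d in dimensions)
        if key in totals:  # records outside the designed crossings are ignored in this sketch
            totals[key] += row[variable]
    return [dict(zip(dimensions, key)) | {measure: total} for key, total in totals.items()]


# Hypothetical micro data.
micro = [
    {"Activity": "Manufacturing", "Size class": "5", "Turnover": 1000.0},
    {"Activity": "Manufacturing", "Size class": "5", "Turnover": 2000.0},
    {"Activity": "Manufacturing", "Size class": "7", "Turnover": 500.0},
]
crossings = [("Manufacturing", "5"), ("Manufacturing", "7"), ("Construction", "7")]
result = aggregate(micro, ["Activity", "Size class"], crossings, "Turnover", "Total turnover")
for record in result:
    print(record)
```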
4.3 Classification structures
4.3.1 Rationale
The classification structure is proposed as a possible interpretation of the data set kind
channel in order to communicate classification structures to services. In this way, a service
will know that a certain data set is meant to act as a classification structure (and not as, e.g.,
micro data).
4.3.2 Description
A classification structure is a set of categories together with a structure, which in most (but not all) cases is hierarchical. To communicate this structure without extending the model with tailor-made classification objects, it is proposed to describe it using a rectangular data set. In it, the parent-child relationships between categories in different levels are described row-wise (in a hierarchical structure, a row of data represents one traversal from root to leaf), while a column holds all the categories originating from one level in the classification structure.
4.3.3 Detailed description
Level: A set of categories from a classification structure. These categories are mutually exclusive and span the whole range of the classification. Allowed types: string
4.3.4 Example
Consider the classification structure below, which is a small part of the NACE, Rev.2 (2008)
classification.
C Manufacturing
    11 Manufacture of beverages
    13 Manufacture of textiles
F Construction
    42 Civil engineering
        42.1 Construction of roads and railways
        42.2 Construction of utility projects
    41 Construction of buildings
H Transportation and storage
    50 Water transport
        50.3 Inland passenger water transport
        50.4 Inland freight water transport
    51 Air transport
        51.1 Passenger air transport
        51.2 Freight air transport and space transport
            51.21 Freight air transport
            51.22 Space transport
The (hierarchical) interdependencies between the categories in the structure can be captured, albeit at the cost of some redundancy, by the rectangular data set below, which follows intuitively from the classification structure above:
Level 1                      | Level 2                     | Level 3                                        | Level 4                     | Level 5
C Manufacturing              | 11 Manufacture of beverages | …                                              | …                           | …
C Manufacturing              | 13 Manufacture of textiles  | …                                              | …                           | …
F Construction               | 42 Civil Engineering        | 42.1 Construction of roads and railways        | …                           | …
…                            | …                           | …                                              | …                           | …
H Transportation and storage | 51 Air transport            | 51.2 Freight air transport and space transport | 51.21 Freight air transport | …
H Transportation and storage | 51 Air transport            | 51.2 Freight air transport and space transport | 51.22 Space transport       | …
The redundancy involved in this representation is evident from the last two records of the data set above. Some of the redundancy can of course be reduced by recording only the category codes instead of both codes and titles. In the above data set, every column is a level.
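The row-wise flattening of a hierarchical classification can be sketched as a depth-first traversal that emits one row per root-to-leaf path. In this hypothetical Python sketch the tree is encoded as nested dicts (only the H branch of the NACE fragment above is included), and shorter paths are padded with empty strings, an assumption standing in for the "…" cells of the table above.

```python
from collections.abc import Iterator

# Hypothetical encoding of the H branch of the NACE fragment above: each category
# maps to a dict of its child categories (empty for a leaf).
NACE_H = {
    "H Transportation and storage": {
        "50 Water transport": {
            "50.3 Inland passenger water transport": {},
            "50.4 Inland freight water transport": {},
        },
        "51 Air transport": {
            "51.1 Passenger air transport": {},
            "51.2 Freight air transport and space transport": {
                "51.21 Freight air transport": {},
                "51.22 Space transport": {},
            },
        },
    },
}


def paths(tree: dict, prefix: tuple[str, ...] = ()) -> Iterator[tuple[str, ...]]:
    """Yield one root-to-leaf traversal per leaf category."""
    for category, children in tree.items():
        if children:
            yield from paths(children, prefix + (category,))
        else:
            yield prefix + (category,)


def to_rectangular(tree: dict, empty: str = "") -> list[list[str]]:
    """One row per traversal; column i holds the level-(i+1) category, padded with `empty`."""
    rows = list(paths(tree))
    depth = max(len(row) for row in rows)
    return [list(row) + [empty] * (depth - len(row)) for row in rows]


for row in to_rectangular(NACE_H):
    print(" | ".join(row))
```

The repetition of the Level 1 and Level 2 categories across rows is exactly the redundancy noted above.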
4.4 Logging data sets
4.4.1 Rationale
The need for identifying certain data sets as logging data sets is formulated in FT14 of the
requirements.
4.4.2 Description
A logging data set is a means for a service to record its activities and to inform interested parties about their outcome. Logging data sets will therefore appear naturally as the output of a service rather than as its input. The columns in a logging data set are all of the same kind, as there are no natural processing activities involving logging data sets that would require further differentiation.
4.4.3 Detailed description
Column kind | Description | Allowed types
Logging indicator | Any data that represents a piece of logging information. | int, float, bool, string, char
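The table only prescribes a column kind and its allowed types; how a service checks a column against them is left open. The sketch below shows one possible check in Python, assuming a hypothetical `Column` description and mapping the abstract type names onto Python types. Python has no `char` type, so `char` is modelled as a one-character string, and `bool` is excluded from `int` explicitly because Python treats `bool` as a subclass of `int`. Quality indicators allow the same types, so the same `conforms` check would apply to them.

```python
from dataclasses import dataclass
from typing import Any, List

# Allowed types for a logging indicator column, as listed in the table above.
LOGGING_INDICATOR_TYPES = ("int", "float", "bool", "string", "char")


def conforms(value: Any, type_name: str) -> bool:
    """Check one value against one of the abstract type names."""
    if type_name == "bool":
        return isinstance(value, bool)
    if type_name == "int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "float":
        return isinstance(value, float)
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "char":
        # Assumption: a char is modelled as a one-character str.
        return isinstance(value, str) and len(value) == 1
    raise ValueError(f"unknown type name: {type_name}")


@dataclass
class Column:
    """Hypothetical column description: a name, a column kind and a declared type."""
    name: str
    kind: str
    type_name: str


def invalid_positions(column: Column, values: List[Any]) -> List[int]:
    """Return the positions of values that do not match the column's declared type."""
    if column.kind != "Logging indicator":
        raise ValueError(f"not a logging indicator column: {column.name}")
    if column.type_name not in LOGGING_INDICATOR_TYPES:
        raise ValueError(f"type {column.type_name!r} not allowed for a logging indicator")
    return [i for i, value in enumerate(values) if not conforms(value, column.type_name)]


service_id = Column("Service id", "Logging indicator", "int")
print(invalid_positions(service_id, [322, 322, "322", True]))  # → [2, 3]
```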
4.4.4 Example
Time stamp | Service id | Action | Result
10:32 | 322 | Obtaining editing rule | Succeeded
10:34 | 322 | Interpreting editing rule | Failed
… | … | … | …
A typical logging data set is shown above: in it, a service keeps a list of the activities it has executed, the (starting) time of each execution, and the result. The data set may be shared across several services; in that way one logging file can be kept for a number of processing steps. Every column in this data set is a logging indicator.
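A logging data set shared by several services can be pictured as a single append-only table. The following Python sketch is hypothetical: the `LogRecord` and `LoggingDataSet` names and their methods are illustrative assumptions, not an interface defined by CORE. It reproduces the two records of the example and adds one record from a hypothetical second service to show the sharing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogRecord:
    """One record of the example logging data set; every field is a logging indicator."""
    time_stamp: str   # (starting) time of the activity, e.g. "10:32"
    service_id: int
    action: str
    result: str       # e.g. "Succeeded" or "Failed"


class LoggingDataSet:
    """Hypothetical shared logging data set that several services append to."""

    def __init__(self) -> None:
        self.records: List[LogRecord] = []

    def log(self, time_stamp: str, service_id: int, action: str, result: str) -> None:
        self.records.append(LogRecord(time_stamp, service_id, action, result))

    def for_service(self, service_id: int) -> List[LogRecord]:
        return [r for r in self.records if r.service_id == service_id]

    def failures(self) -> List[LogRecord]:
        return [r for r in self.records if r.result == "Failed"]


shared_log = LoggingDataSet()
# The two records from the example above, written by service 322.
shared_log.log("10:32", 322, "Obtaining editing rule", "Succeeded")
shared_log.log("10:34", 322, "Interpreting editing rule", "Failed")
# Hypothetical record from a second service sharing the same logging data set.
shared_log.log("10:35", 323, "Imputing missing values", "Succeeded")

for record in shared_log.failures():
    print(record.time_stamp, record.service_id, record.action)
```

Because all services append to one table, an interested party can filter by service id or by result without knowing which processing step produced a record.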
4.5 Quality data sets
4.5.1 Rationale
The need for identifying certain data sets as containing quality data is formulated in FT13 of
the requirements.
4.5.2 Description
A quality data set contains quality judgements of other data sets, which typically contain statistical data. Each column in a quality data set is a quality indicator, since there are no natural processing activities involving quality data sets that would require further differentiation.
4.5.3 Detailed description
Column kind | Description | Allowed types
Quality indicator | Any data that can be interpreted as a quality judgement of another piece of data. | int, float, bool, string, char
4.5.4 Example
Data set id | Number of records | Number of records edited | Item non-response percentage
1023 | 104,598 | 2,459 | 34
1024 | 20,804 | 978 | 41
… | … | … | …
The quality data set above records a number of quality indicators for certain data sets: their number of records, the number of records that have been edited, and the item non-response percentage (the number of empty fields as a percentage of the total number of fields).
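The three indicators can be computed directly from a data set before and after editing. The Python sketch below is a minimal illustration under stated assumptions: records are dictionaries, an empty field is represented by `None`, the raw and edited versions list the same records in the same order, and item non-response is measured on the raw records. The field names and values in the usage example are hypothetical.

```python
from typing import Any, Dict, List, Optional

# A record maps field names to values; None marks an empty field (an assumption).
Record = Dict[str, Optional[Any]]


def quality_indicators(data_set_id: int, raw: List[Record],
                       edited: List[Record]) -> Dict[str, Any]:
    """Compute one record of a quality data set for one statistical data set.

    Assumes `raw` (as received) and `edited` hold the same records in the same
    order; item non-response is measured on the raw records.
    """
    n_records = len(edited)
    n_edited = sum(1 for before, after in zip(raw, edited) if before != after)
    n_fields = sum(len(record) for record in raw)
    n_empty = sum(1 for record in raw for value in record.values() if value is None)
    # Empty fields as a percentage of all fields.
    non_response = 100.0 * n_empty / n_fields if n_fields else 0.0
    return {
        "Data set id": data_set_id,
        "Number of records": n_records,
        "Number of records edited": n_edited,
        "Item non-response percentage": non_response,
    }


# Hypothetical three-record data set with fields "turnover" and "employees".
raw = [
    {"turnover": 10, "employees": None},
    {"turnover": None, "employees": 5},
    {"turnover": 7, "employees": 3},
]
edited = [
    {"turnover": 10, "employees": 2},
    {"turnover": 8, "employees": 5},
    {"turnover": 7, "employees": 3},
]
print(quality_indicators(1, raw, edited))
```

Here two of the six raw fields are empty, so the item non-response percentage is about 33.3, and two of the three records changed during editing.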
5 References
[1] Requirements for the CORE Information Model (CORE Deliverable 2.1; Release candidate); 07-06-2011.
[2] Definition of the Layered Common Reference Architecture (CORA Deliverable 3.1; Final version); 28-05-2010.
[3] Generic Statistical Business Process Model (Joint UNECE/Eurostat/OECD Work Session on Statistical Metadata (METIS); Version 4.0); April 2009.
[4] A process scenario assisting the requirements analysis of the CORE information model (CORE Deliverable 2.1; Preliminary document); 24-01-2011.