Russell Marks Greenbase Data Quality – What's it all about

Data Quality
What’s all that about?
A Recorded Value
Water Delivered = 2,670 KiloLitres
Question:
“How good is this value?”
Answer:
“I don’t know”
Dealing with the Question
The typical response:
• We need a quality coding schema.
Example of a Coding Schema
V – Valid Value
D – Doubtful Value
E – Estimated Value
M – Missing Value
Water Delivered (KL) = 2,670 V
Which means the value 2,670 KL is Valid
(really ????)
Meaningfulness
Textual coding schemas are meaningful
only when accompanied by
a definition for each code.
They are not intrinsically meaningful.
Meaningfulness (cont …)
Even then, there is no guarantee of
consistency in application.
Requires judgement.
Is subjective.
Deriving Data
Let's look at deriving values.
Assume we have two values:
2.670 ‘V’ quality code
1.571 ‘D’ quality code
4.241 ?
What code do we give the total?
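One possible answer, sketched below in Python, is to give the total the weakest code among its inputs. This is an assumption for illustration only: the schema above does not define how codes combine, and the ranking of the codes (`RANK`) is hypothetical.

```python
# Hypothetical ranking: the schema itself does not say how codes combine.
# Higher rank = weaker quality. This ordering is an assumption.
RANK = {"V": 0, "E": 1, "D": 2, "M": 3}

def derive_total(items):
    """Sum (value, code) pairs; the total takes the weakest input code."""
    total = sum(value for value, _ in items)
    worst = max((code for _, code in items), key=RANK.__getitem__)
    return round(total, 3), worst

total, code = derive_total([(2.670, "V"), (1.571, "D")])
print(total, code)
```

Even with such a rule, deciding whether 'E' is weaker than 'D' is itself a judgement call.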
NGER Criteria Schema
Generally speaking
A – Derived from Invoices
AA – Indirect measure at point of consumption
AAA – Direct measure at point of consumption
BBB – Estimated
NGER Coding Schema
Also
A – Commercial Transaction
AA – Commercial Transaction
AAA – Commercial or non-commercial
BBB – Non-commercial Transaction
NGER Coding Schema
Also default uncertainties for fuel

Criterion   Solid Fuel   Liquid Fuel   Gaseous Fuel
A           ±2.5%        ±1.5%         ±1.5%
AA          ±2.5%        ±1.5%         ±1.5%
AAA         ±1.5%        ±1.5%         ±1.5%
BBB         ±7.5%        ±7.5%         ±7.5%
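
The default uncertainties listed above can be held as a simple lookup. A minimal sketch in Python; the names `DEFAULT_UNCERTAINTY_PCT` and `default_uncertainty` are hypothetical, and only the percentages come from the slide.

```python
# Default uncertainties in percent, transcribed from the slide.
# Keys: criterion code, then fuel state.
DEFAULT_UNCERTAINTY_PCT = {
    "A":   {"solid": 2.5, "liquid": 1.5, "gaseous": 1.5},
    "AA":  {"solid": 2.5, "liquid": 1.5, "gaseous": 1.5},
    "AAA": {"solid": 1.5, "liquid": 1.5, "gaseous": 1.5},
    "BBB": {"solid": 7.5, "liquid": 7.5, "gaseous": 7.5},
}

def default_uncertainty(criterion, fuel_state):
    """Return the default uncertainty (percent) for a criterion and fuel state."""
    return DEFAULT_UNCERTAINTY_PCT[criterion][fuel_state]

print(default_uncertainty("A", "liquid"), default_uncertainty("BBB", "liquid"))
```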
NGER Coding Schema
Also
A – Calibrated devices
AA – Calibrated devices
AAA – Calibrated devices
BBB – Un-Calibrated devices
Reporting NGER Data
Let's look at NGER fuel data.
We have total diesel invoiced = 2,670 KL
i.e. 2,670 KL ‘A’ criterion
Reporting NGER Data (cont …)
We need to split the total diesel into:
• Electricity generation   834 KL
• Transport   65 KL
• Everything else   1,471 KL
So we use on-site bowser records.
Bowsers are un-calibrated, non-commercial transactions.
Reporting NGER Data (cont …)
We end up with:
• Electricity generation   834 KL ‘BBB’ criterion
• Transport   65 KL ‘BBB’ criterion
• Everything else   1,471 KL ‘BBB’ criterion
• Total Invoiced   2,370 KL ‘A’ criterion
We report the ‘BBB’ numbers (i.e. ±7.5%)
EERS adds up the total and allocates ‘BBB’
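
To see what the change of criterion costs, the sketch below compares the absolute uncertainty on the same total under ‘A’ and under ‘BBB’. It assumes diesel takes the liquid fuel defaults (±1.5% for ‘A’, ±7.5% for ‘BBB’) and that the percentage is applied directly to the summed total; how EERS actually aggregates uncertainty is not stated here. The function name is hypothetical.

```python
# Split of diesel use from the bowser records (KL).
parts_kl = {
    "Electricity generation": 834,
    "Transport": 65,
    "Everything else": 1471,
}
total_kl = sum(parts_kl.values())

def abs_uncertainty_kl(quantity_kl, pct):
    """Absolute uncertainty in KL for a quantity and a percentage uncertainty."""
    return quantity_kl * pct / 100

as_invoiced = abs_uncertainty_kl(total_kl, 1.5)  # 'A', liquid fuel default
as_reported = abs_uncertainty_kl(total_kl, 7.5)  # 'BBB', liquid fuel default

print(f"Total: {total_kl} KL")
print(f"'A'   : +/- {as_invoiced:.2f} KL")
print(f"'BBB' : +/- {as_reported:.2f} KL")
```

Under these assumptions the same volume of diesel carries five times the stated uncertainty once it is reported as ‘BBB’.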
NGER Criteria Attributes
Criteria codes encapsulate 5 attributes:
i. Data collection method (Invoices);
ii. Data derivation method (Estimated);
iii. Transaction type (Commercial);
iv. Default uncertainty of data item;
v. Calibration of measuring devices.
These can be contradictory.
Coding Schema Rules
General rules for a proper schema:
i. Only one attribute per code;
ii. Definitions must be unambiguous;
iii. Users must be able to interpret a coding
schema with a minimum of training;
iv. Uncertainty should be separate from the
coding schema.
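
As an illustration of these rules, the sketch below records each attribute in its own field and keeps uncertainty as a separate number. It is one possible layout, not a prescribed one; the class and field names are hypothetical, and the example values are those the slides associate with the ‘A’ criterion for liquid fuel.

```python
from dataclasses import dataclass
from enum import Enum

class CollectionMethod(Enum):
    INVOICE = "invoice"
    INDIRECT_MEASURE = "indirect measure at point of consumption"
    DIRECT_MEASURE = "direct measure at point of consumption"

@dataclass(frozen=True)
class Reading:
    value_kl: float
    collection: CollectionMethod  # one attribute per field
    estimated: bool               # derivation method
    commercial: bool              # transaction type
    calibrated: bool              # calibration of the measuring device
    uncertainty_pct: float        # kept separate from the codes

# Invoiced diesel total, described attribute by attribute.
invoice = Reading(
    value_kl=2370,
    collection=CollectionMethod.INVOICE,
    estimated=False,
    commercial=True,
    calibrated=True,
    uncertainty_pct=1.5,
)
print(invoice)
```

With separate fields, an un-calibrated device on a commercial transaction can be recorded without forcing a choice between contradictory codes.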
Coding Schema Rules
The two schemas presented do not
conform to the rules for a proper schema.
Consequently, confusion exists and
debate ensues.