Retaining output quality whilst reducing validation costs in Business Surveys

Tool for Assessing the Impact of Changing Editing Rules on Cost & Quality

Alaa Al-Hamad, Begoña Martín, Gary Brown
Processing, Editing & Imputation Branch
1. Overview
• Data Editing in the ONS
• Error Detection Rules Problems
• Survey Managers' Dilemma
• Proposed Tool
• Tool Illustration & Output
• Conclusions and Further Work
2. Editing in the ONS
A costly component of the data cleaning process in the ONS is data editing.
Data editing is defined as:
• an activity aimed at detecting and correcting errors in data (ONS Glossary)
In practice this involves:
• the detection of error-suspect data using editing rules, e.g. fail if A + B > (estimated parameter), as sketched below
• verification/correction of error-suspect data from source
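A minimal Python sketch of an editing rule of this form; the function name, component values and bound are hypothetical, chosen only to mirror the "fail if A + B > (estimated parameter)" example above, not an actual ONS rule.

    # Minimal sketch of an additive editing rule of the form
    # "fail if A + B > (estimated parameter)"; names and values are
    # hypothetical illustrations, not an actual ONS rule.
    def fails_edit(a: float, b: float, estimated_parameter: float) -> bool:
        """Flag a record as error-suspect when its components sum past the bound."""
        return a + b > estimated_parameter

    # A flagged record goes forward for verification/correction from source.
    print(fails_edit(a=120.0, b=95.0, estimated_parameter=200.0))  # True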
3. Detection Rules Problems
If rule parameters are too conservative:
• increased response burden (unnecessary recontacts)
• reduced data quality (over-validation errors and biases)
• costly in terms of staff & resources
If rule parameters are too liberal:
• uncorrected errors are allowed through
• reduced data quality
• costly in terms of reputation
• less costly in terms of staff & resources
4. Survey Managers' Dilemma
When managers are asked to achieve savings ('Savings vs Quality Impact'):
• An easy way to make quick savings is to loosen the rule parameters so that less data will be edited.
The challenge is:
• where to stop
• what impact such action will have on the estimates
Remember: quality loss is defined not solely by the number of edit failures, but also by the size of the errors.
5. Proposed Tool
Ideally what is required is a dynamic routine for editing rule parameters that is applicable to all business surveys and:
• offers a choice of different quality measurement criteria
• considers all editing rules simultaneously
• outputs proposed changes to parameters
• outputs savings and quality loss per changed rule and in total
A dynamic routine has not yet been developed, so we have pursued a pragmatic solution meeting the same criteria.
6. Suitable Measurements
A measure of savings:
• Savings = the number of records that no longer require editing
A measure of impact:
The exact impact on final estimates is:
• difficult to calculate
• time consuming
• costly
Instead, use the relative change:

    relative change = Σ w (X_Before - X_After) / Σ w X_Before

where X = a response before and after the parameter change, and w = a calibration weight.
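A Python sketch of this impact measure under the definitions above; the function name and example figures are illustrative only.

    import numpy as np

    # Relative change as defined above:
    # sum of w * (X_Before - X_After) over sum of w * X_Before.
    def relative_change(x_before, x_after, w):
        x_before, x_after, w = (np.asarray(v, dtype=float)
                                for v in (x_before, x_after, w))
        return np.sum(w * (x_before - x_after)) / np.sum(w * x_before)

    # Example: loosening a rule leaves one suspect response uncorrected
    # (250 under the old parameters, 230 once the record is no longer edited).
    print(relative_change(x_before=[100, 250, 80],
                          x_after=[100, 230, 80],
                          w=[1.5, 2.0, 1.1]))  # ~0.054, i.e. 5.4%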
7. Routine Illustration
[Diagram: records cross-classified by outcome under the existing rules and under the loosened rules]
• Under the existing rules, records either pass (no error) or fail; failed records are verified, and the true errors (B*) are corrected.
• When the rules are loosened, some previously failing records now pass: A = those that were not errors, B = those that were true errors.
• Savings = #(A + B)
• Errors missed = #(B)
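A counting sketch of the two output quantities, assuming per-record boolean flags; all names are hypothetical.

    # Savings and errors missed, as in the illustration above.  Assumes
    # per-record booleans: failed_old (fails the existing rules),
    # failed_new (fails the loosened rules) and is_error (true error
    # status, known from earlier verification).  Names are hypothetical.
    def savings_and_errors_missed(failed_old, failed_new, is_error):
        moved = [o and not n for o, n in zip(failed_old, failed_new)]  # Fail -> Pass
        savings = sum(moved)                                           # #(A + B)
        errors_missed = sum(m and e for m, e in zip(moved, is_error))  # #(B)
        return savings, errors_missed

    # Five records: two move from Fail to Pass, one of them a true error.
    print(savings_and_errors_missed(
        failed_old=[True, True, True, False, False],
        failed_new=[True, False, False, False, False],
        is_error=[True, True, False, False, False]))  # (2, 1)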
8. Example of Rules Changes
Rule 1 (alter Gate 1): fail if S_o > £199K and 100 × |S_c - S_o| / S_o > 40
Rule 2 (alter Gate 2): fail if S_c > £199K and 100 × |S_o - S_c| / S_c > 40
Rule 3 (alter Gate 3): fail if S_c,t-1 was returned and |S_c,t-1 - S_o,t| > £5K
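A Python sketch of the three rules with their gates exposed as parameters (values in £K). Which threshold each gate replaces is an assumption inferred from the gate values in the results table that follows, not stated on the slide.

    # Sketch of the three editing rules with alterable gates (values in £K).
    # The gate-to-parameter mapping is an assumption: gate1 for the £199K
    # level in Rule 1, gate2 for the ratio threshold of 40 in Rule 2, and
    # gate3 for the £5K difference in Rule 3.
    def rule1_fails(s_o, s_c, gate1=199):
        return s_o > gate1 and 100 * abs(s_c - s_o) / s_o > 40

    def rule2_fails(s_o, s_c, gate2=40):
        return s_c > 199 and 100 * abs(s_o - s_c) / s_c > gate2

    def rule3_fails(s_c_prev, s_o, gate3=5, prev_returned=True):
        return prev_returned and abs(s_c_prev - s_o) > gate3

    # Raising gate1 from 199 to 600 lets this record pass Rule 1.
    print(rule1_fails(s_o=250, s_c=400, gate1=600))  # False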
9. Routine Results
Routine output for a range of gate settings:

Gate 1   Gate 2   Gate 3   Savings   Errors Missed   Relative Change (%)
   600       40       10       111              77                  0.56
   600       40       50       205             171                  1.00
   600       40       40       192             158                  1.28
   600       40       20       160             126                  1.32
   250       40      100       243             209                  2.96
   300       40      100       243             209                  2.96
   600       40      100       243             209                  2.96
   600       30      200       274             240                  3.93
   600       40      200       274             240                  3.93
   600       50      200       274             240                  3.93
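A sketch of how such output could drive a parameter search. Here `evaluate` is a hypothetical callable standing in for re-running the edit rules and the impact measure at given gate settings; it is not part of the routine as presented.

    from itertools import product

    # Search over gate settings, in the spirit of the results table above.
    # `evaluate` is a hypothetical stand-in that re-runs the edit rules and
    # impact measure, returning (savings, errors_missed, relative_change_pct).
    def best_gates(gate1_grid, gate2_grid, gate3_grid, evaluate, max_loss_pct):
        """Largest savings subject to a quality-loss tolerance."""
        best = None
        for g1, g2, g3 in product(gate1_grid, gate2_grid, gate3_grid):
            savings, _missed, loss_pct = evaluate(g1, g2, g3)
            if loss_pct <= max_loss_pct and (best is None or savings > best[1]):
                best = ((g1, g2, g3), savings, loss_pct)
        return best

    # With a 1.3% tolerance, the table above would select gates (600, 40, 50):
    # 205 records saved at a 1.00% relative change.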
10. Conclusions
• Often, changes to validation rules to achieve savings are made in isolation, without consideration of the impact of these changes on the quality of the survey output.
• In this work we offer a simple but effective decision support tool:
  – to quantify the savings and loss in quality resulting from changing editing rules
  – to help managers identify the editing rules that have the most impact on quality
  – to identify the parameters that minimise quality loss given set savings, and vice versa
11. Further Work
Other elements of further work:
• Make the routine more dynamic
• Enhance the impact measure
• Investigate varying the parameters by domain (e.g. Standard Industrial Classification (SIC), employment sizeband)
• Apply the routine to other surveys
12. Questions
Over to you!