PPT - unece

A generic tool to assess impact of
changing edit rules in a business
survey – SNOWDON-X
Pedro Luis do Nascimento Silva
Robert Bucknall
Ping Zong
Alaa Al-Hamad
Business survey editing in the ONS
• Uses complex sets of edit rules to:
– Check returned questionnaires (records)
– Locate suspicious or unacceptable responses
– Support data cleaning operations
• Edit sets are complex because they may:
– Involve a large number of survey forms and variables
– Contain a large number of edits
– Define complex acceptance / rejection regions
– Depend on a large number of tolerance parameters
Editing costs are high
• The estimated cost of editing is over 40% of the
survey process budget
• Edits cause large numbers of record failures
• Edit failures are mainly dealt with by means of
manual follow-up, re-contacting respondents
Aim of paper
• Describe a generic tool developed to assess the
potential impact of changing the edits in any specified
business survey
• Present example of application of the tool
(SNOWDON-X) to large scale annual business
survey
Edit revision strategies for efficiency saving
• Filtering or sub-setting
– Comprises introducing a record filter which selects the
records to be submitted to the full set of edits
• Gate widening
– Consists of revising the tolerance parameters (gates) in
individual edit rules, such that flagging of suspicious records
for revision is less frequent than with previously used values
• Edit deletion
– Consists of simply discarding some of the edits previously
used to flag suspicious records
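The three strategies can be illustrated with a minimal Python sketch. It is not taken from SNOWDON-X (which is written in SAS): the `RatioEdit` class, the variable names, the gate values and the three records are all hypothetical, chosen only to show how each strategy reduces the number of flagged records.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, float]


@dataclass
class RatioEdit:
    """Hypothetical edit: fail when current/previous lies outside [1/gate, gate]."""
    name: str
    current: str
    previous: str
    gate: float

    def fails(self, record: Record) -> bool:
        prev = record[self.previous]
        if prev <= 0:
            return True  # no ratio can be formed, so treat the record as suspicious
        ratio = record[self.current] / prev
        return ratio > self.gate or ratio < 1.0 / self.gate


def flagged(records: List[Record], edits: List[RatioEdit],
            record_filter: Callable[[Record], bool] = lambda r: True) -> List[Record]:
    """Records that pass the filter and fail at least one edit."""
    return [r for r in records if record_filter(r) and any(e.fails(r) for e in edits)]


# Made-up records for illustration only.
records = [
    {"turnover": 120.0, "turnover_prev": 100.0, "employment": 10.0, "employment_prev": 10.0},
    {"turnover": 300.0, "turnover_prev": 100.0, "employment": 4.0, "employment_prev": 5.0},
    {"turnover": 50.0, "turnover_prev": 45.0, "employment": 30.0, "employment_prev": 12.0},
]
edits = [
    RatioEdit("E1", "turnover", "turnover_prev", gate=1.15),
    RatioEdit("E2", "employment", "employment_prev", gate=2.0),
]

baseline = flagged(records, edits)
# Gate widening: relax the tolerance of E1 from 1.15 to 1.5.
widened = flagged(records, [RatioEdit("E1", "turnover", "turnover_prev", gate=1.5), edits[1]])
# Edit deletion: drop E2 altogether.
deleted = flagged(records, [edits[0]])
# Filtering: only records with turnover of at least 200 are submitted to the edits.
filtered = flagged(records, edits, record_filter=lambda r: r["turnover"] >= 200.0)

print(len(baseline), len(widened), len(deleted), len(filtered))  # 3 2 2 1
```

Each strategy lowers the count of records sent for manual follow-up; whether the records no longer flagged still contain errors is what the bias indicators are meant to reveal.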
SNOWDON tool
• A SAS program first developed by Al-Hamad, Martín
and Brown (2006)
• Developed to enable informed decision making when
revising business survey edits
• Aims to “… help survey managers evaluate what
savings can be achieved, at what cost to output
quality, across many alternative permutations of
editing rule parameters.”
• Limited to single-variable surveys, where only ‘gate widening’ was considered
SNOWDON-X tool
• Extended functionality when compared to SNOWDON
• Uses SAS IML language for increased performance
• Can handle all three edit revision strategies
• Can handle multivariate surveys
• Provides a wealth of summary indicators relating to:
– Expected savings achievable by edit revision
– Expected bias to survey results, both overall and per variable
– Information on performance of individual edit rules / variables
• Simple to run, once data have been properly
organised
Basic scenario
• Previous survey data available in two versions
– Unedited (raw) – at point of capture or prior to any editing
– Edited (clean) – at point of publication or after all editing
• Edit rules used to clean previous survey data are known
• Key idea of SNOWDON-X tool
1. Increase tolerance of some edits (or delete or introduce
filter if necessary)
2. Calculate indicators of impact of changes to edits
3. Repeat 1. and 2. until expected savings achieve specified
level or quality measures reveal unacceptable bias
Key assumptions behind approach
1. Future survey edition will behave similarly to
previous survey
2. Edited data from previous survey edition are ‘clean’
or error free
3. Changes to ‘raw’ data in previous survey edition
were due to error correction, i.e., any values
changed between capture and final were ‘wrong’
4. Once a record is flagged for clerical revision, all
errors it contains will be located and corrected
What is required to run SNOWDON-X?
• Previous period unedited data
• Previous period edited data
• Original edits
• Revised edits
• Link between edits and variables
What is the output of SNOWDON-X?
• Descriptive indicators for the data set under analysis
• Indicators about individual variables and edits
• Indicators of impact due to changes to edit rules
• Modified ‘output’ survey data set
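One plausible way to build the modified output data set follows from key assumptions 3 and 4: a record flagged by the revised edits receives its edited (clean) values, while an unflagged record keeps its raw values, errors included. The Python sketch below is an illustration under that reading, not the tool's actual SAS code; the function name, record identifiers and values are hypothetical.

```python
def simulate_output(raw, edited, flagged_ids):
    """Build the modified output data set for a revised edit set.

    raw, edited: dicts mapping record id -> {variable: value}
    flagged_ids: ids of records failing at least one revised edit
    """
    output = {}
    for rec_id, raw_values in raw.items():
        # Flagged records are assumed fully corrected; others stay as captured.
        source = edited[rec_id] if rec_id in flagged_ids else raw_values
        output[rec_id] = dict(source)  # copy so the inputs are not mutated
    return output


# Made-up data: A and C were changed during the original editing, B was not.
raw = {"A": {"turnover": 1000.0}, "B": {"turnover": 250.0}, "C": {"turnover": 40.0}}
edited = {"A": {"turnover": 100.0}, "B": {"turnover": 250.0}, "C": {"turnover": 50.0}}

# Suppose the revised edits flag only record A.
output = simulate_output(raw, edited, flagged_ids={"A"})
print(output["A"]["turnover"], output["C"]["turnover"])  # 100.0 40.0
```

Record C illustrates a missed error: it is no longer flagged, so its raw value of 40.0 reaches the output instead of the clean value of 50.0.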
Core indicators
Each indicator is reported before and after edit revision:
• Total number of records
• Number of records failing at least one edit rule
• Proportion of records failing at least one edit rule
• Expected savings in number of records to be edited
• Missed error rate
• Average relative absolute global bias (RAGB) for all variables
involved in edits
• Maximum RAGB for all variables involved in edits
• Overall hit rate, i.e. the proportion of times that fields were
changed during validation when flagged by edits
• Overall false hit rate, i.e. the proportion of times that fields were
flagged by edits but were not changed after validation
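As a rough illustration of how some of these indicators might be computed, the Python sketch below follows the hit rate and false hit rate definitions given in the list. The RAGB formula is an assumption, since the slides do not define it: here it is the absolute difference between the output total and the clean total, relative to the clean total, unweighted. All names and data are hypothetical.

```python
def hit_rates(flagged_fields, raw, edited):
    """Return (hit rate, false hit rate) in percent over the flagged fields.

    flagged_fields: list of (record id, variable) pairs flagged by the edits,
    derived from the link between edits and variables.
    """
    if not flagged_fields:
        return 0.0, 0.0
    hits = sum(1 for rec, var in flagged_fields if raw[rec][var] != edited[rec][var])
    hit = 100.0 * hits / len(flagged_fields)
    return hit, 100.0 - hit


def ragb(output, edited, variable):
    """Assumed RAGB in percent: |total(output) - total(clean)| / |total(clean)|."""
    clean_total = sum(rec[variable] for rec in edited.values())
    output_total = sum(rec[variable] for rec in output.values())
    return 100.0 * abs(output_total - clean_total) / abs(clean_total)


# Made-up data for three records and two variables.
raw = {
    "A": {"turnover": 1000.0, "employment": 10.0},
    "B": {"turnover": 250.0, "employment": 5.0},
    "C": {"turnover": 40.0, "employment": 2.0},
}
edited = {
    "A": {"turnover": 100.0, "employment": 10.0},
    "B": {"turnover": 250.0, "employment": 5.0},
    "C": {"turnover": 50.0, "employment": 2.0},
}
# Output under revised edits that flag A and B but miss the error in C.
output = {"A": edited["A"], "B": edited["B"], "C": raw["C"]}
flagged_fields = [("A", "turnover"), ("A", "employment"),
                  ("B", "turnover"), ("B", "employment")]

hit, false_hit = hit_rates(flagged_fields, raw, edited)
print(hit, false_hit)                    # 25.0 75.0
print(ragb(output, edited, "turnover"))  # 2.5
```

A production version would presumably apply survey weights when forming the totals; the unweighted sums here keep the arithmetic easy to follow.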
How to target edits for revision
1. Select most commonly used form type
2. Select edit failing largest proportion of records
within each form type
3. Relax edit parameters to reduce proportion of failed
records while keeping bias low
4. Repeat 2. and 3. for each form type until further
savings are minimal or bias increases above
specified threshold
5. Repeat for all relevant form types
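Steps 1 and 2 above amount to two simple rankings, sketched here in Python under stated assumptions: each record carries a hypothetical `form` code, and each edit is represented as a function returning True when the record fails. The form codes, edit names and thresholds are invented for illustration.

```python
from collections import Counter


def most_common_form(records):
    """Step 1: the form type with the largest number of records."""
    return Counter(r["form"] for r in records).most_common(1)[0][0]


def worst_edit(records, form, edits):
    """Step 2: the edit failing the largest proportion of records of one form type.

    edits: dict mapping edit name -> function(record) -> True when the record fails.
    """
    subset = [r for r in records if r["form"] == form]
    rates = {name: sum(fails(r) for r in subset) / len(subset)
             for name, fails in edits.items()}
    name = max(rates, key=rates.get)
    return name, rates[name]


# Made-up records on two hypothetical form types.
records = [
    {"form": "F1", "turnover": 600.0, "employment": 10.0},
    {"form": "F1", "turnover": 700.0, "employment": 200.0},
    {"form": "F1", "turnover": 100.0, "employment": 20.0},
    {"form": "F1", "turnover": 800.0, "employment": 30.0},
    {"form": "F2", "turnover": 50.0, "employment": 5.0},
    {"form": "F2", "turnover": 900.0, "employment": 150.0},
]
edits = {
    "E1": lambda r: r["turnover"] > 500.0,
    "E2": lambda r: r["employment"] > 100.0,
}

form = most_common_form(records)
target, failure_rate = worst_edit(records, form, edits)
print(form, target, failure_rate)  # F1 E1 0.75
```

Steps 3 to 5 would then relax the parameters of the targeted edit, recompute the savings and bias indicators, and stop once further savings are minimal or bias exceeds the chosen threshold.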
ABI/2 (Retail questionnaire) – Number of
failing records on original and revised edits
[Figure: Validation failures - Questionnaire RT205. Number of failures for each test under the original and revised edits. Vertical axis: number of failures, 0 to 1,600. Horizontal axis: test number, in the order 2001, 3126, 1160, 1159, 1143, 1181, 1114, 1112, 1134, 1125, 1141, 1111, 1154, 1153, 1190, 1113, 1189, 1133, 1123, 1145, 1120, 1172, 1173, 3125, 1131, 1148, 1119.]
Results - applying SNOWDON-X to ABI/2
(Retail questionnaire)
                                                              Before edit   After edit
Indicator                                                     revision      revision
Total number of records                                       3,809         3,809
Number of records failing at least one edit rule              3,107         2,934
Proportion of records failing at least one edit rule (%)      81.57         77.03
Expected savings in number of records to be edited            --            173
Missed error rate                                             --            3.6
Average relative absolute global bias (RAGB) for all
  variables involved in edits (%)                             --            0.05
Maximum RAGB for all variables involved in edits (%)          --            0.21
Overall hit rate, i.e. the proportion of times that fields
  were changed during validation when flagged by edits (%)    38.01         38.23
Overall false hit rate, i.e. the proportion of times that
  fields were flagged by edits but were not changed after
  validation (%)                                              61.99         61.77
Results from applying SNOWDON-X to ABI/2
                                          Number of respondents
                                          failing validation
Sector                      Number of     Original     Revised      Savings (number of   Maximum relative bias
(questionnaire)             respondents   edit rules   edit rules   questionnaires)      (over all questions)
Catering                    1,712         606          548          58 (9.6%)            0.65%
Retail                      3,809         3,107        2,934        173 (5.6%)           0.21%
Motor Trades                1,716         810          720          90 (11.1%)           0.31%
Service Trades              5,108         1,162        1,095        67 (5.8%)            0.04%
Wholesale                   3,471         1,815        1,705        110 (6.1%)           0.29%
Property                    1,062         363          350          13 (3.6%)            0.05%
Production & Construction   5,826         2,348        2,242        106 (4.5%)           0.53%
Results summary
• Overall expected saving for ABI/2 ≈ 6% of previously
edited records
• Largest expected bias occurs in Catering sector
(0.65%) where a saving of 58 (9.6%) records was
made
• Highest expected proportional saving was made in Motor Trades
sector (11.1%), with an expected bias of 0.31%
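The summary figures can be checked against the sector table with a few lines of Python. The counts below are copied from that table; the only assumption is that "previously edited records" means the records failing validation under the original edit rules, summed over the seven sectors.

```python
# (failing under original edits, failing under revised edits) per sector
sectors = {
    "Catering": (606, 548),
    "Retail": (3107, 2934),
    "Motor Trades": (810, 720),
    "Service Trades": (1162, 1095),
    "Wholesale": (1815, 1705),
    "Property": (363, 350),
    "Production & Construction": (2348, 2242),
}

total_original = sum(orig for orig, _ in sectors.values())
total_saved = sum(orig - rev for orig, rev in sectors.values())
overall_pct = 100.0 * total_saved / total_original

# Percentage saving within each sector, relative to its original failures.
sector_pct = {name: 100.0 * (orig - rev) / orig for name, (orig, rev) in sectors.items()}
best_sector = max(sector_pct, key=sector_pct.get)

print(total_original, total_saved, round(overall_pct, 1))  # 10211 617 6.0
print(best_sector, round(sector_pct[best_sector], 1))      # Motor Trades 11.1
```

In absolute terms the largest saving is in Retail (173 records); Motor Trades leads only when savings are expressed as a percentage of the sector's original failures.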
Conclusions
• Generic tool developed to assist edit revision
– Successfully applied to ABI/2
– Currently being applied to two monthly surveys
• SNOWDON-X lets analysts focus on revising the edits
rather than on programming the calculation of quality
and savings indicators
• Further development required for:
– Impact on standard error estimates
– Improved usability