Preparing Census Data for Analysis
Needs Assessment Conference on Census Analysis
Bavaro, Dominican Republic, 14 – 16 July 2010
Carlos Ellis, Census Regional Advisor, SRO-Jamaica
Steps to Follow in Data Preparation
During census planning stages:
• Produce a list of the tables expected to be required for analysis
• Produce detailed dummy tables
• Verify that the variables and categories they require are included in the census questionnaire
• As part of the pilot census, produce the tabulations expected for analysis
• Look for possible gaps in the printed information
• Adjust the census questionnaire content
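
The verification step in the list above can be sketched as a simple comparison between what the dummy tables need and what the questionnaire collects. This is a minimal illustration, not a prescribed procedure: the table titles, variable names and category lists below are hypothetical.

```python
# Hypothetical dummy-table specification: for each planned table, the
# variables it needs and the categories each variable must offer.
# An empty set means a numeric variable with no category list to check.
dummy_tables = {
    "H1: Households by main source of lighting": {
        "LIGHTING": {"Electricity", "Kerosene", "Candles", "Other"},
    },
    "H2: Households by number of rooms and bedrooms": {
        "ROOMS": set(),
        "BEDROOMS": set(),
    },
}

# Hypothetical draft questionnaire: variables and their response categories.
questionnaire = {
    "LIGHTING": {"Electricity", "Kerosene", "Other"},
    "ROOMS": set(),
}


def find_gaps(tables, questionnaire):
    """List every variable or category a dummy table needs but the questionnaire lacks."""
    gaps = []
    for table, variables in tables.items():
        for variable, categories in variables.items():
            if variable not in questionnaire:
                gaps.append((table, variable, "variable missing"))
                continue
            for category in sorted(categories - questionnaire[variable]):
                gaps.append((table, variable, f"category missing: {category}"))
    return gaps


gaps = find_gaps(dummy_tables, questionnaire)
for gap in gaps:
    print(gap)
```

Each reported gap points to a questionnaire adjustment to consider before the main enumeration.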
Data Quality Concerns
• Data quality is limited by the quality of the information collected at the source
• How to improve quality:
  ➢ Avoid informant fatigue
  ➢ Provide better training for enumerators
  ➢ Avoid long census questionnaires
  ➢ Be creative in keeping the informant's attention during the interview
  ➢ Develop a clever publicity campaign and properly sensitize the population
Data Quality Verification
[Figure: Manual coding/editing. Bar chart of the number of countries (vertical axis, 0 to 40) by duration in months (horizontal axis, in bins 1-2, 3-4, 5-6, 7-8, 9-10, 11-12, 13-14, 15-16, 18-19, 20-25 and 26-42).]
Relation of Census Tasks: Editing/Coding and Data Entry
[Figure: Relation of activities duration. Pie chart with slices of 46.45%, 32.90% and 20.65%; legend: Coding/Editing shorter than data entry, Coding/Editing equal to data entry, Coding/Editing longer than data entry.]
Data editing
• Manual editing
  ➢ Source of additional errors
  ➢ Changes are permanent
  ➢ No statistics on imputations
  ➢ Very difficult to “undo” manually introduced modifications
  ➢ Editing rules are interpreted differently by each operator
  ➢ Long and costly procedure (staff and funds)
  ➢ Use only when visual inspection is needed
Automatic editing
• Applies editing rules in a consistent manner
• Leaves a trace, so changes can be easily undone
• Requires a comprehensive editing manual
• Faster to implement than manual editing
• Produces statistics
• Allows rules to be adjusted
• Helps publish census results in a timely manner
• Less costly than manual editing
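
The properties listed above (consistent rules, a trace that can be undone, and statistics) can be illustrated with a short sketch. It assumes household records held as dictionaries and a single hypothetical rule that caps bedrooms at the number of rooms; the function names and the imputation choice are illustrative only and do not describe any particular editing package.

```python
import copy


def rule_bedrooms_exceed_rooms(record):
    """Hypothetical rule: if bedrooms > rooms, impute bedrooms = rooms."""
    rooms, bedrooms = record["rooms"], record["bedrooms"]
    if rooms is not None and bedrooms is not None and bedrooms > rooms:
        return {"bedrooms": rooms}
    return {}


def apply_rules(records, rules):
    """Apply every rule to every record; the input list is left untouched."""
    edited = copy.deepcopy(records)
    log = []  # the trace: one entry per imputed value
    for index, record in enumerate(edited):
        for name, rule in rules:
            for field, new_value in rule(record).items():
                log.append({"record": index, "rule": name, "field": field,
                            "old": record[field], "new": new_value})
                record[field] = new_value
    return edited, log


def undo(edited, log):
    """Restore the pre-edit values by replaying the trace backwards."""
    restored = copy.deepcopy(edited)
    for entry in reversed(log):
        restored[entry["record"]][entry["field"]] = entry["old"]
    return restored


def imputation_stats(log):
    """Count imputations per rule."""
    counts = {}
    for entry in log:
        counts[entry["rule"]] = counts.get(entry["rule"], 0) + 1
    return counts


# Illustrative records, not real census data.
original = [
    {"rooms": 3, "bedrooms": 2},
    {"rooms": 2, "bedrooms": 5},
    {"rooms": 4, "bedrooms": 4},
]
rules = [("bedrooms_gt_rooms", rule_bedrooms_exceed_rooms)]
edited, log = apply_rules(original, rules)
print(imputation_stats(log))
```

Because every change is logged with its old value, adjusting a rule means undoing the edits and running the program again, which is not practical with manual editing.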
Quality Control
• Original data should be kept untouched as much as possible
• Data editing may not improve the data quality
• Edited data produce consistent tables, making analysis easier
• Over-editing is also a problem
• False sense of security with “ultra” clean data
• Quality control should be performed at every level of the editing process by independent staff
Measuring the Editing Impact
• Demographers and analysts should be informed of editing results
• The original census file should be available for consultation
• A census document should be produced that includes all editing procedures
• Imputation statistics at variable and record level should be made available to researchers
• Decisions should be made if the percentage of errors is too high
Tools to evaluate data quality
Cases     %   Description
    1     0   h270: Apartment with too many rooms
   56     0   h275: Too many rooms for this type of house
  868   0.3   h280: Too many rooms for this type of house
  135   0.1   h285: rooms = 0 but bedrooms > 0
 5457   2.1   h290: total number of rooms out of range
  419   0.2   h295: Total bedrooms > total rooms
 4695   1.8   h300: Bedrooms are blank
 6456   2.4   h305: Kitchen code out of range
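
A report like the one above can be produced by counting, for each edit check, the records that fail it. The sketch below is only an assumption about how such a listing might be generated: the check descriptions are taken from the table, while the field names, the predicate logic and the sample households are hypothetical.

```python
# Each check maps a description (copied from the listing above) to a
# hypothetical predicate that returns True when a household fails the check.
checks = {
    "h285: rooms = 0 but bedrooms > 0":
        lambda h: h["rooms"] == 0 and (h["bedrooms"] or 0) > 0,
    "h295: Total bedrooms > total rooms":
        lambda h: h["bedrooms"] is not None and h["bedrooms"] > h["rooms"],
    "h300: Bedrooms are blank":
        lambda h: h["bedrooms"] is None,
}

# Illustrative households, not real census records.
households = [
    {"rooms": 3, "bedrooms": 2},
    {"rooms": 0, "bedrooms": 1},
    {"rooms": 2, "bedrooms": 4},
    {"rooms": 5, "bedrooms": None},
]


def error_report(households, checks):
    """Return (cases, percent of all households, description) for each check."""
    total = len(households)
    report = []
    for description, failed in checks.items():
        cases = sum(1 for h in households if failed(h))
        report.append((cases, round(100 * cases / total, 1), description))
    return report


report = error_report(households, checks)
for cases, percent, description in report:
    print(f"{cases:6d} {percent:5.1f}  {description}")
```

A single household can fail several checks, so the case counts across checks need not add up to the number of households.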
At Record level
 Cases      %   Description
185138   69.9   households with 0 errors
 39051   14.7   households with 1 error
 21689    8.2   households with 2 errors
  7044    2.7   households with 3 errors
  3102    1.2   households with 4 errors
  1460    0.6   households with 5 errors
  1275    0.5   households with 6 errors
  6236    2.4   households with 7 or more errors
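
A record-level summary of this kind groups households by how many edit checks each one failed. The following sketch assumes the per-household error counts have already been computed; the function name, the cut-off parameter and the sample counts are hypothetical.

```python
from collections import Counter


def errors_per_record_table(error_counts, top=7):
    """Tabulate households by number of errors, pooling `top` or more into one row."""
    total = len(error_counts)
    binned = Counter(min(n, top) for n in error_counts)
    rows = []
    for k in range(top + 1):
        cases = binned.get(k, 0)
        if k == top:
            label = f"households with {k} or more errors"
        elif k == 1:
            label = "households with 1 error"
        else:
            label = f"households with {k} errors"
        rows.append((cases, round(100 * cases / total, 1), label))
    return rows


# Illustrative error counts for ten households (not real census data).
error_counts = [0, 0, 0, 1, 2, 0, 9, 1, 0, 7]
rows = errors_per_record_table(error_counts)
for cases, percent, label in rows:
    print(f"{cases:6d} {percent:5.1f}  {label}")
```

Unlike the rule-level listing, every household falls into exactly one row here, so the case counts add up to the total number of households.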
At variable level
Imputed Item LIGHTENING - all occurrences
Categories       Freq   CumFreq      %   Cum %
1 Electricity    6768      6768   89.4    89.4
2 Kerosene        293      7061    3.9    93.3
3 Candles         418      7479    5.5    98.8
4 Solar Panel      20      7499    0.3      99
5 Car Battery       8      7507    0.1    99.2
6 Other            64      7571    0.8     100
TOTAL            7571      7571    100     100
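
The cumulative and percentage columns follow directly from the frequencies. As a check, the sketch below recomputes them from the Freq column of the table above; the function name and the one-decimal rounding are assumptions, not a description of the software that produced the listing.

```python
from itertools import accumulate

# Frequencies of imputed values, copied from the Freq column above.
freqs = {
    "Electricity": 6768,
    "Kerosene": 293,
    "Candles": 418,
    "Solar Panel": 20,
    "Car Battery": 8,
    "Other": 64,
}


def frequency_table(freqs):
    """Return rows of (category, freq, cum freq, %, cum %) and the total."""
    total = sum(freqs.values())
    cumulative = accumulate(freqs.values())
    rows = []
    for (category, freq), cum in zip(freqs.items(), cumulative):
        rows.append((category, freq, cum,
                     round(100 * freq / total, 1),
                     round(100 * cum / total, 1)))
    return rows, total


rows, total = frequency_table(freqs)
for category, freq, cum, pct, cum_pct in rows:
    print(f"{category:12s} {freq:6d} {cum:8d} {pct:6.1f} {cum_pct:7.1f}")
print(f"{'TOTAL':12s} {total:6d}")
```

Rounded to one decimal, the recomputed values agree with the percentages shown in the table.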
Wish List
• Timely census results
• Reliable data
• Improve data availability
  ➢ Census microdata database available on the Internet (at least at municipality level)
  ➢ Train users to produce tables
• Increase data harmonization between countries
• Make editing statistics available to researchers