The Partition Family

NLSCY – Non-response
Non-response
There are various reasons for non-response to a survey.
Some are related to the survey process:
Timing
Poor frame information
Interviewer or field errors
Some are related to circumstances:
Weather
Language issues
Difficulty in tracing individuals
Others are related to respondents:
Unwillingness to participate
Inability to participate
Types of non-response
Partial non-response (item)
Some individual questions were not answered
Some individual questions were not asked
Partial non-response (component)
The NLSCY is divided into sections (components) of questions on different topics; an entire section may be missing.
Types of non-response (cont’d)
Total non-response
No information, or insufficient information, is collected
Wave non-response
Information about a respondent is available, but not for every cycle of the survey, because of total non-response in a given cycle
The follow-up strategy differs depending on the cycle in which the respondent was introduced into the survey
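The categories above can be told apart mechanically once you know what each record answered in each cycle. The sketch below assumes a hypothetical nested layout (cycle, then component, then item, with `None` for an unanswered item); it is not the NLSCY file structure, and it treats only a completely empty cycle as total non-response, whereas "insufficient information" would need a survey-specific threshold.

```python
from typing import Dict, Optional

# Hypothetical layout (not the NLSCY file format): component -> item -> value,
# where None means the item was not answered
Cycle = Dict[str, Dict[str, Optional[float]]]
# A record maps each survey cycle number to that cycle's answers
Record = Dict[int, Cycle]


def classify_cycle(components: Cycle) -> str:
    """Classify the non-response of one record in one cycle."""
    items = [v for comp in components.values() for v in comp.values()]
    if all(v is None for v in items):
        return "total"      # nothing collected in this cycle
    if any(all(v is None for v in comp.values()) for comp in components.values()):
        return "component"  # at least one whole section is missing
    if any(v is None for v in items):
        return "item"       # individual questions missing here and there
    return "complete"


def has_wave_nonresponse(record: Record) -> bool:
    """True if the respondent took part in some cycles but not in others."""
    statuses = [classify_cycle(cycle) for cycle in record.values()]
    return "total" in statuses and any(s != "total" for s in statuses)


record = {
    1: {"income": {"q1": 1.0}, "school": {"q2": 3.0, "q3": None}},
    2: {"income": {"q1": None}, "school": {"q2": None, "q3": None}},
}
print(classify_cycle(record[1]))     # item
print(classify_cycle(record[2]))     # total
print(has_wave_nonresponse(record))  # True
```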
Dealing with TOTAL non-response
Total non-response is measured and corrected in the NLSCY at Statistics Canada
Variables that significantly affect total non-response are identified
All the weights are adjusted to correct for total non-response
However, wave non-response remains an issue for longitudinal analysis
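A common way to adjust weights for total non-response is a weighting-class (response homogeneity group) adjustment: within each group, respondents' weights are multiplied by the ratio of the weighted total of all sampled units to the weighted total of respondents, so that respondents also represent the non-respondents of their group. This is a generic sketch of that technique, not Statistics Canada's production method; the group labels, weights and field names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Unit(NamedTuple):
    group: str       # hypothetical response homogeneity group label
    weight: float    # design weight before adjustment
    responded: bool  # False for total non-respondents


def adjust_for_nonresponse(units: List[Unit]) -> List[float]:
    """Weighting-class adjustment: inflate respondents, give non-respondents 0."""
    total_w: Dict[str, float] = defaultdict(float)
    resp_w: Dict[str, float] = defaultdict(float)
    for u in units:
        total_w[u.group] += u.weight
        if u.responded:
            resp_w[u.group] += u.weight
    adjusted = []
    for u in units:
        if u.responded:
            # Factor = inverse of the group's weighted response rate
            adjusted.append(u.weight * total_w[u.group] / resp_w[u.group])
        else:
            adjusted.append(0.0)
    return adjusted


sample = [
    Unit("A", 10.0, True), Unit("A", 10.0, False),
    Unit("B", 20.0, True), Unit("B", 20.0, True), Unit("B", 20.0, False),
]
new_weights = adjust_for_nonresponse(sample)
# Group A factor: 20 / 10 = 2.0; group B factor: 60 / 40 = 1.5
print(new_weights)  # [20.0, 0.0, 30.0, 30.0, 0.0]
```

The adjusted weights still add up to the original total (80 here). A group with no respondents would simply lose its weight, which is why such groups are usually merged with a similar group before adjusting.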
Adjustments for total non-response
Diagram 1: Prairies (overall response rate 85.2%). Decision tree splitting the sample first by CINHD05A (income-to-LICO ratio), then by DISFUNC (dysfunction), CEDCQ14D (school performance), CSDPD05B (language spoken), CDMPD06A (absence/presence of spouse), MOVE (moving) and CEDPD02 (PMK education), with a response rate (%) at each node. The CINHD05A = 5, 6 and 99 branches continue in Diagrams 2 and 3.
Adjustments (cont’d)
Diagram 2: the CINHD05A (income-to-LICO ratio) = 5 and 6 branches, split further by CEDPD02 (PMK education).
Diagram 3: the CINHD05A = 99 branch, split by CGEHBD04 (urban/rural).
Adjustments (cont’d)
GROUP    FREQ   RESP. RATE   NON-RESP. RATE
18         89        61.4%            38.6%
19        117        75.2%            24.8%
20         79        85.2%            14.8%
21         53        81.3%            18.7%
22         50        78.8%            21.2%
23         54        61.8%            38.2%
24         52        85.1%            14.9%
25         54        76.0%            24.0%
26         63        81.1%            18.9%
27         57        85.9%            14.1%
28         83        66.3%            33.7%
29         57        85.9%            14.1%
30         74        85.1%            14.9%
31         82        78.2%            21.8%
32         50        63.6%            36.4%
33         55        91.6%             8.4%
34        107        96.2%             3.8%
35         50        88.1%            11.9%
36         98        97.3%             2.7%
37         68        92.6%             7.4%
38         68        79.9%            20.1%
39         51        72.3%            27.7%
40         91        94.0%             6.0%
41         63        95.4%             4.6%
42         95        94.5%             5.5%
43        105        86.5%            13.5%
44        137        85.4%            14.6%
45         60        98.7%             1.3%
46         74        92.5%             7.5%
47         65        91.0%             9.0%
48         56        86.7%            14.3%
49         51        74.4%            25.6%
50         82        92.1%             7.9%
51         65        89.9%            10.1%
52         75        89.3%            10.7%
53         56        96.8%             3.2%
54         55        79.7%            20.3%
55         62        99.1%             0.9%
56         59        86.6%            13.4%
57         96        95.4%             4.6%
58         50        90.5%             9.5%
59         56        82.3%            17.7%
60        169        81.5%            18.5%
61         51        94.4%             5.6%
62         56        71.3%            28.7%
63         58        95.4%             4.6%
64         52        94.0%             6.0%
65         63        86.7%            13.3%
66         76        93.8%             6.2%
67         94        99.1%             0.9%
68         65        63.0%            37.0%
69         55        64.3%            35.7%
70         72        93.3%             6.7%
TOTAL    3775        85.2%            14.8%
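Under a weighting-class adjustment, a group's adjustment factor is roughly the inverse of its response rate, so respondents in low-response groups such as 18 or 68 have their weights inflated the most. The arithmetic below uses a few unweighted rates from the table; the factors Statistics Canada actually applies are based on weighted totals and may differ.

```python
# Response rates (%) for a few groups, taken from the table
resp_rate = {"18": 61.4, "36": 97.3, "68": 63.0, "TOTAL": 85.2}


def adjustment_factor(rate_pct: float) -> float:
    """Inverse of a response rate given in percent (unweighted approximation)."""
    return 100.0 / rate_pct


for group, rate in resp_rate.items():
    print(f"group {group}: factor {adjustment_factor(rate):.2f}")
# group 18: factor 1.63
# group 36: factor 1.03
# group 68: factor 1.59
# group TOTAL: factor 1.17
```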
Dealing with PARTIAL non-response
Missing data for income-related variables are imputed at Statistics Canada
Missing data for the other variables are coded as:
Refusal
Don’t know
Not stated
Note: these are different from "Not applicable", which is not non-response.
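When preparing the data, keep "Not applicable" apart from the three non-response codes, otherwise item non-response rates are distorted. The sketch assumes the usual Statistics Canada reserve codes for a two-digit variable (96 = valid skip/not applicable, 97 = don’t know, 98 = refusal, 99 = not stated); treat these values as an assumption and check each variable's codebook entry.

```python
from typing import List, Union

# ASSUMED reserve codes for a two-digit variable; verify against the codebook
NOT_APPLICABLE = 96
NON_RESPONSE_CODES = {97: "don't know", 98: "refusal", 99: "not stated"}

Value = Union[int, str, None]


def recode(value: int) -> Value:
    """Map a raw code to an analysis value.

    Valid skips become "not applicable" (not non-response), the three
    non-response codes become None, and real answers are returned unchanged.
    """
    if value == NOT_APPLICABLE:
        return "not applicable"
    if value in NON_RESPONSE_CODES:
        return None
    return value


def item_nonresponse_rate(values: List[int]) -> float:
    """Share of applicable cases with no usable answer."""
    recoded = [recode(v) for v in values]
    applicable = [r for r in recoded if r != "not applicable"]
    return sum(r is None for r in applicable) / len(applicable)


raw = [1, 3, 96, 97, 98, 99, 2]
print([recode(v) for v in raw])    # [1, 3, 'not applicable', None, None, None, 2]
print(item_nonresponse_rate(raw))  # 0.5
```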
Dealing with partial non-response
Issues
Either analyzing the entire dataset:
when a significant amount of information is missing for a variable of interest
or when many variables of interest have missing data and only a minority of records have complete information
Or limiting your analysis to the subset of the population with reported values:
how do you then make inferences about the larger population? (What do the weighted estimates refer to?)
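The second issue can be checked quickly by counting complete cases. In the hypothetical sketch below each variable is reported by 75% of records, yet only 25% of records have all three; the variable names are illustrative and `None` stands for any non-response code.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]  # hypothetical record: variable -> value or None


def complete_case_share(rows: List[Row], variables: List[str]) -> float:
    """Fraction of rows with a reported (non-None) value for every variable."""
    complete = sum(all(row.get(v) is not None for v in variables) for row in rows)
    return complete / len(rows)


rows = [
    {"income": 1.0, "school": 3.0, "move": 0.0},
    {"income": None, "school": 2.0, "move": 0.0},
    {"income": 2.0, "school": None, "move": 1.0},
    {"income": 3.0, "school": 4.0, "move": None},
]
print(complete_case_share(rows, ["income"]))                    # 0.75
print(complete_case_share(rows, ["income", "school", "move"]))  # 0.25
```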
The Partition Family
Dealing with non-response in partitioned datasets
Missing an entire component
Missing partial information
Missing a cycle or wave of data
How important is it?
Maybe non-response is random
Maybe it's negligible
Maybe it can be explained away
Maybe I can get away with it
What are your options?
Report missing data as a separate value
Ignore missing data (limit your analysis to reported data only)
Correct for the missing data:
By re-weighting
By imputation
By modelling the non-response
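The first two options can give quite different pictures of the same variable. The toy sketch below (hypothetical, unweighted data) contrasts them: reporting "missing" as its own value keeps the whole population as the base, while ignoring it rescales the shares to respondents only.

```python
from collections import Counter
from typing import Dict, List, Optional


def with_missing_category(values: List[Optional[str]]) -> Dict[str, float]:
    """Option 1: report missing data as a value of its own."""
    counts = Counter("missing" if v is None else v for v in values)
    return {k: c / len(values) for k, c in counts.items()}


def respondents_only(values: List[Optional[str]]) -> Dict[str, float]:
    """Option 2: ignore missing data; shares now describe respondents only."""
    reported = [v for v in values if v is not None]
    counts = Counter(reported)
    return {k: c / len(reported) for k, c in counts.items()}


answers = ["yes", "no", "yes", None, "no", "yes", None, "yes"]  # hypothetical item
print(with_missing_category(answers))  # {'yes': 0.5, 'no': 0.25, 'missing': 0.25}
print(respondents_only(answers))       # yes: 4/6, no: 2/6
```

Neither option corrects for non-response; that is what re-weighting and imputation attempt.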
Get to know your non-respondents
When you have significant non-response, you need to assess it
It becomes your first variable of interest
It’s an analysis like any other you will do
Otherwise, it casts doubt on every finding
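Treating the response indicator as the variable of interest can start as simply as comparing response rates across a characteristic known for respondents and non-respondents alike (for instance from the sampling frame or an earlier cycle). The variable and data below are hypothetical; in practice you would use weights and a formal test or model, such as a logistic regression of response on auxiliary variables.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def response_rate_by(records: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Response rate per category of an auxiliary variable known for everyone.

    Each record is (category, responded). Large differences between categories
    suggest that non-response is not random with respect to that variable.
    """
    n: Dict[str, int] = defaultdict(int)
    r: Dict[str, int] = defaultdict(int)
    for category, responded in records:
        n[category] += 1
        r[category] += responded  # True counts as 1, False as 0
    return {c: r[c] / n[c] for c in n}


# Hypothetical data: urban/rural status known from the sampling frame
records = [("urban", True), ("urban", False), ("urban", False), ("urban", True),
           ("rural", True), ("rural", True), ("rural", True), ("rural", False)]
print(response_rate_by(records))  # {'urban': 0.5, 'rural': 0.75}
```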
Example of ignoring non-respondents in your analysis
Based on the whole population…
We know that the missing information relates to this subpopulation…
Based on those who reported, we find that…
Inferences are now about a subpopulation only.
Relies on a good description of non-respondents versus respondents
Example of imputing for non-respondents in your analysis
Based on the whole population…
We know that the missing information relates to this subpopulation…
We compensated for this non-response by doing the following, and based on this process, we find that…
Inferences are now about the population.
Relies on a good description of your imputation methodology
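One common imputation technique (not necessarily the one Statistics Canada or the authors use) is random hot-deck imputation within imputation classes: each missing value is replaced by the value of a randomly drawn respondent (the donor) from the same class. The classes, values and seed below are hypothetical. Keeping an imputation flag makes the methodology easy to describe and lets you treat imputed values differently later.

```python
import random
from typing import Dict, List, Optional, Tuple


def hot_deck(rows: List[Tuple[str, Optional[float]]],
             seed: int = 0) -> List[Tuple[float, bool]]:
    """Random hot-deck imputation within classes.

    Each row is (imputation_class, value or None). Returns (value, imputed_flag).
    A class without any donor raises KeyError; such classes must be merged first.
    """
    rng = random.Random(seed)
    donors: Dict[str, List[float]] = {}
    for cls, value in rows:
        if value is not None:
            donors.setdefault(cls, []).append(value)
    out: List[Tuple[float, bool]] = []
    for cls, value in rows:
        if value is None:
            out.append((rng.choice(donors[cls]), True))  # value from a random donor
        else:
            out.append((value, False))                   # reported value kept
    return out


rows = [("low", 10.0), ("low", None), ("low", 14.0), ("high", 40.0), ("high", None)]
for value, imputed in hot_deck(rows):
    print(value, "imputed" if imputed else "reported")
```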
Re-weighting to compensate for non-response
Same principle as imputation
Works when a whole component is missing
Very messy for "Swiss-cheese" non-response (items missing here and there)
Composite methodology: imputation to adjust for local areas of non-response, and re-weighting for broad areas where many variables (an entire component) are missing.
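A minimal sketch of such a composite approach, assuming a single component, hypothetical adjustment groups, and the rule "component missing = every item missing". Mean imputation is used only to keep the sketch short; it understates variance, so a hot-deck or model-based method is usually preferable.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def composite_adjust(records: List[dict]) -> List[dict]:
    """Re-weight for missing components, impute scattered missing items.

    Each record is {"group": str, "weight": float, "items": {name: value or None}}.
    Records with the whole component missing are dropped and their weight is
    spread over component respondents in the same group; remaining gaps are
    filled with the group mean of the reported values for that item.
    """
    total_w: Dict[str, float] = defaultdict(float)
    resp_w: Dict[str, float] = defaultdict(float)
    reported: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    kept = []
    for r in records:
        total_w[r["group"]] += r["weight"]
        if all(v is None for v in r["items"].values()):
            continue  # entire component missing: compensated by re-weighting
        kept.append(r)
        resp_w[r["group"]] += r["weight"]
        for name, v in r["items"].items():
            if v is not None:
                reported[(r["group"], name)].append(v)

    result = []
    for r in kept:
        factor = total_w[r["group"]] / resp_w[r["group"]]
        # mean() raises StatisticsError if nobody in the group reported the item
        items = {name: v if v is not None else mean(reported[(r["group"], name)])
                 for name, v in r["items"].items()}
        result.append({"group": r["group"], "weight": r["weight"] * factor,
                       "items": items})
    return result


records = [
    {"group": "A", "weight": 10.0, "items": {"q1": 1.0, "q2": None}},   # item gap
    {"group": "A", "weight": 10.0, "items": {"q1": 3.0, "q2": 4.0}},    # complete
    {"group": "A", "weight": 20.0, "items": {"q1": None, "q2": None}},  # no component
]
for r in composite_adjust(records):
    print(r["weight"], r["items"])
# 20.0 {'q1': 1.0, 'q2': 4.0}
# 20.0 {'q1': 3.0, 'q2': 4.0}
```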
Impact of non-response on variance
Total non-response:
Variance will increase, because estimates rest on fewer respondents
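Under simplifying assumptions (simple random sampling, non-response unrelated to the variable), losing respondents inflates the variance of a mean by roughly 1 / response rate. The numbers below are hypothetical; real NLSCY variances also reflect the sample design and the weight adjustments, and this calculation says nothing about bias.

```python
import math


def se_of_mean(sd: float, n: int) -> float:
    """Standard error of a mean under simple random sampling."""
    return sd / math.sqrt(n)


n_sampled = 1000                                  # hypothetical sample size
response_rate = 0.80                              # hypothetical response rate
n_respondents = round(n_sampled * response_rate)  # 800 usable records

se_full = se_of_mean(10.0, n_sampled)
se_resp = se_of_mean(10.0, n_respondents)
# Variance grows by 1 / response_rate = 1.25; the standard error by its square root
print(round(se_resp / se_full, 3))  # 1.118
```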
Impact of non-response on variance (cont’d)
Partial non-response:
When imputing for non-response:
Treating imputed values as reported values may lead to an underestimation of the true variance (mean imputation is probably the worst case).
When re-weighting:
How should the bootstrap weights be used?
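Both points on this slide can be illustrated with a small sketch on hypothetical data. Part 1 shows why mean imputation is the worst case: the imputed values add no spread, and treating them as reported also inflates n, so the naive standard error more than halves. Part 2 sketches the usual way replicate weights are used: recompute the estimate with each bootstrap weight set and average the squared deviations from the full-sample estimate. Conventions differ (some software centres on the mean of the replicates), so check the survey's documentation; and if you re-weight for non-response yourself, the assumption here is that the same adjustment is applied to every replicate weight set.

```python
import math
from statistics import mean, variance
from typing import List, Sequence

# Part 1: mean imputation shrinks the apparent variance (hypothetical data)
reported = [2.0, 4.0, 6.0, 8.0]
n_missing = 4
imputed = reported + [mean(reported)] * n_missing  # every gap gets the mean

se_reported = math.sqrt(variance(reported) / len(reported))
se_naive = math.sqrt(variance(imputed) / len(imputed))  # imputed treated as reported
print(round(se_reported, 3), round(se_naive, 3))  # 1.291 0.598


# Part 2: variance of a weighted mean from bootstrap replicate weights
def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def bootstrap_variance(values: Sequence[float], full_weights: Sequence[float],
                       replicate_weights: List[Sequence[float]]) -> float:
    """Average squared deviation of replicate estimates from the full-sample one."""
    theta = weighted_mean(values, full_weights)
    reps = [weighted_mean(values, w) for w in replicate_weights]
    return sum((t - theta) ** 2 for t in reps) / len(reps)


values = [1.0, 2.0, 3.0]
full = [1.0, 1.0, 1.0]
replicates = [[2.0, 1.0, 0.0], [0.0, 1.0, 2.0]]  # hypothetical; real files supply many
print(round(bootstrap_variance(values, full, replicates), 4))  # 0.4444
```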
Conclusion
Make sure you always assess the impact of non-response when doing an analysis