PDF

Introduction
The Proposed Methods
Simulation Results
Next Step
Reducing Nonresponse Bias through
Responsive Design and External Benchmarks
Julia Lee
University of Michigan
July 17, 2012
Thesis committee: S. Heeringa, R. Little, T. Raghunathan, R. Valliant
Julia Lee
CE 2012 Survey Methods Symposium
Introduction
The Proposed Methods
Simulation Results
Next Step
Goals of the Project
1
To improve respondent representativeness
2
To assess the nature of nonresponse
3
To adjust for nonresponse
Julia Lee
CE 2012 Survey Methods Symposium
Introduction
The Proposed Methods
Simulation Results
Next Step
Outline
Introduction
The proposed method
Simulation results
Next steps
Julia Lee
CE 2012 Survey Methods Symposium
Introduction
The Proposed Methods
Simulation Results
Next Step
Current Practice
Alternatives
Current Practice
Reduce nonresponse bias at the analysis stage:
Weighting class methods
Propensity score methods
Calibration
(Imputation)
Challenges:
Need nonrespondent information
Assume ignorable nonresponse pattern
Extreme and highly variable weights occur
Julia Lee
CE 2012 Survey Methods Symposium
Introduction
The Proposed Methods
Simulation Results
Next Step
Current Practice
Alternatives
Alternatives
Reduce nonresponse bias at the design and data collection stages:
Actively control for nonresponse bias at design stage by
adaptively improving respondent representativeness.
Effectively use frame data, contextual data, paradata, and
benchmark information to obviate the need for nonrespondent
information.
Julia Lee
CE 2012 Survey Methods Symposium
Introduction
The Proposed Methods
Simulation Results
Next Step
Responsive Design Procedure
Data Structure
The key steps
Responsive Design Procedure
Objectives:
Obviate the need for nonrespondent information
Obtain more representative respondent pool
Terminology:
Benchmark survey: capture desired target population, such as
American Community Survey
Current Survey: survey that you are conducting
Julia Lee
CE 2012 Survey Methods Symposium
Introduction
The Proposed Methods
Simulation Results
Next Step
Responsive Design Procedure
Data Structure
The key steps
Responsive Design Procedure
Setting: Surveys with multi-phase data collection
The procedures:
1
Complete first phase of data collection.
2
Combine with benchmark information.
3
Augument with frame data, contextual data, and paradata.
4
Model the origin of each data point (1=benchmark, 0=
current survey) in terms of covariates.
5
Compute ratio of propensity score density (Rps ) between
benchmark and current survey.
6
Sample next phase subjects using Rps .
7
Iterate steps 2 through 6 until acceptable representativeness
or budget reached.
Julia Lee
CE 2012 Survey Methods Symposium
Introduction
The Proposed Methods
Simulation Results
Next Step
Responsive Design Procedure
Data Structure
The key steps
The problem
How do we know propensity scores of next phase subjects before
they respond?
Julia Lee
CE 2012 Survey Methods Symposium
Introduction
The Proposed Methods
Simulation Results
Next Step
Responsive Design Procedure
Data Structure
The key steps
Data structure
Y1
Y2
Y3
Y4
X1
X2
X3
Z1
Z2
Bench
1
√
√
√
√
√
√
√
√
√
Bench
1
√
√
√
√
√
√
√
√
√
Bench
1
√
√
√
√
√
√
√
√
√
…
1
√
√
√
√
√
√
√
√
√
S1
0
√
√
√
√
√
√
√
√
√
S1
0
√
√
√
√
√
√
√
√
√
…
0
√
√
√
√
√
√
√
√
√
S2
0
√
√
√
√
√
S2
0
√
√
√
√
√
S2
0
√
√
√
√
√
S2
0
√
√
√
√
√
…
0
√
√
√
√
√
Missing
data
Notation:
Ys are survey variables
Xs are common covariates across benchmark survey and the sample survey.
Zs are auxiliary data or contextual data from frame, registry, or interview observations, etc.
Julia Lee
CE 2012 Survey Methods Symposium
Introduction
The Proposed Methods
Simulation Results
Next Step
Responsive Design Procedure
Data Structure
The key steps
The key step 1: Imputation
Estimate propensity score of next samples using imputed covariates
Y1
Y2
Y3
Y4
X1
X2
X3
Z1
Z2
Bench
1
√
√
√
√
√
√
√
√
√
Bench
1
√
√
√
√
√
√
√
√
√
Bench
1
√
√
√
√
√
√
√
√
√
…
1
√
√
√
√
√
√
√
√
√
S1
0
√
√
√
√
√
√
√
√
√
S1
0
√
√
√
√
√
√
√
√
√
…
0
√
√
√
√
√
√
√
√
√
S2
0
▲
▲
▲
▲
√
√
√
√
√
S2
0
S2
0
S2
…
▲
▲
▲
▲
Imputation
√
√
√
√
√
▲
▲
▲
▲
√
√
√
√
√
0
▲
▲
▲
▲
√
√
√
√
√
0
▲
▲
▲
▲
√
√
√
√
√
Notation:
Ys are survey variables
Xs are common covariates across benchmark survey and the sample survey.
Zs are auxiliary data or contextual data from frame, registry, or interview observations, etc.
Julia Lee
CE 2012 Survey Methods Symposium
Introduction
The Proposed Methods
Simulation Results
Next Step
Responsive Design Procedure
Data Structure
The key steps
The key step 2: Rps
Define an acceptance/rejection process on the original sampling
frame, to reduce or eliminate bias relative to the benchmark survey.
Must satisfy:
πP(Z |accept) + (1 − π)P(Z ) = PB (Z )
where π is the fraction of the combined data that are newly drawn.
What we want is P(accept|Z ). Choose P(Z ) to be propensity
score density and use Bayes Theorem to obtain
P(accept|Z ) ∝
Julia Lee
PB (Z )
P(Z )
CE 2012 Survey Methods Symposium
Introduction
The Proposed Methods
Simulation Results
Next Step
Example Data
NHIS vs BRFSS: Covariates in the propensity score model
The usual suspects:
Geographic region
Demographic: gender, age, race, marital status,
Socio-economic status: education, income categories, work
status
Julia Lee
CE 2012 Survey Methods Symposium
Introduction
The Proposed Methods
Simulation Results
Next Step
Example Data
0.6
NHIS vs BRFSS: Observed Data
0.0
0.1
0.2
Density
0.3
0.4
0.5
Phase 1
Bench
phase 2
phase 3
phase 4
−4
−2
Julia Lee
0 CE 2012 Survey
2
4 Symposium
Methods
Introduction
The Proposed Methods
Simulation Results
Next Step
Example Data
NHIS vs BRFSS: Observed Data
1.0
Calendar Quarter 2
BRFSS
NHIS
BRFSS
NHIS
1.0
Calendar Quarter 4
●
●
●
●
●
0.0
UNWT Propensity Scores
0.2
0.4
0.6
0.8
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0.0
1.0
Calendar Quarter 3
UNWT Propensity Scores
0.2
0.4
0.6
0.8
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0.0
UNWT Propensity Scores
0.2
0.4
0.6
0.8
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0.0
UNWT Propensity Scores
0.2
0.4
0.6
0.8
1.0
Calendar Quarter 1
BRFSS
Julia
NHIS
Lee
CE 2012BRFSS
Survey Methods
Symposium
NHIS
Introduction
The Proposed Methods
Simulation Results
Next Step
Example Data
NHIS vs BRFSS: Responsive Design
0.5
Density
1.0
1.5
Phase 2
Bench
Sample 3
0.0
0.0
0.5
Density
1.0
1.5
Phase 1
Bench
Sample 2
−2
0
2
4
−2
0
2
4
Phase 4
Bench
Density
1.0
0.5
0.0
0.0
0.5
Density
1.0
1.5
Phase 3
Bench
Sample 4
−4
1.5
−4
−4
−2
0
2
Julia4 Lee
CE 2012
Methods
−4 Survey
−2
0
2 Symposium
4
Introduction
The Proposed Methods
Simulation Results
Next Step
Example Data
Phase 1
Bench
phase 2
phase 3
phase 4
0.0
0.2
0.4
Density
0.6
0.8
1.0
ACS vs CE: Observed Data
−4
−2
Julia 0Lee
2
4
6
8
CE 2012 Survey
Methods
Symposium
Introduction
The Proposed Methods
Simulation Results
Next Step
Example Data
ACS vs CE: Observed Data
CE
ACS
Calendar Quarter 2
1.0
●
●
●
●
●
●
●
●
●
CE
ACS
0.0
0.0
●
●
●
Calendar Quarter 4
1.0
1.0
Calendar Quarter 3
●
●
●
●
●
●
●
●
●
●
UNWT Propensity Scores
0.2
0.4
0.6
0.8
0.0
UNWT Propensity Scores
0.2
0.4
0.6
0.8
●
0.0
●
●
UNWT Propensity Scores
0.2
0.4
0.6
0.8
●
UNWT Propensity Scores
0.2
0.4
0.6
0.8
1.0
Calendar Quarter 1
●
CE
Julia
ACS
Lee
●
●
●
●
●
●
●
●
●
●
CE 2012 Survey
Methods
CE
ACSSymposium
Introduction
The Proposed Methods
Simulation Results
Next Step
Example Data
ACS vs CE: Observed Data
UNWT Propensity Scores
0.75 0.80 0.85 0.90 0.95
●
1.00
Calendar Quarter 2
●
●
●
●
●
●
UNWT Propensity Scores
0.75 0.80 0.85 0.90 0.95
1.00
Calendar Quarter 1
CE
ACS
CE
●
●
●
●
●
●
●
●
●
●
●
●
●
ACS
●
●
●
●
●
UNWT Propensity Scores
0.75 0.80 0.85 0.90 0.95
1.00
Calendar Quarter 4
1.00
Calendar Quarter 3
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
0.70
●
0.70
UNWT Propensity Scores
0.75 0.80 0.85 0.90 0.95
●
●
0.70
0.70
●
CE
Julia
ACS
Lee
CE 2012 Survey
Methods
CE
ACSSymposium
Introduction
The Proposed Methods
Simulation Results
Next Step
Model fit
Model fit and similarity measures
Model fit diagnostics
Distance measure on densities
Hellinger distance to quantify the similarity between two
probability distributions
Z
p 2
1 √
2
H =
dP − dQ
2
where P and Q represent the propensity score density from
benchmark and current survey, respectively.
Balance measure on covariates
Absolute distance
Julia Lee
CE 2012 Survey Methods Symposium
Introduction
The Proposed Methods
Simulation Results
Next Step
Model fit
Thank you!
Comments are appreciated!
Contact: [email protected]
Julia Lee
CE 2012 Survey Methods Symposium