Introduction The Proposed Methods Simulation Results Next Step Reducing Nonresponse Bias through Responsive Design and External Benchmarks Julia Lee University of Michigan July 17, 2012 Thesis committee: S. Heeringa, R. Little, T. Raghunathan, R. Valliant Julia Lee CE 2012 Survey Methods Symposium Introduction The Proposed Methods Simulation Results Next Step Goals of the Project 1 To improve respondent representativeness 2 To assess the nature of nonresponse 3 To adjust for nonresponse Julia Lee CE 2012 Survey Methods Symposium Introduction The Proposed Methods Simulation Results Next Step Outline Introduction The proposed method Simulation results Next steps Julia Lee CE 2012 Survey Methods Symposium Introduction The Proposed Methods Simulation Results Next Step Current Practice Alternatives Current Practice Reduce nonresponse bias at the analysis stage: Weighting class methods Propensity score methods Calibration (Imputation) Challenges: Need nonrespondent information Assume ignorable nonresponse pattern Extreme and highly variable weights occur Julia Lee CE 2012 Survey Methods Symposium Introduction The Proposed Methods Simulation Results Next Step Current Practice Alternatives Alternatives Reduce nonresponse bias at the design and data collection stages: Actively control for nonresponse bias at design stage by adaptively improving respondent representativeness. Effectively use frame data, contextual data, paradata, and benchmark information to obviate the need for nonrespondent information. Julia Lee CE 2012 Survey Methods Symposium Introduction The Proposed Methods Simulation Results Next Step Responsive Design Procedure Data Structure The key steps Responsive Design Procedure Objectives: Obviate the need for nonrespondent information Obtain more representative respondent pool Terminology: Benchmark survey: capture desired target population, such as American Community Survey Current Survey: survey that you are conducting Julia Lee CE 2012 Survey Methods Symposium Introduction The Proposed Methods Simulation Results Next Step Responsive Design Procedure Data Structure The key steps Responsive Design Procedure Setting: Surveys with multi-phase data collection The procedures: 1 Complete first phase of data collection. 2 Combine with benchmark information. 3 Augument with frame data, contextual data, and paradata. 4 Model the origin of each data point (1=benchmark, 0= current survey) in terms of covariates. 5 Compute ratio of propensity score density (Rps ) between benchmark and current survey. 6 Sample next phase subjects using Rps . 7 Iterate steps 2 through 6 until acceptable representativeness or budget reached. Julia Lee CE 2012 Survey Methods Symposium Introduction The Proposed Methods Simulation Results Next Step Responsive Design Procedure Data Structure The key steps The problem How do we know propensity scores of next phase subjects before they respond? Julia Lee CE 2012 Survey Methods Symposium Introduction The Proposed Methods Simulation Results Next Step Responsive Design Procedure Data Structure The key steps Data structure Y1 Y2 Y3 Y4 X1 X2 X3 Z1 Z2 Bench 1 √ √ √ √ √ √ √ √ √ Bench 1 √ √ √ √ √ √ √ √ √ Bench 1 √ √ √ √ √ √ √ √ √ … 1 √ √ √ √ √ √ √ √ √ S1 0 √ √ √ √ √ √ √ √ √ S1 0 √ √ √ √ √ √ √ √ √ … 0 √ √ √ √ √ √ √ √ √ S2 0 √ √ √ √ √ S2 0 √ √ √ √ √ S2 0 √ √ √ √ √ S2 0 √ √ √ √ √ … 0 √ √ √ √ √ Missing data Notation: Ys are survey variables Xs are common covariates across benchmark survey and the sample survey. Zs are auxiliary data or contextual data from frame, registry, or interview observations, etc. Julia Lee CE 2012 Survey Methods Symposium Introduction The Proposed Methods Simulation Results Next Step Responsive Design Procedure Data Structure The key steps The key step 1: Imputation Estimate propensity score of next samples using imputed covariates Y1 Y2 Y3 Y4 X1 X2 X3 Z1 Z2 Bench 1 √ √ √ √ √ √ √ √ √ Bench 1 √ √ √ √ √ √ √ √ √ Bench 1 √ √ √ √ √ √ √ √ √ … 1 √ √ √ √ √ √ √ √ √ S1 0 √ √ √ √ √ √ √ √ √ S1 0 √ √ √ √ √ √ √ √ √ … 0 √ √ √ √ √ √ √ √ √ S2 0 ▲ ▲ ▲ ▲ √ √ √ √ √ S2 0 S2 0 S2 … ▲ ▲ ▲ ▲ Imputation √ √ √ √ √ ▲ ▲ ▲ ▲ √ √ √ √ √ 0 ▲ ▲ ▲ ▲ √ √ √ √ √ 0 ▲ ▲ ▲ ▲ √ √ √ √ √ Notation: Ys are survey variables Xs are common covariates across benchmark survey and the sample survey. Zs are auxiliary data or contextual data from frame, registry, or interview observations, etc. Julia Lee CE 2012 Survey Methods Symposium Introduction The Proposed Methods Simulation Results Next Step Responsive Design Procedure Data Structure The key steps The key step 2: Rps Define an acceptance/rejection process on the original sampling frame, to reduce or eliminate bias relative to the benchmark survey. Must satisfy: πP(Z |accept) + (1 − π)P(Z ) = PB (Z ) where π is the fraction of the combined data that are newly drawn. What we want is P(accept|Z ). Choose P(Z ) to be propensity score density and use Bayes Theorem to obtain P(accept|Z ) ∝ Julia Lee PB (Z ) P(Z ) CE 2012 Survey Methods Symposium Introduction The Proposed Methods Simulation Results Next Step Example Data NHIS vs BRFSS: Covariates in the propensity score model The usual suspects: Geographic region Demographic: gender, age, race, marital status, Socio-economic status: education, income categories, work status Julia Lee CE 2012 Survey Methods Symposium Introduction The Proposed Methods Simulation Results Next Step Example Data 0.6 NHIS vs BRFSS: Observed Data 0.0 0.1 0.2 Density 0.3 0.4 0.5 Phase 1 Bench phase 2 phase 3 phase 4 −4 −2 Julia Lee 0 CE 2012 Survey 2 4 Symposium Methods Introduction The Proposed Methods Simulation Results Next Step Example Data NHIS vs BRFSS: Observed Data 1.0 Calendar Quarter 2 BRFSS NHIS BRFSS NHIS 1.0 Calendar Quarter 4 ● ● ● ● ● 0.0 UNWT Propensity Scores 0.2 0.4 0.6 0.8 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.0 1.0 Calendar Quarter 3 UNWT Propensity Scores 0.2 0.4 0.6 0.8 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.0 UNWT Propensity Scores 0.2 0.4 0.6 0.8 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.0 UNWT Propensity Scores 0.2 0.4 0.6 0.8 1.0 Calendar Quarter 1 BRFSS Julia NHIS Lee CE 2012BRFSS Survey Methods Symposium NHIS Introduction The Proposed Methods Simulation Results Next Step Example Data NHIS vs BRFSS: Responsive Design 0.5 Density 1.0 1.5 Phase 2 Bench Sample 3 0.0 0.0 0.5 Density 1.0 1.5 Phase 1 Bench Sample 2 −2 0 2 4 −2 0 2 4 Phase 4 Bench Density 1.0 0.5 0.0 0.0 0.5 Density 1.0 1.5 Phase 3 Bench Sample 4 −4 1.5 −4 −4 −2 0 2 Julia4 Lee CE 2012 Methods −4 Survey −2 0 2 Symposium 4 Introduction The Proposed Methods Simulation Results Next Step Example Data Phase 1 Bench phase 2 phase 3 phase 4 0.0 0.2 0.4 Density 0.6 0.8 1.0 ACS vs CE: Observed Data −4 −2 Julia 0Lee 2 4 6 8 CE 2012 Survey Methods Symposium Introduction The Proposed Methods Simulation Results Next Step Example Data ACS vs CE: Observed Data CE ACS Calendar Quarter 2 1.0 ● ● ● ● ● ● ● ● ● CE ACS 0.0 0.0 ● ● ● Calendar Quarter 4 1.0 1.0 Calendar Quarter 3 ● ● ● ● ● ● ● ● ● ● UNWT Propensity Scores 0.2 0.4 0.6 0.8 0.0 UNWT Propensity Scores 0.2 0.4 0.6 0.8 ● 0.0 ● ● UNWT Propensity Scores 0.2 0.4 0.6 0.8 ● UNWT Propensity Scores 0.2 0.4 0.6 0.8 1.0 Calendar Quarter 1 ● CE Julia ACS Lee ● ● ● ● ● ● ● ● ● ● CE 2012 Survey Methods CE ACSSymposium Introduction The Proposed Methods Simulation Results Next Step Example Data ACS vs CE: Observed Data UNWT Propensity Scores 0.75 0.80 0.85 0.90 0.95 ● 1.00 Calendar Quarter 2 ● ● ● ● ● ● UNWT Propensity Scores 0.75 0.80 0.85 0.90 0.95 1.00 Calendar Quarter 1 CE ACS CE ● ● ● ● ● ● ● ● ● ● ● ● ● ACS ● ● ● ● ● UNWT Propensity Scores 0.75 0.80 0.85 0.90 0.95 1.00 Calendar Quarter 4 1.00 Calendar Quarter 3 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.70 ● 0.70 UNWT Propensity Scores 0.75 0.80 0.85 0.90 0.95 ● ● 0.70 0.70 ● CE Julia ACS Lee CE 2012 Survey Methods CE ACSSymposium Introduction The Proposed Methods Simulation Results Next Step Model fit Model fit and similarity measures Model fit diagnostics Distance measure on densities Hellinger distance to quantify the similarity between two probability distributions Z p 2 1 √ 2 H = dP − dQ 2 where P and Q represent the propensity score density from benchmark and current survey, respectively. Balance measure on covariates Absolute distance Julia Lee CE 2012 Survey Methods Symposium Introduction The Proposed Methods Simulation Results Next Step Model fit Thank you! Comments are appreciated! Contact: [email protected] Julia Lee CE 2012 Survey Methods Symposium
© Copyright 2026 Paperzz