Estimating and Using Propensity
Score in Presence of Missing
Background Data.
An Application to Assess the Impact
of Childbearing on Wellbeing
Alessandra Mattei
Dipartimento di Statistica “G. Parenti”
Università degli Studi di Firenze
Outline
1. Motivation of the study
2. Estimating causal effects through a quasi-experimental approach
3. Estimating propensity scores with incomplete data
4. Estimating the causal effects of childbearing on economic
wellbeing in Indonesia using the Indonesia Family Life Survey
(IFLS)
5. Concluding remarks
Motivation of the Study
• We compare three different approaches to handling missing
background data in the estimation and use of propensity scores:
1. A complete-case analysis
2. A pattern-mixture model based approach developed by
Rosenbaum and Rubin (1984)
3. A multiple imputation approach
• We make explicit the assumptions underlying each approach by
illustrating the interaction between the treatment assignment
mechanism and the missing data mechanism
• We apply these methods to assess the impact of childbearing events
on individuals’ wellbeing in Indonesia, using a sample of women
from the Indonesia Family Life Survey
The Quasi-Experimental Approach
• We use appropriate econometric techniques based on longitudinal micro
data in order to identify the causal effects of childbearing events on
poverty
• We consider the endogenous variable of interest, here change in fertility,
as the treatment variable Z, and divide individuals into two groups:
– those who experienced a childbirth - the treatment group, indicated
by Z = T , and
– those who did not - the control group, indicated by Z = C
• The outcome variable, say Y , is a measure of wellbeing
• Strong Ignorability Assumption (Rosenbaum and Rubin, 1983)
(i) Z is independent of the potential outcomes (Y (C), Y (T )) conditional
on X = x (Unconfoundedness Assumption)
(ii) η < Pr (Z = T | X = x) < 1 − η, for some η > 0
The Unconfoundedness Assumption
The unconfoundedness assumption requires that all variables that affect both
outcome and the likelihood of receiving the treatment are observed or that all
the others are perfectly collinear with the observed ones
• This assumption is not testable, it is a very strong assumption, and one
that need not generally be applicable
• Selection may also take place on the basis of unobservable characteristics
We view it as a useful starting point for two reasons
1. In our study, we have carefully investigated which variables are most
likely to confound any comparison between treated and control units
2. Any alternative assumptions that do not rely on unconfoundedness, while
allowing for consistent estimation of the causal effects of interest, must
make alternative untestable assumptions
The Propensity Score
• The propensity score is the conditional probability of receiving a
particular treatment (Z = T ) versus control (Z = C) given a vector
of observed covariates, X,
e = e(X) = Pr (Z = T | X)
• Balancing of pre-treatment variables given the propensity score
If e(X) is the propensity score, then
Z⊥X|e(X)
• Unconfoundedness given the propensity score
Z⊥ (Y (C), Y (T )) |X =⇒ Z⊥ (Y (C), Y (T )) |e(X)
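As a minimal sketch of the definition e(X) = Pr (Z = T | X), assuming all covariates are discrete, the score can be estimated by the share of treated units in each covariate cell. The same sketch flags cells that violate the overlap condition η < e(x) < 1 − η. The function names and the toy data are hypothetical; with continuous covariates a parametric model such as a logit would typically replace the cell shares.

```python
from collections import defaultdict


def propensity_by_cell(covariates, treatment):
    """Estimate e(x) = Pr(Z = T | X = x) as the treated share in each covariate cell."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for x, z in zip(covariates, treatment):
        cell = tuple(x)
        total[cell] += 1
        if z == "T":
            treated[cell] += 1
    return {cell: treated[cell] / total[cell] for cell in total}


def overlap_violations(e_hat, eta):
    """Return the cells whose estimated score lies outside the open interval (eta, 1 - eta)."""
    return sorted(cell for cell, e in e_hat.items() if not (eta < e < 1 - eta))


# Hypothetical toy data: two binary covariates per unit, Z in {"T", "C"}.
X = [(1, 0), (1, 0), (1, 0), (0, 1), (0, 1), (0, 1), (0, 1), (1, 1), (1, 1)]
Z = ["T", "T", "C", "T", "C", "C", "C", "T", "T"]

e_hat = propensity_by_cell(X, Z)
bad_cells = overlap_violations(e_hat, eta=0.05)
print(e_hat)
print(bad_cells)
```

In the toy data every unit in cell (1, 1) is treated, so that cell has no comparable controls and is reported as an overlap violation.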
Notation
Let the response indicator be
$$R_{ik} = \begin{cases} 1, & \text{if the value of the } k\text{th covariate for the } i\text{th subject is observed} \\ 0, & \text{if the value of the } k\text{th covariate for the } i\text{th subject is missing} \end{cases}$$
for i = 1, . . . , N and k = 1, . . . , K.
Let $X = (X^{obs}, X^{mis})$, where
$$X^{obs} = \{X_{ik} : R_{ik} = 1\}, \qquad X^{mis} = \{X_{ik} : R_{ik} = 0\}$$
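The notation can be illustrated with a short sketch that builds the response indicators from a covariate matrix. It assumes, purely for illustration, that a missing value is coded as None; the function names and the data are hypothetical.

```python
def response_indicators(X):
    """Return R with R[i][k] = 1 if X[i][k] is observed and 0 if it is missing (None)."""
    return [[0 if value is None else 1 for value in row] for row in X]


def split_observed_missing(X):
    """Return the (i, k) positions of observed and of missing cells (X_obs and X_mis)."""
    R = response_indicators(X)
    observed = [(i, k) for i, row in enumerate(R) for k, r in enumerate(row) if r == 1]
    missing = [(i, k) for i, row in enumerate(R) for k, r in enumerate(row) if r == 0]
    return observed, missing


# Hypothetical data: N = 3 subjects, K = 3 covariates, None marks a missing value.
X = [[12, None, 1],
     [9, 24, None],
     [6, 19, 0]]

R = response_indicators(X)
obs_cells, mis_cells = split_observed_missing(X)
print(R)
print(mis_cells)
```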
Estimating Propensity Score with Incomplete Data
• It is not clear how the propensity score should be estimated when
some covariate values are missing
• The missingness itself may be predictive about which treatment is
received
• Any technique for estimating propensity score in the presence of
covariate missing data will have to either make a stronger
assumption regarding ignorability of the assignment mechanism or
will have to make an assumption about the missing data mechanism
• In order to have ignorability of the assignment mechanism, for all of
the techniques here described, we will maintain the following
assumption:
Pr (Z | X, R, Y (C), Y (T )) = Pr (Z | X, R)
Complete-Data Analysis
• A complete-data analysis uses only observations where all variables
are observed
• To make valid causal inferences with this approach we require that
data are Missing Completely At Random (MCAR; Little and Rubin,
1987):
Pr(R | X, Z) = Pr(R)
– This means that the units removed from the data set, those with
missing data, are just a simple random sample of all units
Note that
Pr (Z | X, R, Y (C), Y (T )) = Pr (Z | X, R)
and
Pr(R | X, Z) = Pr(R)
⇓
Pr (Z | X, R, Y (C), Y (T )) = Pr (Z | X)
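The implication can be checked in two steps. The derivation below is a sketch that uses only Bayes' rule and the two displayed conditions; the sum over z runs over the two treatment levels T and C.

```latex
\begin{align*}
\Pr(R \mid X) &= \sum_{z \in \{T, C\}} \Pr(R \mid X, Z = z)\,\Pr(Z = z \mid X)
               = \Pr(R) \sum_{z \in \{T, C\}} \Pr(Z = z \mid X) = \Pr(R), \\
\Pr(Z \mid X, R) &= \frac{\Pr(R \mid X, Z)\,\Pr(Z \mid X)}{\Pr(R \mid X)}
                  = \frac{\Pr(R)\,\Pr(Z \mid X)}{\Pr(R)} = \Pr(Z \mid X), \\
\Pr(Z \mid X, R, Y(C), Y(T)) &= \Pr(Z \mid X, R) = \Pr(Z \mid X).
\end{align*}
```

The first line uses MCAR to remove the dependence of R on Z and X, the second applies Bayes' rule, and the third combines the result with the maintained ignorability assumption.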
Rosenbaum - Rubin Approach
The Propensity Scores with Incomplete Data
The generalized propensity score, which conditions on all of the
observed covariate information, is
$$e^{*} = e^{*}(X^{obs}, R) = \Pr\left(Z = T \mid X^{obs}, R\right)$$
• Balancing of pre-treatment variables given the generalized
propensity score
$$Z \perp \left(X^{obs}, R\right) \mid e^{*}(X^{obs}, R)$$
• Unconfoundedness given the generalized propensity score
$$Z \perp (Y(C), Y(T)) \mid \left(X^{obs}, R\right) \;\Longrightarrow\; Z \perp (Y(C), Y(T)) \mid e^{*}(X^{obs}, R)$$
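A minimal sketch of the generalized propensity score for discrete covariates follows. It assumes that missing values are coded as None and treats "missing" as an extra category of each covariate, so that each cell identifies both the observed values and the missing-data pattern. This is an illustration of the idea only, not the estimation procedure used in the application; all names and data are hypothetical.

```python
from collections import defaultdict

MISSING = "missing"  # extra category standing in for an unobserved value


def pattern_cell(row):
    """Encode (X_obs, R) for one unit: observed values are kept, missing ones are flagged."""
    return tuple(MISSING if value is None else value for value in row)


def generalized_propensity(X, Z):
    """Estimate e*(X_obs, R) = Pr(Z = T | X_obs, R) as the treated share in each cell."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for row, z in zip(X, Z):
        cell = pattern_cell(row)
        total[cell] += 1
        if z == "T":
            treated[cell] += 1
    return {cell: treated[cell] / total[cell] for cell in total}


# Hypothetical toy data with one partially observed covariate.
X = [(1, None), (1, None), (1, 0), (1, 0), (0, 1), (0, 1)]
Z = ["T", "C", "T", "T", "C", "T"]

e_star = generalized_propensity(X, Z)
print(e_star)
```

Units with the same observed values but different missing-data patterns fall in different cells, which is how the missingness itself is allowed to predict treatment.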
Rosenbaum - Rubin Approach
Assumptions
The Rosenbaum-Rubin method relies on either one of the following
assumptions:
Pr(Z | X, R) ≡ Pr(Z | Xobs , Xmis , R) = Pr(Z | Xobs , R)
or
Pr(Y (C), Y (T ) | X, R) ≡ Pr(Y (C), Y (T ) | Xobs , Xmis , R)
= Pr(Y (C), Y (T ) | Xobs , R)
• The Rosenbaum - Rubin method does not make any assumption
about the missing data mechanism
Rosenbaum - Rubin Approach
Drawbacks
• The Rosenbaum - Rubin method does assume that
either
– all missing covariate values are independent of the
assignment mechanism conditional on observed covariate values
and the missing data patterns
or
– they are independent of the potential outcomes
conditional on observed covariate values and the missing data
patterns
• Since the Rosenbaum - Rubin method specifies one model for both
handling missing data and estimating propensity scores, it is not
possible to incorporate the outcome variable Y into this model even
though it might provide useful information about missing values
Multiple Imputation and Propensity Score Methods
The latent ignorability of the assignment mechanism
Using Multiple Imputation (MI) to handle incomplete covariate
data, we essentially assume the latent ignorability of the
assignment mechanism
Pr(Z | X, R, Y (C), Y (T )) = Pr(Z | X).
• In our case, the assignment mechanism is ignorable only conditional
on complete covariate data (which includes, of course, values that
in practice are missing)
• Computationally, this is achieved by filling in the missing covariate
values using MI
Multiple Imputation and Propensity Score Methods
Assumptions on the assignment mechanism
• Imputations may in principle be created under any kind of model
for the missing data mechanism, and the resulting inferences will be
valid under that mechanism (Rubin, 1987)
• In our application, MI was performed assuming that the missing
observations are Missing At Random (MAR), that is,
$$\Pr(R \mid X, Z, Y(C), Y(T)) = \Pr(R \mid X^{obs}, Z, Y^{obs}),$$
where $Y^{obs} = \left[Y_i^{obs}\right]_{i=1}^{n}$ and $Y_i^{obs} = I\{Z_i = T\}\,Y_i(T) + I\{Z_i = C\}\,Y_i(C)$
– This MAR assumption involves all the observed variables
– In our application, we perform MI in two ways:
∗ including Y in the model, and
∗ not including Y in the imputation model
Multiple Imputation and Propensity Score Methods
Estimators
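Let $\widehat{ATT}_l$ and $se_l^2$ denote the point estimate and variance respectively from
the lth (l = 1, . . . , m) dataset. Then,
$$\widehat{ATT} = \frac{1}{m} \sum_{l=1}^{m} \widehat{ATT}_l$$
$$\mathrm{Var}\left(\widehat{ATT}\right) = se_W^2 + se_B^2 \left(1 + \frac{1}{m}\right)$$
where
$$se_W^2 = \frac{1}{m} \sum_{l=1}^{m} se_l^2 \qquad \text{(Within-imputation variance)}$$
$$se_B^2 = \frac{1}{m-1} \sum_{l=1}^{m} \left(\widehat{ATT}_l - \widehat{ATT}\right)^2 \qquad \text{(Between-imputation variance)}$$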
• In our application, MI was performed using the mvis module in STATA
(Royston, 2004), which is based on the MICE method of multivariate
imputation by chained equations (van Buuren et al., 1999)
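The combining rules on this slide (average the m point estimates; total variance equal to the within-imputation variance plus (1 + 1/m) times the between-imputation variance) can be sketched in a few lines. The function name and the input numbers are hypothetical and are not results from the application.

```python
from statistics import mean, variance


def combine_mi(estimates, variances):
    """Combine m completed-data ATT estimates and their variances.

    Returns the pooled point estimate and its total variance:
    within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    att = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return att, total


# Hypothetical estimates and variances from m = 3 imputed datasets.
att, total_var = combine_mi([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(att, total_var)
```

The pooled standard error is the square root of the total variance.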
Matching Estimators of the ATT Effect
based on the Propensity Score
• The Nearest Neighbor Matching Estimator
• The Kernel Matching Estimator
• The Stratification Matching Estimator
Irrespective of the method of handling missing data, the propensity
score analysis is implemented by the use of the pscore module in
STATA written by Becker and Ichino (2000)
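To make the first of these estimators concrete, here is a minimal sketch of nearest neighbor matching on an estimated propensity score. It assumes matching with replacement and a single nearest control per treated unit, with ties resolved by taking the first control found; it is not the pscore implementation, and the names and data are hypothetical.

```python
def att_nearest_neighbor(scores, treatment, outcomes):
    """ATT by one-to-one nearest neighbor matching (with replacement) on the score."""
    treated = [(e, y) for e, z, y in zip(scores, treatment, outcomes) if z == "T"]
    controls = [(e, y) for e, z, y in zip(scores, treatment, outcomes) if z == "C"]
    if not treated or not controls:
        raise ValueError("need at least one treated and one control unit")
    differences = []
    for e_t, y_t in treated:
        # Control whose score is closest to the treated unit's score.
        _, y_c = min(controls, key=lambda c: abs(c[0] - e_t))
        differences.append(y_t - y_c)
    return sum(differences) / len(differences)


# Hypothetical scores, treatment indicators and outcomes for five units.
scores = [0.80, 0.60, 0.75, 0.55, 0.20]
Z = ["T", "T", "C", "C", "C"]
Y = [100.0, 120.0, 130.0, 150.0, 200.0]

att = att_nearest_neighbor(scores, Z, Y)
print(att)
```

Kernel matching replaces the single nearest control with a weighted average of all controls, and stratification averages treated-control differences within blocks of the score.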
The Indonesia Family Life Survey Data
• The IFLS consists of three waves (1993, 1997, 2000) plus a special wave
(1998), which we will not use in our study
• We will use a subsample of panel ever-married women aged 15-49
• In our study the outcome variable is a measure of monetary wellbeing,
given by the annual value of the total household consumption
expenditures adjusted for price variability across space and time and
household heterogeneity
– Adjustment for price variability
∗ We divided the nominal consumption expenditures by the national
consumer price index (IFS, 2002)
– Adjustment for household heterogeneity
∗ We adjust our income-based measure of wellbeing for household
heterogeneity by applying the following equivalence scale:
$$\sqrt{\text{Total number of persons in the household}}$$
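The two adjustments can be sketched as a single function. It assumes that adjusting for household heterogeneity means dividing by the square-root equivalence scale, which is the usual way such a scale is applied; the function name and the numbers are hypothetical.

```python
from math import sqrt


def equivalised_consumption(nominal, price_index, household_size):
    """Deflate nominal household consumption and divide by sqrt(household size)."""
    if price_index <= 0 or household_size <= 0:
        raise ValueError("price index and household size must be positive")
    real = nominal / price_index       # adjustment for price variability
    return real / sqrt(household_size)  # adjustment for household heterogeneity


# Hypothetical household: nominal consumption 800, price index 2.0, 4 members.
value = equivalised_consumption(800.0, 2.0, 4)
print(value)
```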
The Outcome Variable
Descriptive statistics of total net equivalised household consumption
expenditures in 2000 (Rupiah∗ in thousands) by number of live births
Consumption expenditures (Rupiah in thousands)

Live births               Obs    Mean      S.d.      Median
0                         3024   194.084   211.816   136.539
1                          948   163.026   168.507   119.842
2                          128   151.812   195.366   118.244
3                            7   199.538   129.990   127.870
At least one live birth   1083   161.936   171.604   119.827

∗ 9,064.54 Rupiah = 1 US $
• Note that 161.936 − 194.084 = −32.148
Self-Selection of the Treated Units
• We observe that women who experience childbearing and women
who do not are very different in almost all their characteristics
(Details are omitted)
• Systematic differences between the treatment group and the control
group can also occur in the distribution of the missing covariate
data
– 10.7% of the units in the sample have at least one missing
covariate value
Self-Selection of the Treated Units
Missing-value indicators (proportion observed)
Covariate                          Z=C     Z=T     |Difference| (%)
Deprivation Index                  0.930   0.919   1.1
Education level of HH head         0.999   1.000   0.1
Yrs of schooling of the HH head    0.995   0.994   0.1
Education level                    0.999   1.000   0.1
Yrs of schooling                   0.997   0.995   0.2
Activity last week                 0.998   1.000   0.2
Age at first marriage              0.985   0.993   0.7
Islam                              0.996   0.997   0.1
Parents in HH                      0.998   1.000   0.2
Years since the last live birth    0.987   0.987   0.0
Pregnant                           1.000   0.999   0.1
Ever used contraceptives           0.999   0.999   0.0
Use of contraceptives              0.998   0.997   0.1
Total                              0.104   0.113   0.8
Propensity Score Models for IFLS Data
Standardized Differences (in %) and Percent Reduction in Bias for Propensity
Scores, before and after matching, using each approach to the missing covariates
problem in combination with Nearest Neighbor, Gaussian Kernel, and Stratification
Propensity Score Matching
Results after matching (all values in %)

                    Initial   Nearest Neighbor      Kernel                Stratification
Missing Data        Stand.    Stand.   Red.         Stand.   Red.         Stand.   Red.
Approaches          Diff.     Diff.    in Bias      Diff.    in Bias      Diff.    in Bias
Complete-Data       140.4      0.1      99.9        7.7      94.5         18.8     86.6
Rosenbaum-Rubin     143.2      0.1      99.9        8.0      94.4         21.9     84.7
MI (without Y )     143.1     -0.1     100.1        7.2      95.0         21.8     84.7
MI (with Y )        143.5     -0.1     100.1        7.3      94.9         20.5     85.5
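The two balance measures in this table can be sketched as follows. The standardized difference formula used here (difference in means divided by the square root of the average of the two group variances, times 100) is a common definition and is an assumption, since the slides do not spell it out. The percent reduction in bias, 100 × (1 − after / before), does reproduce the tabulated values. Function names and the toy samples are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_difference(treated, control):
    """100 * (mean_T - mean_C) / sqrt((var_T + var_C) / 2); assumed definition."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return 100 * (mean(treated) - mean(control)) / pooled_sd


def percent_bias_reduction(before, after):
    """Percent reduction in the standardized difference achieved by matching."""
    return 100 * (1 - after / before)


# Hypothetical propensity scores (scaled) for a toy treated and control group.
sd_toy = standardized_difference([1, 2, 3], [0, 1, 2])

# Values taken from the Complete-Data and MI (without Y) rows of the table.
red_kernel = round(percent_bias_reduction(140.4, 7.7), 1)
red_nn_mi = round(percent_bias_reduction(143.1, -0.1), 1)
print(sd_toy, red_kernel, red_nn_mi)
```

A reduction above 100% arises when matching overshoots and the sign of the standardized difference flips, as in the two MI rows under nearest neighbor matching.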
Treatment Effects Estimation
Complete-Data Analysis
Matching Method     NT     NC      ATT        S.E.      t-value
Nearest Neighbor    961    532     -49.773    17.338    -2.871
Kernel              961    2387    -37.670    15.126    -2.490
Stratification      961    2387    -29.990    13.615    -2.203
• The complete-case analysis gives quite large (in absolute value)
average treatment effects and quite high standard errors
• It appears to be very sensitive to the choice of the matching
method
• In our application, the MCAR assumption does not appear
plausible; it is more reasonable to believe that the missing data
mechanism is either Missing At Random (MAR) or nonignorable
Treatment Effects Estimation
Rosenbaum-Rubin Model
Matching Method     NT      NC      ATT        S.E.      t-value
Nearest Neighbor    1082    580     -20.583    14.211    -1.448
Kernel              1082    2670    -28.827    14.005    -2.058
Stratification      1082    2670    -28.563    13.527    -2.112
• With respect to the complete-data analysis, the Rosenbaum-Rubin
model appears to be more robust concerning the choice of the
matching method
• It yields smaller (in absolute value) average treatment effects and lower standard errors
• It does not produce an excellent balance in the distribution of the
estimated propensity score
Treatment Effects Estimation
Multiple Imputation (without Y )
Matching Method     NT      NC        ATT        S.E.      t-value
Nearest Neighbor    1083    565.1     -24.781    19.830    -1.250
Kernel              1083    2638.4    -31.948    13.896    -2.299
Stratification      1083    2638.4    -26.940    12.942    -2.082

Multiple Imputation (with Y )
Matching Method     NT      NC        ATT        S.E.      t-value
Nearest Neighbor    1083    569.1     -25.655    18.896    -1.358
Kernel              1083    2636.5    -31.840    14.741    -2.160
Stratification      1083    2636.5    -26.213    12.906    -2.031
Advantages of the MI Techniques
• The two imputation models outperform both of the other two
approaches in terms of robustness of the estimates to the choice of
the matching method
• By using different models for imputation and for the propensity score, the MI
approach allows one to incorporate model features in one model that
might be inappropriate for another
• MI makes the choice of the propensity model easier
• The MI approach allows for final analyses of the outcomes (such as
covariance adjustment) that include covariates that are not fully
observed
Concluding Remarks
• We compared missing completely at random based estimates of
propensity scores and the causal effect of interest with estimators
based on alternative models for the missing data process:
– A pattern-mixture model based approach developed by
Rosenbaum and Rubin (1984)
– A combination of propensity score matching with MI
• We judged the plausibility of these alternative approaches by the
balance that the resulting propensity score models produced and
the estimates they yielded
• In our application, the MI models appear to outperform both the
complete data analysis and the Rosenbaum-Rubin method
• The combination of propensity score matching with MI that we chose
shows evidence that childbearing events reduce consumption levels